[2024-06-15 11:31:05,535][1648984] Saving configuration to train_dir/atari_2B_atari_journeyescape_1111/config.json...
[2024-06-15 11:31:05,541][1648984] Rollout worker 0 uses device cpu
[2024-06-15 11:31:05,541][1648984] Rollout worker 1 uses device cpu
[2024-06-15 11:31:05,541][1648984] Rollout worker 2 uses device cpu
[2024-06-15 11:31:05,541][1648984] Rollout worker 3 uses device cpu
[2024-06-15 11:31:09,106][1648984] Using GPUs [0] for process 0 (actually maps to GPUs [1])
[2024-06-15 11:31:09,107][1648984] InferenceWorker_p0-w0: min num requests: 1
[2024-06-15 11:31:09,128][1648984] Starting all processes...
[2024-06-15 11:31:09,129][1648984] Starting process learner_proc0
[2024-06-15 11:31:12,014][1648984] Starting all processes...
[2024-06-15 11:31:12,017][1651340] Using GPUs [0] for process 0 (actually maps to GPUs [1])
[2024-06-15 11:31:12,017][1651340] Set environment var CUDA_VISIBLE_DEVICES to '1' (GPU indices [0]) for learning process 0
[2024-06-15 11:31:12,018][1648984] Starting process inference_proc0-0
[2024-06-15 11:31:12,018][1648984] Starting process rollout_proc0
[2024-06-15 11:31:12,018][1648984] Starting process rollout_proc1
[2024-06-15 11:31:12,018][1648984] Starting process rollout_proc2
[2024-06-15 11:31:12,018][1648984] Starting process rollout_proc3
[2024-06-15 11:31:12,152][1651340] Num visible devices: 1
[2024-06-15 11:31:12,214][1651340] Setting fixed seed 1111
[2024-06-15 11:31:12,216][1651340] Using GPUs [0] for process 0 (actually maps to GPUs [1])
[2024-06-15 11:31:12,216][1651340] Initializing actor-critic model on device cuda:0
[2024-06-15 11:31:12,217][1651340] RunningMeanStd input shape: (4, 84, 84)
[2024-06-15 11:31:12,218][1651340] RunningMeanStd input shape: (1,)
[2024-06-15 11:31:12,235][1651340] ConvEncoder: input_channels=4
[2024-06-15 11:31:12,347][1651340] Conv encoder output size: 512
[2024-06-15 11:31:12,350][1651340] Created Actor Critic model with architecture:
[2024-06-15 11:31:12,350][1651340]
ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): MultiInputEncoder(
    (encoders): ModuleDict(
      (obs): ConvEncoder(
        (enc): RecursiveScriptModule(
          original_name=ConvEncoderImpl
          (conv_head): RecursiveScriptModule(
            original_name=Sequential
            (0): RecursiveScriptModule(original_name=Conv2d)
            (1): RecursiveScriptModule(original_name=ReLU)
            (2): RecursiveScriptModule(original_name=Conv2d)
            (3): RecursiveScriptModule(original_name=ReLU)
            (4): RecursiveScriptModule(original_name=Conv2d)
            (5): RecursiveScriptModule(original_name=ReLU)
          )
          (mlp_layers): RecursiveScriptModule(
            original_name=Sequential
            (0): RecursiveScriptModule(original_name=Linear)
            (1): RecursiveScriptModule(original_name=ReLU)
          )
        )
      )
    )
  )
  (core): ModelCoreIdentity()
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=16, bias=True)
  )
)
[2024-06-15 11:31:13,082][1651340] Using optimizer
[2024-06-15 11:31:14,130][1651340] No checkpoints found
[2024-06-15 11:31:14,130][1651340] Did not load from checkpoint, starting from scratch!
[2024-06-15 11:31:14,131][1651340] Initialized policy 0 weights for model version 0
[2024-06-15 11:31:14,133][1651340] LearnerWorker_p0 finished initialization!
[2024-06-15 11:31:14,133][1651340] Using GPUs [0] for process 0 (actually maps to GPUs [1]) [2024-06-15 11:31:14,186][1652487] Worker 3 uses CPU cores [72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95] [2024-06-15 11:31:14,362][1652477] Worker 1 uses CPU cores [24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47] [2024-06-15 11:31:14,445][1652475] Using GPUs [0] for process 0 (actually maps to GPUs [1]) [2024-06-15 11:31:14,445][1652475] Set environment var CUDA_VISIBLE_DEVICES to '1' (GPU indices [0]) for inference process 0 [2024-06-15 11:31:14,492][1652475] Num visible devices: 1 [2024-06-15 11:31:14,538][1652479] Worker 2 uses CPU cores [48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71] [2024-06-15 11:31:14,655][1648984] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-06-15 11:31:14,660][1652476] Worker 0 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23] [2024-06-15 11:31:14,927][1652475] RunningMeanStd input shape: (4, 84, 84) [2024-06-15 11:31:14,928][1652475] RunningMeanStd input shape: (1,) [2024-06-15 11:31:14,944][1652475] ConvEncoder: input_channels=4 [2024-06-15 11:31:15,088][1652475] Conv encoder output size: 512 [2024-06-15 11:31:15,096][1648984] Inference worker 0-0 is ready! [2024-06-15 11:31:15,096][1648984] All inference workers are ready! Signal rollout workers to start! [2024-06-15 11:31:15,097][1652476] EnvRunner 0-0 uses policy 0 [2024-06-15 11:31:15,098][1652477] EnvRunner 1-0 uses policy 0 [2024-06-15 11:31:15,103][1652479] EnvRunner 2-0 uses policy 0 [2024-06-15 11:31:15,114][1652487] EnvRunner 3-0 uses policy 0 [2024-06-15 11:31:15,737][1648984] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. 
Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-06-15 11:31:20,738][1648984] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-06-15 11:31:25,738][1648984] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-06-15 11:31:29,089][1648984] Heartbeat connected on Batcher_0 [2024-06-15 11:31:29,097][1648984] Heartbeat connected on LearnerWorker_p0 [2024-06-15 11:31:29,150][1648984] Heartbeat connected on InferenceWorker_p0-w0 [2024-06-15 11:31:30,738][1648984] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-06-15 11:31:35,738][1648984] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-06-15 11:31:40,738][1648984] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-06-15 11:31:45,738][1648984] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-06-15 11:31:47,163][1648984] Heartbeat connected on RolloutWorker_w0 [2024-06-15 11:31:48,485][1648984] Heartbeat connected on RolloutWorker_w3 [2024-06-15 11:31:49,416][1648984] Heartbeat connected on RolloutWorker_w1 [2024-06-15 11:31:49,651][1648984] Heartbeat connected on RolloutWorker_w2 [2024-06-15 11:31:50,738][1648984] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 28.4. Samples: 1024. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-06-15 11:31:55,742][1648984] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. 
Throughput: 0: 1620.2. Samples: 66560. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-06-15 11:31:56,705][1651340] Signal inference workers to stop experience collection... [2024-06-15 11:31:56,741][1652475] InferenceWorker_p0-w0: stopping experience collection [2024-06-15 11:31:56,777][1652487] Worker 3, sleep for 0.750 sec to decorrelate experience collection [2024-06-15 11:31:57,531][1652487] Worker 3 awakens! [2024-06-15 11:31:59,193][1651340] Signal inference workers to resume experience collection... [2024-06-15 11:31:59,194][1652475] InferenceWorker_p0-w0: resuming experience collection [2024-06-15 11:32:00,762][1648984] Fps is (10 sec: 9806.7, 60 sec: 2132.1, 300 sec: 2132.1). Total num frames: 98304. Throughput: 0: 2877.0. Samples: 129536. Policy #0 lag: (min: 9.0, avg: 9.0, max: 9.0) [2024-06-15 11:32:00,906][1652477] Worker 1, sleep for 0.250 sec to decorrelate experience collection [2024-06-15 11:32:01,158][1652477] Worker 1 awakens! [2024-06-15 11:32:01,290][1652479] Worker 2, sleep for 0.500 sec to decorrelate experience collection [2024-06-15 11:32:01,292][1652475] Updated weights for policy 0, policy_version 68 (0.0013) [2024-06-15 11:32:01,812][1652479] Worker 2 awakens! [2024-06-15 11:32:02,903][1652475] Updated weights for policy 0, policy_version 128 (0.0040) [2024-06-15 11:32:03,908][1652475] Updated weights for policy 0, policy_version 176 (0.0011) [2024-06-15 11:32:05,738][1648984] Fps is (10 sec: 49152.2, 60 sec: 9622.1, 300 sec: 9622.1). Total num frames: 491520. Throughput: 0: 3436.1. Samples: 154624. Policy #0 lag: (min: 111.0, avg: 156.2, max: 159.0) [2024-06-15 11:32:10,738][1648984] Fps is (10 sec: 42701.0, 60 sec: 9348.5, 300 sec: 9348.5). Total num frames: 524288. Throughput: 0: 4835.5. Samples: 217600. 
Policy #0 lag: (min: 111.0, avg: 156.2, max: 159.0) [2024-06-15 11:32:11,931][1652475] Updated weights for policy 0, policy_version 261 (0.0124) [2024-06-15 11:32:13,826][1652475] Updated weights for policy 0, policy_version 336 (0.0134) [2024-06-15 11:32:15,415][1652475] Updated weights for policy 0, policy_version 388 (0.0015) [2024-06-15 11:32:15,738][1648984] Fps is (10 sec: 32767.9, 60 sec: 13653.3, 300 sec: 13411.4). Total num frames: 819200. Throughput: 0: 6303.3. Samples: 283648. Policy #0 lag: (min: 95.0, avg: 162.8, max: 351.0) [2024-06-15 11:32:17,634][1652475] Updated weights for policy 0, policy_version 480 (0.0012) [2024-06-15 11:32:20,738][1648984] Fps is (10 sec: 52430.1, 60 sec: 17476.2, 300 sec: 15867.7). Total num frames: 1048576. Throughput: 0: 6872.2. Samples: 309248. Policy #0 lag: (min: 125.0, avg: 236.3, max: 381.0) [2024-06-15 11:32:24,524][1652475] Updated weights for policy 0, policy_version 547 (0.0013) [2024-06-15 11:32:25,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 20206.9, 300 sec: 17056.5). Total num frames: 1212416. Throughput: 0: 8578.8. Samples: 386048. Policy #0 lag: (min: 15.0, avg: 84.3, max: 271.0) [2024-06-15 11:32:26,851][1652475] Updated weights for policy 0, policy_version 642 (0.0014) [2024-06-15 11:32:28,289][1652475] Updated weights for policy 0, policy_version 704 (0.0012) [2024-06-15 11:32:30,738][1648984] Fps is (10 sec: 52428.5, 60 sec: 26214.3, 300 sec: 20673.2). Total num frames: 1572864. Throughput: 0: 9557.3. Samples: 430080. Policy #0 lag: (min: 79.0, avg: 243.8, max: 397.0) [2024-06-15 11:32:35,738][1648984] Fps is (10 sec: 36044.8, 60 sec: 26214.3, 300 sec: 19398.4). Total num frames: 1572864. Throughput: 0: 10433.4. Samples: 470528. Policy #0 lag: (min: 79.0, avg: 243.8, max: 397.0) [2024-06-15 11:32:37,087][1652475] Updated weights for policy 0, policy_version 800 (0.0037) [2024-06-15 11:32:38,423][1651340] Signal inference workers to stop experience collection... 
(50 times) [2024-06-15 11:32:38,506][1652475] InferenceWorker_p0-w0: stopping experience collection (50 times) [2024-06-15 11:32:38,684][1651340] Signal inference workers to resume experience collection... (50 times) [2024-06-15 11:32:38,684][1652475] InferenceWorker_p0-w0: resuming experience collection (50 times) [2024-06-15 11:32:40,511][1652475] Updated weights for policy 0, policy_version 928 (0.0169) [2024-06-15 11:32:40,738][1648984] Fps is (10 sec: 32768.1, 60 sec: 31675.7, 300 sec: 22078.2). Total num frames: 1900544. Throughput: 0: 10342.4. Samples: 531968. Policy #0 lag: (min: 127.0, avg: 162.0, max: 335.0) [2024-06-15 11:32:42,287][1652475] Updated weights for policy 0, policy_version 996 (0.0040) [2024-06-15 11:32:45,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 34952.5, 300 sec: 23024.8). Total num frames: 2097152. Throughput: 0: 10313.8. Samples: 593408. Policy #0 lag: (min: 46.0, avg: 221.8, max: 318.0) [2024-06-15 11:32:49,582][1652475] Updated weights for policy 0, policy_version 1056 (0.0011) [2024-06-15 11:32:50,738][1648984] Fps is (10 sec: 32766.8, 60 sec: 37136.9, 300 sec: 23190.7). Total num frames: 2228224. Throughput: 0: 10683.6. Samples: 635392. Policy #0 lag: (min: 15.0, avg: 63.5, max: 271.0) [2024-06-15 11:32:51,610][1652475] Updated weights for policy 0, policy_version 1136 (0.0014) [2024-06-15 11:32:53,171][1652475] Updated weights for policy 0, policy_version 1200 (0.0095) [2024-06-15 11:32:54,340][1652475] Updated weights for policy 0, policy_version 1251 (0.0013) [2024-06-15 11:32:55,758][1648984] Fps is (10 sec: 52321.0, 60 sec: 43675.7, 300 sec: 25928.4). Total num frames: 2621440. Throughput: 0: 10383.2. Samples: 685056. Policy #0 lag: (min: 76.0, avg: 159.2, max: 303.0) [2024-06-15 11:32:55,763][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000001280_2621440.pth... [2024-06-15 11:33:00,738][1648984] Fps is (10 sec: 39322.9, 60 sec: 42069.2, 300 sec: 24711.4). Total num frames: 2621440. 
Throughput: 0: 10831.6. Samples: 771072. Policy #0 lag: (min: 76.0, avg: 159.2, max: 303.0) [2024-06-15 11:33:01,191][1652475] Updated weights for policy 0, policy_version 1296 (0.0037) [2024-06-15 11:33:02,679][1652475] Updated weights for policy 0, policy_version 1360 (0.0012) [2024-06-15 11:33:04,252][1652475] Updated weights for policy 0, policy_version 1427 (0.0013) [2024-06-15 11:33:05,738][1648984] Fps is (10 sec: 42686.2, 60 sec: 42598.4, 300 sec: 27433.9). Total num frames: 3047424. Throughput: 0: 10877.1. Samples: 798720. Policy #0 lag: (min: 150.0, avg: 179.5, max: 326.0) [2024-06-15 11:33:06,443][1652475] Updated weights for policy 0, policy_version 1520 (0.0014) [2024-06-15 11:33:10,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 43690.8, 300 sec: 27099.1). Total num frames: 3145728. Throughput: 0: 10547.2. Samples: 860672. Policy #0 lag: (min: 150.0, avg: 179.5, max: 326.0) [2024-06-15 11:33:12,868][1652475] Updated weights for policy 0, policy_version 1553 (0.0013) [2024-06-15 11:33:13,828][1652475] Updated weights for policy 0, policy_version 1605 (0.0030) [2024-06-15 11:33:15,426][1652475] Updated weights for policy 0, policy_version 1680 (0.0112) [2024-06-15 11:33:15,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 43690.6, 300 sec: 28415.7). Total num frames: 3440640. Throughput: 0: 11173.0. Samples: 932864. Policy #0 lag: (min: 15.0, avg: 66.7, max: 271.0) [2024-06-15 11:33:16,331][1651340] Signal inference workers to stop experience collection... (100 times) [2024-06-15 11:33:16,370][1652475] InferenceWorker_p0-w0: stopping experience collection (100 times) [2024-06-15 11:33:16,751][1651340] Signal inference workers to resume experience collection... (100 times) [2024-06-15 11:33:16,751][1652475] InferenceWorker_p0-w0: resuming experience collection (100 times) [2024-06-15 11:33:18,214][1652475] Updated weights for policy 0, policy_version 1785 (0.0017) [2024-06-15 11:33:20,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 43690.6, 300 sec: 29108.1). 
Total num frames: 3670016. Throughput: 0: 10661.0. Samples: 950272. Policy #0 lag: (min: 143.0, avg: 221.4, max: 431.0) [2024-06-15 11:33:24,779][1652475] Updated weights for policy 0, policy_version 1812 (0.0012) [2024-06-15 11:33:25,733][1652475] Updated weights for policy 0, policy_version 1859 (0.0031) [2024-06-15 11:33:25,738][1648984] Fps is (10 sec: 36045.0, 60 sec: 43144.6, 300 sec: 28997.7). Total num frames: 3801088. Throughput: 0: 11229.9. Samples: 1037312. Policy #0 lag: (min: 15.0, avg: 66.3, max: 271.0) [2024-06-15 11:33:27,104][1652475] Updated weights for policy 0, policy_version 1936 (0.0015) [2024-06-15 11:33:29,602][1652475] Updated weights for policy 0, policy_version 2022 (0.0099) [2024-06-15 11:33:30,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 43690.7, 300 sec: 30821.8). Total num frames: 4194304. Throughput: 0: 11002.3. Samples: 1088512. Policy #0 lag: (min: 159.0, avg: 223.7, max: 411.0) [2024-06-15 11:33:35,667][1652475] Updated weights for policy 0, policy_version 2087 (0.0101) [2024-06-15 11:33:35,740][1648984] Fps is (10 sec: 45875.1, 60 sec: 44782.9, 300 sec: 30194.0). Total num frames: 4259840. Throughput: 0: 11207.2. Samples: 1139712. Policy #0 lag: (min: 15.0, avg: 71.6, max: 271.0) [2024-06-15 11:33:37,340][1652475] Updated weights for policy 0, policy_version 2176 (0.0090) [2024-06-15 11:33:38,486][1652475] Updated weights for policy 0, policy_version 2228 (0.0018) [2024-06-15 11:33:40,738][1648984] Fps is (10 sec: 52428.1, 60 sec: 46967.3, 300 sec: 32300.9). Total num frames: 4718592. Throughput: 0: 11565.1. Samples: 1205248. Policy #0 lag: (min: 41.0, avg: 195.1, max: 319.0) [2024-06-15 11:33:45,631][1652475] Updated weights for policy 0, policy_version 2307 (0.0015) [2024-06-15 11:33:45,738][1648984] Fps is (10 sec: 45875.3, 60 sec: 43690.7, 300 sec: 31231.9). Total num frames: 4718592. Throughput: 0: 11639.5. Samples: 1294848. 
Policy #0 lag: (min: 41.0, avg: 195.1, max: 319.0) [2024-06-15 11:33:47,087][1652475] Updated weights for policy 0, policy_version 2387 (0.0014) [2024-06-15 11:33:49,228][1652475] Updated weights for policy 0, policy_version 2496 (0.0013) [2024-06-15 11:33:50,506][1652475] Updated weights for policy 0, policy_version 2544 (0.0013) [2024-06-15 11:33:50,742][1648984] Fps is (10 sec: 49130.8, 60 sec: 49694.7, 300 sec: 33379.6). Total num frames: 5210112. Throughput: 0: 11729.3. Samples: 1326592. Policy #0 lag: (min: 95.0, avg: 202.2, max: 287.0) [2024-06-15 11:33:55,738][1648984] Fps is (10 sec: 52428.3, 60 sec: 43705.6, 300 sec: 32547.8). Total num frames: 5242880. Throughput: 0: 12003.5. Samples: 1400832. Policy #0 lag: (min: 95.0, avg: 202.2, max: 287.0) [2024-06-15 11:33:56,468][1651340] Signal inference workers to stop experience collection... (150 times) [2024-06-15 11:33:56,518][1652475] InferenceWorker_p0-w0: stopping experience collection (150 times) [2024-06-15 11:33:56,743][1651340] Signal inference workers to resume experience collection... (150 times) [2024-06-15 11:33:56,745][1652475] InferenceWorker_p0-w0: resuming experience collection (150 times) [2024-06-15 11:33:56,893][1652475] Updated weights for policy 0, policy_version 2593 (0.0016) [2024-06-15 11:33:58,432][1652475] Updated weights for policy 0, policy_version 2672 (0.0017) [2024-06-15 11:34:00,112][1652475] Updated weights for policy 0, policy_version 2752 (0.0013) [2024-06-15 11:34:00,738][1648984] Fps is (10 sec: 45895.4, 60 sec: 50790.3, 300 sec: 34132.8). Total num frames: 5668864. Throughput: 0: 11832.9. Samples: 1465344. Policy #0 lag: (min: 93.0, avg: 206.9, max: 335.0) [2024-06-15 11:34:05,738][1648984] Fps is (10 sec: 52429.1, 60 sec: 45329.1, 300 sec: 33709.9). Total num frames: 5767168. Throughput: 0: 12128.7. Samples: 1496064. 
Policy #0 lag: (min: 93.0, avg: 206.9, max: 335.0) [2024-06-15 11:34:05,739][1648984] Avg episode reward: [(0, '-19.570')] [2024-06-15 11:34:05,740][1651340] Saving new best policy, reward=-19.570! [2024-06-15 11:34:08,262][1652475] Updated weights for policy 0, policy_version 2817 (0.0014) [2024-06-15 11:34:09,738][1652475] Updated weights for policy 0, policy_version 2884 (0.0014) [2024-06-15 11:34:10,738][1648984] Fps is (10 sec: 32766.5, 60 sec: 47513.2, 300 sec: 34055.2). Total num frames: 5996544. Throughput: 0: 11866.9. Samples: 1571328. Policy #0 lag: (min: 15.0, avg: 62.5, max: 271.0) [2024-06-15 11:34:10,739][1648984] Avg episode reward: [(0, '-19.570')] [2024-06-15 11:34:11,097][1652475] Updated weights for policy 0, policy_version 2946 (0.0015) [2024-06-15 11:34:12,435][1652475] Updated weights for policy 0, policy_version 3008 (0.0012) [2024-06-15 11:34:13,766][1652475] Updated weights for policy 0, policy_version 3070 (0.0013) [2024-06-15 11:34:15,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 47513.6, 300 sec: 34743.6). Total num frames: 6291456. Throughput: 0: 12060.4. Samples: 1631232. Policy #0 lag: (min: 93.0, avg: 196.7, max: 321.0) [2024-06-15 11:34:15,738][1648984] Avg episode reward: [(0, '-19.570')] [2024-06-15 11:34:20,704][1652475] Updated weights for policy 0, policy_version 3120 (0.0012) [2024-06-15 11:34:20,738][1648984] Fps is (10 sec: 39322.5, 60 sec: 45328.8, 300 sec: 34338.3). Total num frames: 6389760. Throughput: 0: 11878.3. Samples: 1674240. Policy #0 lag: (min: 15.0, avg: 67.3, max: 271.0) [2024-06-15 11:34:20,739][1648984] Avg episode reward: [(0, '-19.570')] [2024-06-15 11:34:22,672][1652475] Updated weights for policy 0, policy_version 3200 (0.0012) [2024-06-15 11:34:24,956][1652475] Updated weights for policy 0, policy_version 3296 (0.0016) [2024-06-15 11:34:25,739][1648984] Fps is (10 sec: 52428.5, 60 sec: 50244.2, 300 sec: 35669.1). Total num frames: 6815744. Throughput: 0: 11707.8. Samples: 1732096. 
Policy #0 lag: (min: 187.0, avg: 219.0, max: 381.0) [2024-06-15 11:34:25,740][1648984] Avg episode reward: [(0, '-19.570')] [2024-06-15 11:34:30,738][1648984] Fps is (10 sec: 42599.5, 60 sec: 43690.7, 300 sec: 34759.6). Total num frames: 6815744. Throughput: 0: 11343.6. Samples: 1805312. Policy #0 lag: (min: 187.0, avg: 219.0, max: 381.0) [2024-06-15 11:34:30,738][1648984] Avg episode reward: [(0, '-19.570')] [2024-06-15 11:34:32,340][1652475] Updated weights for policy 0, policy_version 3348 (0.0016) [2024-06-15 11:34:34,120][1651340] Signal inference workers to stop experience collection... (200 times) [2024-06-15 11:34:34,152][1652475] InferenceWorker_p0-w0: stopping experience collection (200 times) [2024-06-15 11:34:34,168][1652475] Updated weights for policy 0, policy_version 3425 (0.0013) [2024-06-15 11:34:34,421][1651340] Signal inference workers to resume experience collection... (200 times) [2024-06-15 11:34:34,421][1652475] InferenceWorker_p0-w0: resuming experience collection (200 times) [2024-06-15 11:34:35,738][1648984] Fps is (10 sec: 32768.4, 60 sec: 48059.8, 300 sec: 35524.9). Total num frames: 7143424. Throughput: 0: 11435.8. Samples: 1841152. Policy #0 lag: (min: 15.0, avg: 69.8, max: 271.0) [2024-06-15 11:34:35,738][1648984] Avg episode reward: [(0, '-19.570')] [2024-06-15 11:34:36,510][1652475] Updated weights for policy 0, policy_version 3528 (0.0014) [2024-06-15 11:34:37,673][1652475] Updated weights for policy 0, policy_version 3579 (0.0012) [2024-06-15 11:34:40,761][1648984] Fps is (10 sec: 52306.5, 60 sec: 43673.7, 300 sec: 35612.9). Total num frames: 7340032. Throughput: 0: 11019.4. Samples: 1896960. Policy #0 lag: (min: 127.0, avg: 219.8, max: 363.0) [2024-06-15 11:34:40,762][1648984] Avg episode reward: [(0, '-19.570')] [2024-06-15 11:34:44,563][1652475] Updated weights for policy 0, policy_version 3621 (0.0013) [2024-06-15 11:34:45,738][1648984] Fps is (10 sec: 36044.4, 60 sec: 46421.3, 300 sec: 35549.5). Total num frames: 7503872. 
Throughput: 0: 11207.1. Samples: 1969664. Policy #0 lag: (min: 15.0, avg: 63.4, max: 271.0) [2024-06-15 11:34:45,738][1648984] Avg episode reward: [(0, '-19.570')] [2024-06-15 11:34:47,154][1652475] Updated weights for policy 0, policy_version 3731 (0.0105) [2024-06-15 11:34:48,540][1652475] Updated weights for policy 0, policy_version 3795 (0.0012) [2024-06-15 11:34:49,394][1652475] Updated weights for policy 0, policy_version 3836 (0.0010) [2024-06-15 11:34:50,738][1648984] Fps is (10 sec: 52551.1, 60 sec: 44240.0, 300 sec: 36395.0). Total num frames: 7864320. Throughput: 0: 11047.8. Samples: 1993216. Policy #0 lag: (min: 111.0, avg: 219.8, max: 378.0) [2024-06-15 11:34:50,739][1648984] Avg episode reward: [(0, '-19.570')] [2024-06-15 11:34:55,738][1648984] Fps is (10 sec: 39320.6, 60 sec: 44236.6, 300 sec: 35720.1). Total num frames: 7897088. Throughput: 0: 11138.9. Samples: 2072576. Policy #0 lag: (min: 10.0, avg: 65.2, max: 266.0) [2024-06-15 11:34:55,738][1648984] Avg episode reward: [(0, '-19.570')] [2024-06-15 11:34:56,078][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000003872_7929856.pth... [2024-06-15 11:34:56,446][1652475] Updated weights for policy 0, policy_version 3888 (0.0012) [2024-06-15 11:34:57,927][1652475] Updated weights for policy 0, policy_version 3952 (0.0014) [2024-06-15 11:34:59,364][1652475] Updated weights for policy 0, policy_version 4008 (0.0013) [2024-06-15 11:35:00,738][1648984] Fps is (10 sec: 42599.2, 60 sec: 43690.8, 300 sec: 36669.4). Total num frames: 8290304. Throughput: 0: 11059.2. Samples: 2128896. Policy #0 lag: (min: 125.0, avg: 208.2, max: 338.0) [2024-06-15 11:35:00,738][1648984] Avg episode reward: [(0, '-19.570')] [2024-06-15 11:35:01,430][1652475] Updated weights for policy 0, policy_version 4091 (0.0017) [2024-06-15 11:35:05,738][1648984] Fps is (10 sec: 49154.1, 60 sec: 43690.8, 300 sec: 36301.4). Total num frames: 8388608. Throughput: 0: 10865.9. Samples: 2163200. 
Policy #0 lag: (min: 125.0, avg: 208.2, max: 338.0) [2024-06-15 11:35:05,738][1648984] Avg episode reward: [(0, '-19.570')] [2024-06-15 11:35:08,517][1652475] Updated weights for policy 0, policy_version 4154 (0.0013) [2024-06-15 11:35:09,955][1652475] Updated weights for policy 0, policy_version 4208 (0.0013) [2024-06-15 11:35:10,738][1648984] Fps is (10 sec: 36044.9, 60 sec: 44237.2, 300 sec: 36643.0). Total num frames: 8650752. Throughput: 0: 11195.8. Samples: 2235904. Policy #0 lag: (min: 15.0, avg: 70.9, max: 271.0) [2024-06-15 11:35:10,738][1648984] Avg episode reward: [(0, '-19.570')] [2024-06-15 11:35:11,750][1652475] Updated weights for policy 0, policy_version 4272 (0.0012) [2024-06-15 11:35:12,315][1651340] Signal inference workers to stop experience collection... (250 times) [2024-06-15 11:35:12,349][1652475] InferenceWorker_p0-w0: stopping experience collection (250 times) [2024-06-15 11:35:12,588][1651340] Signal inference workers to resume experience collection... (250 times) [2024-06-15 11:35:12,590][1652475] InferenceWorker_p0-w0: resuming experience collection (250 times) [2024-06-15 11:35:13,497][1652475] Updated weights for policy 0, policy_version 4348 (0.0013) [2024-06-15 11:35:15,738][1648984] Fps is (10 sec: 52426.7, 60 sec: 43690.5, 300 sec: 36970.3). Total num frames: 8912896. Throughput: 0: 10820.2. Samples: 2292224. Policy #0 lag: (min: 143.0, avg: 229.1, max: 394.0) [2024-06-15 11:35:15,739][1648984] Avg episode reward: [(0, '-19.570')] [2024-06-15 11:35:20,348][1652475] Updated weights for policy 0, policy_version 4416 (0.0014) [2024-06-15 11:35:20,738][1648984] Fps is (10 sec: 39321.3, 60 sec: 44237.0, 300 sec: 36751.8). Total num frames: 9043968. Throughput: 0: 10934.0. Samples: 2333184. 
Policy #0 lag: (min: 3.0, avg: 69.2, max: 259.0) [2024-06-15 11:35:20,738][1648984] Avg episode reward: [(0, '-19.570')] [2024-06-15 11:35:22,772][1652475] Updated weights for policy 0, policy_version 4480 (0.0071) [2024-06-15 11:35:24,752][1652475] Updated weights for policy 0, policy_version 4562 (0.0087) [2024-06-15 11:35:25,738][1648984] Fps is (10 sec: 52430.2, 60 sec: 43690.7, 300 sec: 37586.0). Total num frames: 9437184. Throughput: 0: 10814.5. Samples: 2383360. Policy #0 lag: (min: 95.0, avg: 197.0, max: 367.0) [2024-06-15 11:35:25,738][1648984] Avg episode reward: [(0, '-19.570')] [2024-06-15 11:35:30,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 43690.7, 300 sec: 36852.1). Total num frames: 9437184. Throughput: 0: 10865.8. Samples: 2458624. Policy #0 lag: (min: 95.0, avg: 197.0, max: 367.0) [2024-06-15 11:35:30,738][1648984] Avg episode reward: [(0, '-19.570')] [2024-06-15 11:35:31,515][1652475] Updated weights for policy 0, policy_version 4611 (0.0014) [2024-06-15 11:35:32,675][1652475] Updated weights for policy 0, policy_version 4668 (0.0014) [2024-06-15 11:35:35,491][1652475] Updated weights for policy 0, policy_version 4768 (0.0014) [2024-06-15 11:35:35,738][1648984] Fps is (10 sec: 32768.0, 60 sec: 43690.6, 300 sec: 37401.5). Total num frames: 9764864. Throughput: 0: 11150.2. Samples: 2494976. Policy #0 lag: (min: 0.0, avg: 60.2, max: 256.0) [2024-06-15 11:35:35,739][1648984] Avg episode reward: [(0, '-19.570')] [2024-06-15 11:35:37,068][1652475] Updated weights for policy 0, policy_version 4832 (0.0015) [2024-06-15 11:35:40,738][1648984] Fps is (10 sec: 52429.1, 60 sec: 43707.7, 300 sec: 37437.6). Total num frames: 9961472. Throughput: 0: 10501.8. Samples: 2545152. 
Policy #0 lag: (min: 143.0, avg: 230.1, max: 399.0) [2024-06-15 11:35:40,738][1648984] Avg episode reward: [(0, '-19.570')] [2024-06-15 11:35:44,613][1652475] Updated weights for policy 0, policy_version 4912 (0.0022) [2024-06-15 11:35:45,738][1648984] Fps is (10 sec: 32767.9, 60 sec: 43144.5, 300 sec: 37230.5). Total num frames: 10092544. Throughput: 0: 10934.0. Samples: 2620928. Policy #0 lag: (min: 9.0, avg: 70.6, max: 265.0) [2024-06-15 11:35:45,738][1648984] Avg episode reward: [(0, '-19.570')] [2024-06-15 11:35:47,299][1652475] Updated weights for policy 0, policy_version 4976 (0.0028) [2024-06-15 11:35:49,210][1652475] Updated weights for policy 0, policy_version 5056 (0.0013) [2024-06-15 11:35:50,738][1648984] Fps is (10 sec: 49152.1, 60 sec: 43144.7, 300 sec: 37861.9). Total num frames: 10452992. Throughput: 0: 10740.6. Samples: 2646528. Policy #0 lag: (min: 127.0, avg: 196.8, max: 335.0) [2024-06-15 11:35:50,738][1648984] Avg episode reward: [(0, '-19.570')] [2024-06-15 11:35:50,901][1652475] Updated weights for policy 0, policy_version 5120 (0.0016) [2024-06-15 11:35:55,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 43144.7, 300 sec: 37304.9). Total num frames: 10485760. Throughput: 0: 10672.3. Samples: 2716160. Policy #0 lag: (min: 127.0, avg: 196.8, max: 335.0) [2024-06-15 11:35:55,738][1648984] Avg episode reward: [(0, '-19.570')] [2024-06-15 11:35:58,889][1652475] Updated weights for policy 0, policy_version 5216 (0.0013) [2024-06-15 11:35:59,026][1651340] Signal inference workers to stop experience collection... (300 times) [2024-06-15 11:35:59,065][1652475] InferenceWorker_p0-w0: stopping experience collection (300 times) [2024-06-15 11:35:59,361][1651340] Signal inference workers to resume experience collection... (300 times) [2024-06-15 11:35:59,363][1652475] InferenceWorker_p0-w0: resuming experience collection (300 times) [2024-06-15 11:36:00,738][1648984] Fps is (10 sec: 32767.3, 60 sec: 41506.0, 300 sec: 37683.8). Total num frames: 10780672. 
Throughput: 0: 10717.9. Samples: 2774528. Policy #0 lag: (min: 15.0, avg: 91.1, max: 271.0) [2024-06-15 11:36:00,739][1648984] Avg episode reward: [(0, '-19.570')] [2024-06-15 11:36:01,021][1652475] Updated weights for policy 0, policy_version 5285 (0.0011) [2024-06-15 11:36:03,223][1652475] Updated weights for policy 0, policy_version 5366 (0.0015) [2024-06-15 11:36:05,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 43690.6, 300 sec: 37824.5). Total num frames: 11010048. Throughput: 0: 10217.2. Samples: 2792960. Policy #0 lag: (min: 127.0, avg: 215.4, max: 383.0) [2024-06-15 11:36:05,738][1648984] Avg episode reward: [(0, '-19.570')] [2024-06-15 11:36:09,410][1652475] Updated weights for policy 0, policy_version 5410 (0.0013) [2024-06-15 11:36:10,738][1648984] Fps is (10 sec: 36045.2, 60 sec: 41506.0, 300 sec: 37766.5). Total num frames: 11141120. Throughput: 0: 10854.4. Samples: 2871808. Policy #0 lag: (min: 6.0, avg: 71.6, max: 262.0) [2024-06-15 11:36:10,738][1648984] Avg episode reward: [(0, '-19.570')] [2024-06-15 11:36:11,364][1652475] Updated weights for policy 0, policy_version 5472 (0.0020) [2024-06-15 11:36:13,249][1652475] Updated weights for policy 0, policy_version 5537 (0.0013) [2024-06-15 11:36:14,893][1652475] Updated weights for policy 0, policy_version 5604 (0.0014) [2024-06-15 11:36:15,738][1648984] Fps is (10 sec: 52429.2, 60 sec: 43690.9, 300 sec: 39099.4). Total num frames: 11534336. Throughput: 0: 10331.0. Samples: 2923520. Policy #0 lag: (min: 159.0, avg: 246.0, max: 415.0) [2024-06-15 11:36:15,738][1648984] Avg episode reward: [(0, '-19.570')] [2024-06-15 11:36:20,738][1648984] Fps is (10 sec: 39320.6, 60 sec: 41505.9, 300 sec: 39099.4). Total num frames: 11534336. Throughput: 0: 10490.3. Samples: 2967040. Policy #0 lag: (min: 159.0, avg: 246.0, max: 415.0) [2024-06-15 11:36:20,739][1648984] Avg episode reward: [(0, '-6.400')] [2024-06-15 11:36:20,740][1651340] Saving new best policy, reward=-6.400! 
[2024-06-15 11:36:21,719][1652475] Updated weights for policy 0, policy_version 5634 (0.0012) [2024-06-15 11:36:22,759][1652475] Updated weights for policy 0, policy_version 5685 (0.0014) [2024-06-15 11:36:24,113][1652475] Updated weights for policy 0, policy_version 5730 (0.0011) [2024-06-15 11:36:25,738][1648984] Fps is (10 sec: 32768.0, 60 sec: 40413.9, 300 sec: 40210.2). Total num frames: 11862016. Throughput: 0: 10672.4. Samples: 3025408. Policy #0 lag: (min: 14.0, avg: 81.1, max: 270.0) [2024-06-15 11:36:25,738][1648984] Avg episode reward: [(0, '-5.640')] [2024-06-15 11:36:25,885][1652475] Updated weights for policy 0, policy_version 5808 (0.0012) [2024-06-15 11:36:26,368][1651340] Saving new best policy, reward=-5.640! [2024-06-15 11:36:28,073][1652475] Updated weights for policy 0, policy_version 5886 (0.0013) [2024-06-15 11:36:30,739][1648984] Fps is (10 sec: 52422.5, 60 sec: 43689.6, 300 sec: 40876.5). Total num frames: 12058624. Throughput: 0: 10205.5. Samples: 3080192. Policy #0 lag: (min: 143.0, avg: 257.7, max: 399.0) [2024-06-15 11:36:30,740][1648984] Avg episode reward: [(0, '-5.640')] [2024-06-15 11:36:34,521][1652475] Updated weights for policy 0, policy_version 5943 (0.0014) [2024-06-15 11:36:35,738][1648984] Fps is (10 sec: 32768.0, 60 sec: 40413.9, 300 sec: 41321.0). Total num frames: 12189696. Throughput: 0: 10547.2. Samples: 3121152. Policy #0 lag: (min: 1.0, avg: 66.1, max: 257.0) [2024-06-15 11:36:35,738][1648984] Avg episode reward: [(0, '-5.640')] [2024-06-15 11:36:37,468][1652475] Updated weights for policy 0, policy_version 6000 (0.0014) [2024-06-15 11:36:39,235][1652475] Updated weights for policy 0, policy_version 6065 (0.0012) [2024-06-15 11:36:40,066][1651340] Signal inference workers to stop experience collection... (350 times) [2024-06-15 11:36:40,110][1652475] InferenceWorker_p0-w0: stopping experience collection (350 times) [2024-06-15 11:36:40,293][1651340] Signal inference workers to resume experience collection... 
(350 times) [2024-06-15 11:36:40,294][1652475] InferenceWorker_p0-w0: resuming experience collection (350 times) [2024-06-15 11:36:40,738][1648984] Fps is (10 sec: 49159.4, 60 sec: 43144.5, 300 sec: 42542.9). Total num frames: 12550144. Throughput: 0: 10251.4. Samples: 3177472. Policy #0 lag: (min: 127.0, avg: 195.4, max: 335.0) [2024-06-15 11:36:40,741][1648984] Avg episode reward: [(0, '-5.640')] [2024-06-15 11:36:40,782][1652475] Updated weights for policy 0, policy_version 6132 (0.0014) [2024-06-15 11:36:45,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 41506.2, 300 sec: 42653.9). Total num frames: 12582912. Throughput: 0: 10592.8. Samples: 3251200. Policy #0 lag: (min: 127.0, avg: 195.4, max: 335.0) [2024-06-15 11:36:45,738][1648984] Avg episode reward: [(0, '-5.640')] [2024-06-15 11:36:46,312][1652475] Updated weights for policy 0, policy_version 6179 (0.0014) [2024-06-15 11:36:48,154][1652475] Updated weights for policy 0, policy_version 6227 (0.0013) [2024-06-15 11:36:50,182][1652475] Updated weights for policy 0, policy_version 6305 (0.0015) [2024-06-15 11:36:50,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 41506.1, 300 sec: 43875.8). Total num frames: 12943360. Throughput: 0: 10934.1. Samples: 3284992. Policy #0 lag: (min: 13.0, avg: 95.7, max: 269.0) [2024-06-15 11:36:50,738][1648984] Avg episode reward: [(0, '-5.640')] [2024-06-15 11:36:51,819][1652475] Updated weights for policy 0, policy_version 6371 (0.0015) [2024-06-15 11:36:55,738][1648984] Fps is (10 sec: 52427.9, 60 sec: 43690.6, 300 sec: 44101.5). Total num frames: 13107200. Throughput: 0: 10365.1. Samples: 3338240. Policy #0 lag: (min: 93.0, avg: 244.1, max: 430.0) [2024-06-15 11:36:55,738][1648984] Avg episode reward: [(0, '-5.640')] [2024-06-15 11:36:55,744][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000006400_13107200.pth... 
[2024-06-15 11:36:55,826][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000001280_2621440.pth [2024-06-15 11:36:58,291][1652475] Updated weights for policy 0, policy_version 6436 (0.0024) [2024-06-15 11:37:00,746][1648984] Fps is (10 sec: 36014.2, 60 sec: 42046.5, 300 sec: 43430.2). Total num frames: 13303808. Throughput: 0: 10829.6. Samples: 3410944. Policy #0 lag: (min: 31.0, avg: 105.8, max: 287.0) [2024-06-15 11:37:00,747][1648984] Avg episode reward: [(0, '-5.640')] [2024-06-15 11:37:00,926][1652475] Updated weights for policy 0, policy_version 6498 (0.0015) [2024-06-15 11:37:02,794][1652475] Updated weights for policy 0, policy_version 6584 (0.0015) [2024-06-15 11:37:04,126][1652475] Updated weights for policy 0, policy_version 6627 (0.0117) [2024-06-15 11:37:05,754][1648984] Fps is (10 sec: 52342.7, 60 sec: 43678.6, 300 sec: 44428.7). Total num frames: 13631488. Throughput: 0: 10395.5. Samples: 3435008. Policy #0 lag: (min: 47.0, avg: 203.8, max: 324.0) [2024-06-15 11:37:05,755][1648984] Avg episode reward: [(0, '-5.640')] [2024-06-15 11:37:09,892][1652475] Updated weights for policy 0, policy_version 6683 (0.0013) [2024-06-15 11:37:10,709][1652475] Updated weights for policy 0, policy_version 6720 (0.0013) [2024-06-15 11:37:10,738][1648984] Fps is (10 sec: 45914.0, 60 sec: 43690.7, 300 sec: 43875.8). Total num frames: 13762560. Throughput: 0: 10752.0. Samples: 3509248. Policy #0 lag: (min: 46.0, avg: 122.6, max: 302.0) [2024-06-15 11:37:10,738][1648984] Avg episode reward: [(0, '-5.640')] [2024-06-15 11:37:13,622][1652475] Updated weights for policy 0, policy_version 6788 (0.0013) [2024-06-15 11:37:15,738][1648984] Fps is (10 sec: 42669.5, 60 sec: 42052.3, 300 sec: 44098.0). Total num frames: 14057472. Throughput: 0: 10809.3. Samples: 3566592. 
Policy #0 lag: (min: 46.0, avg: 122.6, max: 302.0) [2024-06-15 11:37:15,738][1648984] Avg episode reward: [(0, '-5.640')] [2024-06-15 11:37:15,996][1652475] Updated weights for policy 0, policy_version 6868 (0.0012) [2024-06-15 11:37:17,129][1652475] Updated weights for policy 0, policy_version 6912 (0.0014) [2024-06-15 11:37:20,738][1648984] Fps is (10 sec: 39321.9, 60 sec: 43690.9, 300 sec: 43875.8). Total num frames: 14155776. Throughput: 0: 10592.7. Samples: 3597824. Policy #0 lag: (min: 79.0, avg: 220.3, max: 306.0) [2024-06-15 11:37:20,738][1648984] Avg episode reward: [(0, '-5.640')] [2024-06-15 11:37:24,168][1652475] Updated weights for policy 0, policy_version 6977 (0.0014) [2024-06-15 11:37:25,567][1651340] Signal inference workers to stop experience collection... (400 times) [2024-06-15 11:37:25,641][1652475] InferenceWorker_p0-w0: stopping experience collection (400 times) [2024-06-15 11:37:25,674][1651340] Signal inference workers to resume experience collection... (400 times) [2024-06-15 11:37:25,698][1652475] Updated weights for policy 0, policy_version 7040 (0.0013) [2024-06-15 11:37:25,719][1652475] InferenceWorker_p0-w0: resuming experience collection (400 times) [2024-06-15 11:37:25,738][1648984] Fps is (10 sec: 36044.7, 60 sec: 42598.4, 300 sec: 43542.6). Total num frames: 14417920. Throughput: 0: 11002.3. Samples: 3672576. Policy #0 lag: (min: 8.0, avg: 93.3, max: 264.0) [2024-06-15 11:37:25,738][1648984] Avg episode reward: [(0, '-5.640')] [2024-06-15 11:37:27,241][1652475] Updated weights for policy 0, policy_version 7097 (0.0079) [2024-06-15 11:37:28,822][1652475] Updated weights for policy 0, policy_version 7160 (0.0013) [2024-06-15 11:37:30,738][1648984] Fps is (10 sec: 52428.5, 60 sec: 43691.8, 300 sec: 44431.2). Total num frames: 14680064. Throughput: 0: 10626.8. Samples: 3729408. 
Policy #0 lag: (min: 8.0, avg: 93.3, max: 264.0) [2024-06-15 11:37:30,738][1648984] Avg episode reward: [(0, '-5.640')] [2024-06-15 11:37:35,738][1648984] Fps is (10 sec: 39320.4, 60 sec: 43690.4, 300 sec: 43764.7). Total num frames: 14811136. Throughput: 0: 10774.7. Samples: 3769856. Policy #0 lag: (min: 5.0, avg: 92.1, max: 261.0) [2024-06-15 11:37:35,739][1648984] Avg episode reward: [(0, '-5.640')] [2024-06-15 11:37:35,910][1652475] Updated weights for policy 0, policy_version 7237 (0.0129) [2024-06-15 11:37:37,205][1652475] Updated weights for policy 0, policy_version 7294 (0.0014) [2024-06-15 11:37:39,258][1652475] Updated weights for policy 0, policy_version 7351 (0.0020) [2024-06-15 11:37:40,642][1652475] Updated weights for policy 0, policy_version 7408 (0.0014) [2024-06-15 11:37:40,738][1648984] Fps is (10 sec: 49152.3, 60 sec: 43690.7, 300 sec: 44320.1). Total num frames: 15171584. Throughput: 0: 10922.7. Samples: 3829760. Policy #0 lag: (min: 33.0, avg: 182.3, max: 291.0) [2024-06-15 11:37:40,738][1648984] Avg episode reward: [(0, '-5.640')] [2024-06-15 11:37:45,548][1652475] Updated weights for policy 0, policy_version 7472 (0.0045) [2024-06-15 11:37:45,738][1648984] Fps is (10 sec: 49153.3, 60 sec: 45329.1, 300 sec: 44320.2). Total num frames: 15302656. Throughput: 0: 10970.2. Samples: 3904512. Policy #0 lag: (min: 33.0, avg: 182.3, max: 291.0) [2024-06-15 11:37:45,738][1648984] Avg episode reward: [(0, '-5.640')] [2024-06-15 11:37:47,969][1652475] Updated weights for policy 0, policy_version 7536 (0.0014) [2024-06-15 11:37:50,304][1652475] Updated weights for policy 0, policy_version 7586 (0.0012) [2024-06-15 11:37:50,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 43690.7, 300 sec: 43878.9). Total num frames: 15564800. Throughput: 0: 11143.0. Samples: 3936256. 
Policy #0 lag: (min: 15.0, avg: 117.8, max: 271.0) [2024-06-15 11:37:50,738][1648984] Avg episode reward: [(0, '-5.640')] [2024-06-15 11:37:52,258][1652475] Updated weights for policy 0, policy_version 7673 (0.0133) [2024-06-15 11:37:55,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 43690.8, 300 sec: 44431.2). Total num frames: 15728640. Throughput: 0: 10934.0. Samples: 4001280. Policy #0 lag: (min: 15.0, avg: 117.8, max: 271.0) [2024-06-15 11:37:55,741][1648984] Avg episode reward: [(0, '-5.640')] [2024-06-15 11:37:57,164][1652475] Updated weights for policy 0, policy_version 7712 (0.0015) [2024-06-15 11:37:57,854][1652475] Updated weights for policy 0, policy_version 7744 (0.0014) [2024-06-15 11:38:00,254][1652475] Updated weights for policy 0, policy_version 7806 (0.0015) [2024-06-15 11:38:00,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 44789.2, 300 sec: 43875.8). Total num frames: 15990784. Throughput: 0: 11218.5. Samples: 4071424. Policy #0 lag: (min: 15.0, avg: 104.5, max: 271.0) [2024-06-15 11:38:00,738][1648984] Avg episode reward: [(0, '-5.640')] [2024-06-15 11:38:02,693][1652475] Updated weights for policy 0, policy_version 7876 (0.0016) [2024-06-15 11:38:04,012][1652475] Updated weights for policy 0, policy_version 7932 (0.0014) [2024-06-15 11:38:05,738][1648984] Fps is (10 sec: 52429.1, 60 sec: 43702.8, 300 sec: 44431.2). Total num frames: 16252928. Throughput: 0: 11093.3. Samples: 4097024. Policy #0 lag: (min: 66.0, avg: 199.4, max: 322.0) [2024-06-15 11:38:05,738][1648984] Avg episode reward: [(0, '-5.640')] [2024-06-15 11:38:09,322][1652475] Updated weights for policy 0, policy_version 7993 (0.0012) [2024-06-15 11:38:10,738][1648984] Fps is (10 sec: 39320.9, 60 sec: 43690.5, 300 sec: 43875.8). Total num frames: 16384000. Throughput: 0: 11047.8. Samples: 4169728. 
Policy #0 lag: (min: 66.0, avg: 199.4, max: 322.0) [2024-06-15 11:38:10,739][1648984] Avg episode reward: [(0, '-5.640')] [2024-06-15 11:38:10,907][1651340] Signal inference workers to stop experience collection... (450 times) [2024-06-15 11:38:10,948][1652475] InferenceWorker_p0-w0: stopping experience collection (450 times) [2024-06-15 11:38:11,138][1651340] Signal inference workers to resume experience collection... (450 times) [2024-06-15 11:38:11,139][1652475] InferenceWorker_p0-w0: resuming experience collection (450 times) [2024-06-15 11:38:11,888][1652475] Updated weights for policy 0, policy_version 8057 (0.0014) [2024-06-15 11:38:13,893][1652475] Updated weights for policy 0, policy_version 8123 (0.0012) [2024-06-15 11:38:15,741][1648984] Fps is (10 sec: 49151.7, 60 sec: 44782.9, 300 sec: 44320.1). Total num frames: 16744448. Throughput: 0: 11116.1. Samples: 4229632. Policy #0 lag: (min: 15.0, avg: 109.8, max: 271.0) [2024-06-15 11:38:15,742][1648984] Avg episode reward: [(0, '-5.640')] [2024-06-15 11:38:15,849][1652475] Updated weights for policy 0, policy_version 8189 (0.0014) [2024-06-15 11:38:20,738][1648984] Fps is (10 sec: 42599.3, 60 sec: 44236.8, 300 sec: 44098.0). Total num frames: 16809984. Throughput: 0: 11070.6. Samples: 4268032. Policy #0 lag: (min: 15.0, avg: 96.4, max: 271.0) [2024-06-15 11:38:20,738][1648984] Avg episode reward: [(0, '-5.640')] [2024-06-15 11:38:21,338][1652475] Updated weights for policy 0, policy_version 8247 (0.0094) [2024-06-15 11:38:23,908][1652475] Updated weights for policy 0, policy_version 8312 (0.0014) [2024-06-15 11:38:24,922][1652475] Updated weights for policy 0, policy_version 8352 (0.0013) [2024-06-15 11:38:25,739][1648984] Fps is (10 sec: 39320.8, 60 sec: 45328.9, 300 sec: 43875.8). Total num frames: 17137664. Throughput: 0: 11343.6. Samples: 4340224. 
Policy #0 lag: (min: 15.0, avg: 96.4, max: 271.0) [2024-06-15 11:38:25,739][1648984] Avg episode reward: [(0, '-5.640')] [2024-06-15 11:38:27,042][1652475] Updated weights for policy 0, policy_version 8432 (0.0013) [2024-06-15 11:38:30,738][1648984] Fps is (10 sec: 49152.1, 60 sec: 43690.7, 300 sec: 44209.0). Total num frames: 17301504. Throughput: 0: 11173.0. Samples: 4407296. Policy #0 lag: (min: 15.0, avg: 96.4, max: 271.0) [2024-06-15 11:38:30,738][1648984] Avg episode reward: [(0, '-5.640')] [2024-06-15 11:38:32,469][1652475] Updated weights for policy 0, policy_version 8496 (0.0015) [2024-06-15 11:38:34,905][1652475] Updated weights for policy 0, policy_version 8544 (0.0018) [2024-06-15 11:38:35,738][1648984] Fps is (10 sec: 42599.5, 60 sec: 45875.4, 300 sec: 43542.6). Total num frames: 17563648. Throughput: 0: 11252.6. Samples: 4442624. Policy #0 lag: (min: 15.0, avg: 96.3, max: 271.0) [2024-06-15 11:38:35,738][1648984] Avg episode reward: [(0, '-4.200')] [2024-06-15 11:38:35,739][1651340] Saving new best policy, reward=-4.200! [2024-06-15 11:38:36,810][1652475] Updated weights for policy 0, policy_version 8615 (0.0017) [2024-06-15 11:38:38,430][1652475] Updated weights for policy 0, policy_version 8677 (0.0013) [2024-06-15 11:38:40,743][1648984] Fps is (10 sec: 52402.5, 60 sec: 44233.1, 300 sec: 44430.4). Total num frames: 17825792. Throughput: 0: 11092.1. Samples: 4500480. Policy #0 lag: (min: 15.0, avg: 96.3, max: 271.0) [2024-06-15 11:38:40,743][1648984] Avg episode reward: [(0, '-3.170')] [2024-06-15 11:38:40,784][1651340] Saving new best policy, reward=-3.170! [2024-06-15 11:38:44,145][1652475] Updated weights for policy 0, policy_version 8720 (0.0014) [2024-06-15 11:38:44,993][1652475] Updated weights for policy 0, policy_version 8759 (0.0015) [2024-06-15 11:38:45,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 44236.8, 300 sec: 43210.0). Total num frames: 17956864. Throughput: 0: 11013.7. Samples: 4567040. 
Policy #0 lag: (min: 15.0, avg: 94.9, max: 271.0) [2024-06-15 11:38:45,738][1648984] Avg episode reward: [(0, '-2.970')] [2024-06-15 11:38:45,739][1651340] Saving new best policy, reward=-2.970! [2024-06-15 11:38:48,048][1652475] Updated weights for policy 0, policy_version 8816 (0.0026) [2024-06-15 11:38:49,636][1652475] Updated weights for policy 0, policy_version 8864 (0.0015) [2024-06-15 11:38:50,738][1648984] Fps is (10 sec: 39340.2, 60 sec: 44236.6, 300 sec: 43986.9). Total num frames: 18219008. Throughput: 0: 11218.4. Samples: 4601856. Policy #0 lag: (min: 15.0, avg: 94.9, max: 271.0) [2024-06-15 11:38:50,738][1648984] Avg episode reward: [(0, '-2.970')] [2024-06-15 11:38:52,083][1652475] Updated weights for policy 0, policy_version 8957 (0.0121) [2024-06-15 11:38:55,738][1648984] Fps is (10 sec: 39321.3, 60 sec: 43690.7, 300 sec: 42987.2). Total num frames: 18350080. Throughput: 0: 10877.2. Samples: 4659200. Policy #0 lag: (min: 111.0, avg: 223.8, max: 351.0) [2024-06-15 11:38:55,738][1648984] Avg episode reward: [(0, '-2.970')] [2024-06-15 11:38:55,885][1651340] Signal inference workers to stop experience collection... (500 times) [2024-06-15 11:38:55,943][1652475] InferenceWorker_p0-w0: stopping experience collection (500 times) [2024-06-15 11:38:56,205][1651340] Signal inference workers to resume experience collection... (500 times) [2024-06-15 11:38:56,206][1652475] InferenceWorker_p0-w0: resuming experience collection (500 times) [2024-06-15 11:38:56,207][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000008976_18382848.pth... [2024-06-15 11:38:56,391][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000003872_7929856.pth [2024-06-15 11:38:57,220][1652475] Updated weights for policy 0, policy_version 9024 (0.0014) [2024-06-15 11:39:00,738][1648984] Fps is (10 sec: 36045.5, 60 sec: 43144.5, 300 sec: 43431.5). Total num frames: 18579456. Throughput: 0: 11093.3. Samples: 4728832. 
Policy #0 lag: (min: 15.0, avg: 100.3, max: 271.0) [2024-06-15 11:39:00,739][1648984] Avg episode reward: [(0, '-2.970')] [2024-06-15 11:39:00,939][1652475] Updated weights for policy 0, policy_version 9080 (0.0056) [2024-06-15 11:39:02,855][1652475] Updated weights for policy 0, policy_version 9152 (0.0013) [2024-06-15 11:39:05,738][1648984] Fps is (10 sec: 52428.5, 60 sec: 43690.6, 300 sec: 43653.7). Total num frames: 18874368. Throughput: 0: 10752.0. Samples: 4751872. Policy #0 lag: (min: 15.0, avg: 100.3, max: 271.0) [2024-06-15 11:39:05,738][1648984] Avg episode reward: [(0, '-2.970')] [2024-06-15 11:39:08,635][1652475] Updated weights for policy 0, policy_version 9248 (0.0014) [2024-06-15 11:39:10,738][1648984] Fps is (10 sec: 42598.6, 60 sec: 43690.8, 300 sec: 43098.3). Total num frames: 19005440. Throughput: 0: 10797.6. Samples: 4826112. Policy #0 lag: (min: 15.0, avg: 110.4, max: 271.0) [2024-06-15 11:39:10,738][1648984] Avg episode reward: [(0, '-2.970')] [2024-06-15 11:39:11,334][1652475] Updated weights for policy 0, policy_version 9281 (0.0014) [2024-06-15 11:39:13,097][1652475] Updated weights for policy 0, policy_version 9350 (0.0015) [2024-06-15 11:39:15,617][1652475] Updated weights for policy 0, policy_version 9463 (0.0013) [2024-06-15 11:39:15,738][1648984] Fps is (10 sec: 49152.5, 60 sec: 43690.7, 300 sec: 43986.9). Total num frames: 19365888. Throughput: 0: 10570.0. Samples: 4882944. Policy #0 lag: (min: 15.0, avg: 110.4, max: 271.0) [2024-06-15 11:39:15,738][1648984] Avg episode reward: [(0, '-2.970')] [2024-06-15 11:39:20,738][1648984] Fps is (10 sec: 45872.0, 60 sec: 44236.3, 300 sec: 42876.0). Total num frames: 19464192. Throughput: 0: 10706.3. Samples: 4924416. 
Policy #0 lag: (min: 15.0, avg: 105.6, max: 271.0) [2024-06-15 11:39:20,739][1648984] Avg episode reward: [(0, '-2.970')] [2024-06-15 11:39:20,886][1652475] Updated weights for policy 0, policy_version 9520 (0.0146) [2024-06-15 11:39:24,265][1652475] Updated weights for policy 0, policy_version 9595 (0.0017) [2024-06-15 11:39:25,738][1648984] Fps is (10 sec: 36044.9, 60 sec: 43144.7, 300 sec: 43764.7). Total num frames: 19726336. Throughput: 0: 10912.5. Samples: 4991488. Policy #0 lag: (min: 15.0, avg: 105.6, max: 271.0) [2024-06-15 11:39:25,738][1648984] Avg episode reward: [(0, '-2.970')] [2024-06-15 11:39:26,708][1652475] Updated weights for policy 0, policy_version 9680 (0.0012) [2024-06-15 11:39:28,035][1652475] Updated weights for policy 0, policy_version 9728 (0.0011) [2024-06-15 11:39:30,738][1648984] Fps is (10 sec: 45878.5, 60 sec: 43690.7, 300 sec: 43320.4). Total num frames: 19922944. Throughput: 0: 10934.0. Samples: 5059072. Policy #0 lag: (min: 15.0, avg: 105.6, max: 271.0) [2024-06-15 11:39:30,738][1648984] Avg episode reward: [(0, '-2.970')] [2024-06-15 11:39:32,649][1652475] Updated weights for policy 0, policy_version 9781 (0.0013) [2024-06-15 11:39:35,583][1652475] Updated weights for policy 0, policy_version 9840 (0.0013) [2024-06-15 11:39:35,738][1648984] Fps is (10 sec: 42598.5, 60 sec: 43144.6, 300 sec: 43434.9). Total num frames: 20152320. Throughput: 0: 10934.1. Samples: 5093888. Policy #0 lag: (min: 2.0, avg: 90.7, max: 258.0) [2024-06-15 11:39:35,738][1648984] Avg episode reward: [(0, '-2.970')] [2024-06-15 11:39:37,456][1652475] Updated weights for policy 0, policy_version 9889 (0.0014) [2024-06-15 11:39:38,243][1651340] Signal inference workers to stop experience collection... (550 times) [2024-06-15 11:39:38,277][1652475] InferenceWorker_p0-w0: stopping experience collection (550 times) [2024-06-15 11:39:38,563][1651340] Signal inference workers to resume experience collection... 
(550 times) [2024-06-15 11:39:38,565][1652475] InferenceWorker_p0-w0: resuming experience collection (550 times) [2024-06-15 11:39:39,197][1652475] Updated weights for policy 0, policy_version 9958 (0.0011) [2024-06-15 11:39:40,738][1648984] Fps is (10 sec: 52428.2, 60 sec: 43694.2, 300 sec: 43875.8). Total num frames: 20447232. Throughput: 0: 10979.5. Samples: 5153280. Policy #0 lag: (min: 2.0, avg: 90.7, max: 258.0) [2024-06-15 11:39:40,739][1648984] Avg episode reward: [(0, '-2.970')] [2024-06-15 11:39:43,588][1652475] Updated weights for policy 0, policy_version 10000 (0.0014) [2024-06-15 11:39:45,738][1648984] Fps is (10 sec: 42597.8, 60 sec: 43690.6, 300 sec: 43098.3). Total num frames: 20578304. Throughput: 0: 11150.2. Samples: 5230592. Policy #0 lag: (min: 15.0, avg: 103.1, max: 271.0) [2024-06-15 11:39:45,739][1648984] Avg episode reward: [(0, '-2.970')] [2024-06-15 11:39:46,003][1652475] Updated weights for policy 0, policy_version 10051 (0.0013) [2024-06-15 11:39:47,046][1652475] Updated weights for policy 0, policy_version 10107 (0.0076) [2024-06-15 11:39:48,858][1652475] Updated weights for policy 0, policy_version 10161 (0.0021) [2024-06-15 11:39:50,560][1652475] Updated weights for policy 0, policy_version 10231 (0.0012) [2024-06-15 11:39:50,738][1648984] Fps is (10 sec: 49152.8, 60 sec: 45329.3, 300 sec: 44209.1). Total num frames: 20938752. Throughput: 0: 11423.3. Samples: 5265920. Policy #0 lag: (min: 15.0, avg: 103.1, max: 271.0) [2024-06-15 11:39:50,738][1648984] Avg episode reward: [(0, '-2.970')] [2024-06-15 11:39:55,467][1652475] Updated weights for policy 0, policy_version 10298 (0.0014) [2024-06-15 11:39:55,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 45875.1, 300 sec: 43431.5). Total num frames: 21102592. Throughput: 0: 11411.9. Samples: 5339648. 
Policy #0 lag: (min: 2.0, avg: 88.1, max: 258.0) [2024-06-15 11:39:55,738][1648984] Avg episode reward: [(0, '-2.970')] [2024-06-15 11:39:58,293][1652475] Updated weights for policy 0, policy_version 10352 (0.0013) [2024-06-15 11:39:59,803][1652475] Updated weights for policy 0, policy_version 10386 (0.0014) [2024-06-15 11:40:00,738][1648984] Fps is (10 sec: 39321.1, 60 sec: 45875.2, 300 sec: 43875.8). Total num frames: 21331968. Throughput: 0: 11605.3. Samples: 5405184. Policy #0 lag: (min: 2.0, avg: 88.1, max: 258.0) [2024-06-15 11:40:00,738][1648984] Avg episode reward: [(0, '-2.970')] [2024-06-15 11:40:01,050][1652475] Updated weights for policy 0, policy_version 10434 (0.0011) [2024-06-15 11:40:02,359][1652475] Updated weights for policy 0, policy_version 10493 (0.0013) [2024-06-15 11:40:05,738][1648984] Fps is (10 sec: 39322.4, 60 sec: 43690.8, 300 sec: 43542.6). Total num frames: 21495808. Throughput: 0: 11423.5. Samples: 5438464. Policy #0 lag: (min: 2.0, avg: 88.1, max: 258.0) [2024-06-15 11:40:05,738][1648984] Avg episode reward: [(0, '-2.970')] [2024-06-15 11:40:06,539][1652475] Updated weights for policy 0, policy_version 10533 (0.0042) [2024-06-15 11:40:08,975][1652475] Updated weights for policy 0, policy_version 10577 (0.0014) [2024-06-15 11:40:10,738][1648984] Fps is (10 sec: 42598.6, 60 sec: 45875.2, 300 sec: 43542.6). Total num frames: 21757952. Throughput: 0: 11491.5. Samples: 5508608. Policy #0 lag: (min: 15.0, avg: 115.8, max: 271.0) [2024-06-15 11:40:10,738][1648984] Avg episode reward: [(0, '-2.970')] [2024-06-15 11:40:11,127][1652475] Updated weights for policy 0, policy_version 10629 (0.0014) [2024-06-15 11:40:13,356][1652475] Updated weights for policy 0, policy_version 10706 (0.0054) [2024-06-15 11:40:14,422][1652475] Updated weights for policy 0, policy_version 10752 (0.0013) [2024-06-15 11:40:15,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 44236.8, 300 sec: 43986.9). Total num frames: 22020096. Throughput: 0: 11377.8. 
Samples: 5571072. Policy #0 lag: (min: 15.0, avg: 115.8, max: 271.0) [2024-06-15 11:40:15,738][1648984] Avg episode reward: [(0, '-2.970')] [2024-06-15 11:40:18,358][1652475] Updated weights for policy 0, policy_version 10814 (0.0013) [2024-06-15 11:40:20,738][1648984] Fps is (10 sec: 45875.4, 60 sec: 45875.7, 300 sec: 43320.4). Total num frames: 22216704. Throughput: 0: 11377.8. Samples: 5605888. Policy #0 lag: (min: 15.0, avg: 123.8, max: 271.0) [2024-06-15 11:40:20,738][1648984] Avg episode reward: [(0, '-2.970')] [2024-06-15 11:40:23,104][1652475] Updated weights for policy 0, policy_version 10884 (0.0016) [2024-06-15 11:40:23,806][1651340] Signal inference workers to stop experience collection... (600 times) [2024-06-15 11:40:23,868][1652475] InferenceWorker_p0-w0: stopping experience collection (600 times) [2024-06-15 11:40:24,069][1651340] Signal inference workers to resume experience collection... (600 times) [2024-06-15 11:40:24,070][1652475] InferenceWorker_p0-w0: resuming experience collection (600 times) [2024-06-15 11:40:24,760][1652475] Updated weights for policy 0, policy_version 10947 (0.0012) [2024-06-15 11:40:25,738][1648984] Fps is (10 sec: 45875.0, 60 sec: 45875.2, 300 sec: 44209.0). Total num frames: 22478848. Throughput: 0: 11559.8. Samples: 5673472. Policy #0 lag: (min: 15.0, avg: 123.8, max: 271.0) [2024-06-15 11:40:25,738][1648984] Avg episode reward: [(0, '-2.970')] [2024-06-15 11:40:28,894][1652475] Updated weights for policy 0, policy_version 11011 (0.0012) [2024-06-15 11:40:30,738][1648984] Fps is (10 sec: 45875.5, 60 sec: 45875.2, 300 sec: 43764.7). Total num frames: 22675456. Throughput: 0: 11207.2. Samples: 5734912. 
Policy #0 lag: (min: 127.0, avg: 231.1, max: 383.0) [2024-06-15 11:40:30,738][1648984] Avg episode reward: [(0, '-2.970')] [2024-06-15 11:40:32,718][1652475] Updated weights for policy 0, policy_version 11097 (0.0088) [2024-06-15 11:40:33,523][1652475] Updated weights for policy 0, policy_version 11136 (0.0033) [2024-06-15 11:40:35,738][1648984] Fps is (10 sec: 36044.6, 60 sec: 44782.8, 300 sec: 43653.6). Total num frames: 22839296. Throughput: 0: 11195.7. Samples: 5769728. Policy #0 lag: (min: 15.0, avg: 114.3, max: 271.0) [2024-06-15 11:40:35,738][1648984] Avg episode reward: [(0, '-2.970')] [2024-06-15 11:40:36,416][1652475] Updated weights for policy 0, policy_version 11185 (0.0022) [2024-06-15 11:40:37,754][1652475] Updated weights for policy 0, policy_version 11248 (0.0013) [2024-06-15 11:40:40,738][1648984] Fps is (10 sec: 39319.7, 60 sec: 43690.4, 300 sec: 43986.8). Total num frames: 23068672. Throughput: 0: 11081.9. Samples: 5838336. Policy #0 lag: (min: 15.0, avg: 114.3, max: 271.0) [2024-06-15 11:40:40,739][1648984] Avg episode reward: [(0, '-2.970')] [2024-06-15 11:40:41,459][1652475] Updated weights for policy 0, policy_version 11312 (0.0018) [2024-06-15 11:40:44,469][1652475] Updated weights for policy 0, policy_version 11362 (0.0014) [2024-06-15 11:40:45,738][1648984] Fps is (10 sec: 49152.1, 60 sec: 45875.2, 300 sec: 43653.6). Total num frames: 23330816. Throughput: 0: 11173.0. Samples: 5907968. Policy #0 lag: (min: 32.0, avg: 141.2, max: 288.0) [2024-06-15 11:40:45,739][1648984] Avg episode reward: [(0, '-2.970')] [2024-06-15 11:40:46,960][1652475] Updated weights for policy 0, policy_version 11408 (0.0013) [2024-06-15 11:40:49,271][1652475] Updated weights for policy 0, policy_version 11504 (0.0013) [2024-06-15 11:40:50,738][1648984] Fps is (10 sec: 52430.5, 60 sec: 44236.7, 300 sec: 44431.2). Total num frames: 23592960. Throughput: 0: 11116.1. Samples: 5938688. 
Policy #0 lag: (min: 32.0, avg: 141.2, max: 288.0) [2024-06-15 11:40:50,738][1648984] Avg episode reward: [(0, '-2.270')] [2024-06-15 11:40:50,739][1651340] Saving new best policy, reward=-2.270! [2024-06-15 11:40:53,664][1652475] Updated weights for policy 0, policy_version 11556 (0.0014) [2024-06-15 11:40:55,738][1648984] Fps is (10 sec: 39320.6, 60 sec: 43690.6, 300 sec: 43875.8). Total num frames: 23724032. Throughput: 0: 11127.4. Samples: 6009344. Policy #0 lag: (min: 20.0, avg: 130.1, max: 276.0) [2024-06-15 11:40:55,739][1648984] Avg episode reward: [(0, '-2.250')] [2024-06-15 11:40:56,045][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000011600_23756800.pth... [2024-06-15 11:40:56,212][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000006400_13107200.pth [2024-06-15 11:40:56,220][1651340] Saving new best policy, reward=-2.250! [2024-06-15 11:40:56,967][1652475] Updated weights for policy 0, policy_version 11635 (0.0013) [2024-06-15 11:41:00,738][1648984] Fps is (10 sec: 32768.4, 60 sec: 43144.6, 300 sec: 43764.7). Total num frames: 23920640. Throughput: 0: 10922.7. Samples: 6062592. Policy #0 lag: (min: 20.0, avg: 130.1, max: 276.0) [2024-06-15 11:41:00,738][1648984] Avg episode reward: [(0, '-1.910')] [2024-06-15 11:41:01,394][1651340] Saving new best policy, reward=-1.910! [2024-06-15 11:41:01,980][1652475] Updated weights for policy 0, policy_version 11731 (0.0014) [2024-06-15 11:41:05,156][1652475] Updated weights for policy 0, policy_version 11777 (0.0013) [2024-06-15 11:41:05,740][1648984] Fps is (10 sec: 45866.0, 60 sec: 44781.2, 300 sec: 44208.7). Total num frames: 24182784. Throughput: 0: 10671.8. Samples: 6086144. Policy #0 lag: (min: 111.0, avg: 217.3, max: 367.0) [2024-06-15 11:41:05,740][1648984] Avg episode reward: [(0, '-1.810')] [2024-06-15 11:41:06,133][1651340] Saving new best policy, reward=-1.810! 
[2024-06-15 11:41:08,776][1652475] Updated weights for policy 0, policy_version 11863 (0.0015) [2024-06-15 11:41:09,791][1652475] Updated weights for policy 0, policy_version 11904 (0.0027) [2024-06-15 11:41:10,738][1648984] Fps is (10 sec: 45874.4, 60 sec: 43690.6, 300 sec: 43542.5). Total num frames: 24379392. Throughput: 0: 10729.2. Samples: 6156288. Policy #0 lag: (min: 111.0, avg: 217.3, max: 367.0) [2024-06-15 11:41:10,738][1648984] Avg episode reward: [(0, '-1.810')] [2024-06-15 11:41:11,686][1651340] Signal inference workers to stop experience collection... (650 times) [2024-06-15 11:41:11,736][1652475] InferenceWorker_p0-w0: stopping experience collection (650 times) [2024-06-15 11:41:11,915][1651340] Signal inference workers to resume experience collection... (650 times) [2024-06-15 11:41:11,916][1652475] InferenceWorker_p0-w0: resuming experience collection (650 times) [2024-06-15 11:41:13,109][1652475] Updated weights for policy 0, policy_version 11959 (0.0014) [2024-06-15 11:41:14,809][1652475] Updated weights for policy 0, policy_version 12024 (0.0116) [2024-06-15 11:41:15,738][1648984] Fps is (10 sec: 45885.3, 60 sec: 43690.6, 300 sec: 44431.2). Total num frames: 24641536. Throughput: 0: 10752.0. Samples: 6218752. Policy #0 lag: (min: 31.0, avg: 130.0, max: 287.0) [2024-06-15 11:41:15,739][1648984] Avg episode reward: [(0, '-1.810')] [2024-06-15 11:41:17,869][1652475] Updated weights for policy 0, policy_version 12068 (0.0012) [2024-06-15 11:41:20,630][1652475] Updated weights for policy 0, policy_version 12144 (0.0016) [2024-06-15 11:41:20,738][1648984] Fps is (10 sec: 49152.6, 60 sec: 44236.8, 300 sec: 44097.9). Total num frames: 24870912. Throughput: 0: 10831.7. Samples: 6257152. 
Policy #0 lag: (min: 31.0, avg: 130.0, max: 287.0) [2024-06-15 11:41:20,738][1648984] Avg episode reward: [(0, '-1.810')] [2024-06-15 11:41:24,881][1652475] Updated weights for policy 0, policy_version 12217 (0.0020) [2024-06-15 11:41:25,741][1648984] Fps is (10 sec: 39309.4, 60 sec: 42596.1, 300 sec: 43986.6). Total num frames: 25034752. Throughput: 0: 10774.1. Samples: 6323200. Policy #0 lag: (min: 15.0, avg: 115.8, max: 271.0) [2024-06-15 11:41:25,742][1648984] Avg episode reward: [(0, '-1.810')] [2024-06-15 11:41:29,599][1652475] Updated weights for policy 0, policy_version 12289 (0.0014) [2024-06-15 11:41:30,738][1648984] Fps is (10 sec: 42598.2, 60 sec: 43690.6, 300 sec: 44431.2). Total num frames: 25296896. Throughput: 0: 10706.5. Samples: 6389760. Policy #0 lag: (min: 15.0, avg: 115.8, max: 271.0) [2024-06-15 11:41:30,739][1648984] Avg episode reward: [(0, '-1.810')] [2024-06-15 11:41:31,635][1652475] Updated weights for policy 0, policy_version 12358 (0.0014) [2024-06-15 11:41:32,741][1652475] Updated weights for policy 0, policy_version 12416 (0.0013) [2024-06-15 11:41:35,738][1648984] Fps is (10 sec: 45889.3, 60 sec: 44236.8, 300 sec: 43875.8). Total num frames: 25493504. Throughput: 0: 10740.6. Samples: 6422016. Policy #0 lag: (min: 15.0, avg: 123.5, max: 271.0) [2024-06-15 11:41:35,738][1648984] Avg episode reward: [(0, '-1.810')] [2024-06-15 11:41:36,097][1652475] Updated weights for policy 0, policy_version 12469 (0.0014) [2024-06-15 11:41:38,240][1652475] Updated weights for policy 0, policy_version 12528 (0.0012) [2024-06-15 11:41:40,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 43690.9, 300 sec: 44431.2). Total num frames: 25690112. Throughput: 0: 10626.9. Samples: 6487552. 
Policy #0 lag: (min: 15.0, avg: 123.5, max: 271.0) [2024-06-15 11:41:40,738][1648984] Avg episode reward: [(0, '-1.810')] [2024-06-15 11:41:41,641][1652475] Updated weights for policy 0, policy_version 12581 (0.0015) [2024-06-15 11:41:43,519][1652475] Updated weights for policy 0, policy_version 12646 (0.0014) [2024-06-15 11:41:45,738][1648984] Fps is (10 sec: 45875.4, 60 sec: 43690.7, 300 sec: 44097.9). Total num frames: 25952256. Throughput: 0: 11070.6. Samples: 6560768. Policy #0 lag: (min: 15.0, avg: 123.5, max: 271.0) [2024-06-15 11:41:45,739][1648984] Avg episode reward: [(0, '-1.810')] [2024-06-15 11:41:46,995][1652475] Updated weights for policy 0, policy_version 12692 (0.0013) [2024-06-15 11:41:49,463][1652475] Updated weights for policy 0, policy_version 12768 (0.0013) [2024-06-15 11:41:50,740][1648984] Fps is (10 sec: 52418.2, 60 sec: 43689.2, 300 sec: 44430.9). Total num frames: 26214400. Throughput: 0: 11309.6. Samples: 6595072. Policy #0 lag: (min: 15.0, avg: 122.6, max: 271.0) [2024-06-15 11:41:50,740][1648984] Avg episode reward: [(0, '-1.810')] [2024-06-15 11:41:52,593][1652475] Updated weights for policy 0, policy_version 12816 (0.0013) [2024-06-15 11:41:54,197][1652475] Updated weights for policy 0, policy_version 12865 (0.0022) [2024-06-15 11:41:55,261][1652475] Updated weights for policy 0, policy_version 12922 (0.0095) [2024-06-15 11:41:55,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 45875.4, 300 sec: 44654.6). Total num frames: 26476544. Throughput: 0: 11286.8. Samples: 6664192. Policy #0 lag: (min: 15.0, avg: 122.6, max: 271.0) [2024-06-15 11:41:55,738][1648984] Avg episode reward: [(0, '-1.810')] [2024-06-15 11:41:58,427][1651340] Signal inference workers to stop experience collection... (700 times) [2024-06-15 11:41:58,469][1652475] InferenceWorker_p0-w0: stopping experience collection (700 times) [2024-06-15 11:41:58,738][1651340] Signal inference workers to resume experience collection... 
(700 times) [2024-06-15 11:41:58,738][1652475] InferenceWorker_p0-w0: resuming experience collection (700 times) [2024-06-15 11:41:59,158][1652475] Updated weights for policy 0, policy_version 12976 (0.0013) [2024-06-15 11:42:00,738][1648984] Fps is (10 sec: 42607.2, 60 sec: 45329.0, 300 sec: 44100.4). Total num frames: 26640384. Throughput: 0: 11400.5. Samples: 6731776. Policy #0 lag: (min: 15.0, avg: 118.9, max: 271.0) [2024-06-15 11:42:00,738][1648984] Avg episode reward: [(0, '-1.810')] [2024-06-15 11:42:01,388][1652475] Updated weights for policy 0, policy_version 13040 (0.0014) [2024-06-15 11:42:05,311][1652475] Updated weights for policy 0, policy_version 13092 (0.0013) [2024-06-15 11:42:05,738][1648984] Fps is (10 sec: 39320.8, 60 sec: 44784.5, 300 sec: 44431.2). Total num frames: 26869760. Throughput: 0: 11264.0. Samples: 6764032. Policy #0 lag: (min: 15.0, avg: 118.9, max: 271.0) [2024-06-15 11:42:05,739][1648984] Avg episode reward: [(0, '-1.810')] [2024-06-15 11:42:06,887][1652475] Updated weights for policy 0, policy_version 13183 (0.0013) [2024-06-15 11:42:10,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 44783.0, 300 sec: 44097.9). Total num frames: 27066368. Throughput: 0: 11310.3. Samples: 6832128. Policy #0 lag: (min: 15.0, avg: 119.4, max: 271.0) [2024-06-15 11:42:10,738][1648984] Avg episode reward: [(0, '-1.810')] [2024-06-15 11:42:11,133][1652475] Updated weights for policy 0, policy_version 13240 (0.0013) [2024-06-15 11:42:13,456][1652475] Updated weights for policy 0, policy_version 13303 (0.0104) [2024-06-15 11:42:15,738][1648984] Fps is (10 sec: 39322.4, 60 sec: 43690.7, 300 sec: 44431.2). Total num frames: 27262976. Throughput: 0: 11298.2. Samples: 6898176. 
Policy #0 lag: (min: 15.0, avg: 119.4, max: 271.0) [2024-06-15 11:42:15,738][1648984] Avg episode reward: [(0, '-1.810')] [2024-06-15 11:42:16,533][1652475] Updated weights for policy 0, policy_version 13347 (0.0012) [2024-06-15 11:42:17,888][1652475] Updated weights for policy 0, policy_version 13392 (0.0013) [2024-06-15 11:42:20,739][1648984] Fps is (10 sec: 45867.2, 60 sec: 44235.5, 300 sec: 44430.9). Total num frames: 27525120. Throughput: 0: 11331.9. Samples: 6931968. Policy #0 lag: (min: 15.0, avg: 119.4, max: 271.0) [2024-06-15 11:42:20,740][1648984] Avg episode reward: [(0, '-1.810')] [2024-06-15 11:42:21,445][1652475] Updated weights for policy 0, policy_version 13441 (0.0012) [2024-06-15 11:42:22,996][1652475] Updated weights for policy 0, policy_version 13501 (0.0012) [2024-06-15 11:42:24,870][1652475] Updated weights for policy 0, policy_version 13552 (0.0012) [2024-06-15 11:42:25,738][1648984] Fps is (10 sec: 52428.4, 60 sec: 45877.6, 300 sec: 44431.2). Total num frames: 27787264. Throughput: 0: 11434.7. Samples: 7002112. Policy #0 lag: (min: 15.0, avg: 123.0, max: 271.0) [2024-06-15 11:42:25,738][1648984] Avg episode reward: [(0, '-1.810')] [2024-06-15 11:42:28,417][1652475] Updated weights for policy 0, policy_version 13631 (0.0116) [2024-06-15 11:42:29,974][1652475] Updated weights for policy 0, policy_version 13696 (0.0014) [2024-06-15 11:42:30,738][1648984] Fps is (10 sec: 52437.7, 60 sec: 45875.2, 300 sec: 44875.5). Total num frames: 28049408. Throughput: 0: 11218.5. Samples: 7065600. Policy #0 lag: (min: 15.0, avg: 123.0, max: 271.0) [2024-06-15 11:42:30,738][1648984] Avg episode reward: [(0, '-1.810')] [2024-06-15 11:42:34,905][1652475] Updated weights for policy 0, policy_version 13759 (0.0015) [2024-06-15 11:42:35,738][1648984] Fps is (10 sec: 39321.9, 60 sec: 44783.0, 300 sec: 44097.9). Total num frames: 28180480. Throughput: 0: 11275.9. Samples: 7102464. 
Policy #0 lag: (min: 15.0, avg: 117.5, max: 271.0) [2024-06-15 11:42:35,738][1648984] Avg episode reward: [(0, '-1.810')] [2024-06-15 11:42:36,863][1652475] Updated weights for policy 0, policy_version 13816 (0.0012) [2024-06-15 11:42:39,522][1652475] Updated weights for policy 0, policy_version 13884 (0.0015) [2024-06-15 11:42:40,746][1648984] Fps is (10 sec: 45875.1, 60 sec: 46967.5, 300 sec: 44764.4). Total num frames: 28508160. Throughput: 0: 11332.2. Samples: 7174144. Policy #0 lag: (min: 15.0, avg: 117.5, max: 271.0) [2024-06-15 11:42:40,747][1648984] Avg episode reward: [(0, '-1.810')] [2024-06-15 11:42:40,867][1652475] Updated weights for policy 0, policy_version 13936 (0.0031) [2024-06-15 11:42:45,738][1648984] Fps is (10 sec: 49151.5, 60 sec: 45329.0, 300 sec: 44431.2). Total num frames: 28672000. Throughput: 0: 11332.3. Samples: 7241728. Policy #0 lag: (min: 2.0, avg: 100.7, max: 258.0) [2024-06-15 11:42:45,738][1648984] Avg episode reward: [(0, '-1.810')] [2024-06-15 11:42:45,975][1652475] Updated weights for policy 0, policy_version 14015 (0.0015) [2024-06-15 11:42:47,064][1651340] Signal inference workers to stop experience collection... (750 times) [2024-06-15 11:42:47,162][1652475] InferenceWorker_p0-w0: stopping experience collection (750 times) [2024-06-15 11:42:47,350][1651340] Signal inference workers to resume experience collection... (750 times) [2024-06-15 11:42:47,351][1652475] InferenceWorker_p0-w0: resuming experience collection (750 times) [2024-06-15 11:42:48,618][1652475] Updated weights for policy 0, policy_version 14078 (0.0012) [2024-06-15 11:42:50,745][1648984] Fps is (10 sec: 39321.7, 60 sec: 44784.4, 300 sec: 44653.3). Total num frames: 28901376. Throughput: 0: 11252.7. Samples: 7270400. 
Policy #0 lag: (min: 2.0, avg: 100.7, max: 258.0) [2024-06-15 11:42:50,745][1648984] Avg episode reward: [(0, '-1.810')] [2024-06-15 11:42:52,225][1652475] Updated weights for policy 0, policy_version 14160 (0.0015) [2024-06-15 11:42:55,738][1648984] Fps is (10 sec: 42598.6, 60 sec: 43690.7, 300 sec: 44431.2). Total num frames: 29097984. Throughput: 0: 11229.9. Samples: 7337472. Policy #0 lag: (min: 2.0, avg: 100.7, max: 258.0) [2024-06-15 11:42:55,738][1648984] Avg episode reward: [(0, '-1.810')] [2024-06-15 11:42:55,745][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000014208_29097984.pth... [2024-06-15 11:42:55,821][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000008976_18382848.pth [2024-06-15 11:42:57,206][1652475] Updated weights for policy 0, policy_version 14230 (0.0015) [2024-06-15 11:42:58,723][1652475] Updated weights for policy 0, policy_version 14273 (0.0049) [2024-06-15 11:43:00,200][1652475] Updated weights for policy 0, policy_version 14334 (0.0015) [2024-06-15 11:43:00,738][1648984] Fps is (10 sec: 45875.6, 60 sec: 45329.1, 300 sec: 44431.2). Total num frames: 29360128. Throughput: 0: 11241.2. Samples: 7404032. Policy #0 lag: (min: 15.0, avg: 110.0, max: 271.0) [2024-06-15 11:43:00,738][1648984] Avg episode reward: [(0, '-1.810')] [2024-06-15 11:43:03,431][1652475] Updated weights for policy 0, policy_version 14401 (0.0021) [2024-06-15 11:43:04,552][1652475] Updated weights for policy 0, policy_version 14464 (0.0054) [2024-06-15 11:43:05,738][1648984] Fps is (10 sec: 52428.5, 60 sec: 45875.3, 300 sec: 44875.5). Total num frames: 29622272. Throughput: 0: 11275.8. Samples: 7439360. Policy #0 lag: (min: 15.0, avg: 110.0, max: 271.0) [2024-06-15 11:43:05,739][1648984] Avg episode reward: [(0, '-1.640')] [2024-06-15 11:43:05,740][1651340] Saving new best policy, reward=-1.640! 
[2024-06-15 11:43:09,930][1652475] Updated weights for policy 0, policy_version 14520 (0.0013) [2024-06-15 11:43:10,738][1648984] Fps is (10 sec: 42598.5, 60 sec: 45329.1, 300 sec: 44209.0). Total num frames: 29786112. Throughput: 0: 11082.0. Samples: 7500800. Policy #0 lag: (min: 15.0, avg: 101.4, max: 271.0) [2024-06-15 11:43:10,738][1648984] Avg episode reward: [(0, '-1.500')] [2024-06-15 11:43:11,136][1651340] Saving new best policy, reward=-1.500! [2024-06-15 11:43:11,138][1652475] Updated weights for policy 0, policy_version 14560 (0.0012) [2024-06-15 11:43:15,738][1648984] Fps is (10 sec: 26214.7, 60 sec: 43690.7, 300 sec: 44320.1). Total num frames: 29884416. Throughput: 0: 10991.0. Samples: 7560192. Policy #0 lag: (min: 15.0, avg: 101.4, max: 271.0) [2024-06-15 11:43:15,738][1648984] Avg episode reward: [(0, '-1.510')] [2024-06-15 11:43:16,409][1652475] Updated weights for policy 0, policy_version 14624 (0.0014) [2024-06-15 11:43:18,546][1652475] Updated weights for policy 0, policy_version 14709 (0.0118) [2024-06-15 11:43:20,738][1648984] Fps is (10 sec: 36044.3, 60 sec: 43691.9, 300 sec: 44098.0). Total num frames: 30146560. Throughput: 0: 10683.7. Samples: 7583232. Policy #0 lag: (min: 15.0, avg: 101.4, max: 271.0) [2024-06-15 11:43:20,739][1648984] Avg episode reward: [(0, '-1.390')] [2024-06-15 11:43:20,739][1651340] Saving new best policy, reward=-1.390! [2024-06-15 11:43:22,735][1652475] Updated weights for policy 0, policy_version 14740 (0.0014) [2024-06-15 11:43:25,383][1652475] Updated weights for policy 0, policy_version 14832 (0.0016) [2024-06-15 11:43:25,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 43690.7, 300 sec: 44431.2). Total num frames: 30408704. Throughput: 0: 10626.9. Samples: 7652352. 
Policy #0 lag: (min: 15.0, avg: 112.4, max: 271.0) [2024-06-15 11:43:25,738][1648984] Avg episode reward: [(0, '-1.390')] [2024-06-15 11:43:29,249][1652475] Updated weights for policy 0, policy_version 14897 (0.0015) [2024-06-15 11:43:30,738][1648984] Fps is (10 sec: 45875.8, 60 sec: 42598.5, 300 sec: 44209.0). Total num frames: 30605312. Throughput: 0: 10467.6. Samples: 7712768. Policy #0 lag: (min: 15.0, avg: 112.4, max: 271.0) [2024-06-15 11:43:30,738][1648984] Avg episode reward: [(0, '-1.390')] [2024-06-15 11:43:30,755][1652475] Updated weights for policy 0, policy_version 14960 (0.0013) [2024-06-15 11:43:35,154][1651340] Signal inference workers to stop experience collection... (800 times) [2024-06-15 11:43:35,197][1652475] InferenceWorker_p0-w0: stopping experience collection (800 times) [2024-06-15 11:43:35,305][1651340] Signal inference workers to resume experience collection... (800 times) [2024-06-15 11:43:35,305][1652475] InferenceWorker_p0-w0: resuming experience collection (800 times) [2024-06-15 11:43:35,469][1652475] Updated weights for policy 0, policy_version 15032 (0.0012) [2024-06-15 11:43:35,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 43690.7, 300 sec: 43987.6). Total num frames: 30801920. Throughput: 0: 10615.5. Samples: 7748096. Policy #0 lag: (min: 9.0, avg: 103.8, max: 265.0) [2024-06-15 11:43:35,738][1648984] Avg episode reward: [(0, '-1.390')] [2024-06-15 11:43:37,458][1652475] Updated weights for policy 0, policy_version 15097 (0.0013) [2024-06-15 11:43:40,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 41506.2, 300 sec: 44209.0). Total num frames: 30998528. Throughput: 0: 10729.2. Samples: 7820288. 
Policy #0 lag: (min: 9.0, avg: 103.8, max: 265.0) [2024-06-15 11:43:40,738][1648984] Avg episode reward: [(0, '-1.390')] [2024-06-15 11:43:40,841][1652475] Updated weights for policy 0, policy_version 15152 (0.0014) [2024-06-15 11:43:42,517][1652475] Updated weights for policy 0, policy_version 15216 (0.0015) [2024-06-15 11:43:45,740][1648984] Fps is (10 sec: 39314.6, 60 sec: 42051.1, 300 sec: 43986.6). Total num frames: 31195136. Throughput: 0: 10694.7. Samples: 7885312. Policy #0 lag: (min: 9.0, avg: 103.8, max: 265.0) [2024-06-15 11:43:45,740][1648984] Avg episode reward: [(0, '-1.390')] [2024-06-15 11:43:46,699][1652475] Updated weights for policy 0, policy_version 15280 (0.0109) [2024-06-15 11:43:48,170][1652475] Updated weights for policy 0, policy_version 15298 (0.0027) [2024-06-15 11:43:49,295][1652475] Updated weights for policy 0, policy_version 15352 (0.0031) [2024-06-15 11:43:50,738][1648984] Fps is (10 sec: 45875.4, 60 sec: 42598.5, 300 sec: 44431.2). Total num frames: 31457280. Throughput: 0: 10683.8. Samples: 7920128. Policy #0 lag: (min: 7.0, avg: 113.2, max: 263.0) [2024-06-15 11:43:50,738][1648984] Avg episode reward: [(0, '-1.390')] [2024-06-15 11:43:52,583][1652475] Updated weights for policy 0, policy_version 15424 (0.0013) [2024-06-15 11:43:54,028][1652475] Updated weights for policy 0, policy_version 15481 (0.0012) [2024-06-15 11:43:55,738][1648984] Fps is (10 sec: 52437.6, 60 sec: 43690.6, 300 sec: 44542.3). Total num frames: 31719424. Throughput: 0: 10695.1. Samples: 7982080. Policy #0 lag: (min: 7.0, avg: 113.2, max: 263.0) [2024-06-15 11:43:55,738][1648984] Avg episode reward: [(0, '-1.390')] [2024-06-15 11:43:58,111][1652475] Updated weights for policy 0, policy_version 15541 (0.0012) [2024-06-15 11:44:00,374][1652475] Updated weights for policy 0, policy_version 15575 (0.0021) [2024-06-15 11:44:00,738][1648984] Fps is (10 sec: 45875.2, 60 sec: 42598.4, 300 sec: 44209.0). Total num frames: 31916032. Throughput: 0: 11025.1. 
Samples: 8056320. Policy #0 lag: (min: 9.0, avg: 113.6, max: 265.0) [2024-06-15 11:44:00,738][1648984] Avg episode reward: [(0, '-1.390')] [2024-06-15 11:44:02,875][1652475] Updated weights for policy 0, policy_version 15632 (0.0013) [2024-06-15 11:44:04,828][1652475] Updated weights for policy 0, policy_version 15702 (0.0013) [2024-06-15 11:44:05,746][1648984] Fps is (10 sec: 52428.9, 60 sec: 43690.7, 300 sec: 44875.5). Total num frames: 32243712. Throughput: 0: 11275.4. Samples: 8090624. Policy #0 lag: (min: 9.0, avg: 113.6, max: 265.0) [2024-06-15 11:44:05,747][1648984] Avg episode reward: [(0, '-1.390')] [2024-06-15 11:44:08,875][1652475] Updated weights for policy 0, policy_version 15745 (0.0013) [2024-06-15 11:44:10,097][1652475] Updated weights for policy 0, policy_version 15800 (0.0014) [2024-06-15 11:44:10,738][1648984] Fps is (10 sec: 45875.1, 60 sec: 43144.5, 300 sec: 44098.0). Total num frames: 32374784. Throughput: 0: 11195.7. Samples: 8156160. Policy #0 lag: (min: 15.0, avg: 115.0, max: 271.0) [2024-06-15 11:44:10,738][1648984] Avg episode reward: [(0, '-1.390')] [2024-06-15 11:44:12,917][1652475] Updated weights for policy 0, policy_version 15870 (0.0146) [2024-06-15 11:44:15,342][1652475] Updated weights for policy 0, policy_version 15920 (0.0016) [2024-06-15 11:44:15,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 45875.2, 300 sec: 44653.4). Total num frames: 32636928. Throughput: 0: 11332.3. Samples: 8222720. Policy #0 lag: (min: 15.0, avg: 115.0, max: 271.0) [2024-06-15 11:44:15,745][1648984] Avg episode reward: [(0, '-1.390')] [2024-06-15 11:44:16,913][1652475] Updated weights for policy 0, policy_version 15990 (0.0013) [2024-06-15 11:44:20,366][1651340] Signal inference workers to stop experience collection... (850 times) [2024-06-15 11:44:20,444][1652475] InferenceWorker_p0-w0: stopping experience collection (850 times) [2024-06-15 11:44:20,530][1651340] Signal inference workers to resume experience collection... 
(850 times) [2024-06-15 11:44:20,531][1652475] InferenceWorker_p0-w0: resuming experience collection (850 times) [2024-06-15 11:44:20,669][1652475] Updated weights for policy 0, policy_version 16018 (0.0011) [2024-06-15 11:44:20,738][1648984] Fps is (10 sec: 42598.5, 60 sec: 44236.9, 300 sec: 44320.1). Total num frames: 32800768. Throughput: 0: 11264.0. Samples: 8254976. Policy #0 lag: (min: 3.0, avg: 106.5, max: 259.0) [2024-06-15 11:44:20,738][1648984] Avg episode reward: [(0, '-1.390')] [2024-06-15 11:44:23,890][1652475] Updated weights for policy 0, policy_version 16080 (0.0012) [2024-06-15 11:44:25,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 43690.6, 300 sec: 44431.2). Total num frames: 33030144. Throughput: 0: 11241.2. Samples: 8326144. Policy #0 lag: (min: 3.0, avg: 106.5, max: 259.0) [2024-06-15 11:44:25,738][1648984] Avg episode reward: [(0, '-1.390')] [2024-06-15 11:44:26,372][1652475] Updated weights for policy 0, policy_version 16144 (0.0014) [2024-06-15 11:44:27,971][1652475] Updated weights for policy 0, policy_version 16212 (0.0012) [2024-06-15 11:44:30,738][1648984] Fps is (10 sec: 49151.9, 60 sec: 44782.9, 300 sec: 44542.3). Total num frames: 33292288. Throughput: 0: 11173.4. Samples: 8388096. Policy #0 lag: (min: 3.0, avg: 106.5, max: 259.0) [2024-06-15 11:44:30,738][1648984] Avg episode reward: [(0, '-1.390')] [2024-06-15 11:44:32,188][1652475] Updated weights for policy 0, policy_version 16259 (0.0107) [2024-06-15 11:44:35,738][1648984] Fps is (10 sec: 42598.7, 60 sec: 44236.8, 300 sec: 44098.0). Total num frames: 33456128. Throughput: 0: 11173.0. Samples: 8422912. 
Policy #0 lag: (min: 15.0, avg: 121.0, max: 271.0) [2024-06-15 11:44:35,738][1648984] Avg episode reward: [(0, '-1.390')] [2024-06-15 11:44:35,902][1652475] Updated weights for policy 0, policy_version 16352 (0.0042) [2024-06-15 11:44:37,959][1652475] Updated weights for policy 0, policy_version 16416 (0.0014) [2024-06-15 11:44:39,899][1652475] Updated weights for policy 0, policy_version 16483 (0.0012) [2024-06-15 11:44:40,738][1648984] Fps is (10 sec: 52428.5, 60 sec: 46967.4, 300 sec: 44875.5). Total num frames: 33816576. Throughput: 0: 11286.8. Samples: 8489984. Policy #0 lag: (min: 15.0, avg: 121.0, max: 271.0) [2024-06-15 11:44:40,738][1648984] Avg episode reward: [(0, '-1.390')] [2024-06-15 11:44:44,245][1652475] Updated weights for policy 0, policy_version 16544 (0.0012) [2024-06-15 11:44:45,738][1648984] Fps is (10 sec: 49152.3, 60 sec: 45876.6, 300 sec: 44098.0). Total num frames: 33947648. Throughput: 0: 11150.2. Samples: 8558080. Policy #0 lag: (min: 31.0, avg: 134.4, max: 287.0) [2024-06-15 11:44:45,738][1648984] Avg episode reward: [(0, '-1.390')] [2024-06-15 11:44:47,379][1652475] Updated weights for policy 0, policy_version 16593 (0.0012) [2024-06-15 11:44:49,482][1652475] Updated weights for policy 0, policy_version 16656 (0.0095) [2024-06-15 11:44:50,400][1652475] Updated weights for policy 0, policy_version 16704 (0.0015) [2024-06-15 11:44:50,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 45875.1, 300 sec: 44431.2). Total num frames: 34209792. Throughput: 0: 11138.8. Samples: 8591872. Policy #0 lag: (min: 31.0, avg: 134.4, max: 287.0) [2024-06-15 11:44:50,739][1648984] Avg episode reward: [(0, '-1.390')] [2024-06-15 11:44:52,113][1652475] Updated weights for policy 0, policy_version 16763 (0.0015) [2024-06-15 11:44:55,739][1648984] Fps is (10 sec: 42597.9, 60 sec: 44236.8, 300 sec: 44209.0). Total num frames: 34373632. Throughput: 0: 11161.6. Samples: 8658432. 
Policy #0 lag: (min: 31.0, avg: 134.4, max: 287.0) [2024-06-15 11:44:55,740][1648984] Avg episode reward: [(0, '-1.390')] [2024-06-15 11:44:56,519][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000016816_34439168.pth... [2024-06-15 11:44:56,576][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000011600_23756800.pth [2024-06-15 11:44:56,717][1652475] Updated weights for policy 0, policy_version 16826 (0.0013) [2024-06-15 11:44:59,822][1652475] Updated weights for policy 0, policy_version 16865 (0.0014) [2024-06-15 11:45:00,738][1648984] Fps is (10 sec: 39321.9, 60 sec: 44782.9, 300 sec: 44431.2). Total num frames: 34603008. Throughput: 0: 11059.2. Samples: 8720384. Policy #0 lag: (min: 47.0, avg: 147.4, max: 303.0) [2024-06-15 11:45:00,738][1648984] Avg episode reward: [(0, '-1.390')] [2024-06-15 11:45:01,747][1652475] Updated weights for policy 0, policy_version 16912 (0.0019) [2024-06-15 11:45:03,651][1652475] Updated weights for policy 0, policy_version 16976 (0.0112) [2024-06-15 11:45:05,738][1648984] Fps is (10 sec: 49152.2, 60 sec: 43690.7, 300 sec: 44431.2). Total num frames: 34865152. Throughput: 0: 11104.7. Samples: 8754688. Policy #0 lag: (min: 47.0, avg: 147.4, max: 303.0) [2024-06-15 11:45:05,738][1648984] Avg episode reward: [(0, '-1.390')] [2024-06-15 11:45:07,844][1651340] Signal inference workers to stop experience collection... (900 times) [2024-06-15 11:45:07,879][1652475] InferenceWorker_p0-w0: stopping experience collection (900 times) [2024-06-15 11:45:08,089][1651340] Signal inference workers to resume experience collection... (900 times) [2024-06-15 11:45:08,090][1652475] InferenceWorker_p0-w0: resuming experience collection (900 times) [2024-06-15 11:45:08,448][1652475] Updated weights for policy 0, policy_version 17056 (0.0012) [2024-06-15 11:45:10,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 43690.7, 300 sec: 43986.9). Total num frames: 34996224. 
Throughput: 0: 10945.4. Samples: 8818688. Policy #0 lag: (min: 47.0, avg: 151.6, max: 303.0) [2024-06-15 11:45:10,738][1648984] Avg episode reward: [(0, '-1.390')] [2024-06-15 11:45:11,323][1652475] Updated weights for policy 0, policy_version 17104 (0.0014) [2024-06-15 11:45:13,788][1652475] Updated weights for policy 0, policy_version 17156 (0.0013) [2024-06-15 11:45:15,393][1652475] Updated weights for policy 0, policy_version 17219 (0.0104) [2024-06-15 11:45:15,738][1648984] Fps is (10 sec: 42598.1, 60 sec: 44236.8, 300 sec: 44320.1). Total num frames: 35291136. Throughput: 0: 10968.2. Samples: 8881664. Policy #0 lag: (min: 47.0, avg: 151.6, max: 303.0) [2024-06-15 11:45:15,739][1648984] Avg episode reward: [(0, '-1.390')] [2024-06-15 11:45:20,613][1652475] Updated weights for policy 0, policy_version 17328 (0.0018) [2024-06-15 11:45:20,738][1648984] Fps is (10 sec: 49152.0, 60 sec: 44782.9, 300 sec: 44098.0). Total num frames: 35487744. Throughput: 0: 10922.7. Samples: 8914432. Policy #0 lag: (min: 47.0, avg: 151.6, max: 303.0) [2024-06-15 11:45:20,738][1648984] Avg episode reward: [(0, '-1.390')] [2024-06-15 11:45:23,647][1652475] Updated weights for policy 0, policy_version 17366 (0.0014) [2024-06-15 11:45:25,738][1648984] Fps is (10 sec: 36044.9, 60 sec: 43690.7, 300 sec: 43986.9). Total num frames: 35651584. Throughput: 0: 10911.3. Samples: 8980992. Policy #0 lag: (min: 15.0, avg: 105.0, max: 271.0) [2024-06-15 11:45:25,738][1648984] Avg episode reward: [(0, '-1.180')] [2024-06-15 11:45:25,744][1651340] Saving new best policy, reward=-1.180! [2024-06-15 11:45:27,683][1652475] Updated weights for policy 0, policy_version 17424 (0.0015) [2024-06-15 11:45:29,732][1652475] Updated weights for policy 0, policy_version 17520 (0.0014) [2024-06-15 11:45:30,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 43690.7, 300 sec: 44320.1). Total num frames: 35913728. Throughput: 0: 10683.7. Samples: 9038848. 
Policy #0 lag: (min: 15.0, avg: 105.0, max: 271.0) [2024-06-15 11:45:30,738][1648984] Avg episode reward: [(0, '-1.430')] [2024-06-15 11:45:32,794][1652475] Updated weights for policy 0, policy_version 17555 (0.0011) [2024-06-15 11:45:33,763][1652475] Updated weights for policy 0, policy_version 17597 (0.0026) [2024-06-15 11:45:35,738][1648984] Fps is (10 sec: 39320.4, 60 sec: 43144.3, 300 sec: 43986.9). Total num frames: 36044800. Throughput: 0: 10672.3. Samples: 9072128. Policy #0 lag: (min: 15.0, avg: 105.0, max: 271.0) [2024-06-15 11:45:35,739][1648984] Avg episode reward: [(0, '-1.260')] [2024-06-15 11:45:37,286][1652475] Updated weights for policy 0, policy_version 17661 (0.0013) [2024-06-15 11:45:39,851][1652475] Updated weights for policy 0, policy_version 17728 (0.0011) [2024-06-15 11:45:40,738][1648984] Fps is (10 sec: 45875.1, 60 sec: 42598.4, 300 sec: 44209.0). Total num frames: 36372480. Throughput: 0: 10786.1. Samples: 9143808. Policy #0 lag: (min: 15.0, avg: 137.2, max: 271.0) [2024-06-15 11:45:40,738][1648984] Avg episode reward: [(0, '-1.340')] [2024-06-15 11:45:43,393][1652475] Updated weights for policy 0, policy_version 17797 (0.0015) [2024-06-15 11:45:44,835][1652475] Updated weights for policy 0, policy_version 17855 (0.0013) [2024-06-15 11:45:45,738][1648984] Fps is (10 sec: 52430.3, 60 sec: 43690.6, 300 sec: 43986.9). Total num frames: 36569088. Throughput: 0: 10786.1. Samples: 9205760. Policy #0 lag: (min: 15.0, avg: 137.2, max: 271.0) [2024-06-15 11:45:45,738][1648984] Avg episode reward: [(0, '-1.340')] [2024-06-15 11:45:49,049][1652475] Updated weights for policy 0, policy_version 17912 (0.0012) [2024-06-15 11:45:50,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 42598.4, 300 sec: 44209.1). Total num frames: 36765696. Throughput: 0: 10934.0. Samples: 9246720. 
Policy #0 lag: (min: 15.0, avg: 115.7, max: 271.0) [2024-06-15 11:45:50,738][1648984] Avg episode reward: [(0, '-1.340')] [2024-06-15 11:45:50,992][1652475] Updated weights for policy 0, policy_version 17968 (0.0013) [2024-06-15 11:45:51,705][1651340] Signal inference workers to stop experience collection... (950 times) [2024-06-15 11:45:51,786][1652475] InferenceWorker_p0-w0: stopping experience collection (950 times) [2024-06-15 11:45:51,896][1651340] Signal inference workers to resume experience collection... (950 times) [2024-06-15 11:45:51,906][1652475] InferenceWorker_p0-w0: resuming experience collection (950 times) [2024-06-15 11:45:52,488][1652475] Updated weights for policy 0, policy_version 18047 (0.0017) [2024-06-15 11:45:55,738][1648984] Fps is (10 sec: 45875.4, 60 sec: 44236.8, 300 sec: 44431.2). Total num frames: 37027840. Throughput: 0: 10990.9. Samples: 9313280. Policy #0 lag: (min: 15.0, avg: 115.7, max: 271.0) [2024-06-15 11:45:55,740][1648984] Avg episode reward: [(0, '-1.340')] [2024-06-15 11:45:56,242][1652475] Updated weights for policy 0, policy_version 18111 (0.0017) [2024-06-15 11:46:00,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 42598.4, 300 sec: 43987.2). Total num frames: 37158912. Throughput: 0: 11127.5. Samples: 9382400. Policy #0 lag: (min: 15.0, avg: 115.7, max: 271.0) [2024-06-15 11:46:00,738][1648984] Avg episode reward: [(0, '-1.340')] [2024-06-15 11:46:01,039][1652475] Updated weights for policy 0, policy_version 18169 (0.0013) [2024-06-15 11:46:03,014][1652475] Updated weights for policy 0, policy_version 18225 (0.0012) [2024-06-15 11:46:04,537][1652475] Updated weights for policy 0, policy_version 18300 (0.0014) [2024-06-15 11:46:05,738][1648984] Fps is (10 sec: 45874.9, 60 sec: 43690.6, 300 sec: 44431.2). Total num frames: 37486592. Throughput: 0: 11070.6. Samples: 9412608. 
Policy #0 lag: (min: 51.0, avg: 145.9, max: 307.0) [2024-06-15 11:46:05,738][1648984] Avg episode reward: [(0, '-1.340')] [2024-06-15 11:46:08,261][1652475] Updated weights for policy 0, policy_version 18365 (0.0015) [2024-06-15 11:46:10,738][1648984] Fps is (10 sec: 45875.1, 60 sec: 43690.7, 300 sec: 43986.9). Total num frames: 37617664. Throughput: 0: 10956.8. Samples: 9474048. Policy #0 lag: (min: 51.0, avg: 145.9, max: 307.0) [2024-06-15 11:46:10,738][1648984] Avg episode reward: [(0, '-1.340')] [2024-06-15 11:46:12,918][1652475] Updated weights for policy 0, policy_version 18432 (0.0063) [2024-06-15 11:46:15,037][1652475] Updated weights for policy 0, policy_version 18496 (0.0015) [2024-06-15 11:46:15,738][1648984] Fps is (10 sec: 42598.5, 60 sec: 43690.7, 300 sec: 44209.0). Total num frames: 37912576. Throughput: 0: 11207.1. Samples: 9543168. Policy #0 lag: (min: 51.0, avg: 145.9, max: 307.0) [2024-06-15 11:46:15,738][1648984] Avg episode reward: [(0, '-1.340')] [2024-06-15 11:46:16,729][1652475] Updated weights for policy 0, policy_version 18558 (0.0013) [2024-06-15 11:46:20,132][1652475] Updated weights for policy 0, policy_version 18617 (0.0014) [2024-06-15 11:46:20,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 44236.8, 300 sec: 44431.7). Total num frames: 38141952. Throughput: 0: 11138.9. Samples: 9573376. Policy #0 lag: (min: 101.0, avg: 198.8, max: 357.0) [2024-06-15 11:46:20,738][1648984] Avg episode reward: [(0, '-1.340')] [2024-06-15 11:46:24,198][1652475] Updated weights for policy 0, policy_version 18683 (0.0012) [2024-06-15 11:46:25,738][1648984] Fps is (10 sec: 42598.2, 60 sec: 44782.9, 300 sec: 44209.0). Total num frames: 38338560. Throughput: 0: 11138.8. Samples: 9645056. 
Policy #0 lag: (min: 101.0, avg: 198.8, max: 357.0) [2024-06-15 11:46:25,738][1648984] Avg episode reward: [(0, '-1.340')] [2024-06-15 11:46:26,120][1652475] Updated weights for policy 0, policy_version 18736 (0.0011) [2024-06-15 11:46:27,919][1652475] Updated weights for policy 0, policy_version 18816 (0.0104) [2024-06-15 11:46:30,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 43690.6, 300 sec: 44209.0). Total num frames: 38535168. Throughput: 0: 11195.7. Samples: 9709568. Policy #0 lag: (min: 101.0, avg: 198.8, max: 357.0) [2024-06-15 11:46:30,739][1648984] Avg episode reward: [(0, '-1.340')] [2024-06-15 11:46:34,820][1652475] Updated weights for policy 0, policy_version 18886 (0.0013) [2024-06-15 11:46:35,738][1648984] Fps is (10 sec: 42598.8, 60 sec: 45329.3, 300 sec: 44320.1). Total num frames: 38764544. Throughput: 0: 11025.1. Samples: 9742848. Policy #0 lag: (min: 15.0, avg: 135.0, max: 271.0) [2024-06-15 11:46:35,738][1648984] Avg episode reward: [(0, '-1.340')] [2024-06-15 11:46:35,823][1652475] Updated weights for policy 0, policy_version 18941 (0.0012) [2024-06-15 11:46:37,854][1651340] Signal inference workers to stop experience collection... (1000 times) [2024-06-15 11:46:37,855][1651340] Signal inference workers to resume experience collection... (1000 times) [2024-06-15 11:46:37,868][1652475] Updated weights for policy 0, policy_version 19008 (0.0014) [2024-06-15 11:46:37,881][1652475] InferenceWorker_p0-w0: stopping experience collection (1000 times) [2024-06-15 11:46:37,882][1652475] InferenceWorker_p0-w0: resuming experience collection (1000 times) [2024-06-15 11:46:38,910][1652475] Updated weights for policy 0, policy_version 19057 (0.0037) [2024-06-15 11:46:40,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 44782.9, 300 sec: 44431.2). Total num frames: 39059456. Throughput: 0: 11047.8. Samples: 9810432. 
Policy #0 lag: (min: 15.0, avg: 135.0, max: 271.0) [2024-06-15 11:46:40,738][1648984] Avg episode reward: [(0, '-1.340')] [2024-06-15 11:46:43,099][1652475] Updated weights for policy 0, policy_version 19091 (0.0011) [2024-06-15 11:46:45,738][1648984] Fps is (10 sec: 42597.9, 60 sec: 43690.6, 300 sec: 43987.2). Total num frames: 39190528. Throughput: 0: 11116.1. Samples: 9882624. Policy #0 lag: (min: 15.0, avg: 135.0, max: 271.0) [2024-06-15 11:46:45,739][1648984] Avg episode reward: [(0, '-1.340')] [2024-06-15 11:46:46,274][1652475] Updated weights for policy 0, policy_version 19137 (0.0019) [2024-06-15 11:46:47,636][1652475] Updated weights for policy 0, policy_version 19198 (0.0014) [2024-06-15 11:46:49,253][1652475] Updated weights for policy 0, policy_version 19250 (0.0012) [2024-06-15 11:46:50,738][1648984] Fps is (10 sec: 49152.1, 60 sec: 46421.3, 300 sec: 44320.1). Total num frames: 39550976. Throughput: 0: 11229.9. Samples: 9917952. Policy #0 lag: (min: 8.0, avg: 93.7, max: 264.0) [2024-06-15 11:46:50,738][1648984] Avg episode reward: [(0, '-1.340')] [2024-06-15 11:46:51,062][1652475] Updated weights for policy 0, policy_version 19327 (0.0025) [2024-06-15 11:46:55,724][1652475] Updated weights for policy 0, policy_version 19386 (0.0022) [2024-06-15 11:46:55,738][1648984] Fps is (10 sec: 49152.2, 60 sec: 44236.8, 300 sec: 44209.0). Total num frames: 39682048. Throughput: 0: 11286.7. Samples: 9981952. Policy #0 lag: (min: 8.0, avg: 93.7, max: 264.0) [2024-06-15 11:46:55,738][1648984] Avg episode reward: [(0, '-1.340')] [2024-06-15 11:46:55,849][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000019392_39714816.pth... 
[2024-06-15 11:46:55,898][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000014208_29097984.pth [2024-06-15 11:46:59,354][1652475] Updated weights for policy 0, policy_version 19449 (0.0084) [2024-06-15 11:47:00,738][1648984] Fps is (10 sec: 36045.0, 60 sec: 45875.2, 300 sec: 44209.1). Total num frames: 39911424. Throughput: 0: 11275.4. Samples: 10050560. Policy #0 lag: (min: 15.0, avg: 127.2, max: 271.0) [2024-06-15 11:47:00,738][1648984] Avg episode reward: [(0, '-1.340')] [2024-06-15 11:47:00,882][1652475] Updated weights for policy 0, policy_version 19504 (0.0031) [2024-06-15 11:47:02,706][1652475] Updated weights for policy 0, policy_version 19574 (0.0083) [2024-06-15 11:47:05,738][1648984] Fps is (10 sec: 42597.9, 60 sec: 43690.6, 300 sec: 44209.0). Total num frames: 40108032. Throughput: 0: 11207.1. Samples: 10077696. Policy #0 lag: (min: 15.0, avg: 127.2, max: 271.0) [2024-06-15 11:47:05,738][1648984] Avg episode reward: [(0, '-1.340')] [2024-06-15 11:47:07,469][1652475] Updated weights for policy 0, policy_version 19640 (0.0014) [2024-06-15 11:47:10,738][1648984] Fps is (10 sec: 42598.5, 60 sec: 45329.1, 300 sec: 44320.1). Total num frames: 40337408. Throughput: 0: 11389.2. Samples: 10157568. Policy #0 lag: (min: 15.0, avg: 127.2, max: 271.0) [2024-06-15 11:47:10,738][1648984] Avg episode reward: [(0, '-1.340')] [2024-06-15 11:47:11,132][1652475] Updated weights for policy 0, policy_version 19721 (0.0013) [2024-06-15 11:47:12,878][1652475] Updated weights for policy 0, policy_version 19792 (0.0014) [2024-06-15 11:47:15,738][1648984] Fps is (10 sec: 52429.3, 60 sec: 45329.0, 300 sec: 44431.4). Total num frames: 40632320. Throughput: 0: 11298.1. Samples: 10217984. 
Policy #0 lag: (min: 127.0, avg: 242.6, max: 383.0) [2024-06-15 11:47:15,739][1648984] Avg episode reward: [(0, '-1.340')] [2024-06-15 11:47:18,256][1652475] Updated weights for policy 0, policy_version 19856 (0.0013) [2024-06-15 11:47:20,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 43690.7, 300 sec: 43986.9). Total num frames: 40763392. Throughput: 0: 11411.9. Samples: 10256384. Policy #0 lag: (min: 127.0, avg: 242.6, max: 383.0) [2024-06-15 11:47:20,738][1648984] Avg episode reward: [(0, '-1.340')] [2024-06-15 11:47:21,925][1652475] Updated weights for policy 0, policy_version 19940 (0.0095) [2024-06-15 11:47:22,169][1651340] Signal inference workers to stop experience collection... (1050 times) [2024-06-15 11:47:22,285][1652475] InferenceWorker_p0-w0: stopping experience collection (1050 times) [2024-06-15 11:47:22,406][1651340] Signal inference workers to resume experience collection... (1050 times) [2024-06-15 11:47:22,407][1652475] InferenceWorker_p0-w0: resuming experience collection (1050 times) [2024-06-15 11:47:23,462][1652475] Updated weights for policy 0, policy_version 20016 (0.0014) [2024-06-15 11:47:24,764][1652475] Updated weights for policy 0, policy_version 20064 (0.0012) [2024-06-15 11:47:25,720][1652475] Updated weights for policy 0, policy_version 20096 (0.0013) [2024-06-15 11:47:25,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 46967.5, 300 sec: 44431.2). Total num frames: 41156608. Throughput: 0: 11366.4. Samples: 10321920. Policy #0 lag: (min: 127.0, avg: 242.6, max: 383.0) [2024-06-15 11:47:25,738][1648984] Avg episode reward: [(0, '-1.340')] [2024-06-15 11:47:30,698][1652475] Updated weights for policy 0, policy_version 20151 (0.0020) [2024-06-15 11:47:30,738][1648984] Fps is (10 sec: 49151.8, 60 sec: 45329.1, 300 sec: 44320.1). Total num frames: 41254912. Throughput: 0: 11252.6. Samples: 10388992. 
Policy #0 lag: (min: 15.0, avg: 114.0, max: 271.0) [2024-06-15 11:47:30,738][1648984] Avg episode reward: [(0, '-1.340')] [2024-06-15 11:47:33,753][1652475] Updated weights for policy 0, policy_version 20197 (0.0014) [2024-06-15 11:47:34,667][1652475] Updated weights for policy 0, policy_version 20240 (0.0013) [2024-06-15 11:47:35,741][1648984] Fps is (10 sec: 36044.9, 60 sec: 45875.1, 300 sec: 44098.0). Total num frames: 41517056. Throughput: 0: 11377.8. Samples: 10429952. Policy #0 lag: (min: 15.0, avg: 114.0, max: 271.0) [2024-06-15 11:47:35,742][1648984] Avg episode reward: [(0, '-1.540')] [2024-06-15 11:47:37,096][1652475] Updated weights for policy 0, policy_version 20337 (0.0121) [2024-06-15 11:47:40,738][1648984] Fps is (10 sec: 42598.2, 60 sec: 43690.7, 300 sec: 44098.0). Total num frames: 41680896. Throughput: 0: 11161.6. Samples: 10484224. Policy #0 lag: (min: 15.0, avg: 114.0, max: 271.0) [2024-06-15 11:47:40,738][1648984] Avg episode reward: [(0, '-1.310')] [2024-06-15 11:47:42,673][1652475] Updated weights for policy 0, policy_version 20408 (0.0014) [2024-06-15 11:47:45,584][1652475] Updated weights for policy 0, policy_version 20464 (0.0011) [2024-06-15 11:47:45,738][1648984] Fps is (10 sec: 39322.0, 60 sec: 45329.2, 300 sec: 44098.0). Total num frames: 41910272. Throughput: 0: 11070.6. Samples: 10548736. Policy #0 lag: (min: 15.0, avg: 102.8, max: 271.0) [2024-06-15 11:47:45,738][1648984] Avg episode reward: [(0, '-1.240')] [2024-06-15 11:47:48,280][1652475] Updated weights for policy 0, policy_version 20544 (0.0014) [2024-06-15 11:47:49,723][1652475] Updated weights for policy 0, policy_version 20607 (0.0014) [2024-06-15 11:47:50,738][1648984] Fps is (10 sec: 52429.4, 60 sec: 44236.9, 300 sec: 44431.2). Total num frames: 42205184. Throughput: 0: 11116.1. Samples: 10577920. 
Policy #0 lag: (min: 15.0, avg: 102.8, max: 271.0) [2024-06-15 11:47:50,738][1648984] Avg episode reward: [(0, '-1.230')] [2024-06-15 11:47:55,366][1652475] Updated weights for policy 0, policy_version 20665 (0.0014) [2024-06-15 11:47:55,738][1648984] Fps is (10 sec: 42598.1, 60 sec: 44236.8, 300 sec: 43986.9). Total num frames: 42336256. Throughput: 0: 10911.3. Samples: 10648576. Policy #0 lag: (min: 15.0, avg: 102.8, max: 271.0) [2024-06-15 11:47:55,738][1648984] Avg episode reward: [(0, '-1.070')] [2024-06-15 11:47:55,754][1651340] Saving new best policy, reward=-1.070! [2024-06-15 11:47:58,942][1652475] Updated weights for policy 0, policy_version 20720 (0.0013) [2024-06-15 11:48:00,738][1648984] Fps is (10 sec: 32768.0, 60 sec: 43690.7, 300 sec: 43764.7). Total num frames: 42532864. Throughput: 0: 10945.5. Samples: 10710528. Policy #0 lag: (min: 58.0, avg: 116.7, max: 250.0) [2024-06-15 11:48:00,738][1648984] Avg episode reward: [(0, '-1.070')] [2024-06-15 11:48:00,953][1652475] Updated weights for policy 0, policy_version 20787 (0.0011) [2024-06-15 11:48:02,793][1652475] Updated weights for policy 0, policy_version 20858 (0.0012) [2024-06-15 11:48:05,746][1648984] Fps is (10 sec: 39288.5, 60 sec: 43684.6, 300 sec: 43874.5). Total num frames: 42729472. Throughput: 0: 10670.3. Samples: 10736640. Policy #0 lag: (min: 58.0, avg: 116.7, max: 250.0) [2024-06-15 11:48:05,747][1648984] Avg episode reward: [(0, '-1.070')] [2024-06-15 11:48:06,612][1651340] Signal inference workers to stop experience collection... (1100 times) [2024-06-15 11:48:06,624][1652475] Updated weights for policy 0, policy_version 20882 (0.0012) [2024-06-15 11:48:06,675][1652475] InferenceWorker_p0-w0: stopping experience collection (1100 times) [2024-06-15 11:48:06,809][1651340] Signal inference workers to resume experience collection... 
(1100 times) [2024-06-15 11:48:06,810][1652475] InferenceWorker_p0-w0: resuming experience collection (1100 times) [2024-06-15 11:48:07,422][1652475] Updated weights for policy 0, policy_version 20928 (0.0012) [2024-06-15 11:48:10,738][1648984] Fps is (10 sec: 42597.1, 60 sec: 43690.5, 300 sec: 44320.1). Total num frames: 42958848. Throughput: 0: 11036.4. Samples: 10818560. Policy #0 lag: (min: 58.0, avg: 116.7, max: 250.0) [2024-06-15 11:48:10,739][1648984] Avg episode reward: [(0, '-1.070')] [2024-06-15 11:48:10,876][1652475] Updated weights for policy 0, policy_version 20992 (0.0014) [2024-06-15 11:48:12,176][1652475] Updated weights for policy 0, policy_version 21040 (0.0017) [2024-06-15 11:48:13,832][1652475] Updated weights for policy 0, policy_version 21104 (0.0014) [2024-06-15 11:48:15,738][1648984] Fps is (10 sec: 52473.5, 60 sec: 43690.7, 300 sec: 44431.2). Total num frames: 43253760. Throughput: 0: 10843.0. Samples: 10876928. Policy #0 lag: (min: 142.0, avg: 233.5, max: 383.0) [2024-06-15 11:48:15,746][1648984] Avg episode reward: [(0, '-1.070')] [2024-06-15 11:48:19,161][1652475] Updated weights for policy 0, policy_version 21182 (0.0096) [2024-06-15 11:48:20,738][1648984] Fps is (10 sec: 42599.6, 60 sec: 43690.7, 300 sec: 43986.9). Total num frames: 43384832. Throughput: 0: 10854.4. Samples: 10918400. Policy #0 lag: (min: 142.0, avg: 233.5, max: 383.0) [2024-06-15 11:48:20,738][1648984] Avg episode reward: [(0, '-1.070')] [2024-06-15 11:48:22,858][1652475] Updated weights for policy 0, policy_version 21264 (0.0013) [2024-06-15 11:48:24,956][1652475] Updated weights for policy 0, policy_version 21360 (0.0094) [2024-06-15 11:48:25,738][1648984] Fps is (10 sec: 52428.2, 60 sec: 43690.7, 300 sec: 44653.3). Total num frames: 43778048. Throughput: 0: 10808.9. Samples: 10970624. 
Policy #0 lag: (min: 142.0, avg: 233.5, max: 383.0) [2024-06-15 11:48:25,738][1648984] Avg episode reward: [(0, '-1.070')] [2024-06-15 11:48:30,738][1648984] Fps is (10 sec: 42598.0, 60 sec: 42598.4, 300 sec: 44097.9). Total num frames: 43810816. Throughput: 0: 11138.8. Samples: 11049984. Policy #0 lag: (min: 15.0, avg: 91.2, max: 271.0) [2024-06-15 11:48:30,738][1648984] Avg episode reward: [(0, '-1.070')] [2024-06-15 11:48:31,057][1652475] Updated weights for policy 0, policy_version 21410 (0.0011) [2024-06-15 11:48:33,510][1652475] Updated weights for policy 0, policy_version 21457 (0.0014) [2024-06-15 11:48:34,731][1652475] Updated weights for policy 0, policy_version 21520 (0.0107) [2024-06-15 11:48:35,738][1648984] Fps is (10 sec: 36045.2, 60 sec: 43690.7, 300 sec: 44542.3). Total num frames: 44138496. Throughput: 0: 11241.2. Samples: 11083776. Policy #0 lag: (min: 15.0, avg: 91.2, max: 271.0) [2024-06-15 11:48:35,738][1648984] Avg episode reward: [(0, '-1.070')] [2024-06-15 11:48:36,815][1652475] Updated weights for policy 0, policy_version 21600 (0.0014) [2024-06-15 11:48:40,738][1648984] Fps is (10 sec: 49152.0, 60 sec: 43690.7, 300 sec: 44431.4). Total num frames: 44302336. Throughput: 0: 11025.1. Samples: 11144704. Policy #0 lag: (min: 15.0, avg: 91.2, max: 271.0) [2024-06-15 11:48:40,738][1648984] Avg episode reward: [(0, '-1.070')] [2024-06-15 11:48:41,494][1652475] Updated weights for policy 0, policy_version 21648 (0.0025) [2024-06-15 11:48:42,442][1652475] Updated weights for policy 0, policy_version 21694 (0.0098) [2024-06-15 11:48:45,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 43690.6, 300 sec: 44320.1). Total num frames: 44531712. Throughput: 0: 11366.4. Samples: 11222016. 
Policy #0 lag: (min: 45.0, avg: 125.9, max: 301.0) [2024-06-15 11:48:45,738][1648984] Avg episode reward: [(0, '-1.070')] [2024-06-15 11:48:45,782][1652475] Updated weights for policy 0, policy_version 21760 (0.0116) [2024-06-15 11:48:46,679][1651340] Signal inference workers to stop experience collection... (1150 times) [2024-06-15 11:48:46,733][1652475] InferenceWorker_p0-w0: stopping experience collection (1150 times) [2024-06-15 11:48:46,947][1651340] Signal inference workers to resume experience collection... (1150 times) [2024-06-15 11:48:46,948][1652475] InferenceWorker_p0-w0: resuming experience collection (1150 times) [2024-06-15 11:48:47,808][1652475] Updated weights for policy 0, policy_version 21840 (0.0012) [2024-06-15 11:48:48,977][1652475] Updated weights for policy 0, policy_version 21882 (0.0011) [2024-06-15 11:48:50,739][1648984] Fps is (10 sec: 52428.7, 60 sec: 43690.6, 300 sec: 44431.2). Total num frames: 44826624. Throughput: 0: 11220.6. Samples: 11241472. Policy #0 lag: (min: 45.0, avg: 125.9, max: 301.0) [2024-06-15 11:48:50,740][1648984] Avg episode reward: [(0, '-1.070')] [2024-06-15 11:48:53,431][1652475] Updated weights for policy 0, policy_version 21928 (0.0012) [2024-06-15 11:48:55,738][1648984] Fps is (10 sec: 42596.1, 60 sec: 43690.3, 300 sec: 44208.9). Total num frames: 44957696. Throughput: 0: 11195.7. Samples: 11322368. Policy #0 lag: (min: 45.0, avg: 125.9, max: 301.0) [2024-06-15 11:48:55,739][1648984] Avg episode reward: [(0, '-1.070')] [2024-06-15 11:48:56,235][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000021984_45023232.pth... 
[2024-06-15 11:48:56,360][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000016816_34439168.pth [2024-06-15 11:48:56,430][1652475] Updated weights for policy 0, policy_version 21987 (0.0041) [2024-06-15 11:48:58,565][1652475] Updated weights for policy 0, policy_version 22083 (0.0017) [2024-06-15 11:48:59,771][1652475] Updated weights for policy 0, policy_version 22132 (0.0030) [2024-06-15 11:49:00,738][1648984] Fps is (10 sec: 52429.3, 60 sec: 46967.4, 300 sec: 44431.2). Total num frames: 45350912. Throughput: 0: 11241.2. Samples: 11382784. Policy #0 lag: (min: 45.0, avg: 125.9, max: 301.0) [2024-06-15 11:49:00,738][1648984] Avg episode reward: [(0, '-1.070')] [2024-06-15 11:49:04,322][1652475] Updated weights for policy 0, policy_version 22160 (0.0020) [2024-06-15 11:49:05,450][1652475] Updated weights for policy 0, policy_version 22207 (0.0015) [2024-06-15 11:49:05,738][1648984] Fps is (10 sec: 52431.9, 60 sec: 45881.7, 300 sec: 44431.2). Total num frames: 45481984. Throughput: 0: 11286.8. Samples: 11426304. Policy #0 lag: (min: 11.0, avg: 97.7, max: 267.0) [2024-06-15 11:49:05,738][1648984] Avg episode reward: [(0, '-1.070')] [2024-06-15 11:49:09,199][1652475] Updated weights for policy 0, policy_version 22291 (0.0015) [2024-06-15 11:49:10,738][1648984] Fps is (10 sec: 42598.0, 60 sec: 46967.6, 300 sec: 44542.3). Total num frames: 45776896. Throughput: 0: 11491.6. Samples: 11487744. Policy #0 lag: (min: 11.0, avg: 97.7, max: 267.0) [2024-06-15 11:49:10,738][1648984] Avg episode reward: [(0, '-1.070')] [2024-06-15 11:49:10,880][1652475] Updated weights for policy 0, policy_version 22354 (0.0180) [2024-06-15 11:49:15,738][1648984] Fps is (10 sec: 39321.1, 60 sec: 43690.6, 300 sec: 44320.1). Total num frames: 45875200. Throughput: 0: 11229.9. Samples: 11555328. 
Policy #0 lag: (min: 11.0, avg: 97.7, max: 267.0) [2024-06-15 11:49:15,738][1648984] Avg episode reward: [(0, '-1.070')] [2024-06-15 11:49:16,643][1652475] Updated weights for policy 0, policy_version 22402 (0.0020) [2024-06-15 11:49:19,998][1652475] Updated weights for policy 0, policy_version 22490 (0.0015) [2024-06-15 11:49:20,738][1648984] Fps is (10 sec: 32768.1, 60 sec: 45329.0, 300 sec: 44320.1). Total num frames: 46104576. Throughput: 0: 11207.1. Samples: 11588096. Policy #0 lag: (min: 14.0, avg: 98.6, max: 270.0) [2024-06-15 11:49:20,739][1648984] Avg episode reward: [(0, '-1.070')] [2024-06-15 11:49:21,829][1652475] Updated weights for policy 0, policy_version 22564 (0.0016) [2024-06-15 11:49:23,476][1652475] Updated weights for policy 0, policy_version 22624 (0.0013) [2024-06-15 11:49:25,738][1648984] Fps is (10 sec: 52429.2, 60 sec: 43690.7, 300 sec: 44431.2). Total num frames: 46399488. Throughput: 0: 11013.7. Samples: 11640320. Policy #0 lag: (min: 14.0, avg: 98.6, max: 270.0) [2024-06-15 11:49:25,738][1648984] Avg episode reward: [(0, '-1.070')] [2024-06-15 11:49:29,904][1652475] Updated weights for policy 0, policy_version 22688 (0.0014) [2024-06-15 11:49:30,630][1652475] Updated weights for policy 0, policy_version 22720 (0.0013) [2024-06-15 11:49:30,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 45329.1, 300 sec: 44320.1). Total num frames: 46530560. Throughput: 0: 11013.7. Samples: 11717632. Policy #0 lag: (min: 14.0, avg: 98.6, max: 270.0) [2024-06-15 11:49:30,738][1648984] Avg episode reward: [(0, '-1.070')] [2024-06-15 11:49:31,423][1651340] Signal inference workers to stop experience collection... (1200 times) [2024-06-15 11:49:31,536][1652475] InferenceWorker_p0-w0: stopping experience collection (1200 times) [2024-06-15 11:49:31,691][1651340] Signal inference workers to resume experience collection... 
(1200 times) [2024-06-15 11:49:31,692][1652475] InferenceWorker_p0-w0: resuming experience collection (1200 times) [2024-06-15 11:49:32,258][1652475] Updated weights for policy 0, policy_version 22777 (0.0013) [2024-06-15 11:49:34,544][1652475] Updated weights for policy 0, policy_version 22848 (0.0106) [2024-06-15 11:49:35,738][1648984] Fps is (10 sec: 49151.8, 60 sec: 45875.1, 300 sec: 44320.1). Total num frames: 46891008. Throughput: 0: 11275.4. Samples: 11748864. Policy #0 lag: (min: 74.0, avg: 203.6, max: 329.0) [2024-06-15 11:49:35,738][1648984] Avg episode reward: [(0, '-1.070')] [2024-06-15 11:49:35,862][1652475] Updated weights for policy 0, policy_version 22910 (0.0013) [2024-06-15 11:49:40,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 43690.6, 300 sec: 43986.8). Total num frames: 46923776. Throughput: 0: 10911.4. Samples: 11813376. Policy #0 lag: (min: 74.0, avg: 203.6, max: 329.0) [2024-06-15 11:49:40,738][1648984] Avg episode reward: [(0, '-1.070')] [2024-06-15 11:49:42,138][1652475] Updated weights for policy 0, policy_version 22975 (0.0042) [2024-06-15 11:49:43,983][1652475] Updated weights for policy 0, policy_version 23037 (0.0127) [2024-06-15 11:49:45,738][1648984] Fps is (10 sec: 36044.6, 60 sec: 45329.0, 300 sec: 44209.0). Total num frames: 47251456. Throughput: 0: 11161.6. Samples: 11885056. Policy #0 lag: (min: 74.0, avg: 203.6, max: 329.0) [2024-06-15 11:49:45,739][1648984] Avg episode reward: [(0, '-1.070')] [2024-06-15 11:49:45,865][1652475] Updated weights for policy 0, policy_version 23073 (0.0012) [2024-06-15 11:49:47,949][1652475] Updated weights for policy 0, policy_version 23161 (0.0162) [2024-06-15 11:49:50,738][1648984] Fps is (10 sec: 52429.5, 60 sec: 43690.7, 300 sec: 44320.1). Total num frames: 47448064. Throughput: 0: 10695.1. Samples: 11907584. 
Policy #0 lag: (min: 74.0, avg: 203.6, max: 329.0) [2024-06-15 11:49:50,738][1648984] Avg episode reward: [(0, '-1.080')] [2024-06-15 11:49:54,773][1652475] Updated weights for policy 0, policy_version 23248 (0.0013) [2024-06-15 11:49:55,738][1648984] Fps is (10 sec: 42598.9, 60 sec: 45329.5, 300 sec: 44320.1). Total num frames: 47677440. Throughput: 0: 10888.5. Samples: 11977728. Policy #0 lag: (min: 10.0, avg: 86.5, max: 266.0) [2024-06-15 11:49:55,738][1648984] Avg episode reward: [(0, '-1.130')] [2024-06-15 11:49:59,514][1652475] Updated weights for policy 0, policy_version 23328 (0.0034) [2024-06-15 11:50:00,738][1648984] Fps is (10 sec: 42598.2, 60 sec: 42052.2, 300 sec: 44097.9). Total num frames: 47874048. Throughput: 0: 10649.6. Samples: 12034560. Policy #0 lag: (min: 10.0, avg: 86.5, max: 266.0) [2024-06-15 11:50:00,741][1648984] Avg episode reward: [(0, '-1.090')] [2024-06-15 11:50:01,310][1652475] Updated weights for policy 0, policy_version 23408 (0.0014) [2024-06-15 11:50:05,738][1648984] Fps is (10 sec: 29490.7, 60 sec: 41506.0, 300 sec: 43986.8). Total num frames: 47972352. Throughput: 0: 10626.8. Samples: 12066304. Policy #0 lag: (min: 10.0, avg: 86.5, max: 266.0) [2024-06-15 11:50:05,738][1648984] Avg episode reward: [(0, '-1.350')] [2024-06-15 11:50:07,897][1652475] Updated weights for policy 0, policy_version 23483 (0.0014) [2024-06-15 11:50:09,414][1652475] Updated weights for policy 0, policy_version 23523 (0.0014) [2024-06-15 11:50:10,738][1648984] Fps is (10 sec: 36044.8, 60 sec: 40960.0, 300 sec: 43875.8). Total num frames: 48234496. Throughput: 0: 10740.6. Samples: 12123648. Policy #0 lag: (min: 2.0, avg: 133.3, max: 258.0) [2024-06-15 11:50:10,738][1648984] Avg episode reward: [(0, '-1.000')] [2024-06-15 11:50:10,890][1652475] Updated weights for policy 0, policy_version 23558 (0.0054) [2024-06-15 11:50:11,034][1651340] Saving new best policy, reward=-1.000! 
[2024-06-15 11:50:12,520][1652475] Updated weights for policy 0, policy_version 23622 (0.0013) [2024-06-15 11:50:15,738][1648984] Fps is (10 sec: 52428.3, 60 sec: 43690.5, 300 sec: 44097.9). Total num frames: 48496640. Throughput: 0: 10399.2. Samples: 12185600. Policy #0 lag: (min: 2.0, avg: 133.3, max: 258.0) [2024-06-15 11:50:15,739][1648984] Avg episode reward: [(0, '-1.000')] [2024-06-15 11:50:19,458][1652475] Updated weights for policy 0, policy_version 23700 (0.0017) [2024-06-15 11:50:19,868][1651340] Signal inference workers to stop experience collection... (1250 times) [2024-06-15 11:50:20,027][1652475] InferenceWorker_p0-w0: stopping experience collection (1250 times) [2024-06-15 11:50:20,175][1651340] Signal inference workers to resume experience collection... (1250 times) [2024-06-15 11:50:20,176][1652475] InferenceWorker_p0-w0: resuming experience collection (1250 times) [2024-06-15 11:50:20,529][1652475] Updated weights for policy 0, policy_version 23742 (0.0012) [2024-06-15 11:50:20,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 42052.3, 300 sec: 43986.9). Total num frames: 48627712. Throughput: 0: 10592.7. Samples: 12225536. Policy #0 lag: (min: 2.0, avg: 133.3, max: 258.0) [2024-06-15 11:50:20,738][1648984] Avg episode reward: [(0, '-1.000')] [2024-06-15 11:50:22,803][1652475] Updated weights for policy 0, policy_version 23810 (0.0015) [2024-06-15 11:50:23,949][1652475] Updated weights for policy 0, policy_version 23861 (0.0012) [2024-06-15 11:50:25,550][1652475] Updated weights for policy 0, policy_version 23928 (0.0015) [2024-06-15 11:50:25,738][1648984] Fps is (10 sec: 52429.4, 60 sec: 43690.6, 300 sec: 44431.2). Total num frames: 49020928. Throughput: 0: 10456.2. Samples: 12283904. Policy #0 lag: (min: 95.0, avg: 217.9, max: 351.0) [2024-06-15 11:50:25,739][1648984] Avg episode reward: [(0, '-1.000')] [2024-06-15 11:50:30,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 41506.2, 300 sec: 43986.9). Total num frames: 49020928. 
Throughput: 0: 10467.6. Samples: 12356096. Policy #0 lag: (min: 95.0, avg: 217.9, max: 351.0) [2024-06-15 11:50:30,738][1648984] Avg episode reward: [(0, '-1.000')] [2024-06-15 11:50:32,503][1652475] Updated weights for policy 0, policy_version 23990 (0.0012) [2024-06-15 11:50:33,930][1652475] Updated weights for policy 0, policy_version 24050 (0.0194) [2024-06-15 11:50:35,125][1652475] Updated weights for policy 0, policy_version 24096 (0.0107) [2024-06-15 11:50:35,738][1648984] Fps is (10 sec: 36045.0, 60 sec: 41506.1, 300 sec: 44097.9). Total num frames: 49381376. Throughput: 0: 10672.3. Samples: 12387840. Policy #0 lag: (min: 95.0, avg: 217.9, max: 351.0) [2024-06-15 11:50:35,738][1648984] Avg episode reward: [(0, '-1.000')] [2024-06-15 11:50:36,930][1652475] Updated weights for policy 0, policy_version 24176 (0.0014) [2024-06-15 11:50:40,738][1648984] Fps is (10 sec: 52428.5, 60 sec: 43690.7, 300 sec: 43986.9). Total num frames: 49545216. Throughput: 0: 10422.0. Samples: 12446720. Policy #0 lag: (min: 95.0, avg: 217.9, max: 351.0) [2024-06-15 11:50:40,738][1648984] Avg episode reward: [(0, '-1.000')] [2024-06-15 11:50:44,967][1652475] Updated weights for policy 0, policy_version 24272 (0.0129) [2024-06-15 11:50:45,738][1648984] Fps is (10 sec: 39321.0, 60 sec: 42052.2, 300 sec: 44097.9). Total num frames: 49774592. Throughput: 0: 10706.4. Samples: 12516352. Policy #0 lag: (min: 15.0, avg: 75.9, max: 271.0) [2024-06-15 11:50:45,739][1648984] Avg episode reward: [(0, '-1.000')] [2024-06-15 11:50:46,830][1652475] Updated weights for policy 0, policy_version 24325 (0.0013) [2024-06-15 11:50:48,411][1652475] Updated weights for policy 0, policy_version 24400 (0.0024) [2024-06-15 11:50:50,739][1648984] Fps is (10 sec: 52422.4, 60 sec: 43689.7, 300 sec: 44208.8). Total num frames: 50069504. Throughput: 0: 10660.7. Samples: 12546048. 
Policy #0 lag: (min: 15.0, avg: 75.9, max: 271.0) [2024-06-15 11:50:50,740][1648984] Avg episode reward: [(0, '-1.000')] [2024-06-15 11:50:55,738][1648984] Fps is (10 sec: 36044.3, 60 sec: 40959.7, 300 sec: 43986.8). Total num frames: 50135040. Throughput: 0: 11059.1. Samples: 12621312. Policy #0 lag: (min: 15.0, avg: 75.9, max: 271.0) [2024-06-15 11:50:55,738][1648984] Avg episode reward: [(0, '-1.000')] [2024-06-15 11:50:55,847][1652475] Updated weights for policy 0, policy_version 24496 (0.0013) [2024-06-15 11:50:56,188][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000024512_50200576.pth... [2024-06-15 11:50:56,366][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000019392_39714816.pth [2024-06-15 11:50:56,372][1651340] Saving a milestone train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/milestones/checkpoint_000024512_50200576.pth [2024-06-15 11:50:57,609][1652475] Updated weights for policy 0, policy_version 24571 (0.0013) [2024-06-15 11:50:59,462][1652475] Updated weights for policy 0, policy_version 24624 (0.0012) [2024-06-15 11:50:59,964][1651340] Signal inference workers to stop experience collection... (1300 times) [2024-06-15 11:50:59,992][1652475] InferenceWorker_p0-w0: stopping experience collection (1300 times) [2024-06-15 11:51:00,243][1651340] Signal inference workers to resume experience collection... (1300 times) [2024-06-15 11:51:00,245][1652475] InferenceWorker_p0-w0: resuming experience collection (1300 times) [2024-06-15 11:51:00,738][1648984] Fps is (10 sec: 45880.9, 60 sec: 44236.8, 300 sec: 44209.0). Total num frames: 50528256. Throughput: 0: 10922.7. Samples: 12677120. 
Policy #0 lag: (min: 79.0, avg: 201.0, max: 335.0) [2024-06-15 11:51:00,738][1648984] Avg episode reward: [(0, '-1.000')] [2024-06-15 11:51:01,168][1652475] Updated weights for policy 0, policy_version 24701 (0.0013) [2024-06-15 11:51:05,738][1648984] Fps is (10 sec: 45877.1, 60 sec: 43690.8, 300 sec: 43986.9). Total num frames: 50593792. Throughput: 0: 10854.4. Samples: 12713984. Policy #0 lag: (min: 79.0, avg: 201.0, max: 335.0) [2024-06-15 11:51:05,738][1648984] Avg episode reward: [(0, '-1.000')] [2024-06-15 11:51:08,082][1652475] Updated weights for policy 0, policy_version 24754 (0.0019) [2024-06-15 11:51:09,369][1652475] Updated weights for policy 0, policy_version 24816 (0.0012) [2024-06-15 11:51:10,738][1648984] Fps is (10 sec: 39321.3, 60 sec: 44782.9, 300 sec: 44097.9). Total num frames: 50921472. Throughput: 0: 11127.5. Samples: 12784640. Policy #0 lag: (min: 79.0, avg: 201.0, max: 335.0) [2024-06-15 11:51:10,738][1648984] Avg episode reward: [(0, '-1.000')] [2024-06-15 11:51:11,094][1652475] Updated weights for policy 0, policy_version 24880 (0.0021) [2024-06-15 11:51:12,847][1652475] Updated weights for policy 0, policy_version 24953 (0.0014) [2024-06-15 11:51:15,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 43690.9, 300 sec: 43986.9). Total num frames: 51118080. Throughput: 0: 10911.3. Samples: 12847104. Policy #0 lag: (min: 79.0, avg: 201.0, max: 335.0) [2024-06-15 11:51:15,738][1648984] Avg episode reward: [(0, '-1.000')] [2024-06-15 11:51:20,444][1652475] Updated weights for policy 0, policy_version 25016 (0.0118) [2024-06-15 11:51:20,738][1648984] Fps is (10 sec: 32768.3, 60 sec: 43690.6, 300 sec: 43764.7). Total num frames: 51249152. Throughput: 0: 11070.6. Samples: 12886016. 
Policy #0 lag: (min: 13.0, avg: 68.8, max: 269.0) [2024-06-15 11:51:20,738][1648984] Avg episode reward: [(0, '-1.000')] [2024-06-15 11:51:22,743][1652475] Updated weights for policy 0, policy_version 25108 (0.0017) [2024-06-15 11:51:24,771][1652475] Updated weights for policy 0, policy_version 25186 (0.0013) [2024-06-15 11:51:25,424][1652475] Updated weights for policy 0, policy_version 25215 (0.0014) [2024-06-15 11:51:25,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 43690.8, 300 sec: 44431.2). Total num frames: 51642368. Throughput: 0: 10831.6. Samples: 12934144. Policy #0 lag: (min: 13.0, avg: 68.8, max: 269.0) [2024-06-15 11:51:25,738][1648984] Avg episode reward: [(0, '-1.000')] [2024-06-15 11:51:30,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 43690.7, 300 sec: 43653.6). Total num frames: 51642368. Throughput: 0: 11013.8. Samples: 13011968. Policy #0 lag: (min: 13.0, avg: 68.8, max: 269.0) [2024-06-15 11:51:30,738][1648984] Avg episode reward: [(0, '-1.000')] [2024-06-15 11:51:32,709][1652475] Updated weights for policy 0, policy_version 25273 (0.0013) [2024-06-15 11:51:34,085][1652475] Updated weights for policy 0, policy_version 25332 (0.0021) [2024-06-15 11:51:35,738][1648984] Fps is (10 sec: 36044.3, 60 sec: 43690.6, 300 sec: 43875.8). Total num frames: 52002816. Throughput: 0: 11070.8. Samples: 13044224. Policy #0 lag: (min: 13.0, avg: 68.8, max: 269.0) [2024-06-15 11:51:35,738][1648984] Avg episode reward: [(0, '-1.000')] [2024-06-15 11:51:36,587][1652475] Updated weights for policy 0, policy_version 25424 (0.0015) [2024-06-15 11:51:37,703][1652475] Updated weights for policy 0, policy_version 25466 (0.0011) [2024-06-15 11:51:40,738][1648984] Fps is (10 sec: 52428.1, 60 sec: 43690.6, 300 sec: 43986.9). Total num frames: 52166656. Throughput: 0: 10592.8. Samples: 13097984. 
Policy #0 lag: (min: 52.0, avg: 194.4, max: 330.0) [2024-06-15 11:51:40,739][1648984] Avg episode reward: [(0, '-1.000')] [2024-06-15 11:51:45,002][1651340] Signal inference workers to stop experience collection... (1350 times) [2024-06-15 11:51:45,102][1652475] InferenceWorker_p0-w0: stopping experience collection (1350 times) [2024-06-15 11:51:45,173][1651340] Signal inference workers to resume experience collection... (1350 times) [2024-06-15 11:51:45,174][1652475] InferenceWorker_p0-w0: resuming experience collection (1350 times) [2024-06-15 11:51:45,738][1648984] Fps is (10 sec: 32768.4, 60 sec: 42598.6, 300 sec: 43320.4). Total num frames: 52330496. Throughput: 0: 11036.4. Samples: 13173760. Policy #0 lag: (min: 52.0, avg: 194.4, max: 330.0) [2024-06-15 11:51:45,738][1648984] Avg episode reward: [(0, '-1.000')] [2024-06-15 11:51:45,822][1652475] Updated weights for policy 0, policy_version 25554 (0.0106) [2024-06-15 11:51:47,672][1652475] Updated weights for policy 0, policy_version 25621 (0.0014) [2024-06-15 11:51:49,230][1652475] Updated weights for policy 0, policy_version 25685 (0.0011) [2024-06-15 11:51:50,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 43691.5, 300 sec: 44097.9). Total num frames: 52690944. Throughput: 0: 10774.7. Samples: 13198848. Policy #0 lag: (min: 52.0, avg: 194.4, max: 330.0) [2024-06-15 11:51:50,738][1648984] Avg episode reward: [(0, '-1.000')] [2024-06-15 11:51:55,738][1648984] Fps is (10 sec: 36044.5, 60 sec: 42598.6, 300 sec: 43320.4). Total num frames: 52690944. Throughput: 0: 10797.5. Samples: 13270528. 
Policy #0 lag: (min: 52.0, avg: 194.4, max: 330.0) [2024-06-15 11:51:55,738][1648984] Avg episode reward: [(0, '-1.000')] [2024-06-15 11:51:55,750][1652475] Updated weights for policy 0, policy_version 25730 (0.0017) [2024-06-15 11:51:57,527][1652475] Updated weights for policy 0, policy_version 25793 (0.0012) [2024-06-15 11:51:59,293][1652475] Updated weights for policy 0, policy_version 25865 (0.0011) [2024-06-15 11:52:00,739][1648984] Fps is (10 sec: 39316.6, 60 sec: 42597.5, 300 sec: 43986.7). Total num frames: 53084160. Throughput: 0: 10660.7. Samples: 13326848. Policy #0 lag: (min: 15.0, avg: 75.6, max: 271.0) [2024-06-15 11:52:00,740][1648984] Avg episode reward: [(0, '-1.000')] [2024-06-15 11:52:01,168][1652475] Updated weights for policy 0, policy_version 25942 (0.0013) [2024-06-15 11:52:05,738][1648984] Fps is (10 sec: 52427.6, 60 sec: 43690.4, 300 sec: 43653.6). Total num frames: 53215232. Throughput: 0: 10604.0. Samples: 13363200. Policy #0 lag: (min: 15.0, avg: 75.6, max: 271.0) [2024-06-15 11:52:05,739][1648984] Avg episode reward: [(0, '-1.000')] [2024-06-15 11:52:08,021][1652475] Updated weights for policy 0, policy_version 25985 (0.0024) [2024-06-15 11:52:09,402][1652475] Updated weights for policy 0, policy_version 26046 (0.0017) [2024-06-15 11:52:10,671][1652475] Updated weights for policy 0, policy_version 26082 (0.0010) [2024-06-15 11:52:10,738][1648984] Fps is (10 sec: 32771.5, 60 sec: 41506.0, 300 sec: 43320.4). Total num frames: 53411840. Throughput: 0: 11093.3. Samples: 13433344. Policy #0 lag: (min: 15.0, avg: 75.6, max: 271.0) [2024-06-15 11:52:10,739][1648984] Avg episode reward: [(0, '-1.370')] [2024-06-15 11:52:13,336][1652475] Updated weights for policy 0, policy_version 26192 (0.0012) [2024-06-15 11:52:15,738][1648984] Fps is (10 sec: 52430.6, 60 sec: 43690.7, 300 sec: 43986.9). Total num frames: 53739520. Throughput: 0: 10399.3. Samples: 13479936. 
Policy #0 lag: (min: 164.0, avg: 249.9, max: 415.0) [2024-06-15 11:52:15,739][1648984] Avg episode reward: [(0, '-1.170')] [2024-06-15 11:52:20,738][1648984] Fps is (10 sec: 32768.8, 60 sec: 41506.1, 300 sec: 42653.9). Total num frames: 53739520. Throughput: 0: 10399.3. Samples: 13512192. Policy #0 lag: (min: 164.0, avg: 249.9, max: 415.0) [2024-06-15 11:52:20,738][1648984] Avg episode reward: [(0, '-1.260')] [2024-06-15 11:52:20,912][1652475] Updated weights for policy 0, policy_version 26241 (0.0015) [2024-06-15 11:52:22,383][1652475] Updated weights for policy 0, policy_version 26304 (0.0012) [2024-06-15 11:52:23,741][1652475] Updated weights for policy 0, policy_version 26356 (0.0061) [2024-06-15 11:52:25,228][1651340] Signal inference workers to stop experience collection... (1400 times) [2024-06-15 11:52:25,269][1652475] InferenceWorker_p0-w0: stopping experience collection (1400 times) [2024-06-15 11:52:25,272][1652475] Updated weights for policy 0, policy_version 26402 (0.0012) [2024-06-15 11:52:25,423][1651340] Signal inference workers to resume experience collection... (1400 times) [2024-06-15 11:52:25,435][1652475] InferenceWorker_p0-w0: resuming experience collection (1400 times) [2024-06-15 11:52:25,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 41506.2, 300 sec: 43653.6). Total num frames: 54132736. Throughput: 0: 10570.0. Samples: 13573632. Policy #0 lag: (min: 164.0, avg: 249.9, max: 415.0) [2024-06-15 11:52:25,738][1648984] Avg episode reward: [(0, '-1.150')] [2024-06-15 11:52:26,559][1652475] Updated weights for policy 0, policy_version 26469 (0.0013) [2024-06-15 11:52:27,202][1652475] Updated weights for policy 0, policy_version 26496 (0.0015) [2024-06-15 11:52:30,738][1648984] Fps is (10 sec: 52429.2, 60 sec: 43690.7, 300 sec: 43209.3). Total num frames: 54263808. Throughput: 0: 10296.9. Samples: 13637120. 
Policy #0 lag: (min: 164.0, avg: 249.9, max: 415.0) [2024-06-15 11:52:30,738][1648984] Avg episode reward: [(0, '-1.050')] [2024-06-15 11:52:35,158][1652475] Updated weights for policy 0, policy_version 26552 (0.0015) [2024-06-15 11:52:35,738][1648984] Fps is (10 sec: 26214.2, 60 sec: 39867.8, 300 sec: 43098.3). Total num frames: 54394880. Throughput: 0: 10672.4. Samples: 13679104. Policy #0 lag: (min: 15.0, avg: 69.2, max: 271.0) [2024-06-15 11:52:35,738][1648984] Avg episode reward: [(0, '-1.050')] [2024-06-15 11:52:36,856][1652475] Updated weights for policy 0, policy_version 26624 (0.0012) [2024-06-15 11:52:38,889][1652475] Updated weights for policy 0, policy_version 26704 (0.0013) [2024-06-15 11:52:40,742][1648984] Fps is (10 sec: 52428.2, 60 sec: 43690.7, 300 sec: 43653.6). Total num frames: 54788096. Throughput: 0: 10262.8. Samples: 13732352. Policy #0 lag: (min: 15.0, avg: 69.2, max: 271.0) [2024-06-15 11:52:40,743][1648984] Avg episode reward: [(0, '-1.050')] [2024-06-15 11:52:45,562][1652475] Updated weights for policy 0, policy_version 26753 (0.0014) [2024-06-15 11:52:45,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 40959.9, 300 sec: 42653.9). Total num frames: 54788096. Throughput: 0: 10786.4. Samples: 13812224. Policy #0 lag: (min: 15.0, avg: 69.2, max: 271.0) [2024-06-15 11:52:45,738][1648984] Avg episode reward: [(0, '-1.050')] [2024-06-15 11:52:47,078][1652475] Updated weights for policy 0, policy_version 26818 (0.0012) [2024-06-15 11:52:48,362][1652475] Updated weights for policy 0, policy_version 26880 (0.0015) [2024-06-15 11:52:50,176][1652475] Updated weights for policy 0, policy_version 26960 (0.0151) [2024-06-15 11:52:50,738][1648984] Fps is (10 sec: 45875.5, 60 sec: 42598.4, 300 sec: 43764.7). Total num frames: 55246848. Throughput: 0: 10604.2. Samples: 13840384. 
Policy #0 lag: (min: 15.0, avg: 69.2, max: 271.0) [2024-06-15 11:52:50,738][1648984] Avg episode reward: [(0, '-1.050')] [2024-06-15 11:52:55,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 43690.7, 300 sec: 43320.4). Total num frames: 55312384. Throughput: 0: 10501.7. Samples: 13905920. Policy #0 lag: (min: 15.0, avg: 69.2, max: 271.0) [2024-06-15 11:52:55,738][1648984] Avg episode reward: [(0, '-1.050')] [2024-06-15 11:52:55,757][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000027008_55312384.pth... [2024-06-15 11:52:55,818][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000021984_45023232.pth [2024-06-15 11:52:57,708][1652475] Updated weights for policy 0, policy_version 27024 (0.0013) [2024-06-15 11:52:59,141][1652475] Updated weights for policy 0, policy_version 27072 (0.0014) [2024-06-15 11:53:00,738][1648984] Fps is (10 sec: 32768.2, 60 sec: 41507.1, 300 sec: 43543.8). Total num frames: 55574528. Throughput: 0: 10968.2. Samples: 13973504. Policy #0 lag: (min: 15.0, avg: 71.5, max: 271.0) [2024-06-15 11:53:00,738][1648984] Avg episode reward: [(0, '-1.050')] [2024-06-15 11:53:01,143][1652475] Updated weights for policy 0, policy_version 27153 (0.0012) [2024-06-15 11:53:03,413][1652475] Updated weights for policy 0, policy_version 27260 (0.0030) [2024-06-15 11:53:05,738][1648984] Fps is (10 sec: 52429.1, 60 sec: 43690.9, 300 sec: 43653.7). Total num frames: 55836672. Throughput: 0: 10717.9. Samples: 13994496. Policy #0 lag: (min: 15.0, avg: 71.5, max: 271.0) [2024-06-15 11:53:05,738][1648984] Avg episode reward: [(0, '-1.050')] [2024-06-15 11:53:10,394][1651340] Signal inference workers to stop experience collection... (1450 times) [2024-06-15 11:53:10,434][1652475] InferenceWorker_p0-w0: stopping experience collection (1450 times) [2024-06-15 11:53:10,585][1651340] Signal inference workers to resume experience collection... 
(1450 times) [2024-06-15 11:53:10,586][1652475] InferenceWorker_p0-w0: resuming experience collection (1450 times) [2024-06-15 11:53:10,738][1648984] Fps is (10 sec: 36044.6, 60 sec: 42052.5, 300 sec: 42987.2). Total num frames: 55934976. Throughput: 0: 11116.1. Samples: 14073856. Policy #0 lag: (min: 15.0, avg: 71.5, max: 271.0) [2024-06-15 11:53:10,738][1648984] Avg episode reward: [(0, '-1.050')] [2024-06-15 11:53:10,850][1652475] Updated weights for policy 0, policy_version 27326 (0.0171) [2024-06-15 11:53:12,527][1652475] Updated weights for policy 0, policy_version 27395 (0.0104) [2024-06-15 11:53:14,419][1652475] Updated weights for policy 0, policy_version 27488 (0.0035) [2024-06-15 11:53:15,034][1652475] Updated weights for policy 0, policy_version 27517 (0.0013) [2024-06-15 11:53:15,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 43690.6, 300 sec: 43986.9). Total num frames: 56360960. Throughput: 0: 10922.7. Samples: 14128640. Policy #0 lag: (min: 15.0, avg: 71.5, max: 271.0) [2024-06-15 11:53:15,738][1648984] Avg episode reward: [(0, '-1.050')] [2024-06-15 11:53:20,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 43690.7, 300 sec: 42653.9). Total num frames: 56360960. Throughput: 0: 10854.4. Samples: 14167552. Policy #0 lag: (min: 15.0, avg: 71.5, max: 271.0) [2024-06-15 11:53:20,738][1648984] Avg episode reward: [(0, '-1.050')] [2024-06-15 11:53:22,564][1652475] Updated weights for policy 0, policy_version 27568 (0.0016) [2024-06-15 11:53:24,012][1652475] Updated weights for policy 0, policy_version 27632 (0.0011) [2024-06-15 11:53:25,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 43690.7, 300 sec: 43875.8). Total num frames: 56754176. Throughput: 0: 11184.4. Samples: 14235648. 
Policy #0 lag: (min: 5.0, avg: 56.3, max: 261.0) [2024-06-15 11:53:25,738][1648984] Avg episode reward: [(0, '-1.050')] [2024-06-15 11:53:25,772][1652475] Updated weights for policy 0, policy_version 27715 (0.0012) [2024-06-15 11:53:26,987][1652475] Updated weights for policy 0, policy_version 27776 (0.0012) [2024-06-15 11:53:30,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 43690.7, 300 sec: 43209.3). Total num frames: 56885248. Throughput: 0: 10877.2. Samples: 14301696. Policy #0 lag: (min: 5.0, avg: 56.3, max: 261.0) [2024-06-15 11:53:30,742][1648984] Avg episode reward: [(0, '-1.050')] [2024-06-15 11:53:34,817][1652475] Updated weights for policy 0, policy_version 27840 (0.0012) [2024-06-15 11:53:35,738][1648984] Fps is (10 sec: 32767.4, 60 sec: 44782.9, 300 sec: 43320.4). Total num frames: 57081856. Throughput: 0: 11138.8. Samples: 14341632. Policy #0 lag: (min: 5.0, avg: 56.3, max: 261.0) [2024-06-15 11:53:35,738][1648984] Avg episode reward: [(0, '-1.050')] [2024-06-15 11:53:37,254][1652475] Updated weights for policy 0, policy_version 27952 (0.0013) [2024-06-15 11:53:39,025][1652475] Updated weights for policy 0, policy_version 28023 (0.0014) [2024-06-15 11:53:40,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 43690.7, 300 sec: 43653.6). Total num frames: 57409536. Throughput: 0: 10786.2. Samples: 14391296. Policy #0 lag: (min: 5.0, avg: 56.3, max: 261.0) [2024-06-15 11:53:40,738][1648984] Avg episode reward: [(0, '-1.050')] [2024-06-15 11:53:45,738][1648984] Fps is (10 sec: 32768.6, 60 sec: 43690.8, 300 sec: 42654.0). Total num frames: 57409536. Throughput: 0: 11047.8. Samples: 14470656. Policy #0 lag: (min: 5.0, avg: 56.3, max: 261.0) [2024-06-15 11:53:45,738][1648984] Avg episode reward: [(0, '-1.050')] [2024-06-15 11:53:46,764][1652475] Updated weights for policy 0, policy_version 28083 (0.0014) [2024-06-15 11:53:47,971][1651340] Signal inference workers to stop experience collection... 
(1500 times) [2024-06-15 11:53:48,099][1652475] InferenceWorker_p0-w0: stopping experience collection (1500 times) [2024-06-15 11:53:48,241][1651340] Signal inference workers to resume experience collection... (1500 times) [2024-06-15 11:53:48,241][1652475] InferenceWorker_p0-w0: resuming experience collection (1500 times) [2024-06-15 11:53:48,897][1652475] Updated weights for policy 0, policy_version 28182 (0.0089) [2024-06-15 11:53:50,738][1648984] Fps is (10 sec: 45875.3, 60 sec: 43690.7, 300 sec: 43764.8). Total num frames: 57868288. Throughput: 0: 11150.2. Samples: 14496256. Policy #0 lag: (min: 8.0, avg: 63.8, max: 264.0) [2024-06-15 11:53:50,738][1648984] Avg episode reward: [(0, '-1.050')] [2024-06-15 11:53:51,190][1652475] Updated weights for policy 0, policy_version 28272 (0.0014) [2024-06-15 11:53:55,738][1648984] Fps is (10 sec: 52428.2, 60 sec: 43690.7, 300 sec: 42653.9). Total num frames: 57933824. Throughput: 0: 10706.5. Samples: 14555648. Policy #0 lag: (min: 8.0, avg: 63.8, max: 264.0) [2024-06-15 11:53:55,739][1648984] Avg episode reward: [(0, '-1.050')] [2024-06-15 11:53:59,475][1652475] Updated weights for policy 0, policy_version 28355 (0.0015) [2024-06-15 11:54:00,738][1648984] Fps is (10 sec: 29491.1, 60 sec: 43144.5, 300 sec: 42987.2). Total num frames: 58163200. Throughput: 0: 11047.8. Samples: 14625792. Policy #0 lag: (min: 8.0, avg: 63.8, max: 264.0) [2024-06-15 11:54:00,738][1648984] Avg episode reward: [(0, '-1.050')] [2024-06-15 11:54:01,826][1652475] Updated weights for policy 0, policy_version 28448 (0.0013) [2024-06-15 11:54:04,218][1652475] Updated weights for policy 0, policy_version 28544 (0.0014) [2024-06-15 11:54:05,738][1648984] Fps is (10 sec: 52429.3, 60 sec: 43690.7, 300 sec: 42987.2). Total num frames: 58458112. Throughput: 0: 10524.4. Samples: 14641152. 
Policy #0 lag: (min: 8.0, avg: 63.8, max: 264.0) [2024-06-15 11:54:05,738][1648984] Avg episode reward: [(0, '-1.050')] [2024-06-15 11:54:10,738][1648984] Fps is (10 sec: 29491.2, 60 sec: 42052.2, 300 sec: 42653.9). Total num frames: 58458112. Throughput: 0: 10638.2. Samples: 14714368. Policy #0 lag: (min: 8.0, avg: 63.8, max: 264.0) [2024-06-15 11:54:10,738][1648984] Avg episode reward: [(0, '-1.050')] [2024-06-15 11:54:13,072][1652475] Updated weights for policy 0, policy_version 28624 (0.0040) [2024-06-15 11:54:15,027][1652475] Updated weights for policy 0, policy_version 28692 (0.0013) [2024-06-15 11:54:15,738][1648984] Fps is (10 sec: 36044.9, 60 sec: 40960.0, 300 sec: 43098.3). Total num frames: 58818560. Throughput: 0: 10319.6. Samples: 14766080. Policy #0 lag: (min: 11.0, avg: 55.8, max: 267.0) [2024-06-15 11:54:15,738][1648984] Avg episode reward: [(0, '-1.050')] [2024-06-15 11:54:16,487][1652475] Updated weights for policy 0, policy_version 28768 (0.0012) [2024-06-15 11:54:20,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 43690.7, 300 sec: 42653.9). Total num frames: 58982400. Throughput: 0: 10092.1. Samples: 14795776. Policy #0 lag: (min: 11.0, avg: 55.8, max: 267.0) [2024-06-15 11:54:20,738][1648984] Avg episode reward: [(0, '-1.050')] [2024-06-15 11:54:25,182][1652475] Updated weights for policy 0, policy_version 28834 (0.0015) [2024-06-15 11:54:25,738][1648984] Fps is (10 sec: 26214.0, 60 sec: 38775.4, 300 sec: 42542.9). Total num frames: 59080704. Throughput: 0: 10729.2. Samples: 14874112. Policy #0 lag: (min: 11.0, avg: 55.8, max: 267.0) [2024-06-15 11:54:25,739][1648984] Avg episode reward: [(0, '-1.050')] [2024-06-15 11:54:26,447][1652475] Updated weights for policy 0, policy_version 28881 (0.0012) [2024-06-15 11:54:28,631][1652475] Updated weights for policy 0, policy_version 28978 (0.0014) [2024-06-15 11:54:28,991][1651340] Signal inference workers to stop experience collection... 
(1550 times) [2024-06-15 11:54:29,027][1652475] InferenceWorker_p0-w0: stopping experience collection (1550 times) [2024-06-15 11:54:29,230][1651340] Signal inference workers to resume experience collection... (1550 times) [2024-06-15 11:54:29,231][1652475] InferenceWorker_p0-w0: resuming experience collection (1550 times) [2024-06-15 11:54:30,738][1648984] Fps is (10 sec: 52429.2, 60 sec: 43690.7, 300 sec: 42765.0). Total num frames: 59506688. Throughput: 0: 9966.9. Samples: 14919168. Policy #0 lag: (min: 11.0, avg: 55.8, max: 267.0) [2024-06-15 11:54:30,738][1648984] Avg episode reward: [(0, '-1.040')] [2024-06-15 11:54:35,738][1648984] Fps is (10 sec: 42598.8, 60 sec: 40414.0, 300 sec: 42654.0). Total num frames: 59506688. Throughput: 0: 10262.8. Samples: 14958080. Policy #0 lag: (min: 11.0, avg: 55.8, max: 267.0) [2024-06-15 11:54:35,738][1648984] Avg episode reward: [(0, '-1.020')] [2024-06-15 11:54:36,300][1652475] Updated weights for policy 0, policy_version 29060 (0.0014) [2024-06-15 11:54:40,269][1652475] Updated weights for policy 0, policy_version 29185 (0.0016) [2024-06-15 11:54:40,738][1648984] Fps is (10 sec: 29490.4, 60 sec: 39867.6, 300 sec: 42542.8). Total num frames: 59801600. Throughput: 0: 10137.6. Samples: 15011840. Policy #0 lag: (min: 2.0, avg: 58.8, max: 258.0) [2024-06-15 11:54:40,739][1648984] Avg episode reward: [(0, '-1.030')] [2024-06-15 11:54:42,134][1652475] Updated weights for policy 0, policy_version 29268 (0.0012) [2024-06-15 11:54:45,738][1648984] Fps is (10 sec: 52429.1, 60 sec: 43690.7, 300 sec: 42653.9). Total num frames: 60030976. Throughput: 0: 9921.4. Samples: 15072256. Policy #0 lag: (min: 2.0, avg: 58.8, max: 258.0) [2024-06-15 11:54:45,738][1648984] Avg episode reward: [(0, '-0.990')] [2024-06-15 11:54:45,739][1651340] Saving new best policy, reward=-0.990! 
[2024-06-15 11:54:50,189][1652475] Updated weights for policy 0, policy_version 29314 (0.0023) [2024-06-15 11:54:50,738][1648984] Fps is (10 sec: 29491.9, 60 sec: 37137.1, 300 sec: 42098.6). Total num frames: 60096512. Throughput: 0: 10433.4. Samples: 15110656. Policy #0 lag: (min: 2.0, avg: 58.8, max: 258.0) [2024-06-15 11:54:50,738][1648984] Avg episode reward: [(0, '-1.060')] [2024-06-15 11:54:52,263][1652475] Updated weights for policy 0, policy_version 29424 (0.0014) [2024-06-15 11:54:54,592][1652475] Updated weights for policy 0, policy_version 29509 (0.0183) [2024-06-15 11:54:55,738][1648984] Fps is (10 sec: 49151.4, 60 sec: 43144.5, 300 sec: 42876.1). Total num frames: 60522496. Throughput: 0: 9955.5. Samples: 15162368. Policy #0 lag: (min: 2.0, avg: 58.8, max: 258.0) [2024-06-15 11:54:55,738][1648984] Avg episode reward: [(0, '-1.070')] [2024-06-15 11:54:55,896][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000029568_60555264.pth... [2024-06-15 11:54:55,950][1652475] Updated weights for policy 0, policy_version 29568 (0.0083) [2024-06-15 11:54:55,975][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000024512_50200576.pth [2024-06-15 11:55:00,739][1648984] Fps is (10 sec: 45874.9, 60 sec: 39867.7, 300 sec: 42654.0). Total num frames: 60555264. Throughput: 0: 10444.8. Samples: 15236096. Policy #0 lag: (min: 203.0, avg: 257.5, max: 347.0) [2024-06-15 11:55:00,741][1648984] Avg episode reward: [(0, '-1.070')] [2024-06-15 11:55:04,050][1652475] Updated weights for policy 0, policy_version 29632 (0.0159) [2024-06-15 11:55:05,738][1648984] Fps is (10 sec: 26213.9, 60 sec: 38775.3, 300 sec: 42542.8). Total num frames: 60784640. Throughput: 0: 10604.0. Samples: 15272960. 
Policy #0 lag: (min: 203.0, avg: 257.5, max: 347.0) [2024-06-15 11:55:05,739][1648984] Avg episode reward: [(0, '-1.070')] [2024-06-15 11:55:06,717][1652475] Updated weights for policy 0, policy_version 29728 (0.0013) [2024-06-15 11:55:08,458][1652475] Updated weights for policy 0, policy_version 29796 (0.0017) [2024-06-15 11:55:10,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 43690.7, 300 sec: 42654.0). Total num frames: 61079552. Throughput: 0: 9864.5. Samples: 15318016. Policy #0 lag: (min: 203.0, avg: 257.5, max: 347.0) [2024-06-15 11:55:10,738][1648984] Avg episode reward: [(0, '-1.070')] [2024-06-15 11:55:15,258][1652475] Updated weights for policy 0, policy_version 29840 (0.0014) [2024-06-15 11:55:15,738][1648984] Fps is (10 sec: 36045.9, 60 sec: 38775.5, 300 sec: 42431.8). Total num frames: 61145088. Throughput: 0: 10729.2. Samples: 15401984. Policy #0 lag: (min: 30.0, avg: 81.4, max: 286.0) [2024-06-15 11:55:15,738][1648984] Avg episode reward: [(0, '-1.070')] [2024-06-15 11:55:16,301][1651340] Signal inference workers to stop experience collection... (1600 times) [2024-06-15 11:55:16,332][1652475] InferenceWorker_p0-w0: stopping experience collection (1600 times) [2024-06-15 11:55:16,607][1651340] Signal inference workers to resume experience collection... (1600 times) [2024-06-15 11:55:16,607][1652475] InferenceWorker_p0-w0: resuming experience collection (1600 times) [2024-06-15 11:55:18,268][1652475] Updated weights for policy 0, policy_version 29952 (0.0015) [2024-06-15 11:55:20,365][1652475] Updated weights for policy 0, policy_version 30048 (0.0014) [2024-06-15 11:55:20,738][1648984] Fps is (10 sec: 45874.7, 60 sec: 42598.3, 300 sec: 42431.8). Total num frames: 61538304. Throughput: 0: 10251.3. Samples: 15419392. Policy #0 lag: (min: 30.0, avg: 81.4, max: 286.0) [2024-06-15 11:55:20,738][1648984] Avg episode reward: [(0, '-1.070')] [2024-06-15 11:55:25,738][1648984] Fps is (10 sec: 45875.3, 60 sec: 42052.4, 300 sec: 42653.9). 
Total num frames: 61603840. Throughput: 0: 10570.0. Samples: 15487488. Policy #0 lag: (min: 30.0, avg: 81.4, max: 286.0) [2024-06-15 11:55:25,740][1648984] Avg episode reward: [(0, '-1.070')] [2024-06-15 11:55:27,420][1652475] Updated weights for policy 0, policy_version 30084 (0.0012) [2024-06-15 11:55:29,490][1652475] Updated weights for policy 0, policy_version 30160 (0.0013) [2024-06-15 11:55:30,738][1648984] Fps is (10 sec: 36045.1, 60 sec: 39867.6, 300 sec: 42431.8). Total num frames: 61898752. Throughput: 0: 10626.8. Samples: 15550464. Policy #0 lag: (min: 30.0, avg: 81.4, max: 286.0) [2024-06-15 11:55:30,738][1648984] Avg episode reward: [(0, '-1.070')] [2024-06-15 11:55:31,669][1652475] Updated weights for policy 0, policy_version 30272 (0.0025) [2024-06-15 11:55:33,188][1652475] Updated weights for policy 0, policy_version 30329 (0.0010) [2024-06-15 11:55:35,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 43690.7, 300 sec: 42653.9). Total num frames: 62128128. Throughput: 0: 10285.5. Samples: 15573504. Policy #0 lag: (min: 30.0, avg: 81.4, max: 286.0) [2024-06-15 11:55:35,738][1648984] Avg episode reward: [(0, '-1.070')] [2024-06-15 11:55:39,887][1652475] Updated weights for policy 0, policy_version 30353 (0.0019) [2024-06-15 11:55:40,738][1648984] Fps is (10 sec: 32767.7, 60 sec: 40413.9, 300 sec: 42209.6). Total num frames: 62226432. Throughput: 0: 10911.3. Samples: 15653376. Policy #0 lag: (min: 10.0, avg: 65.5, max: 266.0) [2024-06-15 11:55:40,738][1648984] Avg episode reward: [(0, '-1.070')] [2024-06-15 11:55:41,799][1652475] Updated weights for policy 0, policy_version 30420 (0.0013) [2024-06-15 11:55:44,020][1652475] Updated weights for policy 0, policy_version 30513 (0.0016) [2024-06-15 11:55:45,738][1648984] Fps is (10 sec: 49152.0, 60 sec: 43144.5, 300 sec: 42543.0). Total num frames: 62619648. Throughput: 0: 10319.7. Samples: 15700480. 
Policy #0 lag: (min: 10.0, avg: 65.5, max: 266.0) [2024-06-15 11:55:45,738][1648984] Avg episode reward: [(0, '-1.070')] [2024-06-15 11:55:45,859][1652475] Updated weights for policy 0, policy_version 30584 (0.0018) [2024-06-15 11:55:50,740][1648984] Fps is (10 sec: 42598.6, 60 sec: 42598.3, 300 sec: 42431.8). Total num frames: 62652416. Throughput: 0: 10262.8. Samples: 15734784. Policy #0 lag: (min: 10.0, avg: 65.5, max: 266.0) [2024-06-15 11:55:50,742][1648984] Avg episode reward: [(0, '-1.070')] [2024-06-15 11:55:53,299][1652475] Updated weights for policy 0, policy_version 30640 (0.0014) [2024-06-15 11:55:54,398][1652475] Updated weights for policy 0, policy_version 30692 (0.0012) [2024-06-15 11:55:55,077][1651340] Signal inference workers to stop experience collection... (1650 times) [2024-06-15 11:55:55,151][1652475] InferenceWorker_p0-w0: stopping experience collection (1650 times) [2024-06-15 11:55:55,391][1651340] Signal inference workers to resume experience collection... (1650 times) [2024-06-15 11:55:55,394][1652475] InferenceWorker_p0-w0: resuming experience collection (1650 times) [2024-06-15 11:55:55,738][1648984] Fps is (10 sec: 32767.8, 60 sec: 40413.9, 300 sec: 42098.5). Total num frames: 62947328. Throughput: 0: 10786.1. Samples: 15803392. Policy #0 lag: (min: 10.0, avg: 65.5, max: 266.0) [2024-06-15 11:55:55,738][1648984] Avg episode reward: [(0, '-1.070')] [2024-06-15 11:55:55,890][1652475] Updated weights for policy 0, policy_version 30752 (0.0014) [2024-06-15 11:55:57,516][1652475] Updated weights for policy 0, policy_version 30801 (0.0012) [2024-06-15 11:56:00,738][1648984] Fps is (10 sec: 52429.1, 60 sec: 43690.7, 300 sec: 42653.9). Total num frames: 63176704. Throughput: 0: 10171.7. Samples: 15859712. 
Policy #0 lag: (min: 10.0, avg: 65.5, max: 266.0) [2024-06-15 11:56:00,738][1648984] Avg episode reward: [(0, '-1.070')] [2024-06-15 11:56:04,650][1652475] Updated weights for policy 0, policy_version 30864 (0.0012) [2024-06-15 11:56:05,738][1648984] Fps is (10 sec: 36045.0, 60 sec: 42052.5, 300 sec: 41987.5). Total num frames: 63307776. Throughput: 0: 10638.3. Samples: 15898112. Policy #0 lag: (min: 6.0, avg: 69.3, max: 262.0) [2024-06-15 11:56:05,738][1648984] Avg episode reward: [(0, '-1.070')] [2024-06-15 11:56:06,720][1652475] Updated weights for policy 0, policy_version 30930 (0.0087) [2024-06-15 11:56:08,746][1652475] Updated weights for policy 0, policy_version 31012 (0.0101) [2024-06-15 11:56:10,458][1652475] Updated weights for policy 0, policy_version 31088 (0.0014) [2024-06-15 11:56:10,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 43690.7, 300 sec: 42653.9). Total num frames: 63700992. Throughput: 0: 10410.7. Samples: 15955968. Policy #0 lag: (min: 6.0, avg: 69.3, max: 262.0) [2024-06-15 11:56:10,738][1648984] Avg episode reward: [(0, '-1.070')] [2024-06-15 11:56:15,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 42598.4, 300 sec: 42209.6). Total num frames: 63700992. Throughput: 0: 10626.9. Samples: 16028672. Policy #0 lag: (min: 6.0, avg: 69.3, max: 262.0) [2024-06-15 11:56:15,738][1648984] Avg episode reward: [(0, '-1.070')] [2024-06-15 11:56:17,453][1652475] Updated weights for policy 0, policy_version 31152 (0.0021) [2024-06-15 11:56:19,165][1652475] Updated weights for policy 0, policy_version 31201 (0.0014) [2024-06-15 11:56:20,738][1648984] Fps is (10 sec: 32768.2, 60 sec: 41506.3, 300 sec: 41987.5). Total num frames: 64028672. Throughput: 0: 10899.9. Samples: 16064000. 
Policy #0 lag: (min: 6.0, avg: 69.3, max: 262.0) [2024-06-15 11:56:20,738][1648984] Avg episode reward: [(0, '-1.070')] [2024-06-15 11:56:20,842][1652475] Updated weights for policy 0, policy_version 31280 (0.0013) [2024-06-15 11:56:25,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 43690.6, 300 sec: 42653.9). Total num frames: 64225280. Throughput: 0: 10319.7. Samples: 16117760. Policy #0 lag: (min: 6.0, avg: 69.3, max: 262.0) [2024-06-15 11:56:25,738][1648984] Avg episode reward: [(0, '-1.070')] [2024-06-15 11:56:28,931][1652475] Updated weights for policy 0, policy_version 31361 (0.0014) [2024-06-15 11:56:30,206][1652475] Updated weights for policy 0, policy_version 31410 (0.0010) [2024-06-15 11:56:30,738][1648984] Fps is (10 sec: 32768.0, 60 sec: 40960.1, 300 sec: 41876.4). Total num frames: 64356352. Throughput: 0: 10968.2. Samples: 16194048. Policy #0 lag: (min: 15.0, avg: 75.0, max: 271.0) [2024-06-15 11:56:30,738][1648984] Avg episode reward: [(0, '-1.070')] [2024-06-15 11:56:32,141][1652475] Updated weights for policy 0, policy_version 31504 (0.0114) [2024-06-15 11:56:33,723][1652475] Updated weights for policy 0, policy_version 31568 (0.0092) [2024-06-15 11:56:35,738][1648984] Fps is (10 sec: 52429.1, 60 sec: 43690.7, 300 sec: 42654.0). Total num frames: 64749568. Throughput: 0: 10763.4. Samples: 16219136. Policy #0 lag: (min: 15.0, avg: 75.0, max: 271.0) [2024-06-15 11:56:35,738][1648984] Avg episode reward: [(0, '-1.070')] [2024-06-15 11:56:40,546][1651340] Signal inference workers to stop experience collection... (1700 times) [2024-06-15 11:56:40,591][1652475] InferenceWorker_p0-w0: stopping experience collection (1700 times) [2024-06-15 11:56:40,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 42052.4, 300 sec: 42098.6). Total num frames: 64749568. Throughput: 0: 10877.2. Samples: 16292864. 
Policy #0 lag: (min: 15.0, avg: 75.0, max: 271.0) [2024-06-15 11:56:40,738][1648984] Avg episode reward: [(0, '-1.070')] [2024-06-15 11:56:40,805][1651340] Signal inference workers to resume experience collection... (1700 times) [2024-06-15 11:56:40,806][1652475] InferenceWorker_p0-w0: resuming experience collection (1700 times) [2024-06-15 11:56:40,808][1652475] Updated weights for policy 0, policy_version 31632 (0.0036) [2024-06-15 11:56:43,217][1652475] Updated weights for policy 0, policy_version 31747 (0.0085) [2024-06-15 11:56:45,064][1652475] Updated weights for policy 0, policy_version 31824 (0.0012) [2024-06-15 11:56:45,739][1648984] Fps is (10 sec: 45868.4, 60 sec: 43143.5, 300 sec: 42431.6). Total num frames: 65208320. Throughput: 0: 10785.8. Samples: 16345088. Policy #0 lag: (min: 15.0, avg: 75.0, max: 271.0) [2024-06-15 11:56:45,740][1648984] Avg episode reward: [(0, '-1.070')] [2024-06-15 11:56:50,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 43690.8, 300 sec: 42654.0). Total num frames: 65273856. Throughput: 0: 10695.1. Samples: 16379392. Policy #0 lag: (min: 15.0, avg: 75.0, max: 271.0) [2024-06-15 11:56:50,738][1648984] Avg episode reward: [(0, '-1.070')] [2024-06-15 11:56:53,379][1652475] Updated weights for policy 0, policy_version 31895 (0.0169) [2024-06-15 11:56:55,341][1652475] Updated weights for policy 0, policy_version 31970 (0.0012) [2024-06-15 11:56:55,738][1648984] Fps is (10 sec: 29495.2, 60 sec: 42598.4, 300 sec: 42098.7). Total num frames: 65503232. Throughput: 0: 11059.2. Samples: 16453632. Policy #0 lag: (min: 15.0, avg: 63.5, max: 271.0) [2024-06-15 11:56:55,738][1648984] Avg episode reward: [(0, '-0.720')] [2024-06-15 11:56:56,046][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000032000_65536000.pth... 
[2024-06-15 11:56:56,247][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000027008_55312384.pth [2024-06-15 11:56:56,254][1651340] Saving new best policy, reward=-0.720! [2024-06-15 11:56:57,072][1652475] Updated weights for policy 0, policy_version 32036 (0.0013) [2024-06-15 11:56:59,157][1652475] Updated weights for policy 0, policy_version 32117 (0.0013) [2024-06-15 11:57:00,738][1648984] Fps is (10 sec: 52428.3, 60 sec: 43690.7, 300 sec: 42654.0). Total num frames: 65798144. Throughput: 0: 10604.1. Samples: 16505856. Policy #0 lag: (min: 15.0, avg: 63.5, max: 271.0) [2024-06-15 11:57:00,738][1648984] Avg episode reward: [(0, '-0.880')] [2024-06-15 11:57:05,738][1648984] Fps is (10 sec: 29491.4, 60 sec: 41506.1, 300 sec: 41987.5). Total num frames: 65798144. Throughput: 0: 10752.0. Samples: 16547840. Policy #0 lag: (min: 15.0, avg: 63.5, max: 271.0) [2024-06-15 11:57:05,738][1648984] Avg episode reward: [(0, '-0.990')] [2024-06-15 11:57:06,344][1652475] Updated weights for policy 0, policy_version 32148 (0.0012) [2024-06-15 11:57:08,174][1652475] Updated weights for policy 0, policy_version 32224 (0.0014) [2024-06-15 11:57:10,738][1648984] Fps is (10 sec: 39322.0, 60 sec: 41506.2, 300 sec: 42209.6). Total num frames: 66191360. Throughput: 0: 10786.1. Samples: 16603136. Policy #0 lag: (min: 15.0, avg: 63.5, max: 271.0) [2024-06-15 11:57:10,738][1648984] Avg episode reward: [(0, '-0.930')] [2024-06-15 11:57:11,163][1652475] Updated weights for policy 0, policy_version 32337 (0.0027) [2024-06-15 11:57:15,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 43690.7, 300 sec: 42653.9). Total num frames: 66322432. Throughput: 0: 10410.7. Samples: 16662528. 
Policy #0 lag: (min: 15.0, avg: 63.5, max: 271.0) [2024-06-15 11:57:15,738][1648984] Avg episode reward: [(0, '-0.930')] [2024-06-15 11:57:18,444][1652475] Updated weights for policy 0, policy_version 32388 (0.0013) [2024-06-15 11:57:20,332][1651340] Signal inference workers to stop experience collection... (1750 times) [2024-06-15 11:57:20,378][1652475] InferenceWorker_p0-w0: stopping experience collection (1750 times) [2024-06-15 11:57:20,386][1652475] Updated weights for policy 0, policy_version 32469 (0.0016) [2024-06-15 11:57:20,523][1651340] Signal inference workers to resume experience collection... (1750 times) [2024-06-15 11:57:20,524][1652475] InferenceWorker_p0-w0: resuming experience collection (1750 times) [2024-06-15 11:57:20,738][1648984] Fps is (10 sec: 32768.0, 60 sec: 41506.1, 300 sec: 41987.5). Total num frames: 66519040. Throughput: 0: 10683.7. Samples: 16699904. Policy #0 lag: (min: 15.0, avg: 71.7, max: 271.0) [2024-06-15 11:57:20,738][1648984] Avg episode reward: [(0, '-0.930')] [2024-06-15 11:57:22,427][1652475] Updated weights for policy 0, policy_version 32548 (0.0096) [2024-06-15 11:57:24,394][1652475] Updated weights for policy 0, policy_version 32624 (0.0013) [2024-06-15 11:57:25,738][1648984] Fps is (10 sec: 52427.7, 60 sec: 43690.5, 300 sec: 42653.9). Total num frames: 66846720. Throughput: 0: 10001.0. Samples: 16742912. Policy #0 lag: (min: 15.0, avg: 71.7, max: 271.0) [2024-06-15 11:57:25,739][1648984] Avg episode reward: [(0, '-0.930')] [2024-06-15 11:57:30,742][1648984] Fps is (10 sec: 32753.3, 60 sec: 41503.0, 300 sec: 42209.0). Total num frames: 66846720. Throughput: 0: 10512.4. Samples: 16818176. 
Policy #0 lag: (min: 15.0, avg: 71.7, max: 271.0) [2024-06-15 11:57:30,742][1648984] Avg episode reward: [(0, '-0.930')] [2024-06-15 11:57:32,119][1652475] Updated weights for policy 0, policy_version 32675 (0.0011) [2024-06-15 11:57:34,068][1652475] Updated weights for policy 0, policy_version 32762 (0.0013) [2024-06-15 11:57:35,692][1652475] Updated weights for policy 0, policy_version 32821 (0.0013) [2024-06-15 11:57:35,738][1648984] Fps is (10 sec: 36045.5, 60 sec: 40960.0, 300 sec: 42098.6). Total num frames: 67207168. Throughput: 0: 10422.0. Samples: 16848384. Policy #0 lag: (min: 15.0, avg: 71.7, max: 271.0) [2024-06-15 11:57:35,738][1648984] Avg episode reward: [(0, '-0.930')] [2024-06-15 11:57:37,187][1652475] Updated weights for policy 0, policy_version 32889 (0.0014) [2024-06-15 11:57:40,738][1648984] Fps is (10 sec: 52451.8, 60 sec: 43690.6, 300 sec: 42653.9). Total num frames: 67371008. Throughput: 0: 10001.1. Samples: 16903680. Policy #0 lag: (min: 15.0, avg: 71.7, max: 271.0) [2024-06-15 11:57:40,738][1648984] Avg episode reward: [(0, '-0.930')] [2024-06-15 11:57:45,250][1652475] Updated weights for policy 0, policy_version 32966 (0.0017) [2024-06-15 11:57:45,738][1648984] Fps is (10 sec: 32767.6, 60 sec: 38776.3, 300 sec: 41654.2). Total num frames: 67534848. Throughput: 0: 10342.4. Samples: 16971264. Policy #0 lag: (min: 14.0, avg: 70.2, max: 270.0) [2024-06-15 11:57:45,738][1648984] Avg episode reward: [(0, '-0.930')] [2024-06-15 11:57:46,520][1652475] Updated weights for policy 0, policy_version 33024 (0.0015) [2024-06-15 11:57:48,820][1652475] Updated weights for policy 0, policy_version 33093 (0.0013) [2024-06-15 11:57:49,891][1652475] Updated weights for policy 0, policy_version 33152 (0.0014) [2024-06-15 11:57:50,738][1648984] Fps is (10 sec: 52427.7, 60 sec: 43690.5, 300 sec: 42653.9). Total num frames: 67895296. Throughput: 0: 9978.3. Samples: 16996864. 
Policy #0 lag: (min: 14.0, avg: 70.2, max: 270.0) [2024-06-15 11:57:50,739][1648984] Avg episode reward: [(0, '-0.930')] [2024-06-15 11:57:55,738][1648984] Fps is (10 sec: 36044.6, 60 sec: 39867.7, 300 sec: 41765.3). Total num frames: 67895296. Throughput: 0: 10399.2. Samples: 17071104. Policy #0 lag: (min: 14.0, avg: 70.2, max: 270.0) [2024-06-15 11:57:55,738][1648984] Avg episode reward: [(0, '-0.930')] [2024-06-15 11:57:57,884][1652475] Updated weights for policy 0, policy_version 33248 (0.0013) [2024-06-15 11:57:59,562][1652475] Updated weights for policy 0, policy_version 33296 (0.0024) [2024-06-15 11:58:00,738][1648984] Fps is (10 sec: 39322.7, 60 sec: 41506.2, 300 sec: 42209.6). Total num frames: 68288512. Throughput: 0: 10376.5. Samples: 17129472. Policy #0 lag: (min: 14.0, avg: 70.2, max: 270.0) [2024-06-15 11:58:00,738][1648984] Avg episode reward: [(0, '-0.930')] [2024-06-15 11:58:00,818][1651340] Signal inference workers to stop experience collection... (1800 times) [2024-06-15 11:58:00,853][1652475] InferenceWorker_p0-w0: stopping experience collection (1800 times) [2024-06-15 11:58:01,096][1651340] Signal inference workers to resume experience collection... (1800 times) [2024-06-15 11:58:01,098][1652475] InferenceWorker_p0-w0: resuming experience collection (1800 times) [2024-06-15 11:58:01,738][1652475] Updated weights for policy 0, policy_version 33392 (0.0014) [2024-06-15 11:58:05,738][1648984] Fps is (10 sec: 52429.7, 60 sec: 43690.7, 300 sec: 42320.7). Total num frames: 68419584. Throughput: 0: 10240.0. Samples: 17160704. Policy #0 lag: (min: 14.0, avg: 70.2, max: 270.0) [2024-06-15 11:58:05,738][1648984] Avg episode reward: [(0, '-0.930')] [2024-06-15 11:58:08,676][1652475] Updated weights for policy 0, policy_version 33441 (0.0014) [2024-06-15 11:58:10,399][1652475] Updated weights for policy 0, policy_version 33508 (0.0014) [2024-06-15 11:58:10,738][1648984] Fps is (10 sec: 36044.7, 60 sec: 40960.0, 300 sec: 41654.2). 
Total num frames: 68648960. Throughput: 0: 10900.0. Samples: 17233408. Policy #0 lag: (min: 5.0, avg: 64.6, max: 261.0) [2024-06-15 11:58:10,738][1648984] Avg episode reward: [(0, '-0.930')] [2024-06-15 11:58:12,485][1652475] Updated weights for policy 0, policy_version 33574 (0.0018) [2024-06-15 11:58:13,960][1652475] Updated weights for policy 0, policy_version 33661 (0.0014) [2024-06-15 11:58:15,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 43690.7, 300 sec: 42653.9). Total num frames: 68943872. Throughput: 0: 10605.1. Samples: 17295360. Policy #0 lag: (min: 5.0, avg: 64.6, max: 261.0) [2024-06-15 11:58:15,740][1648984] Avg episode reward: [(0, '-0.930')] [2024-06-15 11:58:20,668][1652475] Updated weights for policy 0, policy_version 33731 (0.0105) [2024-06-15 11:58:20,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 42598.4, 300 sec: 41765.3). Total num frames: 69074944. Throughput: 0: 10808.9. Samples: 17334784. Policy #0 lag: (min: 5.0, avg: 64.6, max: 261.0) [2024-06-15 11:58:20,738][1648984] Avg episode reward: [(0, '-0.930')] [2024-06-15 11:58:21,666][1652475] Updated weights for policy 0, policy_version 33779 (0.0015) [2024-06-15 11:58:23,824][1652475] Updated weights for policy 0, policy_version 33815 (0.0011) [2024-06-15 11:58:25,738][1648984] Fps is (10 sec: 45874.8, 60 sec: 42598.5, 300 sec: 42431.8). Total num frames: 69402624. Throughput: 0: 10922.7. Samples: 17395200. Policy #0 lag: (min: 5.0, avg: 64.6, max: 261.0) [2024-06-15 11:58:25,738][1648984] Avg episode reward: [(0, '-0.930')] [2024-06-15 11:58:25,962][1652475] Updated weights for policy 0, policy_version 33913 (0.0014) [2024-06-15 11:58:30,739][1648984] Fps is (10 sec: 42591.8, 60 sec: 44238.9, 300 sec: 42098.3). Total num frames: 69500928. Throughput: 0: 11002.0. Samples: 17466368. 
Policy #0 lag: (min: 9.0, avg: 87.8, max: 265.0) [2024-06-15 11:58:30,740][1648984] Avg episode reward: [(0, '-0.930')] [2024-06-15 11:58:31,802][1652475] Updated weights for policy 0, policy_version 33980 (0.0032) [2024-06-15 11:58:33,662][1652475] Updated weights for policy 0, policy_version 34038 (0.0012) [2024-06-15 11:58:35,602][1652475] Updated weights for policy 0, policy_version 34066 (0.0011) [2024-06-15 11:58:35,738][1648984] Fps is (10 sec: 36045.0, 60 sec: 42598.4, 300 sec: 41876.4). Total num frames: 69763072. Throughput: 0: 11013.7. Samples: 17492480. Policy #0 lag: (min: 9.0, avg: 87.8, max: 265.0) [2024-06-15 11:58:35,738][1648984] Avg episode reward: [(0, '-0.930')] [2024-06-15 11:58:37,109][1652475] Updated weights for policy 0, policy_version 34132 (0.0011) [2024-06-15 11:58:40,738][1648984] Fps is (10 sec: 49159.7, 60 sec: 43690.7, 300 sec: 42653.9). Total num frames: 69992448. Throughput: 0: 10854.4. Samples: 17559552. Policy #0 lag: (min: 9.0, avg: 87.8, max: 265.0) [2024-06-15 11:58:40,738][1648984] Avg episode reward: [(0, '-0.930')] [2024-06-15 11:58:42,426][1652475] Updated weights for policy 0, policy_version 34179 (0.0020) [2024-06-15 11:58:43,992][1652475] Updated weights for policy 0, policy_version 34242 (0.0014) [2024-06-15 11:58:44,936][1651340] Signal inference workers to stop experience collection... (1850 times) [2024-06-15 11:58:44,972][1652475] InferenceWorker_p0-w0: stopping experience collection (1850 times) [2024-06-15 11:58:45,165][1651340] Signal inference workers to resume experience collection... (1850 times) [2024-06-15 11:58:45,166][1652475] InferenceWorker_p0-w0: resuming experience collection (1850 times) [2024-06-15 11:58:45,416][1652475] Updated weights for policy 0, policy_version 34301 (0.0015) [2024-06-15 11:58:45,738][1648984] Fps is (10 sec: 49152.1, 60 sec: 45329.1, 300 sec: 41987.5). Total num frames: 70254592. Throughput: 0: 11013.7. Samples: 17625088. 
Policy #0 lag: (min: 9.0, avg: 87.8, max: 265.0) [2024-06-15 11:58:45,738][1648984] Avg episode reward: [(0, '-0.930')] [2024-06-15 11:58:48,425][1652475] Updated weights for policy 0, policy_version 34356 (0.0013) [2024-06-15 11:58:49,771][1652475] Updated weights for policy 0, policy_version 34423 (0.0013) [2024-06-15 11:58:50,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 43690.8, 300 sec: 42653.9). Total num frames: 70516736. Throughput: 0: 11218.5. Samples: 17665536. Policy #0 lag: (min: 9.0, avg: 87.8, max: 265.0) [2024-06-15 11:58:50,738][1648984] Avg episode reward: [(0, '-0.930')] [2024-06-15 11:58:54,331][1652475] Updated weights for policy 0, policy_version 34465 (0.0013) [2024-06-15 11:58:55,658][1652475] Updated weights for policy 0, policy_version 34512 (0.0014) [2024-06-15 11:58:55,738][1648984] Fps is (10 sec: 42597.6, 60 sec: 46421.3, 300 sec: 42431.8). Total num frames: 70680576. Throughput: 0: 11104.7. Samples: 17733120. Policy #0 lag: (min: 15.0, avg: 107.9, max: 271.0) [2024-06-15 11:58:55,738][1648984] Avg episode reward: [(0, '-0.930')] [2024-06-15 11:58:56,011][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000034528_70713344.pth... [2024-06-15 11:58:56,208][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000029568_60555264.pth [2024-06-15 11:58:59,358][1652475] Updated weights for policy 0, policy_version 34580 (0.0012) [2024-06-15 11:59:00,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 44236.8, 300 sec: 42320.7). Total num frames: 70942720. Throughput: 0: 11218.5. Samples: 17800192. 
Policy #0 lag: (min: 15.0, avg: 107.9, max: 271.0) [2024-06-15 11:59:00,738][1648984] Avg episode reward: [(0, '-0.930')] [2024-06-15 11:59:00,904][1652475] Updated weights for policy 0, policy_version 34656 (0.0013) [2024-06-15 11:59:05,487][1652475] Updated weights for policy 0, policy_version 34708 (0.0013) [2024-06-15 11:59:05,738][1648984] Fps is (10 sec: 42599.1, 60 sec: 44782.9, 300 sec: 42876.1). Total num frames: 71106560. Throughput: 0: 11093.3. Samples: 17833984. Policy #0 lag: (min: 15.0, avg: 107.9, max: 271.0) [2024-06-15 11:59:05,738][1648984] Avg episode reward: [(0, '-0.930')] [2024-06-15 11:59:07,254][1652475] Updated weights for policy 0, policy_version 34771 (0.0014) [2024-06-15 11:59:08,257][1652475] Updated weights for policy 0, policy_version 34815 (0.0013) [2024-06-15 11:59:10,742][1648984] Fps is (10 sec: 39303.4, 60 sec: 44779.5, 300 sec: 42431.1). Total num frames: 71335936. Throughput: 0: 11285.6. Samples: 17903104. Policy #0 lag: (min: 15.0, avg: 107.9, max: 271.0) [2024-06-15 11:59:10,743][1648984] Avg episode reward: [(0, '-0.930')] [2024-06-15 11:59:11,439][1652475] Updated weights for policy 0, policy_version 34880 (0.0015) [2024-06-15 11:59:15,738][1648984] Fps is (10 sec: 45875.7, 60 sec: 43690.7, 300 sec: 42654.0). Total num frames: 71565312. Throughput: 0: 11082.4. Samples: 17965056. Policy #0 lag: (min: 15.0, avg: 107.9, max: 271.0) [2024-06-15 11:59:15,738][1648984] Avg episode reward: [(0, '-0.800')] [2024-06-15 11:59:18,463][1652475] Updated weights for policy 0, policy_version 34946 (0.0013) [2024-06-15 11:59:19,716][1652475] Updated weights for policy 0, policy_version 35005 (0.0012) [2024-06-15 11:59:20,738][1648984] Fps is (10 sec: 39339.1, 60 sec: 44236.7, 300 sec: 42876.1). Total num frames: 71729152. Throughput: 0: 11264.0. Samples: 17999360. 
Policy #0 lag: (min: 11.0, avg: 98.7, max: 267.0) [2024-06-15 11:59:20,739][1648984] Avg episode reward: [(0, '-1.060')] [2024-06-15 11:59:21,818][1652475] Updated weights for policy 0, policy_version 35075 (0.0014) [2024-06-15 11:59:22,775][1652475] Updated weights for policy 0, policy_version 35127 (0.0019) [2024-06-15 11:59:23,956][1652475] Updated weights for policy 0, policy_version 35184 (0.0013) [2024-06-15 11:59:25,738][1648984] Fps is (10 sec: 52428.4, 60 sec: 44783.0, 300 sec: 42653.9). Total num frames: 72089600. Throughput: 0: 11138.8. Samples: 18060800. Policy #0 lag: (min: 11.0, avg: 98.7, max: 267.0) [2024-06-15 11:59:25,738][1648984] Avg episode reward: [(0, '-1.060')] [2024-06-15 11:59:30,305][1651340] Signal inference workers to stop experience collection... (1900 times) [2024-06-15 11:59:30,348][1652475] InferenceWorker_p0-w0: stopping experience collection (1900 times) [2024-06-15 11:59:30,588][1651340] Signal inference workers to resume experience collection... (1900 times) [2024-06-15 11:59:30,589][1652475] InferenceWorker_p0-w0: resuming experience collection (1900 times) [2024-06-15 11:59:30,738][1648984] Fps is (10 sec: 45876.3, 60 sec: 44784.1, 300 sec: 42987.2). Total num frames: 72187904. Throughput: 0: 11150.2. Samples: 18126848. Policy #0 lag: (min: 11.0, avg: 98.7, max: 267.0) [2024-06-15 11:59:30,738][1648984] Avg episode reward: [(0, '-0.840')] [2024-06-15 11:59:30,774][1652475] Updated weights for policy 0, policy_version 35258 (0.0013) [2024-06-15 11:59:32,481][1652475] Updated weights for policy 0, policy_version 35312 (0.0023) [2024-06-15 11:59:35,738][1648984] Fps is (10 sec: 29491.1, 60 sec: 43690.7, 300 sec: 42654.0). Total num frames: 72384512. Throughput: 0: 10945.4. Samples: 18158080. 
Policy #0 lag: (min: 11.0, avg: 98.7, max: 267.0) [2024-06-15 11:59:35,738][1648984] Avg episode reward: [(0, '-0.790')] [2024-06-15 11:59:36,894][1652475] Updated weights for policy 0, policy_version 35393 (0.0014) [2024-06-15 11:59:38,445][1652475] Updated weights for policy 0, policy_version 35456 (0.0013) [2024-06-15 11:59:40,738][1648984] Fps is (10 sec: 42598.0, 60 sec: 43690.6, 300 sec: 42653.9). Total num frames: 72613888. Throughput: 0: 10717.9. Samples: 18215424. Policy #0 lag: (min: 11.0, avg: 98.7, max: 267.0) [2024-06-15 11:59:40,738][1648984] Avg episode reward: [(0, '-0.790')] [2024-06-15 11:59:43,000][1652475] Updated weights for policy 0, policy_version 35515 (0.0017) [2024-06-15 11:59:44,986][1652475] Updated weights for policy 0, policy_version 35582 (0.0095) [2024-06-15 11:59:45,756][1648984] Fps is (10 sec: 49062.5, 60 sec: 43677.4, 300 sec: 43317.7). Total num frames: 72876032. Throughput: 0: 10713.5. Samples: 18282496. Policy #0 lag: (min: 9.0, avg: 106.1, max: 265.0) [2024-06-15 11:59:45,757][1648984] Avg episode reward: [(0, '-0.790')] [2024-06-15 11:59:48,765][1652475] Updated weights for policy 0, policy_version 35632 (0.0014) [2024-06-15 11:59:50,121][1652475] Updated weights for policy 0, policy_version 35682 (0.0012) [2024-06-15 11:59:50,738][1648984] Fps is (10 sec: 52429.3, 60 sec: 43690.7, 300 sec: 42765.0). Total num frames: 73138176. Throughput: 0: 10774.8. Samples: 18318848. Policy #0 lag: (min: 9.0, avg: 106.1, max: 265.0) [2024-06-15 11:59:50,738][1648984] Avg episode reward: [(0, '-0.790')] [2024-06-15 11:59:53,234][1652475] Updated weights for policy 0, policy_version 35715 (0.0013) [2024-06-15 11:59:54,604][1652475] Updated weights for policy 0, policy_version 35772 (0.0011) [2024-06-15 11:59:55,754][1648984] Fps is (10 sec: 42605.9, 60 sec: 43678.8, 300 sec: 43206.9). Total num frames: 73302016. Throughput: 0: 10658.2. Samples: 18382848. 
Policy #0 lag: (min: 9.0, avg: 106.1, max: 265.0) [2024-06-15 11:59:55,755][1648984] Avg episode reward: [(0, '-0.790')] [2024-06-15 11:59:56,499][1652475] Updated weights for policy 0, policy_version 35833 (0.0013) [2024-06-15 12:00:00,737][1648984] Fps is (10 sec: 32768.3, 60 sec: 42052.4, 300 sec: 42987.2). Total num frames: 73465856. Throughput: 0: 10865.8. Samples: 18454016. Policy #0 lag: (min: 9.0, avg: 106.1, max: 265.0) [2024-06-15 12:00:00,738][1648984] Avg episode reward: [(0, '-0.790')] [2024-06-15 12:00:01,308][1652475] Updated weights for policy 0, policy_version 35904 (0.0104) [2024-06-15 12:00:02,450][1652475] Updated weights for policy 0, policy_version 35952 (0.0014) [2024-06-15 12:00:05,738][1648984] Fps is (10 sec: 42669.2, 60 sec: 43690.7, 300 sec: 42876.1). Total num frames: 73728000. Throughput: 0: 10683.8. Samples: 18480128. Policy #0 lag: (min: 9.0, avg: 122.5, max: 265.0) [2024-06-15 12:00:05,738][1648984] Avg episode reward: [(0, '-0.790')] [2024-06-15 12:00:05,986][1652475] Updated weights for policy 0, policy_version 36017 (0.0013) [2024-06-15 12:00:08,960][1652475] Updated weights for policy 0, policy_version 36080 (0.0027) [2024-06-15 12:00:10,738][1648984] Fps is (10 sec: 45874.7, 60 sec: 43147.9, 300 sec: 43320.4). Total num frames: 73924608. Throughput: 0: 10740.6. Samples: 18544128. Policy #0 lag: (min: 9.0, avg: 122.5, max: 265.0) [2024-06-15 12:00:10,738][1648984] Avg episode reward: [(0, '-0.790')] [2024-06-15 12:00:12,025][1652475] Updated weights for policy 0, policy_version 36118 (0.0041) [2024-06-15 12:00:13,965][1652475] Updated weights for policy 0, policy_version 36208 (0.0095) [2024-06-15 12:00:14,413][1652475] Updated weights for policy 0, policy_version 36224 (0.0010) [2024-06-15 12:00:15,754][1648984] Fps is (10 sec: 45798.5, 60 sec: 43678.5, 300 sec: 42873.7). Total num frames: 74186752. Throughput: 0: 10793.5. Samples: 18612736. 
Policy #0 lag: (min: 9.0, avg: 122.5, max: 265.0) [2024-06-15 12:00:15,755][1648984] Avg episode reward: [(0, '-0.790')] [2024-06-15 12:00:16,822][1651340] Signal inference workers to stop experience collection... (1950 times) [2024-06-15 12:00:16,863][1652475] InferenceWorker_p0-w0: stopping experience collection (1950 times) [2024-06-15 12:00:17,073][1651340] Signal inference workers to resume experience collection... (1950 times) [2024-06-15 12:00:17,074][1652475] InferenceWorker_p0-w0: resuming experience collection (1950 times) [2024-06-15 12:00:19,690][1652475] Updated weights for policy 0, policy_version 36289 (0.0107) [2024-06-15 12:00:20,738][1648984] Fps is (10 sec: 45874.8, 60 sec: 44236.9, 300 sec: 43320.4). Total num frames: 74383360. Throughput: 0: 10820.3. Samples: 18644992. Policy #0 lag: (min: 9.0, avg: 122.5, max: 265.0) [2024-06-15 12:00:20,738][1648984] Avg episode reward: [(0, '-0.790')] [2024-06-15 12:00:21,095][1652475] Updated weights for policy 0, policy_version 36352 (0.0014) [2024-06-15 12:00:24,954][1652475] Updated weights for policy 0, policy_version 36401 (0.0019) [2024-06-15 12:00:25,738][1648984] Fps is (10 sec: 42669.6, 60 sec: 42052.3, 300 sec: 43098.3). Total num frames: 74612736. Throughput: 0: 11127.5. Samples: 18716160. Policy #0 lag: (min: 9.0, avg: 122.5, max: 265.0) [2024-06-15 12:00:25,738][1648984] Avg episode reward: [(0, '-0.790')] [2024-06-15 12:00:26,734][1652475] Updated weights for policy 0, policy_version 36480 (0.0024) [2024-06-15 12:00:30,738][1648984] Fps is (10 sec: 45875.4, 60 sec: 44236.7, 300 sec: 43098.2). Total num frames: 74842112. Throughput: 0: 11018.2. Samples: 18778112. Policy #0 lag: (min: 15.0, avg: 133.4, max: 271.0) [2024-06-15 12:00:30,738][1648984] Avg episode reward: [(0, '-0.790')] [2024-06-15 12:00:31,879][1652475] Updated weights for policy 0, policy_version 36560 (0.0013) [2024-06-15 12:00:35,738][1648984] Fps is (10 sec: 36044.9, 60 sec: 43144.6, 300 sec: 43209.4). 
Total num frames: 74973184. Throughput: 0: 10899.9. Samples: 18809344. Policy #0 lag: (min: 15.0, avg: 133.4, max: 271.0) [2024-06-15 12:00:35,738][1648984] Avg episode reward: [(0, '-0.790')] [2024-06-15 12:00:36,056][1652475] Updated weights for policy 0, policy_version 36628 (0.0016) [2024-06-15 12:00:38,265][1652475] Updated weights for policy 0, policy_version 36726 (0.0012) [2024-06-15 12:00:40,740][1648984] Fps is (10 sec: 39312.8, 60 sec: 43689.1, 300 sec: 42764.7). Total num frames: 75235328. Throughput: 0: 10914.7. Samples: 18873856. Policy #0 lag: (min: 15.0, avg: 133.4, max: 271.0) [2024-06-15 12:00:40,740][1648984] Avg episode reward: [(0, '-0.790')] [2024-06-15 12:00:40,946][1652475] Updated weights for policy 0, policy_version 36754 (0.0012) [2024-06-15 12:00:44,052][1652475] Updated weights for policy 0, policy_version 36805 (0.0013) [2024-06-15 12:00:45,274][1652475] Updated weights for policy 0, policy_version 36860 (0.0015) [2024-06-15 12:00:45,745][1648984] Fps is (10 sec: 52392.2, 60 sec: 43698.9, 300 sec: 43541.5). Total num frames: 75497472. Throughput: 0: 10795.8. Samples: 18939904. Policy #0 lag: (min: 15.0, avg: 133.4, max: 271.0) [2024-06-15 12:00:45,745][1648984] Avg episode reward: [(0, '-0.790')] [2024-06-15 12:00:49,000][1652475] Updated weights for policy 0, policy_version 36915 (0.0012) [2024-06-15 12:00:50,695][1652475] Updated weights for policy 0, policy_version 36986 (0.0012) [2024-06-15 12:00:50,738][1648984] Fps is (10 sec: 49163.4, 60 sec: 43144.5, 300 sec: 43320.4). Total num frames: 75726848. Throughput: 0: 11150.2. Samples: 18981888. Policy #0 lag: (min: 15.0, avg: 133.4, max: 271.0) [2024-06-15 12:00:50,738][1648984] Avg episode reward: [(0, '-0.790')] [2024-06-15 12:00:53,503][1652475] Updated weights for policy 0, policy_version 37045 (0.0012) [2024-06-15 12:00:55,738][1648984] Fps is (10 sec: 39347.7, 60 sec: 43156.2, 300 sec: 43098.2). Total num frames: 75890688. Throughput: 0: 10968.1. Samples: 19037696. 
Policy #0 lag: (min: 14.0, avg: 142.8, max: 270.0) [2024-06-15 12:00:55,738][1648984] Avg episode reward: [(0, '-0.790')] [2024-06-15 12:00:56,425][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000037088_75956224.pth... [2024-06-15 12:00:56,533][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000032000_65536000.pth [2024-06-15 12:00:57,086][1652475] Updated weights for policy 0, policy_version 37112 (0.0014) [2024-06-15 12:01:00,737][1648984] Fps is (10 sec: 32768.2, 60 sec: 43144.5, 300 sec: 43209.3). Total num frames: 76054528. Throughput: 0: 11029.2. Samples: 19108864. Policy #0 lag: (min: 14.0, avg: 142.8, max: 270.0) [2024-06-15 12:01:00,738][1648984] Avg episode reward: [(0, '-0.790')] [2024-06-15 12:01:01,480][1652475] Updated weights for policy 0, policy_version 37168 (0.0012) [2024-06-15 12:01:02,329][1651340] Signal inference workers to stop experience collection... (2000 times) [2024-06-15 12:01:02,382][1652475] InferenceWorker_p0-w0: stopping experience collection (2000 times) [2024-06-15 12:01:02,495][1651340] Signal inference workers to resume experience collection... (2000 times) [2024-06-15 12:01:02,499][1652475] InferenceWorker_p0-w0: resuming experience collection (2000 times) [2024-06-15 12:01:03,116][1652475] Updated weights for policy 0, policy_version 37243 (0.0016) [2024-06-15 12:01:05,738][1648984] Fps is (10 sec: 49153.6, 60 sec: 44236.7, 300 sec: 42987.2). Total num frames: 76382208. Throughput: 0: 10854.4. Samples: 19133440. Policy #0 lag: (min: 14.0, avg: 142.8, max: 270.0) [2024-06-15 12:01:05,738][1648984] Avg episode reward: [(0, '-0.790')] [2024-06-15 12:01:05,790][1652475] Updated weights for policy 0, policy_version 37310 (0.0026) [2024-06-15 12:01:09,471][1652475] Updated weights for policy 0, policy_version 37369 (0.0014) [2024-06-15 12:01:10,738][1648984] Fps is (10 sec: 49151.1, 60 sec: 43690.6, 300 sec: 43542.6). Total num frames: 76546048. 
Throughput: 0: 10717.9. Samples: 19198464. Policy #0 lag: (min: 14.0, avg: 142.8, max: 270.0) [2024-06-15 12:01:10,738][1648984] Avg episode reward: [(0, '-0.790')] [2024-06-15 12:01:13,006][1652475] Updated weights for policy 0, policy_version 37414 (0.0057) [2024-06-15 12:01:14,940][1652475] Updated weights for policy 0, policy_version 37493 (0.0013) [2024-06-15 12:01:15,738][1648984] Fps is (10 sec: 42597.8, 60 sec: 43702.7, 300 sec: 43320.4). Total num frames: 76808192. Throughput: 0: 10808.9. Samples: 19264512. Policy #0 lag: (min: 14.0, avg: 142.8, max: 270.0) [2024-06-15 12:01:15,739][1648984] Avg episode reward: [(0, '-0.790')] [2024-06-15 12:01:17,258][1652475] Updated weights for policy 0, policy_version 37537 (0.0014) [2024-06-15 12:01:20,376][1652475] Updated weights for policy 0, policy_version 37586 (0.0025) [2024-06-15 12:01:20,738][1648984] Fps is (10 sec: 45875.5, 60 sec: 43690.7, 300 sec: 43320.4). Total num frames: 77004800. Throughput: 0: 10877.2. Samples: 19298816. Policy #0 lag: (min: 15.0, avg: 145.4, max: 271.0) [2024-06-15 12:01:20,738][1648984] Avg episode reward: [(0, '-0.790')] [2024-06-15 12:01:24,216][1652475] Updated weights for policy 0, policy_version 37651 (0.0014) [2024-06-15 12:01:25,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 43690.5, 300 sec: 43653.6). Total num frames: 77234176. Throughput: 0: 11048.3. Samples: 19371008. Policy #0 lag: (min: 15.0, avg: 145.4, max: 271.0) [2024-06-15 12:01:25,739][1648984] Avg episode reward: [(0, '-0.820')] [2024-06-15 12:01:26,362][1652475] Updated weights for policy 0, policy_version 37745 (0.0120) [2024-06-15 12:01:30,209][1652475] Updated weights for policy 0, policy_version 37794 (0.0012) [2024-06-15 12:01:30,738][1648984] Fps is (10 sec: 45873.9, 60 sec: 43690.5, 300 sec: 43098.2). Total num frames: 77463552. Throughput: 0: 10901.5. Samples: 19430400. 
Policy #0 lag: (min: 15.0, avg: 145.4, max: 271.0) [2024-06-15 12:01:30,739][1648984] Avg episode reward: [(0, '-0.650')] [2024-06-15 12:01:30,740][1651340] Saving new best policy, reward=-0.650! [2024-06-15 12:01:32,457][1652475] Updated weights for policy 0, policy_version 37830 (0.0014) [2024-06-15 12:01:33,749][1652475] Updated weights for policy 0, policy_version 37886 (0.0011) [2024-06-15 12:01:35,738][1648984] Fps is (10 sec: 39322.1, 60 sec: 44236.8, 300 sec: 43653.6). Total num frames: 77627392. Throughput: 0: 10626.8. Samples: 19460096. Policy #0 lag: (min: 15.0, avg: 145.4, max: 271.0) [2024-06-15 12:01:35,738][1648984] Avg episode reward: [(0, '-0.730')] [2024-06-15 12:01:36,882][1652475] Updated weights for policy 0, policy_version 37954 (0.0018) [2024-06-15 12:01:38,350][1652475] Updated weights for policy 0, policy_version 38014 (0.0013) [2024-06-15 12:01:40,797][1648984] Fps is (10 sec: 39091.7, 60 sec: 43649.3, 300 sec: 42867.7). Total num frames: 77856768. Throughput: 0: 10635.7. Samples: 19516928. Policy #0 lag: (min: 15.0, avg: 145.4, max: 271.0) [2024-06-15 12:01:40,797][1648984] Avg episode reward: [(0, '-0.610')] [2024-06-15 12:01:40,814][1651340] Saving new best policy, reward=-0.610! [2024-06-15 12:01:44,968][1652475] Updated weights for policy 0, policy_version 38075 (0.0013) [2024-06-15 12:01:45,738][1648984] Fps is (10 sec: 36045.1, 60 sec: 41511.0, 300 sec: 43098.3). Total num frames: 77987840. Throughput: 0: 10569.9. Samples: 19584512. Policy #0 lag: (min: 15.0, avg: 91.5, max: 271.0) [2024-06-15 12:01:45,756][1648984] Avg episode reward: [(0, '-0.710')] [2024-06-15 12:01:46,756][1652475] Updated weights for policy 0, policy_version 38140 (0.0012) [2024-06-15 12:01:49,367][1651340] Signal inference workers to stop experience collection... 
(2050 times) [2024-06-15 12:01:49,406][1652475] InferenceWorker_p0-w0: stopping experience collection (2050 times) [2024-06-15 12:01:49,638][1651340] Signal inference workers to resume experience collection... (2050 times) [2024-06-15 12:01:49,639][1652475] InferenceWorker_p0-w0: resuming experience collection (2050 times) [2024-06-15 12:01:50,019][1652475] Updated weights for policy 0, policy_version 38224 (0.0218) [2024-06-15 12:01:50,738][1648984] Fps is (10 sec: 49444.3, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 78348288. Throughput: 0: 10752.0. Samples: 19617280. Policy #0 lag: (min: 15.0, avg: 91.5, max: 271.0) [2024-06-15 12:01:50,738][1648984] Avg episode reward: [(0, '-0.600')] [2024-06-15 12:01:50,927][1651340] Saving new best policy, reward=-0.600! [2024-06-15 12:01:55,594][1652475] Updated weights for policy 0, policy_version 38289 (0.0028) [2024-06-15 12:01:55,738][1648984] Fps is (10 sec: 42598.2, 60 sec: 42052.5, 300 sec: 42765.0). Total num frames: 78413824. Throughput: 0: 10604.1. Samples: 19675648. Policy #0 lag: (min: 15.0, avg: 91.5, max: 271.0) [2024-06-15 12:01:55,738][1648984] Avg episode reward: [(0, '-0.820')] [2024-06-15 12:01:56,758][1652475] Updated weights for policy 0, policy_version 38336 (0.0012) [2024-06-15 12:01:58,097][1652475] Updated weights for policy 0, policy_version 38389 (0.0017) [2024-06-15 12:02:00,738][1648984] Fps is (10 sec: 29490.9, 60 sec: 43144.4, 300 sec: 43542.6). Total num frames: 78643200. Throughput: 0: 10740.6. Samples: 19747840. Policy #0 lag: (min: 15.0, avg: 91.5, max: 271.0) [2024-06-15 12:02:00,738][1648984] Avg episode reward: [(0, '-0.820')] [2024-06-15 12:02:01,712][1652475] Updated weights for policy 0, policy_version 38456 (0.0013) [2024-06-15 12:02:02,898][1652475] Updated weights for policy 0, policy_version 38496 (0.0013) [2024-06-15 12:02:05,738][1648984] Fps is (10 sec: 49151.7, 60 sec: 42052.2, 300 sec: 43098.2). Total num frames: 78905344. Throughput: 0: 10626.8. Samples: 19777024. 
Policy #0 lag: (min: 15.0, avg: 91.5, max: 271.0) [2024-06-15 12:02:05,738][1648984] Avg episode reward: [(0, '-0.820')] [2024-06-15 12:02:07,233][1652475] Updated weights for policy 0, policy_version 38544 (0.0119) [2024-06-15 12:02:08,839][1652475] Updated weights for policy 0, policy_version 38608 (0.0013) [2024-06-15 12:02:10,738][1648984] Fps is (10 sec: 52429.1, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 79167488. Throughput: 0: 10444.8. Samples: 19841024. Policy #0 lag: (min: 3.0, avg: 131.5, max: 259.0) [2024-06-15 12:02:10,738][1648984] Avg episode reward: [(0, '-0.820')] [2024-06-15 12:02:12,842][1652475] Updated weights for policy 0, policy_version 38672 (0.0011) [2024-06-15 12:02:13,869][1652475] Updated weights for policy 0, policy_version 38718 (0.0012) [2024-06-15 12:02:15,396][1652475] Updated weights for policy 0, policy_version 38782 (0.0013) [2024-06-15 12:02:15,739][1648984] Fps is (10 sec: 52422.2, 60 sec: 43689.8, 300 sec: 43764.5). Total num frames: 79429632. Throughput: 0: 10649.4. Samples: 19909632. Policy #0 lag: (min: 3.0, avg: 131.5, max: 259.0) [2024-06-15 12:02:15,739][1648984] Avg episode reward: [(0, '-0.820')] [2024-06-15 12:02:19,333][1652475] Updated weights for policy 0, policy_version 38832 (0.0015) [2024-06-15 12:02:20,738][1648984] Fps is (10 sec: 45875.0, 60 sec: 43690.6, 300 sec: 43320.4). Total num frames: 79626240. Throughput: 0: 10843.0. Samples: 19948032. Policy #0 lag: (min: 3.0, avg: 131.5, max: 259.0) [2024-06-15 12:02:20,738][1648984] Avg episode reward: [(0, '-0.820')] [2024-06-15 12:02:20,841][1652475] Updated weights for policy 0, policy_version 38883 (0.0013) [2024-06-15 12:02:24,726][1652475] Updated weights for policy 0, policy_version 38946 (0.0011) [2024-06-15 12:02:25,738][1648984] Fps is (10 sec: 39326.7, 60 sec: 43144.7, 300 sec: 43987.5). Total num frames: 79822848. Throughput: 0: 11187.7. Samples: 20019712. 
Policy #0 lag: (min: 3.0, avg: 131.5, max: 259.0) [2024-06-15 12:02:25,738][1648984] Avg episode reward: [(0, '-0.820')] [2024-06-15 12:02:27,336][1652475] Updated weights for policy 0, policy_version 39024 (0.0017) [2024-06-15 12:02:29,463][1652475] Updated weights for policy 0, policy_version 39044 (0.0013) [2024-06-15 12:02:30,738][1648984] Fps is (10 sec: 45875.8, 60 sec: 43690.9, 300 sec: 43653.6). Total num frames: 80084992. Throughput: 0: 11082.0. Samples: 20083200. Policy #0 lag: (min: 3.0, avg: 131.5, max: 259.0) [2024-06-15 12:02:30,738][1648984] Avg episode reward: [(0, '-0.820')] [2024-06-15 12:02:31,740][1652475] Updated weights for policy 0, policy_version 39120 (0.0014) [2024-06-15 12:02:32,954][1652475] Updated weights for policy 0, policy_version 39165 (0.0013) [2024-06-15 12:02:35,738][1648984] Fps is (10 sec: 39320.4, 60 sec: 43144.3, 300 sec: 43542.5). Total num frames: 80216064. Throughput: 0: 10990.8. Samples: 20111872. Policy #0 lag: (min: 3.0, avg: 131.5, max: 259.0) [2024-06-15 12:02:35,739][1648984] Avg episode reward: [(0, '-0.820')] [2024-06-15 12:02:36,905][1652475] Updated weights for policy 0, policy_version 39225 (0.0014) [2024-06-15 12:02:38,146][1651340] Signal inference workers to stop experience collection... (2100 times) [2024-06-15 12:02:38,273][1652475] InferenceWorker_p0-w0: stopping experience collection (2100 times) [2024-06-15 12:02:38,468][1651340] Signal inference workers to resume experience collection... (2100 times) [2024-06-15 12:02:38,469][1652475] InferenceWorker_p0-w0: resuming experience collection (2100 times) [2024-06-15 12:02:39,211][1652475] Updated weights for policy 0, policy_version 39269 (0.0027) [2024-06-15 12:02:40,738][1648984] Fps is (10 sec: 39320.9, 60 sec: 43733.6, 300 sec: 43875.8). Total num frames: 80478208. Throughput: 0: 11195.7. Samples: 20179456. 
Policy #0 lag: (min: 15.0, avg: 112.4, max: 271.0) [2024-06-15 12:02:40,738][1648984] Avg episode reward: [(0, '-0.820')] [2024-06-15 12:02:41,618][1652475] Updated weights for policy 0, policy_version 39298 (0.0018) [2024-06-15 12:02:42,795][1652475] Updated weights for policy 0, policy_version 39351 (0.0015) [2024-06-15 12:02:44,041][1652475] Updated weights for policy 0, policy_version 39392 (0.0011) [2024-06-15 12:02:45,738][1648984] Fps is (10 sec: 52430.1, 60 sec: 45875.1, 300 sec: 43542.6). Total num frames: 80740352. Throughput: 0: 11059.2. Samples: 20245504. Policy #0 lag: (min: 15.0, avg: 112.4, max: 271.0) [2024-06-15 12:02:45,738][1648984] Avg episode reward: [(0, '-0.820')] [2024-06-15 12:02:48,529][1652475] Updated weights for policy 0, policy_version 39476 (0.0012) [2024-06-15 12:02:50,471][1652475] Updated weights for policy 0, policy_version 39520 (0.0019) [2024-06-15 12:02:50,738][1648984] Fps is (10 sec: 45875.6, 60 sec: 43144.5, 300 sec: 44209.1). Total num frames: 80936960. Throughput: 0: 11264.0. Samples: 20283904. Policy #0 lag: (min: 15.0, avg: 112.4, max: 271.0) [2024-06-15 12:02:50,738][1648984] Avg episode reward: [(0, '-0.820')] [2024-06-15 12:02:52,910][1652475] Updated weights for policy 0, policy_version 39555 (0.0012) [2024-06-15 12:02:54,997][1652475] Updated weights for policy 0, policy_version 39633 (0.0014) [2024-06-15 12:02:55,738][1648984] Fps is (10 sec: 49152.0, 60 sec: 46967.4, 300 sec: 43875.8). Total num frames: 81231872. Throughput: 0: 11309.5. Samples: 20349952. Policy #0 lag: (min: 15.0, avg: 112.4, max: 271.0) [2024-06-15 12:02:55,738][1648984] Avg episode reward: [(0, '-0.820')] [2024-06-15 12:02:56,027][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000039680_81264640.pth... 
[2024-06-15 12:02:56,080][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000034528_70713344.pth [2024-06-15 12:02:59,318][1652475] Updated weights for policy 0, policy_version 39700 (0.0024) [2024-06-15 12:03:00,738][1648984] Fps is (10 sec: 45875.0, 60 sec: 45875.2, 300 sec: 43986.9). Total num frames: 81395712. Throughput: 0: 11309.8. Samples: 20418560. Policy #0 lag: (min: 15.0, avg: 112.4, max: 271.0) [2024-06-15 12:03:00,738][1648984] Avg episode reward: [(0, '-0.820')] [2024-06-15 12:03:02,536][1652475] Updated weights for policy 0, policy_version 39777 (0.0015) [2024-06-15 12:03:05,194][1652475] Updated weights for policy 0, policy_version 39827 (0.0036) [2024-06-15 12:03:05,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 45329.1, 300 sec: 43986.9). Total num frames: 81625088. Throughput: 0: 11161.6. Samples: 20450304. Policy #0 lag: (min: 6.0, avg: 112.7, max: 262.0) [2024-06-15 12:03:05,738][1648984] Avg episode reward: [(0, '-0.820')] [2024-06-15 12:03:06,854][1652475] Updated weights for policy 0, policy_version 39891 (0.0025) [2024-06-15 12:03:10,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 81788928. Throughput: 0: 11070.6. Samples: 20517888. Policy #0 lag: (min: 6.0, avg: 112.7, max: 262.0) [2024-06-15 12:03:10,738][1648984] Avg episode reward: [(0, '-0.820')] [2024-06-15 12:03:11,086][1652475] Updated weights for policy 0, policy_version 39960 (0.0014) [2024-06-15 12:03:13,844][1652475] Updated weights for policy 0, policy_version 40018 (0.0014) [2024-06-15 12:03:14,783][1652475] Updated weights for policy 0, policy_version 40060 (0.0016) [2024-06-15 12:03:15,738][1648984] Fps is (10 sec: 42598.8, 60 sec: 43691.7, 300 sec: 43986.9). Total num frames: 82051072. Throughput: 0: 11241.2. Samples: 20589056. 
Policy #0 lag: (min: 6.0, avg: 112.7, max: 262.0) [2024-06-15 12:03:15,738][1648984] Avg episode reward: [(0, '-0.820')] [2024-06-15 12:03:16,994][1652475] Updated weights for policy 0, policy_version 40101 (0.0012) [2024-06-15 12:03:18,388][1652475] Updated weights for policy 0, policy_version 40144 (0.0016) [2024-06-15 12:03:20,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 44783.0, 300 sec: 43764.7). Total num frames: 82313216. Throughput: 0: 11343.7. Samples: 20622336. Policy #0 lag: (min: 6.0, avg: 112.7, max: 262.0) [2024-06-15 12:03:20,738][1648984] Avg episode reward: [(0, '-0.820')] [2024-06-15 12:03:21,957][1652475] Updated weights for policy 0, policy_version 40197 (0.0136) [2024-06-15 12:03:22,909][1652475] Updated weights for policy 0, policy_version 40252 (0.0012) [2024-06-15 12:03:25,226][1651340] Signal inference workers to stop experience collection... (2150 times) [2024-06-15 12:03:25,264][1652475] InferenceWorker_p0-w0: stopping experience collection (2150 times) [2024-06-15 12:03:25,540][1651340] Signal inference workers to resume experience collection... (2150 times) [2024-06-15 12:03:25,541][1652475] InferenceWorker_p0-w0: resuming experience collection (2150 times) [2024-06-15 12:03:25,676][1652475] Updated weights for policy 0, policy_version 40307 (0.0014) [2024-06-15 12:03:25,739][1648984] Fps is (10 sec: 49143.8, 60 sec: 45327.8, 300 sec: 44209.0). Total num frames: 82542592. Throughput: 0: 11479.8. Samples: 20696064. Policy #0 lag: (min: 6.0, avg: 112.7, max: 262.0) [2024-06-15 12:03:25,740][1648984] Avg episode reward: [(0, '-0.820')] [2024-06-15 12:03:27,805][1652475] Updated weights for policy 0, policy_version 40336 (0.0012) [2024-06-15 12:03:30,296][1652475] Updated weights for policy 0, policy_version 40401 (0.0012) [2024-06-15 12:03:30,738][1648984] Fps is (10 sec: 45875.4, 60 sec: 44782.9, 300 sec: 44098.0). Total num frames: 82771968. Throughput: 0: 11264.0. Samples: 20752384. 
Policy #0 lag: (min: 15.0, avg: 124.5, max: 271.0) [2024-06-15 12:03:30,738][1648984] Avg episode reward: [(0, '-0.820')] [2024-06-15 12:03:31,224][1652475] Updated weights for policy 0, policy_version 40448 (0.0104) [2024-06-15 12:03:34,472][1652475] Updated weights for policy 0, policy_version 40507 (0.0017) [2024-06-15 12:03:35,738][1648984] Fps is (10 sec: 42605.5, 60 sec: 45875.5, 300 sec: 43986.9). Total num frames: 82968576. Throughput: 0: 11332.3. Samples: 20793856. Policy #0 lag: (min: 15.0, avg: 124.5, max: 271.0) [2024-06-15 12:03:35,738][1648984] Avg episode reward: [(0, '-0.820')] [2024-06-15 12:03:37,966][1652475] Updated weights for policy 0, policy_version 40560 (0.0011) [2024-06-15 12:03:40,738][1648984] Fps is (10 sec: 39321.3, 60 sec: 44783.0, 300 sec: 43764.7). Total num frames: 83165184. Throughput: 0: 11286.8. Samples: 20857856. Policy #0 lag: (min: 15.0, avg: 124.5, max: 271.0) [2024-06-15 12:03:40,738][1648984] Avg episode reward: [(0, '-0.820')] [2024-06-15 12:03:40,940][1652475] Updated weights for policy 0, policy_version 40635 (0.0014) [2024-06-15 12:03:42,713][1652475] Updated weights for policy 0, policy_version 40700 (0.0015) [2024-06-15 12:03:45,739][1648984] Fps is (10 sec: 42597.2, 60 sec: 44236.7, 300 sec: 43653.6). Total num frames: 83394560. Throughput: 0: 11195.7. Samples: 20922368. Policy #0 lag: (min: 15.0, avg: 124.5, max: 271.0) [2024-06-15 12:03:45,741][1648984] Avg episode reward: [(0, '-0.830')] [2024-06-15 12:03:46,375][1652475] Updated weights for policy 0, policy_version 40765 (0.0015) [2024-06-15 12:03:49,480][1652475] Updated weights for policy 0, policy_version 40826 (0.0116) [2024-06-15 12:03:50,738][1648984] Fps is (10 sec: 45875.4, 60 sec: 44782.9, 300 sec: 43875.8). Total num frames: 83623936. Throughput: 0: 11241.2. Samples: 20956160. 
Policy #0 lag: (min: 15.0, avg: 124.5, max: 271.0) [2024-06-15 12:03:50,738][1648984] Avg episode reward: [(0, '-0.650')] [2024-06-15 12:03:54,046][1652475] Updated weights for policy 0, policy_version 40887 (0.0022) [2024-06-15 12:03:55,478][1652475] Updated weights for policy 0, policy_version 40932 (0.0013) [2024-06-15 12:03:55,738][1648984] Fps is (10 sec: 45875.9, 60 sec: 43690.6, 300 sec: 43764.7). Total num frames: 83853312. Throughput: 0: 11138.8. Samples: 21019136. Policy #0 lag: (min: 47.0, avg: 138.5, max: 303.0) [2024-06-15 12:03:55,738][1648984] Avg episode reward: [(0, '-0.640')] [2024-06-15 12:03:57,207][1652475] Updated weights for policy 0, policy_version 40992 (0.0014) [2024-06-15 12:04:00,738][1648984] Fps is (10 sec: 45875.3, 60 sec: 44783.0, 300 sec: 43986.9). Total num frames: 84082688. Throughput: 0: 11093.3. Samples: 21088256. Policy #0 lag: (min: 47.0, avg: 138.5, max: 303.0) [2024-06-15 12:04:00,738][1648984] Avg episode reward: [(0, '-0.620')] [2024-06-15 12:04:01,402][1652475] Updated weights for policy 0, policy_version 41081 (0.0104) [2024-06-15 12:04:05,362][1652475] Updated weights for policy 0, policy_version 41136 (0.0013) [2024-06-15 12:04:05,738][1648984] Fps is (10 sec: 42598.8, 60 sec: 44236.8, 300 sec: 43876.5). Total num frames: 84279296. Throughput: 0: 10945.4. Samples: 21114880. Policy #0 lag: (min: 47.0, avg: 138.5, max: 303.0) [2024-06-15 12:04:05,738][1648984] Avg episode reward: [(0, '-0.800')] [2024-06-15 12:04:07,441][1652475] Updated weights for policy 0, policy_version 41201 (0.0013) [2024-06-15 12:04:10,739][1648984] Fps is (10 sec: 32766.9, 60 sec: 43690.4, 300 sec: 43542.5). Total num frames: 84410368. Throughput: 0: 10627.1. Samples: 21174272. 
Policy #0 lag: (min: 47.0, avg: 138.5, max: 303.0) [2024-06-15 12:04:10,741][1648984] Avg episode reward: [(0, '-0.760')] [2024-06-15 12:04:12,490][1652475] Updated weights for policy 0, policy_version 41264 (0.0012) [2024-06-15 12:04:15,205][1652475] Updated weights for policy 0, policy_version 41314 (0.0012) [2024-06-15 12:04:15,571][1651340] Signal inference workers to stop experience collection... (2200 times) [2024-06-15 12:04:15,636][1652475] InferenceWorker_p0-w0: stopping experience collection (2200 times) [2024-06-15 12:04:15,738][1648984] Fps is (10 sec: 36044.8, 60 sec: 43144.5, 300 sec: 43764.8). Total num frames: 84639744. Throughput: 0: 10934.0. Samples: 21244416. Policy #0 lag: (min: 47.0, avg: 138.5, max: 303.0) [2024-06-15 12:04:15,738][1648984] Avg episode reward: [(0, '-0.760')] [2024-06-15 12:04:15,874][1651340] Signal inference workers to resume experience collection... (2200 times) [2024-06-15 12:04:15,889][1652475] InferenceWorker_p0-w0: resuming experience collection (2200 times) [2024-06-15 12:04:16,696][1652475] Updated weights for policy 0, policy_version 41376 (0.0013) [2024-06-15 12:04:17,933][1652475] Updated weights for policy 0, policy_version 41409 (0.0013) [2024-06-15 12:04:20,738][1648984] Fps is (10 sec: 52430.8, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 84934656. Throughput: 0: 10604.1. Samples: 21271040. Policy #0 lag: (min: 47.0, avg: 138.5, max: 303.0) [2024-06-15 12:04:20,738][1648984] Avg episode reward: [(0, '-0.760')] [2024-06-15 12:04:23,161][1652475] Updated weights for policy 0, policy_version 41474 (0.0119) [2024-06-15 12:04:25,738][1648984] Fps is (10 sec: 42597.9, 60 sec: 42053.3, 300 sec: 43653.6). Total num frames: 85065728. Throughput: 0: 10763.4. Samples: 21342208. 
Policy #0 lag: (min: 15.0, avg: 106.1, max: 271.0) [2024-06-15 12:04:25,738][1648984] Avg episode reward: [(0, '-0.760')] [2024-06-15 12:04:26,152][1652475] Updated weights for policy 0, policy_version 41538 (0.0015) [2024-06-15 12:04:27,633][1652475] Updated weights for policy 0, policy_version 41600 (0.0013) [2024-06-15 12:04:29,205][1652475] Updated weights for policy 0, policy_version 41658 (0.0125) [2024-06-15 12:04:30,713][1652475] Updated weights for policy 0, policy_version 41727 (0.0015) [2024-06-15 12:04:30,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 44782.9, 300 sec: 44320.1). Total num frames: 85458944. Throughput: 0: 10615.5. Samples: 21400064. Policy #0 lag: (min: 15.0, avg: 106.1, max: 271.0) [2024-06-15 12:04:30,738][1648984] Avg episode reward: [(0, '-0.760')] [2024-06-15 12:04:35,738][1648984] Fps is (10 sec: 42599.0, 60 sec: 42052.3, 300 sec: 43653.7). Total num frames: 85491712. Throughput: 0: 10774.8. Samples: 21441024. Policy #0 lag: (min: 15.0, avg: 106.1, max: 271.0) [2024-06-15 12:04:35,738][1648984] Avg episode reward: [(0, '-0.760')] [2024-06-15 12:04:36,339][1652475] Updated weights for policy 0, policy_version 41776 (0.0012) [2024-06-15 12:04:39,004][1652475] Updated weights for policy 0, policy_version 41840 (0.0014) [2024-06-15 12:04:40,738][1648984] Fps is (10 sec: 36044.8, 60 sec: 44236.9, 300 sec: 43878.5). Total num frames: 85819392. Throughput: 0: 10922.7. Samples: 21510656. Policy #0 lag: (min: 15.0, avg: 106.1, max: 271.0) [2024-06-15 12:04:40,738][1648984] Avg episode reward: [(0, '-0.760')] [2024-06-15 12:04:40,892][1652475] Updated weights for policy 0, policy_version 41910 (0.0011) [2024-06-15 12:04:42,512][1652475] Updated weights for policy 0, policy_version 41957 (0.0046) [2024-06-15 12:04:45,738][1648984] Fps is (10 sec: 49152.0, 60 sec: 43144.7, 300 sec: 43542.6). Total num frames: 85983232. Throughput: 0: 10820.3. Samples: 21575168. 
Policy #0 lag: (min: 15.0, avg: 106.1, max: 271.0) [2024-06-15 12:04:45,738][1648984] Avg episode reward: [(0, '-0.760')] [2024-06-15 12:04:47,613][1652475] Updated weights for policy 0, policy_version 42023 (0.0016) [2024-06-15 12:04:49,999][1652475] Updated weights for policy 0, policy_version 42065 (0.0015) [2024-06-15 12:04:50,737][1648984] Fps is (10 sec: 39322.1, 60 sec: 43144.7, 300 sec: 43767.2). Total num frames: 86212608. Throughput: 0: 11036.5. Samples: 21611520. Policy #0 lag: (min: 16.0, avg: 101.3, max: 272.0) [2024-06-15 12:04:50,738][1648984] Avg episode reward: [(0, '-0.760')] [2024-06-15 12:04:51,565][1652475] Updated weights for policy 0, policy_version 42128 (0.0010) [2024-06-15 12:04:53,286][1652475] Updated weights for policy 0, policy_version 42178 (0.0029) [2024-06-15 12:04:55,738][1648984] Fps is (10 sec: 52428.3, 60 sec: 44236.8, 300 sec: 44209.0). Total num frames: 86507520. Throughput: 0: 10979.6. Samples: 21668352. Policy #0 lag: (min: 16.0, avg: 101.3, max: 272.0) [2024-06-15 12:04:55,738][1648984] Avg episode reward: [(0, '-0.760')] [2024-06-15 12:04:55,756][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000042240_86507520.pth... [2024-06-15 12:04:55,820][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000037088_75956224.pth [2024-06-15 12:04:59,207][1652475] Updated weights for policy 0, policy_version 42272 (0.0013) [2024-06-15 12:05:00,738][1648984] Fps is (10 sec: 42597.8, 60 sec: 42598.4, 300 sec: 43764.7). Total num frames: 86638592. Throughput: 0: 11070.6. Samples: 21742592. Policy #0 lag: (min: 16.0, avg: 101.3, max: 272.0) [2024-06-15 12:05:00,738][1648984] Avg episode reward: [(0, '-0.760')] [2024-06-15 12:05:01,408][1651340] Signal inference workers to stop experience collection... 
(2250 times) [2024-06-15 12:05:01,460][1652475] InferenceWorker_p0-w0: stopping experience collection (2250 times) [2024-06-15 12:05:01,694][1651340] Signal inference workers to resume experience collection... (2250 times) [2024-06-15 12:05:01,695][1652475] InferenceWorker_p0-w0: resuming experience collection (2250 times) [2024-06-15 12:05:02,566][1652475] Updated weights for policy 0, policy_version 42352 (0.0034) [2024-06-15 12:05:04,457][1652475] Updated weights for policy 0, policy_version 42432 (0.0032) [2024-06-15 12:05:05,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 44236.8, 300 sec: 44097.9). Total num frames: 86933504. Throughput: 0: 11195.7. Samples: 21774848. Policy #0 lag: (min: 16.0, avg: 101.3, max: 272.0) [2024-06-15 12:05:05,738][1648984] Avg episode reward: [(0, '-0.760')] [2024-06-15 12:05:06,664][1652475] Updated weights for policy 0, policy_version 42496 (0.0012) [2024-06-15 12:05:10,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 43690.9, 300 sec: 43545.0). Total num frames: 87031808. Throughput: 0: 11047.8. Samples: 21839360. Policy #0 lag: (min: 16.0, avg: 101.3, max: 272.0) [2024-06-15 12:05:10,738][1648984] Avg episode reward: [(0, '-0.760')] [2024-06-15 12:05:11,693][1652475] Updated weights for policy 0, policy_version 42557 (0.0014) [2024-06-15 12:05:14,618][1652475] Updated weights for policy 0, policy_version 42612 (0.0012) [2024-06-15 12:05:15,738][1648984] Fps is (10 sec: 42598.7, 60 sec: 45329.1, 300 sec: 43986.9). Total num frames: 87359488. Throughput: 0: 11161.6. Samples: 21902336. Policy #0 lag: (min: 16.0, avg: 101.3, max: 272.0) [2024-06-15 12:05:15,738][1648984] Avg episode reward: [(0, '-0.760')] [2024-06-15 12:05:16,217][1652475] Updated weights for policy 0, policy_version 42686 (0.0015) [2024-06-15 12:05:20,738][1648984] Fps is (10 sec: 52428.5, 60 sec: 43690.6, 300 sec: 43875.8). Total num frames: 87556096. Throughput: 0: 10877.1. Samples: 21930496. 
Policy #0 lag: (min: 2.0, avg: 147.1, max: 258.0) [2024-06-15 12:05:20,738][1648984] Avg episode reward: [(0, '-0.760')] [2024-06-15 12:05:22,603][1652475] Updated weights for policy 0, policy_version 42757 (0.0013) [2024-06-15 12:05:23,886][1652475] Updated weights for policy 0, policy_version 42815 (0.0013) [2024-06-15 12:05:25,771][1648984] Fps is (10 sec: 32658.4, 60 sec: 43666.3, 300 sec: 43537.6). Total num frames: 87687168. Throughput: 0: 10914.5. Samples: 22002176. Policy #0 lag: (min: 2.0, avg: 147.1, max: 258.0) [2024-06-15 12:05:25,772][1648984] Avg episode reward: [(0, '-0.760')] [2024-06-15 12:05:26,648][1652475] Updated weights for policy 0, policy_version 42864 (0.0015) [2024-06-15 12:05:28,176][1652475] Updated weights for policy 0, policy_version 42935 (0.0026) [2024-06-15 12:05:29,788][1652475] Updated weights for policy 0, policy_version 42961 (0.0013) [2024-06-15 12:05:30,738][1648984] Fps is (10 sec: 49152.2, 60 sec: 43144.5, 300 sec: 44320.1). Total num frames: 88047616. Throughput: 0: 10911.3. Samples: 22066176. Policy #0 lag: (min: 2.0, avg: 147.1, max: 258.0) [2024-06-15 12:05:30,738][1648984] Avg episode reward: [(0, '-0.760')] [2024-06-15 12:05:34,108][1652475] Updated weights for policy 0, policy_version 43011 (0.0013) [2024-06-15 12:05:35,738][1648984] Fps is (10 sec: 52605.3, 60 sec: 45329.0, 300 sec: 43987.2). Total num frames: 88211456. Throughput: 0: 10990.9. Samples: 22106112. Policy #0 lag: (min: 2.0, avg: 147.1, max: 258.0) [2024-06-15 12:05:35,738][1648984] Avg episode reward: [(0, '-0.760')] [2024-06-15 12:05:37,156][1652475] Updated weights for policy 0, policy_version 43088 (0.0021) [2024-06-15 12:05:38,489][1652475] Updated weights for policy 0, policy_version 43142 (0.0015) [2024-06-15 12:05:40,751][1648984] Fps is (10 sec: 42543.3, 60 sec: 44227.2, 300 sec: 43986.0). Total num frames: 88473600. Throughput: 0: 11067.4. Samples: 22166528. 
Policy #0 lag: (min: 2.0, avg: 147.1, max: 258.0) [2024-06-15 12:05:40,751][1648984] Avg episode reward: [(0, '-0.760')] [2024-06-15 12:05:41,188][1652475] Updated weights for policy 0, policy_version 43203 (0.0014) [2024-06-15 12:05:42,280][1652475] Updated weights for policy 0, policy_version 43251 (0.0014) [2024-06-15 12:05:45,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 43690.7, 300 sec: 43653.6). Total num frames: 88604672. Throughput: 0: 11116.1. Samples: 22242816. Policy #0 lag: (min: 2.0, avg: 147.1, max: 258.0) [2024-06-15 12:05:45,738][1648984] Avg episode reward: [(0, '-0.760')] [2024-06-15 12:05:46,436][1651340] Signal inference workers to stop experience collection... (2300 times) [2024-06-15 12:05:46,486][1652475] InferenceWorker_p0-w0: stopping experience collection (2300 times) [2024-06-15 12:05:46,510][1652475] Updated weights for policy 0, policy_version 43282 (0.0014) [2024-06-15 12:05:46,677][1651340] Signal inference workers to resume experience collection... (2300 times) [2024-06-15 12:05:46,679][1652475] InferenceWorker_p0-w0: resuming experience collection (2300 times) [2024-06-15 12:05:49,275][1652475] Updated weights for policy 0, policy_version 43376 (0.0085) [2024-06-15 12:05:50,762][1648984] Fps is (10 sec: 45822.5, 60 sec: 45310.5, 300 sec: 44205.4). Total num frames: 88932352. Throughput: 0: 11144.2. Samples: 22276608. Policy #0 lag: (min: 15.0, avg: 99.4, max: 271.0) [2024-06-15 12:05:50,763][1648984] Avg episode reward: [(0, '-0.760')] [2024-06-15 12:05:51,242][1652475] Updated weights for policy 0, policy_version 43456 (0.0011) [2024-06-15 12:05:54,263][1652475] Updated weights for policy 0, policy_version 43517 (0.0012) [2024-06-15 12:05:55,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 43690.7, 300 sec: 44320.1). Total num frames: 89128960. Throughput: 0: 10888.5. Samples: 22329344. 
Policy #0 lag: (min: 15.0, avg: 99.4, max: 271.0) [2024-06-15 12:05:55,738][1648984] Avg episode reward: [(0, '-0.760')] [2024-06-15 12:05:59,451][1652475] Updated weights for policy 0, policy_version 43554 (0.0012) [2024-06-15 12:06:00,516][1652475] Updated weights for policy 0, policy_version 43600 (0.0013) [2024-06-15 12:06:00,738][1648984] Fps is (10 sec: 36131.2, 60 sec: 44236.4, 300 sec: 43764.6). Total num frames: 89292800. Throughput: 0: 11195.6. Samples: 22406144. Policy #0 lag: (min: 15.0, avg: 99.4, max: 271.0) [2024-06-15 12:06:00,740][1648984] Avg episode reward: [(0, '-0.590')] [2024-06-15 12:06:01,375][1651340] Saving new best policy, reward=-0.590! [2024-06-15 12:06:02,264][1652475] Updated weights for policy 0, policy_version 43650 (0.0014) [2024-06-15 12:06:03,312][1652475] Updated weights for policy 0, policy_version 43712 (0.0082) [2024-06-15 12:06:05,738][1648984] Fps is (10 sec: 42596.8, 60 sec: 43690.4, 300 sec: 44097.9). Total num frames: 89554944. Throughput: 0: 11116.0. Samples: 22430720. Policy #0 lag: (min: 15.0, avg: 99.4, max: 271.0) [2024-06-15 12:06:05,739][1648984] Avg episode reward: [(0, '-0.610')] [2024-06-15 12:06:10,738][1648984] Fps is (10 sec: 39323.5, 60 sec: 44236.7, 300 sec: 43653.7). Total num frames: 89686016. Throughput: 0: 10999.1. Samples: 22496768. Policy #0 lag: (min: 15.0, avg: 99.4, max: 271.0) [2024-06-15 12:06:10,738][1648984] Avg episode reward: [(0, '-0.660')] [2024-06-15 12:06:11,036][1652475] Updated weights for policy 0, policy_version 43808 (0.0163) [2024-06-15 12:06:13,452][1652475] Updated weights for policy 0, policy_version 43894 (0.0022) [2024-06-15 12:06:15,738][1648984] Fps is (10 sec: 36046.3, 60 sec: 42598.4, 300 sec: 43764.7). Total num frames: 89915392. Throughput: 0: 10888.5. Samples: 22556160. 
Policy #0 lag: (min: 15.0, avg: 99.4, max: 271.0) [2024-06-15 12:06:15,738][1648984] Avg episode reward: [(0, '-0.650')] [2024-06-15 12:06:16,801][1652475] Updated weights for policy 0, policy_version 43962 (0.0013) [2024-06-15 12:06:19,692][1652475] Updated weights for policy 0, policy_version 44026 (0.0013) [2024-06-15 12:06:20,738][1648984] Fps is (10 sec: 49152.5, 60 sec: 43690.7, 300 sec: 43875.8). Total num frames: 90177536. Throughput: 0: 10695.1. Samples: 22587392. Policy #0 lag: (min: 15.0, avg: 128.8, max: 271.0) [2024-06-15 12:06:20,740][1648984] Avg episode reward: [(0, '-0.620')] [2024-06-15 12:06:25,196][1652475] Updated weights for policy 0, policy_version 44087 (0.0013) [2024-06-15 12:06:25,738][1648984] Fps is (10 sec: 39319.5, 60 sec: 43714.8, 300 sec: 43542.5). Total num frames: 90308608. Throughput: 0: 10857.4. Samples: 22654976. Policy #0 lag: (min: 15.0, avg: 128.8, max: 271.0) [2024-06-15 12:06:25,739][1648984] Avg episode reward: [(0, '-0.560')] [2024-06-15 12:06:26,295][1651340] Saving new best policy, reward=-0.560! [2024-06-15 12:06:27,315][1652475] Updated weights for policy 0, policy_version 44160 (0.0013) [2024-06-15 12:06:30,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 42052.3, 300 sec: 43875.8). Total num frames: 90570752. Throughput: 0: 10228.6. Samples: 22703104. Policy #0 lag: (min: 15.0, avg: 128.8, max: 271.0) [2024-06-15 12:06:30,738][1648984] Avg episode reward: [(0, '-0.740')] [2024-06-15 12:06:30,892][1652475] Updated weights for policy 0, policy_version 44240 (0.0013) [2024-06-15 12:06:32,014][1652475] Updated weights for policy 0, policy_version 44288 (0.0021) [2024-06-15 12:06:35,758][1648984] Fps is (10 sec: 39243.2, 60 sec: 41492.0, 300 sec: 43548.3). Total num frames: 90701824. Throughput: 0: 10229.5. Samples: 22736896. 
Policy #0 lag: (min: 15.0, avg: 128.8, max: 271.0) [2024-06-15 12:06:35,759][1648984] Avg episode reward: [(0, '-0.800')] [2024-06-15 12:06:37,537][1651340] Signal inference workers to stop experience collection... (2350 times) [2024-06-15 12:06:37,630][1652475] InferenceWorker_p0-w0: stopping experience collection (2350 times) [2024-06-15 12:06:37,826][1651340] Signal inference workers to resume experience collection... (2350 times) [2024-06-15 12:06:37,828][1652475] InferenceWorker_p0-w0: resuming experience collection (2350 times) [2024-06-15 12:06:38,461][1652475] Updated weights for policy 0, policy_version 44337 (0.0013) [2024-06-15 12:06:39,784][1652475] Updated weights for policy 0, policy_version 44400 (0.0014) [2024-06-15 12:06:40,754][1648984] Fps is (10 sec: 42528.1, 60 sec: 42049.8, 300 sec: 44095.5). Total num frames: 90996736. Throughput: 0: 10668.4. Samples: 22809600. Policy #0 lag: (min: 15.0, avg: 128.8, max: 271.0) [2024-06-15 12:06:40,755][1648984] Avg episode reward: [(0, '-0.800')] [2024-06-15 12:06:42,543][1652475] Updated weights for policy 0, policy_version 44482 (0.0128) [2024-06-15 12:06:45,739][1648984] Fps is (10 sec: 52530.4, 60 sec: 43689.8, 300 sec: 43653.5). Total num frames: 91226112. Throughput: 0: 10217.1. Samples: 22865920. Policy #0 lag: (min: 15.0, avg: 128.8, max: 271.0) [2024-06-15 12:06:45,739][1648984] Avg episode reward: [(0, '-0.800')] [2024-06-15 12:06:49,701][1652475] Updated weights for policy 0, policy_version 44547 (0.0013) [2024-06-15 12:06:50,738][1648984] Fps is (10 sec: 32822.3, 60 sec: 39884.0, 300 sec: 43764.7). Total num frames: 91324416. Throughput: 0: 10581.4. Samples: 22906880. 
Policy #0 lag: (min: 10.0, avg: 73.7, max: 266.0) [2024-06-15 12:06:50,738][1648984] Avg episode reward: [(0, '-0.800')] [2024-06-15 12:06:52,767][1652475] Updated weights for policy 0, policy_version 44688 (0.0013) [2024-06-15 12:06:54,570][1652475] Updated weights for policy 0, policy_version 44737 (0.0014) [2024-06-15 12:06:55,738][1648984] Fps is (10 sec: 45880.2, 60 sec: 42598.4, 300 sec: 44209.0). Total num frames: 91684864. Throughput: 0: 10308.3. Samples: 22960640. Policy #0 lag: (min: 10.0, avg: 73.7, max: 266.0) [2024-06-15 12:06:55,738][1648984] Avg episode reward: [(0, '-0.800')] [2024-06-15 12:06:56,083][1652475] Updated weights for policy 0, policy_version 44794 (0.0020) [2024-06-15 12:06:56,164][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000044800_91750400.pth... [2024-06-15 12:06:56,219][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000039680_81264640.pth [2024-06-15 12:07:00,738][1648984] Fps is (10 sec: 42598.1, 60 sec: 40960.3, 300 sec: 43542.6). Total num frames: 91750400. Throughput: 0: 10592.7. Samples: 23032832. Policy #0 lag: (min: 10.0, avg: 73.7, max: 266.0) [2024-06-15 12:07:00,738][1648984] Avg episode reward: [(0, '-0.800')] [2024-06-15 12:07:03,800][1652475] Updated weights for policy 0, policy_version 44873 (0.0013) [2024-06-15 12:07:05,738][1648984] Fps is (10 sec: 36044.9, 60 sec: 41506.4, 300 sec: 43653.6). Total num frames: 92045312. Throughput: 0: 10649.6. Samples: 23066624. Policy #0 lag: (min: 10.0, avg: 73.7, max: 266.0) [2024-06-15 12:07:05,738][1648984] Avg episode reward: [(0, '-0.800')] [2024-06-15 12:07:06,051][1652475] Updated weights for policy 0, policy_version 44962 (0.0015) [2024-06-15 12:07:10,740][1648984] Fps is (10 sec: 52416.7, 60 sec: 43142.9, 300 sec: 43542.4). Total num frames: 92274688. Throughput: 0: 10194.1. Samples: 23113728. 
Policy #0 lag: (min: 10.0, avg: 73.7, max: 266.0) [2024-06-15 12:07:10,741][1648984] Avg episode reward: [(0, '-0.800')] [2024-06-15 12:07:14,939][1652475] Updated weights for policy 0, policy_version 45057 (0.0013) [2024-06-15 12:07:15,738][1648984] Fps is (10 sec: 29491.3, 60 sec: 40413.8, 300 sec: 43098.3). Total num frames: 92340224. Throughput: 0: 10945.4. Samples: 23195648. Policy #0 lag: (min: 10.0, avg: 73.7, max: 266.0) [2024-06-15 12:07:15,738][1648984] Avg episode reward: [(0, '-0.800')] [2024-06-15 12:07:17,916][1652475] Updated weights for policy 0, policy_version 45184 (0.0205) [2024-06-15 12:07:18,049][1651340] Signal inference workers to stop experience collection... (2400 times) [2024-06-15 12:07:18,139][1652475] InferenceWorker_p0-w0: stopping experience collection (2400 times) [2024-06-15 12:07:18,363][1651340] Signal inference workers to resume experience collection... (2400 times) [2024-06-15 12:07:18,364][1652475] InferenceWorker_p0-w0: resuming experience collection (2400 times) [2024-06-15 12:07:20,050][1652475] Updated weights for policy 0, policy_version 45280 (0.0073) [2024-06-15 12:07:20,738][1648984] Fps is (10 sec: 49164.0, 60 sec: 43144.6, 300 sec: 43875.8). Total num frames: 92766208. Throughput: 0: 10631.7. Samples: 23215104. Policy #0 lag: (min: 158.0, avg: 187.9, max: 383.0) [2024-06-15 12:07:20,738][1648984] Avg episode reward: [(0, '-0.800')] [2024-06-15 12:07:25,744][1648984] Fps is (10 sec: 45846.1, 60 sec: 41502.1, 300 sec: 43097.3). Total num frames: 92798976. Throughput: 0: 10561.0. Samples: 23284736. Policy #0 lag: (min: 158.0, avg: 187.9, max: 383.0) [2024-06-15 12:07:25,745][1648984] Avg episode reward: [(0, '-0.800')] [2024-06-15 12:07:27,980][1652475] Updated weights for policy 0, policy_version 45344 (0.0011) [2024-06-15 12:07:30,003][1652475] Updated weights for policy 0, policy_version 45424 (0.0014) [2024-06-15 12:07:30,738][1648984] Fps is (10 sec: 29490.8, 60 sec: 41506.1, 300 sec: 43542.6). 
Total num frames: 93061120. Throughput: 0: 10661.2. Samples: 23345664. Policy #0 lag: (min: 158.0, avg: 187.9, max: 383.0) [2024-06-15 12:07:30,738][1648984] Avg episode reward: [(0, '-0.800')] [2024-06-15 12:07:31,506][1652475] Updated weights for policy 0, policy_version 45488 (0.0013) [2024-06-15 12:07:33,365][1652475] Updated weights for policy 0, policy_version 45555 (0.0022) [2024-06-15 12:07:35,738][1648984] Fps is (10 sec: 52461.7, 60 sec: 43705.5, 300 sec: 43542.6). Total num frames: 93323264. Throughput: 0: 10228.6. Samples: 23367168. Policy #0 lag: (min: 158.0, avg: 187.9, max: 383.0) [2024-06-15 12:07:35,738][1648984] Avg episode reward: [(0, '-0.800')] [2024-06-15 12:07:40,738][1648984] Fps is (10 sec: 29491.5, 60 sec: 39332.4, 300 sec: 42765.0). Total num frames: 93356032. Throughput: 0: 10763.4. Samples: 23444992. Policy #0 lag: (min: 158.0, avg: 187.9, max: 383.0) [2024-06-15 12:07:40,738][1648984] Avg episode reward: [(0, '-0.800')] [2024-06-15 12:07:40,894][1652475] Updated weights for policy 0, policy_version 45600 (0.0028) [2024-06-15 12:07:42,806][1652475] Updated weights for policy 0, policy_version 45665 (0.0013) [2024-06-15 12:07:44,093][1652475] Updated weights for policy 0, policy_version 45715 (0.0013) [2024-06-15 12:07:45,533][1652475] Updated weights for policy 0, policy_version 45781 (0.0011) [2024-06-15 12:07:45,740][1648984] Fps is (10 sec: 45875.3, 60 sec: 42599.2, 300 sec: 43542.6). Total num frames: 93782016. Throughput: 0: 10331.0. Samples: 23497728. Policy #0 lag: (min: 158.0, avg: 187.9, max: 383.0) [2024-06-15 12:07:45,741][1648984] Avg episode reward: [(0, '-0.800')] [2024-06-15 12:07:50,738][1648984] Fps is (10 sec: 49152.3, 60 sec: 42052.3, 300 sec: 42765.0). Total num frames: 93847552. Throughput: 0: 10296.9. Samples: 23529984. 
Policy #0 lag: (min: 158.0, avg: 187.9, max: 383.0) [2024-06-15 12:07:50,738][1648984] Avg episode reward: [(0, '-0.800')] [2024-06-15 12:07:53,327][1652475] Updated weights for policy 0, policy_version 45856 (0.0125) [2024-06-15 12:07:55,471][1652475] Updated weights for policy 0, policy_version 45936 (0.0249) [2024-06-15 12:07:55,738][1648984] Fps is (10 sec: 29491.3, 60 sec: 39867.8, 300 sec: 42987.2). Total num frames: 94076928. Throughput: 0: 10763.9. Samples: 23598080. Policy #0 lag: (min: 15.0, avg: 74.6, max: 271.0) [2024-06-15 12:07:55,738][1648984] Avg episode reward: [(0, '-0.800')] [2024-06-15 12:07:57,219][1652475] Updated weights for policy 0, policy_version 46005 (0.0014) [2024-06-15 12:07:58,032][1651340] Signal inference workers to stop experience collection... (2450 times) [2024-06-15 12:07:58,082][1652475] InferenceWorker_p0-w0: stopping experience collection (2450 times) [2024-06-15 12:07:58,301][1651340] Signal inference workers to resume experience collection... (2450 times) [2024-06-15 12:07:58,304][1652475] InferenceWorker_p0-w0: resuming experience collection (2450 times) [2024-06-15 12:07:58,878][1652475] Updated weights for policy 0, policy_version 46070 (0.0012) [2024-06-15 12:08:00,738][1648984] Fps is (10 sec: 52428.2, 60 sec: 43690.7, 300 sec: 43209.3). Total num frames: 94371840. Throughput: 0: 10092.1. Samples: 23649792. Policy #0 lag: (min: 15.0, avg: 74.6, max: 271.0) [2024-06-15 12:08:00,738][1648984] Avg episode reward: [(0, '-0.800')] [2024-06-15 12:08:05,738][1648984] Fps is (10 sec: 32767.3, 60 sec: 39321.5, 300 sec: 42765.0). Total num frames: 94404608. Throughput: 0: 10524.4. Samples: 23688704. 
Policy #0 lag: (min: 15.0, avg: 74.6, max: 271.0) [2024-06-15 12:08:05,738][1648984] Avg episode reward: [(0, '-0.800')] [2024-06-15 12:08:06,120][1652475] Updated weights for policy 0, policy_version 46115 (0.0011) [2024-06-15 12:08:08,427][1652475] Updated weights for policy 0, policy_version 46212 (0.0012) [2024-06-15 12:08:10,453][1652475] Updated weights for policy 0, policy_version 46288 (0.0013) [2024-06-15 12:08:10,757][1648984] Fps is (10 sec: 42517.7, 60 sec: 42040.6, 300 sec: 43206.5). Total num frames: 94797824. Throughput: 0: 10305.4. Samples: 23748608. Policy #0 lag: (min: 15.0, avg: 74.6, max: 271.0) [2024-06-15 12:08:10,757][1648984] Avg episode reward: [(0, '-0.800')] [2024-06-15 12:08:11,597][1652475] Updated weights for policy 0, policy_version 46335 (0.0023) [2024-06-15 12:08:15,740][1648984] Fps is (10 sec: 49152.8, 60 sec: 42598.4, 300 sec: 42653.9). Total num frames: 94896128. Throughput: 0: 10444.8. Samples: 23815680. Policy #0 lag: (min: 15.0, avg: 74.6, max: 271.0) [2024-06-15 12:08:15,741][1648984] Avg episode reward: [(0, '-0.800')] [2024-06-15 12:08:18,442][1652475] Updated weights for policy 0, policy_version 46387 (0.0013) [2024-06-15 12:08:20,250][1652475] Updated weights for policy 0, policy_version 46464 (0.0121) [2024-06-15 12:08:20,762][1648984] Fps is (10 sec: 39300.8, 60 sec: 40397.5, 300 sec: 42872.8). Total num frames: 95191040. Throughput: 0: 10814.4. Samples: 23854080. Policy #0 lag: (min: 15.0, avg: 74.6, max: 271.0) [2024-06-15 12:08:20,763][1648984] Avg episode reward: [(0, '-0.780')] [2024-06-15 12:08:22,435][1652475] Updated weights for policy 0, policy_version 46544 (0.0019) [2024-06-15 12:08:25,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 43695.2, 300 sec: 42876.1). Total num frames: 95420416. Throughput: 0: 10217.2. Samples: 23904768. 
Policy #0 lag: (min: 95.0, avg: 213.8, max: 351.0) [2024-06-15 12:08:25,738][1648984] Avg episode reward: [(0, '-0.700')] [2024-06-15 12:08:30,738][1648984] Fps is (10 sec: 26278.0, 60 sec: 39867.7, 300 sec: 42320.7). Total num frames: 95453184. Throughput: 0: 10513.1. Samples: 23970816. Policy #0 lag: (min: 95.0, avg: 213.8, max: 351.0) [2024-06-15 12:08:30,738][1648984] Avg episode reward: [(0, '-0.760')] [2024-06-15 12:08:30,987][1652475] Updated weights for policy 0, policy_version 46625 (0.0123) [2024-06-15 12:08:32,400][1652475] Updated weights for policy 0, policy_version 46676 (0.0013) [2024-06-15 12:08:34,746][1652475] Updated weights for policy 0, policy_version 46779 (0.0022) [2024-06-15 12:08:35,738][1648984] Fps is (10 sec: 39321.2, 60 sec: 41506.1, 300 sec: 42876.1). Total num frames: 95813632. Throughput: 0: 10274.1. Samples: 23992320. Policy #0 lag: (min: 95.0, avg: 213.8, max: 351.0) [2024-06-15 12:08:35,739][1648984] Avg episode reward: [(0, '-0.680')] [2024-06-15 12:08:37,151][1652475] Updated weights for policy 0, policy_version 46845 (0.0124) [2024-06-15 12:08:40,738][1648984] Fps is (10 sec: 49152.5, 60 sec: 43144.5, 300 sec: 42542.9). Total num frames: 95944704. Throughput: 0: 10205.9. Samples: 24057344. Policy #0 lag: (min: 95.0, avg: 213.8, max: 351.0) [2024-06-15 12:08:40,738][1648984] Avg episode reward: [(0, '-0.670')] [2024-06-15 12:08:44,120][1651340] Signal inference workers to stop experience collection... (2500 times) [2024-06-15 12:08:44,167][1652475] InferenceWorker_p0-w0: stopping experience collection (2500 times) [2024-06-15 12:08:44,174][1652475] Updated weights for policy 0, policy_version 46917 (0.0013) [2024-06-15 12:08:44,439][1651340] Signal inference workers to resume experience collection... (2500 times) [2024-06-15 12:08:44,440][1652475] InferenceWorker_p0-w0: resuming experience collection (2500 times) [2024-06-15 12:08:45,738][1648984] Fps is (10 sec: 39322.2, 60 sec: 40413.9, 300 sec: 42653.9). 
Total num frames: 96206848. Throughput: 0: 10376.5. Samples: 24116736. Policy #0 lag: (min: 95.0, avg: 213.8, max: 351.0) [2024-06-15 12:08:45,738][1648984] Avg episode reward: [(0, '-0.650')] [2024-06-15 12:08:46,102][1652475] Updated weights for policy 0, policy_version 46992 (0.0014) [2024-06-15 12:08:49,152][1652475] Updated weights for policy 0, policy_version 47043 (0.0013) [2024-06-15 12:08:50,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 43690.6, 300 sec: 42765.0). Total num frames: 96468992. Throughput: 0: 10069.4. Samples: 24141824. Policy #0 lag: (min: 95.0, avg: 213.8, max: 351.0) [2024-06-15 12:08:50,738][1648984] Avg episode reward: [(0, '-0.600')] [2024-06-15 12:08:55,738][1648984] Fps is (10 sec: 26214.3, 60 sec: 39867.7, 300 sec: 41987.5). Total num frames: 96468992. Throughput: 0: 10119.1. Samples: 24203776. Policy #0 lag: (min: 95.0, avg: 213.8, max: 351.0) [2024-06-15 12:08:55,738][1648984] Avg episode reward: [(0, '-0.600')] [2024-06-15 12:08:55,759][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000047104_96468992.pth... [2024-06-15 12:08:55,819][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000042240_86507520.pth [2024-06-15 12:08:56,523][1652475] Updated weights for policy 0, policy_version 47107 (0.0015) [2024-06-15 12:08:58,144][1652475] Updated weights for policy 0, policy_version 47185 (0.0123) [2024-06-15 12:08:59,789][1652475] Updated weights for policy 0, policy_version 47264 (0.0020) [2024-06-15 12:09:00,746][1648984] Fps is (10 sec: 39288.3, 60 sec: 41500.3, 300 sec: 42652.7). Total num frames: 96862208. Throughput: 0: 9987.8. Samples: 24265216. Policy #0 lag: (min: 15.0, avg: 44.8, max: 175.0) [2024-06-15 12:09:00,747][1648984] Avg episode reward: [(0, '-0.540')] [2024-06-15 12:09:01,302][1651340] Saving new best policy, reward=-0.540! 
[2024-06-15 12:09:01,906][1652475] Updated weights for policy 0, policy_version 47354 (0.0015) [2024-06-15 12:09:05,738][1648984] Fps is (10 sec: 52428.4, 60 sec: 43144.6, 300 sec: 42654.0). Total num frames: 96993280. Throughput: 0: 9858.5. Samples: 24297472. Policy #0 lag: (min: 15.0, avg: 44.8, max: 175.0) [2024-06-15 12:09:05,739][1648984] Avg episode reward: [(0, '-0.540')] [2024-06-15 12:09:08,971][1652475] Updated weights for policy 0, policy_version 47418 (0.0014) [2024-06-15 12:09:10,397][1652475] Updated weights for policy 0, policy_version 47474 (0.0014) [2024-06-15 12:09:10,738][1648984] Fps is (10 sec: 39354.9, 60 sec: 40973.0, 300 sec: 42765.0). Total num frames: 97255424. Throughput: 0: 10456.2. Samples: 24375296. Policy #0 lag: (min: 15.0, avg: 44.8, max: 175.0) [2024-06-15 12:09:10,738][1648984] Avg episode reward: [(0, '-0.540')] [2024-06-15 12:09:11,844][1652475] Updated weights for policy 0, policy_version 47536 (0.0014) [2024-06-15 12:09:13,625][1652475] Updated weights for policy 0, policy_version 47606 (0.0011) [2024-06-15 12:09:15,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 43690.6, 300 sec: 42653.9). Total num frames: 97517568. Throughput: 0: 10285.5. Samples: 24433664. Policy #0 lag: (min: 15.0, avg: 44.8, max: 175.0) [2024-06-15 12:09:15,739][1648984] Avg episode reward: [(0, '-0.540')] [2024-06-15 12:09:20,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 40976.6, 300 sec: 42653.9). Total num frames: 97648640. Throughput: 0: 10763.4. Samples: 24476672. 
Policy #0 lag: (min: 15.0, avg: 44.8, max: 175.0) [2024-06-15 12:09:20,738][1648984] Avg episode reward: [(0, '-0.540')] [2024-06-15 12:09:21,466][1652475] Updated weights for policy 0, policy_version 47682 (0.0013) [2024-06-15 12:09:23,303][1652475] Updated weights for policy 0, policy_version 47760 (0.0013) [2024-06-15 12:09:24,278][1652475] Updated weights for policy 0, policy_version 47795 (0.0012) [2024-06-15 12:09:25,052][1651340] Signal inference workers to stop experience collection... (2550 times) [2024-06-15 12:09:25,112][1652475] InferenceWorker_p0-w0: stopping experience collection (2550 times) [2024-06-15 12:09:25,370][1651340] Signal inference workers to resume experience collection... (2550 times) [2024-06-15 12:09:25,371][1652475] InferenceWorker_p0-w0: resuming experience collection (2550 times) [2024-06-15 12:09:25,738][1648984] Fps is (10 sec: 45874.7, 60 sec: 42598.3, 300 sec: 42431.8). Total num frames: 97976320. Throughput: 0: 10626.8. Samples: 24535552. Policy #0 lag: (min: 15.0, avg: 44.8, max: 175.0) [2024-06-15 12:09:25,739][1648984] Avg episode reward: [(0, '-0.540')] [2024-06-15 12:09:25,805][1652475] Updated weights for policy 0, policy_version 47856 (0.0034) [2024-06-15 12:09:30,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 43144.6, 300 sec: 42542.9). Total num frames: 98041856. Throughput: 0: 10945.4. Samples: 24609280. Policy #0 lag: (min: 15.0, avg: 44.8, max: 175.0) [2024-06-15 12:09:30,738][1648984] Avg episode reward: [(0, '-0.540')] [2024-06-15 12:09:31,500][1652475] Updated weights for policy 0, policy_version 47920 (0.0015) [2024-06-15 12:09:34,228][1652475] Updated weights for policy 0, policy_version 47970 (0.0117) [2024-06-15 12:09:35,715][1652475] Updated weights for policy 0, policy_version 48032 (0.0021) [2024-06-15 12:09:35,738][1648984] Fps is (10 sec: 39322.2, 60 sec: 42598.5, 300 sec: 42542.8). Total num frames: 98369536. Throughput: 0: 11104.7. Samples: 24641536. 
Policy #0 lag: (min: 15.0, avg: 88.6, max: 271.0) [2024-06-15 12:09:35,738][1648984] Avg episode reward: [(0, '-0.540')] [2024-06-15 12:09:37,883][1652475] Updated weights for policy 0, policy_version 48112 (0.0014) [2024-06-15 12:09:40,739][1648984] Fps is (10 sec: 52420.1, 60 sec: 43689.5, 300 sec: 42653.7). Total num frames: 98566144. Throughput: 0: 10910.9. Samples: 24694784. Policy #0 lag: (min: 15.0, avg: 88.6, max: 271.0) [2024-06-15 12:09:40,740][1648984] Avg episode reward: [(0, '-0.540')] [2024-06-15 12:09:43,231][1652475] Updated weights for policy 0, policy_version 48160 (0.0012) [2024-06-15 12:09:45,738][1648984] Fps is (10 sec: 32768.3, 60 sec: 41506.2, 300 sec: 42320.7). Total num frames: 98697216. Throughput: 0: 11209.2. Samples: 24769536. Policy #0 lag: (min: 15.0, avg: 88.6, max: 271.0) [2024-06-15 12:09:45,738][1648984] Avg episode reward: [(0, '-0.540')] [2024-06-15 12:09:46,536][1652475] Updated weights for policy 0, policy_version 48211 (0.0013) [2024-06-15 12:09:48,054][1652475] Updated weights for policy 0, policy_version 48274 (0.0011) [2024-06-15 12:09:50,440][1652475] Updated weights for policy 0, policy_version 48355 (0.0033) [2024-06-15 12:09:50,738][1648984] Fps is (10 sec: 49157.9, 60 sec: 43144.2, 300 sec: 42542.8). Total num frames: 99057664. Throughput: 0: 11150.1. Samples: 24799232. Policy #0 lag: (min: 15.0, avg: 88.6, max: 271.0) [2024-06-15 12:09:50,739][1648984] Avg episode reward: [(0, '-0.540')] [2024-06-15 12:09:55,738][1648984] Fps is (10 sec: 42597.9, 60 sec: 44236.8, 300 sec: 42320.7). Total num frames: 99123200. Throughput: 0: 10786.1. Samples: 24860672. 
Policy #0 lag: (min: 15.0, avg: 88.6, max: 271.0) [2024-06-15 12:09:55,738][1648984] Avg episode reward: [(0, '-0.540')] [2024-06-15 12:09:55,837][1652475] Updated weights for policy 0, policy_version 48416 (0.0017) [2024-06-15 12:09:56,710][1652475] Updated weights for policy 0, policy_version 48448 (0.0012) [2024-06-15 12:10:00,236][1652475] Updated weights for policy 0, policy_version 48512 (0.0015) [2024-06-15 12:10:00,742][1648984] Fps is (10 sec: 32754.9, 60 sec: 42055.1, 300 sec: 42209.0). Total num frames: 99385344. Throughput: 0: 10910.2. Samples: 24924672. Policy #0 lag: (min: 15.0, avg: 88.6, max: 271.0) [2024-06-15 12:10:00,743][1648984] Avg episode reward: [(0, '-0.540')] [2024-06-15 12:10:02,286][1652475] Updated weights for policy 0, policy_version 48592 (0.0013) [2024-06-15 12:10:05,738][1648984] Fps is (10 sec: 49152.6, 60 sec: 43690.8, 300 sec: 42653.9). Total num frames: 99614720. Throughput: 0: 10524.5. Samples: 24950272. Policy #0 lag: (min: 72.0, avg: 167.6, max: 327.0) [2024-06-15 12:10:05,738][1648984] Avg episode reward: [(0, '-0.540')] [2024-06-15 12:10:07,079][1652475] Updated weights for policy 0, policy_version 48647 (0.0034) [2024-06-15 12:10:08,596][1652475] Updated weights for policy 0, policy_version 48704 (0.0016) [2024-06-15 12:10:10,738][1648984] Fps is (10 sec: 36060.2, 60 sec: 41506.0, 300 sec: 41987.4). Total num frames: 99745792. Throughput: 0: 10820.3. Samples: 25022464. Policy #0 lag: (min: 72.0, avg: 167.6, max: 327.0) [2024-06-15 12:10:10,738][1648984] Avg episode reward: [(0, '-0.540')] [2024-06-15 12:10:12,017][1651340] Signal inference workers to stop experience collection... (2600 times) [2024-06-15 12:10:12,057][1652475] InferenceWorker_p0-w0: stopping experience collection (2600 times) [2024-06-15 12:10:12,362][1651340] Signal inference workers to resume experience collection... 
(2600 times) [2024-06-15 12:10:12,364][1652475] InferenceWorker_p0-w0: resuming experience collection (2600 times) [2024-06-15 12:10:12,366][1652475] Updated weights for policy 0, policy_version 48768 (0.0012) [2024-06-15 12:10:13,950][1652475] Updated weights for policy 0, policy_version 48817 (0.0011) [2024-06-15 12:10:15,281][1652475] Updated weights for policy 0, policy_version 48880 (0.0014) [2024-06-15 12:10:15,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 43690.7, 300 sec: 42653.9). Total num frames: 100139008. Throughput: 0: 10501.7. Samples: 25081856. Policy #0 lag: (min: 72.0, avg: 167.6, max: 327.0) [2024-06-15 12:10:15,738][1648984] Avg episode reward: [(0, '-0.540')] [2024-06-15 12:10:19,616][1652475] Updated weights for policy 0, policy_version 48929 (0.0012) [2024-06-15 12:10:20,738][1648984] Fps is (10 sec: 52430.0, 60 sec: 43690.7, 300 sec: 42658.8). Total num frames: 100270080. Throughput: 0: 10717.9. Samples: 25123840. Policy #0 lag: (min: 72.0, avg: 167.6, max: 327.0) [2024-06-15 12:10:20,738][1648984] Avg episode reward: [(0, '-0.540')] [2024-06-15 12:10:23,320][1652475] Updated weights for policy 0, policy_version 48995 (0.0012) [2024-06-15 12:10:25,065][1652475] Updated weights for policy 0, policy_version 49059 (0.0012) [2024-06-15 12:10:25,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 42598.5, 300 sec: 42320.7). Total num frames: 100532224. Throughput: 0: 10991.3. Samples: 25189376. Policy #0 lag: (min: 72.0, avg: 167.6, max: 327.0) [2024-06-15 12:10:25,738][1648984] Avg episode reward: [(0, '-0.540')] [2024-06-15 12:10:26,843][1652475] Updated weights for policy 0, policy_version 49148 (0.0126) [2024-06-15 12:10:30,738][1648984] Fps is (10 sec: 42598.1, 60 sec: 44236.8, 300 sec: 42320.7). Total num frames: 100696064. Throughput: 0: 10808.9. Samples: 25255936. 
Policy #0 lag: (min: 72.0, avg: 167.6, max: 327.0) [2024-06-15 12:10:30,738][1648984] Avg episode reward: [(0, '-0.540')] [2024-06-15 12:10:31,440][1652475] Updated weights for policy 0, policy_version 49206 (0.0013) [2024-06-15 12:10:35,345][1652475] Updated weights for policy 0, policy_version 49264 (0.0016) [2024-06-15 12:10:35,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 42598.5, 300 sec: 42211.5). Total num frames: 100925440. Throughput: 0: 10945.5. Samples: 25291776. Policy #0 lag: (min: 4.0, avg: 83.2, max: 260.0) [2024-06-15 12:10:35,738][1648984] Avg episode reward: [(0, '-0.540')] [2024-06-15 12:10:37,295][1652475] Updated weights for policy 0, policy_version 49345 (0.0093) [2024-06-15 12:10:40,738][1648984] Fps is (10 sec: 49149.4, 60 sec: 43691.5, 300 sec: 42653.9). Total num frames: 101187584. Throughput: 0: 10854.3. Samples: 25349120. Policy #0 lag: (min: 4.0, avg: 83.2, max: 260.0) [2024-06-15 12:10:40,739][1648984] Avg episode reward: [(0, '-0.690')] [2024-06-15 12:10:42,860][1652475] Updated weights for policy 0, policy_version 49410 (0.0037) [2024-06-15 12:10:44,333][1652475] Updated weights for policy 0, policy_version 49472 (0.0014) [2024-06-15 12:10:45,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 43690.6, 300 sec: 41990.9). Total num frames: 101318656. Throughput: 0: 10946.5. Samples: 25417216. Policy #0 lag: (min: 4.0, avg: 83.2, max: 260.0) [2024-06-15 12:10:45,738][1648984] Avg episode reward: [(0, '-0.590')] [2024-06-15 12:10:48,414][1652475] Updated weights for policy 0, policy_version 49552 (0.0081) [2024-06-15 12:10:50,337][1652475] Updated weights for policy 0, policy_version 49632 (0.0175) [2024-06-15 12:10:50,738][1648984] Fps is (10 sec: 49154.8, 60 sec: 43691.0, 300 sec: 42542.9). Total num frames: 101679104. Throughput: 0: 11104.7. Samples: 25449984. 
Policy #0 lag: (min: 4.0, avg: 83.2, max: 260.0) [2024-06-15 12:10:50,738][1648984] Avg episode reward: [(0, '-0.520')] [2024-06-15 12:10:50,935][1651340] Saving new best policy, reward=-0.520! [2024-06-15 12:10:54,993][1652475] Updated weights for policy 0, policy_version 49681 (0.0012) [2024-06-15 12:10:55,003][1651340] Signal inference workers to stop experience collection... (2650 times) [2024-06-15 12:10:55,036][1652475] InferenceWorker_p0-w0: stopping experience collection (2650 times) [2024-06-15 12:10:55,278][1651340] Signal inference workers to resume experience collection... (2650 times) [2024-06-15 12:10:55,278][1652475] InferenceWorker_p0-w0: resuming experience collection (2650 times) [2024-06-15 12:10:55,738][1648984] Fps is (10 sec: 49150.4, 60 sec: 44782.7, 300 sec: 42431.8). Total num frames: 101810176. Throughput: 0: 11047.8. Samples: 25519616. Policy #0 lag: (min: 4.0, avg: 83.2, max: 260.0) [2024-06-15 12:10:55,739][1648984] Avg episode reward: [(0, '-0.520')] [2024-06-15 12:10:55,991][1652475] Updated weights for policy 0, policy_version 49725 (0.0011) [2024-06-15 12:10:56,030][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000049728_101842944.pth... [2024-06-15 12:10:56,077][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000044800_91750400.pth [2024-06-15 12:10:56,083][1651340] Saving a milestone train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/milestones/checkpoint_000049728_101842944.pth [2024-06-15 12:10:59,491][1652475] Updated weights for policy 0, policy_version 49793 (0.0014) [2024-06-15 12:11:00,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 45332.4, 300 sec: 42542.9). Total num frames: 102105088. Throughput: 0: 11229.9. Samples: 25587200. 
Policy #0 lag: (min: 4.0, avg: 83.2, max: 260.0) [2024-06-15 12:11:00,738][1648984] Avg episode reward: [(0, '-0.520')] [2024-06-15 12:11:00,930][1652475] Updated weights for policy 0, policy_version 49859 (0.0112) [2024-06-15 12:11:02,364][1652475] Updated weights for policy 0, policy_version 49912 (0.0013) [2024-06-15 12:11:05,738][1648984] Fps is (10 sec: 42600.2, 60 sec: 43690.7, 300 sec: 42542.9). Total num frames: 102236160. Throughput: 0: 10899.9. Samples: 25614336. Policy #0 lag: (min: 4.0, avg: 83.2, max: 260.0) [2024-06-15 12:11:05,738][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 12:11:05,739][1651340] Saving new best policy, reward=-0.500! [2024-06-15 12:11:06,864][1652475] Updated weights for policy 0, policy_version 49952 (0.0012) [2024-06-15 12:11:10,738][1648984] Fps is (10 sec: 26214.5, 60 sec: 43690.8, 300 sec: 42209.6). Total num frames: 102367232. Throughput: 0: 11036.5. Samples: 25686016. Policy #0 lag: (min: 15.0, avg: 113.7, max: 271.0) [2024-06-15 12:11:10,738][1648984] Avg episode reward: [(0, '-0.550')] [2024-06-15 12:11:11,119][1652475] Updated weights for policy 0, policy_version 50000 (0.0013) [2024-06-15 12:11:12,496][1652475] Updated weights for policy 0, policy_version 50052 (0.0015) [2024-06-15 12:11:13,880][1652475] Updated weights for policy 0, policy_version 50112 (0.0033) [2024-06-15 12:11:15,299][1652475] Updated weights for policy 0, policy_version 50171 (0.0087) [2024-06-15 12:11:15,742][1648984] Fps is (10 sec: 52404.2, 60 sec: 43687.3, 300 sec: 42653.3). Total num frames: 102760448. Throughput: 0: 10796.4. Samples: 25741824. Policy #0 lag: (min: 15.0, avg: 113.7, max: 271.0) [2024-06-15 12:11:15,743][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 12:11:18,854][1652475] Updated weights for policy 0, policy_version 50230 (0.0015) [2024-06-15 12:11:20,738][1648984] Fps is (10 sec: 52429.1, 60 sec: 43690.7, 300 sec: 42654.0). Total num frames: 102891520. Throughput: 0: 10729.3. Samples: 25774592. 
Policy #0 lag: (min: 15.0, avg: 113.7, max: 271.0) [2024-06-15 12:11:20,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 12:11:20,739][1651340] Saving new best policy, reward=-0.470! [2024-06-15 12:11:23,535][1652475] Updated weights for policy 0, policy_version 50288 (0.0013) [2024-06-15 12:11:25,286][1652475] Updated weights for policy 0, policy_version 50352 (0.0013) [2024-06-15 12:11:25,738][1648984] Fps is (10 sec: 39340.1, 60 sec: 43690.7, 300 sec: 42653.9). Total num frames: 103153664. Throughput: 0: 10956.9. Samples: 25842176. Policy #0 lag: (min: 15.0, avg: 113.7, max: 271.0) [2024-06-15 12:11:25,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 12:11:26,975][1652475] Updated weights for policy 0, policy_version 50404 (0.0012) [2024-06-15 12:11:30,545][1652475] Updated weights for policy 0, policy_version 50451 (0.0013) [2024-06-15 12:11:30,738][1648984] Fps is (10 sec: 42596.7, 60 sec: 43690.5, 300 sec: 42767.9). Total num frames: 103317504. Throughput: 0: 10865.7. Samples: 25906176. Policy #0 lag: (min: 15.0, avg: 113.7, max: 271.0) [2024-06-15 12:11:30,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 12:11:34,864][1652475] Updated weights for policy 0, policy_version 50499 (0.0012) [2024-06-15 12:11:35,738][1648984] Fps is (10 sec: 32768.0, 60 sec: 42598.4, 300 sec: 42323.1). Total num frames: 103481344. Throughput: 0: 10899.9. Samples: 25940480. Policy #0 lag: (min: 15.0, avg: 113.7, max: 271.0) [2024-06-15 12:11:35,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 12:11:37,213][1652475] Updated weights for policy 0, policy_version 50592 (0.0013) [2024-06-15 12:11:38,027][1652475] Updated weights for policy 0, policy_version 50624 (0.0010) [2024-06-15 12:11:38,687][1651340] Signal inference workers to stop experience collection... 
(2700 times) [2024-06-15 12:11:38,733][1652475] InferenceWorker_p0-w0: stopping experience collection (2700 times) [2024-06-15 12:11:38,901][1651340] Signal inference workers to resume experience collection... (2700 times) [2024-06-15 12:11:38,902][1652475] InferenceWorker_p0-w0: resuming experience collection (2700 times) [2024-06-15 12:11:39,571][1652475] Updated weights for policy 0, policy_version 50688 (0.0013) [2024-06-15 12:11:40,738][1648984] Fps is (10 sec: 49153.6, 60 sec: 43691.1, 300 sec: 42654.1). Total num frames: 103809024. Throughput: 0: 10661.1. Samples: 25999360. Policy #0 lag: (min: 111.0, avg: 181.5, max: 319.0) [2024-06-15 12:11:40,740][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 12:11:45,738][1648984] Fps is (10 sec: 45875.3, 60 sec: 43690.7, 300 sec: 42765.0). Total num frames: 103940096. Throughput: 0: 10729.3. Samples: 26070016. Policy #0 lag: (min: 111.0, avg: 181.5, max: 319.0) [2024-06-15 12:11:45,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 12:11:47,074][1652475] Updated weights for policy 0, policy_version 50753 (0.0020) [2024-06-15 12:11:48,577][1652475] Updated weights for policy 0, policy_version 50820 (0.0083) [2024-06-15 12:11:49,961][1652475] Updated weights for policy 0, policy_version 50874 (0.0012) [2024-06-15 12:11:50,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 42598.4, 300 sec: 42542.9). Total num frames: 104235008. Throughput: 0: 10877.1. Samples: 26103808. Policy #0 lag: (min: 111.0, avg: 181.5, max: 319.0) [2024-06-15 12:11:50,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 12:11:51,189][1652475] Updated weights for policy 0, policy_version 50934 (0.0013) [2024-06-15 12:11:54,121][1652475] Updated weights for policy 0, policy_version 51000 (0.0059) [2024-06-15 12:11:55,738][1648984] Fps is (10 sec: 52428.5, 60 sec: 44237.1, 300 sec: 43098.3). Total num frames: 104464384. Throughput: 0: 10672.4. Samples: 26166272. 
Policy #0 lag: (min: 111.0, avg: 181.5, max: 319.0) [2024-06-15 12:11:55,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 12:11:59,656][1652475] Updated weights for policy 0, policy_version 51062 (0.0015) [2024-06-15 12:12:00,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 42052.2, 300 sec: 42653.9). Total num frames: 104628224. Throughput: 0: 11037.6. Samples: 26238464. Policy #0 lag: (min: 111.0, avg: 181.5, max: 319.0) [2024-06-15 12:12:00,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 12:12:01,189][1652475] Updated weights for policy 0, policy_version 51120 (0.0076) [2024-06-15 12:12:01,933][1652475] Updated weights for policy 0, policy_version 51139 (0.0017) [2024-06-15 12:12:03,040][1652475] Updated weights for policy 0, policy_version 51200 (0.0012) [2024-06-15 12:12:05,724][1652475] Updated weights for policy 0, policy_version 51254 (0.0013) [2024-06-15 12:12:05,738][1648984] Fps is (10 sec: 49152.0, 60 sec: 45329.0, 300 sec: 42987.5). Total num frames: 104955904. Throughput: 0: 11013.7. Samples: 26270208. Policy #0 lag: (min: 111.0, avg: 181.5, max: 319.0) [2024-06-15 12:12:05,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 12:12:10,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 44236.8, 300 sec: 42987.2). Total num frames: 105021440. Throughput: 0: 11093.3. Samples: 26341376. Policy #0 lag: (min: 1.0, avg: 78.2, max: 257.0) [2024-06-15 12:12:10,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 12:12:10,986][1652475] Updated weights for policy 0, policy_version 51301 (0.0041) [2024-06-15 12:12:12,233][1652475] Updated weights for policy 0, policy_version 51348 (0.0014) [2024-06-15 12:12:13,776][1652475] Updated weights for policy 0, policy_version 51394 (0.0013) [2024-06-15 12:12:15,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 43694.0, 300 sec: 42765.0). Total num frames: 105381888. Throughput: 0: 11104.8. Samples: 26405888. 
Policy #0 lag: (min: 1.0, avg: 78.2, max: 257.0) [2024-06-15 12:12:15,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 12:12:15,934][1652475] Updated weights for policy 0, policy_version 51461 (0.0014) [2024-06-15 12:12:20,738][1648984] Fps is (10 sec: 49152.3, 60 sec: 43690.7, 300 sec: 43099.2). Total num frames: 105512960. Throughput: 0: 11104.7. Samples: 26440192. Policy #0 lag: (min: 1.0, avg: 78.2, max: 257.0) [2024-06-15 12:12:20,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 12:12:21,172][1652475] Updated weights for policy 0, policy_version 51536 (0.0016) [2024-06-15 12:12:22,034][1652475] Updated weights for policy 0, policy_version 51577 (0.0011) [2024-06-15 12:12:24,620][1652475] Updated weights for policy 0, policy_version 51637 (0.0011) [2024-06-15 12:12:25,390][1651340] Signal inference workers to stop experience collection... (2750 times) [2024-06-15 12:12:25,437][1652475] InferenceWorker_p0-w0: stopping experience collection (2750 times) [2024-06-15 12:12:25,640][1651340] Signal inference workers to resume experience collection... (2750 times) [2024-06-15 12:12:25,641][1652475] InferenceWorker_p0-w0: resuming experience collection (2750 times) [2024-06-15 12:12:25,738][1648984] Fps is (10 sec: 42598.5, 60 sec: 44236.8, 300 sec: 43209.3). Total num frames: 105807872. Throughput: 0: 11343.6. Samples: 26509824. Policy #0 lag: (min: 1.0, avg: 78.2, max: 257.0) [2024-06-15 12:12:25,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 12:12:26,408][1652475] Updated weights for policy 0, policy_version 51696 (0.0029) [2024-06-15 12:12:28,289][1652475] Updated weights for policy 0, policy_version 51747 (0.0012) [2024-06-15 12:12:30,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 45329.3, 300 sec: 43098.3). Total num frames: 106037248. Throughput: 0: 11275.4. Samples: 26577408. 
Policy #0 lag: (min: 1.0, avg: 78.2, max: 257.0) [2024-06-15 12:12:30,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 12:12:32,433][1652475] Updated weights for policy 0, policy_version 51808 (0.0044) [2024-06-15 12:12:35,741][1648984] Fps is (10 sec: 36044.6, 60 sec: 44782.9, 300 sec: 43431.5). Total num frames: 106168320. Throughput: 0: 11207.1. Samples: 26608128. Policy #0 lag: (min: 1.0, avg: 78.2, max: 257.0) [2024-06-15 12:12:35,742][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 12:12:36,434][1652475] Updated weights for policy 0, policy_version 51873 (0.0025) [2024-06-15 12:12:38,629][1652475] Updated weights for policy 0, policy_version 51962 (0.0013) [2024-06-15 12:12:39,721][1652475] Updated weights for policy 0, policy_version 52000 (0.0012) [2024-06-15 12:12:40,738][1648984] Fps is (10 sec: 52428.3, 60 sec: 45875.2, 300 sec: 43320.4). Total num frames: 106561536. Throughput: 0: 11377.8. Samples: 26678272. Policy #0 lag: (min: 1.0, avg: 78.2, max: 257.0) [2024-06-15 12:12:40,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 12:12:43,290][1652475] Updated weights for policy 0, policy_version 52036 (0.0039) [2024-06-15 12:12:45,738][1648984] Fps is (10 sec: 52429.1, 60 sec: 45875.2, 300 sec: 43542.6). Total num frames: 106692608. Throughput: 0: 11195.7. Samples: 26742272. Policy #0 lag: (min: 10.0, avg: 117.0, max: 266.0) [2024-06-15 12:12:45,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 12:12:47,143][1652475] Updated weights for policy 0, policy_version 52099 (0.0014) [2024-06-15 12:12:48,246][1652475] Updated weights for policy 0, policy_version 52160 (0.0013) [2024-06-15 12:12:50,277][1652475] Updated weights for policy 0, policy_version 52222 (0.0014) [2024-06-15 12:12:50,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 45329.1, 300 sec: 43653.6). Total num frames: 106954752. Throughput: 0: 11275.4. Samples: 26777600. 
Policy #0 lag: (min: 10.0, avg: 117.0, max: 266.0) [2024-06-15 12:12:50,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 12:12:52,584][1652475] Updated weights for policy 0, policy_version 52277 (0.0013) [2024-06-15 12:12:55,738][1648984] Fps is (10 sec: 45872.2, 60 sec: 44782.5, 300 sec: 43320.3). Total num frames: 107151360. Throughput: 0: 11138.7. Samples: 26842624. Policy #0 lag: (min: 10.0, avg: 117.0, max: 266.0) [2024-06-15 12:12:55,739][1648984] Avg episode reward: [(0, '-0.540')] [2024-06-15 12:12:56,045][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000052336_107184128.pth... [2024-06-15 12:12:56,045][1652475] Updated weights for policy 0, policy_version 52336 (0.0011) [2024-06-15 12:12:56,109][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000047104_96468992.pth [2024-06-15 12:13:00,112][1652475] Updated weights for policy 0, policy_version 52414 (0.0016) [2024-06-15 12:13:00,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 45329.1, 300 sec: 43875.8). Total num frames: 107347968. Throughput: 0: 10990.9. Samples: 26900480. Policy #0 lag: (min: 10.0, avg: 117.0, max: 266.0) [2024-06-15 12:13:00,739][1648984] Avg episode reward: [(0, '-0.540')] [2024-06-15 12:13:02,409][1652475] Updated weights for policy 0, policy_version 52478 (0.0019) [2024-06-15 12:13:05,738][1648984] Fps is (10 sec: 32770.0, 60 sec: 42052.2, 300 sec: 42989.9). Total num frames: 107479040. Throughput: 0: 10911.3. Samples: 26931200. Policy #0 lag: (min: 10.0, avg: 117.0, max: 266.0) [2024-06-15 12:13:05,738][1648984] Avg episode reward: [(0, '-0.610')] [2024-06-15 12:13:09,850][1652475] Updated weights for policy 0, policy_version 52560 (0.0029) [2024-06-15 12:13:10,738][1648984] Fps is (10 sec: 36044.5, 60 sec: 44782.9, 300 sec: 43431.5). Total num frames: 107708416. Throughput: 0: 10865.8. Samples: 26998784. 
Policy #0 lag: (min: 10.0, avg: 117.0, max: 266.0) [2024-06-15 12:13:10,738][1648984] Avg episode reward: [(0, '-0.610')] [2024-06-15 12:13:11,572][1652475] Updated weights for policy 0, policy_version 52624 (0.0020) [2024-06-15 12:13:13,882][1652475] Updated weights for policy 0, policy_version 52690 (0.0081) [2024-06-15 12:13:14,196][1651340] Signal inference workers to stop experience collection... (2800 times) [2024-06-15 12:13:14,228][1652475] InferenceWorker_p0-w0: stopping experience collection (2800 times) [2024-06-15 12:13:14,366][1651340] Signal inference workers to resume experience collection... (2800 times) [2024-06-15 12:13:14,367][1652475] InferenceWorker_p0-w0: resuming experience collection (2800 times) [2024-06-15 12:13:15,739][1648984] Fps is (10 sec: 52421.0, 60 sec: 43689.6, 300 sec: 43434.8). Total num frames: 108003328. Throughput: 0: 10717.5. Samples: 27059712. Policy #0 lag: (min: 10.0, avg: 117.0, max: 266.0) [2024-06-15 12:13:15,740][1648984] Avg episode reward: [(0, '-0.610')] [2024-06-15 12:13:17,507][1652475] Updated weights for policy 0, policy_version 52740 (0.0013) [2024-06-15 12:13:18,549][1652475] Updated weights for policy 0, policy_version 52796 (0.0012) [2024-06-15 12:13:20,739][1648984] Fps is (10 sec: 42594.3, 60 sec: 43689.9, 300 sec: 43098.1). Total num frames: 108134400. Throughput: 0: 10876.9. Samples: 27097600. Policy #0 lag: (min: 14.0, avg: 119.9, max: 270.0) [2024-06-15 12:13:20,739][1648984] Avg episode reward: [(0, '-0.610')] [2024-06-15 12:13:21,808][1652475] Updated weights for policy 0, policy_version 52848 (0.0013) [2024-06-15 12:13:23,424][1652475] Updated weights for policy 0, policy_version 52896 (0.0018) [2024-06-15 12:13:24,961][1652475] Updated weights for policy 0, policy_version 52930 (0.0013) [2024-06-15 12:13:25,738][1648984] Fps is (10 sec: 45882.6, 60 sec: 44236.9, 300 sec: 44098.0). Total num frames: 108462080. Throughput: 0: 10763.4. Samples: 27162624. 
Policy #0 lag: (min: 14.0, avg: 119.9, max: 270.0) [2024-06-15 12:13:25,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 12:13:25,922][1652475] Updated weights for policy 0, policy_version 52983 (0.0012) [2024-06-15 12:13:26,066][1651340] Saving new best policy, reward=-0.430! [2024-06-15 12:13:29,112][1652475] Updated weights for policy 0, policy_version 53024 (0.0128) [2024-06-15 12:13:30,738][1648984] Fps is (10 sec: 52434.6, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 108658688. Throughput: 0: 10820.3. Samples: 27229184. Policy #0 lag: (min: 14.0, avg: 119.9, max: 270.0) [2024-06-15 12:13:30,738][1648984] Avg episode reward: [(0, '-0.530')] [2024-06-15 12:13:32,814][1652475] Updated weights for policy 0, policy_version 53095 (0.0013) [2024-06-15 12:13:35,738][1648984] Fps is (10 sec: 32768.0, 60 sec: 43690.8, 300 sec: 43542.6). Total num frames: 108789760. Throughput: 0: 10661.0. Samples: 27257344. Policy #0 lag: (min: 14.0, avg: 119.9, max: 270.0) [2024-06-15 12:13:35,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 12:13:35,739][1651340] Saving new best policy, reward=-0.390! [2024-06-15 12:13:37,012][1652475] Updated weights for policy 0, policy_version 53168 (0.0013) [2024-06-15 12:13:40,113][1652475] Updated weights for policy 0, policy_version 53232 (0.0012) [2024-06-15 12:13:40,738][1648984] Fps is (10 sec: 39320.2, 60 sec: 41506.0, 300 sec: 43542.5). Total num frames: 109051904. Throughput: 0: 10774.8. Samples: 27327488. Policy #0 lag: (min: 14.0, avg: 119.9, max: 270.0) [2024-06-15 12:13:40,740][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 12:13:41,256][1651340] Saving new best policy, reward=-0.370! [2024-06-15 12:13:41,818][1652475] Updated weights for policy 0, policy_version 53311 (0.0012) [2024-06-15 12:13:45,331][1652475] Updated weights for policy 0, policy_version 53348 (0.0013) [2024-06-15 12:13:45,738][1648984] Fps is (10 sec: 49151.3, 60 sec: 43144.5, 300 sec: 43431.5). 
Total num frames: 109281280. Throughput: 0: 10786.1. Samples: 27385856. Policy #0 lag: (min: 14.0, avg: 119.9, max: 270.0) [2024-06-15 12:13:45,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 12:13:48,385][1652475] Updated weights for policy 0, policy_version 53392 (0.0136) [2024-06-15 12:13:49,567][1652475] Updated weights for policy 0, policy_version 53439 (0.0026) [2024-06-15 12:13:50,738][1648984] Fps is (10 sec: 39322.8, 60 sec: 41506.2, 300 sec: 43986.9). Total num frames: 109445120. Throughput: 0: 10979.6. Samples: 27425280. Policy #0 lag: (min: 14.0, avg: 119.9, max: 270.0) [2024-06-15 12:13:50,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 12:13:52,551][1652475] Updated weights for policy 0, policy_version 53506 (0.0109) [2024-06-15 12:13:53,589][1652475] Updated weights for policy 0, policy_version 53568 (0.0012) [2024-06-15 12:13:55,738][1648984] Fps is (10 sec: 45875.4, 60 sec: 43145.0, 300 sec: 43654.9). Total num frames: 109740032. Throughput: 0: 10820.3. Samples: 27485696. Policy #0 lag: (min: 15.0, avg: 119.5, max: 271.0) [2024-06-15 12:13:55,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 12:13:56,914][1652475] Updated weights for policy 0, policy_version 53631 (0.0078) [2024-06-15 12:14:00,738][1648984] Fps is (10 sec: 45874.1, 60 sec: 42598.2, 300 sec: 43764.7). Total num frames: 109903872. Throughput: 0: 11014.0. Samples: 27555328. Policy #0 lag: (min: 15.0, avg: 119.5, max: 271.0) [2024-06-15 12:14:00,739][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 12:14:01,000][1652475] Updated weights for policy 0, policy_version 53687 (0.0010) [2024-06-15 12:14:03,114][1652475] Updated weights for policy 0, policy_version 53719 (0.0012) [2024-06-15 12:14:03,406][1651340] Signal inference workers to stop experience collection... 
(2850 times) [2024-06-15 12:14:03,436][1652475] InferenceWorker_p0-w0: stopping experience collection (2850 times) [2024-06-15 12:14:03,712][1651340] Signal inference workers to resume experience collection... (2850 times) [2024-06-15 12:14:03,713][1652475] InferenceWorker_p0-w0: resuming experience collection (2850 times) [2024-06-15 12:14:04,394][1652475] Updated weights for policy 0, policy_version 53776 (0.0032) [2024-06-15 12:14:05,738][1648984] Fps is (10 sec: 49152.3, 60 sec: 45875.3, 300 sec: 43986.9). Total num frames: 110231552. Throughput: 0: 11070.9. Samples: 27595776. Policy #0 lag: (min: 15.0, avg: 119.5, max: 271.0) [2024-06-15 12:14:05,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 12:14:06,261][1652475] Updated weights for policy 0, policy_version 53826 (0.0013) [2024-06-15 12:14:10,738][1648984] Fps is (10 sec: 45876.5, 60 sec: 44236.9, 300 sec: 43542.6). Total num frames: 110362624. Throughput: 0: 11081.9. Samples: 27661312. Policy #0 lag: (min: 15.0, avg: 119.5, max: 271.0) [2024-06-15 12:14:10,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 12:14:11,457][1652475] Updated weights for policy 0, policy_version 53890 (0.0011) [2024-06-15 12:14:12,655][1652475] Updated weights for policy 0, policy_version 53949 (0.0112) [2024-06-15 12:14:15,320][1652475] Updated weights for policy 0, policy_version 54016 (0.0013) [2024-06-15 12:14:15,738][1648984] Fps is (10 sec: 42598.1, 60 sec: 44237.9, 300 sec: 44098.0). Total num frames: 110657536. Throughput: 0: 11173.0. Samples: 27731968. Policy #0 lag: (min: 15.0, avg: 119.5, max: 271.0) [2024-06-15 12:14:15,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 12:14:17,800][1652475] Updated weights for policy 0, policy_version 54083 (0.0139) [2024-06-15 12:14:20,738][1648984] Fps is (10 sec: 52428.3, 60 sec: 45876.0, 300 sec: 43764.7). Total num frames: 110886912. Throughput: 0: 11218.5. Samples: 27762176. 
Policy #0 lag: (min: 15.0, avg: 119.5, max: 271.0) [2024-06-15 12:14:20,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 12:14:22,531][1652475] Updated weights for policy 0, policy_version 54145 (0.0013) [2024-06-15 12:14:23,517][1652475] Updated weights for policy 0, policy_version 54206 (0.0014) [2024-06-15 12:14:25,738][1648984] Fps is (10 sec: 42598.6, 60 sec: 43690.6, 300 sec: 44209.0). Total num frames: 111083520. Throughput: 0: 11309.6. Samples: 27836416. Policy #0 lag: (min: 5.0, avg: 89.7, max: 261.0) [2024-06-15 12:14:25,739][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 12:14:26,366][1652475] Updated weights for policy 0, policy_version 54263 (0.0017) [2024-06-15 12:14:27,836][1652475] Updated weights for policy 0, policy_version 54336 (0.0099) [2024-06-15 12:14:30,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 45875.1, 300 sec: 44209.0). Total num frames: 111411200. Throughput: 0: 11514.3. Samples: 27904000. Policy #0 lag: (min: 5.0, avg: 89.7, max: 261.0) [2024-06-15 12:14:30,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 12:14:33,497][1652475] Updated weights for policy 0, policy_version 54407 (0.0078) [2024-06-15 12:14:34,613][1652475] Updated weights for policy 0, policy_version 54464 (0.0017) [2024-06-15 12:14:35,738][1648984] Fps is (10 sec: 45875.1, 60 sec: 45875.1, 300 sec: 43987.1). Total num frames: 111542272. Throughput: 0: 11548.4. Samples: 27944960. Policy #0 lag: (min: 5.0, avg: 89.7, max: 261.0) [2024-06-15 12:14:35,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 12:14:37,950][1652475] Updated weights for policy 0, policy_version 54532 (0.0137) [2024-06-15 12:14:39,181][1652475] Updated weights for policy 0, policy_version 54585 (0.0014) [2024-06-15 12:14:40,738][1648984] Fps is (10 sec: 45875.2, 60 sec: 46967.7, 300 sec: 44653.3). Total num frames: 111869952. Throughput: 0: 11616.7. Samples: 28008448. 
Policy #0 lag: (min: 5.0, avg: 89.7, max: 261.0) [2024-06-15 12:14:40,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 12:14:40,965][1652475] Updated weights for policy 0, policy_version 54640 (0.0013) [2024-06-15 12:14:45,223][1652475] Updated weights for policy 0, policy_version 54688 (0.0012) [2024-06-15 12:14:45,738][1648984] Fps is (10 sec: 49152.2, 60 sec: 45875.3, 300 sec: 43986.9). Total num frames: 112033792. Throughput: 0: 11628.2. Samples: 28078592. Policy #0 lag: (min: 5.0, avg: 89.7, max: 261.0) [2024-06-15 12:14:45,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 12:14:48,498][1651340] Signal inference workers to stop experience collection... (2900 times) [2024-06-15 12:14:48,563][1652475] InferenceWorker_p0-w0: stopping experience collection (2900 times) [2024-06-15 12:14:48,796][1651340] Signal inference workers to resume experience collection... (2900 times) [2024-06-15 12:14:48,796][1652475] InferenceWorker_p0-w0: resuming experience collection (2900 times) [2024-06-15 12:14:48,799][1652475] Updated weights for policy 0, policy_version 54736 (0.0014) [2024-06-15 12:14:50,567][1652475] Updated weights for policy 0, policy_version 54802 (0.0012) [2024-06-15 12:14:50,738][1648984] Fps is (10 sec: 36044.9, 60 sec: 46421.3, 300 sec: 44431.2). Total num frames: 112230400. Throughput: 0: 11491.5. Samples: 28112896. Policy #0 lag: (min: 5.0, avg: 89.7, max: 261.0) [2024-06-15 12:14:50,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 12:14:52,135][1652475] Updated weights for policy 0, policy_version 54851 (0.0028) [2024-06-15 12:14:53,212][1652475] Updated weights for policy 0, policy_version 54909 (0.0019) [2024-06-15 12:14:55,738][1648984] Fps is (10 sec: 42598.0, 60 sec: 45329.0, 300 sec: 44320.8). Total num frames: 112459776. Throughput: 0: 11446.0. Samples: 28176384. 
Policy #0 lag: (min: 5.0, avg: 89.7, max: 261.0) [2024-06-15 12:14:55,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 12:14:55,751][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000054912_112459776.pth... [2024-06-15 12:14:55,822][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000049728_101842944.pth [2024-06-15 12:14:57,798][1652475] Updated weights for policy 0, policy_version 54960 (0.0012) [2024-06-15 12:15:00,582][1652475] Updated weights for policy 0, policy_version 55008 (0.0046) [2024-06-15 12:15:00,738][1648984] Fps is (10 sec: 42598.6, 60 sec: 45875.4, 300 sec: 44209.0). Total num frames: 112656384. Throughput: 0: 11377.8. Samples: 28243968. Policy #0 lag: (min: 15.0, avg: 107.3, max: 271.0) [2024-06-15 12:15:00,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 12:15:01,908][1652475] Updated weights for policy 0, policy_version 55056 (0.0013) [2024-06-15 12:15:04,783][1652475] Updated weights for policy 0, policy_version 55152 (0.0098) [2024-06-15 12:15:05,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 45875.1, 300 sec: 44875.5). Total num frames: 112984064. Throughput: 0: 11400.5. Samples: 28275200. Policy #0 lag: (min: 15.0, avg: 107.3, max: 271.0) [2024-06-15 12:15:05,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 12:15:09,760][1652475] Updated weights for policy 0, policy_version 55187 (0.0012) [2024-06-15 12:15:10,738][1648984] Fps is (10 sec: 45875.3, 60 sec: 45875.2, 300 sec: 43986.9). Total num frames: 113115136. Throughput: 0: 11309.5. Samples: 28345344. 
Policy #0 lag: (min: 15.0, avg: 107.3, max: 271.0) [2024-06-15 12:15:10,738][1648984] Avg episode reward: [(0, '-0.560')] [2024-06-15 12:15:11,466][1652475] Updated weights for policy 0, policy_version 55248 (0.0015) [2024-06-15 12:15:15,253][1652475] Updated weights for policy 0, policy_version 55328 (0.0013) [2024-06-15 12:15:15,738][1648984] Fps is (10 sec: 36045.4, 60 sec: 44783.0, 300 sec: 44320.1). Total num frames: 113344512. Throughput: 0: 11116.1. Samples: 28404224. Policy #0 lag: (min: 15.0, avg: 107.3, max: 271.0) [2024-06-15 12:15:15,738][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 12:15:17,497][1652475] Updated weights for policy 0, policy_version 55377 (0.0011) [2024-06-15 12:15:18,351][1652475] Updated weights for policy 0, policy_version 55423 (0.0014) [2024-06-15 12:15:20,738][1648984] Fps is (10 sec: 39319.8, 60 sec: 43690.4, 300 sec: 43986.8). Total num frames: 113508352. Throughput: 0: 10877.1. Samples: 28434432. Policy #0 lag: (min: 15.0, avg: 107.3, max: 271.0) [2024-06-15 12:15:20,739][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 12:15:21,697][1652475] Updated weights for policy 0, policy_version 55477 (0.0027) [2024-06-15 12:15:23,614][1652475] Updated weights for policy 0, policy_version 55536 (0.0031) [2024-06-15 12:15:25,738][1648984] Fps is (10 sec: 42597.6, 60 sec: 44782.8, 300 sec: 44320.1). Total num frames: 113770496. Throughput: 0: 11002.3. Samples: 28503552. Policy #0 lag: (min: 15.0, avg: 107.3, max: 271.0) [2024-06-15 12:15:25,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 12:15:27,102][1652475] Updated weights for policy 0, policy_version 55584 (0.0012) [2024-06-15 12:15:29,714][1652475] Updated weights for policy 0, policy_version 55648 (0.0015) [2024-06-15 12:15:30,738][1648984] Fps is (10 sec: 52430.5, 60 sec: 43690.6, 300 sec: 44431.2). Total num frames: 114032640. Throughput: 0: 10865.7. Samples: 28567552. 
Policy #0 lag: (min: 15.0, avg: 107.3, max: 271.0) [2024-06-15 12:15:30,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 12:15:33,051][1652475] Updated weights for policy 0, policy_version 55720 (0.0109) [2024-06-15 12:15:33,557][1652475] Updated weights for policy 0, policy_version 55743 (0.0020) [2024-06-15 12:15:34,839][1651340] Signal inference workers to stop experience collection... (2950 times) [2024-06-15 12:15:34,914][1652475] InferenceWorker_p0-w0: stopping experience collection (2950 times) [2024-06-15 12:15:35,095][1651340] Signal inference workers to resume experience collection... (2950 times) [2024-06-15 12:15:35,102][1652475] InferenceWorker_p0-w0: resuming experience collection (2950 times) [2024-06-15 12:15:35,738][1648984] Fps is (10 sec: 49152.0, 60 sec: 45329.0, 300 sec: 44320.2). Total num frames: 114262016. Throughput: 0: 10843.0. Samples: 28600832. Policy #0 lag: (min: 27.0, avg: 155.2, max: 283.0) [2024-06-15 12:15:35,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 12:15:35,757][1652475] Updated weights for policy 0, policy_version 55806 (0.0012) [2024-06-15 12:15:39,639][1652475] Updated weights for policy 0, policy_version 55872 (0.0015) [2024-06-15 12:15:40,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 42598.4, 300 sec: 44431.2). Total num frames: 114425856. Throughput: 0: 10888.5. Samples: 28666368. Policy #0 lag: (min: 27.0, avg: 155.2, max: 283.0) [2024-06-15 12:15:40,738][1648984] Avg episode reward: [(0, '-0.630')] [2024-06-15 12:15:45,738][1648984] Fps is (10 sec: 29491.5, 60 sec: 42052.2, 300 sec: 43653.6). Total num frames: 114556928. Throughput: 0: 10774.7. Samples: 28728832. 
Policy #0 lag: (min: 27.0, avg: 155.2, max: 283.0) [2024-06-15 12:15:45,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 12:15:46,256][1652475] Updated weights for policy 0, policy_version 55968 (0.0014) [2024-06-15 12:15:49,820][1652475] Updated weights for policy 0, policy_version 56048 (0.0019) [2024-06-15 12:15:50,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 43144.5, 300 sec: 44098.0). Total num frames: 114819072. Throughput: 0: 10683.7. Samples: 28755968. Policy #0 lag: (min: 27.0, avg: 155.2, max: 283.0) [2024-06-15 12:15:50,738][1648984] Avg episode reward: [(0, '-0.660')] [2024-06-15 12:15:51,627][1652475] Updated weights for policy 0, policy_version 56112 (0.0016) [2024-06-15 12:15:53,708][1652475] Updated weights for policy 0, policy_version 56160 (0.0012) [2024-06-15 12:15:54,496][1652475] Updated weights for policy 0, policy_version 56189 (0.0017) [2024-06-15 12:15:55,738][1648984] Fps is (10 sec: 52428.4, 60 sec: 43690.6, 300 sec: 43986.9). Total num frames: 115081216. Throughput: 0: 10490.3. Samples: 28817408. Policy #0 lag: (min: 27.0, avg: 155.2, max: 283.0) [2024-06-15 12:15:55,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 12:16:00,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 42598.4, 300 sec: 43986.9). Total num frames: 115212288. Throughput: 0: 10752.0. Samples: 28888064. Policy #0 lag: (min: 27.0, avg: 155.2, max: 283.0) [2024-06-15 12:16:00,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 12:16:01,375][1652475] Updated weights for policy 0, policy_version 56258 (0.0035) [2024-06-15 12:16:02,871][1652475] Updated weights for policy 0, policy_version 56325 (0.0013) [2024-06-15 12:16:04,242][1652475] Updated weights for policy 0, policy_version 56384 (0.0013) [2024-06-15 12:16:05,738][1648984] Fps is (10 sec: 42598.6, 60 sec: 42052.3, 300 sec: 44542.3). Total num frames: 115507200. Throughput: 0: 10740.7. Samples: 28917760. 
Policy #0 lag: (min: 27.0, avg: 155.2, max: 283.0) [2024-06-15 12:16:05,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 12:16:10,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 41506.1, 300 sec: 43543.3). Total num frames: 115605504. Throughput: 0: 10683.8. Samples: 28984320. Policy #0 lag: (min: 27.0, avg: 155.2, max: 283.0) [2024-06-15 12:16:10,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 12:16:10,949][1652475] Updated weights for policy 0, policy_version 56464 (0.0086) [2024-06-15 12:16:14,586][1652475] Updated weights for policy 0, policy_version 56560 (0.0014) [2024-06-15 12:16:15,739][1648984] Fps is (10 sec: 39317.5, 60 sec: 42597.6, 300 sec: 44097.8). Total num frames: 115900416. Throughput: 0: 10638.0. Samples: 29046272. Policy #0 lag: (min: 15.0, avg: 100.2, max: 271.0) [2024-06-15 12:16:15,739][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 12:16:16,211][1652475] Updated weights for policy 0, policy_version 56624 (0.0012) [2024-06-15 12:16:18,063][1652475] Updated weights for policy 0, policy_version 56699 (0.0013) [2024-06-15 12:16:20,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 43691.0, 300 sec: 43986.9). Total num frames: 116129792. Throughput: 0: 10524.5. Samples: 29074432. Policy #0 lag: (min: 15.0, avg: 100.2, max: 271.0) [2024-06-15 12:16:20,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 12:16:24,065][1652475] Updated weights for policy 0, policy_version 56760 (0.0124) [2024-06-15 12:16:25,738][1648984] Fps is (10 sec: 36048.4, 60 sec: 41506.2, 300 sec: 43875.8). Total num frames: 116260864. Throughput: 0: 10695.1. Samples: 29147648. Policy #0 lag: (min: 15.0, avg: 100.2, max: 271.0) [2024-06-15 12:16:25,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 12:16:26,198][1651340] Signal inference workers to stop experience collection... 
(3000 times) [2024-06-15 12:16:26,238][1652475] InferenceWorker_p0-w0: stopping experience collection (3000 times) [2024-06-15 12:16:26,440][1651340] Signal inference workers to resume experience collection... (3000 times) [2024-06-15 12:16:26,440][1652475] InferenceWorker_p0-w0: resuming experience collection (3000 times) [2024-06-15 12:16:27,032][1652475] Updated weights for policy 0, policy_version 56822 (0.0013) [2024-06-15 12:16:28,407][1652475] Updated weights for policy 0, policy_version 56891 (0.0015) [2024-06-15 12:16:29,773][1652475] Updated weights for policy 0, policy_version 56950 (0.0013) [2024-06-15 12:16:30,740][1648984] Fps is (10 sec: 52416.7, 60 sec: 43689.1, 300 sec: 44653.0). Total num frames: 116654080. Throughput: 0: 10637.7. Samples: 29207552. Policy #0 lag: (min: 15.0, avg: 100.2, max: 271.0) [2024-06-15 12:16:30,740][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 12:16:35,738][1648984] Fps is (10 sec: 49152.2, 60 sec: 41506.2, 300 sec: 43875.8). Total num frames: 116752384. Throughput: 0: 10945.4. Samples: 29248512. Policy #0 lag: (min: 15.0, avg: 100.2, max: 271.0) [2024-06-15 12:16:35,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 12:16:35,744][1652475] Updated weights for policy 0, policy_version 57023 (0.0021) [2024-06-15 12:16:38,471][1652475] Updated weights for policy 0, policy_version 57072 (0.0013) [2024-06-15 12:16:39,029][1652475] Updated weights for policy 0, policy_version 57094 (0.0020) [2024-06-15 12:16:40,602][1652475] Updated weights for policy 0, policy_version 57153 (0.0013) [2024-06-15 12:16:40,738][1648984] Fps is (10 sec: 39330.3, 60 sec: 43690.7, 300 sec: 44431.2). Total num frames: 117047296. Throughput: 0: 11013.7. Samples: 29313024. 
Policy #0 lag: (min: 15.0, avg: 100.2, max: 271.0) [2024-06-15 12:16:40,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 12:16:41,899][1652475] Updated weights for policy 0, policy_version 57213 (0.0033) [2024-06-15 12:16:45,744][1648984] Fps is (10 sec: 42571.3, 60 sec: 43686.0, 300 sec: 43874.8). Total num frames: 117178368. Throughput: 0: 10943.9. Samples: 29380608. Policy #0 lag: (min: 15.0, avg: 100.2, max: 271.0) [2024-06-15 12:16:45,745][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 12:16:47,649][1652475] Updated weights for policy 0, policy_version 57276 (0.0014) [2024-06-15 12:16:50,212][1652475] Updated weights for policy 0, policy_version 57322 (0.0012) [2024-06-15 12:16:50,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 43690.6, 300 sec: 43986.9). Total num frames: 117440512. Throughput: 0: 11025.1. Samples: 29413888. Policy #0 lag: (min: 9.0, avg: 91.9, max: 265.0) [2024-06-15 12:16:50,739][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 12:16:51,288][1652475] Updated weights for policy 0, policy_version 57361 (0.0012) [2024-06-15 12:16:52,520][1652475] Updated weights for policy 0, policy_version 57409 (0.0108) [2024-06-15 12:16:53,664][1652475] Updated weights for policy 0, policy_version 57469 (0.0011) [2024-06-15 12:16:55,738][1648984] Fps is (10 sec: 52460.5, 60 sec: 43690.5, 300 sec: 44320.1). Total num frames: 117702656. Throughput: 0: 10934.0. Samples: 29476352. Policy #0 lag: (min: 9.0, avg: 91.9, max: 265.0) [2024-06-15 12:16:55,739][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 12:16:55,744][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000057472_117702656.pth... 
[2024-06-15 12:16:55,818][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000052336_107184128.pth [2024-06-15 12:16:59,558][1652475] Updated weights for policy 0, policy_version 57530 (0.0032) [2024-06-15 12:17:00,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 43690.6, 300 sec: 43653.6). Total num frames: 117833728. Throughput: 0: 11207.4. Samples: 29550592. Policy #0 lag: (min: 9.0, avg: 91.9, max: 265.0) [2024-06-15 12:17:00,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 12:17:01,599][1652475] Updated weights for policy 0, policy_version 57571 (0.0017) [2024-06-15 12:17:03,081][1652475] Updated weights for policy 0, policy_version 57617 (0.0014) [2024-06-15 12:17:05,016][1652475] Updated weights for policy 0, policy_version 57699 (0.0082) [2024-06-15 12:17:05,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 45328.8, 300 sec: 44764.4). Total num frames: 118226944. Throughput: 0: 11252.5. Samples: 29580800. Policy #0 lag: (min: 9.0, avg: 91.9, max: 265.0) [2024-06-15 12:17:05,739][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 12:17:10,397][1651340] Signal inference workers to stop experience collection... (3050 times) [2024-06-15 12:17:10,453][1652475] InferenceWorker_p0-w0: stopping experience collection (3050 times) [2024-06-15 12:17:10,738][1648984] Fps is (10 sec: 39322.0, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 118226944. Throughput: 0: 11070.6. Samples: 29645824. Policy #0 lag: (min: 9.0, avg: 91.9, max: 265.0) [2024-06-15 12:17:10,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 12:17:10,744][1651340] Signal inference workers to resume experience collection... 
(3050 times) [2024-06-15 12:17:10,745][1652475] InferenceWorker_p0-w0: resuming experience collection (3050 times) [2024-06-15 12:17:10,872][1652475] Updated weights for policy 0, policy_version 57745 (0.0012) [2024-06-15 12:17:13,002][1652475] Updated weights for policy 0, policy_version 57794 (0.0015) [2024-06-15 12:17:15,418][1652475] Updated weights for policy 0, policy_version 57888 (0.0088) [2024-06-15 12:17:15,738][1648984] Fps is (10 sec: 32768.8, 60 sec: 44237.5, 300 sec: 44209.0). Total num frames: 118554624. Throughput: 0: 11150.8. Samples: 29709312. Policy #0 lag: (min: 9.0, avg: 91.9, max: 265.0) [2024-06-15 12:17:15,739][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 12:17:17,396][1652475] Updated weights for policy 0, policy_version 57980 (0.0012) [2024-06-15 12:17:20,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 43690.7, 300 sec: 43875.8). Total num frames: 118751232. Throughput: 0: 10774.8. Samples: 29733376. Policy #0 lag: (min: 9.0, avg: 91.9, max: 265.0) [2024-06-15 12:17:20,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 12:17:23,970][1652475] Updated weights for policy 0, policy_version 58040 (0.0088) [2024-06-15 12:17:25,738][1648984] Fps is (10 sec: 36044.7, 60 sec: 44236.8, 300 sec: 43653.6). Total num frames: 118915072. Throughput: 0: 10899.9. Samples: 29803520. Policy #0 lag: (min: 15.0, avg: 94.1, max: 271.0) [2024-06-15 12:17:25,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 12:17:26,097][1652475] Updated weights for policy 0, policy_version 58086 (0.0014) [2024-06-15 12:17:27,388][1652475] Updated weights for policy 0, policy_version 58128 (0.0014) [2024-06-15 12:17:29,299][1652475] Updated weights for policy 0, policy_version 58224 (0.0015) [2024-06-15 12:17:30,738][1648984] Fps is (10 sec: 52427.8, 60 sec: 43692.2, 300 sec: 44431.2). Total num frames: 119275520. Throughput: 0: 10708.0. Samples: 29862400. 
Policy #0 lag: (min: 15.0, avg: 94.1, max: 271.0) [2024-06-15 12:17:30,739][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 12:17:34,832][1652475] Updated weights for policy 0, policy_version 58272 (0.0012) [2024-06-15 12:17:35,738][1648984] Fps is (10 sec: 49151.5, 60 sec: 44236.7, 300 sec: 43542.5). Total num frames: 119406592. Throughput: 0: 10695.1. Samples: 29895168. Policy #0 lag: (min: 15.0, avg: 94.1, max: 271.0) [2024-06-15 12:17:35,739][1648984] Avg episode reward: [(0, '-0.560')] [2024-06-15 12:17:39,076][1652475] Updated weights for policy 0, policy_version 58336 (0.0014) [2024-06-15 12:17:40,378][1652475] Updated weights for policy 0, policy_version 58400 (0.0012) [2024-06-15 12:17:40,738][1648984] Fps is (10 sec: 36045.1, 60 sec: 43144.5, 300 sec: 43875.8). Total num frames: 119635968. Throughput: 0: 10911.4. Samples: 29967360. Policy #0 lag: (min: 15.0, avg: 94.1, max: 271.0) [2024-06-15 12:17:40,738][1648984] Avg episode reward: [(0, '-0.620')] [2024-06-15 12:17:41,635][1652475] Updated weights for policy 0, policy_version 58464 (0.0092) [2024-06-15 12:17:45,738][1648984] Fps is (10 sec: 42599.5, 60 sec: 44241.6, 300 sec: 43653.7). Total num frames: 119832576. Throughput: 0: 10717.9. Samples: 30032896. Policy #0 lag: (min: 15.0, avg: 94.1, max: 271.0) [2024-06-15 12:17:45,738][1648984] Avg episode reward: [(0, '-0.620')] [2024-06-15 12:17:45,872][1652475] Updated weights for policy 0, policy_version 58528 (0.0014) [2024-06-15 12:17:46,579][1652475] Updated weights for policy 0, policy_version 58560 (0.0043) [2024-06-15 12:17:50,738][1648984] Fps is (10 sec: 36045.0, 60 sec: 42598.4, 300 sec: 43542.7). Total num frames: 119996416. Throughput: 0: 10820.3. Samples: 30067712. 
Policy #0 lag: (min: 15.0, avg: 94.1, max: 271.0) [2024-06-15 12:17:50,738][1648984] Avg episode reward: [(0, '-0.620')] [2024-06-15 12:17:51,459][1652475] Updated weights for policy 0, policy_version 58640 (0.0013) [2024-06-15 12:17:52,771][1651340] Signal inference workers to stop experience collection... (3100 times) [2024-06-15 12:17:52,820][1652475] InferenceWorker_p0-w0: stopping experience collection (3100 times) [2024-06-15 12:17:52,992][1651340] Signal inference workers to resume experience collection... (3100 times) [2024-06-15 12:17:52,992][1652475] InferenceWorker_p0-w0: resuming experience collection (3100 times) [2024-06-15 12:17:53,168][1652475] Updated weights for policy 0, policy_version 58707 (0.0016) [2024-06-15 12:17:54,081][1652475] Updated weights for policy 0, policy_version 58750 (0.0039) [2024-06-15 12:17:55,738][1648984] Fps is (10 sec: 49151.7, 60 sec: 43690.9, 300 sec: 43986.9). Total num frames: 120324096. Throughput: 0: 10831.6. Samples: 30133248. Policy #0 lag: (min: 15.0, avg: 94.1, max: 271.0) [2024-06-15 12:17:55,738][1648984] Avg episode reward: [(0, '-0.630')] [2024-06-15 12:17:57,593][1652475] Updated weights for policy 0, policy_version 58800 (0.0013) [2024-06-15 12:18:00,738][1648984] Fps is (10 sec: 45875.1, 60 sec: 43690.7, 300 sec: 43986.9). Total num frames: 120455168. Throughput: 0: 11116.1. Samples: 30209536. Policy #0 lag: (min: 15.0, avg: 94.1, max: 271.0) [2024-06-15 12:18:00,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 12:18:01,703][1652475] Updated weights for policy 0, policy_version 58833 (0.0034) [2024-06-15 12:18:03,145][1652475] Updated weights for policy 0, policy_version 58915 (0.0013) [2024-06-15 12:18:04,688][1652475] Updated weights for policy 0, policy_version 58992 (0.0013) [2024-06-15 12:18:05,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 43690.9, 300 sec: 44542.3). Total num frames: 120848384. Throughput: 0: 11229.9. Samples: 30238720. 
Policy #0 lag: (min: 15.0, avg: 89.7, max: 271.0) [2024-06-15 12:18:05,738][1648984] Avg episode reward: [(0, '-0.580')] [2024-06-15 12:18:10,738][1648984] Fps is (10 sec: 45875.3, 60 sec: 44782.9, 300 sec: 43764.9). Total num frames: 120913920. Throughput: 0: 11104.7. Samples: 30303232. Policy #0 lag: (min: 15.0, avg: 89.7, max: 271.0) [2024-06-15 12:18:10,751][1648984] Avg episode reward: [(0, '-0.580')] [2024-06-15 12:18:11,266][1652475] Updated weights for policy 0, policy_version 59069 (0.0015) [2024-06-15 12:18:13,679][1652475] Updated weights for policy 0, policy_version 59129 (0.0014) [2024-06-15 12:18:15,738][1648984] Fps is (10 sec: 32768.0, 60 sec: 43690.7, 300 sec: 44209.2). Total num frames: 121176064. Throughput: 0: 11218.5. Samples: 30367232. Policy #0 lag: (min: 15.0, avg: 89.7, max: 271.0) [2024-06-15 12:18:15,738][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 12:18:15,953][1652475] Updated weights for policy 0, policy_version 59187 (0.0013) [2024-06-15 12:18:17,672][1652475] Updated weights for policy 0, policy_version 59262 (0.0081) [2024-06-15 12:18:20,738][1648984] Fps is (10 sec: 45875.3, 60 sec: 43690.7, 300 sec: 43764.7). Total num frames: 121372672. Throughput: 0: 11036.5. Samples: 30391808. Policy #0 lag: (min: 15.0, avg: 89.7, max: 271.0) [2024-06-15 12:18:20,738][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 12:18:23,158][1652475] Updated weights for policy 0, policy_version 59328 (0.0011) [2024-06-15 12:18:25,738][1648984] Fps is (10 sec: 45874.9, 60 sec: 45329.1, 300 sec: 43986.9). Total num frames: 121634816. Throughput: 0: 11047.8. Samples: 30464512. 
Policy #0 lag: (min: 15.0, avg: 89.7, max: 271.0) [2024-06-15 12:18:25,738][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 12:18:26,893][1652475] Updated weights for policy 0, policy_version 59398 (0.0015) [2024-06-15 12:18:28,973][1652475] Updated weights for policy 0, policy_version 59475 (0.0014) [2024-06-15 12:18:30,738][1648984] Fps is (10 sec: 52428.4, 60 sec: 43690.7, 300 sec: 44431.2). Total num frames: 121896960. Throughput: 0: 10797.5. Samples: 30518784. Policy #0 lag: (min: 15.0, avg: 89.7, max: 271.0) [2024-06-15 12:18:30,738][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 12:18:34,003][1652475] Updated weights for policy 0, policy_version 59537 (0.0017) [2024-06-15 12:18:35,738][1648984] Fps is (10 sec: 39322.0, 60 sec: 43690.8, 300 sec: 43986.9). Total num frames: 122028032. Throughput: 0: 11070.6. Samples: 30565888. Policy #0 lag: (min: 15.0, avg: 89.7, max: 271.0) [2024-06-15 12:18:35,738][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 12:18:36,214][1652475] Updated weights for policy 0, policy_version 59587 (0.0013) [2024-06-15 12:18:38,540][1652475] Updated weights for policy 0, policy_version 59652 (0.0015) [2024-06-15 12:18:38,749][1651340] Signal inference workers to stop experience collection... (3150 times) [2024-06-15 12:18:38,815][1652475] InferenceWorker_p0-w0: stopping experience collection (3150 times) [2024-06-15 12:18:38,982][1651340] Signal inference workers to resume experience collection... (3150 times) [2024-06-15 12:18:38,983][1652475] InferenceWorker_p0-w0: resuming experience collection (3150 times) [2024-06-15 12:18:40,148][1652475] Updated weights for policy 0, policy_version 59715 (0.0084) [2024-06-15 12:18:40,738][1648984] Fps is (10 sec: 42597.8, 60 sec: 44782.8, 300 sec: 44209.0). Total num frames: 122322944. Throughput: 0: 11104.7. Samples: 30632960. 
Policy #0 lag: (min: 15.0, avg: 126.1, max: 271.0) [2024-06-15 12:18:40,739][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 12:18:41,500][1652475] Updated weights for policy 0, policy_version 59770 (0.0015) [2024-06-15 12:18:45,722][1652475] Updated weights for policy 0, policy_version 59811 (0.0012) [2024-06-15 12:18:45,738][1648984] Fps is (10 sec: 45875.1, 60 sec: 44236.8, 300 sec: 44209.0). Total num frames: 122486784. Throughput: 0: 10979.6. Samples: 30703616. Policy #0 lag: (min: 15.0, avg: 126.1, max: 271.0) [2024-06-15 12:18:45,738][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 12:18:47,990][1652475] Updated weights for policy 0, policy_version 59872 (0.0012) [2024-06-15 12:18:49,988][1652475] Updated weights for policy 0, policy_version 59926 (0.0013) [2024-06-15 12:18:50,709][1652475] Updated weights for policy 0, policy_version 59964 (0.0036) [2024-06-15 12:18:50,738][1648984] Fps is (10 sec: 45875.8, 60 sec: 46421.3, 300 sec: 44209.0). Total num frames: 122781696. Throughput: 0: 11059.2. Samples: 30736384. Policy #0 lag: (min: 15.0, avg: 126.1, max: 271.0) [2024-06-15 12:18:50,738][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 12:18:55,738][1648984] Fps is (10 sec: 45874.5, 60 sec: 43690.6, 300 sec: 44209.0). Total num frames: 122945536. Throughput: 0: 11172.9. Samples: 30806016. Policy #0 lag: (min: 15.0, avg: 126.1, max: 271.0) [2024-06-15 12:18:55,738][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 12:18:55,743][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000060032_122945536.pth... 
[2024-06-15 12:18:55,791][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000054912_112459776.pth [2024-06-15 12:18:56,429][1652475] Updated weights for policy 0, policy_version 60034 (0.0013) [2024-06-15 12:18:58,676][1652475] Updated weights for policy 0, policy_version 60116 (0.0014) [2024-06-15 12:19:00,738][1648984] Fps is (10 sec: 42598.2, 60 sec: 45875.1, 300 sec: 43986.9). Total num frames: 123207680. Throughput: 0: 11343.6. Samples: 30877696. Policy #0 lag: (min: 15.0, avg: 126.1, max: 271.0) [2024-06-15 12:19:00,739][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 12:19:01,332][1652475] Updated weights for policy 0, policy_version 60182 (0.0078) [2024-06-15 12:19:03,788][1652475] Updated weights for policy 0, policy_version 60277 (0.0019) [2024-06-15 12:19:05,738][1648984] Fps is (10 sec: 52429.1, 60 sec: 43690.6, 300 sec: 44431.2). Total num frames: 123469824. Throughput: 0: 11411.9. Samples: 30905344. Policy #0 lag: (min: 15.0, avg: 126.1, max: 271.0) [2024-06-15 12:19:05,738][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 12:19:08,895][1652475] Updated weights for policy 0, policy_version 60344 (0.0012) [2024-06-15 12:19:10,366][1652475] Updated weights for policy 0, policy_version 60389 (0.0080) [2024-06-15 12:19:10,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 46967.4, 300 sec: 44320.1). Total num frames: 123731968. Throughput: 0: 11446.0. Samples: 30979584. Policy #0 lag: (min: 15.0, avg: 126.1, max: 271.0) [2024-06-15 12:19:10,739][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 12:19:13,178][1652475] Updated weights for policy 0, policy_version 60449 (0.0013) [2024-06-15 12:19:14,533][1652475] Updated weights for policy 0, policy_version 60496 (0.0012) [2024-06-15 12:19:15,689][1652475] Updated weights for policy 0, policy_version 60538 (0.0027) [2024-06-15 12:19:15,738][1648984] Fps is (10 sec: 49152.3, 60 sec: 46421.3, 300 sec: 44320.1). Total num frames: 123961344. 
Throughput: 0: 11639.5. Samples: 31042560. Policy #0 lag: (min: 15.0, avg: 126.1, max: 271.0) [2024-06-15 12:19:15,738][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 12:19:20,070][1652475] Updated weights for policy 0, policy_version 60592 (0.0014) [2024-06-15 12:19:20,738][1648984] Fps is (10 sec: 42599.3, 60 sec: 46421.4, 300 sec: 44320.1). Total num frames: 124157952. Throughput: 0: 11559.8. Samples: 31086080. Policy #0 lag: (min: 15.0, avg: 98.0, max: 271.0) [2024-06-15 12:19:20,738][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 12:19:21,612][1652475] Updated weights for policy 0, policy_version 60670 (0.0015) [2024-06-15 12:19:24,303][1651340] Signal inference workers to stop experience collection... (3200 times) [2024-06-15 12:19:24,415][1652475] InferenceWorker_p0-w0: stopping experience collection (3200 times) [2024-06-15 12:19:24,557][1651340] Signal inference workers to resume experience collection... (3200 times) [2024-06-15 12:19:24,559][1652475] InferenceWorker_p0-w0: resuming experience collection (3200 times) [2024-06-15 12:19:25,269][1652475] Updated weights for policy 0, policy_version 60736 (0.0049) [2024-06-15 12:19:25,738][1648984] Fps is (10 sec: 42598.7, 60 sec: 45875.3, 300 sec: 43986.9). Total num frames: 124387328. Throughput: 0: 11537.1. Samples: 31152128. Policy #0 lag: (min: 15.0, avg: 98.0, max: 271.0) [2024-06-15 12:19:25,738][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 12:19:27,161][1652475] Updated weights for policy 0, policy_version 60792 (0.0013) [2024-06-15 12:19:30,738][1648984] Fps is (10 sec: 36044.1, 60 sec: 43690.6, 300 sec: 43986.9). Total num frames: 124518400. Throughput: 0: 11514.3. Samples: 31221760. 
Policy #0 lag: (min: 15.0, avg: 98.0, max: 271.0) [2024-06-15 12:19:30,738][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 12:19:31,820][1652475] Updated weights for policy 0, policy_version 60835 (0.0012) [2024-06-15 12:19:33,392][1652475] Updated weights for policy 0, policy_version 60912 (0.0022) [2024-06-15 12:19:35,738][1648984] Fps is (10 sec: 39320.6, 60 sec: 45875.0, 300 sec: 43764.7). Total num frames: 124780544. Throughput: 0: 11434.6. Samples: 31250944. Policy #0 lag: (min: 15.0, avg: 98.0, max: 271.0) [2024-06-15 12:19:35,739][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 12:19:37,227][1652475] Updated weights for policy 0, policy_version 60981 (0.0070) [2024-06-15 12:19:38,154][1652475] Updated weights for policy 0, policy_version 61008 (0.0011) [2024-06-15 12:19:40,738][1648984] Fps is (10 sec: 52429.6, 60 sec: 45329.3, 300 sec: 44098.0). Total num frames: 125042688. Throughput: 0: 11264.1. Samples: 31312896. Policy #0 lag: (min: 15.0, avg: 98.0, max: 271.0) [2024-06-15 12:19:40,738][1648984] Avg episode reward: [(0, '-0.540')] [2024-06-15 12:19:42,460][1652475] Updated weights for policy 0, policy_version 61060 (0.0022) [2024-06-15 12:19:43,447][1652475] Updated weights for policy 0, policy_version 61120 (0.0014) [2024-06-15 12:19:45,738][1648984] Fps is (10 sec: 42598.9, 60 sec: 45329.0, 300 sec: 43986.9). Total num frames: 125206528. Throughput: 0: 11286.8. Samples: 31385600. Policy #0 lag: (min: 15.0, avg: 98.0, max: 271.0) [2024-06-15 12:19:45,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 12:19:48,544][1652475] Updated weights for policy 0, policy_version 61189 (0.0014) [2024-06-15 12:19:50,738][1648984] Fps is (10 sec: 42597.3, 60 sec: 44782.8, 300 sec: 44097.9). Total num frames: 125468672. Throughput: 0: 11377.7. Samples: 31417344. 
Policy #0 lag: (min: 15.0, avg: 98.0, max: 271.0) [2024-06-15 12:19:50,739][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 12:19:51,571][1652475] Updated weights for policy 0, policy_version 61296 (0.0014) [2024-06-15 12:19:55,554][1652475] Updated weights for policy 0, policy_version 61375 (0.0014) [2024-06-15 12:19:55,738][1648984] Fps is (10 sec: 49152.4, 60 sec: 45875.3, 300 sec: 44209.0). Total num frames: 125698048. Throughput: 0: 11013.7. Samples: 31475200. Policy #0 lag: (min: 15.0, avg: 98.0, max: 271.0) [2024-06-15 12:19:55,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 12:19:58,952][1652475] Updated weights for policy 0, policy_version 61433 (0.0013) [2024-06-15 12:20:00,738][1648984] Fps is (10 sec: 36045.5, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 125829120. Throughput: 0: 11138.9. Samples: 31543808. Policy #0 lag: (min: 13.0, avg: 109.2, max: 269.0) [2024-06-15 12:20:00,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 12:20:01,802][1652475] Updated weights for policy 0, policy_version 61488 (0.0017) [2024-06-15 12:20:03,498][1652475] Updated weights for policy 0, policy_version 61552 (0.0013) [2024-06-15 12:20:05,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 43690.7, 300 sec: 43986.9). Total num frames: 126091264. Throughput: 0: 10820.2. Samples: 31572992. Policy #0 lag: (min: 13.0, avg: 109.2, max: 269.0) [2024-06-15 12:20:05,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 12:20:06,434][1652475] Updated weights for policy 0, policy_version 61584 (0.0015) [2024-06-15 12:20:08,852][1652475] Updated weights for policy 0, policy_version 61633 (0.0025) [2024-06-15 12:20:10,742][1648984] Fps is (10 sec: 52404.7, 60 sec: 43687.4, 300 sec: 44097.3). Total num frames: 126353408. Throughput: 0: 10921.5. Samples: 31643648. 
Policy #0 lag: (min: 13.0, avg: 109.2, max: 269.0) [2024-06-15 12:20:10,743][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 12:20:12,657][1651340] Signal inference workers to stop experience collection... (3250 times) [2024-06-15 12:20:12,717][1652475] InferenceWorker_p0-w0: stopping experience collection (3250 times) [2024-06-15 12:20:12,720][1652475] Updated weights for policy 0, policy_version 61698 (0.0014) [2024-06-15 12:20:12,911][1651340] Signal inference workers to resume experience collection... (3250 times) [2024-06-15 12:20:12,912][1652475] InferenceWorker_p0-w0: resuming experience collection (3250 times) [2024-06-15 12:20:14,758][1652475] Updated weights for policy 0, policy_version 61779 (0.0012) [2024-06-15 12:20:15,738][1648984] Fps is (10 sec: 49151.9, 60 sec: 43690.7, 300 sec: 44320.2). Total num frames: 126582784. Throughput: 0: 10706.5. Samples: 31703552. Policy #0 lag: (min: 13.0, avg: 109.2, max: 269.0) [2024-06-15 12:20:15,738][1648984] Avg episode reward: [(0, '-0.550')] [2024-06-15 12:20:15,912][1652475] Updated weights for policy 0, policy_version 61824 (0.0012) [2024-06-15 12:20:20,738][1648984] Fps is (10 sec: 36061.5, 60 sec: 42598.3, 300 sec: 43875.8). Total num frames: 126713856. Throughput: 0: 10854.5. Samples: 31739392. Policy #0 lag: (min: 13.0, avg: 109.2, max: 269.0) [2024-06-15 12:20:20,738][1648984] Avg episode reward: [(0, '-0.660')] [2024-06-15 12:20:20,988][1652475] Updated weights for policy 0, policy_version 61889 (0.0017) [2024-06-15 12:20:24,739][1652475] Updated weights for policy 0, policy_version 61956 (0.0015) [2024-06-15 12:20:25,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 43144.5, 300 sec: 43875.8). Total num frames: 126976000. Throughput: 0: 10865.8. Samples: 31801856. 
Policy #0 lag: (min: 13.0, avg: 109.2, max: 269.0) [2024-06-15 12:20:25,738][1648984] Avg episode reward: [(0, '-0.740')] [2024-06-15 12:20:25,915][1652475] Updated weights for policy 0, policy_version 62010 (0.0017) [2024-06-15 12:20:27,826][1652475] Updated weights for policy 0, policy_version 62073 (0.0013) [2024-06-15 12:20:30,738][1648984] Fps is (10 sec: 42598.5, 60 sec: 43690.8, 300 sec: 43653.7). Total num frames: 127139840. Throughput: 0: 10763.4. Samples: 31869952. Policy #0 lag: (min: 13.0, avg: 109.2, max: 269.0) [2024-06-15 12:20:30,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 12:20:32,120][1652475] Updated weights for policy 0, policy_version 62112 (0.0150) [2024-06-15 12:20:33,428][1652475] Updated weights for policy 0, policy_version 62162 (0.0022) [2024-06-15 12:20:34,397][1652475] Updated weights for policy 0, policy_version 62206 (0.0013) [2024-06-15 12:20:35,738][1648984] Fps is (10 sec: 42597.5, 60 sec: 43690.7, 300 sec: 43986.9). Total num frames: 127401984. Throughput: 0: 10808.9. Samples: 31903744. Policy #0 lag: (min: 13.0, avg: 109.2, max: 269.0) [2024-06-15 12:20:35,739][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 12:20:37,418][1652475] Updated weights for policy 0, policy_version 62264 (0.0014) [2024-06-15 12:20:39,430][1652475] Updated weights for policy 0, policy_version 62326 (0.0015) [2024-06-15 12:20:40,738][1648984] Fps is (10 sec: 52428.5, 60 sec: 43690.6, 300 sec: 44431.2). Total num frames: 127664128. Throughput: 0: 10922.7. Samples: 31966720. Policy #0 lag: (min: 24.0, avg: 142.4, max: 280.0) [2024-06-15 12:20:40,740][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 12:20:43,998][1652475] Updated weights for policy 0, policy_version 62368 (0.0014) [2024-06-15 12:20:45,738][1648984] Fps is (10 sec: 42599.5, 60 sec: 43690.8, 300 sec: 44098.0). Total num frames: 127827968. Throughput: 0: 10843.0. Samples: 32031744. 
Policy #0 lag: (min: 24.0, avg: 142.4, max: 280.0) [2024-06-15 12:20:45,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 12:20:46,354][1652475] Updated weights for policy 0, policy_version 62454 (0.0013) [2024-06-15 12:20:48,496][1652475] Updated weights for policy 0, policy_version 62482 (0.0013) [2024-06-15 12:20:50,738][1648984] Fps is (10 sec: 39322.0, 60 sec: 43144.8, 300 sec: 43986.9). Total num frames: 128057344. Throughput: 0: 10899.9. Samples: 32063488. Policy #0 lag: (min: 24.0, avg: 142.4, max: 280.0) [2024-06-15 12:20:50,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 12:20:51,399][1652475] Updated weights for policy 0, policy_version 62547 (0.0013) [2024-06-15 12:20:55,738][1648984] Fps is (10 sec: 36044.3, 60 sec: 41506.1, 300 sec: 43986.9). Total num frames: 128188416. Throughput: 0: 10832.7. Samples: 32131072. Policy #0 lag: (min: 24.0, avg: 142.4, max: 280.0) [2024-06-15 12:20:55,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 12:20:56,083][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000062608_128221184.pth... [2024-06-15 12:20:56,222][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000057472_117702656.pth [2024-06-15 12:20:56,736][1652475] Updated weights for policy 0, policy_version 62630 (0.0015) [2024-06-15 12:20:57,525][1651340] Signal inference workers to stop experience collection... (3300 times) [2024-06-15 12:20:57,598][1652475] InferenceWorker_p0-w0: stopping experience collection (3300 times) [2024-06-15 12:20:57,799][1651340] Signal inference workers to resume experience collection... (3300 times) [2024-06-15 12:20:57,800][1652475] InferenceWorker_p0-w0: resuming experience collection (3300 times) [2024-06-15 12:20:58,715][1652475] Updated weights for policy 0, policy_version 62715 (0.0014) [2024-06-15 12:21:00,738][1648984] Fps is (10 sec: 42598.2, 60 sec: 44236.8, 300 sec: 43986.9). Total num frames: 128483328. 
Throughput: 0: 10899.9. Samples: 32194048. Policy #0 lag: (min: 24.0, avg: 142.4, max: 280.0) [2024-06-15 12:21:00,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 12:21:03,117][1652475] Updated weights for policy 0, policy_version 62785 (0.0015) [2024-06-15 12:21:04,361][1652475] Updated weights for policy 0, policy_version 62847 (0.0025) [2024-06-15 12:21:05,738][1648984] Fps is (10 sec: 52429.4, 60 sec: 43690.7, 300 sec: 44431.2). Total num frames: 128712704. Throughput: 0: 10843.0. Samples: 32227328. Policy #0 lag: (min: 24.0, avg: 142.4, max: 280.0) [2024-06-15 12:21:05,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 12:21:09,272][1652475] Updated weights for policy 0, policy_version 62912 (0.0121) [2024-06-15 12:21:10,738][1648984] Fps is (10 sec: 45875.1, 60 sec: 43147.9, 300 sec: 44209.2). Total num frames: 128942080. Throughput: 0: 10934.1. Samples: 32293888. Policy #0 lag: (min: 24.0, avg: 142.4, max: 280.0) [2024-06-15 12:21:10,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 12:21:10,798][1652475] Updated weights for policy 0, policy_version 62974 (0.0013) [2024-06-15 12:21:13,394][1652475] Updated weights for policy 0, policy_version 63034 (0.0013) [2024-06-15 12:21:15,738][1648984] Fps is (10 sec: 45874.9, 60 sec: 43144.5, 300 sec: 44209.0). Total num frames: 129171456. Throughput: 0: 10877.1. Samples: 32359424. Policy #0 lag: (min: 31.0, avg: 157.5, max: 287.0) [2024-06-15 12:21:15,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 12:21:16,023][1652475] Updated weights for policy 0, policy_version 63090 (0.0025) [2024-06-15 12:21:20,247][1652475] Updated weights for policy 0, policy_version 63152 (0.0151) [2024-06-15 12:21:20,740][1648984] Fps is (10 sec: 42598.1, 60 sec: 44236.8, 300 sec: 44431.2). Total num frames: 129368064. Throughput: 0: 10945.5. Samples: 32396288. 
Policy #0 lag: (min: 31.0, avg: 157.5, max: 287.0) [2024-06-15 12:21:20,741][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 12:21:22,191][1652475] Updated weights for policy 0, policy_version 63221 (0.0014) [2024-06-15 12:21:24,538][1652475] Updated weights for policy 0, policy_version 63264 (0.0012) [2024-06-15 12:21:25,738][1648984] Fps is (10 sec: 45875.4, 60 sec: 44236.8, 300 sec: 43987.2). Total num frames: 129630208. Throughput: 0: 10911.3. Samples: 32457728. Policy #0 lag: (min: 31.0, avg: 157.5, max: 287.0) [2024-06-15 12:21:25,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 12:21:26,931][1652475] Updated weights for policy 0, policy_version 63302 (0.0013) [2024-06-15 12:21:30,738][1648984] Fps is (10 sec: 39322.0, 60 sec: 43690.7, 300 sec: 44098.0). Total num frames: 129761280. Throughput: 0: 11059.2. Samples: 32529408. Policy #0 lag: (min: 31.0, avg: 157.5, max: 287.0) [2024-06-15 12:21:30,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 12:21:31,018][1652475] Updated weights for policy 0, policy_version 63362 (0.0014) [2024-06-15 12:21:32,740][1652475] Updated weights for policy 0, policy_version 63426 (0.0013) [2024-06-15 12:21:35,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 43690.9, 300 sec: 43986.9). Total num frames: 130023424. Throughput: 0: 10934.0. Samples: 32555520. Policy #0 lag: (min: 31.0, avg: 157.5, max: 287.0) [2024-06-15 12:21:35,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 12:21:36,239][1652475] Updated weights for policy 0, policy_version 63504 (0.0013) [2024-06-15 12:21:39,607][1652475] Updated weights for policy 0, policy_version 63570 (0.0013) [2024-06-15 12:21:40,745][1648984] Fps is (10 sec: 49117.9, 60 sec: 43139.6, 300 sec: 44320.0). Total num frames: 130252800. Throughput: 0: 10886.9. Samples: 32621056. 
Policy #0 lag: (min: 31.0, avg: 157.5, max: 287.0) [2024-06-15 12:21:40,745][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 12:21:44,443][1652475] Updated weights for policy 0, policy_version 63636 (0.0022) [2024-06-15 12:21:44,771][1651340] Signal inference workers to stop experience collection... (3350 times) [2024-06-15 12:21:44,821][1652475] InferenceWorker_p0-w0: stopping experience collection (3350 times) [2024-06-15 12:21:45,053][1651340] Signal inference workers to resume experience collection... (3350 times) [2024-06-15 12:21:45,054][1652475] InferenceWorker_p0-w0: resuming experience collection (3350 times) [2024-06-15 12:21:45,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 43144.5, 300 sec: 43986.9). Total num frames: 130416640. Throughput: 0: 10888.5. Samples: 32684032. Policy #0 lag: (min: 31.0, avg: 157.5, max: 287.0) [2024-06-15 12:21:45,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 12:21:46,191][1652475] Updated weights for policy 0, policy_version 63712 (0.0014) [2024-06-15 12:21:48,603][1652475] Updated weights for policy 0, policy_version 63762 (0.0016) [2024-06-15 12:21:49,401][1652475] Updated weights for policy 0, policy_version 63805 (0.0015) [2024-06-15 12:21:50,738][1648984] Fps is (10 sec: 42627.4, 60 sec: 43690.5, 300 sec: 43986.9). Total num frames: 130678784. Throughput: 0: 10808.9. Samples: 32713728. Policy #0 lag: (min: 31.0, avg: 157.5, max: 287.0) [2024-06-15 12:21:50,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 12:21:52,828][1652475] Updated weights for policy 0, policy_version 63865 (0.0053) [2024-06-15 12:21:55,738][1648984] Fps is (10 sec: 39321.2, 60 sec: 43690.7, 300 sec: 43986.9). Total num frames: 130809856. Throughput: 0: 10865.8. Samples: 32782848. 
Policy #0 lag: (min: 31.0, avg: 157.5, max: 287.0) [2024-06-15 12:21:55,738][1648984] Avg episode reward: [(0, '-0.520')] [2024-06-15 12:21:57,147][1652475] Updated weights for policy 0, policy_version 63906 (0.0013) [2024-06-15 12:21:58,748][1652475] Updated weights for policy 0, policy_version 63991 (0.0015) [2024-06-15 12:22:00,738][1648984] Fps is (10 sec: 42598.7, 60 sec: 43690.6, 300 sec: 43653.7). Total num frames: 131104768. Throughput: 0: 10740.6. Samples: 32842752. Policy #0 lag: (min: 15.0, avg: 96.4, max: 271.0) [2024-06-15 12:22:00,738][1648984] Avg episode reward: [(0, '-0.570')] [2024-06-15 12:22:01,491][1652475] Updated weights for policy 0, policy_version 64048 (0.0015) [2024-06-15 12:22:04,540][1652475] Updated weights for policy 0, policy_version 64100 (0.0013) [2024-06-15 12:22:05,738][1648984] Fps is (10 sec: 52429.1, 60 sec: 43690.6, 300 sec: 44431.2). Total num frames: 131334144. Throughput: 0: 10558.6. Samples: 32871424. Policy #0 lag: (min: 15.0, avg: 96.4, max: 271.0) [2024-06-15 12:22:05,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 12:22:08,963][1652475] Updated weights for policy 0, policy_version 64160 (0.0013) [2024-06-15 12:22:10,738][1648984] Fps is (10 sec: 36044.5, 60 sec: 42052.2, 300 sec: 43764.7). Total num frames: 131465216. Throughput: 0: 10604.1. Samples: 32934912. Policy #0 lag: (min: 15.0, avg: 96.4, max: 271.0) [2024-06-15 12:22:10,739][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 12:22:10,842][1652475] Updated weights for policy 0, policy_version 64208 (0.0016) [2024-06-15 12:22:12,082][1652475] Updated weights for policy 0, policy_version 64256 (0.0013) [2024-06-15 12:22:15,738][1648984] Fps is (10 sec: 39321.9, 60 sec: 42598.4, 300 sec: 43986.9). Total num frames: 131727360. Throughput: 0: 10387.9. Samples: 32996864. 
Policy #0 lag: (min: 15.0, avg: 96.4, max: 271.0) [2024-06-15 12:22:15,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 12:22:16,050][1652475] Updated weights for policy 0, policy_version 64339 (0.0012) [2024-06-15 12:22:20,344][1652475] Updated weights for policy 0, policy_version 64391 (0.0014) [2024-06-15 12:22:20,738][1648984] Fps is (10 sec: 42598.8, 60 sec: 42052.3, 300 sec: 43986.9). Total num frames: 131891200. Throughput: 0: 10513.1. Samples: 33028608. Policy #0 lag: (min: 15.0, avg: 96.4, max: 271.0) [2024-06-15 12:22:20,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 12:22:23,005][1652475] Updated weights for policy 0, policy_version 64450 (0.0013) [2024-06-15 12:22:24,443][1652475] Updated weights for policy 0, policy_version 64509 (0.0036) [2024-06-15 12:22:25,738][1648984] Fps is (10 sec: 39321.3, 60 sec: 41506.1, 300 sec: 43542.6). Total num frames: 132120576. Throughput: 0: 10503.3. Samples: 33093632. Policy #0 lag: (min: 15.0, avg: 96.4, max: 271.0) [2024-06-15 12:22:25,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 12:22:27,482][1652475] Updated weights for policy 0, policy_version 64569 (0.0026) [2024-06-15 12:22:29,079][1652475] Updated weights for policy 0, policy_version 64631 (0.0012) [2024-06-15 12:22:30,738][1648984] Fps is (10 sec: 49151.9, 60 sec: 43690.6, 300 sec: 43986.9). Total num frames: 132382720. Throughput: 0: 10581.3. Samples: 33160192. Policy #0 lag: (min: 15.0, avg: 96.4, max: 271.0) [2024-06-15 12:22:30,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 12:22:33,595][1651340] Signal inference workers to stop experience collection... (3400 times) [2024-06-15 12:22:33,626][1652475] InferenceWorker_p0-w0: stopping experience collection (3400 times) [2024-06-15 12:22:33,955][1651340] Signal inference workers to resume experience collection... 
(3400 times) [2024-06-15 12:22:33,956][1652475] InferenceWorker_p0-w0: resuming experience collection (3400 times) [2024-06-15 12:22:34,425][1652475] Updated weights for policy 0, policy_version 64688 (0.0018) [2024-06-15 12:22:35,738][1648984] Fps is (10 sec: 45873.8, 60 sec: 42598.1, 300 sec: 43875.8). Total num frames: 132579328. Throughput: 0: 10740.6. Samples: 33197056. Policy #0 lag: (min: 15.0, avg: 96.4, max: 271.0) [2024-06-15 12:22:35,739][1648984] Avg episode reward: [(0, '-0.520')] [2024-06-15 12:22:35,883][1652475] Updated weights for policy 0, policy_version 64752 (0.0096) [2024-06-15 12:22:39,488][1652475] Updated weights for policy 0, policy_version 64828 (0.0016) [2024-06-15 12:22:40,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 42057.1, 300 sec: 43875.8). Total num frames: 132775936. Throughput: 0: 10387.9. Samples: 33250304. Policy #0 lag: (min: 15.0, avg: 127.0, max: 271.0) [2024-06-15 12:22:40,738][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 12:22:41,752][1652475] Updated weights for policy 0, policy_version 64880 (0.0011) [2024-06-15 12:22:45,738][1648984] Fps is (10 sec: 32769.2, 60 sec: 41506.1, 300 sec: 43764.7). Total num frames: 132907008. Throughput: 0: 10422.0. Samples: 33311744. Policy #0 lag: (min: 15.0, avg: 127.0, max: 271.0) [2024-06-15 12:22:45,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 12:22:47,877][1652475] Updated weights for policy 0, policy_version 64956 (0.0014) [2024-06-15 12:22:49,363][1652475] Updated weights for policy 0, policy_version 65021 (0.0015) [2024-06-15 12:22:50,738][1648984] Fps is (10 sec: 45875.2, 60 sec: 42598.4, 300 sec: 43764.7). Total num frames: 133234688. Throughput: 0: 10422.1. Samples: 33340416. 
Policy #0 lag: (min: 15.0, avg: 127.0, max: 271.0) [2024-06-15 12:22:50,738][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 12:22:50,840][1652475] Updated weights for policy 0, policy_version 65072 (0.0011) [2024-06-15 12:22:54,759][1652475] Updated weights for policy 0, policy_version 65094 (0.0013) [2024-06-15 12:22:55,739][1648984] Fps is (10 sec: 45874.9, 60 sec: 42598.4, 300 sec: 43764.7). Total num frames: 133365760. Throughput: 0: 10558.6. Samples: 33410048. Policy #0 lag: (min: 15.0, avg: 127.0, max: 271.0) [2024-06-15 12:22:55,740][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 12:22:55,864][1652475] Updated weights for policy 0, policy_version 65140 (0.0014) [2024-06-15 12:22:56,092][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000065152_133431296.pth... [2024-06-15 12:22:56,167][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000060032_122945536.pth [2024-06-15 12:22:59,033][1652475] Updated weights for policy 0, policy_version 65184 (0.0021) [2024-06-15 12:23:00,231][1652475] Updated weights for policy 0, policy_version 65227 (0.0013) [2024-06-15 12:23:00,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 42052.3, 300 sec: 43320.4). Total num frames: 133627904. Throughput: 0: 10638.2. Samples: 33475584. Policy #0 lag: (min: 15.0, avg: 127.0, max: 271.0) [2024-06-15 12:23:00,738][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 12:23:01,398][1652475] Updated weights for policy 0, policy_version 65273 (0.0013) [2024-06-15 12:23:02,937][1652475] Updated weights for policy 0, policy_version 65339 (0.0013) [2024-06-15 12:23:05,738][1648984] Fps is (10 sec: 45875.5, 60 sec: 41506.2, 300 sec: 43764.7). Total num frames: 133824512. Throughput: 0: 10592.7. Samples: 33505280. 
Policy #0 lag: (min: 15.0, avg: 127.0, max: 271.0) [2024-06-15 12:23:05,738][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 12:23:07,009][1652475] Updated weights for policy 0, policy_version 65379 (0.0013) [2024-06-15 12:23:10,048][1652475] Updated weights for policy 0, policy_version 65424 (0.0014) [2024-06-15 12:23:10,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 43144.6, 300 sec: 43653.6). Total num frames: 134053888. Throughput: 0: 10888.6. Samples: 33583616. Policy #0 lag: (min: 15.0, avg: 127.0, max: 271.0) [2024-06-15 12:23:10,738][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 12:23:11,887][1652475] Updated weights for policy 0, policy_version 65488 (0.0014) [2024-06-15 12:23:12,820][1652475] Updated weights for policy 0, policy_version 65534 (0.0012) [2024-06-15 12:23:14,308][1652475] Updated weights for policy 0, policy_version 65584 (0.0013) [2024-06-15 12:23:15,742][1648984] Fps is (10 sec: 52405.2, 60 sec: 43687.4, 300 sec: 43986.2). Total num frames: 134348800. Throughput: 0: 10785.1. Samples: 33645568. Policy #0 lag: (min: 15.0, avg: 127.0, max: 271.0) [2024-06-15 12:23:15,744][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 12:23:17,737][1651340] Signal inference workers to stop experience collection... (3450 times) [2024-06-15 12:23:17,819][1652475] InferenceWorker_p0-w0: stopping experience collection (3450 times) [2024-06-15 12:23:17,821][1652475] Updated weights for policy 0, policy_version 65636 (0.0013) [2024-06-15 12:23:18,026][1651340] Signal inference workers to resume experience collection... (3450 times) [2024-06-15 12:23:18,027][1652475] InferenceWorker_p0-w0: resuming experience collection (3450 times) [2024-06-15 12:23:20,740][1648984] Fps is (10 sec: 42598.2, 60 sec: 43144.5, 300 sec: 43542.6). Total num frames: 134479872. Throughput: 0: 10752.1. Samples: 33680896. 
Policy #0 lag: (min: 15.0, avg: 127.0, max: 271.0) [2024-06-15 12:23:20,741][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 12:23:22,154][1652475] Updated weights for policy 0, policy_version 65698 (0.0013) [2024-06-15 12:23:23,325][1652475] Updated weights for policy 0, policy_version 65746 (0.0039) [2024-06-15 12:23:25,121][1652475] Updated weights for policy 0, policy_version 65812 (0.0012) [2024-06-15 12:23:25,738][1648984] Fps is (10 sec: 49174.2, 60 sec: 45329.1, 300 sec: 43875.8). Total num frames: 134840320. Throughput: 0: 11093.3. Samples: 33749504. Policy #0 lag: (min: 15.0, avg: 96.5, max: 271.0) [2024-06-15 12:23:25,738][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 12:23:28,796][1652475] Updated weights for policy 0, policy_version 65861 (0.0017) [2024-06-15 12:23:29,819][1652475] Updated weights for policy 0, policy_version 65910 (0.0018) [2024-06-15 12:23:30,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 43690.7, 300 sec: 43986.9). Total num frames: 135004160. Throughput: 0: 11275.4. Samples: 33819136. Policy #0 lag: (min: 15.0, avg: 96.5, max: 271.0) [2024-06-15 12:23:30,738][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 12:23:33,863][1652475] Updated weights for policy 0, policy_version 65952 (0.0013) [2024-06-15 12:23:35,738][1648984] Fps is (10 sec: 36044.8, 60 sec: 43690.9, 300 sec: 43653.7). Total num frames: 135200768. Throughput: 0: 11548.5. Samples: 33860096. Policy #0 lag: (min: 15.0, avg: 96.5, max: 271.0) [2024-06-15 12:23:35,738][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 12:23:35,997][1652475] Updated weights for policy 0, policy_version 66043 (0.0013) [2024-06-15 12:23:37,832][1652475] Updated weights for policy 0, policy_version 66112 (0.0012) [2024-06-15 12:23:40,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 43690.6, 300 sec: 43764.7). Total num frames: 135397376. Throughput: 0: 11184.4. Samples: 33913344. 
Policy #0 lag: (min: 15.0, avg: 96.5, max: 271.0) [2024-06-15 12:23:40,738][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 12:23:45,637][1652475] Updated weights for policy 0, policy_version 66178 (0.0015) [2024-06-15 12:23:45,738][1648984] Fps is (10 sec: 32767.8, 60 sec: 43690.6, 300 sec: 43209.3). Total num frames: 135528448. Throughput: 0: 11343.6. Samples: 33986048. Policy #0 lag: (min: 15.0, avg: 96.5, max: 271.0) [2024-06-15 12:23:45,738][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 12:23:47,243][1652475] Updated weights for policy 0, policy_version 66240 (0.0014) [2024-06-15 12:23:48,761][1652475] Updated weights for policy 0, policy_version 66292 (0.0014) [2024-06-15 12:23:50,738][1648984] Fps is (10 sec: 49152.3, 60 sec: 44236.8, 300 sec: 43875.8). Total num frames: 135888896. Throughput: 0: 11195.7. Samples: 34009088. Policy #0 lag: (min: 15.0, avg: 96.5, max: 271.0) [2024-06-15 12:23:50,738][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 12:23:53,529][1652475] Updated weights for policy 0, policy_version 66370 (0.0013) [2024-06-15 12:23:55,738][1648984] Fps is (10 sec: 52429.1, 60 sec: 44783.0, 300 sec: 43542.6). Total num frames: 136052736. Throughput: 0: 10843.0. Samples: 34071552. Policy #0 lag: (min: 15.0, avg: 96.5, max: 271.0) [2024-06-15 12:23:55,738][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 12:23:58,481][1652475] Updated weights for policy 0, policy_version 66464 (0.0014) [2024-06-15 12:24:00,612][1652475] Updated weights for policy 0, policy_version 66529 (0.0014) [2024-06-15 12:24:00,783][1648984] Fps is (10 sec: 35883.1, 60 sec: 43657.9, 300 sec: 43313.8). Total num frames: 136249344. Throughput: 0: 10969.7. Samples: 34139648. Policy #0 lag: (min: 15.0, avg: 96.5, max: 271.0) [2024-06-15 12:24:00,784][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 12:24:02,002][1651340] Signal inference workers to stop experience collection... 
(3500 times) [2024-06-15 12:24:02,088][1652475] InferenceWorker_p0-w0: stopping experience collection (3500 times) [2024-06-15 12:24:02,272][1651340] Signal inference workers to resume experience collection... (3500 times) [2024-06-15 12:24:02,272][1652475] InferenceWorker_p0-w0: resuming experience collection (3500 times) [2024-06-15 12:24:02,768][1652475] Updated weights for policy 0, policy_version 66623 (0.0013) [2024-06-15 12:24:05,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 43690.7, 300 sec: 43098.3). Total num frames: 136445952. Throughput: 0: 10729.3. Samples: 34163712. Policy #0 lag: (min: 111.0, avg: 211.7, max: 335.0) [2024-06-15 12:24:05,738][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 12:24:10,157][1652475] Updated weights for policy 0, policy_version 66690 (0.0027) [2024-06-15 12:24:10,738][1648984] Fps is (10 sec: 36207.6, 60 sec: 42598.3, 300 sec: 42876.1). Total num frames: 136609792. Throughput: 0: 10797.5. Samples: 34235392. Policy #0 lag: (min: 111.0, avg: 211.7, max: 335.0) [2024-06-15 12:24:10,738][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 12:24:11,633][1652475] Updated weights for policy 0, policy_version 66749 (0.0012) [2024-06-15 12:24:13,391][1652475] Updated weights for policy 0, policy_version 66803 (0.0011) [2024-06-15 12:24:14,731][1652475] Updated weights for policy 0, policy_version 66876 (0.0016) [2024-06-15 12:24:15,751][1648984] Fps is (10 sec: 52357.9, 60 sec: 43684.1, 300 sec: 43429.5). Total num frames: 136970240. Throughput: 0: 10521.3. Samples: 34292736. Policy #0 lag: (min: 111.0, avg: 211.7, max: 335.0) [2024-06-15 12:24:15,752][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 12:24:19,161][1652475] Updated weights for policy 0, policy_version 66941 (0.0013) [2024-06-15 12:24:20,738][1648984] Fps is (10 sec: 49152.1, 60 sec: 43690.6, 300 sec: 43098.2). Total num frames: 137101312. Throughput: 0: 10478.9. Samples: 34331648. 
Policy #0 lag: (min: 111.0, avg: 211.7, max: 335.0) [2024-06-15 12:24:20,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 12:24:20,739][1651340] Saving new best policy, reward=-0.320! [2024-06-15 12:24:25,518][1652475] Updated weights for policy 0, policy_version 67027 (0.0014) [2024-06-15 12:24:25,738][1648984] Fps is (10 sec: 32811.7, 60 sec: 40959.8, 300 sec: 43320.4). Total num frames: 137297920. Throughput: 0: 10683.7. Samples: 34394112. Policy #0 lag: (min: 111.0, avg: 211.7, max: 335.0) [2024-06-15 12:24:25,739][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 12:24:26,847][1652475] Updated weights for policy 0, policy_version 67091 (0.0012) [2024-06-15 12:24:29,808][1652475] Updated weights for policy 0, policy_version 67152 (0.0016) [2024-06-15 12:24:30,738][1648984] Fps is (10 sec: 49152.1, 60 sec: 43144.5, 300 sec: 43431.5). Total num frames: 137592832. Throughput: 0: 10410.7. Samples: 34454528. Policy #0 lag: (min: 111.0, avg: 211.7, max: 335.0) [2024-06-15 12:24:30,738][1648984] Avg episode reward: [(0, '-0.560')] [2024-06-15 12:24:35,271][1652475] Updated weights for policy 0, policy_version 67218 (0.0012) [2024-06-15 12:24:35,738][1648984] Fps is (10 sec: 39322.4, 60 sec: 41506.1, 300 sec: 42876.1). Total num frames: 137691136. Throughput: 0: 10695.1. Samples: 34490368. Policy #0 lag: (min: 111.0, avg: 211.7, max: 335.0) [2024-06-15 12:24:35,738][1648984] Avg episode reward: [(0, '-0.560')] [2024-06-15 12:24:37,200][1652475] Updated weights for policy 0, policy_version 67312 (0.0108) [2024-06-15 12:24:38,568][1652475] Updated weights for policy 0, policy_version 67376 (0.0015) [2024-06-15 12:24:40,740][1648984] Fps is (10 sec: 42598.4, 60 sec: 43690.7, 300 sec: 43431.5). Total num frames: 138018816. Throughput: 0: 10740.6. Samples: 34554880. 
Policy #0 lag: (min: 111.0, avg: 211.7, max: 335.0) [2024-06-15 12:24:40,741][1648984] Avg episode reward: [(0, '-0.560')] [2024-06-15 12:24:41,761][1652475] Updated weights for policy 0, policy_version 67424 (0.0013) [2024-06-15 12:24:45,738][1648984] Fps is (10 sec: 45875.5, 60 sec: 43690.7, 300 sec: 42987.2). Total num frames: 138149888. Throughput: 0: 10888.1. Samples: 34629120. Policy #0 lag: (min: 111.0, avg: 211.7, max: 335.0) [2024-06-15 12:24:45,738][1648984] Avg episode reward: [(0, '-0.560')] [2024-06-15 12:24:47,373][1652475] Updated weights for policy 0, policy_version 67488 (0.0015) [2024-06-15 12:24:48,183][1651340] Signal inference workers to stop experience collection... (3550 times) [2024-06-15 12:24:48,274][1652475] InferenceWorker_p0-w0: stopping experience collection (3550 times) [2024-06-15 12:24:48,370][1651340] Signal inference workers to resume experience collection... (3550 times) [2024-06-15 12:24:48,370][1652475] InferenceWorker_p0-w0: resuming experience collection (3550 times) [2024-06-15 12:24:48,906][1652475] Updated weights for policy 0, policy_version 67556 (0.0115) [2024-06-15 12:24:50,128][1652475] Updated weights for policy 0, policy_version 67616 (0.0014) [2024-06-15 12:24:50,738][1648984] Fps is (10 sec: 49152.0, 60 sec: 43690.6, 300 sec: 43431.5). Total num frames: 138510336. Throughput: 0: 11070.6. Samples: 34661888. Policy #0 lag: (min: 15.0, avg: 79.0, max: 271.0) [2024-06-15 12:24:50,738][1648984] Avg episode reward: [(0, '-0.560')] [2024-06-15 12:24:53,243][1652475] Updated weights for policy 0, policy_version 67682 (0.0012) [2024-06-15 12:24:55,738][1648984] Fps is (10 sec: 52428.5, 60 sec: 43690.6, 300 sec: 43542.6). Total num frames: 138674176. Throughput: 0: 10820.3. Samples: 34722304. 
Policy #0 lag: (min: 15.0, avg: 79.0, max: 271.0) [2024-06-15 12:24:55,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 12:24:55,744][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000067712_138674176.pth... [2024-06-15 12:24:55,793][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000062608_128221184.pth [2024-06-15 12:24:58,660][1652475] Updated weights for policy 0, policy_version 67717 (0.0013) [2024-06-15 12:25:00,434][1652475] Updated weights for policy 0, policy_version 67779 (0.0013) [2024-06-15 12:25:00,738][1648984] Fps is (10 sec: 32768.1, 60 sec: 43177.0, 300 sec: 43209.3). Total num frames: 138838016. Throughput: 0: 11142.2. Samples: 34793984. Policy #0 lag: (min: 15.0, avg: 79.0, max: 271.0) [2024-06-15 12:25:00,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 12:25:02,446][1652475] Updated weights for policy 0, policy_version 67872 (0.0014) [2024-06-15 12:25:05,738][1648984] Fps is (10 sec: 42598.5, 60 sec: 44236.8, 300 sec: 43210.0). Total num frames: 139100160. Throughput: 0: 10831.7. Samples: 34819072. Policy #0 lag: (min: 15.0, avg: 79.0, max: 271.0) [2024-06-15 12:25:05,739][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 12:25:06,536][1652475] Updated weights for policy 0, policy_version 67952 (0.0013) [2024-06-15 12:25:10,270][1652475] Updated weights for policy 0, policy_version 68000 (0.0015) [2024-06-15 12:25:10,738][1648984] Fps is (10 sec: 45875.1, 60 sec: 44783.0, 300 sec: 43098.3). Total num frames: 139296768. Throughput: 0: 10991.0. Samples: 34888704. 
Policy #0 lag: (min: 15.0, avg: 79.0, max: 271.0) [2024-06-15 12:25:10,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 12:25:13,193][1652475] Updated weights for policy 0, policy_version 68080 (0.0014) [2024-06-15 12:25:15,094][1652475] Updated weights for policy 0, policy_version 68160 (0.0017) [2024-06-15 12:25:15,738][1648984] Fps is (10 sec: 49151.8, 60 sec: 43700.5, 300 sec: 43653.6). Total num frames: 139591680. Throughput: 0: 11002.3. Samples: 34949632. Policy #0 lag: (min: 15.0, avg: 79.0, max: 271.0) [2024-06-15 12:25:15,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 12:25:19,039][1652475] Updated weights for policy 0, policy_version 68220 (0.0014) [2024-06-15 12:25:20,738][1648984] Fps is (10 sec: 42596.9, 60 sec: 43690.5, 300 sec: 43209.3). Total num frames: 139722752. Throughput: 0: 11025.0. Samples: 34986496. Policy #0 lag: (min: 15.0, avg: 79.0, max: 271.0) [2024-06-15 12:25:20,739][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 12:25:24,173][1652475] Updated weights for policy 0, policy_version 68290 (0.0088) [2024-06-15 12:25:25,738][1648984] Fps is (10 sec: 39321.0, 60 sec: 44783.0, 300 sec: 43542.5). Total num frames: 139984896. Throughput: 0: 11195.7. Samples: 35058688. Policy #0 lag: (min: 15.0, avg: 79.0, max: 271.0) [2024-06-15 12:25:25,739][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 12:25:25,928][1652475] Updated weights for policy 0, policy_version 68354 (0.0012) [2024-06-15 12:25:29,261][1652475] Updated weights for policy 0, policy_version 68420 (0.0034) [2024-06-15 12:25:30,738][1648984] Fps is (10 sec: 52431.3, 60 sec: 44236.9, 300 sec: 43542.6). Total num frames: 140247040. Throughput: 0: 10865.8. Samples: 35118080. Policy #0 lag: (min: 4.0, avg: 130.4, max: 260.0) [2024-06-15 12:25:30,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 12:25:34,193][1651340] Signal inference workers to stop experience collection... 
(3600 times) [2024-06-15 12:25:34,237][1652475] InferenceWorker_p0-w0: stopping experience collection (3600 times) [2024-06-15 12:25:34,261][1652475] Updated weights for policy 0, policy_version 68500 (0.0021) [2024-06-15 12:25:34,422][1651340] Signal inference workers to resume experience collection... (3600 times) [2024-06-15 12:25:34,423][1652475] InferenceWorker_p0-w0: resuming experience collection (3600 times) [2024-06-15 12:25:35,738][1648984] Fps is (10 sec: 39322.1, 60 sec: 44782.9, 300 sec: 43098.2). Total num frames: 140378112. Throughput: 0: 10968.2. Samples: 35155456. Policy #0 lag: (min: 4.0, avg: 130.4, max: 260.0) [2024-06-15 12:25:35,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 12:25:36,480][1652475] Updated weights for policy 0, policy_version 68562 (0.0014) [2024-06-15 12:25:38,782][1652475] Updated weights for policy 0, policy_version 68656 (0.0013) [2024-06-15 12:25:40,738][1648984] Fps is (10 sec: 39321.2, 60 sec: 43690.7, 300 sec: 43431.5). Total num frames: 140640256. Throughput: 0: 10831.7. Samples: 35209728. Policy #0 lag: (min: 4.0, avg: 130.4, max: 260.0) [2024-06-15 12:25:40,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 12:25:42,573][1652475] Updated weights for policy 0, policy_version 68720 (0.0014) [2024-06-15 12:25:45,738][1648984] Fps is (10 sec: 42598.7, 60 sec: 44236.8, 300 sec: 43209.3). Total num frames: 140804096. Throughput: 0: 10968.2. Samples: 35287552. Policy #0 lag: (min: 4.0, avg: 130.4, max: 260.0) [2024-06-15 12:25:45,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 12:25:45,871][1652475] Updated weights for policy 0, policy_version 68768 (0.0016) [2024-06-15 12:25:48,308][1652475] Updated weights for policy 0, policy_version 68820 (0.0011) [2024-06-15 12:25:50,092][1652475] Updated weights for policy 0, policy_version 68896 (0.0015) [2024-06-15 12:25:50,738][1648984] Fps is (10 sec: 52428.3, 60 sec: 44236.8, 300 sec: 43986.9). Total num frames: 141164544. 
Throughput: 0: 11138.8. Samples: 35320320. Policy #0 lag: (min: 4.0, avg: 130.4, max: 260.0) [2024-06-15 12:25:50,739][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 12:25:52,959][1652475] Updated weights for policy 0, policy_version 68929 (0.0038) [2024-06-15 12:25:54,132][1652475] Updated weights for policy 0, policy_version 68984 (0.0014) [2024-06-15 12:25:55,738][1648984] Fps is (10 sec: 49151.6, 60 sec: 43690.7, 300 sec: 43431.5). Total num frames: 141295616. Throughput: 0: 11104.7. Samples: 35388416. Policy #0 lag: (min: 4.0, avg: 130.4, max: 260.0) [2024-06-15 12:25:55,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 12:25:57,591][1652475] Updated weights for policy 0, policy_version 69052 (0.0014) [2024-06-15 12:26:00,244][1652475] Updated weights for policy 0, policy_version 69104 (0.0013) [2024-06-15 12:26:00,738][1648984] Fps is (10 sec: 39321.9, 60 sec: 45329.1, 300 sec: 43542.6). Total num frames: 141557760. Throughput: 0: 11275.4. Samples: 35457024. Policy #0 lag: (min: 4.0, avg: 130.4, max: 260.0) [2024-06-15 12:26:00,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 12:26:02,063][1652475] Updated weights for policy 0, policy_version 69184 (0.0012) [2024-06-15 12:26:05,738][1648984] Fps is (10 sec: 49152.2, 60 sec: 44782.9, 300 sec: 43542.6). Total num frames: 141787136. Throughput: 0: 11104.8. Samples: 35486208. Policy #0 lag: (min: 4.0, avg: 130.4, max: 260.0) [2024-06-15 12:26:05,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 12:26:08,466][1652475] Updated weights for policy 0, policy_version 69250 (0.0011) [2024-06-15 12:26:09,628][1652475] Updated weights for policy 0, policy_version 69311 (0.0019) [2024-06-15 12:26:10,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 44236.8, 300 sec: 43320.4). Total num frames: 141950976. Throughput: 0: 11093.4. Samples: 35557888. 
Policy #0 lag: (min: 4.0, avg: 130.4, max: 260.0) [2024-06-15 12:26:10,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 12:26:12,916][1652475] Updated weights for policy 0, policy_version 69392 (0.0011) [2024-06-15 12:26:15,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 142213120. Throughput: 0: 11172.9. Samples: 35620864. Policy #0 lag: (min: 31.0, avg: 129.9, max: 287.0) [2024-06-15 12:26:15,741][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 12:26:16,394][1652475] Updated weights for policy 0, policy_version 69443 (0.0013) [2024-06-15 12:26:17,144][1651340] Signal inference workers to stop experience collection... (3650 times) [2024-06-15 12:26:17,166][1652475] InferenceWorker_p0-w0: stopping experience collection (3650 times) [2024-06-15 12:26:17,386][1651340] Signal inference workers to resume experience collection... (3650 times) [2024-06-15 12:26:17,387][1652475] InferenceWorker_p0-w0: resuming experience collection (3650 times) [2024-06-15 12:26:20,058][1652475] Updated weights for policy 0, policy_version 69520 (0.0014) [2024-06-15 12:26:20,738][1648984] Fps is (10 sec: 49151.9, 60 sec: 45329.4, 300 sec: 43431.5). Total num frames: 142442496. Throughput: 0: 11138.9. Samples: 35656704. Policy #0 lag: (min: 31.0, avg: 129.9, max: 287.0) [2024-06-15 12:26:20,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 12:26:23,096][1652475] Updated weights for policy 0, policy_version 69577 (0.0012) [2024-06-15 12:26:24,284][1652475] Updated weights for policy 0, policy_version 69625 (0.0015) [2024-06-15 12:26:25,738][1648984] Fps is (10 sec: 45874.9, 60 sec: 44783.0, 300 sec: 43764.7). Total num frames: 142671872. Throughput: 0: 11502.9. Samples: 35727360. 
Policy #0 lag: (min: 31.0, avg: 129.9, max: 287.0) [2024-06-15 12:26:25,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 12:26:26,073][1652475] Updated weights for policy 0, policy_version 69690 (0.0013) [2024-06-15 12:26:30,738][1648984] Fps is (10 sec: 42598.1, 60 sec: 43690.5, 300 sec: 43542.5). Total num frames: 142868480. Throughput: 0: 11116.1. Samples: 35787776. Policy #0 lag: (min: 31.0, avg: 129.9, max: 287.0) [2024-06-15 12:26:30,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 12:26:33,329][1652475] Updated weights for policy 0, policy_version 69792 (0.0106) [2024-06-15 12:26:35,738][1648984] Fps is (10 sec: 32768.1, 60 sec: 43690.7, 300 sec: 43210.3). Total num frames: 142999552. Throughput: 0: 11093.3. Samples: 35819520. Policy #0 lag: (min: 31.0, avg: 129.9, max: 287.0) [2024-06-15 12:26:35,738][1648984] Avg episode reward: [(0, '-0.530')] [2024-06-15 12:26:35,925][1652475] Updated weights for policy 0, policy_version 69840 (0.0014) [2024-06-15 12:26:37,935][1652475] Updated weights for policy 0, policy_version 69906 (0.0015) [2024-06-15 12:26:40,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 43690.6, 300 sec: 43542.5). Total num frames: 143261696. Throughput: 0: 10740.6. Samples: 35871744. Policy #0 lag: (min: 31.0, avg: 129.9, max: 287.0) [2024-06-15 12:26:40,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 12:26:40,758][1652475] Updated weights for policy 0, policy_version 69968 (0.0018) [2024-06-15 12:26:41,652][1652475] Updated weights for policy 0, policy_version 70013 (0.0136) [2024-06-15 12:26:45,738][1648984] Fps is (10 sec: 39321.3, 60 sec: 43144.4, 300 sec: 43098.2). Total num frames: 143392768. Throughput: 0: 10786.1. Samples: 35942400. 
Policy #0 lag: (min: 31.0, avg: 129.9, max: 287.0) [2024-06-15 12:26:45,739][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 12:26:46,868][1652475] Updated weights for policy 0, policy_version 70052 (0.0032) [2024-06-15 12:26:48,879][1652475] Updated weights for policy 0, policy_version 70128 (0.0012) [2024-06-15 12:26:50,738][1648984] Fps is (10 sec: 39321.9, 60 sec: 41506.2, 300 sec: 43542.6). Total num frames: 143654912. Throughput: 0: 10831.6. Samples: 35973632. Policy #0 lag: (min: 31.0, avg: 129.9, max: 287.0) [2024-06-15 12:26:50,742][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 12:26:51,710][1652475] Updated weights for policy 0, policy_version 70200 (0.0014) [2024-06-15 12:26:52,713][1652475] Updated weights for policy 0, policy_version 70240 (0.0013) [2024-06-15 12:26:55,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 43690.6, 300 sec: 43431.5). Total num frames: 143917056. Throughput: 0: 10717.8. Samples: 36040192. Policy #0 lag: (min: 31.0, avg: 129.9, max: 287.0) [2024-06-15 12:26:55,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 12:26:55,744][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000070272_143917056.pth... [2024-06-15 12:26:55,818][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000065152_133431296.pth [2024-06-15 12:26:58,650][1652475] Updated weights for policy 0, policy_version 70311 (0.0013) [2024-06-15 12:27:00,429][1652475] Updated weights for policy 0, policy_version 70384 (0.0013) [2024-06-15 12:27:00,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 144179200. Throughput: 0: 10786.1. Samples: 36106240. Policy #0 lag: (min: 0.0, avg: 81.6, max: 256.0) [2024-06-15 12:27:00,741][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 12:27:02,884][1651340] Signal inference workers to stop experience collection... 
(3700 times) [2024-06-15 12:27:02,896][1652475] Updated weights for policy 0, policy_version 70433 (0.0013) [2024-06-15 12:27:02,911][1652475] InferenceWorker_p0-w0: stopping experience collection (3700 times) [2024-06-15 12:27:03,143][1651340] Signal inference workers to resume experience collection... (3700 times) [2024-06-15 12:27:03,144][1652475] InferenceWorker_p0-w0: resuming experience collection (3700 times) [2024-06-15 12:27:04,115][1652475] Updated weights for policy 0, policy_version 70480 (0.0020) [2024-06-15 12:27:05,083][1652475] Updated weights for policy 0, policy_version 70528 (0.0015) [2024-06-15 12:27:05,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 44236.8, 300 sec: 43986.9). Total num frames: 144441344. Throughput: 0: 10740.6. Samples: 36140032. Policy #0 lag: (min: 0.0, avg: 81.6, max: 256.0) [2024-06-15 12:27:05,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 12:27:10,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 43690.6, 300 sec: 43542.6). Total num frames: 144572416. Throughput: 0: 10865.8. Samples: 36216320. Policy #0 lag: (min: 0.0, avg: 81.6, max: 256.0) [2024-06-15 12:27:10,739][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 12:27:10,997][1652475] Updated weights for policy 0, policy_version 70609 (0.0015) [2024-06-15 12:27:14,603][1652475] Updated weights for policy 0, policy_version 70692 (0.0015) [2024-06-15 12:27:15,738][1648984] Fps is (10 sec: 42598.6, 60 sec: 44236.8, 300 sec: 43986.9). Total num frames: 144867328. Throughput: 0: 10820.3. Samples: 36274688. Policy #0 lag: (min: 0.0, avg: 81.6, max: 256.0) [2024-06-15 12:27:15,738][1648984] Avg episode reward: [(0, '-0.520')] [2024-06-15 12:27:16,666][1652475] Updated weights for policy 0, policy_version 70778 (0.0017) [2024-06-15 12:27:20,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 42052.3, 300 sec: 43542.6). Total num frames: 144965632. Throughput: 0: 10752.0. Samples: 36303360. 
Policy #0 lag: (min: 0.0, avg: 81.6, max: 256.0) [2024-06-15 12:27:20,738][1648984] Avg episode reward: [(0, '-0.570')] [2024-06-15 12:27:21,358][1652475] Updated weights for policy 0, policy_version 70825 (0.0014) [2024-06-15 12:27:23,663][1652475] Updated weights for policy 0, policy_version 70851 (0.0047) [2024-06-15 12:27:25,036][1652475] Updated weights for policy 0, policy_version 70912 (0.0018) [2024-06-15 12:27:25,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 43144.6, 300 sec: 43653.6). Total num frames: 145260544. Throughput: 0: 11138.9. Samples: 36372992. Policy #0 lag: (min: 0.0, avg: 81.6, max: 256.0) [2024-06-15 12:27:25,738][1648984] Avg episode reward: [(0, '-0.520')] [2024-06-15 12:27:28,824][1652475] Updated weights for policy 0, policy_version 70992 (0.0015) [2024-06-15 12:27:30,739][1648984] Fps is (10 sec: 52427.7, 60 sec: 43690.6, 300 sec: 43764.7). Total num frames: 145489920. Throughput: 0: 11013.7. Samples: 36438016. Policy #0 lag: (min: 0.0, avg: 81.6, max: 256.0) [2024-06-15 12:27:30,740][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 12:27:32,129][1652475] Updated weights for policy 0, policy_version 71056 (0.0011) [2024-06-15 12:27:33,106][1652475] Updated weights for policy 0, policy_version 71099 (0.0023) [2024-06-15 12:27:35,738][1648984] Fps is (10 sec: 36044.2, 60 sec: 43690.6, 300 sec: 43542.5). Total num frames: 145620992. Throughput: 0: 11081.9. Samples: 36472320. Policy #0 lag: (min: 0.0, avg: 81.6, max: 256.0) [2024-06-15 12:27:35,738][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 12:27:37,514][1652475] Updated weights for policy 0, policy_version 71200 (0.0015) [2024-06-15 12:27:40,738][1648984] Fps is (10 sec: 39321.9, 60 sec: 43690.6, 300 sec: 43986.9). Total num frames: 145883136. Throughput: 0: 10990.9. Samples: 36534784. 
Policy #0 lag: (min: 0.0, avg: 81.6, max: 256.0) [2024-06-15 12:27:40,738][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 12:27:41,428][1652475] Updated weights for policy 0, policy_version 71264 (0.0023) [2024-06-15 12:27:44,853][1652475] Updated weights for policy 0, policy_version 71344 (0.0118) [2024-06-15 12:27:45,738][1648984] Fps is (10 sec: 52429.5, 60 sec: 45875.2, 300 sec: 43764.7). Total num frames: 146145280. Throughput: 0: 11025.0. Samples: 36602368. Policy #0 lag: (min: 22.0, avg: 138.9, max: 278.0) [2024-06-15 12:27:45,738][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 12:27:48,619][1652475] Updated weights for policy 0, policy_version 71394 (0.0015) [2024-06-15 12:27:49,448][1651340] Signal inference workers to stop experience collection... (3750 times) [2024-06-15 12:27:49,500][1652475] InferenceWorker_p0-w0: stopping experience collection (3750 times) [2024-06-15 12:27:49,631][1651340] Signal inference workers to resume experience collection... (3750 times) [2024-06-15 12:27:49,633][1652475] InferenceWorker_p0-w0: resuming experience collection (3750 times) [2024-06-15 12:27:50,707][1652475] Updated weights for policy 0, policy_version 71486 (0.0092) [2024-06-15 12:27:50,737][1648984] Fps is (10 sec: 52430.1, 60 sec: 45875.3, 300 sec: 44209.1). Total num frames: 146407424. Throughput: 0: 11104.8. Samples: 36639744. Policy #0 lag: (min: 22.0, avg: 138.9, max: 278.0) [2024-06-15 12:27:50,740][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 12:27:53,907][1652475] Updated weights for policy 0, policy_version 71522 (0.0013) [2024-06-15 12:27:55,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 43690.7, 300 sec: 43764.7). Total num frames: 146538496. Throughput: 0: 10661.0. Samples: 36696064. 
Policy #0 lag: (min: 22.0, avg: 138.9, max: 278.0) [2024-06-15 12:27:55,738][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 12:27:56,764][1652475] Updated weights for policy 0, policy_version 71570 (0.0019) [2024-06-15 12:28:00,738][1648984] Fps is (10 sec: 26213.9, 60 sec: 41506.1, 300 sec: 43542.5). Total num frames: 146669568. Throughput: 0: 10899.9. Samples: 36765184. Policy #0 lag: (min: 22.0, avg: 138.9, max: 278.0) [2024-06-15 12:28:00,739][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 12:28:01,369][1652475] Updated weights for policy 0, policy_version 71650 (0.0127) [2024-06-15 12:28:03,135][1652475] Updated weights for policy 0, policy_version 71728 (0.0013) [2024-06-15 12:28:05,738][1648984] Fps is (10 sec: 39321.9, 60 sec: 41506.2, 300 sec: 43653.6). Total num frames: 146931712. Throughput: 0: 10729.2. Samples: 36786176. Policy #0 lag: (min: 22.0, avg: 138.9, max: 278.0) [2024-06-15 12:28:05,738][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 12:28:06,518][1652475] Updated weights for policy 0, policy_version 71792 (0.0120) [2024-06-15 12:28:09,179][1652475] Updated weights for policy 0, policy_version 71842 (0.0146) [2024-06-15 12:28:10,738][1648984] Fps is (10 sec: 52429.2, 60 sec: 43690.7, 300 sec: 43543.2). Total num frames: 147193856. Throughput: 0: 10797.5. Samples: 36858880. Policy #0 lag: (min: 22.0, avg: 138.9, max: 278.0) [2024-06-15 12:28:10,738][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 12:28:13,762][1652475] Updated weights for policy 0, policy_version 71936 (0.0014) [2024-06-15 12:28:15,087][1652475] Updated weights for policy 0, policy_version 71995 (0.0012) [2024-06-15 12:28:15,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 43144.5, 300 sec: 43986.9). Total num frames: 147456000. Throughput: 0: 10729.3. Samples: 36920832. 
Policy #0 lag: (min: 22.0, avg: 138.9, max: 278.0) [2024-06-15 12:28:15,738][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 12:28:18,828][1652475] Updated weights for policy 0, policy_version 72056 (0.0048) [2024-06-15 12:28:20,741][1648984] Fps is (10 sec: 42583.5, 60 sec: 44234.2, 300 sec: 43319.9). Total num frames: 147619840. Throughput: 0: 10819.5. Samples: 36959232. Policy #0 lag: (min: 22.0, avg: 138.9, max: 278.0) [2024-06-15 12:28:20,742][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 12:28:21,807][1652475] Updated weights for policy 0, policy_version 72128 (0.0013) [2024-06-15 12:28:25,533][1652475] Updated weights for policy 0, policy_version 72192 (0.0018) [2024-06-15 12:28:25,738][1648984] Fps is (10 sec: 39321.0, 60 sec: 43144.4, 300 sec: 43542.5). Total num frames: 147849216. Throughput: 0: 10922.7. Samples: 37026304. Policy #0 lag: (min: 55.0, avg: 154.7, max: 311.0) [2024-06-15 12:28:25,739][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 12:28:30,056][1652475] Updated weights for policy 0, policy_version 72259 (0.0018) [2024-06-15 12:28:30,738][1648984] Fps is (10 sec: 42613.0, 60 sec: 42598.5, 300 sec: 43542.5). Total num frames: 148045824. Throughput: 0: 10808.9. Samples: 37088768. Policy #0 lag: (min: 55.0, avg: 154.7, max: 311.0) [2024-06-15 12:28:30,738][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 12:28:31,331][1652475] Updated weights for policy 0, policy_version 72315 (0.0016) [2024-06-15 12:28:33,564][1652475] Updated weights for policy 0, policy_version 72376 (0.0014) [2024-06-15 12:28:35,738][1648984] Fps is (10 sec: 39322.4, 60 sec: 43690.8, 300 sec: 43542.6). Total num frames: 148242432. Throughput: 0: 10706.5. Samples: 37121536. Policy #0 lag: (min: 55.0, avg: 154.7, max: 311.0) [2024-06-15 12:28:35,738][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 12:28:36,495][1651340] Signal inference workers to stop experience collection... 
(3800 times) [2024-06-15 12:28:36,544][1652475] InferenceWorker_p0-w0: stopping experience collection (3800 times) [2024-06-15 12:28:36,722][1651340] Signal inference workers to resume experience collection... (3800 times) [2024-06-15 12:28:36,723][1652475] InferenceWorker_p0-w0: resuming experience collection (3800 times) [2024-06-15 12:28:36,872][1652475] Updated weights for policy 0, policy_version 72440 (0.0015) [2024-06-15 12:28:38,299][1652475] Updated weights for policy 0, policy_version 72482 (0.0022) [2024-06-15 12:28:40,738][1648984] Fps is (10 sec: 45875.2, 60 sec: 43690.7, 300 sec: 43986.9). Total num frames: 148504576. Throughput: 0: 10854.4. Samples: 37184512. Policy #0 lag: (min: 55.0, avg: 154.7, max: 311.0) [2024-06-15 12:28:40,738][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 12:28:42,155][1652475] Updated weights for policy 0, policy_version 72544 (0.0018) [2024-06-15 12:28:44,541][1652475] Updated weights for policy 0, policy_version 72582 (0.0026) [2024-06-15 12:28:45,738][1648984] Fps is (10 sec: 49151.3, 60 sec: 43144.5, 300 sec: 43542.5). Total num frames: 148733952. Throughput: 0: 10899.9. Samples: 37255680. Policy #0 lag: (min: 55.0, avg: 154.7, max: 311.0) [2024-06-15 12:28:45,739][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 12:28:45,837][1652475] Updated weights for policy 0, policy_version 72640 (0.0014) [2024-06-15 12:28:49,473][1652475] Updated weights for policy 0, policy_version 72710 (0.0012) [2024-06-15 12:28:50,561][1652475] Updated weights for policy 0, policy_version 72768 (0.0013) [2024-06-15 12:28:50,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 43690.5, 300 sec: 43986.9). Total num frames: 149028864. Throughput: 0: 11150.2. Samples: 37287936. 
Policy #0 lag: (min: 55.0, avg: 154.7, max: 311.0) [2024-06-15 12:28:50,738][1648984] Avg episode reward: [(0, '-0.530')] [2024-06-15 12:28:54,569][1652475] Updated weights for policy 0, policy_version 72818 (0.0013) [2024-06-15 12:28:55,738][1648984] Fps is (10 sec: 42598.2, 60 sec: 43690.6, 300 sec: 43771.4). Total num frames: 149159936. Throughput: 0: 10854.4. Samples: 37347328. Policy #0 lag: (min: 55.0, avg: 154.7, max: 311.0) [2024-06-15 12:28:55,738][1648984] Avg episode reward: [(0, '-0.520')] [2024-06-15 12:28:55,744][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000072832_149159936.pth... [2024-06-15 12:28:55,793][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000067712_138674176.pth [2024-06-15 12:29:00,066][1652475] Updated weights for policy 0, policy_version 72896 (0.0014) [2024-06-15 12:29:00,740][1648984] Fps is (10 sec: 29491.2, 60 sec: 44236.8, 300 sec: 43653.6). Total num frames: 149323776. Throughput: 0: 10808.9. Samples: 37407232. Policy #0 lag: (min: 55.0, avg: 154.7, max: 311.0) [2024-06-15 12:29:00,741][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 12:29:04,159][1652475] Updated weights for policy 0, policy_version 72992 (0.0017) [2024-06-15 12:29:05,750][1648984] Fps is (10 sec: 39322.5, 60 sec: 43690.7, 300 sec: 43875.8). Total num frames: 149553152. Throughput: 0: 10639.1. Samples: 37437952. Policy #0 lag: (min: 55.0, avg: 154.7, max: 311.0) [2024-06-15 12:29:05,750][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 12:29:06,778][1652475] Updated weights for policy 0, policy_version 73072 (0.0014) [2024-06-15 12:29:10,738][1648984] Fps is (10 sec: 36044.8, 60 sec: 41506.1, 300 sec: 43100.2). Total num frames: 149684224. Throughput: 0: 10570.0. Samples: 37501952. 
Policy #0 lag: (min: 55.0, avg: 154.7, max: 311.0) [2024-06-15 12:29:10,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 12:29:11,545][1652475] Updated weights for policy 0, policy_version 73105 (0.0012) [2024-06-15 12:29:13,055][1652475] Updated weights for policy 0, policy_version 73184 (0.0063) [2024-06-15 12:29:15,738][1648984] Fps is (10 sec: 39321.1, 60 sec: 41506.1, 300 sec: 43542.6). Total num frames: 149946368. Throughput: 0: 10592.7. Samples: 37565440. Policy #0 lag: (min: 9.0, avg: 82.9, max: 265.0) [2024-06-15 12:29:15,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 12:29:17,046][1652475] Updated weights for policy 0, policy_version 73264 (0.0013) [2024-06-15 12:29:18,568][1652475] Updated weights for policy 0, policy_version 73318 (0.0018) [2024-06-15 12:29:20,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 43147.0, 300 sec: 43764.7). Total num frames: 150208512. Throughput: 0: 10535.8. Samples: 37595648. Policy #0 lag: (min: 9.0, avg: 82.9, max: 265.0) [2024-06-15 12:29:20,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 12:29:23,169][1652475] Updated weights for policy 0, policy_version 73360 (0.0107) [2024-06-15 12:29:24,152][1651340] Signal inference workers to stop experience collection... (3850 times) [2024-06-15 12:29:24,234][1652475] InferenceWorker_p0-w0: stopping experience collection (3850 times) [2024-06-15 12:29:24,392][1651340] Signal inference workers to resume experience collection... (3850 times) [2024-06-15 12:29:24,393][1652475] InferenceWorker_p0-w0: resuming experience collection (3850 times) [2024-06-15 12:29:25,448][1652475] Updated weights for policy 0, policy_version 73459 (0.0013) [2024-06-15 12:29:25,738][1648984] Fps is (10 sec: 52428.2, 60 sec: 43690.7, 300 sec: 43653.6). Total num frames: 150470656. Throughput: 0: 10626.8. Samples: 37662720. 
Policy #0 lag: (min: 9.0, avg: 82.9, max: 265.0) [2024-06-15 12:29:25,739][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 12:29:29,147][1652475] Updated weights for policy 0, policy_version 73526 (0.0012) [2024-06-15 12:29:30,718][1652475] Updated weights for policy 0, policy_version 73584 (0.0023) [2024-06-15 12:29:30,738][1648984] Fps is (10 sec: 49152.3, 60 sec: 44236.9, 300 sec: 44098.0). Total num frames: 150700032. Throughput: 0: 10387.9. Samples: 37723136. Policy #0 lag: (min: 9.0, avg: 82.9, max: 265.0) [2024-06-15 12:29:30,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 12:29:35,738][1648984] Fps is (10 sec: 26215.0, 60 sec: 41506.1, 300 sec: 43098.3). Total num frames: 150732800. Throughput: 0: 10399.3. Samples: 37755904. Policy #0 lag: (min: 9.0, avg: 82.9, max: 265.0) [2024-06-15 12:29:35,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 12:29:36,238][1652475] Updated weights for policy 0, policy_version 73623 (0.0018) [2024-06-15 12:29:39,053][1652475] Updated weights for policy 0, policy_version 73718 (0.0013) [2024-06-15 12:29:40,738][1648984] Fps is (10 sec: 32768.2, 60 sec: 42052.4, 300 sec: 43653.6). Total num frames: 151027712. Throughput: 0: 10399.3. Samples: 37815296. Policy #0 lag: (min: 9.0, avg: 82.9, max: 265.0) [2024-06-15 12:29:40,738][1648984] Avg episode reward: [(0, '-0.660')] [2024-06-15 12:29:40,813][1652475] Updated weights for policy 0, policy_version 73760 (0.0014) [2024-06-15 12:29:43,562][1652475] Updated weights for policy 0, policy_version 73793 (0.0011) [2024-06-15 12:29:44,854][1652475] Updated weights for policy 0, policy_version 73856 (0.0013) [2024-06-15 12:29:45,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 42052.4, 300 sec: 43209.3). Total num frames: 151257088. Throughput: 0: 10467.6. Samples: 37878272. 
Policy #0 lag: (min: 9.0, avg: 82.9, max: 265.0) [2024-06-15 12:29:45,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 12:29:49,233][1652475] Updated weights for policy 0, policy_version 73918 (0.0021) [2024-06-15 12:29:50,738][1648984] Fps is (10 sec: 42598.1, 60 sec: 40413.9, 300 sec: 43320.4). Total num frames: 151453696. Throughput: 0: 10615.4. Samples: 37915648. Policy #0 lag: (min: 9.0, avg: 82.9, max: 265.0) [2024-06-15 12:29:50,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 12:29:50,954][1652475] Updated weights for policy 0, policy_version 73971 (0.0011) [2024-06-15 12:29:53,350][1652475] Updated weights for policy 0, policy_version 74046 (0.0013) [2024-06-15 12:29:55,738][1648984] Fps is (10 sec: 39320.5, 60 sec: 41506.1, 300 sec: 43431.5). Total num frames: 151650304. Throughput: 0: 10444.8. Samples: 37971968. Policy #0 lag: (min: 9.0, avg: 82.9, max: 265.0) [2024-06-15 12:29:55,739][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 12:29:57,931][1652475] Updated weights for policy 0, policy_version 74104 (0.0012) [2024-06-15 12:30:00,738][1648984] Fps is (10 sec: 32767.9, 60 sec: 40960.0, 300 sec: 42987.2). Total num frames: 151781376. Throughput: 0: 10513.1. Samples: 38038528. Policy #0 lag: (min: 10.0, avg: 120.6, max: 266.0) [2024-06-15 12:30:00,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 12:30:01,995][1652475] Updated weights for policy 0, policy_version 74168 (0.0014) [2024-06-15 12:30:03,519][1652475] Updated weights for policy 0, policy_version 74224 (0.0014) [2024-06-15 12:30:05,685][1652475] Updated weights for policy 0, policy_version 74272 (0.0014) [2024-06-15 12:30:05,738][1648984] Fps is (10 sec: 45876.3, 60 sec: 42598.4, 300 sec: 43431.5). Total num frames: 152109056. Throughput: 0: 10456.2. Samples: 38066176. 
Policy #0 lag: (min: 10.0, avg: 120.6, max: 266.0) [2024-06-15 12:30:05,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 12:30:09,060][1652475] Updated weights for policy 0, policy_version 74336 (0.0024) [2024-06-15 12:30:09,838][1652475] Updated weights for policy 0, policy_version 74368 (0.0031) [2024-06-15 12:30:10,738][1648984] Fps is (10 sec: 52426.7, 60 sec: 43690.4, 300 sec: 43098.2). Total num frames: 152305664. Throughput: 0: 10467.5. Samples: 38133760. Policy #0 lag: (min: 10.0, avg: 120.6, max: 266.0) [2024-06-15 12:30:10,739][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 12:30:13,301][1651340] Signal inference workers to stop experience collection... (3900 times) [2024-06-15 12:30:13,340][1652475] InferenceWorker_p0-w0: stopping experience collection (3900 times) [2024-06-15 12:30:13,500][1651340] Signal inference workers to resume experience collection... (3900 times) [2024-06-15 12:30:13,501][1652475] InferenceWorker_p0-w0: resuming experience collection (3900 times) [2024-06-15 12:30:13,938][1652475] Updated weights for policy 0, policy_version 74432 (0.0016) [2024-06-15 12:30:15,113][1652475] Updated weights for policy 0, policy_version 74484 (0.0011) [2024-06-15 12:30:15,738][1648984] Fps is (10 sec: 45875.2, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 152567808. Throughput: 0: 10604.1. Samples: 38200320. Policy #0 lag: (min: 10.0, avg: 120.6, max: 266.0) [2024-06-15 12:30:15,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 12:30:17,738][1652475] Updated weights for policy 0, policy_version 74550 (0.0011) [2024-06-15 12:30:20,738][1648984] Fps is (10 sec: 39323.6, 60 sec: 41506.2, 300 sec: 43098.3). Total num frames: 152698880. Throughput: 0: 10570.0. Samples: 38231552. 
Policy #0 lag: (min: 10.0, avg: 120.6, max: 266.0) [2024-06-15 12:30:20,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 12:30:21,263][1652475] Updated weights for policy 0, policy_version 74597 (0.0012) [2024-06-15 12:30:24,647][1652475] Updated weights for policy 0, policy_version 74640 (0.0032) [2024-06-15 12:30:25,738][1648984] Fps is (10 sec: 36044.8, 60 sec: 40960.1, 300 sec: 42987.2). Total num frames: 152928256. Throughput: 0: 10877.1. Samples: 38304768. Policy #0 lag: (min: 10.0, avg: 120.6, max: 266.0) [2024-06-15 12:30:25,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 12:30:26,360][1652475] Updated weights for policy 0, policy_version 74708 (0.0014) [2024-06-15 12:30:28,611][1652475] Updated weights for policy 0, policy_version 74760 (0.0014) [2024-06-15 12:30:29,927][1652475] Updated weights for policy 0, policy_version 74816 (0.0012) [2024-06-15 12:30:30,738][1648984] Fps is (10 sec: 52428.1, 60 sec: 42052.2, 300 sec: 43542.6). Total num frames: 153223168. Throughput: 0: 10831.6. Samples: 38365696. Policy #0 lag: (min: 10.0, avg: 120.6, max: 266.0) [2024-06-15 12:30:30,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 12:30:35,739][1648984] Fps is (10 sec: 42596.7, 60 sec: 43690.4, 300 sec: 43098.2). Total num frames: 153354240. Throughput: 0: 10740.5. Samples: 38398976. Policy #0 lag: (min: 10.0, avg: 120.6, max: 266.0) [2024-06-15 12:30:35,741][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 12:30:36,518][1652475] Updated weights for policy 0, policy_version 74896 (0.0014) [2024-06-15 12:30:37,947][1652475] Updated weights for policy 0, policy_version 74960 (0.0112) [2024-06-15 12:30:40,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 43144.5, 300 sec: 43431.5). Total num frames: 153616384. Throughput: 0: 10956.8. Samples: 38465024. 
Policy #0 lag: (min: 10.0, avg: 120.6, max: 266.0) [2024-06-15 12:30:40,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 12:30:40,768][1652475] Updated weights for policy 0, policy_version 75024 (0.0012) [2024-06-15 12:30:41,835][1652475] Updated weights for policy 0, policy_version 75069 (0.0013) [2024-06-15 12:30:45,738][1648984] Fps is (10 sec: 52430.6, 60 sec: 43690.6, 300 sec: 43098.3). Total num frames: 153878528. Throughput: 0: 10990.9. Samples: 38533120. Policy #0 lag: (min: 6.0, avg: 116.2, max: 262.0) [2024-06-15 12:30:45,752][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 12:30:47,940][1652475] Updated weights for policy 0, policy_version 75145 (0.0014) [2024-06-15 12:30:49,062][1652475] Updated weights for policy 0, policy_version 75200 (0.0012) [2024-06-15 12:30:50,469][1652475] Updated weights for policy 0, policy_version 75256 (0.0014) [2024-06-15 12:30:50,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 44782.9, 300 sec: 43542.6). Total num frames: 154140672. Throughput: 0: 11241.2. Samples: 38572032. Policy #0 lag: (min: 6.0, avg: 116.2, max: 262.0) [2024-06-15 12:30:50,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 12:30:52,434][1652475] Updated weights for policy 0, policy_version 75297 (0.0011) [2024-06-15 12:30:55,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 43690.8, 300 sec: 43098.2). Total num frames: 154271744. Throughput: 0: 11150.3. Samples: 38635520. Policy #0 lag: (min: 6.0, avg: 116.2, max: 262.0) [2024-06-15 12:30:55,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 12:30:55,742][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000075328_154271744.pth... 
[2024-06-15 12:30:56,004][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000070272_143917056.pth [2024-06-15 12:30:56,020][1651340] Saving a milestone train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/milestones/checkpoint_000075328_154271744.pth [2024-06-15 12:30:56,518][1652475] Updated weights for policy 0, policy_version 75345 (0.0013) [2024-06-15 12:30:57,275][1652475] Updated weights for policy 0, policy_version 75391 (0.0027) [2024-06-15 12:30:59,175][1651340] Signal inference workers to stop experience collection... (3950 times) [2024-06-15 12:30:59,216][1652475] InferenceWorker_p0-w0: stopping experience collection (3950 times) [2024-06-15 12:30:59,392][1651340] Signal inference workers to resume experience collection... (3950 times) [2024-06-15 12:30:59,409][1652475] InferenceWorker_p0-w0: resuming experience collection (3950 times) [2024-06-15 12:31:00,071][1652475] Updated weights for policy 0, policy_version 75444 (0.0016) [2024-06-15 12:31:00,738][1648984] Fps is (10 sec: 39320.8, 60 sec: 45875.0, 300 sec: 43209.3). Total num frames: 154533888. Throughput: 0: 11229.8. Samples: 38705664. Policy #0 lag: (min: 6.0, avg: 116.2, max: 262.0) [2024-06-15 12:31:00,739][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 12:31:02,341][1652475] Updated weights for policy 0, policy_version 75510 (0.0014) [2024-06-15 12:31:03,779][1652475] Updated weights for policy 0, policy_version 75539 (0.0011) [2024-06-15 12:31:04,812][1652475] Updated weights for policy 0, policy_version 75583 (0.0016) [2024-06-15 12:31:05,740][1648984] Fps is (10 sec: 52426.9, 60 sec: 44782.6, 300 sec: 43542.5). Total num frames: 154796032. Throughput: 0: 11241.1. Samples: 38737408. 
Policy #0 lag: (min: 6.0, avg: 116.2, max: 262.0) [2024-06-15 12:31:05,741][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 12:31:08,715][1652475] Updated weights for policy 0, policy_version 75647 (0.0016) [2024-06-15 12:31:10,738][1648984] Fps is (10 sec: 39322.3, 60 sec: 43690.9, 300 sec: 43098.2). Total num frames: 154927104. Throughput: 0: 11161.6. Samples: 38807040. Policy #0 lag: (min: 6.0, avg: 116.2, max: 262.0) [2024-06-15 12:31:10,738][1648984] Avg episode reward: [(0, '-0.570')] [2024-06-15 12:31:11,863][1652475] Updated weights for policy 0, policy_version 75705 (0.0013) [2024-06-15 12:31:13,984][1652475] Updated weights for policy 0, policy_version 75768 (0.0014) [2024-06-15 12:31:15,738][1648984] Fps is (10 sec: 42600.2, 60 sec: 44236.8, 300 sec: 43320.4). Total num frames: 155222016. Throughput: 0: 11195.7. Samples: 38869504. Policy #0 lag: (min: 6.0, avg: 116.2, max: 262.0) [2024-06-15 12:31:15,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 12:31:16,391][1652475] Updated weights for policy 0, policy_version 75831 (0.0013) [2024-06-15 12:31:20,153][1652475] Updated weights for policy 0, policy_version 75888 (0.0063) [2024-06-15 12:31:20,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 45875.1, 300 sec: 43320.4). Total num frames: 155451392. Throughput: 0: 11116.2. Samples: 38899200. Policy #0 lag: (min: 6.0, avg: 116.2, max: 262.0) [2024-06-15 12:31:20,739][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 12:31:24,594][1652475] Updated weights for policy 0, policy_version 75955 (0.0014) [2024-06-15 12:31:25,738][1648984] Fps is (10 sec: 42598.0, 60 sec: 45329.0, 300 sec: 43320.4). Total num frames: 155648000. Throughput: 0: 11184.3. Samples: 38968320. 
Policy #0 lag: (min: 6.0, avg: 116.2, max: 262.0) [2024-06-15 12:31:25,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 12:31:26,431][1652475] Updated weights for policy 0, policy_version 76030 (0.0012) [2024-06-15 12:31:30,695][1652475] Updated weights for policy 0, policy_version 76098 (0.0015) [2024-06-15 12:31:30,738][1648984] Fps is (10 sec: 39321.9, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 155844608. Throughput: 0: 11036.4. Samples: 39029760. Policy #0 lag: (min: 6.0, avg: 116.2, max: 262.0) [2024-06-15 12:31:30,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 12:31:31,611][1652475] Updated weights for policy 0, policy_version 76152 (0.0011) [2024-06-15 12:31:35,738][1648984] Fps is (10 sec: 39322.1, 60 sec: 44783.2, 300 sec: 43320.4). Total num frames: 156041216. Throughput: 0: 10968.2. Samples: 39065600. Policy #0 lag: (min: 8.0, avg: 133.9, max: 264.0) [2024-06-15 12:31:35,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 12:31:36,270][1652475] Updated weights for policy 0, policy_version 76221 (0.0014) [2024-06-15 12:31:38,639][1652475] Updated weights for policy 0, policy_version 76286 (0.0014) [2024-06-15 12:31:40,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 44236.8, 300 sec: 43653.6). Total num frames: 156270592. Throughput: 0: 10956.8. Samples: 39128576. Policy #0 lag: (min: 8.0, avg: 133.9, max: 264.0) [2024-06-15 12:31:40,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 12:31:41,577][1652475] Updated weights for policy 0, policy_version 76345 (0.0014) [2024-06-15 12:31:43,118][1652475] Updated weights for policy 0, policy_version 76385 (0.0027) [2024-06-15 12:31:45,738][1648984] Fps is (10 sec: 45875.2, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 156499968. Throughput: 0: 11002.4. Samples: 39200768. 
Policy #0 lag: (min: 8.0, avg: 133.9, max: 264.0) [2024-06-15 12:31:45,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 12:31:47,862][1651340] Signal inference workers to stop experience collection... (4000 times) [2024-06-15 12:31:47,937][1652475] InferenceWorker_p0-w0: stopping experience collection (4000 times) [2024-06-15 12:31:47,940][1652475] Updated weights for policy 0, policy_version 76436 (0.0016) [2024-06-15 12:31:48,204][1651340] Signal inference workers to resume experience collection... (4000 times) [2024-06-15 12:31:48,218][1652475] InferenceWorker_p0-w0: resuming experience collection (4000 times) [2024-06-15 12:31:50,028][1652475] Updated weights for policy 0, policy_version 76512 (0.0013) [2024-06-15 12:31:50,738][1648984] Fps is (10 sec: 49148.5, 60 sec: 43690.1, 300 sec: 43542.5). Total num frames: 156762112. Throughput: 0: 11070.5. Samples: 39235584. Policy #0 lag: (min: 8.0, avg: 133.9, max: 264.0) [2024-06-15 12:31:50,739][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 12:31:52,635][1652475] Updated weights for policy 0, policy_version 76581 (0.0017) [2024-06-15 12:31:55,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 43690.7, 300 sec: 43098.2). Total num frames: 156893184. Throughput: 0: 10752.0. Samples: 39290880. Policy #0 lag: (min: 8.0, avg: 133.9, max: 264.0) [2024-06-15 12:31:55,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 12:31:56,349][1652475] Updated weights for policy 0, policy_version 76640 (0.0014) [2024-06-15 12:32:00,159][1652475] Updated weights for policy 0, policy_version 76706 (0.0068) [2024-06-15 12:32:00,738][1648984] Fps is (10 sec: 36047.6, 60 sec: 43144.7, 300 sec: 42987.2). Total num frames: 157122560. Throughput: 0: 10820.3. Samples: 39356416. 
Policy #0 lag: (min: 8.0, avg: 133.9, max: 264.0) [2024-06-15 12:32:00,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 12:32:01,854][1652475] Updated weights for policy 0, policy_version 76738 (0.0016) [2024-06-15 12:32:03,595][1652475] Updated weights for policy 0, policy_version 76805 (0.0013) [2024-06-15 12:32:05,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 43691.0, 300 sec: 43542.6). Total num frames: 157417472. Throughput: 0: 10934.1. Samples: 39391232. Policy #0 lag: (min: 8.0, avg: 133.9, max: 264.0) [2024-06-15 12:32:05,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 12:32:08,899][1652475] Updated weights for policy 0, policy_version 76880 (0.0013) [2024-06-15 12:32:10,609][1652475] Updated weights for policy 0, policy_version 76931 (0.0014) [2024-06-15 12:32:10,738][1648984] Fps is (10 sec: 42598.5, 60 sec: 43690.7, 300 sec: 42987.2). Total num frames: 157548544. Throughput: 0: 10899.9. Samples: 39458816. Policy #0 lag: (min: 8.0, avg: 133.9, max: 264.0) [2024-06-15 12:32:10,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 12:32:12,133][1652475] Updated weights for policy 0, policy_version 76992 (0.0018) [2024-06-15 12:32:15,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 43690.7, 300 sec: 43653.6). Total num frames: 157843456. Throughput: 0: 10945.4. Samples: 39522304. Policy #0 lag: (min: 8.0, avg: 133.9, max: 264.0) [2024-06-15 12:32:15,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 12:32:15,779][1652475] Updated weights for policy 0, policy_version 77077 (0.0015) [2024-06-15 12:32:16,520][1652475] Updated weights for policy 0, policy_version 77116 (0.0017) [2024-06-15 12:32:20,742][1648984] Fps is (10 sec: 39303.9, 60 sec: 41503.1, 300 sec: 42986.5). Total num frames: 157941760. Throughput: 0: 10864.7. Samples: 39554560. 
Policy #0 lag: (min: 8.0, avg: 133.9, max: 264.0) [2024-06-15 12:32:20,743][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 12:32:22,693][1652475] Updated weights for policy 0, policy_version 77185 (0.0014) [2024-06-15 12:32:24,038][1652475] Updated weights for policy 0, policy_version 77240 (0.0011) [2024-06-15 12:32:25,738][1648984] Fps is (10 sec: 36044.5, 60 sec: 42598.4, 300 sec: 43098.3). Total num frames: 158203904. Throughput: 0: 10899.9. Samples: 39619072. Policy #0 lag: (min: 7.0, avg: 90.4, max: 263.0) [2024-06-15 12:32:25,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 12:32:26,877][1652475] Updated weights for policy 0, policy_version 77296 (0.0011) [2024-06-15 12:32:28,199][1652475] Updated weights for policy 0, policy_version 77360 (0.0013) [2024-06-15 12:32:30,738][1648984] Fps is (10 sec: 52451.6, 60 sec: 43690.6, 300 sec: 43542.6). Total num frames: 158466048. Throughput: 0: 10797.5. Samples: 39686656. Policy #0 lag: (min: 7.0, avg: 90.4, max: 263.0) [2024-06-15 12:32:30,739][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 12:32:33,498][1651340] Signal inference workers to stop experience collection... (4050 times) [2024-06-15 12:32:33,536][1652475] InferenceWorker_p0-w0: stopping experience collection (4050 times) [2024-06-15 12:32:33,833][1651340] Signal inference workers to resume experience collection... (4050 times) [2024-06-15 12:32:33,834][1652475] InferenceWorker_p0-w0: resuming experience collection (4050 times) [2024-06-15 12:32:34,004][1652475] Updated weights for policy 0, policy_version 77410 (0.0012) [2024-06-15 12:32:35,738][1648984] Fps is (10 sec: 45875.8, 60 sec: 43690.7, 300 sec: 43320.4). Total num frames: 158662656. Throughput: 0: 10854.6. Samples: 39724032. 
Policy #0 lag: (min: 7.0, avg: 90.4, max: 263.0) [2024-06-15 12:32:35,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 12:32:35,803][1652475] Updated weights for policy 0, policy_version 77488 (0.0020) [2024-06-15 12:32:37,800][1652475] Updated weights for policy 0, policy_version 77520 (0.0012) [2024-06-15 12:32:40,140][1652475] Updated weights for policy 0, policy_version 77618 (0.0014) [2024-06-15 12:32:40,738][1648984] Fps is (10 sec: 52429.4, 60 sec: 45329.1, 300 sec: 43542.6). Total num frames: 158990336. Throughput: 0: 10899.9. Samples: 39781376. Policy #0 lag: (min: 7.0, avg: 90.4, max: 263.0) [2024-06-15 12:32:40,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 12:32:45,738][1648984] Fps is (10 sec: 36044.9, 60 sec: 42052.3, 300 sec: 42765.0). Total num frames: 159023104. Throughput: 0: 11082.0. Samples: 39855104. Policy #0 lag: (min: 7.0, avg: 90.4, max: 263.0) [2024-06-15 12:32:45,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 12:32:45,983][1652475] Updated weights for policy 0, policy_version 77664 (0.0038) [2024-06-15 12:32:47,995][1652475] Updated weights for policy 0, policy_version 77744 (0.0017) [2024-06-15 12:32:50,738][1648984] Fps is (10 sec: 32767.7, 60 sec: 42598.9, 300 sec: 43320.4). Total num frames: 159318016. Throughput: 0: 10808.9. Samples: 39877632. Policy #0 lag: (min: 7.0, avg: 90.4, max: 263.0) [2024-06-15 12:32:50,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 12:32:51,080][1652475] Updated weights for policy 0, policy_version 77824 (0.0065) [2024-06-15 12:32:55,739][1648984] Fps is (10 sec: 49150.3, 60 sec: 43690.5, 300 sec: 43542.5). Total num frames: 159514624. Throughput: 0: 10649.5. Samples: 39938048. Policy #0 lag: (min: 7.0, avg: 90.4, max: 263.0) [2024-06-15 12:32:55,740][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 12:32:55,748][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000077888_159514624.pth... 
[2024-06-15 12:32:55,790][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000072832_149159936.pth [2024-06-15 12:32:58,060][1652475] Updated weights for policy 0, policy_version 77894 (0.0013) [2024-06-15 12:33:00,262][1652475] Updated weights for policy 0, policy_version 77984 (0.0022) [2024-06-15 12:33:00,738][1648984] Fps is (10 sec: 42598.9, 60 sec: 43690.7, 300 sec: 43431.5). Total num frames: 159744000. Throughput: 0: 10695.1. Samples: 40003584. Policy #0 lag: (min: 7.0, avg: 90.4, max: 263.0) [2024-06-15 12:33:00,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 12:33:02,270][1652475] Updated weights for policy 0, policy_version 78032 (0.0050) [2024-06-15 12:33:03,994][1652475] Updated weights for policy 0, policy_version 78099 (0.0012) [2024-06-15 12:33:05,738][1648984] Fps is (10 sec: 52430.1, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 160038912. Throughput: 0: 10730.3. Samples: 40037376. Policy #0 lag: (min: 7.0, avg: 90.4, max: 263.0) [2024-06-15 12:33:05,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 12:33:10,738][1648984] Fps is (10 sec: 32767.6, 60 sec: 42052.2, 300 sec: 42765.0). Total num frames: 160071680. Throughput: 0: 10831.6. Samples: 40106496. Policy #0 lag: (min: 15.0, avg: 83.9, max: 271.0) [2024-06-15 12:33:10,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 12:33:10,800][1652475] Updated weights for policy 0, policy_version 78176 (0.0014) [2024-06-15 12:33:12,279][1652475] Updated weights for policy 0, policy_version 78240 (0.0014) [2024-06-15 12:33:14,313][1652475] Updated weights for policy 0, policy_version 78279 (0.0034) [2024-06-15 12:33:14,954][1651340] Signal inference workers to stop experience collection... (4100 times) [2024-06-15 12:33:15,026][1652475] InferenceWorker_p0-w0: stopping experience collection (4100 times) [2024-06-15 12:33:15,184][1651340] Signal inference workers to resume experience collection... 
(4100 times) [2024-06-15 12:33:15,185][1652475] InferenceWorker_p0-w0: resuming experience collection (4100 times) [2024-06-15 12:33:15,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 43144.6, 300 sec: 43432.0). Total num frames: 160432128. Throughput: 0: 10661.0. Samples: 40166400. Policy #0 lag: (min: 15.0, avg: 83.9, max: 271.0) [2024-06-15 12:33:15,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 12:33:16,351][1652475] Updated weights for policy 0, policy_version 78374 (0.0022) [2024-06-15 12:33:20,738][1648984] Fps is (10 sec: 49150.2, 60 sec: 43693.6, 300 sec: 43098.2). Total num frames: 160563200. Throughput: 0: 10558.5. Samples: 40199168. Policy #0 lag: (min: 15.0, avg: 83.9, max: 271.0) [2024-06-15 12:33:20,739][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 12:33:22,056][1652475] Updated weights for policy 0, policy_version 78404 (0.0048) [2024-06-15 12:33:23,916][1652475] Updated weights for policy 0, policy_version 78480 (0.0012) [2024-06-15 12:33:25,144][1652475] Updated weights for policy 0, policy_version 78528 (0.0018) [2024-06-15 12:33:25,738][1648984] Fps is (10 sec: 39319.9, 60 sec: 43690.4, 300 sec: 43320.4). Total num frames: 160825344. Throughput: 0: 10797.4. Samples: 40267264. Policy #0 lag: (min: 15.0, avg: 83.9, max: 271.0) [2024-06-15 12:33:25,739][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 12:33:27,105][1652475] Updated weights for policy 0, policy_version 78591 (0.0013) [2024-06-15 12:33:28,778][1652475] Updated weights for policy 0, policy_version 78655 (0.0017) [2024-06-15 12:33:30,738][1648984] Fps is (10 sec: 52430.9, 60 sec: 43690.7, 300 sec: 43542.5). Total num frames: 161087488. Throughput: 0: 10433.4. Samples: 40324608. Policy #0 lag: (min: 15.0, avg: 83.9, max: 271.0) [2024-06-15 12:33:30,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 12:33:35,738][1648984] Fps is (10 sec: 36046.1, 60 sec: 42052.2, 300 sec: 42987.2). Total num frames: 161185792. Throughput: 0: 10763.4. 
Samples: 40361984. Policy #0 lag: (min: 15.0, avg: 83.9, max: 271.0) [2024-06-15 12:33:35,738][1648984] Avg episode reward: [(0, '-0.590')] [2024-06-15 12:33:35,824][1652475] Updated weights for policy 0, policy_version 78715 (0.0016) [2024-06-15 12:33:39,251][1652475] Updated weights for policy 0, policy_version 78823 (0.0109) [2024-06-15 12:33:40,738][1648984] Fps is (10 sec: 39320.5, 60 sec: 41505.9, 300 sec: 43209.3). Total num frames: 161480704. Throughput: 0: 10626.8. Samples: 40416256. Policy #0 lag: (min: 15.0, avg: 83.9, max: 271.0) [2024-06-15 12:33:40,739][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 12:33:42,778][1652475] Updated weights for policy 0, policy_version 78880 (0.0168) [2024-06-15 12:33:45,738][1648984] Fps is (10 sec: 42598.6, 60 sec: 43144.5, 300 sec: 42653.9). Total num frames: 161611776. Throughput: 0: 10683.7. Samples: 40484352. Policy #0 lag: (min: 15.0, avg: 83.9, max: 271.0) [2024-06-15 12:33:45,759][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 12:33:47,440][1652475] Updated weights for policy 0, policy_version 78949 (0.0015) [2024-06-15 12:33:49,506][1652475] Updated weights for policy 0, policy_version 78997 (0.0013) [2024-06-15 12:33:50,738][1648984] Fps is (10 sec: 39323.0, 60 sec: 42598.5, 300 sec: 43098.3). Total num frames: 161873920. Throughput: 0: 10706.5. Samples: 40519168. Policy #0 lag: (min: 15.0, avg: 83.9, max: 271.0) [2024-06-15 12:33:50,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 12:33:50,941][1652475] Updated weights for policy 0, policy_version 79063 (0.0014) [2024-06-15 12:33:53,995][1652475] Updated weights for policy 0, policy_version 79120 (0.0014) [2024-06-15 12:33:54,863][1652475] Updated weights for policy 0, policy_version 79161 (0.0016) [2024-06-15 12:33:55,738][1648984] Fps is (10 sec: 52428.4, 60 sec: 43690.8, 300 sec: 43431.5). Total num frames: 162136064. Throughput: 0: 10661.0. Samples: 40586240. 
Policy #0 lag: (min: 15.0, avg: 83.9, max: 271.0) [2024-06-15 12:33:55,752][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 12:33:58,927][1652475] Updated weights for policy 0, policy_version 79200 (0.0017) [2024-06-15 12:34:00,738][1648984] Fps is (10 sec: 42598.1, 60 sec: 42598.4, 300 sec: 43209.3). Total num frames: 162299904. Throughput: 0: 10911.3. Samples: 40657408. Policy #0 lag: (min: 23.0, avg: 110.1, max: 279.0) [2024-06-15 12:34:00,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 12:34:01,365][1651340] Signal inference workers to stop experience collection... (4150 times) [2024-06-15 12:34:01,395][1652475] InferenceWorker_p0-w0: stopping experience collection (4150 times) [2024-06-15 12:34:01,623][1651340] Signal inference workers to resume experience collection... (4150 times) [2024-06-15 12:34:01,624][1652475] InferenceWorker_p0-w0: resuming experience collection (4150 times) [2024-06-15 12:34:01,626][1652475] Updated weights for policy 0, policy_version 79296 (0.0015) [2024-06-15 12:34:02,886][1652475] Updated weights for policy 0, policy_version 79356 (0.0012) [2024-06-15 12:34:05,738][1648984] Fps is (10 sec: 45875.8, 60 sec: 42598.4, 300 sec: 43764.7). Total num frames: 162594816. Throughput: 0: 10763.5. Samples: 40683520. Policy #0 lag: (min: 23.0, avg: 110.1, max: 279.0) [2024-06-15 12:34:05,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 12:34:06,267][1652475] Updated weights for policy 0, policy_version 79424 (0.0018) [2024-06-15 12:34:10,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 43690.7, 300 sec: 43209.3). Total num frames: 162693120. Throughput: 0: 10831.7. Samples: 40754688. Policy #0 lag: (min: 23.0, avg: 110.1, max: 279.0) [2024-06-15 12:34:10,738][1648984] Avg episode reward: [(0, '-0.540')] [2024-06-15 12:34:13,180][1652475] Updated weights for policy 0, policy_version 79536 (0.0013) [2024-06-15 12:34:15,738][1648984] Fps is (10 sec: 32767.9, 60 sec: 41506.1, 300 sec: 43098.3). 
Total num frames: 162922496. Throughput: 0: 10911.3. Samples: 40815616. Policy #0 lag: (min: 23.0, avg: 110.1, max: 279.0) [2024-06-15 12:34:15,738][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 12:34:16,861][1652475] Updated weights for policy 0, policy_version 79616 (0.0015) [2024-06-15 12:34:20,738][1648984] Fps is (10 sec: 49151.9, 60 sec: 43691.0, 300 sec: 43098.3). Total num frames: 163184640. Throughput: 0: 10478.9. Samples: 40833536. Policy #0 lag: (min: 23.0, avg: 110.1, max: 279.0) [2024-06-15 12:34:20,738][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 12:34:23,992][1652475] Updated weights for policy 0, policy_version 79681 (0.0119) [2024-06-15 12:34:25,738][1648984] Fps is (10 sec: 39321.3, 60 sec: 41506.4, 300 sec: 42765.0). Total num frames: 163315712. Throughput: 0: 10900.0. Samples: 40906752. Policy #0 lag: (min: 23.0, avg: 110.1, max: 279.0) [2024-06-15 12:34:25,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 12:34:25,856][1652475] Updated weights for policy 0, policy_version 79745 (0.0015) [2024-06-15 12:34:27,036][1652475] Updated weights for policy 0, policy_version 79803 (0.0013) [2024-06-15 12:34:29,587][1652475] Updated weights for policy 0, policy_version 79888 (0.0015) [2024-06-15 12:34:30,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 43690.7, 300 sec: 43986.9). Total num frames: 163708928. Throughput: 0: 10569.9. Samples: 40960000. Policy #0 lag: (min: 23.0, avg: 110.1, max: 279.0) [2024-06-15 12:34:30,750][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 12:34:35,738][1648984] Fps is (10 sec: 39321.9, 60 sec: 42052.3, 300 sec: 42987.2). Total num frames: 163708928. Throughput: 0: 10649.6. Samples: 40998400. 
Policy #0 lag: (min: 23.0, avg: 110.1, max: 279.0) [2024-06-15 12:34:35,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 12:34:36,228][1652475] Updated weights for policy 0, policy_version 79952 (0.0016) [2024-06-15 12:34:37,145][1652475] Updated weights for policy 0, policy_version 79998 (0.0014) [2024-06-15 12:34:39,528][1652475] Updated weights for policy 0, policy_version 80064 (0.0013) [2024-06-15 12:34:40,738][1648984] Fps is (10 sec: 36044.9, 60 sec: 43144.7, 300 sec: 43431.5). Total num frames: 164069376. Throughput: 0: 10740.6. Samples: 41069568. Policy #0 lag: (min: 23.0, avg: 110.1, max: 279.0) [2024-06-15 12:34:40,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 12:34:41,404][1652475] Updated weights for policy 0, policy_version 80144 (0.0018) [2024-06-15 12:34:45,738][1648984] Fps is (10 sec: 52428.4, 60 sec: 43690.6, 300 sec: 43320.4). Total num frames: 164233216. Throughput: 0: 10501.7. Samples: 41129984. Policy #0 lag: (min: 23.0, avg: 110.1, max: 279.0) [2024-06-15 12:34:45,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 12:34:48,186][1652475] Updated weights for policy 0, policy_version 80208 (0.0015) [2024-06-15 12:34:49,970][1651340] Signal inference workers to stop experience collection... (4200 times) [2024-06-15 12:34:50,014][1652475] InferenceWorker_p0-w0: stopping experience collection (4200 times) [2024-06-15 12:34:50,281][1651340] Signal inference workers to resume experience collection... (4200 times) [2024-06-15 12:34:50,281][1652475] InferenceWorker_p0-w0: resuming experience collection (4200 times) [2024-06-15 12:34:50,283][1652475] Updated weights for policy 0, policy_version 80272 (0.0013) [2024-06-15 12:34:50,738][1648984] Fps is (10 sec: 36044.9, 60 sec: 42598.4, 300 sec: 43320.4). Total num frames: 164429824. Throughput: 0: 10763.4. Samples: 41167872. 
Policy #0 lag: (min: 15.0, avg: 84.8, max: 271.0) [2024-06-15 12:34:50,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 12:34:51,974][1652475] Updated weights for policy 0, policy_version 80336 (0.0012) [2024-06-15 12:34:54,034][1652475] Updated weights for policy 0, policy_version 80419 (0.0012) [2024-06-15 12:34:55,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 43690.7, 300 sec: 43986.9). Total num frames: 164757504. Throughput: 0: 10387.9. Samples: 41222144. Policy #0 lag: (min: 15.0, avg: 84.8, max: 271.0) [2024-06-15 12:34:55,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 12:34:55,743][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000080448_164757504.pth... [2024-06-15 12:34:55,827][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000075328_154271744.pth [2024-06-15 12:35:00,316][1652475] Updated weights for policy 0, policy_version 80481 (0.0080) [2024-06-15 12:35:00,738][1648984] Fps is (10 sec: 42598.1, 60 sec: 42598.4, 300 sec: 43209.3). Total num frames: 164855808. Throughput: 0: 10752.0. Samples: 41299456. Policy #0 lag: (min: 15.0, avg: 84.8, max: 271.0) [2024-06-15 12:35:00,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 12:35:03,084][1652475] Updated weights for policy 0, policy_version 80547 (0.0016) [2024-06-15 12:35:04,447][1652475] Updated weights for policy 0, policy_version 80613 (0.0013) [2024-06-15 12:35:05,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 43144.5, 300 sec: 43653.7). Total num frames: 165183488. Throughput: 0: 11116.1. Samples: 41333760. Policy #0 lag: (min: 15.0, avg: 84.8, max: 271.0) [2024-06-15 12:35:05,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 12:35:06,468][1652475] Updated weights for policy 0, policy_version 80688 (0.0014) [2024-06-15 12:35:10,738][1648984] Fps is (10 sec: 42597.9, 60 sec: 43144.4, 300 sec: 43098.2). Total num frames: 165281792. Throughput: 0: 10865.7. 
Samples: 41395712. Policy #0 lag: (min: 15.0, avg: 84.8, max: 271.0) [2024-06-15 12:35:10,739][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 12:35:11,562][1652475] Updated weights for policy 0, policy_version 80736 (0.0020) [2024-06-15 12:35:14,465][1652475] Updated weights for policy 0, policy_version 80770 (0.0012) [2024-06-15 12:35:15,738][1648984] Fps is (10 sec: 32768.1, 60 sec: 43144.5, 300 sec: 43431.5). Total num frames: 165511168. Throughput: 0: 11332.3. Samples: 41469952. Policy #0 lag: (min: 15.0, avg: 84.8, max: 271.0) [2024-06-15 12:35:15,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 12:35:16,391][1652475] Updated weights for policy 0, policy_version 80848 (0.0012) [2024-06-15 12:35:18,879][1652475] Updated weights for policy 0, policy_version 80944 (0.0016) [2024-06-15 12:35:20,738][1648984] Fps is (10 sec: 52429.5, 60 sec: 43690.6, 300 sec: 43653.6). Total num frames: 165806080. Throughput: 0: 10843.0. Samples: 41486336. Policy #0 lag: (min: 15.0, avg: 84.8, max: 271.0) [2024-06-15 12:35:20,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 12:35:24,052][1652475] Updated weights for policy 0, policy_version 80992 (0.0018) [2024-06-15 12:35:25,738][1648984] Fps is (10 sec: 42597.9, 60 sec: 43690.6, 300 sec: 43098.2). Total num frames: 165937152. Throughput: 0: 10877.1. Samples: 41559040. Policy #0 lag: (min: 15.0, avg: 84.8, max: 271.0) [2024-06-15 12:35:25,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 12:35:28,203][1652475] Updated weights for policy 0, policy_version 81060 (0.0012) [2024-06-15 12:35:29,974][1652475] Updated weights for policy 0, policy_version 81123 (0.0013) [2024-06-15 12:35:30,701][1651340] Signal inference workers to stop experience collection... (4250 times) [2024-06-15 12:35:30,736][1652475] InferenceWorker_p0-w0: stopping experience collection (4250 times) [2024-06-15 12:35:30,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 41506.1, 300 sec: 43542.6). 
Total num frames: 166199296. Throughput: 0: 10797.5. Samples: 41615872. Policy #0 lag: (min: 15.0, avg: 84.8, max: 271.0) [2024-06-15 12:35:30,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 12:35:30,938][1651340] Signal inference workers to resume experience collection... (4250 times) [2024-06-15 12:35:30,939][1652475] InferenceWorker_p0-w0: resuming experience collection (4250 times) [2024-06-15 12:35:31,075][1652475] Updated weights for policy 0, policy_version 81172 (0.0013) [2024-06-15 12:35:35,512][1652475] Updated weights for policy 0, policy_version 81236 (0.0029) [2024-06-15 12:35:35,738][1648984] Fps is (10 sec: 45875.9, 60 sec: 44782.9, 300 sec: 43320.4). Total num frames: 166395904. Throughput: 0: 10729.3. Samples: 41650688. Policy #0 lag: (min: 15.0, avg: 84.8, max: 271.0) [2024-06-15 12:35:35,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 12:35:39,254][1652475] Updated weights for policy 0, policy_version 81283 (0.0019) [2024-06-15 12:35:40,738][1648984] Fps is (10 sec: 36044.8, 60 sec: 41506.1, 300 sec: 42987.2). Total num frames: 166559744. Throughput: 0: 11241.2. Samples: 41728000. Policy #0 lag: (min: 13.0, avg: 88.5, max: 269.0) [2024-06-15 12:35:40,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 12:35:40,743][1652475] Updated weights for policy 0, policy_version 81344 (0.0020) [2024-06-15 12:35:43,629][1652475] Updated weights for policy 0, policy_version 81472 (0.0102) [2024-06-15 12:35:45,738][1648984] Fps is (10 sec: 45874.7, 60 sec: 43690.6, 300 sec: 43098.2). Total num frames: 166854656. Throughput: 0: 10706.5. Samples: 41781248. Policy #0 lag: (min: 13.0, avg: 88.5, max: 269.0) [2024-06-15 12:35:45,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 12:35:50,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 42598.3, 300 sec: 43098.2). Total num frames: 166985728. Throughput: 0: 10740.6. Samples: 41817088. 
Policy #0 lag: (min: 13.0, avg: 88.5, max: 269.0) [2024-06-15 12:35:50,739][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 12:35:51,561][1652475] Updated weights for policy 0, policy_version 81540 (0.0014) [2024-06-15 12:35:53,097][1652475] Updated weights for policy 0, policy_version 81616 (0.0081) [2024-06-15 12:35:54,537][1652475] Updated weights for policy 0, policy_version 81680 (0.0012) [2024-06-15 12:35:55,672][1652475] Updated weights for policy 0, policy_version 81728 (0.0071) [2024-06-15 12:35:55,738][1648984] Fps is (10 sec: 52428.0, 60 sec: 43690.5, 300 sec: 43542.6). Total num frames: 167378944. Throughput: 0: 10535.8. Samples: 41869824. Policy #0 lag: (min: 13.0, avg: 88.5, max: 269.0) [2024-06-15 12:35:55,739][1648984] Avg episode reward: [(0, '-0.570')] [2024-06-15 12:36:00,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 42052.3, 300 sec: 42654.0). Total num frames: 167378944. Throughput: 0: 10444.8. Samples: 41939968. Policy #0 lag: (min: 13.0, avg: 88.5, max: 269.0) [2024-06-15 12:36:00,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 12:36:03,695][1652475] Updated weights for policy 0, policy_version 81812 (0.0014) [2024-06-15 12:36:05,737][1652475] Updated weights for policy 0, policy_version 81912 (0.0084) [2024-06-15 12:36:05,738][1648984] Fps is (10 sec: 36045.8, 60 sec: 42598.5, 300 sec: 43431.5). Total num frames: 167739392. Throughput: 0: 10820.3. Samples: 41973248. Policy #0 lag: (min: 13.0, avg: 88.5, max: 269.0) [2024-06-15 12:36:05,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 12:36:07,183][1652475] Updated weights for policy 0, policy_version 81955 (0.0016) [2024-06-15 12:36:10,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 43690.7, 300 sec: 42987.2). Total num frames: 167903232. Throughput: 0: 10570.0. Samples: 42034688. 
Policy #0 lag: (min: 13.0, avg: 88.5, max: 269.0) [2024-06-15 12:36:10,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 12:36:14,132][1652475] Updated weights for policy 0, policy_version 82016 (0.0012) [2024-06-15 12:36:15,289][1651340] Signal inference workers to stop experience collection... (4300 times) [2024-06-15 12:36:15,339][1652475] InferenceWorker_p0-w0: stopping experience collection (4300 times) [2024-06-15 12:36:15,592][1651340] Signal inference workers to resume experience collection... (4300 times) [2024-06-15 12:36:15,593][1652475] InferenceWorker_p0-w0: resuming experience collection (4300 times) [2024-06-15 12:36:15,738][1648984] Fps is (10 sec: 36044.8, 60 sec: 43144.6, 300 sec: 42876.1). Total num frames: 168099840. Throughput: 0: 10843.1. Samples: 42103808. Policy #0 lag: (min: 13.0, avg: 88.5, max: 269.0) [2024-06-15 12:36:15,741][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 12:36:16,389][1652475] Updated weights for policy 0, policy_version 82112 (0.0043) [2024-06-15 12:36:17,891][1652475] Updated weights for policy 0, policy_version 82170 (0.0012) [2024-06-15 12:36:19,450][1652475] Updated weights for policy 0, policy_version 82235 (0.0012) [2024-06-15 12:36:20,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 43690.7, 300 sec: 43320.4). Total num frames: 168427520. Throughput: 0: 10649.6. Samples: 42129920. Policy #0 lag: (min: 13.0, avg: 88.5, max: 269.0) [2024-06-15 12:36:20,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 12:36:25,738][1648984] Fps is (10 sec: 32767.7, 60 sec: 41506.2, 300 sec: 42653.9). Total num frames: 168427520. Throughput: 0: 10547.2. Samples: 42202624. 
Policy #0 lag: (min: 13.0, avg: 88.5, max: 269.0) [2024-06-15 12:36:25,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 12:36:26,361][1652475] Updated weights for policy 0, policy_version 82276 (0.0013) [2024-06-15 12:36:27,672][1652475] Updated weights for policy 0, policy_version 82321 (0.0013) [2024-06-15 12:36:29,030][1652475] Updated weights for policy 0, policy_version 82371 (0.0014) [2024-06-15 12:36:30,724][1652475] Updated weights for policy 0, policy_version 82434 (0.0013) [2024-06-15 12:36:30,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 43690.7, 300 sec: 43320.4). Total num frames: 168820736. Throughput: 0: 10763.4. Samples: 42265600. Policy #0 lag: (min: 22.0, avg: 149.5, max: 274.0) [2024-06-15 12:36:30,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 12:36:32,128][1652475] Updated weights for policy 0, policy_version 82496 (0.0015) [2024-06-15 12:36:35,738][1648984] Fps is (10 sec: 52429.1, 60 sec: 42598.4, 300 sec: 42987.2). Total num frames: 168951808. Throughput: 0: 10592.7. Samples: 42293760. Policy #0 lag: (min: 22.0, avg: 149.5, max: 274.0) [2024-06-15 12:36:35,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 12:36:38,428][1652475] Updated weights for policy 0, policy_version 82552 (0.0034) [2024-06-15 12:36:40,738][1648984] Fps is (10 sec: 32768.1, 60 sec: 43144.6, 300 sec: 42876.1). Total num frames: 169148416. Throughput: 0: 10945.5. Samples: 42362368. Policy #0 lag: (min: 22.0, avg: 149.5, max: 274.0) [2024-06-15 12:36:40,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 12:36:40,822][1652475] Updated weights for policy 0, policy_version 82608 (0.0013) [2024-06-15 12:36:42,504][1652475] Updated weights for policy 0, policy_version 82682 (0.0018) [2024-06-15 12:36:44,525][1652475] Updated weights for policy 0, policy_version 82744 (0.0023) [2024-06-15 12:36:45,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 43690.8, 300 sec: 43098.4). Total num frames: 169476096. Throughput: 0: 10615.5. 
Samples: 42417664. Policy #0 lag: (min: 22.0, avg: 149.5, max: 274.0) [2024-06-15 12:36:45,760][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 12:36:49,525][1652475] Updated weights for policy 0, policy_version 82800 (0.0012) [2024-06-15 12:36:50,738][1648984] Fps is (10 sec: 45875.7, 60 sec: 43690.8, 300 sec: 43098.3). Total num frames: 169607168. Throughput: 0: 10729.3. Samples: 42456064. Policy #0 lag: (min: 22.0, avg: 149.5, max: 274.0) [2024-06-15 12:36:50,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 12:36:53,808][1652475] Updated weights for policy 0, policy_version 82867 (0.0030) [2024-06-15 12:36:55,606][1652475] Updated weights for policy 0, policy_version 82946 (0.0101) [2024-06-15 12:36:55,738][1648984] Fps is (10 sec: 39321.3, 60 sec: 41506.3, 300 sec: 43209.3). Total num frames: 169869312. Throughput: 0: 10888.6. Samples: 42524672. Policy #0 lag: (min: 22.0, avg: 149.5, max: 274.0) [2024-06-15 12:36:55,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 12:36:56,159][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000082976_169934848.pth... [2024-06-15 12:36:56,297][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000077888_159514624.pth [2024-06-15 12:37:00,284][1652475] Updated weights for policy 0, policy_version 83024 (0.0012) [2024-06-15 12:37:00,737][1651340] Signal inference workers to stop experience collection... (4350 times) [2024-06-15 12:37:00,738][1648984] Fps is (10 sec: 45874.0, 60 sec: 44782.8, 300 sec: 42876.1). Total num frames: 170065920. Throughput: 0: 10786.1. Samples: 42589184. Policy #0 lag: (min: 22.0, avg: 149.5, max: 274.0) [2024-06-15 12:37:00,739][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 12:37:00,813][1652475] InferenceWorker_p0-w0: stopping experience collection (4350 times) [2024-06-15 12:37:01,062][1651340] Signal inference workers to resume experience collection... 
(4350 times) [2024-06-15 12:37:01,063][1652475] InferenceWorker_p0-w0: resuming experience collection (4350 times) [2024-06-15 12:37:05,563][1652475] Updated weights for policy 0, policy_version 83104 (0.0125) [2024-06-15 12:37:05,738][1648984] Fps is (10 sec: 32767.6, 60 sec: 40959.9, 300 sec: 42876.1). Total num frames: 170196992. Throughput: 0: 10934.0. Samples: 42621952. Policy #0 lag: (min: 22.0, avg: 149.5, max: 274.0) [2024-06-15 12:37:05,739][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 12:37:07,612][1652475] Updated weights for policy 0, policy_version 83184 (0.0086) [2024-06-15 12:37:08,886][1652475] Updated weights for policy 0, policy_version 83256 (0.0013) [2024-06-15 12:37:10,738][1648984] Fps is (10 sec: 45876.3, 60 sec: 43690.8, 300 sec: 42987.2). Total num frames: 170524672. Throughput: 0: 10581.4. Samples: 42678784. Policy #0 lag: (min: 22.0, avg: 149.5, max: 274.0) [2024-06-15 12:37:10,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 12:37:15,738][1648984] Fps is (10 sec: 45875.5, 60 sec: 42598.3, 300 sec: 43098.9). Total num frames: 170655744. Throughput: 0: 10899.9. Samples: 42756096. Policy #0 lag: (min: 22.0, avg: 149.5, max: 274.0) [2024-06-15 12:37:15,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 12:37:16,959][1652475] Updated weights for policy 0, policy_version 83332 (0.0016) [2024-06-15 12:37:18,808][1652475] Updated weights for policy 0, policy_version 83411 (0.0012) [2024-06-15 12:37:20,738][1648984] Fps is (10 sec: 45875.1, 60 sec: 42598.5, 300 sec: 43320.4). Total num frames: 170983424. Throughput: 0: 10990.9. Samples: 42788352. 
Policy #0 lag: (min: 95.0, avg: 187.8, max: 319.0) [2024-06-15 12:37:20,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 12:37:21,229][1652475] Updated weights for policy 0, policy_version 83515 (0.0018) [2024-06-15 12:37:23,616][1652475] Updated weights for policy 0, policy_version 83568 (0.0014) [2024-06-15 12:37:25,738][1648984] Fps is (10 sec: 52428.2, 60 sec: 45875.1, 300 sec: 43098.2). Total num frames: 171180032. Throughput: 0: 10899.9. Samples: 42852864. Policy #0 lag: (min: 95.0, avg: 187.8, max: 319.0) [2024-06-15 12:37:25,739][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 12:37:29,528][1652475] Updated weights for policy 0, policy_version 83632 (0.0013) [2024-06-15 12:37:30,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 42598.5, 300 sec: 43098.2). Total num frames: 171376640. Throughput: 0: 11229.9. Samples: 42923008. Policy #0 lag: (min: 95.0, avg: 187.8, max: 319.0) [2024-06-15 12:37:30,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 12:37:31,292][1652475] Updated weights for policy 0, policy_version 83712 (0.0080) [2024-06-15 12:37:32,806][1652475] Updated weights for policy 0, policy_version 83771 (0.0012) [2024-06-15 12:37:35,745][1652475] Updated weights for policy 0, policy_version 83830 (0.0020) [2024-06-15 12:37:35,738][1648984] Fps is (10 sec: 49153.1, 60 sec: 45329.1, 300 sec: 42987.2). Total num frames: 171671552. Throughput: 0: 10956.8. Samples: 42949120. Policy #0 lag: (min: 95.0, avg: 187.8, max: 319.0) [2024-06-15 12:37:35,746][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 12:37:40,738][1648984] Fps is (10 sec: 36044.8, 60 sec: 43144.6, 300 sec: 43098.2). Total num frames: 171737088. Throughput: 0: 11104.7. Samples: 43024384. 
Policy #0 lag: (min: 95.0, avg: 187.8, max: 319.0) [2024-06-15 12:37:40,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 12:37:40,825][1652475] Updated weights for policy 0, policy_version 83860 (0.0014) [2024-06-15 12:37:42,451][1652475] Updated weights for policy 0, policy_version 83936 (0.0112) [2024-06-15 12:37:42,874][1651340] Signal inference workers to stop experience collection... (4400 times) [2024-06-15 12:37:42,918][1652475] InferenceWorker_p0-w0: stopping experience collection (4400 times) [2024-06-15 12:37:43,178][1651340] Signal inference workers to resume experience collection... (4400 times) [2024-06-15 12:37:43,179][1652475] InferenceWorker_p0-w0: resuming experience collection (4400 times) [2024-06-15 12:37:44,190][1652475] Updated weights for policy 0, policy_version 84003 (0.0080) [2024-06-15 12:37:45,738][1648984] Fps is (10 sec: 42598.0, 60 sec: 43690.6, 300 sec: 43320.4). Total num frames: 172097536. Throughput: 0: 11013.7. Samples: 43084800. Policy #0 lag: (min: 95.0, avg: 187.8, max: 319.0) [2024-06-15 12:37:45,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 12:37:47,169][1652475] Updated weights for policy 0, policy_version 84065 (0.0015) [2024-06-15 12:37:50,738][1648984] Fps is (10 sec: 49151.8, 60 sec: 43690.6, 300 sec: 43098.3). Total num frames: 172228608. Throughput: 0: 11047.8. Samples: 43119104. Policy #0 lag: (min: 95.0, avg: 187.8, max: 319.0) [2024-06-15 12:37:50,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 12:37:52,480][1652475] Updated weights for policy 0, policy_version 84123 (0.0013) [2024-06-15 12:37:54,329][1652475] Updated weights for policy 0, policy_version 84198 (0.0136) [2024-06-15 12:37:55,738][1648984] Fps is (10 sec: 42598.6, 60 sec: 44236.8, 300 sec: 43320.4). Total num frames: 172523520. Throughput: 0: 11218.5. Samples: 43183616. 
Policy #0 lag: (min: 95.0, avg: 187.8, max: 319.0) [2024-06-15 12:37:55,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 12:37:56,055][1652475] Updated weights for policy 0, policy_version 84258 (0.0096) [2024-06-15 12:37:58,408][1652475] Updated weights for policy 0, policy_version 84304 (0.0013) [2024-06-15 12:38:00,747][1648984] Fps is (10 sec: 52428.8, 60 sec: 44783.0, 300 sec: 43098.2). Total num frames: 172752896. Throughput: 0: 10979.6. Samples: 43250176. Policy #0 lag: (min: 95.0, avg: 187.8, max: 319.0) [2024-06-15 12:38:00,749][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 12:38:04,319][1652475] Updated weights for policy 0, policy_version 84374 (0.0012) [2024-06-15 12:38:05,648][1652475] Updated weights for policy 0, policy_version 84434 (0.0012) [2024-06-15 12:38:05,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 45329.1, 300 sec: 43542.6). Total num frames: 172916736. Throughput: 0: 11059.2. Samples: 43286016. Policy #0 lag: (min: 95.0, avg: 187.8, max: 319.0) [2024-06-15 12:38:05,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 12:38:06,904][1652475] Updated weights for policy 0, policy_version 84483 (0.0042) [2024-06-15 12:38:08,085][1652475] Updated weights for policy 0, policy_version 84543 (0.0012) [2024-06-15 12:38:10,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 43690.6, 300 sec: 43098.2). Total num frames: 173146112. Throughput: 0: 10956.8. Samples: 43345920. Policy #0 lag: (min: 95.0, avg: 187.8, max: 319.0) [2024-06-15 12:38:10,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 12:38:13,075][1652475] Updated weights for policy 0, policy_version 84605 (0.0014) [2024-06-15 12:38:15,738][1648984] Fps is (10 sec: 39322.0, 60 sec: 44236.9, 300 sec: 43209.4). Total num frames: 173309952. Throughput: 0: 10968.2. Samples: 43416576. 
Policy #0 lag: (min: 15.0, avg: 128.9, max: 271.0) [2024-06-15 12:38:15,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 12:38:16,805][1652475] Updated weights for policy 0, policy_version 84663 (0.0013) [2024-06-15 12:38:19,107][1652475] Updated weights for policy 0, policy_version 84720 (0.0012) [2024-06-15 12:38:20,738][1648984] Fps is (10 sec: 45875.4, 60 sec: 43690.6, 300 sec: 43320.5). Total num frames: 173604864. Throughput: 0: 10990.9. Samples: 43443712. Policy #0 lag: (min: 15.0, avg: 128.9, max: 271.0) [2024-06-15 12:38:20,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 12:38:21,140][1652475] Updated weights for policy 0, policy_version 84798 (0.0027) [2024-06-15 12:38:25,304][1652475] Updated weights for policy 0, policy_version 84856 (0.0014) [2024-06-15 12:38:25,738][1648984] Fps is (10 sec: 49151.8, 60 sec: 43690.8, 300 sec: 43098.3). Total num frames: 173801472. Throughput: 0: 10786.1. Samples: 43509760. Policy #0 lag: (min: 15.0, avg: 128.9, max: 271.0) [2024-06-15 12:38:25,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 12:38:27,875][1651340] Signal inference workers to stop experience collection... (4450 times) [2024-06-15 12:38:27,994][1652475] InferenceWorker_p0-w0: stopping experience collection (4450 times) [2024-06-15 12:38:28,019][1652475] Updated weights for policy 0, policy_version 84904 (0.0139) [2024-06-15 12:38:28,126][1651340] Signal inference workers to resume experience collection... (4450 times) [2024-06-15 12:38:28,128][1652475] InferenceWorker_p0-w0: resuming experience collection (4450 times) [2024-06-15 12:38:30,146][1652475] Updated weights for policy 0, policy_version 84944 (0.0013) [2024-06-15 12:38:30,738][1648984] Fps is (10 sec: 39321.0, 60 sec: 43690.5, 300 sec: 43431.5). Total num frames: 173998080. Throughput: 0: 10888.5. Samples: 43574784. 
Policy #0 lag: (min: 15.0, avg: 128.9, max: 271.0) [2024-06-15 12:38:30,739][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 12:38:31,576][1652475] Updated weights for policy 0, policy_version 84995 (0.0013) [2024-06-15 12:38:33,086][1652475] Updated weights for policy 0, policy_version 85056 (0.0012) [2024-06-15 12:38:35,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 42598.4, 300 sec: 43209.4). Total num frames: 174227456. Throughput: 0: 10695.1. Samples: 43600384. Policy #0 lag: (min: 15.0, avg: 128.9, max: 271.0) [2024-06-15 12:38:35,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 12:38:36,642][1652475] Updated weights for policy 0, policy_version 85114 (0.0016) [2024-06-15 12:38:40,398][1652475] Updated weights for policy 0, policy_version 85168 (0.0014) [2024-06-15 12:38:40,738][1648984] Fps is (10 sec: 45875.6, 60 sec: 45329.0, 300 sec: 43542.6). Total num frames: 174456832. Throughput: 0: 10956.8. Samples: 43676672. Policy #0 lag: (min: 15.0, avg: 128.9, max: 271.0) [2024-06-15 12:38:40,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 12:38:42,493][1652475] Updated weights for policy 0, policy_version 85217 (0.0012) [2024-06-15 12:38:44,240][1652475] Updated weights for policy 0, policy_version 85296 (0.0022) [2024-06-15 12:38:45,738][1648984] Fps is (10 sec: 49152.5, 60 sec: 43690.8, 300 sec: 43542.6). Total num frames: 174718976. Throughput: 0: 10752.0. Samples: 43734016. Policy #0 lag: (min: 15.0, avg: 128.9, max: 271.0) [2024-06-15 12:38:45,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 12:38:48,596][1652475] Updated weights for policy 0, policy_version 85367 (0.0013) [2024-06-15 12:38:50,738][1648984] Fps is (10 sec: 39322.1, 60 sec: 43690.7, 300 sec: 43098.3). Total num frames: 174850048. Throughput: 0: 10752.0. Samples: 43769856. 
Policy #0 lag: (min: 15.0, avg: 128.9, max: 271.0) [2024-06-15 12:38:50,738][1648984] Avg episode reward: [(0, '-0.650')] [2024-06-15 12:38:52,547][1652475] Updated weights for policy 0, policy_version 85429 (0.0013) [2024-06-15 12:38:55,000][1652475] Updated weights for policy 0, policy_version 85472 (0.0091) [2024-06-15 12:38:55,738][1648984] Fps is (10 sec: 39319.8, 60 sec: 43144.3, 300 sec: 43431.4). Total num frames: 175112192. Throughput: 0: 10797.4. Samples: 43831808. Policy #0 lag: (min: 15.0, avg: 128.9, max: 271.0) [2024-06-15 12:38:55,739][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 12:38:55,746][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000085504_175112192.pth... [2024-06-15 12:38:55,806][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000080448_164757504.pth [2024-06-15 12:38:57,218][1652475] Updated weights for policy 0, policy_version 85542 (0.0011) [2024-06-15 12:38:58,968][1652475] Updated weights for policy 0, policy_version 85584 (0.0012) [2024-06-15 12:39:00,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 43690.7, 300 sec: 43320.4). Total num frames: 175374336. Throughput: 0: 10615.5. Samples: 43894272. Policy #0 lag: (min: 15.0, avg: 128.9, max: 271.0) [2024-06-15 12:39:00,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 12:39:05,534][1652475] Updated weights for policy 0, policy_version 85638 (0.0012) [2024-06-15 12:39:05,738][1648984] Fps is (10 sec: 29492.2, 60 sec: 41506.2, 300 sec: 43098.3). Total num frames: 175407104. Throughput: 0: 10865.8. Samples: 43932672. 
Policy #0 lag: (min: 15.0, avg: 102.5, max: 271.0) [2024-06-15 12:39:05,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 12:39:07,502][1652475] Updated weights for policy 0, policy_version 85719 (0.0012) [2024-06-15 12:39:08,995][1652475] Updated weights for policy 0, policy_version 85792 (0.0012) [2024-06-15 12:39:10,671][1652475] Updated weights for policy 0, policy_version 85828 (0.0012) [2024-06-15 12:39:10,738][1648984] Fps is (10 sec: 39321.3, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 175767552. Throughput: 0: 10672.3. Samples: 43990016. Policy #0 lag: (min: 15.0, avg: 102.5, max: 271.0) [2024-06-15 12:39:10,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 12:39:15,738][1648984] Fps is (10 sec: 49151.8, 60 sec: 43144.5, 300 sec: 43098.3). Total num frames: 175898624. Throughput: 0: 10899.9. Samples: 44065280. Policy #0 lag: (min: 15.0, avg: 102.5, max: 271.0) [2024-06-15 12:39:15,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 12:39:17,122][1652475] Updated weights for policy 0, policy_version 85904 (0.0015) [2024-06-15 12:39:17,256][1651340] Signal inference workers to stop experience collection... (4500 times) [2024-06-15 12:39:17,315][1652475] InferenceWorker_p0-w0: stopping experience collection (4500 times) [2024-06-15 12:39:17,547][1651340] Signal inference workers to resume experience collection... (4500 times) [2024-06-15 12:39:17,548][1652475] InferenceWorker_p0-w0: resuming experience collection (4500 times) [2024-06-15 12:39:19,248][1652475] Updated weights for policy 0, policy_version 85989 (0.0017) [2024-06-15 12:39:20,738][1648984] Fps is (10 sec: 42598.6, 60 sec: 43144.5, 300 sec: 43653.6). Total num frames: 176193536. Throughput: 0: 11013.7. Samples: 44096000. 
Policy #0 lag: (min: 15.0, avg: 102.5, max: 271.0) [2024-06-15 12:39:20,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 12:39:21,553][1652475] Updated weights for policy 0, policy_version 86064 (0.0011) [2024-06-15 12:39:23,256][1652475] Updated weights for policy 0, policy_version 86136 (0.0012) [2024-06-15 12:39:25,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 43690.7, 300 sec: 43098.3). Total num frames: 176422912. Throughput: 0: 10661.0. Samples: 44156416. Policy #0 lag: (min: 15.0, avg: 102.5, max: 271.0) [2024-06-15 12:39:25,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 12:39:29,975][1652475] Updated weights for policy 0, policy_version 86199 (0.0017) [2024-06-15 12:39:30,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 43690.8, 300 sec: 43764.7). Total num frames: 176619520. Throughput: 0: 10968.1. Samples: 44227584. Policy #0 lag: (min: 15.0, avg: 102.5, max: 271.0) [2024-06-15 12:39:30,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 12:39:31,357][1652475] Updated weights for policy 0, policy_version 86272 (0.0012) [2024-06-15 12:39:33,484][1652475] Updated weights for policy 0, policy_version 86327 (0.0014) [2024-06-15 12:39:34,977][1652475] Updated weights for policy 0, policy_version 86397 (0.0013) [2024-06-15 12:39:35,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 45329.1, 300 sec: 43653.7). Total num frames: 176947200. Throughput: 0: 10854.4. Samples: 44258304. Policy #0 lag: (min: 15.0, avg: 102.5, max: 271.0) [2024-06-15 12:39:35,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 12:39:40,738][1648984] Fps is (10 sec: 32768.0, 60 sec: 41506.2, 300 sec: 43098.3). Total num frames: 176947200. Throughput: 0: 11116.2. Samples: 44332032. 
Policy #0 lag: (min: 15.0, avg: 102.5, max: 271.0) [2024-06-15 12:39:40,740][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 12:39:42,491][1652475] Updated weights for policy 0, policy_version 86480 (0.0014) [2024-06-15 12:39:45,007][1652475] Updated weights for policy 0, policy_version 86582 (0.0014) [2024-06-15 12:39:45,738][1648984] Fps is (10 sec: 39320.4, 60 sec: 43690.4, 300 sec: 43764.7). Total num frames: 177340416. Throughput: 0: 11013.6. Samples: 44389888. Policy #0 lag: (min: 15.0, avg: 102.5, max: 271.0) [2024-06-15 12:39:45,739][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 12:39:46,491][1652475] Updated weights for policy 0, policy_version 86625 (0.0016) [2024-06-15 12:39:50,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 43690.6, 300 sec: 43098.3). Total num frames: 177471488. Throughput: 0: 10865.8. Samples: 44421632. Policy #0 lag: (min: 15.0, avg: 102.5, max: 271.0) [2024-06-15 12:39:50,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 12:39:53,758][1652475] Updated weights for policy 0, policy_version 86704 (0.0012) [2024-06-15 12:39:55,630][1652475] Updated weights for policy 0, policy_version 86784 (0.0120) [2024-06-15 12:39:55,738][1648984] Fps is (10 sec: 39322.7, 60 sec: 43690.9, 300 sec: 43653.7). Total num frames: 177733632. Throughput: 0: 11116.1. Samples: 44490240. Policy #0 lag: (min: 15.0, avg: 102.5, max: 271.0) [2024-06-15 12:39:55,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 12:39:58,442][1652475] Updated weights for policy 0, policy_version 86849 (0.0015) [2024-06-15 12:39:59,206][1651340] Signal inference workers to stop experience collection... (4550 times) [2024-06-15 12:39:59,267][1652475] InferenceWorker_p0-w0: stopping experience collection (4550 times) [2024-06-15 12:39:59,460][1651340] Signal inference workers to resume experience collection... 
(4550 times) [2024-06-15 12:39:59,461][1652475] InferenceWorker_p0-w0: resuming experience collection (4550 times) [2024-06-15 12:39:59,798][1652475] Updated weights for policy 0, policy_version 86910 (0.0027) [2024-06-15 12:40:00,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 43690.7, 300 sec: 43431.5). Total num frames: 177995776. Throughput: 0: 10706.5. Samples: 44547072. Policy #0 lag: (min: 15.0, avg: 167.0, max: 271.0) [2024-06-15 12:40:00,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 12:40:05,738][1648984] Fps is (10 sec: 32768.0, 60 sec: 44236.8, 300 sec: 43320.4). Total num frames: 178061312. Throughput: 0: 10911.3. Samples: 44587008. Policy #0 lag: (min: 15.0, avg: 167.0, max: 271.0) [2024-06-15 12:40:05,740][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 12:40:06,308][1652475] Updated weights for policy 0, policy_version 86979 (0.0015) [2024-06-15 12:40:07,920][1652475] Updated weights for policy 0, policy_version 87047 (0.0013) [2024-06-15 12:40:09,127][1652475] Updated weights for policy 0, policy_version 87097 (0.0015) [2024-06-15 12:40:10,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 44236.9, 300 sec: 43764.7). Total num frames: 178421760. Throughput: 0: 10968.2. Samples: 44649984. Policy #0 lag: (min: 15.0, avg: 167.0, max: 271.0) [2024-06-15 12:40:10,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 12:40:11,368][1652475] Updated weights for policy 0, policy_version 87152 (0.0013) [2024-06-15 12:40:15,738][1648984] Fps is (10 sec: 45875.2, 60 sec: 43690.7, 300 sec: 43098.3). Total num frames: 178520064. Throughput: 0: 10934.1. Samples: 44719616. 
Policy #0 lag: (min: 15.0, avg: 167.0, max: 271.0) [2024-06-15 12:40:15,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 12:40:17,379][1652475] Updated weights for policy 0, policy_version 87206 (0.0023) [2024-06-15 12:40:18,771][1652475] Updated weights for policy 0, policy_version 87249 (0.0013) [2024-06-15 12:40:20,738][1648984] Fps is (10 sec: 39321.3, 60 sec: 43690.6, 300 sec: 43653.7). Total num frames: 178814976. Throughput: 0: 11070.5. Samples: 44756480. Policy #0 lag: (min: 15.0, avg: 167.0, max: 271.0) [2024-06-15 12:40:20,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 12:40:21,399][1652475] Updated weights for policy 0, policy_version 87350 (0.0144) [2024-06-15 12:40:24,327][1652475] Updated weights for policy 0, policy_version 87419 (0.0011) [2024-06-15 12:40:25,738][1648984] Fps is (10 sec: 52428.4, 60 sec: 43690.6, 300 sec: 43542.6). Total num frames: 179044352. Throughput: 0: 10558.6. Samples: 44807168. Policy #0 lag: (min: 15.0, avg: 167.0, max: 271.0) [2024-06-15 12:40:25,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 12:40:30,738][1648984] Fps is (10 sec: 32768.3, 60 sec: 42052.3, 300 sec: 43209.3). Total num frames: 179142656. Throughput: 0: 10797.6. Samples: 44875776. Policy #0 lag: (min: 15.0, avg: 167.0, max: 271.0) [2024-06-15 12:40:30,740][1648984] Avg episode reward: [(0, '-0.520')] [2024-06-15 12:40:31,270][1652475] Updated weights for policy 0, policy_version 87504 (0.0013) [2024-06-15 12:40:34,075][1652475] Updated weights for policy 0, policy_version 87607 (0.0131) [2024-06-15 12:40:35,738][1648984] Fps is (10 sec: 39321.9, 60 sec: 41506.1, 300 sec: 43653.7). Total num frames: 179437568. Throughput: 0: 10501.7. Samples: 44894208. 
Policy #0 lag: (min: 15.0, avg: 167.0, max: 271.0) [2024-06-15 12:40:35,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 12:40:37,484][1652475] Updated weights for policy 0, policy_version 87664 (0.0012) [2024-06-15 12:40:40,738][1648984] Fps is (10 sec: 42598.1, 60 sec: 43690.6, 300 sec: 43098.3). Total num frames: 179568640. Throughput: 0: 10490.3. Samples: 44962304. Policy #0 lag: (min: 15.0, avg: 167.0, max: 271.0) [2024-06-15 12:40:40,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 12:40:43,979][1652475] Updated weights for policy 0, policy_version 87728 (0.0222) [2024-06-15 12:40:45,738][1648984] Fps is (10 sec: 32767.9, 60 sec: 40414.0, 300 sec: 43320.4). Total num frames: 179765248. Throughput: 0: 10661.0. Samples: 45026816. Policy #0 lag: (min: 15.0, avg: 167.0, max: 271.0) [2024-06-15 12:40:45,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 12:40:46,006][1652475] Updated weights for policy 0, policy_version 87795 (0.0023) [2024-06-15 12:40:47,182][1651340] Signal inference workers to stop experience collection... (4600 times) [2024-06-15 12:40:47,242][1652475] InferenceWorker_p0-w0: stopping experience collection (4600 times) [2024-06-15 12:40:47,474][1651340] Signal inference workers to resume experience collection... (4600 times) [2024-06-15 12:40:47,474][1652475] InferenceWorker_p0-w0: resuming experience collection (4600 times) [2024-06-15 12:40:47,807][1652475] Updated weights for policy 0, policy_version 87872 (0.0078) [2024-06-15 12:40:50,166][1652475] Updated weights for policy 0, policy_version 87935 (0.0017) [2024-06-15 12:40:50,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 43690.7, 300 sec: 43098.3). Total num frames: 180092928. Throughput: 0: 10296.9. Samples: 45050368. Policy #0 lag: (min: 15.0, avg: 167.0, max: 271.0) [2024-06-15 12:40:50,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 12:40:55,739][1648984] Fps is (10 sec: 32767.9, 60 sec: 39321.6, 300 sec: 43098.3). 
Total num frames: 180092928. Throughput: 0: 10535.8. Samples: 45124096. Policy #0 lag: (min: 15.0, avg: 167.0, max: 271.0) [2024-06-15 12:40:55,740][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 12:40:56,202][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000087968_180158464.pth... [2024-06-15 12:40:56,306][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000082976_169934848.pth [2024-06-15 12:40:57,119][1652475] Updated weights for policy 0, policy_version 88001 (0.0016) [2024-06-15 12:40:58,572][1652475] Updated weights for policy 0, policy_version 88064 (0.0013) [2024-06-15 12:41:00,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 41506.1, 300 sec: 43209.3). Total num frames: 180486144. Throughput: 0: 10137.6. Samples: 45175808. Policy #0 lag: (min: 15.0, avg: 74.4, max: 271.0) [2024-06-15 12:41:00,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 12:41:01,793][1652475] Updated weights for policy 0, policy_version 88132 (0.0152) [2024-06-15 12:41:02,906][1652475] Updated weights for policy 0, policy_version 88189 (0.0029) [2024-06-15 12:41:05,738][1648984] Fps is (10 sec: 52429.1, 60 sec: 42598.4, 300 sec: 43098.3). Total num frames: 180617216. Throughput: 0: 10137.6. Samples: 45212672. Policy #0 lag: (min: 15.0, avg: 74.4, max: 271.0) [2024-06-15 12:41:05,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 12:41:09,101][1652475] Updated weights for policy 0, policy_version 88248 (0.0017) [2024-06-15 12:41:10,713][1652475] Updated weights for policy 0, policy_version 88291 (0.0022) [2024-06-15 12:41:10,738][1648984] Fps is (10 sec: 32768.1, 60 sec: 39867.7, 300 sec: 43098.2). Total num frames: 180813824. Throughput: 0: 10490.3. Samples: 45279232. 
Policy #0 lag: (min: 15.0, avg: 74.4, max: 271.0) [2024-06-15 12:41:10,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 12:41:12,453][1652475] Updated weights for policy 0, policy_version 88368 (0.0011) [2024-06-15 12:41:13,905][1652475] Updated weights for policy 0, policy_version 88404 (0.0013) [2024-06-15 12:41:15,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 43690.7, 300 sec: 43098.3). Total num frames: 181141504. Throughput: 0: 10137.6. Samples: 45331968. Policy #0 lag: (min: 15.0, avg: 74.4, max: 271.0) [2024-06-15 12:41:15,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 12:41:20,738][1648984] Fps is (10 sec: 32768.0, 60 sec: 38775.5, 300 sec: 43098.3). Total num frames: 181141504. Throughput: 0: 10490.3. Samples: 45366272. Policy #0 lag: (min: 15.0, avg: 74.4, max: 271.0) [2024-06-15 12:41:20,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 12:41:21,788][1652475] Updated weights for policy 0, policy_version 88482 (0.0020) [2024-06-15 12:41:23,648][1652475] Updated weights for policy 0, policy_version 88560 (0.0012) [2024-06-15 12:41:25,738][1648984] Fps is (10 sec: 29491.2, 60 sec: 39867.8, 300 sec: 42765.0). Total num frames: 181436416. Throughput: 0: 10296.9. Samples: 45425664. Policy #0 lag: (min: 15.0, avg: 74.4, max: 271.0) [2024-06-15 12:41:25,738][1648984] Avg episode reward: [(0, '-0.540')] [2024-06-15 12:41:25,924][1652475] Updated weights for policy 0, policy_version 88610 (0.0013) [2024-06-15 12:41:27,056][1652475] Updated weights for policy 0, policy_version 88672 (0.0012) [2024-06-15 12:41:30,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 42052.3, 300 sec: 43098.3). Total num frames: 181665792. Throughput: 0: 10365.2. Samples: 45493248. 
Policy #0 lag: (min: 15.0, avg: 74.4, max: 271.0) [2024-06-15 12:41:30,738][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 12:41:33,995][1652475] Updated weights for policy 0, policy_version 88761 (0.0014) [2024-06-15 12:41:34,803][1651340] Signal inference workers to stop experience collection... (4650 times) [2024-06-15 12:41:34,895][1652475] InferenceWorker_p0-w0: stopping experience collection (4650 times) [2024-06-15 12:41:34,997][1651340] Signal inference workers to resume experience collection... (4650 times) [2024-06-15 12:41:34,998][1652475] InferenceWorker_p0-w0: resuming experience collection (4650 times) [2024-06-15 12:41:35,173][1652475] Updated weights for policy 0, policy_version 88803 (0.0134) [2024-06-15 12:41:35,738][1648984] Fps is (10 sec: 49151.4, 60 sec: 41506.1, 300 sec: 43320.4). Total num frames: 181927936. Throughput: 0: 10706.5. Samples: 45532160. Policy #0 lag: (min: 15.0, avg: 74.4, max: 271.0) [2024-06-15 12:41:35,738][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 12:41:37,664][1652475] Updated weights for policy 0, policy_version 88912 (0.0016) [2024-06-15 12:41:38,717][1652475] Updated weights for policy 0, policy_version 88960 (0.0012) [2024-06-15 12:41:40,738][1648984] Fps is (10 sec: 52428.5, 60 sec: 43690.7, 300 sec: 43098.2). Total num frames: 182190080. Throughput: 0: 10308.3. Samples: 45587968. Policy #0 lag: (min: 15.0, avg: 74.4, max: 271.0) [2024-06-15 12:41:40,740][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 12:41:45,738][1648984] Fps is (10 sec: 36044.8, 60 sec: 42052.2, 300 sec: 42987.2). Total num frames: 182288384. Throughput: 0: 10820.2. Samples: 45662720. 
Policy #0 lag: (min: 15.0, avg: 74.4, max: 271.0) [2024-06-15 12:41:45,738][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 12:41:45,839][1652475] Updated weights for policy 0, policy_version 89024 (0.0014) [2024-06-15 12:41:47,520][1652475] Updated weights for policy 0, policy_version 89078 (0.0013) [2024-06-15 12:41:48,963][1652475] Updated weights for policy 0, policy_version 89136 (0.0012) [2024-06-15 12:41:50,669][1652475] Updated weights for policy 0, policy_version 89211 (0.0019) [2024-06-15 12:41:50,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 43690.6, 300 sec: 43542.6). Total num frames: 182714368. Throughput: 0: 10706.5. Samples: 45694464. Policy #0 lag: (min: 63.0, avg: 200.0, max: 303.0) [2024-06-15 12:41:50,738][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 12:41:55,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 43690.6, 300 sec: 42876.1). Total num frames: 182714368. Throughput: 0: 10672.3. Samples: 45759488. Policy #0 lag: (min: 63.0, avg: 200.0, max: 303.0) [2024-06-15 12:41:55,738][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 12:41:57,740][1652475] Updated weights for policy 0, policy_version 89275 (0.0015) [2024-06-15 12:41:59,009][1652475] Updated weights for policy 0, policy_version 89328 (0.0014) [2024-06-15 12:42:00,633][1652475] Updated weights for policy 0, policy_version 89392 (0.0015) [2024-06-15 12:42:00,738][1648984] Fps is (10 sec: 36044.9, 60 sec: 43144.5, 300 sec: 43653.7). Total num frames: 183074816. Throughput: 0: 11002.3. Samples: 45827072. Policy #0 lag: (min: 63.0, avg: 200.0, max: 303.0) [2024-06-15 12:42:00,738][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 12:42:02,302][1652475] Updated weights for policy 0, policy_version 89456 (0.0013) [2024-06-15 12:42:05,738][1648984] Fps is (10 sec: 52427.9, 60 sec: 43690.5, 300 sec: 43098.2). Total num frames: 183238656. Throughput: 0: 10968.1. Samples: 45859840. 
Policy #0 lag: (min: 63.0, avg: 200.0, max: 303.0) [2024-06-15 12:42:05,739][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 12:42:09,608][1652475] Updated weights for policy 0, policy_version 89536 (0.0034) [2024-06-15 12:42:10,738][1648984] Fps is (10 sec: 36044.8, 60 sec: 43690.6, 300 sec: 43320.4). Total num frames: 183435264. Throughput: 0: 11218.5. Samples: 45930496. Policy #0 lag: (min: 63.0, avg: 200.0, max: 303.0) [2024-06-15 12:42:10,738][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 12:42:11,022][1652475] Updated weights for policy 0, policy_version 89600 (0.0032) [2024-06-15 12:42:12,588][1652475] Updated weights for policy 0, policy_version 89658 (0.0013) [2024-06-15 12:42:13,593][1652475] Updated weights for policy 0, policy_version 89696 (0.0011) [2024-06-15 12:42:15,738][1648984] Fps is (10 sec: 52430.6, 60 sec: 43690.7, 300 sec: 43320.4). Total num frames: 183762944. Throughput: 0: 11150.2. Samples: 45995008. Policy #0 lag: (min: 63.0, avg: 200.0, max: 303.0) [2024-06-15 12:42:15,738][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 12:42:19,684][1651340] Signal inference workers to stop experience collection... (4700 times) [2024-06-15 12:42:19,753][1652475] InferenceWorker_p0-w0: stopping experience collection (4700 times) [2024-06-15 12:42:19,755][1652475] Updated weights for policy 0, policy_version 89749 (0.0023) [2024-06-15 12:42:19,960][1651340] Signal inference workers to resume experience collection... (4700 times) [2024-06-15 12:42:19,961][1652475] InferenceWorker_p0-w0: resuming experience collection (4700 times) [2024-06-15 12:42:20,742][1648984] Fps is (10 sec: 45854.3, 60 sec: 45871.7, 300 sec: 43097.6). Total num frames: 183894016. Throughput: 0: 11171.9. Samples: 46034944. 
Policy #0 lag: (min: 63.0, avg: 200.0, max: 303.0) [2024-06-15 12:42:20,743][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 12:42:20,884][1652475] Updated weights for policy 0, policy_version 89796 (0.0013) [2024-06-15 12:42:21,889][1652475] Updated weights for policy 0, policy_version 89846 (0.0015) [2024-06-15 12:42:23,221][1652475] Updated weights for policy 0, policy_version 89904 (0.0018) [2024-06-15 12:42:25,342][1652475] Updated weights for policy 0, policy_version 89968 (0.0026) [2024-06-15 12:42:25,738][1648984] Fps is (10 sec: 52428.1, 60 sec: 47513.5, 300 sec: 43764.7). Total num frames: 184287232. Throughput: 0: 11457.4. Samples: 46103552. Policy #0 lag: (min: 63.0, avg: 200.0, max: 303.0) [2024-06-15 12:42:25,738][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 12:42:30,738][1648984] Fps is (10 sec: 39336.1, 60 sec: 43690.0, 300 sec: 42764.9). Total num frames: 184287232. Throughput: 0: 11377.6. Samples: 46174720. Policy #0 lag: (min: 63.0, avg: 200.0, max: 303.0) [2024-06-15 12:42:30,739][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 12:42:31,370][1652475] Updated weights for policy 0, policy_version 90006 (0.0012) [2024-06-15 12:42:33,493][1652475] Updated weights for policy 0, policy_version 90102 (0.0015) [2024-06-15 12:42:35,195][1652475] Updated weights for policy 0, policy_version 90160 (0.0014) [2024-06-15 12:42:35,738][1648984] Fps is (10 sec: 39322.0, 60 sec: 45875.3, 300 sec: 43875.8). Total num frames: 184680448. Throughput: 0: 11309.5. Samples: 46203392. Policy #0 lag: (min: 63.0, avg: 200.0, max: 303.0) [2024-06-15 12:42:35,738][1648984] Avg episode reward: [(0, '-0.530')] [2024-06-15 12:42:36,359][1652475] Updated weights for policy 0, policy_version 90195 (0.0014) [2024-06-15 12:42:40,738][1648984] Fps is (10 sec: 52432.9, 60 sec: 43690.6, 300 sec: 43098.2). Total num frames: 184811520. Throughput: 0: 11366.4. Samples: 46270976. 
Policy #0 lag: (min: 63.0, avg: 200.0, max: 303.0) [2024-06-15 12:42:40,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 12:42:40,745][1651340] Saving new best policy, reward=-0.300! [2024-06-15 12:42:42,372][1652475] Updated weights for policy 0, policy_version 90257 (0.0012) [2024-06-15 12:42:44,102][1652475] Updated weights for policy 0, policy_version 90336 (0.0014) [2024-06-15 12:42:45,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 46421.4, 300 sec: 43542.6). Total num frames: 185073664. Throughput: 0: 11275.4. Samples: 46334464. Policy #0 lag: (min: 63.0, avg: 200.0, max: 303.0) [2024-06-15 12:42:45,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 12:42:47,979][1652475] Updated weights for policy 0, policy_version 90400 (0.0013) [2024-06-15 12:42:49,848][1652475] Updated weights for policy 0, policy_version 90480 (0.0013) [2024-06-15 12:42:50,740][1648984] Fps is (10 sec: 52415.2, 60 sec: 43688.8, 300 sec: 43431.1). Total num frames: 185335808. Throughput: 0: 11229.3. Samples: 46365184. Policy #0 lag: (min: 15.0, avg: 107.7, max: 207.0) [2024-06-15 12:42:50,741][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 12:42:50,742][1651340] Saving new best policy, reward=-0.260! [2024-06-15 12:42:55,278][1652475] Updated weights for policy 0, policy_version 90512 (0.0163) [2024-06-15 12:42:55,738][1648984] Fps is (10 sec: 32767.2, 60 sec: 44782.8, 300 sec: 42876.1). Total num frames: 185401344. Throughput: 0: 11161.5. Samples: 46432768. Policy #0 lag: (min: 15.0, avg: 107.7, max: 207.0) [2024-06-15 12:42:55,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 12:42:56,188][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000090560_185466880.pth... 
[2024-06-15 12:42:56,477][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000085504_175112192.pth [2024-06-15 12:42:56,532][1652475] Updated weights for policy 0, policy_version 90563 (0.0014) [2024-06-15 12:43:00,023][1652475] Updated weights for policy 0, policy_version 90658 (0.0133) [2024-06-15 12:43:00,738][1648984] Fps is (10 sec: 39332.2, 60 sec: 44236.8, 300 sec: 43431.5). Total num frames: 185729024. Throughput: 0: 11070.6. Samples: 46493184. Policy #0 lag: (min: 15.0, avg: 107.7, max: 207.0) [2024-06-15 12:43:00,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 12:43:01,298][1651340] Signal inference workers to stop experience collection... (4750 times) [2024-06-15 12:43:01,332][1652475] InferenceWorker_p0-w0: stopping experience collection (4750 times) [2024-06-15 12:43:01,587][1651340] Signal inference workers to resume experience collection... (4750 times) [2024-06-15 12:43:01,588][1652475] InferenceWorker_p0-w0: resuming experience collection (4750 times) [2024-06-15 12:43:02,296][1652475] Updated weights for policy 0, policy_version 90747 (0.0018) [2024-06-15 12:43:05,738][1648984] Fps is (10 sec: 45876.2, 60 sec: 43690.8, 300 sec: 43098.3). Total num frames: 185860096. Throughput: 0: 10753.1. Samples: 46518784. Policy #0 lag: (min: 15.0, avg: 107.7, max: 207.0) [2024-06-15 12:43:05,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 12:43:08,421][1652475] Updated weights for policy 0, policy_version 90813 (0.0012) [2024-06-15 12:43:10,407][1652475] Updated weights for policy 0, policy_version 90880 (0.0012) [2024-06-15 12:43:10,738][1648984] Fps is (10 sec: 39321.3, 60 sec: 44782.9, 300 sec: 43431.5). Total num frames: 186122240. Throughput: 0: 10843.0. Samples: 46591488. 
Policy #0 lag: (min: 15.0, avg: 107.7, max: 207.0) [2024-06-15 12:43:10,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 12:43:13,515][1652475] Updated weights for policy 0, policy_version 90960 (0.0012) [2024-06-15 12:43:14,889][1652475] Updated weights for policy 0, policy_version 91005 (0.0028) [2024-06-15 12:43:15,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 43690.6, 300 sec: 43320.4). Total num frames: 186384384. Throughput: 0: 10456.4. Samples: 46645248. Policy #0 lag: (min: 15.0, avg: 107.7, max: 207.0) [2024-06-15 12:43:15,739][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 12:43:20,738][1648984] Fps is (10 sec: 32767.5, 60 sec: 42601.5, 300 sec: 42876.1). Total num frames: 186449920. Throughput: 0: 10695.1. Samples: 46684672. Policy #0 lag: (min: 15.0, avg: 107.7, max: 207.0) [2024-06-15 12:43:20,739][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 12:43:21,001][1652475] Updated weights for policy 0, policy_version 91056 (0.0014) [2024-06-15 12:43:21,752][1652475] Updated weights for policy 0, policy_version 91088 (0.0012) [2024-06-15 12:43:22,578][1652475] Updated weights for policy 0, policy_version 91129 (0.0014) [2024-06-15 12:43:24,951][1652475] Updated weights for policy 0, policy_version 91171 (0.0013) [2024-06-15 12:43:25,738][1648984] Fps is (10 sec: 39322.0, 60 sec: 41506.2, 300 sec: 43320.4). Total num frames: 186777600. Throughput: 0: 10695.1. Samples: 46752256. Policy #0 lag: (min: 15.0, avg: 107.7, max: 207.0) [2024-06-15 12:43:25,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 12:43:27,079][1652475] Updated weights for policy 0, policy_version 91256 (0.0011) [2024-06-15 12:43:30,738][1648984] Fps is (10 sec: 45876.3, 60 sec: 43691.3, 300 sec: 42987.2). Total num frames: 186908672. Throughput: 0: 10535.8. Samples: 46808576. 
Policy #0 lag: (min: 15.0, avg: 107.7, max: 207.0) [2024-06-15 12:43:30,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 12:43:32,703][1652475] Updated weights for policy 0, policy_version 91328 (0.0013) [2024-06-15 12:43:34,628][1652475] Updated weights for policy 0, policy_version 91388 (0.0023) [2024-06-15 12:43:35,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 41506.1, 300 sec: 43098.3). Total num frames: 187170816. Throughput: 0: 10582.0. Samples: 46841344. Policy #0 lag: (min: 15.0, avg: 107.7, max: 207.0) [2024-06-15 12:43:35,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 12:43:37,167][1652475] Updated weights for policy 0, policy_version 91428 (0.0013) [2024-06-15 12:43:39,692][1652475] Updated weights for policy 0, policy_version 91472 (0.0013) [2024-06-15 12:43:40,738][1648984] Fps is (10 sec: 52428.5, 60 sec: 43690.7, 300 sec: 43098.2). Total num frames: 187432960. Throughput: 0: 10467.6. Samples: 46903808. Policy #0 lag: (min: 15.0, avg: 107.7, max: 207.0) [2024-06-15 12:43:40,740][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 12:43:44,464][1652475] Updated weights for policy 0, policy_version 91526 (0.0058) [2024-06-15 12:43:45,720][1652475] Updated weights for policy 0, policy_version 91581 (0.0017) [2024-06-15 12:43:45,738][1648984] Fps is (10 sec: 36044.8, 60 sec: 40960.0, 300 sec: 42987.2). Total num frames: 187531264. Throughput: 0: 10581.3. Samples: 46969344. Policy #0 lag: (min: 12.0, avg: 77.9, max: 204.0) [2024-06-15 12:43:45,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 12:43:47,176][1652475] Updated weights for policy 0, policy_version 91632 (0.0013) [2024-06-15 12:43:47,896][1652475] Updated weights for policy 0, policy_version 91651 (0.0021) [2024-06-15 12:43:49,322][1652475] Updated weights for policy 0, policy_version 91705 (0.0012) [2024-06-15 12:43:50,738][1648984] Fps is (10 sec: 39321.9, 60 sec: 41508.0, 300 sec: 43098.3). Total num frames: 187826176. Throughput: 0: 10729.3. 
Samples: 47001600. Policy #0 lag: (min: 12.0, avg: 77.9, max: 204.0) [2024-06-15 12:43:50,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 12:43:51,033][1651340] Signal inference workers to stop experience collection... (4800 times) [2024-06-15 12:43:51,082][1652475] InferenceWorker_p0-w0: stopping experience collection (4800 times) [2024-06-15 12:43:51,309][1651340] Signal inference workers to resume experience collection... (4800 times) [2024-06-15 12:43:51,310][1652475] InferenceWorker_p0-w0: resuming experience collection (4800 times) [2024-06-15 12:43:51,800][1652475] Updated weights for policy 0, policy_version 91744 (0.0133) [2024-06-15 12:43:55,738][1648984] Fps is (10 sec: 42598.2, 60 sec: 42598.6, 300 sec: 42653.9). Total num frames: 187957248. Throughput: 0: 10615.5. Samples: 47069184. Policy #0 lag: (min: 12.0, avg: 77.9, max: 204.0) [2024-06-15 12:43:55,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 12:43:56,786][1652475] Updated weights for policy 0, policy_version 91815 (0.0018) [2024-06-15 12:43:58,396][1652475] Updated weights for policy 0, policy_version 91859 (0.0013) [2024-06-15 12:44:00,738][1648984] Fps is (10 sec: 49151.6, 60 sec: 43144.5, 300 sec: 43764.7). Total num frames: 188317696. Throughput: 0: 10774.8. Samples: 47130112. Policy #0 lag: (min: 12.0, avg: 77.9, max: 204.0) [2024-06-15 12:44:00,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 12:44:00,752][1652475] Updated weights for policy 0, policy_version 91957 (0.0014) [2024-06-15 12:44:03,847][1652475] Updated weights for policy 0, policy_version 91987 (0.0012) [2024-06-15 12:44:04,548][1652475] Updated weights for policy 0, policy_version 92029 (0.0023) [2024-06-15 12:44:05,740][1648984] Fps is (10 sec: 52429.0, 60 sec: 43690.7, 300 sec: 43098.3). Total num frames: 188481536. Throughput: 0: 10683.8. Samples: 47165440. 
Policy #0 lag: (min: 12.0, avg: 77.9, max: 204.0) [2024-06-15 12:44:05,741][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 12:44:08,331][1652475] Updated weights for policy 0, policy_version 92082 (0.0020) [2024-06-15 12:44:09,659][1652475] Updated weights for policy 0, policy_version 92112 (0.0011) [2024-06-15 12:44:10,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 43144.5, 300 sec: 43431.5). Total num frames: 188710912. Throughput: 0: 10956.8. Samples: 47245312. Policy #0 lag: (min: 12.0, avg: 77.9, max: 204.0) [2024-06-15 12:44:10,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 12:44:11,994][1652475] Updated weights for policy 0, policy_version 92192 (0.0014) [2024-06-15 12:44:15,567][1652475] Updated weights for policy 0, policy_version 92244 (0.0013) [2024-06-15 12:44:15,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 42052.3, 300 sec: 43098.2). Total num frames: 188907520. Throughput: 0: 10968.2. Samples: 47302144. Policy #0 lag: (min: 12.0, avg: 77.9, max: 204.0) [2024-06-15 12:44:15,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 12:44:19,592][1652475] Updated weights for policy 0, policy_version 92304 (0.0013) [2024-06-15 12:44:20,738][1648984] Fps is (10 sec: 42598.7, 60 sec: 44783.1, 300 sec: 43098.3). Total num frames: 189136896. Throughput: 0: 11013.7. Samples: 47336960. Policy #0 lag: (min: 12.0, avg: 77.9, max: 204.0) [2024-06-15 12:44:20,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 12:44:20,747][1652475] Updated weights for policy 0, policy_version 92352 (0.0025) [2024-06-15 12:44:22,769][1652475] Updated weights for policy 0, policy_version 92412 (0.0012) [2024-06-15 12:44:25,740][1648984] Fps is (10 sec: 49142.1, 60 sec: 43689.1, 300 sec: 43320.1). Total num frames: 189399040. Throughput: 0: 10910.8. Samples: 47394816. 
Policy #0 lag: (min: 12.0, avg: 77.9, max: 204.0) [2024-06-15 12:44:25,740][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 12:44:27,009][1652475] Updated weights for policy 0, policy_version 92482 (0.0014) [2024-06-15 12:44:28,310][1652475] Updated weights for policy 0, policy_version 92543 (0.0012) [2024-06-15 12:44:30,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 43690.7, 300 sec: 42653.9). Total num frames: 189530112. Throughput: 0: 11104.7. Samples: 47469056. Policy #0 lag: (min: 12.0, avg: 77.9, max: 204.0) [2024-06-15 12:44:30,740][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 12:44:32,739][1652475] Updated weights for policy 0, policy_version 92603 (0.0012) [2024-06-15 12:44:35,386][1652475] Updated weights for policy 0, policy_version 92672 (0.0013) [2024-06-15 12:44:35,738][1648984] Fps is (10 sec: 42607.2, 60 sec: 44236.8, 300 sec: 43653.6). Total num frames: 189825024. Throughput: 0: 11081.9. Samples: 47500288. Policy #0 lag: (min: 12.0, avg: 77.9, max: 204.0) [2024-06-15 12:44:35,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 12:44:35,815][1651340] Signal inference workers to stop experience collection... (4850 times) [2024-06-15 12:44:35,861][1652475] InferenceWorker_p0-w0: stopping experience collection (4850 times) [2024-06-15 12:44:36,023][1651340] Signal inference workers to resume experience collection... (4850 times) [2024-06-15 12:44:36,023][1652475] InferenceWorker_p0-w0: resuming experience collection (4850 times) [2024-06-15 12:44:36,595][1652475] Updated weights for policy 0, policy_version 92731 (0.0108) [2024-06-15 12:44:40,609][1652475] Updated weights for policy 0, policy_version 92793 (0.0014) [2024-06-15 12:44:40,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 43690.7, 300 sec: 43098.3). Total num frames: 190054400. Throughput: 0: 10968.2. Samples: 47562752. 
Policy #0 lag: (min: 12.0, avg: 77.9, max: 204.0) [2024-06-15 12:44:40,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 12:44:45,322][1652475] Updated weights for policy 0, policy_version 92858 (0.0014) [2024-06-15 12:44:45,738][1648984] Fps is (10 sec: 36044.5, 60 sec: 44236.7, 300 sec: 43098.2). Total num frames: 190185472. Throughput: 0: 10956.8. Samples: 47623168. Policy #0 lag: (min: 15.0, avg: 101.0, max: 271.0) [2024-06-15 12:44:45,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 12:44:47,449][1652475] Updated weights for policy 0, policy_version 92916 (0.0046) [2024-06-15 12:44:48,686][1652475] Updated weights for policy 0, policy_version 92982 (0.0013) [2024-06-15 12:44:50,738][1648984] Fps is (10 sec: 39321.2, 60 sec: 43690.6, 300 sec: 43098.2). Total num frames: 190447616. Throughput: 0: 10934.0. Samples: 47657472. Policy #0 lag: (min: 15.0, avg: 101.0, max: 271.0) [2024-06-15 12:44:50,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 12:44:51,588][1652475] Updated weights for policy 0, policy_version 93012 (0.0013) [2024-06-15 12:44:55,738][1648984] Fps is (10 sec: 42598.6, 60 sec: 44236.8, 300 sec: 42765.0). Total num frames: 190611456. Throughput: 0: 10740.6. Samples: 47728640. Policy #0 lag: (min: 15.0, avg: 101.0, max: 271.0) [2024-06-15 12:44:55,740][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 12:44:55,818][1652475] Updated weights for policy 0, policy_version 93088 (0.0014) [2024-06-15 12:44:56,155][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000093104_190676992.pth... 
[2024-06-15 12:44:56,228][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000087968_180158464.pth [2024-06-15 12:44:57,687][1652475] Updated weights for policy 0, policy_version 93122 (0.0013) [2024-06-15 12:44:58,901][1652475] Updated weights for policy 0, policy_version 93183 (0.0013) [2024-06-15 12:45:00,738][1648984] Fps is (10 sec: 49152.0, 60 sec: 43690.7, 300 sec: 43653.6). Total num frames: 190939136. Throughput: 0: 10979.6. Samples: 47796224. Policy #0 lag: (min: 15.0, avg: 101.0, max: 271.0) [2024-06-15 12:45:00,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 12:45:00,858][1652475] Updated weights for policy 0, policy_version 93245 (0.0013) [2024-06-15 12:45:03,088][1652475] Updated weights for policy 0, policy_version 93299 (0.0014) [2024-06-15 12:45:05,738][1648984] Fps is (10 sec: 49152.1, 60 sec: 43690.7, 300 sec: 42987.2). Total num frames: 191102976. Throughput: 0: 10922.7. Samples: 47828480. Policy #0 lag: (min: 15.0, avg: 101.0, max: 271.0) [2024-06-15 12:45:05,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 12:45:08,500][1652475] Updated weights for policy 0, policy_version 93360 (0.0013) [2024-06-15 12:45:10,132][1652475] Updated weights for policy 0, policy_version 93433 (0.0017) [2024-06-15 12:45:10,738][1648984] Fps is (10 sec: 42597.4, 60 sec: 44236.6, 300 sec: 43542.5). Total num frames: 191365120. Throughput: 0: 11014.1. Samples: 47890432. Policy #0 lag: (min: 15.0, avg: 101.0, max: 271.0) [2024-06-15 12:45:10,739][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 12:45:13,897][1652475] Updated weights for policy 0, policy_version 93500 (0.0012) [2024-06-15 12:45:15,116][1652475] Updated weights for policy 0, policy_version 93552 (0.0012) [2024-06-15 12:45:15,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 45329.1, 300 sec: 43431.5). Total num frames: 191627264. Throughput: 0: 10752.0. Samples: 47952896. 
Policy #0 lag: (min: 15.0, avg: 101.0, max: 271.0) [2024-06-15 12:45:15,738][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 12:45:19,776][1652475] Updated weights for policy 0, policy_version 93602 (0.0013) [2024-06-15 12:45:20,738][1648984] Fps is (10 sec: 42599.2, 60 sec: 44236.7, 300 sec: 43209.3). Total num frames: 191791104. Throughput: 0: 11104.7. Samples: 48000000. Policy #0 lag: (min: 15.0, avg: 101.0, max: 271.0) [2024-06-15 12:45:20,738][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 12:45:21,107][1651340] Signal inference workers to stop experience collection... (4900 times) [2024-06-15 12:45:21,234][1652475] InferenceWorker_p0-w0: stopping experience collection (4900 times) [2024-06-15 12:45:21,336][1651340] Signal inference workers to resume experience collection... (4900 times) [2024-06-15 12:45:21,337][1652475] InferenceWorker_p0-w0: resuming experience collection (4900 times) [2024-06-15 12:45:21,640][1652475] Updated weights for policy 0, policy_version 93696 (0.0093) [2024-06-15 12:45:25,241][1652475] Updated weights for policy 0, policy_version 93763 (0.0098) [2024-06-15 12:45:25,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 44238.3, 300 sec: 43764.7). Total num frames: 192053248. Throughput: 0: 11138.8. Samples: 48064000. Policy #0 lag: (min: 15.0, avg: 101.0, max: 271.0) [2024-06-15 12:45:25,738][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 12:45:30,738][1648984] Fps is (10 sec: 36045.0, 60 sec: 43690.6, 300 sec: 43098.2). Total num frames: 192151552. Throughput: 0: 11298.1. Samples: 48131584. Policy #0 lag: (min: 15.0, avg: 101.0, max: 271.0) [2024-06-15 12:45:30,738][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 12:45:32,197][1652475] Updated weights for policy 0, policy_version 93872 (0.0014) [2024-06-15 12:45:34,184][1652475] Updated weights for policy 0, policy_version 93943 (0.0014) [2024-06-15 12:45:35,738][1648984] Fps is (10 sec: 36044.9, 60 sec: 43144.5, 300 sec: 43542.6). 
Total num frames: 192413696. Throughput: 0: 11161.6. Samples: 48159744. Policy #0 lag: (min: 15.0, avg: 101.0, max: 271.0) [2024-06-15 12:45:35,738][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 12:45:36,977][1652475] Updated weights for policy 0, policy_version 93985 (0.0012) [2024-06-15 12:45:38,876][1652475] Updated weights for policy 0, policy_version 94073 (0.0013) [2024-06-15 12:45:40,738][1648984] Fps is (10 sec: 52428.3, 60 sec: 43690.6, 300 sec: 43764.7). Total num frames: 192675840. Throughput: 0: 10888.5. Samples: 48218624. Policy #0 lag: (min: 15.0, avg: 101.0, max: 271.0) [2024-06-15 12:45:40,738][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 12:45:44,208][1652475] Updated weights for policy 0, policy_version 94112 (0.0012) [2024-06-15 12:45:45,738][1648984] Fps is (10 sec: 42598.0, 60 sec: 44236.8, 300 sec: 43209.3). Total num frames: 192839680. Throughput: 0: 10945.4. Samples: 48288768. Policy #0 lag: (min: 11.0, avg: 85.3, max: 267.0) [2024-06-15 12:45:45,739][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 12:45:46,617][1652475] Updated weights for policy 0, policy_version 94207 (0.0019) [2024-06-15 12:45:49,342][1652475] Updated weights for policy 0, policy_version 94265 (0.0014) [2024-06-15 12:45:50,738][1648984] Fps is (10 sec: 42598.9, 60 sec: 44236.8, 300 sec: 44098.0). Total num frames: 193101824. Throughput: 0: 10934.1. Samples: 48320512. Policy #0 lag: (min: 11.0, avg: 85.3, max: 267.0) [2024-06-15 12:45:50,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 12:45:51,113][1652475] Updated weights for policy 0, policy_version 94304 (0.0012) [2024-06-15 12:45:55,738][1648984] Fps is (10 sec: 36045.0, 60 sec: 43144.5, 300 sec: 43098.2). Total num frames: 193200128. Throughput: 0: 10888.6. Samples: 48380416. 
Policy #0 lag: (min: 11.0, avg: 85.3, max: 267.0) [2024-06-15 12:45:55,738][1648984] Avg episode reward: [(0, '-0.530')] [2024-06-15 12:45:56,644][1652475] Updated weights for policy 0, policy_version 94337 (0.0051) [2024-06-15 12:45:57,905][1652475] Updated weights for policy 0, policy_version 94400 (0.0099) [2024-06-15 12:45:59,106][1652475] Updated weights for policy 0, policy_version 94452 (0.0012) [2024-06-15 12:46:00,480][1652475] Updated weights for policy 0, policy_version 94484 (0.0013) [2024-06-15 12:46:00,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 43144.6, 300 sec: 43764.7). Total num frames: 193527808. Throughput: 0: 10854.4. Samples: 48441344. Policy #0 lag: (min: 11.0, avg: 85.3, max: 267.0) [2024-06-15 12:46:00,738][1648984] Avg episode reward: [(0, '-0.530')] [2024-06-15 12:46:02,967][1652475] Updated weights for policy 0, policy_version 94530 (0.0016) [2024-06-15 12:46:04,426][1652475] Updated weights for policy 0, policy_version 94592 (0.0027) [2024-06-15 12:46:05,738][1648984] Fps is (10 sec: 52429.1, 60 sec: 43690.7, 300 sec: 43764.7). Total num frames: 193724416. Throughput: 0: 10615.5. Samples: 48477696. Policy #0 lag: (min: 11.0, avg: 85.3, max: 267.0) [2024-06-15 12:46:05,738][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 12:46:09,016][1651340] Signal inference workers to stop experience collection... (4950 times) [2024-06-15 12:46:09,084][1652475] InferenceWorker_p0-w0: stopping experience collection (4950 times) [2024-06-15 12:46:09,254][1651340] Signal inference workers to resume experience collection... (4950 times) [2024-06-15 12:46:09,255][1652475] InferenceWorker_p0-w0: resuming experience collection (4950 times) [2024-06-15 12:46:09,622][1652475] Updated weights for policy 0, policy_version 94656 (0.0012) [2024-06-15 12:46:10,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 42598.6, 300 sec: 43320.4). Total num frames: 193921024. Throughput: 0: 10683.7. Samples: 48544768. 
Policy #0 lag: (min: 11.0, avg: 85.3, max: 267.0) [2024-06-15 12:46:10,738][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 12:46:11,029][1652475] Updated weights for policy 0, policy_version 94714 (0.0017) [2024-06-15 12:46:12,401][1652475] Updated weights for policy 0, policy_version 94776 (0.0015) [2024-06-15 12:46:15,650][1652475] Updated weights for policy 0, policy_version 94838 (0.0013) [2024-06-15 12:46:15,738][1648984] Fps is (10 sec: 49152.2, 60 sec: 43144.6, 300 sec: 44320.1). Total num frames: 194215936. Throughput: 0: 10649.6. Samples: 48610816. Policy #0 lag: (min: 11.0, avg: 85.3, max: 267.0) [2024-06-15 12:46:15,738][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 12:46:20,479][1652475] Updated weights for policy 0, policy_version 94867 (0.0037) [2024-06-15 12:46:20,738][1648984] Fps is (10 sec: 36044.9, 60 sec: 41506.2, 300 sec: 43542.6). Total num frames: 194281472. Throughput: 0: 10786.1. Samples: 48645120. Policy #0 lag: (min: 11.0, avg: 85.3, max: 267.0) [2024-06-15 12:46:20,738][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 12:46:22,932][1652475] Updated weights for policy 0, policy_version 94966 (0.0146) [2024-06-15 12:46:24,810][1652475] Updated weights for policy 0, policy_version 95033 (0.0013) [2024-06-15 12:46:25,738][1648984] Fps is (10 sec: 42598.1, 60 sec: 43144.5, 300 sec: 43986.9). Total num frames: 194641920. Throughput: 0: 10808.9. Samples: 48705024. Policy #0 lag: (min: 11.0, avg: 85.3, max: 267.0) [2024-06-15 12:46:25,738][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 12:46:27,920][1652475] Updated weights for policy 0, policy_version 95088 (0.0012) [2024-06-15 12:46:30,738][1648984] Fps is (10 sec: 49151.4, 60 sec: 43690.6, 300 sec: 43542.6). Total num frames: 194772992. Throughput: 0: 10797.5. Samples: 48774656. 
Policy #0 lag: (min: 11.0, avg: 85.3, max: 267.0) [2024-06-15 12:46:30,738][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 12:46:32,353][1652475] Updated weights for policy 0, policy_version 95152 (0.0015) [2024-06-15 12:46:34,356][1652475] Updated weights for policy 0, policy_version 95200 (0.0022) [2024-06-15 12:46:35,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 43690.6, 300 sec: 43542.6). Total num frames: 195035136. Throughput: 0: 10979.5. Samples: 48814592. Policy #0 lag: (min: 11.0, avg: 85.3, max: 267.0) [2024-06-15 12:46:35,738][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 12:46:36,875][1652475] Updated weights for policy 0, policy_version 95289 (0.0018) [2024-06-15 12:46:39,288][1652475] Updated weights for policy 0, policy_version 95319 (0.0012) [2024-06-15 12:46:40,738][1648984] Fps is (10 sec: 52429.2, 60 sec: 43690.7, 300 sec: 44098.0). Total num frames: 195297280. Throughput: 0: 11013.7. Samples: 48876032. Policy #0 lag: (min: 11.0, avg: 85.3, max: 267.0) [2024-06-15 12:46:40,738][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 12:46:43,025][1652475] Updated weights for policy 0, policy_version 95363 (0.0011) [2024-06-15 12:46:45,721][1652475] Updated weights for policy 0, policy_version 95440 (0.0013) [2024-06-15 12:46:45,740][1648984] Fps is (10 sec: 42598.1, 60 sec: 43690.7, 300 sec: 43209.3). Total num frames: 195461120. Throughput: 0: 11286.7. Samples: 48949248. Policy #0 lag: (min: 5.0, avg: 95.4, max: 261.0) [2024-06-15 12:46:45,741][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 12:46:47,397][1652475] Updated weights for policy 0, policy_version 95505 (0.0134) [2024-06-15 12:46:50,746][1648984] Fps is (10 sec: 42562.0, 60 sec: 43684.4, 300 sec: 44096.7). Total num frames: 195723264. Throughput: 0: 11000.2. Samples: 48972800. 
Policy #0 lag: (min: 5.0, avg: 95.4, max: 261.0) [2024-06-15 12:46:50,747][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 12:46:51,064][1652475] Updated weights for policy 0, policy_version 95584 (0.0062) [2024-06-15 12:46:54,634][1651340] Signal inference workers to stop experience collection... (5000 times) [2024-06-15 12:46:54,696][1652475] InferenceWorker_p0-w0: stopping experience collection (5000 times) [2024-06-15 12:46:54,835][1651340] Signal inference workers to resume experience collection... (5000 times) [2024-06-15 12:46:54,836][1652475] InferenceWorker_p0-w0: resuming experience collection (5000 times) [2024-06-15 12:46:55,714][1652475] Updated weights for policy 0, policy_version 95675 (0.0015) [2024-06-15 12:46:55,738][1648984] Fps is (10 sec: 45873.8, 60 sec: 45328.8, 300 sec: 43542.5). Total num frames: 195919872. Throughput: 0: 11207.0. Samples: 49049088. Policy #0 lag: (min: 5.0, avg: 95.4, max: 261.0) [2024-06-15 12:46:55,739][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 12:46:55,801][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000095680_195952640.pth... [2024-06-15 12:46:55,889][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000090560_185466880.pth [2024-06-15 12:46:57,928][1652475] Updated weights for policy 0, policy_version 95717 (0.0018) [2024-06-15 12:47:00,014][1652475] Updated weights for policy 0, policy_version 95804 (0.0012) [2024-06-15 12:47:00,738][1648984] Fps is (10 sec: 49193.9, 60 sec: 44782.9, 300 sec: 43986.9). Total num frames: 196214784. Throughput: 0: 11013.7. Samples: 49106432. Policy #0 lag: (min: 5.0, avg: 95.4, max: 261.0) [2024-06-15 12:47:00,738][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 12:47:03,129][1652475] Updated weights for policy 0, policy_version 95868 (0.0016) [2024-06-15 12:47:05,738][1648984] Fps is (10 sec: 42600.2, 60 sec: 43690.7, 300 sec: 43764.7). Total num frames: 196345856. 
Throughput: 0: 11127.5. Samples: 49145856. Policy #0 lag: (min: 5.0, avg: 95.4, max: 261.0) [2024-06-15 12:47:05,738][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 12:47:09,008][1652475] Updated weights for policy 0, policy_version 95938 (0.0012) [2024-06-15 12:47:10,739][1648984] Fps is (10 sec: 39317.4, 60 sec: 44782.1, 300 sec: 43542.4). Total num frames: 196608000. Throughput: 0: 11377.5. Samples: 49217024. Policy #0 lag: (min: 5.0, avg: 95.4, max: 261.0) [2024-06-15 12:47:10,740][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 12:47:10,784][1652475] Updated weights for policy 0, policy_version 96002 (0.0012) [2024-06-15 12:47:12,139][1652475] Updated weights for policy 0, policy_version 96054 (0.0012) [2024-06-15 12:47:14,250][1652475] Updated weights for policy 0, policy_version 96096 (0.0056) [2024-06-15 12:47:15,738][1648984] Fps is (10 sec: 52428.5, 60 sec: 44236.7, 300 sec: 43987.6). Total num frames: 196870144. Throughput: 0: 11150.2. Samples: 49276416. Policy #0 lag: (min: 5.0, avg: 95.4, max: 261.0) [2024-06-15 12:47:15,738][1648984] Avg episode reward: [(0, '-0.530')] [2024-06-15 12:47:18,434][1652475] Updated weights for policy 0, policy_version 96145 (0.0015) [2024-06-15 12:47:20,738][1648984] Fps is (10 sec: 39325.5, 60 sec: 45329.0, 300 sec: 43098.2). Total num frames: 197001216. Throughput: 0: 11070.6. Samples: 49312768. Policy #0 lag: (min: 5.0, avg: 95.4, max: 261.0) [2024-06-15 12:47:20,739][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 12:47:21,715][1652475] Updated weights for policy 0, policy_version 96208 (0.0013) [2024-06-15 12:47:24,247][1652475] Updated weights for policy 0, policy_version 96272 (0.0016) [2024-06-15 12:47:25,746][1648984] Fps is (10 sec: 39302.3, 60 sec: 43687.1, 300 sec: 43986.3). Total num frames: 197263360. Throughput: 0: 10932.8. Samples: 49368064. 
Policy #0 lag: (min: 5.0, avg: 95.4, max: 261.0) [2024-06-15 12:47:25,752][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 12:47:26,382][1652475] Updated weights for policy 0, policy_version 96338 (0.0014) [2024-06-15 12:47:30,737][1648984] Fps is (10 sec: 39322.6, 60 sec: 43690.8, 300 sec: 43098.3). Total num frames: 197394432. Throughput: 0: 10820.3. Samples: 49436160. Policy #0 lag: (min: 5.0, avg: 95.4, max: 261.0) [2024-06-15 12:47:30,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 12:47:32,097][1652475] Updated weights for policy 0, policy_version 96400 (0.0013) [2024-06-15 12:47:34,490][1652475] Updated weights for policy 0, policy_version 96480 (0.0146) [2024-06-15 12:47:35,738][1648984] Fps is (10 sec: 39340.9, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 197656576. Throughput: 0: 11061.3. Samples: 49470464. Policy #0 lag: (min: 5.0, avg: 95.4, max: 261.0) [2024-06-15 12:47:35,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 12:47:36,300][1652475] Updated weights for policy 0, policy_version 96544 (0.0017) [2024-06-15 12:47:38,225][1652475] Updated weights for policy 0, policy_version 96608 (0.0115) [2024-06-15 12:47:38,352][1651340] Signal inference workers to stop experience collection... (5050 times) [2024-06-15 12:47:38,397][1652475] InferenceWorker_p0-w0: stopping experience collection (5050 times) [2024-06-15 12:47:38,561][1651340] Signal inference workers to resume experience collection... (5050 times) [2024-06-15 12:47:38,562][1652475] InferenceWorker_p0-w0: resuming experience collection (5050 times) [2024-06-15 12:47:40,738][1648984] Fps is (10 sec: 52428.0, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 197918720. Throughput: 0: 10592.8. Samples: 49525760. 
Policy #0 lag: (min: 5.0, avg: 95.4, max: 261.0) [2024-06-15 12:47:40,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 12:47:44,452][1652475] Updated weights for policy 0, policy_version 96677 (0.0012) [2024-06-15 12:47:45,738][1648984] Fps is (10 sec: 42598.2, 60 sec: 43690.7, 300 sec: 43209.7). Total num frames: 198082560. Throughput: 0: 11013.7. Samples: 49602048. Policy #0 lag: (min: 1.0, avg: 68.5, max: 257.0) [2024-06-15 12:47:45,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 12:47:45,890][1652475] Updated weights for policy 0, policy_version 96723 (0.0012) [2024-06-15 12:47:47,521][1652475] Updated weights for policy 0, policy_version 96786 (0.0014) [2024-06-15 12:47:49,404][1652475] Updated weights for policy 0, policy_version 96836 (0.0013) [2024-06-15 12:47:50,680][1652475] Updated weights for policy 0, policy_version 96896 (0.0138) [2024-06-15 12:47:50,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 45335.5, 300 sec: 44209.1). Total num frames: 198443008. Throughput: 0: 10774.7. Samples: 49630720. Policy #0 lag: (min: 1.0, avg: 68.5, max: 257.0) [2024-06-15 12:47:50,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 12:47:55,746][1648984] Fps is (10 sec: 39287.9, 60 sec: 42592.5, 300 sec: 43208.1). Total num frames: 198475776. Throughput: 0: 10829.8. Samples: 49704448. Policy #0 lag: (min: 1.0, avg: 68.5, max: 257.0) [2024-06-15 12:47:55,747][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 12:47:56,612][1652475] Updated weights for policy 0, policy_version 96953 (0.0013) [2024-06-15 12:47:58,507][1652475] Updated weights for policy 0, policy_version 97022 (0.0016) [2024-06-15 12:48:00,041][1652475] Updated weights for policy 0, policy_version 97083 (0.0131) [2024-06-15 12:48:00,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 43690.7, 300 sec: 43986.9). Total num frames: 198836224. Throughput: 0: 10797.5. Samples: 49762304. 
Policy #0 lag: (min: 1.0, avg: 68.5, max: 257.0) [2024-06-15 12:48:00,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 12:48:02,369][1652475] Updated weights for policy 0, policy_version 97125 (0.0014) [2024-06-15 12:48:05,738][1648984] Fps is (10 sec: 49194.7, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 198967296. Throughput: 0: 10706.5. Samples: 49794560. Policy #0 lag: (min: 1.0, avg: 68.5, max: 257.0) [2024-06-15 12:48:05,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 12:48:07,937][1652475] Updated weights for policy 0, policy_version 97185 (0.0038) [2024-06-15 12:48:09,834][1652475] Updated weights for policy 0, policy_version 97249 (0.0014) [2024-06-15 12:48:10,738][1648984] Fps is (10 sec: 39321.9, 60 sec: 43691.5, 300 sec: 43542.6). Total num frames: 199229440. Throughput: 0: 11071.8. Samples: 49866240. Policy #0 lag: (min: 1.0, avg: 68.5, max: 257.0) [2024-06-15 12:48:10,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 12:48:11,890][1652475] Updated weights for policy 0, policy_version 97344 (0.0035) [2024-06-15 12:48:15,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 42052.3, 300 sec: 43875.8). Total num frames: 199393280. Throughput: 0: 10865.8. Samples: 49925120. Policy #0 lag: (min: 1.0, avg: 68.5, max: 257.0) [2024-06-15 12:48:15,743][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 12:48:19,289][1652475] Updated weights for policy 0, policy_version 97411 (0.0021) [2024-06-15 12:48:20,519][1652475] Updated weights for policy 0, policy_version 97466 (0.0015) [2024-06-15 12:48:20,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 43690.8, 300 sec: 43542.6). Total num frames: 199622656. Throughput: 0: 10729.3. Samples: 49953280. 
Policy #0 lag: (min: 1.0, avg: 68.5, max: 257.0) [2024-06-15 12:48:20,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 12:48:24,406][1652475] Updated weights for policy 0, policy_version 97552 (0.0016) [2024-06-15 12:48:25,738][1648984] Fps is (10 sec: 49151.6, 60 sec: 43694.2, 300 sec: 43986.9). Total num frames: 199884800. Throughput: 0: 10956.8. Samples: 50018816. Policy #0 lag: (min: 1.0, avg: 68.5, max: 257.0) [2024-06-15 12:48:25,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 12:48:27,420][1652475] Updated weights for policy 0, policy_version 97616 (0.0022) [2024-06-15 12:48:27,982][1651340] Signal inference workers to stop experience collection... (5100 times) [2024-06-15 12:48:28,030][1652475] InferenceWorker_p0-w0: stopping experience collection (5100 times) [2024-06-15 12:48:28,202][1651340] Signal inference workers to resume experience collection... (5100 times) [2024-06-15 12:48:28,203][1652475] InferenceWorker_p0-w0: resuming experience collection (5100 times) [2024-06-15 12:48:30,738][1648984] Fps is (10 sec: 39321.2, 60 sec: 43690.5, 300 sec: 43542.5). Total num frames: 200015872. Throughput: 0: 10786.1. Samples: 50087424. Policy #0 lag: (min: 1.0, avg: 68.5, max: 257.0) [2024-06-15 12:48:30,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 12:48:31,529][1652475] Updated weights for policy 0, policy_version 97697 (0.0013) [2024-06-15 12:48:34,089][1652475] Updated weights for policy 0, policy_version 97760 (0.0093) [2024-06-15 12:48:35,740][1648984] Fps is (10 sec: 39321.6, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 200278016. Throughput: 0: 10865.8. Samples: 50119680. 
Policy #0 lag: (min: 1.0, avg: 68.5, max: 257.0) [2024-06-15 12:48:35,741][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 12:48:36,329][1652475] Updated weights for policy 0, policy_version 97808 (0.0092) [2024-06-15 12:48:38,945][1652475] Updated weights for policy 0, policy_version 97859 (0.0013) [2024-06-15 12:48:40,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 43690.6, 300 sec: 44097.9). Total num frames: 200540160. Throughput: 0: 10731.3. Samples: 50187264. Policy #0 lag: (min: 1.0, avg: 68.5, max: 257.0) [2024-06-15 12:48:40,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 12:48:42,722][1652475] Updated weights for policy 0, policy_version 97936 (0.0019) [2024-06-15 12:48:45,455][1652475] Updated weights for policy 0, policy_version 98000 (0.0013) [2024-06-15 12:48:45,738][1648984] Fps is (10 sec: 42598.7, 60 sec: 43690.7, 300 sec: 43653.6). Total num frames: 200704000. Throughput: 0: 10854.4. Samples: 50250752. Policy #0 lag: (min: 31.0, avg: 133.7, max: 287.0) [2024-06-15 12:48:45,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 12:48:48,255][1652475] Updated weights for policy 0, policy_version 98064 (0.0033) [2024-06-15 12:48:49,492][1652475] Updated weights for policy 0, policy_version 98110 (0.0021) [2024-06-15 12:48:50,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 41506.1, 300 sec: 43986.9). Total num frames: 200933376. Throughput: 0: 10911.3. Samples: 50285568. Policy #0 lag: (min: 31.0, avg: 133.7, max: 287.0) [2024-06-15 12:48:50,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 12:48:51,972][1652475] Updated weights for policy 0, policy_version 98172 (0.0026) [2024-06-15 12:48:55,444][1652475] Updated weights for policy 0, policy_version 98240 (0.0014) [2024-06-15 12:48:55,738][1648984] Fps is (10 sec: 49149.9, 60 sec: 45335.3, 300 sec: 43653.6). Total num frames: 201195520. Throughput: 0: 10865.7. Samples: 50355200. 
Policy #0 lag: (min: 31.0, avg: 133.7, max: 287.0) [2024-06-15 12:48:55,739][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 12:48:55,743][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000098240_201195520.pth... [2024-06-15 12:48:55,877][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000093104_190676992.pth [2024-06-15 12:48:58,011][1652475] Updated weights for policy 0, policy_version 98298 (0.0015) [2024-06-15 12:49:00,614][1652475] Updated weights for policy 0, policy_version 98368 (0.0014) [2024-06-15 12:49:00,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 43690.7, 300 sec: 43986.9). Total num frames: 201457664. Throughput: 0: 10899.9. Samples: 50415616. Policy #0 lag: (min: 31.0, avg: 133.7, max: 287.0) [2024-06-15 12:49:00,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 12:49:05,738][1648984] Fps is (10 sec: 39323.0, 60 sec: 43690.6, 300 sec: 43653.6). Total num frames: 201588736. Throughput: 0: 11104.7. Samples: 50452992. Policy #0 lag: (min: 31.0, avg: 133.7, max: 287.0) [2024-06-15 12:49:05,739][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 12:49:06,756][1652475] Updated weights for policy 0, policy_version 98464 (0.0019) [2024-06-15 12:49:08,646][1652475] Updated weights for policy 0, policy_version 98497 (0.0017) [2024-06-15 12:49:10,738][1648984] Fps is (10 sec: 39321.0, 60 sec: 43690.5, 300 sec: 43875.8). Total num frames: 201850880. Throughput: 0: 11116.1. Samples: 50519040. Policy #0 lag: (min: 31.0, avg: 133.7, max: 287.0) [2024-06-15 12:49:10,739][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 12:49:11,004][1652475] Updated weights for policy 0, policy_version 98576 (0.0014) [2024-06-15 12:49:12,223][1652475] Updated weights for policy 0, policy_version 98624 (0.0013) [2024-06-15 12:49:15,738][1648984] Fps is (10 sec: 45875.6, 60 sec: 44236.8, 300 sec: 43764.7). Total num frames: 202047488. Throughput: 0: 11173.0. 
Samples: 50590208. Policy #0 lag: (min: 31.0, avg: 133.7, max: 287.0) [2024-06-15 12:49:15,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 12:49:15,983][1652475] Updated weights for policy 0, policy_version 98681 (0.0015) [2024-06-15 12:49:17,920][1651340] Signal inference workers to stop experience collection... (5150 times) [2024-06-15 12:49:17,994][1652475] InferenceWorker_p0-w0: stopping experience collection (5150 times) [2024-06-15 12:49:18,342][1651340] Signal inference workers to resume experience collection... (5150 times) [2024-06-15 12:49:18,344][1652475] InferenceWorker_p0-w0: resuming experience collection (5150 times) [2024-06-15 12:49:18,561][1652475] Updated weights for policy 0, policy_version 98723 (0.0012) [2024-06-15 12:49:20,175][1652475] Updated weights for policy 0, policy_version 98753 (0.0013) [2024-06-15 12:49:20,738][1648984] Fps is (10 sec: 45875.9, 60 sec: 44782.9, 300 sec: 43765.0). Total num frames: 202309632. Throughput: 0: 11195.7. Samples: 50623488. Policy #0 lag: (min: 31.0, avg: 133.7, max: 287.0) [2024-06-15 12:49:20,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 12:49:21,174][1652475] Updated weights for policy 0, policy_version 98816 (0.0012) [2024-06-15 12:49:23,508][1652475] Updated weights for policy 0, policy_version 98878 (0.0013) [2024-06-15 12:49:25,738][1648984] Fps is (10 sec: 45874.8, 60 sec: 43690.7, 300 sec: 43986.9). Total num frames: 202506240. Throughput: 0: 11127.5. Samples: 50688000. Policy #0 lag: (min: 31.0, avg: 133.7, max: 287.0) [2024-06-15 12:49:25,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 12:49:27,683][1652475] Updated weights for policy 0, policy_version 98935 (0.0012) [2024-06-15 12:49:29,092][1652475] Updated weights for policy 0, policy_version 98976 (0.0012) [2024-06-15 12:49:30,738][1648984] Fps is (10 sec: 45874.7, 60 sec: 45875.2, 300 sec: 43875.8). Total num frames: 202768384. Throughput: 0: 11252.6. Samples: 50757120. 
Policy #0 lag: (min: 31.0, avg: 133.7, max: 287.0) [2024-06-15 12:49:30,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 12:49:33,979][1652475] Updated weights for policy 0, policy_version 99040 (0.0032) [2024-06-15 12:49:35,738][1648984] Fps is (10 sec: 45875.6, 60 sec: 44783.0, 300 sec: 43764.7). Total num frames: 202964992. Throughput: 0: 11264.0. Samples: 50792448. Policy #0 lag: (min: 31.0, avg: 133.7, max: 287.0) [2024-06-15 12:49:35,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 12:49:36,099][1652475] Updated weights for policy 0, policy_version 99127 (0.0091) [2024-06-15 12:49:39,412][1652475] Updated weights for policy 0, policy_version 99184 (0.0014) [2024-06-15 12:49:40,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 43690.6, 300 sec: 43986.9). Total num frames: 203161600. Throughput: 0: 10922.8. Samples: 50846720. Policy #0 lag: (min: 31.0, avg: 133.7, max: 287.0) [2024-06-15 12:49:40,738][1648984] Avg episode reward: [(0, '-0.520')] [2024-06-15 12:49:42,711][1652475] Updated weights for policy 0, policy_version 99232 (0.0013) [2024-06-15 12:49:45,483][1652475] Updated weights for policy 0, policy_version 99269 (0.0012) [2024-06-15 12:49:45,738][1648984] Fps is (10 sec: 36044.5, 60 sec: 43690.6, 300 sec: 43653.6). Total num frames: 203325440. Throughput: 0: 11070.6. Samples: 50913792. Policy #0 lag: (min: 41.0, avg: 159.0, max: 297.0) [2024-06-15 12:49:45,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 12:49:47,300][1652475] Updated weights for policy 0, policy_version 99344 (0.0014) [2024-06-15 12:49:50,738][1648984] Fps is (10 sec: 45875.3, 60 sec: 44782.9, 300 sec: 44098.0). Total num frames: 203620352. Throughput: 0: 10786.1. Samples: 50938368. 
Policy #0 lag: (min: 41.0, avg: 159.0, max: 297.0) [2024-06-15 12:49:50,739][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 12:49:50,833][1652475] Updated weights for policy 0, policy_version 99424 (0.0123) [2024-06-15 12:49:55,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 42052.5, 300 sec: 43320.4). Total num frames: 203718656. Throughput: 0: 10911.3. Samples: 51010048. Policy #0 lag: (min: 41.0, avg: 159.0, max: 297.0) [2024-06-15 12:49:55,746][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 12:49:55,924][1652475] Updated weights for policy 0, policy_version 99488 (0.0014) [2024-06-15 12:49:58,853][1652475] Updated weights for policy 0, policy_version 99568 (0.0015) [2024-06-15 12:50:00,434][1652475] Updated weights for policy 0, policy_version 99632 (0.0011) [2024-06-15 12:50:00,738][1648984] Fps is (10 sec: 45875.3, 60 sec: 43690.7, 300 sec: 43986.9). Total num frames: 204079104. Throughput: 0: 10592.7. Samples: 51066880. Policy #0 lag: (min: 41.0, avg: 159.0, max: 297.0) [2024-06-15 12:50:00,739][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 12:50:03,261][1652475] Updated weights for policy 0, policy_version 99680 (0.0013) [2024-06-15 12:50:05,738][1648984] Fps is (10 sec: 49152.0, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 204210176. Throughput: 0: 10706.5. Samples: 51105280. Policy #0 lag: (min: 41.0, avg: 159.0, max: 297.0) [2024-06-15 12:50:05,739][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 12:50:06,890][1651340] Signal inference workers to stop experience collection... (5200 times) [2024-06-15 12:50:06,987][1652475] InferenceWorker_p0-w0: stopping experience collection (5200 times) [2024-06-15 12:50:07,207][1651340] Signal inference workers to resume experience collection... 
(5200 times) [2024-06-15 12:50:07,208][1652475] InferenceWorker_p0-w0: resuming experience collection (5200 times) [2024-06-15 12:50:07,567][1652475] Updated weights for policy 0, policy_version 99744 (0.0050) [2024-06-15 12:50:09,749][1652475] Updated weights for policy 0, policy_version 99792 (0.0015) [2024-06-15 12:50:10,738][1648984] Fps is (10 sec: 36044.7, 60 sec: 43144.6, 300 sec: 43431.5). Total num frames: 204439552. Throughput: 0: 10831.6. Samples: 51175424. Policy #0 lag: (min: 41.0, avg: 159.0, max: 297.0) [2024-06-15 12:50:10,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 12:50:11,836][1652475] Updated weights for policy 0, policy_version 99888 (0.0013) [2024-06-15 12:50:15,164][1652475] Updated weights for policy 0, policy_version 99968 (0.0015) [2024-06-15 12:50:15,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 44782.8, 300 sec: 43875.8). Total num frames: 204734464. Throughput: 0: 10774.8. Samples: 51241984. Policy #0 lag: (min: 41.0, avg: 159.0, max: 297.0) [2024-06-15 12:50:15,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 12:50:19,779][1652475] Updated weights for policy 0, policy_version 100032 (0.0012) [2024-06-15 12:50:20,738][1648984] Fps is (10 sec: 42598.5, 60 sec: 42598.4, 300 sec: 43431.5). Total num frames: 204865536. Throughput: 0: 10899.9. Samples: 51282944. Policy #0 lag: (min: 41.0, avg: 159.0, max: 297.0) [2024-06-15 12:50:20,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 12:50:22,673][1652475] Updated weights for policy 0, policy_version 100113 (0.0013) [2024-06-15 12:50:25,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 43690.7, 300 sec: 43986.9). Total num frames: 205127680. Throughput: 0: 10922.7. Samples: 51338240. 
Policy #0 lag: (min: 41.0, avg: 159.0, max: 297.0) [2024-06-15 12:50:25,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 12:50:27,060][1652475] Updated weights for policy 0, policy_version 100166 (0.0034) [2024-06-15 12:50:28,380][1652475] Updated weights for policy 0, policy_version 100224 (0.0015) [2024-06-15 12:50:30,738][1648984] Fps is (10 sec: 49152.3, 60 sec: 43144.6, 300 sec: 43875.8). Total num frames: 205357056. Throughput: 0: 11013.7. Samples: 51409408. Policy #0 lag: (min: 41.0, avg: 159.0, max: 297.0) [2024-06-15 12:50:30,738][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 12:50:30,982][1652475] Updated weights for policy 0, policy_version 100288 (0.0014) [2024-06-15 12:50:33,405][1652475] Updated weights for policy 0, policy_version 100349 (0.0012) [2024-06-15 12:50:35,561][1652475] Updated weights for policy 0, policy_version 100413 (0.0015) [2024-06-15 12:50:35,738][1648984] Fps is (10 sec: 52429.4, 60 sec: 44783.0, 300 sec: 43986.9). Total num frames: 205651968. Throughput: 0: 11036.5. Samples: 51435008. Policy #0 lag: (min: 41.0, avg: 159.0, max: 297.0) [2024-06-15 12:50:35,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 12:50:40,738][1648984] Fps is (10 sec: 36044.4, 60 sec: 42598.4, 300 sec: 43653.6). Total num frames: 205717504. Throughput: 0: 11161.6. Samples: 51512320. Policy #0 lag: (min: 41.0, avg: 159.0, max: 297.0) [2024-06-15 12:50:40,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 12:50:42,143][1652475] Updated weights for policy 0, policy_version 100512 (0.0033) [2024-06-15 12:50:43,089][1652475] Updated weights for policy 0, policy_version 100544 (0.0016) [2024-06-15 12:50:44,973][1652475] Updated weights for policy 0, policy_version 100603 (0.0014) [2024-06-15 12:50:45,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 45329.1, 300 sec: 43875.8). Total num frames: 206045184. Throughput: 0: 11150.2. Samples: 51568640. 
Policy #0 lag: (min: 2.0, avg: 129.5, max: 258.0) [2024-06-15 12:50:45,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 12:50:47,901][1652475] Updated weights for policy 0, policy_version 100656 (0.0013) [2024-06-15 12:50:50,738][1648984] Fps is (10 sec: 45875.6, 60 sec: 42598.4, 300 sec: 43986.9). Total num frames: 206176256. Throughput: 0: 11127.5. Samples: 51606016. Policy #0 lag: (min: 2.0, avg: 129.5, max: 258.0) [2024-06-15 12:50:50,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 12:50:52,229][1652475] Updated weights for policy 0, policy_version 100689 (0.0014) [2024-06-15 12:50:52,741][1651340] Signal inference workers to stop experience collection... (5250 times) [2024-06-15 12:50:52,801][1652475] InferenceWorker_p0-w0: stopping experience collection (5250 times) [2024-06-15 12:50:53,034][1651340] Signal inference workers to resume experience collection... (5250 times) [2024-06-15 12:50:53,035][1652475] InferenceWorker_p0-w0: resuming experience collection (5250 times) [2024-06-15 12:50:54,411][1652475] Updated weights for policy 0, policy_version 100770 (0.0013) [2024-06-15 12:50:55,738][1648984] Fps is (10 sec: 39320.0, 60 sec: 45328.8, 300 sec: 43764.7). Total num frames: 206438400. Throughput: 0: 10968.1. Samples: 51668992. Policy #0 lag: (min: 2.0, avg: 129.5, max: 258.0) [2024-06-15 12:50:55,739][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 12:50:55,964][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000100816_206471168.pth... 
[2024-06-15 12:50:56,174][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000095680_195952640.pth [2024-06-15 12:50:56,180][1651340] Saving a milestone train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/milestones/checkpoint_000100816_206471168.pth [2024-06-15 12:50:56,889][1652475] Updated weights for policy 0, policy_version 100858 (0.0014) [2024-06-15 12:50:59,782][1652475] Updated weights for policy 0, policy_version 100924 (0.0014) [2024-06-15 12:51:00,738][1648984] Fps is (10 sec: 52429.1, 60 sec: 43690.7, 300 sec: 43986.9). Total num frames: 206700544. Throughput: 0: 10899.9. Samples: 51732480. Policy #0 lag: (min: 2.0, avg: 129.5, max: 258.0) [2024-06-15 12:51:00,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 12:51:05,026][1652475] Updated weights for policy 0, policy_version 100978 (0.0013) [2024-06-15 12:51:05,738][1648984] Fps is (10 sec: 39322.3, 60 sec: 43690.6, 300 sec: 43764.7). Total num frames: 206831616. Throughput: 0: 10911.2. Samples: 51773952. Policy #0 lag: (min: 2.0, avg: 129.5, max: 258.0) [2024-06-15 12:51:05,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 12:51:06,778][1652475] Updated weights for policy 0, policy_version 101056 (0.0013) [2024-06-15 12:51:08,534][1652475] Updated weights for policy 0, policy_version 101119 (0.0014) [2024-06-15 12:51:10,738][1648984] Fps is (10 sec: 39321.3, 60 sec: 44236.8, 300 sec: 43653.6). Total num frames: 207093760. Throughput: 0: 10990.9. Samples: 51832832. Policy #0 lag: (min: 2.0, avg: 129.5, max: 258.0) [2024-06-15 12:51:10,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 12:51:12,030][1652475] Updated weights for policy 0, policy_version 101178 (0.0013) [2024-06-15 12:51:15,738][1648984] Fps is (10 sec: 39322.6, 60 sec: 41506.2, 300 sec: 43875.8). Total num frames: 207224832. Throughput: 0: 11013.7. Samples: 51905024. 
Policy #0 lag: (min: 2.0, avg: 129.5, max: 258.0) [2024-06-15 12:51:15,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 12:51:16,379][1652475] Updated weights for policy 0, policy_version 101217 (0.0012) [2024-06-15 12:51:18,215][1652475] Updated weights for policy 0, policy_version 101303 (0.0012) [2024-06-15 12:51:19,987][1652475] Updated weights for policy 0, policy_version 101369 (0.0011) [2024-06-15 12:51:20,739][1648984] Fps is (10 sec: 52422.4, 60 sec: 45874.3, 300 sec: 43986.7). Total num frames: 207618048. Throughput: 0: 11070.3. Samples: 51933184. Policy #0 lag: (min: 2.0, avg: 129.5, max: 258.0) [2024-06-15 12:51:20,740][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 12:51:23,475][1652475] Updated weights for policy 0, policy_version 101411 (0.0012) [2024-06-15 12:51:25,738][1648984] Fps is (10 sec: 52427.7, 60 sec: 43690.6, 300 sec: 43986.9). Total num frames: 207749120. Throughput: 0: 10808.9. Samples: 51998720. Policy #0 lag: (min: 2.0, avg: 129.5, max: 258.0) [2024-06-15 12:51:25,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 12:51:28,130][1652475] Updated weights for policy 0, policy_version 101472 (0.0013) [2024-06-15 12:51:30,185][1652475] Updated weights for policy 0, policy_version 101568 (0.0014) [2024-06-15 12:51:30,738][1648984] Fps is (10 sec: 39326.6, 60 sec: 44236.8, 300 sec: 43986.9). Total num frames: 208011264. Throughput: 0: 11025.1. Samples: 52064768. Policy #0 lag: (min: 2.0, avg: 129.5, max: 258.0) [2024-06-15 12:51:30,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 12:51:31,904][1652475] Updated weights for policy 0, policy_version 101632 (0.0059) [2024-06-15 12:51:35,738][1648984] Fps is (10 sec: 45876.1, 60 sec: 42598.4, 300 sec: 43764.7). Total num frames: 208207872. Throughput: 0: 10899.9. Samples: 52096512. 
Policy #0 lag: (min: 2.0, avg: 129.5, max: 258.0) [2024-06-15 12:51:35,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 12:51:39,787][1651340] Signal inference workers to stop experience collection... (5300 times) [2024-06-15 12:51:39,853][1652475] Updated weights for policy 0, policy_version 101699 (0.0083) [2024-06-15 12:51:39,943][1652475] InferenceWorker_p0-w0: stopping experience collection (5300 times) [2024-06-15 12:51:40,105][1651340] Signal inference workers to resume experience collection... (5300 times) [2024-06-15 12:51:40,106][1652475] InferenceWorker_p0-w0: resuming experience collection (5300 times) [2024-06-15 12:51:40,738][1648984] Fps is (10 sec: 36044.7, 60 sec: 44236.9, 300 sec: 43764.7). Total num frames: 208371712. Throughput: 0: 11013.8. Samples: 52164608. Policy #0 lag: (min: 2.0, avg: 129.5, max: 258.0) [2024-06-15 12:51:40,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 12:51:42,134][1652475] Updated weights for policy 0, policy_version 101800 (0.0106) [2024-06-15 12:51:44,103][1652475] Updated weights for policy 0, policy_version 101872 (0.0025) [2024-06-15 12:51:45,738][1648984] Fps is (10 sec: 45873.6, 60 sec: 43690.4, 300 sec: 43877.0). Total num frames: 208666624. Throughput: 0: 10888.4. Samples: 52222464. Policy #0 lag: (min: 2.0, avg: 129.5, max: 258.0) [2024-06-15 12:51:45,739][1648984] Avg episode reward: [(0, '-0.580')] [2024-06-15 12:51:47,353][1652475] Updated weights for policy 0, policy_version 101909 (0.0013) [2024-06-15 12:51:50,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 43690.6, 300 sec: 43653.7). Total num frames: 208797696. Throughput: 0: 10717.9. Samples: 52256256. 
Policy #0 lag: (min: 15.0, avg: 134.3, max: 271.0) [2024-06-15 12:51:50,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 12:51:51,714][1652475] Updated weights for policy 0, policy_version 101956 (0.0046) [2024-06-15 12:51:53,391][1652475] Updated weights for policy 0, policy_version 102017 (0.0015) [2024-06-15 12:51:55,738][1648984] Fps is (10 sec: 39322.5, 60 sec: 43690.9, 300 sec: 43542.6). Total num frames: 209059840. Throughput: 0: 10808.9. Samples: 52319232. Policy #0 lag: (min: 15.0, avg: 134.3, max: 271.0) [2024-06-15 12:51:55,738][1648984] Avg episode reward: [(0, '-0.590')] [2024-06-15 12:51:57,879][1652475] Updated weights for policy 0, policy_version 102112 (0.0020) [2024-06-15 12:51:59,604][1652475] Updated weights for policy 0, policy_version 102160 (0.0015) [2024-06-15 12:52:00,631][1652475] Updated weights for policy 0, policy_version 102203 (0.0010) [2024-06-15 12:52:00,738][1648984] Fps is (10 sec: 52427.6, 60 sec: 43690.4, 300 sec: 43986.8). Total num frames: 209321984. Throughput: 0: 10444.7. Samples: 52375040. Policy #0 lag: (min: 15.0, avg: 134.3, max: 271.0) [2024-06-15 12:52:00,739][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 12:52:05,738][1648984] Fps is (10 sec: 32768.3, 60 sec: 42598.6, 300 sec: 43320.6). Total num frames: 209387520. Throughput: 0: 10740.9. Samples: 52416512. Policy #0 lag: (min: 15.0, avg: 134.3, max: 271.0) [2024-06-15 12:52:05,738][1648984] Avg episode reward: [(0, '-0.560')] [2024-06-15 12:52:05,744][1652475] Updated weights for policy 0, policy_version 102256 (0.0013) [2024-06-15 12:52:08,586][1652475] Updated weights for policy 0, policy_version 102352 (0.0015) [2024-06-15 12:52:10,738][1648984] Fps is (10 sec: 39322.4, 60 sec: 43690.6, 300 sec: 43542.6). Total num frames: 209715200. Throughput: 0: 10706.5. Samples: 52480512. 
Policy #0 lag: (min: 15.0, avg: 134.3, max: 271.0) [2024-06-15 12:52:10,738][1648984] Avg episode reward: [(0, '-0.570')] [2024-06-15 12:52:11,129][1652475] Updated weights for policy 0, policy_version 102417 (0.0016) [2024-06-15 12:52:12,115][1652475] Updated weights for policy 0, policy_version 102463 (0.0013) [2024-06-15 12:52:15,738][1648984] Fps is (10 sec: 45875.2, 60 sec: 43690.6, 300 sec: 43542.6). Total num frames: 209846272. Throughput: 0: 10854.4. Samples: 52553216. Policy #0 lag: (min: 15.0, avg: 134.3, max: 271.0) [2024-06-15 12:52:15,738][1648984] Avg episode reward: [(0, '-0.570')] [2024-06-15 12:52:17,848][1652475] Updated weights for policy 0, policy_version 102533 (0.0014) [2024-06-15 12:52:18,873][1652475] Updated weights for policy 0, policy_version 102592 (0.0015) [2024-06-15 12:52:20,738][1648984] Fps is (10 sec: 52429.3, 60 sec: 43691.6, 300 sec: 43987.6). Total num frames: 210239488. Throughput: 0: 10911.3. Samples: 52587520. Policy #0 lag: (min: 15.0, avg: 134.3, max: 271.0) [2024-06-15 12:52:20,738][1648984] Avg episode reward: [(0, '-0.570')] [2024-06-15 12:52:21,900][1652475] Updated weights for policy 0, policy_version 102657 (0.0092) [2024-06-15 12:52:22,193][1651340] Signal inference workers to stop experience collection... (5350 times) [2024-06-15 12:52:22,270][1652475] InferenceWorker_p0-w0: stopping experience collection (5350 times) [2024-06-15 12:52:22,446][1651340] Signal inference workers to resume experience collection... (5350 times) [2024-06-15 12:52:22,450][1652475] InferenceWorker_p0-w0: resuming experience collection (5350 times) [2024-06-15 12:52:23,169][1652475] Updated weights for policy 0, policy_version 102716 (0.0089) [2024-06-15 12:52:25,738][1648984] Fps is (10 sec: 52428.2, 60 sec: 43690.7, 300 sec: 43986.8). Total num frames: 210370560. Throughput: 0: 10945.4. Samples: 52657152. 
Policy #0 lag: (min: 15.0, avg: 134.3, max: 271.0) [2024-06-15 12:52:25,739][1648984] Avg episode reward: [(0, '-0.570')] [2024-06-15 12:52:29,426][1652475] Updated weights for policy 0, policy_version 102817 (0.0014) [2024-06-15 12:52:30,738][1648984] Fps is (10 sec: 42598.0, 60 sec: 44236.7, 300 sec: 44098.0). Total num frames: 210665472. Throughput: 0: 11218.5. Samples: 52727296. Policy #0 lag: (min: 15.0, avg: 134.3, max: 271.0) [2024-06-15 12:52:30,738][1648984] Avg episode reward: [(0, '-0.570')] [2024-06-15 12:52:31,165][1652475] Updated weights for policy 0, policy_version 102884 (0.0022) [2024-06-15 12:52:33,789][1652475] Updated weights for policy 0, policy_version 102932 (0.0014) [2024-06-15 12:52:35,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 44782.9, 300 sec: 43986.9). Total num frames: 210894848. Throughput: 0: 11264.0. Samples: 52763136. Policy #0 lag: (min: 15.0, avg: 134.3, max: 271.0) [2024-06-15 12:52:35,738][1648984] Avg episode reward: [(0, '-0.570')] [2024-06-15 12:52:38,977][1652475] Updated weights for policy 0, policy_version 102982 (0.0013) [2024-06-15 12:52:40,705][1652475] Updated weights for policy 0, policy_version 103076 (0.0015) [2024-06-15 12:52:40,738][1648984] Fps is (10 sec: 42598.7, 60 sec: 45329.1, 300 sec: 44098.0). Total num frames: 211091456. Throughput: 0: 11605.4. Samples: 52841472. Policy #0 lag: (min: 15.0, avg: 134.3, max: 271.0) [2024-06-15 12:52:40,738][1648984] Avg episode reward: [(0, '-0.560')] [2024-06-15 12:52:42,105][1652475] Updated weights for policy 0, policy_version 103143 (0.0095) [2024-06-15 12:52:45,288][1652475] Updated weights for policy 0, policy_version 103225 (0.0021) [2024-06-15 12:52:45,738][1648984] Fps is (10 sec: 52429.2, 60 sec: 45875.4, 300 sec: 43986.9). Total num frames: 211419136. Throughput: 0: 11548.5. Samples: 52894720. 
Policy #0 lag: (min: 15.0, avg: 134.3, max: 271.0) [2024-06-15 12:52:45,738][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 12:52:50,738][1648984] Fps is (10 sec: 42598.6, 60 sec: 45329.1, 300 sec: 44210.3). Total num frames: 211517440. Throughput: 0: 11605.3. Samples: 52938752. Policy #0 lag: (min: 5.0, avg: 55.9, max: 255.0) [2024-06-15 12:52:50,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 12:52:52,151][1652475] Updated weights for policy 0, policy_version 103297 (0.0014) [2024-06-15 12:52:53,772][1652475] Updated weights for policy 0, policy_version 103376 (0.0015) [2024-06-15 12:52:55,738][1648984] Fps is (10 sec: 39320.8, 60 sec: 45875.1, 300 sec: 43986.9). Total num frames: 211812352. Throughput: 0: 11446.0. Samples: 52995584. Policy #0 lag: (min: 5.0, avg: 55.9, max: 255.0) [2024-06-15 12:52:55,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 12:52:55,759][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000103424_211812352.pth... [2024-06-15 12:52:55,806][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000098240_201195520.pth [2024-06-15 12:52:57,204][1652475] Updated weights for policy 0, policy_version 103425 (0.0013) [2024-06-15 12:52:58,590][1652475] Updated weights for policy 0, policy_version 103477 (0.0015) [2024-06-15 12:53:00,738][1648984] Fps is (10 sec: 42598.2, 60 sec: 43690.9, 300 sec: 43986.9). Total num frames: 211943424. Throughput: 0: 11411.9. Samples: 53066752. Policy #0 lag: (min: 5.0, avg: 55.9, max: 255.0) [2024-06-15 12:53:00,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 12:53:01,874][1652475] Updated weights for policy 0, policy_version 103510 (0.0015) [2024-06-15 12:53:05,326][1652475] Updated weights for policy 0, policy_version 103607 (0.0015) [2024-06-15 12:53:05,738][1648984] Fps is (10 sec: 39322.4, 60 sec: 46967.5, 300 sec: 43986.9). Total num frames: 212205568. Throughput: 0: 11343.6. 
Samples: 53097984. Policy #0 lag: (min: 5.0, avg: 55.9, max: 255.0) [2024-06-15 12:53:05,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 12:53:07,061][1652475] Updated weights for policy 0, policy_version 103650 (0.0035) [2024-06-15 12:53:09,386][1651340] Signal inference workers to stop experience collection... (5400 times) [2024-06-15 12:53:09,464][1652475] InferenceWorker_p0-w0: stopping experience collection (5400 times) [2024-06-15 12:53:09,672][1651340] Signal inference workers to resume experience collection... (5400 times) [2024-06-15 12:53:09,673][1652475] InferenceWorker_p0-w0: resuming experience collection (5400 times) [2024-06-15 12:53:10,094][1652475] Updated weights for policy 0, policy_version 103728 (0.0153) [2024-06-15 12:53:10,738][1648984] Fps is (10 sec: 52428.4, 60 sec: 45875.2, 300 sec: 44320.1). Total num frames: 212467712. Throughput: 0: 11207.1. Samples: 53161472. Policy #0 lag: (min: 5.0, avg: 55.9, max: 255.0) [2024-06-15 12:53:10,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 12:53:14,834][1652475] Updated weights for policy 0, policy_version 103804 (0.0014) [2024-06-15 12:53:15,738][1648984] Fps is (10 sec: 39321.1, 60 sec: 45875.1, 300 sec: 43986.9). Total num frames: 212598784. Throughput: 0: 11093.3. Samples: 53226496. Policy #0 lag: (min: 5.0, avg: 55.9, max: 255.0) [2024-06-15 12:53:15,739][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 12:53:17,610][1652475] Updated weights for policy 0, policy_version 103860 (0.0012) [2024-06-15 12:53:18,911][1652475] Updated weights for policy 0, policy_version 103890 (0.0021) [2024-06-15 12:53:20,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 43690.6, 300 sec: 43986.9). Total num frames: 212860928. Throughput: 0: 11013.7. Samples: 53258752. 
Policy #0 lag: (min: 5.0, avg: 55.9, max: 255.0) [2024-06-15 12:53:20,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 12:53:21,469][1652475] Updated weights for policy 0, policy_version 103939 (0.0097) [2024-06-15 12:53:25,676][1652475] Updated weights for policy 0, policy_version 104006 (0.0013) [2024-06-15 12:53:25,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 43690.7, 300 sec: 43986.9). Total num frames: 212992000. Throughput: 0: 10729.2. Samples: 53324288. Policy #0 lag: (min: 5.0, avg: 55.9, max: 255.0) [2024-06-15 12:53:25,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 12:53:28,929][1652475] Updated weights for policy 0, policy_version 104096 (0.0160) [2024-06-15 12:53:29,625][1652475] Updated weights for policy 0, policy_version 104124 (0.0013) [2024-06-15 12:53:30,740][1648984] Fps is (10 sec: 42598.6, 60 sec: 43690.7, 300 sec: 44098.0). Total num frames: 213286912. Throughput: 0: 11002.3. Samples: 53389824. Policy #0 lag: (min: 5.0, avg: 55.9, max: 255.0) [2024-06-15 12:53:30,740][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 12:53:31,545][1652475] Updated weights for policy 0, policy_version 104181 (0.0017) [2024-06-15 12:53:34,312][1652475] Updated weights for policy 0, policy_version 104229 (0.0013) [2024-06-15 12:53:35,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 43690.7, 300 sec: 43986.9). Total num frames: 213516288. Throughput: 0: 10752.0. Samples: 53422592. Policy #0 lag: (min: 5.0, avg: 55.9, max: 255.0) [2024-06-15 12:53:35,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 12:53:37,614][1652475] Updated weights for policy 0, policy_version 104260 (0.0013) [2024-06-15 12:53:40,638][1652475] Updated weights for policy 0, policy_version 104352 (0.0015) [2024-06-15 12:53:40,740][1648984] Fps is (10 sec: 42590.5, 60 sec: 43689.3, 300 sec: 44097.7). Total num frames: 213712896. Throughput: 0: 11036.0. Samples: 53492224. 
Policy #0 lag: (min: 5.0, avg: 55.9, max: 255.0) [2024-06-15 12:53:40,740][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 12:53:42,226][1652475] Updated weights for policy 0, policy_version 104400 (0.0015) [2024-06-15 12:53:45,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 41506.1, 300 sec: 43986.9). Total num frames: 213909504. Throughput: 0: 10808.9. Samples: 53553152. Policy #0 lag: (min: 5.0, avg: 55.9, max: 255.0) [2024-06-15 12:53:45,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 12:53:45,861][1652475] Updated weights for policy 0, policy_version 104464 (0.0013) [2024-06-15 12:53:47,041][1652475] Updated weights for policy 0, policy_version 104507 (0.0016) [2024-06-15 12:53:50,738][1648984] Fps is (10 sec: 42606.7, 60 sec: 43690.6, 300 sec: 43875.9). Total num frames: 214138880. Throughput: 0: 10797.5. Samples: 53583872. Policy #0 lag: (min: 31.0, avg: 125.8, max: 287.0) [2024-06-15 12:53:50,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 12:53:50,902][1652475] Updated weights for policy 0, policy_version 104570 (0.0095) [2024-06-15 12:53:53,088][1652475] Updated weights for policy 0, policy_version 104612 (0.0013) [2024-06-15 12:53:53,611][1652475] Updated weights for policy 0, policy_version 104638 (0.0017) [2024-06-15 12:53:55,375][1652475] Updated weights for policy 0, policy_version 104688 (0.0014) [2024-06-15 12:53:55,738][1648984] Fps is (10 sec: 52428.1, 60 sec: 43690.7, 300 sec: 43986.9). Total num frames: 214433792. Throughput: 0: 10934.0. Samples: 53653504. Policy #0 lag: (min: 31.0, avg: 125.8, max: 287.0) [2024-06-15 12:53:55,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 12:53:58,588][1652475] Updated weights for policy 0, policy_version 104742 (0.0024) [2024-06-15 12:54:00,738][1648984] Fps is (10 sec: 42598.2, 60 sec: 43690.6, 300 sec: 43986.9). Total num frames: 214564864. Throughput: 0: 10831.7. Samples: 53713920. 
Policy #0 lag: (min: 31.0, avg: 125.8, max: 287.0) [2024-06-15 12:54:00,738][1648984] Avg episode reward: [(0, '-0.520')] [2024-06-15 12:54:01,323][1651340] Signal inference workers to stop experience collection... (5450 times) [2024-06-15 12:54:01,404][1652475] InferenceWorker_p0-w0: stopping experience collection (5450 times) [2024-06-15 12:54:01,480][1651340] Signal inference workers to resume experience collection... (5450 times) [2024-06-15 12:54:01,481][1652475] InferenceWorker_p0-w0: resuming experience collection (5450 times) [2024-06-15 12:54:01,826][1652475] Updated weights for policy 0, policy_version 104800 (0.0013) [2024-06-15 12:54:05,160][1652475] Updated weights for policy 0, policy_version 104838 (0.0012) [2024-06-15 12:54:05,738][1648984] Fps is (10 sec: 32768.1, 60 sec: 42598.3, 300 sec: 43764.7). Total num frames: 214761472. Throughput: 0: 10945.4. Samples: 53751296. Policy #0 lag: (min: 31.0, avg: 125.8, max: 287.0) [2024-06-15 12:54:05,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 12:54:06,965][1652475] Updated weights for policy 0, policy_version 104912 (0.0013) [2024-06-15 12:54:08,873][1652475] Updated weights for policy 0, policy_version 104976 (0.0121) [2024-06-15 12:54:10,037][1652475] Updated weights for policy 0, policy_version 105024 (0.0013) [2024-06-15 12:54:10,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 43690.7, 300 sec: 44209.0). Total num frames: 215089152. Throughput: 0: 10865.8. Samples: 53813248. Policy #0 lag: (min: 31.0, avg: 125.8, max: 287.0) [2024-06-15 12:54:10,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 12:54:15,688][1652475] Updated weights for policy 0, policy_version 105082 (0.0016) [2024-06-15 12:54:15,738][1648984] Fps is (10 sec: 42598.1, 60 sec: 43144.5, 300 sec: 43653.6). Total num frames: 215187456. Throughput: 0: 10854.4. Samples: 53878272. 
Policy #0 lag: (min: 31.0, avg: 125.8, max: 287.0) [2024-06-15 12:54:15,739][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 12:54:19,428][1652475] Updated weights for policy 0, policy_version 105173 (0.0012) [2024-06-15 12:54:20,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 43690.7, 300 sec: 43986.9). Total num frames: 215482368. Throughput: 0: 10843.0. Samples: 53910528. Policy #0 lag: (min: 31.0, avg: 125.8, max: 287.0) [2024-06-15 12:54:20,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 12:54:21,846][1652475] Updated weights for policy 0, policy_version 105217 (0.0012) [2024-06-15 12:54:22,781][1652475] Updated weights for policy 0, policy_version 105276 (0.0012) [2024-06-15 12:54:25,738][1648984] Fps is (10 sec: 42598.7, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 215613440. Throughput: 0: 10627.3. Samples: 53970432. Policy #0 lag: (min: 31.0, avg: 125.8, max: 287.0) [2024-06-15 12:54:25,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 12:54:28,278][1652475] Updated weights for policy 0, policy_version 105337 (0.0013) [2024-06-15 12:54:30,674][1652475] Updated weights for policy 0, policy_version 105392 (0.0021) [2024-06-15 12:54:30,738][1648984] Fps is (10 sec: 36045.0, 60 sec: 42598.4, 300 sec: 43653.6). Total num frames: 215842816. Throughput: 0: 10854.4. Samples: 54041600. Policy #0 lag: (min: 31.0, avg: 125.8, max: 287.0) [2024-06-15 12:54:30,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 12:54:31,874][1652475] Updated weights for policy 0, policy_version 105444 (0.0015) [2024-06-15 12:54:32,500][1652475] Updated weights for policy 0, policy_version 105472 (0.0015) [2024-06-15 12:54:34,516][1652475] Updated weights for policy 0, policy_version 105536 (0.0023) [2024-06-15 12:54:35,738][1648984] Fps is (10 sec: 52428.5, 60 sec: 43690.6, 300 sec: 43986.9). Total num frames: 216137728. Throughput: 0: 10877.1. Samples: 54073344. 
Policy #0 lag: (min: 31.0, avg: 125.8, max: 287.0) [2024-06-15 12:54:35,739][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 12:54:39,664][1652475] Updated weights for policy 0, policy_version 105592 (0.0118) [2024-06-15 12:54:40,738][1648984] Fps is (10 sec: 42598.1, 60 sec: 42599.7, 300 sec: 43875.8). Total num frames: 216268800. Throughput: 0: 10888.5. Samples: 54143488. Policy #0 lag: (min: 31.0, avg: 125.8, max: 287.0) [2024-06-15 12:54:40,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 12:54:41,365][1652475] Updated weights for policy 0, policy_version 105620 (0.0012) [2024-06-15 12:54:43,406][1652475] Updated weights for policy 0, policy_version 105698 (0.0054) [2024-06-15 12:54:45,739][1648984] Fps is (10 sec: 39316.2, 60 sec: 43689.6, 300 sec: 43764.5). Total num frames: 216530944. Throughput: 0: 10922.3. Samples: 54205440. Policy #0 lag: (min: 31.0, avg: 125.8, max: 287.0) [2024-06-15 12:54:45,740][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 12:54:45,777][1652475] Updated weights for policy 0, policy_version 105729 (0.0024) [2024-06-15 12:54:46,116][1651340] Signal inference workers to stop experience collection... (5500 times) [2024-06-15 12:54:46,193][1652475] InferenceWorker_p0-w0: stopping experience collection (5500 times) [2024-06-15 12:54:46,313][1651340] Signal inference workers to resume experience collection... (5500 times) [2024-06-15 12:54:46,314][1652475] InferenceWorker_p0-w0: resuming experience collection (5500 times) [2024-06-15 12:54:50,422][1652475] Updated weights for policy 0, policy_version 105824 (0.0092) [2024-06-15 12:54:50,738][1648984] Fps is (10 sec: 45875.3, 60 sec: 43144.5, 300 sec: 44098.0). Total num frames: 216727552. Throughput: 0: 10877.1. Samples: 54240768. 
Policy #0 lag: (min: 31.0, avg: 125.8, max: 287.0) [2024-06-15 12:54:50,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 12:54:53,624][1652475] Updated weights for policy 0, policy_version 105888 (0.0013) [2024-06-15 12:54:55,319][1652475] Updated weights for policy 0, policy_version 105952 (0.0011) [2024-06-15 12:54:55,738][1648984] Fps is (10 sec: 45880.4, 60 sec: 42598.2, 300 sec: 43764.7). Total num frames: 216989696. Throughput: 0: 10888.5. Samples: 54303232. Policy #0 lag: (min: 49.0, avg: 143.9, max: 305.0) [2024-06-15 12:54:55,739][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 12:54:56,200][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000105984_217055232.pth... [2024-06-15 12:54:56,282][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000100816_206471168.pth [2024-06-15 12:54:58,227][1652475] Updated weights for policy 0, policy_version 106000 (0.0014) [2024-06-15 12:55:00,738][1648984] Fps is (10 sec: 45875.6, 60 sec: 43690.7, 300 sec: 43986.9). Total num frames: 217186304. Throughput: 0: 10808.9. Samples: 54364672. Policy #0 lag: (min: 49.0, avg: 143.9, max: 305.0) [2024-06-15 12:55:00,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 12:55:02,837][1652475] Updated weights for policy 0, policy_version 106080 (0.0014) [2024-06-15 12:55:05,738][1648984] Fps is (10 sec: 32769.0, 60 sec: 42598.4, 300 sec: 43653.6). Total num frames: 217317376. Throughput: 0: 10843.0. Samples: 54398464. Policy #0 lag: (min: 49.0, avg: 143.9, max: 305.0) [2024-06-15 12:55:05,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 12:55:06,594][1652475] Updated weights for policy 0, policy_version 106149 (0.0012) [2024-06-15 12:55:08,451][1652475] Updated weights for policy 0, policy_version 106226 (0.0150) [2024-06-15 12:55:10,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 41506.2, 300 sec: 43542.6). Total num frames: 217579520. Throughput: 0: 10740.6. 
Samples: 54453760. Policy #0 lag: (min: 49.0, avg: 143.9, max: 305.0) [2024-06-15 12:55:10,738][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 12:55:12,450][1652475] Updated weights for policy 0, policy_version 106272 (0.0013) [2024-06-15 12:55:15,111][1652475] Updated weights for policy 0, policy_version 106368 (0.0123) [2024-06-15 12:55:15,738][1648984] Fps is (10 sec: 52429.3, 60 sec: 44236.9, 300 sec: 43986.9). Total num frames: 217841664. Throughput: 0: 10479.0. Samples: 54513152. Policy #0 lag: (min: 49.0, avg: 143.9, max: 305.0) [2024-06-15 12:55:15,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 12:55:19,190][1652475] Updated weights for policy 0, policy_version 106429 (0.0012) [2024-06-15 12:55:20,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 41506.2, 300 sec: 43542.6). Total num frames: 217972736. Throughput: 0: 10558.6. Samples: 54548480. Policy #0 lag: (min: 49.0, avg: 143.9, max: 305.0) [2024-06-15 12:55:20,740][1648984] Avg episode reward: [(0, '-0.530')] [2024-06-15 12:55:21,909][1652475] Updated weights for policy 0, policy_version 106480 (0.0015) [2024-06-15 12:55:22,277][1652475] Updated weights for policy 0, policy_version 106496 (0.0012) [2024-06-15 12:55:25,738][1648984] Fps is (10 sec: 36044.7, 60 sec: 43144.6, 300 sec: 43542.6). Total num frames: 218202112. Throughput: 0: 10479.0. Samples: 54615040. Policy #0 lag: (min: 49.0, avg: 143.9, max: 305.0) [2024-06-15 12:55:25,738][1648984] Avg episode reward: [(0, '-0.530')] [2024-06-15 12:55:26,460][1652475] Updated weights for policy 0, policy_version 106576 (0.0018) [2024-06-15 12:55:30,343][1652475] Updated weights for policy 0, policy_version 106656 (0.0015) [2024-06-15 12:55:30,738][1648984] Fps is (10 sec: 49151.5, 60 sec: 43690.6, 300 sec: 43431.5). Total num frames: 218464256. Throughput: 0: 10467.9. Samples: 54676480. 
Policy #0 lag: (min: 49.0, avg: 143.9, max: 305.0) [2024-06-15 12:55:30,739][1648984] Avg episode reward: [(0, '-0.530')] [2024-06-15 12:55:34,003][1652475] Updated weights for policy 0, policy_version 106708 (0.0011) [2024-06-15 12:55:35,738][1648984] Fps is (10 sec: 42598.5, 60 sec: 41506.2, 300 sec: 43764.7). Total num frames: 218628096. Throughput: 0: 10410.7. Samples: 54709248. Policy #0 lag: (min: 49.0, avg: 143.9, max: 305.0) [2024-06-15 12:55:35,738][1648984] Avg episode reward: [(0, '-0.530')] [2024-06-15 12:55:37,225][1651340] Signal inference workers to stop experience collection... (5550 times) [2024-06-15 12:55:37,288][1652475] InferenceWorker_p0-w0: stopping experience collection (5550 times) [2024-06-15 12:55:37,424][1651340] Signal inference workers to resume experience collection... (5550 times) [2024-06-15 12:55:37,424][1652475] InferenceWorker_p0-w0: resuming experience collection (5550 times) [2024-06-15 12:55:37,427][1652475] Updated weights for policy 0, policy_version 106768 (0.0056) [2024-06-15 12:55:38,437][1652475] Updated weights for policy 0, policy_version 106816 (0.0012) [2024-06-15 12:55:40,078][1652475] Updated weights for policy 0, policy_version 106873 (0.0016) [2024-06-15 12:55:40,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 43690.7, 300 sec: 43542.5). Total num frames: 218890240. Throughput: 0: 10524.5. Samples: 54776832. Policy #0 lag: (min: 49.0, avg: 143.9, max: 305.0) [2024-06-15 12:55:40,738][1648984] Avg episode reward: [(0, '-0.530')] [2024-06-15 12:55:42,758][1652475] Updated weights for policy 0, policy_version 106932 (0.0013) [2024-06-15 12:55:45,738][1648984] Fps is (10 sec: 42598.0, 60 sec: 42053.3, 300 sec: 43653.6). Total num frames: 219054080. Throughput: 0: 10638.2. Samples: 54843392. 
Policy #0 lag: (min: 49.0, avg: 143.9, max: 305.0) [2024-06-15 12:55:45,738][1648984] Avg episode reward: [(0, '-0.530')] [2024-06-15 12:55:46,185][1652475] Updated weights for policy 0, policy_version 106978 (0.0035) [2024-06-15 12:55:49,811][1652475] Updated weights for policy 0, policy_version 107040 (0.0016) [2024-06-15 12:55:50,738][1648984] Fps is (10 sec: 39322.0, 60 sec: 42598.5, 300 sec: 43542.6). Total num frames: 219283456. Throughput: 0: 10649.6. Samples: 54877696. Policy #0 lag: (min: 49.0, avg: 143.9, max: 305.0) [2024-06-15 12:55:50,738][1648984] Avg episode reward: [(0, '-0.530')] [2024-06-15 12:55:51,732][1652475] Updated weights for policy 0, policy_version 107120 (0.0014) [2024-06-15 12:55:54,012][1652475] Updated weights for policy 0, policy_version 107171 (0.0011) [2024-06-15 12:55:55,738][1648984] Fps is (10 sec: 49152.0, 60 sec: 42598.6, 300 sec: 43542.5). Total num frames: 219545600. Throughput: 0: 10808.9. Samples: 54940160. Policy #0 lag: (min: 106.0, avg: 191.9, max: 362.0) [2024-06-15 12:55:55,738][1648984] Avg episode reward: [(0, '-0.530')] [2024-06-15 12:55:57,905][1652475] Updated weights for policy 0, policy_version 107236 (0.0096) [2024-06-15 12:56:00,738][1648984] Fps is (10 sec: 39320.9, 60 sec: 41506.0, 300 sec: 43542.6). Total num frames: 219676672. Throughput: 0: 11059.2. Samples: 55010816. Policy #0 lag: (min: 106.0, avg: 191.9, max: 362.0) [2024-06-15 12:56:00,738][1648984] Avg episode reward: [(0, '-0.530')] [2024-06-15 12:56:01,273][1652475] Updated weights for policy 0, policy_version 107280 (0.0083) [2024-06-15 12:56:03,356][1652475] Updated weights for policy 0, policy_version 107362 (0.0012) [2024-06-15 12:56:05,381][1652475] Updated weights for policy 0, policy_version 107397 (0.0014) [2024-06-15 12:56:05,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 44236.8, 300 sec: 43653.6). Total num frames: 219971584. Throughput: 0: 10854.4. Samples: 55036928. 
Policy #0 lag: (min: 106.0, avg: 191.9, max: 362.0) [2024-06-15 12:56:05,738][1648984] Avg episode reward: [(0, '-0.530')] [2024-06-15 12:56:06,593][1652475] Updated weights for policy 0, policy_version 107451 (0.0013) [2024-06-15 12:56:10,051][1652475] Updated weights for policy 0, policy_version 107515 (0.0019) [2024-06-15 12:56:10,738][1648984] Fps is (10 sec: 52429.4, 60 sec: 43690.6, 300 sec: 43986.9). Total num frames: 220200960. Throughput: 0: 10956.8. Samples: 55108096. Policy #0 lag: (min: 106.0, avg: 191.9, max: 362.0) [2024-06-15 12:56:10,738][1648984] Avg episode reward: [(0, '-0.530')] [2024-06-15 12:56:13,614][1652475] Updated weights for policy 0, policy_version 107583 (0.0109) [2024-06-15 12:56:15,215][1652475] Updated weights for policy 0, policy_version 107632 (0.0016) [2024-06-15 12:56:15,738][1648984] Fps is (10 sec: 49152.3, 60 sec: 43690.6, 300 sec: 43542.7). Total num frames: 220463104. Throughput: 0: 11047.8. Samples: 55173632. Policy #0 lag: (min: 106.0, avg: 191.9, max: 362.0) [2024-06-15 12:56:15,741][1648984] Avg episode reward: [(0, '-0.530')] [2024-06-15 12:56:17,982][1652475] Updated weights for policy 0, policy_version 107685 (0.0012) [2024-06-15 12:56:20,738][1648984] Fps is (10 sec: 42597.6, 60 sec: 44236.6, 300 sec: 43653.6). Total num frames: 220626944. Throughput: 0: 11059.1. Samples: 55206912. Policy #0 lag: (min: 106.0, avg: 191.9, max: 362.0) [2024-06-15 12:56:20,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 12:56:21,167][1651340] Signal inference workers to stop experience collection... (5600 times) [2024-06-15 12:56:21,206][1652475] Updated weights for policy 0, policy_version 107747 (0.0014) [2024-06-15 12:56:21,237][1652475] InferenceWorker_p0-w0: stopping experience collection (5600 times) [2024-06-15 12:56:21,375][1651340] Signal inference workers to resume experience collection... 
(5600 times) [2024-06-15 12:56:21,376][1652475] InferenceWorker_p0-w0: resuming experience collection (5600 times) [2024-06-15 12:56:25,261][1652475] Updated weights for policy 0, policy_version 107824 (0.0014) [2024-06-15 12:56:25,738][1648984] Fps is (10 sec: 36044.9, 60 sec: 43690.7, 300 sec: 43431.5). Total num frames: 220823552. Throughput: 0: 11070.6. Samples: 55275008. Policy #0 lag: (min: 106.0, avg: 191.9, max: 362.0) [2024-06-15 12:56:25,738][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 12:56:26,472][1652475] Updated weights for policy 0, policy_version 107872 (0.0013) [2024-06-15 12:56:27,237][1652475] Updated weights for policy 0, policy_version 107904 (0.0013) [2024-06-15 12:56:30,738][1648984] Fps is (10 sec: 36045.5, 60 sec: 42052.3, 300 sec: 43320.4). Total num frames: 220987392. Throughput: 0: 10922.7. Samples: 55334912. Policy #0 lag: (min: 106.0, avg: 191.9, max: 362.0) [2024-06-15 12:56:30,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 12:56:33,880][1652475] Updated weights for policy 0, policy_version 108029 (0.0114) [2024-06-15 12:56:35,738][1648984] Fps is (10 sec: 42598.1, 60 sec: 43690.6, 300 sec: 43653.6). Total num frames: 221249536. Throughput: 0: 10706.5. Samples: 55359488. Policy #0 lag: (min: 106.0, avg: 191.9, max: 362.0) [2024-06-15 12:56:35,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 12:56:38,004][1652475] Updated weights for policy 0, policy_version 108094 (0.0117) [2024-06-15 12:56:40,664][1652475] Updated weights for policy 0, policy_version 108157 (0.0014) [2024-06-15 12:56:40,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 221511680. Throughput: 0: 10774.8. Samples: 55425024. Policy #0 lag: (min: 106.0, avg: 191.9, max: 362.0) [2024-06-15 12:56:40,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 12:56:45,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 43690.6, 300 sec: 43653.6). Total num frames: 221675520. 
Throughput: 0: 10604.1. Samples: 55488000. Policy #0 lag: (min: 106.0, avg: 191.9, max: 362.0) [2024-06-15 12:56:45,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 12:56:46,267][1652475] Updated weights for policy 0, policy_version 108272 (0.0015) [2024-06-15 12:56:50,638][1652475] Updated weights for policy 0, policy_version 108339 (0.0014) [2024-06-15 12:56:50,738][1648984] Fps is (10 sec: 36044.9, 60 sec: 43144.5, 300 sec: 43431.5). Total num frames: 221872128. Throughput: 0: 10672.4. Samples: 55517184. Policy #0 lag: (min: 106.0, avg: 191.9, max: 362.0) [2024-06-15 12:56:50,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 12:56:52,551][1652475] Updated weights for policy 0, policy_version 108400 (0.0030) [2024-06-15 12:56:55,762][1648984] Fps is (10 sec: 35956.6, 60 sec: 41489.1, 300 sec: 43094.7). Total num frames: 222035968. Throughput: 0: 10552.8. Samples: 55583232. Policy #0 lag: (min: 47.0, avg: 173.8, max: 303.0) [2024-06-15 12:56:55,763][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 12:56:56,236][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000108432_222068736.pth... [2024-06-15 12:56:56,390][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000103424_211812352.pth [2024-06-15 12:56:57,657][1652475] Updated weights for policy 0, policy_version 108480 (0.0015) [2024-06-15 12:57:00,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 43690.8, 300 sec: 43764.7). Total num frames: 222298112. Throughput: 0: 10387.9. Samples: 55641088. Policy #0 lag: (min: 47.0, avg: 173.8, max: 303.0) [2024-06-15 12:57:00,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 12:57:01,980][1652475] Updated weights for policy 0, policy_version 108545 (0.0012) [2024-06-15 12:57:04,251][1652475] Updated weights for policy 0, policy_version 108610 (0.0014) [2024-06-15 12:57:05,738][1648984] Fps is (10 sec: 52557.5, 60 sec: 43144.5, 300 sec: 43542.6). 
Total num frames: 222560256. Throughput: 0: 10296.9. Samples: 55670272. Policy #0 lag: (min: 47.0, avg: 173.8, max: 303.0) [2024-06-15 12:57:05,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 12:57:08,766][1652475] Updated weights for policy 0, policy_version 108688 (0.0024) [2024-06-15 12:57:09,784][1651340] Signal inference workers to stop experience collection... (5650 times) [2024-06-15 12:57:09,812][1652475] InferenceWorker_p0-w0: stopping experience collection (5650 times) [2024-06-15 12:57:09,887][1651340] Signal inference workers to resume experience collection... (5650 times) [2024-06-15 12:57:09,894][1652475] InferenceWorker_p0-w0: resuming experience collection (5650 times) [2024-06-15 12:57:09,896][1652475] Updated weights for policy 0, policy_version 108736 (0.0010) [2024-06-15 12:57:10,738][1648984] Fps is (10 sec: 42595.6, 60 sec: 42051.8, 300 sec: 43653.5). Total num frames: 222724096. Throughput: 0: 10365.0. Samples: 55741440. Policy #0 lag: (min: 47.0, avg: 173.8, max: 303.0) [2024-06-15 12:57:10,739][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 12:57:14,248][1652475] Updated weights for policy 0, policy_version 108801 (0.0011) [2024-06-15 12:57:15,738][1648984] Fps is (10 sec: 36045.4, 60 sec: 40960.0, 300 sec: 42987.2). Total num frames: 222920704. Throughput: 0: 10456.2. Samples: 55805440. Policy #0 lag: (min: 47.0, avg: 173.8, max: 303.0) [2024-06-15 12:57:15,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 12:57:16,643][1652475] Updated weights for policy 0, policy_version 108887 (0.0012) [2024-06-15 12:57:20,738][1648984] Fps is (10 sec: 36047.1, 60 sec: 40960.2, 300 sec: 43098.3). Total num frames: 223084544. Throughput: 0: 10547.2. Samples: 55834112. 
Policy #0 lag: (min: 47.0, avg: 173.8, max: 303.0) [2024-06-15 12:57:20,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 12:57:20,911][1652475] Updated weights for policy 0, policy_version 108944 (0.0015) [2024-06-15 12:57:22,121][1652475] Updated weights for policy 0, policy_version 108991 (0.0014) [2024-06-15 12:57:24,050][1652475] Updated weights for policy 0, policy_version 109056 (0.0013) [2024-06-15 12:57:25,738][1648984] Fps is (10 sec: 42598.0, 60 sec: 42052.2, 300 sec: 42987.2). Total num frames: 223346688. Throughput: 0: 10456.2. Samples: 55895552. Policy #0 lag: (min: 47.0, avg: 173.8, max: 303.0) [2024-06-15 12:57:25,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 12:57:28,857][1652475] Updated weights for policy 0, policy_version 109152 (0.0013) [2024-06-15 12:57:30,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 43690.7, 300 sec: 43098.3). Total num frames: 223608832. Throughput: 0: 10410.7. Samples: 55956480. Policy #0 lag: (min: 47.0, avg: 173.8, max: 303.0) [2024-06-15 12:57:30,740][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 12:57:33,010][1652475] Updated weights for policy 0, policy_version 109216 (0.0014) [2024-06-15 12:57:35,738][1648984] Fps is (10 sec: 39321.9, 60 sec: 41506.2, 300 sec: 42876.1). Total num frames: 223739904. Throughput: 0: 10456.2. Samples: 55987712. Policy #0 lag: (min: 47.0, avg: 173.8, max: 303.0) [2024-06-15 12:57:35,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 12:57:37,711][1652475] Updated weights for policy 0, policy_version 109296 (0.0019) [2024-06-15 12:57:40,093][1652475] Updated weights for policy 0, policy_version 109332 (0.0022) [2024-06-15 12:57:40,738][1648984] Fps is (10 sec: 36042.6, 60 sec: 40959.6, 300 sec: 42542.8). Total num frames: 223969280. Throughput: 0: 10473.2. Samples: 56054272. 
Policy #0 lag: (min: 47.0, avg: 173.8, max: 303.0) [2024-06-15 12:57:40,739][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 12:57:42,136][1652475] Updated weights for policy 0, policy_version 109394 (0.0012) [2024-06-15 12:57:44,310][1652475] Updated weights for policy 0, policy_version 109456 (0.0014) [2024-06-15 12:57:45,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 43144.6, 300 sec: 43209.3). Total num frames: 224264192. Throughput: 0: 10592.7. Samples: 56117760. Policy #0 lag: (min: 47.0, avg: 173.8, max: 303.0) [2024-06-15 12:57:45,738][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 12:57:48,224][1652475] Updated weights for policy 0, policy_version 109505 (0.0020) [2024-06-15 12:57:50,738][1648984] Fps is (10 sec: 42600.7, 60 sec: 42052.2, 300 sec: 42654.0). Total num frames: 224395264. Throughput: 0: 10865.8. Samples: 56159232. Policy #0 lag: (min: 47.0, avg: 173.8, max: 303.0) [2024-06-15 12:57:50,738][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 12:57:51,094][1652475] Updated weights for policy 0, policy_version 109571 (0.0013) [2024-06-15 12:57:53,920][1652475] Updated weights for policy 0, policy_version 109680 (0.0014) [2024-06-15 12:57:55,739][1648984] Fps is (10 sec: 42598.0, 60 sec: 44254.9, 300 sec: 43209.3). Total num frames: 224690176. Throughput: 0: 10649.7. Samples: 56220672. Policy #0 lag: (min: 24.0, avg: 148.5, max: 280.0) [2024-06-15 12:57:55,740][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 12:57:56,276][1652475] Updated weights for policy 0, policy_version 109752 (0.0014) [2024-06-15 12:57:59,919][1651340] Signal inference workers to stop experience collection... (5700 times) [2024-06-15 12:57:59,990][1652475] InferenceWorker_p0-w0: stopping experience collection (5700 times) [2024-06-15 12:58:00,140][1651340] Signal inference workers to resume experience collection... 
(5700 times) [2024-06-15 12:58:00,141][1652475] InferenceWorker_p0-w0: resuming experience collection (5700 times) [2024-06-15 12:58:00,713][1652475] Updated weights for policy 0, policy_version 109795 (0.0012) [2024-06-15 12:58:00,738][1648984] Fps is (10 sec: 45874.7, 60 sec: 42598.3, 300 sec: 42876.1). Total num frames: 224854016. Throughput: 0: 10865.7. Samples: 56294400. Policy #0 lag: (min: 24.0, avg: 148.5, max: 280.0) [2024-06-15 12:58:00,738][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 12:58:03,061][1652475] Updated weights for policy 0, policy_version 109844 (0.0025) [2024-06-15 12:58:04,318][1652475] Updated weights for policy 0, policy_version 109890 (0.0012) [2024-06-15 12:58:05,451][1652475] Updated weights for policy 0, policy_version 109949 (0.0011) [2024-06-15 12:58:05,738][1648984] Fps is (10 sec: 49152.1, 60 sec: 43690.7, 300 sec: 43098.3). Total num frames: 225181696. Throughput: 0: 11013.7. Samples: 56329728. Policy #0 lag: (min: 24.0, avg: 148.5, max: 280.0) [2024-06-15 12:58:05,738][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 12:58:10,738][1648984] Fps is (10 sec: 45875.9, 60 sec: 43145.0, 300 sec: 43098.3). Total num frames: 225312768. Throughput: 0: 11070.6. Samples: 56393728. Policy #0 lag: (min: 24.0, avg: 148.5, max: 280.0) [2024-06-15 12:58:10,738][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 12:58:11,135][1652475] Updated weights for policy 0, policy_version 110017 (0.0016) [2024-06-15 12:58:12,215][1652475] Updated weights for policy 0, policy_version 110075 (0.0017) [2024-06-15 12:58:15,388][1652475] Updated weights for policy 0, policy_version 110139 (0.0015) [2024-06-15 12:58:15,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 44236.7, 300 sec: 43098.3). Total num frames: 225574912. Throughput: 0: 11275.4. Samples: 56463872. 
Policy #0 lag: (min: 24.0, avg: 148.5, max: 280.0) [2024-06-15 12:58:15,738][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 12:58:16,917][1652475] Updated weights for policy 0, policy_version 110205 (0.0100) [2024-06-15 12:58:19,161][1652475] Updated weights for policy 0, policy_version 110256 (0.0019) [2024-06-15 12:58:20,738][1648984] Fps is (10 sec: 52427.2, 60 sec: 45875.0, 300 sec: 43542.5). Total num frames: 225837056. Throughput: 0: 11354.9. Samples: 56498688. Policy #0 lag: (min: 24.0, avg: 148.5, max: 280.0) [2024-06-15 12:58:20,739][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 12:58:23,450][1652475] Updated weights for policy 0, policy_version 110309 (0.0029) [2024-06-15 12:58:25,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 43690.7, 300 sec: 42987.2). Total num frames: 225968128. Throughput: 0: 11412.0. Samples: 56567808. Policy #0 lag: (min: 24.0, avg: 148.5, max: 280.0) [2024-06-15 12:58:25,738][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 12:58:26,188][1652475] Updated weights for policy 0, policy_version 110368 (0.0015) [2024-06-15 12:58:27,872][1652475] Updated weights for policy 0, policy_version 110433 (0.0014) [2024-06-15 12:58:29,988][1652475] Updated weights for policy 0, policy_version 110480 (0.0013) [2024-06-15 12:58:30,738][1648984] Fps is (10 sec: 49153.3, 60 sec: 45329.0, 300 sec: 43431.5). Total num frames: 226328576. Throughput: 0: 11514.3. Samples: 56635904. Policy #0 lag: (min: 24.0, avg: 148.5, max: 280.0) [2024-06-15 12:58:30,738][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 12:58:34,350][1652475] Updated weights for policy 0, policy_version 110544 (0.0016) [2024-06-15 12:58:35,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 45875.1, 300 sec: 43320.7). Total num frames: 226492416. Throughput: 0: 11400.5. Samples: 56672256. 
Policy #0 lag: (min: 24.0, avg: 148.5, max: 280.0) [2024-06-15 12:58:35,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 12:58:36,692][1652475] Updated weights for policy 0, policy_version 110595 (0.0137) [2024-06-15 12:58:38,711][1652475] Updated weights for policy 0, policy_version 110658 (0.0015) [2024-06-15 12:58:40,738][1648984] Fps is (10 sec: 42597.0, 60 sec: 46421.5, 300 sec: 43542.5). Total num frames: 226754560. Throughput: 0: 11480.1. Samples: 56737280. Policy #0 lag: (min: 24.0, avg: 148.5, max: 280.0) [2024-06-15 12:58:40,739][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 12:58:42,999][1652475] Updated weights for policy 0, policy_version 110736 (0.0014) [2024-06-15 12:58:45,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 43690.6, 300 sec: 43209.3). Total num frames: 226885632. Throughput: 0: 11161.6. Samples: 56796672. Policy #0 lag: (min: 24.0, avg: 148.5, max: 280.0) [2024-06-15 12:58:45,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 12:58:46,491][1651340] Signal inference workers to stop experience collection... (5750 times) [2024-06-15 12:58:46,520][1652475] InferenceWorker_p0-w0: stopping experience collection (5750 times) [2024-06-15 12:58:46,751][1651340] Signal inference workers to resume experience collection... (5750 times) [2024-06-15 12:58:46,751][1652475] InferenceWorker_p0-w0: resuming experience collection (5750 times) [2024-06-15 12:58:46,754][1652475] Updated weights for policy 0, policy_version 110816 (0.0016) [2024-06-15 12:58:50,738][1648984] Fps is (10 sec: 29492.1, 60 sec: 44236.8, 300 sec: 42765.0). Total num frames: 227049472. Throughput: 0: 11013.7. Samples: 56825344. 
Policy #0 lag: (min: 24.0, avg: 148.5, max: 280.0) [2024-06-15 12:58:50,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 12:58:50,765][1652475] Updated weights for policy 0, policy_version 110866 (0.0019) [2024-06-15 12:58:51,996][1652475] Updated weights for policy 0, policy_version 110913 (0.0014) [2024-06-15 12:58:55,655][1652475] Updated weights for policy 0, policy_version 110992 (0.0125) [2024-06-15 12:58:55,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 43690.7, 300 sec: 43209.3). Total num frames: 227311616. Throughput: 0: 11036.4. Samples: 56890368. Policy #0 lag: (min: 15.0, avg: 127.7, max: 271.0) [2024-06-15 12:58:55,796][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 12:58:56,430][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000111024_227377152.pth... [2024-06-15 12:58:56,471][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000105984_217055232.pth [2024-06-15 12:58:56,590][1652475] Updated weights for policy 0, policy_version 111032 (0.0018) [2024-06-15 12:58:58,990][1652475] Updated weights for policy 0, policy_version 111088 (0.0013) [2024-06-15 12:59:00,738][1648984] Fps is (10 sec: 49152.1, 60 sec: 44783.0, 300 sec: 43320.4). Total num frames: 227540992. Throughput: 0: 11082.0. Samples: 56962560. Policy #0 lag: (min: 15.0, avg: 127.7, max: 271.0) [2024-06-15 12:59:00,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 12:59:02,932][1652475] Updated weights for policy 0, policy_version 111152 (0.0027) [2024-06-15 12:59:04,917][1652475] Updated weights for policy 0, policy_version 111232 (0.0022) [2024-06-15 12:59:05,738][1648984] Fps is (10 sec: 49151.9, 60 sec: 43690.6, 300 sec: 43098.3). Total num frames: 227803136. Throughput: 0: 11002.4. Samples: 56993792. 
Policy #0 lag: (min: 15.0, avg: 127.7, max: 271.0) [2024-06-15 12:59:05,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 12:59:10,738][1648984] Fps is (10 sec: 42597.9, 60 sec: 44236.7, 300 sec: 43320.4). Total num frames: 227966976. Throughput: 0: 10854.4. Samples: 57056256. Policy #0 lag: (min: 15.0, avg: 127.7, max: 271.0) [2024-06-15 12:59:10,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 12:59:11,313][1652475] Updated weights for policy 0, policy_version 111360 (0.0015) [2024-06-15 12:59:15,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 43690.7, 300 sec: 43098.3). Total num frames: 228196352. Throughput: 0: 10843.0. Samples: 57123840. Policy #0 lag: (min: 15.0, avg: 127.7, max: 271.0) [2024-06-15 12:59:15,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 12:59:15,867][1652475] Updated weights for policy 0, policy_version 111440 (0.0015) [2024-06-15 12:59:19,208][1652475] Updated weights for policy 0, policy_version 111490 (0.0012) [2024-06-15 12:59:20,740][1648984] Fps is (10 sec: 49152.2, 60 sec: 43690.8, 300 sec: 43542.6). Total num frames: 228458496. Throughput: 0: 10661.0. Samples: 57152000. Policy #0 lag: (min: 15.0, avg: 127.7, max: 271.0) [2024-06-15 12:59:20,741][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 12:59:22,344][1652475] Updated weights for policy 0, policy_version 111553 (0.0015) [2024-06-15 12:59:25,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 43690.7, 300 sec: 43209.3). Total num frames: 228589568. Throughput: 0: 10786.2. Samples: 57222656. 
Policy #0 lag: (min: 15.0, avg: 127.7, max: 271.0) [2024-06-15 12:59:25,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 12:59:26,340][1652475] Updated weights for policy 0, policy_version 111632 (0.0014) [2024-06-15 12:59:27,949][1652475] Updated weights for policy 0, policy_version 111699 (0.0014) [2024-06-15 12:59:29,130][1652475] Updated weights for policy 0, policy_version 111744 (0.0037) [2024-06-15 12:59:30,738][1648984] Fps is (10 sec: 39322.1, 60 sec: 42052.3, 300 sec: 43098.3). Total num frames: 228851712. Throughput: 0: 10877.2. Samples: 57286144. Policy #0 lag: (min: 15.0, avg: 127.7, max: 271.0) [2024-06-15 12:59:30,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 12:59:31,598][1651340] Signal inference workers to stop experience collection... (5800 times) [2024-06-15 12:59:31,661][1652475] InferenceWorker_p0-w0: stopping experience collection (5800 times) [2024-06-15 12:59:31,842][1651340] Signal inference workers to resume experience collection... (5800 times) [2024-06-15 12:59:31,843][1652475] InferenceWorker_p0-w0: resuming experience collection (5800 times) [2024-06-15 12:59:34,219][1652475] Updated weights for policy 0, policy_version 111824 (0.0029) [2024-06-15 12:59:35,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 229113856. Throughput: 0: 10956.8. Samples: 57318400. Policy #0 lag: (min: 15.0, avg: 127.7, max: 271.0) [2024-06-15 12:59:35,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 12:59:38,575][1652475] Updated weights for policy 0, policy_version 111888 (0.0016) [2024-06-15 12:59:40,738][1648984] Fps is (10 sec: 45875.0, 60 sec: 42598.6, 300 sec: 43320.6). Total num frames: 229310464. Throughput: 0: 11036.5. Samples: 57387008. 
Policy #0 lag: (min: 15.0, avg: 127.7, max: 271.0) [2024-06-15 12:59:40,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 12:59:40,777][1652475] Updated weights for policy 0, policy_version 111973 (0.0191) [2024-06-15 12:59:43,551][1652475] Updated weights for policy 0, policy_version 112048 (0.0014) [2024-06-15 12:59:45,738][1648984] Fps is (10 sec: 39321.9, 60 sec: 43690.7, 300 sec: 43320.4). Total num frames: 229507072. Throughput: 0: 10672.4. Samples: 57442816. Policy #0 lag: (min: 15.0, avg: 127.7, max: 271.0) [2024-06-15 12:59:45,740][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 12:59:49,876][1652475] Updated weights for policy 0, policy_version 112117 (0.0102) [2024-06-15 12:59:50,738][1648984] Fps is (10 sec: 36044.9, 60 sec: 43690.7, 300 sec: 42987.2). Total num frames: 229670912. Throughput: 0: 10774.8. Samples: 57478656. Policy #0 lag: (min: 15.0, avg: 127.7, max: 271.0) [2024-06-15 12:59:50,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 12:59:51,385][1652475] Updated weights for policy 0, policy_version 112176 (0.0013) [2024-06-15 12:59:53,477][1652475] Updated weights for policy 0, policy_version 112214 (0.0021) [2024-06-15 12:59:55,303][1652475] Updated weights for policy 0, policy_version 112304 (0.0091) [2024-06-15 12:59:55,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 45329.1, 300 sec: 43542.6). Total num frames: 230031360. Throughput: 0: 10604.1. Samples: 57533440. Policy #0 lag: (min: 15.0, avg: 127.7, max: 271.0) [2024-06-15 12:59:55,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 13:00:00,738][1648984] Fps is (10 sec: 36044.8, 60 sec: 41506.2, 300 sec: 43098.3). Total num frames: 230031360. Throughput: 0: 10740.6. Samples: 57607168. 
Policy #0 lag: (min: 15.0, avg: 127.7, max: 271.0) [2024-06-15 13:00:00,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 13:00:01,901][1652475] Updated weights for policy 0, policy_version 112344 (0.0012) [2024-06-15 13:00:03,335][1652475] Updated weights for policy 0, policy_version 112404 (0.0013) [2024-06-15 13:00:04,183][1652475] Updated weights for policy 0, policy_version 112448 (0.0012) [2024-06-15 13:00:05,738][1648984] Fps is (10 sec: 32767.8, 60 sec: 42598.5, 300 sec: 43320.4). Total num frames: 230359040. Throughput: 0: 10752.0. Samples: 57635840. Policy #0 lag: (min: 15.0, avg: 59.5, max: 207.0) [2024-06-15 13:00:05,740][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 13:00:06,103][1652475] Updated weights for policy 0, policy_version 112501 (0.0018) [2024-06-15 13:00:07,578][1652475] Updated weights for policy 0, policy_version 112575 (0.0015) [2024-06-15 13:00:10,738][1648984] Fps is (10 sec: 52428.1, 60 sec: 43144.6, 300 sec: 43098.2). Total num frames: 230555648. Throughput: 0: 10729.2. Samples: 57705472. Policy #0 lag: (min: 15.0, avg: 59.5, max: 207.0) [2024-06-15 13:00:10,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 13:00:13,856][1652475] Updated weights for policy 0, policy_version 112630 (0.0012) [2024-06-15 13:00:15,410][1652475] Updated weights for policy 0, policy_version 112704 (0.0099) [2024-06-15 13:00:15,738][1648984] Fps is (10 sec: 45875.2, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 230817792. Throughput: 0: 10763.4. Samples: 57770496. Policy #0 lag: (min: 15.0, avg: 59.5, max: 207.0) [2024-06-15 13:00:15,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 13:00:16,429][1651340] Signal inference workers to stop experience collection... (5850 times) [2024-06-15 13:00:16,507][1652475] InferenceWorker_p0-w0: stopping experience collection (5850 times) [2024-06-15 13:00:16,683][1651340] Signal inference workers to resume experience collection... 
(5850 times) [2024-06-15 13:00:16,684][1652475] InferenceWorker_p0-w0: resuming experience collection (5850 times) [2024-06-15 13:00:18,100][1652475] Updated weights for policy 0, policy_version 112769 (0.0137) [2024-06-15 13:00:20,738][1648984] Fps is (10 sec: 52429.1, 60 sec: 43690.7, 300 sec: 43653.6). Total num frames: 231079936. Throughput: 0: 10763.4. Samples: 57802752. Policy #0 lag: (min: 15.0, avg: 59.5, max: 207.0) [2024-06-15 13:00:20,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 13:00:25,158][1652475] Updated weights for policy 0, policy_version 112835 (0.0014) [2024-06-15 13:00:25,738][1648984] Fps is (10 sec: 29491.2, 60 sec: 42052.3, 300 sec: 42876.1). Total num frames: 231112704. Throughput: 0: 10774.8. Samples: 57871872. Policy #0 lag: (min: 15.0, avg: 59.5, max: 207.0) [2024-06-15 13:00:25,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 13:00:27,782][1652475] Updated weights for policy 0, policy_version 112955 (0.0083) [2024-06-15 13:00:29,618][1652475] Updated weights for policy 0, policy_version 113018 (0.0014) [2024-06-15 13:00:30,740][1648984] Fps is (10 sec: 42598.2, 60 sec: 44236.7, 300 sec: 43653.6). Total num frames: 231505920. Throughput: 0: 10774.7. Samples: 57927680. Policy #0 lag: (min: 15.0, avg: 59.5, max: 207.0) [2024-06-15 13:00:30,741][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 13:00:31,316][1652475] Updated weights for policy 0, policy_version 113059 (0.0011) [2024-06-15 13:00:35,738][1648984] Fps is (10 sec: 49151.9, 60 sec: 41506.2, 300 sec: 43098.3). Total num frames: 231604224. Throughput: 0: 10752.0. Samples: 57962496. 
Policy #0 lag: (min: 15.0, avg: 59.5, max: 207.0) [2024-06-15 13:00:35,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 13:00:37,733][1652475] Updated weights for policy 0, policy_version 113104 (0.0012) [2024-06-15 13:00:39,890][1652475] Updated weights for policy 0, policy_version 113188 (0.0095) [2024-06-15 13:00:40,738][1648984] Fps is (10 sec: 36044.8, 60 sec: 42598.4, 300 sec: 43431.5). Total num frames: 231866368. Throughput: 0: 10990.9. Samples: 58028032. Policy #0 lag: (min: 15.0, avg: 59.5, max: 207.0) [2024-06-15 13:00:40,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 13:00:40,939][1652475] Updated weights for policy 0, policy_version 113235 (0.0025) [2024-06-15 13:00:42,931][1652475] Updated weights for policy 0, policy_version 113312 (0.0070) [2024-06-15 13:00:45,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 43690.6, 300 sec: 43542.6). Total num frames: 232128512. Throughput: 0: 10763.4. Samples: 58091520. Policy #0 lag: (min: 15.0, avg: 59.5, max: 207.0) [2024-06-15 13:00:45,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 13:00:49,771][1652475] Updated weights for policy 0, policy_version 113362 (0.0014) [2024-06-15 13:00:50,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 43144.5, 300 sec: 43098.3). Total num frames: 232259584. Throughput: 0: 11025.1. Samples: 58131968. Policy #0 lag: (min: 15.0, avg: 59.5, max: 207.0) [2024-06-15 13:00:50,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 13:00:51,977][1652475] Updated weights for policy 0, policy_version 113468 (0.0138) [2024-06-15 13:00:53,889][1652475] Updated weights for policy 0, policy_version 113520 (0.0013) [2024-06-15 13:00:55,738][1648984] Fps is (10 sec: 45875.0, 60 sec: 42598.3, 300 sec: 43764.7). Total num frames: 232587264. Throughput: 0: 10740.6. Samples: 58188800. 
Policy #0 lag: (min: 15.0, avg: 59.5, max: 207.0) [2024-06-15 13:00:55,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 13:00:56,192][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000113584_232620032.pth... [2024-06-15 13:00:56,229][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000108432_222068736.pth [2024-06-15 13:00:56,465][1652475] Updated weights for policy 0, policy_version 113593 (0.0013) [2024-06-15 13:01:00,740][1648984] Fps is (10 sec: 39321.4, 60 sec: 43690.6, 300 sec: 42987.2). Total num frames: 232652800. Throughput: 0: 10683.7. Samples: 58251264. Policy #0 lag: (min: 15.0, avg: 59.5, max: 207.0) [2024-06-15 13:01:00,741][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 13:01:03,720][1652475] Updated weights for policy 0, policy_version 113681 (0.0014) [2024-06-15 13:01:04,157][1651340] Signal inference workers to stop experience collection... (5900 times) [2024-06-15 13:01:04,185][1652475] InferenceWorker_p0-w0: stopping experience collection (5900 times) [2024-06-15 13:01:04,453][1651340] Signal inference workers to resume experience collection... (5900 times) [2024-06-15 13:01:04,461][1652475] InferenceWorker_p0-w0: resuming experience collection (5900 times) [2024-06-15 13:01:05,738][1648984] Fps is (10 sec: 32768.0, 60 sec: 42598.3, 300 sec: 43098.2). Total num frames: 232914944. Throughput: 0: 10569.9. Samples: 58278400. Policy #0 lag: (min: 47.0, avg: 104.2, max: 287.0) [2024-06-15 13:01:05,738][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 13:01:07,285][1652475] Updated weights for policy 0, policy_version 113746 (0.0015) [2024-06-15 13:01:09,086][1652475] Updated weights for policy 0, policy_version 113840 (0.0087) [2024-06-15 13:01:10,762][1648984] Fps is (10 sec: 52300.5, 60 sec: 43672.8, 300 sec: 43094.7). Total num frames: 233177088. Throughput: 0: 10279.9. Samples: 58334720. 
Policy #0 lag: (min: 47.0, avg: 104.2, max: 287.0) [2024-06-15 13:01:10,763][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 13:01:15,644][1652475] Updated weights for policy 0, policy_version 113909 (0.0091) [2024-06-15 13:01:15,738][1648984] Fps is (10 sec: 36044.9, 60 sec: 40960.0, 300 sec: 42876.1). Total num frames: 233275392. Throughput: 0: 10604.1. Samples: 58404864. Policy #0 lag: (min: 47.0, avg: 104.2, max: 287.0) [2024-06-15 13:01:15,738][1648984] Avg episode reward: [(0, '-0.550')] [2024-06-15 13:01:17,424][1652475] Updated weights for policy 0, policy_version 113952 (0.0011) [2024-06-15 13:01:18,825][1652475] Updated weights for policy 0, policy_version 114001 (0.0039) [2024-06-15 13:01:20,520][1652475] Updated weights for policy 0, policy_version 114082 (0.0015) [2024-06-15 13:01:20,738][1648984] Fps is (10 sec: 45988.6, 60 sec: 42598.4, 300 sec: 43431.5). Total num frames: 233635840. Throughput: 0: 10524.5. Samples: 58436096. Policy #0 lag: (min: 47.0, avg: 104.2, max: 287.0) [2024-06-15 13:01:20,738][1648984] Avg episode reward: [(0, '-0.550')] [2024-06-15 13:01:25,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 43144.5, 300 sec: 43098.2). Total num frames: 233701376. Throughput: 0: 10581.3. Samples: 58504192. Policy #0 lag: (min: 47.0, avg: 104.2, max: 287.0) [2024-06-15 13:01:25,738][1648984] Avg episode reward: [(0, '-0.550')] [2024-06-15 13:01:27,146][1652475] Updated weights for policy 0, policy_version 114160 (0.0013) [2024-06-15 13:01:28,243][1652475] Updated weights for policy 0, policy_version 114181 (0.0012) [2024-06-15 13:01:29,511][1652475] Updated weights for policy 0, policy_version 114236 (0.0012) [2024-06-15 13:01:30,738][1648984] Fps is (10 sec: 36044.6, 60 sec: 41506.2, 300 sec: 43209.3). Total num frames: 233996288. Throughput: 0: 10706.5. Samples: 58573312. 
Policy #0 lag: (min: 47.0, avg: 104.2, max: 287.0) [2024-06-15 13:01:30,738][1648984] Avg episode reward: [(0, '-0.550')] [2024-06-15 13:01:31,520][1652475] Updated weights for policy 0, policy_version 114304 (0.0108) [2024-06-15 13:01:35,738][1648984] Fps is (10 sec: 52427.8, 60 sec: 43690.5, 300 sec: 43098.2). Total num frames: 234225664. Throughput: 0: 10353.7. Samples: 58597888. Policy #0 lag: (min: 47.0, avg: 104.2, max: 287.0) [2024-06-15 13:01:35,739][1648984] Avg episode reward: [(0, '-0.550')] [2024-06-15 13:01:38,265][1652475] Updated weights for policy 0, policy_version 114374 (0.0016) [2024-06-15 13:01:39,550][1652475] Updated weights for policy 0, policy_version 114432 (0.0012) [2024-06-15 13:01:40,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 42052.3, 300 sec: 43098.3). Total num frames: 234389504. Throughput: 0: 10763.4. Samples: 58673152. Policy #0 lag: (min: 47.0, avg: 104.2, max: 287.0) [2024-06-15 13:01:40,738][1648984] Avg episode reward: [(0, '-0.550')] [2024-06-15 13:01:41,588][1652475] Updated weights for policy 0, policy_version 114492 (0.0012) [2024-06-15 13:01:43,439][1652475] Updated weights for policy 0, policy_version 114560 (0.0133) [2024-06-15 13:01:44,759][1652475] Updated weights for policy 0, policy_version 114624 (0.0124) [2024-06-15 13:01:45,743][1648984] Fps is (10 sec: 52403.8, 60 sec: 43687.0, 300 sec: 43652.9). Total num frames: 234749952. Throughput: 0: 10648.4. Samples: 58730496. Policy #0 lag: (min: 47.0, avg: 104.2, max: 287.0) [2024-06-15 13:01:45,744][1648984] Avg episode reward: [(0, '-0.550')] [2024-06-15 13:01:50,738][1648984] Fps is (10 sec: 36045.0, 60 sec: 41506.2, 300 sec: 43101.8). Total num frames: 234749952. Throughput: 0: 10911.3. Samples: 58769408. Policy #0 lag: (min: 47.0, avg: 104.2, max: 287.0) [2024-06-15 13:01:50,738][1648984] Avg episode reward: [(0, '-0.550')] [2024-06-15 13:01:51,863][1651340] Signal inference workers to stop experience collection... 
(5950 times) [2024-06-15 13:01:51,903][1652475] InferenceWorker_p0-w0: stopping experience collection (5950 times) [2024-06-15 13:01:52,131][1651340] Signal inference workers to resume experience collection... (5950 times) [2024-06-15 13:01:52,132][1652475] InferenceWorker_p0-w0: resuming experience collection (5950 times) [2024-06-15 13:01:52,134][1652475] Updated weights for policy 0, policy_version 114688 (0.0017) [2024-06-15 13:01:53,653][1652475] Updated weights for policy 0, policy_version 114746 (0.0012) [2024-06-15 13:01:55,738][1648984] Fps is (10 sec: 36062.8, 60 sec: 42052.3, 300 sec: 43431.5). Total num frames: 235110400. Throughput: 0: 11008.3. Samples: 58829824. Policy #0 lag: (min: 47.0, avg: 104.2, max: 287.0) [2024-06-15 13:01:55,738][1648984] Avg episode reward: [(0, '-0.550')] [2024-06-15 13:01:56,553][1652475] Updated weights for policy 0, policy_version 114833 (0.0028) [2024-06-15 13:01:57,617][1652475] Updated weights for policy 0, policy_version 114878 (0.0013) [2024-06-15 13:02:00,738][1648984] Fps is (10 sec: 52429.1, 60 sec: 43690.8, 300 sec: 43098.3). Total num frames: 235274240. Throughput: 0: 10808.9. Samples: 58891264. Policy #0 lag: (min: 47.0, avg: 104.2, max: 287.0) [2024-06-15 13:02:00,738][1648984] Avg episode reward: [(0, '-0.550')] [2024-06-15 13:02:04,913][1652475] Updated weights for policy 0, policy_version 114944 (0.0043) [2024-06-15 13:02:05,742][1648984] Fps is (10 sec: 36028.9, 60 sec: 42595.3, 300 sec: 43208.8). Total num frames: 235470848. Throughput: 0: 10967.1. Samples: 58929664. Policy #0 lag: (min: 41.0, avg: 96.3, max: 265.0) [2024-06-15 13:02:05,743][1648984] Avg episode reward: [(0, '-0.590')] [2024-06-15 13:02:06,317][1652475] Updated weights for policy 0, policy_version 115008 (0.0014) [2024-06-15 13:02:09,692][1652475] Updated weights for policy 0, policy_version 115088 (0.0207) [2024-06-15 13:02:10,738][1648984] Fps is (10 sec: 52428.1, 60 sec: 43708.5, 300 sec: 43653.6). Total num frames: 235798528. 
Throughput: 0: 10558.6. Samples: 58979328. Policy #0 lag: (min: 41.0, avg: 96.3, max: 265.0) [2024-06-15 13:02:10,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 13:02:15,641][1652475] Updated weights for policy 0, policy_version 115168 (0.0016) [2024-06-15 13:02:15,738][1648984] Fps is (10 sec: 39339.3, 60 sec: 43144.6, 300 sec: 43320.4). Total num frames: 235864064. Throughput: 0: 10467.6. Samples: 59044352. Policy #0 lag: (min: 41.0, avg: 96.3, max: 265.0) [2024-06-15 13:02:15,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 13:02:19,191][1652475] Updated weights for policy 0, policy_version 115234 (0.0021) [2024-06-15 13:02:20,738][1648984] Fps is (10 sec: 29491.6, 60 sec: 40960.0, 300 sec: 43209.3). Total num frames: 236093440. Throughput: 0: 10604.2. Samples: 59075072. Policy #0 lag: (min: 41.0, avg: 96.3, max: 265.0) [2024-06-15 13:02:20,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 13:02:21,448][1652475] Updated weights for policy 0, policy_version 115326 (0.0019) [2024-06-15 13:02:25,018][1652475] Updated weights for policy 0, policy_version 115386 (0.0019) [2024-06-15 13:02:25,738][1648984] Fps is (10 sec: 45874.6, 60 sec: 43690.7, 300 sec: 43098.2). Total num frames: 236322816. Throughput: 0: 10285.5. Samples: 59136000. Policy #0 lag: (min: 41.0, avg: 96.3, max: 265.0) [2024-06-15 13:02:25,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 13:02:28,323][1652475] Updated weights for policy 0, policy_version 115455 (0.0013) [2024-06-15 13:02:30,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 41506.2, 300 sec: 43209.3). Total num frames: 236486656. Throughput: 0: 10537.0. Samples: 59204608. 
Policy #0 lag: (min: 41.0, avg: 96.3, max: 265.0) [2024-06-15 13:02:30,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 13:02:31,858][1652475] Updated weights for policy 0, policy_version 115514 (0.0014) [2024-06-15 13:02:33,459][1652475] Updated weights for policy 0, policy_version 115580 (0.0013) [2024-06-15 13:02:35,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 41506.3, 300 sec: 43209.4). Total num frames: 236716032. Throughput: 0: 10194.5. Samples: 59228160. Policy #0 lag: (min: 41.0, avg: 96.3, max: 265.0) [2024-06-15 13:02:35,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 13:02:36,547][1651340] Signal inference workers to stop experience collection... (6000 times) [2024-06-15 13:02:36,620][1652475] InferenceWorker_p0-w0: stopping experience collection (6000 times) [2024-06-15 13:02:36,852][1651340] Signal inference workers to resume experience collection... (6000 times) [2024-06-15 13:02:36,866][1652475] InferenceWorker_p0-w0: resuming experience collection (6000 times) [2024-06-15 13:02:36,869][1652475] Updated weights for policy 0, policy_version 115632 (0.0012) [2024-06-15 13:02:40,427][1652475] Updated weights for policy 0, policy_version 115705 (0.0020) [2024-06-15 13:02:40,738][1648984] Fps is (10 sec: 49151.8, 60 sec: 43144.6, 300 sec: 43098.2). Total num frames: 236978176. Throughput: 0: 10478.9. Samples: 59301376. Policy #0 lag: (min: 41.0, avg: 96.3, max: 265.0) [2024-06-15 13:02:40,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 13:02:44,628][1652475] Updated weights for policy 0, policy_version 115778 (0.0014) [2024-06-15 13:02:45,738][1648984] Fps is (10 sec: 49151.6, 60 sec: 40963.3, 300 sec: 43431.5). Total num frames: 237207552. Throughput: 0: 10331.0. Samples: 59356160. 
Policy #0 lag: (min: 41.0, avg: 96.3, max: 265.0) [2024-06-15 13:02:45,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 13:02:45,884][1652475] Updated weights for policy 0, policy_version 115840 (0.0013) [2024-06-15 13:02:49,596][1652475] Updated weights for policy 0, policy_version 115902 (0.0012) [2024-06-15 13:02:50,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 43690.6, 300 sec: 42987.2). Total num frames: 237371392. Throughput: 0: 10343.4. Samples: 59395072. Policy #0 lag: (min: 41.0, avg: 96.3, max: 265.0) [2024-06-15 13:02:50,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 13:02:55,462][1652475] Updated weights for policy 0, policy_version 116000 (0.0014) [2024-06-15 13:02:55,739][1648984] Fps is (10 sec: 36039.0, 60 sec: 40958.8, 300 sec: 43098.0). Total num frames: 237568000. Throughput: 0: 10717.5. Samples: 59461632. Policy #0 lag: (min: 41.0, avg: 96.3, max: 265.0) [2024-06-15 13:02:55,740][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 13:02:56,183][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000116032_237633536.pth... [2024-06-15 13:02:56,234][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000111024_227377152.pth [2024-06-15 13:02:57,058][1652475] Updated weights for policy 0, policy_version 116064 (0.0038) [2024-06-15 13:03:00,509][1652475] Updated weights for policy 0, policy_version 116113 (0.0038) [2024-06-15 13:03:00,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 42052.2, 300 sec: 42765.0). Total num frames: 237797376. Throughput: 0: 10729.2. Samples: 59527168. Policy #0 lag: (min: 41.0, avg: 96.3, max: 265.0) [2024-06-15 13:03:00,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 13:03:03,583][1652475] Updated weights for policy 0, policy_version 116192 (0.0015) [2024-06-15 13:03:05,738][1648984] Fps is (10 sec: 45883.3, 60 sec: 42601.6, 300 sec: 43098.3). Total num frames: 238026752. Throughput: 0: 10763.4. 
Samples: 59559424. Policy #0 lag: (min: 41.0, avg: 96.3, max: 265.0) [2024-06-15 13:03:05,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 13:03:06,761][1652475] Updated weights for policy 0, policy_version 116243 (0.0076) [2024-06-15 13:03:07,540][1652475] Updated weights for policy 0, policy_version 116287 (0.0094) [2024-06-15 13:03:10,738][1648984] Fps is (10 sec: 49151.9, 60 sec: 41506.1, 300 sec: 43098.2). Total num frames: 238288896. Throughput: 0: 10831.6. Samples: 59623424. Policy #0 lag: (min: 15.0, avg: 113.0, max: 271.0) [2024-06-15 13:03:10,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 13:03:12,004][1652475] Updated weights for policy 0, policy_version 116357 (0.0019) [2024-06-15 13:03:14,806][1652475] Updated weights for policy 0, policy_version 116417 (0.0028) [2024-06-15 13:03:15,738][1648984] Fps is (10 sec: 45875.0, 60 sec: 43690.6, 300 sec: 42876.1). Total num frames: 238485504. Throughput: 0: 10820.3. Samples: 59691520. Policy #0 lag: (min: 15.0, avg: 113.0, max: 271.0) [2024-06-15 13:03:15,738][1648984] Avg episode reward: [(0, '-0.520')] [2024-06-15 13:03:19,774][1652475] Updated weights for policy 0, policy_version 116482 (0.0015) [2024-06-15 13:03:20,738][1648984] Fps is (10 sec: 32768.3, 60 sec: 42052.2, 300 sec: 42876.1). Total num frames: 238616576. Throughput: 0: 11025.1. Samples: 59724288. Policy #0 lag: (min: 15.0, avg: 113.0, max: 271.0) [2024-06-15 13:03:20,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 13:03:21,436][1652475] Updated weights for policy 0, policy_version 116545 (0.0144) [2024-06-15 13:03:24,090][1652475] Updated weights for policy 0, policy_version 116624 (0.0017) [2024-06-15 13:03:25,147][1652475] Updated weights for policy 0, policy_version 116671 (0.0012) [2024-06-15 13:03:25,738][1648984] Fps is (10 sec: 45874.9, 60 sec: 43690.7, 300 sec: 42765.0). Total num frames: 238944256. Throughput: 0: 10592.7. Samples: 59778048. 
Policy #0 lag: (min: 15.0, avg: 113.0, max: 271.0) [2024-06-15 13:03:25,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 13:03:28,590][1651340] Signal inference workers to stop experience collection... (6050 times) [2024-06-15 13:03:28,621][1652475] InferenceWorker_p0-w0: stopping experience collection (6050 times) [2024-06-15 13:03:28,850][1651340] Signal inference workers to resume experience collection... (6050 times) [2024-06-15 13:03:28,851][1652475] InferenceWorker_p0-w0: resuming experience collection (6050 times) [2024-06-15 13:03:30,740][1648984] Fps is (10 sec: 45874.7, 60 sec: 43144.4, 300 sec: 42653.9). Total num frames: 239075328. Throughput: 0: 10854.4. Samples: 59844608. Policy #0 lag: (min: 15.0, avg: 113.0, max: 271.0) [2024-06-15 13:03:30,741][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 13:03:32,360][1652475] Updated weights for policy 0, policy_version 116768 (0.0012) [2024-06-15 13:03:33,851][1652475] Updated weights for policy 0, policy_version 116816 (0.0013) [2024-06-15 13:03:35,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 43690.6, 300 sec: 42654.0). Total num frames: 239337472. Throughput: 0: 10649.6. Samples: 59874304. Policy #0 lag: (min: 15.0, avg: 113.0, max: 271.0) [2024-06-15 13:03:35,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 13:03:35,937][1652475] Updated weights for policy 0, policy_version 116880 (0.0036) [2024-06-15 13:03:40,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 41506.1, 300 sec: 42653.9). Total num frames: 239468544. Throughput: 0: 10695.5. Samples: 59942912. 
Policy #0 lag: (min: 15.0, avg: 113.0, max: 271.0) [2024-06-15 13:03:40,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 13:03:41,080][1652475] Updated weights for policy 0, policy_version 116935 (0.0035) [2024-06-15 13:03:44,187][1652475] Updated weights for policy 0, policy_version 117012 (0.0013) [2024-06-15 13:03:45,489][1652475] Updated weights for policy 0, policy_version 117064 (0.0012) [2024-06-15 13:03:45,738][1648984] Fps is (10 sec: 42598.6, 60 sec: 42598.4, 300 sec: 43098.2). Total num frames: 239763456. Throughput: 0: 10683.7. Samples: 60007936. Policy #0 lag: (min: 15.0, avg: 113.0, max: 271.0) [2024-06-15 13:03:45,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 13:03:48,290][1652475] Updated weights for policy 0, policy_version 117136 (0.0016) [2024-06-15 13:03:50,738][1648984] Fps is (10 sec: 52429.1, 60 sec: 43690.7, 300 sec: 42987.2). Total num frames: 239992832. Throughput: 0: 10695.1. Samples: 60040704. Policy #0 lag: (min: 15.0, avg: 113.0, max: 271.0) [2024-06-15 13:03:50,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 13:03:53,130][1652475] Updated weights for policy 0, policy_version 117188 (0.0015) [2024-06-15 13:03:55,738][1648984] Fps is (10 sec: 39320.6, 60 sec: 43145.5, 300 sec: 42765.0). Total num frames: 240156672. Throughput: 0: 10717.8. Samples: 60105728. Policy #0 lag: (min: 15.0, avg: 113.0, max: 271.0) [2024-06-15 13:03:55,739][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 13:03:55,802][1652475] Updated weights for policy 0, policy_version 117264 (0.0115) [2024-06-15 13:03:56,747][1652475] Updated weights for policy 0, policy_version 117310 (0.0015) [2024-06-15 13:03:58,567][1652475] Updated weights for policy 0, policy_version 117372 (0.0014) [2024-06-15 13:04:00,738][1648984] Fps is (10 sec: 42598.6, 60 sec: 43690.7, 300 sec: 42765.0). Total num frames: 240418816. Throughput: 0: 10661.0. Samples: 60171264. 
Policy #0 lag: (min: 15.0, avg: 113.0, max: 271.0) [2024-06-15 13:04:00,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 13:04:01,034][1652475] Updated weights for policy 0, policy_version 117409 (0.0015) [2024-06-15 13:04:05,678][1652475] Updated weights for policy 0, policy_version 117475 (0.0018) [2024-06-15 13:04:05,738][1648984] Fps is (10 sec: 42599.6, 60 sec: 42598.3, 300 sec: 42765.0). Total num frames: 240582656. Throughput: 0: 10683.7. Samples: 60205056. Policy #0 lag: (min: 15.0, avg: 113.0, max: 271.0) [2024-06-15 13:04:05,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 13:04:09,326][1652475] Updated weights for policy 0, policy_version 117563 (0.0015) [2024-06-15 13:04:10,738][1648984] Fps is (10 sec: 42598.0, 60 sec: 42598.4, 300 sec: 42876.1). Total num frames: 240844800. Throughput: 0: 10786.1. Samples: 60263424. Policy #0 lag: (min: 10.0, avg: 108.5, max: 266.0) [2024-06-15 13:04:10,739][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 13:04:10,994][1652475] Updated weights for policy 0, policy_version 117626 (0.0014) [2024-06-15 13:04:14,396][1652475] Updated weights for policy 0, policy_version 117687 (0.0015) [2024-06-15 13:04:15,738][1648984] Fps is (10 sec: 45875.6, 60 sec: 42598.4, 300 sec: 42654.0). Total num frames: 241041408. Throughput: 0: 10763.4. Samples: 60328960. Policy #0 lag: (min: 10.0, avg: 108.5, max: 266.0) [2024-06-15 13:04:15,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 13:04:17,237][1651340] Signal inference workers to stop experience collection... (6100 times) [2024-06-15 13:04:17,266][1652475] InferenceWorker_p0-w0: stopping experience collection (6100 times) [2024-06-15 13:04:17,552][1651340] Signal inference workers to resume experience collection... 
(6100 times) [2024-06-15 13:04:17,553][1652475] InferenceWorker_p0-w0: resuming experience collection (6100 times) [2024-06-15 13:04:18,475][1652475] Updated weights for policy 0, policy_version 117759 (0.0014) [2024-06-15 13:04:20,738][1648984] Fps is (10 sec: 39321.9, 60 sec: 43690.7, 300 sec: 42876.1). Total num frames: 241238016. Throughput: 0: 10820.3. Samples: 60361216. Policy #0 lag: (min: 10.0, avg: 108.5, max: 266.0) [2024-06-15 13:04:20,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 13:04:21,232][1652475] Updated weights for policy 0, policy_version 117810 (0.0012) [2024-06-15 13:04:22,881][1652475] Updated weights for policy 0, policy_version 117886 (0.0013) [2024-06-15 13:04:25,738][1648984] Fps is (10 sec: 45874.7, 60 sec: 42598.4, 300 sec: 42876.1). Total num frames: 241500160. Throughput: 0: 10808.9. Samples: 60429312. Policy #0 lag: (min: 10.0, avg: 108.5, max: 266.0) [2024-06-15 13:04:25,739][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 13:04:26,373][1652475] Updated weights for policy 0, policy_version 117952 (0.0016) [2024-06-15 13:04:30,414][1652475] Updated weights for policy 0, policy_version 118015 (0.0013) [2024-06-15 13:04:30,738][1648984] Fps is (10 sec: 45875.2, 60 sec: 43690.7, 300 sec: 42653.9). Total num frames: 241696768. Throughput: 0: 10717.9. Samples: 60490240. Policy #0 lag: (min: 10.0, avg: 108.5, max: 266.0) [2024-06-15 13:04:30,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 13:04:33,937][1652475] Updated weights for policy 0, policy_version 118080 (0.0014) [2024-06-15 13:04:35,739][1648984] Fps is (10 sec: 39317.5, 60 sec: 42597.7, 300 sec: 42653.8). Total num frames: 241893376. Throughput: 0: 10615.2. Samples: 60518400. 
Policy #0 lag: (min: 10.0, avg: 108.5, max: 266.0) [2024-06-15 13:04:35,739][1648984] Avg episode reward: [(0, '-0.550')] [2024-06-15 13:04:40,042][1652475] Updated weights for policy 0, policy_version 118163 (0.0039) [2024-06-15 13:04:40,738][1648984] Fps is (10 sec: 36044.9, 60 sec: 43144.6, 300 sec: 42542.9). Total num frames: 242057216. Throughput: 0: 10581.4. Samples: 60581888. Policy #0 lag: (min: 10.0, avg: 108.5, max: 266.0) [2024-06-15 13:04:40,740][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 13:04:41,822][1652475] Updated weights for policy 0, policy_version 118245 (0.0028) [2024-06-15 13:04:44,648][1652475] Updated weights for policy 0, policy_version 118291 (0.0014) [2024-06-15 13:04:45,738][1648984] Fps is (10 sec: 45880.0, 60 sec: 43144.5, 300 sec: 42987.2). Total num frames: 242352128. Throughput: 0: 10444.8. Samples: 60641280. Policy #0 lag: (min: 10.0, avg: 108.5, max: 266.0) [2024-06-15 13:04:45,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 13:04:48,437][1652475] Updated weights for policy 0, policy_version 118370 (0.0023) [2024-06-15 13:04:50,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 41506.1, 300 sec: 42209.6). Total num frames: 242483200. Throughput: 0: 10467.6. Samples: 60676096. Policy #0 lag: (min: 10.0, avg: 108.5, max: 266.0) [2024-06-15 13:04:50,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 13:04:52,087][1652475] Updated weights for policy 0, policy_version 118403 (0.0012) [2024-06-15 13:04:54,078][1652475] Updated weights for policy 0, policy_version 118496 (0.0086) [2024-06-15 13:04:55,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 43144.8, 300 sec: 43098.2). Total num frames: 242745344. Throughput: 0: 10592.7. Samples: 60740096. 
Policy #0 lag: (min: 10.0, avg: 108.5, max: 266.0) [2024-06-15 13:04:55,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 13:04:56,054][1652475] Updated weights for policy 0, policy_version 118545 (0.0013) [2024-06-15 13:04:56,311][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000118560_242810880.pth... [2024-06-15 13:04:56,481][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000113584_232620032.pth [2024-06-15 13:05:00,292][1652475] Updated weights for policy 0, policy_version 118604 (0.0014) [2024-06-15 13:05:00,738][1648984] Fps is (10 sec: 45875.2, 60 sec: 42052.3, 300 sec: 42653.9). Total num frames: 242941952. Throughput: 0: 10649.6. Samples: 60808192. Policy #0 lag: (min: 10.0, avg: 108.5, max: 266.0) [2024-06-15 13:05:00,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 13:05:04,041][1652475] Updated weights for policy 0, policy_version 118658 (0.0028) [2024-06-15 13:05:04,784][1651340] Signal inference workers to stop experience collection... (6150 times) [2024-06-15 13:05:04,812][1652475] InferenceWorker_p0-w0: stopping experience collection (6150 times) [2024-06-15 13:05:05,010][1651340] Signal inference workers to resume experience collection... (6150 times) [2024-06-15 13:05:05,011][1652475] InferenceWorker_p0-w0: resuming experience collection (6150 times) [2024-06-15 13:05:05,685][1652475] Updated weights for policy 0, policy_version 118736 (0.0013) [2024-06-15 13:05:05,738][1648984] Fps is (10 sec: 42598.5, 60 sec: 43144.6, 300 sec: 42765.0). Total num frames: 243171328. Throughput: 0: 10763.4. Samples: 60845568. Policy #0 lag: (min: 10.0, avg: 108.5, max: 266.0) [2024-06-15 13:05:05,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 13:05:07,865][1652475] Updated weights for policy 0, policy_version 118800 (0.0019) [2024-06-15 13:05:10,738][1648984] Fps is (10 sec: 45875.2, 60 sec: 42598.5, 300 sec: 42653.9). Total num frames: 243400704. 
Throughput: 0: 10570.0. Samples: 60904960. Policy #0 lag: (min: 10.0, avg: 108.5, max: 266.0) [2024-06-15 13:05:10,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 13:05:11,840][1652475] Updated weights for policy 0, policy_version 118864 (0.0037) [2024-06-15 13:05:15,738][1648984] Fps is (10 sec: 36044.8, 60 sec: 41506.1, 300 sec: 42209.6). Total num frames: 243531776. Throughput: 0: 10899.9. Samples: 60980736. Policy #0 lag: (min: 15.0, avg: 127.2, max: 271.0) [2024-06-15 13:05:15,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 13:05:15,744][1652475] Updated weights for policy 0, policy_version 118914 (0.0015) [2024-06-15 13:05:17,423][1652475] Updated weights for policy 0, policy_version 118988 (0.0014) [2024-06-15 13:05:19,551][1652475] Updated weights for policy 0, policy_version 119043 (0.0015) [2024-06-15 13:05:20,738][1648984] Fps is (10 sec: 49151.9, 60 sec: 44236.8, 300 sec: 43320.4). Total num frames: 243892224. Throughput: 0: 10843.3. Samples: 61006336. Policy #0 lag: (min: 15.0, avg: 127.2, max: 271.0) [2024-06-15 13:05:20,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 13:05:20,880][1652475] Updated weights for policy 0, policy_version 119098 (0.0013) [2024-06-15 13:05:24,063][1652475] Updated weights for policy 0, policy_version 119162 (0.0014) [2024-06-15 13:05:25,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 42598.4, 300 sec: 42542.9). Total num frames: 244056064. Throughput: 0: 10956.8. Samples: 61074944. Policy #0 lag: (min: 15.0, avg: 127.2, max: 271.0) [2024-06-15 13:05:25,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 13:05:28,400][1652475] Updated weights for policy 0, policy_version 119216 (0.0012) [2024-06-15 13:05:29,817][1652475] Updated weights for policy 0, policy_version 119286 (0.0014) [2024-06-15 13:05:30,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 43690.6, 300 sec: 43098.2). Total num frames: 244318208. Throughput: 0: 11218.5. Samples: 61146112. 
Policy #0 lag: (min: 15.0, avg: 127.2, max: 271.0) [2024-06-15 13:05:30,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 13:05:33,192][1652475] Updated weights for policy 0, policy_version 119352 (0.0015) [2024-06-15 13:05:35,735][1652475] Updated weights for policy 0, policy_version 119408 (0.0013) [2024-06-15 13:05:35,737][1648984] Fps is (10 sec: 49152.8, 60 sec: 44237.7, 300 sec: 42987.2). Total num frames: 244547584. Throughput: 0: 11059.2. Samples: 61173760. Policy #0 lag: (min: 15.0, avg: 127.2, max: 271.0) [2024-06-15 13:05:35,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 13:05:40,569][1652475] Updated weights for policy 0, policy_version 119475 (0.0014) [2024-06-15 13:05:40,738][1648984] Fps is (10 sec: 36044.1, 60 sec: 43690.5, 300 sec: 42542.8). Total num frames: 244678656. Throughput: 0: 11081.9. Samples: 61238784. Policy #0 lag: (min: 15.0, avg: 127.2, max: 271.0) [2024-06-15 13:05:40,739][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 13:05:45,724][1652475] Updated weights for policy 0, policy_version 119553 (0.0106) [2024-06-15 13:05:45,738][1648984] Fps is (10 sec: 29490.7, 60 sec: 41506.2, 300 sec: 42653.9). Total num frames: 244842496. Throughput: 0: 10945.4. Samples: 61300736. Policy #0 lag: (min: 15.0, avg: 127.2, max: 271.0) [2024-06-15 13:05:45,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 13:05:47,375][1652475] Updated weights for policy 0, policy_version 119619 (0.0015) [2024-06-15 13:05:48,168][1651340] Signal inference workers to stop experience collection... (6200 times) [2024-06-15 13:05:48,224][1652475] InferenceWorker_p0-w0: stopping experience collection (6200 times) [2024-06-15 13:05:48,335][1651340] Signal inference workers to resume experience collection... 
(6200 times) [2024-06-15 13:05:48,336][1652475] InferenceWorker_p0-w0: resuming experience collection (6200 times) [2024-06-15 13:05:48,658][1652475] Updated weights for policy 0, policy_version 119680 (0.0013) [2024-06-15 13:05:50,738][1648984] Fps is (10 sec: 42597.0, 60 sec: 43690.2, 300 sec: 42431.7). Total num frames: 245104640. Throughput: 0: 10672.2. Samples: 61325824. Policy #0 lag: (min: 15.0, avg: 127.2, max: 271.0) [2024-06-15 13:05:50,739][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 13:05:52,765][1652475] Updated weights for policy 0, policy_version 119744 (0.0026) [2024-06-15 13:05:55,740][1648984] Fps is (10 sec: 49151.8, 60 sec: 43144.5, 300 sec: 42987.2). Total num frames: 245334016. Throughput: 0: 10911.3. Samples: 61395968. Policy #0 lag: (min: 15.0, avg: 127.2, max: 271.0) [2024-06-15 13:05:55,741][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 13:05:55,943][1652475] Updated weights for policy 0, policy_version 119806 (0.0015) [2024-06-15 13:05:59,471][1652475] Updated weights for policy 0, policy_version 119872 (0.0110) [2024-06-15 13:06:00,738][1648984] Fps is (10 sec: 52431.9, 60 sec: 44782.9, 300 sec: 43098.3). Total num frames: 245628928. Throughput: 0: 10615.5. Samples: 61458432. Policy #0 lag: (min: 15.0, avg: 127.2, max: 271.0) [2024-06-15 13:06:00,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 13:06:03,544][1652475] Updated weights for policy 0, policy_version 119938 (0.0018) [2024-06-15 13:06:05,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 43144.5, 300 sec: 42657.5). Total num frames: 245760000. Throughput: 0: 10888.5. Samples: 61496320. 
Policy #0 lag: (min: 15.0, avg: 127.2, max: 271.0) [2024-06-15 13:06:05,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 13:06:06,262][1652475] Updated weights for policy 0, policy_version 120001 (0.0015) [2024-06-15 13:06:07,753][1652475] Updated weights for policy 0, policy_version 120063 (0.0016) [2024-06-15 13:06:10,738][1648984] Fps is (10 sec: 36044.6, 60 sec: 43144.5, 300 sec: 43098.3). Total num frames: 245989376. Throughput: 0: 10786.1. Samples: 61560320. Policy #0 lag: (min: 15.0, avg: 127.2, max: 271.0) [2024-06-15 13:06:10,739][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 13:06:11,098][1652475] Updated weights for policy 0, policy_version 120128 (0.0014) [2024-06-15 13:06:12,468][1652475] Updated weights for policy 0, policy_version 120190 (0.0013) [2024-06-15 13:06:15,738][1648984] Fps is (10 sec: 45875.1, 60 sec: 44782.9, 300 sec: 42653.9). Total num frames: 246218752. Throughput: 0: 10740.6. Samples: 61629440. Policy #0 lag: (min: 8.0, avg: 124.3, max: 264.0) [2024-06-15 13:06:15,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 13:06:18,070][1652475] Updated weights for policy 0, policy_version 120258 (0.0155) [2024-06-15 13:06:20,739][1648984] Fps is (10 sec: 42598.5, 60 sec: 42052.3, 300 sec: 43098.3). Total num frames: 246415360. Throughput: 0: 10820.2. Samples: 61660672. Policy #0 lag: (min: 8.0, avg: 124.3, max: 264.0) [2024-06-15 13:06:20,741][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 13:06:21,868][1652475] Updated weights for policy 0, policy_version 120324 (0.0014) [2024-06-15 13:06:23,803][1652475] Updated weights for policy 0, policy_version 120416 (0.0014) [2024-06-15 13:06:25,738][1648984] Fps is (10 sec: 45875.5, 60 sec: 43690.7, 300 sec: 42987.2). Total num frames: 246677504. Throughput: 0: 10672.4. Samples: 61719040. 
Policy #0 lag: (min: 8.0, avg: 124.3, max: 264.0) [2024-06-15 13:06:25,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 13:06:28,247][1652475] Updated weights for policy 0, policy_version 120496 (0.0029) [2024-06-15 13:06:30,738][1648984] Fps is (10 sec: 45875.4, 60 sec: 42598.4, 300 sec: 42876.1). Total num frames: 246874112. Throughput: 0: 10808.9. Samples: 61787136. Policy #0 lag: (min: 8.0, avg: 124.3, max: 264.0) [2024-06-15 13:06:30,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 13:06:31,406][1652475] Updated weights for policy 0, policy_version 120573 (0.0013) [2024-06-15 13:06:35,192][1652475] Updated weights for policy 0, policy_version 120624 (0.0012) [2024-06-15 13:06:35,703][1651340] Signal inference workers to stop experience collection... (6250 times) [2024-06-15 13:06:35,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 42052.2, 300 sec: 42987.2). Total num frames: 247070720. Throughput: 0: 11002.5. Samples: 61820928. Policy #0 lag: (min: 8.0, avg: 124.3, max: 264.0) [2024-06-15 13:06:35,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 13:06:35,773][1652475] InferenceWorker_p0-w0: stopping experience collection (6250 times) [2024-06-15 13:06:35,890][1651340] Signal inference workers to resume experience collection... (6250 times) [2024-06-15 13:06:35,891][1652475] InferenceWorker_p0-w0: resuming experience collection (6250 times) [2024-06-15 13:06:36,697][1652475] Updated weights for policy 0, policy_version 120703 (0.0012) [2024-06-15 13:06:40,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 43144.7, 300 sec: 42432.5). Total num frames: 247267328. Throughput: 0: 11025.1. Samples: 61892096. 
Policy #0 lag: (min: 8.0, avg: 124.3, max: 264.0) [2024-06-15 13:06:40,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 13:06:41,044][1652475] Updated weights for policy 0, policy_version 120761 (0.0139) [2024-06-15 13:06:43,352][1652475] Updated weights for policy 0, policy_version 120824 (0.0014) [2024-06-15 13:06:45,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 43690.7, 300 sec: 43098.3). Total num frames: 247463936. Throughput: 0: 10911.3. Samples: 61949440. Policy #0 lag: (min: 8.0, avg: 124.3, max: 264.0) [2024-06-15 13:06:45,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 13:06:47,058][1652475] Updated weights for policy 0, policy_version 120880 (0.0124) [2024-06-15 13:06:50,750][1648984] Fps is (10 sec: 45816.6, 60 sec: 43681.8, 300 sec: 42763.2). Total num frames: 247726080. Throughput: 0: 10726.2. Samples: 61979136. Policy #0 lag: (min: 8.0, avg: 124.3, max: 264.0) [2024-06-15 13:06:50,751][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 13:06:52,423][1652475] Updated weights for policy 0, policy_version 120961 (0.0116) [2024-06-15 13:06:54,188][1652475] Updated weights for policy 0, policy_version 121028 (0.0013) [2024-06-15 13:06:55,738][1648984] Fps is (10 sec: 52427.9, 60 sec: 44236.8, 300 sec: 43098.2). Total num frames: 247988224. Throughput: 0: 10695.1. Samples: 62041600. Policy #0 lag: (min: 8.0, avg: 124.3, max: 264.0) [2024-06-15 13:06:55,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 13:06:55,755][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000121088_247988224.pth... 
[2024-06-15 13:06:55,796][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000116032_237633536.pth [2024-06-15 13:06:57,598][1652475] Updated weights for policy 0, policy_version 121091 (0.0014) [2024-06-15 13:06:58,412][1652475] Updated weights for policy 0, policy_version 121142 (0.0013) [2024-06-15 13:07:00,739][1648984] Fps is (10 sec: 39367.2, 60 sec: 41505.3, 300 sec: 42876.6). Total num frames: 248119296. Throughput: 0: 10660.7. Samples: 62109184. Policy #0 lag: (min: 8.0, avg: 124.3, max: 264.0) [2024-06-15 13:07:00,739][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 13:07:02,513][1652475] Updated weights for policy 0, policy_version 121208 (0.0012) [2024-06-15 13:07:05,519][1652475] Updated weights for policy 0, policy_version 121271 (0.0012) [2024-06-15 13:07:05,738][1648984] Fps is (10 sec: 36045.3, 60 sec: 43144.6, 300 sec: 42542.9). Total num frames: 248348672. Throughput: 0: 10661.0. Samples: 62140416. Policy #0 lag: (min: 8.0, avg: 124.3, max: 264.0) [2024-06-15 13:07:05,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 13:07:07,600][1652475] Updated weights for policy 0, policy_version 121315 (0.0014) [2024-06-15 13:07:09,598][1652475] Updated weights for policy 0, policy_version 121367 (0.0023) [2024-06-15 13:07:10,740][1648984] Fps is (10 sec: 52434.4, 60 sec: 44236.8, 300 sec: 43320.4). Total num frames: 248643584. Throughput: 0: 10808.9. Samples: 62205440. Policy #0 lag: (min: 8.0, avg: 124.3, max: 264.0) [2024-06-15 13:07:10,741][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 13:07:13,939][1652475] Updated weights for policy 0, policy_version 121445 (0.0015) [2024-06-15 13:07:15,738][1648984] Fps is (10 sec: 42598.6, 60 sec: 42598.5, 300 sec: 42987.2). Total num frames: 248774656. Throughput: 0: 10717.9. Samples: 62269440. 
Policy #0 lag: (min: 8.0, avg: 124.3, max: 264.0) [2024-06-15 13:07:15,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 13:07:16,784][1652475] Updated weights for policy 0, policy_version 121504 (0.0094) [2024-06-15 13:07:18,897][1652475] Updated weights for policy 0, policy_version 121552 (0.0014) [2024-06-15 13:07:20,738][1648984] Fps is (10 sec: 39321.9, 60 sec: 43690.7, 300 sec: 43098.3). Total num frames: 249036800. Throughput: 0: 10763.4. Samples: 62305280. Policy #0 lag: (min: 27.0, avg: 132.1, max: 283.0) [2024-06-15 13:07:20,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 13:07:20,779][1652475] Updated weights for policy 0, policy_version 121602 (0.0013) [2024-06-15 13:07:21,814][1652475] Updated weights for policy 0, policy_version 121662 (0.0017) [2024-06-15 13:07:25,529][1651340] Signal inference workers to stop experience collection... (6300 times) [2024-06-15 13:07:25,582][1652475] InferenceWorker_p0-w0: stopping experience collection (6300 times) [2024-06-15 13:07:25,745][1648984] Fps is (10 sec: 39294.0, 60 sec: 41501.3, 300 sec: 42986.2). Total num frames: 249167872. Throughput: 0: 10716.2. Samples: 62374400. Policy #0 lag: (min: 27.0, avg: 132.1, max: 283.0) [2024-06-15 13:07:25,745][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 13:07:25,821][1651340] Signal inference workers to resume experience collection... (6300 times) [2024-06-15 13:07:25,823][1652475] InferenceWorker_p0-w0: resuming experience collection (6300 times) [2024-06-15 13:07:26,779][1652475] Updated weights for policy 0, policy_version 121717 (0.0015) [2024-06-15 13:07:27,467][1652475] Updated weights for policy 0, policy_version 121734 (0.0032) [2024-06-15 13:07:28,517][1652475] Updated weights for policy 0, policy_version 121789 (0.0013) [2024-06-15 13:07:30,738][1648984] Fps is (10 sec: 42598.1, 60 sec: 43144.5, 300 sec: 43209.3). Total num frames: 249462784. Throughput: 0: 11036.4. Samples: 62446080. 
Policy #0 lag: (min: 27.0, avg: 132.1, max: 283.0) [2024-06-15 13:07:30,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 13:07:31,999][1652475] Updated weights for policy 0, policy_version 121872 (0.0079) [2024-06-15 13:07:35,738][1648984] Fps is (10 sec: 52465.5, 60 sec: 43690.7, 300 sec: 43098.3). Total num frames: 249692160. Throughput: 0: 10937.1. Samples: 62471168. Policy #0 lag: (min: 27.0, avg: 132.1, max: 283.0) [2024-06-15 13:07:35,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 13:07:38,047][1652475] Updated weights for policy 0, policy_version 121952 (0.0015) [2024-06-15 13:07:40,210][1652475] Updated weights for policy 0, policy_version 122020 (0.0012) [2024-06-15 13:07:40,738][1648984] Fps is (10 sec: 45875.5, 60 sec: 44236.7, 300 sec: 43098.3). Total num frames: 249921536. Throughput: 0: 11116.1. Samples: 62541824. Policy #0 lag: (min: 27.0, avg: 132.1, max: 283.0) [2024-06-15 13:07:40,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 13:07:43,154][1652475] Updated weights for policy 0, policy_version 122086 (0.0034) [2024-06-15 13:07:44,711][1652475] Updated weights for policy 0, policy_version 122135 (0.0039) [2024-06-15 13:07:45,738][1648984] Fps is (10 sec: 52428.0, 60 sec: 45875.1, 300 sec: 43542.6). Total num frames: 250216448. Throughput: 0: 10911.5. Samples: 62600192. Policy #0 lag: (min: 27.0, avg: 132.1, max: 283.0) [2024-06-15 13:07:45,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 13:07:50,458][1652475] Updated weights for policy 0, policy_version 122224 (0.0029) [2024-06-15 13:07:50,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 43153.7, 300 sec: 43209.6). Total num frames: 250314752. Throughput: 0: 11082.0. Samples: 62639104. 
Policy #0 lag: (min: 27.0, avg: 132.1, max: 283.0) [2024-06-15 13:07:50,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 13:07:52,430][1652475] Updated weights for policy 0, policy_version 122301 (0.0014) [2024-06-15 13:07:55,656][1652475] Updated weights for policy 0, policy_version 122360 (0.0016) [2024-06-15 13:07:55,738][1648984] Fps is (10 sec: 36044.8, 60 sec: 43144.5, 300 sec: 43320.4). Total num frames: 250576896. Throughput: 0: 10968.2. Samples: 62699008. Policy #0 lag: (min: 27.0, avg: 132.1, max: 283.0) [2024-06-15 13:07:55,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 13:07:58,661][1652475] Updated weights for policy 0, policy_version 122389 (0.0027) [2024-06-15 13:08:00,738][1648984] Fps is (10 sec: 42598.1, 60 sec: 43691.5, 300 sec: 43098.2). Total num frames: 250740736. Throughput: 0: 10888.5. Samples: 62759424. Policy #0 lag: (min: 27.0, avg: 132.1, max: 283.0) [2024-06-15 13:08:00,739][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 13:08:01,651][1652475] Updated weights for policy 0, policy_version 122448 (0.0016) [2024-06-15 13:08:05,138][1652475] Updated weights for policy 0, policy_version 122512 (0.0014) [2024-06-15 13:08:05,738][1648984] Fps is (10 sec: 36045.1, 60 sec: 43144.5, 300 sec: 42876.1). Total num frames: 250937344. Throughput: 0: 10752.0. Samples: 62789120. Policy #0 lag: (min: 27.0, avg: 132.1, max: 283.0) [2024-06-15 13:08:05,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 13:08:06,975][1652475] Updated weights for policy 0, policy_version 122576 (0.0122) [2024-06-15 13:08:10,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 41506.2, 300 sec: 42876.1). Total num frames: 251133952. Throughput: 0: 10571.6. Samples: 62850048. 
Policy #0 lag: (min: 27.0, avg: 132.1, max: 283.0) [2024-06-15 13:08:10,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 13:08:11,161][1652475] Updated weights for policy 0, policy_version 122643 (0.0014) [2024-06-15 13:08:13,936][1651340] Signal inference workers to stop experience collection... (6350 times) [2024-06-15 13:08:13,964][1652475] Updated weights for policy 0, policy_version 122690 (0.0013) [2024-06-15 13:08:13,981][1652475] InferenceWorker_p0-w0: stopping experience collection (6350 times) [2024-06-15 13:08:14,292][1651340] Signal inference workers to resume experience collection... (6350 times) [2024-06-15 13:08:14,296][1652475] InferenceWorker_p0-w0: resuming experience collection (6350 times) [2024-06-15 13:08:15,738][1648984] Fps is (10 sec: 45875.5, 60 sec: 43690.6, 300 sec: 43320.4). Total num frames: 251396096. Throughput: 0: 10296.9. Samples: 62909440. Policy #0 lag: (min: 27.0, avg: 132.1, max: 283.0) [2024-06-15 13:08:15,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 13:08:18,280][1652475] Updated weights for policy 0, policy_version 122770 (0.0012) [2024-06-15 13:08:19,583][1652475] Updated weights for policy 0, policy_version 122837 (0.0021) [2024-06-15 13:08:20,738][1648984] Fps is (10 sec: 52429.1, 60 sec: 43690.7, 300 sec: 43098.3). Total num frames: 251658240. Throughput: 0: 10592.7. Samples: 62947840. Policy #0 lag: (min: 19.0, avg: 113.5, max: 275.0) [2024-06-15 13:08:20,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 13:08:23,522][1652475] Updated weights for policy 0, policy_version 122912 (0.0013) [2024-06-15 13:08:25,738][1648984] Fps is (10 sec: 42598.1, 60 sec: 44241.9, 300 sec: 43209.3). Total num frames: 251822080. Throughput: 0: 10456.2. Samples: 63012352. 
Policy #0 lag: (min: 19.0, avg: 113.5, max: 275.0) [2024-06-15 13:08:25,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 13:08:26,033][1652475] Updated weights for policy 0, policy_version 122976 (0.0012) [2024-06-15 13:08:30,187][1652475] Updated weights for policy 0, policy_version 123028 (0.0014) [2024-06-15 13:08:30,738][1648984] Fps is (10 sec: 36044.9, 60 sec: 42598.5, 300 sec: 42987.2). Total num frames: 252018688. Throughput: 0: 10820.3. Samples: 63087104. Policy #0 lag: (min: 19.0, avg: 113.5, max: 275.0) [2024-06-15 13:08:30,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 13:08:31,638][1652475] Updated weights for policy 0, policy_version 123091 (0.0013) [2024-06-15 13:08:34,463][1652475] Updated weights for policy 0, policy_version 123156 (0.0119) [2024-06-15 13:08:35,738][1648984] Fps is (10 sec: 49152.0, 60 sec: 43690.6, 300 sec: 43542.6). Total num frames: 252313600. Throughput: 0: 10581.3. Samples: 63115264. Policy #0 lag: (min: 19.0, avg: 113.5, max: 275.0) [2024-06-15 13:08:35,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 13:08:37,581][1652475] Updated weights for policy 0, policy_version 123216 (0.0013) [2024-06-15 13:08:38,836][1652475] Updated weights for policy 0, policy_version 123260 (0.0119) [2024-06-15 13:08:40,738][1648984] Fps is (10 sec: 42598.2, 60 sec: 42052.3, 300 sec: 42987.2). Total num frames: 252444672. Throughput: 0: 10661.0. Samples: 63178752. Policy #0 lag: (min: 19.0, avg: 113.5, max: 275.0) [2024-06-15 13:08:40,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 13:08:42,313][1652475] Updated weights for policy 0, policy_version 123312 (0.0100) [2024-06-15 13:08:43,890][1652475] Updated weights for policy 0, policy_version 123387 (0.0012) [2024-06-15 13:08:45,738][1648984] Fps is (10 sec: 39321.9, 60 sec: 41506.2, 300 sec: 43098.3). Total num frames: 252706816. Throughput: 0: 10899.9. Samples: 63249920. 
Policy #0 lag: (min: 19.0, avg: 113.5, max: 275.0) [2024-06-15 13:08:45,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 13:08:46,820][1652475] Updated weights for policy 0, policy_version 123448 (0.0012) [2024-06-15 13:08:49,645][1652475] Updated weights for policy 0, policy_version 123513 (0.0013) [2024-06-15 13:08:50,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 44236.8, 300 sec: 43431.5). Total num frames: 252968960. Throughput: 0: 11093.3. Samples: 63288320. Policy #0 lag: (min: 19.0, avg: 113.5, max: 275.0) [2024-06-15 13:08:50,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 13:08:54,162][1652475] Updated weights for policy 0, policy_version 123583 (0.0016) [2024-06-15 13:08:55,746][1648984] Fps is (10 sec: 52386.3, 60 sec: 44230.9, 300 sec: 43430.3). Total num frames: 253231104. Throughput: 0: 11193.7. Samples: 63353856. Policy #0 lag: (min: 19.0, avg: 113.5, max: 275.0) [2024-06-15 13:08:55,746][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 13:08:55,767][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000123648_253231104.pth... [2024-06-15 13:08:55,814][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000118560_242810880.pth [2024-06-15 13:08:57,980][1651340] Signal inference workers to stop experience collection... (6400 times) [2024-06-15 13:08:58,031][1652475] InferenceWorker_p0-w0: stopping experience collection (6400 times) [2024-06-15 13:08:58,285][1651340] Signal inference workers to resume experience collection... (6400 times) [2024-06-15 13:08:58,285][1652475] InferenceWorker_p0-w0: resuming experience collection (6400 times) [2024-06-15 13:08:58,449][1652475] Updated weights for policy 0, policy_version 123681 (0.0015) [2024-06-15 13:09:00,173][1652475] Updated weights for policy 0, policy_version 123714 (0.0018) [2024-06-15 13:09:00,738][1648984] Fps is (10 sec: 42598.1, 60 sec: 44236.8, 300 sec: 43431.5). Total num frames: 253394944. 
Throughput: 0: 11377.7. Samples: 63421440. Policy #0 lag: (min: 19.0, avg: 113.5, max: 275.0) [2024-06-15 13:09:00,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 13:09:01,497][1652475] Updated weights for policy 0, policy_version 123775 (0.0012) [2024-06-15 13:09:05,611][1652475] Updated weights for policy 0, policy_version 123838 (0.0016) [2024-06-15 13:09:05,738][1648984] Fps is (10 sec: 39353.5, 60 sec: 44783.0, 300 sec: 43320.4). Total num frames: 253624320. Throughput: 0: 11241.3. Samples: 63453696. Policy #0 lag: (min: 19.0, avg: 113.5, max: 275.0) [2024-06-15 13:09:05,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 13:09:07,211][1652475] Updated weights for policy 0, policy_version 123896 (0.0017) [2024-06-15 13:09:09,503][1652475] Updated weights for policy 0, policy_version 123927 (0.0013) [2024-06-15 13:09:10,292][1652475] Updated weights for policy 0, policy_version 123968 (0.0015) [2024-06-15 13:09:10,738][1648984] Fps is (10 sec: 49152.5, 60 sec: 45875.3, 300 sec: 43542.6). Total num frames: 253886464. Throughput: 0: 11252.6. Samples: 63518720. Policy #0 lag: (min: 19.0, avg: 113.5, max: 275.0) [2024-06-15 13:09:10,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 13:09:14,263][1652475] Updated weights for policy 0, policy_version 124026 (0.0012) [2024-06-15 13:09:15,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 43690.7, 300 sec: 43320.4). Total num frames: 254017536. Throughput: 0: 11059.2. Samples: 63584768. Policy #0 lag: (min: 19.0, avg: 113.5, max: 275.0) [2024-06-15 13:09:15,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 13:09:16,755][1652475] Updated weights for policy 0, policy_version 124069 (0.0012) [2024-06-15 13:09:20,738][1648984] Fps is (10 sec: 36044.5, 60 sec: 43144.5, 300 sec: 43209.3). Total num frames: 254246912. Throughput: 0: 11138.8. Samples: 63616512. 
Policy #0 lag: (min: 47.0, avg: 168.7, max: 303.0) [2024-06-15 13:09:20,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 13:09:20,986][1652475] Updated weights for policy 0, policy_version 124160 (0.0013) [2024-06-15 13:09:22,602][1652475] Updated weights for policy 0, policy_version 124224 (0.0017) [2024-06-15 13:09:25,738][1648984] Fps is (10 sec: 39320.4, 60 sec: 43144.4, 300 sec: 43098.2). Total num frames: 254410752. Throughput: 0: 11002.3. Samples: 63673856. Policy #0 lag: (min: 47.0, avg: 168.7, max: 303.0) [2024-06-15 13:09:25,739][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 13:09:26,859][1652475] Updated weights for policy 0, policy_version 124285 (0.0014) [2024-06-15 13:09:29,315][1652475] Updated weights for policy 0, policy_version 124346 (0.0064) [2024-06-15 13:09:30,762][1648984] Fps is (10 sec: 42598.6, 60 sec: 44236.7, 300 sec: 43320.6). Total num frames: 254672896. Throughput: 0: 11059.2. Samples: 63747584. Policy #0 lag: (min: 47.0, avg: 168.7, max: 303.0) [2024-06-15 13:09:30,762][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 13:09:33,610][1652475] Updated weights for policy 0, policy_version 124432 (0.0013) [2024-06-15 13:09:35,738][1648984] Fps is (10 sec: 52429.8, 60 sec: 43690.7, 300 sec: 43653.6). Total num frames: 254935040. Throughput: 0: 10820.3. Samples: 63775232. Policy #0 lag: (min: 47.0, avg: 168.7, max: 303.0) [2024-06-15 13:09:35,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 13:09:38,326][1652475] Updated weights for policy 0, policy_version 124502 (0.0021) [2024-06-15 13:09:39,014][1652475] Updated weights for policy 0, policy_version 124544 (0.0011) [2024-06-15 13:09:40,738][1648984] Fps is (10 sec: 42598.5, 60 sec: 44236.8, 300 sec: 43209.3). Total num frames: 255098880. Throughput: 0: 10833.6. Samples: 63841280. 
Policy #0 lag: (min: 47.0, avg: 168.7, max: 303.0) [2024-06-15 13:09:40,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 13:09:41,632][1652475] Updated weights for policy 0, policy_version 124608 (0.0108) [2024-06-15 13:09:45,323][1651340] Signal inference workers to stop experience collection... (6450 times) [2024-06-15 13:09:45,376][1652475] InferenceWorker_p0-w0: stopping experience collection (6450 times) [2024-06-15 13:09:45,547][1651340] Signal inference workers to resume experience collection... (6450 times) [2024-06-15 13:09:45,548][1652475] InferenceWorker_p0-w0: resuming experience collection (6450 times) [2024-06-15 13:09:45,706][1652475] Updated weights for policy 0, policy_version 124691 (0.0101) [2024-06-15 13:09:45,738][1648984] Fps is (10 sec: 42598.8, 60 sec: 44236.8, 300 sec: 43653.6). Total num frames: 255361024. Throughput: 0: 10717.9. Samples: 63903744. Policy #0 lag: (min: 47.0, avg: 168.7, max: 303.0) [2024-06-15 13:09:45,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 13:09:50,006][1652475] Updated weights for policy 0, policy_version 124739 (0.0012) [2024-06-15 13:09:50,739][1648984] Fps is (10 sec: 42593.2, 60 sec: 42597.5, 300 sec: 43320.2). Total num frames: 255524864. Throughput: 0: 10763.1. Samples: 63938048. Policy #0 lag: (min: 47.0, avg: 168.7, max: 303.0) [2024-06-15 13:09:50,740][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 13:09:52,110][1652475] Updated weights for policy 0, policy_version 124803 (0.0013) [2024-06-15 13:09:55,742][1648984] Fps is (10 sec: 36030.5, 60 sec: 41509.0, 300 sec: 43319.8). Total num frames: 255721472. Throughput: 0: 10773.8. Samples: 64003584. 
Policy #0 lag: (min: 47.0, avg: 168.7, max: 303.0) [2024-06-15 13:09:55,742][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 13:09:55,874][1652475] Updated weights for policy 0, policy_version 124880 (0.0017) [2024-06-15 13:09:57,233][1652475] Updated weights for policy 0, policy_version 124931 (0.0015) [2024-06-15 13:09:58,258][1652475] Updated weights for policy 0, policy_version 124992 (0.0013) [2024-06-15 13:10:00,738][1648984] Fps is (10 sec: 45881.0, 60 sec: 43144.6, 300 sec: 43431.5). Total num frames: 255983616. Throughput: 0: 10865.8. Samples: 64073728. Policy #0 lag: (min: 47.0, avg: 168.7, max: 303.0) [2024-06-15 13:10:00,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 13:10:03,346][1652475] Updated weights for policy 0, policy_version 125056 (0.0116) [2024-06-15 13:10:05,738][1648984] Fps is (10 sec: 52449.2, 60 sec: 43690.6, 300 sec: 43542.6). Total num frames: 256245760. Throughput: 0: 10899.9. Samples: 64107008. Policy #0 lag: (min: 47.0, avg: 168.7, max: 303.0) [2024-06-15 13:10:05,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 13:10:07,489][1652475] Updated weights for policy 0, policy_version 125121 (0.0052) [2024-06-15 13:10:08,757][1652475] Updated weights for policy 0, policy_version 125171 (0.0013) [2024-06-15 13:10:10,310][1652475] Updated weights for policy 0, policy_version 125241 (0.0013) [2024-06-15 13:10:10,738][1648984] Fps is (10 sec: 52428.4, 60 sec: 43690.6, 300 sec: 43986.9). Total num frames: 256507904. Throughput: 0: 11002.4. Samples: 64168960. Policy #0 lag: (min: 47.0, avg: 168.7, max: 303.0) [2024-06-15 13:10:10,738][1648984] Avg episode reward: [(0, '-0.540')] [2024-06-15 13:10:15,274][1652475] Updated weights for policy 0, policy_version 125309 (0.0013) [2024-06-15 13:10:15,740][1648984] Fps is (10 sec: 39321.4, 60 sec: 43690.6, 300 sec: 43209.3). Total num frames: 256638976. Throughput: 0: 10706.5. Samples: 64229376. 
Policy #0 lag: (min: 47.0, avg: 168.7, max: 303.0) [2024-06-15 13:10:15,741][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 13:10:20,123][1652475] Updated weights for policy 0, policy_version 125392 (0.0234) [2024-06-15 13:10:20,738][1648984] Fps is (10 sec: 32768.2, 60 sec: 43144.6, 300 sec: 43320.4). Total num frames: 256835584. Throughput: 0: 10752.0. Samples: 64259072. Policy #0 lag: (min: 47.0, avg: 168.7, max: 303.0) [2024-06-15 13:10:20,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 13:10:21,124][1652475] Updated weights for policy 0, policy_version 125434 (0.0109) [2024-06-15 13:10:23,096][1652475] Updated weights for policy 0, policy_version 125495 (0.0012) [2024-06-15 13:10:25,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 43690.8, 300 sec: 43098.2). Total num frames: 257032192. Throughput: 0: 10729.2. Samples: 64324096. Policy #0 lag: (min: 18.0, avg: 159.2, max: 337.0) [2024-06-15 13:10:25,738][1648984] Avg episode reward: [(0, '-0.520')] [2024-06-15 13:10:26,452][1652475] Updated weights for policy 0, policy_version 125538 (0.0012) [2024-06-15 13:10:29,995][1652475] Updated weights for policy 0, policy_version 125572 (0.0012) [2024-06-15 13:10:30,738][1648984] Fps is (10 sec: 39320.4, 60 sec: 42598.2, 300 sec: 42987.1). Total num frames: 257228800. Throughput: 0: 11002.2. Samples: 64398848. Policy #0 lag: (min: 18.0, avg: 159.2, max: 337.0) [2024-06-15 13:10:30,739][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 13:10:32,016][1652475] Updated weights for policy 0, policy_version 125651 (0.0012) [2024-06-15 13:10:32,401][1651340] Signal inference workers to stop experience collection... (6500 times) [2024-06-15 13:10:32,480][1652475] InferenceWorker_p0-w0: stopping experience collection (6500 times) [2024-06-15 13:10:32,740][1651340] Signal inference workers to resume experience collection... 
(6500 times) [2024-06-15 13:10:32,742][1652475] InferenceWorker_p0-w0: resuming experience collection (6500 times) [2024-06-15 13:10:33,757][1652475] Updated weights for policy 0, policy_version 125699 (0.0027) [2024-06-15 13:10:35,154][1652475] Updated weights for policy 0, policy_version 125758 (0.0014) [2024-06-15 13:10:35,738][1648984] Fps is (10 sec: 52429.3, 60 sec: 43690.7, 300 sec: 43653.7). Total num frames: 257556480. Throughput: 0: 10775.1. Samples: 64422912. Policy #0 lag: (min: 18.0, avg: 159.2, max: 337.0) [2024-06-15 13:10:35,738][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 13:10:38,603][1652475] Updated weights for policy 0, policy_version 125815 (0.0014) [2024-06-15 13:10:40,738][1648984] Fps is (10 sec: 45875.8, 60 sec: 43144.4, 300 sec: 43542.5). Total num frames: 257687552. Throughput: 0: 10855.3. Samples: 64492032. Policy #0 lag: (min: 18.0, avg: 159.2, max: 337.0) [2024-06-15 13:10:40,738][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 13:10:42,522][1652475] Updated weights for policy 0, policy_version 125873 (0.0012) [2024-06-15 13:10:43,978][1652475] Updated weights for policy 0, policy_version 125922 (0.0014) [2024-06-15 13:10:45,738][1648984] Fps is (10 sec: 45875.4, 60 sec: 44236.8, 300 sec: 43764.8). Total num frames: 258015232. Throughput: 0: 10752.0. Samples: 64557568. Policy #0 lag: (min: 18.0, avg: 159.2, max: 337.0) [2024-06-15 13:10:45,738][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 13:10:45,964][1652475] Updated weights for policy 0, policy_version 126007 (0.0013) [2024-06-15 13:10:49,950][1652475] Updated weights for policy 0, policy_version 126048 (0.0013) [2024-06-15 13:10:50,584][1652475] Updated weights for policy 0, policy_version 126074 (0.0010) [2024-06-15 13:10:50,738][1648984] Fps is (10 sec: 52429.9, 60 sec: 44783.9, 300 sec: 43653.7). Total num frames: 258211840. Throughput: 0: 10854.4. Samples: 64595456. 
Policy #0 lag: (min: 18.0, avg: 159.2, max: 337.0) [2024-06-15 13:10:50,738][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 13:10:54,106][1652475] Updated weights for policy 0, policy_version 126136 (0.0027) [2024-06-15 13:10:55,740][1648984] Fps is (10 sec: 39312.2, 60 sec: 44784.1, 300 sec: 43320.1). Total num frames: 258408448. Throughput: 0: 11024.5. Samples: 64665088. Policy #0 lag: (min: 18.0, avg: 159.2, max: 337.0) [2024-06-15 13:10:55,741][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 13:10:56,447][1652475] Updated weights for policy 0, policy_version 126208 (0.0133) [2024-06-15 13:10:56,454][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000126208_258473984.pth... [2024-06-15 13:10:56,601][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000121088_247988224.pth [2024-06-15 13:10:56,605][1651340] Saving a milestone train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/milestones/checkpoint_000126208_258473984.pth [2024-06-15 13:10:57,870][1652475] Updated weights for policy 0, policy_version 126266 (0.0012) [2024-06-15 13:11:00,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 258605056. Throughput: 0: 11047.8. Samples: 64726528. Policy #0 lag: (min: 18.0, avg: 159.2, max: 337.0) [2024-06-15 13:11:00,738][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 13:11:01,648][1652475] Updated weights for policy 0, policy_version 126304 (0.0013) [2024-06-15 13:11:05,738][1648984] Fps is (10 sec: 36053.3, 60 sec: 42052.3, 300 sec: 43320.4). Total num frames: 258768896. Throughput: 0: 11161.6. Samples: 64761344. 
Policy #0 lag: (min: 18.0, avg: 159.2, max: 337.0) [2024-06-15 13:11:05,738][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 13:11:06,098][1652475] Updated weights for policy 0, policy_version 126368 (0.0116) [2024-06-15 13:11:07,197][1652475] Updated weights for policy 0, policy_version 126416 (0.0014) [2024-06-15 13:11:08,739][1652475] Updated weights for policy 0, policy_version 126469 (0.0014) [2024-06-15 13:11:10,015][1652475] Updated weights for policy 0, policy_version 126527 (0.0011) [2024-06-15 13:11:10,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 43690.7, 300 sec: 43764.7). Total num frames: 259129344. Throughput: 0: 11002.3. Samples: 64819200. Policy #0 lag: (min: 18.0, avg: 159.2, max: 337.0) [2024-06-15 13:11:10,738][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 13:11:14,212][1652475] Updated weights for policy 0, policy_version 126581 (0.0013) [2024-06-15 13:11:15,738][1648984] Fps is (10 sec: 49152.2, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 259260416. Throughput: 0: 11025.1. Samples: 64894976. Policy #0 lag: (min: 18.0, avg: 159.2, max: 337.0) [2024-06-15 13:11:15,738][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 13:11:18,475][1652475] Updated weights for policy 0, policy_version 126648 (0.0029) [2024-06-15 13:11:18,680][1651340] Signal inference workers to stop experience collection... (6550 times) [2024-06-15 13:11:18,699][1652475] InferenceWorker_p0-w0: stopping experience collection (6550 times) [2024-06-15 13:11:18,753][1651340] Signal inference workers to resume experience collection... (6550 times) [2024-06-15 13:11:18,798][1652475] InferenceWorker_p0-w0: resuming experience collection (6550 times) [2024-06-15 13:11:19,392][1652475] Updated weights for policy 0, policy_version 126673 (0.0013) [2024-06-15 13:11:20,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 45329.1, 300 sec: 43653.6). Total num frames: 259555328. Throughput: 0: 11207.1. Samples: 64927232. 
Policy #0 lag: (min: 18.0, avg: 159.2, max: 337.0) [2024-06-15 13:11:20,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 13:11:20,879][1652475] Updated weights for policy 0, policy_version 126739 (0.0116) [2024-06-15 13:11:25,543][1652475] Updated weights for policy 0, policy_version 126800 (0.0014) [2024-06-15 13:11:25,738][1648984] Fps is (10 sec: 42597.9, 60 sec: 44236.8, 300 sec: 43431.5). Total num frames: 259686400. Throughput: 0: 11013.7. Samples: 64987648. Policy #0 lag: (min: 12.0, avg: 130.0, max: 268.0) [2024-06-15 13:11:25,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 13:11:29,990][1652475] Updated weights for policy 0, policy_version 126865 (0.0016) [2024-06-15 13:11:30,739][1648984] Fps is (10 sec: 32764.6, 60 sec: 44236.3, 300 sec: 43431.3). Total num frames: 259883008. Throughput: 0: 11024.8. Samples: 65053696. Policy #0 lag: (min: 12.0, avg: 130.0, max: 268.0) [2024-06-15 13:11:30,739][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 13:11:31,866][1652475] Updated weights for policy 0, policy_version 126930 (0.0014) [2024-06-15 13:11:33,815][1652475] Updated weights for policy 0, policy_version 127008 (0.0012) [2024-06-15 13:11:35,738][1648984] Fps is (10 sec: 49152.5, 60 sec: 43690.7, 300 sec: 43764.7). Total num frames: 260177920. Throughput: 0: 10672.4. Samples: 65075712. Policy #0 lag: (min: 12.0, avg: 130.0, max: 268.0) [2024-06-15 13:11:35,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 13:11:40,231][1652475] Updated weights for policy 0, policy_version 127095 (0.0092) [2024-06-15 13:11:40,738][1648984] Fps is (10 sec: 42602.6, 60 sec: 43690.8, 300 sec: 43542.6). Total num frames: 260308992. Throughput: 0: 10729.8. Samples: 65147904. 
Policy #0 lag: (min: 12.0, avg: 130.0, max: 268.0) [2024-06-15 13:11:40,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 13:11:42,672][1652475] Updated weights for policy 0, policy_version 127161 (0.0014) [2024-06-15 13:11:44,639][1652475] Updated weights for policy 0, policy_version 127225 (0.0015) [2024-06-15 13:11:45,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 42598.4, 300 sec: 43544.4). Total num frames: 260571136. Throughput: 0: 10592.7. Samples: 65203200. Policy #0 lag: (min: 12.0, avg: 130.0, max: 268.0) [2024-06-15 13:11:45,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 13:11:47,184][1652475] Updated weights for policy 0, policy_version 127295 (0.0013) [2024-06-15 13:11:50,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 41506.1, 300 sec: 43098.3). Total num frames: 260702208. Throughput: 0: 10570.0. Samples: 65236992. Policy #0 lag: (min: 12.0, avg: 130.0, max: 268.0) [2024-06-15 13:11:50,740][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 13:11:51,835][1652475] Updated weights for policy 0, policy_version 127353 (0.0017) [2024-06-15 13:11:54,829][1652475] Updated weights for policy 0, policy_version 127413 (0.0013) [2024-06-15 13:11:55,738][1648984] Fps is (10 sec: 42598.0, 60 sec: 43146.2, 300 sec: 43653.8). Total num frames: 260997120. Throughput: 0: 10865.8. Samples: 65308160. Policy #0 lag: (min: 12.0, avg: 130.0, max: 268.0) [2024-06-15 13:11:55,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 13:11:56,037][1652475] Updated weights for policy 0, policy_version 127457 (0.0012) [2024-06-15 13:11:57,571][1652475] Updated weights for policy 0, policy_version 127507 (0.0013) [2024-06-15 13:12:00,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 43690.7, 300 sec: 43653.6). Total num frames: 261226496. Throughput: 0: 10695.1. Samples: 65376256. 
Policy #0 lag: (min: 12.0, avg: 130.0, max: 268.0) [2024-06-15 13:12:00,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 13:12:03,738][1652475] Updated weights for policy 0, policy_version 127588 (0.0032) [2024-06-15 13:12:05,362][1651340] Signal inference workers to stop experience collection... (6600 times) [2024-06-15 13:12:05,456][1652475] InferenceWorker_p0-w0: stopping experience collection (6600 times) [2024-06-15 13:12:05,458][1652475] Updated weights for policy 0, policy_version 127639 (0.0074) [2024-06-15 13:12:05,606][1651340] Signal inference workers to resume experience collection... (6600 times) [2024-06-15 13:12:05,607][1652475] InferenceWorker_p0-w0: resuming experience collection (6600 times) [2024-06-15 13:12:05,738][1648984] Fps is (10 sec: 42596.4, 60 sec: 44236.4, 300 sec: 43320.3). Total num frames: 261423104. Throughput: 0: 10854.3. Samples: 65415680. Policy #0 lag: (min: 12.0, avg: 130.0, max: 268.0) [2024-06-15 13:12:05,739][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 13:12:06,692][1652475] Updated weights for policy 0, policy_version 127681 (0.0126) [2024-06-15 13:12:08,035][1652475] Updated weights for policy 0, policy_version 127744 (0.0011) [2024-06-15 13:12:10,182][1652475] Updated weights for policy 0, policy_version 127805 (0.0013) [2024-06-15 13:12:10,738][1648984] Fps is (10 sec: 52425.8, 60 sec: 43690.3, 300 sec: 43986.8). Total num frames: 261750784. Throughput: 0: 10854.3. Samples: 65476096. Policy #0 lag: (min: 12.0, avg: 130.0, max: 268.0) [2024-06-15 13:12:10,739][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 13:12:15,661][1652475] Updated weights for policy 0, policy_version 127870 (0.0013) [2024-06-15 13:12:15,738][1648984] Fps is (10 sec: 45877.9, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 261881856. Throughput: 0: 11013.9. Samples: 65549312. 
Policy #0 lag: (min: 12.0, avg: 130.0, max: 268.0) [2024-06-15 13:12:15,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 13:12:17,568][1652475] Updated weights for policy 0, policy_version 127935 (0.0014) [2024-06-15 13:12:19,777][1652475] Updated weights for policy 0, policy_version 128000 (0.0012) [2024-06-15 13:12:20,738][1648984] Fps is (10 sec: 39323.6, 60 sec: 43144.5, 300 sec: 43987.9). Total num frames: 262144000. Throughput: 0: 11138.8. Samples: 65576960. Policy #0 lag: (min: 12.0, avg: 130.0, max: 268.0) [2024-06-15 13:12:20,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 13:12:22,000][1652475] Updated weights for policy 0, policy_version 128064 (0.0014) [2024-06-15 13:12:25,740][1648984] Fps is (10 sec: 39321.1, 60 sec: 43144.5, 300 sec: 43431.5). Total num frames: 262275072. Throughput: 0: 10979.5. Samples: 65641984. Policy #0 lag: (min: 12.0, avg: 130.0, max: 268.0) [2024-06-15 13:12:25,741][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 13:12:29,119][1652475] Updated weights for policy 0, policy_version 128145 (0.0017) [2024-06-15 13:12:30,738][1648984] Fps is (10 sec: 39321.0, 60 sec: 44237.4, 300 sec: 43542.5). Total num frames: 262537216. Throughput: 0: 11093.3. Samples: 65702400. Policy #0 lag: (min: 31.0, avg: 95.9, max: 287.0) [2024-06-15 13:12:30,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 13:12:31,774][1652475] Updated weights for policy 0, policy_version 128193 (0.0014) [2024-06-15 13:12:33,457][1652475] Updated weights for policy 0, policy_version 128261 (0.0014) [2024-06-15 13:12:35,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 43690.6, 300 sec: 43653.6). Total num frames: 262799360. Throughput: 0: 10990.9. Samples: 65731584. 
Policy #0 lag: (min: 31.0, avg: 95.9, max: 287.0) [2024-06-15 13:12:35,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 13:12:39,416][1652475] Updated weights for policy 0, policy_version 128321 (0.0162) [2024-06-15 13:12:40,740][1648984] Fps is (10 sec: 36045.2, 60 sec: 43144.5, 300 sec: 42987.2). Total num frames: 262897664. Throughput: 0: 10865.8. Samples: 65797120. Policy #0 lag: (min: 31.0, avg: 95.9, max: 287.0) [2024-06-15 13:12:40,741][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 13:12:40,787][1652475] Updated weights for policy 0, policy_version 128382 (0.0013) [2024-06-15 13:12:44,667][1652475] Updated weights for policy 0, policy_version 128451 (0.0160) [2024-06-15 13:12:45,738][1648984] Fps is (10 sec: 32767.6, 60 sec: 42598.2, 300 sec: 43431.5). Total num frames: 263127040. Throughput: 0: 10660.9. Samples: 65856000. Policy #0 lag: (min: 31.0, avg: 95.9, max: 287.0) [2024-06-15 13:12:45,739][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 13:12:47,406][1652475] Updated weights for policy 0, policy_version 128568 (0.0120) [2024-06-15 13:12:50,738][1648984] Fps is (10 sec: 42598.9, 60 sec: 43690.7, 300 sec: 43209.4). Total num frames: 263323648. Throughput: 0: 10308.4. Samples: 65879552. Policy #0 lag: (min: 31.0, avg: 95.9, max: 287.0) [2024-06-15 13:12:50,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 13:12:52,588][1652475] Updated weights for policy 0, policy_version 128610 (0.0058) [2024-06-15 13:12:54,694][1651340] Signal inference workers to stop experience collection... (6650 times) [2024-06-15 13:12:54,715][1652475] Updated weights for policy 0, policy_version 128641 (0.0016) [2024-06-15 13:12:54,742][1652475] InferenceWorker_p0-w0: stopping experience collection (6650 times) [2024-06-15 13:12:54,983][1651340] Signal inference workers to resume experience collection... 
(6650 times) [2024-06-15 13:12:54,984][1652475] InferenceWorker_p0-w0: resuming experience collection (6650 times) [2024-06-15 13:12:55,738][1648984] Fps is (10 sec: 42599.4, 60 sec: 42598.5, 300 sec: 43431.5). Total num frames: 263553024. Throughput: 0: 10683.9. Samples: 65956864. Policy #0 lag: (min: 31.0, avg: 95.9, max: 287.0) [2024-06-15 13:12:55,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 13:12:55,936][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000128704_263585792.pth... [2024-06-15 13:12:55,987][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000123648_253231104.pth [2024-06-15 13:12:57,028][1652475] Updated weights for policy 0, policy_version 128709 (0.0184) [2024-06-15 13:12:59,424][1652475] Updated weights for policy 0, policy_version 128801 (0.0087) [2024-06-15 13:13:00,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 43690.7, 300 sec: 43764.7). Total num frames: 263847936. Throughput: 0: 10285.5. Samples: 66012160. Policy #0 lag: (min: 31.0, avg: 95.9, max: 287.0) [2024-06-15 13:13:00,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 13:13:04,181][1652475] Updated weights for policy 0, policy_version 128864 (0.0012) [2024-06-15 13:13:05,738][1648984] Fps is (10 sec: 42598.5, 60 sec: 42598.8, 300 sec: 43542.6). Total num frames: 263979008. Throughput: 0: 10638.2. Samples: 66055680. Policy #0 lag: (min: 31.0, avg: 95.9, max: 287.0) [2024-06-15 13:13:05,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 13:13:06,652][1652475] Updated weights for policy 0, policy_version 128903 (0.0034) [2024-06-15 13:13:09,103][1652475] Updated weights for policy 0, policy_version 128976 (0.0125) [2024-06-15 13:13:10,738][1648984] Fps is (10 sec: 42598.6, 60 sec: 42052.7, 300 sec: 43653.6). Total num frames: 264273920. Throughput: 0: 10535.9. Samples: 66116096. 
Policy #0 lag: (min: 31.0, avg: 95.9, max: 287.0) [2024-06-15 13:13:10,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 13:13:11,636][1652475] Updated weights for policy 0, policy_version 129080 (0.0184) [2024-06-15 13:13:15,738][1648984] Fps is (10 sec: 39321.1, 60 sec: 41506.0, 300 sec: 43098.2). Total num frames: 264372224. Throughput: 0: 10740.6. Samples: 66185728. Policy #0 lag: (min: 31.0, avg: 95.9, max: 287.0) [2024-06-15 13:13:15,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 13:13:16,740][1652475] Updated weights for policy 0, policy_version 129146 (0.0014) [2024-06-15 13:13:19,825][1652475] Updated weights for policy 0, policy_version 129211 (0.0012) [2024-06-15 13:13:20,738][1648984] Fps is (10 sec: 36044.4, 60 sec: 41506.1, 300 sec: 43431.5). Total num frames: 264634368. Throughput: 0: 10808.9. Samples: 66217984. Policy #0 lag: (min: 31.0, avg: 95.9, max: 287.0) [2024-06-15 13:13:20,739][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 13:13:22,297][1652475] Updated weights for policy 0, policy_version 129267 (0.0012) [2024-06-15 13:13:23,924][1652475] Updated weights for policy 0, policy_version 129332 (0.0052) [2024-06-15 13:13:25,738][1648984] Fps is (10 sec: 52429.2, 60 sec: 43690.7, 300 sec: 43653.6). Total num frames: 264896512. Throughput: 0: 10649.6. Samples: 66276352. Policy #0 lag: (min: 31.0, avg: 95.9, max: 287.0) [2024-06-15 13:13:25,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 13:13:28,006][1652475] Updated weights for policy 0, policy_version 129363 (0.0011) [2024-06-15 13:13:30,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 41506.2, 300 sec: 43098.2). Total num frames: 265027584. Throughput: 0: 10934.1. Samples: 66348032. 
Policy #0 lag: (min: 31.0, avg: 95.9, max: 287.0) [2024-06-15 13:13:30,739][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 13:13:30,923][1652475] Updated weights for policy 0, policy_version 129409 (0.0014) [2024-06-15 13:13:32,295][1652475] Updated weights for policy 0, policy_version 129467 (0.0026) [2024-06-15 13:13:34,210][1652475] Updated weights for policy 0, policy_version 129524 (0.0012) [2024-06-15 13:13:35,738][1648984] Fps is (10 sec: 49152.1, 60 sec: 43144.6, 300 sec: 43875.8). Total num frames: 265388032. Throughput: 0: 11116.1. Samples: 66379776. Policy #0 lag: (min: 11.0, avg: 108.0, max: 267.0) [2024-06-15 13:13:35,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 13:13:35,905][1652475] Updated weights for policy 0, policy_version 129595 (0.0013) [2024-06-15 13:13:40,504][1651340] Signal inference workers to stop experience collection... (6700 times) [2024-06-15 13:13:40,547][1652475] InferenceWorker_p0-w0: stopping experience collection (6700 times) [2024-06-15 13:13:40,738][1648984] Fps is (10 sec: 42598.9, 60 sec: 42598.5, 300 sec: 43209.3). Total num frames: 265453568. Throughput: 0: 10843.0. Samples: 66444800. Policy #0 lag: (min: 11.0, avg: 108.0, max: 267.0) [2024-06-15 13:13:40,741][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 13:13:40,761][1651340] Signal inference workers to resume experience collection... (6700 times) [2024-06-15 13:13:40,763][1652475] InferenceWorker_p0-w0: resuming experience collection (6700 times) [2024-06-15 13:13:43,457][1652475] Updated weights for policy 0, policy_version 129666 (0.0114) [2024-06-15 13:13:44,957][1652475] Updated weights for policy 0, policy_version 129720 (0.0033) [2024-06-15 13:13:45,738][1648984] Fps is (10 sec: 32768.1, 60 sec: 43144.7, 300 sec: 43209.3). Total num frames: 265715712. Throughput: 0: 10911.3. Samples: 66503168. 
Policy #0 lag: (min: 11.0, avg: 108.0, max: 267.0) [2024-06-15 13:13:45,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 13:13:46,095][1652475] Updated weights for policy 0, policy_version 129761 (0.0012) [2024-06-15 13:13:48,119][1652475] Updated weights for policy 0, policy_version 129856 (0.0013) [2024-06-15 13:13:50,738][1648984] Fps is (10 sec: 49152.1, 60 sec: 43690.7, 300 sec: 43099.4). Total num frames: 265945088. Throughput: 0: 10535.8. Samples: 66529792. Policy #0 lag: (min: 11.0, avg: 108.0, max: 267.0) [2024-06-15 13:13:50,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 13:13:55,558][1652475] Updated weights for policy 0, policy_version 129904 (0.0014) [2024-06-15 13:13:55,738][1648984] Fps is (10 sec: 32768.0, 60 sec: 41506.1, 300 sec: 42876.1). Total num frames: 266043392. Throughput: 0: 10706.5. Samples: 66597888. Policy #0 lag: (min: 11.0, avg: 108.0, max: 267.0) [2024-06-15 13:13:55,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 13:13:56,865][1652475] Updated weights for policy 0, policy_version 129954 (0.0011) [2024-06-15 13:13:58,704][1652475] Updated weights for policy 0, policy_version 130032 (0.0014) [2024-06-15 13:14:00,690][1652475] Updated weights for policy 0, policy_version 130096 (0.0023) [2024-06-15 13:14:00,738][1648984] Fps is (10 sec: 49152.1, 60 sec: 43144.5, 300 sec: 43431.5). Total num frames: 266436608. Throughput: 0: 10376.6. Samples: 66652672. Policy #0 lag: (min: 11.0, avg: 108.0, max: 267.0) [2024-06-15 13:14:00,738][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 13:14:05,738][1648984] Fps is (10 sec: 42598.2, 60 sec: 41506.1, 300 sec: 42653.9). Total num frames: 266469376. Throughput: 0: 10387.9. Samples: 66685440. 
Policy #0 lag: (min: 11.0, avg: 108.0, max: 267.0) [2024-06-15 13:14:05,740][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 13:14:07,710][1652475] Updated weights for policy 0, policy_version 130151 (0.0018) [2024-06-15 13:14:09,655][1652475] Updated weights for policy 0, policy_version 130236 (0.0013) [2024-06-15 13:14:10,738][1648984] Fps is (10 sec: 36044.7, 60 sec: 42052.2, 300 sec: 43320.4). Total num frames: 266797056. Throughput: 0: 10535.8. Samples: 66750464. Policy #0 lag: (min: 11.0, avg: 108.0, max: 267.0) [2024-06-15 13:14:10,738][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 13:14:10,856][1652475] Updated weights for policy 0, policy_version 130274 (0.0018) [2024-06-15 13:14:13,189][1652475] Updated weights for policy 0, policy_version 130352 (0.0016) [2024-06-15 13:14:15,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 43690.8, 300 sec: 43209.3). Total num frames: 266993664. Throughput: 0: 10365.2. Samples: 66814464. Policy #0 lag: (min: 11.0, avg: 108.0, max: 267.0) [2024-06-15 13:14:15,738][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 13:14:19,253][1652475] Updated weights for policy 0, policy_version 130406 (0.0013) [2024-06-15 13:14:20,714][1652475] Updated weights for policy 0, policy_version 130468 (0.0023) [2024-06-15 13:14:20,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 42598.4, 300 sec: 43320.4). Total num frames: 267190272. Throughput: 0: 10604.1. Samples: 66856960. Policy #0 lag: (min: 11.0, avg: 108.0, max: 267.0) [2024-06-15 13:14:20,738][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 13:14:22,518][1652475] Updated weights for policy 0, policy_version 130534 (0.0014) [2024-06-15 13:14:23,739][1651340] Signal inference workers to stop experience collection... (6750 times) [2024-06-15 13:14:23,795][1652475] InferenceWorker_p0-w0: stopping experience collection (6750 times) [2024-06-15 13:14:23,995][1651340] Signal inference workers to resume experience collection... 
(6750 times) [2024-06-15 13:14:23,996][1652475] InferenceWorker_p0-w0: resuming experience collection (6750 times) [2024-06-15 13:14:24,200][1652475] Updated weights for policy 0, policy_version 130581 (0.0030) [2024-06-15 13:14:25,213][1652475] Updated weights for policy 0, policy_version 130624 (0.0016) [2024-06-15 13:14:25,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 267517952. Throughput: 0: 10444.8. Samples: 66914816. Policy #0 lag: (min: 11.0, avg: 108.0, max: 267.0) [2024-06-15 13:14:25,738][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 13:14:30,738][1648984] Fps is (10 sec: 39322.0, 60 sec: 42598.5, 300 sec: 42876.1). Total num frames: 267583488. Throughput: 0: 10911.3. Samples: 66994176. Policy #0 lag: (min: 11.0, avg: 108.0, max: 267.0) [2024-06-15 13:14:30,738][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 13:14:32,077][1652475] Updated weights for policy 0, policy_version 130708 (0.0108) [2024-06-15 13:14:32,719][1652475] Updated weights for policy 0, policy_version 130751 (0.0014) [2024-06-15 13:14:34,999][1652475] Updated weights for policy 0, policy_version 130810 (0.0014) [2024-06-15 13:14:35,740][1648984] Fps is (10 sec: 42598.0, 60 sec: 42598.3, 300 sec: 43542.5). Total num frames: 267943936. Throughput: 0: 10911.3. Samples: 67020800. Policy #0 lag: (min: 12.0, avg: 139.8, max: 268.0) [2024-06-15 13:14:35,741][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 13:14:36,172][1652475] Updated weights for policy 0, policy_version 130849 (0.0013) [2024-06-15 13:14:40,738][1648984] Fps is (10 sec: 45874.0, 60 sec: 43144.4, 300 sec: 42987.1). Total num frames: 268042240. Throughput: 0: 10945.4. Samples: 67090432. 
Policy #0 lag: (min: 12.0, avg: 139.8, max: 268.0) [2024-06-15 13:14:40,738][1648984] Avg episode reward: [(0, '-0.540')] [2024-06-15 13:14:42,133][1652475] Updated weights for policy 0, policy_version 130897 (0.0012) [2024-06-15 13:14:43,585][1652475] Updated weights for policy 0, policy_version 130963 (0.0036) [2024-06-15 13:14:45,738][1648984] Fps is (10 sec: 36045.2, 60 sec: 43144.5, 300 sec: 43320.6). Total num frames: 268304384. Throughput: 0: 11150.2. Samples: 67154432. Policy #0 lag: (min: 12.0, avg: 139.8, max: 268.0) [2024-06-15 13:14:45,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 13:14:46,463][1652475] Updated weights for policy 0, policy_version 131029 (0.0013) [2024-06-15 13:14:48,052][1652475] Updated weights for policy 0, policy_version 131104 (0.0011) [2024-06-15 13:14:50,738][1648984] Fps is (10 sec: 52429.7, 60 sec: 43690.6, 300 sec: 43543.1). Total num frames: 268566528. Throughput: 0: 11013.7. Samples: 67181056. Policy #0 lag: (min: 12.0, avg: 139.8, max: 268.0) [2024-06-15 13:14:50,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 13:14:53,709][1652475] Updated weights for policy 0, policy_version 131152 (0.0014) [2024-06-15 13:14:55,711][1652475] Updated weights for policy 0, policy_version 131218 (0.0014) [2024-06-15 13:14:55,738][1648984] Fps is (10 sec: 42596.4, 60 sec: 44782.6, 300 sec: 43209.3). Total num frames: 268730368. Throughput: 0: 11093.2. Samples: 67249664. Policy #0 lag: (min: 12.0, avg: 139.8, max: 268.0) [2024-06-15 13:14:55,739][1648984] Avg episode reward: [(0, '-0.520')] [2024-06-15 13:14:56,262][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000131248_268795904.pth... 
[2024-06-15 13:14:56,312][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000126208_258473984.pth [2024-06-15 13:14:58,086][1652475] Updated weights for policy 0, policy_version 131288 (0.0016) [2024-06-15 13:15:00,494][1652475] Updated weights for policy 0, policy_version 131345 (0.0014) [2024-06-15 13:15:00,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 42598.3, 300 sec: 43209.3). Total num frames: 268992512. Throughput: 0: 11025.0. Samples: 67310592. Policy #0 lag: (min: 12.0, avg: 139.8, max: 268.0) [2024-06-15 13:15:00,739][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 13:15:01,519][1652475] Updated weights for policy 0, policy_version 131392 (0.0026) [2024-06-15 13:15:05,738][1648984] Fps is (10 sec: 42600.3, 60 sec: 44782.9, 300 sec: 42876.1). Total num frames: 269156352. Throughput: 0: 10899.9. Samples: 67347456. Policy #0 lag: (min: 12.0, avg: 139.8, max: 268.0) [2024-06-15 13:15:05,738][1648984] Avg episode reward: [(0, '-0.580')] [2024-06-15 13:15:06,067][1652475] Updated weights for policy 0, policy_version 131450 (0.0035) [2024-06-15 13:15:08,909][1652475] Updated weights for policy 0, policy_version 131496 (0.0014) [2024-06-15 13:15:09,982][1651340] Signal inference workers to stop experience collection... (6800 times) [2024-06-15 13:15:10,014][1652475] InferenceWorker_p0-w0: stopping experience collection (6800 times) [2024-06-15 13:15:10,188][1651340] Signal inference workers to resume experience collection... (6800 times) [2024-06-15 13:15:10,189][1652475] InferenceWorker_p0-w0: resuming experience collection (6800 times) [2024-06-15 13:15:10,738][1648984] Fps is (10 sec: 45875.4, 60 sec: 44236.8, 300 sec: 43431.5). Total num frames: 269451264. Throughput: 0: 11161.6. Samples: 67417088. 
Policy #0 lag: (min: 12.0, avg: 139.8, max: 268.0) [2024-06-15 13:15:10,738][1648984] Avg episode reward: [(0, '-0.580')] [2024-06-15 13:15:10,882][1652475] Updated weights for policy 0, policy_version 131578 (0.0013) [2024-06-15 13:15:12,547][1652475] Updated weights for policy 0, policy_version 131632 (0.0012) [2024-06-15 13:15:15,738][1648984] Fps is (10 sec: 45875.2, 60 sec: 43690.6, 300 sec: 43320.4). Total num frames: 269615104. Throughput: 0: 10877.1. Samples: 67483648. Policy #0 lag: (min: 12.0, avg: 139.8, max: 268.0) [2024-06-15 13:15:15,738][1648984] Avg episode reward: [(0, '-0.580')] [2024-06-15 13:15:16,437][1652475] Updated weights for policy 0, policy_version 131649 (0.0013) [2024-06-15 13:15:17,782][1652475] Updated weights for policy 0, policy_version 131709 (0.0013) [2024-06-15 13:15:20,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 44236.8, 300 sec: 43431.5). Total num frames: 269844480. Throughput: 0: 10968.2. Samples: 67514368. Policy #0 lag: (min: 12.0, avg: 139.8, max: 268.0) [2024-06-15 13:15:20,738][1648984] Avg episode reward: [(0, '-0.580')] [2024-06-15 13:15:20,913][1652475] Updated weights for policy 0, policy_version 131767 (0.0012) [2024-06-15 13:15:21,986][1652475] Updated weights for policy 0, policy_version 131808 (0.0012) [2024-06-15 13:15:24,068][1652475] Updated weights for policy 0, policy_version 131856 (0.0049) [2024-06-15 13:15:25,089][1652475] Updated weights for policy 0, policy_version 131901 (0.0011) [2024-06-15 13:15:25,748][1648984] Fps is (10 sec: 52383.0, 60 sec: 43684.3, 300 sec: 43763.5). Total num frames: 270139392. Throughput: 0: 10863.7. Samples: 67579392. Policy #0 lag: (min: 12.0, avg: 139.8, max: 268.0) [2024-06-15 13:15:25,749][1648984] Avg episode reward: [(0, '-0.580')] [2024-06-15 13:15:28,728][1652475] Updated weights for policy 0, policy_version 131959 (0.0013) [2024-06-15 13:15:30,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 44782.9, 300 sec: 43098.2). Total num frames: 270270464. 
Throughput: 0: 11104.7. Samples: 67654144. Policy #0 lag: (min: 12.0, avg: 139.8, max: 268.0) [2024-06-15 13:15:30,738][1648984] Avg episode reward: [(0, '-0.580')] [2024-06-15 13:15:31,800][1652475] Updated weights for policy 0, policy_version 132000 (0.0015) [2024-06-15 13:15:33,229][1652475] Updated weights for policy 0, policy_version 132064 (0.0013) [2024-06-15 13:15:33,875][1652475] Updated weights for policy 0, policy_version 132095 (0.0023) [2024-06-15 13:15:35,738][1648984] Fps is (10 sec: 45915.2, 60 sec: 44236.9, 300 sec: 43764.7). Total num frames: 270598144. Throughput: 0: 11195.7. Samples: 67684864. Policy #0 lag: (min: 12.0, avg: 139.8, max: 268.0) [2024-06-15 13:15:35,738][1648984] Avg episode reward: [(0, '-0.580')] [2024-06-15 13:15:35,782][1652475] Updated weights for policy 0, policy_version 132144 (0.0012) [2024-06-15 13:15:39,573][1652475] Updated weights for policy 0, policy_version 132176 (0.0014) [2024-06-15 13:15:40,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 45875.4, 300 sec: 43320.4). Total num frames: 270794752. Throughput: 0: 11366.5. Samples: 67761152. Policy #0 lag: (min: 1.0, avg: 98.3, max: 257.0) [2024-06-15 13:15:40,738][1648984] Avg episode reward: [(0, '-0.580')] [2024-06-15 13:15:42,534][1652475] Updated weights for policy 0, policy_version 132240 (0.0012) [2024-06-15 13:15:44,100][1652475] Updated weights for policy 0, policy_version 132309 (0.0012) [2024-06-15 13:15:45,738][1648984] Fps is (10 sec: 45874.7, 60 sec: 45875.1, 300 sec: 43542.5). Total num frames: 271056896. Throughput: 0: 11411.9. Samples: 67824128. Policy #0 lag: (min: 1.0, avg: 98.3, max: 257.0) [2024-06-15 13:15:45,739][1648984] Avg episode reward: [(0, '-0.580')] [2024-06-15 13:15:46,101][1652475] Updated weights for policy 0, policy_version 132368 (0.0017) [2024-06-15 13:15:47,302][1652475] Updated weights for policy 0, policy_version 132416 (0.0019) [2024-06-15 13:15:50,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 43690.7, 300 sec: 43320.8). 
Total num frames: 271187968. Throughput: 0: 11389.2. Samples: 67859968. Policy #0 lag: (min: 1.0, avg: 98.3, max: 257.0) [2024-06-15 13:15:50,738][1648984] Avg episode reward: [(0, '-0.580')] [2024-06-15 13:15:51,796][1652475] Updated weights for policy 0, policy_version 132475 (0.0012) [2024-06-15 13:15:54,811][1651340] Signal inference workers to stop experience collection... (6850 times) [2024-06-15 13:15:54,870][1652475] InferenceWorker_p0-w0: stopping experience collection (6850 times) [2024-06-15 13:15:55,016][1651340] Signal inference workers to resume experience collection... (6850 times) [2024-06-15 13:15:55,017][1652475] InferenceWorker_p0-w0: resuming experience collection (6850 times) [2024-06-15 13:15:55,020][1652475] Updated weights for policy 0, policy_version 132544 (0.0128) [2024-06-15 13:15:55,739][1648984] Fps is (10 sec: 45874.9, 60 sec: 46421.5, 300 sec: 43764.7). Total num frames: 271515648. Throughput: 0: 11491.5. Samples: 67934208. Policy #0 lag: (min: 1.0, avg: 98.3, max: 257.0) [2024-06-15 13:15:55,740][1648984] Avg episode reward: [(0, '-0.580')] [2024-06-15 13:15:57,398][1652475] Updated weights for policy 0, policy_version 132611 (0.0013) [2024-06-15 13:16:00,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 45329.2, 300 sec: 43875.8). Total num frames: 271712256. Throughput: 0: 11355.0. Samples: 67994624. Policy #0 lag: (min: 1.0, avg: 98.3, max: 257.0) [2024-06-15 13:16:00,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 13:16:03,808][1652475] Updated weights for policy 0, policy_version 132675 (0.0012) [2024-06-15 13:16:04,859][1652475] Updated weights for policy 0, policy_version 132730 (0.0043) [2024-06-15 13:16:05,738][1648984] Fps is (10 sec: 39322.3, 60 sec: 45875.2, 300 sec: 43320.4). Total num frames: 271908864. Throughput: 0: 11525.7. Samples: 68033024. 
Policy #0 lag: (min: 1.0, avg: 98.3, max: 257.0) [2024-06-15 13:16:05,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 13:16:06,672][1652475] Updated weights for policy 0, policy_version 132804 (0.0015) [2024-06-15 13:16:08,514][1652475] Updated weights for policy 0, policy_version 132880 (0.0013) [2024-06-15 13:16:10,738][1648984] Fps is (10 sec: 52428.4, 60 sec: 46421.3, 300 sec: 43986.9). Total num frames: 272236544. Throughput: 0: 11300.3. Samples: 68087808. Policy #0 lag: (min: 1.0, avg: 98.3, max: 257.0) [2024-06-15 13:16:10,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 13:16:15,738][1648984] Fps is (10 sec: 32767.3, 60 sec: 43690.5, 300 sec: 42987.1). Total num frames: 272236544. Throughput: 0: 11309.5. Samples: 68163072. Policy #0 lag: (min: 1.0, avg: 98.3, max: 257.0) [2024-06-15 13:16:15,739][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 13:16:15,968][1652475] Updated weights for policy 0, policy_version 132932 (0.0027) [2024-06-15 13:16:17,361][1652475] Updated weights for policy 0, policy_version 133008 (0.0136) [2024-06-15 13:16:18,335][1652475] Updated weights for policy 0, policy_version 133052 (0.0014) [2024-06-15 13:16:20,367][1652475] Updated weights for policy 0, policy_version 133120 (0.0017) [2024-06-15 13:16:20,738][1648984] Fps is (10 sec: 39321.9, 60 sec: 46421.4, 300 sec: 43875.8). Total num frames: 272629760. Throughput: 0: 11309.5. Samples: 68193792. Policy #0 lag: (min: 1.0, avg: 98.3, max: 257.0) [2024-06-15 13:16:20,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 13:16:21,753][1652475] Updated weights for policy 0, policy_version 133174 (0.0016) [2024-06-15 13:16:25,738][1648984] Fps is (10 sec: 52430.1, 60 sec: 43697.0, 300 sec: 43653.8). Total num frames: 272760832. Throughput: 0: 11150.2. Samples: 68262912. 
Policy #0 lag: (min: 1.0, avg: 98.3, max: 257.0) [2024-06-15 13:16:25,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 13:16:27,225][1652475] Updated weights for policy 0, policy_version 133220 (0.0014) [2024-06-15 13:16:29,291][1652475] Updated weights for policy 0, policy_version 133280 (0.0025) [2024-06-15 13:16:30,472][1652475] Updated weights for policy 0, policy_version 133328 (0.0014) [2024-06-15 13:16:30,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 46421.4, 300 sec: 43653.6). Total num frames: 273055744. Throughput: 0: 11412.0. Samples: 68337664. Policy #0 lag: (min: 1.0, avg: 98.3, max: 257.0) [2024-06-15 13:16:30,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 13:16:32,189][1652475] Updated weights for policy 0, policy_version 133392 (0.0013) [2024-06-15 13:16:35,738][1648984] Fps is (10 sec: 52425.8, 60 sec: 44782.5, 300 sec: 43986.8). Total num frames: 273285120. Throughput: 0: 11047.7. Samples: 68357120. Policy #0 lag: (min: 1.0, avg: 98.3, max: 257.0) [2024-06-15 13:16:35,739][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 13:16:38,383][1652475] Updated weights for policy 0, policy_version 133464 (0.0013) [2024-06-15 13:16:40,246][1651340] Signal inference workers to stop experience collection... (6900 times) [2024-06-15 13:16:40,286][1652475] InferenceWorker_p0-w0: stopping experience collection (6900 times) [2024-06-15 13:16:40,402][1651340] Signal inference workers to resume experience collection... (6900 times) [2024-06-15 13:16:40,403][1652475] InferenceWorker_p0-w0: resuming experience collection (6900 times) [2024-06-15 13:16:40,519][1652475] Updated weights for policy 0, policy_version 133521 (0.0016) [2024-06-15 13:16:40,756][1648984] Fps is (10 sec: 42597.8, 60 sec: 44782.9, 300 sec: 43764.7). Total num frames: 273481728. Throughput: 0: 11161.6. Samples: 68436480. 
Policy #0 lag: (min: 1.0, avg: 98.3, max: 257.0) [2024-06-15 13:16:40,757][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 13:16:41,305][1652475] Updated weights for policy 0, policy_version 133568 (0.0013) [2024-06-15 13:16:42,988][1652475] Updated weights for policy 0, policy_version 133627 (0.0011) [2024-06-15 13:16:45,015][1652475] Updated weights for policy 0, policy_version 133687 (0.0039) [2024-06-15 13:16:45,738][1648984] Fps is (10 sec: 52431.7, 60 sec: 45875.3, 300 sec: 44431.2). Total num frames: 273809408. Throughput: 0: 11184.3. Samples: 68497920. Policy #0 lag: (min: 17.0, avg: 130.4, max: 263.0) [2024-06-15 13:16:45,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 13:16:49,308][1652475] Updated weights for policy 0, policy_version 133734 (0.0025) [2024-06-15 13:16:50,738][1648984] Fps is (10 sec: 45875.5, 60 sec: 45875.2, 300 sec: 43875.8). Total num frames: 273940480. Throughput: 0: 11411.9. Samples: 68546560. Policy #0 lag: (min: 17.0, avg: 130.4, max: 263.0) [2024-06-15 13:16:50,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 13:16:51,840][1652475] Updated weights for policy 0, policy_version 133777 (0.0014) [2024-06-15 13:16:53,624][1652475] Updated weights for policy 0, policy_version 133856 (0.0018) [2024-06-15 13:16:55,478][1652475] Updated weights for policy 0, policy_version 133920 (0.0018) [2024-06-15 13:16:55,738][1648984] Fps is (10 sec: 45874.8, 60 sec: 45875.3, 300 sec: 44209.0). Total num frames: 274268160. Throughput: 0: 11537.0. Samples: 68606976. Policy #0 lag: (min: 17.0, avg: 130.4, max: 263.0) [2024-06-15 13:16:55,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 13:16:56,271][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000133952_274333696.pth... 
[2024-06-15 13:16:56,292][1652475] Updated weights for policy 0, policy_version 133952 (0.0014) [2024-06-15 13:16:56,311][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000128704_263585792.pth [2024-06-15 13:17:00,398][1652475] Updated weights for policy 0, policy_version 134016 (0.0016) [2024-06-15 13:17:00,738][1648984] Fps is (10 sec: 52428.5, 60 sec: 45875.1, 300 sec: 44209.1). Total num frames: 274464768. Throughput: 0: 11503.0. Samples: 68680704. Policy #0 lag: (min: 17.0, avg: 130.4, max: 263.0) [2024-06-15 13:17:00,739][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 13:17:05,660][1652475] Updated weights for policy 0, policy_version 134096 (0.0015) [2024-06-15 13:17:05,738][1648984] Fps is (10 sec: 36045.2, 60 sec: 45329.1, 300 sec: 43653.7). Total num frames: 274628608. Throughput: 0: 11559.8. Samples: 68713984. Policy #0 lag: (min: 17.0, avg: 130.4, max: 263.0) [2024-06-15 13:17:05,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 13:17:06,990][1652475] Updated weights for policy 0, policy_version 134145 (0.0013) [2024-06-15 13:17:08,416][1652475] Updated weights for policy 0, policy_version 134207 (0.0030) [2024-06-15 13:17:10,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 43690.7, 300 sec: 43986.9). Total num frames: 274857984. Throughput: 0: 11275.4. Samples: 68770304. Policy #0 lag: (min: 17.0, avg: 130.4, max: 263.0) [2024-06-15 13:17:10,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 13:17:14,290][1652475] Updated weights for policy 0, policy_version 134275 (0.0013) [2024-06-15 13:17:15,738][1648984] Fps is (10 sec: 49152.1, 60 sec: 48059.9, 300 sec: 43986.9). Total num frames: 275120128. Throughput: 0: 11275.4. Samples: 68845056. 
Policy #0 lag: (min: 17.0, avg: 130.4, max: 263.0) [2024-06-15 13:17:15,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 13:17:16,768][1652475] Updated weights for policy 0, policy_version 134338 (0.0014) [2024-06-15 13:17:17,877][1652475] Updated weights for policy 0, policy_version 134390 (0.0015) [2024-06-15 13:17:19,582][1652475] Updated weights for policy 0, policy_version 134454 (0.0012) [2024-06-15 13:17:20,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 45875.1, 300 sec: 44431.2). Total num frames: 275382272. Throughput: 0: 11548.6. Samples: 68876800. Policy #0 lag: (min: 17.0, avg: 130.4, max: 263.0) [2024-06-15 13:17:20,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 13:17:24,391][1651340] Signal inference workers to stop experience collection... (6950 times) [2024-06-15 13:17:24,460][1652475] InferenceWorker_p0-w0: stopping experience collection (6950 times) [2024-06-15 13:17:24,568][1651340] Signal inference workers to resume experience collection... (6950 times) [2024-06-15 13:17:24,569][1652475] InferenceWorker_p0-w0: resuming experience collection (6950 times) [2024-06-15 13:17:24,571][1652475] Updated weights for policy 0, policy_version 134512 (0.0015) [2024-06-15 13:17:25,745][1648984] Fps is (10 sec: 42566.6, 60 sec: 46415.6, 300 sec: 44096.9). Total num frames: 275546112. Throughput: 0: 11523.8. Samples: 68955136. Policy #0 lag: (min: 17.0, avg: 130.4, max: 263.0) [2024-06-15 13:17:25,746][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 13:17:26,760][1652475] Updated weights for policy 0, policy_version 134592 (0.0017) [2024-06-15 13:17:29,008][1652475] Updated weights for policy 0, policy_version 134656 (0.0106) [2024-06-15 13:17:30,663][1652475] Updated weights for policy 0, policy_version 134711 (0.0121) [2024-06-15 13:17:30,738][1648984] Fps is (10 sec: 49152.5, 60 sec: 46967.5, 300 sec: 44320.1). Total num frames: 275873792. Throughput: 0: 11480.2. Samples: 69014528. 
Policy #0 lag: (min: 17.0, avg: 130.4, max: 263.0) [2024-06-15 13:17:30,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 13:17:34,954][1652475] Updated weights for policy 0, policy_version 134739 (0.0051) [2024-06-15 13:17:35,738][1648984] Fps is (10 sec: 45909.1, 60 sec: 45329.4, 300 sec: 44431.2). Total num frames: 276004864. Throughput: 0: 11423.3. Samples: 69060608. Policy #0 lag: (min: 17.0, avg: 130.4, max: 263.0) [2024-06-15 13:17:35,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 13:17:37,505][1652475] Updated weights for policy 0, policy_version 134820 (0.0014) [2024-06-15 13:17:38,819][1652475] Updated weights for policy 0, policy_version 134865 (0.0019) [2024-06-15 13:17:40,398][1652475] Updated weights for policy 0, policy_version 134928 (0.0014) [2024-06-15 13:17:40,738][1648984] Fps is (10 sec: 45874.9, 60 sec: 47513.7, 300 sec: 44764.4). Total num frames: 276332544. Throughput: 0: 11594.0. Samples: 69128704. Policy #0 lag: (min: 17.0, avg: 130.4, max: 263.0) [2024-06-15 13:17:40,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 13:17:45,511][1652475] Updated weights for policy 0, policy_version 134978 (0.0014) [2024-06-15 13:17:45,738][1648984] Fps is (10 sec: 42597.6, 60 sec: 43690.5, 300 sec: 44431.1). Total num frames: 276430848. Throughput: 0: 11673.5. Samples: 69206016. Policy #0 lag: (min: 17.0, avg: 130.4, max: 263.0) [2024-06-15 13:17:45,739][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 13:17:47,908][1652475] Updated weights for policy 0, policy_version 135056 (0.0014) [2024-06-15 13:17:49,871][1652475] Updated weights for policy 0, policy_version 135120 (0.0016) [2024-06-15 13:17:50,738][1648984] Fps is (10 sec: 49151.9, 60 sec: 48059.7, 300 sec: 44986.6). Total num frames: 276824064. Throughput: 0: 11582.6. Samples: 69235200. 
Policy #0 lag: (min: 3.0, avg: 78.6, max: 259.0) [2024-06-15 13:17:50,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 13:17:51,762][1652475] Updated weights for policy 0, policy_version 135170 (0.0019) [2024-06-15 13:17:53,236][1652475] Updated weights for policy 0, policy_version 135230 (0.0013) [2024-06-15 13:17:55,738][1648984] Fps is (10 sec: 52430.1, 60 sec: 44783.0, 300 sec: 44431.2). Total num frames: 276955136. Throughput: 0: 11832.9. Samples: 69302784. Policy #0 lag: (min: 3.0, avg: 78.6, max: 259.0) [2024-06-15 13:17:55,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 13:17:58,079][1652475] Updated weights for policy 0, policy_version 135296 (0.0021) [2024-06-15 13:18:00,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 45875.3, 300 sec: 44875.5). Total num frames: 277217280. Throughput: 0: 11730.5. Samples: 69372928. Policy #0 lag: (min: 3.0, avg: 78.6, max: 259.0) [2024-06-15 13:18:00,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 13:18:01,426][1652475] Updated weights for policy 0, policy_version 135363 (0.0013) [2024-06-15 13:18:03,089][1652475] Updated weights for policy 0, policy_version 135428 (0.0016) [2024-06-15 13:18:04,350][1652475] Updated weights for policy 0, policy_version 135486 (0.0014) [2024-06-15 13:18:05,738][1648984] Fps is (10 sec: 52429.1, 60 sec: 47513.6, 300 sec: 44764.4). Total num frames: 277479424. Throughput: 0: 11798.8. Samples: 69407744. Policy #0 lag: (min: 3.0, avg: 78.6, max: 259.0) [2024-06-15 13:18:05,752][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 13:18:09,508][1652475] Updated weights for policy 0, policy_version 135543 (0.0013) [2024-06-15 13:18:10,469][1651340] Signal inference workers to stop experience collection... (7000 times) [2024-06-15 13:18:10,506][1652475] InferenceWorker_p0-w0: stopping experience collection (7000 times) [2024-06-15 13:18:10,668][1651340] Signal inference workers to resume experience collection... 
(7000 times) [2024-06-15 13:18:10,670][1652475] InferenceWorker_p0-w0: resuming experience collection (7000 times) [2024-06-15 13:18:10,738][1648984] Fps is (10 sec: 45874.8, 60 sec: 46967.4, 300 sec: 45097.7). Total num frames: 277676032. Throughput: 0: 11641.4. Samples: 69478912. Policy #0 lag: (min: 3.0, avg: 78.6, max: 259.0) [2024-06-15 13:18:10,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 13:18:11,213][1652475] Updated weights for policy 0, policy_version 135615 (0.0013) [2024-06-15 13:18:14,007][1652475] Updated weights for policy 0, policy_version 135674 (0.0013) [2024-06-15 13:18:15,405][1652475] Updated weights for policy 0, policy_version 135730 (0.0014) [2024-06-15 13:18:15,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 48059.8, 300 sec: 45319.8). Total num frames: 278003712. Throughput: 0: 11707.7. Samples: 69541376. Policy #0 lag: (min: 3.0, avg: 78.6, max: 259.0) [2024-06-15 13:18:15,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 13:18:19,798][1652475] Updated weights for policy 0, policy_version 135765 (0.0011) [2024-06-15 13:18:20,738][1648984] Fps is (10 sec: 45876.0, 60 sec: 45875.3, 300 sec: 44875.5). Total num frames: 278134784. Throughput: 0: 11457.5. Samples: 69576192. Policy #0 lag: (min: 3.0, avg: 78.6, max: 259.0) [2024-06-15 13:18:20,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 13:18:24,420][1652475] Updated weights for policy 0, policy_version 135845 (0.0012) [2024-06-15 13:18:25,739][1648984] Fps is (10 sec: 32767.2, 60 sec: 46426.9, 300 sec: 45097.6). Total num frames: 278331392. Throughput: 0: 11480.1. Samples: 69645312. 
Policy #0 lag: (min: 3.0, avg: 78.6, max: 259.0) [2024-06-15 13:18:25,740][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 13:18:25,987][1652475] Updated weights for policy 0, policy_version 135928 (0.0013) [2024-06-15 13:18:27,877][1652475] Updated weights for policy 0, policy_version 135973 (0.0019) [2024-06-15 13:18:30,738][1648984] Fps is (10 sec: 39321.3, 60 sec: 44236.8, 300 sec: 44542.3). Total num frames: 278528000. Throughput: 0: 11127.5. Samples: 69706752. Policy #0 lag: (min: 3.0, avg: 78.6, max: 259.0) [2024-06-15 13:18:30,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 13:18:31,703][1652475] Updated weights for policy 0, policy_version 136033 (0.0029) [2024-06-15 13:18:35,738][1648984] Fps is (10 sec: 32768.7, 60 sec: 44236.9, 300 sec: 44764.4). Total num frames: 278659072. Throughput: 0: 11184.4. Samples: 69738496. Policy #0 lag: (min: 3.0, avg: 78.6, max: 259.0) [2024-06-15 13:18:35,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 13:18:36,134][1652475] Updated weights for policy 0, policy_version 136096 (0.0013) [2024-06-15 13:18:38,010][1652475] Updated weights for policy 0, policy_version 136176 (0.0013) [2024-06-15 13:18:40,712][1652475] Updated weights for policy 0, policy_version 136248 (0.0015) [2024-06-15 13:18:40,738][1648984] Fps is (10 sec: 49150.3, 60 sec: 44782.7, 300 sec: 45097.6). Total num frames: 279019520. Throughput: 0: 11195.7. Samples: 69806592. Policy #0 lag: (min: 3.0, avg: 78.6, max: 259.0) [2024-06-15 13:18:40,739][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 13:18:43,279][1652475] Updated weights for policy 0, policy_version 136294 (0.0060) [2024-06-15 13:18:45,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 45875.4, 300 sec: 44875.5). Total num frames: 279183360. Throughput: 0: 11161.6. Samples: 69875200. 
Policy #0 lag: (min: 3.0, avg: 78.6, max: 259.0) [2024-06-15 13:18:45,740][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 13:18:47,286][1652475] Updated weights for policy 0, policy_version 136337 (0.0012) [2024-06-15 13:18:48,303][1652475] Updated weights for policy 0, policy_version 136389 (0.0143) [2024-06-15 13:18:50,456][1652475] Updated weights for policy 0, policy_version 136449 (0.0013) [2024-06-15 13:18:50,738][1648984] Fps is (10 sec: 45875.1, 60 sec: 44236.5, 300 sec: 45541.9). Total num frames: 279478272. Throughput: 0: 11298.0. Samples: 69916160. Policy #0 lag: (min: 74.0, avg: 151.9, max: 330.0) [2024-06-15 13:18:50,739][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 13:18:53,357][1652475] Updated weights for policy 0, policy_version 136515 (0.0016) [2024-06-15 13:18:54,688][1652475] Updated weights for policy 0, policy_version 136568 (0.0011) [2024-06-15 13:18:55,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 45875.2, 300 sec: 44986.6). Total num frames: 279707648. Throughput: 0: 11138.9. Samples: 69980160. Policy #0 lag: (min: 74.0, avg: 151.9, max: 330.0) [2024-06-15 13:18:55,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 13:18:55,743][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000136576_279707648.pth... [2024-06-15 13:18:55,784][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000131248_268795904.pth [2024-06-15 13:18:57,682][1651340] Signal inference workers to stop experience collection... (7050 times) [2024-06-15 13:18:57,766][1652475] InferenceWorker_p0-w0: stopping experience collection (7050 times) [2024-06-15 13:18:57,870][1651340] Signal inference workers to resume experience collection... 
(7050 times) [2024-06-15 13:18:57,870][1652475] InferenceWorker_p0-w0: resuming experience collection (7050 times) [2024-06-15 13:18:58,317][1652475] Updated weights for policy 0, policy_version 136613 (0.0011) [2024-06-15 13:18:59,975][1652475] Updated weights for policy 0, policy_version 136704 (0.0028) [2024-06-15 13:19:00,738][1648984] Fps is (10 sec: 49153.9, 60 sec: 45875.2, 300 sec: 45764.1). Total num frames: 279969792. Throughput: 0: 11411.9. Samples: 70054912. Policy #0 lag: (min: 74.0, avg: 151.9, max: 330.0) [2024-06-15 13:19:00,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 13:19:02,860][1652475] Updated weights for policy 0, policy_version 136766 (0.0014) [2024-06-15 13:19:05,297][1652475] Updated weights for policy 0, policy_version 136820 (0.0013) [2024-06-15 13:19:05,738][1648984] Fps is (10 sec: 52428.5, 60 sec: 45875.2, 300 sec: 45542.0). Total num frames: 280231936. Throughput: 0: 11320.9. Samples: 70085632. Policy #0 lag: (min: 74.0, avg: 151.9, max: 330.0) [2024-06-15 13:19:05,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 13:19:10,281][1652475] Updated weights for policy 0, policy_version 136855 (0.0012) [2024-06-15 13:19:10,738][1648984] Fps is (10 sec: 32766.7, 60 sec: 43690.4, 300 sec: 45097.6). Total num frames: 280297472. Throughput: 0: 11480.1. Samples: 70161920. Policy #0 lag: (min: 74.0, avg: 151.9, max: 330.0) [2024-06-15 13:19:10,739][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 13:19:11,291][1652475] Updated weights for policy 0, policy_version 136904 (0.0012) [2024-06-15 13:19:13,701][1652475] Updated weights for policy 0, policy_version 136981 (0.0015) [2024-06-15 13:19:15,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 43690.6, 300 sec: 45542.0). Total num frames: 280625152. Throughput: 0: 11389.2. Samples: 70219264. 
Policy #0 lag: (min: 74.0, avg: 151.9, max: 330.0) [2024-06-15 13:19:15,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 13:19:16,145][1652475] Updated weights for policy 0, policy_version 137043 (0.0024) [2024-06-15 13:19:20,738][1648984] Fps is (10 sec: 45876.1, 60 sec: 43690.5, 300 sec: 44875.5). Total num frames: 280756224. Throughput: 0: 11525.6. Samples: 70257152. Policy #0 lag: (min: 74.0, avg: 151.9, max: 330.0) [2024-06-15 13:19:20,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 13:19:21,128][1652475] Updated weights for policy 0, policy_version 137104 (0.0022) [2024-06-15 13:19:22,838][1652475] Updated weights for policy 0, policy_version 137168 (0.0016) [2024-06-15 13:19:24,299][1652475] Updated weights for policy 0, policy_version 137221 (0.0014) [2024-06-15 13:19:25,738][1648984] Fps is (10 sec: 49151.9, 60 sec: 46421.5, 300 sec: 45875.2). Total num frames: 281116672. Throughput: 0: 11537.1. Samples: 70325760. Policy #0 lag: (min: 74.0, avg: 151.9, max: 330.0) [2024-06-15 13:19:25,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 13:19:27,858][1652475] Updated weights for policy 0, policy_version 137285 (0.0014) [2024-06-15 13:19:29,075][1652475] Updated weights for policy 0, policy_version 137334 (0.0012) [2024-06-15 13:19:30,738][1648984] Fps is (10 sec: 52429.4, 60 sec: 45875.1, 300 sec: 45208.7). Total num frames: 281280512. Throughput: 0: 11593.9. Samples: 70396928. Policy #0 lag: (min: 74.0, avg: 151.9, max: 330.0) [2024-06-15 13:19:30,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 13:19:32,848][1652475] Updated weights for policy 0, policy_version 137382 (0.0016) [2024-06-15 13:19:33,940][1652475] Updated weights for policy 0, policy_version 137440 (0.0017) [2024-06-15 13:19:35,738][1648984] Fps is (10 sec: 49151.9, 60 sec: 49152.0, 300 sec: 45986.3). Total num frames: 281608192. Throughput: 0: 11503.0. Samples: 70433792. 
Policy #0 lag: (min: 74.0, avg: 151.9, max: 330.0) [2024-06-15 13:19:35,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 13:19:36,208][1652475] Updated weights for policy 0, policy_version 137528 (0.0015) [2024-06-15 13:19:40,092][1652475] Updated weights for policy 0, policy_version 137584 (0.0018) [2024-06-15 13:19:40,738][1648984] Fps is (10 sec: 52429.1, 60 sec: 46421.6, 300 sec: 45764.1). Total num frames: 281804800. Throughput: 0: 11639.5. Samples: 70503936. Policy #0 lag: (min: 74.0, avg: 151.9, max: 330.0) [2024-06-15 13:19:40,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 13:19:43,814][1652475] Updated weights for policy 0, policy_version 137616 (0.0020) [2024-06-15 13:19:43,908][1651340] Signal inference workers to stop experience collection... (7100 times) [2024-06-15 13:19:43,965][1652475] InferenceWorker_p0-w0: stopping experience collection (7100 times) [2024-06-15 13:19:44,133][1651340] Signal inference workers to resume experience collection... (7100 times) [2024-06-15 13:19:44,133][1652475] InferenceWorker_p0-w0: resuming experience collection (7100 times) [2024-06-15 13:19:45,092][1652475] Updated weights for policy 0, policy_version 137680 (0.0109) [2024-06-15 13:19:45,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 47513.5, 300 sec: 45653.0). Total num frames: 282034176. Throughput: 0: 11548.4. Samples: 70574592. Policy #0 lag: (min: 74.0, avg: 151.9, max: 330.0) [2024-06-15 13:19:45,739][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 13:19:46,201][1652475] Updated weights for policy 0, policy_version 137732 (0.0119) [2024-06-15 13:19:50,259][1652475] Updated weights for policy 0, policy_version 137808 (0.0014) [2024-06-15 13:19:50,738][1648984] Fps is (10 sec: 45874.8, 60 sec: 46421.6, 300 sec: 45875.3). Total num frames: 282263552. Throughput: 0: 11514.3. Samples: 70603776. 
Policy #0 lag: (min: 74.0, avg: 151.9, max: 330.0) [2024-06-15 13:19:50,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 13:19:51,489][1652475] Updated weights for policy 0, policy_version 137855 (0.0011) [2024-06-15 13:19:55,740][1648984] Fps is (10 sec: 36036.6, 60 sec: 44781.1, 300 sec: 45430.5). Total num frames: 282394624. Throughput: 0: 11559.3. Samples: 70682112. Policy #0 lag: (min: 63.0, avg: 187.9, max: 319.0) [2024-06-15 13:19:55,741][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 13:19:55,929][1652475] Updated weights for policy 0, policy_version 137905 (0.0017) [2024-06-15 13:19:57,286][1652475] Updated weights for policy 0, policy_version 137969 (0.0016) [2024-06-15 13:19:58,865][1652475] Updated weights for policy 0, policy_version 138048 (0.0013) [2024-06-15 13:20:00,738][1648984] Fps is (10 sec: 45875.8, 60 sec: 45875.2, 300 sec: 45986.3). Total num frames: 282722304. Throughput: 0: 11650.9. Samples: 70743552. Policy #0 lag: (min: 63.0, avg: 187.9, max: 319.0) [2024-06-15 13:20:00,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 13:20:02,408][1652475] Updated weights for policy 0, policy_version 138111 (0.0012) [2024-06-15 13:20:05,739][1648984] Fps is (10 sec: 45882.0, 60 sec: 43690.0, 300 sec: 45430.8). Total num frames: 282853376. Throughput: 0: 11684.8. Samples: 70782976. Policy #0 lag: (min: 63.0, avg: 187.9, max: 319.0) [2024-06-15 13:20:05,739][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 13:20:07,592][1652475] Updated weights for policy 0, policy_version 138208 (0.0014) [2024-06-15 13:20:09,886][1652475] Updated weights for policy 0, policy_version 138272 (0.0118) [2024-06-15 13:20:10,738][1648984] Fps is (10 sec: 52428.2, 60 sec: 49152.3, 300 sec: 46208.4). Total num frames: 283246592. Throughput: 0: 11730.5. Samples: 70853632. 
Policy #0 lag: (min: 63.0, avg: 187.9, max: 319.0) [2024-06-15 13:20:10,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 13:20:12,887][1652475] Updated weights for policy 0, policy_version 138352 (0.0014) [2024-06-15 13:20:15,738][1648984] Fps is (10 sec: 52433.3, 60 sec: 45875.2, 300 sec: 45875.2). Total num frames: 283377664. Throughput: 0: 11787.4. Samples: 70927360. Policy #0 lag: (min: 63.0, avg: 187.9, max: 319.0) [2024-06-15 13:20:15,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 13:20:16,800][1652475] Updated weights for policy 0, policy_version 138372 (0.0013) [2024-06-15 13:20:18,522][1652475] Updated weights for policy 0, policy_version 138464 (0.0013) [2024-06-15 13:20:20,738][1648984] Fps is (10 sec: 45875.8, 60 sec: 49152.2, 300 sec: 45987.6). Total num frames: 283705344. Throughput: 0: 11787.4. Samples: 70964224. Policy #0 lag: (min: 63.0, avg: 187.9, max: 319.0) [2024-06-15 13:20:20,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 13:20:20,897][1652475] Updated weights for policy 0, policy_version 138535 (0.0021) [2024-06-15 13:20:22,865][1651340] Signal inference workers to stop experience collection... (7150 times) [2024-06-15 13:20:22,888][1652475] InferenceWorker_p0-w0: stopping experience collection (7150 times) [2024-06-15 13:20:23,182][1651340] Signal inference workers to resume experience collection... (7150 times) [2024-06-15 13:20:23,183][1652475] InferenceWorker_p0-w0: resuming experience collection (7150 times) [2024-06-15 13:20:23,789][1652475] Updated weights for policy 0, policy_version 138594 (0.0014) [2024-06-15 13:20:25,738][1648984] Fps is (10 sec: 52429.4, 60 sec: 46421.4, 300 sec: 46208.4). Total num frames: 283901952. Throughput: 0: 11605.4. Samples: 71026176. 
Policy #0 lag: (min: 63.0, avg: 187.9, max: 319.0) [2024-06-15 13:20:25,740][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 13:20:29,389][1652475] Updated weights for policy 0, policy_version 138672 (0.0021) [2024-06-15 13:20:30,738][1648984] Fps is (10 sec: 39321.3, 60 sec: 46967.5, 300 sec: 45764.1). Total num frames: 284098560. Throughput: 0: 11707.8. Samples: 71101440. Policy #0 lag: (min: 63.0, avg: 187.9, max: 319.0) [2024-06-15 13:20:30,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 13:20:30,916][1652475] Updated weights for policy 0, policy_version 138751 (0.0063) [2024-06-15 13:20:33,915][1652475] Updated weights for policy 0, policy_version 138832 (0.0014) [2024-06-15 13:20:34,995][1652475] Updated weights for policy 0, policy_version 138875 (0.0021) [2024-06-15 13:20:35,738][1648984] Fps is (10 sec: 52428.5, 60 sec: 46967.5, 300 sec: 46208.4). Total num frames: 284426240. Throughput: 0: 11707.8. Samples: 71130624. Policy #0 lag: (min: 63.0, avg: 187.9, max: 319.0) [2024-06-15 13:20:35,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 13:20:40,515][1652475] Updated weights for policy 0, policy_version 138934 (0.0015) [2024-06-15 13:20:40,738][1648984] Fps is (10 sec: 45875.0, 60 sec: 45875.2, 300 sec: 45764.1). Total num frames: 284557312. Throughput: 0: 11776.6. Samples: 71212032. Policy #0 lag: (min: 63.0, avg: 187.9, max: 319.0) [2024-06-15 13:20:40,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 13:20:41,647][1652475] Updated weights for policy 0, policy_version 138976 (0.0031) [2024-06-15 13:20:44,201][1652475] Updated weights for policy 0, policy_version 139043 (0.0015) [2024-06-15 13:20:45,738][1648984] Fps is (10 sec: 45874.8, 60 sec: 47513.6, 300 sec: 46430.6). Total num frames: 284884992. Throughput: 0: 11684.9. Samples: 71269376. 
Policy #0 lag: (min: 63.0, avg: 187.9, max: 319.0) [2024-06-15 13:20:45,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 13:20:45,796][1652475] Updated weights for policy 0, policy_version 139120 (0.0014) [2024-06-15 13:20:50,738][1648984] Fps is (10 sec: 42598.5, 60 sec: 45329.1, 300 sec: 45653.1). Total num frames: 284983296. Throughput: 0: 11742.1. Samples: 71311360. Policy #0 lag: (min: 63.0, avg: 187.9, max: 319.0) [2024-06-15 13:20:50,741][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 13:20:51,723][1652475] Updated weights for policy 0, policy_version 139194 (0.0019) [2024-06-15 13:20:53,547][1652475] Updated weights for policy 0, policy_version 139260 (0.0084) [2024-06-15 13:20:55,615][1652475] Updated weights for policy 0, policy_version 139302 (0.0013) [2024-06-15 13:20:55,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 48061.6, 300 sec: 45986.3). Total num frames: 285278208. Throughput: 0: 11628.1. Samples: 71376896. Policy #0 lag: (min: 6.0, avg: 127.5, max: 262.0) [2024-06-15 13:20:55,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 13:20:56,042][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000139328_285343744.pth... [2024-06-15 13:20:56,099][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000133952_274333696.pth [2024-06-15 13:20:57,336][1652475] Updated weights for policy 0, policy_version 139376 (0.0013) [2024-06-15 13:21:00,738][1648984] Fps is (10 sec: 49151.8, 60 sec: 45875.1, 300 sec: 45986.3). Total num frames: 285474816. Throughput: 0: 11605.3. Samples: 71449600. 
Policy #0 lag: (min: 6.0, avg: 127.5, max: 262.0) [2024-06-15 13:21:00,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 13:21:03,223][1652475] Updated weights for policy 0, policy_version 139452 (0.0137) [2024-06-15 13:21:04,822][1652475] Updated weights for policy 0, policy_version 139516 (0.0109) [2024-06-15 13:21:05,738][1648984] Fps is (10 sec: 45875.8, 60 sec: 48060.5, 300 sec: 45764.1). Total num frames: 285736960. Throughput: 0: 11525.7. Samples: 71482880. Policy #0 lag: (min: 6.0, avg: 127.5, max: 262.0) [2024-06-15 13:21:05,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 13:21:07,159][1652475] Updated weights for policy 0, policy_version 139575 (0.0021) [2024-06-15 13:21:07,789][1651340] Signal inference workers to stop experience collection... (7200 times) [2024-06-15 13:21:07,843][1652475] InferenceWorker_p0-w0: stopping experience collection (7200 times) [2024-06-15 13:21:08,095][1651340] Signal inference workers to resume experience collection... (7200 times) [2024-06-15 13:21:08,096][1652475] InferenceWorker_p0-w0: resuming experience collection (7200 times) [2024-06-15 13:21:08,593][1652475] Updated weights for policy 0, policy_version 139619 (0.0011) [2024-06-15 13:21:10,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 45875.2, 300 sec: 46652.8). Total num frames: 285999104. Throughput: 0: 11537.0. Samples: 71545344. Policy #0 lag: (min: 6.0, avg: 127.5, max: 262.0) [2024-06-15 13:21:10,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 13:21:13,906][1652475] Updated weights for policy 0, policy_version 139680 (0.0012) [2024-06-15 13:21:15,140][1652475] Updated weights for policy 0, policy_version 139728 (0.0123) [2024-06-15 13:21:15,738][1648984] Fps is (10 sec: 45874.9, 60 sec: 46967.5, 300 sec: 45986.3). Total num frames: 286195712. Throughput: 0: 11514.3. Samples: 71619584. 
Policy #0 lag: (min: 6.0, avg: 127.5, max: 262.0) [2024-06-15 13:21:15,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 13:21:17,750][1652475] Updated weights for policy 0, policy_version 139792 (0.0035) [2024-06-15 13:21:18,577][1652475] Updated weights for policy 0, policy_version 139832 (0.0012) [2024-06-15 13:21:20,359][1652475] Updated weights for policy 0, policy_version 139895 (0.0044) [2024-06-15 13:21:20,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 46967.4, 300 sec: 46652.7). Total num frames: 286523392. Throughput: 0: 11559.8. Samples: 71650816. Policy #0 lag: (min: 6.0, avg: 127.5, max: 262.0) [2024-06-15 13:21:20,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 13:21:25,738][1648984] Fps is (10 sec: 36044.0, 60 sec: 44236.6, 300 sec: 45764.1). Total num frames: 286556160. Throughput: 0: 11343.6. Samples: 71722496. Policy #0 lag: (min: 6.0, avg: 127.5, max: 262.0) [2024-06-15 13:21:25,739][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 13:21:26,610][1652475] Updated weights for policy 0, policy_version 139960 (0.0014) [2024-06-15 13:21:27,772][1652475] Updated weights for policy 0, policy_version 140005 (0.0016) [2024-06-15 13:21:30,657][1652475] Updated weights for policy 0, policy_version 140064 (0.0037) [2024-06-15 13:21:30,738][1648984] Fps is (10 sec: 32767.9, 60 sec: 45875.2, 300 sec: 45986.4). Total num frames: 286851072. Throughput: 0: 11411.9. Samples: 71782912. Policy #0 lag: (min: 6.0, avg: 127.5, max: 262.0) [2024-06-15 13:21:30,739][1648984] Avg episode reward: [(0, '-0.220')] [2024-06-15 13:21:31,001][1651340] Saving new best policy, reward=-0.220! [2024-06-15 13:21:32,621][1652475] Updated weights for policy 0, policy_version 140144 (0.0021) [2024-06-15 13:21:35,738][1648984] Fps is (10 sec: 49152.7, 60 sec: 43690.6, 300 sec: 45986.3). Total num frames: 287047680. Throughput: 0: 11093.3. Samples: 71810560. 
Policy #0 lag: (min: 6.0, avg: 127.5, max: 262.0) [2024-06-15 13:21:35,739][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 13:21:37,738][1652475] Updated weights for policy 0, policy_version 140192 (0.0016) [2024-06-15 13:21:39,199][1652475] Updated weights for policy 0, policy_version 140240 (0.0136) [2024-06-15 13:21:40,738][1648984] Fps is (10 sec: 45874.0, 60 sec: 45875.0, 300 sec: 45764.1). Total num frames: 287309824. Throughput: 0: 11241.2. Samples: 71882752. Policy #0 lag: (min: 6.0, avg: 127.5, max: 262.0) [2024-06-15 13:21:40,739][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 13:21:42,213][1652475] Updated weights for policy 0, policy_version 140304 (0.0014) [2024-06-15 13:21:43,871][1652475] Updated weights for policy 0, policy_version 140370 (0.0012) [2024-06-15 13:21:45,025][1652475] Updated weights for policy 0, policy_version 140416 (0.0011) [2024-06-15 13:21:45,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 44783.0, 300 sec: 46208.4). Total num frames: 287571968. Throughput: 0: 10956.8. Samples: 71942656. Policy #0 lag: (min: 6.0, avg: 127.5, max: 262.0) [2024-06-15 13:21:45,738][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 13:21:50,738][1648984] Fps is (10 sec: 39322.6, 60 sec: 45329.0, 300 sec: 45542.0). Total num frames: 287703040. Throughput: 0: 11104.7. Samples: 71982592. Policy #0 lag: (min: 6.0, avg: 127.5, max: 262.0) [2024-06-15 13:21:50,738][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 13:21:50,984][1652475] Updated weights for policy 0, policy_version 140498 (0.0130) [2024-06-15 13:21:53,904][1652475] Updated weights for policy 0, policy_version 140549 (0.0013) [2024-06-15 13:21:54,222][1651340] Signal inference workers to stop experience collection... (7250 times) [2024-06-15 13:21:54,266][1652475] InferenceWorker_p0-w0: stopping experience collection (7250 times) [2024-06-15 13:21:54,412][1651340] Signal inference workers to resume experience collection... 
(7250 times) [2024-06-15 13:21:54,413][1652475] InferenceWorker_p0-w0: resuming experience collection (7250 times) [2024-06-15 13:21:55,742][1648984] Fps is (10 sec: 39304.0, 60 sec: 44779.6, 300 sec: 45763.4). Total num frames: 287965184. Throughput: 0: 11206.0. Samples: 72049664. Policy #0 lag: (min: 6.0, avg: 127.5, max: 262.0) [2024-06-15 13:21:55,743][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 13:21:56,187][1652475] Updated weights for policy 0, policy_version 140640 (0.0012) [2024-06-15 13:21:57,021][1652475] Updated weights for policy 0, policy_version 140671 (0.0011) [2024-06-15 13:22:00,738][1648984] Fps is (10 sec: 39320.7, 60 sec: 43690.5, 300 sec: 45653.0). Total num frames: 288096256. Throughput: 0: 10979.5. Samples: 72113664. Policy #0 lag: (min: 6.0, avg: 127.5, max: 262.0) [2024-06-15 13:22:00,739][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 13:22:02,616][1652475] Updated weights for policy 0, policy_version 140731 (0.0013) [2024-06-15 13:22:03,790][1652475] Updated weights for policy 0, policy_version 140793 (0.0012) [2024-06-15 13:22:05,738][1648984] Fps is (10 sec: 39339.3, 60 sec: 43690.6, 300 sec: 45764.1). Total num frames: 288358400. Throughput: 0: 10979.6. Samples: 72144896. Policy #0 lag: (min: 15.0, avg: 89.9, max: 271.0) [2024-06-15 13:22:05,738][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 13:22:07,563][1652475] Updated weights for policy 0, policy_version 140864 (0.0014) [2024-06-15 13:22:09,279][1652475] Updated weights for policy 0, policy_version 140927 (0.0019) [2024-06-15 13:22:10,738][1648984] Fps is (10 sec: 52430.0, 60 sec: 43690.7, 300 sec: 45764.1). Total num frames: 288620544. Throughput: 0: 10683.8. Samples: 72203264. 
Policy #0 lag: (min: 15.0, avg: 89.9, max: 271.0) [2024-06-15 13:22:10,738][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 13:22:14,076][1652475] Updated weights for policy 0, policy_version 140976 (0.0015) [2024-06-15 13:22:15,738][1648984] Fps is (10 sec: 49152.1, 60 sec: 44236.8, 300 sec: 45653.1). Total num frames: 288849920. Throughput: 0: 10945.4. Samples: 72275456. Policy #0 lag: (min: 15.0, avg: 89.9, max: 271.0) [2024-06-15 13:22:15,738][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 13:22:18,365][1652475] Updated weights for policy 0, policy_version 141072 (0.0184) [2024-06-15 13:22:20,291][1652475] Updated weights for policy 0, policy_version 141152 (0.0012) [2024-06-15 13:22:20,738][1648984] Fps is (10 sec: 45875.6, 60 sec: 42598.5, 300 sec: 45876.4). Total num frames: 289079296. Throughput: 0: 11195.8. Samples: 72314368. Policy #0 lag: (min: 15.0, avg: 89.9, max: 271.0) [2024-06-15 13:22:20,738][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 13:22:24,557][1652475] Updated weights for policy 0, policy_version 141187 (0.0012) [2024-06-15 13:22:25,739][1648984] Fps is (10 sec: 42598.3, 60 sec: 45329.2, 300 sec: 45430.9). Total num frames: 289275904. Throughput: 0: 10991.0. Samples: 72377344. Policy #0 lag: (min: 15.0, avg: 89.9, max: 271.0) [2024-06-15 13:22:25,740][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 13:22:27,177][1652475] Updated weights for policy 0, policy_version 141266 (0.0014) [2024-06-15 13:22:30,207][1652475] Updated weights for policy 0, policy_version 141344 (0.0133) [2024-06-15 13:22:30,738][1648984] Fps is (10 sec: 42598.0, 60 sec: 44236.8, 300 sec: 45764.1). Total num frames: 289505280. Throughput: 0: 11150.2. Samples: 72444416. 
Policy #0 lag: (min: 15.0, avg: 89.9, max: 271.0) [2024-06-15 13:22:30,738][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 13:22:32,522][1652475] Updated weights for policy 0, policy_version 141424 (0.0011) [2024-06-15 13:22:35,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 43690.7, 300 sec: 45208.7). Total num frames: 289669120. Throughput: 0: 10877.2. Samples: 72472064. Policy #0 lag: (min: 15.0, avg: 89.9, max: 271.0) [2024-06-15 13:22:35,738][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 13:22:36,524][1652475] Updated weights for policy 0, policy_version 141444 (0.0011) [2024-06-15 13:22:37,338][1652475] Updated weights for policy 0, policy_version 141492 (0.0014) [2024-06-15 13:22:38,496][1651340] Signal inference workers to stop experience collection... (7300 times) [2024-06-15 13:22:38,572][1652475] InferenceWorker_p0-w0: stopping experience collection (7300 times) [2024-06-15 13:22:38,681][1651340] Signal inference workers to resume experience collection... (7300 times) [2024-06-15 13:22:38,682][1652475] InferenceWorker_p0-w0: resuming experience collection (7300 times) [2024-06-15 13:22:39,180][1652475] Updated weights for policy 0, policy_version 141559 (0.0195) [2024-06-15 13:22:40,738][1648984] Fps is (10 sec: 42598.5, 60 sec: 43690.9, 300 sec: 45764.2). Total num frames: 289931264. Throughput: 0: 10980.7. Samples: 72543744. Policy #0 lag: (min: 15.0, avg: 89.9, max: 271.0) [2024-06-15 13:22:40,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 13:22:42,222][1652475] Updated weights for policy 0, policy_version 141616 (0.0015) [2024-06-15 13:22:44,129][1652475] Updated weights for policy 0, policy_version 141686 (0.0012) [2024-06-15 13:22:45,740][1648984] Fps is (10 sec: 52415.6, 60 sec: 43688.9, 300 sec: 45319.4). Total num frames: 290193408. Throughput: 0: 10933.5. Samples: 72605696. 
Policy #0 lag: (min: 15.0, avg: 89.9, max: 271.0) [2024-06-15 13:22:45,742][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 13:22:50,708][1652475] Updated weights for policy 0, policy_version 141760 (0.0013) [2024-06-15 13:22:50,738][1648984] Fps is (10 sec: 39322.0, 60 sec: 43690.8, 300 sec: 45319.8). Total num frames: 290324480. Throughput: 0: 10991.0. Samples: 72639488. Policy #0 lag: (min: 15.0, avg: 89.9, max: 271.0) [2024-06-15 13:22:50,738][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 13:22:51,968][1652475] Updated weights for policy 0, policy_version 141821 (0.0031) [2024-06-15 13:22:54,901][1652475] Updated weights for policy 0, policy_version 141879 (0.0013) [2024-06-15 13:22:55,738][1648984] Fps is (10 sec: 39329.5, 60 sec: 43693.6, 300 sec: 45319.7). Total num frames: 290586624. Throughput: 0: 11036.3. Samples: 72699904. Policy #0 lag: (min: 15.0, avg: 89.9, max: 271.0) [2024-06-15 13:22:55,739][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 13:22:55,743][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000141888_290586624.pth... [2024-06-15 13:22:55,821][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000136576_279707648.pth [2024-06-15 13:22:57,587][1652475] Updated weights for policy 0, policy_version 141939 (0.0099) [2024-06-15 13:23:00,738][1648984] Fps is (10 sec: 39321.2, 60 sec: 43690.8, 300 sec: 44875.5). Total num frames: 290717696. Throughput: 0: 10888.5. Samples: 72765440. Policy #0 lag: (min: 15.0, avg: 89.9, max: 271.0) [2024-06-15 13:23:00,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 13:23:02,229][1652475] Updated weights for policy 0, policy_version 141984 (0.0014) [2024-06-15 13:23:04,552][1652475] Updated weights for policy 0, policy_version 142074 (0.0130) [2024-06-15 13:23:05,738][1648984] Fps is (10 sec: 39322.4, 60 sec: 43690.5, 300 sec: 45097.6). Total num frames: 290979840. Throughput: 0: 10672.3. 
Samples: 72794624. Policy #0 lag: (min: 87.0, avg: 180.7, max: 343.0) [2024-06-15 13:23:05,739][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 13:23:08,048][1652475] Updated weights for policy 0, policy_version 142135 (0.0023) [2024-06-15 13:23:09,943][1652475] Updated weights for policy 0, policy_version 142207 (0.0012) [2024-06-15 13:23:10,738][1648984] Fps is (10 sec: 52429.1, 60 sec: 43690.7, 300 sec: 44875.5). Total num frames: 291241984. Throughput: 0: 10592.7. Samples: 72854016. Policy #0 lag: (min: 87.0, avg: 180.7, max: 343.0) [2024-06-15 13:23:10,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 13:23:15,738][1648984] Fps is (10 sec: 39322.8, 60 sec: 42052.3, 300 sec: 44875.5). Total num frames: 291373056. Throughput: 0: 10661.0. Samples: 72924160. Policy #0 lag: (min: 87.0, avg: 180.7, max: 343.0) [2024-06-15 13:23:15,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 13:23:16,110][1652475] Updated weights for policy 0, policy_version 142288 (0.0016) [2024-06-15 13:23:17,319][1652475] Updated weights for policy 0, policy_version 142336 (0.0013) [2024-06-15 13:23:20,612][1652475] Updated weights for policy 0, policy_version 142398 (0.0013) [2024-06-15 13:23:20,742][1648984] Fps is (10 sec: 39303.8, 60 sec: 42595.2, 300 sec: 45097.0). Total num frames: 291635200. Throughput: 0: 10682.7. Samples: 72952832. Policy #0 lag: (min: 87.0, avg: 180.7, max: 343.0) [2024-06-15 13:23:20,743][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 13:23:25,738][1648984] Fps is (10 sec: 39321.0, 60 sec: 41506.1, 300 sec: 44875.5). Total num frames: 291766272. Throughput: 0: 10513.0. Samples: 73016832. Policy #0 lag: (min: 87.0, avg: 180.7, max: 343.0) [2024-06-15 13:23:25,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 13:23:26,654][1652475] Updated weights for policy 0, policy_version 142480 (0.0014) [2024-06-15 13:23:26,809][1651340] Signal inference workers to stop experience collection... 
(7350 times) [2024-06-15 13:23:26,851][1652475] InferenceWorker_p0-w0: stopping experience collection (7350 times) [2024-06-15 13:23:27,046][1651340] Signal inference workers to resume experience collection... (7350 times) [2024-06-15 13:23:27,047][1652475] InferenceWorker_p0-w0: resuming experience collection (7350 times) [2024-06-15 13:23:27,892][1652475] Updated weights for policy 0, policy_version 142527 (0.0013) [2024-06-15 13:23:29,676][1652475] Updated weights for policy 0, policy_version 142586 (0.0013) [2024-06-15 13:23:30,738][1648984] Fps is (10 sec: 39338.9, 60 sec: 42052.2, 300 sec: 45319.8). Total num frames: 292028416. Throughput: 0: 10525.0. Samples: 73079296. Policy #0 lag: (min: 87.0, avg: 180.7, max: 343.0) [2024-06-15 13:23:30,739][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 13:23:32,150][1652475] Updated weights for policy 0, policy_version 142652 (0.0020) [2024-06-15 13:23:33,903][1652475] Updated weights for policy 0, policy_version 142704 (0.0023) [2024-06-15 13:23:35,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 43690.6, 300 sec: 44986.6). Total num frames: 292290560. Throughput: 0: 10478.9. Samples: 73111040. Policy #0 lag: (min: 87.0, avg: 180.7, max: 343.0) [2024-06-15 13:23:35,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 13:23:40,502][1652475] Updated weights for policy 0, policy_version 142775 (0.0012) [2024-06-15 13:23:40,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 41506.1, 300 sec: 44875.5). Total num frames: 292421632. Throughput: 0: 10649.7. Samples: 73179136. Policy #0 lag: (min: 87.0, avg: 180.7, max: 343.0) [2024-06-15 13:23:40,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 13:23:41,618][1652475] Updated weights for policy 0, policy_version 142805 (0.0011) [2024-06-15 13:23:44,334][1652475] Updated weights for policy 0, policy_version 142868 (0.0035) [2024-06-15 13:23:45,738][1648984] Fps is (10 sec: 42597.7, 60 sec: 42053.8, 300 sec: 44875.5). Total num frames: 292716544. 
Throughput: 0: 10478.9. Samples: 73236992. Policy #0 lag: (min: 87.0, avg: 180.7, max: 343.0) [2024-06-15 13:23:45,739][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 13:23:46,309][1652475] Updated weights for policy 0, policy_version 142960 (0.0014) [2024-06-15 13:23:50,738][1648984] Fps is (10 sec: 39321.9, 60 sec: 41506.1, 300 sec: 44431.2). Total num frames: 292814848. Throughput: 0: 10513.1. Samples: 73267712. Policy #0 lag: (min: 87.0, avg: 180.7, max: 343.0) [2024-06-15 13:23:50,739][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 13:23:51,237][1652475] Updated weights for policy 0, policy_version 143008 (0.0121) [2024-06-15 13:23:54,577][1652475] Updated weights for policy 0, policy_version 143072 (0.0014) [2024-06-15 13:23:55,604][1652475] Updated weights for policy 0, policy_version 143106 (0.0011) [2024-06-15 13:23:55,738][1648984] Fps is (10 sec: 36045.7, 60 sec: 41506.4, 300 sec: 44431.2). Total num frames: 293076992. Throughput: 0: 10752.0. Samples: 73337856. Policy #0 lag: (min: 87.0, avg: 180.7, max: 343.0) [2024-06-15 13:23:55,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 13:23:57,406][1652475] Updated weights for policy 0, policy_version 143184 (0.0039) [2024-06-15 13:23:58,502][1652475] Updated weights for policy 0, policy_version 143232 (0.0012) [2024-06-15 13:24:00,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 43690.7, 300 sec: 44431.2). Total num frames: 293339136. Throughput: 0: 10638.2. Samples: 73402880. Policy #0 lag: (min: 87.0, avg: 180.7, max: 343.0) [2024-06-15 13:24:00,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 13:24:02,905][1652475] Updated weights for policy 0, policy_version 143280 (0.0017) [2024-06-15 13:24:05,466][1652475] Updated weights for policy 0, policy_version 143302 (0.0018) [2024-06-15 13:24:05,738][1648984] Fps is (10 sec: 42597.9, 60 sec: 42052.4, 300 sec: 44764.5). Total num frames: 293502976. Throughput: 0: 10821.3. Samples: 73439744. 
Policy #0 lag: (min: 87.0, avg: 180.7, max: 343.0) [2024-06-15 13:24:05,739][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 13:24:06,726][1652475] Updated weights for policy 0, policy_version 143354 (0.0014) [2024-06-15 13:24:08,628][1652475] Updated weights for policy 0, policy_version 143424 (0.0011) [2024-06-15 13:24:09,431][1651340] Signal inference workers to stop experience collection... (7400 times) [2024-06-15 13:24:09,460][1652475] InferenceWorker_p0-w0: stopping experience collection (7400 times) [2024-06-15 13:24:09,684][1651340] Signal inference workers to resume experience collection... (7400 times) [2024-06-15 13:24:09,685][1652475] InferenceWorker_p0-w0: resuming experience collection (7400 times) [2024-06-15 13:24:09,903][1652475] Updated weights for policy 0, policy_version 143485 (0.0012) [2024-06-15 13:24:10,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 43690.6, 300 sec: 44875.5). Total num frames: 293863424. Throughput: 0: 10774.8. Samples: 73501696. Policy #0 lag: (min: 19.0, avg: 132.3, max: 275.0) [2024-06-15 13:24:10,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 13:24:15,738][1648984] Fps is (10 sec: 49152.6, 60 sec: 43690.6, 300 sec: 44875.5). Total num frames: 293994496. Throughput: 0: 10968.2. Samples: 73572864. Policy #0 lag: (min: 19.0, avg: 132.3, max: 275.0) [2024-06-15 13:24:15,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 13:24:17,640][1652475] Updated weights for policy 0, policy_version 143556 (0.0124) [2024-06-15 13:24:18,797][1652475] Updated weights for policy 0, policy_version 143608 (0.0011) [2024-06-15 13:24:20,738][1648984] Fps is (10 sec: 39321.9, 60 sec: 43694.0, 300 sec: 44542.3). Total num frames: 294256640. Throughput: 0: 11093.4. Samples: 73610240. 
Policy #0 lag: (min: 19.0, avg: 132.3, max: 275.0) [2024-06-15 13:24:20,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 13:24:20,950][1652475] Updated weights for policy 0, policy_version 143698 (0.0014) [2024-06-15 13:24:21,958][1652475] Updated weights for policy 0, policy_version 143744 (0.0015) [2024-06-15 13:24:25,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 45875.2, 300 sec: 44875.5). Total num frames: 294518784. Throughput: 0: 11013.7. Samples: 73674752. Policy #0 lag: (min: 19.0, avg: 132.3, max: 275.0) [2024-06-15 13:24:25,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 13:24:29,231][1652475] Updated weights for policy 0, policy_version 143824 (0.0015) [2024-06-15 13:24:30,738][1648984] Fps is (10 sec: 39320.5, 60 sec: 43690.6, 300 sec: 44209.0). Total num frames: 294649856. Throughput: 0: 11275.4. Samples: 73744384. Policy #0 lag: (min: 19.0, avg: 132.3, max: 275.0) [2024-06-15 13:24:30,739][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 13:24:31,363][1652475] Updated weights for policy 0, policy_version 143904 (0.0014) [2024-06-15 13:24:33,252][1652475] Updated weights for policy 0, policy_version 143974 (0.0011) [2024-06-15 13:24:33,912][1652475] Updated weights for policy 0, policy_version 144000 (0.0012) [2024-06-15 13:24:35,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 43690.7, 300 sec: 44431.2). Total num frames: 294912000. Throughput: 0: 11207.1. Samples: 73772032. Policy #0 lag: (min: 19.0, avg: 132.3, max: 275.0) [2024-06-15 13:24:35,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 13:24:40,738][1648984] Fps is (10 sec: 39322.2, 60 sec: 43690.7, 300 sec: 44098.0). Total num frames: 295043072. Throughput: 0: 11195.7. Samples: 73841664. 
Policy #0 lag: (min: 19.0, avg: 132.3, max: 275.0) [2024-06-15 13:24:40,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 13:24:40,998][1652475] Updated weights for policy 0, policy_version 144067 (0.0013) [2024-06-15 13:24:42,815][1652475] Updated weights for policy 0, policy_version 144149 (0.0015) [2024-06-15 13:24:44,640][1652475] Updated weights for policy 0, policy_version 144227 (0.0014) [2024-06-15 13:24:45,738][1648984] Fps is (10 sec: 52428.0, 60 sec: 45329.1, 300 sec: 44653.3). Total num frames: 295436288. Throughput: 0: 11161.5. Samples: 73905152. Policy #0 lag: (min: 19.0, avg: 132.3, max: 275.0) [2024-06-15 13:24:45,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 13:24:47,909][1652475] Updated weights for policy 0, policy_version 144259 (0.0016) [2024-06-15 13:24:49,119][1652475] Updated weights for policy 0, policy_version 144318 (0.0132) [2024-06-15 13:24:50,738][1648984] Fps is (10 sec: 52429.1, 60 sec: 45875.2, 300 sec: 44653.7). Total num frames: 295567360. Throughput: 0: 11184.4. Samples: 73943040. Policy #0 lag: (min: 19.0, avg: 132.3, max: 275.0) [2024-06-15 13:24:50,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 13:24:54,036][1652475] Updated weights for policy 0, policy_version 144370 (0.0014) [2024-06-15 13:24:54,763][1651340] Signal inference workers to stop experience collection... (7450 times) [2024-06-15 13:24:54,823][1652475] InferenceWorker_p0-w0: stopping experience collection (7450 times) [2024-06-15 13:24:54,993][1651340] Signal inference workers to resume experience collection... (7450 times) [2024-06-15 13:24:54,994][1652475] InferenceWorker_p0-w0: resuming experience collection (7450 times) [2024-06-15 13:24:55,740][1648984] Fps is (10 sec: 36038.3, 60 sec: 45327.6, 300 sec: 44319.8). Total num frames: 295796736. Throughput: 0: 11320.4. Samples: 74011136. 
Policy #0 lag: (min: 19.0, avg: 132.3, max: 275.0) [2024-06-15 13:24:55,740][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 13:24:55,784][1652475] Updated weights for policy 0, policy_version 144448 (0.0013) [2024-06-15 13:24:56,215][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000144464_295862272.pth... [2024-06-15 13:24:56,423][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000139328_285343744.pth [2024-06-15 13:25:00,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 43690.7, 300 sec: 44431.3). Total num frames: 295960576. Throughput: 0: 10945.4. Samples: 74065408. Policy #0 lag: (min: 19.0, avg: 132.3, max: 275.0) [2024-06-15 13:25:00,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 13:25:02,680][1652475] Updated weights for policy 0, policy_version 144528 (0.0095) [2024-06-15 13:25:05,738][1648984] Fps is (10 sec: 32772.3, 60 sec: 43690.3, 300 sec: 43653.5). Total num frames: 296124416. Throughput: 0: 10831.5. Samples: 74097664. Policy #0 lag: (min: 19.0, avg: 132.3, max: 275.0) [2024-06-15 13:25:05,739][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 13:25:06,078][1652475] Updated weights for policy 0, policy_version 144608 (0.0015) [2024-06-15 13:25:07,928][1652475] Updated weights for policy 0, policy_version 144675 (0.0011) [2024-06-15 13:25:09,551][1652475] Updated weights for policy 0, policy_version 144739 (0.0011) [2024-06-15 13:25:10,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 43690.7, 300 sec: 44431.2). Total num frames: 296484864. Throughput: 0: 10535.8. Samples: 74148864. Policy #0 lag: (min: 19.0, avg: 132.3, max: 275.0) [2024-06-15 13:25:10,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 13:25:15,739][1648984] Fps is (10 sec: 36041.8, 60 sec: 41505.1, 300 sec: 43320.2). Total num frames: 296484864. Throughput: 0: 10569.6. Samples: 74220032. 
Policy #0 lag: (min: 19.0, avg: 132.3, max: 275.0) [2024-06-15 13:25:15,740][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 13:25:16,534][1652475] Updated weights for policy 0, policy_version 144784 (0.0121) [2024-06-15 13:25:18,264][1652475] Updated weights for policy 0, policy_version 144864 (0.0100) [2024-06-15 13:25:19,582][1652475] Updated weights for policy 0, policy_version 144912 (0.0013) [2024-06-15 13:25:20,738][1648984] Fps is (10 sec: 36044.7, 60 sec: 43144.5, 300 sec: 43875.8). Total num frames: 296845312. Throughput: 0: 10581.3. Samples: 74248192. Policy #0 lag: (min: 13.0, avg: 77.6, max: 269.0) [2024-06-15 13:25:20,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 13:25:22,323][1652475] Updated weights for policy 0, policy_version 144979 (0.0017) [2024-06-15 13:25:25,738][1648984] Fps is (10 sec: 52436.9, 60 sec: 41506.2, 300 sec: 43764.7). Total num frames: 297009152. Throughput: 0: 10467.6. Samples: 74312704. Policy #0 lag: (min: 13.0, avg: 77.6, max: 269.0) [2024-06-15 13:25:25,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 13:25:29,168][1652475] Updated weights for policy 0, policy_version 145044 (0.0013) [2024-06-15 13:25:30,738][1648984] Fps is (10 sec: 32768.1, 60 sec: 42052.4, 300 sec: 43209.3). Total num frames: 297172992. Throughput: 0: 10558.6. Samples: 74380288. Policy #0 lag: (min: 13.0, avg: 77.6, max: 269.0) [2024-06-15 13:25:30,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 13:25:31,294][1652475] Updated weights for policy 0, policy_version 145122 (0.0014) [2024-06-15 13:25:33,030][1652475] Updated weights for policy 0, policy_version 145186 (0.0011) [2024-06-15 13:25:34,600][1652475] Updated weights for policy 0, policy_version 145232 (0.0012) [2024-06-15 13:25:35,710][1652475] Updated weights for policy 0, policy_version 145280 (0.0041) [2024-06-15 13:25:35,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 43690.8, 300 sec: 43986.9). Total num frames: 297533440. 
Throughput: 0: 10160.4. Samples: 74400256. Policy #0 lag: (min: 13.0, avg: 77.6, max: 269.0) [2024-06-15 13:25:35,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 13:25:40,738][1648984] Fps is (10 sec: 36044.9, 60 sec: 41506.2, 300 sec: 42876.1). Total num frames: 297533440. Throughput: 0: 10229.1. Samples: 74471424. Policy #0 lag: (min: 13.0, avg: 77.6, max: 269.0) [2024-06-15 13:25:40,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 13:25:42,479][1651340] Signal inference workers to stop experience collection... (7500 times) [2024-06-15 13:25:42,520][1652475] InferenceWorker_p0-w0: stopping experience collection (7500 times) [2024-06-15 13:25:42,718][1651340] Signal inference workers to resume experience collection... (7500 times) [2024-06-15 13:25:42,719][1652475] InferenceWorker_p0-w0: resuming experience collection (7500 times) [2024-06-15 13:25:43,839][1652475] Updated weights for policy 0, policy_version 145378 (0.0164) [2024-06-15 13:25:45,358][1652475] Updated weights for policy 0, policy_version 145456 (0.0013) [2024-06-15 13:25:45,739][1648984] Fps is (10 sec: 39318.0, 60 sec: 41505.7, 300 sec: 43875.7). Total num frames: 297926656. Throughput: 0: 10239.8. Samples: 74526208. Policy #0 lag: (min: 13.0, avg: 77.6, max: 269.0) [2024-06-15 13:25:45,739][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 13:25:47,798][1652475] Updated weights for policy 0, policy_version 145520 (0.0014) [2024-06-15 13:25:50,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 41506.1, 300 sec: 43320.4). Total num frames: 298057728. Throughput: 0: 10171.9. Samples: 74555392. Policy #0 lag: (min: 13.0, avg: 77.6, max: 269.0) [2024-06-15 13:25:50,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 13:25:55,535][1652475] Updated weights for policy 0, policy_version 145584 (0.0118) [2024-06-15 13:25:55,738][1648984] Fps is (10 sec: 22939.4, 60 sec: 39322.9, 300 sec: 42987.2). Total num frames: 298156032. Throughput: 0: 10752.0. 
Samples: 74632704. Policy #0 lag: (min: 13.0, avg: 77.6, max: 269.0) [2024-06-15 13:25:55,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 13:25:57,988][1652475] Updated weights for policy 0, policy_version 145696 (0.0119) [2024-06-15 13:25:59,676][1652475] Updated weights for policy 0, policy_version 145730 (0.0010) [2024-06-15 13:26:00,738][1648984] Fps is (10 sec: 49151.4, 60 sec: 43144.4, 300 sec: 43431.5). Total num frames: 298549248. Throughput: 0: 10206.2. Samples: 74679296. Policy #0 lag: (min: 13.0, avg: 77.6, max: 269.0) [2024-06-15 13:26:00,738][1648984] Avg episode reward: [(0, '-0.530')] [2024-06-15 13:26:05,742][1648984] Fps is (10 sec: 42579.8, 60 sec: 40957.5, 300 sec: 42653.3). Total num frames: 298582016. Throughput: 0: 10318.6. Samples: 74712576. Policy #0 lag: (min: 13.0, avg: 77.6, max: 269.0) [2024-06-15 13:26:05,743][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 13:26:06,327][1652475] Updated weights for policy 0, policy_version 145793 (0.0014) [2024-06-15 13:26:07,624][1652475] Updated weights for policy 0, policy_version 145850 (0.0014) [2024-06-15 13:26:10,738][1648984] Fps is (10 sec: 22937.9, 60 sec: 38229.3, 300 sec: 42653.9). Total num frames: 298778624. Throughput: 0: 10387.9. Samples: 74780160. Policy #0 lag: (min: 13.0, avg: 77.6, max: 269.0) [2024-06-15 13:26:10,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 13:26:11,187][1652475] Updated weights for policy 0, policy_version 145920 (0.0015) [2024-06-15 13:26:13,688][1652475] Updated weights for policy 0, policy_version 146016 (0.0022) [2024-06-15 13:26:15,738][1648984] Fps is (10 sec: 52451.6, 60 sec: 43691.7, 300 sec: 42653.9). Total num frames: 299106304. Throughput: 0: 10046.6. Samples: 74832384. 
Policy #0 lag: (min: 13.0, avg: 77.6, max: 269.0) [2024-06-15 13:26:15,762][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 13:26:19,790][1652475] Updated weights for policy 0, policy_version 146105 (0.0016) [2024-06-15 13:26:20,738][1648984] Fps is (10 sec: 45875.4, 60 sec: 39867.8, 300 sec: 42987.2). Total num frames: 299237376. Throughput: 0: 10490.3. Samples: 74872320. Policy #0 lag: (min: 57.0, avg: 149.1, max: 313.0) [2024-06-15 13:26:20,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 13:26:22,181][1652475] Updated weights for policy 0, policy_version 146130 (0.0013) [2024-06-15 13:26:23,464][1652475] Updated weights for policy 0, policy_version 146192 (0.0014) [2024-06-15 13:26:24,936][1651340] Signal inference workers to stop experience collection... (7550 times) [2024-06-15 13:26:24,970][1652475] InferenceWorker_p0-w0: stopping experience collection (7550 times) [2024-06-15 13:26:25,321][1651340] Signal inference workers to resume experience collection... (7550 times) [2024-06-15 13:26:25,322][1652475] InferenceWorker_p0-w0: resuming experience collection (7550 times) [2024-06-15 13:26:25,324][1652475] Updated weights for policy 0, policy_version 146256 (0.0014) [2024-06-15 13:26:25,738][1648984] Fps is (10 sec: 45875.1, 60 sec: 42598.3, 300 sec: 43098.2). Total num frames: 299565056. Throughput: 0: 10456.1. Samples: 74941952. Policy #0 lag: (min: 57.0, avg: 149.1, max: 313.0) [2024-06-15 13:26:25,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 13:26:30,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 40960.0, 300 sec: 42654.0). Total num frames: 299630592. Throughput: 0: 10729.5. Samples: 75009024. 
Policy #0 lag: (min: 57.0, avg: 149.1, max: 313.0) [2024-06-15 13:26:30,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 13:26:30,853][1652475] Updated weights for policy 0, policy_version 146320 (0.0028) [2024-06-15 13:26:33,983][1652475] Updated weights for policy 0, policy_version 146384 (0.0013) [2024-06-15 13:26:35,742][1648984] Fps is (10 sec: 36029.2, 60 sec: 39864.8, 300 sec: 42764.4). Total num frames: 299925504. Throughput: 0: 10830.6. Samples: 75042816. Policy #0 lag: (min: 57.0, avg: 149.1, max: 313.0) [2024-06-15 13:26:35,742][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 13:26:35,938][1652475] Updated weights for policy 0, policy_version 146467 (0.0013) [2024-06-15 13:26:38,120][1652475] Updated weights for policy 0, policy_version 146556 (0.0117) [2024-06-15 13:26:40,738][1648984] Fps is (10 sec: 52428.4, 60 sec: 43690.6, 300 sec: 42653.9). Total num frames: 300154880. Throughput: 0: 10251.4. Samples: 75094016. Policy #0 lag: (min: 57.0, avg: 149.1, max: 313.0) [2024-06-15 13:26:40,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 13:26:44,596][1652475] Updated weights for policy 0, policy_version 146624 (0.0014) [2024-06-15 13:26:45,739][1648984] Fps is (10 sec: 36060.8, 60 sec: 39322.2, 300 sec: 42653.9). Total num frames: 300285952. Throughput: 0: 10740.6. Samples: 75162624. Policy #0 lag: (min: 57.0, avg: 149.1, max: 313.0) [2024-06-15 13:26:45,740][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 13:26:47,614][1652475] Updated weights for policy 0, policy_version 146676 (0.0026) [2024-06-15 13:26:49,814][1652475] Updated weights for policy 0, policy_version 146768 (0.0013) [2024-06-15 13:26:50,738][1648984] Fps is (10 sec: 49152.4, 60 sec: 43144.6, 300 sec: 42987.8). Total num frames: 300646400. Throughput: 0: 10764.4. Samples: 75196928. 
Policy #0 lag: (min: 57.0, avg: 149.1, max: 313.0) [2024-06-15 13:26:50,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 13:26:55,221][1652475] Updated weights for policy 0, policy_version 146825 (0.0014) [2024-06-15 13:26:55,738][1648984] Fps is (10 sec: 45875.0, 60 sec: 43144.5, 300 sec: 42876.1). Total num frames: 300744704. Throughput: 0: 10592.7. Samples: 75256832. Policy #0 lag: (min: 57.0, avg: 149.1, max: 313.0) [2024-06-15 13:26:55,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 13:26:56,160][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000146864_300777472.pth... [2024-06-15 13:26:56,196][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000141888_290586624.pth [2024-06-15 13:26:56,540][1652475] Updated weights for policy 0, policy_version 146880 (0.0012) [2024-06-15 13:27:00,738][1648984] Fps is (10 sec: 32768.0, 60 sec: 40414.0, 300 sec: 42765.0). Total num frames: 300974080. Throughput: 0: 10820.3. Samples: 75319296. Policy #0 lag: (min: 57.0, avg: 149.1, max: 313.0) [2024-06-15 13:27:00,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 13:27:01,077][1652475] Updated weights for policy 0, policy_version 146978 (0.0015) [2024-06-15 13:27:02,774][1652475] Updated weights for policy 0, policy_version 147056 (0.0016) [2024-06-15 13:27:05,738][1648984] Fps is (10 sec: 45873.9, 60 sec: 43693.6, 300 sec: 42653.9). Total num frames: 301203456. Throughput: 0: 10581.2. Samples: 75348480. Policy #0 lag: (min: 57.0, avg: 149.1, max: 313.0) [2024-06-15 13:27:05,739][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 13:27:08,240][1652475] Updated weights for policy 0, policy_version 147120 (0.0013) [2024-06-15 13:27:10,738][1648984] Fps is (10 sec: 36044.7, 60 sec: 42598.4, 300 sec: 42320.7). Total num frames: 301334528. Throughput: 0: 10683.8. Samples: 75422720. 
Policy #0 lag: (min: 57.0, avg: 149.1, max: 313.0) [2024-06-15 13:27:10,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 13:27:10,744][1652475] Updated weights for policy 0, policy_version 147145 (0.0011) [2024-06-15 13:27:11,358][1651340] Signal inference workers to stop experience collection... (7600 times) [2024-06-15 13:27:11,427][1652475] InferenceWorker_p0-w0: stopping experience collection (7600 times) [2024-06-15 13:27:11,569][1651340] Signal inference workers to resume experience collection... (7600 times) [2024-06-15 13:27:11,586][1652475] InferenceWorker_p0-w0: resuming experience collection (7600 times) [2024-06-15 13:27:12,040][1652475] Updated weights for policy 0, policy_version 147201 (0.0013) [2024-06-15 13:27:13,948][1652475] Updated weights for policy 0, policy_version 147296 (0.0013) [2024-06-15 13:27:15,738][1648984] Fps is (10 sec: 52430.5, 60 sec: 43690.7, 300 sec: 42876.1). Total num frames: 301727744. Throughput: 0: 10592.7. Samples: 75485696. Policy #0 lag: (min: 57.0, avg: 149.1, max: 313.0) [2024-06-15 13:27:15,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 13:27:20,238][1652475] Updated weights for policy 0, policy_version 147376 (0.0014) [2024-06-15 13:27:20,738][1648984] Fps is (10 sec: 52429.1, 60 sec: 43690.7, 300 sec: 42654.0). Total num frames: 301858816. Throughput: 0: 10707.6. Samples: 75524608. Policy #0 lag: (min: 57.0, avg: 149.1, max: 313.0) [2024-06-15 13:27:20,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 13:27:22,847][1652475] Updated weights for policy 0, policy_version 147440 (0.0016) [2024-06-15 13:27:25,738][1648984] Fps is (10 sec: 45875.2, 60 sec: 43690.7, 300 sec: 42987.2). Total num frames: 302186496. Throughput: 0: 10865.8. Samples: 75582976. 
Policy #0 lag: (min: 63.0, avg: 147.1, max: 319.0) [2024-06-15 13:27:25,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 13:27:26,445][1652475] Updated weights for policy 0, policy_version 147578 (0.0014) [2024-06-15 13:27:30,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 43690.7, 300 sec: 42653.9). Total num frames: 302252032. Throughput: 0: 10535.8. Samples: 75636736. Policy #0 lag: (min: 63.0, avg: 147.1, max: 319.0) [2024-06-15 13:27:30,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 13:27:34,738][1652475] Updated weights for policy 0, policy_version 147622 (0.0112) [2024-06-15 13:27:35,738][1648984] Fps is (10 sec: 22937.8, 60 sec: 41509.2, 300 sec: 42320.7). Total num frames: 302415872. Throughput: 0: 10547.2. Samples: 75671552. Policy #0 lag: (min: 63.0, avg: 147.1, max: 319.0) [2024-06-15 13:27:35,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 13:27:35,794][1652475] Updated weights for policy 0, policy_version 147680 (0.0012) [2024-06-15 13:27:38,184][1652475] Updated weights for policy 0, policy_version 147771 (0.0161) [2024-06-15 13:27:40,738][1648984] Fps is (10 sec: 45874.8, 60 sec: 42598.4, 300 sec: 42432.1). Total num frames: 302710784. Throughput: 0: 10410.7. Samples: 75725312. Policy #0 lag: (min: 63.0, avg: 147.1, max: 319.0) [2024-06-15 13:27:40,738][1648984] Avg episode reward: [(0, '-0.570')] [2024-06-15 13:27:45,738][1648984] Fps is (10 sec: 36044.7, 60 sec: 41506.1, 300 sec: 42209.6). Total num frames: 302776320. Throughput: 0: 10615.5. Samples: 75796992. 
Policy #0 lag: (min: 63.0, avg: 147.1, max: 319.0) [2024-06-15 13:27:45,738][1648984] Avg episode reward: [(0, '-0.570')] [2024-06-15 13:27:46,462][1652475] Updated weights for policy 0, policy_version 147842 (0.0015) [2024-06-15 13:27:48,303][1652475] Updated weights for policy 0, policy_version 147921 (0.0035) [2024-06-15 13:27:50,707][1652475] Updated weights for policy 0, policy_version 148016 (0.0016) [2024-06-15 13:27:50,738][1648984] Fps is (10 sec: 42598.8, 60 sec: 41506.1, 300 sec: 42542.9). Total num frames: 303136768. Throughput: 0: 10626.9. Samples: 75826688. Policy #0 lag: (min: 63.0, avg: 147.1, max: 319.0) [2024-06-15 13:27:50,738][1648984] Avg episode reward: [(0, '-0.570')] [2024-06-15 13:27:51,145][1652475] Updated weights for policy 0, policy_version 148032 (0.0012) [2024-06-15 13:27:53,592][1651340] Signal inference workers to stop experience collection... (7650 times) [2024-06-15 13:27:53,628][1652475] InferenceWorker_p0-w0: stopping experience collection (7650 times) [2024-06-15 13:27:53,967][1651340] Signal inference workers to resume experience collection... (7650 times) [2024-06-15 13:27:53,967][1652475] InferenceWorker_p0-w0: resuming experience collection (7650 times) [2024-06-15 13:27:54,335][1652475] Updated weights for policy 0, policy_version 148095 (0.0014) [2024-06-15 13:27:55,738][1648984] Fps is (10 sec: 52428.3, 60 sec: 42598.4, 300 sec: 42653.9). Total num frames: 303300608. Throughput: 0: 10149.0. Samples: 75879424. Policy #0 lag: (min: 63.0, avg: 147.1, max: 319.0) [2024-06-15 13:27:55,738][1648984] Avg episode reward: [(0, '-0.570')] [2024-06-15 13:28:00,738][1648984] Fps is (10 sec: 29491.0, 60 sec: 40960.0, 300 sec: 42209.7). Total num frames: 303431680. Throughput: 0: 10387.9. Samples: 75953152. 
Policy #0 lag: (min: 63.0, avg: 147.1, max: 319.0) [2024-06-15 13:28:00,738][1648984] Avg episode reward: [(0, '-0.570')] [2024-06-15 13:28:00,759][1652475] Updated weights for policy 0, policy_version 148176 (0.0014) [2024-06-15 13:28:02,092][1652475] Updated weights for policy 0, policy_version 148224 (0.0042) [2024-06-15 13:28:05,665][1652475] Updated weights for policy 0, policy_version 148289 (0.0017) [2024-06-15 13:28:05,738][1648984] Fps is (10 sec: 39319.8, 60 sec: 41506.0, 300 sec: 42209.5). Total num frames: 303693824. Throughput: 0: 9989.6. Samples: 75974144. Policy #0 lag: (min: 63.0, avg: 147.1, max: 319.0) [2024-06-15 13:28:05,746][1648984] Avg episode reward: [(0, '-0.570')] [2024-06-15 13:28:10,738][1648984] Fps is (10 sec: 39321.3, 60 sec: 41506.1, 300 sec: 42209.6). Total num frames: 303824896. Throughput: 0: 10228.6. Samples: 76043264. Policy #0 lag: (min: 63.0, avg: 147.1, max: 319.0) [2024-06-15 13:28:10,738][1648984] Avg episode reward: [(0, '-0.570')] [2024-06-15 13:28:11,753][1652475] Updated weights for policy 0, policy_version 148357 (0.0030) [2024-06-15 13:28:13,761][1652475] Updated weights for policy 0, policy_version 148433 (0.0119) [2024-06-15 13:28:15,738][1648984] Fps is (10 sec: 42601.0, 60 sec: 39867.8, 300 sec: 42321.4). Total num frames: 304119808. Throughput: 0: 10274.1. Samples: 76099072. Policy #0 lag: (min: 63.0, avg: 147.1, max: 319.0) [2024-06-15 13:28:15,738][1648984] Avg episode reward: [(0, '-0.530')] [2024-06-15 13:28:16,267][1652475] Updated weights for policy 0, policy_version 148528 (0.0016) [2024-06-15 13:28:19,335][1652475] Updated weights for policy 0, policy_version 148580 (0.0014) [2024-06-15 13:28:20,738][1648984] Fps is (10 sec: 52429.4, 60 sec: 41506.1, 300 sec: 42654.0). Total num frames: 304349184. Throughput: 0: 10137.6. Samples: 76127744. 
Policy #0 lag: (min: 63.0, avg: 147.1, max: 319.0) [2024-06-15 13:28:20,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 13:28:25,166][1652475] Updated weights for policy 0, policy_version 148656 (0.0017) [2024-06-15 13:28:25,738][1648984] Fps is (10 sec: 36044.5, 60 sec: 38229.3, 300 sec: 42209.6). Total num frames: 304480256. Throughput: 0: 10490.3. Samples: 76197376. Policy #0 lag: (min: 63.0, avg: 147.1, max: 319.0) [2024-06-15 13:28:25,741][1648984] Avg episode reward: [(0, '-0.520')] [2024-06-15 13:28:28,269][1652475] Updated weights for policy 0, policy_version 148721 (0.0013) [2024-06-15 13:28:30,026][1652475] Updated weights for policy 0, policy_version 148796 (0.0013) [2024-06-15 13:28:30,738][1648984] Fps is (10 sec: 39321.3, 60 sec: 41506.1, 300 sec: 42209.6). Total num frames: 304742400. Throughput: 0: 9921.4. Samples: 76243456. Policy #0 lag: (min: 5.0, avg: 124.6, max: 325.0) [2024-06-15 13:28:30,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 13:28:32,075][1652475] Updated weights for policy 0, policy_version 148849 (0.0016) [2024-06-15 13:28:35,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 40959.9, 300 sec: 42209.6). Total num frames: 304873472. Throughput: 0: 9966.9. Samples: 76275200. Policy #0 lag: (min: 5.0, avg: 124.6, max: 325.0) [2024-06-15 13:28:35,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 13:28:36,655][1652475] Updated weights for policy 0, policy_version 148899 (0.0031) [2024-06-15 13:28:40,344][1652475] Updated weights for policy 0, policy_version 148946 (0.0012) [2024-06-15 13:28:40,738][1648984] Fps is (10 sec: 32768.2, 60 sec: 39321.6, 300 sec: 41876.4). Total num frames: 305070080. Throughput: 0: 10444.8. Samples: 76349440. Policy #0 lag: (min: 5.0, avg: 124.6, max: 325.0) [2024-06-15 13:28:40,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 13:28:41,502][1651340] Signal inference workers to stop experience collection... 
(7700 times) [2024-06-15 13:28:41,618][1652475] InferenceWorker_p0-w0: stopping experience collection (7700 times) [2024-06-15 13:28:41,790][1651340] Signal inference workers to resume experience collection... (7700 times) [2024-06-15 13:28:41,791][1652475] InferenceWorker_p0-w0: resuming experience collection (7700 times) [2024-06-15 13:28:42,094][1652475] Updated weights for policy 0, policy_version 149041 (0.0087) [2024-06-15 13:28:43,668][1652475] Updated weights for policy 0, policy_version 149115 (0.0013) [2024-06-15 13:28:45,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 43690.7, 300 sec: 42653.9). Total num frames: 305397760. Throughput: 0: 10217.2. Samples: 76412928. Policy #0 lag: (min: 5.0, avg: 124.6, max: 325.0) [2024-06-15 13:28:45,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 13:28:48,314][1652475] Updated weights for policy 0, policy_version 149177 (0.0014) [2024-06-15 13:28:50,738][1648984] Fps is (10 sec: 45875.2, 60 sec: 39867.7, 300 sec: 42209.6). Total num frames: 305528832. Throughput: 0: 10490.4. Samples: 76446208. Policy #0 lag: (min: 5.0, avg: 124.6, max: 325.0) [2024-06-15 13:28:50,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 13:28:52,642][1652475] Updated weights for policy 0, policy_version 149248 (0.0012) [2024-06-15 13:28:54,865][1652475] Updated weights for policy 0, policy_version 149329 (0.0014) [2024-06-15 13:28:55,738][1648984] Fps is (10 sec: 49151.8, 60 sec: 43144.6, 300 sec: 42542.9). Total num frames: 305889280. Throughput: 0: 10444.8. Samples: 76513280. Policy #0 lag: (min: 5.0, avg: 124.6, max: 325.0) [2024-06-15 13:28:55,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 13:28:55,762][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000149376_305922048.pth... 
[2024-06-15 13:28:55,835][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000144464_295862272.pth [2024-06-15 13:28:58,618][1652475] Updated weights for policy 0, policy_version 149377 (0.0134) [2024-06-15 13:29:00,739][1648984] Fps is (10 sec: 52420.2, 60 sec: 43689.5, 300 sec: 42542.6). Total num frames: 306053120. Throughput: 0: 10751.6. Samples: 76582912. Policy #0 lag: (min: 5.0, avg: 124.6, max: 325.0) [2024-06-15 13:29:00,740][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 13:29:03,452][1652475] Updated weights for policy 0, policy_version 149472 (0.0011) [2024-06-15 13:29:05,423][1652475] Updated weights for policy 0, policy_version 149567 (0.0015) [2024-06-15 13:29:05,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 43691.0, 300 sec: 42209.6). Total num frames: 306315264. Throughput: 0: 11036.4. Samples: 76624384. Policy #0 lag: (min: 5.0, avg: 124.6, max: 325.0) [2024-06-15 13:29:05,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 13:29:07,512][1652475] Updated weights for policy 0, policy_version 149629 (0.0014) [2024-06-15 13:29:10,738][1648984] Fps is (10 sec: 42604.4, 60 sec: 44236.7, 300 sec: 42320.7). Total num frames: 306479104. Throughput: 0: 10865.7. Samples: 76686336. Policy #0 lag: (min: 5.0, avg: 124.6, max: 325.0) [2024-06-15 13:29:10,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 13:29:11,719][1652475] Updated weights for policy 0, policy_version 149688 (0.0013) [2024-06-15 13:29:14,986][1652475] Updated weights for policy 0, policy_version 149728 (0.0017) [2024-06-15 13:29:15,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 43144.4, 300 sec: 42209.6). Total num frames: 306708480. Throughput: 0: 11400.5. Samples: 76756480. 
Policy #0 lag: (min: 5.0, avg: 124.6, max: 325.0) [2024-06-15 13:29:15,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 13:29:16,715][1652475] Updated weights for policy 0, policy_version 149808 (0.0014) [2024-06-15 13:29:18,164][1652475] Updated weights for policy 0, policy_version 149845 (0.0011) [2024-06-15 13:29:20,738][1648984] Fps is (10 sec: 49152.9, 60 sec: 43690.6, 300 sec: 42209.6). Total num frames: 306970624. Throughput: 0: 11309.5. Samples: 76784128. Policy #0 lag: (min: 5.0, avg: 124.6, max: 325.0) [2024-06-15 13:29:20,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 13:29:22,531][1652475] Updated weights for policy 0, policy_version 149905 (0.0087) [2024-06-15 13:29:25,683][1652475] Updated weights for policy 0, policy_version 149955 (0.0012) [2024-06-15 13:29:25,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 43690.7, 300 sec: 42209.7). Total num frames: 307101696. Throughput: 0: 11275.4. Samples: 76856832. Policy #0 lag: (min: 5.0, avg: 124.6, max: 325.0) [2024-06-15 13:29:25,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 13:29:26,618][1651340] Signal inference workers to stop experience collection... (7750 times) [2024-06-15 13:29:26,705][1652475] InferenceWorker_p0-w0: stopping experience collection (7750 times) [2024-06-15 13:29:26,950][1651340] Signal inference workers to resume experience collection... (7750 times) [2024-06-15 13:29:26,980][1652475] InferenceWorker_p0-w0: resuming experience collection (7750 times) [2024-06-15 13:29:27,694][1652475] Updated weights for policy 0, policy_version 150048 (0.0086) [2024-06-15 13:29:29,805][1652475] Updated weights for policy 0, policy_version 150114 (0.0013) [2024-06-15 13:29:30,738][1648984] Fps is (10 sec: 52429.6, 60 sec: 45875.3, 300 sec: 42654.0). Total num frames: 307494912. Throughput: 0: 11173.0. Samples: 76915712. 
Policy #0 lag: (min: 5.0, avg: 124.6, max: 325.0) [2024-06-15 13:29:30,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 13:29:35,300][1652475] Updated weights for policy 0, policy_version 150197 (0.0014) [2024-06-15 13:29:35,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 45875.2, 300 sec: 42654.0). Total num frames: 307625984. Throughput: 0: 11389.2. Samples: 76958720. Policy #0 lag: (min: 3.0, avg: 100.0, max: 259.0) [2024-06-15 13:29:35,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 13:29:38,675][1652475] Updated weights for policy 0, policy_version 150271 (0.0014) [2024-06-15 13:29:40,664][1652475] Updated weights for policy 0, policy_version 150339 (0.0014) [2024-06-15 13:29:40,738][1648984] Fps is (10 sec: 39321.1, 60 sec: 46967.5, 300 sec: 42209.7). Total num frames: 307888128. Throughput: 0: 11286.8. Samples: 77021184. Policy #0 lag: (min: 3.0, avg: 100.0, max: 259.0) [2024-06-15 13:29:40,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 13:29:45,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 43690.7, 300 sec: 42209.6). Total num frames: 308019200. Throughput: 0: 11127.9. Samples: 77083648. Policy #0 lag: (min: 3.0, avg: 100.0, max: 259.0) [2024-06-15 13:29:45,738][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 13:29:47,809][1652475] Updated weights for policy 0, policy_version 150401 (0.0084) [2024-06-15 13:29:49,626][1652475] Updated weights for policy 0, policy_version 150480 (0.0013) [2024-06-15 13:29:50,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 45875.2, 300 sec: 42321.0). Total num frames: 308281344. Throughput: 0: 11036.5. Samples: 77121024. 
Policy #0 lag: (min: 3.0, avg: 100.0, max: 259.0) [2024-06-15 13:29:50,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 13:29:51,722][1652475] Updated weights for policy 0, policy_version 150586 (0.0026) [2024-06-15 13:29:55,312][1652475] Updated weights for policy 0, policy_version 150654 (0.0012) [2024-06-15 13:29:55,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 44236.8, 300 sec: 42653.9). Total num frames: 308543488. Throughput: 0: 10968.2. Samples: 77179904. Policy #0 lag: (min: 3.0, avg: 100.0, max: 259.0) [2024-06-15 13:29:55,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 13:30:00,738][1648984] Fps is (10 sec: 36045.1, 60 sec: 43145.8, 300 sec: 42431.9). Total num frames: 308641792. Throughput: 0: 10956.8. Samples: 77249536. Policy #0 lag: (min: 3.0, avg: 100.0, max: 259.0) [2024-06-15 13:30:00,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 13:30:01,447][1652475] Updated weights for policy 0, policy_version 150741 (0.0014) [2024-06-15 13:30:02,903][1652475] Updated weights for policy 0, policy_version 150816 (0.0012) [2024-06-15 13:30:03,510][1652475] Updated weights for policy 0, policy_version 150848 (0.0022) [2024-06-15 13:30:05,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 43690.7, 300 sec: 42209.6). Total num frames: 308936704. Throughput: 0: 10911.3. Samples: 77275136. Policy #0 lag: (min: 3.0, avg: 100.0, max: 259.0) [2024-06-15 13:30:05,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 13:30:07,266][1652475] Updated weights for policy 0, policy_version 150896 (0.0021) [2024-06-15 13:30:10,762][1648984] Fps is (10 sec: 45762.4, 60 sec: 43672.9, 300 sec: 42761.7). Total num frames: 309100544. Throughput: 0: 11075.9. Samples: 77355520. 
Policy #0 lag: (min: 3.0, avg: 100.0, max: 259.0) [2024-06-15 13:30:10,763][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 13:30:11,522][1652475] Updated weights for policy 0, policy_version 150968 (0.0038) [2024-06-15 13:30:11,730][1651340] Signal inference workers to stop experience collection... (7800 times) [2024-06-15 13:30:11,774][1651340] Signal inference workers to resume experience collection... (7800 times) [2024-06-15 13:30:11,791][1652475] InferenceWorker_p0-w0: stopping experience collection (7800 times) [2024-06-15 13:30:11,818][1652475] InferenceWorker_p0-w0: resuming experience collection (7800 times) [2024-06-15 13:30:12,412][1652475] Updated weights for policy 0, policy_version 151008 (0.0014) [2024-06-15 13:30:15,738][1648984] Fps is (10 sec: 52429.1, 60 sec: 45875.2, 300 sec: 42765.0). Total num frames: 309460992. Throughput: 0: 11229.8. Samples: 77421056. Policy #0 lag: (min: 3.0, avg: 100.0, max: 259.0) [2024-06-15 13:30:15,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 13:30:17,081][1652475] Updated weights for policy 0, policy_version 151105 (0.0014) [2024-06-15 13:30:20,738][1648984] Fps is (10 sec: 49272.9, 60 sec: 43690.7, 300 sec: 42653.9). Total num frames: 309592064. Throughput: 0: 11116.1. Samples: 77458944. Policy #0 lag: (min: 3.0, avg: 100.0, max: 259.0) [2024-06-15 13:30:20,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 13:30:21,424][1652475] Updated weights for policy 0, policy_version 151175 (0.0014) [2024-06-15 13:30:23,070][1652475] Updated weights for policy 0, policy_version 151236 (0.0014) [2024-06-15 13:30:24,819][1652475] Updated weights for policy 0, policy_version 151314 (0.0012) [2024-06-15 13:30:25,738][1648984] Fps is (10 sec: 52428.5, 60 sec: 48059.7, 300 sec: 43431.5). Total num frames: 309985280. Throughput: 0: 11377.8. Samples: 77533184. 
Policy #0 lag: (min: 3.0, avg: 100.0, max: 259.0) [2024-06-15 13:30:25,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 13:30:28,934][1652475] Updated weights for policy 0, policy_version 151392 (0.0185) [2024-06-15 13:30:30,738][1648984] Fps is (10 sec: 52429.5, 60 sec: 43690.7, 300 sec: 42653.9). Total num frames: 310116352. Throughput: 0: 11423.3. Samples: 77597696. Policy #0 lag: (min: 3.0, avg: 100.0, max: 259.0) [2024-06-15 13:30:30,741][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 13:30:33,163][1652475] Updated weights for policy 0, policy_version 151456 (0.0012) [2024-06-15 13:30:35,164][1652475] Updated weights for policy 0, policy_version 151520 (0.0012) [2024-06-15 13:30:35,738][1648984] Fps is (10 sec: 36044.8, 60 sec: 45329.0, 300 sec: 43431.5). Total num frames: 310345728. Throughput: 0: 11446.0. Samples: 77636096. Policy #0 lag: (min: 3.0, avg: 100.0, max: 259.0) [2024-06-15 13:30:35,738][1648984] Avg episode reward: [(0, '-0.230')] [2024-06-15 13:30:36,547][1652475] Updated weights for policy 0, policy_version 151575 (0.0023) [2024-06-15 13:30:40,304][1652475] Updated weights for policy 0, policy_version 151648 (0.0015) [2024-06-15 13:30:40,738][1648984] Fps is (10 sec: 49151.2, 60 sec: 45329.0, 300 sec: 42987.3). Total num frames: 310607872. Throughput: 0: 11434.6. Samples: 77694464. Policy #0 lag: (min: 15.0, avg: 135.7, max: 271.0) [2024-06-15 13:30:40,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 13:30:45,738][1648984] Fps is (10 sec: 36045.1, 60 sec: 44783.0, 300 sec: 42876.1). Total num frames: 310706176. Throughput: 0: 11480.2. Samples: 77766144. 
Policy #0 lag: (min: 15.0, avg: 135.7, max: 271.0) [2024-06-15 13:30:45,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 13:30:45,815][1652475] Updated weights for policy 0, policy_version 151714 (0.0012) [2024-06-15 13:30:48,124][1652475] Updated weights for policy 0, policy_version 151804 (0.0013) [2024-06-15 13:30:50,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 45329.1, 300 sec: 43542.6). Total num frames: 311001088. Throughput: 0: 11525.7. Samples: 77793792. Policy #0 lag: (min: 15.0, avg: 135.7, max: 271.0) [2024-06-15 13:30:50,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 13:30:51,121][1652475] Updated weights for policy 0, policy_version 151875 (0.0205) [2024-06-15 13:30:52,283][1652475] Updated weights for policy 0, policy_version 151936 (0.0014) [2024-06-15 13:30:55,748][1648984] Fps is (10 sec: 45827.2, 60 sec: 43683.0, 300 sec: 42763.5). Total num frames: 311164928. Throughput: 0: 11233.4. Samples: 77860864. Policy #0 lag: (min: 15.0, avg: 135.7, max: 271.0) [2024-06-15 13:30:55,749][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 13:30:55,757][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000151936_311164928.pth... [2024-06-15 13:30:55,832][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000146864_300777472.pth [2024-06-15 13:30:55,838][1651340] Saving a milestone train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/milestones/checkpoint_000151936_311164928.pth [2024-06-15 13:30:56,326][1651340] Signal inference workers to stop experience collection... (7850 times) [2024-06-15 13:30:56,357][1652475] InferenceWorker_p0-w0: stopping experience collection (7850 times) [2024-06-15 13:30:56,517][1651340] Signal inference workers to resume experience collection... 
(7850 times) [2024-06-15 13:30:56,518][1652475] InferenceWorker_p0-w0: resuming experience collection (7850 times) [2024-06-15 13:30:57,601][1652475] Updated weights for policy 0, policy_version 152000 (0.0015) [2024-06-15 13:30:59,740][1652475] Updated weights for policy 0, policy_version 152064 (0.0018) [2024-06-15 13:31:00,738][1648984] Fps is (10 sec: 42598.5, 60 sec: 46421.3, 300 sec: 43543.2). Total num frames: 311427072. Throughput: 0: 11355.0. Samples: 77932032. Policy #0 lag: (min: 15.0, avg: 135.7, max: 271.0) [2024-06-15 13:31:00,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 13:31:02,430][1652475] Updated weights for policy 0, policy_version 152132 (0.0125) [2024-06-15 13:31:03,565][1652475] Updated weights for policy 0, policy_version 152191 (0.0014) [2024-06-15 13:31:05,738][1648984] Fps is (10 sec: 52484.1, 60 sec: 45875.3, 300 sec: 43764.7). Total num frames: 311689216. Throughput: 0: 11150.2. Samples: 77960704. Policy #0 lag: (min: 15.0, avg: 135.7, max: 271.0) [2024-06-15 13:31:05,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 13:31:09,243][1652475] Updated weights for policy 0, policy_version 152249 (0.0014) [2024-06-15 13:31:10,738][1648984] Fps is (10 sec: 49152.3, 60 sec: 46986.8, 300 sec: 43431.5). Total num frames: 311918592. Throughput: 0: 11161.6. Samples: 78035456. Policy #0 lag: (min: 15.0, avg: 135.7, max: 271.0) [2024-06-15 13:31:10,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 13:31:10,913][1652475] Updated weights for policy 0, policy_version 152311 (0.0013) [2024-06-15 13:31:13,878][1652475] Updated weights for policy 0, policy_version 152400 (0.0153) [2024-06-15 13:31:15,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 45875.2, 300 sec: 43986.9). Total num frames: 312213504. Throughput: 0: 11138.8. Samples: 78098944. 
Policy #0 lag: (min: 15.0, avg: 135.7, max: 271.0) [2024-06-15 13:31:15,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 13:31:19,673][1652475] Updated weights for policy 0, policy_version 152464 (0.0100) [2024-06-15 13:31:20,738][1648984] Fps is (10 sec: 39321.2, 60 sec: 45329.1, 300 sec: 43209.3). Total num frames: 312311808. Throughput: 0: 11252.6. Samples: 78142464. Policy #0 lag: (min: 15.0, avg: 135.7, max: 271.0) [2024-06-15 13:31:20,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 13:31:22,147][1652475] Updated weights for policy 0, policy_version 152549 (0.0015) [2024-06-15 13:31:24,280][1652475] Updated weights for policy 0, policy_version 152592 (0.0015) [2024-06-15 13:31:25,746][1648984] Fps is (10 sec: 42574.7, 60 sec: 44232.8, 300 sec: 44097.1). Total num frames: 312639488. Throughput: 0: 11410.5. Samples: 78208000. Policy #0 lag: (min: 15.0, avg: 135.7, max: 271.0) [2024-06-15 13:31:25,762][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 13:31:26,189][1652475] Updated weights for policy 0, policy_version 152675 (0.0014) [2024-06-15 13:31:30,738][1648984] Fps is (10 sec: 42598.8, 60 sec: 43690.7, 300 sec: 43432.1). Total num frames: 312737792. Throughput: 0: 11446.1. Samples: 78281216. Policy #0 lag: (min: 15.0, avg: 135.7, max: 271.0) [2024-06-15 13:31:30,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 13:31:31,748][1652475] Updated weights for policy 0, policy_version 152738 (0.0015) [2024-06-15 13:31:34,184][1652475] Updated weights for policy 0, policy_version 152816 (0.0015) [2024-06-15 13:31:35,739][1648984] Fps is (10 sec: 36061.3, 60 sec: 44236.1, 300 sec: 43542.4). Total num frames: 312999936. Throughput: 0: 11457.2. Samples: 78309376. 
Policy #0 lag: (min: 15.0, avg: 135.7, max: 271.0) [2024-06-15 13:31:35,739][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 13:31:36,018][1652475] Updated weights for policy 0, policy_version 152848 (0.0014) [2024-06-15 13:31:37,306][1652475] Updated weights for policy 0, policy_version 152912 (0.0016) [2024-06-15 13:31:37,752][1651340] Signal inference workers to stop experience collection... (7900 times) [2024-06-15 13:31:37,784][1652475] InferenceWorker_p0-w0: stopping experience collection (7900 times) [2024-06-15 13:31:37,968][1651340] Signal inference workers to resume experience collection... (7900 times) [2024-06-15 13:31:37,969][1652475] InferenceWorker_p0-w0: resuming experience collection (7900 times) [2024-06-15 13:31:40,738][1648984] Fps is (10 sec: 52428.4, 60 sec: 44236.8, 300 sec: 43986.9). Total num frames: 313262080. Throughput: 0: 11437.3. Samples: 78375424. Policy #0 lag: (min: 15.0, avg: 135.7, max: 271.0) [2024-06-15 13:31:40,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 13:31:42,719][1652475] Updated weights for policy 0, policy_version 152980 (0.0016) [2024-06-15 13:31:45,618][1652475] Updated weights for policy 0, policy_version 153072 (0.0111) [2024-06-15 13:31:45,738][1648984] Fps is (10 sec: 49157.1, 60 sec: 46421.4, 300 sec: 43542.6). Total num frames: 313491456. Throughput: 0: 11411.9. Samples: 78445568. Policy #0 lag: (min: 15.0, avg: 118.2, max: 271.0) [2024-06-15 13:31:45,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 13:31:47,416][1652475] Updated weights for policy 0, policy_version 153105 (0.0030) [2024-06-15 13:31:49,350][1652475] Updated weights for policy 0, policy_version 153200 (0.0027) [2024-06-15 13:31:50,738][1648984] Fps is (10 sec: 52429.1, 60 sec: 46421.4, 300 sec: 44209.0). Total num frames: 313786368. Throughput: 0: 11514.3. Samples: 78478848. 
Policy #0 lag: (min: 15.0, avg: 118.2, max: 271.0) [2024-06-15 13:31:50,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 13:31:55,647][1652475] Updated weights for policy 0, policy_version 153273 (0.0015) [2024-06-15 13:31:55,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 45337.0, 300 sec: 43764.7). Total num frames: 313884672. Throughput: 0: 11457.4. Samples: 78551040. Policy #0 lag: (min: 15.0, avg: 118.2, max: 271.0) [2024-06-15 13:31:55,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 13:31:57,514][1652475] Updated weights for policy 0, policy_version 153344 (0.0100) [2024-06-15 13:32:00,365][1652475] Updated weights for policy 0, policy_version 153424 (0.0084) [2024-06-15 13:32:00,738][1648984] Fps is (10 sec: 42597.9, 60 sec: 46421.3, 300 sec: 44098.0). Total num frames: 314212352. Throughput: 0: 11252.6. Samples: 78605312. Policy #0 lag: (min: 15.0, avg: 118.2, max: 271.0) [2024-06-15 13:32:00,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 13:32:05,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 43690.7, 300 sec: 43986.9). Total num frames: 314310656. Throughput: 0: 10934.1. Samples: 78634496. Policy #0 lag: (min: 15.0, avg: 118.2, max: 271.0) [2024-06-15 13:32:05,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 13:32:07,922][1652475] Updated weights for policy 0, policy_version 153474 (0.0029) [2024-06-15 13:32:09,987][1652475] Updated weights for policy 0, policy_version 153557 (0.0036) [2024-06-15 13:32:10,756][1648984] Fps is (10 sec: 32707.5, 60 sec: 43677.1, 300 sec: 43428.8). Total num frames: 314540032. Throughput: 0: 10930.9. Samples: 78700032. 
Policy #0 lag: (min: 15.0, avg: 118.2, max: 271.0) [2024-06-15 13:32:10,758][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 13:32:11,954][1652475] Updated weights for policy 0, policy_version 153633 (0.0123) [2024-06-15 13:32:12,703][1652475] Updated weights for policy 0, policy_version 153664 (0.0024) [2024-06-15 13:32:15,606][1652475] Updated weights for policy 0, policy_version 153722 (0.0016) [2024-06-15 13:32:15,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 43690.7, 300 sec: 43986.9). Total num frames: 314834944. Throughput: 0: 10626.8. Samples: 78759424. Policy #0 lag: (min: 15.0, avg: 118.2, max: 271.0) [2024-06-15 13:32:15,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 13:32:20,738][1648984] Fps is (10 sec: 36112.0, 60 sec: 43144.6, 300 sec: 43098.3). Total num frames: 314900480. Throughput: 0: 10809.1. Samples: 78795776. Policy #0 lag: (min: 15.0, avg: 118.2, max: 271.0) [2024-06-15 13:32:20,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 13:32:21,109][1652475] Updated weights for policy 0, policy_version 153778 (0.0012) [2024-06-15 13:32:23,819][1651340] Signal inference workers to stop experience collection... (7950 times) [2024-06-15 13:32:23,865][1652475] InferenceWorker_p0-w0: stopping experience collection (7950 times) [2024-06-15 13:32:24,066][1651340] Signal inference workers to resume experience collection... (7950 times) [2024-06-15 13:32:24,067][1652475] InferenceWorker_p0-w0: resuming experience collection (7950 times) [2024-06-15 13:32:24,071][1652475] Updated weights for policy 0, policy_version 153904 (0.0014) [2024-06-15 13:32:25,738][1648984] Fps is (10 sec: 39321.2, 60 sec: 43148.5, 300 sec: 43986.9). Total num frames: 315228160. Throughput: 0: 10501.7. Samples: 78848000. 
Policy #0 lag: (min: 15.0, avg: 118.2, max: 271.0) [2024-06-15 13:32:25,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 13:32:28,183][1652475] Updated weights for policy 0, policy_version 153968 (0.0013) [2024-06-15 13:32:30,738][1648984] Fps is (10 sec: 45874.4, 60 sec: 43690.5, 300 sec: 43875.8). Total num frames: 315359232. Throughput: 0: 10467.5. Samples: 78916608. Policy #0 lag: (min: 15.0, avg: 118.2, max: 271.0) [2024-06-15 13:32:30,739][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 13:32:33,488][1652475] Updated weights for policy 0, policy_version 154037 (0.0014) [2024-06-15 13:32:35,176][1652475] Updated weights for policy 0, policy_version 154096 (0.0012) [2024-06-15 13:32:35,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 43691.3, 300 sec: 43764.7). Total num frames: 315621376. Throughput: 0: 10444.8. Samples: 78948864. Policy #0 lag: (min: 15.0, avg: 118.2, max: 271.0) [2024-06-15 13:32:35,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 13:32:36,577][1652475] Updated weights for policy 0, policy_version 154160 (0.0014) [2024-06-15 13:32:39,703][1652475] Updated weights for policy 0, policy_version 154192 (0.0011) [2024-06-15 13:32:40,738][1648984] Fps is (10 sec: 52429.8, 60 sec: 43690.7, 300 sec: 44431.2). Total num frames: 315883520. Throughput: 0: 10433.4. Samples: 79020544. Policy #0 lag: (min: 15.0, avg: 118.2, max: 271.0) [2024-06-15 13:32:40,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 13:32:44,772][1652475] Updated weights for policy 0, policy_version 154272 (0.0015) [2024-06-15 13:32:45,738][1648984] Fps is (10 sec: 39322.1, 60 sec: 42052.3, 300 sec: 43653.6). Total num frames: 316014592. Throughput: 0: 10581.4. Samples: 79081472. 
Policy #0 lag: (min: 15.0, avg: 118.2, max: 271.0) [2024-06-15 13:32:45,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 13:32:47,459][1652475] Updated weights for policy 0, policy_version 154368 (0.0094) [2024-06-15 13:32:48,996][1652475] Updated weights for policy 0, policy_version 154431 (0.0013) [2024-06-15 13:32:50,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 41506.1, 300 sec: 43986.9). Total num frames: 316276736. Throughput: 0: 10501.7. Samples: 79107072. Policy #0 lag: (min: 127.0, avg: 231.9, max: 383.0) [2024-06-15 13:32:50,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 13:32:53,487][1652475] Updated weights for policy 0, policy_version 154496 (0.0022) [2024-06-15 13:32:55,738][1648984] Fps is (10 sec: 39321.2, 60 sec: 42052.2, 300 sec: 43986.9). Total num frames: 316407808. Throughput: 0: 10471.9. Samples: 79171072. Policy #0 lag: (min: 127.0, avg: 231.9, max: 383.0) [2024-06-15 13:32:55,738][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 13:32:55,759][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000154496_316407808.pth... [2024-06-15 13:32:55,817][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000149376_305922048.pth [2024-06-15 13:32:59,283][1652475] Updated weights for policy 0, policy_version 154565 (0.0013) [2024-06-15 13:33:00,661][1652475] Updated weights for policy 0, policy_version 154624 (0.0021) [2024-06-15 13:33:00,738][1648984] Fps is (10 sec: 39321.3, 60 sec: 40960.0, 300 sec: 43986.9). Total num frames: 316669952. Throughput: 0: 10558.6. Samples: 79234560. 
Policy #0 lag: (min: 127.0, avg: 231.9, max: 383.0) [2024-06-15 13:33:00,739][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 13:33:04,105][1652475] Updated weights for policy 0, policy_version 154689 (0.0121) [2024-06-15 13:33:05,075][1652475] Updated weights for policy 0, policy_version 154746 (0.0104) [2024-06-15 13:33:05,738][1648984] Fps is (10 sec: 52428.1, 60 sec: 43690.5, 300 sec: 44431.2). Total num frames: 316932096. Throughput: 0: 10285.5. Samples: 79258624. Policy #0 lag: (min: 127.0, avg: 231.9, max: 383.0) [2024-06-15 13:33:05,739][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 13:33:10,738][1648984] Fps is (10 sec: 29491.3, 60 sec: 40426.4, 300 sec: 43542.5). Total num frames: 316964864. Throughput: 0: 10683.7. Samples: 79328768. Policy #0 lag: (min: 127.0, avg: 231.9, max: 383.0) [2024-06-15 13:33:10,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 13:33:11,023][1652475] Updated weights for policy 0, policy_version 154787 (0.0013) [2024-06-15 13:33:12,603][1652475] Updated weights for policy 0, policy_version 154848 (0.0024) [2024-06-15 13:33:12,717][1651340] Signal inference workers to stop experience collection... (8000 times) [2024-06-15 13:33:12,768][1652475] InferenceWorker_p0-w0: stopping experience collection (8000 times) [2024-06-15 13:33:13,094][1651340] Signal inference workers to resume experience collection... (8000 times) [2024-06-15 13:33:13,094][1652475] InferenceWorker_p0-w0: resuming experience collection (8000 times) [2024-06-15 13:33:13,918][1652475] Updated weights for policy 0, policy_version 154896 (0.0013) [2024-06-15 13:33:15,738][1648984] Fps is (10 sec: 39322.1, 60 sec: 41506.1, 300 sec: 43986.9). Total num frames: 317325312. Throughput: 0: 10501.7. Samples: 79389184. 
Policy #0 lag: (min: 127.0, avg: 231.9, max: 383.0) [2024-06-15 13:33:15,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 13:33:16,386][1652475] Updated weights for policy 0, policy_version 154961 (0.0055) [2024-06-15 13:33:17,409][1652475] Updated weights for policy 0, policy_version 155008 (0.0012) [2024-06-15 13:33:20,738][1648984] Fps is (10 sec: 49152.0, 60 sec: 42598.4, 300 sec: 43986.9). Total num frames: 317456384. Throughput: 0: 10513.1. Samples: 79421952. Policy #0 lag: (min: 127.0, avg: 231.9, max: 383.0) [2024-06-15 13:33:20,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 13:33:23,257][1652475] Updated weights for policy 0, policy_version 155069 (0.0013) [2024-06-15 13:33:24,787][1652475] Updated weights for policy 0, policy_version 155120 (0.0022) [2024-06-15 13:33:25,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 42052.3, 300 sec: 44098.0). Total num frames: 317751296. Throughput: 0: 10490.3. Samples: 79492608. Policy #0 lag: (min: 127.0, avg: 231.9, max: 383.0) [2024-06-15 13:33:25,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 13:33:28,399][1652475] Updated weights for policy 0, policy_version 155216 (0.0015) [2024-06-15 13:33:29,558][1652475] Updated weights for policy 0, policy_version 155263 (0.0015) [2024-06-15 13:33:30,738][1648984] Fps is (10 sec: 52429.4, 60 sec: 43690.8, 300 sec: 44431.2). Total num frames: 317980672. Throughput: 0: 10478.9. Samples: 79553024. Policy #0 lag: (min: 127.0, avg: 231.9, max: 383.0) [2024-06-15 13:33:30,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 13:33:35,738][1648984] Fps is (10 sec: 36045.0, 60 sec: 41506.2, 300 sec: 44209.0). Total num frames: 318111744. Throughput: 0: 10843.0. Samples: 79595008. 
Policy #0 lag: (min: 127.0, avg: 231.9, max: 383.0) [2024-06-15 13:33:35,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 13:33:36,424][1652475] Updated weights for policy 0, policy_version 155360 (0.0015) [2024-06-15 13:33:38,703][1652475] Updated weights for policy 0, policy_version 155427 (0.0013) [2024-06-15 13:33:40,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 42052.3, 300 sec: 44098.0). Total num frames: 318406656. Throughput: 0: 10558.6. Samples: 79646208. Policy #0 lag: (min: 127.0, avg: 231.9, max: 383.0) [2024-06-15 13:33:40,742][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 13:33:40,835][1652475] Updated weights for policy 0, policy_version 155480 (0.0012) [2024-06-15 13:33:45,738][1648984] Fps is (10 sec: 39321.9, 60 sec: 41506.1, 300 sec: 43986.9). Total num frames: 318504960. Throughput: 0: 10786.2. Samples: 79719936. Policy #0 lag: (min: 127.0, avg: 231.9, max: 383.0) [2024-06-15 13:33:45,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 13:33:46,374][1652475] Updated weights for policy 0, policy_version 155536 (0.0012) [2024-06-15 13:33:47,426][1652475] Updated weights for policy 0, policy_version 155578 (0.0013) [2024-06-15 13:33:48,608][1652475] Updated weights for policy 0, policy_version 155617 (0.0030) [2024-06-15 13:33:50,072][1652475] Updated weights for policy 0, policy_version 155664 (0.0013) [2024-06-15 13:33:50,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 42598.4, 300 sec: 43875.8). Total num frames: 318832640. Throughput: 0: 11025.1. Samples: 79754752. Policy #0 lag: (min: 127.0, avg: 231.9, max: 383.0) [2024-06-15 13:33:50,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 13:33:52,568][1652475] Updated weights for policy 0, policy_version 155745 (0.0015) [2024-06-15 13:33:55,738][1648984] Fps is (10 sec: 52428.2, 60 sec: 43690.7, 300 sec: 43987.1). Total num frames: 319029248. Throughput: 0: 10786.1. Samples: 79814144. 
Policy #0 lag: (min: 60.0, avg: 216.9, max: 316.0) [2024-06-15 13:33:55,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 13:33:58,882][1652475] Updated weights for policy 0, policy_version 155824 (0.0013) [2024-06-15 13:34:00,478][1651340] Signal inference workers to stop experience collection... (8050 times) [2024-06-15 13:34:00,543][1652475] InferenceWorker_p0-w0: stopping experience collection (8050 times) [2024-06-15 13:34:00,738][1648984] Fps is (10 sec: 36044.9, 60 sec: 42052.4, 300 sec: 43653.7). Total num frames: 319193088. Throughput: 0: 11036.5. Samples: 79885824. Policy #0 lag: (min: 60.0, avg: 216.9, max: 316.0) [2024-06-15 13:34:00,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 13:34:00,786][1651340] Signal inference workers to resume experience collection... (8050 times) [2024-06-15 13:34:00,787][1652475] InferenceWorker_p0-w0: resuming experience collection (8050 times) [2024-06-15 13:34:01,618][1652475] Updated weights for policy 0, policy_version 155904 (0.0014) [2024-06-15 13:34:03,120][1652475] Updated weights for policy 0, policy_version 155961 (0.0013) [2024-06-15 13:34:05,253][1652475] Updated weights for policy 0, policy_version 156032 (0.0012) [2024-06-15 13:34:05,738][1648984] Fps is (10 sec: 52429.3, 60 sec: 43690.8, 300 sec: 44320.1). Total num frames: 319553536. Throughput: 0: 10877.2. Samples: 79911424. Policy #0 lag: (min: 60.0, avg: 216.9, max: 316.0) [2024-06-15 13:34:05,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 13:34:10,738][1648984] Fps is (10 sec: 42598.2, 60 sec: 44236.8, 300 sec: 43764.7). Total num frames: 319619072. Throughput: 0: 10843.0. Samples: 79980544. 
Policy #0 lag: (min: 60.0, avg: 216.9, max: 316.0) [2024-06-15 13:34:10,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 13:34:11,339][1652475] Updated weights for policy 0, policy_version 156087 (0.0013) [2024-06-15 13:34:13,707][1652475] Updated weights for policy 0, policy_version 156150 (0.0031) [2024-06-15 13:34:15,224][1652475] Updated weights for policy 0, policy_version 156224 (0.0094) [2024-06-15 13:34:15,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 43690.7, 300 sec: 43986.9). Total num frames: 319946752. Throughput: 0: 10820.3. Samples: 80039936. Policy #0 lag: (min: 60.0, avg: 216.9, max: 316.0) [2024-06-15 13:34:15,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 13:34:20,738][1648984] Fps is (10 sec: 45875.0, 60 sec: 43690.7, 300 sec: 43986.9). Total num frames: 320077824. Throughput: 0: 10547.2. Samples: 80069632. Policy #0 lag: (min: 60.0, avg: 216.9, max: 316.0) [2024-06-15 13:34:20,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 13:34:24,160][1652475] Updated weights for policy 0, policy_version 156320 (0.0016) [2024-06-15 13:34:25,738][1648984] Fps is (10 sec: 32767.7, 60 sec: 42052.3, 300 sec: 43320.4). Total num frames: 320274432. Throughput: 0: 10934.0. Samples: 80138240. Policy #0 lag: (min: 60.0, avg: 216.9, max: 316.0) [2024-06-15 13:34:25,738][1648984] Avg episode reward: [(0, '-0.520')] [2024-06-15 13:34:25,971][1652475] Updated weights for policy 0, policy_version 156402 (0.0020) [2024-06-15 13:34:30,738][1648984] Fps is (10 sec: 42598.8, 60 sec: 42052.3, 300 sec: 43653.6). Total num frames: 320503808. Throughput: 0: 10592.7. Samples: 80196608. 
Policy #0 lag: (min: 60.0, avg: 216.9, max: 316.0) [2024-06-15 13:34:30,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 13:34:30,919][1652475] Updated weights for policy 0, policy_version 156512 (0.0014) [2024-06-15 13:34:35,299][1652475] Updated weights for policy 0, policy_version 156560 (0.0012) [2024-06-15 13:34:35,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 42598.4, 300 sec: 43320.4). Total num frames: 320667648. Throughput: 0: 10490.3. Samples: 80226816. Policy #0 lag: (min: 60.0, avg: 216.9, max: 316.0) [2024-06-15 13:34:35,738][1648984] Avg episode reward: [(0, '-0.190')] [2024-06-15 13:34:35,973][1651340] Saving new best policy, reward=-0.190! [2024-06-15 13:34:36,739][1652475] Updated weights for policy 0, policy_version 156625 (0.0027) [2024-06-15 13:34:38,217][1652475] Updated weights for policy 0, policy_version 156688 (0.0015) [2024-06-15 13:34:40,738][1648984] Fps is (10 sec: 49151.7, 60 sec: 43144.5, 300 sec: 43986.9). Total num frames: 320995328. Throughput: 0: 10581.3. Samples: 80290304. Policy #0 lag: (min: 60.0, avg: 216.9, max: 316.0) [2024-06-15 13:34:40,741][1648984] Avg episode reward: [(0, '-0.200')] [2024-06-15 13:34:43,570][1652475] Updated weights for policy 0, policy_version 156738 (0.0019) [2024-06-15 13:34:44,830][1652475] Updated weights for policy 0, policy_version 156799 (0.0013) [2024-06-15 13:34:45,738][1648984] Fps is (10 sec: 45875.4, 60 sec: 43690.6, 300 sec: 43542.6). Total num frames: 321126400. Throughput: 0: 10581.3. Samples: 80361984. Policy #0 lag: (min: 60.0, avg: 216.9, max: 316.0) [2024-06-15 13:34:45,738][1648984] Avg episode reward: [(0, '-0.200')] [2024-06-15 13:34:47,083][1651340] Signal inference workers to stop experience collection... (8100 times) [2024-06-15 13:34:47,124][1652475] InferenceWorker_p0-w0: stopping experience collection (8100 times) [2024-06-15 13:34:47,343][1651340] Signal inference workers to resume experience collection... 
(8100 times) [2024-06-15 13:34:47,344][1652475] InferenceWorker_p0-w0: resuming experience collection (8100 times) [2024-06-15 13:34:47,814][1652475] Updated weights for policy 0, policy_version 156854 (0.0084) [2024-06-15 13:34:49,148][1652475] Updated weights for policy 0, policy_version 156912 (0.0017) [2024-06-15 13:34:50,726][1652475] Updated weights for policy 0, policy_version 156976 (0.0016) [2024-06-15 13:34:50,738][1648984] Fps is (10 sec: 49152.0, 60 sec: 44236.8, 300 sec: 43875.8). Total num frames: 321486848. Throughput: 0: 10797.5. Samples: 80397312. Policy #0 lag: (min: 60.0, avg: 216.9, max: 316.0) [2024-06-15 13:34:50,738][1648984] Avg episode reward: [(0, '-0.200')] [2024-06-15 13:34:55,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 41506.2, 300 sec: 43653.6). Total num frames: 321519616. Throughput: 0: 10740.6. Samples: 80463872. Policy #0 lag: (min: 60.0, avg: 216.9, max: 316.0) [2024-06-15 13:34:55,738][1648984] Avg episode reward: [(0, '-0.200')] [2024-06-15 13:34:56,319][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000157024_321585152.pth... [2024-06-15 13:34:56,436][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000151936_311164928.pth [2024-06-15 13:34:56,690][1652475] Updated weights for policy 0, policy_version 157040 (0.0013) [2024-06-15 13:34:58,565][1652475] Updated weights for policy 0, policy_version 157073 (0.0013) [2024-06-15 13:35:00,738][1648984] Fps is (10 sec: 39320.5, 60 sec: 44782.7, 300 sec: 43875.8). Total num frames: 321880064. Throughput: 0: 10763.3. Samples: 80524288. 
Policy #0 lag: (min: 15.0, avg: 105.4, max: 271.0) [2024-06-15 13:35:00,739][1648984] Avg episode reward: [(0, '-0.200')] [2024-06-15 13:35:00,869][1652475] Updated weights for policy 0, policy_version 157178 (0.0013) [2024-06-15 13:35:03,059][1652475] Updated weights for policy 0, policy_version 157242 (0.0014) [2024-06-15 13:35:05,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 41506.1, 300 sec: 43879.4). Total num frames: 322043904. Throughput: 0: 10786.1. Samples: 80555008. Policy #0 lag: (min: 15.0, avg: 105.4, max: 271.0) [2024-06-15 13:35:05,738][1648984] Avg episode reward: [(0, '-0.210')] [2024-06-15 13:35:09,388][1652475] Updated weights for policy 0, policy_version 157305 (0.0013) [2024-06-15 13:35:10,738][1648984] Fps is (10 sec: 29491.7, 60 sec: 42598.3, 300 sec: 43098.2). Total num frames: 322174976. Throughput: 0: 10752.0. Samples: 80622080. Policy #0 lag: (min: 15.0, avg: 105.4, max: 271.0) [2024-06-15 13:35:10,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 13:35:11,545][1652475] Updated weights for policy 0, policy_version 157367 (0.0015) [2024-06-15 13:35:13,452][1652475] Updated weights for policy 0, policy_version 157436 (0.0012) [2024-06-15 13:35:14,940][1652475] Updated weights for policy 0, policy_version 157501 (0.0013) [2024-06-15 13:35:15,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 43690.6, 300 sec: 43986.9). Total num frames: 322568192. Throughput: 0: 10752.0. Samples: 80680448. Policy #0 lag: (min: 15.0, avg: 105.4, max: 271.0) [2024-06-15 13:35:15,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 13:35:20,738][1648984] Fps is (10 sec: 45875.7, 60 sec: 42598.4, 300 sec: 42876.1). Total num frames: 322633728. Throughput: 0: 10854.4. Samples: 80715264. 
Policy #0 lag: (min: 15.0, avg: 105.4, max: 271.0) [2024-06-15 13:35:20,740][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 13:35:21,260][1652475] Updated weights for policy 0, policy_version 157564 (0.0012) [2024-06-15 13:35:24,458][1652475] Updated weights for policy 0, policy_version 157605 (0.0016) [2024-06-15 13:35:25,738][1648984] Fps is (10 sec: 32768.2, 60 sec: 43690.7, 300 sec: 43320.4). Total num frames: 322895872. Throughput: 0: 10956.8. Samples: 80783360. Policy #0 lag: (min: 15.0, avg: 105.4, max: 271.0) [2024-06-15 13:35:25,738][1648984] Avg episode reward: [(0, '-0.530')] [2024-06-15 13:35:26,753][1652475] Updated weights for policy 0, policy_version 157699 (0.0013) [2024-06-15 13:35:30,738][1648984] Fps is (10 sec: 45875.1, 60 sec: 43144.5, 300 sec: 43209.3). Total num frames: 323092480. Throughput: 0: 10706.5. Samples: 80843776. Policy #0 lag: (min: 15.0, avg: 105.4, max: 271.0) [2024-06-15 13:35:30,738][1648984] Avg episode reward: [(0, '-0.530')] [2024-06-15 13:35:32,066][1652475] Updated weights for policy 0, policy_version 157761 (0.0014) [2024-06-15 13:35:33,378][1652475] Updated weights for policy 0, policy_version 157818 (0.0013) [2024-06-15 13:35:35,305][1651340] Signal inference workers to stop experience collection... (8150 times) [2024-06-15 13:35:35,345][1652475] InferenceWorker_p0-w0: stopping experience collection (8150 times) [2024-06-15 13:35:35,496][1651340] Signal inference workers to resume experience collection... (8150 times) [2024-06-15 13:35:35,497][1652475] InferenceWorker_p0-w0: resuming experience collection (8150 times) [2024-06-15 13:35:35,738][1648984] Fps is (10 sec: 36044.6, 60 sec: 43144.6, 300 sec: 42876.1). Total num frames: 323256320. Throughput: 0: 10717.9. Samples: 80879616. 
Policy #0 lag: (min: 15.0, avg: 105.4, max: 271.0) [2024-06-15 13:35:35,738][1648984] Avg episode reward: [(0, '-0.530')] [2024-06-15 13:35:36,095][1652475] Updated weights for policy 0, policy_version 157872 (0.0014) [2024-06-15 13:35:37,562][1652475] Updated weights for policy 0, policy_version 157936 (0.0100) [2024-06-15 13:35:39,100][1652475] Updated weights for policy 0, policy_version 157987 (0.0013) [2024-06-15 13:35:40,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 43690.6, 300 sec: 43764.7). Total num frames: 323616768. Throughput: 0: 10615.5. Samples: 80941568. Policy #0 lag: (min: 15.0, avg: 105.4, max: 271.0) [2024-06-15 13:35:40,738][1648984] Avg episode reward: [(0, '-0.530')] [2024-06-15 13:35:43,690][1652475] Updated weights for policy 0, policy_version 158018 (0.0014) [2024-06-15 13:35:45,156][1652475] Updated weights for policy 0, policy_version 158080 (0.0038) [2024-06-15 13:35:45,738][1648984] Fps is (10 sec: 49152.2, 60 sec: 43690.7, 300 sec: 43209.3). Total num frames: 323747840. Throughput: 0: 10797.6. Samples: 81010176. Policy #0 lag: (min: 15.0, avg: 105.4, max: 271.0) [2024-06-15 13:35:45,738][1648984] Avg episode reward: [(0, '-0.530')] [2024-06-15 13:35:48,023][1652475] Updated weights for policy 0, policy_version 158144 (0.0132) [2024-06-15 13:35:50,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 42052.2, 300 sec: 43544.1). Total num frames: 324009984. Throughput: 0: 10922.7. Samples: 81046528. Policy #0 lag: (min: 15.0, avg: 105.4, max: 271.0) [2024-06-15 13:35:50,738][1648984] Avg episode reward: [(0, '-0.530')] [2024-06-15 13:35:51,539][1652475] Updated weights for policy 0, policy_version 158240 (0.0077) [2024-06-15 13:35:55,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 43690.7, 300 sec: 43098.3). Total num frames: 324141056. Throughput: 0: 10786.2. Samples: 81107456. 
Policy #0 lag: (min: 15.0, avg: 105.4, max: 271.0) [2024-06-15 13:35:55,738][1648984] Avg episode reward: [(0, '-0.530')] [2024-06-15 13:35:55,858][1652475] Updated weights for policy 0, policy_version 158275 (0.0013) [2024-06-15 13:35:57,193][1652475] Updated weights for policy 0, policy_version 158330 (0.0011) [2024-06-15 13:35:59,453][1652475] Updated weights for policy 0, policy_version 158389 (0.0026) [2024-06-15 13:36:00,738][1648984] Fps is (10 sec: 39321.9, 60 sec: 42052.5, 300 sec: 43098.2). Total num frames: 324403200. Throughput: 0: 11082.0. Samples: 81179136. Policy #0 lag: (min: 15.0, avg: 105.4, max: 271.0) [2024-06-15 13:36:00,738][1648984] Avg episode reward: [(0, '-0.530')] [2024-06-15 13:36:02,648][1652475] Updated weights for policy 0, policy_version 158464 (0.0021) [2024-06-15 13:36:04,217][1652475] Updated weights for policy 0, policy_version 158524 (0.0012) [2024-06-15 13:36:05,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 43690.7, 300 sec: 43209.3). Total num frames: 324665344. Throughput: 0: 10911.3. Samples: 81206272. Policy #0 lag: (min: 31.0, avg: 173.7, max: 319.0) [2024-06-15 13:36:05,738][1648984] Avg episode reward: [(0, '-0.530')] [2024-06-15 13:36:07,957][1652475] Updated weights for policy 0, policy_version 158564 (0.0012) [2024-06-15 13:36:10,261][1652475] Updated weights for policy 0, policy_version 158614 (0.0014) [2024-06-15 13:36:10,738][1648984] Fps is (10 sec: 49151.0, 60 sec: 45329.0, 300 sec: 42987.1). Total num frames: 324894720. Throughput: 0: 11059.2. Samples: 81281024. Policy #0 lag: (min: 31.0, avg: 173.7, max: 319.0) [2024-06-15 13:36:10,739][1648984] Avg episode reward: [(0, '-0.530')] [2024-06-15 13:36:12,790][1652475] Updated weights for policy 0, policy_version 158659 (0.0013) [2024-06-15 13:36:15,240][1652475] Updated weights for policy 0, policy_version 158752 (0.0015) [2024-06-15 13:36:15,738][1648984] Fps is (10 sec: 49151.5, 60 sec: 43144.5, 300 sec: 43542.6). Total num frames: 325156864. 
Throughput: 0: 11025.0. Samples: 81339904. Policy #0 lag: (min: 31.0, avg: 173.7, max: 319.0) [2024-06-15 13:36:15,738][1648984] Avg episode reward: [(0, '-0.530')] [2024-06-15 13:36:18,917][1651340] Signal inference workers to stop experience collection... (8200 times) [2024-06-15 13:36:18,971][1652475] InferenceWorker_p0-w0: stopping experience collection (8200 times) [2024-06-15 13:36:19,134][1651340] Signal inference workers to resume experience collection... (8200 times) [2024-06-15 13:36:19,135][1652475] InferenceWorker_p0-w0: resuming experience collection (8200 times) [2024-06-15 13:36:19,217][1652475] Updated weights for policy 0, policy_version 158800 (0.0098) [2024-06-15 13:36:20,738][1648984] Fps is (10 sec: 42599.3, 60 sec: 44782.9, 300 sec: 42988.0). Total num frames: 325320704. Throughput: 0: 11161.6. Samples: 81381888. Policy #0 lag: (min: 31.0, avg: 173.7, max: 319.0) [2024-06-15 13:36:20,738][1648984] Avg episode reward: [(0, '-0.530')] [2024-06-15 13:36:21,992][1652475] Updated weights for policy 0, policy_version 158880 (0.0091) [2024-06-15 13:36:25,738][1648984] Fps is (10 sec: 36045.2, 60 sec: 43690.7, 300 sec: 43320.4). Total num frames: 325517312. Throughput: 0: 11309.5. Samples: 81450496. Policy #0 lag: (min: 31.0, avg: 173.7, max: 319.0) [2024-06-15 13:36:25,738][1648984] Avg episode reward: [(0, '-0.530')] [2024-06-15 13:36:25,774][1652475] Updated weights for policy 0, policy_version 158960 (0.0014) [2024-06-15 13:36:27,581][1652475] Updated weights for policy 0, policy_version 159031 (0.0013) [2024-06-15 13:36:30,738][1648984] Fps is (10 sec: 39321.1, 60 sec: 43690.6, 300 sec: 43098.4). Total num frames: 325713920. Throughput: 0: 11241.2. Samples: 81516032. 
Policy #0 lag: (min: 31.0, avg: 173.7, max: 319.0) [2024-06-15 13:36:30,738][1648984] Avg episode reward: [(0, '-0.530')] [2024-06-15 13:36:31,950][1652475] Updated weights for policy 0, policy_version 159097 (0.0014) [2024-06-15 13:36:34,713][1652475] Updated weights for policy 0, policy_version 159167 (0.0036) [2024-06-15 13:36:35,738][1648984] Fps is (10 sec: 45875.3, 60 sec: 45329.1, 300 sec: 43098.3). Total num frames: 325976064. Throughput: 0: 11207.1. Samples: 81550848. Policy #0 lag: (min: 31.0, avg: 173.7, max: 319.0) [2024-06-15 13:36:35,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 13:36:38,280][1652475] Updated weights for policy 0, policy_version 159232 (0.0115) [2024-06-15 13:36:39,650][1652475] Updated weights for policy 0, policy_version 159291 (0.0027) [2024-06-15 13:36:40,738][1648984] Fps is (10 sec: 52428.4, 60 sec: 43690.6, 300 sec: 43209.3). Total num frames: 326238208. Throughput: 0: 11013.6. Samples: 81603072. Policy #0 lag: (min: 31.0, avg: 173.7, max: 319.0) [2024-06-15 13:36:40,739][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 13:36:45,634][1652475] Updated weights for policy 0, policy_version 159376 (0.0014) [2024-06-15 13:36:45,738][1648984] Fps is (10 sec: 42598.1, 60 sec: 44236.8, 300 sec: 42765.0). Total num frames: 326402048. Throughput: 0: 11013.7. Samples: 81674752. Policy #0 lag: (min: 31.0, avg: 173.7, max: 319.0) [2024-06-15 13:36:45,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 13:36:46,463][1652475] Updated weights for policy 0, policy_version 159423 (0.0014) [2024-06-15 13:36:49,848][1652475] Updated weights for policy 0, policy_version 159481 (0.0013) [2024-06-15 13:36:50,738][1648984] Fps is (10 sec: 39322.4, 60 sec: 43690.7, 300 sec: 43209.3). Total num frames: 326631424. Throughput: 0: 11116.1. Samples: 81706496. 
Policy #0 lag: (min: 31.0, avg: 173.7, max: 319.0) [2024-06-15 13:36:50,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 13:36:51,852][1652475] Updated weights for policy 0, policy_version 159541 (0.0012) [2024-06-15 13:36:55,738][1648984] Fps is (10 sec: 36044.8, 60 sec: 43690.7, 300 sec: 42542.9). Total num frames: 326762496. Throughput: 0: 10911.3. Samples: 81772032. Policy #0 lag: (min: 31.0, avg: 173.7, max: 319.0) [2024-06-15 13:36:55,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 13:36:55,743][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000159552_326762496.pth... [2024-06-15 13:36:55,786][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000154496_316407808.pth [2024-06-15 13:36:57,605][1652475] Updated weights for policy 0, policy_version 159616 (0.0020) [2024-06-15 13:36:59,287][1652475] Updated weights for policy 0, policy_version 159680 (0.0013) [2024-06-15 13:37:00,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 44236.8, 300 sec: 43209.3). Total num frames: 327057408. Throughput: 0: 10934.1. Samples: 81831936. Policy #0 lag: (min: 31.0, avg: 173.7, max: 319.0) [2024-06-15 13:37:00,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 13:37:01,633][1652475] Updated weights for policy 0, policy_version 159738 (0.0016) [2024-06-15 13:37:03,582][1651340] Signal inference workers to stop experience collection... (8250 times) [2024-06-15 13:37:03,624][1652475] InferenceWorker_p0-w0: stopping experience collection (8250 times) [2024-06-15 13:37:03,797][1651340] Signal inference workers to resume experience collection... (8250 times) [2024-06-15 13:37:03,797][1652475] InferenceWorker_p0-w0: resuming experience collection (8250 times) [2024-06-15 13:37:03,800][1652475] Updated weights for policy 0, policy_version 159792 (0.0013) [2024-06-15 13:37:05,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 43690.7, 300 sec: 43212.1). Total num frames: 327286784. 
Throughput: 0: 10752.0. Samples: 81865728. Policy #0 lag: (min: 31.0, avg: 173.7, max: 319.0) [2024-06-15 13:37:05,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 13:37:08,564][1652475] Updated weights for policy 0, policy_version 159829 (0.0012) [2024-06-15 13:37:09,967][1652475] Updated weights for policy 0, policy_version 159881 (0.0035) [2024-06-15 13:37:10,738][1648984] Fps is (10 sec: 42598.0, 60 sec: 43144.6, 300 sec: 42876.1). Total num frames: 327483392. Throughput: 0: 10797.5. Samples: 81936384. Policy #0 lag: (min: 15.0, avg: 90.0, max: 271.0) [2024-06-15 13:37:10,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 13:37:11,226][1652475] Updated weights for policy 0, policy_version 159930 (0.0087) [2024-06-15 13:37:12,603][1652475] Updated weights for policy 0, policy_version 159984 (0.0012) [2024-06-15 13:37:15,120][1652475] Updated weights for policy 0, policy_version 160038 (0.0021) [2024-06-15 13:37:15,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 44236.9, 300 sec: 43764.7). Total num frames: 327811072. Throughput: 0: 10808.9. Samples: 82002432. Policy #0 lag: (min: 15.0, avg: 90.0, max: 271.0) [2024-06-15 13:37:15,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 13:37:19,770][1652475] Updated weights for policy 0, policy_version 160082 (0.0013) [2024-06-15 13:37:20,742][1648984] Fps is (10 sec: 45853.6, 60 sec: 43687.2, 300 sec: 43097.6). Total num frames: 327942144. Throughput: 0: 10967.0. Samples: 82044416. Policy #0 lag: (min: 15.0, avg: 90.0, max: 271.0) [2024-06-15 13:37:20,743][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 13:37:21,943][1652475] Updated weights for policy 0, policy_version 160187 (0.0014) [2024-06-15 13:37:24,681][1652475] Updated weights for policy 0, policy_version 160251 (0.0014) [2024-06-15 13:37:25,760][1648984] Fps is (10 sec: 39232.5, 60 sec: 44765.9, 300 sec: 43539.2). Total num frames: 328204288. Throughput: 0: 11133.3. Samples: 82104320. 
Policy #0 lag: (min: 15.0, avg: 90.0, max: 271.0) [2024-06-15 13:37:25,761][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 13:37:30,758][1648984] Fps is (10 sec: 39259.1, 60 sec: 43675.7, 300 sec: 43095.2). Total num frames: 328335360. Throughput: 0: 10951.8. Samples: 82167808. Policy #0 lag: (min: 15.0, avg: 90.0, max: 271.0) [2024-06-15 13:37:30,759][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 13:37:32,135][1652475] Updated weights for policy 0, policy_version 160321 (0.0016) [2024-06-15 13:37:34,208][1652475] Updated weights for policy 0, policy_version 160400 (0.0020) [2024-06-15 13:37:35,575][1652475] Updated weights for policy 0, policy_version 160448 (0.0013) [2024-06-15 13:37:35,738][1648984] Fps is (10 sec: 39411.0, 60 sec: 43690.6, 300 sec: 43098.2). Total num frames: 328597504. Throughput: 0: 10899.9. Samples: 82196992. Policy #0 lag: (min: 15.0, avg: 90.0, max: 271.0) [2024-06-15 13:37:35,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 13:37:39,591][1652475] Updated weights for policy 0, policy_version 160528 (0.0136) [2024-06-15 13:37:40,738][1648984] Fps is (10 sec: 52537.5, 60 sec: 43690.8, 300 sec: 43542.6). Total num frames: 328859648. Throughput: 0: 10706.5. Samples: 82253824. Policy #0 lag: (min: 15.0, avg: 90.0, max: 271.0) [2024-06-15 13:37:40,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 13:37:44,944][1652475] Updated weights for policy 0, policy_version 160596 (0.0014) [2024-06-15 13:37:45,738][1648984] Fps is (10 sec: 36044.6, 60 sec: 42598.3, 300 sec: 42987.2). Total num frames: 328957952. Throughput: 0: 10877.1. Samples: 82321408. 
Policy #0 lag: (min: 15.0, avg: 90.0, max: 271.0) [2024-06-15 13:37:45,739][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 13:37:45,940][1652475] Updated weights for policy 0, policy_version 160639 (0.0012) [2024-06-15 13:37:47,991][1652475] Updated weights for policy 0, policy_version 160704 (0.0014) [2024-06-15 13:37:50,738][1648984] Fps is (10 sec: 32768.1, 60 sec: 42598.4, 300 sec: 43320.4). Total num frames: 329187328. Throughput: 0: 10797.5. Samples: 82351616. Policy #0 lag: (min: 15.0, avg: 90.0, max: 271.0) [2024-06-15 13:37:50,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 13:37:50,976][1651340] Signal inference workers to stop experience collection... (8300 times) [2024-06-15 13:37:51,008][1652475] InferenceWorker_p0-w0: stopping experience collection (8300 times) [2024-06-15 13:37:51,207][1651340] Signal inference workers to resume experience collection... (8300 times) [2024-06-15 13:37:51,220][1652475] InferenceWorker_p0-w0: resuming experience collection (8300 times) [2024-06-15 13:37:51,377][1652475] Updated weights for policy 0, policy_version 160770 (0.0014) [2024-06-15 13:37:52,387][1652475] Updated weights for policy 0, policy_version 160824 (0.0012) [2024-06-15 13:37:55,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 43690.6, 300 sec: 43098.2). Total num frames: 329383936. Throughput: 0: 10649.6. Samples: 82415616. Policy #0 lag: (min: 15.0, avg: 90.0, max: 271.0) [2024-06-15 13:37:55,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 13:37:57,889][1652475] Updated weights for policy 0, policy_version 160887 (0.0090) [2024-06-15 13:37:58,935][1652475] Updated weights for policy 0, policy_version 160923 (0.0021) [2024-06-15 13:38:00,738][1648984] Fps is (10 sec: 45875.2, 60 sec: 43144.5, 300 sec: 43098.3). Total num frames: 329646080. Throughput: 0: 10626.9. Samples: 82480640. 
Policy #0 lag: (min: 15.0, avg: 90.0, max: 271.0) [2024-06-15 13:38:00,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 13:38:02,731][1652475] Updated weights for policy 0, policy_version 160993 (0.0046) [2024-06-15 13:38:03,705][1652475] Updated weights for policy 0, policy_version 161043 (0.0012) [2024-06-15 13:38:05,738][1648984] Fps is (10 sec: 52429.5, 60 sec: 43690.7, 300 sec: 43875.8). Total num frames: 329908224. Throughput: 0: 10434.5. Samples: 82513920. Policy #0 lag: (min: 15.0, avg: 90.0, max: 271.0) [2024-06-15 13:38:05,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 13:38:08,801][1652475] Updated weights for policy 0, policy_version 161107 (0.0032) [2024-06-15 13:38:09,765][1652475] Updated weights for policy 0, policy_version 161152 (0.0015) [2024-06-15 13:38:10,740][1648984] Fps is (10 sec: 42598.1, 60 sec: 43144.6, 300 sec: 43209.3). Total num frames: 330072064. Throughput: 0: 10711.9. Samples: 82586112. Policy #0 lag: (min: 15.0, avg: 90.0, max: 271.0) [2024-06-15 13:38:10,743][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 13:38:11,362][1652475] Updated weights for policy 0, policy_version 161205 (0.0013) [2024-06-15 13:38:14,157][1652475] Updated weights for policy 0, policy_version 161264 (0.0026) [2024-06-15 13:38:14,997][1652475] Updated weights for policy 0, policy_version 161300 (0.0014) [2024-06-15 13:38:15,738][1648984] Fps is (10 sec: 52427.8, 60 sec: 43690.6, 300 sec: 43986.9). Total num frames: 330432512. Throughput: 0: 10688.6. Samples: 82648576. Policy #0 lag: (min: 5.0, avg: 113.7, max: 245.0) [2024-06-15 13:38:15,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 13:38:20,172][1652475] Updated weights for policy 0, policy_version 161360 (0.0014) [2024-06-15 13:38:20,738][1648984] Fps is (10 sec: 42598.7, 60 sec: 42601.8, 300 sec: 43209.3). Total num frames: 330498048. Throughput: 0: 10922.7. Samples: 82688512. 
Policy #0 lag: (min: 5.0, avg: 113.7, max: 245.0) [2024-06-15 13:38:20,762][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 13:38:21,379][1652475] Updated weights for policy 0, policy_version 161408 (0.0136) [2024-06-15 13:38:24,873][1652475] Updated weights for policy 0, policy_version 161479 (0.0014) [2024-06-15 13:38:25,738][1648984] Fps is (10 sec: 36045.4, 60 sec: 43160.9, 300 sec: 43431.5). Total num frames: 330792960. Throughput: 0: 11116.1. Samples: 82754048. Policy #0 lag: (min: 5.0, avg: 113.7, max: 245.0) [2024-06-15 13:38:25,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 13:38:25,903][1652475] Updated weights for policy 0, policy_version 161535 (0.0015) [2024-06-15 13:38:30,738][1648984] Fps is (10 sec: 45875.2, 60 sec: 43705.8, 300 sec: 43542.6). Total num frames: 330956800. Throughput: 0: 11207.1. Samples: 82825728. Policy #0 lag: (min: 5.0, avg: 113.7, max: 245.0) [2024-06-15 13:38:30,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 13:38:31,471][1652475] Updated weights for policy 0, policy_version 161603 (0.0014) [2024-06-15 13:38:33,136][1652475] Updated weights for policy 0, policy_version 161680 (0.0014) [2024-06-15 13:38:35,738][1648984] Fps is (10 sec: 42598.7, 60 sec: 43690.7, 300 sec: 43431.5). Total num frames: 331218944. Throughput: 0: 11218.5. Samples: 82856448. Policy #0 lag: (min: 5.0, avg: 113.7, max: 245.0) [2024-06-15 13:38:35,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 13:38:36,261][1651340] Signal inference workers to stop experience collection... (8350 times) [2024-06-15 13:38:36,332][1652475] Updated weights for policy 0, policy_version 161735 (0.0014) [2024-06-15 13:38:36,351][1652475] InferenceWorker_p0-w0: stopping experience collection (8350 times) [2024-06-15 13:38:36,472][1651340] Signal inference workers to resume experience collection... 
(8350 times) [2024-06-15 13:38:36,472][1652475] InferenceWorker_p0-w0: resuming experience collection (8350 times) [2024-06-15 13:38:37,123][1652475] Updated weights for policy 0, policy_version 161778 (0.0016) [2024-06-15 13:38:38,587][1652475] Updated weights for policy 0, policy_version 161840 (0.0014) [2024-06-15 13:38:40,738][1648984] Fps is (10 sec: 52428.3, 60 sec: 43690.6, 300 sec: 43986.9). Total num frames: 331481088. Throughput: 0: 11309.5. Samples: 82924544. Policy #0 lag: (min: 5.0, avg: 113.7, max: 245.0) [2024-06-15 13:38:40,739][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 13:38:43,362][1652475] Updated weights for policy 0, policy_version 161904 (0.0013) [2024-06-15 13:38:44,422][1652475] Updated weights for policy 0, policy_version 161939 (0.0021) [2024-06-15 13:38:45,436][1652475] Updated weights for policy 0, policy_version 161984 (0.0016) [2024-06-15 13:38:45,742][1648984] Fps is (10 sec: 52404.8, 60 sec: 46417.9, 300 sec: 43764.0). Total num frames: 331743232. Throughput: 0: 11456.3. Samples: 82996224. Policy #0 lag: (min: 5.0, avg: 113.7, max: 245.0) [2024-06-15 13:38:45,743][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 13:38:49,308][1652475] Updated weights for policy 0, policy_version 162064 (0.0048) [2024-06-15 13:38:50,739][1648984] Fps is (10 sec: 52423.4, 60 sec: 46966.6, 300 sec: 43986.7). Total num frames: 332005376. Throughput: 0: 11457.1. Samples: 83029504. Policy #0 lag: (min: 5.0, avg: 113.7, max: 245.0) [2024-06-15 13:38:50,739][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 13:38:55,078][1652475] Updated weights for policy 0, policy_version 162120 (0.0012) [2024-06-15 13:38:55,738][1648984] Fps is (10 sec: 32782.7, 60 sec: 44783.0, 300 sec: 43653.6). Total num frames: 332070912. Throughput: 0: 11389.2. Samples: 83098624. 
Policy #0 lag: (min: 5.0, avg: 113.7, max: 245.0) [2024-06-15 13:38:55,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 13:38:56,339][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000162176_332136448.pth... [2024-06-15 13:38:56,490][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000157024_321585152.pth [2024-06-15 13:38:57,191][1652475] Updated weights for policy 0, policy_version 162208 (0.0012) [2024-06-15 13:38:58,772][1652475] Updated weights for policy 0, policy_version 162242 (0.0011) [2024-06-15 13:39:00,738][1648984] Fps is (10 sec: 39326.1, 60 sec: 45875.2, 300 sec: 43542.6). Total num frames: 332398592. Throughput: 0: 11264.1. Samples: 83155456. Policy #0 lag: (min: 5.0, avg: 113.7, max: 245.0) [2024-06-15 13:39:00,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 13:39:03,361][1652475] Updated weights for policy 0, policy_version 162307 (0.0113) [2024-06-15 13:39:05,738][1648984] Fps is (10 sec: 45875.7, 60 sec: 43690.7, 300 sec: 43764.7). Total num frames: 332529664. Throughput: 0: 11104.7. Samples: 83188224. Policy #0 lag: (min: 5.0, avg: 113.7, max: 245.0) [2024-06-15 13:39:05,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 13:39:08,505][1652475] Updated weights for policy 0, policy_version 162416 (0.0013) [2024-06-15 13:39:10,519][1652475] Updated weights for policy 0, policy_version 162448 (0.0012) [2024-06-15 13:39:10,738][1648984] Fps is (10 sec: 29491.3, 60 sec: 43690.7, 300 sec: 43209.3). Total num frames: 332693504. Throughput: 0: 10888.5. Samples: 83244032. Policy #0 lag: (min: 5.0, avg: 113.7, max: 245.0) [2024-06-15 13:39:10,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 13:39:11,772][1652475] Updated weights for policy 0, policy_version 162512 (0.0013) [2024-06-15 13:39:15,739][1648984] Fps is (10 sec: 39320.4, 60 sec: 41506.1, 300 sec: 43542.5). Total num frames: 332922880. Throughput: 0: 10786.1. 
Samples: 83311104. Policy #0 lag: (min: 5.0, avg: 113.7, max: 245.0) [2024-06-15 13:39:15,741][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 13:39:16,235][1652475] Updated weights for policy 0, policy_version 162562 (0.0102) [2024-06-15 13:39:17,489][1652475] Updated weights for policy 0, policy_version 162623 (0.0013) [2024-06-15 13:39:20,401][1652475] Updated weights for policy 0, policy_version 162681 (0.0013) [2024-06-15 13:39:20,738][1648984] Fps is (10 sec: 49151.8, 60 sec: 44782.9, 300 sec: 43764.7). Total num frames: 333185024. Throughput: 0: 10865.8. Samples: 83345408. Policy #0 lag: (min: 0.0, avg: 102.6, max: 256.0) [2024-06-15 13:39:20,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 13:39:22,292][1652475] Updated weights for policy 0, policy_version 162736 (0.0013) [2024-06-15 13:39:23,077][1651340] Signal inference workers to stop experience collection... (8400 times) [2024-06-15 13:39:23,127][1652475] InferenceWorker_p0-w0: stopping experience collection (8400 times) [2024-06-15 13:39:23,450][1651340] Signal inference workers to resume experience collection... (8400 times) [2024-06-15 13:39:23,451][1652475] InferenceWorker_p0-w0: resuming experience collection (8400 times) [2024-06-15 13:39:24,195][1652475] Updated weights for policy 0, policy_version 162800 (0.0015) [2024-06-15 13:39:25,738][1648984] Fps is (10 sec: 52429.8, 60 sec: 44236.8, 300 sec: 43875.8). Total num frames: 333447168. Throughput: 0: 10752.0. Samples: 83408384. Policy #0 lag: (min: 0.0, avg: 102.6, max: 256.0) [2024-06-15 13:39:25,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 13:39:29,247][1652475] Updated weights for policy 0, policy_version 162866 (0.0091) [2024-06-15 13:39:30,737][1652475] Updated weights for policy 0, policy_version 162896 (0.0012) [2024-06-15 13:39:30,738][1648984] Fps is (10 sec: 42598.2, 60 sec: 44236.8, 300 sec: 43875.8). Total num frames: 333611008. Throughput: 0: 10787.2. Samples: 83481600. 
Policy #0 lag: (min: 0.0, avg: 102.6, max: 256.0) [2024-06-15 13:39:30,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 13:39:33,009][1652475] Updated weights for policy 0, policy_version 162960 (0.0029) [2024-06-15 13:39:34,902][1652475] Updated weights for policy 0, policy_version 163010 (0.0014) [2024-06-15 13:39:35,738][1648984] Fps is (10 sec: 45875.1, 60 sec: 44782.8, 300 sec: 43764.7). Total num frames: 333905920. Throughput: 0: 10718.1. Samples: 83511808. Policy #0 lag: (min: 0.0, avg: 102.6, max: 256.0) [2024-06-15 13:39:35,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 13:39:40,090][1652475] Updated weights for policy 0, policy_version 163073 (0.0014) [2024-06-15 13:39:40,738][1648984] Fps is (10 sec: 42597.9, 60 sec: 42598.4, 300 sec: 43764.7). Total num frames: 334036992. Throughput: 0: 10740.6. Samples: 83581952. Policy #0 lag: (min: 0.0, avg: 102.6, max: 256.0) [2024-06-15 13:39:40,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 13:39:41,426][1652475] Updated weights for policy 0, policy_version 163136 (0.0011) [2024-06-15 13:39:43,527][1652475] Updated weights for policy 0, policy_version 163200 (0.0014) [2024-06-15 13:39:45,662][1652475] Updated weights for policy 0, policy_version 163264 (0.0015) [2024-06-15 13:39:45,738][1648984] Fps is (10 sec: 45875.7, 60 sec: 43694.0, 300 sec: 43653.6). Total num frames: 334364672. Throughput: 0: 10911.3. Samples: 83646464. Policy #0 lag: (min: 0.0, avg: 102.6, max: 256.0) [2024-06-15 13:39:45,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 13:39:50,738][1648984] Fps is (10 sec: 45875.7, 60 sec: 41506.9, 300 sec: 43986.9). Total num frames: 334495744. Throughput: 0: 10797.5. Samples: 83674112. 
Policy #0 lag: (min: 0.0, avg: 102.6, max: 256.0) [2024-06-15 13:39:50,739][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 13:39:51,225][1652475] Updated weights for policy 0, policy_version 163331 (0.0016) [2024-06-15 13:39:52,529][1652475] Updated weights for policy 0, policy_version 163392 (0.0014) [2024-06-15 13:39:55,738][1648984] Fps is (10 sec: 36044.4, 60 sec: 44236.8, 300 sec: 43542.6). Total num frames: 334725120. Throughput: 0: 11172.9. Samples: 83746816. Policy #0 lag: (min: 0.0, avg: 102.6, max: 256.0) [2024-06-15 13:39:55,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 13:39:58,163][1652475] Updated weights for policy 0, policy_version 163488 (0.0150) [2024-06-15 13:40:00,461][1652475] Updated weights for policy 0, policy_version 163572 (0.0013) [2024-06-15 13:40:00,738][1648984] Fps is (10 sec: 52427.7, 60 sec: 43690.5, 300 sec: 43986.8). Total num frames: 335020032. Throughput: 0: 10888.5. Samples: 83801088. Policy #0 lag: (min: 0.0, avg: 102.6, max: 256.0) [2024-06-15 13:40:00,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 13:40:03,562][1652475] Updated weights for policy 0, policy_version 163616 (0.0013) [2024-06-15 13:40:05,738][1648984] Fps is (10 sec: 42598.7, 60 sec: 43690.6, 300 sec: 43986.9). Total num frames: 335151104. Throughput: 0: 10979.6. Samples: 83839488. Policy #0 lag: (min: 0.0, avg: 102.6, max: 256.0) [2024-06-15 13:40:05,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 13:40:07,129][1652475] Updated weights for policy 0, policy_version 163682 (0.0016) [2024-06-15 13:40:10,738][1648984] Fps is (10 sec: 32768.9, 60 sec: 44236.8, 300 sec: 43320.4). Total num frames: 335347712. Throughput: 0: 11116.1. Samples: 83908608. 
Policy #0 lag: (min: 0.0, avg: 102.6, max: 256.0) [2024-06-15 13:40:10,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 13:40:11,026][1652475] Updated weights for policy 0, policy_version 163760 (0.0195) [2024-06-15 13:40:11,502][1651340] Signal inference workers to stop experience collection... (8450 times) [2024-06-15 13:40:11,548][1652475] InferenceWorker_p0-w0: stopping experience collection (8450 times) [2024-06-15 13:40:11,765][1651340] Signal inference workers to resume experience collection... (8450 times) [2024-06-15 13:40:11,767][1652475] InferenceWorker_p0-w0: resuming experience collection (8450 times) [2024-06-15 13:40:12,902][1652475] Updated weights for policy 0, policy_version 163838 (0.0013) [2024-06-15 13:40:15,738][1648984] Fps is (10 sec: 39321.3, 60 sec: 43690.8, 300 sec: 43764.7). Total num frames: 335544320. Throughput: 0: 10808.9. Samples: 83968000. Policy #0 lag: (min: 0.0, avg: 102.6, max: 256.0) [2024-06-15 13:40:15,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 13:40:16,756][1652475] Updated weights for policy 0, policy_version 163899 (0.0014) [2024-06-15 13:40:19,415][1652475] Updated weights for policy 0, policy_version 163938 (0.0015) [2024-06-15 13:40:20,738][1648984] Fps is (10 sec: 45874.7, 60 sec: 43690.6, 300 sec: 43764.7). Total num frames: 335806464. Throughput: 0: 10911.3. Samples: 84002816. Policy #0 lag: (min: 0.0, avg: 102.6, max: 256.0) [2024-06-15 13:40:20,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 13:40:22,760][1652475] Updated weights for policy 0, policy_version 164004 (0.0039) [2024-06-15 13:40:24,507][1652475] Updated weights for policy 0, policy_version 164080 (0.0013) [2024-06-15 13:40:25,738][1648984] Fps is (10 sec: 52427.2, 60 sec: 43690.4, 300 sec: 43986.8). Total num frames: 336068608. Throughput: 0: 10717.8. Samples: 84064256. 
Policy #0 lag: (min: 69.0, avg: 161.4, max: 325.0) [2024-06-15 13:40:25,739][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 13:40:28,599][1652475] Updated weights for policy 0, policy_version 164130 (0.0018) [2024-06-15 13:40:30,738][1648984] Fps is (10 sec: 39322.0, 60 sec: 43144.6, 300 sec: 43875.8). Total num frames: 336199680. Throughput: 0: 10808.9. Samples: 84132864. Policy #0 lag: (min: 69.0, avg: 161.4, max: 325.0) [2024-06-15 13:40:30,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 13:40:32,055][1652475] Updated weights for policy 0, policy_version 164211 (0.0012) [2024-06-15 13:40:34,734][1652475] Updated weights for policy 0, policy_version 164244 (0.0014) [2024-06-15 13:40:35,738][1648984] Fps is (10 sec: 39322.4, 60 sec: 42598.3, 300 sec: 43542.5). Total num frames: 336461824. Throughput: 0: 10899.9. Samples: 84164608. Policy #0 lag: (min: 69.0, avg: 161.4, max: 325.0) [2024-06-15 13:40:35,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 13:40:36,601][1652475] Updated weights for policy 0, policy_version 164320 (0.0012) [2024-06-15 13:40:40,738][1648984] Fps is (10 sec: 45875.3, 60 sec: 43690.8, 300 sec: 43764.7). Total num frames: 336658432. Throughput: 0: 10672.4. Samples: 84227072. Policy #0 lag: (min: 69.0, avg: 161.4, max: 325.0) [2024-06-15 13:40:40,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 13:40:40,916][1652475] Updated weights for policy 0, policy_version 164406 (0.0166) [2024-06-15 13:40:44,444][1652475] Updated weights for policy 0, policy_version 164471 (0.0014) [2024-06-15 13:40:45,738][1648984] Fps is (10 sec: 39322.4, 60 sec: 41506.1, 300 sec: 43542.6). Total num frames: 336855040. Throughput: 0: 10865.8. Samples: 84290048. 
Policy #0 lag: (min: 69.0, avg: 161.4, max: 325.0) [2024-06-15 13:40:45,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 13:40:47,827][1652475] Updated weights for policy 0, policy_version 164528 (0.0013) [2024-06-15 13:40:49,392][1652475] Updated weights for policy 0, policy_version 164592 (0.0014) [2024-06-15 13:40:50,738][1648984] Fps is (10 sec: 45874.6, 60 sec: 43690.6, 300 sec: 43986.9). Total num frames: 337117184. Throughput: 0: 10672.3. Samples: 84319744. Policy #0 lag: (min: 69.0, avg: 161.4, max: 325.0) [2024-06-15 13:40:50,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 13:40:53,196][1652475] Updated weights for policy 0, policy_version 164656 (0.0016) [2024-06-15 13:40:55,738][1648984] Fps is (10 sec: 42598.1, 60 sec: 42598.4, 300 sec: 43653.6). Total num frames: 337281024. Throughput: 0: 10649.6. Samples: 84387840. Policy #0 lag: (min: 69.0, avg: 161.4, max: 325.0) [2024-06-15 13:40:55,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 13:40:56,242][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000164720_337346560.pth... [2024-06-15 13:40:56,284][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000159552_326762496.pth [2024-06-15 13:40:56,641][1652475] Updated weights for policy 0, policy_version 164736 (0.0015) [2024-06-15 13:40:59,465][1651340] Signal inference workers to stop experience collection... (8500 times) [2024-06-15 13:40:59,526][1652475] InferenceWorker_p0-w0: stopping experience collection (8500 times) [2024-06-15 13:40:59,735][1651340] Signal inference workers to resume experience collection... (8500 times) [2024-06-15 13:40:59,736][1652475] InferenceWorker_p0-w0: resuming experience collection (8500 times) [2024-06-15 13:41:00,570][1652475] Updated weights for policy 0, policy_version 164824 (0.0012) [2024-06-15 13:41:00,738][1648984] Fps is (10 sec: 45875.6, 60 sec: 42598.6, 300 sec: 43764.7). Total num frames: 337575936. 
Throughput: 0: 10717.9. Samples: 84450304. Policy #0 lag: (min: 69.0, avg: 161.4, max: 325.0) [2024-06-15 13:41:00,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 13:41:01,414][1652475] Updated weights for policy 0, policy_version 164862 (0.0013) [2024-06-15 13:41:05,628][1652475] Updated weights for policy 0, policy_version 164925 (0.0017) [2024-06-15 13:41:05,738][1648984] Fps is (10 sec: 49151.9, 60 sec: 43690.6, 300 sec: 43653.7). Total num frames: 337772544. Throughput: 0: 10752.0. Samples: 84486656. Policy #0 lag: (min: 69.0, avg: 161.4, max: 325.0) [2024-06-15 13:41:05,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 13:41:07,951][1652475] Updated weights for policy 0, policy_version 164986 (0.0015) [2024-06-15 13:41:10,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 43690.6, 300 sec: 43431.5). Total num frames: 337969152. Throughput: 0: 10922.7. Samples: 84555776. Policy #0 lag: (min: 69.0, avg: 161.4, max: 325.0) [2024-06-15 13:41:10,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 13:41:10,831][1652475] Updated weights for policy 0, policy_version 165025 (0.0012) [2024-06-15 13:41:12,610][1652475] Updated weights for policy 0, policy_version 165118 (0.0099) [2024-06-15 13:41:15,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 338165760. Throughput: 0: 10922.7. Samples: 84624384. Policy #0 lag: (min: 69.0, avg: 161.4, max: 325.0) [2024-06-15 13:41:15,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 13:41:17,915][1652475] Updated weights for policy 0, policy_version 165176 (0.0013) [2024-06-15 13:41:19,056][1652475] Updated weights for policy 0, policy_version 165216 (0.0121) [2024-06-15 13:41:20,738][1648984] Fps is (10 sec: 45875.7, 60 sec: 43690.8, 300 sec: 43764.7). Total num frames: 338427904. Throughput: 0: 10911.3. Samples: 84655616. 
Policy #0 lag: (min: 69.0, avg: 161.4, max: 325.0) [2024-06-15 13:41:20,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 13:41:22,129][1652475] Updated weights for policy 0, policy_version 165269 (0.0037) [2024-06-15 13:41:23,967][1652475] Updated weights for policy 0, policy_version 165334 (0.0013) [2024-06-15 13:41:24,902][1652475] Updated weights for policy 0, policy_version 165376 (0.0015) [2024-06-15 13:41:25,750][1648984] Fps is (10 sec: 52363.3, 60 sec: 43681.8, 300 sec: 43985.0). Total num frames: 338690048. Throughput: 0: 10908.2. Samples: 84718080. Policy #0 lag: (min: 69.0, avg: 161.4, max: 325.0) [2024-06-15 13:41:25,751][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 13:41:30,444][1652475] Updated weights for policy 0, policy_version 165439 (0.0013) [2024-06-15 13:41:30,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 338821120. Throughput: 0: 10843.0. Samples: 84777984. Policy #0 lag: (min: 15.0, avg: 79.2, max: 209.0) [2024-06-15 13:41:30,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 13:41:32,272][1652475] Updated weights for policy 0, policy_version 165494 (0.0014) [2024-06-15 13:41:35,738][1648984] Fps is (10 sec: 39371.1, 60 sec: 43690.8, 300 sec: 43542.6). Total num frames: 339083264. Throughput: 0: 10922.7. Samples: 84811264. Policy #0 lag: (min: 15.0, avg: 79.2, max: 209.0) [2024-06-15 13:41:35,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 13:41:37,347][1652475] Updated weights for policy 0, policy_version 165569 (0.0085) [2024-06-15 13:41:40,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 42598.4, 300 sec: 43431.5). Total num frames: 339214336. Throughput: 0: 10843.0. Samples: 84875776. 
Policy #0 lag: (min: 15.0, avg: 79.2, max: 209.0) [2024-06-15 13:41:40,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 13:41:42,388][1652475] Updated weights for policy 0, policy_version 165680 (0.0024) [2024-06-15 13:41:44,395][1652475] Updated weights for policy 0, policy_version 165744 (0.0011) [2024-06-15 13:41:45,738][1648984] Fps is (10 sec: 39320.4, 60 sec: 43690.5, 300 sec: 43542.5). Total num frames: 339476480. Throughput: 0: 10786.1. Samples: 84935680. Policy #0 lag: (min: 15.0, avg: 79.2, max: 209.0) [2024-06-15 13:41:45,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 13:41:45,749][1652475] Updated weights for policy 0, policy_version 165776 (0.0010) [2024-06-15 13:41:45,820][1651340] Signal inference workers to stop experience collection... (8550 times) [2024-06-15 13:41:45,871][1652475] InferenceWorker_p0-w0: stopping experience collection (8550 times) [2024-06-15 13:41:46,037][1651340] Signal inference workers to resume experience collection... (8550 times) [2024-06-15 13:41:46,038][1652475] InferenceWorker_p0-w0: resuming experience collection (8550 times) [2024-06-15 13:41:46,653][1652475] Updated weights for policy 0, policy_version 165823 (0.0013) [2024-06-15 13:41:50,738][1648984] Fps is (10 sec: 45875.1, 60 sec: 42598.5, 300 sec: 43764.7). Total num frames: 339673088. Throughput: 0: 10774.8. Samples: 84971520. Policy #0 lag: (min: 15.0, avg: 79.2, max: 209.0) [2024-06-15 13:41:50,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 13:41:50,862][1652475] Updated weights for policy 0, policy_version 165875 (0.0012) [2024-06-15 13:41:54,125][1652475] Updated weights for policy 0, policy_version 165942 (0.0012) [2024-06-15 13:41:55,738][1648984] Fps is (10 sec: 45876.2, 60 sec: 44236.8, 300 sec: 43653.6). Total num frames: 339935232. Throughput: 0: 10774.8. Samples: 85040640. 
Policy #0 lag: (min: 15.0, avg: 79.2, max: 209.0) [2024-06-15 13:41:55,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 13:41:55,741][1652475] Updated weights for policy 0, policy_version 165986 (0.0012) [2024-06-15 13:41:57,476][1652475] Updated weights for policy 0, policy_version 166039 (0.0013) [2024-06-15 13:42:00,738][1648984] Fps is (10 sec: 45875.3, 60 sec: 42598.4, 300 sec: 43542.6). Total num frames: 340131840. Throughput: 0: 10752.0. Samples: 85108224. Policy #0 lag: (min: 15.0, avg: 79.2, max: 209.0) [2024-06-15 13:42:00,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 13:42:01,135][1652475] Updated weights for policy 0, policy_version 166084 (0.0034) [2024-06-15 13:42:05,738][1648984] Fps is (10 sec: 32768.0, 60 sec: 41506.1, 300 sec: 43320.4). Total num frames: 340262912. Throughput: 0: 10706.5. Samples: 85137408. Policy #0 lag: (min: 15.0, avg: 79.2, max: 209.0) [2024-06-15 13:42:05,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 13:42:06,352][1652475] Updated weights for policy 0, policy_version 166176 (0.0103) [2024-06-15 13:42:08,182][1652475] Updated weights for policy 0, policy_version 166243 (0.0098) [2024-06-15 13:42:10,740][1648984] Fps is (10 sec: 39311.3, 60 sec: 42596.6, 300 sec: 43097.9). Total num frames: 340525056. Throughput: 0: 10595.0. Samples: 85194752. Policy #0 lag: (min: 15.0, avg: 79.2, max: 209.0) [2024-06-15 13:42:10,741][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 13:42:11,625][1652475] Updated weights for policy 0, policy_version 166288 (0.0013) [2024-06-15 13:42:13,581][1652475] Updated weights for policy 0, policy_version 166356 (0.0016) [2024-06-15 13:42:15,738][1648984] Fps is (10 sec: 52427.9, 60 sec: 43690.5, 300 sec: 43543.2). Total num frames: 340787200. Throughput: 0: 10581.3. Samples: 85254144. 
Policy #0 lag: (min: 15.0, avg: 79.2, max: 209.0) [2024-06-15 13:42:15,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 13:42:19,070][1652475] Updated weights for policy 0, policy_version 166435 (0.0025) [2024-06-15 13:42:20,543][1652475] Updated weights for policy 0, policy_version 166496 (0.0038) [2024-06-15 13:42:20,738][1648984] Fps is (10 sec: 45887.1, 60 sec: 42598.3, 300 sec: 43323.7). Total num frames: 340983808. Throughput: 0: 10774.7. Samples: 85296128. Policy #0 lag: (min: 15.0, avg: 79.2, max: 209.0) [2024-06-15 13:42:20,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 13:42:24,409][1652475] Updated weights for policy 0, policy_version 166560 (0.0106) [2024-06-15 13:42:25,738][1648984] Fps is (10 sec: 45875.4, 60 sec: 42607.2, 300 sec: 43767.8). Total num frames: 341245952. Throughput: 0: 10729.2. Samples: 85358592. Policy #0 lag: (min: 15.0, avg: 79.2, max: 209.0) [2024-06-15 13:42:25,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 13:42:26,075][1652475] Updated weights for policy 0, policy_version 166645 (0.0016) [2024-06-15 13:42:29,804][1652475] Updated weights for policy 0, policy_version 166672 (0.0014) [2024-06-15 13:42:30,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 43144.5, 300 sec: 43431.5). Total num frames: 341409792. Throughput: 0: 10991.0. Samples: 85430272. Policy #0 lag: (min: 15.0, avg: 79.2, max: 209.0) [2024-06-15 13:42:30,740][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 13:42:31,878][1652475] Updated weights for policy 0, policy_version 166752 (0.0014) [2024-06-15 13:42:35,559][1651340] Signal inference workers to stop experience collection... (8600 times) [2024-06-15 13:42:35,650][1652475] InferenceWorker_p0-w0: stopping experience collection (8600 times) [2024-06-15 13:42:35,738][1648984] Fps is (10 sec: 32768.5, 60 sec: 41506.1, 300 sec: 43098.3). Total num frames: 341573632. Throughput: 0: 10786.1. Samples: 85456896. 
Policy #0 lag: (min: 15.0, avg: 79.2, max: 209.0) [2024-06-15 13:42:35,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 13:42:35,769][1651340] Signal inference workers to resume experience collection... (8600 times) [2024-06-15 13:42:35,771][1652475] InferenceWorker_p0-w0: resuming experience collection (8600 times) [2024-06-15 13:42:36,519][1652475] Updated weights for policy 0, policy_version 166836 (0.0014) [2024-06-15 13:42:40,738][1648984] Fps is (10 sec: 42598.7, 60 sec: 43690.7, 300 sec: 43653.7). Total num frames: 341835776. Throughput: 0: 10763.4. Samples: 85524992. Policy #0 lag: (min: 15.0, avg: 108.4, max: 271.0) [2024-06-15 13:42:40,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 13:42:42,303][1652475] Updated weights for policy 0, policy_version 166928 (0.0016) [2024-06-15 13:42:43,883][1652475] Updated weights for policy 0, policy_version 166992 (0.0095) [2024-06-15 13:42:44,959][1652475] Updated weights for policy 0, policy_version 167037 (0.0021) [2024-06-15 13:42:45,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 43690.8, 300 sec: 43764.7). Total num frames: 342097920. Throughput: 0: 10786.1. Samples: 85593600. Policy #0 lag: (min: 15.0, avg: 108.4, max: 271.0) [2024-06-15 13:42:45,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 13:42:48,303][1652475] Updated weights for policy 0, policy_version 167096 (0.0012) [2024-06-15 13:42:49,723][1652475] Updated weights for policy 0, policy_version 167165 (0.0021) [2024-06-15 13:42:50,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 44783.0, 300 sec: 43986.9). Total num frames: 342360064. Throughput: 0: 10899.9. Samples: 85627904. Policy #0 lag: (min: 15.0, avg: 108.4, max: 271.0) [2024-06-15 13:42:50,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 13:42:54,711][1652475] Updated weights for policy 0, policy_version 167221 (0.0013) [2024-06-15 13:42:55,738][1648984] Fps is (10 sec: 45874.8, 60 sec: 43690.6, 300 sec: 43764.7). Total num frames: 342556672. 
Throughput: 0: 11219.1. Samples: 85699584. Policy #0 lag: (min: 15.0, avg: 108.4, max: 271.0) [2024-06-15 13:42:55,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 13:42:56,013][1652475] Updated weights for policy 0, policy_version 167289 (0.0014) [2024-06-15 13:42:56,071][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000167296_342622208.pth... [2024-06-15 13:42:56,144][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000162176_332136448.pth [2024-06-15 13:42:59,565][1652475] Updated weights for policy 0, policy_version 167331 (0.0012) [2024-06-15 13:43:00,738][1648984] Fps is (10 sec: 42598.5, 60 sec: 44236.8, 300 sec: 43653.6). Total num frames: 342786048. Throughput: 0: 11343.7. Samples: 85764608. Policy #0 lag: (min: 15.0, avg: 108.4, max: 271.0) [2024-06-15 13:43:00,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 13:43:00,766][1652475] Updated weights for policy 0, policy_version 167392 (0.0112) [2024-06-15 13:43:05,697][1652475] Updated weights for policy 0, policy_version 167458 (0.0014) [2024-06-15 13:43:05,738][1648984] Fps is (10 sec: 39321.9, 60 sec: 44782.9, 300 sec: 43653.6). Total num frames: 342949888. Throughput: 0: 11264.0. Samples: 85803008. Policy #0 lag: (min: 15.0, avg: 108.4, max: 271.0) [2024-06-15 13:43:05,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 13:43:07,664][1652475] Updated weights for policy 0, policy_version 167545 (0.0201) [2024-06-15 13:43:10,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 44238.8, 300 sec: 43209.4). Total num frames: 343179264. Throughput: 0: 11298.2. Samples: 85867008. 
Policy #0 lag: (min: 15.0, avg: 108.4, max: 271.0) [2024-06-15 13:43:10,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 13:43:11,407][1652475] Updated weights for policy 0, policy_version 167616 (0.0013) [2024-06-15 13:43:12,726][1652475] Updated weights for policy 0, policy_version 167680 (0.0028) [2024-06-15 13:43:15,738][1648984] Fps is (10 sec: 45875.4, 60 sec: 43690.8, 300 sec: 43764.7). Total num frames: 343408640. Throughput: 0: 11298.2. Samples: 85938688. Policy #0 lag: (min: 15.0, avg: 108.4, max: 271.0) [2024-06-15 13:43:15,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 13:43:17,084][1651340] Signal inference workers to stop experience collection... (8650 times) [2024-06-15 13:43:17,147][1652475] InferenceWorker_p0-w0: stopping experience collection (8650 times) [2024-06-15 13:43:17,379][1651340] Signal inference workers to resume experience collection... (8650 times) [2024-06-15 13:43:17,380][1652475] InferenceWorker_p0-w0: resuming experience collection (8650 times) [2024-06-15 13:43:18,267][1652475] Updated weights for policy 0, policy_version 167744 (0.0016) [2024-06-15 13:43:19,784][1652475] Updated weights for policy 0, policy_version 167806 (0.0012) [2024-06-15 13:43:20,738][1648984] Fps is (10 sec: 49151.6, 60 sec: 44782.9, 300 sec: 43653.6). Total num frames: 343670784. Throughput: 0: 11423.3. Samples: 85970944. Policy #0 lag: (min: 15.0, avg: 108.4, max: 271.0) [2024-06-15 13:43:20,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 13:43:22,821][1652475] Updated weights for policy 0, policy_version 167872 (0.0012) [2024-06-15 13:43:24,493][1652475] Updated weights for policy 0, policy_version 167936 (0.0021) [2024-06-15 13:43:25,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 44783.1, 300 sec: 43986.9). Total num frames: 343932928. Throughput: 0: 11241.3. Samples: 86030848. 
Policy #0 lag: (min: 15.0, avg: 108.4, max: 271.0) [2024-06-15 13:43:25,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 13:43:30,563][1652475] Updated weights for policy 0, policy_version 168020 (0.0015) [2024-06-15 13:43:30,738][1648984] Fps is (10 sec: 45875.0, 60 sec: 45329.0, 300 sec: 43764.7). Total num frames: 344129536. Throughput: 0: 11343.6. Samples: 86104064. Policy #0 lag: (min: 15.0, avg: 108.4, max: 271.0) [2024-06-15 13:43:30,739][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 13:43:32,773][1652475] Updated weights for policy 0, policy_version 168067 (0.0016) [2024-06-15 13:43:34,545][1652475] Updated weights for policy 0, policy_version 168144 (0.0014) [2024-06-15 13:43:35,532][1652475] Updated weights for policy 0, policy_version 168192 (0.0014) [2024-06-15 13:43:35,737][1648984] Fps is (10 sec: 52429.5, 60 sec: 48059.9, 300 sec: 43986.9). Total num frames: 344457216. Throughput: 0: 11241.3. Samples: 86133760. Policy #0 lag: (min: 15.0, avg: 108.4, max: 271.0) [2024-06-15 13:43:35,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 13:43:40,738][1648984] Fps is (10 sec: 39322.1, 60 sec: 44783.0, 300 sec: 43321.1). Total num frames: 344522752. Throughput: 0: 11275.4. Samples: 86206976. Policy #0 lag: (min: 31.0, avg: 98.6, max: 287.0) [2024-06-15 13:43:40,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 13:43:41,328][1652475] Updated weights for policy 0, policy_version 168256 (0.0014) [2024-06-15 13:43:44,591][1652475] Updated weights for policy 0, policy_version 168342 (0.0024) [2024-06-15 13:43:45,738][1648984] Fps is (10 sec: 39321.0, 60 sec: 45875.2, 300 sec: 43542.7). Total num frames: 344850432. Throughput: 0: 11013.7. Samples: 86260224. 
Policy #0 lag: (min: 31.0, avg: 98.6, max: 287.0) [2024-06-15 13:43:45,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 13:43:47,829][1652475] Updated weights for policy 0, policy_version 168386 (0.0013) [2024-06-15 13:43:50,738][1648984] Fps is (10 sec: 45874.9, 60 sec: 43690.6, 300 sec: 43764.7). Total num frames: 344981504. Throughput: 0: 11013.7. Samples: 86298624. Policy #0 lag: (min: 31.0, avg: 98.6, max: 287.0) [2024-06-15 13:43:50,739][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 13:43:52,095][1652475] Updated weights for policy 0, policy_version 168450 (0.0044) [2024-06-15 13:43:55,072][1652475] Updated weights for policy 0, policy_version 168528 (0.0098) [2024-06-15 13:43:55,738][1648984] Fps is (10 sec: 36044.6, 60 sec: 44236.8, 300 sec: 43431.5). Total num frames: 345210880. Throughput: 0: 11036.4. Samples: 86363648. Policy #0 lag: (min: 31.0, avg: 98.6, max: 287.0) [2024-06-15 13:43:55,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 13:43:56,619][1652475] Updated weights for policy 0, policy_version 168608 (0.0011) [2024-06-15 13:44:00,137][1652475] Updated weights for policy 0, policy_version 168657 (0.0012) [2024-06-15 13:44:00,452][1651340] Signal inference workers to stop experience collection... (8700 times) [2024-06-15 13:44:00,498][1652475] InferenceWorker_p0-w0: stopping experience collection (8700 times) [2024-06-15 13:44:00,738][1648984] Fps is (10 sec: 45875.4, 60 sec: 44236.8, 300 sec: 43764.7). Total num frames: 345440256. Throughput: 0: 11070.6. Samples: 86436864. Policy #0 lag: (min: 31.0, avg: 98.6, max: 287.0) [2024-06-15 13:44:00,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 13:44:00,779][1651340] Signal inference workers to resume experience collection... 
(8700 times) [2024-06-15 13:44:00,780][1652475] InferenceWorker_p0-w0: resuming experience collection (8700 times) [2024-06-15 13:44:03,774][1652475] Updated weights for policy 0, policy_version 168736 (0.0011) [2024-06-15 13:44:05,738][1648984] Fps is (10 sec: 42598.5, 60 sec: 44782.9, 300 sec: 43875.8). Total num frames: 345636864. Throughput: 0: 11173.0. Samples: 86473728. Policy #0 lag: (min: 31.0, avg: 98.6, max: 287.0) [2024-06-15 13:44:05,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 13:44:06,065][1652475] Updated weights for policy 0, policy_version 168770 (0.0013) [2024-06-15 13:44:07,470][1652475] Updated weights for policy 0, policy_version 168832 (0.0013) [2024-06-15 13:44:08,872][1652475] Updated weights for policy 0, policy_version 168893 (0.0013) [2024-06-15 13:44:10,738][1648984] Fps is (10 sec: 45875.3, 60 sec: 45329.1, 300 sec: 43986.9). Total num frames: 345899008. Throughput: 0: 11195.7. Samples: 86534656. Policy #0 lag: (min: 31.0, avg: 98.6, max: 287.0) [2024-06-15 13:44:10,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 13:44:13,052][1652475] Updated weights for policy 0, policy_version 168957 (0.0012) [2024-06-15 13:44:15,738][1648984] Fps is (10 sec: 49151.7, 60 sec: 45329.0, 300 sec: 43875.8). Total num frames: 346128384. Throughput: 0: 11161.6. Samples: 86606336. Policy #0 lag: (min: 31.0, avg: 98.6, max: 287.0) [2024-06-15 13:44:15,739][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 13:44:17,790][1652475] Updated weights for policy 0, policy_version 169040 (0.0090) [2024-06-15 13:44:19,323][1652475] Updated weights for policy 0, policy_version 169104 (0.0017) [2024-06-15 13:44:20,366][1652475] Updated weights for policy 0, policy_version 169148 (0.0012) [2024-06-15 13:44:20,738][1648984] Fps is (10 sec: 52428.4, 60 sec: 45875.2, 300 sec: 43986.9). Total num frames: 346423296. Throughput: 0: 11059.1. Samples: 86631424. 
Policy #0 lag: (min: 31.0, avg: 98.6, max: 287.0) [2024-06-15 13:44:20,738][1648984] Avg episode reward: [(0, '-0.200')] [2024-06-15 13:44:24,771][1652475] Updated weights for policy 0, policy_version 169204 (0.0012) [2024-06-15 13:44:25,738][1648984] Fps is (10 sec: 42597.6, 60 sec: 43690.5, 300 sec: 43875.8). Total num frames: 346554368. Throughput: 0: 10843.0. Samples: 86694912. Policy #0 lag: (min: 31.0, avg: 98.6, max: 287.0) [2024-06-15 13:44:25,739][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 13:44:29,210][1652475] Updated weights for policy 0, policy_version 169238 (0.0014) [2024-06-15 13:44:30,738][1648984] Fps is (10 sec: 29490.5, 60 sec: 43144.4, 300 sec: 43431.5). Total num frames: 346718208. Throughput: 0: 11059.1. Samples: 86757888. Policy #0 lag: (min: 31.0, avg: 98.6, max: 287.0) [2024-06-15 13:44:30,739][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 13:44:30,932][1652475] Updated weights for policy 0, policy_version 169315 (0.0014) [2024-06-15 13:44:33,861][1652475] Updated weights for policy 0, policy_version 169377 (0.0014) [2024-06-15 13:44:35,603][1652475] Updated weights for policy 0, policy_version 169411 (0.0012) [2024-06-15 13:44:35,738][1648984] Fps is (10 sec: 39321.3, 60 sec: 41505.8, 300 sec: 43764.7). Total num frames: 346947584. Throughput: 0: 10945.4. Samples: 86791168. Policy #0 lag: (min: 31.0, avg: 98.6, max: 287.0) [2024-06-15 13:44:35,739][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 13:44:40,738][1648984] Fps is (10 sec: 39322.9, 60 sec: 43144.5, 300 sec: 43209.3). Total num frames: 347111424. Throughput: 0: 10990.9. Samples: 86858240. 
Policy #0 lag: (min: 31.0, avg: 98.6, max: 287.0) [2024-06-15 13:44:40,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 13:44:41,059][1652475] Updated weights for policy 0, policy_version 169508 (0.0016) [2024-06-15 13:44:42,525][1652475] Updated weights for policy 0, policy_version 169571 (0.0014) [2024-06-15 13:44:45,222][1652475] Updated weights for policy 0, policy_version 169619 (0.0012) [2024-06-15 13:44:45,738][1648984] Fps is (10 sec: 45873.8, 60 sec: 42598.0, 300 sec: 43764.6). Total num frames: 347406336. Throughput: 0: 10842.9. Samples: 86924800. Policy #0 lag: (min: 31.0, avg: 98.6, max: 287.0) [2024-06-15 13:44:45,739][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 13:44:46,154][1652475] Updated weights for policy 0, policy_version 169657 (0.0012) [2024-06-15 13:44:47,274][1651340] Signal inference workers to stop experience collection... (8750 times) [2024-06-15 13:44:47,336][1652475] Updated weights for policy 0, policy_version 169685 (0.0013) [2024-06-15 13:44:47,366][1652475] InferenceWorker_p0-w0: stopping experience collection (8750 times) [2024-06-15 13:44:47,512][1651340] Signal inference workers to resume experience collection... (8750 times) [2024-06-15 13:44:47,513][1652475] InferenceWorker_p0-w0: resuming experience collection (8750 times) [2024-06-15 13:44:48,157][1652475] Updated weights for policy 0, policy_version 169727 (0.0016) [2024-06-15 13:44:50,738][1648984] Fps is (10 sec: 49152.2, 60 sec: 43690.8, 300 sec: 43653.7). Total num frames: 347602944. Throughput: 0: 10877.2. Samples: 86963200. Policy #0 lag: (min: 39.0, avg: 156.9, max: 295.0) [2024-06-15 13:44:50,741][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 13:44:52,357][1652475] Updated weights for policy 0, policy_version 169792 (0.0014) [2024-06-15 13:44:53,360][1652475] Updated weights for policy 0, policy_version 169840 (0.0013) [2024-06-15 13:44:55,738][1648984] Fps is (10 sec: 49155.0, 60 sec: 44783.0, 300 sec: 43653.7). 
Total num frames: 347897856. Throughput: 0: 11127.5. Samples: 87035392. Policy #0 lag: (min: 39.0, avg: 156.9, max: 295.0) [2024-06-15 13:44:55,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 13:44:56,215][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000169888_347930624.pth... [2024-06-15 13:44:56,423][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000164720_337346560.pth [2024-06-15 13:44:56,933][1652475] Updated weights for policy 0, policy_version 169916 (0.0040) [2024-06-15 13:44:59,295][1652475] Updated weights for policy 0, policy_version 169984 (0.0022) [2024-06-15 13:45:00,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 44783.0, 300 sec: 43986.9). Total num frames: 348127232. Throughput: 0: 10934.1. Samples: 87098368. Policy #0 lag: (min: 39.0, avg: 156.9, max: 295.0) [2024-06-15 13:45:00,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 13:45:03,737][1652475] Updated weights for policy 0, policy_version 170037 (0.0014) [2024-06-15 13:45:05,187][1652475] Updated weights for policy 0, policy_version 170107 (0.0013) [2024-06-15 13:45:05,738][1648984] Fps is (10 sec: 49152.1, 60 sec: 45875.2, 300 sec: 44209.0). Total num frames: 348389376. Throughput: 0: 11264.0. Samples: 87138304. Policy #0 lag: (min: 39.0, avg: 156.9, max: 295.0) [2024-06-15 13:45:05,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 13:45:09,122][1652475] Updated weights for policy 0, policy_version 170160 (0.0094) [2024-06-15 13:45:10,738][1648984] Fps is (10 sec: 45874.8, 60 sec: 44782.9, 300 sec: 44209.0). Total num frames: 348585984. Throughput: 0: 11195.8. Samples: 87198720. 
Policy #0 lag: (min: 39.0, avg: 156.9, max: 295.0) [2024-06-15 13:45:10,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 13:45:14,846][1652475] Updated weights for policy 0, policy_version 170241 (0.0142) [2024-06-15 13:45:15,738][1648984] Fps is (10 sec: 32768.1, 60 sec: 43144.6, 300 sec: 43764.7). Total num frames: 348717056. Throughput: 0: 11377.9. Samples: 87269888. Policy #0 lag: (min: 39.0, avg: 156.9, max: 295.0) [2024-06-15 13:45:15,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 13:45:16,090][1652475] Updated weights for policy 0, policy_version 170294 (0.0015) [2024-06-15 13:45:20,118][1652475] Updated weights for policy 0, policy_version 170384 (0.0015) [2024-06-15 13:45:20,738][1648984] Fps is (10 sec: 39322.0, 60 sec: 42598.5, 300 sec: 43764.8). Total num frames: 348979200. Throughput: 0: 11218.6. Samples: 87296000. Policy #0 lag: (min: 39.0, avg: 156.9, max: 295.0) [2024-06-15 13:45:20,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 13:45:21,885][1652475] Updated weights for policy 0, policy_version 170435 (0.0013) [2024-06-15 13:45:23,263][1652475] Updated weights for policy 0, policy_version 170486 (0.0012) [2024-06-15 13:45:25,738][1648984] Fps is (10 sec: 45875.1, 60 sec: 43690.9, 300 sec: 43986.9). Total num frames: 349175808. Throughput: 0: 11286.8. Samples: 87366144. Policy #0 lag: (min: 39.0, avg: 156.9, max: 295.0) [2024-06-15 13:45:25,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 13:45:27,206][1652475] Updated weights for policy 0, policy_version 170545 (0.0037) [2024-06-15 13:45:30,738][1648984] Fps is (10 sec: 45874.7, 60 sec: 45329.2, 300 sec: 43986.9). Total num frames: 349437952. Throughput: 0: 11309.7. Samples: 87433728. 
Policy #0 lag: (min: 39.0, avg: 156.9, max: 295.0) [2024-06-15 13:45:30,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 13:45:31,480][1652475] Updated weights for policy 0, policy_version 170625 (0.0013) [2024-06-15 13:45:31,778][1651340] Signal inference workers to stop experience collection... (8800 times) [2024-06-15 13:45:31,828][1652475] InferenceWorker_p0-w0: stopping experience collection (8800 times) [2024-06-15 13:45:32,032][1651340] Signal inference workers to resume experience collection... (8800 times) [2024-06-15 13:45:32,033][1652475] InferenceWorker_p0-w0: resuming experience collection (8800 times) [2024-06-15 13:45:32,754][1652475] Updated weights for policy 0, policy_version 170687 (0.0014) [2024-06-15 13:45:34,761][1652475] Updated weights for policy 0, policy_version 170752 (0.0015) [2024-06-15 13:45:35,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 45875.5, 300 sec: 44209.0). Total num frames: 349700096. Throughput: 0: 11195.7. Samples: 87467008. Policy #0 lag: (min: 39.0, avg: 156.9, max: 295.0) [2024-06-15 13:45:35,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 13:45:39,836][1652475] Updated weights for policy 0, policy_version 170819 (0.0014) [2024-06-15 13:45:40,738][1648984] Fps is (10 sec: 49152.3, 60 sec: 46967.4, 300 sec: 44320.1). Total num frames: 349929472. Throughput: 0: 11104.7. Samples: 87535104. Policy #0 lag: (min: 39.0, avg: 156.9, max: 295.0) [2024-06-15 13:45:40,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 13:45:43,274][1652475] Updated weights for policy 0, policy_version 170881 (0.0013) [2024-06-15 13:45:45,162][1652475] Updated weights for policy 0, policy_version 170948 (0.0014) [2024-06-15 13:45:45,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 45329.6, 300 sec: 44098.0). Total num frames: 350126080. Throughput: 0: 11138.8. Samples: 87599616. 
Policy #0 lag: (min: 39.0, avg: 156.9, max: 295.0) [2024-06-15 13:45:45,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 13:45:46,426][1652475] Updated weights for policy 0, policy_version 171008 (0.0013) [2024-06-15 13:45:50,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 45329.0, 300 sec: 44209.0). Total num frames: 350322688. Throughput: 0: 11093.3. Samples: 87637504. Policy #0 lag: (min: 47.0, avg: 135.1, max: 303.0) [2024-06-15 13:45:50,738][1648984] Avg episode reward: [(0, '-0.520')] [2024-06-15 13:45:50,808][1652475] Updated weights for policy 0, policy_version 171060 (0.0115) [2024-06-15 13:45:52,419][1652475] Updated weights for policy 0, policy_version 171136 (0.0133) [2024-06-15 13:45:55,243][1652475] Updated weights for policy 0, policy_version 171200 (0.0014) [2024-06-15 13:45:55,738][1648984] Fps is (10 sec: 49151.1, 60 sec: 45328.9, 300 sec: 44209.0). Total num frames: 350617600. Throughput: 0: 11195.7. Samples: 87702528. Policy #0 lag: (min: 47.0, avg: 135.1, max: 303.0) [2024-06-15 13:45:55,739][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 13:46:00,738][1648984] Fps is (10 sec: 42598.6, 60 sec: 43690.7, 300 sec: 43986.9). Total num frames: 350748672. Throughput: 0: 11218.5. Samples: 87774720. Policy #0 lag: (min: 47.0, avg: 135.1, max: 303.0) [2024-06-15 13:46:00,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 13:46:00,822][1652475] Updated weights for policy 0, policy_version 171265 (0.0015) [2024-06-15 13:46:04,051][1652475] Updated weights for policy 0, policy_version 171351 (0.0014) [2024-06-15 13:46:05,211][1652475] Updated weights for policy 0, policy_version 171408 (0.0028) [2024-06-15 13:46:05,738][1648984] Fps is (10 sec: 45875.8, 60 sec: 44782.9, 300 sec: 44431.2). Total num frames: 351076352. Throughput: 0: 11332.2. Samples: 87805952. 
Policy #0 lag: (min: 47.0, avg: 135.1, max: 303.0) [2024-06-15 13:46:05,738][1648984] Avg episode reward: [(0, '-0.520')] [2024-06-15 13:46:09,744][1652475] Updated weights for policy 0, policy_version 171473 (0.0026) [2024-06-15 13:46:10,738][1648984] Fps is (10 sec: 49151.9, 60 sec: 44236.8, 300 sec: 44320.1). Total num frames: 351240192. Throughput: 0: 11377.8. Samples: 87878144. Policy #0 lag: (min: 47.0, avg: 135.1, max: 303.0) [2024-06-15 13:46:10,738][1648984] Avg episode reward: [(0, '-0.520')] [2024-06-15 13:46:12,064][1652475] Updated weights for policy 0, policy_version 171522 (0.0012) [2024-06-15 13:46:13,545][1652475] Updated weights for policy 0, policy_version 171583 (0.0013) [2024-06-15 13:46:15,738][1648984] Fps is (10 sec: 42598.5, 60 sec: 46421.3, 300 sec: 44320.1). Total num frames: 351502336. Throughput: 0: 11298.1. Samples: 87942144. Policy #0 lag: (min: 47.0, avg: 135.1, max: 303.0) [2024-06-15 13:46:15,740][1648984] Avg episode reward: [(0, '-0.520')] [2024-06-15 13:46:15,875][1651340] Signal inference workers to stop experience collection... (8850 times) [2024-06-15 13:46:15,945][1652475] InferenceWorker_p0-w0: stopping experience collection (8850 times) [2024-06-15 13:46:15,947][1652475] Updated weights for policy 0, policy_version 171635 (0.0013) [2024-06-15 13:46:16,097][1651340] Signal inference workers to resume experience collection... (8850 times) [2024-06-15 13:46:16,098][1652475] InferenceWorker_p0-w0: resuming experience collection (8850 times) [2024-06-15 13:46:17,399][1652475] Updated weights for policy 0, policy_version 171712 (0.0147) [2024-06-15 13:46:20,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 44782.9, 300 sec: 43988.7). Total num frames: 351666176. Throughput: 0: 11241.2. Samples: 87972864. 
Policy #0 lag: (min: 47.0, avg: 135.1, max: 303.0) [2024-06-15 13:46:20,738][1648984] Avg episode reward: [(0, '-0.520')] [2024-06-15 13:46:22,615][1652475] Updated weights for policy 0, policy_version 171775 (0.0014) [2024-06-15 13:46:24,298][1652475] Updated weights for policy 0, policy_version 171814 (0.0011) [2024-06-15 13:46:25,738][1648984] Fps is (10 sec: 42598.7, 60 sec: 45875.2, 300 sec: 44431.2). Total num frames: 351928320. Throughput: 0: 11377.8. Samples: 88047104. Policy #0 lag: (min: 47.0, avg: 135.1, max: 303.0) [2024-06-15 13:46:25,738][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 13:46:27,403][1652475] Updated weights for policy 0, policy_version 171901 (0.0016) [2024-06-15 13:46:28,521][1652475] Updated weights for policy 0, policy_version 171954 (0.0093) [2024-06-15 13:46:30,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 45875.2, 300 sec: 44431.2). Total num frames: 352190464. Throughput: 0: 11286.7. Samples: 88107520. Policy #0 lag: (min: 47.0, avg: 135.1, max: 303.0) [2024-06-15 13:46:30,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 13:46:32,856][1652475] Updated weights for policy 0, policy_version 171985 (0.0012) [2024-06-15 13:46:35,738][1648984] Fps is (10 sec: 39321.2, 60 sec: 43690.6, 300 sec: 44431.2). Total num frames: 352321536. Throughput: 0: 11173.0. Samples: 88140288. Policy #0 lag: (min: 47.0, avg: 135.1, max: 303.0) [2024-06-15 13:46:35,739][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 13:46:38,404][1652475] Updated weights for policy 0, policy_version 172080 (0.0014) [2024-06-15 13:46:40,254][1652475] Updated weights for policy 0, policy_version 172154 (0.0021) [2024-06-15 13:46:40,738][1648984] Fps is (10 sec: 39321.3, 60 sec: 44236.7, 300 sec: 44431.2). Total num frames: 352583680. Throughput: 0: 11036.5. Samples: 88199168. 
Policy #0 lag: (min: 47.0, avg: 135.1, max: 303.0) [2024-06-15 13:46:40,738][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 13:46:42,812][1652475] Updated weights for policy 0, policy_version 172195 (0.0012) [2024-06-15 13:46:45,345][1652475] Updated weights for policy 0, policy_version 172256 (0.0016) [2024-06-15 13:46:45,738][1648984] Fps is (10 sec: 49151.6, 60 sec: 44782.8, 300 sec: 44542.3). Total num frames: 352813056. Throughput: 0: 10740.6. Samples: 88258048. Policy #0 lag: (min: 47.0, avg: 135.1, max: 303.0) [2024-06-15 13:46:45,739][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 13:46:49,391][1652475] Updated weights for policy 0, policy_version 172289 (0.0011) [2024-06-15 13:46:50,380][1652475] Updated weights for policy 0, policy_version 172344 (0.0012) [2024-06-15 13:46:50,738][1648984] Fps is (10 sec: 39322.2, 60 sec: 44236.9, 300 sec: 44209.0). Total num frames: 352976896. Throughput: 0: 10865.8. Samples: 88294912. Policy #0 lag: (min: 47.0, avg: 135.1, max: 303.0) [2024-06-15 13:46:50,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 13:46:51,879][1652475] Updated weights for policy 0, policy_version 172411 (0.0090) [2024-06-15 13:46:55,645][1652475] Updated weights for policy 0, policy_version 172472 (0.0124) [2024-06-15 13:46:55,738][1648984] Fps is (10 sec: 39321.1, 60 sec: 43144.5, 300 sec: 44320.1). Total num frames: 353206272. Throughput: 0: 10877.1. Samples: 88367616. Policy #0 lag: (min: 47.0, avg: 135.1, max: 303.0) [2024-06-15 13:46:55,739][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 13:46:55,813][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000172480_353239040.pth... 
[2024-06-15 13:46:55,929][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000167296_342622208.pth [2024-06-15 13:46:57,459][1652475] Updated weights for policy 0, policy_version 172540 (0.0015) [2024-06-15 13:47:00,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 43690.7, 300 sec: 44431.2). Total num frames: 353370112. Throughput: 0: 10752.0. Samples: 88425984. Policy #0 lag: (min: 63.0, avg: 174.0, max: 319.0) [2024-06-15 13:47:00,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 13:47:02,997][1652475] Updated weights for policy 0, policy_version 172608 (0.0013) [2024-06-15 13:47:03,081][1651340] Signal inference workers to stop experience collection... (8900 times) [2024-06-15 13:47:03,135][1652475] InferenceWorker_p0-w0: stopping experience collection (8900 times) [2024-06-15 13:47:03,274][1651340] Signal inference workers to resume experience collection... (8900 times) [2024-06-15 13:47:03,275][1652475] InferenceWorker_p0-w0: resuming experience collection (8900 times) [2024-06-15 13:47:04,157][1652475] Updated weights for policy 0, policy_version 172668 (0.0127) [2024-06-15 13:47:05,738][1648984] Fps is (10 sec: 42599.2, 60 sec: 42598.4, 300 sec: 44431.6). Total num frames: 353632256. Throughput: 0: 10831.6. Samples: 88460288. Policy #0 lag: (min: 63.0, avg: 174.0, max: 319.0) [2024-06-15 13:47:05,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 13:47:08,491][1652475] Updated weights for policy 0, policy_version 172752 (0.0013) [2024-06-15 13:47:09,802][1652475] Updated weights for policy 0, policy_version 172796 (0.0027) [2024-06-15 13:47:10,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 44236.8, 300 sec: 44431.2). Total num frames: 353894400. Throughput: 0: 10547.2. Samples: 88521728. 
Policy #0 lag: (min: 63.0, avg: 174.0, max: 319.0) [2024-06-15 13:47:10,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 13:47:14,577][1652475] Updated weights for policy 0, policy_version 172850 (0.0013) [2024-06-15 13:47:15,738][1648984] Fps is (10 sec: 45875.7, 60 sec: 43144.6, 300 sec: 44431.2). Total num frames: 354091008. Throughput: 0: 10763.4. Samples: 88591872. Policy #0 lag: (min: 63.0, avg: 174.0, max: 319.0) [2024-06-15 13:47:15,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 13:47:16,090][1652475] Updated weights for policy 0, policy_version 172922 (0.0012) [2024-06-15 13:47:19,783][1652475] Updated weights for policy 0, policy_version 172991 (0.0014) [2024-06-15 13:47:20,738][1648984] Fps is (10 sec: 42598.2, 60 sec: 44236.8, 300 sec: 44320.1). Total num frames: 354320384. Throughput: 0: 10797.5. Samples: 88626176. Policy #0 lag: (min: 63.0, avg: 174.0, max: 319.0) [2024-06-15 13:47:20,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 13:47:21,616][1652475] Updated weights for policy 0, policy_version 173046 (0.0083) [2024-06-15 13:47:25,738][1648984] Fps is (10 sec: 36043.9, 60 sec: 42052.1, 300 sec: 44209.0). Total num frames: 354451456. Throughput: 0: 10899.9. Samples: 88689664. Policy #0 lag: (min: 63.0, avg: 174.0, max: 319.0) [2024-06-15 13:47:25,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 13:47:26,260][1652475] Updated weights for policy 0, policy_version 173104 (0.0012) [2024-06-15 13:47:27,162][1652475] Updated weights for policy 0, policy_version 173152 (0.0020) [2024-06-15 13:47:30,738][1648984] Fps is (10 sec: 36044.7, 60 sec: 41506.1, 300 sec: 44431.2). Total num frames: 354680832. Throughput: 0: 11184.4. Samples: 88761344. 
Policy #0 lag: (min: 63.0, avg: 174.0, max: 319.0) [2024-06-15 13:47:30,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 13:47:31,723][1652475] Updated weights for policy 0, policy_version 173217 (0.0014) [2024-06-15 13:47:33,648][1652475] Updated weights for policy 0, policy_version 173301 (0.0083) [2024-06-15 13:47:35,738][1648984] Fps is (10 sec: 49152.9, 60 sec: 43690.7, 300 sec: 44431.2). Total num frames: 354942976. Throughput: 0: 10911.3. Samples: 88785920. Policy #0 lag: (min: 63.0, avg: 174.0, max: 319.0) [2024-06-15 13:47:35,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 13:47:37,419][1652475] Updated weights for policy 0, policy_version 173344 (0.0013) [2024-06-15 13:47:39,103][1652475] Updated weights for policy 0, policy_version 173424 (0.0012) [2024-06-15 13:47:40,746][1648984] Fps is (10 sec: 52384.5, 60 sec: 43684.6, 300 sec: 44429.9). Total num frames: 355205120. Throughput: 0: 10841.0. Samples: 88855552. Policy #0 lag: (min: 63.0, avg: 174.0, max: 319.0) [2024-06-15 13:47:40,747][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 13:47:42,944][1652475] Updated weights for policy 0, policy_version 173457 (0.0016) [2024-06-15 13:47:44,073][1652475] Updated weights for policy 0, policy_version 173506 (0.0021) [2024-06-15 13:47:45,608][1652475] Updated weights for policy 0, policy_version 173568 (0.0077) [2024-06-15 13:47:45,738][1648984] Fps is (10 sec: 52429.1, 60 sec: 44236.9, 300 sec: 44431.2). Total num frames: 355467264. Throughput: 0: 10990.9. Samples: 88920576. Policy #0 lag: (min: 63.0, avg: 174.0, max: 319.0) [2024-06-15 13:47:45,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 13:47:48,588][1651340] Signal inference workers to stop experience collection... (8950 times) [2024-06-15 13:47:48,628][1652475] InferenceWorker_p0-w0: stopping experience collection (8950 times) [2024-06-15 13:47:48,769][1651340] Signal inference workers to resume experience collection... 
(8950 times) [2024-06-15 13:47:48,771][1652475] InferenceWorker_p0-w0: resuming experience collection (8950 times) [2024-06-15 13:47:49,820][1652475] Updated weights for policy 0, policy_version 173637 (0.0014) [2024-06-15 13:47:50,738][1648984] Fps is (10 sec: 49193.6, 60 sec: 45329.0, 300 sec: 44542.3). Total num frames: 355696640. Throughput: 0: 11127.5. Samples: 88961024. Policy #0 lag: (min: 63.0, avg: 174.0, max: 319.0) [2024-06-15 13:47:50,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 13:47:50,754][1652475] Updated weights for policy 0, policy_version 173694 (0.0040) [2024-06-15 13:47:54,881][1652475] Updated weights for policy 0, policy_version 173744 (0.0037) [2024-06-15 13:47:55,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 44237.0, 300 sec: 44320.1). Total num frames: 355860480. Throughput: 0: 11320.9. Samples: 89031168. Policy #0 lag: (min: 63.0, avg: 174.0, max: 319.0) [2024-06-15 13:47:55,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 13:47:56,657][1652475] Updated weights for policy 0, policy_version 173808 (0.0038) [2024-06-15 13:48:00,704][1652475] Updated weights for policy 0, policy_version 173844 (0.0012) [2024-06-15 13:48:00,738][1648984] Fps is (10 sec: 32768.0, 60 sec: 44236.8, 300 sec: 44320.1). Total num frames: 356024320. Throughput: 0: 11286.7. Samples: 89099776. Policy #0 lag: (min: 63.0, avg: 174.0, max: 319.0) [2024-06-15 13:48:00,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 13:48:02,404][1652475] Updated weights for policy 0, policy_version 173921 (0.0089) [2024-06-15 13:48:05,475][1652475] Updated weights for policy 0, policy_version 173968 (0.0013) [2024-06-15 13:48:05,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 44236.9, 300 sec: 44431.2). Total num frames: 356286464. Throughput: 0: 11150.2. Samples: 89127936. 
Policy #0 lag: (min: 47.0, avg: 137.3, max: 303.0) [2024-06-15 13:48:05,738][1648984] Avg episode reward: [(0, '-0.250')] [2024-06-15 13:48:07,453][1652475] Updated weights for policy 0, policy_version 174048 (0.0109) [2024-06-15 13:48:10,738][1648984] Fps is (10 sec: 49152.3, 60 sec: 43690.7, 300 sec: 44431.2). Total num frames: 356515840. Throughput: 0: 11116.1. Samples: 89189888. Policy #0 lag: (min: 47.0, avg: 137.3, max: 303.0) [2024-06-15 13:48:10,738][1648984] Avg episode reward: [(0, '-0.580')] [2024-06-15 13:48:12,635][1652475] Updated weights for policy 0, policy_version 174128 (0.0095) [2024-06-15 13:48:15,738][1648984] Fps is (10 sec: 36044.9, 60 sec: 42598.4, 300 sec: 43986.9). Total num frames: 356646912. Throughput: 0: 10979.6. Samples: 89255424. Policy #0 lag: (min: 47.0, avg: 137.3, max: 303.0) [2024-06-15 13:48:15,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 13:48:16,995][1652475] Updated weights for policy 0, policy_version 174178 (0.0015) [2024-06-15 13:48:18,200][1652475] Updated weights for policy 0, policy_version 174226 (0.0012) [2024-06-15 13:48:20,650][1652475] Updated weights for policy 0, policy_version 174288 (0.0024) [2024-06-15 13:48:20,738][1648984] Fps is (10 sec: 42598.2, 60 sec: 43690.7, 300 sec: 44097.9). Total num frames: 356941824. Throughput: 0: 11036.4. Samples: 89282560. Policy #0 lag: (min: 47.0, avg: 137.3, max: 303.0) [2024-06-15 13:48:20,738][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 13:48:21,429][1652475] Updated weights for policy 0, policy_version 174332 (0.0048) [2024-06-15 13:48:24,582][1652475] Updated weights for policy 0, policy_version 174400 (0.0014) [2024-06-15 13:48:25,738][1648984] Fps is (10 sec: 52428.2, 60 sec: 45329.2, 300 sec: 44209.0). Total num frames: 357171200. Throughput: 0: 10890.6. Samples: 89345536. 
Policy #0 lag: (min: 47.0, avg: 137.3, max: 303.0) [2024-06-15 13:48:25,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 13:48:29,512][1652475] Updated weights for policy 0, policy_version 174464 (0.0026) [2024-06-15 13:48:30,471][1651340] Signal inference workers to stop experience collection... (9000 times) [2024-06-15 13:48:30,541][1652475] InferenceWorker_p0-w0: stopping experience collection (9000 times) [2024-06-15 13:48:30,738][1648984] Fps is (10 sec: 42598.8, 60 sec: 44783.0, 300 sec: 43764.7). Total num frames: 357367808. Throughput: 0: 10945.4. Samples: 89413120. Policy #0 lag: (min: 47.0, avg: 137.3, max: 303.0) [2024-06-15 13:48:30,738][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 13:48:30,764][1651340] Signal inference workers to resume experience collection... (9000 times) [2024-06-15 13:48:30,765][1652475] InferenceWorker_p0-w0: resuming experience collection (9000 times) [2024-06-15 13:48:31,001][1652475] Updated weights for policy 0, policy_version 174520 (0.0012) [2024-06-15 13:48:33,864][1652475] Updated weights for policy 0, policy_version 174591 (0.0014) [2024-06-15 13:48:35,738][1648984] Fps is (10 sec: 42598.8, 60 sec: 44236.8, 300 sec: 44320.1). Total num frames: 357597184. Throughput: 0: 10808.9. Samples: 89447424. Policy #0 lag: (min: 47.0, avg: 137.3, max: 303.0) [2024-06-15 13:48:35,738][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 13:48:40,227][1652475] Updated weights for policy 0, policy_version 174659 (0.0014) [2024-06-15 13:48:40,738][1648984] Fps is (10 sec: 36043.9, 60 sec: 42058.1, 300 sec: 43653.6). Total num frames: 357728256. Throughput: 0: 10706.4. Samples: 89512960. 
Policy #0 lag: (min: 47.0, avg: 137.3, max: 303.0) [2024-06-15 13:48:40,739][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 13:48:42,381][1652475] Updated weights for policy 0, policy_version 174737 (0.0013) [2024-06-15 13:48:45,147][1652475] Updated weights for policy 0, policy_version 174790 (0.0015) [2024-06-15 13:48:45,738][1648984] Fps is (10 sec: 42598.1, 60 sec: 42598.3, 300 sec: 44209.0). Total num frames: 358023168. Throughput: 0: 10444.8. Samples: 89569792. Policy #0 lag: (min: 47.0, avg: 137.3, max: 303.0) [2024-06-15 13:48:45,739][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 13:48:49,954][1652475] Updated weights for policy 0, policy_version 174880 (0.0012) [2024-06-15 13:48:50,746][1648984] Fps is (10 sec: 49111.3, 60 sec: 42046.3, 300 sec: 44096.7). Total num frames: 358219776. Throughput: 0: 10499.7. Samples: 89600512. Policy #0 lag: (min: 47.0, avg: 137.3, max: 303.0) [2024-06-15 13:48:50,747][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 13:48:54,208][1652475] Updated weights for policy 0, policy_version 174966 (0.0014) [2024-06-15 13:48:55,740][1648984] Fps is (10 sec: 45874.9, 60 sec: 43690.6, 300 sec: 44209.0). Total num frames: 358481920. Throughput: 0: 10478.9. Samples: 89661440. Policy #0 lag: (min: 47.0, avg: 137.3, max: 303.0) [2024-06-15 13:48:55,741][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 13:48:55,747][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000175040_358481920.pth... [2024-06-15 13:48:55,816][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000169888_347930624.pth [2024-06-15 13:48:57,916][1652475] Updated weights for policy 0, policy_version 175058 (0.0014) [2024-06-15 13:48:58,896][1652475] Updated weights for policy 0, policy_version 175104 (0.0023) [2024-06-15 13:49:00,738][1648984] Fps is (10 sec: 39354.2, 60 sec: 43144.4, 300 sec: 43986.8). Total num frames: 358612992. Throughput: 0: 10296.8. 
Samples: 89718784. Policy #0 lag: (min: 47.0, avg: 137.3, max: 303.0) [2024-06-15 13:49:00,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 13:49:04,049][1652475] Updated weights for policy 0, policy_version 175158 (0.0013) [2024-06-15 13:49:05,268][1652475] Updated weights for policy 0, policy_version 175205 (0.0013) [2024-06-15 13:49:05,738][1648984] Fps is (10 sec: 39321.9, 60 sec: 43144.5, 300 sec: 43986.9). Total num frames: 358875136. Throughput: 0: 10604.1. Samples: 89759744. Policy #0 lag: (min: 47.0, avg: 137.3, max: 303.0) [2024-06-15 13:49:05,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 13:49:08,605][1652475] Updated weights for policy 0, policy_version 175270 (0.0018) [2024-06-15 13:49:09,974][1652475] Updated weights for policy 0, policy_version 175334 (0.0014) [2024-06-15 13:49:10,738][1648984] Fps is (10 sec: 52429.9, 60 sec: 43690.7, 300 sec: 44098.0). Total num frames: 359137280. Throughput: 0: 10615.5. Samples: 89823232. Policy #0 lag: (min: 15.0, avg: 130.3, max: 319.0) [2024-06-15 13:49:10,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 13:49:13,665][1652475] Updated weights for policy 0, policy_version 175376 (0.0013) [2024-06-15 13:49:15,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 43690.6, 300 sec: 43542.6). Total num frames: 359268352. Throughput: 0: 10717.8. Samples: 89895424. Policy #0 lag: (min: 15.0, avg: 130.3, max: 319.0) [2024-06-15 13:49:15,740][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 13:49:16,536][1652475] Updated weights for policy 0, policy_version 175440 (0.0015) [2024-06-15 13:49:18,835][1651340] Signal inference workers to stop experience collection... (9050 times) [2024-06-15 13:49:18,885][1652475] Updated weights for policy 0, policy_version 175506 (0.0097) [2024-06-15 13:49:18,944][1652475] InferenceWorker_p0-w0: stopping experience collection (9050 times) [2024-06-15 13:49:19,036][1651340] Signal inference workers to resume experience collection... 
(9050 times) [2024-06-15 13:49:19,038][1652475] InferenceWorker_p0-w0: resuming experience collection (9050 times) [2024-06-15 13:49:19,568][1652475] Updated weights for policy 0, policy_version 175549 (0.0012) [2024-06-15 13:49:20,738][1648984] Fps is (10 sec: 45875.1, 60 sec: 44236.8, 300 sec: 44209.1). Total num frames: 359596032. Throughput: 0: 10706.5. Samples: 89929216. Policy #0 lag: (min: 15.0, avg: 130.3, max: 319.0) [2024-06-15 13:49:20,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 13:49:24,980][1652475] Updated weights for policy 0, policy_version 175618 (0.0035) [2024-06-15 13:49:25,738][1648984] Fps is (10 sec: 45875.4, 60 sec: 42598.5, 300 sec: 44098.0). Total num frames: 359727104. Throughput: 0: 10774.8. Samples: 89997824. Policy #0 lag: (min: 15.0, avg: 130.3, max: 319.0) [2024-06-15 13:49:25,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 13:49:26,378][1652475] Updated weights for policy 0, policy_version 175672 (0.0031) [2024-06-15 13:49:28,716][1652475] Updated weights for policy 0, policy_version 175739 (0.0013) [2024-06-15 13:49:30,738][1648984] Fps is (10 sec: 42598.5, 60 sec: 44236.8, 300 sec: 44320.2). Total num frames: 360022016. Throughput: 0: 11025.1. Samples: 90065920. Policy #0 lag: (min: 15.0, avg: 130.3, max: 319.0) [2024-06-15 13:49:30,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 13:49:30,832][1652475] Updated weights for policy 0, policy_version 175808 (0.0012) [2024-06-15 13:49:33,041][1652475] Updated weights for policy 0, policy_version 175869 (0.0014) [2024-06-15 13:49:35,738][1648984] Fps is (10 sec: 45875.0, 60 sec: 43144.5, 300 sec: 44320.1). Total num frames: 360185856. Throughput: 0: 11038.5. Samples: 90097152. 
Policy #0 lag: (min: 15.0, avg: 130.3, max: 319.0) [2024-06-15 13:49:35,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 13:49:37,928][1652475] Updated weights for policy 0, policy_version 175933 (0.0013) [2024-06-15 13:49:40,365][1652475] Updated weights for policy 0, policy_version 175990 (0.0016) [2024-06-15 13:49:40,738][1648984] Fps is (10 sec: 42598.5, 60 sec: 45329.2, 300 sec: 44209.1). Total num frames: 360448000. Throughput: 0: 11400.6. Samples: 90174464. Policy #0 lag: (min: 15.0, avg: 130.3, max: 319.0) [2024-06-15 13:49:40,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 13:49:41,812][1652475] Updated weights for policy 0, policy_version 176048 (0.0011) [2024-06-15 13:49:44,013][1652475] Updated weights for policy 0, policy_version 176122 (0.0056) [2024-06-15 13:49:45,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 44783.0, 300 sec: 44431.2). Total num frames: 360710144. Throughput: 0: 11559.9. Samples: 90238976. Policy #0 lag: (min: 15.0, avg: 130.3, max: 319.0) [2024-06-15 13:49:45,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 13:49:48,850][1652475] Updated weights for policy 0, policy_version 176185 (0.0121) [2024-06-15 13:49:50,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 43696.9, 300 sec: 43875.8). Total num frames: 360841216. Throughput: 0: 11423.3. Samples: 90273792. Policy #0 lag: (min: 15.0, avg: 130.3, max: 319.0) [2024-06-15 13:49:50,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 13:49:51,700][1652475] Updated weights for policy 0, policy_version 176248 (0.0012) [2024-06-15 13:49:54,158][1652475] Updated weights for policy 0, policy_version 176312 (0.0126) [2024-06-15 13:49:55,738][1648984] Fps is (10 sec: 45874.6, 60 sec: 44782.9, 300 sec: 44209.0). Total num frames: 361168896. Throughput: 0: 11446.0. Samples: 90338304. 
Policy #0 lag: (min: 15.0, avg: 130.3, max: 319.0) [2024-06-15 13:49:55,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 13:49:56,140][1652475] Updated weights for policy 0, policy_version 176378 (0.0104) [2024-06-15 13:49:59,656][1652475] Updated weights for policy 0, policy_version 176416 (0.0012) [2024-06-15 13:50:00,227][1652475] Updated weights for policy 0, policy_version 176448 (0.0032) [2024-06-15 13:50:00,738][1648984] Fps is (10 sec: 52428.1, 60 sec: 45875.3, 300 sec: 43986.9). Total num frames: 361365504. Throughput: 0: 11411.9. Samples: 90408960. Policy #0 lag: (min: 15.0, avg: 130.3, max: 319.0) [2024-06-15 13:50:00,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 13:50:04,797][1652475] Updated weights for policy 0, policy_version 176516 (0.0109) [2024-06-15 13:50:05,738][1648984] Fps is (10 sec: 42599.1, 60 sec: 45329.1, 300 sec: 44098.0). Total num frames: 361594880. Throughput: 0: 11400.5. Samples: 90442240. Policy #0 lag: (min: 15.0, avg: 130.3, max: 319.0) [2024-06-15 13:50:05,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 13:50:06,603][1651340] Signal inference workers to stop experience collection... (9100 times) [2024-06-15 13:50:06,674][1652475] InferenceWorker_p0-w0: stopping experience collection (9100 times) [2024-06-15 13:50:06,800][1651340] Signal inference workers to resume experience collection... (9100 times) [2024-06-15 13:50:06,801][1652475] InferenceWorker_p0-w0: resuming experience collection (9100 times) [2024-06-15 13:50:07,319][1652475] Updated weights for policy 0, policy_version 176610 (0.0013) [2024-06-15 13:50:10,738][1648984] Fps is (10 sec: 42596.4, 60 sec: 44236.4, 300 sec: 44320.0). Total num frames: 361791488. Throughput: 0: 11389.0. Samples: 90510336. 
Policy #0 lag: (min: 15.0, avg: 130.3, max: 319.0) [2024-06-15 13:50:10,739][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 13:50:10,749][1652475] Updated weights for policy 0, policy_version 176672 (0.0016) [2024-06-15 13:50:14,550][1652475] Updated weights for policy 0, policy_version 176720 (0.0012) [2024-06-15 13:50:15,679][1652475] Updated weights for policy 0, policy_version 176768 (0.0104) [2024-06-15 13:50:15,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 45875.2, 300 sec: 44209.0). Total num frames: 362020864. Throughput: 0: 11355.0. Samples: 90576896. Policy #0 lag: (min: 4.0, avg: 97.5, max: 260.0) [2024-06-15 13:50:15,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 13:50:18,176][1652475] Updated weights for policy 0, policy_version 176832 (0.0013) [2024-06-15 13:50:19,167][1652475] Updated weights for policy 0, policy_version 176872 (0.0014) [2024-06-15 13:50:20,738][1648984] Fps is (10 sec: 49154.5, 60 sec: 44782.9, 300 sec: 44431.2). Total num frames: 362283008. Throughput: 0: 11355.0. Samples: 90608128. Policy #0 lag: (min: 4.0, avg: 97.5, max: 260.0) [2024-06-15 13:50:20,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 13:50:22,392][1652475] Updated weights for policy 0, policy_version 176928 (0.0012) [2024-06-15 13:50:25,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 44782.9, 300 sec: 43986.9). Total num frames: 362414080. Throughput: 0: 11093.3. Samples: 90673664. Policy #0 lag: (min: 4.0, avg: 97.5, max: 260.0) [2024-06-15 13:50:25,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 13:50:28,126][1652475] Updated weights for policy 0, policy_version 177008 (0.0019) [2024-06-15 13:50:29,622][1652475] Updated weights for policy 0, policy_version 177077 (0.0012) [2024-06-15 13:50:30,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 44236.8, 300 sec: 43986.9). Total num frames: 362676224. Throughput: 0: 11025.1. Samples: 90735104. 
Policy #0 lag: (min: 4.0, avg: 97.5, max: 260.0) [2024-06-15 13:50:30,740][1648984] Avg episode reward: [(0, '-0.540')] [2024-06-15 13:50:32,240][1652475] Updated weights for policy 0, policy_version 177142 (0.0013) [2024-06-15 13:50:34,541][1652475] Updated weights for policy 0, policy_version 177184 (0.0014) [2024-06-15 13:50:35,323][1652475] Updated weights for policy 0, policy_version 177216 (0.0012) [2024-06-15 13:50:35,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 45875.2, 300 sec: 44098.0). Total num frames: 362938368. Throughput: 0: 10820.3. Samples: 90760704. Policy #0 lag: (min: 4.0, avg: 97.5, max: 260.0) [2024-06-15 13:50:35,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 13:50:40,738][1648984] Fps is (10 sec: 36044.7, 60 sec: 43144.5, 300 sec: 43764.7). Total num frames: 363036672. Throughput: 0: 10922.7. Samples: 90829824. Policy #0 lag: (min: 4.0, avg: 97.5, max: 260.0) [2024-06-15 13:50:40,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 13:50:42,414][1652475] Updated weights for policy 0, policy_version 177296 (0.0136) [2024-06-15 13:50:45,702][1652475] Updated weights for policy 0, policy_version 177376 (0.0018) [2024-06-15 13:50:45,738][1648984] Fps is (10 sec: 32767.9, 60 sec: 42598.4, 300 sec: 43875.8). Total num frames: 363266048. Throughput: 0: 10740.6. Samples: 90892288. Policy #0 lag: (min: 4.0, avg: 97.5, max: 260.0) [2024-06-15 13:50:45,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 13:50:47,878][1652475] Updated weights for policy 0, policy_version 177468 (0.0049) [2024-06-15 13:50:50,738][1648984] Fps is (10 sec: 42598.5, 60 sec: 43690.6, 300 sec: 43542.6). Total num frames: 363462656. Throughput: 0: 10456.2. Samples: 90912768. 
Policy #0 lag: (min: 4.0, avg: 97.5, max: 260.0) [2024-06-15 13:50:50,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 13:50:53,075][1652475] Updated weights for policy 0, policy_version 177509 (0.0020) [2024-06-15 13:50:54,629][1652475] Updated weights for policy 0, policy_version 177555 (0.0014) [2024-06-15 13:50:55,738][1648984] Fps is (10 sec: 45873.0, 60 sec: 42598.1, 300 sec: 43986.8). Total num frames: 363724800. Throughput: 0: 10638.2. Samples: 90989056. Policy #0 lag: (min: 4.0, avg: 97.5, max: 260.0) [2024-06-15 13:50:55,739][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 13:50:55,755][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000177600_363724800.pth... [2024-06-15 13:50:55,799][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000172480_353239040.pth [2024-06-15 13:50:55,804][1651340] Saving a milestone train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/milestones/checkpoint_000177600_363724800.pth [2024-06-15 13:50:57,043][1651340] Signal inference workers to stop experience collection... (9150 times) [2024-06-15 13:50:57,120][1652475] InferenceWorker_p0-w0: stopping experience collection (9150 times) [2024-06-15 13:50:57,123][1652475] Updated weights for policy 0, policy_version 177606 (0.0011) [2024-06-15 13:50:57,277][1651340] Signal inference workers to resume experience collection... (9150 times) [2024-06-15 13:50:57,278][1652475] InferenceWorker_p0-w0: resuming experience collection (9150 times) [2024-06-15 13:50:59,540][1652475] Updated weights for policy 0, policy_version 177712 (0.0096) [2024-06-15 13:51:00,738][1648984] Fps is (10 sec: 52428.3, 60 sec: 43690.7, 300 sec: 43764.7). Total num frames: 363986944. Throughput: 0: 10376.5. Samples: 91043840. 
Policy #0 lag: (min: 4.0, avg: 97.5, max: 260.0) [2024-06-15 13:51:00,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 13:51:05,275][1652475] Updated weights for policy 0, policy_version 177786 (0.0015) [2024-06-15 13:51:05,738][1648984] Fps is (10 sec: 39323.2, 60 sec: 42052.2, 300 sec: 43653.6). Total num frames: 364118016. Throughput: 0: 10547.2. Samples: 91082752. Policy #0 lag: (min: 4.0, avg: 97.5, max: 260.0) [2024-06-15 13:51:05,751][1648984] Avg episode reward: [(0, '-0.560')] [2024-06-15 13:51:07,560][1652475] Updated weights for policy 0, policy_version 177856 (0.0013) [2024-06-15 13:51:10,738][1648984] Fps is (10 sec: 29491.3, 60 sec: 41506.5, 300 sec: 43320.4). Total num frames: 364281856. Throughput: 0: 10387.9. Samples: 91141120. Policy #0 lag: (min: 4.0, avg: 97.5, max: 260.0) [2024-06-15 13:51:10,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 13:51:11,346][1652475] Updated weights for policy 0, policy_version 177910 (0.0012) [2024-06-15 13:51:13,057][1652475] Updated weights for policy 0, policy_version 177968 (0.0012) [2024-06-15 13:51:15,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 41506.1, 300 sec: 43542.6). Total num frames: 364511232. Throughput: 0: 10433.4. Samples: 91204608. Policy #0 lag: (min: 4.0, avg: 97.5, max: 260.0) [2024-06-15 13:51:15,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 13:51:19,474][1652475] Updated weights for policy 0, policy_version 178064 (0.0101) [2024-06-15 13:51:20,738][1648984] Fps is (10 sec: 49151.8, 60 sec: 41506.1, 300 sec: 43542.5). Total num frames: 364773376. Throughput: 0: 10626.8. Samples: 91238912. 
Policy #0 lag: (min: 95.0, avg: 192.4, max: 367.0) [2024-06-15 13:51:20,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 13:51:22,845][1652475] Updated weights for policy 0, policy_version 178145 (0.0016) [2024-06-15 13:51:24,726][1652475] Updated weights for policy 0, policy_version 178208 (0.0095) [2024-06-15 13:51:25,738][1648984] Fps is (10 sec: 52428.5, 60 sec: 43690.6, 300 sec: 43542.6). Total num frames: 365035520. Throughput: 0: 10422.0. Samples: 91298816. Policy #0 lag: (min: 95.0, avg: 192.4, max: 367.0) [2024-06-15 13:51:25,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 13:51:30,252][1652475] Updated weights for policy 0, policy_version 178256 (0.0013) [2024-06-15 13:51:30,738][1648984] Fps is (10 sec: 32768.3, 60 sec: 40413.9, 300 sec: 43320.4). Total num frames: 365101056. Throughput: 0: 10672.4. Samples: 91372544. Policy #0 lag: (min: 95.0, avg: 192.4, max: 367.0) [2024-06-15 13:51:30,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 13:51:32,785][1652475] Updated weights for policy 0, policy_version 178358 (0.0014) [2024-06-15 13:51:35,070][1652475] Updated weights for policy 0, policy_version 178426 (0.0015) [2024-06-15 13:51:35,738][1648984] Fps is (10 sec: 39321.2, 60 sec: 41506.0, 300 sec: 43542.5). Total num frames: 365428736. Throughput: 0: 10706.4. Samples: 91394560. Policy #0 lag: (min: 95.0, avg: 192.4, max: 367.0) [2024-06-15 13:51:35,739][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 13:51:37,631][1652475] Updated weights for policy 0, policy_version 178495 (0.0012) [2024-06-15 13:51:40,738][1648984] Fps is (10 sec: 45874.9, 60 sec: 42052.2, 300 sec: 43209.3). Total num frames: 365559808. Throughput: 0: 10467.7. Samples: 91460096. Policy #0 lag: (min: 95.0, avg: 192.4, max: 367.0) [2024-06-15 13:51:40,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 13:51:44,154][1651340] Signal inference workers to stop experience collection... 
(9200 times) [2024-06-15 13:51:44,186][1652475] InferenceWorker_p0-w0: stopping experience collection (9200 times) [2024-06-15 13:51:44,190][1652475] Updated weights for policy 0, policy_version 178578 (0.0014) [2024-06-15 13:51:44,430][1651340] Signal inference workers to resume experience collection... (9200 times) [2024-06-15 13:51:44,431][1652475] InferenceWorker_p0-w0: resuming experience collection (9200 times) [2024-06-15 13:51:45,409][1652475] Updated weights for policy 0, policy_version 178624 (0.0013) [2024-06-15 13:51:45,738][1648984] Fps is (10 sec: 39322.6, 60 sec: 42598.4, 300 sec: 43542.6). Total num frames: 365821952. Throughput: 0: 10513.1. Samples: 91516928. Policy #0 lag: (min: 95.0, avg: 192.4, max: 367.0) [2024-06-15 13:51:45,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 13:51:47,700][1652475] Updated weights for policy 0, policy_version 178680 (0.0079) [2024-06-15 13:51:50,738][1648984] Fps is (10 sec: 52429.3, 60 sec: 43690.7, 300 sec: 43653.7). Total num frames: 366084096. Throughput: 0: 10387.9. Samples: 91550208. Policy #0 lag: (min: 95.0, avg: 192.4, max: 367.0) [2024-06-15 13:51:50,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 13:51:54,781][1652475] Updated weights for policy 0, policy_version 178755 (0.0013) [2024-06-15 13:51:55,738][1648984] Fps is (10 sec: 32767.6, 60 sec: 40414.1, 300 sec: 43320.4). Total num frames: 366149632. Throughput: 0: 10695.1. Samples: 91622400. Policy #0 lag: (min: 95.0, avg: 192.4, max: 367.0) [2024-06-15 13:51:55,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 13:51:57,232][1652475] Updated weights for policy 0, policy_version 178848 (0.0102) [2024-06-15 13:51:59,144][1652475] Updated weights for policy 0, policy_version 178912 (0.0014) [2024-06-15 13:51:59,714][1652475] Updated weights for policy 0, policy_version 178944 (0.0013) [2024-06-15 13:52:00,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 41506.2, 300 sec: 43542.6). Total num frames: 366477312. 
Throughput: 0: 10604.1. Samples: 91681792. Policy #0 lag: (min: 95.0, avg: 192.4, max: 367.0) [2024-06-15 13:52:00,740][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 13:52:02,099][1652475] Updated weights for policy 0, policy_version 179002 (0.0010) [2024-06-15 13:52:05,750][1648984] Fps is (10 sec: 45821.3, 60 sec: 41498.0, 300 sec: 43096.5). Total num frames: 366608384. Throughput: 0: 10612.7. Samples: 91716608. Policy #0 lag: (min: 95.0, avg: 192.4, max: 367.0) [2024-06-15 13:52:05,750][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 13:52:07,328][1652475] Updated weights for policy 0, policy_version 179056 (0.0012) [2024-06-15 13:52:08,816][1652475] Updated weights for policy 0, policy_version 179120 (0.0149) [2024-06-15 13:52:10,738][1648984] Fps is (10 sec: 39321.1, 60 sec: 43144.5, 300 sec: 43320.4). Total num frames: 366870528. Throughput: 0: 10706.5. Samples: 91780608. Policy #0 lag: (min: 95.0, avg: 192.4, max: 367.0) [2024-06-15 13:52:10,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 13:52:11,391][1652475] Updated weights for policy 0, policy_version 179184 (0.0014) [2024-06-15 13:52:13,880][1652475] Updated weights for policy 0, policy_version 179248 (0.0014) [2024-06-15 13:52:15,738][1648984] Fps is (10 sec: 52490.6, 60 sec: 43690.6, 300 sec: 43431.5). Total num frames: 367132672. Throughput: 0: 10558.6. Samples: 91847680. Policy #0 lag: (min: 95.0, avg: 192.4, max: 367.0) [2024-06-15 13:52:15,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 13:52:19,028][1652475] Updated weights for policy 0, policy_version 179296 (0.0013) [2024-06-15 13:52:20,738][1648984] Fps is (10 sec: 45875.7, 60 sec: 42598.5, 300 sec: 43653.7). Total num frames: 367329280. Throughput: 0: 10945.5. Samples: 91887104. 
Policy #0 lag: (min: 95.0, avg: 192.4, max: 367.0) [2024-06-15 13:52:20,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 13:52:20,890][1652475] Updated weights for policy 0, policy_version 179365 (0.0019) [2024-06-15 13:52:23,259][1652475] Updated weights for policy 0, policy_version 179450 (0.0016) [2024-06-15 13:52:25,738][1648984] Fps is (10 sec: 42598.7, 60 sec: 42052.3, 300 sec: 43653.6). Total num frames: 367558656. Throughput: 0: 10774.8. Samples: 91944960. Policy #0 lag: (min: 95.0, avg: 192.4, max: 367.0) [2024-06-15 13:52:25,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 13:52:26,502][1652475] Updated weights for policy 0, policy_version 179512 (0.0014) [2024-06-15 13:52:30,750][1648984] Fps is (10 sec: 35999.3, 60 sec: 43135.5, 300 sec: 43207.5). Total num frames: 367689728. Throughput: 0: 11135.7. Samples: 92018176. Policy #0 lag: (min: 17.0, avg: 148.5, max: 273.0) [2024-06-15 13:52:30,751][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 13:52:31,380][1652475] Updated weights for policy 0, policy_version 179584 (0.0038) [2024-06-15 13:52:31,772][1651340] Signal inference workers to stop experience collection... (9250 times) [2024-06-15 13:52:31,797][1652475] InferenceWorker_p0-w0: stopping experience collection (9250 times) [2024-06-15 13:52:31,984][1651340] Signal inference workers to resume experience collection... (9250 times) [2024-06-15 13:52:31,984][1652475] InferenceWorker_p0-w0: resuming experience collection (9250 times) [2024-06-15 13:52:34,955][1652475] Updated weights for policy 0, policy_version 179673 (0.0013) [2024-06-15 13:52:35,616][1652475] Updated weights for policy 0, policy_version 179711 (0.0135) [2024-06-15 13:52:35,738][1648984] Fps is (10 sec: 49152.0, 60 sec: 43690.8, 300 sec: 43543.8). Total num frames: 368050176. Throughput: 0: 10922.7. Samples: 92041728. 
Policy #0 lag: (min: 17.0, avg: 148.5, max: 273.0) [2024-06-15 13:52:35,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 13:52:39,115][1652475] Updated weights for policy 0, policy_version 179766 (0.0011) [2024-06-15 13:52:40,738][1648984] Fps is (10 sec: 49213.1, 60 sec: 43690.6, 300 sec: 43098.2). Total num frames: 368181248. Throughput: 0: 10752.0. Samples: 92106240. Policy #0 lag: (min: 17.0, avg: 148.5, max: 273.0) [2024-06-15 13:52:40,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 13:52:42,651][1652475] Updated weights for policy 0, policy_version 179824 (0.0013) [2024-06-15 13:52:44,784][1652475] Updated weights for policy 0, policy_version 179874 (0.0014) [2024-06-15 13:52:45,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 43690.6, 300 sec: 43209.3). Total num frames: 368443392. Throughput: 0: 10979.5. Samples: 92175872. Policy #0 lag: (min: 17.0, avg: 148.5, max: 273.0) [2024-06-15 13:52:45,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 13:52:47,041][1652475] Updated weights for policy 0, policy_version 179936 (0.0011) [2024-06-15 13:52:49,032][1652475] Updated weights for policy 0, policy_version 179984 (0.0015) [2024-06-15 13:52:50,152][1652475] Updated weights for policy 0, policy_version 180030 (0.0010) [2024-06-15 13:52:50,738][1648984] Fps is (10 sec: 52429.8, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 368705536. Throughput: 0: 10914.2. Samples: 92207616. Policy #0 lag: (min: 17.0, avg: 148.5, max: 273.0) [2024-06-15 13:52:50,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 13:52:55,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 44783.0, 300 sec: 43431.5). Total num frames: 368836608. Throughput: 0: 10831.7. Samples: 92268032. Policy #0 lag: (min: 17.0, avg: 148.5, max: 273.0) [2024-06-15 13:52:55,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 13:52:55,743][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000180096_368836608.pth... 
[2024-06-15 13:52:55,830][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000175040_358481920.pth [2024-06-15 13:52:59,028][1652475] Updated weights for policy 0, policy_version 180103 (0.0017) [2024-06-15 13:53:00,335][1652475] Updated weights for policy 0, policy_version 180161 (0.0013) [2024-06-15 13:53:00,738][1648984] Fps is (10 sec: 29491.3, 60 sec: 42052.3, 300 sec: 43098.3). Total num frames: 369000448. Throughput: 0: 10774.8. Samples: 92332544. Policy #0 lag: (min: 17.0, avg: 148.5, max: 273.0) [2024-06-15 13:53:00,738][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 13:53:02,446][1652475] Updated weights for policy 0, policy_version 180240 (0.0012) [2024-06-15 13:53:05,352][1652475] Updated weights for policy 0, policy_version 180320 (0.0013) [2024-06-15 13:53:05,738][1648984] Fps is (10 sec: 49152.3, 60 sec: 45338.0, 300 sec: 43431.5). Total num frames: 369328128. Throughput: 0: 10399.3. Samples: 92355072. Policy #0 lag: (min: 17.0, avg: 148.5, max: 273.0) [2024-06-15 13:53:05,738][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 13:53:10,738][1648984] Fps is (10 sec: 36043.6, 60 sec: 41506.0, 300 sec: 43098.2). Total num frames: 369360896. Throughput: 0: 10751.9. Samples: 92428800. Policy #0 lag: (min: 17.0, avg: 148.5, max: 273.0) [2024-06-15 13:53:10,739][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 13:53:11,479][1652475] Updated weights for policy 0, policy_version 180370 (0.0026) [2024-06-15 13:53:12,972][1652475] Updated weights for policy 0, policy_version 180432 (0.0012) [2024-06-15 13:53:15,275][1652475] Updated weights for policy 0, policy_version 180512 (0.0013) [2024-06-15 13:53:15,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 43144.6, 300 sec: 43320.4). Total num frames: 369721344. Throughput: 0: 10447.7. Samples: 92488192. 
Policy #0 lag: (min: 17.0, avg: 148.5, max: 273.0) [2024-06-15 13:53:15,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 13:53:16,721][1652475] Updated weights for policy 0, policy_version 180560 (0.0014) [2024-06-15 13:53:17,979][1652475] Updated weights for policy 0, policy_version 180608 (0.0134) [2024-06-15 13:53:20,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 42598.2, 300 sec: 43098.2). Total num frames: 369885184. Throughput: 0: 10638.2. Samples: 92520448. Policy #0 lag: (min: 17.0, avg: 148.5, max: 273.0) [2024-06-15 13:53:20,739][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 13:53:23,754][1651340] Signal inference workers to stop experience collection... (9300 times) [2024-06-15 13:53:23,810][1652475] InferenceWorker_p0-w0: stopping experience collection (9300 times) [2024-06-15 13:53:24,096][1651340] Signal inference workers to resume experience collection... (9300 times) [2024-06-15 13:53:24,098][1652475] InferenceWorker_p0-w0: resuming experience collection (9300 times) [2024-06-15 13:53:25,738][1648984] Fps is (10 sec: 29491.2, 60 sec: 40960.0, 300 sec: 42876.1). Total num frames: 370016256. Throughput: 0: 10717.9. Samples: 92588544. Policy #0 lag: (min: 17.0, avg: 148.5, max: 273.0) [2024-06-15 13:53:25,738][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 13:53:26,494][1652475] Updated weights for policy 0, policy_version 180709 (0.0014) [2024-06-15 13:53:28,576][1652475] Updated weights for policy 0, policy_version 180793 (0.0014) [2024-06-15 13:53:30,739][1648984] Fps is (10 sec: 45869.2, 60 sec: 44244.9, 300 sec: 43209.1). Total num frames: 370343936. Throughput: 0: 10296.5. Samples: 92639232. Policy #0 lag: (min: 17.0, avg: 148.5, max: 273.0) [2024-06-15 13:53:30,740][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 13:53:30,975][1652475] Updated weights for policy 0, policy_version 180848 (0.0112) [2024-06-15 13:53:35,738][1648984] Fps is (10 sec: 39321.3, 60 sec: 39321.6, 300 sec: 42987.2). 
Total num frames: 370409472. Throughput: 0: 10262.7. Samples: 92669440. Policy #0 lag: (min: 17.0, avg: 148.5, max: 273.0) [2024-06-15 13:53:35,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 13:53:36,930][1652475] Updated weights for policy 0, policy_version 180898 (0.0014) [2024-06-15 13:53:38,713][1652475] Updated weights for policy 0, policy_version 180960 (0.0013) [2024-06-15 13:53:40,738][1648984] Fps is (10 sec: 39327.7, 60 sec: 42598.5, 300 sec: 43098.3). Total num frames: 370737152. Throughput: 0: 10433.4. Samples: 92737536. Policy #0 lag: (min: 15.0, avg: 78.0, max: 271.0) [2024-06-15 13:53:40,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 13:53:41,055][1652475] Updated weights for policy 0, policy_version 181047 (0.0099) [2024-06-15 13:53:44,327][1652475] Updated weights for policy 0, policy_version 181088 (0.0012) [2024-06-15 13:53:45,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 41506.1, 300 sec: 43099.5). Total num frames: 370933760. Throughput: 0: 10331.0. Samples: 92797440. Policy #0 lag: (min: 15.0, avg: 78.0, max: 271.0) [2024-06-15 13:53:45,750][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 13:53:48,937][1652475] Updated weights for policy 0, policy_version 181140 (0.0016) [2024-06-15 13:53:50,739][1648984] Fps is (10 sec: 36044.0, 60 sec: 39867.6, 300 sec: 42765.0). Total num frames: 371097600. Throughput: 0: 10683.7. Samples: 92835840. Policy #0 lag: (min: 15.0, avg: 78.0, max: 271.0) [2024-06-15 13:53:50,739][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 13:53:51,299][1652475] Updated weights for policy 0, policy_version 181219 (0.0016) [2024-06-15 13:53:53,134][1652475] Updated weights for policy 0, policy_version 181308 (0.0028) [2024-06-15 13:53:55,741][1648984] Fps is (10 sec: 42583.2, 60 sec: 42049.8, 300 sec: 43208.8). Total num frames: 371359744. Throughput: 0: 10273.4. Samples: 92891136. 
Policy #0 lag: (min: 15.0, avg: 78.0, max: 271.0) [2024-06-15 13:53:55,742][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 13:53:56,667][1652475] Updated weights for policy 0, policy_version 181374 (0.0012) [2024-06-15 13:54:00,738][1648984] Fps is (10 sec: 39322.3, 60 sec: 41506.1, 300 sec: 42765.0). Total num frames: 371490816. Throughput: 0: 10615.5. Samples: 92965888. Policy #0 lag: (min: 15.0, avg: 78.0, max: 271.0) [2024-06-15 13:54:00,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 13:54:02,987][1652475] Updated weights for policy 0, policy_version 181456 (0.0013) [2024-06-15 13:54:04,554][1652475] Updated weights for policy 0, policy_version 181524 (0.0088) [2024-06-15 13:54:05,738][1648984] Fps is (10 sec: 49169.6, 60 sec: 42052.2, 300 sec: 43098.2). Total num frames: 371851264. Throughput: 0: 10513.1. Samples: 92993536. Policy #0 lag: (min: 15.0, avg: 78.0, max: 271.0) [2024-06-15 13:54:05,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 13:54:07,814][1651340] Signal inference workers to stop experience collection... (9350 times) [2024-06-15 13:54:07,839][1652475] Updated weights for policy 0, policy_version 181586 (0.0012) [2024-06-15 13:54:07,874][1652475] InferenceWorker_p0-w0: stopping experience collection (9350 times) [2024-06-15 13:54:08,095][1651340] Signal inference workers to resume experience collection... (9350 times) [2024-06-15 13:54:08,096][1652475] InferenceWorker_p0-w0: resuming experience collection (9350 times) [2024-06-15 13:54:10,738][1648984] Fps is (10 sec: 49151.8, 60 sec: 43690.8, 300 sec: 43098.2). Total num frames: 371982336. Throughput: 0: 10490.3. Samples: 93060608. 
Policy #0 lag: (min: 15.0, avg: 78.0, max: 271.0) [2024-06-15 13:54:10,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 13:54:12,342][1652475] Updated weights for policy 0, policy_version 181648 (0.0015) [2024-06-15 13:54:13,556][1652475] Updated weights for policy 0, policy_version 181693 (0.0013) [2024-06-15 13:54:15,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 42052.3, 300 sec: 42876.1). Total num frames: 372244480. Throughput: 0: 10763.8. Samples: 93123584. Policy #0 lag: (min: 15.0, avg: 78.0, max: 271.0) [2024-06-15 13:54:15,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 13:54:15,940][1652475] Updated weights for policy 0, policy_version 181776 (0.0015) [2024-06-15 13:54:20,079][1652475] Updated weights for policy 0, policy_version 181856 (0.0103) [2024-06-15 13:54:20,738][1648984] Fps is (10 sec: 49152.3, 60 sec: 43144.7, 300 sec: 43209.3). Total num frames: 372473856. Throughput: 0: 10797.5. Samples: 93155328. Policy #0 lag: (min: 15.0, avg: 78.0, max: 271.0) [2024-06-15 13:54:20,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 13:54:24,518][1652475] Updated weights for policy 0, policy_version 181904 (0.0012) [2024-06-15 13:54:25,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 43690.6, 300 sec: 42765.0). Total num frames: 372637696. Throughput: 0: 10922.7. Samples: 93229056. Policy #0 lag: (min: 15.0, avg: 78.0, max: 271.0) [2024-06-15 13:54:25,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 13:54:26,447][1652475] Updated weights for policy 0, policy_version 181968 (0.0132) [2024-06-15 13:54:28,095][1652475] Updated weights for policy 0, policy_version 182032 (0.0015) [2024-06-15 13:54:30,738][1648984] Fps is (10 sec: 42598.1, 60 sec: 42599.5, 300 sec: 43098.2). Total num frames: 372899840. Throughput: 0: 10922.7. Samples: 93288960. 
Policy #0 lag: (min: 15.0, avg: 78.0, max: 271.0) [2024-06-15 13:54:30,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 13:54:31,302][1652475] Updated weights for policy 0, policy_version 182084 (0.0014) [2024-06-15 13:54:32,820][1652475] Updated weights for policy 0, policy_version 182144 (0.0013) [2024-06-15 13:54:35,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 43690.7, 300 sec: 42653.9). Total num frames: 373030912. Throughput: 0: 10774.8. Samples: 93320704. Policy #0 lag: (min: 15.0, avg: 78.0, max: 271.0) [2024-06-15 13:54:35,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 13:54:38,760][1652475] Updated weights for policy 0, policy_version 182224 (0.0015) [2024-06-15 13:54:40,738][1648984] Fps is (10 sec: 42598.8, 60 sec: 43144.5, 300 sec: 42765.0). Total num frames: 373325824. Throughput: 0: 11003.2. Samples: 93386240. Policy #0 lag: (min: 15.0, avg: 78.0, max: 271.0) [2024-06-15 13:54:40,744][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 13:54:41,369][1652475] Updated weights for policy 0, policy_version 182329 (0.0016) [2024-06-15 13:54:44,536][1652475] Updated weights for policy 0, policy_version 182369 (0.0014) [2024-06-15 13:54:45,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 43690.7, 300 sec: 43098.2). Total num frames: 373555200. Throughput: 0: 10706.5. Samples: 93447680. Policy #0 lag: (min: 13.0, avg: 137.6, max: 269.0) [2024-06-15 13:54:45,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 13:54:49,367][1652475] Updated weights for policy 0, policy_version 182436 (0.0028) [2024-06-15 13:54:50,738][1648984] Fps is (10 sec: 36044.6, 60 sec: 43144.6, 300 sec: 42431.8). Total num frames: 373686272. Throughput: 0: 11013.7. Samples: 93489152. 
Policy #0 lag: (min: 13.0, avg: 137.6, max: 269.0) [2024-06-15 13:54:50,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 13:54:52,716][1652475] Updated weights for policy 0, policy_version 182545 (0.0016) [2024-06-15 13:54:55,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 43147.1, 300 sec: 42653.9). Total num frames: 373948416. Throughput: 0: 10649.6. Samples: 93539840. Policy #0 lag: (min: 13.0, avg: 137.6, max: 269.0) [2024-06-15 13:54:55,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 13:54:55,765][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000182592_373948416.pth... [2024-06-15 13:54:55,824][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000177600_363724800.pth [2024-06-15 13:54:56,065][1651340] Signal inference workers to stop experience collection... (9400 times) [2024-06-15 13:54:56,101][1652475] InferenceWorker_p0-w0: stopping experience collection (9400 times) [2024-06-15 13:54:56,310][1651340] Signal inference workers to resume experience collection... (9400 times) [2024-06-15 13:54:56,311][1652475] InferenceWorker_p0-w0: resuming experience collection (9400 times) [2024-06-15 13:54:56,867][1652475] Updated weights for policy 0, policy_version 182627 (0.0024) [2024-06-15 13:55:00,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 43144.5, 300 sec: 42320.7). Total num frames: 374079488. Throughput: 0: 10843.0. Samples: 93611520. Policy #0 lag: (min: 13.0, avg: 137.6, max: 269.0) [2024-06-15 13:55:00,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 13:55:01,974][1652475] Updated weights for policy 0, policy_version 182693 (0.0024) [2024-06-15 13:55:03,309][1652475] Updated weights for policy 0, policy_version 182736 (0.0136) [2024-06-15 13:55:05,321][1652475] Updated weights for policy 0, policy_version 182816 (0.0013) [2024-06-15 13:55:05,738][1648984] Fps is (10 sec: 45875.1, 60 sec: 42598.4, 300 sec: 42765.1). Total num frames: 374407168. 
Throughput: 0: 10865.8. Samples: 93644288. Policy #0 lag: (min: 13.0, avg: 137.6, max: 269.0) [2024-06-15 13:55:05,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 13:55:07,903][1652475] Updated weights for policy 0, policy_version 182857 (0.0012) [2024-06-15 13:55:09,376][1652475] Updated weights for policy 0, policy_version 182912 (0.0011) [2024-06-15 13:55:10,738][1648984] Fps is (10 sec: 52427.7, 60 sec: 43690.6, 300 sec: 42653.9). Total num frames: 374603776. Throughput: 0: 10581.3. Samples: 93705216. Policy #0 lag: (min: 13.0, avg: 137.6, max: 269.0) [2024-06-15 13:55:10,739][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 13:55:15,738][1648984] Fps is (10 sec: 32767.9, 60 sec: 41506.1, 300 sec: 42209.6). Total num frames: 374734848. Throughput: 0: 10717.9. Samples: 93771264. Policy #0 lag: (min: 13.0, avg: 137.6, max: 269.0) [2024-06-15 13:55:15,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 13:55:16,373][1652475] Updated weights for policy 0, policy_version 182996 (0.0043) [2024-06-15 13:55:17,401][1652475] Updated weights for policy 0, policy_version 183037 (0.0011) [2024-06-15 13:55:19,660][1652475] Updated weights for policy 0, policy_version 183106 (0.0014) [2024-06-15 13:55:20,738][1648984] Fps is (10 sec: 49153.0, 60 sec: 43690.7, 300 sec: 42987.2). Total num frames: 375095296. Throughput: 0: 10570.0. Samples: 93796352. Policy #0 lag: (min: 13.0, avg: 137.6, max: 269.0) [2024-06-15 13:55:20,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 13:55:25,738][1648984] Fps is (10 sec: 39322.0, 60 sec: 41506.2, 300 sec: 42209.6). Total num frames: 375128064. Throughput: 0: 10456.2. Samples: 93856768. 
Policy #0 lag: (min: 13.0, avg: 137.6, max: 269.0) [2024-06-15 13:55:25,740][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 13:55:26,717][1652475] Updated weights for policy 0, policy_version 183184 (0.0016) [2024-06-15 13:55:27,898][1652475] Updated weights for policy 0, policy_version 183230 (0.0012) [2024-06-15 13:55:30,712][1652475] Updated weights for policy 0, policy_version 183285 (0.0017) [2024-06-15 13:55:30,738][1648984] Fps is (10 sec: 26214.3, 60 sec: 40960.0, 300 sec: 42098.5). Total num frames: 375357440. Throughput: 0: 10558.6. Samples: 93922816. Policy #0 lag: (min: 13.0, avg: 137.6, max: 269.0) [2024-06-15 13:55:30,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 13:55:32,891][1652475] Updated weights for policy 0, policy_version 183376 (0.0013) [2024-06-15 13:55:33,850][1652475] Updated weights for policy 0, policy_version 183420 (0.0011) [2024-06-15 13:55:35,738][1648984] Fps is (10 sec: 52428.4, 60 sec: 43690.6, 300 sec: 42765.0). Total num frames: 375652352. Throughput: 0: 10092.1. Samples: 93943296. Policy #0 lag: (min: 13.0, avg: 137.6, max: 269.0) [2024-06-15 13:55:35,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 13:55:39,852][1652475] Updated weights for policy 0, policy_version 183472 (0.0012) [2024-06-15 13:55:40,738][1648984] Fps is (10 sec: 42598.1, 60 sec: 40959.9, 300 sec: 42431.8). Total num frames: 375783424. Throughput: 0: 10626.8. Samples: 94018048. Policy #0 lag: (min: 13.0, avg: 137.6, max: 269.0) [2024-06-15 13:55:40,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 13:55:41,931][1652475] Updated weights for policy 0, policy_version 183504 (0.0014) [2024-06-15 13:55:43,299][1651340] Signal inference workers to stop experience collection... (9450 times) [2024-06-15 13:55:43,336][1652475] InferenceWorker_p0-w0: stopping experience collection (9450 times) [2024-06-15 13:55:43,532][1651340] Signal inference workers to resume experience collection... 
(9450 times) [2024-06-15 13:55:43,533][1652475] InferenceWorker_p0-w0: resuming experience collection (9450 times) [2024-06-15 13:55:44,296][1652475] Updated weights for policy 0, policy_version 183600 (0.0013) [2024-06-15 13:55:45,740][1648984] Fps is (10 sec: 49151.9, 60 sec: 43144.4, 300 sec: 42987.2). Total num frames: 376143872. Throughput: 0: 10251.4. Samples: 94072832. Policy #0 lag: (min: 13.0, avg: 137.6, max: 269.0) [2024-06-15 13:55:45,741][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 13:55:50,738][1648984] Fps is (10 sec: 39322.0, 60 sec: 41506.2, 300 sec: 42209.7). Total num frames: 376176640. Throughput: 0: 10205.9. Samples: 94103552. Policy #0 lag: (min: 13.0, avg: 137.6, max: 269.0) [2024-06-15 13:55:50,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 13:55:52,820][1652475] Updated weights for policy 0, policy_version 183681 (0.0013) [2024-06-15 13:55:55,283][1652475] Updated weights for policy 0, policy_version 183792 (0.0015) [2024-06-15 13:55:55,738][1648984] Fps is (10 sec: 29491.2, 60 sec: 41506.1, 300 sec: 42209.6). Total num frames: 376438784. Throughput: 0: 10331.0. Samples: 94170112. Policy #0 lag: (min: 15.0, avg: 90.6, max: 271.0) [2024-06-15 13:55:55,742][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 13:55:56,333][1652475] Updated weights for policy 0, policy_version 183825 (0.0012) [2024-06-15 13:55:58,704][1652475] Updated weights for policy 0, policy_version 183928 (0.0014) [2024-06-15 13:56:00,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 43690.7, 300 sec: 42653.9). Total num frames: 376700928. Throughput: 0: 10126.2. Samples: 94226944. Policy #0 lag: (min: 15.0, avg: 90.6, max: 271.0) [2024-06-15 13:56:00,740][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 13:56:05,738][1648984] Fps is (10 sec: 32768.5, 60 sec: 39321.7, 300 sec: 42320.7). Total num frames: 376766464. Throughput: 0: 10410.7. Samples: 94264832. 
Policy #0 lag: (min: 15.0, avg: 90.6, max: 271.0) [2024-06-15 13:56:05,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 13:56:05,927][1652475] Updated weights for policy 0, policy_version 183984 (0.0029) [2024-06-15 13:56:07,360][1652475] Updated weights for policy 0, policy_version 184033 (0.0012) [2024-06-15 13:56:08,983][1652475] Updated weights for policy 0, policy_version 184084 (0.0013) [2024-06-15 13:56:10,738][1648984] Fps is (10 sec: 42595.5, 60 sec: 42051.9, 300 sec: 42764.9). Total num frames: 377126912. Throughput: 0: 10456.0. Samples: 94327296. Policy #0 lag: (min: 15.0, avg: 90.6, max: 271.0) [2024-06-15 13:56:10,739][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 13:56:11,395][1652475] Updated weights for policy 0, policy_version 184176 (0.0013) [2024-06-15 13:56:15,750][1648984] Fps is (10 sec: 45817.5, 60 sec: 41497.5, 300 sec: 42207.8). Total num frames: 377225216. Throughput: 0: 10476.0. Samples: 94394368. Policy #0 lag: (min: 15.0, avg: 90.6, max: 271.0) [2024-06-15 13:56:15,751][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 13:56:17,175][1652475] Updated weights for policy 0, policy_version 184209 (0.0013) [2024-06-15 13:56:19,313][1652475] Updated weights for policy 0, policy_version 184292 (0.0013) [2024-06-15 13:56:20,738][1648984] Fps is (10 sec: 36047.3, 60 sec: 39867.7, 300 sec: 42209.6). Total num frames: 377487360. Throughput: 0: 10797.5. Samples: 94429184. Policy #0 lag: (min: 15.0, avg: 90.6, max: 271.0) [2024-06-15 13:56:20,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 13:56:21,003][1652475] Updated weights for policy 0, policy_version 184339 (0.0034) [2024-06-15 13:56:23,097][1652475] Updated weights for policy 0, policy_version 184432 (0.0012) [2024-06-15 13:56:25,738][1648984] Fps is (10 sec: 52494.1, 60 sec: 43690.6, 300 sec: 42876.1). Total num frames: 377749504. Throughput: 0: 10399.3. Samples: 94486016. 
Policy #0 lag: (min: 15.0, avg: 90.6, max: 271.0) [2024-06-15 13:56:25,739][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 13:56:29,063][1652475] Updated weights for policy 0, policy_version 184480 (0.0016) [2024-06-15 13:56:30,677][1651340] Signal inference workers to stop experience collection... (9500 times) [2024-06-15 13:56:30,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 42052.3, 300 sec: 42209.7). Total num frames: 377880576. Throughput: 0: 10808.9. Samples: 94559232. Policy #0 lag: (min: 15.0, avg: 90.6, max: 271.0) [2024-06-15 13:56:30,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 13:56:30,776][1652475] InferenceWorker_p0-w0: stopping experience collection (9500 times) [2024-06-15 13:56:30,893][1651340] Signal inference workers to resume experience collection... (9500 times) [2024-06-15 13:56:30,895][1652475] InferenceWorker_p0-w0: resuming experience collection (9500 times) [2024-06-15 13:56:31,243][1652475] Updated weights for policy 0, policy_version 184544 (0.0017) [2024-06-15 13:56:33,706][1652475] Updated weights for policy 0, policy_version 184624 (0.0014) [2024-06-15 13:56:35,394][1652475] Updated weights for policy 0, policy_version 184697 (0.0126) [2024-06-15 13:56:35,739][1648984] Fps is (10 sec: 52423.9, 60 sec: 43690.0, 300 sec: 43098.1). Total num frames: 378273792. Throughput: 0: 10808.6. Samples: 94589952. Policy #0 lag: (min: 15.0, avg: 90.6, max: 271.0) [2024-06-15 13:56:35,739][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 13:56:40,676][1652475] Updated weights for policy 0, policy_version 184736 (0.0012) [2024-06-15 13:56:40,738][1648984] Fps is (10 sec: 45872.7, 60 sec: 42598.1, 300 sec: 42431.7). Total num frames: 378339328. Throughput: 0: 10820.2. Samples: 94657024. 
Policy #0 lag: (min: 15.0, avg: 90.6, max: 271.0) [2024-06-15 13:56:40,739][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 13:56:43,683][1652475] Updated weights for policy 0, policy_version 184800 (0.0017) [2024-06-15 13:56:45,696][1652475] Updated weights for policy 0, policy_version 184894 (0.0029) [2024-06-15 13:56:45,751][1648984] Fps is (10 sec: 39275.1, 60 sec: 42043.3, 300 sec: 42652.1). Total num frames: 378667008. Throughput: 0: 10942.3. Samples: 94719488. Policy #0 lag: (min: 15.0, avg: 90.6, max: 271.0) [2024-06-15 13:56:45,752][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 13:56:47,099][1652475] Updated weights for policy 0, policy_version 184952 (0.0013) [2024-06-15 13:56:50,738][1648984] Fps is (10 sec: 45877.8, 60 sec: 43690.7, 300 sec: 42876.1). Total num frames: 378798080. Throughput: 0: 10854.4. Samples: 94753280. Policy #0 lag: (min: 15.0, avg: 90.6, max: 271.0) [2024-06-15 13:56:50,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 13:56:53,106][1652475] Updated weights for policy 0, policy_version 185019 (0.0015) [2024-06-15 13:56:55,738][1648984] Fps is (10 sec: 32810.3, 60 sec: 42598.5, 300 sec: 42431.8). Total num frames: 378994688. Throughput: 0: 11013.9. Samples: 94822912. Policy #0 lag: (min: 15.0, avg: 90.6, max: 271.0) [2024-06-15 13:56:55,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 13:56:56,079][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000185072_379027456.pth... [2024-06-15 13:56:56,238][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000180096_368836608.pth [2024-06-15 13:56:56,504][1652475] Updated weights for policy 0, policy_version 185088 (0.0024) [2024-06-15 13:56:59,150][1652475] Updated weights for policy 0, policy_version 185204 (0.0014) [2024-06-15 13:57:00,738][1648984] Fps is (10 sec: 52428.5, 60 sec: 43690.6, 300 sec: 43100.0). Total num frames: 379322368. Throughput: 0: 10698.1. 
Samples: 94875648. Policy #0 lag: (min: 95.0, avg: 219.1, max: 306.0) [2024-06-15 13:57:00,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 13:57:05,334][1652475] Updated weights for policy 0, policy_version 185277 (0.0108) [2024-06-15 13:57:05,738][1648984] Fps is (10 sec: 45875.3, 60 sec: 44782.9, 300 sec: 42654.0). Total num frames: 379453440. Throughput: 0: 10888.5. Samples: 94919168. Policy #0 lag: (min: 95.0, avg: 219.1, max: 306.0) [2024-06-15 13:57:05,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 13:57:09,933][1652475] Updated weights for policy 0, policy_version 185365 (0.0013) [2024-06-15 13:57:10,738][1648984] Fps is (10 sec: 36045.0, 60 sec: 42598.9, 300 sec: 42542.9). Total num frames: 379682816. Throughput: 0: 10968.2. Samples: 94979584. Policy #0 lag: (min: 95.0, avg: 219.1, max: 306.0) [2024-06-15 13:57:10,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 13:57:11,262][1652475] Updated weights for policy 0, policy_version 185424 (0.0012) [2024-06-15 13:57:11,426][1651340] Signal inference workers to stop experience collection... (9550 times) [2024-06-15 13:57:11,521][1652475] InferenceWorker_p0-w0: stopping experience collection (9550 times) [2024-06-15 13:57:11,652][1651340] Signal inference workers to resume experience collection... (9550 times) [2024-06-15 13:57:11,653][1652475] InferenceWorker_p0-w0: resuming experience collection (9550 times) [2024-06-15 13:57:12,307][1652475] Updated weights for policy 0, policy_version 185468 (0.0013) [2024-06-15 13:57:15,738][1648984] Fps is (10 sec: 39321.2, 60 sec: 43699.8, 300 sec: 42431.8). Total num frames: 379846656. Throughput: 0: 10990.9. Samples: 95053824. 
Policy #0 lag: (min: 95.0, avg: 219.1, max: 306.0) [2024-06-15 13:57:15,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 13:57:16,775][1652475] Updated weights for policy 0, policy_version 185529 (0.0013) [2024-06-15 13:57:20,199][1652475] Updated weights for policy 0, policy_version 185584 (0.0045) [2024-06-15 13:57:20,738][1648984] Fps is (10 sec: 42598.5, 60 sec: 43690.7, 300 sec: 42542.9). Total num frames: 380108800. Throughput: 0: 11093.6. Samples: 95089152. Policy #0 lag: (min: 95.0, avg: 219.1, max: 306.0) [2024-06-15 13:57:20,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 13:57:21,638][1652475] Updated weights for policy 0, policy_version 185634 (0.0019) [2024-06-15 13:57:23,809][1652475] Updated weights for policy 0, policy_version 185717 (0.0013) [2024-06-15 13:57:25,738][1648984] Fps is (10 sec: 52429.2, 60 sec: 43690.8, 300 sec: 42989.0). Total num frames: 380370944. Throughput: 0: 10854.5. Samples: 95145472. Policy #0 lag: (min: 95.0, avg: 219.1, max: 306.0) [2024-06-15 13:57:25,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 13:57:27,954][1652475] Updated weights for policy 0, policy_version 185752 (0.0014) [2024-06-15 13:57:30,664][1652475] Updated weights for policy 0, policy_version 185798 (0.0013) [2024-06-15 13:57:30,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 43690.7, 300 sec: 42209.6). Total num frames: 380502016. Throughput: 0: 11153.4. Samples: 95221248. Policy #0 lag: (min: 95.0, avg: 219.1, max: 306.0) [2024-06-15 13:57:30,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 13:57:32,681][1652475] Updated weights for policy 0, policy_version 185872 (0.0013) [2024-06-15 13:57:33,752][1652475] Updated weights for policy 0, policy_version 185919 (0.0014) [2024-06-15 13:57:35,436][1652475] Updated weights for policy 0, policy_version 185972 (0.0012) [2024-06-15 13:57:35,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 43691.5, 300 sec: 43098.3). Total num frames: 380895232. 
Throughput: 0: 10945.4. Samples: 95245824. Policy #0 lag: (min: 95.0, avg: 219.1, max: 306.0) [2024-06-15 13:57:35,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 13:57:39,606][1652475] Updated weights for policy 0, policy_version 186016 (0.0012) [2024-06-15 13:57:40,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 44783.3, 300 sec: 42653.9). Total num frames: 381026304. Throughput: 0: 10877.1. Samples: 95312384. Policy #0 lag: (min: 95.0, avg: 219.1, max: 306.0) [2024-06-15 13:57:40,738][1648984] Avg episode reward: [(0, '-0.530')] [2024-06-15 13:57:41,995][1652475] Updated weights for policy 0, policy_version 186051 (0.0012) [2024-06-15 13:57:43,077][1652475] Updated weights for policy 0, policy_version 186106 (0.0013) [2024-06-15 13:57:45,738][1648984] Fps is (10 sec: 26214.4, 60 sec: 41515.1, 300 sec: 42209.6). Total num frames: 381157376. Throughput: 0: 11070.6. Samples: 95373824. Policy #0 lag: (min: 95.0, avg: 219.1, max: 306.0) [2024-06-15 13:57:45,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 13:57:47,683][1652475] Updated weights for policy 0, policy_version 186169 (0.0012) [2024-06-15 13:57:48,946][1652475] Updated weights for policy 0, policy_version 186211 (0.0046) [2024-06-15 13:57:50,663][1652475] Updated weights for policy 0, policy_version 186288 (0.0014) [2024-06-15 13:57:50,738][1648984] Fps is (10 sec: 49152.1, 60 sec: 45329.1, 300 sec: 42987.2). Total num frames: 381517824. Throughput: 0: 10911.3. Samples: 95410176. Policy #0 lag: (min: 95.0, avg: 219.1, max: 306.0) [2024-06-15 13:57:50,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 13:57:54,147][1652475] Updated weights for policy 0, policy_version 186324 (0.0012) [2024-06-15 13:57:55,738][1648984] Fps is (10 sec: 52426.8, 60 sec: 44782.7, 300 sec: 42987.1). Total num frames: 381681664. Throughput: 0: 10945.3. Samples: 95472128. 
Policy #0 lag: (min: 95.0, avg: 219.1, max: 306.0) [2024-06-15 13:57:55,739][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 13:57:58,669][1652475] Updated weights for policy 0, policy_version 186387 (0.0015) [2024-06-15 13:57:59,960][1651340] Signal inference workers to stop experience collection... (9600 times) [2024-06-15 13:58:00,028][1652475] InferenceWorker_p0-w0: stopping experience collection (9600 times) [2024-06-15 13:58:00,202][1651340] Signal inference workers to resume experience collection... (9600 times) [2024-06-15 13:58:00,203][1652475] InferenceWorker_p0-w0: resuming experience collection (9600 times) [2024-06-15 13:58:00,497][1652475] Updated weights for policy 0, policy_version 186464 (0.0014) [2024-06-15 13:58:00,738][1648984] Fps is (10 sec: 36044.8, 60 sec: 42598.4, 300 sec: 42542.9). Total num frames: 381878272. Throughput: 0: 10865.8. Samples: 95542784. Policy #0 lag: (min: 95.0, avg: 219.1, max: 306.0) [2024-06-15 13:58:00,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 13:58:03,172][1652475] Updated weights for policy 0, policy_version 186544 (0.0015) [2024-06-15 13:58:05,738][1648984] Fps is (10 sec: 39322.8, 60 sec: 43690.6, 300 sec: 43098.3). Total num frames: 382074880. Throughput: 0: 10695.1. Samples: 95570432. Policy #0 lag: (min: 95.0, avg: 219.1, max: 306.0) [2024-06-15 13:58:05,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 13:58:07,103][1652475] Updated weights for policy 0, policy_version 186623 (0.0037) [2024-06-15 13:58:10,738][1648984] Fps is (10 sec: 39321.3, 60 sec: 43144.5, 300 sec: 42542.8). Total num frames: 382271488. Throughput: 0: 10808.9. Samples: 95631872. 
Policy #0 lag: (min: 0.0, avg: 137.0, max: 256.0) [2024-06-15 13:58:10,739][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 13:58:11,071][1652475] Updated weights for policy 0, policy_version 186674 (0.0014) [2024-06-15 13:58:12,308][1652475] Updated weights for policy 0, policy_version 186720 (0.0027) [2024-06-15 13:58:15,738][1648984] Fps is (10 sec: 45873.8, 60 sec: 44782.7, 300 sec: 42876.1). Total num frames: 382533632. Throughput: 0: 10672.3. Samples: 95701504. Policy #0 lag: (min: 0.0, avg: 137.0, max: 256.0) [2024-06-15 13:58:15,739][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 13:58:16,307][1652475] Updated weights for policy 0, policy_version 186808 (0.0013) [2024-06-15 13:58:20,209][1652475] Updated weights for policy 0, policy_version 186874 (0.0068) [2024-06-15 13:58:20,738][1648984] Fps is (10 sec: 45875.7, 60 sec: 43690.7, 300 sec: 43098.2). Total num frames: 382730240. Throughput: 0: 10820.3. Samples: 95732736. Policy #0 lag: (min: 0.0, avg: 137.0, max: 256.0) [2024-06-15 13:58:20,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 13:58:23,518][1652475] Updated weights for policy 0, policy_version 186944 (0.0017) [2024-06-15 13:58:25,738][1648984] Fps is (10 sec: 45876.5, 60 sec: 43690.6, 300 sec: 42876.3). Total num frames: 382992384. Throughput: 0: 10626.8. Samples: 95790592. Policy #0 lag: (min: 0.0, avg: 137.0, max: 256.0) [2024-06-15 13:58:25,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 13:58:27,442][1652475] Updated weights for policy 0, policy_version 187024 (0.0014) [2024-06-15 13:58:30,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 43690.7, 300 sec: 43098.3). Total num frames: 383123456. Throughput: 0: 10831.6. Samples: 95861248. 
Policy #0 lag: (min: 0.0, avg: 137.0, max: 256.0) [2024-06-15 13:58:30,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 13:58:31,470][1652475] Updated weights for policy 0, policy_version 187075 (0.0013) [2024-06-15 13:58:34,786][1652475] Updated weights for policy 0, policy_version 187176 (0.0021) [2024-06-15 13:58:35,738][1648984] Fps is (10 sec: 42598.8, 60 sec: 42052.2, 300 sec: 42987.2). Total num frames: 383418368. Throughput: 0: 10729.2. Samples: 95892992. Policy #0 lag: (min: 0.0, avg: 137.0, max: 256.0) [2024-06-15 13:58:35,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 13:58:36,073][1652475] Updated weights for policy 0, policy_version 187235 (0.0018) [2024-06-15 13:58:39,577][1652475] Updated weights for policy 0, policy_version 187299 (0.0017) [2024-06-15 13:58:40,738][1648984] Fps is (10 sec: 52428.3, 60 sec: 43690.6, 300 sec: 43098.2). Total num frames: 383647744. Throughput: 0: 10934.1. Samples: 95964160. Policy #0 lag: (min: 0.0, avg: 137.0, max: 256.0) [2024-06-15 13:58:40,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 13:58:43,552][1652475] Updated weights for policy 0, policy_version 187344 (0.0013) [2024-06-15 13:58:44,599][1652475] Updated weights for policy 0, policy_version 187392 (0.0012) [2024-06-15 13:58:45,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 44236.8, 300 sec: 43098.3). Total num frames: 383811584. Throughput: 0: 10911.3. Samples: 96033792. Policy #0 lag: (min: 0.0, avg: 137.0, max: 256.0) [2024-06-15 13:58:45,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 13:58:47,066][1651340] Signal inference workers to stop experience collection... (9650 times) [2024-06-15 13:58:47,108][1652475] InferenceWorker_p0-w0: stopping experience collection (9650 times) [2024-06-15 13:58:47,329][1651340] Signal inference workers to resume experience collection... 
(9650 times) [2024-06-15 13:58:47,330][1652475] InferenceWorker_p0-w0: resuming experience collection (9650 times) [2024-06-15 13:58:47,544][1652475] Updated weights for policy 0, policy_version 187496 (0.0021) [2024-06-15 13:58:50,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 42052.2, 300 sec: 42987.7). Total num frames: 384040960. Throughput: 0: 10888.5. Samples: 96060416. Policy #0 lag: (min: 0.0, avg: 137.0, max: 256.0) [2024-06-15 13:58:50,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 13:58:51,141][1652475] Updated weights for policy 0, policy_version 187552 (0.0013) [2024-06-15 13:58:54,333][1652475] Updated weights for policy 0, policy_version 187585 (0.0012) [2024-06-15 13:58:55,123][1652475] Updated weights for policy 0, policy_version 187636 (0.0017) [2024-06-15 13:58:55,738][1648984] Fps is (10 sec: 49151.8, 60 sec: 43690.9, 300 sec: 43431.5). Total num frames: 384303104. Throughput: 0: 11275.4. Samples: 96139264. Policy #0 lag: (min: 0.0, avg: 137.0, max: 256.0) [2024-06-15 13:58:55,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 13:58:55,742][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000187648_384303104.pth... [2024-06-15 13:58:55,786][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000182592_373948416.pth [2024-06-15 13:58:57,220][1652475] Updated weights for policy 0, policy_version 187680 (0.0066) [2024-06-15 13:58:59,130][1652475] Updated weights for policy 0, policy_version 187765 (0.0079) [2024-06-15 13:59:00,738][1648984] Fps is (10 sec: 52429.3, 60 sec: 44783.0, 300 sec: 43098.3). Total num frames: 384565248. Throughput: 0: 11082.1. Samples: 96200192. 
Policy #0 lag: (min: 0.0, avg: 137.0, max: 256.0) [2024-06-15 13:59:00,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 13:59:03,100][1652475] Updated weights for policy 0, policy_version 187830 (0.0014) [2024-06-15 13:59:05,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 43690.6, 300 sec: 43098.2). Total num frames: 384696320. Throughput: 0: 11172.9. Samples: 96235520. Policy #0 lag: (min: 0.0, avg: 137.0, max: 256.0) [2024-06-15 13:59:05,739][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 13:59:06,252][1652475] Updated weights for policy 0, policy_version 187856 (0.0020) [2024-06-15 13:59:08,493][1652475] Updated weights for policy 0, policy_version 187939 (0.0088) [2024-06-15 13:59:09,142][1652475] Updated weights for policy 0, policy_version 187967 (0.0051) [2024-06-15 13:59:10,706][1652475] Updated weights for policy 0, policy_version 188029 (0.0081) [2024-06-15 13:59:10,738][1648984] Fps is (10 sec: 49151.5, 60 sec: 46421.3, 300 sec: 43431.5). Total num frames: 385056768. Throughput: 0: 11468.8. Samples: 96306688. Policy #0 lag: (min: 0.0, avg: 137.0, max: 256.0) [2024-06-15 13:59:10,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 13:59:14,080][1652475] Updated weights for policy 0, policy_version 188080 (0.0012) [2024-06-15 13:59:15,738][1648984] Fps is (10 sec: 52429.6, 60 sec: 44783.2, 300 sec: 43209.3). Total num frames: 385220608. Throughput: 0: 11537.1. Samples: 96380416. Policy #0 lag: (min: 15.0, avg: 141.0, max: 271.0) [2024-06-15 13:59:15,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 13:59:17,247][1652475] Updated weights for policy 0, policy_version 188112 (0.0011) [2024-06-15 13:59:18,082][1652475] Updated weights for policy 0, policy_version 188159 (0.0013) [2024-06-15 13:59:19,684][1652475] Updated weights for policy 0, policy_version 188215 (0.0013) [2024-06-15 13:59:20,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 45875.1, 300 sec: 43542.6). Total num frames: 385482752. 
Throughput: 0: 11696.3. Samples: 96419328. Policy #0 lag: (min: 15.0, avg: 141.0, max: 271.0) [2024-06-15 13:59:20,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 13:59:21,624][1652475] Updated weights for policy 0, policy_version 188263 (0.0126) [2024-06-15 13:59:24,461][1652475] Updated weights for policy 0, policy_version 188324 (0.0016) [2024-06-15 13:59:25,738][1648984] Fps is (10 sec: 52428.4, 60 sec: 45875.2, 300 sec: 43542.6). Total num frames: 385744896. Throughput: 0: 11525.7. Samples: 96482816. Policy #0 lag: (min: 15.0, avg: 141.0, max: 271.0) [2024-06-15 13:59:25,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 13:59:29,121][1652475] Updated weights for policy 0, policy_version 188385 (0.0015) [2024-06-15 13:59:30,738][1648984] Fps is (10 sec: 42598.8, 60 sec: 46421.3, 300 sec: 43653.6). Total num frames: 385908736. Throughput: 0: 11707.7. Samples: 96560640. Policy #0 lag: (min: 15.0, avg: 141.0, max: 271.0) [2024-06-15 13:59:30,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 13:59:31,537][1652475] Updated weights for policy 0, policy_version 188474 (0.0017) [2024-06-15 13:59:33,613][1652475] Updated weights for policy 0, policy_version 188544 (0.0017) [2024-06-15 13:59:34,294][1651340] Signal inference workers to stop experience collection... (9700 times) [2024-06-15 13:59:34,356][1652475] InferenceWorker_p0-w0: stopping experience collection (9700 times) [2024-06-15 13:59:34,467][1651340] Signal inference workers to resume experience collection... (9700 times) [2024-06-15 13:59:34,467][1652475] InferenceWorker_p0-w0: resuming experience collection (9700 times) [2024-06-15 13:59:35,306][1652475] Updated weights for policy 0, policy_version 188608 (0.0013) [2024-06-15 13:59:35,738][1648984] Fps is (10 sec: 52429.1, 60 sec: 47513.6, 300 sec: 43875.8). Total num frames: 386269184. Throughput: 0: 11673.6. Samples: 96585728. 
Policy #0 lag: (min: 15.0, avg: 141.0, max: 271.0) [2024-06-15 13:59:35,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 13:59:40,738][1648984] Fps is (10 sec: 42598.6, 60 sec: 44783.0, 300 sec: 43320.4). Total num frames: 386334720. Throughput: 0: 11673.6. Samples: 96664576. Policy #0 lag: (min: 15.0, avg: 141.0, max: 271.0) [2024-06-15 13:59:40,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 13:59:42,010][1652475] Updated weights for policy 0, policy_version 188675 (0.0029) [2024-06-15 13:59:43,485][1652475] Updated weights for policy 0, policy_version 188734 (0.0013) [2024-06-15 13:59:45,186][1652475] Updated weights for policy 0, policy_version 188799 (0.0013) [2024-06-15 13:59:45,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 47513.6, 300 sec: 43986.9). Total num frames: 386662400. Throughput: 0: 11582.6. Samples: 96721408. Policy #0 lag: (min: 15.0, avg: 141.0, max: 271.0) [2024-06-15 13:59:45,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 13:59:50,738][1648984] Fps is (10 sec: 45875.1, 60 sec: 45875.3, 300 sec: 43542.6). Total num frames: 386793472. Throughput: 0: 11491.6. Samples: 96752640. Policy #0 lag: (min: 15.0, avg: 141.0, max: 271.0) [2024-06-15 13:59:50,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 13:59:52,234][1652475] Updated weights for policy 0, policy_version 188912 (0.0148) [2024-06-15 13:59:55,740][1648984] Fps is (10 sec: 29491.1, 60 sec: 44236.8, 300 sec: 43653.6). Total num frames: 386957312. Throughput: 0: 11480.2. Samples: 96823296. Policy #0 lag: (min: 15.0, avg: 141.0, max: 271.0) [2024-06-15 13:59:55,741][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 13:59:57,531][1652475] Updated weights for policy 0, policy_version 189024 (0.0014) [2024-06-15 13:59:59,642][1652475] Updated weights for policy 0, policy_version 189112 (0.0015) [2024-06-15 14:00:00,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 45875.2, 300 sec: 43764.7). Total num frames: 387317760. 
Throughput: 0: 11013.7. Samples: 96876032. Policy #0 lag: (min: 15.0, avg: 141.0, max: 271.0) [2024-06-15 14:00:00,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 14:00:03,458][1652475] Updated weights for policy 0, policy_version 189168 (0.0015) [2024-06-15 14:00:05,738][1648984] Fps is (10 sec: 49152.1, 60 sec: 45875.3, 300 sec: 43542.6). Total num frames: 387448832. Throughput: 0: 11047.8. Samples: 96916480. Policy #0 lag: (min: 15.0, avg: 141.0, max: 271.0) [2024-06-15 14:00:05,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 14:00:09,197][1652475] Updated weights for policy 0, policy_version 189235 (0.0013) [2024-06-15 14:00:10,738][1648984] Fps is (10 sec: 32767.5, 60 sec: 43144.5, 300 sec: 43764.7). Total num frames: 387645440. Throughput: 0: 11127.4. Samples: 96983552. Policy #0 lag: (min: 15.0, avg: 141.0, max: 271.0) [2024-06-15 14:00:10,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 14:00:11,820][1652475] Updated weights for policy 0, policy_version 189328 (0.0063) [2024-06-15 14:00:14,973][1652475] Updated weights for policy 0, policy_version 189385 (0.0011) [2024-06-15 14:00:15,739][1648984] Fps is (10 sec: 45870.2, 60 sec: 44782.1, 300 sec: 43431.3). Total num frames: 387907584. Throughput: 0: 10524.2. Samples: 97034240. Policy #0 lag: (min: 15.0, avg: 141.0, max: 271.0) [2024-06-15 14:00:15,740][1648984] Avg episode reward: [(0, '-0.550')] [2024-06-15 14:00:16,103][1652475] Updated weights for policy 0, policy_version 189436 (0.0013) [2024-06-15 14:00:20,738][1648984] Fps is (10 sec: 32768.1, 60 sec: 41506.1, 300 sec: 43542.5). Total num frames: 387973120. Throughput: 0: 10683.7. Samples: 97066496. Policy #0 lag: (min: 15.0, avg: 141.0, max: 271.0) [2024-06-15 14:00:20,738][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 14:00:23,599][1652475] Updated weights for policy 0, policy_version 189520 (0.0012) [2024-06-15 14:00:23,757][1651340] Signal inference workers to stop experience collection... 
(9750 times) [2024-06-15 14:00:23,846][1652475] InferenceWorker_p0-w0: stopping experience collection (9750 times) [2024-06-15 14:00:24,045][1651340] Signal inference workers to resume experience collection... (9750 times) [2024-06-15 14:00:24,046][1652475] InferenceWorker_p0-w0: resuming experience collection (9750 times) [2024-06-15 14:00:25,738][1648984] Fps is (10 sec: 36048.7, 60 sec: 42052.3, 300 sec: 43764.7). Total num frames: 388268032. Throughput: 0: 10194.5. Samples: 97123328. Policy #0 lag: (min: 13.0, avg: 63.9, max: 253.0) [2024-06-15 14:00:25,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 14:00:25,761][1652475] Updated weights for policy 0, policy_version 189600 (0.0146) [2024-06-15 14:00:30,010][1652475] Updated weights for policy 0, policy_version 189664 (0.0040) [2024-06-15 14:00:30,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 43144.5, 300 sec: 43542.6). Total num frames: 388497408. Throughput: 0: 10342.4. Samples: 97186816. Policy #0 lag: (min: 13.0, avg: 63.9, max: 253.0) [2024-06-15 14:00:30,738][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 14:00:33,440][1652475] Updated weights for policy 0, policy_version 189712 (0.0013) [2024-06-15 14:00:34,747][1652475] Updated weights for policy 0, policy_version 189760 (0.0018) [2024-06-15 14:00:35,738][1648984] Fps is (10 sec: 42598.2, 60 sec: 40413.8, 300 sec: 43764.7). Total num frames: 388694016. Throughput: 0: 10490.3. Samples: 97224704. Policy #0 lag: (min: 13.0, avg: 63.9, max: 253.0) [2024-06-15 14:00:35,739][1648984] Avg episode reward: [(0, '-0.520')] [2024-06-15 14:00:36,420][1652475] Updated weights for policy 0, policy_version 189821 (0.0022) [2024-06-15 14:00:37,732][1652475] Updated weights for policy 0, policy_version 189880 (0.0016) [2024-06-15 14:00:40,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 42598.4, 300 sec: 43209.3). Total num frames: 388890624. Throughput: 0: 10285.5. Samples: 97286144. 
Policy #0 lag: (min: 13.0, avg: 63.9, max: 253.0) [2024-06-15 14:00:40,738][1648984] Avg episode reward: [(0, '-0.520')] [2024-06-15 14:00:42,396][1652475] Updated weights for policy 0, policy_version 189926 (0.0095) [2024-06-15 14:00:45,070][1652475] Updated weights for policy 0, policy_version 189972 (0.0012) [2024-06-15 14:00:45,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 40413.8, 300 sec: 43764.7). Total num frames: 389087232. Throughput: 0: 10752.0. Samples: 97359872. Policy #0 lag: (min: 13.0, avg: 63.9, max: 253.0) [2024-06-15 14:00:45,738][1648984] Avg episode reward: [(0, '-0.520')] [2024-06-15 14:00:46,315][1652475] Updated weights for policy 0, policy_version 190017 (0.0012) [2024-06-15 14:00:48,246][1652475] Updated weights for policy 0, policy_version 190096 (0.0111) [2024-06-15 14:00:50,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 43690.7, 300 sec: 43986.9). Total num frames: 389414912. Throughput: 0: 10433.4. Samples: 97385984. Policy #0 lag: (min: 13.0, avg: 63.9, max: 253.0) [2024-06-15 14:00:50,738][1648984] Avg episode reward: [(0, '-0.520')] [2024-06-15 14:00:53,687][1652475] Updated weights for policy 0, policy_version 190148 (0.0015) [2024-06-15 14:00:55,012][1652475] Updated weights for policy 0, policy_version 190208 (0.0012) [2024-06-15 14:00:55,738][1648984] Fps is (10 sec: 45875.3, 60 sec: 43144.5, 300 sec: 43542.6). Total num frames: 389545984. Throughput: 0: 10478.9. Samples: 97455104. Policy #0 lag: (min: 13.0, avg: 63.9, max: 253.0) [2024-06-15 14:00:55,738][1648984] Avg episode reward: [(0, '-0.520')] [2024-06-15 14:00:55,744][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000190208_389545984.pth... 
[2024-06-15 14:00:55,814][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000185072_379027456.pth [2024-06-15 14:00:58,326][1652475] Updated weights for policy 0, policy_version 190270 (0.0109) [2024-06-15 14:00:59,585][1652475] Updated weights for policy 0, policy_version 190320 (0.0151) [2024-06-15 14:01:00,738][1648984] Fps is (10 sec: 45874.0, 60 sec: 42598.2, 300 sec: 44431.1). Total num frames: 389873664. Throughput: 0: 10718.1. Samples: 97516544. Policy #0 lag: (min: 13.0, avg: 63.9, max: 253.0) [2024-06-15 14:01:00,738][1648984] Avg episode reward: [(0, '-0.520')] [2024-06-15 14:01:01,218][1652475] Updated weights for policy 0, policy_version 190389 (0.0015) [2024-06-15 14:01:05,738][1648984] Fps is (10 sec: 39320.0, 60 sec: 41505.8, 300 sec: 43431.5). Total num frames: 389939200. Throughput: 0: 10808.8. Samples: 97552896. Policy #0 lag: (min: 13.0, avg: 63.9, max: 253.0) [2024-06-15 14:01:05,739][1648984] Avg episode reward: [(0, '-0.520')] [2024-06-15 14:01:06,135][1652475] Updated weights for policy 0, policy_version 190418 (0.0011) [2024-06-15 14:01:06,546][1651340] Signal inference workers to stop experience collection... (9800 times) [2024-06-15 14:01:06,576][1652475] InferenceWorker_p0-w0: stopping experience collection (9800 times) [2024-06-15 14:01:06,823][1651340] Signal inference workers to resume experience collection... (9800 times) [2024-06-15 14:01:06,824][1652475] InferenceWorker_p0-w0: resuming experience collection (9800 times) [2024-06-15 14:01:07,206][1652475] Updated weights for policy 0, policy_version 190464 (0.0012) [2024-06-15 14:01:10,035][1652475] Updated weights for policy 0, policy_version 190519 (0.0013) [2024-06-15 14:01:10,739][1648984] Fps is (10 sec: 32762.8, 60 sec: 42597.2, 300 sec: 43988.5). Total num frames: 390201344. Throughput: 0: 10990.5. Samples: 97617920. 
Policy #0 lag: (min: 13.0, avg: 63.9, max: 253.0) [2024-06-15 14:01:10,740][1648984] Avg episode reward: [(0, '-0.520')] [2024-06-15 14:01:12,778][1652475] Updated weights for policy 0, policy_version 190608 (0.0116) [2024-06-15 14:01:15,738][1648984] Fps is (10 sec: 52431.0, 60 sec: 42599.1, 300 sec: 43986.9). Total num frames: 390463488. Throughput: 0: 10888.5. Samples: 97676800. Policy #0 lag: (min: 13.0, avg: 63.9, max: 253.0) [2024-06-15 14:01:15,738][1648984] Avg episode reward: [(0, '-0.520')] [2024-06-15 14:01:18,291][1652475] Updated weights for policy 0, policy_version 190659 (0.0011) [2024-06-15 14:01:19,744][1652475] Updated weights for policy 0, policy_version 190720 (0.0012) [2024-06-15 14:01:20,738][1648984] Fps is (10 sec: 42605.7, 60 sec: 44236.8, 300 sec: 43653.6). Total num frames: 390627328. Throughput: 0: 10968.2. Samples: 97718272. Policy #0 lag: (min: 13.0, avg: 63.9, max: 253.0) [2024-06-15 14:01:20,738][1648984] Avg episode reward: [(0, '-0.520')] [2024-06-15 14:01:21,617][1652475] Updated weights for policy 0, policy_version 190782 (0.0116) [2024-06-15 14:01:24,190][1652475] Updated weights for policy 0, policy_version 190832 (0.0014) [2024-06-15 14:01:25,738][1648984] Fps is (10 sec: 49149.8, 60 sec: 44782.6, 300 sec: 44320.0). Total num frames: 390955008. Throughput: 0: 10877.0. Samples: 97775616. Policy #0 lag: (min: 13.0, avg: 63.9, max: 253.0) [2024-06-15 14:01:25,739][1648984] Avg episode reward: [(0, '-0.520')] [2024-06-15 14:01:30,530][1652475] Updated weights for policy 0, policy_version 190913 (0.0013) [2024-06-15 14:01:30,738][1648984] Fps is (10 sec: 36044.3, 60 sec: 41506.0, 300 sec: 43098.4). Total num frames: 390987776. Throughput: 0: 10820.2. Samples: 97846784. 
Policy #0 lag: (min: 13.0, avg: 63.9, max: 253.0) [2024-06-15 14:01:30,739][1648984] Avg episode reward: [(0, '-0.520')] [2024-06-15 14:01:32,951][1652475] Updated weights for policy 0, policy_version 190992 (0.0113) [2024-06-15 14:01:35,292][1652475] Updated weights for policy 0, policy_version 191043 (0.0026) [2024-06-15 14:01:35,738][1648984] Fps is (10 sec: 32769.6, 60 sec: 43144.6, 300 sec: 43875.9). Total num frames: 391282688. Throughput: 0: 10831.6. Samples: 97873408. Policy #0 lag: (min: 7.0, avg: 86.6, max: 263.0) [2024-06-15 14:01:35,738][1648984] Avg episode reward: [(0, '-0.520')] [2024-06-15 14:01:37,263][1652475] Updated weights for policy 0, policy_version 191136 (0.0013) [2024-06-15 14:01:40,738][1648984] Fps is (10 sec: 52429.8, 60 sec: 43690.6, 300 sec: 43544.5). Total num frames: 391512064. Throughput: 0: 10649.6. Samples: 97934336. Policy #0 lag: (min: 7.0, avg: 86.6, max: 263.0) [2024-06-15 14:01:40,738][1648984] Avg episode reward: [(0, '-0.520')] [2024-06-15 14:01:43,094][1652475] Updated weights for policy 0, policy_version 191200 (0.0014) [2024-06-15 14:01:45,356][1652475] Updated weights for policy 0, policy_version 191265 (0.0016) [2024-06-15 14:01:45,738][1648984] Fps is (10 sec: 45875.5, 60 sec: 44236.9, 300 sec: 43875.8). Total num frames: 391741440. Throughput: 0: 10865.8. Samples: 98005504. Policy #0 lag: (min: 7.0, avg: 86.6, max: 263.0) [2024-06-15 14:01:45,738][1648984] Avg episode reward: [(0, '-0.520')] [2024-06-15 14:01:47,346][1652475] Updated weights for policy 0, policy_version 191298 (0.0044) [2024-06-15 14:01:48,827][1652475] Updated weights for policy 0, policy_version 191362 (0.0013) [2024-06-15 14:01:50,738][1648984] Fps is (10 sec: 52429.4, 60 sec: 43690.7, 300 sec: 44209.0). Total num frames: 392036352. Throughput: 0: 10809.0. Samples: 98039296. 
Policy #0 lag: (min: 7.0, avg: 86.6, max: 263.0) [2024-06-15 14:01:50,738][1648984] Avg episode reward: [(0, '-0.520')] [2024-06-15 14:01:54,410][1651340] Signal inference workers to stop experience collection... (9850 times) [2024-06-15 14:01:54,451][1652475] InferenceWorker_p0-w0: stopping experience collection (9850 times) [2024-06-15 14:01:54,583][1651340] Signal inference workers to resume experience collection... (9850 times) [2024-06-15 14:01:54,583][1652475] InferenceWorker_p0-w0: resuming experience collection (9850 times) [2024-06-15 14:01:54,585][1652475] Updated weights for policy 0, policy_version 191440 (0.0027) [2024-06-15 14:01:55,739][1648984] Fps is (10 sec: 42591.6, 60 sec: 43689.6, 300 sec: 43542.3). Total num frames: 392167424. Throughput: 0: 10922.7. Samples: 98109440. Policy #0 lag: (min: 7.0, avg: 86.6, max: 263.0) [2024-06-15 14:01:55,740][1648984] Avg episode reward: [(0, '-0.520')] [2024-06-15 14:01:57,407][1652475] Updated weights for policy 0, policy_version 191536 (0.0166) [2024-06-15 14:01:59,438][1652475] Updated weights for policy 0, policy_version 191570 (0.0015) [2024-06-15 14:02:00,738][1648984] Fps is (10 sec: 42597.8, 60 sec: 43144.6, 300 sec: 44097.9). Total num frames: 392462336. Throughput: 0: 10911.3. Samples: 98167808. Policy #0 lag: (min: 7.0, avg: 86.6, max: 263.0) [2024-06-15 14:02:00,738][1648984] Avg episode reward: [(0, '-0.520')] [2024-06-15 14:02:01,495][1652475] Updated weights for policy 0, policy_version 191672 (0.0015) [2024-06-15 14:02:05,741][1648984] Fps is (10 sec: 39314.6, 60 sec: 43688.6, 300 sec: 43653.1). Total num frames: 392560640. Throughput: 0: 10796.7. Samples: 98204160. 
Policy #0 lag: (min: 7.0, avg: 86.6, max: 263.0) [2024-06-15 14:02:05,741][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 14:02:07,251][1652475] Updated weights for policy 0, policy_version 191741 (0.0127) [2024-06-15 14:02:09,904][1652475] Updated weights for policy 0, policy_version 191799 (0.0015) [2024-06-15 14:02:10,738][1648984] Fps is (10 sec: 36043.0, 60 sec: 43691.6, 300 sec: 43986.8). Total num frames: 392822784. Throughput: 0: 11002.3. Samples: 98270720. Policy #0 lag: (min: 7.0, avg: 86.6, max: 263.0) [2024-06-15 14:02:10,739][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 14:02:12,625][1652475] Updated weights for policy 0, policy_version 191872 (0.0013) [2024-06-15 14:02:14,357][1652475] Updated weights for policy 0, policy_version 191936 (0.0013) [2024-06-15 14:02:15,760][1648984] Fps is (10 sec: 52332.0, 60 sec: 43674.8, 300 sec: 43983.6). Total num frames: 393084928. Throughput: 0: 10587.6. Samples: 98323456. Policy #0 lag: (min: 7.0, avg: 86.6, max: 263.0) [2024-06-15 14:02:15,760][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 14:02:19,290][1652475] Updated weights for policy 0, policy_version 191993 (0.0012) [2024-06-15 14:02:20,738][1648984] Fps is (10 sec: 39323.9, 60 sec: 43144.6, 300 sec: 43542.6). Total num frames: 393216000. Throughput: 0: 10706.5. Samples: 98355200. Policy #0 lag: (min: 7.0, avg: 86.6, max: 263.0) [2024-06-15 14:02:20,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 14:02:24,178][1652475] Updated weights for policy 0, policy_version 192049 (0.0012) [2024-06-15 14:02:25,738][1648984] Fps is (10 sec: 36123.6, 60 sec: 41506.5, 300 sec: 43875.8). Total num frames: 393445376. Throughput: 0: 10740.6. Samples: 98417664. 
Policy #0 lag: (min: 7.0, avg: 86.6, max: 263.0) [2024-06-15 14:02:25,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 14:02:27,587][1652475] Updated weights for policy 0, policy_version 192129 (0.0011) [2024-06-15 14:02:30,157][1652475] Updated weights for policy 0, policy_version 192208 (0.0014) [2024-06-15 14:02:30,738][1648984] Fps is (10 sec: 45874.7, 60 sec: 44783.0, 300 sec: 43320.4). Total num frames: 393674752. Throughput: 0: 10456.1. Samples: 98476032. Policy #0 lag: (min: 7.0, avg: 86.6, max: 263.0) [2024-06-15 14:02:30,739][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 14:02:35,738][1648984] Fps is (10 sec: 29491.2, 60 sec: 40960.0, 300 sec: 43098.2). Total num frames: 393740288. Throughput: 0: 10490.3. Samples: 98511360. Policy #0 lag: (min: 7.0, avg: 86.6, max: 263.0) [2024-06-15 14:02:35,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 14:02:36,449][1652475] Updated weights for policy 0, policy_version 192292 (0.0016) [2024-06-15 14:02:39,919][1652475] Updated weights for policy 0, policy_version 192400 (0.0013) [2024-06-15 14:02:40,010][1651340] Signal inference workers to stop experience collection... (9900 times) [2024-06-15 14:02:40,111][1652475] InferenceWorker_p0-w0: stopping experience collection (9900 times) [2024-06-15 14:02:40,201][1651340] Signal inference workers to resume experience collection... (9900 times) [2024-06-15 14:02:40,210][1652475] InferenceWorker_p0-w0: resuming experience collection (9900 times) [2024-06-15 14:02:40,738][1648984] Fps is (10 sec: 42598.5, 60 sec: 43144.5, 300 sec: 43875.8). Total num frames: 394100736. Throughput: 0: 10240.3. Samples: 98570240. Policy #0 lag: (min: 79.0, avg: 137.9, max: 319.0) [2024-06-15 14:02:40,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 14:02:40,874][1652475] Updated weights for policy 0, policy_version 192447 (0.0014) [2024-06-15 14:02:45,738][1648984] Fps is (10 sec: 52428.0, 60 sec: 42052.1, 300 sec: 43209.3). 
Total num frames: 394264576. Throughput: 0: 10399.3. Samples: 98635776. Policy #0 lag: (min: 79.0, avg: 137.9, max: 319.0) [2024-06-15 14:02:45,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 14:02:47,596][1652475] Updated weights for policy 0, policy_version 192513 (0.0015) [2024-06-15 14:02:48,742][1652475] Updated weights for policy 0, policy_version 192576 (0.0012) [2024-06-15 14:02:50,088][1652475] Updated weights for policy 0, policy_version 192624 (0.0012) [2024-06-15 14:02:50,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 41506.0, 300 sec: 43542.6). Total num frames: 394526720. Throughput: 0: 10445.5. Samples: 98674176. Policy #0 lag: (min: 79.0, avg: 137.9, max: 319.0) [2024-06-15 14:02:50,738][1648984] Avg episode reward: [(0, '-0.530')] [2024-06-15 14:02:51,435][1652475] Updated weights for policy 0, policy_version 192690 (0.0012) [2024-06-15 14:02:55,738][1648984] Fps is (10 sec: 42597.9, 60 sec: 42053.1, 300 sec: 43431.4). Total num frames: 394690560. Throughput: 0: 10456.2. Samples: 98741248. Policy #0 lag: (min: 79.0, avg: 137.9, max: 319.0) [2024-06-15 14:02:55,738][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 14:02:56,161][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000192736_394723328.pth... [2024-06-15 14:02:56,296][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000187648_384303104.pth [2024-06-15 14:02:56,770][1652475] Updated weights for policy 0, policy_version 192759 (0.0013) [2024-06-15 14:03:00,301][1652475] Updated weights for policy 0, policy_version 192822 (0.0014) [2024-06-15 14:03:00,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 40960.0, 300 sec: 43542.6). Total num frames: 394919936. Throughput: 0: 10768.6. Samples: 98807808. 
Policy #0 lag: (min: 79.0, avg: 137.9, max: 319.0) [2024-06-15 14:03:00,738][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 14:03:02,458][1652475] Updated weights for policy 0, policy_version 192912 (0.0014) [2024-06-15 14:03:03,495][1652475] Updated weights for policy 0, policy_version 192959 (0.0014) [2024-06-15 14:03:05,738][1648984] Fps is (10 sec: 49153.4, 60 sec: 43693.1, 300 sec: 43764.7). Total num frames: 395182080. Throughput: 0: 10649.6. Samples: 98834432. Policy #0 lag: (min: 79.0, avg: 137.9, max: 319.0) [2024-06-15 14:03:05,738][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 14:03:08,306][1652475] Updated weights for policy 0, policy_version 193008 (0.0015) [2024-06-15 14:03:10,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 41506.5, 300 sec: 43320.5). Total num frames: 395313152. Throughput: 0: 10922.7. Samples: 98909184. Policy #0 lag: (min: 79.0, avg: 137.9, max: 319.0) [2024-06-15 14:03:10,738][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 14:03:12,156][1652475] Updated weights for policy 0, policy_version 193077 (0.0017) [2024-06-15 14:03:14,629][1652475] Updated weights for policy 0, policy_version 193185 (0.0015) [2024-06-15 14:03:15,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 43706.5, 300 sec: 43986.9). Total num frames: 395706368. Throughput: 0: 10843.0. Samples: 98963968. Policy #0 lag: (min: 79.0, avg: 137.9, max: 319.0) [2024-06-15 14:03:15,738][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 14:03:20,102][1652475] Updated weights for policy 0, policy_version 193248 (0.0015) [2024-06-15 14:03:20,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 395837440. Throughput: 0: 10979.6. Samples: 99005440. 
Policy #0 lag: (min: 79.0, avg: 137.9, max: 319.0) [2024-06-15 14:03:20,738][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 14:03:23,922][1652475] Updated weights for policy 0, policy_version 193316 (0.0013) [2024-06-15 14:03:24,908][1651340] Signal inference workers to stop experience collection... (9950 times) [2024-06-15 14:03:24,964][1652475] InferenceWorker_p0-w0: stopping experience collection (9950 times) [2024-06-15 14:03:25,117][1651340] Signal inference workers to resume experience collection... (9950 times) [2024-06-15 14:03:25,130][1652475] InferenceWorker_p0-w0: resuming experience collection (9950 times) [2024-06-15 14:03:25,738][1648984] Fps is (10 sec: 36044.7, 60 sec: 43690.6, 300 sec: 43875.8). Total num frames: 396066816. Throughput: 0: 11229.9. Samples: 99075584. Policy #0 lag: (min: 79.0, avg: 137.9, max: 319.0) [2024-06-15 14:03:25,738][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 14:03:26,236][1652475] Updated weights for policy 0, policy_version 193424 (0.0016) [2024-06-15 14:03:30,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 42598.4, 300 sec: 43431.5). Total num frames: 396230656. Throughput: 0: 11264.0. Samples: 99142656. Policy #0 lag: (min: 79.0, avg: 137.9, max: 319.0) [2024-06-15 14:03:30,738][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 14:03:31,488][1652475] Updated weights for policy 0, policy_version 193488 (0.0013) [2024-06-15 14:03:34,560][1652475] Updated weights for policy 0, policy_version 193552 (0.0013) [2024-06-15 14:03:35,738][1648984] Fps is (10 sec: 42598.6, 60 sec: 45875.2, 300 sec: 43542.6). Total num frames: 396492800. Throughput: 0: 11173.0. Samples: 99176960. 
Policy #0 lag: (min: 79.0, avg: 137.9, max: 319.0) [2024-06-15 14:03:35,738][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 14:03:36,004][1652475] Updated weights for policy 0, policy_version 193619 (0.0013) [2024-06-15 14:03:37,487][1652475] Updated weights for policy 0, policy_version 193680 (0.0015) [2024-06-15 14:03:40,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 44236.8, 300 sec: 43875.8). Total num frames: 396754944. Throughput: 0: 11025.1. Samples: 99237376. Policy #0 lag: (min: 79.0, avg: 137.9, max: 319.0) [2024-06-15 14:03:40,738][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 14:03:42,537][1652475] Updated weights for policy 0, policy_version 193729 (0.0090) [2024-06-15 14:03:43,540][1652475] Updated weights for policy 0, policy_version 193788 (0.0013) [2024-06-15 14:03:45,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 44236.9, 300 sec: 43653.6). Total num frames: 396918784. Throughput: 0: 11389.2. Samples: 99320320. Policy #0 lag: (min: 10.0, avg: 91.9, max: 266.0) [2024-06-15 14:03:45,738][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 14:03:46,687][1652475] Updated weights for policy 0, policy_version 193858 (0.0017) [2024-06-15 14:03:48,828][1652475] Updated weights for policy 0, policy_version 193936 (0.0015) [2024-06-15 14:03:50,010][1652475] Updated weights for policy 0, policy_version 193980 (0.0014) [2024-06-15 14:03:50,738][1648984] Fps is (10 sec: 52429.4, 60 sec: 45875.3, 300 sec: 43986.9). Total num frames: 397279232. Throughput: 0: 11332.3. Samples: 99344384. Policy #0 lag: (min: 10.0, avg: 91.9, max: 266.0) [2024-06-15 14:03:50,738][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 14:03:55,393][1652475] Updated weights for policy 0, policy_version 194048 (0.0014) [2024-06-15 14:03:55,740][1648984] Fps is (10 sec: 49151.7, 60 sec: 45329.2, 300 sec: 43542.5). Total num frames: 397410304. Throughput: 0: 11320.9. Samples: 99418624. 
Policy #0 lag: (min: 10.0, avg: 91.9, max: 266.0) [2024-06-15 14:03:55,741][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 14:03:58,596][1652475] Updated weights for policy 0, policy_version 194116 (0.0018) [2024-06-15 14:03:59,928][1652475] Updated weights for policy 0, policy_version 194176 (0.0017) [2024-06-15 14:04:00,738][1648984] Fps is (10 sec: 45874.9, 60 sec: 46967.5, 300 sec: 44209.0). Total num frames: 397737984. Throughput: 0: 11468.8. Samples: 99480064. Policy #0 lag: (min: 10.0, avg: 91.9, max: 266.0) [2024-06-15 14:04:00,738][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 14:04:05,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 43690.6, 300 sec: 43209.3). Total num frames: 397803520. Throughput: 0: 11229.9. Samples: 99510784. Policy #0 lag: (min: 10.0, avg: 91.9, max: 266.0) [2024-06-15 14:04:05,738][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 14:04:05,887][1652475] Updated weights for policy 0, policy_version 194244 (0.0013) [2024-06-15 14:04:09,422][1651340] Signal inference workers to stop experience collection... (10000 times) [2024-06-15 14:04:09,473][1652475] Updated weights for policy 0, policy_version 194324 (0.0012) [2024-06-15 14:04:09,497][1652475] InferenceWorker_p0-w0: stopping experience collection (10000 times) [2024-06-15 14:04:09,633][1651340] Signal inference workers to resume experience collection... (10000 times) [2024-06-15 14:04:09,634][1652475] InferenceWorker_p0-w0: resuming experience collection (10000 times) [2024-06-15 14:04:10,738][1648984] Fps is (10 sec: 32767.2, 60 sec: 45875.0, 300 sec: 43542.5). Total num frames: 398065664. Throughput: 0: 11309.5. Samples: 99584512. 
Policy #0 lag: (min: 10.0, avg: 91.9, max: 266.0) [2024-06-15 14:04:10,739][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 14:04:11,144][1652475] Updated weights for policy 0, policy_version 194387 (0.0135) [2024-06-15 14:04:13,549][1652475] Updated weights for policy 0, policy_version 194465 (0.0011) [2024-06-15 14:04:15,738][1648984] Fps is (10 sec: 52429.4, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 398327808. Throughput: 0: 11025.1. Samples: 99638784. Policy #0 lag: (min: 10.0, avg: 91.9, max: 266.0) [2024-06-15 14:04:15,738][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 14:04:19,432][1652475] Updated weights for policy 0, policy_version 194544 (0.0013) [2024-06-15 14:04:20,738][1648984] Fps is (10 sec: 39322.3, 60 sec: 43690.6, 300 sec: 43098.2). Total num frames: 398458880. Throughput: 0: 11138.8. Samples: 99678208. Policy #0 lag: (min: 10.0, avg: 91.9, max: 266.0) [2024-06-15 14:04:20,738][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 14:04:22,185][1652475] Updated weights for policy 0, policy_version 194592 (0.0030) [2024-06-15 14:04:24,601][1652475] Updated weights for policy 0, policy_version 194672 (0.0016) [2024-06-15 14:04:25,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 44783.0, 300 sec: 43542.6). Total num frames: 398753792. Throughput: 0: 11059.2. Samples: 99735040. Policy #0 lag: (min: 10.0, avg: 91.9, max: 266.0) [2024-06-15 14:04:25,738][1648984] Avg episode reward: [(0, '-0.520')] [2024-06-15 14:04:30,738][1648984] Fps is (10 sec: 39322.1, 60 sec: 43690.7, 300 sec: 42653.9). Total num frames: 398852096. Throughput: 0: 10638.2. Samples: 99799040. 
Policy #0 lag: (min: 10.0, avg: 91.9, max: 266.0) [2024-06-15 14:04:30,738][1648984] Avg episode reward: [(0, '-0.520')] [2024-06-15 14:04:31,160][1652475] Updated weights for policy 0, policy_version 194768 (0.0096) [2024-06-15 14:04:35,390][1652475] Updated weights for policy 0, policy_version 194837 (0.0029) [2024-06-15 14:04:35,738][1648984] Fps is (10 sec: 29491.3, 60 sec: 42598.4, 300 sec: 43098.3). Total num frames: 399048704. Throughput: 0: 10786.1. Samples: 99829760. Policy #0 lag: (min: 10.0, avg: 91.9, max: 266.0) [2024-06-15 14:04:35,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 14:04:37,700][1652475] Updated weights for policy 0, policy_version 194928 (0.0023) [2024-06-15 14:04:39,474][1652475] Updated weights for policy 0, policy_version 194980 (0.0013) [2024-06-15 14:04:40,738][1648984] Fps is (10 sec: 52427.6, 60 sec: 43690.6, 300 sec: 43098.2). Total num frames: 399376384. Throughput: 0: 10410.6. Samples: 99887104. Policy #0 lag: (min: 10.0, avg: 91.9, max: 266.0) [2024-06-15 14:04:40,739][1648984] Avg episode reward: [(0, '-0.530')] [2024-06-15 14:04:43,471][1652475] Updated weights for policy 0, policy_version 195072 (0.0027) [2024-06-15 14:04:45,738][1648984] Fps is (10 sec: 45874.7, 60 sec: 43144.5, 300 sec: 43098.2). Total num frames: 399507456. Throughput: 0: 10604.1. Samples: 99957248. Policy #0 lag: (min: 10.0, avg: 91.9, max: 266.0) [2024-06-15 14:04:45,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 14:04:47,801][1652475] Updated weights for policy 0, policy_version 195131 (0.0013) [2024-06-15 14:04:50,139][1652475] Updated weights for policy 0, policy_version 195199 (0.0018) [2024-06-15 14:04:50,738][1648984] Fps is (10 sec: 39322.1, 60 sec: 41506.1, 300 sec: 43431.5). Total num frames: 399769600. Throughput: 0: 10570.0. Samples: 99986432. 
Policy #0 lag: (min: 10.0, avg: 91.9, max: 266.0) [2024-06-15 14:04:50,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 14:04:52,838][1652475] Updated weights for policy 0, policy_version 195257 (0.0014) [2024-06-15 14:04:55,738][1648984] Fps is (10 sec: 45873.9, 60 sec: 42598.2, 300 sec: 42876.0). Total num frames: 399966208. Throughput: 0: 10251.4. Samples: 100045824. Policy #0 lag: (min: 31.0, avg: 154.0, max: 287.0) [2024-06-15 14:04:55,740][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 14:04:56,196][1652475] Updated weights for policy 0, policy_version 195325 (0.0108) [2024-06-15 14:04:56,242][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000195328_400031744.pth... [2024-06-15 14:04:56,288][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000190208_389545984.pth [2024-06-15 14:04:59,957][1651340] Signal inference workers to stop experience collection... (10050 times) [2024-06-15 14:05:00,052][1652475] InferenceWorker_p0-w0: stopping experience collection (10050 times) [2024-06-15 14:05:00,231][1651340] Signal inference workers to resume experience collection... (10050 times) [2024-06-15 14:05:00,246][1652475] InferenceWorker_p0-w0: resuming experience collection (10050 times) [2024-06-15 14:05:00,738][1648984] Fps is (10 sec: 32767.6, 60 sec: 39321.5, 300 sec: 42876.1). Total num frames: 400097280. Throughput: 0: 10649.5. Samples: 100118016. Policy #0 lag: (min: 31.0, avg: 154.0, max: 287.0) [2024-06-15 14:05:00,739][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 14:05:02,459][1652475] Updated weights for policy 0, policy_version 195440 (0.0233) [2024-06-15 14:05:05,745][1648984] Fps is (10 sec: 45856.7, 60 sec: 43687.5, 300 sec: 43319.8). Total num frames: 400424960. Throughput: 0: 10318.7. Samples: 100142592. 
Policy #0 lag: (min: 31.0, avg: 154.0, max: 287.0) [2024-06-15 14:05:05,749][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 14:05:08,314][1652475] Updated weights for policy 0, policy_version 195521 (0.0013) [2024-06-15 14:05:09,562][1652475] Updated weights for policy 0, policy_version 195579 (0.0012) [2024-06-15 14:05:10,742][1648984] Fps is (10 sec: 45855.3, 60 sec: 41503.2, 300 sec: 42875.6). Total num frames: 400556032. Throughput: 0: 10477.9. Samples: 100206592. Policy #0 lag: (min: 31.0, avg: 154.0, max: 287.0) [2024-06-15 14:05:10,743][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 14:05:12,560][1652475] Updated weights for policy 0, policy_version 195621 (0.0013) [2024-06-15 14:05:14,322][1652475] Updated weights for policy 0, policy_version 195705 (0.0015) [2024-06-15 14:05:15,738][1648984] Fps is (10 sec: 39338.8, 60 sec: 41506.1, 300 sec: 43542.6). Total num frames: 400818176. Throughput: 0: 10478.9. Samples: 100270592. Policy #0 lag: (min: 31.0, avg: 154.0, max: 287.0) [2024-06-15 14:05:15,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 14:05:16,686][1652475] Updated weights for policy 0, policy_version 195750 (0.0043) [2024-06-15 14:05:20,146][1652475] Updated weights for policy 0, policy_version 195778 (0.0015) [2024-06-15 14:05:20,738][1648984] Fps is (10 sec: 42617.6, 60 sec: 42052.3, 300 sec: 43098.3). Total num frames: 400982016. Throughput: 0: 10535.8. Samples: 100303872. Policy #0 lag: (min: 31.0, avg: 154.0, max: 287.0) [2024-06-15 14:05:20,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 14:05:21,408][1652475] Updated weights for policy 0, policy_version 195832 (0.0013) [2024-06-15 14:05:25,046][1652475] Updated weights for policy 0, policy_version 195905 (0.0123) [2024-06-15 14:05:25,738][1648984] Fps is (10 sec: 45875.4, 60 sec: 42052.3, 300 sec: 43320.4). Total num frames: 401276928. Throughput: 0: 10911.3. Samples: 100378112. 
Policy #0 lag: (min: 31.0, avg: 154.0, max: 287.0) [2024-06-15 14:05:25,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 14:05:27,707][1652475] Updated weights for policy 0, policy_version 195984 (0.0080) [2024-06-15 14:05:28,673][1652475] Updated weights for policy 0, policy_version 196032 (0.0017) [2024-06-15 14:05:30,738][1648984] Fps is (10 sec: 49151.8, 60 sec: 43690.6, 300 sec: 43320.4). Total num frames: 401473536. Throughput: 0: 10763.4. Samples: 100441600. Policy #0 lag: (min: 31.0, avg: 154.0, max: 287.0) [2024-06-15 14:05:30,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 14:05:35,663][1652475] Updated weights for policy 0, policy_version 196112 (0.0023) [2024-06-15 14:05:35,790][1648984] Fps is (10 sec: 35856.9, 60 sec: 43106.9, 300 sec: 43201.7). Total num frames: 401637376. Throughput: 0: 10830.4. Samples: 100474368. Policy #0 lag: (min: 31.0, avg: 154.0, max: 287.0) [2024-06-15 14:05:35,791][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 14:05:36,956][1652475] Updated weights for policy 0, policy_version 196164 (0.0014) [2024-06-15 14:05:38,045][1652475] Updated weights for policy 0, policy_version 196221 (0.0013) [2024-06-15 14:05:39,973][1652475] Updated weights for policy 0, policy_version 196272 (0.0014) [2024-06-15 14:05:40,738][1648984] Fps is (10 sec: 52428.5, 60 sec: 43690.7, 300 sec: 43764.7). Total num frames: 401997824. Throughput: 0: 11070.6. Samples: 100544000. Policy #0 lag: (min: 31.0, avg: 154.0, max: 287.0) [2024-06-15 14:05:40,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 14:05:42,560][1652475] Updated weights for policy 0, policy_version 196292 (0.0012) [2024-06-15 14:05:42,906][1651340] Signal inference workers to stop experience collection... (10100 times) [2024-06-15 14:05:42,941][1652475] InferenceWorker_p0-w0: stopping experience collection (10100 times) [2024-06-15 14:05:43,176][1651340] Signal inference workers to resume experience collection... 
(10100 times) [2024-06-15 14:05:43,177][1652475] InferenceWorker_p0-w0: resuming experience collection (10100 times) [2024-06-15 14:05:45,738][1648984] Fps is (10 sec: 49410.5, 60 sec: 43690.7, 300 sec: 43098.2). Total num frames: 402128896. Throughput: 0: 11082.0. Samples: 100616704. Policy #0 lag: (min: 31.0, avg: 154.0, max: 287.0) [2024-06-15 14:05:45,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 14:05:46,278][1652475] Updated weights for policy 0, policy_version 196368 (0.0075) [2024-06-15 14:05:48,014][1652475] Updated weights for policy 0, policy_version 196432 (0.0013) [2024-06-15 14:05:50,609][1652475] Updated weights for policy 0, policy_version 196482 (0.0014) [2024-06-15 14:05:50,738][1648984] Fps is (10 sec: 39322.0, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 402391040. Throughput: 0: 11231.0. Samples: 100647936. Policy #0 lag: (min: 31.0, avg: 154.0, max: 287.0) [2024-06-15 14:05:50,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 14:05:53,555][1652475] Updated weights for policy 0, policy_version 196545 (0.0112) [2024-06-15 14:05:55,740][1648984] Fps is (10 sec: 52427.4, 60 sec: 44782.9, 300 sec: 43320.4). Total num frames: 402653184. Throughput: 0: 11390.2. Samples: 100719104. Policy #0 lag: (min: 31.0, avg: 154.0, max: 287.0) [2024-06-15 14:05:55,744][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 14:05:57,756][1652475] Updated weights for policy 0, policy_version 196624 (0.0014) [2024-06-15 14:05:59,302][1652475] Updated weights for policy 0, policy_version 196688 (0.0013) [2024-06-15 14:06:00,288][1652475] Updated weights for policy 0, policy_version 196736 (0.0024) [2024-06-15 14:06:00,738][1648984] Fps is (10 sec: 52428.2, 60 sec: 46967.5, 300 sec: 43986.9). Total num frames: 402915328. Throughput: 0: 11480.1. Samples: 100787200. 
Policy #0 lag: (min: 31.0, avg: 141.5, max: 271.0) [2024-06-15 14:06:00,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 14:06:03,418][1652475] Updated weights for policy 0, policy_version 196797 (0.0015) [2024-06-15 14:06:05,738][1648984] Fps is (10 sec: 45876.9, 60 sec: 44786.2, 300 sec: 43765.0). Total num frames: 403111936. Throughput: 0: 11446.1. Samples: 100818944. Policy #0 lag: (min: 31.0, avg: 141.5, max: 271.0) [2024-06-15 14:06:05,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 14:06:06,252][1652475] Updated weights for policy 0, policy_version 196864 (0.0014) [2024-06-15 14:06:10,738][1648984] Fps is (10 sec: 39322.0, 60 sec: 45878.6, 300 sec: 43542.6). Total num frames: 403308544. Throughput: 0: 11389.1. Samples: 100890624. Policy #0 lag: (min: 31.0, avg: 141.5, max: 271.0) [2024-06-15 14:06:10,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 14:06:11,059][1652475] Updated weights for policy 0, policy_version 196953 (0.0014) [2024-06-15 14:06:14,487][1652475] Updated weights for policy 0, policy_version 197008 (0.0014) [2024-06-15 14:06:15,683][1652475] Updated weights for policy 0, policy_version 197056 (0.0014) [2024-06-15 14:06:15,738][1648984] Fps is (10 sec: 45875.2, 60 sec: 45875.2, 300 sec: 43875.8). Total num frames: 403570688. Throughput: 0: 11366.4. Samples: 100953088. Policy #0 lag: (min: 31.0, avg: 141.5, max: 271.0) [2024-06-15 14:06:15,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 14:06:17,871][1652475] Updated weights for policy 0, policy_version 197109 (0.0012) [2024-06-15 14:06:20,738][1648984] Fps is (10 sec: 39320.2, 60 sec: 45328.8, 300 sec: 43209.4). Total num frames: 403701760. Throughput: 0: 11413.7. Samples: 100987392. 
Policy #0 lag: (min: 31.0, avg: 141.5, max: 271.0) [2024-06-15 14:06:20,739][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 14:06:21,729][1652475] Updated weights for policy 0, policy_version 197155 (0.0014) [2024-06-15 14:06:23,148][1652475] Updated weights for policy 0, policy_version 197218 (0.0012) [2024-06-15 14:06:25,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 44782.9, 300 sec: 43986.9). Total num frames: 403963904. Throughput: 0: 11366.4. Samples: 101055488. Policy #0 lag: (min: 31.0, avg: 141.5, max: 271.0) [2024-06-15 14:06:25,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 14:06:26,273][1652475] Updated weights for policy 0, policy_version 197250 (0.0015) [2024-06-15 14:06:27,632][1652475] Updated weights for policy 0, policy_version 197308 (0.0012) [2024-06-15 14:06:28,547][1651340] Signal inference workers to stop experience collection... (10150 times) [2024-06-15 14:06:28,600][1652475] InferenceWorker_p0-w0: stopping experience collection (10150 times) [2024-06-15 14:06:28,777][1651340] Signal inference workers to resume experience collection... (10150 times) [2024-06-15 14:06:28,795][1652475] InferenceWorker_p0-w0: resuming experience collection (10150 times) [2024-06-15 14:06:29,343][1652475] Updated weights for policy 0, policy_version 197367 (0.0049) [2024-06-15 14:06:30,738][1648984] Fps is (10 sec: 52430.3, 60 sec: 45875.2, 300 sec: 43875.8). Total num frames: 404226048. Throughput: 0: 11241.2. Samples: 101122560. Policy #0 lag: (min: 31.0, avg: 141.5, max: 271.0) [2024-06-15 14:06:30,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 14:06:33,169][1652475] Updated weights for policy 0, policy_version 197414 (0.0011) [2024-06-15 14:06:34,413][1652475] Updated weights for policy 0, policy_version 197460 (0.0013) [2024-06-15 14:06:35,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 47555.1, 300 sec: 43986.9). Total num frames: 404488192. Throughput: 0: 11366.4. Samples: 101159424. 
Policy #0 lag: (min: 31.0, avg: 141.5, max: 271.0) [2024-06-15 14:06:35,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 14:06:37,899][1652475] Updated weights for policy 0, policy_version 197522 (0.0130) [2024-06-15 14:06:38,918][1652475] Updated weights for policy 0, policy_version 197567 (0.0030) [2024-06-15 14:06:40,738][1648984] Fps is (10 sec: 52429.2, 60 sec: 45875.3, 300 sec: 44097.9). Total num frames: 404750336. Throughput: 0: 11298.2. Samples: 101227520. Policy #0 lag: (min: 31.0, avg: 141.5, max: 271.0) [2024-06-15 14:06:40,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 14:06:44,147][1652475] Updated weights for policy 0, policy_version 197655 (0.0043) [2024-06-15 14:06:45,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 45875.3, 300 sec: 43542.6). Total num frames: 404881408. Throughput: 0: 11218.5. Samples: 101292032. Policy #0 lag: (min: 31.0, avg: 141.5, max: 271.0) [2024-06-15 14:06:45,740][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 14:06:46,611][1652475] Updated weights for policy 0, policy_version 197703 (0.0113) [2024-06-15 14:06:50,285][1652475] Updated weights for policy 0, policy_version 197777 (0.0012) [2024-06-15 14:06:50,738][1648984] Fps is (10 sec: 32768.1, 60 sec: 44783.0, 300 sec: 43765.0). Total num frames: 405078016. Throughput: 0: 11229.9. Samples: 101324288. Policy #0 lag: (min: 31.0, avg: 141.5, max: 271.0) [2024-06-15 14:06:50,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 14:06:52,601][1652475] Updated weights for policy 0, policy_version 197872 (0.0018) [2024-06-15 14:06:55,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 43690.9, 300 sec: 43431.5). Total num frames: 405274624. Throughput: 0: 10911.3. Samples: 101381632. Policy #0 lag: (min: 31.0, avg: 141.5, max: 271.0) [2024-06-15 14:06:55,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 14:06:55,757][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000197888_405274624.pth... 
[2024-06-15 14:06:55,982][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000192736_394723328.pth [2024-06-15 14:06:56,920][1652475] Updated weights for policy 0, policy_version 197936 (0.0011) [2024-06-15 14:07:00,738][1648984] Fps is (10 sec: 32768.0, 60 sec: 41506.2, 300 sec: 43543.1). Total num frames: 405405696. Throughput: 0: 10956.8. Samples: 101446144. Policy #0 lag: (min: 31.0, avg: 141.5, max: 271.0) [2024-06-15 14:07:00,738][1648984] Avg episode reward: [(0, '-0.530')] [2024-06-15 14:07:01,812][1652475] Updated weights for policy 0, policy_version 197984 (0.0025) [2024-06-15 14:07:03,795][1652475] Updated weights for policy 0, policy_version 198071 (0.0032) [2024-06-15 14:07:05,738][1648984] Fps is (10 sec: 49151.8, 60 sec: 44236.7, 300 sec: 43875.9). Total num frames: 405766144. Throughput: 0: 10774.8. Samples: 101472256. Policy #0 lag: (min: 31.0, avg: 141.5, max: 271.0) [2024-06-15 14:07:05,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 14:07:05,854][1652475] Updated weights for policy 0, policy_version 198136 (0.0012) [2024-06-15 14:07:10,738][1648984] Fps is (10 sec: 45875.2, 60 sec: 42598.4, 300 sec: 43323.6). Total num frames: 405864448. Throughput: 0: 10604.1. Samples: 101532672. Policy #0 lag: (min: 10.0, avg: 131.0, max: 266.0) [2024-06-15 14:07:10,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 14:07:10,904][1652475] Updated weights for policy 0, policy_version 198192 (0.0014) [2024-06-15 14:07:13,912][1652475] Updated weights for policy 0, policy_version 198264 (0.0014) [2024-06-15 14:07:15,738][1648984] Fps is (10 sec: 29491.3, 60 sec: 41506.1, 300 sec: 43542.6). Total num frames: 406061056. Throughput: 0: 10615.5. Samples: 101600256. Policy #0 lag: (min: 10.0, avg: 131.0, max: 266.0) [2024-06-15 14:07:15,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 14:07:17,013][1651340] Signal inference workers to stop experience collection... 
(10200 times) [2024-06-15 14:07:17,073][1652475] InferenceWorker_p0-w0: stopping experience collection (10200 times) [2024-06-15 14:07:17,190][1651340] Signal inference workers to resume experience collection... (10200 times) [2024-06-15 14:07:17,190][1652475] InferenceWorker_p0-w0: resuming experience collection (10200 times) [2024-06-15 14:07:17,193][1652475] Updated weights for policy 0, policy_version 198352 (0.0120) [2024-06-15 14:07:20,738][1648984] Fps is (10 sec: 45875.2, 60 sec: 43690.9, 300 sec: 43653.6). Total num frames: 406323200. Throughput: 0: 10331.0. Samples: 101624320. Policy #0 lag: (min: 10.0, avg: 131.0, max: 266.0) [2024-06-15 14:07:20,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 14:07:23,115][1652475] Updated weights for policy 0, policy_version 198435 (0.0013) [2024-06-15 14:07:25,481][1652475] Updated weights for policy 0, policy_version 198496 (0.0041) [2024-06-15 14:07:25,738][1648984] Fps is (10 sec: 45875.1, 60 sec: 42598.4, 300 sec: 43542.6). Total num frames: 406519808. Throughput: 0: 10365.1. Samples: 101693952. Policy #0 lag: (min: 10.0, avg: 131.0, max: 266.0) [2024-06-15 14:07:25,739][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 14:07:28,933][1652475] Updated weights for policy 0, policy_version 198560 (0.0014) [2024-06-15 14:07:30,738][1648984] Fps is (10 sec: 45874.0, 60 sec: 42598.3, 300 sec: 44209.0). Total num frames: 406781952. Throughput: 0: 10285.5. Samples: 101754880. Policy #0 lag: (min: 10.0, avg: 131.0, max: 266.0) [2024-06-15 14:07:30,739][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 14:07:31,122][1652475] Updated weights for policy 0, policy_version 198653 (0.0020) [2024-06-15 14:07:35,738][1648984] Fps is (10 sec: 45875.6, 60 sec: 41506.2, 300 sec: 43653.7). Total num frames: 406978560. Throughput: 0: 10308.3. Samples: 101788160. 
Policy #0 lag: (min: 10.0, avg: 131.0, max: 266.0) [2024-06-15 14:07:35,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 14:07:37,053][1652475] Updated weights for policy 0, policy_version 198736 (0.0014) [2024-06-15 14:07:40,738][1648984] Fps is (10 sec: 32768.5, 60 sec: 39321.5, 300 sec: 43542.6). Total num frames: 407109632. Throughput: 0: 10478.9. Samples: 101853184. Policy #0 lag: (min: 10.0, avg: 131.0, max: 266.0) [2024-06-15 14:07:40,739][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 14:07:40,786][1652475] Updated weights for policy 0, policy_version 198800 (0.0016) [2024-06-15 14:07:43,001][1652475] Updated weights for policy 0, policy_version 198880 (0.0016) [2024-06-15 14:07:45,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 41506.1, 300 sec: 43542.6). Total num frames: 407371776. Throughput: 0: 10467.6. Samples: 101917184. Policy #0 lag: (min: 10.0, avg: 131.0, max: 266.0) [2024-06-15 14:07:45,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 14:07:46,824][1652475] Updated weights for policy 0, policy_version 198944 (0.0029) [2024-06-15 14:07:50,285][1652475] Updated weights for policy 0, policy_version 198995 (0.0013) [2024-06-15 14:07:50,738][1648984] Fps is (10 sec: 45875.5, 60 sec: 41506.1, 300 sec: 43653.7). Total num frames: 407568384. Throughput: 0: 10604.1. Samples: 101949440. Policy #0 lag: (min: 10.0, avg: 131.0, max: 266.0) [2024-06-15 14:07:50,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 14:07:53,015][1652475] Updated weights for policy 0, policy_version 199059 (0.0155) [2024-06-15 14:07:55,188][1652475] Updated weights for policy 0, policy_version 199143 (0.0012) [2024-06-15 14:07:55,738][1648984] Fps is (10 sec: 52428.3, 60 sec: 43690.6, 300 sec: 43986.9). Total num frames: 407896064. Throughput: 0: 10649.6. Samples: 102011904. 
Policy #0 lag: (min: 10.0, avg: 131.0, max: 266.0) [2024-06-15 14:07:55,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 14:07:59,022][1652475] Updated weights for policy 0, policy_version 199184 (0.0012) [2024-06-15 14:08:00,738][1648984] Fps is (10 sec: 45875.2, 60 sec: 43690.6, 300 sec: 43542.6). Total num frames: 408027136. Throughput: 0: 10592.7. Samples: 102076928. Policy #0 lag: (min: 10.0, avg: 131.0, max: 266.0) [2024-06-15 14:08:00,739][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 14:08:02,509][1652475] Updated weights for policy 0, policy_version 199266 (0.0015) [2024-06-15 14:08:05,393][1651340] Signal inference workers to stop experience collection... (10250 times) [2024-06-15 14:08:05,469][1652475] InferenceWorker_p0-w0: stopping experience collection (10250 times) [2024-06-15 14:08:05,533][1652475] Updated weights for policy 0, policy_version 199335 (0.0133) [2024-06-15 14:08:05,646][1651340] Signal inference workers to resume experience collection... (10250 times) [2024-06-15 14:08:05,682][1652475] InferenceWorker_p0-w0: resuming experience collection (10250 times) [2024-06-15 14:08:05,738][1648984] Fps is (10 sec: 36045.0, 60 sec: 41506.2, 300 sec: 43875.8). Total num frames: 408256512. Throughput: 0: 10843.0. Samples: 102112256. Policy #0 lag: (min: 10.0, avg: 131.0, max: 266.0) [2024-06-15 14:08:05,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 14:08:07,547][1652475] Updated weights for policy 0, policy_version 199415 (0.0014) [2024-06-15 14:08:10,739][1648984] Fps is (10 sec: 39321.5, 60 sec: 42598.3, 300 sec: 43098.2). Total num frames: 408420352. Throughput: 0: 10683.7. Samples: 102174720. 
Policy #0 lag: (min: 10.0, avg: 131.0, max: 266.0) [2024-06-15 14:08:10,741][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 14:08:11,711][1652475] Updated weights for policy 0, policy_version 199456 (0.0131) [2024-06-15 14:08:14,573][1652475] Updated weights for policy 0, policy_version 199520 (0.0018) [2024-06-15 14:08:15,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 408682496. Throughput: 0: 10831.7. Samples: 102242304. Policy #0 lag: (min: 10.0, avg: 131.0, max: 266.0) [2024-06-15 14:08:15,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 14:08:17,070][1652475] Updated weights for policy 0, policy_version 199569 (0.0013) [2024-06-15 14:08:19,195][1652475] Updated weights for policy 0, policy_version 199651 (0.0013) [2024-06-15 14:08:20,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 43690.6, 300 sec: 43653.6). Total num frames: 408944640. Throughput: 0: 10797.5. Samples: 102274048. Policy #0 lag: (min: 31.0, avg: 139.3, max: 287.0) [2024-06-15 14:08:20,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 14:08:23,559][1652475] Updated weights for policy 0, policy_version 199701 (0.0016) [2024-06-15 14:08:25,738][1648984] Fps is (10 sec: 39321.1, 60 sec: 42598.3, 300 sec: 43542.5). Total num frames: 409075712. Throughput: 0: 10786.1. Samples: 102338560. Policy #0 lag: (min: 31.0, avg: 139.3, max: 287.0) [2024-06-15 14:08:25,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 14:08:25,872][1652475] Updated weights for policy 0, policy_version 199749 (0.0012) [2024-06-15 14:08:29,057][1652475] Updated weights for policy 0, policy_version 199830 (0.0161) [2024-06-15 14:08:30,738][1648984] Fps is (10 sec: 42598.5, 60 sec: 43144.7, 300 sec: 43653.6). Total num frames: 409370624. Throughput: 0: 10877.1. Samples: 102406656. 
Policy #0 lag: (min: 31.0, avg: 139.3, max: 287.0) [2024-06-15 14:08:30,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 14:08:31,038][1652475] Updated weights for policy 0, policy_version 199904 (0.0015) [2024-06-15 14:08:35,157][1652475] Updated weights for policy 0, policy_version 199953 (0.0025) [2024-06-15 14:08:35,738][1648984] Fps is (10 sec: 45876.0, 60 sec: 42598.4, 300 sec: 43320.4). Total num frames: 409534464. Throughput: 0: 10808.9. Samples: 102435840. Policy #0 lag: (min: 31.0, avg: 139.3, max: 287.0) [2024-06-15 14:08:35,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 14:08:37,629][1652475] Updated weights for policy 0, policy_version 200004 (0.0014) [2024-06-15 14:08:40,738][1648984] Fps is (10 sec: 36044.9, 60 sec: 43690.7, 300 sec: 43431.5). Total num frames: 409731072. Throughput: 0: 11002.3. Samples: 102507008. Policy #0 lag: (min: 31.0, avg: 139.3, max: 287.0) [2024-06-15 14:08:40,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 14:08:40,909][1652475] Updated weights for policy 0, policy_version 200066 (0.0024) [2024-06-15 14:08:42,917][1652475] Updated weights for policy 0, policy_version 200144 (0.0100) [2024-06-15 14:08:45,738][1648984] Fps is (10 sec: 45874.7, 60 sec: 43690.6, 300 sec: 43098.2). Total num frames: 409993216. Throughput: 0: 10979.5. Samples: 102571008. Policy #0 lag: (min: 31.0, avg: 139.3, max: 287.0) [2024-06-15 14:08:45,739][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 14:08:46,413][1652475] Updated weights for policy 0, policy_version 200215 (0.0014) [2024-06-15 14:08:47,299][1652475] Updated weights for policy 0, policy_version 200256 (0.0013) [2024-06-15 14:08:50,541][1652475] Updated weights for policy 0, policy_version 200315 (0.0032) [2024-06-15 14:08:50,738][1648984] Fps is (10 sec: 52429.1, 60 sec: 44783.0, 300 sec: 43542.6). Total num frames: 410255360. Throughput: 0: 10945.4. Samples: 102604800. 
Policy #0 lag: (min: 31.0, avg: 139.3, max: 287.0) [2024-06-15 14:08:50,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 14:08:52,636][1651340] Signal inference workers to stop experience collection... (10300 times) [2024-06-15 14:08:52,696][1652475] InferenceWorker_p0-w0: stopping experience collection (10300 times) [2024-06-15 14:08:52,966][1651340] Signal inference workers to resume experience collection... (10300 times) [2024-06-15 14:08:52,967][1652475] InferenceWorker_p0-w0: resuming experience collection (10300 times) [2024-06-15 14:08:53,814][1652475] Updated weights for policy 0, policy_version 200368 (0.0012) [2024-06-15 14:08:55,297][1652475] Updated weights for policy 0, policy_version 200432 (0.0013) [2024-06-15 14:08:55,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 43690.6, 300 sec: 43320.4). Total num frames: 410517504. Throughput: 0: 11070.6. Samples: 102672896. Policy #0 lag: (min: 31.0, avg: 139.3, max: 287.0) [2024-06-15 14:08:55,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 14:08:55,792][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000200448_410517504.pth... [2024-06-15 14:08:55,845][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000195328_400031744.pth [2024-06-15 14:08:59,319][1652475] Updated weights for policy 0, policy_version 200496 (0.0018) [2024-06-15 14:09:00,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 410648576. Throughput: 0: 11082.0. Samples: 102740992. Policy #0 lag: (min: 31.0, avg: 139.3, max: 287.0) [2024-06-15 14:09:00,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 14:09:01,331][1652475] Updated weights for policy 0, policy_version 200544 (0.0014) [2024-06-15 14:09:04,740][1652475] Updated weights for policy 0, policy_version 200578 (0.0014) [2024-06-15 14:09:05,738][1648984] Fps is (10 sec: 32768.3, 60 sec: 43144.5, 300 sec: 43320.4). 
Total num frames: 410845184. Throughput: 0: 11104.7. Samples: 102773760. Policy #0 lag: (min: 31.0, avg: 139.3, max: 287.0) [2024-06-15 14:09:05,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 14:09:06,139][1652475] Updated weights for policy 0, policy_version 200629 (0.0012) [2024-06-15 14:09:09,746][1652475] Updated weights for policy 0, policy_version 200709 (0.0015) [2024-06-15 14:09:10,738][1648984] Fps is (10 sec: 45875.1, 60 sec: 44783.0, 300 sec: 43320.4). Total num frames: 411107328. Throughput: 0: 11104.8. Samples: 102838272. Policy #0 lag: (min: 31.0, avg: 139.3, max: 287.0) [2024-06-15 14:09:10,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 14:09:14,985][1652475] Updated weights for policy 0, policy_version 200772 (0.0016) [2024-06-15 14:09:15,738][1648984] Fps is (10 sec: 39320.9, 60 sec: 42598.3, 300 sec: 43320.4). Total num frames: 411238400. Throughput: 0: 11002.3. Samples: 102901760. Policy #0 lag: (min: 31.0, avg: 139.3, max: 287.0) [2024-06-15 14:09:15,739][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 14:09:17,675][1652475] Updated weights for policy 0, policy_version 200880 (0.0213) [2024-06-15 14:09:19,953][1652475] Updated weights for policy 0, policy_version 200934 (0.0020) [2024-06-15 14:09:20,738][1648984] Fps is (10 sec: 45874.9, 60 sec: 43690.7, 300 sec: 43431.5). Total num frames: 411566080. Throughput: 0: 10808.9. Samples: 102922240. Policy #0 lag: (min: 31.0, avg: 139.3, max: 287.0) [2024-06-15 14:09:20,739][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 14:09:23,072][1652475] Updated weights for policy 0, policy_version 200982 (0.0014) [2024-06-15 14:09:25,738][1648984] Fps is (10 sec: 45875.6, 60 sec: 43690.7, 300 sec: 43542.5). Total num frames: 411697152. Throughput: 0: 10581.3. Samples: 102983168. 
Policy #0 lag: (min: 31.0, avg: 139.3, max: 287.0) [2024-06-15 14:09:25,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 14:09:27,927][1652475] Updated weights for policy 0, policy_version 201040 (0.0014) [2024-06-15 14:09:30,255][1652475] Updated weights for policy 0, policy_version 201152 (0.0014) [2024-06-15 14:09:30,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 43144.6, 300 sec: 43764.7). Total num frames: 411959296. Throughput: 0: 10410.7. Samples: 103039488. Policy #0 lag: (min: 8.0, avg: 76.7, max: 264.0) [2024-06-15 14:09:30,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 14:09:35,738][1648984] Fps is (10 sec: 39321.0, 60 sec: 42598.2, 300 sec: 43098.2). Total num frames: 412090368. Throughput: 0: 10319.6. Samples: 103069184. Policy #0 lag: (min: 8.0, avg: 76.7, max: 264.0) [2024-06-15 14:09:35,739][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 14:09:37,420][1652475] Updated weights for policy 0, policy_version 201217 (0.0013) [2024-06-15 14:09:38,604][1652475] Updated weights for policy 0, policy_version 201279 (0.0013) [2024-06-15 14:09:40,213][1651340] Signal inference workers to stop experience collection... (10350 times) [2024-06-15 14:09:40,265][1652475] InferenceWorker_p0-w0: stopping experience collection (10350 times) [2024-06-15 14:09:40,432][1651340] Signal inference workers to resume experience collection... (10350 times) [2024-06-15 14:09:40,433][1652475] InferenceWorker_p0-w0: resuming experience collection (10350 times) [2024-06-15 14:09:40,738][1648984] Fps is (10 sec: 32768.0, 60 sec: 42598.4, 300 sec: 43320.4). Total num frames: 412286976. Throughput: 0: 10262.8. Samples: 103134720. 
Policy #0 lag: (min: 8.0, avg: 76.7, max: 264.0) [2024-06-15 14:09:40,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 14:09:40,973][1652475] Updated weights for policy 0, policy_version 201338 (0.0112) [2024-06-15 14:09:44,764][1652475] Updated weights for policy 0, policy_version 201408 (0.0012) [2024-06-15 14:09:45,738][1648984] Fps is (10 sec: 45876.3, 60 sec: 42598.5, 300 sec: 43320.4). Total num frames: 412549120. Throughput: 0: 10057.9. Samples: 103193600. Policy #0 lag: (min: 8.0, avg: 76.7, max: 264.0) [2024-06-15 14:09:45,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 14:09:46,334][1652475] Updated weights for policy 0, policy_version 201472 (0.0012) [2024-06-15 14:09:50,738][1648984] Fps is (10 sec: 45875.3, 60 sec: 41506.1, 300 sec: 43320.5). Total num frames: 412745728. Throughput: 0: 10114.8. Samples: 103228928. Policy #0 lag: (min: 8.0, avg: 76.7, max: 264.0) [2024-06-15 14:09:50,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 14:09:52,426][1652475] Updated weights for policy 0, policy_version 201552 (0.0015) [2024-06-15 14:09:53,591][1652475] Updated weights for policy 0, policy_version 201599 (0.0012) [2024-06-15 14:09:55,738][1648984] Fps is (10 sec: 32767.9, 60 sec: 39321.6, 300 sec: 43320.4). Total num frames: 412876800. Throughput: 0: 10126.2. Samples: 103293952. Policy #0 lag: (min: 8.0, avg: 76.7, max: 264.0) [2024-06-15 14:09:55,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 14:09:58,337][1652475] Updated weights for policy 0, policy_version 201696 (0.0143) [2024-06-15 14:10:00,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 41506.1, 300 sec: 43098.9). Total num frames: 413138944. Throughput: 0: 10035.3. Samples: 103353344. 
Policy #0 lag: (min: 8.0, avg: 76.7, max: 264.0) [2024-06-15 14:10:00,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 14:10:02,064][1652475] Updated weights for policy 0, policy_version 201746 (0.0013) [2024-06-15 14:10:04,930][1652475] Updated weights for policy 0, policy_version 201826 (0.0014) [2024-06-15 14:10:05,596][1652475] Updated weights for policy 0, policy_version 201856 (0.0012) [2024-06-15 14:10:05,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 42598.4, 300 sec: 43543.2). Total num frames: 413401088. Throughput: 0: 10376.5. Samples: 103389184. Policy #0 lag: (min: 8.0, avg: 76.7, max: 264.0) [2024-06-15 14:10:05,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 14:10:09,676][1652475] Updated weights for policy 0, policy_version 201936 (0.0086) [2024-06-15 14:10:10,762][1648984] Fps is (10 sec: 52302.8, 60 sec: 42581.3, 300 sec: 43539.0). Total num frames: 413663232. Throughput: 0: 10484.7. Samples: 103455232. Policy #0 lag: (min: 8.0, avg: 76.7, max: 264.0) [2024-06-15 14:10:10,762][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 14:10:12,976][1652475] Updated weights for policy 0, policy_version 201987 (0.0012) [2024-06-15 14:10:14,059][1652475] Updated weights for policy 0, policy_version 202047 (0.0023) [2024-06-15 14:10:15,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 42598.5, 300 sec: 43431.5). Total num frames: 413794304. Throughput: 0: 10843.0. Samples: 103527424. Policy #0 lag: (min: 8.0, avg: 76.7, max: 264.0) [2024-06-15 14:10:15,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 14:10:19,841][1652475] Updated weights for policy 0, policy_version 202114 (0.0012) [2024-06-15 14:10:20,738][1648984] Fps is (10 sec: 32847.0, 60 sec: 40413.9, 300 sec: 43098.2). Total num frames: 413990912. Throughput: 0: 10922.7. Samples: 103560704. 
Policy #0 lag: (min: 8.0, avg: 76.7, max: 264.0) [2024-06-15 14:10:20,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 14:10:22,454][1652475] Updated weights for policy 0, policy_version 202224 (0.0014) [2024-06-15 14:10:25,246][1651340] Signal inference workers to stop experience collection... (10400 times) [2024-06-15 14:10:25,300][1652475] InferenceWorker_p0-w0: stopping experience collection (10400 times) [2024-06-15 14:10:25,491][1651340] Signal inference workers to resume experience collection... (10400 times) [2024-06-15 14:10:25,492][1652475] InferenceWorker_p0-w0: resuming experience collection (10400 times) [2024-06-15 14:10:25,494][1652475] Updated weights for policy 0, policy_version 202288 (0.0102) [2024-06-15 14:10:25,738][1648984] Fps is (10 sec: 49151.6, 60 sec: 43144.5, 300 sec: 43431.5). Total num frames: 414285824. Throughput: 0: 10854.4. Samples: 103623168. Policy #0 lag: (min: 8.0, avg: 76.7, max: 264.0) [2024-06-15 14:10:25,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 14:10:27,987][1652475] Updated weights for policy 0, policy_version 202322 (0.0011) [2024-06-15 14:10:30,738][1648984] Fps is (10 sec: 45875.1, 60 sec: 41506.1, 300 sec: 43439.2). Total num frames: 414449664. Throughput: 0: 11173.0. Samples: 103696384. Policy #0 lag: (min: 8.0, avg: 76.7, max: 264.0) [2024-06-15 14:10:30,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 14:10:31,892][1652475] Updated weights for policy 0, policy_version 202400 (0.0013) [2024-06-15 14:10:33,834][1652475] Updated weights for policy 0, policy_version 202488 (0.0013) [2024-06-15 14:10:35,738][1648984] Fps is (10 sec: 42598.7, 60 sec: 43690.8, 300 sec: 43098.3). Total num frames: 414711808. Throughput: 0: 11013.7. Samples: 103724544. 
Policy #0 lag: (min: 8.0, avg: 76.7, max: 264.0) [2024-06-15 14:10:35,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 14:10:37,237][1652475] Updated weights for policy 0, policy_version 202558 (0.0012) [2024-06-15 14:10:40,738][1648984] Fps is (10 sec: 49149.8, 60 sec: 44236.5, 300 sec: 43431.4). Total num frames: 414941184. Throughput: 0: 11116.0. Samples: 103794176. Policy #0 lag: (min: 8.0, avg: 130.6, max: 264.0) [2024-06-15 14:10:40,739][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 14:10:41,029][1652475] Updated weights for policy 0, policy_version 202622 (0.0014) [2024-06-15 14:10:45,404][1652475] Updated weights for policy 0, policy_version 202704 (0.0075) [2024-06-15 14:10:45,738][1648984] Fps is (10 sec: 42598.6, 60 sec: 43144.6, 300 sec: 43209.3). Total num frames: 415137792. Throughput: 0: 11161.6. Samples: 103855616. Policy #0 lag: (min: 8.0, avg: 130.6, max: 264.0) [2024-06-15 14:10:45,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 14:10:48,855][1652475] Updated weights for policy 0, policy_version 202787 (0.0016) [2024-06-15 14:10:50,739][1648984] Fps is (10 sec: 42596.2, 60 sec: 43689.9, 300 sec: 43098.2). Total num frames: 415367168. Throughput: 0: 11047.6. Samples: 103886336. Policy #0 lag: (min: 8.0, avg: 130.6, max: 264.0) [2024-06-15 14:10:50,739][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 14:10:52,235][1652475] Updated weights for policy 0, policy_version 202832 (0.0014) [2024-06-15 14:10:53,358][1652475] Updated weights for policy 0, policy_version 202880 (0.0030) [2024-06-15 14:10:55,738][1648984] Fps is (10 sec: 39321.1, 60 sec: 44236.8, 300 sec: 42765.0). Total num frames: 415531008. Throughput: 0: 11178.9. Samples: 103958016. Policy #0 lag: (min: 8.0, avg: 130.6, max: 264.0) [2024-06-15 14:10:55,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 14:10:56,056][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000202912_415563776.pth... 
[2024-06-15 14:10:56,230][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000197888_405274624.pth [2024-06-15 14:10:56,237][1651340] Saving a milestone train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/milestones/checkpoint_000202912_415563776.pth [2024-06-15 14:10:57,212][1652475] Updated weights for policy 0, policy_version 202946 (0.0014) [2024-06-15 14:11:00,092][1652475] Updated weights for policy 0, policy_version 203010 (0.0129) [2024-06-15 14:11:00,739][1648984] Fps is (10 sec: 45873.1, 60 sec: 44781.8, 300 sec: 43098.0). Total num frames: 415825920. Throughput: 0: 10854.1. Samples: 104015872. Policy #0 lag: (min: 8.0, avg: 130.6, max: 264.0) [2024-06-15 14:11:00,740][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 14:11:04,193][1652475] Updated weights for policy 0, policy_version 203074 (0.0014) [2024-06-15 14:11:05,738][1648984] Fps is (10 sec: 49152.2, 60 sec: 43690.6, 300 sec: 43098.2). Total num frames: 416022528. Throughput: 0: 10843.0. Samples: 104048640. Policy #0 lag: (min: 8.0, avg: 130.6, max: 264.0) [2024-06-15 14:11:05,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 14:11:07,644][1652475] Updated weights for policy 0, policy_version 203138 (0.0012) [2024-06-15 14:11:09,123][1652475] Updated weights for policy 0, policy_version 203196 (0.0091) [2024-06-15 14:11:10,447][1652475] Updated weights for policy 0, policy_version 203260 (0.0146) [2024-06-15 14:11:10,738][1648984] Fps is (10 sec: 45881.8, 60 sec: 43708.2, 300 sec: 43098.2). Total num frames: 416284672. Throughput: 0: 11002.3. Samples: 104118272. Policy #0 lag: (min: 8.0, avg: 130.6, max: 264.0) [2024-06-15 14:11:10,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 14:11:11,584][1651340] Signal inference workers to stop experience collection... 
(10450 times) [2024-06-15 14:11:11,658][1652475] InferenceWorker_p0-w0: stopping experience collection (10450 times) [2024-06-15 14:11:11,833][1651340] Signal inference workers to resume experience collection... (10450 times) [2024-06-15 14:11:11,834][1652475] InferenceWorker_p0-w0: resuming experience collection (10450 times) [2024-06-15 14:11:12,835][1652475] Updated weights for policy 0, policy_version 203323 (0.0023) [2024-06-15 14:11:15,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 43690.6, 300 sec: 43098.3). Total num frames: 416415744. Throughput: 0: 10820.3. Samples: 104183296. Policy #0 lag: (min: 8.0, avg: 130.6, max: 264.0) [2024-06-15 14:11:15,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 14:11:17,061][1652475] Updated weights for policy 0, policy_version 203387 (0.0013) [2024-06-15 14:11:20,738][1648984] Fps is (10 sec: 36044.7, 60 sec: 44236.8, 300 sec: 42987.2). Total num frames: 416645120. Throughput: 0: 11013.7. Samples: 104220160. Policy #0 lag: (min: 8.0, avg: 130.6, max: 264.0) [2024-06-15 14:11:20,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 14:11:21,133][1652475] Updated weights for policy 0, policy_version 203459 (0.0012) [2024-06-15 14:11:22,300][1652475] Updated weights for policy 0, policy_version 203520 (0.0014) [2024-06-15 14:11:25,738][1648984] Fps is (10 sec: 52429.1, 60 sec: 44236.9, 300 sec: 43098.3). Total num frames: 416940032. Throughput: 0: 10774.9. Samples: 104279040. Policy #0 lag: (min: 8.0, avg: 130.6, max: 264.0) [2024-06-15 14:11:25,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 14:11:28,548][1652475] Updated weights for policy 0, policy_version 203587 (0.0109) [2024-06-15 14:11:30,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 43690.7, 300 sec: 42653.9). Total num frames: 417071104. Throughput: 0: 10968.2. Samples: 104349184. 
Policy #0 lag: (min: 8.0, avg: 130.6, max: 264.0) [2024-06-15 14:11:30,762][1648984] Avg episode reward: [(0, '-0.220')] [2024-06-15 14:11:30,993][1652475] Updated weights for policy 0, policy_version 203649 (0.0011) [2024-06-15 14:11:33,688][1652475] Updated weights for policy 0, policy_version 203714 (0.0012) [2024-06-15 14:11:35,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 43690.7, 300 sec: 42653.9). Total num frames: 417333248. Throughput: 0: 10934.3. Samples: 104378368. Policy #0 lag: (min: 8.0, avg: 130.6, max: 264.0) [2024-06-15 14:11:35,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 14:11:36,126][1652475] Updated weights for policy 0, policy_version 203808 (0.0024) [2024-06-15 14:11:40,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 42052.6, 300 sec: 42653.9). Total num frames: 417464320. Throughput: 0: 10638.2. Samples: 104436736. Policy #0 lag: (min: 8.0, avg: 130.6, max: 264.0) [2024-06-15 14:11:40,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 14:11:42,188][1652475] Updated weights for policy 0, policy_version 203856 (0.0018) [2024-06-15 14:11:43,604][1652475] Updated weights for policy 0, policy_version 203907 (0.0017) [2024-06-15 14:11:45,738][1648984] Fps is (10 sec: 39320.6, 60 sec: 43144.3, 300 sec: 42876.1). Total num frames: 417726464. Throughput: 0: 10695.4. Samples: 104497152. Policy #0 lag: (min: 58.0, avg: 111.9, max: 250.0) [2024-06-15 14:11:45,739][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 14:11:48,396][1652475] Updated weights for policy 0, policy_version 203984 (0.0019) [2024-06-15 14:11:50,738][1648984] Fps is (10 sec: 45875.1, 60 sec: 42599.1, 300 sec: 42876.1). Total num frames: 417923072. Throughput: 0: 10740.6. Samples: 104531968. 
Policy #0 lag: (min: 58.0, avg: 111.9, max: 250.0) [2024-06-15 14:11:50,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 14:11:50,845][1652475] Updated weights for policy 0, policy_version 204080 (0.0015) [2024-06-15 14:11:55,134][1652475] Updated weights for policy 0, policy_version 204144 (0.0014) [2024-06-15 14:11:55,739][1648984] Fps is (10 sec: 39322.5, 60 sec: 43144.6, 300 sec: 43098.2). Total num frames: 418119680. Throughput: 0: 10490.3. Samples: 104590336. Policy #0 lag: (min: 58.0, avg: 111.9, max: 250.0) [2024-06-15 14:11:55,740][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 14:11:56,573][1652475] Updated weights for policy 0, policy_version 204196 (0.0014) [2024-06-15 14:12:00,738][1648984] Fps is (10 sec: 36044.6, 60 sec: 40960.9, 300 sec: 42431.8). Total num frames: 418283520. Throughput: 0: 10547.2. Samples: 104657920. Policy #0 lag: (min: 58.0, avg: 111.9, max: 250.0) [2024-06-15 14:12:00,738][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 14:12:01,122][1651340] Signal inference workers to stop experience collection... (10500 times) [2024-06-15 14:12:01,173][1652475] InferenceWorker_p0-w0: stopping experience collection (10500 times) [2024-06-15 14:12:01,406][1651340] Signal inference workers to resume experience collection... (10500 times) [2024-06-15 14:12:01,407][1652475] InferenceWorker_p0-w0: resuming experience collection (10500 times) [2024-06-15 14:12:02,522][1652475] Updated weights for policy 0, policy_version 204307 (0.0017) [2024-06-15 14:12:05,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 41506.1, 300 sec: 42876.1). Total num frames: 418512896. Throughput: 0: 10160.4. Samples: 104677376. 
Policy #0 lag: (min: 58.0, avg: 111.9, max: 250.0) [2024-06-15 14:12:05,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 14:12:06,137][1652475] Updated weights for policy 0, policy_version 204355 (0.0016) [2024-06-15 14:12:09,439][1652475] Updated weights for policy 0, policy_version 204432 (0.0024) [2024-06-15 14:12:10,516][1652475] Updated weights for policy 0, policy_version 204475 (0.0018) [2024-06-15 14:12:10,738][1648984] Fps is (10 sec: 49152.4, 60 sec: 41506.1, 300 sec: 43098.2). Total num frames: 418775040. Throughput: 0: 10501.7. Samples: 104751616. Policy #0 lag: (min: 58.0, avg: 111.9, max: 250.0) [2024-06-15 14:12:10,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 14:12:13,512][1652475] Updated weights for policy 0, policy_version 204529 (0.0012) [2024-06-15 14:12:15,119][1652475] Updated weights for policy 0, policy_version 204595 (0.0012) [2024-06-15 14:12:15,738][1648984] Fps is (10 sec: 52429.1, 60 sec: 43690.7, 300 sec: 43098.3). Total num frames: 419037184. Throughput: 0: 10217.3. Samples: 104808960. Policy #0 lag: (min: 58.0, avg: 111.9, max: 250.0) [2024-06-15 14:12:15,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 14:12:18,719][1652475] Updated weights for policy 0, policy_version 204640 (0.0012) [2024-06-15 14:12:19,209][1652475] Updated weights for policy 0, policy_version 204672 (0.0031) [2024-06-15 14:12:20,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 42052.3, 300 sec: 42876.1). Total num frames: 419168256. Throughput: 0: 10456.2. Samples: 104848896. Policy #0 lag: (min: 58.0, avg: 111.9, max: 250.0) [2024-06-15 14:12:20,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 14:12:22,374][1652475] Updated weights for policy 0, policy_version 204732 (0.0014) [2024-06-15 14:12:25,737][1648984] Fps is (10 sec: 39321.9, 60 sec: 41506.2, 300 sec: 42876.1). Total num frames: 419430400. Throughput: 0: 10695.1. Samples: 104918016. 
Policy #0 lag: (min: 58.0, avg: 111.9, max: 250.0) [2024-06-15 14:12:25,739][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 14:12:26,502][1652475] Updated weights for policy 0, policy_version 204832 (0.0013) [2024-06-15 14:12:30,512][1652475] Updated weights for policy 0, policy_version 204880 (0.0014) [2024-06-15 14:12:30,738][1648984] Fps is (10 sec: 42598.1, 60 sec: 42052.2, 300 sec: 42765.0). Total num frames: 419594240. Throughput: 0: 10661.0. Samples: 104976896. Policy #0 lag: (min: 58.0, avg: 111.9, max: 250.0) [2024-06-15 14:12:30,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 14:12:33,474][1652475] Updated weights for policy 0, policy_version 204930 (0.0013) [2024-06-15 14:12:34,500][1652475] Updated weights for policy 0, policy_version 204984 (0.0013) [2024-06-15 14:12:35,738][1648984] Fps is (10 sec: 39321.3, 60 sec: 41506.1, 300 sec: 43098.3). Total num frames: 419823616. Throughput: 0: 10706.5. Samples: 105013760. Policy #0 lag: (min: 58.0, avg: 111.9, max: 250.0) [2024-06-15 14:12:35,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 14:12:37,012][1652475] Updated weights for policy 0, policy_version 205026 (0.0014) [2024-06-15 14:12:39,311][1652475] Updated weights for policy 0, policy_version 205113 (0.0038) [2024-06-15 14:12:40,738][1648984] Fps is (10 sec: 49152.5, 60 sec: 43690.7, 300 sec: 43098.2). Total num frames: 420085760. Throughput: 0: 10683.7. Samples: 105071104. Policy #0 lag: (min: 58.0, avg: 111.9, max: 250.0) [2024-06-15 14:12:40,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 14:12:43,104][1652475] Updated weights for policy 0, policy_version 205168 (0.0016) [2024-06-15 14:12:45,498][1652475] Updated weights for policy 0, policy_version 205202 (0.0015) [2024-06-15 14:12:45,738][1648984] Fps is (10 sec: 45875.3, 60 sec: 42598.6, 300 sec: 43098.3). Total num frames: 420282368. Throughput: 0: 10843.1. Samples: 105145856. 
Policy #0 lag: (min: 58.0, avg: 111.9, max: 250.0) [2024-06-15 14:12:45,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 14:12:48,027][1651340] Signal inference workers to stop experience collection... (10550 times) [2024-06-15 14:12:48,059][1652475] Updated weights for policy 0, policy_version 205250 (0.0014) [2024-06-15 14:12:48,094][1652475] InferenceWorker_p0-w0: stopping experience collection (10550 times) [2024-06-15 14:12:48,373][1651340] Signal inference workers to resume experience collection... (10550 times) [2024-06-15 14:12:48,374][1652475] InferenceWorker_p0-w0: resuming experience collection (10550 times) [2024-06-15 14:12:50,040][1652475] Updated weights for policy 0, policy_version 205328 (0.0014) [2024-06-15 14:12:50,738][1648984] Fps is (10 sec: 45874.8, 60 sec: 43690.7, 300 sec: 42876.1). Total num frames: 420544512. Throughput: 0: 11195.7. Samples: 105181184. Policy #0 lag: (min: 58.0, avg: 111.9, max: 250.0) [2024-06-15 14:12:50,739][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 14:12:54,064][1652475] Updated weights for policy 0, policy_version 205392 (0.0014) [2024-06-15 14:12:55,738][1648984] Fps is (10 sec: 45874.3, 60 sec: 43690.6, 300 sec: 43098.2). Total num frames: 420741120. Throughput: 0: 10979.5. Samples: 105245696. Policy #0 lag: (min: 79.0, avg: 195.5, max: 351.0) [2024-06-15 14:12:55,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 14:12:55,743][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000205440_420741120.pth... [2024-06-15 14:12:55,851][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000200448_410517504.pth [2024-06-15 14:12:56,787][1652475] Updated weights for policy 0, policy_version 205456 (0.0038) [2024-06-15 14:13:00,686][1652475] Updated weights for policy 0, policy_version 205523 (0.0014) [2024-06-15 14:13:00,738][1648984] Fps is (10 sec: 36043.0, 60 sec: 43690.3, 300 sec: 42876.0). 
Total num frames: 420904960. Throughput: 0: 11241.1. Samples: 105314816. Policy #0 lag: (min: 79.0, avg: 195.5, max: 351.0) [2024-06-15 14:13:00,739][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 14:13:03,009][1652475] Updated weights for policy 0, policy_version 205616 (0.0013) [2024-06-15 14:13:05,738][1648984] Fps is (10 sec: 39322.1, 60 sec: 43690.7, 300 sec: 43098.3). Total num frames: 421134336. Throughput: 0: 10763.4. Samples: 105333248. Policy #0 lag: (min: 79.0, avg: 195.5, max: 351.0) [2024-06-15 14:13:05,745][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 14:13:06,874][1652475] Updated weights for policy 0, policy_version 205664 (0.0011) [2024-06-15 14:13:07,543][1652475] Updated weights for policy 0, policy_version 205694 (0.0046) [2024-06-15 14:13:10,110][1652475] Updated weights for policy 0, policy_version 205749 (0.0017) [2024-06-15 14:13:10,738][1648984] Fps is (10 sec: 49154.3, 60 sec: 43690.6, 300 sec: 43098.2). Total num frames: 421396480. Throughput: 0: 10854.4. Samples: 105406464. Policy #0 lag: (min: 79.0, avg: 195.5, max: 351.0) [2024-06-15 14:13:10,739][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 14:13:13,222][1652475] Updated weights for policy 0, policy_version 205793 (0.0040) [2024-06-15 14:13:14,816][1652475] Updated weights for policy 0, policy_version 205861 (0.0017) [2024-06-15 14:13:15,738][1648984] Fps is (10 sec: 52428.5, 60 sec: 43690.6, 300 sec: 43098.2). Total num frames: 421658624. Throughput: 0: 10922.7. Samples: 105468416. Policy #0 lag: (min: 79.0, avg: 195.5, max: 351.0) [2024-06-15 14:13:15,739][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 14:13:19,106][1652475] Updated weights for policy 0, policy_version 205928 (0.0014) [2024-06-15 14:13:20,066][1652475] Updated weights for policy 0, policy_version 205953 (0.0014) [2024-06-15 14:13:20,739][1648984] Fps is (10 sec: 45870.8, 60 sec: 44782.2, 300 sec: 43320.3). Total num frames: 421855232. Throughput: 0: 10990.7. 
Samples: 105508352. Policy #0 lag: (min: 79.0, avg: 195.5, max: 351.0) [2024-06-15 14:13:20,739][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 14:13:21,383][1652475] Updated weights for policy 0, policy_version 206011 (0.0013) [2024-06-15 14:13:25,147][1652475] Updated weights for policy 0, policy_version 206064 (0.0015) [2024-06-15 14:13:25,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 43690.6, 300 sec: 42987.2). Total num frames: 422051840. Throughput: 0: 11207.1. Samples: 105575424. Policy #0 lag: (min: 79.0, avg: 195.5, max: 351.0) [2024-06-15 14:13:25,740][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 14:13:26,678][1652475] Updated weights for policy 0, policy_version 206139 (0.0013) [2024-06-15 14:13:30,738][1648984] Fps is (10 sec: 42602.6, 60 sec: 44783.0, 300 sec: 43209.3). Total num frames: 422281216. Throughput: 0: 11025.0. Samples: 105641984. Policy #0 lag: (min: 79.0, avg: 195.5, max: 351.0) [2024-06-15 14:13:30,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 14:13:30,842][1652475] Updated weights for policy 0, policy_version 206205 (0.0019) [2024-06-15 14:13:32,160][1651340] Signal inference workers to stop experience collection... (10600 times) [2024-06-15 14:13:32,200][1652475] InferenceWorker_p0-w0: stopping experience collection (10600 times) [2024-06-15 14:13:32,482][1651340] Signal inference workers to resume experience collection... (10600 times) [2024-06-15 14:13:32,483][1652475] InferenceWorker_p0-w0: resuming experience collection (10600 times) [2024-06-15 14:13:33,219][1652475] Updated weights for policy 0, policy_version 206266 (0.0012) [2024-06-15 14:13:35,739][1648984] Fps is (10 sec: 39315.4, 60 sec: 43689.5, 300 sec: 43098.0). Total num frames: 422445056. Throughput: 0: 10774.4. Samples: 105666048. 
Policy #0 lag: (min: 79.0, avg: 195.5, max: 351.0) [2024-06-15 14:13:35,740][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 14:13:37,028][1652475] Updated weights for policy 0, policy_version 206308 (0.0014) [2024-06-15 14:13:38,743][1652475] Updated weights for policy 0, policy_version 206393 (0.0014) [2024-06-15 14:13:40,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 43690.6, 300 sec: 43098.3). Total num frames: 422707200. Throughput: 0: 10899.9. Samples: 105736192. Policy #0 lag: (min: 79.0, avg: 195.5, max: 351.0) [2024-06-15 14:13:40,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 14:13:43,385][1652475] Updated weights for policy 0, policy_version 206450 (0.0011) [2024-06-15 14:13:45,294][1652475] Updated weights for policy 0, policy_version 206525 (0.0139) [2024-06-15 14:13:45,738][1648984] Fps is (10 sec: 52437.1, 60 sec: 44782.9, 300 sec: 43098.2). Total num frames: 422969344. Throughput: 0: 10695.2. Samples: 105796096. Policy #0 lag: (min: 79.0, avg: 195.5, max: 351.0) [2024-06-15 14:13:45,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 14:13:49,332][1652475] Updated weights for policy 0, policy_version 206583 (0.0013) [2024-06-15 14:13:50,478][1652475] Updated weights for policy 0, policy_version 206624 (0.0014) [2024-06-15 14:13:50,738][1648984] Fps is (10 sec: 45875.2, 60 sec: 43690.6, 300 sec: 42876.1). Total num frames: 423165952. Throughput: 0: 11184.3. Samples: 105836544. Policy #0 lag: (min: 79.0, avg: 195.5, max: 351.0) [2024-06-15 14:13:50,739][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 14:13:55,202][1652475] Updated weights for policy 0, policy_version 206692 (0.0141) [2024-06-15 14:13:55,750][1648984] Fps is (10 sec: 35999.3, 60 sec: 43135.5, 300 sec: 42985.3). Total num frames: 423329792. Throughput: 0: 10908.2. Samples: 105897472. 
Policy #0 lag: (min: 79.0, avg: 195.5, max: 351.0) [2024-06-15 14:13:55,751][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 14:13:58,821][1652475] Updated weights for policy 0, policy_version 206768 (0.0013) [2024-06-15 14:14:00,739][1648984] Fps is (10 sec: 36039.0, 60 sec: 43689.8, 300 sec: 42986.9). Total num frames: 423526400. Throughput: 0: 10854.0. Samples: 105956864. Policy #0 lag: (min: 15.0, avg: 147.2, max: 335.0) [2024-06-15 14:14:00,740][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 14:14:01,392][1652475] Updated weights for policy 0, policy_version 206844 (0.0012) [2024-06-15 14:14:05,412][1652475] Updated weights for policy 0, policy_version 206906 (0.0014) [2024-06-15 14:14:05,738][1648984] Fps is (10 sec: 42652.3, 60 sec: 43690.7, 300 sec: 42876.1). Total num frames: 423755776. Throughput: 0: 10615.7. Samples: 105986048. Policy #0 lag: (min: 15.0, avg: 147.2, max: 335.0) [2024-06-15 14:14:05,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 14:14:07,043][1652475] Updated weights for policy 0, policy_version 206950 (0.0105) [2024-06-15 14:14:10,738][1648984] Fps is (10 sec: 36049.8, 60 sec: 41506.0, 300 sec: 42876.1). Total num frames: 423886848. Throughput: 0: 10296.8. Samples: 106038784. Policy #0 lag: (min: 15.0, avg: 147.2, max: 335.0) [2024-06-15 14:14:10,739][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 14:14:11,399][1652475] Updated weights for policy 0, policy_version 207008 (0.0077) [2024-06-15 14:14:15,143][1652475] Updated weights for policy 0, policy_version 207094 (0.0013) [2024-06-15 14:14:15,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 41506.2, 300 sec: 42653.9). Total num frames: 424148992. Throughput: 0: 10160.3. Samples: 106099200. 
Policy #0 lag: (min: 15.0, avg: 147.2, max: 335.0) [2024-06-15 14:14:15,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 14:14:18,167][1652475] Updated weights for policy 0, policy_version 207152 (0.0013) [2024-06-15 14:14:20,738][1648984] Fps is (10 sec: 42599.2, 60 sec: 40960.6, 300 sec: 42765.0). Total num frames: 424312832. Throughput: 0: 10342.7. Samples: 106131456. Policy #0 lag: (min: 15.0, avg: 147.2, max: 335.0) [2024-06-15 14:14:20,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 14:14:21,261][1652475] Updated weights for policy 0, policy_version 207216 (0.0012) [2024-06-15 14:14:22,974][1651340] Signal inference workers to stop experience collection... (10650 times) [2024-06-15 14:14:22,982][1652475] Updated weights for policy 0, policy_version 207249 (0.0013) [2024-06-15 14:14:23,034][1652475] InferenceWorker_p0-w0: stopping experience collection (10650 times) [2024-06-15 14:14:23,144][1651340] Signal inference workers to resume experience collection... (10650 times) [2024-06-15 14:14:23,144][1652475] InferenceWorker_p0-w0: resuming experience collection (10650 times) [2024-06-15 14:14:25,738][1648984] Fps is (10 sec: 39320.9, 60 sec: 41506.0, 300 sec: 42653.9). Total num frames: 424542208. Throughput: 0: 10376.5. Samples: 106203136. Policy #0 lag: (min: 15.0, avg: 147.2, max: 335.0) [2024-06-15 14:14:25,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 14:14:26,036][1652475] Updated weights for policy 0, policy_version 207297 (0.0020) [2024-06-15 14:14:27,594][1652475] Updated weights for policy 0, policy_version 207359 (0.0012) [2024-06-15 14:14:29,939][1652475] Updated weights for policy 0, policy_version 207414 (0.0024) [2024-06-15 14:14:30,738][1648984] Fps is (10 sec: 49152.4, 60 sec: 42052.3, 300 sec: 43098.3). Total num frames: 424804352. Throughput: 0: 10456.2. Samples: 106266624. 
Policy #0 lag: (min: 15.0, avg: 147.2, max: 335.0) [2024-06-15 14:14:30,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 14:14:32,239][1652475] Updated weights for policy 0, policy_version 207445 (0.0051) [2024-06-15 14:14:34,791][1652475] Updated weights for policy 0, policy_version 207521 (0.0013) [2024-06-15 14:14:35,738][1648984] Fps is (10 sec: 52430.0, 60 sec: 43691.8, 300 sec: 43320.4). Total num frames: 425066496. Throughput: 0: 10353.8. Samples: 106302464. Policy #0 lag: (min: 15.0, avg: 147.2, max: 335.0) [2024-06-15 14:14:35,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 14:14:38,131][1652475] Updated weights for policy 0, policy_version 207577 (0.0016) [2024-06-15 14:14:38,998][1652475] Updated weights for policy 0, policy_version 207616 (0.0056) [2024-06-15 14:14:40,738][1648984] Fps is (10 sec: 42598.6, 60 sec: 42052.3, 300 sec: 42987.2). Total num frames: 425230336. Throughput: 0: 10538.8. Samples: 106371584. Policy #0 lag: (min: 15.0, avg: 147.2, max: 335.0) [2024-06-15 14:14:40,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 14:14:41,643][1652475] Updated weights for policy 0, policy_version 207680 (0.0012) [2024-06-15 14:14:45,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 41506.1, 300 sec: 43098.2). Total num frames: 425459712. Throughput: 0: 10627.2. Samples: 106435072. Policy #0 lag: (min: 15.0, avg: 147.2, max: 335.0) [2024-06-15 14:14:45,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 14:14:46,208][1652475] Updated weights for policy 0, policy_version 207751 (0.0150) [2024-06-15 14:14:49,584][1652475] Updated weights for policy 0, policy_version 207824 (0.0098) [2024-06-15 14:14:50,738][1648984] Fps is (10 sec: 49151.4, 60 sec: 42598.4, 300 sec: 43542.6). Total num frames: 425721856. Throughput: 0: 10661.0. Samples: 106465792. 
Policy #0 lag: (min: 15.0, avg: 147.2, max: 335.0) [2024-06-15 14:14:50,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 14:14:52,376][1652475] Updated weights for policy 0, policy_version 207878 (0.0013) [2024-06-15 14:14:53,652][1652475] Updated weights for policy 0, policy_version 207936 (0.0013) [2024-06-15 14:14:55,738][1648984] Fps is (10 sec: 39320.4, 60 sec: 42060.9, 300 sec: 43098.2). Total num frames: 425852928. Throughput: 0: 11047.8. Samples: 106535936. Policy #0 lag: (min: 15.0, avg: 147.2, max: 335.0) [2024-06-15 14:14:55,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 14:14:56,072][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000207968_425918464.pth... [2024-06-15 14:14:56,220][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000202912_415563776.pth [2024-06-15 14:14:56,761][1652475] Updated weights for policy 0, policy_version 207998 (0.0013) [2024-06-15 14:14:58,890][1652475] Updated weights for policy 0, policy_version 208055 (0.0013) [2024-06-15 14:15:00,738][1648984] Fps is (10 sec: 39322.1, 60 sec: 43145.8, 300 sec: 43098.3). Total num frames: 426115072. Throughput: 0: 11218.5. Samples: 106604032. Policy #0 lag: (min: 15.0, avg: 147.2, max: 335.0) [2024-06-15 14:15:00,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 14:15:01,769][1652475] Updated weights for policy 0, policy_version 208098 (0.0013) [2024-06-15 14:15:04,906][1652475] Updated weights for policy 0, policy_version 208183 (0.0014) [2024-06-15 14:15:05,738][1648984] Fps is (10 sec: 52430.6, 60 sec: 43690.7, 300 sec: 43101.8). Total num frames: 426377216. Throughput: 0: 11252.7. Samples: 106637824. 
Policy #0 lag: (min: 15.0, avg: 147.2, max: 335.0) [2024-06-15 14:15:05,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 14:15:08,796][1652475] Updated weights for policy 0, policy_version 208225 (0.0014) [2024-06-15 14:15:10,316][1651340] Signal inference workers to stop experience collection... (10700 times) [2024-06-15 14:15:10,374][1652475] InferenceWorker_p0-w0: stopping experience collection (10700 times) [2024-06-15 14:15:10,381][1652475] Updated weights for policy 0, policy_version 208291 (0.0012) [2024-06-15 14:15:10,617][1651340] Signal inference workers to resume experience collection... (10700 times) [2024-06-15 14:15:10,617][1652475] InferenceWorker_p0-w0: resuming experience collection (10700 times) [2024-06-15 14:15:10,738][1648984] Fps is (10 sec: 49151.9, 60 sec: 45329.3, 300 sec: 43431.5). Total num frames: 426606592. Throughput: 0: 11127.5. Samples: 106703872. Policy #0 lag: (min: 5.0, avg: 103.0, max: 261.0) [2024-06-15 14:15:10,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 14:15:13,473][1652475] Updated weights for policy 0, policy_version 208339 (0.0015) [2024-06-15 14:15:15,679][1652475] Updated weights for policy 0, policy_version 208404 (0.0015) [2024-06-15 14:15:15,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 44236.9, 300 sec: 43431.5). Total num frames: 426803200. Throughput: 0: 11252.6. Samples: 106772992. Policy #0 lag: (min: 5.0, avg: 103.0, max: 261.0) [2024-06-15 14:15:15,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 14:15:16,801][1652475] Updated weights for policy 0, policy_version 208448 (0.0053) [2024-06-15 14:15:20,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 44783.0, 300 sec: 43098.3). Total num frames: 426999808. Throughput: 0: 11207.1. Samples: 106806784. 
Policy #0 lag: (min: 5.0, avg: 103.0, max: 261.0) [2024-06-15 14:15:20,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 14:15:21,287][1652475] Updated weights for policy 0, policy_version 208514 (0.0038) [2024-06-15 14:15:25,540][1652475] Updated weights for policy 0, policy_version 208611 (0.0015) [2024-06-15 14:15:25,738][1648984] Fps is (10 sec: 45875.2, 60 sec: 45329.3, 300 sec: 43431.5). Total num frames: 427261952. Throughput: 0: 10968.2. Samples: 106865152. Policy #0 lag: (min: 5.0, avg: 103.0, max: 261.0) [2024-06-15 14:15:25,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 14:15:28,451][1652475] Updated weights for policy 0, policy_version 208699 (0.0013) [2024-06-15 14:15:30,746][1648984] Fps is (10 sec: 42598.4, 60 sec: 43690.7, 300 sec: 43098.3). Total num frames: 427425792. Throughput: 0: 11184.4. Samples: 106938368. Policy #0 lag: (min: 5.0, avg: 103.0, max: 261.0) [2024-06-15 14:15:30,746][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 14:15:32,473][1652475] Updated weights for policy 0, policy_version 208766 (0.0012) [2024-06-15 14:15:34,854][1652475] Updated weights for policy 0, policy_version 208829 (0.0011) [2024-06-15 14:15:35,760][1648984] Fps is (10 sec: 42502.5, 60 sec: 43674.3, 300 sec: 43206.1). Total num frames: 427687936. Throughput: 0: 11156.0. Samples: 106968064. Policy #0 lag: (min: 5.0, avg: 103.0, max: 261.0) [2024-06-15 14:15:35,761][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 14:15:37,991][1652475] Updated weights for policy 0, policy_version 208887 (0.0016) [2024-06-15 14:15:40,201][1652475] Updated weights for policy 0, policy_version 208960 (0.0012) [2024-06-15 14:15:40,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 45329.0, 300 sec: 43431.5). Total num frames: 427950080. Throughput: 0: 11059.3. Samples: 107033600. 
Policy #0 lag: (min: 5.0, avg: 103.0, max: 261.0) [2024-06-15 14:15:40,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 14:15:44,391][1652475] Updated weights for policy 0, policy_version 209021 (0.0111) [2024-06-15 14:15:45,738][1648984] Fps is (10 sec: 39409.4, 60 sec: 43690.5, 300 sec: 43098.4). Total num frames: 428081152. Throughput: 0: 11047.8. Samples: 107101184. Policy #0 lag: (min: 5.0, avg: 103.0, max: 261.0) [2024-06-15 14:15:45,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 14:15:46,860][1652475] Updated weights for policy 0, policy_version 209077 (0.0013) [2024-06-15 14:15:49,330][1652475] Updated weights for policy 0, policy_version 209120 (0.0012) [2024-06-15 14:15:50,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 43690.7, 300 sec: 43431.5). Total num frames: 428343296. Throughput: 0: 11047.8. Samples: 107134976. Policy #0 lag: (min: 5.0, avg: 103.0, max: 261.0) [2024-06-15 14:15:50,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 14:15:51,010][1652475] Updated weights for policy 0, policy_version 209168 (0.0012) [2024-06-15 14:15:54,680][1652475] Updated weights for policy 0, policy_version 209217 (0.0016) [2024-06-15 14:15:55,754][1648984] Fps is (10 sec: 49074.5, 60 sec: 45317.2, 300 sec: 43207.2). Total num frames: 428572672. Throughput: 0: 11100.8. Samples: 107203584. Policy #0 lag: (min: 5.0, avg: 103.0, max: 261.0) [2024-06-15 14:15:55,755][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 14:15:57,820][1652475] Updated weights for policy 0, policy_version 209296 (0.0014) [2024-06-15 14:15:59,019][1652475] Updated weights for policy 0, policy_version 209343 (0.0012) [2024-06-15 14:16:00,439][1651340] Signal inference workers to stop experience collection... (10750 times) [2024-06-15 14:16:00,483][1652475] InferenceWorker_p0-w0: stopping experience collection (10750 times) [2024-06-15 14:16:00,735][1651340] Signal inference workers to resume experience collection... 
(10750 times) [2024-06-15 14:16:00,735][1652475] InferenceWorker_p0-w0: resuming experience collection (10750 times) [2024-06-15 14:16:00,738][1648984] Fps is (10 sec: 42598.8, 60 sec: 44236.8, 300 sec: 43209.3). Total num frames: 428769280. Throughput: 0: 11025.1. Samples: 107269120. Policy #0 lag: (min: 5.0, avg: 103.0, max: 261.0) [2024-06-15 14:16:00,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 14:16:01,442][1652475] Updated weights for policy 0, policy_version 209400 (0.0015) [2024-06-15 14:16:03,233][1652475] Updated weights for policy 0, policy_version 209440 (0.0012) [2024-06-15 14:16:05,738][1648984] Fps is (10 sec: 42667.1, 60 sec: 43690.7, 300 sec: 43098.3). Total num frames: 428998656. Throughput: 0: 10934.0. Samples: 107298816. Policy #0 lag: (min: 5.0, avg: 103.0, max: 261.0) [2024-06-15 14:16:05,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 14:16:07,254][1652475] Updated weights for policy 0, policy_version 209508 (0.0037) [2024-06-15 14:16:10,738][1648984] Fps is (10 sec: 36044.5, 60 sec: 42052.3, 300 sec: 43098.3). Total num frames: 429129728. Throughput: 0: 11150.2. Samples: 107366912. Policy #0 lag: (min: 5.0, avg: 103.0, max: 261.0) [2024-06-15 14:16:10,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 14:16:11,354][1652475] Updated weights for policy 0, policy_version 209584 (0.0036) [2024-06-15 14:16:15,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 43144.6, 300 sec: 43209.3). Total num frames: 429391872. Throughput: 0: 10752.0. Samples: 107422208. Policy #0 lag: (min: 5.0, avg: 103.0, max: 261.0) [2024-06-15 14:16:15,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 14:16:16,249][1652475] Updated weights for policy 0, policy_version 209683 (0.0012) [2024-06-15 14:16:17,351][1652475] Updated weights for policy 0, policy_version 209728 (0.0013) [2024-06-15 14:16:20,738][1648984] Fps is (10 sec: 52429.4, 60 sec: 44236.9, 300 sec: 43098.3). Total num frames: 429654016. 
Throughput: 0: 10700.5. Samples: 107449344. Policy #0 lag: (min: 7.0, avg: 133.2, max: 263.0) [2024-06-15 14:16:20,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 14:16:24,555][1652475] Updated weights for policy 0, policy_version 209808 (0.0099) [2024-06-15 14:16:25,738][1648984] Fps is (10 sec: 39321.3, 60 sec: 42052.3, 300 sec: 43098.3). Total num frames: 429785088. Throughput: 0: 10695.1. Samples: 107514880. Policy #0 lag: (min: 7.0, avg: 133.2, max: 263.0) [2024-06-15 14:16:25,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 14:16:26,124][1652475] Updated weights for policy 0, policy_version 209888 (0.0014) [2024-06-15 14:16:30,740][1648984] Fps is (10 sec: 29490.7, 60 sec: 42052.2, 300 sec: 42765.0). Total num frames: 429948928. Throughput: 0: 10501.7. Samples: 107573760. Policy #0 lag: (min: 7.0, avg: 133.2, max: 263.0) [2024-06-15 14:16:30,741][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 14:16:31,816][1652475] Updated weights for policy 0, policy_version 209984 (0.0012) [2024-06-15 14:16:33,220][1652475] Updated weights for policy 0, policy_version 210042 (0.0040) [2024-06-15 14:16:35,738][1648984] Fps is (10 sec: 39321.0, 60 sec: 41521.6, 300 sec: 43098.2). Total num frames: 430178304. Throughput: 0: 10274.1. Samples: 107597312. Policy #0 lag: (min: 7.0, avg: 133.2, max: 263.0) [2024-06-15 14:16:35,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 14:16:38,420][1652475] Updated weights for policy 0, policy_version 210144 (0.0166) [2024-06-15 14:16:40,738][1648984] Fps is (10 sec: 49152.0, 60 sec: 41506.1, 300 sec: 43098.3). Total num frames: 430440448. Throughput: 0: 10186.7. Samples: 107661824. 
Policy #0 lag: (min: 7.0, avg: 133.2, max: 263.0) [2024-06-15 14:16:40,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 14:16:43,231][1652475] Updated weights for policy 0, policy_version 210208 (0.0011) [2024-06-15 14:16:45,176][1652475] Updated weights for policy 0, policy_version 210288 (0.0014) [2024-06-15 14:16:45,738][1648984] Fps is (10 sec: 52429.8, 60 sec: 43690.9, 300 sec: 43320.4). Total num frames: 430702592. Throughput: 0: 10217.2. Samples: 107728896. Policy #0 lag: (min: 7.0, avg: 133.2, max: 263.0) [2024-06-15 14:16:45,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 14:16:48,526][1651340] Signal inference workers to stop experience collection... (10800 times) [2024-06-15 14:16:48,616][1652475] InferenceWorker_p0-w0: stopping experience collection (10800 times) [2024-06-15 14:16:48,779][1651340] Signal inference workers to resume experience collection... (10800 times) [2024-06-15 14:16:48,780][1652475] InferenceWorker_p0-w0: resuming experience collection (10800 times) [2024-06-15 14:16:49,295][1652475] Updated weights for policy 0, policy_version 210355 (0.0018) [2024-06-15 14:16:50,498][1652475] Updated weights for policy 0, policy_version 210400 (0.0014) [2024-06-15 14:16:50,738][1648984] Fps is (10 sec: 45875.3, 60 sec: 42598.4, 300 sec: 43320.4). Total num frames: 430899200. Throughput: 0: 10444.8. Samples: 107768832. Policy #0 lag: (min: 7.0, avg: 133.2, max: 263.0) [2024-06-15 14:16:50,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 14:16:54,775][1652475] Updated weights for policy 0, policy_version 210448 (0.0044) [2024-06-15 14:16:55,738][1648984] Fps is (10 sec: 36044.5, 60 sec: 41517.2, 300 sec: 43320.4). Total num frames: 431063040. Throughput: 0: 10410.7. Samples: 107835392. 
Policy #0 lag: (min: 7.0, avg: 133.2, max: 263.0) [2024-06-15 14:16:55,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 14:16:56,191][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000210512_431128576.pth... [2024-06-15 14:16:56,305][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000205440_420741120.pth [2024-06-15 14:16:56,617][1652475] Updated weights for policy 0, policy_version 210528 (0.0178) [2024-06-15 14:17:00,371][1652475] Updated weights for policy 0, policy_version 210563 (0.0012) [2024-06-15 14:17:00,738][1648984] Fps is (10 sec: 36044.9, 60 sec: 41506.1, 300 sec: 43209.3). Total num frames: 431259648. Throughput: 0: 10626.8. Samples: 107900416. Policy #0 lag: (min: 7.0, avg: 133.2, max: 263.0) [2024-06-15 14:17:00,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 14:17:02,028][1652475] Updated weights for policy 0, policy_version 210640 (0.0013) [2024-06-15 14:17:05,738][1648984] Fps is (10 sec: 42598.8, 60 sec: 41506.1, 300 sec: 43098.3). Total num frames: 431489024. Throughput: 0: 10490.3. Samples: 107921408. Policy #0 lag: (min: 7.0, avg: 133.2, max: 263.0) [2024-06-15 14:17:05,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 14:17:07,827][1652475] Updated weights for policy 0, policy_version 210736 (0.0012) [2024-06-15 14:17:09,420][1652475] Updated weights for policy 0, policy_version 210815 (0.0013) [2024-06-15 14:17:10,738][1648984] Fps is (10 sec: 49151.6, 60 sec: 43690.6, 300 sec: 43098.2). Total num frames: 431751168. Throughput: 0: 10547.2. Samples: 107989504. Policy #0 lag: (min: 7.0, avg: 133.2, max: 263.0) [2024-06-15 14:17:10,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 14:17:14,380][1652475] Updated weights for policy 0, policy_version 210880 (0.0123) [2024-06-15 14:17:15,738][1648984] Fps is (10 sec: 45875.3, 60 sec: 42598.4, 300 sec: 43320.4). Total num frames: 431947776. Throughput: 0: 10604.1. 
Samples: 108050944. Policy #0 lag: (min: 7.0, avg: 133.2, max: 263.0) [2024-06-15 14:17:15,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 14:17:19,840][1652475] Updated weights for policy 0, policy_version 210960 (0.0031) [2024-06-15 14:17:20,738][1648984] Fps is (10 sec: 36045.0, 60 sec: 40959.9, 300 sec: 42987.2). Total num frames: 432111616. Throughput: 0: 10911.3. Samples: 108088320. Policy #0 lag: (min: 7.0, avg: 133.2, max: 263.0) [2024-06-15 14:17:20,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 14:17:21,774][1652475] Updated weights for policy 0, policy_version 211056 (0.0012) [2024-06-15 14:17:25,738][1648984] Fps is (10 sec: 36044.6, 60 sec: 42052.3, 300 sec: 43098.3). Total num frames: 432308224. Throughput: 0: 10934.1. Samples: 108153856. Policy #0 lag: (min: 9.0, avg: 108.7, max: 265.0) [2024-06-15 14:17:25,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 14:17:26,678][1652475] Updated weights for policy 0, policy_version 211136 (0.0013) [2024-06-15 14:17:30,738][1648984] Fps is (10 sec: 42598.7, 60 sec: 43144.6, 300 sec: 43098.3). Total num frames: 432537600. Throughput: 0: 10717.9. Samples: 108211200. Policy #0 lag: (min: 9.0, avg: 108.7, max: 265.0) [2024-06-15 14:17:30,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 14:17:31,973][1652475] Updated weights for policy 0, policy_version 211202 (0.0102) [2024-06-15 14:17:32,420][1651340] Signal inference workers to stop experience collection... (10850 times) [2024-06-15 14:17:32,482][1652475] InferenceWorker_p0-w0: stopping experience collection (10850 times) [2024-06-15 14:17:32,648][1651340] Signal inference workers to resume experience collection... 
(10850 times) [2024-06-15 14:17:32,650][1652475] InferenceWorker_p0-w0: resuming experience collection (10850 times) [2024-06-15 14:17:33,183][1652475] Updated weights for policy 0, policy_version 211272 (0.0016) [2024-06-15 14:17:35,738][1648984] Fps is (10 sec: 49150.2, 60 sec: 43690.5, 300 sec: 43098.2). Total num frames: 432799744. Throughput: 0: 10615.4. Samples: 108246528. Policy #0 lag: (min: 9.0, avg: 108.7, max: 265.0) [2024-06-15 14:17:35,739][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 14:17:38,028][1652475] Updated weights for policy 0, policy_version 211360 (0.0014) [2024-06-15 14:17:39,494][1652475] Updated weights for policy 0, policy_version 211410 (0.0013) [2024-06-15 14:17:40,738][1648984] Fps is (10 sec: 52428.5, 60 sec: 43690.7, 300 sec: 43320.4). Total num frames: 433061888. Throughput: 0: 10615.5. Samples: 108313088. Policy #0 lag: (min: 9.0, avg: 108.7, max: 265.0) [2024-06-15 14:17:40,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 14:17:44,413][1652475] Updated weights for policy 0, policy_version 211488 (0.0014) [2024-06-15 14:17:45,738][1648984] Fps is (10 sec: 45876.6, 60 sec: 42598.3, 300 sec: 43098.3). Total num frames: 433258496. Throughput: 0: 10547.2. Samples: 108375040. Policy #0 lag: (min: 9.0, avg: 108.7, max: 265.0) [2024-06-15 14:17:45,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 14:17:49,172][1652475] Updated weights for policy 0, policy_version 211600 (0.0014) [2024-06-15 14:17:50,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 42598.5, 300 sec: 43098.3). Total num frames: 433455104. Throughput: 0: 10922.7. Samples: 108412928. Policy #0 lag: (min: 9.0, avg: 108.7, max: 265.0) [2024-06-15 14:17:50,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 14:17:50,898][1652475] Updated weights for policy 0, policy_version 211664 (0.0129) [2024-06-15 14:17:55,738][1648984] Fps is (10 sec: 32768.0, 60 sec: 42052.3, 300 sec: 42987.2). Total num frames: 433586176. 
Throughput: 0: 10808.9. Samples: 108475904. Policy #0 lag: (min: 9.0, avg: 108.7, max: 265.0) [2024-06-15 14:17:55,739][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 14:17:55,885][1652475] Updated weights for policy 0, policy_version 211728 (0.0012) [2024-06-15 14:17:58,111][1652475] Updated weights for policy 0, policy_version 211829 (0.0246) [2024-06-15 14:18:00,738][1648984] Fps is (10 sec: 39321.2, 60 sec: 43144.5, 300 sec: 43098.3). Total num frames: 433848320. Throughput: 0: 10968.2. Samples: 108544512. Policy #0 lag: (min: 9.0, avg: 108.7, max: 265.0) [2024-06-15 14:18:00,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 14:18:01,642][1652475] Updated weights for policy 0, policy_version 211859 (0.0011) [2024-06-15 14:18:03,921][1652475] Updated weights for policy 0, policy_version 211956 (0.0143) [2024-06-15 14:18:05,738][1648984] Fps is (10 sec: 52429.2, 60 sec: 43690.7, 300 sec: 43098.3). Total num frames: 434110464. Throughput: 0: 10752.0. Samples: 108572160. Policy #0 lag: (min: 9.0, avg: 108.7, max: 265.0) [2024-06-15 14:18:05,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 14:18:09,404][1652475] Updated weights for policy 0, policy_version 212034 (0.0038) [2024-06-15 14:18:10,738][1648984] Fps is (10 sec: 49152.4, 60 sec: 43144.6, 300 sec: 42987.2). Total num frames: 434339840. Throughput: 0: 10729.3. Samples: 108636672. Policy #0 lag: (min: 9.0, avg: 108.7, max: 265.0) [2024-06-15 14:18:10,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 14:18:10,788][1652475] Updated weights for policy 0, policy_version 212088 (0.0011) [2024-06-15 14:18:14,331][1652475] Updated weights for policy 0, policy_version 212131 (0.0012) [2024-06-15 14:18:15,192][1651340] Signal inference workers to stop experience collection... 
(10900 times) [2024-06-15 14:18:15,240][1652475] InferenceWorker_p0-w0: stopping experience collection (10900 times) [2024-06-15 14:18:15,376][1651340] Signal inference workers to resume experience collection... (10900 times) [2024-06-15 14:18:15,379][1652475] InferenceWorker_p0-w0: resuming experience collection (10900 times) [2024-06-15 14:18:15,738][1648984] Fps is (10 sec: 45875.4, 60 sec: 43690.7, 300 sec: 43098.4). Total num frames: 434569216. Throughput: 0: 10911.3. Samples: 108702208. Policy #0 lag: (min: 9.0, avg: 108.7, max: 265.0) [2024-06-15 14:18:15,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 14:18:15,931][1652475] Updated weights for policy 0, policy_version 212208 (0.0013) [2024-06-15 14:18:20,738][1648984] Fps is (10 sec: 36044.3, 60 sec: 43144.5, 300 sec: 42876.1). Total num frames: 434700288. Throughput: 0: 10877.2. Samples: 108736000. Policy #0 lag: (min: 9.0, avg: 108.7, max: 265.0) [2024-06-15 14:18:20,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 14:18:20,881][1652475] Updated weights for policy 0, policy_version 212259 (0.0012) [2024-06-15 14:18:22,436][1652475] Updated weights for policy 0, policy_version 212323 (0.0012) [2024-06-15 14:18:25,738][1648984] Fps is (10 sec: 32767.8, 60 sec: 43144.5, 300 sec: 42765.0). Total num frames: 434896896. Throughput: 0: 10865.8. Samples: 108802048. Policy #0 lag: (min: 9.0, avg: 108.7, max: 265.0) [2024-06-15 14:18:25,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 14:18:26,636][1652475] Updated weights for policy 0, policy_version 212392 (0.0018) [2024-06-15 14:18:27,667][1652475] Updated weights for policy 0, policy_version 212448 (0.0013) [2024-06-15 14:18:30,738][1648984] Fps is (10 sec: 45875.7, 60 sec: 43690.6, 300 sec: 43098.5). Total num frames: 435159040. Throughput: 0: 11013.7. Samples: 108870656. 
Policy #0 lag: (min: 9.0, avg: 108.7, max: 265.0) [2024-06-15 14:18:30,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 14:18:32,796][1652475] Updated weights for policy 0, policy_version 212530 (0.0130) [2024-06-15 14:18:34,241][1652475] Updated weights for policy 0, policy_version 212592 (0.0012) [2024-06-15 14:18:35,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 43691.0, 300 sec: 43098.3). Total num frames: 435421184. Throughput: 0: 10786.1. Samples: 108898304. Policy #0 lag: (min: 79.0, avg: 167.5, max: 319.0) [2024-06-15 14:18:35,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 14:18:38,961][1652475] Updated weights for policy 0, policy_version 212656 (0.0022) [2024-06-15 14:18:40,759][1648984] Fps is (10 sec: 45779.4, 60 sec: 42583.5, 300 sec: 42873.1). Total num frames: 435617792. Throughput: 0: 10747.0. Samples: 108959744. Policy #0 lag: (min: 79.0, avg: 167.5, max: 319.0) [2024-06-15 14:18:40,759][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 14:18:41,252][1652475] Updated weights for policy 0, policy_version 212733 (0.0015) [2024-06-15 14:18:45,738][1648984] Fps is (10 sec: 36043.5, 60 sec: 42052.1, 300 sec: 42765.0). Total num frames: 435781632. Throughput: 0: 10376.5. Samples: 109011456. Policy #0 lag: (min: 79.0, avg: 167.5, max: 319.0) [2024-06-15 14:18:45,738][1652475] Updated weights for policy 0, policy_version 212793 (0.0016) [2024-06-15 14:18:45,739][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 14:18:47,849][1652475] Updated weights for policy 0, policy_version 212848 (0.0011) [2024-06-15 14:18:50,738][1648984] Fps is (10 sec: 32836.6, 60 sec: 41506.1, 300 sec: 42766.8). Total num frames: 435945472. Throughput: 0: 10387.9. Samples: 109039616. 
Policy #0 lag: (min: 79.0, avg: 167.5, max: 319.0) [2024-06-15 14:18:50,738][1648984] Avg episode reward: [(0, '-0.610')] [2024-06-15 14:18:52,719][1652475] Updated weights for policy 0, policy_version 212896 (0.0013) [2024-06-15 14:18:54,697][1652475] Updated weights for policy 0, policy_version 212987 (0.0030) [2024-06-15 14:18:55,738][1648984] Fps is (10 sec: 42599.0, 60 sec: 43690.6, 300 sec: 42987.4). Total num frames: 436207616. Throughput: 0: 10228.6. Samples: 109096960. Policy #0 lag: (min: 79.0, avg: 167.5, max: 319.0) [2024-06-15 14:18:55,738][1648984] Avg episode reward: [(0, '-0.520')] [2024-06-15 14:18:55,744][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000212992_436207616.pth... [2024-06-15 14:18:55,785][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000207968_425918464.pth [2024-06-15 14:18:59,655][1652475] Updated weights for policy 0, policy_version 213062 (0.0014) [2024-06-15 14:19:00,625][1652475] Updated weights for policy 0, policy_version 213117 (0.0013) [2024-06-15 14:19:00,738][1648984] Fps is (10 sec: 52428.0, 60 sec: 43690.6, 300 sec: 43098.2). Total num frames: 436469760. Throughput: 0: 10399.2. Samples: 109170176. Policy #0 lag: (min: 79.0, avg: 167.5, max: 319.0) [2024-06-15 14:19:00,739][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 14:19:03,251][1651340] Signal inference workers to stop experience collection... (10950 times) [2024-06-15 14:19:03,301][1652475] InferenceWorker_p0-w0: stopping experience collection (10950 times) [2024-06-15 14:19:03,598][1651340] Signal inference workers to resume experience collection... 
(10950 times) [2024-06-15 14:19:03,609][1652475] InferenceWorker_p0-w0: resuming experience collection (10950 times) [2024-06-15 14:19:04,584][1652475] Updated weights for policy 0, policy_version 213173 (0.0013) [2024-06-15 14:19:05,728][1652475] Updated weights for policy 0, policy_version 213220 (0.0136) [2024-06-15 14:19:05,738][1648984] Fps is (10 sec: 45876.0, 60 sec: 42598.4, 300 sec: 43320.4). Total num frames: 436666368. Throughput: 0: 10513.1. Samples: 109209088. Policy #0 lag: (min: 79.0, avg: 167.5, max: 319.0) [2024-06-15 14:19:05,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 14:19:09,580][1652475] Updated weights for policy 0, policy_version 213280 (0.0017) [2024-06-15 14:19:10,738][1648984] Fps is (10 sec: 39322.5, 60 sec: 42052.3, 300 sec: 43098.3). Total num frames: 436862976. Throughput: 0: 10695.1. Samples: 109283328. Policy #0 lag: (min: 79.0, avg: 167.5, max: 319.0) [2024-06-15 14:19:10,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 14:19:10,987][1652475] Updated weights for policy 0, policy_version 213328 (0.0011) [2024-06-15 14:19:14,732][1652475] Updated weights for policy 0, policy_version 213379 (0.0018) [2024-06-15 14:19:15,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 42052.2, 300 sec: 43320.4). Total num frames: 437092352. Throughput: 0: 10626.8. Samples: 109348864. Policy #0 lag: (min: 79.0, avg: 167.5, max: 319.0) [2024-06-15 14:19:15,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 14:19:16,805][1652475] Updated weights for policy 0, policy_version 213472 (0.0015) [2024-06-15 14:19:20,738][1648984] Fps is (10 sec: 42598.0, 60 sec: 43144.6, 300 sec: 43209.4). Total num frames: 437288960. Throughput: 0: 10569.9. Samples: 109373952. 
Policy #0 lag: (min: 79.0, avg: 167.5, max: 319.0) [2024-06-15 14:19:20,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 14:19:21,352][1652475] Updated weights for policy 0, policy_version 213552 (0.0096) [2024-06-15 14:19:23,715][1652475] Updated weights for policy 0, policy_version 213616 (0.0124) [2024-06-15 14:19:25,740][1648984] Fps is (10 sec: 42588.1, 60 sec: 43688.9, 300 sec: 43097.9). Total num frames: 437518336. Throughput: 0: 10836.1. Samples: 109447168. Policy #0 lag: (min: 79.0, avg: 167.5, max: 319.0) [2024-06-15 14:19:25,741][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 14:19:26,742][1652475] Updated weights for policy 0, policy_version 213664 (0.0018) [2024-06-15 14:19:28,534][1652475] Updated weights for policy 0, policy_version 213744 (0.0014) [2024-06-15 14:19:30,738][1648984] Fps is (10 sec: 49152.5, 60 sec: 43690.7, 300 sec: 43098.3). Total num frames: 437780480. Throughput: 0: 11116.2. Samples: 109511680. Policy #0 lag: (min: 79.0, avg: 167.5, max: 319.0) [2024-06-15 14:19:30,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 14:19:33,496][1652475] Updated weights for policy 0, policy_version 213812 (0.0014) [2024-06-15 14:19:34,810][1652475] Updated weights for policy 0, policy_version 213841 (0.0017) [2024-06-15 14:19:35,738][1648984] Fps is (10 sec: 52441.3, 60 sec: 43690.6, 300 sec: 43431.5). Total num frames: 438042624. Throughput: 0: 11366.4. Samples: 109551104. Policy #0 lag: (min: 79.0, avg: 167.5, max: 319.0) [2024-06-15 14:19:35,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 14:19:37,176][1652475] Updated weights for policy 0, policy_version 213891 (0.0014) [2024-06-15 14:19:38,585][1652475] Updated weights for policy 0, policy_version 213945 (0.0100) [2024-06-15 14:19:39,955][1652475] Updated weights for policy 0, policy_version 214014 (0.0011) [2024-06-15 14:19:40,738][1648984] Fps is (10 sec: 52428.5, 60 sec: 44798.6, 300 sec: 43542.6). Total num frames: 438304768. 
Throughput: 0: 11514.4. Samples: 109615104. Policy #0 lag: (min: 79.0, avg: 167.5, max: 319.0) [2024-06-15 14:19:40,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 14:19:45,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 44237.0, 300 sec: 43098.3). Total num frames: 438435840. Throughput: 0: 11480.2. Samples: 109686784. Policy #0 lag: (min: 4.0, avg: 86.5, max: 260.0) [2024-06-15 14:19:45,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 14:19:46,097][1652475] Updated weights for policy 0, policy_version 214083 (0.0013) [2024-06-15 14:19:46,352][1651340] Signal inference workers to stop experience collection... (11000 times) [2024-06-15 14:19:46,443][1652475] InferenceWorker_p0-w0: stopping experience collection (11000 times) [2024-06-15 14:19:46,504][1651340] Signal inference workers to resume experience collection... (11000 times) [2024-06-15 14:19:46,504][1652475] InferenceWorker_p0-w0: resuming experience collection (11000 times) [2024-06-15 14:19:47,127][1652475] Updated weights for policy 0, policy_version 214144 (0.0013) [2024-06-15 14:19:50,516][1652475] Updated weights for policy 0, policy_version 214224 (0.0140) [2024-06-15 14:19:50,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 46421.3, 300 sec: 43653.7). Total num frames: 438730752. Throughput: 0: 11389.2. Samples: 109721600. Policy #0 lag: (min: 4.0, avg: 86.5, max: 260.0) [2024-06-15 14:19:50,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 14:19:55,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 43690.7, 300 sec: 43098.2). Total num frames: 438829056. Throughput: 0: 11229.8. Samples: 109788672. 
Policy #0 lag: (min: 4.0, avg: 86.5, max: 260.0) [2024-06-15 14:19:55,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 14:19:56,033][1652475] Updated weights for policy 0, policy_version 214289 (0.0016) [2024-06-15 14:19:58,117][1652475] Updated weights for policy 0, policy_version 214352 (0.0015) [2024-06-15 14:20:00,300][1652475] Updated weights for policy 0, policy_version 214401 (0.0014) [2024-06-15 14:20:00,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 44236.9, 300 sec: 43209.3). Total num frames: 439123968. Throughput: 0: 11218.5. Samples: 109853696. Policy #0 lag: (min: 4.0, avg: 86.5, max: 260.0) [2024-06-15 14:20:00,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 14:20:01,774][1652475] Updated weights for policy 0, policy_version 214455 (0.0015) [2024-06-15 14:20:03,537][1652475] Updated weights for policy 0, policy_version 214528 (0.0014) [2024-06-15 14:20:05,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 44782.9, 300 sec: 43209.3). Total num frames: 439353344. Throughput: 0: 11218.5. Samples: 109878784. Policy #0 lag: (min: 4.0, avg: 86.5, max: 260.0) [2024-06-15 14:20:05,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 14:20:10,558][1652475] Updated weights for policy 0, policy_version 214595 (0.0021) [2024-06-15 14:20:10,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 44236.8, 300 sec: 43098.3). Total num frames: 439517184. Throughput: 0: 11071.2. Samples: 109945344. Policy #0 lag: (min: 4.0, avg: 86.5, max: 260.0) [2024-06-15 14:20:10,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 14:20:12,568][1652475] Updated weights for policy 0, policy_version 214672 (0.0013) [2024-06-15 14:20:14,955][1652475] Updated weights for policy 0, policy_version 214737 (0.0013) [2024-06-15 14:20:15,738][1648984] Fps is (10 sec: 49149.8, 60 sec: 45874.8, 300 sec: 43542.5). Total num frames: 439844864. Throughput: 0: 11070.4. Samples: 110009856. 
Policy #0 lag: (min: 4.0, avg: 86.5, max: 260.0) [2024-06-15 14:20:15,739][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 14:20:20,127][1652475] Updated weights for policy 0, policy_version 214791 (0.0022) [2024-06-15 14:20:20,738][1648984] Fps is (10 sec: 42598.5, 60 sec: 44236.9, 300 sec: 42987.2). Total num frames: 439943168. Throughput: 0: 11047.8. Samples: 110048256. Policy #0 lag: (min: 4.0, avg: 86.5, max: 260.0) [2024-06-15 14:20:20,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 14:20:22,291][1652475] Updated weights for policy 0, policy_version 214864 (0.0023) [2024-06-15 14:20:24,773][1652475] Updated weights for policy 0, policy_version 214960 (0.0102) [2024-06-15 14:20:25,738][1648984] Fps is (10 sec: 42600.6, 60 sec: 45877.1, 300 sec: 43542.6). Total num frames: 440270848. Throughput: 0: 11047.8. Samples: 110112256. Policy #0 lag: (min: 4.0, avg: 86.5, max: 260.0) [2024-06-15 14:20:25,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 14:20:26,545][1652475] Updated weights for policy 0, policy_version 214999 (0.0042) [2024-06-15 14:20:27,428][1652475] Updated weights for policy 0, policy_version 215039 (0.0013) [2024-06-15 14:20:30,738][1648984] Fps is (10 sec: 45874.9, 60 sec: 43690.6, 300 sec: 43101.5). Total num frames: 440401920. Throughput: 0: 11116.1. Samples: 110187008. Policy #0 lag: (min: 4.0, avg: 86.5, max: 260.0) [2024-06-15 14:20:30,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 14:20:33,806][1652475] Updated weights for policy 0, policy_version 215108 (0.0012) [2024-06-15 14:20:34,853][1652475] Updated weights for policy 0, policy_version 215162 (0.0022) [2024-06-15 14:20:35,353][1651340] Signal inference workers to stop experience collection... (11050 times) [2024-06-15 14:20:35,385][1652475] InferenceWorker_p0-w0: stopping experience collection (11050 times) [2024-06-15 14:20:35,531][1651340] Signal inference workers to resume experience collection... 
(11050 times) [2024-06-15 14:20:35,565][1652475] InferenceWorker_p0-w0: resuming experience collection (11050 times) [2024-06-15 14:20:35,738][1648984] Fps is (10 sec: 42598.2, 60 sec: 44236.8, 300 sec: 43209.3). Total num frames: 440696832. Throughput: 0: 11013.7. Samples: 110217216. Policy #0 lag: (min: 4.0, avg: 86.5, max: 260.0) [2024-06-15 14:20:35,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 14:20:36,303][1652475] Updated weights for policy 0, policy_version 215220 (0.0012) [2024-06-15 14:20:37,400][1652475] Updated weights for policy 0, policy_version 215267 (0.0012) [2024-06-15 14:20:40,738][1648984] Fps is (10 sec: 52429.1, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 440926208. Throughput: 0: 11150.2. Samples: 110290432. Policy #0 lag: (min: 4.0, avg: 86.5, max: 260.0) [2024-06-15 14:20:40,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 14:20:43,470][1652475] Updated weights for policy 0, policy_version 215312 (0.0108) [2024-06-15 14:20:45,500][1652475] Updated weights for policy 0, policy_version 215392 (0.0044) [2024-06-15 14:20:45,738][1648984] Fps is (10 sec: 42599.1, 60 sec: 44783.0, 300 sec: 43320.4). Total num frames: 441122816. Throughput: 0: 11218.5. Samples: 110358528. Policy #0 lag: (min: 4.0, avg: 86.5, max: 260.0) [2024-06-15 14:20:45,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 14:20:47,112][1652475] Updated weights for policy 0, policy_version 215447 (0.0016) [2024-06-15 14:20:47,917][1652475] Updated weights for policy 0, policy_version 215488 (0.0012) [2024-06-15 14:20:50,272][1652475] Updated weights for policy 0, policy_version 215547 (0.0011) [2024-06-15 14:20:50,738][1648984] Fps is (10 sec: 52428.3, 60 sec: 45329.0, 300 sec: 43656.0). Total num frames: 441450496. Throughput: 0: 11343.6. Samples: 110389248. 
Policy #0 lag: (min: 4.0, avg: 86.5, max: 260.0) [2024-06-15 14:20:50,739][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 14:20:55,643][1652475] Updated weights for policy 0, policy_version 215616 (0.0016) [2024-06-15 14:20:55,738][1648984] Fps is (10 sec: 45874.6, 60 sec: 45875.2, 300 sec: 43431.5). Total num frames: 441581568. Throughput: 0: 11377.8. Samples: 110457344. Policy #0 lag: (min: 5.0, avg: 76.0, max: 261.0) [2024-06-15 14:20:55,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 14:20:55,788][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000215616_441581568.pth... [2024-06-15 14:20:55,830][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000210512_431128576.pth [2024-06-15 14:20:58,091][1652475] Updated weights for policy 0, policy_version 215676 (0.0075) [2024-06-15 14:21:00,738][1648984] Fps is (10 sec: 32767.3, 60 sec: 44236.6, 300 sec: 43320.4). Total num frames: 441778176. Throughput: 0: 11184.4. Samples: 110513152. Policy #0 lag: (min: 5.0, avg: 76.0, max: 261.0) [2024-06-15 14:21:00,739][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 14:21:01,375][1652475] Updated weights for policy 0, policy_version 215743 (0.0016) [2024-06-15 14:21:05,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 441974784. Throughput: 0: 10865.8. Samples: 110537216. Policy #0 lag: (min: 5.0, avg: 76.0, max: 261.0) [2024-06-15 14:21:05,739][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 14:21:08,343][1652475] Updated weights for policy 0, policy_version 215813 (0.0013) [2024-06-15 14:21:10,700][1652475] Updated weights for policy 0, policy_version 215875 (0.0013) [2024-06-15 14:21:10,738][1648984] Fps is (10 sec: 32769.1, 60 sec: 43144.6, 300 sec: 43098.2). Total num frames: 442105856. Throughput: 0: 10820.3. Samples: 110599168. 
Policy #0 lag: (min: 5.0, avg: 76.0, max: 261.0) [2024-06-15 14:21:10,738][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 14:21:11,905][1652475] Updated weights for policy 0, policy_version 215932 (0.0012) [2024-06-15 14:21:14,418][1652475] Updated weights for policy 0, policy_version 216000 (0.0013) [2024-06-15 14:21:15,738][1648984] Fps is (10 sec: 49151.7, 60 sec: 43691.0, 300 sec: 43431.5). Total num frames: 442466304. Throughput: 0: 10456.2. Samples: 110657536. Policy #0 lag: (min: 5.0, avg: 76.0, max: 261.0) [2024-06-15 14:21:15,739][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 14:21:15,912][1652475] Updated weights for policy 0, policy_version 216063 (0.0013) [2024-06-15 14:21:20,738][1648984] Fps is (10 sec: 39321.2, 60 sec: 42598.4, 300 sec: 43098.2). Total num frames: 442499072. Throughput: 0: 10524.4. Samples: 110690816. Policy #0 lag: (min: 5.0, avg: 76.0, max: 261.0) [2024-06-15 14:21:20,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 14:21:22,764][1652475] Updated weights for policy 0, policy_version 216124 (0.0091) [2024-06-15 14:21:24,156][1652475] Updated weights for policy 0, policy_version 216162 (0.0016) [2024-06-15 14:21:25,414][1651340] Signal inference workers to stop experience collection... (11100 times) [2024-06-15 14:21:25,453][1652475] InferenceWorker_p0-w0: stopping experience collection (11100 times) [2024-06-15 14:21:25,455][1652475] Updated weights for policy 0, policy_version 216212 (0.0014) [2024-06-15 14:21:25,673][1651340] Signal inference workers to resume experience collection... (11100 times) [2024-06-15 14:21:25,678][1652475] InferenceWorker_p0-w0: resuming experience collection (11100 times) [2024-06-15 14:21:25,738][1648984] Fps is (10 sec: 36045.3, 60 sec: 42598.4, 300 sec: 43653.7). Total num frames: 442826752. Throughput: 0: 10456.2. Samples: 110760960. 
Policy #0 lag: (min: 5.0, avg: 76.0, max: 261.0) [2024-06-15 14:21:25,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 14:21:27,049][1652475] Updated weights for policy 0, policy_version 216273 (0.0013) [2024-06-15 14:21:30,738][1648984] Fps is (10 sec: 52429.5, 60 sec: 43690.8, 300 sec: 43542.6). Total num frames: 443023360. Throughput: 0: 10387.9. Samples: 110825984. Policy #0 lag: (min: 5.0, avg: 76.0, max: 261.0) [2024-06-15 14:21:30,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 14:21:33,606][1652475] Updated weights for policy 0, policy_version 216336 (0.0046) [2024-06-15 14:21:34,817][1652475] Updated weights for policy 0, policy_version 216382 (0.0013) [2024-06-15 14:21:35,738][1648984] Fps is (10 sec: 39321.2, 60 sec: 42052.3, 300 sec: 43320.4). Total num frames: 443219968. Throughput: 0: 10581.3. Samples: 110865408. Policy #0 lag: (min: 5.0, avg: 76.0, max: 261.0) [2024-06-15 14:21:35,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 14:21:35,967][1652475] Updated weights for policy 0, policy_version 216436 (0.0056) [2024-06-15 14:21:37,735][1652475] Updated weights for policy 0, policy_version 216480 (0.0014) [2024-06-15 14:21:39,554][1652475] Updated weights for policy 0, policy_version 216547 (0.0012) [2024-06-15 14:21:40,738][1648984] Fps is (10 sec: 52428.5, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 443547648. Throughput: 0: 10387.9. Samples: 110924800. Policy #0 lag: (min: 5.0, avg: 76.0, max: 261.0) [2024-06-15 14:21:40,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 14:21:45,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 41506.0, 300 sec: 43098.2). Total num frames: 443613184. Throughput: 0: 10797.6. Samples: 110999040. 
Policy #0 lag: (min: 5.0, avg: 76.0, max: 261.0) [2024-06-15 14:21:45,739][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 14:21:46,182][1652475] Updated weights for policy 0, policy_version 216640 (0.0014) [2024-06-15 14:21:49,387][1652475] Updated weights for policy 0, policy_version 216706 (0.0015) [2024-06-15 14:21:50,738][1648984] Fps is (10 sec: 39321.2, 60 sec: 41506.1, 300 sec: 43653.6). Total num frames: 443940864. Throughput: 0: 10763.4. Samples: 111021568. Policy #0 lag: (min: 5.0, avg: 76.0, max: 261.0) [2024-06-15 14:21:50,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 14:21:50,832][1652475] Updated weights for policy 0, policy_version 216777 (0.0015) [2024-06-15 14:21:55,738][1648984] Fps is (10 sec: 45875.2, 60 sec: 41506.1, 300 sec: 43431.5). Total num frames: 444071936. Throughput: 0: 10922.6. Samples: 111090688. Policy #0 lag: (min: 5.0, avg: 76.0, max: 261.0) [2024-06-15 14:21:55,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 14:21:57,458][1652475] Updated weights for policy 0, policy_version 216848 (0.0014) [2024-06-15 14:21:59,279][1652475] Updated weights for policy 0, policy_version 216912 (0.0014) [2024-06-15 14:22:00,739][1648984] Fps is (10 sec: 39317.6, 60 sec: 42597.8, 300 sec: 43542.4). Total num frames: 444334080. Throughput: 0: 10945.2. Samples: 111150080. Policy #0 lag: (min: 5.0, avg: 76.0, max: 261.0) [2024-06-15 14:22:00,739][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 14:22:01,917][1652475] Updated weights for policy 0, policy_version 216998 (0.0013) [2024-06-15 14:22:03,633][1652475] Updated weights for policy 0, policy_version 217060 (0.0012) [2024-06-15 14:22:05,738][1648984] Fps is (10 sec: 52429.2, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 444596224. Throughput: 0: 10899.9. Samples: 111181312. 
Policy #0 lag: (min: 2.0, avg: 160.6, max: 274.0) [2024-06-15 14:22:05,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 14:22:10,353][1652475] Updated weights for policy 0, policy_version 217120 (0.0013) [2024-06-15 14:22:10,738][1648984] Fps is (10 sec: 32771.5, 60 sec: 42598.4, 300 sec: 43098.2). Total num frames: 444661760. Throughput: 0: 10945.4. Samples: 111253504. Policy #0 lag: (min: 2.0, avg: 160.6, max: 274.0) [2024-06-15 14:22:10,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 14:22:11,148][1652475] Updated weights for policy 0, policy_version 217151 (0.0012) [2024-06-15 14:22:11,401][1651340] Signal inference workers to stop experience collection... (11150 times) [2024-06-15 14:22:11,489][1652475] InferenceWorker_p0-w0: stopping experience collection (11150 times) [2024-06-15 14:22:11,747][1651340] Signal inference workers to resume experience collection... (11150 times) [2024-06-15 14:22:11,747][1652475] InferenceWorker_p0-w0: resuming experience collection (11150 times) [2024-06-15 14:22:12,747][1652475] Updated weights for policy 0, policy_version 217204 (0.0011) [2024-06-15 14:22:14,288][1652475] Updated weights for policy 0, policy_version 217280 (0.0023) [2024-06-15 14:22:15,738][1648984] Fps is (10 sec: 45875.0, 60 sec: 43144.6, 300 sec: 43875.8). Total num frames: 445054976. Throughput: 0: 10808.9. Samples: 111312384. Policy #0 lag: (min: 2.0, avg: 160.6, max: 274.0) [2024-06-15 14:22:15,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 14:22:16,315][1652475] Updated weights for policy 0, policy_version 217344 (0.0071) [2024-06-15 14:22:20,738][1648984] Fps is (10 sec: 45875.0, 60 sec: 43690.7, 300 sec: 43431.5). Total num frames: 445120512. Throughput: 0: 10763.4. Samples: 111349760. 
Policy #0 lag: (min: 2.0, avg: 160.6, max: 274.0) [2024-06-15 14:22:20,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 14:22:22,384][1652475] Updated weights for policy 0, policy_version 217407 (0.0012) [2024-06-15 14:22:25,565][1652475] Updated weights for policy 0, policy_version 217488 (0.0013) [2024-06-15 14:22:25,738][1648984] Fps is (10 sec: 36044.2, 60 sec: 43144.4, 300 sec: 43653.6). Total num frames: 445415424. Throughput: 0: 10945.4. Samples: 111417344. Policy #0 lag: (min: 2.0, avg: 160.6, max: 274.0) [2024-06-15 14:22:25,739][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 14:22:26,487][1652475] Updated weights for policy 0, policy_version 217530 (0.0014) [2024-06-15 14:22:28,609][1652475] Updated weights for policy 0, policy_version 217592 (0.0015) [2024-06-15 14:22:30,738][1648984] Fps is (10 sec: 52429.1, 60 sec: 43690.6, 300 sec: 43542.6). Total num frames: 445644800. Throughput: 0: 10638.2. Samples: 111477760. Policy #0 lag: (min: 2.0, avg: 160.6, max: 274.0) [2024-06-15 14:22:30,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 14:22:33,970][1652475] Updated weights for policy 0, policy_version 217655 (0.0016) [2024-06-15 14:22:35,738][1648984] Fps is (10 sec: 36045.6, 60 sec: 42598.5, 300 sec: 43098.3). Total num frames: 445775872. Throughput: 0: 10945.4. Samples: 111514112. Policy #0 lag: (min: 2.0, avg: 160.6, max: 274.0) [2024-06-15 14:22:35,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 14:22:36,215][1652475] Updated weights for policy 0, policy_version 217696 (0.0019) [2024-06-15 14:22:39,375][1652475] Updated weights for policy 0, policy_version 217808 (0.0013) [2024-06-15 14:22:40,541][1652475] Updated weights for policy 0, policy_version 217856 (0.0015) [2024-06-15 14:22:40,738][1648984] Fps is (10 sec: 52428.5, 60 sec: 43690.6, 300 sec: 43764.7). Total num frames: 446169088. Throughput: 0: 10877.2. Samples: 111580160. 
Policy #0 lag: (min: 2.0, avg: 160.6, max: 274.0) [2024-06-15 14:22:40,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 14:22:45,738][1648984] Fps is (10 sec: 49152.0, 60 sec: 44236.9, 300 sec: 43431.5). Total num frames: 446267392. Throughput: 0: 11105.0. Samples: 111649792. Policy #0 lag: (min: 2.0, avg: 160.6, max: 274.0) [2024-06-15 14:22:45,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 14:22:45,829][1652475] Updated weights for policy 0, policy_version 217918 (0.0028) [2024-06-15 14:22:48,072][1652475] Updated weights for policy 0, policy_version 217984 (0.0014) [2024-06-15 14:22:49,701][1652475] Updated weights for policy 0, policy_version 218044 (0.0093) [2024-06-15 14:22:50,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 43690.7, 300 sec: 43986.9). Total num frames: 446562304. Throughput: 0: 11070.6. Samples: 111679488. Policy #0 lag: (min: 2.0, avg: 160.6, max: 274.0) [2024-06-15 14:22:50,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 14:22:51,335][1652475] Updated weights for policy 0, policy_version 218080 (0.0013) [2024-06-15 14:22:55,738][1648984] Fps is (10 sec: 42596.2, 60 sec: 43690.4, 300 sec: 43542.5). Total num frames: 446693376. Throughput: 0: 11116.0. Samples: 111753728. Policy #0 lag: (min: 2.0, avg: 160.6, max: 274.0) [2024-06-15 14:22:55,739][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 14:22:55,742][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000218112_446693376.pth... [2024-06-15 14:22:55,961][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000212992_436207616.pth [2024-06-15 14:22:56,035][1652475] Updated weights for policy 0, policy_version 218115 (0.0014) [2024-06-15 14:22:56,688][1651340] Signal inference workers to stop experience collection... 
(11200 times) [2024-06-15 14:22:56,727][1652475] InferenceWorker_p0-w0: stopping experience collection (11200 times) [2024-06-15 14:22:56,981][1651340] Signal inference workers to resume experience collection... (11200 times) [2024-06-15 14:22:56,982][1652475] InferenceWorker_p0-w0: resuming experience collection (11200 times) [2024-06-15 14:22:58,171][1652475] Updated weights for policy 0, policy_version 218178 (0.0014) [2024-06-15 14:22:59,158][1652475] Updated weights for policy 0, policy_version 218231 (0.0013) [2024-06-15 14:23:00,716][1652475] Updated weights for policy 0, policy_version 218288 (0.0014) [2024-06-15 14:23:00,738][1648984] Fps is (10 sec: 49151.8, 60 sec: 45329.8, 300 sec: 43875.8). Total num frames: 447053824. Throughput: 0: 11275.4. Samples: 111819776. Policy #0 lag: (min: 2.0, avg: 160.6, max: 274.0) [2024-06-15 14:23:00,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 14:23:02,886][1652475] Updated weights for policy 0, policy_version 218337 (0.0100) [2024-06-15 14:23:05,738][1648984] Fps is (10 sec: 52431.5, 60 sec: 43690.7, 300 sec: 43653.6). Total num frames: 447217664. Throughput: 0: 11093.3. Samples: 111848960. Policy #0 lag: (min: 2.0, avg: 160.6, max: 274.0) [2024-06-15 14:23:05,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 14:23:08,106][1652475] Updated weights for policy 0, policy_version 218400 (0.0012) [2024-06-15 14:23:09,964][1652475] Updated weights for policy 0, policy_version 218464 (0.0014) [2024-06-15 14:23:10,738][1648984] Fps is (10 sec: 42598.7, 60 sec: 46967.5, 300 sec: 43764.7). Total num frames: 447479808. Throughput: 0: 11195.8. Samples: 111921152. Policy #0 lag: (min: 2.0, avg: 160.6, max: 274.0) [2024-06-15 14:23:10,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 14:23:12,601][1652475] Updated weights for policy 0, policy_version 218528 (0.0105) [2024-06-15 14:23:15,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 42598.5, 300 sec: 43764.7). Total num frames: 447610880. 
Throughput: 0: 11138.8. Samples: 111979008. Policy #0 lag: (min: 15.0, avg: 142.5, max: 271.0) [2024-06-15 14:23:15,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 14:23:16,879][1652475] Updated weights for policy 0, policy_version 218597 (0.0041) [2024-06-15 14:23:20,711][1652475] Updated weights for policy 0, policy_version 218672 (0.0016) [2024-06-15 14:23:20,738][1648984] Fps is (10 sec: 36044.6, 60 sec: 45329.0, 300 sec: 43875.8). Total num frames: 447840256. Throughput: 0: 11013.7. Samples: 112009728. Policy #0 lag: (min: 15.0, avg: 142.5, max: 271.0) [2024-06-15 14:23:20,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 14:23:24,125][1652475] Updated weights for policy 0, policy_version 218740 (0.0015) [2024-06-15 14:23:25,738][1648984] Fps is (10 sec: 42598.2, 60 sec: 43690.8, 300 sec: 43653.6). Total num frames: 448036864. Throughput: 0: 10831.7. Samples: 112067584. Policy #0 lag: (min: 15.0, avg: 142.5, max: 271.0) [2024-06-15 14:23:25,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 14:23:26,584][1652475] Updated weights for policy 0, policy_version 218808 (0.0014) [2024-06-15 14:23:28,902][1652475] Updated weights for policy 0, policy_version 218875 (0.0025) [2024-06-15 14:23:30,739][1648984] Fps is (10 sec: 42598.6, 60 sec: 43690.6, 300 sec: 43542.6). Total num frames: 448266240. Throughput: 0: 10717.9. Samples: 112132096. Policy #0 lag: (min: 15.0, avg: 142.5, max: 271.0) [2024-06-15 14:23:30,740][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 14:23:33,614][1652475] Updated weights for policy 0, policy_version 218928 (0.0013) [2024-06-15 14:23:35,739][1648984] Fps is (10 sec: 36040.1, 60 sec: 43689.7, 300 sec: 43323.3). Total num frames: 448397312. Throughput: 0: 10854.1. Samples: 112167936. 
Policy #0 lag: (min: 15.0, avg: 142.5, max: 271.0) [2024-06-15 14:23:35,739][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 14:23:37,238][1652475] Updated weights for policy 0, policy_version 219013 (0.0017) [2024-06-15 14:23:38,035][1652475] Updated weights for policy 0, policy_version 219056 (0.0013) [2024-06-15 14:23:40,009][1652475] Updated weights for policy 0, policy_version 219088 (0.0013) [2024-06-15 14:23:40,738][1648984] Fps is (10 sec: 45875.4, 60 sec: 42598.4, 300 sec: 43875.8). Total num frames: 448724992. Throughput: 0: 10649.7. Samples: 112232960. Policy #0 lag: (min: 15.0, avg: 142.5, max: 271.0) [2024-06-15 14:23:40,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 14:23:44,584][1652475] Updated weights for policy 0, policy_version 219152 (0.0012) [2024-06-15 14:23:45,738][1648984] Fps is (10 sec: 52435.9, 60 sec: 44236.8, 300 sec: 43986.9). Total num frames: 448921600. Throughput: 0: 10615.5. Samples: 112297472. Policy #0 lag: (min: 15.0, avg: 142.5, max: 271.0) [2024-06-15 14:23:45,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 14:23:47,914][1651340] Signal inference workers to stop experience collection... (11250 times) [2024-06-15 14:23:47,913][1652475] Updated weights for policy 0, policy_version 219201 (0.0015) [2024-06-15 14:23:47,988][1652475] InferenceWorker_p0-w0: stopping experience collection (11250 times) [2024-06-15 14:23:48,200][1651340] Signal inference workers to resume experience collection... (11250 times) [2024-06-15 14:23:48,200][1652475] InferenceWorker_p0-w0: resuming experience collection (11250 times) [2024-06-15 14:23:50,046][1652475] Updated weights for policy 0, policy_version 219312 (0.0015) [2024-06-15 14:23:50,738][1648984] Fps is (10 sec: 45874.9, 60 sec: 43690.6, 300 sec: 43986.9). Total num frames: 449183744. Throughput: 0: 10797.5. Samples: 112334848. 
Policy #0 lag: (min: 15.0, avg: 142.5, max: 271.0) [2024-06-15 14:23:50,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 14:23:52,683][1652475] Updated weights for policy 0, policy_version 219383 (0.0013) [2024-06-15 14:23:55,738][1648984] Fps is (10 sec: 39321.1, 60 sec: 43691.0, 300 sec: 43542.6). Total num frames: 449314816. Throughput: 0: 10513.0. Samples: 112394240. Policy #0 lag: (min: 15.0, avg: 142.5, max: 271.0) [2024-06-15 14:23:55,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 14:23:57,772][1652475] Updated weights for policy 0, policy_version 219451 (0.0014) [2024-06-15 14:24:00,738][1648984] Fps is (10 sec: 32768.3, 60 sec: 40960.1, 300 sec: 43542.6). Total num frames: 449511424. Throughput: 0: 10786.1. Samples: 112464384. Policy #0 lag: (min: 15.0, avg: 142.5, max: 271.0) [2024-06-15 14:24:00,740][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 14:24:01,263][1652475] Updated weights for policy 0, policy_version 219527 (0.0013) [2024-06-15 14:24:03,394][1652475] Updated weights for policy 0, policy_version 219600 (0.0014) [2024-06-15 14:24:05,738][1648984] Fps is (10 sec: 52429.3, 60 sec: 43690.7, 300 sec: 43986.9). Total num frames: 449839104. Throughput: 0: 10820.3. Samples: 112496640. Policy #0 lag: (min: 15.0, avg: 142.5, max: 271.0) [2024-06-15 14:24:05,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 14:24:08,962][1652475] Updated weights for policy 0, policy_version 219664 (0.0014) [2024-06-15 14:24:10,738][1648984] Fps is (10 sec: 45874.7, 60 sec: 41506.1, 300 sec: 43653.6). Total num frames: 449970176. Throughput: 0: 11013.7. Samples: 112563200. 
Policy #0 lag: (min: 15.0, avg: 142.5, max: 271.0) [2024-06-15 14:24:10,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 14:24:11,449][1652475] Updated weights for policy 0, policy_version 219714 (0.0020) [2024-06-15 14:24:12,406][1652475] Updated weights for policy 0, policy_version 219765 (0.0012) [2024-06-15 14:24:13,916][1652475] Updated weights for policy 0, policy_version 219840 (0.0063) [2024-06-15 14:24:15,738][1648984] Fps is (10 sec: 49151.7, 60 sec: 45329.0, 300 sec: 44209.0). Total num frames: 450330624. Throughput: 0: 11047.8. Samples: 112629248. Policy #0 lag: (min: 15.0, avg: 142.5, max: 271.0) [2024-06-15 14:24:15,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 14:24:20,340][1652475] Updated weights for policy 0, policy_version 219909 (0.0031) [2024-06-15 14:24:20,738][1648984] Fps is (10 sec: 42598.6, 60 sec: 42598.4, 300 sec: 43654.0). Total num frames: 450396160. Throughput: 0: 11025.4. Samples: 112664064. Policy #0 lag: (min: 11.0, avg: 98.0, max: 267.0) [2024-06-15 14:24:20,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 14:24:21,328][1652475] Updated weights for policy 0, policy_version 219961 (0.0012) [2024-06-15 14:24:23,395][1652475] Updated weights for policy 0, policy_version 220013 (0.0089) [2024-06-15 14:24:24,963][1652475] Updated weights for policy 0, policy_version 220089 (0.0126) [2024-06-15 14:24:25,738][1648984] Fps is (10 sec: 42597.8, 60 sec: 45328.9, 300 sec: 43986.8). Total num frames: 450756608. Throughput: 0: 11184.3. Samples: 112736256. Policy #0 lag: (min: 11.0, avg: 98.0, max: 267.0) [2024-06-15 14:24:25,739][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 14:24:27,060][1652475] Updated weights for policy 0, policy_version 220134 (0.0013) [2024-06-15 14:24:30,738][1648984] Fps is (10 sec: 49151.8, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 450887680. Throughput: 0: 11252.6. Samples: 112803840. 
Policy #0 lag: (min: 11.0, avg: 98.0, max: 267.0) [2024-06-15 14:24:30,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 14:24:32,025][1651340] Signal inference workers to stop experience collection... (11300 times) [2024-06-15 14:24:32,077][1652475] InferenceWorker_p0-w0: stopping experience collection (11300 times) [2024-06-15 14:24:32,274][1651340] Signal inference workers to resume experience collection... (11300 times) [2024-06-15 14:24:32,275][1652475] InferenceWorker_p0-w0: resuming experience collection (11300 times) [2024-06-15 14:24:32,277][1652475] Updated weights for policy 0, policy_version 220192 (0.0106) [2024-06-15 14:24:34,384][1652475] Updated weights for policy 0, policy_version 220226 (0.0013) [2024-06-15 14:24:35,738][1648984] Fps is (10 sec: 42598.8, 60 sec: 46422.3, 300 sec: 43653.6). Total num frames: 451182592. Throughput: 0: 11150.2. Samples: 112836608. Policy #0 lag: (min: 11.0, avg: 98.0, max: 267.0) [2024-06-15 14:24:35,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 14:24:36,573][1652475] Updated weights for policy 0, policy_version 220336 (0.0105) [2024-06-15 14:24:38,541][1652475] Updated weights for policy 0, policy_version 220384 (0.0012) [2024-06-15 14:24:40,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 44782.9, 300 sec: 43986.9). Total num frames: 451411968. Throughput: 0: 11298.1. Samples: 112902656. Policy #0 lag: (min: 11.0, avg: 98.0, max: 267.0) [2024-06-15 14:24:40,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 14:24:42,854][1652475] Updated weights for policy 0, policy_version 220420 (0.0012) [2024-06-15 14:24:45,738][1648984] Fps is (10 sec: 36045.2, 60 sec: 43690.6, 300 sec: 43431.5). Total num frames: 451543040. Throughput: 0: 11343.6. Samples: 112974848. 
Policy #0 lag: (min: 11.0, avg: 98.0, max: 267.0) [2024-06-15 14:24:45,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 14:24:46,072][1652475] Updated weights for policy 0, policy_version 220496 (0.0014) [2024-06-15 14:24:47,723][1652475] Updated weights for policy 0, policy_version 220580 (0.0013) [2024-06-15 14:24:49,929][1652475] Updated weights for policy 0, policy_version 220624 (0.0012) [2024-06-15 14:24:50,739][1648984] Fps is (10 sec: 49151.6, 60 sec: 45329.0, 300 sec: 44320.1). Total num frames: 451903488. Throughput: 0: 11264.0. Samples: 113003520. Policy #0 lag: (min: 11.0, avg: 98.0, max: 267.0) [2024-06-15 14:24:50,740][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 14:24:55,018][1652475] Updated weights for policy 0, policy_version 220688 (0.0013) [2024-06-15 14:24:55,738][1648984] Fps is (10 sec: 45875.1, 60 sec: 44783.0, 300 sec: 43653.6). Total num frames: 452001792. Throughput: 0: 11411.9. Samples: 113076736. Policy #0 lag: (min: 11.0, avg: 98.0, max: 267.0) [2024-06-15 14:24:55,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 14:24:56,157][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000220736_452067328.pth... [2024-06-15 14:24:56,225][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000215616_441581568.pth [2024-06-15 14:24:57,332][1652475] Updated weights for policy 0, policy_version 220740 (0.0016) [2024-06-15 14:24:59,274][1652475] Updated weights for policy 0, policy_version 220832 (0.0104) [2024-06-15 14:24:59,961][1652475] Updated weights for policy 0, policy_version 220864 (0.0011) [2024-06-15 14:25:00,738][1648984] Fps is (10 sec: 42598.9, 60 sec: 46967.4, 300 sec: 43986.9). Total num frames: 452329472. Throughput: 0: 11286.8. Samples: 113137152. 
Policy #0 lag: (min: 11.0, avg: 98.0, max: 267.0) [2024-06-15 14:25:00,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 14:25:03,249][1652475] Updated weights for policy 0, policy_version 220926 (0.0013) [2024-06-15 14:25:05,738][1648984] Fps is (10 sec: 45875.4, 60 sec: 43690.7, 300 sec: 43875.8). Total num frames: 452460544. Throughput: 0: 11275.4. Samples: 113171456. Policy #0 lag: (min: 11.0, avg: 98.0, max: 267.0) [2024-06-15 14:25:05,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 14:25:07,730][1652475] Updated weights for policy 0, policy_version 220992 (0.0014) [2024-06-15 14:25:10,738][1648984] Fps is (10 sec: 42598.2, 60 sec: 46421.3, 300 sec: 43764.8). Total num frames: 452755456. Throughput: 0: 11298.2. Samples: 113244672. Policy #0 lag: (min: 11.0, avg: 98.0, max: 267.0) [2024-06-15 14:25:10,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 14:25:11,204][1652475] Updated weights for policy 0, policy_version 221089 (0.0116) [2024-06-15 14:25:14,859][1652475] Updated weights for policy 0, policy_version 221154 (0.0051) [2024-06-15 14:25:14,868][1651340] Signal inference workers to stop experience collection... (11350 times) [2024-06-15 14:25:14,998][1652475] InferenceWorker_p0-w0: stopping experience collection (11350 times) [2024-06-15 14:25:15,098][1651340] Signal inference workers to resume experience collection... (11350 times) [2024-06-15 14:25:15,099][1652475] InferenceWorker_p0-w0: resuming experience collection (11350 times) [2024-06-15 14:25:15,622][1652475] Updated weights for policy 0, policy_version 221184 (0.0018) [2024-06-15 14:25:15,738][1648984] Fps is (10 sec: 52428.5, 60 sec: 44236.8, 300 sec: 44209.0). Total num frames: 452984832. Throughput: 0: 11036.4. Samples: 113300480. 
Policy #0 lag: (min: 11.0, avg: 98.0, max: 267.0) [2024-06-15 14:25:15,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 14:25:19,760][1652475] Updated weights for policy 0, policy_version 221246 (0.0031) [2024-06-15 14:25:20,738][1648984] Fps is (10 sec: 36044.7, 60 sec: 45329.0, 300 sec: 43542.6). Total num frames: 453115904. Throughput: 0: 11150.2. Samples: 113338368. Policy #0 lag: (min: 11.0, avg: 98.0, max: 267.0) [2024-06-15 14:25:20,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 14:25:23,650][1652475] Updated weights for policy 0, policy_version 221346 (0.0021) [2024-06-15 14:25:25,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 43690.8, 300 sec: 43986.9). Total num frames: 453378048. Throughput: 0: 10956.8. Samples: 113395712. Policy #0 lag: (min: 11.0, avg: 98.0, max: 267.0) [2024-06-15 14:25:25,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 14:25:28,088][1652475] Updated weights for policy 0, policy_version 221413 (0.0186) [2024-06-15 14:25:30,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 43690.7, 300 sec: 43431.5). Total num frames: 453509120. Throughput: 0: 10786.1. Samples: 113460224. Policy #0 lag: (min: 11.0, avg: 98.0, max: 267.0) [2024-06-15 14:25:30,738][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 14:25:30,843][1652475] Updated weights for policy 0, policy_version 221447 (0.0014) [2024-06-15 14:25:34,271][1652475] Updated weights for policy 0, policy_version 221536 (0.0015) [2024-06-15 14:25:35,738][1648984] Fps is (10 sec: 42597.9, 60 sec: 43690.7, 300 sec: 43653.6). Total num frames: 453804032. Throughput: 0: 10899.9. Samples: 113494016. Policy #0 lag: (min: 8.0, avg: 110.5, max: 264.0) [2024-06-15 14:25:35,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 14:25:36,798][1652475] Updated weights for policy 0, policy_version 221632 (0.0135) [2024-06-15 14:25:40,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 41506.2, 300 sec: 43320.4). Total num frames: 453902336. 
Throughput: 0: 10342.4. Samples: 113542144. Policy #0 lag: (min: 8.0, avg: 110.5, max: 264.0) [2024-06-15 14:25:40,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 14:25:45,149][1652475] Updated weights for policy 0, policy_version 221729 (0.0014) [2024-06-15 14:25:45,738][1648984] Fps is (10 sec: 32768.1, 60 sec: 43144.5, 300 sec: 42987.2). Total num frames: 454131712. Throughput: 0: 10467.5. Samples: 113608192. Policy #0 lag: (min: 8.0, avg: 110.5, max: 264.0) [2024-06-15 14:25:45,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 14:25:46,885][1652475] Updated weights for policy 0, policy_version 221761 (0.0013) [2024-06-15 14:25:48,991][1652475] Updated weights for policy 0, policy_version 221840 (0.0016) [2024-06-15 14:25:50,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 42052.3, 300 sec: 43542.6). Total num frames: 454426624. Throughput: 0: 10410.7. Samples: 113639936. Policy #0 lag: (min: 8.0, avg: 110.5, max: 264.0) [2024-06-15 14:25:50,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 14:25:54,287][1652475] Updated weights for policy 0, policy_version 221936 (0.0016) [2024-06-15 14:25:55,738][1648984] Fps is (10 sec: 42598.8, 60 sec: 42598.4, 300 sec: 43320.5). Total num frames: 454557696. Throughput: 0: 10285.5. Samples: 113707520. Policy #0 lag: (min: 8.0, avg: 110.5, max: 264.0) [2024-06-15 14:25:55,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 14:25:57,072][1652475] Updated weights for policy 0, policy_version 222013 (0.0021) [2024-06-15 14:25:58,780][1652475] Updated weights for policy 0, policy_version 222064 (0.0027) [2024-06-15 14:26:00,365][1652475] Updated weights for policy 0, policy_version 222096 (0.0016) [2024-06-15 14:26:00,738][1648984] Fps is (10 sec: 45875.3, 60 sec: 42598.4, 300 sec: 43764.7). Total num frames: 454885376. Throughput: 0: 10683.7. Samples: 113781248. 
Policy #0 lag: (min: 8.0, avg: 110.5, max: 264.0) [2024-06-15 14:26:00,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 14:26:04,794][1651340] Signal inference workers to stop experience collection... (11400 times) [2024-06-15 14:26:04,823][1652475] InferenceWorker_p0-w0: stopping experience collection (11400 times) [2024-06-15 14:26:04,825][1652475] Updated weights for policy 0, policy_version 222178 (0.0044) [2024-06-15 14:26:05,029][1651340] Signal inference workers to resume experience collection... (11400 times) [2024-06-15 14:26:05,042][1652475] InferenceWorker_p0-w0: resuming experience collection (11400 times) [2024-06-15 14:26:05,738][1648984] Fps is (10 sec: 52428.4, 60 sec: 43690.6, 300 sec: 43986.9). Total num frames: 455081984. Throughput: 0: 10683.7. Samples: 113819136. Policy #0 lag: (min: 8.0, avg: 110.5, max: 264.0) [2024-06-15 14:26:05,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 14:26:07,118][1652475] Updated weights for policy 0, policy_version 222226 (0.0014) [2024-06-15 14:26:10,318][1652475] Updated weights for policy 0, policy_version 222334 (0.0017) [2024-06-15 14:26:10,738][1648984] Fps is (10 sec: 45875.1, 60 sec: 43144.6, 300 sec: 43653.7). Total num frames: 455344128. Throughput: 0: 10877.1. Samples: 113885184. Policy #0 lag: (min: 8.0, avg: 110.5, max: 264.0) [2024-06-15 14:26:10,741][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 14:26:12,550][1652475] Updated weights for policy 0, policy_version 222395 (0.0015) [2024-06-15 14:26:15,738][1648984] Fps is (10 sec: 39322.1, 60 sec: 41506.2, 300 sec: 43986.9). Total num frames: 455475200. Throughput: 0: 11036.5. Samples: 113956864. 
Policy #0 lag: (min: 8.0, avg: 110.5, max: 264.0) [2024-06-15 14:26:15,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 14:26:16,842][1652475] Updated weights for policy 0, policy_version 222441 (0.0028) [2024-06-15 14:26:18,433][1652475] Updated weights for policy 0, policy_version 222480 (0.0012) [2024-06-15 14:26:19,411][1652475] Updated weights for policy 0, policy_version 222521 (0.0021) [2024-06-15 14:26:20,738][1648984] Fps is (10 sec: 45875.4, 60 sec: 44783.0, 300 sec: 43986.9). Total num frames: 455802880. Throughput: 0: 11116.1. Samples: 113994240. Policy #0 lag: (min: 8.0, avg: 110.5, max: 264.0) [2024-06-15 14:26:20,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 14:26:21,453][1652475] Updated weights for policy 0, policy_version 222592 (0.0015) [2024-06-15 14:26:25,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 43690.7, 300 sec: 43986.9). Total num frames: 455999488. Throughput: 0: 11434.7. Samples: 114056704. Policy #0 lag: (min: 8.0, avg: 110.5, max: 264.0) [2024-06-15 14:26:25,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 14:26:27,495][1652475] Updated weights for policy 0, policy_version 222672 (0.0018) [2024-06-15 14:26:29,726][1652475] Updated weights for policy 0, policy_version 222723 (0.0013) [2024-06-15 14:26:30,738][1648984] Fps is (10 sec: 39320.7, 60 sec: 44782.8, 300 sec: 43986.9). Total num frames: 456196096. Throughput: 0: 11548.4. Samples: 114127872. Policy #0 lag: (min: 8.0, avg: 110.5, max: 264.0) [2024-06-15 14:26:30,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 14:26:31,255][1652475] Updated weights for policy 0, policy_version 222783 (0.0038) [2024-06-15 14:26:32,992][1652475] Updated weights for policy 0, policy_version 222848 (0.0013) [2024-06-15 14:26:35,725][1652475] Updated weights for policy 0, policy_version 222911 (0.0012) [2024-06-15 14:26:35,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 45329.2, 300 sec: 43986.9). Total num frames: 456523776. 
Throughput: 0: 11457.4. Samples: 114155520. Policy #0 lag: (min: 8.0, avg: 110.5, max: 264.0) [2024-06-15 14:26:35,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 14:26:39,911][1652475] Updated weights for policy 0, policy_version 222974 (0.0015) [2024-06-15 14:26:40,738][1648984] Fps is (10 sec: 45875.7, 60 sec: 45875.2, 300 sec: 44209.0). Total num frames: 456654848. Throughput: 0: 11628.1. Samples: 114230784. Policy #0 lag: (min: 25.0, avg: 119.3, max: 281.0) [2024-06-15 14:26:40,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 14:26:42,558][1652475] Updated weights for policy 0, policy_version 223038 (0.0029) [2024-06-15 14:26:44,392][1652475] Updated weights for policy 0, policy_version 223074 (0.0013) [2024-06-15 14:26:45,738][1648984] Fps is (10 sec: 39321.2, 60 sec: 46421.4, 300 sec: 43986.9). Total num frames: 456916992. Throughput: 0: 11411.9. Samples: 114294784. Policy #0 lag: (min: 25.0, avg: 119.3, max: 281.0) [2024-06-15 14:26:45,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 14:26:46,059][1652475] Updated weights for policy 0, policy_version 223105 (0.0014) [2024-06-15 14:26:50,605][1652475] Updated weights for policy 0, policy_version 223187 (0.0014) [2024-06-15 14:26:50,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 44236.7, 300 sec: 44098.0). Total num frames: 457080832. Throughput: 0: 11355.0. Samples: 114330112. Policy #0 lag: (min: 25.0, avg: 119.3, max: 281.0) [2024-06-15 14:26:50,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 14:26:51,367][1652475] Updated weights for policy 0, policy_version 223228 (0.0013) [2024-06-15 14:26:52,910][1651340] Signal inference workers to stop experience collection... (11450 times) [2024-06-15 14:26:52,956][1652475] InferenceWorker_p0-w0: stopping experience collection (11450 times) [2024-06-15 14:26:53,105][1651340] Signal inference workers to resume experience collection... 
(11450 times) [2024-06-15 14:26:53,107][1652475] InferenceWorker_p0-w0: resuming experience collection (11450 times) [2024-06-15 14:26:53,890][1652475] Updated weights for policy 0, policy_version 223295 (0.0022) [2024-06-15 14:26:55,742][1648984] Fps is (10 sec: 49129.3, 60 sec: 47509.9, 300 sec: 44319.6). Total num frames: 457408512. Throughput: 0: 11433.5. Samples: 114399744. Policy #0 lag: (min: 25.0, avg: 119.3, max: 281.0) [2024-06-15 14:26:55,743][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 14:26:55,843][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000223360_457441280.pth... [2024-06-15 14:26:55,892][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000218112_446693376.pth [2024-06-15 14:26:57,660][1652475] Updated weights for policy 0, policy_version 223363 (0.0013) [2024-06-15 14:26:58,722][1652475] Updated weights for policy 0, policy_version 223420 (0.0037) [2024-06-15 14:27:00,746][1648984] Fps is (10 sec: 49110.7, 60 sec: 44776.6, 300 sec: 43985.6). Total num frames: 457572352. Throughput: 0: 11364.2. Samples: 114468352. Policy #0 lag: (min: 25.0, avg: 119.3, max: 281.0) [2024-06-15 14:27:00,747][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 14:27:02,844][1652475] Updated weights for policy 0, policy_version 223474 (0.0107) [2024-06-15 14:27:05,084][1652475] Updated weights for policy 0, policy_version 223536 (0.0014) [2024-06-15 14:27:05,745][1648984] Fps is (10 sec: 42587.3, 60 sec: 45869.7, 300 sec: 44652.2). Total num frames: 457834496. Throughput: 0: 11319.1. Samples: 114503680. 
Policy #0 lag: (min: 25.0, avg: 119.3, max: 281.0) [2024-06-15 14:27:05,746][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 14:27:07,131][1652475] Updated weights for policy 0, policy_version 223584 (0.0014) [2024-06-15 14:27:09,491][1652475] Updated weights for policy 0, policy_version 223642 (0.0011) [2024-06-15 14:27:10,738][1648984] Fps is (10 sec: 52473.8, 60 sec: 45875.2, 300 sec: 44209.0). Total num frames: 458096640. Throughput: 0: 11366.4. Samples: 114568192. Policy #0 lag: (min: 25.0, avg: 119.3, max: 281.0) [2024-06-15 14:27:10,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 14:27:13,721][1652475] Updated weights for policy 0, policy_version 223696 (0.0012) [2024-06-15 14:27:14,756][1652475] Updated weights for policy 0, policy_version 223740 (0.0015) [2024-06-15 14:27:15,757][1648984] Fps is (10 sec: 39274.7, 60 sec: 45860.5, 300 sec: 44428.3). Total num frames: 458227712. Throughput: 0: 11373.0. Samples: 114639872. Policy #0 lag: (min: 25.0, avg: 119.3, max: 281.0) [2024-06-15 14:27:15,757][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 14:27:16,981][1652475] Updated weights for policy 0, policy_version 223803 (0.0015) [2024-06-15 14:27:19,166][1652475] Updated weights for policy 0, policy_version 223866 (0.0014) [2024-06-15 14:27:20,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 45329.1, 300 sec: 44431.2). Total num frames: 458522624. Throughput: 0: 11468.8. Samples: 114671616. Policy #0 lag: (min: 25.0, avg: 119.3, max: 281.0) [2024-06-15 14:27:20,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 14:27:21,209][1652475] Updated weights for policy 0, policy_version 223920 (0.0012) [2024-06-15 14:27:25,738][1648984] Fps is (10 sec: 45963.1, 60 sec: 44782.8, 300 sec: 44209.0). Total num frames: 458686464. Throughput: 0: 11480.2. Samples: 114747392. 
Policy #0 lag: (min: 25.0, avg: 119.3, max: 281.0) [2024-06-15 14:27:25,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 14:27:26,217][1652475] Updated weights for policy 0, policy_version 223997 (0.0013) [2024-06-15 14:27:28,158][1652475] Updated weights for policy 0, policy_version 224062 (0.0075) [2024-06-15 14:27:29,992][1652475] Updated weights for policy 0, policy_version 224112 (0.0013) [2024-06-15 14:27:30,738][1648984] Fps is (10 sec: 49150.1, 60 sec: 46967.3, 300 sec: 44875.4). Total num frames: 459014144. Throughput: 0: 11343.6. Samples: 114805248. Policy #0 lag: (min: 25.0, avg: 119.3, max: 281.0) [2024-06-15 14:27:30,739][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 14:27:32,912][1652475] Updated weights for policy 0, policy_version 224153 (0.0013) [2024-06-15 14:27:35,738][1648984] Fps is (10 sec: 45875.8, 60 sec: 43690.6, 300 sec: 43986.9). Total num frames: 459145216. Throughput: 0: 11411.9. Samples: 114843648. Policy #0 lag: (min: 25.0, avg: 119.3, max: 281.0) [2024-06-15 14:27:35,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 14:27:36,413][1652475] Updated weights for policy 0, policy_version 224196 (0.0012) [2024-06-15 14:27:37,343][1652475] Updated weights for policy 0, policy_version 224253 (0.0013) [2024-06-15 14:27:38,868][1652475] Updated weights for policy 0, policy_version 224304 (0.0013) [2024-06-15 14:27:40,738][1648984] Fps is (10 sec: 39322.9, 60 sec: 45875.2, 300 sec: 44542.3). Total num frames: 459407360. Throughput: 0: 11356.2. Samples: 114910720. Policy #0 lag: (min: 25.0, avg: 119.3, max: 281.0) [2024-06-15 14:27:40,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 14:27:40,855][1651340] Signal inference workers to stop experience collection... (11500 times) [2024-06-15 14:27:40,931][1652475] InferenceWorker_p0-w0: stopping experience collection (11500 times) [2024-06-15 14:27:41,056][1651340] Signal inference workers to resume experience collection... 
(11500 times) [2024-06-15 14:27:41,056][1652475] InferenceWorker_p0-w0: resuming experience collection (11500 times) [2024-06-15 14:27:41,395][1652475] Updated weights for policy 0, policy_version 224352 (0.0013) [2024-06-15 14:27:44,676][1652475] Updated weights for policy 0, policy_version 224416 (0.0014) [2024-06-15 14:27:45,738][1648984] Fps is (10 sec: 52426.8, 60 sec: 45875.0, 300 sec: 44431.1). Total num frames: 459669504. Throughput: 0: 11186.4. Samples: 114971648. Policy #0 lag: (min: 25.0, avg: 119.3, max: 281.0) [2024-06-15 14:27:45,739][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 14:27:50,738][1648984] Fps is (10 sec: 29491.0, 60 sec: 43690.7, 300 sec: 44098.0). Total num frames: 459702272. Throughput: 0: 11072.3. Samples: 115001856. Policy #0 lag: (min: 15.0, avg: 91.3, max: 271.0) [2024-06-15 14:27:50,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 14:27:50,994][1652475] Updated weights for policy 0, policy_version 224480 (0.0013) [2024-06-15 14:27:53,178][1652475] Updated weights for policy 0, policy_version 224565 (0.0012) [2024-06-15 14:27:55,738][1648984] Fps is (10 sec: 29491.7, 60 sec: 42601.6, 300 sec: 43764.7). Total num frames: 459964416. Throughput: 0: 10751.9. Samples: 115052032. Policy #0 lag: (min: 15.0, avg: 91.3, max: 271.0) [2024-06-15 14:27:55,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 14:27:56,294][1652475] Updated weights for policy 0, policy_version 224610 (0.0026) [2024-06-15 14:27:59,178][1652475] Updated weights for policy 0, policy_version 224688 (0.0015) [2024-06-15 14:28:00,738][1648984] Fps is (10 sec: 49152.8, 60 sec: 43696.9, 300 sec: 43986.9). Total num frames: 460193792. Throughput: 0: 10540.3. Samples: 115113984. 
Policy #0 lag: (min: 15.0, avg: 91.3, max: 271.0) [2024-06-15 14:28:00,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 14:28:03,389][1652475] Updated weights for policy 0, policy_version 224752 (0.0013) [2024-06-15 14:28:05,738][1648984] Fps is (10 sec: 39322.3, 60 sec: 42057.4, 300 sec: 43653.6). Total num frames: 460357632. Throughput: 0: 10592.7. Samples: 115148288. Policy #0 lag: (min: 15.0, avg: 91.3, max: 271.0) [2024-06-15 14:28:05,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 14:28:06,388][1652475] Updated weights for policy 0, policy_version 224825 (0.0013) [2024-06-15 14:28:08,305][1652475] Updated weights for policy 0, policy_version 224892 (0.0016) [2024-06-15 14:28:10,738][1648984] Fps is (10 sec: 39321.2, 60 sec: 41506.1, 300 sec: 43986.9). Total num frames: 460587008. Throughput: 0: 10296.9. Samples: 115210752. Policy #0 lag: (min: 15.0, avg: 91.3, max: 271.0) [2024-06-15 14:28:10,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 14:28:11,525][1652475] Updated weights for policy 0, policy_version 224931 (0.0013) [2024-06-15 14:28:14,161][1652475] Updated weights for policy 0, policy_version 224981 (0.0011) [2024-06-15 14:28:15,738][1648984] Fps is (10 sec: 49152.0, 60 sec: 43704.7, 300 sec: 44098.0). Total num frames: 460849152. Throughput: 0: 10638.3. Samples: 115283968. Policy #0 lag: (min: 15.0, avg: 91.3, max: 271.0) [2024-06-15 14:28:15,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 14:28:17,098][1652475] Updated weights for policy 0, policy_version 225057 (0.0014) [2024-06-15 14:28:18,128][1652475] Updated weights for policy 0, policy_version 225090 (0.0017) [2024-06-15 14:28:19,377][1652475] Updated weights for policy 0, policy_version 225144 (0.0013) [2024-06-15 14:28:20,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 43144.5, 300 sec: 44320.1). Total num frames: 461111296. Throughput: 0: 10638.2. Samples: 115322368. 
Policy #0 lag: (min: 15.0, avg: 91.3, max: 271.0) [2024-06-15 14:28:20,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 14:28:22,620][1652475] Updated weights for policy 0, policy_version 225203 (0.0012) [2024-06-15 14:28:25,728][1652475] Updated weights for policy 0, policy_version 225252 (0.0014) [2024-06-15 14:28:25,738][1648984] Fps is (10 sec: 45875.2, 60 sec: 43690.7, 300 sec: 44209.0). Total num frames: 461307904. Throughput: 0: 10638.2. Samples: 115389440. Policy #0 lag: (min: 15.0, avg: 91.3, max: 271.0) [2024-06-15 14:28:25,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 14:28:28,282][1652475] Updated weights for policy 0, policy_version 225312 (0.0014) [2024-06-15 14:28:30,328][1651340] Signal inference workers to stop experience collection... (11550 times) [2024-06-15 14:28:30,371][1652475] InferenceWorker_p0-w0: stopping experience collection (11550 times) [2024-06-15 14:28:30,575][1651340] Signal inference workers to resume experience collection... (11550 times) [2024-06-15 14:28:30,577][1652475] InferenceWorker_p0-w0: resuming experience collection (11550 times) [2024-06-15 14:28:30,738][1648984] Fps is (10 sec: 45875.5, 60 sec: 42598.7, 300 sec: 44653.6). Total num frames: 461570048. Throughput: 0: 10809.0. Samples: 115458048. Policy #0 lag: (min: 15.0, avg: 91.3, max: 271.0) [2024-06-15 14:28:30,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 14:28:31,011][1652475] Updated weights for policy 0, policy_version 225392 (0.0017) [2024-06-15 14:28:34,068][1652475] Updated weights for policy 0, policy_version 225456 (0.0016) [2024-06-15 14:28:35,738][1648984] Fps is (10 sec: 45874.9, 60 sec: 43690.6, 300 sec: 44209.0). Total num frames: 461766656. Throughput: 0: 10786.1. Samples: 115487232. 
Policy #0 lag: (min: 15.0, avg: 91.3, max: 271.0) [2024-06-15 14:28:35,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 14:28:37,481][1652475] Updated weights for policy 0, policy_version 225507 (0.0013) [2024-06-15 14:28:40,241][1652475] Updated weights for policy 0, policy_version 225555 (0.0030) [2024-06-15 14:28:40,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 42598.4, 300 sec: 44209.0). Total num frames: 461963264. Throughput: 0: 11264.1. Samples: 115558912. Policy #0 lag: (min: 15.0, avg: 91.3, max: 271.0) [2024-06-15 14:28:40,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 14:28:42,170][1652475] Updated weights for policy 0, policy_version 225616 (0.0018) [2024-06-15 14:28:43,013][1652475] Updated weights for policy 0, policy_version 225657 (0.0012) [2024-06-15 14:28:45,431][1652475] Updated weights for policy 0, policy_version 225723 (0.0013) [2024-06-15 14:28:45,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 43690.9, 300 sec: 44431.2). Total num frames: 462290944. Throughput: 0: 11264.0. Samples: 115620864. Policy #0 lag: (min: 15.0, avg: 91.3, max: 271.0) [2024-06-15 14:28:45,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 14:28:49,960][1652475] Updated weights for policy 0, policy_version 225785 (0.0014) [2024-06-15 14:28:50,738][1648984] Fps is (10 sec: 45874.6, 60 sec: 45329.1, 300 sec: 44431.2). Total num frames: 462422016. Throughput: 0: 11355.0. Samples: 115659264. Policy #0 lag: (min: 15.0, avg: 91.3, max: 271.0) [2024-06-15 14:28:50,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 14:28:52,723][1652475] Updated weights for policy 0, policy_version 225850 (0.0013) [2024-06-15 14:28:54,808][1652475] Updated weights for policy 0, policy_version 225917 (0.0013) [2024-06-15 14:28:55,740][1648984] Fps is (10 sec: 39313.2, 60 sec: 45327.5, 300 sec: 44653.0). Total num frames: 462684160. Throughput: 0: 11365.9. Samples: 115722240. 
Policy #0 lag: (min: 15.0, avg: 91.3, max: 271.0) [2024-06-15 14:28:55,740][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 14:28:55,745][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000225920_462684160.pth... [2024-06-15 14:28:55,808][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000220736_452067328.pth [2024-06-15 14:28:57,358][1652475] Updated weights for policy 0, policy_version 225968 (0.0014) [2024-06-15 14:29:00,738][1648984] Fps is (10 sec: 39322.1, 60 sec: 43690.6, 300 sec: 43986.9). Total num frames: 462815232. Throughput: 0: 11252.6. Samples: 115790336. Policy #0 lag: (min: 15.0, avg: 91.3, max: 271.0) [2024-06-15 14:29:00,740][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 14:29:01,199][1652475] Updated weights for policy 0, policy_version 225989 (0.0013) [2024-06-15 14:29:04,240][1652475] Updated weights for policy 0, policy_version 226080 (0.0117) [2024-06-15 14:29:05,738][1648984] Fps is (10 sec: 39330.0, 60 sec: 45329.0, 300 sec: 44431.2). Total num frames: 463077376. Throughput: 0: 11070.6. Samples: 115820544. Policy #0 lag: (min: 15.0, avg: 104.5, max: 271.0) [2024-06-15 14:29:05,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 14:29:06,607][1652475] Updated weights for policy 0, policy_version 226147 (0.0013) [2024-06-15 14:29:08,819][1652475] Updated weights for policy 0, policy_version 226192 (0.0015) [2024-06-15 14:29:10,738][1648984] Fps is (10 sec: 52428.4, 60 sec: 45875.2, 300 sec: 44098.0). Total num frames: 463339520. Throughput: 0: 11013.7. Samples: 115885056. Policy #0 lag: (min: 15.0, avg: 104.5, max: 271.0) [2024-06-15 14:29:10,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 14:29:13,682][1652475] Updated weights for policy 0, policy_version 226272 (0.0020) [2024-06-15 14:29:15,738][1648984] Fps is (10 sec: 39322.1, 60 sec: 43690.7, 300 sec: 44320.1). Total num frames: 463470592. Throughput: 0: 10956.8. 
Samples: 115951104. Policy #0 lag: (min: 15.0, avg: 104.5, max: 271.0) [2024-06-15 14:29:15,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 14:29:15,951][1652475] Updated weights for policy 0, policy_version 226320 (0.0011) [2024-06-15 14:29:17,031][1652475] Updated weights for policy 0, policy_version 226368 (0.0031) [2024-06-15 14:29:18,514][1652475] Updated weights for policy 0, policy_version 226424 (0.0016) [2024-06-15 14:29:20,169][1651340] Signal inference workers to stop experience collection... (11600 times) [2024-06-15 14:29:20,237][1652475] InferenceWorker_p0-w0: stopping experience collection (11600 times) [2024-06-15 14:29:20,366][1651340] Signal inference workers to resume experience collection... (11600 times) [2024-06-15 14:29:20,367][1652475] InferenceWorker_p0-w0: resuming experience collection (11600 times) [2024-06-15 14:29:20,738][1648984] Fps is (10 sec: 42598.6, 60 sec: 44236.8, 300 sec: 44098.0). Total num frames: 463765504. Throughput: 0: 11070.6. Samples: 115985408. Policy #0 lag: (min: 15.0, avg: 104.5, max: 271.0) [2024-06-15 14:29:20,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 14:29:21,286][1652475] Updated weights for policy 0, policy_version 226492 (0.0015) [2024-06-15 14:29:25,739][1648984] Fps is (10 sec: 49151.5, 60 sec: 44236.8, 300 sec: 44320.1). Total num frames: 463962112. Throughput: 0: 11025.0. Samples: 116055040. Policy #0 lag: (min: 15.0, avg: 104.5, max: 271.0) [2024-06-15 14:29:25,740][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 14:29:25,962][1652475] Updated weights for policy 0, policy_version 226555 (0.0012) [2024-06-15 14:29:29,124][1652475] Updated weights for policy 0, policy_version 226640 (0.0014) [2024-06-15 14:29:30,738][1648984] Fps is (10 sec: 49149.5, 60 sec: 44782.5, 300 sec: 44320.0). Total num frames: 464257024. Throughput: 0: 11002.2. Samples: 116115968. 
Policy #0 lag: (min: 15.0, avg: 104.5, max: 271.0) [2024-06-15 14:29:30,739][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 14:29:32,578][1652475] Updated weights for policy 0, policy_version 226705 (0.0014) [2024-06-15 14:29:35,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 43690.7, 300 sec: 43986.9). Total num frames: 464388096. Throughput: 0: 10854.4. Samples: 116147712. Policy #0 lag: (min: 15.0, avg: 104.5, max: 271.0) [2024-06-15 14:29:35,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 14:29:37,485][1652475] Updated weights for policy 0, policy_version 226791 (0.0017) [2024-06-15 14:29:39,745][1652475] Updated weights for policy 0, policy_version 226848 (0.0014) [2024-06-15 14:29:40,738][1648984] Fps is (10 sec: 39323.2, 60 sec: 44782.9, 300 sec: 44431.2). Total num frames: 464650240. Throughput: 0: 11128.0. Samples: 116222976. Policy #0 lag: (min: 15.0, avg: 104.5, max: 271.0) [2024-06-15 14:29:40,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 14:29:41,195][1652475] Updated weights for policy 0, policy_version 226901 (0.0016) [2024-06-15 14:29:43,954][1652475] Updated weights for policy 0, policy_version 226960 (0.0013) [2024-06-15 14:29:45,739][1648984] Fps is (10 sec: 52424.1, 60 sec: 43690.0, 300 sec: 44097.8). Total num frames: 464912384. Throughput: 0: 10922.4. Samples: 116281856. Policy #0 lag: (min: 15.0, avg: 104.5, max: 271.0) [2024-06-15 14:29:45,739][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 14:29:48,759][1652475] Updated weights for policy 0, policy_version 227028 (0.0015) [2024-06-15 14:29:50,738][1648984] Fps is (10 sec: 39321.1, 60 sec: 43690.6, 300 sec: 44209.0). Total num frames: 465043456. Throughput: 0: 11195.7. Samples: 116324352. 
Policy #0 lag: (min: 15.0, avg: 104.5, max: 271.0) [2024-06-15 14:29:50,739][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 14:29:50,773][1652475] Updated weights for policy 0, policy_version 227088 (0.0014) [2024-06-15 14:29:51,617][1652475] Updated weights for policy 0, policy_version 227128 (0.0015) [2024-06-15 14:29:53,033][1652475] Updated weights for policy 0, policy_version 227185 (0.0017) [2024-06-15 14:29:55,738][1648984] Fps is (10 sec: 42602.4, 60 sec: 44238.4, 300 sec: 44097.9). Total num frames: 465338368. Throughput: 0: 11218.5. Samples: 116389888. Policy #0 lag: (min: 15.0, avg: 104.5, max: 271.0) [2024-06-15 14:29:55,738][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 14:29:56,383][1652475] Updated weights for policy 0, policy_version 227259 (0.0015) [2024-06-15 14:30:00,738][1648984] Fps is (10 sec: 39322.6, 60 sec: 43690.7, 300 sec: 43986.9). Total num frames: 465436672. Throughput: 0: 11241.2. Samples: 116456960. Policy #0 lag: (min: 15.0, avg: 104.5, max: 271.0) [2024-06-15 14:30:00,738][1648984] Avg episode reward: [(0, '-0.520')] [2024-06-15 14:30:02,036][1652475] Updated weights for policy 0, policy_version 227312 (0.0013) [2024-06-15 14:30:05,682][1652475] Updated weights for policy 0, policy_version 227394 (0.0014) [2024-06-15 14:30:05,738][1648984] Fps is (10 sec: 36044.5, 60 sec: 43690.6, 300 sec: 43875.8). Total num frames: 465698816. Throughput: 0: 11059.2. Samples: 116483072. Policy #0 lag: (min: 15.0, avg: 104.5, max: 271.0) [2024-06-15 14:30:05,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 14:30:06,941][1652475] Updated weights for policy 0, policy_version 227454 (0.0025) [2024-06-15 14:30:07,326][1651340] Signal inference workers to stop experience collection... (11650 times) [2024-06-15 14:30:07,369][1652475] InferenceWorker_p0-w0: stopping experience collection (11650 times) [2024-06-15 14:30:07,540][1651340] Signal inference workers to resume experience collection... 
(11650 times) [2024-06-15 14:30:07,548][1652475] InferenceWorker_p0-w0: resuming experience collection (11650 times) [2024-06-15 14:30:10,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 43690.7, 300 sec: 43986.9). Total num frames: 465960960. Throughput: 0: 10695.1. Samples: 116536320. Policy #0 lag: (min: 15.0, avg: 104.5, max: 271.0) [2024-06-15 14:30:10,738][1648984] Avg episode reward: [(0, '-0.520')] [2024-06-15 14:30:15,141][1652475] Updated weights for policy 0, policy_version 227536 (0.0015) [2024-06-15 14:30:15,738][1648984] Fps is (10 sec: 32768.5, 60 sec: 42598.4, 300 sec: 43764.7). Total num frames: 466026496. Throughput: 0: 10797.6. Samples: 116601856. Policy #0 lag: (min: 12.0, avg: 75.2, max: 268.0) [2024-06-15 14:30:15,738][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 14:30:17,412][1652475] Updated weights for policy 0, policy_version 227632 (0.0014) [2024-06-15 14:30:19,422][1652475] Updated weights for policy 0, policy_version 227705 (0.0016) [2024-06-15 14:30:20,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 43144.5, 300 sec: 43986.9). Total num frames: 466354176. Throughput: 0: 10626.9. Samples: 116625920. Policy #0 lag: (min: 12.0, avg: 75.2, max: 268.0) [2024-06-15 14:30:20,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 14:30:22,796][1652475] Updated weights for policy 0, policy_version 227769 (0.0014) [2024-06-15 14:30:25,738][1648984] Fps is (10 sec: 45875.0, 60 sec: 42052.3, 300 sec: 43986.9). Total num frames: 466485248. Throughput: 0: 10331.0. Samples: 116687872. Policy #0 lag: (min: 12.0, avg: 75.2, max: 268.0) [2024-06-15 14:30:25,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 14:30:28,497][1652475] Updated weights for policy 0, policy_version 227824 (0.0042) [2024-06-15 14:30:30,224][1652475] Updated weights for policy 0, policy_version 227875 (0.0016) [2024-06-15 14:30:30,738][1648984] Fps is (10 sec: 36044.7, 60 sec: 40960.3, 300 sec: 43764.7). Total num frames: 466714624. 
Throughput: 0: 10490.5. Samples: 116753920. Policy #0 lag: (min: 12.0, avg: 75.2, max: 268.0) [2024-06-15 14:30:30,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 14:30:32,384][1652475] Updated weights for policy 0, policy_version 227962 (0.0012) [2024-06-15 14:30:34,597][1652475] Updated weights for policy 0, policy_version 228000 (0.0013) [2024-06-15 14:30:35,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 43690.7, 300 sec: 44431.2). Total num frames: 467009536. Throughput: 0: 10114.9. Samples: 116779520. Policy #0 lag: (min: 12.0, avg: 75.2, max: 268.0) [2024-06-15 14:30:35,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 14:30:39,494][1652475] Updated weights for policy 0, policy_version 228066 (0.0014) [2024-06-15 14:30:40,738][1648984] Fps is (10 sec: 42598.6, 60 sec: 41506.2, 300 sec: 44098.0). Total num frames: 467140608. Throughput: 0: 10422.1. Samples: 116858880. Policy #0 lag: (min: 12.0, avg: 75.2, max: 268.0) [2024-06-15 14:30:40,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 14:30:41,585][1652475] Updated weights for policy 0, policy_version 228116 (0.0020) [2024-06-15 14:30:43,548][1652475] Updated weights for policy 0, policy_version 228192 (0.0015) [2024-06-15 14:30:45,738][1648984] Fps is (10 sec: 42596.1, 60 sec: 42052.6, 300 sec: 44097.9). Total num frames: 467435520. Throughput: 0: 10251.3. Samples: 116918272. Policy #0 lag: (min: 12.0, avg: 75.2, max: 268.0) [2024-06-15 14:30:45,739][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 14:30:45,853][1652475] Updated weights for policy 0, policy_version 228241 (0.0012) [2024-06-15 14:30:46,691][1652475] Updated weights for policy 0, policy_version 228282 (0.0042) [2024-06-15 14:30:50,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 41506.3, 300 sec: 43986.9). Total num frames: 467533824. Throughput: 0: 10467.6. Samples: 116954112. 
Policy #0 lag: (min: 12.0, avg: 75.2, max: 268.0) [2024-06-15 14:30:50,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 14:30:53,178][1652475] Updated weights for policy 0, policy_version 228353 (0.0013) [2024-06-15 14:30:55,162][1651340] Signal inference workers to stop experience collection... (11700 times) [2024-06-15 14:30:55,187][1652475] Updated weights for policy 0, policy_version 228433 (0.0101) [2024-06-15 14:30:55,217][1652475] InferenceWorker_p0-w0: stopping experience collection (11700 times) [2024-06-15 14:30:55,462][1651340] Signal inference workers to resume experience collection... (11700 times) [2024-06-15 14:30:55,463][1652475] InferenceWorker_p0-w0: resuming experience collection (11700 times) [2024-06-15 14:30:55,738][1648984] Fps is (10 sec: 42598.8, 60 sec: 42052.0, 300 sec: 43986.8). Total num frames: 467861504. Throughput: 0: 10786.0. Samples: 117021696. Policy #0 lag: (min: 12.0, avg: 75.2, max: 268.0) [2024-06-15 14:30:55,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 14:30:56,158][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000228480_467927040.pth... [2024-06-15 14:30:56,203][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000223360_457441280.pth [2024-06-15 14:30:56,209][1651340] Saving a milestone train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/milestones/checkpoint_000228480_467927040.pth [2024-06-15 14:30:57,833][1652475] Updated weights for policy 0, policy_version 228499 (0.0164) [2024-06-15 14:31:00,738][1648984] Fps is (10 sec: 52428.4, 60 sec: 43690.6, 300 sec: 43986.9). Total num frames: 468058112. Throughput: 0: 10774.7. Samples: 117086720. 
Policy #0 lag: (min: 12.0, avg: 75.2, max: 268.0) [2024-06-15 14:31:00,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 14:31:03,364][1652475] Updated weights for policy 0, policy_version 228576 (0.0013) [2024-06-15 14:31:05,487][1652475] Updated weights for policy 0, policy_version 228626 (0.0012) [2024-06-15 14:31:05,738][1648984] Fps is (10 sec: 36046.4, 60 sec: 42052.4, 300 sec: 43653.6). Total num frames: 468221952. Throughput: 0: 11047.8. Samples: 117123072. Policy #0 lag: (min: 12.0, avg: 75.2, max: 268.0) [2024-06-15 14:31:05,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 14:31:06,922][1652475] Updated weights for policy 0, policy_version 228689 (0.0014) [2024-06-15 14:31:07,848][1652475] Updated weights for policy 0, policy_version 228730 (0.0012) [2024-06-15 14:31:09,588][1652475] Updated weights for policy 0, policy_version 228770 (0.0045) [2024-06-15 14:31:10,738][1648984] Fps is (10 sec: 52428.4, 60 sec: 43690.6, 300 sec: 44431.2). Total num frames: 468582400. Throughput: 0: 11047.8. Samples: 117185024. Policy #0 lag: (min: 12.0, avg: 75.2, max: 268.0) [2024-06-15 14:31:10,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 14:31:14,094][1652475] Updated weights for policy 0, policy_version 228816 (0.0069) [2024-06-15 14:31:15,076][1652475] Updated weights for policy 0, policy_version 228856 (0.0012) [2024-06-15 14:31:15,738][1648984] Fps is (10 sec: 49151.9, 60 sec: 44782.9, 300 sec: 43764.7). Total num frames: 468713472. Throughput: 0: 11264.0. Samples: 117260800. Policy #0 lag: (min: 12.0, avg: 75.2, max: 268.0) [2024-06-15 14:31:15,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 14:31:17,721][1652475] Updated weights for policy 0, policy_version 228916 (0.0031) [2024-06-15 14:31:19,556][1652475] Updated weights for policy 0, policy_version 228991 (0.0139) [2024-06-15 14:31:20,738][1648984] Fps is (10 sec: 39322.1, 60 sec: 43690.7, 300 sec: 43986.9). Total num frames: 468975616. 
Throughput: 0: 11366.4. Samples: 117291008. Policy #0 lag: (min: 12.0, avg: 75.2, max: 268.0) [2024-06-15 14:31:20,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 14:31:22,012][1652475] Updated weights for policy 0, policy_version 229048 (0.0015) [2024-06-15 14:31:25,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 43690.6, 300 sec: 43764.7). Total num frames: 469106688. Throughput: 0: 11116.1. Samples: 117359104. Policy #0 lag: (min: 12.0, avg: 75.2, max: 268.0) [2024-06-15 14:31:25,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 14:31:26,887][1652475] Updated weights for policy 0, policy_version 229104 (0.0013) [2024-06-15 14:31:28,999][1652475] Updated weights for policy 0, policy_version 229139 (0.0013) [2024-06-15 14:31:30,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 44783.0, 300 sec: 43653.6). Total num frames: 469401600. Throughput: 0: 11264.1. Samples: 117425152. Policy #0 lag: (min: 15.0, avg: 105.9, max: 271.0) [2024-06-15 14:31:30,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 14:31:30,808][1652475] Updated weights for policy 0, policy_version 229216 (0.0015) [2024-06-15 14:31:31,957][1652475] Updated weights for policy 0, policy_version 229253 (0.0039) [2024-06-15 14:31:33,283][1652475] Updated weights for policy 0, policy_version 229311 (0.0014) [2024-06-15 14:31:35,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 43690.6, 300 sec: 43986.9). Total num frames: 469630976. Throughput: 0: 10968.2. Samples: 117447680. Policy #0 lag: (min: 15.0, avg: 105.9, max: 271.0) [2024-06-15 14:31:35,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 14:31:39,037][1652475] Updated weights for policy 0, policy_version 229373 (0.0020) [2024-06-15 14:31:40,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 44236.8, 300 sec: 43653.6). Total num frames: 469794816. Throughput: 0: 11230.0. Samples: 117527040. 
Policy #0 lag: (min: 15.0, avg: 105.9, max: 271.0) [2024-06-15 14:31:40,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 14:31:41,482][1651340] Signal inference workers to stop experience collection... (11750 times) [2024-06-15 14:31:41,561][1652475] InferenceWorker_p0-w0: stopping experience collection (11750 times) [2024-06-15 14:31:41,822][1651340] Signal inference workers to resume experience collection... (11750 times) [2024-06-15 14:31:41,823][1652475] InferenceWorker_p0-w0: resuming experience collection (11750 times) [2024-06-15 14:31:42,452][1652475] Updated weights for policy 0, policy_version 229458 (0.0015) [2024-06-15 14:31:43,555][1652475] Updated weights for policy 0, policy_version 229503 (0.0016) [2024-06-15 14:31:45,027][1652475] Updated weights for policy 0, policy_version 229541 (0.0021) [2024-06-15 14:31:45,738][1648984] Fps is (10 sec: 52427.6, 60 sec: 45329.2, 300 sec: 44320.1). Total num frames: 470155264. Throughput: 0: 10979.5. Samples: 117580800. Policy #0 lag: (min: 15.0, avg: 105.9, max: 271.0) [2024-06-15 14:31:45,739][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 14:31:50,411][1652475] Updated weights for policy 0, policy_version 229616 (0.0110) [2024-06-15 14:31:50,770][1648984] Fps is (10 sec: 48993.0, 60 sec: 45850.4, 300 sec: 43649.5). Total num frames: 470286336. Throughput: 0: 11153.5. Samples: 117625344. Policy #0 lag: (min: 15.0, avg: 105.9, max: 271.0) [2024-06-15 14:31:50,771][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 14:31:53,487][1652475] Updated weights for policy 0, policy_version 229667 (0.0013) [2024-06-15 14:31:55,738][1648984] Fps is (10 sec: 36045.5, 60 sec: 44237.1, 300 sec: 43877.1). Total num frames: 470515712. Throughput: 0: 11036.5. Samples: 117681664. 
Policy #0 lag: (min: 15.0, avg: 105.9, max: 271.0) [2024-06-15 14:31:55,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 14:31:55,811][1652475] Updated weights for policy 0, policy_version 229758 (0.0012) [2024-06-15 14:31:57,393][1652475] Updated weights for policy 0, policy_version 229815 (0.0018) [2024-06-15 14:32:00,738][1648984] Fps is (10 sec: 39449.8, 60 sec: 43690.7, 300 sec: 43543.6). Total num frames: 470679552. Throughput: 0: 10922.7. Samples: 117752320. Policy #0 lag: (min: 15.0, avg: 105.9, max: 271.0) [2024-06-15 14:32:00,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 14:32:05,161][1652475] Updated weights for policy 0, policy_version 229894 (0.0012) [2024-06-15 14:32:05,738][1648984] Fps is (10 sec: 32768.2, 60 sec: 43690.6, 300 sec: 43209.3). Total num frames: 470843392. Throughput: 0: 10911.3. Samples: 117782016. Policy #0 lag: (min: 15.0, avg: 105.9, max: 271.0) [2024-06-15 14:32:05,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 14:32:06,593][1652475] Updated weights for policy 0, policy_version 229958 (0.0012) [2024-06-15 14:32:08,139][1652475] Updated weights for policy 0, policy_version 230019 (0.0010) [2024-06-15 14:32:09,336][1652475] Updated weights for policy 0, policy_version 230077 (0.0153) [2024-06-15 14:32:10,747][1648984] Fps is (10 sec: 52383.9, 60 sec: 43684.5, 300 sec: 43988.5). Total num frames: 471203840. Throughput: 0: 10715.8. Samples: 117841408. Policy #0 lag: (min: 15.0, avg: 105.9, max: 271.0) [2024-06-15 14:32:10,762][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 14:32:15,287][1652475] Updated weights for policy 0, policy_version 230144 (0.0014) [2024-06-15 14:32:15,738][1648984] Fps is (10 sec: 49152.1, 60 sec: 43690.7, 300 sec: 43431.5). Total num frames: 471334912. Throughput: 0: 10797.5. Samples: 117911040. 
Policy #0 lag: (min: 15.0, avg: 105.9, max: 271.0) [2024-06-15 14:32:15,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 14:32:18,617][1652475] Updated weights for policy 0, policy_version 230208 (0.0015) [2024-06-15 14:32:20,270][1652475] Updated weights for policy 0, policy_version 230273 (0.0026) [2024-06-15 14:32:20,738][1648984] Fps is (10 sec: 42635.1, 60 sec: 44236.8, 300 sec: 43875.8). Total num frames: 471629824. Throughput: 0: 10968.2. Samples: 117941248. Policy #0 lag: (min: 15.0, avg: 105.9, max: 271.0) [2024-06-15 14:32:20,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 14:32:25,738][1648984] Fps is (10 sec: 39321.3, 60 sec: 43690.6, 300 sec: 43098.3). Total num frames: 471728128. Throughput: 0: 10490.3. Samples: 117999104. Policy #0 lag: (min: 15.0, avg: 105.9, max: 271.0) [2024-06-15 14:32:25,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 14:32:27,467][1652475] Updated weights for policy 0, policy_version 230338 (0.0015) [2024-06-15 14:32:28,878][1652475] Updated weights for policy 0, policy_version 230398 (0.0014) [2024-06-15 14:32:29,954][1651340] Signal inference workers to stop experience collection... (11800 times) [2024-06-15 14:32:29,985][1652475] InferenceWorker_p0-w0: stopping experience collection (11800 times) [2024-06-15 14:32:30,157][1651340] Signal inference workers to resume experience collection... (11800 times) [2024-06-15 14:32:30,158][1652475] InferenceWorker_p0-w0: resuming experience collection (11800 times) [2024-06-15 14:32:30,738][1648984] Fps is (10 sec: 32766.6, 60 sec: 42598.1, 300 sec: 43431.4). Total num frames: 471957504. Throughput: 0: 10649.6. Samples: 118060032. 
Policy #0 lag: (min: 15.0, avg: 105.9, max: 271.0) [2024-06-15 14:32:30,739][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 14:32:30,810][1652475] Updated weights for policy 0, policy_version 230453 (0.0014) [2024-06-15 14:32:33,254][1652475] Updated weights for policy 0, policy_version 230525 (0.0013) [2024-06-15 14:32:34,966][1652475] Updated weights for policy 0, policy_version 230588 (0.0012) [2024-06-15 14:32:35,738][1648984] Fps is (10 sec: 52429.2, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 472252416. Throughput: 0: 10281.6. Samples: 118087680. Policy #0 lag: (min: 15.0, avg: 105.9, max: 271.0) [2024-06-15 14:32:35,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 14:32:40,396][1652475] Updated weights for policy 0, policy_version 230655 (0.0013) [2024-06-15 14:32:40,758][1648984] Fps is (10 sec: 42514.7, 60 sec: 43130.1, 300 sec: 43095.4). Total num frames: 472383488. Throughput: 0: 10485.7. Samples: 118153728. Policy #0 lag: (min: 15.0, avg: 98.0, max: 271.0) [2024-06-15 14:32:40,758][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 14:32:45,071][1652475] Updated weights for policy 0, policy_version 230720 (0.0013) [2024-06-15 14:32:45,738][1648984] Fps is (10 sec: 29491.4, 60 sec: 39868.0, 300 sec: 43542.6). Total num frames: 472547328. Throughput: 0: 10308.3. Samples: 118216192. Policy #0 lag: (min: 15.0, avg: 98.0, max: 271.0) [2024-06-15 14:32:45,739][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 14:32:46,464][1652475] Updated weights for policy 0, policy_version 230776 (0.0013) [2024-06-15 14:32:48,126][1652475] Updated weights for policy 0, policy_version 230839 (0.0012) [2024-06-15 14:32:50,738][1648984] Fps is (10 sec: 39400.5, 60 sec: 41528.6, 300 sec: 43431.5). Total num frames: 472776704. Throughput: 0: 10194.5. Samples: 118240768. 
Policy #0 lag: (min: 15.0, avg: 98.0, max: 271.0) [2024-06-15 14:32:50,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 14:32:52,203][1652475] Updated weights for policy 0, policy_version 230912 (0.0018) [2024-06-15 14:32:55,738][1648984] Fps is (10 sec: 36044.3, 60 sec: 39867.7, 300 sec: 43098.2). Total num frames: 472907776. Throughput: 0: 10458.1. Samples: 118311936. Policy #0 lag: (min: 15.0, avg: 98.0, max: 271.0) [2024-06-15 14:32:55,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 14:32:56,344][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000230944_472973312.pth... [2024-06-15 14:32:56,483][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000225920_462684160.pth [2024-06-15 14:32:57,962][1652475] Updated weights for policy 0, policy_version 231009 (0.0016) [2024-06-15 14:33:00,738][1648984] Fps is (10 sec: 45874.9, 60 sec: 42598.3, 300 sec: 43653.6). Total num frames: 473235456. Throughput: 0: 10183.1. Samples: 118369280. Policy #0 lag: (min: 15.0, avg: 98.0, max: 271.0) [2024-06-15 14:33:00,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 14:33:00,825][1652475] Updated weights for policy 0, policy_version 231073 (0.0013) [2024-06-15 14:33:04,310][1652475] Updated weights for policy 0, policy_version 231107 (0.0013) [2024-06-15 14:33:05,738][1648984] Fps is (10 sec: 52429.3, 60 sec: 43144.6, 300 sec: 43542.6). Total num frames: 473432064. Throughput: 0: 10262.7. Samples: 118403072. Policy #0 lag: (min: 15.0, avg: 98.0, max: 271.0) [2024-06-15 14:33:05,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 14:33:08,152][1652475] Updated weights for policy 0, policy_version 231184 (0.0013) [2024-06-15 14:33:09,919][1652475] Updated weights for policy 0, policy_version 231253 (0.0011) [2024-06-15 14:33:10,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 40965.8, 300 sec: 43431.5). Total num frames: 473661440. Throughput: 0: 10433.4. 
Samples: 118468608. Policy #0 lag: (min: 15.0, avg: 98.0, max: 271.0) [2024-06-15 14:33:10,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 14:33:12,451][1652475] Updated weights for policy 0, policy_version 231328 (0.0104) [2024-06-15 14:33:15,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 41506.1, 300 sec: 43098.3). Total num frames: 473825280. Throughput: 0: 10535.9. Samples: 118534144. Policy #0 lag: (min: 15.0, avg: 98.0, max: 271.0) [2024-06-15 14:33:15,740][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 14:33:16,672][1651340] Signal inference workers to stop experience collection... (11850 times) [2024-06-15 14:33:16,709][1652475] InferenceWorker_p0-w0: stopping experience collection (11850 times) [2024-06-15 14:33:16,842][1651340] Signal inference workers to resume experience collection... (11850 times) [2024-06-15 14:33:16,850][1652475] InferenceWorker_p0-w0: resuming experience collection (11850 times) [2024-06-15 14:33:16,852][1652475] Updated weights for policy 0, policy_version 231392 (0.0014) [2024-06-15 14:33:20,424][1652475] Updated weights for policy 0, policy_version 231465 (0.0015) [2024-06-15 14:33:20,741][1648984] Fps is (10 sec: 39310.4, 60 sec: 40411.9, 300 sec: 43208.9). Total num frames: 474054656. Throughput: 0: 10671.7. Samples: 118567936. Policy #0 lag: (min: 15.0, avg: 98.0, max: 271.0) [2024-06-15 14:33:20,741][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 14:33:22,289][1652475] Updated weights for policy 0, policy_version 231540 (0.0015) [2024-06-15 14:33:24,782][1652475] Updated weights for policy 0, policy_version 231600 (0.0013) [2024-06-15 14:33:25,738][1648984] Fps is (10 sec: 52428.4, 60 sec: 43690.7, 300 sec: 43320.4). Total num frames: 474349568. Throughput: 0: 10711.2. Samples: 118635520. 
Policy #0 lag: (min: 15.0, avg: 98.0, max: 271.0) [2024-06-15 14:33:25,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 14:33:28,234][1652475] Updated weights for policy 0, policy_version 231639 (0.0012) [2024-06-15 14:33:30,738][1648984] Fps is (10 sec: 42610.9, 60 sec: 42052.5, 300 sec: 43098.3). Total num frames: 474480640. Throughput: 0: 10854.4. Samples: 118704640. Policy #0 lag: (min: 15.0, avg: 98.0, max: 271.0) [2024-06-15 14:33:30,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 14:33:31,710][1652475] Updated weights for policy 0, policy_version 231697 (0.0014) [2024-06-15 14:33:33,708][1652475] Updated weights for policy 0, policy_version 231776 (0.0014) [2024-06-15 14:33:35,738][1648984] Fps is (10 sec: 39322.0, 60 sec: 41506.2, 300 sec: 43320.4). Total num frames: 474742784. Throughput: 0: 10911.3. Samples: 118731776. Policy #0 lag: (min: 15.0, avg: 98.0, max: 271.0) [2024-06-15 14:33:35,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 14:33:37,386][1652475] Updated weights for policy 0, policy_version 231867 (0.0011) [2024-06-15 14:33:40,738][1648984] Fps is (10 sec: 45875.2, 60 sec: 42612.6, 300 sec: 42876.1). Total num frames: 474939392. Throughput: 0: 10763.4. Samples: 118796288. Policy #0 lag: (min: 15.0, avg: 98.0, max: 271.0) [2024-06-15 14:33:40,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 14:33:41,233][1652475] Updated weights for policy 0, policy_version 231927 (0.0013) [2024-06-15 14:33:45,049][1652475] Updated weights for policy 0, policy_version 231968 (0.0016) [2024-06-15 14:33:45,738][1648984] Fps is (10 sec: 36044.7, 60 sec: 42598.3, 300 sec: 42987.2). Total num frames: 475103232. Throughput: 0: 10956.8. Samples: 118862336. 
Policy #0 lag: (min: 15.0, avg: 98.0, max: 271.0) [2024-06-15 14:33:45,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 14:33:48,475][1652475] Updated weights for policy 0, policy_version 232065 (0.0015) [2024-06-15 14:33:50,738][1648984] Fps is (10 sec: 45875.0, 60 sec: 43690.7, 300 sec: 43098.6). Total num frames: 475398144. Throughput: 0: 10899.9. Samples: 118893568. Policy #0 lag: (min: 11.0, avg: 143.5, max: 267.0) [2024-06-15 14:33:50,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 14:33:52,191][1652475] Updated weights for policy 0, policy_version 232148 (0.0062) [2024-06-15 14:33:55,738][1648984] Fps is (10 sec: 42598.2, 60 sec: 43690.7, 300 sec: 43098.2). Total num frames: 475529216. Throughput: 0: 10922.7. Samples: 118960128. Policy #0 lag: (min: 11.0, avg: 143.5, max: 267.0) [2024-06-15 14:33:55,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 14:33:55,822][1652475] Updated weights for policy 0, policy_version 232208 (0.0140) [2024-06-15 14:33:57,359][1652475] Updated weights for policy 0, policy_version 232272 (0.0013) [2024-06-15 14:34:00,680][1652475] Updated weights for policy 0, policy_version 232336 (0.0033) [2024-06-15 14:34:00,738][1648984] Fps is (10 sec: 42597.3, 60 sec: 43144.4, 300 sec: 43209.3). Total num frames: 475824128. Throughput: 0: 11081.9. Samples: 119032832. Policy #0 lag: (min: 11.0, avg: 143.5, max: 267.0) [2024-06-15 14:34:00,739][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 14:34:01,077][1651340] Signal inference workers to stop experience collection... (11900 times) [2024-06-15 14:34:01,120][1652475] InferenceWorker_p0-w0: stopping experience collection (11900 times) [2024-06-15 14:34:01,252][1651340] Signal inference workers to resume experience collection... 
(11900 times) [2024-06-15 14:34:01,253][1652475] InferenceWorker_p0-w0: resuming experience collection (11900 times) [2024-06-15 14:34:05,403][1652475] Updated weights for policy 0, policy_version 232442 (0.0017) [2024-06-15 14:34:05,738][1648984] Fps is (10 sec: 52426.4, 60 sec: 43690.3, 300 sec: 43098.2). Total num frames: 476053504. Throughput: 0: 11059.8. Samples: 119065600. Policy #0 lag: (min: 11.0, avg: 143.5, max: 267.0) [2024-06-15 14:34:05,739][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 14:34:08,625][1652475] Updated weights for policy 0, policy_version 232496 (0.0012) [2024-06-15 14:34:10,738][1648984] Fps is (10 sec: 49153.3, 60 sec: 44236.9, 300 sec: 43542.6). Total num frames: 476315648. Throughput: 0: 10865.8. Samples: 119124480. Policy #0 lag: (min: 11.0, avg: 143.5, max: 267.0) [2024-06-15 14:34:10,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 14:34:12,836][1652475] Updated weights for policy 0, policy_version 232579 (0.0016) [2024-06-15 14:34:15,738][1648984] Fps is (10 sec: 39323.4, 60 sec: 43690.6, 300 sec: 42987.2). Total num frames: 476446720. Throughput: 0: 10820.3. Samples: 119191552. Policy #0 lag: (min: 11.0, avg: 143.5, max: 267.0) [2024-06-15 14:34:15,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 14:34:16,557][1652475] Updated weights for policy 0, policy_version 232642 (0.0015) [2024-06-15 14:34:17,969][1652475] Updated weights for policy 0, policy_version 232704 (0.0013) [2024-06-15 14:34:20,428][1652475] Updated weights for policy 0, policy_version 232761 (0.0018) [2024-06-15 14:34:20,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 44238.9, 300 sec: 43209.3). Total num frames: 476708864. Throughput: 0: 10922.6. Samples: 119223296. 
Policy #0 lag: (min: 11.0, avg: 143.5, max: 267.0) [2024-06-15 14:34:20,739][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 14:34:21,481][1652475] Updated weights for policy 0, policy_version 232816 (0.0014) [2024-06-15 14:34:21,850][1652475] Updated weights for policy 0, policy_version 232832 (0.0041) [2024-06-15 14:34:25,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 43690.7, 300 sec: 43098.3). Total num frames: 476971008. Throughput: 0: 11207.1. Samples: 119300608. Policy #0 lag: (min: 11.0, avg: 143.5, max: 267.0) [2024-06-15 14:34:25,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 14:34:27,723][1652475] Updated weights for policy 0, policy_version 232901 (0.0093) [2024-06-15 14:34:28,913][1652475] Updated weights for policy 0, policy_version 232958 (0.0016) [2024-06-15 14:34:30,738][1648984] Fps is (10 sec: 45875.4, 60 sec: 44782.9, 300 sec: 43320.4). Total num frames: 477167616. Throughput: 0: 11252.6. Samples: 119368704. Policy #0 lag: (min: 11.0, avg: 143.5, max: 267.0) [2024-06-15 14:34:30,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 14:34:31,238][1652475] Updated weights for policy 0, policy_version 233025 (0.0013) [2024-06-15 14:34:35,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 43690.6, 300 sec: 43098.3). Total num frames: 477364224. Throughput: 0: 11116.1. Samples: 119393792. Policy #0 lag: (min: 11.0, avg: 143.5, max: 267.0) [2024-06-15 14:34:35,739][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 14:34:36,177][1652475] Updated weights for policy 0, policy_version 233104 (0.0015) [2024-06-15 14:34:37,476][1652475] Updated weights for policy 0, policy_version 233150 (0.0013) [2024-06-15 14:34:40,743][1648984] Fps is (10 sec: 32749.0, 60 sec: 42594.3, 300 sec: 42653.2). Total num frames: 477495296. Throughput: 0: 11103.3. Samples: 119459840. 
Policy #0 lag: (min: 11.0, avg: 143.5, max: 267.0) [2024-06-15 14:34:40,744][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 14:34:43,109][1652475] Updated weights for policy 0, policy_version 233248 (0.0013) [2024-06-15 14:34:45,110][1652475] Updated weights for policy 0, policy_version 233296 (0.0019) [2024-06-15 14:34:45,738][1648984] Fps is (10 sec: 49152.5, 60 sec: 45875.2, 300 sec: 43431.5). Total num frames: 477855744. Throughput: 0: 10672.4. Samples: 119513088. Policy #0 lag: (min: 11.0, avg: 143.5, max: 267.0) [2024-06-15 14:34:45,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 14:34:48,586][1651340] Signal inference workers to stop experience collection... (11950 times) [2024-06-15 14:34:48,615][1652475] InferenceWorker_p0-w0: stopping experience collection (11950 times) [2024-06-15 14:34:48,957][1651340] Signal inference workers to resume experience collection... (11950 times) [2024-06-15 14:34:48,957][1652475] InferenceWorker_p0-w0: resuming experience collection (11950 times) [2024-06-15 14:34:49,375][1652475] Updated weights for policy 0, policy_version 233376 (0.0012) [2024-06-15 14:34:50,738][1648984] Fps is (10 sec: 52457.5, 60 sec: 43690.4, 300 sec: 42987.1). Total num frames: 478019584. Throughput: 0: 10695.2. Samples: 119546880. Policy #0 lag: (min: 11.0, avg: 143.5, max: 267.0) [2024-06-15 14:34:50,739][1648984] Avg episode reward: [(0, '-0.530')] [2024-06-15 14:34:55,285][1652475] Updated weights for policy 0, policy_version 233440 (0.0014) [2024-06-15 14:34:55,738][1648984] Fps is (10 sec: 26213.3, 60 sec: 43144.3, 300 sec: 42987.1). Total num frames: 478117888. Throughput: 0: 10751.9. Samples: 119608320. Policy #0 lag: (min: 11.0, avg: 143.5, max: 267.0) [2024-06-15 14:34:55,739][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 14:34:56,092][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000233472_478150656.pth... 
[2024-06-15 14:34:56,229][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000228480_467927040.pth [2024-06-15 14:34:57,250][1652475] Updated weights for policy 0, policy_version 233520 (0.0013) [2024-06-15 14:34:59,147][1652475] Updated weights for policy 0, policy_version 233596 (0.0021) [2024-06-15 14:35:00,755][1648984] Fps is (10 sec: 39256.7, 60 sec: 43132.6, 300 sec: 43095.8). Total num frames: 478412800. Throughput: 0: 10486.4. Samples: 119663616. Policy #0 lag: (min: 95.0, avg: 173.6, max: 271.0) [2024-06-15 14:35:00,756][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 14:35:05,738][1648984] Fps is (10 sec: 42600.1, 60 sec: 41506.5, 300 sec: 42653.9). Total num frames: 478543872. Throughput: 0: 10581.3. Samples: 119699456. Policy #0 lag: (min: 95.0, avg: 173.6, max: 271.0) [2024-06-15 14:35:05,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 14:35:07,486][1652475] Updated weights for policy 0, policy_version 233696 (0.0013) [2024-06-15 14:35:08,868][1652475] Updated weights for policy 0, policy_version 233744 (0.0011) [2024-06-15 14:35:10,738][1648984] Fps is (10 sec: 42670.5, 60 sec: 42052.3, 300 sec: 43431.5). Total num frames: 478838784. Throughput: 0: 10217.3. Samples: 119760384. Policy #0 lag: (min: 95.0, avg: 173.6, max: 271.0) [2024-06-15 14:35:10,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 14:35:10,977][1652475] Updated weights for policy 0, policy_version 233824 (0.0097) [2024-06-15 14:35:15,505][1652475] Updated weights for policy 0, policy_version 233872 (0.0028) [2024-06-15 14:35:15,738][1648984] Fps is (10 sec: 42598.0, 60 sec: 42052.2, 300 sec: 42765.0). Total num frames: 478969856. Throughput: 0: 10171.7. Samples: 119826432. 
Policy #0 lag: (min: 95.0, avg: 173.6, max: 271.0) [2024-06-15 14:35:15,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 14:35:19,794][1652475] Updated weights for policy 0, policy_version 233952 (0.0082) [2024-06-15 14:35:20,738][1648984] Fps is (10 sec: 36044.7, 60 sec: 41506.2, 300 sec: 43098.2). Total num frames: 479199232. Throughput: 0: 10319.7. Samples: 119858176. Policy #0 lag: (min: 95.0, avg: 173.6, max: 271.0) [2024-06-15 14:35:20,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 14:35:22,179][1652475] Updated weights for policy 0, policy_version 234032 (0.0012) [2024-06-15 14:35:23,656][1652475] Updated weights for policy 0, policy_version 234096 (0.0014) [2024-06-15 14:35:25,738][1648984] Fps is (10 sec: 49151.9, 60 sec: 41506.1, 300 sec: 43209.3). Total num frames: 479461376. Throughput: 0: 10059.2. Samples: 119912448. Policy #0 lag: (min: 95.0, avg: 173.6, max: 271.0) [2024-06-15 14:35:25,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 14:35:28,519][1652475] Updated weights for policy 0, policy_version 234160 (0.0014) [2024-06-15 14:35:30,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 40413.9, 300 sec: 42653.9). Total num frames: 479592448. Throughput: 0: 10558.6. Samples: 119988224. Policy #0 lag: (min: 95.0, avg: 173.6, max: 271.0) [2024-06-15 14:35:30,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 14:35:31,353][1652475] Updated weights for policy 0, policy_version 234197 (0.0019) [2024-06-15 14:35:33,514][1652475] Updated weights for policy 0, policy_version 234257 (0.0038) [2024-06-15 14:35:34,525][1651340] Signal inference workers to stop experience collection... (12000 times) [2024-06-15 14:35:34,587][1652475] InferenceWorker_p0-w0: stopping experience collection (12000 times) [2024-06-15 14:35:34,735][1651340] Signal inference workers to resume experience collection... 
(12000 times) [2024-06-15 14:35:34,736][1652475] InferenceWorker_p0-w0: resuming experience collection (12000 times) [2024-06-15 14:35:35,030][1652475] Updated weights for policy 0, policy_version 234336 (0.0013) [2024-06-15 14:35:35,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 43690.7, 300 sec: 43542.5). Total num frames: 479985664. Throughput: 0: 10558.6. Samples: 120022016. Policy #0 lag: (min: 95.0, avg: 173.6, max: 271.0) [2024-06-15 14:35:35,739][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 14:35:40,095][1652475] Updated weights for policy 0, policy_version 234402 (0.0012) [2024-06-15 14:35:40,738][1648984] Fps is (10 sec: 49151.9, 60 sec: 43148.7, 300 sec: 42876.2). Total num frames: 480083968. Throughput: 0: 10740.7. Samples: 120091648. Policy #0 lag: (min: 95.0, avg: 173.6, max: 271.0) [2024-06-15 14:35:40,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 14:35:43,460][1652475] Updated weights for policy 0, policy_version 234449 (0.0014) [2024-06-15 14:35:44,588][1652475] Updated weights for policy 0, policy_version 234496 (0.0031) [2024-06-15 14:35:45,738][1648984] Fps is (10 sec: 36045.3, 60 sec: 41506.1, 300 sec: 43431.5). Total num frames: 480346112. Throughput: 0: 10915.4. Samples: 120154624. Policy #0 lag: (min: 95.0, avg: 173.6, max: 271.0) [2024-06-15 14:35:45,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 14:35:46,264][1652475] Updated weights for policy 0, policy_version 234576 (0.0011) [2024-06-15 14:35:47,004][1652475] Updated weights for policy 0, policy_version 234624 (0.0013) [2024-06-15 14:35:50,738][1648984] Fps is (10 sec: 42598.1, 60 sec: 41506.3, 300 sec: 42876.1). Total num frames: 480509952. Throughput: 0: 10843.0. Samples: 120187392. 
Policy #0 lag: (min: 95.0, avg: 173.6, max: 271.0) [2024-06-15 14:35:50,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 14:35:52,040][1652475] Updated weights for policy 0, policy_version 234688 (0.0011) [2024-06-15 14:35:55,740][1648984] Fps is (10 sec: 42598.0, 60 sec: 44237.0, 300 sec: 43098.2). Total num frames: 480772096. Throughput: 0: 11059.2. Samples: 120258048. Policy #0 lag: (min: 95.0, avg: 173.6, max: 271.0) [2024-06-15 14:35:55,746][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 14:35:56,670][1652475] Updated weights for policy 0, policy_version 234768 (0.0015) [2024-06-15 14:35:57,776][1652475] Updated weights for policy 0, policy_version 234816 (0.0012) [2024-06-15 14:35:58,947][1652475] Updated weights for policy 0, policy_version 234874 (0.0119) [2024-06-15 14:36:00,738][1648984] Fps is (10 sec: 52429.3, 60 sec: 43702.9, 300 sec: 43431.5). Total num frames: 481034240. Throughput: 0: 10934.1. Samples: 120318464. Policy #0 lag: (min: 95.0, avg: 173.6, max: 271.0) [2024-06-15 14:36:00,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 14:36:03,595][1652475] Updated weights for policy 0, policy_version 234928 (0.0026) [2024-06-15 14:36:05,738][1648984] Fps is (10 sec: 39321.9, 60 sec: 43690.7, 300 sec: 42654.0). Total num frames: 481165312. Throughput: 0: 11047.8. Samples: 120355328. Policy #0 lag: (min: 95.0, avg: 173.6, max: 271.0) [2024-06-15 14:36:05,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 14:36:07,066][1652475] Updated weights for policy 0, policy_version 235000 (0.0015) [2024-06-15 14:36:09,374][1652475] Updated weights for policy 0, policy_version 235059 (0.0014) [2024-06-15 14:36:10,738][1648984] Fps is (10 sec: 49151.9, 60 sec: 44782.9, 300 sec: 43431.5). Total num frames: 481525760. Throughput: 0: 11252.6. Samples: 120418816. 
Policy #0 lag: (min: 15.0, avg: 123.7, max: 271.0) [2024-06-15 14:36:10,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 14:36:10,871][1652475] Updated weights for policy 0, policy_version 235129 (0.0013) [2024-06-15 14:36:15,141][1652475] Updated weights for policy 0, policy_version 235184 (0.0025) [2024-06-15 14:36:15,738][1648984] Fps is (10 sec: 52428.5, 60 sec: 45329.1, 300 sec: 43098.2). Total num frames: 481689600. Throughput: 0: 11161.6. Samples: 120490496. Policy #0 lag: (min: 15.0, avg: 123.7, max: 271.0) [2024-06-15 14:36:15,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 14:36:18,431][1652475] Updated weights for policy 0, policy_version 235256 (0.0014) [2024-06-15 14:36:19,983][1651340] Signal inference workers to stop experience collection... (12050 times) [2024-06-15 14:36:20,019][1652475] InferenceWorker_p0-w0: stopping experience collection (12050 times) [2024-06-15 14:36:20,210][1651340] Signal inference workers to resume experience collection... (12050 times) [2024-06-15 14:36:20,211][1652475] InferenceWorker_p0-w0: resuming experience collection (12050 times) [2024-06-15 14:36:20,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 45329.0, 300 sec: 43431.5). Total num frames: 481918976. Throughput: 0: 11161.6. Samples: 120524288. Policy #0 lag: (min: 15.0, avg: 123.7, max: 271.0) [2024-06-15 14:36:20,739][1652475] Updated weights for policy 0, policy_version 235323 (0.0014) [2024-06-15 14:36:20,739][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 14:36:22,135][1652475] Updated weights for policy 0, policy_version 235362 (0.0130) [2024-06-15 14:36:25,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 43690.7, 300 sec: 42987.2). Total num frames: 482082816. Throughput: 0: 11013.7. Samples: 120587264. 
Policy #0 lag: (min: 15.0, avg: 123.7, max: 271.0) [2024-06-15 14:36:25,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 14:36:26,553][1652475] Updated weights for policy 0, policy_version 235424 (0.0017) [2024-06-15 14:36:29,132][1652475] Updated weights for policy 0, policy_version 235461 (0.0017) [2024-06-15 14:36:30,259][1652475] Updated weights for policy 0, policy_version 235520 (0.0078) [2024-06-15 14:36:30,738][1648984] Fps is (10 sec: 42598.7, 60 sec: 45875.2, 300 sec: 43098.3). Total num frames: 482344960. Throughput: 0: 11252.6. Samples: 120660992. Policy #0 lag: (min: 15.0, avg: 123.7, max: 271.0) [2024-06-15 14:36:30,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 14:36:32,761][1652475] Updated weights for policy 0, policy_version 235588 (0.0013) [2024-06-15 14:36:35,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 43690.7, 300 sec: 43431.5). Total num frames: 482607104. Throughput: 0: 11195.8. Samples: 120691200. Policy #0 lag: (min: 15.0, avg: 123.7, max: 271.0) [2024-06-15 14:36:35,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 14:36:37,699][1652475] Updated weights for policy 0, policy_version 235651 (0.0022) [2024-06-15 14:36:40,738][1648984] Fps is (10 sec: 39320.6, 60 sec: 44236.6, 300 sec: 42653.9). Total num frames: 482738176. Throughput: 0: 11093.3. Samples: 120757248. Policy #0 lag: (min: 15.0, avg: 123.7, max: 271.0) [2024-06-15 14:36:40,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 14:36:41,544][1652475] Updated weights for policy 0, policy_version 235728 (0.0023) [2024-06-15 14:36:43,105][1652475] Updated weights for policy 0, policy_version 235792 (0.0018) [2024-06-15 14:36:44,327][1652475] Updated weights for policy 0, policy_version 235840 (0.0151) [2024-06-15 14:36:45,738][1648984] Fps is (10 sec: 49151.6, 60 sec: 45875.1, 300 sec: 43436.3). Total num frames: 483098624. Throughput: 0: 11070.6. Samples: 120816640. 
Policy #0 lag: (min: 15.0, avg: 123.7, max: 271.0) [2024-06-15 14:36:45,739][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 14:36:45,782][1652475] Updated weights for policy 0, policy_version 235900 (0.0013) [2024-06-15 14:36:50,738][1648984] Fps is (10 sec: 45876.5, 60 sec: 44783.0, 300 sec: 42987.2). Total num frames: 483196928. Throughput: 0: 11104.7. Samples: 120855040. Policy #0 lag: (min: 15.0, avg: 123.7, max: 271.0) [2024-06-15 14:36:50,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 14:36:51,064][1652475] Updated weights for policy 0, policy_version 235952 (0.0035) [2024-06-15 14:36:54,084][1652475] Updated weights for policy 0, policy_version 235989 (0.0014) [2024-06-15 14:36:55,557][1652475] Updated weights for policy 0, policy_version 236048 (0.0018) [2024-06-15 14:36:55,746][1648984] Fps is (10 sec: 32740.2, 60 sec: 44230.5, 300 sec: 43208.1). Total num frames: 483426304. Throughput: 0: 11239.1. Samples: 120924672. Policy #0 lag: (min: 15.0, avg: 123.7, max: 271.0) [2024-06-15 14:36:55,747][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 14:36:56,439][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000236080_483491840.pth... [2024-06-15 14:36:56,589][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000230944_472973312.pth [2024-06-15 14:36:57,654][1652475] Updated weights for policy 0, policy_version 236128 (0.0012) [2024-06-15 14:37:00,738][1648984] Fps is (10 sec: 45875.3, 60 sec: 43690.7, 300 sec: 43431.5). Total num frames: 483655680. Throughput: 0: 10922.7. Samples: 120982016. Policy #0 lag: (min: 15.0, avg: 123.7, max: 271.0) [2024-06-15 14:37:00,740][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 14:37:02,381][1652475] Updated weights for policy 0, policy_version 236176 (0.0017) [2024-06-15 14:37:05,738][1648984] Fps is (10 sec: 36075.1, 60 sec: 43690.5, 300 sec: 42655.1). Total num frames: 483786752. Throughput: 0: 10797.5. 
Samples: 121010176. Policy #0 lag: (min: 15.0, avg: 123.7, max: 271.0) [2024-06-15 14:37:05,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 14:37:08,427][1651340] Signal inference workers to stop experience collection... (12100 times) [2024-06-15 14:37:08,488][1652475] InferenceWorker_p0-w0: stopping experience collection (12100 times) [2024-06-15 14:37:08,700][1651340] Signal inference workers to resume experience collection... (12100 times) [2024-06-15 14:37:08,710][1652475] InferenceWorker_p0-w0: resuming experience collection (12100 times) [2024-06-15 14:37:08,712][1652475] Updated weights for policy 0, policy_version 236272 (0.0030) [2024-06-15 14:37:10,738][1648984] Fps is (10 sec: 36044.4, 60 sec: 41506.1, 300 sec: 42987.2). Total num frames: 484016128. Throughput: 0: 10695.1. Samples: 121068544. Policy #0 lag: (min: 15.0, avg: 123.7, max: 271.0) [2024-06-15 14:37:10,738][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 14:37:10,856][1652475] Updated weights for policy 0, policy_version 236352 (0.0012) [2024-06-15 14:37:12,460][1652475] Updated weights for policy 0, policy_version 236411 (0.0012) [2024-06-15 14:37:15,738][1648984] Fps is (10 sec: 39322.3, 60 sec: 41506.1, 300 sec: 42542.8). Total num frames: 484179968. Throughput: 0: 10262.8. Samples: 121122816. Policy #0 lag: (min: 15.0, avg: 123.7, max: 271.0) [2024-06-15 14:37:15,738][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 14:37:18,427][1652475] Updated weights for policy 0, policy_version 236477 (0.0012) [2024-06-15 14:37:20,738][1648984] Fps is (10 sec: 29490.7, 60 sec: 39867.6, 300 sec: 42653.9). Total num frames: 484311040. Throughput: 0: 10262.7. Samples: 121153024. 
Policy #0 lag: (min: 47.0, avg: 154.6, max: 303.0) [2024-06-15 14:37:20,739][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 14:37:21,913][1652475] Updated weights for policy 0, policy_version 236528 (0.0083) [2024-06-15 14:37:23,734][1652475] Updated weights for policy 0, policy_version 236608 (0.0015) [2024-06-15 14:37:25,174][1652475] Updated weights for policy 0, policy_version 236672 (0.0091) [2024-06-15 14:37:25,738][1648984] Fps is (10 sec: 52428.5, 60 sec: 43690.6, 300 sec: 43209.4). Total num frames: 484704256. Throughput: 0: 9978.4. Samples: 121206272. Policy #0 lag: (min: 47.0, avg: 154.6, max: 303.0) [2024-06-15 14:37:25,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 14:37:30,738][1648984] Fps is (10 sec: 42599.1, 60 sec: 39867.7, 300 sec: 42320.7). Total num frames: 484737024. Throughput: 0: 10433.4. Samples: 121286144. Policy #0 lag: (min: 47.0, avg: 154.6, max: 303.0) [2024-06-15 14:37:30,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 14:37:31,327][1652475] Updated weights for policy 0, policy_version 236728 (0.0013) [2024-06-15 14:37:33,493][1652475] Updated weights for policy 0, policy_version 236776 (0.0013) [2024-06-15 14:37:35,376][1652475] Updated weights for policy 0, policy_version 236848 (0.0013) [2024-06-15 14:37:35,738][1648984] Fps is (10 sec: 36045.1, 60 sec: 40960.0, 300 sec: 42990.1). Total num frames: 485064704. Throughput: 0: 10194.5. Samples: 121313792. Policy #0 lag: (min: 47.0, avg: 154.6, max: 303.0) [2024-06-15 14:37:35,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 14:37:36,776][1652475] Updated weights for policy 0, policy_version 236912 (0.0014) [2024-06-15 14:37:40,738][1648984] Fps is (10 sec: 49152.4, 60 sec: 41506.3, 300 sec: 42987.2). Total num frames: 485228544. Throughput: 0: 10082.6. Samples: 121378304. 
Policy #0 lag: (min: 47.0, avg: 154.6, max: 303.0) [2024-06-15 14:37:40,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 14:37:43,743][1652475] Updated weights for policy 0, policy_version 236981 (0.0014) [2024-06-15 14:37:44,652][1652475] Updated weights for policy 0, policy_version 237011 (0.0014) [2024-06-15 14:37:45,738][1648984] Fps is (10 sec: 39321.3, 60 sec: 39321.6, 300 sec: 42987.2). Total num frames: 485457920. Throughput: 0: 10365.1. Samples: 121448448. Policy #0 lag: (min: 47.0, avg: 154.6, max: 303.0) [2024-06-15 14:37:45,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 14:37:47,483][1652475] Updated weights for policy 0, policy_version 237120 (0.0013) [2024-06-15 14:37:48,298][1651340] Signal inference workers to stop experience collection... (12150 times) [2024-06-15 14:37:48,331][1652475] InferenceWorker_p0-w0: stopping experience collection (12150 times) [2024-06-15 14:37:48,485][1651340] Signal inference workers to resume experience collection... (12150 times) [2024-06-15 14:37:48,485][1652475] InferenceWorker_p0-w0: resuming experience collection (12150 times) [2024-06-15 14:37:48,756][1652475] Updated weights for policy 0, policy_version 237184 (0.0011) [2024-06-15 14:37:50,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 42598.4, 300 sec: 43542.6). Total num frames: 485752832. Throughput: 0: 10137.6. Samples: 121466368. Policy #0 lag: (min: 47.0, avg: 154.6, max: 303.0) [2024-06-15 14:37:50,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 14:37:55,535][1652475] Updated weights for policy 0, policy_version 237239 (0.0038) [2024-06-15 14:37:55,738][1648984] Fps is (10 sec: 42598.5, 60 sec: 40965.8, 300 sec: 42876.1). Total num frames: 485883904. Throughput: 0: 10706.5. Samples: 121550336. 
Policy #0 lag: (min: 47.0, avg: 154.6, max: 303.0) [2024-06-15 14:37:55,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 14:37:57,538][1652475] Updated weights for policy 0, policy_version 237283 (0.0116) [2024-06-15 14:37:59,714][1652475] Updated weights for policy 0, policy_version 237379 (0.0184) [2024-06-15 14:38:00,738][1648984] Fps is (10 sec: 49152.2, 60 sec: 43144.5, 300 sec: 43431.5). Total num frames: 486244352. Throughput: 0: 10626.9. Samples: 121601024. Policy #0 lag: (min: 47.0, avg: 154.6, max: 303.0) [2024-06-15 14:38:00,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 14:38:00,793][1652475] Updated weights for policy 0, policy_version 237440 (0.0017) [2024-06-15 14:38:05,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 41506.3, 300 sec: 42765.0). Total num frames: 486277120. Throughput: 0: 10877.2. Samples: 121642496. Policy #0 lag: (min: 47.0, avg: 154.6, max: 303.0) [2024-06-15 14:38:05,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 14:38:07,555][1652475] Updated weights for policy 0, policy_version 237504 (0.0137) [2024-06-15 14:38:09,969][1652475] Updated weights for policy 0, policy_version 237570 (0.0014) [2024-06-15 14:38:10,738][1648984] Fps is (10 sec: 36044.8, 60 sec: 43144.6, 300 sec: 43320.4). Total num frames: 486604800. Throughput: 0: 11082.0. Samples: 121704960. Policy #0 lag: (min: 47.0, avg: 154.6, max: 303.0) [2024-06-15 14:38:10,740][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 14:38:11,781][1652475] Updated weights for policy 0, policy_version 237664 (0.0012) [2024-06-15 14:38:15,760][1648984] Fps is (10 sec: 52312.2, 60 sec: 43674.4, 300 sec: 43206.5). Total num frames: 486801408. Throughput: 0: 10826.3. Samples: 121773568. 
Policy #0 lag: (min: 47.0, avg: 154.6, max: 303.0) [2024-06-15 14:38:15,761][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 14:38:18,960][1652475] Updated weights for policy 0, policy_version 237728 (0.0016) [2024-06-15 14:38:19,982][1652475] Updated weights for policy 0, policy_version 237776 (0.0013) [2024-06-15 14:38:20,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 45329.3, 300 sec: 42987.2). Total num frames: 487030784. Throughput: 0: 11138.8. Samples: 121815040. Policy #0 lag: (min: 47.0, avg: 154.6, max: 303.0) [2024-06-15 14:38:20,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 14:38:20,984][1652475] Updated weights for policy 0, policy_version 237820 (0.0027) [2024-06-15 14:38:22,314][1652475] Updated weights for policy 0, policy_version 237868 (0.0011) [2024-06-15 14:38:23,735][1652475] Updated weights for policy 0, policy_version 237936 (0.0082) [2024-06-15 14:38:25,738][1648984] Fps is (10 sec: 52546.0, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 487325696. Throughput: 0: 11059.2. Samples: 121875968. Policy #0 lag: (min: 47.0, avg: 154.6, max: 303.0) [2024-06-15 14:38:25,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 14:38:30,292][1652475] Updated weights for policy 0, policy_version 238000 (0.0034) [2024-06-15 14:38:30,738][1648984] Fps is (10 sec: 42598.6, 60 sec: 45329.2, 300 sec: 43098.3). Total num frames: 487456768. Throughput: 0: 11173.0. Samples: 121951232. Policy #0 lag: (min: 15.0, avg: 79.0, max: 271.0) [2024-06-15 14:38:30,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 14:38:31,401][1652475] Updated weights for policy 0, policy_version 238032 (0.0012) [2024-06-15 14:38:32,324][1652475] Updated weights for policy 0, policy_version 238071 (0.0014) [2024-06-15 14:38:32,721][1651340] Signal inference workers to stop experience collection... 
(12200 times) [2024-06-15 14:38:32,755][1652475] InferenceWorker_p0-w0: stopping experience collection (12200 times) [2024-06-15 14:38:33,037][1651340] Signal inference workers to resume experience collection... (12200 times) [2024-06-15 14:38:33,037][1652475] InferenceWorker_p0-w0: resuming experience collection (12200 times) [2024-06-15 14:38:33,714][1652475] Updated weights for policy 0, policy_version 238115 (0.0013) [2024-06-15 14:38:35,104][1652475] Updated weights for policy 0, policy_version 238192 (0.0012) [2024-06-15 14:38:35,741][1648984] Fps is (10 sec: 52428.6, 60 sec: 46421.3, 300 sec: 43764.7). Total num frames: 487849984. Throughput: 0: 11491.6. Samples: 121983488. Policy #0 lag: (min: 15.0, avg: 79.0, max: 271.0) [2024-06-15 14:38:35,742][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 14:38:40,738][1648984] Fps is (10 sec: 39321.2, 60 sec: 43690.6, 300 sec: 43209.3). Total num frames: 487849984. Throughput: 0: 11161.6. Samples: 122052608. Policy #0 lag: (min: 15.0, avg: 79.0, max: 271.0) [2024-06-15 14:38:40,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 14:38:42,156][1652475] Updated weights for policy 0, policy_version 238272 (0.0015) [2024-06-15 14:38:43,761][1652475] Updated weights for policy 0, policy_version 238334 (0.0231) [2024-06-15 14:38:45,220][1652475] Updated weights for policy 0, policy_version 238384 (0.0013) [2024-06-15 14:38:45,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 46421.3, 300 sec: 43542.6). Total num frames: 488243200. Throughput: 0: 11423.3. Samples: 122115072. Policy #0 lag: (min: 15.0, avg: 79.0, max: 271.0) [2024-06-15 14:38:45,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 14:38:46,603][1652475] Updated weights for policy 0, policy_version 238448 (0.0016) [2024-06-15 14:38:50,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 488374272. Throughput: 0: 11264.0. Samples: 122149376. 
Policy #0 lag: (min: 15.0, avg: 79.0, max: 271.0) [2024-06-15 14:38:50,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 14:38:53,563][1652475] Updated weights for policy 0, policy_version 238519 (0.0014) [2024-06-15 14:38:55,432][1652475] Updated weights for policy 0, policy_version 238586 (0.0047) [2024-06-15 14:38:55,768][1648984] Fps is (10 sec: 39201.0, 60 sec: 45851.7, 300 sec: 43427.0). Total num frames: 488636416. Throughput: 0: 11563.3. Samples: 122225664. Policy #0 lag: (min: 15.0, avg: 79.0, max: 271.0) [2024-06-15 14:38:55,769][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 14:38:56,554][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000238624_488701952.pth... [2024-06-15 14:38:56,696][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000233472_478150656.pth [2024-06-15 14:38:57,704][1652475] Updated weights for policy 0, policy_version 238672 (0.0014) [2024-06-15 14:38:58,515][1652475] Updated weights for policy 0, policy_version 238714 (0.0014) [2024-06-15 14:39:00,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 44236.8, 300 sec: 43542.6). Total num frames: 488898560. Throughput: 0: 11349.3. Samples: 122284032. Policy #0 lag: (min: 15.0, avg: 79.0, max: 271.0) [2024-06-15 14:39:00,740][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 14:39:05,738][1648984] Fps is (10 sec: 32869.0, 60 sec: 44782.9, 300 sec: 42876.1). Total num frames: 488964096. Throughput: 0: 11218.5. Samples: 122319872. 
Policy #0 lag: (min: 15.0, avg: 79.0, max: 271.0) [2024-06-15 14:39:05,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 14:39:05,988][1652475] Updated weights for policy 0, policy_version 238776 (0.0013) [2024-06-15 14:39:07,219][1652475] Updated weights for policy 0, policy_version 238819 (0.0013) [2024-06-15 14:39:09,069][1652475] Updated weights for policy 0, policy_version 238896 (0.0013) [2024-06-15 14:39:10,738][1648984] Fps is (10 sec: 49151.9, 60 sec: 46421.3, 300 sec: 43875.8). Total num frames: 489390080. Throughput: 0: 11116.1. Samples: 122376192. Policy #0 lag: (min: 15.0, avg: 79.0, max: 271.0) [2024-06-15 14:39:10,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 14:39:10,817][1652475] Updated weights for policy 0, policy_version 238972 (0.0013) [2024-06-15 14:39:15,738][1648984] Fps is (10 sec: 45875.7, 60 sec: 43706.9, 300 sec: 43098.3). Total num frames: 489422848. Throughput: 0: 10956.8. Samples: 122444288. Policy #0 lag: (min: 15.0, avg: 79.0, max: 271.0) [2024-06-15 14:39:15,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 14:39:17,647][1651340] Signal inference workers to stop experience collection... (12250 times) [2024-06-15 14:39:17,683][1652475] InferenceWorker_p0-w0: stopping experience collection (12250 times) [2024-06-15 14:39:17,952][1651340] Signal inference workers to resume experience collection... (12250 times) [2024-06-15 14:39:17,953][1652475] InferenceWorker_p0-w0: resuming experience collection (12250 times) [2024-06-15 14:39:18,349][1652475] Updated weights for policy 0, policy_version 239039 (0.0012) [2024-06-15 14:39:20,559][1652475] Updated weights for policy 0, policy_version 239093 (0.0117) [2024-06-15 14:39:20,738][1648984] Fps is (10 sec: 29491.2, 60 sec: 44236.8, 300 sec: 43098.3). Total num frames: 489684992. Throughput: 0: 10854.4. Samples: 122471936. 
Policy #0 lag: (min: 15.0, avg: 79.0, max: 271.0) [2024-06-15 14:39:20,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 14:39:22,528][1652475] Updated weights for policy 0, policy_version 239174 (0.0018) [2024-06-15 14:39:25,738][1648984] Fps is (10 sec: 52427.9, 60 sec: 43690.6, 300 sec: 43320.4). Total num frames: 489947136. Throughput: 0: 10478.9. Samples: 122524160. Policy #0 lag: (min: 15.0, avg: 79.0, max: 271.0) [2024-06-15 14:39:25,739][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 14:39:30,180][1652475] Updated weights for policy 0, policy_version 239248 (0.0151) [2024-06-15 14:39:30,740][1648984] Fps is (10 sec: 32759.8, 60 sec: 42596.6, 300 sec: 42875.7). Total num frames: 490012672. Throughput: 0: 10694.5. Samples: 122596352. Policy #0 lag: (min: 15.0, avg: 79.0, max: 271.0) [2024-06-15 14:39:30,743][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 14:39:32,549][1652475] Updated weights for policy 0, policy_version 239328 (0.0017) [2024-06-15 14:39:34,691][1652475] Updated weights for policy 0, policy_version 239392 (0.0014) [2024-06-15 14:39:35,738][1648984] Fps is (10 sec: 39322.2, 60 sec: 41506.2, 300 sec: 43543.4). Total num frames: 490340352. Throughput: 0: 10501.7. Samples: 122621952. Policy #0 lag: (min: 15.0, avg: 79.0, max: 271.0) [2024-06-15 14:39:35,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 14:39:36,335][1652475] Updated weights for policy 0, policy_version 239458 (0.0012) [2024-06-15 14:39:40,738][1648984] Fps is (10 sec: 45886.7, 60 sec: 43690.7, 300 sec: 42765.0). Total num frames: 490471424. Throughput: 0: 10224.2. Samples: 122685440. 
Policy #0 lag: (min: 15.0, avg: 79.0, max: 271.0) [2024-06-15 14:39:40,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 14:39:43,822][1652475] Updated weights for policy 0, policy_version 239520 (0.0012) [2024-06-15 14:39:45,514][1652475] Updated weights for policy 0, policy_version 239588 (0.0014) [2024-06-15 14:39:45,738][1648984] Fps is (10 sec: 36044.2, 60 sec: 40959.9, 300 sec: 42987.2). Total num frames: 490700800. Throughput: 0: 10490.3. Samples: 122756096. Policy #0 lag: (min: 15.0, avg: 54.9, max: 207.0) [2024-06-15 14:39:45,739][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 14:39:48,416][1652475] Updated weights for policy 0, policy_version 239712 (0.0016) [2024-06-15 14:39:50,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 43690.6, 300 sec: 43653.7). Total num frames: 490995712. Throughput: 0: 10126.2. Samples: 122775552. Policy #0 lag: (min: 15.0, avg: 54.9, max: 207.0) [2024-06-15 14:39:50,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 14:39:55,738][1648984] Fps is (10 sec: 29491.6, 60 sec: 39341.8, 300 sec: 42656.4). Total num frames: 490995712. Throughput: 0: 10547.2. Samples: 122850816. Policy #0 lag: (min: 15.0, avg: 54.9, max: 207.0) [2024-06-15 14:39:55,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 14:39:56,186][1652475] Updated weights for policy 0, policy_version 239763 (0.0013) [2024-06-15 14:39:58,046][1652475] Updated weights for policy 0, policy_version 239840 (0.0012) [2024-06-15 14:39:59,808][1652475] Updated weights for policy 0, policy_version 239920 (0.0012) [2024-06-15 14:39:59,898][1651340] Signal inference workers to stop experience collection... (12300 times) [2024-06-15 14:39:59,967][1652475] InferenceWorker_p0-w0: stopping experience collection (12300 times) [2024-06-15 14:40:00,274][1651340] Signal inference workers to resume experience collection... 
(12300 times) [2024-06-15 14:40:00,275][1652475] InferenceWorker_p0-w0: resuming experience collection (12300 times) [2024-06-15 14:40:00,738][1648984] Fps is (10 sec: 42598.5, 60 sec: 42052.3, 300 sec: 43653.6). Total num frames: 491421696. Throughput: 0: 10217.2. Samples: 122904064. Policy #0 lag: (min: 15.0, avg: 54.9, max: 207.0) [2024-06-15 14:40:00,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 14:40:01,873][1652475] Updated weights for policy 0, policy_version 239999 (0.0013) [2024-06-15 14:40:05,738][1648984] Fps is (10 sec: 52429.2, 60 sec: 42598.5, 300 sec: 42987.2). Total num frames: 491520000. Throughput: 0: 10251.4. Samples: 122933248. Policy #0 lag: (min: 15.0, avg: 54.9, max: 207.0) [2024-06-15 14:40:05,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 14:40:09,944][1652475] Updated weights for policy 0, policy_version 240080 (0.0012) [2024-06-15 14:40:10,741][1648984] Fps is (10 sec: 32767.8, 60 sec: 39321.6, 300 sec: 43320.4). Total num frames: 491749376. Throughput: 0: 10808.9. Samples: 123010560. Policy #0 lag: (min: 15.0, avg: 54.9, max: 207.0) [2024-06-15 14:40:10,745][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 14:40:11,731][1652475] Updated weights for policy 0, policy_version 240151 (0.0017) [2024-06-15 14:40:14,149][1652475] Updated weights for policy 0, policy_version 240246 (0.0094) [2024-06-15 14:40:15,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 43690.6, 300 sec: 43542.6). Total num frames: 492044288. Throughput: 0: 10331.6. Samples: 123061248. Policy #0 lag: (min: 15.0, avg: 54.9, max: 207.0) [2024-06-15 14:40:15,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 14:40:20,738][1648984] Fps is (10 sec: 29491.4, 60 sec: 39321.6, 300 sec: 42654.0). Total num frames: 492044288. Throughput: 0: 10672.4. Samples: 123102208. 
Policy #0 lag: (min: 15.0, avg: 54.9, max: 207.0) [2024-06-15 14:40:20,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 14:40:21,547][1652475] Updated weights for policy 0, policy_version 240305 (0.0046) [2024-06-15 14:40:23,960][1652475] Updated weights for policy 0, policy_version 240400 (0.0198) [2024-06-15 14:40:25,738][1648984] Fps is (10 sec: 45875.1, 60 sec: 42598.5, 300 sec: 43764.7). Total num frames: 492503040. Throughput: 0: 10535.8. Samples: 123159552. Policy #0 lag: (min: 15.0, avg: 54.9, max: 207.0) [2024-06-15 14:40:25,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 14:40:26,206][1652475] Updated weights for policy 0, policy_version 240512 (0.0014) [2024-06-15 14:40:30,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 42600.2, 300 sec: 42654.0). Total num frames: 492568576. Throughput: 0: 10433.5. Samples: 123225600. Policy #0 lag: (min: 15.0, avg: 54.9, max: 207.0) [2024-06-15 14:40:30,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 14:40:33,901][1652475] Updated weights for policy 0, policy_version 240573 (0.0016) [2024-06-15 14:40:35,350][1652475] Updated weights for policy 0, policy_version 240631 (0.0013) [2024-06-15 14:40:35,738][1648984] Fps is (10 sec: 32768.0, 60 sec: 41506.1, 300 sec: 43209.3). Total num frames: 492830720. Throughput: 0: 10786.1. Samples: 123260928. Policy #0 lag: (min: 15.0, avg: 54.9, max: 207.0) [2024-06-15 14:40:35,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 14:40:37,180][1652475] Updated weights for policy 0, policy_version 240711 (0.0013) [2024-06-15 14:40:38,383][1652475] Updated weights for policy 0, policy_version 240768 (0.0077) [2024-06-15 14:40:40,738][1648984] Fps is (10 sec: 52427.3, 60 sec: 43690.5, 300 sec: 43209.3). Total num frames: 493092864. Throughput: 0: 10410.6. Samples: 123319296. 
Policy #0 lag: (min: 15.0, avg: 54.9, max: 207.0) [2024-06-15 14:40:40,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 14:40:44,972][1651340] Signal inference workers to stop experience collection... (12350 times) [2024-06-15 14:40:45,031][1652475] InferenceWorker_p0-w0: stopping experience collection (12350 times) [2024-06-15 14:40:45,244][1651340] Signal inference workers to resume experience collection... (12350 times) [2024-06-15 14:40:45,245][1652475] InferenceWorker_p0-w0: resuming experience collection (12350 times) [2024-06-15 14:40:45,738][1648984] Fps is (10 sec: 36045.0, 60 sec: 41506.3, 300 sec: 42987.2). Total num frames: 493191168. Throughput: 0: 10865.8. Samples: 123393024. Policy #0 lag: (min: 15.0, avg: 54.9, max: 207.0) [2024-06-15 14:40:45,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 14:40:45,841][1652475] Updated weights for policy 0, policy_version 240830 (0.0012) [2024-06-15 14:40:47,394][1652475] Updated weights for policy 0, policy_version 240880 (0.0012) [2024-06-15 14:40:48,530][1652475] Updated weights for policy 0, policy_version 240928 (0.0012) [2024-06-15 14:40:50,305][1652475] Updated weights for policy 0, policy_version 240995 (0.0084) [2024-06-15 14:40:50,738][1648984] Fps is (10 sec: 49153.3, 60 sec: 43144.5, 300 sec: 43431.5). Total num frames: 493584384. Throughput: 0: 10911.3. Samples: 123424256. Policy #0 lag: (min: 15.0, avg: 54.9, max: 207.0) [2024-06-15 14:40:50,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 14:40:55,738][1648984] Fps is (10 sec: 42596.5, 60 sec: 43690.4, 300 sec: 42653.9). Total num frames: 493617152. Throughput: 0: 10615.4. Samples: 123488256. Policy #0 lag: (min: 15.0, avg: 54.9, max: 207.0) [2024-06-15 14:40:55,739][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 14:40:55,746][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000241024_493617152.pth... 
[2024-06-15 14:40:55,780][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000236080_483491840.pth [2024-06-15 14:40:57,159][1652475] Updated weights for policy 0, policy_version 241059 (0.0067) [2024-06-15 14:40:58,964][1652475] Updated weights for policy 0, policy_version 241125 (0.0110) [2024-06-15 14:41:00,542][1652475] Updated weights for policy 0, policy_version 241185 (0.0014) [2024-06-15 14:41:00,738][1648984] Fps is (10 sec: 36044.5, 60 sec: 42052.2, 300 sec: 43320.4). Total num frames: 493944832. Throughput: 0: 10899.9. Samples: 123551744. Policy #0 lag: (min: 8.0, avg: 79.4, max: 264.0) [2024-06-15 14:41:00,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 14:41:02,369][1652475] Updated weights for policy 0, policy_version 241248 (0.0012) [2024-06-15 14:41:05,738][1648984] Fps is (10 sec: 52431.0, 60 sec: 43690.7, 300 sec: 42765.0). Total num frames: 494141440. Throughput: 0: 10581.3. Samples: 123578368. Policy #0 lag: (min: 8.0, avg: 79.4, max: 264.0) [2024-06-15 14:41:05,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 14:41:08,735][1652475] Updated weights for policy 0, policy_version 241300 (0.0014) [2024-06-15 14:41:09,520][1652475] Updated weights for policy 0, policy_version 241344 (0.0022) [2024-06-15 14:41:10,738][1648984] Fps is (10 sec: 39322.0, 60 sec: 43144.6, 300 sec: 42876.1). Total num frames: 494338048. Throughput: 0: 10956.8. Samples: 123652608. Policy #0 lag: (min: 8.0, avg: 79.4, max: 264.0) [2024-06-15 14:41:10,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 14:41:11,318][1652475] Updated weights for policy 0, policy_version 241398 (0.0012) [2024-06-15 14:41:12,680][1652475] Updated weights for policy 0, policy_version 241456 (0.0014) [2024-06-15 14:41:13,819][1652475] Updated weights for policy 0, policy_version 241493 (0.0012) [2024-06-15 14:41:15,757][1648984] Fps is (10 sec: 52327.5, 60 sec: 43676.6, 300 sec: 43206.5). Total num frames: 494665728. 
Throughput: 0: 10861.1. Samples: 123714560. Policy #0 lag: (min: 8.0, avg: 79.4, max: 264.0) [2024-06-15 14:41:15,758][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 14:41:20,333][1652475] Updated weights for policy 0, policy_version 241573 (0.0014) [2024-06-15 14:41:20,738][1648984] Fps is (10 sec: 45875.1, 60 sec: 45875.2, 300 sec: 43098.3). Total num frames: 494796800. Throughput: 0: 10945.4. Samples: 123753472. Policy #0 lag: (min: 8.0, avg: 79.4, max: 264.0) [2024-06-15 14:41:20,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 14:41:22,369][1652475] Updated weights for policy 0, policy_version 241620 (0.0025) [2024-06-15 14:41:23,891][1652475] Updated weights for policy 0, policy_version 241683 (0.0012) [2024-06-15 14:41:24,877][1651340] Signal inference workers to stop experience collection... (12400 times) [2024-06-15 14:41:24,896][1651340] Signal inference workers to resume experience collection... (12400 times) [2024-06-15 14:41:24,913][1652475] InferenceWorker_p0-w0: stopping experience collection (12400 times) [2024-06-15 14:41:24,930][1652475] InferenceWorker_p0-w0: resuming experience collection (12400 times) [2024-06-15 14:41:25,379][1652475] Updated weights for policy 0, policy_version 241751 (0.0013) [2024-06-15 14:41:25,738][1648984] Fps is (10 sec: 45964.3, 60 sec: 43690.7, 300 sec: 43320.4). Total num frames: 495124480. Throughput: 0: 11082.0. Samples: 123817984. Policy #0 lag: (min: 8.0, avg: 79.4, max: 264.0) [2024-06-15 14:41:25,740][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 14:41:26,235][1652475] Updated weights for policy 0, policy_version 241789 (0.0021) [2024-06-15 14:41:30,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 43690.6, 300 sec: 42653.9). Total num frames: 495190016. Throughput: 0: 10990.9. Samples: 123887616. 
Policy #0 lag: (min: 8.0, avg: 79.4, max: 264.0) [2024-06-15 14:41:30,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 14:41:35,277][1652475] Updated weights for policy 0, policy_version 241888 (0.0093) [2024-06-15 14:41:35,738][1648984] Fps is (10 sec: 29491.2, 60 sec: 43144.6, 300 sec: 42987.2). Total num frames: 495419392. Throughput: 0: 11138.9. Samples: 123925504. Policy #0 lag: (min: 8.0, avg: 79.4, max: 264.0) [2024-06-15 14:41:35,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 14:41:37,067][1652475] Updated weights for policy 0, policy_version 241971 (0.0012) [2024-06-15 14:41:38,832][1652475] Updated weights for policy 0, policy_version 242048 (0.0015) [2024-06-15 14:41:40,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 43690.9, 300 sec: 42765.0). Total num frames: 495714304. Throughput: 0: 10626.9. Samples: 123966464. Policy #0 lag: (min: 8.0, avg: 79.4, max: 264.0) [2024-06-15 14:41:40,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 14:41:45,738][1648984] Fps is (10 sec: 29491.3, 60 sec: 42052.3, 300 sec: 42431.8). Total num frames: 495714304. Throughput: 0: 10752.0. Samples: 124035584. Policy #0 lag: (min: 8.0, avg: 79.4, max: 264.0) [2024-06-15 14:41:45,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 14:41:48,655][1652475] Updated weights for policy 0, policy_version 242160 (0.0014) [2024-06-15 14:41:50,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 42052.2, 300 sec: 42988.4). Total num frames: 496107520. Throughput: 0: 10808.9. Samples: 124064768. Policy #0 lag: (min: 8.0, avg: 79.4, max: 264.0) [2024-06-15 14:41:50,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 14:41:52,548][1652475] Updated weights for policy 0, policy_version 242256 (0.0014) [2024-06-15 14:41:55,738][1648984] Fps is (10 sec: 52428.0, 60 sec: 43690.9, 300 sec: 42653.9). Total num frames: 496238592. Throughput: 0: 10274.1. Samples: 124114944. 
Policy #0 lag: (min: 8.0, avg: 79.4, max: 264.0) [2024-06-15 14:41:55,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 14:41:57,371][1652475] Updated weights for policy 0, policy_version 242305 (0.0013) [2024-06-15 14:42:00,436][1652475] Updated weights for policy 0, policy_version 242384 (0.0015) [2024-06-15 14:42:00,738][1648984] Fps is (10 sec: 29491.4, 60 sec: 40960.0, 300 sec: 42765.0). Total num frames: 496402432. Throughput: 0: 10540.4. Samples: 124188672. Policy #0 lag: (min: 8.0, avg: 79.4, max: 264.0) [2024-06-15 14:42:00,738][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 14:42:01,997][1652475] Updated weights for policy 0, policy_version 242439 (0.0013) [2024-06-15 14:42:02,960][1652475] Updated weights for policy 0, policy_version 242491 (0.0016) [2024-06-15 14:42:05,099][1652475] Updated weights for policy 0, policy_version 242552 (0.0014) [2024-06-15 14:42:05,743][1648984] Fps is (10 sec: 52403.8, 60 sec: 43687.1, 300 sec: 43208.6). Total num frames: 496762880. Throughput: 0: 10284.4. Samples: 124216320. Policy #0 lag: (min: 8.0, avg: 79.4, max: 264.0) [2024-06-15 14:42:05,743][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 14:42:09,200][1652475] Updated weights for policy 0, policy_version 242581 (0.0042) [2024-06-15 14:42:10,738][1648984] Fps is (10 sec: 49151.8, 60 sec: 42598.3, 300 sec: 43098.2). Total num frames: 496893952. Throughput: 0: 10569.9. Samples: 124293632. Policy #0 lag: (min: 15.0, avg: 101.8, max: 271.0) [2024-06-15 14:42:10,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 14:42:11,002][1652475] Updated weights for policy 0, policy_version 242628 (0.0020) [2024-06-15 14:42:11,402][1651340] Signal inference workers to stop experience collection... (12450 times) [2024-06-15 14:42:11,432][1652475] InferenceWorker_p0-w0: stopping experience collection (12450 times) [2024-06-15 14:42:11,738][1651340] Signal inference workers to resume experience collection... 
(12450 times) [2024-06-15 14:42:11,752][1652475] InferenceWorker_p0-w0: resuming experience collection (12450 times) [2024-06-15 14:42:12,646][1652475] Updated weights for policy 0, policy_version 242685 (0.0012) [2024-06-15 14:42:13,987][1652475] Updated weights for policy 0, policy_version 242722 (0.0013) [2024-06-15 14:42:15,363][1652475] Updated weights for policy 0, policy_version 242771 (0.0012) [2024-06-15 14:42:15,738][1648984] Fps is (10 sec: 45897.4, 60 sec: 42612.1, 300 sec: 43764.8). Total num frames: 497221632. Throughput: 0: 10456.2. Samples: 124358144. Policy #0 lag: (min: 15.0, avg: 101.8, max: 271.0) [2024-06-15 14:42:15,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 14:42:20,443][1652475] Updated weights for policy 0, policy_version 242836 (0.0025) [2024-06-15 14:42:20,738][1648984] Fps is (10 sec: 45875.5, 60 sec: 42598.4, 300 sec: 42876.1). Total num frames: 497352704. Throughput: 0: 10444.8. Samples: 124395520. Policy #0 lag: (min: 15.0, avg: 101.8, max: 271.0) [2024-06-15 14:42:20,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 14:42:21,504][1652475] Updated weights for policy 0, policy_version 242877 (0.0015) [2024-06-15 14:42:23,564][1652475] Updated weights for policy 0, policy_version 242930 (0.0014) [2024-06-15 14:42:25,209][1652475] Updated weights for policy 0, policy_version 242976 (0.0014) [2024-06-15 14:42:25,738][1648984] Fps is (10 sec: 45875.3, 60 sec: 42598.4, 300 sec: 43875.8). Total num frames: 497680384. Throughput: 0: 11047.8. Samples: 124463616. Policy #0 lag: (min: 15.0, avg: 101.8, max: 271.0) [2024-06-15 14:42:25,740][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 14:42:27,201][1652475] Updated weights for policy 0, policy_version 243043 (0.0016) [2024-06-15 14:42:30,738][1648984] Fps is (10 sec: 45875.1, 60 sec: 43690.7, 300 sec: 43209.3). Total num frames: 497811456. Throughput: 0: 11207.1. Samples: 124539904. 
Policy #0 lag: (min: 15.0, avg: 101.8, max: 271.0) [2024-06-15 14:42:30,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 14:42:31,598][1652475] Updated weights for policy 0, policy_version 243104 (0.0015) [2024-06-15 14:42:32,465][1652475] Updated weights for policy 0, policy_version 243136 (0.0013) [2024-06-15 14:42:35,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 44236.8, 300 sec: 43542.6). Total num frames: 498073600. Throughput: 0: 11332.3. Samples: 124574720. Policy #0 lag: (min: 15.0, avg: 101.8, max: 271.0) [2024-06-15 14:42:35,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 14:42:36,056][1652475] Updated weights for policy 0, policy_version 243216 (0.0016) [2024-06-15 14:42:37,014][1652475] Updated weights for policy 0, policy_version 243264 (0.0012) [2024-06-15 14:42:38,921][1652475] Updated weights for policy 0, policy_version 243328 (0.0014) [2024-06-15 14:42:40,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 43690.7, 300 sec: 43653.7). Total num frames: 498335744. Throughput: 0: 11503.0. Samples: 124632576. Policy #0 lag: (min: 15.0, avg: 101.8, max: 271.0) [2024-06-15 14:42:40,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 14:42:43,648][1652475] Updated weights for policy 0, policy_version 243385 (0.0012) [2024-06-15 14:42:45,738][1648984] Fps is (10 sec: 45875.5, 60 sec: 46967.5, 300 sec: 43320.4). Total num frames: 498532352. Throughput: 0: 11605.3. Samples: 124710912. Policy #0 lag: (min: 15.0, avg: 101.8, max: 271.0) [2024-06-15 14:42:45,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 14:42:46,060][1652475] Updated weights for policy 0, policy_version 243447 (0.0014) [2024-06-15 14:42:48,746][1652475] Updated weights for policy 0, policy_version 243515 (0.0098) [2024-06-15 14:42:50,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 45875.3, 300 sec: 43986.9). Total num frames: 498860032. Throughput: 0: 11652.1. Samples: 124740608. 
Policy #0 lag: (min: 15.0, avg: 101.8, max: 271.0) [2024-06-15 14:42:50,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 14:42:53,852][1652475] Updated weights for policy 0, policy_version 243586 (0.0042) [2024-06-15 14:42:55,192][1652475] Updated weights for policy 0, policy_version 243643 (0.0036) [2024-06-15 14:42:55,738][1648984] Fps is (10 sec: 45874.6, 60 sec: 45875.2, 300 sec: 43209.3). Total num frames: 498991104. Throughput: 0: 11502.9. Samples: 124811264. Policy #0 lag: (min: 15.0, avg: 101.8, max: 271.0) [2024-06-15 14:42:55,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 14:42:55,781][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000243648_498991104.pth... [2024-06-15 14:42:55,818][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000238624_488701952.pth [2024-06-15 14:42:57,531][1651340] Signal inference workers to stop experience collection... (12500 times) [2024-06-15 14:42:57,551][1652475] InferenceWorker_p0-w0: stopping experience collection (12500 times) [2024-06-15 14:42:57,585][1652475] Updated weights for policy 0, policy_version 243682 (0.0015) [2024-06-15 14:42:57,869][1651340] Signal inference workers to resume experience collection... (12500 times) [2024-06-15 14:42:57,870][1652475] InferenceWorker_p0-w0: resuming experience collection (12500 times) [2024-06-15 14:43:00,486][1652475] Updated weights for policy 0, policy_version 243765 (0.0014) [2024-06-15 14:43:00,738][1648984] Fps is (10 sec: 36044.8, 60 sec: 46967.5, 300 sec: 43875.8). Total num frames: 499220480. Throughput: 0: 11446.1. Samples: 124873216. Policy #0 lag: (min: 15.0, avg: 101.8, max: 271.0) [2024-06-15 14:43:00,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 14:43:01,546][1652475] Updated weights for policy 0, policy_version 243808 (0.0012) [2024-06-15 14:43:05,738][1648984] Fps is (10 sec: 42598.9, 60 sec: 44240.4, 300 sec: 43431.5). 
Total num frames: 499417088. Throughput: 0: 11355.0. Samples: 124906496. Policy #0 lag: (min: 15.0, avg: 101.8, max: 271.0) [2024-06-15 14:43:05,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 14:43:06,337][1652475] Updated weights for policy 0, policy_version 243888 (0.0013) [2024-06-15 14:43:09,271][1652475] Updated weights for policy 0, policy_version 243937 (0.0015) [2024-06-15 14:43:09,849][1652475] Updated weights for policy 0, policy_version 243965 (0.0017) [2024-06-15 14:43:10,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 45875.3, 300 sec: 43545.9). Total num frames: 499646464. Throughput: 0: 11423.3. Samples: 124977664. Policy #0 lag: (min: 15.0, avg: 101.8, max: 271.0) [2024-06-15 14:43:10,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 14:43:12,711][1652475] Updated weights for policy 0, policy_version 244038 (0.0013) [2024-06-15 14:43:15,738][1648984] Fps is (10 sec: 49151.6, 60 sec: 44782.9, 300 sec: 43653.6). Total num frames: 499908608. Throughput: 0: 11195.7. Samples: 125043712. Policy #0 lag: (min: 15.0, avg: 101.8, max: 271.0) [2024-06-15 14:43:15,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 14:43:16,944][1652475] Updated weights for policy 0, policy_version 244112 (0.0016) [2024-06-15 14:43:20,060][1652475] Updated weights for policy 0, policy_version 244161 (0.0085) [2024-06-15 14:43:20,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 45329.1, 300 sec: 43209.3). Total num frames: 500072448. Throughput: 0: 11195.7. Samples: 125078528. Policy #0 lag: (min: 5.0, avg: 102.1, max: 261.0) [2024-06-15 14:43:20,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 14:43:21,255][1652475] Updated weights for policy 0, policy_version 244215 (0.0036) [2024-06-15 14:43:23,989][1652475] Updated weights for policy 0, policy_version 244272 (0.0012) [2024-06-15 14:43:25,738][1648984] Fps is (10 sec: 52429.1, 60 sec: 45875.2, 300 sec: 43986.9). Total num frames: 500432896. Throughput: 0: 11434.7. 
Samples: 125147136. Policy #0 lag: (min: 5.0, avg: 102.1, max: 261.0) [2024-06-15 14:43:25,740][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 14:43:27,591][1652475] Updated weights for policy 0, policy_version 244356 (0.0015) [2024-06-15 14:43:30,744][1648984] Fps is (10 sec: 49118.1, 60 sec: 45869.9, 300 sec: 43097.3). Total num frames: 500563968. Throughput: 0: 11262.3. Samples: 125217792. Policy #0 lag: (min: 5.0, avg: 102.1, max: 261.0) [2024-06-15 14:43:30,745][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 14:43:31,505][1652475] Updated weights for policy 0, policy_version 244420 (0.0014) [2024-06-15 14:43:32,972][1652475] Updated weights for policy 0, policy_version 244480 (0.0015) [2024-06-15 14:43:35,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 45875.2, 300 sec: 43986.9). Total num frames: 500826112. Throughput: 0: 11286.7. Samples: 125248512. Policy #0 lag: (min: 5.0, avg: 102.1, max: 261.0) [2024-06-15 14:43:35,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 14:43:35,811][1652475] Updated weights for policy 0, policy_version 244548 (0.0014) [2024-06-15 14:43:37,166][1652475] Updated weights for policy 0, policy_version 244606 (0.0012) [2024-06-15 14:43:40,738][1648984] Fps is (10 sec: 49185.5, 60 sec: 45329.0, 300 sec: 43431.5). Total num frames: 501055488. Throughput: 0: 11332.3. Samples: 125321216. Policy #0 lag: (min: 5.0, avg: 102.1, max: 261.0) [2024-06-15 14:43:40,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 14:43:40,954][1652475] Updated weights for policy 0, policy_version 244664 (0.0055) [2024-06-15 14:43:42,978][1651340] Signal inference workers to stop experience collection... (12550 times) [2024-06-15 14:43:43,037][1652475] InferenceWorker_p0-w0: stopping experience collection (12550 times) [2024-06-15 14:43:43,222][1651340] Signal inference workers to resume experience collection... 
(12550 times) [2024-06-15 14:43:43,223][1652475] InferenceWorker_p0-w0: resuming experience collection (12550 times) [2024-06-15 14:43:43,871][1652475] Updated weights for policy 0, policy_version 244705 (0.0013) [2024-06-15 14:43:45,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 44782.9, 300 sec: 43542.6). Total num frames: 501219328. Throughput: 0: 11309.5. Samples: 125382144. Policy #0 lag: (min: 5.0, avg: 102.1, max: 261.0) [2024-06-15 14:43:45,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 14:43:46,150][1652475] Updated weights for policy 0, policy_version 244752 (0.0016) [2024-06-15 14:43:47,887][1652475] Updated weights for policy 0, policy_version 244822 (0.0018) [2024-06-15 14:43:50,738][1648984] Fps is (10 sec: 42599.0, 60 sec: 43690.7, 300 sec: 43547.1). Total num frames: 501481472. Throughput: 0: 11252.6. Samples: 125412864. Policy #0 lag: (min: 5.0, avg: 102.1, max: 261.0) [2024-06-15 14:43:50,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 14:43:53,887][1652475] Updated weights for policy 0, policy_version 244865 (0.0011) [2024-06-15 14:43:55,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 43690.7, 300 sec: 43098.2). Total num frames: 501612544. Throughput: 0: 11218.5. Samples: 125482496. Policy #0 lag: (min: 5.0, avg: 102.1, max: 261.0) [2024-06-15 14:43:55,738][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 14:43:55,802][1652475] Updated weights for policy 0, policy_version 244944 (0.0013) [2024-06-15 14:43:57,158][1652475] Updated weights for policy 0, policy_version 244993 (0.0012) [2024-06-15 14:43:58,492][1652475] Updated weights for policy 0, policy_version 245056 (0.0014) [2024-06-15 14:44:00,738][1648984] Fps is (10 sec: 45874.6, 60 sec: 45329.0, 300 sec: 43986.9). Total num frames: 501940224. Throughput: 0: 11070.6. Samples: 125541888. 
Policy #0 lag: (min: 5.0, avg: 102.1, max: 261.0) [2024-06-15 14:44:00,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 14:44:01,339][1652475] Updated weights for policy 0, policy_version 245117 (0.0013) [2024-06-15 14:44:05,738][1648984] Fps is (10 sec: 42596.0, 60 sec: 43690.2, 300 sec: 42876.0). Total num frames: 502038528. Throughput: 0: 11002.2. Samples: 125573632. Policy #0 lag: (min: 5.0, avg: 102.1, max: 261.0) [2024-06-15 14:44:05,739][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 14:44:06,855][1652475] Updated weights for policy 0, policy_version 245184 (0.0012) [2024-06-15 14:44:09,999][1652475] Updated weights for policy 0, policy_version 245243 (0.0014) [2024-06-15 14:44:10,738][1648984] Fps is (10 sec: 36045.3, 60 sec: 44236.9, 300 sec: 43653.7). Total num frames: 502300672. Throughput: 0: 10808.9. Samples: 125633536. Policy #0 lag: (min: 5.0, avg: 102.1, max: 261.0) [2024-06-15 14:44:10,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 14:44:11,641][1652475] Updated weights for policy 0, policy_version 245302 (0.0015) [2024-06-15 14:44:13,302][1652475] Updated weights for policy 0, policy_version 245329 (0.0012) [2024-06-15 14:44:15,738][1648984] Fps is (10 sec: 49154.8, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 502530048. Throughput: 0: 10696.7. Samples: 125699072. Policy #0 lag: (min: 5.0, avg: 102.1, max: 261.0) [2024-06-15 14:44:15,741][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 14:44:16,731][1652475] Updated weights for policy 0, policy_version 245381 (0.0029) [2024-06-15 14:44:17,980][1652475] Updated weights for policy 0, policy_version 245438 (0.0013) [2024-06-15 14:44:20,738][1648984] Fps is (10 sec: 42598.0, 60 sec: 44236.8, 300 sec: 43320.4). Total num frames: 502726656. Throughput: 0: 10752.0. Samples: 125732352. 
Policy #0 lag: (min: 5.0, avg: 102.1, max: 261.0) [2024-06-15 14:44:20,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 14:44:21,070][1652475] Updated weights for policy 0, policy_version 245488 (0.0012) [2024-06-15 14:44:24,024][1652475] Updated weights for policy 0, policy_version 245538 (0.0013) [2024-06-15 14:44:25,738][1648984] Fps is (10 sec: 45875.1, 60 sec: 42598.4, 300 sec: 43987.2). Total num frames: 502988800. Throughput: 0: 10672.4. Samples: 125801472. Policy #0 lag: (min: 5.0, avg: 102.1, max: 261.0) [2024-06-15 14:44:25,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 14:44:25,949][1652475] Updated weights for policy 0, policy_version 245624 (0.0110) [2024-06-15 14:44:30,738][1648984] Fps is (10 sec: 45875.3, 60 sec: 43695.7, 300 sec: 43542.6). Total num frames: 503185408. Throughput: 0: 10683.7. Samples: 125862912. Policy #0 lag: (min: 5.0, avg: 102.1, max: 261.0) [2024-06-15 14:44:30,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 14:44:32,262][1651340] Signal inference workers to stop experience collection... (12600 times) [2024-06-15 14:44:32,311][1652475] Updated weights for policy 0, policy_version 245697 (0.0015) [2024-06-15 14:44:32,327][1652475] InferenceWorker_p0-w0: stopping experience collection (12600 times) [2024-06-15 14:44:32,524][1651340] Signal inference workers to resume experience collection... (12600 times) [2024-06-15 14:44:32,525][1652475] InferenceWorker_p0-w0: resuming experience collection (12600 times) [2024-06-15 14:44:33,574][1652475] Updated weights for policy 0, policy_version 245760 (0.0018) [2024-06-15 14:44:35,738][1648984] Fps is (10 sec: 36044.7, 60 sec: 42052.2, 300 sec: 43653.6). Total num frames: 503349248. Throughput: 0: 10763.3. Samples: 125897216. 
Policy #0 lag: (min: 7.0, avg: 106.6, max: 263.0) [2024-06-15 14:44:35,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 14:44:37,459][1652475] Updated weights for policy 0, policy_version 245841 (0.0119) [2024-06-15 14:44:40,738][1648984] Fps is (10 sec: 42597.0, 60 sec: 42598.2, 300 sec: 43764.7). Total num frames: 503611392. Throughput: 0: 10558.5. Samples: 125957632. Policy #0 lag: (min: 7.0, avg: 106.6, max: 263.0) [2024-06-15 14:44:40,740][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 14:44:40,816][1652475] Updated weights for policy 0, policy_version 245920 (0.0014) [2024-06-15 14:44:44,280][1652475] Updated weights for policy 0, policy_version 245957 (0.0014) [2024-06-15 14:44:45,738][1648984] Fps is (10 sec: 49152.2, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 503840768. Throughput: 0: 10786.1. Samples: 126027264. Policy #0 lag: (min: 7.0, avg: 106.6, max: 263.0) [2024-06-15 14:44:45,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 14:44:48,000][1652475] Updated weights for policy 0, policy_version 246032 (0.0017) [2024-06-15 14:44:50,436][1652475] Updated weights for policy 0, policy_version 246128 (0.0013) [2024-06-15 14:44:50,738][1648984] Fps is (10 sec: 49153.7, 60 sec: 43690.6, 300 sec: 44431.2). Total num frames: 504102912. Throughput: 0: 10911.4. Samples: 126064640. Policy #0 lag: (min: 7.0, avg: 106.6, max: 263.0) [2024-06-15 14:44:50,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 14:44:53,576][1652475] Updated weights for policy 0, policy_version 246196 (0.0016) [2024-06-15 14:44:55,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 43690.6, 300 sec: 43431.5). Total num frames: 504233984. Throughput: 0: 10774.7. Samples: 126118400. Policy #0 lag: (min: 7.0, avg: 106.6, max: 263.0) [2024-06-15 14:44:55,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 14:44:55,743][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000246208_504233984.pth... 
[2024-06-15 14:44:55,818][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000241024_493617152.pth [2024-06-15 14:44:57,921][1652475] Updated weights for policy 0, policy_version 246265 (0.0014) [2024-06-15 14:45:00,738][1648984] Fps is (10 sec: 26214.2, 60 sec: 40413.9, 300 sec: 43542.6). Total num frames: 504365056. Throughput: 0: 10911.3. Samples: 126190080. Policy #0 lag: (min: 7.0, avg: 106.6, max: 263.0) [2024-06-15 14:45:00,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 14:45:01,592][1652475] Updated weights for policy 0, policy_version 246320 (0.0012) [2024-06-15 14:45:03,257][1652475] Updated weights for policy 0, policy_version 246394 (0.0019) [2024-06-15 14:45:05,738][1648984] Fps is (10 sec: 49152.3, 60 sec: 44783.4, 300 sec: 43986.9). Total num frames: 504725504. Throughput: 0: 10717.9. Samples: 126214656. Policy #0 lag: (min: 7.0, avg: 106.6, max: 263.0) [2024-06-15 14:45:05,740][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 14:45:05,762][1652475] Updated weights for policy 0, policy_version 246455 (0.0015) [2024-06-15 14:45:10,738][1648984] Fps is (10 sec: 45875.4, 60 sec: 42052.2, 300 sec: 43320.4). Total num frames: 504823808. Throughput: 0: 10740.6. Samples: 126284800. Policy #0 lag: (min: 7.0, avg: 106.6, max: 263.0) [2024-06-15 14:45:10,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 14:45:11,141][1652475] Updated weights for policy 0, policy_version 246527 (0.0012) [2024-06-15 14:45:13,649][1652475] Updated weights for policy 0, policy_version 246576 (0.0013) [2024-06-15 14:45:15,083][1652475] Updated weights for policy 0, policy_version 246640 (0.0012) [2024-06-15 14:45:15,738][1648984] Fps is (10 sec: 42598.2, 60 sec: 43690.7, 300 sec: 44431.2). Total num frames: 505151488. Throughput: 0: 10763.4. Samples: 126347264. 
Policy #0 lag: (min: 7.0, avg: 106.6, max: 263.0) [2024-06-15 14:45:15,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 14:45:16,439][1651340] Signal inference workers to stop experience collection... (12650 times) [2024-06-15 14:45:16,495][1652475] InferenceWorker_p0-w0: stopping experience collection (12650 times) [2024-06-15 14:45:16,745][1651340] Signal inference workers to resume experience collection... (12650 times) [2024-06-15 14:45:16,754][1652475] InferenceWorker_p0-w0: resuming experience collection (12650 times) [2024-06-15 14:45:16,757][1652475] Updated weights for policy 0, policy_version 246688 (0.0024) [2024-06-15 14:45:20,738][1648984] Fps is (10 sec: 45874.6, 60 sec: 42598.3, 300 sec: 43320.4). Total num frames: 505282560. Throughput: 0: 10774.7. Samples: 126382080. Policy #0 lag: (min: 7.0, avg: 106.6, max: 263.0) [2024-06-15 14:45:20,739][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 14:45:22,227][1652475] Updated weights for policy 0, policy_version 246775 (0.0016) [2024-06-15 14:45:25,390][1652475] Updated weights for policy 0, policy_version 246819 (0.0013) [2024-06-15 14:45:25,738][1648984] Fps is (10 sec: 36044.3, 60 sec: 42052.2, 300 sec: 43875.8). Total num frames: 505511936. Throughput: 0: 10991.0. Samples: 126452224. Policy #0 lag: (min: 7.0, avg: 106.6, max: 263.0) [2024-06-15 14:45:25,739][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 14:45:27,284][1652475] Updated weights for policy 0, policy_version 246907 (0.0013) [2024-06-15 14:45:28,947][1652475] Updated weights for policy 0, policy_version 246964 (0.0014) [2024-06-15 14:45:30,738][1648984] Fps is (10 sec: 52429.6, 60 sec: 43690.7, 300 sec: 43986.9). Total num frames: 505806848. Throughput: 0: 10683.7. Samples: 126508032. 
Policy #0 lag: (min: 7.0, avg: 106.6, max: 263.0) [2024-06-15 14:45:30,740][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 14:45:34,251][1652475] Updated weights for policy 0, policy_version 247008 (0.0025) [2024-06-15 14:45:35,738][1648984] Fps is (10 sec: 42598.6, 60 sec: 43144.5, 300 sec: 43542.6). Total num frames: 505937920. Throughput: 0: 10752.0. Samples: 126548480. Policy #0 lag: (min: 7.0, avg: 106.6, max: 263.0) [2024-06-15 14:45:35,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 14:45:37,079][1652475] Updated weights for policy 0, policy_version 247056 (0.0119) [2024-06-15 14:45:39,041][1652475] Updated weights for policy 0, policy_version 247137 (0.0222) [2024-06-15 14:45:40,447][1652475] Updated weights for policy 0, policy_version 247200 (0.0016) [2024-06-15 14:45:40,738][1648984] Fps is (10 sec: 45875.0, 60 sec: 44237.0, 300 sec: 44320.1). Total num frames: 506265600. Throughput: 0: 10899.9. Samples: 126608896. Policy #0 lag: (min: 7.0, avg: 106.6, max: 263.0) [2024-06-15 14:45:40,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 14:45:45,342][1652475] Updated weights for policy 0, policy_version 247248 (0.0017) [2024-06-15 14:45:45,737][1648984] Fps is (10 sec: 45876.4, 60 sec: 42598.5, 300 sec: 43431.5). Total num frames: 506396672. Throughput: 0: 10945.5. Samples: 126682624. Policy #0 lag: (min: 14.0, avg: 103.2, max: 270.0) [2024-06-15 14:45:45,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 14:45:46,636][1652475] Updated weights for policy 0, policy_version 247294 (0.0016) [2024-06-15 14:45:50,627][1652475] Updated weights for policy 0, policy_version 247376 (0.0014) [2024-06-15 14:45:50,738][1648984] Fps is (10 sec: 36044.9, 60 sec: 42052.3, 300 sec: 44098.0). Total num frames: 506626048. Throughput: 0: 11093.3. Samples: 126713856. 
Policy #0 lag: (min: 14.0, avg: 103.2, max: 270.0) [2024-06-15 14:45:50,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 14:45:52,566][1652475] Updated weights for policy 0, policy_version 247447 (0.0123) [2024-06-15 14:45:55,738][1648984] Fps is (10 sec: 45871.2, 60 sec: 43690.2, 300 sec: 43764.6). Total num frames: 506855424. Throughput: 0: 10854.2. Samples: 126773248. Policy #0 lag: (min: 14.0, avg: 103.2, max: 270.0) [2024-06-15 14:45:55,739][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 14:45:57,568][1652475] Updated weights for policy 0, policy_version 247493 (0.0039) [2024-06-15 14:45:59,042][1652475] Updated weights for policy 0, policy_version 247543 (0.0033) [2024-06-15 14:46:00,738][1648984] Fps is (10 sec: 36044.7, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 506986496. Throughput: 0: 10968.2. Samples: 126840832. Policy #0 lag: (min: 14.0, avg: 103.2, max: 270.0) [2024-06-15 14:46:00,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 14:46:01,657][1652475] Updated weights for policy 0, policy_version 247585 (0.0012) [2024-06-15 14:46:01,997][1651340] Signal inference workers to stop experience collection... (12700 times) [2024-06-15 14:46:02,042][1652475] InferenceWorker_p0-w0: stopping experience collection (12700 times) [2024-06-15 14:46:02,215][1651340] Signal inference workers to resume experience collection... (12700 times) [2024-06-15 14:46:02,216][1652475] InferenceWorker_p0-w0: resuming experience collection (12700 times) [2024-06-15 14:46:03,019][1652475] Updated weights for policy 0, policy_version 247648 (0.0012) [2024-06-15 14:46:03,692][1652475] Updated weights for policy 0, policy_version 247680 (0.0019) [2024-06-15 14:46:05,077][1652475] Updated weights for policy 0, policy_version 247737 (0.0015) [2024-06-15 14:46:05,738][1648984] Fps is (10 sec: 52431.4, 60 sec: 44236.6, 300 sec: 44209.0). Total num frames: 507379712. Throughput: 0: 10865.8. Samples: 126871040. 
Policy #0 lag: (min: 14.0, avg: 103.2, max: 270.0) [2024-06-15 14:46:05,739][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 14:46:09,427][1652475] Updated weights for policy 0, policy_version 247792 (0.0019) [2024-06-15 14:46:10,739][1648984] Fps is (10 sec: 52424.1, 60 sec: 44782.3, 300 sec: 43545.3). Total num frames: 507510784. Throughput: 0: 11013.5. Samples: 126947840. Policy #0 lag: (min: 14.0, avg: 103.2, max: 270.0) [2024-06-15 14:46:10,739][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 14:46:12,300][1652475] Updated weights for policy 0, policy_version 247844 (0.0013) [2024-06-15 14:46:14,232][1652475] Updated weights for policy 0, policy_version 247936 (0.0011) [2024-06-15 14:46:15,738][1648984] Fps is (10 sec: 42598.8, 60 sec: 44236.7, 300 sec: 44097.9). Total num frames: 507805696. Throughput: 0: 11150.2. Samples: 127009792. Policy #0 lag: (min: 14.0, avg: 103.2, max: 270.0) [2024-06-15 14:46:15,739][1648984] Avg episode reward: [(0, '-0.630')] [2024-06-15 14:46:16,390][1652475] Updated weights for policy 0, policy_version 247996 (0.0015) [2024-06-15 14:46:20,738][1648984] Fps is (10 sec: 39323.5, 60 sec: 43690.5, 300 sec: 43320.3). Total num frames: 507904000. Throughput: 0: 11036.4. Samples: 127045120. Policy #0 lag: (min: 14.0, avg: 103.2, max: 270.0) [2024-06-15 14:46:20,739][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 14:46:23,429][1652475] Updated weights for policy 0, policy_version 248069 (0.0123) [2024-06-15 14:46:24,642][1652475] Updated weights for policy 0, policy_version 248128 (0.0022) [2024-06-15 14:46:25,738][1648984] Fps is (10 sec: 36045.0, 60 sec: 44236.9, 300 sec: 43986.9). Total num frames: 508166144. Throughput: 0: 10945.4. Samples: 127101440. 
Policy #0 lag: (min: 14.0, avg: 103.2, max: 270.0) [2024-06-15 14:46:25,739][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 14:46:27,097][1652475] Updated weights for policy 0, policy_version 248183 (0.0041) [2024-06-15 14:46:28,321][1652475] Updated weights for policy 0, policy_version 248224 (0.0028) [2024-06-15 14:46:30,738][1648984] Fps is (10 sec: 52430.8, 60 sec: 43690.6, 300 sec: 44097.9). Total num frames: 508428288. Throughput: 0: 10649.6. Samples: 127161856. Policy #0 lag: (min: 14.0, avg: 103.2, max: 270.0) [2024-06-15 14:46:30,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 14:46:35,633][1652475] Updated weights for policy 0, policy_version 248294 (0.0013) [2024-06-15 14:46:35,738][1648984] Fps is (10 sec: 32767.1, 60 sec: 42598.2, 300 sec: 43320.4). Total num frames: 508493824. Throughput: 0: 10763.3. Samples: 127198208. Policy #0 lag: (min: 14.0, avg: 103.2, max: 270.0) [2024-06-15 14:46:35,739][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 14:46:37,695][1652475] Updated weights for policy 0, policy_version 248341 (0.0013) [2024-06-15 14:46:39,067][1652475] Updated weights for policy 0, policy_version 248400 (0.0012) [2024-06-15 14:46:40,738][1648984] Fps is (10 sec: 42598.1, 60 sec: 43144.5, 300 sec: 44542.2). Total num frames: 508854272. Throughput: 0: 10900.1. Samples: 127263744. Policy #0 lag: (min: 14.0, avg: 103.2, max: 270.0) [2024-06-15 14:46:40,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 14:46:40,861][1652475] Updated weights for policy 0, policy_version 248469 (0.0012) [2024-06-15 14:46:45,740][1648984] Fps is (10 sec: 45876.4, 60 sec: 42598.2, 300 sec: 43542.6). Total num frames: 508952576. Throughput: 0: 10899.9. Samples: 127331328. Policy #0 lag: (min: 14.0, avg: 103.2, max: 270.0) [2024-06-15 14:46:45,741][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 14:46:47,084][1651340] Signal inference workers to stop experience collection... 
(12750 times) [2024-06-15 14:46:47,149][1652475] InferenceWorker_p0-w0: stopping experience collection (12750 times) [2024-06-15 14:46:47,306][1651340] Signal inference workers to resume experience collection... (12750 times) [2024-06-15 14:46:47,307][1652475] InferenceWorker_p0-w0: resuming experience collection (12750 times) [2024-06-15 14:46:47,536][1652475] Updated weights for policy 0, policy_version 248575 (0.0014) [2024-06-15 14:46:50,696][1652475] Updated weights for policy 0, policy_version 248640 (0.0130) [2024-06-15 14:46:50,738][1648984] Fps is (10 sec: 36045.1, 60 sec: 43144.5, 300 sec: 43986.9). Total num frames: 509214720. Throughput: 0: 10911.3. Samples: 127362048. Policy #0 lag: (min: 14.0, avg: 103.2, max: 270.0) [2024-06-15 14:46:50,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 14:46:52,361][1652475] Updated weights for policy 0, policy_version 248706 (0.0013) [2024-06-15 14:46:55,739][1648984] Fps is (10 sec: 52419.8, 60 sec: 43689.9, 300 sec: 44319.8). Total num frames: 509476864. Throughput: 0: 10569.7. Samples: 127423488. Policy #0 lag: (min: 14.0, avg: 103.2, max: 270.0) [2024-06-15 14:46:55,740][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 14:46:55,746][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000248768_509476864.pth... [2024-06-15 14:46:55,802][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000243648_498991104.pth [2024-06-15 14:46:58,337][1652475] Updated weights for policy 0, policy_version 248769 (0.0013) [2024-06-15 14:47:00,738][1648984] Fps is (10 sec: 39322.0, 60 sec: 43690.7, 300 sec: 43543.3). Total num frames: 509607936. Throughput: 0: 10911.3. Samples: 127500800. 
Policy #0 lag: (min: 5.0, avg: 87.0, max: 261.0) [2024-06-15 14:47:00,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 14:47:01,005][1652475] Updated weights for policy 0, policy_version 248848 (0.0032) [2024-06-15 14:47:02,351][1652475] Updated weights for policy 0, policy_version 248900 (0.0014) [2024-06-15 14:47:03,636][1652475] Updated weights for policy 0, policy_version 248948 (0.0043) [2024-06-15 14:47:05,332][1652475] Updated weights for policy 0, policy_version 249021 (0.0011) [2024-06-15 14:47:05,738][1648984] Fps is (10 sec: 52437.7, 60 sec: 43690.8, 300 sec: 44431.2). Total num frames: 510001152. Throughput: 0: 10865.9. Samples: 127534080. Policy #0 lag: (min: 5.0, avg: 87.0, max: 261.0) [2024-06-15 14:47:05,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 14:47:10,738][1648984] Fps is (10 sec: 45874.9, 60 sec: 42599.0, 300 sec: 43542.6). Total num frames: 510066688. Throughput: 0: 11116.1. Samples: 127601664. Policy #0 lag: (min: 5.0, avg: 87.0, max: 261.0) [2024-06-15 14:47:10,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 14:47:11,317][1652475] Updated weights for policy 0, policy_version 249081 (0.0012) [2024-06-15 14:47:13,576][1652475] Updated weights for policy 0, policy_version 249136 (0.0012) [2024-06-15 14:47:15,622][1652475] Updated weights for policy 0, policy_version 249216 (0.0223) [2024-06-15 14:47:15,738][1648984] Fps is (10 sec: 39320.6, 60 sec: 43144.4, 300 sec: 44209.0). Total num frames: 510394368. Throughput: 0: 11002.2. Samples: 127656960. Policy #0 lag: (min: 5.0, avg: 87.0, max: 261.0) [2024-06-15 14:47:15,739][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 14:47:16,771][1652475] Updated weights for policy 0, policy_version 249273 (0.0013) [2024-06-15 14:47:20,738][1648984] Fps is (10 sec: 45875.1, 60 sec: 43690.9, 300 sec: 43542.6). Total num frames: 510525440. Throughput: 0: 10911.4. Samples: 127689216. 
Policy #0 lag: (min: 5.0, avg: 87.0, max: 261.0) [2024-06-15 14:47:20,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 14:47:23,465][1652475] Updated weights for policy 0, policy_version 249335 (0.0011) [2024-06-15 14:47:25,571][1652475] Updated weights for policy 0, policy_version 249381 (0.0012) [2024-06-15 14:47:25,738][1648984] Fps is (10 sec: 32768.9, 60 sec: 42598.4, 300 sec: 43764.7). Total num frames: 510722048. Throughput: 0: 11104.7. Samples: 127763456. Policy #0 lag: (min: 5.0, avg: 87.0, max: 261.0) [2024-06-15 14:47:25,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 14:47:27,313][1651340] Signal inference workers to stop experience collection... (12800 times) [2024-06-15 14:47:27,394][1652475] InferenceWorker_p0-w0: stopping experience collection (12800 times) [2024-06-15 14:47:27,396][1652475] Updated weights for policy 0, policy_version 249462 (0.0012) [2024-06-15 14:47:27,571][1651340] Signal inference workers to resume experience collection... (12800 times) [2024-06-15 14:47:27,573][1652475] InferenceWorker_p0-w0: resuming experience collection (12800 times) [2024-06-15 14:47:28,669][1652475] Updated weights for policy 0, policy_version 249520 (0.0014) [2024-06-15 14:47:30,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 43690.7, 300 sec: 43986.9). Total num frames: 511049728. Throughput: 0: 10899.9. Samples: 127821824. Policy #0 lag: (min: 5.0, avg: 87.0, max: 261.0) [2024-06-15 14:47:30,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 14:47:35,471][1652475] Updated weights for policy 0, policy_version 249596 (0.0031) [2024-06-15 14:47:35,738][1648984] Fps is (10 sec: 45875.1, 60 sec: 44783.1, 300 sec: 43542.5). Total num frames: 511180800. Throughput: 0: 11081.9. Samples: 127860736. 
Policy #0 lag: (min: 5.0, avg: 87.0, max: 261.0) [2024-06-15 14:47:35,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 14:47:37,703][1652475] Updated weights for policy 0, policy_version 249648 (0.0041) [2024-06-15 14:47:39,475][1652475] Updated weights for policy 0, policy_version 249719 (0.0014) [2024-06-15 14:47:40,738][1648984] Fps is (10 sec: 45875.0, 60 sec: 44236.9, 300 sec: 43986.9). Total num frames: 511508480. Throughput: 0: 11036.9. Samples: 127920128. Policy #0 lag: (min: 5.0, avg: 87.0, max: 261.0) [2024-06-15 14:47:40,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 14:47:41,225][1652475] Updated weights for policy 0, policy_version 249790 (0.0035) [2024-06-15 14:47:45,738][1648984] Fps is (10 sec: 39322.1, 60 sec: 43690.7, 300 sec: 43098.3). Total num frames: 511574016. Throughput: 0: 10843.0. Samples: 127988736. Policy #0 lag: (min: 5.0, avg: 87.0, max: 261.0) [2024-06-15 14:47:45,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 14:47:47,210][1652475] Updated weights for policy 0, policy_version 249840 (0.0049) [2024-06-15 14:47:49,932][1652475] Updated weights for policy 0, policy_version 249920 (0.0116) [2024-06-15 14:47:50,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 44782.9, 300 sec: 43764.7). Total num frames: 511901696. Throughput: 0: 10922.7. Samples: 128025600. Policy #0 lag: (min: 5.0, avg: 87.0, max: 261.0) [2024-06-15 14:47:50,739][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 14:47:51,782][1652475] Updated weights for policy 0, policy_version 250000 (0.0127) [2024-06-15 14:47:55,738][1648984] Fps is (10 sec: 52428.3, 60 sec: 43691.9, 300 sec: 43653.6). Total num frames: 512098304. Throughput: 0: 10604.1. Samples: 128078848. 
Policy #0 lag: (min: 5.0, avg: 87.0, max: 261.0) [2024-06-15 14:47:55,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 14:47:59,027][1652475] Updated weights for policy 0, policy_version 250080 (0.0014) [2024-06-15 14:48:00,738][1648984] Fps is (10 sec: 32768.1, 60 sec: 43690.6, 300 sec: 43431.5). Total num frames: 512229376. Throughput: 0: 11025.1. Samples: 128153088. Policy #0 lag: (min: 5.0, avg: 87.0, max: 261.0) [2024-06-15 14:48:00,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 14:48:02,140][1652475] Updated weights for policy 0, policy_version 250160 (0.0013) [2024-06-15 14:48:03,823][1652475] Updated weights for policy 0, policy_version 250224 (0.0014) [2024-06-15 14:48:05,655][1652475] Updated weights for policy 0, policy_version 250297 (0.0013) [2024-06-15 14:48:05,738][1648984] Fps is (10 sec: 49152.3, 60 sec: 43144.6, 300 sec: 43875.8). Total num frames: 512589824. Throughput: 0: 10899.9. Samples: 128179712. Policy #0 lag: (min: 5.0, avg: 87.0, max: 261.0) [2024-06-15 14:48:05,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 14:48:10,738][1648984] Fps is (10 sec: 42598.5, 60 sec: 43144.5, 300 sec: 43209.3). Total num frames: 512655360. Throughput: 0: 10797.5. Samples: 128249344. Policy #0 lag: (min: 12.0, avg: 89.8, max: 268.0) [2024-06-15 14:48:10,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 14:48:11,358][1652475] Updated weights for policy 0, policy_version 250368 (0.0013) [2024-06-15 14:48:13,426][1651340] Signal inference workers to stop experience collection... (12850 times) [2024-06-15 14:48:13,464][1652475] InferenceWorker_p0-w0: stopping experience collection (12850 times) [2024-06-15 14:48:13,687][1651340] Signal inference workers to resume experience collection... 
(12850 times) [2024-06-15 14:48:13,688][1652475] InferenceWorker_p0-w0: resuming experience collection (12850 times) [2024-06-15 14:48:15,172][1652475] Updated weights for policy 0, policy_version 250449 (0.0017) [2024-06-15 14:48:15,738][1648984] Fps is (10 sec: 36044.5, 60 sec: 42598.6, 300 sec: 43653.6). Total num frames: 512950272. Throughput: 0: 10865.8. Samples: 128310784. Policy #0 lag: (min: 12.0, avg: 89.8, max: 268.0) [2024-06-15 14:48:15,738][1648984] Avg episode reward: [(0, '-0.580')] [2024-06-15 14:48:16,868][1652475] Updated weights for policy 0, policy_version 250516 (0.0013) [2024-06-15 14:48:20,738][1648984] Fps is (10 sec: 49151.9, 60 sec: 43690.7, 300 sec: 43098.2). Total num frames: 513146880. Throughput: 0: 10570.0. Samples: 128336384. Policy #0 lag: (min: 12.0, avg: 89.8, max: 268.0) [2024-06-15 14:48:20,738][1648984] Avg episode reward: [(0, '-0.620')] [2024-06-15 14:48:22,304][1652475] Updated weights for policy 0, policy_version 250576 (0.0024) [2024-06-15 14:48:25,582][1652475] Updated weights for policy 0, policy_version 250640 (0.0011) [2024-06-15 14:48:25,738][1648984] Fps is (10 sec: 36044.7, 60 sec: 43144.5, 300 sec: 43210.3). Total num frames: 513310720. Throughput: 0: 10911.3. Samples: 128411136. Policy #0 lag: (min: 12.0, avg: 89.8, max: 268.0) [2024-06-15 14:48:25,739][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 14:48:27,372][1652475] Updated weights for policy 0, policy_version 250720 (0.0092) [2024-06-15 14:48:29,622][1652475] Updated weights for policy 0, policy_version 250806 (0.0116) [2024-06-15 14:48:30,738][1648984] Fps is (10 sec: 52427.8, 60 sec: 43690.5, 300 sec: 43542.5). Total num frames: 513671168. Throughput: 0: 10604.0. Samples: 128465920. 
Policy #0 lag: (min: 12.0, avg: 89.8, max: 268.0) [2024-06-15 14:48:30,739][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 14:48:35,240][1652475] Updated weights for policy 0, policy_version 250850 (0.0012) [2024-06-15 14:48:35,738][1648984] Fps is (10 sec: 49152.2, 60 sec: 43690.7, 300 sec: 43209.3). Total num frames: 513802240. Throughput: 0: 10649.6. Samples: 128504832. Policy #0 lag: (min: 12.0, avg: 89.8, max: 268.0) [2024-06-15 14:48:35,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 14:48:37,795][1652475] Updated weights for policy 0, policy_version 250921 (0.0054) [2024-06-15 14:48:39,039][1652475] Updated weights for policy 0, policy_version 250976 (0.0012) [2024-06-15 14:48:40,738][1648984] Fps is (10 sec: 42599.1, 60 sec: 43144.5, 300 sec: 43653.6). Total num frames: 514097152. Throughput: 0: 10865.8. Samples: 128567808. Policy #0 lag: (min: 12.0, avg: 89.8, max: 268.0) [2024-06-15 14:48:40,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 14:48:41,462][1652475] Updated weights for policy 0, policy_version 251064 (0.0013) [2024-06-15 14:48:45,740][1648984] Fps is (10 sec: 39321.7, 60 sec: 43690.6, 300 sec: 43098.2). Total num frames: 514195456. Throughput: 0: 10535.8. Samples: 128627200. Policy #0 lag: (min: 12.0, avg: 89.8, max: 268.0) [2024-06-15 14:48:45,740][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 14:48:48,854][1652475] Updated weights for policy 0, policy_version 251136 (0.0034) [2024-06-15 14:48:50,627][1652475] Updated weights for policy 0, policy_version 251194 (0.0014) [2024-06-15 14:48:50,738][1648984] Fps is (10 sec: 36043.3, 60 sec: 42598.1, 300 sec: 43542.5). Total num frames: 514457600. Throughput: 0: 10706.4. Samples: 128661504. 
Policy #0 lag: (min: 12.0, avg: 89.8, max: 268.0) [2024-06-15 14:48:50,739][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 14:48:52,306][1652475] Updated weights for policy 0, policy_version 251248 (0.0015) [2024-06-15 14:48:54,251][1652475] Updated weights for policy 0, policy_version 251312 (0.0030) [2024-06-15 14:48:55,738][1648984] Fps is (10 sec: 52428.0, 60 sec: 43690.6, 300 sec: 43320.4). Total num frames: 514719744. Throughput: 0: 10342.4. Samples: 128714752. Policy #0 lag: (min: 12.0, avg: 89.8, max: 268.0) [2024-06-15 14:48:55,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 14:48:55,749][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000251328_514719744.pth... [2024-06-15 14:48:55,825][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000246208_504233984.pth [2024-06-15 14:49:00,186][1651340] Signal inference workers to stop experience collection... (12900 times) [2024-06-15 14:49:00,237][1652475] InferenceWorker_p0-w0: stopping experience collection (12900 times) [2024-06-15 14:49:00,417][1651340] Signal inference workers to resume experience collection... (12900 times) [2024-06-15 14:49:00,417][1652475] InferenceWorker_p0-w0: resuming experience collection (12900 times) [2024-06-15 14:49:00,422][1652475] Updated weights for policy 0, policy_version 251376 (0.0015) [2024-06-15 14:49:00,738][1648984] Fps is (10 sec: 36046.1, 60 sec: 43144.5, 300 sec: 43320.5). Total num frames: 514818048. Throughput: 0: 10604.1. Samples: 128787968. Policy #0 lag: (min: 12.0, avg: 89.8, max: 268.0) [2024-06-15 14:49:00,739][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 14:49:02,114][1652475] Updated weights for policy 0, policy_version 251426 (0.0013) [2024-06-15 14:49:04,562][1652475] Updated weights for policy 0, policy_version 251472 (0.0011) [2024-06-15 14:49:05,738][1648984] Fps is (10 sec: 36045.6, 60 sec: 41506.1, 300 sec: 43320.4). 
Total num frames: 515080192. Throughput: 0: 10729.3. Samples: 128819200. Policy #0 lag: (min: 12.0, avg: 89.8, max: 268.0) [2024-06-15 14:49:05,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 14:49:06,706][1652475] Updated weights for policy 0, policy_version 251552 (0.0012) [2024-06-15 14:49:10,762][1648984] Fps is (10 sec: 42494.7, 60 sec: 43126.9, 300 sec: 43094.7). Total num frames: 515244032. Throughput: 0: 10370.9. Samples: 128878080. Policy #0 lag: (min: 12.0, avg: 89.8, max: 268.0) [2024-06-15 14:49:10,763][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 14:49:12,620][1652475] Updated weights for policy 0, policy_version 251618 (0.0126) [2024-06-15 14:49:13,951][1652475] Updated weights for policy 0, policy_version 251667 (0.0012) [2024-06-15 14:49:15,738][1648984] Fps is (10 sec: 42598.5, 60 sec: 42598.5, 300 sec: 43320.4). Total num frames: 515506176. Throughput: 0: 10683.8. Samples: 128946688. Policy #0 lag: (min: 12.0, avg: 89.8, max: 268.0) [2024-06-15 14:49:15,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 14:49:16,795][1652475] Updated weights for policy 0, policy_version 251714 (0.0108) [2024-06-15 14:49:18,983][1652475] Updated weights for policy 0, policy_version 251795 (0.0012) [2024-06-15 14:49:20,738][1648984] Fps is (10 sec: 52557.6, 60 sec: 43690.7, 300 sec: 43320.4). Total num frames: 515768320. Throughput: 0: 10513.1. Samples: 128977920. Policy #0 lag: (min: 12.0, avg: 89.8, max: 268.0) [2024-06-15 14:49:20,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 14:49:24,260][1652475] Updated weights for policy 0, policy_version 251856 (0.0015) [2024-06-15 14:49:25,738][1648984] Fps is (10 sec: 39321.0, 60 sec: 43144.5, 300 sec: 43098.2). Total num frames: 515899392. Throughput: 0: 10615.5. Samples: 129045504. 
Policy #0 lag: (min: 15.0, avg: 104.2, max: 271.0) [2024-06-15 14:49:25,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 14:49:25,917][1652475] Updated weights for policy 0, policy_version 251907 (0.0023) [2024-06-15 14:49:27,217][1652475] Updated weights for policy 0, policy_version 251964 (0.0011) [2024-06-15 14:49:30,738][1648984] Fps is (10 sec: 36044.8, 60 sec: 40960.2, 300 sec: 43320.4). Total num frames: 516128768. Throughput: 0: 10683.7. Samples: 129107968. Policy #0 lag: (min: 15.0, avg: 104.2, max: 271.0) [2024-06-15 14:49:30,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 14:49:31,351][1652475] Updated weights for policy 0, policy_version 252048 (0.0130) [2024-06-15 14:49:35,738][1648984] Fps is (10 sec: 39322.4, 60 sec: 41506.2, 300 sec: 42987.2). Total num frames: 516292608. Throughput: 0: 10536.0. Samples: 129135616. Policy #0 lag: (min: 15.0, avg: 104.2, max: 271.0) [2024-06-15 14:49:35,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 14:49:35,864][1652475] Updated weights for policy 0, policy_version 252112 (0.0012) [2024-06-15 14:49:37,065][1652475] Updated weights for policy 0, policy_version 252156 (0.0017) [2024-06-15 14:49:38,924][1652475] Updated weights for policy 0, policy_version 252208 (0.0013) [2024-06-15 14:49:40,738][1648984] Fps is (10 sec: 42598.2, 60 sec: 40960.0, 300 sec: 43098.2). Total num frames: 516554752. Throughput: 0: 10831.7. Samples: 129202176. Policy #0 lag: (min: 15.0, avg: 104.2, max: 271.0) [2024-06-15 14:49:40,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 14:49:41,709][1652475] Updated weights for policy 0, policy_version 252256 (0.0027) [2024-06-15 14:49:43,099][1652475] Updated weights for policy 0, policy_version 252320 (0.0125) [2024-06-15 14:49:45,738][1648984] Fps is (10 sec: 52428.1, 60 sec: 43690.7, 300 sec: 43098.2). Total num frames: 516816896. Throughput: 0: 10797.5. Samples: 129273856. 
Policy #0 lag: (min: 15.0, avg: 104.2, max: 271.0) [2024-06-15 14:49:45,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 14:49:47,057][1651340] Signal inference workers to stop experience collection... (12950 times) [2024-06-15 14:49:47,120][1652475] InferenceWorker_p0-w0: stopping experience collection (12950 times) [2024-06-15 14:49:47,333][1651340] Signal inference workers to resume experience collection... (12950 times) [2024-06-15 14:49:47,358][1652475] InferenceWorker_p0-w0: resuming experience collection (12950 times) [2024-06-15 14:49:47,720][1652475] Updated weights for policy 0, policy_version 252384 (0.0013) [2024-06-15 14:49:49,059][1652475] Updated weights for policy 0, policy_version 252420 (0.0015) [2024-06-15 14:49:50,327][1652475] Updated weights for policy 0, policy_version 252468 (0.0012) [2024-06-15 14:49:50,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 43691.0, 300 sec: 43542.6). Total num frames: 517079040. Throughput: 0: 10877.1. Samples: 129308672. Policy #0 lag: (min: 15.0, avg: 104.2, max: 271.0) [2024-06-15 14:49:50,739][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 14:49:53,338][1652475] Updated weights for policy 0, policy_version 252514 (0.0013) [2024-06-15 14:49:54,952][1652475] Updated weights for policy 0, policy_version 252592 (0.0012) [2024-06-15 14:49:55,738][1648984] Fps is (10 sec: 52428.4, 60 sec: 43690.7, 300 sec: 43986.9). Total num frames: 517341184. Throughput: 0: 11065.2. Samples: 129375744. Policy #0 lag: (min: 15.0, avg: 104.2, max: 271.0) [2024-06-15 14:49:55,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 14:50:00,023][1652475] Updated weights for policy 0, policy_version 252666 (0.0014) [2024-06-15 14:50:00,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 44236.9, 300 sec: 43209.3). Total num frames: 517472256. Throughput: 0: 11150.2. Samples: 129448448. 
Policy #0 lag: (min: 15.0, avg: 104.2, max: 271.0) [2024-06-15 14:50:00,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 14:50:01,682][1652475] Updated weights for policy 0, policy_version 252720 (0.0012) [2024-06-15 14:50:04,602][1652475] Updated weights for policy 0, policy_version 252775 (0.0012) [2024-06-15 14:50:05,738][1648984] Fps is (10 sec: 42598.8, 60 sec: 44782.9, 300 sec: 43875.8). Total num frames: 517767168. Throughput: 0: 11241.2. Samples: 129483776. Policy #0 lag: (min: 15.0, avg: 104.2, max: 271.0) [2024-06-15 14:50:05,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 14:50:06,366][1652475] Updated weights for policy 0, policy_version 252848 (0.0147) [2024-06-15 14:50:10,738][1648984] Fps is (10 sec: 42598.5, 60 sec: 44254.9, 300 sec: 43209.3). Total num frames: 517898240. Throughput: 0: 11184.4. Samples: 129548800. Policy #0 lag: (min: 15.0, avg: 104.2, max: 271.0) [2024-06-15 14:50:10,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 14:50:11,333][1652475] Updated weights for policy 0, policy_version 252912 (0.0013) [2024-06-15 14:50:15,689][1652475] Updated weights for policy 0, policy_version 252996 (0.0012) [2024-06-15 14:50:15,738][1648984] Fps is (10 sec: 36044.6, 60 sec: 43690.6, 300 sec: 43542.6). Total num frames: 518127616. Throughput: 0: 11377.8. Samples: 129619968. Policy #0 lag: (min: 15.0, avg: 104.2, max: 271.0) [2024-06-15 14:50:15,739][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 14:50:17,994][1652475] Updated weights for policy 0, policy_version 253090 (0.0116) [2024-06-15 14:50:20,738][1648984] Fps is (10 sec: 49152.0, 60 sec: 43690.7, 300 sec: 43653.7). Total num frames: 518389760. Throughput: 0: 11298.1. Samples: 129644032. 
Policy #0 lag: (min: 15.0, avg: 104.2, max: 271.0) [2024-06-15 14:50:20,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 14:50:23,193][1652475] Updated weights for policy 0, policy_version 253138 (0.0013) [2024-06-15 14:50:25,263][1652475] Updated weights for policy 0, policy_version 253216 (0.0014) [2024-06-15 14:50:25,738][1648984] Fps is (10 sec: 49152.7, 60 sec: 45329.2, 300 sec: 43431.5). Total num frames: 518619136. Throughput: 0: 11366.4. Samples: 129713664. Policy #0 lag: (min: 15.0, avg: 104.2, max: 271.0) [2024-06-15 14:50:25,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 14:50:28,265][1652475] Updated weights for policy 0, policy_version 253264 (0.0012) [2024-06-15 14:50:29,121][1651340] Signal inference workers to stop experience collection... (13000 times) [2024-06-15 14:50:29,197][1652475] InferenceWorker_p0-w0: stopping experience collection (13000 times) [2024-06-15 14:50:29,376][1651340] Signal inference workers to resume experience collection... (13000 times) [2024-06-15 14:50:29,378][1652475] InferenceWorker_p0-w0: resuming experience collection (13000 times) [2024-06-15 14:50:30,412][1652475] Updated weights for policy 0, policy_version 253360 (0.0014) [2024-06-15 14:50:30,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 46421.3, 300 sec: 43986.9). Total num frames: 518914048. Throughput: 0: 11047.8. Samples: 129771008. Policy #0 lag: (min: 15.0, avg: 104.2, max: 271.0) [2024-06-15 14:50:30,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 14:50:35,738][1648984] Fps is (10 sec: 36044.5, 60 sec: 44782.9, 300 sec: 43098.2). Total num frames: 518979584. Throughput: 0: 11082.0. Samples: 129807360. 
Policy #0 lag: (min: 15.0, avg: 97.2, max: 271.0) [2024-06-15 14:50:35,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 14:50:36,081][1652475] Updated weights for policy 0, policy_version 253424 (0.0014) [2024-06-15 14:50:37,749][1652475] Updated weights for policy 0, policy_version 253457 (0.0013) [2024-06-15 14:50:40,068][1652475] Updated weights for policy 0, policy_version 253520 (0.0013) [2024-06-15 14:50:40,738][1648984] Fps is (10 sec: 36044.9, 60 sec: 45329.1, 300 sec: 43653.6). Total num frames: 519274496. Throughput: 0: 11025.1. Samples: 129871872. Policy #0 lag: (min: 15.0, avg: 97.2, max: 271.0) [2024-06-15 14:50:40,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 14:50:41,643][1652475] Updated weights for policy 0, policy_version 253600 (0.0023) [2024-06-15 14:50:45,738][1648984] Fps is (10 sec: 45875.6, 60 sec: 43690.7, 300 sec: 43431.5). Total num frames: 519438336. Throughput: 0: 10808.9. Samples: 129934848. Policy #0 lag: (min: 15.0, avg: 97.2, max: 271.0) [2024-06-15 14:50:45,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 14:50:47,898][1652475] Updated weights for policy 0, policy_version 253664 (0.0147) [2024-06-15 14:50:50,738][1648984] Fps is (10 sec: 32767.8, 60 sec: 42052.3, 300 sec: 43209.4). Total num frames: 519602176. Throughput: 0: 10661.0. Samples: 129963520. Policy #0 lag: (min: 15.0, avg: 97.2, max: 271.0) [2024-06-15 14:50:50,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 14:50:51,587][1652475] Updated weights for policy 0, policy_version 253760 (0.0012) [2024-06-15 14:50:53,747][1652475] Updated weights for policy 0, policy_version 253840 (0.0102) [2024-06-15 14:50:55,738][1648984] Fps is (10 sec: 52426.5, 60 sec: 43690.5, 300 sec: 43986.8). Total num frames: 519962624. Throughput: 0: 10535.7. Samples: 130022912. 
Policy #0 lag: (min: 15.0, avg: 97.2, max: 271.0) [2024-06-15 14:50:55,739][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 14:50:55,757][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000253888_519962624.pth... [2024-06-15 14:50:55,840][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000248768_509476864.pth [2024-06-15 14:50:55,845][1651340] Saving a milestone train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/milestones/checkpoint_000253888_519962624.pth [2024-06-15 14:51:00,738][1648984] Fps is (10 sec: 36045.1, 60 sec: 41506.2, 300 sec: 42654.0). Total num frames: 519962624. Throughput: 0: 10501.7. Samples: 130092544. Policy #0 lag: (min: 15.0, avg: 97.2, max: 271.0) [2024-06-15 14:51:00,741][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 14:51:01,129][1652475] Updated weights for policy 0, policy_version 253904 (0.0013) [2024-06-15 14:51:03,441][1652475] Updated weights for policy 0, policy_version 254016 (0.0141) [2024-06-15 14:51:05,738][1648984] Fps is (10 sec: 32769.3, 60 sec: 42052.3, 300 sec: 43320.5). Total num frames: 520290304. Throughput: 0: 10501.7. Samples: 130116608. Policy #0 lag: (min: 15.0, avg: 97.2, max: 271.0) [2024-06-15 14:51:05,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 14:51:06,157][1652475] Updated weights for policy 0, policy_version 254080 (0.0016) [2024-06-15 14:51:10,738][1648984] Fps is (10 sec: 52428.3, 60 sec: 43144.5, 300 sec: 42987.2). Total num frames: 520486912. Throughput: 0: 10160.3. Samples: 130170880. Policy #0 lag: (min: 15.0, avg: 97.2, max: 271.0) [2024-06-15 14:51:10,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 14:51:13,516][1652475] Updated weights for policy 0, policy_version 254149 (0.0083) [2024-06-15 14:51:14,792][1652475] Updated weights for policy 0, policy_version 254208 (0.0106) [2024-06-15 14:51:15,739][1648984] Fps is (10 sec: 39315.7, 60 sec: 42597.4, 300 sec: 43320.3). 
Total num frames: 520683520. Throughput: 0: 10501.4. Samples: 130243584. Policy #0 lag: (min: 15.0, avg: 97.2, max: 271.0) [2024-06-15 14:51:15,740][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 14:51:18,876][1651340] Signal inference workers to stop experience collection... (13050 times) [2024-06-15 14:51:18,954][1652475] InferenceWorker_p0-w0: stopping experience collection (13050 times) [2024-06-15 14:51:19,138][1651340] Signal inference workers to resume experience collection... (13050 times) [2024-06-15 14:51:19,139][1652475] InferenceWorker_p0-w0: resuming experience collection (13050 times) [2024-06-15 14:51:19,141][1652475] Updated weights for policy 0, policy_version 254288 (0.0015) [2024-06-15 14:51:20,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 41506.1, 300 sec: 43098.3). Total num frames: 520880128. Throughput: 0: 10387.9. Samples: 130274816. Policy #0 lag: (min: 15.0, avg: 97.2, max: 271.0) [2024-06-15 14:51:20,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 14:51:20,829][1652475] Updated weights for policy 0, policy_version 254352 (0.0013) [2024-06-15 14:51:25,738][1648984] Fps is (10 sec: 39327.2, 60 sec: 40959.9, 300 sec: 42876.1). Total num frames: 521076736. Throughput: 0: 10387.9. Samples: 130339328. Policy #0 lag: (min: 15.0, avg: 97.2, max: 271.0) [2024-06-15 14:51:25,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 14:51:25,750][1652475] Updated weights for policy 0, policy_version 254432 (0.0100) [2024-06-15 14:51:27,580][1652475] Updated weights for policy 0, policy_version 254521 (0.0013) [2024-06-15 14:51:30,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 39321.7, 300 sec: 43320.5). Total num frames: 521273344. Throughput: 0: 10456.2. Samples: 130405376. 
Policy #0 lag: (min: 15.0, avg: 97.2, max: 271.0) [2024-06-15 14:51:30,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 14:51:32,363][1652475] Updated weights for policy 0, policy_version 254578 (0.0013) [2024-06-15 14:51:33,870][1652475] Updated weights for policy 0, policy_version 254650 (0.0012) [2024-06-15 14:51:35,738][1648984] Fps is (10 sec: 45875.3, 60 sec: 42598.4, 300 sec: 42987.2). Total num frames: 521535488. Throughput: 0: 10433.4. Samples: 130433024. Policy #0 lag: (min: 15.0, avg: 97.2, max: 271.0) [2024-06-15 14:51:35,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 14:51:37,933][1652475] Updated weights for policy 0, policy_version 254688 (0.0012) [2024-06-15 14:51:40,161][1652475] Updated weights for policy 0, policy_version 254778 (0.0012) [2024-06-15 14:51:40,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 42052.3, 300 sec: 43542.6). Total num frames: 521797632. Throughput: 0: 10615.6. Samples: 130500608. Policy #0 lag: (min: 15.0, avg: 97.2, max: 271.0) [2024-06-15 14:51:40,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 14:51:44,955][1652475] Updated weights for policy 0, policy_version 254848 (0.0013) [2024-06-15 14:51:45,738][1648984] Fps is (10 sec: 45875.2, 60 sec: 42598.4, 300 sec: 43320.4). Total num frames: 521994240. Throughput: 0: 10410.7. Samples: 130561024. Policy #0 lag: (min: 15.0, avg: 97.2, max: 271.0) [2024-06-15 14:51:45,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 14:51:49,779][1652475] Updated weights for policy 0, policy_version 254917 (0.0012) [2024-06-15 14:51:50,738][1648984] Fps is (10 sec: 32768.1, 60 sec: 42052.3, 300 sec: 42876.4). Total num frames: 522125312. Throughput: 0: 10615.5. Samples: 130594304. 
Policy #0 lag: (min: 95.0, avg: 201.0, max: 367.0) [2024-06-15 14:51:50,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 14:51:51,045][1652475] Updated weights for policy 0, policy_version 254964 (0.0013) [2024-06-15 14:51:52,685][1652475] Updated weights for policy 0, policy_version 255034 (0.0013) [2024-06-15 14:51:55,739][1648984] Fps is (10 sec: 32767.8, 60 sec: 39321.8, 300 sec: 43098.2). Total num frames: 522321920. Throughput: 0: 10854.4. Samples: 130659328. Policy #0 lag: (min: 95.0, avg: 201.0, max: 367.0) [2024-06-15 14:51:55,740][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 14:51:56,765][1652475] Updated weights for policy 0, policy_version 255080 (0.0013) [2024-06-15 14:51:58,166][1652475] Updated weights for policy 0, policy_version 255137 (0.0013) [2024-06-15 14:52:00,738][1648984] Fps is (10 sec: 45875.1, 60 sec: 43690.6, 300 sec: 42654.0). Total num frames: 522584064. Throughput: 0: 10832.0. Samples: 130731008. Policy #0 lag: (min: 95.0, avg: 201.0, max: 367.0) [2024-06-15 14:52:00,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 14:52:01,599][1651340] Signal inference workers to stop experience collection... (13100 times) [2024-06-15 14:52:01,642][1652475] InferenceWorker_p0-w0: stopping experience collection (13100 times) [2024-06-15 14:52:01,663][1652475] Updated weights for policy 0, policy_version 255188 (0.0020) [2024-06-15 14:52:01,869][1651340] Signal inference workers to resume experience collection... (13100 times) [2024-06-15 14:52:01,870][1652475] InferenceWorker_p0-w0: resuming experience collection (13100 times) [2024-06-15 14:52:03,695][1652475] Updated weights for policy 0, policy_version 255265 (0.0129) [2024-06-15 14:52:04,389][1652475] Updated weights for policy 0, policy_version 255295 (0.0011) [2024-06-15 14:52:05,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 42598.3, 300 sec: 43320.4). Total num frames: 522846208. Throughput: 0: 10626.8. Samples: 130753024. 
Policy #0 lag: (min: 95.0, avg: 201.0, max: 367.0) [2024-06-15 14:52:05,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 14:52:09,297][1652475] Updated weights for policy 0, policy_version 255376 (0.0013) [2024-06-15 14:52:10,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 43690.7, 300 sec: 43098.3). Total num frames: 523108352. Throughput: 0: 10808.9. Samples: 130825728. Policy #0 lag: (min: 95.0, avg: 201.0, max: 367.0) [2024-06-15 14:52:10,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 14:52:14,155][1652475] Updated weights for policy 0, policy_version 255456 (0.0191) [2024-06-15 14:52:15,738][1648984] Fps is (10 sec: 45875.5, 60 sec: 43691.7, 300 sec: 43320.4). Total num frames: 523304960. Throughput: 0: 10706.5. Samples: 130887168. Policy #0 lag: (min: 95.0, avg: 201.0, max: 367.0) [2024-06-15 14:52:15,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 14:52:15,961][1652475] Updated weights for policy 0, policy_version 255536 (0.0015) [2024-06-15 14:52:20,594][1652475] Updated weights for policy 0, policy_version 255584 (0.0101) [2024-06-15 14:52:20,738][1648984] Fps is (10 sec: 32767.9, 60 sec: 42598.4, 300 sec: 43098.3). Total num frames: 523436032. Throughput: 0: 10911.3. Samples: 130924032. Policy #0 lag: (min: 95.0, avg: 201.0, max: 367.0) [2024-06-15 14:52:20,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 14:52:22,543][1652475] Updated weights for policy 0, policy_version 255675 (0.0022) [2024-06-15 14:52:25,738][1648984] Fps is (10 sec: 36044.7, 60 sec: 43144.5, 300 sec: 42765.0). Total num frames: 523665408. Throughput: 0: 10831.6. Samples: 130988032. 
Policy #0 lag: (min: 95.0, avg: 201.0, max: 367.0) [2024-06-15 14:52:25,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 14:52:26,224][1652475] Updated weights for policy 0, policy_version 255717 (0.0017) [2024-06-15 14:52:27,810][1652475] Updated weights for policy 0, policy_version 255792 (0.0012) [2024-06-15 14:52:30,738][1648984] Fps is (10 sec: 45875.3, 60 sec: 43690.6, 300 sec: 43098.3). Total num frames: 523894784. Throughput: 0: 11138.8. Samples: 131062272. Policy #0 lag: (min: 95.0, avg: 201.0, max: 367.0) [2024-06-15 14:52:30,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 14:52:31,577][1652475] Updated weights for policy 0, policy_version 255824 (0.0015) [2024-06-15 14:52:33,309][1652475] Updated weights for policy 0, policy_version 255905 (0.0014) [2024-06-15 14:52:35,744][1648984] Fps is (10 sec: 49151.7, 60 sec: 43690.6, 300 sec: 42876.1). Total num frames: 524156928. Throughput: 0: 11036.4. Samples: 131090944. Policy #0 lag: (min: 95.0, avg: 201.0, max: 367.0) [2024-06-15 14:52:35,753][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 14:52:37,299][1652475] Updated weights for policy 0, policy_version 255968 (0.0013) [2024-06-15 14:52:39,428][1652475] Updated weights for policy 0, policy_version 256064 (0.0012) [2024-06-15 14:52:40,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 43690.6, 300 sec: 43542.6). Total num frames: 524419072. Throughput: 0: 10956.8. Samples: 131152384. Policy #0 lag: (min: 95.0, avg: 201.0, max: 367.0) [2024-06-15 14:52:40,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 14:52:43,856][1651340] Signal inference workers to stop experience collection... (13150 times) [2024-06-15 14:52:43,919][1652475] InferenceWorker_p0-w0: stopping experience collection (13150 times) [2024-06-15 14:52:44,018][1651340] Signal inference workers to resume experience collection... 
(13150 times) [2024-06-15 14:52:44,062][1652475] InferenceWorker_p0-w0: resuming experience collection (13150 times) [2024-06-15 14:52:44,505][1652475] Updated weights for policy 0, policy_version 256132 (0.0123) [2024-06-15 14:52:45,566][1652475] Updated weights for policy 0, policy_version 256184 (0.0014) [2024-06-15 14:52:45,738][1648984] Fps is (10 sec: 52429.3, 60 sec: 44782.9, 300 sec: 43320.4). Total num frames: 524681216. Throughput: 0: 10990.9. Samples: 131225600. Policy #0 lag: (min: 95.0, avg: 201.0, max: 367.0) [2024-06-15 14:52:45,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 14:52:49,005][1652475] Updated weights for policy 0, policy_version 256224 (0.0012) [2024-06-15 14:52:50,738][1648984] Fps is (10 sec: 45875.5, 60 sec: 45875.2, 300 sec: 43320.4). Total num frames: 524877824. Throughput: 0: 11309.5. Samples: 131261952. Policy #0 lag: (min: 95.0, avg: 201.0, max: 367.0) [2024-06-15 14:52:50,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 14:52:51,038][1652475] Updated weights for policy 0, policy_version 256310 (0.0020) [2024-06-15 14:52:55,198][1652475] Updated weights for policy 0, policy_version 256352 (0.0015) [2024-06-15 14:52:55,738][1648984] Fps is (10 sec: 36044.6, 60 sec: 45329.1, 300 sec: 43431.5). Total num frames: 525041664. Throughput: 0: 11264.0. Samples: 131332608. Policy #0 lag: (min: 95.0, avg: 201.0, max: 367.0) [2024-06-15 14:52:55,739][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 14:52:56,045][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000256384_525074432.pth... 
[2024-06-15 14:52:56,162][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000251328_514719744.pth [2024-06-15 14:52:57,108][1652475] Updated weights for policy 0, policy_version 256432 (0.0097) [2024-06-15 14:52:59,490][1652475] Updated weights for policy 0, policy_version 256464 (0.0013) [2024-06-15 14:53:00,333][1652475] Updated weights for policy 0, policy_version 256512 (0.0013) [2024-06-15 14:53:00,738][1648984] Fps is (10 sec: 45875.0, 60 sec: 45875.2, 300 sec: 43209.3). Total num frames: 525336576. Throughput: 0: 11286.7. Samples: 131395072. Policy #0 lag: (min: 95.0, avg: 174.3, max: 351.0) [2024-06-15 14:53:00,738][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 14:53:03,108][1652475] Updated weights for policy 0, policy_version 256568 (0.0013) [2024-06-15 14:53:05,738][1648984] Fps is (10 sec: 45875.4, 60 sec: 44236.8, 300 sec: 43542.6). Total num frames: 525500416. Throughput: 0: 11286.8. Samples: 131431936. Policy #0 lag: (min: 95.0, avg: 174.3, max: 351.0) [2024-06-15 14:53:05,738][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 14:53:06,048][1652475] Updated weights for policy 0, policy_version 256610 (0.0018) [2024-06-15 14:53:07,724][1652475] Updated weights for policy 0, policy_version 256696 (0.0098) [2024-06-15 14:53:10,742][1648984] Fps is (10 sec: 39303.4, 60 sec: 43687.3, 300 sec: 43319.7). Total num frames: 525729792. Throughput: 0: 11479.0. Samples: 131504640. Policy #0 lag: (min: 95.0, avg: 174.3, max: 351.0) [2024-06-15 14:53:10,743][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 14:53:11,846][1652475] Updated weights for policy 0, policy_version 256765 (0.0022) [2024-06-15 14:53:13,478][1652475] Updated weights for policy 0, policy_version 256802 (0.0099) [2024-06-15 14:53:13,926][1652475] Updated weights for policy 0, policy_version 256826 (0.0011) [2024-06-15 14:53:15,738][1648984] Fps is (10 sec: 49152.1, 60 sec: 44782.9, 300 sec: 43542.6). Total num frames: 525991936. 
Throughput: 0: 11355.0. Samples: 131573248. Policy #0 lag: (min: 95.0, avg: 174.3, max: 351.0) [2024-06-15 14:53:15,738][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 14:53:17,807][1652475] Updated weights for policy 0, policy_version 256896 (0.0015) [2024-06-15 14:53:19,976][1652475] Updated weights for policy 0, policy_version 256959 (0.0013) [2024-06-15 14:53:20,738][1648984] Fps is (10 sec: 52453.0, 60 sec: 46967.5, 300 sec: 43875.8). Total num frames: 526254080. Throughput: 0: 11423.3. Samples: 131604992. Policy #0 lag: (min: 95.0, avg: 174.3, max: 351.0) [2024-06-15 14:53:20,738][1648984] Avg episode reward: [(0, '-0.250')] [2024-06-15 14:53:24,287][1652475] Updated weights for policy 0, policy_version 257011 (0.0015) [2024-06-15 14:53:25,591][1651340] Signal inference workers to stop experience collection... (13200 times) [2024-06-15 14:53:25,671][1652475] InferenceWorker_p0-w0: stopping experience collection (13200 times) [2024-06-15 14:53:25,739][1648984] Fps is (10 sec: 45869.6, 60 sec: 46420.4, 300 sec: 43320.3). Total num frames: 526450688. Throughput: 0: 11377.5. Samples: 131664384. Policy #0 lag: (min: 95.0, avg: 174.3, max: 351.0) [2024-06-15 14:53:25,740][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 14:53:25,860][1651340] Signal inference workers to resume experience collection... (13200 times) [2024-06-15 14:53:25,860][1652475] InferenceWorker_p0-w0: resuming experience collection (13200 times) [2024-06-15 14:53:26,068][1652475] Updated weights for policy 0, policy_version 257081 (0.0013) [2024-06-15 14:53:30,738][1648984] Fps is (10 sec: 26214.3, 60 sec: 43690.6, 300 sec: 43098.2). Total num frames: 526516224. Throughput: 0: 11207.1. Samples: 131729920. 
Policy #0 lag: (min: 95.0, avg: 174.3, max: 351.0) [2024-06-15 14:53:30,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 14:53:32,520][1652475] Updated weights for policy 0, policy_version 257152 (0.0014) [2024-06-15 14:53:35,507][1652475] Updated weights for policy 0, policy_version 257232 (0.0012) [2024-06-15 14:53:35,738][1648984] Fps is (10 sec: 36049.1, 60 sec: 44236.8, 300 sec: 43098.3). Total num frames: 526811136. Throughput: 0: 10979.5. Samples: 131756032. Policy #0 lag: (min: 95.0, avg: 174.3, max: 351.0) [2024-06-15 14:53:35,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 14:53:38,093][1652475] Updated weights for policy 0, policy_version 257328 (0.0119) [2024-06-15 14:53:40,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 43690.6, 300 sec: 43542.6). Total num frames: 527040512. Throughput: 0: 10626.8. Samples: 131810816. Policy #0 lag: (min: 95.0, avg: 174.3, max: 351.0) [2024-06-15 14:53:40,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 14:53:43,918][1652475] Updated weights for policy 0, policy_version 257360 (0.0042) [2024-06-15 14:53:45,660][1652475] Updated weights for policy 0, policy_version 257426 (0.0012) [2024-06-15 14:53:45,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 42052.2, 300 sec: 43209.4). Total num frames: 527204352. Throughput: 0: 10865.8. Samples: 131884032. Policy #0 lag: (min: 95.0, avg: 174.3, max: 351.0) [2024-06-15 14:53:45,739][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 14:53:46,548][1652475] Updated weights for policy 0, policy_version 257465 (0.0013) [2024-06-15 14:53:49,588][1652475] Updated weights for policy 0, policy_version 257552 (0.0013) [2024-06-15 14:53:50,739][1648984] Fps is (10 sec: 52424.0, 60 sec: 44782.2, 300 sec: 43542.4). Total num frames: 527564800. Throughput: 0: 10706.3. Samples: 131913728. 
Policy #0 lag: (min: 95.0, avg: 174.3, max: 351.0) [2024-06-15 14:53:50,739][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 14:53:55,738][1648984] Fps is (10 sec: 36045.4, 60 sec: 42052.4, 300 sec: 43209.4). Total num frames: 527564800. Throughput: 0: 10548.3. Samples: 131979264. Policy #0 lag: (min: 95.0, avg: 174.3, max: 351.0) [2024-06-15 14:53:55,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 14:53:56,107][1652475] Updated weights for policy 0, policy_version 257616 (0.0012) [2024-06-15 14:53:57,805][1652475] Updated weights for policy 0, policy_version 257680 (0.0014) [2024-06-15 14:54:00,438][1652475] Updated weights for policy 0, policy_version 257744 (0.0015) [2024-06-15 14:54:00,738][1648984] Fps is (10 sec: 29494.2, 60 sec: 42052.3, 300 sec: 43320.4). Total num frames: 527859712. Throughput: 0: 10456.2. Samples: 132043776. Policy #0 lag: (min: 95.0, avg: 174.3, max: 351.0) [2024-06-15 14:54:00,797][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 14:54:03,033][1652475] Updated weights for policy 0, policy_version 257851 (0.0015) [2024-06-15 14:54:05,738][1648984] Fps is (10 sec: 52428.2, 60 sec: 43144.5, 300 sec: 43546.2). Total num frames: 528089088. Throughput: 0: 10205.9. Samples: 132064256. Policy #0 lag: (min: 95.0, avg: 174.3, max: 351.0) [2024-06-15 14:54:05,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 14:54:08,755][1652475] Updated weights for policy 0, policy_version 257904 (0.0013) [2024-06-15 14:54:10,738][1648984] Fps is (10 sec: 42596.6, 60 sec: 42601.4, 300 sec: 43320.3). Total num frames: 528285696. Throughput: 0: 10627.0. Samples: 132142592. 
Policy #0 lag: (min: 9.0, avg: 83.9, max: 265.0) [2024-06-15 14:54:10,739][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 14:54:10,744][1652475] Updated weights for policy 0, policy_version 257956 (0.0015) [2024-06-15 14:54:12,734][1652475] Updated weights for policy 0, policy_version 258021 (0.0014) [2024-06-15 14:54:13,047][1651340] Signal inference workers to stop experience collection... (13250 times) [2024-06-15 14:54:13,086][1652475] InferenceWorker_p0-w0: stopping experience collection (13250 times) [2024-06-15 14:54:13,329][1651340] Signal inference workers to resume experience collection... (13250 times) [2024-06-15 14:54:13,330][1652475] InferenceWorker_p0-w0: resuming experience collection (13250 times) [2024-06-15 14:54:14,891][1652475] Updated weights for policy 0, policy_version 258104 (0.0135) [2024-06-15 14:54:15,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 528613376. Throughput: 0: 10365.2. Samples: 132196352. Policy #0 lag: (min: 9.0, avg: 83.9, max: 265.0) [2024-06-15 14:54:15,740][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 14:54:20,738][1648984] Fps is (10 sec: 36046.3, 60 sec: 39867.8, 300 sec: 43209.4). Total num frames: 528646144. Throughput: 0: 10752.0. Samples: 132239872. Policy #0 lag: (min: 9.0, avg: 83.9, max: 265.0) [2024-06-15 14:54:20,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 14:54:21,120][1652475] Updated weights for policy 0, policy_version 258160 (0.0012) [2024-06-15 14:54:23,355][1652475] Updated weights for policy 0, policy_version 258229 (0.0029) [2024-06-15 14:54:25,159][1652475] Updated weights for policy 0, policy_version 258289 (0.0013) [2024-06-15 14:54:25,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 42599.2, 300 sec: 43653.6). Total num frames: 529006592. Throughput: 0: 10888.6. Samples: 132300800. 
Policy #0 lag: (min: 9.0, avg: 83.9, max: 265.0) [2024-06-15 14:54:25,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 14:54:26,906][1652475] Updated weights for policy 0, policy_version 258363 (0.0012) [2024-06-15 14:54:30,738][1648984] Fps is (10 sec: 49152.0, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 529137664. Throughput: 0: 10808.9. Samples: 132370432. Policy #0 lag: (min: 9.0, avg: 83.9, max: 265.0) [2024-06-15 14:54:30,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 14:54:33,960][1652475] Updated weights for policy 0, policy_version 258433 (0.0015) [2024-06-15 14:54:35,500][1652475] Updated weights for policy 0, policy_version 258499 (0.0119) [2024-06-15 14:54:35,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 43144.6, 300 sec: 43542.6). Total num frames: 529399808. Throughput: 0: 11002.6. Samples: 132408832. Policy #0 lag: (min: 9.0, avg: 83.9, max: 265.0) [2024-06-15 14:54:35,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 14:54:36,960][1652475] Updated weights for policy 0, policy_version 258547 (0.0014) [2024-06-15 14:54:38,510][1652475] Updated weights for policy 0, policy_version 258615 (0.0014) [2024-06-15 14:54:40,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 43690.8, 300 sec: 43542.6). Total num frames: 529661952. Throughput: 0: 10797.5. Samples: 132465152. Policy #0 lag: (min: 9.0, avg: 83.9, max: 265.0) [2024-06-15 14:54:40,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 14:54:45,382][1652475] Updated weights for policy 0, policy_version 258683 (0.0015) [2024-06-15 14:54:45,738][1648984] Fps is (10 sec: 39321.0, 60 sec: 43144.5, 300 sec: 43098.2). Total num frames: 529793024. Throughput: 0: 11070.5. Samples: 132541952. 
Policy #0 lag: (min: 9.0, avg: 83.9, max: 265.0) [2024-06-15 14:54:45,739][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 14:54:47,201][1652475] Updated weights for policy 0, policy_version 258737 (0.0062) [2024-06-15 14:54:49,734][1652475] Updated weights for policy 0, policy_version 258848 (0.0117) [2024-06-15 14:54:50,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 43691.4, 300 sec: 43542.6). Total num frames: 530186240. Throughput: 0: 11059.2. Samples: 132561920. Policy #0 lag: (min: 9.0, avg: 83.9, max: 265.0) [2024-06-15 14:54:50,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 14:54:55,738][1648984] Fps is (10 sec: 39320.5, 60 sec: 43690.3, 300 sec: 43098.2). Total num frames: 530186240. Throughput: 0: 10990.9. Samples: 132637184. Policy #0 lag: (min: 9.0, avg: 83.9, max: 265.0) [2024-06-15 14:54:55,739][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 14:54:55,748][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000258880_530186240.pth... [2024-06-15 14:54:55,791][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000253888_519962624.pth [2024-06-15 14:54:56,470][1652475] Updated weights for policy 0, policy_version 258884 (0.0037) [2024-06-15 14:54:57,867][1651340] Signal inference workers to stop experience collection... (13300 times) [2024-06-15 14:54:57,906][1652475] InferenceWorker_p0-w0: stopping experience collection (13300 times) [2024-06-15 14:54:57,973][1651340] Signal inference workers to resume experience collection... (13300 times) [2024-06-15 14:54:57,973][1652475] InferenceWorker_p0-w0: resuming experience collection (13300 times) [2024-06-15 14:54:58,173][1652475] Updated weights for policy 0, policy_version 258949 (0.0015) [2024-06-15 14:55:00,738][1648984] Fps is (10 sec: 32767.9, 60 sec: 44236.8, 300 sec: 43209.3). Total num frames: 530513920. Throughput: 0: 11059.2. Samples: 132694016. 
Policy #0 lag: (min: 9.0, avg: 83.9, max: 265.0) [2024-06-15 14:55:00,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 14:55:01,128][1652475] Updated weights for policy 0, policy_version 259062 (0.0101) [2024-06-15 14:55:02,445][1652475] Updated weights for policy 0, policy_version 259129 (0.0011) [2024-06-15 14:55:05,738][1648984] Fps is (10 sec: 52431.1, 60 sec: 43690.7, 300 sec: 43431.5). Total num frames: 530710528. Throughput: 0: 10706.5. Samples: 132721664. Policy #0 lag: (min: 9.0, avg: 83.9, max: 265.0) [2024-06-15 14:55:05,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 14:55:10,307][1652475] Updated weights for policy 0, policy_version 259185 (0.0016) [2024-06-15 14:55:10,738][1648984] Fps is (10 sec: 32767.8, 60 sec: 42598.6, 300 sec: 43098.2). Total num frames: 530841600. Throughput: 0: 11013.7. Samples: 132796416. Policy #0 lag: (min: 9.0, avg: 83.9, max: 265.0) [2024-06-15 14:55:10,738][1648984] Avg episode reward: [(0, '-0.590')] [2024-06-15 14:55:12,071][1652475] Updated weights for policy 0, policy_version 259267 (0.0013) [2024-06-15 14:55:13,121][1652475] Updated weights for policy 0, policy_version 259326 (0.0015) [2024-06-15 14:55:14,860][1652475] Updated weights for policy 0, policy_version 259362 (0.0019) [2024-06-15 14:55:15,738][1648984] Fps is (10 sec: 52428.3, 60 sec: 43690.6, 300 sec: 43542.6). Total num frames: 531234816. Throughput: 0: 10729.2. Samples: 132853248. Policy #0 lag: (min: 9.0, avg: 83.9, max: 265.0) [2024-06-15 14:55:15,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 14:55:20,738][1648984] Fps is (10 sec: 39322.0, 60 sec: 43144.5, 300 sec: 42765.0). Total num frames: 531234816. Throughput: 0: 10740.6. Samples: 132892160. 
Policy #0 lag: (min: 9.0, avg: 83.9, max: 265.0) [2024-06-15 14:55:20,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 14:55:21,427][1652475] Updated weights for policy 0, policy_version 259428 (0.0013) [2024-06-15 14:55:22,581][1652475] Updated weights for policy 0, policy_version 259479 (0.0011) [2024-06-15 14:55:24,333][1652475] Updated weights for policy 0, policy_version 259568 (0.0125) [2024-06-15 14:55:25,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 43690.7, 300 sec: 43098.3). Total num frames: 531628032. Throughput: 0: 10797.5. Samples: 132951040. Policy #0 lag: (min: 15.0, avg: 79.3, max: 271.0) [2024-06-15 14:55:25,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 14:55:27,151][1652475] Updated weights for policy 0, policy_version 259618 (0.0014) [2024-06-15 14:55:30,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 43690.7, 300 sec: 43320.4). Total num frames: 531759104. Throughput: 0: 10706.5. Samples: 133023744. Policy #0 lag: (min: 15.0, avg: 79.3, max: 271.0) [2024-06-15 14:55:30,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 14:55:34,260][1652475] Updated weights for policy 0, policy_version 259701 (0.0016) [2024-06-15 14:55:35,750][1648984] Fps is (10 sec: 36000.3, 60 sec: 43135.6, 300 sec: 43096.4). Total num frames: 531988480. Throughput: 0: 11033.4. Samples: 133058560. Policy #0 lag: (min: 15.0, avg: 79.3, max: 271.0) [2024-06-15 14:55:35,751][1648984] Avg episode reward: [(0, '-0.550')] [2024-06-15 14:55:36,191][1652475] Updated weights for policy 0, policy_version 259780 (0.0122) [2024-06-15 14:55:37,058][1651340] Signal inference workers to stop experience collection... (13350 times) [2024-06-15 14:55:37,108][1652475] InferenceWorker_p0-w0: stopping experience collection (13350 times) [2024-06-15 14:55:37,350][1651340] Signal inference workers to resume experience collection... 
(13350 times) [2024-06-15 14:55:37,351][1652475] InferenceWorker_p0-w0: resuming experience collection (13350 times) [2024-06-15 14:55:38,107][1652475] Updated weights for policy 0, policy_version 259841 (0.0012) [2024-06-15 14:55:39,605][1652475] Updated weights for policy 0, policy_version 259904 (0.0014) [2024-06-15 14:55:40,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 43690.6, 300 sec: 43542.6). Total num frames: 532283392. Throughput: 0: 10365.2. Samples: 133103616. Policy #0 lag: (min: 15.0, avg: 79.3, max: 271.0) [2024-06-15 14:55:40,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 14:55:45,738][1648984] Fps is (10 sec: 29526.3, 60 sec: 41505.9, 300 sec: 42987.1). Total num frames: 532283392. Throughput: 0: 10672.2. Samples: 133174272. Policy #0 lag: (min: 15.0, avg: 79.3, max: 271.0) [2024-06-15 14:55:45,739][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 14:55:49,015][1652475] Updated weights for policy 0, policy_version 260000 (0.0014) [2024-06-15 14:55:50,738][1648984] Fps is (10 sec: 32767.4, 60 sec: 40413.7, 300 sec: 42876.1). Total num frames: 532611072. Throughput: 0: 10752.0. Samples: 133205504. Policy #0 lag: (min: 15.0, avg: 79.3, max: 271.0) [2024-06-15 14:55:50,739][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 14:55:51,377][1652475] Updated weights for policy 0, policy_version 260096 (0.0015) [2024-06-15 14:55:52,889][1652475] Updated weights for policy 0, policy_version 260159 (0.0012) [2024-06-15 14:55:55,738][1648984] Fps is (10 sec: 52431.2, 60 sec: 43691.0, 300 sec: 43542.6). Total num frames: 532807680. Throughput: 0: 10194.5. Samples: 133255168. Policy #0 lag: (min: 15.0, avg: 79.3, max: 271.0) [2024-06-15 14:55:55,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 14:56:00,738][1648984] Fps is (10 sec: 22938.0, 60 sec: 38775.5, 300 sec: 42542.9). Total num frames: 532840448. Throughput: 0: 10592.7. Samples: 133329920. 
Policy #0 lag: (min: 15.0, avg: 79.3, max: 271.0) [2024-06-15 14:56:00,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 14:56:02,349][1652475] Updated weights for policy 0, policy_version 260240 (0.0013) [2024-06-15 14:56:04,354][1652475] Updated weights for policy 0, policy_version 260323 (0.0013) [2024-06-15 14:56:05,738][1648984] Fps is (10 sec: 45875.1, 60 sec: 42598.4, 300 sec: 43320.4). Total num frames: 533266432. Throughput: 0: 10137.6. Samples: 133348352. Policy #0 lag: (min: 15.0, avg: 79.3, max: 271.0) [2024-06-15 14:56:05,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 14:56:06,254][1652475] Updated weights for policy 0, policy_version 260403 (0.0013) [2024-06-15 14:56:10,738][1648984] Fps is (10 sec: 49151.6, 60 sec: 41506.1, 300 sec: 42876.3). Total num frames: 533331968. Throughput: 0: 10194.5. Samples: 133409792. Policy #0 lag: (min: 15.0, avg: 79.3, max: 271.0) [2024-06-15 14:56:10,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 14:56:13,595][1652475] Updated weights for policy 0, policy_version 260448 (0.0124) [2024-06-15 14:56:15,738][1648984] Fps is (10 sec: 32768.0, 60 sec: 39321.6, 300 sec: 43098.2). Total num frames: 533594112. Throughput: 0: 9966.9. Samples: 133472256. Policy #0 lag: (min: 15.0, avg: 79.3, max: 271.0) [2024-06-15 14:56:15,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 14:56:16,596][1652475] Updated weights for policy 0, policy_version 260576 (0.0093) [2024-06-15 14:56:18,437][1652475] Updated weights for policy 0, policy_version 260642 (0.0083) [2024-06-15 14:56:20,738][1648984] Fps is (10 sec: 52429.3, 60 sec: 43690.7, 300 sec: 43320.4). Total num frames: 533856256. Throughput: 0: 9605.5. Samples: 133490688. Policy #0 lag: (min: 15.0, avg: 79.3, max: 271.0) [2024-06-15 14:56:20,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 14:56:25,738][1648984] Fps is (10 sec: 26214.5, 60 sec: 37137.1, 300 sec: 42653.9). Total num frames: 533856256. 
Throughput: 0: 10240.0. Samples: 133564416. Policy #0 lag: (min: 15.0, avg: 79.3, max: 271.0) [2024-06-15 14:56:25,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 14:56:26,417][1651340] Signal inference workers to stop experience collection... (13400 times) [2024-06-15 14:56:26,471][1652475] InferenceWorker_p0-w0: stopping experience collection (13400 times) [2024-06-15 14:56:26,475][1652475] Updated weights for policy 0, policy_version 260707 (0.0014) [2024-06-15 14:56:26,713][1651340] Signal inference workers to resume experience collection... (13400 times) [2024-06-15 14:56:26,714][1652475] InferenceWorker_p0-w0: resuming experience collection (13400 times) [2024-06-15 14:56:28,824][1652475] Updated weights for policy 0, policy_version 260800 (0.0014) [2024-06-15 14:56:30,048][1652475] Updated weights for policy 0, policy_version 260848 (0.0013) [2024-06-15 14:56:30,739][1648984] Fps is (10 sec: 39317.3, 60 sec: 41505.4, 300 sec: 43098.1). Total num frames: 534249472. Throughput: 0: 9921.3. Samples: 133620736. Policy #0 lag: (min: 15.0, avg: 79.3, max: 271.0) [2024-06-15 14:56:30,739][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 14:56:31,864][1652475] Updated weights for policy 0, policy_version 260926 (0.0115) [2024-06-15 14:56:35,738][1648984] Fps is (10 sec: 52427.4, 60 sec: 39875.8, 300 sec: 42653.9). Total num frames: 534380544. Throughput: 0: 9875.9. Samples: 133649920. Policy #0 lag: (min: 88.0, avg: 233.0, max: 328.0) [2024-06-15 14:56:35,739][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 14:56:39,806][1652475] Updated weights for policy 0, policy_version 260994 (0.0012) [2024-06-15 14:56:40,738][1648984] Fps is (10 sec: 32770.8, 60 sec: 38229.2, 300 sec: 42653.9). Total num frames: 534577152. Throughput: 0: 10274.1. Samples: 133717504. 
Policy #0 lag: (min: 88.0, avg: 233.0, max: 328.0) [2024-06-15 14:56:40,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 14:56:41,357][1652475] Updated weights for policy 0, policy_version 261048 (0.0012) [2024-06-15 14:56:42,969][1652475] Updated weights for policy 0, policy_version 261107 (0.0015) [2024-06-15 14:56:44,420][1652475] Updated weights for policy 0, policy_version 261180 (0.0014) [2024-06-15 14:56:45,738][1648984] Fps is (10 sec: 52430.0, 60 sec: 43691.0, 300 sec: 43320.4). Total num frames: 534904832. Throughput: 0: 9921.4. Samples: 133776384. Policy #0 lag: (min: 88.0, avg: 233.0, max: 328.0) [2024-06-15 14:56:45,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 14:56:50,738][1648984] Fps is (10 sec: 36045.8, 60 sec: 38775.6, 300 sec: 42765.0). Total num frames: 534937600. Throughput: 0: 10331.0. Samples: 133813248. Policy #0 lag: (min: 88.0, avg: 233.0, max: 328.0) [2024-06-15 14:56:50,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 14:56:51,696][1652475] Updated weights for policy 0, policy_version 261238 (0.0013) [2024-06-15 14:56:53,925][1652475] Updated weights for policy 0, policy_version 261314 (0.0071) [2024-06-15 14:56:55,123][1652475] Updated weights for policy 0, policy_version 261372 (0.0101) [2024-06-15 14:56:55,744][1648984] Fps is (10 sec: 42571.3, 60 sec: 42047.8, 300 sec: 43208.4). Total num frames: 535330816. Throughput: 0: 10284.1. Samples: 133872640. Policy #0 lag: (min: 88.0, avg: 233.0, max: 328.0) [2024-06-15 14:56:55,745][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 14:56:56,259][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000261424_535396352.pth... 
[2024-06-15 14:56:56,289][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000256384_525074432.pth [2024-06-15 14:56:56,518][1652475] Updated weights for policy 0, policy_version 261438 (0.0012) [2024-06-15 14:57:00,738][1648984] Fps is (10 sec: 49151.7, 60 sec: 43144.5, 300 sec: 42653.9). Total num frames: 535429120. Throughput: 0: 10399.3. Samples: 133940224. Policy #0 lag: (min: 88.0, avg: 233.0, max: 328.0) [2024-06-15 14:57:00,740][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 14:57:04,114][1652475] Updated weights for policy 0, policy_version 261505 (0.0101) [2024-06-15 14:57:05,355][1652475] Updated weights for policy 0, policy_version 261566 (0.0012) [2024-06-15 14:57:05,739][1648984] Fps is (10 sec: 36063.9, 60 sec: 40413.1, 300 sec: 42653.8). Total num frames: 535691264. Throughput: 0: 10763.1. Samples: 133975040. Policy #0 lag: (min: 88.0, avg: 233.0, max: 328.0) [2024-06-15 14:57:05,739][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 14:57:06,530][1651340] Signal inference workers to stop experience collection... (13450 times) [2024-06-15 14:57:06,557][1652475] InferenceWorker_p0-w0: stopping experience collection (13450 times) [2024-06-15 14:57:06,674][1651340] Signal inference workers to resume experience collection... (13450 times) [2024-06-15 14:57:06,674][1652475] InferenceWorker_p0-w0: resuming experience collection (13450 times) [2024-06-15 14:57:07,482][1652475] Updated weights for policy 0, policy_version 261637 (0.0096) [2024-06-15 14:57:10,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 43690.7, 300 sec: 42876.1). Total num frames: 535953408. Throughput: 0: 10433.4. Samples: 134033920. 
Policy #0 lag: (min: 88.0, avg: 233.0, max: 328.0) [2024-06-15 14:57:10,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 14:57:14,687][1652475] Updated weights for policy 0, policy_version 261713 (0.0027) [2024-06-15 14:57:15,738][1648984] Fps is (10 sec: 36048.8, 60 sec: 40960.0, 300 sec: 42765.0). Total num frames: 536051712. Throughput: 0: 10877.4. Samples: 134110208. Policy #0 lag: (min: 88.0, avg: 233.0, max: 328.0) [2024-06-15 14:57:15,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 14:57:16,299][1652475] Updated weights for policy 0, policy_version 261777 (0.0018) [2024-06-15 14:57:18,094][1652475] Updated weights for policy 0, policy_version 261827 (0.0012) [2024-06-15 14:57:20,564][1652475] Updated weights for policy 0, policy_version 261951 (0.0013) [2024-06-15 14:57:20,738][1648984] Fps is (10 sec: 52429.1, 60 sec: 43690.6, 300 sec: 43431.5). Total num frames: 536477696. Throughput: 0: 10877.2. Samples: 134139392. Policy #0 lag: (min: 88.0, avg: 233.0, max: 328.0) [2024-06-15 14:57:20,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 14:57:25,739][1648984] Fps is (10 sec: 42598.2, 60 sec: 43690.6, 300 sec: 42653.9). Total num frames: 536477696. Throughput: 0: 10763.4. Samples: 134201856. Policy #0 lag: (min: 88.0, avg: 233.0, max: 328.0) [2024-06-15 14:57:25,740][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 14:57:27,240][1652475] Updated weights for policy 0, policy_version 262015 (0.0014) [2024-06-15 14:57:29,663][1652475] Updated weights for policy 0, policy_version 262074 (0.0015) [2024-06-15 14:57:30,770][1648984] Fps is (10 sec: 32662.0, 60 sec: 42576.1, 300 sec: 42871.4). Total num frames: 536805376. Throughput: 0: 10926.2. Samples: 134268416. 
Policy #0 lag: (min: 88.0, avg: 233.0, max: 328.0) [2024-06-15 14:57:30,771][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 14:57:31,575][1652475] Updated weights for policy 0, policy_version 262160 (0.0135) [2024-06-15 14:57:32,684][1652475] Updated weights for policy 0, policy_version 262208 (0.0016) [2024-06-15 14:57:35,738][1648984] Fps is (10 sec: 52429.2, 60 sec: 43690.9, 300 sec: 42653.9). Total num frames: 537001984. Throughput: 0: 10661.0. Samples: 134292992. Policy #0 lag: (min: 88.0, avg: 233.0, max: 328.0) [2024-06-15 14:57:35,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 14:57:38,277][1652475] Updated weights for policy 0, policy_version 262262 (0.0016) [2024-06-15 14:57:40,738][1648984] Fps is (10 sec: 32874.6, 60 sec: 42598.5, 300 sec: 42209.6). Total num frames: 537133056. Throughput: 0: 10992.5. Samples: 134367232. Policy #0 lag: (min: 88.0, avg: 233.0, max: 328.0) [2024-06-15 14:57:40,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 14:57:41,737][1652475] Updated weights for policy 0, policy_version 262306 (0.0016) [2024-06-15 14:57:43,198][1652475] Updated weights for policy 0, policy_version 262384 (0.0014) [2024-06-15 14:57:45,196][1652475] Updated weights for policy 0, policy_version 262459 (0.0015) [2024-06-15 14:57:45,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 43690.7, 300 sec: 42876.1). Total num frames: 537526272. Throughput: 0: 10831.6. Samples: 134427648. Policy #0 lag: (min: 95.0, avg: 144.5, max: 255.0) [2024-06-15 14:57:45,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 14:57:50,212][1651340] Signal inference workers to stop experience collection... (13500 times) [2024-06-15 14:57:50,260][1652475] InferenceWorker_p0-w0: stopping experience collection (13500 times) [2024-06-15 14:57:50,415][1651340] Signal inference workers to resume experience collection... 
(13500 times) [2024-06-15 14:57:50,415][1652475] InferenceWorker_p0-w0: resuming experience collection (13500 times) [2024-06-15 14:57:50,572][1652475] Updated weights for policy 0, policy_version 262522 (0.0015) [2024-06-15 14:57:50,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 45329.0, 300 sec: 42765.0). Total num frames: 537657344. Throughput: 0: 10911.5. Samples: 134466048. Policy #0 lag: (min: 95.0, avg: 144.5, max: 255.0) [2024-06-15 14:57:50,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 14:57:54,241][1652475] Updated weights for policy 0, policy_version 262583 (0.0014) [2024-06-15 14:57:55,738][1648984] Fps is (10 sec: 36044.4, 60 sec: 42602.9, 300 sec: 42542.8). Total num frames: 537886720. Throughput: 0: 11002.3. Samples: 134529024. Policy #0 lag: (min: 95.0, avg: 144.5, max: 255.0) [2024-06-15 14:57:55,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 14:57:57,290][1652475] Updated weights for policy 0, policy_version 262704 (0.0014) [2024-06-15 14:58:00,737][1648984] Fps is (10 sec: 39322.3, 60 sec: 43690.8, 300 sec: 42542.9). Total num frames: 538050560. Throughput: 0: 10535.9. Samples: 134584320. Policy #0 lag: (min: 95.0, avg: 144.5, max: 255.0) [2024-06-15 14:58:00,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 14:58:03,337][1652475] Updated weights for policy 0, policy_version 262736 (0.0012) [2024-06-15 14:58:05,738][1648984] Fps is (10 sec: 36045.0, 60 sec: 42599.2, 300 sec: 42432.4). Total num frames: 538247168. Throughput: 0: 10820.3. Samples: 134626304. 
Policy #0 lag: (min: 95.0, avg: 144.5, max: 255.0) [2024-06-15 14:58:05,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 14:58:06,325][1652475] Updated weights for policy 0, policy_version 262840 (0.0022) [2024-06-15 14:58:08,031][1652475] Updated weights for policy 0, policy_version 262909 (0.0016) [2024-06-15 14:58:09,650][1652475] Updated weights for policy 0, policy_version 262974 (0.0016) [2024-06-15 14:58:10,738][1648984] Fps is (10 sec: 52427.5, 60 sec: 43690.6, 300 sec: 42653.9). Total num frames: 538574848. Throughput: 0: 10558.6. Samples: 134676992. Policy #0 lag: (min: 95.0, avg: 144.5, max: 255.0) [2024-06-15 14:58:10,739][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 14:58:15,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 43144.5, 300 sec: 41987.5). Total num frames: 538640384. Throughput: 0: 10645.9. Samples: 134747136. Policy #0 lag: (min: 95.0, avg: 144.5, max: 255.0) [2024-06-15 14:58:15,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 14:58:17,493][1652475] Updated weights for policy 0, policy_version 263041 (0.0149) [2024-06-15 14:58:18,822][1652475] Updated weights for policy 0, policy_version 263102 (0.0014) [2024-06-15 14:58:20,738][1648984] Fps is (10 sec: 32768.3, 60 sec: 40413.9, 300 sec: 42209.8). Total num frames: 538902528. Throughput: 0: 10695.1. Samples: 134774272. Policy #0 lag: (min: 95.0, avg: 144.5, max: 255.0) [2024-06-15 14:58:20,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 14:58:22,382][1652475] Updated weights for policy 0, policy_version 263200 (0.0186) [2024-06-15 14:58:25,739][1648984] Fps is (10 sec: 45874.1, 60 sec: 43690.5, 300 sec: 42653.9). Total num frames: 539099136. Throughput: 0: 10296.8. Samples: 134830592. 
Policy #0 lag: (min: 95.0, avg: 144.5, max: 255.0) [2024-06-15 14:58:25,741][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 14:58:28,101][1652475] Updated weights for policy 0, policy_version 263268 (0.0013) [2024-06-15 14:58:30,426][1652475] Updated weights for policy 0, policy_version 263331 (0.0014) [2024-06-15 14:58:30,738][1648984] Fps is (10 sec: 42598.2, 60 sec: 42075.0, 300 sec: 42431.8). Total num frames: 539328512. Throughput: 0: 10501.7. Samples: 134900224. Policy #0 lag: (min: 95.0, avg: 144.5, max: 255.0) [2024-06-15 14:58:30,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 14:58:33,676][1652475] Updated weights for policy 0, policy_version 263392 (0.0013) [2024-06-15 14:58:34,937][1651340] Signal inference workers to stop experience collection... (13550 times) [2024-06-15 14:58:34,994][1652475] InferenceWorker_p0-w0: stopping experience collection (13550 times) [2024-06-15 14:58:35,142][1651340] Signal inference workers to resume experience collection... (13550 times) [2024-06-15 14:58:35,143][1652475] InferenceWorker_p0-w0: resuming experience collection (13550 times) [2024-06-15 14:58:35,500][1652475] Updated weights for policy 0, policy_version 263479 (0.0016) [2024-06-15 14:58:35,738][1648984] Fps is (10 sec: 52429.8, 60 sec: 43690.6, 300 sec: 42653.9). Total num frames: 539623424. Throughput: 0: 10433.4. Samples: 134935552. Policy #0 lag: (min: 95.0, avg: 144.5, max: 255.0) [2024-06-15 14:58:35,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 14:58:39,719][1652475] Updated weights for policy 0, policy_version 263536 (0.0023) [2024-06-15 14:58:40,738][1648984] Fps is (10 sec: 42599.0, 60 sec: 43690.8, 300 sec: 42542.9). Total num frames: 539754496. Throughput: 0: 10547.2. Samples: 135003648. 
Policy #0 lag: (min: 95.0, avg: 144.5, max: 255.0) [2024-06-15 14:58:40,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 14:58:42,042][1652475] Updated weights for policy 0, policy_version 263600 (0.0014) [2024-06-15 14:58:45,738][1648984] Fps is (10 sec: 36045.2, 60 sec: 40960.0, 300 sec: 42098.7). Total num frames: 539983872. Throughput: 0: 10808.9. Samples: 135070720. Policy #0 lag: (min: 95.0, avg: 144.5, max: 255.0) [2024-06-15 14:58:45,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 14:58:45,898][1652475] Updated weights for policy 0, policy_version 263666 (0.0012) [2024-06-15 14:58:46,960][1652475] Updated weights for policy 0, policy_version 263717 (0.0013) [2024-06-15 14:58:50,512][1652475] Updated weights for policy 0, policy_version 263779 (0.0014) [2024-06-15 14:58:50,738][1648984] Fps is (10 sec: 49152.1, 60 sec: 43144.6, 300 sec: 42987.2). Total num frames: 540246016. Throughput: 0: 10615.5. Samples: 135104000. Policy #0 lag: (min: 95.0, avg: 144.5, max: 255.0) [2024-06-15 14:58:50,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 14:58:53,354][1652475] Updated weights for policy 0, policy_version 263868 (0.0014) [2024-06-15 14:58:55,738][1648984] Fps is (10 sec: 42598.0, 60 sec: 42052.3, 300 sec: 42542.8). Total num frames: 540409856. Throughput: 0: 11002.3. Samples: 135172096. Policy #0 lag: (min: 95.0, avg: 144.5, max: 255.0) [2024-06-15 14:58:55,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 14:58:55,742][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000263872_540409856.pth... 
[2024-06-15 14:58:55,914][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000258880_530186240.pth [2024-06-15 14:58:57,316][1652475] Updated weights for policy 0, policy_version 263938 (0.0098) [2024-06-15 14:58:58,561][1652475] Updated weights for policy 0, policy_version 264000 (0.0019) [2024-06-15 14:59:00,738][1648984] Fps is (10 sec: 42598.1, 60 sec: 43690.6, 300 sec: 42653.9). Total num frames: 540672000. Throughput: 0: 11002.3. Samples: 135242240. Policy #0 lag: (min: 11.0, avg: 103.0, max: 267.0) [2024-06-15 14:59:00,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 14:59:02,422][1652475] Updated weights for policy 0, policy_version 264055 (0.0014) [2024-06-15 14:59:04,427][1652475] Updated weights for policy 0, policy_version 264096 (0.0023) [2024-06-15 14:59:05,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 44782.9, 300 sec: 42876.1). Total num frames: 540934144. Throughput: 0: 11104.7. Samples: 135273984. Policy #0 lag: (min: 11.0, avg: 103.0, max: 267.0) [2024-06-15 14:59:05,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 14:59:07,499][1652475] Updated weights for policy 0, policy_version 264134 (0.0012) [2024-06-15 14:59:09,279][1652475] Updated weights for policy 0, policy_version 264198 (0.0014) [2024-06-15 14:59:10,307][1652475] Updated weights for policy 0, policy_version 264248 (0.0012) [2024-06-15 14:59:10,738][1648984] Fps is (10 sec: 52428.3, 60 sec: 43690.7, 300 sec: 42653.9). Total num frames: 541196288. Throughput: 0: 11400.6. Samples: 135343616. Policy #0 lag: (min: 11.0, avg: 103.0, max: 267.0) [2024-06-15 14:59:10,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 14:59:13,544][1652475] Updated weights for policy 0, policy_version 264292 (0.0015) [2024-06-15 14:59:15,329][1652475] Updated weights for policy 0, policy_version 264330 (0.0017) [2024-06-15 14:59:15,739][1648984] Fps is (10 sec: 42597.4, 60 sec: 45328.9, 300 sec: 43098.2). Total num frames: 541360128. 
Throughput: 0: 11400.5. Samples: 135413248. Policy #0 lag: (min: 11.0, avg: 103.0, max: 267.0) [2024-06-15 14:59:15,740][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 14:59:16,295][1652475] Updated weights for policy 0, policy_version 264369 (0.0034) [2024-06-15 14:59:19,622][1652475] Updated weights for policy 0, policy_version 264432 (0.0013) [2024-06-15 14:59:20,552][1651340] Signal inference workers to stop experience collection... (13600 times) [2024-06-15 14:59:20,604][1652475] InferenceWorker_p0-w0: stopping experience collection (13600 times) [2024-06-15 14:59:20,738][1648984] Fps is (10 sec: 42598.7, 60 sec: 45329.1, 300 sec: 42765.0). Total num frames: 541622272. Throughput: 0: 11525.7. Samples: 135454208. Policy #0 lag: (min: 11.0, avg: 103.0, max: 267.0) [2024-06-15 14:59:20,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 14:59:20,841][1651340] Signal inference workers to resume experience collection... (13600 times) [2024-06-15 14:59:20,841][1652475] InferenceWorker_p0-w0: resuming experience collection (13600 times) [2024-06-15 14:59:20,844][1652475] Updated weights for policy 0, policy_version 264480 (0.0013) [2024-06-15 14:59:23,592][1652475] Updated weights for policy 0, policy_version 264518 (0.0028) [2024-06-15 14:59:24,955][1652475] Updated weights for policy 0, policy_version 264575 (0.0015) [2024-06-15 14:59:25,738][1648984] Fps is (10 sec: 49153.1, 60 sec: 45875.4, 300 sec: 43098.2). Total num frames: 541851648. Throughput: 0: 11446.0. Samples: 135518720. Policy #0 lag: (min: 11.0, avg: 103.0, max: 267.0) [2024-06-15 14:59:25,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 14:59:26,906][1652475] Updated weights for policy 0, policy_version 264637 (0.0015) [2024-06-15 14:59:30,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 44783.0, 300 sec: 42765.0). Total num frames: 542015488. Throughput: 0: 11650.8. Samples: 135595008. 
Policy #0 lag: (min: 11.0, avg: 103.0, max: 267.0) [2024-06-15 14:59:30,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 14:59:32,190][1652475] Updated weights for policy 0, policy_version 264713 (0.0016) [2024-06-15 14:59:34,269][1652475] Updated weights for policy 0, policy_version 264769 (0.0015) [2024-06-15 14:59:35,738][1648984] Fps is (10 sec: 52429.1, 60 sec: 45875.3, 300 sec: 43098.2). Total num frames: 542375936. Throughput: 0: 11593.9. Samples: 135625728. Policy #0 lag: (min: 11.0, avg: 103.0, max: 267.0) [2024-06-15 14:59:35,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 14:59:36,961][1652475] Updated weights for policy 0, policy_version 264848 (0.0012) [2024-06-15 14:59:40,738][1648984] Fps is (10 sec: 49151.7, 60 sec: 45875.1, 300 sec: 43098.3). Total num frames: 542507008. Throughput: 0: 11514.3. Samples: 135690240. Policy #0 lag: (min: 11.0, avg: 103.0, max: 267.0) [2024-06-15 14:59:40,739][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 14:59:42,162][1652475] Updated weights for policy 0, policy_version 264918 (0.0014) [2024-06-15 14:59:43,465][1652475] Updated weights for policy 0, policy_version 264961 (0.0013) [2024-06-15 14:59:45,154][1652475] Updated weights for policy 0, policy_version 265024 (0.0013) [2024-06-15 14:59:45,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 46421.3, 300 sec: 42653.9). Total num frames: 542769152. Throughput: 0: 11389.2. Samples: 135754752. Policy #0 lag: (min: 11.0, avg: 103.0, max: 267.0) [2024-06-15 14:59:45,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 14:59:50,483][1652475] Updated weights for policy 0, policy_version 265091 (0.0017) [2024-06-15 14:59:50,738][1648984] Fps is (10 sec: 42598.8, 60 sec: 44782.9, 300 sec: 43209.4). Total num frames: 542932992. Throughput: 0: 11309.5. Samples: 135782912. 
Policy #0 lag: (min: 11.0, avg: 103.0, max: 267.0) [2024-06-15 14:59:50,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 14:59:51,662][1652475] Updated weights for policy 0, policy_version 265148 (0.0013) [2024-06-15 14:59:55,446][1652475] Updated weights for policy 0, policy_version 265202 (0.0012) [2024-06-15 14:59:55,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 45875.2, 300 sec: 42876.1). Total num frames: 543162368. Throughput: 0: 11446.1. Samples: 135858688. Policy #0 lag: (min: 11.0, avg: 103.0, max: 267.0) [2024-06-15 14:59:55,738][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 14:59:59,139][1652475] Updated weights for policy 0, policy_version 265312 (0.0103) [2024-06-15 15:00:00,738][1648984] Fps is (10 sec: 49152.0, 60 sec: 45875.2, 300 sec: 43098.2). Total num frames: 543424512. Throughput: 0: 11047.9. Samples: 135910400. Policy #0 lag: (min: 11.0, avg: 103.0, max: 267.0) [2024-06-15 15:00:00,738][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 15:00:03,617][1652475] Updated weights for policy 0, policy_version 265376 (0.0013) [2024-06-15 15:00:05,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 43690.7, 300 sec: 43098.3). Total num frames: 543555584. Throughput: 0: 11025.1. Samples: 135950336. Policy #0 lag: (min: 11.0, avg: 103.0, max: 267.0) [2024-06-15 15:00:05,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 15:00:07,111][1652475] Updated weights for policy 0, policy_version 265440 (0.0014) [2024-06-15 15:00:08,251][1651340] Signal inference workers to stop experience collection... (13650 times) [2024-06-15 15:00:08,372][1652475] InferenceWorker_p0-w0: stopping experience collection (13650 times) [2024-06-15 15:00:08,460][1651340] Signal inference workers to resume experience collection... 
(13650 times) [2024-06-15 15:00:08,460][1652475] InferenceWorker_p0-w0: resuming experience collection (13650 times) [2024-06-15 15:00:08,903][1652475] Updated weights for policy 0, policy_version 265523 (0.0134) [2024-06-15 15:00:10,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 43690.7, 300 sec: 42653.9). Total num frames: 543817728. Throughput: 0: 10854.4. Samples: 136007168. Policy #0 lag: (min: 47.0, avg: 133.2, max: 303.0) [2024-06-15 15:00:10,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 15:00:12,430][1652475] Updated weights for policy 0, policy_version 265571 (0.0025) [2024-06-15 15:00:12,919][1652475] Updated weights for policy 0, policy_version 265600 (0.0013) [2024-06-15 15:00:15,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 45329.3, 300 sec: 43542.6). Total num frames: 544079872. Throughput: 0: 10740.6. Samples: 136078336. Policy #0 lag: (min: 47.0, avg: 133.2, max: 303.0) [2024-06-15 15:00:15,740][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 15:00:15,753][1652475] Updated weights for policy 0, policy_version 265664 (0.0011) [2024-06-15 15:00:20,455][1652475] Updated weights for policy 0, policy_version 265760 (0.0017) [2024-06-15 15:00:20,738][1648984] Fps is (10 sec: 45875.0, 60 sec: 44236.8, 300 sec: 42876.1). Total num frames: 544276480. Throughput: 0: 10865.8. Samples: 136114688. Policy #0 lag: (min: 47.0, avg: 133.2, max: 303.0) [2024-06-15 15:00:20,739][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 15:00:24,567][1652475] Updated weights for policy 0, policy_version 265824 (0.0012) [2024-06-15 15:00:25,738][1648984] Fps is (10 sec: 39321.0, 60 sec: 43690.6, 300 sec: 43098.2). Total num frames: 544473088. Throughput: 0: 10808.9. Samples: 136176640. 
Policy #0 lag: (min: 47.0, avg: 133.2, max: 303.0) [2024-06-15 15:00:25,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 15:00:26,276][1652475] Updated weights for policy 0, policy_version 265872 (0.0013) [2024-06-15 15:00:30,738][1648984] Fps is (10 sec: 32768.1, 60 sec: 43144.5, 300 sec: 42766.8). Total num frames: 544604160. Throughput: 0: 10786.1. Samples: 136240128. Policy #0 lag: (min: 47.0, avg: 133.2, max: 303.0) [2024-06-15 15:00:30,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 15:00:31,954][1652475] Updated weights for policy 0, policy_version 265921 (0.0014) [2024-06-15 15:00:34,010][1652475] Updated weights for policy 0, policy_version 266016 (0.0013) [2024-06-15 15:00:35,738][1648984] Fps is (10 sec: 39320.9, 60 sec: 41505.9, 300 sec: 42653.9). Total num frames: 544866304. Throughput: 0: 10831.6. Samples: 136270336. Policy #0 lag: (min: 47.0, avg: 133.2, max: 303.0) [2024-06-15 15:00:35,739][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 15:00:36,153][1652475] Updated weights for policy 0, policy_version 266080 (0.0065) [2024-06-15 15:00:38,599][1652475] Updated weights for policy 0, policy_version 266160 (0.0013) [2024-06-15 15:00:40,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 545128448. Throughput: 0: 10513.1. Samples: 136331776. Policy #0 lag: (min: 47.0, avg: 133.2, max: 303.0) [2024-06-15 15:00:40,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 15:00:44,407][1652475] Updated weights for policy 0, policy_version 266224 (0.0013) [2024-06-15 15:00:45,738][1648984] Fps is (10 sec: 39322.9, 60 sec: 41506.1, 300 sec: 42876.1). Total num frames: 545259520. Throughput: 0: 10911.3. Samples: 136401408. 
Policy #0 lag: (min: 47.0, avg: 133.2, max: 303.0) [2024-06-15 15:00:45,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 15:00:46,956][1652475] Updated weights for policy 0, policy_version 266288 (0.0013) [2024-06-15 15:00:48,716][1652475] Updated weights for policy 0, policy_version 266364 (0.0118) [2024-06-15 15:00:50,539][1652475] Updated weights for policy 0, policy_version 266402 (0.0011) [2024-06-15 15:00:50,738][1648984] Fps is (10 sec: 45875.0, 60 sec: 44236.8, 300 sec: 43320.4). Total num frames: 545587200. Throughput: 0: 10615.5. Samples: 136428032. Policy #0 lag: (min: 47.0, avg: 133.2, max: 303.0) [2024-06-15 15:00:50,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 15:00:55,574][1652475] Updated weights for policy 0, policy_version 266436 (0.0016) [2024-06-15 15:00:55,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 42052.3, 300 sec: 43542.6). Total num frames: 545685504. Throughput: 0: 10877.2. Samples: 136496640. Policy #0 lag: (min: 47.0, avg: 133.2, max: 303.0) [2024-06-15 15:00:55,740][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 15:00:56,100][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000266464_545718272.pth... [2024-06-15 15:00:56,251][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000261424_535396352.pth [2024-06-15 15:00:56,821][1652475] Updated weights for policy 0, policy_version 266491 (0.0012) [2024-06-15 15:00:58,201][1651340] Signal inference workers to stop experience collection... (13700 times) [2024-06-15 15:00:58,252][1652475] InferenceWorker_p0-w0: stopping experience collection (13700 times) [2024-06-15 15:00:58,453][1651340] Signal inference workers to resume experience collection... 
(13700 times) [2024-06-15 15:00:58,454][1652475] InferenceWorker_p0-w0: resuming experience collection (13700 times) [2024-06-15 15:01:00,109][1652475] Updated weights for policy 0, policy_version 266579 (0.0128) [2024-06-15 15:01:00,738][1648984] Fps is (10 sec: 39321.3, 60 sec: 42598.3, 300 sec: 43098.2). Total num frames: 545980416. Throughput: 0: 10672.3. Samples: 136558592. Policy #0 lag: (min: 47.0, avg: 133.2, max: 303.0) [2024-06-15 15:01:00,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 15:01:02,295][1652475] Updated weights for policy 0, policy_version 266641 (0.0012) [2024-06-15 15:01:05,738][1648984] Fps is (10 sec: 49152.0, 60 sec: 43690.6, 300 sec: 43542.6). Total num frames: 546177024. Throughput: 0: 10513.1. Samples: 136587776. Policy #0 lag: (min: 47.0, avg: 133.2, max: 303.0) [2024-06-15 15:01:05,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 15:01:08,336][1652475] Updated weights for policy 0, policy_version 266704 (0.0011) [2024-06-15 15:01:10,603][1652475] Updated weights for policy 0, policy_version 266754 (0.0012) [2024-06-15 15:01:10,738][1648984] Fps is (10 sec: 32768.4, 60 sec: 41506.1, 300 sec: 43098.3). Total num frames: 546308096. Throughput: 0: 10661.0. Samples: 136656384. Policy #0 lag: (min: 47.0, avg: 133.2, max: 303.0) [2024-06-15 15:01:10,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 15:01:12,354][1652475] Updated weights for policy 0, policy_version 266821 (0.0012) [2024-06-15 15:01:14,806][1652475] Updated weights for policy 0, policy_version 266912 (0.0013) [2024-06-15 15:01:15,738][1648984] Fps is (10 sec: 52428.5, 60 sec: 43690.6, 300 sec: 43542.5). Total num frames: 546701312. Throughput: 0: 10569.9. Samples: 136715776. 
Policy #0 lag: (min: 47.0, avg: 133.2, max: 303.0) [2024-06-15 15:01:15,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 15:01:20,506][1652475] Updated weights for policy 0, policy_version 266950 (0.0014) [2024-06-15 15:01:20,740][1648984] Fps is (10 sec: 42598.3, 60 sec: 40960.0, 300 sec: 43653.6). Total num frames: 546734080. Throughput: 0: 10752.1. Samples: 136754176. Policy #0 lag: (min: 15.0, avg: 88.5, max: 271.0) [2024-06-15 15:01:20,741][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 15:01:21,638][1652475] Updated weights for policy 0, policy_version 267008 (0.0076) [2024-06-15 15:01:24,348][1652475] Updated weights for policy 0, policy_version 267074 (0.0012) [2024-06-15 15:01:25,738][1648984] Fps is (10 sec: 36045.0, 60 sec: 43144.6, 300 sec: 43431.6). Total num frames: 547061760. Throughput: 0: 10854.4. Samples: 136820224. Policy #0 lag: (min: 15.0, avg: 88.5, max: 271.0) [2024-06-15 15:01:25,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 15:01:26,820][1652475] Updated weights for policy 0, policy_version 267168 (0.0096) [2024-06-15 15:01:30,738][1648984] Fps is (10 sec: 49152.0, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 547225600. Throughput: 0: 10717.9. Samples: 136883712. Policy #0 lag: (min: 15.0, avg: 88.5, max: 271.0) [2024-06-15 15:01:30,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 15:01:33,109][1652475] Updated weights for policy 0, policy_version 267248 (0.0015) [2024-06-15 15:01:35,480][1652475] Updated weights for policy 0, policy_version 267296 (0.0049) [2024-06-15 15:01:35,738][1648984] Fps is (10 sec: 36044.9, 60 sec: 42598.6, 300 sec: 43542.6). Total num frames: 547422208. Throughput: 0: 10877.2. Samples: 136917504. 
Policy #0 lag: (min: 15.0, avg: 88.5, max: 271.0) [2024-06-15 15:01:35,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 15:01:37,836][1652475] Updated weights for policy 0, policy_version 267392 (0.0109) [2024-06-15 15:01:39,864][1652475] Updated weights for policy 0, policy_version 267456 (0.0017) [2024-06-15 15:01:40,755][1648984] Fps is (10 sec: 52340.6, 60 sec: 43678.4, 300 sec: 43540.1). Total num frames: 547749888. Throughput: 0: 10645.6. Samples: 136975872. Policy #0 lag: (min: 15.0, avg: 88.5, max: 271.0) [2024-06-15 15:01:40,756][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 15:01:45,179][1651340] Signal inference workers to stop experience collection... (13750 times) [2024-06-15 15:01:45,258][1652475] InferenceWorker_p0-w0: stopping experience collection (13750 times) [2024-06-15 15:01:45,458][1651340] Signal inference workers to resume experience collection... (13750 times) [2024-06-15 15:01:45,459][1652475] InferenceWorker_p0-w0: resuming experience collection (13750 times) [2024-06-15 15:01:45,738][1648984] Fps is (10 sec: 39321.3, 60 sec: 42598.3, 300 sec: 43653.6). Total num frames: 547815424. Throughput: 0: 10877.2. Samples: 137048064. Policy #0 lag: (min: 15.0, avg: 88.5, max: 271.0) [2024-06-15 15:01:45,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 15:01:46,133][1652475] Updated weights for policy 0, policy_version 267517 (0.0012) [2024-06-15 15:01:48,208][1652475] Updated weights for policy 0, policy_version 267569 (0.0013) [2024-06-15 15:01:50,418][1652475] Updated weights for policy 0, policy_version 267649 (0.0014) [2024-06-15 15:01:50,738][1648984] Fps is (10 sec: 42670.1, 60 sec: 43144.5, 300 sec: 43543.5). Total num frames: 548175872. Throughput: 0: 10934.0. Samples: 137079808. 
Policy #0 lag: (min: 15.0, avg: 88.5, max: 271.0) [2024-06-15 15:01:50,739][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 15:01:51,647][1652475] Updated weights for policy 0, policy_version 267705 (0.0013) [2024-06-15 15:01:55,738][1648984] Fps is (10 sec: 45875.5, 60 sec: 43144.5, 300 sec: 43542.6). Total num frames: 548274176. Throughput: 0: 10808.9. Samples: 137142784. Policy #0 lag: (min: 15.0, avg: 88.5, max: 271.0) [2024-06-15 15:01:55,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 15:01:57,257][1652475] Updated weights for policy 0, policy_version 267735 (0.0012) [2024-06-15 15:01:58,659][1652475] Updated weights for policy 0, policy_version 267792 (0.0014) [2024-06-15 15:02:00,713][1652475] Updated weights for policy 0, policy_version 267856 (0.0013) [2024-06-15 15:02:00,738][1648984] Fps is (10 sec: 39322.0, 60 sec: 43144.6, 300 sec: 43653.8). Total num frames: 548569088. Throughput: 0: 10968.2. Samples: 137209344. Policy #0 lag: (min: 15.0, avg: 88.5, max: 271.0) [2024-06-15 15:02:00,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 15:02:03,257][1652475] Updated weights for policy 0, policy_version 267908 (0.0056) [2024-06-15 15:02:05,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 548798464. Throughput: 0: 10820.3. Samples: 137241088. Policy #0 lag: (min: 15.0, avg: 88.5, max: 271.0) [2024-06-15 15:02:05,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 15:02:09,011][1652475] Updated weights for policy 0, policy_version 267987 (0.0015) [2024-06-15 15:02:09,924][1652475] Updated weights for policy 0, policy_version 268032 (0.0013) [2024-06-15 15:02:10,738][1648984] Fps is (10 sec: 42598.1, 60 sec: 44782.9, 300 sec: 43875.8). Total num frames: 548995072. Throughput: 0: 10774.7. Samples: 137305088. 
Policy #0 lag: (min: 15.0, avg: 88.5, max: 271.0) [2024-06-15 15:02:10,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 15:02:11,184][1652475] Updated weights for policy 0, policy_version 268092 (0.0013) [2024-06-15 15:02:13,043][1652475] Updated weights for policy 0, policy_version 268160 (0.0015) [2024-06-15 15:02:15,738][1648984] Fps is (10 sec: 39321.3, 60 sec: 41506.1, 300 sec: 43098.2). Total num frames: 549191680. Throughput: 0: 10854.4. Samples: 137372160. Policy #0 lag: (min: 15.0, avg: 88.5, max: 271.0) [2024-06-15 15:02:15,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 15:02:17,640][1652475] Updated weights for policy 0, policy_version 268213 (0.0013) [2024-06-15 15:02:20,738][1648984] Fps is (10 sec: 39322.0, 60 sec: 44236.8, 300 sec: 43764.7). Total num frames: 549388288. Throughput: 0: 10831.6. Samples: 137404928. Policy #0 lag: (min: 15.0, avg: 88.5, max: 271.0) [2024-06-15 15:02:20,738][1648984] Avg episode reward: [(0, '-0.580')] [2024-06-15 15:02:21,233][1652475] Updated weights for policy 0, policy_version 268283 (0.0114) [2024-06-15 15:02:23,623][1652475] Updated weights for policy 0, policy_version 268352 (0.0017) [2024-06-15 15:02:25,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 44236.8, 300 sec: 43769.5). Total num frames: 549715968. Throughput: 0: 10972.3. Samples: 137469440. Policy #0 lag: (min: 15.0, avg: 88.5, max: 271.0) [2024-06-15 15:02:25,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 15:02:28,007][1652475] Updated weights for policy 0, policy_version 268432 (0.0026) [2024-06-15 15:02:28,523][1651340] Signal inference workers to stop experience collection... (13800 times) [2024-06-15 15:02:28,570][1652475] InferenceWorker_p0-w0: stopping experience collection (13800 times) [2024-06-15 15:02:28,742][1651340] Signal inference workers to resume experience collection... 
(13800 times) [2024-06-15 15:02:28,743][1652475] InferenceWorker_p0-w0: resuming experience collection (13800 times) [2024-06-15 15:02:30,738][1648984] Fps is (10 sec: 45874.8, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 549847040. Throughput: 0: 10877.2. Samples: 137537536. Policy #0 lag: (min: 15.0, avg: 88.5, max: 271.0) [2024-06-15 15:02:30,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 15:02:31,825][1652475] Updated weights for policy 0, policy_version 268481 (0.0013) [2024-06-15 15:02:33,072][1652475] Updated weights for policy 0, policy_version 268540 (0.0012) [2024-06-15 15:02:35,738][1648984] Fps is (10 sec: 39320.4, 60 sec: 44782.7, 300 sec: 43986.8). Total num frames: 550109184. Throughput: 0: 10945.4. Samples: 137572352. Policy #0 lag: (min: 31.0, avg: 118.9, max: 287.0) [2024-06-15 15:02:35,739][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 15:02:35,938][1652475] Updated weights for policy 0, policy_version 268624 (0.0015) [2024-06-15 15:02:39,691][1652475] Updated weights for policy 0, policy_version 268704 (0.0014) [2024-06-15 15:02:40,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 43702.9, 300 sec: 43542.6). Total num frames: 550371328. Throughput: 0: 10922.7. Samples: 137634304. Policy #0 lag: (min: 31.0, avg: 118.9, max: 287.0) [2024-06-15 15:02:40,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 15:02:45,738][1648984] Fps is (10 sec: 29492.3, 60 sec: 43144.6, 300 sec: 43209.3). Total num frames: 550404096. Throughput: 0: 11070.6. Samples: 137707520. Policy #0 lag: (min: 31.0, avg: 118.9, max: 287.0) [2024-06-15 15:02:45,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 15:02:45,894][1652475] Updated weights for policy 0, policy_version 268754 (0.0016) [2024-06-15 15:02:48,486][1652475] Updated weights for policy 0, policy_version 268865 (0.0017) [2024-06-15 15:02:50,738][1648984] Fps is (10 sec: 39321.2, 60 sec: 43144.5, 300 sec: 43653.6). Total num frames: 550764544. 
Throughput: 0: 10877.1. Samples: 137730560. Policy #0 lag: (min: 31.0, avg: 118.9, max: 287.0) [2024-06-15 15:02:50,739][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 15:02:51,218][1652475] Updated weights for policy 0, policy_version 268931 (0.0015) [2024-06-15 15:02:52,552][1652475] Updated weights for policy 0, policy_version 268988 (0.0039) [2024-06-15 15:02:55,738][1648984] Fps is (10 sec: 49150.2, 60 sec: 43690.4, 300 sec: 43542.5). Total num frames: 550895616. Throughput: 0: 10774.7. Samples: 137789952. Policy #0 lag: (min: 31.0, avg: 118.9, max: 287.0) [2024-06-15 15:02:55,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 15:02:55,744][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000268992_550895616.pth... [2024-06-15 15:02:55,811][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000263872_540409856.pth [2024-06-15 15:02:59,016][1652475] Updated weights for policy 0, policy_version 269045 (0.0145) [2024-06-15 15:03:00,580][1652475] Updated weights for policy 0, policy_version 269118 (0.0027) [2024-06-15 15:03:00,738][1648984] Fps is (10 sec: 39320.1, 60 sec: 43144.1, 300 sec: 43764.6). Total num frames: 551157760. Throughput: 0: 10786.0. Samples: 137857536. Policy #0 lag: (min: 31.0, avg: 118.9, max: 287.0) [2024-06-15 15:03:00,739][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 15:03:03,926][1652475] Updated weights for policy 0, policy_version 269216 (0.0128) [2024-06-15 15:03:05,738][1648984] Fps is (10 sec: 52430.0, 60 sec: 43690.6, 300 sec: 43542.6). Total num frames: 551419904. Throughput: 0: 10626.8. Samples: 137883136. Policy #0 lag: (min: 31.0, avg: 118.9, max: 287.0) [2024-06-15 15:03:05,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 15:03:10,730][1652475] Updated weights for policy 0, policy_version 269280 (0.0014) [2024-06-15 15:03:10,738][1648984] Fps is (10 sec: 32769.8, 60 sec: 41506.2, 300 sec: 43542.6). 
Total num frames: 551485440. Throughput: 0: 10729.3. Samples: 137952256. Policy #0 lag: (min: 31.0, avg: 118.9, max: 287.0) [2024-06-15 15:03:10,758][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 15:03:12,989][1652475] Updated weights for policy 0, policy_version 269371 (0.0131) [2024-06-15 15:03:15,506][1651340] Signal inference workers to stop experience collection... (13850 times) [2024-06-15 15:03:15,603][1652475] InferenceWorker_p0-w0: stopping experience collection (13850 times) [2024-06-15 15:03:15,737][1648984] Fps is (10 sec: 29491.8, 60 sec: 42052.4, 300 sec: 43431.5). Total num frames: 551714816. Throughput: 0: 10558.6. Samples: 138012672. Policy #0 lag: (min: 31.0, avg: 118.9, max: 287.0) [2024-06-15 15:03:15,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 15:03:15,829][1651340] Signal inference workers to resume experience collection... (13850 times) [2024-06-15 15:03:15,830][1652475] InferenceWorker_p0-w0: resuming experience collection (13850 times) [2024-06-15 15:03:16,969][1652475] Updated weights for policy 0, policy_version 269456 (0.0012) [2024-06-15 15:03:18,032][1652475] Updated weights for policy 0, policy_version 269502 (0.0013) [2024-06-15 15:03:20,738][1648984] Fps is (10 sec: 45873.8, 60 sec: 42598.2, 300 sec: 43542.6). Total num frames: 551944192. Throughput: 0: 10285.5. Samples: 138035200. Policy #0 lag: (min: 31.0, avg: 118.9, max: 287.0) [2024-06-15 15:03:20,739][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 15:03:25,068][1652475] Updated weights for policy 0, policy_version 269603 (0.0013) [2024-06-15 15:03:25,738][1648984] Fps is (10 sec: 49151.2, 60 sec: 41506.1, 300 sec: 43653.6). Total num frames: 552206336. Throughput: 0: 10433.4. Samples: 138103808. 
Policy #0 lag: (min: 31.0, avg: 118.9, max: 287.0) [2024-06-15 15:03:25,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 15:03:28,924][1652475] Updated weights for policy 0, policy_version 269699 (0.0031) [2024-06-15 15:03:30,480][1652475] Updated weights for policy 0, policy_version 269760 (0.0013) [2024-06-15 15:03:30,738][1648984] Fps is (10 sec: 52430.4, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 552468480. Throughput: 0: 10058.0. Samples: 138160128. Policy #0 lag: (min: 31.0, avg: 118.9, max: 287.0) [2024-06-15 15:03:30,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 15:03:35,738][1648984] Fps is (10 sec: 29491.5, 60 sec: 39868.0, 300 sec: 43209.3). Total num frames: 552501248. Throughput: 0: 10376.6. Samples: 138197504. Policy #0 lag: (min: 31.0, avg: 118.9, max: 287.0) [2024-06-15 15:03:35,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 15:03:36,962][1652475] Updated weights for policy 0, policy_version 269825 (0.0014) [2024-06-15 15:03:38,420][1652475] Updated weights for policy 0, policy_version 269883 (0.0013) [2024-06-15 15:03:40,549][1652475] Updated weights for policy 0, policy_version 269924 (0.0012) [2024-06-15 15:03:40,738][1648984] Fps is (10 sec: 32767.9, 60 sec: 40413.9, 300 sec: 43431.5). Total num frames: 552796160. Throughput: 0: 10501.8. Samples: 138262528. Policy #0 lag: (min: 31.0, avg: 118.9, max: 287.0) [2024-06-15 15:03:40,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 15:03:42,614][1652475] Updated weights for policy 0, policy_version 270014 (0.0015) [2024-06-15 15:03:45,738][1648984] Fps is (10 sec: 49152.0, 60 sec: 43144.5, 300 sec: 43209.3). Total num frames: 552992768. Throughput: 0: 10342.5. Samples: 138322944. 
Policy #0 lag: (min: 31.0, avg: 118.9, max: 287.0) [2024-06-15 15:03:45,740][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 15:03:48,887][1652475] Updated weights for policy 0, policy_version 270064 (0.0012) [2024-06-15 15:03:50,653][1652475] Updated weights for policy 0, policy_version 270132 (0.0012) [2024-06-15 15:03:50,738][1648984] Fps is (10 sec: 42598.5, 60 sec: 40960.1, 300 sec: 43431.5). Total num frames: 553222144. Throughput: 0: 10558.6. Samples: 138358272. Policy #0 lag: (min: 1.0, avg: 75.8, max: 257.0) [2024-06-15 15:03:50,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 15:03:54,338][1652475] Updated weights for policy 0, policy_version 270208 (0.0244) [2024-06-15 15:03:55,522][1652475] Updated weights for policy 0, policy_version 270271 (0.0014) [2024-06-15 15:03:55,738][1648984] Fps is (10 sec: 52426.8, 60 sec: 43690.6, 300 sec: 43542.5). Total num frames: 553517056. Throughput: 0: 10171.6. Samples: 138409984. Policy #0 lag: (min: 1.0, avg: 75.8, max: 257.0) [2024-06-15 15:03:55,739][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 15:04:00,738][1648984] Fps is (10 sec: 36044.7, 60 sec: 40414.2, 300 sec: 42876.1). Total num frames: 553582592. Throughput: 0: 10524.4. Samples: 138486272. Policy #0 lag: (min: 1.0, avg: 75.8, max: 257.0) [2024-06-15 15:04:00,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 15:04:00,753][1651340] Signal inference workers to stop experience collection... (13900 times) [2024-06-15 15:04:00,826][1652475] InferenceWorker_p0-w0: stopping experience collection (13900 times) [2024-06-15 15:04:00,968][1651340] Signal inference workers to resume experience collection... 
(13900 times) [2024-06-15 15:04:00,969][1652475] InferenceWorker_p0-w0: resuming experience collection (13900 times) [2024-06-15 15:04:01,379][1652475] Updated weights for policy 0, policy_version 270339 (0.0080) [2024-06-15 15:04:04,726][1652475] Updated weights for policy 0, policy_version 270404 (0.0035) [2024-06-15 15:04:05,738][1648984] Fps is (10 sec: 32769.2, 60 sec: 40413.9, 300 sec: 42876.1). Total num frames: 553844736. Throughput: 0: 10626.9. Samples: 138513408. Policy #0 lag: (min: 1.0, avg: 75.8, max: 257.0) [2024-06-15 15:04:05,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 15:04:06,583][1652475] Updated weights for policy 0, policy_version 270487 (0.0014) [2024-06-15 15:04:10,750][1648984] Fps is (10 sec: 45817.8, 60 sec: 42589.5, 300 sec: 42985.4). Total num frames: 554041344. Throughput: 0: 10589.8. Samples: 138580480. Policy #0 lag: (min: 1.0, avg: 75.8, max: 257.0) [2024-06-15 15:04:10,751][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 15:04:12,628][1652475] Updated weights for policy 0, policy_version 270582 (0.0016) [2024-06-15 15:04:13,857][1652475] Updated weights for policy 0, policy_version 270624 (0.0016) [2024-06-15 15:04:14,714][1652475] Updated weights for policy 0, policy_version 270652 (0.0012) [2024-06-15 15:04:15,738][1648984] Fps is (10 sec: 45875.2, 60 sec: 43144.5, 300 sec: 42987.2). Total num frames: 554303488. Throughput: 0: 10831.6. Samples: 138647552. Policy #0 lag: (min: 1.0, avg: 75.8, max: 257.0) [2024-06-15 15:04:15,738][1648984] Avg episode reward: [(0, '-0.530')] [2024-06-15 15:04:17,584][1652475] Updated weights for policy 0, policy_version 270707 (0.0014) [2024-06-15 15:04:20,738][1648984] Fps is (10 sec: 52494.8, 60 sec: 43690.9, 300 sec: 43098.3). Total num frames: 554565632. Throughput: 0: 10695.1. Samples: 138678784. 
Policy #0 lag: (min: 1.0, avg: 75.8, max: 257.0) [2024-06-15 15:04:20,738][1648984] Avg episode reward: [(0, '-0.610')] [2024-06-15 15:04:22,956][1652475] Updated weights for policy 0, policy_version 270786 (0.0014) [2024-06-15 15:04:24,232][1652475] Updated weights for policy 0, policy_version 270848 (0.0018) [2024-06-15 15:04:25,738][1648984] Fps is (10 sec: 45875.2, 60 sec: 42598.4, 300 sec: 43209.3). Total num frames: 554762240. Throughput: 0: 10854.4. Samples: 138750976. Policy #0 lag: (min: 1.0, avg: 75.8, max: 257.0) [2024-06-15 15:04:25,740][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 15:04:29,470][1652475] Updated weights for policy 0, policy_version 270946 (0.0016) [2024-06-15 15:04:30,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 41506.1, 300 sec: 42653.9). Total num frames: 554958848. Throughput: 0: 10763.4. Samples: 138807296. Policy #0 lag: (min: 1.0, avg: 75.8, max: 257.0) [2024-06-15 15:04:30,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 15:04:32,159][1652475] Updated weights for policy 0, policy_version 271033 (0.0059) [2024-06-15 15:04:35,738][1648984] Fps is (10 sec: 36045.0, 60 sec: 43690.7, 300 sec: 42765.0). Total num frames: 555122688. Throughput: 0: 10717.9. Samples: 138840576. Policy #0 lag: (min: 1.0, avg: 75.8, max: 257.0) [2024-06-15 15:04:35,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 15:04:36,443][1652475] Updated weights for policy 0, policy_version 271096 (0.0019) [2024-06-15 15:04:38,317][1652475] Updated weights for policy 0, policy_version 271164 (0.0013) [2024-06-15 15:04:40,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 42598.4, 300 sec: 42653.9). Total num frames: 555352064. Throughput: 0: 10956.9. Samples: 138903040. 
Policy #0 lag: (min: 1.0, avg: 75.8, max: 257.0) [2024-06-15 15:04:40,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 15:04:42,243][1652475] Updated weights for policy 0, policy_version 271226 (0.0033) [2024-06-15 15:04:44,303][1652475] Updated weights for policy 0, policy_version 271295 (0.0072) [2024-06-15 15:04:45,738][1648984] Fps is (10 sec: 49151.1, 60 sec: 43690.5, 300 sec: 42987.2). Total num frames: 555614208. Throughput: 0: 10740.6. Samples: 138969600. Policy #0 lag: (min: 1.0, avg: 75.8, max: 257.0) [2024-06-15 15:04:45,739][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 15:04:48,511][1651340] Signal inference workers to stop experience collection... (13950 times) [2024-06-15 15:04:48,591][1652475] InferenceWorker_p0-w0: stopping experience collection (13950 times) [2024-06-15 15:04:48,784][1651340] Signal inference workers to resume experience collection... (13950 times) [2024-06-15 15:04:48,785][1652475] InferenceWorker_p0-w0: resuming experience collection (13950 times) [2024-06-15 15:04:49,678][1652475] Updated weights for policy 0, policy_version 271360 (0.0060) [2024-06-15 15:04:50,738][1648984] Fps is (10 sec: 45875.4, 60 sec: 43144.6, 300 sec: 42876.1). Total num frames: 555810816. Throughput: 0: 11002.3. Samples: 139008512. Policy #0 lag: (min: 1.0, avg: 75.8, max: 257.0) [2024-06-15 15:04:50,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 15:04:51,211][1652475] Updated weights for policy 0, policy_version 271420 (0.0024) [2024-06-15 15:04:54,205][1652475] Updated weights for policy 0, policy_version 271485 (0.0152) [2024-06-15 15:04:55,753][1648984] Fps is (10 sec: 49079.1, 60 sec: 43134.0, 300 sec: 42985.0). Total num frames: 556105728. Throughput: 0: 10694.5. Samples: 139061760. 
Policy #0 lag: (min: 1.0, avg: 75.8, max: 257.0) [2024-06-15 15:04:55,753][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 15:04:55,894][1652475] Updated weights for policy 0, policy_version 271549 (0.0014) [2024-06-15 15:04:55,918][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000271552_556138496.pth... [2024-06-15 15:04:55,961][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000266464_545718272.pth [2024-06-15 15:05:00,738][1648984] Fps is (10 sec: 32767.7, 60 sec: 42598.4, 300 sec: 42653.9). Total num frames: 556138496. Throughput: 0: 10945.4. Samples: 139140096. Policy #0 lag: (min: 1.0, avg: 75.8, max: 257.0) [2024-06-15 15:05:00,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 15:05:02,858][1652475] Updated weights for policy 0, policy_version 271632 (0.0216) [2024-06-15 15:05:03,727][1652475] Updated weights for policy 0, policy_version 271680 (0.0013) [2024-06-15 15:05:05,738][1648984] Fps is (10 sec: 39380.7, 60 sec: 44236.8, 300 sec: 42987.2). Total num frames: 556498944. Throughput: 0: 10706.5. Samples: 139160576. Policy #0 lag: (min: 13.0, avg: 66.9, max: 212.0) [2024-06-15 15:05:05,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 15:05:05,982][1652475] Updated weights for policy 0, policy_version 271744 (0.0014) [2024-06-15 15:05:10,738][1648984] Fps is (10 sec: 52427.8, 60 sec: 43699.6, 300 sec: 42653.9). Total num frames: 556662784. Throughput: 0: 10547.1. Samples: 139225600. Policy #0 lag: (min: 13.0, avg: 66.9, max: 212.0) [2024-06-15 15:05:10,739][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 15:05:13,044][1652475] Updated weights for policy 0, policy_version 271812 (0.0013) [2024-06-15 15:05:15,738][1648984] Fps is (10 sec: 29490.9, 60 sec: 41506.1, 300 sec: 42431.8). Total num frames: 556793856. Throughput: 0: 10808.9. Samples: 139293696. 
Policy #0 lag: (min: 13.0, avg: 66.9, max: 212.0) [2024-06-15 15:05:15,738][1648984] Avg episode reward: [(0, '-0.540')] [2024-06-15 15:05:15,989][1652475] Updated weights for policy 0, policy_version 271888 (0.0138) [2024-06-15 15:05:17,505][1652475] Updated weights for policy 0, policy_version 271938 (0.0014) [2024-06-15 15:05:18,699][1652475] Updated weights for policy 0, policy_version 271986 (0.0014) [2024-06-15 15:05:20,611][1652475] Updated weights for policy 0, policy_version 272064 (0.0016) [2024-06-15 15:05:20,738][1648984] Fps is (10 sec: 52429.8, 60 sec: 43690.6, 300 sec: 43098.3). Total num frames: 557187072. Throughput: 0: 10649.6. Samples: 139319808. Policy #0 lag: (min: 13.0, avg: 66.9, max: 212.0) [2024-06-15 15:05:20,738][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 15:05:25,738][1648984] Fps is (10 sec: 45875.5, 60 sec: 41506.1, 300 sec: 42876.1). Total num frames: 557252608. Throughput: 0: 10695.1. Samples: 139384320. Policy #0 lag: (min: 13.0, avg: 66.9, max: 212.0) [2024-06-15 15:05:25,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 15:05:27,659][1652475] Updated weights for policy 0, policy_version 272129 (0.0014) [2024-06-15 15:05:28,942][1652475] Updated weights for policy 0, policy_version 272192 (0.0015) [2024-06-15 15:05:30,738][1648984] Fps is (10 sec: 32768.1, 60 sec: 42598.4, 300 sec: 42876.1). Total num frames: 557514752. Throughput: 0: 10615.5. Samples: 139447296. Policy #0 lag: (min: 13.0, avg: 66.9, max: 212.0) [2024-06-15 15:05:30,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 15:05:31,163][1652475] Updated weights for policy 0, policy_version 272250 (0.0013) [2024-06-15 15:05:33,437][1652475] Updated weights for policy 0, policy_version 272313 (0.0156) [2024-06-15 15:05:35,738][1648984] Fps is (10 sec: 45875.1, 60 sec: 43144.5, 300 sec: 42653.9). Total num frames: 557711360. Throughput: 0: 10319.6. Samples: 139472896. 
Policy #0 lag: (min: 13.0, avg: 66.9, max: 212.0) [2024-06-15 15:05:35,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 15:05:36,749][1651340] Signal inference workers to stop experience collection... (14000 times) [2024-06-15 15:05:36,796][1652475] InferenceWorker_p0-w0: stopping experience collection (14000 times) [2024-06-15 15:05:36,962][1651340] Signal inference workers to resume experience collection... (14000 times) [2024-06-15 15:05:36,963][1652475] InferenceWorker_p0-w0: resuming experience collection (14000 times) [2024-06-15 15:05:37,565][1652475] Updated weights for policy 0, policy_version 272360 (0.0146) [2024-06-15 15:05:40,367][1652475] Updated weights for policy 0, policy_version 272404 (0.0038) [2024-06-15 15:05:40,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 42598.3, 300 sec: 42876.1). Total num frames: 557907968. Throughput: 0: 10846.6. Samples: 139549696. Policy #0 lag: (min: 13.0, avg: 66.9, max: 212.0) [2024-06-15 15:05:40,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 15:05:42,168][1652475] Updated weights for policy 0, policy_version 272480 (0.0013) [2024-06-15 15:05:42,949][1652475] Updated weights for policy 0, policy_version 272510 (0.0015) [2024-06-15 15:05:45,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 43690.7, 300 sec: 42876.1). Total num frames: 558235648. Throughput: 0: 10376.5. Samples: 139607040. Policy #0 lag: (min: 13.0, avg: 66.9, max: 212.0) [2024-06-15 15:05:45,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 15:05:48,974][1652475] Updated weights for policy 0, policy_version 272592 (0.0014) [2024-06-15 15:05:50,738][1648984] Fps is (10 sec: 45874.8, 60 sec: 42598.2, 300 sec: 42987.1). Total num frames: 558366720. Throughput: 0: 10854.4. Samples: 139649024. 
Policy #0 lag: (min: 13.0, avg: 66.9, max: 212.0) [2024-06-15 15:05:50,739][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 15:05:52,892][1652475] Updated weights for policy 0, policy_version 272680 (0.0019) [2024-06-15 15:05:54,885][1652475] Updated weights for policy 0, policy_version 272752 (0.0012) [2024-06-15 15:05:55,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 42062.7, 300 sec: 42876.1). Total num frames: 558628864. Throughput: 0: 10717.9. Samples: 139707904. Policy #0 lag: (min: 13.0, avg: 66.9, max: 212.0) [2024-06-15 15:05:55,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 15:05:56,325][1652475] Updated weights for policy 0, policy_version 272787 (0.0012) [2024-06-15 15:06:00,738][1648984] Fps is (10 sec: 39322.4, 60 sec: 43690.7, 300 sec: 42653.9). Total num frames: 558759936. Throughput: 0: 10843.0. Samples: 139781632. Policy #0 lag: (min: 13.0, avg: 66.9, max: 212.0) [2024-06-15 15:06:00,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 15:06:00,983][1652475] Updated weights for policy 0, policy_version 272864 (0.0027) [2024-06-15 15:06:04,665][1652475] Updated weights for policy 0, policy_version 272928 (0.0108) [2024-06-15 15:06:05,738][1648984] Fps is (10 sec: 39322.0, 60 sec: 42052.3, 300 sec: 43098.3). Total num frames: 559022080. Throughput: 0: 10991.0. Samples: 139814400. Policy #0 lag: (min: 13.0, avg: 66.9, max: 212.0) [2024-06-15 15:06:05,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 15:06:07,225][1652475] Updated weights for policy 0, policy_version 273015 (0.0174) [2024-06-15 15:06:09,679][1652475] Updated weights for policy 0, policy_version 273079 (0.0013) [2024-06-15 15:06:10,738][1648984] Fps is (10 sec: 52428.4, 60 sec: 43690.8, 300 sec: 42653.9). Total num frames: 559284224. Throughput: 0: 10752.0. Samples: 139868160. 
Policy #0 lag: (min: 13.0, avg: 66.9, max: 212.0) [2024-06-15 15:06:10,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 15:06:14,521][1652475] Updated weights for policy 0, policy_version 273138 (0.0014) [2024-06-15 15:06:15,739][1648984] Fps is (10 sec: 39320.8, 60 sec: 43690.6, 300 sec: 42987.2). Total num frames: 559415296. Throughput: 0: 10877.1. Samples: 139936768. Policy #0 lag: (min: 15.0, avg: 111.0, max: 271.0) [2024-06-15 15:06:15,740][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 15:06:18,000][1652475] Updated weights for policy 0, policy_version 273216 (0.0014) [2024-06-15 15:06:20,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 41506.1, 300 sec: 42765.0). Total num frames: 559677440. Throughput: 0: 10945.4. Samples: 139965440. Policy #0 lag: (min: 15.0, avg: 111.0, max: 271.0) [2024-06-15 15:06:20,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 15:06:21,279][1651340] Signal inference workers to stop experience collection... (14050 times) [2024-06-15 15:06:21,317][1652475] Updated weights for policy 0, policy_version 273297 (0.0013) [2024-06-15 15:06:21,339][1652475] InferenceWorker_p0-w0: stopping experience collection (14050 times) [2024-06-15 15:06:21,534][1651340] Signal inference workers to resume experience collection... (14050 times) [2024-06-15 15:06:21,554][1652475] InferenceWorker_p0-w0: resuming experience collection (14050 times) [2024-06-15 15:06:25,738][1648984] Fps is (10 sec: 42598.9, 60 sec: 43144.5, 300 sec: 42765.0). Total num frames: 559841280. Throughput: 0: 10797.5. Samples: 140035584. Policy #0 lag: (min: 15.0, avg: 111.0, max: 271.0) [2024-06-15 15:06:25,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 15:06:25,794][1652475] Updated weights for policy 0, policy_version 273376 (0.0012) [2024-06-15 15:06:30,089][1652475] Updated weights for policy 0, policy_version 273472 (0.0115) [2024-06-15 15:06:30,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 43144.5, 300 sec: 42987.2). 
Total num frames: 560103424. Throughput: 0: 10877.2. Samples: 140096512. Policy #0 lag: (min: 15.0, avg: 111.0, max: 271.0) [2024-06-15 15:06:30,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 15:06:31,417][1652475] Updated weights for policy 0, policy_version 273529 (0.0013) [2024-06-15 15:06:33,910][1652475] Updated weights for policy 0, policy_version 273589 (0.0042) [2024-06-15 15:06:35,738][1648984] Fps is (10 sec: 49152.0, 60 sec: 43690.7, 300 sec: 42656.4). Total num frames: 560332800. Throughput: 0: 10706.5. Samples: 140130816. Policy #0 lag: (min: 15.0, avg: 111.0, max: 271.0) [2024-06-15 15:06:35,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 15:06:38,213][1652475] Updated weights for policy 0, policy_version 273648 (0.0012) [2024-06-15 15:06:40,738][1648984] Fps is (10 sec: 36044.9, 60 sec: 42598.5, 300 sec: 42876.1). Total num frames: 560463872. Throughput: 0: 10820.3. Samples: 140194816. Policy #0 lag: (min: 15.0, avg: 111.0, max: 271.0) [2024-06-15 15:06:40,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 15:06:41,547][1652475] Updated weights for policy 0, policy_version 273681 (0.0053) [2024-06-15 15:06:43,201][1652475] Updated weights for policy 0, policy_version 273750 (0.0011) [2024-06-15 15:06:45,586][1652475] Updated weights for policy 0, policy_version 273828 (0.0129) [2024-06-15 15:06:45,738][1648984] Fps is (10 sec: 45875.3, 60 sec: 42598.4, 300 sec: 42765.0). Total num frames: 560791552. Throughput: 0: 10592.7. Samples: 140258304. Policy #0 lag: (min: 15.0, avg: 111.0, max: 271.0) [2024-06-15 15:06:45,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 15:06:49,713][1652475] Updated weights for policy 0, policy_version 273914 (0.0014) [2024-06-15 15:06:50,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 43690.8, 300 sec: 43098.3). Total num frames: 560988160. Throughput: 0: 10558.6. Samples: 140289536. 
Policy #0 lag: (min: 15.0, avg: 111.0, max: 271.0) [2024-06-15 15:06:50,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 15:06:53,885][1652475] Updated weights for policy 0, policy_version 273969 (0.0106) [2024-06-15 15:06:55,743][1648984] Fps is (10 sec: 36025.6, 60 sec: 42048.6, 300 sec: 42653.2). Total num frames: 561152000. Throughput: 0: 10887.3. Samples: 140358144. Policy #0 lag: (min: 15.0, avg: 111.0, max: 271.0) [2024-06-15 15:06:55,743][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 15:06:56,094][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000274032_561217536.pth... [2024-06-15 15:06:56,094][1652475] Updated weights for policy 0, policy_version 274032 (0.0012) [2024-06-15 15:06:56,306][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000268992_550895616.pth [2024-06-15 15:06:58,237][1652475] Updated weights for policy 0, policy_version 274106 (0.0015) [2024-06-15 15:07:00,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 43690.7, 300 sec: 42653.9). Total num frames: 561381376. Throughput: 0: 10672.4. Samples: 140417024. Policy #0 lag: (min: 15.0, avg: 111.0, max: 271.0) [2024-06-15 15:07:00,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 15:07:03,142][1652475] Updated weights for policy 0, policy_version 274168 (0.0011) [2024-06-15 15:07:05,178][1652475] Updated weights for policy 0, policy_version 274209 (0.0011) [2024-06-15 15:07:05,738][1648984] Fps is (10 sec: 49177.6, 60 sec: 43690.5, 300 sec: 42876.1). Total num frames: 561643520. Throughput: 0: 10695.1. Samples: 140446720. Policy #0 lag: (min: 15.0, avg: 111.0, max: 271.0) [2024-06-15 15:07:05,739][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 15:07:07,708][1652475] Updated weights for policy 0, policy_version 274241 (0.0041) [2024-06-15 15:07:08,900][1651340] Signal inference workers to stop experience collection... 
(14100 times) [2024-06-15 15:07:08,964][1652475] InferenceWorker_p0-w0: stopping experience collection (14100 times) [2024-06-15 15:07:09,234][1651340] Signal inference workers to resume experience collection... (14100 times) [2024-06-15 15:07:09,236][1652475] InferenceWorker_p0-w0: resuming experience collection (14100 times) [2024-06-15 15:07:09,417][1652475] Updated weights for policy 0, policy_version 274311 (0.0022) [2024-06-15 15:07:10,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 43690.7, 300 sec: 43098.3). Total num frames: 561905664. Throughput: 0: 10672.4. Samples: 140515840. Policy #0 lag: (min: 15.0, avg: 111.0, max: 271.0) [2024-06-15 15:07:10,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 15:07:14,748][1652475] Updated weights for policy 0, policy_version 274371 (0.0012) [2024-06-15 15:07:15,738][1648984] Fps is (10 sec: 32768.5, 60 sec: 42598.5, 300 sec: 42653.9). Total num frames: 561971200. Throughput: 0: 10899.9. Samples: 140587008. Policy #0 lag: (min: 15.0, avg: 111.0, max: 271.0) [2024-06-15 15:07:15,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 15:07:16,424][1652475] Updated weights for policy 0, policy_version 274448 (0.0014) [2024-06-15 15:07:17,596][1652475] Updated weights for policy 0, policy_version 274496 (0.0015) [2024-06-15 15:07:20,754][1648984] Fps is (10 sec: 39256.7, 60 sec: 43678.7, 300 sec: 42651.6). Total num frames: 562298880. Throughput: 0: 10850.4. Samples: 140619264. Policy #0 lag: (min: 15.0, avg: 111.0, max: 271.0) [2024-06-15 15:07:20,755][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 15:07:21,026][1652475] Updated weights for policy 0, policy_version 274576 (0.0013) [2024-06-15 15:07:25,747][1648984] Fps is (10 sec: 45875.0, 60 sec: 43144.5, 300 sec: 42653.9). Total num frames: 562429952. Throughput: 0: 10752.0. Samples: 140678656. 
Policy #0 lag: (min: 15.0, avg: 111.0, max: 271.0) [2024-06-15 15:07:25,753][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 15:07:27,594][1652475] Updated weights for policy 0, policy_version 274656 (0.0012) [2024-06-15 15:07:30,691][1652475] Updated weights for policy 0, policy_version 274690 (0.0013) [2024-06-15 15:07:30,738][1648984] Fps is (10 sec: 26257.6, 60 sec: 40960.0, 300 sec: 42209.7). Total num frames: 562561024. Throughput: 0: 10877.1. Samples: 140747776. Policy #0 lag: (min: 15.0, avg: 87.2, max: 271.0) [2024-06-15 15:07:30,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 15:07:33,128][1652475] Updated weights for policy 0, policy_version 274800 (0.0013) [2024-06-15 15:07:34,796][1652475] Updated weights for policy 0, policy_version 274870 (0.0011) [2024-06-15 15:07:35,746][1648984] Fps is (10 sec: 52384.8, 60 sec: 43684.5, 300 sec: 42652.7). Total num frames: 562954240. Throughput: 0: 10670.3. Samples: 140769792. Policy #0 lag: (min: 15.0, avg: 87.2, max: 271.0) [2024-06-15 15:07:35,747][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 15:07:39,235][1652475] Updated weights for policy 0, policy_version 274928 (0.0013) [2024-06-15 15:07:40,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 43690.7, 300 sec: 42987.2). Total num frames: 563085312. Throughput: 0: 10639.5. Samples: 140836864. Policy #0 lag: (min: 15.0, avg: 87.2, max: 271.0) [2024-06-15 15:07:40,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 15:07:43,969][1652475] Updated weights for policy 0, policy_version 274992 (0.0013) [2024-06-15 15:07:45,738][1648984] Fps is (10 sec: 39354.6, 60 sec: 42598.4, 300 sec: 42654.0). Total num frames: 563347456. Throughput: 0: 10717.8. Samples: 140899328. 
Policy #0 lag: (min: 15.0, avg: 87.2, max: 271.0) [2024-06-15 15:07:45,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 15:07:46,027][1652475] Updated weights for policy 0, policy_version 275088 (0.0014) [2024-06-15 15:07:50,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 42052.2, 300 sec: 42765.1). Total num frames: 563511296. Throughput: 0: 10683.8. Samples: 140927488. Policy #0 lag: (min: 15.0, avg: 87.2, max: 271.0) [2024-06-15 15:07:50,739][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 15:07:51,007][1652475] Updated weights for policy 0, policy_version 275168 (0.0015) [2024-06-15 15:07:55,325][1652475] Updated weights for policy 0, policy_version 275220 (0.0015) [2024-06-15 15:07:55,692][1651340] Signal inference workers to stop experience collection... (14150 times) [2024-06-15 15:07:55,738][1648984] Fps is (10 sec: 32768.2, 60 sec: 42056.0, 300 sec: 42431.9). Total num frames: 563675136. Throughput: 0: 10695.1. Samples: 140997120. Policy #0 lag: (min: 15.0, avg: 87.2, max: 271.0) [2024-06-15 15:07:55,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 15:07:55,752][1652475] InferenceWorker_p0-w0: stopping experience collection (14150 times) [2024-06-15 15:07:55,927][1651340] Signal inference workers to resume experience collection... (14150 times) [2024-06-15 15:07:55,928][1652475] InferenceWorker_p0-w0: resuming experience collection (14150 times) [2024-06-15 15:07:57,184][1652475] Updated weights for policy 0, policy_version 275296 (0.0012) [2024-06-15 15:07:59,384][1652475] Updated weights for policy 0, policy_version 275364 (0.0012) [2024-06-15 15:08:00,738][1648984] Fps is (10 sec: 49152.0, 60 sec: 43690.6, 300 sec: 42653.9). Total num frames: 564002816. Throughput: 0: 10399.3. Samples: 141054976. 
Policy #0 lag: (min: 15.0, avg: 87.2, max: 271.0) [2024-06-15 15:08:00,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 15:08:03,293][1652475] Updated weights for policy 0, policy_version 275424 (0.0014) [2024-06-15 15:08:05,745][1648984] Fps is (10 sec: 45875.1, 60 sec: 41506.2, 300 sec: 42876.1). Total num frames: 564133888. Throughput: 0: 10528.3. Samples: 141092864. Policy #0 lag: (min: 15.0, avg: 87.2, max: 271.0) [2024-06-15 15:08:05,762][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 15:08:07,297][1652475] Updated weights for policy 0, policy_version 275488 (0.0012) [2024-06-15 15:08:08,793][1652475] Updated weights for policy 0, policy_version 275552 (0.0013) [2024-06-15 15:08:09,669][1652475] Updated weights for policy 0, policy_version 275583 (0.0012) [2024-06-15 15:08:10,738][1648984] Fps is (10 sec: 42598.1, 60 sec: 42052.2, 300 sec: 43098.2). Total num frames: 564428800. Throughput: 0: 10626.8. Samples: 141156864. Policy #0 lag: (min: 15.0, avg: 87.2, max: 271.0) [2024-06-15 15:08:10,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 15:08:14,661][1652475] Updated weights for policy 0, policy_version 275649 (0.0013) [2024-06-15 15:08:15,738][1648984] Fps is (10 sec: 49152.3, 60 sec: 44236.8, 300 sec: 42987.2). Total num frames: 564625408. Throughput: 0: 10604.1. Samples: 141224960. Policy #0 lag: (min: 15.0, avg: 87.2, max: 271.0) [2024-06-15 15:08:15,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 15:08:18,598][1652475] Updated weights for policy 0, policy_version 275713 (0.0013) [2024-06-15 15:08:20,431][1652475] Updated weights for policy 0, policy_version 275795 (0.0013) [2024-06-15 15:08:20,738][1648984] Fps is (10 sec: 39322.0, 60 sec: 42063.8, 300 sec: 42765.0). Total num frames: 564822016. Throughput: 0: 10902.0. Samples: 141260288. 
Policy #0 lag: (min: 15.0, avg: 87.2, max: 271.0) [2024-06-15 15:08:20,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 15:08:23,031][1652475] Updated weights for policy 0, policy_version 275858 (0.0043) [2024-06-15 15:08:25,739][1648984] Fps is (10 sec: 42590.7, 60 sec: 43689.4, 300 sec: 42653.7). Total num frames: 565051392. Throughput: 0: 10774.3. Samples: 141321728. Policy #0 lag: (min: 15.0, avg: 87.2, max: 271.0) [2024-06-15 15:08:25,740][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 15:08:27,595][1652475] Updated weights for policy 0, policy_version 275936 (0.0011) [2024-06-15 15:08:30,683][1652475] Updated weights for policy 0, policy_version 275985 (0.0011) [2024-06-15 15:08:30,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 44236.8, 300 sec: 43098.2). Total num frames: 565215232. Throughput: 0: 10934.1. Samples: 141391360. Policy #0 lag: (min: 15.0, avg: 87.2, max: 271.0) [2024-06-15 15:08:30,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 15:08:32,329][1652475] Updated weights for policy 0, policy_version 276050 (0.0013) [2024-06-15 15:08:34,932][1652475] Updated weights for policy 0, policy_version 276121 (0.0092) [2024-06-15 15:08:35,738][1648984] Fps is (10 sec: 49160.6, 60 sec: 43150.6, 300 sec: 43209.3). Total num frames: 565542912. Throughput: 0: 10877.2. Samples: 141416960. Policy #0 lag: (min: 15.0, avg: 87.2, max: 271.0) [2024-06-15 15:08:35,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 15:08:35,822][1652475] Updated weights for policy 0, policy_version 276158 (0.0013) [2024-06-15 15:08:40,142][1652475] Updated weights for policy 0, policy_version 276216 (0.0142) [2024-06-15 15:08:40,738][1648984] Fps is (10 sec: 49152.2, 60 sec: 43690.7, 300 sec: 43098.3). Total num frames: 565706752. Throughput: 0: 10968.2. Samples: 141490688. 
Policy #0 lag: (min: 18.0, avg: 118.7, max: 274.0) [2024-06-15 15:08:40,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 15:08:42,214][1651340] Signal inference workers to stop experience collection... (14200 times) [2024-06-15 15:08:42,260][1652475] InferenceWorker_p0-w0: stopping experience collection (14200 times) [2024-06-15 15:08:42,426][1651340] Signal inference workers to resume experience collection... (14200 times) [2024-06-15 15:08:42,450][1652475] InferenceWorker_p0-w0: resuming experience collection (14200 times) [2024-06-15 15:08:43,556][1652475] Updated weights for policy 0, policy_version 276292 (0.0012) [2024-06-15 15:08:45,738][1648984] Fps is (10 sec: 42598.0, 60 sec: 43690.6, 300 sec: 43209.3). Total num frames: 565968896. Throughput: 0: 11025.0. Samples: 141551104. Policy #0 lag: (min: 18.0, avg: 118.7, max: 274.0) [2024-06-15 15:08:45,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 15:08:46,370][1652475] Updated weights for policy 0, policy_version 276353 (0.0012) [2024-06-15 15:08:50,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 43144.6, 300 sec: 42654.0). Total num frames: 566099968. Throughput: 0: 10945.4. Samples: 141585408. Policy #0 lag: (min: 18.0, avg: 118.7, max: 274.0) [2024-06-15 15:08:50,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 15:08:50,943][1652475] Updated weights for policy 0, policy_version 276432 (0.0105) [2024-06-15 15:08:54,445][1652475] Updated weights for policy 0, policy_version 276496 (0.0013) [2024-06-15 15:08:55,738][1648984] Fps is (10 sec: 39321.0, 60 sec: 44782.7, 300 sec: 43320.4). Total num frames: 566362112. Throughput: 0: 11116.0. Samples: 141657088. Policy #0 lag: (min: 18.0, avg: 118.7, max: 274.0) [2024-06-15 15:08:55,739][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 15:08:56,236][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000276576_566427648.pth... 
[2024-06-15 15:08:56,386][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000271552_556138496.pth [2024-06-15 15:08:56,844][1652475] Updated weights for policy 0, policy_version 276603 (0.0014) [2024-06-15 15:08:59,365][1652475] Updated weights for policy 0, policy_version 276645 (0.0011) [2024-06-15 15:09:00,738][1648984] Fps is (10 sec: 52428.3, 60 sec: 43690.7, 300 sec: 43320.4). Total num frames: 566624256. Throughput: 0: 10979.5. Samples: 141719040. Policy #0 lag: (min: 18.0, avg: 118.7, max: 274.0) [2024-06-15 15:09:00,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 15:09:02,858][1652475] Updated weights for policy 0, policy_version 276704 (0.0013) [2024-06-15 15:09:05,738][1648984] Fps is (10 sec: 39322.8, 60 sec: 43690.7, 300 sec: 43100.1). Total num frames: 566755328. Throughput: 0: 10888.5. Samples: 141750272. Policy #0 lag: (min: 18.0, avg: 118.7, max: 274.0) [2024-06-15 15:09:05,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 15:09:06,246][1652475] Updated weights for policy 0, policy_version 276756 (0.0070) [2024-06-15 15:09:08,060][1652475] Updated weights for policy 0, policy_version 276801 (0.0012) [2024-06-15 15:09:09,473][1652475] Updated weights for policy 0, policy_version 276851 (0.0012) [2024-06-15 15:09:10,738][1648984] Fps is (10 sec: 45875.3, 60 sec: 44236.9, 300 sec: 43320.4). Total num frames: 567083008. Throughput: 0: 11105.1. Samples: 141821440. Policy #0 lag: (min: 18.0, avg: 118.7, max: 274.0) [2024-06-15 15:09:10,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 15:09:14,523][1652475] Updated weights for policy 0, policy_version 276944 (0.0013) [2024-06-15 15:09:15,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 44236.8, 300 sec: 43098.2). Total num frames: 567279616. Throughput: 0: 10899.9. Samples: 141881856. 
Policy #0 lag: (min: 18.0, avg: 118.7, max: 274.0) [2024-06-15 15:09:15,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 15:09:18,087][1652475] Updated weights for policy 0, policy_version 277024 (0.0020) [2024-06-15 15:09:20,262][1652475] Updated weights for policy 0, policy_version 277076 (0.0012) [2024-06-15 15:09:20,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 44236.8, 300 sec: 43098.3). Total num frames: 567476224. Throughput: 0: 11150.2. Samples: 141918720. Policy #0 lag: (min: 18.0, avg: 118.7, max: 274.0) [2024-06-15 15:09:20,738][1648984] Avg episode reward: [(0, '-0.520')] [2024-06-15 15:09:22,250][1652475] Updated weights for policy 0, policy_version 277157 (0.0014) [2024-06-15 15:09:25,742][1648984] Fps is (10 sec: 39303.9, 60 sec: 43688.7, 300 sec: 43097.6). Total num frames: 567672832. Throughput: 0: 10944.3. Samples: 141983232. Policy #0 lag: (min: 18.0, avg: 118.7, max: 274.0) [2024-06-15 15:09:25,743][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 15:09:28,707][1651340] Signal inference workers to stop experience collection... (14250 times) [2024-06-15 15:09:28,740][1652475] InferenceWorker_p0-w0: stopping experience collection (14250 times) [2024-06-15 15:09:28,943][1651340] Signal inference workers to resume experience collection... (14250 times) [2024-06-15 15:09:28,944][1652475] InferenceWorker_p0-w0: resuming experience collection (14250 times) [2024-06-15 15:09:29,086][1652475] Updated weights for policy 0, policy_version 277220 (0.0012) [2024-06-15 15:09:30,685][1652475] Updated weights for policy 0, policy_version 277281 (0.0013) [2024-06-15 15:09:30,738][1648984] Fps is (10 sec: 39321.3, 60 sec: 44236.8, 300 sec: 43209.3). Total num frames: 567869440. Throughput: 0: 11036.5. Samples: 142047744. 
Policy #0 lag: (min: 18.0, avg: 118.7, max: 274.0) [2024-06-15 15:09:30,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 15:09:31,567][1652475] Updated weights for policy 0, policy_version 277315 (0.0011) [2024-06-15 15:09:33,882][1652475] Updated weights for policy 0, policy_version 277408 (0.0016) [2024-06-15 15:09:35,738][1648984] Fps is (10 sec: 52452.2, 60 sec: 44236.8, 300 sec: 43542.6). Total num frames: 568197120. Throughput: 0: 10808.9. Samples: 142071808. Policy #0 lag: (min: 18.0, avg: 118.7, max: 274.0) [2024-06-15 15:09:35,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 15:09:40,738][1648984] Fps is (10 sec: 32767.5, 60 sec: 41506.0, 300 sec: 42653.9). Total num frames: 568197120. Throughput: 0: 10774.8. Samples: 142141952. Policy #0 lag: (min: 18.0, avg: 118.7, max: 274.0) [2024-06-15 15:09:40,739][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 15:09:41,386][1652475] Updated weights for policy 0, policy_version 277457 (0.0015) [2024-06-15 15:09:42,385][1652475] Updated weights for policy 0, policy_version 277504 (0.0010) [2024-06-15 15:09:44,145][1652475] Updated weights for policy 0, policy_version 277568 (0.0011) [2024-06-15 15:09:45,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 43690.7, 300 sec: 43320.4). Total num frames: 568590336. Throughput: 0: 10729.3. Samples: 142201856. Policy #0 lag: (min: 18.0, avg: 118.7, max: 274.0) [2024-06-15 15:09:45,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 15:09:46,772][1652475] Updated weights for policy 0, policy_version 277680 (0.0096) [2024-06-15 15:09:50,738][1648984] Fps is (10 sec: 52428.5, 60 sec: 43690.4, 300 sec: 42767.2). Total num frames: 568721408. Throughput: 0: 10581.3. Samples: 142226432. 
Policy #0 lag: (min: 18.0, avg: 118.7, max: 274.0) [2024-06-15 15:09:50,739][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 15:09:53,862][1652475] Updated weights for policy 0, policy_version 277728 (0.0103) [2024-06-15 15:09:54,631][1652475] Updated weights for policy 0, policy_version 277758 (0.0011) [2024-06-15 15:09:55,738][1648984] Fps is (10 sec: 29491.0, 60 sec: 42052.4, 300 sec: 43209.3). Total num frames: 568885248. Throughput: 0: 10535.8. Samples: 142295552. Policy #0 lag: (min: 15.0, avg: 75.4, max: 271.0) [2024-06-15 15:09:55,739][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 15:09:56,140][1652475] Updated weights for policy 0, policy_version 277808 (0.0012) [2024-06-15 15:09:58,263][1652475] Updated weights for policy 0, policy_version 277880 (0.0021) [2024-06-15 15:09:59,900][1652475] Updated weights for policy 0, policy_version 277950 (0.0116) [2024-06-15 15:10:00,738][1648984] Fps is (10 sec: 52429.9, 60 sec: 43690.7, 300 sec: 43209.3). Total num frames: 569245696. Throughput: 0: 10478.9. Samples: 142353408. Policy #0 lag: (min: 15.0, avg: 75.4, max: 271.0) [2024-06-15 15:10:00,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 15:10:05,738][1648984] Fps is (10 sec: 42598.7, 60 sec: 42598.4, 300 sec: 42876.1). Total num frames: 569311232. Throughput: 0: 10501.7. Samples: 142391296. Policy #0 lag: (min: 15.0, avg: 75.4, max: 271.0) [2024-06-15 15:10:05,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 15:10:06,053][1652475] Updated weights for policy 0, policy_version 278008 (0.0034) [2024-06-15 15:10:07,945][1652475] Updated weights for policy 0, policy_version 278072 (0.0014) [2024-06-15 15:10:09,945][1652475] Updated weights for policy 0, policy_version 278132 (0.0118) [2024-06-15 15:10:10,776][1648984] Fps is (10 sec: 39170.4, 60 sec: 42571.0, 300 sec: 43536.9). Total num frames: 569638912. Throughput: 0: 10402.8. Samples: 142451712. 
Policy #0 lag: (min: 15.0, avg: 75.4, max: 271.0) [2024-06-15 15:10:10,777][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 15:10:13,519][1651340] Signal inference workers to stop experience collection... (14300 times) [2024-06-15 15:10:13,549][1652475] InferenceWorker_p0-w0: stopping experience collection (14300 times) [2024-06-15 15:10:13,814][1651340] Signal inference workers to resume experience collection... (14300 times) [2024-06-15 15:10:13,822][1652475] InferenceWorker_p0-w0: resuming experience collection (14300 times) [2024-06-15 15:10:14,114][1652475] Updated weights for policy 0, policy_version 278192 (0.0014) [2024-06-15 15:10:15,738][1648984] Fps is (10 sec: 45874.7, 60 sec: 41506.1, 300 sec: 42653.9). Total num frames: 569769984. Throughput: 0: 10524.4. Samples: 142521344. Policy #0 lag: (min: 15.0, avg: 75.4, max: 271.0) [2024-06-15 15:10:15,739][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 15:10:16,864][1652475] Updated weights for policy 0, policy_version 278224 (0.0011) [2024-06-15 15:10:19,362][1652475] Updated weights for policy 0, policy_version 278306 (0.0019) [2024-06-15 15:10:20,738][1648984] Fps is (10 sec: 39474.2, 60 sec: 42598.4, 300 sec: 43320.4). Total num frames: 570032128. Throughput: 0: 10752.0. Samples: 142555648. Policy #0 lag: (min: 15.0, avg: 75.4, max: 271.0) [2024-06-15 15:10:20,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 15:10:21,482][1652475] Updated weights for policy 0, policy_version 278368 (0.0015) [2024-06-15 15:10:25,336][1652475] Updated weights for policy 0, policy_version 278448 (0.0015) [2024-06-15 15:10:25,738][1648984] Fps is (10 sec: 52429.1, 60 sec: 43693.9, 300 sec: 43320.4). Total num frames: 570294272. Throughput: 0: 10649.6. Samples: 142621184. 
Policy #0 lag: (min: 15.0, avg: 75.4, max: 271.0) [2024-06-15 15:10:25,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 15:10:28,645][1652475] Updated weights for policy 0, policy_version 278498 (0.0013) [2024-06-15 15:10:30,185][1652475] Updated weights for policy 0, policy_version 278535 (0.0011) [2024-06-15 15:10:30,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 43144.6, 300 sec: 43209.3). Total num frames: 570458112. Throughput: 0: 10934.0. Samples: 142693888. Policy #0 lag: (min: 15.0, avg: 75.4, max: 271.0) [2024-06-15 15:10:30,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 15:10:31,470][1652475] Updated weights for policy 0, policy_version 278588 (0.0013) [2024-06-15 15:10:33,895][1652475] Updated weights for policy 0, policy_version 278656 (0.0136) [2024-06-15 15:10:35,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 41506.2, 300 sec: 43320.4). Total num frames: 570687488. Throughput: 0: 11013.8. Samples: 142722048. Policy #0 lag: (min: 15.0, avg: 75.4, max: 271.0) [2024-06-15 15:10:35,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 15:10:37,209][1652475] Updated weights for policy 0, policy_version 278708 (0.0108) [2024-06-15 15:10:40,502][1652475] Updated weights for policy 0, policy_version 278768 (0.0111) [2024-06-15 15:10:40,738][1648984] Fps is (10 sec: 45875.1, 60 sec: 45329.2, 300 sec: 42987.2). Total num frames: 570916864. Throughput: 0: 11116.1. Samples: 142795776. Policy #0 lag: (min: 15.0, avg: 75.4, max: 271.0) [2024-06-15 15:10:40,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 15:10:42,348][1652475] Updated weights for policy 0, policy_version 278816 (0.0013) [2024-06-15 15:10:44,797][1652475] Updated weights for policy 0, policy_version 278852 (0.0017) [2024-06-15 15:10:45,738][1648984] Fps is (10 sec: 49151.9, 60 sec: 43144.5, 300 sec: 43431.5). Total num frames: 571179008. Throughput: 0: 11195.7. Samples: 142857216. 
Policy #0 lag: (min: 15.0, avg: 75.4, max: 271.0) [2024-06-15 15:10:45,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 15:10:45,843][1652475] Updated weights for policy 0, policy_version 278912 (0.0015) [2024-06-15 15:10:48,880][1652475] Updated weights for policy 0, policy_version 278967 (0.0015) [2024-06-15 15:10:50,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 43690.8, 300 sec: 43098.3). Total num frames: 571342848. Throughput: 0: 11161.6. Samples: 142893568. Policy #0 lag: (min: 15.0, avg: 75.4, max: 271.0) [2024-06-15 15:10:50,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 15:10:52,034][1652475] Updated weights for policy 0, policy_version 279015 (0.0023) [2024-06-15 15:10:54,515][1652475] Updated weights for policy 0, policy_version 279088 (0.0013) [2024-06-15 15:10:55,738][1648984] Fps is (10 sec: 42597.9, 60 sec: 45329.0, 300 sec: 43542.5). Total num frames: 571604992. Throughput: 0: 11205.3. Samples: 142955520. Policy #0 lag: (min: 15.0, avg: 75.4, max: 271.0) [2024-06-15 15:10:55,739][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 15:10:55,758][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000279104_571604992.pth... [2024-06-15 15:10:55,858][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000274032_561217536.pth [2024-06-15 15:10:55,861][1651340] Saving a milestone train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/milestones/checkpoint_000279104_571604992.pth [2024-06-15 15:10:57,058][1652475] Updated weights for policy 0, policy_version 279136 (0.0013) [2024-06-15 15:11:00,149][1652475] Updated weights for policy 0, policy_version 279184 (0.0012) [2024-06-15 15:11:00,738][1648984] Fps is (10 sec: 45875.2, 60 sec: 42598.4, 300 sec: 43320.4). Total num frames: 571801600. Throughput: 0: 11229.9. Samples: 143026688. 
Policy #0 lag: (min: 15.0, avg: 75.4, max: 271.0) [2024-06-15 15:11:00,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 15:11:02,792][1651340] Signal inference workers to stop experience collection... (14350 times) [2024-06-15 15:11:02,844][1652475] InferenceWorker_p0-w0: stopping experience collection (14350 times) [2024-06-15 15:11:03,133][1651340] Signal inference workers to resume experience collection... (14350 times) [2024-06-15 15:11:03,134][1652475] InferenceWorker_p0-w0: resuming experience collection (14350 times) [2024-06-15 15:11:03,517][1652475] Updated weights for policy 0, policy_version 279264 (0.0017) [2024-06-15 15:11:05,738][1648984] Fps is (10 sec: 42598.9, 60 sec: 45329.0, 300 sec: 43209.3). Total num frames: 572030976. Throughput: 0: 11138.8. Samples: 143056896. Policy #0 lag: (min: 15.0, avg: 75.4, max: 271.0) [2024-06-15 15:11:05,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 15:11:05,860][1652475] Updated weights for policy 0, policy_version 279328 (0.0015) [2024-06-15 15:11:09,154][1652475] Updated weights for policy 0, policy_version 279395 (0.0103) [2024-06-15 15:11:10,738][1648984] Fps is (10 sec: 45875.3, 60 sec: 43718.8, 300 sec: 43542.6). Total num frames: 572260352. Throughput: 0: 11116.1. Samples: 143121408. Policy #0 lag: (min: 31.0, avg: 156.3, max: 287.0) [2024-06-15 15:11:10,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 15:11:13,056][1652475] Updated weights for policy 0, policy_version 279472 (0.0016) [2024-06-15 15:11:15,738][1648984] Fps is (10 sec: 42598.8, 60 sec: 44783.1, 300 sec: 43320.4). Total num frames: 572456960. Throughput: 0: 10945.4. Samples: 143186432. 
Policy #0 lag: (min: 31.0, avg: 156.3, max: 287.0) [2024-06-15 15:11:15,738][1648984] Avg episode reward: [(0, '-0.520')] [2024-06-15 15:11:15,994][1652475] Updated weights for policy 0, policy_version 279550 (0.0015) [2024-06-15 15:11:20,433][1652475] Updated weights for policy 0, policy_version 279620 (0.0013) [2024-06-15 15:11:20,738][1648984] Fps is (10 sec: 42598.2, 60 sec: 44236.8, 300 sec: 43542.6). Total num frames: 572686336. Throughput: 0: 11036.4. Samples: 143218688. Policy #0 lag: (min: 31.0, avg: 156.3, max: 287.0) [2024-06-15 15:11:20,738][1648984] Avg episode reward: [(0, '-0.520')] [2024-06-15 15:11:24,389][1652475] Updated weights for policy 0, policy_version 279713 (0.0013) [2024-06-15 15:11:25,738][1648984] Fps is (10 sec: 45874.2, 60 sec: 43690.6, 300 sec: 43431.5). Total num frames: 572915712. Throughput: 0: 10752.0. Samples: 143279616. Policy #0 lag: (min: 31.0, avg: 156.3, max: 287.0) [2024-06-15 15:11:25,739][1648984] Avg episode reward: [(0, '-0.620')] [2024-06-15 15:11:27,267][1652475] Updated weights for policy 0, policy_version 279765 (0.0020) [2024-06-15 15:11:30,738][1648984] Fps is (10 sec: 36045.1, 60 sec: 43144.6, 300 sec: 43098.3). Total num frames: 573046784. Throughput: 0: 10990.9. Samples: 143351808. Policy #0 lag: (min: 31.0, avg: 156.3, max: 287.0) [2024-06-15 15:11:30,738][1648984] Avg episode reward: [(0, '-0.630')] [2024-06-15 15:11:31,997][1652475] Updated weights for policy 0, policy_version 279845 (0.0149) [2024-06-15 15:11:33,430][1652475] Updated weights for policy 0, policy_version 279908 (0.0012) [2024-06-15 15:11:35,296][1652475] Updated weights for policy 0, policy_version 279958 (0.0037) [2024-06-15 15:11:35,738][1648984] Fps is (10 sec: 45875.9, 60 sec: 44782.9, 300 sec: 43764.7). Total num frames: 573374464. Throughput: 0: 10854.4. Samples: 143382016. 
Policy #0 lag: (min: 31.0, avg: 156.3, max: 287.0) [2024-06-15 15:11:35,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 15:11:39,715][1652475] Updated weights for policy 0, policy_version 280003 (0.0033) [2024-06-15 15:11:40,738][1648984] Fps is (10 sec: 45874.7, 60 sec: 43144.5, 300 sec: 43098.2). Total num frames: 573505536. Throughput: 0: 10922.7. Samples: 143447040. Policy #0 lag: (min: 31.0, avg: 156.3, max: 287.0) [2024-06-15 15:11:40,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 15:11:41,049][1652475] Updated weights for policy 0, policy_version 280063 (0.0025) [2024-06-15 15:11:44,138][1652475] Updated weights for policy 0, policy_version 280115 (0.0013) [2024-06-15 15:11:45,663][1652475] Updated weights for policy 0, policy_version 280176 (0.0011) [2024-06-15 15:11:45,738][1648984] Fps is (10 sec: 42598.2, 60 sec: 43690.7, 300 sec: 43431.5). Total num frames: 573800448. Throughput: 0: 10831.6. Samples: 143514112. Policy #0 lag: (min: 31.0, avg: 156.3, max: 287.0) [2024-06-15 15:11:45,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 15:11:47,435][1652475] Updated weights for policy 0, policy_version 280240 (0.0014) [2024-06-15 15:11:50,738][1648984] Fps is (10 sec: 45875.3, 60 sec: 43690.6, 300 sec: 43432.3). Total num frames: 573964288. Throughput: 0: 10774.7. Samples: 143541760. Policy #0 lag: (min: 31.0, avg: 156.3, max: 287.0) [2024-06-15 15:11:50,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 15:11:52,236][1651340] Signal inference workers to stop experience collection... (14400 times) [2024-06-15 15:11:52,339][1652475] InferenceWorker_p0-w0: stopping experience collection (14400 times) [2024-06-15 15:11:52,468][1651340] Signal inference workers to resume experience collection... 
(14400 times) [2024-06-15 15:11:52,468][1652475] InferenceWorker_p0-w0: resuming experience collection (14400 times) [2024-06-15 15:11:52,655][1652475] Updated weights for policy 0, policy_version 280277 (0.0013) [2024-06-15 15:11:55,731][1652475] Updated weights for policy 0, policy_version 280353 (0.0017) [2024-06-15 15:11:55,738][1648984] Fps is (10 sec: 36044.8, 60 sec: 42598.5, 300 sec: 43320.4). Total num frames: 574160896. Throughput: 0: 11002.3. Samples: 143616512. Policy #0 lag: (min: 31.0, avg: 156.3, max: 287.0) [2024-06-15 15:11:55,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 15:11:58,177][1652475] Updated weights for policy 0, policy_version 280442 (0.0015) [2024-06-15 15:11:59,940][1652475] Updated weights for policy 0, policy_version 280512 (0.0016) [2024-06-15 15:12:00,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 44782.9, 300 sec: 43542.6). Total num frames: 574488576. Throughput: 0: 10615.4. Samples: 143664128. Policy #0 lag: (min: 31.0, avg: 156.3, max: 287.0) [2024-06-15 15:12:00,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 15:12:05,738][1648984] Fps is (10 sec: 45873.9, 60 sec: 43144.3, 300 sec: 43098.2). Total num frames: 574619648. Throughput: 0: 10717.8. Samples: 143700992. Policy #0 lag: (min: 31.0, avg: 156.3, max: 287.0) [2024-06-15 15:12:05,739][1648984] Avg episode reward: [(0, '-0.570')] [2024-06-15 15:12:08,308][1652475] Updated weights for policy 0, policy_version 280624 (0.0015) [2024-06-15 15:12:10,738][1648984] Fps is (10 sec: 26214.0, 60 sec: 41506.0, 300 sec: 43320.4). Total num frames: 574750720. Throughput: 0: 10820.3. Samples: 143766528. 
Policy #0 lag: (min: 31.0, avg: 156.3, max: 287.0) [2024-06-15 15:12:10,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 15:12:11,865][1652475] Updated weights for policy 0, policy_version 280678 (0.0012) [2024-06-15 15:12:13,917][1652475] Updated weights for policy 0, policy_version 280763 (0.0015) [2024-06-15 15:12:15,738][1648984] Fps is (10 sec: 39322.8, 60 sec: 42598.4, 300 sec: 43100.7). Total num frames: 575012864. Throughput: 0: 10456.2. Samples: 143822336. Policy #0 lag: (min: 31.0, avg: 156.3, max: 287.0) [2024-06-15 15:12:15,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 15:12:18,523][1652475] Updated weights for policy 0, policy_version 280833 (0.0013) [2024-06-15 15:12:19,979][1652475] Updated weights for policy 0, policy_version 280895 (0.0010) [2024-06-15 15:12:20,738][1648984] Fps is (10 sec: 52429.4, 60 sec: 43144.5, 300 sec: 43542.6). Total num frames: 575275008. Throughput: 0: 10558.6. Samples: 143857152. Policy #0 lag: (min: 13.0, avg: 134.8, max: 269.0) [2024-06-15 15:12:20,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 15:12:25,392][1652475] Updated weights for policy 0, policy_version 280960 (0.0089) [2024-06-15 15:12:25,739][1648984] Fps is (10 sec: 39321.4, 60 sec: 41506.2, 300 sec: 43542.6). Total num frames: 575406080. Throughput: 0: 10592.7. Samples: 143923712. Policy #0 lag: (min: 13.0, avg: 134.8, max: 269.0) [2024-06-15 15:12:25,740][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 15:12:28,181][1652475] Updated weights for policy 0, policy_version 281040 (0.0013) [2024-06-15 15:12:30,580][1652475] Updated weights for policy 0, policy_version 281094 (0.0027) [2024-06-15 15:12:30,738][1648984] Fps is (10 sec: 39321.9, 60 sec: 43690.7, 300 sec: 43099.5). Total num frames: 575668224. Throughput: 0: 10399.3. Samples: 143982080. 
Policy #0 lag: (min: 13.0, avg: 134.8, max: 269.0) [2024-06-15 15:12:30,738][1648984] Avg episode reward: [(0, '-0.570')] [2024-06-15 15:12:32,071][1652475] Updated weights for policy 0, policy_version 281149 (0.0016) [2024-06-15 15:12:35,738][1648984] Fps is (10 sec: 39321.9, 60 sec: 40413.9, 300 sec: 43098.3). Total num frames: 575799296. Throughput: 0: 10479.0. Samples: 144013312. Policy #0 lag: (min: 13.0, avg: 134.8, max: 269.0) [2024-06-15 15:12:35,738][1648984] Avg episode reward: [(0, '-0.540')] [2024-06-15 15:12:37,406][1652475] Updated weights for policy 0, policy_version 281200 (0.0018) [2024-06-15 15:12:39,411][1651340] Signal inference workers to stop experience collection... (14450 times) [2024-06-15 15:12:39,445][1652475] InferenceWorker_p0-w0: stopping experience collection (14450 times) [2024-06-15 15:12:39,687][1651340] Signal inference workers to resume experience collection... (14450 times) [2024-06-15 15:12:39,688][1652475] InferenceWorker_p0-w0: resuming experience collection (14450 times) [2024-06-15 15:12:40,365][1652475] Updated weights for policy 0, policy_version 281282 (0.0022) [2024-06-15 15:12:40,738][1648984] Fps is (10 sec: 42598.2, 60 sec: 43144.6, 300 sec: 43209.3). Total num frames: 576094208. Throughput: 0: 10342.4. Samples: 144081920. Policy #0 lag: (min: 13.0, avg: 134.8, max: 269.0) [2024-06-15 15:12:40,738][1648984] Avg episode reward: [(0, '-0.530')] [2024-06-15 15:12:41,704][1652475] Updated weights for policy 0, policy_version 281336 (0.0011) [2024-06-15 15:12:43,767][1652475] Updated weights for policy 0, policy_version 281401 (0.0013) [2024-06-15 15:12:45,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 42052.3, 300 sec: 43431.5). Total num frames: 576323584. Throughput: 0: 10729.2. Samples: 144146944. 
Policy #0 lag: (min: 13.0, avg: 134.8, max: 269.0) [2024-06-15 15:12:45,738][1648984] Avg episode reward: [(0, '-0.530')] [2024-06-15 15:12:49,536][1652475] Updated weights for policy 0, policy_version 281469 (0.0013) [2024-06-15 15:12:50,692][1652475] Updated weights for policy 0, policy_version 281520 (0.0036) [2024-06-15 15:12:50,738][1648984] Fps is (10 sec: 45874.0, 60 sec: 43144.4, 300 sec: 43653.6). Total num frames: 576552960. Throughput: 0: 10797.5. Samples: 144186880. Policy #0 lag: (min: 13.0, avg: 134.8, max: 269.0) [2024-06-15 15:12:50,739][1648984] Avg episode reward: [(0, '-0.530')] [2024-06-15 15:12:52,257][1652475] Updated weights for policy 0, policy_version 281570 (0.0014) [2024-06-15 15:12:54,860][1652475] Updated weights for policy 0, policy_version 281633 (0.0012) [2024-06-15 15:12:55,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 44782.9, 300 sec: 43542.6). Total num frames: 576847872. Throughput: 0: 10752.0. Samples: 144250368. Policy #0 lag: (min: 13.0, avg: 134.8, max: 269.0) [2024-06-15 15:12:55,738][1648984] Avg episode reward: [(0, '-0.530')] [2024-06-15 15:12:55,750][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000281664_576847872.pth... [2024-06-15 15:12:55,816][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000276576_566427648.pth [2024-06-15 15:13:00,589][1652475] Updated weights for policy 0, policy_version 281682 (0.0012) [2024-06-15 15:13:00,738][1648984] Fps is (10 sec: 32768.9, 60 sec: 39867.7, 300 sec: 43209.3). Total num frames: 576880640. Throughput: 0: 11127.5. Samples: 144323072. 
Policy #0 lag: (min: 13.0, avg: 134.8, max: 269.0) [2024-06-15 15:13:00,738][1648984] Avg episode reward: [(0, '-0.530')] [2024-06-15 15:13:02,978][1652475] Updated weights for policy 0, policy_version 281785 (0.0023) [2024-06-15 15:13:04,865][1652475] Updated weights for policy 0, policy_version 281851 (0.0012) [2024-06-15 15:13:05,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 43690.9, 300 sec: 43431.5). Total num frames: 577241088. Throughput: 0: 10831.7. Samples: 144344576. Policy #0 lag: (min: 13.0, avg: 134.8, max: 269.0) [2024-06-15 15:13:05,738][1648984] Avg episode reward: [(0, '-0.530')] [2024-06-15 15:13:07,793][1652475] Updated weights for policy 0, policy_version 281911 (0.0101) [2024-06-15 15:13:10,738][1648984] Fps is (10 sec: 49151.8, 60 sec: 43690.7, 300 sec: 43209.3). Total num frames: 577372160. Throughput: 0: 10843.0. Samples: 144411648. Policy #0 lag: (min: 13.0, avg: 134.8, max: 269.0) [2024-06-15 15:13:10,738][1648984] Avg episode reward: [(0, '-0.530')] [2024-06-15 15:13:13,312][1652475] Updated weights for policy 0, policy_version 281953 (0.0014) [2024-06-15 15:13:15,319][1652475] Updated weights for policy 0, policy_version 282036 (0.0013) [2024-06-15 15:13:15,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 43690.7, 300 sec: 43431.5). Total num frames: 577634304. Throughput: 0: 10945.4. Samples: 144474624. Policy #0 lag: (min: 13.0, avg: 134.8, max: 269.0) [2024-06-15 15:13:15,740][1648984] Avg episode reward: [(0, '-0.530')] [2024-06-15 15:13:16,314][1652475] Updated weights for policy 0, policy_version 282066 (0.0012) [2024-06-15 15:13:17,292][1652475] Updated weights for policy 0, policy_version 282112 (0.0012) [2024-06-15 15:13:19,625][1652475] Updated weights for policy 0, policy_version 282175 (0.0013) [2024-06-15 15:13:20,738][1648984] Fps is (10 sec: 52429.3, 60 sec: 43690.7, 300 sec: 43542.8). Total num frames: 577896448. Throughput: 0: 11059.2. Samples: 144510976. 
Policy #0 lag: (min: 13.0, avg: 134.8, max: 269.0) [2024-06-15 15:13:20,738][1648984] Avg episode reward: [(0, '-0.530')] [2024-06-15 15:13:25,738][1648984] Fps is (10 sec: 32767.9, 60 sec: 42598.4, 300 sec: 43209.3). Total num frames: 577961984. Throughput: 0: 11116.1. Samples: 144582144. Policy #0 lag: (min: 13.0, avg: 134.8, max: 269.0) [2024-06-15 15:13:25,738][1648984] Avg episode reward: [(0, '-0.530')] [2024-06-15 15:13:26,030][1651340] Signal inference workers to stop experience collection... (14500 times) [2024-06-15 15:13:26,064][1652475] InferenceWorker_p0-w0: stopping experience collection (14500 times) [2024-06-15 15:13:26,199][1651340] Signal inference workers to resume experience collection... (14500 times) [2024-06-15 15:13:26,200][1652475] InferenceWorker_p0-w0: resuming experience collection (14500 times) [2024-06-15 15:13:26,420][1652475] Updated weights for policy 0, policy_version 282241 (0.0014) [2024-06-15 15:13:29,069][1652475] Updated weights for policy 0, policy_version 282336 (0.0014) [2024-06-15 15:13:30,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 43690.7, 300 sec: 43209.3). Total num frames: 578289664. Throughput: 0: 10774.8. Samples: 144631808. Policy #0 lag: (min: 13.0, avg: 134.8, max: 269.0) [2024-06-15 15:13:30,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 15:13:31,628][1652475] Updated weights for policy 0, policy_version 282384 (0.0013) [2024-06-15 15:13:35,738][1648984] Fps is (10 sec: 45875.2, 60 sec: 43690.6, 300 sec: 43098.2). Total num frames: 578420736. Throughput: 0: 10592.8. Samples: 144663552. Policy #0 lag: (min: 3.0, avg: 151.5, max: 259.0) [2024-06-15 15:13:35,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 15:13:38,227][1652475] Updated weights for policy 0, policy_version 282467 (0.0015) [2024-06-15 15:13:40,354][1652475] Updated weights for policy 0, policy_version 282544 (0.0015) [2024-06-15 15:13:40,738][1648984] Fps is (10 sec: 36044.7, 60 sec: 42598.4, 300 sec: 42987.2). 
Total num frames: 578650112. Throughput: 0: 10649.6. Samples: 144729600. Policy #0 lag: (min: 3.0, avg: 151.5, max: 259.0) [2024-06-15 15:13:40,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 15:13:42,259][1652475] Updated weights for policy 0, policy_version 282624 (0.0194) [2024-06-15 15:13:45,738][1648984] Fps is (10 sec: 39321.1, 60 sec: 41506.0, 300 sec: 43098.2). Total num frames: 578813952. Throughput: 0: 10319.6. Samples: 144787456. Policy #0 lag: (min: 3.0, avg: 151.5, max: 259.0) [2024-06-15 15:13:45,739][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 15:13:47,322][1652475] Updated weights for policy 0, policy_version 282679 (0.0014) [2024-06-15 15:13:50,738][1648984] Fps is (10 sec: 32768.2, 60 sec: 40414.1, 300 sec: 42765.1). Total num frames: 578977792. Throughput: 0: 10524.5. Samples: 144818176. Policy #0 lag: (min: 3.0, avg: 151.5, max: 259.0) [2024-06-15 15:13:50,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 15:13:51,525][1652475] Updated weights for policy 0, policy_version 282738 (0.0017) [2024-06-15 15:13:52,719][1652475] Updated weights for policy 0, policy_version 282800 (0.0086) [2024-06-15 15:13:54,769][1652475] Updated weights for policy 0, policy_version 282880 (0.0019) [2024-06-15 15:13:55,738][1648984] Fps is (10 sec: 52428.2, 60 sec: 41506.0, 300 sec: 43098.2). Total num frames: 579338240. Throughput: 0: 10228.6. Samples: 144871936. Policy #0 lag: (min: 3.0, avg: 151.5, max: 259.0) [2024-06-15 15:13:55,739][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 15:14:00,203][1652475] Updated weights for policy 0, policy_version 282942 (0.0013) [2024-06-15 15:14:00,738][1648984] Fps is (10 sec: 49151.9, 60 sec: 43144.6, 300 sec: 43098.3). Total num frames: 579469312. Throughput: 0: 10274.1. Samples: 144936960. 
Policy #0 lag: (min: 3.0, avg: 151.5, max: 259.0) [2024-06-15 15:14:00,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 15:14:04,300][1652475] Updated weights for policy 0, policy_version 282999 (0.0013) [2024-06-15 15:14:05,493][1652475] Updated weights for policy 0, policy_version 283040 (0.0012) [2024-06-15 15:14:05,738][1648984] Fps is (10 sec: 32768.6, 60 sec: 40413.8, 300 sec: 42653.9). Total num frames: 579665920. Throughput: 0: 10251.4. Samples: 144972288. Policy #0 lag: (min: 3.0, avg: 151.5, max: 259.0) [2024-06-15 15:14:05,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 15:14:07,170][1652475] Updated weights for policy 0, policy_version 283121 (0.0091) [2024-06-15 15:14:10,738][1648984] Fps is (10 sec: 39321.3, 60 sec: 41506.2, 300 sec: 42653.9). Total num frames: 579862528. Throughput: 0: 9955.6. Samples: 145030144. Policy #0 lag: (min: 3.0, avg: 151.5, max: 259.0) [2024-06-15 15:14:10,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 15:14:10,757][1651340] Signal inference workers to stop experience collection... (14550 times) [2024-06-15 15:14:10,804][1652475] InferenceWorker_p0-w0: stopping experience collection (14550 times) [2024-06-15 15:14:10,978][1651340] Signal inference workers to resume experience collection... (14550 times) [2024-06-15 15:14:10,979][1652475] InferenceWorker_p0-w0: resuming experience collection (14550 times) [2024-06-15 15:14:11,371][1652475] Updated weights for policy 0, policy_version 283168 (0.0022) [2024-06-15 15:14:15,738][1648984] Fps is (10 sec: 36045.0, 60 sec: 39867.7, 300 sec: 42542.9). Total num frames: 580026368. Throughput: 0: 10433.4. Samples: 145101312. 
Policy #0 lag: (min: 3.0, avg: 151.5, max: 259.0) [2024-06-15 15:14:15,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 15:14:16,615][1652475] Updated weights for policy 0, policy_version 283248 (0.0014) [2024-06-15 15:14:18,149][1652475] Updated weights for policy 0, policy_version 283314 (0.0012) [2024-06-15 15:14:19,370][1652475] Updated weights for policy 0, policy_version 283360 (0.0071) [2024-06-15 15:14:20,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 41506.1, 300 sec: 43098.9). Total num frames: 580386816. Throughput: 0: 10251.4. Samples: 145124864. Policy #0 lag: (min: 3.0, avg: 151.5, max: 259.0) [2024-06-15 15:14:20,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 15:14:23,311][1652475] Updated weights for policy 0, policy_version 283408 (0.0145) [2024-06-15 15:14:24,303][1652475] Updated weights for policy 0, policy_version 283456 (0.0015) [2024-06-15 15:14:25,738][1648984] Fps is (10 sec: 49151.1, 60 sec: 42598.3, 300 sec: 42876.1). Total num frames: 580517888. Throughput: 0: 10319.6. Samples: 145193984. Policy #0 lag: (min: 3.0, avg: 151.5, max: 259.0) [2024-06-15 15:14:25,739][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 15:14:28,820][1652475] Updated weights for policy 0, policy_version 283518 (0.0012) [2024-06-15 15:14:30,535][1652475] Updated weights for policy 0, policy_version 283587 (0.0014) [2024-06-15 15:14:30,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 42052.2, 300 sec: 42765.0). Total num frames: 580812800. Throughput: 0: 10501.7. Samples: 145260032. Policy #0 lag: (min: 3.0, avg: 151.5, max: 259.0) [2024-06-15 15:14:30,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 15:14:31,985][1652475] Updated weights for policy 0, policy_version 283648 (0.0013) [2024-06-15 15:14:35,738][1648984] Fps is (10 sec: 45875.9, 60 sec: 42598.4, 300 sec: 43320.4). Total num frames: 580976640. Throughput: 0: 10478.9. Samples: 145289728. 
Policy #0 lag: (min: 3.0, avg: 151.5, max: 259.0) [2024-06-15 15:14:35,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 15:14:36,143][1652475] Updated weights for policy 0, policy_version 283712 (0.0015) [2024-06-15 15:14:40,738][1648984] Fps is (10 sec: 26214.3, 60 sec: 40413.8, 300 sec: 42320.7). Total num frames: 581074944. Throughput: 0: 10797.6. Samples: 145357824. Policy #0 lag: (min: 3.0, avg: 151.5, max: 259.0) [2024-06-15 15:14:40,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 15:14:41,782][1652475] Updated weights for policy 0, policy_version 283778 (0.0014) [2024-06-15 15:14:44,740][1652475] Updated weights for policy 0, policy_version 283897 (0.0013) [2024-06-15 15:14:45,739][1648984] Fps is (10 sec: 45874.3, 60 sec: 43690.6, 300 sec: 43098.3). Total num frames: 581435392. Throughput: 0: 10501.6. Samples: 145409536. Policy #0 lag: (min: 93.0, avg: 194.3, max: 351.0) [2024-06-15 15:14:45,740][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 15:14:48,432][1652475] Updated weights for policy 0, policy_version 283959 (0.0024) [2024-06-15 15:14:50,738][1648984] Fps is (10 sec: 49152.1, 60 sec: 43144.5, 300 sec: 42987.2). Total num frames: 581566464. Throughput: 0: 10365.2. Samples: 145438720. Policy #0 lag: (min: 93.0, avg: 194.3, max: 351.0) [2024-06-15 15:14:50,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 15:14:53,861][1652475] Updated weights for policy 0, policy_version 284025 (0.0120) [2024-06-15 15:14:55,374][1652475] Updated weights for policy 0, policy_version 284065 (0.0012) [2024-06-15 15:14:55,738][1648984] Fps is (10 sec: 36045.3, 60 sec: 40960.1, 300 sec: 42542.9). Total num frames: 581795840. Throughput: 0: 10672.3. Samples: 145510400. Policy #0 lag: (min: 93.0, avg: 194.3, max: 351.0) [2024-06-15 15:14:55,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 15:14:55,804][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000284096_581828608.pth... 
[2024-06-15 15:14:55,869][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000279104_571604992.pth [2024-06-15 15:14:56,381][1651340] Signal inference workers to stop experience collection... (14600 times) [2024-06-15 15:14:56,434][1652475] InferenceWorker_p0-w0: stopping experience collection (14600 times) [2024-06-15 15:14:56,610][1651340] Signal inference workers to resume experience collection... (14600 times) [2024-06-15 15:14:56,611][1652475] InferenceWorker_p0-w0: resuming experience collection (14600 times) [2024-06-15 15:14:57,467][1652475] Updated weights for policy 0, policy_version 284158 (0.0016) [2024-06-15 15:14:59,496][1652475] Updated weights for policy 0, policy_version 284196 (0.0016) [2024-06-15 15:15:00,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 43690.6, 300 sec: 43320.4). Total num frames: 582090752. Throughput: 0: 10422.0. Samples: 145570304. Policy #0 lag: (min: 93.0, avg: 194.3, max: 351.0) [2024-06-15 15:15:00,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 15:15:05,738][1648984] Fps is (10 sec: 39322.1, 60 sec: 42052.3, 300 sec: 42548.4). Total num frames: 582189056. Throughput: 0: 10820.3. Samples: 145611776. Policy #0 lag: (min: 93.0, avg: 194.3, max: 351.0) [2024-06-15 15:15:05,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 15:15:05,781][1652475] Updated weights for policy 0, policy_version 284281 (0.0014) [2024-06-15 15:15:06,832][1652475] Updated weights for policy 0, policy_version 284320 (0.0012) [2024-06-15 15:15:08,572][1652475] Updated weights for policy 0, policy_version 284368 (0.0044) [2024-06-15 15:15:09,506][1652475] Updated weights for policy 0, policy_version 284409 (0.0018) [2024-06-15 15:15:10,738][1648984] Fps is (10 sec: 42598.1, 60 sec: 44236.8, 300 sec: 43209.3). Total num frames: 582516736. Throughput: 0: 10638.3. Samples: 145672704. 
Policy #0 lag: (min: 93.0, avg: 194.3, max: 351.0) [2024-06-15 15:15:10,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 15:15:15,746][1648984] Fps is (10 sec: 42562.0, 60 sec: 43138.4, 300 sec: 42652.7). Total num frames: 582615040. Throughput: 0: 10795.5. Samples: 145745920. Policy #0 lag: (min: 93.0, avg: 194.3, max: 351.0) [2024-06-15 15:15:15,747][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 15:15:16,487][1652475] Updated weights for policy 0, policy_version 284482 (0.0014) [2024-06-15 15:15:17,843][1652475] Updated weights for policy 0, policy_version 284538 (0.0012) [2024-06-15 15:15:19,298][1652475] Updated weights for policy 0, policy_version 284608 (0.0013) [2024-06-15 15:15:20,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 42052.3, 300 sec: 42765.0). Total num frames: 582909952. Throughput: 0: 10922.7. Samples: 145781248. Policy #0 lag: (min: 93.0, avg: 194.3, max: 351.0) [2024-06-15 15:15:20,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 15:15:22,176][1652475] Updated weights for policy 0, policy_version 284676 (0.0082) [2024-06-15 15:15:23,225][1652475] Updated weights for policy 0, policy_version 284725 (0.0011) [2024-06-15 15:15:25,738][1648984] Fps is (10 sec: 52473.6, 60 sec: 43690.8, 300 sec: 42987.2). Total num frames: 583139328. Throughput: 0: 10638.2. Samples: 145836544. Policy #0 lag: (min: 93.0, avg: 194.3, max: 351.0) [2024-06-15 15:15:25,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 15:15:29,497][1652475] Updated weights for policy 0, policy_version 284784 (0.0017) [2024-06-15 15:15:30,739][1648984] Fps is (10 sec: 39321.6, 60 sec: 41506.1, 300 sec: 42765.0). Total num frames: 583303168. Throughput: 0: 11047.9. Samples: 145906688. 
Policy #0 lag: (min: 93.0, avg: 194.3, max: 351.0) [2024-06-15 15:15:30,740][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 15:15:31,491][1652475] Updated weights for policy 0, policy_version 284864 (0.0113) [2024-06-15 15:15:35,226][1652475] Updated weights for policy 0, policy_version 284976 (0.0155) [2024-06-15 15:15:35,746][1648984] Fps is (10 sec: 52385.8, 60 sec: 44776.8, 300 sec: 43208.1). Total num frames: 583663616. Throughput: 0: 11136.8. Samples: 145939968. Policy #0 lag: (min: 93.0, avg: 194.3, max: 351.0) [2024-06-15 15:15:35,746][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 15:15:40,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 43690.6, 300 sec: 42431.8). Total num frames: 583696384. Throughput: 0: 11002.3. Samples: 146005504. Policy #0 lag: (min: 93.0, avg: 194.3, max: 351.0) [2024-06-15 15:15:40,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 15:15:41,174][1652475] Updated weights for policy 0, policy_version 285040 (0.0015) [2024-06-15 15:15:42,157][1651340] Signal inference workers to stop experience collection... (14650 times) [2024-06-15 15:15:42,189][1652475] InferenceWorker_p0-w0: stopping experience collection (14650 times) [2024-06-15 15:15:42,389][1651340] Signal inference workers to resume experience collection... (14650 times) [2024-06-15 15:15:42,393][1652475] InferenceWorker_p0-w0: resuming experience collection (14650 times) [2024-06-15 15:15:42,959][1652475] Updated weights for policy 0, policy_version 285116 (0.0014) [2024-06-15 15:15:45,505][1652475] Updated weights for policy 0, policy_version 285168 (0.0014) [2024-06-15 15:15:45,738][1648984] Fps is (10 sec: 36074.3, 60 sec: 43144.7, 300 sec: 42987.2). Total num frames: 584024064. Throughput: 0: 11138.8. Samples: 146071552. 
Policy #0 lag: (min: 93.0, avg: 194.3, max: 351.0) [2024-06-15 15:15:45,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 15:15:47,596][1652475] Updated weights for policy 0, policy_version 285248 (0.0087) [2024-06-15 15:15:50,738][1648984] Fps is (10 sec: 49152.5, 60 sec: 43690.7, 300 sec: 42654.0). Total num frames: 584187904. Throughput: 0: 10786.1. Samples: 146097152. Policy #0 lag: (min: 93.0, avg: 194.3, max: 351.0) [2024-06-15 15:15:50,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 15:15:53,465][1652475] Updated weights for policy 0, policy_version 285305 (0.0013) [2024-06-15 15:15:55,184][1652475] Updated weights for policy 0, policy_version 285376 (0.0012) [2024-06-15 15:15:55,738][1648984] Fps is (10 sec: 42597.6, 60 sec: 44236.7, 300 sec: 42876.1). Total num frames: 584450048. Throughput: 0: 11025.0. Samples: 146168832. Policy #0 lag: (min: 93.0, avg: 194.3, max: 351.0) [2024-06-15 15:15:55,739][1648984] Avg episode reward: [(0, '-0.250')] [2024-06-15 15:15:57,994][1652475] Updated weights for policy 0, policy_version 285435 (0.0013) [2024-06-15 15:16:00,270][1652475] Updated weights for policy 0, policy_version 285496 (0.0014) [2024-06-15 15:16:00,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 43690.7, 300 sec: 42987.2). Total num frames: 584712192. Throughput: 0: 10754.0. Samples: 146229760. Policy #0 lag: (min: 3.0, avg: 120.8, max: 259.0) [2024-06-15 15:16:00,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 15:16:04,530][1652475] Updated weights for policy 0, policy_version 285561 (0.0029) [2024-06-15 15:16:05,741][1648984] Fps is (10 sec: 39310.4, 60 sec: 44234.5, 300 sec: 42653.5). Total num frames: 584843264. Throughput: 0: 10774.0. Samples: 146266112. 
Policy #0 lag: (min: 3.0, avg: 120.8, max: 259.0) [2024-06-15 15:16:05,742][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 15:16:07,019][1652475] Updated weights for policy 0, policy_version 285619 (0.0117) [2024-06-15 15:16:09,478][1652475] Updated weights for policy 0, policy_version 285680 (0.0015) [2024-06-15 15:16:10,738][1648984] Fps is (10 sec: 39320.8, 60 sec: 43144.5, 300 sec: 42876.1). Total num frames: 585105408. Throughput: 0: 10934.0. Samples: 146328576. Policy #0 lag: (min: 3.0, avg: 120.8, max: 259.0) [2024-06-15 15:16:10,739][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 15:16:13,097][1652475] Updated weights for policy 0, policy_version 285750 (0.0011) [2024-06-15 15:16:15,738][1648984] Fps is (10 sec: 39333.6, 60 sec: 43696.9, 300 sec: 42542.9). Total num frames: 585236480. Throughput: 0: 10888.5. Samples: 146396672. Policy #0 lag: (min: 3.0, avg: 120.8, max: 259.0) [2024-06-15 15:16:15,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 15:16:16,955][1652475] Updated weights for policy 0, policy_version 285779 (0.0013) [2024-06-15 15:16:19,010][1652475] Updated weights for policy 0, policy_version 285872 (0.0014) [2024-06-15 15:16:20,307][1652475] Updated weights for policy 0, policy_version 285920 (0.0011) [2024-06-15 15:16:20,738][1648984] Fps is (10 sec: 49152.9, 60 sec: 44783.0, 300 sec: 42987.2). Total num frames: 585596928. Throughput: 0: 10833.6. Samples: 146427392. Policy #0 lag: (min: 3.0, avg: 120.8, max: 259.0) [2024-06-15 15:16:20,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 15:16:21,004][1652475] Updated weights for policy 0, policy_version 285949 (0.0010) [2024-06-15 15:16:24,424][1652475] Updated weights for policy 0, policy_version 286000 (0.0032) [2024-06-15 15:16:25,738][1648984] Fps is (10 sec: 52428.5, 60 sec: 43690.6, 300 sec: 43098.2). Total num frames: 585760768. Throughput: 0: 10865.8. Samples: 146494464. 
Policy #0 lag: (min: 3.0, avg: 120.8, max: 259.0) [2024-06-15 15:16:25,739][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 15:16:28,950][1652475] Updated weights for policy 0, policy_version 286034 (0.0034) [2024-06-15 15:16:29,919][1651340] Signal inference workers to stop experience collection... (14700 times) [2024-06-15 15:16:29,976][1652475] Updated weights for policy 0, policy_version 286086 (0.0016) [2024-06-15 15:16:29,999][1652475] InferenceWorker_p0-w0: stopping experience collection (14700 times) [2024-06-15 15:16:30,101][1651340] Signal inference workers to resume experience collection... (14700 times) [2024-06-15 15:16:30,102][1652475] InferenceWorker_p0-w0: resuming experience collection (14700 times) [2024-06-15 15:16:30,738][1648984] Fps is (10 sec: 36044.8, 60 sec: 44236.8, 300 sec: 42653.9). Total num frames: 585957376. Throughput: 0: 10911.3. Samples: 146562560. Policy #0 lag: (min: 3.0, avg: 120.8, max: 259.0) [2024-06-15 15:16:30,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 15:16:31,237][1652475] Updated weights for policy 0, policy_version 286146 (0.0096) [2024-06-15 15:16:34,842][1652475] Updated weights for policy 0, policy_version 286211 (0.0015) [2024-06-15 15:16:35,738][1648984] Fps is (10 sec: 45875.6, 60 sec: 42604.2, 300 sec: 43098.3). Total num frames: 586219520. Throughput: 0: 11036.4. Samples: 146593792. Policy #0 lag: (min: 3.0, avg: 120.8, max: 259.0) [2024-06-15 15:16:35,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 15:16:40,738][1648984] Fps is (10 sec: 32767.8, 60 sec: 43144.5, 300 sec: 42320.7). Total num frames: 586285056. Throughput: 0: 11013.7. Samples: 146664448. 
Policy #0 lag: (min: 3.0, avg: 120.8, max: 259.0) [2024-06-15 15:16:40,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 15:16:40,879][1652475] Updated weights for policy 0, policy_version 286288 (0.0142) [2024-06-15 15:16:43,261][1652475] Updated weights for policy 0, policy_version 286385 (0.0014) [2024-06-15 15:16:44,686][1652475] Updated weights for policy 0, policy_version 286448 (0.0013) [2024-06-15 15:16:45,738][1648984] Fps is (10 sec: 45874.2, 60 sec: 44236.7, 300 sec: 43098.2). Total num frames: 586678272. Throughput: 0: 11002.3. Samples: 146724864. Policy #0 lag: (min: 3.0, avg: 120.8, max: 259.0) [2024-06-15 15:16:45,739][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 15:16:46,561][1652475] Updated weights for policy 0, policy_version 286483 (0.0013) [2024-06-15 15:16:50,738][1648984] Fps is (10 sec: 52429.1, 60 sec: 43690.7, 300 sec: 42876.1). Total num frames: 586809344. Throughput: 0: 10912.0. Samples: 146757120. Policy #0 lag: (min: 3.0, avg: 120.8, max: 259.0) [2024-06-15 15:16:50,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 15:16:52,833][1652475] Updated weights for policy 0, policy_version 286529 (0.0028) [2024-06-15 15:16:54,092][1652475] Updated weights for policy 0, policy_version 286592 (0.0109) [2024-06-15 15:16:55,770][1648984] Fps is (10 sec: 35928.8, 60 sec: 43121.3, 300 sec: 42538.2). Total num frames: 587038720. Throughput: 0: 11142.2. Samples: 146830336. Policy #0 lag: (min: 3.0, avg: 120.8, max: 259.0) [2024-06-15 15:16:55,771][1648984] Avg episode reward: [(0, '-0.560')] [2024-06-15 15:16:56,292][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000286672_587104256.pth... 
[2024-06-15 15:16:56,465][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000281664_576847872.pth [2024-06-15 15:16:56,729][1652475] Updated weights for policy 0, policy_version 286688 (0.0012) [2024-06-15 15:16:57,875][1652475] Updated weights for policy 0, policy_version 286721 (0.0012) [2024-06-15 15:17:00,738][1648984] Fps is (10 sec: 52428.5, 60 sec: 43690.6, 300 sec: 43098.3). Total num frames: 587333632. Throughput: 0: 10672.3. Samples: 146876928. Policy #0 lag: (min: 3.0, avg: 120.8, max: 259.0) [2024-06-15 15:17:00,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 15:17:05,272][1652475] Updated weights for policy 0, policy_version 286785 (0.0014) [2024-06-15 15:17:05,738][1648984] Fps is (10 sec: 32874.8, 60 sec: 42054.4, 300 sec: 42765.0). Total num frames: 587366400. Throughput: 0: 10831.6. Samples: 146914816. Policy #0 lag: (min: 3.0, avg: 120.8, max: 259.0) [2024-06-15 15:17:05,738][1648984] Avg episode reward: [(0, '-0.530')] [2024-06-15 15:17:06,732][1652475] Updated weights for policy 0, policy_version 286848 (0.0041) [2024-06-15 15:17:10,071][1652475] Updated weights for policy 0, policy_version 286930 (0.0014) [2024-06-15 15:17:10,738][1648984] Fps is (10 sec: 36044.7, 60 sec: 43144.6, 300 sec: 42987.2). Total num frames: 587694080. Throughput: 0: 10740.6. Samples: 146977792. Policy #0 lag: (min: 63.0, avg: 162.4, max: 338.0) [2024-06-15 15:17:10,739][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 15:17:11,527][1651340] Signal inference workers to stop experience collection... (14750 times) [2024-06-15 15:17:11,720][1652475] InferenceWorker_p0-w0: stopping experience collection (14750 times) [2024-06-15 15:17:11,799][1651340] Signal inference workers to resume experience collection... 
(14750 times) [2024-06-15 15:17:11,800][1652475] InferenceWorker_p0-w0: resuming experience collection (14750 times) [2024-06-15 15:17:11,970][1652475] Updated weights for policy 0, policy_version 287010 (0.0111) [2024-06-15 15:17:15,738][1648984] Fps is (10 sec: 49151.2, 60 sec: 43690.5, 300 sec: 42653.9). Total num frames: 587857920. Throughput: 0: 10638.2. Samples: 147041280. Policy #0 lag: (min: 63.0, avg: 162.4, max: 338.0) [2024-06-15 15:17:15,739][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 15:17:17,103][1652475] Updated weights for policy 0, policy_version 287045 (0.0016) [2024-06-15 15:17:18,461][1652475] Updated weights for policy 0, policy_version 287103 (0.0011) [2024-06-15 15:17:20,738][1648984] Fps is (10 sec: 39322.1, 60 sec: 41506.2, 300 sec: 42987.2). Total num frames: 588087296. Throughput: 0: 10774.8. Samples: 147078656. Policy #0 lag: (min: 63.0, avg: 162.4, max: 338.0) [2024-06-15 15:17:20,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 15:17:20,821][1652475] Updated weights for policy 0, policy_version 287168 (0.0013) [2024-06-15 15:17:22,345][1652475] Updated weights for policy 0, policy_version 287232 (0.0099) [2024-06-15 15:17:23,413][1652475] Updated weights for policy 0, policy_version 287286 (0.0013) [2024-06-15 15:17:25,738][1648984] Fps is (10 sec: 52429.7, 60 sec: 43690.7, 300 sec: 43098.2). Total num frames: 588382208. Throughput: 0: 10570.0. Samples: 147140096. Policy #0 lag: (min: 63.0, avg: 162.4, max: 338.0) [2024-06-15 15:17:25,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 15:17:29,791][1652475] Updated weights for policy 0, policy_version 287344 (0.0015) [2024-06-15 15:17:30,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 42598.4, 300 sec: 43098.3). Total num frames: 588513280. Throughput: 0: 10854.5. Samples: 147213312. 
Policy #0 lag: (min: 63.0, avg: 162.4, max: 338.0) [2024-06-15 15:17:30,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 15:17:32,086][1652475] Updated weights for policy 0, policy_version 287395 (0.0153) [2024-06-15 15:17:34,178][1652475] Updated weights for policy 0, policy_version 287488 (0.0026) [2024-06-15 15:17:35,496][1652475] Updated weights for policy 0, policy_version 287546 (0.0013) [2024-06-15 15:17:35,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 44782.9, 300 sec: 43431.5). Total num frames: 588906496. Throughput: 0: 10729.2. Samples: 147239936. Policy #0 lag: (min: 63.0, avg: 162.4, max: 338.0) [2024-06-15 15:17:35,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 15:17:40,738][1648984] Fps is (10 sec: 45872.8, 60 sec: 44782.6, 300 sec: 42876.0). Total num frames: 588972032. Throughput: 0: 10782.4. Samples: 147315200. Policy #0 lag: (min: 63.0, avg: 162.4, max: 338.0) [2024-06-15 15:17:40,739][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 15:17:41,193][1652475] Updated weights for policy 0, policy_version 287611 (0.0051) [2024-06-15 15:17:44,530][1652475] Updated weights for policy 0, policy_version 287665 (0.0014) [2024-06-15 15:17:45,738][1648984] Fps is (10 sec: 32767.8, 60 sec: 42598.5, 300 sec: 42987.2). Total num frames: 589234176. Throughput: 0: 11161.6. Samples: 147379200. Policy #0 lag: (min: 63.0, avg: 162.4, max: 338.0) [2024-06-15 15:17:45,739][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 15:17:46,318][1652475] Updated weights for policy 0, policy_version 287744 (0.0013) [2024-06-15 15:17:47,599][1652475] Updated weights for policy 0, policy_version 287803 (0.0098) [2024-06-15 15:17:50,738][1648984] Fps is (10 sec: 45877.1, 60 sec: 43690.6, 300 sec: 42653.9). Total num frames: 589430784. Throughput: 0: 10945.4. Samples: 147407360. 
Policy #0 lag: (min: 63.0, avg: 162.4, max: 338.0) [2024-06-15 15:17:50,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 15:17:52,632][1652475] Updated weights for policy 0, policy_version 287869 (0.0012) [2024-06-15 15:17:55,738][1648984] Fps is (10 sec: 32768.5, 60 sec: 42075.1, 300 sec: 42987.2). Total num frames: 589561856. Throughput: 0: 11184.4. Samples: 147481088. Policy #0 lag: (min: 63.0, avg: 162.4, max: 338.0) [2024-06-15 15:17:55,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 15:17:57,105][1651340] Signal inference workers to stop experience collection... (14800 times) [2024-06-15 15:17:57,134][1652475] Updated weights for policy 0, policy_version 287937 (0.0245) [2024-06-15 15:17:57,191][1652475] InferenceWorker_p0-w0: stopping experience collection (14800 times) [2024-06-15 15:17:57,340][1651340] Signal inference workers to resume experience collection... (14800 times) [2024-06-15 15:17:57,353][1652475] InferenceWorker_p0-w0: resuming experience collection (14800 times) [2024-06-15 15:17:58,347][1652475] Updated weights for policy 0, policy_version 288000 (0.0013) [2024-06-15 15:17:59,918][1652475] Updated weights for policy 0, policy_version 288061 (0.0013) [2024-06-15 15:18:00,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 43690.7, 300 sec: 43098.3). Total num frames: 589955072. Throughput: 0: 10945.5. Samples: 147533824. Policy #0 lag: (min: 63.0, avg: 162.4, max: 338.0) [2024-06-15 15:18:00,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 15:18:04,900][1652475] Updated weights for policy 0, policy_version 288120 (0.0015) [2024-06-15 15:18:05,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 45329.1, 300 sec: 43098.3). Total num frames: 590086144. Throughput: 0: 11036.4. Samples: 147575296. 
Policy #0 lag: (min: 63.0, avg: 162.4, max: 338.0) [2024-06-15 15:18:05,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 15:18:08,075][1652475] Updated weights for policy 0, policy_version 288176 (0.0105) [2024-06-15 15:18:09,566][1652475] Updated weights for policy 0, policy_version 288227 (0.0013) [2024-06-15 15:18:10,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 44783.0, 300 sec: 43209.3). Total num frames: 590381056. Throughput: 0: 11195.7. Samples: 147643904. Policy #0 lag: (min: 63.0, avg: 162.4, max: 338.0) [2024-06-15 15:18:10,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 15:18:11,697][1652475] Updated weights for policy 0, policy_version 288315 (0.0015) [2024-06-15 15:18:15,738][1648984] Fps is (10 sec: 42598.5, 60 sec: 44237.0, 300 sec: 42765.0). Total num frames: 590512128. Throughput: 0: 10877.2. Samples: 147702784. Policy #0 lag: (min: 63.0, avg: 162.4, max: 338.0) [2024-06-15 15:18:15,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 15:18:16,094][1652475] Updated weights for policy 0, policy_version 288354 (0.0018) [2024-06-15 15:18:18,984][1652475] Updated weights for policy 0, policy_version 288387 (0.0014) [2024-06-15 15:18:20,738][1648984] Fps is (10 sec: 36045.2, 60 sec: 44236.8, 300 sec: 43320.4). Total num frames: 590741504. Throughput: 0: 11138.9. Samples: 147741184. Policy #0 lag: (min: 63.0, avg: 162.4, max: 338.0) [2024-06-15 15:18:20,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 15:18:22,460][1652475] Updated weights for policy 0, policy_version 288508 (0.0014) [2024-06-15 15:18:23,866][1652475] Updated weights for policy 0, policy_version 288544 (0.0040) [2024-06-15 15:18:25,738][1648984] Fps is (10 sec: 49152.2, 60 sec: 43690.7, 300 sec: 43098.3). Total num frames: 591003648. Throughput: 0: 10820.4. Samples: 147802112. 
Policy #0 lag: (min: 15.0, avg: 167.3, max: 335.0) [2024-06-15 15:18:25,738][1648984] Avg episode reward: [(0, '-0.530')] [2024-06-15 15:18:27,908][1652475] Updated weights for policy 0, policy_version 288608 (0.0046) [2024-06-15 15:18:30,475][1652475] Updated weights for policy 0, policy_version 288645 (0.0012) [2024-06-15 15:18:30,738][1648984] Fps is (10 sec: 42597.4, 60 sec: 44236.7, 300 sec: 43209.3). Total num frames: 591167488. Throughput: 0: 10945.4. Samples: 147871744. Policy #0 lag: (min: 15.0, avg: 167.3, max: 335.0) [2024-06-15 15:18:30,739][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 15:18:33,134][1652475] Updated weights for policy 0, policy_version 288721 (0.0013) [2024-06-15 15:18:33,856][1652475] Updated weights for policy 0, policy_version 288763 (0.0014) [2024-06-15 15:18:35,131][1652475] Updated weights for policy 0, policy_version 288801 (0.0011) [2024-06-15 15:18:35,738][1648984] Fps is (10 sec: 49151.8, 60 sec: 43144.6, 300 sec: 43542.6). Total num frames: 591495168. Throughput: 0: 11036.5. Samples: 147904000. Policy #0 lag: (min: 15.0, avg: 167.3, max: 335.0) [2024-06-15 15:18:35,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 15:18:35,743][1652475] Updated weights for policy 0, policy_version 288830 (0.0014) [2024-06-15 15:18:40,738][1648984] Fps is (10 sec: 36045.0, 60 sec: 42598.7, 300 sec: 43098.3). Total num frames: 591527936. Throughput: 0: 10945.4. Samples: 147973632. Policy #0 lag: (min: 15.0, avg: 167.3, max: 335.0) [2024-06-15 15:18:40,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 15:18:41,746][1652475] Updated weights for policy 0, policy_version 288872 (0.0027) [2024-06-15 15:18:43,104][1651340] Signal inference workers to stop experience collection... (14850 times) [2024-06-15 15:18:43,147][1652475] InferenceWorker_p0-w0: stopping experience collection (14850 times) [2024-06-15 15:18:43,342][1651340] Signal inference workers to resume experience collection... 
(14850 times) [2024-06-15 15:18:43,355][1652475] InferenceWorker_p0-w0: resuming experience collection (14850 times) [2024-06-15 15:18:43,739][1652475] Updated weights for policy 0, policy_version 288960 (0.0012) [2024-06-15 15:18:45,126][1652475] Updated weights for policy 0, policy_version 289016 (0.0011) [2024-06-15 15:18:45,738][1648984] Fps is (10 sec: 42598.5, 60 sec: 44783.0, 300 sec: 43875.8). Total num frames: 591921152. Throughput: 0: 11104.7. Samples: 148033536. Policy #0 lag: (min: 15.0, avg: 167.3, max: 335.0) [2024-06-15 15:18:45,738][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 15:18:46,247][1652475] Updated weights for policy 0, policy_version 289056 (0.0036) [2024-06-15 15:18:50,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 43690.7, 300 sec: 43098.3). Total num frames: 592052224. Throughput: 0: 10956.8. Samples: 148068352. Policy #0 lag: (min: 15.0, avg: 167.3, max: 335.0) [2024-06-15 15:18:50,741][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 15:18:53,088][1652475] Updated weights for policy 0, policy_version 289104 (0.0024) [2024-06-15 15:18:55,064][1652475] Updated weights for policy 0, policy_version 289186 (0.0153) [2024-06-15 15:18:55,738][1648984] Fps is (10 sec: 39321.3, 60 sec: 45875.1, 300 sec: 43542.5). Total num frames: 592314368. Throughput: 0: 11093.3. Samples: 148143104. Policy #0 lag: (min: 15.0, avg: 167.3, max: 335.0) [2024-06-15 15:18:55,738][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 15:18:56,372][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000289248_592379904.pth... 
[2024-06-15 15:18:56,479][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000284096_581828608.pth [2024-06-15 15:18:57,124][1652475] Updated weights for policy 0, policy_version 289277 (0.0013) [2024-06-15 15:18:58,616][1652475] Updated weights for policy 0, policy_version 289336 (0.0013) [2024-06-15 15:19:00,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 43690.7, 300 sec: 43764.7). Total num frames: 592576512. Throughput: 0: 11070.6. Samples: 148200960. Policy #0 lag: (min: 15.0, avg: 167.3, max: 335.0) [2024-06-15 15:19:00,738][1648984] Avg episode reward: [(0, '-0.570')] [2024-06-15 15:19:04,867][1652475] Updated weights for policy 0, policy_version 289363 (0.0044) [2024-06-15 15:19:05,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 592707584. Throughput: 0: 11002.3. Samples: 148236288. Policy #0 lag: (min: 15.0, avg: 167.3, max: 335.0) [2024-06-15 15:19:05,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 15:19:07,542][1652475] Updated weights for policy 0, policy_version 289428 (0.0012) [2024-06-15 15:19:09,916][1652475] Updated weights for policy 0, policy_version 289522 (0.0101) [2024-06-15 15:19:10,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 43690.7, 300 sec: 43986.9). Total num frames: 593002496. Throughput: 0: 10956.8. Samples: 148295168. Policy #0 lag: (min: 15.0, avg: 167.3, max: 335.0) [2024-06-15 15:19:10,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 15:19:15,738][1648984] Fps is (10 sec: 39320.8, 60 sec: 43144.4, 300 sec: 43098.2). Total num frames: 593100800. Throughput: 0: 10661.0. Samples: 148351488. 
Policy #0 lag: (min: 15.0, avg: 167.3, max: 335.0) [2024-06-15 15:19:15,739][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 15:19:17,134][1652475] Updated weights for policy 0, policy_version 289616 (0.0014) [2024-06-15 15:19:18,117][1652475] Updated weights for policy 0, policy_version 289664 (0.0014) [2024-06-15 15:19:20,738][1648984] Fps is (10 sec: 22937.6, 60 sec: 41506.0, 300 sec: 43098.3). Total num frames: 593231872. Throughput: 0: 10638.2. Samples: 148382720. Policy #0 lag: (min: 15.0, avg: 167.3, max: 335.0) [2024-06-15 15:19:20,738][1648984] Avg episode reward: [(0, '-0.530')] [2024-06-15 15:19:22,709][1652475] Updated weights for policy 0, policy_version 289730 (0.0016) [2024-06-15 15:19:24,217][1652475] Updated weights for policy 0, policy_version 289792 (0.0132) [2024-06-15 15:19:25,738][1648984] Fps is (10 sec: 52429.8, 60 sec: 43690.6, 300 sec: 43431.5). Total num frames: 593625088. Throughput: 0: 10456.2. Samples: 148444160. Policy #0 lag: (min: 15.0, avg: 167.3, max: 335.0) [2024-06-15 15:19:25,738][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 15:19:28,209][1651340] Signal inference workers to stop experience collection... (14900 times) [2024-06-15 15:19:28,242][1652475] InferenceWorker_p0-w0: stopping experience collection (14900 times) [2024-06-15 15:19:28,434][1651340] Signal inference workers to resume experience collection... (14900 times) [2024-06-15 15:19:28,435][1652475] InferenceWorker_p0-w0: resuming experience collection (14900 times) [2024-06-15 15:19:28,437][1652475] Updated weights for policy 0, policy_version 289872 (0.0014) [2024-06-15 15:19:29,328][1652475] Updated weights for policy 0, policy_version 289914 (0.0014) [2024-06-15 15:19:30,738][1648984] Fps is (10 sec: 52428.0, 60 sec: 43144.5, 300 sec: 43320.4). Total num frames: 593756160. Throughput: 0: 10729.2. Samples: 148516352. 
Policy #0 lag: (min: 15.0, avg: 167.3, max: 335.0) [2024-06-15 15:19:30,739][1648984] Avg episode reward: [(0, '-0.610')] [2024-06-15 15:19:32,948][1652475] Updated weights for policy 0, policy_version 289957 (0.0013) [2024-06-15 15:19:35,127][1652475] Updated weights for policy 0, policy_version 290042 (0.0013) [2024-06-15 15:19:35,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 42052.3, 300 sec: 43875.8). Total num frames: 594018304. Throughput: 0: 10786.1. Samples: 148553728. Policy #0 lag: (min: 15.0, avg: 167.3, max: 335.0) [2024-06-15 15:19:35,738][1648984] Avg episode reward: [(0, '-0.560')] [2024-06-15 15:19:37,267][1652475] Updated weights for policy 0, policy_version 290105 (0.0012) [2024-06-15 15:19:40,171][1652475] Updated weights for policy 0, policy_version 290160 (0.0012) [2024-06-15 15:19:40,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 45875.1, 300 sec: 43542.6). Total num frames: 594280448. Throughput: 0: 10524.4. Samples: 148616704. Policy #0 lag: (min: 13.0, avg: 167.2, max: 381.0) [2024-06-15 15:19:40,738][1648984] Avg episode reward: [(0, '-0.570')] [2024-06-15 15:19:43,724][1652475] Updated weights for policy 0, policy_version 290195 (0.0042) [2024-06-15 15:19:45,738][1648984] Fps is (10 sec: 45874.6, 60 sec: 42598.3, 300 sec: 43764.7). Total num frames: 594477056. Throughput: 0: 10808.9. Samples: 148687360. Policy #0 lag: (min: 13.0, avg: 167.2, max: 381.0) [2024-06-15 15:19:45,738][1648984] Avg episode reward: [(0, '-0.570')] [2024-06-15 15:19:46,064][1652475] Updated weights for policy 0, policy_version 290294 (0.0016) [2024-06-15 15:19:48,466][1652475] Updated weights for policy 0, policy_version 290352 (0.0147) [2024-06-15 15:19:50,738][1648984] Fps is (10 sec: 39322.3, 60 sec: 43690.7, 300 sec: 43653.7). Total num frames: 594673664. Throughput: 0: 10717.9. Samples: 148718592. 
Policy #0 lag: (min: 13.0, avg: 167.2, max: 381.0) [2024-06-15 15:19:50,740][1648984] Avg episode reward: [(0, '-0.570')] [2024-06-15 15:19:50,872][1652475] Updated weights for policy 0, policy_version 290384 (0.0011) [2024-06-15 15:19:55,073][1652475] Updated weights for policy 0, policy_version 290433 (0.0038) [2024-06-15 15:19:55,738][1648984] Fps is (10 sec: 36044.6, 60 sec: 42052.2, 300 sec: 43209.3). Total num frames: 594837504. Throughput: 0: 11093.3. Samples: 148794368. Policy #0 lag: (min: 13.0, avg: 167.2, max: 381.0) [2024-06-15 15:19:55,738][1648984] Avg episode reward: [(0, '-0.570')] [2024-06-15 15:19:57,498][1652475] Updated weights for policy 0, policy_version 290532 (0.0212) [2024-06-15 15:19:59,981][1652475] Updated weights for policy 0, policy_version 290592 (0.0012) [2024-06-15 15:20:00,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 43690.7, 300 sec: 44097.9). Total num frames: 595197952. Throughput: 0: 11161.6. Samples: 148853760. Policy #0 lag: (min: 13.0, avg: 167.2, max: 381.0) [2024-06-15 15:20:00,738][1648984] Avg episode reward: [(0, '-0.570')] [2024-06-15 15:20:02,447][1652475] Updated weights for policy 0, policy_version 290656 (0.0015) [2024-06-15 15:20:05,738][1648984] Fps is (10 sec: 49152.6, 60 sec: 43690.6, 300 sec: 43431.5). Total num frames: 595329024. Throughput: 0: 11207.1. Samples: 148887040. Policy #0 lag: (min: 13.0, avg: 167.2, max: 381.0) [2024-06-15 15:20:05,738][1648984] Avg episode reward: [(0, '-0.570')] [2024-06-15 15:20:07,467][1652475] Updated weights for policy 0, policy_version 290736 (0.0154) [2024-06-15 15:20:09,367][1652475] Updated weights for policy 0, policy_version 290774 (0.0014) [2024-06-15 15:20:10,738][1648984] Fps is (10 sec: 39321.1, 60 sec: 43144.4, 300 sec: 43988.1). Total num frames: 595591168. Throughput: 0: 11332.2. Samples: 148954112. 
Policy #0 lag: (min: 13.0, avg: 167.2, max: 381.0) [2024-06-15 15:20:10,739][1648984] Avg episode reward: [(0, '-0.570')] [2024-06-15 15:20:11,541][1652475] Updated weights for policy 0, policy_version 290832 (0.0013) [2024-06-15 15:20:12,031][1651340] Signal inference workers to stop experience collection... (14950 times) [2024-06-15 15:20:12,095][1652475] InferenceWorker_p0-w0: stopping experience collection (14950 times) [2024-06-15 15:20:12,296][1651340] Signal inference workers to resume experience collection... (14950 times) [2024-06-15 15:20:12,297][1652475] InferenceWorker_p0-w0: resuming experience collection (14950 times) [2024-06-15 15:20:13,648][1652475] Updated weights for policy 0, policy_version 290883 (0.0015) [2024-06-15 15:20:15,738][1648984] Fps is (10 sec: 52429.3, 60 sec: 45875.4, 300 sec: 43875.8). Total num frames: 595853312. Throughput: 0: 11241.3. Samples: 149022208. Policy #0 lag: (min: 13.0, avg: 167.2, max: 381.0) [2024-06-15 15:20:15,740][1648984] Avg episode reward: [(0, '-0.570')] [2024-06-15 15:20:17,963][1652475] Updated weights for policy 0, policy_version 290947 (0.0012) [2024-06-15 15:20:20,738][1648984] Fps is (10 sec: 39322.3, 60 sec: 45875.2, 300 sec: 43542.6). Total num frames: 595984384. Throughput: 0: 11264.0. Samples: 149060608. Policy #0 lag: (min: 13.0, avg: 167.2, max: 381.0) [2024-06-15 15:20:20,738][1648984] Avg episode reward: [(0, '-0.530')] [2024-06-15 15:20:21,268][1652475] Updated weights for policy 0, policy_version 291042 (0.0082) [2024-06-15 15:20:24,416][1652475] Updated weights for policy 0, policy_version 291120 (0.0014) [2024-06-15 15:20:25,738][1648984] Fps is (10 sec: 45875.1, 60 sec: 44783.0, 300 sec: 44098.0). Total num frames: 596312064. Throughput: 0: 11207.2. Samples: 149121024. 
Policy #0 lag: (min: 13.0, avg: 167.2, max: 381.0) [2024-06-15 15:20:25,738][1648984] Avg episode reward: [(0, '-0.580')] [2024-06-15 15:20:25,967][1652475] Updated weights for policy 0, policy_version 291184 (0.0099) [2024-06-15 15:20:30,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 43690.8, 300 sec: 43099.5). Total num frames: 596377600. Throughput: 0: 11104.7. Samples: 149187072. Policy #0 lag: (min: 13.0, avg: 167.2, max: 381.0) [2024-06-15 15:20:30,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 15:20:32,371][1652475] Updated weights for policy 0, policy_version 291221 (0.0024) [2024-06-15 15:20:33,647][1652475] Updated weights for policy 0, policy_version 291288 (0.0027) [2024-06-15 15:20:34,662][1652475] Updated weights for policy 0, policy_version 291328 (0.0014) [2024-06-15 15:20:35,738][1648984] Fps is (10 sec: 36044.9, 60 sec: 44236.8, 300 sec: 43986.9). Total num frames: 596672512. Throughput: 0: 11093.3. Samples: 149217792. Policy #0 lag: (min: 13.0, avg: 167.2, max: 381.0) [2024-06-15 15:20:35,738][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 15:20:36,878][1652475] Updated weights for policy 0, policy_version 291408 (0.0013) [2024-06-15 15:20:37,972][1652475] Updated weights for policy 0, policy_version 291449 (0.0012) [2024-06-15 15:20:40,738][1648984] Fps is (10 sec: 52428.5, 60 sec: 43690.8, 300 sec: 43653.6). Total num frames: 596901888. Throughput: 0: 10843.1. Samples: 149282304. Policy #0 lag: (min: 13.0, avg: 167.2, max: 381.0) [2024-06-15 15:20:40,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 15:20:44,663][1652475] Updated weights for policy 0, policy_version 291520 (0.0013) [2024-06-15 15:20:45,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 43144.6, 300 sec: 43653.6). Total num frames: 597065728. Throughput: 0: 11002.3. Samples: 149348864. 
Policy #0 lag: (min: 13.0, avg: 167.2, max: 381.0) [2024-06-15 15:20:45,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 15:20:47,552][1652475] Updated weights for policy 0, policy_version 291617 (0.0015) [2024-06-15 15:20:50,092][1652475] Updated weights for policy 0, policy_version 291654 (0.0025) [2024-06-15 15:20:50,738][1648984] Fps is (10 sec: 45875.4, 60 sec: 44782.9, 300 sec: 43764.8). Total num frames: 597360640. Throughput: 0: 10820.3. Samples: 149373952. Policy #0 lag: (min: 10.0, avg: 156.5, max: 298.0) [2024-06-15 15:20:50,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 15:20:55,646][1652475] Updated weights for policy 0, policy_version 291718 (0.0018) [2024-06-15 15:20:55,738][1648984] Fps is (10 sec: 36044.1, 60 sec: 43144.5, 300 sec: 43098.2). Total num frames: 597426176. Throughput: 0: 10956.8. Samples: 149447168. Policy #0 lag: (min: 10.0, avg: 156.5, max: 298.0) [2024-06-15 15:20:55,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 15:20:56,149][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000291744_597491712.pth... [2024-06-15 15:20:56,294][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000286672_587104256.pth [2024-06-15 15:20:57,942][1652475] Updated weights for policy 0, policy_version 291796 (0.0013) [2024-06-15 15:20:58,525][1651340] Signal inference workers to stop experience collection... (15000 times) [2024-06-15 15:20:58,578][1652475] InferenceWorker_p0-w0: stopping experience collection (15000 times) [2024-06-15 15:20:58,743][1651340] Signal inference workers to resume experience collection... 
(15000 times) [2024-06-15 15:20:58,744][1652475] InferenceWorker_p0-w0: resuming experience collection (15000 times) [2024-06-15 15:20:59,258][1652475] Updated weights for policy 0, policy_version 291862 (0.0082) [2024-06-15 15:21:00,090][1652475] Updated weights for policy 0, policy_version 291898 (0.0011) [2024-06-15 15:21:00,738][1648984] Fps is (10 sec: 45875.4, 60 sec: 43690.7, 300 sec: 43987.3). Total num frames: 597819392. Throughput: 0: 10786.1. Samples: 149507584. Policy #0 lag: (min: 10.0, avg: 156.5, max: 298.0) [2024-06-15 15:21:00,740][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 15:21:03,066][1652475] Updated weights for policy 0, policy_version 291942 (0.0013) [2024-06-15 15:21:05,738][1648984] Fps is (10 sec: 52429.9, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 597950464. Throughput: 0: 10740.6. Samples: 149543936. Policy #0 lag: (min: 10.0, avg: 156.5, max: 298.0) [2024-06-15 15:21:05,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 15:21:07,279][1652475] Updated weights for policy 0, policy_version 291984 (0.0012) [2024-06-15 15:21:09,220][1652475] Updated weights for policy 0, policy_version 292053 (0.0015) [2024-06-15 15:21:10,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 44236.9, 300 sec: 44098.0). Total num frames: 598245376. Throughput: 0: 10922.7. Samples: 149612544. Policy #0 lag: (min: 10.0, avg: 156.5, max: 298.0) [2024-06-15 15:21:10,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 15:21:10,835][1652475] Updated weights for policy 0, policy_version 292116 (0.0014) [2024-06-15 15:21:15,396][1652475] Updated weights for policy 0, policy_version 292192 (0.0013) [2024-06-15 15:21:15,738][1648984] Fps is (10 sec: 45875.3, 60 sec: 42598.4, 300 sec: 43431.5). Total num frames: 598409216. Throughput: 0: 10774.8. Samples: 149671936. 
Policy #0 lag: (min: 10.0, avg: 156.5, max: 298.0) [2024-06-15 15:21:15,739][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 15:21:20,273][1652475] Updated weights for policy 0, policy_version 292243 (0.0031) [2024-06-15 15:21:20,738][1648984] Fps is (10 sec: 32767.8, 60 sec: 43144.5, 300 sec: 43431.5). Total num frames: 598573056. Throughput: 0: 10820.2. Samples: 149704704. Policy #0 lag: (min: 10.0, avg: 156.5, max: 298.0) [2024-06-15 15:21:20,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 15:21:21,210][1652475] Updated weights for policy 0, policy_version 292289 (0.0013) [2024-06-15 15:21:23,091][1652475] Updated weights for policy 0, policy_version 292368 (0.0089) [2024-06-15 15:21:25,738][1648984] Fps is (10 sec: 45874.8, 60 sec: 42598.4, 300 sec: 43764.7). Total num frames: 598867968. Throughput: 0: 10683.7. Samples: 149763072. Policy #0 lag: (min: 10.0, avg: 156.5, max: 298.0) [2024-06-15 15:21:25,738][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 15:21:26,998][1652475] Updated weights for policy 0, policy_version 292433 (0.0028) [2024-06-15 15:21:30,738][1648984] Fps is (10 sec: 42598.7, 60 sec: 43690.7, 300 sec: 43320.4). Total num frames: 598999040. Throughput: 0: 10774.8. Samples: 149833728. Policy #0 lag: (min: 10.0, avg: 156.5, max: 298.0) [2024-06-15 15:21:30,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 15:21:32,831][1652475] Updated weights for policy 0, policy_version 292512 (0.0148) [2024-06-15 15:21:33,685][1652475] Updated weights for policy 0, policy_version 292544 (0.0013) [2024-06-15 15:21:35,738][1648984] Fps is (10 sec: 36044.9, 60 sec: 42598.4, 300 sec: 43875.8). Total num frames: 599228416. Throughput: 0: 10854.4. Samples: 149862400. 
Policy #0 lag: (min: 10.0, avg: 156.5, max: 298.0) [2024-06-15 15:21:35,738][1648984] Avg episode reward: [(0, '-0.550')] [2024-06-15 15:21:35,940][1652475] Updated weights for policy 0, policy_version 292608 (0.0012) [2024-06-15 15:21:37,430][1652475] Updated weights for policy 0, policy_version 292672 (0.0013) [2024-06-15 15:21:39,776][1652475] Updated weights for policy 0, policy_version 292732 (0.0013) [2024-06-15 15:21:40,738][1648984] Fps is (10 sec: 52427.7, 60 sec: 43690.6, 300 sec: 43542.6). Total num frames: 599523328. Throughput: 0: 10513.1. Samples: 149920256. Policy #0 lag: (min: 10.0, avg: 156.5, max: 298.0) [2024-06-15 15:21:40,739][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 15:21:44,644][1651340] Signal inference workers to stop experience collection... (15050 times) [2024-06-15 15:21:44,680][1652475] InferenceWorker_p0-w0: stopping experience collection (15050 times) [2024-06-15 15:21:44,691][1652475] Updated weights for policy 0, policy_version 292771 (0.0013) [2024-06-15 15:21:44,997][1651340] Signal inference workers to resume experience collection... (15050 times) [2024-06-15 15:21:44,998][1652475] InferenceWorker_p0-w0: resuming experience collection (15050 times) [2024-06-15 15:21:45,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 43144.5, 300 sec: 43542.6). Total num frames: 599654400. Throughput: 0: 10695.1. Samples: 149988864. Policy #0 lag: (min: 10.0, avg: 156.5, max: 298.0) [2024-06-15 15:21:45,738][1648984] Avg episode reward: [(0, '-0.530')] [2024-06-15 15:21:48,620][1652475] Updated weights for policy 0, policy_version 292864 (0.0012) [2024-06-15 15:21:50,183][1652475] Updated weights for policy 0, policy_version 292916 (0.0015) [2024-06-15 15:21:50,738][1648984] Fps is (10 sec: 39322.2, 60 sec: 42598.4, 300 sec: 43658.4). Total num frames: 599916544. Throughput: 0: 10569.9. Samples: 150019584. 
Policy #0 lag: (min: 10.0, avg: 156.5, max: 298.0) [2024-06-15 15:21:50,738][1648984] Avg episode reward: [(0, '-0.520')] [2024-06-15 15:21:51,607][1652475] Updated weights for policy 0, policy_version 292984 (0.0012) [2024-06-15 15:21:55,738][1648984] Fps is (10 sec: 42598.2, 60 sec: 44236.9, 300 sec: 43209.3). Total num frames: 600080384. Throughput: 0: 10535.8. Samples: 150086656. Policy #0 lag: (min: 10.0, avg: 156.5, max: 298.0) [2024-06-15 15:21:55,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 15:21:56,675][1652475] Updated weights for policy 0, policy_version 293048 (0.0014) [2024-06-15 15:22:00,494][1652475] Updated weights for policy 0, policy_version 293113 (0.0016) [2024-06-15 15:22:00,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 41506.1, 300 sec: 43875.8). Total num frames: 600309760. Throughput: 0: 10672.3. Samples: 150152192. Policy #0 lag: (min: 10.0, avg: 156.5, max: 298.0) [2024-06-15 15:22:00,738][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 15:22:02,382][1652475] Updated weights for policy 0, policy_version 293168 (0.0011) [2024-06-15 15:22:04,027][1652475] Updated weights for policy 0, policy_version 293241 (0.0095) [2024-06-15 15:22:05,738][1648984] Fps is (10 sec: 49151.2, 60 sec: 43690.5, 300 sec: 43653.6). Total num frames: 600571904. Throughput: 0: 10615.4. Samples: 150182400. Policy #0 lag: (min: 95.0, avg: 208.2, max: 330.0) [2024-06-15 15:22:05,739][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 15:22:08,688][1652475] Updated weights for policy 0, policy_version 293296 (0.0015) [2024-06-15 15:22:10,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 40959.9, 300 sec: 43542.6). Total num frames: 600702976. Throughput: 0: 10752.0. Samples: 150246912. 
Policy #0 lag: (min: 95.0, avg: 208.2, max: 330.0) [2024-06-15 15:22:10,738][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 15:22:12,689][1652475] Updated weights for policy 0, policy_version 293369 (0.0015) [2024-06-15 15:22:14,743][1652475] Updated weights for policy 0, policy_version 293424 (0.0014) [2024-06-15 15:22:15,738][1648984] Fps is (10 sec: 42599.5, 60 sec: 43144.5, 300 sec: 43764.7). Total num frames: 600997888. Throughput: 0: 10535.8. Samples: 150307840. Policy #0 lag: (min: 95.0, avg: 208.2, max: 330.0) [2024-06-15 15:22:15,738][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 15:22:16,688][1652475] Updated weights for policy 0, policy_version 293504 (0.0044) [2024-06-15 15:22:20,738][1648984] Fps is (10 sec: 45875.5, 60 sec: 43144.6, 300 sec: 43320.4). Total num frames: 601161728. Throughput: 0: 10570.0. Samples: 150338048. Policy #0 lag: (min: 95.0, avg: 208.2, max: 330.0) [2024-06-15 15:22:20,738][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 15:22:23,094][1652475] Updated weights for policy 0, policy_version 293572 (0.0017) [2024-06-15 15:22:24,151][1652475] Updated weights for policy 0, policy_version 293632 (0.0092) [2024-06-15 15:22:25,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 42052.3, 300 sec: 43653.6). Total num frames: 601391104. Throughput: 0: 10900.0. Samples: 150410752. Policy #0 lag: (min: 95.0, avg: 208.2, max: 330.0) [2024-06-15 15:22:25,738][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 15:22:26,460][1652475] Updated weights for policy 0, policy_version 293694 (0.0015) [2024-06-15 15:22:27,839][1652475] Updated weights for policy 0, policy_version 293733 (0.0014) [2024-06-15 15:22:30,738][1648984] Fps is (10 sec: 45874.8, 60 sec: 43690.6, 300 sec: 43098.2). Total num frames: 601620480. Throughput: 0: 10945.4. Samples: 150481408. 
Policy #0 lag: (min: 95.0, avg: 208.2, max: 330.0) [2024-06-15 15:22:30,738][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 15:22:31,627][1652475] Updated weights for policy 0, policy_version 293765 (0.0013) [2024-06-15 15:22:31,893][1651340] Signal inference workers to stop experience collection... (15100 times) [2024-06-15 15:22:31,992][1652475] InferenceWorker_p0-w0: stopping experience collection (15100 times) [2024-06-15 15:22:32,225][1651340] Signal inference workers to resume experience collection... (15100 times) [2024-06-15 15:22:32,226][1652475] InferenceWorker_p0-w0: resuming experience collection (15100 times) [2024-06-15 15:22:34,177][1652475] Updated weights for policy 0, policy_version 293826 (0.0147) [2024-06-15 15:22:35,738][1648984] Fps is (10 sec: 49151.7, 60 sec: 44236.8, 300 sec: 43764.8). Total num frames: 601882624. Throughput: 0: 10968.2. Samples: 150513152. Policy #0 lag: (min: 95.0, avg: 208.2, max: 330.0) [2024-06-15 15:22:35,738][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 15:22:37,587][1652475] Updated weights for policy 0, policy_version 293920 (0.0055) [2024-06-15 15:22:39,513][1652475] Updated weights for policy 0, policy_version 293985 (0.0013) [2024-06-15 15:22:40,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 43690.7, 300 sec: 43764.7). Total num frames: 602144768. Throughput: 0: 10843.0. Samples: 150574592. Policy #0 lag: (min: 95.0, avg: 208.2, max: 330.0) [2024-06-15 15:22:40,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 15:22:45,617][1652475] Updated weights for policy 0, policy_version 294051 (0.0012) [2024-06-15 15:22:45,738][1648984] Fps is (10 sec: 32768.3, 60 sec: 42598.5, 300 sec: 43320.4). Total num frames: 602210304. Throughput: 0: 11036.5. Samples: 150648832. 
Policy #0 lag: (min: 95.0, avg: 208.2, max: 330.0) [2024-06-15 15:22:45,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 15:22:47,098][1652475] Updated weights for policy 0, policy_version 294129 (0.0025) [2024-06-15 15:22:49,620][1652475] Updated weights for policy 0, policy_version 294192 (0.0013) [2024-06-15 15:22:50,738][1648984] Fps is (10 sec: 42598.8, 60 sec: 44236.8, 300 sec: 44098.0). Total num frames: 602570752. Throughput: 0: 11013.8. Samples: 150678016. Policy #0 lag: (min: 95.0, avg: 208.2, max: 330.0) [2024-06-15 15:22:50,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 15:22:51,409][1652475] Updated weights for policy 0, policy_version 294271 (0.0022) [2024-06-15 15:22:55,738][1648984] Fps is (10 sec: 45874.6, 60 sec: 43144.5, 300 sec: 43098.2). Total num frames: 602669056. Throughput: 0: 11047.8. Samples: 150744064. Policy #0 lag: (min: 95.0, avg: 208.2, max: 330.0) [2024-06-15 15:22:55,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 15:22:55,762][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000294272_602669056.pth... [2024-06-15 15:22:55,859][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000289248_592379904.pth [2024-06-15 15:22:58,396][1652475] Updated weights for policy 0, policy_version 294323 (0.0014) [2024-06-15 15:23:00,118][1652475] Updated weights for policy 0, policy_version 294399 (0.0194) [2024-06-15 15:23:00,738][1648984] Fps is (10 sec: 36044.8, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 602931200. Throughput: 0: 11059.2. Samples: 150805504. Policy #0 lag: (min: 95.0, avg: 208.2, max: 330.0) [2024-06-15 15:23:00,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 15:23:01,982][1652475] Updated weights for policy 0, policy_version 294464 (0.0014) [2024-06-15 15:23:05,738][1648984] Fps is (10 sec: 52429.5, 60 sec: 43690.9, 300 sec: 43431.5). Total num frames: 603193344. Throughput: 0: 10922.7. 
Samples: 150829568. Policy #0 lag: (min: 95.0, avg: 208.2, max: 330.0) [2024-06-15 15:23:05,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 15:23:09,578][1652475] Updated weights for policy 0, policy_version 294544 (0.0014) [2024-06-15 15:23:10,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 43690.7, 300 sec: 43431.5). Total num frames: 603324416. Throughput: 0: 10979.6. Samples: 150904832. Policy #0 lag: (min: 95.0, avg: 208.2, max: 330.0) [2024-06-15 15:23:10,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 15:23:11,028][1652475] Updated weights for policy 0, policy_version 294594 (0.0014) [2024-06-15 15:23:12,520][1652475] Updated weights for policy 0, policy_version 294656 (0.0012) [2024-06-15 15:23:14,732][1652475] Updated weights for policy 0, policy_version 294718 (0.0049) [2024-06-15 15:23:15,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 43144.6, 300 sec: 43542.6). Total num frames: 603586560. Throughput: 0: 10604.1. Samples: 150958592. Policy #0 lag: (min: 95.0, avg: 208.2, max: 330.0) [2024-06-15 15:23:15,740][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 15:23:17,055][1651340] Signal inference workers to stop experience collection... (15150 times) [2024-06-15 15:23:17,121][1652475] InferenceWorker_p0-w0: stopping experience collection (15150 times) [2024-06-15 15:23:17,232][1651340] Signal inference workers to resume experience collection... (15150 times) [2024-06-15 15:23:17,233][1652475] InferenceWorker_p0-w0: resuming experience collection (15150 times) [2024-06-15 15:23:17,359][1652475] Updated weights for policy 0, policy_version 294776 (0.0029) [2024-06-15 15:23:20,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 42598.4, 300 sec: 43098.2). Total num frames: 603717632. Throughput: 0: 10649.6. Samples: 150992384. 
Policy #0 lag: (min: 31.0, avg: 177.8, max: 331.0) [2024-06-15 15:23:20,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 15:23:22,367][1652475] Updated weights for policy 0, policy_version 294848 (0.0015) [2024-06-15 15:23:25,405][1652475] Updated weights for policy 0, policy_version 294917 (0.0023) [2024-06-15 15:23:25,738][1648984] Fps is (10 sec: 42598.1, 60 sec: 43690.6, 300 sec: 43542.6). Total num frames: 604012544. Throughput: 0: 10729.3. Samples: 151057408. Policy #0 lag: (min: 31.0, avg: 177.8, max: 331.0) [2024-06-15 15:23:25,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 15:23:26,682][1652475] Updated weights for policy 0, policy_version 294968 (0.0013) [2024-06-15 15:23:29,567][1652475] Updated weights for policy 0, policy_version 295031 (0.0012) [2024-06-15 15:23:30,738][1648984] Fps is (10 sec: 52427.4, 60 sec: 43690.5, 300 sec: 43209.3). Total num frames: 604241920. Throughput: 0: 10513.0. Samples: 151121920. Policy #0 lag: (min: 31.0, avg: 177.8, max: 331.0) [2024-06-15 15:23:30,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 15:23:34,526][1652475] Updated weights for policy 0, policy_version 295088 (0.0014) [2024-06-15 15:23:35,738][1648984] Fps is (10 sec: 42598.5, 60 sec: 42598.4, 300 sec: 43764.7). Total num frames: 604438528. Throughput: 0: 10763.4. Samples: 151162368. Policy #0 lag: (min: 31.0, avg: 177.8, max: 331.0) [2024-06-15 15:23:35,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 15:23:36,356][1652475] Updated weights for policy 0, policy_version 295165 (0.0016) [2024-06-15 15:23:38,705][1652475] Updated weights for policy 0, policy_version 295222 (0.0093) [2024-06-15 15:23:40,489][1652475] Updated weights for policy 0, policy_version 295250 (0.0015) [2024-06-15 15:23:40,738][1648984] Fps is (10 sec: 45876.6, 60 sec: 42598.5, 300 sec: 43320.4). Total num frames: 604700672. Throughput: 0: 10547.2. Samples: 151218688. 
Policy #0 lag: (min: 31.0, avg: 177.8, max: 331.0) [2024-06-15 15:23:40,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 15:23:45,738][1648984] Fps is (10 sec: 32767.8, 60 sec: 42598.3, 300 sec: 43098.2). Total num frames: 604766208. Throughput: 0: 10774.7. Samples: 151290368. Policy #0 lag: (min: 31.0, avg: 177.8, max: 331.0) [2024-06-15 15:23:45,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 15:23:46,320][1652475] Updated weights for policy 0, policy_version 295312 (0.0014) [2024-06-15 15:23:48,191][1652475] Updated weights for policy 0, policy_version 295377 (0.0015) [2024-06-15 15:23:49,922][1652475] Updated weights for policy 0, policy_version 295441 (0.0013) [2024-06-15 15:23:50,738][1648984] Fps is (10 sec: 45874.9, 60 sec: 43144.5, 300 sec: 43542.6). Total num frames: 605159424. Throughput: 0: 10922.6. Samples: 151321088. Policy #0 lag: (min: 31.0, avg: 177.8, max: 331.0) [2024-06-15 15:23:50,738][1648984] Avg episode reward: [(0, '-0.560')] [2024-06-15 15:23:52,375][1652475] Updated weights for policy 0, policy_version 295520 (0.0098) [2024-06-15 15:23:55,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 43690.7, 300 sec: 43098.2). Total num frames: 605290496. Throughput: 0: 10535.8. Samples: 151378944. Policy #0 lag: (min: 31.0, avg: 177.8, max: 331.0) [2024-06-15 15:23:55,739][1648984] Avg episode reward: [(0, '-0.540')] [2024-06-15 15:23:57,203][1652475] Updated weights for policy 0, policy_version 295554 (0.0013) [2024-06-15 15:24:00,738][1648984] Fps is (10 sec: 26214.4, 60 sec: 41506.1, 300 sec: 43098.2). Total num frames: 605421568. Throughput: 0: 10888.5. Samples: 151448576. 
Policy #0 lag: (min: 31.0, avg: 177.8, max: 331.0) [2024-06-15 15:24:00,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 15:24:01,544][1652475] Updated weights for policy 0, policy_version 295632 (0.0044) [2024-06-15 15:24:03,129][1652475] Updated weights for policy 0, policy_version 295686 (0.0013) [2024-06-15 15:24:04,801][1652475] Updated weights for policy 0, policy_version 295760 (0.0040) [2024-06-15 15:24:04,930][1651340] Signal inference workers to stop experience collection... (15200 times) [2024-06-15 15:24:04,957][1652475] InferenceWorker_p0-w0: stopping experience collection (15200 times) [2024-06-15 15:24:05,112][1651340] Signal inference workers to resume experience collection... (15200 times) [2024-06-15 15:24:05,113][1652475] InferenceWorker_p0-w0: resuming experience collection (15200 times) [2024-06-15 15:24:05,738][1648984] Fps is (10 sec: 52429.2, 60 sec: 43690.6, 300 sec: 43431.5). Total num frames: 605814784. Throughput: 0: 10808.9. Samples: 151478784. Policy #0 lag: (min: 31.0, avg: 177.8, max: 331.0) [2024-06-15 15:24:05,738][1648984] Avg episode reward: [(0, '-0.530')] [2024-06-15 15:24:09,832][1652475] Updated weights for policy 0, policy_version 295856 (0.0102) [2024-06-15 15:24:10,751][1648984] Fps is (10 sec: 52360.0, 60 sec: 43681.1, 300 sec: 43540.6). Total num frames: 605945856. Throughput: 0: 10783.0. Samples: 151542784. Policy #0 lag: (min: 31.0, avg: 177.8, max: 331.0) [2024-06-15 15:24:10,752][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 15:24:14,348][1652475] Updated weights for policy 0, policy_version 295931 (0.0013) [2024-06-15 15:24:15,738][1648984] Fps is (10 sec: 32768.1, 60 sec: 42598.4, 300 sec: 43764.7). Total num frames: 606142464. Throughput: 0: 10740.7. Samples: 151605248. 
Policy #0 lag: (min: 31.0, avg: 177.8, max: 331.0) [2024-06-15 15:24:15,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 15:24:17,525][1652475] Updated weights for policy 0, policy_version 296020 (0.0014) [2024-06-15 15:24:18,365][1652475] Updated weights for policy 0, policy_version 296062 (0.0073) [2024-06-15 15:24:20,739][1648984] Fps is (10 sec: 39369.8, 60 sec: 43690.0, 300 sec: 43098.1). Total num frames: 606339072. Throughput: 0: 10467.3. Samples: 151633408. Policy #0 lag: (min: 31.0, avg: 177.8, max: 331.0) [2024-06-15 15:24:20,739][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 15:24:21,838][1652475] Updated weights for policy 0, policy_version 296123 (0.0013) [2024-06-15 15:24:25,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 42052.3, 300 sec: 43320.4). Total num frames: 606535680. Throughput: 0: 10843.0. Samples: 151706624. Policy #0 lag: (min: 31.0, avg: 177.8, max: 331.0) [2024-06-15 15:24:25,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 15:24:25,921][1652475] Updated weights for policy 0, policy_version 296184 (0.0038) [2024-06-15 15:24:27,076][1652475] Updated weights for policy 0, policy_version 296227 (0.0012) [2024-06-15 15:24:29,285][1652475] Updated weights for policy 0, policy_version 296276 (0.0018) [2024-06-15 15:24:30,312][1652475] Updated weights for policy 0, policy_version 296320 (0.0014) [2024-06-15 15:24:30,738][1648984] Fps is (10 sec: 52433.9, 60 sec: 43690.9, 300 sec: 43542.6). Total num frames: 606863360. Throughput: 0: 10729.3. Samples: 151773184. Policy #0 lag: (min: 31.0, avg: 177.8, max: 331.0) [2024-06-15 15:24:30,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 15:24:33,257][1652475] Updated weights for policy 0, policy_version 296381 (0.0017) [2024-06-15 15:24:35,738][1648984] Fps is (10 sec: 45875.4, 60 sec: 42598.4, 300 sec: 43098.3). Total num frames: 606994432. Throughput: 0: 10752.0. Samples: 151804928. 
Policy #0 lag: (min: 14.0, avg: 135.1, max: 270.0) [2024-06-15 15:24:35,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 15:24:37,421][1652475] Updated weights for policy 0, policy_version 296446 (0.0013) [2024-06-15 15:24:38,752][1652475] Updated weights for policy 0, policy_version 296510 (0.0014) [2024-06-15 15:24:40,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 42598.4, 300 sec: 43320.4). Total num frames: 607256576. Throughput: 0: 11047.8. Samples: 151876096. Policy #0 lag: (min: 14.0, avg: 135.1, max: 270.0) [2024-06-15 15:24:40,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 15:24:41,977][1652475] Updated weights for policy 0, policy_version 296573 (0.0013) [2024-06-15 15:24:44,497][1652475] Updated weights for policy 0, policy_version 296637 (0.0024) [2024-06-15 15:24:45,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 45875.3, 300 sec: 43542.6). Total num frames: 607518720. Throughput: 0: 10979.6. Samples: 151942656. Policy #0 lag: (min: 14.0, avg: 135.1, max: 270.0) [2024-06-15 15:24:45,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 15:24:49,014][1652475] Updated weights for policy 0, policy_version 296704 (0.0019) [2024-06-15 15:24:50,137][1652475] Updated weights for policy 0, policy_version 296767 (0.0013) [2024-06-15 15:24:50,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 43690.7, 300 sec: 43875.8). Total num frames: 607780864. Throughput: 0: 11241.3. Samples: 151984640. Policy #0 lag: (min: 14.0, avg: 135.1, max: 270.0) [2024-06-15 15:24:50,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 15:24:52,693][1651340] Signal inference workers to stop experience collection... (15250 times) [2024-06-15 15:24:52,746][1652475] InferenceWorker_p0-w0: stopping experience collection (15250 times) [2024-06-15 15:24:52,939][1651340] Signal inference workers to resume experience collection... 
(15250 times) [2024-06-15 15:24:52,940][1652475] InferenceWorker_p0-w0: resuming experience collection (15250 times) [2024-06-15 15:24:53,368][1652475] Updated weights for policy 0, policy_version 296819 (0.0014) [2024-06-15 15:24:55,218][1652475] Updated weights for policy 0, policy_version 296864 (0.0012) [2024-06-15 15:24:55,738][1648984] Fps is (10 sec: 49151.9, 60 sec: 45329.1, 300 sec: 43431.5). Total num frames: 608010240. Throughput: 0: 11290.1. Samples: 152050688. Policy #0 lag: (min: 14.0, avg: 135.1, max: 270.0) [2024-06-15 15:24:55,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 15:24:55,817][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000296896_608043008.pth... [2024-06-15 15:24:55,884][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000291744_597491712.pth [2024-06-15 15:25:00,152][1652475] Updated weights for policy 0, policy_version 296946 (0.0012) [2024-06-15 15:25:00,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 46421.4, 300 sec: 43653.7). Total num frames: 608206848. Throughput: 0: 11377.8. Samples: 152117248. Policy #0 lag: (min: 14.0, avg: 135.1, max: 270.0) [2024-06-15 15:25:00,738][1648984] Avg episode reward: [(0, '-0.230')] [2024-06-15 15:25:01,475][1652475] Updated weights for policy 0, policy_version 297015 (0.0014) [2024-06-15 15:25:04,461][1652475] Updated weights for policy 0, policy_version 297072 (0.0015) [2024-06-15 15:25:05,738][1648984] Fps is (10 sec: 42598.5, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 608436224. Throughput: 0: 11560.1. Samples: 152153600. Policy #0 lag: (min: 14.0, avg: 135.1, max: 270.0) [2024-06-15 15:25:05,740][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 15:25:09,045][1652475] Updated weights for policy 0, policy_version 297120 (0.0015) [2024-06-15 15:25:10,738][1648984] Fps is (10 sec: 36044.7, 60 sec: 43700.3, 300 sec: 43098.2). Total num frames: 608567296. Throughput: 0: 11389.2. 
Samples: 152219136. Policy #0 lag: (min: 14.0, avg: 135.1, max: 270.0) [2024-06-15 15:25:10,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 15:25:12,052][1652475] Updated weights for policy 0, policy_version 297232 (0.0214) [2024-06-15 15:25:15,395][1652475] Updated weights for policy 0, policy_version 297283 (0.0011) [2024-06-15 15:25:15,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 45329.0, 300 sec: 43653.6). Total num frames: 608862208. Throughput: 0: 11264.0. Samples: 152280064. Policy #0 lag: (min: 14.0, avg: 135.1, max: 270.0) [2024-06-15 15:25:15,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 15:25:20,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 43691.4, 300 sec: 42876.1). Total num frames: 608960512. Throughput: 0: 11229.9. Samples: 152310272. Policy #0 lag: (min: 14.0, avg: 135.1, max: 270.0) [2024-06-15 15:25:20,738][1648984] Avg episode reward: [(0, '-0.540')] [2024-06-15 15:25:21,159][1652475] Updated weights for policy 0, policy_version 297345 (0.0014) [2024-06-15 15:25:23,052][1652475] Updated weights for policy 0, policy_version 297440 (0.0013) [2024-06-15 15:25:25,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 45875.2, 300 sec: 43764.7). Total num frames: 609288192. Throughput: 0: 11093.3. Samples: 152375296. Policy #0 lag: (min: 14.0, avg: 135.1, max: 270.0) [2024-06-15 15:25:25,739][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 15:25:25,898][1652475] Updated weights for policy 0, policy_version 297505 (0.0019) [2024-06-15 15:25:28,581][1652475] Updated weights for policy 0, policy_version 297570 (0.0013) [2024-06-15 15:25:30,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 43690.6, 300 sec: 43431.5). Total num frames: 609484800. Throughput: 0: 10979.6. Samples: 152436736. 
Policy #0 lag: (min: 14.0, avg: 135.1, max: 270.0) [2024-06-15 15:25:30,738][1648984] Avg episode reward: [(0, '-0.610')] [2024-06-15 15:25:33,527][1652475] Updated weights for policy 0, policy_version 297632 (0.0028) [2024-06-15 15:25:35,579][1652475] Updated weights for policy 0, policy_version 297725 (0.0017) [2024-06-15 15:25:35,738][1648984] Fps is (10 sec: 45875.3, 60 sec: 45875.2, 300 sec: 43542.6). Total num frames: 609746944. Throughput: 0: 10877.2. Samples: 152474112. Policy #0 lag: (min: 14.0, avg: 135.1, max: 270.0) [2024-06-15 15:25:35,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 15:25:39,298][1651340] Signal inference workers to stop experience collection... (15300 times) [2024-06-15 15:25:39,374][1652475] InferenceWorker_p0-w0: stopping experience collection (15300 times) [2024-06-15 15:25:39,543][1651340] Signal inference workers to resume experience collection... (15300 times) [2024-06-15 15:25:39,558][1652475] InferenceWorker_p0-w0: resuming experience collection (15300 times) [2024-06-15 15:25:39,915][1652475] Updated weights for policy 0, policy_version 297792 (0.0021) [2024-06-15 15:25:40,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 44236.8, 300 sec: 43542.6). Total num frames: 609910784. Throughput: 0: 10740.6. Samples: 152534016. Policy #0 lag: (min: 14.0, avg: 135.1, max: 270.0) [2024-06-15 15:25:40,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 15:25:41,607][1652475] Updated weights for policy 0, policy_version 297852 (0.0014) [2024-06-15 15:25:45,738][1648984] Fps is (10 sec: 32767.8, 60 sec: 42598.4, 300 sec: 43098.2). Total num frames: 610074624. Throughput: 0: 10820.2. Samples: 152604160. 
Policy #0 lag: (min: 14.0, avg: 135.1, max: 270.0) [2024-06-15 15:25:45,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 15:25:45,942][1652475] Updated weights for policy 0, policy_version 297908 (0.0012) [2024-06-15 15:25:47,569][1652475] Updated weights for policy 0, policy_version 297978 (0.0014) [2024-06-15 15:25:50,738][1648984] Fps is (10 sec: 36044.6, 60 sec: 41506.1, 300 sec: 43542.6). Total num frames: 610271232. Throughput: 0: 10558.6. Samples: 152628736. Policy #0 lag: (min: 44.0, avg: 130.1, max: 299.0) [2024-06-15 15:25:50,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 15:25:52,581][1652475] Updated weights for policy 0, policy_version 298049 (0.0013) [2024-06-15 15:25:55,738][1648984] Fps is (10 sec: 45875.1, 60 sec: 42052.2, 300 sec: 43098.2). Total num frames: 610533376. Throughput: 0: 10387.9. Samples: 152686592. Policy #0 lag: (min: 44.0, avg: 130.1, max: 299.0) [2024-06-15 15:25:55,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 15:25:58,009][1652475] Updated weights for policy 0, policy_version 298113 (0.0013) [2024-06-15 15:25:59,688][1652475] Updated weights for policy 0, policy_version 298177 (0.0014) [2024-06-15 15:26:00,738][1648984] Fps is (10 sec: 45875.4, 60 sec: 42052.2, 300 sec: 43320.4). Total num frames: 610729984. Throughput: 0: 10524.4. Samples: 152753664. Policy #0 lag: (min: 44.0, avg: 130.1, max: 299.0) [2024-06-15 15:26:00,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 15:26:01,030][1652475] Updated weights for policy 0, policy_version 298234 (0.0013) [2024-06-15 15:26:04,640][1652475] Updated weights for policy 0, policy_version 298288 (0.0012) [2024-06-15 15:26:05,738][1648984] Fps is (10 sec: 45875.6, 60 sec: 42598.4, 300 sec: 43209.3). Total num frames: 610992128. Throughput: 0: 10581.3. Samples: 152786432. 
Policy #0 lag: (min: 44.0, avg: 130.1, max: 299.0) [2024-06-15 15:26:05,738][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 15:26:05,891][1652475] Updated weights for policy 0, policy_version 298339 (0.0012) [2024-06-15 15:26:09,383][1652475] Updated weights for policy 0, policy_version 298384 (0.0013) [2024-06-15 15:26:10,738][1648984] Fps is (10 sec: 45872.5, 60 sec: 43690.2, 300 sec: 43320.3). Total num frames: 611188736. Throughput: 0: 10649.5. Samples: 152854528. Policy #0 lag: (min: 44.0, avg: 130.1, max: 299.0) [2024-06-15 15:26:10,740][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 15:26:13,334][1652475] Updated weights for policy 0, policy_version 298435 (0.0013) [2024-06-15 15:26:14,707][1652475] Updated weights for policy 0, policy_version 298492 (0.0015) [2024-06-15 15:26:15,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 42052.3, 300 sec: 43431.5). Total num frames: 611385344. Throughput: 0: 10638.2. Samples: 152915456. Policy #0 lag: (min: 44.0, avg: 130.1, max: 299.0) [2024-06-15 15:26:15,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 15:26:16,191][1652475] Updated weights for policy 0, policy_version 298552 (0.0013) [2024-06-15 15:26:20,738][1648984] Fps is (10 sec: 39323.8, 60 sec: 43690.6, 300 sec: 43098.3). Total num frames: 611581952. Throughput: 0: 10433.4. Samples: 152943616. Policy #0 lag: (min: 44.0, avg: 130.1, max: 299.0) [2024-06-15 15:26:20,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 15:26:20,825][1652475] Updated weights for policy 0, policy_version 298627 (0.0014) [2024-06-15 15:26:22,192][1652475] Updated weights for policy 0, policy_version 298688 (0.0012) [2024-06-15 15:26:25,738][1648984] Fps is (10 sec: 39321.3, 60 sec: 41506.1, 300 sec: 43320.4). Total num frames: 611778560. Throughput: 0: 10604.1. Samples: 153011200. 
Policy #0 lag: (min: 44.0, avg: 130.1, max: 299.0) [2024-06-15 15:26:25,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 15:26:25,782][1651340] Signal inference workers to stop experience collection... (15350 times) [2024-06-15 15:26:25,864][1652475] InferenceWorker_p0-w0: stopping experience collection (15350 times) [2024-06-15 15:26:26,057][1651340] Signal inference workers to resume experience collection... (15350 times) [2024-06-15 15:26:26,058][1652475] InferenceWorker_p0-w0: resuming experience collection (15350 times) [2024-06-15 15:26:26,811][1652475] Updated weights for policy 0, policy_version 298756 (0.0124) [2024-06-15 15:26:28,237][1652475] Updated weights for policy 0, policy_version 298815 (0.0012) [2024-06-15 15:26:30,738][1648984] Fps is (10 sec: 39320.8, 60 sec: 41506.0, 300 sec: 43209.3). Total num frames: 611975168. Throughput: 0: 10478.9. Samples: 153075712. Policy #0 lag: (min: 44.0, avg: 130.1, max: 299.0) [2024-06-15 15:26:30,739][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 15:26:32,741][1652475] Updated weights for policy 0, policy_version 298876 (0.0015) [2024-06-15 15:26:34,307][1652475] Updated weights for policy 0, policy_version 298942 (0.0016) [2024-06-15 15:26:35,750][1648984] Fps is (10 sec: 45827.8, 60 sec: 41498.9, 300 sec: 43096.8). Total num frames: 612237312. Throughput: 0: 10681.3. Samples: 153109504. Policy #0 lag: (min: 44.0, avg: 130.1, max: 299.0) [2024-06-15 15:26:35,753][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 15:26:37,958][1652475] Updated weights for policy 0, policy_version 298997 (0.0018) [2024-06-15 15:26:40,740][1648984] Fps is (10 sec: 52420.9, 60 sec: 43143.3, 300 sec: 43542.3). Total num frames: 612499456. Throughput: 0: 10649.2. Samples: 153165824. 
Policy #0 lag: (min: 44.0, avg: 130.1, max: 299.0) [2024-06-15 15:26:40,741][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 15:26:44,176][1652475] Updated weights for policy 0, policy_version 299078 (0.0013) [2024-06-15 15:26:45,738][1648984] Fps is (10 sec: 39362.2, 60 sec: 42598.4, 300 sec: 43098.2). Total num frames: 612630528. Throughput: 0: 10843.0. Samples: 153241600. Policy #0 lag: (min: 44.0, avg: 130.1, max: 299.0) [2024-06-15 15:26:45,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 15:26:46,015][1652475] Updated weights for policy 0, policy_version 299153 (0.0013) [2024-06-15 15:26:47,006][1652475] Updated weights for policy 0, policy_version 299200 (0.0012) [2024-06-15 15:26:50,738][1648984] Fps is (10 sec: 39328.5, 60 sec: 43690.7, 300 sec: 43431.5). Total num frames: 612892672. Throughput: 0: 10797.5. Samples: 153272320. Policy #0 lag: (min: 44.0, avg: 130.1, max: 299.0) [2024-06-15 15:26:50,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 15:26:51,434][1652475] Updated weights for policy 0, policy_version 299296 (0.0139) [2024-06-15 15:26:55,740][1648984] Fps is (10 sec: 39313.6, 60 sec: 41504.7, 300 sec: 43097.9). Total num frames: 613023744. Throughput: 0: 10660.6. Samples: 153334272. Policy #0 lag: (min: 44.0, avg: 130.1, max: 299.0) [2024-06-15 15:26:55,740][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 15:26:55,763][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000299328_613023744.pth... [2024-06-15 15:26:55,868][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000294272_602669056.pth [2024-06-15 15:26:56,964][1652475] Updated weights for policy 0, policy_version 299360 (0.0019) [2024-06-15 15:26:59,099][1652475] Updated weights for policy 0, policy_version 299440 (0.0015) [2024-06-15 15:27:00,738][1648984] Fps is (10 sec: 39321.2, 60 sec: 42598.3, 300 sec: 43098.3). Total num frames: 613285888. Throughput: 0: 10695.1. 
Samples: 153396736. Policy #0 lag: (min: 105.0, avg: 198.1, max: 383.0) [2024-06-15 15:27:00,739][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 15:27:02,365][1652475] Updated weights for policy 0, policy_version 299504 (0.0014) [2024-06-15 15:27:04,264][1652475] Updated weights for policy 0, policy_version 299584 (0.0011) [2024-06-15 15:27:05,738][1648984] Fps is (10 sec: 52439.4, 60 sec: 42598.3, 300 sec: 43542.6). Total num frames: 613548032. Throughput: 0: 10672.3. Samples: 153423872. Policy #0 lag: (min: 105.0, avg: 198.1, max: 383.0) [2024-06-15 15:27:05,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 15:27:10,359][1651340] Signal inference workers to stop experience collection... (15400 times) [2024-06-15 15:27:10,410][1652475] InferenceWorker_p0-w0: stopping experience collection (15400 times) [2024-06-15 15:27:10,606][1651340] Signal inference workers to resume experience collection... (15400 times) [2024-06-15 15:27:10,606][1652475] InferenceWorker_p0-w0: resuming experience collection (15400 times) [2024-06-15 15:27:10,738][1648984] Fps is (10 sec: 42599.0, 60 sec: 42052.7, 300 sec: 43098.3). Total num frames: 613711872. Throughput: 0: 10717.9. Samples: 153493504. Policy #0 lag: (min: 105.0, avg: 198.1, max: 383.0) [2024-06-15 15:27:10,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 15:27:10,857][1652475] Updated weights for policy 0, policy_version 299668 (0.0012) [2024-06-15 15:27:14,224][1652475] Updated weights for policy 0, policy_version 299744 (0.0013) [2024-06-15 15:27:15,738][1648984] Fps is (10 sec: 42598.9, 60 sec: 43144.5, 300 sec: 43431.5). Total num frames: 613974016. Throughput: 0: 10615.5. Samples: 153553408. 
Policy #0 lag: (min: 105.0, avg: 198.1, max: 383.0) [2024-06-15 15:27:15,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 15:27:16,381][1652475] Updated weights for policy 0, policy_version 299824 (0.0097) [2024-06-15 15:27:20,738][1648984] Fps is (10 sec: 36044.6, 60 sec: 41506.1, 300 sec: 42987.2). Total num frames: 614072320. Throughput: 0: 10617.9. Samples: 153587200. Policy #0 lag: (min: 105.0, avg: 198.1, max: 383.0) [2024-06-15 15:27:20,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 15:27:21,851][1652475] Updated weights for policy 0, policy_version 299872 (0.0014) [2024-06-15 15:27:23,557][1652475] Updated weights for policy 0, policy_version 299938 (0.0011) [2024-06-15 15:27:25,738][1648984] Fps is (10 sec: 36044.5, 60 sec: 42598.4, 300 sec: 43098.3). Total num frames: 614334464. Throughput: 0: 10718.3. Samples: 153648128. Policy #0 lag: (min: 105.0, avg: 198.1, max: 383.0) [2024-06-15 15:27:25,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 15:27:25,908][1652475] Updated weights for policy 0, policy_version 299987 (0.0022) [2024-06-15 15:27:28,260][1652475] Updated weights for policy 0, policy_version 300053 (0.0016) [2024-06-15 15:27:30,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 43690.8, 300 sec: 43098.3). Total num frames: 614596608. Throughput: 0: 10490.3. Samples: 153713664. Policy #0 lag: (min: 105.0, avg: 198.1, max: 383.0) [2024-06-15 15:27:30,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 15:27:33,001][1652475] Updated weights for policy 0, policy_version 300100 (0.0013) [2024-06-15 15:27:35,738][1648984] Fps is (10 sec: 39322.2, 60 sec: 41513.4, 300 sec: 42654.0). Total num frames: 614727680. Throughput: 0: 10615.5. Samples: 153750016. 
Policy #0 lag: (min: 105.0, avg: 198.1, max: 383.0) [2024-06-15 15:27:35,738][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 15:27:36,861][1652475] Updated weights for policy 0, policy_version 300161 (0.0022) [2024-06-15 15:27:38,787][1652475] Updated weights for policy 0, policy_version 300250 (0.0014) [2024-06-15 15:27:39,823][1652475] Updated weights for policy 0, policy_version 300289 (0.0017) [2024-06-15 15:27:40,738][1648984] Fps is (10 sec: 49151.8, 60 sec: 43145.7, 300 sec: 43653.6). Total num frames: 615088128. Throughput: 0: 10468.0. Samples: 153805312. Policy #0 lag: (min: 105.0, avg: 198.1, max: 383.0) [2024-06-15 15:27:40,739][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 15:27:44,668][1652475] Updated weights for policy 0, policy_version 300368 (0.0014) [2024-06-15 15:27:45,738][1648984] Fps is (10 sec: 49151.4, 60 sec: 43144.6, 300 sec: 42876.1). Total num frames: 615219200. Throughput: 0: 10615.5. Samples: 153874432. Policy #0 lag: (min: 105.0, avg: 198.1, max: 383.0) [2024-06-15 15:27:45,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 15:27:48,853][1652475] Updated weights for policy 0, policy_version 300432 (0.0012) [2024-06-15 15:27:50,738][1648984] Fps is (10 sec: 29491.6, 60 sec: 41506.1, 300 sec: 43098.3). Total num frames: 615383040. Throughput: 0: 10820.3. Samples: 153910784. Policy #0 lag: (min: 105.0, avg: 198.1, max: 383.0) [2024-06-15 15:27:50,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 15:27:51,805][1652475] Updated weights for policy 0, policy_version 300517 (0.0016) [2024-06-15 15:27:53,078][1652475] Updated weights for policy 0, policy_version 300580 (0.0016) [2024-06-15 15:27:55,740][1648984] Fps is (10 sec: 42588.5, 60 sec: 43690.5, 300 sec: 43097.9). Total num frames: 615645184. Throughput: 0: 10649.0. Samples: 153972736. 
Policy #0 lag: (min: 105.0, avg: 198.1, max: 383.0) [2024-06-15 15:27:55,741][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 15:27:55,780][1652475] Updated weights for policy 0, policy_version 300624 (0.0012) [2024-06-15 15:27:55,894][1651340] Signal inference workers to stop experience collection... (15450 times) [2024-06-15 15:27:55,952][1652475] InferenceWorker_p0-w0: stopping experience collection (15450 times) [2024-06-15 15:27:56,202][1651340] Signal inference workers to resume experience collection... (15450 times) [2024-06-15 15:27:56,202][1652475] InferenceWorker_p0-w0: resuming experience collection (15450 times) [2024-06-15 15:27:56,865][1652475] Updated weights for policy 0, policy_version 300665 (0.0011) [2024-06-15 15:28:00,738][1648984] Fps is (10 sec: 42598.1, 60 sec: 42052.3, 300 sec: 42765.0). Total num frames: 615809024. Throughput: 0: 11047.8. Samples: 154050560. Policy #0 lag: (min: 105.0, avg: 198.1, max: 383.0) [2024-06-15 15:28:00,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 15:28:01,476][1652475] Updated weights for policy 0, policy_version 300736 (0.0014) [2024-06-15 15:28:03,440][1652475] Updated weights for policy 0, policy_version 300784 (0.0013) [2024-06-15 15:28:04,990][1652475] Updated weights for policy 0, policy_version 300857 (0.0014) [2024-06-15 15:28:05,738][1648984] Fps is (10 sec: 52441.1, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 616169472. Throughput: 0: 10922.7. Samples: 154078720. Policy #0 lag: (min: 105.0, avg: 198.1, max: 383.0) [2024-06-15 15:28:05,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 15:28:07,997][1652475] Updated weights for policy 0, policy_version 300899 (0.0013) [2024-06-15 15:28:08,503][1652475] Updated weights for policy 0, policy_version 300928 (0.0013) [2024-06-15 15:28:10,740][1648984] Fps is (10 sec: 49142.8, 60 sec: 43143.1, 300 sec: 43098.0). Total num frames: 616300544. Throughput: 0: 11172.5. Samples: 154150912. 
Policy #0 lag: (min: 105.0, avg: 198.1, max: 383.0) [2024-06-15 15:28:10,740][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 15:28:12,772][1652475] Updated weights for policy 0, policy_version 300982 (0.0020) [2024-06-15 15:28:14,246][1652475] Updated weights for policy 0, policy_version 301041 (0.0113) [2024-06-15 15:28:15,561][1652475] Updated weights for policy 0, policy_version 301114 (0.0015) [2024-06-15 15:28:15,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 45329.1, 300 sec: 43986.9). Total num frames: 616693760. Throughput: 0: 11116.1. Samples: 154213888. Policy #0 lag: (min: 24.0, avg: 128.0, max: 271.0) [2024-06-15 15:28:15,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 15:28:19,413][1652475] Updated weights for policy 0, policy_version 301175 (0.0012) [2024-06-15 15:28:20,738][1648984] Fps is (10 sec: 52439.0, 60 sec: 45875.2, 300 sec: 43431.5). Total num frames: 616824832. Throughput: 0: 11173.0. Samples: 154252800. Policy #0 lag: (min: 24.0, avg: 128.0, max: 271.0) [2024-06-15 15:28:20,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 15:28:23,591][1652475] Updated weights for policy 0, policy_version 301216 (0.0020) [2024-06-15 15:28:25,494][1652475] Updated weights for policy 0, policy_version 301267 (0.0015) [2024-06-15 15:28:25,738][1648984] Fps is (10 sec: 32767.9, 60 sec: 44783.0, 300 sec: 43320.5). Total num frames: 617021440. Throughput: 0: 11468.8. Samples: 154321408. Policy #0 lag: (min: 24.0, avg: 128.0, max: 271.0) [2024-06-15 15:28:25,740][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 15:28:27,248][1652475] Updated weights for policy 0, policy_version 301331 (0.0025) [2024-06-15 15:28:30,060][1652475] Updated weights for policy 0, policy_version 301412 (0.0014) [2024-06-15 15:28:30,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 45875.3, 300 sec: 43764.7). Total num frames: 617349120. Throughput: 0: 11286.8. Samples: 154382336. 
Policy #0 lag: (min: 24.0, avg: 128.0, max: 271.0) [2024-06-15 15:28:30,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 15:28:34,549][1652475] Updated weights for policy 0, policy_version 301472 (0.0012) [2024-06-15 15:28:35,738][1648984] Fps is (10 sec: 45875.2, 60 sec: 45875.1, 300 sec: 43320.4). Total num frames: 617480192. Throughput: 0: 11264.0. Samples: 154417664. Policy #0 lag: (min: 24.0, avg: 128.0, max: 271.0) [2024-06-15 15:28:35,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 15:28:36,881][1652475] Updated weights for policy 0, policy_version 301522 (0.0026) [2024-06-15 15:28:39,680][1652475] Updated weights for policy 0, policy_version 301573 (0.0013) [2024-06-15 15:28:40,738][1648984] Fps is (10 sec: 36044.6, 60 sec: 43690.7, 300 sec: 43875.8). Total num frames: 617709568. Throughput: 0: 11367.0. Samples: 154484224. Policy #0 lag: (min: 24.0, avg: 128.0, max: 271.0) [2024-06-15 15:28:40,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 15:28:42,831][1651340] Signal inference workers to stop experience collection... (15500 times) [2024-06-15 15:28:42,869][1652475] InferenceWorker_p0-w0: stopping experience collection (15500 times) [2024-06-15 15:28:43,063][1651340] Signal inference workers to resume experience collection... (15500 times) [2024-06-15 15:28:43,064][1652475] InferenceWorker_p0-w0: resuming experience collection (15500 times) [2024-06-15 15:28:43,448][1652475] Updated weights for policy 0, policy_version 301680 (0.0083) [2024-06-15 15:28:45,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 44236.8, 300 sec: 43098.3). Total num frames: 617873408. Throughput: 0: 11104.7. Samples: 154550272. 
Policy #0 lag: (min: 24.0, avg: 128.0, max: 271.0) [2024-06-15 15:28:45,739][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 15:28:46,900][1652475] Updated weights for policy 0, policy_version 301760 (0.0017) [2024-06-15 15:28:49,029][1652475] Updated weights for policy 0, policy_version 301812 (0.0056) [2024-06-15 15:28:50,741][1648984] Fps is (10 sec: 42586.2, 60 sec: 45873.0, 300 sec: 43542.1). Total num frames: 618135552. Throughput: 0: 11195.0. Samples: 154582528. Policy #0 lag: (min: 24.0, avg: 128.0, max: 271.0) [2024-06-15 15:28:50,741][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 15:28:51,886][1652475] Updated weights for policy 0, policy_version 301872 (0.0017) [2024-06-15 15:28:55,738][1648984] Fps is (10 sec: 45874.9, 60 sec: 44784.7, 300 sec: 43764.7). Total num frames: 618332160. Throughput: 0: 11082.4. Samples: 154649600. Policy #0 lag: (min: 24.0, avg: 128.0, max: 271.0) [2024-06-15 15:28:55,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 15:28:56,175][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000301936_618364928.pth... [2024-06-15 15:28:56,211][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000296896_608043008.pth [2024-06-15 15:28:56,302][1652475] Updated weights for policy 0, policy_version 301937 (0.0090) [2024-06-15 15:28:58,157][1652475] Updated weights for policy 0, policy_version 301984 (0.0012) [2024-06-15 15:28:59,971][1652475] Updated weights for policy 0, policy_version 302033 (0.0014) [2024-06-15 15:29:00,738][1648984] Fps is (10 sec: 49166.5, 60 sec: 46967.5, 300 sec: 43431.5). Total num frames: 618627072. Throughput: 0: 11150.2. Samples: 154715648. 
Policy #0 lag: (min: 24.0, avg: 128.0, max: 271.0) [2024-06-15 15:29:00,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 15:29:00,778][1652475] Updated weights for policy 0, policy_version 302080 (0.0033) [2024-06-15 15:29:05,738][1648984] Fps is (10 sec: 45875.3, 60 sec: 43690.7, 300 sec: 43544.5). Total num frames: 618790912. Throughput: 0: 11025.1. Samples: 154748928. Policy #0 lag: (min: 24.0, avg: 128.0, max: 271.0) [2024-06-15 15:29:05,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 15:29:07,036][1652475] Updated weights for policy 0, policy_version 302166 (0.0018) [2024-06-15 15:29:07,992][1652475] Updated weights for policy 0, policy_version 302203 (0.0012) [2024-06-15 15:29:10,336][1652475] Updated weights for policy 0, policy_version 302272 (0.0039) [2024-06-15 15:29:10,739][1648984] Fps is (10 sec: 42598.0, 60 sec: 45876.6, 300 sec: 43764.7). Total num frames: 619053056. Throughput: 0: 11047.8. Samples: 154818560. Policy #0 lag: (min: 24.0, avg: 128.0, max: 271.0) [2024-06-15 15:29:10,740][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 15:29:12,791][1652475] Updated weights for policy 0, policy_version 302331 (0.0038) [2024-06-15 15:29:15,738][1648984] Fps is (10 sec: 49150.3, 60 sec: 43144.2, 300 sec: 43875.9). Total num frames: 619282432. Throughput: 0: 11104.6. Samples: 154882048. Policy #0 lag: (min: 24.0, avg: 128.0, max: 271.0) [2024-06-15 15:29:15,739][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 15:29:15,748][1652475] Updated weights for policy 0, policy_version 302393 (0.0013) [2024-06-15 15:29:19,929][1652475] Updated weights for policy 0, policy_version 302463 (0.0012) [2024-06-15 15:29:20,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 43690.6, 300 sec: 43764.7). Total num frames: 619446272. Throughput: 0: 11138.8. Samples: 154918912. 
Policy #0 lag: (min: 24.0, avg: 128.0, max: 271.0) [2024-06-15 15:29:20,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 15:29:23,353][1652475] Updated weights for policy 0, policy_version 302529 (0.0013) [2024-06-15 15:29:24,570][1652475] Updated weights for policy 0, policy_version 302586 (0.0016) [2024-06-15 15:29:25,738][1648984] Fps is (10 sec: 42599.6, 60 sec: 44782.9, 300 sec: 43542.5). Total num frames: 619708416. Throughput: 0: 11025.1. Samples: 154980352. Policy #0 lag: (min: 24.0, avg: 128.0, max: 271.0) [2024-06-15 15:29:25,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 15:29:27,922][1652475] Updated weights for policy 0, policy_version 302655 (0.0015) [2024-06-15 15:29:30,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 41506.1, 300 sec: 43542.6). Total num frames: 619839488. Throughput: 0: 11138.8. Samples: 155051520. Policy #0 lag: (min: 35.0, avg: 159.2, max: 291.0) [2024-06-15 15:29:30,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 15:29:32,770][1651340] Signal inference workers to stop experience collection... (15550 times) [2024-06-15 15:29:32,831][1652475] InferenceWorker_p0-w0: stopping experience collection (15550 times) [2024-06-15 15:29:32,963][1651340] Signal inference workers to resume experience collection... (15550 times) [2024-06-15 15:29:32,995][1652475] InferenceWorker_p0-w0: resuming experience collection (15550 times) [2024-06-15 15:29:33,149][1652475] Updated weights for policy 0, policy_version 302738 (0.0014) [2024-06-15 15:29:34,799][1652475] Updated weights for policy 0, policy_version 302786 (0.0042) [2024-06-15 15:29:35,738][1648984] Fps is (10 sec: 49151.7, 60 sec: 45329.0, 300 sec: 43875.8). Total num frames: 620199936. Throughput: 0: 11116.8. Samples: 155082752. 
Policy #0 lag: (min: 35.0, avg: 159.2, max: 291.0) [2024-06-15 15:29:35,739][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 15:29:35,930][1652475] Updated weights for policy 0, policy_version 302843 (0.0015) [2024-06-15 15:29:40,738][1648984] Fps is (10 sec: 52428.5, 60 sec: 44236.8, 300 sec: 43542.6). Total num frames: 620363776. Throughput: 0: 10956.8. Samples: 155142656. Policy #0 lag: (min: 35.0, avg: 159.2, max: 291.0) [2024-06-15 15:29:40,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 15:29:42,832][1652475] Updated weights for policy 0, policy_version 302928 (0.0019) [2024-06-15 15:29:45,738][1648984] Fps is (10 sec: 29491.5, 60 sec: 43690.6, 300 sec: 43098.2). Total num frames: 620494848. Throughput: 0: 11025.0. Samples: 155211776. Policy #0 lag: (min: 35.0, avg: 159.2, max: 291.0) [2024-06-15 15:29:45,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 15:29:46,695][1652475] Updated weights for policy 0, policy_version 303009 (0.0015) [2024-06-15 15:29:48,960][1652475] Updated weights for policy 0, policy_version 303104 (0.0158) [2024-06-15 15:29:50,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 43692.7, 300 sec: 43209.3). Total num frames: 620756992. Throughput: 0: 10911.3. Samples: 155239936. Policy #0 lag: (min: 35.0, avg: 159.2, max: 291.0) [2024-06-15 15:29:50,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 15:29:54,944][1652475] Updated weights for policy 0, policy_version 303184 (0.0115) [2024-06-15 15:29:55,738][1648984] Fps is (10 sec: 49151.7, 60 sec: 44236.7, 300 sec: 43320.4). Total num frames: 620986368. Throughput: 0: 10786.1. Samples: 155303936. Policy #0 lag: (min: 35.0, avg: 159.2, max: 291.0) [2024-06-15 15:29:55,738][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 15:29:59,465][1652475] Updated weights for policy 0, policy_version 303253 (0.0013) [2024-06-15 15:30:00,740][1648984] Fps is (10 sec: 39321.7, 60 sec: 42052.2, 300 sec: 43098.2). Total num frames: 621150208. 
Throughput: 0: 10900.0. Samples: 155372544. Policy #0 lag: (min: 35.0, avg: 159.2, max: 291.0) [2024-06-15 15:30:00,741][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 15:30:01,052][1652475] Updated weights for policy 0, policy_version 303314 (0.0012) [2024-06-15 15:30:03,209][1652475] Updated weights for policy 0, policy_version 303392 (0.0013) [2024-06-15 15:30:05,738][1648984] Fps is (10 sec: 42598.2, 60 sec: 43690.6, 300 sec: 43542.5). Total num frames: 621412352. Throughput: 0: 10535.8. Samples: 155393024. Policy #0 lag: (min: 35.0, avg: 159.2, max: 291.0) [2024-06-15 15:30:05,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 15:30:07,557][1652475] Updated weights for policy 0, policy_version 303459 (0.0013) [2024-06-15 15:30:10,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 41506.1, 300 sec: 42987.2). Total num frames: 621543424. Throughput: 0: 10729.2. Samples: 155463168. Policy #0 lag: (min: 35.0, avg: 159.2, max: 291.0) [2024-06-15 15:30:10,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 15:30:11,267][1652475] Updated weights for policy 0, policy_version 303504 (0.0014) [2024-06-15 15:30:13,989][1652475] Updated weights for policy 0, policy_version 303570 (0.0015) [2024-06-15 15:30:15,738][1648984] Fps is (10 sec: 42598.9, 60 sec: 42598.6, 300 sec: 43653.6). Total num frames: 621838336. Throughput: 0: 10558.6. Samples: 155526656. Policy #0 lag: (min: 35.0, avg: 159.2, max: 291.0) [2024-06-15 15:30:15,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 15:30:15,939][1652475] Updated weights for policy 0, policy_version 303651 (0.0023) [2024-06-15 15:30:18,980][1651340] Signal inference workers to stop experience collection... (15600 times) [2024-06-15 15:30:19,024][1652475] InferenceWorker_p0-w0: stopping experience collection (15600 times) [2024-06-15 15:30:19,246][1651340] Signal inference workers to resume experience collection... 
(15600 times) [2024-06-15 15:30:19,246][1652475] InferenceWorker_p0-w0: resuming experience collection (15600 times) [2024-06-15 15:30:19,797][1652475] Updated weights for policy 0, policy_version 303740 (0.0013) [2024-06-15 15:30:20,738][1648984] Fps is (10 sec: 52429.2, 60 sec: 43690.7, 300 sec: 43320.4). Total num frames: 622067712. Throughput: 0: 10626.9. Samples: 155560960. Policy #0 lag: (min: 35.0, avg: 159.2, max: 291.0) [2024-06-15 15:30:20,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 15:30:24,184][1652475] Updated weights for policy 0, policy_version 303804 (0.0013) [2024-06-15 15:30:25,740][1648984] Fps is (10 sec: 36034.7, 60 sec: 41504.2, 300 sec: 43097.8). Total num frames: 622198784. Throughput: 0: 10774.1. Samples: 155627520. Policy #0 lag: (min: 35.0, avg: 159.2, max: 291.0) [2024-06-15 15:30:25,741][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 15:30:27,533][1652475] Updated weights for policy 0, policy_version 303904 (0.0039) [2024-06-15 15:30:30,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 43690.7, 300 sec: 43098.3). Total num frames: 622460928. Throughput: 0: 10615.5. Samples: 155689472. Policy #0 lag: (min: 35.0, avg: 159.2, max: 291.0) [2024-06-15 15:30:30,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 15:30:31,384][1652475] Updated weights for policy 0, policy_version 303957 (0.0034) [2024-06-15 15:30:34,803][1652475] Updated weights for policy 0, policy_version 304016 (0.0012) [2024-06-15 15:30:35,738][1648984] Fps is (10 sec: 49166.2, 60 sec: 41506.3, 300 sec: 43320.4). Total num frames: 622690304. Throughput: 0: 10831.7. Samples: 155727360. 
Policy #0 lag: (min: 35.0, avg: 159.2, max: 291.0) [2024-06-15 15:30:35,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 15:30:37,343][1652475] Updated weights for policy 0, policy_version 304067 (0.0025) [2024-06-15 15:30:38,596][1652475] Updated weights for policy 0, policy_version 304128 (0.0013) [2024-06-15 15:30:39,960][1652475] Updated weights for policy 0, policy_version 304182 (0.0011) [2024-06-15 15:30:40,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 43690.7, 300 sec: 43764.7). Total num frames: 622985216. Throughput: 0: 10865.8. Samples: 155792896. Policy #0 lag: (min: 35.0, avg: 159.2, max: 291.0) [2024-06-15 15:30:40,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 15:30:42,935][1652475] Updated weights for policy 0, policy_version 304226 (0.0013) [2024-06-15 15:30:45,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 623116288. Throughput: 0: 10865.8. Samples: 155861504. Policy #0 lag: (min: 15.0, avg: 141.8, max: 271.0) [2024-06-15 15:30:45,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 15:30:47,000][1652475] Updated weights for policy 0, policy_version 304314 (0.0014) [2024-06-15 15:30:49,849][1652475] Updated weights for policy 0, policy_version 304380 (0.0146) [2024-06-15 15:30:50,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 43690.8, 300 sec: 43542.6). Total num frames: 623378432. Throughput: 0: 11070.6. Samples: 155891200. Policy #0 lag: (min: 15.0, avg: 141.8, max: 271.0) [2024-06-15 15:30:50,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 15:30:53,752][1652475] Updated weights for policy 0, policy_version 304440 (0.0021) [2024-06-15 15:30:55,738][1648984] Fps is (10 sec: 45874.7, 60 sec: 43144.6, 300 sec: 43542.6). Total num frames: 623575040. Throughput: 0: 10945.4. Samples: 155955712. 
Policy #0 lag: (min: 15.0, avg: 141.8, max: 271.0) [2024-06-15 15:30:55,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 15:30:56,032][1652475] Updated weights for policy 0, policy_version 304498 (0.0013) [2024-06-15 15:30:56,287][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000304512_623640576.pth... [2024-06-15 15:30:56,340][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000299328_613023744.pth [2024-06-15 15:30:56,345][1651340] Saving a milestone train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/milestones/checkpoint_000304512_623640576.pth [2024-06-15 15:30:57,702][1652475] Updated weights for policy 0, policy_version 304544 (0.0011) [2024-06-15 15:30:58,602][1652475] Updated weights for policy 0, policy_version 304576 (0.0018) [2024-06-15 15:31:00,738][1648984] Fps is (10 sec: 45875.2, 60 sec: 44783.0, 300 sec: 43542.6). Total num frames: 623837184. Throughput: 0: 10968.2. Samples: 156020224. Policy #0 lag: (min: 15.0, avg: 141.8, max: 271.0) [2024-06-15 15:31:00,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 15:31:01,207][1652475] Updated weights for policy 0, policy_version 304635 (0.0019) [2024-06-15 15:31:05,177][1652475] Updated weights for policy 0, policy_version 304689 (0.0013) [2024-06-15 15:31:05,738][1648984] Fps is (10 sec: 45875.6, 60 sec: 43690.8, 300 sec: 43542.7). Total num frames: 624033792. Throughput: 0: 10990.9. Samples: 156055552. Policy #0 lag: (min: 15.0, avg: 141.8, max: 271.0) [2024-06-15 15:31:05,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 15:31:08,254][1651340] Signal inference workers to stop experience collection... (15650 times) [2024-06-15 15:31:08,388][1652475] InferenceWorker_p0-w0: stopping experience collection (15650 times) [2024-06-15 15:31:08,556][1651340] Signal inference workers to resume experience collection... 
(15650 times) [2024-06-15 15:31:08,557][1652475] InferenceWorker_p0-w0: resuming experience collection (15650 times) [2024-06-15 15:31:08,751][1652475] Updated weights for policy 0, policy_version 304740 (0.0012) [2024-06-15 15:31:10,720][1652475] Updated weights for policy 0, policy_version 304816 (0.0015) [2024-06-15 15:31:10,738][1648984] Fps is (10 sec: 42598.1, 60 sec: 45329.1, 300 sec: 43653.6). Total num frames: 624263168. Throughput: 0: 11071.3. Samples: 156125696. Policy #0 lag: (min: 15.0, avg: 141.8, max: 271.0) [2024-06-15 15:31:10,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 15:31:12,589][1652475] Updated weights for policy 0, policy_version 304893 (0.0013) [2024-06-15 15:31:15,738][1648984] Fps is (10 sec: 42598.6, 60 sec: 43690.8, 300 sec: 43653.7). Total num frames: 624459776. Throughput: 0: 11059.2. Samples: 156187136. Policy #0 lag: (min: 15.0, avg: 141.8, max: 271.0) [2024-06-15 15:31:15,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 15:31:16,565][1652475] Updated weights for policy 0, policy_version 304953 (0.0013) [2024-06-15 15:31:20,658][1652475] Updated weights for policy 0, policy_version 305016 (0.0015) [2024-06-15 15:31:20,738][1648984] Fps is (10 sec: 39321.9, 60 sec: 43144.6, 300 sec: 43653.7). Total num frames: 624656384. Throughput: 0: 10979.6. Samples: 156221440. Policy #0 lag: (min: 15.0, avg: 141.8, max: 271.0) [2024-06-15 15:31:20,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 15:31:22,337][1652475] Updated weights for policy 0, policy_version 305072 (0.0011) [2024-06-15 15:31:24,300][1652475] Updated weights for policy 0, policy_version 305145 (0.0014) [2024-06-15 15:31:25,738][1648984] Fps is (10 sec: 49150.5, 60 sec: 45877.2, 300 sec: 43986.9). Total num frames: 624951296. Throughput: 0: 10877.1. Samples: 156282368. 
Policy #0 lag: (min: 15.0, avg: 141.8, max: 271.0) [2024-06-15 15:31:25,739][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 15:31:28,132][1652475] Updated weights for policy 0, policy_version 305202 (0.0037) [2024-06-15 15:31:30,738][1648984] Fps is (10 sec: 42598.0, 60 sec: 43690.6, 300 sec: 43544.1). Total num frames: 625082368. Throughput: 0: 11036.4. Samples: 156358144. Policy #0 lag: (min: 15.0, avg: 141.8, max: 271.0) [2024-06-15 15:31:30,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 15:31:31,994][1652475] Updated weights for policy 0, policy_version 305252 (0.0014) [2024-06-15 15:31:33,526][1652475] Updated weights for policy 0, policy_version 305312 (0.0013) [2024-06-15 15:31:35,553][1652475] Updated weights for policy 0, policy_version 305376 (0.0014) [2024-06-15 15:31:35,738][1648984] Fps is (10 sec: 45876.1, 60 sec: 45329.0, 300 sec: 43765.0). Total num frames: 625410048. Throughput: 0: 11104.7. Samples: 156390912. Policy #0 lag: (min: 15.0, avg: 141.8, max: 271.0) [2024-06-15 15:31:35,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 15:31:39,721][1652475] Updated weights for policy 0, policy_version 305470 (0.0076) [2024-06-15 15:31:40,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 43690.7, 300 sec: 43986.9). Total num frames: 625606656. Throughput: 0: 11070.6. Samples: 156453888. Policy #0 lag: (min: 15.0, avg: 141.8, max: 271.0) [2024-06-15 15:31:40,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 15:31:44,316][1652475] Updated weights for policy 0, policy_version 305533 (0.0012) [2024-06-15 15:31:45,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 44782.9, 300 sec: 43764.7). Total num frames: 625803264. Throughput: 0: 11218.5. Samples: 156525056. 
Policy #0 lag: (min: 15.0, avg: 141.8, max: 271.0) [2024-06-15 15:31:45,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 15:31:45,969][1652475] Updated weights for policy 0, policy_version 305594 (0.0012) [2024-06-15 15:31:47,996][1652475] Updated weights for policy 0, policy_version 305664 (0.0013) [2024-06-15 15:31:50,738][1648984] Fps is (10 sec: 42597.9, 60 sec: 44236.7, 300 sec: 44098.3). Total num frames: 626032640. Throughput: 0: 11036.4. Samples: 156552192. Policy #0 lag: (min: 15.0, avg: 141.8, max: 271.0) [2024-06-15 15:31:50,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 15:31:51,438][1652475] Updated weights for policy 0, policy_version 305724 (0.0015) [2024-06-15 15:31:55,738][1648984] Fps is (10 sec: 32767.2, 60 sec: 42598.3, 300 sec: 43542.5). Total num frames: 626130944. Throughput: 0: 11093.3. Samples: 156624896. Policy #0 lag: (min: 15.0, avg: 141.8, max: 271.0) [2024-06-15 15:31:55,739][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 15:31:56,567][1651340] Signal inference workers to stop experience collection... (15700 times) [2024-06-15 15:31:56,617][1652475] InferenceWorker_p0-w0: stopping experience collection (15700 times) [2024-06-15 15:31:56,889][1651340] Signal inference workers to resume experience collection... (15700 times) [2024-06-15 15:31:56,889][1652475] InferenceWorker_p0-w0: resuming experience collection (15700 times) [2024-06-15 15:31:57,327][1652475] Updated weights for policy 0, policy_version 305792 (0.0018) [2024-06-15 15:31:58,996][1652475] Updated weights for policy 0, policy_version 305861 (0.0015) [2024-06-15 15:32:00,738][1648984] Fps is (10 sec: 49152.2, 60 sec: 44782.9, 300 sec: 43986.9). Total num frames: 626524160. Throughput: 0: 10968.1. Samples: 156680704. 
Policy #0 lag: (min: 31.0, avg: 79.8, max: 223.0) [2024-06-15 15:32:00,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 15:32:01,847][1652475] Updated weights for policy 0, policy_version 305936 (0.0015) [2024-06-15 15:32:05,743][1648984] Fps is (10 sec: 52404.8, 60 sec: 43687.1, 300 sec: 43875.1). Total num frames: 626655232. Throughput: 0: 10921.5. Samples: 156712960. Policy #0 lag: (min: 31.0, avg: 79.8, max: 223.0) [2024-06-15 15:32:05,743][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 15:32:08,373][1652475] Updated weights for policy 0, policy_version 305985 (0.0014) [2024-06-15 15:32:10,738][1648984] Fps is (10 sec: 29491.4, 60 sec: 42598.4, 300 sec: 43542.6). Total num frames: 626819072. Throughput: 0: 11161.6. Samples: 156784640. Policy #0 lag: (min: 31.0, avg: 79.8, max: 223.0) [2024-06-15 15:32:10,741][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 15:32:11,131][1652475] Updated weights for policy 0, policy_version 306080 (0.0012) [2024-06-15 15:32:12,862][1652475] Updated weights for policy 0, policy_version 306163 (0.0015) [2024-06-15 15:32:14,214][1652475] Updated weights for policy 0, policy_version 306224 (0.0014) [2024-06-15 15:32:15,743][1648984] Fps is (10 sec: 52425.6, 60 sec: 45324.9, 300 sec: 44430.4). Total num frames: 627179520. Throughput: 0: 10693.8. Samples: 156839424. Policy #0 lag: (min: 31.0, avg: 79.8, max: 223.0) [2024-06-15 15:32:15,744][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 15:32:20,738][1648984] Fps is (10 sec: 36044.6, 60 sec: 42052.2, 300 sec: 43542.6). Total num frames: 627179520. Throughput: 0: 10820.3. Samples: 156877824. 
Policy #0 lag: (min: 31.0, avg: 79.8, max: 223.0) [2024-06-15 15:32:20,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 15:32:21,279][1652475] Updated weights for policy 0, policy_version 306272 (0.0013) [2024-06-15 15:32:23,162][1652475] Updated weights for policy 0, policy_version 306339 (0.0012) [2024-06-15 15:32:25,657][1652475] Updated weights for policy 0, policy_version 306419 (0.0014) [2024-06-15 15:32:25,738][1648984] Fps is (10 sec: 36064.1, 60 sec: 43144.6, 300 sec: 43875.8). Total num frames: 627539968. Throughput: 0: 10797.5. Samples: 156939776. Policy #0 lag: (min: 31.0, avg: 79.8, max: 223.0) [2024-06-15 15:32:25,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 15:32:27,286][1652475] Updated weights for policy 0, policy_version 306490 (0.0013) [2024-06-15 15:32:30,739][1648984] Fps is (10 sec: 52428.8, 60 sec: 43690.6, 300 sec: 43986.8). Total num frames: 627703808. Throughput: 0: 10729.2. Samples: 157007872. Policy #0 lag: (min: 31.0, avg: 79.8, max: 223.0) [2024-06-15 15:32:30,739][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 15:32:33,345][1652475] Updated weights for policy 0, policy_version 306528 (0.0013) [2024-06-15 15:32:35,479][1652475] Updated weights for policy 0, policy_version 306612 (0.0015) [2024-06-15 15:32:35,738][1648984] Fps is (10 sec: 42598.5, 60 sec: 42598.4, 300 sec: 43653.6). Total num frames: 627965952. Throughput: 0: 10956.8. Samples: 157045248. Policy #0 lag: (min: 31.0, avg: 79.8, max: 223.0) [2024-06-15 15:32:35,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 15:32:36,934][1652475] Updated weights for policy 0, policy_version 306656 (0.0037) [2024-06-15 15:32:37,758][1651340] Signal inference workers to stop experience collection... (15750 times) [2024-06-15 15:32:37,788][1652475] InferenceWorker_p0-w0: stopping experience collection (15750 times) [2024-06-15 15:32:37,995][1651340] Signal inference workers to resume experience collection... 
(15750 times) [2024-06-15 15:32:38,010][1652475] InferenceWorker_p0-w0: resuming experience collection (15750 times) [2024-06-15 15:32:39,045][1652475] Updated weights for policy 0, policy_version 306745 (0.0014) [2024-06-15 15:32:40,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 43690.6, 300 sec: 44097.9). Total num frames: 628228096. Throughput: 0: 10422.1. Samples: 157093888. Policy #0 lag: (min: 31.0, avg: 79.8, max: 223.0) [2024-06-15 15:32:40,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 15:32:45,738][1648984] Fps is (10 sec: 32767.8, 60 sec: 41506.1, 300 sec: 43764.7). Total num frames: 628293632. Throughput: 0: 10945.4. Samples: 157173248. Policy #0 lag: (min: 31.0, avg: 79.8, max: 223.0) [2024-06-15 15:32:45,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 15:32:46,070][1652475] Updated weights for policy 0, policy_version 306800 (0.0012) [2024-06-15 15:32:47,542][1652475] Updated weights for policy 0, policy_version 306880 (0.0015) [2024-06-15 15:32:49,269][1652475] Updated weights for policy 0, policy_version 306938 (0.0011) [2024-06-15 15:32:50,738][1648984] Fps is (10 sec: 45875.3, 60 sec: 44236.9, 300 sec: 44209.4). Total num frames: 628686848. Throughput: 0: 10810.0. Samples: 157199360. Policy #0 lag: (min: 31.0, avg: 79.8, max: 223.0) [2024-06-15 15:32:50,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 15:32:51,100][1652475] Updated weights for policy 0, policy_version 306992 (0.0024) [2024-06-15 15:32:55,738][1648984] Fps is (10 sec: 45875.3, 60 sec: 43690.8, 300 sec: 43875.8). Total num frames: 628752384. Throughput: 0: 10626.8. Samples: 157262848. Policy #0 lag: (min: 31.0, avg: 79.8, max: 223.0) [2024-06-15 15:32:55,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 15:32:55,747][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000307008_628752384.pth... 
[2024-06-15 15:32:55,812][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000301936_618364928.pth [2024-06-15 15:32:58,191][1652475] Updated weights for policy 0, policy_version 307072 (0.0015) [2024-06-15 15:32:59,939][1652475] Updated weights for policy 0, policy_version 307140 (0.0141) [2024-06-15 15:33:00,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 42598.4, 300 sec: 43764.7). Total num frames: 629080064. Throughput: 0: 10685.0. Samples: 157320192. Policy #0 lag: (min: 31.0, avg: 79.8, max: 223.0) [2024-06-15 15:33:00,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 15:33:01,247][1652475] Updated weights for policy 0, policy_version 307192 (0.0025) [2024-06-15 15:33:05,738][1648984] Fps is (10 sec: 39322.0, 60 sec: 41509.5, 300 sec: 43542.8). Total num frames: 629145600. Throughput: 0: 10592.7. Samples: 157354496. Policy #0 lag: (min: 31.0, avg: 79.8, max: 223.0) [2024-06-15 15:33:05,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 15:33:06,820][1652475] Updated weights for policy 0, policy_version 307260 (0.0015) [2024-06-15 15:33:09,891][1652475] Updated weights for policy 0, policy_version 307327 (0.0017) [2024-06-15 15:33:10,738][1648984] Fps is (10 sec: 36044.8, 60 sec: 43690.7, 300 sec: 43209.3). Total num frames: 629440512. Throughput: 0: 10740.6. Samples: 157423104. Policy #0 lag: (min: 15.0, avg: 133.7, max: 303.0) [2024-06-15 15:33:10,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 15:33:12,022][1652475] Updated weights for policy 0, policy_version 307395 (0.0011) [2024-06-15 15:33:15,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 41509.9, 300 sec: 43542.6). Total num frames: 629669888. Throughput: 0: 10490.3. Samples: 157479936. 
Policy #0 lag: (min: 15.0, avg: 133.7, max: 303.0) [2024-06-15 15:33:15,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 15:33:19,311][1652475] Updated weights for policy 0, policy_version 307472 (0.0013) [2024-06-15 15:33:20,738][1648984] Fps is (10 sec: 36044.9, 60 sec: 43690.7, 300 sec: 43320.4). Total num frames: 629800960. Throughput: 0: 10422.1. Samples: 157514240. Policy #0 lag: (min: 15.0, avg: 133.7, max: 303.0) [2024-06-15 15:33:20,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 15:33:20,901][1652475] Updated weights for policy 0, policy_version 307523 (0.0012) [2024-06-15 15:33:23,736][1652475] Updated weights for policy 0, policy_version 307602 (0.0121) [2024-06-15 15:33:24,884][1652475] Updated weights for policy 0, policy_version 307653 (0.0020) [2024-06-15 15:33:25,178][1651340] Signal inference workers to stop experience collection... (15800 times) [2024-06-15 15:33:25,285][1652475] InferenceWorker_p0-w0: stopping experience collection (15800 times) [2024-06-15 15:33:25,477][1651340] Signal inference workers to resume experience collection... (15800 times) [2024-06-15 15:33:25,478][1652475] InferenceWorker_p0-w0: resuming experience collection (15800 times) [2024-06-15 15:33:25,738][1648984] Fps is (10 sec: 45875.2, 60 sec: 43144.6, 300 sec: 43320.4). Total num frames: 630128640. Throughput: 0: 10729.3. Samples: 157576704. Policy #0 lag: (min: 15.0, avg: 133.7, max: 303.0) [2024-06-15 15:33:25,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 15:33:26,181][1652475] Updated weights for policy 0, policy_version 307709 (0.0012) [2024-06-15 15:33:30,738][1648984] Fps is (10 sec: 45875.0, 60 sec: 42598.4, 300 sec: 43320.4). Total num frames: 630259712. Throughput: 0: 10479.0. Samples: 157644800. 
Policy #0 lag: (min: 15.0, avg: 133.7, max: 303.0) [2024-06-15 15:33:30,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 15:33:31,325][1652475] Updated weights for policy 0, policy_version 307775 (0.0012) [2024-06-15 15:33:33,288][1652475] Updated weights for policy 0, policy_version 307836 (0.0114) [2024-06-15 15:33:35,738][1648984] Fps is (10 sec: 42597.3, 60 sec: 43144.4, 300 sec: 43542.5). Total num frames: 630554624. Throughput: 0: 10524.4. Samples: 157672960. Policy #0 lag: (min: 15.0, avg: 133.7, max: 303.0) [2024-06-15 15:33:35,739][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 15:33:35,810][1652475] Updated weights for policy 0, policy_version 307896 (0.0014) [2024-06-15 15:33:39,116][1652475] Updated weights for policy 0, policy_version 307966 (0.0017) [2024-06-15 15:33:40,738][1648984] Fps is (10 sec: 45875.0, 60 sec: 41506.1, 300 sec: 43542.5). Total num frames: 630718464. Throughput: 0: 10672.4. Samples: 157743104. Policy #0 lag: (min: 15.0, avg: 133.7, max: 303.0) [2024-06-15 15:33:40,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 15:33:43,243][1652475] Updated weights for policy 0, policy_version 308023 (0.0014) [2024-06-15 15:33:44,633][1652475] Updated weights for policy 0, policy_version 308080 (0.0012) [2024-06-15 15:33:45,742][1648984] Fps is (10 sec: 42580.9, 60 sec: 44779.8, 300 sec: 43542.4). Total num frames: 630980608. Throughput: 0: 10762.3. Samples: 157804544. Policy #0 lag: (min: 15.0, avg: 133.7, max: 303.0) [2024-06-15 15:33:45,742][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 15:33:47,219][1652475] Updated weights for policy 0, policy_version 308151 (0.0012) [2024-06-15 15:33:50,579][1652475] Updated weights for policy 0, policy_version 308192 (0.0015) [2024-06-15 15:33:50,738][1648984] Fps is (10 sec: 45875.3, 60 sec: 41506.1, 300 sec: 43542.6). Total num frames: 631177216. Throughput: 0: 10808.9. Samples: 157840896. 
Policy #0 lag: (min: 15.0, avg: 133.7, max: 303.0) [2024-06-15 15:33:50,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 15:33:54,556][1652475] Updated weights for policy 0, policy_version 308248 (0.0014) [2024-06-15 15:33:55,738][1648984] Fps is (10 sec: 42617.0, 60 sec: 44236.9, 300 sec: 43320.4). Total num frames: 631406592. Throughput: 0: 10877.2. Samples: 157912576. Policy #0 lag: (min: 15.0, avg: 133.7, max: 303.0) [2024-06-15 15:33:55,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 15:33:56,652][1652475] Updated weights for policy 0, policy_version 308342 (0.0012) [2024-06-15 15:33:59,199][1652475] Updated weights for policy 0, policy_version 308385 (0.0012) [2024-06-15 15:34:00,738][1648984] Fps is (10 sec: 45875.5, 60 sec: 42598.4, 300 sec: 43542.6). Total num frames: 631635968. Throughput: 0: 10945.4. Samples: 157972480. Policy #0 lag: (min: 15.0, avg: 133.7, max: 303.0) [2024-06-15 15:34:00,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 15:34:02,034][1652475] Updated weights for policy 0, policy_version 308422 (0.0011) [2024-06-15 15:34:03,117][1652475] Updated weights for policy 0, policy_version 308480 (0.0014) [2024-06-15 15:34:05,738][1648984] Fps is (10 sec: 36044.5, 60 sec: 43690.6, 300 sec: 43098.3). Total num frames: 631767040. Throughput: 0: 11013.7. Samples: 158009856. Policy #0 lag: (min: 15.0, avg: 133.7, max: 303.0) [2024-06-15 15:34:05,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 15:34:07,050][1652475] Updated weights for policy 0, policy_version 308546 (0.0014) [2024-06-15 15:34:08,228][1652475] Updated weights for policy 0, policy_version 308606 (0.0013) [2024-06-15 15:34:10,738][1648984] Fps is (10 sec: 49151.7, 60 sec: 44782.9, 300 sec: 43542.6). Total num frames: 632127488. Throughput: 0: 11081.9. Samples: 158075392. 
Policy #0 lag: (min: 15.0, avg: 133.7, max: 303.0) [2024-06-15 15:34:10,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 15:34:10,860][1652475] Updated weights for policy 0, policy_version 308665 (0.0012) [2024-06-15 15:34:14,529][1651340] Signal inference workers to stop experience collection... (15850 times) [2024-06-15 15:34:14,574][1652475] InferenceWorker_p0-w0: stopping experience collection (15850 times) [2024-06-15 15:34:14,740][1651340] Signal inference workers to resume experience collection... (15850 times) [2024-06-15 15:34:14,742][1652475] InferenceWorker_p0-w0: resuming experience collection (15850 times) [2024-06-15 15:34:15,210][1652475] Updated weights for policy 0, policy_version 308733 (0.0136) [2024-06-15 15:34:15,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 43690.6, 300 sec: 43542.6). Total num frames: 632291328. Throughput: 0: 11002.3. Samples: 158139904. Policy #0 lag: (min: 15.0, avg: 133.7, max: 303.0) [2024-06-15 15:34:15,739][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 15:34:18,755][1652475] Updated weights for policy 0, policy_version 308793 (0.0014) [2024-06-15 15:34:20,738][1648984] Fps is (10 sec: 29491.3, 60 sec: 43690.6, 300 sec: 43098.3). Total num frames: 632422400. Throughput: 0: 11116.1. Samples: 158173184. Policy #0 lag: (min: 15.0, avg: 133.7, max: 303.0) [2024-06-15 15:34:20,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 15:34:22,230][1652475] Updated weights for policy 0, policy_version 308864 (0.0013) [2024-06-15 15:34:25,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 42598.3, 300 sec: 43542.6). Total num frames: 632684544. Throughput: 0: 10672.4. Samples: 158223360. Policy #0 lag: (min: 15.0, avg: 133.7, max: 303.0) [2024-06-15 15:34:25,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 15:34:27,357][1652475] Updated weights for policy 0, policy_version 308944 (0.0012) [2024-06-15 15:34:30,738][1648984] Fps is (10 sec: 39321.2, 60 sec: 42598.3, 300 sec: 42765.0). 
Total num frames: 632815616. Throughput: 0: 10718.9. Samples: 158286848. Policy #0 lag: (min: 0.0, avg: 108.8, max: 256.0) [2024-06-15 15:34:30,739][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 15:34:31,205][1652475] Updated weights for policy 0, policy_version 309010 (0.0014) [2024-06-15 15:34:34,960][1652475] Updated weights for policy 0, policy_version 309091 (0.0013) [2024-06-15 15:34:35,738][1648984] Fps is (10 sec: 39321.9, 60 sec: 42052.4, 300 sec: 43098.3). Total num frames: 633077760. Throughput: 0: 10649.6. Samples: 158320128. Policy #0 lag: (min: 0.0, avg: 108.8, max: 256.0) [2024-06-15 15:34:35,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 15:34:36,359][1652475] Updated weights for policy 0, policy_version 309168 (0.0013) [2024-06-15 15:34:39,703][1652475] Updated weights for policy 0, policy_version 309216 (0.0014) [2024-06-15 15:34:40,740][1648984] Fps is (10 sec: 52419.5, 60 sec: 43689.4, 300 sec: 43542.3). Total num frames: 633339904. Throughput: 0: 10489.9. Samples: 158384640. Policy #0 lag: (min: 0.0, avg: 108.8, max: 256.0) [2024-06-15 15:34:40,740][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 15:34:43,231][1652475] Updated weights for policy 0, policy_version 309280 (0.0014) [2024-06-15 15:34:45,300][1652475] Updated weights for policy 0, policy_version 309317 (0.0013) [2024-06-15 15:34:45,738][1648984] Fps is (10 sec: 42597.0, 60 sec: 42055.1, 300 sec: 43209.3). Total num frames: 633503744. Throughput: 0: 10706.4. Samples: 158454272. Policy #0 lag: (min: 0.0, avg: 108.8, max: 256.0) [2024-06-15 15:34:45,740][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 15:34:48,215][1652475] Updated weights for policy 0, policy_version 309380 (0.0013) [2024-06-15 15:34:49,375][1652475] Updated weights for policy 0, policy_version 309433 (0.0014) [2024-06-15 15:34:50,738][1648984] Fps is (10 sec: 39328.8, 60 sec: 42598.4, 300 sec: 43209.3). Total num frames: 633733120. Throughput: 0: 10604.1. 
Samples: 158487040. Policy #0 lag: (min: 0.0, avg: 108.8, max: 256.0) [2024-06-15 15:34:50,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 15:34:51,758][1652475] Updated weights for policy 0, policy_version 309472 (0.0016) [2024-06-15 15:34:54,397][1652475] Updated weights for policy 0, policy_version 309511 (0.0014) [2024-06-15 15:34:55,547][1652475] Updated weights for policy 0, policy_version 309568 (0.0012) [2024-06-15 15:34:55,738][1648984] Fps is (10 sec: 49153.1, 60 sec: 43144.4, 300 sec: 43542.6). Total num frames: 633995264. Throughput: 0: 10626.8. Samples: 158553600. Policy #0 lag: (min: 0.0, avg: 108.8, max: 256.0) [2024-06-15 15:34:55,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 15:34:55,753][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000309568_633995264.pth... [2024-06-15 15:34:55,800][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000304512_623640576.pth [2024-06-15 15:34:57,564][1652475] Updated weights for policy 0, policy_version 309626 (0.0015) [2024-06-15 15:35:00,716][1652475] Updated weights for policy 0, policy_version 309691 (0.0036) [2024-06-15 15:35:00,738][1648984] Fps is (10 sec: 49152.1, 60 sec: 43144.5, 300 sec: 43431.5). Total num frames: 634224640. Throughput: 0: 10672.4. Samples: 158620160. Policy #0 lag: (min: 0.0, avg: 108.8, max: 256.0) [2024-06-15 15:35:00,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 15:35:03,428][1651340] Signal inference workers to stop experience collection... (15900 times) [2024-06-15 15:35:03,550][1652475] InferenceWorker_p0-w0: stopping experience collection (15900 times) [2024-06-15 15:35:03,703][1651340] Signal inference workers to resume experience collection... 
(15900 times) [2024-06-15 15:35:03,704][1652475] InferenceWorker_p0-w0: resuming experience collection (15900 times) [2024-06-15 15:35:03,931][1652475] Updated weights for policy 0, policy_version 309731 (0.0129) [2024-06-15 15:35:05,738][1648984] Fps is (10 sec: 39322.1, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 634388480. Throughput: 0: 10729.3. Samples: 158656000. Policy #0 lag: (min: 0.0, avg: 108.8, max: 256.0) [2024-06-15 15:35:05,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 15:35:06,189][1652475] Updated weights for policy 0, policy_version 309779 (0.0015) [2024-06-15 15:35:09,134][1652475] Updated weights for policy 0, policy_version 309856 (0.0013) [2024-06-15 15:35:10,738][1648984] Fps is (10 sec: 42597.9, 60 sec: 42052.2, 300 sec: 43431.5). Total num frames: 634650624. Throughput: 0: 10979.5. Samples: 158717440. Policy #0 lag: (min: 0.0, avg: 108.8, max: 256.0) [2024-06-15 15:35:10,739][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 15:35:12,305][1652475] Updated weights for policy 0, policy_version 309941 (0.0012) [2024-06-15 15:35:15,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 42052.3, 300 sec: 43209.3). Total num frames: 634814464. Throughput: 0: 11082.0. Samples: 158785536. Policy #0 lag: (min: 0.0, avg: 108.8, max: 256.0) [2024-06-15 15:35:15,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 15:35:16,447][1652475] Updated weights for policy 0, policy_version 310000 (0.0012) [2024-06-15 15:35:20,544][1652475] Updated weights for policy 0, policy_version 310064 (0.0012) [2024-06-15 15:35:20,738][1648984] Fps is (10 sec: 36045.3, 60 sec: 43144.5, 300 sec: 43431.9). Total num frames: 635011072. Throughput: 0: 10934.0. Samples: 158812160. 
Policy #0 lag: (min: 0.0, avg: 108.8, max: 256.0) [2024-06-15 15:35:20,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 15:35:21,992][1652475] Updated weights for policy 0, policy_version 310133 (0.0017) [2024-06-15 15:35:24,297][1652475] Updated weights for policy 0, policy_version 310207 (0.0013) [2024-06-15 15:35:25,738][1648984] Fps is (10 sec: 49152.1, 60 sec: 43690.8, 300 sec: 43542.6). Total num frames: 635305984. Throughput: 0: 10832.1. Samples: 158872064. Policy #0 lag: (min: 0.0, avg: 108.8, max: 256.0) [2024-06-15 15:35:25,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 15:35:28,231][1652475] Updated weights for policy 0, policy_version 310262 (0.0011) [2024-06-15 15:35:30,738][1648984] Fps is (10 sec: 42598.7, 60 sec: 43690.8, 300 sec: 43209.3). Total num frames: 635437056. Throughput: 0: 10865.9. Samples: 158943232. Policy #0 lag: (min: 0.0, avg: 108.8, max: 256.0) [2024-06-15 15:35:30,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 15:35:32,580][1652475] Updated weights for policy 0, policy_version 310320 (0.0012) [2024-06-15 15:35:34,278][1652475] Updated weights for policy 0, policy_version 310368 (0.0016) [2024-06-15 15:35:35,738][1648984] Fps is (10 sec: 45874.7, 60 sec: 44782.9, 300 sec: 43320.4). Total num frames: 635764736. Throughput: 0: 10911.3. Samples: 158978048. Policy #0 lag: (min: 0.0, avg: 108.8, max: 256.0) [2024-06-15 15:35:35,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 15:35:36,095][1652475] Updated weights for policy 0, policy_version 310455 (0.0014) [2024-06-15 15:35:40,676][1652475] Updated weights for policy 0, policy_version 310522 (0.0105) [2024-06-15 15:35:40,739][1648984] Fps is (10 sec: 49150.4, 60 sec: 43145.7, 300 sec: 43431.4). Total num frames: 635928576. Throughput: 0: 11002.3. Samples: 159048704. 
Policy #0 lag: (min: 0.0, avg: 108.8, max: 256.0) [2024-06-15 15:35:40,742][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 15:35:43,685][1652475] Updated weights for policy 0, policy_version 310576 (0.0013) [2024-06-15 15:35:44,921][1652475] Updated weights for policy 0, policy_version 310610 (0.0011) [2024-06-15 15:35:45,738][1648984] Fps is (10 sec: 42599.0, 60 sec: 44783.2, 300 sec: 43431.5). Total num frames: 636190720. Throughput: 0: 10911.3. Samples: 159111168. Policy #0 lag: (min: 7.0, avg: 98.0, max: 263.0) [2024-06-15 15:35:45,738][1648984] Avg episode reward: [(0, '-0.520')] [2024-06-15 15:35:46,852][1652475] Updated weights for policy 0, policy_version 310704 (0.0014) [2024-06-15 15:35:50,738][1648984] Fps is (10 sec: 42599.4, 60 sec: 43690.7, 300 sec: 43320.4). Total num frames: 636354560. Throughput: 0: 10774.7. Samples: 159140864. Policy #0 lag: (min: 7.0, avg: 98.0, max: 263.0) [2024-06-15 15:35:50,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 15:35:52,949][1651340] Signal inference workers to stop experience collection... (15950 times) [2024-06-15 15:35:53,004][1652475] InferenceWorker_p0-w0: stopping experience collection (15950 times) [2024-06-15 15:35:53,191][1651340] Signal inference workers to resume experience collection... (15950 times) [2024-06-15 15:35:53,192][1652475] InferenceWorker_p0-w0: resuming experience collection (15950 times) [2024-06-15 15:35:53,620][1652475] Updated weights for policy 0, policy_version 310756 (0.0078) [2024-06-15 15:35:55,738][1648984] Fps is (10 sec: 39321.2, 60 sec: 43144.6, 300 sec: 43209.3). Total num frames: 636583936. Throughput: 0: 11082.0. Samples: 159216128. 
Policy #0 lag: (min: 7.0, avg: 98.0, max: 263.0) [2024-06-15 15:35:55,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 15:35:56,216][1652475] Updated weights for policy 0, policy_version 310854 (0.0020) [2024-06-15 15:35:57,152][1652475] Updated weights for policy 0, policy_version 310911 (0.0097) [2024-06-15 15:36:00,739][1648984] Fps is (10 sec: 52423.8, 60 sec: 44236.1, 300 sec: 43542.4). Total num frames: 636878848. Throughput: 0: 10751.7. Samples: 159269376. Policy #0 lag: (min: 7.0, avg: 98.0, max: 263.0) [2024-06-15 15:36:00,739][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 15:36:05,738][1648984] Fps is (10 sec: 29491.3, 60 sec: 41506.1, 300 sec: 42765.0). Total num frames: 636878848. Throughput: 0: 11127.5. Samples: 159312896. Policy #0 lag: (min: 7.0, avg: 98.0, max: 263.0) [2024-06-15 15:36:05,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 15:36:05,761][1652475] Updated weights for policy 0, policy_version 310992 (0.0014) [2024-06-15 15:36:08,336][1652475] Updated weights for policy 0, policy_version 311104 (0.0013) [2024-06-15 15:36:10,701][1652475] Updated weights for policy 0, policy_version 311200 (0.0014) [2024-06-15 15:36:10,738][1648984] Fps is (10 sec: 45879.5, 60 sec: 44783.0, 300 sec: 43653.6). Total num frames: 637337600. Throughput: 0: 11047.8. Samples: 159369216. Policy #0 lag: (min: 7.0, avg: 98.0, max: 263.0) [2024-06-15 15:36:10,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 15:36:15,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 43144.5, 300 sec: 43209.3). Total num frames: 637403136. Throughput: 0: 11013.7. Samples: 159438848. 
Policy #0 lag: (min: 7.0, avg: 98.0, max: 263.0) [2024-06-15 15:36:15,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 15:36:17,825][1652475] Updated weights for policy 0, policy_version 311234 (0.0011) [2024-06-15 15:36:19,255][1652475] Updated weights for policy 0, policy_version 311300 (0.0012) [2024-06-15 15:36:20,738][1648984] Fps is (10 sec: 32768.3, 60 sec: 44236.8, 300 sec: 43098.3). Total num frames: 637665280. Throughput: 0: 11082.0. Samples: 159476736. Policy #0 lag: (min: 7.0, avg: 98.0, max: 263.0) [2024-06-15 15:36:20,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 15:36:21,449][1652475] Updated weights for policy 0, policy_version 311394 (0.0015) [2024-06-15 15:36:23,586][1652475] Updated weights for policy 0, policy_version 311472 (0.0012) [2024-06-15 15:36:25,738][1648984] Fps is (10 sec: 52428.4, 60 sec: 43690.6, 300 sec: 43542.6). Total num frames: 637927424. Throughput: 0: 10479.0. Samples: 159520256. Policy #0 lag: (min: 7.0, avg: 98.0, max: 263.0) [2024-06-15 15:36:25,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 15:36:30,754][1648984] Fps is (10 sec: 26171.0, 60 sec: 41494.7, 300 sec: 42429.4). Total num frames: 637927424. Throughput: 0: 10884.5. Samples: 159601152. Policy #0 lag: (min: 7.0, avg: 98.0, max: 263.0) [2024-06-15 15:36:30,756][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 15:36:31,293][1652475] Updated weights for policy 0, policy_version 311520 (0.0017) [2024-06-15 15:36:32,882][1651340] Signal inference workers to stop experience collection... (16000 times) [2024-06-15 15:36:32,933][1652475] Updated weights for policy 0, policy_version 311586 (0.0185) [2024-06-15 15:36:32,999][1652475] InferenceWorker_p0-w0: stopping experience collection (16000 times) [2024-06-15 15:36:33,155][1651340] Signal inference workers to resume experience collection... 
(16000 times) [2024-06-15 15:36:33,156][1652475] InferenceWorker_p0-w0: resuming experience collection (16000 times) [2024-06-15 15:36:35,059][1652475] Updated weights for policy 0, policy_version 311664 (0.0012) [2024-06-15 15:36:35,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 42598.4, 300 sec: 43098.2). Total num frames: 638320640. Throughput: 0: 10729.2. Samples: 159623680. Policy #0 lag: (min: 7.0, avg: 98.0, max: 263.0) [2024-06-15 15:36:35,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 15:36:36,839][1652475] Updated weights for policy 0, policy_version 311733 (0.0114) [2024-06-15 15:36:40,740][1648984] Fps is (10 sec: 52515.5, 60 sec: 42052.4, 300 sec: 42876.1). Total num frames: 638451712. Throughput: 0: 10342.4. Samples: 159681536. Policy #0 lag: (min: 7.0, avg: 98.0, max: 263.0) [2024-06-15 15:36:40,742][1648984] Avg episode reward: [(0, '-0.540')] [2024-06-15 15:36:43,719][1652475] Updated weights for policy 0, policy_version 311763 (0.0020) [2024-06-15 15:36:45,738][1648984] Fps is (10 sec: 26214.3, 60 sec: 39867.6, 300 sec: 42542.9). Total num frames: 638582784. Throughput: 0: 10570.2. Samples: 159745024. Policy #0 lag: (min: 7.0, avg: 98.0, max: 263.0) [2024-06-15 15:36:45,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 15:36:45,789][1652475] Updated weights for policy 0, policy_version 311809 (0.0012) [2024-06-15 15:36:47,417][1652475] Updated weights for policy 0, policy_version 311874 (0.0012) [2024-06-15 15:36:49,546][1652475] Updated weights for policy 0, policy_version 311971 (0.0014) [2024-06-15 15:36:50,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 638976000. Throughput: 0: 10160.3. Samples: 159770112. Policy #0 lag: (min: 7.0, avg: 98.0, max: 263.0) [2024-06-15 15:36:50,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 15:36:55,738][1648984] Fps is (10 sec: 39320.7, 60 sec: 39867.6, 300 sec: 42209.6). Total num frames: 638976000. Throughput: 0: 10365.1. 
Samples: 159835648. Policy #0 lag: (min: 7.0, avg: 98.0, max: 263.0) [2024-06-15 15:36:55,739][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 15:36:56,337][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000312032_639041536.pth... [2024-06-15 15:36:56,337][1652475] Updated weights for policy 0, policy_version 312032 (0.0015) [2024-06-15 15:36:56,516][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000307008_628752384.pth [2024-06-15 15:36:58,401][1652475] Updated weights for policy 0, policy_version 312067 (0.0013) [2024-06-15 15:37:00,040][1652475] Updated weights for policy 0, policy_version 312144 (0.0356) [2024-06-15 15:37:00,738][1648984] Fps is (10 sec: 32768.3, 60 sec: 40414.6, 300 sec: 42876.8). Total num frames: 639303680. Throughput: 0: 10240.0. Samples: 159899648. Policy #0 lag: (min: 54.0, avg: 130.0, max: 310.0) [2024-06-15 15:37:00,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 15:37:02,035][1652475] Updated weights for policy 0, policy_version 312227 (0.0013) [2024-06-15 15:37:05,738][1648984] Fps is (10 sec: 52430.3, 60 sec: 43690.7, 300 sec: 42987.2). Total num frames: 639500288. Throughput: 0: 10103.5. Samples: 159931392. Policy #0 lag: (min: 54.0, avg: 130.0, max: 310.0) [2024-06-15 15:37:05,742][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 15:37:06,950][1652475] Updated weights for policy 0, policy_version 312260 (0.0014) [2024-06-15 15:37:09,781][1652475] Updated weights for policy 0, policy_version 312324 (0.0013) [2024-06-15 15:37:10,738][1648984] Fps is (10 sec: 39321.1, 60 sec: 39321.6, 300 sec: 42432.6). Total num frames: 639696896. Throughput: 0: 10752.0. Samples: 160004096. 
Policy #0 lag: (min: 54.0, avg: 130.0, max: 310.0) [2024-06-15 15:37:10,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 15:37:11,675][1652475] Updated weights for policy 0, policy_version 312400 (0.0138) [2024-06-15 15:37:13,098][1652475] Updated weights for policy 0, policy_version 312465 (0.0012) [2024-06-15 15:37:13,360][1651340] Signal inference workers to stop experience collection... (16050 times) [2024-06-15 15:37:13,426][1652475] InferenceWorker_p0-w0: stopping experience collection (16050 times) [2024-06-15 15:37:13,660][1651340] Signal inference workers to resume experience collection... (16050 times) [2024-06-15 15:37:13,661][1652475] InferenceWorker_p0-w0: resuming experience collection (16050 times) [2024-06-15 15:37:15,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 640024576. Throughput: 0: 10289.3. Samples: 160064000. Policy #0 lag: (min: 54.0, avg: 130.0, max: 310.0) [2024-06-15 15:37:15,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 15:37:19,756][1652475] Updated weights for policy 0, policy_version 312544 (0.0012) [2024-06-15 15:37:20,738][1648984] Fps is (10 sec: 45875.5, 60 sec: 41506.1, 300 sec: 42765.0). Total num frames: 640155648. Throughput: 0: 10649.6. Samples: 160102912. Policy #0 lag: (min: 54.0, avg: 130.0, max: 310.0) [2024-06-15 15:37:20,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 15:37:21,983][1652475] Updated weights for policy 0, policy_version 312592 (0.0014) [2024-06-15 15:37:24,514][1652475] Updated weights for policy 0, policy_version 312704 (0.0013) [2024-06-15 15:37:25,738][1648984] Fps is (10 sec: 52429.1, 60 sec: 43690.8, 300 sec: 43542.6). Total num frames: 640548864. Throughput: 0: 10638.2. Samples: 160160256. Policy #0 lag: (min: 54.0, avg: 130.0, max: 310.0) [2024-06-15 15:37:25,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 15:37:30,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 43702.7, 300 sec: 42653.9). 
Total num frames: 640548864. Throughput: 0: 10763.4. Samples: 160229376. Policy #0 lag: (min: 54.0, avg: 130.0, max: 310.0) [2024-06-15 15:37:30,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 15:37:31,541][1652475] Updated weights for policy 0, policy_version 312770 (0.0013) [2024-06-15 15:37:34,703][1652475] Updated weights for policy 0, policy_version 312880 (0.0014) [2024-06-15 15:37:35,738][1648984] Fps is (10 sec: 32767.9, 60 sec: 42598.4, 300 sec: 42876.1). Total num frames: 640876544. Throughput: 0: 10979.6. Samples: 160264192. Policy #0 lag: (min: 54.0, avg: 130.0, max: 310.0) [2024-06-15 15:37:35,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 15:37:36,075][1652475] Updated weights for policy 0, policy_version 312953 (0.0013) [2024-06-15 15:37:37,496][1652475] Updated weights for policy 0, policy_version 312996 (0.0012) [2024-06-15 15:37:40,739][1648984] Fps is (10 sec: 52428.0, 60 sec: 43690.6, 300 sec: 43320.4). Total num frames: 641073152. Throughput: 0: 10865.8. Samples: 160324608. Policy #0 lag: (min: 54.0, avg: 130.0, max: 310.0) [2024-06-15 15:37:40,742][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 15:37:43,059][1652475] Updated weights for policy 0, policy_version 313040 (0.0011) [2024-06-15 15:37:45,738][1648984] Fps is (10 sec: 32768.1, 60 sec: 43690.7, 300 sec: 42431.8). Total num frames: 641204224. Throughput: 0: 10979.6. Samples: 160393728. Policy #0 lag: (min: 54.0, avg: 130.0, max: 310.0) [2024-06-15 15:37:45,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 15:37:46,001][1652475] Updated weights for policy 0, policy_version 313107 (0.0020) [2024-06-15 15:37:46,758][1652475] Updated weights for policy 0, policy_version 313152 (0.0013) [2024-06-15 15:37:50,038][1652475] Updated weights for policy 0, policy_version 313254 (0.0111) [2024-06-15 15:37:50,738][1648984] Fps is (10 sec: 52429.6, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 641597440. Throughput: 0: 10990.9. 
Samples: 160425984. Policy #0 lag: (min: 54.0, avg: 130.0, max: 310.0) [2024-06-15 15:37:50,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 15:37:55,616][1652475] Updated weights for policy 0, policy_version 313313 (0.0014) [2024-06-15 15:37:55,738][1648984] Fps is (10 sec: 45874.9, 60 sec: 44783.1, 300 sec: 42653.9). Total num frames: 641662976. Throughput: 0: 10740.6. Samples: 160487424. Policy #0 lag: (min: 54.0, avg: 130.0, max: 310.0) [2024-06-15 15:37:55,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 15:37:57,691][1652475] Updated weights for policy 0, policy_version 313367 (0.0126) [2024-06-15 15:38:00,479][1652475] Updated weights for policy 0, policy_version 313442 (0.0014) [2024-06-15 15:38:00,738][1648984] Fps is (10 sec: 36045.0, 60 sec: 44236.8, 300 sec: 43431.5). Total num frames: 641957888. Throughput: 0: 10729.2. Samples: 160546816. Policy #0 lag: (min: 54.0, avg: 130.0, max: 310.0) [2024-06-15 15:38:00,738][1648984] Avg episode reward: [(0, '-0.520')] [2024-06-15 15:38:02,197][1651340] Signal inference workers to stop experience collection... (16100 times) [2024-06-15 15:38:02,287][1652475] InferenceWorker_p0-w0: stopping experience collection (16100 times) [2024-06-15 15:38:02,464][1651340] Signal inference workers to resume experience collection... (16100 times) [2024-06-15 15:38:02,465][1652475] InferenceWorker_p0-w0: resuming experience collection (16100 times) [2024-06-15 15:38:03,062][1652475] Updated weights for policy 0, policy_version 313507 (0.0014) [2024-06-15 15:38:05,737][1648984] Fps is (10 sec: 45875.9, 60 sec: 43690.8, 300 sec: 42987.2). Total num frames: 642121728. Throughput: 0: 10479.0. Samples: 160574464. 
Policy #0 lag: (min: 54.0, avg: 130.0, max: 310.0) [2024-06-15 15:38:05,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 15:38:09,261][1652475] Updated weights for policy 0, policy_version 313576 (0.0109) [2024-06-15 15:38:10,284][1652475] Updated weights for policy 0, policy_version 313616 (0.0014) [2024-06-15 15:38:10,738][1648984] Fps is (10 sec: 36044.8, 60 sec: 43690.7, 300 sec: 42876.1). Total num frames: 642318336. Throughput: 0: 10820.3. Samples: 160647168. Policy #0 lag: (min: 54.0, avg: 130.0, max: 310.0) [2024-06-15 15:38:10,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 15:38:11,890][1652475] Updated weights for policy 0, policy_version 313680 (0.0014) [2024-06-15 15:38:14,481][1652475] Updated weights for policy 0, policy_version 313736 (0.0012) [2024-06-15 15:38:15,743][1648984] Fps is (10 sec: 52427.4, 60 sec: 43690.6, 300 sec: 43542.5). Total num frames: 642646016. Throughput: 0: 10581.3. Samples: 160705536. Policy #0 lag: (min: 23.0, avg: 153.8, max: 271.0) [2024-06-15 15:38:15,745][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 15:38:20,332][1652475] Updated weights for policy 0, policy_version 313800 (0.0014) [2024-06-15 15:38:20,738][1648984] Fps is (10 sec: 36044.4, 60 sec: 42052.2, 300 sec: 42542.8). Total num frames: 642678784. Throughput: 0: 10615.4. Samples: 160741888. Policy #0 lag: (min: 23.0, avg: 153.8, max: 271.0) [2024-06-15 15:38:20,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 15:38:21,219][1652475] Updated weights for policy 0, policy_version 313846 (0.0014) [2024-06-15 15:38:24,082][1652475] Updated weights for policy 0, policy_version 313920 (0.0142) [2024-06-15 15:38:25,623][1652475] Updated weights for policy 0, policy_version 313976 (0.0018) [2024-06-15 15:38:25,738][1648984] Fps is (10 sec: 36045.0, 60 sec: 40959.9, 300 sec: 43209.3). Total num frames: 643006464. Throughput: 0: 10763.4. Samples: 160808960. 
Policy #0 lag: (min: 23.0, avg: 153.8, max: 271.0) [2024-06-15 15:38:25,739][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 15:38:27,063][1652475] Updated weights for policy 0, policy_version 314038 (0.0012) [2024-06-15 15:38:30,738][1648984] Fps is (10 sec: 49152.6, 60 sec: 43690.7, 300 sec: 42765.1). Total num frames: 643170304. Throughput: 0: 10763.4. Samples: 160878080. Policy #0 lag: (min: 23.0, avg: 153.8, max: 271.0) [2024-06-15 15:38:30,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 15:38:32,802][1652475] Updated weights for policy 0, policy_version 314102 (0.0017) [2024-06-15 15:38:35,654][1652475] Updated weights for policy 0, policy_version 314160 (0.0011) [2024-06-15 15:38:35,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 42052.2, 300 sec: 42987.2). Total num frames: 643399680. Throughput: 0: 10740.6. Samples: 160909312. Policy #0 lag: (min: 23.0, avg: 153.8, max: 271.0) [2024-06-15 15:38:35,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 15:38:37,720][1652475] Updated weights for policy 0, policy_version 314239 (0.0188) [2024-06-15 15:38:39,379][1652475] Updated weights for policy 0, policy_version 314296 (0.0037) [2024-06-15 15:38:40,774][1648984] Fps is (10 sec: 52238.9, 60 sec: 43664.4, 300 sec: 43093.6). Total num frames: 643694592. Throughput: 0: 10504.6. Samples: 160960512. Policy #0 lag: (min: 23.0, avg: 153.8, max: 271.0) [2024-06-15 15:38:40,775][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 15:38:45,738][1648984] Fps is (10 sec: 39321.3, 60 sec: 43144.4, 300 sec: 42765.0). Total num frames: 643792896. Throughput: 0: 10820.2. Samples: 161033728. 
Policy #0 lag: (min: 23.0, avg: 153.8, max: 271.0) [2024-06-15 15:38:45,739][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 15:38:45,765][1652475] Updated weights for policy 0, policy_version 314355 (0.0014) [2024-06-15 15:38:47,520][1652475] Updated weights for policy 0, policy_version 314386 (0.0011) [2024-06-15 15:38:48,268][1651340] Signal inference workers to stop experience collection... (16150 times) [2024-06-15 15:38:48,298][1652475] InferenceWorker_p0-w0: stopping experience collection (16150 times) [2024-06-15 15:38:48,486][1651340] Signal inference workers to resume experience collection... (16150 times) [2024-06-15 15:38:48,487][1652475] InferenceWorker_p0-w0: resuming experience collection (16150 times) [2024-06-15 15:38:49,938][1652475] Updated weights for policy 0, policy_version 314490 (0.0185) [2024-06-15 15:38:50,738][1648984] Fps is (10 sec: 42753.8, 60 sec: 42052.3, 300 sec: 43098.2). Total num frames: 644120576. Throughput: 0: 10820.2. Samples: 161061376. Policy #0 lag: (min: 23.0, avg: 153.8, max: 271.0) [2024-06-15 15:38:50,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 15:38:51,710][1652475] Updated weights for policy 0, policy_version 314555 (0.0047) [2024-06-15 15:38:55,758][1648984] Fps is (10 sec: 42511.5, 60 sec: 42583.8, 300 sec: 42651.0). Total num frames: 644218880. Throughput: 0: 10485.5. Samples: 161119232. Policy #0 lag: (min: 23.0, avg: 153.8, max: 271.0) [2024-06-15 15:38:55,759][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 15:38:55,770][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000314560_644218880.pth... [2024-06-15 15:38:55,857][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000309568_633995264.pth [2024-06-15 15:38:59,580][1652475] Updated weights for policy 0, policy_version 314624 (0.0034) [2024-06-15 15:39:00,739][1648984] Fps is (10 sec: 29491.0, 60 sec: 40960.0, 300 sec: 42876.1). 
Total num frames: 644415488. Throughput: 0: 10763.4. Samples: 161189888. Policy #0 lag: (min: 23.0, avg: 153.8, max: 271.0) [2024-06-15 15:39:00,739][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 15:39:01,955][1652475] Updated weights for policy 0, policy_version 314720 (0.0014) [2024-06-15 15:39:03,972][1652475] Updated weights for policy 0, policy_version 314800 (0.0013) [2024-06-15 15:39:05,738][1648984] Fps is (10 sec: 52536.0, 60 sec: 43690.5, 300 sec: 42765.0). Total num frames: 644743168. Throughput: 0: 10433.4. Samples: 161211392. Policy #0 lag: (min: 23.0, avg: 153.8, max: 271.0) [2024-06-15 15:39:05,739][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 15:39:10,738][1648984] Fps is (10 sec: 32767.9, 60 sec: 40413.8, 300 sec: 42209.6). Total num frames: 644743168. Throughput: 0: 10376.5. Samples: 161275904. Policy #0 lag: (min: 23.0, avg: 153.8, max: 271.0) [2024-06-15 15:39:10,739][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 15:39:12,488][1652475] Updated weights for policy 0, policy_version 314850 (0.0014) [2024-06-15 15:39:14,145][1652475] Updated weights for policy 0, policy_version 314928 (0.0012) [2024-06-15 15:39:15,764][1648984] Fps is (10 sec: 35952.2, 60 sec: 40942.4, 300 sec: 42983.4). Total num frames: 645103616. Throughput: 0: 10165.9. Samples: 161335808. Policy #0 lag: (min: 23.0, avg: 153.8, max: 271.0) [2024-06-15 15:39:15,764][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 15:39:16,248][1652475] Updated weights for policy 0, policy_version 315024 (0.0036) [2024-06-15 15:39:20,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 43144.6, 300 sec: 42653.9). Total num frames: 645267456. Throughput: 0: 10057.9. Samples: 161361920. 
Policy #0 lag: (min: 23.0, avg: 153.8, max: 271.0) [2024-06-15 15:39:20,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 15:39:23,552][1652475] Updated weights for policy 0, policy_version 315088 (0.0014) [2024-06-15 15:39:25,739][1648984] Fps is (10 sec: 36133.1, 60 sec: 40959.1, 300 sec: 42875.9). Total num frames: 645464064. Throughput: 0: 10680.6. Samples: 161440768. Policy #0 lag: (min: 23.0, avg: 153.8, max: 271.0) [2024-06-15 15:39:25,740][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 15:39:25,990][1652475] Updated weights for policy 0, policy_version 315186 (0.0014) [2024-06-15 15:39:27,653][1652475] Updated weights for policy 0, policy_version 315264 (0.0014) [2024-06-15 15:39:28,961][1651340] Signal inference workers to stop experience collection... (16200 times) [2024-06-15 15:39:29,009][1652475] InferenceWorker_p0-w0: stopping experience collection (16200 times) [2024-06-15 15:39:29,143][1651340] Signal inference workers to resume experience collection... (16200 times) [2024-06-15 15:39:29,143][1652475] InferenceWorker_p0-w0: resuming experience collection (16200 times) [2024-06-15 15:39:29,498][1652475] Updated weights for policy 0, policy_version 315328 (0.0020) [2024-06-15 15:39:30,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 43690.7, 300 sec: 43098.3). Total num frames: 645791744. Throughput: 0: 10137.6. Samples: 161489920. Policy #0 lag: (min: 40.0, avg: 153.9, max: 328.0) [2024-06-15 15:39:30,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 15:39:35,738][1648984] Fps is (10 sec: 36049.7, 60 sec: 40413.8, 300 sec: 42321.0). Total num frames: 645824512. Throughput: 0: 10490.3. Samples: 161533440. 
Policy #0 lag: (min: 40.0, avg: 153.9, max: 328.0) [2024-06-15 15:39:35,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 15:39:36,269][1652475] Updated weights for policy 0, policy_version 315381 (0.0051) [2024-06-15 15:39:37,514][1652475] Updated weights for policy 0, policy_version 315424 (0.0012) [2024-06-15 15:39:39,369][1652475] Updated weights for policy 0, policy_version 315505 (0.0118) [2024-06-15 15:39:40,738][1648984] Fps is (10 sec: 45874.9, 60 sec: 42624.2, 300 sec: 43209.4). Total num frames: 646250496. Throughput: 0: 10631.7. Samples: 161597440. Policy #0 lag: (min: 40.0, avg: 153.9, max: 328.0) [2024-06-15 15:39:40,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 15:39:40,926][1652475] Updated weights for policy 0, policy_version 315571 (0.0036) [2024-06-15 15:39:45,738][1648984] Fps is (10 sec: 49151.9, 60 sec: 42052.3, 300 sec: 42653.9). Total num frames: 646316032. Throughput: 0: 10752.0. Samples: 161673728. Policy #0 lag: (min: 40.0, avg: 153.9, max: 328.0) [2024-06-15 15:39:45,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 15:39:47,547][1652475] Updated weights for policy 0, policy_version 315646 (0.0015) [2024-06-15 15:39:50,530][1652475] Updated weights for policy 0, policy_version 315728 (0.0019) [2024-06-15 15:39:50,738][1648984] Fps is (10 sec: 36044.9, 60 sec: 41506.1, 300 sec: 42765.0). Total num frames: 646610944. Throughput: 0: 11036.5. Samples: 161708032. Policy #0 lag: (min: 40.0, avg: 153.9, max: 328.0) [2024-06-15 15:39:50,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 15:39:52,980][1652475] Updated weights for policy 0, policy_version 315833 (0.0013) [2024-06-15 15:39:55,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 43705.6, 300 sec: 42765.0). Total num frames: 646840320. Throughput: 0: 10752.0. Samples: 161759744. 
Policy #0 lag: (min: 40.0, avg: 153.9, max: 328.0) [2024-06-15 15:39:55,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 15:39:59,734][1652475] Updated weights for policy 0, policy_version 315875 (0.0013) [2024-06-15 15:40:00,738][1648984] Fps is (10 sec: 36045.0, 60 sec: 42598.4, 300 sec: 42653.9). Total num frames: 646971392. Throughput: 0: 11031.4. Samples: 161831936. Policy #0 lag: (min: 40.0, avg: 153.9, max: 328.0) [2024-06-15 15:40:00,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 15:40:01,861][1652475] Updated weights for policy 0, policy_version 315952 (0.0014) [2024-06-15 15:40:03,166][1652475] Updated weights for policy 0, policy_version 315984 (0.0012) [2024-06-15 15:40:05,120][1652475] Updated weights for policy 0, policy_version 316053 (0.0027) [2024-06-15 15:40:05,739][1648984] Fps is (10 sec: 49145.7, 60 sec: 43143.6, 300 sec: 42987.0). Total num frames: 647331840. Throughput: 0: 11081.6. Samples: 161860608. Policy #0 lag: (min: 40.0, avg: 153.9, max: 328.0) [2024-06-15 15:40:05,739][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 15:40:10,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 43690.7, 300 sec: 42542.9). Total num frames: 647364608. Throughput: 0: 10729.6. Samples: 161923584. Policy #0 lag: (min: 40.0, avg: 153.9, max: 328.0) [2024-06-15 15:40:10,740][1648984] Avg episode reward: [(0, '-0.550')] [2024-06-15 15:40:11,185][1652475] Updated weights for policy 0, policy_version 316112 (0.0012) [2024-06-15 15:40:13,856][1652475] Updated weights for policy 0, policy_version 316192 (0.0041) [2024-06-15 15:40:15,351][1651340] Signal inference workers to stop experience collection... (16250 times) [2024-06-15 15:40:15,391][1652475] InferenceWorker_p0-w0: stopping experience collection (16250 times) [2024-06-15 15:40:15,617][1651340] Signal inference workers to resume experience collection... 
(16250 times) [2024-06-15 15:40:15,618][1652475] InferenceWorker_p0-w0: resuming experience collection (16250 times) [2024-06-15 15:40:15,738][1648984] Fps is (10 sec: 36049.7, 60 sec: 43163.1, 300 sec: 42987.2). Total num frames: 647692288. Throughput: 0: 10956.8. Samples: 161982976. Policy #0 lag: (min: 40.0, avg: 153.9, max: 328.0) [2024-06-15 15:40:15,738][1648984] Avg episode reward: [(0, '-0.520')] [2024-06-15 15:40:15,788][1652475] Updated weights for policy 0, policy_version 316257 (0.0015) [2024-06-15 15:40:18,838][1652475] Updated weights for policy 0, policy_version 316321 (0.0014) [2024-06-15 15:40:20,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 43690.7, 300 sec: 42653.9). Total num frames: 647888896. Throughput: 0: 10717.9. Samples: 162015744. Policy #0 lag: (min: 40.0, avg: 153.9, max: 328.0) [2024-06-15 15:40:20,738][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 15:40:23,661][1652475] Updated weights for policy 0, policy_version 316384 (0.0018) [2024-06-15 15:40:25,739][1648984] Fps is (10 sec: 32763.9, 60 sec: 42598.5, 300 sec: 42653.8). Total num frames: 648019968. Throughput: 0: 10694.8. Samples: 162078720. Policy #0 lag: (min: 40.0, avg: 153.9, max: 328.0) [2024-06-15 15:40:25,739][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 15:40:26,954][1652475] Updated weights for policy 0, policy_version 316433 (0.0012) [2024-06-15 15:40:29,513][1652475] Updated weights for policy 0, policy_version 316541 (0.0015) [2024-06-15 15:40:30,738][1648984] Fps is (10 sec: 39320.9, 60 sec: 41506.0, 300 sec: 42431.8). Total num frames: 648282112. Throughput: 0: 10228.6. Samples: 162134016. Policy #0 lag: (min: 40.0, avg: 153.9, max: 328.0) [2024-06-15 15:40:30,739][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 15:40:35,738][1648984] Fps is (10 sec: 39326.7, 60 sec: 43144.6, 300 sec: 42320.8). Total num frames: 648413184. Throughput: 0: 10114.9. Samples: 162163200. 
Policy #0 lag: (min: 40.0, avg: 153.9, max: 328.0) [2024-06-15 15:40:35,738][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 15:40:35,986][1652475] Updated weights for policy 0, policy_version 316624 (0.0213) [2024-06-15 15:40:39,642][1652475] Updated weights for policy 0, policy_version 316688 (0.0014) [2024-06-15 15:40:40,739][1648984] Fps is (10 sec: 39316.2, 60 sec: 40412.9, 300 sec: 42320.5). Total num frames: 648675328. Throughput: 0: 10512.7. Samples: 162232832. Policy #0 lag: (min: 40.0, avg: 153.9, max: 328.0) [2024-06-15 15:40:40,740][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 15:40:42,500][1652475] Updated weights for policy 0, policy_version 316768 (0.0014) [2024-06-15 15:40:45,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 43690.7, 300 sec: 42653.9). Total num frames: 648937472. Throughput: 0: 10001.1. Samples: 162281984. Policy #0 lag: (min: 15.0, avg: 141.4, max: 335.0) [2024-06-15 15:40:45,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 15:40:48,750][1652475] Updated weights for policy 0, policy_version 316865 (0.0013) [2024-06-15 15:40:50,087][1652475] Updated weights for policy 0, policy_version 316923 (0.0013) [2024-06-15 15:40:50,738][1648984] Fps is (10 sec: 39327.7, 60 sec: 40960.0, 300 sec: 42320.7). Total num frames: 649068544. Throughput: 0: 10217.6. Samples: 162320384. Policy #0 lag: (min: 15.0, avg: 141.4, max: 335.0) [2024-06-15 15:40:50,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 15:40:53,072][1652475] Updated weights for policy 0, policy_version 316983 (0.0072) [2024-06-15 15:40:55,738][1648984] Fps is (10 sec: 39320.5, 60 sec: 41506.0, 300 sec: 42209.7). Total num frames: 649330688. Throughput: 0: 10262.7. Samples: 162385408. 
Policy #0 lag: (min: 15.0, avg: 141.4, max: 335.0) [2024-06-15 15:40:55,739][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 15:40:56,079][1652475] Updated weights for policy 0, policy_version 317072 (0.0112) [2024-06-15 15:40:56,080][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000317072_649363456.pth... [2024-06-15 15:40:56,249][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000312032_639041536.pth [2024-06-15 15:40:57,148][1652475] Updated weights for policy 0, policy_version 317120 (0.0011) [2024-06-15 15:41:00,738][1648984] Fps is (10 sec: 42598.2, 60 sec: 42052.2, 300 sec: 42765.0). Total num frames: 649494528. Throughput: 0: 10399.3. Samples: 162450944. Policy #0 lag: (min: 15.0, avg: 141.4, max: 335.0) [2024-06-15 15:41:00,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 15:41:05,095][1652475] Updated weights for policy 0, policy_version 317217 (0.0049) [2024-06-15 15:41:05,738][1648984] Fps is (10 sec: 39322.3, 60 sec: 39868.6, 300 sec: 41987.5). Total num frames: 649723904. Throughput: 0: 10342.4. Samples: 162481152. Policy #0 lag: (min: 15.0, avg: 141.4, max: 335.0) [2024-06-15 15:41:05,739][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 15:41:06,804][1651340] Signal inference workers to stop experience collection... (16300 times) [2024-06-15 15:41:06,850][1652475] InferenceWorker_p0-w0: stopping experience collection (16300 times) [2024-06-15 15:41:07,082][1651340] Signal inference workers to resume experience collection... (16300 times) [2024-06-15 15:41:07,083][1652475] InferenceWorker_p0-w0: resuming experience collection (16300 times) [2024-06-15 15:41:07,319][1652475] Updated weights for policy 0, policy_version 317280 (0.0014) [2024-06-15 15:41:09,563][1652475] Updated weights for policy 0, policy_version 317369 (0.0013) [2024-06-15 15:41:10,738][1648984] Fps is (10 sec: 49151.7, 60 sec: 43690.6, 300 sec: 42653.9). 
Total num frames: 649986048. Throughput: 0: 10228.9. Samples: 162539008. Policy #0 lag: (min: 15.0, avg: 141.4, max: 335.0) [2024-06-15 15:41:10,738][1648984] Avg episode reward: [(0, '-0.610')] [2024-06-15 15:41:15,069][1652475] Updated weights for policy 0, policy_version 317432 (0.0013) [2024-06-15 15:41:15,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 40413.8, 300 sec: 42209.6). Total num frames: 650117120. Throughput: 0: 10524.5. Samples: 162607616. Policy #0 lag: (min: 15.0, avg: 141.4, max: 335.0) [2024-06-15 15:41:15,739][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 15:41:17,295][1652475] Updated weights for policy 0, policy_version 317498 (0.0013) [2024-06-15 15:41:19,106][1652475] Updated weights for policy 0, policy_version 317542 (0.0012) [2024-06-15 15:41:20,624][1652475] Updated weights for policy 0, policy_version 317600 (0.0114) [2024-06-15 15:41:20,738][1648984] Fps is (10 sec: 45875.7, 60 sec: 42598.4, 300 sec: 42431.8). Total num frames: 650444800. Throughput: 0: 10615.5. Samples: 162640896. Policy #0 lag: (min: 15.0, avg: 141.4, max: 335.0) [2024-06-15 15:41:20,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 15:41:21,601][1652475] Updated weights for policy 0, policy_version 317632 (0.0020) [2024-06-15 15:41:25,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 41506.9, 300 sec: 42656.3). Total num frames: 650510336. Throughput: 0: 10433.8. Samples: 162702336. Policy #0 lag: (min: 15.0, avg: 141.4, max: 335.0) [2024-06-15 15:41:25,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 15:41:28,712][1652475] Updated weights for policy 0, policy_version 317695 (0.0013) [2024-06-15 15:41:30,373][1652475] Updated weights for policy 0, policy_version 317760 (0.0014) [2024-06-15 15:41:30,738][1648984] Fps is (10 sec: 36044.7, 60 sec: 42052.4, 300 sec: 42320.7). Total num frames: 650805248. Throughput: 0: 10797.5. Samples: 162767872. 
Policy #0 lag: (min: 15.0, avg: 141.4, max: 335.0) [2024-06-15 15:41:30,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 15:41:32,028][1652475] Updated weights for policy 0, policy_version 317827 (0.0024) [2024-06-15 15:41:35,741][1648984] Fps is (10 sec: 52428.9, 60 sec: 43690.6, 300 sec: 42653.9). Total num frames: 651034624. Throughput: 0: 10422.0. Samples: 162789376. Policy #0 lag: (min: 15.0, avg: 141.4, max: 335.0) [2024-06-15 15:41:35,743][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 15:41:39,007][1652475] Updated weights for policy 0, policy_version 317904 (0.0013) [2024-06-15 15:41:39,932][1652475] Updated weights for policy 0, policy_version 317951 (0.0015) [2024-06-15 15:41:40,738][1648984] Fps is (10 sec: 36044.3, 60 sec: 41507.1, 300 sec: 42653.9). Total num frames: 651165696. Throughput: 0: 10672.4. Samples: 162865664. Policy #0 lag: (min: 15.0, avg: 141.4, max: 335.0) [2024-06-15 15:41:40,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 15:41:43,258][1652475] Updated weights for policy 0, policy_version 318017 (0.0017) [2024-06-15 15:41:44,829][1652475] Updated weights for policy 0, policy_version 318083 (0.0012) [2024-06-15 15:41:45,738][1648984] Fps is (10 sec: 45875.2, 60 sec: 42598.4, 300 sec: 42431.8). Total num frames: 651493376. Throughput: 0: 10581.3. Samples: 162927104. Policy #0 lag: (min: 15.0, avg: 141.4, max: 335.0) [2024-06-15 15:41:45,738][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 15:41:46,112][1652475] Updated weights for policy 0, policy_version 318139 (0.0017) [2024-06-15 15:41:50,738][1648984] Fps is (10 sec: 42597.7, 60 sec: 42052.0, 300 sec: 42765.0). Total num frames: 651591680. Throughput: 0: 10729.2. Samples: 162963968. 
Policy #0 lag: (min: 15.0, avg: 141.4, max: 335.0) [2024-06-15 15:41:50,739][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 15:41:50,752][1652475] Updated weights for policy 0, policy_version 318165 (0.0013) [2024-06-15 15:41:52,445][1652475] Updated weights for policy 0, policy_version 318211 (0.0032) [2024-06-15 15:41:52,779][1651340] Signal inference workers to stop experience collection... (16350 times) [2024-06-15 15:41:52,840][1652475] InferenceWorker_p0-w0: stopping experience collection (16350 times) [2024-06-15 15:41:53,038][1651340] Signal inference workers to resume experience collection... (16350 times) [2024-06-15 15:41:53,039][1652475] InferenceWorker_p0-w0: resuming experience collection (16350 times) [2024-06-15 15:41:54,787][1652475] Updated weights for policy 0, policy_version 318288 (0.0014) [2024-06-15 15:41:55,743][1648984] Fps is (10 sec: 42577.7, 60 sec: 43141.2, 300 sec: 42764.3). Total num frames: 651919360. Throughput: 0: 11012.5. Samples: 163034624. Policy #0 lag: (min: 15.0, avg: 141.4, max: 335.0) [2024-06-15 15:41:55,744][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 15:41:57,242][1652475] Updated weights for policy 0, policy_version 318395 (0.0111) [2024-06-15 15:42:00,738][1648984] Fps is (10 sec: 49153.6, 60 sec: 43144.6, 300 sec: 42653.9). Total num frames: 652083200. Throughput: 0: 10934.1. Samples: 163099648. Policy #0 lag: (min: 111.0, avg: 228.3, max: 367.0) [2024-06-15 15:42:00,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 15:42:02,540][1652475] Updated weights for policy 0, policy_version 318434 (0.0017) [2024-06-15 15:42:05,738][1648984] Fps is (10 sec: 36062.7, 60 sec: 42598.5, 300 sec: 42654.0). Total num frames: 652279808. Throughput: 0: 11047.8. Samples: 163138048. 
Policy #0 lag: (min: 111.0, avg: 228.3, max: 367.0) [2024-06-15 15:42:05,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 15:42:05,952][1652475] Updated weights for policy 0, policy_version 318517 (0.0013) [2024-06-15 15:42:08,049][1652475] Updated weights for policy 0, policy_version 318581 (0.0014) [2024-06-15 15:42:09,872][1652475] Updated weights for policy 0, policy_version 318649 (0.0014) [2024-06-15 15:42:10,738][1648984] Fps is (10 sec: 52428.5, 60 sec: 43690.7, 300 sec: 42653.9). Total num frames: 652607488. Throughput: 0: 10797.5. Samples: 163188224. Policy #0 lag: (min: 111.0, avg: 228.3, max: 367.0) [2024-06-15 15:42:10,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 15:42:15,002][1652475] Updated weights for policy 0, policy_version 318692 (0.0013) [2024-06-15 15:42:15,738][1648984] Fps is (10 sec: 45875.3, 60 sec: 43690.8, 300 sec: 42653.9). Total num frames: 652738560. Throughput: 0: 11002.3. Samples: 163262976. Policy #0 lag: (min: 111.0, avg: 228.3, max: 367.0) [2024-06-15 15:42:15,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 15:42:17,907][1652475] Updated weights for policy 0, policy_version 318755 (0.0035) [2024-06-15 15:42:18,670][1652475] Updated weights for policy 0, policy_version 318785 (0.0027) [2024-06-15 15:42:20,738][1648984] Fps is (10 sec: 42598.7, 60 sec: 43144.5, 300 sec: 42320.7). Total num frames: 653033472. Throughput: 0: 11286.8. Samples: 163297280. Policy #0 lag: (min: 111.0, avg: 228.3, max: 367.0) [2024-06-15 15:42:20,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 15:42:20,763][1652475] Updated weights for policy 0, policy_version 318865 (0.0013) [2024-06-15 15:42:25,545][1652475] Updated weights for policy 0, policy_version 318915 (0.0012) [2024-06-15 15:42:25,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 43690.7, 300 sec: 42653.9). Total num frames: 653131776. Throughput: 0: 11025.1. Samples: 163361792. 
Policy #0 lag: (min: 111.0, avg: 228.3, max: 367.0) [2024-06-15 15:42:25,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 15:42:26,900][1652475] Updated weights for policy 0, policy_version 318970 (0.0012) [2024-06-15 15:42:30,097][1652475] Updated weights for policy 0, policy_version 319031 (0.0024) [2024-06-15 15:42:30,738][1648984] Fps is (10 sec: 36044.6, 60 sec: 43144.5, 300 sec: 42431.8). Total num frames: 653393920. Throughput: 0: 11002.3. Samples: 163422208. Policy #0 lag: (min: 111.0, avg: 228.3, max: 367.0) [2024-06-15 15:42:30,739][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 15:42:32,127][1652475] Updated weights for policy 0, policy_version 319102 (0.0013) [2024-06-15 15:42:34,933][1652475] Updated weights for policy 0, policy_version 319168 (0.0013) [2024-06-15 15:42:35,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 43690.7, 300 sec: 42654.0). Total num frames: 653656064. Throughput: 0: 10831.7. Samples: 163451392. Policy #0 lag: (min: 111.0, avg: 228.3, max: 367.0) [2024-06-15 15:42:35,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 15:42:38,479][1651340] Signal inference workers to stop experience collection... (16400 times) [2024-06-15 15:42:38,540][1652475] InferenceWorker_p0-w0: stopping experience collection (16400 times) [2024-06-15 15:42:38,708][1651340] Signal inference workers to resume experience collection... (16400 times) [2024-06-15 15:42:38,709][1652475] InferenceWorker_p0-w0: resuming experience collection (16400 times) [2024-06-15 15:42:39,042][1652475] Updated weights for policy 0, policy_version 319232 (0.0013) [2024-06-15 15:42:40,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 43690.8, 300 sec: 42653.9). Total num frames: 653787136. Throughput: 0: 10741.8. Samples: 163517952. 
Policy #0 lag: (min: 111.0, avg: 228.3, max: 367.0) [2024-06-15 15:42:40,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 15:42:43,467][1652475] Updated weights for policy 0, policy_version 319328 (0.0014) [2024-06-15 15:42:45,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 42598.5, 300 sec: 42209.6). Total num frames: 654049280. Throughput: 0: 10604.1. Samples: 163576832. Policy #0 lag: (min: 111.0, avg: 228.3, max: 367.0) [2024-06-15 15:42:45,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 15:42:47,303][1652475] Updated weights for policy 0, policy_version 319419 (0.0017) [2024-06-15 15:42:50,530][1652475] Updated weights for policy 0, policy_version 319480 (0.0012) [2024-06-15 15:42:50,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 45329.3, 300 sec: 42876.1). Total num frames: 654311424. Throughput: 0: 10570.0. Samples: 163613696. Policy #0 lag: (min: 111.0, avg: 228.3, max: 367.0) [2024-06-15 15:42:50,741][1648984] Avg episode reward: [(0, '-0.230')] [2024-06-15 15:42:54,378][1652475] Updated weights for policy 0, policy_version 319521 (0.0014) [2024-06-15 15:42:55,738][1648984] Fps is (10 sec: 39320.2, 60 sec: 42055.5, 300 sec: 42320.7). Total num frames: 654442496. Throughput: 0: 10911.2. Samples: 163679232. Policy #0 lag: (min: 111.0, avg: 228.3, max: 367.0) [2024-06-15 15:42:55,739][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 15:42:55,756][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000319552_654442496.pth... [2024-06-15 15:42:56,009][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000314560_644218880.pth [2024-06-15 15:42:56,228][1652475] Updated weights for policy 0, policy_version 319568 (0.0073) [2024-06-15 15:42:58,212][1652475] Updated weights for policy 0, policy_version 319648 (0.0012) [2024-06-15 15:43:00,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 43690.7, 300 sec: 42653.9). Total num frames: 654704640. Throughput: 0: 10763.4. 
Samples: 163747328. Policy #0 lag: (min: 111.0, avg: 228.3, max: 367.0) [2024-06-15 15:43:00,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 15:43:01,467][1652475] Updated weights for policy 0, policy_version 319718 (0.0012) [2024-06-15 15:43:05,744][1648984] Fps is (10 sec: 42572.5, 60 sec: 43139.9, 300 sec: 42541.9). Total num frames: 654868480. Throughput: 0: 10659.5. Samples: 163777024. Policy #0 lag: (min: 111.0, avg: 228.3, max: 367.0) [2024-06-15 15:43:05,745][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 15:43:05,967][1652475] Updated weights for policy 0, policy_version 319779 (0.0089) [2024-06-15 15:43:09,363][1652475] Updated weights for policy 0, policy_version 319843 (0.0014) [2024-06-15 15:43:10,623][1652475] Updated weights for policy 0, policy_version 319892 (0.0030) [2024-06-15 15:43:10,739][1648984] Fps is (10 sec: 42598.0, 60 sec: 42052.3, 300 sec: 42320.7). Total num frames: 655130624. Throughput: 0: 10740.6. Samples: 163845120. Policy #0 lag: (min: 111.0, avg: 228.3, max: 367.0) [2024-06-15 15:43:10,740][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 15:43:12,762][1652475] Updated weights for policy 0, policy_version 319937 (0.0013) [2024-06-15 15:43:13,980][1652475] Updated weights for policy 0, policy_version 320000 (0.0019) [2024-06-15 15:43:15,738][1648984] Fps is (10 sec: 49183.6, 60 sec: 43690.7, 300 sec: 42987.2). Total num frames: 655360000. Throughput: 0: 10831.7. Samples: 163909632. Policy #0 lag: (min: 15.0, avg: 148.4, max: 271.0) [2024-06-15 15:43:15,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 15:43:20,738][1648984] Fps is (10 sec: 36044.7, 60 sec: 40959.9, 300 sec: 42320.7). Total num frames: 655491072. Throughput: 0: 10888.5. Samples: 163941376. 
Policy #0 lag: (min: 15.0, avg: 148.4, max: 271.0) [2024-06-15 15:43:20,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 15:43:21,273][1652475] Updated weights for policy 0, policy_version 320086 (0.0015) [2024-06-15 15:43:23,541][1652475] Updated weights for policy 0, policy_version 320176 (0.0134) [2024-06-15 15:43:25,738][1648984] Fps is (10 sec: 45875.0, 60 sec: 44782.9, 300 sec: 42876.1). Total num frames: 655818752. Throughput: 0: 10763.4. Samples: 164002304. Policy #0 lag: (min: 15.0, avg: 148.4, max: 271.0) [2024-06-15 15:43:25,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 15:43:25,807][1651340] Signal inference workers to stop experience collection... (16450 times) [2024-06-15 15:43:25,840][1652475] InferenceWorker_p0-w0: stopping experience collection (16450 times) [2024-06-15 15:43:26,021][1651340] Signal inference workers to resume experience collection... (16450 times) [2024-06-15 15:43:26,021][1652475] InferenceWorker_p0-w0: resuming experience collection (16450 times) [2024-06-15 15:43:26,235][1652475] Updated weights for policy 0, policy_version 320247 (0.0014) [2024-06-15 15:43:29,700][1652475] Updated weights for policy 0, policy_version 320295 (0.0016) [2024-06-15 15:43:30,738][1648984] Fps is (10 sec: 52429.1, 60 sec: 43690.7, 300 sec: 42765.0). Total num frames: 656015360. Throughput: 0: 10956.8. Samples: 164069888. Policy #0 lag: (min: 15.0, avg: 148.4, max: 271.0) [2024-06-15 15:43:30,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 15:43:33,047][1652475] Updated weights for policy 0, policy_version 320336 (0.0015) [2024-06-15 15:43:35,647][1652475] Updated weights for policy 0, policy_version 320447 (0.0209) [2024-06-15 15:43:35,738][1648984] Fps is (10 sec: 45875.2, 60 sec: 43690.6, 300 sec: 42659.2). Total num frames: 656277504. Throughput: 0: 10899.9. Samples: 164104192. 
Policy #0 lag: (min: 15.0, avg: 148.4, max: 271.0) [2024-06-15 15:43:35,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 15:43:40,738][1648984] Fps is (10 sec: 39320.2, 60 sec: 43690.4, 300 sec: 42765.0). Total num frames: 656408576. Throughput: 0: 10808.9. Samples: 164165632. Policy #0 lag: (min: 15.0, avg: 148.4, max: 271.0) [2024-06-15 15:43:40,739][1648984] Avg episode reward: [(0, '-0.580')] [2024-06-15 15:43:41,714][1652475] Updated weights for policy 0, policy_version 320544 (0.0015) [2024-06-15 15:43:45,693][1652475] Updated weights for policy 0, policy_version 320608 (0.0014) [2024-06-15 15:43:45,739][1648984] Fps is (10 sec: 32764.8, 60 sec: 42597.7, 300 sec: 42320.6). Total num frames: 656605184. Throughput: 0: 10683.5. Samples: 164228096. Policy #0 lag: (min: 15.0, avg: 148.4, max: 271.0) [2024-06-15 15:43:45,739][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 15:43:47,927][1652475] Updated weights for policy 0, policy_version 320704 (0.0014) [2024-06-15 15:43:50,738][1648984] Fps is (10 sec: 39322.9, 60 sec: 41506.1, 300 sec: 42656.9). Total num frames: 656801792. Throughput: 0: 10491.8. Samples: 164249088. Policy #0 lag: (min: 15.0, avg: 148.4, max: 271.0) [2024-06-15 15:43:50,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 15:43:53,375][1652475] Updated weights for policy 0, policy_version 320768 (0.0016) [2024-06-15 15:43:55,738][1648984] Fps is (10 sec: 36047.5, 60 sec: 42052.3, 300 sec: 42542.8). Total num frames: 656965632. Throughput: 0: 10604.0. Samples: 164322304. Policy #0 lag: (min: 15.0, avg: 148.4, max: 271.0) [2024-06-15 15:43:55,739][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 15:43:57,233][1652475] Updated weights for policy 0, policy_version 320833 (0.0014) [2024-06-15 15:43:59,968][1652475] Updated weights for policy 0, policy_version 320944 (0.0013) [2024-06-15 15:44:00,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 43690.6, 300 sec: 42654.0). Total num frames: 657326080. 
Throughput: 0: 10240.0. Samples: 164370432. Policy #0 lag: (min: 15.0, avg: 148.4, max: 271.0) [2024-06-15 15:44:00,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 15:44:04,358][1652475] Updated weights for policy 0, policy_version 320976 (0.0016) [2024-06-15 15:44:05,738][1648984] Fps is (10 sec: 49153.0, 60 sec: 43149.1, 300 sec: 43098.3). Total num frames: 657457152. Throughput: 0: 10467.6. Samples: 164412416. Policy #0 lag: (min: 15.0, avg: 148.4, max: 271.0) [2024-06-15 15:44:05,739][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 15:44:08,343][1652475] Updated weights for policy 0, policy_version 321040 (0.0013) [2024-06-15 15:44:09,259][1652475] Updated weights for policy 0, policy_version 321088 (0.0013) [2024-06-15 15:44:10,738][1648984] Fps is (10 sec: 32767.9, 60 sec: 42052.3, 300 sec: 42546.6). Total num frames: 657653760. Throughput: 0: 10558.6. Samples: 164477440. Policy #0 lag: (min: 15.0, avg: 148.4, max: 271.0) [2024-06-15 15:44:10,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 15:44:12,038][1652475] Updated weights for policy 0, policy_version 321184 (0.0039) [2024-06-15 15:44:15,738][1648984] Fps is (10 sec: 39320.5, 60 sec: 41505.9, 300 sec: 42653.9). Total num frames: 657850368. Throughput: 0: 10410.6. Samples: 164538368. Policy #0 lag: (min: 15.0, avg: 148.4, max: 271.0) [2024-06-15 15:44:15,739][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 15:44:16,247][1651340] Signal inference workers to stop experience collection... (16500 times) [2024-06-15 15:44:16,322][1652475] InferenceWorker_p0-w0: stopping experience collection (16500 times) [2024-06-15 15:44:16,514][1651340] Signal inference workers to resume experience collection... 
(16500 times) [2024-06-15 15:44:16,515][1652475] InferenceWorker_p0-w0: resuming experience collection (16500 times) [2024-06-15 15:44:17,165][1652475] Updated weights for policy 0, policy_version 321250 (0.0013) [2024-06-15 15:44:20,738][1648984] Fps is (10 sec: 36045.0, 60 sec: 42052.3, 300 sec: 42543.1). Total num frames: 658014208. Throughput: 0: 10365.2. Samples: 164570624. Policy #0 lag: (min: 15.0, avg: 148.4, max: 271.0) [2024-06-15 15:44:20,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 15:44:20,845][1652475] Updated weights for policy 0, policy_version 321312 (0.0028) [2024-06-15 15:44:23,646][1652475] Updated weights for policy 0, policy_version 321396 (0.0013) [2024-06-15 15:44:25,091][1652475] Updated weights for policy 0, policy_version 321463 (0.0120) [2024-06-15 15:44:25,738][1648984] Fps is (10 sec: 52429.9, 60 sec: 42598.3, 300 sec: 42653.9). Total num frames: 658374656. Throughput: 0: 10319.7. Samples: 164630016. Policy #0 lag: (min: 15.0, avg: 148.4, max: 271.0) [2024-06-15 15:44:25,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 15:44:29,273][1652475] Updated weights for policy 0, policy_version 321504 (0.0036) [2024-06-15 15:44:30,738][1648984] Fps is (10 sec: 49152.0, 60 sec: 41506.2, 300 sec: 42987.2). Total num frames: 658505728. Throughput: 0: 10433.7. Samples: 164697600. Policy #0 lag: (min: 15.0, avg: 118.0, max: 271.0) [2024-06-15 15:44:30,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 15:44:33,553][1652475] Updated weights for policy 0, policy_version 321596 (0.0020) [2024-06-15 15:44:35,738][1648984] Fps is (10 sec: 32768.3, 60 sec: 40413.9, 300 sec: 42209.6). Total num frames: 658702336. Throughput: 0: 10706.5. Samples: 164730880. 
Policy #0 lag: (min: 15.0, avg: 118.0, max: 271.0) [2024-06-15 15:44:35,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 15:44:36,685][1652475] Updated weights for policy 0, policy_version 321688 (0.0091) [2024-06-15 15:44:40,750][1648984] Fps is (10 sec: 39271.7, 60 sec: 41497.6, 300 sec: 42652.1). Total num frames: 658898944. Throughput: 0: 10532.9. Samples: 164796416. Policy #0 lag: (min: 15.0, avg: 118.0, max: 271.0) [2024-06-15 15:44:40,751][1648984] Avg episode reward: [(0, '-0.250')] [2024-06-15 15:44:40,890][1652475] Updated weights for policy 0, policy_version 321744 (0.0016) [2024-06-15 15:44:41,958][1652475] Updated weights for policy 0, policy_version 321792 (0.0016) [2024-06-15 15:44:44,999][1652475] Updated weights for policy 0, policy_version 321853 (0.0041) [2024-06-15 15:44:45,738][1648984] Fps is (10 sec: 45875.5, 60 sec: 42599.1, 300 sec: 42542.9). Total num frames: 659161088. Throughput: 0: 10968.2. Samples: 164864000. Policy #0 lag: (min: 15.0, avg: 118.0, max: 271.0) [2024-06-15 15:44:45,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 15:44:47,348][1652475] Updated weights for policy 0, policy_version 321893 (0.0014) [2024-06-15 15:44:49,075][1652475] Updated weights for policy 0, policy_version 321968 (0.0014) [2024-06-15 15:44:50,738][1648984] Fps is (10 sec: 52495.3, 60 sec: 43690.7, 300 sec: 42653.9). Total num frames: 659423232. Throughput: 0: 10752.0. Samples: 164896256. Policy #0 lag: (min: 15.0, avg: 118.0, max: 271.0) [2024-06-15 15:44:50,738][1648984] Avg episode reward: [(0, '-0.520')] [2024-06-15 15:44:52,117][1652475] Updated weights for policy 0, policy_version 321992 (0.0011) [2024-06-15 15:44:55,184][1652475] Updated weights for policy 0, policy_version 322050 (0.0015) [2024-06-15 15:44:55,738][1648984] Fps is (10 sec: 42598.0, 60 sec: 43690.8, 300 sec: 42765.0). Total num frames: 659587072. Throughput: 0: 10831.6. Samples: 164964864. 
Policy #0 lag: (min: 15.0, avg: 118.0, max: 271.0) [2024-06-15 15:44:55,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 15:44:56,247][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000322096_659652608.pth... [2024-06-15 15:44:56,318][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000317072_649363456.pth [2024-06-15 15:44:56,620][1652475] Updated weights for policy 0, policy_version 322110 (0.0012) [2024-06-15 15:44:58,900][1652475] Updated weights for policy 0, policy_version 322163 (0.0013) [2024-06-15 15:45:00,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 41506.2, 300 sec: 42320.9). Total num frames: 659816448. Throughput: 0: 10922.8. Samples: 165029888. Policy #0 lag: (min: 15.0, avg: 118.0, max: 271.0) [2024-06-15 15:45:00,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 15:45:01,984][1651340] Signal inference workers to stop experience collection... (16550 times) [2024-06-15 15:45:02,034][1652475] InferenceWorker_p0-w0: stopping experience collection (16550 times) [2024-06-15 15:45:02,246][1651340] Signal inference workers to resume experience collection... (16550 times) [2024-06-15 15:45:02,247][1652475] InferenceWorker_p0-w0: resuming experience collection (16550 times) [2024-06-15 15:45:02,448][1652475] Updated weights for policy 0, policy_version 322230 (0.0013) [2024-06-15 15:45:04,206][1652475] Updated weights for policy 0, policy_version 322272 (0.0012) [2024-06-15 15:45:05,739][1648984] Fps is (10 sec: 49152.1, 60 sec: 43690.7, 300 sec: 43098.2). Total num frames: 660078592. Throughput: 0: 10945.4. Samples: 165063168. Policy #0 lag: (min: 15.0, avg: 118.0, max: 271.0) [2024-06-15 15:45:05,740][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 15:45:07,848][1652475] Updated weights for policy 0, policy_version 322363 (0.0028) [2024-06-15 15:45:10,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 42598.4, 300 sec: 42431.8). 
Total num frames: 660209664. Throughput: 0: 10979.6. Samples: 165124096. Policy #0 lag: (min: 15.0, avg: 118.0, max: 271.0) [2024-06-15 15:45:10,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 15:45:12,745][1652475] Updated weights for policy 0, policy_version 322425 (0.0015) [2024-06-15 15:45:14,364][1652475] Updated weights for policy 0, policy_version 322464 (0.0029) [2024-06-15 15:45:15,738][1648984] Fps is (10 sec: 42598.7, 60 sec: 44237.1, 300 sec: 42765.0). Total num frames: 660504576. Throughput: 0: 10854.4. Samples: 165186048. Policy #0 lag: (min: 15.0, avg: 118.0, max: 271.0) [2024-06-15 15:45:15,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 15:45:15,882][1652475] Updated weights for policy 0, policy_version 322517 (0.0013) [2024-06-15 15:45:19,326][1652475] Updated weights for policy 0, policy_version 322583 (0.0013) [2024-06-15 15:45:20,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 45329.1, 300 sec: 43098.4). Total num frames: 660733952. Throughput: 0: 10820.3. Samples: 165217792. Policy #0 lag: (min: 15.0, avg: 118.0, max: 271.0) [2024-06-15 15:45:20,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 15:45:25,213][1652475] Updated weights for policy 0, policy_version 322640 (0.0014) [2024-06-15 15:45:25,738][1648984] Fps is (10 sec: 29491.2, 60 sec: 40414.0, 300 sec: 42431.8). Total num frames: 660799488. Throughput: 0: 10959.9. Samples: 165289472. Policy #0 lag: (min: 15.0, avg: 118.0, max: 271.0) [2024-06-15 15:45:25,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 15:45:26,616][1652475] Updated weights for policy 0, policy_version 322688 (0.0016) [2024-06-15 15:45:29,017][1652475] Updated weights for policy 0, policy_version 322784 (0.0013) [2024-06-15 15:45:30,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 43690.6, 300 sec: 43098.2). Total num frames: 661127168. Throughput: 0: 10558.6. Samples: 165339136. 
Policy #0 lag: (min: 15.0, avg: 118.0, max: 271.0) [2024-06-15 15:45:30,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 15:45:32,108][1652475] Updated weights for policy 0, policy_version 322864 (0.0033) [2024-06-15 15:45:35,738][1648984] Fps is (10 sec: 45874.6, 60 sec: 42598.4, 300 sec: 42654.1). Total num frames: 661258240. Throughput: 0: 10535.8. Samples: 165370368. Policy #0 lag: (min: 15.0, avg: 118.0, max: 271.0) [2024-06-15 15:45:35,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 15:45:38,090][1652475] Updated weights for policy 0, policy_version 322916 (0.0020) [2024-06-15 15:45:39,303][1652475] Updated weights for policy 0, policy_version 322961 (0.0013) [2024-06-15 15:45:40,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 43699.9, 300 sec: 42653.9). Total num frames: 661520384. Throughput: 0: 10638.2. Samples: 165443584. Policy #0 lag: (min: 15.0, avg: 118.0, max: 271.0) [2024-06-15 15:45:40,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 15:45:41,237][1652475] Updated weights for policy 0, policy_version 323041 (0.0106) [2024-06-15 15:45:44,231][1652475] Updated weights for policy 0, policy_version 323131 (0.0013) [2024-06-15 15:45:45,738][1648984] Fps is (10 sec: 52429.3, 60 sec: 43690.6, 300 sec: 43098.2). Total num frames: 661782528. Throughput: 0: 10535.8. Samples: 165504000. Policy #0 lag: (min: 63.0, avg: 213.3, max: 319.0) [2024-06-15 15:45:45,738][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 15:45:50,059][1652475] Updated weights for policy 0, policy_version 323193 (0.0017) [2024-06-15 15:45:50,738][1648984] Fps is (10 sec: 39320.9, 60 sec: 41506.0, 300 sec: 42654.0). Total num frames: 661913600. Throughput: 0: 10638.2. Samples: 165541888. Policy #0 lag: (min: 63.0, avg: 213.3, max: 319.0) [2024-06-15 15:45:50,739][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 15:45:50,782][1651340] Signal inference workers to stop experience collection... 
(16600 times) [2024-06-15 15:45:50,838][1652475] InferenceWorker_p0-w0: stopping experience collection (16600 times) [2024-06-15 15:45:51,109][1651340] Signal inference workers to resume experience collection... (16600 times) [2024-06-15 15:45:51,109][1652475] InferenceWorker_p0-w0: resuming experience collection (16600 times) [2024-06-15 15:45:52,695][1652475] Updated weights for policy 0, policy_version 323265 (0.0012) [2024-06-15 15:45:54,051][1652475] Updated weights for policy 0, policy_version 323328 (0.0014) [2024-06-15 15:45:55,738][1648984] Fps is (10 sec: 52427.8, 60 sec: 45329.0, 300 sec: 43431.5). Total num frames: 662306816. Throughput: 0: 10695.1. Samples: 165605376. Policy #0 lag: (min: 63.0, avg: 213.3, max: 319.0) [2024-06-15 15:45:55,738][1648984] Avg episode reward: [(0, '-0.600')] [2024-06-15 15:46:00,738][1648984] Fps is (10 sec: 39321.2, 60 sec: 41505.9, 300 sec: 42653.9). Total num frames: 662306816. Throughput: 0: 10820.2. Samples: 165672960. Policy #0 lag: (min: 63.0, avg: 213.3, max: 319.0) [2024-06-15 15:46:00,739][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 15:46:00,926][1652475] Updated weights for policy 0, policy_version 323408 (0.0014) [2024-06-15 15:46:02,323][1652475] Updated weights for policy 0, policy_version 323454 (0.0032) [2024-06-15 15:46:03,802][1652475] Updated weights for policy 0, policy_version 323504 (0.0015) [2024-06-15 15:46:05,747][1648984] Fps is (10 sec: 26198.3, 60 sec: 41501.8, 300 sec: 42653.0). Total num frames: 662568960. Throughput: 0: 10739.1. Samples: 165701120. Policy #0 lag: (min: 63.0, avg: 213.3, max: 319.0) [2024-06-15 15:46:05,748][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 15:46:07,256][1652475] Updated weights for policy 0, policy_version 323573 (0.0116) [2024-06-15 15:46:08,663][1652475] Updated weights for policy 0, policy_version 323620 (0.0012) [2024-06-15 15:46:10,738][1648984] Fps is (10 sec: 52429.6, 60 sec: 43690.6, 300 sec: 43098.3). Total num frames: 662831104. 
Throughput: 0: 10478.9. Samples: 165761024. Policy #0 lag: (min: 63.0, avg: 213.3, max: 319.0) [2024-06-15 15:46:10,738][1648984] Avg episode reward: [(0, '-0.530')] [2024-06-15 15:46:12,716][1652475] Updated weights for policy 0, policy_version 323680 (0.0028) [2024-06-15 15:46:15,738][1648984] Fps is (10 sec: 45904.2, 60 sec: 42052.2, 300 sec: 42653.9). Total num frames: 663027712. Throughput: 0: 10865.8. Samples: 165828096. Policy #0 lag: (min: 63.0, avg: 213.3, max: 319.0) [2024-06-15 15:46:15,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 15:46:15,894][1652475] Updated weights for policy 0, policy_version 323760 (0.0014) [2024-06-15 15:46:19,001][1652475] Updated weights for policy 0, policy_version 323834 (0.0012) [2024-06-15 15:46:20,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 41506.1, 300 sec: 43098.3). Total num frames: 663224320. Throughput: 0: 10877.2. Samples: 165859840. Policy #0 lag: (min: 63.0, avg: 213.3, max: 319.0) [2024-06-15 15:46:20,748][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 15:46:22,486][1652475] Updated weights for policy 0, policy_version 323901 (0.0033) [2024-06-15 15:46:25,657][1652475] Updated weights for policy 0, policy_version 323964 (0.0011) [2024-06-15 15:46:25,738][1648984] Fps is (10 sec: 45875.3, 60 sec: 44782.9, 300 sec: 42987.2). Total num frames: 663486464. Throughput: 0: 10717.9. Samples: 165925888. Policy #0 lag: (min: 63.0, avg: 213.3, max: 319.0) [2024-06-15 15:46:25,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 15:46:28,041][1652475] Updated weights for policy 0, policy_version 324004 (0.0012) [2024-06-15 15:46:30,371][1652475] Updated weights for policy 0, policy_version 324069 (0.0020) [2024-06-15 15:46:30,738][1648984] Fps is (10 sec: 49152.1, 60 sec: 43144.5, 300 sec: 42987.2). Total num frames: 663715840. Throughput: 0: 10808.9. Samples: 165990400. 
Policy #0 lag: (min: 63.0, avg: 213.3, max: 319.0) [2024-06-15 15:46:30,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 15:46:33,784][1652475] Updated weights for policy 0, policy_version 324128 (0.0011) [2024-06-15 15:46:35,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 43690.7, 300 sec: 43098.3). Total num frames: 663879680. Throughput: 0: 10752.0. Samples: 166025728. Policy #0 lag: (min: 63.0, avg: 213.3, max: 319.0) [2024-06-15 15:46:35,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 15:46:36,603][1652475] Updated weights for policy 0, policy_version 324192 (0.0013) [2024-06-15 15:46:39,132][1651340] Signal inference workers to stop experience collection... (16650 times) [2024-06-15 15:46:39,187][1652475] InferenceWorker_p0-w0: stopping experience collection (16650 times) [2024-06-15 15:46:39,296][1651340] Signal inference workers to resume experience collection... (16650 times) [2024-06-15 15:46:39,297][1652475] InferenceWorker_p0-w0: resuming experience collection (16650 times) [2024-06-15 15:46:39,299][1652475] Updated weights for policy 0, policy_version 324256 (0.0013) [2024-06-15 15:46:39,808][1652475] Updated weights for policy 0, policy_version 324285 (0.0013) [2024-06-15 15:46:40,738][1648984] Fps is (10 sec: 42598.1, 60 sec: 43690.6, 300 sec: 42876.1). Total num frames: 664141824. Throughput: 0: 10808.9. Samples: 166091776. Policy #0 lag: (min: 63.0, avg: 213.3, max: 319.0) [2024-06-15 15:46:40,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 15:46:42,244][1652475] Updated weights for policy 0, policy_version 324346 (0.0018) [2024-06-15 15:46:45,665][1652475] Updated weights for policy 0, policy_version 324412 (0.0015) [2024-06-15 15:46:45,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 43690.7, 300 sec: 43431.5). Total num frames: 664403968. Throughput: 0: 10843.1. Samples: 166160896. 
Policy #0 lag: (min: 63.0, avg: 213.3, max: 319.0) [2024-06-15 15:46:45,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 15:46:48,655][1652475] Updated weights for policy 0, policy_version 324464 (0.0018) [2024-06-15 15:46:50,738][1648984] Fps is (10 sec: 42598.9, 60 sec: 44236.9, 300 sec: 42876.8). Total num frames: 664567808. Throughput: 0: 10912.8. Samples: 166192128. Policy #0 lag: (min: 63.0, avg: 213.3, max: 319.0) [2024-06-15 15:46:50,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 15:46:53,010][1652475] Updated weights for policy 0, policy_version 324548 (0.0014) [2024-06-15 15:46:54,309][1652475] Updated weights for policy 0, policy_version 324601 (0.0013) [2024-06-15 15:46:55,739][1648984] Fps is (10 sec: 39321.5, 60 sec: 41506.2, 300 sec: 43098.2). Total num frames: 664797184. Throughput: 0: 10979.6. Samples: 166255104. Policy #0 lag: (min: 63.0, avg: 213.3, max: 319.0) [2024-06-15 15:46:55,740][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 15:46:55,745][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000324608_664797184.pth... [2024-06-15 15:46:55,862][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000319552_654442496.pth [2024-06-15 15:46:57,789][1652475] Updated weights for policy 0, policy_version 324656 (0.0016) [2024-06-15 15:47:00,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 44237.0, 300 sec: 42987.2). Total num frames: 664961024. Throughput: 0: 11059.2. Samples: 166325760. 
Policy #0 lag: (min: 11.0, avg: 112.7, max: 267.0) [2024-06-15 15:47:00,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 15:47:00,954][1652475] Updated weights for policy 0, policy_version 324704 (0.0013) [2024-06-15 15:47:02,193][1652475] Updated weights for policy 0, policy_version 324757 (0.0015) [2024-06-15 15:47:03,021][1652475] Updated weights for policy 0, policy_version 324798 (0.0012) [2024-06-15 15:47:04,986][1652475] Updated weights for policy 0, policy_version 324856 (0.0012) [2024-06-15 15:47:05,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 45880.0, 300 sec: 43098.3). Total num frames: 665321472. Throughput: 0: 11104.7. Samples: 166359552. Policy #0 lag: (min: 11.0, avg: 112.7, max: 267.0) [2024-06-15 15:47:05,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 15:47:08,554][1652475] Updated weights for policy 0, policy_version 324923 (0.0012) [2024-06-15 15:47:10,738][1648984] Fps is (10 sec: 49152.0, 60 sec: 43690.7, 300 sec: 43098.2). Total num frames: 665452544. Throughput: 0: 11161.6. Samples: 166428160. Policy #0 lag: (min: 11.0, avg: 112.7, max: 267.0) [2024-06-15 15:47:10,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 15:47:13,906][1652475] Updated weights for policy 0, policy_version 324992 (0.0013) [2024-06-15 15:47:15,738][1648984] Fps is (10 sec: 36045.1, 60 sec: 44236.8, 300 sec: 42876.1). Total num frames: 665681920. Throughput: 0: 11150.2. Samples: 166492160. Policy #0 lag: (min: 11.0, avg: 112.7, max: 267.0) [2024-06-15 15:47:15,751][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 15:47:16,558][1652475] Updated weights for policy 0, policy_version 325076 (0.0014) [2024-06-15 15:47:19,810][1652475] Updated weights for policy 0, policy_version 325136 (0.0015) [2024-06-15 15:47:20,738][1648984] Fps is (10 sec: 49151.9, 60 sec: 45329.1, 300 sec: 43431.5). Total num frames: 665944064. Throughput: 0: 10956.8. Samples: 166518784. 
Policy #0 lag: (min: 11.0, avg: 112.7, max: 267.0) [2024-06-15 15:47:20,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 15:47:21,023][1652475] Updated weights for policy 0, policy_version 325182 (0.0014) [2024-06-15 15:47:25,738][1648984] Fps is (10 sec: 32768.0, 60 sec: 42052.3, 300 sec: 42765.0). Total num frames: 666009600. Throughput: 0: 11104.7. Samples: 166591488. Policy #0 lag: (min: 11.0, avg: 112.7, max: 267.0) [2024-06-15 15:47:25,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 15:47:27,128][1652475] Updated weights for policy 0, policy_version 325250 (0.0015) [2024-06-15 15:47:27,435][1651340] Signal inference workers to stop experience collection... (16700 times) [2024-06-15 15:47:27,458][1652475] InferenceWorker_p0-w0: stopping experience collection (16700 times) [2024-06-15 15:47:27,642][1651340] Signal inference workers to resume experience collection... (16700 times) [2024-06-15 15:47:27,643][1652475] InferenceWorker_p0-w0: resuming experience collection (16700 times) [2024-06-15 15:47:28,886][1652475] Updated weights for policy 0, policy_version 325328 (0.0105) [2024-06-15 15:47:30,739][1648984] Fps is (10 sec: 42593.2, 60 sec: 44235.9, 300 sec: 43098.1). Total num frames: 666370048. Throughput: 0: 10785.8. Samples: 166646272. Policy #0 lag: (min: 11.0, avg: 112.7, max: 267.0) [2024-06-15 15:47:30,739][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 15:47:31,712][1652475] Updated weights for policy 0, policy_version 325377 (0.0012) [2024-06-15 15:47:35,738][1648984] Fps is (10 sec: 49151.4, 60 sec: 43690.6, 300 sec: 43098.2). Total num frames: 666501120. Throughput: 0: 10820.2. Samples: 166679040. 
Policy #0 lag: (min: 11.0, avg: 112.7, max: 267.0) [2024-06-15 15:47:35,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 15:47:38,038][1652475] Updated weights for policy 0, policy_version 325442 (0.0014) [2024-06-15 15:47:39,858][1652475] Updated weights for policy 0, policy_version 325505 (0.0012) [2024-06-15 15:47:40,738][1648984] Fps is (10 sec: 32772.1, 60 sec: 42598.5, 300 sec: 42876.1). Total num frames: 666697728. Throughput: 0: 10865.8. Samples: 166744064. Policy #0 lag: (min: 11.0, avg: 112.7, max: 267.0) [2024-06-15 15:47:40,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 15:47:42,665][1652475] Updated weights for policy 0, policy_version 325629 (0.0130) [2024-06-15 15:47:45,652][1652475] Updated weights for policy 0, policy_version 325692 (0.0012) [2024-06-15 15:47:45,738][1648984] Fps is (10 sec: 52429.3, 60 sec: 43690.7, 300 sec: 43098.3). Total num frames: 667025408. Throughput: 0: 10513.1. Samples: 166798848. Policy #0 lag: (min: 11.0, avg: 112.7, max: 267.0) [2024-06-15 15:47:45,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 15:47:50,738][1648984] Fps is (10 sec: 32767.3, 60 sec: 40959.9, 300 sec: 42654.0). Total num frames: 667025408. Throughput: 0: 10558.5. Samples: 166834688. Policy #0 lag: (min: 11.0, avg: 112.7, max: 267.0) [2024-06-15 15:47:50,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 15:47:52,408][1652475] Updated weights for policy 0, policy_version 325760 (0.0016) [2024-06-15 15:47:54,020][1652475] Updated weights for policy 0, policy_version 325824 (0.0120) [2024-06-15 15:47:55,738][1648984] Fps is (10 sec: 32768.0, 60 sec: 42598.4, 300 sec: 42876.1). Total num frames: 667353088. Throughput: 0: 10353.8. Samples: 166894080. 
Policy #0 lag: (min: 11.0, avg: 112.7, max: 267.0) [2024-06-15 15:47:55,740][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 15:47:56,116][1652475] Updated weights for policy 0, policy_version 325880 (0.0012) [2024-06-15 15:48:00,738][1648984] Fps is (10 sec: 52429.3, 60 sec: 43144.5, 300 sec: 42988.1). Total num frames: 667549696. Throughput: 0: 10387.9. Samples: 166959616. Policy #0 lag: (min: 11.0, avg: 112.7, max: 267.0) [2024-06-15 15:48:00,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 15:48:03,834][1652475] Updated weights for policy 0, policy_version 325968 (0.0019) [2024-06-15 15:48:05,738][1648984] Fps is (10 sec: 36044.4, 60 sec: 39867.7, 300 sec: 42653.9). Total num frames: 667713536. Throughput: 0: 10604.1. Samples: 166995968. Policy #0 lag: (min: 11.0, avg: 112.7, max: 267.0) [2024-06-15 15:48:05,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 15:48:06,075][1652475] Updated weights for policy 0, policy_version 326048 (0.0013) [2024-06-15 15:48:07,850][1652475] Updated weights for policy 0, policy_version 326112 (0.0034) [2024-06-15 15:48:09,581][1652475] Updated weights for policy 0, policy_version 326180 (0.0013) [2024-06-15 15:48:10,738][1648984] Fps is (10 sec: 52429.1, 60 sec: 43690.6, 300 sec: 43098.2). Total num frames: 668073984. Throughput: 0: 10160.3. Samples: 167048704. Policy #0 lag: (min: 11.0, avg: 112.7, max: 267.0) [2024-06-15 15:48:10,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 15:48:15,738][1648984] Fps is (10 sec: 36043.7, 60 sec: 39867.5, 300 sec: 42653.9). Total num frames: 668073984. Throughput: 0: 10649.8. Samples: 167125504. Policy #0 lag: (min: 11.0, avg: 112.7, max: 267.0) [2024-06-15 15:48:15,739][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 15:48:16,163][1652475] Updated weights for policy 0, policy_version 326226 (0.0012) [2024-06-15 15:48:16,548][1651340] Signal inference workers to stop experience collection... 
(16750 times) [2024-06-15 15:48:16,589][1652475] InferenceWorker_p0-w0: stopping experience collection (16750 times) [2024-06-15 15:48:16,897][1651340] Signal inference workers to resume experience collection... (16750 times) [2024-06-15 15:48:16,898][1652475] InferenceWorker_p0-w0: resuming experience collection (16750 times) [2024-06-15 15:48:18,256][1652475] Updated weights for policy 0, policy_version 326288 (0.0013) [2024-06-15 15:48:20,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 42052.3, 300 sec: 42876.1). Total num frames: 668467200. Throughput: 0: 10592.7. Samples: 167155712. Policy #0 lag: (min: 15.0, avg: 69.1, max: 255.0) [2024-06-15 15:48:20,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 15:48:21,541][1652475] Updated weights for policy 0, policy_version 326440 (0.0018) [2024-06-15 15:48:25,738][1648984] Fps is (10 sec: 52430.5, 60 sec: 43144.5, 300 sec: 42653.9). Total num frames: 668598272. Throughput: 0: 10296.9. Samples: 167207424. Policy #0 lag: (min: 15.0, avg: 69.1, max: 255.0) [2024-06-15 15:48:25,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 15:48:28,801][1652475] Updated weights for policy 0, policy_version 326500 (0.0017) [2024-06-15 15:48:30,058][1652475] Updated weights for policy 0, policy_version 326529 (0.0013) [2024-06-15 15:48:30,738][1648984] Fps is (10 sec: 32768.2, 60 sec: 40414.7, 300 sec: 42431.8). Total num frames: 668794880. Throughput: 0: 10604.1. Samples: 167276032. Policy #0 lag: (min: 15.0, avg: 69.1, max: 255.0) [2024-06-15 15:48:30,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 15:48:31,481][1652475] Updated weights for policy 0, policy_version 326608 (0.0087) [2024-06-15 15:48:35,402][1652475] Updated weights for policy 0, policy_version 326672 (0.0013) [2024-06-15 15:48:35,738][1648984] Fps is (10 sec: 45875.3, 60 sec: 42598.4, 300 sec: 42876.1). Total num frames: 669057024. Throughput: 0: 10479.0. Samples: 167306240. 
Policy #0 lag: (min: 15.0, avg: 69.1, max: 255.0) [2024-06-15 15:48:35,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 15:48:39,952][1652475] Updated weights for policy 0, policy_version 326752 (0.0016) [2024-06-15 15:48:40,738][1648984] Fps is (10 sec: 42598.2, 60 sec: 42052.2, 300 sec: 42765.2). Total num frames: 669220864. Throughput: 0: 10740.6. Samples: 167377408. Policy #0 lag: (min: 15.0, avg: 69.1, max: 255.0) [2024-06-15 15:48:40,738][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 15:48:41,816][1652475] Updated weights for policy 0, policy_version 326803 (0.0013) [2024-06-15 15:48:42,651][1652475] Updated weights for policy 0, policy_version 326848 (0.0013) [2024-06-15 15:48:43,940][1652475] Updated weights for policy 0, policy_version 326912 (0.0013) [2024-06-15 15:48:45,738][1648984] Fps is (10 sec: 45875.2, 60 sec: 41506.1, 300 sec: 43098.3). Total num frames: 669515776. Throughput: 0: 10752.0. Samples: 167443456. Policy #0 lag: (min: 15.0, avg: 69.1, max: 255.0) [2024-06-15 15:48:45,738][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 15:48:48,783][1652475] Updated weights for policy 0, policy_version 326967 (0.0014) [2024-06-15 15:48:50,738][1648984] Fps is (10 sec: 42598.5, 60 sec: 43690.8, 300 sec: 42987.2). Total num frames: 669646848. Throughput: 0: 10729.3. Samples: 167478784. Policy #0 lag: (min: 15.0, avg: 69.1, max: 255.0) [2024-06-15 15:48:50,738][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 15:48:51,548][1652475] Updated weights for policy 0, policy_version 327008 (0.0191) [2024-06-15 15:48:53,073][1652475] Updated weights for policy 0, policy_version 327043 (0.0012) [2024-06-15 15:48:54,247][1652475] Updated weights for policy 0, policy_version 327104 (0.0037) [2024-06-15 15:48:55,571][1652475] Updated weights for policy 0, policy_version 327167 (0.0107) [2024-06-15 15:48:55,744][1648984] Fps is (10 sec: 52395.0, 60 sec: 44778.1, 300 sec: 43097.3). Total num frames: 670040064. 
Throughput: 0: 11046.2. Samples: 167545856. Policy #0 lag: (min: 15.0, avg: 69.1, max: 255.0) [2024-06-15 15:48:55,745][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 15:48:55,750][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000327168_670040064.pth... [2024-06-15 15:48:55,847][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000322096_659652608.pth [2024-06-15 15:49:00,073][1651340] Signal inference workers to stop experience collection... (16800 times) [2024-06-15 15:49:00,129][1652475] InferenceWorker_p0-w0: stopping experience collection (16800 times) [2024-06-15 15:49:00,386][1651340] Signal inference workers to resume experience collection... (16800 times) [2024-06-15 15:49:00,387][1652475] InferenceWorker_p0-w0: resuming experience collection (16800 times) [2024-06-15 15:49:00,599][1652475] Updated weights for policy 0, policy_version 327225 (0.0012) [2024-06-15 15:49:00,740][1648984] Fps is (10 sec: 52425.6, 60 sec: 43690.3, 300 sec: 43098.2). Total num frames: 670171136. Throughput: 0: 10831.6. Samples: 167612928. Policy #0 lag: (min: 15.0, avg: 69.1, max: 255.0) [2024-06-15 15:49:00,741][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 15:49:03,268][1652475] Updated weights for policy 0, policy_version 327286 (0.0013) [2024-06-15 15:49:04,892][1652475] Updated weights for policy 0, policy_version 327328 (0.0013) [2024-06-15 15:49:05,738][1648984] Fps is (10 sec: 39347.2, 60 sec: 45329.1, 300 sec: 43320.4). Total num frames: 670433280. Throughput: 0: 10865.8. Samples: 167644672. Policy #0 lag: (min: 15.0, avg: 69.1, max: 255.0) [2024-06-15 15:49:05,738][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 15:49:10,738][1648984] Fps is (10 sec: 39323.9, 60 sec: 41506.1, 300 sec: 43098.3). Total num frames: 670564352. Throughput: 0: 11184.4. Samples: 167710720. 
Policy #0 lag: (min: 15.0, avg: 69.1, max: 255.0) [2024-06-15 15:49:10,738][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 15:49:11,220][1652475] Updated weights for policy 0, policy_version 327425 (0.0016) [2024-06-15 15:49:14,357][1652475] Updated weights for policy 0, policy_version 327489 (0.0119) [2024-06-15 15:49:15,691][1652475] Updated weights for policy 0, policy_version 327549 (0.0012) [2024-06-15 15:49:15,738][1648984] Fps is (10 sec: 36044.9, 60 sec: 45329.4, 300 sec: 43320.4). Total num frames: 670793728. Throughput: 0: 11025.1. Samples: 167772160. Policy #0 lag: (min: 15.0, avg: 69.1, max: 255.0) [2024-06-15 15:49:15,738][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 15:49:17,667][1652475] Updated weights for policy 0, policy_version 327586 (0.0011) [2024-06-15 15:49:19,182][1652475] Updated weights for policy 0, policy_version 327664 (0.0014) [2024-06-15 15:49:20,738][1648984] Fps is (10 sec: 52429.4, 60 sec: 43690.7, 300 sec: 43098.3). Total num frames: 671088640. Throughput: 0: 11082.0. Samples: 167804928. Policy #0 lag: (min: 15.0, avg: 69.1, max: 255.0) [2024-06-15 15:49:20,738][1648984] Avg episode reward: [(0, '-0.530')] [2024-06-15 15:49:24,262][1652475] Updated weights for policy 0, policy_version 327719 (0.0048) [2024-06-15 15:49:25,738][1648984] Fps is (10 sec: 42597.8, 60 sec: 43690.6, 300 sec: 43098.2). Total num frames: 671219712. Throughput: 0: 10990.9. Samples: 167872000. Policy #0 lag: (min: 15.0, avg: 69.1, max: 255.0) [2024-06-15 15:49:25,738][1648984] Avg episode reward: [(0, '-0.570')] [2024-06-15 15:49:27,718][1652475] Updated weights for policy 0, policy_version 327776 (0.0013) [2024-06-15 15:49:29,771][1652475] Updated weights for policy 0, policy_version 327871 (0.0013) [2024-06-15 15:49:30,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 45329.1, 300 sec: 43431.5). Total num frames: 671514624. Throughput: 0: 10956.8. Samples: 167936512. 
Policy #0 lag: (min: 15.0, avg: 69.1, max: 255.0) [2024-06-15 15:49:30,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 15:49:31,305][1652475] Updated weights for policy 0, policy_version 327927 (0.0014) [2024-06-15 15:49:34,941][1652475] Updated weights for policy 0, policy_version 327956 (0.0028) [2024-06-15 15:49:35,738][1648984] Fps is (10 sec: 49152.6, 60 sec: 44236.9, 300 sec: 43433.4). Total num frames: 671711232. Throughput: 0: 10911.3. Samples: 167969792. Policy #0 lag: (min: 52.0, avg: 206.5, max: 307.0) [2024-06-15 15:49:35,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 15:49:40,012][1652475] Updated weights for policy 0, policy_version 328032 (0.0018) [2024-06-15 15:49:40,738][1648984] Fps is (10 sec: 36044.6, 60 sec: 44236.8, 300 sec: 43098.2). Total num frames: 671875072. Throughput: 0: 11015.3. Samples: 168041472. Policy #0 lag: (min: 52.0, avg: 206.5, max: 307.0) [2024-06-15 15:49:40,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 15:49:41,446][1652475] Updated weights for policy 0, policy_version 328080 (0.0024) [2024-06-15 15:49:43,469][1652475] Updated weights for policy 0, policy_version 328160 (0.0012) [2024-06-15 15:49:44,094][1652475] Updated weights for policy 0, policy_version 328192 (0.0013) [2024-06-15 15:49:45,738][1648984] Fps is (10 sec: 42598.2, 60 sec: 43690.7, 300 sec: 43098.2). Total num frames: 672137216. Throughput: 0: 10774.9. Samples: 168097792. Policy #0 lag: (min: 52.0, avg: 206.5, max: 307.0) [2024-06-15 15:49:45,742][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 15:49:47,311][1651340] Signal inference workers to stop experience collection... (16850 times) [2024-06-15 15:49:47,349][1652475] InferenceWorker_p0-w0: stopping experience collection (16850 times) [2024-06-15 15:49:47,513][1651340] Signal inference workers to resume experience collection... 
(16850 times) [2024-06-15 15:49:47,514][1652475] InferenceWorker_p0-w0: resuming experience collection (16850 times) [2024-06-15 15:49:47,714][1652475] Updated weights for policy 0, policy_version 328250 (0.0026) [2024-06-15 15:49:50,738][1648984] Fps is (10 sec: 39321.9, 60 sec: 43690.7, 300 sec: 42987.2). Total num frames: 672268288. Throughput: 0: 10831.6. Samples: 168132096. Policy #0 lag: (min: 52.0, avg: 206.5, max: 307.0) [2024-06-15 15:49:50,738][1648984] Avg episode reward: [(0, '-0.550')] [2024-06-15 15:49:54,167][1652475] Updated weights for policy 0, policy_version 328353 (0.0018) [2024-06-15 15:49:55,738][1648984] Fps is (10 sec: 45875.4, 60 sec: 42603.0, 300 sec: 43320.4). Total num frames: 672595968. Throughput: 0: 10877.2. Samples: 168200192. Policy #0 lag: (min: 52.0, avg: 206.5, max: 307.0) [2024-06-15 15:49:55,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 15:49:55,998][1652475] Updated weights for policy 0, policy_version 328439 (0.0013) [2024-06-15 15:49:58,672][1652475] Updated weights for policy 0, policy_version 328468 (0.0037) [2024-06-15 15:50:00,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 43691.2, 300 sec: 43098.3). Total num frames: 672792576. Throughput: 0: 10911.3. Samples: 168263168. Policy #0 lag: (min: 52.0, avg: 206.5, max: 307.0) [2024-06-15 15:50:00,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 15:50:03,725][1652475] Updated weights for policy 0, policy_version 328528 (0.0029) [2024-06-15 15:50:05,300][1652475] Updated weights for policy 0, policy_version 328598 (0.0013) [2024-06-15 15:50:05,738][1648984] Fps is (10 sec: 39321.1, 60 sec: 42598.3, 300 sec: 43320.4). Total num frames: 672989184. Throughput: 0: 11150.2. Samples: 168306688. 
Policy #0 lag: (min: 52.0, avg: 206.5, max: 307.0) [2024-06-15 15:50:05,739][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 15:50:07,640][1652475] Updated weights for policy 0, policy_version 328697 (0.0217) [2024-06-15 15:50:10,738][1648984] Fps is (10 sec: 45875.1, 60 sec: 44783.0, 300 sec: 43209.3). Total num frames: 673251328. Throughput: 0: 10922.7. Samples: 168363520. Policy #0 lag: (min: 52.0, avg: 206.5, max: 307.0) [2024-06-15 15:50:10,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 15:50:11,066][1652475] Updated weights for policy 0, policy_version 328768 (0.0013) [2024-06-15 15:50:15,751][1648984] Fps is (10 sec: 35998.2, 60 sec: 42589.1, 300 sec: 42763.1). Total num frames: 673349632. Throughput: 0: 11181.1. Samples: 168439808. Policy #0 lag: (min: 52.0, avg: 206.5, max: 307.0) [2024-06-15 15:50:15,751][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 15:50:16,414][1652475] Updated weights for policy 0, policy_version 328832 (0.0013) [2024-06-15 15:50:19,360][1652475] Updated weights for policy 0, policy_version 328928 (0.0013) [2024-06-15 15:50:20,741][1648984] Fps is (10 sec: 45858.7, 60 sec: 43688.0, 300 sec: 43764.2). Total num frames: 673710080. Throughput: 0: 10990.1. Samples: 168464384. Policy #0 lag: (min: 52.0, avg: 206.5, max: 307.0) [2024-06-15 15:50:20,742][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 15:50:21,703][1652475] Updated weights for policy 0, policy_version 329010 (0.0014) [2024-06-15 15:50:25,738][1648984] Fps is (10 sec: 49215.9, 60 sec: 43690.7, 300 sec: 43098.2). Total num frames: 673841152. Throughput: 0: 10934.0. Samples: 168533504. 
Policy #0 lag: (min: 52.0, avg: 206.5, max: 307.0) [2024-06-15 15:50:25,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 15:50:28,046][1652475] Updated weights for policy 0, policy_version 329060 (0.0023) [2024-06-15 15:50:29,411][1652475] Updated weights for policy 0, policy_version 329104 (0.0012) [2024-06-15 15:50:30,738][1648984] Fps is (10 sec: 39335.1, 60 sec: 43144.4, 300 sec: 43542.6). Total num frames: 674103296. Throughput: 0: 11138.8. Samples: 168599040. Policy #0 lag: (min: 52.0, avg: 206.5, max: 307.0) [2024-06-15 15:50:30,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 15:50:30,993][1651340] Signal inference workers to stop experience collection... (16900 times) [2024-06-15 15:50:31,014][1652475] Updated weights for policy 0, policy_version 329172 (0.0014) [2024-06-15 15:50:31,034][1652475] InferenceWorker_p0-w0: stopping experience collection (16900 times) [2024-06-15 15:50:31,125][1651340] Signal inference workers to resume experience collection... (16900 times) [2024-06-15 15:50:31,125][1652475] InferenceWorker_p0-w0: resuming experience collection (16900 times) [2024-06-15 15:50:32,536][1652475] Updated weights for policy 0, policy_version 329248 (0.0013) [2024-06-15 15:50:35,738][1648984] Fps is (10 sec: 52429.1, 60 sec: 44236.8, 300 sec: 43542.6). Total num frames: 674365440. Throughput: 0: 10956.8. Samples: 168625152. Policy #0 lag: (min: 52.0, avg: 206.5, max: 307.0) [2024-06-15 15:50:35,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 15:50:39,662][1652475] Updated weights for policy 0, policy_version 329328 (0.0016) [2024-06-15 15:50:40,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 43690.6, 300 sec: 43098.2). Total num frames: 674496512. Throughput: 0: 11059.2. Samples: 168697856. 
Policy #0 lag: (min: 52.0, avg: 206.5, max: 307.0) [2024-06-15 15:50:40,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 15:50:42,031][1652475] Updated weights for policy 0, policy_version 329392 (0.0012) [2024-06-15 15:50:43,476][1652475] Updated weights for policy 0, policy_version 329462 (0.0032) [2024-06-15 15:50:45,519][1652475] Updated weights for policy 0, policy_version 329507 (0.0089) [2024-06-15 15:50:45,738][1648984] Fps is (10 sec: 45875.4, 60 sec: 44783.0, 300 sec: 43764.7). Total num frames: 674824192. Throughput: 0: 11082.0. Samples: 168761856. Policy #0 lag: (min: 52.0, avg: 206.5, max: 307.0) [2024-06-15 15:50:45,738][1648984] Avg episode reward: [(0, '-0.570')] [2024-06-15 15:50:50,489][1652475] Updated weights for policy 0, policy_version 329554 (0.0014) [2024-06-15 15:50:50,738][1648984] Fps is (10 sec: 45875.8, 60 sec: 44782.9, 300 sec: 42876.1). Total num frames: 674955264. Throughput: 0: 10911.3. Samples: 168797696. Policy #0 lag: (min: 6.0, avg: 84.8, max: 262.0) [2024-06-15 15:50:50,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 15:50:52,679][1652475] Updated weights for policy 0, policy_version 329602 (0.0012) [2024-06-15 15:50:55,101][1652475] Updated weights for policy 0, policy_version 329712 (0.0014) [2024-06-15 15:50:55,741][1648984] Fps is (10 sec: 45860.6, 60 sec: 44780.6, 300 sec: 43986.4). Total num frames: 675282944. Throughput: 0: 11024.3. Samples: 168859648. Policy #0 lag: (min: 6.0, avg: 84.8, max: 262.0) [2024-06-15 15:50:55,742][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 15:50:55,751][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000329728_675282944.pth... 
[2024-06-15 15:50:55,816][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000324608_664797184.pth [2024-06-15 15:50:55,821][1651340] Saving a milestone train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/milestones/checkpoint_000329728_675282944.pth [2024-06-15 15:50:58,962][1652475] Updated weights for policy 0, policy_version 329749 (0.0013) [2024-06-15 15:50:59,822][1652475] Updated weights for policy 0, policy_version 329790 (0.0017) [2024-06-15 15:51:00,738][1648984] Fps is (10 sec: 45875.2, 60 sec: 43690.7, 300 sec: 43543.5). Total num frames: 675414016. Throughput: 0: 10880.3. Samples: 168929280. Policy #0 lag: (min: 6.0, avg: 84.8, max: 262.0) [2024-06-15 15:51:00,740][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 15:51:03,132][1652475] Updated weights for policy 0, policy_version 329850 (0.0014) [2024-06-15 15:51:05,562][1652475] Updated weights for policy 0, policy_version 329936 (0.0014) [2024-06-15 15:51:05,738][1648984] Fps is (10 sec: 42611.6, 60 sec: 45329.1, 300 sec: 43653.6). Total num frames: 675708928. Throughput: 0: 11094.2. Samples: 168963584. Policy #0 lag: (min: 6.0, avg: 84.8, max: 262.0) [2024-06-15 15:51:05,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 15:51:10,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 43144.5, 300 sec: 43431.5). Total num frames: 675840000. Throughput: 0: 10956.8. Samples: 169026560. Policy #0 lag: (min: 6.0, avg: 84.8, max: 262.0) [2024-06-15 15:51:10,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 15:51:11,468][1652475] Updated weights for policy 0, policy_version 330039 (0.0119) [2024-06-15 15:51:15,738][1648984] Fps is (10 sec: 36045.2, 60 sec: 45339.0, 300 sec: 43542.6). Total num frames: 676069376. Throughput: 0: 11013.7. Samples: 169094656. 
Policy #0 lag: (min: 6.0, avg: 84.8, max: 262.0) [2024-06-15 15:51:15,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 15:51:15,761][1652475] Updated weights for policy 0, policy_version 330112 (0.0016) [2024-06-15 15:51:17,549][1651340] Signal inference workers to stop experience collection... (16950 times) [2024-06-15 15:51:17,609][1652475] InferenceWorker_p0-w0: stopping experience collection (16950 times) [2024-06-15 15:51:17,790][1651340] Signal inference workers to resume experience collection... (16950 times) [2024-06-15 15:51:17,791][1652475] InferenceWorker_p0-w0: resuming experience collection (16950 times) [2024-06-15 15:51:17,953][1652475] Updated weights for policy 0, policy_version 330199 (0.0208) [2024-06-15 15:51:20,738][1648984] Fps is (10 sec: 49152.1, 60 sec: 43693.3, 300 sec: 43542.6). Total num frames: 676331520. Throughput: 0: 11013.7. Samples: 169120768. Policy #0 lag: (min: 6.0, avg: 84.8, max: 262.0) [2024-06-15 15:51:20,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 15:51:22,521][1652475] Updated weights for policy 0, policy_version 330261 (0.0012) [2024-06-15 15:51:25,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 43690.7, 300 sec: 43209.3). Total num frames: 676462592. Throughput: 0: 10934.1. Samples: 169189888. Policy #0 lag: (min: 6.0, avg: 84.8, max: 262.0) [2024-06-15 15:51:25,740][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 15:51:26,704][1652475] Updated weights for policy 0, policy_version 330310 (0.0014) [2024-06-15 15:51:27,834][1652475] Updated weights for policy 0, policy_version 330367 (0.0013) [2024-06-15 15:51:29,870][1652475] Updated weights for policy 0, policy_version 330448 (0.0021) [2024-06-15 15:51:30,745][1648984] Fps is (10 sec: 49118.0, 60 sec: 45324.0, 300 sec: 43874.8). Total num frames: 676823040. Throughput: 0: 10932.4. Samples: 169253888. 
Policy #0 lag: (min: 6.0, avg: 84.8, max: 262.0) [2024-06-15 15:51:30,745][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 15:51:33,748][1652475] Updated weights for policy 0, policy_version 330497 (0.0015) [2024-06-15 15:51:35,117][1652475] Updated weights for policy 0, policy_version 330551 (0.0014) [2024-06-15 15:51:35,738][1648984] Fps is (10 sec: 52429.1, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 676986880. Throughput: 0: 11047.8. Samples: 169294848. Policy #0 lag: (min: 6.0, avg: 84.8, max: 262.0) [2024-06-15 15:51:35,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 15:51:39,483][1652475] Updated weights for policy 0, policy_version 330594 (0.0031) [2024-06-15 15:51:40,739][1648984] Fps is (10 sec: 36063.9, 60 sec: 44781.8, 300 sec: 43320.2). Total num frames: 677183488. Throughput: 0: 11127.9. Samples: 169360384. Policy #0 lag: (min: 6.0, avg: 84.8, max: 262.0) [2024-06-15 15:51:40,740][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 15:51:40,887][1652475] Updated weights for policy 0, policy_version 330661 (0.0014) [2024-06-15 15:51:42,732][1652475] Updated weights for policy 0, policy_version 330745 (0.0011) [2024-06-15 15:51:45,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 42598.4, 300 sec: 43431.5). Total num frames: 677380096. Throughput: 0: 10968.2. Samples: 169422848. Policy #0 lag: (min: 6.0, avg: 84.8, max: 262.0) [2024-06-15 15:51:45,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 15:51:47,181][1652475] Updated weights for policy 0, policy_version 330807 (0.0014) [2024-06-15 15:51:50,738][1648984] Fps is (10 sec: 32773.4, 60 sec: 42598.4, 300 sec: 43098.3). Total num frames: 677511168. Throughput: 0: 10843.0. Samples: 169451520. 
Policy #0 lag: (min: 6.0, avg: 84.8, max: 262.0) [2024-06-15 15:51:50,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 15:51:52,037][1652475] Updated weights for policy 0, policy_version 330864 (0.0021) [2024-06-15 15:51:53,841][1652475] Updated weights for policy 0, policy_version 330944 (0.0013) [2024-06-15 15:51:55,438][1652475] Updated weights for policy 0, policy_version 331006 (0.0016) [2024-06-15 15:51:55,738][1648984] Fps is (10 sec: 52428.1, 60 sec: 43692.9, 300 sec: 43875.8). Total num frames: 677904384. Throughput: 0: 10786.1. Samples: 169511936. Policy #0 lag: (min: 6.0, avg: 84.8, max: 262.0) [2024-06-15 15:51:55,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 15:51:58,698][1652475] Updated weights for policy 0, policy_version 331071 (0.0011) [2024-06-15 15:52:00,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 43690.7, 300 sec: 43098.3). Total num frames: 678035456. Throughput: 0: 10740.6. Samples: 169577984. Policy #0 lag: (min: 6.0, avg: 84.8, max: 262.0) [2024-06-15 15:52:00,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 15:52:05,754][1648984] Fps is (10 sec: 29442.8, 60 sec: 41494.8, 300 sec: 43206.9). Total num frames: 678199296. Throughput: 0: 10975.5. Samples: 169614848. Policy #0 lag: (min: 6.0, avg: 84.8, max: 262.0) [2024-06-15 15:52:05,755][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 15:52:05,898][1651340] Signal inference workers to stop experience collection... (17000 times) [2024-06-15 15:52:05,926][1652475] InferenceWorker_p0-w0: stopping experience collection (17000 times) [2024-06-15 15:52:05,930][1652475] Updated weights for policy 0, policy_version 331154 (0.0116) [2024-06-15 15:52:06,201][1651340] Signal inference workers to resume experience collection... 
(17000 times) [2024-06-15 15:52:06,202][1652475] InferenceWorker_p0-w0: resuming experience collection (17000 times) [2024-06-15 15:52:07,636][1652475] Updated weights for policy 0, policy_version 331218 (0.0014) [2024-06-15 15:52:08,797][1652475] Updated weights for policy 0, policy_version 331264 (0.0011) [2024-06-15 15:52:10,738][1648984] Fps is (10 sec: 45875.1, 60 sec: 44236.8, 300 sec: 43431.5). Total num frames: 678494208. Throughput: 0: 10672.4. Samples: 169670144. Policy #0 lag: (min: 31.0, avg: 130.0, max: 287.0) [2024-06-15 15:52:10,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 15:52:11,134][1652475] Updated weights for policy 0, policy_version 331324 (0.0014) [2024-06-15 15:52:15,738][1648984] Fps is (10 sec: 39386.8, 60 sec: 42052.3, 300 sec: 42876.1). Total num frames: 678592512. Throughput: 0: 10765.0. Samples: 169738240. Policy #0 lag: (min: 31.0, avg: 130.0, max: 287.0) [2024-06-15 15:52:15,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 15:52:15,979][1652475] Updated weights for policy 0, policy_version 331365 (0.0012) [2024-06-15 15:52:18,282][1652475] Updated weights for policy 0, policy_version 331424 (0.0018) [2024-06-15 15:52:19,982][1652475] Updated weights for policy 0, policy_version 331461 (0.0020) [2024-06-15 15:52:20,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 42598.4, 300 sec: 43653.6). Total num frames: 678887424. Throughput: 0: 10615.5. Samples: 169772544. Policy #0 lag: (min: 31.0, avg: 130.0, max: 287.0) [2024-06-15 15:52:20,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 15:52:22,341][1652475] Updated weights for policy 0, policy_version 331552 (0.0152) [2024-06-15 15:52:25,738][1648984] Fps is (10 sec: 49151.3, 60 sec: 43690.6, 300 sec: 43098.4). Total num frames: 679084032. Throughput: 0: 10467.9. Samples: 169831424. 
Policy #0 lag: (min: 31.0, avg: 130.0, max: 287.0) [2024-06-15 15:52:25,739][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 15:52:26,632][1652475] Updated weights for policy 0, policy_version 331585 (0.0013) [2024-06-15 15:52:29,747][1652475] Updated weights for policy 0, policy_version 331664 (0.0012) [2024-06-15 15:52:30,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 41510.9, 300 sec: 43431.5). Total num frames: 679313408. Throughput: 0: 10661.0. Samples: 169902592. Policy #0 lag: (min: 31.0, avg: 130.0, max: 287.0) [2024-06-15 15:52:30,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 15:52:30,938][1652475] Updated weights for policy 0, policy_version 331709 (0.0011) [2024-06-15 15:52:33,437][1652475] Updated weights for policy 0, policy_version 331760 (0.0013) [2024-06-15 15:52:35,271][1652475] Updated weights for policy 0, policy_version 331839 (0.0020) [2024-06-15 15:52:35,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 43690.6, 300 sec: 43764.7). Total num frames: 679608320. Throughput: 0: 10786.1. Samples: 169936896. Policy #0 lag: (min: 31.0, avg: 130.0, max: 287.0) [2024-06-15 15:52:35,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 15:52:40,717][1652475] Updated weights for policy 0, policy_version 331897 (0.0034) [2024-06-15 15:52:40,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 42053.4, 300 sec: 42987.2). Total num frames: 679706624. Throughput: 0: 10831.7. Samples: 169999360. Policy #0 lag: (min: 31.0, avg: 130.0, max: 287.0) [2024-06-15 15:52:40,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 15:52:42,803][1652475] Updated weights for policy 0, policy_version 331965 (0.0013) [2024-06-15 15:52:45,738][1648984] Fps is (10 sec: 36043.9, 60 sec: 43144.3, 300 sec: 43875.8). Total num frames: 679968768. Throughput: 0: 10717.8. Samples: 170060288. 
Policy #0 lag: (min: 31.0, avg: 130.0, max: 287.0) [2024-06-15 15:52:45,739][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 15:52:46,908][1652475] Updated weights for policy 0, policy_version 332064 (0.0014) [2024-06-15 15:52:50,738][1648984] Fps is (10 sec: 42598.1, 60 sec: 43690.6, 300 sec: 43320.4). Total num frames: 680132608. Throughput: 0: 10414.5. Samples: 170083328. Policy #0 lag: (min: 31.0, avg: 130.0, max: 287.0) [2024-06-15 15:52:50,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 15:52:54,208][1651340] Signal inference workers to stop experience collection... (17050 times) [2024-06-15 15:52:54,242][1652475] Updated weights for policy 0, policy_version 332130 (0.0012) [2024-06-15 15:52:54,255][1652475] InferenceWorker_p0-w0: stopping experience collection (17050 times) [2024-06-15 15:52:54,512][1651340] Signal inference workers to resume experience collection... (17050 times) [2024-06-15 15:52:54,513][1652475] InferenceWorker_p0-w0: resuming experience collection (17050 times) [2024-06-15 15:52:55,738][1648984] Fps is (10 sec: 32767.7, 60 sec: 39867.5, 300 sec: 43209.3). Total num frames: 680296448. Throughput: 0: 10717.8. Samples: 170152448. Policy #0 lag: (min: 31.0, avg: 130.0, max: 287.0) [2024-06-15 15:52:55,739][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 15:52:56,058][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000332208_680361984.pth... [2024-06-15 15:52:56,118][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000327168_670040064.pth [2024-06-15 15:52:56,356][1652475] Updated weights for policy 0, policy_version 332224 (0.0013) [2024-06-15 15:53:00,434][1652475] Updated weights for policy 0, policy_version 332336 (0.0015) [2024-06-15 15:53:00,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 43690.6, 300 sec: 43875.8). Total num frames: 680656896. Throughput: 0: 10240.0. Samples: 170199040. 
Policy #0 lag: (min: 31.0, avg: 130.0, max: 287.0) [2024-06-15 15:53:00,752][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 15:53:05,738][1648984] Fps is (10 sec: 36046.1, 60 sec: 40971.3, 300 sec: 42653.9). Total num frames: 680656896. Throughput: 0: 10353.8. Samples: 170238464. Policy #0 lag: (min: 31.0, avg: 130.0, max: 287.0) [2024-06-15 15:53:05,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 15:53:07,136][1652475] Updated weights for policy 0, policy_version 332415 (0.0015) [2024-06-15 15:53:10,408][1652475] Updated weights for policy 0, policy_version 332485 (0.0014) [2024-06-15 15:53:10,738][1648984] Fps is (10 sec: 29491.0, 60 sec: 40959.9, 300 sec: 43653.7). Total num frames: 680951808. Throughput: 0: 10376.5. Samples: 170298368. Policy #0 lag: (min: 31.0, avg: 130.0, max: 287.0) [2024-06-15 15:53:10,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 15:53:12,221][1652475] Updated weights for policy 0, policy_version 332550 (0.0145) [2024-06-15 15:53:13,492][1652475] Updated weights for policy 0, policy_version 332608 (0.0023) [2024-06-15 15:53:15,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 43144.5, 300 sec: 43098.3). Total num frames: 681181184. Throughput: 0: 10217.2. Samples: 170362368. Policy #0 lag: (min: 31.0, avg: 130.0, max: 287.0) [2024-06-15 15:53:15,742][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 15:53:19,463][1652475] Updated weights for policy 0, policy_version 332672 (0.0014) [2024-06-15 15:53:20,738][1648984] Fps is (10 sec: 42598.2, 60 sec: 41506.0, 300 sec: 43320.4). Total num frames: 681377792. Throughput: 0: 10331.0. Samples: 170401792. 
Policy #0 lag: (min: 31.0, avg: 130.0, max: 287.0) [2024-06-15 15:53:20,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 15:53:21,073][1652475] Updated weights for policy 0, policy_version 332731 (0.0013) [2024-06-15 15:53:23,340][1652475] Updated weights for policy 0, policy_version 332787 (0.0016) [2024-06-15 15:53:25,295][1652475] Updated weights for policy 0, policy_version 332833 (0.0012) [2024-06-15 15:53:25,738][1648984] Fps is (10 sec: 49152.0, 60 sec: 43144.6, 300 sec: 43653.6). Total num frames: 681672704. Throughput: 0: 10331.0. Samples: 170464256. Policy #0 lag: (min: 15.0, avg: 135.3, max: 271.0) [2024-06-15 15:53:25,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 15:53:29,450][1652475] Updated weights for policy 0, policy_version 332883 (0.0027) [2024-06-15 15:53:30,415][1652475] Updated weights for policy 0, policy_version 332928 (0.0060) [2024-06-15 15:53:30,738][1648984] Fps is (10 sec: 45875.8, 60 sec: 42052.2, 300 sec: 43320.4). Total num frames: 681836544. Throughput: 0: 10558.6. Samples: 170535424. Policy #0 lag: (min: 15.0, avg: 135.3, max: 271.0) [2024-06-15 15:53:30,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 15:53:32,187][1652475] Updated weights for policy 0, policy_version 332976 (0.0013) [2024-06-15 15:53:33,596][1652475] Updated weights for policy 0, policy_version 333024 (0.0012) [2024-06-15 15:53:35,738][1648984] Fps is (10 sec: 42598.6, 60 sec: 41506.2, 300 sec: 43653.7). Total num frames: 682098688. Throughput: 0: 10797.5. Samples: 170569216. Policy #0 lag: (min: 15.0, avg: 135.3, max: 271.0) [2024-06-15 15:53:35,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 15:53:36,361][1652475] Updated weights for policy 0, policy_version 333076 (0.0012) [2024-06-15 15:53:40,739][1648984] Fps is (10 sec: 39320.9, 60 sec: 42052.1, 300 sec: 43098.2). Total num frames: 682229760. Throughput: 0: 10831.7. Samples: 170639872. 
Policy #0 lag: (min: 15.0, avg: 135.3, max: 271.0) [2024-06-15 15:53:40,740][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 15:53:41,246][1651340] Signal inference workers to stop experience collection... (17100 times) [2024-06-15 15:53:41,289][1652475] InferenceWorker_p0-w0: stopping experience collection (17100 times) [2024-06-15 15:53:41,291][1652475] Updated weights for policy 0, policy_version 333138 (0.0012) [2024-06-15 15:53:41,494][1651340] Signal inference workers to resume experience collection... (17100 times) [2024-06-15 15:53:41,495][1652475] InferenceWorker_p0-w0: resuming experience collection (17100 times) [2024-06-15 15:53:43,416][1652475] Updated weights for policy 0, policy_version 333239 (0.0019) [2024-06-15 15:53:45,403][1652475] Updated weights for policy 0, policy_version 333280 (0.0015) [2024-06-15 15:53:45,738][1648984] Fps is (10 sec: 45874.9, 60 sec: 43144.7, 300 sec: 43764.7). Total num frames: 682557440. Throughput: 0: 11229.9. Samples: 170704384. Policy #0 lag: (min: 15.0, avg: 135.3, max: 271.0) [2024-06-15 15:53:45,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 15:53:47,834][1652475] Updated weights for policy 0, policy_version 333315 (0.0010) [2024-06-15 15:53:49,170][1652475] Updated weights for policy 0, policy_version 333373 (0.0016) [2024-06-15 15:53:50,742][1648984] Fps is (10 sec: 52405.7, 60 sec: 43687.3, 300 sec: 43098.5). Total num frames: 682754048. Throughput: 0: 11137.7. Samples: 170739712. Policy #0 lag: (min: 15.0, avg: 135.3, max: 271.0) [2024-06-15 15:53:50,743][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 15:53:54,234][1652475] Updated weights for policy 0, policy_version 333456 (0.0014) [2024-06-15 15:53:55,738][1648984] Fps is (10 sec: 45875.4, 60 sec: 45329.4, 300 sec: 43542.7). Total num frames: 683016192. Throughput: 0: 11286.8. Samples: 170806272. 
Policy #0 lag: (min: 15.0, avg: 135.3, max: 271.0) [2024-06-15 15:53:55,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 15:53:57,024][1652475] Updated weights for policy 0, policy_version 333520 (0.0100) [2024-06-15 15:53:59,205][1652475] Updated weights for policy 0, policy_version 333572 (0.0012) [2024-06-15 15:54:00,486][1652475] Updated weights for policy 0, policy_version 333630 (0.0013) [2024-06-15 15:54:00,738][1648984] Fps is (10 sec: 52452.9, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 683278336. Throughput: 0: 11241.2. Samples: 170868224. Policy #0 lag: (min: 15.0, avg: 135.3, max: 271.0) [2024-06-15 15:54:00,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 15:54:05,738][1648984] Fps is (10 sec: 36045.1, 60 sec: 45329.1, 300 sec: 43431.5). Total num frames: 683376640. Throughput: 0: 11229.9. Samples: 170907136. Policy #0 lag: (min: 15.0, avg: 135.3, max: 271.0) [2024-06-15 15:54:05,738][1648984] Avg episode reward: [(0, '-0.710')] [2024-06-15 15:54:07,034][1652475] Updated weights for policy 0, policy_version 333699 (0.0024) [2024-06-15 15:54:09,410][1652475] Updated weights for policy 0, policy_version 333794 (0.0014) [2024-06-15 15:54:10,738][1648984] Fps is (10 sec: 39320.4, 60 sec: 45328.9, 300 sec: 43653.6). Total num frames: 683671552. Throughput: 0: 11093.3. Samples: 170963456. Policy #0 lag: (min: 15.0, avg: 135.3, max: 271.0) [2024-06-15 15:54:10,739][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 15:54:11,605][1652475] Updated weights for policy 0, policy_version 333856 (0.0019) [2024-06-15 15:54:15,738][1648984] Fps is (10 sec: 42598.2, 60 sec: 43690.7, 300 sec: 43098.3). Total num frames: 683802624. Throughput: 0: 11036.5. Samples: 171032064. 
Policy #0 lag: (min: 15.0, avg: 135.3, max: 271.0) [2024-06-15 15:54:15,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 15:54:17,077][1652475] Updated weights for policy 0, policy_version 333920 (0.0018) [2024-06-15 15:54:20,216][1652475] Updated weights for policy 0, policy_version 334000 (0.0014) [2024-06-15 15:54:20,738][1648984] Fps is (10 sec: 39323.1, 60 sec: 44783.1, 300 sec: 43542.6). Total num frames: 684064768. Throughput: 0: 11070.6. Samples: 171067392. Policy #0 lag: (min: 15.0, avg: 135.3, max: 271.0) [2024-06-15 15:54:20,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 15:54:21,921][1652475] Updated weights for policy 0, policy_version 334032 (0.0011) [2024-06-15 15:54:23,593][1652475] Updated weights for policy 0, policy_version 334100 (0.0013) [2024-06-15 15:54:23,962][1651340] Signal inference workers to stop experience collection... (17150 times) [2024-06-15 15:54:24,061][1652475] InferenceWorker_p0-w0: stopping experience collection (17150 times) [2024-06-15 15:54:24,294][1651340] Signal inference workers to resume experience collection... (17150 times) [2024-06-15 15:54:24,295][1652475] InferenceWorker_p0-w0: resuming experience collection (17150 times) [2024-06-15 15:54:24,692][1652475] Updated weights for policy 0, policy_version 334143 (0.0019) [2024-06-15 15:54:25,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 44236.8, 300 sec: 43431.5). Total num frames: 684326912. Throughput: 0: 10797.6. Samples: 171125760. Policy #0 lag: (min: 15.0, avg: 135.3, max: 271.0) [2024-06-15 15:54:25,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 15:54:29,644][1652475] Updated weights for policy 0, policy_version 334208 (0.0112) [2024-06-15 15:54:30,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 43690.7, 300 sec: 43209.3). Total num frames: 684457984. Throughput: 0: 10877.2. Samples: 171193856. 
Policy #0 lag: (min: 15.0, avg: 135.3, max: 271.0) [2024-06-15 15:54:30,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 15:54:34,215][1652475] Updated weights for policy 0, policy_version 334288 (0.0014) [2024-06-15 15:54:35,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 684720128. Throughput: 0: 10923.8. Samples: 171231232. Policy #0 lag: (min: 15.0, avg: 135.3, max: 271.0) [2024-06-15 15:54:35,738][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 15:54:36,530][1652475] Updated weights for policy 0, policy_version 334384 (0.0014) [2024-06-15 15:54:40,738][1648984] Fps is (10 sec: 42598.6, 60 sec: 44237.0, 300 sec: 43209.3). Total num frames: 684883968. Throughput: 0: 10638.2. Samples: 171284992. Policy #0 lag: (min: 15.0, avg: 123.4, max: 271.0) [2024-06-15 15:54:40,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 15:54:41,127][1652475] Updated weights for policy 0, policy_version 334441 (0.0013) [2024-06-15 15:54:44,840][1652475] Updated weights for policy 0, policy_version 334480 (0.0016) [2024-06-15 15:54:45,738][1648984] Fps is (10 sec: 36044.6, 60 sec: 42052.3, 300 sec: 43431.5). Total num frames: 685080576. Throughput: 0: 10945.4. Samples: 171360768. Policy #0 lag: (min: 15.0, avg: 123.4, max: 271.0) [2024-06-15 15:54:45,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 15:54:46,347][1652475] Updated weights for policy 0, policy_version 334531 (0.0013) [2024-06-15 15:54:47,860][1652475] Updated weights for policy 0, policy_version 334593 (0.0035) [2024-06-15 15:54:49,092][1652475] Updated weights for policy 0, policy_version 334647 (0.0012) [2024-06-15 15:54:50,738][1648984] Fps is (10 sec: 49151.7, 60 sec: 43694.0, 300 sec: 43320.4). Total num frames: 685375488. Throughput: 0: 10626.8. Samples: 171385344. 
Policy #0 lag: (min: 15.0, avg: 123.4, max: 271.0) [2024-06-15 15:54:50,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 15:54:51,893][1652475] Updated weights for policy 0, policy_version 334688 (0.0016) [2024-06-15 15:54:55,770][1648984] Fps is (10 sec: 42459.9, 60 sec: 41483.6, 300 sec: 43093.5). Total num frames: 685506560. Throughput: 0: 11028.5. Samples: 171460096. Policy #0 lag: (min: 15.0, avg: 123.4, max: 271.0) [2024-06-15 15:54:55,771][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 15:54:55,776][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000334720_685506560.pth... [2024-06-15 15:54:55,891][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000329728_675282944.pth [2024-06-15 15:54:56,864][1652475] Updated weights for policy 0, policy_version 334756 (0.0013) [2024-06-15 15:54:58,961][1652475] Updated weights for policy 0, policy_version 334835 (0.0109) [2024-06-15 15:55:00,522][1652475] Updated weights for policy 0, policy_version 334896 (0.0015) [2024-06-15 15:55:00,738][1648984] Fps is (10 sec: 49151.6, 60 sec: 43144.5, 300 sec: 43653.6). Total num frames: 685867008. Throughput: 0: 10729.2. Samples: 171514880. Policy #0 lag: (min: 15.0, avg: 123.4, max: 271.0) [2024-06-15 15:55:00,739][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 15:55:05,145][1652475] Updated weights for policy 0, policy_version 334928 (0.0105) [2024-06-15 15:55:05,738][1648984] Fps is (10 sec: 46025.6, 60 sec: 43144.5, 300 sec: 43098.3). Total num frames: 685965312. Throughput: 0: 10843.0. Samples: 171555328. Policy #0 lag: (min: 15.0, avg: 123.4, max: 271.0) [2024-06-15 15:55:05,738][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 15:55:08,499][1652475] Updated weights for policy 0, policy_version 334996 (0.0012) [2024-06-15 15:55:09,633][1651340] Signal inference workers to stop experience collection... 
(17200 times) [2024-06-15 15:55:09,696][1652475] InferenceWorker_p0-w0: stopping experience collection (17200 times) [2024-06-15 15:55:09,830][1651340] Signal inference workers to resume experience collection... (17200 times) [2024-06-15 15:55:09,842][1652475] InferenceWorker_p0-w0: resuming experience collection (17200 times) [2024-06-15 15:55:09,992][1652475] Updated weights for policy 0, policy_version 335057 (0.0014) [2024-06-15 15:55:10,738][1648984] Fps is (10 sec: 36045.1, 60 sec: 42598.6, 300 sec: 43655.6). Total num frames: 686227456. Throughput: 0: 10922.7. Samples: 171617280. Policy #0 lag: (min: 15.0, avg: 123.4, max: 271.0) [2024-06-15 15:55:10,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 15:55:11,807][1652475] Updated weights for policy 0, policy_version 335120 (0.0013) [2024-06-15 15:55:13,065][1652475] Updated weights for policy 0, policy_version 335168 (0.0012) [2024-06-15 15:55:15,738][1648984] Fps is (10 sec: 45874.9, 60 sec: 43690.6, 300 sec: 43098.8). Total num frames: 686424064. Throughput: 0: 10717.9. Samples: 171676160. Policy #0 lag: (min: 15.0, avg: 123.4, max: 271.0) [2024-06-15 15:55:15,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 15:55:19,965][1652475] Updated weights for policy 0, policy_version 335232 (0.0184) [2024-06-15 15:55:20,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 42598.3, 300 sec: 43320.4). Total num frames: 686620672. Throughput: 0: 10683.7. Samples: 171712000. Policy #0 lag: (min: 15.0, avg: 123.4, max: 271.0) [2024-06-15 15:55:20,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 15:55:21,397][1652475] Updated weights for policy 0, policy_version 335290 (0.0012) [2024-06-15 15:55:23,936][1652475] Updated weights for policy 0, policy_version 335344 (0.0017) [2024-06-15 15:55:25,229][1652475] Updated weights for policy 0, policy_version 335397 (0.0015) [2024-06-15 15:55:25,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 43690.6, 300 sec: 43542.6). Total num frames: 686948352. 
Throughput: 0: 10752.0. Samples: 171768832. Policy #0 lag: (min: 15.0, avg: 123.4, max: 271.0) [2024-06-15 15:55:25,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 15:55:30,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 42598.3, 300 sec: 42876.1). Total num frames: 687013888. Throughput: 0: 10661.0. Samples: 171840512. Policy #0 lag: (min: 15.0, avg: 123.4, max: 271.0) [2024-06-15 15:55:30,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 15:55:31,157][1652475] Updated weights for policy 0, policy_version 335472 (0.0022) [2024-06-15 15:55:33,199][1652475] Updated weights for policy 0, policy_version 335541 (0.0015) [2024-06-15 15:55:34,618][1652475] Updated weights for policy 0, policy_version 335569 (0.0013) [2024-06-15 15:55:35,738][1648984] Fps is (10 sec: 39321.9, 60 sec: 43690.6, 300 sec: 43542.6). Total num frames: 687341568. Throughput: 0: 10649.6. Samples: 171864576. Policy #0 lag: (min: 15.0, avg: 123.4, max: 271.0) [2024-06-15 15:55:35,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 15:55:36,766][1652475] Updated weights for policy 0, policy_version 335638 (0.0012) [2024-06-15 15:55:40,738][1648984] Fps is (10 sec: 45875.4, 60 sec: 43144.5, 300 sec: 42876.1). Total num frames: 687472640. Throughput: 0: 10623.2. Samples: 171937792. Policy #0 lag: (min: 15.0, avg: 123.4, max: 271.0) [2024-06-15 15:55:40,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 15:55:42,328][1652475] Updated weights for policy 0, policy_version 335697 (0.0042) [2024-06-15 15:55:43,821][1652475] Updated weights for policy 0, policy_version 335760 (0.0014) [2024-06-15 15:55:45,161][1652475] Updated weights for policy 0, policy_version 335808 (0.0089) [2024-06-15 15:55:45,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 44236.8, 300 sec: 43320.4). Total num frames: 687734784. Throughput: 0: 10774.8. Samples: 171999744. 
Policy #0 lag: (min: 15.0, avg: 123.4, max: 271.0) [2024-06-15 15:55:45,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 15:55:48,507][1652475] Updated weights for policy 0, policy_version 335873 (0.0014) [2024-06-15 15:55:49,786][1652475] Updated weights for policy 0, policy_version 335934 (0.0052) [2024-06-15 15:55:50,738][1648984] Fps is (10 sec: 52426.2, 60 sec: 43690.3, 300 sec: 43098.6). Total num frames: 687996928. Throughput: 0: 10695.0. Samples: 172036608. Policy #0 lag: (min: 15.0, avg: 123.4, max: 271.0) [2024-06-15 15:55:50,739][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 15:55:55,735][1652475] Updated weights for policy 0, policy_version 335988 (0.0015) [2024-06-15 15:55:55,738][1648984] Fps is (10 sec: 36044.6, 60 sec: 43168.0, 300 sec: 42987.2). Total num frames: 688095232. Throughput: 0: 10945.4. Samples: 172109824. Policy #0 lag: (min: 15.0, avg: 123.4, max: 271.0) [2024-06-15 15:55:55,746][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 15:55:56,415][1651340] Signal inference workers to stop experience collection... (17250 times) [2024-06-15 15:55:56,476][1652475] InferenceWorker_p0-w0: stopping experience collection (17250 times) [2024-06-15 15:55:56,691][1651340] Signal inference workers to resume experience collection... (17250 times) [2024-06-15 15:55:56,692][1652475] InferenceWorker_p0-w0: resuming experience collection (17250 times) [2024-06-15 15:55:58,493][1652475] Updated weights for policy 0, policy_version 336080 (0.0020) [2024-06-15 15:56:00,738][1648984] Fps is (10 sec: 39323.7, 60 sec: 42052.3, 300 sec: 42987.2). Total num frames: 688390144. Throughput: 0: 10763.4. Samples: 172160512. Policy #0 lag: (min: 93.0, avg: 149.5, max: 295.0) [2024-06-15 15:56:00,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 15:56:02,002][1652475] Updated weights for policy 0, policy_version 336188 (0.0092) [2024-06-15 15:56:05,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 42598.3, 300 sec: 42987.2). 
Total num frames: 688521216. Throughput: 0: 10706.5. Samples: 172193792. Policy #0 lag: (min: 93.0, avg: 149.5, max: 295.0) [2024-06-15 15:56:05,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 15:56:08,203][1652475] Updated weights for policy 0, policy_version 336247 (0.0015) [2024-06-15 15:56:09,558][1652475] Updated weights for policy 0, policy_version 336316 (0.0013) [2024-06-15 15:56:10,738][1648984] Fps is (10 sec: 42597.3, 60 sec: 43144.4, 300 sec: 43209.3). Total num frames: 688816128. Throughput: 0: 10945.4. Samples: 172261376. Policy #0 lag: (min: 93.0, avg: 149.5, max: 295.0) [2024-06-15 15:56:10,739][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 15:56:11,191][1652475] Updated weights for policy 0, policy_version 336368 (0.0111) [2024-06-15 15:56:13,142][1652475] Updated weights for policy 0, policy_version 336416 (0.0013) [2024-06-15 15:56:15,738][1648984] Fps is (10 sec: 52429.3, 60 sec: 43690.7, 300 sec: 43098.3). Total num frames: 689045504. Throughput: 0: 10831.7. Samples: 172327936. Policy #0 lag: (min: 93.0, avg: 149.5, max: 295.0) [2024-06-15 15:56:15,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 15:56:19,309][1652475] Updated weights for policy 0, policy_version 336465 (0.0013) [2024-06-15 15:56:20,738][1648984] Fps is (10 sec: 36046.0, 60 sec: 42598.5, 300 sec: 43098.3). Total num frames: 689176576. Throughput: 0: 11127.5. Samples: 172365312. Policy #0 lag: (min: 93.0, avg: 149.5, max: 295.0) [2024-06-15 15:56:20,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 15:56:20,943][1652475] Updated weights for policy 0, policy_version 336530 (0.0012) [2024-06-15 15:56:22,454][1652475] Updated weights for policy 0, policy_version 336594 (0.0140) [2024-06-15 15:56:25,268][1652475] Updated weights for policy 0, policy_version 336672 (0.0017) [2024-06-15 15:56:25,738][1648984] Fps is (10 sec: 49152.1, 60 sec: 43144.6, 300 sec: 43099.3). Total num frames: 689537024. Throughput: 0: 10706.5. 
Samples: 172419584. Policy #0 lag: (min: 93.0, avg: 149.5, max: 295.0) [2024-06-15 15:56:25,738][1648984] Avg episode reward: [(0, '-0.570')] [2024-06-15 15:56:30,738][1648984] Fps is (10 sec: 39321.3, 60 sec: 42598.5, 300 sec: 42653.9). Total num frames: 689569792. Throughput: 0: 10831.6. Samples: 172487168. Policy #0 lag: (min: 93.0, avg: 149.5, max: 295.0) [2024-06-15 15:56:30,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 15:56:31,304][1652475] Updated weights for policy 0, policy_version 336722 (0.0018) [2024-06-15 15:56:33,846][1652475] Updated weights for policy 0, policy_version 336784 (0.0014) [2024-06-15 15:56:35,219][1652475] Updated weights for policy 0, policy_version 336836 (0.0012) [2024-06-15 15:56:35,738][1648984] Fps is (10 sec: 32767.8, 60 sec: 42052.3, 300 sec: 42987.4). Total num frames: 689864704. Throughput: 0: 10763.5. Samples: 172520960. Policy #0 lag: (min: 93.0, avg: 149.5, max: 295.0) [2024-06-15 15:56:35,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 15:56:37,018][1652475] Updated weights for policy 0, policy_version 336899 (0.0013) [2024-06-15 15:56:40,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 43690.7, 300 sec: 43098.2). Total num frames: 690094080. Throughput: 0: 10308.3. Samples: 172573696. Policy #0 lag: (min: 93.0, avg: 149.5, max: 295.0) [2024-06-15 15:56:40,740][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 15:56:43,927][1651340] Signal inference workers to stop experience collection... (17300 times) [2024-06-15 15:56:43,958][1652475] InferenceWorker_p0-w0: stopping experience collection (17300 times) [2024-06-15 15:56:44,224][1651340] Signal inference workers to resume experience collection... 
(17300 times) [2024-06-15 15:56:44,226][1652475] InferenceWorker_p0-w0: resuming experience collection (17300 times) [2024-06-15 15:56:44,228][1652475] Updated weights for policy 0, policy_version 336992 (0.0015) [2024-06-15 15:56:45,738][1648984] Fps is (10 sec: 36044.7, 60 sec: 41506.1, 300 sec: 43098.2). Total num frames: 690225152. Throughput: 0: 10706.5. Samples: 172642304. Policy #0 lag: (min: 93.0, avg: 149.5, max: 295.0) [2024-06-15 15:56:45,738][1648984] Avg episode reward: [(0, '-0.520')] [2024-06-15 15:56:47,307][1652475] Updated weights for policy 0, policy_version 337072 (0.0028) [2024-06-15 15:56:49,223][1652475] Updated weights for policy 0, policy_version 337145 (0.0012) [2024-06-15 15:56:50,738][1648984] Fps is (10 sec: 49152.3, 60 sec: 43145.0, 300 sec: 42987.2). Total num frames: 690585600. Throughput: 0: 10547.2. Samples: 172668416. Policy #0 lag: (min: 93.0, avg: 149.5, max: 295.0) [2024-06-15 15:56:50,738][1648984] Avg episode reward: [(0, '-0.590')] [2024-06-15 15:56:50,802][1652475] Updated weights for policy 0, policy_version 337208 (0.0146) [2024-06-15 15:56:55,738][1648984] Fps is (10 sec: 39321.2, 60 sec: 42052.2, 300 sec: 42653.9). Total num frames: 690618368. Throughput: 0: 10467.6. Samples: 172732416. Policy #0 lag: (min: 93.0, avg: 149.5, max: 295.0) [2024-06-15 15:56:55,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 15:56:55,744][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000337216_690618368.pth... [2024-06-15 15:56:55,977][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000332208_680361984.pth [2024-06-15 15:56:57,109][1652475] Updated weights for policy 0, policy_version 337272 (0.0010) [2024-06-15 15:56:59,836][1652475] Updated weights for policy 0, policy_version 337316 (0.0026) [2024-06-15 15:57:00,738][1648984] Fps is (10 sec: 29491.1, 60 sec: 41506.1, 300 sec: 42989.6). Total num frames: 690880512. Throughput: 0: 10399.3. 
Samples: 172795904. Policy #0 lag: (min: 93.0, avg: 149.5, max: 295.0) [2024-06-15 15:57:00,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 15:57:01,175][1652475] Updated weights for policy 0, policy_version 337351 (0.0011) [2024-06-15 15:57:03,267][1652475] Updated weights for policy 0, policy_version 337456 (0.0013) [2024-06-15 15:57:05,738][1648984] Fps is (10 sec: 52429.6, 60 sec: 43690.7, 300 sec: 42876.1). Total num frames: 691142656. Throughput: 0: 10217.2. Samples: 172825088. Policy #0 lag: (min: 93.0, avg: 149.5, max: 295.0) [2024-06-15 15:57:05,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 15:57:07,659][1652475] Updated weights for policy 0, policy_version 337490 (0.0013) [2024-06-15 15:57:10,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 40960.2, 300 sec: 42987.2). Total num frames: 691273728. Throughput: 0: 10524.4. Samples: 172893184. Policy #0 lag: (min: 93.0, avg: 149.5, max: 295.0) [2024-06-15 15:57:10,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 15:57:10,984][1652475] Updated weights for policy 0, policy_version 337542 (0.0013) [2024-06-15 15:57:12,120][1652475] Updated weights for policy 0, policy_version 337594 (0.0013) [2024-06-15 15:57:13,595][1652475] Updated weights for policy 0, policy_version 337653 (0.0015) [2024-06-15 15:57:15,150][1652475] Updated weights for policy 0, policy_version 337699 (0.0012) [2024-06-15 15:57:15,738][1648984] Fps is (10 sec: 49152.0, 60 sec: 43144.5, 300 sec: 43209.3). Total num frames: 691634176. Throughput: 0: 10501.7. Samples: 172959744. Policy #0 lag: (min: 13.0, avg: 124.1, max: 269.0) [2024-06-15 15:57:15,783][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 15:57:20,283][1652475] Updated weights for policy 0, policy_version 337765 (0.0015) [2024-06-15 15:57:20,738][1648984] Fps is (10 sec: 49151.7, 60 sec: 43144.5, 300 sec: 42987.2). Total num frames: 691765248. Throughput: 0: 10569.9. Samples: 172996608. 
Policy #0 lag: (min: 13.0, avg: 124.1, max: 269.0) [2024-06-15 15:57:20,738][1648984] Avg episode reward: [(0, '-0.580')] [2024-06-15 15:57:22,910][1652475] Updated weights for policy 0, policy_version 337824 (0.0017) [2024-06-15 15:57:24,596][1652475] Updated weights for policy 0, policy_version 337872 (0.0014) [2024-06-15 15:57:25,739][1648984] Fps is (10 sec: 42594.9, 60 sec: 42051.7, 300 sec: 43209.2). Total num frames: 692060160. Throughput: 0: 10820.1. Samples: 173060608. Policy #0 lag: (min: 13.0, avg: 124.1, max: 269.0) [2024-06-15 15:57:25,740][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 15:57:26,330][1652475] Updated weights for policy 0, policy_version 337936 (0.0013) [2024-06-15 15:57:30,738][1648984] Fps is (10 sec: 42598.6, 60 sec: 43690.7, 300 sec: 42653.9). Total num frames: 692191232. Throughput: 0: 10683.7. Samples: 173123072. Policy #0 lag: (min: 13.0, avg: 124.1, max: 269.0) [2024-06-15 15:57:30,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 15:57:33,468][1651340] Signal inference workers to stop experience collection... (17350 times) [2024-06-15 15:57:33,539][1652475] InferenceWorker_p0-w0: stopping experience collection (17350 times) [2024-06-15 15:57:33,567][1652475] Updated weights for policy 0, policy_version 337988 (0.0015) [2024-06-15 15:57:33,806][1651340] Signal inference workers to resume experience collection... (17350 times) [2024-06-15 15:57:33,807][1652475] InferenceWorker_p0-w0: resuming experience collection (17350 times) [2024-06-15 15:57:35,387][1652475] Updated weights for policy 0, policy_version 338066 (0.0013) [2024-06-15 15:57:35,738][1648984] Fps is (10 sec: 32768.7, 60 sec: 42051.8, 300 sec: 42987.1). Total num frames: 692387840. Throughput: 0: 10956.6. Samples: 173161472. 
Policy #0 lag: (min: 13.0, avg: 124.1, max: 269.0) [2024-06-15 15:57:35,739][1648984] Avg episode reward: [(0, '-0.530')] [2024-06-15 15:57:37,156][1652475] Updated weights for policy 0, policy_version 338144 (0.0016) [2024-06-15 15:57:39,515][1652475] Updated weights for policy 0, policy_version 338233 (0.0091) [2024-06-15 15:57:40,738][1648984] Fps is (10 sec: 52428.4, 60 sec: 43690.6, 300 sec: 43209.4). Total num frames: 692715520. Throughput: 0: 10638.2. Samples: 173211136. Policy #0 lag: (min: 13.0, avg: 124.1, max: 269.0) [2024-06-15 15:57:40,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 15:57:45,738][1648984] Fps is (10 sec: 32770.1, 60 sec: 41506.2, 300 sec: 42653.9). Total num frames: 692715520. Throughput: 0: 10877.2. Samples: 173285376. Policy #0 lag: (min: 13.0, avg: 124.1, max: 269.0) [2024-06-15 15:57:45,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 15:57:46,555][1652475] Updated weights for policy 0, policy_version 338276 (0.0013) [2024-06-15 15:57:48,256][1652475] Updated weights for policy 0, policy_version 338362 (0.0024) [2024-06-15 15:57:50,603][1652475] Updated weights for policy 0, policy_version 338432 (0.0092) [2024-06-15 15:57:50,739][1648984] Fps is (10 sec: 39320.9, 60 sec: 42052.1, 300 sec: 43431.5). Total num frames: 693108736. Throughput: 0: 10820.2. Samples: 173312000. Policy #0 lag: (min: 13.0, avg: 124.1, max: 269.0) [2024-06-15 15:57:50,740][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 15:57:52,719][1652475] Updated weights for policy 0, policy_version 338489 (0.0013) [2024-06-15 15:57:55,738][1648984] Fps is (10 sec: 52428.2, 60 sec: 43690.7, 300 sec: 42653.9). Total num frames: 693239808. Throughput: 0: 10604.1. Samples: 173370368. 
Policy #0 lag: (min: 13.0, avg: 124.1, max: 269.0) [2024-06-15 15:57:55,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 15:57:58,956][1652475] Updated weights for policy 0, policy_version 338576 (0.0012) [2024-06-15 15:58:00,738][1648984] Fps is (10 sec: 39322.8, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 693501952. Throughput: 0: 10683.7. Samples: 173440512. Policy #0 lag: (min: 13.0, avg: 124.1, max: 269.0) [2024-06-15 15:58:00,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 15:58:01,742][1652475] Updated weights for policy 0, policy_version 338640 (0.0077) [2024-06-15 15:58:02,695][1652475] Updated weights for policy 0, policy_version 338680 (0.0011) [2024-06-15 15:58:04,482][1652475] Updated weights for policy 0, policy_version 338736 (0.0012) [2024-06-15 15:58:05,738][1648984] Fps is (10 sec: 52429.4, 60 sec: 43690.7, 300 sec: 43431.5). Total num frames: 693764096. Throughput: 0: 10638.2. Samples: 173475328. Policy #0 lag: (min: 13.0, avg: 124.1, max: 269.0) [2024-06-15 15:58:05,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 15:58:09,585][1652475] Updated weights for policy 0, policy_version 338784 (0.0013) [2024-06-15 15:58:10,738][1648984] Fps is (10 sec: 42598.0, 60 sec: 44236.7, 300 sec: 43209.3). Total num frames: 693927936. Throughput: 0: 10866.0. Samples: 173549568. Policy #0 lag: (min: 13.0, avg: 124.1, max: 269.0) [2024-06-15 15:58:10,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 15:58:11,022][1652475] Updated weights for policy 0, policy_version 338854 (0.0021) [2024-06-15 15:58:13,017][1652475] Updated weights for policy 0, policy_version 338896 (0.0013) [2024-06-15 15:58:14,796][1651340] Signal inference workers to stop experience collection... (17400 times) [2024-06-15 15:58:14,886][1652475] InferenceWorker_p0-w0: stopping experience collection (17400 times) [2024-06-15 15:58:15,059][1651340] Signal inference workers to resume experience collection... 
(17400 times) [2024-06-15 15:58:15,061][1652475] InferenceWorker_p0-w0: resuming experience collection (17400 times) [2024-06-15 15:58:15,063][1652475] Updated weights for policy 0, policy_version 338960 (0.0013) [2024-06-15 15:58:15,743][1648984] Fps is (10 sec: 45852.6, 60 sec: 43141.0, 300 sec: 43541.9). Total num frames: 694222848. Throughput: 0: 10876.0. Samples: 173612544. Policy #0 lag: (min: 13.0, avg: 124.1, max: 269.0) [2024-06-15 15:58:15,743][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 15:58:20,738][1648984] Fps is (10 sec: 36045.2, 60 sec: 42052.3, 300 sec: 42765.0). Total num frames: 694288384. Throughput: 0: 10683.9. Samples: 173642240. Policy #0 lag: (min: 13.0, avg: 124.1, max: 269.0) [2024-06-15 15:58:20,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 15:58:20,867][1652475] Updated weights for policy 0, policy_version 339024 (0.0018) [2024-06-15 15:58:22,825][1652475] Updated weights for policy 0, policy_version 339104 (0.0017) [2024-06-15 15:58:25,107][1652475] Updated weights for policy 0, policy_version 339168 (0.0012) [2024-06-15 15:58:25,738][1648984] Fps is (10 sec: 42619.2, 60 sec: 43145.1, 300 sec: 43431.5). Total num frames: 694648832. Throughput: 0: 11184.4. Samples: 173714432. Policy #0 lag: (min: 13.0, avg: 124.1, max: 269.0) [2024-06-15 15:58:25,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 15:58:26,866][1652475] Updated weights for policy 0, policy_version 339219 (0.0018) [2024-06-15 15:58:30,738][1648984] Fps is (10 sec: 52428.3, 60 sec: 43690.6, 300 sec: 43098.2). Total num frames: 694812672. Throughput: 0: 11013.7. Samples: 173780992. 
Policy #0 lag: (min: 13.0, avg: 124.1, max: 269.0) [2024-06-15 15:58:30,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 15:58:32,611][1652475] Updated weights for policy 0, policy_version 339283 (0.0013) [2024-06-15 15:58:33,584][1652475] Updated weights for policy 0, policy_version 339333 (0.0012) [2024-06-15 15:58:35,738][1648984] Fps is (10 sec: 42598.6, 60 sec: 44783.4, 300 sec: 43542.6). Total num frames: 695074816. Throughput: 0: 11298.2. Samples: 173820416. Policy #0 lag: (min: 10.0, avg: 85.2, max: 266.0) [2024-06-15 15:58:35,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 15:58:35,865][1652475] Updated weights for policy 0, policy_version 339393 (0.0013) [2024-06-15 15:58:37,599][1652475] Updated weights for policy 0, policy_version 339459 (0.0014) [2024-06-15 15:58:40,739][1648984] Fps is (10 sec: 52428.4, 60 sec: 43690.6, 300 sec: 43320.4). Total num frames: 695336960. Throughput: 0: 11286.7. Samples: 173878272. Policy #0 lag: (min: 10.0, avg: 85.2, max: 266.0) [2024-06-15 15:58:40,740][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 15:58:44,095][1652475] Updated weights for policy 0, policy_version 339541 (0.0012) [2024-06-15 15:58:45,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 45875.2, 300 sec: 43098.9). Total num frames: 695468032. Throughput: 0: 11241.2. Samples: 173946368. Policy #0 lag: (min: 10.0, avg: 85.2, max: 266.0) [2024-06-15 15:58:45,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 15:58:48,450][1652475] Updated weights for policy 0, policy_version 339632 (0.0013) [2024-06-15 15:58:50,729][1652475] Updated weights for policy 0, policy_version 339714 (0.0011) [2024-06-15 15:58:50,738][1648984] Fps is (10 sec: 39322.3, 60 sec: 43690.9, 300 sec: 43098.3). Total num frames: 695730176. Throughput: 0: 11195.7. Samples: 173979136. 
Policy #0 lag: (min: 10.0, avg: 85.2, max: 266.0) [2024-06-15 15:58:50,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 15:58:52,180][1652475] Updated weights for policy 0, policy_version 339775 (0.0016) [2024-06-15 15:58:55,738][1648984] Fps is (10 sec: 42598.1, 60 sec: 44236.9, 300 sec: 42765.0). Total num frames: 695894016. Throughput: 0: 10820.3. Samples: 174036480. Policy #0 lag: (min: 10.0, avg: 85.2, max: 266.0) [2024-06-15 15:58:55,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 15:58:56,177][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000339824_695959552.pth... [2024-06-15 15:58:56,237][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000334720_685506560.pth [2024-06-15 15:58:56,428][1652475] Updated weights for policy 0, policy_version 339836 (0.0013) [2024-06-15 15:59:00,740][1648984] Fps is (10 sec: 29483.7, 60 sec: 42050.5, 300 sec: 42875.7). Total num frames: 696025088. Throughput: 0: 11071.2. Samples: 174110720. Policy #0 lag: (min: 10.0, avg: 85.2, max: 266.0) [2024-06-15 15:59:00,741][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 15:59:01,477][1652475] Updated weights for policy 0, policy_version 339900 (0.0013) [2024-06-15 15:59:01,667][1651340] Signal inference workers to stop experience collection... (17450 times) [2024-06-15 15:59:01,741][1652475] InferenceWorker_p0-w0: stopping experience collection (17450 times) [2024-06-15 15:59:01,918][1651340] Signal inference workers to resume experience collection... (17450 times) [2024-06-15 15:59:01,919][1652475] InferenceWorker_p0-w0: resuming experience collection (17450 times) [2024-06-15 15:59:03,285][1652475] Updated weights for policy 0, policy_version 339970 (0.0013) [2024-06-15 15:59:04,729][1652475] Updated weights for policy 0, policy_version 340031 (0.0015) [2024-06-15 15:59:05,738][1648984] Fps is (10 sec: 49152.2, 60 sec: 43690.7, 300 sec: 43098.3). 
Total num frames: 696385536. Throughput: 0: 10911.3. Samples: 174133248. Policy #0 lag: (min: 10.0, avg: 85.2, max: 266.0) [2024-06-15 15:59:05,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 15:59:10,738][1648984] Fps is (10 sec: 49164.2, 60 sec: 43144.6, 300 sec: 43098.2). Total num frames: 696516608. Throughput: 0: 10615.5. Samples: 174192128. Policy #0 lag: (min: 10.0, avg: 85.2, max: 266.0) [2024-06-15 15:59:10,738][1648984] Avg episode reward: [(0, '-0.550')] [2024-06-15 15:59:13,578][1652475] Updated weights for policy 0, policy_version 340112 (0.0015) [2024-06-15 15:59:15,738][1648984] Fps is (10 sec: 29491.3, 60 sec: 40963.4, 300 sec: 42765.0). Total num frames: 696680448. Throughput: 0: 10683.8. Samples: 174261760. Policy #0 lag: (min: 10.0, avg: 85.2, max: 266.0) [2024-06-15 15:59:15,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 15:59:16,599][1652475] Updated weights for policy 0, policy_version 340224 (0.0091) [2024-06-15 15:59:20,157][1652475] Updated weights for policy 0, policy_version 340304 (0.0055) [2024-06-15 15:59:20,738][1648984] Fps is (10 sec: 45875.4, 60 sec: 44782.9, 300 sec: 42876.1). Total num frames: 696975360. Throughput: 0: 10228.6. Samples: 174280704. Policy #0 lag: (min: 10.0, avg: 85.2, max: 266.0) [2024-06-15 15:59:20,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 15:59:25,738][1648984] Fps is (10 sec: 36044.3, 60 sec: 39867.7, 300 sec: 42653.9). Total num frames: 697040896. Throughput: 0: 10570.0. Samples: 174353920. 
Policy #0 lag: (min: 10.0, avg: 85.2, max: 266.0) [2024-06-15 15:59:25,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 15:59:25,798][1652475] Updated weights for policy 0, policy_version 340355 (0.0014) [2024-06-15 15:59:27,562][1652475] Updated weights for policy 0, policy_version 340433 (0.0012) [2024-06-15 15:59:29,437][1652475] Updated weights for policy 0, policy_version 340512 (0.0032) [2024-06-15 15:59:30,738][1648984] Fps is (10 sec: 45875.1, 60 sec: 43690.7, 300 sec: 43098.2). Total num frames: 697434112. Throughput: 0: 10319.6. Samples: 174410752. Policy #0 lag: (min: 10.0, avg: 85.2, max: 266.0) [2024-06-15 15:59:30,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 15:59:32,599][1652475] Updated weights for policy 0, policy_version 340604 (0.0013) [2024-06-15 15:59:35,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 41506.1, 300 sec: 42987.2). Total num frames: 697565184. Throughput: 0: 10342.4. Samples: 174444544. Policy #0 lag: (min: 10.0, avg: 85.2, max: 266.0) [2024-06-15 15:59:35,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 15:59:38,773][1652475] Updated weights for policy 0, policy_version 340668 (0.0095) [2024-06-15 15:59:40,256][1652475] Updated weights for policy 0, policy_version 340733 (0.0014) [2024-06-15 15:59:40,740][1648984] Fps is (10 sec: 39313.0, 60 sec: 41504.7, 300 sec: 43209.0). Total num frames: 697827328. Throughput: 0: 10592.2. Samples: 174513152. Policy #0 lag: (min: 10.0, avg: 85.2, max: 266.0) [2024-06-15 15:59:40,741][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 15:59:45,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 41506.0, 300 sec: 42653.9). Total num frames: 697958400. Throughput: 0: 10274.7. Samples: 174573056. 
Policy #0 lag: (min: 10.0, avg: 85.2, max: 266.0) [2024-06-15 15:59:45,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 15:59:46,085][1652475] Updated weights for policy 0, policy_version 340802 (0.0089) [2024-06-15 15:59:47,318][1652475] Updated weights for policy 0, policy_version 340864 (0.0013) [2024-06-15 15:59:48,650][1651340] Signal inference workers to stop experience collection... (17500 times) [2024-06-15 15:59:48,690][1652475] InferenceWorker_p0-w0: stopping experience collection (17500 times) [2024-06-15 15:59:48,908][1651340] Signal inference workers to resume experience collection... (17500 times) [2024-06-15 15:59:48,909][1652475] InferenceWorker_p0-w0: resuming experience collection (17500 times) [2024-06-15 15:59:49,789][1652475] Updated weights for policy 0, policy_version 340920 (0.0015) [2024-06-15 15:59:50,739][1648984] Fps is (10 sec: 39326.0, 60 sec: 41505.4, 300 sec: 43102.9). Total num frames: 698220544. Throughput: 0: 10524.2. Samples: 174606848. Policy #0 lag: (min: 15.0, avg: 111.3, max: 271.0) [2024-06-15 15:59:50,739][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 15:59:51,802][1652475] Updated weights for policy 0, policy_version 340990 (0.0012) [2024-06-15 15:59:55,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 43144.5, 300 sec: 42765.0). Total num frames: 698482688. Throughput: 0: 10569.9. Samples: 174667776. Policy #0 lag: (min: 15.0, avg: 111.3, max: 271.0) [2024-06-15 15:59:55,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 15:59:58,981][1652475] Updated weights for policy 0, policy_version 341072 (0.0013) [2024-06-15 16:00:00,738][1648984] Fps is (10 sec: 39325.6, 60 sec: 43146.3, 300 sec: 42876.1). Total num frames: 698613760. Throughput: 0: 10535.8. Samples: 174735872. 
Policy #0 lag: (min: 15.0, avg: 111.3, max: 271.0) [2024-06-15 16:00:00,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 16:00:01,943][1652475] Updated weights for policy 0, policy_version 341124 (0.0014) [2024-06-15 16:00:03,908][1652475] Updated weights for policy 0, policy_version 341203 (0.0121) [2024-06-15 16:00:05,738][1648984] Fps is (10 sec: 45874.1, 60 sec: 42598.1, 300 sec: 43098.2). Total num frames: 698941440. Throughput: 0: 10865.7. Samples: 174769664. Policy #0 lag: (min: 15.0, avg: 111.3, max: 271.0) [2024-06-15 16:00:05,739][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 16:00:05,850][1652475] Updated weights for policy 0, policy_version 341285 (0.0022) [2024-06-15 16:00:10,209][1652475] Updated weights for policy 0, policy_version 341315 (0.0014) [2024-06-15 16:00:10,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 42052.2, 300 sec: 42765.0). Total num frames: 699039744. Throughput: 0: 10661.0. Samples: 174833664. Policy #0 lag: (min: 15.0, avg: 111.3, max: 271.0) [2024-06-15 16:00:10,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 16:00:14,171][1652475] Updated weights for policy 0, policy_version 341396 (0.0014) [2024-06-15 16:00:15,434][1652475] Updated weights for policy 0, policy_version 341444 (0.0013) [2024-06-15 16:00:15,738][1648984] Fps is (10 sec: 32769.0, 60 sec: 43144.4, 300 sec: 42876.1). Total num frames: 699269120. Throughput: 0: 10968.2. Samples: 174904320. Policy #0 lag: (min: 15.0, avg: 111.3, max: 271.0) [2024-06-15 16:00:15,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 16:00:18,600][1652475] Updated weights for policy 0, policy_version 341562 (0.0015) [2024-06-15 16:00:20,738][1648984] Fps is (10 sec: 49151.2, 60 sec: 42598.2, 300 sec: 42653.9). Total num frames: 699531264. Throughput: 0: 10626.8. Samples: 174922752. 
Policy #0 lag: (min: 15.0, avg: 111.3, max: 271.0) [2024-06-15 16:00:20,739][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 16:00:22,578][1652475] Updated weights for policy 0, policy_version 341616 (0.0013) [2024-06-15 16:00:25,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 43690.7, 300 sec: 42876.1). Total num frames: 699662336. Throughput: 0: 10741.1. Samples: 174996480. Policy #0 lag: (min: 15.0, avg: 111.3, max: 271.0) [2024-06-15 16:00:25,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 16:00:27,021][1652475] Updated weights for policy 0, policy_version 341664 (0.0013) [2024-06-15 16:00:29,376][1652475] Updated weights for policy 0, policy_version 341763 (0.0094) [2024-06-15 16:00:30,738][1648984] Fps is (10 sec: 49153.0, 60 sec: 43144.5, 300 sec: 42987.2). Total num frames: 700022784. Throughput: 0: 10615.5. Samples: 175050752. Policy #0 lag: (min: 15.0, avg: 111.3, max: 271.0) [2024-06-15 16:00:30,740][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 16:00:34,023][1651340] Signal inference workers to stop experience collection... (17550 times) [2024-06-15 16:00:34,079][1652475] InferenceWorker_p0-w0: stopping experience collection (17550 times) [2024-06-15 16:00:34,244][1651340] Signal inference workers to resume experience collection... (17550 times) [2024-06-15 16:00:34,246][1652475] InferenceWorker_p0-w0: resuming experience collection (17550 times) [2024-06-15 16:00:34,247][1652475] Updated weights for policy 0, policy_version 341840 (0.0013) [2024-06-15 16:00:35,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 43690.7, 300 sec: 43098.3). Total num frames: 700186624. Throughput: 0: 10672.6. Samples: 175087104. Policy #0 lag: (min: 15.0, avg: 111.3, max: 271.0) [2024-06-15 16:00:35,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 16:00:39,455][1652475] Updated weights for policy 0, policy_version 341904 (0.0329) [2024-06-15 16:00:40,738][1648984] Fps is (10 sec: 29491.3, 60 sec: 41507.7, 300 sec: 42653.9). 
Total num frames: 700317696. Throughput: 0: 10991.0. Samples: 175162368. Policy #0 lag: (min: 15.0, avg: 111.3, max: 271.0) [2024-06-15 16:00:40,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 16:00:41,579][1652475] Updated weights for policy 0, policy_version 341988 (0.0111) [2024-06-15 16:00:43,368][1652475] Updated weights for policy 0, policy_version 342072 (0.0017) [2024-06-15 16:00:45,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 43690.7, 300 sec: 42654.0). Total num frames: 700579840. Throughput: 0: 10706.5. Samples: 175217664. Policy #0 lag: (min: 15.0, avg: 111.3, max: 271.0) [2024-06-15 16:00:45,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 16:00:47,468][1652475] Updated weights for policy 0, policy_version 342139 (0.0043) [2024-06-15 16:00:50,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 41506.9, 300 sec: 42765.0). Total num frames: 700710912. Throughput: 0: 10638.3. Samples: 175248384. Policy #0 lag: (min: 15.0, avg: 111.3, max: 271.0) [2024-06-15 16:00:50,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 16:00:52,116][1652475] Updated weights for policy 0, policy_version 342192 (0.0021) [2024-06-15 16:00:53,249][1652475] Updated weights for policy 0, policy_version 342229 (0.0012) [2024-06-15 16:00:55,495][1652475] Updated weights for policy 0, policy_version 342327 (0.0091) [2024-06-15 16:00:55,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 43690.7, 300 sec: 43098.2). Total num frames: 701104128. Throughput: 0: 10661.0. Samples: 175313408. Policy #0 lag: (min: 15.0, avg: 111.3, max: 271.0) [2024-06-15 16:00:55,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 16:00:55,745][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000342336_701104128.pth... 
[2024-06-15 16:00:55,808][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000337216_690618368.pth [2024-06-15 16:00:59,466][1652475] Updated weights for policy 0, policy_version 342393 (0.0014) [2024-06-15 16:01:00,738][1648984] Fps is (10 sec: 52428.5, 60 sec: 43690.6, 300 sec: 43098.2). Total num frames: 701235200. Throughput: 0: 10456.2. Samples: 175374848. Policy #0 lag: (min: 15.0, avg: 111.3, max: 271.0) [2024-06-15 16:01:00,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 16:01:05,115][1652475] Updated weights for policy 0, policy_version 342448 (0.0150) [2024-06-15 16:01:05,738][1648984] Fps is (10 sec: 26214.4, 60 sec: 40414.1, 300 sec: 42542.9). Total num frames: 701366272. Throughput: 0: 10888.6. Samples: 175412736. Policy #0 lag: (min: 63.0, avg: 108.1, max: 263.0) [2024-06-15 16:01:05,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 16:01:06,890][1652475] Updated weights for policy 0, policy_version 342514 (0.0150) [2024-06-15 16:01:08,283][1652475] Updated weights for policy 0, policy_version 342585 (0.0013) [2024-06-15 16:01:10,738][1648984] Fps is (10 sec: 42598.5, 60 sec: 43690.7, 300 sec: 42765.0). Total num frames: 701661184. Throughput: 0: 10535.8. Samples: 175470592. Policy #0 lag: (min: 63.0, avg: 108.1, max: 263.0) [2024-06-15 16:01:10,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 16:01:11,233][1652475] Updated weights for policy 0, policy_version 342640 (0.0018) [2024-06-15 16:01:15,738][1648984] Fps is (10 sec: 39321.9, 60 sec: 41506.2, 300 sec: 42653.9). Total num frames: 701759488. Throughput: 0: 10877.2. Samples: 175540224. Policy #0 lag: (min: 63.0, avg: 108.1, max: 263.0) [2024-06-15 16:01:15,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 16:01:16,993][1652475] Updated weights for policy 0, policy_version 342704 (0.0016) [2024-06-15 16:01:17,947][1651340] Signal inference workers to stop experience collection... 
(17600 times) [2024-06-15 16:01:17,977][1652475] InferenceWorker_p0-w0: stopping experience collection (17600 times) [2024-06-15 16:01:18,182][1651340] Signal inference workers to resume experience collection... (17600 times) [2024-06-15 16:01:18,183][1652475] InferenceWorker_p0-w0: resuming experience collection (17600 times) [2024-06-15 16:01:18,689][1652475] Updated weights for policy 0, policy_version 342775 (0.0013) [2024-06-15 16:01:20,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 42052.4, 300 sec: 42431.8). Total num frames: 702054400. Throughput: 0: 10638.2. Samples: 175565824. Policy #0 lag: (min: 63.0, avg: 108.1, max: 263.0) [2024-06-15 16:01:20,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 16:01:21,689][1652475] Updated weights for policy 0, policy_version 342846 (0.0015) [2024-06-15 16:01:23,847][1652475] Updated weights for policy 0, policy_version 342908 (0.0016) [2024-06-15 16:01:25,738][1648984] Fps is (10 sec: 52428.0, 60 sec: 43690.6, 300 sec: 43098.2). Total num frames: 702283776. Throughput: 0: 10228.6. Samples: 175622656. Policy #0 lag: (min: 63.0, avg: 108.1, max: 263.0) [2024-06-15 16:01:25,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 16:01:29,390][1652475] Updated weights for policy 0, policy_version 342960 (0.0018) [2024-06-15 16:01:30,724][1652475] Updated weights for policy 0, policy_version 343004 (0.0013) [2024-06-15 16:01:30,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 40413.9, 300 sec: 42653.9). Total num frames: 702447616. Throughput: 0: 10626.9. Samples: 175695872. Policy #0 lag: (min: 63.0, avg: 108.1, max: 263.0) [2024-06-15 16:01:30,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 16:01:33,001][1652475] Updated weights for policy 0, policy_version 343078 (0.0014) [2024-06-15 16:01:35,223][1652475] Updated weights for policy 0, policy_version 343158 (0.0080) [2024-06-15 16:01:35,738][1648984] Fps is (10 sec: 52429.5, 60 sec: 43690.7, 300 sec: 43098.3). Total num frames: 702808064. 
Throughput: 0: 10535.8. Samples: 175722496. Policy #0 lag: (min: 63.0, avg: 108.1, max: 263.0) [2024-06-15 16:01:35,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 16:01:40,738][1648984] Fps is (10 sec: 42598.2, 60 sec: 42598.4, 300 sec: 42876.1). Total num frames: 702873600. Throughput: 0: 10547.2. Samples: 175788032. Policy #0 lag: (min: 63.0, avg: 108.1, max: 263.0) [2024-06-15 16:01:40,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 16:01:40,973][1652475] Updated weights for policy 0, policy_version 343228 (0.0018) [2024-06-15 16:01:44,801][1652475] Updated weights for policy 0, policy_version 343280 (0.0012) [2024-06-15 16:01:45,738][1648984] Fps is (10 sec: 26214.0, 60 sec: 41506.1, 300 sec: 42320.7). Total num frames: 703070208. Throughput: 0: 10547.2. Samples: 175849472. Policy #0 lag: (min: 63.0, avg: 108.1, max: 263.0) [2024-06-15 16:01:45,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 16:01:46,481][1652475] Updated weights for policy 0, policy_version 343332 (0.0013) [2024-06-15 16:01:48,304][1652475] Updated weights for policy 0, policy_version 343424 (0.0085) [2024-06-15 16:01:50,738][1648984] Fps is (10 sec: 45875.6, 60 sec: 43690.7, 300 sec: 43098.3). Total num frames: 703332352. Throughput: 0: 10160.4. Samples: 175869952. Policy #0 lag: (min: 63.0, avg: 108.1, max: 263.0) [2024-06-15 16:01:50,738][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 16:01:53,704][1652475] Updated weights for policy 0, policy_version 343485 (0.0013) [2024-06-15 16:01:55,738][1648984] Fps is (10 sec: 39322.0, 60 sec: 39321.6, 300 sec: 42653.9). Total num frames: 703463424. Throughput: 0: 10524.5. Samples: 175944192. 
Policy #0 lag: (min: 63.0, avg: 108.1, max: 263.0) [2024-06-15 16:01:55,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 16:01:56,947][1652475] Updated weights for policy 0, policy_version 343550 (0.0013) [2024-06-15 16:01:59,763][1652475] Updated weights for policy 0, policy_version 343616 (0.0013) [2024-06-15 16:02:00,738][1648984] Fps is (10 sec: 45874.8, 60 sec: 42598.4, 300 sec: 42876.1). Total num frames: 703791104. Throughput: 0: 10376.5. Samples: 176007168. Policy #0 lag: (min: 63.0, avg: 108.1, max: 263.0) [2024-06-15 16:02:00,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 16:02:01,036][1652475] Updated weights for policy 0, policy_version 343673 (0.0013) [2024-06-15 16:02:05,206][1652475] Updated weights for policy 0, policy_version 343737 (0.0018) [2024-06-15 16:02:05,739][1648984] Fps is (10 sec: 52428.8, 60 sec: 43690.7, 300 sec: 43098.2). Total num frames: 703987712. Throughput: 0: 10547.2. Samples: 176040448. Policy #0 lag: (min: 63.0, avg: 108.1, max: 263.0) [2024-06-15 16:02:05,741][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 16:02:07,994][1651340] Signal inference workers to stop experience collection... (17650 times) [2024-06-15 16:02:08,040][1652475] InferenceWorker_p0-w0: stopping experience collection (17650 times) [2024-06-15 16:02:08,305][1651340] Signal inference workers to resume experience collection... (17650 times) [2024-06-15 16:02:08,306][1652475] InferenceWorker_p0-w0: resuming experience collection (17650 times) [2024-06-15 16:02:08,647][1652475] Updated weights for policy 0, policy_version 343792 (0.0014) [2024-06-15 16:02:10,738][1648984] Fps is (10 sec: 36045.0, 60 sec: 41506.2, 300 sec: 42431.8). Total num frames: 704151552. Throughput: 0: 10649.6. Samples: 176101888. 
Policy #0 lag: (min: 63.0, avg: 108.1, max: 263.0) [2024-06-15 16:02:10,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 16:02:11,138][1652475] Updated weights for policy 0, policy_version 343844 (0.0012) [2024-06-15 16:02:14,046][1652475] Updated weights for policy 0, policy_version 343920 (0.0081) [2024-06-15 16:02:15,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 43690.6, 300 sec: 42765.0). Total num frames: 704380928. Throughput: 0: 10433.4. Samples: 176165376. Policy #0 lag: (min: 63.0, avg: 108.1, max: 263.0) [2024-06-15 16:02:15,738][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 16:02:17,474][1652475] Updated weights for policy 0, policy_version 343970 (0.0013) [2024-06-15 16:02:19,769][1652475] Updated weights for policy 0, policy_version 344018 (0.0013) [2024-06-15 16:02:20,738][1648984] Fps is (10 sec: 45874.9, 60 sec: 42598.4, 300 sec: 42543.0). Total num frames: 704610304. Throughput: 0: 10513.0. Samples: 176195584. Policy #0 lag: (min: 63.0, avg: 108.1, max: 263.0) [2024-06-15 16:02:20,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 16:02:22,792][1652475] Updated weights for policy 0, policy_version 344080 (0.0018) [2024-06-15 16:02:25,748][1648984] Fps is (10 sec: 42554.6, 60 sec: 42045.1, 300 sec: 42763.5). Total num frames: 704806912. Throughput: 0: 10396.9. Samples: 176256000. Policy #0 lag: (min: 15.0, avg: 127.5, max: 271.0) [2024-06-15 16:02:25,749][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 16:02:26,290][1652475] Updated weights for policy 0, policy_version 344176 (0.0014) [2024-06-15 16:02:30,738][1648984] Fps is (10 sec: 29491.6, 60 sec: 40960.0, 300 sec: 42431.9). Total num frames: 704905216. Throughput: 0: 10604.1. Samples: 176326656. 
Policy #0 lag: (min: 15.0, avg: 127.5, max: 271.0) [2024-06-15 16:02:30,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 16:02:32,602][1652475] Updated weights for policy 0, policy_version 344272 (0.0014) [2024-06-15 16:02:35,738][1648984] Fps is (10 sec: 39362.4, 60 sec: 39867.7, 300 sec: 42320.7). Total num frames: 705200128. Throughput: 0: 10638.2. Samples: 176348672. Policy #0 lag: (min: 15.0, avg: 127.5, max: 271.0) [2024-06-15 16:02:35,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 16:02:35,971][1652475] Updated weights for policy 0, policy_version 344353 (0.0014) [2024-06-15 16:02:37,904][1652475] Updated weights for policy 0, policy_version 344400 (0.0012) [2024-06-15 16:02:40,738][1648984] Fps is (10 sec: 52427.2, 60 sec: 42598.3, 300 sec: 43098.2). Total num frames: 705429504. Throughput: 0: 10410.6. Samples: 176412672. Policy #0 lag: (min: 15.0, avg: 127.5, max: 271.0) [2024-06-15 16:02:40,739][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 16:02:43,501][1652475] Updated weights for policy 0, policy_version 344464 (0.0038) [2024-06-15 16:02:45,636][1652475] Updated weights for policy 0, policy_version 344548 (0.0018) [2024-06-15 16:02:45,738][1648984] Fps is (10 sec: 42598.1, 60 sec: 42598.5, 300 sec: 42431.8). Total num frames: 705626112. Throughput: 0: 10478.9. Samples: 176478720. Policy #0 lag: (min: 15.0, avg: 127.5, max: 271.0) [2024-06-15 16:02:45,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 16:02:48,349][1652475] Updated weights for policy 0, policy_version 344609 (0.0013) [2024-06-15 16:02:50,336][1652475] Updated weights for policy 0, policy_version 344672 (0.0014) [2024-06-15 16:02:50,738][1648984] Fps is (10 sec: 49152.6, 60 sec: 43144.4, 300 sec: 42987.2). Total num frames: 705921024. Throughput: 0: 10433.4. Samples: 176509952. 
Policy #0 lag: (min: 15.0, avg: 127.5, max: 271.0) [2024-06-15 16:02:50,739][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 16:02:55,786][1648984] Fps is (10 sec: 35870.8, 60 sec: 42018.3, 300 sec: 42313.7). Total num frames: 705986560. Throughput: 0: 10626.8. Samples: 176580608. Policy #0 lag: (min: 15.0, avg: 127.5, max: 271.0) [2024-06-15 16:02:55,787][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 16:02:56,012][1652475] Updated weights for policy 0, policy_version 344736 (0.0027) [2024-06-15 16:02:56,396][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000344752_706052096.pth... [2024-06-15 16:02:56,516][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000339824_695959552.pth [2024-06-15 16:02:56,856][1651340] Signal inference workers to stop experience collection... (17700 times) [2024-06-15 16:02:56,907][1652475] InferenceWorker_p0-w0: stopping experience collection (17700 times) [2024-06-15 16:02:57,082][1651340] Signal inference workers to resume experience collection... (17700 times) [2024-06-15 16:02:57,083][1652475] InferenceWorker_p0-w0: resuming experience collection (17700 times) [2024-06-15 16:02:58,122][1652475] Updated weights for policy 0, policy_version 344822 (0.0012) [2024-06-15 16:03:00,716][1652475] Updated weights for policy 0, policy_version 344889 (0.0093) [2024-06-15 16:03:00,738][1648984] Fps is (10 sec: 39322.0, 60 sec: 42052.3, 300 sec: 42542.9). Total num frames: 706314240. Throughput: 0: 10524.4. Samples: 176638976. Policy #0 lag: (min: 15.0, avg: 127.5, max: 271.0) [2024-06-15 16:03:00,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 16:03:03,127][1652475] Updated weights for policy 0, policy_version 344957 (0.0013) [2024-06-15 16:03:05,738][1648984] Fps is (10 sec: 49391.6, 60 sec: 41506.1, 300 sec: 42542.9). Total num frames: 706478080. Throughput: 0: 10490.3. Samples: 176667648. 
Policy #0 lag: (min: 15.0, avg: 127.5, max: 271.0) [2024-06-15 16:03:05,742][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 16:03:09,704][1652475] Updated weights for policy 0, policy_version 345028 (0.0015) [2024-06-15 16:03:10,738][1648984] Fps is (10 sec: 36044.8, 60 sec: 42052.3, 300 sec: 42210.3). Total num frames: 706674688. Throughput: 0: 10686.2. Samples: 176736768. Policy #0 lag: (min: 15.0, avg: 127.5, max: 271.0) [2024-06-15 16:03:10,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 16:03:12,546][1652475] Updated weights for policy 0, policy_version 345089 (0.0012) [2024-06-15 16:03:14,128][1652475] Updated weights for policy 0, policy_version 345153 (0.0031) [2024-06-15 16:03:15,738][1648984] Fps is (10 sec: 52427.8, 60 sec: 43690.5, 300 sec: 43098.2). Total num frames: 707002368. Throughput: 0: 10410.6. Samples: 176795136. Policy #0 lag: (min: 15.0, avg: 127.5, max: 271.0) [2024-06-15 16:03:15,739][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 16:03:20,322][1652475] Updated weights for policy 0, policy_version 345232 (0.0018) [2024-06-15 16:03:20,738][1648984] Fps is (10 sec: 36045.1, 60 sec: 40414.0, 300 sec: 41987.5). Total num frames: 707035136. Throughput: 0: 10774.8. Samples: 176833536. Policy #0 lag: (min: 15.0, avg: 127.5, max: 271.0) [2024-06-15 16:03:20,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 16:03:22,804][1652475] Updated weights for policy 0, policy_version 345328 (0.0014) [2024-06-15 16:03:25,121][1652475] Updated weights for policy 0, policy_version 345379 (0.0015) [2024-06-15 16:03:25,738][1648984] Fps is (10 sec: 39322.5, 60 sec: 43152.0, 300 sec: 42653.9). Total num frames: 707395584. Throughput: 0: 10717.9. Samples: 176894976. 
Policy #0 lag: (min: 15.0, avg: 127.5, max: 271.0) [2024-06-15 16:03:25,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 16:03:26,572][1652475] Updated weights for policy 0, policy_version 345440 (0.0013) [2024-06-15 16:03:30,738][1648984] Fps is (10 sec: 49151.3, 60 sec: 43690.6, 300 sec: 42209.6). Total num frames: 707526656. Throughput: 0: 10615.5. Samples: 176956416. Policy #0 lag: (min: 15.0, avg: 127.5, max: 271.0) [2024-06-15 16:03:30,739][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 16:03:33,762][1652475] Updated weights for policy 0, policy_version 345526 (0.0124) [2024-06-15 16:03:35,738][1648984] Fps is (10 sec: 26213.6, 60 sec: 40959.8, 300 sec: 41765.3). Total num frames: 707657728. Throughput: 0: 10672.3. Samples: 176990208. Policy #0 lag: (min: 15.0, avg: 127.5, max: 271.0) [2024-06-15 16:03:35,739][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 16:03:36,122][1652475] Updated weights for policy 0, policy_version 345555 (0.0019) [2024-06-15 16:03:38,317][1652475] Updated weights for policy 0, policy_version 345659 (0.0016) [2024-06-15 16:03:39,990][1652475] Updated weights for policy 0, policy_version 345728 (0.0130) [2024-06-15 16:03:40,738][1648984] Fps is (10 sec: 52429.3, 60 sec: 43690.9, 300 sec: 42653.9). Total num frames: 708050944. Throughput: 0: 10319.4. Samples: 177044480. Policy #0 lag: (min: 50.0, avg: 199.6, max: 290.0) [2024-06-15 16:03:40,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 16:03:44,556][1651340] Signal inference workers to stop experience collection... (17750 times) [2024-06-15 16:03:44,615][1652475] InferenceWorker_p0-w0: stopping experience collection (17750 times) [2024-06-15 16:03:44,873][1651340] Signal inference workers to resume experience collection... 
(17750 times) [2024-06-15 16:03:44,874][1652475] InferenceWorker_p0-w0: resuming experience collection (17750 times) [2024-06-15 16:03:45,317][1652475] Updated weights for policy 0, policy_version 345777 (0.0122) [2024-06-15 16:03:45,738][1648984] Fps is (10 sec: 52430.6, 60 sec: 42598.5, 300 sec: 42209.6). Total num frames: 708182016. Throughput: 0: 10626.9. Samples: 177117184. Policy #0 lag: (min: 50.0, avg: 199.6, max: 290.0) [2024-06-15 16:03:45,738][1648984] Avg episode reward: [(0, '-0.570')] [2024-06-15 16:03:48,227][1652475] Updated weights for policy 0, policy_version 345824 (0.0013) [2024-06-15 16:03:49,781][1652475] Updated weights for policy 0, policy_version 345894 (0.0035) [2024-06-15 16:03:50,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 42052.4, 300 sec: 42542.9). Total num frames: 708444160. Throughput: 0: 10831.7. Samples: 177155072. Policy #0 lag: (min: 50.0, avg: 199.6, max: 290.0) [2024-06-15 16:03:50,738][1648984] Avg episode reward: [(0, '-0.540')] [2024-06-15 16:03:51,891][1652475] Updated weights for policy 0, policy_version 345975 (0.0013) [2024-06-15 16:03:55,738][1648984] Fps is (10 sec: 42598.0, 60 sec: 43726.0, 300 sec: 42654.3). Total num frames: 708608000. Throughput: 0: 10752.0. Samples: 177220608. Policy #0 lag: (min: 50.0, avg: 199.6, max: 290.0) [2024-06-15 16:03:55,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 16:03:56,010][1652475] Updated weights for policy 0, policy_version 346016 (0.0013) [2024-06-15 16:04:00,270][1652475] Updated weights for policy 0, policy_version 346080 (0.0014) [2024-06-15 16:04:00,738][1648984] Fps is (10 sec: 36044.7, 60 sec: 41506.2, 300 sec: 42098.5). Total num frames: 708804608. Throughput: 0: 11070.6. Samples: 177293312. 
Policy #0 lag: (min: 50.0, avg: 199.6, max: 290.0) [2024-06-15 16:04:00,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 16:04:01,709][1652475] Updated weights for policy 0, policy_version 346144 (0.0131) [2024-06-15 16:04:03,963][1652475] Updated weights for policy 0, policy_version 346232 (0.0096) [2024-06-15 16:04:05,738][1648984] Fps is (10 sec: 49151.9, 60 sec: 43690.6, 300 sec: 42653.9). Total num frames: 709099520. Throughput: 0: 10706.5. Samples: 177315328. Policy #0 lag: (min: 50.0, avg: 199.6, max: 290.0) [2024-06-15 16:04:05,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 16:04:10,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 42598.4, 300 sec: 42542.8). Total num frames: 709230592. Throughput: 0: 10865.8. Samples: 177383936. Policy #0 lag: (min: 50.0, avg: 199.6, max: 290.0) [2024-06-15 16:04:10,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 16:04:11,416][1652475] Updated weights for policy 0, policy_version 346306 (0.0013) [2024-06-15 16:04:12,769][1652475] Updated weights for policy 0, policy_version 346370 (0.0013) [2024-06-15 16:04:13,876][1652475] Updated weights for policy 0, policy_version 346431 (0.0012) [2024-06-15 16:04:15,738][1648984] Fps is (10 sec: 45875.6, 60 sec: 42598.6, 300 sec: 42653.9). Total num frames: 709558272. Throughput: 0: 11002.3. Samples: 177451520. Policy #0 lag: (min: 50.0, avg: 199.6, max: 290.0) [2024-06-15 16:04:15,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 16:04:16,273][1652475] Updated weights for policy 0, policy_version 346488 (0.0012) [2024-06-15 16:04:18,942][1652475] Updated weights for policy 0, policy_version 346548 (0.0106) [2024-06-15 16:04:20,753][1648984] Fps is (10 sec: 52348.1, 60 sec: 45317.4, 300 sec: 43096.0). Total num frames: 709754880. Throughput: 0: 10941.7. Samples: 177482752. 
Policy #0 lag: (min: 50.0, avg: 199.6, max: 290.0) [2024-06-15 16:04:20,754][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 16:04:24,069][1652475] Updated weights for policy 0, policy_version 346608 (0.0012) [2024-06-15 16:04:25,746][1648984] Fps is (10 sec: 32739.8, 60 sec: 41500.2, 300 sec: 42208.4). Total num frames: 709885952. Throughput: 0: 11250.5. Samples: 177550848. Policy #0 lag: (min: 50.0, avg: 199.6, max: 290.0) [2024-06-15 16:04:25,747][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 16:04:27,954][1651340] Signal inference workers to stop experience collection... (17800 times) [2024-06-15 16:04:27,976][1652475] Updated weights for policy 0, policy_version 346705 (0.0015) [2024-06-15 16:04:28,007][1652475] InferenceWorker_p0-w0: stopping experience collection (17800 times) [2024-06-15 16:04:28,237][1651340] Signal inference workers to resume experience collection... (17800 times) [2024-06-15 16:04:28,237][1652475] InferenceWorker_p0-w0: resuming experience collection (17800 times) [2024-06-15 16:04:30,180][1652475] Updated weights for policy 0, policy_version 346768 (0.0012) [2024-06-15 16:04:30,738][1648984] Fps is (10 sec: 45946.1, 60 sec: 44783.0, 300 sec: 42876.1). Total num frames: 710213632. Throughput: 0: 10797.5. Samples: 177603072. Policy #0 lag: (min: 50.0, avg: 199.6, max: 290.0) [2024-06-15 16:04:30,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 16:04:35,738][1648984] Fps is (10 sec: 39355.3, 60 sec: 43690.9, 300 sec: 42209.9). Total num frames: 710279168. Throughput: 0: 10683.7. Samples: 177635840. 
Policy #0 lag: (min: 50.0, avg: 199.6, max: 290.0) [2024-06-15 16:04:35,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 16:04:36,628][1652475] Updated weights for policy 0, policy_version 346832 (0.0103) [2024-06-15 16:04:38,575][1652475] Updated weights for policy 0, policy_version 346883 (0.0176) [2024-06-15 16:04:40,332][1652475] Updated weights for policy 0, policy_version 346945 (0.0016) [2024-06-15 16:04:40,759][1648984] Fps is (10 sec: 36044.6, 60 sec: 42052.2, 300 sec: 42765.0). Total num frames: 710574080. Throughput: 0: 10569.9. Samples: 177696256. Policy #0 lag: (min: 50.0, avg: 199.6, max: 290.0) [2024-06-15 16:04:40,759][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 16:04:41,645][1652475] Updated weights for policy 0, policy_version 347003 (0.0014) [2024-06-15 16:04:45,313][1652475] Updated weights for policy 0, policy_version 347056 (0.0017) [2024-06-15 16:04:45,738][1648984] Fps is (10 sec: 52428.5, 60 sec: 43690.6, 300 sec: 42654.1). Total num frames: 710803456. Throughput: 0: 10399.3. Samples: 177761280. Policy #0 lag: (min: 50.0, avg: 199.6, max: 290.0) [2024-06-15 16:04:45,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 16:04:49,817][1652475] Updated weights for policy 0, policy_version 347134 (0.0015) [2024-06-15 16:04:50,738][1648984] Fps is (10 sec: 39320.6, 60 sec: 42052.0, 300 sec: 42320.7). Total num frames: 710967296. Throughput: 0: 10649.5. Samples: 177794560. Policy #0 lag: (min: 50.0, avg: 199.6, max: 290.0) [2024-06-15 16:04:50,739][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 16:04:51,723][1652475] Updated weights for policy 0, policy_version 347190 (0.0011) [2024-06-15 16:04:53,963][1652475] Updated weights for policy 0, policy_version 347260 (0.0014) [2024-06-15 16:04:55,738][1648984] Fps is (10 sec: 39321.0, 60 sec: 43144.4, 300 sec: 42653.9). Total num frames: 711196672. Throughput: 0: 10308.2. Samples: 177847808. 
Policy #0 lag: (min: 50.0, avg: 199.6, max: 290.0) [2024-06-15 16:04:55,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 16:04:55,744][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000347264_711196672.pth... [2024-06-15 16:04:55,799][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000342336_701104128.pth [2024-06-15 16:04:58,838][1652475] Updated weights for policy 0, policy_version 347299 (0.0012) [2024-06-15 16:05:00,738][1648984] Fps is (10 sec: 36045.5, 60 sec: 42052.2, 300 sec: 41987.5). Total num frames: 711327744. Throughput: 0: 10456.1. Samples: 177922048. Policy #0 lag: (min: 4.0, avg: 94.7, max: 260.0) [2024-06-15 16:05:00,739][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 16:05:01,481][1652475] Updated weights for policy 0, policy_version 347362 (0.0012) [2024-06-15 16:05:02,891][1652475] Updated weights for policy 0, policy_version 347410 (0.0028) [2024-06-15 16:05:03,845][1652475] Updated weights for policy 0, policy_version 347453 (0.0012) [2024-06-15 16:05:05,738][1648984] Fps is (10 sec: 49152.9, 60 sec: 43144.5, 300 sec: 42876.1). Total num frames: 711688192. Throughput: 0: 10357.3. Samples: 177948672. Policy #0 lag: (min: 4.0, avg: 94.7, max: 260.0) [2024-06-15 16:05:05,738][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 16:05:05,760][1652475] Updated weights for policy 0, policy_version 347516 (0.0012) [2024-06-15 16:05:10,740][1648984] Fps is (10 sec: 39321.8, 60 sec: 41506.1, 300 sec: 42209.6). Total num frames: 711720960. Throughput: 0: 10333.0. Samples: 178015744. 
Policy #0 lag: (min: 4.0, avg: 94.7, max: 260.0) [2024-06-15 16:05:10,741][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 16:05:11,787][1652475] Updated weights for policy 0, policy_version 347580 (0.0069) [2024-06-15 16:05:14,337][1652475] Updated weights for policy 0, policy_version 347634 (0.0043) [2024-06-15 16:05:15,738][1648984] Fps is (10 sec: 36045.1, 60 sec: 41506.1, 300 sec: 42431.8). Total num frames: 712048640. Throughput: 0: 10547.2. Samples: 178077696. Policy #0 lag: (min: 4.0, avg: 94.7, max: 260.0) [2024-06-15 16:05:15,738][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 16:05:16,011][1652475] Updated weights for policy 0, policy_version 347704 (0.0014) [2024-06-15 16:05:17,415][1651340] Signal inference workers to stop experience collection... (17850 times) [2024-06-15 16:05:17,463][1652475] InferenceWorker_p0-w0: stopping experience collection (17850 times) [2024-06-15 16:05:17,701][1651340] Signal inference workers to resume experience collection... (17850 times) [2024-06-15 16:05:17,702][1652475] InferenceWorker_p0-w0: resuming experience collection (17850 times) [2024-06-15 16:05:18,059][1652475] Updated weights for policy 0, policy_version 347760 (0.0014) [2024-06-15 16:05:20,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 41516.8, 300 sec: 42653.9). Total num frames: 712245248. Throughput: 0: 10444.8. Samples: 178105856. Policy #0 lag: (min: 4.0, avg: 94.7, max: 260.0) [2024-06-15 16:05:20,738][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 16:05:23,291][1652475] Updated weights for policy 0, policy_version 347795 (0.0016) [2024-06-15 16:05:25,738][1648984] Fps is (10 sec: 36044.7, 60 sec: 42058.3, 300 sec: 41987.5). Total num frames: 712409088. Throughput: 0: 10615.5. Samples: 178173952. 
Policy #0 lag: (min: 4.0, avg: 94.7, max: 260.0) [2024-06-15 16:05:25,738][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 16:05:25,763][1652475] Updated weights for policy 0, policy_version 347861 (0.0020) [2024-06-15 16:05:28,193][1652475] Updated weights for policy 0, policy_version 347966 (0.0143) [2024-06-15 16:05:30,738][1648984] Fps is (10 sec: 45875.6, 60 sec: 41506.2, 300 sec: 42431.8). Total num frames: 712704000. Throughput: 0: 10513.1. Samples: 178234368. Policy #0 lag: (min: 4.0, avg: 94.7, max: 260.0) [2024-06-15 16:05:30,738][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 16:05:31,083][1652475] Updated weights for policy 0, policy_version 348021 (0.0014) [2024-06-15 16:05:35,739][1648984] Fps is (10 sec: 39315.9, 60 sec: 42051.3, 300 sec: 42320.5). Total num frames: 712802304. Throughput: 0: 10581.1. Samples: 178270720. Policy #0 lag: (min: 4.0, avg: 94.7, max: 260.0) [2024-06-15 16:05:35,740][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 16:05:36,591][1652475] Updated weights for policy 0, policy_version 348093 (0.0016) [2024-06-15 16:05:38,761][1652475] Updated weights for policy 0, policy_version 348144 (0.0013) [2024-06-15 16:05:40,177][1652475] Updated weights for policy 0, policy_version 348194 (0.0014) [2024-06-15 16:05:40,738][1648984] Fps is (10 sec: 45873.6, 60 sec: 43144.4, 300 sec: 42653.9). Total num frames: 713162752. Throughput: 0: 10740.6. Samples: 178331136. Policy #0 lag: (min: 4.0, avg: 94.7, max: 260.0) [2024-06-15 16:05:40,739][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 16:05:42,842][1652475] Updated weights for policy 0, policy_version 348260 (0.0016) [2024-06-15 16:05:45,738][1648984] Fps is (10 sec: 49159.3, 60 sec: 41506.2, 300 sec: 42653.9). Total num frames: 713293824. Throughput: 0: 10706.5. Samples: 178403840. 
Policy #0 lag: (min: 4.0, avg: 94.7, max: 260.0) [2024-06-15 16:05:45,738][1648984] Avg episode reward: [(0, '-0.530')] [2024-06-15 16:05:48,249][1652475] Updated weights for policy 0, policy_version 348336 (0.0013) [2024-06-15 16:05:50,382][1652475] Updated weights for policy 0, policy_version 348369 (0.0013) [2024-06-15 16:05:50,738][1648984] Fps is (10 sec: 32769.0, 60 sec: 42052.5, 300 sec: 41987.5). Total num frames: 713490432. Throughput: 0: 10717.9. Samples: 178430976. Policy #0 lag: (min: 4.0, avg: 94.7, max: 260.0) [2024-06-15 16:05:50,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 16:05:51,883][1652475] Updated weights for policy 0, policy_version 348436 (0.0013) [2024-06-15 16:05:54,960][1652475] Updated weights for policy 0, policy_version 348497 (0.0013) [2024-06-15 16:05:55,738][1648984] Fps is (10 sec: 49152.0, 60 sec: 43144.7, 300 sec: 42542.9). Total num frames: 713785344. Throughput: 0: 10547.2. Samples: 178490368. Policy #0 lag: (min: 4.0, avg: 94.7, max: 260.0) [2024-06-15 16:05:55,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 16:06:00,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 42598.5, 300 sec: 42431.8). Total num frames: 713883648. Throughput: 0: 10649.6. Samples: 178556928. Policy #0 lag: (min: 4.0, avg: 94.7, max: 260.0) [2024-06-15 16:06:00,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 16:06:01,380][1652475] Updated weights for policy 0, policy_version 348599 (0.0013) [2024-06-15 16:06:03,464][1652475] Updated weights for policy 0, policy_version 348672 (0.0165) [2024-06-15 16:06:05,738][1648984] Fps is (10 sec: 32767.2, 60 sec: 40413.8, 300 sec: 42209.6). Total num frames: 714113024. Throughput: 0: 10501.6. Samples: 178578432. Policy #0 lag: (min: 4.0, avg: 94.7, max: 260.0) [2024-06-15 16:06:05,739][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 16:06:06,518][1651340] Signal inference workers to stop experience collection... 
(17900 times) [2024-06-15 16:06:06,581][1652475] InferenceWorker_p0-w0: stopping experience collection (17900 times) [2024-06-15 16:06:06,737][1651340] Signal inference workers to resume experience collection... (17900 times) [2024-06-15 16:06:06,738][1652475] InferenceWorker_p0-w0: resuming experience collection (17900 times) [2024-06-15 16:06:06,740][1652475] Updated weights for policy 0, policy_version 348736 (0.0030) [2024-06-15 16:06:08,003][1652475] Updated weights for policy 0, policy_version 348792 (0.0013) [2024-06-15 16:06:10,738][1648984] Fps is (10 sec: 45875.2, 60 sec: 43690.7, 300 sec: 42653.9). Total num frames: 714342400. Throughput: 0: 10558.6. Samples: 178649088. Policy #0 lag: (min: 4.0, avg: 94.7, max: 260.0) [2024-06-15 16:06:10,738][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 16:06:12,735][1652475] Updated weights for policy 0, policy_version 348834 (0.0014) [2024-06-15 16:06:15,738][1648984] Fps is (10 sec: 39322.4, 60 sec: 40960.0, 300 sec: 42209.6). Total num frames: 714506240. Throughput: 0: 10683.7. Samples: 178715136. Policy #0 lag: (min: 7.0, avg: 115.6, max: 263.0) [2024-06-15 16:06:15,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 16:06:16,708][1652475] Updated weights for policy 0, policy_version 348922 (0.0027) [2024-06-15 16:06:19,189][1652475] Updated weights for policy 0, policy_version 349008 (0.0012) [2024-06-15 16:06:20,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 43690.7, 300 sec: 42654.0). Total num frames: 714866688. Throughput: 0: 10411.0. Samples: 178739200. Policy #0 lag: (min: 7.0, avg: 115.6, max: 263.0) [2024-06-15 16:06:20,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 16:06:24,365][1652475] Updated weights for policy 0, policy_version 349074 (0.0032) [2024-06-15 16:06:25,738][1648984] Fps is (10 sec: 49152.0, 60 sec: 43144.5, 300 sec: 42542.9). Total num frames: 714997760. Throughput: 0: 10547.3. Samples: 178805760. 
Policy #0 lag: (min: 7.0, avg: 115.6, max: 263.0) [2024-06-15 16:06:25,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 16:06:29,149][1652475] Updated weights for policy 0, policy_version 349141 (0.0017) [2024-06-15 16:06:30,739][1648984] Fps is (10 sec: 32762.9, 60 sec: 41505.0, 300 sec: 41987.3). Total num frames: 715194368. Throughput: 0: 10433.1. Samples: 178873344. Policy #0 lag: (min: 7.0, avg: 115.6, max: 263.0) [2024-06-15 16:06:30,740][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 16:06:31,176][1652475] Updated weights for policy 0, policy_version 349233 (0.0128) [2024-06-15 16:06:32,843][1652475] Updated weights for policy 0, policy_version 349306 (0.0018) [2024-06-15 16:06:35,738][1648984] Fps is (10 sec: 42598.7, 60 sec: 43691.8, 300 sec: 42542.9). Total num frames: 715423744. Throughput: 0: 10331.0. Samples: 178895872. Policy #0 lag: (min: 7.0, avg: 115.6, max: 263.0) [2024-06-15 16:06:35,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 16:06:36,831][1652475] Updated weights for policy 0, policy_version 349372 (0.0022) [2024-06-15 16:06:40,738][1648984] Fps is (10 sec: 32772.6, 60 sec: 39321.7, 300 sec: 42209.6). Total num frames: 715522048. Throughput: 0: 10615.4. Samples: 178968064. Policy #0 lag: (min: 7.0, avg: 115.6, max: 263.0) [2024-06-15 16:06:40,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 16:06:42,026][1652475] Updated weights for policy 0, policy_version 349439 (0.0016) [2024-06-15 16:06:44,668][1652475] Updated weights for policy 0, policy_version 349520 (0.0211) [2024-06-15 16:06:45,739][1648984] Fps is (10 sec: 49144.0, 60 sec: 43689.5, 300 sec: 42653.7). Total num frames: 715915264. Throughput: 0: 10342.0. Samples: 179022336. 
Policy #0 lag: (min: 7.0, avg: 115.6, max: 263.0) [2024-06-15 16:06:45,740][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 16:06:47,774][1652475] Updated weights for policy 0, policy_version 349569 (0.0024) [2024-06-15 16:06:49,269][1652475] Updated weights for policy 0, policy_version 349631 (0.0012) [2024-06-15 16:06:50,747][1648984] Fps is (10 sec: 52379.1, 60 sec: 42591.6, 300 sec: 42652.6). Total num frames: 716046336. Throughput: 0: 10647.4. Samples: 179057664. Policy #0 lag: (min: 7.0, avg: 115.6, max: 263.0) [2024-06-15 16:06:50,748][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 16:06:54,377][1652475] Updated weights for policy 0, policy_version 349690 (0.0014) [2024-06-15 16:06:55,427][1651340] Signal inference workers to stop experience collection... (17950 times) [2024-06-15 16:06:55,472][1652475] InferenceWorker_p0-w0: stopping experience collection (17950 times) [2024-06-15 16:06:55,625][1651340] Signal inference workers to resume experience collection... (17950 times) [2024-06-15 16:06:55,627][1652475] InferenceWorker_p0-w0: resuming experience collection (17950 times) [2024-06-15 16:06:55,738][1648984] Fps is (10 sec: 32771.7, 60 sec: 40959.7, 300 sec: 42209.6). Total num frames: 716242944. Throughput: 0: 10558.5. Samples: 179124224. Policy #0 lag: (min: 7.0, avg: 115.6, max: 263.0) [2024-06-15 16:06:55,739][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 16:06:56,289][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000349760_716308480.pth... [2024-06-15 16:06:56,289][1652475] Updated weights for policy 0, policy_version 349760 (0.0023) [2024-06-15 16:06:56,324][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000344752_706052096.pth [2024-06-15 16:06:58,525][1652475] Updated weights for policy 0, policy_version 349820 (0.0012) [2024-06-15 16:07:00,738][1648984] Fps is (10 sec: 42639.2, 60 sec: 43144.5, 300 sec: 42320.7). 
Total num frames: 716472320. Throughput: 0: 10524.4. Samples: 179188736. Policy #0 lag: (min: 7.0, avg: 115.6, max: 263.0) [2024-06-15 16:07:00,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 16:07:01,478][1652475] Updated weights for policy 0, policy_version 349872 (0.0012) [2024-06-15 16:07:05,703][1652475] Updated weights for policy 0, policy_version 349944 (0.0129) [2024-06-15 16:07:05,738][1648984] Fps is (10 sec: 42600.1, 60 sec: 42598.5, 300 sec: 42431.8). Total num frames: 716668928. Throughput: 0: 10661.0. Samples: 179218944. Policy #0 lag: (min: 7.0, avg: 115.6, max: 263.0) [2024-06-15 16:07:05,739][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 16:07:07,856][1652475] Updated weights for policy 0, policy_version 350016 (0.0013) [2024-06-15 16:07:10,738][1648984] Fps is (10 sec: 42598.6, 60 sec: 42598.4, 300 sec: 42431.8). Total num frames: 716898304. Throughput: 0: 10604.1. Samples: 179282944. Policy #0 lag: (min: 7.0, avg: 115.6, max: 263.0) [2024-06-15 16:07:10,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 16:07:11,081][1652475] Updated weights for policy 0, policy_version 350075 (0.0013) [2024-06-15 16:07:14,714][1652475] Updated weights for policy 0, policy_version 350128 (0.0013) [2024-06-15 16:07:15,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 43144.5, 300 sec: 42320.7). Total num frames: 717094912. Throughput: 0: 10604.4. Samples: 179350528. Policy #0 lag: (min: 7.0, avg: 115.6, max: 263.0) [2024-06-15 16:07:15,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 16:07:16,676][1652475] Updated weights for policy 0, policy_version 350162 (0.0013) [2024-06-15 16:07:18,389][1652475] Updated weights for policy 0, policy_version 350210 (0.0017) [2024-06-15 16:07:20,738][1648984] Fps is (10 sec: 45875.2, 60 sec: 41506.1, 300 sec: 42544.4). Total num frames: 717357056. Throughput: 0: 10797.5. Samples: 179381760. 
Policy #0 lag: (min: 7.0, avg: 115.6, max: 263.0) [2024-06-15 16:07:20,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 16:07:21,449][1652475] Updated weights for policy 0, policy_version 350275 (0.0013) [2024-06-15 16:07:22,671][1652475] Updated weights for policy 0, policy_version 350327 (0.0026) [2024-06-15 16:07:25,739][1648984] Fps is (10 sec: 39318.2, 60 sec: 41505.5, 300 sec: 42653.8). Total num frames: 717488128. Throughput: 0: 10751.8. Samples: 179451904. Policy #0 lag: (min: 7.0, avg: 115.6, max: 263.0) [2024-06-15 16:07:25,740][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 16:07:26,471][1652475] Updated weights for policy 0, policy_version 350378 (0.0039) [2024-06-15 16:07:27,982][1652475] Updated weights for policy 0, policy_version 350416 (0.0012) [2024-06-15 16:07:29,139][1652475] Updated weights for policy 0, policy_version 350458 (0.0012) [2024-06-15 16:07:30,738][1648984] Fps is (10 sec: 45875.2, 60 sec: 43691.8, 300 sec: 42765.0). Total num frames: 717815808. Throughput: 0: 10968.6. Samples: 179515904. Policy #0 lag: (min: 7.0, avg: 115.6, max: 263.0) [2024-06-15 16:07:30,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 16:07:31,183][1652475] Updated weights for policy 0, policy_version 350519 (0.0013) [2024-06-15 16:07:33,736][1652475] Updated weights for policy 0, policy_version 350560 (0.0012) [2024-06-15 16:07:35,738][1648984] Fps is (10 sec: 52433.6, 60 sec: 43144.5, 300 sec: 42654.0). Total num frames: 718012416. Throughput: 0: 10981.9. Samples: 179551744. Policy #0 lag: (min: 15.0, avg: 142.1, max: 271.0) [2024-06-15 16:07:35,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 16:07:37,983][1652475] Updated weights for policy 0, policy_version 350611 (0.0017) [2024-06-15 16:07:40,248][1652475] Updated weights for policy 0, policy_version 350688 (0.0130) [2024-06-15 16:07:40,738][1648984] Fps is (10 sec: 42597.9, 60 sec: 45329.1, 300 sec: 42765.0). Total num frames: 718241792. 
Throughput: 0: 10968.3. Samples: 179617792. Policy #0 lag: (min: 15.0, avg: 142.1, max: 271.0) [2024-06-15 16:07:40,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 16:07:42,025][1652475] Updated weights for policy 0, policy_version 350725 (0.0011) [2024-06-15 16:07:44,762][1651340] Signal inference workers to stop experience collection... (18000 times) [2024-06-15 16:07:44,822][1652475] InferenceWorker_p0-w0: stopping experience collection (18000 times) [2024-06-15 16:07:44,852][1652475] Updated weights for policy 0, policy_version 350789 (0.0013) [2024-06-15 16:07:45,043][1651340] Signal inference workers to resume experience collection... (18000 times) [2024-06-15 16:07:45,044][1652475] InferenceWorker_p0-w0: resuming experience collection (18000 times) [2024-06-15 16:07:45,775][1648984] Fps is (10 sec: 45702.8, 60 sec: 42572.7, 300 sec: 42537.4). Total num frames: 718471168. Throughput: 0: 11049.9. Samples: 179686400. Policy #0 lag: (min: 15.0, avg: 142.1, max: 271.0) [2024-06-15 16:07:45,776][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 16:07:49,613][1652475] Updated weights for policy 0, policy_version 350853 (0.0025) [2024-06-15 16:07:50,640][1652475] Updated weights for policy 0, policy_version 350912 (0.0013) [2024-06-15 16:07:50,738][1648984] Fps is (10 sec: 42598.5, 60 sec: 43697.6, 300 sec: 42994.2). Total num frames: 718667776. Throughput: 0: 11059.2. Samples: 179716608. Policy #0 lag: (min: 15.0, avg: 142.1, max: 271.0) [2024-06-15 16:07:50,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 16:07:52,642][1652475] Updated weights for policy 0, policy_version 350969 (0.0013) [2024-06-15 16:07:54,546][1652475] Updated weights for policy 0, policy_version 351024 (0.0014) [2024-06-15 16:07:55,738][1648984] Fps is (10 sec: 46048.8, 60 sec: 44783.3, 300 sec: 42765.0). Total num frames: 718929920. Throughput: 0: 11173.0. Samples: 179785728. 
Policy #0 lag: (min: 15.0, avg: 142.1, max: 271.0) [2024-06-15 16:07:55,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 16:07:56,874][1652475] Updated weights for policy 0, policy_version 351060 (0.0013) [2024-06-15 16:08:00,738][1648984] Fps is (10 sec: 39322.0, 60 sec: 43144.6, 300 sec: 42653.9). Total num frames: 719060992. Throughput: 0: 11116.1. Samples: 179850752. Policy #0 lag: (min: 15.0, avg: 142.1, max: 271.0) [2024-06-15 16:08:00,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 16:08:01,603][1652475] Updated weights for policy 0, policy_version 351136 (0.0019) [2024-06-15 16:08:03,297][1652475] Updated weights for policy 0, policy_version 351185 (0.0012) [2024-06-15 16:08:05,223][1652475] Updated weights for policy 0, policy_version 351233 (0.0011) [2024-06-15 16:08:05,738][1648984] Fps is (10 sec: 42598.6, 60 sec: 44783.0, 300 sec: 42987.2). Total num frames: 719355904. Throughput: 0: 11184.4. Samples: 179885056. Policy #0 lag: (min: 15.0, avg: 142.1, max: 271.0) [2024-06-15 16:08:05,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 16:08:06,580][1652475] Updated weights for policy 0, policy_version 351296 (0.0013) [2024-06-15 16:08:09,367][1652475] Updated weights for policy 0, policy_version 351356 (0.0013) [2024-06-15 16:08:10,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 44782.9, 300 sec: 42654.0). Total num frames: 719585280. Throughput: 0: 11082.2. Samples: 179950592. Policy #0 lag: (min: 15.0, avg: 142.1, max: 271.0) [2024-06-15 16:08:10,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 16:08:13,086][1652475] Updated weights for policy 0, policy_version 351393 (0.0013) [2024-06-15 16:08:15,739][1648984] Fps is (10 sec: 36040.8, 60 sec: 43689.9, 300 sec: 42987.0). Total num frames: 719716352. Throughput: 0: 11320.6. Samples: 180025344. 
Policy #0 lag: (min: 15.0, avg: 142.1, max: 271.0) [2024-06-15 16:08:15,739][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 16:08:16,999][1652475] Updated weights for policy 0, policy_version 351488 (0.0132) [2024-06-15 16:08:18,490][1652475] Updated weights for policy 0, policy_version 351545 (0.0011) [2024-06-15 16:08:19,967][1652475] Updated weights for policy 0, policy_version 351584 (0.0012) [2024-06-15 16:08:20,738][1648984] Fps is (10 sec: 49152.1, 60 sec: 45329.1, 300 sec: 42987.2). Total num frames: 720076800. Throughput: 0: 11059.2. Samples: 180049408. Policy #0 lag: (min: 15.0, avg: 142.1, max: 271.0) [2024-06-15 16:08:20,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 16:08:24,368][1652475] Updated weights for policy 0, policy_version 351619 (0.0013) [2024-06-15 16:08:25,738][1648984] Fps is (10 sec: 52434.4, 60 sec: 45875.9, 300 sec: 43098.3). Total num frames: 720240640. Throughput: 0: 11047.8. Samples: 180114944. Policy #0 lag: (min: 15.0, avg: 142.1, max: 271.0) [2024-06-15 16:08:25,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 16:08:27,281][1652475] Updated weights for policy 0, policy_version 351683 (0.0017) [2024-06-15 16:08:28,263][1652475] Updated weights for policy 0, policy_version 351739 (0.0015) [2024-06-15 16:08:30,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 44236.8, 300 sec: 43431.5). Total num frames: 720470016. Throughput: 0: 11091.3. Samples: 180185088. Policy #0 lag: (min: 15.0, avg: 142.1, max: 271.0) [2024-06-15 16:08:30,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 16:08:30,809][1652475] Updated weights for policy 0, policy_version 351799 (0.0017) [2024-06-15 16:08:31,348][1651340] Signal inference workers to stop experience collection... (18050 times) [2024-06-15 16:08:31,407][1652475] InferenceWorker_p0-w0: stopping experience collection (18050 times) [2024-06-15 16:08:31,632][1651340] Signal inference workers to resume experience collection... 
(18050 times) [2024-06-15 16:08:31,633][1652475] InferenceWorker_p0-w0: resuming experience collection (18050 times) [2024-06-15 16:08:32,109][1652475] Updated weights for policy 0, policy_version 351840 (0.0123) [2024-06-15 16:08:35,416][1652475] Updated weights for policy 0, policy_version 351879 (0.0011) [2024-06-15 16:08:35,738][1648984] Fps is (10 sec: 42598.5, 60 sec: 44236.8, 300 sec: 42765.0). Total num frames: 720666624. Throughput: 0: 11059.2. Samples: 180214272. Policy #0 lag: (min: 15.0, avg: 142.1, max: 271.0) [2024-06-15 16:08:35,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 16:08:40,222][1652475] Updated weights for policy 0, policy_version 351940 (0.0013) [2024-06-15 16:08:40,738][1648984] Fps is (10 sec: 32767.8, 60 sec: 42598.4, 300 sec: 42765.0). Total num frames: 720797696. Throughput: 0: 11207.1. Samples: 180290048. Policy #0 lag: (min: 15.0, avg: 142.1, max: 271.0) [2024-06-15 16:08:40,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 16:08:41,825][1652475] Updated weights for policy 0, policy_version 352000 (0.0012) [2024-06-15 16:08:43,369][1652475] Updated weights for policy 0, policy_version 352052 (0.0012) [2024-06-15 16:08:45,738][1648984] Fps is (10 sec: 49152.0, 60 sec: 44811.1, 300 sec: 43098.2). Total num frames: 721158144. Throughput: 0: 10843.0. Samples: 180338688. Policy #0 lag: (min: 15.0, avg: 142.1, max: 271.0) [2024-06-15 16:08:45,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 16:08:47,258][1652475] Updated weights for policy 0, policy_version 352130 (0.0134) [2024-06-15 16:08:48,488][1652475] Updated weights for policy 0, policy_version 352191 (0.0013) [2024-06-15 16:08:50,738][1648984] Fps is (10 sec: 49152.4, 60 sec: 43690.7, 300 sec: 42987.2). Total num frames: 721289216. Throughput: 0: 10820.3. Samples: 180371968. 
Policy #0 lag: (min: 15.0, avg: 142.1, max: 271.0) [2024-06-15 16:08:50,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 16:08:53,443][1652475] Updated weights for policy 0, policy_version 352248 (0.0018) [2024-06-15 16:08:55,476][1652475] Updated weights for policy 0, policy_version 352310 (0.0014) [2024-06-15 16:08:55,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 43690.6, 300 sec: 43209.3). Total num frames: 721551360. Throughput: 0: 11002.3. Samples: 180445696. Policy #0 lag: (min: 15.0, avg: 86.0, max: 271.0) [2024-06-15 16:08:55,740][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 16:08:55,749][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000352320_721551360.pth... [2024-06-15 16:08:55,946][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000347264_711196672.pth [2024-06-15 16:08:56,837][1652475] Updated weights for policy 0, policy_version 352372 (0.0091) [2024-06-15 16:08:59,284][1652475] Updated weights for policy 0, policy_version 352438 (0.0013) [2024-06-15 16:09:00,738][1648984] Fps is (10 sec: 52427.8, 60 sec: 45875.1, 300 sec: 43098.2). Total num frames: 721813504. Throughput: 0: 10718.1. Samples: 180507648. Policy #0 lag: (min: 15.0, avg: 86.0, max: 271.0) [2024-06-15 16:09:00,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 16:09:03,980][1652475] Updated weights for policy 0, policy_version 352502 (0.0013) [2024-06-15 16:09:05,738][1648984] Fps is (10 sec: 39319.6, 60 sec: 43144.1, 300 sec: 43098.2). Total num frames: 721944576. Throughput: 0: 11127.3. Samples: 180550144. 
Policy #0 lag: (min: 15.0, avg: 86.0, max: 271.0) [2024-06-15 16:09:05,739][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 16:09:07,245][1652475] Updated weights for policy 0, policy_version 352545 (0.0018) [2024-06-15 16:09:09,819][1652475] Updated weights for policy 0, policy_version 352610 (0.0017) [2024-06-15 16:09:10,738][1648984] Fps is (10 sec: 42598.7, 60 sec: 44236.7, 300 sec: 42987.2). Total num frames: 722239488. Throughput: 0: 11070.6. Samples: 180613120. Policy #0 lag: (min: 15.0, avg: 86.0, max: 271.0) [2024-06-15 16:09:10,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 16:09:11,715][1652475] Updated weights for policy 0, policy_version 352699 (0.0014) [2024-06-15 16:09:15,617][1652475] Updated weights for policy 0, policy_version 352764 (0.0015) [2024-06-15 16:09:15,738][1648984] Fps is (10 sec: 52431.2, 60 sec: 45875.9, 300 sec: 43100.5). Total num frames: 722468864. Throughput: 0: 10911.3. Samples: 180676096. Policy #0 lag: (min: 15.0, avg: 86.0, max: 271.0) [2024-06-15 16:09:15,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 16:09:18,969][1651340] Signal inference workers to stop experience collection... (18100 times) [2024-06-15 16:09:19,013][1652475] InferenceWorker_p0-w0: stopping experience collection (18100 times) [2024-06-15 16:09:19,176][1651340] Signal inference workers to resume experience collection... (18100 times) [2024-06-15 16:09:19,177][1652475] InferenceWorker_p0-w0: resuming experience collection (18100 times) [2024-06-15 16:09:19,495][1652475] Updated weights for policy 0, policy_version 352832 (0.0013) [2024-06-15 16:09:20,738][1648984] Fps is (10 sec: 36045.1, 60 sec: 42052.2, 300 sec: 43099.5). Total num frames: 722599936. Throughput: 0: 11047.8. Samples: 180711424. 
Policy #0 lag: (min: 15.0, avg: 86.0, max: 271.0) [2024-06-15 16:09:20,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 16:09:23,456][1652475] Updated weights for policy 0, policy_version 352896 (0.0015) [2024-06-15 16:09:25,186][1652475] Updated weights for policy 0, policy_version 352960 (0.0014) [2024-06-15 16:09:25,745][1648984] Fps is (10 sec: 39321.8, 60 sec: 43690.6, 300 sec: 42876.1). Total num frames: 722862080. Throughput: 0: 10581.3. Samples: 180766208. Policy #0 lag: (min: 15.0, avg: 86.0, max: 271.0) [2024-06-15 16:09:25,754][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 16:09:27,500][1652475] Updated weights for policy 0, policy_version 353018 (0.0011) [2024-06-15 16:09:30,738][1648984] Fps is (10 sec: 39318.8, 60 sec: 42051.7, 300 sec: 43098.2). Total num frames: 722993152. Throughput: 0: 10899.7. Samples: 180829184. Policy #0 lag: (min: 15.0, avg: 86.0, max: 271.0) [2024-06-15 16:09:30,739][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 16:09:31,953][1652475] Updated weights for policy 0, policy_version 353086 (0.0013) [2024-06-15 16:09:35,738][1648984] Fps is (10 sec: 32768.2, 60 sec: 42052.3, 300 sec: 42765.0). Total num frames: 723189760. Throughput: 0: 10808.9. Samples: 180858368. Policy #0 lag: (min: 15.0, avg: 86.0, max: 271.0) [2024-06-15 16:09:35,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 16:09:36,007][1652475] Updated weights for policy 0, policy_version 353140 (0.0013) [2024-06-15 16:09:38,856][1652475] Updated weights for policy 0, policy_version 353216 (0.0015) [2024-06-15 16:09:40,161][1652475] Updated weights for policy 0, policy_version 353280 (0.0035) [2024-06-15 16:09:40,738][1648984] Fps is (10 sec: 52432.2, 60 sec: 45329.0, 300 sec: 43098.3). Total num frames: 723517440. Throughput: 0: 10501.7. Samples: 180918272. 
Policy #0 lag: (min: 15.0, avg: 86.0, max: 271.0) [2024-06-15 16:09:40,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 16:09:44,201][1652475] Updated weights for policy 0, policy_version 353335 (0.0013) [2024-06-15 16:09:45,740][1648984] Fps is (10 sec: 45875.0, 60 sec: 41506.1, 300 sec: 42987.2). Total num frames: 723648512. Throughput: 0: 10717.9. Samples: 180989952. Policy #0 lag: (min: 15.0, avg: 86.0, max: 271.0) [2024-06-15 16:09:45,741][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 16:09:46,885][1652475] Updated weights for policy 0, policy_version 353376 (0.0012) [2024-06-15 16:09:49,203][1652475] Updated weights for policy 0, policy_version 353424 (0.0079) [2024-06-15 16:09:50,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 43690.6, 300 sec: 43098.3). Total num frames: 723910656. Throughput: 0: 10592.8. Samples: 181026816. Policy #0 lag: (min: 15.0, avg: 86.0, max: 271.0) [2024-06-15 16:09:50,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 16:09:51,455][1652475] Updated weights for policy 0, policy_version 353504 (0.0015) [2024-06-15 16:09:55,738][1648984] Fps is (10 sec: 42598.5, 60 sec: 42052.3, 300 sec: 43209.3). Total num frames: 724074496. Throughput: 0: 10581.3. Samples: 181089280. Policy #0 lag: (min: 15.0, avg: 86.0, max: 271.0) [2024-06-15 16:09:55,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 16:09:55,768][1652475] Updated weights for policy 0, policy_version 353558 (0.0129) [2024-06-15 16:09:58,160][1652475] Updated weights for policy 0, policy_version 353616 (0.0017) [2024-06-15 16:09:58,888][1652475] Updated weights for policy 0, policy_version 353652 (0.0012) [2024-06-15 16:10:00,738][1648984] Fps is (10 sec: 42598.6, 60 sec: 42052.4, 300 sec: 42876.1). Total num frames: 724336640. Throughput: 0: 10763.4. Samples: 181160448. 
Policy #0 lag: (min: 15.0, avg: 86.0, max: 271.0) [2024-06-15 16:10:00,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 16:10:01,442][1652475] Updated weights for policy 0, policy_version 353717 (0.0013) [2024-06-15 16:10:03,142][1652475] Updated weights for policy 0, policy_version 353760 (0.0013) [2024-06-15 16:10:05,738][1648984] Fps is (10 sec: 49151.8, 60 sec: 43691.0, 300 sec: 43542.6). Total num frames: 724566016. Throughput: 0: 10535.8. Samples: 181185536. Policy #0 lag: (min: 15.0, avg: 86.0, max: 271.0) [2024-06-15 16:10:05,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 16:10:07,318][1652475] Updated weights for policy 0, policy_version 353824 (0.0127) [2024-06-15 16:10:07,484][1651340] Signal inference workers to stop experience collection... (18150 times) [2024-06-15 16:10:07,564][1652475] InferenceWorker_p0-w0: stopping experience collection (18150 times) [2024-06-15 16:10:07,726][1651340] Signal inference workers to resume experience collection... (18150 times) [2024-06-15 16:10:07,727][1652475] InferenceWorker_p0-w0: resuming experience collection (18150 times) [2024-06-15 16:10:09,878][1652475] Updated weights for policy 0, policy_version 353876 (0.0016) [2024-06-15 16:10:10,738][1648984] Fps is (10 sec: 45875.0, 60 sec: 42598.4, 300 sec: 43209.3). Total num frames: 724795392. Throughput: 0: 10979.6. Samples: 181260288. Policy #0 lag: (min: 13.0, avg: 116.5, max: 269.0) [2024-06-15 16:10:10,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 16:10:12,193][1652475] Updated weights for policy 0, policy_version 353936 (0.0013) [2024-06-15 16:10:14,792][1652475] Updated weights for policy 0, policy_version 354004 (0.0014) [2024-06-15 16:10:15,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 725090304. Throughput: 0: 10911.5. Samples: 181320192. 
Policy #0 lag: (min: 13.0, avg: 116.5, max: 269.0) [2024-06-15 16:10:15,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 16:10:19,838][1652475] Updated weights for policy 0, policy_version 354096 (0.0013) [2024-06-15 16:10:20,738][1648984] Fps is (10 sec: 42597.2, 60 sec: 43690.4, 300 sec: 43431.4). Total num frames: 725221376. Throughput: 0: 11127.4. Samples: 181359104. Policy #0 lag: (min: 13.0, avg: 116.5, max: 269.0) [2024-06-15 16:10:20,739][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 16:10:21,764][1652475] Updated weights for policy 0, policy_version 354131 (0.0032) [2024-06-15 16:10:24,628][1652475] Updated weights for policy 0, policy_version 354180 (0.0012) [2024-06-15 16:10:25,738][1648984] Fps is (10 sec: 36044.7, 60 sec: 43144.5, 300 sec: 43209.3). Total num frames: 725450752. Throughput: 0: 11116.1. Samples: 181418496. Policy #0 lag: (min: 13.0, avg: 116.5, max: 269.0) [2024-06-15 16:10:25,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 16:10:27,378][1652475] Updated weights for policy 0, policy_version 354243 (0.0024) [2024-06-15 16:10:30,738][1648984] Fps is (10 sec: 39322.6, 60 sec: 43691.1, 300 sec: 43431.7). Total num frames: 725614592. Throughput: 0: 10968.2. Samples: 181483520. Policy #0 lag: (min: 13.0, avg: 116.5, max: 269.0) [2024-06-15 16:10:30,738][1648984] Avg episode reward: [(0, '-0.560')] [2024-06-15 16:10:31,823][1652475] Updated weights for policy 0, policy_version 354336 (0.0092) [2024-06-15 16:10:34,527][1652475] Updated weights for policy 0, policy_version 354430 (0.0014) [2024-06-15 16:10:35,738][1648984] Fps is (10 sec: 42598.7, 60 sec: 44782.9, 300 sec: 43098.3). Total num frames: 725876736. Throughput: 0: 10808.9. Samples: 181513216. 
Policy #0 lag: (min: 13.0, avg: 116.5, max: 269.0) [2024-06-15 16:10:35,740][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 16:10:37,712][1652475] Updated weights for policy 0, policy_version 354484 (0.0028) [2024-06-15 16:10:40,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 41506.1, 300 sec: 43098.2). Total num frames: 726007808. Throughput: 0: 10899.9. Samples: 181579776. Policy #0 lag: (min: 13.0, avg: 116.5, max: 269.0) [2024-06-15 16:10:40,738][1648984] Avg episode reward: [(0, '-0.520')] [2024-06-15 16:10:42,725][1652475] Updated weights for policy 0, policy_version 354560 (0.0106) [2024-06-15 16:10:44,214][1652475] Updated weights for policy 0, policy_version 354622 (0.0014) [2024-06-15 16:10:45,738][1648984] Fps is (10 sec: 45875.2, 60 sec: 44783.0, 300 sec: 43542.6). Total num frames: 726335488. Throughput: 0: 10661.0. Samples: 181640192. Policy #0 lag: (min: 13.0, avg: 116.5, max: 269.0) [2024-06-15 16:10:45,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 16:10:46,150][1652475] Updated weights for policy 0, policy_version 354688 (0.0013) [2024-06-15 16:10:49,582][1652475] Updated weights for policy 0, policy_version 354752 (0.0026) [2024-06-15 16:10:50,738][1648984] Fps is (10 sec: 52429.1, 60 sec: 43690.7, 300 sec: 43209.3). Total num frames: 726532096. Throughput: 0: 10854.4. Samples: 181673984. Policy #0 lag: (min: 13.0, avg: 116.5, max: 269.0) [2024-06-15 16:10:50,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 16:10:54,987][1652475] Updated weights for policy 0, policy_version 354800 (0.0014) [2024-06-15 16:10:55,739][1648984] Fps is (10 sec: 32763.7, 60 sec: 43143.6, 300 sec: 43320.2). Total num frames: 726663168. Throughput: 0: 10728.9. Samples: 181743104. Policy #0 lag: (min: 13.0, avg: 116.5, max: 269.0) [2024-06-15 16:10:55,740][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 16:10:56,067][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000354848_726728704.pth... 
[2024-06-15 16:10:56,164][1651340] Signal inference workers to stop experience collection... (18200 times) [2024-06-15 16:10:56,217][1652475] InferenceWorker_p0-w0: stopping experience collection (18200 times) [2024-06-15 16:10:56,222][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000349760_716308480.pth [2024-06-15 16:10:56,226][1651340] Saving a milestone train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/milestones/checkpoint_000354848_726728704.pth [2024-06-15 16:10:56,475][1651340] Signal inference workers to resume experience collection... (18200 times) [2024-06-15 16:10:56,476][1652475] InferenceWorker_p0-w0: resuming experience collection (18200 times) [2024-06-15 16:10:56,626][1652475] Updated weights for policy 0, policy_version 354872 (0.0029) [2024-06-15 16:10:57,881][1652475] Updated weights for policy 0, policy_version 354928 (0.0014) [2024-06-15 16:11:00,738][1648984] Fps is (10 sec: 45874.9, 60 sec: 44236.7, 300 sec: 43653.7). Total num frames: 726990848. Throughput: 0: 10786.1. Samples: 181805568. Policy #0 lag: (min: 13.0, avg: 116.5, max: 269.0) [2024-06-15 16:11:00,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 16:11:01,123][1652475] Updated weights for policy 0, policy_version 354992 (0.0102) [2024-06-15 16:11:05,738][1648984] Fps is (10 sec: 39326.3, 60 sec: 41506.1, 300 sec: 43098.2). Total num frames: 727056384. Throughput: 0: 10604.1. Samples: 181836288. Policy #0 lag: (min: 13.0, avg: 116.5, max: 269.0) [2024-06-15 16:11:05,739][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 16:11:06,965][1652475] Updated weights for policy 0, policy_version 355065 (0.0121) [2024-06-15 16:11:08,782][1652475] Updated weights for policy 0, policy_version 355109 (0.0017) [2024-06-15 16:11:10,194][1652475] Updated weights for policy 0, policy_version 355153 (0.0013) [2024-06-15 16:11:10,738][1648984] Fps is (10 sec: 39321.2, 60 sec: 43144.4, 300 sec: 43653.6). Total num frames: 727384064. 
Throughput: 0: 10695.1. Samples: 181899776. Policy #0 lag: (min: 13.0, avg: 116.5, max: 269.0) [2024-06-15 16:11:10,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 16:11:12,107][1652475] Updated weights for policy 0, policy_version 355216 (0.0016) [2024-06-15 16:11:15,738][1648984] Fps is (10 sec: 52429.2, 60 sec: 41506.2, 300 sec: 43098.2). Total num frames: 727580672. Throughput: 0: 10615.5. Samples: 181961216. Policy #0 lag: (min: 13.0, avg: 116.5, max: 269.0) [2024-06-15 16:11:15,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 16:11:17,633][1652475] Updated weights for policy 0, policy_version 355280 (0.0018) [2024-06-15 16:11:20,738][1648984] Fps is (10 sec: 32768.8, 60 sec: 41506.4, 300 sec: 43098.3). Total num frames: 727711744. Throughput: 0: 10695.1. Samples: 181994496. Policy #0 lag: (min: 13.0, avg: 116.5, max: 269.0) [2024-06-15 16:11:20,738][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 16:11:22,199][1652475] Updated weights for policy 0, policy_version 355363 (0.0013) [2024-06-15 16:11:23,954][1652475] Updated weights for policy 0, policy_version 355447 (0.0076) [2024-06-15 16:11:25,541][1652475] Updated weights for policy 0, policy_version 355509 (0.0014) [2024-06-15 16:11:25,738][1648984] Fps is (10 sec: 49152.0, 60 sec: 43690.7, 300 sec: 43653.9). Total num frames: 728072192. Throughput: 0: 10535.8. Samples: 182053888. Policy #0 lag: (min: 13.0, avg: 116.5, max: 269.0) [2024-06-15 16:11:25,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 16:11:30,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 43690.7, 300 sec: 43431.5). Total num frames: 728236032. Throughput: 0: 10604.1. Samples: 182117376. 
Policy #0 lag: (min: 8.0, avg: 110.1, max: 264.0) [2024-06-15 16:11:30,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 16:11:33,784][1652475] Updated weights for policy 0, policy_version 355585 (0.0012) [2024-06-15 16:11:34,941][1652475] Updated weights for policy 0, policy_version 355646 (0.0035) [2024-06-15 16:11:35,738][1648984] Fps is (10 sec: 32767.9, 60 sec: 42052.2, 300 sec: 43653.7). Total num frames: 728399872. Throughput: 0: 10706.5. Samples: 182155776. Policy #0 lag: (min: 8.0, avg: 110.1, max: 264.0) [2024-06-15 16:11:35,739][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 16:11:36,419][1652475] Updated weights for policy 0, policy_version 355704 (0.0013) [2024-06-15 16:11:40,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 43690.7, 300 sec: 43098.5). Total num frames: 728629248. Throughput: 0: 10467.9. Samples: 182214144. Policy #0 lag: (min: 8.0, avg: 110.1, max: 264.0) [2024-06-15 16:11:40,738][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 16:11:40,970][1652475] Updated weights for policy 0, policy_version 355785 (0.0017) [2024-06-15 16:11:42,062][1652475] Updated weights for policy 0, policy_version 355838 (0.0013) [2024-06-15 16:11:45,738][1648984] Fps is (10 sec: 36044.7, 60 sec: 40413.8, 300 sec: 43099.6). Total num frames: 728760320. Throughput: 0: 10706.5. Samples: 182287360. Policy #0 lag: (min: 8.0, avg: 110.1, max: 264.0) [2024-06-15 16:11:45,739][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 16:11:45,988][1651340] Signal inference workers to stop experience collection... (18250 times) [2024-06-15 16:11:46,039][1652475] InferenceWorker_p0-w0: stopping experience collection (18250 times) [2024-06-15 16:11:46,244][1651340] Signal inference workers to resume experience collection... 
(18250 times) [2024-06-15 16:11:46,245][1652475] InferenceWorker_p0-w0: resuming experience collection (18250 times) [2024-06-15 16:11:46,814][1652475] Updated weights for policy 0, policy_version 355898 (0.0016) [2024-06-15 16:11:50,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 41506.1, 300 sec: 43320.5). Total num frames: 729022464. Throughput: 0: 10592.7. Samples: 182312960. Policy #0 lag: (min: 8.0, avg: 110.1, max: 264.0) [2024-06-15 16:11:50,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 16:11:51,299][1652475] Updated weights for policy 0, policy_version 355985 (0.0015) [2024-06-15 16:11:52,020][1652475] Updated weights for policy 0, policy_version 356028 (0.0125) [2024-06-15 16:11:53,552][1652475] Updated weights for policy 0, policy_version 356088 (0.0036) [2024-06-15 16:11:55,738][1648984] Fps is (10 sec: 52429.1, 60 sec: 43691.6, 300 sec: 43431.5). Total num frames: 729284608. Throughput: 0: 10604.1. Samples: 182376960. Policy #0 lag: (min: 8.0, avg: 110.1, max: 264.0) [2024-06-15 16:11:55,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 16:11:58,517][1652475] Updated weights for policy 0, policy_version 356156 (0.0014) [2024-06-15 16:12:00,150][1652475] Updated weights for policy 0, policy_version 356221 (0.0014) [2024-06-15 16:12:00,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 42598.4, 300 sec: 43653.6). Total num frames: 729546752. Throughput: 0: 10752.0. Samples: 182445056. Policy #0 lag: (min: 8.0, avg: 110.1, max: 264.0) [2024-06-15 16:12:00,739][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 16:12:04,082][1652475] Updated weights for policy 0, policy_version 356288 (0.0014) [2024-06-15 16:12:05,738][1648984] Fps is (10 sec: 52428.4, 60 sec: 45875.2, 300 sec: 43764.7). Total num frames: 729808896. Throughput: 0: 10865.8. Samples: 182483456. 
Policy #0 lag: (min: 8.0, avg: 110.1, max: 264.0) [2024-06-15 16:12:05,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 16:12:10,083][1652475] Updated weights for policy 0, policy_version 356384 (0.0015) [2024-06-15 16:12:10,738][1648984] Fps is (10 sec: 36045.0, 60 sec: 42052.4, 300 sec: 43431.5). Total num frames: 729907200. Throughput: 0: 11104.7. Samples: 182553600. Policy #0 lag: (min: 8.0, avg: 110.1, max: 264.0) [2024-06-15 16:12:10,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 16:12:12,324][1652475] Updated weights for policy 0, policy_version 356471 (0.0013) [2024-06-15 16:12:15,170][1652475] Updated weights for policy 0, policy_version 356501 (0.0013) [2024-06-15 16:12:15,738][1648984] Fps is (10 sec: 36044.3, 60 sec: 43144.4, 300 sec: 43431.4). Total num frames: 730169344. Throughput: 0: 11036.4. Samples: 182614016. Policy #0 lag: (min: 8.0, avg: 110.1, max: 264.0) [2024-06-15 16:12:15,739][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 16:12:17,177][1652475] Updated weights for policy 0, policy_version 356592 (0.0012) [2024-06-15 16:12:20,738][1648984] Fps is (10 sec: 42597.2, 60 sec: 43690.4, 300 sec: 43542.6). Total num frames: 730333184. Throughput: 0: 10797.4. Samples: 182641664. Policy #0 lag: (min: 8.0, avg: 110.1, max: 264.0) [2024-06-15 16:12:20,739][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 16:12:22,489][1652475] Updated weights for policy 0, policy_version 356657 (0.0037) [2024-06-15 16:12:24,239][1652475] Updated weights for policy 0, policy_version 356720 (0.0041) [2024-06-15 16:12:25,738][1648984] Fps is (10 sec: 42599.2, 60 sec: 42052.2, 300 sec: 43320.4). Total num frames: 730595328. Throughput: 0: 10945.4. Samples: 182706688. 
Policy #0 lag: (min: 8.0, avg: 110.1, max: 264.0) [2024-06-15 16:12:25,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 16:12:27,830][1652475] Updated weights for policy 0, policy_version 356786 (0.0017) [2024-06-15 16:12:28,254][1651340] Signal inference workers to stop experience collection... (18300 times) [2024-06-15 16:12:28,297][1652475] InferenceWorker_p0-w0: stopping experience collection (18300 times) [2024-06-15 16:12:28,562][1651340] Signal inference workers to resume experience collection... (18300 times) [2024-06-15 16:12:28,563][1652475] InferenceWorker_p0-w0: resuming experience collection (18300 times) [2024-06-15 16:12:29,084][1652475] Updated weights for policy 0, policy_version 356833 (0.0013) [2024-06-15 16:12:29,598][1652475] Updated weights for policy 0, policy_version 356864 (0.0011) [2024-06-15 16:12:30,738][1648984] Fps is (10 sec: 52430.1, 60 sec: 43690.6, 300 sec: 43542.5). Total num frames: 730857472. Throughput: 0: 10831.6. Samples: 182774784. Policy #0 lag: (min: 8.0, avg: 110.1, max: 264.0) [2024-06-15 16:12:30,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 16:12:35,556][1652475] Updated weights for policy 0, policy_version 356960 (0.0021) [2024-06-15 16:12:35,738][1648984] Fps is (10 sec: 45875.2, 60 sec: 44236.8, 300 sec: 43431.5). Total num frames: 731054080. Throughput: 0: 11127.5. Samples: 182813696. Policy #0 lag: (min: 8.0, avg: 110.1, max: 264.0) [2024-06-15 16:12:35,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 16:12:39,415][1652475] Updated weights for policy 0, policy_version 357027 (0.0011) [2024-06-15 16:12:40,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 43690.6, 300 sec: 43325.9). Total num frames: 731250688. Throughput: 0: 11070.6. Samples: 182875136. 
Policy #0 lag: (min: 8.0, avg: 110.1, max: 264.0) [2024-06-15 16:12:40,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 16:12:41,964][1652475] Updated weights for policy 0, policy_version 357111 (0.0098) [2024-06-15 16:12:45,738][1648984] Fps is (10 sec: 36044.6, 60 sec: 44236.8, 300 sec: 43209.3). Total num frames: 731414528. Throughput: 0: 11082.0. Samples: 182943744. Policy #0 lag: (min: 8.0, avg: 110.1, max: 264.0) [2024-06-15 16:12:45,739][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 16:12:46,437][1652475] Updated weights for policy 0, policy_version 357168 (0.0145) [2024-06-15 16:12:50,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 43690.7, 300 sec: 43098.2). Total num frames: 731643904. Throughput: 0: 10672.4. Samples: 182963712. Policy #0 lag: (min: 77.0, avg: 165.1, max: 333.0) [2024-06-15 16:12:50,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 16:12:51,190][1652475] Updated weights for policy 0, policy_version 357252 (0.0105) [2024-06-15 16:12:52,493][1652475] Updated weights for policy 0, policy_version 357312 (0.0025) [2024-06-15 16:12:55,738][1648984] Fps is (10 sec: 45875.1, 60 sec: 43144.4, 300 sec: 43431.5). Total num frames: 731873280. Throughput: 0: 10717.8. Samples: 183035904. Policy #0 lag: (min: 77.0, avg: 165.1, max: 333.0) [2024-06-15 16:12:55,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 16:12:55,800][1652475] Updated weights for policy 0, policy_version 357375 (0.0020) [2024-06-15 16:12:55,809][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000357376_731906048.pth... 
[2024-06-15 16:12:55,869][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000352320_721551360.pth [2024-06-15 16:12:58,539][1652475] Updated weights for policy 0, policy_version 357440 (0.0013) [2024-06-15 16:12:59,580][1652475] Updated weights for policy 0, policy_version 357488 (0.0012) [2024-06-15 16:13:00,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 43690.7, 300 sec: 43431.5). Total num frames: 732168192. Throughput: 0: 10661.0. Samples: 183093760. Policy #0 lag: (min: 77.0, avg: 165.1, max: 333.0) [2024-06-15 16:13:00,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 16:13:03,604][1652475] Updated weights for policy 0, policy_version 357538 (0.0013) [2024-06-15 16:13:05,738][1648984] Fps is (10 sec: 42599.0, 60 sec: 41506.2, 300 sec: 43098.3). Total num frames: 732299264. Throughput: 0: 10888.6. Samples: 183131648. Policy #0 lag: (min: 77.0, avg: 165.1, max: 333.0) [2024-06-15 16:13:05,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 16:13:08,713][1652475] Updated weights for policy 0, policy_version 357601 (0.0011) [2024-06-15 16:13:10,612][1652475] Updated weights for policy 0, policy_version 357680 (0.0015) [2024-06-15 16:13:10,738][1648984] Fps is (10 sec: 36044.8, 60 sec: 43690.7, 300 sec: 43431.6). Total num frames: 732528640. Throughput: 0: 10888.5. Samples: 183196672. Policy #0 lag: (min: 77.0, avg: 165.1, max: 333.0) [2024-06-15 16:13:10,744][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 16:13:12,066][1652475] Updated weights for policy 0, policy_version 357751 (0.0015) [2024-06-15 16:13:14,405][1651340] Signal inference workers to stop experience collection... (18350 times) [2024-06-15 16:13:14,437][1652475] InferenceWorker_p0-w0: stopping experience collection (18350 times) [2024-06-15 16:13:14,734][1651340] Signal inference workers to resume experience collection... 
(18350 times) [2024-06-15 16:13:14,735][1652475] InferenceWorker_p0-w0: resuming experience collection (18350 times) [2024-06-15 16:13:15,706][1652475] Updated weights for policy 0, policy_version 357812 (0.0013) [2024-06-15 16:13:15,738][1648984] Fps is (10 sec: 49151.6, 60 sec: 43690.8, 300 sec: 43098.2). Total num frames: 732790784. Throughput: 0: 10649.6. Samples: 183254016. Policy #0 lag: (min: 77.0, avg: 165.1, max: 333.0) [2024-06-15 16:13:15,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 16:13:19,859][1652475] Updated weights for policy 0, policy_version 357840 (0.0020) [2024-06-15 16:13:20,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 43144.8, 300 sec: 42987.2). Total num frames: 732921856. Throughput: 0: 10626.9. Samples: 183291904. Policy #0 lag: (min: 77.0, avg: 165.1, max: 333.0) [2024-06-15 16:13:20,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 16:13:20,894][1652475] Updated weights for policy 0, policy_version 357884 (0.0012) [2024-06-15 16:13:22,507][1652475] Updated weights for policy 0, policy_version 357937 (0.0014) [2024-06-15 16:13:24,644][1652475] Updated weights for policy 0, policy_version 357991 (0.0058) [2024-06-15 16:13:25,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 43690.7, 300 sec: 43209.3). Total num frames: 733216768. Throughput: 0: 10661.0. Samples: 183354880. Policy #0 lag: (min: 77.0, avg: 165.1, max: 333.0) [2024-06-15 16:13:25,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 16:13:26,544][1652475] Updated weights for policy 0, policy_version 358036 (0.0014) [2024-06-15 16:13:27,464][1652475] Updated weights for policy 0, policy_version 358080 (0.0013) [2024-06-15 16:13:30,738][1648984] Fps is (10 sec: 42597.7, 60 sec: 41506.1, 300 sec: 42987.2). Total num frames: 733347840. Throughput: 0: 10706.5. Samples: 183425536. 
Policy #0 lag: (min: 77.0, avg: 165.1, max: 333.0) [2024-06-15 16:13:30,739][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 16:13:31,723][1652475] Updated weights for policy 0, policy_version 358136 (0.0116) [2024-06-15 16:13:33,040][1652475] Updated weights for policy 0, policy_version 358163 (0.0046) [2024-06-15 16:13:35,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 42598.4, 300 sec: 43431.5). Total num frames: 733609984. Throughput: 0: 11002.3. Samples: 183458816. Policy #0 lag: (min: 77.0, avg: 165.1, max: 333.0) [2024-06-15 16:13:35,738][1648984] Avg episode reward: [(0, '-0.570')] [2024-06-15 16:13:37,610][1652475] Updated weights for policy 0, policy_version 358256 (0.0020) [2024-06-15 16:13:39,297][1652475] Updated weights for policy 0, policy_version 358325 (0.0011) [2024-06-15 16:13:40,738][1648984] Fps is (10 sec: 52429.4, 60 sec: 43690.7, 300 sec: 43098.2). Total num frames: 733872128. Throughput: 0: 10706.5. Samples: 183517696. Policy #0 lag: (min: 77.0, avg: 165.1, max: 333.0) [2024-06-15 16:13:40,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 16:13:43,140][1652475] Updated weights for policy 0, policy_version 358384 (0.0024) [2024-06-15 16:13:45,738][1648984] Fps is (10 sec: 39321.9, 60 sec: 43144.6, 300 sec: 43098.2). Total num frames: 734003200. Throughput: 0: 11047.8. Samples: 183590912. Policy #0 lag: (min: 77.0, avg: 165.1, max: 333.0) [2024-06-15 16:13:45,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 16:13:46,911][1652475] Updated weights for policy 0, policy_version 358416 (0.0012) [2024-06-15 16:13:48,733][1652475] Updated weights for policy 0, policy_version 358497 (0.0178) [2024-06-15 16:13:50,738][1648984] Fps is (10 sec: 49152.1, 60 sec: 45329.1, 300 sec: 43431.5). Total num frames: 734363648. Throughput: 0: 10899.9. Samples: 183622144. 
Policy #0 lag: (min: 77.0, avg: 165.1, max: 333.0) [2024-06-15 16:13:50,738][1648984] Avg episode reward: [(0, '-0.570')] [2024-06-15 16:13:50,876][1652475] Updated weights for policy 0, policy_version 358581 (0.0012) [2024-06-15 16:13:54,947][1652475] Updated weights for policy 0, policy_version 358626 (0.0014) [2024-06-15 16:13:55,738][1648984] Fps is (10 sec: 52427.4, 60 sec: 44236.7, 300 sec: 43098.2). Total num frames: 734527488. Throughput: 0: 10751.9. Samples: 183680512. Policy #0 lag: (min: 77.0, avg: 165.1, max: 333.0) [2024-06-15 16:13:55,739][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 16:14:00,540][1652475] Updated weights for policy 0, policy_version 358704 (0.0189) [2024-06-15 16:14:00,738][1648984] Fps is (10 sec: 26214.2, 60 sec: 40960.0, 300 sec: 42987.2). Total num frames: 734625792. Throughput: 0: 10922.7. Samples: 183745536. Policy #0 lag: (min: 77.0, avg: 165.1, max: 333.0) [2024-06-15 16:14:00,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 16:14:01,583][1651340] Signal inference workers to stop experience collection... (18400 times) [2024-06-15 16:14:01,691][1652475] InferenceWorker_p0-w0: stopping experience collection (18400 times) [2024-06-15 16:14:01,874][1651340] Signal inference workers to resume experience collection... (18400 times) [2024-06-15 16:14:01,874][1652475] InferenceWorker_p0-w0: resuming experience collection (18400 times) [2024-06-15 16:14:02,580][1652475] Updated weights for policy 0, policy_version 358784 (0.0017) [2024-06-15 16:14:05,411][1652475] Updated weights for policy 0, policy_version 358844 (0.0142) [2024-06-15 16:14:05,738][1648984] Fps is (10 sec: 39322.6, 60 sec: 43690.7, 300 sec: 42987.2). Total num frames: 734920704. Throughput: 0: 10581.3. Samples: 183768064. 
Policy #0 lag: (min: 47.0, avg: 193.6, max: 351.0) [2024-06-15 16:14:05,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 16:14:07,920][1652475] Updated weights for policy 0, policy_version 358912 (0.0013) [2024-06-15 16:14:10,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 42052.2, 300 sec: 42653.9). Total num frames: 735051776. Throughput: 0: 10604.1. Samples: 183832064. Policy #0 lag: (min: 47.0, avg: 193.6, max: 351.0) [2024-06-15 16:14:10,739][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 16:14:15,153][1652475] Updated weights for policy 0, policy_version 359035 (0.0013) [2024-06-15 16:14:15,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 42052.3, 300 sec: 43098.2). Total num frames: 735313920. Throughput: 0: 10262.8. Samples: 183887360. Policy #0 lag: (min: 47.0, avg: 193.6, max: 351.0) [2024-06-15 16:14:15,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 16:14:17,816][1652475] Updated weights for policy 0, policy_version 359097 (0.0118) [2024-06-15 16:14:20,338][1652475] Updated weights for policy 0, policy_version 359138 (0.0011) [2024-06-15 16:14:20,738][1648984] Fps is (10 sec: 52429.2, 60 sec: 44236.8, 300 sec: 43098.3). Total num frames: 735576064. Throughput: 0: 10251.4. Samples: 183920128. Policy #0 lag: (min: 47.0, avg: 193.6, max: 351.0) [2024-06-15 16:14:20,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 16:14:25,738][1648984] Fps is (10 sec: 36044.8, 60 sec: 40960.0, 300 sec: 42987.3). Total num frames: 735674368. Throughput: 0: 10683.7. Samples: 183998464. 
Policy #0 lag: (min: 47.0, avg: 193.6, max: 351.0) [2024-06-15 16:14:25,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 16:14:26,530][1652475] Updated weights for policy 0, policy_version 359248 (0.0015) [2024-06-15 16:14:28,752][1652475] Updated weights for policy 0, policy_version 359298 (0.0016) [2024-06-15 16:14:30,102][1652475] Updated weights for policy 0, policy_version 359359 (0.0012) [2024-06-15 16:14:30,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 43690.7, 300 sec: 43320.4). Total num frames: 735969280. Throughput: 0: 10251.4. Samples: 184052224. Policy #0 lag: (min: 47.0, avg: 193.6, max: 351.0) [2024-06-15 16:14:30,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 16:14:32,772][1652475] Updated weights for policy 0, policy_version 359419 (0.0015) [2024-06-15 16:14:35,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 41506.1, 300 sec: 42653.9). Total num frames: 736100352. Throughput: 0: 10365.1. Samples: 184088576. Policy #0 lag: (min: 47.0, avg: 193.6, max: 351.0) [2024-06-15 16:14:35,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 16:14:38,436][1652475] Updated weights for policy 0, policy_version 359509 (0.0014) [2024-06-15 16:14:40,632][1652475] Updated weights for policy 0, policy_version 359554 (0.0014) [2024-06-15 16:14:40,738][1648984] Fps is (10 sec: 39320.7, 60 sec: 41506.0, 300 sec: 43098.2). Total num frames: 736362496. Throughput: 0: 10433.4. Samples: 184150016. Policy #0 lag: (min: 47.0, avg: 193.6, max: 351.0) [2024-06-15 16:14:40,739][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 16:14:44,773][1652475] Updated weights for policy 0, policy_version 359632 (0.0012) [2024-06-15 16:14:45,746][1648984] Fps is (10 sec: 52428.8, 60 sec: 43690.6, 300 sec: 43098.3). Total num frames: 736624640. Throughput: 0: 10490.3. Samples: 184217600. 
Policy #0 lag: (min: 47.0, avg: 193.6, max: 351.0) [2024-06-15 16:14:45,746][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 16:14:49,908][1651340] Signal inference workers to stop experience collection... (18450 times) [2024-06-15 16:14:49,946][1652475] InferenceWorker_p0-w0: stopping experience collection (18450 times) [2024-06-15 16:14:50,140][1651340] Signal inference workers to resume experience collection... (18450 times) [2024-06-15 16:14:50,141][1652475] InferenceWorker_p0-w0: resuming experience collection (18450 times) [2024-06-15 16:14:50,142][1652475] Updated weights for policy 0, policy_version 359744 (0.0012) [2024-06-15 16:14:50,738][1648984] Fps is (10 sec: 42599.6, 60 sec: 40413.9, 300 sec: 43098.3). Total num frames: 736788480. Throughput: 0: 10729.2. Samples: 184250880. Policy #0 lag: (min: 47.0, avg: 193.6, max: 351.0) [2024-06-15 16:14:50,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 16:14:51,643][1652475] Updated weights for policy 0, policy_version 359798 (0.0012) [2024-06-15 16:14:53,952][1652475] Updated weights for policy 0, policy_version 359830 (0.0012) [2024-06-15 16:14:55,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 41506.2, 300 sec: 42987.2). Total num frames: 737017856. Throughput: 0: 10683.7. Samples: 184312832. Policy #0 lag: (min: 47.0, avg: 193.6, max: 351.0) [2024-06-15 16:14:55,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 16:14:55,743][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000359872_737017856.pth... [2024-06-15 16:14:55,832][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000354848_726728704.pth [2024-06-15 16:14:57,590][1652475] Updated weights for policy 0, policy_version 359904 (0.0087) [2024-06-15 16:14:59,959][1652475] Updated weights for policy 0, policy_version 359952 (0.0013) [2024-06-15 16:15:00,738][1648984] Fps is (10 sec: 45875.1, 60 sec: 43690.7, 300 sec: 42987.2). 
Total num frames: 737247232. Throughput: 0: 11013.7. Samples: 184382976. Policy #0 lag: (min: 47.0, avg: 193.6, max: 351.0) [2024-06-15 16:15:00,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 16:15:01,115][1652475] Updated weights for policy 0, policy_version 360000 (0.0012) [2024-06-15 16:15:02,713][1652475] Updated weights for policy 0, policy_version 360064 (0.0081) [2024-06-15 16:15:05,738][1648984] Fps is (10 sec: 45875.2, 60 sec: 42598.3, 300 sec: 42987.2). Total num frames: 737476608. Throughput: 0: 10934.0. Samples: 184412160. Policy #0 lag: (min: 47.0, avg: 193.6, max: 351.0) [2024-06-15 16:15:05,739][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 16:15:06,136][1652475] Updated weights for policy 0, policy_version 360128 (0.0014) [2024-06-15 16:15:10,738][1648984] Fps is (10 sec: 36044.9, 60 sec: 42598.5, 300 sec: 42431.8). Total num frames: 737607680. Throughput: 0: 10752.0. Samples: 184482304. Policy #0 lag: (min: 47.0, avg: 193.6, max: 351.0) [2024-06-15 16:15:10,738][1648984] Avg episode reward: [(0, '-0.590')] [2024-06-15 16:15:11,126][1652475] Updated weights for policy 0, policy_version 360186 (0.0014) [2024-06-15 16:15:12,667][1652475] Updated weights for policy 0, policy_version 360249 (0.0012) [2024-06-15 16:15:14,429][1652475] Updated weights for policy 0, policy_version 360313 (0.0016) [2024-06-15 16:15:15,738][1648984] Fps is (10 sec: 45875.1, 60 sec: 43690.6, 300 sec: 43098.3). Total num frames: 737935360. Throughput: 0: 10843.0. Samples: 184540160. Policy #0 lag: (min: 47.0, avg: 193.6, max: 351.0) [2024-06-15 16:15:15,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 16:15:17,112][1652475] Updated weights for policy 0, policy_version 360356 (0.0013) [2024-06-15 16:15:20,738][1648984] Fps is (10 sec: 45875.0, 60 sec: 41506.1, 300 sec: 42765.0). Total num frames: 738066432. Throughput: 0: 10797.5. Samples: 184574464. 
Policy #0 lag: (min: 47.0, avg: 193.6, max: 351.0) [2024-06-15 16:15:20,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 16:15:23,354][1652475] Updated weights for policy 0, policy_version 360444 (0.0062) [2024-06-15 16:15:24,747][1652475] Updated weights for policy 0, policy_version 360484 (0.0015) [2024-06-15 16:15:25,738][1648984] Fps is (10 sec: 42598.1, 60 sec: 44782.8, 300 sec: 43209.3). Total num frames: 738361344. Throughput: 0: 10979.6. Samples: 184644096. Policy #0 lag: (min: 25.0, avg: 71.6, max: 217.0) [2024-06-15 16:15:25,739][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 16:15:26,874][1652475] Updated weights for policy 0, policy_version 360569 (0.0017) [2024-06-15 16:15:29,326][1652475] Updated weights for policy 0, policy_version 360633 (0.0014) [2024-06-15 16:15:30,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 43690.7, 300 sec: 43098.2). Total num frames: 738590720. Throughput: 0: 10831.7. Samples: 184705024. Policy #0 lag: (min: 25.0, avg: 71.6, max: 217.0) [2024-06-15 16:15:30,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 16:15:35,738][1648984] Fps is (10 sec: 29491.6, 60 sec: 42598.4, 300 sec: 42876.1). Total num frames: 738656256. Throughput: 0: 10877.1. Samples: 184740352. Policy #0 lag: (min: 25.0, avg: 71.6, max: 217.0) [2024-06-15 16:15:35,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 16:15:36,112][1652475] Updated weights for policy 0, policy_version 360702 (0.0134) [2024-06-15 16:15:37,115][1651340] Signal inference workers to stop experience collection... (18500 times) [2024-06-15 16:15:37,212][1652475] InferenceWorker_p0-w0: stopping experience collection (18500 times) [2024-06-15 16:15:37,415][1651340] Signal inference workers to resume experience collection... 
(18500 times) [2024-06-15 16:15:37,416][1652475] InferenceWorker_p0-w0: resuming experience collection (18500 times) [2024-06-15 16:15:37,929][1652475] Updated weights for policy 0, policy_version 360755 (0.0132) [2024-06-15 16:15:39,487][1652475] Updated weights for policy 0, policy_version 360816 (0.0014) [2024-06-15 16:15:40,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 43690.8, 300 sec: 42876.1). Total num frames: 738983936. Throughput: 0: 10706.5. Samples: 184794624. Policy #0 lag: (min: 25.0, avg: 71.6, max: 217.0) [2024-06-15 16:15:40,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 16:15:41,469][1652475] Updated weights for policy 0, policy_version 360864 (0.0013) [2024-06-15 16:15:45,738][1648984] Fps is (10 sec: 45875.1, 60 sec: 41506.1, 300 sec: 42653.9). Total num frames: 739115008. Throughput: 0: 10626.8. Samples: 184861184. Policy #0 lag: (min: 25.0, avg: 71.6, max: 217.0) [2024-06-15 16:15:45,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 16:15:47,457][1652475] Updated weights for policy 0, policy_version 360912 (0.0021) [2024-06-15 16:15:50,466][1652475] Updated weights for policy 0, policy_version 361008 (0.0017) [2024-06-15 16:15:50,738][1648984] Fps is (10 sec: 36045.1, 60 sec: 42598.4, 300 sec: 42987.4). Total num frames: 739344384. Throughput: 0: 10615.5. Samples: 184889856. Policy #0 lag: (min: 25.0, avg: 71.6, max: 217.0) [2024-06-15 16:15:50,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 16:15:52,712][1652475] Updated weights for policy 0, policy_version 361056 (0.0016) [2024-06-15 16:15:55,738][1648984] Fps is (10 sec: 52426.9, 60 sec: 43690.4, 300 sec: 42876.1). Total num frames: 739639296. Throughput: 0: 10365.1. Samples: 184948736. Policy #0 lag: (min: 25.0, avg: 71.6, max: 217.0) [2024-06-15 16:15:55,739][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 16:16:00,738][1648984] Fps is (10 sec: 29491.2, 60 sec: 39867.7, 300 sec: 42654.0). Total num frames: 739639296. 
Throughput: 0: 10683.8. Samples: 185020928. Policy #0 lag: (min: 25.0, avg: 71.6, max: 217.0) [2024-06-15 16:16:00,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 16:16:01,027][1652475] Updated weights for policy 0, policy_version 361168 (0.0016) [2024-06-15 16:16:03,536][1652475] Updated weights for policy 0, policy_version 361274 (0.0130) [2024-06-15 16:16:05,738][1648984] Fps is (10 sec: 36046.3, 60 sec: 42052.3, 300 sec: 42765.0). Total num frames: 739999744. Throughput: 0: 10456.2. Samples: 185044992. Policy #0 lag: (min: 25.0, avg: 71.6, max: 217.0) [2024-06-15 16:16:05,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 16:16:05,948][1652475] Updated weights for policy 0, policy_version 361346 (0.0137) [2024-06-15 16:16:10,738][1648984] Fps is (10 sec: 52428.3, 60 sec: 42598.3, 300 sec: 42653.9). Total num frames: 740163584. Throughput: 0: 10228.6. Samples: 185104384. Policy #0 lag: (min: 25.0, avg: 71.6, max: 217.0) [2024-06-15 16:16:10,739][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 16:16:13,184][1652475] Updated weights for policy 0, policy_version 361412 (0.0012) [2024-06-15 16:16:14,341][1652475] Updated weights for policy 0, policy_version 361469 (0.0097) [2024-06-15 16:16:15,739][1648984] Fps is (10 sec: 29487.8, 60 sec: 39320.9, 300 sec: 42653.8). Total num frames: 740294656. Throughput: 0: 10410.4. Samples: 185173504. Policy #0 lag: (min: 25.0, avg: 71.6, max: 217.0) [2024-06-15 16:16:15,739][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 16:16:17,326][1652475] Updated weights for policy 0, policy_version 361552 (0.0015) [2024-06-15 16:16:19,145][1652475] Updated weights for policy 0, policy_version 361621 (0.0014) [2024-06-15 16:16:20,738][1648984] Fps is (10 sec: 52429.3, 60 sec: 43690.7, 300 sec: 42765.0). Total num frames: 740687872. Throughput: 0: 10149.0. Samples: 185197056. 
Policy #0 lag: (min: 25.0, avg: 71.6, max: 217.0) [2024-06-15 16:16:20,738][1648984] Avg episode reward: [(0, '-0.560')] [2024-06-15 16:16:24,986][1651340] Signal inference workers to stop experience collection... (18550 times) [2024-06-15 16:16:25,020][1652475] InferenceWorker_p0-w0: stopping experience collection (18550 times) [2024-06-15 16:16:25,261][1651340] Signal inference workers to resume experience collection... (18550 times) [2024-06-15 16:16:25,262][1652475] InferenceWorker_p0-w0: resuming experience collection (18550 times) [2024-06-15 16:16:25,381][1652475] Updated weights for policy 0, policy_version 361680 (0.0130) [2024-06-15 16:16:25,738][1648984] Fps is (10 sec: 45880.4, 60 sec: 39867.8, 300 sec: 42431.8). Total num frames: 740753408. Throughput: 0: 10478.9. Samples: 185266176. Policy #0 lag: (min: 25.0, avg: 71.6, max: 217.0) [2024-06-15 16:16:25,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 16:16:26,314][1652475] Updated weights for policy 0, policy_version 361728 (0.0038) [2024-06-15 16:16:29,253][1652475] Updated weights for policy 0, policy_version 361808 (0.0014) [2024-06-15 16:16:30,738][1648984] Fps is (10 sec: 39321.3, 60 sec: 41506.1, 300 sec: 42987.2). Total num frames: 741081088. Throughput: 0: 10228.6. Samples: 185321472. Policy #0 lag: (min: 25.0, avg: 71.6, max: 217.0) [2024-06-15 16:16:30,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 16:16:32,528][1652475] Updated weights for policy 0, policy_version 361872 (0.0013) [2024-06-15 16:16:33,785][1652475] Updated weights for policy 0, policy_version 361919 (0.0012) [2024-06-15 16:16:35,738][1648984] Fps is (10 sec: 45875.4, 60 sec: 42598.4, 300 sec: 42653.9). Total num frames: 741212160. Throughput: 0: 10331.0. Samples: 185354752. 
Policy #0 lag: (min: 25.0, avg: 71.6, max: 217.0) [2024-06-15 16:16:35,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 16:16:38,350][1652475] Updated weights for policy 0, policy_version 361977 (0.0014) [2024-06-15 16:16:40,738][1648984] Fps is (10 sec: 36044.8, 60 sec: 40960.0, 300 sec: 42987.2). Total num frames: 741441536. Throughput: 0: 10604.2. Samples: 185425920. Policy #0 lag: (min: 36.0, avg: 135.4, max: 292.0) [2024-06-15 16:16:40,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 16:16:40,910][1652475] Updated weights for policy 0, policy_version 362048 (0.0015) [2024-06-15 16:16:42,043][1652475] Updated weights for policy 0, policy_version 362099 (0.0014) [2024-06-15 16:16:45,154][1652475] Updated weights for policy 0, policy_version 362144 (0.0012) [2024-06-15 16:16:45,738][1648984] Fps is (10 sec: 49151.8, 60 sec: 43144.6, 300 sec: 42987.2). Total num frames: 741703680. Throughput: 0: 10410.7. Samples: 185489408. Policy #0 lag: (min: 36.0, avg: 135.4, max: 292.0) [2024-06-15 16:16:45,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 16:16:48,773][1652475] Updated weights for policy 0, policy_version 362179 (0.0037) [2024-06-15 16:16:50,158][1652475] Updated weights for policy 0, policy_version 362240 (0.0134) [2024-06-15 16:16:50,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 42052.2, 300 sec: 42653.9). Total num frames: 741867520. Throughput: 0: 10729.2. Samples: 185527808. Policy #0 lag: (min: 36.0, avg: 135.4, max: 292.0) [2024-06-15 16:16:50,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 16:16:52,306][1652475] Updated weights for policy 0, policy_version 362290 (0.0025) [2024-06-15 16:16:53,732][1652475] Updated weights for policy 0, policy_version 362352 (0.0033) [2024-06-15 16:16:55,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 41506.4, 300 sec: 42653.9). Total num frames: 742129664. Throughput: 0: 10706.5. Samples: 185586176. 
Policy #0 lag: (min: 36.0, avg: 135.4, max: 292.0) [2024-06-15 16:16:55,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 16:16:55,749][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000362368_742129664.pth... [2024-06-15 16:16:55,803][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000357376_731906048.pth [2024-06-15 16:16:57,378][1652475] Updated weights for policy 0, policy_version 362416 (0.0025) [2024-06-15 16:17:00,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 43690.7, 300 sec: 42209.6). Total num frames: 742260736. Throughput: 0: 10775.0. Samples: 185658368. Policy #0 lag: (min: 36.0, avg: 135.4, max: 292.0) [2024-06-15 16:17:00,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 16:17:01,096][1652475] Updated weights for policy 0, policy_version 362448 (0.0013) [2024-06-15 16:17:04,183][1652475] Updated weights for policy 0, policy_version 362528 (0.0012) [2024-06-15 16:17:05,348][1652475] Updated weights for policy 0, policy_version 362576 (0.0013) [2024-06-15 16:17:05,738][1648984] Fps is (10 sec: 42598.6, 60 sec: 42598.4, 300 sec: 42876.1). Total num frames: 742555648. Throughput: 0: 10945.4. Samples: 185689600. Policy #0 lag: (min: 36.0, avg: 135.4, max: 292.0) [2024-06-15 16:17:05,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 16:17:08,473][1651340] Signal inference workers to stop experience collection... (18600 times) [2024-06-15 16:17:08,521][1652475] InferenceWorker_p0-w0: stopping experience collection (18600 times) [2024-06-15 16:17:08,716][1651340] Signal inference workers to resume experience collection... (18600 times) [2024-06-15 16:17:08,717][1652475] InferenceWorker_p0-w0: resuming experience collection (18600 times) [2024-06-15 16:17:08,896][1652475] Updated weights for policy 0, policy_version 362660 (0.0013) [2024-06-15 16:17:10,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 43690.7, 300 sec: 42765.0). 
Total num frames: 742785024. Throughput: 0: 10740.6. Samples: 185749504. Policy #0 lag: (min: 36.0, avg: 135.4, max: 292.0) [2024-06-15 16:17:10,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 16:17:13,764][1652475] Updated weights for policy 0, policy_version 362722 (0.0016) [2024-06-15 16:17:15,740][1648984] Fps is (10 sec: 39321.3, 60 sec: 44237.6, 300 sec: 42765.1). Total num frames: 742948864. Throughput: 0: 11173.0. Samples: 185824256. Policy #0 lag: (min: 36.0, avg: 135.4, max: 292.0) [2024-06-15 16:17:15,741][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 16:17:16,059][1652475] Updated weights for policy 0, policy_version 362784 (0.0013) [2024-06-15 16:17:18,605][1652475] Updated weights for policy 0, policy_version 362880 (0.0029) [2024-06-15 16:17:20,738][1648984] Fps is (10 sec: 42597.7, 60 sec: 42052.2, 300 sec: 42765.0). Total num frames: 743211008. Throughput: 0: 10877.1. Samples: 185844224. Policy #0 lag: (min: 36.0, avg: 135.4, max: 292.0) [2024-06-15 16:17:20,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 16:17:21,569][1652475] Updated weights for policy 0, policy_version 362937 (0.0020) [2024-06-15 16:17:25,738][1648984] Fps is (10 sec: 36044.7, 60 sec: 42598.3, 300 sec: 42209.6). Total num frames: 743309312. Throughput: 0: 10865.8. Samples: 185914880. Policy #0 lag: (min: 36.0, avg: 135.4, max: 292.0) [2024-06-15 16:17:25,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 16:17:27,071][1652475] Updated weights for policy 0, policy_version 363000 (0.0019) [2024-06-15 16:17:29,302][1652475] Updated weights for policy 0, policy_version 363088 (0.0014) [2024-06-15 16:17:30,738][1648984] Fps is (10 sec: 49152.7, 60 sec: 43690.7, 300 sec: 42876.1). Total num frames: 743702528. Throughput: 0: 10672.3. Samples: 185969664. 
Policy #0 lag: (min: 36.0, avg: 135.4, max: 292.0) [2024-06-15 16:17:30,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 16:17:32,048][1652475] Updated weights for policy 0, policy_version 363152 (0.0105) [2024-06-15 16:17:35,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 43690.6, 300 sec: 42653.9). Total num frames: 743833600. Throughput: 0: 10547.2. Samples: 186002432. Policy #0 lag: (min: 36.0, avg: 135.4, max: 292.0) [2024-06-15 16:17:35,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 16:17:38,757][1652475] Updated weights for policy 0, policy_version 363220 (0.0014) [2024-06-15 16:17:40,739][1648984] Fps is (10 sec: 32762.9, 60 sec: 43143.4, 300 sec: 42764.8). Total num frames: 744030208. Throughput: 0: 11024.7. Samples: 186082304. Policy #0 lag: (min: 36.0, avg: 135.4, max: 292.0) [2024-06-15 16:17:40,740][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 16:17:41,299][1652475] Updated weights for policy 0, policy_version 363327 (0.0171) [2024-06-15 16:17:42,706][1652475] Updated weights for policy 0, policy_version 363392 (0.0016) [2024-06-15 16:17:45,473][1652475] Updated weights for policy 0, policy_version 363454 (0.0130) [2024-06-15 16:17:45,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 44236.7, 300 sec: 43098.2). Total num frames: 744357888. Throughput: 0: 10444.8. Samples: 186128384. Policy #0 lag: (min: 36.0, avg: 135.4, max: 292.0) [2024-06-15 16:17:45,739][1648984] Avg episode reward: [(0, '-0.520')] [2024-06-15 16:17:50,738][1648984] Fps is (10 sec: 32772.5, 60 sec: 41506.0, 300 sec: 42320.7). Total num frames: 744357888. Throughput: 0: 10604.0. Samples: 186166784. 
Policy #0 lag: (min: 36.0, avg: 135.4, max: 292.0) [2024-06-15 16:17:50,740][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 16:17:52,427][1652475] Updated weights for policy 0, policy_version 363520 (0.0122) [2024-06-15 16:17:53,664][1652475] Updated weights for policy 0, policy_version 363578 (0.0049) [2024-06-15 16:17:55,753][1648984] Fps is (10 sec: 29447.7, 60 sec: 42041.9, 300 sec: 42318.6). Total num frames: 744652800. Throughput: 0: 10657.5. Samples: 186229248. Policy #0 lag: (min: 36.0, avg: 135.4, max: 292.0) [2024-06-15 16:17:55,753][1648984] Avg episode reward: [(0, '-0.530')] [2024-06-15 16:17:55,789][1651340] Signal inference workers to stop experience collection... (18650 times) [2024-06-15 16:17:55,816][1652475] InferenceWorker_p0-w0: stopping experience collection (18650 times) [2024-06-15 16:17:56,133][1651340] Signal inference workers to resume experience collection... (18650 times) [2024-06-15 16:17:56,134][1652475] InferenceWorker_p0-w0: resuming experience collection (18650 times) [2024-06-15 16:17:57,093][1652475] Updated weights for policy 0, policy_version 363649 (0.0014) [2024-06-15 16:18:00,738][1648984] Fps is (10 sec: 52430.0, 60 sec: 43690.7, 300 sec: 42653.9). Total num frames: 744882176. Throughput: 0: 10285.5. Samples: 186287104. Policy #0 lag: (min: 119.0, avg: 220.2, max: 311.0) [2024-06-15 16:18:00,741][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 16:18:04,057][1652475] Updated weights for policy 0, policy_version 363728 (0.0012) [2024-06-15 16:18:05,738][1648984] Fps is (10 sec: 36097.9, 60 sec: 40959.9, 300 sec: 42320.7). Total num frames: 745013248. Throughput: 0: 10683.7. Samples: 186324992. 
Policy #0 lag: (min: 119.0, avg: 220.2, max: 311.0) [2024-06-15 16:18:05,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 16:18:06,884][1652475] Updated weights for policy 0, policy_version 363808 (0.0108) [2024-06-15 16:18:08,288][1652475] Updated weights for policy 0, policy_version 363857 (0.0011) [2024-06-15 16:18:09,820][1652475] Updated weights for policy 0, policy_version 363920 (0.0014) [2024-06-15 16:18:10,738][1648984] Fps is (10 sec: 49151.6, 60 sec: 43144.5, 300 sec: 42653.9). Total num frames: 745373696. Throughput: 0: 10296.9. Samples: 186378240. Policy #0 lag: (min: 119.0, avg: 220.2, max: 311.0) [2024-06-15 16:18:10,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 16:18:15,738][1648984] Fps is (10 sec: 39322.3, 60 sec: 40960.1, 300 sec: 42320.7). Total num frames: 745406464. Throughput: 0: 10626.9. Samples: 186447872. Policy #0 lag: (min: 119.0, avg: 220.2, max: 311.0) [2024-06-15 16:18:15,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 16:18:16,628][1652475] Updated weights for policy 0, policy_version 363986 (0.0013) [2024-06-15 16:18:17,367][1652475] Updated weights for policy 0, policy_version 364026 (0.0021) [2024-06-15 16:18:19,853][1652475] Updated weights for policy 0, policy_version 364065 (0.0014) [2024-06-15 16:18:20,738][1648984] Fps is (10 sec: 29491.3, 60 sec: 40960.1, 300 sec: 42209.6). Total num frames: 745668608. Throughput: 0: 10649.6. Samples: 186481664. Policy #0 lag: (min: 119.0, avg: 220.2, max: 311.0) [2024-06-15 16:18:20,739][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 16:18:21,983][1652475] Updated weights for policy 0, policy_version 364148 (0.0013) [2024-06-15 16:18:23,490][1652475] Updated weights for policy 0, policy_version 364215 (0.0123) [2024-06-15 16:18:25,738][1648984] Fps is (10 sec: 52428.5, 60 sec: 43690.7, 300 sec: 42654.0). Total num frames: 745930752. Throughput: 0: 9990.0. Samples: 186531840. 
Policy #0 lag: (min: 119.0, avg: 220.2, max: 311.0) [2024-06-15 16:18:25,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 16:18:30,738][1648984] Fps is (10 sec: 26214.6, 60 sec: 37137.1, 300 sec: 41765.3). Total num frames: 745930752. Throughput: 0: 10581.4. Samples: 186604544. Policy #0 lag: (min: 119.0, avg: 220.2, max: 311.0) [2024-06-15 16:18:30,738][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 16:18:32,701][1652475] Updated weights for policy 0, policy_version 364288 (0.0012) [2024-06-15 16:18:34,969][1652475] Updated weights for policy 0, policy_version 364384 (0.0013) [2024-06-15 16:18:35,751][1648984] Fps is (10 sec: 39268.1, 60 sec: 41496.7, 300 sec: 42207.7). Total num frames: 746323968. Throughput: 0: 10271.1. Samples: 186629120. Policy #0 lag: (min: 119.0, avg: 220.2, max: 311.0) [2024-06-15 16:18:35,752][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 16:18:36,593][1652475] Updated weights for policy 0, policy_version 364464 (0.0015) [2024-06-15 16:18:40,738][1648984] Fps is (10 sec: 52428.4, 60 sec: 40414.9, 300 sec: 42209.6). Total num frames: 746455040. Throughput: 0: 10095.4. Samples: 186683392. Policy #0 lag: (min: 119.0, avg: 220.2, max: 311.0) [2024-06-15 16:18:40,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 16:18:44,125][1651340] Signal inference workers to stop experience collection... (18700 times) [2024-06-15 16:18:44,186][1652475] InferenceWorker_p0-w0: stopping experience collection (18700 times) [2024-06-15 16:18:44,388][1651340] Signal inference workers to resume experience collection... (18700 times) [2024-06-15 16:18:44,389][1652475] InferenceWorker_p0-w0: resuming experience collection (18700 times) [2024-06-15 16:18:44,575][1652475] Updated weights for policy 0, policy_version 364518 (0.0017) [2024-06-15 16:18:45,738][1648984] Fps is (10 sec: 29531.1, 60 sec: 37683.2, 300 sec: 41543.1). Total num frames: 746618880. Throughput: 0: 10296.9. Samples: 186750464. 
Policy #0 lag: (min: 119.0, avg: 220.2, max: 311.0) [2024-06-15 16:18:45,739][1648984] Avg episode reward: [(0, '-0.550')] [2024-06-15 16:18:46,550][1652475] Updated weights for policy 0, policy_version 364608 (0.0013) [2024-06-15 16:18:48,408][1652475] Updated weights for policy 0, policy_version 364688 (0.0114) [2024-06-15 16:18:50,738][1648984] Fps is (10 sec: 52429.2, 60 sec: 43690.8, 300 sec: 42209.7). Total num frames: 746979328. Throughput: 0: 9978.4. Samples: 186774016. Policy #0 lag: (min: 119.0, avg: 220.2, max: 311.0) [2024-06-15 16:18:50,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 16:18:55,738][1648984] Fps is (10 sec: 36045.1, 60 sec: 38785.0, 300 sec: 41876.4). Total num frames: 746979328. Throughput: 0: 10353.8. Samples: 186844160. Policy #0 lag: (min: 119.0, avg: 220.2, max: 311.0) [2024-06-15 16:18:55,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 16:18:56,059][1652475] Updated weights for policy 0, policy_version 364752 (0.0026) [2024-06-15 16:18:56,075][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000364752_747012096.pth... [2024-06-15 16:18:56,260][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000359872_737017856.pth [2024-06-15 16:18:57,634][1652475] Updated weights for policy 0, policy_version 364816 (0.0013) [2024-06-15 16:18:59,404][1652475] Updated weights for policy 0, policy_version 364896 (0.0017) [2024-06-15 16:19:00,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 41506.2, 300 sec: 42209.6). Total num frames: 747372544. Throughput: 0: 10069.3. Samples: 186900992. Policy #0 lag: (min: 119.0, avg: 220.2, max: 311.0) [2024-06-15 16:19:00,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 16:19:01,496][1652475] Updated weights for policy 0, policy_version 364960 (0.0014) [2024-06-15 16:19:05,738][1648984] Fps is (10 sec: 52429.1, 60 sec: 41506.2, 300 sec: 42209.6). Total num frames: 747503616. Throughput: 0: 10092.1. 
Samples: 186935808. Policy #0 lag: (min: 119.0, avg: 220.2, max: 311.0) [2024-06-15 16:19:05,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 16:19:08,782][1652475] Updated weights for policy 0, policy_version 365045 (0.0013) [2024-06-15 16:19:10,391][1652475] Updated weights for policy 0, policy_version 365120 (0.0024) [2024-06-15 16:19:10,738][1648984] Fps is (10 sec: 39320.2, 60 sec: 39867.6, 300 sec: 42209.6). Total num frames: 747765760. Throughput: 0: 10660.9. Samples: 187011584. Policy #0 lag: (min: 119.0, avg: 220.2, max: 311.0) [2024-06-15 16:19:10,739][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 16:19:12,977][1652475] Updated weights for policy 0, policy_version 365201 (0.0014) [2024-06-15 16:19:13,881][1652475] Updated weights for policy 0, policy_version 365248 (0.0013) [2024-06-15 16:19:15,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 43690.6, 300 sec: 42209.6). Total num frames: 748027904. Throughput: 0: 10353.8. Samples: 187070464. Policy #0 lag: (min: 119.0, avg: 220.2, max: 311.0) [2024-06-15 16:19:15,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 16:19:20,732][1652475] Updated weights for policy 0, policy_version 365312 (0.0024) [2024-06-15 16:19:20,738][1648984] Fps is (10 sec: 39320.1, 60 sec: 41505.7, 300 sec: 42320.6). Total num frames: 748158976. Throughput: 0: 10789.2. Samples: 187114496. Policy #0 lag: (min: 15.0, avg: 73.8, max: 271.0) [2024-06-15 16:19:20,739][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 16:19:22,587][1651340] Signal inference workers to stop experience collection... (18750 times) [2024-06-15 16:19:22,615][1652475] InferenceWorker_p0-w0: stopping experience collection (18750 times) [2024-06-15 16:19:22,904][1651340] Signal inference workers to resume experience collection... 
(18750 times) [2024-06-15 16:19:22,905][1652475] InferenceWorker_p0-w0: resuming experience collection (18750 times) [2024-06-15 16:19:23,077][1652475] Updated weights for policy 0, policy_version 365411 (0.0016) [2024-06-15 16:19:24,221][1652475] Updated weights for policy 0, policy_version 365456 (0.0015) [2024-06-15 16:19:25,183][1652475] Updated weights for policy 0, policy_version 365498 (0.0017) [2024-06-15 16:19:25,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 43690.7, 300 sec: 42653.9). Total num frames: 748552192. Throughput: 0: 10786.1. Samples: 187168768. Policy #0 lag: (min: 15.0, avg: 73.8, max: 271.0) [2024-06-15 16:19:25,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 16:19:30,738][1648984] Fps is (10 sec: 39323.7, 60 sec: 43690.5, 300 sec: 42209.6). Total num frames: 748552192. Throughput: 0: 11116.1. Samples: 187250688. Policy #0 lag: (min: 15.0, avg: 73.8, max: 271.0) [2024-06-15 16:19:30,739][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 16:19:32,017][1652475] Updated weights for policy 0, policy_version 365573 (0.0014) [2024-06-15 16:19:33,057][1652475] Updated weights for policy 0, policy_version 365632 (0.0016) [2024-06-15 16:19:34,532][1652475] Updated weights for policy 0, policy_version 365691 (0.0031) [2024-06-15 16:19:35,738][1648984] Fps is (10 sec: 45875.6, 60 sec: 44793.2, 300 sec: 42876.1). Total num frames: 749010944. Throughput: 0: 11218.5. Samples: 187278848. Policy #0 lag: (min: 15.0, avg: 73.8, max: 271.0) [2024-06-15 16:19:35,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 16:19:36,053][1652475] Updated weights for policy 0, policy_version 365754 (0.0016) [2024-06-15 16:19:40,738][1648984] Fps is (10 sec: 52429.7, 60 sec: 43690.7, 300 sec: 42209.6). Total num frames: 749076480. Throughput: 0: 11138.9. Samples: 187345408. 
Policy #0 lag: (min: 15.0, avg: 73.8, max: 271.0) [2024-06-15 16:19:40,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 16:19:43,597][1652475] Updated weights for policy 0, policy_version 365824 (0.0015) [2024-06-15 16:19:45,573][1652475] Updated weights for policy 0, policy_version 365905 (0.0015) [2024-06-15 16:19:45,738][1648984] Fps is (10 sec: 36044.5, 60 sec: 45875.3, 300 sec: 42653.9). Total num frames: 749371392. Throughput: 0: 11298.1. Samples: 187409408. Policy #0 lag: (min: 15.0, avg: 73.8, max: 271.0) [2024-06-15 16:19:45,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 16:19:47,524][1652475] Updated weights for policy 0, policy_version 365984 (0.0014) [2024-06-15 16:19:50,738][1648984] Fps is (10 sec: 52428.3, 60 sec: 43690.6, 300 sec: 42653.9). Total num frames: 749600768. Throughput: 0: 11047.8. Samples: 187432960. Policy #0 lag: (min: 15.0, avg: 73.8, max: 271.0) [2024-06-15 16:19:50,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 16:19:55,507][1652475] Updated weights for policy 0, policy_version 366038 (0.0013) [2024-06-15 16:19:55,738][1648984] Fps is (10 sec: 29491.2, 60 sec: 44782.9, 300 sec: 42098.5). Total num frames: 749666304. Throughput: 0: 11025.1. Samples: 187507712. Policy #0 lag: (min: 15.0, avg: 73.8, max: 271.0) [2024-06-15 16:19:55,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 16:19:57,323][1652475] Updated weights for policy 0, policy_version 366112 (0.0018) [2024-06-15 16:19:58,755][1652475] Updated weights for policy 0, policy_version 366163 (0.0013) [2024-06-15 16:20:00,568][1652475] Updated weights for policy 0, policy_version 366240 (0.0014) [2024-06-15 16:20:00,738][1648984] Fps is (10 sec: 45875.1, 60 sec: 44782.8, 300 sec: 42653.9). Total num frames: 750059520. Throughput: 0: 10979.5. Samples: 187564544. 
Policy #0 lag: (min: 15.0, avg: 73.8, max: 271.0) [2024-06-15 16:20:00,739][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 16:20:05,738][1648984] Fps is (10 sec: 45875.0, 60 sec: 43690.6, 300 sec: 42431.8). Total num frames: 750125056. Throughput: 0: 10752.1. Samples: 187598336. Policy #0 lag: (min: 15.0, avg: 73.8, max: 271.0) [2024-06-15 16:20:05,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 16:20:06,704][1652475] Updated weights for policy 0, policy_version 366288 (0.0014) [2024-06-15 16:20:08,211][1651340] Signal inference workers to stop experience collection... (18800 times) [2024-06-15 16:20:08,290][1652475] InferenceWorker_p0-w0: stopping experience collection (18800 times) [2024-06-15 16:20:08,485][1651340] Signal inference workers to resume experience collection... (18800 times) [2024-06-15 16:20:08,487][1652475] InferenceWorker_p0-w0: resuming experience collection (18800 times) [2024-06-15 16:20:08,657][1652475] Updated weights for policy 0, policy_version 366354 (0.0013) [2024-06-15 16:20:10,624][1652475] Updated weights for policy 0, policy_version 366417 (0.0013) [2024-06-15 16:20:10,738][1648984] Fps is (10 sec: 36045.0, 60 sec: 44237.0, 300 sec: 42320.7). Total num frames: 750419968. Throughput: 0: 11047.8. Samples: 187665920. Policy #0 lag: (min: 15.0, avg: 73.8, max: 271.0) [2024-06-15 16:20:10,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 16:20:12,567][1652475] Updated weights for policy 0, policy_version 366501 (0.0011) [2024-06-15 16:20:15,738][1648984] Fps is (10 sec: 52427.6, 60 sec: 43690.5, 300 sec: 42653.9). Total num frames: 750649344. Throughput: 0: 10615.4. Samples: 187728384. Policy #0 lag: (min: 15.0, avg: 73.8, max: 271.0) [2024-06-15 16:20:15,739][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 16:20:18,414][1652475] Updated weights for policy 0, policy_version 366544 (0.0012) [2024-06-15 16:20:20,738][1648984] Fps is (10 sec: 36044.9, 60 sec: 43691.1, 300 sec: 42098.6). 
Total num frames: 750780416. Throughput: 0: 10786.1. Samples: 187764224. Policy #0 lag: (min: 15.0, avg: 73.8, max: 271.0) [2024-06-15 16:20:20,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 16:20:22,610][1652475] Updated weights for policy 0, policy_version 366640 (0.0185) [2024-06-15 16:20:25,026][1652475] Updated weights for policy 0, policy_version 366736 (0.0014) [2024-06-15 16:20:25,738][1648984] Fps is (10 sec: 49153.5, 60 sec: 43144.5, 300 sec: 42542.9). Total num frames: 751140864. Throughput: 0: 10490.3. Samples: 187817472. Policy #0 lag: (min: 15.0, avg: 73.8, max: 271.0) [2024-06-15 16:20:25,738][1648984] Avg episode reward: [(0, '-0.220')] [2024-06-15 16:20:30,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 43690.8, 300 sec: 42431.8). Total num frames: 751173632. Throughput: 0: 10524.5. Samples: 187883008. Policy #0 lag: (min: 15.0, avg: 73.8, max: 271.0) [2024-06-15 16:20:30,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 16:20:31,228][1652475] Updated weights for policy 0, policy_version 366800 (0.0013) [2024-06-15 16:20:32,588][1652475] Updated weights for policy 0, policy_version 366848 (0.0017) [2024-06-15 16:20:34,703][1652475] Updated weights for policy 0, policy_version 366905 (0.0014) [2024-06-15 16:20:35,738][1648984] Fps is (10 sec: 32766.9, 60 sec: 40959.7, 300 sec: 42320.7). Total num frames: 751468544. Throughput: 0: 10660.9. Samples: 187912704. Policy #0 lag: (min: 15.0, avg: 73.8, max: 271.0) [2024-06-15 16:20:35,740][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 16:20:36,519][1652475] Updated weights for policy 0, policy_version 366960 (0.0011) [2024-06-15 16:20:37,969][1652475] Updated weights for policy 0, policy_version 367008 (0.0024) [2024-06-15 16:20:40,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 43690.7, 300 sec: 42653.9). Total num frames: 751697920. Throughput: 0: 10399.3. Samples: 187975680. 
Policy #0 lag: (min: 31.0, avg: 194.2, max: 383.0) [2024-06-15 16:20:40,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 16:20:44,484][1652475] Updated weights for policy 0, policy_version 367058 (0.0023) [2024-06-15 16:20:45,738][1648984] Fps is (10 sec: 36046.2, 60 sec: 40960.0, 300 sec: 42320.7). Total num frames: 751828992. Throughput: 0: 10615.5. Samples: 188042240. Policy #0 lag: (min: 31.0, avg: 194.2, max: 383.0) [2024-06-15 16:20:45,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 16:20:46,327][1652475] Updated weights for policy 0, policy_version 367136 (0.0015) [2024-06-15 16:20:48,040][1652475] Updated weights for policy 0, policy_version 367186 (0.0012) [2024-06-15 16:20:50,657][1652475] Updated weights for policy 0, policy_version 367264 (0.0015) [2024-06-15 16:20:50,739][1648984] Fps is (10 sec: 45869.1, 60 sec: 42597.5, 300 sec: 42431.7). Total num frames: 752156672. Throughput: 0: 10467.3. Samples: 188069376. Policy #0 lag: (min: 31.0, avg: 194.2, max: 383.0) [2024-06-15 16:20:50,739][1648984] Avg episode reward: [(0, '-0.570')] [2024-06-15 16:20:55,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 42598.4, 300 sec: 42653.9). Total num frames: 752222208. Throughput: 0: 10353.8. Samples: 188131840. Policy #0 lag: (min: 31.0, avg: 194.2, max: 383.0) [2024-06-15 16:20:55,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 16:20:55,749][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000367296_752222208.pth... [2024-06-15 16:20:55,791][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000362368_742129664.pth [2024-06-15 16:20:58,408][1651340] Signal inference workers to stop experience collection... (18850 times) [2024-06-15 16:20:58,494][1652475] InferenceWorker_p0-w0: stopping experience collection (18850 times) [2024-06-15 16:20:58,605][1651340] Signal inference workers to resume experience collection... 
(18850 times) [2024-06-15 16:20:58,606][1652475] InferenceWorker_p0-w0: resuming experience collection (18850 times) [2024-06-15 16:20:58,608][1652475] Updated weights for policy 0, policy_version 367344 (0.0015) [2024-06-15 16:21:00,681][1652475] Updated weights for policy 0, policy_version 367425 (0.0191) [2024-06-15 16:21:00,738][1648984] Fps is (10 sec: 32772.4, 60 sec: 40414.0, 300 sec: 42320.7). Total num frames: 752484352. Throughput: 0: 10331.1. Samples: 188193280. Policy #0 lag: (min: 31.0, avg: 194.2, max: 383.0) [2024-06-15 16:21:00,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 16:21:02,233][1652475] Updated weights for policy 0, policy_version 367494 (0.0015) [2024-06-15 16:21:03,452][1652475] Updated weights for policy 0, policy_version 367551 (0.0218) [2024-06-15 16:21:05,738][1648984] Fps is (10 sec: 52429.1, 60 sec: 43690.8, 300 sec: 42654.0). Total num frames: 752746496. Throughput: 0: 10092.1. Samples: 188218368. Policy #0 lag: (min: 31.0, avg: 194.2, max: 383.0) [2024-06-15 16:21:05,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 16:21:10,100][1652475] Updated weights for policy 0, policy_version 367607 (0.0012) [2024-06-15 16:21:10,738][1648984] Fps is (10 sec: 39321.2, 60 sec: 40960.0, 300 sec: 42654.1). Total num frames: 752877568. Throughput: 0: 10649.6. Samples: 188296704. Policy #0 lag: (min: 31.0, avg: 194.2, max: 383.0) [2024-06-15 16:21:10,738][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 16:21:11,762][1652475] Updated weights for policy 0, policy_version 367664 (0.0124) [2024-06-15 16:21:13,841][1652475] Updated weights for policy 0, policy_version 367740 (0.0014) [2024-06-15 16:21:15,364][1652475] Updated weights for policy 0, policy_version 367800 (0.0013) [2024-06-15 16:21:15,738][1648984] Fps is (10 sec: 52428.2, 60 sec: 43690.9, 300 sec: 42653.9). Total num frames: 753270784. Throughput: 0: 10444.8. Samples: 188353024. 
Policy #0 lag: (min: 31.0, avg: 194.2, max: 383.0) [2024-06-15 16:21:15,738][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 16:21:20,738][1648984] Fps is (10 sec: 42598.7, 60 sec: 42052.3, 300 sec: 42542.9). Total num frames: 753303552. Throughput: 0: 10695.2. Samples: 188393984. Policy #0 lag: (min: 31.0, avg: 194.2, max: 383.0) [2024-06-15 16:21:20,738][1648984] Avg episode reward: [(0, '-0.520')] [2024-06-15 16:21:20,926][1652475] Updated weights for policy 0, policy_version 367842 (0.0015) [2024-06-15 16:21:23,443][1652475] Updated weights for policy 0, policy_version 367920 (0.0013) [2024-06-15 16:21:24,872][1652475] Updated weights for policy 0, policy_version 367984 (0.0014) [2024-06-15 16:21:25,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 42598.4, 300 sec: 42765.0). Total num frames: 753696768. Throughput: 0: 10877.1. Samples: 188465152. Policy #0 lag: (min: 31.0, avg: 194.2, max: 383.0) [2024-06-15 16:21:25,738][1648984] Avg episode reward: [(0, '-0.520')] [2024-06-15 16:21:25,963][1652475] Updated weights for policy 0, policy_version 368032 (0.0011) [2024-06-15 16:21:30,738][1648984] Fps is (10 sec: 49152.3, 60 sec: 43690.7, 300 sec: 42653.9). Total num frames: 753795072. Throughput: 0: 10979.6. Samples: 188536320. Policy #0 lag: (min: 31.0, avg: 194.2, max: 383.0) [2024-06-15 16:21:30,738][1648984] Avg episode reward: [(0, '-0.520')] [2024-06-15 16:21:32,531][1652475] Updated weights for policy 0, policy_version 368112 (0.0015) [2024-06-15 16:21:35,051][1652475] Updated weights for policy 0, policy_version 368176 (0.0012) [2024-06-15 16:21:35,738][1648984] Fps is (10 sec: 36044.8, 60 sec: 43144.8, 300 sec: 42765.0). Total num frames: 754057216. Throughput: 0: 11150.5. Samples: 188571136. 
Policy #0 lag: (min: 31.0, avg: 194.2, max: 383.0) [2024-06-15 16:21:35,739][1648984] Avg episode reward: [(0, '-0.520')] [2024-06-15 16:21:37,057][1652475] Updated weights for policy 0, policy_version 368242 (0.0013) [2024-06-15 16:21:38,201][1651340] Signal inference workers to stop experience collection... (18900 times) [2024-06-15 16:21:38,239][1652475] InferenceWorker_p0-w0: stopping experience collection (18900 times) [2024-06-15 16:21:38,519][1651340] Signal inference workers to resume experience collection... (18900 times) [2024-06-15 16:21:38,520][1652475] InferenceWorker_p0-w0: resuming experience collection (18900 times) [2024-06-15 16:21:38,730][1652475] Updated weights for policy 0, policy_version 368313 (0.0013) [2024-06-15 16:21:40,738][1648984] Fps is (10 sec: 52427.9, 60 sec: 43690.6, 300 sec: 42765.0). Total num frames: 754319360. Throughput: 0: 11036.4. Samples: 188628480. Policy #0 lag: (min: 31.0, avg: 194.2, max: 383.0) [2024-06-15 16:21:40,738][1648984] Avg episode reward: [(0, '-0.520')] [2024-06-15 16:21:43,983][1652475] Updated weights for policy 0, policy_version 368352 (0.0016) [2024-06-15 16:21:45,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 43690.6, 300 sec: 42653.9). Total num frames: 754450432. Throughput: 0: 11400.5. Samples: 188706304. Policy #0 lag: (min: 31.0, avg: 194.2, max: 383.0) [2024-06-15 16:21:45,738][1648984] Avg episode reward: [(0, '-0.520')] [2024-06-15 16:21:45,946][1652475] Updated weights for policy 0, policy_version 368400 (0.0013) [2024-06-15 16:21:47,436][1652475] Updated weights for policy 0, policy_version 368464 (0.0013) [2024-06-15 16:21:49,941][1652475] Updated weights for policy 0, policy_version 368570 (0.0014) [2024-06-15 16:21:50,738][1648984] Fps is (10 sec: 52429.7, 60 sec: 44783.9, 300 sec: 43098.3). Total num frames: 754843648. Throughput: 0: 11468.8. Samples: 188734464. 
Policy #0 lag: (min: 31.0, avg: 194.2, max: 383.0) [2024-06-15 16:21:50,738][1648984] Avg episode reward: [(0, '-0.520')] [2024-06-15 16:21:55,695][1652475] Updated weights for policy 0, policy_version 368612 (0.0013) [2024-06-15 16:21:55,738][1648984] Fps is (10 sec: 45875.3, 60 sec: 44782.9, 300 sec: 42876.1). Total num frames: 754909184. Throughput: 0: 11286.8. Samples: 188804608. Policy #0 lag: (min: 31.0, avg: 194.2, max: 383.0) [2024-06-15 16:21:55,738][1648984] Avg episode reward: [(0, '-0.520')] [2024-06-15 16:21:57,978][1652475] Updated weights for policy 0, policy_version 368656 (0.0012) [2024-06-15 16:21:59,210][1652475] Updated weights for policy 0, policy_version 368707 (0.0012) [2024-06-15 16:22:00,738][1648984] Fps is (10 sec: 39321.1, 60 sec: 45875.1, 300 sec: 42987.2). Total num frames: 755236864. Throughput: 0: 11423.3. Samples: 188867072. Policy #0 lag: (min: 62.0, avg: 140.5, max: 318.0) [2024-06-15 16:22:00,739][1648984] Avg episode reward: [(0, '-0.520')] [2024-06-15 16:22:00,918][1652475] Updated weights for policy 0, policy_version 368784 (0.0014) [2024-06-15 16:22:05,738][1648984] Fps is (10 sec: 45872.7, 60 sec: 43690.2, 300 sec: 42653.9). Total num frames: 755367936. Throughput: 0: 11218.3. Samples: 188898816. Policy #0 lag: (min: 62.0, avg: 140.5, max: 318.0) [2024-06-15 16:22:05,739][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 16:22:07,176][1652475] Updated weights for policy 0, policy_version 368864 (0.0013) [2024-06-15 16:22:10,738][1648984] Fps is (10 sec: 36044.8, 60 sec: 45329.1, 300 sec: 42876.1). Total num frames: 755597312. Throughput: 0: 11298.1. Samples: 188973568. 
Policy #0 lag: (min: 62.0, avg: 140.5, max: 318.0) [2024-06-15 16:22:10,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 16:22:10,979][1652475] Updated weights for policy 0, policy_version 368949 (0.0101) [2024-06-15 16:22:12,459][1652475] Updated weights for policy 0, policy_version 369013 (0.0013) [2024-06-15 16:22:14,335][1652475] Updated weights for policy 0, policy_version 369086 (0.0013) [2024-06-15 16:22:15,758][1648984] Fps is (10 sec: 52326.6, 60 sec: 43676.1, 300 sec: 42984.3). Total num frames: 755892224. Throughput: 0: 10860.9. Samples: 189025280. Policy #0 lag: (min: 62.0, avg: 140.5, max: 318.0) [2024-06-15 16:22:15,758][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 16:22:19,435][1652475] Updated weights for policy 0, policy_version 369122 (0.0107) [2024-06-15 16:22:20,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 45329.0, 300 sec: 43098.3). Total num frames: 756023296. Throughput: 0: 10979.5. Samples: 189065216. Policy #0 lag: (min: 62.0, avg: 140.5, max: 318.0) [2024-06-15 16:22:20,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 16:22:21,587][1652475] Updated weights for policy 0, policy_version 369158 (0.0033) [2024-06-15 16:22:22,597][1652475] Updated weights for policy 0, policy_version 369214 (0.0015) [2024-06-15 16:22:23,806][1651340] Signal inference workers to stop experience collection... (18950 times) [2024-06-15 16:22:23,850][1652475] InferenceWorker_p0-w0: stopping experience collection (18950 times) [2024-06-15 16:22:24,030][1651340] Signal inference workers to resume experience collection... (18950 times) [2024-06-15 16:22:24,031][1652475] InferenceWorker_p0-w0: resuming experience collection (18950 times) [2024-06-15 16:22:24,850][1652475] Updated weights for policy 0, policy_version 369277 (0.0012) [2024-06-15 16:22:25,738][1648984] Fps is (10 sec: 42684.1, 60 sec: 43690.7, 300 sec: 42765.0). Total num frames: 756318208. Throughput: 0: 11138.9. Samples: 189129728. 
Policy #0 lag: (min: 62.0, avg: 140.5, max: 318.0) [2024-06-15 16:22:25,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 16:22:26,450][1652475] Updated weights for policy 0, policy_version 369339 (0.0011) [2024-06-15 16:22:30,585][1652475] Updated weights for policy 0, policy_version 369401 (0.0151) [2024-06-15 16:22:30,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 45875.1, 300 sec: 43098.3). Total num frames: 756547584. Throughput: 0: 10968.2. Samples: 189199872. Policy #0 lag: (min: 62.0, avg: 140.5, max: 318.0) [2024-06-15 16:22:30,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 16:22:33,645][1652475] Updated weights for policy 0, policy_version 369467 (0.0014) [2024-06-15 16:22:35,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 44236.8, 300 sec: 42987.4). Total num frames: 756711424. Throughput: 0: 11025.0. Samples: 189230592. Policy #0 lag: (min: 62.0, avg: 140.5, max: 318.0) [2024-06-15 16:22:35,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 16:22:36,650][1652475] Updated weights for policy 0, policy_version 369529 (0.0014) [2024-06-15 16:22:38,368][1652475] Updated weights for policy 0, policy_version 369571 (0.0014) [2024-06-15 16:22:40,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 43690.7, 300 sec: 42653.9). Total num frames: 756940800. Throughput: 0: 10956.8. Samples: 189297664. Policy #0 lag: (min: 62.0, avg: 140.5, max: 318.0) [2024-06-15 16:22:40,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 16:22:41,727][1652475] Updated weights for policy 0, policy_version 369650 (0.0012) [2024-06-15 16:22:44,537][1652475] Updated weights for policy 0, policy_version 369697 (0.0015) [2024-06-15 16:22:45,738][1648984] Fps is (10 sec: 49152.5, 60 sec: 45875.2, 300 sec: 43542.6). Total num frames: 757202944. Throughput: 0: 11013.7. Samples: 189362688. 
Policy #0 lag: (min: 62.0, avg: 140.5, max: 318.0) [2024-06-15 16:22:45,747][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 16:22:48,567][1652475] Updated weights for policy 0, policy_version 369747 (0.0013) [2024-06-15 16:22:50,377][1652475] Updated weights for policy 0, policy_version 369813 (0.0012) [2024-06-15 16:22:50,742][1648984] Fps is (10 sec: 45875.2, 60 sec: 42598.3, 300 sec: 43211.5). Total num frames: 757399552. Throughput: 0: 11207.2. Samples: 189403136. Policy #0 lag: (min: 62.0, avg: 140.5, max: 318.0) [2024-06-15 16:22:50,742][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 16:22:54,346][1652475] Updated weights for policy 0, policy_version 369920 (0.0028) [2024-06-15 16:22:55,738][1648984] Fps is (10 sec: 39321.3, 60 sec: 44782.9, 300 sec: 43098.2). Total num frames: 757596160. Throughput: 0: 10683.7. Samples: 189454336. Policy #0 lag: (min: 62.0, avg: 140.5, max: 318.0) [2024-06-15 16:22:55,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 16:22:56,325][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000369952_757661696.pth... [2024-06-15 16:22:56,548][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000364752_747012096.pth [2024-06-15 16:22:57,256][1652475] Updated weights for policy 0, policy_version 369983 (0.0013) [2024-06-15 16:23:00,738][1648984] Fps is (10 sec: 36045.0, 60 sec: 42052.3, 300 sec: 43209.4). Total num frames: 757760000. Throughput: 0: 11075.5. Samples: 189523456. Policy #0 lag: (min: 62.0, avg: 140.5, max: 318.0) [2024-06-15 16:23:00,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 16:23:01,438][1652475] Updated weights for policy 0, policy_version 370040 (0.0014) [2024-06-15 16:23:03,781][1652475] Updated weights for policy 0, policy_version 370089 (0.0014) [2024-06-15 16:23:05,739][1648984] Fps is (10 sec: 39321.1, 60 sec: 43691.0, 300 sec: 42765.0). Total num frames: 757989376. Throughput: 0: 10877.1. 
Samples: 189554688. Policy #0 lag: (min: 62.0, avg: 140.5, max: 318.0) [2024-06-15 16:23:05,740][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 16:23:06,400][1652475] Updated weights for policy 0, policy_version 370118 (0.0013) [2024-06-15 16:23:08,030][1652475] Updated weights for policy 0, policy_version 370178 (0.0027) [2024-06-15 16:23:09,601][1652475] Updated weights for policy 0, policy_version 370239 (0.0015) [2024-06-15 16:23:10,738][1648984] Fps is (10 sec: 49152.1, 60 sec: 44236.8, 300 sec: 43542.6). Total num frames: 758251520. Throughput: 0: 10706.5. Samples: 189611520. Policy #0 lag: (min: 62.0, avg: 140.5, max: 318.0) [2024-06-15 16:23:10,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 16:23:12,746][1651340] Signal inference workers to stop experience collection... (19000 times) [2024-06-15 16:23:12,785][1652475] InferenceWorker_p0-w0: stopping experience collection (19000 times) [2024-06-15 16:23:13,025][1651340] Signal inference workers to resume experience collection... (19000 times) [2024-06-15 16:23:13,027][1652475] InferenceWorker_p0-w0: resuming experience collection (19000 times) [2024-06-15 16:23:13,029][1652475] Updated weights for policy 0, policy_version 370288 (0.0013) [2024-06-15 16:23:15,408][1652475] Updated weights for policy 0, policy_version 370336 (0.0011) [2024-06-15 16:23:15,738][1648984] Fps is (10 sec: 45876.2, 60 sec: 42612.7, 300 sec: 43320.4). Total num frames: 758448128. Throughput: 0: 10695.1. Samples: 189681152. Policy #0 lag: (min: 15.0, avg: 126.9, max: 271.0) [2024-06-15 16:23:15,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 16:23:19,256][1652475] Updated weights for policy 0, policy_version 370387 (0.0033) [2024-06-15 16:23:20,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 43690.8, 300 sec: 43098.3). Total num frames: 758644736. Throughput: 0: 10774.8. Samples: 189715456. 
Policy #0 lag: (min: 15.0, avg: 126.9, max: 271.0) [2024-06-15 16:23:20,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 16:23:22,336][1652475] Updated weights for policy 0, policy_version 370453 (0.0015) [2024-06-15 16:23:24,754][1652475] Updated weights for policy 0, policy_version 370532 (0.0087) [2024-06-15 16:23:25,738][1648984] Fps is (10 sec: 45874.6, 60 sec: 43144.5, 300 sec: 43986.9). Total num frames: 758906880. Throughput: 0: 10569.9. Samples: 189773312. Policy #0 lag: (min: 15.0, avg: 126.9, max: 271.0) [2024-06-15 16:23:25,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 16:23:27,937][1652475] Updated weights for policy 0, policy_version 370622 (0.0107) [2024-06-15 16:23:30,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 41506.2, 300 sec: 43100.3). Total num frames: 759037952. Throughput: 0: 10592.7. Samples: 189839360. Policy #0 lag: (min: 15.0, avg: 126.9, max: 271.0) [2024-06-15 16:23:30,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 16:23:31,430][1652475] Updated weights for policy 0, policy_version 370659 (0.0011) [2024-06-15 16:23:35,079][1652475] Updated weights for policy 0, policy_version 370723 (0.0014) [2024-06-15 16:23:35,738][1648984] Fps is (10 sec: 39322.0, 60 sec: 43144.6, 300 sec: 43542.6). Total num frames: 759300096. Throughput: 0: 10570.0. Samples: 189878784. Policy #0 lag: (min: 15.0, avg: 126.9, max: 271.0) [2024-06-15 16:23:35,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 16:23:36,773][1652475] Updated weights for policy 0, policy_version 370800 (0.0020) [2024-06-15 16:23:39,110][1652475] Updated weights for policy 0, policy_version 370848 (0.0013) [2024-06-15 16:23:40,738][1648984] Fps is (10 sec: 52428.5, 60 sec: 43690.7, 300 sec: 43875.8). Total num frames: 759562240. Throughput: 0: 10911.3. Samples: 189945344. 
Policy #0 lag: (min: 15.0, avg: 126.9, max: 271.0) [2024-06-15 16:23:40,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 16:23:41,926][1652475] Updated weights for policy 0, policy_version 370912 (0.0013) [2024-06-15 16:23:45,738][1648984] Fps is (10 sec: 45875.2, 60 sec: 42598.4, 300 sec: 43320.4). Total num frames: 759758848. Throughput: 0: 10990.9. Samples: 190018048. Policy #0 lag: (min: 15.0, avg: 126.9, max: 271.0) [2024-06-15 16:23:45,740][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 16:23:45,980][1652475] Updated weights for policy 0, policy_version 370992 (0.0013) [2024-06-15 16:23:48,766][1652475] Updated weights for policy 0, policy_version 371057 (0.0118) [2024-06-15 16:23:50,340][1652475] Updated weights for policy 0, policy_version 371088 (0.0011) [2024-06-15 16:23:50,738][1648984] Fps is (10 sec: 45875.2, 60 sec: 43690.7, 300 sec: 44209.0). Total num frames: 760020992. Throughput: 0: 10968.2. Samples: 190048256. Policy #0 lag: (min: 15.0, avg: 126.9, max: 271.0) [2024-06-15 16:23:50,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 16:23:51,144][1652475] Updated weights for policy 0, policy_version 371129 (0.0047) [2024-06-15 16:23:53,474][1652475] Updated weights for policy 0, policy_version 371190 (0.0013) [2024-06-15 16:23:55,738][1648984] Fps is (10 sec: 45874.9, 60 sec: 43690.7, 300 sec: 43542.5). Total num frames: 760217600. Throughput: 0: 11218.5. Samples: 190116352. Policy #0 lag: (min: 15.0, avg: 126.9, max: 271.0) [2024-06-15 16:23:55,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 16:23:57,605][1652475] Updated weights for policy 0, policy_version 371248 (0.0014) [2024-06-15 16:23:59,421][1652475] Updated weights for policy 0, policy_version 371299 (0.0043) [2024-06-15 16:24:00,738][1648984] Fps is (10 sec: 45875.1, 60 sec: 45329.0, 300 sec: 43986.9). Total num frames: 760479744. Throughput: 0: 11184.3. Samples: 190184448. 
Policy #0 lag: (min: 15.0, avg: 126.9, max: 271.0) [2024-06-15 16:24:00,739][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 16:24:02,428][1651340] Signal inference workers to stop experience collection... (19050 times) [2024-06-15 16:24:02,478][1652475] InferenceWorker_p0-w0: stopping experience collection (19050 times) [2024-06-15 16:24:02,488][1652475] Updated weights for policy 0, policy_version 371346 (0.0013) [2024-06-15 16:24:02,649][1651340] Signal inference workers to resume experience collection... (19050 times) [2024-06-15 16:24:02,649][1652475] InferenceWorker_p0-w0: resuming experience collection (19050 times) [2024-06-15 16:24:03,668][1652475] Updated weights for policy 0, policy_version 371393 (0.0046) [2024-06-15 16:24:05,738][1648984] Fps is (10 sec: 52429.3, 60 sec: 45875.4, 300 sec: 43986.9). Total num frames: 760741888. Throughput: 0: 11195.7. Samples: 190219264. Policy #0 lag: (min: 15.0, avg: 126.9, max: 271.0) [2024-06-15 16:24:05,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 16:24:08,221][1652475] Updated weights for policy 0, policy_version 371459 (0.0017) [2024-06-15 16:24:10,408][1652475] Updated weights for policy 0, policy_version 371536 (0.0128) [2024-06-15 16:24:10,738][1648984] Fps is (10 sec: 42598.5, 60 sec: 44236.8, 300 sec: 43653.6). Total num frames: 760905728. Throughput: 0: 11434.7. Samples: 190287872. Policy #0 lag: (min: 15.0, avg: 126.9, max: 271.0) [2024-06-15 16:24:10,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 16:24:11,358][1652475] Updated weights for policy 0, policy_version 371584 (0.0014) [2024-06-15 16:24:15,738][1648984] Fps is (10 sec: 39321.2, 60 sec: 44782.9, 300 sec: 43987.0). Total num frames: 761135104. Throughput: 0: 11389.1. Samples: 190351872. 
Policy #0 lag: (min: 15.0, avg: 126.9, max: 271.0) [2024-06-15 16:24:15,739][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 16:24:15,800][1652475] Updated weights for policy 0, policy_version 371664 (0.0013) [2024-06-15 16:24:16,801][1652475] Updated weights for policy 0, policy_version 371712 (0.0011) [2024-06-15 16:24:20,738][1648984] Fps is (10 sec: 45875.1, 60 sec: 45329.0, 300 sec: 43431.5). Total num frames: 761364480. Throughput: 0: 11275.4. Samples: 190386176. Policy #0 lag: (min: 15.0, avg: 126.9, max: 271.0) [2024-06-15 16:24:20,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 16:24:22,753][1652475] Updated weights for policy 0, policy_version 371792 (0.0016) [2024-06-15 16:24:23,769][1652475] Updated weights for policy 0, policy_version 371835 (0.0028) [2024-06-15 16:24:25,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 43690.7, 300 sec: 43986.9). Total num frames: 761528320. Throughput: 0: 11275.4. Samples: 190452736. Policy #0 lag: (min: 15.0, avg: 126.9, max: 271.0) [2024-06-15 16:24:25,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 16:24:27,103][1652475] Updated weights for policy 0, policy_version 371888 (0.0015) [2024-06-15 16:24:28,726][1652475] Updated weights for policy 0, policy_version 371966 (0.0012) [2024-06-15 16:24:30,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 45875.1, 300 sec: 43320.4). Total num frames: 761790464. Throughput: 0: 11116.1. Samples: 190518272. Policy #0 lag: (min: 15.0, avg: 126.9, max: 271.0) [2024-06-15 16:24:30,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 16:24:32,542][1652475] Updated weights for policy 0, policy_version 372029 (0.0166) [2024-06-15 16:24:35,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 761921536. Throughput: 0: 11150.2. Samples: 190550016. 
Policy #0 lag: (min: 15.0, avg: 126.9, max: 271.0) [2024-06-15 16:24:35,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 16:24:37,118][1652475] Updated weights for policy 0, policy_version 372095 (0.0016) [2024-06-15 16:24:38,659][1652475] Updated weights for policy 0, policy_version 372148 (0.0014) [2024-06-15 16:24:40,379][1652475] Updated weights for policy 0, policy_version 372224 (0.0013) [2024-06-15 16:24:40,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 45875.2, 300 sec: 43875.8). Total num frames: 762314752. Throughput: 0: 11070.6. Samples: 190614528. Policy #0 lag: (min: 15.0, avg: 115.9, max: 271.0) [2024-06-15 16:24:40,738][1648984] Avg episode reward: [(0, '-0.530')] [2024-06-15 16:24:43,893][1652475] Updated weights for policy 0, policy_version 372278 (0.0011) [2024-06-15 16:24:45,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 44782.9, 300 sec: 43542.6). Total num frames: 762445824. Throughput: 0: 11093.3. Samples: 190683648. Policy #0 lag: (min: 15.0, avg: 115.9, max: 271.0) [2024-06-15 16:24:45,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 16:24:49,414][1652475] Updated weights for policy 0, policy_version 372336 (0.0020) [2024-06-15 16:24:49,542][1651340] Signal inference workers to stop experience collection... (19100 times) [2024-06-15 16:24:49,606][1652475] InferenceWorker_p0-w0: stopping experience collection (19100 times) [2024-06-15 16:24:49,810][1651340] Signal inference workers to resume experience collection... (19100 times) [2024-06-15 16:24:49,820][1652475] InferenceWorker_p0-w0: resuming experience collection (19100 times) [2024-06-15 16:24:50,738][1648984] Fps is (10 sec: 32768.1, 60 sec: 43690.7, 300 sec: 43986.9). Total num frames: 762642432. Throughput: 0: 11173.0. Samples: 190722048. 
Policy #0 lag: (min: 15.0, avg: 115.9, max: 271.0) [2024-06-15 16:24:50,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 16:24:51,644][1652475] Updated weights for policy 0, policy_version 372420 (0.0100) [2024-06-15 16:24:53,280][1652475] Updated weights for policy 0, policy_version 372480 (0.0092) [2024-06-15 16:24:55,738][1648984] Fps is (10 sec: 45873.8, 60 sec: 44782.7, 300 sec: 43542.5). Total num frames: 762904576. Throughput: 0: 10740.5. Samples: 190771200. Policy #0 lag: (min: 15.0, avg: 115.9, max: 271.0) [2024-06-15 16:24:55,739][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 16:24:55,999][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000372528_762937344.pth... [2024-06-15 16:24:56,043][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000367296_752222208.pth [2024-06-15 16:24:56,316][1652475] Updated weights for policy 0, policy_version 372542 (0.0013) [2024-06-15 16:25:00,739][1648984] Fps is (10 sec: 32767.9, 60 sec: 41506.1, 300 sec: 43542.6). Total num frames: 762970112. Throughput: 0: 10888.5. Samples: 190841856. Policy #0 lag: (min: 15.0, avg: 115.9, max: 271.0) [2024-06-15 16:25:00,741][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 16:25:02,948][1652475] Updated weights for policy 0, policy_version 372608 (0.0016) [2024-06-15 16:25:05,167][1652475] Updated weights for policy 0, policy_version 372704 (0.0014) [2024-06-15 16:25:05,742][1648984] Fps is (10 sec: 42582.2, 60 sec: 43141.5, 300 sec: 43764.1). Total num frames: 763330560. Throughput: 0: 10716.9. Samples: 190868480. Policy #0 lag: (min: 15.0, avg: 115.9, max: 271.0) [2024-06-15 16:25:05,742][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 16:25:07,066][1652475] Updated weights for policy 0, policy_version 372752 (0.0012) [2024-06-15 16:25:10,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 43144.5, 300 sec: 43542.6). Total num frames: 763494400. Throughput: 0: 10547.2. 
Samples: 190927360. Policy #0 lag: (min: 15.0, avg: 115.9, max: 271.0) [2024-06-15 16:25:10,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 16:25:13,997][1652475] Updated weights for policy 0, policy_version 372808 (0.0047) [2024-06-15 16:25:15,139][1652475] Updated weights for policy 0, policy_version 372864 (0.0015) [2024-06-15 16:25:15,740][1648984] Fps is (10 sec: 32775.0, 60 sec: 42050.9, 300 sec: 43653.3). Total num frames: 763658240. Throughput: 0: 10603.6. Samples: 190995456. Policy #0 lag: (min: 15.0, avg: 115.9, max: 271.0) [2024-06-15 16:25:15,741][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 16:25:17,811][1652475] Updated weights for policy 0, policy_version 372932 (0.0014) [2024-06-15 16:25:19,899][1652475] Updated weights for policy 0, policy_version 373024 (0.0048) [2024-06-15 16:25:20,738][1648984] Fps is (10 sec: 52429.1, 60 sec: 44236.8, 300 sec: 43653.6). Total num frames: 764018688. Throughput: 0: 10604.1. Samples: 191027200. Policy #0 lag: (min: 15.0, avg: 115.9, max: 271.0) [2024-06-15 16:25:20,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 16:25:25,738][1648984] Fps is (10 sec: 36052.3, 60 sec: 41506.2, 300 sec: 43542.6). Total num frames: 764018688. Throughput: 0: 10524.5. Samples: 191088128. Policy #0 lag: (min: 15.0, avg: 115.9, max: 271.0) [2024-06-15 16:25:25,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 16:25:26,879][1652475] Updated weights for policy 0, policy_version 373116 (0.0015) [2024-06-15 16:25:29,075][1652475] Updated weights for policy 0, policy_version 373173 (0.0013) [2024-06-15 16:25:30,738][1648984] Fps is (10 sec: 26214.5, 60 sec: 41506.2, 300 sec: 43431.5). Total num frames: 764280832. Throughput: 0: 10353.8. Samples: 191149568. 
Policy #0 lag: (min: 15.0, avg: 115.9, max: 271.0) [2024-06-15 16:25:30,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 16:25:31,688][1652475] Updated weights for policy 0, policy_version 373232 (0.0012) [2024-06-15 16:25:33,863][1652475] Updated weights for policy 0, policy_version 373311 (0.0082) [2024-06-15 16:25:35,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 764542976. Throughput: 0: 10092.1. Samples: 191176192. Policy #0 lag: (min: 15.0, avg: 115.9, max: 271.0) [2024-06-15 16:25:35,738][1648984] Avg episode reward: [(0, '-0.220')] [2024-06-15 16:25:37,548][1651340] Signal inference workers to stop experience collection... (19150 times) [2024-06-15 16:25:37,641][1652475] InferenceWorker_p0-w0: stopping experience collection (19150 times) [2024-06-15 16:25:37,741][1651340] Signal inference workers to resume experience collection... (19150 times) [2024-06-15 16:25:37,742][1652475] InferenceWorker_p0-w0: resuming experience collection (19150 times) [2024-06-15 16:25:38,467][1652475] Updated weights for policy 0, policy_version 373360 (0.0099) [2024-06-15 16:25:40,736][1652475] Updated weights for policy 0, policy_version 373424 (0.0016) [2024-06-15 16:25:40,738][1648984] Fps is (10 sec: 49151.9, 60 sec: 40960.1, 300 sec: 43875.8). Total num frames: 764772352. Throughput: 0: 10558.7. Samples: 191246336. Policy #0 lag: (min: 15.0, avg: 115.9, max: 271.0) [2024-06-15 16:25:40,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 16:25:43,966][1652475] Updated weights for policy 0, policy_version 373473 (0.0037) [2024-06-15 16:25:45,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 41506.2, 300 sec: 43320.6). Total num frames: 764936192. Throughput: 0: 10410.7. Samples: 191310336. 
Policy #0 lag: (min: 15.0, avg: 115.9, max: 271.0) [2024-06-15 16:25:45,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 16:25:45,894][1652475] Updated weights for policy 0, policy_version 373506 (0.0011) [2024-06-15 16:25:49,725][1652475] Updated weights for policy 0, policy_version 373569 (0.0074) [2024-06-15 16:25:50,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 42052.3, 300 sec: 43875.8). Total num frames: 765165568. Throughput: 0: 10457.1. Samples: 191339008. Policy #0 lag: (min: 15.0, avg: 115.9, max: 271.0) [2024-06-15 16:25:50,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 16:25:50,901][1652475] Updated weights for policy 0, policy_version 373628 (0.0041) [2024-06-15 16:25:52,145][1652475] Updated weights for policy 0, policy_version 373666 (0.0018) [2024-06-15 16:25:55,535][1652475] Updated weights for policy 0, policy_version 373728 (0.0014) [2024-06-15 16:25:55,738][1648984] Fps is (10 sec: 45875.0, 60 sec: 41506.4, 300 sec: 43764.7). Total num frames: 765394944. Throughput: 0: 10672.4. Samples: 191407616. Policy #0 lag: (min: 2.0, avg: 115.3, max: 258.0) [2024-06-15 16:25:55,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 16:25:58,768][1652475] Updated weights for policy 0, policy_version 373808 (0.0014) [2024-06-15 16:26:00,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 765591552. Throughput: 0: 10741.1. Samples: 191478784. Policy #0 lag: (min: 2.0, avg: 115.3, max: 258.0) [2024-06-15 16:26:00,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 16:26:02,651][1652475] Updated weights for policy 0, policy_version 373872 (0.0131) [2024-06-15 16:26:04,160][1652475] Updated weights for policy 0, policy_version 373936 (0.0068) [2024-06-15 16:26:05,738][1648984] Fps is (10 sec: 45875.3, 60 sec: 42055.2, 300 sec: 43986.9). Total num frames: 765853696. Throughput: 0: 10649.6. Samples: 191506432. 
Policy #0 lag: (min: 2.0, avg: 115.3, max: 258.0) [2024-06-15 16:26:05,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 16:26:07,069][1652475] Updated weights for policy 0, policy_version 374000 (0.0015) [2024-06-15 16:26:10,452][1652475] Updated weights for policy 0, policy_version 374049 (0.0013) [2024-06-15 16:26:10,738][1648984] Fps is (10 sec: 49152.1, 60 sec: 43144.6, 300 sec: 43431.5). Total num frames: 766083072. Throughput: 0: 10899.9. Samples: 191578624. Policy #0 lag: (min: 2.0, avg: 115.3, max: 258.0) [2024-06-15 16:26:10,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 16:26:13,941][1652475] Updated weights for policy 0, policy_version 374096 (0.0012) [2024-06-15 16:26:15,635][1652475] Updated weights for policy 0, policy_version 374166 (0.0015) [2024-06-15 16:26:15,738][1648984] Fps is (10 sec: 42598.1, 60 sec: 43692.1, 300 sec: 43986.9). Total num frames: 766279680. Throughput: 0: 10865.7. Samples: 191638528. Policy #0 lag: (min: 2.0, avg: 115.3, max: 258.0) [2024-06-15 16:26:15,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 16:26:18,378][1652475] Updated weights for policy 0, policy_version 374225 (0.0013) [2024-06-15 16:26:19,475][1652475] Updated weights for policy 0, policy_version 374272 (0.0013) [2024-06-15 16:26:20,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 41506.1, 300 sec: 43431.5). Total num frames: 766509056. Throughput: 0: 11173.0. Samples: 191678976. Policy #0 lag: (min: 2.0, avg: 115.3, max: 258.0) [2024-06-15 16:26:20,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 16:26:23,013][1652475] Updated weights for policy 0, policy_version 374336 (0.0023) [2024-06-15 16:26:24,953][1651340] Signal inference workers to stop experience collection... (19200 times) [2024-06-15 16:26:25,014][1652475] InferenceWorker_p0-w0: stopping experience collection (19200 times) [2024-06-15 16:26:25,268][1651340] Signal inference workers to resume experience collection... 
(19200 times) [2024-06-15 16:26:25,269][1652475] InferenceWorker_p0-w0: resuming experience collection (19200 times) [2024-06-15 16:26:25,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 44782.8, 300 sec: 43764.7). Total num frames: 766705664. Throughput: 0: 11116.1. Samples: 191746560. Policy #0 lag: (min: 2.0, avg: 115.3, max: 258.0) [2024-06-15 16:26:25,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 16:26:26,294][1652475] Updated weights for policy 0, policy_version 374400 (0.0017) [2024-06-15 16:26:27,395][1652475] Updated weights for policy 0, policy_version 374454 (0.0013) [2024-06-15 16:26:30,731][1652475] Updated weights for policy 0, policy_version 374517 (0.0016) [2024-06-15 16:26:30,738][1648984] Fps is (10 sec: 49152.1, 60 sec: 45329.0, 300 sec: 43875.8). Total num frames: 767000576. Throughput: 0: 11195.7. Samples: 191814144. Policy #0 lag: (min: 2.0, avg: 115.3, max: 258.0) [2024-06-15 16:26:30,740][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 16:26:33,346][1652475] Updated weights for policy 0, policy_version 374561 (0.0013) [2024-06-15 16:26:35,738][1648984] Fps is (10 sec: 45875.3, 60 sec: 43690.6, 300 sec: 43542.6). Total num frames: 767164416. Throughput: 0: 11400.5. Samples: 191852032. Policy #0 lag: (min: 2.0, avg: 115.3, max: 258.0) [2024-06-15 16:26:35,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 16:26:36,637][1652475] Updated weights for policy 0, policy_version 374624 (0.0013) [2024-06-15 16:26:38,438][1652475] Updated weights for policy 0, policy_version 374704 (0.0013) [2024-06-15 16:26:40,738][1648984] Fps is (10 sec: 42598.0, 60 sec: 44236.7, 300 sec: 43986.9). Total num frames: 767426560. Throughput: 0: 11264.0. Samples: 191914496. 
Policy #0 lag: (min: 2.0, avg: 115.3, max: 258.0) [2024-06-15 16:26:40,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 16:26:42,901][1652475] Updated weights for policy 0, policy_version 374777 (0.0032) [2024-06-15 16:26:45,738][1648984] Fps is (10 sec: 45875.6, 60 sec: 44782.9, 300 sec: 43320.4). Total num frames: 767623168. Throughput: 0: 11252.6. Samples: 191985152. Policy #0 lag: (min: 2.0, avg: 115.3, max: 258.0) [2024-06-15 16:26:45,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 16:26:46,496][1652475] Updated weights for policy 0, policy_version 374848 (0.0018) [2024-06-15 16:26:50,077][1652475] Updated weights for policy 0, policy_version 374931 (0.0129) [2024-06-15 16:26:50,738][1648984] Fps is (10 sec: 49152.5, 60 sec: 45875.2, 300 sec: 44098.0). Total num frames: 767918080. Throughput: 0: 11320.9. Samples: 192015872. Policy #0 lag: (min: 2.0, avg: 115.3, max: 258.0) [2024-06-15 16:26:50,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 16:26:54,368][1652475] Updated weights for policy 0, policy_version 374996 (0.0014) [2024-06-15 16:26:55,738][1648984] Fps is (10 sec: 45873.9, 60 sec: 44782.8, 300 sec: 43542.5). Total num frames: 768081920. Throughput: 0: 11025.0. Samples: 192074752. Policy #0 lag: (min: 2.0, avg: 115.3, max: 258.0) [2024-06-15 16:26:55,739][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 16:26:55,746][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000375040_768081920.pth... [2024-06-15 16:26:55,824][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000369952_757661696.pth [2024-06-15 16:26:57,231][1652475] Updated weights for policy 0, policy_version 375056 (0.0014) [2024-06-15 16:26:58,434][1652475] Updated weights for policy 0, policy_version 375099 (0.0015) [2024-06-15 16:27:00,738][1648984] Fps is (10 sec: 36044.8, 60 sec: 44783.0, 300 sec: 43764.8). Total num frames: 768278528. Throughput: 0: 11150.2. 
Samples: 192140288. Policy #0 lag: (min: 2.0, avg: 115.3, max: 258.0) [2024-06-15 16:27:00,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 16:27:01,022][1652475] Updated weights for policy 0, policy_version 375152 (0.0014) [2024-06-15 16:27:05,556][1652475] Updated weights for policy 0, policy_version 375220 (0.0012) [2024-06-15 16:27:05,738][1648984] Fps is (10 sec: 39322.6, 60 sec: 43690.7, 300 sec: 43653.6). Total num frames: 768475136. Throughput: 0: 10934.0. Samples: 192171008. Policy #0 lag: (min: 2.0, avg: 115.3, max: 258.0) [2024-06-15 16:27:05,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 16:27:07,176][1652475] Updated weights for policy 0, policy_version 375287 (0.0012) [2024-06-15 16:27:09,600][1651340] Signal inference workers to stop experience collection... (19250 times) [2024-06-15 16:27:09,630][1652475] InferenceWorker_p0-w0: stopping experience collection (19250 times) [2024-06-15 16:27:09,843][1651340] Signal inference workers to resume experience collection... (19250 times) [2024-06-15 16:27:09,844][1652475] InferenceWorker_p0-w0: resuming experience collection (19250 times) [2024-06-15 16:27:10,554][1652475] Updated weights for policy 0, policy_version 375352 (0.0013) [2024-06-15 16:27:10,738][1648984] Fps is (10 sec: 45873.6, 60 sec: 44236.6, 300 sec: 43545.5). Total num frames: 768737280. Throughput: 0: 10786.1. Samples: 192231936. Policy #0 lag: (min: 2.0, avg: 115.3, max: 258.0) [2024-06-15 16:27:10,739][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 16:27:13,956][1652475] Updated weights for policy 0, policy_version 375414 (0.0013) [2024-06-15 16:27:15,738][1648984] Fps is (10 sec: 39320.7, 60 sec: 43144.4, 300 sec: 43542.5). Total num frames: 768868352. Throughput: 0: 10649.5. Samples: 192293376. 
Policy #0 lag: (min: 47.0, avg: 160.2, max: 303.0) [2024-06-15 16:27:15,739][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 16:27:18,301][1652475] Updated weights for policy 0, policy_version 375456 (0.0111) [2024-06-15 16:27:20,048][1652475] Updated weights for policy 0, policy_version 375521 (0.0013) [2024-06-15 16:27:20,738][1648984] Fps is (10 sec: 39322.8, 60 sec: 43690.6, 300 sec: 43431.5). Total num frames: 769130496. Throughput: 0: 10570.0. Samples: 192327680. Policy #0 lag: (min: 47.0, avg: 160.2, max: 303.0) [2024-06-15 16:27:20,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 16:27:22,487][1652475] Updated weights for policy 0, policy_version 375610 (0.0015) [2024-06-15 16:27:25,738][1648984] Fps is (10 sec: 39321.2, 60 sec: 42598.2, 300 sec: 43098.2). Total num frames: 769261568. Throughput: 0: 10433.4. Samples: 192384000. Policy #0 lag: (min: 47.0, avg: 160.2, max: 303.0) [2024-06-15 16:27:25,739][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 16:27:26,502][1652475] Updated weights for policy 0, policy_version 375653 (0.0032) [2024-06-15 16:27:30,273][1652475] Updated weights for policy 0, policy_version 375712 (0.0015) [2024-06-15 16:27:30,738][1648984] Fps is (10 sec: 36044.0, 60 sec: 41505.9, 300 sec: 43320.4). Total num frames: 769490944. Throughput: 0: 10308.2. Samples: 192449024. Policy #0 lag: (min: 47.0, avg: 160.2, max: 303.0) [2024-06-15 16:27:30,739][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 16:27:31,076][1652475] Updated weights for policy 0, policy_version 375744 (0.0013) [2024-06-15 16:27:33,528][1652475] Updated weights for policy 0, policy_version 375808 (0.0016) [2024-06-15 16:27:35,738][1648984] Fps is (10 sec: 52430.3, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 769785856. Throughput: 0: 10262.7. Samples: 192477696. 
Policy #0 lag: (min: 47.0, avg: 160.2, max: 303.0) [2024-06-15 16:27:35,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 16:27:40,488][1652475] Updated weights for policy 0, policy_version 375876 (0.0022) [2024-06-15 16:27:40,738][1648984] Fps is (10 sec: 32768.9, 60 sec: 39867.8, 300 sec: 42765.0). Total num frames: 769818624. Throughput: 0: 10297.0. Samples: 192538112. Policy #0 lag: (min: 47.0, avg: 160.2, max: 303.0) [2024-06-15 16:27:40,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 16:27:42,393][1652475] Updated weights for policy 0, policy_version 375955 (0.0083) [2024-06-15 16:27:45,647][1652475] Updated weights for policy 0, policy_version 376003 (0.0028) [2024-06-15 16:27:45,738][1648984] Fps is (10 sec: 26214.4, 60 sec: 40413.8, 300 sec: 42876.1). Total num frames: 770048000. Throughput: 0: 10160.3. Samples: 192597504. Policy #0 lag: (min: 47.0, avg: 160.2, max: 303.0) [2024-06-15 16:27:45,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 16:27:47,266][1652475] Updated weights for policy 0, policy_version 376069 (0.0013) [2024-06-15 16:27:48,416][1652475] Updated weights for policy 0, policy_version 376122 (0.0013) [2024-06-15 16:27:50,738][1648984] Fps is (10 sec: 49151.9, 60 sec: 39867.7, 300 sec: 43098.3). Total num frames: 770310144. Throughput: 0: 10126.2. Samples: 192626688. Policy #0 lag: (min: 47.0, avg: 160.2, max: 303.0) [2024-06-15 16:27:50,738][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 16:27:53,528][1652475] Updated weights for policy 0, policy_version 376163 (0.0017) [2024-06-15 16:27:55,400][1652475] Updated weights for policy 0, policy_version 376249 (0.0014) [2024-06-15 16:27:55,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 41506.3, 300 sec: 43431.5). Total num frames: 770572288. Throughput: 0: 10297.0. Samples: 192695296. 
Policy #0 lag: (min: 47.0, avg: 160.2, max: 303.0) [2024-06-15 16:27:55,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 16:27:58,146][1651340] Signal inference workers to stop experience collection... (19300 times) [2024-06-15 16:27:58,181][1652475] InferenceWorker_p0-w0: stopping experience collection (19300 times) [2024-06-15 16:27:58,402][1651340] Signal inference workers to resume experience collection... (19300 times) [2024-06-15 16:27:58,403][1652475] InferenceWorker_p0-w0: resuming experience collection (19300 times) [2024-06-15 16:27:58,564][1652475] Updated weights for policy 0, policy_version 376290 (0.0015) [2024-06-15 16:28:00,204][1652475] Updated weights for policy 0, policy_version 376353 (0.0136) [2024-06-15 16:28:00,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 42598.4, 300 sec: 43542.6). Total num frames: 770834432. Throughput: 0: 10319.7. Samples: 192757760. Policy #0 lag: (min: 47.0, avg: 160.2, max: 303.0) [2024-06-15 16:28:00,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 16:28:05,160][1652475] Updated weights for policy 0, policy_version 376432 (0.0223) [2024-06-15 16:28:05,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 41506.1, 300 sec: 43098.2). Total num frames: 770965504. Throughput: 0: 10319.6. Samples: 192792064. Policy #0 lag: (min: 47.0, avg: 160.2, max: 303.0) [2024-06-15 16:28:05,739][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 16:28:07,091][1652475] Updated weights for policy 0, policy_version 376507 (0.0014) [2024-06-15 16:28:10,738][1648984] Fps is (10 sec: 32767.7, 60 sec: 40414.0, 300 sec: 43098.2). Total num frames: 771162112. Throughput: 0: 10535.9. Samples: 192858112. 
Policy #0 lag: (min: 47.0, avg: 160.2, max: 303.0) [2024-06-15 16:28:10,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 16:28:10,837][1652475] Updated weights for policy 0, policy_version 376545 (0.0014) [2024-06-15 16:28:12,010][1652475] Updated weights for policy 0, policy_version 376592 (0.0017) [2024-06-15 16:28:13,231][1652475] Updated weights for policy 0, policy_version 376639 (0.0068) [2024-06-15 16:28:15,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 41506.3, 300 sec: 43098.2). Total num frames: 771358720. Throughput: 0: 10672.4. Samples: 192929280. Policy #0 lag: (min: 47.0, avg: 160.2, max: 303.0) [2024-06-15 16:28:15,739][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 16:28:16,443][1652475] Updated weights for policy 0, policy_version 376688 (0.0015) [2024-06-15 16:28:17,869][1652475] Updated weights for policy 0, policy_version 376752 (0.0014) [2024-06-15 16:28:20,738][1648984] Fps is (10 sec: 45875.5, 60 sec: 41506.1, 300 sec: 43098.3). Total num frames: 771620864. Throughput: 0: 10672.4. Samples: 192957952. Policy #0 lag: (min: 47.0, avg: 160.2, max: 303.0) [2024-06-15 16:28:20,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 16:28:22,144][1652475] Updated weights for policy 0, policy_version 376800 (0.0010) [2024-06-15 16:28:24,285][1652475] Updated weights for policy 0, policy_version 376892 (0.0097) [2024-06-15 16:28:25,738][1648984] Fps is (10 sec: 52428.5, 60 sec: 43690.8, 300 sec: 43542.5). Total num frames: 771883008. Throughput: 0: 10843.0. Samples: 193026048. Policy #0 lag: (min: 47.0, avg: 160.2, max: 303.0) [2024-06-15 16:28:25,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 16:28:28,658][1652475] Updated weights for policy 0, policy_version 376962 (0.0016) [2024-06-15 16:28:30,091][1652475] Updated weights for policy 0, policy_version 377021 (0.0014) [2024-06-15 16:28:30,738][1648984] Fps is (10 sec: 52428.3, 60 sec: 44236.9, 300 sec: 43542.5). Total num frames: 772145152. 
Throughput: 0: 10968.2. Samples: 193091072. Policy #0 lag: (min: 47.0, avg: 160.2, max: 303.0) [2024-06-15 16:28:30,739][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 16:28:35,041][1652475] Updated weights for policy 0, policy_version 377075 (0.0016) [2024-06-15 16:28:35,738][1648984] Fps is (10 sec: 42598.8, 60 sec: 42052.3, 300 sec: 43209.3). Total num frames: 772308992. Throughput: 0: 11241.3. Samples: 193132544. Policy #0 lag: (min: 15.0, avg: 101.0, max: 271.0) [2024-06-15 16:28:35,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 16:28:36,392][1652475] Updated weights for policy 0, policy_version 377144 (0.0127) [2024-06-15 16:28:40,030][1652475] Updated weights for policy 0, policy_version 377216 (0.0038) [2024-06-15 16:28:40,738][1648984] Fps is (10 sec: 42599.1, 60 sec: 45875.2, 300 sec: 43431.5). Total num frames: 772571136. Throughput: 0: 11127.5. Samples: 193196032. Policy #0 lag: (min: 15.0, avg: 101.0, max: 271.0) [2024-06-15 16:28:40,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 16:28:40,848][1651340] Signal inference workers to stop experience collection... (19350 times) [2024-06-15 16:28:40,915][1652475] InferenceWorker_p0-w0: stopping experience collection (19350 times) [2024-06-15 16:28:41,196][1651340] Signal inference workers to resume experience collection... (19350 times) [2024-06-15 16:28:41,196][1652475] InferenceWorker_p0-w0: resuming experience collection (19350 times) [2024-06-15 16:28:41,855][1652475] Updated weights for policy 0, policy_version 377270 (0.0092) [2024-06-15 16:28:45,738][1648984] Fps is (10 sec: 36044.7, 60 sec: 43690.7, 300 sec: 42876.1). Total num frames: 772669440. Throughput: 0: 11275.4. Samples: 193265152. 
Policy #0 lag: (min: 15.0, avg: 101.0, max: 271.0) [2024-06-15 16:28:45,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 16:28:46,970][1652475] Updated weights for policy 0, policy_version 377344 (0.0150) [2024-06-15 16:28:48,482][1652475] Updated weights for policy 0, policy_version 377400 (0.0012) [2024-06-15 16:28:50,738][1648984] Fps is (10 sec: 36044.7, 60 sec: 43690.7, 300 sec: 43098.3). Total num frames: 772931584. Throughput: 0: 11070.6. Samples: 193290240. Policy #0 lag: (min: 15.0, avg: 101.0, max: 271.0) [2024-06-15 16:28:50,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 16:28:52,106][1652475] Updated weights for policy 0, policy_version 377470 (0.0176) [2024-06-15 16:28:53,958][1652475] Updated weights for policy 0, policy_version 377530 (0.0013) [2024-06-15 16:28:55,758][1648984] Fps is (10 sec: 52321.5, 60 sec: 43675.7, 300 sec: 43095.3). Total num frames: 773193728. Throughput: 0: 10963.2. Samples: 193351680. Policy #0 lag: (min: 15.0, avg: 101.0, max: 271.0) [2024-06-15 16:28:55,759][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 16:28:55,765][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000377536_773193728.pth... [2024-06-15 16:28:55,805][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000372528_762937344.pth [2024-06-15 16:28:58,895][1652475] Updated weights for policy 0, policy_version 377589 (0.0012) [2024-06-15 16:29:00,738][1648984] Fps is (10 sec: 52428.3, 60 sec: 43690.6, 300 sec: 43098.2). Total num frames: 773455872. Throughput: 0: 10888.5. Samples: 193419264. 
Policy #0 lag: (min: 15.0, avg: 101.0, max: 271.0) [2024-06-15 16:29:00,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 16:29:02,546][1652475] Updated weights for policy 0, policy_version 377665 (0.0335) [2024-06-15 16:29:03,994][1652475] Updated weights for policy 0, policy_version 377727 (0.0012) [2024-06-15 16:29:05,738][1648984] Fps is (10 sec: 42686.2, 60 sec: 44236.9, 300 sec: 43098.3). Total num frames: 773619712. Throughput: 0: 10990.9. Samples: 193452544. Policy #0 lag: (min: 15.0, avg: 101.0, max: 271.0) [2024-06-15 16:29:05,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 16:29:06,164][1652475] Updated weights for policy 0, policy_version 377776 (0.0012) [2024-06-15 16:29:10,645][1652475] Updated weights for policy 0, policy_version 377856 (0.0014) [2024-06-15 16:29:10,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 44783.0, 300 sec: 43098.3). Total num frames: 773849088. Throughput: 0: 11150.2. Samples: 193527808. Policy #0 lag: (min: 15.0, avg: 101.0, max: 271.0) [2024-06-15 16:29:10,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 16:29:12,213][1652475] Updated weights for policy 0, policy_version 377920 (0.0021) [2024-06-15 16:29:15,237][1652475] Updated weights for policy 0, policy_version 377974 (0.0021) [2024-06-15 16:29:15,738][1648984] Fps is (10 sec: 49150.6, 60 sec: 45875.0, 300 sec: 43209.3). Total num frames: 774111232. Throughput: 0: 11002.3. Samples: 193586176. Policy #0 lag: (min: 15.0, avg: 101.0, max: 271.0) [2024-06-15 16:29:15,739][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 16:29:19,395][1652475] Updated weights for policy 0, policy_version 378016 (0.0013) [2024-06-15 16:29:20,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 43690.6, 300 sec: 43098.2). Total num frames: 774242304. Throughput: 0: 10911.3. Samples: 193623552. 
Policy #0 lag: (min: 15.0, avg: 101.0, max: 271.0) [2024-06-15 16:29:20,738][1648984] Avg episode reward: [(0, '-0.250')] [2024-06-15 16:29:21,857][1652475] Updated weights for policy 0, policy_version 378081 (0.0014) [2024-06-15 16:29:23,491][1652475] Updated weights for policy 0, policy_version 378146 (0.0015) [2024-06-15 16:29:25,738][1648984] Fps is (10 sec: 39322.2, 60 sec: 43690.7, 300 sec: 43098.2). Total num frames: 774504448. Throughput: 0: 10752.0. Samples: 193679872. Policy #0 lag: (min: 15.0, avg: 101.0, max: 271.0) [2024-06-15 16:29:25,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 16:29:26,319][1652475] Updated weights for policy 0, policy_version 378208 (0.0016) [2024-06-15 16:29:26,422][1651340] Signal inference workers to stop experience collection... (19400 times) [2024-06-15 16:29:26,476][1652475] InferenceWorker_p0-w0: stopping experience collection (19400 times) [2024-06-15 16:29:26,660][1651340] Signal inference workers to resume experience collection... (19400 times) [2024-06-15 16:29:26,661][1652475] InferenceWorker_p0-w0: resuming experience collection (19400 times) [2024-06-15 16:29:30,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 41506.2, 300 sec: 43098.2). Total num frames: 774635520. Throughput: 0: 10752.0. Samples: 193748992. Policy #0 lag: (min: 15.0, avg: 101.0, max: 271.0) [2024-06-15 16:29:30,738][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 16:29:33,057][1652475] Updated weights for policy 0, policy_version 378288 (0.0024) [2024-06-15 16:29:35,601][1652475] Updated weights for policy 0, policy_version 378385 (0.0012) [2024-06-15 16:29:35,738][1648984] Fps is (10 sec: 42597.9, 60 sec: 43690.5, 300 sec: 42765.0). Total num frames: 774930432. Throughput: 0: 10968.1. Samples: 193783808. 
Policy #0 lag: (min: 15.0, avg: 101.0, max: 271.0) [2024-06-15 16:29:35,739][1648984] Avg episode reward: [(0, '-0.250')] [2024-06-15 16:29:38,917][1652475] Updated weights for policy 0, policy_version 378464 (0.0127) [2024-06-15 16:29:40,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 43144.5, 300 sec: 43098.3). Total num frames: 775159808. Throughput: 0: 10654.5. Samples: 193830912. Policy #0 lag: (min: 15.0, avg: 101.0, max: 271.0) [2024-06-15 16:29:40,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 16:29:44,848][1652475] Updated weights for policy 0, policy_version 378498 (0.0011) [2024-06-15 16:29:45,738][1648984] Fps is (10 sec: 29491.6, 60 sec: 42598.4, 300 sec: 42653.9). Total num frames: 775225344. Throughput: 0: 10683.7. Samples: 193900032. Policy #0 lag: (min: 15.0, avg: 101.0, max: 271.0) [2024-06-15 16:29:45,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 16:29:46,730][1652475] Updated weights for policy 0, policy_version 378562 (0.0032) [2024-06-15 16:29:49,085][1652475] Updated weights for policy 0, policy_version 378656 (0.0014) [2024-06-15 16:29:50,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 43690.6, 300 sec: 42876.1). Total num frames: 775553024. Throughput: 0: 10535.8. Samples: 193926656. Policy #0 lag: (min: 15.0, avg: 101.0, max: 271.0) [2024-06-15 16:29:50,739][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 16:29:52,669][1652475] Updated weights for policy 0, policy_version 378722 (0.0095) [2024-06-15 16:29:55,738][1648984] Fps is (10 sec: 45874.8, 60 sec: 41520.3, 300 sec: 43098.2). Total num frames: 775684096. Throughput: 0: 10251.4. Samples: 193989120. 
Policy #0 lag: (min: 47.0, avg: 186.4, max: 303.0) [2024-06-15 16:29:55,739][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 16:29:56,805][1652475] Updated weights for policy 0, policy_version 378755 (0.0069) [2024-06-15 16:29:58,567][1652475] Updated weights for policy 0, policy_version 378823 (0.0015) [2024-06-15 16:30:00,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 41506.2, 300 sec: 42765.6). Total num frames: 775946240. Throughput: 0: 10228.7. Samples: 194046464. Policy #0 lag: (min: 47.0, avg: 186.4, max: 303.0) [2024-06-15 16:30:00,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 16:30:02,139][1652475] Updated weights for policy 0, policy_version 378896 (0.0014) [2024-06-15 16:30:05,738][1648984] Fps is (10 sec: 39322.2, 60 sec: 40960.0, 300 sec: 42653.9). Total num frames: 776077312. Throughput: 0: 10092.1. Samples: 194077696. Policy #0 lag: (min: 47.0, avg: 186.4, max: 303.0) [2024-06-15 16:30:05,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 16:30:05,987][1652475] Updated weights for policy 0, policy_version 378962 (0.0014) [2024-06-15 16:30:09,009][1652475] Updated weights for policy 0, policy_version 379012 (0.0037) [2024-06-15 16:30:10,287][1652475] Updated weights for policy 0, policy_version 379072 (0.0014) [2024-06-15 16:30:10,738][1648984] Fps is (10 sec: 39321.3, 60 sec: 41506.1, 300 sec: 42987.5). Total num frames: 776339456. Throughput: 0: 10353.8. Samples: 194145792. Policy #0 lag: (min: 47.0, avg: 186.4, max: 303.0) [2024-06-15 16:30:10,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 16:30:13,909][1652475] Updated weights for policy 0, policy_version 379139 (0.0023) [2024-06-15 16:30:15,209][1652475] Updated weights for policy 0, policy_version 379193 (0.0013) [2024-06-15 16:30:15,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 41506.3, 300 sec: 42653.9). Total num frames: 776601600. Throughput: 0: 10058.0. Samples: 194201600. 
Policy #0 lag: (min: 47.0, avg: 186.4, max: 303.0) [2024-06-15 16:30:15,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 16:30:17,711][1651340] Signal inference workers to stop experience collection... (19450 times) [2024-06-15 16:30:17,761][1652475] InferenceWorker_p0-w0: stopping experience collection (19450 times) [2024-06-15 16:30:17,906][1651340] Signal inference workers to resume experience collection... (19450 times) [2024-06-15 16:30:17,908][1652475] InferenceWorker_p0-w0: resuming experience collection (19450 times) [2024-06-15 16:30:18,435][1652475] Updated weights for policy 0, policy_version 379236 (0.0097) [2024-06-15 16:30:20,738][1648984] Fps is (10 sec: 39321.9, 60 sec: 41506.2, 300 sec: 43098.2). Total num frames: 776732672. Throughput: 0: 10046.6. Samples: 194235904. Policy #0 lag: (min: 47.0, avg: 186.4, max: 303.0) [2024-06-15 16:30:20,738][1648984] Avg episode reward: [(0, '-0.250')] [2024-06-15 16:30:21,297][1652475] Updated weights for policy 0, policy_version 379284 (0.0015) [2024-06-15 16:30:22,106][1652475] Updated weights for policy 0, policy_version 379325 (0.0014) [2024-06-15 16:30:25,738][1648984] Fps is (10 sec: 32767.9, 60 sec: 40413.9, 300 sec: 42876.1). Total num frames: 776929280. Throughput: 0: 10672.4. Samples: 194311168. Policy #0 lag: (min: 47.0, avg: 186.4, max: 303.0) [2024-06-15 16:30:25,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 16:30:26,345][1652475] Updated weights for policy 0, policy_version 379377 (0.0012) [2024-06-15 16:30:28,273][1652475] Updated weights for policy 0, policy_version 379452 (0.0013) [2024-06-15 16:30:30,738][1648984] Fps is (10 sec: 49151.7, 60 sec: 43144.5, 300 sec: 42987.2). Total num frames: 777224192. Throughput: 0: 10319.6. Samples: 194364416. 
Policy #0 lag: (min: 47.0, avg: 186.4, max: 303.0) [2024-06-15 16:30:30,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 16:30:32,970][1652475] Updated weights for policy 0, policy_version 379521 (0.0105) [2024-06-15 16:30:34,477][1652475] Updated weights for policy 0, policy_version 379582 (0.0030) [2024-06-15 16:30:35,738][1648984] Fps is (10 sec: 45875.3, 60 sec: 40960.2, 300 sec: 42765.0). Total num frames: 777388032. Throughput: 0: 10592.7. Samples: 194403328. Policy #0 lag: (min: 47.0, avg: 186.4, max: 303.0) [2024-06-15 16:30:35,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 16:30:39,390][1652475] Updated weights for policy 0, policy_version 379680 (0.0214) [2024-06-15 16:30:40,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 41506.1, 300 sec: 43098.2). Total num frames: 777650176. Throughput: 0: 10456.2. Samples: 194459648. Policy #0 lag: (min: 47.0, avg: 186.4, max: 303.0) [2024-06-15 16:30:40,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 16:30:43,043][1652475] Updated weights for policy 0, policy_version 379754 (0.0021) [2024-06-15 16:30:45,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 43144.6, 300 sec: 42876.1). Total num frames: 777814016. Throughput: 0: 10774.8. Samples: 194531328. Policy #0 lag: (min: 47.0, avg: 186.4, max: 303.0) [2024-06-15 16:30:45,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 16:30:46,425][1652475] Updated weights for policy 0, policy_version 379824 (0.0012) [2024-06-15 16:30:50,738][1648984] Fps is (10 sec: 36044.9, 60 sec: 40960.0, 300 sec: 42765.0). Total num frames: 778010624. Throughput: 0: 10763.4. Samples: 194562048. 
Policy #0 lag: (min: 47.0, avg: 186.4, max: 303.0) [2024-06-15 16:30:50,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 16:30:50,935][1652475] Updated weights for policy 0, policy_version 379907 (0.0014) [2024-06-15 16:30:53,989][1652475] Updated weights for policy 0, policy_version 379985 (0.0013) [2024-06-15 16:30:55,738][1648984] Fps is (10 sec: 49150.4, 60 sec: 43690.6, 300 sec: 43098.2). Total num frames: 778305536. Throughput: 0: 10638.2. Samples: 194624512. Policy #0 lag: (min: 47.0, avg: 186.4, max: 303.0) [2024-06-15 16:30:55,739][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 16:30:55,765][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000380032_778305536.pth... [2024-06-15 16:30:55,893][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000375040_768081920.pth [2024-06-15 16:30:55,899][1651340] Saving a milestone train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/milestones/checkpoint_000380032_778305536.pth [2024-06-15 16:30:58,591][1652475] Updated weights for policy 0, policy_version 380091 (0.0138) [2024-06-15 16:31:00,738][1648984] Fps is (10 sec: 42598.2, 60 sec: 41506.1, 300 sec: 42653.9). Total num frames: 778436608. Throughput: 0: 11059.2. Samples: 194699264. Policy #0 lag: (min: 47.0, avg: 186.4, max: 303.0) [2024-06-15 16:31:00,739][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 16:31:02,524][1652475] Updated weights for policy 0, policy_version 380162 (0.0112) [2024-06-15 16:31:02,822][1651340] Signal inference workers to stop experience collection... (19500 times) [2024-06-15 16:31:02,866][1652475] InferenceWorker_p0-w0: stopping experience collection (19500 times) [2024-06-15 16:31:03,098][1651340] Signal inference workers to resume experience collection... 
(19500 times) [2024-06-15 16:31:03,099][1652475] InferenceWorker_p0-w0: resuming experience collection (19500 times) [2024-06-15 16:31:05,139][1652475] Updated weights for policy 0, policy_version 380240 (0.0023) [2024-06-15 16:31:05,738][1648984] Fps is (10 sec: 45876.6, 60 sec: 44782.9, 300 sec: 42987.2). Total num frames: 778764288. Throughput: 0: 10808.9. Samples: 194722304. Policy #0 lag: (min: 47.0, avg: 186.4, max: 303.0) [2024-06-15 16:31:05,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 16:31:09,347][1652475] Updated weights for policy 0, policy_version 380294 (0.0015) [2024-06-15 16:31:10,588][1652475] Updated weights for policy 0, policy_version 380350 (0.0016) [2024-06-15 16:31:10,738][1648984] Fps is (10 sec: 52429.1, 60 sec: 43690.7, 300 sec: 42987.2). Total num frames: 778960896. Throughput: 0: 10934.0. Samples: 194803200. Policy #0 lag: (min: 47.0, avg: 186.4, max: 303.0) [2024-06-15 16:31:10,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 16:31:13,902][1652475] Updated weights for policy 0, policy_version 380422 (0.0098) [2024-06-15 16:31:15,738][1648984] Fps is (10 sec: 45875.0, 60 sec: 43690.6, 300 sec: 43098.2). Total num frames: 779223040. Throughput: 0: 10979.6. Samples: 194858496. Policy #0 lag: (min: 0.0, avg: 86.2, max: 256.0) [2024-06-15 16:31:15,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 16:31:16,703][1652475] Updated weights for policy 0, policy_version 380496 (0.0086) [2024-06-15 16:31:17,641][1652475] Updated weights for policy 0, policy_version 380543 (0.0013) [2024-06-15 16:31:20,739][1648984] Fps is (10 sec: 39319.8, 60 sec: 43690.3, 300 sec: 42876.0). Total num frames: 779354112. Throughput: 0: 10933.9. Samples: 194895360. 
Policy #0 lag: (min: 0.0, avg: 86.2, max: 256.0) [2024-06-15 16:31:20,740][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 16:31:22,435][1652475] Updated weights for policy 0, policy_version 380599 (0.0013) [2024-06-15 16:31:25,134][1652475] Updated weights for policy 0, policy_version 380660 (0.0018) [2024-06-15 16:31:25,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 45329.0, 300 sec: 42876.1). Total num frames: 779649024. Throughput: 0: 11389.2. Samples: 194972160. Policy #0 lag: (min: 0.0, avg: 86.2, max: 256.0) [2024-06-15 16:31:25,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 16:31:26,552][1652475] Updated weights for policy 0, policy_version 380732 (0.0017) [2024-06-15 16:31:30,738][1648984] Fps is (10 sec: 52431.4, 60 sec: 44236.8, 300 sec: 43098.3). Total num frames: 779878400. Throughput: 0: 11173.0. Samples: 195034112. Policy #0 lag: (min: 0.0, avg: 86.2, max: 256.0) [2024-06-15 16:31:30,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 16:31:33,221][1652475] Updated weights for policy 0, policy_version 380801 (0.0032) [2024-06-15 16:31:35,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 44236.8, 300 sec: 42765.0). Total num frames: 780042240. Throughput: 0: 11423.3. Samples: 195076096. Policy #0 lag: (min: 0.0, avg: 86.2, max: 256.0) [2024-06-15 16:31:35,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 16:31:35,770][1652475] Updated weights for policy 0, policy_version 380882 (0.0027) [2024-06-15 16:31:36,966][1652475] Updated weights for policy 0, policy_version 380934 (0.0014) [2024-06-15 16:31:37,848][1652475] Updated weights for policy 0, policy_version 380987 (0.0012) [2024-06-15 16:31:40,738][1648984] Fps is (10 sec: 52428.4, 60 sec: 45875.2, 300 sec: 43320.4). Total num frames: 780402688. Throughput: 0: 11320.9. Samples: 195133952. 
Policy #0 lag: (min: 0.0, avg: 86.2, max: 256.0) [2024-06-15 16:31:40,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 16:31:45,738][1648984] Fps is (10 sec: 36044.3, 60 sec: 43144.4, 300 sec: 42320.7). Total num frames: 780402688. Throughput: 0: 11252.6. Samples: 195205632. Policy #0 lag: (min: 0.0, avg: 86.2, max: 256.0) [2024-06-15 16:31:45,739][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 16:31:46,568][1652475] Updated weights for policy 0, policy_version 381057 (0.0014) [2024-06-15 16:31:48,300][1652475] Updated weights for policy 0, policy_version 381139 (0.0011) [2024-06-15 16:31:48,685][1651340] Signal inference workers to stop experience collection... (19550 times) [2024-06-15 16:31:48,734][1652475] InferenceWorker_p0-w0: stopping experience collection (19550 times) [2024-06-15 16:31:48,940][1651340] Signal inference workers to resume experience collection... (19550 times) [2024-06-15 16:31:48,941][1652475] InferenceWorker_p0-w0: resuming experience collection (19550 times) [2024-06-15 16:31:50,564][1652475] Updated weights for policy 0, policy_version 381232 (0.0013) [2024-06-15 16:31:50,738][1648984] Fps is (10 sec: 36045.0, 60 sec: 45875.2, 300 sec: 42987.2). Total num frames: 780763136. Throughput: 0: 11491.5. Samples: 195239424. Policy #0 lag: (min: 0.0, avg: 86.2, max: 256.0) [2024-06-15 16:31:50,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 16:31:55,738][1648984] Fps is (10 sec: 52429.4, 60 sec: 43690.9, 300 sec: 42876.1). Total num frames: 780926976. Throughput: 0: 10854.4. Samples: 195291648. Policy #0 lag: (min: 0.0, avg: 86.2, max: 256.0) [2024-06-15 16:31:55,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 16:31:59,533][1652475] Updated weights for policy 0, policy_version 381344 (0.0142) [2024-06-15 16:32:00,738][1648984] Fps is (10 sec: 32768.3, 60 sec: 44236.9, 300 sec: 42765.0). Total num frames: 781090816. Throughput: 0: 11241.3. Samples: 195364352. 
Policy #0 lag: (min: 0.0, avg: 86.2, max: 256.0) [2024-06-15 16:32:00,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 16:32:01,673][1652475] Updated weights for policy 0, policy_version 381440 (0.0107) [2024-06-15 16:32:04,201][1652475] Updated weights for policy 0, policy_version 381520 (0.0014) [2024-06-15 16:32:05,740][1648984] Fps is (10 sec: 52428.5, 60 sec: 44782.9, 300 sec: 43098.3). Total num frames: 781451264. Throughput: 0: 10854.5. Samples: 195383808. Policy #0 lag: (min: 0.0, avg: 86.2, max: 256.0) [2024-06-15 16:32:05,741][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 16:32:10,738][1648984] Fps is (10 sec: 36043.9, 60 sec: 41506.0, 300 sec: 42653.9). Total num frames: 781451264. Throughput: 0: 10706.4. Samples: 195453952. Policy #0 lag: (min: 0.0, avg: 86.2, max: 256.0) [2024-06-15 16:32:10,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 16:32:11,097][1652475] Updated weights for policy 0, policy_version 381585 (0.0028) [2024-06-15 16:32:12,901][1652475] Updated weights for policy 0, policy_version 381667 (0.0018) [2024-06-15 16:32:14,760][1652475] Updated weights for policy 0, policy_version 381728 (0.0015) [2024-06-15 16:32:15,738][1648984] Fps is (10 sec: 39320.9, 60 sec: 43690.5, 300 sec: 43098.2). Total num frames: 781844480. Throughput: 0: 10615.4. Samples: 195511808. Policy #0 lag: (min: 0.0, avg: 86.2, max: 256.0) [2024-06-15 16:32:15,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 16:32:17,668][1652475] Updated weights for policy 0, policy_version 381792 (0.0017) [2024-06-15 16:32:20,738][1648984] Fps is (10 sec: 52429.9, 60 sec: 43691.0, 300 sec: 43098.3). Total num frames: 781975552. Throughput: 0: 10524.4. Samples: 195549696. 
Policy #0 lag: (min: 0.0, avg: 86.2, max: 256.0) [2024-06-15 16:32:20,742][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 16:32:22,435][1652475] Updated weights for policy 0, policy_version 381840 (0.0038) [2024-06-15 16:32:24,275][1652475] Updated weights for policy 0, policy_version 381924 (0.0098) [2024-06-15 16:32:25,738][1648984] Fps is (10 sec: 39322.4, 60 sec: 43144.5, 300 sec: 43209.4). Total num frames: 782237696. Throughput: 0: 10717.9. Samples: 195616256. Policy #0 lag: (min: 0.0, avg: 86.2, max: 256.0) [2024-06-15 16:32:25,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 16:32:27,435][1652475] Updated weights for policy 0, policy_version 381987 (0.0012) [2024-06-15 16:32:28,483][1652475] Updated weights for policy 0, policy_version 382018 (0.0027) [2024-06-15 16:32:29,550][1652475] Updated weights for policy 0, policy_version 382075 (0.0034) [2024-06-15 16:32:30,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 43690.7, 300 sec: 43098.3). Total num frames: 782499840. Throughput: 0: 10558.6. Samples: 195680768. Policy #0 lag: (min: 0.0, avg: 86.2, max: 256.0) [2024-06-15 16:32:30,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 16:32:34,009][1651340] Signal inference workers to stop experience collection... (19600 times) [2024-06-15 16:32:34,059][1652475] InferenceWorker_p0-w0: stopping experience collection (19600 times) [2024-06-15 16:32:34,212][1651340] Signal inference workers to resume experience collection... (19600 times) [2024-06-15 16:32:34,212][1652475] InferenceWorker_p0-w0: resuming experience collection (19600 times) [2024-06-15 16:32:34,596][1652475] Updated weights for policy 0, policy_version 382135 (0.0012) [2024-06-15 16:32:35,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 43144.5, 300 sec: 43431.5). Total num frames: 782630912. Throughput: 0: 10638.2. Samples: 195718144. 
Policy #0 lag: (min: 15.0, avg: 89.8, max: 271.0) [2024-06-15 16:32:35,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 16:32:37,900][1652475] Updated weights for policy 0, policy_version 382194 (0.0037) [2024-06-15 16:32:39,864][1652475] Updated weights for policy 0, policy_version 382266 (0.0011) [2024-06-15 16:32:40,738][1648984] Fps is (10 sec: 42598.1, 60 sec: 42052.3, 300 sec: 43653.6). Total num frames: 782925824. Throughput: 0: 10831.6. Samples: 195779072. Policy #0 lag: (min: 15.0, avg: 89.8, max: 271.0) [2024-06-15 16:32:40,740][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 16:32:41,430][1652475] Updated weights for policy 0, policy_version 382327 (0.0012) [2024-06-15 16:32:45,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 43690.7, 300 sec: 43098.3). Total num frames: 783024128. Throughput: 0: 10729.2. Samples: 195847168. Policy #0 lag: (min: 15.0, avg: 89.8, max: 271.0) [2024-06-15 16:32:45,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 16:32:46,329][1652475] Updated weights for policy 0, policy_version 382370 (0.0013) [2024-06-15 16:32:49,529][1652475] Updated weights for policy 0, policy_version 382422 (0.0016) [2024-06-15 16:32:50,738][1648984] Fps is (10 sec: 36045.1, 60 sec: 42052.3, 300 sec: 43098.3). Total num frames: 783286272. Throughput: 0: 11093.4. Samples: 195883008. Policy #0 lag: (min: 15.0, avg: 89.8, max: 271.0) [2024-06-15 16:32:50,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 16:32:51,255][1652475] Updated weights for policy 0, policy_version 382485 (0.0011) [2024-06-15 16:32:52,928][1652475] Updated weights for policy 0, policy_version 382560 (0.0184) [2024-06-15 16:32:53,654][1652475] Updated weights for policy 0, policy_version 382589 (0.0011) [2024-06-15 16:32:55,738][1648984] Fps is (10 sec: 52428.2, 60 sec: 43690.6, 300 sec: 43098.2). Total num frames: 783548416. Throughput: 0: 10854.4. Samples: 195942400. 
Policy #0 lag: (min: 15.0, avg: 89.8, max: 271.0) [2024-06-15 16:32:55,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 16:32:55,743][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000382592_783548416.pth... [2024-06-15 16:32:55,788][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000377536_773193728.pth [2024-06-15 16:32:59,077][1652475] Updated weights for policy 0, policy_version 382656 (0.0013) [2024-06-15 16:33:00,738][1648984] Fps is (10 sec: 39321.3, 60 sec: 43144.5, 300 sec: 43098.3). Total num frames: 783679488. Throughput: 0: 11195.8. Samples: 196015616. Policy #0 lag: (min: 15.0, avg: 89.8, max: 271.0) [2024-06-15 16:33:00,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 16:33:02,819][1652475] Updated weights for policy 0, policy_version 382737 (0.0014) [2024-06-15 16:33:05,187][1652475] Updated weights for policy 0, policy_version 382840 (0.0173) [2024-06-15 16:33:05,738][1648984] Fps is (10 sec: 52429.1, 60 sec: 43690.7, 300 sec: 43764.7). Total num frames: 784072704. Throughput: 0: 10922.7. Samples: 196041216. Policy #0 lag: (min: 15.0, avg: 89.8, max: 271.0) [2024-06-15 16:33:05,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 16:33:10,738][1648984] Fps is (10 sec: 42597.5, 60 sec: 44236.8, 300 sec: 43209.3). Total num frames: 784105472. Throughput: 0: 10877.1. Samples: 196105728. Policy #0 lag: (min: 15.0, avg: 89.8, max: 271.0) [2024-06-15 16:33:10,739][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 16:33:11,327][1652475] Updated weights for policy 0, policy_version 382885 (0.0013) [2024-06-15 16:33:13,975][1652475] Updated weights for policy 0, policy_version 382929 (0.0015) [2024-06-15 16:33:15,535][1652475] Updated weights for policy 0, policy_version 382992 (0.0012) [2024-06-15 16:33:15,738][1648984] Fps is (10 sec: 29491.4, 60 sec: 42052.4, 300 sec: 43209.3). Total num frames: 784367616. Throughput: 0: 10797.5. 
Samples: 196166656. Policy #0 lag: (min: 15.0, avg: 89.8, max: 271.0) [2024-06-15 16:33:15,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 16:33:16,118][1651340] Signal inference workers to stop experience collection... (19650 times) [2024-06-15 16:33:16,167][1652475] InferenceWorker_p0-w0: stopping experience collection (19650 times) [2024-06-15 16:33:16,400][1651340] Signal inference workers to resume experience collection... (19650 times) [2024-06-15 16:33:16,401][1652475] InferenceWorker_p0-w0: resuming experience collection (19650 times) [2024-06-15 16:33:17,880][1652475] Updated weights for policy 0, policy_version 383088 (0.0106) [2024-06-15 16:33:20,738][1648984] Fps is (10 sec: 49153.4, 60 sec: 43690.7, 300 sec: 43098.3). Total num frames: 784596992. Throughput: 0: 10490.3. Samples: 196190208. Policy #0 lag: (min: 15.0, avg: 89.8, max: 271.0) [2024-06-15 16:33:20,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 16:33:23,466][1652475] Updated weights for policy 0, policy_version 383138 (0.0013) [2024-06-15 16:33:25,738][1648984] Fps is (10 sec: 36044.7, 60 sec: 41506.1, 300 sec: 42654.0). Total num frames: 784728064. Throughput: 0: 10797.5. Samples: 196264960. Policy #0 lag: (min: 15.0, avg: 89.8, max: 271.0) [2024-06-15 16:33:25,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 16:33:26,251][1652475] Updated weights for policy 0, policy_version 383184 (0.0014) [2024-06-15 16:33:29,125][1652475] Updated weights for policy 0, policy_version 383312 (0.0015) [2024-06-15 16:33:30,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 43690.7, 300 sec: 43431.5). Total num frames: 785121280. Throughput: 0: 10433.4. Samples: 196316672. Policy #0 lag: (min: 15.0, avg: 89.8, max: 271.0) [2024-06-15 16:33:30,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 16:33:35,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 42052.2, 300 sec: 42653.9). Total num frames: 785154048. Throughput: 0: 10513.1. Samples: 196356096. 
Policy #0 lag: (min: 15.0, avg: 89.8, max: 271.0) [2024-06-15 16:33:35,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 16:33:35,774][1652475] Updated weights for policy 0, policy_version 383392 (0.0014) [2024-06-15 16:33:39,264][1652475] Updated weights for policy 0, policy_version 383456 (0.0014) [2024-06-15 16:33:40,738][1648984] Fps is (10 sec: 32768.0, 60 sec: 42052.3, 300 sec: 43320.4). Total num frames: 785448960. Throughput: 0: 10683.8. Samples: 196423168. Policy #0 lag: (min: 15.0, avg: 89.8, max: 271.0) [2024-06-15 16:33:40,741][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 16:33:40,854][1652475] Updated weights for policy 0, policy_version 383536 (0.0013) [2024-06-15 16:33:42,353][1652475] Updated weights for policy 0, policy_version 383600 (0.0013) [2024-06-15 16:33:45,738][1648984] Fps is (10 sec: 49148.8, 60 sec: 43690.2, 300 sec: 43098.1). Total num frames: 785645568. Throughput: 0: 10547.1. Samples: 196490240. Policy #0 lag: (min: 15.0, avg: 89.8, max: 271.0) [2024-06-15 16:33:45,739][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 16:33:48,513][1652475] Updated weights for policy 0, policy_version 383671 (0.0013) [2024-06-15 16:33:50,725][1652475] Updated weights for policy 0, policy_version 383714 (0.0013) [2024-06-15 16:33:50,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 42598.3, 300 sec: 42879.1). Total num frames: 785842176. Throughput: 0: 10672.4. Samples: 196521472. Policy #0 lag: (min: 15.0, avg: 89.8, max: 271.0) [2024-06-15 16:33:50,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 16:33:52,133][1652475] Updated weights for policy 0, policy_version 383781 (0.0012) [2024-06-15 16:33:53,888][1652475] Updated weights for policy 0, policy_version 383863 (0.0013) [2024-06-15 16:33:55,738][1648984] Fps is (10 sec: 52431.6, 60 sec: 43690.7, 300 sec: 43098.2). Total num frames: 786169856. Throughput: 0: 10570.0. Samples: 196581376. 
Policy #0 lag: (min: 19.0, avg: 163.4, max: 266.0) [2024-06-15 16:33:55,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 16:34:00,758][1648984] Fps is (10 sec: 32700.4, 60 sec: 41491.8, 300 sec: 42539.9). Total num frames: 786169856. Throughput: 0: 11008.6. Samples: 196662272. Policy #0 lag: (min: 19.0, avg: 163.4, max: 266.0) [2024-06-15 16:34:00,759][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 16:34:01,607][1651340] Signal inference workers to stop experience collection... (19700 times) [2024-06-15 16:34:01,650][1652475] InferenceWorker_p0-w0: stopping experience collection (19700 times) [2024-06-15 16:34:01,828][1651340] Signal inference workers to resume experience collection... (19700 times) [2024-06-15 16:34:01,833][1652475] InferenceWorker_p0-w0: resuming experience collection (19700 times) [2024-06-15 16:34:01,835][1652475] Updated weights for policy 0, policy_version 383936 (0.0013) [2024-06-15 16:34:03,487][1652475] Updated weights for policy 0, policy_version 384001 (0.0014) [2024-06-15 16:34:05,564][1652475] Updated weights for policy 0, policy_version 384081 (0.0023) [2024-06-15 16:34:05,750][1648984] Fps is (10 sec: 42545.8, 60 sec: 42043.6, 300 sec: 43207.5). Total num frames: 786595840. Throughput: 0: 11044.8. Samples: 196687360. Policy #0 lag: (min: 19.0, avg: 163.4, max: 266.0) [2024-06-15 16:34:05,751][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 16:34:06,453][1652475] Updated weights for policy 0, policy_version 384121 (0.0045) [2024-06-15 16:34:10,738][1648984] Fps is (10 sec: 52537.6, 60 sec: 43144.7, 300 sec: 42654.0). Total num frames: 786694144. Throughput: 0: 10695.1. Samples: 196746240. Policy #0 lag: (min: 19.0, avg: 163.4, max: 266.0) [2024-06-15 16:34:10,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 16:34:13,670][1652475] Updated weights for policy 0, policy_version 384176 (0.0117) [2024-06-15 16:34:15,738][1648984] Fps is (10 sec: 36089.7, 60 sec: 43144.5, 300 sec: 43098.3). 
Total num frames: 786956288. Throughput: 0: 11013.7. Samples: 196812288. Policy #0 lag: (min: 19.0, avg: 163.4, max: 266.0) [2024-06-15 16:34:15,738][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 16:34:15,997][1652475] Updated weights for policy 0, policy_version 384276 (0.0103) [2024-06-15 16:34:18,302][1652475] Updated weights for policy 0, policy_version 384375 (0.0019) [2024-06-15 16:34:20,738][1648984] Fps is (10 sec: 52427.9, 60 sec: 43690.5, 300 sec: 43098.2). Total num frames: 787218432. Throughput: 0: 10547.2. Samples: 196830720. Policy #0 lag: (min: 19.0, avg: 163.4, max: 266.0) [2024-06-15 16:34:20,739][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 16:34:25,738][1648984] Fps is (10 sec: 29491.2, 60 sec: 42052.2, 300 sec: 42765.0). Total num frames: 787251200. Throughput: 0: 10774.7. Samples: 196908032. Policy #0 lag: (min: 19.0, avg: 163.4, max: 266.0) [2024-06-15 16:34:25,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 16:34:26,964][1652475] Updated weights for policy 0, policy_version 384449 (0.0015) [2024-06-15 16:34:28,447][1652475] Updated weights for policy 0, policy_version 384515 (0.0012) [2024-06-15 16:34:30,143][1652475] Updated weights for policy 0, policy_version 384576 (0.0013) [2024-06-15 16:34:30,738][1648984] Fps is (10 sec: 42599.3, 60 sec: 42052.3, 300 sec: 43098.3). Total num frames: 787644416. Throughput: 0: 10456.3. Samples: 196960768. Policy #0 lag: (min: 19.0, avg: 163.4, max: 266.0) [2024-06-15 16:34:30,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 16:34:35,738][1648984] Fps is (10 sec: 49151.2, 60 sec: 43144.4, 300 sec: 42653.9). Total num frames: 787742720. Throughput: 0: 10467.5. Samples: 196992512. 
Policy #0 lag: (min: 19.0, avg: 163.4, max: 266.0) [2024-06-15 16:34:35,739][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 16:34:37,332][1652475] Updated weights for policy 0, policy_version 384657 (0.0014) [2024-06-15 16:34:38,503][1652475] Updated weights for policy 0, policy_version 384707 (0.0013) [2024-06-15 16:34:40,738][1648984] Fps is (10 sec: 36044.8, 60 sec: 42598.4, 300 sec: 43320.4). Total num frames: 788004864. Throughput: 0: 10581.4. Samples: 197057536. Policy #0 lag: (min: 19.0, avg: 163.4, max: 266.0) [2024-06-15 16:34:40,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 16:34:41,359][1652475] Updated weights for policy 0, policy_version 384769 (0.0014) [2024-06-15 16:34:41,789][1651340] Signal inference workers to stop experience collection... (19750 times) [2024-06-15 16:34:41,822][1652475] InferenceWorker_p0-w0: stopping experience collection (19750 times) [2024-06-15 16:34:42,074][1651340] Signal inference workers to resume experience collection... (19750 times) [2024-06-15 16:34:42,075][1652475] InferenceWorker_p0-w0: resuming experience collection (19750 times) [2024-06-15 16:34:42,667][1652475] Updated weights for policy 0, policy_version 384829 (0.0012) [2024-06-15 16:34:44,858][1652475] Updated weights for policy 0, policy_version 384894 (0.0013) [2024-06-15 16:34:45,738][1648984] Fps is (10 sec: 52429.9, 60 sec: 43691.1, 300 sec: 43098.3). Total num frames: 788267008. Throughput: 0: 10130.9. Samples: 197117952. Policy #0 lag: (min: 19.0, avg: 163.4, max: 266.0) [2024-06-15 16:34:45,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 16:34:50,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 42598.4, 300 sec: 43098.3). Total num frames: 788398080. Throughput: 0: 10413.6. Samples: 197155840. Policy #0 lag: (min: 19.0, avg: 163.4, max: 266.0) [2024-06-15 16:34:50,738][1648984] Avg episode reward: [(0, '-0.180')] [2024-06-15 16:34:50,739][1651340] Saving new best policy, reward=-0.180! 
[2024-06-15 16:34:51,670][1652475] Updated weights for policy 0, policy_version 384976 (0.0017) [2024-06-15 16:34:53,079][1652475] Updated weights for policy 0, policy_version 385024 (0.0013) [2024-06-15 16:34:55,738][1648984] Fps is (10 sec: 39321.3, 60 sec: 41506.2, 300 sec: 43098.2). Total num frames: 788660224. Throughput: 0: 10342.4. Samples: 197211648. Policy #0 lag: (min: 19.0, avg: 163.4, max: 266.0) [2024-06-15 16:34:55,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 16:34:55,752][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000385088_788660224.pth... [2024-06-15 16:34:56,033][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000380032_778305536.pth [2024-06-15 16:34:56,791][1652475] Updated weights for policy 0, policy_version 385124 (0.0076) [2024-06-15 16:35:00,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 43705.8, 300 sec: 43098.3). Total num frames: 788791296. Throughput: 0: 10444.8. Samples: 197282304. Policy #0 lag: (min: 19.0, avg: 163.4, max: 266.0) [2024-06-15 16:35:00,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 16:35:01,765][1652475] Updated weights for policy 0, policy_version 385171 (0.0014) [2024-06-15 16:35:04,591][1652475] Updated weights for policy 0, policy_version 385218 (0.0013) [2024-06-15 16:35:05,738][1648984] Fps is (10 sec: 32768.3, 60 sec: 39876.0, 300 sec: 42876.1). Total num frames: 788987904. Throughput: 0: 10774.8. Samples: 197315584. Policy #0 lag: (min: 19.0, avg: 163.4, max: 266.0) [2024-06-15 16:35:05,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 16:35:06,338][1652475] Updated weights for policy 0, policy_version 385282 (0.0014) [2024-06-15 16:35:07,986][1652475] Updated weights for policy 0, policy_version 385344 (0.0016) [2024-06-15 16:35:10,738][1648984] Fps is (10 sec: 52428.4, 60 sec: 43690.6, 300 sec: 43098.2). Total num frames: 789315584. Throughput: 0: 10205.9. Samples: 197367296. 
Policy #0 lag: (min: 19.0, avg: 163.4, max: 266.0) [2024-06-15 16:35:10,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 16:35:14,288][1652475] Updated weights for policy 0, policy_version 385424 (0.0011) [2024-06-15 16:35:15,738][1648984] Fps is (10 sec: 45875.2, 60 sec: 41506.2, 300 sec: 43098.3). Total num frames: 789446656. Throughput: 0: 10615.5. Samples: 197438464. Policy #0 lag: (min: 15.0, avg: 100.6, max: 271.0) [2024-06-15 16:35:15,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 16:35:17,110][1652475] Updated weights for policy 0, policy_version 385474 (0.0015) [2024-06-15 16:35:18,962][1652475] Updated weights for policy 0, policy_version 385552 (0.0011) [2024-06-15 16:35:20,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 42052.4, 300 sec: 43431.5). Total num frames: 789741568. Throughput: 0: 10706.5. Samples: 197474304. Policy #0 lag: (min: 15.0, avg: 100.6, max: 271.0) [2024-06-15 16:35:20,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 16:35:20,774][1652475] Updated weights for policy 0, policy_version 385632 (0.0105) [2024-06-15 16:35:25,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 43144.5, 300 sec: 42765.0). Total num frames: 789839872. Throughput: 0: 10638.2. Samples: 197536256. Policy #0 lag: (min: 15.0, avg: 100.6, max: 271.0) [2024-06-15 16:35:25,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 16:35:27,349][1652475] Updated weights for policy 0, policy_version 385697 (0.0015) [2024-06-15 16:35:29,919][1652475] Updated weights for policy 0, policy_version 385747 (0.0013) [2024-06-15 16:35:30,254][1651340] Signal inference workers to stop experience collection... (19800 times) [2024-06-15 16:35:30,313][1652475] InferenceWorker_p0-w0: stopping experience collection (19800 times) [2024-06-15 16:35:30,508][1651340] Signal inference workers to resume experience collection... 
(19800 times) [2024-06-15 16:35:30,510][1652475] InferenceWorker_p0-w0: resuming experience collection (19800 times) [2024-06-15 16:35:30,738][1648984] Fps is (10 sec: 32768.1, 60 sec: 40413.9, 300 sec: 42987.2). Total num frames: 790069248. Throughput: 0: 10831.6. Samples: 197605376. Policy #0 lag: (min: 15.0, avg: 100.6, max: 271.0) [2024-06-15 16:35:30,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 16:35:31,065][1652475] Updated weights for policy 0, policy_version 385796 (0.0011) [2024-06-15 16:35:32,270][1652475] Updated weights for policy 0, policy_version 385856 (0.0014) [2024-06-15 16:35:33,478][1652475] Updated weights for policy 0, policy_version 385918 (0.0013) [2024-06-15 16:35:35,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 43690.8, 300 sec: 43098.3). Total num frames: 790364160. Throughput: 0: 10615.5. Samples: 197633536. Policy #0 lag: (min: 15.0, avg: 100.6, max: 271.0) [2024-06-15 16:35:35,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 16:35:38,985][1652475] Updated weights for policy 0, policy_version 385983 (0.0013) [2024-06-15 16:35:40,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 41506.1, 300 sec: 42987.2). Total num frames: 790495232. Throughput: 0: 10968.2. Samples: 197705216. Policy #0 lag: (min: 15.0, avg: 100.6, max: 271.0) [2024-06-15 16:35:40,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 16:35:42,830][1652475] Updated weights for policy 0, policy_version 386080 (0.0013) [2024-06-15 16:35:44,690][1652475] Updated weights for policy 0, policy_version 386160 (0.0013) [2024-06-15 16:35:45,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 43690.7, 300 sec: 43653.6). Total num frames: 790888448. Throughput: 0: 10695.1. Samples: 197763584. 
Policy #0 lag: (min: 15.0, avg: 100.6, max: 271.0) [2024-06-15 16:35:45,740][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 16:35:49,869][1652475] Updated weights for policy 0, policy_version 386192 (0.0014) [2024-06-15 16:35:50,738][1648984] Fps is (10 sec: 49152.3, 60 sec: 43144.6, 300 sec: 42987.2). Total num frames: 790986752. Throughput: 0: 10843.0. Samples: 197803520. Policy #0 lag: (min: 15.0, avg: 100.6, max: 271.0) [2024-06-15 16:35:50,759][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 16:35:52,713][1652475] Updated weights for policy 0, policy_version 386241 (0.0015) [2024-06-15 16:35:54,140][1652475] Updated weights for policy 0, policy_version 386305 (0.0140) [2024-06-15 16:35:55,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 791281664. Throughput: 0: 11229.9. Samples: 197872640. Policy #0 lag: (min: 15.0, avg: 100.6, max: 271.0) [2024-06-15 16:35:55,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 16:35:56,089][1652475] Updated weights for policy 0, policy_version 386386 (0.0014) [2024-06-15 16:36:00,738][1648984] Fps is (10 sec: 42598.1, 60 sec: 43690.6, 300 sec: 42876.1). Total num frames: 791412736. Throughput: 0: 11025.1. Samples: 197934592. Policy #0 lag: (min: 15.0, avg: 100.6, max: 271.0) [2024-06-15 16:36:00,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 16:36:02,052][1652475] Updated weights for policy 0, policy_version 386448 (0.0011) [2024-06-15 16:36:03,272][1652475] Updated weights for policy 0, policy_version 386496 (0.0012) [2024-06-15 16:36:05,738][1648984] Fps is (10 sec: 36044.8, 60 sec: 44236.8, 300 sec: 42987.2). Total num frames: 791642112. Throughput: 0: 10968.2. Samples: 197967872. 
Policy #0 lag: (min: 15.0, avg: 100.6, max: 271.0) [2024-06-15 16:36:05,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 16:36:06,026][1652475] Updated weights for policy 0, policy_version 386560 (0.0018) [2024-06-15 16:36:07,380][1652475] Updated weights for policy 0, policy_version 386611 (0.0016) [2024-06-15 16:36:08,525][1651340] Signal inference workers to stop experience collection... (19850 times) [2024-06-15 16:36:08,576][1652475] InferenceWorker_p0-w0: stopping experience collection (19850 times) [2024-06-15 16:36:08,738][1651340] Signal inference workers to resume experience collection... (19850 times) [2024-06-15 16:36:08,738][1652475] InferenceWorker_p0-w0: resuming experience collection (19850 times) [2024-06-15 16:36:10,740][1648984] Fps is (10 sec: 52428.5, 60 sec: 43690.6, 300 sec: 43098.2). Total num frames: 791937024. Throughput: 0: 10945.4. Samples: 198028800. Policy #0 lag: (min: 15.0, avg: 100.6, max: 271.0) [2024-06-15 16:36:10,741][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 16:36:15,068][1652475] Updated weights for policy 0, policy_version 386689 (0.0032) [2024-06-15 16:36:15,755][1648984] Fps is (10 sec: 35980.7, 60 sec: 42585.8, 300 sec: 42873.6). Total num frames: 792002560. Throughput: 0: 11032.1. Samples: 198102016. Policy #0 lag: (min: 15.0, avg: 100.6, max: 271.0) [2024-06-15 16:36:15,756][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 16:36:16,500][1652475] Updated weights for policy 0, policy_version 386752 (0.0015) [2024-06-15 16:36:18,607][1652475] Updated weights for policy 0, policy_version 386835 (0.0095) [2024-06-15 16:36:19,812][1652475] Updated weights for policy 0, policy_version 386886 (0.0020) [2024-06-15 16:36:20,738][1648984] Fps is (10 sec: 45875.8, 60 sec: 44236.9, 300 sec: 43209.3). Total num frames: 792395776. Throughput: 0: 10956.8. Samples: 198126592. 
Policy #0 lag: (min: 15.0, avg: 100.6, max: 271.0) [2024-06-15 16:36:20,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 16:36:25,738][1648984] Fps is (10 sec: 45956.7, 60 sec: 43690.6, 300 sec: 42653.9). Total num frames: 792461312. Throughput: 0: 10717.9. Samples: 198187520. Policy #0 lag: (min: 15.0, avg: 100.6, max: 271.0) [2024-06-15 16:36:25,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 16:36:28,332][1652475] Updated weights for policy 0, policy_version 386960 (0.0012) [2024-06-15 16:36:29,807][1652475] Updated weights for policy 0, policy_version 387010 (0.0013) [2024-06-15 16:36:30,738][1648984] Fps is (10 sec: 26213.8, 60 sec: 43144.4, 300 sec: 42765.0). Total num frames: 792657920. Throughput: 0: 10877.1. Samples: 198253056. Policy #0 lag: (min: 15.0, avg: 100.6, max: 271.0) [2024-06-15 16:36:30,739][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 16:36:31,641][1652475] Updated weights for policy 0, policy_version 387088 (0.0015) [2024-06-15 16:36:34,152][1652475] Updated weights for policy 0, policy_version 387198 (0.0083) [2024-06-15 16:36:35,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 43690.6, 300 sec: 42653.9). Total num frames: 792985600. Throughput: 0: 10387.9. Samples: 198270976. Policy #0 lag: (min: 127.0, avg: 235.1, max: 335.0) [2024-06-15 16:36:35,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 16:36:40,738][1648984] Fps is (10 sec: 32768.5, 60 sec: 41506.1, 300 sec: 42653.9). Total num frames: 792985600. Throughput: 0: 10410.7. Samples: 198341120. Policy #0 lag: (min: 127.0, avg: 235.1, max: 335.0) [2024-06-15 16:36:40,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 16:36:43,119][1652475] Updated weights for policy 0, policy_version 387266 (0.0013) [2024-06-15 16:36:44,842][1652475] Updated weights for policy 0, policy_version 387344 (0.0014) [2024-06-15 16:36:45,738][1648984] Fps is (10 sec: 36045.2, 60 sec: 40960.0, 300 sec: 42653.9). Total num frames: 793346048. 
Throughput: 0: 10285.5. Samples: 198397440. Policy #0 lag: (min: 127.0, avg: 235.1, max: 335.0) [2024-06-15 16:36:45,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 16:36:46,009][1652475] Updated weights for policy 0, policy_version 387395 (0.0011) [2024-06-15 16:36:50,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 42052.2, 300 sec: 42653.9). Total num frames: 793509888. Throughput: 0: 10217.2. Samples: 198427648. Policy #0 lag: (min: 127.0, avg: 235.1, max: 335.0) [2024-06-15 16:36:50,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 16:36:52,507][1652475] Updated weights for policy 0, policy_version 387459 (0.0013) [2024-06-15 16:36:54,861][1652475] Updated weights for policy 0, policy_version 387536 (0.0056) [2024-06-15 16:36:55,460][1651340] Signal inference workers to stop experience collection... (19900 times) [2024-06-15 16:36:55,499][1652475] InferenceWorker_p0-w0: stopping experience collection (19900 times) [2024-06-15 16:36:55,687][1651340] Signal inference workers to resume experience collection... (19900 times) [2024-06-15 16:36:55,688][1652475] InferenceWorker_p0-w0: resuming experience collection (19900 times) [2024-06-15 16:36:55,738][1648984] Fps is (10 sec: 39321.1, 60 sec: 40959.9, 300 sec: 42876.1). Total num frames: 793739264. Throughput: 0: 10331.0. Samples: 198493696. Policy #0 lag: (min: 127.0, avg: 235.1, max: 335.0) [2024-06-15 16:36:55,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 16:36:55,883][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000387584_793772032.pth... [2024-06-15 16:36:55,968][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000382592_783548416.pth [2024-06-15 16:36:59,015][1652475] Updated weights for policy 0, policy_version 387632 (0.0014) [2024-06-15 16:37:00,738][1648984] Fps is (10 sec: 49151.8, 60 sec: 43144.5, 300 sec: 42542.9). Total num frames: 794001408. Throughput: 0: 10039.2. 
Samples: 198553600. Policy #0 lag: (min: 127.0, avg: 235.1, max: 335.0) [2024-06-15 16:37:00,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 16:37:00,858][1652475] Updated weights for policy 0, policy_version 387709 (0.0012) [2024-06-15 16:37:05,738][1648984] Fps is (10 sec: 36045.4, 60 sec: 40960.0, 300 sec: 42876.1). Total num frames: 794099712. Throughput: 0: 10171.7. Samples: 198584320. Policy #0 lag: (min: 127.0, avg: 235.1, max: 335.0) [2024-06-15 16:37:05,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 16:37:05,946][1652475] Updated weights for policy 0, policy_version 387747 (0.0020) [2024-06-15 16:37:07,936][1652475] Updated weights for policy 0, policy_version 387836 (0.0212) [2024-06-15 16:37:10,738][1648984] Fps is (10 sec: 29491.3, 60 sec: 39321.7, 300 sec: 42209.7). Total num frames: 794296320. Throughput: 0: 10205.9. Samples: 198646784. Policy #0 lag: (min: 127.0, avg: 235.1, max: 335.0) [2024-06-15 16:37:10,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 16:37:11,980][1652475] Updated weights for policy 0, policy_version 387888 (0.0016) [2024-06-15 16:37:13,853][1652475] Updated weights for policy 0, policy_version 387964 (0.0017) [2024-06-15 16:37:15,738][1648984] Fps is (10 sec: 45875.0, 60 sec: 42611.1, 300 sec: 42653.9). Total num frames: 794558464. Throughput: 0: 10114.9. Samples: 198708224. Policy #0 lag: (min: 127.0, avg: 235.1, max: 335.0) [2024-06-15 16:37:15,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 16:37:19,164][1652475] Updated weights for policy 0, policy_version 388032 (0.0094) [2024-06-15 16:37:20,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 40413.8, 300 sec: 42653.9). Total num frames: 794820608. Throughput: 0: 10570.0. Samples: 198746624. 
Policy #0 lag: (min: 127.0, avg: 235.1, max: 335.0) [2024-06-15 16:37:20,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 16:37:22,599][1652475] Updated weights for policy 0, policy_version 388104 (0.0017) [2024-06-15 16:37:24,272][1652475] Updated weights for policy 0, policy_version 388176 (0.0012) [2024-06-15 16:37:25,742][1648984] Fps is (10 sec: 52404.5, 60 sec: 43687.4, 300 sec: 42653.3). Total num frames: 795082752. Throughput: 0: 10398.2. Samples: 198809088. Policy #0 lag: (min: 127.0, avg: 235.1, max: 335.0) [2024-06-15 16:37:25,743][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 16:37:30,134][1652475] Updated weights for policy 0, policy_version 388256 (0.0012) [2024-06-15 16:37:30,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 42598.5, 300 sec: 42653.9). Total num frames: 795213824. Throughput: 0: 10672.4. Samples: 198877696. Policy #0 lag: (min: 127.0, avg: 235.1, max: 335.0) [2024-06-15 16:37:30,740][1648984] Avg episode reward: [(0, '-0.230')] [2024-06-15 16:37:31,823][1652475] Updated weights for policy 0, policy_version 388323 (0.0014) [2024-06-15 16:37:34,629][1652475] Updated weights for policy 0, policy_version 388384 (0.0085) [2024-06-15 16:37:35,737][1648984] Fps is (10 sec: 42618.5, 60 sec: 42052.4, 300 sec: 42654.0). Total num frames: 795508736. Throughput: 0: 10763.4. Samples: 198912000. Policy #0 lag: (min: 127.0, avg: 235.1, max: 335.0) [2024-06-15 16:37:35,738][1648984] Avg episode reward: [(0, '-0.230')] [2024-06-15 16:37:36,721][1652475] Updated weights for policy 0, policy_version 388471 (0.0012) [2024-06-15 16:37:40,738][1648984] Fps is (10 sec: 39321.3, 60 sec: 43690.7, 300 sec: 42653.9). Total num frames: 795607040. Throughput: 0: 10638.2. Samples: 198972416. Policy #0 lag: (min: 127.0, avg: 235.1, max: 335.0) [2024-06-15 16:37:40,738][1648984] Avg episode reward: [(0, '-0.230')] [2024-06-15 16:37:42,199][1651340] Signal inference workers to stop experience collection... 
(19950 times) [2024-06-15 16:37:42,249][1652475] InferenceWorker_p0-w0: stopping experience collection (19950 times) [2024-06-15 16:37:42,257][1652475] Updated weights for policy 0, policy_version 388521 (0.0015) [2024-06-15 16:37:42,366][1651340] Signal inference workers to resume experience collection... (19950 times) [2024-06-15 16:37:42,367][1652475] InferenceWorker_p0-w0: resuming experience collection (19950 times) [2024-06-15 16:37:43,799][1652475] Updated weights for policy 0, policy_version 388577 (0.0015) [2024-06-15 16:37:45,738][1648984] Fps is (10 sec: 36044.3, 60 sec: 42052.2, 300 sec: 42653.9). Total num frames: 795869184. Throughput: 0: 10888.5. Samples: 199043584. Policy #0 lag: (min: 127.0, avg: 235.1, max: 335.0) [2024-06-15 16:37:45,738][1648984] Avg episode reward: [(0, '-0.230')] [2024-06-15 16:37:46,746][1652475] Updated weights for policy 0, policy_version 388642 (0.0024) [2024-06-15 16:37:48,859][1652475] Updated weights for policy 0, policy_version 388734 (0.0108) [2024-06-15 16:37:50,738][1648984] Fps is (10 sec: 52429.4, 60 sec: 43690.7, 300 sec: 42654.0). Total num frames: 796131328. Throughput: 0: 10774.8. Samples: 199069184. Policy #0 lag: (min: 127.0, avg: 235.1, max: 335.0) [2024-06-15 16:37:50,738][1648984] Avg episode reward: [(0, '-0.230')] [2024-06-15 16:37:55,598][1652475] Updated weights for policy 0, policy_version 388821 (0.0329) [2024-06-15 16:37:55,738][1648984] Fps is (10 sec: 45875.4, 60 sec: 43144.6, 300 sec: 42876.1). Total num frames: 796327936. Throughput: 0: 11082.0. Samples: 199145472. Policy #0 lag: (min: 9.0, avg: 81.8, max: 265.0) [2024-06-15 16:37:55,738][1648984] Avg episode reward: [(0, '-0.230')] [2024-06-15 16:37:58,855][1652475] Updated weights for policy 0, policy_version 388883 (0.0015) [2024-06-15 16:38:00,617][1652475] Updated weights for policy 0, policy_version 388944 (0.0021) [2024-06-15 16:38:00,760][1648984] Fps is (10 sec: 42502.5, 60 sec: 42582.5, 300 sec: 42317.5). Total num frames: 796557312. 
Throughput: 0: 10905.8. Samples: 199199232. Policy #0 lag: (min: 9.0, avg: 81.8, max: 265.0) [2024-06-15 16:38:00,761][1648984] Avg episode reward: [(0, '-0.230')] [2024-06-15 16:38:05,738][1648984] Fps is (10 sec: 32767.9, 60 sec: 42598.4, 300 sec: 42542.9). Total num frames: 796655616. Throughput: 0: 10729.2. Samples: 199229440. Policy #0 lag: (min: 9.0, avg: 81.8, max: 265.0) [2024-06-15 16:38:05,738][1648984] Avg episode reward: [(0, '-0.230')] [2024-06-15 16:38:07,265][1652475] Updated weights for policy 0, policy_version 389024 (0.0014) [2024-06-15 16:38:09,577][1652475] Updated weights for policy 0, policy_version 389113 (0.0015) [2024-06-15 16:38:10,738][1648984] Fps is (10 sec: 36126.2, 60 sec: 43690.7, 300 sec: 42542.9). Total num frames: 796917760. Throughput: 0: 10673.5. Samples: 199289344. Policy #0 lag: (min: 9.0, avg: 81.8, max: 265.0) [2024-06-15 16:38:10,738][1648984] Avg episode reward: [(0, '-0.230')] [2024-06-15 16:38:12,706][1652475] Updated weights for policy 0, policy_version 389183 (0.0018) [2024-06-15 16:38:15,738][1648984] Fps is (10 sec: 52428.5, 60 sec: 43690.6, 300 sec: 42653.9). Total num frames: 797179904. Throughput: 0: 10501.7. Samples: 199350272. Policy #0 lag: (min: 9.0, avg: 81.8, max: 265.0) [2024-06-15 16:38:15,738][1648984] Avg episode reward: [(0, '-0.230')] [2024-06-15 16:38:19,525][1652475] Updated weights for policy 0, policy_version 389249 (0.0037) [2024-06-15 16:38:20,738][1648984] Fps is (10 sec: 36044.9, 60 sec: 40960.0, 300 sec: 42542.9). Total num frames: 797278208. Throughput: 0: 10592.7. Samples: 199388672. 
Policy #0 lag: (min: 9.0, avg: 81.8, max: 265.0) [2024-06-15 16:38:20,738][1648984] Avg episode reward: [(0, '-0.230')] [2024-06-15 16:38:21,799][1652475] Updated weights for policy 0, policy_version 389344 (0.0012) [2024-06-15 16:38:23,837][1652475] Updated weights for policy 0, policy_version 389392 (0.0012) [2024-06-15 16:38:24,983][1652475] Updated weights for policy 0, policy_version 389440 (0.0015) [2024-06-15 16:38:25,738][1648984] Fps is (10 sec: 42598.6, 60 sec: 42055.5, 300 sec: 42320.7). Total num frames: 797605888. Throughput: 0: 10604.1. Samples: 199449600. Policy #0 lag: (min: 9.0, avg: 81.8, max: 265.0) [2024-06-15 16:38:25,740][1648984] Avg episode reward: [(0, '-0.230')] [2024-06-15 16:38:26,051][1651340] Signal inference workers to stop experience collection... (20000 times) [2024-06-15 16:38:26,109][1652475] InferenceWorker_p0-w0: stopping experience collection (20000 times) [2024-06-15 16:38:26,278][1651340] Signal inference workers to resume experience collection... (20000 times) [2024-06-15 16:38:26,279][1652475] InferenceWorker_p0-w0: resuming experience collection (20000 times) [2024-06-15 16:38:26,419][1652475] Updated weights for policy 0, policy_version 389495 (0.0014) [2024-06-15 16:38:30,738][1648984] Fps is (10 sec: 42597.8, 60 sec: 41506.0, 300 sec: 42542.8). Total num frames: 797704192. Throughput: 0: 10581.3. Samples: 199519744. Policy #0 lag: (min: 9.0, avg: 81.8, max: 265.0) [2024-06-15 16:38:30,738][1648984] Avg episode reward: [(0, '-0.230')] [2024-06-15 16:38:32,478][1652475] Updated weights for policy 0, policy_version 389552 (0.0201) [2024-06-15 16:38:34,485][1652475] Updated weights for policy 0, policy_version 389628 (0.0012) [2024-06-15 16:38:35,738][1648984] Fps is (10 sec: 36044.5, 60 sec: 40959.9, 300 sec: 42431.8). Total num frames: 797966336. Throughput: 0: 10672.3. Samples: 199549440. 
Policy #0 lag: (min: 9.0, avg: 81.8, max: 265.0) [2024-06-15 16:38:35,738][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 16:38:36,928][1652475] Updated weights for policy 0, policy_version 389680 (0.0105) [2024-06-15 16:38:38,472][1652475] Updated weights for policy 0, policy_version 389747 (0.0014) [2024-06-15 16:38:40,738][1648984] Fps is (10 sec: 52429.4, 60 sec: 43690.7, 300 sec: 42654.0). Total num frames: 798228480. Throughput: 0: 10296.9. Samples: 199608832. Policy #0 lag: (min: 9.0, avg: 81.8, max: 265.0) [2024-06-15 16:38:40,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 16:38:44,497][1652475] Updated weights for policy 0, policy_version 389795 (0.0013) [2024-06-15 16:38:45,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 41506.1, 300 sec: 42431.8). Total num frames: 798359552. Throughput: 0: 10654.9. Samples: 199678464. Policy #0 lag: (min: 9.0, avg: 81.8, max: 265.0) [2024-06-15 16:38:45,739][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 16:38:46,326][1652475] Updated weights for policy 0, policy_version 389856 (0.0011) [2024-06-15 16:38:47,516][1652475] Updated weights for policy 0, policy_version 389891 (0.0014) [2024-06-15 16:38:49,324][1652475] Updated weights for policy 0, policy_version 389968 (0.0013) [2024-06-15 16:38:50,738][1648984] Fps is (10 sec: 52427.7, 60 sec: 43690.5, 300 sec: 42653.9). Total num frames: 798752768. Throughput: 0: 10706.4. Samples: 199711232. Policy #0 lag: (min: 9.0, avg: 81.8, max: 265.0) [2024-06-15 16:38:50,740][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 16:38:55,740][1648984] Fps is (10 sec: 42590.2, 60 sec: 40958.6, 300 sec: 42767.7). Total num frames: 798785536. Throughput: 0: 10774.3. Samples: 199774208. 
Policy #0 lag: (min: 9.0, avg: 81.8, max: 265.0) [2024-06-15 16:38:55,740][1648984] Avg episode reward: [(0, '-0.250')] [2024-06-15 16:38:55,787][1652475] Updated weights for policy 0, policy_version 390035 (0.0014) [2024-06-15 16:38:56,318][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000390064_798851072.pth... [2024-06-15 16:38:56,368][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000385088_788660224.pth [2024-06-15 16:38:58,744][1652475] Updated weights for policy 0, policy_version 390081 (0.0012) [2024-06-15 16:39:00,016][1652475] Updated weights for policy 0, policy_version 390144 (0.0013) [2024-06-15 16:39:00,738][1648984] Fps is (10 sec: 29491.0, 60 sec: 41521.5, 300 sec: 42211.4). Total num frames: 799047680. Throughput: 0: 10922.6. Samples: 199841792. Policy #0 lag: (min: 9.0, avg: 81.8, max: 265.0) [2024-06-15 16:39:00,738][1648984] Avg episode reward: [(0, '-0.250')] [2024-06-15 16:39:01,645][1652475] Updated weights for policy 0, policy_version 390208 (0.0012) [2024-06-15 16:39:05,738][1648984] Fps is (10 sec: 49162.0, 60 sec: 43690.7, 300 sec: 42653.9). Total num frames: 799277056. Throughput: 0: 10581.3. Samples: 199864832. Policy #0 lag: (min: 9.0, avg: 81.8, max: 265.0) [2024-06-15 16:39:05,738][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 16:39:06,777][1652475] Updated weights for policy 0, policy_version 390276 (0.0031) [2024-06-15 16:39:07,893][1652475] Updated weights for policy 0, policy_version 390324 (0.0012) [2024-06-15 16:39:10,738][1648984] Fps is (10 sec: 36045.8, 60 sec: 41506.1, 300 sec: 42209.6). Total num frames: 799408128. Throughput: 0: 10786.1. Samples: 199934976. 
Policy #0 lag: (min: 9.0, avg: 81.8, max: 265.0) [2024-06-15 16:39:10,738][1648984] Avg episode reward: [(0, '-0.250')] [2024-06-15 16:39:11,408][1652475] Updated weights for policy 0, policy_version 390368 (0.0017) [2024-06-15 16:39:13,677][1652475] Updated weights for policy 0, policy_version 390404 (0.0014) [2024-06-15 16:39:14,038][1651340] Signal inference workers to stop experience collection... (20050 times) [2024-06-15 16:39:14,099][1652475] InferenceWorker_p0-w0: stopping experience collection (20050 times) [2024-06-15 16:39:14,372][1651340] Signal inference workers to resume experience collection... (20050 times) [2024-06-15 16:39:14,373][1652475] InferenceWorker_p0-w0: resuming experience collection (20050 times) [2024-06-15 16:39:15,738][1648984] Fps is (10 sec: 42598.1, 60 sec: 42052.3, 300 sec: 42320.7). Total num frames: 799703040. Throughput: 0: 10479.0. Samples: 199991296. Policy #0 lag: (min: 15.0, avg: 133.7, max: 335.0) [2024-06-15 16:39:15,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 16:39:15,983][1652475] Updated weights for policy 0, policy_version 390496 (0.0012) [2024-06-15 16:39:19,882][1652475] Updated weights for policy 0, policy_version 390585 (0.0015) [2024-06-15 16:39:20,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 44236.8, 300 sec: 42987.2). Total num frames: 799932416. Throughput: 0: 10479.0. Samples: 200020992. Policy #0 lag: (min: 15.0, avg: 133.7, max: 335.0) [2024-06-15 16:39:20,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 16:39:23,836][1652475] Updated weights for policy 0, policy_version 390640 (0.0013) [2024-06-15 16:39:25,738][1648984] Fps is (10 sec: 36044.7, 60 sec: 40959.9, 300 sec: 42098.5). Total num frames: 800063488. Throughput: 0: 10422.0. Samples: 200077824. 
Policy #0 lag: (min: 15.0, avg: 133.7, max: 335.0) [2024-06-15 16:39:25,739][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 16:39:27,711][1652475] Updated weights for policy 0, policy_version 390674 (0.0013) [2024-06-15 16:39:30,300][1652475] Updated weights for policy 0, policy_version 390777 (0.0013) [2024-06-15 16:39:30,738][1648984] Fps is (10 sec: 39321.9, 60 sec: 43690.8, 300 sec: 42654.0). Total num frames: 800325632. Throughput: 0: 10183.1. Samples: 200136704. Policy #0 lag: (min: 15.0, avg: 133.7, max: 335.0) [2024-06-15 16:39:30,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 16:39:34,139][1652475] Updated weights for policy 0, policy_version 390840 (0.0012) [2024-06-15 16:39:35,740][1648984] Fps is (10 sec: 45865.8, 60 sec: 42597.0, 300 sec: 42431.5). Total num frames: 800522240. Throughput: 0: 10250.9. Samples: 200172544. Policy #0 lag: (min: 15.0, avg: 133.7, max: 335.0) [2024-06-15 16:39:35,740][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 16:39:36,195][1652475] Updated weights for policy 0, policy_version 390901 (0.0017) [2024-06-15 16:39:40,740][1648984] Fps is (10 sec: 36044.4, 60 sec: 40960.0, 300 sec: 42098.5). Total num frames: 800686080. Throughput: 0: 10286.0. Samples: 200237056. Policy #0 lag: (min: 15.0, avg: 133.7, max: 335.0) [2024-06-15 16:39:40,740][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 16:39:40,748][1652475] Updated weights for policy 0, policy_version 390967 (0.0015) [2024-06-15 16:39:42,230][1652475] Updated weights for policy 0, policy_version 391011 (0.0013) [2024-06-15 16:39:45,738][1648984] Fps is (10 sec: 32774.9, 60 sec: 41506.2, 300 sec: 42209.6). Total num frames: 800849920. Throughput: 0: 10240.1. Samples: 200302592. 
Policy #0 lag: (min: 15.0, avg: 133.7, max: 335.0) [2024-06-15 16:39:45,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 16:39:45,937][1652475] Updated weights for policy 0, policy_version 391056 (0.0017) [2024-06-15 16:39:48,319][1652475] Updated weights for policy 0, policy_version 391140 (0.0117) [2024-06-15 16:39:50,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 39321.7, 300 sec: 42209.6). Total num frames: 801112064. Throughput: 0: 10217.2. Samples: 200324608. Policy #0 lag: (min: 15.0, avg: 133.7, max: 335.0) [2024-06-15 16:39:50,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 16:39:53,256][1652475] Updated weights for policy 0, policy_version 391229 (0.0027) [2024-06-15 16:39:55,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 43145.9, 300 sec: 42653.9). Total num frames: 801374208. Throughput: 0: 10149.0. Samples: 200391680. Policy #0 lag: (min: 15.0, avg: 133.7, max: 335.0) [2024-06-15 16:39:55,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 16:39:58,458][1652475] Updated weights for policy 0, policy_version 391312 (0.0013) [2024-06-15 16:40:00,392][1651340] Signal inference workers to stop experience collection... (20100 times) [2024-06-15 16:40:00,464][1652475] InferenceWorker_p0-w0: stopping experience collection (20100 times) [2024-06-15 16:40:00,563][1651340] Signal inference workers to resume experience collection... (20100 times) [2024-06-15 16:40:00,564][1652475] InferenceWorker_p0-w0: resuming experience collection (20100 times) [2024-06-15 16:40:00,566][1652475] Updated weights for policy 0, policy_version 391392 (0.0016) [2024-06-15 16:40:00,738][1648984] Fps is (10 sec: 45875.5, 60 sec: 42052.5, 300 sec: 42653.9). Total num frames: 801570816. Throughput: 0: 10422.1. Samples: 200460288. 
Policy #0 lag: (min: 15.0, avg: 133.7, max: 335.0) [2024-06-15 16:40:00,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 16:40:03,892][1652475] Updated weights for policy 0, policy_version 391456 (0.0028) [2024-06-15 16:40:05,738][1648984] Fps is (10 sec: 39321.0, 60 sec: 41506.0, 300 sec: 42209.6). Total num frames: 801767424. Throughput: 0: 10615.4. Samples: 200498688. Policy #0 lag: (min: 15.0, avg: 133.7, max: 335.0) [2024-06-15 16:40:05,739][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 16:40:06,454][1652475] Updated weights for policy 0, policy_version 391536 (0.0014) [2024-06-15 16:40:10,723][1652475] Updated weights for policy 0, policy_version 391585 (0.0012) [2024-06-15 16:40:10,738][1648984] Fps is (10 sec: 39321.3, 60 sec: 42598.4, 300 sec: 42431.8). Total num frames: 801964032. Throughput: 0: 10774.8. Samples: 200562688. Policy #0 lag: (min: 15.0, avg: 133.7, max: 335.0) [2024-06-15 16:40:10,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 16:40:12,492][1652475] Updated weights for policy 0, policy_version 391664 (0.0014) [2024-06-15 16:40:15,729][1652475] Updated weights for policy 0, policy_version 391702 (0.0013) [2024-06-15 16:40:15,738][1648984] Fps is (10 sec: 42599.0, 60 sec: 41506.1, 300 sec: 42209.6). Total num frames: 802193408. Throughput: 0: 10922.6. Samples: 200628224. Policy #0 lag: (min: 15.0, avg: 133.7, max: 335.0) [2024-06-15 16:40:15,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 16:40:17,555][1652475] Updated weights for policy 0, policy_version 391749 (0.0026) [2024-06-15 16:40:18,582][1652475] Updated weights for policy 0, policy_version 391804 (0.0014) [2024-06-15 16:40:20,738][1648984] Fps is (10 sec: 45875.3, 60 sec: 41506.1, 300 sec: 42653.9). Total num frames: 802422784. Throughput: 0: 10809.4. Samples: 200658944. 
Policy #0 lag: (min: 15.0, avg: 133.7, max: 335.0) [2024-06-15 16:40:20,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 16:40:22,323][1652475] Updated weights for policy 0, policy_version 391845 (0.0014) [2024-06-15 16:40:24,371][1652475] Updated weights for policy 0, policy_version 391932 (0.0015) [2024-06-15 16:40:25,738][1648984] Fps is (10 sec: 49151.4, 60 sec: 43690.6, 300 sec: 42765.0). Total num frames: 802684928. Throughput: 0: 10865.7. Samples: 200726016. Policy #0 lag: (min: 15.0, avg: 133.7, max: 335.0) [2024-06-15 16:40:25,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 16:40:27,910][1652475] Updated weights for policy 0, policy_version 392000 (0.0012) [2024-06-15 16:40:29,602][1652475] Updated weights for policy 0, policy_version 392064 (0.0081) [2024-06-15 16:40:30,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 43690.6, 300 sec: 42653.9). Total num frames: 802947072. Throughput: 0: 10922.7. Samples: 200794112. Policy #0 lag: (min: 15.0, avg: 133.7, max: 335.0) [2024-06-15 16:40:30,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 16:40:35,521][1652475] Updated weights for policy 0, policy_version 392144 (0.0014) [2024-06-15 16:40:35,739][1648984] Fps is (10 sec: 42592.7, 60 sec: 43145.0, 300 sec: 42764.8). Total num frames: 803110912. Throughput: 0: 11331.9. Samples: 200834560. Policy #0 lag: (min: 47.0, avg: 138.5, max: 303.0) [2024-06-15 16:40:35,740][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 16:40:38,760][1652475] Updated weights for policy 0, policy_version 392208 (0.0025) [2024-06-15 16:40:40,740][1648984] Fps is (10 sec: 42598.2, 60 sec: 44782.9, 300 sec: 42320.7). Total num frames: 803373056. Throughput: 0: 11241.2. Samples: 200897536. 
Policy #0 lag: (min: 47.0, avg: 138.5, max: 303.0) [2024-06-15 16:40:40,741][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 16:40:40,987][1652475] Updated weights for policy 0, policy_version 392290 (0.0024) [2024-06-15 16:40:44,918][1652475] Updated weights for policy 0, policy_version 392352 (0.0014) [2024-06-15 16:40:45,738][1648984] Fps is (10 sec: 49159.3, 60 sec: 45875.2, 300 sec: 42765.0). Total num frames: 803602432. Throughput: 0: 11195.7. Samples: 200964096. Policy #0 lag: (min: 47.0, avg: 138.5, max: 303.0) [2024-06-15 16:40:45,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 16:40:47,629][1652475] Updated weights for policy 0, policy_version 392418 (0.0019) [2024-06-15 16:40:50,168][1651340] Signal inference workers to stop experience collection... (20150 times) [2024-06-15 16:40:50,214][1652475] InferenceWorker_p0-w0: stopping experience collection (20150 times) [2024-06-15 16:40:50,473][1651340] Signal inference workers to resume experience collection... (20150 times) [2024-06-15 16:40:50,474][1652475] InferenceWorker_p0-w0: resuming experience collection (20150 times) [2024-06-15 16:40:50,638][1652475] Updated weights for policy 0, policy_version 392469 (0.0035) [2024-06-15 16:40:50,738][1648984] Fps is (10 sec: 39321.3, 60 sec: 44236.8, 300 sec: 42320.7). Total num frames: 803766272. Throughput: 0: 11116.1. Samples: 200998912. Policy #0 lag: (min: 47.0, avg: 138.5, max: 303.0) [2024-06-15 16:40:50,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 16:40:52,690][1652475] Updated weights for policy 0, policy_version 392560 (0.0026) [2024-06-15 16:40:55,738][1648984] Fps is (10 sec: 39320.2, 60 sec: 43690.4, 300 sec: 42653.9). Total num frames: 803995648. Throughput: 0: 11138.8. Samples: 201063936. 
Policy #0 lag: (min: 47.0, avg: 138.5, max: 303.0) [2024-06-15 16:40:55,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 16:40:56,119][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000392592_804028416.pth... [2024-06-15 16:40:56,234][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000387584_793772032.pth [2024-06-15 16:40:56,611][1652475] Updated weights for policy 0, policy_version 392610 (0.0031) [2024-06-15 16:41:00,266][1652475] Updated weights for policy 0, policy_version 392672 (0.0091) [2024-06-15 16:41:00,739][1648984] Fps is (10 sec: 45875.4, 60 sec: 44236.8, 300 sec: 42653.9). Total num frames: 804225024. Throughput: 0: 11241.2. Samples: 201134080. Policy #0 lag: (min: 47.0, avg: 138.5, max: 303.0) [2024-06-15 16:41:00,740][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 16:41:02,706][1652475] Updated weights for policy 0, policy_version 392752 (0.0013) [2024-06-15 16:41:04,617][1652475] Updated weights for policy 0, policy_version 392804 (0.0014) [2024-06-15 16:41:05,738][1648984] Fps is (10 sec: 52430.5, 60 sec: 45875.3, 300 sec: 42653.9). Total num frames: 804519936. Throughput: 0: 11138.8. Samples: 201160192. Policy #0 lag: (min: 47.0, avg: 138.5, max: 303.0) [2024-06-15 16:41:05,738][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 16:41:07,538][1652475] Updated weights for policy 0, policy_version 392848 (0.0016) [2024-06-15 16:41:10,738][1648984] Fps is (10 sec: 42598.1, 60 sec: 44782.9, 300 sec: 42878.7). Total num frames: 804651008. Throughput: 0: 11138.9. Samples: 201227264. 
Policy #0 lag: (min: 47.0, avg: 138.5, max: 303.0) [2024-06-15 16:41:10,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 16:41:12,843][1652475] Updated weights for policy 0, policy_version 392902 (0.0016) [2024-06-15 16:41:14,401][1652475] Updated weights for policy 0, policy_version 392964 (0.0014) [2024-06-15 16:41:15,517][1652475] Updated weights for policy 0, policy_version 393024 (0.0020) [2024-06-15 16:41:15,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 45329.1, 300 sec: 42431.8). Total num frames: 804913152. Throughput: 0: 11082.0. Samples: 201292800. Policy #0 lag: (min: 47.0, avg: 138.5, max: 303.0) [2024-06-15 16:41:15,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 16:41:19,059][1652475] Updated weights for policy 0, policy_version 393104 (0.0016) [2024-06-15 16:41:20,738][1648984] Fps is (10 sec: 52429.4, 60 sec: 45875.2, 300 sec: 43098.3). Total num frames: 805175296. Throughput: 0: 10900.3. Samples: 201325056. Policy #0 lag: (min: 47.0, avg: 138.5, max: 303.0) [2024-06-15 16:41:20,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 16:41:25,738][1648984] Fps is (10 sec: 29491.1, 60 sec: 42052.4, 300 sec: 42542.9). Total num frames: 805208064. Throughput: 0: 10911.3. Samples: 201388544. Policy #0 lag: (min: 47.0, avg: 138.5, max: 303.0) [2024-06-15 16:41:25,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 16:41:26,256][1652475] Updated weights for policy 0, policy_version 393186 (0.0030) [2024-06-15 16:41:27,786][1652475] Updated weights for policy 0, policy_version 393248 (0.0014) [2024-06-15 16:41:29,118][1652475] Updated weights for policy 0, policy_version 393281 (0.0012) [2024-06-15 16:41:30,738][1648984] Fps is (10 sec: 42598.2, 60 sec: 44236.8, 300 sec: 42765.0). Total num frames: 805601280. Throughput: 0: 10706.5. Samples: 201445888. 
Policy #0 lag: (min: 47.0, avg: 138.5, max: 303.0) [2024-06-15 16:41:30,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 16:41:30,925][1652475] Updated weights for policy 0, policy_version 393362 (0.0018) [2024-06-15 16:41:35,738][1648984] Fps is (10 sec: 49152.0, 60 sec: 43145.6, 300 sec: 43098.3). Total num frames: 805699584. Throughput: 0: 10604.1. Samples: 201476096. Policy #0 lag: (min: 47.0, avg: 138.5, max: 303.0) [2024-06-15 16:41:35,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 16:41:38,642][1651340] Signal inference workers to stop experience collection... (20200 times) [2024-06-15 16:41:38,669][1652475] InferenceWorker_p0-w0: stopping experience collection (20200 times) [2024-06-15 16:41:38,674][1652475] Updated weights for policy 0, policy_version 393441 (0.0012) [2024-06-15 16:41:38,992][1651340] Signal inference workers to resume experience collection... (20200 times) [2024-06-15 16:41:38,993][1652475] InferenceWorker_p0-w0: resuming experience collection (20200 times) [2024-06-15 16:41:40,738][1648984] Fps is (10 sec: 29490.6, 60 sec: 42052.1, 300 sec: 42542.8). Total num frames: 805896192. Throughput: 0: 10626.9. Samples: 201542144. Policy #0 lag: (min: 47.0, avg: 138.5, max: 303.0) [2024-06-15 16:41:40,739][1648984] Avg episode reward: [(0, '-0.550')] [2024-06-15 16:41:40,921][1652475] Updated weights for policy 0, policy_version 393520 (0.0098) [2024-06-15 16:41:44,093][1652475] Updated weights for policy 0, policy_version 393621 (0.0013) [2024-06-15 16:41:45,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 43690.7, 300 sec: 43098.3). Total num frames: 806223872. Throughput: 0: 10365.2. Samples: 201600512. Policy #0 lag: (min: 47.0, avg: 138.5, max: 303.0) [2024-06-15 16:41:45,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 16:41:49,771][1652475] Updated weights for policy 0, policy_version 393680 (0.0013) [2024-06-15 16:41:50,738][1648984] Fps is (10 sec: 42599.4, 60 sec: 42598.5, 300 sec: 42654.0). 
Total num frames: 806322176. Throughput: 0: 10581.3. Samples: 201636352. Policy #0 lag: (min: 47.0, avg: 138.5, max: 303.0) [2024-06-15 16:41:50,738][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 16:41:50,767][1652475] Updated weights for policy 0, policy_version 393728 (0.0013) [2024-06-15 16:41:52,467][1652475] Updated weights for policy 0, policy_version 393786 (0.0013) [2024-06-15 16:41:55,498][1652475] Updated weights for policy 0, policy_version 393856 (0.0017) [2024-06-15 16:41:55,738][1648984] Fps is (10 sec: 39320.7, 60 sec: 43690.8, 300 sec: 42765.0). Total num frames: 806617088. Throughput: 0: 10672.3. Samples: 201707520. Policy #0 lag: (min: 11.0, avg: 117.8, max: 267.0) [2024-06-15 16:41:55,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 16:41:57,157][1652475] Updated weights for policy 0, policy_version 393915 (0.0013) [2024-06-15 16:42:00,738][1648984] Fps is (10 sec: 42598.5, 60 sec: 42052.3, 300 sec: 42876.1). Total num frames: 806748160. Throughput: 0: 10672.4. Samples: 201773056. Policy #0 lag: (min: 11.0, avg: 117.8, max: 267.0) [2024-06-15 16:42:00,738][1648984] Avg episode reward: [(0, '-0.530')] [2024-06-15 16:42:02,467][1652475] Updated weights for policy 0, policy_version 393977 (0.0018) [2024-06-15 16:42:03,591][1652475] Updated weights for policy 0, policy_version 394017 (0.0089) [2024-06-15 16:42:05,738][1648984] Fps is (10 sec: 39322.3, 60 sec: 41506.1, 300 sec: 43098.2). Total num frames: 807010304. Throughput: 0: 10672.3. Samples: 201805312. Policy #0 lag: (min: 11.0, avg: 117.8, max: 267.0) [2024-06-15 16:42:05,738][1648984] Avg episode reward: [(0, '-0.530')] [2024-06-15 16:42:06,476][1652475] Updated weights for policy 0, policy_version 394069 (0.0015) [2024-06-15 16:42:07,945][1652475] Updated weights for policy 0, policy_version 394128 (0.0017) [2024-06-15 16:42:10,738][1648984] Fps is (10 sec: 52427.7, 60 sec: 43690.6, 300 sec: 43098.2). Total num frames: 807272448. Throughput: 0: 10729.2. 
Samples: 201871360. Policy #0 lag: (min: 11.0, avg: 117.8, max: 267.0) [2024-06-15 16:42:10,739][1648984] Avg episode reward: [(0, '-0.530')] [2024-06-15 16:42:12,111][1652475] Updated weights for policy 0, policy_version 394177 (0.0015) [2024-06-15 16:42:13,410][1652475] Updated weights for policy 0, policy_version 394233 (0.0015) [2024-06-15 16:42:15,215][1652475] Updated weights for policy 0, policy_version 394296 (0.0097) [2024-06-15 16:42:15,738][1648984] Fps is (10 sec: 52428.5, 60 sec: 43690.6, 300 sec: 43098.2). Total num frames: 807534592. Throughput: 0: 11013.7. Samples: 201941504. Policy #0 lag: (min: 11.0, avg: 117.8, max: 267.0) [2024-06-15 16:42:15,738][1648984] Avg episode reward: [(0, '-0.530')] [2024-06-15 16:42:18,233][1652475] Updated weights for policy 0, policy_version 394365 (0.0015) [2024-06-15 16:42:20,469][1652475] Updated weights for policy 0, policy_version 394416 (0.0015) [2024-06-15 16:42:20,738][1648984] Fps is (10 sec: 52428.5, 60 sec: 43690.5, 300 sec: 43098.9). Total num frames: 807796736. Throughput: 0: 11116.0. Samples: 201976320. Policy #0 lag: (min: 11.0, avg: 117.8, max: 267.0) [2024-06-15 16:42:20,739][1648984] Avg episode reward: [(0, '-0.530')] [2024-06-15 16:42:24,660][1652475] Updated weights for policy 0, policy_version 394467 (0.0013) [2024-06-15 16:42:25,584][1651340] Signal inference workers to stop experience collection... (20250 times) [2024-06-15 16:42:25,621][1652475] InferenceWorker_p0-w0: stopping experience collection (20250 times) [2024-06-15 16:42:25,738][1648984] Fps is (10 sec: 39322.0, 60 sec: 45329.1, 300 sec: 43098.2). Total num frames: 807927808. Throughput: 0: 11241.3. Samples: 202048000. Policy #0 lag: (min: 11.0, avg: 117.8, max: 267.0) [2024-06-15 16:42:25,738][1648984] Avg episode reward: [(0, '-0.530')] [2024-06-15 16:42:25,812][1651340] Signal inference workers to resume experience collection... 
(20250 times) [2024-06-15 16:42:25,813][1652475] InferenceWorker_p0-w0: resuming experience collection (20250 times) [2024-06-15 16:42:26,574][1652475] Updated weights for policy 0, policy_version 394544 (0.0012) [2024-06-15 16:42:29,770][1652475] Updated weights for policy 0, policy_version 394618 (0.0023) [2024-06-15 16:42:30,739][1648984] Fps is (10 sec: 39318.7, 60 sec: 43143.9, 300 sec: 42987.0). Total num frames: 808189952. Throughput: 0: 11332.0. Samples: 202110464. Policy #0 lag: (min: 11.0, avg: 117.8, max: 267.0) [2024-06-15 16:42:30,739][1648984] Avg episode reward: [(0, '-0.530')] [2024-06-15 16:42:32,707][1652475] Updated weights for policy 0, policy_version 394681 (0.0014) [2024-06-15 16:42:35,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 44236.8, 300 sec: 43209.3). Total num frames: 808353792. Throughput: 0: 11332.3. Samples: 202146304. Policy #0 lag: (min: 11.0, avg: 117.8, max: 267.0) [2024-06-15 16:42:35,738][1648984] Avg episode reward: [(0, '-0.530')] [2024-06-15 16:42:36,160][1652475] Updated weights for policy 0, policy_version 394736 (0.0018) [2024-06-15 16:42:37,701][1652475] Updated weights for policy 0, policy_version 394784 (0.0086) [2024-06-15 16:42:40,639][1652475] Updated weights for policy 0, policy_version 394832 (0.0012) [2024-06-15 16:42:40,738][1648984] Fps is (10 sec: 42601.6, 60 sec: 45329.1, 300 sec: 43209.3). Total num frames: 808615936. Throughput: 0: 11309.5. Samples: 202216448. Policy #0 lag: (min: 11.0, avg: 117.8, max: 267.0) [2024-06-15 16:42:40,739][1648984] Avg episode reward: [(0, '-0.530')] [2024-06-15 16:42:42,920][1652475] Updated weights for policy 0, policy_version 394883 (0.0052) [2024-06-15 16:42:44,428][1652475] Updated weights for policy 0, policy_version 394943 (0.0014) [2024-06-15 16:42:45,738][1648984] Fps is (10 sec: 49152.2, 60 sec: 43690.7, 300 sec: 43098.2). Total num frames: 808845312. Throughput: 0: 11264.0. Samples: 202279936. 
Policy #0 lag: (min: 11.0, avg: 117.8, max: 267.0) [2024-06-15 16:42:45,738][1648984] Avg episode reward: [(0, '-0.530')] [2024-06-15 16:42:49,231][1652475] Updated weights for policy 0, policy_version 395024 (0.0012) [2024-06-15 16:42:50,738][1648984] Fps is (10 sec: 49153.3, 60 sec: 46421.4, 300 sec: 43320.4). Total num frames: 809107456. Throughput: 0: 11355.0. Samples: 202316288. Policy #0 lag: (min: 11.0, avg: 117.8, max: 267.0) [2024-06-15 16:42:50,740][1648984] Avg episode reward: [(0, '-0.530')] [2024-06-15 16:42:52,516][1652475] Updated weights for policy 0, policy_version 395075 (0.0035) [2024-06-15 16:42:55,040][1652475] Updated weights for policy 0, policy_version 395152 (0.0012) [2024-06-15 16:42:55,738][1648984] Fps is (10 sec: 45875.0, 60 sec: 44783.1, 300 sec: 43212.6). Total num frames: 809304064. Throughput: 0: 11355.1. Samples: 202382336. Policy #0 lag: (min: 11.0, avg: 117.8, max: 267.0) [2024-06-15 16:42:55,738][1648984] Avg episode reward: [(0, '-0.530')] [2024-06-15 16:42:56,141][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000395200_809369600.pth... [2024-06-15 16:42:56,194][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000390064_798851072.pth [2024-06-15 16:42:58,687][1652475] Updated weights for policy 0, policy_version 395201 (0.0032) [2024-06-15 16:43:00,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 45875.2, 300 sec: 43542.6). Total num frames: 809500672. Throughput: 0: 11320.9. Samples: 202450944. Policy #0 lag: (min: 11.0, avg: 117.8, max: 267.0) [2024-06-15 16:43:00,738][1648984] Avg episode reward: [(0, '-0.530')] [2024-06-15 16:43:01,329][1652475] Updated weights for policy 0, policy_version 395296 (0.0083) [2024-06-15 16:43:05,017][1652475] Updated weights for policy 0, policy_version 395382 (0.0014) [2024-06-15 16:43:05,738][1648984] Fps is (10 sec: 45875.3, 60 sec: 45875.2, 300 sec: 43542.6). Total num frames: 809762816. Throughput: 0: 11252.7. 
Samples: 202482688. Policy #0 lag: (min: 11.0, avg: 117.8, max: 267.0) [2024-06-15 16:43:05,738][1648984] Avg episode reward: [(0, '-0.530')] [2024-06-15 16:43:07,072][1652475] Updated weights for policy 0, policy_version 395424 (0.0012) [2024-06-15 16:43:10,738][1648984] Fps is (10 sec: 39321.2, 60 sec: 43690.8, 300 sec: 43098.2). Total num frames: 809893888. Throughput: 0: 11207.1. Samples: 202552320. Policy #0 lag: (min: 11.0, avg: 117.8, max: 267.0) [2024-06-15 16:43:10,738][1648984] Avg episode reward: [(0, '-0.550')] [2024-06-15 16:43:10,866][1652475] Updated weights for policy 0, policy_version 395472 (0.0013) [2024-06-15 16:43:12,209][1651340] Signal inference workers to stop experience collection... (20300 times) [2024-06-15 16:43:12,264][1652475] InferenceWorker_p0-w0: stopping experience collection (20300 times) [2024-06-15 16:43:12,414][1651340] Signal inference workers to resume experience collection... (20300 times) [2024-06-15 16:43:12,415][1652475] InferenceWorker_p0-w0: resuming experience collection (20300 times) [2024-06-15 16:43:12,492][1652475] Updated weights for policy 0, policy_version 395536 (0.0088) [2024-06-15 16:43:15,740][1648984] Fps is (10 sec: 39321.3, 60 sec: 43690.7, 300 sec: 43653.6). Total num frames: 810156032. Throughput: 0: 11252.9. Samples: 202616832. Policy #0 lag: (min: 11.0, avg: 117.8, max: 267.0) [2024-06-15 16:43:15,741][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 16:43:16,831][1652475] Updated weights for policy 0, policy_version 395642 (0.0025) [2024-06-15 16:43:19,144][1652475] Updated weights for policy 0, policy_version 395680 (0.0012) [2024-06-15 16:43:20,738][1648984] Fps is (10 sec: 52428.0, 60 sec: 43690.7, 300 sec: 43431.5). Total num frames: 810418176. Throughput: 0: 11195.7. Samples: 202650112. 
Policy #0 lag: (min: 15.0, avg: 129.5, max: 271.0) [2024-06-15 16:43:20,739][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 16:43:23,145][1652475] Updated weights for policy 0, policy_version 395728 (0.0011) [2024-06-15 16:43:24,355][1652475] Updated weights for policy 0, policy_version 395780 (0.0024) [2024-06-15 16:43:25,679][1652475] Updated weights for policy 0, policy_version 395833 (0.0112) [2024-06-15 16:43:25,738][1648984] Fps is (10 sec: 49152.1, 60 sec: 45329.0, 300 sec: 43875.8). Total num frames: 810647552. Throughput: 0: 11104.8. Samples: 202716160. Policy #0 lag: (min: 15.0, avg: 129.5, max: 271.0) [2024-06-15 16:43:25,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 16:43:28,597][1652475] Updated weights for policy 0, policy_version 395888 (0.0014) [2024-06-15 16:43:30,738][1648984] Fps is (10 sec: 42599.2, 60 sec: 44237.5, 300 sec: 43653.7). Total num frames: 810844160. Throughput: 0: 11150.2. Samples: 202781696. Policy #0 lag: (min: 15.0, avg: 129.5, max: 271.0) [2024-06-15 16:43:30,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 16:43:31,361][1652475] Updated weights for policy 0, policy_version 395964 (0.0014) [2024-06-15 16:43:35,207][1652475] Updated weights for policy 0, policy_version 396027 (0.0014) [2024-06-15 16:43:35,738][1648984] Fps is (10 sec: 42598.6, 60 sec: 45329.1, 300 sec: 43542.6). Total num frames: 811073536. Throughput: 0: 11138.8. Samples: 202817536. Policy #0 lag: (min: 15.0, avg: 129.5, max: 271.0) [2024-06-15 16:43:35,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 16:43:37,583][1652475] Updated weights for policy 0, policy_version 396064 (0.0013) [2024-06-15 16:43:39,536][1652475] Updated weights for policy 0, policy_version 396144 (0.0050) [2024-06-15 16:43:40,738][1648984] Fps is (10 sec: 49151.7, 60 sec: 45329.2, 300 sec: 43986.9). Total num frames: 811335680. Throughput: 0: 11059.2. Samples: 202880000. 
Policy #0 lag: (min: 15.0, avg: 129.5, max: 271.0) [2024-06-15 16:43:40,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 16:43:41,998][1652475] Updated weights for policy 0, policy_version 396206 (0.0077) [2024-06-15 16:43:45,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 43690.6, 300 sec: 43098.3). Total num frames: 811466752. Throughput: 0: 11138.8. Samples: 202952192. Policy #0 lag: (min: 15.0, avg: 129.5, max: 271.0) [2024-06-15 16:43:45,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 16:43:48,152][1652475] Updated weights for policy 0, policy_version 396256 (0.0013) [2024-06-15 16:43:50,450][1652475] Updated weights for policy 0, policy_version 396349 (0.0013) [2024-06-15 16:43:50,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 43690.6, 300 sec: 43876.1). Total num frames: 811728896. Throughput: 0: 11127.5. Samples: 202983424. Policy #0 lag: (min: 15.0, avg: 129.5, max: 271.0) [2024-06-15 16:43:50,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 16:43:53,665][1652475] Updated weights for policy 0, policy_version 396405 (0.0015) [2024-06-15 16:43:55,355][1652475] Updated weights for policy 0, policy_version 396472 (0.0011) [2024-06-15 16:43:55,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 44783.0, 300 sec: 43875.8). Total num frames: 811991040. Throughput: 0: 10729.3. Samples: 203035136. Policy #0 lag: (min: 15.0, avg: 129.5, max: 271.0) [2024-06-15 16:43:55,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 16:44:00,470][1651340] Signal inference workers to stop experience collection... (20350 times) [2024-06-15 16:44:00,520][1652475] InferenceWorker_p0-w0: stopping experience collection (20350 times) [2024-06-15 16:44:00,679][1651340] Signal inference workers to resume experience collection... (20350 times) [2024-06-15 16:44:00,680][1652475] InferenceWorker_p0-w0: resuming experience collection (20350 times) [2024-06-15 16:44:00,738][1648984] Fps is (10 sec: 36045.0, 60 sec: 43144.5, 300 sec: 43431.5). 
Total num frames: 812089344. Throughput: 0: 10706.5. Samples: 203098624. Policy #0 lag: (min: 15.0, avg: 129.5, max: 271.0) [2024-06-15 16:44:00,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 16:44:00,872][1652475] Updated weights for policy 0, policy_version 396537 (0.0012) [2024-06-15 16:44:05,738][1648984] Fps is (10 sec: 26214.1, 60 sec: 41506.1, 300 sec: 43542.5). Total num frames: 812253184. Throughput: 0: 10535.8. Samples: 203124224. Policy #0 lag: (min: 15.0, avg: 129.5, max: 271.0) [2024-06-15 16:44:05,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 16:44:06,402][1652475] Updated weights for policy 0, policy_version 396624 (0.0014) [2024-06-15 16:44:08,436][1652475] Updated weights for policy 0, policy_version 396720 (0.0098) [2024-06-15 16:44:10,738][1648984] Fps is (10 sec: 42597.9, 60 sec: 43690.6, 300 sec: 43431.5). Total num frames: 812515328. Throughput: 0: 10467.5. Samples: 203187200. Policy #0 lag: (min: 15.0, avg: 129.5, max: 271.0) [2024-06-15 16:44:10,739][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 16:44:12,624][1652475] Updated weights for policy 0, policy_version 396797 (0.0014) [2024-06-15 16:44:15,738][1648984] Fps is (10 sec: 52429.1, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 812777472. Throughput: 0: 10456.2. Samples: 203252224. Policy #0 lag: (min: 15.0, avg: 129.5, max: 271.0) [2024-06-15 16:44:15,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 16:44:17,946][1652475] Updated weights for policy 0, policy_version 396868 (0.0014) [2024-06-15 16:44:20,254][1652475] Updated weights for policy 0, policy_version 396944 (0.0015) [2024-06-15 16:44:20,738][1648984] Fps is (10 sec: 45875.7, 60 sec: 42598.5, 300 sec: 43764.7). Total num frames: 812974080. Throughput: 0: 10513.1. Samples: 203290624. 
Policy #0 lag: (min: 15.0, avg: 129.5, max: 271.0) [2024-06-15 16:44:20,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 16:44:23,689][1652475] Updated weights for policy 0, policy_version 397009 (0.0017) [2024-06-15 16:44:25,051][1652475] Updated weights for policy 0, policy_version 397077 (0.0128) [2024-06-15 16:44:25,738][1648984] Fps is (10 sec: 49152.2, 60 sec: 43690.7, 300 sec: 43875.8). Total num frames: 813268992. Throughput: 0: 10683.7. Samples: 203360768. Policy #0 lag: (min: 15.0, avg: 129.5, max: 271.0) [2024-06-15 16:44:25,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 16:44:29,502][1652475] Updated weights for policy 0, policy_version 397127 (0.0015) [2024-06-15 16:44:30,738][1648984] Fps is (10 sec: 42598.8, 60 sec: 42598.5, 300 sec: 43654.0). Total num frames: 813400064. Throughput: 0: 10570.0. Samples: 203427840. Policy #0 lag: (min: 15.0, avg: 129.5, max: 271.0) [2024-06-15 16:44:30,740][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 16:44:30,743][1652475] Updated weights for policy 0, policy_version 397175 (0.0018) [2024-06-15 16:44:34,866][1652475] Updated weights for policy 0, policy_version 397252 (0.0016) [2024-06-15 16:44:35,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 43144.5, 300 sec: 43986.9). Total num frames: 813662208. Throughput: 0: 10547.2. Samples: 203458048. Policy #0 lag: (min: 15.0, avg: 129.5, max: 271.0) [2024-06-15 16:44:35,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 16:44:36,036][1652475] Updated weights for policy 0, policy_version 397318 (0.0013) [2024-06-15 16:44:37,160][1652475] Updated weights for policy 0, policy_version 397376 (0.0013) [2024-06-15 16:44:40,738][1648984] Fps is (10 sec: 42597.1, 60 sec: 41506.0, 300 sec: 43986.8). Total num frames: 813826048. Throughput: 0: 11025.0. Samples: 203531264. 
Policy #0 lag: (min: 47.0, avg: 161.3, max: 303.0) [2024-06-15 16:44:40,739][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 16:44:42,139][1652475] Updated weights for policy 0, policy_version 397434 (0.0034) [2024-06-15 16:44:44,176][1652475] Updated weights for policy 0, policy_version 397504 (0.0035) [2024-06-15 16:44:45,738][1648984] Fps is (10 sec: 42598.5, 60 sec: 43690.7, 300 sec: 43986.9). Total num frames: 814088192. Throughput: 0: 11116.1. Samples: 203598848. Policy #0 lag: (min: 47.0, avg: 161.3, max: 303.0) [2024-06-15 16:44:45,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 16:44:46,732][1651340] Signal inference workers to stop experience collection... (20400 times) [2024-06-15 16:44:46,792][1652475] InferenceWorker_p0-w0: stopping experience collection (20400 times) [2024-06-15 16:44:46,961][1651340] Signal inference workers to resume experience collection... (20400 times) [2024-06-15 16:44:46,962][1652475] InferenceWorker_p0-w0: resuming experience collection (20400 times) [2024-06-15 16:44:47,762][1652475] Updated weights for policy 0, policy_version 397574 (0.0013) [2024-06-15 16:44:48,777][1652475] Updated weights for policy 0, policy_version 397632 (0.0013) [2024-06-15 16:44:50,738][1648984] Fps is (10 sec: 52429.6, 60 sec: 43690.7, 300 sec: 43986.9). Total num frames: 814350336. Throughput: 0: 11298.1. Samples: 203632640. Policy #0 lag: (min: 47.0, avg: 161.3, max: 303.0) [2024-06-15 16:44:50,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 16:44:54,616][1652475] Updated weights for policy 0, policy_version 397699 (0.0013) [2024-06-15 16:44:55,738][1648984] Fps is (10 sec: 49150.4, 60 sec: 43144.3, 300 sec: 44097.9). Total num frames: 814579712. Throughput: 0: 11491.5. Samples: 203704320. 
Policy #0 lag: (min: 47.0, avg: 161.3, max: 303.0) [2024-06-15 16:44:55,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 16:44:55,754][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000397760_814612480.pth... [2024-06-15 16:44:55,814][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000392592_804028416.pth [2024-06-15 16:44:57,974][1652475] Updated weights for policy 0, policy_version 397776 (0.0015) [2024-06-15 16:44:59,899][1652475] Updated weights for policy 0, policy_version 397872 (0.0021) [2024-06-15 16:45:00,740][1648984] Fps is (10 sec: 52428.6, 60 sec: 46421.3, 300 sec: 44431.2). Total num frames: 814874624. Throughput: 0: 11400.5. Samples: 203765248. Policy #0 lag: (min: 47.0, avg: 161.3, max: 303.0) [2024-06-15 16:45:00,741][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 16:45:04,752][1652475] Updated weights for policy 0, policy_version 397923 (0.0013) [2024-06-15 16:45:05,738][1648984] Fps is (10 sec: 42599.9, 60 sec: 45875.3, 300 sec: 44209.0). Total num frames: 815005696. Throughput: 0: 11446.0. Samples: 203805696. Policy #0 lag: (min: 47.0, avg: 161.3, max: 303.0) [2024-06-15 16:45:05,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 16:45:06,599][1652475] Updated weights for policy 0, policy_version 397972 (0.0017) [2024-06-15 16:45:08,835][1652475] Updated weights for policy 0, policy_version 398017 (0.0014) [2024-06-15 16:45:10,738][1648984] Fps is (10 sec: 42598.6, 60 sec: 46421.4, 300 sec: 44431.2). Total num frames: 815300608. Throughput: 0: 11457.4. Samples: 203876352. Policy #0 lag: (min: 47.0, avg: 161.3, max: 303.0) [2024-06-15 16:45:10,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 16:45:10,814][1652475] Updated weights for policy 0, policy_version 398112 (0.0015) [2024-06-15 16:45:15,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 43690.7, 300 sec: 43986.9). Total num frames: 815398912. Throughput: 0: 11446.0. 
Samples: 203942912. Policy #0 lag: (min: 47.0, avg: 161.3, max: 303.0) [2024-06-15 16:45:15,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 16:45:16,282][1652475] Updated weights for policy 0, policy_version 398176 (0.0029) [2024-06-15 16:45:18,350][1652475] Updated weights for policy 0, policy_version 398243 (0.0017) [2024-06-15 16:45:20,738][1648984] Fps is (10 sec: 36044.9, 60 sec: 44782.9, 300 sec: 43986.9). Total num frames: 815661056. Throughput: 0: 11411.9. Samples: 203971584. Policy #0 lag: (min: 47.0, avg: 161.3, max: 303.0) [2024-06-15 16:45:20,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 16:45:21,955][1652475] Updated weights for policy 0, policy_version 398323 (0.0014) [2024-06-15 16:45:23,704][1652475] Updated weights for policy 0, policy_version 398393 (0.0014) [2024-06-15 16:45:25,738][1648984] Fps is (10 sec: 52428.4, 60 sec: 44236.8, 300 sec: 43986.9). Total num frames: 815923200. Throughput: 0: 11195.8. Samples: 204035072. Policy #0 lag: (min: 47.0, avg: 161.3, max: 303.0) [2024-06-15 16:45:25,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 16:45:28,083][1652475] Updated weights for policy 0, policy_version 398456 (0.0094) [2024-06-15 16:45:30,121][1652475] Updated weights for policy 0, policy_version 398496 (0.0014) [2024-06-15 16:45:30,234][1651340] Signal inference workers to stop experience collection... (20450 times) [2024-06-15 16:45:30,303][1652475] InferenceWorker_p0-w0: stopping experience collection (20450 times) [2024-06-15 16:45:30,509][1651340] Signal inference workers to resume experience collection... (20450 times) [2024-06-15 16:45:30,510][1652475] InferenceWorker_p0-w0: resuming experience collection (20450 times) [2024-06-15 16:45:30,738][1648984] Fps is (10 sec: 49151.1, 60 sec: 45875.0, 300 sec: 44209.2). Total num frames: 816152576. Throughput: 0: 11309.5. Samples: 204107776. 
Policy #0 lag: (min: 47.0, avg: 161.3, max: 303.0) [2024-06-15 16:45:30,739][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 16:45:33,063][1652475] Updated weights for policy 0, policy_version 398544 (0.0013) [2024-06-15 16:45:35,516][1652475] Updated weights for policy 0, policy_version 398640 (0.0083) [2024-06-15 16:45:35,738][1648984] Fps is (10 sec: 52427.5, 60 sec: 46421.1, 300 sec: 44320.1). Total num frames: 816447488. Throughput: 0: 11389.1. Samples: 204145152. Policy #0 lag: (min: 47.0, avg: 161.3, max: 303.0) [2024-06-15 16:45:35,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 16:45:39,145][1652475] Updated weights for policy 0, policy_version 398704 (0.0013) [2024-06-15 16:45:40,738][1648984] Fps is (10 sec: 42599.4, 60 sec: 45875.4, 300 sec: 43986.9). Total num frames: 816578560. Throughput: 0: 11070.7. Samples: 204202496. Policy #0 lag: (min: 47.0, avg: 161.3, max: 303.0) [2024-06-15 16:45:40,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 16:45:41,753][1652475] Updated weights for policy 0, policy_version 398752 (0.0025) [2024-06-15 16:45:44,532][1652475] Updated weights for policy 0, policy_version 398800 (0.0013) [2024-06-15 16:45:45,738][1648984] Fps is (10 sec: 39322.7, 60 sec: 45875.2, 300 sec: 44320.1). Total num frames: 816840704. Throughput: 0: 11252.6. Samples: 204271616. Policy #0 lag: (min: 47.0, avg: 161.3, max: 303.0) [2024-06-15 16:45:45,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 16:45:48,144][1652475] Updated weights for policy 0, policy_version 398864 (0.0013) [2024-06-15 16:45:49,712][1652475] Updated weights for policy 0, policy_version 398928 (0.0013) [2024-06-15 16:45:50,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 45875.3, 300 sec: 44431.2). Total num frames: 817102848. Throughput: 0: 11195.7. Samples: 204309504. 
Policy #0 lag: (min: 47.0, avg: 161.3, max: 303.0) [2024-06-15 16:45:50,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 16:45:52,516][1652475] Updated weights for policy 0, policy_version 398983 (0.0014) [2024-06-15 16:45:53,805][1652475] Updated weights for policy 0, policy_version 399040 (0.0012) [2024-06-15 16:45:55,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 44237.1, 300 sec: 44098.0). Total num frames: 817233920. Throughput: 0: 10934.1. Samples: 204368384. Policy #0 lag: (min: 47.0, avg: 161.3, max: 303.0) [2024-06-15 16:45:55,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 16:45:58,462][1652475] Updated weights for policy 0, policy_version 399098 (0.0019) [2024-06-15 16:46:00,738][1648984] Fps is (10 sec: 32767.9, 60 sec: 42598.5, 300 sec: 43764.7). Total num frames: 817430528. Throughput: 0: 10968.2. Samples: 204436480. Policy #0 lag: (min: 10.0, avg: 117.8, max: 266.0) [2024-06-15 16:46:00,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 16:46:01,126][1652475] Updated weights for policy 0, policy_version 399164 (0.0049) [2024-06-15 16:46:03,686][1652475] Updated weights for policy 0, policy_version 399234 (0.0014) [2024-06-15 16:46:05,010][1652475] Updated weights for policy 0, policy_version 399291 (0.0011) [2024-06-15 16:46:05,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 45875.2, 300 sec: 44431.2). Total num frames: 817758208. Throughput: 0: 10979.6. Samples: 204465664. Policy #0 lag: (min: 10.0, avg: 117.8, max: 266.0) [2024-06-15 16:46:05,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 16:46:10,738][1648984] Fps is (10 sec: 32768.0, 60 sec: 40960.1, 300 sec: 43542.6). Total num frames: 817758208. Throughput: 0: 11002.3. Samples: 204530176. 
Policy #0 lag: (min: 10.0, avg: 117.8, max: 266.0) [2024-06-15 16:46:10,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 16:46:12,657][1652475] Updated weights for policy 0, policy_version 399394 (0.0188) [2024-06-15 16:46:15,738][1648984] Fps is (10 sec: 26213.4, 60 sec: 43690.4, 300 sec: 43542.5). Total num frames: 818020352. Throughput: 0: 10717.8. Samples: 204590080. Policy #0 lag: (min: 10.0, avg: 117.8, max: 266.0) [2024-06-15 16:46:15,739][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 16:46:17,006][1652475] Updated weights for policy 0, policy_version 399472 (0.0030) [2024-06-15 16:46:17,406][1651340] Signal inference workers to stop experience collection... (20500 times) [2024-06-15 16:46:17,451][1652475] InferenceWorker_p0-w0: stopping experience collection (20500 times) [2024-06-15 16:46:17,633][1651340] Signal inference workers to resume experience collection... (20500 times) [2024-06-15 16:46:17,634][1652475] InferenceWorker_p0-w0: resuming experience collection (20500 times) [2024-06-15 16:46:18,682][1652475] Updated weights for policy 0, policy_version 399548 (0.0013) [2024-06-15 16:46:20,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 43690.7, 300 sec: 44320.1). Total num frames: 818282496. Throughput: 0: 10433.5. Samples: 204614656. Policy #0 lag: (min: 10.0, avg: 117.8, max: 266.0) [2024-06-15 16:46:20,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 16:46:22,762][1652475] Updated weights for policy 0, policy_version 399586 (0.0015) [2024-06-15 16:46:24,267][1652475] Updated weights for policy 0, policy_version 399649 (0.0012) [2024-06-15 16:46:25,738][1648984] Fps is (10 sec: 52430.4, 60 sec: 43690.7, 300 sec: 43875.8). Total num frames: 818544640. Throughput: 0: 10729.2. Samples: 204685312. 
Policy #0 lag: (min: 10.0, avg: 117.8, max: 266.0) [2024-06-15 16:46:25,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 16:46:28,817][1652475] Updated weights for policy 0, policy_version 399712 (0.0015) [2024-06-15 16:46:30,517][1652475] Updated weights for policy 0, policy_version 399776 (0.0012) [2024-06-15 16:46:30,738][1648984] Fps is (10 sec: 45873.7, 60 sec: 43144.5, 300 sec: 44209.0). Total num frames: 818741248. Throughput: 0: 10660.9. Samples: 204751360. Policy #0 lag: (min: 10.0, avg: 117.8, max: 266.0) [2024-06-15 16:46:30,739][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 16:46:35,054][1652475] Updated weights for policy 0, policy_version 399872 (0.0016) [2024-06-15 16:46:35,738][1648984] Fps is (10 sec: 42598.7, 60 sec: 42052.5, 300 sec: 44320.1). Total num frames: 818970624. Throughput: 0: 10626.8. Samples: 204787712. Policy #0 lag: (min: 10.0, avg: 117.8, max: 266.0) [2024-06-15 16:46:35,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 16:46:36,827][1652475] Updated weights for policy 0, policy_version 399936 (0.0013) [2024-06-15 16:46:40,743][1648984] Fps is (10 sec: 32750.6, 60 sec: 41502.2, 300 sec: 43541.7). Total num frames: 819068928. Throughput: 0: 10693.8. Samples: 204849664. Policy #0 lag: (min: 10.0, avg: 117.8, max: 266.0) [2024-06-15 16:46:40,744][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 16:46:42,261][1652475] Updated weights for policy 0, policy_version 400000 (0.0013) [2024-06-15 16:46:43,585][1652475] Updated weights for policy 0, policy_version 400059 (0.0013) [2024-06-15 16:46:45,738][1648984] Fps is (10 sec: 36044.9, 60 sec: 41506.2, 300 sec: 44098.0). Total num frames: 819331072. Throughput: 0: 10604.1. Samples: 204913664. 
Policy #0 lag: (min: 10.0, avg: 117.8, max: 266.0) [2024-06-15 16:46:45,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 16:46:47,460][1652475] Updated weights for policy 0, policy_version 400128 (0.0012) [2024-06-15 16:46:49,042][1652475] Updated weights for policy 0, policy_version 400190 (0.0015) [2024-06-15 16:46:50,738][1648984] Fps is (10 sec: 52457.8, 60 sec: 41506.1, 300 sec: 43986.9). Total num frames: 819593216. Throughput: 0: 10604.1. Samples: 204942848. Policy #0 lag: (min: 10.0, avg: 117.8, max: 266.0) [2024-06-15 16:46:50,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 16:46:55,075][1652475] Updated weights for policy 0, policy_version 400258 (0.0018) [2024-06-15 16:46:55,738][1648984] Fps is (10 sec: 45875.1, 60 sec: 42598.4, 300 sec: 44209.0). Total num frames: 819789824. Throughput: 0: 10592.7. Samples: 205006848. Policy #0 lag: (min: 10.0, avg: 117.8, max: 266.0) [2024-06-15 16:46:55,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 16:46:56,030][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000400304_819822592.pth... [2024-06-15 16:46:56,091][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000395200_809369600.pth [2024-06-15 16:46:56,445][1652475] Updated weights for policy 0, policy_version 400318 (0.0016) [2024-06-15 16:47:00,144][1652475] Updated weights for policy 0, policy_version 400384 (0.0018) [2024-06-15 16:47:00,738][1648984] Fps is (10 sec: 42598.7, 60 sec: 43144.5, 300 sec: 44098.0). Total num frames: 820019200. Throughput: 0: 10638.3. Samples: 205068800. Policy #0 lag: (min: 10.0, avg: 117.8, max: 266.0) [2024-06-15 16:47:00,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 16:47:00,958][1651340] Signal inference workers to stop experience collection... 
(20550 times) [2024-06-15 16:47:00,992][1652475] InferenceWorker_p0-w0: stopping experience collection (20550 times) [2024-06-15 16:47:01,095][1651340] Signal inference workers to resume experience collection... (20550 times) [2024-06-15 16:47:01,096][1652475] InferenceWorker_p0-w0: resuming experience collection (20550 times) [2024-06-15 16:47:01,314][1652475] Updated weights for policy 0, policy_version 400448 (0.0015) [2024-06-15 16:47:05,738][1648984] Fps is (10 sec: 36044.7, 60 sec: 39867.7, 300 sec: 43653.7). Total num frames: 820150272. Throughput: 0: 10899.9. Samples: 205105152. Policy #0 lag: (min: 10.0, avg: 117.8, max: 266.0) [2024-06-15 16:47:05,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 16:47:06,707][1652475] Updated weights for policy 0, policy_version 400512 (0.0012) [2024-06-15 16:47:08,163][1652475] Updated weights for policy 0, policy_version 400565 (0.0012) [2024-06-15 16:47:10,738][1648984] Fps is (10 sec: 39320.8, 60 sec: 44236.6, 300 sec: 43653.6). Total num frames: 820412416. Throughput: 0: 10831.6. Samples: 205172736. Policy #0 lag: (min: 10.0, avg: 117.8, max: 266.0) [2024-06-15 16:47:10,739][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 16:47:11,192][1652475] Updated weights for policy 0, policy_version 400624 (0.0087) [2024-06-15 16:47:12,242][1652475] Updated weights for policy 0, policy_version 400675 (0.0101) [2024-06-15 16:47:15,738][1648984] Fps is (10 sec: 49152.0, 60 sec: 43690.9, 300 sec: 43542.6). Total num frames: 820641792. Throughput: 0: 10979.6. Samples: 205245440. Policy #0 lag: (min: 10.0, avg: 117.8, max: 266.0) [2024-06-15 16:47:15,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 16:47:17,556][1652475] Updated weights for policy 0, policy_version 400725 (0.0011) [2024-06-15 16:47:19,707][1652475] Updated weights for policy 0, policy_version 400803 (0.0013) [2024-06-15 16:47:20,738][1648984] Fps is (10 sec: 49152.8, 60 sec: 43690.6, 300 sec: 43986.9). Total num frames: 820903936. 
Throughput: 0: 10888.5. Samples: 205277696. Policy #0 lag: (min: 27.0, avg: 107.0, max: 283.0) [2024-06-15 16:47:20,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 16:47:22,102][1652475] Updated weights for policy 0, policy_version 400848 (0.0014) [2024-06-15 16:47:23,932][1652475] Updated weights for policy 0, policy_version 400928 (0.0012) [2024-06-15 16:47:25,746][1648984] Fps is (10 sec: 52384.2, 60 sec: 43684.5, 300 sec: 43985.8). Total num frames: 821166080. Throughput: 0: 10762.7. Samples: 205334016. Policy #0 lag: (min: 27.0, avg: 107.0, max: 283.0) [2024-06-15 16:47:25,747][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 16:47:30,173][1652475] Updated weights for policy 0, policy_version 400997 (0.0015) [2024-06-15 16:47:30,738][1648984] Fps is (10 sec: 36044.6, 60 sec: 42052.4, 300 sec: 43764.7). Total num frames: 821264384. Throughput: 0: 10945.4. Samples: 205406208. Policy #0 lag: (min: 27.0, avg: 107.0, max: 283.0) [2024-06-15 16:47:30,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 16:47:32,096][1652475] Updated weights for policy 0, policy_version 401080 (0.0015) [2024-06-15 16:47:35,503][1652475] Updated weights for policy 0, policy_version 401149 (0.0017) [2024-06-15 16:47:35,738][1648984] Fps is (10 sec: 39355.3, 60 sec: 43144.6, 300 sec: 43875.8). Total num frames: 821559296. Throughput: 0: 10922.7. Samples: 205434368. Policy #0 lag: (min: 27.0, avg: 107.0, max: 283.0) [2024-06-15 16:47:35,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 16:47:37,212][1652475] Updated weights for policy 0, policy_version 401216 (0.0014) [2024-06-15 16:47:40,738][1648984] Fps is (10 sec: 42598.7, 60 sec: 43694.7, 300 sec: 43542.6). Total num frames: 821690368. Throughput: 0: 10865.8. Samples: 205495808. 
Policy #0 lag: (min: 27.0, avg: 107.0, max: 283.0) [2024-06-15 16:47:40,740][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 16:47:43,363][1652475] Updated weights for policy 0, policy_version 401296 (0.0014) [2024-06-15 16:47:45,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 43690.6, 300 sec: 43542.6). Total num frames: 821952512. Throughput: 0: 10888.5. Samples: 205558784. Policy #0 lag: (min: 27.0, avg: 107.0, max: 283.0) [2024-06-15 16:47:45,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 16:47:46,204][1651340] Signal inference workers to stop experience collection... (20600 times) [2024-06-15 16:47:46,252][1652475] InferenceWorker_p0-w0: stopping experience collection (20600 times) [2024-06-15 16:47:46,466][1651340] Signal inference workers to resume experience collection... (20600 times) [2024-06-15 16:47:46,466][1652475] InferenceWorker_p0-w0: resuming experience collection (20600 times) [2024-06-15 16:47:46,468][1652475] Updated weights for policy 0, policy_version 401360 (0.0014) [2024-06-15 16:47:47,563][1652475] Updated weights for policy 0, policy_version 401406 (0.0024) [2024-06-15 16:47:49,160][1652475] Updated weights for policy 0, policy_version 401466 (0.0093) [2024-06-15 16:47:50,738][1648984] Fps is (10 sec: 52428.5, 60 sec: 43690.7, 300 sec: 43764.7). Total num frames: 822214656. Throughput: 0: 10820.3. Samples: 205592064. Policy #0 lag: (min: 27.0, avg: 107.0, max: 283.0) [2024-06-15 16:47:50,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 16:47:54,151][1652475] Updated weights for policy 0, policy_version 401520 (0.0050) [2024-06-15 16:47:55,599][1652475] Updated weights for policy 0, policy_version 401572 (0.0013) [2024-06-15 16:47:55,738][1648984] Fps is (10 sec: 45874.8, 60 sec: 43690.6, 300 sec: 43764.7). Total num frames: 822411264. Throughput: 0: 10797.5. Samples: 205658624. 
Policy #0 lag: (min: 27.0, avg: 107.0, max: 283.0) [2024-06-15 16:47:55,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 16:47:59,628][1652475] Updated weights for policy 0, policy_version 401632 (0.0029) [2024-06-15 16:48:00,738][1648984] Fps is (10 sec: 39321.9, 60 sec: 43144.5, 300 sec: 43542.6). Total num frames: 822607872. Throughput: 0: 10672.4. Samples: 205725696. Policy #0 lag: (min: 27.0, avg: 107.0, max: 283.0) [2024-06-15 16:48:00,740][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 16:48:01,284][1652475] Updated weights for policy 0, policy_version 401696 (0.0014) [2024-06-15 16:48:04,485][1652475] Updated weights for policy 0, policy_version 401729 (0.0013) [2024-06-15 16:48:05,738][1648984] Fps is (10 sec: 42598.6, 60 sec: 44782.9, 300 sec: 43875.8). Total num frames: 822837248. Throughput: 0: 10626.8. Samples: 205755904. Policy #0 lag: (min: 27.0, avg: 107.0, max: 283.0) [2024-06-15 16:48:05,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 16:48:06,692][1652475] Updated weights for policy 0, policy_version 401796 (0.0014) [2024-06-15 16:48:10,738][1648984] Fps is (10 sec: 39321.3, 60 sec: 43144.6, 300 sec: 43542.6). Total num frames: 823001088. Throughput: 0: 10788.2. Samples: 205819392. Policy #0 lag: (min: 27.0, avg: 107.0, max: 283.0) [2024-06-15 16:48:10,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 16:48:11,297][1652475] Updated weights for policy 0, policy_version 401860 (0.0014) [2024-06-15 16:48:12,652][1652475] Updated weights for policy 0, policy_version 401916 (0.0096) [2024-06-15 16:48:14,544][1652475] Updated weights for policy 0, policy_version 401968 (0.0014) [2024-06-15 16:48:15,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 43690.6, 300 sec: 43542.6). Total num frames: 823263232. Throughput: 0: 10638.2. Samples: 205884928. 
Policy #0 lag: (min: 27.0, avg: 107.0, max: 283.0) [2024-06-15 16:48:15,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 16:48:16,421][1652475] Updated weights for policy 0, policy_version 402016 (0.0014) [2024-06-15 16:48:18,995][1652475] Updated weights for policy 0, policy_version 402050 (0.0014) [2024-06-15 16:48:20,740][1648984] Fps is (10 sec: 52418.3, 60 sec: 43689.2, 300 sec: 43653.3). Total num frames: 823525376. Throughput: 0: 10683.2. Samples: 205915136. Policy #0 lag: (min: 27.0, avg: 107.0, max: 283.0) [2024-06-15 16:48:20,741][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 16:48:24,419][1652475] Updated weights for policy 0, policy_version 402128 (0.0014) [2024-06-15 16:48:25,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 41512.0, 300 sec: 43431.5). Total num frames: 823656448. Throughput: 0: 10922.7. Samples: 205987328. Policy #0 lag: (min: 27.0, avg: 107.0, max: 283.0) [2024-06-15 16:48:25,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 16:48:26,131][1652475] Updated weights for policy 0, policy_version 402196 (0.0011) [2024-06-15 16:48:26,969][1652475] Updated weights for policy 0, policy_version 402235 (0.0132) [2024-06-15 16:48:30,738][1648984] Fps is (10 sec: 39330.0, 60 sec: 44236.9, 300 sec: 43542.6). Total num frames: 823918592. Throughput: 0: 10683.7. Samples: 206039552. Policy #0 lag: (min: 27.0, avg: 107.0, max: 283.0) [2024-06-15 16:48:30,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 16:48:32,759][1652475] Updated weights for policy 0, policy_version 402308 (0.0013) [2024-06-15 16:48:33,037][1651340] Signal inference workers to stop experience collection... (20650 times) [2024-06-15 16:48:33,126][1652475] InferenceWorker_p0-w0: stopping experience collection (20650 times) [2024-06-15 16:48:33,312][1651340] Signal inference workers to resume experience collection... 
(20650 times) [2024-06-15 16:48:33,314][1652475] InferenceWorker_p0-w0: resuming experience collection (20650 times) [2024-06-15 16:48:35,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 41506.1, 300 sec: 43098.3). Total num frames: 824049664. Throughput: 0: 10672.4. Samples: 206072320. Policy #0 lag: (min: 27.0, avg: 107.0, max: 283.0) [2024-06-15 16:48:35,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 16:48:35,893][1652475] Updated weights for policy 0, policy_version 402384 (0.0135) [2024-06-15 16:48:38,568][1652475] Updated weights for policy 0, policy_version 402434 (0.0011) [2024-06-15 16:48:40,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 824311808. Throughput: 0: 10581.3. Samples: 206134784. Policy #0 lag: (min: 15.0, avg: 125.5, max: 287.0) [2024-06-15 16:48:40,739][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 16:48:41,878][1652475] Updated weights for policy 0, policy_version 402512 (0.0106) [2024-06-15 16:48:45,741][1648984] Fps is (10 sec: 39307.5, 60 sec: 41503.6, 300 sec: 43097.7). Total num frames: 824442880. Throughput: 0: 10535.0. Samples: 206199808. Policy #0 lag: (min: 15.0, avg: 125.5, max: 287.0) [2024-06-15 16:48:45,742][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 16:48:46,169][1652475] Updated weights for policy 0, policy_version 402576 (0.0079) [2024-06-15 16:48:48,361][1652475] Updated weights for policy 0, policy_version 402658 (0.0013) [2024-06-15 16:48:50,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 41506.2, 300 sec: 43098.2). Total num frames: 824705024. Throughput: 0: 10422.1. Samples: 206224896. 
Policy #0 lag: (min: 15.0, avg: 125.5, max: 287.0) [2024-06-15 16:48:50,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 16:48:51,632][1652475] Updated weights for policy 0, policy_version 402724 (0.0015) [2024-06-15 16:48:55,315][1652475] Updated weights for policy 0, policy_version 402807 (0.0015) [2024-06-15 16:48:55,740][1648984] Fps is (10 sec: 52437.6, 60 sec: 42597.1, 300 sec: 43653.3). Total num frames: 824967168. Throughput: 0: 10512.6. Samples: 206292480. Policy #0 lag: (min: 15.0, avg: 125.5, max: 287.0) [2024-06-15 16:48:55,740][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 16:48:55,764][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000402816_824967168.pth... [2024-06-15 16:48:55,820][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000397760_814612480.pth [2024-06-15 16:48:58,918][1652475] Updated weights for policy 0, policy_version 402851 (0.0013) [2024-06-15 16:49:00,738][1648984] Fps is (10 sec: 45875.2, 60 sec: 42598.4, 300 sec: 43764.7). Total num frames: 825163776. Throughput: 0: 10456.2. Samples: 206355456. Policy #0 lag: (min: 15.0, avg: 125.5, max: 287.0) [2024-06-15 16:49:00,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 16:49:01,081][1652475] Updated weights for policy 0, policy_version 402938 (0.0138) [2024-06-15 16:49:03,933][1652475] Updated weights for policy 0, policy_version 403005 (0.0014) [2024-06-15 16:49:05,738][1648984] Fps is (10 sec: 39329.2, 60 sec: 42052.3, 300 sec: 43542.6). Total num frames: 825360384. Throughput: 0: 10524.9. Samples: 206388736. Policy #0 lag: (min: 15.0, avg: 125.5, max: 287.0) [2024-06-15 16:49:05,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 16:49:07,536][1652475] Updated weights for policy 0, policy_version 403063 (0.0014) [2024-06-15 16:49:10,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 42598.4, 300 sec: 43320.4). Total num frames: 825556992. Throughput: 0: 10501.7. 
Samples: 206459904. Policy #0 lag: (min: 15.0, avg: 125.5, max: 287.0) [2024-06-15 16:49:10,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 16:49:11,542][1652475] Updated weights for policy 0, policy_version 403136 (0.0134) [2024-06-15 16:49:15,380][1652475] Updated weights for policy 0, policy_version 403212 (0.0061) [2024-06-15 16:49:15,738][1648984] Fps is (10 sec: 45875.3, 60 sec: 42598.5, 300 sec: 43542.6). Total num frames: 825819136. Throughput: 0: 10672.4. Samples: 206519808. Policy #0 lag: (min: 15.0, avg: 125.5, max: 287.0) [2024-06-15 16:49:15,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 16:49:18,252][1652475] Updated weights for policy 0, policy_version 403267 (0.0013) [2024-06-15 16:49:19,560][1652475] Updated weights for policy 0, policy_version 403326 (0.0018) [2024-06-15 16:49:20,738][1648984] Fps is (10 sec: 45875.5, 60 sec: 41507.6, 300 sec: 43209.3). Total num frames: 826015744. Throughput: 0: 10763.4. Samples: 206556672. Policy #0 lag: (min: 15.0, avg: 125.5, max: 287.0) [2024-06-15 16:49:20,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 16:49:21,836][1651340] Signal inference workers to stop experience collection... (20700 times) [2024-06-15 16:49:21,902][1652475] InferenceWorker_p0-w0: stopping experience collection (20700 times) [2024-06-15 16:49:22,101][1651340] Signal inference workers to resume experience collection... (20700 times) [2024-06-15 16:49:22,102][1652475] InferenceWorker_p0-w0: resuming experience collection (20700 times) [2024-06-15 16:49:23,073][1652475] Updated weights for policy 0, policy_version 403382 (0.0013) [2024-06-15 16:49:25,104][1652475] Updated weights for policy 0, policy_version 403446 (0.0014) [2024-06-15 16:49:25,738][1648984] Fps is (10 sec: 45874.0, 60 sec: 43690.5, 300 sec: 43653.6). Total num frames: 826277888. Throughput: 0: 10786.1. Samples: 206620160. 
Policy #0 lag: (min: 15.0, avg: 125.5, max: 287.0) [2024-06-15 16:49:25,739][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 16:49:28,204][1652475] Updated weights for policy 0, policy_version 403512 (0.0013) [2024-06-15 16:49:30,745][1648984] Fps is (10 sec: 42565.1, 60 sec: 42046.8, 300 sec: 43319.3). Total num frames: 826441728. Throughput: 0: 10762.4. Samples: 206684160. Policy #0 lag: (min: 15.0, avg: 125.5, max: 287.0) [2024-06-15 16:49:30,746][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 16:49:31,570][1652475] Updated weights for policy 0, policy_version 403568 (0.0013) [2024-06-15 16:49:35,738][1648984] Fps is (10 sec: 36045.9, 60 sec: 43144.6, 300 sec: 43431.5). Total num frames: 826638336. Throughput: 0: 10877.2. Samples: 206714368. Policy #0 lag: (min: 15.0, avg: 125.5, max: 287.0) [2024-06-15 16:49:35,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 16:49:35,794][1652475] Updated weights for policy 0, policy_version 403637 (0.0014) [2024-06-15 16:49:37,221][1652475] Updated weights for policy 0, policy_version 403685 (0.0013) [2024-06-15 16:49:40,230][1652475] Updated weights for policy 0, policy_version 403744 (0.0028) [2024-06-15 16:49:40,738][1648984] Fps is (10 sec: 45911.0, 60 sec: 43144.5, 300 sec: 43431.5). Total num frames: 826900480. Throughput: 0: 10763.8. Samples: 206776832. Policy #0 lag: (min: 15.0, avg: 125.5, max: 287.0) [2024-06-15 16:49:40,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 16:49:42,918][1652475] Updated weights for policy 0, policy_version 403792 (0.0013) [2024-06-15 16:49:45,738][1648984] Fps is (10 sec: 42598.1, 60 sec: 43693.3, 300 sec: 43098.3). Total num frames: 827064320. Throughput: 0: 10808.9. Samples: 206841856. 
Policy #0 lag: (min: 15.0, avg: 125.5, max: 287.0) [2024-06-15 16:49:45,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 16:49:47,246][1652475] Updated weights for policy 0, policy_version 403856 (0.0013) [2024-06-15 16:49:48,564][1652475] Updated weights for policy 0, policy_version 403920 (0.0015) [2024-06-15 16:49:50,739][1648984] Fps is (10 sec: 42592.1, 60 sec: 43689.6, 300 sec: 43209.2). Total num frames: 827326464. Throughput: 0: 10831.3. Samples: 206876160. Policy #0 lag: (min: 15.0, avg: 125.5, max: 287.0) [2024-06-15 16:49:50,740][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 16:49:51,922][1652475] Updated weights for policy 0, policy_version 404000 (0.0014) [2024-06-15 16:49:54,948][1652475] Updated weights for policy 0, policy_version 404048 (0.0016) [2024-06-15 16:49:55,738][1648984] Fps is (10 sec: 49152.1, 60 sec: 43145.9, 300 sec: 42987.2). Total num frames: 827555840. Throughput: 0: 10729.3. Samples: 206942720. Policy #0 lag: (min: 15.0, avg: 125.5, max: 287.0) [2024-06-15 16:49:55,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 16:49:59,085][1652475] Updated weights for policy 0, policy_version 404120 (0.0013) [2024-06-15 16:50:00,197][1652475] Updated weights for policy 0, policy_version 404161 (0.0018) [2024-06-15 16:50:00,738][1648984] Fps is (10 sec: 42604.1, 60 sec: 43144.4, 300 sec: 43209.3). Total num frames: 827752448. Throughput: 0: 10911.2. Samples: 207010816. Policy #0 lag: (min: 47.0, avg: 136.0, max: 303.0) [2024-06-15 16:50:00,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 16:50:01,384][1652475] Updated weights for policy 0, policy_version 404218 (0.0013) [2024-06-15 16:50:04,043][1652475] Updated weights for policy 0, policy_version 404287 (0.0016) [2024-06-15 16:50:05,740][1648984] Fps is (10 sec: 42589.8, 60 sec: 43689.2, 300 sec: 42986.9). Total num frames: 827981824. Throughput: 0: 10785.6. Samples: 207042048. 
Policy #0 lag: (min: 47.0, avg: 136.0, max: 303.0) [2024-06-15 16:50:05,740][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 16:50:10,365][1652475] Updated weights for policy 0, policy_version 404356 (0.0016) [2024-06-15 16:50:10,738][1648984] Fps is (10 sec: 39322.4, 60 sec: 43144.6, 300 sec: 43209.3). Total num frames: 828145664. Throughput: 0: 10922.7. Samples: 207111680. Policy #0 lag: (min: 47.0, avg: 136.0, max: 303.0) [2024-06-15 16:50:10,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 16:50:10,839][1651340] Signal inference workers to stop experience collection... (20750 times) [2024-06-15 16:50:10,891][1652475] InferenceWorker_p0-w0: stopping experience collection (20750 times) [2024-06-15 16:50:11,087][1651340] Signal inference workers to resume experience collection... (20750 times) [2024-06-15 16:50:11,088][1652475] InferenceWorker_p0-w0: resuming experience collection (20750 times) [2024-06-15 16:50:13,408][1652475] Updated weights for policy 0, policy_version 404432 (0.0014) [2024-06-15 16:50:15,158][1652475] Updated weights for policy 0, policy_version 404498 (0.0016) [2024-06-15 16:50:15,738][1648984] Fps is (10 sec: 45883.1, 60 sec: 43690.4, 300 sec: 43320.4). Total num frames: 828440576. Throughput: 0: 10799.3. Samples: 207170048. Policy #0 lag: (min: 47.0, avg: 136.0, max: 303.0) [2024-06-15 16:50:15,739][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 16:50:17,848][1652475] Updated weights for policy 0, policy_version 404546 (0.0032) [2024-06-15 16:50:20,738][1648984] Fps is (10 sec: 49151.7, 60 sec: 43690.6, 300 sec: 43098.3). Total num frames: 828637184. Throughput: 0: 10774.7. Samples: 207199232. Policy #0 lag: (min: 47.0, avg: 136.0, max: 303.0) [2024-06-15 16:50:20,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 16:50:22,588][1652475] Updated weights for policy 0, policy_version 404610 (0.0017) [2024-06-15 16:50:25,738][1648984] Fps is (10 sec: 32769.0, 60 sec: 41506.3, 300 sec: 42765.0). 
Total num frames: 828768256. Throughput: 0: 10865.8. Samples: 207265792. Policy #0 lag: (min: 47.0, avg: 136.0, max: 303.0) [2024-06-15 16:50:25,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 16:50:26,202][1652475] Updated weights for policy 0, policy_version 404675 (0.0013) [2024-06-15 16:50:28,265][1652475] Updated weights for policy 0, policy_version 404738 (0.0098) [2024-06-15 16:50:29,642][1652475] Updated weights for policy 0, policy_version 404797 (0.0013) [2024-06-15 16:50:30,738][1648984] Fps is (10 sec: 42598.5, 60 sec: 43696.4, 300 sec: 42765.1). Total num frames: 829063168. Throughput: 0: 10774.8. Samples: 207326720. Policy #0 lag: (min: 47.0, avg: 136.0, max: 303.0) [2024-06-15 16:50:30,738][1648984] Avg episode reward: [(0, '-0.250')] [2024-06-15 16:50:31,711][1652475] Updated weights for policy 0, policy_version 404856 (0.0016) [2024-06-15 16:50:35,738][1648984] Fps is (10 sec: 45875.2, 60 sec: 43144.5, 300 sec: 42876.1). Total num frames: 829227008. Throughput: 0: 10661.3. Samples: 207355904. Policy #0 lag: (min: 47.0, avg: 136.0, max: 303.0) [2024-06-15 16:50:35,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 16:50:35,908][1652475] Updated weights for policy 0, policy_version 404922 (0.0014) [2024-06-15 16:50:40,738][1648984] Fps is (10 sec: 29491.0, 60 sec: 40960.0, 300 sec: 42431.8). Total num frames: 829358080. Throughput: 0: 10808.9. Samples: 207429120. Policy #0 lag: (min: 47.0, avg: 136.0, max: 303.0) [2024-06-15 16:50:40,738][1648984] Avg episode reward: [(0, '-0.220')] [2024-06-15 16:50:40,779][1652475] Updated weights for policy 0, policy_version 404976 (0.0016) [2024-06-15 16:50:42,081][1652475] Updated weights for policy 0, policy_version 405027 (0.0013) [2024-06-15 16:50:43,902][1652475] Updated weights for policy 0, policy_version 405111 (0.0032) [2024-06-15 16:50:45,738][1648984] Fps is (10 sec: 45875.3, 60 sec: 43690.7, 300 sec: 42653.9). Total num frames: 829685760. Throughput: 0: 10524.5. 
Samples: 207484416. Policy #0 lag: (min: 47.0, avg: 136.0, max: 303.0) [2024-06-15 16:50:45,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 16:50:48,159][1652475] Updated weights for policy 0, policy_version 405157 (0.0013) [2024-06-15 16:50:50,738][1648984] Fps is (10 sec: 45875.6, 60 sec: 41507.2, 300 sec: 42653.9). Total num frames: 829816832. Throughput: 0: 10559.1. Samples: 207517184. Policy #0 lag: (min: 47.0, avg: 136.0, max: 303.0) [2024-06-15 16:50:50,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 16:50:52,175][1652475] Updated weights for policy 0, policy_version 405221 (0.0011) [2024-06-15 16:50:54,709][1652475] Updated weights for policy 0, policy_version 405280 (0.0020) [2024-06-15 16:50:55,738][1648984] Fps is (10 sec: 39320.9, 60 sec: 42052.1, 300 sec: 42876.1). Total num frames: 830078976. Throughput: 0: 10365.1. Samples: 207578112. Policy #0 lag: (min: 47.0, avg: 136.0, max: 303.0) [2024-06-15 16:50:55,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 16:50:56,071][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000405344_830144512.pth... [2024-06-15 16:50:56,213][1651340] Signal inference workers to stop experience collection... (20800 times) [2024-06-15 16:50:56,216][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000400304_819822592.pth [2024-06-15 16:50:56,223][1651340] Saving a milestone train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/milestones/checkpoint_000405344_830144512.pth [2024-06-15 16:50:56,395][1652475] InferenceWorker_p0-w0: stopping experience collection (20800 times) [2024-06-15 16:50:56,398][1652475] Updated weights for policy 0, policy_version 405346 (0.0014) [2024-06-15 16:50:56,607][1651340] Signal inference workers to resume experience collection... 
(20800 times) [2024-06-15 16:50:56,608][1652475] InferenceWorker_p0-w0: resuming experience collection (20800 times) [2024-06-15 16:51:00,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 40960.1, 300 sec: 42209.6). Total num frames: 830210048. Throughput: 0: 10513.1. Samples: 207643136. Policy #0 lag: (min: 47.0, avg: 136.0, max: 303.0) [2024-06-15 16:51:00,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 16:51:01,602][1652475] Updated weights for policy 0, policy_version 405408 (0.0016) [2024-06-15 16:51:03,311][1652475] Updated weights for policy 0, policy_version 405462 (0.0016) [2024-06-15 16:51:05,632][1652475] Updated weights for policy 0, policy_version 405505 (0.0015) [2024-06-15 16:51:05,738][1648984] Fps is (10 sec: 39322.2, 60 sec: 41507.5, 300 sec: 43098.2). Total num frames: 830472192. Throughput: 0: 10558.6. Samples: 207674368. Policy #0 lag: (min: 47.0, avg: 136.0, max: 303.0) [2024-06-15 16:51:05,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 16:51:06,905][1652475] Updated weights for policy 0, policy_version 405568 (0.0020) [2024-06-15 16:51:09,207][1652475] Updated weights for policy 0, policy_version 405632 (0.0012) [2024-06-15 16:51:10,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 43144.5, 300 sec: 43098.3). Total num frames: 830734336. Throughput: 0: 10456.2. Samples: 207736320. Policy #0 lag: (min: 47.0, avg: 136.0, max: 303.0) [2024-06-15 16:51:10,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 16:51:14,348][1652475] Updated weights for policy 0, policy_version 405698 (0.0013) [2024-06-15 16:51:15,388][1652475] Updated weights for policy 0, policy_version 405759 (0.0013) [2024-06-15 16:51:15,738][1648984] Fps is (10 sec: 52428.5, 60 sec: 42598.6, 300 sec: 43098.2). Total num frames: 830996480. Throughput: 0: 10604.1. Samples: 207803904. 
Policy #0 lag: (min: 47.0, avg: 136.0, max: 303.0) [2024-06-15 16:51:15,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 16:51:18,594][1652475] Updated weights for policy 0, policy_version 405809 (0.0015) [2024-06-15 16:51:20,617][1652475] Updated weights for policy 0, policy_version 405856 (0.0016) [2024-06-15 16:51:20,738][1648984] Fps is (10 sec: 45875.3, 60 sec: 42598.5, 300 sec: 42876.1). Total num frames: 831193088. Throughput: 0: 10774.8. Samples: 207840768. Policy #0 lag: (min: 31.0, avg: 158.5, max: 287.0) [2024-06-15 16:51:20,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 16:51:25,068][1652475] Updated weights for policy 0, policy_version 405920 (0.0012) [2024-06-15 16:51:25,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 43690.6, 300 sec: 42876.1). Total num frames: 831389696. Throughput: 0: 10706.5. Samples: 207910912. Policy #0 lag: (min: 31.0, avg: 158.5, max: 287.0) [2024-06-15 16:51:25,741][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 16:51:26,846][1652475] Updated weights for policy 0, policy_version 406000 (0.0015) [2024-06-15 16:51:30,537][1652475] Updated weights for policy 0, policy_version 406079 (0.0087) [2024-06-15 16:51:30,738][1648984] Fps is (10 sec: 45874.6, 60 sec: 43144.5, 300 sec: 42987.2). Total num frames: 831651840. Throughput: 0: 10831.6. Samples: 207971840. Policy #0 lag: (min: 31.0, avg: 158.5, max: 287.0) [2024-06-15 16:51:30,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 16:51:35,738][1648984] Fps is (10 sec: 39321.9, 60 sec: 42598.4, 300 sec: 43099.1). Total num frames: 831782912. Throughput: 0: 10831.7. Samples: 208004608. 
Policy #0 lag: (min: 31.0, avg: 158.5, max: 287.0) [2024-06-15 16:51:35,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 16:51:36,252][1652475] Updated weights for policy 0, policy_version 406160 (0.0011) [2024-06-15 16:51:37,711][1652475] Updated weights for policy 0, policy_version 406224 (0.0010) [2024-06-15 16:51:40,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 44782.9, 300 sec: 43098.2). Total num frames: 832045056. Throughput: 0: 10979.6. Samples: 208072192. Policy #0 lag: (min: 31.0, avg: 158.5, max: 287.0) [2024-06-15 16:51:40,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 16:51:42,492][1652475] Updated weights for policy 0, policy_version 406304 (0.0100) [2024-06-15 16:51:44,033][1652475] Updated weights for policy 0, policy_version 406353 (0.0028) [2024-06-15 16:51:44,435][1651340] Signal inference workers to stop experience collection... (20850 times) [2024-06-15 16:51:44,530][1652475] InferenceWorker_p0-w0: stopping experience collection (20850 times) [2024-06-15 16:51:44,691][1651340] Signal inference workers to resume experience collection... (20850 times) [2024-06-15 16:51:44,692][1652475] InferenceWorker_p0-w0: resuming experience collection (20850 times) [2024-06-15 16:51:45,051][1652475] Updated weights for policy 0, policy_version 406399 (0.0096) [2024-06-15 16:51:45,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 43690.7, 300 sec: 43098.3). Total num frames: 832307200. Throughput: 0: 10956.8. Samples: 208136192. Policy #0 lag: (min: 31.0, avg: 158.5, max: 287.0) [2024-06-15 16:51:45,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 16:51:48,902][1652475] Updated weights for policy 0, policy_version 406458 (0.0013) [2024-06-15 16:51:50,092][1652475] Updated weights for policy 0, policy_version 406512 (0.0012) [2024-06-15 16:51:50,738][1648984] Fps is (10 sec: 52428.3, 60 sec: 45875.0, 300 sec: 43320.4). Total num frames: 832569344. Throughput: 0: 11207.1. Samples: 208178688. 
Policy #0 lag: (min: 31.0, avg: 158.5, max: 287.0) [2024-06-15 16:51:50,739][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 16:51:53,472][1652475] Updated weights for policy 0, policy_version 406550 (0.0013) [2024-06-15 16:51:54,669][1652475] Updated weights for policy 0, policy_version 406596 (0.0012) [2024-06-15 16:51:55,738][1648984] Fps is (10 sec: 49151.9, 60 sec: 45329.2, 300 sec: 43320.4). Total num frames: 832798720. Throughput: 0: 11320.9. Samples: 208245760. Policy #0 lag: (min: 31.0, avg: 158.5, max: 287.0) [2024-06-15 16:51:55,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 16:51:56,009][1652475] Updated weights for policy 0, policy_version 406656 (0.0017) [2024-06-15 16:52:00,144][1652475] Updated weights for policy 0, policy_version 406720 (0.0015) [2024-06-15 16:52:00,738][1648984] Fps is (10 sec: 39321.2, 60 sec: 45874.9, 300 sec: 43431.4). Total num frames: 832962560. Throughput: 0: 11366.3. Samples: 208315392. Policy #0 lag: (min: 31.0, avg: 158.5, max: 287.0) [2024-06-15 16:52:00,739][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 16:52:01,860][1652475] Updated weights for policy 0, policy_version 406781 (0.0013) [2024-06-15 16:52:05,428][1652475] Updated weights for policy 0, policy_version 406840 (0.0014) [2024-06-15 16:52:05,738][1648984] Fps is (10 sec: 42598.6, 60 sec: 45875.2, 300 sec: 43431.5). Total num frames: 833224704. Throughput: 0: 11332.3. Samples: 208350720. Policy #0 lag: (min: 31.0, avg: 158.5, max: 287.0) [2024-06-15 16:52:05,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 16:52:07,356][1652475] Updated weights for policy 0, policy_version 406901 (0.0026) [2024-06-15 16:52:10,738][1648984] Fps is (10 sec: 42599.4, 60 sec: 44236.7, 300 sec: 43209.3). Total num frames: 833388544. Throughput: 0: 11252.6. Samples: 208417280. 
Policy #0 lag: (min: 31.0, avg: 158.5, max: 287.0) [2024-06-15 16:52:10,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 16:52:11,290][1652475] Updated weights for policy 0, policy_version 406947 (0.0013) [2024-06-15 16:52:13,387][1652475] Updated weights for policy 0, policy_version 407040 (0.0012) [2024-06-15 16:52:15,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 43690.7, 300 sec: 43098.3). Total num frames: 833617920. Throughput: 0: 11389.2. Samples: 208484352. Policy #0 lag: (min: 31.0, avg: 158.5, max: 287.0) [2024-06-15 16:52:15,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 16:52:18,748][1652475] Updated weights for policy 0, policy_version 407122 (0.0025) [2024-06-15 16:52:20,740][1648984] Fps is (10 sec: 49151.9, 60 sec: 44782.8, 300 sec: 43099.5). Total num frames: 833880064. Throughput: 0: 11286.7. Samples: 208512512. Policy #0 lag: (min: 31.0, avg: 158.5, max: 287.0) [2024-06-15 16:52:20,741][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 16:52:23,699][1652475] Updated weights for policy 0, policy_version 407222 (0.0014) [2024-06-15 16:52:25,012][1652475] Updated weights for policy 0, policy_version 407251 (0.0013) [2024-06-15 16:52:25,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 45875.2, 300 sec: 43653.6). Total num frames: 834142208. Throughput: 0: 11252.6. Samples: 208578560. Policy #0 lag: (min: 31.0, avg: 158.5, max: 287.0) [2024-06-15 16:52:25,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 16:52:28,695][1652475] Updated weights for policy 0, policy_version 407315 (0.0024) [2024-06-15 16:52:30,116][1652475] Updated weights for policy 0, policy_version 407361 (0.0015) [2024-06-15 16:52:30,738][1648984] Fps is (10 sec: 45875.4, 60 sec: 44783.0, 300 sec: 43320.4). Total num frames: 834338816. Throughput: 0: 11298.1. Samples: 208644608. 
Policy #0 lag: (min: 31.0, avg: 158.5, max: 287.0) [2024-06-15 16:52:30,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 16:52:30,768][1651340] Signal inference workers to stop experience collection... (20900 times) [2024-06-15 16:52:30,801][1652475] InferenceWorker_p0-w0: stopping experience collection (20900 times) [2024-06-15 16:52:30,978][1651340] Signal inference workers to resume experience collection... (20900 times) [2024-06-15 16:52:30,979][1652475] InferenceWorker_p0-w0: resuming experience collection (20900 times) [2024-06-15 16:52:31,274][1652475] Updated weights for policy 0, policy_version 407421 (0.0012) [2024-06-15 16:52:34,979][1652475] Updated weights for policy 0, policy_version 407488 (0.0014) [2024-06-15 16:52:35,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 45875.1, 300 sec: 43542.6). Total num frames: 834535424. Throughput: 0: 11184.4. Samples: 208681984. Policy #0 lag: (min: 31.0, avg: 158.5, max: 287.0) [2024-06-15 16:52:35,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 16:52:40,055][1652475] Updated weights for policy 0, policy_version 407576 (0.0014) [2024-06-15 16:52:40,738][1648984] Fps is (10 sec: 42598.5, 60 sec: 45329.2, 300 sec: 43431.5). Total num frames: 834764800. Throughput: 0: 11082.0. Samples: 208744448. Policy #0 lag: (min: 31.0, avg: 158.5, max: 287.0) [2024-06-15 16:52:40,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 16:52:41,851][1652475] Updated weights for policy 0, policy_version 407632 (0.0096) [2024-06-15 16:52:42,839][1652475] Updated weights for policy 0, policy_version 407679 (0.0012) [2024-06-15 16:52:45,738][1648984] Fps is (10 sec: 45875.0, 60 sec: 44782.9, 300 sec: 43320.4). Total num frames: 834994176. Throughput: 0: 11093.4. Samples: 208814592. 
Policy #0 lag: (min: 15.0, avg: 155.4, max: 271.0) [2024-06-15 16:52:45,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 16:52:46,210][1652475] Updated weights for policy 0, policy_version 407738 (0.0013) [2024-06-15 16:52:50,738][1648984] Fps is (10 sec: 29490.1, 60 sec: 41506.0, 300 sec: 42876.0). Total num frames: 835059712. Throughput: 0: 11138.7. Samples: 208851968. Policy #0 lag: (min: 15.0, avg: 155.4, max: 271.0) [2024-06-15 16:52:50,739][1648984] Avg episode reward: [(0, '-0.200')] [2024-06-15 16:52:51,767][1652475] Updated weights for policy 0, policy_version 407792 (0.0016) [2024-06-15 16:52:53,408][1652475] Updated weights for policy 0, policy_version 407872 (0.0015) [2024-06-15 16:52:54,936][1652475] Updated weights for policy 0, policy_version 407928 (0.0016) [2024-06-15 16:52:55,738][1648984] Fps is (10 sec: 45875.4, 60 sec: 44236.8, 300 sec: 43542.6). Total num frames: 835452928. Throughput: 0: 10877.2. Samples: 208906752. Policy #0 lag: (min: 15.0, avg: 155.4, max: 271.0) [2024-06-15 16:52:55,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 16:52:55,742][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000407936_835452928.pth... [2024-06-15 16:52:55,777][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000402816_824967168.pth [2024-06-15 16:52:58,244][1652475] Updated weights for policy 0, policy_version 407993 (0.0016) [2024-06-15 16:53:00,738][1648984] Fps is (10 sec: 52430.9, 60 sec: 43690.9, 300 sec: 43209.3). Total num frames: 835584000. Throughput: 0: 10865.8. Samples: 208973312. 
Policy #0 lag: (min: 15.0, avg: 155.4, max: 271.0) [2024-06-15 16:53:00,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 16:53:03,410][1652475] Updated weights for policy 0, policy_version 408061 (0.0013) [2024-06-15 16:53:04,841][1652475] Updated weights for policy 0, policy_version 408122 (0.0025) [2024-06-15 16:53:05,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 43690.6, 300 sec: 43542.6). Total num frames: 835846144. Throughput: 0: 10968.2. Samples: 209006080. Policy #0 lag: (min: 15.0, avg: 155.4, max: 271.0) [2024-06-15 16:53:05,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 16:53:06,395][1652475] Updated weights for policy 0, policy_version 408163 (0.0013) [2024-06-15 16:53:10,738][1648984] Fps is (10 sec: 42597.5, 60 sec: 43690.6, 300 sec: 43209.3). Total num frames: 836009984. Throughput: 0: 10979.5. Samples: 209072640. Policy #0 lag: (min: 15.0, avg: 155.4, max: 271.0) [2024-06-15 16:53:10,739][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 16:53:10,743][1652475] Updated weights for policy 0, policy_version 408215 (0.0014) [2024-06-15 16:53:11,639][1652475] Updated weights for policy 0, policy_version 408255 (0.0027) [2024-06-15 16:53:14,427][1652475] Updated weights for policy 0, policy_version 408320 (0.0014) [2024-06-15 16:53:15,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 43690.7, 300 sec: 43098.6). Total num frames: 836239360. Throughput: 0: 10877.2. Samples: 209134080. Policy #0 lag: (min: 15.0, avg: 155.4, max: 271.0) [2024-06-15 16:53:15,738][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 16:53:19,024][1652475] Updated weights for policy 0, policy_version 408400 (0.0012) [2024-06-15 16:53:19,146][1651340] Signal inference workers to stop experience collection... (20950 times) [2024-06-15 16:53:19,184][1652475] InferenceWorker_p0-w0: stopping experience collection (20950 times) [2024-06-15 16:53:19,322][1651340] Signal inference workers to resume experience collection... 
(20950 times) [2024-06-15 16:53:19,328][1652475] InferenceWorker_p0-w0: resuming experience collection (20950 times) [2024-06-15 16:53:20,738][1648984] Fps is (10 sec: 49153.2, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 836501504. Throughput: 0: 10695.1. Samples: 209163264. Policy #0 lag: (min: 15.0, avg: 155.4, max: 271.0) [2024-06-15 16:53:20,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 16:53:23,620][1652475] Updated weights for policy 0, policy_version 408464 (0.0017) [2024-06-15 16:53:24,959][1652475] Updated weights for policy 0, policy_version 408511 (0.0012) [2024-06-15 16:53:25,738][1648984] Fps is (10 sec: 42598.7, 60 sec: 42052.3, 300 sec: 43209.3). Total num frames: 836665344. Throughput: 0: 10752.0. Samples: 209228288. Policy #0 lag: (min: 15.0, avg: 155.4, max: 271.0) [2024-06-15 16:53:25,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 16:53:29,096][1652475] Updated weights for policy 0, policy_version 408581 (0.0016) [2024-06-15 16:53:30,383][1652475] Updated weights for policy 0, policy_version 408635 (0.0014) [2024-06-15 16:53:30,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 42598.4, 300 sec: 43542.6). Total num frames: 836894720. Throughput: 0: 10570.0. Samples: 209290240. Policy #0 lag: (min: 15.0, avg: 155.4, max: 271.0) [2024-06-15 16:53:30,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 16:53:32,327][1652475] Updated weights for policy 0, policy_version 408704 (0.0013) [2024-06-15 16:53:35,738][1648984] Fps is (10 sec: 39320.6, 60 sec: 42052.2, 300 sec: 43209.3). Total num frames: 837058560. Throughput: 0: 10444.8. Samples: 209321984. Policy #0 lag: (min: 15.0, avg: 155.4, max: 271.0) [2024-06-15 16:53:35,739][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 16:53:37,019][1652475] Updated weights for policy 0, policy_version 408771 (0.0013) [2024-06-15 16:53:40,738][1648984] Fps is (10 sec: 39320.6, 60 sec: 42052.1, 300 sec: 43543.1). Total num frames: 837287936. 
Throughput: 0: 10717.8. Samples: 209389056. Policy #0 lag: (min: 15.0, avg: 155.4, max: 271.0) [2024-06-15 16:53:40,739][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 16:53:40,769][1652475] Updated weights for policy 0, policy_version 408836 (0.0014) [2024-06-15 16:53:41,909][1652475] Updated weights for policy 0, policy_version 408891 (0.0012) [2024-06-15 16:53:43,460][1652475] Updated weights for policy 0, policy_version 408954 (0.0019) [2024-06-15 16:53:45,738][1648984] Fps is (10 sec: 49152.9, 60 sec: 42598.4, 300 sec: 43542.6). Total num frames: 837550080. Throughput: 0: 10877.1. Samples: 209462784. Policy #0 lag: (min: 15.0, avg: 155.4, max: 271.0) [2024-06-15 16:53:45,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 16:53:47,693][1652475] Updated weights for policy 0, policy_version 409018 (0.0089) [2024-06-15 16:53:49,266][1652475] Updated weights for policy 0, policy_version 409072 (0.0013) [2024-06-15 16:53:50,738][1648984] Fps is (10 sec: 52430.1, 60 sec: 45875.5, 300 sec: 43542.9). Total num frames: 837812224. Throughput: 0: 10945.4. Samples: 209498624. Policy #0 lag: (min: 15.0, avg: 155.4, max: 271.0) [2024-06-15 16:53:50,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 16:53:52,711][1652475] Updated weights for policy 0, policy_version 409120 (0.0013) [2024-06-15 16:53:54,290][1652475] Updated weights for policy 0, policy_version 409168 (0.0016) [2024-06-15 16:53:55,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 43690.7, 300 sec: 43764.7). Total num frames: 838074368. Throughput: 0: 10945.5. Samples: 209565184. Policy #0 lag: (min: 15.0, avg: 155.4, max: 271.0) [2024-06-15 16:53:55,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 16:53:58,641][1652475] Updated weights for policy 0, policy_version 409219 (0.0014) [2024-06-15 16:54:00,738][1648984] Fps is (10 sec: 42598.1, 60 sec: 44236.8, 300 sec: 43653.6). Total num frames: 838238208. Throughput: 0: 10945.4. Samples: 209626624. 
Policy #0 lag: (min: 15.0, avg: 155.4, max: 271.0) [2024-06-15 16:54:00,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 16:54:00,781][1652475] Updated weights for policy 0, policy_version 409312 (0.0013) [2024-06-15 16:54:04,507][1652475] Updated weights for policy 0, policy_version 409364 (0.0013) [2024-06-15 16:54:05,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 43690.7, 300 sec: 43764.7). Total num frames: 838467584. Throughput: 0: 11036.4. Samples: 209659904. Policy #0 lag: (min: 11.0, avg: 119.6, max: 267.0) [2024-06-15 16:54:05,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 16:54:07,381][1651340] Signal inference workers to stop experience collection... (21000 times) [2024-06-15 16:54:07,423][1652475] Updated weights for policy 0, policy_version 409441 (0.0014) [2024-06-15 16:54:07,472][1652475] InferenceWorker_p0-w0: stopping experience collection (21000 times) [2024-06-15 16:54:07,596][1651340] Signal inference workers to resume experience collection... (21000 times) [2024-06-15 16:54:07,596][1652475] InferenceWorker_p0-w0: resuming experience collection (21000 times) [2024-06-15 16:54:10,738][1648984] Fps is (10 sec: 36044.8, 60 sec: 43144.7, 300 sec: 43320.4). Total num frames: 838598656. Throughput: 0: 11081.9. Samples: 209726976. Policy #0 lag: (min: 11.0, avg: 119.6, max: 267.0) [2024-06-15 16:54:10,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 16:54:11,383][1652475] Updated weights for policy 0, policy_version 409504 (0.0037) [2024-06-15 16:54:12,307][1652475] Updated weights for policy 0, policy_version 409542 (0.0013) [2024-06-15 16:54:15,246][1652475] Updated weights for policy 0, policy_version 409602 (0.0013) [2024-06-15 16:54:15,738][1648984] Fps is (10 sec: 42598.7, 60 sec: 44236.8, 300 sec: 43653.6). Total num frames: 838893568. Throughput: 0: 11286.8. Samples: 209798144. 
Policy #0 lag: (min: 11.0, avg: 119.6, max: 267.0) [2024-06-15 16:54:15,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 16:54:16,580][1652475] Updated weights for policy 0, policy_version 409664 (0.0029) [2024-06-15 16:54:20,738][1648984] Fps is (10 sec: 52429.1, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 839122944. Throughput: 0: 11252.7. Samples: 209828352. Policy #0 lag: (min: 11.0, avg: 119.6, max: 267.0) [2024-06-15 16:54:20,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 16:54:22,906][1652475] Updated weights for policy 0, policy_version 409744 (0.0025) [2024-06-15 16:54:24,970][1652475] Updated weights for policy 0, policy_version 409840 (0.0018) [2024-06-15 16:54:25,738][1648984] Fps is (10 sec: 49151.4, 60 sec: 45329.0, 300 sec: 43876.9). Total num frames: 839385088. Throughput: 0: 11241.3. Samples: 209894912. Policy #0 lag: (min: 11.0, avg: 119.6, max: 267.0) [2024-06-15 16:54:25,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 16:54:27,762][1652475] Updated weights for policy 0, policy_version 409876 (0.0011) [2024-06-15 16:54:29,559][1652475] Updated weights for policy 0, policy_version 409924 (0.0013) [2024-06-15 16:54:30,738][1648984] Fps is (10 sec: 49152.0, 60 sec: 45329.1, 300 sec: 43986.9). Total num frames: 839614464. Throughput: 0: 11025.1. Samples: 209958912. Policy #0 lag: (min: 11.0, avg: 119.6, max: 267.0) [2024-06-15 16:54:30,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 16:54:30,847][1652475] Updated weights for policy 0, policy_version 409981 (0.0013) [2024-06-15 16:54:35,738][1648984] Fps is (10 sec: 32768.2, 60 sec: 44236.9, 300 sec: 43431.5). Total num frames: 839712768. Throughput: 0: 11093.3. Samples: 209997824. 
Policy #0 lag: (min: 11.0, avg: 119.6, max: 267.0) [2024-06-15 16:54:35,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 16:54:35,999][1652475] Updated weights for policy 0, policy_version 410033 (0.0013) [2024-06-15 16:54:37,636][1652475] Updated weights for policy 0, policy_version 410112 (0.0012) [2024-06-15 16:54:39,874][1652475] Updated weights for policy 0, policy_version 410167 (0.0011) [2024-06-15 16:54:40,738][1648984] Fps is (10 sec: 42598.1, 60 sec: 45875.3, 300 sec: 43986.9). Total num frames: 840040448. Throughput: 0: 11036.4. Samples: 210061824. Policy #0 lag: (min: 11.0, avg: 119.6, max: 267.0) [2024-06-15 16:54:40,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 16:54:41,743][1652475] Updated weights for policy 0, policy_version 410211 (0.0012) [2024-06-15 16:54:42,380][1652475] Updated weights for policy 0, policy_version 410240 (0.0013) [2024-06-15 16:54:45,738][1648984] Fps is (10 sec: 45875.2, 60 sec: 43690.7, 300 sec: 43542.8). Total num frames: 840171520. Throughput: 0: 11298.1. Samples: 210135040. Policy #0 lag: (min: 11.0, avg: 119.6, max: 267.0) [2024-06-15 16:54:45,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 16:54:50,108][1652475] Updated weights for policy 0, policy_version 410368 (0.0093) [2024-06-15 16:54:50,739][1648984] Fps is (10 sec: 42594.7, 60 sec: 44236.1, 300 sec: 43764.6). Total num frames: 840466432. Throughput: 0: 11263.8. Samples: 210166784. Policy #0 lag: (min: 11.0, avg: 119.6, max: 267.0) [2024-06-15 16:54:50,739][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 16:54:51,328][1652475] Updated weights for policy 0, policy_version 410427 (0.0013) [2024-06-15 16:54:52,286][1651340] Signal inference workers to stop experience collection... (21050 times) [2024-06-15 16:54:52,325][1652475] InferenceWorker_p0-w0: stopping experience collection (21050 times) [2024-06-15 16:54:52,516][1651340] Signal inference workers to resume experience collection... 
(21050 times) [2024-06-15 16:54:52,517][1652475] InferenceWorker_p0-w0: resuming experience collection (21050 times) [2024-06-15 16:54:53,504][1652475] Updated weights for policy 0, policy_version 410486 (0.0012) [2024-06-15 16:54:55,738][1648984] Fps is (10 sec: 52426.9, 60 sec: 43690.4, 300 sec: 43875.8). Total num frames: 840695808. Throughput: 0: 11104.6. Samples: 210226688. Policy #0 lag: (min: 11.0, avg: 119.6, max: 267.0) [2024-06-15 16:54:55,739][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 16:54:55,750][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000410496_840695808.pth... [2024-06-15 16:54:55,803][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000405344_830144512.pth [2024-06-15 16:54:58,216][1652475] Updated weights for policy 0, policy_version 410518 (0.0015) [2024-06-15 16:55:00,738][1648984] Fps is (10 sec: 39325.1, 60 sec: 43690.7, 300 sec: 43653.9). Total num frames: 840859648. Throughput: 0: 11218.5. Samples: 210302976. Policy #0 lag: (min: 11.0, avg: 119.6, max: 267.0) [2024-06-15 16:55:00,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 16:55:00,826][1652475] Updated weights for policy 0, policy_version 410581 (0.0014) [2024-06-15 16:55:02,472][1652475] Updated weights for policy 0, policy_version 410628 (0.0014) [2024-06-15 16:55:04,611][1652475] Updated weights for policy 0, policy_version 410720 (0.0119) [2024-06-15 16:55:05,738][1648984] Fps is (10 sec: 52430.9, 60 sec: 45875.2, 300 sec: 44320.1). Total num frames: 841220096. Throughput: 0: 11207.1. Samples: 210332672. Policy #0 lag: (min: 11.0, avg: 119.6, max: 267.0) [2024-06-15 16:55:05,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 16:55:10,037][1652475] Updated weights for policy 0, policy_version 410800 (0.0017) [2024-06-15 16:55:10,738][1648984] Fps is (10 sec: 49151.7, 60 sec: 45875.1, 300 sec: 43764.7). Total num frames: 841351168. Throughput: 0: 11195.7. 
Samples: 210398720. Policy #0 lag: (min: 11.0, avg: 119.6, max: 267.0) [2024-06-15 16:55:10,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 16:55:15,161][1652475] Updated weights for policy 0, policy_version 410880 (0.0102) [2024-06-15 16:55:15,738][1648984] Fps is (10 sec: 29491.2, 60 sec: 43690.7, 300 sec: 43653.6). Total num frames: 841515008. Throughput: 0: 11150.2. Samples: 210460672. Policy #0 lag: (min: 11.0, avg: 119.6, max: 267.0) [2024-06-15 16:55:15,738][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 16:55:17,654][1652475] Updated weights for policy 0, policy_version 410976 (0.0014) [2024-06-15 16:55:20,738][1648984] Fps is (10 sec: 39320.7, 60 sec: 43690.4, 300 sec: 43986.8). Total num frames: 841744384. Throughput: 0: 10706.4. Samples: 210479616. Policy #0 lag: (min: 11.0, avg: 119.6, max: 267.0) [2024-06-15 16:55:20,739][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 16:55:22,528][1652475] Updated weights for policy 0, policy_version 411010 (0.0013) [2024-06-15 16:55:23,617][1652475] Updated weights for policy 0, policy_version 411066 (0.0011) [2024-06-15 16:55:25,738][1648984] Fps is (10 sec: 36044.7, 60 sec: 41506.2, 300 sec: 43431.5). Total num frames: 841875456. Throughput: 0: 10740.6. Samples: 210545152. Policy #0 lag: (min: 11.0, avg: 119.6, max: 267.0) [2024-06-15 16:55:25,740][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 16:55:27,355][1652475] Updated weights for policy 0, policy_version 411120 (0.0013) [2024-06-15 16:55:30,090][1652475] Updated weights for policy 0, policy_version 411202 (0.0013) [2024-06-15 16:55:30,738][1648984] Fps is (10 sec: 45876.2, 60 sec: 43144.4, 300 sec: 43986.9). Total num frames: 842203136. Throughput: 0: 10399.3. Samples: 210603008. 
Policy #0 lag: (min: 15.0, avg: 83.0, max: 255.0) [2024-06-15 16:55:30,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 16:55:31,184][1652475] Updated weights for policy 0, policy_version 411262 (0.0014) [2024-06-15 16:55:35,738][1648984] Fps is (10 sec: 39321.9, 60 sec: 42598.5, 300 sec: 43764.7). Total num frames: 842268672. Throughput: 0: 10388.1. Samples: 210634240. Policy #0 lag: (min: 15.0, avg: 83.0, max: 255.0) [2024-06-15 16:55:35,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 16:55:38,319][1652475] Updated weights for policy 0, policy_version 411346 (0.0017) [2024-06-15 16:55:40,738][1648984] Fps is (10 sec: 32768.6, 60 sec: 41506.2, 300 sec: 43542.6). Total num frames: 842530816. Throughput: 0: 10501.8. Samples: 210699264. Policy #0 lag: (min: 15.0, avg: 83.0, max: 255.0) [2024-06-15 16:55:40,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 16:55:40,935][1651340] Signal inference workers to stop experience collection... (21100 times) [2024-06-15 16:55:40,980][1652475] InferenceWorker_p0-w0: stopping experience collection (21100 times) [2024-06-15 16:55:41,151][1651340] Signal inference workers to resume experience collection... (21100 times) [2024-06-15 16:55:41,152][1652475] InferenceWorker_p0-w0: resuming experience collection (21100 times) [2024-06-15 16:55:41,153][1652475] Updated weights for policy 0, policy_version 411424 (0.0012) [2024-06-15 16:55:42,762][1652475] Updated weights for policy 0, policy_version 411488 (0.0014) [2024-06-15 16:55:45,738][1648984] Fps is (10 sec: 52428.5, 60 sec: 43690.7, 300 sec: 43986.9). Total num frames: 842792960. Throughput: 0: 10308.3. Samples: 210766848. 
Policy #0 lag: (min: 15.0, avg: 83.0, max: 255.0) [2024-06-15 16:55:45,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 16:55:48,775][1652475] Updated weights for policy 0, policy_version 411568 (0.0025) [2024-06-15 16:55:50,378][1652475] Updated weights for policy 0, policy_version 411616 (0.0013) [2024-06-15 16:55:50,738][1648984] Fps is (10 sec: 45874.6, 60 sec: 42052.9, 300 sec: 43764.7). Total num frames: 842989568. Throughput: 0: 10501.7. Samples: 210805248. Policy #0 lag: (min: 15.0, avg: 83.0, max: 255.0) [2024-06-15 16:55:50,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 16:55:52,682][1652475] Updated weights for policy 0, policy_version 411670 (0.0012) [2024-06-15 16:55:53,349][1652475] Updated weights for policy 0, policy_version 411712 (0.0026) [2024-06-15 16:55:55,106][1652475] Updated weights for policy 0, policy_version 411769 (0.0113) [2024-06-15 16:55:55,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 43691.0, 300 sec: 44431.2). Total num frames: 843317248. Throughput: 0: 10456.2. Samples: 210869248. Policy #0 lag: (min: 15.0, avg: 83.0, max: 255.0) [2024-06-15 16:55:55,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 16:56:00,738][1648984] Fps is (10 sec: 45875.5, 60 sec: 43144.6, 300 sec: 43986.9). Total num frames: 843448320. Throughput: 0: 10558.6. Samples: 210935808. Policy #0 lag: (min: 15.0, avg: 83.0, max: 255.0) [2024-06-15 16:56:00,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 16:56:01,454][1652475] Updated weights for policy 0, policy_version 411842 (0.0033) [2024-06-15 16:56:02,931][1652475] Updated weights for policy 0, policy_version 411904 (0.0012) [2024-06-15 16:56:05,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 41506.1, 300 sec: 43986.9). Total num frames: 843710464. Throughput: 0: 10809.0. Samples: 210966016. 
Policy #0 lag: (min: 15.0, avg: 83.0, max: 255.0) [2024-06-15 16:56:05,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 16:56:05,923][1652475] Updated weights for policy 0, policy_version 411969 (0.0018) [2024-06-15 16:56:07,272][1652475] Updated weights for policy 0, policy_version 412024 (0.0013) [2024-06-15 16:56:10,749][1648984] Fps is (10 sec: 39321.4, 60 sec: 41506.2, 300 sec: 43542.6). Total num frames: 843841536. Throughput: 0: 10740.6. Samples: 211028480. Policy #0 lag: (min: 15.0, avg: 83.0, max: 255.0) [2024-06-15 16:56:10,787][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 16:56:13,064][1652475] Updated weights for policy 0, policy_version 412087 (0.0132) [2024-06-15 16:56:14,975][1652475] Updated weights for policy 0, policy_version 412144 (0.0012) [2024-06-15 16:56:15,738][1648984] Fps is (10 sec: 39321.9, 60 sec: 43144.6, 300 sec: 43764.7). Total num frames: 844103680. Throughput: 0: 10854.4. Samples: 211091456. Policy #0 lag: (min: 15.0, avg: 83.0, max: 255.0) [2024-06-15 16:56:15,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 16:56:17,096][1652475] Updated weights for policy 0, policy_version 412201 (0.0085) [2024-06-15 16:56:19,160][1652475] Updated weights for policy 0, policy_version 412288 (0.0012) [2024-06-15 16:56:20,738][1648984] Fps is (10 sec: 52428.4, 60 sec: 43690.8, 300 sec: 43986.9). Total num frames: 844365824. Throughput: 0: 10865.7. Samples: 211123200. Policy #0 lag: (min: 15.0, avg: 83.0, max: 255.0) [2024-06-15 16:56:20,739][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 16:56:25,658][1652475] Updated weights for policy 0, policy_version 412351 (0.0013) [2024-06-15 16:56:25,738][1648984] Fps is (10 sec: 39319.1, 60 sec: 43690.3, 300 sec: 43542.5). Total num frames: 844496896. Throughput: 0: 11047.7. Samples: 211196416. 
Policy #0 lag: (min: 15.0, avg: 83.0, max: 255.0) [2024-06-15 16:56:25,739][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 16:56:27,166][1652475] Updated weights for policy 0, policy_version 412407 (0.0011) [2024-06-15 16:56:29,184][1651340] Signal inference workers to stop experience collection... (21150 times) [2024-06-15 16:56:29,215][1652475] Updated weights for policy 0, policy_version 412449 (0.0025) [2024-06-15 16:56:29,238][1652475] InferenceWorker_p0-w0: stopping experience collection (21150 times) [2024-06-15 16:56:29,457][1651340] Signal inference workers to resume experience collection... (21150 times) [2024-06-15 16:56:29,459][1652475] InferenceWorker_p0-w0: resuming experience collection (21150 times) [2024-06-15 16:56:30,738][1648984] Fps is (10 sec: 45875.6, 60 sec: 43690.7, 300 sec: 44209.0). Total num frames: 844824576. Throughput: 0: 10797.5. Samples: 211252736. Policy #0 lag: (min: 15.0, avg: 83.0, max: 255.0) [2024-06-15 16:56:30,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 16:56:31,220][1652475] Updated weights for policy 0, policy_version 412538 (0.0105) [2024-06-15 16:56:35,738][1648984] Fps is (10 sec: 39324.0, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 844890112. Throughput: 0: 10717.9. Samples: 211287552. Policy #0 lag: (min: 15.0, avg: 83.0, max: 255.0) [2024-06-15 16:56:35,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 16:56:37,915][1652475] Updated weights for policy 0, policy_version 412598 (0.0014) [2024-06-15 16:56:39,147][1652475] Updated weights for policy 0, policy_version 412644 (0.0013) [2024-06-15 16:56:40,738][1648984] Fps is (10 sec: 36044.6, 60 sec: 44236.7, 300 sec: 43653.6). Total num frames: 845185024. Throughput: 0: 10843.0. Samples: 211357184. 
Policy #0 lag: (min: 15.0, avg: 83.0, max: 255.0) [2024-06-15 16:56:40,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 16:56:41,516][1652475] Updated weights for policy 0, policy_version 412734 (0.0025) [2024-06-15 16:56:45,738][1648984] Fps is (10 sec: 52428.4, 60 sec: 43690.6, 300 sec: 43542.6). Total num frames: 845414400. Throughput: 0: 10706.5. Samples: 211417600. Policy #0 lag: (min: 15.0, avg: 83.0, max: 255.0) [2024-06-15 16:56:45,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 16:56:49,038][1652475] Updated weights for policy 0, policy_version 412818 (0.0014) [2024-06-15 16:56:50,738][1648984] Fps is (10 sec: 39322.1, 60 sec: 43144.6, 300 sec: 43320.4). Total num frames: 845578240. Throughput: 0: 10934.1. Samples: 211458048. Policy #0 lag: (min: 3.0, avg: 70.0, max: 259.0) [2024-06-15 16:56:50,740][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 16:56:51,461][1652475] Updated weights for policy 0, policy_version 412918 (0.0018) [2024-06-15 16:56:53,085][1652475] Updated weights for policy 0, policy_version 412988 (0.0015) [2024-06-15 16:56:55,147][1652475] Updated weights for policy 0, policy_version 413056 (0.0025) [2024-06-15 16:56:55,738][1648984] Fps is (10 sec: 52426.9, 60 sec: 43690.4, 300 sec: 43986.9). Total num frames: 845938688. Throughput: 0: 10820.2. Samples: 211515392. Policy #0 lag: (min: 3.0, avg: 70.0, max: 259.0) [2024-06-15 16:56:55,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 16:56:55,744][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000413056_845938688.pth... [2024-06-15 16:56:55,820][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000407936_835452928.pth [2024-06-15 16:57:00,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 42052.3, 300 sec: 43209.3). Total num frames: 845971456. Throughput: 0: 11161.6. Samples: 211593728. 
Policy #0 lag: (min: 3.0, avg: 70.0, max: 259.0) [2024-06-15 16:57:00,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 16:57:02,130][1652475] Updated weights for policy 0, policy_version 413123 (0.0013) [2024-06-15 16:57:03,568][1652475] Updated weights for policy 0, policy_version 413186 (0.0157) [2024-06-15 16:57:04,629][1652475] Updated weights for policy 0, policy_version 413248 (0.0024) [2024-06-15 16:57:05,738][1648984] Fps is (10 sec: 39323.2, 60 sec: 43690.7, 300 sec: 43875.8). Total num frames: 846331904. Throughput: 0: 10991.0. Samples: 211617792. Policy #0 lag: (min: 3.0, avg: 70.0, max: 259.0) [2024-06-15 16:57:05,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 16:57:10,738][1648984] Fps is (10 sec: 49151.9, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 846462976. Throughput: 0: 10865.9. Samples: 211685376. Policy #0 lag: (min: 3.0, avg: 70.0, max: 259.0) [2024-06-15 16:57:10,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 16:57:11,572][1652475] Updated weights for policy 0, policy_version 413315 (0.0121) [2024-06-15 16:57:15,189][1652475] Updated weights for policy 0, policy_version 413408 (0.0014) [2024-06-15 16:57:15,738][1648984] Fps is (10 sec: 36044.7, 60 sec: 43144.5, 300 sec: 43431.5). Total num frames: 846692352. Throughput: 0: 11207.1. Samples: 211757056. Policy #0 lag: (min: 3.0, avg: 70.0, max: 259.0) [2024-06-15 16:57:15,739][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 16:57:16,217][1651340] Signal inference workers to stop experience collection... (21200 times) [2024-06-15 16:57:16,267][1652475] InferenceWorker_p0-w0: stopping experience collection (21200 times) [2024-06-15 16:57:16,400][1651340] Signal inference workers to resume experience collection... 
(21200 times) [2024-06-15 16:57:16,401][1652475] InferenceWorker_p0-w0: resuming experience collection (21200 times) [2024-06-15 16:57:16,814][1652475] Updated weights for policy 0, policy_version 413488 (0.0014) [2024-06-15 16:57:18,367][1652475] Updated weights for policy 0, policy_version 413558 (0.0015) [2024-06-15 16:57:20,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 43690.8, 300 sec: 43542.6). Total num frames: 846987264. Throughput: 0: 11002.3. Samples: 211782656. Policy #0 lag: (min: 3.0, avg: 70.0, max: 259.0) [2024-06-15 16:57:20,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 16:57:23,723][1652475] Updated weights for policy 0, policy_version 413625 (0.0015) [2024-06-15 16:57:25,738][1648984] Fps is (10 sec: 42598.0, 60 sec: 43691.0, 300 sec: 43320.4). Total num frames: 847118336. Throughput: 0: 11104.7. Samples: 211856896. Policy #0 lag: (min: 3.0, avg: 70.0, max: 259.0) [2024-06-15 16:57:25,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 16:57:27,254][1652475] Updated weights for policy 0, policy_version 413685 (0.0015) [2024-06-15 16:57:29,290][1652475] Updated weights for policy 0, policy_version 413765 (0.0013) [2024-06-15 16:57:30,525][1652475] Updated weights for policy 0, policy_version 413824 (0.0014) [2024-06-15 16:57:30,743][1648984] Fps is (10 sec: 52428.4, 60 sec: 44782.9, 300 sec: 43986.9). Total num frames: 847511552. Throughput: 0: 11013.7. Samples: 211913216. Policy #0 lag: (min: 3.0, avg: 70.0, max: 259.0) [2024-06-15 16:57:30,746][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 16:57:35,738][1648984] Fps is (10 sec: 49152.2, 60 sec: 45329.0, 300 sec: 43542.6). Total num frames: 847609856. Throughput: 0: 10956.8. Samples: 211951104. 
Policy #0 lag: (min: 3.0, avg: 70.0, max: 259.0) [2024-06-15 16:57:35,738][1648984] Avg episode reward: [(0, '-0.250')] [2024-06-15 16:57:35,835][1652475] Updated weights for policy 0, policy_version 413882 (0.0014) [2024-06-15 16:57:40,028][1652475] Updated weights for policy 0, policy_version 413923 (0.0013) [2024-06-15 16:57:40,738][1648984] Fps is (10 sec: 26214.6, 60 sec: 43144.6, 300 sec: 43320.4). Total num frames: 847773696. Throughput: 0: 11275.5. Samples: 212022784. Policy #0 lag: (min: 3.0, avg: 70.0, max: 259.0) [2024-06-15 16:57:40,738][1648984] Avg episode reward: [(0, '-0.250')] [2024-06-15 16:57:42,651][1652475] Updated weights for policy 0, policy_version 414032 (0.0024) [2024-06-15 16:57:43,755][1652475] Updated weights for policy 0, policy_version 414077 (0.0011) [2024-06-15 16:57:45,738][1648984] Fps is (10 sec: 42597.9, 60 sec: 43690.6, 300 sec: 43986.9). Total num frames: 848035840. Throughput: 0: 10683.7. Samples: 212074496. Policy #0 lag: (min: 3.0, avg: 70.0, max: 259.0) [2024-06-15 16:57:45,739][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 16:57:49,731][1652475] Updated weights for policy 0, policy_version 414140 (0.0013) [2024-06-15 16:57:50,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 43144.5, 300 sec: 43098.3). Total num frames: 848166912. Throughput: 0: 10934.0. Samples: 212109824. Policy #0 lag: (min: 3.0, avg: 70.0, max: 259.0) [2024-06-15 16:57:50,738][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 16:57:52,665][1652475] Updated weights for policy 0, policy_version 414210 (0.0015) [2024-06-15 16:57:54,644][1652475] Updated weights for policy 0, policy_version 414289 (0.0012) [2024-06-15 16:57:55,738][1648984] Fps is (10 sec: 52429.6, 60 sec: 43690.9, 300 sec: 43986.9). Total num frames: 848560128. Throughput: 0: 10672.4. Samples: 212165632. 
Policy #0 lag: (min: 3.0, avg: 70.0, max: 259.0) [2024-06-15 16:57:55,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 16:58:00,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 43144.5, 300 sec: 43098.3). Total num frames: 848560128. Throughput: 0: 10683.7. Samples: 212237824. Policy #0 lag: (min: 3.0, avg: 70.0, max: 259.0) [2024-06-15 16:58:00,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 16:58:01,325][1652475] Updated weights for policy 0, policy_version 414352 (0.0013) [2024-06-15 16:58:02,877][1652475] Updated weights for policy 0, policy_version 414403 (0.0023) [2024-06-15 16:58:03,177][1651340] Signal inference workers to stop experience collection... (21250 times) [2024-06-15 16:58:03,215][1652475] InferenceWorker_p0-w0: stopping experience collection (21250 times) [2024-06-15 16:58:03,470][1651340] Signal inference workers to resume experience collection... (21250 times) [2024-06-15 16:58:03,471][1652475] InferenceWorker_p0-w0: resuming experience collection (21250 times) [2024-06-15 16:58:04,584][1652475] Updated weights for policy 0, policy_version 414467 (0.0013) [2024-06-15 16:58:05,738][1648984] Fps is (10 sec: 36044.9, 60 sec: 43144.5, 300 sec: 43764.7). Total num frames: 848920576. Throughput: 0: 10865.8. Samples: 212271616. Policy #0 lag: (min: 3.0, avg: 70.0, max: 259.0) [2024-06-15 16:58:05,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 16:58:05,890][1652475] Updated weights for policy 0, policy_version 414525 (0.0012) [2024-06-15 16:58:08,055][1652475] Updated weights for policy 0, policy_version 414591 (0.0083) [2024-06-15 16:58:10,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 849084416. Throughput: 0: 10444.8. Samples: 212326912. 
Policy #0 lag: (min: 3.0, avg: 70.0, max: 259.0) [2024-06-15 16:58:10,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 16:58:14,063][1652475] Updated weights for policy 0, policy_version 414656 (0.0018) [2024-06-15 16:58:15,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 43690.7, 300 sec: 43431.5). Total num frames: 849313792. Throughput: 0: 10808.9. Samples: 212399616. Policy #0 lag: (min: 15.0, avg: 84.5, max: 271.0) [2024-06-15 16:58:15,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 16:58:16,300][1652475] Updated weights for policy 0, policy_version 414721 (0.0015) [2024-06-15 16:58:17,861][1652475] Updated weights for policy 0, policy_version 414784 (0.0013) [2024-06-15 16:58:20,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 43690.7, 300 sec: 43875.8). Total num frames: 849608704. Throughput: 0: 10524.5. Samples: 212424704. Policy #0 lag: (min: 15.0, avg: 84.5, max: 271.0) [2024-06-15 16:58:20,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 16:58:25,312][1652475] Updated weights for policy 0, policy_version 414864 (0.0015) [2024-06-15 16:58:25,738][1648984] Fps is (10 sec: 36044.5, 60 sec: 42598.4, 300 sec: 43320.4). Total num frames: 849674240. Throughput: 0: 10706.5. Samples: 212504576. Policy #0 lag: (min: 15.0, avg: 84.5, max: 271.0) [2024-06-15 16:58:25,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 16:58:26,350][1652475] Updated weights for policy 0, policy_version 414906 (0.0017) [2024-06-15 16:58:28,207][1652475] Updated weights for policy 0, policy_version 414961 (0.0112) [2024-06-15 16:58:30,717][1652475] Updated weights for policy 0, policy_version 415056 (0.0117) [2024-06-15 16:58:30,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 42052.3, 300 sec: 43986.9). Total num frames: 850034688. Throughput: 0: 10695.1. Samples: 212555776. 
Policy #0 lag: (min: 15.0, avg: 84.5, max: 271.0) [2024-06-15 16:58:30,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 16:58:35,738][1648984] Fps is (10 sec: 45875.5, 60 sec: 42052.3, 300 sec: 43542.6). Total num frames: 850132992. Throughput: 0: 10592.7. Samples: 212586496. Policy #0 lag: (min: 15.0, avg: 84.5, max: 271.0) [2024-06-15 16:58:35,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 16:58:37,831][1652475] Updated weights for policy 0, policy_version 415121 (0.0021) [2024-06-15 16:58:38,975][1652475] Updated weights for policy 0, policy_version 415168 (0.0019) [2024-06-15 16:58:40,738][1648984] Fps is (10 sec: 29491.4, 60 sec: 42598.4, 300 sec: 43320.4). Total num frames: 850329600. Throughput: 0: 11013.7. Samples: 212661248. Policy #0 lag: (min: 15.0, avg: 84.5, max: 271.0) [2024-06-15 16:58:40,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 16:58:41,537][1652475] Updated weights for policy 0, policy_version 415237 (0.0013) [2024-06-15 16:58:43,051][1652475] Updated weights for policy 0, policy_version 415300 (0.0017) [2024-06-15 16:58:45,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 43690.8, 300 sec: 43542.6). Total num frames: 850657280. Throughput: 0: 10592.7. Samples: 212714496. Policy #0 lag: (min: 15.0, avg: 84.5, max: 271.0) [2024-06-15 16:58:45,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 16:58:49,888][1651340] Signal inference workers to stop experience collection... (21300 times) [2024-06-15 16:58:49,932][1652475] InferenceWorker_p0-w0: stopping experience collection (21300 times) [2024-06-15 16:58:50,236][1651340] Signal inference workers to resume experience collection... (21300 times) [2024-06-15 16:58:50,236][1652475] InferenceWorker_p0-w0: resuming experience collection (21300 times) [2024-06-15 16:58:50,736][1652475] Updated weights for policy 0, policy_version 415392 (0.0182) [2024-06-15 16:58:50,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 42598.5, 300 sec: 42876.1). 
Total num frames: 850722816. Throughput: 0: 10729.3. Samples: 212754432. Policy #0 lag: (min: 15.0, avg: 84.5, max: 271.0) [2024-06-15 16:58:50,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 16:58:52,296][1652475] Updated weights for policy 0, policy_version 415442 (0.0015) [2024-06-15 16:58:53,723][1652475] Updated weights for policy 0, policy_version 415492 (0.0022) [2024-06-15 16:58:54,996][1652475] Updated weights for policy 0, policy_version 415554 (0.0012) [2024-06-15 16:58:55,738][1648984] Fps is (10 sec: 45874.0, 60 sec: 42598.2, 300 sec: 43653.6). Total num frames: 851116032. Throughput: 0: 10865.7. Samples: 212815872. Policy #0 lag: (min: 15.0, avg: 84.5, max: 271.0) [2024-06-15 16:58:55,739][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 16:58:55,982][1652475] Updated weights for policy 0, policy_version 415615 (0.0012) [2024-06-15 16:58:56,007][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000415616_851181568.pth... [2024-06-15 16:58:56,049][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000410496_840695808.pth [2024-06-15 16:59:00,738][1648984] Fps is (10 sec: 45875.0, 60 sec: 43690.7, 300 sec: 43098.3). Total num frames: 851181568. Throughput: 0: 10899.9. Samples: 212890112. Policy #0 lag: (min: 15.0, avg: 84.5, max: 271.0) [2024-06-15 16:59:00,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 16:59:02,615][1652475] Updated weights for policy 0, policy_version 415664 (0.0013) [2024-06-15 16:59:04,691][1652475] Updated weights for policy 0, policy_version 415735 (0.0164) [2024-06-15 16:59:05,738][1648984] Fps is (10 sec: 32768.7, 60 sec: 42052.3, 300 sec: 43542.6). Total num frames: 851443712. Throughput: 0: 11059.2. Samples: 212922368. 
Policy #0 lag: (min: 15.0, avg: 84.5, max: 271.0) [2024-06-15 16:59:05,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 16:59:06,092][1652475] Updated weights for policy 0, policy_version 415767 (0.0013) [2024-06-15 16:59:08,244][1652475] Updated weights for policy 0, policy_version 415863 (0.0107) [2024-06-15 16:59:10,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 43690.7, 300 sec: 43431.5). Total num frames: 851705856. Throughput: 0: 10535.8. Samples: 212978688. Policy #0 lag: (min: 15.0, avg: 84.5, max: 271.0) [2024-06-15 16:59:10,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 16:59:15,354][1652475] Updated weights for policy 0, policy_version 415937 (0.0018) [2024-06-15 16:59:15,738][1648984] Fps is (10 sec: 42598.6, 60 sec: 42598.4, 300 sec: 43209.3). Total num frames: 851869696. Throughput: 0: 10934.1. Samples: 213047808. Policy #0 lag: (min: 15.0, avg: 84.5, max: 271.0) [2024-06-15 16:59:15,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 16:59:16,708][1652475] Updated weights for policy 0, policy_version 415992 (0.0014) [2024-06-15 16:59:19,091][1652475] Updated weights for policy 0, policy_version 416064 (0.0013) [2024-06-15 16:59:20,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 852230144. Throughput: 0: 10922.7. Samples: 213078016. Policy #0 lag: (min: 15.0, avg: 84.5, max: 271.0) [2024-06-15 16:59:20,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 16:59:25,738][1648984] Fps is (10 sec: 36044.7, 60 sec: 42598.4, 300 sec: 42765.0). Total num frames: 852230144. Throughput: 0: 10865.8. Samples: 213150208. 
Policy #0 lag: (min: 15.0, avg: 84.5, max: 271.0) [2024-06-15 16:59:25,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 16:59:26,142][1652475] Updated weights for policy 0, policy_version 416147 (0.0012) [2024-06-15 16:59:27,820][1652475] Updated weights for policy 0, policy_version 416211 (0.0013) [2024-06-15 16:59:30,214][1651340] Signal inference workers to stop experience collection... (21350 times) [2024-06-15 16:59:30,258][1652475] InferenceWorker_p0-w0: stopping experience collection (21350 times) [2024-06-15 16:59:30,261][1652475] Updated weights for policy 0, policy_version 416276 (0.0016) [2024-06-15 16:59:30,404][1651340] Signal inference workers to resume experience collection... (21350 times) [2024-06-15 16:59:30,405][1652475] InferenceWorker_p0-w0: resuming experience collection (21350 times) [2024-06-15 16:59:30,738][1648984] Fps is (10 sec: 36044.8, 60 sec: 42598.4, 300 sec: 43653.6). Total num frames: 852590592. Throughput: 0: 11036.4. Samples: 213211136. Policy #0 lag: (min: 15.0, avg: 84.5, max: 271.0) [2024-06-15 16:59:30,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 16:59:30,976][1652475] Updated weights for policy 0, policy_version 416320 (0.0012) [2024-06-15 16:59:32,106][1652475] Updated weights for policy 0, policy_version 416378 (0.0019) [2024-06-15 16:59:35,738][1648984] Fps is (10 sec: 52429.1, 60 sec: 43690.7, 300 sec: 43098.3). Total num frames: 852754432. Throughput: 0: 10888.5. Samples: 213244416. Policy #0 lag: (min: 15.0, avg: 84.5, max: 271.0) [2024-06-15 16:59:35,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 16:59:37,994][1652475] Updated weights for policy 0, policy_version 416433 (0.0012) [2024-06-15 16:59:39,828][1652475] Updated weights for policy 0, policy_version 416507 (0.0015) [2024-06-15 16:59:40,740][1648984] Fps is (10 sec: 42589.8, 60 sec: 44781.4, 300 sec: 43542.3). Total num frames: 853016576. Throughput: 0: 10910.9. Samples: 213306880. 
Policy #0 lag: (min: 15.0, avg: 93.3, max: 271.0) [2024-06-15 16:59:40,742][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 16:59:42,523][1652475] Updated weights for policy 0, policy_version 416572 (0.0095) [2024-06-15 16:59:45,738][1648984] Fps is (10 sec: 42598.1, 60 sec: 42052.2, 300 sec: 43098.4). Total num frames: 853180416. Throughput: 0: 10729.2. Samples: 213372928. Policy #0 lag: (min: 15.0, avg: 93.3, max: 271.0) [2024-06-15 16:59:45,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 16:59:45,925][1652475] Updated weights for policy 0, policy_version 416608 (0.0015) [2024-06-15 16:59:48,448][1652475] Updated weights for policy 0, policy_version 416643 (0.0014) [2024-06-15 16:59:50,738][1648984] Fps is (10 sec: 39329.7, 60 sec: 44782.9, 300 sec: 43098.3). Total num frames: 853409792. Throughput: 0: 10843.0. Samples: 213410304. Policy #0 lag: (min: 15.0, avg: 93.3, max: 271.0) [2024-06-15 16:59:50,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 16:59:52,135][1652475] Updated weights for policy 0, policy_version 416709 (0.0013) [2024-06-15 16:59:54,130][1652475] Updated weights for policy 0, policy_version 416809 (0.0014) [2024-06-15 16:59:55,738][1648984] Fps is (10 sec: 49150.5, 60 sec: 42598.3, 300 sec: 43431.4). Total num frames: 853671936. Throughput: 0: 10877.1. Samples: 213468160. Policy #0 lag: (min: 15.0, avg: 93.3, max: 271.0) [2024-06-15 16:59:55,739][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 16:59:56,974][1652475] Updated weights for policy 0, policy_version 416854 (0.0015) [2024-06-15 17:00:00,738][1648984] Fps is (10 sec: 39321.3, 60 sec: 43690.6, 300 sec: 42653.9). Total num frames: 853803008. Throughput: 0: 10843.0. Samples: 213535744. 
Policy #0 lag: (min: 15.0, avg: 93.3, max: 271.0) [2024-06-15 17:00:00,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 17:00:01,362][1652475] Updated weights for policy 0, policy_version 416898 (0.0013) [2024-06-15 17:00:05,064][1652475] Updated weights for policy 0, policy_version 416962 (0.0015) [2024-06-15 17:00:05,738][1648984] Fps is (10 sec: 32768.9, 60 sec: 42598.4, 300 sec: 42876.1). Total num frames: 853999616. Throughput: 0: 10808.9. Samples: 213564416. Policy #0 lag: (min: 15.0, avg: 93.3, max: 271.0) [2024-06-15 17:00:05,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 17:00:07,190][1652475] Updated weights for policy 0, policy_version 417076 (0.0084) [2024-06-15 17:00:09,173][1652475] Updated weights for policy 0, policy_version 417145 (0.0012) [2024-06-15 17:00:10,738][1648984] Fps is (10 sec: 52427.3, 60 sec: 43690.4, 300 sec: 43431.4). Total num frames: 854327296. Throughput: 0: 10376.5. Samples: 213617152. Policy #0 lag: (min: 15.0, avg: 93.3, max: 271.0) [2024-06-15 17:00:10,739][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 17:00:15,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 42598.3, 300 sec: 42987.2). Total num frames: 854425600. Throughput: 0: 10808.9. Samples: 213697536. Policy #0 lag: (min: 15.0, avg: 93.3, max: 271.0) [2024-06-15 17:00:15,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 17:00:15,941][1652475] Updated weights for policy 0, policy_version 417209 (0.0113) [2024-06-15 17:00:16,831][1651340] Signal inference workers to stop experience collection... (21400 times) [2024-06-15 17:00:16,886][1652475] InferenceWorker_p0-w0: stopping experience collection (21400 times) [2024-06-15 17:00:17,095][1651340] Signal inference workers to resume experience collection... 
(21400 times) [2024-06-15 17:00:17,096][1652475] InferenceWorker_p0-w0: resuming experience collection (21400 times) [2024-06-15 17:00:17,319][1652475] Updated weights for policy 0, policy_version 417280 (0.0033) [2024-06-15 17:00:19,933][1652475] Updated weights for policy 0, policy_version 417360 (0.0012) [2024-06-15 17:00:20,738][1648984] Fps is (10 sec: 49153.9, 60 sec: 43144.6, 300 sec: 43875.8). Total num frames: 854818816. Throughput: 0: 10717.9. Samples: 213726720. Policy #0 lag: (min: 15.0, avg: 93.3, max: 271.0) [2024-06-15 17:00:20,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 17:00:25,738][1648984] Fps is (10 sec: 42598.6, 60 sec: 43690.6, 300 sec: 42876.1). Total num frames: 854851584. Throughput: 0: 10877.6. Samples: 213796352. Policy #0 lag: (min: 15.0, avg: 93.3, max: 271.0) [2024-06-15 17:00:25,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 17:00:26,499][1652475] Updated weights for policy 0, policy_version 417410 (0.0014) [2024-06-15 17:00:28,267][1652475] Updated weights for policy 0, policy_version 417504 (0.0135) [2024-06-15 17:00:29,749][1652475] Updated weights for policy 0, policy_version 417540 (0.0013) [2024-06-15 17:00:30,738][1648984] Fps is (10 sec: 39321.3, 60 sec: 43690.7, 300 sec: 43875.8). Total num frames: 855212032. Throughput: 0: 10899.9. Samples: 213863424. Policy #0 lag: (min: 15.0, avg: 93.3, max: 271.0) [2024-06-15 17:00:30,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 17:00:31,250][1652475] Updated weights for policy 0, policy_version 417603 (0.0013) [2024-06-15 17:00:32,560][1652475] Updated weights for policy 0, policy_version 417659 (0.0011) [2024-06-15 17:00:35,738][1648984] Fps is (10 sec: 52429.4, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 855375872. Throughput: 0: 10740.6. Samples: 213893632. 
Policy #0 lag: (min: 15.0, avg: 93.3, max: 271.0) [2024-06-15 17:00:35,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 17:00:38,849][1652475] Updated weights for policy 0, policy_version 417720 (0.0016) [2024-06-15 17:00:39,906][1652475] Updated weights for policy 0, policy_version 417764 (0.0013) [2024-06-15 17:00:40,728][1652475] Updated weights for policy 0, policy_version 417798 (0.0011) [2024-06-15 17:00:40,738][1648984] Fps is (10 sec: 42598.0, 60 sec: 43692.1, 300 sec: 43542.5). Total num frames: 855638016. Throughput: 0: 11241.3. Samples: 213974016. Policy #0 lag: (min: 15.0, avg: 93.3, max: 271.0) [2024-06-15 17:00:40,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 17:00:43,384][1652475] Updated weights for policy 0, policy_version 417915 (0.0109) [2024-06-15 17:00:45,738][1648984] Fps is (10 sec: 52428.3, 60 sec: 45329.1, 300 sec: 43764.7). Total num frames: 855900160. Throughput: 0: 11082.0. Samples: 214034432. Policy #0 lag: (min: 15.0, avg: 93.3, max: 271.0) [2024-06-15 17:00:45,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 17:00:50,738][1648984] Fps is (10 sec: 42599.1, 60 sec: 44236.8, 300 sec: 43209.3). Total num frames: 856064000. Throughput: 0: 11457.5. Samples: 214080000. Policy #0 lag: (min: 15.0, avg: 93.3, max: 271.0) [2024-06-15 17:00:50,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 17:00:50,792][1652475] Updated weights for policy 0, policy_version 418002 (0.0015) [2024-06-15 17:00:52,869][1652475] Updated weights for policy 0, policy_version 418084 (0.0014) [2024-06-15 17:00:54,271][1652475] Updated weights for policy 0, policy_version 418144 (0.0013) [2024-06-15 17:00:55,738][1648984] Fps is (10 sec: 52427.9, 60 sec: 45875.3, 300 sec: 43986.8). Total num frames: 856424448. Throughput: 0: 11628.1. Samples: 214140416. 
Policy #0 lag: (min: 15.0, avg: 93.3, max: 271.0) [2024-06-15 17:00:55,739][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 17:00:55,754][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000418176_856424448.pth... [2024-06-15 17:00:55,828][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000413056_845938688.pth [2024-06-15 17:01:00,149][1651340] Signal inference workers to stop experience collection... (21450 times) [2024-06-15 17:01:00,194][1652475] InferenceWorker_p0-w0: stopping experience collection (21450 times) [2024-06-15 17:01:00,319][1651340] Signal inference workers to resume experience collection... (21450 times) [2024-06-15 17:01:00,320][1652475] InferenceWorker_p0-w0: resuming experience collection (21450 times) [2024-06-15 17:01:00,323][1652475] Updated weights for policy 0, policy_version 418192 (0.0015) [2024-06-15 17:01:00,746][1648984] Fps is (10 sec: 42563.2, 60 sec: 44776.8, 300 sec: 43319.2). Total num frames: 856489984. Throughput: 0: 11671.5. Samples: 214222848. Policy #0 lag: (min: 12.0, avg: 77.0, max: 268.0) [2024-06-15 17:01:00,747][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 17:01:01,104][1652475] Updated weights for policy 0, policy_version 418240 (0.0013) [2024-06-15 17:01:02,240][1652475] Updated weights for policy 0, policy_version 418300 (0.0013) [2024-06-15 17:01:03,598][1652475] Updated weights for policy 0, policy_version 418352 (0.0021) [2024-06-15 17:01:05,627][1652475] Updated weights for policy 0, policy_version 418432 (0.0019) [2024-06-15 17:01:05,738][1648984] Fps is (10 sec: 52429.7, 60 sec: 49152.0, 300 sec: 44431.2). Total num frames: 856948736. Throughput: 0: 11810.1. Samples: 214258176. Policy #0 lag: (min: 12.0, avg: 77.0, max: 268.0) [2024-06-15 17:01:05,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 17:01:10,738][1648984] Fps is (10 sec: 45912.9, 60 sec: 43690.9, 300 sec: 43542.6). 
Total num frames: 856948736. Throughput: 0: 11867.0. Samples: 214330368. Policy #0 lag: (min: 12.0, avg: 77.0, max: 268.0) [2024-06-15 17:01:10,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 17:01:12,783][1652475] Updated weights for policy 0, policy_version 418528 (0.0113) [2024-06-15 17:01:15,483][1652475] Updated weights for policy 0, policy_version 418633 (0.0022) [2024-06-15 17:01:15,740][1648984] Fps is (10 sec: 42598.4, 60 sec: 49152.0, 300 sec: 44098.0). Total num frames: 857374720. Throughput: 0: 11741.9. Samples: 214391808. Policy #0 lag: (min: 12.0, avg: 77.0, max: 268.0) [2024-06-15 17:01:15,741][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 17:01:16,725][1652475] Updated weights for policy 0, policy_version 418688 (0.0016) [2024-06-15 17:01:20,738][1648984] Fps is (10 sec: 52427.7, 60 sec: 44236.6, 300 sec: 43986.9). Total num frames: 857473024. Throughput: 0: 11798.7. Samples: 214424576. Policy #0 lag: (min: 12.0, avg: 77.0, max: 268.0) [2024-06-15 17:01:20,739][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 17:01:23,404][1652475] Updated weights for policy 0, policy_version 418757 (0.0013) [2024-06-15 17:01:24,387][1652475] Updated weights for policy 0, policy_version 418815 (0.0015) [2024-06-15 17:01:25,738][1648984] Fps is (10 sec: 45875.1, 60 sec: 49698.1, 300 sec: 44097.9). Total num frames: 857833472. Throughput: 0: 11753.3. Samples: 214502912. Policy #0 lag: (min: 12.0, avg: 77.0, max: 268.0) [2024-06-15 17:01:25,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 17:01:25,965][1652475] Updated weights for policy 0, policy_version 418880 (0.0012) [2024-06-15 17:01:27,233][1652475] Updated weights for policy 0, policy_version 418943 (0.0012) [2024-06-15 17:01:30,738][1648984] Fps is (10 sec: 52430.2, 60 sec: 46421.4, 300 sec: 44431.2). Total num frames: 857997312. Throughput: 0: 12026.3. Samples: 214575616. 
Policy #0 lag: (min: 12.0, avg: 77.0, max: 268.0) [2024-06-15 17:01:30,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 17:01:34,099][1652475] Updated weights for policy 0, policy_version 419024 (0.0012) [2024-06-15 17:01:35,738][1648984] Fps is (10 sec: 42598.7, 60 sec: 48059.7, 300 sec: 44320.1). Total num frames: 858259456. Throughput: 0: 11969.4. Samples: 214618624. Policy #0 lag: (min: 12.0, avg: 77.0, max: 268.0) [2024-06-15 17:01:35,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 17:01:36,650][1651340] Signal inference workers to stop experience collection... (21500 times) [2024-06-15 17:01:36,715][1652475] Updated weights for policy 0, policy_version 419107 (0.0121) [2024-06-15 17:01:36,768][1652475] InferenceWorker_p0-w0: stopping experience collection (21500 times) [2024-06-15 17:01:36,829][1651340] Signal inference workers to resume experience collection... (21500 times) [2024-06-15 17:01:36,838][1652475] InferenceWorker_p0-w0: resuming experience collection (21500 times) [2024-06-15 17:01:37,725][1652475] Updated weights for policy 0, policy_version 419168 (0.0021) [2024-06-15 17:01:40,738][1648984] Fps is (10 sec: 52428.4, 60 sec: 48059.8, 300 sec: 44431.2). Total num frames: 858521600. Throughput: 0: 11980.9. Samples: 214679552. Policy #0 lag: (min: 12.0, avg: 77.0, max: 268.0) [2024-06-15 17:01:40,740][1648984] Avg episode reward: [(0, '-0.560')] [2024-06-15 17:01:43,780][1652475] Updated weights for policy 0, policy_version 419218 (0.0012) [2024-06-15 17:01:45,028][1652475] Updated weights for policy 0, policy_version 419267 (0.0013) [2024-06-15 17:01:45,738][1648984] Fps is (10 sec: 45875.3, 60 sec: 46967.5, 300 sec: 44542.3). Total num frames: 858718208. Throughput: 0: 11778.2. Samples: 214752768. 
Policy #0 lag: (min: 12.0, avg: 77.0, max: 268.0) [2024-06-15 17:01:45,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 17:01:47,348][1652475] Updated weights for policy 0, policy_version 419344 (0.0014) [2024-06-15 17:01:50,738][1648984] Fps is (10 sec: 42598.9, 60 sec: 48059.8, 300 sec: 44098.0). Total num frames: 858947584. Throughput: 0: 11685.0. Samples: 214784000. Policy #0 lag: (min: 12.0, avg: 77.0, max: 268.0) [2024-06-15 17:01:50,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 17:01:50,887][1652475] Updated weights for policy 0, policy_version 419424 (0.0013) [2024-06-15 17:01:55,473][1652475] Updated weights for policy 0, policy_version 419474 (0.0015) [2024-06-15 17:01:55,738][1648984] Fps is (10 sec: 36044.7, 60 sec: 44237.0, 300 sec: 44431.2). Total num frames: 859078656. Throughput: 0: 11639.5. Samples: 214854144. Policy #0 lag: (min: 12.0, avg: 77.0, max: 268.0) [2024-06-15 17:01:55,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 17:01:57,217][1652475] Updated weights for policy 0, policy_version 419553 (0.0014) [2024-06-15 17:01:58,638][1652475] Updated weights for policy 0, policy_version 419605 (0.0014) [2024-06-15 17:02:00,738][1648984] Fps is (10 sec: 49150.0, 60 sec: 49158.5, 300 sec: 44431.1). Total num frames: 859439104. Throughput: 0: 11696.3. Samples: 214918144. Policy #0 lag: (min: 12.0, avg: 77.0, max: 268.0) [2024-06-15 17:02:00,739][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 17:02:02,779][1652475] Updated weights for policy 0, policy_version 419664 (0.0017) [2024-06-15 17:02:03,956][1652475] Updated weights for policy 0, policy_version 419709 (0.0012) [2024-06-15 17:02:05,738][1648984] Fps is (10 sec: 49149.7, 60 sec: 43690.4, 300 sec: 44431.1). Total num frames: 859570176. Throughput: 0: 11673.5. Samples: 214949888. 
Policy #0 lag: (min: 12.0, avg: 77.0, max: 268.0) [2024-06-15 17:02:05,739][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 17:02:09,180][1652475] Updated weights for policy 0, policy_version 419792 (0.0013) [2024-06-15 17:02:10,454][1652475] Updated weights for policy 0, policy_version 419856 (0.0013) [2024-06-15 17:02:10,738][1648984] Fps is (10 sec: 45876.4, 60 sec: 49152.0, 300 sec: 44764.4). Total num frames: 859897856. Throughput: 0: 11400.5. Samples: 215015936. Policy #0 lag: (min: 12.0, avg: 77.0, max: 268.0) [2024-06-15 17:02:10,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 17:02:11,290][1652475] Updated weights for policy 0, policy_version 419904 (0.0016) [2024-06-15 17:02:14,442][1652475] Updated weights for policy 0, policy_version 419964 (0.0013) [2024-06-15 17:02:15,738][1648984] Fps is (10 sec: 52431.4, 60 sec: 45329.1, 300 sec: 44431.2). Total num frames: 860094464. Throughput: 0: 11104.7. Samples: 215075328. Policy #0 lag: (min: 12.0, avg: 77.0, max: 268.0) [2024-06-15 17:02:15,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 17:02:20,738][1648984] Fps is (10 sec: 22937.7, 60 sec: 44236.9, 300 sec: 44098.0). Total num frames: 860127232. Throughput: 0: 10922.7. Samples: 215110144. Policy #0 lag: (min: 12.0, avg: 77.0, max: 268.0) [2024-06-15 17:02:20,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 17:02:21,483][1652475] Updated weights for policy 0, policy_version 420032 (0.0025) [2024-06-15 17:02:23,209][1651340] Signal inference workers to stop experience collection... (21550 times) [2024-06-15 17:02:23,239][1652475] InferenceWorker_p0-w0: stopping experience collection (21550 times) [2024-06-15 17:02:23,433][1651340] Signal inference workers to resume experience collection... 
(21550 times) [2024-06-15 17:02:23,433][1652475] InferenceWorker_p0-w0: resuming experience collection (21550 times) [2024-06-15 17:02:23,750][1652475] Updated weights for policy 0, policy_version 420112 (0.0092) [2024-06-15 17:02:25,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 44783.0, 300 sec: 44098.0). Total num frames: 860520448. Throughput: 0: 10899.9. Samples: 215170048. Policy #0 lag: (min: 15.0, avg: 127.6, max: 335.0) [2024-06-15 17:02:25,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 17:02:26,331][1652475] Updated weights for policy 0, policy_version 420215 (0.0014) [2024-06-15 17:02:30,738][1648984] Fps is (10 sec: 49152.1, 60 sec: 43690.6, 300 sec: 44098.0). Total num frames: 860618752. Throughput: 0: 10717.9. Samples: 215235072. Policy #0 lag: (min: 15.0, avg: 127.6, max: 335.0) [2024-06-15 17:02:30,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 17:02:33,488][1652475] Updated weights for policy 0, policy_version 420256 (0.0082) [2024-06-15 17:02:35,350][1652475] Updated weights for policy 0, policy_version 420320 (0.0028) [2024-06-15 17:02:35,738][1648984] Fps is (10 sec: 29490.2, 60 sec: 42598.2, 300 sec: 44209.0). Total num frames: 860815360. Throughput: 0: 10888.4. Samples: 215273984. Policy #0 lag: (min: 15.0, avg: 127.6, max: 335.0) [2024-06-15 17:02:35,739][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 17:02:37,558][1652475] Updated weights for policy 0, policy_version 420400 (0.0012) [2024-06-15 17:02:39,534][1652475] Updated weights for policy 0, policy_version 420475 (0.0011) [2024-06-15 17:02:40,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 43690.7, 300 sec: 44431.2). Total num frames: 861143040. Throughput: 0: 10331.0. Samples: 215319040. Policy #0 lag: (min: 15.0, avg: 127.6, max: 335.0) [2024-06-15 17:02:40,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 17:02:45,738][1648984] Fps is (10 sec: 36045.8, 60 sec: 40960.0, 300 sec: 44098.0). Total num frames: 861175808. 
Throughput: 0: 10638.3. Samples: 215396864. Policy #0 lag: (min: 15.0, avg: 127.6, max: 335.0) [2024-06-15 17:02:45,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 17:02:46,342][1652475] Updated weights for policy 0, policy_version 420537 (0.0014) [2024-06-15 17:02:48,782][1652475] Updated weights for policy 0, policy_version 420579 (0.0099) [2024-06-15 17:02:50,738][1648984] Fps is (10 sec: 32768.2, 60 sec: 42052.2, 300 sec: 43764.7). Total num frames: 861470720. Throughput: 0: 10672.5. Samples: 215430144. Policy #0 lag: (min: 15.0, avg: 127.6, max: 335.0) [2024-06-15 17:02:50,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 17:02:51,183][1652475] Updated weights for policy 0, policy_version 420672 (0.0160) [2024-06-15 17:02:52,802][1652475] Updated weights for policy 0, policy_version 420736 (0.0020) [2024-06-15 17:02:55,738][1648984] Fps is (10 sec: 49151.9, 60 sec: 43144.5, 300 sec: 44431.2). Total num frames: 861667328. Throughput: 0: 10183.1. Samples: 215474176. Policy #0 lag: (min: 15.0, avg: 127.6, max: 335.0) [2024-06-15 17:02:55,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 17:02:55,744][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000420736_861667328.pth... [2024-06-15 17:02:55,838][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000415616_851181568.pth [2024-06-15 17:02:59,428][1652475] Updated weights for policy 0, policy_version 420790 (0.0013) [2024-06-15 17:03:00,738][1648984] Fps is (10 sec: 32767.9, 60 sec: 39321.8, 300 sec: 43653.6). Total num frames: 861798400. Throughput: 0: 10410.7. Samples: 215543808. 
Policy #0 lag: (min: 15.0, avg: 127.6, max: 335.0) [2024-06-15 17:03:00,740][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 17:03:02,812][1652475] Updated weights for policy 0, policy_version 420864 (0.0020) [2024-06-15 17:03:04,662][1652475] Updated weights for policy 0, policy_version 420934 (0.0013) [2024-06-15 17:03:05,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 43691.0, 300 sec: 44431.2). Total num frames: 862191616. Throughput: 0: 10217.2. Samples: 215569920. Policy #0 lag: (min: 15.0, avg: 127.6, max: 335.0) [2024-06-15 17:03:05,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 17:03:10,702][1651340] Signal inference workers to stop experience collection... (21600 times) [2024-06-15 17:03:10,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 38229.3, 300 sec: 43653.6). Total num frames: 862191616. Throughput: 0: 10365.1. Samples: 215636480. Policy #0 lag: (min: 15.0, avg: 127.6, max: 335.0) [2024-06-15 17:03:10,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 17:03:10,784][1652475] InferenceWorker_p0-w0: stopping experience collection (21600 times) [2024-06-15 17:03:10,786][1652475] Updated weights for policy 0, policy_version 420997 (0.0116) [2024-06-15 17:03:10,937][1651340] Signal inference workers to resume experience collection... (21600 times) [2024-06-15 17:03:10,938][1652475] InferenceWorker_p0-w0: resuming experience collection (21600 times) [2024-06-15 17:03:13,619][1652475] Updated weights for policy 0, policy_version 421074 (0.0139) [2024-06-15 17:03:15,735][1652475] Updated weights for policy 0, policy_version 421152 (0.0015) [2024-06-15 17:03:15,738][1648984] Fps is (10 sec: 29491.3, 60 sec: 39867.7, 300 sec: 43653.6). Total num frames: 862486528. Throughput: 0: 10319.6. Samples: 215699456. 
Policy #0 lag: (min: 15.0, avg: 127.6, max: 335.0) [2024-06-15 17:03:15,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 17:03:17,690][1652475] Updated weights for policy 0, policy_version 421240 (0.0014) [2024-06-15 17:03:20,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 43144.5, 300 sec: 44209.0). Total num frames: 862715904. Throughput: 0: 9967.0. Samples: 215722496. Policy #0 lag: (min: 15.0, avg: 127.6, max: 335.0) [2024-06-15 17:03:20,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 17:03:23,804][1652475] Updated weights for policy 0, policy_version 421296 (0.0098) [2024-06-15 17:03:25,710][1652475] Updated weights for policy 0, policy_version 421344 (0.0042) [2024-06-15 17:03:25,738][1648984] Fps is (10 sec: 42598.2, 60 sec: 39867.7, 300 sec: 43653.6). Total num frames: 862912512. Throughput: 0: 10626.8. Samples: 215797248. Policy #0 lag: (min: 15.0, avg: 127.6, max: 335.0) [2024-06-15 17:03:25,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 17:03:27,238][1652475] Updated weights for policy 0, policy_version 421392 (0.0014) [2024-06-15 17:03:29,381][1652475] Updated weights for policy 0, policy_version 421478 (0.0013) [2024-06-15 17:03:30,738][1648984] Fps is (10 sec: 52429.1, 60 sec: 43690.7, 300 sec: 44431.2). Total num frames: 863240192. Throughput: 0: 10171.7. Samples: 215854592. Policy #0 lag: (min: 15.0, avg: 127.6, max: 335.0) [2024-06-15 17:03:30,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 17:03:35,186][1652475] Updated weights for policy 0, policy_version 421536 (0.0144) [2024-06-15 17:03:35,738][1648984] Fps is (10 sec: 42598.7, 60 sec: 42052.5, 300 sec: 44098.0). Total num frames: 863338496. Throughput: 0: 10376.5. Samples: 215897088. 
Policy #0 lag: (min: 15.0, avg: 127.6, max: 335.0) [2024-06-15 17:03:35,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 17:03:35,849][1652475] Updated weights for policy 0, policy_version 421568 (0.0046) [2024-06-15 17:03:37,939][1652475] Updated weights for policy 0, policy_version 421632 (0.0015) [2024-06-15 17:03:40,313][1652475] Updated weights for policy 0, policy_version 421712 (0.0013) [2024-06-15 17:03:40,738][1648984] Fps is (10 sec: 45874.1, 60 sec: 42598.2, 300 sec: 44209.0). Total num frames: 863698944. Throughput: 0: 10740.6. Samples: 215957504. Policy #0 lag: (min: 15.0, avg: 127.6, max: 335.0) [2024-06-15 17:03:40,739][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 17:03:45,738][1648984] Fps is (10 sec: 42598.5, 60 sec: 43144.6, 300 sec: 44209.0). Total num frames: 863764480. Throughput: 0: 10808.9. Samples: 216030208. Policy #0 lag: (min: 15.0, avg: 127.6, max: 335.0) [2024-06-15 17:03:45,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 17:03:48,230][1652475] Updated weights for policy 0, policy_version 421819 (0.0019) [2024-06-15 17:03:49,947][1652475] Updated weights for policy 0, policy_version 421872 (0.0013) [2024-06-15 17:03:50,738][1648984] Fps is (10 sec: 32769.0, 60 sec: 42598.4, 300 sec: 43764.8). Total num frames: 864026624. Throughput: 0: 10899.9. Samples: 216060416. Policy #0 lag: (min: 8.0, avg: 71.1, max: 264.0) [2024-06-15 17:03:50,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 17:03:51,890][1652475] Updated weights for policy 0, policy_version 421950 (0.0114) [2024-06-15 17:03:52,448][1651340] Signal inference workers to stop experience collection... (21650 times) [2024-06-15 17:03:52,499][1652475] InferenceWorker_p0-w0: stopping experience collection (21650 times) [2024-06-15 17:03:52,685][1651340] Signal inference workers to resume experience collection... 
(21650 times) [2024-06-15 17:03:52,686][1652475] InferenceWorker_p0-w0: resuming experience collection (21650 times) [2024-06-15 17:03:53,259][1652475] Updated weights for policy 0, policy_version 422009 (0.0013) [2024-06-15 17:03:55,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 43690.7, 300 sec: 44431.2). Total num frames: 864288768. Throughput: 0: 10774.8. Samples: 216121344. Policy #0 lag: (min: 8.0, avg: 71.1, max: 264.0) [2024-06-15 17:03:55,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 17:03:58,819][1652475] Updated weights for policy 0, policy_version 422049 (0.0012) [2024-06-15 17:04:00,059][1652475] Updated weights for policy 0, policy_version 422084 (0.0012) [2024-06-15 17:04:00,738][1648984] Fps is (10 sec: 45875.4, 60 sec: 44783.0, 300 sec: 44209.0). Total num frames: 864485376. Throughput: 0: 11059.2. Samples: 216197120. Policy #0 lag: (min: 8.0, avg: 71.1, max: 264.0) [2024-06-15 17:04:00,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 17:04:01,407][1652475] Updated weights for policy 0, policy_version 422144 (0.0011) [2024-06-15 17:04:03,507][1652475] Updated weights for policy 0, policy_version 422199 (0.0013) [2024-06-15 17:04:05,567][1652475] Updated weights for policy 0, policy_version 422272 (0.0013) [2024-06-15 17:04:05,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 43690.7, 300 sec: 44431.2). Total num frames: 864813056. Throughput: 0: 11241.3. Samples: 216228352. Policy #0 lag: (min: 8.0, avg: 71.1, max: 264.0) [2024-06-15 17:04:05,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 17:04:10,738][1648984] Fps is (10 sec: 42598.1, 60 sec: 45329.1, 300 sec: 44209.0). Total num frames: 864911360. Throughput: 0: 11150.2. Samples: 216299008. 
Policy #0 lag: (min: 8.0, avg: 71.1, max: 264.0) [2024-06-15 17:04:10,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 17:04:10,777][1652475] Updated weights for policy 0, policy_version 422336 (0.0013) [2024-06-15 17:04:11,970][1652475] Updated weights for policy 0, policy_version 422384 (0.0013) [2024-06-15 17:04:14,016][1652475] Updated weights for policy 0, policy_version 422417 (0.0013) [2024-06-15 17:04:15,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 45329.1, 300 sec: 43986.9). Total num frames: 865206272. Throughput: 0: 11229.9. Samples: 216359936. Policy #0 lag: (min: 8.0, avg: 71.1, max: 264.0) [2024-06-15 17:04:15,738][1648984] Avg episode reward: [(0, '-0.250')] [2024-06-15 17:04:17,418][1652475] Updated weights for policy 0, policy_version 422480 (0.0022) [2024-06-15 17:04:18,456][1652475] Updated weights for policy 0, policy_version 422528 (0.0013) [2024-06-15 17:04:20,738][1648984] Fps is (10 sec: 42597.2, 60 sec: 43690.5, 300 sec: 44431.1). Total num frames: 865337344. Throughput: 0: 11150.1. Samples: 216398848. Policy #0 lag: (min: 8.0, avg: 71.1, max: 264.0) [2024-06-15 17:04:20,739][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 17:04:22,713][1652475] Updated weights for policy 0, policy_version 422597 (0.0013) [2024-06-15 17:04:24,152][1652475] Updated weights for policy 0, policy_version 422661 (0.0013) [2024-06-15 17:04:25,738][1648984] Fps is (10 sec: 52428.1, 60 sec: 46967.5, 300 sec: 44542.3). Total num frames: 865730560. Throughput: 0: 11264.0. Samples: 216464384. Policy #0 lag: (min: 8.0, avg: 71.1, max: 264.0) [2024-06-15 17:04:25,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 17:04:29,534][1652475] Updated weights for policy 0, policy_version 422755 (0.0033) [2024-06-15 17:04:30,738][1648984] Fps is (10 sec: 52430.1, 60 sec: 43690.7, 300 sec: 44431.2). Total num frames: 865861632. Throughput: 0: 11036.4. Samples: 216526848. 
Policy #0 lag: (min: 8.0, avg: 71.1, max: 264.0) [2024-06-15 17:04:30,741][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 17:04:34,655][1652475] Updated weights for policy 0, policy_version 422788 (0.0021) [2024-06-15 17:04:35,738][1648984] Fps is (10 sec: 22937.7, 60 sec: 43690.6, 300 sec: 43876.1). Total num frames: 865959936. Throughput: 0: 11184.3. Samples: 216563712. Policy #0 lag: (min: 8.0, avg: 71.1, max: 264.0) [2024-06-15 17:04:35,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 17:04:36,962][1652475] Updated weights for policy 0, policy_version 422880 (0.0112) [2024-06-15 17:04:38,633][1651340] Signal inference workers to stop experience collection... (21700 times) [2024-06-15 17:04:38,691][1652475] InferenceWorker_p0-w0: stopping experience collection (21700 times) [2024-06-15 17:04:38,814][1651340] Signal inference workers to resume experience collection... (21700 times) [2024-06-15 17:04:38,815][1652475] InferenceWorker_p0-w0: resuming experience collection (21700 times) [2024-06-15 17:04:39,025][1652475] Updated weights for policy 0, policy_version 422971 (0.0014) [2024-06-15 17:04:40,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 43144.7, 300 sec: 44431.2). Total num frames: 866287616. Throughput: 0: 10922.7. Samples: 216612864. Policy #0 lag: (min: 8.0, avg: 71.1, max: 264.0) [2024-06-15 17:04:40,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 17:04:40,779][1652475] Updated weights for policy 0, policy_version 423008 (0.0013) [2024-06-15 17:04:45,738][1648984] Fps is (10 sec: 42598.8, 60 sec: 43690.7, 300 sec: 43986.9). Total num frames: 866385920. Throughput: 0: 10820.3. Samples: 216684032. 
Policy #0 lag: (min: 8.0, avg: 71.1, max: 264.0) [2024-06-15 17:04:45,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 17:04:47,174][1652475] Updated weights for policy 0, policy_version 423056 (0.0012) [2024-06-15 17:04:50,083][1652475] Updated weights for policy 0, policy_version 423122 (0.0124) [2024-06-15 17:04:50,738][1648984] Fps is (10 sec: 32767.9, 60 sec: 43144.5, 300 sec: 43875.8). Total num frames: 866615296. Throughput: 0: 10763.4. Samples: 216712704. Policy #0 lag: (min: 8.0, avg: 71.1, max: 264.0) [2024-06-15 17:04:50,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 17:04:52,375][1652475] Updated weights for policy 0, policy_version 423219 (0.0216) [2024-06-15 17:04:55,738][1648984] Fps is (10 sec: 52426.9, 60 sec: 43690.4, 300 sec: 44431.1). Total num frames: 866910208. Throughput: 0: 10331.0. Samples: 216763904. Policy #0 lag: (min: 8.0, avg: 71.1, max: 264.0) [2024-06-15 17:04:55,739][1648984] Avg episode reward: [(0, '-0.220')] [2024-06-15 17:04:55,748][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000423296_866910208.pth... [2024-06-15 17:04:55,808][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000418176_856424448.pth [2024-06-15 17:04:59,724][1652475] Updated weights for policy 0, policy_version 423314 (0.0029) [2024-06-15 17:05:00,738][1648984] Fps is (10 sec: 42598.9, 60 sec: 42598.4, 300 sec: 44209.0). Total num frames: 867041280. Throughput: 0: 10661.0. Samples: 216839680. Policy #0 lag: (min: 8.0, avg: 71.1, max: 264.0) [2024-06-15 17:05:00,738][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 17:05:02,919][1652475] Updated weights for policy 0, policy_version 423392 (0.0024) [2024-06-15 17:05:04,644][1652475] Updated weights for policy 0, policy_version 423458 (0.0017) [2024-06-15 17:05:05,738][1648984] Fps is (10 sec: 42599.9, 60 sec: 42052.3, 300 sec: 44098.0). Total num frames: 867336192. Throughput: 0: 10592.8. 
Samples: 216875520. Policy #0 lag: (min: 8.0, avg: 71.1, max: 264.0) [2024-06-15 17:05:05,738][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 17:05:06,437][1652475] Updated weights for policy 0, policy_version 423543 (0.0012) [2024-06-15 17:05:10,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 42052.3, 300 sec: 44098.0). Total num frames: 867434496. Throughput: 0: 10501.7. Samples: 216936960. Policy #0 lag: (min: 8.0, avg: 71.1, max: 264.0) [2024-06-15 17:05:10,738][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 17:05:12,091][1652475] Updated weights for policy 0, policy_version 423600 (0.0013) [2024-06-15 17:05:15,123][1652475] Updated weights for policy 0, policy_version 423650 (0.0012) [2024-06-15 17:05:15,738][1648984] Fps is (10 sec: 32767.8, 60 sec: 40960.0, 300 sec: 43542.6). Total num frames: 867663872. Throughput: 0: 10626.8. Samples: 217005056. Policy #0 lag: (min: 15.0, avg: 98.8, max: 271.0) [2024-06-15 17:05:15,738][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 17:05:16,917][1652475] Updated weights for policy 0, policy_version 423714 (0.0015) [2024-06-15 17:05:18,353][1652475] Updated weights for policy 0, policy_version 423777 (0.0010) [2024-06-15 17:05:20,746][1648984] Fps is (10 sec: 52383.9, 60 sec: 43684.6, 300 sec: 44429.9). Total num frames: 867958784. Throughput: 0: 10385.9. Samples: 217031168. Policy #0 lag: (min: 15.0, avg: 98.8, max: 271.0) [2024-06-15 17:05:20,747][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 17:05:23,835][1652475] Updated weights for policy 0, policy_version 423840 (0.0022) [2024-06-15 17:05:25,738][1648984] Fps is (10 sec: 42598.1, 60 sec: 39321.6, 300 sec: 43653.6). Total num frames: 868089856. Throughput: 0: 10922.7. Samples: 217104384. Policy #0 lag: (min: 15.0, avg: 98.8, max: 271.0) [2024-06-15 17:05:25,738][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 17:05:26,810][1651340] Signal inference workers to stop experience collection... 
(21750 times) [2024-06-15 17:05:26,865][1652475] InferenceWorker_p0-w0: stopping experience collection (21750 times) [2024-06-15 17:05:27,089][1651340] Signal inference workers to resume experience collection... (21750 times) [2024-06-15 17:05:27,090][1652475] InferenceWorker_p0-w0: resuming experience collection (21750 times) [2024-06-15 17:05:27,529][1652475] Updated weights for policy 0, policy_version 423920 (0.0015) [2024-06-15 17:05:29,214][1652475] Updated weights for policy 0, policy_version 423990 (0.0024) [2024-06-15 17:05:30,738][1648984] Fps is (10 sec: 49194.2, 60 sec: 43144.6, 300 sec: 44320.1). Total num frames: 868450304. Throughput: 0: 10535.8. Samples: 217158144. Policy #0 lag: (min: 15.0, avg: 98.8, max: 271.0) [2024-06-15 17:05:30,738][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 17:05:30,772][1652475] Updated weights for policy 0, policy_version 424063 (0.0147) [2024-06-15 17:05:35,738][1648984] Fps is (10 sec: 42598.8, 60 sec: 42598.4, 300 sec: 43653.7). Total num frames: 868515840. Throughput: 0: 10831.7. Samples: 217200128. Policy #0 lag: (min: 15.0, avg: 98.8, max: 271.0) [2024-06-15 17:05:35,738][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 17:05:36,481][1652475] Updated weights for policy 0, policy_version 424123 (0.0015) [2024-06-15 17:05:39,135][1652475] Updated weights for policy 0, policy_version 424162 (0.0030) [2024-06-15 17:05:40,738][1648984] Fps is (10 sec: 36044.8, 60 sec: 42052.3, 300 sec: 43764.7). Total num frames: 868810752. Throughput: 0: 11207.2. Samples: 217268224. Policy #0 lag: (min: 15.0, avg: 98.8, max: 271.0) [2024-06-15 17:05:40,738][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 17:05:40,882][1652475] Updated weights for policy 0, policy_version 424227 (0.0013) [2024-06-15 17:05:43,071][1652475] Updated weights for policy 0, policy_version 424312 (0.0021) [2024-06-15 17:05:45,743][1648984] Fps is (10 sec: 49125.0, 60 sec: 43686.6, 300 sec: 43875.0). Total num frames: 869007360. 
Throughput: 0: 10910.0. Samples: 217330688. Policy #0 lag: (min: 15.0, avg: 98.8, max: 271.0) [2024-06-15 17:05:45,744][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 17:05:48,416][1652475] Updated weights for policy 0, policy_version 424353 (0.0021) [2024-06-15 17:05:50,673][1652475] Updated weights for policy 0, policy_version 424406 (0.0014) [2024-06-15 17:05:50,738][1648984] Fps is (10 sec: 36043.6, 60 sec: 42598.2, 300 sec: 43209.3). Total num frames: 869171200. Throughput: 0: 10911.2. Samples: 217366528. Policy #0 lag: (min: 15.0, avg: 98.8, max: 271.0) [2024-06-15 17:05:50,739][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 17:05:52,366][1652475] Updated weights for policy 0, policy_version 424480 (0.0039) [2024-06-15 17:05:54,403][1652475] Updated weights for policy 0, policy_version 424550 (0.0025) [2024-06-15 17:05:55,738][1648984] Fps is (10 sec: 52457.7, 60 sec: 43690.9, 300 sec: 44210.3). Total num frames: 869531648. Throughput: 0: 10729.3. Samples: 217419776. Policy #0 lag: (min: 15.0, avg: 98.8, max: 271.0) [2024-06-15 17:05:55,738][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 17:06:00,086][1652475] Updated weights for policy 0, policy_version 424595 (0.0013) [2024-06-15 17:06:00,738][1648984] Fps is (10 sec: 45876.4, 60 sec: 43144.5, 300 sec: 42987.2). Total num frames: 869629952. Throughput: 0: 10979.5. Samples: 217499136. Policy #0 lag: (min: 15.0, avg: 98.8, max: 271.0) [2024-06-15 17:06:00,738][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 17:06:02,834][1652475] Updated weights for policy 0, policy_version 424672 (0.0123) [2024-06-15 17:06:04,893][1652475] Updated weights for policy 0, policy_version 424744 (0.0089) [2024-06-15 17:06:05,738][1648984] Fps is (10 sec: 39321.2, 60 sec: 43144.5, 300 sec: 43986.9). Total num frames: 869924864. Throughput: 0: 11004.4. Samples: 217526272. 
Policy #0 lag: (min: 15.0, avg: 98.8, max: 271.0) [2024-06-15 17:06:05,738][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 17:06:05,988][1651340] Signal inference workers to stop experience collection... (21800 times) [2024-06-15 17:06:06,022][1652475] InferenceWorker_p0-w0: stopping experience collection (21800 times) [2024-06-15 17:06:06,312][1651340] Signal inference workers to resume experience collection... (21800 times) [2024-06-15 17:06:06,313][1652475] InferenceWorker_p0-w0: resuming experience collection (21800 times) [2024-06-15 17:06:06,963][1652475] Updated weights for policy 0, policy_version 424829 (0.0105) [2024-06-15 17:06:10,738][1648984] Fps is (10 sec: 42598.7, 60 sec: 43690.7, 300 sec: 42987.2). Total num frames: 870055936. Throughput: 0: 10683.8. Samples: 217585152. Policy #0 lag: (min: 15.0, avg: 98.8, max: 271.0) [2024-06-15 17:06:10,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 17:06:15,413][1652475] Updated weights for policy 0, policy_version 424912 (0.0013) [2024-06-15 17:06:15,738][1648984] Fps is (10 sec: 32767.3, 60 sec: 43144.4, 300 sec: 43320.4). Total num frames: 870252544. Throughput: 0: 10968.1. Samples: 217651712. Policy #0 lag: (min: 15.0, avg: 98.8, max: 271.0) [2024-06-15 17:06:15,739][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 17:06:16,596][1652475] Updated weights for policy 0, policy_version 424963 (0.0020) [2024-06-15 17:06:18,809][1652475] Updated weights for policy 0, policy_version 425040 (0.0219) [2024-06-15 17:06:20,017][1652475] Updated weights for policy 0, policy_version 425084 (0.0012) [2024-06-15 17:06:20,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 43697.0, 300 sec: 43209.3). Total num frames: 870580224. Throughput: 0: 10535.8. Samples: 217674240. Policy #0 lag: (min: 15.0, avg: 98.8, max: 271.0) [2024-06-15 17:06:20,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 17:06:25,738][1648984] Fps is (10 sec: 42599.6, 60 sec: 43144.6, 300 sec: 42987.2). 
Total num frames: 870678528. Throughput: 0: 10456.2. Samples: 217738752. Policy #0 lag: (min: 15.0, avg: 98.8, max: 271.0) [2024-06-15 17:06:25,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 17:06:25,937][1652475] Updated weights for policy 0, policy_version 425144 (0.0139) [2024-06-15 17:06:28,997][1652475] Updated weights for policy 0, policy_version 425208 (0.0142) [2024-06-15 17:06:29,854][1652475] Updated weights for policy 0, policy_version 425235 (0.0013) [2024-06-15 17:06:30,738][1648984] Fps is (10 sec: 36044.8, 60 sec: 41506.2, 300 sec: 42987.2). Total num frames: 870940672. Throughput: 0: 10377.8. Samples: 217797632. Policy #0 lag: (min: 15.0, avg: 98.8, max: 271.0) [2024-06-15 17:06:30,738][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 17:06:30,781][1652475] Updated weights for policy 0, policy_version 425280 (0.0012) [2024-06-15 17:06:32,975][1652475] Updated weights for policy 0, policy_version 425339 (0.0123) [2024-06-15 17:06:35,738][1648984] Fps is (10 sec: 42597.6, 60 sec: 43144.4, 300 sec: 42653.9). Total num frames: 871104512. Throughput: 0: 10353.8. Samples: 217832448. Policy #0 lag: (min: 15.0, avg: 98.8, max: 271.0) [2024-06-15 17:06:35,739][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 17:06:37,685][1652475] Updated weights for policy 0, policy_version 425403 (0.0015) [2024-06-15 17:06:40,738][1648984] Fps is (10 sec: 39321.3, 60 sec: 42052.2, 300 sec: 42765.0). Total num frames: 871333888. Throughput: 0: 10649.6. Samples: 217899008. 
Policy #0 lag: (min: 15.0, avg: 117.6, max: 271.0) [2024-06-15 17:06:40,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 17:06:40,831][1652475] Updated weights for policy 0, policy_version 425468 (0.0014) [2024-06-15 17:06:42,355][1652475] Updated weights for policy 0, policy_version 425520 (0.0012) [2024-06-15 17:06:44,861][1652475] Updated weights for policy 0, policy_version 425568 (0.0012) [2024-06-15 17:06:45,738][1648984] Fps is (10 sec: 52429.4, 60 sec: 43694.6, 300 sec: 42987.1). Total num frames: 871628800. Throughput: 0: 10194.5. Samples: 217957888. Policy #0 lag: (min: 15.0, avg: 117.6, max: 271.0) [2024-06-15 17:06:45,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 17:06:50,738][1648984] Fps is (10 sec: 36044.8, 60 sec: 42052.5, 300 sec: 42765.0). Total num frames: 871694336. Throughput: 0: 10353.8. Samples: 217992192. Policy #0 lag: (min: 15.0, avg: 117.6, max: 271.0) [2024-06-15 17:06:50,739][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 17:06:51,234][1652475] Updated weights for policy 0, policy_version 425648 (0.0014) [2024-06-15 17:06:52,858][1652475] Updated weights for policy 0, policy_version 425696 (0.0012) [2024-06-15 17:06:54,952][1652475] Updated weights for policy 0, policy_version 425776 (0.0095) [2024-06-15 17:06:55,738][1648984] Fps is (10 sec: 39322.0, 60 sec: 41506.1, 300 sec: 42654.0). Total num frames: 872022016. Throughput: 0: 10365.2. Samples: 218051584. Policy #0 lag: (min: 15.0, avg: 117.6, max: 271.0) [2024-06-15 17:06:55,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 17:06:55,742][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000425792_872022016.pth... [2024-06-15 17:06:55,869][1651340] Signal inference workers to stop experience collection... 
(21850 times) [2024-06-15 17:06:55,894][1652475] InferenceWorker_p0-w0: stopping experience collection (21850 times) [2024-06-15 17:06:55,896][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000420736_861667328.pth [2024-06-15 17:06:56,134][1651340] Signal inference workers to resume experience collection... (21850 times) [2024-06-15 17:06:56,135][1652475] InferenceWorker_p0-w0: resuming experience collection (21850 times) [2024-06-15 17:06:56,810][1652475] Updated weights for policy 0, policy_version 425840 (0.0012) [2024-06-15 17:07:00,738][1648984] Fps is (10 sec: 45875.0, 60 sec: 42052.3, 300 sec: 42654.0). Total num frames: 872153088. Throughput: 0: 10228.7. Samples: 218112000. Policy #0 lag: (min: 15.0, avg: 117.6, max: 271.0) [2024-06-15 17:07:00,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 17:07:03,519][1652475] Updated weights for policy 0, policy_version 425904 (0.0059) [2024-06-15 17:07:05,738][1648984] Fps is (10 sec: 26214.5, 60 sec: 39321.7, 300 sec: 41987.5). Total num frames: 872284160. Throughput: 0: 10444.8. Samples: 218144256. Policy #0 lag: (min: 15.0, avg: 117.6, max: 271.0) [2024-06-15 17:07:05,738][1648984] Avg episode reward: [(0, '-0.530')] [2024-06-15 17:07:07,383][1652475] Updated weights for policy 0, policy_version 425985 (0.0015) [2024-06-15 17:07:10,002][1652475] Updated weights for policy 0, policy_version 426080 (0.0016) [2024-06-15 17:07:10,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 43690.6, 300 sec: 42653.9). Total num frames: 872677376. Throughput: 0: 10240.0. Samples: 218199552. Policy #0 lag: (min: 15.0, avg: 117.6, max: 271.0) [2024-06-15 17:07:10,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 17:07:15,627][1652475] Updated weights for policy 0, policy_version 426160 (0.0017) [2024-06-15 17:07:15,738][1648984] Fps is (10 sec: 49151.7, 60 sec: 42052.5, 300 sec: 42876.1). Total num frames: 872775680. Throughput: 0: 10331.0. Samples: 218262528. 
Policy #0 lag: (min: 15.0, avg: 117.6, max: 271.0) [2024-06-15 17:07:15,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 17:07:20,583][1652475] Updated weights for policy 0, policy_version 426256 (0.0077) [2024-06-15 17:07:20,738][1648984] Fps is (10 sec: 29491.3, 60 sec: 39867.7, 300 sec: 42209.6). Total num frames: 872972288. Throughput: 0: 10353.8. Samples: 218298368. Policy #0 lag: (min: 15.0, avg: 117.6, max: 271.0) [2024-06-15 17:07:20,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 17:07:22,997][1652475] Updated weights for policy 0, policy_version 426320 (0.0095) [2024-06-15 17:07:25,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 42052.3, 300 sec: 42653.9). Total num frames: 873201664. Throughput: 0: 10126.2. Samples: 218354688. Policy #0 lag: (min: 15.0, avg: 117.6, max: 271.0) [2024-06-15 17:07:25,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 17:07:27,027][1652475] Updated weights for policy 0, policy_version 426370 (0.0014) [2024-06-15 17:07:30,739][1648984] Fps is (10 sec: 36040.3, 60 sec: 39866.9, 300 sec: 42431.6). Total num frames: 873332736. Throughput: 0: 10410.4. Samples: 218426368. Policy #0 lag: (min: 15.0, avg: 117.6, max: 271.0) [2024-06-15 17:07:30,740][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 17:07:31,039][1652475] Updated weights for policy 0, policy_version 426435 (0.0029) [2024-06-15 17:07:32,911][1652475] Updated weights for policy 0, policy_version 426515 (0.0013) [2024-06-15 17:07:33,845][1652475] Updated weights for policy 0, policy_version 426556 (0.0014) [2024-06-15 17:07:35,338][1652475] Updated weights for policy 0, policy_version 426615 (0.0013) [2024-06-15 17:07:35,738][1648984] Fps is (10 sec: 52428.4, 60 sec: 43690.7, 300 sec: 42653.9). Total num frames: 873725952. Throughput: 0: 10285.5. Samples: 218455040. 
Policy #0 lag: (min: 15.0, avg: 117.6, max: 271.0) [2024-06-15 17:07:35,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 17:07:40,069][1652475] Updated weights for policy 0, policy_version 426682 (0.0022) [2024-06-15 17:07:40,738][1648984] Fps is (10 sec: 52435.4, 60 sec: 42052.3, 300 sec: 42987.2). Total num frames: 873857024. Throughput: 0: 10570.0. Samples: 218527232. Policy #0 lag: (min: 15.0, avg: 117.6, max: 271.0) [2024-06-15 17:07:40,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 17:07:43,840][1652475] Updated weights for policy 0, policy_version 426736 (0.0040) [2024-06-15 17:07:45,174][1651340] Signal inference workers to stop experience collection... (21900 times) [2024-06-15 17:07:45,221][1652475] InferenceWorker_p0-w0: stopping experience collection (21900 times) [2024-06-15 17:07:45,265][1652475] Updated weights for policy 0, policy_version 426790 (0.0011) [2024-06-15 17:07:45,459][1651340] Signal inference workers to resume experience collection... (21900 times) [2024-06-15 17:07:45,460][1652475] InferenceWorker_p0-w0: resuming experience collection (21900 times) [2024-06-15 17:07:45,738][1648984] Fps is (10 sec: 36045.1, 60 sec: 40960.0, 300 sec: 42765.0). Total num frames: 874086400. Throughput: 0: 10570.0. Samples: 218587648. Policy #0 lag: (min: 15.0, avg: 117.6, max: 271.0) [2024-06-15 17:07:45,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 17:07:47,001][1652475] Updated weights for policy 0, policy_version 426835 (0.0012) [2024-06-15 17:07:50,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 42598.4, 300 sec: 42653.9). Total num frames: 874250240. Throughput: 0: 10569.9. Samples: 218619904. 
Policy #0 lag: (min: 15.0, avg: 117.6, max: 271.0) [2024-06-15 17:07:50,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 17:07:51,367][1652475] Updated weights for policy 0, policy_version 426899 (0.0014) [2024-06-15 17:07:54,515][1652475] Updated weights for policy 0, policy_version 426946 (0.0019) [2024-06-15 17:07:55,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 40960.0, 300 sec: 42987.2). Total num frames: 874479616. Throughput: 0: 11002.3. Samples: 218694656. Policy #0 lag: (min: 15.0, avg: 117.6, max: 271.0) [2024-06-15 17:07:55,740][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 17:07:56,766][1652475] Updated weights for policy 0, policy_version 427040 (0.0099) [2024-06-15 17:07:57,594][1652475] Updated weights for policy 0, policy_version 427072 (0.0013) [2024-06-15 17:07:59,173][1652475] Updated weights for policy 0, policy_version 427130 (0.0098) [2024-06-15 17:08:00,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 43690.7, 300 sec: 42653.9). Total num frames: 874774528. Throughput: 0: 10945.4. Samples: 218755072. Policy #0 lag: (min: 25.0, avg: 170.7, max: 281.0) [2024-06-15 17:08:00,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 17:08:04,268][1652475] Updated weights for policy 0, policy_version 427196 (0.0035) [2024-06-15 17:08:05,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 43690.6, 300 sec: 43098.3). Total num frames: 874905600. Throughput: 0: 10990.9. Samples: 218792960. Policy #0 lag: (min: 25.0, avg: 170.7, max: 281.0) [2024-06-15 17:08:05,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 17:08:07,972][1652475] Updated weights for policy 0, policy_version 427266 (0.0013) [2024-06-15 17:08:09,278][1652475] Updated weights for policy 0, policy_version 427328 (0.0013) [2024-06-15 17:08:10,738][1648984] Fps is (10 sec: 49152.1, 60 sec: 43144.6, 300 sec: 43320.4). Total num frames: 875266048. Throughput: 0: 11070.6. Samples: 218852864. 
Policy #0 lag: (min: 25.0, avg: 170.7, max: 281.0) [2024-06-15 17:08:10,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 17:08:10,906][1652475] Updated weights for policy 0, policy_version 427389 (0.0013) [2024-06-15 17:08:15,738][1648984] Fps is (10 sec: 45875.2, 60 sec: 43144.5, 300 sec: 42876.1). Total num frames: 875364352. Throughput: 0: 11014.0. Samples: 218921984. Policy #0 lag: (min: 25.0, avg: 170.7, max: 281.0) [2024-06-15 17:08:15,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 17:08:16,285][1652475] Updated weights for policy 0, policy_version 427448 (0.0025) [2024-06-15 17:08:19,369][1652475] Updated weights for policy 0, policy_version 427491 (0.0015) [2024-06-15 17:08:20,738][1648984] Fps is (10 sec: 36044.8, 60 sec: 44236.8, 300 sec: 43098.3). Total num frames: 875626496. Throughput: 0: 11104.7. Samples: 218954752. Policy #0 lag: (min: 25.0, avg: 170.7, max: 281.0) [2024-06-15 17:08:20,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 17:08:21,010][1652475] Updated weights for policy 0, policy_version 427568 (0.0011) [2024-06-15 17:08:22,619][1652475] Updated weights for policy 0, policy_version 427621 (0.0102) [2024-06-15 17:08:25,738][1648984] Fps is (10 sec: 45875.2, 60 sec: 43690.7, 300 sec: 42653.9). Total num frames: 875823104. Throughput: 0: 10945.4. Samples: 219019776. Policy #0 lag: (min: 25.0, avg: 170.7, max: 281.0) [2024-06-15 17:08:25,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 17:08:27,967][1652475] Updated weights for policy 0, policy_version 427696 (0.0098) [2024-06-15 17:08:30,738][1648984] Fps is (10 sec: 32767.8, 60 sec: 43691.5, 300 sec: 42765.0). Total num frames: 875954176. Throughput: 0: 11059.2. Samples: 219085312. 
Policy #0 lag: (min: 25.0, avg: 170.7, max: 281.0) [2024-06-15 17:08:30,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 17:08:30,915][1652475] Updated weights for policy 0, policy_version 427729 (0.0014) [2024-06-15 17:08:31,982][1651340] Signal inference workers to stop experience collection... (21950 times) [2024-06-15 17:08:32,048][1652475] InferenceWorker_p0-w0: stopping experience collection (21950 times) [2024-06-15 17:08:32,178][1651340] Signal inference workers to resume experience collection... (21950 times) [2024-06-15 17:08:32,179][1652475] InferenceWorker_p0-w0: resuming experience collection (21950 times) [2024-06-15 17:08:32,406][1652475] Updated weights for policy 0, policy_version 427799 (0.0031) [2024-06-15 17:08:33,965][1652475] Updated weights for policy 0, policy_version 427856 (0.0033) [2024-06-15 17:08:34,954][1652475] Updated weights for policy 0, policy_version 427902 (0.0013) [2024-06-15 17:08:35,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 43690.7, 300 sec: 42876.1). Total num frames: 876347392. Throughput: 0: 11025.1. Samples: 219116032. Policy #0 lag: (min: 25.0, avg: 170.7, max: 281.0) [2024-06-15 17:08:35,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 17:08:40,738][1648984] Fps is (10 sec: 52429.1, 60 sec: 43690.7, 300 sec: 43098.2). Total num frames: 876478464. Throughput: 0: 10797.5. Samples: 219180544. Policy #0 lag: (min: 25.0, avg: 170.7, max: 281.0) [2024-06-15 17:08:40,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 17:08:42,752][1652475] Updated weights for policy 0, policy_version 428007 (0.0019) [2024-06-15 17:08:45,248][1652475] Updated weights for policy 0, policy_version 428052 (0.0012) [2024-06-15 17:08:45,738][1648984] Fps is (10 sec: 32768.1, 60 sec: 43144.6, 300 sec: 42876.1). Total num frames: 876675072. Throughput: 0: 11082.0. Samples: 219253760. 
Policy #0 lag: (min: 25.0, avg: 170.7, max: 281.0) [2024-06-15 17:08:45,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 17:08:47,106][1652475] Updated weights for policy 0, policy_version 428128 (0.0017) [2024-06-15 17:08:50,771][1648984] Fps is (10 sec: 45724.6, 60 sec: 44758.4, 300 sec: 42871.3). Total num frames: 876937216. Throughput: 0: 10766.9. Samples: 219277824. Policy #0 lag: (min: 25.0, avg: 170.7, max: 281.0) [2024-06-15 17:08:50,771][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 17:08:50,829][1652475] Updated weights for policy 0, policy_version 428198 (0.0017) [2024-06-15 17:08:54,516][1652475] Updated weights for policy 0, policy_version 428260 (0.0014) [2024-06-15 17:08:55,738][1648984] Fps is (10 sec: 45874.6, 60 sec: 44236.7, 300 sec: 42876.1). Total num frames: 877133824. Throughput: 0: 10774.7. Samples: 219337728. Policy #0 lag: (min: 25.0, avg: 170.7, max: 281.0) [2024-06-15 17:08:55,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 17:08:55,745][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000428288_877133824.pth... [2024-06-15 17:08:55,806][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000423296_866910208.pth [2024-06-15 17:08:57,348][1652475] Updated weights for policy 0, policy_version 428320 (0.0013) [2024-06-15 17:08:59,682][1652475] Updated weights for policy 0, policy_version 428368 (0.0012) [2024-06-15 17:09:00,738][1648984] Fps is (10 sec: 42739.1, 60 sec: 43144.5, 300 sec: 42542.9). Total num frames: 877363200. Throughput: 0: 10729.2. Samples: 219404800. Policy #0 lag: (min: 25.0, avg: 170.7, max: 281.0) [2024-06-15 17:09:00,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 17:09:04,621][1652475] Updated weights for policy 0, policy_version 428464 (0.0016) [2024-06-15 17:09:05,738][1648984] Fps is (10 sec: 45874.4, 60 sec: 44782.8, 300 sec: 42987.1). Total num frames: 877592576. Throughput: 0: 10797.5. 
Samples: 219440640. Policy #0 lag: (min: 25.0, avg: 170.7, max: 281.0) [2024-06-15 17:09:05,739][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 17:09:05,848][1652475] Updated weights for policy 0, policy_version 428516 (0.0108) [2024-06-15 17:09:09,507][1652475] Updated weights for policy 0, policy_version 428601 (0.0013) [2024-06-15 17:09:10,738][1648984] Fps is (10 sec: 42598.0, 60 sec: 42052.2, 300 sec: 42653.9). Total num frames: 877789184. Throughput: 0: 10581.3. Samples: 219495936. Policy #0 lag: (min: 25.0, avg: 170.7, max: 281.0) [2024-06-15 17:09:10,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 17:09:12,303][1652475] Updated weights for policy 0, policy_version 428640 (0.0012) [2024-06-15 17:09:15,738][1648984] Fps is (10 sec: 32768.4, 60 sec: 42598.3, 300 sec: 42654.0). Total num frames: 877920256. Throughput: 0: 10604.1. Samples: 219562496. Policy #0 lag: (min: 25.0, avg: 170.7, max: 281.0) [2024-06-15 17:09:15,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 17:09:16,981][1652475] Updated weights for policy 0, policy_version 428688 (0.0015) [2024-06-15 17:09:19,289][1652475] Updated weights for policy 0, policy_version 428768 (0.0122) [2024-06-15 17:09:20,738][1648984] Fps is (10 sec: 39322.0, 60 sec: 42598.4, 300 sec: 42209.6). Total num frames: 878182400. Throughput: 0: 10592.7. Samples: 219592704. Policy #0 lag: (min: 25.0, avg: 170.7, max: 281.0) [2024-06-15 17:09:20,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 17:09:22,312][1652475] Updated weights for policy 0, policy_version 428816 (0.0043) [2024-06-15 17:09:22,417][1651340] Signal inference workers to stop experience collection... (22000 times) [2024-06-15 17:09:22,475][1652475] InferenceWorker_p0-w0: stopping experience collection (22000 times) [2024-06-15 17:09:22,632][1651340] Signal inference workers to resume experience collection... 
(22000 times) [2024-06-15 17:09:22,633][1652475] InferenceWorker_p0-w0: resuming experience collection (22000 times) [2024-06-15 17:09:24,326][1652475] Updated weights for policy 0, policy_version 428884 (0.0016) [2024-06-15 17:09:25,738][1648984] Fps is (10 sec: 52429.1, 60 sec: 43690.6, 300 sec: 42653.9). Total num frames: 878444544. Throughput: 0: 10467.5. Samples: 219651584. Policy #0 lag: (min: 15.0, avg: 132.1, max: 271.0) [2024-06-15 17:09:25,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 17:09:28,857][1652475] Updated weights for policy 0, policy_version 428960 (0.0027) [2024-06-15 17:09:30,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 43690.7, 300 sec: 42765.0). Total num frames: 878575616. Throughput: 0: 10240.0. Samples: 219714560. Policy #0 lag: (min: 15.0, avg: 132.1, max: 271.0) [2024-06-15 17:09:30,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 17:09:32,354][1652475] Updated weights for policy 0, policy_version 429040 (0.0174) [2024-06-15 17:09:35,485][1652475] Updated weights for policy 0, policy_version 429058 (0.0026) [2024-06-15 17:09:35,737][1648984] Fps is (10 sec: 29491.7, 60 sec: 39867.8, 300 sec: 42209.6). Total num frames: 878739456. Throughput: 0: 10384.2. Samples: 219744768. Policy #0 lag: (min: 15.0, avg: 132.1, max: 271.0) [2024-06-15 17:09:35,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 17:09:37,832][1652475] Updated weights for policy 0, policy_version 429181 (0.0012) [2024-06-15 17:09:40,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 42052.3, 300 sec: 42765.0). Total num frames: 879001600. Throughput: 0: 10524.5. Samples: 219811328. 
Policy #0 lag: (min: 15.0, avg: 132.1, max: 271.0) [2024-06-15 17:09:40,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 17:09:41,394][1652475] Updated weights for policy 0, policy_version 429240 (0.0013) [2024-06-15 17:09:44,271][1652475] Updated weights for policy 0, policy_version 429281 (0.0120) [2024-06-15 17:09:45,738][1648984] Fps is (10 sec: 49151.5, 60 sec: 42598.3, 300 sec: 42765.0). Total num frames: 879230976. Throughput: 0: 10467.6. Samples: 219875840. Policy #0 lag: (min: 15.0, avg: 132.1, max: 271.0) [2024-06-15 17:09:45,741][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 17:09:48,180][1652475] Updated weights for policy 0, policy_version 429376 (0.0013) [2024-06-15 17:09:50,738][1648984] Fps is (10 sec: 49151.9, 60 sec: 42621.8, 300 sec: 42654.0). Total num frames: 879493120. Throughput: 0: 10513.1. Samples: 219913728. Policy #0 lag: (min: 15.0, avg: 132.1, max: 271.0) [2024-06-15 17:09:50,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 17:09:52,703][1652475] Updated weights for policy 0, policy_version 429478 (0.0015) [2024-06-15 17:09:55,087][1652475] Updated weights for policy 0, policy_version 429507 (0.0015) [2024-06-15 17:09:55,738][1648984] Fps is (10 sec: 45874.7, 60 sec: 42598.4, 300 sec: 42876.1). Total num frames: 879689728. Throughput: 0: 10808.9. Samples: 219982336. Policy #0 lag: (min: 15.0, avg: 132.1, max: 271.0) [2024-06-15 17:09:55,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 17:09:56,100][1652475] Updated weights for policy 0, policy_version 429554 (0.0013) [2024-06-15 17:09:59,253][1652475] Updated weights for policy 0, policy_version 429616 (0.0013) [2024-06-15 17:10:00,647][1652475] Updated weights for policy 0, policy_version 429688 (0.0042) [2024-06-15 17:10:00,738][1648984] Fps is (10 sec: 49151.9, 60 sec: 43690.6, 300 sec: 42876.1). Total num frames: 879984640. Throughput: 0: 10899.9. Samples: 220052992. 
Policy #0 lag: (min: 15.0, avg: 132.1, max: 271.0) [2024-06-15 17:10:00,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 17:10:03,780][1652475] Updated weights for policy 0, policy_version 429754 (0.0014) [2024-06-15 17:10:05,738][1648984] Fps is (10 sec: 45875.7, 60 sec: 42598.6, 300 sec: 43098.3). Total num frames: 880148480. Throughput: 0: 11002.3. Samples: 220087808. Policy #0 lag: (min: 15.0, avg: 132.1, max: 271.0) [2024-06-15 17:10:05,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 17:10:07,515][1652475] Updated weights for policy 0, policy_version 429814 (0.0014) [2024-06-15 17:10:09,615][1651340] Signal inference workers to stop experience collection... (22050 times) [2024-06-15 17:10:09,674][1652475] InferenceWorker_p0-w0: stopping experience collection (22050 times) [2024-06-15 17:10:09,798][1651340] Signal inference workers to resume experience collection... (22050 times) [2024-06-15 17:10:09,800][1652475] InferenceWorker_p0-w0: resuming experience collection (22050 times) [2024-06-15 17:10:09,905][1652475] Updated weights for policy 0, policy_version 429840 (0.0117) [2024-06-15 17:10:10,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 43144.6, 300 sec: 43098.2). Total num frames: 880377856. Throughput: 0: 11320.9. Samples: 220161024. Policy #0 lag: (min: 15.0, avg: 132.1, max: 271.0) [2024-06-15 17:10:10,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 17:10:11,071][1652475] Updated weights for policy 0, policy_version 429904 (0.0020) [2024-06-15 17:10:15,004][1652475] Updated weights for policy 0, policy_version 429984 (0.0015) [2024-06-15 17:10:15,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 45875.3, 300 sec: 43099.5). Total num frames: 880672768. Throughput: 0: 11355.0. Samples: 220225536. 
Policy #0 lag: (min: 15.0, avg: 132.1, max: 271.0) [2024-06-15 17:10:15,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 17:10:18,136][1652475] Updated weights for policy 0, policy_version 430034 (0.0013) [2024-06-15 17:10:20,738][1648984] Fps is (10 sec: 42598.2, 60 sec: 43690.6, 300 sec: 43098.3). Total num frames: 880803840. Throughput: 0: 11514.3. Samples: 220262912. Policy #0 lag: (min: 15.0, avg: 132.1, max: 271.0) [2024-06-15 17:10:20,739][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 17:10:22,084][1652475] Updated weights for policy 0, policy_version 430117 (0.0012) [2024-06-15 17:10:23,566][1652475] Updated weights for policy 0, policy_version 430192 (0.0012) [2024-06-15 17:10:25,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 43690.6, 300 sec: 42765.0). Total num frames: 881065984. Throughput: 0: 11446.0. Samples: 220326400. Policy #0 lag: (min: 15.0, avg: 132.1, max: 271.0) [2024-06-15 17:10:25,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 17:10:26,548][1652475] Updated weights for policy 0, policy_version 430243 (0.0015) [2024-06-15 17:10:30,396][1652475] Updated weights for policy 0, policy_version 430329 (0.0017) [2024-06-15 17:10:30,738][1648984] Fps is (10 sec: 52429.2, 60 sec: 45875.2, 300 sec: 43431.5). Total num frames: 881328128. Throughput: 0: 11525.7. Samples: 220394496. Policy #0 lag: (min: 15.0, avg: 132.1, max: 271.0) [2024-06-15 17:10:30,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 17:10:35,257][1652475] Updated weights for policy 0, policy_version 430432 (0.0014) [2024-06-15 17:10:35,738][1648984] Fps is (10 sec: 49152.6, 60 sec: 46967.4, 300 sec: 43209.3). Total num frames: 881557504. Throughput: 0: 11548.5. Samples: 220433408. 
Policy #0 lag: (min: 15.0, avg: 132.1, max: 271.0) [2024-06-15 17:10:35,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 17:10:38,491][1652475] Updated weights for policy 0, policy_version 430496 (0.0013) [2024-06-15 17:10:40,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 45329.1, 300 sec: 43099.1). Total num frames: 881721344. Throughput: 0: 11241.3. Samples: 220488192. Policy #0 lag: (min: 15.0, avg: 132.1, max: 271.0) [2024-06-15 17:10:40,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 17:10:41,898][1652475] Updated weights for policy 0, policy_version 430560 (0.0031) [2024-06-15 17:10:42,545][1652475] Updated weights for policy 0, policy_version 430592 (0.0012) [2024-06-15 17:10:45,738][1648984] Fps is (10 sec: 32768.1, 60 sec: 44236.8, 300 sec: 43098.3). Total num frames: 881885184. Throughput: 0: 11355.0. Samples: 220563968. Policy #0 lag: (min: 15.0, avg: 96.6, max: 271.0) [2024-06-15 17:10:45,740][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 17:10:46,340][1652475] Updated weights for policy 0, policy_version 430646 (0.0013) [2024-06-15 17:10:47,800][1652475] Updated weights for policy 0, policy_version 430714 (0.0021) [2024-06-15 17:10:50,424][1652475] Updated weights for policy 0, policy_version 430776 (0.0011) [2024-06-15 17:10:50,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 45875.2, 300 sec: 43098.2). Total num frames: 882245632. Throughput: 0: 11241.2. Samples: 220593664. Policy #0 lag: (min: 15.0, avg: 96.6, max: 271.0) [2024-06-15 17:10:50,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 17:10:52,903][1651340] Signal inference workers to stop experience collection... (22100 times) [2024-06-15 17:10:52,940][1652475] InferenceWorker_p0-w0: stopping experience collection (22100 times) [2024-06-15 17:10:53,122][1651340] Signal inference workers to resume experience collection... 
(22100 times) [2024-06-15 17:10:53,123][1652475] InferenceWorker_p0-w0: resuming experience collection (22100 times) [2024-06-15 17:10:53,125][1652475] Updated weights for policy 0, policy_version 430816 (0.0012) [2024-06-15 17:10:55,738][1648984] Fps is (10 sec: 49151.8, 60 sec: 44783.0, 300 sec: 43209.3). Total num frames: 882376704. Throughput: 0: 11047.8. Samples: 220658176. Policy #0 lag: (min: 15.0, avg: 96.6, max: 271.0) [2024-06-15 17:10:55,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 17:10:55,782][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000430848_882376704.pth... [2024-06-15 17:10:55,839][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000425792_872022016.pth [2024-06-15 17:10:55,845][1651340] Saving a milestone train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/milestones/checkpoint_000430848_882376704.pth [2024-06-15 17:10:59,120][1652475] Updated weights for policy 0, policy_version 430896 (0.0095) [2024-06-15 17:11:00,738][1648984] Fps is (10 sec: 36044.9, 60 sec: 43690.7, 300 sec: 42987.2). Total num frames: 882606080. Throughput: 0: 10968.2. Samples: 220719104. Policy #0 lag: (min: 15.0, avg: 96.6, max: 271.0) [2024-06-15 17:11:00,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 17:11:01,013][1652475] Updated weights for policy 0, policy_version 430970 (0.0013) [2024-06-15 17:11:02,515][1652475] Updated weights for policy 0, policy_version 431024 (0.0013) [2024-06-15 17:11:05,738][1648984] Fps is (10 sec: 45875.2, 60 sec: 44782.9, 300 sec: 43320.4). Total num frames: 882835456. Throughput: 0: 10763.4. Samples: 220747264. Policy #0 lag: (min: 15.0, avg: 96.6, max: 271.0) [2024-06-15 17:11:05,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 17:11:06,109][1652475] Updated weights for policy 0, policy_version 431099 (0.0013) [2024-06-15 17:11:10,741][1648984] Fps is (10 sec: 32757.4, 60 sec: 42596.2, 300 sec: 42986.7). 
Total num frames: 882933760. Throughput: 0: 10865.0. Samples: 220815360. Policy #0 lag: (min: 15.0, avg: 96.6, max: 271.0) [2024-06-15 17:11:10,742][1648984] Avg episode reward: [(0, '-0.220')] [2024-06-15 17:11:11,455][1652475] Updated weights for policy 0, policy_version 431163 (0.0014) [2024-06-15 17:11:15,388][1652475] Updated weights for policy 0, policy_version 431248 (0.0012) [2024-06-15 17:11:15,738][1648984] Fps is (10 sec: 36044.9, 60 sec: 42052.3, 300 sec: 42765.0). Total num frames: 883195904. Throughput: 0: 10615.5. Samples: 220872192. Policy #0 lag: (min: 15.0, avg: 96.6, max: 271.0) [2024-06-15 17:11:15,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 17:11:17,057][1652475] Updated weights for policy 0, policy_version 431313 (0.0024) [2024-06-15 17:11:20,738][1648984] Fps is (10 sec: 49167.5, 60 sec: 43690.7, 300 sec: 43209.3). Total num frames: 883425280. Throughput: 0: 10308.3. Samples: 220897280. Policy #0 lag: (min: 15.0, avg: 96.6, max: 271.0) [2024-06-15 17:11:20,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 17:11:22,919][1652475] Updated weights for policy 0, policy_version 431376 (0.0013) [2024-06-15 17:11:25,738][1648984] Fps is (10 sec: 36043.3, 60 sec: 41505.9, 300 sec: 42764.9). Total num frames: 883556352. Throughput: 0: 10501.6. Samples: 220960768. Policy #0 lag: (min: 15.0, avg: 96.6, max: 271.0) [2024-06-15 17:11:25,739][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 17:11:27,590][1652475] Updated weights for policy 0, policy_version 431456 (0.0098) [2024-06-15 17:11:29,901][1652475] Updated weights for policy 0, policy_version 431553 (0.0015) [2024-06-15 17:11:30,738][1648984] Fps is (10 sec: 45875.7, 60 sec: 42598.5, 300 sec: 43320.4). Total num frames: 883884032. Throughput: 0: 10194.5. Samples: 221022720. 
Policy #0 lag: (min: 15.0, avg: 96.6, max: 271.0) [2024-06-15 17:11:30,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 17:11:30,926][1652475] Updated weights for policy 0, policy_version 431614 (0.0026) [2024-06-15 17:11:35,738][1648984] Fps is (10 sec: 45877.2, 60 sec: 40960.0, 300 sec: 42987.2). Total num frames: 884015104. Throughput: 0: 10285.5. Samples: 221056512. Policy #0 lag: (min: 15.0, avg: 96.6, max: 271.0) [2024-06-15 17:11:35,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 17:11:36,354][1652475] Updated weights for policy 0, policy_version 431669 (0.0014) [2024-06-15 17:11:39,819][1652475] Updated weights for policy 0, policy_version 431740 (0.0012) [2024-06-15 17:11:40,502][1651340] Signal inference workers to stop experience collection... (22150 times) [2024-06-15 17:11:40,594][1652475] InferenceWorker_p0-w0: stopping experience collection (22150 times) [2024-06-15 17:11:40,719][1651340] Signal inference workers to resume experience collection... (22150 times) [2024-06-15 17:11:40,734][1652475] InferenceWorker_p0-w0: resuming experience collection (22150 times) [2024-06-15 17:11:40,738][1648984] Fps is (10 sec: 36044.6, 60 sec: 42052.3, 300 sec: 42765.0). Total num frames: 884244480. Throughput: 0: 10285.5. Samples: 221121024. Policy #0 lag: (min: 15.0, avg: 96.6, max: 271.0) [2024-06-15 17:11:40,740][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 17:11:42,916][1652475] Updated weights for policy 0, policy_version 431811 (0.0121) [2024-06-15 17:11:45,738][1648984] Fps is (10 sec: 45875.1, 60 sec: 43144.5, 300 sec: 43320.4). Total num frames: 884473856. Throughput: 0: 10274.1. Samples: 221181440. Policy #0 lag: (min: 15.0, avg: 96.6, max: 271.0) [2024-06-15 17:11:45,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 17:11:48,618][1652475] Updated weights for policy 0, policy_version 431875 (0.0015) [2024-06-15 17:11:50,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 39867.7, 300 sec: 42765.0). 
Total num frames: 884637696. Throughput: 0: 10752.0. Samples: 221231104. Policy #0 lag: (min: 15.0, avg: 96.6, max: 271.0) [2024-06-15 17:11:50,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 17:11:50,801][1652475] Updated weights for policy 0, policy_version 431955 (0.0016) [2024-06-15 17:11:52,137][1652475] Updated weights for policy 0, policy_version 432019 (0.0012) [2024-06-15 17:11:55,213][1652475] Updated weights for policy 0, policy_version 432097 (0.0014) [2024-06-15 17:11:55,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 884998144. Throughput: 0: 10434.2. Samples: 221284864. Policy #0 lag: (min: 15.0, avg: 96.6, max: 271.0) [2024-06-15 17:11:55,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 17:12:00,476][1652475] Updated weights for policy 0, policy_version 432129 (0.0013) [2024-06-15 17:12:00,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 40413.9, 300 sec: 43209.3). Total num frames: 885030912. Throughput: 0: 11025.1. Samples: 221368320. Policy #0 lag: (min: 15.0, avg: 96.6, max: 271.0) [2024-06-15 17:12:00,740][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 17:12:02,167][1652475] Updated weights for policy 0, policy_version 432200 (0.0016) [2024-06-15 17:12:03,351][1652475] Updated weights for policy 0, policy_version 432260 (0.0014) [2024-06-15 17:12:05,689][1652475] Updated weights for policy 0, policy_version 432336 (0.0015) [2024-06-15 17:12:05,746][1648984] Fps is (10 sec: 42563.4, 60 sec: 43138.6, 300 sec: 43208.1). Total num frames: 885424128. Throughput: 0: 11045.8. Samples: 221394432. Policy #0 lag: (min: 15.0, avg: 96.6, max: 271.0) [2024-06-15 17:12:05,746][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 17:12:10,738][1648984] Fps is (10 sec: 49151.6, 60 sec: 43146.8, 300 sec: 43209.3). Total num frames: 885522432. Throughput: 0: 11264.1. Samples: 221467648. 
Policy #0 lag: (min: 15.0, avg: 96.6, max: 271.0) [2024-06-15 17:12:10,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 17:12:12,502][1652475] Updated weights for policy 0, policy_version 432448 (0.0245) [2024-06-15 17:12:13,985][1652475] Updated weights for policy 0, policy_version 432514 (0.0094) [2024-06-15 17:12:14,935][1652475] Updated weights for policy 0, policy_version 432571 (0.0012) [2024-06-15 17:12:15,738][1648984] Fps is (10 sec: 49191.6, 60 sec: 45328.9, 300 sec: 43875.8). Total num frames: 885915648. Throughput: 0: 11309.5. Samples: 221531648. Policy #0 lag: (min: 4.0, avg: 64.9, max: 260.0) [2024-06-15 17:12:15,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 17:12:18,218][1652475] Updated weights for policy 0, policy_version 432624 (0.0013) [2024-06-15 17:12:20,738][1648984] Fps is (10 sec: 52429.2, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 886046720. Throughput: 0: 11320.9. Samples: 221565952. Policy #0 lag: (min: 4.0, avg: 64.9, max: 260.0) [2024-06-15 17:12:20,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 17:12:23,809][1651340] Signal inference workers to stop experience collection... (22200 times) [2024-06-15 17:12:23,849][1652475] InferenceWorker_p0-w0: stopping experience collection (22200 times) [2024-06-15 17:12:23,866][1652475] Updated weights for policy 0, policy_version 432691 (0.0014) [2024-06-15 17:12:24,040][1651340] Signal inference workers to resume experience collection... (22200 times) [2024-06-15 17:12:24,041][1652475] InferenceWorker_p0-w0: resuming experience collection (22200 times) [2024-06-15 17:12:25,297][1652475] Updated weights for policy 0, policy_version 432761 (0.0015) [2024-06-15 17:12:25,738][1648984] Fps is (10 sec: 42599.1, 60 sec: 46421.7, 300 sec: 44098.1). Total num frames: 886341632. Throughput: 0: 11559.8. Samples: 221641216. 
Policy #0 lag: (min: 4.0, avg: 64.9, max: 260.0) [2024-06-15 17:12:25,743][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 17:12:26,503][1652475] Updated weights for policy 0, policy_version 432829 (0.0013) [2024-06-15 17:12:30,347][1652475] Updated weights for policy 0, policy_version 432893 (0.0023) [2024-06-15 17:12:30,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 44782.9, 300 sec: 43542.6). Total num frames: 886571008. Throughput: 0: 11582.6. Samples: 221702656. Policy #0 lag: (min: 4.0, avg: 64.9, max: 260.0) [2024-06-15 17:12:30,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 17:12:35,402][1652475] Updated weights for policy 0, policy_version 432960 (0.0013) [2024-06-15 17:12:35,738][1648984] Fps is (10 sec: 39320.3, 60 sec: 45328.8, 300 sec: 43653.6). Total num frames: 886734848. Throughput: 0: 11457.3. Samples: 221746688. Policy #0 lag: (min: 4.0, avg: 64.9, max: 260.0) [2024-06-15 17:12:35,739][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 17:12:37,589][1652475] Updated weights for policy 0, policy_version 433072 (0.0243) [2024-06-15 17:12:40,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 45329.1, 300 sec: 43653.6). Total num frames: 886964224. Throughput: 0: 11548.4. Samples: 221804544. Policy #0 lag: (min: 4.0, avg: 64.9, max: 260.0) [2024-06-15 17:12:40,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 17:12:42,184][1652475] Updated weights for policy 0, policy_version 433148 (0.0014) [2024-06-15 17:12:45,738][1648984] Fps is (10 sec: 36045.7, 60 sec: 43690.6, 300 sec: 43542.6). Total num frames: 887095296. Throughput: 0: 11252.6. Samples: 221874688. 
Policy #0 lag: (min: 4.0, avg: 64.9, max: 260.0) [2024-06-15 17:12:45,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 17:12:47,382][1652475] Updated weights for policy 0, policy_version 433216 (0.0014) [2024-06-15 17:12:48,892][1652475] Updated weights for policy 0, policy_version 433296 (0.0112) [2024-06-15 17:12:50,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 47513.6, 300 sec: 44097.9). Total num frames: 887488512. Throughput: 0: 11391.2. Samples: 221906944. Policy #0 lag: (min: 4.0, avg: 64.9, max: 260.0) [2024-06-15 17:12:50,740][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 17:12:52,719][1652475] Updated weights for policy 0, policy_version 433345 (0.0012) [2024-06-15 17:12:54,134][1652475] Updated weights for policy 0, policy_version 433402 (0.0013) [2024-06-15 17:12:55,738][1648984] Fps is (10 sec: 52428.5, 60 sec: 43690.6, 300 sec: 43542.5). Total num frames: 887619584. Throughput: 0: 11252.6. Samples: 221974016. Policy #0 lag: (min: 4.0, avg: 64.9, max: 260.0) [2024-06-15 17:12:55,739][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 17:12:55,746][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000433408_887619584.pth... [2024-06-15 17:12:55,791][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000428288_877133824.pth [2024-06-15 17:12:58,561][1652475] Updated weights for policy 0, policy_version 433441 (0.0015) [2024-06-15 17:12:59,875][1652475] Updated weights for policy 0, policy_version 433520 (0.0014) [2024-06-15 17:13:00,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 48059.7, 300 sec: 44098.0). Total num frames: 887914496. Throughput: 0: 11377.8. Samples: 222043648. 
Policy #0 lag: (min: 4.0, avg: 64.9, max: 260.0) [2024-06-15 17:13:00,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 17:13:01,416][1652475] Updated weights for policy 0, policy_version 433598 (0.0011) [2024-06-15 17:13:03,513][1651340] Signal inference workers to stop experience collection... (22250 times) [2024-06-15 17:13:03,566][1652475] InferenceWorker_p0-w0: stopping experience collection (22250 times) [2024-06-15 17:13:03,896][1651340] Signal inference workers to resume experience collection... (22250 times) [2024-06-15 17:13:03,896][1652475] InferenceWorker_p0-w0: resuming experience collection (22250 times) [2024-06-15 17:13:04,847][1652475] Updated weights for policy 0, policy_version 433652 (0.0013) [2024-06-15 17:13:05,738][1648984] Fps is (10 sec: 52429.7, 60 sec: 45335.3, 300 sec: 43653.6). Total num frames: 888143872. Throughput: 0: 11423.3. Samples: 222080000. Policy #0 lag: (min: 4.0, avg: 64.9, max: 260.0) [2024-06-15 17:13:05,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 17:13:09,738][1652475] Updated weights for policy 0, policy_version 433712 (0.0013) [2024-06-15 17:13:10,738][1648984] Fps is (10 sec: 42598.6, 60 sec: 46967.5, 300 sec: 43986.9). Total num frames: 888340480. Throughput: 0: 11468.8. Samples: 222157312. Policy #0 lag: (min: 4.0, avg: 64.9, max: 260.0) [2024-06-15 17:13:10,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 17:13:11,240][1652475] Updated weights for policy 0, policy_version 433782 (0.0121) [2024-06-15 17:13:12,718][1652475] Updated weights for policy 0, policy_version 433856 (0.0092) [2024-06-15 17:13:15,738][1648984] Fps is (10 sec: 45875.0, 60 sec: 44783.1, 300 sec: 43986.9). Total num frames: 888602624. Throughput: 0: 11457.4. Samples: 222218240. 
Policy #0 lag: (min: 4.0, avg: 64.9, max: 260.0) [2024-06-15 17:13:15,740][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 17:13:16,183][1652475] Updated weights for policy 0, policy_version 433920 (0.0013) [2024-06-15 17:13:20,742][1648984] Fps is (10 sec: 45855.8, 60 sec: 45872.0, 300 sec: 43986.3). Total num frames: 888799232. Throughput: 0: 11331.3. Samples: 222256640. Policy #0 lag: (min: 4.0, avg: 64.9, max: 260.0) [2024-06-15 17:13:20,742][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 17:13:21,828][1652475] Updated weights for policy 0, policy_version 434042 (0.0016) [2024-06-15 17:13:25,449][1652475] Updated weights for policy 0, policy_version 434103 (0.0012) [2024-06-15 17:13:25,738][1648984] Fps is (10 sec: 45875.2, 60 sec: 45329.1, 300 sec: 44431.2). Total num frames: 889061376. Throughput: 0: 11400.5. Samples: 222317568. Policy #0 lag: (min: 4.0, avg: 64.9, max: 260.0) [2024-06-15 17:13:25,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 17:13:27,580][1652475] Updated weights for policy 0, policy_version 434174 (0.0125) [2024-06-15 17:13:30,738][1648984] Fps is (10 sec: 39338.1, 60 sec: 43690.6, 300 sec: 43542.6). Total num frames: 889192448. Throughput: 0: 11264.0. Samples: 222381568. Policy #0 lag: (min: 4.0, avg: 64.9, max: 260.0) [2024-06-15 17:13:30,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 17:13:32,097][1652475] Updated weights for policy 0, policy_version 434228 (0.0015) [2024-06-15 17:13:35,651][1652475] Updated weights for policy 0, policy_version 434273 (0.0032) [2024-06-15 17:13:35,738][1648984] Fps is (10 sec: 32767.9, 60 sec: 44237.0, 300 sec: 43764.7). Total num frames: 889389056. Throughput: 0: 11218.5. Samples: 222411776. 
Policy #0 lag: (min: 15.0, avg: 118.6, max: 287.0) [2024-06-15 17:13:35,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 17:13:37,416][1652475] Updated weights for policy 0, policy_version 434320 (0.0013) [2024-06-15 17:13:39,517][1652475] Updated weights for policy 0, policy_version 434400 (0.0095) [2024-06-15 17:13:40,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 45875.2, 300 sec: 44209.0). Total num frames: 889716736. Throughput: 0: 11104.7. Samples: 222473728. Policy #0 lag: (min: 15.0, avg: 118.6, max: 287.0) [2024-06-15 17:13:40,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 17:13:45,738][1648984] Fps is (10 sec: 45874.9, 60 sec: 45875.2, 300 sec: 43769.6). Total num frames: 889847808. Throughput: 0: 10888.5. Samples: 222533632. Policy #0 lag: (min: 15.0, avg: 118.6, max: 287.0) [2024-06-15 17:13:45,739][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 17:13:47,693][1652475] Updated weights for policy 0, policy_version 434497 (0.0019) [2024-06-15 17:13:48,898][1652475] Updated weights for policy 0, policy_version 434547 (0.0046) [2024-06-15 17:13:49,421][1651340] Signal inference workers to stop experience collection... (22300 times) [2024-06-15 17:13:49,475][1652475] InferenceWorker_p0-w0: stopping experience collection (22300 times) [2024-06-15 17:13:49,695][1651340] Signal inference workers to resume experience collection... (22300 times) [2024-06-15 17:13:49,695][1652475] InferenceWorker_p0-w0: resuming experience collection (22300 times) [2024-06-15 17:13:50,718][1652475] Updated weights for policy 0, policy_version 434614 (0.0092) [2024-06-15 17:13:50,738][1648984] Fps is (10 sec: 36044.7, 60 sec: 43144.5, 300 sec: 43875.8). Total num frames: 890077184. Throughput: 0: 10899.9. Samples: 222570496. 
Policy #0 lag: (min: 15.0, avg: 118.6, max: 287.0) [2024-06-15 17:13:50,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 17:13:53,176][1652475] Updated weights for policy 0, policy_version 434677 (0.0123) [2024-06-15 17:13:55,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 43690.7, 300 sec: 43653.6). Total num frames: 890241024. Throughput: 0: 10274.1. Samples: 222619648. Policy #0 lag: (min: 15.0, avg: 118.6, max: 287.0) [2024-06-15 17:13:55,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 17:13:57,317][1652475] Updated weights for policy 0, policy_version 434720 (0.0015) [2024-06-15 17:13:59,325][1652475] Updated weights for policy 0, policy_version 434756 (0.0014) [2024-06-15 17:14:00,533][1652475] Updated weights for policy 0, policy_version 434805 (0.0011) [2024-06-15 17:14:00,738][1648984] Fps is (10 sec: 42598.7, 60 sec: 43144.6, 300 sec: 43764.8). Total num frames: 890503168. Throughput: 0: 10513.1. Samples: 222691328. Policy #0 lag: (min: 15.0, avg: 118.6, max: 287.0) [2024-06-15 17:14:00,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 17:14:02,134][1652475] Updated weights for policy 0, policy_version 434871 (0.0011) [2024-06-15 17:14:05,519][1652475] Updated weights for policy 0, policy_version 434928 (0.0041) [2024-06-15 17:14:05,738][1648984] Fps is (10 sec: 52429.6, 60 sec: 43690.6, 300 sec: 43986.9). Total num frames: 890765312. Throughput: 0: 10332.0. Samples: 222721536. Policy #0 lag: (min: 15.0, avg: 118.6, max: 287.0) [2024-06-15 17:14:05,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 17:14:10,738][1648984] Fps is (10 sec: 32767.8, 60 sec: 41506.1, 300 sec: 43764.7). Total num frames: 890830848. Throughput: 0: 10570.0. Samples: 222793216. 
Policy #0 lag: (min: 15.0, avg: 118.6, max: 287.0) [2024-06-15 17:14:10,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 17:14:10,942][1652475] Updated weights for policy 0, policy_version 434994 (0.0014) [2024-06-15 17:14:12,085][1652475] Updated weights for policy 0, policy_version 435040 (0.0100) [2024-06-15 17:14:13,444][1652475] Updated weights for policy 0, policy_version 435088 (0.0085) [2024-06-15 17:14:15,738][1648984] Fps is (10 sec: 39321.1, 60 sec: 42598.3, 300 sec: 43986.9). Total num frames: 891158528. Throughput: 0: 10353.8. Samples: 222847488. Policy #0 lag: (min: 15.0, avg: 118.6, max: 287.0) [2024-06-15 17:14:15,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 17:14:16,996][1652475] Updated weights for policy 0, policy_version 435152 (0.0015) [2024-06-15 17:14:20,738][1648984] Fps is (10 sec: 45874.8, 60 sec: 41509.0, 300 sec: 43542.6). Total num frames: 891289600. Throughput: 0: 10478.9. Samples: 222883328. Policy #0 lag: (min: 15.0, avg: 118.6, max: 287.0) [2024-06-15 17:14:20,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 17:14:22,176][1652475] Updated weights for policy 0, policy_version 435232 (0.0016) [2024-06-15 17:14:24,449][1652475] Updated weights for policy 0, policy_version 435312 (0.0124) [2024-06-15 17:14:25,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 41506.1, 300 sec: 43986.9). Total num frames: 891551744. Throughput: 0: 10569.9. Samples: 222949376. Policy #0 lag: (min: 15.0, avg: 118.6, max: 287.0) [2024-06-15 17:14:25,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 17:14:26,921][1652475] Updated weights for policy 0, policy_version 435384 (0.0032) [2024-06-15 17:14:29,615][1652475] Updated weights for policy 0, policy_version 435448 (0.0017) [2024-06-15 17:14:30,738][1648984] Fps is (10 sec: 52429.4, 60 sec: 43690.7, 300 sec: 44320.1). Total num frames: 891813888. Throughput: 0: 10638.3. Samples: 223012352. 
Policy #0 lag: (min: 15.0, avg: 118.6, max: 287.0) [2024-06-15 17:14:30,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 17:14:34,862][1652475] Updated weights for policy 0, policy_version 435504 (0.0019) [2024-06-15 17:14:35,738][1648984] Fps is (10 sec: 42598.6, 60 sec: 43144.6, 300 sec: 43986.9). Total num frames: 891977728. Throughput: 0: 10706.5. Samples: 223052288. Policy #0 lag: (min: 15.0, avg: 118.6, max: 287.0) [2024-06-15 17:14:35,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 17:14:35,895][1651340] Signal inference workers to stop experience collection... (22350 times) [2024-06-15 17:14:35,943][1652475] InferenceWorker_p0-w0: stopping experience collection (22350 times) [2024-06-15 17:14:36,210][1651340] Signal inference workers to resume experience collection... (22350 times) [2024-06-15 17:14:36,211][1652475] InferenceWorker_p0-w0: resuming experience collection (22350 times) [2024-06-15 17:14:38,066][1652475] Updated weights for policy 0, policy_version 435585 (0.0015) [2024-06-15 17:14:39,355][1652475] Updated weights for policy 0, policy_version 435646 (0.0012) [2024-06-15 17:14:40,738][1648984] Fps is (10 sec: 39321.3, 60 sec: 41506.1, 300 sec: 43986.9). Total num frames: 892207104. Throughput: 0: 10831.7. Samples: 223107072. Policy #0 lag: (min: 15.0, avg: 118.6, max: 287.0) [2024-06-15 17:14:40,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 17:14:45,670][1652475] Updated weights for policy 0, policy_version 435714 (0.0013) [2024-06-15 17:14:45,741][1648984] Fps is (10 sec: 36044.5, 60 sec: 41506.1, 300 sec: 43542.6). Total num frames: 892338176. Throughput: 0: 10956.8. Samples: 223184384. 
Policy #0 lag: (min: 15.0, avg: 118.6, max: 287.0) [2024-06-15 17:14:45,742][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 17:14:47,789][1652475] Updated weights for policy 0, policy_version 435794 (0.0017) [2024-06-15 17:14:49,917][1652475] Updated weights for policy 0, policy_version 435845 (0.0014) [2024-06-15 17:14:50,738][1648984] Fps is (10 sec: 45874.7, 60 sec: 43144.5, 300 sec: 43986.9). Total num frames: 892665856. Throughput: 0: 10808.8. Samples: 223207936. Policy #0 lag: (min: 15.0, avg: 118.6, max: 287.0) [2024-06-15 17:14:50,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 17:14:51,025][1652475] Updated weights for policy 0, policy_version 435892 (0.0015) [2024-06-15 17:14:53,225][1652475] Updated weights for policy 0, policy_version 435945 (0.0014) [2024-06-15 17:14:55,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 43690.7, 300 sec: 43653.6). Total num frames: 892862464. Throughput: 0: 10786.1. Samples: 223278592. Policy #0 lag: (min: 15.0, avg: 118.6, max: 287.0) [2024-06-15 17:14:55,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 17:14:55,742][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000435968_892862464.pth... [2024-06-15 17:14:55,803][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000430848_882376704.pth [2024-06-15 17:14:57,886][1652475] Updated weights for policy 0, policy_version 436000 (0.0022) [2024-06-15 17:15:00,509][1652475] Updated weights for policy 0, policy_version 436096 (0.0108) [2024-06-15 17:15:00,738][1648984] Fps is (10 sec: 45875.9, 60 sec: 43690.6, 300 sec: 43986.9). Total num frames: 893124608. Throughput: 0: 10843.0. Samples: 223335424. 
Policy #0 lag: (min: 31.0, avg: 112.2, max: 287.0) [2024-06-15 17:15:00,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 17:15:02,739][1652475] Updated weights for policy 0, policy_version 436157 (0.0013) [2024-06-15 17:15:05,671][1652475] Updated weights for policy 0, policy_version 436222 (0.0111) [2024-06-15 17:15:05,738][1648984] Fps is (10 sec: 52427.1, 60 sec: 43690.4, 300 sec: 44097.9). Total num frames: 893386752. Throughput: 0: 10888.5. Samples: 223373312. Policy #0 lag: (min: 31.0, avg: 112.2, max: 287.0) [2024-06-15 17:15:05,739][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 17:15:10,738][1648984] Fps is (10 sec: 36044.7, 60 sec: 44236.8, 300 sec: 43431.5). Total num frames: 893485056. Throughput: 0: 11036.5. Samples: 223446016. Policy #0 lag: (min: 31.0, avg: 112.2, max: 287.0) [2024-06-15 17:15:10,740][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 17:15:11,035][1652475] Updated weights for policy 0, policy_version 436288 (0.0013) [2024-06-15 17:15:12,660][1652475] Updated weights for policy 0, policy_version 436349 (0.0013) [2024-06-15 17:15:15,488][1652475] Updated weights for policy 0, policy_version 436413 (0.0015) [2024-06-15 17:15:15,738][1648984] Fps is (10 sec: 39323.0, 60 sec: 43690.7, 300 sec: 43986.9). Total num frames: 893779968. Throughput: 0: 10888.5. Samples: 223502336. Policy #0 lag: (min: 31.0, avg: 112.2, max: 287.0) [2024-06-15 17:15:15,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 17:15:17,814][1652475] Updated weights for policy 0, policy_version 436454 (0.0017) [2024-06-15 17:15:20,738][1648984] Fps is (10 sec: 42598.1, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 893911040. Throughput: 0: 10740.6. Samples: 223535616. 
Policy #0 lag: (min: 31.0, avg: 112.2, max: 287.0) [2024-06-15 17:15:20,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 17:15:22,114][1652475] Updated weights for policy 0, policy_version 436502 (0.0020) [2024-06-15 17:15:22,805][1651340] Signal inference workers to stop experience collection... (22400 times) [2024-06-15 17:15:22,875][1652475] InferenceWorker_p0-w0: stopping experience collection (22400 times) [2024-06-15 17:15:23,150][1651340] Signal inference workers to resume experience collection... (22400 times) [2024-06-15 17:15:23,154][1652475] InferenceWorker_p0-w0: resuming experience collection (22400 times) [2024-06-15 17:15:23,348][1652475] Updated weights for policy 0, policy_version 436546 (0.0011) [2024-06-15 17:15:25,738][1648984] Fps is (10 sec: 39321.3, 60 sec: 43690.6, 300 sec: 43542.6). Total num frames: 894173184. Throughput: 0: 10956.8. Samples: 223600128. Policy #0 lag: (min: 31.0, avg: 112.2, max: 287.0) [2024-06-15 17:15:25,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 17:15:26,722][1652475] Updated weights for policy 0, policy_version 436629 (0.0012) [2024-06-15 17:15:27,281][1652475] Updated weights for policy 0, policy_version 436665 (0.0012) [2024-06-15 17:15:29,699][1652475] Updated weights for policy 0, policy_version 436733 (0.0013) [2024-06-15 17:15:30,738][1648984] Fps is (10 sec: 52429.2, 60 sec: 43690.6, 300 sec: 43653.6). Total num frames: 894435328. Throughput: 0: 10797.5. Samples: 223670272. Policy #0 lag: (min: 31.0, avg: 112.2, max: 287.0) [2024-06-15 17:15:30,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 17:15:35,417][1652475] Updated weights for policy 0, policy_version 436807 (0.0036) [2024-06-15 17:15:35,738][1648984] Fps is (10 sec: 42598.8, 60 sec: 43690.7, 300 sec: 43653.6). Total num frames: 894599168. Throughput: 0: 11138.9. Samples: 223709184. 
Policy #0 lag: (min: 31.0, avg: 112.2, max: 287.0) [2024-06-15 17:15:35,738][1648984] Avg episode reward: [(0, '-0.120')] [2024-06-15 17:15:36,420][1651340] Saving new best policy, reward=-0.120! [2024-06-15 17:15:36,694][1652475] Updated weights for policy 0, policy_version 436856 (0.0137) [2024-06-15 17:15:39,619][1652475] Updated weights for policy 0, policy_version 436912 (0.0017) [2024-06-15 17:15:40,738][1648984] Fps is (10 sec: 42598.0, 60 sec: 44236.8, 300 sec: 43986.9). Total num frames: 894861312. Throughput: 0: 10877.1. Samples: 223768064. Policy #0 lag: (min: 31.0, avg: 112.2, max: 287.0) [2024-06-15 17:15:40,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 17:15:40,985][1652475] Updated weights for policy 0, policy_version 436961 (0.0016) [2024-06-15 17:15:45,487][1652475] Updated weights for policy 0, policy_version 437008 (0.0013) [2024-06-15 17:15:45,738][1648984] Fps is (10 sec: 39321.2, 60 sec: 44236.8, 300 sec: 43209.3). Total num frames: 894992384. Throughput: 0: 11195.7. Samples: 223839232. Policy #0 lag: (min: 31.0, avg: 112.2, max: 287.0) [2024-06-15 17:15:45,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 17:15:46,691][1652475] Updated weights for policy 0, policy_version 437056 (0.0026) [2024-06-15 17:15:50,738][1648984] Fps is (10 sec: 36044.8, 60 sec: 42598.4, 300 sec: 43542.6). Total num frames: 895221760. Throughput: 0: 10820.3. Samples: 223860224. Policy #0 lag: (min: 31.0, avg: 112.2, max: 287.0) [2024-06-15 17:15:50,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 17:15:51,725][1652475] Updated weights for policy 0, policy_version 437142 (0.0014) [2024-06-15 17:15:52,630][1652475] Updated weights for policy 0, policy_version 437184 (0.0013) [2024-06-15 17:15:54,479][1652475] Updated weights for policy 0, policy_version 437246 (0.0013) [2024-06-15 17:15:55,738][1648984] Fps is (10 sec: 49152.0, 60 sec: 43690.6, 300 sec: 43653.6). Total num frames: 895483904. Throughput: 0: 10592.7. 
Samples: 223922688. Policy #0 lag: (min: 31.0, avg: 112.2, max: 287.0) [2024-06-15 17:15:55,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 17:15:58,343][1652475] Updated weights for policy 0, policy_version 437308 (0.0012) [2024-06-15 17:16:00,738][1648984] Fps is (10 sec: 45875.3, 60 sec: 42598.3, 300 sec: 43542.6). Total num frames: 895680512. Throughput: 0: 10877.1. Samples: 223991808. Policy #0 lag: (min: 31.0, avg: 112.2, max: 287.0) [2024-06-15 17:16:00,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 17:16:01,133][1652475] Updated weights for policy 0, policy_version 437374 (0.0110) [2024-06-15 17:16:05,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 41506.3, 300 sec: 43876.3). Total num frames: 895877120. Throughput: 0: 10786.1. Samples: 224020992. Policy #0 lag: (min: 31.0, avg: 112.2, max: 287.0) [2024-06-15 17:16:05,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 17:16:05,741][1652475] Updated weights for policy 0, policy_version 437442 (0.0092) [2024-06-15 17:16:06,923][1652475] Updated weights for policy 0, policy_version 437495 (0.0012) [2024-06-15 17:16:09,811][1651340] Signal inference workers to stop experience collection... (22450 times) [2024-06-15 17:16:09,846][1652475] InferenceWorker_p0-w0: stopping experience collection (22450 times) [2024-06-15 17:16:09,867][1652475] Updated weights for policy 0, policy_version 437537 (0.0012) [2024-06-15 17:16:10,066][1651340] Signal inference workers to resume experience collection... (22450 times) [2024-06-15 17:16:10,070][1652475] InferenceWorker_p0-w0: resuming experience collection (22450 times) [2024-06-15 17:16:10,738][1648984] Fps is (10 sec: 45875.7, 60 sec: 44236.8, 300 sec: 43875.8). Total num frames: 896139264. Throughput: 0: 10854.4. Samples: 224088576. 
Policy #0 lag: (min: 31.0, avg: 112.2, max: 287.0) [2024-06-15 17:16:10,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 17:16:12,276][1652475] Updated weights for policy 0, policy_version 437584 (0.0014) [2024-06-15 17:16:15,738][1648984] Fps is (10 sec: 39320.3, 60 sec: 41505.9, 300 sec: 43542.5). Total num frames: 896270336. Throughput: 0: 10717.8. Samples: 224152576. Policy #0 lag: (min: 31.0, avg: 112.2, max: 287.0) [2024-06-15 17:16:15,739][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 17:16:16,070][1652475] Updated weights for policy 0, policy_version 437648 (0.0014) [2024-06-15 17:16:17,956][1652475] Updated weights for policy 0, policy_version 437712 (0.0014) [2024-06-15 17:16:20,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 43690.7, 300 sec: 43986.9). Total num frames: 896532480. Throughput: 0: 10467.6. Samples: 224180224. Policy #0 lag: (min: 31.0, avg: 112.2, max: 287.0) [2024-06-15 17:16:20,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 17:16:23,536][1652475] Updated weights for policy 0, policy_version 437778 (0.0018) [2024-06-15 17:16:24,647][1652475] Updated weights for policy 0, policy_version 437826 (0.0012) [2024-06-15 17:16:25,738][1648984] Fps is (10 sec: 49153.9, 60 sec: 43144.6, 300 sec: 43653.6). Total num frames: 896761856. Throughput: 0: 10661.0. Samples: 224247808. Policy #0 lag: (min: 15.0, avg: 113.4, max: 271.0) [2024-06-15 17:16:25,738][1648984] Avg episode reward: [(0, '-0.220')] [2024-06-15 17:16:25,902][1652475] Updated weights for policy 0, policy_version 437888 (0.0020) [2024-06-15 17:16:29,131][1652475] Updated weights for policy 0, policy_version 437940 (0.0018) [2024-06-15 17:16:30,340][1652475] Updated weights for policy 0, policy_version 437984 (0.0012) [2024-06-15 17:16:30,742][1648984] Fps is (10 sec: 45854.2, 60 sec: 42595.2, 300 sec: 43986.2). Total num frames: 896991232. Throughput: 0: 10432.4. Samples: 224308736. 
Policy #0 lag: (min: 15.0, avg: 113.4, max: 271.0) [2024-06-15 17:16:30,743][1648984] Avg episode reward: [(0, '-0.220')] [2024-06-15 17:16:31,236][1652475] Updated weights for policy 0, policy_version 438016 (0.0027) [2024-06-15 17:16:35,738][1648984] Fps is (10 sec: 36044.1, 60 sec: 42052.1, 300 sec: 43653.6). Total num frames: 897122304. Throughput: 0: 10774.7. Samples: 224345088. Policy #0 lag: (min: 15.0, avg: 113.4, max: 271.0) [2024-06-15 17:16:35,738][1648984] Avg episode reward: [(0, '-0.180')] [2024-06-15 17:16:36,022][1652475] Updated weights for policy 0, policy_version 438068 (0.0036) [2024-06-15 17:16:37,569][1652475] Updated weights for policy 0, policy_version 438136 (0.0017) [2024-06-15 17:16:40,738][1648984] Fps is (10 sec: 42617.6, 60 sec: 42598.4, 300 sec: 43875.8). Total num frames: 897417216. Throughput: 0: 10911.3. Samples: 224413696. Policy #0 lag: (min: 15.0, avg: 113.4, max: 271.0) [2024-06-15 17:16:40,738][1648984] Avg episode reward: [(0, '-0.180')] [2024-06-15 17:16:40,814][1652475] Updated weights for policy 0, policy_version 438199 (0.0013) [2024-06-15 17:16:42,455][1652475] Updated weights for policy 0, policy_version 438242 (0.0030) [2024-06-15 17:16:45,738][1648984] Fps is (10 sec: 45875.9, 60 sec: 43144.6, 300 sec: 43875.8). Total num frames: 897581056. Throughput: 0: 10752.0. Samples: 224475648. Policy #0 lag: (min: 15.0, avg: 113.4, max: 271.0) [2024-06-15 17:16:45,738][1648984] Avg episode reward: [(0, '-0.180')] [2024-06-15 17:16:47,146][1652475] Updated weights for policy 0, policy_version 438296 (0.0015) [2024-06-15 17:16:47,750][1652475] Updated weights for policy 0, policy_version 438335 (0.0012) [2024-06-15 17:16:49,572][1652475] Updated weights for policy 0, policy_version 438394 (0.0092) [2024-06-15 17:16:50,738][1648984] Fps is (10 sec: 42598.5, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 897843200. Throughput: 0: 10854.4. Samples: 224509440. 
Policy #0 lag: (min: 15.0, avg: 113.4, max: 271.0) [2024-06-15 17:16:50,738][1648984] Avg episode reward: [(0, '-0.180')] [2024-06-15 17:16:52,893][1652475] Updated weights for policy 0, policy_version 438459 (0.0013) [2024-06-15 17:16:54,842][1652475] Updated weights for policy 0, policy_version 438512 (0.0021) [2024-06-15 17:16:55,740][1648984] Fps is (10 sec: 52428.3, 60 sec: 43690.6, 300 sec: 44320.1). Total num frames: 898105344. Throughput: 0: 10774.7. Samples: 224573440. Policy #0 lag: (min: 15.0, avg: 113.4, max: 271.0) [2024-06-15 17:16:55,741][1648984] Avg episode reward: [(0, '-0.180')] [2024-06-15 17:16:55,770][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000438528_898105344.pth... [2024-06-15 17:16:55,853][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000433408_887619584.pth [2024-06-15 17:16:58,350][1651340] Signal inference workers to stop experience collection... (22500 times) [2024-06-15 17:16:58,360][1652475] Updated weights for policy 0, policy_version 438547 (0.0032) [2024-06-15 17:16:58,414][1652475] InferenceWorker_p0-w0: stopping experience collection (22500 times) [2024-06-15 17:16:58,544][1651340] Signal inference workers to resume experience collection... (22500 times) [2024-06-15 17:16:58,545][1652475] InferenceWorker_p0-w0: resuming experience collection (22500 times) [2024-06-15 17:17:00,155][1652475] Updated weights for policy 0, policy_version 438593 (0.0015) [2024-06-15 17:17:00,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 43144.6, 300 sec: 43543.8). Total num frames: 898269184. Throughput: 0: 10934.1. Samples: 224644608. 
Policy #0 lag: (min: 15.0, avg: 113.4, max: 271.0) [2024-06-15 17:17:00,738][1648984] Avg episode reward: [(0, '-0.180')] [2024-06-15 17:17:03,365][1652475] Updated weights for policy 0, policy_version 438657 (0.0014) [2024-06-15 17:17:04,841][1652475] Updated weights for policy 0, policy_version 438718 (0.0015) [2024-06-15 17:17:05,738][1648984] Fps is (10 sec: 39321.9, 60 sec: 43690.7, 300 sec: 43986.9). Total num frames: 898498560. Throughput: 0: 11059.2. Samples: 224677888. Policy #0 lag: (min: 15.0, avg: 113.4, max: 271.0) [2024-06-15 17:17:05,738][1648984] Avg episode reward: [(0, '-0.180')] [2024-06-15 17:17:06,560][1652475] Updated weights for policy 0, policy_version 438775 (0.0012) [2024-06-15 17:17:10,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 42598.4, 300 sec: 43320.4). Total num frames: 898695168. Throughput: 0: 10990.9. Samples: 224742400. Policy #0 lag: (min: 15.0, avg: 113.4, max: 271.0) [2024-06-15 17:17:10,738][1648984] Avg episode reward: [(0, '-0.180')] [2024-06-15 17:17:10,966][1652475] Updated weights for policy 0, policy_version 438826 (0.0012) [2024-06-15 17:17:12,403][1652475] Updated weights for policy 0, policy_version 438849 (0.0018) [2024-06-15 17:17:13,670][1652475] Updated weights for policy 0, policy_version 438911 (0.0014) [2024-06-15 17:17:15,738][1648984] Fps is (10 sec: 45875.2, 60 sec: 44783.2, 300 sec: 43764.7). Total num frames: 898957312. Throughput: 0: 11185.5. Samples: 224812032. Policy #0 lag: (min: 15.0, avg: 113.4, max: 271.0) [2024-06-15 17:17:15,738][1648984] Avg episode reward: [(0, '-0.180')] [2024-06-15 17:17:17,539][1652475] Updated weights for policy 0, policy_version 438992 (0.0232) [2024-06-15 17:17:20,738][1648984] Fps is (10 sec: 45874.7, 60 sec: 43690.6, 300 sec: 43431.5). Total num frames: 899153920. Throughput: 0: 10945.4. Samples: 224837632. 
Policy #0 lag: (min: 15.0, avg: 113.4, max: 271.0) [2024-06-15 17:17:20,739][1648984] Avg episode reward: [(0, '-0.180')] [2024-06-15 17:17:22,141][1652475] Updated weights for policy 0, policy_version 439043 (0.0012) [2024-06-15 17:17:23,574][1652475] Updated weights for policy 0, policy_version 439104 (0.0014) [2024-06-15 17:17:25,659][1652475] Updated weights for policy 0, policy_version 439164 (0.0013) [2024-06-15 17:17:25,738][1648984] Fps is (10 sec: 42598.7, 60 sec: 43690.7, 300 sec: 43431.5). Total num frames: 899383296. Throughput: 0: 10865.8. Samples: 224902656. Policy #0 lag: (min: 15.0, avg: 113.4, max: 271.0) [2024-06-15 17:17:25,738][1648984] Avg episode reward: [(0, '-0.180')] [2024-06-15 17:17:28,132][1652475] Updated weights for policy 0, policy_version 439223 (0.0066) [2024-06-15 17:17:30,594][1652475] Updated weights for policy 0, policy_version 439280 (0.0013) [2024-06-15 17:17:30,738][1648984] Fps is (10 sec: 49152.3, 60 sec: 44240.1, 300 sec: 43764.8). Total num frames: 899645440. Throughput: 0: 10990.9. Samples: 224970240. Policy #0 lag: (min: 15.0, avg: 113.4, max: 271.0) [2024-06-15 17:17:30,738][1648984] Avg episode reward: [(0, '-0.180')] [2024-06-15 17:17:34,364][1652475] Updated weights for policy 0, policy_version 439317 (0.0055) [2024-06-15 17:17:35,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 44783.1, 300 sec: 43542.6). Total num frames: 899809280. Throughput: 0: 11025.1. Samples: 225005568. Policy #0 lag: (min: 15.0, avg: 113.4, max: 271.0) [2024-06-15 17:17:35,738][1648984] Avg episode reward: [(0, '-0.180')] [2024-06-15 17:17:36,354][1652475] Updated weights for policy 0, policy_version 439376 (0.0013) [2024-06-15 17:17:37,172][1652475] Updated weights for policy 0, policy_version 439419 (0.0015) [2024-06-15 17:17:39,786][1652475] Updated weights for policy 0, policy_version 439483 (0.0013) [2024-06-15 17:17:40,738][1648984] Fps is (10 sec: 42598.5, 60 sec: 44236.8, 300 sec: 43986.9). Total num frames: 900071424. 
Throughput: 0: 11104.7. Samples: 225073152. Policy #0 lag: (min: 15.0, avg: 113.4, max: 271.0) [2024-06-15 17:17:40,738][1648984] Avg episode reward: [(0, '-0.180')] [2024-06-15 17:17:42,011][1652475] Updated weights for policy 0, policy_version 439551 (0.0013) [2024-06-15 17:17:45,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 44236.8, 300 sec: 43209.3). Total num frames: 900235264. Throughput: 0: 11070.6. Samples: 225142784. Policy #0 lag: (min: 15.0, avg: 113.4, max: 271.0) [2024-06-15 17:17:45,741][1648984] Avg episode reward: [(0, '-0.180')] [2024-06-15 17:17:46,912][1652475] Updated weights for policy 0, policy_version 439611 (0.0012) [2024-06-15 17:17:48,296][1651340] Signal inference workers to stop experience collection... (22550 times) [2024-06-15 17:17:48,345][1652475] InferenceWorker_p0-w0: stopping experience collection (22550 times) [2024-06-15 17:17:48,564][1651340] Signal inference workers to resume experience collection... (22550 times) [2024-06-15 17:17:48,565][1652475] InferenceWorker_p0-w0: resuming experience collection (22550 times) [2024-06-15 17:17:50,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 900464640. Throughput: 0: 11047.8. Samples: 225175040. Policy #0 lag: (min: 47.0, avg: 141.5, max: 303.0) [2024-06-15 17:17:50,738][1648984] Avg episode reward: [(0, '-0.200')] [2024-06-15 17:17:51,110][1652475] Updated weights for policy 0, policy_version 439696 (0.0110) [2024-06-15 17:17:53,334][1652475] Updated weights for policy 0, policy_version 439760 (0.0013) [2024-06-15 17:17:55,738][1648984] Fps is (10 sec: 49151.4, 60 sec: 43690.7, 300 sec: 43431.5). Total num frames: 900726784. Throughput: 0: 10854.4. Samples: 225230848. 
Policy #0 lag: (min: 47.0, avg: 141.5, max: 303.0) [2024-06-15 17:17:55,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 17:17:58,513][1652475] Updated weights for policy 0, policy_version 439840 (0.0014) [2024-06-15 17:18:00,331][1652475] Updated weights for policy 0, policy_version 439888 (0.0012) [2024-06-15 17:18:00,738][1648984] Fps is (10 sec: 45875.2, 60 sec: 44236.8, 300 sec: 43320.4). Total num frames: 900923392. Throughput: 0: 10831.7. Samples: 225299456. Policy #0 lag: (min: 47.0, avg: 141.5, max: 303.0) [2024-06-15 17:18:00,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 17:18:04,898][1652475] Updated weights for policy 0, policy_version 439974 (0.0015) [2024-06-15 17:18:05,738][1648984] Fps is (10 sec: 39322.1, 60 sec: 43690.7, 300 sec: 43320.4). Total num frames: 901120000. Throughput: 0: 11002.3. Samples: 225332736. Policy #0 lag: (min: 47.0, avg: 141.5, max: 303.0) [2024-06-15 17:18:05,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 17:18:07,003][1652475] Updated weights for policy 0, policy_version 440056 (0.0014) [2024-06-15 17:18:10,738][1648984] Fps is (10 sec: 36044.9, 60 sec: 43144.6, 300 sec: 42987.2). Total num frames: 901283840. Throughput: 0: 10911.3. Samples: 225393664. Policy #0 lag: (min: 47.0, avg: 141.5, max: 303.0) [2024-06-15 17:18:10,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 17:18:11,677][1652475] Updated weights for policy 0, policy_version 440128 (0.0016) [2024-06-15 17:18:12,948][1652475] Updated weights for policy 0, policy_version 440192 (0.0030) [2024-06-15 17:18:15,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 42598.4, 300 sec: 43098.9). Total num frames: 901513216. Throughput: 0: 10922.7. Samples: 225461760. 
Policy #0 lag: (min: 47.0, avg: 141.5, max: 303.0) [2024-06-15 17:18:15,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 17:18:17,900][1652475] Updated weights for policy 0, policy_version 440258 (0.0051) [2024-06-15 17:18:20,738][1648984] Fps is (10 sec: 49152.0, 60 sec: 43690.8, 300 sec: 43098.3). Total num frames: 901775360. Throughput: 0: 10774.8. Samples: 225490432. Policy #0 lag: (min: 47.0, avg: 141.5, max: 303.0) [2024-06-15 17:18:20,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 17:18:22,275][1652475] Updated weights for policy 0, policy_version 440324 (0.0017) [2024-06-15 17:18:23,555][1652475] Updated weights for policy 0, policy_version 440384 (0.0012) [2024-06-15 17:18:25,738][1648984] Fps is (10 sec: 52427.6, 60 sec: 44236.6, 300 sec: 43542.5). Total num frames: 902037504. Throughput: 0: 10729.2. Samples: 225555968. Policy #0 lag: (min: 47.0, avg: 141.5, max: 303.0) [2024-06-15 17:18:25,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 17:18:28,634][1652475] Updated weights for policy 0, policy_version 440451 (0.0012) [2024-06-15 17:18:30,738][1648984] Fps is (10 sec: 45874.8, 60 sec: 43144.5, 300 sec: 43542.6). Total num frames: 902234112. Throughput: 0: 10478.9. Samples: 225614336. Policy #0 lag: (min: 47.0, avg: 141.5, max: 303.0) [2024-06-15 17:18:30,738][1648984] Avg episode reward: [(0, '-0.210')] [2024-06-15 17:18:31,034][1652475] Updated weights for policy 0, policy_version 440560 (0.0015) [2024-06-15 17:18:35,738][1648984] Fps is (10 sec: 26214.7, 60 sec: 41506.0, 300 sec: 42653.9). Total num frames: 902299648. Throughput: 0: 10535.8. Samples: 225649152. Policy #0 lag: (min: 47.0, avg: 141.5, max: 303.0) [2024-06-15 17:18:35,739][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 17:18:36,363][1651340] Signal inference workers to stop experience collection... 
(22600 times) [2024-06-15 17:18:36,423][1652475] InferenceWorker_p0-w0: stopping experience collection (22600 times) [2024-06-15 17:18:36,589][1651340] Signal inference workers to resume experience collection... (22600 times) [2024-06-15 17:18:36,590][1652475] InferenceWorker_p0-w0: resuming experience collection (22600 times) [2024-06-15 17:18:36,592][1652475] Updated weights for policy 0, policy_version 440624 (0.0014) [2024-06-15 17:18:38,217][1652475] Updated weights for policy 0, policy_version 440702 (0.0015) [2024-06-15 17:18:40,738][1648984] Fps is (10 sec: 39322.0, 60 sec: 42598.5, 300 sec: 43320.4). Total num frames: 902627328. Throughput: 0: 10672.4. Samples: 225711104. Policy #0 lag: (min: 47.0, avg: 141.5, max: 303.0) [2024-06-15 17:18:40,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 17:18:41,401][1652475] Updated weights for policy 0, policy_version 440768 (0.0014) [2024-06-15 17:18:44,327][1652475] Updated weights for policy 0, policy_version 440830 (0.0013) [2024-06-15 17:18:45,738][1648984] Fps is (10 sec: 52429.8, 60 sec: 43144.6, 300 sec: 43209.3). Total num frames: 902823936. Throughput: 0: 10558.6. Samples: 225774592. Policy #0 lag: (min: 47.0, avg: 141.5, max: 303.0) [2024-06-15 17:18:45,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 17:18:48,667][1652475] Updated weights for policy 0, policy_version 440893 (0.0016) [2024-06-15 17:18:50,738][1648984] Fps is (10 sec: 42598.0, 60 sec: 43144.5, 300 sec: 43431.5). Total num frames: 903053312. Throughput: 0: 10626.8. Samples: 225810944. Policy #0 lag: (min: 47.0, avg: 141.5, max: 303.0) [2024-06-15 17:18:50,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 17:18:50,789][1652475] Updated weights for policy 0, policy_version 440954 (0.0015) [2024-06-15 17:18:53,012][1652475] Updated weights for policy 0, policy_version 441022 (0.0014) [2024-06-15 17:18:55,738][1648984] Fps is (10 sec: 45874.0, 60 sec: 42598.3, 300 sec: 43320.4). Total num frames: 903282688. 
Throughput: 0: 10672.3. Samples: 225873920. Policy #0 lag: (min: 47.0, avg: 141.5, max: 303.0) [2024-06-15 17:18:55,739][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 17:18:56,076][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000441072_903315456.pth... [2024-06-15 17:18:56,113][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000435968_892862464.pth [2024-06-15 17:18:56,337][1652475] Updated weights for policy 0, policy_version 441085 (0.0012) [2024-06-15 17:19:00,520][1652475] Updated weights for policy 0, policy_version 441150 (0.0014) [2024-06-15 17:19:00,749][1648984] Fps is (10 sec: 42551.5, 60 sec: 42590.6, 300 sec: 43096.6). Total num frames: 903479296. Throughput: 0: 10738.0. Samples: 225945088. Policy #0 lag: (min: 47.0, avg: 141.5, max: 303.0) [2024-06-15 17:19:00,749][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 17:19:02,333][1652475] Updated weights for policy 0, policy_version 441208 (0.0014) [2024-06-15 17:19:04,430][1652475] Updated weights for policy 0, policy_version 441268 (0.0015) [2024-06-15 17:19:05,738][1648984] Fps is (10 sec: 45876.4, 60 sec: 43690.7, 300 sec: 43764.7). Total num frames: 903741440. Throughput: 0: 10808.9. Samples: 225976832. Policy #0 lag: (min: 47.0, avg: 141.5, max: 303.0) [2024-06-15 17:19:05,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 17:19:07,478][1652475] Updated weights for policy 0, policy_version 441312 (0.0109) [2024-06-15 17:19:10,739][1648984] Fps is (10 sec: 39358.8, 60 sec: 43143.4, 300 sec: 43098.0). Total num frames: 903872512. Throughput: 0: 10842.7. Samples: 226043904. 
Policy #0 lag: (min: 47.0, avg: 141.5, max: 303.0) [2024-06-15 17:19:10,740][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 17:19:11,470][1652475] Updated weights for policy 0, policy_version 441360 (0.0013) [2024-06-15 17:19:13,150][1652475] Updated weights for policy 0, policy_version 441440 (0.0136) [2024-06-15 17:19:14,008][1652475] Updated weights for policy 0, policy_version 441471 (0.0011) [2024-06-15 17:19:15,738][1648984] Fps is (10 sec: 42598.0, 60 sec: 44236.8, 300 sec: 43653.6). Total num frames: 904167424. Throughput: 0: 11025.1. Samples: 226110464. Policy #0 lag: (min: 4.0, avg: 83.2, max: 260.0) [2024-06-15 17:19:15,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 17:19:16,450][1652475] Updated weights for policy 0, policy_version 441520 (0.0013) [2024-06-15 17:19:19,256][1652475] Updated weights for policy 0, policy_version 441592 (0.0015) [2024-06-15 17:19:20,738][1648984] Fps is (10 sec: 52437.0, 60 sec: 43690.6, 300 sec: 43542.6). Total num frames: 904396800. Throughput: 0: 10968.2. Samples: 226142720. Policy #0 lag: (min: 4.0, avg: 83.2, max: 260.0) [2024-06-15 17:19:20,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 17:19:23,562][1651340] Signal inference workers to stop experience collection... (22650 times) [2024-06-15 17:19:23,601][1652475] InferenceWorker_p0-w0: stopping experience collection (22650 times) [2024-06-15 17:19:23,779][1651340] Signal inference workers to resume experience collection... (22650 times) [2024-06-15 17:19:23,780][1652475] InferenceWorker_p0-w0: resuming experience collection (22650 times) [2024-06-15 17:19:23,986][1652475] Updated weights for policy 0, policy_version 441658 (0.0108) [2024-06-15 17:19:25,139][1652475] Updated weights for policy 0, policy_version 441697 (0.0015) [2024-06-15 17:19:25,738][1648984] Fps is (10 sec: 45875.2, 60 sec: 43144.7, 300 sec: 43431.5). Total num frames: 904626176. Throughput: 0: 11150.2. Samples: 226212864. 
Policy #0 lag: (min: 4.0, avg: 83.2, max: 260.0) [2024-06-15 17:19:25,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 17:19:27,588][1652475] Updated weights for policy 0, policy_version 441760 (0.0030) [2024-06-15 17:19:29,680][1652475] Updated weights for policy 0, policy_version 441808 (0.0014) [2024-06-15 17:19:30,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 44782.9, 300 sec: 43875.8). Total num frames: 904921088. Throughput: 0: 11195.7. Samples: 226278400. Policy #0 lag: (min: 4.0, avg: 83.2, max: 260.0) [2024-06-15 17:19:30,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 17:19:35,132][1652475] Updated weights for policy 0, policy_version 441888 (0.0043) [2024-06-15 17:19:35,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 45329.1, 300 sec: 43431.5). Total num frames: 905019392. Throughput: 0: 11241.2. Samples: 226316800. Policy #0 lag: (min: 4.0, avg: 83.2, max: 260.0) [2024-06-15 17:19:35,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 17:19:36,729][1652475] Updated weights for policy 0, policy_version 441952 (0.0013) [2024-06-15 17:19:39,848][1652475] Updated weights for policy 0, policy_version 442016 (0.0010) [2024-06-15 17:19:40,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 44782.9, 300 sec: 43986.9). Total num frames: 905314304. Throughput: 0: 11286.8. Samples: 226381824. Policy #0 lag: (min: 4.0, avg: 83.2, max: 260.0) [2024-06-15 17:19:40,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 17:19:41,676][1652475] Updated weights for policy 0, policy_version 442064 (0.0011) [2024-06-15 17:19:45,738][1648984] Fps is (10 sec: 42598.7, 60 sec: 43690.6, 300 sec: 43320.4). Total num frames: 905445376. Throughput: 0: 11175.7. Samples: 226447872. 
Policy #0 lag: (min: 4.0, avg: 83.2, max: 260.0) [2024-06-15 17:19:45,740][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 17:19:47,126][1652475] Updated weights for policy 0, policy_version 442131 (0.0015) [2024-06-15 17:19:48,872][1652475] Updated weights for policy 0, policy_version 442194 (0.0012) [2024-06-15 17:19:50,059][1652475] Updated weights for policy 0, policy_version 442240 (0.0014) [2024-06-15 17:19:50,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 44236.9, 300 sec: 43542.6). Total num frames: 905707520. Throughput: 0: 11161.6. Samples: 226479104. Policy #0 lag: (min: 4.0, avg: 83.2, max: 260.0) [2024-06-15 17:19:50,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 17:19:54,508][1652475] Updated weights for policy 0, policy_version 442320 (0.0088) [2024-06-15 17:19:55,738][1648984] Fps is (10 sec: 49152.1, 60 sec: 44237.0, 300 sec: 43431.5). Total num frames: 905936896. Throughput: 0: 10945.8. Samples: 226536448. Policy #0 lag: (min: 4.0, avg: 83.2, max: 260.0) [2024-06-15 17:19:55,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 17:19:59,548][1652475] Updated weights for policy 0, policy_version 442384 (0.0015) [2024-06-15 17:20:00,738][1648984] Fps is (10 sec: 39321.2, 60 sec: 43698.7, 300 sec: 43098.3). Total num frames: 906100736. Throughput: 0: 10922.7. Samples: 226601984. Policy #0 lag: (min: 4.0, avg: 83.2, max: 260.0) [2024-06-15 17:20:00,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 17:20:01,993][1652475] Updated weights for policy 0, policy_version 442480 (0.0097) [2024-06-15 17:20:04,460][1652475] Updated weights for policy 0, policy_version 442544 (0.0013) [2024-06-15 17:20:05,738][1648984] Fps is (10 sec: 42598.6, 60 sec: 43690.7, 300 sec: 43653.6). Total num frames: 906362880. Throughput: 0: 10854.4. Samples: 226631168. 
Policy #0 lag: (min: 4.0, avg: 83.2, max: 260.0) [2024-06-15 17:20:05,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 17:20:07,654][1652475] Updated weights for policy 0, policy_version 442622 (0.0014) [2024-06-15 17:20:10,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 43691.8, 300 sec: 43098.2). Total num frames: 906493952. Throughput: 0: 10763.4. Samples: 226697216. Policy #0 lag: (min: 4.0, avg: 83.2, max: 260.0) [2024-06-15 17:20:10,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 17:20:11,571][1651340] Signal inference workers to stop experience collection... (22700 times) [2024-06-15 17:20:11,619][1652475] InferenceWorker_p0-w0: stopping experience collection (22700 times) [2024-06-15 17:20:11,772][1651340] Signal inference workers to resume experience collection... (22700 times) [2024-06-15 17:20:11,772][1652475] InferenceWorker_p0-w0: resuming experience collection (22700 times) [2024-06-15 17:20:12,117][1652475] Updated weights for policy 0, policy_version 442672 (0.0036) [2024-06-15 17:20:13,788][1652475] Updated weights for policy 0, policy_version 442720 (0.0016) [2024-06-15 17:20:15,738][1648984] Fps is (10 sec: 39321.0, 60 sec: 43144.5, 300 sec: 43542.6). Total num frames: 906756096. Throughput: 0: 10786.1. Samples: 226763776. Policy #0 lag: (min: 4.0, avg: 83.2, max: 260.0) [2024-06-15 17:20:15,739][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 17:20:16,589][1652475] Updated weights for policy 0, policy_version 442768 (0.0011) [2024-06-15 17:20:17,968][1652475] Updated weights for policy 0, policy_version 442820 (0.0015) [2024-06-15 17:20:19,125][1652475] Updated weights for policy 0, policy_version 442874 (0.0018) [2024-06-15 17:20:20,739][1648984] Fps is (10 sec: 52423.7, 60 sec: 43690.0, 300 sec: 43542.4). Total num frames: 907018240. Throughput: 0: 10638.0. Samples: 226795520. 
Policy #0 lag: (min: 4.0, avg: 83.2, max: 260.0) [2024-06-15 17:20:20,739][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 17:20:24,447][1652475] Updated weights for policy 0, policy_version 442912 (0.0074) [2024-06-15 17:20:25,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 42598.4, 300 sec: 43209.3). Total num frames: 907182080. Throughput: 0: 10695.1. Samples: 226863104. Policy #0 lag: (min: 4.0, avg: 83.2, max: 260.0) [2024-06-15 17:20:25,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 17:20:26,590][1652475] Updated weights for policy 0, policy_version 443004 (0.0025) [2024-06-15 17:20:29,875][1652475] Updated weights for policy 0, policy_version 443056 (0.0021) [2024-06-15 17:20:30,741][1648984] Fps is (10 sec: 39314.5, 60 sec: 41504.2, 300 sec: 43431.1). Total num frames: 907411456. Throughput: 0: 10501.0. Samples: 226920448. Policy #0 lag: (min: 4.0, avg: 83.2, max: 260.0) [2024-06-15 17:20:30,741][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 17:20:31,629][1652475] Updated weights for policy 0, policy_version 443104 (0.0013) [2024-06-15 17:20:35,588][1652475] Updated weights for policy 0, policy_version 443137 (0.0013) [2024-06-15 17:20:35,738][1648984] Fps is (10 sec: 36045.4, 60 sec: 42052.4, 300 sec: 42987.2). Total num frames: 907542528. Throughput: 0: 10478.9. Samples: 226950656. Policy #0 lag: (min: 4.0, avg: 83.2, max: 260.0) [2024-06-15 17:20:35,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 17:20:36,785][1652475] Updated weights for policy 0, policy_version 443198 (0.0011) [2024-06-15 17:20:38,502][1652475] Updated weights for policy 0, policy_version 443262 (0.0014) [2024-06-15 17:20:40,738][1648984] Fps is (10 sec: 39332.5, 60 sec: 41506.1, 300 sec: 43431.5). Total num frames: 907804672. Throughput: 0: 10683.7. Samples: 227017216. 
Policy #0 lag: (min: 15.0, avg: 97.6, max: 255.0) [2024-06-15 17:20:40,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 17:20:43,149][1652475] Updated weights for policy 0, policy_version 443328 (0.0014) [2024-06-15 17:20:45,749][1648984] Fps is (10 sec: 52371.2, 60 sec: 43682.7, 300 sec: 43541.0). Total num frames: 908066816. Throughput: 0: 10544.6. Samples: 227076608. Policy #0 lag: (min: 15.0, avg: 97.6, max: 255.0) [2024-06-15 17:20:45,750][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 17:20:47,783][1652475] Updated weights for policy 0, policy_version 443393 (0.0117) [2024-06-15 17:20:49,083][1652475] Updated weights for policy 0, policy_version 443456 (0.0022) [2024-06-15 17:20:50,738][1648984] Fps is (10 sec: 52428.5, 60 sec: 43690.6, 300 sec: 43542.6). Total num frames: 908328960. Throughput: 0: 10752.0. Samples: 227115008. Policy #0 lag: (min: 15.0, avg: 97.6, max: 255.0) [2024-06-15 17:20:50,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 17:20:53,891][1652475] Updated weights for policy 0, policy_version 443522 (0.0014) [2024-06-15 17:20:55,405][1652475] Updated weights for policy 0, policy_version 443584 (0.0013) [2024-06-15 17:20:55,738][1648984] Fps is (10 sec: 39364.0, 60 sec: 42052.1, 300 sec: 43320.4). Total num frames: 908460032. Throughput: 0: 10535.8. Samples: 227171328. Policy #0 lag: (min: 15.0, avg: 97.6, max: 255.0) [2024-06-15 17:20:55,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 17:20:55,743][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000443584_908460032.pth... [2024-06-15 17:20:55,815][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000438528_898105344.pth [2024-06-15 17:20:57,506][1651340] Signal inference workers to stop experience collection... 
(22750 times) [2024-06-15 17:20:57,586][1652475] InferenceWorker_p0-w0: stopping experience collection (22750 times) [2024-06-15 17:20:57,803][1651340] Signal inference workers to resume experience collection... (22750 times) [2024-06-15 17:20:57,804][1652475] InferenceWorker_p0-w0: resuming experience collection (22750 times) [2024-06-15 17:20:57,806][1652475] Updated weights for policy 0, policy_version 443632 (0.0015) [2024-06-15 17:21:00,109][1652475] Updated weights for policy 0, policy_version 443649 (0.0011) [2024-06-15 17:21:00,738][1648984] Fps is (10 sec: 32768.2, 60 sec: 42598.4, 300 sec: 43320.4). Total num frames: 908656640. Throughput: 0: 10638.2. Samples: 227242496. Policy #0 lag: (min: 15.0, avg: 97.6, max: 255.0) [2024-06-15 17:21:00,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 17:21:02,425][1652475] Updated weights for policy 0, policy_version 443715 (0.0014) [2024-06-15 17:21:03,817][1652475] Updated weights for policy 0, policy_version 443776 (0.0012) [2024-06-15 17:21:05,738][1648984] Fps is (10 sec: 42599.4, 60 sec: 42052.3, 300 sec: 43209.3). Total num frames: 908886016. Throughput: 0: 10547.4. Samples: 227270144. Policy #0 lag: (min: 15.0, avg: 97.6, max: 255.0) [2024-06-15 17:21:05,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 17:21:06,510][1652475] Updated weights for policy 0, policy_version 443832 (0.0012) [2024-06-15 17:21:09,088][1652475] Updated weights for policy 0, policy_version 443888 (0.0022) [2024-06-15 17:21:10,738][1648984] Fps is (10 sec: 45875.2, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 909115392. Throughput: 0: 10513.1. Samples: 227336192. 
Policy #0 lag: (min: 15.0, avg: 97.6, max: 255.0) [2024-06-15 17:21:10,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 17:21:12,570][1652475] Updated weights for policy 0, policy_version 443940 (0.0030) [2024-06-15 17:21:15,352][1652475] Updated weights for policy 0, policy_version 444016 (0.0095) [2024-06-15 17:21:15,738][1648984] Fps is (10 sec: 45875.1, 60 sec: 43144.6, 300 sec: 43431.5). Total num frames: 909344768. Throughput: 0: 10764.0. Samples: 227404800. Policy #0 lag: (min: 15.0, avg: 97.6, max: 255.0) [2024-06-15 17:21:15,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 17:21:17,199][1652475] Updated weights for policy 0, policy_version 444048 (0.0019) [2024-06-15 17:21:20,738][1648984] Fps is (10 sec: 49152.2, 60 sec: 43145.3, 300 sec: 43542.6). Total num frames: 909606912. Throughput: 0: 10797.5. Samples: 227436544. Policy #0 lag: (min: 15.0, avg: 97.6, max: 255.0) [2024-06-15 17:21:20,740][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 17:21:20,831][1652475] Updated weights for policy 0, policy_version 444154 (0.0214) [2024-06-15 17:21:25,097][1652475] Updated weights for policy 0, policy_version 444221 (0.0014) [2024-06-15 17:21:25,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 43144.6, 300 sec: 43321.1). Total num frames: 909770752. Throughput: 0: 10911.3. Samples: 227508224. Policy #0 lag: (min: 15.0, avg: 97.6, max: 255.0) [2024-06-15 17:21:25,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 17:21:27,207][1652475] Updated weights for policy 0, policy_version 444283 (0.0022) [2024-06-15 17:21:29,545][1652475] Updated weights for policy 0, policy_version 444336 (0.0015) [2024-06-15 17:21:30,738][1648984] Fps is (10 sec: 42597.8, 60 sec: 43692.6, 300 sec: 43764.7). Total num frames: 910032896. Throughput: 0: 11016.3. Samples: 227572224. 
Policy #0 lag: (min: 15.0, avg: 97.6, max: 255.0) [2024-06-15 17:21:30,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 17:21:31,673][1652475] Updated weights for policy 0, policy_version 444389 (0.0014) [2024-06-15 17:21:35,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 43690.6, 300 sec: 43209.3). Total num frames: 910163968. Throughput: 0: 10945.4. Samples: 227607552. Policy #0 lag: (min: 15.0, avg: 97.6, max: 255.0) [2024-06-15 17:21:35,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 17:21:36,005][1652475] Updated weights for policy 0, policy_version 444435 (0.0018) [2024-06-15 17:21:38,257][1652475] Updated weights for policy 0, policy_version 444487 (0.0016) [2024-06-15 17:21:39,361][1652475] Updated weights for policy 0, policy_version 444544 (0.0013) [2024-06-15 17:21:40,738][1648984] Fps is (10 sec: 45875.2, 60 sec: 44782.9, 300 sec: 43764.7). Total num frames: 910491648. Throughput: 0: 11229.9. Samples: 227676672. Policy #0 lag: (min: 15.0, avg: 97.6, max: 255.0) [2024-06-15 17:21:40,739][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 17:21:41,129][1652475] Updated weights for policy 0, policy_version 444595 (0.0019) [2024-06-15 17:21:43,546][1652475] Updated weights for policy 0, policy_version 444644 (0.0014) [2024-06-15 17:21:45,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 43698.6, 300 sec: 43542.6). Total num frames: 910688256. Throughput: 0: 11070.6. Samples: 227740672. Policy #0 lag: (min: 15.0, avg: 97.6, max: 255.0) [2024-06-15 17:21:45,740][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 17:21:47,474][1652475] Updated weights for policy 0, policy_version 444688 (0.0014) [2024-06-15 17:21:47,588][1651340] Signal inference workers to stop experience collection... (22800 times) [2024-06-15 17:21:47,653][1652475] InferenceWorker_p0-w0: stopping experience collection (22800 times) [2024-06-15 17:21:47,808][1651340] Signal inference workers to resume experience collection... 
(22800 times) [2024-06-15 17:21:47,809][1652475] InferenceWorker_p0-w0: resuming experience collection (22800 times) [2024-06-15 17:21:50,364][1652475] Updated weights for policy 0, policy_version 444740 (0.0120) [2024-06-15 17:21:50,738][1648984] Fps is (10 sec: 36045.0, 60 sec: 42052.3, 300 sec: 43209.3). Total num frames: 910852096. Throughput: 0: 11252.6. Samples: 227776512. Policy #0 lag: (min: 15.0, avg: 97.6, max: 255.0) [2024-06-15 17:21:50,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 17:21:51,770][1652475] Updated weights for policy 0, policy_version 444800 (0.0013) [2024-06-15 17:21:53,011][1652475] Updated weights for policy 0, policy_version 444854 (0.0013) [2024-06-15 17:21:55,674][1652475] Updated weights for policy 0, policy_version 444912 (0.0015) [2024-06-15 17:21:55,738][1648984] Fps is (10 sec: 49151.6, 60 sec: 45329.1, 300 sec: 43764.7). Total num frames: 911179776. Throughput: 0: 11229.8. Samples: 227841536. Policy #0 lag: (min: 15.0, avg: 97.6, max: 255.0) [2024-06-15 17:21:55,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 17:21:59,335][1652475] Updated weights for policy 0, policy_version 444945 (0.0012) [2024-06-15 17:22:00,738][1648984] Fps is (10 sec: 49152.1, 60 sec: 44782.9, 300 sec: 43542.6). Total num frames: 911343616. Throughput: 0: 11195.7. Samples: 227908608. Policy #0 lag: (min: 2.0, avg: 101.5, max: 258.0) [2024-06-15 17:22:00,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 17:22:03,439][1652475] Updated weights for policy 0, policy_version 445056 (0.0014) [2024-06-15 17:22:04,332][1652475] Updated weights for policy 0, policy_version 445104 (0.0015) [2024-06-15 17:22:05,738][1648984] Fps is (10 sec: 42598.9, 60 sec: 45329.1, 300 sec: 43764.7). Total num frames: 911605760. Throughput: 0: 11252.6. Samples: 227942912. 
Policy #0 lag: (min: 2.0, avg: 101.5, max: 258.0) [2024-06-15 17:22:05,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 17:22:07,228][1652475] Updated weights for policy 0, policy_version 445175 (0.0016) [2024-06-15 17:22:10,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 43690.6, 300 sec: 43320.4). Total num frames: 911736832. Throughput: 0: 11127.5. Samples: 228008960. Policy #0 lag: (min: 2.0, avg: 101.5, max: 258.0) [2024-06-15 17:22:10,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 17:22:11,500][1652475] Updated weights for policy 0, policy_version 445219 (0.0039) [2024-06-15 17:22:14,146][1652475] Updated weights for policy 0, policy_version 445264 (0.0017) [2024-06-15 17:22:15,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 44782.9, 300 sec: 43653.7). Total num frames: 912031744. Throughput: 0: 11195.8. Samples: 228076032. Policy #0 lag: (min: 2.0, avg: 101.5, max: 258.0) [2024-06-15 17:22:15,740][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 17:22:16,030][1652475] Updated weights for policy 0, policy_version 445346 (0.0015) [2024-06-15 17:22:18,325][1652475] Updated weights for policy 0, policy_version 445385 (0.0013) [2024-06-15 17:22:20,738][1648984] Fps is (10 sec: 52428.4, 60 sec: 44236.7, 300 sec: 43653.6). Total num frames: 912261120. Throughput: 0: 11229.8. Samples: 228112896. Policy #0 lag: (min: 2.0, avg: 101.5, max: 258.0) [2024-06-15 17:22:20,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 17:22:23,567][1652475] Updated weights for policy 0, policy_version 445488 (0.0197) [2024-06-15 17:22:25,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 44782.9, 300 sec: 43431.5). Total num frames: 912457728. Throughput: 0: 11127.5. Samples: 228177408. 
Policy #0 lag: (min: 2.0, avg: 101.5, max: 258.0) [2024-06-15 17:22:25,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 17:22:26,098][1652475] Updated weights for policy 0, policy_version 445557 (0.0013) [2024-06-15 17:22:27,377][1652475] Updated weights for policy 0, policy_version 445622 (0.0013) [2024-06-15 17:22:30,250][1651340] Signal inference workers to stop experience collection... (22850 times) [2024-06-15 17:22:30,288][1652475] InferenceWorker_p0-w0: stopping experience collection (22850 times) [2024-06-15 17:22:30,477][1651340] Signal inference workers to resume experience collection... (22850 times) [2024-06-15 17:22:30,478][1652475] InferenceWorker_p0-w0: resuming experience collection (22850 times) [2024-06-15 17:22:30,608][1652475] Updated weights for policy 0, policy_version 445685 (0.0017) [2024-06-15 17:22:30,738][1648984] Fps is (10 sec: 49152.8, 60 sec: 45329.2, 300 sec: 43875.8). Total num frames: 912752640. Throughput: 0: 11173.0. Samples: 228243456. Policy #0 lag: (min: 2.0, avg: 101.5, max: 258.0) [2024-06-15 17:22:30,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 17:22:35,338][1652475] Updated weights for policy 0, policy_version 445744 (0.0014) [2024-06-15 17:22:35,738][1648984] Fps is (10 sec: 45875.2, 60 sec: 45875.2, 300 sec: 43542.6). Total num frames: 912916480. Throughput: 0: 11252.6. Samples: 228282880. Policy #0 lag: (min: 2.0, avg: 101.5, max: 258.0) [2024-06-15 17:22:35,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 17:22:36,989][1652475] Updated weights for policy 0, policy_version 445822 (0.0013) [2024-06-15 17:22:40,738][1648984] Fps is (10 sec: 36044.7, 60 sec: 43690.8, 300 sec: 43653.6). Total num frames: 913113088. Throughput: 0: 11207.1. Samples: 228345856. 
Policy #0 lag: (min: 2.0, avg: 101.5, max: 258.0) [2024-06-15 17:22:40,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 17:22:42,020][1652475] Updated weights for policy 0, policy_version 445906 (0.0015) [2024-06-15 17:22:45,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 913309696. Throughput: 0: 11104.7. Samples: 228408320. Policy #0 lag: (min: 2.0, avg: 101.5, max: 258.0) [2024-06-15 17:22:45,740][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 17:22:46,971][1652475] Updated weights for policy 0, policy_version 445968 (0.0014) [2024-06-15 17:22:49,417][1652475] Updated weights for policy 0, policy_version 446048 (0.0013) [2024-06-15 17:22:50,738][1648984] Fps is (10 sec: 45874.5, 60 sec: 45329.0, 300 sec: 43542.6). Total num frames: 913571840. Throughput: 0: 11047.8. Samples: 228440064. Policy #0 lag: (min: 2.0, avg: 101.5, max: 258.0) [2024-06-15 17:22:50,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 17:22:52,757][1652475] Updated weights for policy 0, policy_version 446112 (0.0014) [2024-06-15 17:22:54,329][1652475] Updated weights for policy 0, policy_version 446178 (0.0013) [2024-06-15 17:22:55,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 44236.8, 300 sec: 43764.7). Total num frames: 913833984. Throughput: 0: 10899.9. Samples: 228499456. Policy #0 lag: (min: 2.0, avg: 101.5, max: 258.0) [2024-06-15 17:22:55,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 17:22:55,742][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000446208_913833984.pth... [2024-06-15 17:22:55,789][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000441072_903315456.pth [2024-06-15 17:22:59,163][1652475] Updated weights for policy 0, policy_version 446242 (0.0013) [2024-06-15 17:23:00,759][1648984] Fps is (10 sec: 39237.6, 60 sec: 43675.0, 300 sec: 43539.4). Total num frames: 913965056. Throughput: 0: 10917.4. 
Samples: 228567552. Policy #0 lag: (min: 2.0, avg: 101.5, max: 258.0) [2024-06-15 17:23:00,760][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 17:23:01,843][1652475] Updated weights for policy 0, policy_version 446305 (0.0012) [2024-06-15 17:23:05,738][1648984] Fps is (10 sec: 32768.1, 60 sec: 42598.4, 300 sec: 43653.6). Total num frames: 914161664. Throughput: 0: 10786.2. Samples: 228598272. Policy #0 lag: (min: 2.0, avg: 101.5, max: 258.0) [2024-06-15 17:23:05,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 17:23:05,757][1652475] Updated weights for policy 0, policy_version 446370 (0.0013) [2024-06-15 17:23:07,233][1652475] Updated weights for policy 0, policy_version 446423 (0.0013) [2024-06-15 17:23:10,738][1648984] Fps is (10 sec: 39406.4, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 914358272. Throughput: 0: 10717.9. Samples: 228659712. Policy #0 lag: (min: 2.0, avg: 101.5, max: 258.0) [2024-06-15 17:23:10,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 17:23:12,311][1652475] Updated weights for policy 0, policy_version 446496 (0.0013) [2024-06-15 17:23:14,195][1652475] Updated weights for policy 0, policy_version 446564 (0.0013) [2024-06-15 17:23:15,738][1648984] Fps is (10 sec: 45875.2, 60 sec: 43144.5, 300 sec: 43542.6). Total num frames: 914620416. Throughput: 0: 10535.8. Samples: 228717568. Policy #0 lag: (min: 2.0, avg: 101.5, max: 258.0) [2024-06-15 17:23:15,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 17:23:17,656][1652475] Updated weights for policy 0, policy_version 446624 (0.0012) [2024-06-15 17:23:19,398][1651340] Signal inference workers to stop experience collection... (22900 times) [2024-06-15 17:23:19,451][1652475] InferenceWorker_p0-w0: stopping experience collection (22900 times) [2024-06-15 17:23:19,710][1651340] Signal inference workers to resume experience collection... 
(22900 times) [2024-06-15 17:23:19,711][1652475] InferenceWorker_p0-w0: resuming experience collection (22900 times) [2024-06-15 17:23:20,386][1652475] Updated weights for policy 0, policy_version 446704 (0.0062) [2024-06-15 17:23:20,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 43690.8, 300 sec: 43542.6). Total num frames: 914882560. Throughput: 0: 10353.8. Samples: 228748800. Policy #0 lag: (min: 2.0, avg: 101.5, max: 258.0) [2024-06-15 17:23:20,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 17:23:25,172][1652475] Updated weights for policy 0, policy_version 446752 (0.0032) [2024-06-15 17:23:25,738][1648984] Fps is (10 sec: 36044.8, 60 sec: 42052.3, 300 sec: 43209.3). Total num frames: 914980864. Throughput: 0: 10558.6. Samples: 228820992. Policy #0 lag: (min: 9.0, avg: 81.3, max: 217.0) [2024-06-15 17:23:25,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 17:23:26,673][1652475] Updated weights for policy 0, policy_version 446817 (0.0010) [2024-06-15 17:23:27,151][1652475] Updated weights for policy 0, policy_version 446844 (0.0010) [2024-06-15 17:23:29,780][1652475] Updated weights for policy 0, policy_version 446905 (0.0014) [2024-06-15 17:23:30,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 42052.2, 300 sec: 43986.9). Total num frames: 915275776. Throughput: 0: 10513.1. Samples: 228881408. Policy #0 lag: (min: 9.0, avg: 81.3, max: 217.0) [2024-06-15 17:23:30,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 17:23:32,517][1652475] Updated weights for policy 0, policy_version 446976 (0.0013) [2024-06-15 17:23:35,740][1648984] Fps is (10 sec: 42589.7, 60 sec: 41504.7, 300 sec: 43320.1). Total num frames: 915406848. Throughput: 0: 10512.6. Samples: 228913152. 
Policy #0 lag: (min: 9.0, avg: 81.3, max: 217.0) [2024-06-15 17:23:35,740][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 17:23:37,699][1652475] Updated weights for policy 0, policy_version 447042 (0.0139) [2024-06-15 17:23:39,122][1652475] Updated weights for policy 0, policy_version 447100 (0.0012) [2024-06-15 17:23:40,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 42598.4, 300 sec: 43542.6). Total num frames: 915668992. Throughput: 0: 10717.9. Samples: 228981760. Policy #0 lag: (min: 9.0, avg: 81.3, max: 217.0) [2024-06-15 17:23:40,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 17:23:42,171][1652475] Updated weights for policy 0, policy_version 447168 (0.0014) [2024-06-15 17:23:45,738][1648984] Fps is (10 sec: 52439.6, 60 sec: 43690.7, 300 sec: 43653.6). Total num frames: 915931136. Throughput: 0: 10609.2. Samples: 229044736. Policy #0 lag: (min: 9.0, avg: 81.3, max: 217.0) [2024-06-15 17:23:45,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 17:23:48,284][1652475] Updated weights for policy 0, policy_version 447248 (0.0014) [2024-06-15 17:23:49,870][1652475] Updated weights for policy 0, policy_version 447312 (0.0012) [2024-06-15 17:23:50,738][1648984] Fps is (10 sec: 49151.7, 60 sec: 43144.6, 300 sec: 43653.7). Total num frames: 916160512. Throughput: 0: 10877.1. Samples: 229087744. Policy #0 lag: (min: 9.0, avg: 81.3, max: 217.0) [2024-06-15 17:23:50,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 17:23:50,837][1652475] Updated weights for policy 0, policy_version 447357 (0.0057) [2024-06-15 17:23:53,609][1652475] Updated weights for policy 0, policy_version 447424 (0.0012) [2024-06-15 17:23:55,738][1648984] Fps is (10 sec: 45875.2, 60 sec: 42598.4, 300 sec: 43766.4). Total num frames: 916389888. Throughput: 0: 10808.9. Samples: 229146112. 
Policy #0 lag: (min: 9.0, avg: 81.3, max: 217.0) [2024-06-15 17:23:55,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 17:24:00,738][1648984] Fps is (10 sec: 32768.3, 60 sec: 42067.4, 300 sec: 43209.3). Total num frames: 916488192. Throughput: 0: 11184.4. Samples: 229220864. Policy #0 lag: (min: 9.0, avg: 81.3, max: 217.0) [2024-06-15 17:24:00,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 17:24:01,591][1652475] Updated weights for policy 0, policy_version 447536 (0.0093) [2024-06-15 17:24:04,382][1652475] Updated weights for policy 0, policy_version 447619 (0.0091) [2024-06-15 17:24:04,622][1651340] Signal inference workers to stop experience collection... (22950 times) [2024-06-15 17:24:04,659][1652475] InferenceWorker_p0-w0: stopping experience collection (22950 times) [2024-06-15 17:24:04,812][1651340] Signal inference workers to resume experience collection... (22950 times) [2024-06-15 17:24:04,813][1652475] InferenceWorker_p0-w0: resuming experience collection (22950 times) [2024-06-15 17:24:05,738][1648984] Fps is (10 sec: 45875.0, 60 sec: 44782.9, 300 sec: 43987.1). Total num frames: 916848640. Throughput: 0: 11013.7. Samples: 229244416. Policy #0 lag: (min: 9.0, avg: 81.3, max: 217.0) [2024-06-15 17:24:05,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 17:24:08,176][1652475] Updated weights for policy 0, policy_version 447715 (0.0090) [2024-06-15 17:24:10,738][1648984] Fps is (10 sec: 49151.9, 60 sec: 43690.7, 300 sec: 43431.5). Total num frames: 916979712. Throughput: 0: 10797.5. Samples: 229306880. Policy #0 lag: (min: 9.0, avg: 81.3, max: 217.0) [2024-06-15 17:24:10,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 17:24:12,873][1652475] Updated weights for policy 0, policy_version 447776 (0.0014) [2024-06-15 17:24:14,479][1652475] Updated weights for policy 0, policy_version 447840 (0.0012) [2024-06-15 17:24:15,738][1648984] Fps is (10 sec: 42598.0, 60 sec: 44236.7, 300 sec: 43653.6). 
Total num frames: 917274624. Throughput: 0: 11036.4. Samples: 229378048. Policy #0 lag: (min: 9.0, avg: 81.3, max: 217.0) [2024-06-15 17:24:15,739][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 17:24:16,079][1652475] Updated weights for policy 0, policy_version 447904 (0.0015) [2024-06-15 17:24:19,677][1652475] Updated weights for policy 0, policy_version 447969 (0.0013) [2024-06-15 17:24:20,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 43690.7, 300 sec: 43653.6). Total num frames: 917504000. Throughput: 0: 11082.5. Samples: 229411840. Policy #0 lag: (min: 9.0, avg: 81.3, max: 217.0) [2024-06-15 17:24:20,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 17:24:24,972][1652475] Updated weights for policy 0, policy_version 448048 (0.0012) [2024-06-15 17:24:25,738][1648984] Fps is (10 sec: 36045.0, 60 sec: 44236.8, 300 sec: 43098.2). Total num frames: 917635072. Throughput: 0: 11275.4. Samples: 229489152. Policy #0 lag: (min: 9.0, avg: 81.3, max: 217.0) [2024-06-15 17:24:25,739][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 17:24:26,920][1652475] Updated weights for policy 0, policy_version 448116 (0.0012) [2024-06-15 17:24:28,655][1652475] Updated weights for policy 0, policy_version 448176 (0.0013) [2024-06-15 17:24:30,738][1648984] Fps is (10 sec: 42598.1, 60 sec: 44236.8, 300 sec: 43764.7). Total num frames: 917929984. Throughput: 0: 11093.3. Samples: 229543936. Policy #0 lag: (min: 9.0, avg: 81.3, max: 217.0) [2024-06-15 17:24:30,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 17:24:31,818][1652475] Updated weights for policy 0, policy_version 448246 (0.0024) [2024-06-15 17:24:35,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 43692.1, 300 sec: 43098.2). Total num frames: 918028288. Throughput: 0: 10808.9. Samples: 229574144. 
Policy #0 lag: (min: 9.0, avg: 81.3, max: 217.0) [2024-06-15 17:24:35,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 17:24:37,656][1652475] Updated weights for policy 0, policy_version 448304 (0.0015) [2024-06-15 17:24:39,396][1652475] Updated weights for policy 0, policy_version 448384 (0.0120) [2024-06-15 17:24:40,738][1648984] Fps is (10 sec: 42598.7, 60 sec: 44782.9, 300 sec: 43764.7). Total num frames: 918355968. Throughput: 0: 10968.2. Samples: 229639680. Policy #0 lag: (min: 9.0, avg: 81.3, max: 217.0) [2024-06-15 17:24:40,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 17:24:40,973][1652475] Updated weights for policy 0, policy_version 448444 (0.0013) [2024-06-15 17:24:43,651][1652475] Updated weights for policy 0, policy_version 448502 (0.0015) [2024-06-15 17:24:45,738][1648984] Fps is (10 sec: 52429.3, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 918552576. Throughput: 0: 10877.2. Samples: 229710336. Policy #0 lag: (min: 9.0, avg: 81.3, max: 217.0) [2024-06-15 17:24:45,740][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 17:24:49,524][1652475] Updated weights for policy 0, policy_version 448564 (0.0013) [2024-06-15 17:24:50,549][1651340] Signal inference workers to stop experience collection... (23000 times) [2024-06-15 17:24:50,598][1652475] InferenceWorker_p0-w0: stopping experience collection (23000 times) [2024-06-15 17:24:50,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 43144.6, 300 sec: 43431.5). Total num frames: 918749184. Throughput: 0: 11184.4. Samples: 229747712. Policy #0 lag: (min: 15.0, avg: 83.8, max: 271.0) [2024-06-15 17:24:50,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 17:24:50,833][1651340] Signal inference workers to resume experience collection... 
(23000 times) [2024-06-15 17:24:50,834][1652475] InferenceWorker_p0-w0: resuming experience collection (23000 times) [2024-06-15 17:24:51,165][1652475] Updated weights for policy 0, policy_version 448640 (0.0010) [2024-06-15 17:24:55,369][1652475] Updated weights for policy 0, policy_version 448723 (0.0013) [2024-06-15 17:24:55,738][1648984] Fps is (10 sec: 45874.7, 60 sec: 43690.6, 300 sec: 43764.7). Total num frames: 919011328. Throughput: 0: 10934.0. Samples: 229798912. Policy #0 lag: (min: 15.0, avg: 83.8, max: 271.0) [2024-06-15 17:24:55,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 17:24:56,314][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000448768_919076864.pth... [2024-06-15 17:24:56,360][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000443584_908460032.pth [2024-06-15 17:25:00,738][1648984] Fps is (10 sec: 32767.9, 60 sec: 43144.5, 300 sec: 43098.2). Total num frames: 919076864. Throughput: 0: 10786.2. Samples: 229863424. Policy #0 lag: (min: 15.0, avg: 83.8, max: 271.0) [2024-06-15 17:25:00,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 17:25:00,919][1652475] Updated weights for policy 0, policy_version 448784 (0.0011) [2024-06-15 17:25:02,839][1652475] Updated weights for policy 0, policy_version 448839 (0.0014) [2024-06-15 17:25:04,141][1652475] Updated weights for policy 0, policy_version 448896 (0.0014) [2024-06-15 17:25:05,738][1648984] Fps is (10 sec: 32768.1, 60 sec: 41506.1, 300 sec: 43542.6). Total num frames: 919339008. Throughput: 0: 10740.6. Samples: 229895168. Policy #0 lag: (min: 15.0, avg: 83.8, max: 271.0) [2024-06-15 17:25:05,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 17:25:07,911][1652475] Updated weights for policy 0, policy_version 448981 (0.0014) [2024-06-15 17:25:10,803][1648984] Fps is (10 sec: 52090.8, 60 sec: 43643.5, 300 sec: 43533.0). Total num frames: 919601152. Throughput: 0: 10134.4. 
Samples: 229945856. Policy #0 lag: (min: 15.0, avg: 83.8, max: 271.0) [2024-06-15 17:25:10,803][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 17:25:13,183][1652475] Updated weights for policy 0, policy_version 449045 (0.0111) [2024-06-15 17:25:15,738][1648984] Fps is (10 sec: 39321.2, 60 sec: 40960.0, 300 sec: 43098.4). Total num frames: 919732224. Throughput: 0: 10501.7. Samples: 230016512. Policy #0 lag: (min: 15.0, avg: 83.8, max: 271.0) [2024-06-15 17:25:15,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 17:25:18,294][1652475] Updated weights for policy 0, policy_version 449147 (0.0013) [2024-06-15 17:25:20,271][1652475] Updated weights for policy 0, policy_version 449216 (0.0018) [2024-06-15 17:25:20,738][1648984] Fps is (10 sec: 42876.3, 60 sec: 42052.2, 300 sec: 43542.6). Total num frames: 920027136. Throughput: 0: 10490.3. Samples: 230046208. Policy #0 lag: (min: 15.0, avg: 83.8, max: 271.0) [2024-06-15 17:25:20,739][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 17:25:21,801][1652475] Updated weights for policy 0, policy_version 449280 (0.0013) [2024-06-15 17:25:25,740][1648984] Fps is (10 sec: 39313.8, 60 sec: 41504.7, 300 sec: 43098.4). Total num frames: 920125440. Throughput: 0: 10410.2. Samples: 230108160. Policy #0 lag: (min: 15.0, avg: 83.8, max: 271.0) [2024-06-15 17:25:25,740][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 17:25:27,464][1652475] Updated weights for policy 0, policy_version 449344 (0.0023) [2024-06-15 17:25:30,738][1648984] Fps is (10 sec: 36044.8, 60 sec: 40960.0, 300 sec: 43542.5). Total num frames: 920387584. Throughput: 0: 10126.2. Samples: 230166016. 
Policy #0 lag: (min: 15.0, avg: 83.8, max: 271.0) [2024-06-15 17:25:30,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 17:25:31,787][1652475] Updated weights for policy 0, policy_version 449443 (0.0016) [2024-06-15 17:25:34,091][1652475] Updated weights for policy 0, policy_version 449475 (0.0026) [2024-06-15 17:25:35,738][1648984] Fps is (10 sec: 52439.1, 60 sec: 43690.6, 300 sec: 43542.5). Total num frames: 920649728. Throughput: 0: 9864.5. Samples: 230191616. Policy #0 lag: (min: 15.0, avg: 83.8, max: 271.0) [2024-06-15 17:25:35,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 17:25:40,738][1648984] Fps is (10 sec: 29490.5, 60 sec: 38775.3, 300 sec: 42766.6). Total num frames: 920682496. Throughput: 0: 10319.6. Samples: 230263296. Policy #0 lag: (min: 15.0, avg: 83.8, max: 271.0) [2024-06-15 17:25:40,739][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 17:25:41,022][1652475] Updated weights for policy 0, policy_version 449570 (0.0180) [2024-06-15 17:25:41,694][1651340] Signal inference workers to stop experience collection... (23050 times) [2024-06-15 17:25:41,755][1652475] InferenceWorker_p0-w0: stopping experience collection (23050 times) [2024-06-15 17:25:41,903][1651340] Signal inference workers to resume experience collection... (23050 times) [2024-06-15 17:25:41,910][1652475] InferenceWorker_p0-w0: resuming experience collection (23050 times) [2024-06-15 17:25:42,972][1652475] Updated weights for policy 0, policy_version 449651 (0.0123) [2024-06-15 17:25:44,489][1652475] Updated weights for policy 0, policy_version 449718 (0.0013) [2024-06-15 17:25:45,738][1648984] Fps is (10 sec: 39322.2, 60 sec: 41506.1, 300 sec: 43098.3). Total num frames: 921042944. Throughput: 0: 10092.1. Samples: 230317568. 
Policy #0 lag: (min: 15.0, avg: 83.8, max: 271.0) [2024-06-15 17:25:45,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 17:25:46,732][1652475] Updated weights for policy 0, policy_version 449746 (0.0024) [2024-06-15 17:25:50,738][1648984] Fps is (10 sec: 49153.6, 60 sec: 40413.9, 300 sec: 43098.3). Total num frames: 921174016. Throughput: 0: 10205.9. Samples: 230354432. Policy #0 lag: (min: 15.0, avg: 83.8, max: 271.0) [2024-06-15 17:25:50,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 17:25:51,964][1652475] Updated weights for policy 0, policy_version 449795 (0.0014) [2024-06-15 17:25:53,401][1652475] Updated weights for policy 0, policy_version 449860 (0.0011) [2024-06-15 17:25:54,695][1652475] Updated weights for policy 0, policy_version 449920 (0.0013) [2024-06-15 17:25:55,738][1648984] Fps is (10 sec: 49151.9, 60 sec: 42052.3, 300 sec: 43653.6). Total num frames: 921534464. Throughput: 0: 10676.4. Samples: 230425600. Policy #0 lag: (min: 15.0, avg: 83.8, max: 271.0) [2024-06-15 17:25:55,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 17:25:58,365][1652475] Updated weights for policy 0, policy_version 450000 (0.0013) [2024-06-15 17:26:00,738][1648984] Fps is (10 sec: 52428.3, 60 sec: 43690.6, 300 sec: 43431.5). Total num frames: 921698304. Throughput: 0: 10490.3. Samples: 230488576. Policy #0 lag: (min: 15.0, avg: 83.8, max: 271.0) [2024-06-15 17:26:00,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 17:26:04,279][1652475] Updated weights for policy 0, policy_version 450051 (0.0112) [2024-06-15 17:26:05,738][1648984] Fps is (10 sec: 29491.2, 60 sec: 41506.2, 300 sec: 43098.3). Total num frames: 921829376. Throughput: 0: 10661.0. Samples: 230525952. 
Policy #0 lag: (min: 15.0, avg: 83.8, max: 271.0) [2024-06-15 17:26:05,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 17:26:06,265][1652475] Updated weights for policy 0, policy_version 450134 (0.0096) [2024-06-15 17:26:07,901][1652475] Updated weights for policy 0, policy_version 450208 (0.0013) [2024-06-15 17:26:10,738][1648984] Fps is (10 sec: 42598.5, 60 sec: 42097.8, 300 sec: 43320.4). Total num frames: 922124288. Throughput: 0: 10490.8. Samples: 230580224. Policy #0 lag: (min: 15.0, avg: 83.8, max: 271.0) [2024-06-15 17:26:10,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 17:26:11,322][1652475] Updated weights for policy 0, policy_version 450277 (0.0013) [2024-06-15 17:26:15,738][1648984] Fps is (10 sec: 39320.4, 60 sec: 41506.0, 300 sec: 42765.0). Total num frames: 922222592. Throughput: 0: 10786.1. Samples: 230651392. Policy #0 lag: (min: 15.0, avg: 83.8, max: 271.0) [2024-06-15 17:26:15,739][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 17:26:18,221][1652475] Updated weights for policy 0, policy_version 450368 (0.0015) [2024-06-15 17:26:19,432][1652475] Updated weights for policy 0, policy_version 450418 (0.0013) [2024-06-15 17:26:20,738][1648984] Fps is (10 sec: 45875.5, 60 sec: 42598.5, 300 sec: 43431.5). Total num frames: 922583040. Throughput: 0: 10843.1. Samples: 230679552. Policy #0 lag: (min: 15.0, avg: 74.0, max: 271.0) [2024-06-15 17:26:20,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 17:26:20,954][1652475] Updated weights for policy 0, policy_version 450496 (0.0140) [2024-06-15 17:26:22,979][1651340] Signal inference workers to stop experience collection... (23100 times) [2024-06-15 17:26:23,016][1652475] InferenceWorker_p0-w0: stopping experience collection (23100 times) [2024-06-15 17:26:23,274][1651340] Signal inference workers to resume experience collection... 
(23100 times) [2024-06-15 17:26:23,275][1652475] InferenceWorker_p0-w0: resuming experience collection (23100 times) [2024-06-15 17:26:24,012][1652475] Updated weights for policy 0, policy_version 450560 (0.0017) [2024-06-15 17:26:25,738][1648984] Fps is (10 sec: 52429.9, 60 sec: 43692.1, 300 sec: 43098.3). Total num frames: 922746880. Throughput: 0: 10547.3. Samples: 230737920. Policy #0 lag: (min: 15.0, avg: 74.0, max: 271.0) [2024-06-15 17:26:25,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 17:26:30,738][1648984] Fps is (10 sec: 32767.8, 60 sec: 42052.3, 300 sec: 43209.3). Total num frames: 922910720. Throughput: 0: 10945.4. Samples: 230810112. Policy #0 lag: (min: 15.0, avg: 74.0, max: 271.0) [2024-06-15 17:26:30,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 17:26:30,826][1652475] Updated weights for policy 0, policy_version 450643 (0.0015) [2024-06-15 17:26:32,795][1652475] Updated weights for policy 0, policy_version 450722 (0.0013) [2024-06-15 17:26:35,430][1652475] Updated weights for policy 0, policy_version 450771 (0.0013) [2024-06-15 17:26:35,738][1648984] Fps is (10 sec: 45875.6, 60 sec: 42598.5, 300 sec: 43098.3). Total num frames: 923205632. Throughput: 0: 10615.5. Samples: 230832128. Policy #0 lag: (min: 15.0, avg: 74.0, max: 271.0) [2024-06-15 17:26:35,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 17:26:36,468][1652475] Updated weights for policy 0, policy_version 450814 (0.0013) [2024-06-15 17:26:40,738][1648984] Fps is (10 sec: 36044.3, 60 sec: 43144.6, 300 sec: 42653.9). Total num frames: 923271168. Throughput: 0: 10626.8. Samples: 230903808. 
Policy #0 lag: (min: 15.0, avg: 74.0, max: 271.0) [2024-06-15 17:26:40,739][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 17:26:42,602][1652475] Updated weights for policy 0, policy_version 450874 (0.0111) [2024-06-15 17:26:44,144][1652475] Updated weights for policy 0, policy_version 450928 (0.0013) [2024-06-15 17:26:45,727][1652475] Updated weights for policy 0, policy_version 450992 (0.0014) [2024-06-15 17:26:45,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 43144.5, 300 sec: 43320.4). Total num frames: 923631616. Throughput: 0: 10456.2. Samples: 230959104. Policy #0 lag: (min: 15.0, avg: 74.0, max: 271.0) [2024-06-15 17:26:45,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 17:26:47,871][1652475] Updated weights for policy 0, policy_version 451024 (0.0025) [2024-06-15 17:26:50,738][1648984] Fps is (10 sec: 52429.5, 60 sec: 43690.6, 300 sec: 42765.0). Total num frames: 923795456. Throughput: 0: 10376.5. Samples: 230992896. Policy #0 lag: (min: 15.0, avg: 74.0, max: 271.0) [2024-06-15 17:26:50,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 17:26:53,611][1652475] Updated weights for policy 0, policy_version 451088 (0.0013) [2024-06-15 17:26:55,150][1652475] Updated weights for policy 0, policy_version 451136 (0.0013) [2024-06-15 17:26:55,738][1648984] Fps is (10 sec: 32767.6, 60 sec: 40413.8, 300 sec: 42765.0). Total num frames: 923959296. Throughput: 0: 10763.4. Samples: 231064576. Policy #0 lag: (min: 15.0, avg: 74.0, max: 271.0) [2024-06-15 17:26:55,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 17:26:56,184][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000451184_924024832.pth... 
[2024-06-15 17:26:56,242][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000446208_913833984.pth [2024-06-15 17:26:56,875][1652475] Updated weights for policy 0, policy_version 451204 (0.0013) [2024-06-15 17:27:00,040][1652475] Updated weights for policy 0, policy_version 451280 (0.0012) [2024-06-15 17:27:00,738][1648984] Fps is (10 sec: 45874.9, 60 sec: 42598.4, 300 sec: 42876.1). Total num frames: 924254208. Throughput: 0: 10479.0. Samples: 231122944. Policy #0 lag: (min: 15.0, avg: 74.0, max: 271.0) [2024-06-15 17:27:00,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 17:27:05,671][1652475] Updated weights for policy 0, policy_version 451336 (0.0014) [2024-06-15 17:27:05,738][1648984] Fps is (10 sec: 36045.2, 60 sec: 41506.1, 300 sec: 42653.9). Total num frames: 924319744. Throughput: 0: 10615.5. Samples: 231157248. Policy #0 lag: (min: 15.0, avg: 74.0, max: 271.0) [2024-06-15 17:27:05,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 17:27:07,162][1652475] Updated weights for policy 0, policy_version 451392 (0.0015) [2024-06-15 17:27:08,079][1651340] Signal inference workers to stop experience collection... (23150 times) [2024-06-15 17:27:08,110][1652475] InferenceWorker_p0-w0: stopping experience collection (23150 times) [2024-06-15 17:27:08,376][1651340] Signal inference workers to resume experience collection... (23150 times) [2024-06-15 17:27:08,377][1652475] InferenceWorker_p0-w0: resuming experience collection (23150 times) [2024-06-15 17:27:08,699][1652475] Updated weights for policy 0, policy_version 451454 (0.0012) [2024-06-15 17:27:10,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 42052.3, 300 sec: 42765.0). Total num frames: 924647424. Throughput: 0: 10717.9. Samples: 231220224. 
Policy #0 lag: (min: 15.0, avg: 74.0, max: 271.0) [2024-06-15 17:27:10,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 17:27:11,141][1652475] Updated weights for policy 0, policy_version 451517 (0.0025) [2024-06-15 17:27:12,574][1652475] Updated weights for policy 0, policy_version 451574 (0.0015) [2024-06-15 17:27:15,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 43690.9, 300 sec: 42654.0). Total num frames: 924844032. Throughput: 0: 10547.2. Samples: 231284736. Policy #0 lag: (min: 15.0, avg: 74.0, max: 271.0) [2024-06-15 17:27:15,738][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 17:27:19,670][1652475] Updated weights for policy 0, policy_version 451648 (0.0013) [2024-06-15 17:27:20,738][1648984] Fps is (10 sec: 39320.9, 60 sec: 40959.8, 300 sec: 42653.9). Total num frames: 925040640. Throughput: 0: 10774.7. Samples: 231316992. Policy #0 lag: (min: 15.0, avg: 74.0, max: 271.0) [2024-06-15 17:27:20,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 17:27:23,445][1652475] Updated weights for policy 0, policy_version 451716 (0.0012) [2024-06-15 17:27:24,605][1652475] Updated weights for policy 0, policy_version 451776 (0.0014) [2024-06-15 17:27:25,738][1648984] Fps is (10 sec: 39321.2, 60 sec: 41506.1, 300 sec: 42320.7). Total num frames: 925237248. Throughput: 0: 10296.9. Samples: 231367168. Policy #0 lag: (min: 15.0, avg: 74.0, max: 271.0) [2024-06-15 17:27:25,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 17:27:27,329][1652475] Updated weights for policy 0, policy_version 451834 (0.0021) [2024-06-15 17:27:30,738][1648984] Fps is (10 sec: 36045.4, 60 sec: 41506.1, 300 sec: 42320.7). Total num frames: 925401088. Throughput: 0: 10592.7. Samples: 231435776. 
Policy #0 lag: (min: 15.0, avg: 74.0, max: 271.0) [2024-06-15 17:27:30,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 17:27:31,415][1652475] Updated weights for policy 0, policy_version 451888 (0.0012) [2024-06-15 17:27:34,244][1652475] Updated weights for policy 0, policy_version 451967 (0.0013) [2024-06-15 17:27:35,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 40959.9, 300 sec: 42542.8). Total num frames: 925663232. Throughput: 0: 10467.5. Samples: 231463936. Policy #0 lag: (min: 15.0, avg: 74.0, max: 271.0) [2024-06-15 17:27:35,739][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 17:27:36,542][1652475] Updated weights for policy 0, policy_version 452032 (0.0027) [2024-06-15 17:27:39,981][1652475] Updated weights for policy 0, policy_version 452092 (0.0018) [2024-06-15 17:27:40,738][1648984] Fps is (10 sec: 49152.2, 60 sec: 43690.8, 300 sec: 42653.9). Total num frames: 925892608. Throughput: 0: 10160.4. Samples: 231521792. Policy #0 lag: (min: 15.0, avg: 74.0, max: 271.0) [2024-06-15 17:27:40,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 17:27:42,983][1652475] Updated weights for policy 0, policy_version 452129 (0.0032) [2024-06-15 17:27:45,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 40413.8, 300 sec: 42320.7). Total num frames: 926056448. Throughput: 0: 10490.3. Samples: 231595008. Policy #0 lag: (min: 15.0, avg: 113.1, max: 271.0) [2024-06-15 17:27:45,739][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 17:27:45,767][1652475] Updated weights for policy 0, policy_version 452192 (0.0043) [2024-06-15 17:27:50,154][1652475] Updated weights for policy 0, policy_version 452304 (0.0014) [2024-06-15 17:27:50,738][1648984] Fps is (10 sec: 45874.8, 60 sec: 42598.4, 300 sec: 42431.8). Total num frames: 926351360. Throughput: 0: 10513.0. Samples: 231630336. 
Policy #0 lag: (min: 15.0, avg: 113.1, max: 271.0) [2024-06-15 17:27:50,739][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 17:27:55,738][1648984] Fps is (10 sec: 36044.9, 60 sec: 40960.1, 300 sec: 42212.7). Total num frames: 926416896. Throughput: 0: 10387.9. Samples: 231687680. Policy #0 lag: (min: 15.0, avg: 113.1, max: 271.0) [2024-06-15 17:27:55,738][1648984] Avg episode reward: [(0, '-0.220')] [2024-06-15 17:27:56,122][1652475] Updated weights for policy 0, policy_version 452384 (0.0016) [2024-06-15 17:27:56,762][1651340] Signal inference workers to stop experience collection... (23200 times) [2024-06-15 17:27:56,827][1652475] InferenceWorker_p0-w0: stopping experience collection (23200 times) [2024-06-15 17:27:57,031][1651340] Signal inference workers to resume experience collection... (23200 times) [2024-06-15 17:27:57,032][1652475] InferenceWorker_p0-w0: resuming experience collection (23200 times) [2024-06-15 17:27:57,882][1652475] Updated weights for policy 0, policy_version 452475 (0.0014) [2024-06-15 17:28:00,738][1648984] Fps is (10 sec: 32768.1, 60 sec: 40413.9, 300 sec: 42431.8). Total num frames: 926679040. Throughput: 0: 10444.8. Samples: 231754752. Policy #0 lag: (min: 15.0, avg: 113.1, max: 271.0) [2024-06-15 17:28:00,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 17:28:02,661][1652475] Updated weights for policy 0, policy_version 452547 (0.0152) [2024-06-15 17:28:03,912][1652475] Updated weights for policy 0, policy_version 452604 (0.0013) [2024-06-15 17:28:05,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 43690.7, 300 sec: 42653.9). Total num frames: 926941184. Throughput: 0: 10251.4. Samples: 231778304. 
Policy #0 lag: (min: 15.0, avg: 113.1, max: 271.0) [2024-06-15 17:28:05,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 17:28:08,981][1652475] Updated weights for policy 0, policy_version 452656 (0.0023) [2024-06-15 17:28:10,525][1652475] Updated weights for policy 0, policy_version 452729 (0.0026) [2024-06-15 17:28:10,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 42598.4, 300 sec: 42653.9). Total num frames: 927203328. Throughput: 0: 10672.4. Samples: 231847424. Policy #0 lag: (min: 15.0, avg: 113.1, max: 271.0) [2024-06-15 17:28:10,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 17:28:14,136][1652475] Updated weights for policy 0, policy_version 452787 (0.0096) [2024-06-15 17:28:15,738][1648984] Fps is (10 sec: 49151.4, 60 sec: 43144.4, 300 sec: 42542.8). Total num frames: 927432704. Throughput: 0: 10524.4. Samples: 231909376. Policy #0 lag: (min: 15.0, avg: 113.1, max: 271.0) [2024-06-15 17:28:15,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 17:28:20,173][1652475] Updated weights for policy 0, policy_version 452880 (0.0166) [2024-06-15 17:28:20,738][1648984] Fps is (10 sec: 36044.7, 60 sec: 42052.4, 300 sec: 42653.9). Total num frames: 927563776. Throughput: 0: 10717.9. Samples: 231946240. Policy #0 lag: (min: 15.0, avg: 113.1, max: 271.0) [2024-06-15 17:28:20,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 17:28:21,595][1652475] Updated weights for policy 0, policy_version 452944 (0.0091) [2024-06-15 17:28:25,236][1652475] Updated weights for policy 0, policy_version 453008 (0.0016) [2024-06-15 17:28:25,738][1648984] Fps is (10 sec: 32768.1, 60 sec: 42052.3, 300 sec: 42320.7). Total num frames: 927760384. Throughput: 0: 10945.4. Samples: 232014336. 
Policy #0 lag: (min: 15.0, avg: 113.1, max: 271.0) [2024-06-15 17:28:25,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 17:28:27,580][1652475] Updated weights for policy 0, policy_version 453091 (0.0114) [2024-06-15 17:28:30,738][1648984] Fps is (10 sec: 42598.6, 60 sec: 43144.6, 300 sec: 42654.2). Total num frames: 927989760. Throughput: 0: 10797.5. Samples: 232080896. Policy #0 lag: (min: 15.0, avg: 113.1, max: 271.0) [2024-06-15 17:28:30,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 17:28:31,686][1652475] Updated weights for policy 0, policy_version 453121 (0.0012) [2024-06-15 17:28:33,105][1652475] Updated weights for policy 0, policy_version 453187 (0.0131) [2024-06-15 17:28:35,738][1648984] Fps is (10 sec: 49152.1, 60 sec: 43144.6, 300 sec: 42653.9). Total num frames: 928251904. Throughput: 0: 10786.1. Samples: 232115712. Policy #0 lag: (min: 15.0, avg: 113.1, max: 271.0) [2024-06-15 17:28:35,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 17:28:37,409][1652475] Updated weights for policy 0, policy_version 453264 (0.0094) [2024-06-15 17:28:39,721][1651340] Signal inference workers to stop experience collection... (23250 times) [2024-06-15 17:28:39,756][1652475] InferenceWorker_p0-w0: stopping experience collection (23250 times) [2024-06-15 17:28:39,778][1652475] Updated weights for policy 0, policy_version 453347 (0.0012) [2024-06-15 17:28:40,006][1651340] Signal inference workers to resume experience collection... (23250 times) [2024-06-15 17:28:40,015][1652475] InferenceWorker_p0-w0: resuming experience collection (23250 times) [2024-06-15 17:28:40,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 43690.7, 300 sec: 42653.9). Total num frames: 928514048. Throughput: 0: 10820.3. Samples: 232174592. 
Policy #0 lag: (min: 15.0, avg: 113.1, max: 271.0) [2024-06-15 17:28:40,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 17:28:44,460][1652475] Updated weights for policy 0, policy_version 453408 (0.0018) [2024-06-15 17:28:45,738][1648984] Fps is (10 sec: 39321.9, 60 sec: 43144.6, 300 sec: 42320.7). Total num frames: 928645120. Throughput: 0: 10774.8. Samples: 232239616. Policy #0 lag: (min: 15.0, avg: 113.1, max: 271.0) [2024-06-15 17:28:45,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 17:28:47,003][1652475] Updated weights for policy 0, policy_version 453504 (0.0015) [2024-06-15 17:28:50,746][1648984] Fps is (10 sec: 32740.3, 60 sec: 41500.3, 300 sec: 42208.4). Total num frames: 928841728. Throughput: 0: 10818.2. Samples: 232265216. Policy #0 lag: (min: 15.0, avg: 113.1, max: 271.0) [2024-06-15 17:28:50,747][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 17:28:51,502][1652475] Updated weights for policy 0, policy_version 453573 (0.0095) [2024-06-15 17:28:55,752][1648984] Fps is (10 sec: 39266.1, 60 sec: 43680.4, 300 sec: 42540.8). Total num frames: 929038336. Throughput: 0: 10839.6. Samples: 232335360. Policy #0 lag: (min: 15.0, avg: 113.1, max: 271.0) [2024-06-15 17:28:55,752][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 17:28:55,785][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000453632_929038336.pth... [2024-06-15 17:28:55,854][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000448768_919076864.pth [2024-06-15 17:28:56,202][1652475] Updated weights for policy 0, policy_version 453635 (0.0012) [2024-06-15 17:28:58,423][1652475] Updated weights for policy 0, policy_version 453728 (0.0013) [2024-06-15 17:28:59,124][1652475] Updated weights for policy 0, policy_version 453755 (0.0012) [2024-06-15 17:29:00,738][1648984] Fps is (10 sec: 45914.0, 60 sec: 43690.7, 300 sec: 42209.6). Total num frames: 929300480. Throughput: 0: 10888.6. 
Samples: 232399360. Policy #0 lag: (min: 15.0, avg: 113.1, max: 271.0) [2024-06-15 17:29:00,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 17:29:02,429][1652475] Updated weights for policy 0, policy_version 453792 (0.0011) [2024-06-15 17:29:04,514][1652475] Updated weights for policy 0, policy_version 453885 (0.0013) [2024-06-15 17:29:05,738][1648984] Fps is (10 sec: 52502.9, 60 sec: 43690.7, 300 sec: 42653.9). Total num frames: 929562624. Throughput: 0: 10808.9. Samples: 232432640. Policy #0 lag: (min: 15.0, avg: 113.1, max: 271.0) [2024-06-15 17:29:05,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 17:29:10,260][1652475] Updated weights for policy 0, policy_version 453968 (0.0094) [2024-06-15 17:29:10,738][1648984] Fps is (10 sec: 45875.4, 60 sec: 42598.4, 300 sec: 42320.7). Total num frames: 929759232. Throughput: 0: 10763.4. Samples: 232498688. Policy #0 lag: (min: 13.0, avg: 91.3, max: 253.0) [2024-06-15 17:29:10,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 17:29:11,452][1652475] Updated weights for policy 0, policy_version 454015 (0.0013) [2024-06-15 17:29:15,407][1652475] Updated weights for policy 0, policy_version 454080 (0.0015) [2024-06-15 17:29:15,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 42052.3, 300 sec: 42209.6). Total num frames: 929955840. Throughput: 0: 10717.9. Samples: 232563200. Policy #0 lag: (min: 13.0, avg: 91.3, max: 253.0) [2024-06-15 17:29:15,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 17:29:16,773][1652475] Updated weights for policy 0, policy_version 454144 (0.0107) [2024-06-15 17:29:20,738][1648984] Fps is (10 sec: 32768.1, 60 sec: 42052.3, 300 sec: 42209.6). Total num frames: 930086912. Throughput: 0: 10615.5. Samples: 232593408. 
Policy #0 lag: (min: 13.0, avg: 91.3, max: 253.0) [2024-06-15 17:29:20,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 17:29:21,857][1652475] Updated weights for policy 0, policy_version 454199 (0.0014) [2024-06-15 17:29:23,343][1652475] Updated weights for policy 0, policy_version 454263 (0.0015) [2024-06-15 17:29:25,337][1651340] Signal inference workers to stop experience collection... (23300 times) [2024-06-15 17:29:25,401][1652475] InferenceWorker_p0-w0: stopping experience collection (23300 times) [2024-06-15 17:29:25,660][1651340] Signal inference workers to resume experience collection... (23300 times) [2024-06-15 17:29:25,661][1652475] InferenceWorker_p0-w0: resuming experience collection (23300 times) [2024-06-15 17:29:25,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 43690.7, 300 sec: 42209.6). Total num frames: 930381824. Throughput: 0: 10877.2. Samples: 232664064. Policy #0 lag: (min: 13.0, avg: 91.3, max: 253.0) [2024-06-15 17:29:25,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 17:29:26,491][1652475] Updated weights for policy 0, policy_version 454320 (0.0012) [2024-06-15 17:29:28,182][1652475] Updated weights for policy 0, policy_version 454390 (0.0013) [2024-06-15 17:29:30,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 43690.7, 300 sec: 42653.9). Total num frames: 930611200. Throughput: 0: 10968.2. Samples: 232733184. Policy #0 lag: (min: 13.0, avg: 91.3, max: 253.0) [2024-06-15 17:29:30,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 17:29:32,178][1652475] Updated weights for policy 0, policy_version 454419 (0.0015) [2024-06-15 17:29:33,456][1652475] Updated weights for policy 0, policy_version 454480 (0.0015) [2024-06-15 17:29:34,592][1652475] Updated weights for policy 0, policy_version 454526 (0.0035) [2024-06-15 17:29:35,739][1648984] Fps is (10 sec: 49143.9, 60 sec: 43689.5, 300 sec: 42431.5). Total num frames: 930873344. Throughput: 0: 11163.3. Samples: 232767488. 
Policy #0 lag: (min: 13.0, avg: 91.3, max: 253.0) [2024-06-15 17:29:35,740][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 17:29:37,987][1652475] Updated weights for policy 0, policy_version 454585 (0.0011) [2024-06-15 17:29:40,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 43690.7, 300 sec: 42653.9). Total num frames: 931135488. Throughput: 0: 11085.4. Samples: 232834048. Policy #0 lag: (min: 13.0, avg: 91.3, max: 253.0) [2024-06-15 17:29:40,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 17:29:43,025][1652475] Updated weights for policy 0, policy_version 454657 (0.0015) [2024-06-15 17:29:44,286][1652475] Updated weights for policy 0, policy_version 454720 (0.0014) [2024-06-15 17:29:45,737][1648984] Fps is (10 sec: 39328.9, 60 sec: 43690.8, 300 sec: 42431.8). Total num frames: 931266560. Throughput: 0: 10979.6. Samples: 232893440. Policy #0 lag: (min: 13.0, avg: 91.3, max: 253.0) [2024-06-15 17:29:45,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 17:29:47,287][1652475] Updated weights for policy 0, policy_version 454784 (0.0014) [2024-06-15 17:29:49,843][1652475] Updated weights for policy 0, policy_version 454848 (0.0014) [2024-06-15 17:29:50,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 44789.3, 300 sec: 42431.8). Total num frames: 931528704. Throughput: 0: 10911.3. Samples: 232923648. Policy #0 lag: (min: 13.0, avg: 91.3, max: 253.0) [2024-06-15 17:29:50,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 17:29:55,745][1648984] Fps is (10 sec: 42567.8, 60 sec: 44242.0, 300 sec: 42764.0). Total num frames: 931692544. Throughput: 0: 10784.4. Samples: 232984064. 
Policy #0 lag: (min: 13.0, avg: 91.3, max: 253.0) [2024-06-15 17:29:55,745][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 17:29:55,988][1652475] Updated weights for policy 0, policy_version 454945 (0.0015) [2024-06-15 17:29:59,029][1652475] Updated weights for policy 0, policy_version 454981 (0.0010) [2024-06-15 17:30:00,329][1652475] Updated weights for policy 0, policy_version 455036 (0.0012) [2024-06-15 17:30:00,742][1648984] Fps is (10 sec: 39305.0, 60 sec: 43687.6, 300 sec: 42653.3). Total num frames: 931921920. Throughput: 0: 10796.5. Samples: 233049088. Policy #0 lag: (min: 13.0, avg: 91.3, max: 253.0) [2024-06-15 17:30:00,743][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 17:30:01,974][1652475] Updated weights for policy 0, policy_version 455074 (0.0013) [2024-06-15 17:30:05,738][1648984] Fps is (10 sec: 39349.4, 60 sec: 42052.3, 300 sec: 42330.0). Total num frames: 932085760. Throughput: 0: 10843.0. Samples: 233081344. Policy #0 lag: (min: 13.0, avg: 91.3, max: 253.0) [2024-06-15 17:30:05,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 17:30:06,205][1652475] Updated weights for policy 0, policy_version 455152 (0.0014) [2024-06-15 17:30:08,180][1652475] Updated weights for policy 0, policy_version 455202 (0.0012) [2024-06-15 17:30:10,738][1648984] Fps is (10 sec: 39337.9, 60 sec: 42598.3, 300 sec: 42653.9). Total num frames: 932315136. Throughput: 0: 10638.2. Samples: 233142784. Policy #0 lag: (min: 13.0, avg: 91.3, max: 253.0) [2024-06-15 17:30:10,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 17:30:11,034][1652475] Updated weights for policy 0, policy_version 455248 (0.0013) [2024-06-15 17:30:11,573][1651340] Signal inference workers to stop experience collection... (23350 times) [2024-06-15 17:30:11,626][1652475] InferenceWorker_p0-w0: stopping experience collection (23350 times) [2024-06-15 17:30:11,893][1651340] Signal inference workers to resume experience collection... 
(23350 times) [2024-06-15 17:30:11,893][1652475] InferenceWorker_p0-w0: resuming experience collection (23350 times) [2024-06-15 17:30:15,738][1648984] Fps is (10 sec: 39320.9, 60 sec: 42052.2, 300 sec: 42209.6). Total num frames: 932478976. Throughput: 0: 10558.5. Samples: 233208320. Policy #0 lag: (min: 13.0, avg: 91.3, max: 253.0) [2024-06-15 17:30:15,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 17:30:15,756][1652475] Updated weights for policy 0, policy_version 455315 (0.0015) [2024-06-15 17:30:17,972][1652475] Updated weights for policy 0, policy_version 455408 (0.0017) [2024-06-15 17:30:20,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 43690.6, 300 sec: 42654.2). Total num frames: 932708352. Throughput: 0: 10263.1. Samples: 233229312. Policy #0 lag: (min: 13.0, avg: 91.3, max: 253.0) [2024-06-15 17:30:20,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 17:30:21,197][1652475] Updated weights for policy 0, policy_version 455443 (0.0012) [2024-06-15 17:30:23,006][1652475] Updated weights for policy 0, policy_version 455520 (0.0019) [2024-06-15 17:30:25,738][1648984] Fps is (10 sec: 49152.4, 60 sec: 43144.5, 300 sec: 42653.9). Total num frames: 932970496. Throughput: 0: 10387.9. Samples: 233301504. Policy #0 lag: (min: 13.0, avg: 91.3, max: 253.0) [2024-06-15 17:30:25,738][1648984] Avg episode reward: [(0, '-0.250')] [2024-06-15 17:30:28,509][1652475] Updated weights for policy 0, policy_version 455600 (0.0014) [2024-06-15 17:30:30,263][1652475] Updated weights for policy 0, policy_version 455673 (0.0014) [2024-06-15 17:30:30,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 43690.7, 300 sec: 42654.0). Total num frames: 933232640. Throughput: 0: 10399.2. Samples: 233361408. 
Policy #0 lag: (min: 13.0, avg: 91.3, max: 253.0) [2024-06-15 17:30:30,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 17:30:33,967][1652475] Updated weights for policy 0, policy_version 455717 (0.0134) [2024-06-15 17:30:35,738][1648984] Fps is (10 sec: 49151.8, 60 sec: 43145.7, 300 sec: 43320.4). Total num frames: 933462016. Throughput: 0: 10706.5. Samples: 233405440. Policy #0 lag: (min: 9.0, avg: 105.9, max: 265.0) [2024-06-15 17:30:35,739][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 17:30:35,768][1652475] Updated weights for policy 0, policy_version 455806 (0.0014) [2024-06-15 17:30:40,738][1648984] Fps is (10 sec: 32767.7, 60 sec: 40413.8, 300 sec: 42431.8). Total num frames: 933560320. Throughput: 0: 10765.0. Samples: 233468416. Policy #0 lag: (min: 9.0, avg: 105.9, max: 265.0) [2024-06-15 17:30:40,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 17:30:41,381][1652475] Updated weights for policy 0, policy_version 455876 (0.0012) [2024-06-15 17:30:42,722][1652475] Updated weights for policy 0, policy_version 455925 (0.0017) [2024-06-15 17:30:45,738][1648984] Fps is (10 sec: 32768.0, 60 sec: 42052.1, 300 sec: 42765.0). Total num frames: 933789696. Throughput: 0: 10809.9. Samples: 233535488. Policy #0 lag: (min: 9.0, avg: 105.9, max: 265.0) [2024-06-15 17:30:45,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 17:30:46,130][1652475] Updated weights for policy 0, policy_version 455972 (0.0016) [2024-06-15 17:30:50,738][1648984] Fps is (10 sec: 45875.5, 60 sec: 41506.1, 300 sec: 42320.7). Total num frames: 934019072. Throughput: 0: 10626.8. Samples: 233559552. 
Policy #0 lag: (min: 9.0, avg: 105.9, max: 265.0) [2024-06-15 17:30:50,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 17:30:51,428][1652475] Updated weights for policy 0, policy_version 456065 (0.0097) [2024-06-15 17:30:53,300][1652475] Updated weights for policy 0, policy_version 456129 (0.0039) [2024-06-15 17:30:54,817][1652475] Updated weights for policy 0, policy_version 456189 (0.0019) [2024-06-15 17:30:55,738][1648984] Fps is (10 sec: 49151.5, 60 sec: 43149.4, 300 sec: 42653.9). Total num frames: 934281216. Throughput: 0: 10752.0. Samples: 233626624. Policy #0 lag: (min: 9.0, avg: 105.9, max: 265.0) [2024-06-15 17:30:55,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 17:30:55,745][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000456192_934281216.pth... [2024-06-15 17:30:55,814][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000451184_924024832.pth [2024-06-15 17:30:55,819][1651340] Saving a milestone train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/milestones/checkpoint_000456192_934281216.pth [2024-06-15 17:30:58,306][1651340] Signal inference workers to stop experience collection... (23400 times) [2024-06-15 17:30:58,357][1652475] InferenceWorker_p0-w0: stopping experience collection (23400 times) [2024-06-15 17:30:58,384][1652475] Updated weights for policy 0, policy_version 456245 (0.0077) [2024-06-15 17:30:58,509][1651340] Signal inference workers to resume experience collection... (23400 times) [2024-06-15 17:30:58,511][1652475] InferenceWorker_p0-w0: resuming experience collection (23400 times) [2024-06-15 17:30:59,541][1652475] Updated weights for policy 0, policy_version 456304 (0.0014) [2024-06-15 17:31:00,738][1648984] Fps is (10 sec: 52429.1, 60 sec: 43693.8, 300 sec: 43098.3). Total num frames: 934543360. Throughput: 0: 10831.7. Samples: 233695744. 
Policy #0 lag: (min: 9.0, avg: 105.9, max: 265.0) [2024-06-15 17:31:00,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 17:31:03,000][1652475] Updated weights for policy 0, policy_version 456325 (0.0012) [2024-06-15 17:31:04,461][1652475] Updated weights for policy 0, policy_version 456389 (0.0022) [2024-06-15 17:31:05,738][1648984] Fps is (10 sec: 49153.0, 60 sec: 44782.9, 300 sec: 42876.1). Total num frames: 934772736. Throughput: 0: 11264.0. Samples: 233736192. Policy #0 lag: (min: 9.0, avg: 105.9, max: 265.0) [2024-06-15 17:31:05,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 17:31:05,808][1652475] Updated weights for policy 0, policy_version 456446 (0.0011) [2024-06-15 17:31:09,796][1652475] Updated weights for policy 0, policy_version 456514 (0.0140) [2024-06-15 17:31:10,665][1652475] Updated weights for policy 0, policy_version 456564 (0.0012) [2024-06-15 17:31:10,738][1648984] Fps is (10 sec: 49151.5, 60 sec: 45329.1, 300 sec: 43431.5). Total num frames: 935034880. Throughput: 0: 11082.0. Samples: 233800192. Policy #0 lag: (min: 9.0, avg: 105.9, max: 265.0) [2024-06-15 17:31:10,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 17:31:15,592][1652475] Updated weights for policy 0, policy_version 456640 (0.0098) [2024-06-15 17:31:15,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 45329.2, 300 sec: 42765.0). Total num frames: 935198720. Throughput: 0: 11332.3. Samples: 233871360. Policy #0 lag: (min: 9.0, avg: 105.9, max: 265.0) [2024-06-15 17:31:15,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 17:31:17,683][1652475] Updated weights for policy 0, policy_version 456702 (0.0014) [2024-06-15 17:31:20,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 45329.1, 300 sec: 42987.2). Total num frames: 935428096. Throughput: 0: 11002.3. Samples: 233900544. 
Policy #0 lag: (min: 9.0, avg: 105.9, max: 265.0) [2024-06-15 17:31:20,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 17:31:21,334][1652475] Updated weights for policy 0, policy_version 456784 (0.0014) [2024-06-15 17:31:25,738][1648984] Fps is (10 sec: 39320.2, 60 sec: 43690.5, 300 sec: 42987.1). Total num frames: 935591936. Throughput: 0: 11172.9. Samples: 233971200. Policy #0 lag: (min: 9.0, avg: 105.9, max: 265.0) [2024-06-15 17:31:25,739][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 17:31:27,086][1652475] Updated weights for policy 0, policy_version 456849 (0.0014) [2024-06-15 17:31:28,815][1652475] Updated weights for policy 0, policy_version 456916 (0.0015) [2024-06-15 17:31:29,710][1652475] Updated weights for policy 0, policy_version 456960 (0.0016) [2024-06-15 17:31:30,738][1648984] Fps is (10 sec: 42598.7, 60 sec: 43690.7, 300 sec: 42876.1). Total num frames: 935854080. Throughput: 0: 11070.6. Samples: 234033664. Policy #0 lag: (min: 9.0, avg: 105.9, max: 265.0) [2024-06-15 17:31:30,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 17:31:33,709][1652475] Updated weights for policy 0, policy_version 457030 (0.0140) [2024-06-15 17:31:35,146][1652475] Updated weights for policy 0, policy_version 457087 (0.0014) [2024-06-15 17:31:35,738][1648984] Fps is (10 sec: 52430.5, 60 sec: 44236.9, 300 sec: 43542.6). Total num frames: 936116224. Throughput: 0: 11252.6. Samples: 234065920. Policy #0 lag: (min: 9.0, avg: 105.9, max: 265.0) [2024-06-15 17:31:35,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 17:31:40,195][1652475] Updated weights for policy 0, policy_version 457152 (0.0039) [2024-06-15 17:31:40,735][1651340] Signal inference workers to stop experience collection... (23450 times) [2024-06-15 17:31:40,738][1648984] Fps is (10 sec: 42598.2, 60 sec: 45329.1, 300 sec: 42876.1). Total num frames: 936280064. Throughput: 0: 11286.8. Samples: 234134528. 
Policy #0 lag: (min: 9.0, avg: 105.9, max: 265.0) [2024-06-15 17:31:40,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 17:31:40,766][1652475] InferenceWorker_p0-w0: stopping experience collection (23450 times) [2024-06-15 17:31:40,992][1651340] Signal inference workers to resume experience collection... (23450 times) [2024-06-15 17:31:40,994][1652475] InferenceWorker_p0-w0: resuming experience collection (23450 times) [2024-06-15 17:31:41,613][1652475] Updated weights for policy 0, policy_version 457211 (0.0013) [2024-06-15 17:31:45,738][1648984] Fps is (10 sec: 36045.1, 60 sec: 44783.1, 300 sec: 42987.2). Total num frames: 936476672. Throughput: 0: 11104.7. Samples: 234195456. Policy #0 lag: (min: 9.0, avg: 105.9, max: 265.0) [2024-06-15 17:31:45,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 17:31:45,862][1652475] Updated weights for policy 0, policy_version 457268 (0.0012) [2024-06-15 17:31:47,588][1652475] Updated weights for policy 0, policy_version 457338 (0.0026) [2024-06-15 17:31:50,738][1648984] Fps is (10 sec: 36044.7, 60 sec: 43690.7, 300 sec: 42987.2). Total num frames: 936640512. Throughput: 0: 10854.4. Samples: 234224640. Policy #0 lag: (min: 9.0, avg: 105.9, max: 265.0) [2024-06-15 17:31:50,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 17:31:52,192][1652475] Updated weights for policy 0, policy_version 457403 (0.0014) [2024-06-15 17:31:54,166][1652475] Updated weights for policy 0, policy_version 457456 (0.0129) [2024-06-15 17:31:55,738][1648984] Fps is (10 sec: 42597.4, 60 sec: 43690.7, 300 sec: 42876.1). Total num frames: 936902656. Throughput: 0: 10763.4. Samples: 234284544. Policy #0 lag: (min: 9.0, avg: 105.9, max: 265.0) [2024-06-15 17:31:55,739][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 17:31:58,317][1652475] Updated weights for policy 0, policy_version 457536 (0.0013) [2024-06-15 17:32:00,738][1648984] Fps is (10 sec: 52427.7, 60 sec: 43690.5, 300 sec: 43542.5). 
Total num frames: 937164800. Throughput: 0: 10569.9. Samples: 234347008. Policy #0 lag: (min: 31.0, avg: 128.9, max: 287.0) [2024-06-15 17:32:00,739][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 17:32:02,997][1652475] Updated weights for policy 0, policy_version 457616 (0.0017) [2024-06-15 17:32:05,738][1648984] Fps is (10 sec: 39322.3, 60 sec: 42052.3, 300 sec: 42876.1). Total num frames: 937295872. Throughput: 0: 10672.4. Samples: 234380800. Policy #0 lag: (min: 31.0, avg: 128.9, max: 287.0) [2024-06-15 17:32:05,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 17:32:09,203][1652475] Updated weights for policy 0, policy_version 457712 (0.0110) [2024-06-15 17:32:10,678][1652475] Updated weights for policy 0, policy_version 457776 (0.0012) [2024-06-15 17:32:10,738][1648984] Fps is (10 sec: 36045.8, 60 sec: 41506.2, 300 sec: 42987.2). Total num frames: 937525248. Throughput: 0: 10399.4. Samples: 234439168. Policy #0 lag: (min: 31.0, avg: 128.9, max: 287.0) [2024-06-15 17:32:10,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 17:32:12,309][1652475] Updated weights for policy 0, policy_version 457812 (0.0011) [2024-06-15 17:32:13,405][1652475] Updated weights for policy 0, policy_version 457856 (0.0010) [2024-06-15 17:32:15,738][1648984] Fps is (10 sec: 42598.8, 60 sec: 42052.3, 300 sec: 42987.2). Total num frames: 937721856. Throughput: 0: 10342.4. Samples: 234499072. Policy #0 lag: (min: 31.0, avg: 128.9, max: 287.0) [2024-06-15 17:32:15,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 17:32:20,740][1648984] Fps is (10 sec: 29490.9, 60 sec: 39867.7, 300 sec: 42653.9). Total num frames: 937820160. Throughput: 0: 10262.8. Samples: 234527744. 
Policy #0 lag: (min: 31.0, avg: 128.9, max: 287.0) [2024-06-15 17:32:20,741][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 17:32:21,013][1652475] Updated weights for policy 0, policy_version 457940 (0.0013) [2024-06-15 17:32:23,080][1652475] Updated weights for policy 0, policy_version 458032 (0.0013) [2024-06-15 17:32:24,522][1652475] Updated weights for policy 0, policy_version 458086 (0.0028) [2024-06-15 17:32:25,738][1648984] Fps is (10 sec: 49150.9, 60 sec: 43690.8, 300 sec: 43431.5). Total num frames: 938213376. Throughput: 0: 10114.8. Samples: 234589696. Policy #0 lag: (min: 31.0, avg: 128.9, max: 287.0) [2024-06-15 17:32:25,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 17:32:30,072][1651340] Signal inference workers to stop experience collection... (23500 times) [2024-06-15 17:32:30,107][1652475] InferenceWorker_p0-w0: stopping experience collection (23500 times) [2024-06-15 17:32:30,354][1651340] Signal inference workers to resume experience collection... (23500 times) [2024-06-15 17:32:30,355][1652475] InferenceWorker_p0-w0: resuming experience collection (23500 times) [2024-06-15 17:32:30,357][1652475] Updated weights for policy 0, policy_version 458160 (0.0014) [2024-06-15 17:32:30,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 41506.1, 300 sec: 42987.2). Total num frames: 938344448. Throughput: 0: 10137.6. Samples: 234651648. Policy #0 lag: (min: 31.0, avg: 128.9, max: 287.0) [2024-06-15 17:32:30,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 17:32:34,556][1652475] Updated weights for policy 0, policy_version 458237 (0.0029) [2024-06-15 17:32:35,738][1648984] Fps is (10 sec: 32768.7, 60 sec: 40413.9, 300 sec: 42876.1). Total num frames: 938541056. Throughput: 0: 10251.4. Samples: 234685952. 
Policy #0 lag: (min: 31.0, avg: 128.9, max: 287.0) [2024-06-15 17:32:35,740][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 17:32:36,311][1652475] Updated weights for policy 0, policy_version 458304 (0.0014) [2024-06-15 17:32:37,727][1652475] Updated weights for policy 0, policy_version 458367 (0.0013) [2024-06-15 17:32:40,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 40959.9, 300 sec: 42987.2). Total num frames: 938737664. Throughput: 0: 10149.0. Samples: 234741248. Policy #0 lag: (min: 31.0, avg: 128.9, max: 287.0) [2024-06-15 17:32:40,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 17:32:44,374][1652475] Updated weights for policy 0, policy_version 458429 (0.0015) [2024-06-15 17:32:45,738][1648984] Fps is (10 sec: 36044.4, 60 sec: 40413.8, 300 sec: 42542.9). Total num frames: 938901504. Throughput: 0: 10217.3. Samples: 234806784. Policy #0 lag: (min: 31.0, avg: 128.9, max: 287.0) [2024-06-15 17:32:45,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 17:32:47,023][1652475] Updated weights for policy 0, policy_version 458500 (0.0014) [2024-06-15 17:32:48,117][1652475] Updated weights for policy 0, policy_version 458552 (0.0013) [2024-06-15 17:32:50,706][1652475] Updated weights for policy 0, policy_version 458608 (0.0014) [2024-06-15 17:32:50,738][1648984] Fps is (10 sec: 49152.5, 60 sec: 43144.5, 300 sec: 43431.5). Total num frames: 939229184. Throughput: 0: 10171.7. Samples: 234838528. Policy #0 lag: (min: 31.0, avg: 128.9, max: 287.0) [2024-06-15 17:32:50,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 17:32:54,952][1652475] Updated weights for policy 0, policy_version 458660 (0.0019) [2024-06-15 17:32:55,738][1648984] Fps is (10 sec: 49152.0, 60 sec: 41506.2, 300 sec: 43098.3). Total num frames: 939393024. Throughput: 0: 10467.5. Samples: 234910208. 
Policy #0 lag: (min: 31.0, avg: 128.9, max: 287.0) [2024-06-15 17:32:55,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 17:32:55,742][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000458688_939393024.pth... [2024-06-15 17:32:55,790][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000453632_929038336.pth [2024-06-15 17:32:57,043][1652475] Updated weights for policy 0, policy_version 458704 (0.0012) [2024-06-15 17:32:57,969][1652475] Updated weights for policy 0, policy_version 458745 (0.0029) [2024-06-15 17:32:59,575][1652475] Updated weights for policy 0, policy_version 458804 (0.0020) [2024-06-15 17:33:00,738][1648984] Fps is (10 sec: 42598.6, 60 sec: 41506.3, 300 sec: 43098.3). Total num frames: 939655168. Throughput: 0: 10649.6. Samples: 234978304. Policy #0 lag: (min: 31.0, avg: 128.9, max: 287.0) [2024-06-15 17:33:00,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 17:33:02,356][1652475] Updated weights for policy 0, policy_version 458870 (0.0139) [2024-06-15 17:33:05,738][1648984] Fps is (10 sec: 42598.8, 60 sec: 42052.3, 300 sec: 42765.0). Total num frames: 939819008. Throughput: 0: 10752.0. Samples: 235011584. Policy #0 lag: (min: 31.0, avg: 128.9, max: 287.0) [2024-06-15 17:33:05,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 17:33:06,193][1652475] Updated weights for policy 0, policy_version 458919 (0.0013) [2024-06-15 17:33:08,391][1652475] Updated weights for policy 0, policy_version 458964 (0.0013) [2024-06-15 17:33:09,152][1652475] Updated weights for policy 0, policy_version 459006 (0.0013) [2024-06-15 17:33:10,738][1648984] Fps is (10 sec: 45875.1, 60 sec: 43144.5, 300 sec: 42987.2). Total num frames: 940113920. Throughput: 0: 11013.7. Samples: 235085312. 
Policy #0 lag: (min: 31.0, avg: 128.9, max: 287.0) [2024-06-15 17:33:10,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 17:33:10,938][1652475] Updated weights for policy 0, policy_version 459060 (0.0013) [2024-06-15 17:33:13,939][1652475] Updated weights for policy 0, policy_version 459130 (0.0012) [2024-06-15 17:33:15,738][1648984] Fps is (10 sec: 49151.6, 60 sec: 43144.5, 300 sec: 43209.3). Total num frames: 940310528. Throughput: 0: 11070.6. Samples: 235149824. Policy #0 lag: (min: 31.0, avg: 128.9, max: 287.0) [2024-06-15 17:33:15,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 17:33:17,396][1651340] Signal inference workers to stop experience collection... (23550 times) [2024-06-15 17:33:17,447][1652475] InferenceWorker_p0-w0: stopping experience collection (23550 times) [2024-06-15 17:33:17,632][1651340] Signal inference workers to resume experience collection... (23550 times) [2024-06-15 17:33:17,633][1652475] InferenceWorker_p0-w0: resuming experience collection (23550 times) [2024-06-15 17:33:17,635][1652475] Updated weights for policy 0, policy_version 459184 (0.0013) [2024-06-15 17:33:20,144][1652475] Updated weights for policy 0, policy_version 459232 (0.0012) [2024-06-15 17:33:20,738][1648984] Fps is (10 sec: 42597.7, 60 sec: 45329.0, 300 sec: 43320.4). Total num frames: 940539904. Throughput: 0: 11070.5. Samples: 235184128. Policy #0 lag: (min: 31.0, avg: 128.9, max: 287.0) [2024-06-15 17:33:20,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 17:33:21,650][1652475] Updated weights for policy 0, policy_version 459282 (0.0013) [2024-06-15 17:33:22,497][1652475] Updated weights for policy 0, policy_version 459319 (0.0012) [2024-06-15 17:33:25,257][1652475] Updated weights for policy 0, policy_version 459361 (0.0032) [2024-06-15 17:33:25,738][1648984] Fps is (10 sec: 49151.9, 60 sec: 43144.6, 300 sec: 43431.5). Total num frames: 940802048. Throughput: 0: 11468.8. Samples: 235257344. 
Policy #0 lag: (min: 41.0, avg: 163.9, max: 287.0) [2024-06-15 17:33:25,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 17:33:28,138][1652475] Updated weights for policy 0, policy_version 459413 (0.0013) [2024-06-15 17:33:29,126][1652475] Updated weights for policy 0, policy_version 459456 (0.0014) [2024-06-15 17:33:30,738][1648984] Fps is (10 sec: 42599.0, 60 sec: 43690.7, 300 sec: 43098.3). Total num frames: 940965888. Throughput: 0: 11571.2. Samples: 235327488. Policy #0 lag: (min: 41.0, avg: 163.9, max: 287.0) [2024-06-15 17:33:30,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 17:33:32,873][1652475] Updated weights for policy 0, policy_version 459536 (0.0013) [2024-06-15 17:33:33,948][1652475] Updated weights for policy 0, policy_version 459581 (0.0015) [2024-06-15 17:33:35,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 44782.9, 300 sec: 43098.2). Total num frames: 941228032. Throughput: 0: 11593.9. Samples: 235360256. Policy #0 lag: (min: 41.0, avg: 163.9, max: 287.0) [2024-06-15 17:33:35,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 17:33:36,959][1652475] Updated weights for policy 0, policy_version 459639 (0.0015) [2024-06-15 17:33:39,798][1652475] Updated weights for policy 0, policy_version 459684 (0.0016) [2024-06-15 17:33:40,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 45875.3, 300 sec: 43542.6). Total num frames: 941490176. Throughput: 0: 11548.4. Samples: 235429888. Policy #0 lag: (min: 41.0, avg: 163.9, max: 287.0) [2024-06-15 17:33:40,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 17:33:41,956][1652475] Updated weights for policy 0, policy_version 459728 (0.0014) [2024-06-15 17:33:42,945][1652475] Updated weights for policy 0, policy_version 459775 (0.0022) [2024-06-15 17:33:44,972][1652475] Updated weights for policy 0, policy_version 459831 (0.0161) [2024-06-15 17:33:45,738][1648984] Fps is (10 sec: 52429.1, 60 sec: 47513.7, 300 sec: 43766.0). Total num frames: 941752320. 
Throughput: 0: 11468.8. Samples: 235494400. Policy #0 lag: (min: 41.0, avg: 163.9, max: 287.0) [2024-06-15 17:33:45,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 17:33:48,511][1652475] Updated weights for policy 0, policy_version 459888 (0.0015) [2024-06-15 17:33:50,778][1648984] Fps is (10 sec: 42426.7, 60 sec: 44752.7, 300 sec: 43649.7). Total num frames: 941916160. Throughput: 0: 11515.3. Samples: 235530240. Policy #0 lag: (min: 41.0, avg: 163.9, max: 287.0) [2024-06-15 17:33:50,779][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 17:33:51,473][1652475] Updated weights for policy 0, policy_version 459957 (0.0013) [2024-06-15 17:33:54,394][1652475] Updated weights for policy 0, policy_version 460000 (0.0014) [2024-06-15 17:33:55,738][1648984] Fps is (10 sec: 39321.1, 60 sec: 45875.2, 300 sec: 43542.6). Total num frames: 942145536. Throughput: 0: 11411.9. Samples: 235598848. Policy #0 lag: (min: 41.0, avg: 163.9, max: 287.0) [2024-06-15 17:33:55,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 17:33:56,472][1652475] Updated weights for policy 0, policy_version 460050 (0.0020) [2024-06-15 17:33:59,440][1652475] Updated weights for policy 0, policy_version 460112 (0.0022) [2024-06-15 17:34:00,738][1648984] Fps is (10 sec: 49351.8, 60 sec: 45875.2, 300 sec: 43542.6). Total num frames: 942407680. Throughput: 0: 11332.3. Samples: 235659776. Policy #0 lag: (min: 41.0, avg: 163.9, max: 287.0) [2024-06-15 17:34:00,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 17:34:03,618][1652475] Updated weights for policy 0, policy_version 460224 (0.0017) [2024-06-15 17:34:05,457][1651340] Signal inference workers to stop experience collection... (23600 times) [2024-06-15 17:34:05,492][1652475] InferenceWorker_p0-w0: stopping experience collection (23600 times) [2024-06-15 17:34:05,685][1651340] Signal inference workers to resume experience collection... 
(23600 times) [2024-06-15 17:34:05,686][1652475] InferenceWorker_p0-w0: resuming experience collection (23600 times) [2024-06-15 17:34:05,738][1648984] Fps is (10 sec: 42598.7, 60 sec: 45875.1, 300 sec: 43431.5). Total num frames: 942571520. Throughput: 0: 11309.5. Samples: 235693056. Policy #0 lag: (min: 41.0, avg: 163.9, max: 287.0) [2024-06-15 17:34:05,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 17:34:06,768][1652475] Updated weights for policy 0, policy_version 460274 (0.0018) [2024-06-15 17:34:08,948][1652475] Updated weights for policy 0, policy_version 460322 (0.0153) [2024-06-15 17:34:10,738][1648984] Fps is (10 sec: 39321.1, 60 sec: 44782.8, 300 sec: 43542.5). Total num frames: 942800896. Throughput: 0: 11127.4. Samples: 235758080. Policy #0 lag: (min: 41.0, avg: 163.9, max: 287.0) [2024-06-15 17:34:10,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 17:34:11,409][1652475] Updated weights for policy 0, policy_version 460368 (0.0016) [2024-06-15 17:34:12,308][1652475] Updated weights for policy 0, policy_version 460415 (0.0013) [2024-06-15 17:34:14,443][1652475] Updated weights for policy 0, policy_version 460472 (0.0015) [2024-06-15 17:34:15,738][1648984] Fps is (10 sec: 49151.5, 60 sec: 45875.1, 300 sec: 43986.9). Total num frames: 943063040. Throughput: 0: 11138.8. Samples: 235828736. Policy #0 lag: (min: 41.0, avg: 163.9, max: 287.0) [2024-06-15 17:34:15,739][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 17:34:19,334][1652475] Updated weights for policy 0, policy_version 460528 (0.0054) [2024-06-15 17:34:20,738][1648984] Fps is (10 sec: 39322.3, 60 sec: 44236.9, 300 sec: 43431.5). Total num frames: 943194112. Throughput: 0: 11264.0. Samples: 235867136. 
Policy #0 lag: (min: 41.0, avg: 163.9, max: 287.0) [2024-06-15 17:34:20,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 17:34:21,236][1652475] Updated weights for policy 0, policy_version 460576 (0.0012) [2024-06-15 17:34:22,936][1652475] Updated weights for policy 0, policy_version 460640 (0.0111) [2024-06-15 17:34:25,446][1652475] Updated weights for policy 0, policy_version 460720 (0.0014) [2024-06-15 17:34:25,748][1648984] Fps is (10 sec: 52375.8, 60 sec: 46413.4, 300 sec: 43985.3). Total num frames: 943587328. Throughput: 0: 10988.4. Samples: 235924480. Policy #0 lag: (min: 41.0, avg: 163.9, max: 287.0) [2024-06-15 17:34:25,749][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 17:34:30,738][1648984] Fps is (10 sec: 39321.2, 60 sec: 43690.6, 300 sec: 43098.5). Total num frames: 943587328. Throughput: 0: 11150.2. Samples: 235996160. Policy #0 lag: (min: 41.0, avg: 163.9, max: 287.0) [2024-06-15 17:34:30,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 17:34:33,009][1652475] Updated weights for policy 0, policy_version 460800 (0.0014) [2024-06-15 17:34:34,692][1652475] Updated weights for policy 0, policy_version 460880 (0.0014) [2024-06-15 17:34:35,738][1648984] Fps is (10 sec: 39362.0, 60 sec: 45875.3, 300 sec: 43542.6). Total num frames: 943980544. Throughput: 0: 11012.2. Samples: 236025344. Policy #0 lag: (min: 41.0, avg: 163.9, max: 287.0) [2024-06-15 17:34:35,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 17:34:38,398][1652475] Updated weights for policy 0, policy_version 460976 (0.0015) [2024-06-15 17:34:40,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 43690.6, 300 sec: 43542.5). Total num frames: 944111616. Throughput: 0: 10797.5. Samples: 236084736. 
Policy #0 lag: (min: 41.0, avg: 163.9, max: 287.0) [2024-06-15 17:34:40,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 17:34:44,402][1652475] Updated weights for policy 0, policy_version 461046 (0.0014) [2024-06-15 17:34:45,535][1652475] Updated weights for policy 0, policy_version 461115 (0.0016) [2024-06-15 17:34:45,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 43690.6, 300 sec: 43542.6). Total num frames: 944373760. Throughput: 0: 10945.4. Samples: 236152320. Policy #0 lag: (min: 41.0, avg: 163.9, max: 287.0) [2024-06-15 17:34:45,746][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 17:34:47,859][1652475] Updated weights for policy 0, policy_version 461184 (0.0013) [2024-06-15 17:34:50,535][1651340] Signal inference workers to stop experience collection... (23650 times) [2024-06-15 17:34:50,619][1652475] InferenceWorker_p0-w0: stopping experience collection (23650 times) [2024-06-15 17:34:50,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 43720.1, 300 sec: 43543.6). Total num frames: 944537600. Throughput: 0: 10899.9. Samples: 236183552. Policy #0 lag: (min: 15.0, avg: 171.0, max: 367.0) [2024-06-15 17:34:50,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 17:34:50,835][1651340] Signal inference workers to resume experience collection... (23650 times) [2024-06-15 17:34:50,836][1652475] InferenceWorker_p0-w0: resuming experience collection (23650 times) [2024-06-15 17:34:51,609][1652475] Updated weights for policy 0, policy_version 461244 (0.0020) [2024-06-15 17:34:55,738][1648984] Fps is (10 sec: 36044.4, 60 sec: 43144.5, 300 sec: 43432.1). Total num frames: 944734208. Throughput: 0: 10990.9. Samples: 236252672. 
Policy #0 lag: (min: 15.0, avg: 171.0, max: 367.0) [2024-06-15 17:34:55,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 17:34:56,041][1652475] Updated weights for policy 0, policy_version 461312 (0.0102) [2024-06-15 17:34:56,241][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000461328_944799744.pth... [2024-06-15 17:34:56,348][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000456192_934281216.pth [2024-06-15 17:34:57,064][1652475] Updated weights for policy 0, policy_version 461373 (0.0014) [2024-06-15 17:35:00,312][1652475] Updated weights for policy 0, policy_version 461428 (0.0011) [2024-06-15 17:35:00,738][1648984] Fps is (10 sec: 49151.9, 60 sec: 43690.6, 300 sec: 43875.8). Total num frames: 945029120. Throughput: 0: 11013.7. Samples: 236324352. Policy #0 lag: (min: 15.0, avg: 171.0, max: 367.0) [2024-06-15 17:35:00,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 17:35:02,541][1652475] Updated weights for policy 0, policy_version 461494 (0.0012) [2024-06-15 17:35:05,738][1648984] Fps is (10 sec: 45875.6, 60 sec: 43690.7, 300 sec: 43653.6). Total num frames: 945192960. Throughput: 0: 10877.1. Samples: 236356608. Policy #0 lag: (min: 15.0, avg: 171.0, max: 367.0) [2024-06-15 17:35:05,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 17:35:06,740][1652475] Updated weights for policy 0, policy_version 461558 (0.0012) [2024-06-15 17:35:07,976][1652475] Updated weights for policy 0, policy_version 461631 (0.0013) [2024-06-15 17:35:10,738][1648984] Fps is (10 sec: 39322.0, 60 sec: 43690.8, 300 sec: 43875.8). Total num frames: 945422336. Throughput: 0: 11130.0. Samples: 236425216. 
Policy #0 lag: (min: 15.0, avg: 171.0, max: 367.0) [2024-06-15 17:35:10,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 17:35:11,782][1652475] Updated weights for policy 0, policy_version 461680 (0.0013) [2024-06-15 17:35:13,788][1652475] Updated weights for policy 0, policy_version 461744 (0.0105) [2024-06-15 17:35:15,738][1648984] Fps is (10 sec: 49152.3, 60 sec: 43690.8, 300 sec: 43986.9). Total num frames: 945684480. Throughput: 0: 11138.9. Samples: 236497408. Policy #0 lag: (min: 15.0, avg: 171.0, max: 367.0) [2024-06-15 17:35:15,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 17:35:16,815][1652475] Updated weights for policy 0, policy_version 461776 (0.0013) [2024-06-15 17:35:18,537][1652475] Updated weights for policy 0, policy_version 461856 (0.0104) [2024-06-15 17:35:20,742][1648984] Fps is (10 sec: 52405.5, 60 sec: 45871.8, 300 sec: 43986.2). Total num frames: 945946624. Throughput: 0: 11206.0. Samples: 236529664. Policy #0 lag: (min: 15.0, avg: 171.0, max: 367.0) [2024-06-15 17:35:20,743][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 17:35:22,505][1652475] Updated weights for policy 0, policy_version 461904 (0.0013) [2024-06-15 17:35:24,001][1652475] Updated weights for policy 0, policy_version 461954 (0.0011) [2024-06-15 17:35:25,458][1652475] Updated weights for policy 0, policy_version 462016 (0.0016) [2024-06-15 17:35:25,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 43698.1, 300 sec: 43986.9). Total num frames: 946208768. Throughput: 0: 11571.2. Samples: 236605440. Policy #0 lag: (min: 15.0, avg: 171.0, max: 367.0) [2024-06-15 17:35:25,740][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 17:35:29,156][1652475] Updated weights for policy 0, policy_version 462082 (0.0017) [2024-06-15 17:35:30,738][1648984] Fps is (10 sec: 52452.1, 60 sec: 48059.8, 300 sec: 44098.0). Total num frames: 946470912. Throughput: 0: 11434.7. Samples: 236666880. 
Policy #0 lag: (min: 15.0, avg: 171.0, max: 367.0) [2024-06-15 17:35:30,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 17:35:34,035][1652475] Updated weights for policy 0, policy_version 462154 (0.0013) [2024-06-15 17:35:34,628][1651340] Signal inference workers to stop experience collection... (23700 times) [2024-06-15 17:35:34,685][1652475] InferenceWorker_p0-w0: stopping experience collection (23700 times) [2024-06-15 17:35:34,882][1651340] Signal inference workers to resume experience collection... (23700 times) [2024-06-15 17:35:34,883][1652475] InferenceWorker_p0-w0: resuming experience collection (23700 times) [2024-06-15 17:35:35,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 43690.7, 300 sec: 44209.1). Total num frames: 946601984. Throughput: 0: 11639.5. Samples: 236707328. Policy #0 lag: (min: 15.0, avg: 171.0, max: 367.0) [2024-06-15 17:35:35,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 17:35:36,146][1652475] Updated weights for policy 0, policy_version 462240 (0.0018) [2024-06-15 17:35:40,206][1652475] Updated weights for policy 0, policy_version 462305 (0.0012) [2024-06-15 17:35:40,738][1648984] Fps is (10 sec: 36044.8, 60 sec: 45329.1, 300 sec: 44209.0). Total num frames: 946831360. Throughput: 0: 11582.6. Samples: 236773888. Policy #0 lag: (min: 15.0, avg: 171.0, max: 367.0) [2024-06-15 17:35:40,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 17:35:41,371][1652475] Updated weights for policy 0, policy_version 462356 (0.0107) [2024-06-15 17:35:45,738][1648984] Fps is (10 sec: 39321.2, 60 sec: 43690.6, 300 sec: 43986.9). Total num frames: 946995200. Throughput: 0: 11571.2. Samples: 236845056. 
Policy #0 lag: (min: 15.0, avg: 171.0, max: 367.0) [2024-06-15 17:35:45,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 17:35:46,185][1652475] Updated weights for policy 0, policy_version 462417 (0.0022) [2024-06-15 17:35:47,237][1652475] Updated weights for policy 0, policy_version 462464 (0.0011) [2024-06-15 17:35:48,734][1652475] Updated weights for policy 0, policy_version 462528 (0.0015) [2024-06-15 17:35:50,738][1648984] Fps is (10 sec: 42598.1, 60 sec: 45329.1, 300 sec: 43986.9). Total num frames: 947257344. Throughput: 0: 11480.2. Samples: 236873216. Policy #0 lag: (min: 15.0, avg: 171.0, max: 367.0) [2024-06-15 17:35:50,739][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 17:35:52,588][1652475] Updated weights for policy 0, policy_version 462580 (0.0014) [2024-06-15 17:35:54,035][1652475] Updated weights for policy 0, policy_version 462648 (0.0013) [2024-06-15 17:35:55,738][1648984] Fps is (10 sec: 52429.1, 60 sec: 46421.4, 300 sec: 43986.9). Total num frames: 947519488. Throughput: 0: 11366.4. Samples: 236936704. Policy #0 lag: (min: 15.0, avg: 171.0, max: 367.0) [2024-06-15 17:35:55,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 17:35:58,016][1652475] Updated weights for policy 0, policy_version 462689 (0.0011) [2024-06-15 17:35:59,988][1652475] Updated weights for policy 0, policy_version 462777 (0.0144) [2024-06-15 17:36:00,738][1648984] Fps is (10 sec: 52429.4, 60 sec: 45875.3, 300 sec: 44098.0). Total num frames: 947781632. Throughput: 0: 11229.9. Samples: 237002752. Policy #0 lag: (min: 15.0, avg: 171.0, max: 367.0) [2024-06-15 17:36:00,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 17:36:04,026][1652475] Updated weights for policy 0, policy_version 462832 (0.0014) [2024-06-15 17:36:05,516][1652475] Updated weights for policy 0, policy_version 462907 (0.0014) [2024-06-15 17:36:05,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 47513.6, 300 sec: 44098.0). Total num frames: 948043776. 
Throughput: 0: 11424.4. Samples: 237043712. Policy #0 lag: (min: 15.0, avg: 171.0, max: 367.0) [2024-06-15 17:36:05,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 17:36:10,024][1652475] Updated weights for policy 0, policy_version 462969 (0.0110) [2024-06-15 17:36:10,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 45875.2, 300 sec: 43986.9). Total num frames: 948174848. Throughput: 0: 11275.4. Samples: 237112832. Policy #0 lag: (min: 15.0, avg: 171.0, max: 367.0) [2024-06-15 17:36:10,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 17:36:11,854][1652475] Updated weights for policy 0, policy_version 463034 (0.0010) [2024-06-15 17:36:15,295][1652475] Updated weights for policy 0, policy_version 463077 (0.0013) [2024-06-15 17:36:15,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 45875.2, 300 sec: 44098.0). Total num frames: 948436992. Throughput: 0: 11411.9. Samples: 237180416. Policy #0 lag: (min: 32.0, avg: 162.1, max: 304.0) [2024-06-15 17:36:15,741][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 17:36:16,181][1651340] Signal inference workers to stop experience collection... (23750 times) [2024-06-15 17:36:16,263][1652475] InferenceWorker_p0-w0: stopping experience collection (23750 times) [2024-06-15 17:36:16,341][1651340] Signal inference workers to resume experience collection... (23750 times) [2024-06-15 17:36:16,374][1652475] InferenceWorker_p0-w0: resuming experience collection (23750 times) [2024-06-15 17:36:16,685][1652475] Updated weights for policy 0, policy_version 463152 (0.0014) [2024-06-15 17:36:20,644][1652475] Updated weights for policy 0, policy_version 463200 (0.0012) [2024-06-15 17:36:20,738][1648984] Fps is (10 sec: 45874.8, 60 sec: 44786.2, 300 sec: 44209.1). Total num frames: 948633600. Throughput: 0: 11309.5. Samples: 237216256. 
Policy #0 lag: (min: 32.0, avg: 162.1, max: 304.0) [2024-06-15 17:36:20,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 17:36:22,804][1652475] Updated weights for policy 0, policy_version 463287 (0.0091) [2024-06-15 17:36:25,738][1648984] Fps is (10 sec: 39321.3, 60 sec: 43690.6, 300 sec: 43986.9). Total num frames: 948830208. Throughput: 0: 11241.2. Samples: 237279744. Policy #0 lag: (min: 32.0, avg: 162.1, max: 304.0) [2024-06-15 17:36:25,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 17:36:27,120][1652475] Updated weights for policy 0, policy_version 463344 (0.0015) [2024-06-15 17:36:28,565][1652475] Updated weights for policy 0, policy_version 463414 (0.0094) [2024-06-15 17:36:30,738][1648984] Fps is (10 sec: 45875.5, 60 sec: 43690.7, 300 sec: 43986.9). Total num frames: 949092352. Throughput: 0: 11161.6. Samples: 237347328. Policy #0 lag: (min: 32.0, avg: 162.1, max: 304.0) [2024-06-15 17:36:30,738][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 17:36:32,050][1652475] Updated weights for policy 0, policy_version 463456 (0.0014) [2024-06-15 17:36:33,938][1652475] Updated weights for policy 0, policy_version 463523 (0.0014) [2024-06-15 17:36:35,758][1648984] Fps is (10 sec: 52323.2, 60 sec: 45859.7, 300 sec: 44317.1). Total num frames: 949354496. Throughput: 0: 11213.5. Samples: 237378048. Policy #0 lag: (min: 32.0, avg: 162.1, max: 304.0) [2024-06-15 17:36:35,759][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 17:36:40,738][1648984] Fps is (10 sec: 36045.0, 60 sec: 43690.7, 300 sec: 43986.9). Total num frames: 949452800. Throughput: 0: 11332.3. Samples: 237446656. 
Policy #0 lag: (min: 32.0, avg: 162.1, max: 304.0) [2024-06-15 17:36:40,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 17:36:40,807][1652475] Updated weights for policy 0, policy_version 463603 (0.0013) [2024-06-15 17:36:43,683][1652475] Updated weights for policy 0, policy_version 463696 (0.0014) [2024-06-15 17:36:45,738][1648984] Fps is (10 sec: 39401.1, 60 sec: 45875.2, 300 sec: 44431.2). Total num frames: 949747712. Throughput: 0: 10934.0. Samples: 237494784. Policy #0 lag: (min: 32.0, avg: 162.1, max: 304.0) [2024-06-15 17:36:45,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 17:36:46,153][1652475] Updated weights for policy 0, policy_version 463746 (0.0014) [2024-06-15 17:36:47,381][1652475] Updated weights for policy 0, policy_version 463808 (0.0013) [2024-06-15 17:36:50,738][1648984] Fps is (10 sec: 42597.1, 60 sec: 43690.5, 300 sec: 43986.9). Total num frames: 949878784. Throughput: 0: 10740.5. Samples: 237527040. Policy #0 lag: (min: 32.0, avg: 162.1, max: 304.0) [2024-06-15 17:36:50,739][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 17:36:53,378][1652475] Updated weights for policy 0, policy_version 463868 (0.0115) [2024-06-15 17:36:54,860][1652475] Updated weights for policy 0, policy_version 463936 (0.0127) [2024-06-15 17:36:55,740][1648984] Fps is (10 sec: 42588.5, 60 sec: 44235.1, 300 sec: 44097.6). Total num frames: 950173696. Throughput: 0: 10660.4. Samples: 237592576. Policy #0 lag: (min: 32.0, avg: 162.1, max: 304.0) [2024-06-15 17:36:55,741][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 17:36:56,152][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000463968_950206464.pth... 
[2024-06-15 17:36:56,370][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000458688_939393024.pth [2024-06-15 17:37:00,063][1652475] Updated weights for policy 0, policy_version 464004 (0.0014) [2024-06-15 17:37:00,738][1648984] Fps is (10 sec: 45876.9, 60 sec: 42598.4, 300 sec: 44209.0). Total num frames: 950337536. Throughput: 0: 10570.0. Samples: 237656064. Policy #0 lag: (min: 32.0, avg: 162.1, max: 304.0) [2024-06-15 17:37:00,738][1648984] Avg episode reward: [(0, '-0.250')] [2024-06-15 17:37:01,130][1652475] Updated weights for policy 0, policy_version 464060 (0.0013) [2024-06-15 17:37:04,578][1652475] Updated weights for policy 0, policy_version 464117 (0.0088) [2024-06-15 17:37:05,572][1651340] Signal inference workers to stop experience collection... (23800 times) [2024-06-15 17:37:05,605][1652475] InferenceWorker_p0-w0: stopping experience collection (23800 times) [2024-06-15 17:37:05,738][1648984] Fps is (10 sec: 39329.1, 60 sec: 42051.9, 300 sec: 44209.0). Total num frames: 950566912. Throughput: 0: 10604.0. Samples: 237693440. Policy #0 lag: (min: 32.0, avg: 162.1, max: 304.0) [2024-06-15 17:37:05,739][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 17:37:05,769][1651340] Signal inference workers to resume experience collection... (23800 times) [2024-06-15 17:37:05,770][1652475] InferenceWorker_p0-w0: resuming experience collection (23800 times) [2024-06-15 17:37:05,773][1652475] Updated weights for policy 0, policy_version 464160 (0.0013) [2024-06-15 17:37:10,149][1652475] Updated weights for policy 0, policy_version 464229 (0.0090) [2024-06-15 17:37:10,738][1648984] Fps is (10 sec: 45873.8, 60 sec: 43690.5, 300 sec: 44320.1). Total num frames: 950796288. Throughput: 0: 10592.7. Samples: 237756416. 
Policy #0 lag: (min: 32.0, avg: 162.1, max: 304.0) [2024-06-15 17:37:10,739][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 17:37:11,790][1652475] Updated weights for policy 0, policy_version 464258 (0.0012) [2024-06-15 17:37:13,068][1652475] Updated weights for policy 0, policy_version 464319 (0.0111) [2024-06-15 17:37:15,738][1648984] Fps is (10 sec: 45877.2, 60 sec: 43144.5, 300 sec: 44764.4). Total num frames: 951025664. Throughput: 0: 10547.2. Samples: 237821952. Policy #0 lag: (min: 32.0, avg: 162.1, max: 304.0) [2024-06-15 17:37:15,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 17:37:15,822][1652475] Updated weights for policy 0, policy_version 464384 (0.0014) [2024-06-15 17:37:17,832][1652475] Updated weights for policy 0, policy_version 464447 (0.0012) [2024-06-15 17:37:20,739][1648984] Fps is (10 sec: 39322.2, 60 sec: 42598.4, 300 sec: 43986.9). Total num frames: 951189504. Throughput: 0: 10551.9. Samples: 237852672. Policy #0 lag: (min: 32.0, avg: 162.1, max: 304.0) [2024-06-15 17:37:20,741][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 17:37:23,026][1652475] Updated weights for policy 0, policy_version 464512 (0.0014) [2024-06-15 17:37:24,241][1652475] Updated weights for policy 0, policy_version 464551 (0.0011) [2024-06-15 17:37:24,726][1652475] Updated weights for policy 0, policy_version 464576 (0.0014) [2024-06-15 17:37:25,738][1648984] Fps is (10 sec: 42598.5, 60 sec: 43690.7, 300 sec: 44431.2). Total num frames: 951451648. Throughput: 0: 10638.2. Samples: 237925376. Policy #0 lag: (min: 32.0, avg: 162.1, max: 304.0) [2024-06-15 17:37:25,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 17:37:26,744][1652475] Updated weights for policy 0, policy_version 464628 (0.0044) [2024-06-15 17:37:28,030][1652475] Updated weights for policy 0, policy_version 464672 (0.0013) [2024-06-15 17:37:30,738][1648984] Fps is (10 sec: 52429.3, 60 sec: 43690.7, 300 sec: 44653.3). Total num frames: 951713792. 
Throughput: 0: 11207.1. Samples: 237999104. Policy #0 lag: (min: 32.0, avg: 162.1, max: 304.0) [2024-06-15 17:37:30,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 17:37:33,199][1652475] Updated weights for policy 0, policy_version 464724 (0.0100) [2024-06-15 17:37:34,590][1652475] Updated weights for policy 0, policy_version 464790 (0.0012) [2024-06-15 17:37:35,467][1652475] Updated weights for policy 0, policy_version 464830 (0.0011) [2024-06-15 17:37:35,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 43705.4, 300 sec: 44875.5). Total num frames: 951975936. Throughput: 0: 11332.3. Samples: 238036992. Policy #0 lag: (min: 32.0, avg: 162.1, max: 304.0) [2024-06-15 17:37:35,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 17:37:37,889][1652475] Updated weights for policy 0, policy_version 464895 (0.0014) [2024-06-15 17:37:40,088][1652475] Updated weights for policy 0, policy_version 464944 (0.0013) [2024-06-15 17:37:40,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 46421.3, 300 sec: 45208.7). Total num frames: 952238080. Throughput: 0: 11310.1. Samples: 238101504. Policy #0 lag: (min: 11.0, avg: 128.4, max: 267.0) [2024-06-15 17:37:40,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 17:37:45,413][1652475] Updated weights for policy 0, policy_version 465013 (0.0013) [2024-06-15 17:37:45,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 43690.7, 300 sec: 44542.3). Total num frames: 952369152. Throughput: 0: 11446.0. Samples: 238171136. Policy #0 lag: (min: 11.0, avg: 128.4, max: 267.0) [2024-06-15 17:37:45,739][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 17:37:46,983][1652475] Updated weights for policy 0, policy_version 465075 (0.0011) [2024-06-15 17:37:48,722][1652475] Updated weights for policy 0, policy_version 465105 (0.0013) [2024-06-15 17:37:50,738][1648984] Fps is (10 sec: 39321.2, 60 sec: 45875.3, 300 sec: 44875.5). Total num frames: 952631296. Throughput: 0: 11377.9. Samples: 238205440. 
Policy #0 lag: (min: 11.0, avg: 128.4, max: 267.0) [2024-06-15 17:37:50,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 17:37:51,077][1651340] Signal inference workers to stop experience collection... (23850 times) [2024-06-15 17:37:51,116][1652475] InferenceWorker_p0-w0: stopping experience collection (23850 times) [2024-06-15 17:37:51,142][1652475] Updated weights for policy 0, policy_version 465171 (0.0013) [2024-06-15 17:37:51,358][1651340] Signal inference workers to resume experience collection... (23850 times) [2024-06-15 17:37:51,359][1652475] InferenceWorker_p0-w0: resuming experience collection (23850 times) [2024-06-15 17:37:55,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 43146.2, 300 sec: 44431.2). Total num frames: 952762368. Throughput: 0: 11423.3. Samples: 238270464. Policy #0 lag: (min: 11.0, avg: 128.4, max: 267.0) [2024-06-15 17:37:55,742][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 17:37:56,052][1652475] Updated weights for policy 0, policy_version 465222 (0.0011) [2024-06-15 17:37:57,769][1652475] Updated weights for policy 0, policy_version 465297 (0.0012) [2024-06-15 17:38:00,247][1652475] Updated weights for policy 0, policy_version 465360 (0.0020) [2024-06-15 17:38:00,738][1648984] Fps is (10 sec: 45875.4, 60 sec: 45875.1, 300 sec: 44986.6). Total num frames: 953090048. Throughput: 0: 11457.4. Samples: 238337536. Policy #0 lag: (min: 11.0, avg: 128.4, max: 267.0) [2024-06-15 17:38:00,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 17:38:01,256][1652475] Updated weights for policy 0, policy_version 465401 (0.0013) [2024-06-15 17:38:03,817][1652475] Updated weights for policy 0, policy_version 465463 (0.0022) [2024-06-15 17:38:05,738][1648984] Fps is (10 sec: 52427.8, 60 sec: 45329.2, 300 sec: 44653.3). Total num frames: 953286656. Throughput: 0: 11411.9. Samples: 238366208. 
Policy #0 lag: (min: 11.0, avg: 128.4, max: 267.0) [2024-06-15 17:38:05,739][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 17:38:09,308][1652475] Updated weights for policy 0, policy_version 465509 (0.0017) [2024-06-15 17:38:10,549][1652475] Updated weights for policy 0, policy_version 465555 (0.0021) [2024-06-15 17:38:10,738][1648984] Fps is (10 sec: 36045.0, 60 sec: 44237.0, 300 sec: 44542.3). Total num frames: 953450496. Throughput: 0: 11446.0. Samples: 238440448. Policy #0 lag: (min: 11.0, avg: 128.4, max: 267.0) [2024-06-15 17:38:10,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 17:38:12,209][1652475] Updated weights for policy 0, policy_version 465632 (0.0015) [2024-06-15 17:38:15,709][1652475] Updated weights for policy 0, policy_version 465696 (0.0015) [2024-06-15 17:38:15,738][1648984] Fps is (10 sec: 45875.2, 60 sec: 45328.9, 300 sec: 44764.4). Total num frames: 953745408. Throughput: 0: 11172.9. Samples: 238501888. Policy #0 lag: (min: 11.0, avg: 128.4, max: 267.0) [2024-06-15 17:38:15,739][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 17:38:20,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 44236.8, 300 sec: 44209.0). Total num frames: 953843712. Throughput: 0: 11047.8. Samples: 238534144. Policy #0 lag: (min: 11.0, avg: 128.4, max: 267.0) [2024-06-15 17:38:20,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 17:38:20,866][1652475] Updated weights for policy 0, policy_version 465760 (0.0014) [2024-06-15 17:38:23,119][1652475] Updated weights for policy 0, policy_version 465860 (0.0013) [2024-06-15 17:38:25,738][1648984] Fps is (10 sec: 45876.1, 60 sec: 45875.2, 300 sec: 44875.5). Total num frames: 954204160. Throughput: 0: 10979.5. Samples: 238595584. 
Policy #0 lag: (min: 11.0, avg: 128.4, max: 267.0) [2024-06-15 17:38:25,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 17:38:26,998][1652475] Updated weights for policy 0, policy_version 465928 (0.0024) [2024-06-15 17:38:28,171][1652475] Updated weights for policy 0, policy_version 465984 (0.0013) [2024-06-15 17:38:30,738][1648984] Fps is (10 sec: 49152.0, 60 sec: 43690.7, 300 sec: 44431.2). Total num frames: 954335232. Throughput: 0: 11127.5. Samples: 238671872. Policy #0 lag: (min: 11.0, avg: 128.4, max: 267.0) [2024-06-15 17:38:30,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 17:38:33,407][1652475] Updated weights for policy 0, policy_version 466064 (0.0013) [2024-06-15 17:38:34,394][1651340] Signal inference workers to stop experience collection... (23900 times) [2024-06-15 17:38:34,456][1652475] InferenceWorker_p0-w0: stopping experience collection (23900 times) [2024-06-15 17:38:34,619][1651340] Signal inference workers to resume experience collection... (23900 times) [2024-06-15 17:38:34,620][1652475] InferenceWorker_p0-w0: resuming experience collection (23900 times) [2024-06-15 17:38:35,338][1652475] Updated weights for policy 0, policy_version 466144 (0.0034) [2024-06-15 17:38:35,738][1648984] Fps is (10 sec: 49150.7, 60 sec: 45328.8, 300 sec: 44764.4). Total num frames: 954695680. Throughput: 0: 11172.9. Samples: 238708224. Policy #0 lag: (min: 11.0, avg: 128.4, max: 267.0) [2024-06-15 17:38:35,739][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 17:38:38,268][1652475] Updated weights for policy 0, policy_version 466182 (0.0015) [2024-06-15 17:38:39,785][1652475] Updated weights for policy 0, policy_version 466240 (0.0013) [2024-06-15 17:38:40,738][1648984] Fps is (10 sec: 52427.8, 60 sec: 43690.5, 300 sec: 44431.2). Total num frames: 954859520. Throughput: 0: 11013.7. Samples: 238766080. 
Policy #0 lag: (min: 11.0, avg: 128.4, max: 267.0) [2024-06-15 17:38:40,739][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 17:38:45,738][1648984] Fps is (10 sec: 36045.9, 60 sec: 44782.9, 300 sec: 44548.4). Total num frames: 955056128. Throughput: 0: 11025.1. Samples: 238833664. Policy #0 lag: (min: 11.0, avg: 128.4, max: 267.0) [2024-06-15 17:38:45,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 17:38:46,191][1652475] Updated weights for policy 0, policy_version 466367 (0.0014) [2024-06-15 17:38:49,024][1652475] Updated weights for policy 0, policy_version 466424 (0.0021) [2024-06-15 17:38:50,738][1648984] Fps is (10 sec: 39322.3, 60 sec: 43690.7, 300 sec: 44431.2). Total num frames: 955252736. Throughput: 0: 11002.4. Samples: 238861312. Policy #0 lag: (min: 11.0, avg: 128.4, max: 267.0) [2024-06-15 17:38:50,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 17:38:52,471][1652475] Updated weights for policy 0, policy_version 466493 (0.0015) [2024-06-15 17:38:55,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 44783.0, 300 sec: 44209.0). Total num frames: 955449344. Throughput: 0: 10854.4. Samples: 238928896. Policy #0 lag: (min: 11.0, avg: 128.4, max: 267.0) [2024-06-15 17:38:55,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 17:38:56,089][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000466544_955482112.pth... [2024-06-15 17:38:56,271][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000461328_944799744.pth [2024-06-15 17:38:56,968][1652475] Updated weights for policy 0, policy_version 466576 (0.0025) [2024-06-15 17:39:00,738][1648984] Fps is (10 sec: 39320.7, 60 sec: 42598.3, 300 sec: 44320.1). Total num frames: 955645952. Throughput: 0: 10797.5. Samples: 238987776. 
Policy #0 lag: (min: 11.0, avg: 128.4, max: 267.0) [2024-06-15 17:39:00,739][1648984] Avg episode reward: [(0, '-0.530')] [2024-06-15 17:39:02,305][1652475] Updated weights for policy 0, policy_version 466659 (0.0016) [2024-06-15 17:39:05,126][1652475] Updated weights for policy 0, policy_version 466713 (0.0015) [2024-06-15 17:39:05,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 43144.7, 300 sec: 44320.1). Total num frames: 955875328. Throughput: 0: 10695.1. Samples: 239015424. Policy #0 lag: (min: 15.0, avg: 119.5, max: 271.0) [2024-06-15 17:39:05,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 17:39:07,830][1652475] Updated weights for policy 0, policy_version 466778 (0.0014) [2024-06-15 17:39:09,541][1652475] Updated weights for policy 0, policy_version 466817 (0.0053) [2024-06-15 17:39:10,738][1648984] Fps is (10 sec: 49153.0, 60 sec: 44782.9, 300 sec: 44320.1). Total num frames: 956137472. Throughput: 0: 10820.3. Samples: 239082496. Policy #0 lag: (min: 15.0, avg: 119.5, max: 271.0) [2024-06-15 17:39:10,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 17:39:10,847][1652475] Updated weights for policy 0, policy_version 466879 (0.0013) [2024-06-15 17:39:13,778][1652475] Updated weights for policy 0, policy_version 466936 (0.0013) [2024-06-15 17:39:15,738][1648984] Fps is (10 sec: 42598.6, 60 sec: 42598.6, 300 sec: 44431.2). Total num frames: 956301312. Throughput: 0: 10615.5. Samples: 239149568. Policy #0 lag: (min: 15.0, avg: 119.5, max: 271.0) [2024-06-15 17:39:15,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 17:39:17,216][1652475] Updated weights for policy 0, policy_version 466998 (0.0022) [2024-06-15 17:39:20,738][1648984] Fps is (10 sec: 32768.0, 60 sec: 43690.7, 300 sec: 43655.2). Total num frames: 956465152. Throughput: 0: 10535.9. Samples: 239182336. 
Policy #0 lag: (min: 15.0, avg: 119.5, max: 271.0) [2024-06-15 17:39:20,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 17:39:21,781][1652475] Updated weights for policy 0, policy_version 467066 (0.0014) [2024-06-15 17:39:22,499][1651340] Signal inference workers to stop experience collection... (23950 times) [2024-06-15 17:39:22,579][1652475] InferenceWorker_p0-w0: stopping experience collection (23950 times) [2024-06-15 17:39:22,797][1651340] Signal inference workers to resume experience collection... (23950 times) [2024-06-15 17:39:22,798][1652475] InferenceWorker_p0-w0: resuming experience collection (23950 times) [2024-06-15 17:39:23,387][1652475] Updated weights for policy 0, policy_version 467133 (0.0013) [2024-06-15 17:39:25,107][1652475] Updated weights for policy 0, policy_version 467184 (0.0023) [2024-06-15 17:39:25,742][1648984] Fps is (10 sec: 52428.5, 60 sec: 43690.7, 300 sec: 44875.5). Total num frames: 956825600. Throughput: 0: 10592.7. Samples: 239242752. Policy #0 lag: (min: 15.0, avg: 119.5, max: 271.0) [2024-06-15 17:39:25,744][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 17:39:29,086][1652475] Updated weights for policy 0, policy_version 467248 (0.0026) [2024-06-15 17:39:30,738][1648984] Fps is (10 sec: 49152.1, 60 sec: 43690.7, 300 sec: 43986.9). Total num frames: 956956672. Throughput: 0: 10604.1. Samples: 239310848. Policy #0 lag: (min: 15.0, avg: 119.5, max: 271.0) [2024-06-15 17:39:30,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 17:39:33,835][1652475] Updated weights for policy 0, policy_version 467296 (0.0153) [2024-06-15 17:39:35,738][1648984] Fps is (10 sec: 36044.8, 60 sec: 41506.3, 300 sec: 44320.1). Total num frames: 957186048. Throughput: 0: 10911.3. Samples: 239352320. 
Policy #0 lag: (min: 15.0, avg: 119.5, max: 271.0) [2024-06-15 17:39:35,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 17:39:36,057][1652475] Updated weights for policy 0, policy_version 467392 (0.0013) [2024-06-15 17:39:37,663][1652475] Updated weights for policy 0, policy_version 467454 (0.0013) [2024-06-15 17:39:40,739][1648984] Fps is (10 sec: 39314.8, 60 sec: 41505.1, 300 sec: 43986.6). Total num frames: 957349888. Throughput: 0: 10478.5. Samples: 239400448. Policy #0 lag: (min: 15.0, avg: 119.5, max: 271.0) [2024-06-15 17:39:40,740][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 17:39:41,637][1652475] Updated weights for policy 0, policy_version 467507 (0.0116) [2024-06-15 17:39:45,739][1648984] Fps is (10 sec: 32767.5, 60 sec: 40959.9, 300 sec: 43986.9). Total num frames: 957513728. Throughput: 0: 10865.8. Samples: 239476736. Policy #0 lag: (min: 15.0, avg: 119.5, max: 271.0) [2024-06-15 17:39:45,740][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 17:39:46,504][1652475] Updated weights for policy 0, policy_version 467568 (0.0013) [2024-06-15 17:39:48,648][1652475] Updated weights for policy 0, policy_version 467648 (0.0017) [2024-06-15 17:39:50,738][1648984] Fps is (10 sec: 52437.9, 60 sec: 43690.7, 300 sec: 44542.3). Total num frames: 957874176. Throughput: 0: 10774.8. Samples: 239500288. Policy #0 lag: (min: 15.0, avg: 119.5, max: 271.0) [2024-06-15 17:39:50,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 17:39:53,088][1652475] Updated weights for policy 0, policy_version 467719 (0.0025) [2024-06-15 17:39:55,738][1648984] Fps is (10 sec: 49152.9, 60 sec: 42598.4, 300 sec: 43986.9). Total num frames: 958005248. Throughput: 0: 10615.5. Samples: 239560192. 
Policy #0 lag: (min: 15.0, avg: 119.5, max: 271.0) [2024-06-15 17:39:55,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 17:39:57,947][1652475] Updated weights for policy 0, policy_version 467778 (0.0014) [2024-06-15 17:39:59,311][1652475] Updated weights for policy 0, policy_version 467840 (0.0013) [2024-06-15 17:40:00,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 43690.8, 300 sec: 44320.1). Total num frames: 958267392. Throughput: 0: 10592.7. Samples: 239626240. Policy #0 lag: (min: 15.0, avg: 119.5, max: 271.0) [2024-06-15 17:40:00,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 17:40:02,273][1652475] Updated weights for policy 0, policy_version 467968 (0.0015) [2024-06-15 17:40:05,738][1648984] Fps is (10 sec: 42598.6, 60 sec: 42598.4, 300 sec: 44098.0). Total num frames: 958431232. Throughput: 0: 10467.6. Samples: 239653376. Policy #0 lag: (min: 15.0, avg: 119.5, max: 271.0) [2024-06-15 17:40:05,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 17:40:06,530][1652475] Updated weights for policy 0, policy_version 468027 (0.0014) [2024-06-15 17:40:10,595][1651340] Signal inference workers to stop experience collection... (24000 times) [2024-06-15 17:40:10,677][1652475] InferenceWorker_p0-w0: stopping experience collection (24000 times) [2024-06-15 17:40:10,738][1648984] Fps is (10 sec: 29490.9, 60 sec: 40413.8, 300 sec: 43653.6). Total num frames: 958562304. Throughput: 0: 10808.9. Samples: 239729152. Policy #0 lag: (min: 15.0, avg: 119.5, max: 271.0) [2024-06-15 17:40:10,739][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 17:40:10,870][1651340] Signal inference workers to resume experience collection... 
(24000 times) [2024-06-15 17:40:10,871][1652475] InferenceWorker_p0-w0: resuming experience collection (24000 times) [2024-06-15 17:40:11,231][1652475] Updated weights for policy 0, policy_version 468080 (0.0020) [2024-06-15 17:40:13,122][1652475] Updated weights for policy 0, policy_version 468160 (0.0019) [2024-06-15 17:40:15,738][1648984] Fps is (10 sec: 49151.6, 60 sec: 43690.6, 300 sec: 43987.5). Total num frames: 958922752. Throughput: 0: 10501.7. Samples: 239783424. Policy #0 lag: (min: 15.0, avg: 119.5, max: 271.0) [2024-06-15 17:40:15,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 17:40:17,802][1652475] Updated weights for policy 0, policy_version 468245 (0.0012) [2024-06-15 17:40:20,738][1648984] Fps is (10 sec: 49152.9, 60 sec: 43144.6, 300 sec: 43542.6). Total num frames: 959053824. Throughput: 0: 10331.0. Samples: 239817216. Policy #0 lag: (min: 15.0, avg: 119.5, max: 271.0) [2024-06-15 17:40:20,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 17:40:22,706][1652475] Updated weights for policy 0, policy_version 468296 (0.0105) [2024-06-15 17:40:24,821][1652475] Updated weights for policy 0, policy_version 468384 (0.0117) [2024-06-15 17:40:25,738][1648984] Fps is (10 sec: 39321.9, 60 sec: 41506.2, 300 sec: 43542.6). Total num frames: 959315968. Throughput: 0: 10775.2. Samples: 239885312. Policy #0 lag: (min: 15.0, avg: 119.5, max: 271.0) [2024-06-15 17:40:25,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 17:40:26,669][1652475] Updated weights for policy 0, policy_version 468478 (0.0013) [2024-06-15 17:40:30,709][1652475] Updated weights for policy 0, policy_version 468537 (0.0089) [2024-06-15 17:40:30,738][1648984] Fps is (10 sec: 49152.0, 60 sec: 43144.6, 300 sec: 43875.8). Total num frames: 959545344. Throughput: 0: 10467.6. Samples: 239947776. 
Policy #0 lag: (min: 95.0, avg: 215.6, max: 335.0) [2024-06-15 17:40:30,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 17:40:35,596][1652475] Updated weights for policy 0, policy_version 468592 (0.0034) [2024-06-15 17:40:35,738][1648984] Fps is (10 sec: 36044.6, 60 sec: 41506.2, 300 sec: 43542.6). Total num frames: 959676416. Throughput: 0: 10843.0. Samples: 239988224. Policy #0 lag: (min: 95.0, avg: 215.6, max: 335.0) [2024-06-15 17:40:35,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 17:40:37,471][1652475] Updated weights for policy 0, policy_version 468674 (0.0132) [2024-06-15 17:40:38,850][1652475] Updated weights for policy 0, policy_version 468736 (0.0013) [2024-06-15 17:40:40,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 43691.9, 300 sec: 43986.9). Total num frames: 959971328. Throughput: 0: 10820.3. Samples: 240047104. Policy #0 lag: (min: 95.0, avg: 215.6, max: 335.0) [2024-06-15 17:40:40,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 17:40:42,531][1652475] Updated weights for policy 0, policy_version 468796 (0.0011) [2024-06-15 17:40:45,742][1648984] Fps is (10 sec: 42579.0, 60 sec: 43141.4, 300 sec: 43541.9). Total num frames: 960102400. Throughput: 0: 11103.6. Samples: 240125952. Policy #0 lag: (min: 95.0, avg: 215.6, max: 335.0) [2024-06-15 17:40:45,743][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 17:40:46,610][1652475] Updated weights for policy 0, policy_version 468861 (0.0012) [2024-06-15 17:40:48,150][1652475] Updated weights for policy 0, policy_version 468926 (0.0013) [2024-06-15 17:40:49,688][1652475] Updated weights for policy 0, policy_version 468985 (0.0013) [2024-06-15 17:40:50,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 43690.7, 300 sec: 43986.9). Total num frames: 960495616. Throughput: 0: 11150.2. Samples: 240155136. 
Policy #0 lag: (min: 95.0, avg: 215.6, max: 335.0) [2024-06-15 17:40:50,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 17:40:52,386][1651340] Signal inference workers to stop experience collection... (24050 times) [2024-06-15 17:40:52,457][1652475] InferenceWorker_p0-w0: stopping experience collection (24050 times) [2024-06-15 17:40:52,735][1651340] Signal inference workers to resume experience collection... (24050 times) [2024-06-15 17:40:52,736][1652475] InferenceWorker_p0-w0: resuming experience collection (24050 times) [2024-06-15 17:40:53,479][1652475] Updated weights for policy 0, policy_version 469040 (0.0012) [2024-06-15 17:40:55,738][1648984] Fps is (10 sec: 52452.6, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 960626688. Throughput: 0: 11093.4. Samples: 240228352. Policy #0 lag: (min: 95.0, avg: 215.6, max: 335.0) [2024-06-15 17:40:55,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 17:40:55,744][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000469056_960626688.pth... [2024-06-15 17:40:55,806][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000463968_950206464.pth [2024-06-15 17:40:57,001][1652475] Updated weights for policy 0, policy_version 469091 (0.0011) [2024-06-15 17:40:58,848][1652475] Updated weights for policy 0, policy_version 469138 (0.0015) [2024-06-15 17:41:00,738][1648984] Fps is (10 sec: 42598.2, 60 sec: 44236.8, 300 sec: 43653.6). Total num frames: 960921600. Throughput: 0: 11309.5. Samples: 240292352. Policy #0 lag: (min: 95.0, avg: 215.6, max: 335.0) [2024-06-15 17:41:00,738][1648984] Avg episode reward: [(0, '-0.250')] [2024-06-15 17:41:00,988][1652475] Updated weights for policy 0, policy_version 469220 (0.0015) [2024-06-15 17:41:05,738][1648984] Fps is (10 sec: 45875.1, 60 sec: 44236.7, 300 sec: 43764.7). Total num frames: 961085440. Throughput: 0: 11309.5. Samples: 240326144. 
Policy #0 lag: (min: 95.0, avg: 215.6, max: 335.0) [2024-06-15 17:41:05,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 17:41:05,811][1652475] Updated weights for policy 0, policy_version 469296 (0.0084) [2024-06-15 17:41:07,722][1652475] Updated weights for policy 0, policy_version 469344 (0.0011) [2024-06-15 17:41:10,194][1652475] Updated weights for policy 0, policy_version 469404 (0.0015) [2024-06-15 17:41:10,738][1648984] Fps is (10 sec: 45875.1, 60 sec: 46967.6, 300 sec: 43875.8). Total num frames: 961380352. Throughput: 0: 11275.4. Samples: 240392704. Policy #0 lag: (min: 95.0, avg: 215.6, max: 335.0) [2024-06-15 17:41:10,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 17:41:13,834][1652475] Updated weights for policy 0, policy_version 469457 (0.0015) [2024-06-15 17:41:15,738][1648984] Fps is (10 sec: 45874.2, 60 sec: 43690.5, 300 sec: 43764.7). Total num frames: 961544192. Throughput: 0: 11286.7. Samples: 240455680. Policy #0 lag: (min: 95.0, avg: 215.6, max: 335.0) [2024-06-15 17:41:15,739][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 17:41:18,213][1652475] Updated weights for policy 0, policy_version 469506 (0.0013) [2024-06-15 17:41:19,706][1652475] Updated weights for policy 0, policy_version 469571 (0.0014) [2024-06-15 17:41:20,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 45329.0, 300 sec: 43875.8). Total num frames: 961773568. Throughput: 0: 11207.1. Samples: 240492544. Policy #0 lag: (min: 95.0, avg: 215.6, max: 335.0) [2024-06-15 17:41:20,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 17:41:21,808][1652475] Updated weights for policy 0, policy_version 469680 (0.0107) [2024-06-15 17:41:25,738][1648984] Fps is (10 sec: 39322.5, 60 sec: 43690.6, 300 sec: 43542.6). Total num frames: 961937408. Throughput: 0: 11093.3. Samples: 240546304. 
Policy #0 lag: (min: 95.0, avg: 215.6, max: 335.0) [2024-06-15 17:41:25,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 17:41:26,815][1652475] Updated weights for policy 0, policy_version 469759 (0.0014) [2024-06-15 17:41:30,738][1648984] Fps is (10 sec: 29491.5, 60 sec: 42052.3, 300 sec: 43101.2). Total num frames: 962068480. Throughput: 0: 10957.9. Samples: 240619008. Policy #0 lag: (min: 95.0, avg: 215.6, max: 335.0) [2024-06-15 17:41:30,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 17:41:32,739][1652475] Updated weights for policy 0, policy_version 469825 (0.0028) [2024-06-15 17:41:34,919][1652475] Updated weights for policy 0, policy_version 469920 (0.0013) [2024-06-15 17:41:35,738][1648984] Fps is (10 sec: 49152.5, 60 sec: 45875.3, 300 sec: 43986.9). Total num frames: 962428928. Throughput: 0: 11013.7. Samples: 240650752. Policy #0 lag: (min: 95.0, avg: 215.6, max: 335.0) [2024-06-15 17:41:35,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 17:41:37,436][1652475] Updated weights for policy 0, policy_version 469954 (0.0014) [2024-06-15 17:41:38,165][1651340] Signal inference workers to stop experience collection... (24100 times) [2024-06-15 17:41:38,263][1652475] InferenceWorker_p0-w0: stopping experience collection (24100 times) [2024-06-15 17:41:38,393][1651340] Signal inference workers to resume experience collection... (24100 times) [2024-06-15 17:41:38,398][1652475] InferenceWorker_p0-w0: resuming experience collection (24100 times) [2024-06-15 17:41:40,744][1648984] Fps is (10 sec: 52396.9, 60 sec: 43686.3, 300 sec: 43541.7). Total num frames: 962592768. Throughput: 0: 10591.3. Samples: 240705024. Policy #0 lag: (min: 95.0, avg: 215.6, max: 335.0) [2024-06-15 17:41:40,744][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 17:41:42,279][1652475] Updated weights for policy 0, policy_version 470019 (0.0257) [2024-06-15 17:41:45,738][1648984] Fps is (10 sec: 29491.1, 60 sec: 43694.0, 300 sec: 43542.6). 
Total num frames: 962723840. Throughput: 0: 10808.9. Samples: 240778752. Policy #0 lag: (min: 95.0, avg: 215.6, max: 335.0) [2024-06-15 17:41:45,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 17:41:46,260][1652475] Updated weights for policy 0, policy_version 470098 (0.0013) [2024-06-15 17:41:48,256][1652475] Updated weights for policy 0, policy_version 470176 (0.0017) [2024-06-15 17:41:50,090][1652475] Updated weights for policy 0, policy_version 470242 (0.0014) [2024-06-15 17:41:50,738][1648984] Fps is (10 sec: 52460.5, 60 sec: 43690.6, 300 sec: 43876.1). Total num frames: 963117056. Throughput: 0: 10558.6. Samples: 240801280. Policy #0 lag: (min: 95.0, avg: 215.6, max: 335.0) [2024-06-15 17:41:50,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 17:41:54,953][1652475] Updated weights for policy 0, policy_version 470320 (0.0017) [2024-06-15 17:41:55,738][1648984] Fps is (10 sec: 52428.3, 60 sec: 43690.6, 300 sec: 43764.7). Total num frames: 963248128. Throughput: 0: 10569.9. Samples: 240868352. Policy #0 lag: (min: 7.0, avg: 110.2, max: 263.0) [2024-06-15 17:41:55,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 17:41:59,193][1652475] Updated weights for policy 0, policy_version 470394 (0.0012) [2024-06-15 17:42:00,738][1648984] Fps is (10 sec: 32768.3, 60 sec: 42052.3, 300 sec: 43653.7). Total num frames: 963444736. Throughput: 0: 10683.8. Samples: 240936448. Policy #0 lag: (min: 7.0, avg: 110.2, max: 263.0) [2024-06-15 17:42:00,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 17:42:01,131][1652475] Updated weights for policy 0, policy_version 470448 (0.0017) [2024-06-15 17:42:03,104][1652475] Updated weights for policy 0, policy_version 470519 (0.0026) [2024-06-15 17:42:05,738][1648984] Fps is (10 sec: 39321.9, 60 sec: 42598.4, 300 sec: 43542.6). Total num frames: 963641344. Throughput: 0: 10376.5. Samples: 240959488. 
Policy #0 lag: (min: 7.0, avg: 110.2, max: 263.0) [2024-06-15 17:42:05,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 17:42:07,010][1652475] Updated weights for policy 0, policy_version 470563 (0.0012) [2024-06-15 17:42:09,605][1652475] Updated weights for policy 0, policy_version 470593 (0.0012) [2024-06-15 17:42:10,738][1648984] Fps is (10 sec: 42597.9, 60 sec: 41506.1, 300 sec: 43542.6). Total num frames: 963870720. Throughput: 0: 10968.2. Samples: 241039872. Policy #0 lag: (min: 7.0, avg: 110.2, max: 263.0) [2024-06-15 17:42:10,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 17:42:11,674][1652475] Updated weights for policy 0, policy_version 470658 (0.0105) [2024-06-15 17:42:14,272][1652475] Updated weights for policy 0, policy_version 470768 (0.0015) [2024-06-15 17:42:15,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 43690.8, 300 sec: 43986.9). Total num frames: 964165632. Throughput: 0: 10513.0. Samples: 241092096. Policy #0 lag: (min: 7.0, avg: 110.2, max: 263.0) [2024-06-15 17:42:15,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 17:42:19,113][1652475] Updated weights for policy 0, policy_version 470832 (0.0015) [2024-06-15 17:42:20,754][1648984] Fps is (10 sec: 42528.6, 60 sec: 42040.8, 300 sec: 43540.1). Total num frames: 964296704. Throughput: 0: 10725.3. Samples: 241133568. Policy #0 lag: (min: 7.0, avg: 110.2, max: 263.0) [2024-06-15 17:42:20,755][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 17:42:21,688][1652475] Updated weights for policy 0, policy_version 470880 (0.0011) [2024-06-15 17:42:24,030][1652475] Updated weights for policy 0, policy_version 470931 (0.0013) [2024-06-15 17:42:24,690][1651340] Signal inference workers to stop experience collection... (24150 times) [2024-06-15 17:42:24,756][1652475] InferenceWorker_p0-w0: stopping experience collection (24150 times) [2024-06-15 17:42:25,020][1651340] Signal inference workers to resume experience collection... 
(24150 times) [2024-06-15 17:42:25,021][1652475] InferenceWorker_p0-w0: resuming experience collection (24150 times) [2024-06-15 17:42:25,705][1652475] Updated weights for policy 0, policy_version 470995 (0.0019) [2024-06-15 17:42:25,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 44236.8, 300 sec: 43653.6). Total num frames: 964591616. Throughput: 0: 11003.8. Samples: 241200128. Policy #0 lag: (min: 7.0, avg: 110.2, max: 263.0) [2024-06-15 17:42:25,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 17:42:30,738][1648984] Fps is (10 sec: 45950.9, 60 sec: 44782.9, 300 sec: 43320.4). Total num frames: 964755456. Throughput: 0: 10808.9. Samples: 241265152. Policy #0 lag: (min: 7.0, avg: 110.2, max: 263.0) [2024-06-15 17:42:30,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 17:42:31,101][1652475] Updated weights for policy 0, policy_version 471101 (0.0015) [2024-06-15 17:42:34,384][1652475] Updated weights for policy 0, policy_version 471152 (0.0013) [2024-06-15 17:42:35,738][1648984] Fps is (10 sec: 36044.7, 60 sec: 42052.2, 300 sec: 43098.2). Total num frames: 964952064. Throughput: 0: 11059.2. Samples: 241298944. Policy #0 lag: (min: 7.0, avg: 110.2, max: 263.0) [2024-06-15 17:42:35,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 17:42:36,407][1652475] Updated weights for policy 0, policy_version 471204 (0.0016) [2024-06-15 17:42:37,460][1652475] Updated weights for policy 0, policy_version 471248 (0.0012) [2024-06-15 17:42:40,738][1648984] Fps is (10 sec: 45875.0, 60 sec: 43695.1, 300 sec: 43542.6). Total num frames: 965214208. Throughput: 0: 10968.2. Samples: 241361920. 
Policy #0 lag: (min: 7.0, avg: 110.2, max: 263.0) [2024-06-15 17:42:40,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 17:42:41,968][1652475] Updated weights for policy 0, policy_version 471297 (0.0011) [2024-06-15 17:42:43,269][1652475] Updated weights for policy 0, policy_version 471355 (0.0012) [2024-06-15 17:42:45,738][1648984] Fps is (10 sec: 45875.4, 60 sec: 44782.9, 300 sec: 43320.4). Total num frames: 965410816. Throughput: 0: 11093.3. Samples: 241435648. Policy #0 lag: (min: 7.0, avg: 110.2, max: 263.0) [2024-06-15 17:42:45,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 17:42:45,768][1652475] Updated weights for policy 0, policy_version 471394 (0.0017) [2024-06-15 17:42:46,843][1652475] Updated weights for policy 0, policy_version 471426 (0.0016) [2024-06-15 17:42:48,921][1652475] Updated weights for policy 0, policy_version 471504 (0.0013) [2024-06-15 17:42:49,897][1652475] Updated weights for policy 0, policy_version 471551 (0.0012) [2024-06-15 17:42:50,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 43690.7, 300 sec: 43986.9). Total num frames: 965738496. Throughput: 0: 11241.3. Samples: 241465344. Policy #0 lag: (min: 7.0, avg: 110.2, max: 263.0) [2024-06-15 17:42:50,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 17:42:55,739][1648984] Fps is (10 sec: 45868.2, 60 sec: 43689.6, 300 sec: 43320.2). Total num frames: 965869568. Throughput: 0: 10933.7. Samples: 241531904. Policy #0 lag: (min: 7.0, avg: 110.2, max: 263.0) [2024-06-15 17:42:55,740][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 17:42:55,754][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000471616_965869568.pth... 
[2024-06-15 17:42:55,831][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000466544_955482112.pth [2024-06-15 17:42:57,223][1652475] Updated weights for policy 0, policy_version 471632 (0.0014) [2024-06-15 17:42:58,429][1652475] Updated weights for policy 0, policy_version 471677 (0.0021) [2024-06-15 17:43:00,430][1652475] Updated weights for policy 0, policy_version 471744 (0.0012) [2024-06-15 17:43:00,738][1648984] Fps is (10 sec: 39321.3, 60 sec: 44782.8, 300 sec: 43542.6). Total num frames: 966131712. Throughput: 0: 11184.4. Samples: 241595392. Policy #0 lag: (min: 7.0, avg: 110.2, max: 263.0) [2024-06-15 17:43:00,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 17:43:05,738][1648984] Fps is (10 sec: 39327.9, 60 sec: 43690.7, 300 sec: 43431.5). Total num frames: 966262784. Throughput: 0: 10835.6. Samples: 241620992. Policy #0 lag: (min: 7.0, avg: 110.2, max: 263.0) [2024-06-15 17:43:05,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 17:43:06,377][1652475] Updated weights for policy 0, policy_version 471824 (0.0108) [2024-06-15 17:43:09,782][1652475] Updated weights for policy 0, policy_version 471874 (0.0014) [2024-06-15 17:43:10,738][1648984] Fps is (10 sec: 32768.2, 60 sec: 43144.6, 300 sec: 43098.3). Total num frames: 966459392. Throughput: 0: 11013.7. Samples: 241695744. Policy #0 lag: (min: 7.0, avg: 110.2, max: 263.0) [2024-06-15 17:43:10,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 17:43:11,320][1651340] Signal inference workers to stop experience collection... (24200 times) [2024-06-15 17:43:11,405][1652475] InferenceWorker_p0-w0: stopping experience collection (24200 times) [2024-06-15 17:43:11,707][1651340] Signal inference workers to resume experience collection... 
(24200 times) [2024-06-15 17:43:11,708][1652475] InferenceWorker_p0-w0: resuming experience collection (24200 times) [2024-06-15 17:43:12,277][1652475] Updated weights for policy 0, policy_version 471970 (0.0102) [2024-06-15 17:43:13,190][1652475] Updated weights for policy 0, policy_version 472007 (0.0013) [2024-06-15 17:43:15,771][1648984] Fps is (10 sec: 52255.8, 60 sec: 43666.6, 300 sec: 43870.9). Total num frames: 966787072. Throughput: 0: 10744.1. Samples: 241748992. Policy #0 lag: (min: 7.0, avg: 110.2, max: 263.0) [2024-06-15 17:43:15,772][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 17:43:18,629][1652475] Updated weights for policy 0, policy_version 472067 (0.0022) [2024-06-15 17:43:19,882][1652475] Updated weights for policy 0, policy_version 472128 (0.0023) [2024-06-15 17:43:20,737][1648984] Fps is (10 sec: 45875.6, 60 sec: 43702.7, 300 sec: 43098.3). Total num frames: 966918144. Throughput: 0: 10877.2. Samples: 241788416. Policy #0 lag: (min: 47.0, avg: 147.2, max: 303.0) [2024-06-15 17:43:20,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 17:43:22,943][1652475] Updated weights for policy 0, policy_version 472192 (0.0014) [2024-06-15 17:43:25,738][1648984] Fps is (10 sec: 36164.1, 60 sec: 42598.4, 300 sec: 43431.5). Total num frames: 967147520. Throughput: 0: 10865.8. Samples: 241850880. Policy #0 lag: (min: 47.0, avg: 147.2, max: 303.0) [2024-06-15 17:43:25,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 17:43:26,376][1652475] Updated weights for policy 0, policy_version 472277 (0.0170) [2024-06-15 17:43:30,738][1648984] Fps is (10 sec: 39320.7, 60 sec: 42598.3, 300 sec: 42765.0). Total num frames: 967311360. Throughput: 0: 10581.3. Samples: 241911808. 
Policy #0 lag: (min: 47.0, avg: 147.2, max: 303.0) [2024-06-15 17:43:30,739][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 17:43:32,720][1652475] Updated weights for policy 0, policy_version 472368 (0.0131) [2024-06-15 17:43:33,289][1652475] Updated weights for policy 0, policy_version 472384 (0.0013) [2024-06-15 17:43:34,791][1652475] Updated weights for policy 0, policy_version 472444 (0.0011) [2024-06-15 17:43:35,738][1648984] Fps is (10 sec: 42598.8, 60 sec: 43690.7, 300 sec: 43098.3). Total num frames: 967573504. Throughput: 0: 10604.1. Samples: 241942528. Policy #0 lag: (min: 47.0, avg: 147.2, max: 303.0) [2024-06-15 17:43:35,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 17:43:39,020][1652475] Updated weights for policy 0, policy_version 472513 (0.0015) [2024-06-15 17:43:40,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 43690.6, 300 sec: 43320.4). Total num frames: 967835648. Throughput: 0: 10479.3. Samples: 242003456. Policy #0 lag: (min: 47.0, avg: 147.2, max: 303.0) [2024-06-15 17:43:40,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 17:43:44,174][1652475] Updated weights for policy 0, policy_version 472580 (0.0112) [2024-06-15 17:43:45,604][1652475] Updated weights for policy 0, policy_version 472635 (0.0016) [2024-06-15 17:43:45,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 42598.4, 300 sec: 43098.2). Total num frames: 967966720. Throughput: 0: 10626.8. Samples: 242073600. Policy #0 lag: (min: 47.0, avg: 147.2, max: 303.0) [2024-06-15 17:43:45,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 17:43:47,819][1652475] Updated weights for policy 0, policy_version 472704 (0.0014) [2024-06-15 17:43:49,100][1652475] Updated weights for policy 0, policy_version 472766 (0.0015) [2024-06-15 17:43:50,738][1648984] Fps is (10 sec: 39322.0, 60 sec: 41506.1, 300 sec: 43320.4). Total num frames: 968228864. Throughput: 0: 10661.0. Samples: 242100736. 
Policy #0 lag: (min: 47.0, avg: 147.2, max: 303.0) [2024-06-15 17:43:50,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 17:43:51,865][1652475] Updated weights for policy 0, policy_version 472826 (0.0014) [2024-06-15 17:43:55,738][1648984] Fps is (10 sec: 39320.4, 60 sec: 41507.0, 300 sec: 43098.2). Total num frames: 968359936. Throughput: 0: 10547.1. Samples: 242170368. Policy #0 lag: (min: 47.0, avg: 147.2, max: 303.0) [2024-06-15 17:43:55,739][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 17:43:56,614][1652475] Updated weights for policy 0, policy_version 472880 (0.0046) [2024-06-15 17:44:00,096][1651340] Signal inference workers to stop experience collection... (24250 times) [2024-06-15 17:44:00,152][1652475] InferenceWorker_p0-w0: stopping experience collection (24250 times) [2024-06-15 17:44:00,318][1651340] Signal inference workers to resume experience collection... (24250 times) [2024-06-15 17:44:00,330][1652475] InferenceWorker_p0-w0: resuming experience collection (24250 times) [2024-06-15 17:44:00,688][1652475] Updated weights for policy 0, policy_version 472944 (0.0015) [2024-06-15 17:44:00,738][1648984] Fps is (10 sec: 36044.6, 60 sec: 40960.0, 300 sec: 43098.3). Total num frames: 968589312. Throughput: 0: 10851.0. Samples: 242236928. Policy #0 lag: (min: 47.0, avg: 147.2, max: 303.0) [2024-06-15 17:44:00,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 17:44:02,148][1652475] Updated weights for policy 0, policy_version 473008 (0.0017) [2024-06-15 17:44:03,751][1652475] Updated weights for policy 0, policy_version 473078 (0.0015) [2024-06-15 17:44:05,738][1648984] Fps is (10 sec: 52430.2, 60 sec: 43690.6, 300 sec: 43209.3). Total num frames: 968884224. Throughput: 0: 10547.2. Samples: 242263040. 
Policy #0 lag: (min: 47.0, avg: 147.2, max: 303.0) [2024-06-15 17:44:05,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 17:44:07,963][1652475] Updated weights for policy 0, policy_version 473146 (0.0013) [2024-06-15 17:44:10,748][1648984] Fps is (10 sec: 42554.5, 60 sec: 42591.0, 300 sec: 43096.7). Total num frames: 969015296. Throughput: 0: 10772.3. Samples: 242335744. Policy #0 lag: (min: 47.0, avg: 147.2, max: 303.0) [2024-06-15 17:44:10,749][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 17:44:14,068][1652475] Updated weights for policy 0, policy_version 473248 (0.0256) [2024-06-15 17:44:15,738][1648984] Fps is (10 sec: 42598.7, 60 sec: 42075.5, 300 sec: 43542.6). Total num frames: 969310208. Throughput: 0: 10547.2. Samples: 242386432. Policy #0 lag: (min: 47.0, avg: 147.2, max: 303.0) [2024-06-15 17:44:15,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 17:44:16,249][1652475] Updated weights for policy 0, policy_version 473328 (0.0013) [2024-06-15 17:44:20,549][1652475] Updated weights for policy 0, policy_version 473392 (0.0012) [2024-06-15 17:44:20,738][1648984] Fps is (10 sec: 49202.7, 60 sec: 43144.4, 300 sec: 42987.2). Total num frames: 969506816. Throughput: 0: 10604.1. Samples: 242419712. Policy #0 lag: (min: 47.0, avg: 147.2, max: 303.0) [2024-06-15 17:44:20,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 17:44:25,045][1652475] Updated weights for policy 0, policy_version 473441 (0.0089) [2024-06-15 17:44:25,738][1648984] Fps is (10 sec: 36044.7, 60 sec: 42052.3, 300 sec: 43098.2). Total num frames: 969670656. Throughput: 0: 10854.4. Samples: 242491904. 
Policy #0 lag: (min: 47.0, avg: 147.2, max: 303.0) [2024-06-15 17:44:25,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 17:44:26,980][1652475] Updated weights for policy 0, policy_version 473524 (0.0017) [2024-06-15 17:44:28,127][1652475] Updated weights for policy 0, policy_version 473559 (0.0015) [2024-06-15 17:44:28,884][1652475] Updated weights for policy 0, policy_version 473598 (0.0015) [2024-06-15 17:44:30,756][1648984] Fps is (10 sec: 42521.6, 60 sec: 43677.6, 300 sec: 43206.7). Total num frames: 969932800. Throughput: 0: 10713.6. Samples: 242555904. Policy #0 lag: (min: 47.0, avg: 147.2, max: 303.0) [2024-06-15 17:44:30,757][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 17:44:31,976][1652475] Updated weights for policy 0, policy_version 473650 (0.0156) [2024-06-15 17:44:35,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 41506.1, 300 sec: 43098.5). Total num frames: 970063872. Throughput: 0: 10899.9. Samples: 242591232. Policy #0 lag: (min: 47.0, avg: 147.2, max: 303.0) [2024-06-15 17:44:35,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 17:44:36,178][1652475] Updated weights for policy 0, policy_version 473680 (0.0012) [2024-06-15 17:44:38,960][1652475] Updated weights for policy 0, policy_version 473786 (0.0015) [2024-06-15 17:44:40,739][1648984] Fps is (10 sec: 45952.0, 60 sec: 42597.5, 300 sec: 43653.5). Total num frames: 970391552. Throughput: 0: 10547.0. Samples: 242644992. Policy #0 lag: (min: 47.0, avg: 147.2, max: 303.0) [2024-06-15 17:44:40,740][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 17:44:41,298][1652475] Updated weights for policy 0, policy_version 473850 (0.0013) [2024-06-15 17:44:43,589][1651340] Signal inference workers to stop experience collection... (24300 times) [2024-06-15 17:44:43,612][1652475] InferenceWorker_p0-w0: stopping experience collection (24300 times) [2024-06-15 17:44:43,771][1651340] Signal inference workers to resume experience collection... 
(24300 times) [2024-06-15 17:44:43,772][1652475] InferenceWorker_p0-w0: resuming experience collection (24300 times) [2024-06-15 17:44:44,633][1652475] Updated weights for policy 0, policy_version 473915 (0.0013) [2024-06-15 17:44:45,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 43690.7, 300 sec: 43098.2). Total num frames: 970588160. Throughput: 0: 10683.7. Samples: 242717696. Policy #0 lag: (min: 47.0, avg: 147.2, max: 303.0) [2024-06-15 17:44:45,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 17:44:49,299][1652475] Updated weights for policy 0, policy_version 473987 (0.0130) [2024-06-15 17:44:50,738][1648984] Fps is (10 sec: 45881.4, 60 sec: 43690.6, 300 sec: 43542.6). Total num frames: 970850304. Throughput: 0: 10877.2. Samples: 242752512. Policy #0 lag: (min: 0.0, avg: 84.6, max: 256.0) [2024-06-15 17:44:50,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 17:44:52,471][1652475] Updated weights for policy 0, policy_version 474067 (0.0017) [2024-06-15 17:44:53,344][1652475] Updated weights for policy 0, policy_version 474110 (0.0014) [2024-06-15 17:44:55,738][1648984] Fps is (10 sec: 39321.3, 60 sec: 43690.8, 300 sec: 43098.2). Total num frames: 970981376. Throughput: 0: 10606.5. Samples: 242812928. Policy #0 lag: (min: 0.0, avg: 84.6, max: 256.0) [2024-06-15 17:44:55,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 17:44:55,742][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000474112_970981376.pth... [2024-06-15 17:44:55,914][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000469056_960626688.pth [2024-06-15 17:44:57,513][1652475] Updated weights for policy 0, policy_version 474176 (0.0140) [2024-06-15 17:45:00,738][1648984] Fps is (10 sec: 32768.0, 60 sec: 43144.5, 300 sec: 43209.3). Total num frames: 971177984. Throughput: 0: 11002.3. Samples: 242881536. 
Policy #0 lag: (min: 0.0, avg: 84.6, max: 256.0) [2024-06-15 17:45:00,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 17:45:01,663][1652475] Updated weights for policy 0, policy_version 474256 (0.0013) [2024-06-15 17:45:04,373][1652475] Updated weights for policy 0, policy_version 474320 (0.0012) [2024-06-15 17:45:05,738][1648984] Fps is (10 sec: 52429.2, 60 sec: 43690.7, 300 sec: 43875.8). Total num frames: 971505664. Throughput: 0: 10865.8. Samples: 242908672. Policy #0 lag: (min: 0.0, avg: 84.6, max: 256.0) [2024-06-15 17:45:05,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 17:45:08,517][1652475] Updated weights for policy 0, policy_version 474369 (0.0011) [2024-06-15 17:45:09,957][1652475] Updated weights for policy 0, policy_version 474427 (0.0013) [2024-06-15 17:45:10,738][1648984] Fps is (10 sec: 45875.0, 60 sec: 43698.1, 300 sec: 43098.3). Total num frames: 971636736. Throughput: 0: 10808.9. Samples: 242978304. Policy #0 lag: (min: 0.0, avg: 84.6, max: 256.0) [2024-06-15 17:45:10,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 17:45:12,651][1652475] Updated weights for policy 0, policy_version 474481 (0.0142) [2024-06-15 17:45:14,469][1652475] Updated weights for policy 0, policy_version 474556 (0.0013) [2024-06-15 17:45:15,738][1648984] Fps is (10 sec: 39320.9, 60 sec: 43144.4, 300 sec: 43542.5). Total num frames: 971898880. Throughput: 0: 10813.2. Samples: 243042304. Policy #0 lag: (min: 0.0, avg: 84.6, max: 256.0) [2024-06-15 17:45:15,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 17:45:17,268][1652475] Updated weights for policy 0, policy_version 474608 (0.0014) [2024-06-15 17:45:20,738][1648984] Fps is (10 sec: 39321.2, 60 sec: 42052.2, 300 sec: 43098.2). Total num frames: 972029952. Throughput: 0: 10763.3. Samples: 243075584. 
Policy #0 lag: (min: 0.0, avg: 84.6, max: 256.0) [2024-06-15 17:45:20,739][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 17:45:21,347][1652475] Updated weights for policy 0, policy_version 474656 (0.0015) [2024-06-15 17:45:24,124][1652475] Updated weights for policy 0, policy_version 474740 (0.0099) [2024-06-15 17:45:25,738][1648984] Fps is (10 sec: 45876.1, 60 sec: 44783.0, 300 sec: 43431.5). Total num frames: 972357632. Throughput: 0: 11036.8. Samples: 243141632. Policy #0 lag: (min: 0.0, avg: 84.6, max: 256.0) [2024-06-15 17:45:25,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 17:45:25,753][1652475] Updated weights for policy 0, policy_version 474800 (0.0014) [2024-06-15 17:45:28,967][1651340] Signal inference workers to stop experience collection... (24350 times) [2024-06-15 17:45:29,026][1652475] InferenceWorker_p0-w0: stopping experience collection (24350 times) [2024-06-15 17:45:29,185][1651340] Signal inference workers to resume experience collection... (24350 times) [2024-06-15 17:45:29,186][1652475] InferenceWorker_p0-w0: resuming experience collection (24350 times) [2024-06-15 17:45:29,188][1652475] Updated weights for policy 0, policy_version 474848 (0.0014) [2024-06-15 17:45:30,738][1648984] Fps is (10 sec: 52429.6, 60 sec: 43703.8, 300 sec: 43653.6). Total num frames: 972554240. Throughput: 0: 10820.3. Samples: 243204608. Policy #0 lag: (min: 0.0, avg: 84.6, max: 256.0) [2024-06-15 17:45:30,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 17:45:34,059][1652475] Updated weights for policy 0, policy_version 474928 (0.0016) [2024-06-15 17:45:35,738][1648984] Fps is (10 sec: 36044.9, 60 sec: 44236.8, 300 sec: 43209.3). Total num frames: 972718080. Throughput: 0: 10865.8. Samples: 243241472. 
Policy #0 lag: (min: 0.0, avg: 84.6, max: 256.0) [2024-06-15 17:45:35,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 17:45:36,956][1652475] Updated weights for policy 0, policy_version 475008 (0.0012) [2024-06-15 17:45:38,267][1652475] Updated weights for policy 0, policy_version 475072 (0.0012) [2024-06-15 17:45:40,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 42599.4, 300 sec: 43543.2). Total num frames: 972947456. Throughput: 0: 10774.8. Samples: 243297792. Policy #0 lag: (min: 0.0, avg: 84.6, max: 256.0) [2024-06-15 17:45:40,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 17:45:41,934][1652475] Updated weights for policy 0, policy_version 475124 (0.0041) [2024-06-15 17:45:45,738][1648984] Fps is (10 sec: 36044.8, 60 sec: 41506.2, 300 sec: 42653.9). Total num frames: 973078528. Throughput: 0: 10786.1. Samples: 243366912. Policy #0 lag: (min: 0.0, avg: 84.6, max: 256.0) [2024-06-15 17:45:45,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 17:45:47,207][1652475] Updated weights for policy 0, policy_version 475188 (0.0015) [2024-06-15 17:45:48,896][1652475] Updated weights for policy 0, policy_version 475261 (0.0012) [2024-06-15 17:45:50,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 41506.1, 300 sec: 43098.2). Total num frames: 973340672. Throughput: 0: 10683.7. Samples: 243389440. Policy #0 lag: (min: 0.0, avg: 84.6, max: 256.0) [2024-06-15 17:45:50,739][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 17:45:52,557][1652475] Updated weights for policy 0, policy_version 475326 (0.0015) [2024-06-15 17:45:53,854][1652475] Updated weights for policy 0, policy_version 475389 (0.0014) [2024-06-15 17:45:55,738][1648984] Fps is (10 sec: 52428.4, 60 sec: 43690.7, 300 sec: 42987.2). Total num frames: 973602816. Throughput: 0: 10467.6. Samples: 243449344. 
Policy #0 lag: (min: 0.0, avg: 84.6, max: 256.0) [2024-06-15 17:45:55,738][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 17:46:00,738][1648984] Fps is (10 sec: 36045.0, 60 sec: 42052.3, 300 sec: 42765.0). Total num frames: 973701120. Throughput: 0: 10649.7. Samples: 243521536. Policy #0 lag: (min: 0.0, avg: 84.6, max: 256.0) [2024-06-15 17:46:00,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 17:46:01,416][1652475] Updated weights for policy 0, policy_version 475472 (0.0142) [2024-06-15 17:46:03,583][1652475] Updated weights for policy 0, policy_version 475522 (0.0118) [2024-06-15 17:46:05,738][1648984] Fps is (10 sec: 45875.3, 60 sec: 42598.4, 300 sec: 42987.2). Total num frames: 974061568. Throughput: 0: 10444.8. Samples: 243545600. Policy #0 lag: (min: 0.0, avg: 84.6, max: 256.0) [2024-06-15 17:46:05,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 17:46:05,816][1652475] Updated weights for policy 0, policy_version 475621 (0.0140) [2024-06-15 17:46:10,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 41506.2, 300 sec: 42654.0). Total num frames: 974127104. Throughput: 0: 10331.0. Samples: 243606528. Policy #0 lag: (min: 0.0, avg: 84.6, max: 256.0) [2024-06-15 17:46:10,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 17:46:13,779][1652475] Updated weights for policy 0, policy_version 475703 (0.0013) [2024-06-15 17:46:15,354][1652475] Updated weights for policy 0, policy_version 475768 (0.0124) [2024-06-15 17:46:15,604][1651340] Signal inference workers to stop experience collection... (24400 times) [2024-06-15 17:46:15,623][1651340] Signal inference workers to resume experience collection... (24400 times) [2024-06-15 17:46:15,638][1652475] InferenceWorker_p0-w0: stopping experience collection (24400 times) [2024-06-15 17:46:15,666][1652475] InferenceWorker_p0-w0: resuming experience collection (24400 times) [2024-06-15 17:46:15,738][1648984] Fps is (10 sec: 32767.9, 60 sec: 41506.2, 300 sec: 42765.0). 
Total num frames: 974389248. Throughput: 0: 10285.5. Samples: 243667456. Policy #0 lag: (min: 31.0, avg: 87.0, max: 287.0) [2024-06-15 17:46:15,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 17:46:16,310][1652475] Updated weights for policy 0, policy_version 475808 (0.0012) [2024-06-15 17:46:17,825][1652475] Updated weights for policy 0, policy_version 475872 (0.0013) [2024-06-15 17:46:20,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 43690.8, 300 sec: 43098.3). Total num frames: 974651392. Throughput: 0: 10057.9. Samples: 243694080. Policy #0 lag: (min: 31.0, avg: 87.0, max: 287.0) [2024-06-15 17:46:20,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 17:46:25,053][1652475] Updated weights for policy 0, policy_version 475936 (0.0015) [2024-06-15 17:46:25,738][1648984] Fps is (10 sec: 36044.7, 60 sec: 39867.7, 300 sec: 42987.2). Total num frames: 974749696. Throughput: 0: 10456.2. Samples: 243768320. Policy #0 lag: (min: 31.0, avg: 87.0, max: 287.0) [2024-06-15 17:46:25,738][1648984] Avg episode reward: [(0, '-0.550')] [2024-06-15 17:46:27,271][1652475] Updated weights for policy 0, policy_version 476001 (0.0032) [2024-06-15 17:46:29,033][1652475] Updated weights for policy 0, policy_version 476086 (0.0013) [2024-06-15 17:46:30,738][1648984] Fps is (10 sec: 39321.1, 60 sec: 41506.1, 300 sec: 42765.0). Total num frames: 975044608. Throughput: 0: 10262.7. Samples: 243828736. Policy #0 lag: (min: 31.0, avg: 87.0, max: 287.0) [2024-06-15 17:46:30,739][1648984] Avg episode reward: [(0, '-0.530')] [2024-06-15 17:46:31,913][1652475] Updated weights for policy 0, policy_version 476118 (0.0039) [2024-06-15 17:46:35,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 40959.9, 300 sec: 42654.8). Total num frames: 975175680. Throughput: 0: 10444.8. Samples: 243859456. 
Policy #0 lag: (min: 31.0, avg: 87.0, max: 287.0) [2024-06-15 17:46:35,739][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 17:46:36,454][1652475] Updated weights for policy 0, policy_version 476186 (0.0014) [2024-06-15 17:46:38,613][1652475] Updated weights for policy 0, policy_version 476229 (0.0104) [2024-06-15 17:46:40,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 42052.2, 300 sec: 43209.3). Total num frames: 975470592. Throughput: 0: 10592.7. Samples: 243926016. Policy #0 lag: (min: 31.0, avg: 87.0, max: 287.0) [2024-06-15 17:46:40,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 17:46:40,816][1652475] Updated weights for policy 0, policy_version 476305 (0.0013) [2024-06-15 17:46:41,913][1652475] Updated weights for policy 0, policy_version 476352 (0.0012) [2024-06-15 17:46:45,381][1652475] Updated weights for policy 0, policy_version 476408 (0.0031) [2024-06-15 17:46:45,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 43690.6, 300 sec: 42653.9). Total num frames: 975699968. Throughput: 0: 10478.9. Samples: 243993088. Policy #0 lag: (min: 31.0, avg: 87.0, max: 287.0) [2024-06-15 17:46:45,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 17:46:47,633][1652475] Updated weights for policy 0, policy_version 476436 (0.0017) [2024-06-15 17:46:48,361][1652475] Updated weights for policy 0, policy_version 476480 (0.0014) [2024-06-15 17:46:50,738][1648984] Fps is (10 sec: 39322.5, 60 sec: 42052.3, 300 sec: 42765.0). Total num frames: 975863808. Throughput: 0: 10729.3. Samples: 244028416. Policy #0 lag: (min: 31.0, avg: 87.0, max: 287.0) [2024-06-15 17:46:50,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 17:46:52,422][1652475] Updated weights for policy 0, policy_version 476577 (0.0015) [2024-06-15 17:46:53,091][1652475] Updated weights for policy 0, policy_version 476607 (0.0012) [2024-06-15 17:46:55,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 41506.2, 300 sec: 42876.1). Total num frames: 976093184. 
Throughput: 0: 10843.0. Samples: 244094464. Policy #0 lag: (min: 31.0, avg: 87.0, max: 287.0) [2024-06-15 17:46:55,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 17:46:56,200][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000476640_976158720.pth... [2024-06-15 17:46:56,304][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000471616_965869568.pth [2024-06-15 17:46:58,842][1652475] Updated weights for policy 0, policy_version 476688 (0.0015) [2024-06-15 17:47:00,738][1648984] Fps is (10 sec: 49151.7, 60 sec: 44236.8, 300 sec: 43098.3). Total num frames: 976355328. Throughput: 0: 11013.7. Samples: 244163072. Policy #0 lag: (min: 31.0, avg: 87.0, max: 287.0) [2024-06-15 17:47:00,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 17:47:01,835][1651340] Signal inference workers to stop experience collection... (24450 times) [2024-06-15 17:47:01,897][1652475] InferenceWorker_p0-w0: stopping experience collection (24450 times) [2024-06-15 17:47:02,068][1651340] Signal inference workers to resume experience collection... (24450 times) [2024-06-15 17:47:02,069][1652475] InferenceWorker_p0-w0: resuming experience collection (24450 times) [2024-06-15 17:47:02,840][1652475] Updated weights for policy 0, policy_version 476784 (0.0014) [2024-06-15 17:47:04,399][1652475] Updated weights for policy 0, policy_version 476848 (0.0014) [2024-06-15 17:47:05,738][1648984] Fps is (10 sec: 52428.2, 60 sec: 42598.3, 300 sec: 43209.3). Total num frames: 976617472. Throughput: 0: 11047.8. Samples: 244191232. Policy #0 lag: (min: 31.0, avg: 87.0, max: 287.0) [2024-06-15 17:47:05,739][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 17:47:08,415][1652475] Updated weights for policy 0, policy_version 476915 (0.0013) [2024-06-15 17:47:10,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 43690.7, 300 sec: 42653.9). Total num frames: 976748544. Throughput: 0: 10956.8. Samples: 244261376. 
Policy #0 lag: (min: 31.0, avg: 87.0, max: 287.0) [2024-06-15 17:47:10,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 17:47:11,699][1652475] Updated weights for policy 0, policy_version 476989 (0.0094) [2024-06-15 17:47:14,797][1652475] Updated weights for policy 0, policy_version 477040 (0.0013) [2024-06-15 17:47:15,738][1648984] Fps is (10 sec: 42598.7, 60 sec: 44236.8, 300 sec: 43211.7). Total num frames: 977043456. Throughput: 0: 11025.1. Samples: 244324864. Policy #0 lag: (min: 31.0, avg: 87.0, max: 287.0) [2024-06-15 17:47:15,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 17:47:17,093][1652475] Updated weights for policy 0, policy_version 477120 (0.0012) [2024-06-15 17:47:20,738][1648984] Fps is (10 sec: 45875.2, 60 sec: 42598.4, 300 sec: 42765.0). Total num frames: 977207296. Throughput: 0: 11002.3. Samples: 244354560. Policy #0 lag: (min: 31.0, avg: 87.0, max: 287.0) [2024-06-15 17:47:20,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 17:47:21,331][1652475] Updated weights for policy 0, policy_version 477181 (0.0012) [2024-06-15 17:47:24,417][1652475] Updated weights for policy 0, policy_version 477245 (0.0014) [2024-06-15 17:47:25,738][1648984] Fps is (10 sec: 39321.9, 60 sec: 44783.0, 300 sec: 42987.2). Total num frames: 977436672. Throughput: 0: 11002.4. Samples: 244421120. Policy #0 lag: (min: 31.0, avg: 87.0, max: 287.0) [2024-06-15 17:47:25,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 17:47:26,475][1652475] Updated weights for policy 0, policy_version 477298 (0.0014) [2024-06-15 17:47:27,984][1652475] Updated weights for policy 0, policy_version 477360 (0.0013) [2024-06-15 17:47:30,738][1648984] Fps is (10 sec: 45875.1, 60 sec: 43690.8, 300 sec: 43098.3). Total num frames: 977666048. Throughput: 0: 11138.9. Samples: 244494336. 
Policy #0 lag: (min: 31.0, avg: 87.0, max: 287.0) [2024-06-15 17:47:30,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 17:47:32,337][1652475] Updated weights for policy 0, policy_version 477408 (0.0011) [2024-06-15 17:47:33,184][1652475] Updated weights for policy 0, policy_version 477440 (0.0012) [2024-06-15 17:47:35,135][1652475] Updated weights for policy 0, policy_version 477501 (0.0013) [2024-06-15 17:47:35,738][1648984] Fps is (10 sec: 49152.2, 60 sec: 45875.3, 300 sec: 43098.3). Total num frames: 977928192. Throughput: 0: 11127.5. Samples: 244529152. Policy #0 lag: (min: 31.0, avg: 87.0, max: 287.0) [2024-06-15 17:47:35,740][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 17:47:38,514][1652475] Updated weights for policy 0, policy_version 477601 (0.0015) [2024-06-15 17:47:40,738][1648984] Fps is (10 sec: 52427.9, 60 sec: 45329.1, 300 sec: 43320.4). Total num frames: 978190336. Throughput: 0: 11036.4. Samples: 244591104. Policy #0 lag: (min: 47.0, avg: 208.9, max: 335.0) [2024-06-15 17:47:40,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 17:47:43,056][1652475] Updated weights for policy 0, policy_version 477648 (0.0012) [2024-06-15 17:47:45,718][1652475] Updated weights for policy 0, policy_version 477713 (0.0012) [2024-06-15 17:47:45,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 44236.8, 300 sec: 42765.0). Total num frames: 978354176. Throughput: 0: 11241.3. Samples: 244668928. Policy #0 lag: (min: 47.0, avg: 208.9, max: 335.0) [2024-06-15 17:47:45,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 17:47:46,075][1651340] Signal inference workers to stop experience collection... (24500 times) [2024-06-15 17:47:46,104][1652475] InferenceWorker_p0-w0: stopping experience collection (24500 times) [2024-06-15 17:47:46,303][1651340] Signal inference workers to resume experience collection... 
(24500 times) [2024-06-15 17:47:46,305][1652475] InferenceWorker_p0-w0: resuming experience collection (24500 times) [2024-06-15 17:47:48,223][1652475] Updated weights for policy 0, policy_version 477778 (0.0012) [2024-06-15 17:47:49,803][1652475] Updated weights for policy 0, policy_version 477840 (0.0094) [2024-06-15 17:47:50,738][1648984] Fps is (10 sec: 49153.0, 60 sec: 46967.4, 300 sec: 43431.7). Total num frames: 978681856. Throughput: 0: 11343.7. Samples: 244701696. Policy #0 lag: (min: 47.0, avg: 208.9, max: 335.0) [2024-06-15 17:47:50,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 17:47:51,026][1652475] Updated weights for policy 0, policy_version 477888 (0.0016) [2024-06-15 17:47:55,691][1652475] Updated weights for policy 0, policy_version 477946 (0.0015) [2024-06-15 17:47:55,738][1648984] Fps is (10 sec: 45874.8, 60 sec: 45329.0, 300 sec: 42987.2). Total num frames: 978812928. Throughput: 0: 11332.2. Samples: 244771328. Policy #0 lag: (min: 47.0, avg: 208.9, max: 335.0) [2024-06-15 17:47:55,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 17:47:57,837][1652475] Updated weights for policy 0, policy_version 478012 (0.0014) [2024-06-15 17:48:00,738][1648984] Fps is (10 sec: 29489.9, 60 sec: 43690.4, 300 sec: 43098.2). Total num frames: 978976768. Throughput: 0: 11184.3. Samples: 244828160. Policy #0 lag: (min: 47.0, avg: 208.9, max: 335.0) [2024-06-15 17:48:00,739][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 17:48:02,565][1652475] Updated weights for policy 0, policy_version 478080 (0.0014) [2024-06-15 17:48:04,304][1652475] Updated weights for policy 0, policy_version 478144 (0.0014) [2024-06-15 17:48:05,738][1648984] Fps is (10 sec: 42598.7, 60 sec: 43690.8, 300 sec: 43320.4). Total num frames: 979238912. Throughput: 0: 11104.7. Samples: 244854272. 
Policy #0 lag: (min: 47.0, avg: 208.9, max: 335.0) [2024-06-15 17:48:05,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 17:48:08,158][1652475] Updated weights for policy 0, policy_version 478202 (0.0016) [2024-06-15 17:48:10,738][1648984] Fps is (10 sec: 39323.4, 60 sec: 43690.7, 300 sec: 42658.7). Total num frames: 979369984. Throughput: 0: 11036.5. Samples: 244917760. Policy #0 lag: (min: 47.0, avg: 208.9, max: 335.0) [2024-06-15 17:48:10,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 17:48:12,480][1652475] Updated weights for policy 0, policy_version 478269 (0.0124) [2024-06-15 17:48:15,028][1652475] Updated weights for policy 0, policy_version 478328 (0.0014) [2024-06-15 17:48:15,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 43144.6, 300 sec: 43098.2). Total num frames: 979632128. Throughput: 0: 10638.2. Samples: 244973056. Policy #0 lag: (min: 47.0, avg: 208.9, max: 335.0) [2024-06-15 17:48:15,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 17:48:17,499][1652475] Updated weights for policy 0, policy_version 478398 (0.0013) [2024-06-15 17:48:19,824][1652475] Updated weights for policy 0, policy_version 478452 (0.0017) [2024-06-15 17:48:20,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 44782.9, 300 sec: 43209.3). Total num frames: 979894272. Throughput: 0: 10581.3. Samples: 245005312. Policy #0 lag: (min: 47.0, avg: 208.9, max: 335.0) [2024-06-15 17:48:20,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 17:48:25,738][1648984] Fps is (10 sec: 29490.9, 60 sec: 41506.1, 300 sec: 42765.0). Total num frames: 979927040. Throughput: 0: 10672.4. Samples: 245071360. 
Policy #0 lag: (min: 47.0, avg: 208.9, max: 335.0) [2024-06-15 17:48:25,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 17:48:26,478][1652475] Updated weights for policy 0, policy_version 478522 (0.0025) [2024-06-15 17:48:29,021][1652475] Updated weights for policy 0, policy_version 478592 (0.0039) [2024-06-15 17:48:30,732][1652475] Updated weights for policy 0, policy_version 478656 (0.0014) [2024-06-15 17:48:30,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 43690.7, 300 sec: 43098.3). Total num frames: 980287488. Throughput: 0: 10149.0. Samples: 245125632. Policy #0 lag: (min: 47.0, avg: 208.9, max: 335.0) [2024-06-15 17:48:30,740][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 17:48:35,738][1648984] Fps is (10 sec: 49152.3, 60 sec: 41506.1, 300 sec: 42654.0). Total num frames: 980418560. Throughput: 0: 10012.4. Samples: 245152256. Policy #0 lag: (min: 47.0, avg: 208.9, max: 335.0) [2024-06-15 17:48:35,738][1648984] Avg episode reward: [(0, '-0.230')] [2024-06-15 17:48:37,426][1652475] Updated weights for policy 0, policy_version 478736 (0.0013) [2024-06-15 17:48:37,548][1651340] Signal inference workers to stop experience collection... (24550 times) [2024-06-15 17:48:37,609][1652475] InferenceWorker_p0-w0: stopping experience collection (24550 times) [2024-06-15 17:48:37,851][1651340] Signal inference workers to resume experience collection... (24550 times) [2024-06-15 17:48:37,852][1652475] InferenceWorker_p0-w0: resuming experience collection (24550 times) [2024-06-15 17:48:39,289][1652475] Updated weights for policy 0, policy_version 478787 (0.0114) [2024-06-15 17:48:40,616][1652475] Updated weights for policy 0, policy_version 478833 (0.0012) [2024-06-15 17:48:40,738][1648984] Fps is (10 sec: 36044.5, 60 sec: 40960.1, 300 sec: 42987.2). Total num frames: 980647936. Throughput: 0: 9989.7. Samples: 245220864. 
Policy #0 lag: (min: 47.0, avg: 208.9, max: 335.0) [2024-06-15 17:48:40,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 17:48:43,255][1652475] Updated weights for policy 0, policy_version 478871 (0.0015) [2024-06-15 17:48:45,314][1652475] Updated weights for policy 0, policy_version 478928 (0.0019) [2024-06-15 17:48:45,738][1648984] Fps is (10 sec: 45874.7, 60 sec: 42052.2, 300 sec: 42876.1). Total num frames: 980877312. Throughput: 0: 10160.4. Samples: 245285376. Policy #0 lag: (min: 47.0, avg: 208.9, max: 335.0) [2024-06-15 17:48:45,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 17:48:46,232][1652475] Updated weights for policy 0, policy_version 478976 (0.0013) [2024-06-15 17:48:50,713][1652475] Updated weights for policy 0, policy_version 479042 (0.0016) [2024-06-15 17:48:50,739][1648984] Fps is (10 sec: 42595.0, 60 sec: 39867.1, 300 sec: 43098.2). Total num frames: 981073920. Throughput: 0: 10330.8. Samples: 245319168. Policy #0 lag: (min: 47.0, avg: 208.9, max: 335.0) [2024-06-15 17:48:50,739][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 17:48:52,062][1652475] Updated weights for policy 0, policy_version 479102 (0.0014) [2024-06-15 17:48:55,616][1652475] Updated weights for policy 0, policy_version 479167 (0.0021) [2024-06-15 17:48:55,738][1648984] Fps is (10 sec: 45874.3, 60 sec: 42052.1, 300 sec: 43209.3). Total num frames: 981336064. Throughput: 0: 10353.7. Samples: 245383680. Policy #0 lag: (min: 47.0, avg: 208.9, max: 335.0) [2024-06-15 17:48:55,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 17:48:55,780][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000479168_981336064.pth... [2024-06-15 17:48:55,858][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000474112_970981376.pth [2024-06-15 17:49:00,738][1648984] Fps is (10 sec: 39324.8, 60 sec: 41506.4, 300 sec: 42653.9). Total num frames: 981467136. Throughput: 0: 10490.3. 
Samples: 245445120. Policy #0 lag: (min: 47.0, avg: 208.9, max: 335.0) [2024-06-15 17:49:00,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 17:49:01,014][1652475] Updated weights for policy 0, policy_version 479233 (0.0013) [2024-06-15 17:49:03,558][1652475] Updated weights for policy 0, policy_version 479331 (0.0012) [2024-06-15 17:49:05,738][1648984] Fps is (10 sec: 39323.0, 60 sec: 41506.2, 300 sec: 43099.8). Total num frames: 981729280. Throughput: 0: 10353.8. Samples: 245471232. Policy #0 lag: (min: 47.0, avg: 208.9, max: 335.0) [2024-06-15 17:49:05,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 17:49:06,940][1652475] Updated weights for policy 0, policy_version 479392 (0.0013) [2024-06-15 17:49:10,738][1648984] Fps is (10 sec: 39320.8, 60 sec: 41505.9, 300 sec: 42542.8). Total num frames: 981860352. Throughput: 0: 10547.2. Samples: 245545984. Policy #0 lag: (min: 15.0, avg: 135.1, max: 271.0) [2024-06-15 17:49:10,739][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 17:49:11,238][1652475] Updated weights for policy 0, policy_version 479440 (0.0017) [2024-06-15 17:49:13,789][1652475] Updated weights for policy 0, policy_version 479520 (0.0231) [2024-06-15 17:49:14,563][1652475] Updated weights for policy 0, policy_version 479552 (0.0013) [2024-06-15 17:49:15,738][1648984] Fps is (10 sec: 45875.3, 60 sec: 42598.4, 300 sec: 42987.2). Total num frames: 982188032. Throughput: 0: 10672.4. Samples: 245605888. Policy #0 lag: (min: 15.0, avg: 135.1, max: 271.0) [2024-06-15 17:49:15,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 17:49:16,078][1652475] Updated weights for policy 0, policy_version 479612 (0.0013) [2024-06-15 17:49:19,696][1652475] Updated weights for policy 0, policy_version 479664 (0.0027) [2024-06-15 17:49:20,738][1648984] Fps is (10 sec: 52429.9, 60 sec: 41506.1, 300 sec: 43098.3). Total num frames: 982384640. Throughput: 0: 10854.4. Samples: 245640704. 
Policy #0 lag: (min: 15.0, avg: 135.1, max: 271.0) [2024-06-15 17:49:20,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 17:49:23,801][1652475] Updated weights for policy 0, policy_version 479728 (0.0025) [2024-06-15 17:49:25,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 44236.8, 300 sec: 42878.7). Total num frames: 982581248. Throughput: 0: 10786.1. Samples: 245706240. Policy #0 lag: (min: 15.0, avg: 135.1, max: 271.0) [2024-06-15 17:49:25,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 17:49:25,810][1652475] Updated weights for policy 0, policy_version 479792 (0.0014) [2024-06-15 17:49:26,664][1651340] Signal inference workers to stop experience collection... (24600 times) [2024-06-15 17:49:26,705][1652475] InferenceWorker_p0-w0: stopping experience collection (24600 times) [2024-06-15 17:49:26,957][1651340] Signal inference workers to resume experience collection... (24600 times) [2024-06-15 17:49:26,974][1652475] InferenceWorker_p0-w0: resuming experience collection (24600 times) [2024-06-15 17:49:27,727][1652475] Updated weights for policy 0, policy_version 479856 (0.0015) [2024-06-15 17:49:30,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 41506.1, 300 sec: 43098.2). Total num frames: 982777856. Throughput: 0: 10774.8. Samples: 245770240. Policy #0 lag: (min: 15.0, avg: 135.1, max: 271.0) [2024-06-15 17:49:30,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 17:49:31,625][1652475] Updated weights for policy 0, policy_version 479904 (0.0022) [2024-06-15 17:49:35,738][1648984] Fps is (10 sec: 36044.5, 60 sec: 42052.2, 300 sec: 42543.0). Total num frames: 982941696. Throughput: 0: 10809.1. Samples: 245805568. 
Policy #0 lag: (min: 15.0, avg: 135.1, max: 271.0) [2024-06-15 17:49:35,739][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 17:49:36,437][1652475] Updated weights for policy 0, policy_version 479985 (0.0014) [2024-06-15 17:49:37,981][1652475] Updated weights for policy 0, policy_version 480062 (0.0013) [2024-06-15 17:49:39,900][1652475] Updated weights for policy 0, policy_version 480124 (0.0126) [2024-06-15 17:49:40,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 44236.8, 300 sec: 43098.2). Total num frames: 983302144. Throughput: 0: 10706.6. Samples: 245865472. Policy #0 lag: (min: 15.0, avg: 135.1, max: 271.0) [2024-06-15 17:49:40,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 17:49:44,737][1652475] Updated weights for policy 0, policy_version 480184 (0.0015) [2024-06-15 17:49:45,738][1648984] Fps is (10 sec: 49152.7, 60 sec: 42598.5, 300 sec: 42653.9). Total num frames: 983433216. Throughput: 0: 10797.5. Samples: 245931008. Policy #0 lag: (min: 15.0, avg: 135.1, max: 271.0) [2024-06-15 17:49:45,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 17:49:48,634][1652475] Updated weights for policy 0, policy_version 480240 (0.0110) [2024-06-15 17:49:50,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 43691.2, 300 sec: 43098.2). Total num frames: 983695360. Throughput: 0: 11013.7. Samples: 245966848. Policy #0 lag: (min: 15.0, avg: 135.1, max: 271.0) [2024-06-15 17:49:50,739][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 17:49:51,072][1652475] Updated weights for policy 0, policy_version 480321 (0.0034) [2024-06-15 17:49:55,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 42052.5, 300 sec: 42987.2). Total num frames: 983859200. Throughput: 0: 10661.0. Samples: 246025728. 
Policy #0 lag: (min: 15.0, avg: 135.1, max: 271.0) [2024-06-15 17:49:55,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 17:49:55,902][1652475] Updated weights for policy 0, policy_version 480401 (0.0015) [2024-06-15 17:50:00,521][1652475] Updated weights for policy 0, policy_version 480480 (0.0093) [2024-06-15 17:50:00,738][1648984] Fps is (10 sec: 32767.5, 60 sec: 42598.2, 300 sec: 42431.7). Total num frames: 984023040. Throughput: 0: 10945.3. Samples: 246098432. Policy #0 lag: (min: 15.0, avg: 135.1, max: 271.0) [2024-06-15 17:50:00,739][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 17:50:02,595][1652475] Updated weights for policy 0, policy_version 480573 (0.0215) [2024-06-15 17:50:04,197][1652475] Updated weights for policy 0, policy_version 480624 (0.0015) [2024-06-15 17:50:05,738][1648984] Fps is (10 sec: 49151.9, 60 sec: 43690.7, 300 sec: 43098.3). Total num frames: 984350720. Throughput: 0: 10717.9. Samples: 246123008. Policy #0 lag: (min: 15.0, avg: 135.1, max: 271.0) [2024-06-15 17:50:05,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 17:50:08,250][1652475] Updated weights for policy 0, policy_version 480688 (0.0019) [2024-06-15 17:50:10,738][1648984] Fps is (10 sec: 45876.4, 60 sec: 43690.8, 300 sec: 42654.0). Total num frames: 984481792. Throughput: 0: 10774.8. Samples: 246191104. Policy #0 lag: (min: 15.0, avg: 135.1, max: 271.0) [2024-06-15 17:50:10,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 17:50:12,453][1652475] Updated weights for policy 0, policy_version 480742 (0.0013) [2024-06-15 17:50:13,408][1651340] Signal inference workers to stop experience collection... (24650 times) [2024-06-15 17:50:13,498][1652475] InferenceWorker_p0-w0: stopping experience collection (24650 times) [2024-06-15 17:50:13,639][1651340] Signal inference workers to resume experience collection... 
(24650 times) [2024-06-15 17:50:13,650][1652475] InferenceWorker_p0-w0: resuming experience collection (24650 times) [2024-06-15 17:50:13,652][1652475] Updated weights for policy 0, policy_version 480800 (0.0013) [2024-06-15 17:50:15,738][1648984] Fps is (10 sec: 39322.0, 60 sec: 42598.4, 300 sec: 43098.3). Total num frames: 984743936. Throughput: 0: 10808.9. Samples: 246256640. Policy #0 lag: (min: 15.0, avg: 135.1, max: 271.0) [2024-06-15 17:50:15,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 17:50:17,315][1652475] Updated weights for policy 0, policy_version 480888 (0.0016) [2024-06-15 17:50:19,122][1652475] Updated weights for policy 0, policy_version 480913 (0.0025) [2024-06-15 17:50:20,740][1648984] Fps is (10 sec: 52428.5, 60 sec: 43690.7, 300 sec: 42876.1). Total num frames: 985006080. Throughput: 0: 10752.0. Samples: 246289408. Policy #0 lag: (min: 15.0, avg: 135.1, max: 271.0) [2024-06-15 17:50:20,741][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 17:50:22,802][1652475] Updated weights for policy 0, policy_version 480961 (0.0078) [2024-06-15 17:50:23,938][1652475] Updated weights for policy 0, policy_version 481024 (0.0013) [2024-06-15 17:50:25,738][1648984] Fps is (10 sec: 42598.0, 60 sec: 43144.5, 300 sec: 42765.0). Total num frames: 985169920. Throughput: 0: 10934.1. Samples: 246357504. Policy #0 lag: (min: 15.0, avg: 135.1, max: 271.0) [2024-06-15 17:50:25,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 17:50:29,074][1652475] Updated weights for policy 0, policy_version 481104 (0.0013) [2024-06-15 17:50:30,738][1648984] Fps is (10 sec: 39321.9, 60 sec: 43690.7, 300 sec: 42987.2). Total num frames: 985399296. Throughput: 0: 10808.9. Samples: 246417408. 
Policy #0 lag: (min: 15.0, avg: 135.1, max: 271.0) [2024-06-15 17:50:30,738][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 17:50:31,617][1652475] Updated weights for policy 0, policy_version 481200 (0.0014) [2024-06-15 17:50:35,738][1648984] Fps is (10 sec: 42598.2, 60 sec: 44236.9, 300 sec: 42876.1). Total num frames: 985595904. Throughput: 0: 10604.1. Samples: 246444032. Policy #0 lag: (min: 8.0, avg: 110.8, max: 264.0) [2024-06-15 17:50:35,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 17:50:36,042][1652475] Updated weights for policy 0, policy_version 481264 (0.0014) [2024-06-15 17:50:40,740][1648984] Fps is (10 sec: 29491.0, 60 sec: 39867.7, 300 sec: 42765.0). Total num frames: 985694208. Throughput: 0: 10774.7. Samples: 246510592. Policy #0 lag: (min: 8.0, avg: 110.8, max: 264.0) [2024-06-15 17:50:40,741][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 17:50:41,583][1652475] Updated weights for policy 0, policy_version 481339 (0.0013) [2024-06-15 17:50:42,879][1652475] Updated weights for policy 0, policy_version 481380 (0.0011) [2024-06-15 17:50:44,790][1652475] Updated weights for policy 0, policy_version 481465 (0.0013) [2024-06-15 17:50:45,738][1648984] Fps is (10 sec: 45874.6, 60 sec: 43690.5, 300 sec: 43098.2). Total num frames: 986054656. Throughput: 0: 10319.7. Samples: 246562816. Policy #0 lag: (min: 8.0, avg: 110.8, max: 264.0) [2024-06-15 17:50:45,738][1648984] Avg episode reward: [(0, '-0.550')] [2024-06-15 17:50:50,738][1648984] Fps is (10 sec: 49152.5, 60 sec: 41506.2, 300 sec: 42654.0). Total num frames: 986185728. Throughput: 0: 10501.7. Samples: 246595584. 
Policy #0 lag: (min: 8.0, avg: 110.8, max: 264.0) [2024-06-15 17:50:50,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 17:50:52,644][1652475] Updated weights for policy 0, policy_version 481542 (0.0012) [2024-06-15 17:50:54,896][1652475] Updated weights for policy 0, policy_version 481632 (0.0011) [2024-06-15 17:50:55,738][1648984] Fps is (10 sec: 39322.0, 60 sec: 43144.5, 300 sec: 43209.3). Total num frames: 986447872. Throughput: 0: 10456.2. Samples: 246661632. Policy #0 lag: (min: 8.0, avg: 110.8, max: 264.0) [2024-06-15 17:50:55,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 17:50:55,765][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000481664_986447872.pth... [2024-06-15 17:50:55,815][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000476640_976158720.pth [2024-06-15 17:50:55,821][1651340] Saving a milestone train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/milestones/checkpoint_000481664_986447872.pth [2024-06-15 17:50:58,394][1652475] Updated weights for policy 0, policy_version 481721 (0.0012) [2024-06-15 17:51:00,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 42598.6, 300 sec: 42431.8). Total num frames: 986578944. Throughput: 0: 10262.7. Samples: 246718464. Policy #0 lag: (min: 8.0, avg: 110.8, max: 264.0) [2024-06-15 17:51:00,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 17:51:02,725][1652475] Updated weights for policy 0, policy_version 481784 (0.0069) [2024-06-15 17:51:04,809][1651340] Signal inference workers to stop experience collection... (24700 times) [2024-06-15 17:51:04,872][1652475] InferenceWorker_p0-w0: stopping experience collection (24700 times) [2024-06-15 17:51:05,057][1651340] Signal inference workers to resume experience collection... 
(24700 times) [2024-06-15 17:51:05,058][1652475] InferenceWorker_p0-w0: resuming experience collection (24700 times) [2024-06-15 17:51:05,297][1652475] Updated weights for policy 0, policy_version 481813 (0.0011) [2024-06-15 17:51:05,738][1648984] Fps is (10 sec: 32768.0, 60 sec: 40413.8, 300 sec: 42876.1). Total num frames: 986775552. Throughput: 0: 10217.2. Samples: 246749184. Policy #0 lag: (min: 8.0, avg: 110.8, max: 264.0) [2024-06-15 17:51:05,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 17:51:07,792][1652475] Updated weights for policy 0, policy_version 481914 (0.0012) [2024-06-15 17:51:10,738][1648984] Fps is (10 sec: 45874.8, 60 sec: 42598.4, 300 sec: 42876.1). Total num frames: 987037696. Throughput: 0: 10001.1. Samples: 246807552. Policy #0 lag: (min: 8.0, avg: 110.8, max: 264.0) [2024-06-15 17:51:10,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 17:51:10,910][1652475] Updated weights for policy 0, policy_version 481959 (0.0015) [2024-06-15 17:51:15,739][1648984] Fps is (10 sec: 32767.4, 60 sec: 39321.4, 300 sec: 42209.6). Total num frames: 987103232. Throughput: 0: 10342.3. Samples: 246882816. Policy #0 lag: (min: 8.0, avg: 110.8, max: 264.0) [2024-06-15 17:51:15,740][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 17:51:16,311][1652475] Updated weights for policy 0, policy_version 482016 (0.0078) [2024-06-15 17:51:18,428][1652475] Updated weights for policy 0, policy_version 482105 (0.0012) [2024-06-15 17:51:20,097][1652475] Updated weights for policy 0, policy_version 482176 (0.0013) [2024-06-15 17:51:20,738][1648984] Fps is (10 sec: 45874.9, 60 sec: 41506.1, 300 sec: 43209.3). Total num frames: 987496448. Throughput: 0: 10240.0. Samples: 246904832. Policy #0 lag: (min: 8.0, avg: 110.8, max: 264.0) [2024-06-15 17:51:20,739][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 17:51:25,738][1648984] Fps is (10 sec: 52429.9, 60 sec: 40960.0, 300 sec: 42654.0). Total num frames: 987627520. 
Throughput: 0: 10194.5. Samples: 246969344. Policy #0 lag: (min: 8.0, avg: 110.8, max: 264.0) [2024-06-15 17:51:25,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 17:51:28,487][1652475] Updated weights for policy 0, policy_version 482260 (0.0019) [2024-06-15 17:51:30,738][1648984] Fps is (10 sec: 36045.2, 60 sec: 40960.0, 300 sec: 42987.2). Total num frames: 987856896. Throughput: 0: 10501.7. Samples: 247035392. Policy #0 lag: (min: 8.0, avg: 110.8, max: 264.0) [2024-06-15 17:51:30,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 17:51:30,787][1652475] Updated weights for policy 0, policy_version 482356 (0.0086) [2024-06-15 17:51:33,801][1652475] Updated weights for policy 0, policy_version 482433 (0.0015) [2024-06-15 17:51:34,911][1652475] Updated weights for policy 0, policy_version 482496 (0.0012) [2024-06-15 17:51:35,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 42598.4, 300 sec: 42987.2). Total num frames: 988151808. Throughput: 0: 10387.9. Samples: 247063040. Policy #0 lag: (min: 8.0, avg: 110.8, max: 264.0) [2024-06-15 17:51:35,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 17:51:40,738][1648984] Fps is (10 sec: 32768.0, 60 sec: 41506.2, 300 sec: 42320.7). Total num frames: 988184576. Throughput: 0: 10695.1. Samples: 247142912. Policy #0 lag: (min: 8.0, avg: 110.8, max: 264.0) [2024-06-15 17:51:40,740][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 17:51:42,639][1652475] Updated weights for policy 0, policy_version 482597 (0.0013) [2024-06-15 17:51:44,218][1652475] Updated weights for policy 0, policy_version 482674 (0.0014) [2024-06-15 17:51:45,738][1648984] Fps is (10 sec: 39321.1, 60 sec: 41506.2, 300 sec: 42987.1). Total num frames: 988545024. Throughput: 0: 10604.1. Samples: 247195648. Policy #0 lag: (min: 8.0, avg: 110.8, max: 264.0) [2024-06-15 17:51:45,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 17:51:46,224][1651340] Signal inference workers to stop experience collection... 
(24750 times) [2024-06-15 17:51:46,259][1652475] InferenceWorker_p0-w0: stopping experience collection (24750 times) [2024-06-15 17:51:46,447][1651340] Signal inference workers to resume experience collection... (24750 times) [2024-06-15 17:51:46,448][1652475] InferenceWorker_p0-w0: resuming experience collection (24750 times) [2024-06-15 17:51:46,692][1652475] Updated weights for policy 0, policy_version 482752 (0.0016) [2024-06-15 17:51:50,738][1648984] Fps is (10 sec: 49151.7, 60 sec: 41506.1, 300 sec: 42653.9). Total num frames: 988676096. Throughput: 0: 10604.1. Samples: 247226368. Policy #0 lag: (min: 8.0, avg: 110.8, max: 264.0) [2024-06-15 17:51:50,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 17:51:54,429][1652475] Updated weights for policy 0, policy_version 482834 (0.0013) [2024-06-15 17:51:55,738][1648984] Fps is (10 sec: 42598.6, 60 sec: 42052.3, 300 sec: 42765.0). Total num frames: 988971008. Throughput: 0: 10956.8. Samples: 247300608. Policy #0 lag: (min: 8.0, avg: 110.8, max: 264.0) [2024-06-15 17:51:55,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 17:51:56,531][1652475] Updated weights for policy 0, policy_version 482937 (0.0014) [2024-06-15 17:51:58,335][1652475] Updated weights for policy 0, policy_version 482992 (0.0115) [2024-06-15 17:52:00,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 43690.6, 300 sec: 42654.0). Total num frames: 989200384. Throughput: 0: 10649.6. Samples: 247362048. Policy #0 lag: (min: 8.0, avg: 110.8, max: 264.0) [2024-06-15 17:52:00,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 17:52:05,429][1652475] Updated weights for policy 0, policy_version 483060 (0.0011) [2024-06-15 17:52:05,738][1648984] Fps is (10 sec: 36045.3, 60 sec: 42598.5, 300 sec: 42653.9). Total num frames: 989331456. Throughput: 0: 11093.4. Samples: 247404032. 
Policy #0 lag: (min: 15.0, avg: 66.3, max: 271.0) [2024-06-15 17:52:05,741][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 17:52:06,326][1652475] Updated weights for policy 0, policy_version 483111 (0.0091) [2024-06-15 17:52:07,745][1652475] Updated weights for policy 0, policy_version 483172 (0.0018) [2024-06-15 17:52:09,271][1652475] Updated weights for policy 0, policy_version 483238 (0.0013) [2024-06-15 17:52:10,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 44783.0, 300 sec: 42987.2). Total num frames: 989724672. Throughput: 0: 10990.9. Samples: 247463936. Policy #0 lag: (min: 15.0, avg: 66.3, max: 271.0) [2024-06-15 17:52:10,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 17:52:15,738][1648984] Fps is (10 sec: 39321.2, 60 sec: 43690.8, 300 sec: 42431.8). Total num frames: 989724672. Throughput: 0: 11241.2. Samples: 247541248. Policy #0 lag: (min: 15.0, avg: 66.3, max: 271.0) [2024-06-15 17:52:15,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 17:52:17,164][1652475] Updated weights for policy 0, policy_version 483313 (0.0014) [2024-06-15 17:52:18,459][1652475] Updated weights for policy 0, policy_version 483376 (0.0013) [2024-06-15 17:52:19,611][1652475] Updated weights for policy 0, policy_version 483432 (0.0102) [2024-06-15 17:52:20,738][1648984] Fps is (10 sec: 42598.5, 60 sec: 44236.9, 300 sec: 43098.3). Total num frames: 990150656. Throughput: 0: 11343.6. Samples: 247573504. Policy #0 lag: (min: 15.0, avg: 66.3, max: 271.0) [2024-06-15 17:52:20,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 17:52:21,776][1652475] Updated weights for policy 0, policy_version 483510 (0.0013) [2024-06-15 17:52:25,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 43690.7, 300 sec: 42653.9). Total num frames: 990248960. Throughput: 0: 10956.8. Samples: 247635968. 
Policy #0 lag: (min: 15.0, avg: 66.3, max: 271.0) [2024-06-15 17:52:25,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 17:52:28,787][1652475] Updated weights for policy 0, policy_version 483554 (0.0020) [2024-06-15 17:52:30,315][1651340] Signal inference workers to stop experience collection... (24800 times) [2024-06-15 17:52:30,348][1652475] InferenceWorker_p0-w0: stopping experience collection (24800 times) [2024-06-15 17:52:30,541][1651340] Signal inference workers to resume experience collection... (24800 times) [2024-06-15 17:52:30,543][1652475] InferenceWorker_p0-w0: resuming experience collection (24800 times) [2024-06-15 17:52:30,738][1648984] Fps is (10 sec: 32768.1, 60 sec: 43690.7, 300 sec: 42542.9). Total num frames: 990478336. Throughput: 0: 11275.4. Samples: 247703040. Policy #0 lag: (min: 15.0, avg: 66.3, max: 271.0) [2024-06-15 17:52:30,740][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 17:52:31,075][1652475] Updated weights for policy 0, policy_version 483652 (0.0210) [2024-06-15 17:52:33,527][1652475] Updated weights for policy 0, policy_version 483760 (0.0091) [2024-06-15 17:52:35,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 43690.6, 300 sec: 42654.0). Total num frames: 990773248. Throughput: 0: 11036.4. Samples: 247723008. Policy #0 lag: (min: 15.0, avg: 66.3, max: 271.0) [2024-06-15 17:52:35,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 17:52:40,738][1648984] Fps is (10 sec: 29490.9, 60 sec: 43144.5, 300 sec: 42098.5). Total num frames: 990773248. Throughput: 0: 11002.3. Samples: 247795712. 
Policy #0 lag: (min: 15.0, avg: 66.3, max: 271.0) [2024-06-15 17:52:40,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 17:52:41,291][1652475] Updated weights for policy 0, policy_version 483816 (0.0098) [2024-06-15 17:52:42,737][1652475] Updated weights for policy 0, policy_version 483888 (0.0014) [2024-06-15 17:52:44,376][1652475] Updated weights for policy 0, policy_version 483952 (0.0012) [2024-06-15 17:52:45,738][1648984] Fps is (10 sec: 45875.0, 60 sec: 44783.0, 300 sec: 42542.8). Total num frames: 991232000. Throughput: 0: 10843.0. Samples: 247849984. Policy #0 lag: (min: 15.0, avg: 66.3, max: 271.0) [2024-06-15 17:52:45,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 17:52:46,046][1652475] Updated weights for policy 0, policy_version 484020 (0.0014) [2024-06-15 17:52:50,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 43690.7, 300 sec: 42320.7). Total num frames: 991297536. Throughput: 0: 10695.1. Samples: 247885312. Policy #0 lag: (min: 15.0, avg: 66.3, max: 271.0) [2024-06-15 17:52:50,738][1648984] Avg episode reward: [(0, '-0.580')] [2024-06-15 17:52:53,038][1652475] Updated weights for policy 0, policy_version 484067 (0.0014) [2024-06-15 17:52:54,997][1652475] Updated weights for policy 0, policy_version 484128 (0.0016) [2024-06-15 17:52:55,738][1648984] Fps is (10 sec: 32767.6, 60 sec: 43144.5, 300 sec: 42654.0). Total num frames: 991559680. Throughput: 0: 10968.1. Samples: 247957504. Policy #0 lag: (min: 15.0, avg: 66.3, max: 271.0) [2024-06-15 17:52:55,739][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 17:52:56,138][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000484176_991592448.pth... 
[2024-06-15 17:52:56,318][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000479168_981336064.pth [2024-06-15 17:52:57,089][1652475] Updated weights for policy 0, policy_version 484208 (0.0124) [2024-06-15 17:52:58,975][1652475] Updated weights for policy 0, policy_version 484288 (0.0014) [2024-06-15 17:53:00,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 43690.7, 300 sec: 42653.9). Total num frames: 991821824. Throughput: 0: 10319.6. Samples: 248005632. Policy #0 lag: (min: 15.0, avg: 66.3, max: 271.0) [2024-06-15 17:53:00,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 17:53:05,738][1648984] Fps is (10 sec: 36045.3, 60 sec: 43144.5, 300 sec: 42542.9). Total num frames: 991920128. Throughput: 0: 10331.0. Samples: 248038400. Policy #0 lag: (min: 15.0, avg: 66.3, max: 271.0) [2024-06-15 17:53:05,738][1648984] Avg episode reward: [(0, '-0.210')] [2024-06-15 17:53:05,766][1652475] Updated weights for policy 0, policy_version 484345 (0.0016) [2024-06-15 17:53:08,721][1652475] Updated weights for policy 0, policy_version 484400 (0.0016) [2024-06-15 17:53:09,653][1652475] Updated weights for policy 0, policy_version 484433 (0.0010) [2024-06-15 17:53:10,740][1648984] Fps is (10 sec: 39321.5, 60 sec: 41506.1, 300 sec: 42653.9). Total num frames: 992215040. Throughput: 0: 10410.7. Samples: 248104448. Policy #0 lag: (min: 15.0, avg: 66.3, max: 271.0) [2024-06-15 17:53:10,741][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 17:53:11,465][1651340] Signal inference workers to stop experience collection... (24850 times) [2024-06-15 17:53:11,521][1652475] InferenceWorker_p0-w0: stopping experience collection (24850 times) [2024-06-15 17:53:11,523][1652475] Updated weights for policy 0, policy_version 484518 (0.0024) [2024-06-15 17:53:11,641][1651340] Signal inference workers to resume experience collection... 
(24850 times) [2024-06-15 17:53:11,650][1652475] InferenceWorker_p0-w0: resuming experience collection (24850 times) [2024-06-15 17:53:15,738][1648984] Fps is (10 sec: 42598.1, 60 sec: 43690.6, 300 sec: 42209.6). Total num frames: 992346112. Throughput: 0: 10365.1. Samples: 248169472. Policy #0 lag: (min: 15.0, avg: 66.3, max: 271.0) [2024-06-15 17:53:15,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 17:53:16,431][1652475] Updated weights for policy 0, policy_version 484548 (0.0027) [2024-06-15 17:53:17,831][1652475] Updated weights for policy 0, policy_version 484605 (0.0012) [2024-06-15 17:53:20,465][1652475] Updated weights for policy 0, policy_version 484671 (0.0013) [2024-06-15 17:53:20,742][1648984] Fps is (10 sec: 39304.0, 60 sec: 40956.9, 300 sec: 42986.5). Total num frames: 992608256. Throughput: 0: 10614.4. Samples: 248200704. Policy #0 lag: (min: 15.0, avg: 66.3, max: 271.0) [2024-06-15 17:53:20,743][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 17:53:22,494][1652475] Updated weights for policy 0, policy_version 484736 (0.0029) [2024-06-15 17:53:24,643][1652475] Updated weights for policy 0, policy_version 484800 (0.0013) [2024-06-15 17:53:25,738][1648984] Fps is (10 sec: 52429.5, 60 sec: 43690.7, 300 sec: 42653.9). Total num frames: 992870400. Throughput: 0: 10296.9. Samples: 248259072. Policy #0 lag: (min: 15.0, avg: 66.3, max: 271.0) [2024-06-15 17:53:25,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 17:53:30,738][1648984] Fps is (10 sec: 32782.6, 60 sec: 40959.9, 300 sec: 42431.8). Total num frames: 992935936. Throughput: 0: 10808.9. Samples: 248336384. 
Policy #0 lag: (min: 15.0, avg: 66.3, max: 271.0) [2024-06-15 17:53:30,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 17:53:31,603][1652475] Updated weights for policy 0, policy_version 484866 (0.0014) [2024-06-15 17:53:33,821][1652475] Updated weights for policy 0, policy_version 484963 (0.0015) [2024-06-15 17:53:35,738][1648984] Fps is (10 sec: 42598.1, 60 sec: 42052.3, 300 sec: 42876.1). Total num frames: 993296384. Throughput: 0: 10456.2. Samples: 248355840. Policy #0 lag: (min: 42.0, avg: 124.6, max: 298.0) [2024-06-15 17:53:35,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 17:53:36,463][1652475] Updated weights for policy 0, policy_version 485040 (0.0013) [2024-06-15 17:53:40,738][1648984] Fps is (10 sec: 45875.4, 60 sec: 43690.7, 300 sec: 42431.8). Total num frames: 993394688. Throughput: 0: 10194.5. Samples: 248416256. Policy #0 lag: (min: 42.0, avg: 124.6, max: 298.0) [2024-06-15 17:53:40,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 17:53:42,992][1652475] Updated weights for policy 0, policy_version 485076 (0.0022) [2024-06-15 17:53:44,426][1652475] Updated weights for policy 0, policy_version 485136 (0.0137) [2024-06-15 17:53:45,738][1648984] Fps is (10 sec: 36044.8, 60 sec: 40413.9, 300 sec: 42654.1). Total num frames: 993656832. Throughput: 0: 10717.9. Samples: 248487936. Policy #0 lag: (min: 42.0, avg: 124.6, max: 298.0) [2024-06-15 17:53:45,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 17:53:46,546][1652475] Updated weights for policy 0, policy_version 485232 (0.0016) [2024-06-15 17:53:47,684][1652475] Updated weights for policy 0, policy_version 485270 (0.0096) [2024-06-15 17:53:50,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 43690.7, 300 sec: 42654.0). Total num frames: 993918976. Throughput: 0: 10535.8. Samples: 248512512. 
Policy #0 lag: (min: 42.0, avg: 124.6, max: 298.0) [2024-06-15 17:53:50,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 17:53:54,750][1652475] Updated weights for policy 0, policy_version 485360 (0.0014) [2024-06-15 17:53:55,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 41506.2, 300 sec: 42653.9). Total num frames: 994050048. Throughput: 0: 10922.7. Samples: 248595968. Policy #0 lag: (min: 42.0, avg: 124.6, max: 298.0) [2024-06-15 17:53:55,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 17:53:56,506][1652475] Updated weights for policy 0, policy_version 485408 (0.0014) [2024-06-15 17:53:58,136][1651340] Signal inference workers to stop experience collection... (24900 times) [2024-06-15 17:53:58,210][1652475] InferenceWorker_p0-w0: stopping experience collection (24900 times) [2024-06-15 17:53:58,367][1651340] Signal inference workers to resume experience collection... (24900 times) [2024-06-15 17:53:58,367][1652475] InferenceWorker_p0-w0: resuming experience collection (24900 times) [2024-06-15 17:53:58,370][1652475] Updated weights for policy 0, policy_version 485488 (0.0020) [2024-06-15 17:54:00,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 43690.7, 300 sec: 43098.2). Total num frames: 994443264. Throughput: 0: 10490.3. Samples: 248641536. Policy #0 lag: (min: 42.0, avg: 124.6, max: 298.0) [2024-06-15 17:54:00,740][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 17:54:05,774][1648984] Fps is (10 sec: 39180.0, 60 sec: 42026.9, 300 sec: 42648.7). Total num frames: 994443264. Throughput: 0: 10699.0. Samples: 248682496. 
Policy #0 lag: (min: 42.0, avg: 124.6, max: 298.0) [2024-06-15 17:54:05,774][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 17:54:06,032][1652475] Updated weights for policy 0, policy_version 485570 (0.0016) [2024-06-15 17:54:08,226][1652475] Updated weights for policy 0, policy_version 485633 (0.0013) [2024-06-15 17:54:10,703][1652475] Updated weights for policy 0, policy_version 485744 (0.0015) [2024-06-15 17:54:10,738][1648984] Fps is (10 sec: 36044.9, 60 sec: 43144.6, 300 sec: 42765.0). Total num frames: 994803712. Throughput: 0: 10911.3. Samples: 248750080. Policy #0 lag: (min: 42.0, avg: 124.6, max: 298.0) [2024-06-15 17:54:10,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 17:54:12,391][1652475] Updated weights for policy 0, policy_version 485817 (0.0014) [2024-06-15 17:54:15,738][1648984] Fps is (10 sec: 52619.0, 60 sec: 43690.7, 300 sec: 42653.9). Total num frames: 994967552. Throughput: 0: 10706.5. Samples: 248818176. Policy #0 lag: (min: 42.0, avg: 124.6, max: 298.0) [2024-06-15 17:54:15,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 17:54:19,260][1652475] Updated weights for policy 0, policy_version 485883 (0.0014) [2024-06-15 17:54:20,738][1648984] Fps is (10 sec: 36044.6, 60 sec: 42601.6, 300 sec: 42653.9). Total num frames: 995164160. Throughput: 0: 11116.1. Samples: 248856064. Policy #0 lag: (min: 42.0, avg: 124.6, max: 298.0) [2024-06-15 17:54:20,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 17:54:20,915][1652475] Updated weights for policy 0, policy_version 485926 (0.0075) [2024-06-15 17:54:22,945][1652475] Updated weights for policy 0, policy_version 486032 (0.0015) [2024-06-15 17:54:23,902][1652475] Updated weights for policy 0, policy_version 486079 (0.0014) [2024-06-15 17:54:25,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 43690.7, 300 sec: 43098.3). Total num frames: 995491840. Throughput: 0: 11013.7. Samples: 248911872. 
Policy #0 lag: (min: 42.0, avg: 124.6, max: 298.0) [2024-06-15 17:54:25,740][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 17:54:30,687][1652475] Updated weights for policy 0, policy_version 486142 (0.0012) [2024-06-15 17:54:30,738][1648984] Fps is (10 sec: 45875.5, 60 sec: 44783.0, 300 sec: 42987.2). Total num frames: 995622912. Throughput: 0: 11127.5. Samples: 248988672. Policy #0 lag: (min: 42.0, avg: 124.6, max: 298.0) [2024-06-15 17:54:30,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 17:54:32,752][1652475] Updated weights for policy 0, policy_version 486209 (0.0018) [2024-06-15 17:54:33,932][1652475] Updated weights for policy 0, policy_version 486264 (0.0017) [2024-06-15 17:54:35,532][1652475] Updated weights for policy 0, policy_version 486333 (0.0012) [2024-06-15 17:54:35,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 45329.1, 300 sec: 43098.3). Total num frames: 996016128. Throughput: 0: 11252.6. Samples: 249018880. Policy #0 lag: (min: 42.0, avg: 124.6, max: 298.0) [2024-06-15 17:54:35,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 17:54:40,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 43690.7, 300 sec: 42653.9). Total num frames: 996016128. Throughput: 0: 10945.4. Samples: 249088512. Policy #0 lag: (min: 42.0, avg: 124.6, max: 298.0) [2024-06-15 17:54:40,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 17:54:42,193][1652475] Updated weights for policy 0, policy_version 486371 (0.0013) [2024-06-15 17:54:42,938][1651340] Signal inference workers to stop experience collection... (24950 times) [2024-06-15 17:54:42,996][1652475] InferenceWorker_p0-w0: stopping experience collection (24950 times) [2024-06-15 17:54:43,125][1651340] Signal inference workers to resume experience collection... 
(24950 times) [2024-06-15 17:54:43,126][1652475] InferenceWorker_p0-w0: resuming experience collection (24950 times) [2024-06-15 17:54:43,312][1652475] Updated weights for policy 0, policy_version 486419 (0.0016) [2024-06-15 17:54:44,622][1652475] Updated weights for policy 0, policy_version 486481 (0.0115) [2024-06-15 17:54:45,721][1652475] Updated weights for policy 0, policy_version 486530 (0.0015) [2024-06-15 17:54:45,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 45875.2, 300 sec: 43098.3). Total num frames: 996409344. Throughput: 0: 11423.3. Samples: 249155584. Policy #0 lag: (min: 42.0, avg: 124.6, max: 298.0) [2024-06-15 17:54:45,738][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 17:54:46,899][1652475] Updated weights for policy 0, policy_version 486588 (0.0018) [2024-06-15 17:54:50,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 43690.6, 300 sec: 42987.2). Total num frames: 996540416. Throughput: 0: 11307.2. Samples: 249190912. Policy #0 lag: (min: 42.0, avg: 124.6, max: 298.0) [2024-06-15 17:54:50,739][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 17:54:54,127][1652475] Updated weights for policy 0, policy_version 486643 (0.0015) [2024-06-15 17:54:55,738][1648984] Fps is (10 sec: 39320.6, 60 sec: 45875.0, 300 sec: 43320.4). Total num frames: 996802560. Throughput: 0: 11491.5. Samples: 249267200. Policy #0 lag: (min: 42.0, avg: 124.6, max: 298.0) [2024-06-15 17:54:55,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 17:54:56,276][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000486752_996868096.pth... [2024-06-15 17:54:56,394][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000481664_986447872.pth [2024-06-15 17:54:57,239][1652475] Updated weights for policy 0, policy_version 486791 (0.0014) [2024-06-15 17:55:00,738][1648984] Fps is (10 sec: 52429.3, 60 sec: 43690.7, 300 sec: 43098.3). Total num frames: 997064704. Throughput: 0: 11195.7. 
Samples: 249321984. Policy #0 lag: (min: 95.0, avg: 202.4, max: 335.0) [2024-06-15 17:55:00,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 17:55:05,522][1652475] Updated weights for policy 0, policy_version 486864 (0.0014) [2024-06-15 17:55:05,740][1648984] Fps is (10 sec: 29491.8, 60 sec: 44263.5, 300 sec: 42765.0). Total num frames: 997097472. Throughput: 0: 11184.4. Samples: 249359360. Policy #0 lag: (min: 95.0, avg: 202.4, max: 335.0) [2024-06-15 17:55:05,741][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 17:55:07,067][1652475] Updated weights for policy 0, policy_version 486928 (0.0015) [2024-06-15 17:55:09,227][1652475] Updated weights for policy 0, policy_version 487024 (0.0014) [2024-06-15 17:55:10,738][1648984] Fps is (10 sec: 45875.1, 60 sec: 45329.1, 300 sec: 43320.4). Total num frames: 997523456. Throughput: 0: 11275.4. Samples: 249419264. Policy #0 lag: (min: 95.0, avg: 202.4, max: 335.0) [2024-06-15 17:55:10,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 17:55:11,313][1652475] Updated weights for policy 0, policy_version 487102 (0.0012) [2024-06-15 17:55:15,738][1648984] Fps is (10 sec: 49152.3, 60 sec: 43690.7, 300 sec: 42653.9). Total num frames: 997588992. Throughput: 0: 10911.3. Samples: 249479680. Policy #0 lag: (min: 95.0, avg: 202.4, max: 335.0) [2024-06-15 17:55:15,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 17:55:19,684][1652475] Updated weights for policy 0, policy_version 487172 (0.0019) [2024-06-15 17:55:20,738][1648984] Fps is (10 sec: 29491.1, 60 sec: 44236.8, 300 sec: 42876.1). Total num frames: 997818368. Throughput: 0: 11081.9. Samples: 249517568. Policy #0 lag: (min: 95.0, avg: 202.4, max: 335.0) [2024-06-15 17:55:20,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 17:55:21,028][1652475] Updated weights for policy 0, policy_version 487234 (0.0011) [2024-06-15 17:55:21,709][1651340] Signal inference workers to stop experience collection... 
(25000 times) [2024-06-15 17:55:21,770][1652475] InferenceWorker_p0-w0: stopping experience collection (25000 times) [2024-06-15 17:55:22,035][1651340] Signal inference workers to resume experience collection... (25000 times) [2024-06-15 17:55:22,036][1652475] InferenceWorker_p0-w0: resuming experience collection (25000 times) [2024-06-15 17:55:22,893][1652475] Updated weights for policy 0, policy_version 487312 (0.0093) [2024-06-15 17:55:25,738][1648984] Fps is (10 sec: 52426.8, 60 sec: 43690.4, 300 sec: 43098.2). Total num frames: 998113280. Throughput: 0: 10638.2. Samples: 249567232. Policy #0 lag: (min: 95.0, avg: 202.4, max: 335.0) [2024-06-15 17:55:25,739][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 17:55:29,833][1652475] Updated weights for policy 0, policy_version 487386 (0.0013) [2024-06-15 17:55:30,738][1648984] Fps is (10 sec: 42598.7, 60 sec: 43690.7, 300 sec: 42876.1). Total num frames: 998244352. Throughput: 0: 10865.8. Samples: 249644544. Policy #0 lag: (min: 95.0, avg: 202.4, max: 335.0) [2024-06-15 17:55:30,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 17:55:31,198][1652475] Updated weights for policy 0, policy_version 487425 (0.0013) [2024-06-15 17:55:32,835][1652475] Updated weights for policy 0, policy_version 487504 (0.0135) [2024-06-15 17:55:35,738][1648984] Fps is (10 sec: 39322.8, 60 sec: 41506.1, 300 sec: 43431.5). Total num frames: 998506496. Throughput: 0: 10695.1. Samples: 249672192. Policy #0 lag: (min: 95.0, avg: 202.4, max: 335.0) [2024-06-15 17:55:35,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 17:55:35,934][1652475] Updated weights for policy 0, policy_version 487568 (0.0020) [2024-06-15 17:55:40,738][1648984] Fps is (10 sec: 39320.6, 60 sec: 43690.5, 300 sec: 42653.9). Total num frames: 998637568. Throughput: 0: 10501.7. Samples: 249739776. 
Policy #0 lag: (min: 95.0, avg: 202.4, max: 335.0) [2024-06-15 17:55:40,739][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 17:55:41,611][1652475] Updated weights for policy 0, policy_version 487618 (0.0015) [2024-06-15 17:55:43,535][1652475] Updated weights for policy 0, policy_version 487700 (0.0012) [2024-06-15 17:55:45,504][1652475] Updated weights for policy 0, policy_version 487776 (0.0012) [2024-06-15 17:55:45,738][1648984] Fps is (10 sec: 45875.2, 60 sec: 42598.4, 300 sec: 43320.4). Total num frames: 998965248. Throughput: 0: 10581.3. Samples: 249798144. Policy #0 lag: (min: 95.0, avg: 202.4, max: 335.0) [2024-06-15 17:55:45,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 17:55:47,985][1652475] Updated weights for policy 0, policy_version 487813 (0.0030) [2024-06-15 17:55:49,200][1652475] Updated weights for policy 0, policy_version 487862 (0.0013) [2024-06-15 17:55:50,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 43690.5, 300 sec: 43098.2). Total num frames: 999161856. Throughput: 0: 10467.5. Samples: 249830400. Policy #0 lag: (min: 95.0, avg: 202.4, max: 335.0) [2024-06-15 17:55:50,739][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 17:55:54,155][1652475] Updated weights for policy 0, policy_version 487904 (0.0012) [2024-06-15 17:55:55,738][1648984] Fps is (10 sec: 32767.9, 60 sec: 41506.3, 300 sec: 43098.2). Total num frames: 999292928. Throughput: 0: 10831.6. Samples: 249906688. Policy #0 lag: (min: 95.0, avg: 202.4, max: 335.0) [2024-06-15 17:55:55,738][1648984] Avg episode reward: [(0, '-0.600')] [2024-06-15 17:55:55,756][1652475] Updated weights for policy 0, policy_version 487941 (0.0028) [2024-06-15 17:55:57,960][1652475] Updated weights for policy 0, policy_version 488018 (0.0015) [2024-06-15 17:55:59,432][1652475] Updated weights for policy 0, policy_version 488080 (0.0012) [2024-06-15 17:56:00,739][1648984] Fps is (10 sec: 52425.2, 60 sec: 43690.0, 300 sec: 43764.6). Total num frames: 999686144. 
Throughput: 0: 10603.9. Samples: 249956864. Policy #0 lag: (min: 95.0, avg: 202.4, max: 335.0) [2024-06-15 17:56:00,739][1648984] Avg episode reward: [(0, '-0.580')] [2024-06-15 17:56:05,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 43144.6, 300 sec: 42876.1). Total num frames: 999686144. Throughput: 0: 10626.9. Samples: 249995776. Policy #0 lag: (min: 95.0, avg: 202.4, max: 335.0) [2024-06-15 17:56:05,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 17:56:05,793][1652475] Updated weights for policy 0, policy_version 488129 (0.0013) [2024-06-15 17:56:06,934][1652475] Updated weights for policy 0, policy_version 488182 (0.0015) [2024-06-15 17:56:08,272][1651340] Signal inference workers to stop experience collection... (25050 times) [2024-06-15 17:56:08,315][1652475] InferenceWorker_p0-w0: stopping experience collection (25050 times) [2024-06-15 17:56:08,480][1651340] Signal inference workers to resume experience collection... (25050 times) [2024-06-15 17:56:08,481][1652475] InferenceWorker_p0-w0: resuming experience collection (25050 times) [2024-06-15 17:56:08,484][1652475] Updated weights for policy 0, policy_version 488240 (0.0012) [2024-06-15 17:56:10,738][1648984] Fps is (10 sec: 39325.2, 60 sec: 42598.4, 300 sec: 43986.9). Total num frames: 1000079360. Throughput: 0: 11013.8. Samples: 250062848. Policy #0 lag: (min: 95.0, avg: 202.4, max: 335.0) [2024-06-15 17:56:10,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 17:56:12,112][1652475] Updated weights for policy 0, policy_version 488368 (0.0265) [2024-06-15 17:56:15,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 43690.6, 300 sec: 43098.3). Total num frames: 1000210432. Throughput: 0: 10695.1. Samples: 250125824. 
Policy #0 lag: (min: 95.0, avg: 202.4, max: 335.0) [2024-06-15 17:56:15,741][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 17:56:18,565][1652475] Updated weights for policy 0, policy_version 488404 (0.0023) [2024-06-15 17:56:20,738][1648984] Fps is (10 sec: 32767.7, 60 sec: 43144.5, 300 sec: 43320.4). Total num frames: 1000407040. Throughput: 0: 11036.4. Samples: 250168832. Policy #0 lag: (min: 95.0, avg: 202.4, max: 335.0) [2024-06-15 17:56:20,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 17:56:20,973][1652475] Updated weights for policy 0, policy_version 488496 (0.0097) [2024-06-15 17:56:22,691][1652475] Updated weights for policy 0, policy_version 488570 (0.0014) [2024-06-15 17:56:24,619][1652475] Updated weights for policy 0, policy_version 488632 (0.0013) [2024-06-15 17:56:25,738][1648984] Fps is (10 sec: 52428.4, 60 sec: 43690.8, 300 sec: 43653.6). Total num frames: 1000734720. Throughput: 0: 10581.4. Samples: 250215936. Policy #0 lag: (min: 15.0, avg: 178.6, max: 271.0) [2024-06-15 17:56:25,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 17:56:30,738][1648984] Fps is (10 sec: 36045.3, 60 sec: 42052.2, 300 sec: 42765.0). Total num frames: 1000767488. Throughput: 0: 11104.7. Samples: 250297856. Policy #0 lag: (min: 15.0, avg: 178.6, max: 271.0) [2024-06-15 17:56:30,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 17:56:31,675][1652475] Updated weights for policy 0, policy_version 488688 (0.0014) [2024-06-15 17:56:33,139][1652475] Updated weights for policy 0, policy_version 488754 (0.0014) [2024-06-15 17:56:34,820][1652475] Updated weights for policy 0, policy_version 488824 (0.0089) [2024-06-15 17:56:35,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 44236.8, 300 sec: 43986.9). Total num frames: 1001160704. Throughput: 0: 10934.1. Samples: 250322432. 
Policy #0 lag: (min: 15.0, avg: 178.6, max: 271.0) [2024-06-15 17:56:35,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 17:56:36,233][1652475] Updated weights for policy 0, policy_version 488867 (0.0012) [2024-06-15 17:56:40,738][1648984] Fps is (10 sec: 49151.9, 60 sec: 43690.8, 300 sec: 43098.3). Total num frames: 1001259008. Throughput: 0: 10649.6. Samples: 250385920. Policy #0 lag: (min: 15.0, avg: 178.6, max: 271.0) [2024-06-15 17:56:40,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 17:56:43,028][1652475] Updated weights for policy 0, policy_version 488912 (0.0013) [2024-06-15 17:56:45,211][1652475] Updated weights for policy 0, policy_version 489013 (0.0014) [2024-06-15 17:56:45,738][1648984] Fps is (10 sec: 36045.3, 60 sec: 42598.4, 300 sec: 43542.6). Total num frames: 1001521152. Throughput: 0: 11013.9. Samples: 250452480. Policy #0 lag: (min: 15.0, avg: 178.6, max: 271.0) [2024-06-15 17:56:45,751][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 17:56:46,973][1652475] Updated weights for policy 0, policy_version 489087 (0.0022) [2024-06-15 17:56:48,521][1651340] Signal inference workers to stop experience collection... (25100 times) [2024-06-15 17:56:48,577][1652475] InferenceWorker_p0-w0: stopping experience collection (25100 times) [2024-06-15 17:56:48,878][1651340] Signal inference workers to resume experience collection... (25100 times) [2024-06-15 17:56:48,879][1652475] InferenceWorker_p0-w0: resuming experience collection (25100 times) [2024-06-15 17:56:49,110][1652475] Updated weights for policy 0, policy_version 489143 (0.0013) [2024-06-15 17:56:50,755][1648984] Fps is (10 sec: 52336.4, 60 sec: 43678.0, 300 sec: 43428.9). Total num frames: 1001783296. Throughput: 0: 10725.0. Samples: 250478592. Policy #0 lag: (min: 15.0, avg: 178.6, max: 271.0) [2024-06-15 17:56:50,756][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 17:56:55,738][1648984] Fps is (10 sec: 32766.7, 60 sec: 42598.2, 300 sec: 42876.0). 
Total num frames: 1001848832. Throughput: 0: 10911.2. Samples: 250553856. Policy #0 lag: (min: 15.0, avg: 178.6, max: 271.0) [2024-06-15 17:56:55,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 17:56:56,042][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000489200_1001881600.pth... [2024-06-15 17:56:56,197][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000484176_991592448.pth [2024-06-15 17:56:56,739][1652475] Updated weights for policy 0, policy_version 489217 (0.0013) [2024-06-15 17:56:58,533][1652475] Updated weights for policy 0, policy_version 489296 (0.0103) [2024-06-15 17:57:00,086][1652475] Updated weights for policy 0, policy_version 489360 (0.0013) [2024-06-15 17:57:00,738][1648984] Fps is (10 sec: 45956.1, 60 sec: 42599.0, 300 sec: 43764.7). Total num frames: 1002242048. Throughput: 0: 10706.5. Samples: 250607616. Policy #0 lag: (min: 15.0, avg: 178.6, max: 271.0) [2024-06-15 17:57:00,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 17:57:05,738][1648984] Fps is (10 sec: 45876.7, 60 sec: 43690.6, 300 sec: 42653.9). Total num frames: 1002307584. Throughput: 0: 10524.5. Samples: 250642432. Policy #0 lag: (min: 15.0, avg: 178.6, max: 271.0) [2024-06-15 17:57:05,738][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 17:57:08,385][1652475] Updated weights for policy 0, policy_version 489456 (0.0013) [2024-06-15 17:57:10,109][1652475] Updated weights for policy 0, policy_version 489524 (0.0013) [2024-06-15 17:57:10,738][1648984] Fps is (10 sec: 32768.3, 60 sec: 41506.2, 300 sec: 43542.6). Total num frames: 1002569728. Throughput: 0: 10968.2. Samples: 250709504. 
Policy #0 lag: (min: 15.0, avg: 178.6, max: 271.0) [2024-06-15 17:57:10,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 17:57:12,625][1652475] Updated weights for policy 0, policy_version 489621 (0.0235) [2024-06-15 17:57:15,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 43690.7, 300 sec: 42987.2). Total num frames: 1002831872. Throughput: 0: 10410.7. Samples: 250766336. Policy #0 lag: (min: 15.0, avg: 178.6, max: 271.0) [2024-06-15 17:57:15,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 17:57:20,436][1652475] Updated weights for policy 0, policy_version 489680 (0.0013) [2024-06-15 17:57:20,738][1648984] Fps is (10 sec: 29491.2, 60 sec: 40960.1, 300 sec: 42765.0). Total num frames: 1002864640. Throughput: 0: 10808.9. Samples: 250808832. Policy #0 lag: (min: 15.0, avg: 178.6, max: 271.0) [2024-06-15 17:57:20,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 17:57:22,915][1652475] Updated weights for policy 0, policy_version 489760 (0.0235) [2024-06-15 17:57:23,875][1652475] Updated weights for policy 0, policy_version 489811 (0.0014) [2024-06-15 17:57:25,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 42052.3, 300 sec: 43320.4). Total num frames: 1003257856. Throughput: 0: 10638.2. Samples: 250864640. Policy #0 lag: (min: 15.0, avg: 178.6, max: 271.0) [2024-06-15 17:57:25,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 17:57:25,865][1652475] Updated weights for policy 0, policy_version 489888 (0.0015) [2024-06-15 17:57:30,746][1648984] Fps is (10 sec: 49110.1, 60 sec: 43138.4, 300 sec: 42652.7). Total num frames: 1003356160. Throughput: 0: 10681.7. Samples: 250933248. Policy #0 lag: (min: 15.0, avg: 178.6, max: 271.0) [2024-06-15 17:57:30,747][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 17:57:33,626][1652475] Updated weights for policy 0, policy_version 489968 (0.0013) [2024-06-15 17:57:34,047][1651340] Signal inference workers to stop experience collection... 
(25150 times) [2024-06-15 17:57:34,184][1652475] InferenceWorker_p0-w0: stopping experience collection (25150 times) [2024-06-15 17:57:34,235][1651340] Signal inference workers to resume experience collection... (25150 times) [2024-06-15 17:57:34,236][1652475] InferenceWorker_p0-w0: resuming experience collection (25150 times) [2024-06-15 17:57:35,115][1652475] Updated weights for policy 0, policy_version 490037 (0.0099) [2024-06-15 17:57:35,738][1648984] Fps is (10 sec: 36044.8, 60 sec: 40960.0, 300 sec: 43542.6). Total num frames: 1003618304. Throughput: 0: 10870.0. Samples: 250967552. Policy #0 lag: (min: 15.0, avg: 178.6, max: 271.0) [2024-06-15 17:57:35,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 17:57:37,268][1652475] Updated weights for policy 0, policy_version 490115 (0.0013) [2024-06-15 17:57:38,711][1652475] Updated weights for policy 0, policy_version 490172 (0.0015) [2024-06-15 17:57:40,738][1648984] Fps is (10 sec: 52471.9, 60 sec: 43690.5, 300 sec: 42876.1). Total num frames: 1003880448. Throughput: 0: 10228.6. Samples: 251014144. Policy #0 lag: (min: 15.0, avg: 178.6, max: 271.0) [2024-06-15 17:57:40,739][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 17:57:45,738][1648984] Fps is (10 sec: 32768.2, 60 sec: 40413.9, 300 sec: 42876.1). Total num frames: 1003945984. Throughput: 0: 10740.6. Samples: 251090944. Policy #0 lag: (min: 15.0, avg: 178.6, max: 271.0) [2024-06-15 17:57:45,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 17:57:46,483][1652475] Updated weights for policy 0, policy_version 490240 (0.0013) [2024-06-15 17:57:48,038][1652475] Updated weights for policy 0, policy_version 490304 (0.0012) [2024-06-15 17:57:49,267][1652475] Updated weights for policy 0, policy_version 490361 (0.0016) [2024-06-15 17:57:50,738][1648984] Fps is (10 sec: 49153.5, 60 sec: 43157.3, 300 sec: 43431.5). Total num frames: 1004371968. Throughput: 0: 10524.5. Samples: 251116032. 
Policy #0 lag: (min: 20.0, avg: 181.0, max: 324.0) [2024-06-15 17:57:50,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 17:57:50,802][1652475] Updated weights for policy 0, policy_version 490432 (0.0013) [2024-06-15 17:57:55,738][1648984] Fps is (10 sec: 45874.8, 60 sec: 42598.6, 300 sec: 42653.9). Total num frames: 1004404736. Throughput: 0: 10581.3. Samples: 251185664. Policy #0 lag: (min: 20.0, avg: 181.0, max: 324.0) [2024-06-15 17:57:55,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 17:57:57,865][1652475] Updated weights for policy 0, policy_version 490483 (0.0035) [2024-06-15 17:57:59,710][1652475] Updated weights for policy 0, policy_version 490550 (0.0013) [2024-06-15 17:58:00,738][1648984] Fps is (10 sec: 29490.3, 60 sec: 40413.7, 300 sec: 43209.3). Total num frames: 1004666880. Throughput: 0: 10558.5. Samples: 251241472. Policy #0 lag: (min: 20.0, avg: 181.0, max: 324.0) [2024-06-15 17:58:00,739][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 17:58:02,440][1652475] Updated weights for policy 0, policy_version 490624 (0.0013) [2024-06-15 17:58:03,980][1652475] Updated weights for policy 0, policy_version 490688 (0.0072) [2024-06-15 17:58:05,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 43690.6, 300 sec: 43098.2). Total num frames: 1004929024. Throughput: 0: 10228.6. Samples: 251269120. Policy #0 lag: (min: 20.0, avg: 181.0, max: 324.0) [2024-06-15 17:58:05,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 17:58:10,738][1648984] Fps is (10 sec: 39322.3, 60 sec: 41506.0, 300 sec: 43098.2). Total num frames: 1005060096. Throughput: 0: 10569.9. Samples: 251340288. 
Policy #0 lag: (min: 20.0, avg: 181.0, max: 324.0) [2024-06-15 17:58:10,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 17:58:12,069][1652475] Updated weights for policy 0, policy_version 490768 (0.0110) [2024-06-15 17:58:14,144][1652475] Updated weights for policy 0, policy_version 490848 (0.0013) [2024-06-15 17:58:15,049][1651340] Signal inference workers to stop experience collection... (25200 times) [2024-06-15 17:58:15,122][1652475] InferenceWorker_p0-w0: stopping experience collection (25200 times) [2024-06-15 17:58:15,282][1651340] Signal inference workers to resume experience collection... (25200 times) [2024-06-15 17:58:15,283][1652475] InferenceWorker_p0-w0: resuming experience collection (25200 times) [2024-06-15 17:58:15,738][1648984] Fps is (10 sec: 45875.4, 60 sec: 42598.4, 300 sec: 43321.1). Total num frames: 1005387776. Throughput: 0: 10150.9. Samples: 251389952. Policy #0 lag: (min: 20.0, avg: 181.0, max: 324.0) [2024-06-15 17:58:15,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 17:58:16,134][1652475] Updated weights for policy 0, policy_version 490941 (0.0013) [2024-06-15 17:58:20,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 43144.4, 300 sec: 42653.9). Total num frames: 1005453312. Throughput: 0: 10149.0. Samples: 251424256. Policy #0 lag: (min: 20.0, avg: 181.0, max: 324.0) [2024-06-15 17:58:20,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 17:58:22,282][1652475] Updated weights for policy 0, policy_version 491004 (0.0014) [2024-06-15 17:58:25,738][1648984] Fps is (10 sec: 29490.6, 60 sec: 40413.7, 300 sec: 43209.3). Total num frames: 1005682688. Throughput: 0: 10786.1. Samples: 251499520. 
Policy #0 lag: (min: 20.0, avg: 181.0, max: 324.0) [2024-06-15 17:58:25,739][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 17:58:26,878][1652475] Updated weights for policy 0, policy_version 491106 (0.0015) [2024-06-15 17:58:28,603][1652475] Updated weights for policy 0, policy_version 491200 (0.0090) [2024-06-15 17:58:30,738][1648984] Fps is (10 sec: 52429.6, 60 sec: 43696.9, 300 sec: 42987.2). Total num frames: 1005977600. Throughput: 0: 10262.8. Samples: 251552768. Policy #0 lag: (min: 20.0, avg: 181.0, max: 324.0) [2024-06-15 17:58:30,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 17:58:34,482][1652475] Updated weights for policy 0, policy_version 491260 (0.0015) [2024-06-15 17:58:35,738][1648984] Fps is (10 sec: 42599.3, 60 sec: 41506.1, 300 sec: 43098.3). Total num frames: 1006108672. Throughput: 0: 10626.8. Samples: 251594240. Policy #0 lag: (min: 20.0, avg: 181.0, max: 324.0) [2024-06-15 17:58:35,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 17:58:37,439][1652475] Updated weights for policy 0, policy_version 491312 (0.0014) [2024-06-15 17:58:38,555][1652475] Updated weights for policy 0, policy_version 491360 (0.0115) [2024-06-15 17:58:40,629][1652475] Updated weights for policy 0, policy_version 491447 (0.0078) [2024-06-15 17:58:40,738][1648984] Fps is (10 sec: 49148.7, 60 sec: 43144.3, 300 sec: 43431.4). Total num frames: 1006469120. Throughput: 0: 10353.6. Samples: 251651584. Policy #0 lag: (min: 20.0, avg: 181.0, max: 324.0) [2024-06-15 17:58:40,739][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 17:58:45,060][1652475] Updated weights for policy 0, policy_version 491488 (0.0014) [2024-06-15 17:58:45,738][1648984] Fps is (10 sec: 52428.5, 60 sec: 44782.9, 300 sec: 43098.2). Total num frames: 1006632960. Throughput: 0: 10763.4. Samples: 251725824. 
Policy #0 lag: (min: 20.0, avg: 181.0, max: 324.0) [2024-06-15 17:58:45,739][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 17:58:48,559][1652475] Updated weights for policy 0, policy_version 491541 (0.0016) [2024-06-15 17:58:50,154][1652475] Updated weights for policy 0, policy_version 491601 (0.0037) [2024-06-15 17:58:50,738][1648984] Fps is (10 sec: 39323.7, 60 sec: 41506.1, 300 sec: 43431.5). Total num frames: 1006862336. Throughput: 0: 10956.8. Samples: 251762176. Policy #0 lag: (min: 20.0, avg: 181.0, max: 324.0) [2024-06-15 17:58:50,739][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 17:58:52,100][1652475] Updated weights for policy 0, policy_version 491696 (0.0021) [2024-06-15 17:58:55,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 43690.6, 300 sec: 42653.9). Total num frames: 1007026176. Throughput: 0: 10695.1. Samples: 251821568. Policy #0 lag: (min: 20.0, avg: 181.0, max: 324.0) [2024-06-15 17:58:55,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 17:58:55,742][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000491712_1007026176.pth... [2024-06-15 17:58:55,841][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000486752_996868096.pth [2024-06-15 17:58:56,628][1652475] Updated weights for policy 0, policy_version 491744 (0.0013) [2024-06-15 17:59:00,738][1648984] Fps is (10 sec: 36045.0, 60 sec: 42598.6, 300 sec: 43325.7). Total num frames: 1007222784. Throughput: 0: 11207.1. Samples: 251894272. Policy #0 lag: (min: 20.0, avg: 181.0, max: 324.0) [2024-06-15 17:59:00,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 17:59:00,976][1652475] Updated weights for policy 0, policy_version 491831 (0.0013) [2024-06-15 17:59:01,619][1651340] Signal inference workers to stop experience collection... 
(25250 times) [2024-06-15 17:59:01,679][1652475] InferenceWorker_p0-w0: stopping experience collection (25250 times) [2024-06-15 17:59:01,732][1651340] Signal inference workers to resume experience collection... (25250 times) [2024-06-15 17:59:01,733][1652475] InferenceWorker_p0-w0: resuming experience collection (25250 times) [2024-06-15 17:59:01,875][1652475] Updated weights for policy 0, policy_version 491859 (0.0094) [2024-06-15 17:59:03,645][1652475] Updated weights for policy 0, policy_version 491943 (0.0012) [2024-06-15 17:59:05,738][1648984] Fps is (10 sec: 52429.6, 60 sec: 43690.8, 300 sec: 43209.3). Total num frames: 1007550464. Throughput: 0: 11013.7. Samples: 251919872. Policy #0 lag: (min: 20.0, avg: 181.0, max: 324.0) [2024-06-15 17:59:05,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 17:59:08,009][1652475] Updated weights for policy 0, policy_version 491971 (0.0012) [2024-06-15 17:59:10,738][1648984] Fps is (10 sec: 45875.2, 60 sec: 43690.7, 300 sec: 43098.3). Total num frames: 1007681536. Throughput: 0: 11082.0. Samples: 251998208. Policy #0 lag: (min: 20.0, avg: 181.0, max: 324.0) [2024-06-15 17:59:10,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 17:59:11,054][1652475] Updated weights for policy 0, policy_version 492033 (0.0015) [2024-06-15 17:59:12,663][1652475] Updated weights for policy 0, policy_version 492091 (0.0015) [2024-06-15 17:59:15,005][1652475] Updated weights for policy 0, policy_version 492192 (0.0098) [2024-06-15 17:59:15,738][1648984] Fps is (10 sec: 52428.5, 60 sec: 44783.0, 300 sec: 43764.7). Total num frames: 1008074752. Throughput: 0: 11059.2. Samples: 252050432. Policy #0 lag: (min: 20.0, avg: 181.0, max: 324.0) [2024-06-15 17:59:15,741][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 17:59:20,032][1652475] Updated weights for policy 0, policy_version 492256 (0.0078) [2024-06-15 17:59:20,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 45875.3, 300 sec: 43098.3). 
Total num frames: 1008205824. Throughput: 0: 11195.7. Samples: 252098048. Policy #0 lag: (min: 15.0, avg: 102.1, max: 271.0) [2024-06-15 17:59:20,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 17:59:24,305][1652475] Updated weights for policy 0, policy_version 492342 (0.0078) [2024-06-15 17:59:25,738][1648984] Fps is (10 sec: 36044.7, 60 sec: 45875.4, 300 sec: 43431.5). Total num frames: 1008435200. Throughput: 0: 11332.4. Samples: 252161536. Policy #0 lag: (min: 15.0, avg: 102.1, max: 271.0) [2024-06-15 17:59:25,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 17:59:26,098][1652475] Updated weights for policy 0, policy_version 492416 (0.0017) [2024-06-15 17:59:27,739][1652475] Updated weights for policy 0, policy_version 492478 (0.0032) [2024-06-15 17:59:30,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 43690.7, 300 sec: 42653.9). Total num frames: 1008599040. Throughput: 0: 11082.0. Samples: 252224512. Policy #0 lag: (min: 15.0, avg: 102.1, max: 271.0) [2024-06-15 17:59:30,738][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 17:59:33,721][1652475] Updated weights for policy 0, policy_version 492544 (0.0026) [2024-06-15 17:59:35,738][1648984] Fps is (10 sec: 32768.1, 60 sec: 44236.8, 300 sec: 43209.3). Total num frames: 1008762880. Throughput: 0: 11047.8. Samples: 252259328. Policy #0 lag: (min: 15.0, avg: 102.1, max: 271.0) [2024-06-15 17:59:35,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 17:59:37,128][1652475] Updated weights for policy 0, policy_version 492629 (0.0015) [2024-06-15 17:59:38,351][1652475] Updated weights for policy 0, policy_version 492675 (0.0015) [2024-06-15 17:59:39,701][1652475] Updated weights for policy 0, policy_version 492734 (0.0012) [2024-06-15 17:59:40,738][1648984] Fps is (10 sec: 52428.5, 60 sec: 44237.3, 300 sec: 43098.2). Total num frames: 1009123328. Throughput: 0: 10991.0. Samples: 252316160. 
Policy #0 lag: (min: 15.0, avg: 102.1, max: 271.0) [2024-06-15 17:59:40,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 17:59:45,738][1648984] Fps is (10 sec: 36044.7, 60 sec: 41506.2, 300 sec: 42653.9). Total num frames: 1009123328. Throughput: 0: 11093.3. Samples: 252393472. Policy #0 lag: (min: 15.0, avg: 102.1, max: 271.0) [2024-06-15 17:59:45,744][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 17:59:46,469][1651340] Signal inference workers to stop experience collection... (25300 times) [2024-06-15 17:59:46,528][1652475] InferenceWorker_p0-w0: stopping experience collection (25300 times) [2024-06-15 17:59:46,772][1651340] Signal inference workers to resume experience collection... (25300 times) [2024-06-15 17:59:46,773][1652475] InferenceWorker_p0-w0: resuming experience collection (25300 times) [2024-06-15 17:59:47,678][1652475] Updated weights for policy 0, policy_version 492819 (0.0115) [2024-06-15 17:59:48,332][1652475] Updated weights for policy 0, policy_version 492857 (0.0015) [2024-06-15 17:59:50,348][1652475] Updated weights for policy 0, policy_version 492928 (0.0109) [2024-06-15 17:59:50,754][1648984] Fps is (10 sec: 39259.0, 60 sec: 44225.1, 300 sec: 43096.0). Total num frames: 1009516544. Throughput: 0: 11112.1. Samples: 252420096. Policy #0 lag: (min: 15.0, avg: 102.1, max: 271.0) [2024-06-15 17:59:50,754][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 17:59:51,983][1652475] Updated weights for policy 0, policy_version 492988 (0.0011) [2024-06-15 17:59:55,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 43690.8, 300 sec: 42653.9). Total num frames: 1009647616. Throughput: 0: 10683.7. Samples: 252478976. Policy #0 lag: (min: 15.0, avg: 102.1, max: 271.0) [2024-06-15 17:59:55,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 17:59:58,983][1652475] Updated weights for policy 0, policy_version 493056 (0.0099) [2024-06-15 18:00:00,738][1648984] Fps is (10 sec: 26256.0, 60 sec: 42598.3, 300 sec: 42987.2). 
Total num frames: 1009778688. Throughput: 0: 10990.9. Samples: 252545024. Policy #0 lag: (min: 15.0, avg: 102.1, max: 271.0) [2024-06-15 18:00:00,739][1648984] Avg episode reward: [(0, '-0.180')] [2024-06-15 18:00:02,350][1652475] Updated weights for policy 0, policy_version 493121 (0.0013) [2024-06-15 18:00:04,252][1652475] Updated weights for policy 0, policy_version 493200 (0.0022) [2024-06-15 18:00:05,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 43690.6, 300 sec: 42876.1). Total num frames: 1010171904. Throughput: 0: 10422.0. Samples: 252567040. Policy #0 lag: (min: 15.0, avg: 102.1, max: 271.0) [2024-06-15 18:00:05,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 18:00:10,127][1652475] Updated weights for policy 0, policy_version 493264 (0.0014) [2024-06-15 18:00:10,738][1648984] Fps is (10 sec: 45875.7, 60 sec: 42598.4, 300 sec: 42876.1). Total num frames: 1010237440. Throughput: 0: 10478.9. Samples: 252633088. Policy #0 lag: (min: 15.0, avg: 102.1, max: 271.0) [2024-06-15 18:00:10,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 18:00:13,550][1652475] Updated weights for policy 0, policy_version 493333 (0.0013) [2024-06-15 18:00:15,738][1648984] Fps is (10 sec: 29491.3, 60 sec: 39867.8, 300 sec: 42876.1). Total num frames: 1010466816. Throughput: 0: 10524.5. Samples: 252698112. Policy #0 lag: (min: 15.0, avg: 102.1, max: 271.0) [2024-06-15 18:00:15,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 18:00:16,087][1652475] Updated weights for policy 0, policy_version 493424 (0.0015) [2024-06-15 18:00:17,683][1652475] Updated weights for policy 0, policy_version 493504 (0.0014) [2024-06-15 18:00:20,739][1648984] Fps is (10 sec: 45869.0, 60 sec: 41505.2, 300 sec: 42653.8). Total num frames: 1010696192. Throughput: 0: 10308.0. Samples: 252723200. 
Policy #0 lag: (min: 15.0, avg: 102.1, max: 271.0) [2024-06-15 18:00:20,740][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 18:00:25,738][1648984] Fps is (10 sec: 36044.4, 60 sec: 39867.7, 300 sec: 42653.9). Total num frames: 1010827264. Throughput: 0: 10604.1. Samples: 252793344. Policy #0 lag: (min: 15.0, avg: 102.1, max: 271.0) [2024-06-15 18:00:25,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 18:00:25,923][1652475] Updated weights for policy 0, policy_version 493573 (0.0105) [2024-06-15 18:00:27,740][1652475] Updated weights for policy 0, policy_version 493649 (0.0013) [2024-06-15 18:00:28,660][1651340] Signal inference workers to stop experience collection... (25350 times) [2024-06-15 18:00:28,691][1652475] InferenceWorker_p0-w0: stopping experience collection (25350 times) [2024-06-15 18:00:28,894][1651340] Signal inference workers to resume experience collection... (25350 times) [2024-06-15 18:00:28,921][1652475] InferenceWorker_p0-w0: resuming experience collection (25350 times) [2024-06-15 18:00:29,611][1652475] Updated weights for policy 0, policy_version 493744 (0.0013) [2024-06-15 18:00:30,738][1648984] Fps is (10 sec: 52435.9, 60 sec: 43690.6, 300 sec: 43098.3). Total num frames: 1011220480. Throughput: 0: 10228.6. Samples: 252853760. Policy #0 lag: (min: 15.0, avg: 102.1, max: 271.0) [2024-06-15 18:00:30,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 18:00:33,253][1652475] Updated weights for policy 0, policy_version 493782 (0.0014) [2024-06-15 18:00:35,747][1648984] Fps is (10 sec: 52380.3, 60 sec: 43137.8, 300 sec: 43096.9). Total num frames: 1011351552. Throughput: 0: 10548.8. Samples: 252894720. 
Policy #0 lag: (min: 15.0, avg: 102.1, max: 271.0) [2024-06-15 18:00:35,748][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 18:00:37,656][1652475] Updated weights for policy 0, policy_version 493827 (0.0012) [2024-06-15 18:00:39,479][1652475] Updated weights for policy 0, policy_version 493910 (0.0108) [2024-06-15 18:00:40,738][1648984] Fps is (10 sec: 42596.7, 60 sec: 42052.0, 300 sec: 42987.1). Total num frames: 1011646464. Throughput: 0: 10729.1. Samples: 252961792. Policy #0 lag: (min: 15.0, avg: 102.1, max: 271.0) [2024-06-15 18:00:40,740][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 18:00:41,016][1652475] Updated weights for policy 0, policy_version 493984 (0.0014) [2024-06-15 18:00:44,536][1652475] Updated weights for policy 0, policy_version 494049 (0.0017) [2024-06-15 18:00:45,738][1648984] Fps is (10 sec: 52477.3, 60 sec: 45875.2, 300 sec: 43098.3). Total num frames: 1011875840. Throughput: 0: 10729.3. Samples: 253027840. Policy #0 lag: (min: 15.0, avg: 102.1, max: 271.0) [2024-06-15 18:00:45,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 18:00:49,846][1652475] Updated weights for policy 0, policy_version 494114 (0.0014) [2024-06-15 18:00:50,738][1648984] Fps is (10 sec: 39323.4, 60 sec: 42063.5, 300 sec: 43209.3). Total num frames: 1012039680. Throughput: 0: 11138.9. Samples: 253068288. Policy #0 lag: (min: 14.0, avg: 80.5, max: 270.0) [2024-06-15 18:00:50,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 18:00:51,825][1652475] Updated weights for policy 0, policy_version 494201 (0.0017) [2024-06-15 18:00:53,153][1652475] Updated weights for policy 0, policy_version 494262 (0.0013) [2024-06-15 18:00:55,739][1648984] Fps is (10 sec: 39316.2, 60 sec: 43689.6, 300 sec: 42653.9). Total num frames: 1012269056. Throughput: 0: 10979.2. Samples: 253127168. 
Policy #0 lag: (min: 14.0, avg: 80.5, max: 270.0) [2024-06-15 18:00:55,739][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 18:00:56,415][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000494304_1012334592.pth... [2024-06-15 18:00:56,535][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000489200_1001881600.pth [2024-06-15 18:00:56,808][1652475] Updated weights for policy 0, policy_version 494320 (0.0019) [2024-06-15 18:01:00,738][1648984] Fps is (10 sec: 36044.5, 60 sec: 43690.7, 300 sec: 43098.2). Total num frames: 1012400128. Throughput: 0: 11127.4. Samples: 253198848. Policy #0 lag: (min: 14.0, avg: 80.5, max: 270.0) [2024-06-15 18:01:00,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 18:01:01,928][1652475] Updated weights for policy 0, policy_version 494371 (0.0157) [2024-06-15 18:01:04,000][1652475] Updated weights for policy 0, policy_version 494460 (0.0208) [2024-06-15 18:01:05,671][1652475] Updated weights for policy 0, policy_version 494520 (0.0090) [2024-06-15 18:01:05,738][1648984] Fps is (10 sec: 49157.8, 60 sec: 43144.3, 300 sec: 42987.1). Total num frames: 1012760576. Throughput: 0: 11127.7. Samples: 253223936. Policy #0 lag: (min: 14.0, avg: 80.5, max: 270.0) [2024-06-15 18:01:05,739][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 18:01:08,311][1652475] Updated weights for policy 0, policy_version 494560 (0.0012) [2024-06-15 18:01:10,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 44782.9, 300 sec: 43098.2). Total num frames: 1012924416. Throughput: 0: 10911.3. Samples: 253284352. Policy #0 lag: (min: 14.0, avg: 80.5, max: 270.0) [2024-06-15 18:01:10,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 18:01:14,679][1652475] Updated weights for policy 0, policy_version 494640 (0.0036) [2024-06-15 18:01:14,846][1651340] Signal inference workers to stop experience collection... 
(25400 times) [2024-06-15 18:01:14,937][1652475] InferenceWorker_p0-w0: stopping experience collection (25400 times) [2024-06-15 18:01:15,048][1651340] Signal inference workers to resume experience collection... (25400 times) [2024-06-15 18:01:15,051][1652475] InferenceWorker_p0-w0: resuming experience collection (25400 times) [2024-06-15 18:01:15,738][1648984] Fps is (10 sec: 36045.3, 60 sec: 44236.6, 300 sec: 43098.3). Total num frames: 1013121024. Throughput: 0: 11172.9. Samples: 253356544. Policy #0 lag: (min: 14.0, avg: 80.5, max: 270.0) [2024-06-15 18:01:15,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 18:01:16,004][1652475] Updated weights for policy 0, policy_version 494707 (0.0013) [2024-06-15 18:01:17,191][1652475] Updated weights for policy 0, policy_version 494741 (0.0013) [2024-06-15 18:01:19,517][1652475] Updated weights for policy 0, policy_version 494800 (0.0013) [2024-06-15 18:01:20,420][1652475] Updated weights for policy 0, policy_version 494838 (0.0011) [2024-06-15 18:01:20,738][1648984] Fps is (10 sec: 52429.4, 60 sec: 45876.3, 300 sec: 43098.3). Total num frames: 1013448704. Throughput: 0: 10970.5. Samples: 253388288. Policy #0 lag: (min: 14.0, avg: 80.5, max: 270.0) [2024-06-15 18:01:20,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 18:01:25,738][1648984] Fps is (10 sec: 42599.3, 60 sec: 45329.2, 300 sec: 43320.4). Total num frames: 1013547008. Throughput: 0: 11127.6. Samples: 253462528. Policy #0 lag: (min: 14.0, avg: 80.5, max: 270.0) [2024-06-15 18:01:25,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 18:01:25,773][1652475] Updated weights for policy 0, policy_version 494899 (0.0014) [2024-06-15 18:01:27,263][1652475] Updated weights for policy 0, policy_version 494972 (0.0014) [2024-06-15 18:01:29,411][1652475] Updated weights for policy 0, policy_version 495011 (0.0045) [2024-06-15 18:01:30,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 43690.7, 300 sec: 42987.2). Total num frames: 1013841920. 
Throughput: 0: 10956.8. Samples: 253520896. Policy #0 lag: (min: 14.0, avg: 80.5, max: 270.0) [2024-06-15 18:01:30,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 18:01:32,201][1652475] Updated weights for policy 0, policy_version 495093 (0.0014) [2024-06-15 18:01:35,738][1648984] Fps is (10 sec: 42597.7, 60 sec: 43697.4, 300 sec: 43098.2). Total num frames: 1013972992. Throughput: 0: 10820.2. Samples: 253555200. Policy #0 lag: (min: 14.0, avg: 80.5, max: 270.0) [2024-06-15 18:01:35,739][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 18:01:38,480][1652475] Updated weights for policy 0, policy_version 495168 (0.0016) [2024-06-15 18:01:40,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 43144.9, 300 sec: 43098.3). Total num frames: 1014235136. Throughput: 0: 10854.8. Samples: 253615616. Policy #0 lag: (min: 14.0, avg: 80.5, max: 270.0) [2024-06-15 18:01:40,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 18:01:41,467][1652475] Updated weights for policy 0, policy_version 495249 (0.0022) [2024-06-15 18:01:43,555][1652475] Updated weights for policy 0, policy_version 495312 (0.0011) [2024-06-15 18:01:45,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 43690.7, 300 sec: 43100.8). Total num frames: 1014497280. Throughput: 0: 10649.6. Samples: 253678080. Policy #0 lag: (min: 14.0, avg: 80.5, max: 270.0) [2024-06-15 18:01:45,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 18:01:49,352][1652475] Updated weights for policy 0, policy_version 495376 (0.0014) [2024-06-15 18:01:50,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 43144.5, 300 sec: 43320.5). Total num frames: 1014628352. Throughput: 0: 10877.2. Samples: 253713408. 
Policy #0 lag: (min: 14.0, avg: 80.5, max: 270.0) [2024-06-15 18:01:50,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 18:01:51,432][1652475] Updated weights for policy 0, policy_version 495456 (0.0014) [2024-06-15 18:01:53,947][1652475] Updated weights for policy 0, policy_version 495521 (0.0096) [2024-06-15 18:01:55,739][1648984] Fps is (10 sec: 39321.7, 60 sec: 43691.7, 300 sec: 42876.1). Total num frames: 1014890496. Throughput: 0: 10831.6. Samples: 253771776. Policy #0 lag: (min: 14.0, avg: 80.5, max: 270.0) [2024-06-15 18:01:55,740][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 18:01:57,926][1652475] Updated weights for policy 0, policy_version 495557 (0.0012) [2024-06-15 18:02:00,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 43690.7, 300 sec: 43098.3). Total num frames: 1015021568. Throughput: 0: 10820.3. Samples: 253843456. Policy #0 lag: (min: 14.0, avg: 80.5, max: 270.0) [2024-06-15 18:02:00,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 18:02:00,800][1652475] Updated weights for policy 0, policy_version 495618 (0.0012) [2024-06-15 18:02:01,084][1651340] Signal inference workers to stop experience collection... (25450 times) [2024-06-15 18:02:01,134][1652475] InferenceWorker_p0-w0: stopping experience collection (25450 times) [2024-06-15 18:02:01,370][1651340] Signal inference workers to resume experience collection... (25450 times) [2024-06-15 18:02:01,370][1652475] InferenceWorker_p0-w0: resuming experience collection (25450 times) [2024-06-15 18:02:04,913][1652475] Updated weights for policy 0, policy_version 495745 (0.0081) [2024-06-15 18:02:05,738][1648984] Fps is (10 sec: 45874.2, 60 sec: 43144.5, 300 sec: 43320.4). Total num frames: 1015349248. Throughput: 0: 10740.5. Samples: 253871616. Policy #0 lag: (min: 14.0, avg: 80.5, max: 270.0) [2024-06-15 18:02:05,739][1648984] Avg episode reward: [(0, '-0.550')] [2024-06-15 18:02:10,738][1648984] Fps is (10 sec: 42597.9, 60 sec: 42052.2, 300 sec: 42765.0). 
Total num frames: 1015447552. Throughput: 0: 10513.0. Samples: 253935616. Policy #0 lag: (min: 14.0, avg: 80.5, max: 270.0) [2024-06-15 18:02:10,738][1648984] Avg episode reward: [(0, '-0.250')] [2024-06-15 18:02:10,835][1652475] Updated weights for policy 0, policy_version 495830 (0.0014) [2024-06-15 18:02:11,547][1652475] Updated weights for policy 0, policy_version 495872 (0.0014) [2024-06-15 18:02:15,738][1648984] Fps is (10 sec: 32767.9, 60 sec: 42598.3, 300 sec: 43431.4). Total num frames: 1015676928. Throughput: 0: 10683.7. Samples: 254001664. Policy #0 lag: (min: 14.0, avg: 80.5, max: 270.0) [2024-06-15 18:02:15,739][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 18:02:16,333][1652475] Updated weights for policy 0, policy_version 495953 (0.0017) [2024-06-15 18:02:17,697][1652475] Updated weights for policy 0, policy_version 496016 (0.0012) [2024-06-15 18:02:20,737][1648984] Fps is (10 sec: 49153.4, 60 sec: 41506.2, 300 sec: 42987.2). Total num frames: 1015939072. Throughput: 0: 10490.4. Samples: 254027264. Policy #0 lag: (min: 15.0, avg: 101.9, max: 207.0) [2024-06-15 18:02:20,738][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 18:02:22,258][1652475] Updated weights for policy 0, policy_version 496096 (0.0014) [2024-06-15 18:02:25,738][1648984] Fps is (10 sec: 39322.5, 60 sec: 42052.2, 300 sec: 43099.5). Total num frames: 1016070144. Throughput: 0: 10547.2. Samples: 254090240. Policy #0 lag: (min: 15.0, avg: 101.9, max: 207.0) [2024-06-15 18:02:25,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 18:02:27,208][1652475] Updated weights for policy 0, policy_version 496163 (0.0036) [2024-06-15 18:02:29,314][1652475] Updated weights for policy 0, policy_version 496249 (0.0014) [2024-06-15 18:02:30,738][1648984] Fps is (10 sec: 45874.0, 60 sec: 42598.3, 300 sec: 43320.4). Total num frames: 1016397824. Throughput: 0: 10592.7. Samples: 254154752. 
Policy #0 lag: (min: 15.0, avg: 101.9, max: 207.0) [2024-06-15 18:02:30,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 18:02:31,434][1652475] Updated weights for policy 0, policy_version 496319 (0.0013) [2024-06-15 18:02:34,282][1652475] Updated weights for policy 0, policy_version 496382 (0.0013) [2024-06-15 18:02:35,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 43690.7, 300 sec: 43098.3). Total num frames: 1016594432. Throughput: 0: 10501.7. Samples: 254185984. Policy #0 lag: (min: 15.0, avg: 101.9, max: 207.0) [2024-06-15 18:02:35,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 18:02:40,738][1648984] Fps is (10 sec: 26214.8, 60 sec: 40413.9, 300 sec: 43098.3). Total num frames: 1016659968. Throughput: 0: 10843.0. Samples: 254259712. Policy #0 lag: (min: 15.0, avg: 101.9, max: 207.0) [2024-06-15 18:02:40,738][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 18:02:42,188][1652475] Updated weights for policy 0, policy_version 496466 (0.0015) [2024-06-15 18:02:43,945][1652475] Updated weights for policy 0, policy_version 496548 (0.0091) [2024-06-15 18:02:45,738][1648984] Fps is (10 sec: 42598.8, 60 sec: 42052.4, 300 sec: 42876.1). Total num frames: 1017020416. Throughput: 0: 10353.8. Samples: 254309376. Policy #0 lag: (min: 15.0, avg: 101.9, max: 207.0) [2024-06-15 18:02:45,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 18:02:46,124][1651340] Signal inference workers to stop experience collection... (25500 times) [2024-06-15 18:02:46,166][1652475] InferenceWorker_p0-w0: stopping experience collection (25500 times) [2024-06-15 18:02:46,492][1651340] Signal inference workers to resume experience collection... (25500 times) [2024-06-15 18:02:46,493][1652475] InferenceWorker_p0-w0: resuming experience collection (25500 times) [2024-06-15 18:02:46,658][1652475] Updated weights for policy 0, policy_version 496630 (0.0013) [2024-06-15 18:02:50,738][1648984] Fps is (10 sec: 45874.8, 60 sec: 41506.1, 300 sec: 43098.3). 
Total num frames: 1017118720. Throughput: 0: 10376.6. Samples: 254338560. Policy #0 lag: (min: 15.0, avg: 101.9, max: 207.0) [2024-06-15 18:02:50,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 18:02:53,431][1652475] Updated weights for policy 0, policy_version 496674 (0.0017) [2024-06-15 18:02:54,665][1652475] Updated weights for policy 0, policy_version 496706 (0.0012) [2024-06-15 18:02:55,738][1648984] Fps is (10 sec: 32767.5, 60 sec: 40960.0, 300 sec: 42987.2). Total num frames: 1017348096. Throughput: 0: 10456.2. Samples: 254406144. Policy #0 lag: (min: 15.0, avg: 101.9, max: 207.0) [2024-06-15 18:02:55,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 18:02:55,807][1652475] Updated weights for policy 0, policy_version 496768 (0.0117) [2024-06-15 18:02:56,115][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000496784_1017413632.pth... [2024-06-15 18:02:56,246][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000491712_1007026176.pth [2024-06-15 18:02:58,014][1652475] Updated weights for policy 0, policy_version 496848 (0.0041) [2024-06-15 18:03:00,738][1648984] Fps is (10 sec: 52428.4, 60 sec: 43690.6, 300 sec: 43098.2). Total num frames: 1017643008. Throughput: 0: 10296.9. Samples: 254465024. Policy #0 lag: (min: 15.0, avg: 101.9, max: 207.0) [2024-06-15 18:03:00,739][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 18:03:03,899][1652475] Updated weights for policy 0, policy_version 496897 (0.0023) [2024-06-15 18:03:05,259][1652475] Updated weights for policy 0, policy_version 496949 (0.0012) [2024-06-15 18:03:05,739][1648984] Fps is (10 sec: 42593.4, 60 sec: 40413.2, 300 sec: 43098.1). Total num frames: 1017774080. Throughput: 0: 10592.4. Samples: 254503936. 
Policy #0 lag: (min: 15.0, avg: 101.9, max: 207.0) [2024-06-15 18:03:05,739][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 18:03:06,984][1652475] Updated weights for policy 0, policy_version 496997 (0.0011) [2024-06-15 18:03:08,983][1652475] Updated weights for policy 0, policy_version 497057 (0.0014) [2024-06-15 18:03:10,738][1648984] Fps is (10 sec: 49149.7, 60 sec: 44782.6, 300 sec: 43209.3). Total num frames: 1018134528. Throughput: 0: 10638.1. Samples: 254568960. Policy #0 lag: (min: 15.0, avg: 101.9, max: 207.0) [2024-06-15 18:03:10,739][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 18:03:10,871][1652475] Updated weights for policy 0, policy_version 497146 (0.0013) [2024-06-15 18:03:15,738][1648984] Fps is (10 sec: 42603.6, 60 sec: 42052.4, 300 sec: 43209.3). Total num frames: 1018200064. Throughput: 0: 10808.9. Samples: 254641152. Policy #0 lag: (min: 15.0, avg: 101.9, max: 207.0) [2024-06-15 18:03:15,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 18:03:16,792][1652475] Updated weights for policy 0, policy_version 497214 (0.0013) [2024-06-15 18:03:18,673][1652475] Updated weights for policy 0, policy_version 497280 (0.0012) [2024-06-15 18:03:20,739][1648984] Fps is (10 sec: 39318.9, 60 sec: 43143.5, 300 sec: 43542.4). Total num frames: 1018527744. Throughput: 0: 10876.9. Samples: 254675456. Policy #0 lag: (min: 15.0, avg: 101.9, max: 207.0) [2024-06-15 18:03:20,739][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 18:03:20,888][1652475] Updated weights for policy 0, policy_version 497344 (0.0029) [2024-06-15 18:03:25,738][1648984] Fps is (10 sec: 49151.7, 60 sec: 43690.6, 300 sec: 43098.2). Total num frames: 1018691584. Throughput: 0: 10569.9. Samples: 254735360. 
Policy #0 lag: (min: 15.0, avg: 101.9, max: 207.0) [2024-06-15 18:03:25,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 18:03:28,054][1652475] Updated weights for policy 0, policy_version 497440 (0.0013) [2024-06-15 18:03:30,087][1652475] Updated weights for policy 0, policy_version 497475 (0.0011) [2024-06-15 18:03:30,738][1648984] Fps is (10 sec: 36049.1, 60 sec: 41506.1, 300 sec: 43320.4). Total num frames: 1018888192. Throughput: 0: 11116.1. Samples: 254809600. Policy #0 lag: (min: 15.0, avg: 101.9, max: 207.0) [2024-06-15 18:03:30,739][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 18:03:32,116][1651340] Signal inference workers to stop experience collection... (25550 times) [2024-06-15 18:03:32,166][1652475] InferenceWorker_p0-w0: stopping experience collection (25550 times) [2024-06-15 18:03:32,399][1651340] Signal inference workers to resume experience collection... (25550 times) [2024-06-15 18:03:32,400][1652475] InferenceWorker_p0-w0: resuming experience collection (25550 times) [2024-06-15 18:03:32,667][1652475] Updated weights for policy 0, policy_version 497597 (0.0203) [2024-06-15 18:03:35,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 43690.6, 300 sec: 43209.4). Total num frames: 1019215872. Throughput: 0: 10899.9. Samples: 254829056. Policy #0 lag: (min: 15.0, avg: 101.9, max: 207.0) [2024-06-15 18:03:35,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 18:03:39,219][1652475] Updated weights for policy 0, policy_version 497665 (0.0018) [2024-06-15 18:03:40,451][1652475] Updated weights for policy 0, policy_version 497725 (0.0023) [2024-06-15 18:03:40,738][1648984] Fps is (10 sec: 45875.3, 60 sec: 44782.9, 300 sec: 43098.3). Total num frames: 1019346944. Throughput: 0: 11173.0. Samples: 254908928. 
Policy #0 lag: (min: 15.0, avg: 101.9, max: 207.0) [2024-06-15 18:03:40,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 18:03:42,924][1652475] Updated weights for policy 0, policy_version 497792 (0.0016) [2024-06-15 18:03:44,930][1652475] Updated weights for policy 0, policy_version 497875 (0.0014) [2024-06-15 18:03:45,738][1648984] Fps is (10 sec: 49152.1, 60 sec: 44782.9, 300 sec: 43542.6). Total num frames: 1019707392. Throughput: 0: 11138.9. Samples: 254966272. Policy #0 lag: (min: 15.0, avg: 135.0, max: 287.0) [2024-06-15 18:03:45,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 18:03:45,876][1652475] Updated weights for policy 0, policy_version 497917 (0.0012) [2024-06-15 18:03:50,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 43690.7, 300 sec: 43098.3). Total num frames: 1019740160. Throughput: 0: 11150.5. Samples: 255005696. Policy #0 lag: (min: 15.0, avg: 135.0, max: 287.0) [2024-06-15 18:03:50,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 18:03:51,904][1652475] Updated weights for policy 0, policy_version 497984 (0.0013) [2024-06-15 18:03:54,112][1652475] Updated weights for policy 0, policy_version 498043 (0.0012) [2024-06-15 18:03:55,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 45875.2, 300 sec: 43653.6). Total num frames: 1020100608. Throughput: 0: 11264.1. Samples: 255075840. Policy #0 lag: (min: 15.0, avg: 135.0, max: 287.0) [2024-06-15 18:03:55,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 18:03:55,936][1652475] Updated weights for policy 0, policy_version 498112 (0.0014) [2024-06-15 18:04:00,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 43690.7, 300 sec: 43098.2). Total num frames: 1020264448. Throughput: 0: 11013.7. Samples: 255136768. 
Policy #0 lag: (min: 15.0, avg: 135.0, max: 287.0) [2024-06-15 18:04:00,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 18:04:02,511][1652475] Updated weights for policy 0, policy_version 498192 (0.0013) [2024-06-15 18:04:05,172][1652475] Updated weights for policy 0, policy_version 498256 (0.0014) [2024-06-15 18:04:05,738][1648984] Fps is (10 sec: 36045.1, 60 sec: 44783.9, 300 sec: 43320.4). Total num frames: 1020461056. Throughput: 0: 11036.8. Samples: 255172096. Policy #0 lag: (min: 15.0, avg: 135.0, max: 287.0) [2024-06-15 18:04:05,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 18:04:06,040][1652475] Updated weights for policy 0, policy_version 498302 (0.0019) [2024-06-15 18:04:09,353][1652475] Updated weights for policy 0, policy_version 498384 (0.0015) [2024-06-15 18:04:10,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 44237.2, 300 sec: 43098.2). Total num frames: 1020788736. Throughput: 0: 11047.8. Samples: 255232512. Policy #0 lag: (min: 15.0, avg: 135.0, max: 287.0) [2024-06-15 18:04:10,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 18:04:14,894][1652475] Updated weights for policy 0, policy_version 498480 (0.0014) [2024-06-15 18:04:15,738][1648984] Fps is (10 sec: 45874.0, 60 sec: 45328.9, 300 sec: 43098.2). Total num frames: 1020919808. Throughput: 0: 10888.5. Samples: 255299584. Policy #0 lag: (min: 15.0, avg: 135.0, max: 287.0) [2024-06-15 18:04:15,739][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 18:04:17,514][1652475] Updated weights for policy 0, policy_version 498551 (0.0013) [2024-06-15 18:04:19,144][1651340] Signal inference workers to stop experience collection... (25600 times) [2024-06-15 18:04:19,204][1652475] InferenceWorker_p0-w0: stopping experience collection (25600 times) [2024-06-15 18:04:19,493][1651340] Signal inference workers to resume experience collection... 
(25600 times) [2024-06-15 18:04:19,495][1652475] InferenceWorker_p0-w0: resuming experience collection (25600 times) [2024-06-15 18:04:20,588][1652475] Updated weights for policy 0, policy_version 498623 (0.0033) [2024-06-15 18:04:20,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 44237.7, 300 sec: 43209.3). Total num frames: 1021181952. Throughput: 0: 11218.5. Samples: 255333888. Policy #0 lag: (min: 15.0, avg: 135.0, max: 287.0) [2024-06-15 18:04:20,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 18:04:23,263][1652475] Updated weights for policy 0, policy_version 498681 (0.0013) [2024-06-15 18:04:25,738][1648984] Fps is (10 sec: 39321.9, 60 sec: 43690.6, 300 sec: 43098.2). Total num frames: 1021313024. Throughput: 0: 10820.2. Samples: 255395840. Policy #0 lag: (min: 15.0, avg: 135.0, max: 287.0) [2024-06-15 18:04:25,739][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 18:04:26,827][1652475] Updated weights for policy 0, policy_version 498750 (0.0011) [2024-06-15 18:04:30,738][1648984] Fps is (10 sec: 29491.4, 60 sec: 43144.6, 300 sec: 43098.3). Total num frames: 1021476864. Throughput: 0: 11036.5. Samples: 255462912. Policy #0 lag: (min: 15.0, avg: 135.0, max: 287.0) [2024-06-15 18:04:30,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 18:04:31,624][1652475] Updated weights for policy 0, policy_version 498802 (0.0128) [2024-06-15 18:04:33,193][1652475] Updated weights for policy 0, policy_version 498875 (0.0035) [2024-06-15 18:04:34,889][1652475] Updated weights for policy 0, policy_version 498942 (0.0012) [2024-06-15 18:04:35,738][1648984] Fps is (10 sec: 52429.9, 60 sec: 43690.7, 300 sec: 43098.3). Total num frames: 1021837312. Throughput: 0: 10695.1. Samples: 255486976. 
Policy #0 lag: (min: 15.0, avg: 135.0, max: 287.0) [2024-06-15 18:04:35,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 18:04:39,637][1652475] Updated weights for policy 0, policy_version 498998 (0.0014) [2024-06-15 18:04:40,738][1648984] Fps is (10 sec: 49149.8, 60 sec: 43690.4, 300 sec: 43542.5). Total num frames: 1021968384. Throughput: 0: 10751.9. Samples: 255559680. Policy #0 lag: (min: 15.0, avg: 135.0, max: 287.0) [2024-06-15 18:04:40,739][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 18:04:41,920][1652475] Updated weights for policy 0, policy_version 499040 (0.0025) [2024-06-15 18:04:44,843][1652475] Updated weights for policy 0, policy_version 499120 (0.0016) [2024-06-15 18:04:45,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 42598.4, 300 sec: 43211.7). Total num frames: 1022263296. Throughput: 0: 10763.4. Samples: 255621120. Policy #0 lag: (min: 15.0, avg: 135.0, max: 287.0) [2024-06-15 18:04:45,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 18:04:45,828][1652475] Updated weights for policy 0, policy_version 499159 (0.0094) [2024-06-15 18:04:50,738][1648984] Fps is (10 sec: 39323.2, 60 sec: 43690.7, 300 sec: 43098.2). Total num frames: 1022361600. Throughput: 0: 10717.9. Samples: 255654400. Policy #0 lag: (min: 15.0, avg: 135.0, max: 287.0) [2024-06-15 18:04:50,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 18:04:52,679][1652475] Updated weights for policy 0, policy_version 499248 (0.0014) [2024-06-15 18:04:54,173][1652475] Updated weights for policy 0, policy_version 499312 (0.0012) [2024-06-15 18:04:55,150][1652475] Updated weights for policy 0, policy_version 499344 (0.0023) [2024-06-15 18:04:55,767][1648984] Fps is (10 sec: 42476.2, 60 sec: 43123.9, 300 sec: 43760.5). Total num frames: 1022689280. Throughput: 0: 10836.1. Samples: 255720448. 
Policy #0 lag: (min: 15.0, avg: 135.0, max: 287.0) [2024-06-15 18:04:55,768][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 18:04:56,171][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000499392_1022754816.pth... [2024-06-15 18:04:56,229][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000494304_1012334592.pth [2024-06-15 18:04:57,259][1652475] Updated weights for policy 0, policy_version 499397 (0.0013) [2024-06-15 18:04:58,640][1652475] Updated weights for policy 0, policy_version 499454 (0.0013) [2024-06-15 18:05:00,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 43690.7, 300 sec: 43098.2). Total num frames: 1022885888. Throughput: 0: 10865.8. Samples: 255788544. Policy #0 lag: (min: 15.0, avg: 135.0, max: 287.0) [2024-06-15 18:05:00,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 18:05:05,738][1648984] Fps is (10 sec: 32862.4, 60 sec: 42598.4, 300 sec: 43320.4). Total num frames: 1023016960. Throughput: 0: 10956.8. Samples: 255826944. Policy #0 lag: (min: 15.0, avg: 135.0, max: 287.0) [2024-06-15 18:05:05,739][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 18:05:06,048][1652475] Updated weights for policy 0, policy_version 499536 (0.0013) [2024-06-15 18:05:06,516][1651340] Signal inference workers to stop experience collection... (25650 times) [2024-06-15 18:05:06,558][1652475] InferenceWorker_p0-w0: stopping experience collection (25650 times) [2024-06-15 18:05:06,719][1651340] Signal inference workers to resume experience collection... (25650 times) [2024-06-15 18:05:06,720][1652475] InferenceWorker_p0-w0: resuming experience collection (25650 times) [2024-06-15 18:05:06,989][1652475] Updated weights for policy 0, policy_version 499584 (0.0012) [2024-06-15 18:05:08,827][1652475] Updated weights for policy 0, policy_version 499652 (0.0014) [2024-06-15 18:05:10,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 43690.7, 300 sec: 43875.8). 
Total num frames: 1023410176. Throughput: 0: 10808.9. Samples: 255882240. Policy #0 lag: (min: 15.0, avg: 135.0, max: 287.0) [2024-06-15 18:05:10,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 18:05:15,249][1652475] Updated weights for policy 0, policy_version 499714 (0.0013) [2024-06-15 18:05:15,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 42052.4, 300 sec: 43209.5). Total num frames: 1023442944. Throughput: 0: 10911.3. Samples: 255953920. Policy #0 lag: (min: 12.0, avg: 83.3, max: 268.0) [2024-06-15 18:05:15,739][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 18:05:16,348][1652475] Updated weights for policy 0, policy_version 499776 (0.0027) [2024-06-15 18:05:19,628][1652475] Updated weights for policy 0, policy_version 499842 (0.0014) [2024-06-15 18:05:20,738][1648984] Fps is (10 sec: 36045.0, 60 sec: 43144.6, 300 sec: 43875.8). Total num frames: 1023770624. Throughput: 0: 11025.1. Samples: 255983104. Policy #0 lag: (min: 12.0, avg: 83.3, max: 268.0) [2024-06-15 18:05:20,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 18:05:20,886][1652475] Updated weights for policy 0, policy_version 499895 (0.0014) [2024-06-15 18:05:22,351][1652475] Updated weights for policy 0, policy_version 499955 (0.0015) [2024-06-15 18:05:25,738][1648984] Fps is (10 sec: 49152.4, 60 sec: 43690.8, 300 sec: 43098.3). Total num frames: 1023934464. Throughput: 0: 10968.3. Samples: 256053248. Policy #0 lag: (min: 12.0, avg: 83.3, max: 268.0) [2024-06-15 18:05:25,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 18:05:27,337][1652475] Updated weights for policy 0, policy_version 500032 (0.0108) [2024-06-15 18:05:29,519][1652475] Updated weights for policy 0, policy_version 500091 (0.0012) [2024-06-15 18:05:30,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 45329.1, 300 sec: 43543.9). Total num frames: 1024196608. Throughput: 0: 11047.8. Samples: 256118272. 
Policy #0 lag: (min: 12.0, avg: 83.3, max: 268.0) [2024-06-15 18:05:30,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 18:05:32,845][1652475] Updated weights for policy 0, policy_version 500157 (0.0013) [2024-06-15 18:05:35,200][1652475] Updated weights for policy 0, policy_version 500221 (0.0013) [2024-06-15 18:05:35,749][1648984] Fps is (10 sec: 52369.7, 60 sec: 43682.4, 300 sec: 43429.9). Total num frames: 1024458752. Throughput: 0: 11010.9. Samples: 256150016. Policy #0 lag: (min: 12.0, avg: 83.3, max: 268.0) [2024-06-15 18:05:35,749][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 18:05:38,668][1652475] Updated weights for policy 0, policy_version 500288 (0.0014) [2024-06-15 18:05:40,738][1648984] Fps is (10 sec: 45874.9, 60 sec: 44783.2, 300 sec: 43320.4). Total num frames: 1024655360. Throughput: 0: 11123.2. Samples: 256220672. Policy #0 lag: (min: 12.0, avg: 83.3, max: 268.0) [2024-06-15 18:05:40,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 18:05:43,229][1652475] Updated weights for policy 0, policy_version 500355 (0.0012) [2024-06-15 18:05:44,584][1652475] Updated weights for policy 0, policy_version 500415 (0.0034) [2024-06-15 18:05:45,738][1648984] Fps is (10 sec: 42645.8, 60 sec: 43690.6, 300 sec: 43542.5). Total num frames: 1024884736. Throughput: 0: 11172.9. Samples: 256291328. Policy #0 lag: (min: 12.0, avg: 83.3, max: 268.0) [2024-06-15 18:05:45,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 18:05:46,443][1652475] Updated weights for policy 0, policy_version 500471 (0.0015) [2024-06-15 18:05:49,789][1652475] Updated weights for policy 0, policy_version 500528 (0.0015) [2024-06-15 18:05:50,738][1648984] Fps is (10 sec: 45875.0, 60 sec: 45875.2, 300 sec: 43542.8). Total num frames: 1025114112. Throughput: 0: 11070.6. Samples: 256325120. 
Policy #0 lag: (min: 12.0, avg: 83.3, max: 268.0) [2024-06-15 18:05:50,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 18:05:52,572][1652475] Updated weights for policy 0, policy_version 500604 (0.0015) [2024-06-15 18:05:55,062][1651340] Signal inference workers to stop experience collection... (25700 times) [2024-06-15 18:05:55,160][1652475] InferenceWorker_p0-w0: stopping experience collection (25700 times) [2024-06-15 18:05:55,398][1651340] Signal inference workers to resume experience collection... (25700 times) [2024-06-15 18:05:55,399][1652475] InferenceWorker_p0-w0: resuming experience collection (25700 times) [2024-06-15 18:05:55,738][1648984] Fps is (10 sec: 39322.3, 60 sec: 43165.3, 300 sec: 43653.7). Total num frames: 1025277952. Throughput: 0: 11320.9. Samples: 256391680. Policy #0 lag: (min: 12.0, avg: 83.3, max: 268.0) [2024-06-15 18:05:55,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 18:05:56,427][1652475] Updated weights for policy 0, policy_version 500670 (0.0012) [2024-06-15 18:05:58,321][1652475] Updated weights for policy 0, policy_version 500736 (0.0014) [2024-06-15 18:06:00,738][1648984] Fps is (10 sec: 45875.4, 60 sec: 44782.9, 300 sec: 43431.5). Total num frames: 1025572864. Throughput: 0: 11286.8. Samples: 256461824. Policy #0 lag: (min: 12.0, avg: 83.3, max: 268.0) [2024-06-15 18:06:00,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 18:06:03,248][1652475] Updated weights for policy 0, policy_version 500832 (0.0132) [2024-06-15 18:06:05,738][1648984] Fps is (10 sec: 49151.7, 60 sec: 45875.2, 300 sec: 43542.6). Total num frames: 1025769472. Throughput: 0: 11366.4. Samples: 256494592. 
Policy #0 lag: (min: 12.0, avg: 83.3, max: 268.0) [2024-06-15 18:06:05,739][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 18:06:06,649][1652475] Updated weights for policy 0, policy_version 500880 (0.0015) [2024-06-15 18:06:07,610][1652475] Updated weights for policy 0, policy_version 500922 (0.0012) [2024-06-15 18:06:09,724][1652475] Updated weights for policy 0, policy_version 500992 (0.0031) [2024-06-15 18:06:10,738][1648984] Fps is (10 sec: 45875.2, 60 sec: 43690.6, 300 sec: 43764.7). Total num frames: 1026031616. Throughput: 0: 11343.6. Samples: 256563712. Policy #0 lag: (min: 12.0, avg: 83.3, max: 268.0) [2024-06-15 18:06:10,739][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 18:06:15,732][1652475] Updated weights for policy 0, policy_version 501091 (0.0013) [2024-06-15 18:06:15,738][1648984] Fps is (10 sec: 45875.2, 60 sec: 46421.4, 300 sec: 43320.4). Total num frames: 1026228224. Throughput: 0: 11355.0. Samples: 256629248. Policy #0 lag: (min: 12.0, avg: 83.3, max: 268.0) [2024-06-15 18:06:15,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 18:06:18,634][1652475] Updated weights for policy 0, policy_version 501138 (0.0017) [2024-06-15 18:06:20,119][1652475] Updated weights for policy 0, policy_version 501189 (0.0014) [2024-06-15 18:06:20,738][1648984] Fps is (10 sec: 45875.0, 60 sec: 45329.0, 300 sec: 43875.8). Total num frames: 1026490368. Throughput: 0: 11471.7. Samples: 256666112. Policy #0 lag: (min: 12.0, avg: 83.3, max: 268.0) [2024-06-15 18:06:20,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 18:06:23,187][1652475] Updated weights for policy 0, policy_version 501264 (0.0012) [2024-06-15 18:06:24,348][1652475] Updated weights for policy 0, policy_version 501312 (0.0012) [2024-06-15 18:06:25,738][1648984] Fps is (10 sec: 45874.1, 60 sec: 45875.0, 300 sec: 43542.5). Total num frames: 1026686976. Throughput: 0: 11172.9. Samples: 256723456. 
Policy #0 lag: (min: 12.0, avg: 83.3, max: 268.0) [2024-06-15 18:06:25,739][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 18:06:29,636][1652475] Updated weights for policy 0, policy_version 501376 (0.0023) [2024-06-15 18:06:30,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 45329.0, 300 sec: 43875.8). Total num frames: 1026916352. Throughput: 0: 11309.5. Samples: 256800256. Policy #0 lag: (min: 12.0, avg: 83.3, max: 268.0) [2024-06-15 18:06:30,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 18:06:30,752][1652475] Updated weights for policy 0, policy_version 501431 (0.0016) [2024-06-15 18:06:31,794][1652475] Updated weights for policy 0, policy_version 501472 (0.0017) [2024-06-15 18:06:34,484][1652475] Updated weights for policy 0, policy_version 501537 (0.0014) [2024-06-15 18:06:35,738][1648984] Fps is (10 sec: 52430.3, 60 sec: 45883.8, 300 sec: 43986.9). Total num frames: 1027211264. Throughput: 0: 11252.6. Samples: 256831488. Policy #0 lag: (min: 12.0, avg: 83.3, max: 268.0) [2024-06-15 18:06:35,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 18:06:40,378][1652475] Updated weights for policy 0, policy_version 501585 (0.0014) [2024-06-15 18:06:40,738][1648984] Fps is (10 sec: 36044.7, 60 sec: 43690.6, 300 sec: 43320.4). Total num frames: 1027276800. Throughput: 0: 11559.8. Samples: 256911872. Policy #0 lag: (min: 12.0, avg: 83.3, max: 268.0) [2024-06-15 18:06:40,739][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 18:06:41,031][1651340] Signal inference workers to stop experience collection... (25750 times) [2024-06-15 18:06:41,095][1652475] InferenceWorker_p0-w0: stopping experience collection (25750 times) [2024-06-15 18:06:41,207][1651340] Signal inference workers to resume experience collection... 
(25750 times) [2024-06-15 18:06:41,209][1652475] InferenceWorker_p0-w0: resuming experience collection (25750 times) [2024-06-15 18:06:41,749][1652475] Updated weights for policy 0, policy_version 501657 (0.0097) [2024-06-15 18:06:43,670][1652475] Updated weights for policy 0, policy_version 501728 (0.0259) [2024-06-15 18:06:45,738][1648984] Fps is (10 sec: 39320.7, 60 sec: 45329.0, 300 sec: 43986.8). Total num frames: 1027604480. Throughput: 0: 11184.3. Samples: 256965120. Policy #0 lag: (min: 53.0, avg: 138.9, max: 253.0) [2024-06-15 18:06:45,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 18:06:46,090][1652475] Updated weights for policy 0, policy_version 501766 (0.0013) [2024-06-15 18:06:47,209][1652475] Updated weights for policy 0, policy_version 501818 (0.0080) [2024-06-15 18:06:50,738][1648984] Fps is (10 sec: 45875.4, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 1027735552. Throughput: 0: 11173.0. Samples: 256997376. Policy #0 lag: (min: 53.0, avg: 138.9, max: 253.0) [2024-06-15 18:06:50,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 18:06:52,139][1652475] Updated weights for policy 0, policy_version 501888 (0.0011) [2024-06-15 18:06:54,331][1652475] Updated weights for policy 0, policy_version 501952 (0.0012) [2024-06-15 18:06:55,738][1648984] Fps is (10 sec: 42598.2, 60 sec: 45875.0, 300 sec: 44097.9). Total num frames: 1028030464. Throughput: 0: 11116.0. Samples: 257063936. Policy #0 lag: (min: 53.0, avg: 138.9, max: 253.0) [2024-06-15 18:06:55,739][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 18:06:56,187][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000502000_1028096000.pth... 
[2024-06-15 18:06:56,269][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000496784_1017413632.pth [2024-06-15 18:06:59,102][1652475] Updated weights for policy 0, policy_version 502017 (0.0138) [2024-06-15 18:07:00,405][1652475] Updated weights for policy 0, policy_version 502072 (0.0011) [2024-06-15 18:07:00,738][1648984] Fps is (10 sec: 52429.4, 60 sec: 44783.0, 300 sec: 43764.8). Total num frames: 1028259840. Throughput: 0: 11002.3. Samples: 257124352. Policy #0 lag: (min: 53.0, avg: 138.9, max: 253.0) [2024-06-15 18:07:00,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 18:07:03,103][1652475] Updated weights for policy 0, policy_version 502128 (0.0014) [2024-06-15 18:07:05,017][1652475] Updated weights for policy 0, policy_version 502198 (0.0098) [2024-06-15 18:07:05,738][1648984] Fps is (10 sec: 49153.4, 60 sec: 45875.2, 300 sec: 44320.1). Total num frames: 1028521984. Throughput: 0: 11025.1. Samples: 257162240. Policy #0 lag: (min: 53.0, avg: 138.9, max: 253.0) [2024-06-15 18:07:05,738][1648984] Avg episode reward: [(0, '-0.180')] [2024-06-15 18:07:09,130][1652475] Updated weights for policy 0, policy_version 502225 (0.0011) [2024-06-15 18:07:10,139][1652475] Updated weights for policy 0, policy_version 502272 (0.0021) [2024-06-15 18:07:10,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 43690.7, 300 sec: 43986.9). Total num frames: 1028653056. Throughput: 0: 11252.7. Samples: 257229824. Policy #0 lag: (min: 53.0, avg: 138.9, max: 253.0) [2024-06-15 18:07:10,740][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 18:07:13,487][1652475] Updated weights for policy 0, policy_version 502337 (0.0015) [2024-06-15 18:07:15,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 44783.0, 300 sec: 43986.9). Total num frames: 1028915200. Throughput: 0: 11013.7. Samples: 257295872. 
Policy #0 lag: (min: 53.0, avg: 138.9, max: 253.0) [2024-06-15 18:07:15,738][1648984] Avg episode reward: [(0, '-0.250')] [2024-06-15 18:07:15,879][1652475] Updated weights for policy 0, policy_version 502416 (0.0013) [2024-06-15 18:07:20,567][1652475] Updated weights for policy 0, policy_version 502467 (0.0015) [2024-06-15 18:07:20,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 42598.4, 300 sec: 43986.9). Total num frames: 1029046272. Throughput: 0: 10990.9. Samples: 257326080. Policy #0 lag: (min: 53.0, avg: 138.9, max: 253.0) [2024-06-15 18:07:20,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 18:07:21,912][1652475] Updated weights for policy 0, policy_version 502521 (0.0017) [2024-06-15 18:07:23,362][1652475] Updated weights for policy 0, policy_version 502566 (0.0011) [2024-06-15 18:07:24,842][1652475] Updated weights for policy 0, policy_version 502597 (0.0013) [2024-06-15 18:07:25,738][1648984] Fps is (10 sec: 49151.8, 60 sec: 45329.3, 300 sec: 44098.0). Total num frames: 1029406720. Throughput: 0: 10740.6. Samples: 257395200. Policy #0 lag: (min: 53.0, avg: 138.9, max: 253.0) [2024-06-15 18:07:25,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 18:07:25,864][1652475] Updated weights for policy 0, policy_version 502656 (0.0013) [2024-06-15 18:07:29,660][1651340] Signal inference workers to stop experience collection... (25800 times) [2024-06-15 18:07:29,723][1652475] InferenceWorker_p0-w0: stopping experience collection (25800 times) [2024-06-15 18:07:29,901][1651340] Signal inference workers to resume experience collection... (25800 times) [2024-06-15 18:07:29,902][1652475] InferenceWorker_p0-w0: resuming experience collection (25800 times) [2024-06-15 18:07:30,738][1648984] Fps is (10 sec: 45875.2, 60 sec: 43144.6, 300 sec: 43764.7). Total num frames: 1029505024. Throughput: 0: 11127.5. Samples: 257465856. 
Policy #0 lag: (min: 53.0, avg: 138.9, max: 253.0) [2024-06-15 18:07:30,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 18:07:31,792][1652475] Updated weights for policy 0, policy_version 502723 (0.0014) [2024-06-15 18:07:32,951][1652475] Updated weights for policy 0, policy_version 502775 (0.0016) [2024-06-15 18:07:34,198][1652475] Updated weights for policy 0, policy_version 502810 (0.0055) [2024-06-15 18:07:34,922][1652475] Updated weights for policy 0, policy_version 502848 (0.0011) [2024-06-15 18:07:35,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 43690.6, 300 sec: 44653.3). Total num frames: 1029832704. Throughput: 0: 11047.8. Samples: 257494528. Policy #0 lag: (min: 53.0, avg: 138.9, max: 253.0) [2024-06-15 18:07:35,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 18:07:40,738][1648984] Fps is (10 sec: 45875.1, 60 sec: 44783.0, 300 sec: 43875.8). Total num frames: 1029963776. Throughput: 0: 11173.0. Samples: 257566720. Policy #0 lag: (min: 53.0, avg: 138.9, max: 253.0) [2024-06-15 18:07:40,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 18:07:43,090][1652475] Updated weights for policy 0, policy_version 502944 (0.0111) [2024-06-15 18:07:44,766][1652475] Updated weights for policy 0, policy_version 503008 (0.0013) [2024-06-15 18:07:45,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 43690.8, 300 sec: 44431.2). Total num frames: 1030225920. Throughput: 0: 11229.9. Samples: 257629696. Policy #0 lag: (min: 53.0, avg: 138.9, max: 253.0) [2024-06-15 18:07:45,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 18:07:46,227][1652475] Updated weights for policy 0, policy_version 503072 (0.0019) [2024-06-15 18:07:47,990][1652475] Updated weights for policy 0, policy_version 503110 (0.0050) [2024-06-15 18:07:50,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 45875.3, 300 sec: 44542.3). Total num frames: 1030488064. Throughput: 0: 11116.1. Samples: 257662464. 
Policy #0 lag: (min: 53.0, avg: 138.9, max: 253.0) [2024-06-15 18:07:50,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 18:07:54,212][1652475] Updated weights for policy 0, policy_version 503171 (0.0013) [2024-06-15 18:07:55,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 43144.7, 300 sec: 43986.9). Total num frames: 1030619136. Throughput: 0: 11309.5. Samples: 257738752. Policy #0 lag: (min: 53.0, avg: 138.9, max: 253.0) [2024-06-15 18:07:55,740][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 18:07:56,028][1652475] Updated weights for policy 0, policy_version 503248 (0.0113) [2024-06-15 18:07:58,214][1652475] Updated weights for policy 0, policy_version 503344 (0.0015) [2024-06-15 18:08:00,738][1648984] Fps is (10 sec: 45875.3, 60 sec: 44782.9, 300 sec: 44653.5). Total num frames: 1030946816. Throughput: 0: 11082.0. Samples: 257794560. Policy #0 lag: (min: 53.0, avg: 138.9, max: 253.0) [2024-06-15 18:08:00,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 18:08:00,782][1652475] Updated weights for policy 0, policy_version 503392 (0.0084) [2024-06-15 18:08:05,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 41506.1, 300 sec: 43653.7). Total num frames: 1031012352. Throughput: 0: 11127.5. Samples: 257826816. Policy #0 lag: (min: 53.0, avg: 138.9, max: 253.0) [2024-06-15 18:08:05,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 18:08:06,342][1652475] Updated weights for policy 0, policy_version 503430 (0.0015) [2024-06-15 18:08:07,891][1652475] Updated weights for policy 0, policy_version 503504 (0.0012) [2024-06-15 18:08:09,641][1652475] Updated weights for policy 0, policy_version 503571 (0.0014) [2024-06-15 18:08:09,944][1651340] Signal inference workers to stop experience collection... (25850 times) [2024-06-15 18:08:09,988][1652475] InferenceWorker_p0-w0: stopping experience collection (25850 times) [2024-06-15 18:08:10,190][1651340] Signal inference workers to resume experience collection... 
(25850 times) [2024-06-15 18:08:10,191][1652475] InferenceWorker_p0-w0: resuming experience collection (25850 times) [2024-06-15 18:08:10,738][1648984] Fps is (10 sec: 45874.2, 60 sec: 45875.0, 300 sec: 44764.4). Total num frames: 1031405568. Throughput: 0: 11036.4. Samples: 257891840. Policy #0 lag: (min: 79.0, avg: 186.9, max: 349.0) [2024-06-15 18:08:10,739][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 18:08:12,091][1652475] Updated weights for policy 0, policy_version 503617 (0.0046) [2024-06-15 18:08:13,655][1652475] Updated weights for policy 0, policy_version 503680 (0.0017) [2024-06-15 18:08:15,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 43690.6, 300 sec: 44098.1). Total num frames: 1031536640. Throughput: 0: 10922.6. Samples: 257957376. Policy #0 lag: (min: 79.0, avg: 186.9, max: 349.0) [2024-06-15 18:08:15,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 18:08:20,152][1652475] Updated weights for policy 0, policy_version 503762 (0.0015) [2024-06-15 18:08:20,738][1648984] Fps is (10 sec: 36045.7, 60 sec: 45329.1, 300 sec: 44320.1). Total num frames: 1031766016. Throughput: 0: 11150.2. Samples: 257996288. Policy #0 lag: (min: 79.0, avg: 186.9, max: 349.0) [2024-06-15 18:08:20,740][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 18:08:21,280][1652475] Updated weights for policy 0, policy_version 503812 (0.0012) [2024-06-15 18:08:22,524][1652475] Updated weights for policy 0, policy_version 503868 (0.0013) [2024-06-15 18:08:24,958][1652475] Updated weights for policy 0, policy_version 503906 (0.0013) [2024-06-15 18:08:25,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 44236.7, 300 sec: 44653.3). Total num frames: 1032060928. Throughput: 0: 10899.9. Samples: 258057216. 
Policy #0 lag: (min: 79.0, avg: 186.9, max: 349.0) [2024-06-15 18:08:25,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 18:08:29,896][1652475] Updated weights for policy 0, policy_version 503952 (0.0026) [2024-06-15 18:08:30,738][1648984] Fps is (10 sec: 39321.3, 60 sec: 44236.8, 300 sec: 43875.8). Total num frames: 1032159232. Throughput: 0: 11093.3. Samples: 258128896. Policy #0 lag: (min: 79.0, avg: 186.9, max: 349.0) [2024-06-15 18:08:30,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 18:08:32,215][1652475] Updated weights for policy 0, policy_version 504049 (0.0013) [2024-06-15 18:08:33,336][1652475] Updated weights for policy 0, policy_version 504102 (0.0013) [2024-06-15 18:08:35,738][1648984] Fps is (10 sec: 39321.9, 60 sec: 43690.6, 300 sec: 44431.2). Total num frames: 1032454144. Throughput: 0: 10865.8. Samples: 258151424. Policy #0 lag: (min: 79.0, avg: 186.9, max: 349.0) [2024-06-15 18:08:35,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 18:08:38,565][1652475] Updated weights for policy 0, policy_version 504160 (0.0013) [2024-06-15 18:08:40,743][1648984] Fps is (10 sec: 42578.1, 60 sec: 43687.2, 300 sec: 43652.9). Total num frames: 1032585216. Throughput: 0: 10705.3. Samples: 258220544. Policy #0 lag: (min: 79.0, avg: 186.9, max: 349.0) [2024-06-15 18:08:40,743][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 18:08:42,775][1652475] Updated weights for policy 0, policy_version 504240 (0.0014) [2024-06-15 18:08:44,160][1652475] Updated weights for policy 0, policy_version 504293 (0.0120) [2024-06-15 18:08:45,738][1648984] Fps is (10 sec: 45875.1, 60 sec: 44782.9, 300 sec: 44653.3). Total num frames: 1032912896. Throughput: 0: 10854.4. Samples: 258283008. 
Policy #0 lag: (min: 79.0, avg: 186.9, max: 349.0) [2024-06-15 18:08:45,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 18:08:46,106][1652475] Updated weights for policy 0, policy_version 504375 (0.0013) [2024-06-15 18:08:50,738][1648984] Fps is (10 sec: 39340.3, 60 sec: 41506.1, 300 sec: 43653.6). Total num frames: 1032978432. Throughput: 0: 10865.8. Samples: 258315776. Policy #0 lag: (min: 79.0, avg: 186.9, max: 349.0) [2024-06-15 18:08:50,738][1648984] Avg episode reward: [(0, '-0.230')] [2024-06-15 18:08:52,038][1652475] Updated weights for policy 0, policy_version 504419 (0.0022) [2024-06-15 18:08:53,067][1652475] Updated weights for policy 0, policy_version 504451 (0.0018) [2024-06-15 18:08:54,415][1651340] Signal inference workers to stop experience collection... (25900 times) [2024-06-15 18:08:54,492][1652475] InferenceWorker_p0-w0: stopping experience collection (25900 times) [2024-06-15 18:08:54,718][1651340] Signal inference workers to resume experience collection... (25900 times) [2024-06-15 18:08:54,718][1652475] InferenceWorker_p0-w0: resuming experience collection (25900 times) [2024-06-15 18:08:55,307][1652475] Updated weights for policy 0, policy_version 504545 (0.0015) [2024-06-15 18:08:55,738][1648984] Fps is (10 sec: 42598.5, 60 sec: 45329.0, 300 sec: 44320.1). Total num frames: 1033338880. Throughput: 0: 11036.5. Samples: 258388480. Policy #0 lag: (min: 79.0, avg: 186.9, max: 349.0) [2024-06-15 18:08:55,738][1648984] Avg episode reward: [(0, '-0.200')] [2024-06-15 18:08:56,385][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000504592_1033404416.pth... [2024-06-15 18:08:56,510][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000499392_1022754816.pth [2024-06-15 18:08:56,994][1652475] Updated weights for policy 0, policy_version 504609 (0.0148) [2024-06-15 18:09:00,738][1648984] Fps is (10 sec: 52429.1, 60 sec: 42598.4, 300 sec: 44209.0). 
Total num frames: 1033502720. Throughput: 0: 10820.3. Samples: 258444288. Policy #0 lag: (min: 79.0, avg: 186.9, max: 349.0) [2024-06-15 18:09:00,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 18:09:04,036][1652475] Updated weights for policy 0, policy_version 504672 (0.0038) [2024-06-15 18:09:05,093][1652475] Updated weights for policy 0, policy_version 504710 (0.0014) [2024-06-15 18:09:05,738][1648984] Fps is (10 sec: 36044.9, 60 sec: 44782.9, 300 sec: 43764.7). Total num frames: 1033699328. Throughput: 0: 10797.5. Samples: 258482176. Policy #0 lag: (min: 79.0, avg: 186.9, max: 349.0) [2024-06-15 18:09:05,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 18:09:06,174][1652475] Updated weights for policy 0, policy_version 504766 (0.0014) [2024-06-15 18:09:08,483][1652475] Updated weights for policy 0, policy_version 504816 (0.0012) [2024-06-15 18:09:10,428][1652475] Updated weights for policy 0, policy_version 504886 (0.0012) [2024-06-15 18:09:10,738][1648984] Fps is (10 sec: 52428.5, 60 sec: 43690.8, 300 sec: 44431.2). Total num frames: 1034027008. Throughput: 0: 10820.3. Samples: 258544128. Policy #0 lag: (min: 79.0, avg: 186.9, max: 349.0) [2024-06-15 18:09:10,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 18:09:15,738][1648984] Fps is (10 sec: 39322.1, 60 sec: 42598.5, 300 sec: 43764.7). Total num frames: 1034092544. Throughput: 0: 10638.3. Samples: 258607616. Policy #0 lag: (min: 79.0, avg: 186.9, max: 349.0) [2024-06-15 18:09:15,738][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 18:09:15,826][1652475] Updated weights for policy 0, policy_version 504946 (0.0016) [2024-06-15 18:09:17,273][1652475] Updated weights for policy 0, policy_version 505008 (0.0014) [2024-06-15 18:09:20,738][1648984] Fps is (10 sec: 26214.1, 60 sec: 42052.1, 300 sec: 43986.9). Total num frames: 1034289152. Throughput: 0: 10763.4. Samples: 258635776. 
Policy #0 lag: (min: 79.0, avg: 186.9, max: 349.0) [2024-06-15 18:09:20,739][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 18:09:21,218][1652475] Updated weights for policy 0, policy_version 505043 (0.0014) [2024-06-15 18:09:22,058][1652475] Updated weights for policy 0, policy_version 505087 (0.0013) [2024-06-15 18:09:24,061][1652475] Updated weights for policy 0, policy_version 505152 (0.0029) [2024-06-15 18:09:25,738][1648984] Fps is (10 sec: 45874.4, 60 sec: 41506.2, 300 sec: 44320.1). Total num frames: 1034551296. Throughput: 0: 10662.1. Samples: 258700288. Policy #0 lag: (min: 79.0, avg: 186.9, max: 349.0) [2024-06-15 18:09:25,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 18:09:27,195][1652475] Updated weights for policy 0, policy_version 505216 (0.0101) [2024-06-15 18:09:29,057][1652475] Updated weights for policy 0, policy_version 505279 (0.0011) [2024-06-15 18:09:30,738][1648984] Fps is (10 sec: 52429.7, 60 sec: 44236.8, 300 sec: 43986.9). Total num frames: 1034813440. Throughput: 0: 10752.0. Samples: 258766848. Policy #0 lag: (min: 79.0, avg: 186.9, max: 349.0) [2024-06-15 18:09:30,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 18:09:34,704][1652475] Updated weights for policy 0, policy_version 505341 (0.0011) [2024-06-15 18:09:35,738][1648984] Fps is (10 sec: 42598.9, 60 sec: 42052.3, 300 sec: 44098.0). Total num frames: 1034977280. Throughput: 0: 10854.4. Samples: 258804224. Policy #0 lag: (min: 79.0, avg: 186.9, max: 349.0) [2024-06-15 18:09:35,740][1648984] Avg episode reward: [(0, '-0.230')] [2024-06-15 18:09:38,486][1652475] Updated weights for policy 0, policy_version 505424 (0.0032) [2024-06-15 18:09:40,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 43694.2, 300 sec: 43875.8). Total num frames: 1035206656. Throughput: 0: 10456.2. Samples: 258859008. 
Policy #0 lag: (min: 15.0, avg: 125.2, max: 271.0) [2024-06-15 18:09:40,738][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 18:09:41,092][1652475] Updated weights for policy 0, policy_version 505482 (0.0012) [2024-06-15 18:09:44,912][1651340] Signal inference workers to stop experience collection... (25950 times) [2024-06-15 18:09:44,959][1652475] InferenceWorker_p0-w0: stopping experience collection (25950 times) [2024-06-15 18:09:45,151][1651340] Signal inference workers to resume experience collection... (25950 times) [2024-06-15 18:09:45,152][1652475] InferenceWorker_p0-w0: resuming experience collection (25950 times) [2024-06-15 18:09:45,351][1652475] Updated weights for policy 0, policy_version 505555 (0.0122) [2024-06-15 18:09:45,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 41506.2, 300 sec: 44209.0). Total num frames: 1035403264. Throughput: 0: 10717.9. Samples: 258926592. Policy #0 lag: (min: 15.0, avg: 125.2, max: 271.0) [2024-06-15 18:09:45,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 18:09:46,251][1652475] Updated weights for policy 0, policy_version 505599 (0.0034) [2024-06-15 18:09:47,882][1652475] Updated weights for policy 0, policy_version 505658 (0.0013) [2024-06-15 18:09:50,738][1648984] Fps is (10 sec: 45875.2, 60 sec: 44783.0, 300 sec: 43991.2). Total num frames: 1035665408. Throughput: 0: 10604.1. Samples: 258959360. Policy #0 lag: (min: 15.0, avg: 125.2, max: 271.0) [2024-06-15 18:09:50,738][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 18:09:50,944][1652475] Updated weights for policy 0, policy_version 505714 (0.0015) [2024-06-15 18:09:51,245][1652475] Updated weights for policy 0, policy_version 505728 (0.0012) [2024-06-15 18:09:55,740][1648984] Fps is (10 sec: 36036.1, 60 sec: 40412.3, 300 sec: 43653.3). Total num frames: 1035763712. Throughput: 0: 10751.4. Samples: 259027968. 
Policy #0 lag: (min: 15.0, avg: 125.2, max: 271.0) [2024-06-15 18:09:55,741][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 18:09:56,459][1652475] Updated weights for policy 0, policy_version 505792 (0.0014) [2024-06-15 18:09:58,590][1652475] Updated weights for policy 0, policy_version 505859 (0.0014) [2024-06-15 18:09:59,733][1652475] Updated weights for policy 0, policy_version 505916 (0.0014) [2024-06-15 18:10:00,738][1648984] Fps is (10 sec: 45874.4, 60 sec: 43690.5, 300 sec: 44431.2). Total num frames: 1036124160. Throughput: 0: 10683.7. Samples: 259088384. Policy #0 lag: (min: 15.0, avg: 125.2, max: 271.0) [2024-06-15 18:10:00,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 18:10:05,738][1648984] Fps is (10 sec: 49163.7, 60 sec: 42598.4, 300 sec: 43542.6). Total num frames: 1036255232. Throughput: 0: 10797.5. Samples: 259121664. Policy #0 lag: (min: 15.0, avg: 125.2, max: 271.0) [2024-06-15 18:10:05,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 18:10:07,478][1652475] Updated weights for policy 0, policy_version 505987 (0.0014) [2024-06-15 18:10:08,973][1652475] Updated weights for policy 0, policy_version 506051 (0.0096) [2024-06-15 18:10:10,567][1652475] Updated weights for policy 0, policy_version 506114 (0.0012) [2024-06-15 18:10:10,738][1648984] Fps is (10 sec: 39322.3, 60 sec: 41506.2, 300 sec: 44320.1). Total num frames: 1036517376. Throughput: 0: 10899.9. Samples: 259190784. Policy #0 lag: (min: 15.0, avg: 125.2, max: 271.0) [2024-06-15 18:10:10,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 18:10:11,865][1652475] Updated weights for policy 0, policy_version 506176 (0.0027) [2024-06-15 18:10:14,581][1652475] Updated weights for policy 0, policy_version 506231 (0.0013) [2024-06-15 18:10:15,738][1648984] Fps is (10 sec: 52428.5, 60 sec: 44782.8, 300 sec: 44097.9). Total num frames: 1036779520. Throughput: 0: 10854.4. Samples: 259255296. 
Policy #0 lag: (min: 15.0, avg: 125.2, max: 271.0) [2024-06-15 18:10:15,739][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 18:10:19,874][1652475] Updated weights for policy 0, policy_version 506273 (0.0013) [2024-06-15 18:10:20,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 43690.8, 300 sec: 43986.9). Total num frames: 1036910592. Throughput: 0: 10899.9. Samples: 259294720. Policy #0 lag: (min: 15.0, avg: 125.2, max: 271.0) [2024-06-15 18:10:20,740][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 18:10:21,498][1652475] Updated weights for policy 0, policy_version 506339 (0.0013) [2024-06-15 18:10:22,533][1652475] Updated weights for policy 0, policy_version 506384 (0.0014) [2024-06-15 18:10:23,493][1652475] Updated weights for policy 0, policy_version 506432 (0.0012) [2024-06-15 18:10:25,739][1648984] Fps is (10 sec: 45871.0, 60 sec: 44782.2, 300 sec: 44208.9). Total num frames: 1037238272. Throughput: 0: 11070.3. Samples: 259357184. Policy #0 lag: (min: 15.0, avg: 125.2, max: 271.0) [2024-06-15 18:10:25,739][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 18:10:26,163][1652475] Updated weights for policy 0, policy_version 506489 (0.0014) [2024-06-15 18:10:30,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 41506.1, 300 sec: 43544.2). Total num frames: 1037303808. Throughput: 0: 11195.7. Samples: 259430400. Policy #0 lag: (min: 15.0, avg: 125.2, max: 271.0) [2024-06-15 18:10:30,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 18:10:31,056][1651340] Signal inference workers to stop experience collection... (26000 times) [2024-06-15 18:10:31,094][1652475] InferenceWorker_p0-w0: stopping experience collection (26000 times) [2024-06-15 18:10:31,286][1651340] Signal inference workers to resume experience collection... 
(26000 times) [2024-06-15 18:10:31,288][1652475] InferenceWorker_p0-w0: resuming experience collection (26000 times) [2024-06-15 18:10:31,437][1652475] Updated weights for policy 0, policy_version 506530 (0.0014) [2024-06-15 18:10:33,197][1652475] Updated weights for policy 0, policy_version 506618 (0.0014) [2024-06-15 18:10:35,003][1652475] Updated weights for policy 0, policy_version 506682 (0.0128) [2024-06-15 18:10:35,738][1648984] Fps is (10 sec: 45879.7, 60 sec: 45329.0, 300 sec: 44209.0). Total num frames: 1037697024. Throughput: 0: 11070.6. Samples: 259457536. Policy #0 lag: (min: 15.0, avg: 125.2, max: 271.0) [2024-06-15 18:10:35,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 18:10:38,007][1652475] Updated weights for policy 0, policy_version 506747 (0.0014) [2024-06-15 18:10:40,738][1648984] Fps is (10 sec: 52428.5, 60 sec: 43690.6, 300 sec: 43875.8). Total num frames: 1037828096. Throughput: 0: 11139.4. Samples: 259529216. Policy #0 lag: (min: 15.0, avg: 125.2, max: 271.0) [2024-06-15 18:10:40,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 18:10:42,863][1652475] Updated weights for policy 0, policy_version 506800 (0.0013) [2024-06-15 18:10:44,509][1652475] Updated weights for policy 0, policy_version 506878 (0.0013) [2024-06-15 18:10:45,738][1648984] Fps is (10 sec: 42597.9, 60 sec: 45329.0, 300 sec: 44097.9). Total num frames: 1038123008. Throughput: 0: 11229.9. Samples: 259593728. Policy #0 lag: (min: 15.0, avg: 125.2, max: 271.0) [2024-06-15 18:10:45,739][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 18:10:46,750][1652475] Updated weights for policy 0, policy_version 506944 (0.0012) [2024-06-15 18:10:50,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 44782.9, 300 sec: 44320.1). Total num frames: 1038352384. Throughput: 0: 11184.3. Samples: 259624960. 
Policy #0 lag: (min: 15.0, avg: 125.2, max: 271.0) [2024-06-15 18:10:50,739][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 18:10:53,641][1652475] Updated weights for policy 0, policy_version 507011 (0.0013) [2024-06-15 18:10:54,736][1652475] Updated weights for policy 0, policy_version 507072 (0.0013) [2024-06-15 18:10:55,738][1648984] Fps is (10 sec: 42598.8, 60 sec: 46423.2, 300 sec: 43986.9). Total num frames: 1038548992. Throughput: 0: 11195.7. Samples: 259694592. Policy #0 lag: (min: 15.0, avg: 125.2, max: 271.0) [2024-06-15 18:10:55,738][1648984] Avg episode reward: [(0, '-0.250')] [2024-06-15 18:10:56,068][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000507136_1038614528.pth... [2024-06-15 18:10:56,082][1652475] Updated weights for policy 0, policy_version 507136 (0.0012) [2024-06-15 18:10:56,107][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000502000_1028096000.pth [2024-06-15 18:10:56,111][1651340] Saving a milestone train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/milestones/checkpoint_000507136_1038614528.pth [2024-06-15 18:10:58,366][1652475] Updated weights for policy 0, policy_version 507195 (0.0014) [2024-06-15 18:11:00,738][1648984] Fps is (10 sec: 39322.0, 60 sec: 43690.8, 300 sec: 43986.9). Total num frames: 1038745600. Throughput: 0: 11093.4. Samples: 259754496. Policy #0 lag: (min: 15.0, avg: 125.2, max: 271.0) [2024-06-15 18:11:00,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 18:11:03,767][1652475] Updated weights for policy 0, policy_version 507261 (0.0019) [2024-06-15 18:11:05,738][1648984] Fps is (10 sec: 32767.3, 60 sec: 43690.5, 300 sec: 43542.5). Total num frames: 1038876672. Throughput: 0: 10899.8. Samples: 259785216. 
Policy #0 lag: (min: 15.0, avg: 125.2, max: 271.0) [2024-06-15 18:11:05,739][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 18:11:06,751][1652475] Updated weights for policy 0, policy_version 507326 (0.0014) [2024-06-15 18:11:08,546][1652475] Updated weights for policy 0, policy_version 507384 (0.0026) [2024-06-15 18:11:10,702][1652475] Updated weights for policy 0, policy_version 507455 (0.0077) [2024-06-15 18:11:10,738][1648984] Fps is (10 sec: 52428.3, 60 sec: 45875.1, 300 sec: 44209.0). Total num frames: 1039269888. Throughput: 0: 10888.8. Samples: 259847168. Policy #0 lag: (min: 15.0, avg: 99.4, max: 271.0) [2024-06-15 18:11:10,739][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 18:11:15,738][1648984] Fps is (10 sec: 45876.4, 60 sec: 42598.5, 300 sec: 43542.6). Total num frames: 1039335424. Throughput: 0: 10854.4. Samples: 259918848. Policy #0 lag: (min: 15.0, avg: 99.4, max: 271.0) [2024-06-15 18:11:15,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 18:11:16,096][1652475] Updated weights for policy 0, policy_version 507517 (0.0020) [2024-06-15 18:11:17,393][1651340] Signal inference workers to stop experience collection... (26050 times) [2024-06-15 18:11:17,494][1652475] InferenceWorker_p0-w0: stopping experience collection (26050 times) [2024-06-15 18:11:17,703][1651340] Signal inference workers to resume experience collection... (26050 times) [2024-06-15 18:11:17,703][1652475] InferenceWorker_p0-w0: resuming experience collection (26050 times) [2024-06-15 18:11:18,752][1652475] Updated weights for policy 0, policy_version 507577 (0.0064) [2024-06-15 18:11:20,738][1648984] Fps is (10 sec: 32767.6, 60 sec: 44782.8, 300 sec: 43764.7). Total num frames: 1039597568. Throughput: 0: 10979.5. Samples: 259951616. 
Policy #0 lag: (min: 15.0, avg: 99.4, max: 271.0) [2024-06-15 18:11:20,739][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 18:11:21,154][1652475] Updated weights for policy 0, policy_version 507648 (0.0084) [2024-06-15 18:11:22,980][1652475] Updated weights for policy 0, policy_version 507707 (0.0090) [2024-06-15 18:11:25,738][1648984] Fps is (10 sec: 45874.8, 60 sec: 42599.0, 300 sec: 43653.6). Total num frames: 1039794176. Throughput: 0: 10683.7. Samples: 260009984. Policy #0 lag: (min: 15.0, avg: 99.4, max: 271.0) [2024-06-15 18:11:25,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 18:11:26,889][1652475] Updated weights for policy 0, policy_version 507774 (0.0014) [2024-06-15 18:11:30,243][1652475] Updated weights for policy 0, policy_version 507835 (0.0035) [2024-06-15 18:11:30,738][1648984] Fps is (10 sec: 45875.9, 60 sec: 45875.2, 300 sec: 43542.6). Total num frames: 1040056320. Throughput: 0: 10683.8. Samples: 260074496. Policy #0 lag: (min: 15.0, avg: 99.4, max: 271.0) [2024-06-15 18:11:30,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 18:11:35,297][1652475] Updated weights for policy 0, policy_version 507891 (0.0092) [2024-06-15 18:11:35,738][1648984] Fps is (10 sec: 39321.9, 60 sec: 41506.1, 300 sec: 43764.7). Total num frames: 1040187392. Throughput: 0: 10752.0. Samples: 260108800. Policy #0 lag: (min: 15.0, avg: 99.4, max: 271.0) [2024-06-15 18:11:35,740][1648984] Avg episode reward: [(0, '-0.200')] [2024-06-15 18:11:37,069][1652475] Updated weights for policy 0, policy_version 507968 (0.0012) [2024-06-15 18:11:38,240][1652475] Updated weights for policy 0, policy_version 508032 (0.0013) [2024-06-15 18:11:40,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 1040449536. Throughput: 0: 10490.3. Samples: 260166656. 
Policy #0 lag: (min: 15.0, avg: 99.4, max: 271.0) [2024-06-15 18:11:40,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 18:11:41,982][1652475] Updated weights for policy 0, policy_version 508093 (0.0016) [2024-06-15 18:11:45,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 40960.1, 300 sec: 43542.6). Total num frames: 1040580608. Throughput: 0: 10683.7. Samples: 260235264. Policy #0 lag: (min: 15.0, avg: 99.4, max: 271.0) [2024-06-15 18:11:45,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 18:11:48,733][1652475] Updated weights for policy 0, policy_version 508148 (0.0012) [2024-06-15 18:11:50,554][1652475] Updated weights for policy 0, policy_version 508240 (0.0082) [2024-06-15 18:11:50,738][1648984] Fps is (10 sec: 42598.5, 60 sec: 42052.3, 300 sec: 43542.6). Total num frames: 1040875520. Throughput: 0: 10808.9. Samples: 260271616. Policy #0 lag: (min: 15.0, avg: 99.4, max: 271.0) [2024-06-15 18:11:50,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 18:11:52,722][1652475] Updated weights for policy 0, policy_version 508291 (0.0025) [2024-06-15 18:11:53,984][1652475] Updated weights for policy 0, policy_version 508347 (0.0012) [2024-06-15 18:11:55,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 42598.4, 300 sec: 43542.6). Total num frames: 1041104896. Throughput: 0: 10672.4. Samples: 260327424. Policy #0 lag: (min: 15.0, avg: 99.4, max: 271.0) [2024-06-15 18:11:55,741][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 18:12:00,178][1652475] Updated weights for policy 0, policy_version 508432 (0.0013) [2024-06-15 18:12:00,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 42598.4, 300 sec: 43320.4). Total num frames: 1041301504. Throughput: 0: 10638.2. Samples: 260397568. Policy #0 lag: (min: 15.0, avg: 99.4, max: 271.0) [2024-06-15 18:12:00,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 18:12:00,979][1651340] Signal inference workers to stop experience collection... 
(26100 times) [2024-06-15 18:12:01,010][1652475] InferenceWorker_p0-w0: stopping experience collection (26100 times) [2024-06-15 18:12:01,238][1651340] Signal inference workers to resume experience collection... (26100 times) [2024-06-15 18:12:01,238][1652475] InferenceWorker_p0-w0: resuming experience collection (26100 times) [2024-06-15 18:12:01,826][1652475] Updated weights for policy 0, policy_version 508501 (0.0013) [2024-06-15 18:12:05,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 43690.9, 300 sec: 43542.6). Total num frames: 1041498112. Throughput: 0: 10535.9. Samples: 260425728. Policy #0 lag: (min: 15.0, avg: 99.4, max: 271.0) [2024-06-15 18:12:05,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 18:12:07,678][1652475] Updated weights for policy 0, policy_version 508576 (0.0013) [2024-06-15 18:12:10,738][1648984] Fps is (10 sec: 36045.0, 60 sec: 39867.8, 300 sec: 43209.3). Total num frames: 1041661952. Throughput: 0: 10797.5. Samples: 260495872. Policy #0 lag: (min: 15.0, avg: 99.4, max: 271.0) [2024-06-15 18:12:10,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 18:12:10,973][1652475] Updated weights for policy 0, policy_version 508642 (0.0014) [2024-06-15 18:12:12,161][1652475] Updated weights for policy 0, policy_version 508704 (0.0014) [2024-06-15 18:12:14,137][1652475] Updated weights for policy 0, policy_version 508787 (0.0085) [2024-06-15 18:12:15,740][1648984] Fps is (10 sec: 52416.6, 60 sec: 44781.2, 300 sec: 43986.5). Total num frames: 1042022400. Throughput: 0: 10728.7. Samples: 260557312. Policy #0 lag: (min: 15.0, avg: 99.4, max: 271.0) [2024-06-15 18:12:15,742][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 18:12:20,263][1652475] Updated weights for policy 0, policy_version 508832 (0.0014) [2024-06-15 18:12:20,738][1648984] Fps is (10 sec: 45875.3, 60 sec: 42052.4, 300 sec: 43098.3). Total num frames: 1042120704. Throughput: 0: 10865.8. Samples: 260597760. 
Policy #0 lag: (min: 15.0, avg: 99.4, max: 271.0) [2024-06-15 18:12:20,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 18:12:22,380][1652475] Updated weights for policy 0, policy_version 508896 (0.0014) [2024-06-15 18:12:24,317][1652475] Updated weights for policy 0, policy_version 508976 (0.0137) [2024-06-15 18:12:25,738][1648984] Fps is (10 sec: 45885.6, 60 sec: 44783.0, 300 sec: 43986.9). Total num frames: 1042481152. Throughput: 0: 10911.3. Samples: 260657664. Policy #0 lag: (min: 15.0, avg: 99.4, max: 271.0) [2024-06-15 18:12:25,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 18:12:26,085][1652475] Updated weights for policy 0, policy_version 509053 (0.0013) [2024-06-15 18:12:30,738][1648984] Fps is (10 sec: 42598.1, 60 sec: 41506.1, 300 sec: 43098.2). Total num frames: 1042546688. Throughput: 0: 11013.7. Samples: 260730880. Policy #0 lag: (min: 15.0, avg: 99.4, max: 271.0) [2024-06-15 18:12:30,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 18:12:32,891][1652475] Updated weights for policy 0, policy_version 509120 (0.0015) [2024-06-15 18:12:35,738][1648984] Fps is (10 sec: 36044.6, 60 sec: 44236.7, 300 sec: 43653.6). Total num frames: 1042841600. Throughput: 0: 10922.6. Samples: 260763136. Policy #0 lag: (min: 10.0, avg: 84.3, max: 233.0) [2024-06-15 18:12:35,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 18:12:35,769][1652475] Updated weights for policy 0, policy_version 509206 (0.0116) [2024-06-15 18:12:37,551][1652475] Updated weights for policy 0, policy_version 509284 (0.0013) [2024-06-15 18:12:40,740][1648984] Fps is (10 sec: 52428.6, 60 sec: 43690.6, 300 sec: 43542.6). Total num frames: 1043070976. Throughput: 0: 10979.5. Samples: 260821504. 
Policy #0 lag: (min: 10.0, avg: 84.3, max: 233.0) [2024-06-15 18:12:40,741][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 18:12:43,776][1652475] Updated weights for policy 0, policy_version 509332 (0.0013) [2024-06-15 18:12:45,039][1652475] Updated weights for policy 0, policy_version 509378 (0.0014) [2024-06-15 18:12:45,494][1651340] Signal inference workers to stop experience collection... (26150 times) [2024-06-15 18:12:45,528][1652475] InferenceWorker_p0-w0: stopping experience collection (26150 times) [2024-06-15 18:12:45,694][1651340] Signal inference workers to resume experience collection... (26150 times) [2024-06-15 18:12:45,695][1652475] InferenceWorker_p0-w0: resuming experience collection (26150 times) [2024-06-15 18:12:45,738][1648984] Fps is (10 sec: 42598.5, 60 sec: 44782.9, 300 sec: 43320.4). Total num frames: 1043267584. Throughput: 0: 11218.5. Samples: 260902400. Policy #0 lag: (min: 10.0, avg: 84.3, max: 233.0) [2024-06-15 18:12:45,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 18:12:46,593][1652475] Updated weights for policy 0, policy_version 509442 (0.0075) [2024-06-15 18:12:49,241][1652475] Updated weights for policy 0, policy_version 509560 (0.0109) [2024-06-15 18:12:50,738][1648984] Fps is (10 sec: 52429.3, 60 sec: 45329.1, 300 sec: 43986.9). Total num frames: 1043595264. Throughput: 0: 11013.7. Samples: 260921344. Policy #0 lag: (min: 10.0, avg: 84.3, max: 233.0) [2024-06-15 18:12:50,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 18:12:55,738][1648984] Fps is (10 sec: 32767.7, 60 sec: 41506.0, 300 sec: 42876.1). Total num frames: 1043595264. Throughput: 0: 11241.2. Samples: 261001728. Policy #0 lag: (min: 10.0, avg: 84.3, max: 233.0) [2024-06-15 18:12:55,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 18:12:56,175][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000509600_1043660800.pth... 
[2024-06-15 18:12:56,296][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000504592_1033404416.pth [2024-06-15 18:12:56,890][1652475] Updated weights for policy 0, policy_version 509623 (0.0016) [2024-06-15 18:12:58,637][1652475] Updated weights for policy 0, policy_version 509696 (0.0079) [2024-06-15 18:13:00,738][1648984] Fps is (10 sec: 42597.8, 60 sec: 45329.0, 300 sec: 44097.9). Total num frames: 1044021248. Throughput: 0: 11014.2. Samples: 261052928. Policy #0 lag: (min: 10.0, avg: 84.3, max: 233.0) [2024-06-15 18:13:00,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 18:13:01,271][1652475] Updated weights for policy 0, policy_version 509808 (0.0013) [2024-06-15 18:13:05,738][1648984] Fps is (10 sec: 52429.1, 60 sec: 43690.6, 300 sec: 43098.3). Total num frames: 1044119552. Throughput: 0: 10934.0. Samples: 261089792. Policy #0 lag: (min: 10.0, avg: 84.3, max: 233.0) [2024-06-15 18:13:05,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 18:13:07,674][1652475] Updated weights for policy 0, policy_version 509829 (0.0012) [2024-06-15 18:13:10,170][1652475] Updated weights for policy 0, policy_version 509944 (0.0069) [2024-06-15 18:13:10,738][1648984] Fps is (10 sec: 36044.3, 60 sec: 45328.9, 300 sec: 43542.5). Total num frames: 1044381696. Throughput: 0: 11127.4. Samples: 261158400. Policy #0 lag: (min: 10.0, avg: 84.3, max: 233.0) [2024-06-15 18:13:10,739][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 18:13:13,205][1652475] Updated weights for policy 0, policy_version 510032 (0.0016) [2024-06-15 18:13:15,745][1648984] Fps is (10 sec: 52391.2, 60 sec: 43687.1, 300 sec: 43652.6). Total num frames: 1044643840. Throughput: 0: 10682.0. Samples: 261211648. 
Policy #0 lag: (min: 10.0, avg: 84.3, max: 233.0) [2024-06-15 18:13:15,745][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 18:13:20,397][1652475] Updated weights for policy 0, policy_version 510102 (0.0087) [2024-06-15 18:13:20,738][1648984] Fps is (10 sec: 32769.0, 60 sec: 43144.6, 300 sec: 42876.1). Total num frames: 1044709376. Throughput: 0: 10843.1. Samples: 261251072. Policy #0 lag: (min: 10.0, avg: 84.3, max: 233.0) [2024-06-15 18:13:20,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 18:13:22,385][1652475] Updated weights for policy 0, policy_version 510192 (0.0015) [2024-06-15 18:13:24,937][1652475] Updated weights for policy 0, policy_version 510265 (0.0022) [2024-06-15 18:13:25,738][1648984] Fps is (10 sec: 39350.0, 60 sec: 42598.4, 300 sec: 43653.6). Total num frames: 1045037056. Throughput: 0: 10797.5. Samples: 261307392. Policy #0 lag: (min: 10.0, avg: 84.3, max: 233.0) [2024-06-15 18:13:25,740][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 18:13:26,190][1651340] Signal inference workers to stop experience collection... (26200 times) [2024-06-15 18:13:26,235][1652475] InferenceWorker_p0-w0: stopping experience collection (26200 times) [2024-06-15 18:13:26,410][1651340] Signal inference workers to resume experience collection... (26200 times) [2024-06-15 18:13:26,422][1652475] InferenceWorker_p0-w0: resuming experience collection (26200 times) [2024-06-15 18:13:26,927][1652475] Updated weights for policy 0, policy_version 510330 (0.0011) [2024-06-15 18:13:30,738][1648984] Fps is (10 sec: 45874.7, 60 sec: 43690.7, 300 sec: 43098.3). Total num frames: 1045168128. Throughput: 0: 10626.8. Samples: 261380608. 
Policy #0 lag: (min: 10.0, avg: 84.3, max: 233.0) [2024-06-15 18:13:30,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 18:13:33,012][1652475] Updated weights for policy 0, policy_version 510397 (0.0012) [2024-06-15 18:13:34,491][1652475] Updated weights for policy 0, policy_version 510459 (0.0139) [2024-06-15 18:13:35,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 43144.6, 300 sec: 43543.3). Total num frames: 1045430272. Throughput: 0: 10899.9. Samples: 261411840. Policy #0 lag: (min: 10.0, avg: 84.3, max: 233.0) [2024-06-15 18:13:35,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 18:13:36,783][1652475] Updated weights for policy 0, policy_version 510519 (0.0015) [2024-06-15 18:13:40,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 43690.7, 300 sec: 43320.4). Total num frames: 1045692416. Throughput: 0: 10444.8. Samples: 261471744. Policy #0 lag: (min: 10.0, avg: 84.3, max: 233.0) [2024-06-15 18:13:40,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 18:13:44,624][1652475] Updated weights for policy 0, policy_version 510608 (0.0012) [2024-06-15 18:13:45,739][1648984] Fps is (10 sec: 36043.9, 60 sec: 42052.1, 300 sec: 43431.5). Total num frames: 1045790720. Throughput: 0: 10865.7. Samples: 261541888. Policy #0 lag: (min: 10.0, avg: 84.3, max: 233.0) [2024-06-15 18:13:45,741][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 18:13:46,483][1652475] Updated weights for policy 0, policy_version 510674 (0.0027) [2024-06-15 18:13:48,561][1652475] Updated weights for policy 0, policy_version 510736 (0.0014) [2024-06-15 18:13:49,561][1652475] Updated weights for policy 0, policy_version 510779 (0.0012) [2024-06-15 18:13:50,738][1648984] Fps is (10 sec: 45875.3, 60 sec: 42598.4, 300 sec: 43431.5). Total num frames: 1046151168. Throughput: 0: 10626.9. Samples: 261568000. 
Policy #0 lag: (min: 10.0, avg: 84.3, max: 233.0) [2024-06-15 18:13:50,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 18:13:51,218][1652475] Updated weights for policy 0, policy_version 510841 (0.0013) [2024-06-15 18:13:55,738][1648984] Fps is (10 sec: 42599.1, 60 sec: 43690.7, 300 sec: 43098.2). Total num frames: 1046216704. Throughput: 0: 10535.9. Samples: 261632512. Policy #0 lag: (min: 10.0, avg: 84.3, max: 233.0) [2024-06-15 18:13:55,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 18:13:56,948][1652475] Updated weights for policy 0, policy_version 510896 (0.0017) [2024-06-15 18:14:00,476][1652475] Updated weights for policy 0, policy_version 510960 (0.0013) [2024-06-15 18:14:00,738][1648984] Fps is (10 sec: 29491.0, 60 sec: 40413.9, 300 sec: 43209.3). Total num frames: 1046446080. Throughput: 0: 10685.4. Samples: 261692416. Policy #0 lag: (min: 10.0, avg: 84.3, max: 233.0) [2024-06-15 18:14:00,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 18:14:02,004][1652475] Updated weights for policy 0, policy_version 511024 (0.0013) [2024-06-15 18:14:04,434][1652475] Updated weights for policy 0, policy_version 511058 (0.0014) [2024-06-15 18:14:05,738][1648984] Fps is (10 sec: 52429.1, 60 sec: 43690.7, 300 sec: 43098.3). Total num frames: 1046740992. Throughput: 0: 10433.4. Samples: 261720576. Policy #0 lag: (min: 111.0, avg: 206.5, max: 399.0) [2024-06-15 18:14:05,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 18:14:08,157][1652475] Updated weights for policy 0, policy_version 511136 (0.0013) [2024-06-15 18:14:10,738][1648984] Fps is (10 sec: 42598.9, 60 sec: 41506.3, 300 sec: 43320.4). Total num frames: 1046872064. Throughput: 0: 10626.9. Samples: 261785600. 
Policy #0 lag: (min: 111.0, avg: 206.5, max: 399.0) [2024-06-15 18:14:10,738][1648984] Avg episode reward: [(0, '-0.250')] [2024-06-15 18:14:11,994][1652475] Updated weights for policy 0, policy_version 511203 (0.0014) [2024-06-15 18:14:13,238][1651340] Signal inference workers to stop experience collection... (26250 times) [2024-06-15 18:14:13,298][1652475] InferenceWorker_p0-w0: stopping experience collection (26250 times) [2024-06-15 18:14:13,592][1651340] Signal inference workers to resume experience collection... (26250 times) [2024-06-15 18:14:13,593][1652475] InferenceWorker_p0-w0: resuming experience collection (26250 times) [2024-06-15 18:14:14,190][1652475] Updated weights for policy 0, policy_version 511289 (0.0124) [2024-06-15 18:14:15,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 41511.1, 300 sec: 43542.6). Total num frames: 1047134208. Throughput: 0: 10376.5. Samples: 261847552. Policy #0 lag: (min: 111.0, avg: 206.5, max: 399.0) [2024-06-15 18:14:15,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 18:14:17,296][1652475] Updated weights for policy 0, policy_version 511354 (0.0014) [2024-06-15 18:14:20,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 42598.4, 300 sec: 43098.3). Total num frames: 1047265280. Throughput: 0: 10456.2. Samples: 261882368. Policy #0 lag: (min: 111.0, avg: 206.5, max: 399.0) [2024-06-15 18:14:20,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 18:14:23,225][1652475] Updated weights for policy 0, policy_version 511411 (0.0013) [2024-06-15 18:14:24,675][1652475] Updated weights for policy 0, policy_version 511472 (0.0021) [2024-06-15 18:14:25,738][1648984] Fps is (10 sec: 45875.5, 60 sec: 42598.4, 300 sec: 43320.4). Total num frames: 1047592960. Throughput: 0: 10513.1. Samples: 261944832. 
Policy #0 lag: (min: 111.0, avg: 206.5, max: 399.0) [2024-06-15 18:14:25,738][1648984] Avg episode reward: [(0, '-0.230')] [2024-06-15 18:14:26,256][1652475] Updated weights for policy 0, policy_version 511550 (0.0011) [2024-06-15 18:14:29,409][1652475] Updated weights for policy 0, policy_version 511605 (0.0017) [2024-06-15 18:14:30,739][1648984] Fps is (10 sec: 52427.9, 60 sec: 43690.6, 300 sec: 43431.5). Total num frames: 1047789568. Throughput: 0: 10353.8. Samples: 262007808. Policy #0 lag: (min: 111.0, avg: 206.5, max: 399.0) [2024-06-15 18:14:30,740][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 18:14:34,930][1652475] Updated weights for policy 0, policy_version 511648 (0.0013) [2024-06-15 18:14:35,738][1648984] Fps is (10 sec: 32767.9, 60 sec: 41506.1, 300 sec: 43098.2). Total num frames: 1047920640. Throughput: 0: 10649.6. Samples: 262047232. Policy #0 lag: (min: 111.0, avg: 206.5, max: 399.0) [2024-06-15 18:14:35,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 18:14:36,737][1652475] Updated weights for policy 0, policy_version 511728 (0.0013) [2024-06-15 18:14:38,221][1652475] Updated weights for policy 0, policy_version 511805 (0.0014) [2024-06-15 18:14:40,738][1648984] Fps is (10 sec: 42599.2, 60 sec: 42052.3, 300 sec: 43431.5). Total num frames: 1048215552. Throughput: 0: 10513.1. Samples: 262105600. Policy #0 lag: (min: 111.0, avg: 206.5, max: 399.0) [2024-06-15 18:14:40,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 18:14:41,321][1652475] Updated weights for policy 0, policy_version 511864 (0.0011) [2024-06-15 18:14:45,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 42052.5, 300 sec: 42876.1). Total num frames: 1048313856. Throughput: 0: 10763.4. Samples: 262176768. 
Policy #0 lag: (min: 111.0, avg: 206.5, max: 399.0) [2024-06-15 18:14:45,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 18:14:47,029][1652475] Updated weights for policy 0, policy_version 511931 (0.0013) [2024-06-15 18:14:48,469][1652475] Updated weights for policy 0, policy_version 511998 (0.0013) [2024-06-15 18:14:50,104][1652475] Updated weights for policy 0, policy_version 512048 (0.0013) [2024-06-15 18:14:50,738][1648984] Fps is (10 sec: 49151.5, 60 sec: 42598.4, 300 sec: 43876.1). Total num frames: 1048707072. Throughput: 0: 10763.4. Samples: 262204928. Policy #0 lag: (min: 111.0, avg: 206.5, max: 399.0) [2024-06-15 18:14:50,739][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 18:14:51,783][1652475] Updated weights for policy 0, policy_version 512080 (0.0012) [2024-06-15 18:14:55,738][1648984] Fps is (10 sec: 52428.3, 60 sec: 43690.7, 300 sec: 43098.3). Total num frames: 1048838144. Throughput: 0: 10956.8. Samples: 262278656. Policy #0 lag: (min: 111.0, avg: 206.5, max: 399.0) [2024-06-15 18:14:55,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 18:14:55,775][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000512128_1048838144.pth... [2024-06-15 18:14:55,834][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000507136_1038614528.pth [2024-06-15 18:14:57,488][1652475] Updated weights for policy 0, policy_version 512144 (0.0013) [2024-06-15 18:14:59,771][1651340] Signal inference workers to stop experience collection... (26300 times) [2024-06-15 18:14:59,842][1652475] InferenceWorker_p0-w0: stopping experience collection (26300 times) [2024-06-15 18:14:59,882][1652475] Updated weights for policy 0, policy_version 512230 (0.0017) [2024-06-15 18:15:00,016][1651340] Signal inference workers to resume experience collection... 
(26300 times) [2024-06-15 18:15:00,017][1652475] InferenceWorker_p0-w0: resuming experience collection (26300 times) [2024-06-15 18:15:00,738][1648984] Fps is (10 sec: 42597.8, 60 sec: 44782.8, 300 sec: 43653.6). Total num frames: 1049133056. Throughput: 0: 11013.6. Samples: 262343168. Policy #0 lag: (min: 111.0, avg: 206.5, max: 399.0) [2024-06-15 18:15:00,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 18:15:01,591][1652475] Updated weights for policy 0, policy_version 512311 (0.0016) [2024-06-15 18:15:03,885][1652475] Updated weights for policy 0, policy_version 512354 (0.0013) [2024-06-15 18:15:05,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 43690.6, 300 sec: 43542.5). Total num frames: 1049362432. Throughput: 0: 10956.8. Samples: 262375424. Policy #0 lag: (min: 111.0, avg: 206.5, max: 399.0) [2024-06-15 18:15:05,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 18:15:09,099][1652475] Updated weights for policy 0, policy_version 512391 (0.0014) [2024-06-15 18:15:10,131][1652475] Updated weights for policy 0, policy_version 512440 (0.0014) [2024-06-15 18:15:10,738][1648984] Fps is (10 sec: 39322.2, 60 sec: 44236.8, 300 sec: 43209.3). Total num frames: 1049526272. Throughput: 0: 11332.2. Samples: 262454784. Policy #0 lag: (min: 111.0, avg: 206.5, max: 399.0) [2024-06-15 18:15:10,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 18:15:11,819][1652475] Updated weights for policy 0, policy_version 512512 (0.0030) [2024-06-15 18:15:13,346][1652475] Updated weights for policy 0, policy_version 512572 (0.0015) [2024-06-15 18:15:14,991][1652475] Updated weights for policy 0, policy_version 512630 (0.0013) [2024-06-15 18:15:15,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 45875.1, 300 sec: 43986.9). Total num frames: 1049886720. Throughput: 0: 11286.8. Samples: 262515712. 
Policy #0 lag: (min: 111.0, avg: 206.5, max: 399.0) [2024-06-15 18:15:15,739][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 18:15:20,740][1648984] Fps is (10 sec: 36040.2, 60 sec: 43689.7, 300 sec: 42876.0). Total num frames: 1049886720. Throughput: 0: 11400.2. Samples: 262560256. Policy #0 lag: (min: 111.0, avg: 206.5, max: 399.0) [2024-06-15 18:15:20,741][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 18:15:21,265][1652475] Updated weights for policy 0, policy_version 512677 (0.0056) [2024-06-15 18:15:23,137][1652475] Updated weights for policy 0, policy_version 512768 (0.0015) [2024-06-15 18:15:24,588][1652475] Updated weights for policy 0, policy_version 512827 (0.0015) [2024-06-15 18:15:25,738][1648984] Fps is (10 sec: 45875.5, 60 sec: 45875.1, 300 sec: 44209.0). Total num frames: 1050345472. Throughput: 0: 11411.9. Samples: 262619136. Policy #0 lag: (min: 111.0, avg: 206.5, max: 399.0) [2024-06-15 18:15:25,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 18:15:25,810][1652475] Updated weights for policy 0, policy_version 512868 (0.0014) [2024-06-15 18:15:30,738][1648984] Fps is (10 sec: 52435.9, 60 sec: 43690.8, 300 sec: 43098.3). Total num frames: 1050411008. Throughput: 0: 11480.2. Samples: 262693376. Policy #0 lag: (min: 111.0, avg: 206.5, max: 399.0) [2024-06-15 18:15:30,740][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 18:15:32,183][1652475] Updated weights for policy 0, policy_version 512944 (0.0019) [2024-06-15 18:15:35,174][1652475] Updated weights for policy 0, policy_version 513008 (0.0115) [2024-06-15 18:15:35,738][1648984] Fps is (10 sec: 32767.9, 60 sec: 45875.1, 300 sec: 43542.6). Total num frames: 1050673152. Throughput: 0: 11628.1. Samples: 262728192. 
Policy #0 lag: (min: 15.0, avg: 70.3, max: 271.0) [2024-06-15 18:15:35,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 18:15:37,277][1652475] Updated weights for policy 0, policy_version 513093 (0.0013) [2024-06-15 18:15:38,653][1652475] Updated weights for policy 0, policy_version 513145 (0.0015) [2024-06-15 18:15:40,738][1648984] Fps is (10 sec: 52427.9, 60 sec: 45328.9, 300 sec: 43431.5). Total num frames: 1050935296. Throughput: 0: 11207.1. Samples: 262782976. Policy #0 lag: (min: 15.0, avg: 70.3, max: 271.0) [2024-06-15 18:15:40,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 18:15:43,322][1652475] Updated weights for policy 0, policy_version 513174 (0.0013) [2024-06-15 18:15:43,581][1651340] Signal inference workers to stop experience collection... (26350 times) [2024-06-15 18:15:43,644][1652475] InferenceWorker_p0-w0: stopping experience collection (26350 times) [2024-06-15 18:15:43,752][1651340] Signal inference workers to resume experience collection... (26350 times) [2024-06-15 18:15:43,753][1652475] InferenceWorker_p0-w0: resuming experience collection (26350 times) [2024-06-15 18:15:45,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 45875.1, 300 sec: 43098.3). Total num frames: 1051066368. Throughput: 0: 11491.6. Samples: 262860288. Policy #0 lag: (min: 15.0, avg: 70.3, max: 271.0) [2024-06-15 18:15:45,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 18:15:46,274][1652475] Updated weights for policy 0, policy_version 513232 (0.0109) [2024-06-15 18:15:47,229][1652475] Updated weights for policy 0, policy_version 513280 (0.0020) [2024-06-15 18:15:49,896][1652475] Updated weights for policy 0, policy_version 513363 (0.0012) [2024-06-15 18:15:50,738][1648984] Fps is (10 sec: 49152.6, 60 sec: 45329.1, 300 sec: 43653.6). Total num frames: 1051426816. Throughput: 0: 11423.3. Samples: 262889472. 
Policy #0 lag: (min: 15.0, avg: 70.3, max: 271.0) [2024-06-15 18:15:50,744][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 18:15:55,786][1648984] Fps is (10 sec: 39133.1, 60 sec: 43655.6, 300 sec: 43091.2). Total num frames: 1051459584. Throughput: 0: 11047.4. Samples: 262952448. Policy #0 lag: (min: 15.0, avg: 70.3, max: 271.0) [2024-06-15 18:15:55,787][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 18:15:56,247][1652475] Updated weights for policy 0, policy_version 513440 (0.0014) [2024-06-15 18:15:59,536][1652475] Updated weights for policy 0, policy_version 513520 (0.0099) [2024-06-15 18:16:00,738][1648984] Fps is (10 sec: 32768.1, 60 sec: 43690.8, 300 sec: 43653.7). Total num frames: 1051754496. Throughput: 0: 11002.3. Samples: 263010816. Policy #0 lag: (min: 15.0, avg: 70.3, max: 271.0) [2024-06-15 18:16:00,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 18:16:01,951][1652475] Updated weights for policy 0, policy_version 513604 (0.0013) [2024-06-15 18:16:03,430][1652475] Updated weights for policy 0, policy_version 513664 (0.0012) [2024-06-15 18:16:05,738][1648984] Fps is (10 sec: 52682.9, 60 sec: 43690.8, 300 sec: 43098.3). Total num frames: 1051983872. Throughput: 0: 10433.7. Samples: 263029760. Policy #0 lag: (min: 15.0, avg: 70.3, max: 271.0) [2024-06-15 18:16:05,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 18:16:10,738][1648984] Fps is (10 sec: 26214.4, 60 sec: 41506.2, 300 sec: 42987.2). Total num frames: 1052016640. Throughput: 0: 10752.0. Samples: 263102976. 
Policy #0 lag: (min: 15.0, avg: 70.3, max: 271.0) [2024-06-15 18:16:10,738][1648984] Avg episode reward: [(0, '-0.250')] [2024-06-15 18:16:11,427][1652475] Updated weights for policy 0, policy_version 513713 (0.0015) [2024-06-15 18:16:13,114][1652475] Updated weights for policy 0, policy_version 513778 (0.0013) [2024-06-15 18:16:14,804][1652475] Updated weights for policy 0, policy_version 513849 (0.0015) [2024-06-15 18:16:15,738][1648984] Fps is (10 sec: 39320.1, 60 sec: 41506.0, 300 sec: 43320.4). Total num frames: 1052377088. Throughput: 0: 10274.0. Samples: 263155712. Policy #0 lag: (min: 15.0, avg: 70.3, max: 271.0) [2024-06-15 18:16:15,739][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 18:16:18,005][1652475] Updated weights for policy 0, policy_version 513918 (0.0142) [2024-06-15 18:16:20,738][1648984] Fps is (10 sec: 49151.6, 60 sec: 43691.6, 300 sec: 43098.3). Total num frames: 1052508160. Throughput: 0: 10171.7. Samples: 263185920. Policy #0 lag: (min: 15.0, avg: 70.3, max: 271.0) [2024-06-15 18:16:20,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 18:16:23,650][1652475] Updated weights for policy 0, policy_version 513975 (0.0089) [2024-06-15 18:16:25,622][1652475] Updated weights for policy 0, policy_version 514051 (0.0012) [2024-06-15 18:16:25,738][1648984] Fps is (10 sec: 39322.9, 60 sec: 40413.9, 300 sec: 43098.3). Total num frames: 1052770304. Throughput: 0: 10513.1. Samples: 263256064. Policy #0 lag: (min: 15.0, avg: 70.3, max: 271.0) [2024-06-15 18:16:25,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 18:16:26,826][1652475] Updated weights for policy 0, policy_version 514102 (0.0023) [2024-06-15 18:16:29,396][1651340] Signal inference workers to stop experience collection... (26400 times) [2024-06-15 18:16:29,479][1652475] InferenceWorker_p0-w0: stopping experience collection (26400 times) [2024-06-15 18:16:29,585][1651340] Signal inference workers to resume experience collection... 
(26400 times) [2024-06-15 18:16:29,586][1652475] InferenceWorker_p0-w0: resuming experience collection (26400 times) [2024-06-15 18:16:30,134][1652475] Updated weights for policy 0, policy_version 514146 (0.0013) [2024-06-15 18:16:30,712][1652475] Updated weights for policy 0, policy_version 514176 (0.0013) [2024-06-15 18:16:30,738][1648984] Fps is (10 sec: 52429.3, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 1053032448. Throughput: 0: 10205.9. Samples: 263319552. Policy #0 lag: (min: 15.0, avg: 70.3, max: 271.0) [2024-06-15 18:16:30,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 18:16:35,738][1648984] Fps is (10 sec: 36044.0, 60 sec: 40959.9, 300 sec: 42987.1). Total num frames: 1053130752. Throughput: 0: 10353.7. Samples: 263355392. Policy #0 lag: (min: 15.0, avg: 70.3, max: 271.0) [2024-06-15 18:16:35,739][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 18:16:36,650][1652475] Updated weights for policy 0, policy_version 514257 (0.0013) [2024-06-15 18:16:38,987][1652475] Updated weights for policy 0, policy_version 514354 (0.0214) [2024-06-15 18:16:40,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 41506.2, 300 sec: 43542.6). Total num frames: 1053425664. Throughput: 0: 9977.6. Samples: 263400960. Policy #0 lag: (min: 15.0, avg: 70.3, max: 271.0) [2024-06-15 18:16:40,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 18:16:43,072][1652475] Updated weights for policy 0, policy_version 514425 (0.0015) [2024-06-15 18:16:45,738][1648984] Fps is (10 sec: 42599.5, 60 sec: 41506.2, 300 sec: 42987.2). Total num frames: 1053556736. Throughput: 0: 10251.4. Samples: 263472128. 
Policy #0 lag: (min: 15.0, avg: 70.3, max: 271.0) [2024-06-15 18:16:45,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 18:16:48,355][1652475] Updated weights for policy 0, policy_version 514487 (0.0017) [2024-06-15 18:16:50,099][1652475] Updated weights for policy 0, policy_version 514555 (0.0013) [2024-06-15 18:16:50,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 39867.7, 300 sec: 43098.2). Total num frames: 1053818880. Throughput: 0: 10513.1. Samples: 263502848. Policy #0 lag: (min: 15.0, avg: 70.3, max: 271.0) [2024-06-15 18:16:50,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 18:16:52,562][1652475] Updated weights for policy 0, policy_version 514615 (0.0071) [2024-06-15 18:16:54,951][1652475] Updated weights for policy 0, policy_version 514679 (0.0013) [2024-06-15 18:16:55,738][1648984] Fps is (10 sec: 52428.2, 60 sec: 43725.7, 300 sec: 43320.4). Total num frames: 1054081024. Throughput: 0: 10285.5. Samples: 263565824. Policy #0 lag: (min: 15.0, avg: 70.3, max: 271.0) [2024-06-15 18:16:55,739][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 18:16:55,745][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000514688_1054081024.pth... [2024-06-15 18:16:55,842][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000509600_1043660800.pth [2024-06-15 18:16:59,938][1652475] Updated weights for policy 0, policy_version 514736 (0.0037) [2024-06-15 18:17:00,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 40960.0, 300 sec: 43098.2). Total num frames: 1054212096. Throughput: 0: 10570.0. Samples: 263631360. Policy #0 lag: (min: 16.0, avg: 94.8, max: 272.0) [2024-06-15 18:17:00,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 18:17:01,794][1652475] Updated weights for policy 0, policy_version 514812 (0.0012) [2024-06-15 18:17:05,738][1648984] Fps is (10 sec: 29491.6, 60 sec: 39867.7, 300 sec: 43098.3). Total num frames: 1054375936. Throughput: 0: 10501.7. 
Samples: 263658496. Policy #0 lag: (min: 16.0, avg: 94.8, max: 272.0) [2024-06-15 18:17:05,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 18:17:06,940][1652475] Updated weights for policy 0, policy_version 514896 (0.0022) [2024-06-15 18:17:07,822][1652475] Updated weights for policy 0, policy_version 514942 (0.0013) [2024-06-15 18:17:10,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 43144.5, 300 sec: 42654.3). Total num frames: 1054605312. Throughput: 0: 10433.4. Samples: 263725568. Policy #0 lag: (min: 16.0, avg: 94.8, max: 272.0) [2024-06-15 18:17:10,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 18:17:13,130][1652475] Updated weights for policy 0, policy_version 515040 (0.0012) [2024-06-15 18:17:13,239][1651340] Signal inference workers to stop experience collection... (26450 times) [2024-06-15 18:17:13,277][1652475] InferenceWorker_p0-w0: stopping experience collection (26450 times) [2024-06-15 18:17:13,406][1651340] Signal inference workers to resume experience collection... (26450 times) [2024-06-15 18:17:13,407][1652475] InferenceWorker_p0-w0: resuming experience collection (26450 times) [2024-06-15 18:17:15,738][1648984] Fps is (10 sec: 49151.6, 60 sec: 41506.3, 300 sec: 43209.3). Total num frames: 1054867456. Throughput: 0: 10467.5. Samples: 263790592. Policy #0 lag: (min: 16.0, avg: 94.8, max: 272.0) [2024-06-15 18:17:15,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 18:17:18,634][1652475] Updated weights for policy 0, policy_version 515136 (0.0030) [2024-06-15 18:17:20,738][1648984] Fps is (10 sec: 52428.5, 60 sec: 43690.7, 300 sec: 42876.1). Total num frames: 1055129600. Throughput: 0: 10524.5. Samples: 263828992. 
Policy #0 lag: (min: 16.0, avg: 94.8, max: 272.0) [2024-06-15 18:17:20,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 18:17:22,458][1652475] Updated weights for policy 0, policy_version 515202 (0.0012) [2024-06-15 18:17:24,483][1652475] Updated weights for policy 0, policy_version 515296 (0.0012) [2024-06-15 18:17:25,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 1055391744. Throughput: 0: 10865.8. Samples: 263889920. Policy #0 lag: (min: 16.0, avg: 94.8, max: 272.0) [2024-06-15 18:17:25,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 18:17:29,953][1652475] Updated weights for policy 0, policy_version 515365 (0.0013) [2024-06-15 18:17:30,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 41506.1, 300 sec: 42987.2). Total num frames: 1055522816. Throughput: 0: 10911.3. Samples: 263963136. Policy #0 lag: (min: 16.0, avg: 94.8, max: 272.0) [2024-06-15 18:17:30,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 18:17:32,038][1652475] Updated weights for policy 0, policy_version 515450 (0.0123) [2024-06-15 18:17:35,738][1648984] Fps is (10 sec: 36044.6, 60 sec: 43690.8, 300 sec: 42987.2). Total num frames: 1055752192. Throughput: 0: 10820.2. Samples: 263989760. Policy #0 lag: (min: 16.0, avg: 94.8, max: 272.0) [2024-06-15 18:17:35,739][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 18:17:35,884][1652475] Updated weights for policy 0, policy_version 515520 (0.0014) [2024-06-15 18:17:37,354][1652475] Updated weights for policy 0, policy_version 515584 (0.0012) [2024-06-15 18:17:40,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 41506.1, 300 sec: 42876.1). Total num frames: 1055916032. Throughput: 0: 10740.6. Samples: 264049152. 
Policy #0 lag: (min: 16.0, avg: 94.8, max: 272.0) [2024-06-15 18:17:40,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 18:17:43,768][1652475] Updated weights for policy 0, policy_version 515652 (0.0014) [2024-06-15 18:17:45,738][1648984] Fps is (10 sec: 42598.7, 60 sec: 43690.6, 300 sec: 42653.9). Total num frames: 1056178176. Throughput: 0: 10592.7. Samples: 264108032. Policy #0 lag: (min: 16.0, avg: 94.8, max: 272.0) [2024-06-15 18:17:45,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 18:17:47,026][1652475] Updated weights for policy 0, policy_version 515713 (0.0013) [2024-06-15 18:17:50,231][1652475] Updated weights for policy 0, policy_version 515794 (0.0015) [2024-06-15 18:17:50,740][1648984] Fps is (10 sec: 45866.4, 60 sec: 42597.0, 300 sec: 43320.1). Total num frames: 1056374784. Throughput: 0: 10717.4. Samples: 264140800. Policy #0 lag: (min: 16.0, avg: 94.8, max: 272.0) [2024-06-15 18:17:50,740][1648984] Avg episode reward: [(0, '-0.230')] [2024-06-15 18:17:54,197][1652475] Updated weights for policy 0, policy_version 515860 (0.0015) [2024-06-15 18:17:55,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 42052.3, 300 sec: 42653.9). Total num frames: 1056604160. Throughput: 0: 10752.0. Samples: 264209408. Policy #0 lag: (min: 16.0, avg: 94.8, max: 272.0) [2024-06-15 18:17:55,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 18:17:56,605][1652475] Updated weights for policy 0, policy_version 515962 (0.0014) [2024-06-15 18:17:59,439][1651340] Signal inference workers to stop experience collection... (26500 times) [2024-06-15 18:17:59,474][1652475] InferenceWorker_p0-w0: stopping experience collection (26500 times) [2024-06-15 18:17:59,654][1651340] Signal inference workers to resume experience collection... 
(26500 times) [2024-06-15 18:17:59,657][1652475] InferenceWorker_p0-w0: resuming experience collection (26500 times) [2024-06-15 18:18:00,333][1652475] Updated weights for policy 0, policy_version 516023 (0.0013) [2024-06-15 18:18:00,738][1648984] Fps is (10 sec: 45883.7, 60 sec: 43690.6, 300 sec: 43098.2). Total num frames: 1056833536. Throughput: 0: 10547.2. Samples: 264265216. Policy #0 lag: (min: 16.0, avg: 94.8, max: 272.0) [2024-06-15 18:18:00,739][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 18:18:04,899][1652475] Updated weights for policy 0, policy_version 516087 (0.0014) [2024-06-15 18:18:05,738][1648984] Fps is (10 sec: 36044.9, 60 sec: 43144.5, 300 sec: 42654.0). Total num frames: 1056964608. Throughput: 0: 10524.4. Samples: 264302592. Policy #0 lag: (min: 16.0, avg: 94.8, max: 272.0) [2024-06-15 18:18:05,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 18:18:06,941][1652475] Updated weights for policy 0, policy_version 516144 (0.0015) [2024-06-15 18:18:08,490][1652475] Updated weights for policy 0, policy_version 516208 (0.0014) [2024-06-15 18:18:10,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 43690.6, 300 sec: 42655.0). Total num frames: 1057226752. Throughput: 0: 10535.8. Samples: 264364032. Policy #0 lag: (min: 16.0, avg: 94.8, max: 272.0) [2024-06-15 18:18:10,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 18:18:11,260][1652475] Updated weights for policy 0, policy_version 516257 (0.0014) [2024-06-15 18:18:15,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 42052.3, 300 sec: 42987.2). Total num frames: 1057390592. Throughput: 0: 10456.2. Samples: 264433664. 
Policy #0 lag: (min: 16.0, avg: 94.8, max: 272.0) [2024-06-15 18:18:15,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 18:18:16,255][1652475] Updated weights for policy 0, policy_version 516329 (0.0014) [2024-06-15 18:18:16,818][1652475] Updated weights for policy 0, policy_version 516352 (0.0012) [2024-06-15 18:18:19,888][1652475] Updated weights for policy 0, policy_version 516432 (0.0013) [2024-06-15 18:18:20,738][1648984] Fps is (10 sec: 49152.5, 60 sec: 43144.5, 300 sec: 42987.2). Total num frames: 1057718272. Throughput: 0: 10615.5. Samples: 264467456. Policy #0 lag: (min: 16.0, avg: 94.8, max: 272.0) [2024-06-15 18:18:20,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 18:18:20,987][1652475] Updated weights for policy 0, policy_version 516475 (0.0015) [2024-06-15 18:18:24,769][1652475] Updated weights for policy 0, policy_version 516536 (0.0085) [2024-06-15 18:18:25,738][1648984] Fps is (10 sec: 49151.2, 60 sec: 41506.0, 300 sec: 43098.2). Total num frames: 1057882112. Throughput: 0: 10695.1. Samples: 264530432. Policy #0 lag: (min: 16.0, avg: 94.8, max: 272.0) [2024-06-15 18:18:25,739][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 18:18:27,676][1652475] Updated weights for policy 0, policy_version 516594 (0.0095) [2024-06-15 18:18:30,738][1648984] Fps is (10 sec: 36045.0, 60 sec: 42598.5, 300 sec: 42876.1). Total num frames: 1058078720. Throughput: 0: 10774.8. Samples: 264592896. Policy #0 lag: (min: 40.0, avg: 146.2, max: 296.0) [2024-06-15 18:18:30,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 18:18:30,816][1652475] Updated weights for policy 0, policy_version 516656 (0.0014) [2024-06-15 18:18:33,029][1652475] Updated weights for policy 0, policy_version 516704 (0.0011) [2024-06-15 18:18:35,738][1648984] Fps is (10 sec: 39322.7, 60 sec: 42052.4, 300 sec: 42654.0). Total num frames: 1058275328. Throughput: 0: 10695.6. Samples: 264622080. 
Policy #0 lag: (min: 40.0, avg: 146.2, max: 296.0) [2024-06-15 18:18:35,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 18:18:37,337][1652475] Updated weights for policy 0, policy_version 516754 (0.0013) [2024-06-15 18:18:39,352][1652475] Updated weights for policy 0, policy_version 516848 (0.0014) [2024-06-15 18:18:40,738][1648984] Fps is (10 sec: 45874.0, 60 sec: 43690.6, 300 sec: 43209.3). Total num frames: 1058537472. Throughput: 0: 10569.9. Samples: 264685056. Policy #0 lag: (min: 40.0, avg: 146.2, max: 296.0) [2024-06-15 18:18:40,739][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 18:18:42,120][1652475] Updated weights for policy 0, policy_version 516900 (0.0050) [2024-06-15 18:18:45,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 41506.2, 300 sec: 42431.8). Total num frames: 1058668544. Throughput: 0: 10899.9. Samples: 264755712. Policy #0 lag: (min: 40.0, avg: 146.2, max: 296.0) [2024-06-15 18:18:45,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 18:18:46,515][1652475] Updated weights for policy 0, policy_version 516960 (0.0035) [2024-06-15 18:18:48,591][1651340] Signal inference workers to stop experience collection... (26550 times) [2024-06-15 18:18:48,646][1652475] InferenceWorker_p0-w0: stopping experience collection (26550 times) [2024-06-15 18:18:48,767][1651340] Signal inference workers to resume experience collection... (26550 times) [2024-06-15 18:18:48,768][1652475] InferenceWorker_p0-w0: resuming experience collection (26550 times) [2024-06-15 18:18:48,903][1652475] Updated weights for policy 0, policy_version 517009 (0.0014) [2024-06-15 18:18:49,881][1652475] Updated weights for policy 0, policy_version 517061 (0.0013) [2024-06-15 18:18:50,738][1648984] Fps is (10 sec: 45875.9, 60 sec: 43692.1, 300 sec: 43320.4). Total num frames: 1058996224. Throughput: 0: 10831.6. Samples: 264790016. 
Policy #0 lag: (min: 40.0, avg: 146.2, max: 296.0) [2024-06-15 18:18:50,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 18:18:53,377][1652475] Updated weights for policy 0, policy_version 517141 (0.0014) [2024-06-15 18:18:55,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 43144.6, 300 sec: 43209.3). Total num frames: 1059192832. Throughput: 0: 10797.5. Samples: 264849920. Policy #0 lag: (min: 40.0, avg: 146.2, max: 296.0) [2024-06-15 18:18:55,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 18:18:55,751][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000517184_1059192832.pth... [2024-06-15 18:18:55,804][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000512128_1048838144.pth [2024-06-15 18:18:59,004][1652475] Updated weights for policy 0, policy_version 517232 (0.0013) [2024-06-15 18:19:00,738][1648984] Fps is (10 sec: 36045.2, 60 sec: 42052.4, 300 sec: 42765.0). Total num frames: 1059356672. Throughput: 0: 10797.5. Samples: 264919552. Policy #0 lag: (min: 40.0, avg: 146.2, max: 296.0) [2024-06-15 18:19:00,740][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 18:19:01,085][1652475] Updated weights for policy 0, policy_version 517296 (0.0013) [2024-06-15 18:19:02,286][1652475] Updated weights for policy 0, policy_version 517347 (0.0025) [2024-06-15 18:19:05,739][1648984] Fps is (10 sec: 39318.0, 60 sec: 43690.0, 300 sec: 43098.1). Total num frames: 1059586048. Throughput: 0: 10660.8. Samples: 264947200. Policy #0 lag: (min: 40.0, avg: 146.2, max: 296.0) [2024-06-15 18:19:05,739][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 18:19:07,321][1652475] Updated weights for policy 0, policy_version 517411 (0.0014) [2024-06-15 18:19:08,879][1652475] Updated weights for policy 0, policy_version 517443 (0.0014) [2024-06-15 18:19:10,738][1648984] Fps is (10 sec: 49151.8, 60 sec: 43690.8, 300 sec: 43098.3). Total num frames: 1059848192. 
Throughput: 0: 11025.1. Samples: 265026560. Policy #0 lag: (min: 40.0, avg: 146.2, max: 296.0) [2024-06-15 18:19:10,738][1648984] Avg episode reward: [(0, '-0.180')] [2024-06-15 18:19:11,243][1652475] Updated weights for policy 0, policy_version 517520 (0.0023) [2024-06-15 18:19:13,460][1652475] Updated weights for policy 0, policy_version 517606 (0.0100) [2024-06-15 18:19:15,738][1648984] Fps is (10 sec: 52433.8, 60 sec: 45329.1, 300 sec: 43542.6). Total num frames: 1060110336. Throughput: 0: 10934.0. Samples: 265084928. Policy #0 lag: (min: 40.0, avg: 146.2, max: 296.0) [2024-06-15 18:19:15,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 18:19:20,738][1648984] Fps is (10 sec: 32768.1, 60 sec: 40960.0, 300 sec: 42653.9). Total num frames: 1060175872. Throughput: 0: 11127.5. Samples: 265122816. Policy #0 lag: (min: 40.0, avg: 146.2, max: 296.0) [2024-06-15 18:19:20,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 18:19:20,795][1652475] Updated weights for policy 0, policy_version 517667 (0.0014) [2024-06-15 18:19:22,551][1652475] Updated weights for policy 0, policy_version 517745 (0.0012) [2024-06-15 18:19:24,036][1652475] Updated weights for policy 0, policy_version 517821 (0.0118) [2024-06-15 18:19:25,435][1652475] Updated weights for policy 0, policy_version 517878 (0.0015) [2024-06-15 18:19:25,738][1648984] Fps is (10 sec: 52428.4, 60 sec: 45875.3, 300 sec: 43542.6). Total num frames: 1060634624. Throughput: 0: 11047.9. Samples: 265182208. Policy #0 lag: (min: 40.0, avg: 146.2, max: 296.0) [2024-06-15 18:19:25,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 18:19:30,738][1648984] Fps is (10 sec: 45875.4, 60 sec: 42598.4, 300 sec: 43098.3). Total num frames: 1060634624. Throughput: 0: 11218.5. Samples: 265260544. 
Policy #0 lag: (min: 40.0, avg: 146.2, max: 296.0) [2024-06-15 18:19:30,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 18:19:31,796][1652475] Updated weights for policy 0, policy_version 517936 (0.0012) [2024-06-15 18:19:33,173][1651340] Signal inference workers to stop experience collection... (26600 times) [2024-06-15 18:19:33,226][1652475] InferenceWorker_p0-w0: stopping experience collection (26600 times) [2024-06-15 18:19:33,373][1651340] Signal inference workers to resume experience collection... (26600 times) [2024-06-15 18:19:33,375][1652475] InferenceWorker_p0-w0: resuming experience collection (26600 times) [2024-06-15 18:19:34,189][1652475] Updated weights for policy 0, policy_version 518000 (0.0014) [2024-06-15 18:19:35,615][1652475] Updated weights for policy 0, policy_version 518069 (0.0013) [2024-06-15 18:19:35,738][1648984] Fps is (10 sec: 36044.9, 60 sec: 45329.0, 300 sec: 43320.4). Total num frames: 1060995072. Throughput: 0: 11138.8. Samples: 265291264. Policy #0 lag: (min: 40.0, avg: 146.2, max: 296.0) [2024-06-15 18:19:35,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 18:19:37,253][1652475] Updated weights for policy 0, policy_version 518136 (0.0017) [2024-06-15 18:19:40,738][1648984] Fps is (10 sec: 52428.1, 60 sec: 43690.8, 300 sec: 43542.5). Total num frames: 1061158912. Throughput: 0: 11161.6. Samples: 265352192. Policy #0 lag: (min: 40.0, avg: 146.2, max: 296.0) [2024-06-15 18:19:40,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 18:19:44,199][1652475] Updated weights for policy 0, policy_version 518179 (0.0015) [2024-06-15 18:19:45,738][1648984] Fps is (10 sec: 36044.7, 60 sec: 44782.9, 300 sec: 42876.1). Total num frames: 1061355520. Throughput: 0: 11218.5. Samples: 265424384. 
Policy #0 lag: (min: 40.0, avg: 146.2, max: 296.0) [2024-06-15 18:19:45,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 18:19:45,799][1652475] Updated weights for policy 0, policy_version 518245 (0.0027) [2024-06-15 18:19:47,958][1652475] Updated weights for policy 0, policy_version 518336 (0.0017) [2024-06-15 18:19:49,487][1652475] Updated weights for policy 0, policy_version 518394 (0.0012) [2024-06-15 18:19:50,738][1648984] Fps is (10 sec: 52429.2, 60 sec: 44783.0, 300 sec: 43542.6). Total num frames: 1061683200. Throughput: 0: 11070.8. Samples: 265445376. Policy #0 lag: (min: 40.0, avg: 146.2, max: 296.0) [2024-06-15 18:19:50,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 18:19:55,744][1648984] Fps is (10 sec: 36024.1, 60 sec: 42048.2, 300 sec: 42653.1). Total num frames: 1061715968. Throughput: 0: 10989.5. Samples: 265521152. Policy #0 lag: (min: 40.0, avg: 146.2, max: 296.0) [2024-06-15 18:19:55,744][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 18:19:56,204][1652475] Updated weights for policy 0, policy_version 518436 (0.0014) [2024-06-15 18:19:58,451][1652475] Updated weights for policy 0, policy_version 518513 (0.0015) [2024-06-15 18:20:00,254][1652475] Updated weights for policy 0, policy_version 518582 (0.0014) [2024-06-15 18:20:00,752][1648984] Fps is (10 sec: 39266.6, 60 sec: 45318.5, 300 sec: 43096.2). Total num frames: 1062076416. Throughput: 0: 10851.0. Samples: 265573376. Policy #0 lag: (min: 31.0, avg: 84.9, max: 287.0) [2024-06-15 18:20:00,752][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 18:20:02,027][1652475] Updated weights for policy 0, policy_version 518644 (0.0012) [2024-06-15 18:20:05,738][1648984] Fps is (10 sec: 49180.4, 60 sec: 43691.3, 300 sec: 42987.2). Total num frames: 1062207488. Throughput: 0: 10729.2. Samples: 265605632. 
Policy #0 lag: (min: 31.0, avg: 84.9, max: 287.0) [2024-06-15 18:20:05,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 18:20:07,832][1652475] Updated weights for policy 0, policy_version 518704 (0.0013) [2024-06-15 18:20:09,832][1652475] Updated weights for policy 0, policy_version 518782 (0.0127) [2024-06-15 18:20:10,738][1648984] Fps is (10 sec: 42657.8, 60 sec: 44236.7, 300 sec: 42765.0). Total num frames: 1062502400. Throughput: 0: 11013.7. Samples: 265677824. Policy #0 lag: (min: 31.0, avg: 84.9, max: 287.0) [2024-06-15 18:20:10,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 18:20:14,038][1652475] Updated weights for policy 0, policy_version 518849 (0.0025) [2024-06-15 18:20:14,705][1651340] Signal inference workers to stop experience collection... (26650 times) [2024-06-15 18:20:14,783][1652475] InferenceWorker_p0-w0: stopping experience collection (26650 times) [2024-06-15 18:20:15,035][1651340] Signal inference workers to resume experience collection... (26650 times) [2024-06-15 18:20:15,035][1652475] InferenceWorker_p0-w0: resuming experience collection (26650 times) [2024-06-15 18:20:15,738][1648984] Fps is (10 sec: 52428.5, 60 sec: 43690.6, 300 sec: 43542.7). Total num frames: 1062731776. Throughput: 0: 10626.8. Samples: 265738752. Policy #0 lag: (min: 31.0, avg: 84.9, max: 287.0) [2024-06-15 18:20:15,738][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 18:20:19,258][1652475] Updated weights for policy 0, policy_version 518944 (0.0012) [2024-06-15 18:20:20,738][1648984] Fps is (10 sec: 36044.3, 60 sec: 44782.8, 300 sec: 42431.8). Total num frames: 1062862848. Throughput: 0: 10843.0. Samples: 265779200. Policy #0 lag: (min: 31.0, avg: 84.9, max: 287.0) [2024-06-15 18:20:20,739][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 18:20:22,061][1652475] Updated weights for policy 0, policy_version 519042 (0.0019) [2024-06-15 18:20:25,738][1648984] Fps is (10 sec: 39321.1, 60 sec: 41506.0, 300 sec: 43098.2). 
Total num frames: 1063124992. Throughput: 0: 10695.1. Samples: 265833472. Policy #0 lag: (min: 31.0, avg: 84.9, max: 287.0) [2024-06-15 18:20:25,738][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 18:20:27,257][1652475] Updated weights for policy 0, policy_version 519136 (0.0107) [2024-06-15 18:20:30,743][1648984] Fps is (10 sec: 39300.5, 60 sec: 43686.6, 300 sec: 42653.1). Total num frames: 1063256064. Throughput: 0: 10773.4. Samples: 265909248. Policy #0 lag: (min: 31.0, avg: 84.9, max: 287.0) [2024-06-15 18:20:30,744][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 18:20:31,243][1652475] Updated weights for policy 0, policy_version 519184 (0.0015) [2024-06-15 18:20:32,722][1652475] Updated weights for policy 0, policy_version 519251 (0.0021) [2024-06-15 18:20:35,111][1652475] Updated weights for policy 0, policy_version 519328 (0.0012) [2024-06-15 18:20:35,739][1648984] Fps is (10 sec: 49147.2, 60 sec: 43689.8, 300 sec: 42987.0). Total num frames: 1063616512. Throughput: 0: 10956.5. Samples: 265938432. Policy #0 lag: (min: 31.0, avg: 84.9, max: 287.0) [2024-06-15 18:20:35,739][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 18:20:39,387][1652475] Updated weights for policy 0, policy_version 519408 (0.0014) [2024-06-15 18:20:40,738][1648984] Fps is (10 sec: 52456.0, 60 sec: 43690.4, 300 sec: 43098.2). Total num frames: 1063780352. Throughput: 0: 10616.8. Samples: 265998848. Policy #0 lag: (min: 31.0, avg: 84.9, max: 287.0) [2024-06-15 18:20:40,739][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 18:20:44,145][1652475] Updated weights for policy 0, policy_version 519456 (0.0110) [2024-06-15 18:20:45,383][1652475] Updated weights for policy 0, policy_version 519505 (0.0013) [2024-06-15 18:20:45,738][1648984] Fps is (10 sec: 36048.9, 60 sec: 43690.7, 300 sec: 42542.9). Total num frames: 1063976960. Throughput: 0: 10891.9. Samples: 266063360. 
Policy #0 lag: (min: 31.0, avg: 84.9, max: 287.0) [2024-06-15 18:20:45,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 18:20:46,427][1652475] Updated weights for policy 0, policy_version 519551 (0.0094) [2024-06-15 18:20:49,846][1652475] Updated weights for policy 0, policy_version 519620 (0.0013) [2024-06-15 18:20:50,738][1648984] Fps is (10 sec: 45877.0, 60 sec: 42598.4, 300 sec: 43327.5). Total num frames: 1064239104. Throughput: 0: 10922.7. Samples: 266097152. Policy #0 lag: (min: 31.0, avg: 84.9, max: 287.0) [2024-06-15 18:20:50,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 18:20:51,318][1652475] Updated weights for policy 0, policy_version 519678 (0.0018) [2024-06-15 18:20:55,742][1648984] Fps is (10 sec: 39304.1, 60 sec: 44237.7, 300 sec: 42764.4). Total num frames: 1064370176. Throughput: 0: 10739.6. Samples: 266161152. Policy #0 lag: (min: 31.0, avg: 84.9, max: 287.0) [2024-06-15 18:20:55,743][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 18:20:56,116][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000519728_1064402944.pth... [2024-06-15 18:20:56,163][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000514688_1054081024.pth [2024-06-15 18:20:56,320][1652475] Updated weights for policy 0, policy_version 519734 (0.0136) [2024-06-15 18:20:59,356][1652475] Updated weights for policy 0, policy_version 519795 (0.0013) [2024-06-15 18:21:00,738][1648984] Fps is (10 sec: 32767.8, 60 sec: 41515.8, 300 sec: 42653.9). Total num frames: 1064566784. Throughput: 0: 10786.2. Samples: 266224128. Policy #0 lag: (min: 31.0, avg: 84.9, max: 287.0) [2024-06-15 18:21:00,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 18:21:01,100][1652475] Updated weights for policy 0, policy_version 519825 (0.0015) [2024-06-15 18:21:01,944][1651340] Signal inference workers to stop experience collection... 
(26700 times) [2024-06-15 18:21:01,980][1652475] InferenceWorker_p0-w0: stopping experience collection (26700 times) [2024-06-15 18:21:02,286][1651340] Signal inference workers to resume experience collection... (26700 times) [2024-06-15 18:21:02,287][1652475] InferenceWorker_p0-w0: resuming experience collection (26700 times) [2024-06-15 18:21:03,420][1652475] Updated weights for policy 0, policy_version 519920 (0.0013) [2024-06-15 18:21:05,738][1648984] Fps is (10 sec: 45896.1, 60 sec: 43690.7, 300 sec: 43431.5). Total num frames: 1064828928. Throughput: 0: 10365.2. Samples: 266245632. Policy #0 lag: (min: 31.0, avg: 84.9, max: 287.0) [2024-06-15 18:21:05,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 18:21:07,585][1652475] Updated weights for policy 0, policy_version 519968 (0.0034) [2024-06-15 18:21:10,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 40960.0, 300 sec: 42654.0). Total num frames: 1064960000. Throughput: 0: 10900.0. Samples: 266323968. Policy #0 lag: (min: 31.0, avg: 84.9, max: 287.0) [2024-06-15 18:21:10,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 18:21:11,040][1652475] Updated weights for policy 0, policy_version 520002 (0.0012) [2024-06-15 18:21:12,787][1652475] Updated weights for policy 0, policy_version 520080 (0.0013) [2024-06-15 18:21:14,945][1652475] Updated weights for policy 0, policy_version 520176 (0.0014) [2024-06-15 18:21:15,737][1648984] Fps is (10 sec: 52429.8, 60 sec: 43690.9, 300 sec: 43542.6). Total num frames: 1065353216. Throughput: 0: 10377.9. Samples: 266376192. Policy #0 lag: (min: 31.0, avg: 84.9, max: 287.0) [2024-06-15 18:21:15,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 18:21:20,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 42052.4, 300 sec: 42765.0). Total num frames: 1065385984. Throughput: 0: 10627.1. Samples: 266416640. 
Policy #0 lag: (min: 31.0, avg: 84.9, max: 287.0) [2024-06-15 18:21:20,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 18:21:20,770][1652475] Updated weights for policy 0, policy_version 520224 (0.0014) [2024-06-15 18:21:23,480][1652475] Updated weights for policy 0, policy_version 520293 (0.0016) [2024-06-15 18:21:25,222][1652475] Updated weights for policy 0, policy_version 520368 (0.0104) [2024-06-15 18:21:25,738][1648984] Fps is (10 sec: 39320.8, 60 sec: 43690.8, 300 sec: 43098.2). Total num frames: 1065746432. Throughput: 0: 10683.8. Samples: 266479616. Policy #0 lag: (min: 4.0, avg: 133.7, max: 304.0) [2024-06-15 18:21:25,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 18:21:27,286][1652475] Updated weights for policy 0, policy_version 520444 (0.0122) [2024-06-15 18:21:30,738][1648984] Fps is (10 sec: 49152.2, 60 sec: 43694.7, 300 sec: 43209.4). Total num frames: 1065877504. Throughput: 0: 10729.3. Samples: 266546176. Policy #0 lag: (min: 4.0, avg: 133.7, max: 304.0) [2024-06-15 18:21:30,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 18:21:34,748][1652475] Updated weights for policy 0, policy_version 520498 (0.0014) [2024-06-15 18:21:35,738][1648984] Fps is (10 sec: 32767.9, 60 sec: 40960.8, 300 sec: 42876.1). Total num frames: 1066074112. Throughput: 0: 10831.6. Samples: 266584576. Policy #0 lag: (min: 4.0, avg: 133.7, max: 304.0) [2024-06-15 18:21:35,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 18:21:36,815][1652475] Updated weights for policy 0, policy_version 520592 (0.0012) [2024-06-15 18:21:39,263][1652475] Updated weights for policy 0, policy_version 520672 (0.0015) [2024-06-15 18:21:40,738][1648984] Fps is (10 sec: 52428.5, 60 sec: 43690.9, 300 sec: 43542.6). Total num frames: 1066401792. Throughput: 0: 10491.4. Samples: 266633216. 
Policy #0 lag: (min: 4.0, avg: 133.7, max: 304.0) [2024-06-15 18:21:40,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 18:21:45,571][1652475] Updated weights for policy 0, policy_version 520722 (0.0016) [2024-06-15 18:21:45,738][1648984] Fps is (10 sec: 36044.7, 60 sec: 40960.0, 300 sec: 42765.0). Total num frames: 1066434560. Throughput: 0: 10729.2. Samples: 266706944. Policy #0 lag: (min: 4.0, avg: 133.7, max: 304.0) [2024-06-15 18:21:45,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 18:21:46,311][1652475] Updated weights for policy 0, policy_version 520764 (0.0013) [2024-06-15 18:21:48,636][1651340] Signal inference workers to stop experience collection... (26750 times) [2024-06-15 18:21:48,692][1652475] InferenceWorker_p0-w0: stopping experience collection (26750 times) [2024-06-15 18:21:48,693][1652475] Updated weights for policy 0, policy_version 520820 (0.0013) [2024-06-15 18:21:48,855][1651340] Signal inference workers to resume experience collection... (26750 times) [2024-06-15 18:21:48,856][1652475] InferenceWorker_p0-w0: resuming experience collection (26750 times) [2024-06-15 18:21:50,188][1652475] Updated weights for policy 0, policy_version 520894 (0.0013) [2024-06-15 18:21:50,738][1648984] Fps is (10 sec: 42598.2, 60 sec: 43144.5, 300 sec: 43209.3). Total num frames: 1066827776. Throughput: 0: 10956.8. Samples: 266738688. Policy #0 lag: (min: 4.0, avg: 133.7, max: 304.0) [2024-06-15 18:21:50,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 18:21:51,394][1652475] Updated weights for policy 0, policy_version 520944 (0.0016) [2024-06-15 18:21:55,738][1648984] Fps is (10 sec: 49152.1, 60 sec: 42601.6, 300 sec: 43098.2). Total num frames: 1066926080. Throughput: 0: 10729.2. Samples: 266806784. 
Policy #0 lag: (min: 4.0, avg: 133.7, max: 304.0) [2024-06-15 18:21:55,747][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 18:21:56,800][1652475] Updated weights for policy 0, policy_version 520976 (0.0016) [2024-06-15 18:21:57,797][1652475] Updated weights for policy 0, policy_version 521024 (0.0015) [2024-06-15 18:22:00,210][1652475] Updated weights for policy 0, policy_version 521091 (0.0085) [2024-06-15 18:22:00,738][1648984] Fps is (10 sec: 42598.7, 60 sec: 44783.0, 300 sec: 43653.6). Total num frames: 1067253760. Throughput: 0: 11104.7. Samples: 266875904. Policy #0 lag: (min: 4.0, avg: 133.7, max: 304.0) [2024-06-15 18:22:00,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 18:22:02,271][1652475] Updated weights for policy 0, policy_version 521168 (0.0014) [2024-06-15 18:22:03,438][1652475] Updated weights for policy 0, policy_version 521216 (0.0013) [2024-06-15 18:22:05,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 1067450368. Throughput: 0: 10808.9. Samples: 266903040. Policy #0 lag: (min: 4.0, avg: 133.7, max: 304.0) [2024-06-15 18:22:05,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 18:22:08,755][1652475] Updated weights for policy 0, policy_version 521269 (0.0110) [2024-06-15 18:22:10,738][1648984] Fps is (10 sec: 36044.0, 60 sec: 44236.6, 300 sec: 43209.3). Total num frames: 1067614208. Throughput: 0: 11138.8. Samples: 266980864. Policy #0 lag: (min: 4.0, avg: 133.7, max: 304.0) [2024-06-15 18:22:10,739][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 18:22:11,262][1652475] Updated weights for policy 0, policy_version 521316 (0.0030) [2024-06-15 18:22:12,766][1652475] Updated weights for policy 0, policy_version 521378 (0.0018) [2024-06-15 18:22:14,230][1652475] Updated weights for policy 0, policy_version 521431 (0.0013) [2024-06-15 18:22:15,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 43690.5, 300 sec: 43542.6). Total num frames: 1067974656. 
Throughput: 0: 10899.9. Samples: 267036672. Policy #0 lag: (min: 4.0, avg: 133.7, max: 304.0) [2024-06-15 18:22:15,739][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 18:22:20,096][1652475] Updated weights for policy 0, policy_version 521521 (0.0014) [2024-06-15 18:22:20,738][1648984] Fps is (10 sec: 49152.8, 60 sec: 45329.0, 300 sec: 43098.3). Total num frames: 1068105728. Throughput: 0: 11002.3. Samples: 267079680. Policy #0 lag: (min: 4.0, avg: 133.7, max: 304.0) [2024-06-15 18:22:20,738][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 18:22:24,306][1652475] Updated weights for policy 0, policy_version 521602 (0.0015) [2024-06-15 18:22:25,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 43690.6, 300 sec: 43542.6). Total num frames: 1068367872. Throughput: 0: 11286.7. Samples: 267141120. Policy #0 lag: (min: 4.0, avg: 133.7, max: 304.0) [2024-06-15 18:22:25,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 18:22:26,558][1652475] Updated weights for policy 0, policy_version 521672 (0.0020) [2024-06-15 18:22:27,523][1652475] Updated weights for policy 0, policy_version 521723 (0.0014) [2024-06-15 18:22:30,738][1648984] Fps is (10 sec: 39320.5, 60 sec: 43690.4, 300 sec: 43209.3). Total num frames: 1068498944. Throughput: 0: 11229.8. Samples: 267212288. Policy #0 lag: (min: 4.0, avg: 133.7, max: 304.0) [2024-06-15 18:22:30,739][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 18:22:31,942][1652475] Updated weights for policy 0, policy_version 521789 (0.0082) [2024-06-15 18:22:34,550][1652475] Updated weights for policy 0, policy_version 521850 (0.0013) [2024-06-15 18:22:35,017][1651340] Signal inference workers to stop experience collection... (26800 times) [2024-06-15 18:22:35,132][1652475] InferenceWorker_p0-w0: stopping experience collection (26800 times) [2024-06-15 18:22:35,258][1651340] Signal inference workers to resume experience collection... 
(26800 times) [2024-06-15 18:22:35,259][1652475] InferenceWorker_p0-w0: resuming experience collection (26800 times) [2024-06-15 18:22:35,738][1648984] Fps is (10 sec: 45874.5, 60 sec: 45875.1, 300 sec: 43764.7). Total num frames: 1068826624. Throughput: 0: 11298.1. Samples: 267247104. Policy #0 lag: (min: 4.0, avg: 133.7, max: 304.0) [2024-06-15 18:22:35,739][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 18:22:35,872][1652475] Updated weights for policy 0, policy_version 521891 (0.0013) [2024-06-15 18:22:38,797][1652475] Updated weights for policy 0, policy_version 521940 (0.0013) [2024-06-15 18:22:39,834][1652475] Updated weights for policy 0, policy_version 521984 (0.0016) [2024-06-15 18:22:40,738][1648984] Fps is (10 sec: 52430.5, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 1069023232. Throughput: 0: 11207.1. Samples: 267311104. Policy #0 lag: (min: 4.0, avg: 133.7, max: 304.0) [2024-06-15 18:22:40,738][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 18:22:45,610][1652475] Updated weights for policy 0, policy_version 522053 (0.0015) [2024-06-15 18:22:45,738][1648984] Fps is (10 sec: 32768.7, 60 sec: 45329.1, 300 sec: 43320.7). Total num frames: 1069154304. Throughput: 0: 11275.4. Samples: 267383296. Policy #0 lag: (min: 4.0, avg: 133.7, max: 304.0) [2024-06-15 18:22:45,741][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 18:22:47,605][1652475] Updated weights for policy 0, policy_version 522144 (0.0113) [2024-06-15 18:22:50,264][1652475] Updated weights for policy 0, policy_version 522208 (0.0016) [2024-06-15 18:22:50,754][1648984] Fps is (10 sec: 49072.6, 60 sec: 44770.9, 300 sec: 43762.3). Total num frames: 1069514752. Throughput: 0: 11214.5. Samples: 267407872. 
Policy #0 lag: (min: 4.0, avg: 133.7, max: 304.0) [2024-06-15 18:22:50,755][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 18:22:53,851][1652475] Updated weights for policy 0, policy_version 522243 (0.0022) [2024-06-15 18:22:55,738][1648984] Fps is (10 sec: 52427.0, 60 sec: 45875.0, 300 sec: 43542.5). Total num frames: 1069678592. Throughput: 0: 11161.6. Samples: 267483136. Policy #0 lag: (min: 6.0, avg: 104.3, max: 262.0) [2024-06-15 18:22:55,739][1648984] Avg episode reward: [(0, '-0.550')] [2024-06-15 18:22:55,744][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000522304_1069678592.pth... [2024-06-15 18:22:55,794][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000517184_1059192832.pth [2024-06-15 18:22:57,119][1652475] Updated weights for policy 0, policy_version 522320 (0.0092) [2024-06-15 18:22:59,882][1652475] Updated weights for policy 0, policy_version 522384 (0.0016) [2024-06-15 18:23:00,738][1648984] Fps is (10 sec: 39385.1, 60 sec: 44236.8, 300 sec: 43875.8). Total num frames: 1069907968. Throughput: 0: 11275.4. Samples: 267544064. Policy #0 lag: (min: 6.0, avg: 104.3, max: 262.0) [2024-06-15 18:23:00,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 18:23:02,019][1652475] Updated weights for policy 0, policy_version 522466 (0.0012) [2024-06-15 18:23:05,738][1648984] Fps is (10 sec: 39322.9, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 1070071808. Throughput: 0: 10831.7. Samples: 267567104. Policy #0 lag: (min: 6.0, avg: 104.3, max: 262.0) [2024-06-15 18:23:05,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 18:23:07,885][1652475] Updated weights for policy 0, policy_version 522512 (0.0016) [2024-06-15 18:23:09,786][1652475] Updated weights for policy 0, policy_version 522608 (0.0019) [2024-06-15 18:23:10,738][1648984] Fps is (10 sec: 42597.9, 60 sec: 45329.1, 300 sec: 43875.8). Total num frames: 1070333952. Throughput: 0: 11047.8. 
Samples: 267638272. Policy #0 lag: (min: 6.0, avg: 104.3, max: 262.0) [2024-06-15 18:23:10,739][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 18:23:12,907][1652475] Updated weights for policy 0, policy_version 522658 (0.0014) [2024-06-15 18:23:15,738][1648984] Fps is (10 sec: 52428.5, 60 sec: 43690.7, 300 sec: 43653.6). Total num frames: 1070596096. Throughput: 0: 10740.7. Samples: 267695616. Policy #0 lag: (min: 6.0, avg: 104.3, max: 262.0) [2024-06-15 18:23:15,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 18:23:19,636][1652475] Updated weights for policy 0, policy_version 522756 (0.0026) [2024-06-15 18:23:20,738][1648984] Fps is (10 sec: 39322.4, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 1070727168. Throughput: 0: 10843.1. Samples: 267735040. Policy #0 lag: (min: 6.0, avg: 104.3, max: 262.0) [2024-06-15 18:23:20,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 18:23:21,787][1652475] Updated weights for policy 0, policy_version 522832 (0.0014) [2024-06-15 18:23:21,918][1651340] Signal inference workers to stop experience collection... (26850 times) [2024-06-15 18:23:21,974][1652475] InferenceWorker_p0-w0: stopping experience collection (26850 times) [2024-06-15 18:23:22,156][1651340] Signal inference workers to resume experience collection... (26850 times) [2024-06-15 18:23:22,157][1652475] InferenceWorker_p0-w0: resuming experience collection (26850 times) [2024-06-15 18:23:23,930][1652475] Updated weights for policy 0, policy_version 522886 (0.0021) [2024-06-15 18:23:25,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 44236.8, 300 sec: 43875.8). Total num frames: 1071022080. Throughput: 0: 10899.9. Samples: 267801600. 
Policy #0 lag: (min: 6.0, avg: 104.3, max: 262.0) [2024-06-15 18:23:25,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 18:23:25,948][1652475] Updated weights for policy 0, policy_version 522976 (0.0013) [2024-06-15 18:23:30,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 43690.9, 300 sec: 43542.6). Total num frames: 1071120384. Throughput: 0: 10695.1. Samples: 267864576. Policy #0 lag: (min: 6.0, avg: 104.3, max: 262.0) [2024-06-15 18:23:30,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 18:23:31,487][1652475] Updated weights for policy 0, policy_version 523027 (0.0013) [2024-06-15 18:23:33,768][1652475] Updated weights for policy 0, policy_version 523077 (0.0014) [2024-06-15 18:23:35,738][1648984] Fps is (10 sec: 36045.1, 60 sec: 42598.6, 300 sec: 43542.6). Total num frames: 1071382528. Throughput: 0: 10972.1. Samples: 267901440. Policy #0 lag: (min: 6.0, avg: 104.3, max: 262.0) [2024-06-15 18:23:35,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 18:23:36,054][1652475] Updated weights for policy 0, policy_version 523154 (0.0016) [2024-06-15 18:23:37,673][1652475] Updated weights for policy 0, policy_version 523216 (0.0014) [2024-06-15 18:23:38,599][1652475] Updated weights for policy 0, policy_version 523257 (0.0016) [2024-06-15 18:23:40,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 43690.7, 300 sec: 43986.9). Total num frames: 1071644672. Throughput: 0: 10535.9. Samples: 267957248. Policy #0 lag: (min: 6.0, avg: 104.3, max: 262.0) [2024-06-15 18:23:40,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 18:23:45,738][1648984] Fps is (10 sec: 32768.0, 60 sec: 42598.4, 300 sec: 43098.3). Total num frames: 1071710208. Throughput: 0: 10956.8. Samples: 268037120. 
Policy #0 lag: (min: 6.0, avg: 104.3, max: 262.0) [2024-06-15 18:23:45,740][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 18:23:46,390][1652475] Updated weights for policy 0, policy_version 523328 (0.0014) [2024-06-15 18:23:47,798][1652475] Updated weights for policy 0, policy_version 523381 (0.0013) [2024-06-15 18:23:50,129][1652475] Updated weights for policy 0, policy_version 523472 (0.0013) [2024-06-15 18:23:50,738][1648984] Fps is (10 sec: 45875.5, 60 sec: 43156.2, 300 sec: 43764.7). Total num frames: 1072103424. Throughput: 0: 10934.1. Samples: 268059136. Policy #0 lag: (min: 6.0, avg: 104.3, max: 262.0) [2024-06-15 18:23:50,738][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 18:23:55,738][1648984] Fps is (10 sec: 45874.3, 60 sec: 41506.2, 300 sec: 43431.4). Total num frames: 1072168960. Throughput: 0: 10717.9. Samples: 268120576. Policy #0 lag: (min: 6.0, avg: 104.3, max: 262.0) [2024-06-15 18:23:55,739][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 18:23:58,005][1652475] Updated weights for policy 0, policy_version 523552 (0.0018) [2024-06-15 18:23:59,519][1652475] Updated weights for policy 0, policy_version 523603 (0.0031) [2024-06-15 18:24:00,738][1648984] Fps is (10 sec: 32767.9, 60 sec: 42052.3, 300 sec: 43542.7). Total num frames: 1072431104. Throughput: 0: 10877.2. Samples: 268185088. Policy #0 lag: (min: 6.0, avg: 104.3, max: 262.0) [2024-06-15 18:24:00,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 18:24:01,592][1652475] Updated weights for policy 0, policy_version 523684 (0.0100) [2024-06-15 18:24:02,548][1651340] Signal inference workers to stop experience collection... (26900 times) [2024-06-15 18:24:02,663][1652475] InferenceWorker_p0-w0: stopping experience collection (26900 times) [2024-06-15 18:24:02,711][1651340] Signal inference workers to resume experience collection... 
(26900 times) [2024-06-15 18:24:02,720][1652475] InferenceWorker_p0-w0: resuming experience collection (26900 times) [2024-06-15 18:24:02,743][1652475] Updated weights for policy 0, policy_version 523744 (0.0014) [2024-06-15 18:24:05,738][1648984] Fps is (10 sec: 52430.0, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 1072693248. Throughput: 0: 10592.7. Samples: 268211712. Policy #0 lag: (min: 6.0, avg: 104.3, max: 262.0) [2024-06-15 18:24:05,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 18:24:09,667][1652475] Updated weights for policy 0, policy_version 523792 (0.0014) [2024-06-15 18:24:10,739][1648984] Fps is (10 sec: 36044.6, 60 sec: 40960.1, 300 sec: 42987.2). Total num frames: 1072791552. Throughput: 0: 10854.4. Samples: 268290048. Policy #0 lag: (min: 6.0, avg: 104.3, max: 262.0) [2024-06-15 18:24:10,740][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 18:24:11,361][1652475] Updated weights for policy 0, policy_version 523856 (0.0017) [2024-06-15 18:24:12,884][1652475] Updated weights for policy 0, policy_version 523908 (0.0097) [2024-06-15 18:24:14,561][1652475] Updated weights for policy 0, policy_version 523984 (0.0012) [2024-06-15 18:24:15,738][1648984] Fps is (10 sec: 52427.7, 60 sec: 43690.6, 300 sec: 44209.0). Total num frames: 1073217536. Throughput: 0: 10478.9. Samples: 268336128. Policy #0 lag: (min: 6.0, avg: 104.3, max: 262.0) [2024-06-15 18:24:15,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 18:24:20,738][1648984] Fps is (10 sec: 42598.7, 60 sec: 41506.1, 300 sec: 42654.0). Total num frames: 1073217536. Throughput: 0: 10570.0. Samples: 268377088. 
Policy #0 lag: (min: 6.0, avg: 104.3, max: 262.0) [2024-06-15 18:24:20,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 18:24:21,661][1652475] Updated weights for policy 0, policy_version 524033 (0.0100) [2024-06-15 18:24:23,375][1652475] Updated weights for policy 0, policy_version 524096 (0.0168) [2024-06-15 18:24:25,738][1648984] Fps is (10 sec: 29491.7, 60 sec: 41506.2, 300 sec: 43653.6). Total num frames: 1073512448. Throughput: 0: 10774.8. Samples: 268442112. Policy #0 lag: (min: 33.0, avg: 90.1, max: 289.0) [2024-06-15 18:24:25,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 18:24:25,772][1652475] Updated weights for policy 0, policy_version 524192 (0.0095) [2024-06-15 18:24:27,355][1652475] Updated weights for policy 0, policy_version 524272 (0.0014) [2024-06-15 18:24:30,738][1648984] Fps is (10 sec: 52428.4, 60 sec: 43690.6, 300 sec: 43209.3). Total num frames: 1073741824. Throughput: 0: 10376.5. Samples: 268504064. Policy #0 lag: (min: 33.0, avg: 90.1, max: 289.0) [2024-06-15 18:24:30,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 18:24:35,193][1652475] Updated weights for policy 0, policy_version 524338 (0.0015) [2024-06-15 18:24:35,738][1648984] Fps is (10 sec: 36044.8, 60 sec: 41506.1, 300 sec: 43098.3). Total num frames: 1073872896. Throughput: 0: 10820.3. Samples: 268546048. Policy #0 lag: (min: 33.0, avg: 90.1, max: 289.0) [2024-06-15 18:24:35,738][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 18:24:37,257][1652475] Updated weights for policy 0, policy_version 524432 (0.0116) [2024-06-15 18:24:38,463][1652475] Updated weights for policy 0, policy_version 524496 (0.0013) [2024-06-15 18:24:40,738][1648984] Fps is (10 sec: 52427.5, 60 sec: 43690.5, 300 sec: 43764.7). Total num frames: 1074266112. Throughput: 0: 10592.7. Samples: 268597248. 
Policy #0 lag: (min: 33.0, avg: 90.1, max: 289.0) [2024-06-15 18:24:40,739][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 18:24:45,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 42598.4, 300 sec: 42653.9). Total num frames: 1074266112. Throughput: 0: 10899.9. Samples: 268675584. Policy #0 lag: (min: 33.0, avg: 90.1, max: 289.0) [2024-06-15 18:24:45,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 18:24:46,194][1652475] Updated weights for policy 0, policy_version 524561 (0.0015) [2024-06-15 18:24:47,065][1651340] Signal inference workers to stop experience collection... (26950 times) [2024-06-15 18:24:47,115][1652475] InferenceWorker_p0-w0: stopping experience collection (26950 times) [2024-06-15 18:24:47,243][1651340] Signal inference workers to resume experience collection... (26950 times) [2024-06-15 18:24:47,244][1652475] InferenceWorker_p0-w0: resuming experience collection (26950 times) [2024-06-15 18:24:48,390][1652475] Updated weights for policy 0, policy_version 524669 (0.0016) [2024-06-15 18:24:50,259][1652475] Updated weights for policy 0, policy_version 524720 (0.0114) [2024-06-15 18:24:50,738][1648984] Fps is (10 sec: 39322.7, 60 sec: 42598.4, 300 sec: 43876.7). Total num frames: 1074659328. Throughput: 0: 10854.4. Samples: 268700160. Policy #0 lag: (min: 33.0, avg: 90.1, max: 289.0) [2024-06-15 18:24:50,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 18:24:51,997][1652475] Updated weights for policy 0, policy_version 524800 (0.0013) [2024-06-15 18:24:55,738][1648984] Fps is (10 sec: 52428.4, 60 sec: 43690.7, 300 sec: 43100.3). Total num frames: 1074790400. Throughput: 0: 10649.6. Samples: 268769280. Policy #0 lag: (min: 33.0, avg: 90.1, max: 289.0) [2024-06-15 18:24:55,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 18:24:55,744][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000524800_1074790400.pth... 
[2024-06-15 18:24:55,784][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000519728_1064402944.pth [2024-06-15 18:24:59,144][1652475] Updated weights for policy 0, policy_version 524892 (0.0113) [2024-06-15 18:25:00,738][1648984] Fps is (10 sec: 42598.7, 60 sec: 44236.8, 300 sec: 43653.7). Total num frames: 1075085312. Throughput: 0: 11150.3. Samples: 268837888. Policy #0 lag: (min: 33.0, avg: 90.1, max: 289.0) [2024-06-15 18:25:00,740][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 18:25:00,813][1652475] Updated weights for policy 0, policy_version 524960 (0.0015) [2024-06-15 18:25:03,184][1652475] Updated weights for policy 0, policy_version 525032 (0.0093) [2024-06-15 18:25:05,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 43690.5, 300 sec: 43431.5). Total num frames: 1075314688. Throughput: 0: 10843.0. Samples: 268865024. Policy #0 lag: (min: 33.0, avg: 90.1, max: 289.0) [2024-06-15 18:25:05,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 18:25:09,266][1652475] Updated weights for policy 0, policy_version 525072 (0.0011) [2024-06-15 18:25:10,738][1648984] Fps is (10 sec: 36044.3, 60 sec: 44236.8, 300 sec: 43098.3). Total num frames: 1075445760. Throughput: 0: 11218.5. Samples: 268946944. Policy #0 lag: (min: 33.0, avg: 90.1, max: 289.0) [2024-06-15 18:25:10,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 18:25:11,559][1652475] Updated weights for policy 0, policy_version 525159 (0.0017) [2024-06-15 18:25:13,327][1652475] Updated weights for policy 0, policy_version 525246 (0.0019) [2024-06-15 18:25:15,226][1652475] Updated weights for policy 0, policy_version 525311 (0.0014) [2024-06-15 18:25:15,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 43690.7, 300 sec: 43986.9). Total num frames: 1075838976. Throughput: 0: 10888.5. Samples: 268994048. 
Policy #0 lag: (min: 33.0, avg: 90.1, max: 289.0) [2024-06-15 18:25:15,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 18:25:20,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 43690.6, 300 sec: 43098.3). Total num frames: 1075838976. Throughput: 0: 10831.6. Samples: 269033472. Policy #0 lag: (min: 33.0, avg: 90.1, max: 289.0) [2024-06-15 18:25:20,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 18:25:24,330][1652475] Updated weights for policy 0, policy_version 525377 (0.0014) [2024-06-15 18:25:25,738][1648984] Fps is (10 sec: 26214.8, 60 sec: 43144.6, 300 sec: 43543.4). Total num frames: 1076101120. Throughput: 0: 11104.8. Samples: 269096960. Policy #0 lag: (min: 33.0, avg: 90.1, max: 289.0) [2024-06-15 18:25:25,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 18:25:26,505][1652475] Updated weights for policy 0, policy_version 525475 (0.0015) [2024-06-15 18:25:27,167][1651340] Signal inference workers to stop experience collection... (27000 times) [2024-06-15 18:25:27,213][1652475] InferenceWorker_p0-w0: stopping experience collection (27000 times) [2024-06-15 18:25:27,483][1651340] Signal inference workers to resume experience collection... (27000 times) [2024-06-15 18:25:27,484][1652475] InferenceWorker_p0-w0: resuming experience collection (27000 times) [2024-06-15 18:25:28,572][1652475] Updated weights for policy 0, policy_version 525560 (0.0012) [2024-06-15 18:25:30,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 43690.7, 300 sec: 43209.5). Total num frames: 1076363264. Throughput: 0: 10570.0. Samples: 269151232. Policy #0 lag: (min: 33.0, avg: 90.1, max: 289.0) [2024-06-15 18:25:30,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 18:25:35,738][1648984] Fps is (10 sec: 32768.0, 60 sec: 42598.4, 300 sec: 42876.2). Total num frames: 1076428800. Throughput: 0: 10877.2. Samples: 269189632. 
Policy #0 lag: (min: 33.0, avg: 90.1, max: 289.0) [2024-06-15 18:25:35,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 18:25:36,215][1652475] Updated weights for policy 0, policy_version 525619 (0.0106) [2024-06-15 18:25:38,642][1652475] Updated weights for policy 0, policy_version 525728 (0.0014) [2024-06-15 18:25:40,684][1652475] Updated weights for policy 0, policy_version 525808 (0.0013) [2024-06-15 18:25:40,738][1648984] Fps is (10 sec: 49151.8, 60 sec: 43144.7, 300 sec: 43653.6). Total num frames: 1076854784. Throughput: 0: 10558.6. Samples: 269244416. Policy #0 lag: (min: 33.0, avg: 90.1, max: 289.0) [2024-06-15 18:25:40,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 18:25:45,738][1648984] Fps is (10 sec: 45875.0, 60 sec: 43690.7, 300 sec: 42876.1). Total num frames: 1076887552. Throughput: 0: 10604.1. Samples: 269315072. Policy #0 lag: (min: 33.0, avg: 90.1, max: 289.0) [2024-06-15 18:25:45,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 18:25:48,561][1652475] Updated weights for policy 0, policy_version 525891 (0.0117) [2024-06-15 18:25:50,738][1648984] Fps is (10 sec: 36045.2, 60 sec: 42598.4, 300 sec: 43543.2). Total num frames: 1077215232. Throughput: 0: 10729.3. Samples: 269347840. Policy #0 lag: (min: 33.0, avg: 90.1, max: 289.0) [2024-06-15 18:25:50,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 18:25:50,865][1652475] Updated weights for policy 0, policy_version 525987 (0.0012) [2024-06-15 18:25:52,731][1652475] Updated weights for policy 0, policy_version 526032 (0.0014) [2024-06-15 18:25:55,738][1648984] Fps is (10 sec: 52427.3, 60 sec: 43690.5, 300 sec: 43542.5). Total num frames: 1077411840. Throughput: 0: 10092.0. Samples: 269401088. 
Policy #0 lag: (min: 137.0, avg: 211.2, max: 377.0) [2024-06-15 18:25:55,739][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 18:25:59,571][1652475] Updated weights for policy 0, policy_version 526096 (0.0018) [2024-06-15 18:26:00,738][1648984] Fps is (10 sec: 29491.3, 60 sec: 40413.9, 300 sec: 42987.2). Total num frames: 1077510144. Throughput: 0: 10695.1. Samples: 269475328. Policy #0 lag: (min: 137.0, avg: 211.2, max: 377.0) [2024-06-15 18:26:00,741][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 18:26:00,827][1652475] Updated weights for policy 0, policy_version 526144 (0.0015) [2024-06-15 18:26:03,125][1652475] Updated weights for policy 0, policy_version 526240 (0.0013) [2024-06-15 18:26:04,569][1652475] Updated weights for policy 0, policy_version 526275 (0.0015) [2024-06-15 18:26:05,738][1648984] Fps is (10 sec: 52430.5, 60 sec: 43690.8, 300 sec: 43986.9). Total num frames: 1077936128. Throughput: 0: 10296.9. Samples: 269496832. Policy #0 lag: (min: 137.0, avg: 211.2, max: 377.0) [2024-06-15 18:26:05,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 18:26:10,738][1648984] Fps is (10 sec: 42598.2, 60 sec: 41506.2, 300 sec: 42653.9). Total num frames: 1077936128. Throughput: 0: 10478.9. Samples: 269568512. Policy #0 lag: (min: 137.0, avg: 211.2, max: 377.0) [2024-06-15 18:26:10,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 18:26:12,026][1652475] Updated weights for policy 0, policy_version 526368 (0.0100) [2024-06-15 18:26:13,001][1651340] Signal inference workers to stop experience collection... (27050 times) [2024-06-15 18:26:13,058][1652475] InferenceWorker_p0-w0: stopping experience collection (27050 times) [2024-06-15 18:26:13,323][1651340] Signal inference workers to resume experience collection... 
(27050 times) [2024-06-15 18:26:13,324][1652475] InferenceWorker_p0-w0: resuming experience collection (27050 times) [2024-06-15 18:26:14,035][1652475] Updated weights for policy 0, policy_version 526448 (0.0016) [2024-06-15 18:26:15,738][1648984] Fps is (10 sec: 26212.8, 60 sec: 39321.3, 300 sec: 43431.4). Total num frames: 1078198272. Throughput: 0: 10604.0. Samples: 269628416. Policy #0 lag: (min: 137.0, avg: 211.2, max: 377.0) [2024-06-15 18:26:15,739][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 18:26:16,126][1652475] Updated weights for policy 0, policy_version 526483 (0.0028) [2024-06-15 18:26:17,472][1652475] Updated weights for policy 0, policy_version 526544 (0.0014) [2024-06-15 18:26:20,741][1648984] Fps is (10 sec: 52411.1, 60 sec: 43688.2, 300 sec: 43097.8). Total num frames: 1078460416. Throughput: 0: 10375.7. Samples: 269656576. Policy #0 lag: (min: 137.0, avg: 211.2, max: 377.0) [2024-06-15 18:26:20,742][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 18:26:23,365][1652475] Updated weights for policy 0, policy_version 526596 (0.0013) [2024-06-15 18:26:25,737][1648984] Fps is (10 sec: 42601.4, 60 sec: 42052.3, 300 sec: 43209.3). Total num frames: 1078624256. Throughput: 0: 10717.9. Samples: 269726720. Policy #0 lag: (min: 137.0, avg: 211.2, max: 377.0) [2024-06-15 18:26:25,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 18:26:26,387][1652475] Updated weights for policy 0, policy_version 526711 (0.0013) [2024-06-15 18:26:28,708][1652475] Updated weights for policy 0, policy_version 526775 (0.0011) [2024-06-15 18:26:30,404][1652475] Updated weights for policy 0, policy_version 526804 (0.0011) [2024-06-15 18:26:30,738][1648984] Fps is (10 sec: 45890.4, 60 sec: 42598.4, 300 sec: 43542.6). Total num frames: 1078919168. Throughput: 0: 10513.1. Samples: 269788160. 
Policy #0 lag: (min: 137.0, avg: 211.2, max: 377.0) [2024-06-15 18:26:30,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 18:26:35,082][1652475] Updated weights for policy 0, policy_version 526850 (0.0013) [2024-06-15 18:26:35,738][1648984] Fps is (10 sec: 39320.8, 60 sec: 43144.4, 300 sec: 42765.0). Total num frames: 1079017472. Throughput: 0: 10592.7. Samples: 269824512. Policy #0 lag: (min: 137.0, avg: 211.2, max: 377.0) [2024-06-15 18:26:35,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 18:26:36,537][1652475] Updated weights for policy 0, policy_version 526912 (0.0036) [2024-06-15 18:26:39,818][1652475] Updated weights for policy 0, policy_version 526992 (0.0025) [2024-06-15 18:26:40,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 41506.1, 300 sec: 43764.7). Total num frames: 1079345152. Throughput: 0: 10786.2. Samples: 269886464. Policy #0 lag: (min: 137.0, avg: 211.2, max: 377.0) [2024-06-15 18:26:40,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 18:26:40,954][1652475] Updated weights for policy 0, policy_version 527039 (0.0082) [2024-06-15 18:26:43,040][1652475] Updated weights for policy 0, policy_version 527092 (0.0015) [2024-06-15 18:26:45,738][1648984] Fps is (10 sec: 49152.2, 60 sec: 43690.6, 300 sec: 42987.2). Total num frames: 1079508992. Throughput: 0: 10752.0. Samples: 269959168. Policy #0 lag: (min: 137.0, avg: 211.2, max: 377.0) [2024-06-15 18:26:45,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 18:26:48,746][1652475] Updated weights for policy 0, policy_version 527170 (0.0045) [2024-06-15 18:26:50,738][1648984] Fps is (10 sec: 42598.5, 60 sec: 42598.4, 300 sec: 43542.6). Total num frames: 1079771136. Throughput: 0: 10899.9. Samples: 269987328. 
Policy #0 lag: (min: 137.0, avg: 211.2, max: 377.0) [2024-06-15 18:26:50,741][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 18:26:51,749][1652475] Updated weights for policy 0, policy_version 527234 (0.0020) [2024-06-15 18:26:54,224][1652475] Updated weights for policy 0, policy_version 527301 (0.0011) [2024-06-15 18:26:55,415][1652475] Updated weights for policy 0, policy_version 527357 (0.0046) [2024-06-15 18:26:55,738][1648984] Fps is (10 sec: 52428.5, 60 sec: 43690.8, 300 sec: 43320.4). Total num frames: 1080033280. Throughput: 0: 10649.6. Samples: 270047744. Policy #0 lag: (min: 137.0, avg: 211.2, max: 377.0) [2024-06-15 18:26:55,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 18:26:55,744][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000527360_1080033280.pth... [2024-06-15 18:26:55,829][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000522304_1069678592.pth [2024-06-15 18:27:00,303][1651340] Signal inference workers to stop experience collection... (27100 times) [2024-06-15 18:27:00,357][1652475] InferenceWorker_p0-w0: stopping experience collection (27100 times) [2024-06-15 18:27:00,561][1651340] Signal inference workers to resume experience collection... (27100 times) [2024-06-15 18:27:00,562][1652475] InferenceWorker_p0-w0: resuming experience collection (27100 times) [2024-06-15 18:27:00,738][1648984] Fps is (10 sec: 32768.0, 60 sec: 43144.5, 300 sec: 42876.1). Total num frames: 1080098816. Throughput: 0: 10865.9. Samples: 270117376. 
Policy #0 lag: (min: 137.0, avg: 211.2, max: 377.0) [2024-06-15 18:27:00,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 18:27:01,363][1652475] Updated weights for policy 0, policy_version 527420 (0.0029) [2024-06-15 18:27:02,662][1652475] Updated weights for policy 0, policy_version 527484 (0.0015) [2024-06-15 18:27:04,335][1652475] Updated weights for policy 0, policy_version 527536 (0.0013) [2024-06-15 18:27:05,738][1648984] Fps is (10 sec: 39321.9, 60 sec: 41506.1, 300 sec: 43431.5). Total num frames: 1080426496. Throughput: 0: 10878.0. Samples: 270146048. Policy #0 lag: (min: 137.0, avg: 211.2, max: 377.0) [2024-06-15 18:27:05,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 18:27:06,452][1652475] Updated weights for policy 0, policy_version 527585 (0.0012) [2024-06-15 18:27:10,738][1648984] Fps is (10 sec: 45875.1, 60 sec: 43690.6, 300 sec: 42653.9). Total num frames: 1080557568. Throughput: 0: 10820.2. Samples: 270213632. Policy #0 lag: (min: 137.0, avg: 211.2, max: 377.0) [2024-06-15 18:27:10,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 18:27:12,038][1652475] Updated weights for policy 0, policy_version 527649 (0.0014) [2024-06-15 18:27:14,629][1652475] Updated weights for policy 0, policy_version 527699 (0.0013) [2024-06-15 18:27:15,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 43691.1, 300 sec: 43098.3). Total num frames: 1080819712. Throughput: 0: 11013.7. Samples: 270283776. Policy #0 lag: (min: 137.0, avg: 211.2, max: 377.0) [2024-06-15 18:27:15,739][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 18:27:16,584][1652475] Updated weights for policy 0, policy_version 527785 (0.0013) [2024-06-15 18:27:17,184][1652475] Updated weights for policy 0, policy_version 527811 (0.0012) [2024-06-15 18:27:20,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 43693.1, 300 sec: 43098.3). Total num frames: 1081081856. Throughput: 0: 10786.1. Samples: 270309888. 
Policy #0 lag: (min: 137.0, avg: 211.2, max: 377.0) [2024-06-15 18:27:20,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 18:27:23,962][1652475] Updated weights for policy 0, policy_version 527908 (0.0036) [2024-06-15 18:27:25,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 43144.4, 300 sec: 43098.3). Total num frames: 1081212928. Throughput: 0: 10922.7. Samples: 270377984. Policy #0 lag: (min: 13.0, avg: 105.6, max: 269.0) [2024-06-15 18:27:25,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 18:27:26,412][1652475] Updated weights for policy 0, policy_version 527956 (0.0013) [2024-06-15 18:27:28,144][1652475] Updated weights for policy 0, policy_version 528039 (0.0013) [2024-06-15 18:27:29,392][1652475] Updated weights for policy 0, policy_version 528080 (0.0012) [2024-06-15 18:27:30,738][1648984] Fps is (10 sec: 52427.2, 60 sec: 44782.7, 300 sec: 43320.4). Total num frames: 1081606144. Throughput: 0: 10649.5. Samples: 270438400. Policy #0 lag: (min: 13.0, avg: 105.6, max: 269.0) [2024-06-15 18:27:30,739][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 18:27:35,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 43144.6, 300 sec: 42653.9). Total num frames: 1081606144. Throughput: 0: 10934.0. Samples: 270479360. Policy #0 lag: (min: 13.0, avg: 105.6, max: 269.0) [2024-06-15 18:27:35,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 18:27:37,353][1652475] Updated weights for policy 0, policy_version 528160 (0.0086) [2024-06-15 18:27:39,230][1652475] Updated weights for policy 0, policy_version 528256 (0.0024) [2024-06-15 18:27:40,738][1648984] Fps is (10 sec: 36045.6, 60 sec: 43690.6, 300 sec: 43431.5). Total num frames: 1081966592. Throughput: 0: 10990.9. Samples: 270542336. Policy #0 lag: (min: 13.0, avg: 105.6, max: 269.0) [2024-06-15 18:27:40,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 18:27:40,817][1651340] Signal inference workers to stop experience collection... 
(27150 times) [2024-06-15 18:27:40,877][1652475] InferenceWorker_p0-w0: stopping experience collection (27150 times) [2024-06-15 18:27:40,880][1651340] Signal inference workers to resume experience collection... (27150 times) [2024-06-15 18:27:40,890][1652475] Updated weights for policy 0, policy_version 528320 (0.0126) [2024-06-15 18:27:40,909][1652475] InferenceWorker_p0-w0: resuming experience collection (27150 times) [2024-06-15 18:27:42,127][1652475] Updated weights for policy 0, policy_version 528382 (0.0013) [2024-06-15 18:27:45,741][1648984] Fps is (10 sec: 52419.2, 60 sec: 43689.4, 300 sec: 42767.1). Total num frames: 1082130432. Throughput: 0: 10888.1. Samples: 270607360. Policy #0 lag: (min: 13.0, avg: 105.6, max: 269.0) [2024-06-15 18:27:45,742][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 18:27:49,647][1652475] Updated weights for policy 0, policy_version 528439 (0.0014) [2024-06-15 18:27:50,738][1648984] Fps is (10 sec: 36045.3, 60 sec: 42598.4, 300 sec: 42876.1). Total num frames: 1082327040. Throughput: 0: 11070.6. Samples: 270644224. Policy #0 lag: (min: 13.0, avg: 105.6, max: 269.0) [2024-06-15 18:27:50,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 18:27:51,184][1652475] Updated weights for policy 0, policy_version 528503 (0.0012) [2024-06-15 18:27:52,756][1652475] Updated weights for policy 0, policy_version 528546 (0.0037) [2024-06-15 18:27:54,720][1652475] Updated weights for policy 0, policy_version 528624 (0.0013) [2024-06-15 18:27:55,738][1648984] Fps is (10 sec: 52438.1, 60 sec: 43690.7, 300 sec: 43209.3). Total num frames: 1082654720. Throughput: 0: 10717.9. Samples: 270695936. Policy #0 lag: (min: 13.0, avg: 105.6, max: 269.0) [2024-06-15 18:27:55,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 18:28:00,738][1648984] Fps is (10 sec: 36045.1, 60 sec: 43144.6, 300 sec: 42765.0). Total num frames: 1082687488. Throughput: 0: 10865.8. Samples: 270772736. 
Policy #0 lag: (min: 13.0, avg: 105.6, max: 269.0) [2024-06-15 18:28:00,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 18:28:00,761][1652475] Updated weights for policy 0, policy_version 528657 (0.0013) [2024-06-15 18:28:02,564][1652475] Updated weights for policy 0, policy_version 528738 (0.0013) [2024-06-15 18:28:05,033][1652475] Updated weights for policy 0, policy_version 528824 (0.0013) [2024-06-15 18:28:05,738][1648984] Fps is (10 sec: 39322.0, 60 sec: 43690.7, 300 sec: 43098.3). Total num frames: 1083047936. Throughput: 0: 10888.5. Samples: 270799872. Policy #0 lag: (min: 13.0, avg: 105.6, max: 269.0) [2024-06-15 18:28:05,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 18:28:07,026][1652475] Updated weights for policy 0, policy_version 528880 (0.0013) [2024-06-15 18:28:10,738][1648984] Fps is (10 sec: 49151.4, 60 sec: 43690.7, 300 sec: 42653.9). Total num frames: 1083179008. Throughput: 0: 10865.8. Samples: 270866944. Policy #0 lag: (min: 13.0, avg: 105.6, max: 269.0) [2024-06-15 18:28:10,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 18:28:13,154][1652475] Updated weights for policy 0, policy_version 528928 (0.0014) [2024-06-15 18:28:15,192][1652475] Updated weights for policy 0, policy_version 529023 (0.0109) [2024-06-15 18:28:15,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 43690.7, 300 sec: 43098.2). Total num frames: 1083441152. Throughput: 0: 10820.3. Samples: 270925312. Policy #0 lag: (min: 13.0, avg: 105.6, max: 269.0) [2024-06-15 18:28:15,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 18:28:17,422][1652475] Updated weights for policy 0, policy_version 529082 (0.0012) [2024-06-15 18:28:19,344][1652475] Updated weights for policy 0, policy_version 529136 (0.0012) [2024-06-15 18:28:20,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 43690.7, 300 sec: 42987.2). Total num frames: 1083703296. Throughput: 0: 10672.4. Samples: 270959616. 
Policy #0 lag: (min: 13.0, avg: 105.6, max: 269.0) [2024-06-15 18:28:20,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 18:28:24,739][1652475] Updated weights for policy 0, policy_version 529170 (0.0013) [2024-06-15 18:28:25,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 43690.7, 300 sec: 43098.2). Total num frames: 1083834368. Throughput: 0: 10854.4. Samples: 271030784. Policy #0 lag: (min: 13.0, avg: 105.6, max: 269.0) [2024-06-15 18:28:25,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 18:28:25,916][1652475] Updated weights for policy 0, policy_version 529232 (0.0100) [2024-06-15 18:28:26,397][1651340] Signal inference workers to stop experience collection... (27200 times) [2024-06-15 18:28:26,433][1652475] InferenceWorker_p0-w0: stopping experience collection (27200 times) [2024-06-15 18:28:26,686][1651340] Signal inference workers to resume experience collection... (27200 times) [2024-06-15 18:28:26,687][1652475] InferenceWorker_p0-w0: resuming experience collection (27200 times) [2024-06-15 18:28:29,549][1652475] Updated weights for policy 0, policy_version 529296 (0.0023) [2024-06-15 18:28:30,738][1648984] Fps is (10 sec: 36044.9, 60 sec: 40960.3, 300 sec: 42987.2). Total num frames: 1084063744. Throughput: 0: 10832.1. Samples: 271094784. Policy #0 lag: (min: 13.0, avg: 105.6, max: 269.0) [2024-06-15 18:28:30,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 18:28:31,430][1652475] Updated weights for policy 0, policy_version 529368 (0.0013) [2024-06-15 18:28:35,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 43690.7, 300 sec: 42653.9). Total num frames: 1084227584. Throughput: 0: 10661.0. Samples: 271123968. 
Policy #0 lag: (min: 13.0, avg: 105.6, max: 269.0) [2024-06-15 18:28:35,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 18:28:36,229][1652475] Updated weights for policy 0, policy_version 529440 (0.0014) [2024-06-15 18:28:38,413][1652475] Updated weights for policy 0, policy_version 529530 (0.0019) [2024-06-15 18:28:40,738][1648984] Fps is (10 sec: 42598.2, 60 sec: 42052.3, 300 sec: 43320.4). Total num frames: 1084489728. Throughput: 0: 10865.8. Samples: 271184896. Policy #0 lag: (min: 13.0, avg: 105.6, max: 269.0) [2024-06-15 18:28:40,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 18:28:42,829][1652475] Updated weights for policy 0, policy_version 529599 (0.0144) [2024-06-15 18:28:44,648][1652475] Updated weights for policy 0, policy_version 529655 (0.0014) [2024-06-15 18:28:45,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 43692.0, 300 sec: 42876.1). Total num frames: 1084751872. Throughput: 0: 10695.1. Samples: 271254016. Policy #0 lag: (min: 13.0, avg: 105.6, max: 269.0) [2024-06-15 18:28:45,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 18:28:48,479][1652475] Updated weights for policy 0, policy_version 529714 (0.0016) [2024-06-15 18:28:50,094][1652475] Updated weights for policy 0, policy_version 529776 (0.0069) [2024-06-15 18:28:50,738][1648984] Fps is (10 sec: 52429.1, 60 sec: 44782.9, 300 sec: 43542.6). Total num frames: 1085014016. Throughput: 0: 10888.5. Samples: 271289856. Policy #0 lag: (min: 13.0, avg: 105.6, max: 269.0) [2024-06-15 18:28:50,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 18:28:54,294][1652475] Updated weights for policy 0, policy_version 529853 (0.0013) [2024-06-15 18:28:55,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 41506.1, 300 sec: 43098.2). Total num frames: 1085145088. Throughput: 0: 10922.7. Samples: 271358464. 
Policy #0 lag: (min: 5.0, avg: 103.2, max: 261.0) [2024-06-15 18:28:55,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 18:28:56,067][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000529888_1085210624.pth... [2024-06-15 18:28:56,202][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000524800_1074790400.pth [2024-06-15 18:28:56,708][1652475] Updated weights for policy 0, policy_version 529910 (0.0012) [2024-06-15 18:29:00,048][1652475] Updated weights for policy 0, policy_version 529990 (0.0042) [2024-06-15 18:29:00,738][1648984] Fps is (10 sec: 45873.5, 60 sec: 46421.0, 300 sec: 43320.4). Total num frames: 1085472768. Throughput: 0: 11059.1. Samples: 271422976. Policy #0 lag: (min: 5.0, avg: 103.2, max: 261.0) [2024-06-15 18:29:00,739][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 18:29:04,876][1652475] Updated weights for policy 0, policy_version 530050 (0.0012) [2024-06-15 18:29:05,738][1648984] Fps is (10 sec: 45874.7, 60 sec: 42598.3, 300 sec: 43431.5). Total num frames: 1085603840. Throughput: 0: 11150.2. Samples: 271461376. Policy #0 lag: (min: 5.0, avg: 103.2, max: 261.0) [2024-06-15 18:29:05,739][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 18:29:06,060][1652475] Updated weights for policy 0, policy_version 530111 (0.0017) [2024-06-15 18:29:07,553][1652475] Updated weights for policy 0, policy_version 530168 (0.0013) [2024-06-15 18:29:10,738][1648984] Fps is (10 sec: 39322.5, 60 sec: 44782.9, 300 sec: 42876.1). Total num frames: 1085865984. Throughput: 0: 11093.3. Samples: 271529984. Policy #0 lag: (min: 5.0, avg: 103.2, max: 261.0) [2024-06-15 18:29:10,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 18:29:10,935][1652475] Updated weights for policy 0, policy_version 530233 (0.0015) [2024-06-15 18:29:11,647][1651340] Signal inference workers to stop experience collection... 
(27250 times) [2024-06-15 18:29:11,739][1652475] InferenceWorker_p0-w0: stopping experience collection (27250 times) [2024-06-15 18:29:11,952][1651340] Signal inference workers to resume experience collection... (27250 times) [2024-06-15 18:29:11,953][1652475] InferenceWorker_p0-w0: resuming experience collection (27250 times) [2024-06-15 18:29:12,868][1652475] Updated weights for policy 0, policy_version 530296 (0.0071) [2024-06-15 18:29:15,738][1648984] Fps is (10 sec: 45875.8, 60 sec: 43690.6, 300 sec: 43542.6). Total num frames: 1086062592. Throughput: 0: 11173.0. Samples: 271597568. Policy #0 lag: (min: 5.0, avg: 103.2, max: 261.0) [2024-06-15 18:29:15,738][1648984] Avg episode reward: [(0, '-0.530')] [2024-06-15 18:29:17,013][1652475] Updated weights for policy 0, policy_version 530341 (0.0014) [2024-06-15 18:29:18,211][1652475] Updated weights for policy 0, policy_version 530384 (0.0013) [2024-06-15 18:29:19,331][1652475] Updated weights for policy 0, policy_version 530432 (0.0097) [2024-06-15 18:29:20,738][1648984] Fps is (10 sec: 45874.3, 60 sec: 43690.5, 300 sec: 43431.4). Total num frames: 1086324736. Throughput: 0: 11241.2. Samples: 271629824. Policy #0 lag: (min: 5.0, avg: 103.2, max: 261.0) [2024-06-15 18:29:20,738][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 18:29:22,235][1652475] Updated weights for policy 0, policy_version 530485 (0.0020) [2024-06-15 18:29:25,738][1648984] Fps is (10 sec: 49152.3, 60 sec: 45329.1, 300 sec: 43431.5). Total num frames: 1086554112. Throughput: 0: 11514.3. Samples: 271703040. 
Policy #0 lag: (min: 5.0, avg: 103.2, max: 261.0) [2024-06-15 18:29:25,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 18:29:26,031][1652475] Updated weights for policy 0, policy_version 530558 (0.0022) [2024-06-15 18:29:28,577][1652475] Updated weights for policy 0, policy_version 530624 (0.0012) [2024-06-15 18:29:30,591][1652475] Updated weights for policy 0, policy_version 530685 (0.0014) [2024-06-15 18:29:30,738][1648984] Fps is (10 sec: 52430.3, 60 sec: 46421.3, 300 sec: 43986.9). Total num frames: 1086849024. Throughput: 0: 11355.0. Samples: 271764992. Policy #0 lag: (min: 5.0, avg: 103.2, max: 261.0) [2024-06-15 18:29:30,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 18:29:33,490][1652475] Updated weights for policy 0, policy_version 530752 (0.0013) [2024-06-15 18:29:35,738][1648984] Fps is (10 sec: 42598.0, 60 sec: 45875.1, 300 sec: 43098.3). Total num frames: 1086980096. Throughput: 0: 11264.0. Samples: 271796736. Policy #0 lag: (min: 5.0, avg: 103.2, max: 261.0) [2024-06-15 18:29:35,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 18:29:38,234][1652475] Updated weights for policy 0, policy_version 530812 (0.0020) [2024-06-15 18:29:40,167][1652475] Updated weights for policy 0, policy_version 530879 (0.0013) [2024-06-15 18:29:40,738][1648984] Fps is (10 sec: 39320.7, 60 sec: 45875.0, 300 sec: 43986.8). Total num frames: 1087242240. Throughput: 0: 11343.6. Samples: 271868928. Policy #0 lag: (min: 5.0, avg: 103.2, max: 261.0) [2024-06-15 18:29:40,739][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 18:29:44,784][1652475] Updated weights for policy 0, policy_version 530960 (0.0127) [2024-06-15 18:29:45,738][1648984] Fps is (10 sec: 49152.3, 60 sec: 45329.1, 300 sec: 43431.5). Total num frames: 1087471616. Throughput: 0: 11252.7. Samples: 271929344. 
Policy #0 lag: (min: 5.0, avg: 103.2, max: 261.0) [2024-06-15 18:29:45,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 18:29:45,841][1652475] Updated weights for policy 0, policy_version 531000 (0.0014) [2024-06-15 18:29:49,474][1652475] Updated weights for policy 0, policy_version 531056 (0.0014) [2024-06-15 18:29:50,738][1648984] Fps is (10 sec: 39322.3, 60 sec: 43690.6, 300 sec: 43542.6). Total num frames: 1087635456. Throughput: 0: 11195.8. Samples: 271965184. Policy #0 lag: (min: 5.0, avg: 103.2, max: 261.0) [2024-06-15 18:29:50,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 18:29:52,370][1652475] Updated weights for policy 0, policy_version 531128 (0.0018) [2024-06-15 18:29:54,085][1652475] Updated weights for policy 0, policy_version 531200 (0.0011) [2024-06-15 18:29:55,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 45875.2, 300 sec: 43431.5). Total num frames: 1087897600. Throughput: 0: 10979.6. Samples: 272024064. Policy #0 lag: (min: 5.0, avg: 103.2, max: 261.0) [2024-06-15 18:29:55,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 18:29:59,640][1652475] Updated weights for policy 0, policy_version 531262 (0.0017) [2024-06-15 18:30:00,254][1651340] Signal inference workers to stop experience collection... (27300 times) [2024-06-15 18:30:00,286][1652475] InferenceWorker_p0-w0: stopping experience collection (27300 times) [2024-06-15 18:30:00,486][1651340] Signal inference workers to resume experience collection... (27300 times) [2024-06-15 18:30:00,487][1652475] InferenceWorker_p0-w0: resuming experience collection (27300 times) [2024-06-15 18:30:00,737][1648984] Fps is (10 sec: 45875.9, 60 sec: 43691.0, 300 sec: 43320.4). Total num frames: 1088094208. Throughput: 0: 10991.0. Samples: 272092160. 
Policy #0 lag: (min: 5.0, avg: 103.2, max: 261.0) [2024-06-15 18:30:00,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 18:30:01,150][1652475] Updated weights for policy 0, policy_version 531325 (0.0072) [2024-06-15 18:30:04,616][1652475] Updated weights for policy 0, policy_version 531376 (0.0012) [2024-06-15 18:30:05,738][1648984] Fps is (10 sec: 45875.4, 60 sec: 45875.3, 300 sec: 43764.7). Total num frames: 1088356352. Throughput: 0: 11013.8. Samples: 272125440. Policy #0 lag: (min: 5.0, avg: 103.2, max: 261.0) [2024-06-15 18:30:05,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 18:30:06,172][1652475] Updated weights for policy 0, policy_version 531451 (0.0147) [2024-06-15 18:30:10,740][1648984] Fps is (10 sec: 36038.1, 60 sec: 43143.3, 300 sec: 42764.8). Total num frames: 1088454656. Throughput: 0: 10888.1. Samples: 272193024. Policy #0 lag: (min: 5.0, avg: 103.2, max: 261.0) [2024-06-15 18:30:10,741][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 18:30:11,193][1652475] Updated weights for policy 0, policy_version 531490 (0.0014) [2024-06-15 18:30:13,261][1652475] Updated weights for policy 0, policy_version 531582 (0.0112) [2024-06-15 18:30:15,738][1648984] Fps is (10 sec: 32768.2, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 1088684032. Throughput: 0: 10990.9. Samples: 272259584. Policy #0 lag: (min: 5.0, avg: 103.2, max: 261.0) [2024-06-15 18:30:15,738][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 18:30:16,617][1652475] Updated weights for policy 0, policy_version 531636 (0.0079) [2024-06-15 18:30:19,119][1652475] Updated weights for policy 0, policy_version 531703 (0.0013) [2024-06-15 18:30:20,770][1648984] Fps is (10 sec: 49000.9, 60 sec: 43667.2, 300 sec: 43537.8). Total num frames: 1088946176. Throughput: 0: 10983.0. Samples: 272291328. 
Policy #0 lag: (min: 5.0, avg: 103.2, max: 261.0) [2024-06-15 18:30:20,771][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 18:30:22,309][1652475] Updated weights for policy 0, policy_version 531744 (0.0098) [2024-06-15 18:30:24,460][1652475] Updated weights for policy 0, policy_version 531828 (0.0014) [2024-06-15 18:30:25,738][1648984] Fps is (10 sec: 52428.2, 60 sec: 44236.7, 300 sec: 43542.6). Total num frames: 1089208320. Throughput: 0: 10592.7. Samples: 272345600. Policy #0 lag: (min: 47.0, avg: 145.2, max: 303.0) [2024-06-15 18:30:25,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 18:30:30,738][1648984] Fps is (10 sec: 39450.1, 60 sec: 41506.1, 300 sec: 43764.7). Total num frames: 1089339392. Throughput: 0: 10831.6. Samples: 272416768. Policy #0 lag: (min: 47.0, avg: 145.2, max: 303.0) [2024-06-15 18:30:30,740][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 18:30:31,032][1652475] Updated weights for policy 0, policy_version 531905 (0.0015) [2024-06-15 18:30:34,331][1652475] Updated weights for policy 0, policy_version 531984 (0.0014) [2024-06-15 18:30:35,738][1648984] Fps is (10 sec: 39321.3, 60 sec: 43690.6, 300 sec: 43209.3). Total num frames: 1089601536. Throughput: 0: 10763.4. Samples: 272449536. Policy #0 lag: (min: 47.0, avg: 145.2, max: 303.0) [2024-06-15 18:30:35,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 18:30:36,126][1652475] Updated weights for policy 0, policy_version 532054 (0.0014) [2024-06-15 18:30:40,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 41506.3, 300 sec: 43542.6). Total num frames: 1089732608. Throughput: 0: 10934.0. Samples: 272516096. 
Policy #0 lag: (min: 47.0, avg: 145.2, max: 303.0) [2024-06-15 18:30:40,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 18:30:42,708][1652475] Updated weights for policy 0, policy_version 532150 (0.0014) [2024-06-15 18:30:44,191][1652475] Updated weights for policy 0, policy_version 532224 (0.0013) [2024-06-15 18:30:45,738][1648984] Fps is (10 sec: 39322.1, 60 sec: 42052.3, 300 sec: 43320.4). Total num frames: 1089994752. Throughput: 0: 10706.5. Samples: 272573952. Policy #0 lag: (min: 47.0, avg: 145.2, max: 303.0) [2024-06-15 18:30:45,738][1648984] Avg episode reward: [(0, '-0.200')] [2024-06-15 18:30:46,632][1651340] Signal inference workers to stop experience collection... (27350 times) [2024-06-15 18:30:46,667][1652475] InferenceWorker_p0-w0: stopping experience collection (27350 times) [2024-06-15 18:30:46,889][1651340] Signal inference workers to resume experience collection... (27350 times) [2024-06-15 18:30:46,889][1652475] InferenceWorker_p0-w0: resuming experience collection (27350 times) [2024-06-15 18:30:47,775][1652475] Updated weights for policy 0, policy_version 532304 (0.0012) [2024-06-15 18:30:50,738][1648984] Fps is (10 sec: 52429.1, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 1090256896. Throughput: 0: 10501.7. Samples: 272598016. Policy #0 lag: (min: 47.0, avg: 145.2, max: 303.0) [2024-06-15 18:30:50,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 18:30:55,090][1652475] Updated weights for policy 0, policy_version 532384 (0.0011) [2024-06-15 18:30:55,738][1648984] Fps is (10 sec: 39320.5, 60 sec: 41505.9, 300 sec: 43653.6). Total num frames: 1090387968. Throughput: 0: 10661.3. Samples: 272672768. Policy #0 lag: (min: 47.0, avg: 145.2, max: 303.0) [2024-06-15 18:30:55,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 18:30:55,742][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000532416_1090387968.pth... 
[2024-06-15 18:30:55,796][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000527360_1080033280.pth [2024-06-15 18:30:55,801][1651340] Saving a milestone train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/milestones/checkpoint_000532416_1090387968.pth [2024-06-15 18:30:57,044][1652475] Updated weights for policy 0, policy_version 532434 (0.0018) [2024-06-15 18:30:59,085][1652475] Updated weights for policy 0, policy_version 532512 (0.0014) [2024-06-15 18:31:00,738][1648984] Fps is (10 sec: 45874.9, 60 sec: 43690.5, 300 sec: 43320.4). Total num frames: 1090715648. Throughput: 0: 10296.9. Samples: 272722944. Policy #0 lag: (min: 47.0, avg: 145.2, max: 303.0) [2024-06-15 18:31:00,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 18:31:00,910][1652475] Updated weights for policy 0, policy_version 532581 (0.0014) [2024-06-15 18:31:05,738][1648984] Fps is (10 sec: 39322.9, 60 sec: 40413.9, 300 sec: 43542.6). Total num frames: 1090781184. Throughput: 0: 10293.0. Samples: 272754176. Policy #0 lag: (min: 47.0, avg: 145.2, max: 303.0) [2024-06-15 18:31:05,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 18:31:07,596][1652475] Updated weights for policy 0, policy_version 532624 (0.0012) [2024-06-15 18:31:09,812][1652475] Updated weights for policy 0, policy_version 532688 (0.0024) [2024-06-15 18:31:10,738][1648984] Fps is (10 sec: 29491.4, 60 sec: 42599.6, 300 sec: 43431.6). Total num frames: 1091010560. Throughput: 0: 10706.5. Samples: 272827392. Policy #0 lag: (min: 47.0, avg: 145.2, max: 303.0) [2024-06-15 18:31:10,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 18:31:12,224][1652475] Updated weights for policy 0, policy_version 532787 (0.0036) [2024-06-15 18:31:14,137][1652475] Updated weights for policy 0, policy_version 532864 (0.0013) [2024-06-15 18:31:15,738][1648984] Fps is (10 sec: 52428.3, 60 sec: 43690.6, 300 sec: 43543.0). Total num frames: 1091305472. Throughput: 0: 10217.2. 
Samples: 272876544. Policy #0 lag: (min: 47.0, avg: 145.2, max: 303.0) [2024-06-15 18:31:15,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 18:31:20,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 40982.3, 300 sec: 43320.4). Total num frames: 1091403776. Throughput: 0: 10365.2. Samples: 272915968. Policy #0 lag: (min: 47.0, avg: 145.2, max: 303.0) [2024-06-15 18:31:20,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 18:31:20,971][1652475] Updated weights for policy 0, policy_version 532925 (0.0099) [2024-06-15 18:31:23,265][1652475] Updated weights for policy 0, policy_version 532988 (0.0014) [2024-06-15 18:31:24,766][1652475] Updated weights for policy 0, policy_version 533040 (0.0013) [2024-06-15 18:31:25,738][1648984] Fps is (10 sec: 42598.6, 60 sec: 42052.3, 300 sec: 43431.5). Total num frames: 1091731456. Throughput: 0: 10240.0. Samples: 272976896. Policy #0 lag: (min: 47.0, avg: 145.2, max: 303.0) [2024-06-15 18:31:25,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 18:31:26,528][1652475] Updated weights for policy 0, policy_version 533120 (0.0014) [2024-06-15 18:31:30,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 41506.1, 300 sec: 43431.5). Total num frames: 1091829760. Throughput: 0: 10353.8. Samples: 273039872. Policy #0 lag: (min: 47.0, avg: 145.2, max: 303.0) [2024-06-15 18:31:30,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 18:31:33,849][1651340] Signal inference workers to stop experience collection... (27400 times) [2024-06-15 18:31:33,912][1652475] InferenceWorker_p0-w0: stopping experience collection (27400 times) [2024-06-15 18:31:34,141][1651340] Signal inference workers to resume experience collection... 
(27400 times) [2024-06-15 18:31:34,142][1652475] InferenceWorker_p0-w0: resuming experience collection (27400 times) [2024-06-15 18:31:34,144][1652475] Updated weights for policy 0, policy_version 533200 (0.0014) [2024-06-15 18:31:35,744][1648984] Fps is (10 sec: 36020.7, 60 sec: 41501.6, 300 sec: 43208.4). Total num frames: 1092091904. Throughput: 0: 10534.2. Samples: 273072128. Policy #0 lag: (min: 47.0, avg: 145.2, max: 303.0) [2024-06-15 18:31:35,745][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 18:31:37,174][1652475] Updated weights for policy 0, policy_version 533280 (0.0034) [2024-06-15 18:31:38,822][1652475] Updated weights for policy 0, policy_version 533314 (0.0028) [2024-06-15 18:31:39,753][1652475] Updated weights for policy 0, policy_version 533368 (0.0013) [2024-06-15 18:31:40,740][1648984] Fps is (10 sec: 52428.7, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 1092354048. Throughput: 0: 10160.4. Samples: 273129984. Policy #0 lag: (min: 47.0, avg: 145.2, max: 303.0) [2024-06-15 18:31:40,741][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 18:31:45,073][1652475] Updated weights for policy 0, policy_version 533429 (0.0016) [2024-06-15 18:31:45,738][1648984] Fps is (10 sec: 39347.8, 60 sec: 41506.1, 300 sec: 43098.2). Total num frames: 1092485120. Throughput: 0: 10581.3. Samples: 273199104. Policy #0 lag: (min: 47.0, avg: 145.2, max: 303.0) [2024-06-15 18:31:45,738][1648984] Avg episode reward: [(0, '-0.530')] [2024-06-15 18:31:46,222][1652475] Updated weights for policy 0, policy_version 533472 (0.0020) [2024-06-15 18:31:48,135][1652475] Updated weights for policy 0, policy_version 533505 (0.0023) [2024-06-15 18:31:50,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 41506.1, 300 sec: 43098.3). Total num frames: 1092747264. Throughput: 0: 10615.4. Samples: 273231872. 
Policy #0 lag: (min: 47.0, avg: 145.2, max: 303.0) [2024-06-15 18:31:50,738][1648984] Avg episode reward: [(0, '-0.560')] [2024-06-15 18:31:51,661][1652475] Updated weights for policy 0, policy_version 533569 (0.0015) [2024-06-15 18:31:55,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 41506.3, 300 sec: 43320.4). Total num frames: 1092878336. Throughput: 0: 10353.7. Samples: 273293312. Policy #0 lag: (min: 30.0, avg: 152.3, max: 286.0) [2024-06-15 18:31:55,738][1648984] Avg episode reward: [(0, '-0.520')] [2024-06-15 18:31:56,201][1652475] Updated weights for policy 0, policy_version 533636 (0.0013) [2024-06-15 18:31:57,377][1652475] Updated weights for policy 0, policy_version 533687 (0.0039) [2024-06-15 18:31:58,571][1652475] Updated weights for policy 0, policy_version 533716 (0.0012) [2024-06-15 18:32:00,739][1648984] Fps is (10 sec: 39318.4, 60 sec: 40413.3, 300 sec: 43098.1). Total num frames: 1093140480. Throughput: 0: 10683.5. Samples: 273357312. Policy #0 lag: (min: 30.0, avg: 152.3, max: 286.0) [2024-06-15 18:32:00,739][1648984] Avg episode reward: [(0, '-0.540')] [2024-06-15 18:32:01,163][1652475] Updated weights for policy 0, policy_version 533792 (0.0077) [2024-06-15 18:32:04,749][1652475] Updated weights for policy 0, policy_version 533881 (0.0014) [2024-06-15 18:32:05,738][1648984] Fps is (10 sec: 52429.5, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 1093402624. Throughput: 0: 10513.1. Samples: 273389056. Policy #0 lag: (min: 30.0, avg: 152.3, max: 286.0) [2024-06-15 18:32:05,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 18:32:09,536][1652475] Updated weights for policy 0, policy_version 533936 (0.0018) [2024-06-15 18:32:10,738][1648984] Fps is (10 sec: 39324.9, 60 sec: 42052.2, 300 sec: 43098.3). Total num frames: 1093533696. Throughput: 0: 10547.2. Samples: 273451520. 
Policy #0 lag: (min: 30.0, avg: 152.3, max: 286.0) [2024-06-15 18:32:10,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 18:32:11,915][1652475] Updated weights for policy 0, policy_version 533971 (0.0011) [2024-06-15 18:32:13,996][1652475] Updated weights for policy 0, policy_version 534080 (0.0018) [2024-06-15 18:32:15,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 42052.3, 300 sec: 43209.3). Total num frames: 1093828608. Throughput: 0: 10626.8. Samples: 273518080. Policy #0 lag: (min: 30.0, avg: 152.3, max: 286.0) [2024-06-15 18:32:15,738][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 18:32:20,637][1652475] Updated weights for policy 0, policy_version 534146 (0.0013) [2024-06-15 18:32:20,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 42052.3, 300 sec: 43098.3). Total num frames: 1093926912. Throughput: 0: 10560.2. Samples: 273547264. Policy #0 lag: (min: 30.0, avg: 152.3, max: 286.0) [2024-06-15 18:32:20,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 18:32:24,049][1651340] Signal inference workers to stop experience collection... (27450 times) [2024-06-15 18:32:24,226][1652475] InferenceWorker_p0-w0: stopping experience collection (27450 times) [2024-06-15 18:32:24,253][1652475] Updated weights for policy 0, policy_version 534219 (0.0013) [2024-06-15 18:32:24,322][1651340] Signal inference workers to resume experience collection... (27450 times) [2024-06-15 18:32:24,323][1652475] InferenceWorker_p0-w0: resuming experience collection (27450 times) [2024-06-15 18:32:25,522][1652475] Updated weights for policy 0, policy_version 534277 (0.0015) [2024-06-15 18:32:25,738][1648984] Fps is (10 sec: 39319.6, 60 sec: 41505.8, 300 sec: 42765.0). Total num frames: 1094221824. Throughput: 0: 10877.0. Samples: 273619456. 
Policy #0 lag: (min: 30.0, avg: 152.3, max: 286.0) [2024-06-15 18:32:25,739][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 18:32:27,191][1652475] Updated weights for policy 0, policy_version 534353 (0.0200) [2024-06-15 18:32:30,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 1094451200. Throughput: 0: 10513.1. Samples: 273672192. Policy #0 lag: (min: 30.0, avg: 152.3, max: 286.0) [2024-06-15 18:32:30,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 18:32:35,257][1652475] Updated weights for policy 0, policy_version 534417 (0.0013) [2024-06-15 18:32:35,738][1648984] Fps is (10 sec: 29492.6, 60 sec: 40418.4, 300 sec: 42542.9). Total num frames: 1094516736. Throughput: 0: 10661.0. Samples: 273711616. Policy #0 lag: (min: 30.0, avg: 152.3, max: 286.0) [2024-06-15 18:32:35,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 18:32:36,719][1652475] Updated weights for policy 0, policy_version 534482 (0.0012) [2024-06-15 18:32:38,885][1652475] Updated weights for policy 0, policy_version 534576 (0.0015) [2024-06-15 18:32:40,453][1652475] Updated weights for policy 0, policy_version 534651 (0.0136) [2024-06-15 18:32:40,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 43690.7, 300 sec: 43542.8). Total num frames: 1094975488. Throughput: 0: 10547.2. Samples: 273767936. Policy #0 lag: (min: 30.0, avg: 152.3, max: 286.0) [2024-06-15 18:32:40,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 18:32:45,738][1648984] Fps is (10 sec: 45875.5, 60 sec: 41506.2, 300 sec: 42876.1). Total num frames: 1094975488. Throughput: 0: 10740.8. Samples: 273840640. 
Policy #0 lag: (min: 30.0, avg: 152.3, max: 286.0) [2024-06-15 18:32:45,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 18:32:47,980][1652475] Updated weights for policy 0, policy_version 534713 (0.0015) [2024-06-15 18:32:49,004][1652475] Updated weights for policy 0, policy_version 534756 (0.0012) [2024-06-15 18:32:50,541][1652475] Updated weights for policy 0, policy_version 534836 (0.0013) [2024-06-15 18:32:50,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 43690.7, 300 sec: 43098.3). Total num frames: 1095368704. Throughput: 0: 10888.5. Samples: 273879040. Policy #0 lag: (min: 30.0, avg: 152.3, max: 286.0) [2024-06-15 18:32:50,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 18:32:55,738][1648984] Fps is (10 sec: 52428.4, 60 sec: 43690.7, 300 sec: 43431.5). Total num frames: 1095499776. Throughput: 0: 10888.5. Samples: 273941504. Policy #0 lag: (min: 30.0, avg: 152.3, max: 286.0) [2024-06-15 18:32:55,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 18:32:55,744][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000534912_1095499776.pth... [2024-06-15 18:32:55,828][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000529888_1085210624.pth [2024-06-15 18:32:59,143][1652475] Updated weights for policy 0, policy_version 534928 (0.0161) [2024-06-15 18:33:00,738][1648984] Fps is (10 sec: 29491.3, 60 sec: 42052.9, 300 sec: 42765.0). Total num frames: 1095663616. Throughput: 0: 11013.7. Samples: 274013696. Policy #0 lag: (min: 30.0, avg: 152.3, max: 286.0) [2024-06-15 18:33:00,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 18:33:01,533][1652475] Updated weights for policy 0, policy_version 535031 (0.0015) [2024-06-15 18:33:02,174][1651340] Signal inference workers to stop experience collection... 
(27500 times) [2024-06-15 18:33:02,233][1652475] InferenceWorker_p0-w0: stopping experience collection (27500 times) [2024-06-15 18:33:02,423][1651340] Signal inference workers to resume experience collection... (27500 times) [2024-06-15 18:33:02,424][1652475] InferenceWorker_p0-w0: resuming experience collection (27500 times) [2024-06-15 18:33:03,348][1652475] Updated weights for policy 0, policy_version 535107 (0.0150) [2024-06-15 18:33:05,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 43690.6, 300 sec: 43542.6). Total num frames: 1096024064. Throughput: 0: 10854.4. Samples: 274035712. Policy #0 lag: (min: 30.0, avg: 152.3, max: 286.0) [2024-06-15 18:33:05,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 18:33:10,738][1648984] Fps is (10 sec: 36044.7, 60 sec: 41506.1, 300 sec: 42653.9). Total num frames: 1096024064. Throughput: 0: 10843.1. Samples: 274107392. Policy #0 lag: (min: 30.0, avg: 152.3, max: 286.0) [2024-06-15 18:33:10,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 18:33:11,209][1652475] Updated weights for policy 0, policy_version 535175 (0.0013) [2024-06-15 18:33:13,130][1652475] Updated weights for policy 0, policy_version 535264 (0.0014) [2024-06-15 18:33:14,760][1652475] Updated weights for policy 0, policy_version 535330 (0.0014) [2024-06-15 18:33:15,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 43144.5, 300 sec: 43098.2). Total num frames: 1096417280. Throughput: 0: 10922.7. Samples: 274163712. Policy #0 lag: (min: 30.0, avg: 152.3, max: 286.0) [2024-06-15 18:33:15,738][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 18:33:16,674][1652475] Updated weights for policy 0, policy_version 535418 (0.0015) [2024-06-15 18:33:20,738][1648984] Fps is (10 sec: 52428.4, 60 sec: 43690.6, 300 sec: 43098.2). Total num frames: 1096548352. Throughput: 0: 10717.9. Samples: 274193920. 
Policy #0 lag: (min: 30.0, avg: 152.3, max: 286.0) [2024-06-15 18:33:20,739][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 18:33:24,147][1652475] Updated weights for policy 0, policy_version 535472 (0.0094) [2024-06-15 18:33:25,738][1648984] Fps is (10 sec: 36044.6, 60 sec: 42598.7, 300 sec: 43098.2). Total num frames: 1096777728. Throughput: 0: 11195.7. Samples: 274271744. Policy #0 lag: (min: 15.0, avg: 63.1, max: 268.0) [2024-06-15 18:33:25,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 18:33:26,696][1652475] Updated weights for policy 0, policy_version 535572 (0.0015) [2024-06-15 18:33:28,481][1652475] Updated weights for policy 0, policy_version 535634 (0.0014) [2024-06-15 18:33:30,738][1648984] Fps is (10 sec: 52429.2, 60 sec: 43690.6, 300 sec: 43542.6). Total num frames: 1097072640. Throughput: 0: 10808.9. Samples: 274327040. Policy #0 lag: (min: 15.0, avg: 63.1, max: 268.0) [2024-06-15 18:33:30,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 18:33:34,782][1652475] Updated weights for policy 0, policy_version 535681 (0.0036) [2024-06-15 18:33:35,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 44236.8, 300 sec: 42987.2). Total num frames: 1097170944. Throughput: 0: 10831.6. Samples: 274366464. Policy #0 lag: (min: 15.0, avg: 63.1, max: 268.0) [2024-06-15 18:33:35,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 18:33:36,820][1652475] Updated weights for policy 0, policy_version 535778 (0.0077) [2024-06-15 18:33:38,428][1652475] Updated weights for policy 0, policy_version 535856 (0.0014) [2024-06-15 18:33:40,738][1648984] Fps is (10 sec: 45875.4, 60 sec: 42598.4, 300 sec: 43320.4). Total num frames: 1097531392. Throughput: 0: 10774.8. Samples: 274426368. 
Policy #0 lag: (min: 15.0, avg: 63.1, max: 268.0) [2024-06-15 18:33:40,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 18:33:41,034][1652475] Updated weights for policy 0, policy_version 535920 (0.0097) [2024-06-15 18:33:45,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 43690.6, 300 sec: 42653.9). Total num frames: 1097596928. Throughput: 0: 10729.2. Samples: 274496512. Policy #0 lag: (min: 15.0, avg: 63.1, max: 268.0) [2024-06-15 18:33:45,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 18:33:47,110][1651340] Signal inference workers to stop experience collection... (27550 times) [2024-06-15 18:33:47,188][1652475] InferenceWorker_p0-w0: stopping experience collection (27550 times) [2024-06-15 18:33:47,190][1652475] Updated weights for policy 0, policy_version 535957 (0.0013) [2024-06-15 18:33:47,318][1651340] Signal inference workers to resume experience collection... (27550 times) [2024-06-15 18:33:47,319][1652475] InferenceWorker_p0-w0: resuming experience collection (27550 times) [2024-06-15 18:33:48,570][1652475] Updated weights for policy 0, policy_version 536024 (0.0013) [2024-06-15 18:33:50,429][1652475] Updated weights for policy 0, policy_version 536112 (0.0119) [2024-06-15 18:33:50,738][1648984] Fps is (10 sec: 45873.5, 60 sec: 43690.4, 300 sec: 43542.5). Total num frames: 1097990144. Throughput: 0: 10945.3. Samples: 274528256. Policy #0 lag: (min: 15.0, avg: 63.1, max: 268.0) [2024-06-15 18:33:50,739][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 18:33:52,837][1652475] Updated weights for policy 0, policy_version 536182 (0.0026) [2024-06-15 18:33:55,738][1648984] Fps is (10 sec: 52428.5, 60 sec: 43690.6, 300 sec: 42876.1). Total num frames: 1098121216. Throughput: 0: 10638.2. Samples: 274586112. 
Policy #0 lag: (min: 15.0, avg: 63.1, max: 268.0) [2024-06-15 18:33:55,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 18:33:58,891][1652475] Updated weights for policy 0, policy_version 536211 (0.0013) [2024-06-15 18:34:00,675][1652475] Updated weights for policy 0, policy_version 536313 (0.0017) [2024-06-15 18:34:00,738][1648984] Fps is (10 sec: 36046.0, 60 sec: 44782.9, 300 sec: 43209.4). Total num frames: 1098350592. Throughput: 0: 10899.9. Samples: 274654208. Policy #0 lag: (min: 15.0, avg: 63.1, max: 268.0) [2024-06-15 18:34:00,738][1648984] Avg episode reward: [(0, '-0.230')] [2024-06-15 18:34:03,480][1652475] Updated weights for policy 0, policy_version 536368 (0.0016) [2024-06-15 18:34:05,479][1652475] Updated weights for policy 0, policy_version 536448 (0.0013) [2024-06-15 18:34:05,738][1648984] Fps is (10 sec: 52429.3, 60 sec: 43690.7, 300 sec: 43320.4). Total num frames: 1098645504. Throughput: 0: 10934.1. Samples: 274685952. Policy #0 lag: (min: 15.0, avg: 63.1, max: 268.0) [2024-06-15 18:34:05,741][1648984] Avg episode reward: [(0, '-0.230')] [2024-06-15 18:34:10,738][1648984] Fps is (10 sec: 29491.2, 60 sec: 43690.7, 300 sec: 42653.9). Total num frames: 1098645504. Throughput: 0: 10570.0. Samples: 274747392. Policy #0 lag: (min: 15.0, avg: 63.1, max: 268.0) [2024-06-15 18:34:10,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 18:34:12,797][1652475] Updated weights for policy 0, policy_version 536532 (0.0017) [2024-06-15 18:34:15,738][1648984] Fps is (10 sec: 32768.2, 60 sec: 42598.5, 300 sec: 42876.1). Total num frames: 1098973184. Throughput: 0: 10763.4. Samples: 274811392. 
Policy #0 lag: (min: 15.0, avg: 63.1, max: 268.0) [2024-06-15 18:34:15,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 18:34:15,739][1652475] Updated weights for policy 0, policy_version 536608 (0.0012) [2024-06-15 18:34:17,259][1652475] Updated weights for policy 0, policy_version 536672 (0.0014) [2024-06-15 18:34:20,738][1648984] Fps is (10 sec: 52428.5, 60 sec: 43690.7, 300 sec: 42765.0). Total num frames: 1099169792. Throughput: 0: 10410.7. Samples: 274834944. Policy #0 lag: (min: 15.0, avg: 63.1, max: 268.0) [2024-06-15 18:34:20,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 18:34:25,661][1652475] Updated weights for policy 0, policy_version 536768 (0.0015) [2024-06-15 18:34:25,738][1648984] Fps is (10 sec: 32766.6, 60 sec: 42052.1, 300 sec: 42209.6). Total num frames: 1099300864. Throughput: 0: 10808.8. Samples: 274912768. Policy #0 lag: (min: 15.0, avg: 63.1, max: 268.0) [2024-06-15 18:34:25,739][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 18:34:27,696][1652475] Updated weights for policy 0, policy_version 536848 (0.0014) [2024-06-15 18:34:27,896][1651340] Signal inference workers to stop experience collection... (27600 times) [2024-06-15 18:34:27,984][1652475] InferenceWorker_p0-w0: stopping experience collection (27600 times) [2024-06-15 18:34:28,196][1651340] Signal inference workers to resume experience collection... (27600 times) [2024-06-15 18:34:28,202][1652475] InferenceWorker_p0-w0: resuming experience collection (27600 times) [2024-06-15 18:34:30,551][1652475] Updated weights for policy 0, policy_version 536944 (0.0076) [2024-06-15 18:34:30,738][1648984] Fps is (10 sec: 49152.0, 60 sec: 43144.5, 300 sec: 42987.2). Total num frames: 1099661312. Throughput: 0: 10194.5. Samples: 274955264. Policy #0 lag: (min: 15.0, avg: 63.1, max: 268.0) [2024-06-15 18:34:30,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 18:34:35,738][1648984] Fps is (10 sec: 39322.9, 60 sec: 42052.3, 300 sec: 42209.7). 
Total num frames: 1099694080. Throughput: 0: 10297.0. Samples: 274991616. Policy #0 lag: (min: 15.0, avg: 63.1, max: 268.0) [2024-06-15 18:34:35,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 18:34:38,135][1652475] Updated weights for policy 0, policy_version 537013 (0.0014) [2024-06-15 18:34:39,694][1652475] Updated weights for policy 0, policy_version 537072 (0.0013) [2024-06-15 18:34:40,738][1648984] Fps is (10 sec: 32768.0, 60 sec: 40959.9, 300 sec: 42431.8). Total num frames: 1099988992. Throughput: 0: 10456.2. Samples: 275056640. Policy #0 lag: (min: 15.0, avg: 63.1, max: 268.0) [2024-06-15 18:34:40,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 18:34:41,523][1652475] Updated weights for policy 0, policy_version 537142 (0.0015) [2024-06-15 18:34:42,560][1652475] Updated weights for policy 0, policy_version 537184 (0.0014) [2024-06-15 18:34:45,738][1648984] Fps is (10 sec: 52427.1, 60 sec: 43690.4, 300 sec: 42653.9). Total num frames: 1100218368. Throughput: 0: 10262.7. Samples: 275116032. Policy #0 lag: (min: 15.0, avg: 63.1, max: 268.0) [2024-06-15 18:34:45,739][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 18:34:50,738][1648984] Fps is (10 sec: 26214.7, 60 sec: 37683.5, 300 sec: 41876.4). Total num frames: 1100251136. Throughput: 0: 10422.1. Samples: 275154944. Policy #0 lag: (min: 15.0, avg: 63.1, max: 268.0) [2024-06-15 18:34:50,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 18:34:50,989][1652475] Updated weights for policy 0, policy_version 537254 (0.0016) [2024-06-15 18:34:52,949][1652475] Updated weights for policy 0, policy_version 537344 (0.0014) [2024-06-15 18:34:54,614][1652475] Updated weights for policy 0, policy_version 537411 (0.0012) [2024-06-15 18:34:55,738][1648984] Fps is (10 sec: 52430.3, 60 sec: 43690.7, 300 sec: 42876.1). Total num frames: 1100742656. Throughput: 0: 10262.7. Samples: 275209216. 
Policy #0 lag: (min: 117.0, avg: 171.9, max: 373.0) [2024-06-15 18:34:55,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 18:34:55,744][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000537472_1100742656.pth... [2024-06-15 18:34:55,854][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000532416_1090387968.pth [2024-06-15 18:35:00,738][1648984] Fps is (10 sec: 49150.8, 60 sec: 39867.6, 300 sec: 41987.4). Total num frames: 1100742656. Throughput: 0: 10467.5. Samples: 275282432. Policy #0 lag: (min: 117.0, avg: 171.9, max: 373.0) [2024-06-15 18:35:00,738][1648984] Avg episode reward: [(0, '-0.250')] [2024-06-15 18:35:01,943][1652475] Updated weights for policy 0, policy_version 537488 (0.0054) [2024-06-15 18:35:04,106][1652475] Updated weights for policy 0, policy_version 537568 (0.0190) [2024-06-15 18:35:05,644][1652475] Updated weights for policy 0, policy_version 537620 (0.0015) [2024-06-15 18:35:05,738][1648984] Fps is (10 sec: 29491.5, 60 sec: 39867.7, 300 sec: 42654.2). Total num frames: 1101037568. Throughput: 0: 10672.4. Samples: 275315200. Policy #0 lag: (min: 117.0, avg: 171.9, max: 373.0) [2024-06-15 18:35:05,738][1648984] Avg episode reward: [(0, '-0.250')] [2024-06-15 18:35:06,979][1652475] Updated weights for policy 0, policy_version 537696 (0.0016) [2024-06-15 18:35:10,738][1648984] Fps is (10 sec: 52429.5, 60 sec: 43690.6, 300 sec: 42653.9). Total num frames: 1101266944. Throughput: 0: 10205.9. Samples: 275372032. Policy #0 lag: (min: 117.0, avg: 171.9, max: 373.0) [2024-06-15 18:35:10,738][1648984] Avg episode reward: [(0, '-0.220')] [2024-06-15 18:35:13,257][1651340] Signal inference workers to stop experience collection... (27650 times) [2024-06-15 18:35:13,325][1652475] InferenceWorker_p0-w0: stopping experience collection (27650 times) [2024-06-15 18:35:13,579][1651340] Signal inference workers to resume experience collection... 
(27650 times) [2024-06-15 18:35:13,579][1652475] InferenceWorker_p0-w0: resuming experience collection (27650 times) [2024-06-15 18:35:13,770][1652475] Updated weights for policy 0, policy_version 537763 (0.0015) [2024-06-15 18:35:15,738][1648984] Fps is (10 sec: 45875.1, 60 sec: 42052.2, 300 sec: 42547.6). Total num frames: 1101496320. Throughput: 0: 10854.4. Samples: 275443712. Policy #0 lag: (min: 117.0, avg: 171.9, max: 373.0) [2024-06-15 18:35:15,738][1648984] Avg episode reward: [(0, '-0.210')] [2024-06-15 18:35:15,865][1652475] Updated weights for policy 0, policy_version 537856 (0.0018) [2024-06-15 18:35:18,256][1652475] Updated weights for policy 0, policy_version 537916 (0.0013) [2024-06-15 18:35:19,743][1652475] Updated weights for policy 0, policy_version 537979 (0.0015) [2024-06-15 18:35:20,738][1648984] Fps is (10 sec: 52429.1, 60 sec: 43690.7, 300 sec: 42654.0). Total num frames: 1101791232. Throughput: 0: 10774.8. Samples: 275476480. Policy #0 lag: (min: 117.0, avg: 171.9, max: 373.0) [2024-06-15 18:35:20,738][1648984] Avg episode reward: [(0, '-0.230')] [2024-06-15 18:35:25,162][1652475] Updated weights for policy 0, policy_version 538032 (0.0016) [2024-06-15 18:35:25,738][1648984] Fps is (10 sec: 42598.6, 60 sec: 43691.0, 300 sec: 42653.9). Total num frames: 1101922304. Throughput: 0: 11025.1. Samples: 275552768. Policy #0 lag: (min: 117.0, avg: 171.9, max: 373.0) [2024-06-15 18:35:25,738][1648984] Avg episode reward: [(0, '-0.220')] [2024-06-15 18:35:26,323][1652475] Updated weights for policy 0, policy_version 538080 (0.0021) [2024-06-15 18:35:30,449][1652475] Updated weights for policy 0, policy_version 538146 (0.0027) [2024-06-15 18:35:30,740][1648984] Fps is (10 sec: 32759.5, 60 sec: 40958.3, 300 sec: 42431.4). Total num frames: 1102118912. Throughput: 0: 11024.5. Samples: 275612160. 
Policy #0 lag: (min: 117.0, avg: 171.9, max: 373.0) [2024-06-15 18:35:30,741][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 18:35:32,251][1652475] Updated weights for policy 0, policy_version 538211 (0.0013) [2024-06-15 18:35:35,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 43690.7, 300 sec: 42654.0). Total num frames: 1102315520. Throughput: 0: 10706.5. Samples: 275636736. Policy #0 lag: (min: 117.0, avg: 171.9, max: 373.0) [2024-06-15 18:35:35,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 18:35:36,054][1652475] Updated weights for policy 0, policy_version 538257 (0.0015) [2024-06-15 18:35:37,696][1652475] Updated weights for policy 0, policy_version 538320 (0.0014) [2024-06-15 18:35:40,738][1648984] Fps is (10 sec: 45887.1, 60 sec: 43144.6, 300 sec: 42653.9). Total num frames: 1102577664. Throughput: 0: 11047.8. Samples: 275706368. Policy #0 lag: (min: 117.0, avg: 171.9, max: 373.0) [2024-06-15 18:35:40,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 18:35:42,953][1652475] Updated weights for policy 0, policy_version 538423 (0.0014) [2024-06-15 18:35:45,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 42052.5, 300 sec: 42320.7). Total num frames: 1102741504. Throughput: 0: 10877.2. Samples: 275771904. Policy #0 lag: (min: 117.0, avg: 171.9, max: 373.0) [2024-06-15 18:35:45,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 18:35:46,393][1652475] Updated weights for policy 0, policy_version 538486 (0.0012) [2024-06-15 18:35:48,052][1652475] Updated weights for policy 0, policy_version 538515 (0.0012) [2024-06-15 18:35:50,006][1652475] Updated weights for policy 0, policy_version 538611 (0.0085) [2024-06-15 18:35:50,740][1648984] Fps is (10 sec: 52428.5, 60 sec: 47513.5, 300 sec: 43098.3). Total num frames: 1103101952. Throughput: 0: 10877.1. Samples: 275804672. 
Policy #0 lag: (min: 117.0, avg: 171.9, max: 373.0) [2024-06-15 18:35:50,741][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 18:35:54,525][1652475] Updated weights for policy 0, policy_version 538661 (0.0012) [2024-06-15 18:35:55,738][1648984] Fps is (10 sec: 49151.1, 60 sec: 41506.1, 300 sec: 42431.8). Total num frames: 1103233024. Throughput: 0: 10979.5. Samples: 275866112. Policy #0 lag: (min: 117.0, avg: 171.9, max: 373.0) [2024-06-15 18:35:55,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 18:35:58,237][1651340] Signal inference workers to stop experience collection... (27700 times) [2024-06-15 18:35:58,302][1652475] InferenceWorker_p0-w0: stopping experience collection (27700 times) [2024-06-15 18:35:58,304][1652475] Updated weights for policy 0, policy_version 538708 (0.0060) [2024-06-15 18:35:58,554][1651340] Signal inference workers to resume experience collection... (27700 times) [2024-06-15 18:35:58,558][1652475] InferenceWorker_p0-w0: resuming experience collection (27700 times) [2024-06-15 18:36:00,243][1652475] Updated weights for policy 0, policy_version 538769 (0.0012) [2024-06-15 18:36:00,738][1648984] Fps is (10 sec: 32768.4, 60 sec: 44783.1, 300 sec: 42876.1). Total num frames: 1103429632. Throughput: 0: 10968.2. Samples: 275937280. Policy #0 lag: (min: 117.0, avg: 171.9, max: 373.0) [2024-06-15 18:36:00,738][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 18:36:02,165][1652475] Updated weights for policy 0, policy_version 538864 (0.0013) [2024-06-15 18:36:05,738][1648984] Fps is (10 sec: 45875.6, 60 sec: 44236.7, 300 sec: 42987.2). Total num frames: 1103691776. Throughput: 0: 10797.5. Samples: 275962368. 
Policy #0 lag: (min: 117.0, avg: 171.9, max: 373.0) [2024-06-15 18:36:05,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 18:36:06,399][1652475] Updated weights for policy 0, policy_version 538934 (0.0014) [2024-06-15 18:36:10,327][1652475] Updated weights for policy 0, policy_version 538977 (0.0114) [2024-06-15 18:36:10,738][1648984] Fps is (10 sec: 42597.2, 60 sec: 43144.4, 300 sec: 42542.8). Total num frames: 1103855616. Throughput: 0: 10729.2. Samples: 276035584. Policy #0 lag: (min: 117.0, avg: 171.9, max: 373.0) [2024-06-15 18:36:10,739][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 18:36:11,529][1652475] Updated weights for policy 0, policy_version 539024 (0.0026) [2024-06-15 18:36:13,920][1652475] Updated weights for policy 0, policy_version 539076 (0.0015) [2024-06-15 18:36:15,738][1648984] Fps is (10 sec: 45874.8, 60 sec: 44236.7, 300 sec: 43209.3). Total num frames: 1104150528. Throughput: 0: 10752.6. Samples: 276096000. Policy #0 lag: (min: 117.0, avg: 171.9, max: 373.0) [2024-06-15 18:36:15,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 18:36:17,468][1652475] Updated weights for policy 0, policy_version 539168 (0.0023) [2024-06-15 18:36:20,738][1648984] Fps is (10 sec: 42599.1, 60 sec: 41506.1, 300 sec: 42542.9). Total num frames: 1104281600. Throughput: 0: 10888.5. Samples: 276126720. Policy #0 lag: (min: 117.0, avg: 171.9, max: 373.0) [2024-06-15 18:36:20,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 18:36:21,765][1652475] Updated weights for policy 0, policy_version 539221 (0.0019) [2024-06-15 18:36:23,111][1652475] Updated weights for policy 0, policy_version 539283 (0.0012) [2024-06-15 18:36:25,738][1648984] Fps is (10 sec: 39321.9, 60 sec: 43690.6, 300 sec: 43098.2). Total num frames: 1104543744. Throughput: 0: 10820.2. Samples: 276193280. 
Policy #0 lag: (min: 47.0, avg: 135.6, max: 303.0) [2024-06-15 18:36:25,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 18:36:27,537][1652475] Updated weights for policy 0, policy_version 539364 (0.0015) [2024-06-15 18:36:29,765][1652475] Updated weights for policy 0, policy_version 539443 (0.0093) [2024-06-15 18:36:30,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 44784.8, 300 sec: 43099.2). Total num frames: 1104805888. Throughput: 0: 10808.9. Samples: 276258304. Policy #0 lag: (min: 47.0, avg: 135.6, max: 303.0) [2024-06-15 18:36:30,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 18:36:34,117][1652475] Updated weights for policy 0, policy_version 539504 (0.0013) [2024-06-15 18:36:35,566][1652475] Updated weights for policy 0, policy_version 539579 (0.0015) [2024-06-15 18:36:35,738][1648984] Fps is (10 sec: 52429.4, 60 sec: 45875.2, 300 sec: 43098.3). Total num frames: 1105068032. Throughput: 0: 10968.2. Samples: 276298240. Policy #0 lag: (min: 47.0, avg: 135.6, max: 303.0) [2024-06-15 18:36:35,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 18:36:39,653][1652475] Updated weights for policy 0, policy_version 539635 (0.0013) [2024-06-15 18:36:40,738][1648984] Fps is (10 sec: 42598.6, 60 sec: 44236.8, 300 sec: 43209.3). Total num frames: 1105231872. Throughput: 0: 10956.8. Samples: 276359168. Policy #0 lag: (min: 47.0, avg: 135.6, max: 303.0) [2024-06-15 18:36:40,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 18:36:41,740][1652475] Updated weights for policy 0, policy_version 539707 (0.0021) [2024-06-15 18:36:45,580][1651340] Signal inference workers to stop experience collection... (27750 times) [2024-06-15 18:36:45,643][1652475] InferenceWorker_p0-w0: stopping experience collection (27750 times) [2024-06-15 18:36:45,738][1648984] Fps is (10 sec: 26214.4, 60 sec: 43144.5, 300 sec: 42653.9). Total num frames: 1105330176. Throughput: 0: 10808.9. Samples: 276423680. 
Policy #0 lag: (min: 47.0, avg: 135.6, max: 303.0) [2024-06-15 18:36:45,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 18:36:45,802][1651340] Signal inference workers to resume experience collection... (27750 times) [2024-06-15 18:36:45,803][1652475] InferenceWorker_p0-w0: resuming experience collection (27750 times) [2024-06-15 18:36:46,829][1652475] Updated weights for policy 0, policy_version 539769 (0.0017) [2024-06-15 18:36:50,335][1652475] Updated weights for policy 0, policy_version 539832 (0.0221) [2024-06-15 18:36:50,738][1648984] Fps is (10 sec: 36044.8, 60 sec: 41506.2, 300 sec: 43098.3). Total num frames: 1105592320. Throughput: 0: 10956.8. Samples: 276455424. Policy #0 lag: (min: 47.0, avg: 135.6, max: 303.0) [2024-06-15 18:36:50,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 18:36:51,809][1652475] Updated weights for policy 0, policy_version 539897 (0.0015) [2024-06-15 18:36:54,352][1652475] Updated weights for policy 0, policy_version 539953 (0.0019) [2024-06-15 18:36:55,741][1648984] Fps is (10 sec: 52413.6, 60 sec: 43688.7, 300 sec: 43098.0). Total num frames: 1105854464. Throughput: 0: 10546.6. Samples: 276510208. Policy #0 lag: (min: 47.0, avg: 135.6, max: 303.0) [2024-06-15 18:36:55,741][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 18:36:55,758][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000539968_1105854464.pth... [2024-06-15 18:36:55,799][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000534912_1095499776.pth [2024-06-15 18:36:58,792][1652475] Updated weights for policy 0, policy_version 540000 (0.0123) [2024-06-15 18:37:00,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 42598.3, 300 sec: 42653.9). Total num frames: 1105985536. Throughput: 0: 10729.3. Samples: 276578816. 
Policy #0 lag: (min: 47.0, avg: 135.6, max: 303.0) [2024-06-15 18:37:00,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 18:37:02,676][1652475] Updated weights for policy 0, policy_version 540090 (0.0015) [2024-06-15 18:37:04,430][1652475] Updated weights for policy 0, policy_version 540144 (0.0011) [2024-06-15 18:37:05,738][1648984] Fps is (10 sec: 42610.5, 60 sec: 43144.5, 300 sec: 43209.3). Total num frames: 1106280448. Throughput: 0: 10774.8. Samples: 276611584. Policy #0 lag: (min: 47.0, avg: 135.6, max: 303.0) [2024-06-15 18:37:05,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 18:37:06,540][1652475] Updated weights for policy 0, policy_version 540223 (0.0014) [2024-06-15 18:37:10,571][1652475] Updated weights for policy 0, policy_version 540288 (0.0013) [2024-06-15 18:37:10,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 44236.9, 300 sec: 42987.2). Total num frames: 1106509824. Throughput: 0: 10786.1. Samples: 276678656. Policy #0 lag: (min: 47.0, avg: 135.6, max: 303.0) [2024-06-15 18:37:10,738][1648984] Avg episode reward: [(0, '-0.540')] [2024-06-15 18:37:14,203][1652475] Updated weights for policy 0, policy_version 540346 (0.0011) [2024-06-15 18:37:15,738][1648984] Fps is (10 sec: 42598.8, 60 sec: 42598.6, 300 sec: 43320.4). Total num frames: 1106706432. Throughput: 0: 10763.4. Samples: 276742656. Policy #0 lag: (min: 47.0, avg: 135.6, max: 303.0) [2024-06-15 18:37:15,738][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 18:37:16,134][1652475] Updated weights for policy 0, policy_version 540400 (0.0014) [2024-06-15 18:37:19,886][1652475] Updated weights for policy 0, policy_version 540464 (0.0014) [2024-06-15 18:37:20,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 43690.7, 300 sec: 42987.2). Total num frames: 1106903040. Throughput: 0: 10592.7. Samples: 276774912. 
Policy #0 lag: (min: 47.0, avg: 135.6, max: 303.0) [2024-06-15 18:37:20,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 18:37:21,324][1652475] Updated weights for policy 0, policy_version 540512 (0.0012) [2024-06-15 18:37:25,749][1652475] Updated weights for policy 0, policy_version 540607 (0.0017) [2024-06-15 18:37:25,751][1648984] Fps is (10 sec: 42540.3, 60 sec: 43134.8, 300 sec: 42985.2). Total num frames: 1107132416. Throughput: 0: 10760.1. Samples: 276843520. Policy #0 lag: (min: 47.0, avg: 135.6, max: 303.0) [2024-06-15 18:37:25,752][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 18:37:28,365][1652475] Updated weights for policy 0, policy_version 540672 (0.0014) [2024-06-15 18:37:30,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 41506.1, 300 sec: 43320.4). Total num frames: 1107296256. Throughput: 0: 10706.5. Samples: 276905472. Policy #0 lag: (min: 47.0, avg: 135.6, max: 303.0) [2024-06-15 18:37:30,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 18:37:33,463][1651340] Signal inference workers to stop experience collection... (27800 times) [2024-06-15 18:37:33,485][1652475] InferenceWorker_p0-w0: stopping experience collection (27800 times) [2024-06-15 18:37:33,684][1651340] Signal inference workers to resume experience collection... (27800 times) [2024-06-15 18:37:33,686][1652475] InferenceWorker_p0-w0: resuming experience collection (27800 times) [2024-06-15 18:37:33,688][1652475] Updated weights for policy 0, policy_version 540752 (0.0015) [2024-06-15 18:37:35,738][1648984] Fps is (10 sec: 42656.4, 60 sec: 41506.1, 300 sec: 42653.9). Total num frames: 1107558400. Throughput: 0: 10763.4. Samples: 276939776. 
Policy #0 lag: (min: 47.0, avg: 135.6, max: 303.0) [2024-06-15 18:37:35,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 18:37:37,104][1652475] Updated weights for policy 0, policy_version 540819 (0.0013) [2024-06-15 18:37:39,267][1652475] Updated weights for policy 0, policy_version 540881 (0.0109) [2024-06-15 18:37:40,738][1648984] Fps is (10 sec: 52428.4, 60 sec: 43144.5, 300 sec: 43542.5). Total num frames: 1107820544. Throughput: 0: 10923.3. Samples: 277001728. Policy #0 lag: (min: 47.0, avg: 135.6, max: 303.0) [2024-06-15 18:37:40,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 18:37:45,385][1652475] Updated weights for policy 0, policy_version 540983 (0.0019) [2024-06-15 18:37:45,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 43690.7, 300 sec: 42653.9). Total num frames: 1107951616. Throughput: 0: 10695.1. Samples: 277060096. Policy #0 lag: (min: 47.0, avg: 135.6, max: 303.0) [2024-06-15 18:37:45,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 18:37:46,834][1652475] Updated weights for policy 0, policy_version 541040 (0.0085) [2024-06-15 18:37:50,545][1652475] Updated weights for policy 0, policy_version 541113 (0.0012) [2024-06-15 18:37:50,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 43690.6, 300 sec: 43098.2). Total num frames: 1108213760. Throughput: 0: 10706.5. Samples: 277093376. Policy #0 lag: (min: 47.0, avg: 135.6, max: 303.0) [2024-06-15 18:37:50,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 18:37:55,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 41508.1, 300 sec: 42987.2). Total num frames: 1108344832. Throughput: 0: 10467.6. Samples: 277149696. Policy #0 lag: (min: 47.0, avg: 180.3, max: 303.0) [2024-06-15 18:37:55,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 18:37:57,381][1652475] Updated weights for policy 0, policy_version 541216 (0.0013) [2024-06-15 18:38:00,741][1648984] Fps is (10 sec: 26205.0, 60 sec: 41503.7, 300 sec: 42209.1). Total num frames: 1108475904. 
Throughput: 0: 10603.2. Samples: 277219840. Policy #0 lag: (min: 47.0, avg: 180.3, max: 303.0) [2024-06-15 18:38:00,742][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 18:38:00,872][1652475] Updated weights for policy 0, policy_version 541264 (0.0017) [2024-06-15 18:38:02,836][1652475] Updated weights for policy 0, policy_version 541329 (0.0011) [2024-06-15 18:38:03,986][1652475] Updated weights for policy 0, policy_version 541381 (0.0052) [2024-06-15 18:38:05,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 43144.6, 300 sec: 43542.6). Total num frames: 1108869120. Throughput: 0: 10410.7. Samples: 277243392. Policy #0 lag: (min: 47.0, avg: 180.3, max: 303.0) [2024-06-15 18:38:05,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 18:38:09,467][1652475] Updated weights for policy 0, policy_version 541456 (0.0143) [2024-06-15 18:38:10,738][1648984] Fps is (10 sec: 52448.0, 60 sec: 41506.2, 300 sec: 42653.9). Total num frames: 1109000192. Throughput: 0: 10470.7. Samples: 277314560. Policy #0 lag: (min: 47.0, avg: 180.3, max: 303.0) [2024-06-15 18:38:10,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 18:38:12,596][1652475] Updated weights for policy 0, policy_version 541520 (0.0113) [2024-06-15 18:38:15,163][1652475] Updated weights for policy 0, policy_version 541600 (0.0108) [2024-06-15 18:38:15,738][1648984] Fps is (10 sec: 36044.3, 60 sec: 42052.1, 300 sec: 42987.2). Total num frames: 1109229568. Throughput: 0: 10387.9. Samples: 277372928. Policy #0 lag: (min: 47.0, avg: 180.3, max: 303.0) [2024-06-15 18:38:15,739][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 18:38:17,271][1652475] Updated weights for policy 0, policy_version 541696 (0.0147) [2024-06-15 18:38:20,738][1648984] Fps is (10 sec: 39321.3, 60 sec: 41506.1, 300 sec: 42765.0). Total num frames: 1109393408. Throughput: 0: 10251.4. Samples: 277401088. 
Policy #0 lag: (min: 47.0, avg: 180.3, max: 303.0) [2024-06-15 18:38:20,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 18:38:22,077][1651340] Signal inference workers to stop experience collection... (27850 times) [2024-06-15 18:38:22,113][1652475] InferenceWorker_p0-w0: stopping experience collection (27850 times) [2024-06-15 18:38:22,351][1651340] Signal inference workers to resume experience collection... (27850 times) [2024-06-15 18:38:22,351][1652475] InferenceWorker_p0-w0: resuming experience collection (27850 times) [2024-06-15 18:38:25,227][1652475] Updated weights for policy 0, policy_version 541768 (0.0019) [2024-06-15 18:38:25,738][1648984] Fps is (10 sec: 32768.0, 60 sec: 40423.0, 300 sec: 42320.7). Total num frames: 1109557248. Throughput: 0: 10615.5. Samples: 277479424. Policy #0 lag: (min: 47.0, avg: 180.3, max: 303.0) [2024-06-15 18:38:25,739][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 18:38:27,389][1652475] Updated weights for policy 0, policy_version 541861 (0.0016) [2024-06-15 18:38:29,087][1652475] Updated weights for policy 0, policy_version 541946 (0.0015) [2024-06-15 18:38:30,738][1648984] Fps is (10 sec: 52426.6, 60 sec: 43690.3, 300 sec: 43209.3). Total num frames: 1109917696. Throughput: 0: 10638.1. Samples: 277538816. Policy #0 lag: (min: 47.0, avg: 180.3, max: 303.0) [2024-06-15 18:38:30,739][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 18:38:33,920][1652475] Updated weights for policy 0, policy_version 542016 (0.0014) [2024-06-15 18:38:35,738][1648984] Fps is (10 sec: 49152.3, 60 sec: 41506.1, 300 sec: 42431.8). Total num frames: 1110048768. Throughput: 0: 10706.5. Samples: 277575168. 
Policy #0 lag: (min: 47.0, avg: 180.3, max: 303.0) [2024-06-15 18:38:35,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 18:38:38,758][1652475] Updated weights for policy 0, policy_version 542096 (0.0114) [2024-06-15 18:38:40,536][1652475] Updated weights for policy 0, policy_version 542176 (0.0014) [2024-06-15 18:38:40,738][1648984] Fps is (10 sec: 45876.2, 60 sec: 42598.3, 300 sec: 43320.4). Total num frames: 1110376448. Throughput: 0: 10808.8. Samples: 277636096. Policy #0 lag: (min: 47.0, avg: 180.3, max: 303.0) [2024-06-15 18:38:40,739][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 18:38:45,329][1652475] Updated weights for policy 0, policy_version 542242 (0.0013) [2024-06-15 18:38:45,746][1648984] Fps is (10 sec: 49110.4, 60 sec: 43138.4, 300 sec: 42541.7). Total num frames: 1110540288. Throughput: 0: 10705.3. Samples: 277701632. Policy #0 lag: (min: 47.0, avg: 180.3, max: 303.0) [2024-06-15 18:38:45,747][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 18:38:49,439][1652475] Updated weights for policy 0, policy_version 542304 (0.0015) [2024-06-15 18:38:50,738][1648984] Fps is (10 sec: 36045.7, 60 sec: 42052.3, 300 sec: 42765.0). Total num frames: 1110736896. Throughput: 0: 10990.9. Samples: 277737984. Policy #0 lag: (min: 47.0, avg: 180.3, max: 303.0) [2024-06-15 18:38:50,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 18:38:51,204][1652475] Updated weights for policy 0, policy_version 542369 (0.0116) [2024-06-15 18:38:53,793][1652475] Updated weights for policy 0, policy_version 542457 (0.0014) [2024-06-15 18:38:55,738][1648984] Fps is (10 sec: 42632.7, 60 sec: 43690.3, 300 sec: 42765.0). Total num frames: 1110966272. Throughput: 0: 10626.7. Samples: 277792768. 
Policy #0 lag: (min: 47.0, avg: 180.3, max: 303.0) [2024-06-15 18:38:55,739][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 18:38:55,744][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000542464_1110966272.pth... [2024-06-15 18:38:55,793][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000537472_1100742656.pth [2024-06-15 18:38:57,691][1652475] Updated weights for policy 0, policy_version 542519 (0.0014) [2024-06-15 18:39:00,762][1648984] Fps is (10 sec: 35957.0, 60 sec: 43675.5, 300 sec: 42206.1). Total num frames: 1111097344. Throughput: 0: 11007.7. Samples: 277868544. Policy #0 lag: (min: 47.0, avg: 180.3, max: 303.0) [2024-06-15 18:39:00,763][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 18:39:02,867][1652475] Updated weights for policy 0, policy_version 542592 (0.0014) [2024-06-15 18:39:05,348][1651340] Signal inference workers to stop experience collection... (27900 times) [2024-06-15 18:39:05,426][1652475] InferenceWorker_p0-w0: stopping experience collection (27900 times) [2024-06-15 18:39:05,705][1651340] Signal inference workers to resume experience collection... (27900 times) [2024-06-15 18:39:05,706][1652475] InferenceWorker_p0-w0: resuming experience collection (27900 times) [2024-06-15 18:39:05,738][1648984] Fps is (10 sec: 45877.1, 60 sec: 42598.3, 300 sec: 43320.4). Total num frames: 1111425024. Throughput: 0: 11093.3. Samples: 277900288. Policy #0 lag: (min: 47.0, avg: 180.3, max: 303.0) [2024-06-15 18:39:05,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 18:39:05,906][1652475] Updated weights for policy 0, policy_version 542692 (0.0132) [2024-06-15 18:39:09,415][1652475] Updated weights for policy 0, policy_version 542768 (0.0015) [2024-06-15 18:39:10,739][1648984] Fps is (10 sec: 52549.4, 60 sec: 43689.6, 300 sec: 42875.9). Total num frames: 1111621632. Throughput: 0: 10569.6. Samples: 277955072. 
Policy #0 lag: (min: 47.0, avg: 180.3, max: 303.0) [2024-06-15 18:39:10,740][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 18:39:14,788][1652475] Updated weights for policy 0, policy_version 542832 (0.0013) [2024-06-15 18:39:15,738][1648984] Fps is (10 sec: 36044.8, 60 sec: 42598.4, 300 sec: 42765.0). Total num frames: 1111785472. Throughput: 0: 10831.7. Samples: 278026240. Policy #0 lag: (min: 47.0, avg: 180.3, max: 303.0) [2024-06-15 18:39:15,738][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 18:39:16,610][1652475] Updated weights for policy 0, policy_version 542906 (0.0012) [2024-06-15 18:39:19,254][1652475] Updated weights for policy 0, policy_version 542967 (0.0012) [2024-06-15 18:39:20,738][1648984] Fps is (10 sec: 42604.4, 60 sec: 44236.8, 300 sec: 43209.4). Total num frames: 1112047616. Throughput: 0: 10638.2. Samples: 278053888. Policy #0 lag: (min: 47.0, avg: 180.3, max: 303.0) [2024-06-15 18:39:20,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 18:39:21,349][1652475] Updated weights for policy 0, policy_version 543033 (0.0016) [2024-06-15 18:39:25,738][1648984] Fps is (10 sec: 36044.2, 60 sec: 43144.4, 300 sec: 42320.7). Total num frames: 1112145920. Throughput: 0: 10797.5. Samples: 278121984. Policy #0 lag: (min: 47.0, avg: 180.3, max: 303.0) [2024-06-15 18:39:25,739][1648984] Avg episode reward: [(0, '-0.540')] [2024-06-15 18:39:27,005][1652475] Updated weights for policy 0, policy_version 543102 (0.0122) [2024-06-15 18:39:30,228][1652475] Updated weights for policy 0, policy_version 543172 (0.0014) [2024-06-15 18:39:30,738][1648984] Fps is (10 sec: 39321.9, 60 sec: 42052.6, 300 sec: 43209.3). Total num frames: 1112440832. Throughput: 0: 10697.1. Samples: 278182912. 
Policy #0 lag: (min: 15.0, avg: 89.3, max: 271.0) [2024-06-15 18:39:30,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 18:39:31,363][1652475] Updated weights for policy 0, policy_version 543232 (0.0013) [2024-06-15 18:39:35,738][1648984] Fps is (10 sec: 52430.2, 60 sec: 43690.7, 300 sec: 42987.2). Total num frames: 1112670208. Throughput: 0: 10661.0. Samples: 278217728. Policy #0 lag: (min: 15.0, avg: 89.3, max: 271.0) [2024-06-15 18:39:35,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 18:39:37,907][1652475] Updated weights for policy 0, policy_version 543299 (0.0015) [2024-06-15 18:39:39,636][1652475] Updated weights for policy 0, policy_version 543366 (0.0080) [2024-06-15 18:39:40,738][1648984] Fps is (10 sec: 45875.5, 60 sec: 42052.5, 300 sec: 42987.2). Total num frames: 1112899584. Throughput: 0: 10831.8. Samples: 278280192. Policy #0 lag: (min: 15.0, avg: 89.3, max: 271.0) [2024-06-15 18:39:40,738][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 18:39:42,374][1652475] Updated weights for policy 0, policy_version 543430 (0.0016) [2024-06-15 18:39:43,643][1652475] Updated weights for policy 0, policy_version 543488 (0.0111) [2024-06-15 18:39:45,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 42058.2, 300 sec: 43431.5). Total num frames: 1113063424. Throughput: 0: 10609.9. Samples: 278345728. Policy #0 lag: (min: 15.0, avg: 89.3, max: 271.0) [2024-06-15 18:39:45,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 18:39:48,655][1652475] Updated weights for policy 0, policy_version 543548 (0.0015) [2024-06-15 18:39:50,561][1652475] Updated weights for policy 0, policy_version 543616 (0.0013) [2024-06-15 18:39:50,738][1648984] Fps is (10 sec: 42598.2, 60 sec: 43144.6, 300 sec: 42654.0). Total num frames: 1113325568. Throughput: 0: 10683.7. Samples: 278381056. 
Policy #0 lag: (min: 15.0, avg: 89.3, max: 271.0) [2024-06-15 18:39:50,739][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 18:39:51,512][1651340] Signal inference workers to stop experience collection... (27950 times) [2024-06-15 18:39:51,575][1652475] InferenceWorker_p0-w0: stopping experience collection (27950 times) [2024-06-15 18:39:51,717][1651340] Signal inference workers to resume experience collection... (27950 times) [2024-06-15 18:39:51,718][1652475] InferenceWorker_p0-w0: resuming experience collection (27950 times) [2024-06-15 18:39:53,896][1652475] Updated weights for policy 0, policy_version 543681 (0.0014) [2024-06-15 18:39:55,236][1652475] Updated weights for policy 0, policy_version 543744 (0.0013) [2024-06-15 18:39:55,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 43691.0, 300 sec: 43542.6). Total num frames: 1113587712. Throughput: 0: 10741.0. Samples: 278438400. Policy #0 lag: (min: 15.0, avg: 89.3, max: 271.0) [2024-06-15 18:39:55,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 18:40:00,738][1648984] Fps is (10 sec: 32767.7, 60 sec: 42615.7, 300 sec: 42765.0). Total num frames: 1113653248. Throughput: 0: 10820.3. Samples: 278513152. Policy #0 lag: (min: 15.0, avg: 89.3, max: 271.0) [2024-06-15 18:40:00,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 18:40:01,134][1652475] Updated weights for policy 0, policy_version 543801 (0.0019) [2024-06-15 18:40:02,445][1652475] Updated weights for policy 0, policy_version 543846 (0.0013) [2024-06-15 18:40:03,759][1652475] Updated weights for policy 0, policy_version 543905 (0.0016) [2024-06-15 18:40:05,344][1652475] Updated weights for policy 0, policy_version 543959 (0.0014) [2024-06-15 18:40:05,738][1648984] Fps is (10 sec: 45875.2, 60 sec: 43690.7, 300 sec: 43320.4). Total num frames: 1114046464. Throughput: 0: 10820.3. Samples: 278540800. 
Policy #0 lag: (min: 15.0, avg: 89.3, max: 271.0) [2024-06-15 18:40:05,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 18:40:06,083][1652475] Updated weights for policy 0, policy_version 543999 (0.0012) [2024-06-15 18:40:10,740][1648984] Fps is (10 sec: 45862.9, 60 sec: 41505.3, 300 sec: 42764.6). Total num frames: 1114112000. Throughput: 0: 11001.7. Samples: 278617088. Policy #0 lag: (min: 15.0, avg: 89.3, max: 271.0) [2024-06-15 18:40:10,741][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 18:40:12,138][1652475] Updated weights for policy 0, policy_version 544064 (0.0013) [2024-06-15 18:40:15,023][1652475] Updated weights for policy 0, policy_version 544144 (0.0014) [2024-06-15 18:40:15,738][1648984] Fps is (10 sec: 42598.5, 60 sec: 44783.0, 300 sec: 42987.2). Total num frames: 1114472448. Throughput: 0: 10956.8. Samples: 278675968. Policy #0 lag: (min: 15.0, avg: 89.3, max: 271.0) [2024-06-15 18:40:15,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 18:40:16,010][1652475] Updated weights for policy 0, policy_version 544190 (0.0014) [2024-06-15 18:40:17,451][1652475] Updated weights for policy 0, policy_version 544256 (0.0027) [2024-06-15 18:40:20,738][1648984] Fps is (10 sec: 52443.5, 60 sec: 43144.6, 300 sec: 43098.2). Total num frames: 1114636288. Throughput: 0: 10934.0. Samples: 278709760. Policy #0 lag: (min: 15.0, avg: 89.3, max: 271.0) [2024-06-15 18:40:20,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 18:40:23,781][1652475] Updated weights for policy 0, policy_version 544320 (0.0013) [2024-06-15 18:40:25,641][1652475] Updated weights for policy 0, policy_version 544381 (0.0022) [2024-06-15 18:40:25,738][1648984] Fps is (10 sec: 42597.7, 60 sec: 45875.2, 300 sec: 43320.8). Total num frames: 1114898432. Throughput: 0: 11059.1. Samples: 278777856. 
Policy #0 lag: (min: 15.0, avg: 89.3, max: 271.0) [2024-06-15 18:40:25,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 18:40:27,856][1652475] Updated weights for policy 0, policy_version 544435 (0.0014) [2024-06-15 18:40:30,503][1652475] Updated weights for policy 0, policy_version 544512 (0.0015) [2024-06-15 18:40:30,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 45329.1, 300 sec: 43542.6). Total num frames: 1115160576. Throughput: 0: 10911.3. Samples: 278836736. Policy #0 lag: (min: 15.0, avg: 89.3, max: 271.0) [2024-06-15 18:40:30,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 18:40:35,472][1652475] Updated weights for policy 0, policy_version 544571 (0.0029) [2024-06-15 18:40:35,755][1648984] Fps is (10 sec: 39255.5, 60 sec: 43678.3, 300 sec: 43095.8). Total num frames: 1115291648. Throughput: 0: 11043.6. Samples: 278878208. Policy #0 lag: (min: 15.0, avg: 89.3, max: 271.0) [2024-06-15 18:40:35,756][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 18:40:37,251][1652475] Updated weights for policy 0, policy_version 544610 (0.0013) [2024-06-15 18:40:38,504][1652475] Updated weights for policy 0, policy_version 544646 (0.0011) [2024-06-15 18:40:38,781][1651340] Signal inference workers to stop experience collection... (28000 times) [2024-06-15 18:40:38,842][1652475] InferenceWorker_p0-w0: stopping experience collection (28000 times) [2024-06-15 18:40:38,987][1651340] Signal inference workers to resume experience collection... (28000 times) [2024-06-15 18:40:38,988][1652475] InferenceWorker_p0-w0: resuming experience collection (28000 times) [2024-06-15 18:40:39,532][1652475] Updated weights for policy 0, policy_version 544704 (0.0015) [2024-06-15 18:40:40,738][1648984] Fps is (10 sec: 39321.1, 60 sec: 44236.7, 300 sec: 43431.5). Total num frames: 1115553792. Throughput: 0: 11104.7. Samples: 278938112. 
Policy #0 lag: (min: 15.0, avg: 89.3, max: 271.0) [2024-06-15 18:40:40,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 18:40:45,738][1648984] Fps is (10 sec: 39388.3, 60 sec: 43690.6, 300 sec: 42653.9). Total num frames: 1115684864. Throughput: 0: 10865.8. Samples: 279002112. Policy #0 lag: (min: 15.0, avg: 89.3, max: 271.0) [2024-06-15 18:40:45,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 18:40:46,289][1652475] Updated weights for policy 0, policy_version 544773 (0.0016) [2024-06-15 18:40:47,452][1652475] Updated weights for policy 0, policy_version 544832 (0.0012) [2024-06-15 18:40:50,738][1648984] Fps is (10 sec: 36043.7, 60 sec: 43144.2, 300 sec: 42987.1). Total num frames: 1115914240. Throughput: 0: 10979.5. Samples: 279034880. Policy #0 lag: (min: 15.0, avg: 89.3, max: 271.0) [2024-06-15 18:40:50,739][1648984] Avg episode reward: [(0, '-0.220')] [2024-06-15 18:40:50,872][1652475] Updated weights for policy 0, policy_version 544882 (0.0016) [2024-06-15 18:40:52,673][1652475] Updated weights for policy 0, policy_version 544957 (0.0083) [2024-06-15 18:40:55,121][1652475] Updated weights for policy 0, policy_version 545018 (0.0014) [2024-06-15 18:40:55,738][1648984] Fps is (10 sec: 52427.1, 60 sec: 43690.4, 300 sec: 43320.3). Total num frames: 1116209152. Throughput: 0: 10604.6. Samples: 279094272. Policy #0 lag: (min: 15.0, avg: 89.3, max: 271.0) [2024-06-15 18:40:55,739][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 18:40:55,744][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000545024_1116209152.pth... [2024-06-15 18:40:55,823][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000539968_1105854464.pth [2024-06-15 18:40:59,884][1652475] Updated weights for policy 0, policy_version 545088 (0.0034) [2024-06-15 18:41:00,738][1648984] Fps is (10 sec: 42599.4, 60 sec: 44782.8, 300 sec: 42876.1). Total num frames: 1116340224. Throughput: 0: 10740.6. 
Samples: 279159296. Policy #0 lag: (min: 7.0, avg: 108.1, max: 263.0) [2024-06-15 18:41:00,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 18:41:03,362][1652475] Updated weights for policy 0, policy_version 545150 (0.0123) [2024-06-15 18:41:05,488][1652475] Updated weights for policy 0, policy_version 545216 (0.0020) [2024-06-15 18:41:05,738][1648984] Fps is (10 sec: 39323.0, 60 sec: 42598.4, 300 sec: 43209.4). Total num frames: 1116602368. Throughput: 0: 10638.2. Samples: 279188480. Policy #0 lag: (min: 7.0, avg: 108.1, max: 263.0) [2024-06-15 18:41:05,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 18:41:07,664][1652475] Updated weights for policy 0, policy_version 545273 (0.0121) [2024-06-15 18:41:10,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 43692.5, 300 sec: 42653.9). Total num frames: 1116733440. Throughput: 0: 10558.6. Samples: 279252992. Policy #0 lag: (min: 7.0, avg: 108.1, max: 263.0) [2024-06-15 18:41:10,739][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 18:41:11,305][1652475] Updated weights for policy 0, policy_version 545328 (0.0013) [2024-06-15 18:41:15,536][1652475] Updated weights for policy 0, policy_version 545398 (0.0013) [2024-06-15 18:41:15,738][1648984] Fps is (10 sec: 36045.0, 60 sec: 41506.1, 300 sec: 42987.2). Total num frames: 1116962816. Throughput: 0: 10808.9. Samples: 279323136. Policy #0 lag: (min: 7.0, avg: 108.1, max: 263.0) [2024-06-15 18:41:15,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 18:41:17,033][1652475] Updated weights for policy 0, policy_version 545447 (0.0215) [2024-06-15 18:41:19,215][1652475] Updated weights for policy 0, policy_version 545504 (0.0026) [2024-06-15 18:41:20,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 43690.5, 300 sec: 43098.2). Total num frames: 1117257728. Throughput: 0: 10460.1. Samples: 279348736. 
Policy #0 lag: (min: 7.0, avg: 108.1, max: 263.0) [2024-06-15 18:41:20,739][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 18:41:23,030][1652475] Updated weights for policy 0, policy_version 545552 (0.0019) [2024-06-15 18:41:25,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 41506.2, 300 sec: 42653.9). Total num frames: 1117388800. Throughput: 0: 10592.7. Samples: 279414784. Policy #0 lag: (min: 7.0, avg: 108.1, max: 263.0) [2024-06-15 18:41:25,740][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 18:41:27,086][1652475] Updated weights for policy 0, policy_version 545616 (0.0106) [2024-06-15 18:41:28,499][1651340] Signal inference workers to stop experience collection... (28050 times) [2024-06-15 18:41:28,604][1652475] InferenceWorker_p0-w0: stopping experience collection (28050 times) [2024-06-15 18:41:28,605][1652475] Updated weights for policy 0, policy_version 545669 (0.0014) [2024-06-15 18:41:28,813][1651340] Signal inference workers to resume experience collection... (28050 times) [2024-06-15 18:41:28,814][1652475] InferenceWorker_p0-w0: resuming experience collection (28050 times) [2024-06-15 18:41:30,030][1652475] Updated weights for policy 0, policy_version 545728 (0.0017) [2024-06-15 18:41:30,738][1648984] Fps is (10 sec: 42598.9, 60 sec: 42052.2, 300 sec: 42765.0). Total num frames: 1117683712. Throughput: 0: 10695.1. Samples: 279483392. Policy #0 lag: (min: 7.0, avg: 108.1, max: 263.0) [2024-06-15 18:41:30,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 18:41:34,316][1652475] Updated weights for policy 0, policy_version 545793 (0.0144) [2024-06-15 18:41:35,335][1652475] Updated weights for policy 0, policy_version 545847 (0.0029) [2024-06-15 18:41:35,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 43703.0, 300 sec: 42987.2). Total num frames: 1117913088. Throughput: 0: 10683.8. Samples: 279515648. 
Policy #0 lag: (min: 7.0, avg: 108.1, max: 263.0) [2024-06-15 18:41:35,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 18:41:39,152][1652475] Updated weights for policy 0, policy_version 545913 (0.0015) [2024-06-15 18:41:40,738][1648984] Fps is (10 sec: 36045.2, 60 sec: 41506.2, 300 sec: 43098.2). Total num frames: 1118044160. Throughput: 0: 10786.2. Samples: 279579648. Policy #0 lag: (min: 7.0, avg: 108.1, max: 263.0) [2024-06-15 18:41:40,738][1648984] Avg episode reward: [(0, '-0.220')] [2024-06-15 18:41:42,602][1652475] Updated weights for policy 0, policy_version 545972 (0.0014) [2024-06-15 18:41:45,740][1648984] Fps is (10 sec: 39314.0, 60 sec: 43689.3, 300 sec: 43098.0). Total num frames: 1118306304. Throughput: 0: 10808.5. Samples: 279645696. Policy #0 lag: (min: 7.0, avg: 108.1, max: 263.0) [2024-06-15 18:41:45,740][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 18:41:47,075][1652475] Updated weights for policy 0, policy_version 546064 (0.0013) [2024-06-15 18:41:48,122][1652475] Updated weights for policy 0, policy_version 546112 (0.0012) [2024-06-15 18:41:50,738][1648984] Fps is (10 sec: 45875.6, 60 sec: 43144.9, 300 sec: 42876.5). Total num frames: 1118502912. Throughput: 0: 10899.9. Samples: 279678976. Policy #0 lag: (min: 7.0, avg: 108.1, max: 263.0) [2024-06-15 18:41:50,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 18:41:53,179][1652475] Updated weights for policy 0, policy_version 546192 (0.0013) [2024-06-15 18:41:55,135][1652475] Updated weights for policy 0, policy_version 546275 (0.0014) [2024-06-15 18:41:55,738][1648984] Fps is (10 sec: 52439.4, 60 sec: 43691.0, 300 sec: 43542.6). Total num frames: 1118830592. Throughput: 0: 10888.6. Samples: 279742976. Policy #0 lag: (min: 7.0, avg: 108.1, max: 263.0) [2024-06-15 18:41:55,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 18:42:00,738][1648984] Fps is (10 sec: 32767.8, 60 sec: 41506.3, 300 sec: 42542.9). Total num frames: 1118830592. 
Throughput: 0: 10888.5. Samples: 279813120. Policy #0 lag: (min: 7.0, avg: 108.1, max: 263.0) [2024-06-15 18:42:00,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 18:42:01,278][1652475] Updated weights for policy 0, policy_version 546324 (0.0014) [2024-06-15 18:42:03,863][1652475] Updated weights for policy 0, policy_version 546416 (0.0220) [2024-06-15 18:42:05,738][1648984] Fps is (10 sec: 32767.6, 60 sec: 42598.4, 300 sec: 42876.1). Total num frames: 1119158272. Throughput: 0: 10945.5. Samples: 279841280. Policy #0 lag: (min: 7.0, avg: 108.1, max: 263.0) [2024-06-15 18:42:05,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 18:42:06,272][1652475] Updated weights for policy 0, policy_version 546496 (0.0012) [2024-06-15 18:42:10,742][1648984] Fps is (10 sec: 52404.4, 60 sec: 43687.4, 300 sec: 42875.4). Total num frames: 1119354880. Throughput: 0: 10739.5. Samples: 279898112. Policy #0 lag: (min: 7.0, avg: 108.1, max: 263.0) [2024-06-15 18:42:10,743][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 18:42:13,841][1652475] Updated weights for policy 0, policy_version 546577 (0.0026) [2024-06-15 18:42:14,683][1651340] Signal inference workers to stop experience collection... (28100 times) [2024-06-15 18:42:14,715][1652475] InferenceWorker_p0-w0: stopping experience collection (28100 times) [2024-06-15 18:42:14,786][1651340] Signal inference workers to resume experience collection... (28100 times) [2024-06-15 18:42:14,787][1652475] InferenceWorker_p0-w0: resuming experience collection (28100 times) [2024-06-15 18:42:15,681][1652475] Updated weights for policy 0, policy_version 546656 (0.0013) [2024-06-15 18:42:15,738][1648984] Fps is (10 sec: 39322.1, 60 sec: 43144.6, 300 sec: 42876.1). Total num frames: 1119551488. Throughput: 0: 10797.5. Samples: 279969280. 
Policy #0 lag: (min: 7.0, avg: 108.1, max: 263.0) [2024-06-15 18:42:15,738][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 18:42:17,088][1652475] Updated weights for policy 0, policy_version 546704 (0.0011) [2024-06-15 18:42:18,345][1652475] Updated weights for policy 0, policy_version 546768 (0.0042) [2024-06-15 18:42:20,738][1648984] Fps is (10 sec: 52452.9, 60 sec: 43690.8, 300 sec: 43211.3). Total num frames: 1119879168. Throughput: 0: 10786.1. Samples: 280001024. Policy #0 lag: (min: 7.0, avg: 108.1, max: 263.0) [2024-06-15 18:42:20,738][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 18:42:24,930][1652475] Updated weights for policy 0, policy_version 546820 (0.0029) [2024-06-15 18:42:25,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 42598.4, 300 sec: 42876.1). Total num frames: 1119944704. Throughput: 0: 10968.2. Samples: 280073216. Policy #0 lag: (min: 7.0, avg: 108.1, max: 263.0) [2024-06-15 18:42:25,738][1648984] Avg episode reward: [(0, '-0.530')] [2024-06-15 18:42:26,665][1652475] Updated weights for policy 0, policy_version 546887 (0.0013) [2024-06-15 18:42:28,432][1652475] Updated weights for policy 0, policy_version 546961 (0.0014) [2024-06-15 18:42:29,802][1652475] Updated weights for policy 0, policy_version 547040 (0.0031) [2024-06-15 18:42:30,738][1648984] Fps is (10 sec: 52426.6, 60 sec: 45328.8, 300 sec: 43542.5). Total num frames: 1120403456. Throughput: 0: 10843.4. Samples: 280133632. Policy #0 lag: (min: 37.0, avg: 164.6, max: 293.0) [2024-06-15 18:42:30,739][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 18:42:35,738][1648984] Fps is (10 sec: 45875.3, 60 sec: 41506.2, 300 sec: 42654.0). Total num frames: 1120403456. Throughput: 0: 10968.2. Samples: 280172544. 
Policy #0 lag: (min: 37.0, avg: 164.6, max: 293.0) [2024-06-15 18:42:35,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 18:42:36,352][1652475] Updated weights for policy 0, policy_version 547088 (0.0012) [2024-06-15 18:42:38,518][1652475] Updated weights for policy 0, policy_version 547184 (0.0122) [2024-06-15 18:42:40,738][1648984] Fps is (10 sec: 39323.5, 60 sec: 45875.2, 300 sec: 43542.6). Total num frames: 1120796672. Throughput: 0: 11002.3. Samples: 280238080. Policy #0 lag: (min: 37.0, avg: 164.6, max: 293.0) [2024-06-15 18:42:40,740][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 18:42:42,422][1652475] Updated weights for policy 0, policy_version 547267 (0.0014) [2024-06-15 18:42:45,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 43692.2, 300 sec: 43098.3). Total num frames: 1120927744. Throughput: 0: 10899.9. Samples: 280303616. Policy #0 lag: (min: 37.0, avg: 164.6, max: 293.0) [2024-06-15 18:42:45,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 18:42:47,856][1652475] Updated weights for policy 0, policy_version 547346 (0.0017) [2024-06-15 18:42:49,357][1652475] Updated weights for policy 0, policy_version 547424 (0.0069) [2024-06-15 18:42:50,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 44782.9, 300 sec: 43542.6). Total num frames: 1121189888. Throughput: 0: 11127.5. Samples: 280342016. Policy #0 lag: (min: 37.0, avg: 164.6, max: 293.0) [2024-06-15 18:42:50,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 18:42:50,855][1652475] Updated weights for policy 0, policy_version 547472 (0.0014) [2024-06-15 18:42:51,946][1652475] Updated weights for policy 0, policy_version 547520 (0.0028) [2024-06-15 18:42:55,738][1648984] Fps is (10 sec: 45874.9, 60 sec: 42598.4, 300 sec: 43765.3). Total num frames: 1121386496. Throughput: 0: 11310.7. Samples: 280407040. 
Policy #0 lag: (min: 37.0, avg: 164.6, max: 293.0) [2024-06-15 18:42:55,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 18:42:56,276][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000547568_1121419264.pth... [2024-06-15 18:42:56,322][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000542464_1110966272.pth [2024-06-15 18:42:56,477][1652475] Updated weights for policy 0, policy_version 547572 (0.0028) [2024-06-15 18:43:00,108][1651340] Signal inference workers to stop experience collection... (28150 times) [2024-06-15 18:43:00,128][1652475] Updated weights for policy 0, policy_version 547604 (0.0014) [2024-06-15 18:43:00,164][1652475] InferenceWorker_p0-w0: stopping experience collection (28150 times) [2024-06-15 18:43:00,317][1651340] Signal inference workers to resume experience collection... (28150 times) [2024-06-15 18:43:00,318][1652475] InferenceWorker_p0-w0: resuming experience collection (28150 times) [2024-06-15 18:43:00,738][1648984] Fps is (10 sec: 36044.4, 60 sec: 45329.0, 300 sec: 42987.2). Total num frames: 1121550336. Throughput: 0: 11366.4. Samples: 280480768. Policy #0 lag: (min: 37.0, avg: 164.6, max: 293.0) [2024-06-15 18:43:00,738][1648984] Avg episode reward: [(0, '-0.220')] [2024-06-15 18:43:01,588][1652475] Updated weights for policy 0, policy_version 547680 (0.0016) [2024-06-15 18:43:03,523][1652475] Updated weights for policy 0, policy_version 547760 (0.0056) [2024-06-15 18:43:05,738][1648984] Fps is (10 sec: 45873.7, 60 sec: 44782.7, 300 sec: 43542.5). Total num frames: 1121845248. Throughput: 0: 11184.3. Samples: 280504320. Policy #0 lag: (min: 37.0, avg: 164.6, max: 293.0) [2024-06-15 18:43:05,739][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 18:43:07,938][1652475] Updated weights for policy 0, policy_version 547837 (0.0030) [2024-06-15 18:43:10,738][1648984] Fps is (10 sec: 42598.5, 60 sec: 43694.0, 300 sec: 43209.3). 
Total num frames: 1121976320. Throughput: 0: 11070.6. Samples: 280571392. Policy #0 lag: (min: 37.0, avg: 164.6, max: 293.0) [2024-06-15 18:43:10,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 18:43:12,221][1652475] Updated weights for policy 0, policy_version 547888 (0.0015) [2024-06-15 18:43:13,694][1652475] Updated weights for policy 0, policy_version 547966 (0.0014) [2024-06-15 18:43:15,329][1652475] Updated weights for policy 0, policy_version 548029 (0.0013) [2024-06-15 18:43:15,738][1648984] Fps is (10 sec: 52430.0, 60 sec: 46967.4, 300 sec: 43986.9). Total num frames: 1122369536. Throughput: 0: 11173.1. Samples: 280636416. Policy #0 lag: (min: 37.0, avg: 164.6, max: 293.0) [2024-06-15 18:43:15,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 18:43:18,960][1652475] Updated weights for policy 0, policy_version 548080 (0.0018) [2024-06-15 18:43:20,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 43690.7, 300 sec: 43875.8). Total num frames: 1122500608. Throughput: 0: 11116.1. Samples: 280672768. Policy #0 lag: (min: 37.0, avg: 164.6, max: 293.0) [2024-06-15 18:43:20,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 18:43:23,743][1652475] Updated weights for policy 0, policy_version 548144 (0.0015) [2024-06-15 18:43:25,738][1648984] Fps is (10 sec: 39319.5, 60 sec: 46967.0, 300 sec: 43542.5). Total num frames: 1122762752. Throughput: 0: 11241.1. Samples: 280743936. Policy #0 lag: (min: 37.0, avg: 164.6, max: 293.0) [2024-06-15 18:43:25,740][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 18:43:26,445][1652475] Updated weights for policy 0, policy_version 548240 (0.0012) [2024-06-15 18:43:28,735][1652475] Updated weights for policy 0, policy_version 548291 (0.0013) [2024-06-15 18:43:30,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 43691.0, 300 sec: 43986.9). Total num frames: 1123024896. Throughput: 0: 11081.9. Samples: 280802304. 
Policy #0 lag: (min: 37.0, avg: 164.6, max: 293.0) [2024-06-15 18:43:30,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 18:43:34,624][1652475] Updated weights for policy 0, policy_version 548353 (0.0015) [2024-06-15 18:43:35,738][1648984] Fps is (10 sec: 39323.9, 60 sec: 45875.1, 300 sec: 43320.4). Total num frames: 1123155968. Throughput: 0: 11207.1. Samples: 280846336. Policy #0 lag: (min: 37.0, avg: 164.6, max: 293.0) [2024-06-15 18:43:35,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 18:43:37,422][1652475] Updated weights for policy 0, policy_version 548448 (0.0014) [2024-06-15 18:43:39,621][1652475] Updated weights for policy 0, policy_version 548541 (0.0015) [2024-06-15 18:43:40,740][1648984] Fps is (10 sec: 39321.5, 60 sec: 43690.6, 300 sec: 43654.9). Total num frames: 1123418112. Throughput: 0: 11002.3. Samples: 280902144. Policy #0 lag: (min: 37.0, avg: 164.6, max: 293.0) [2024-06-15 18:43:40,741][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 18:43:40,764][1651340] Signal inference workers to stop experience collection... (28200 times) [2024-06-15 18:43:40,812][1652475] InferenceWorker_p0-w0: stopping experience collection (28200 times) [2024-06-15 18:43:41,122][1651340] Signal inference workers to resume experience collection... (28200 times) [2024-06-15 18:43:41,123][1652475] InferenceWorker_p0-w0: resuming experience collection (28200 times) [2024-06-15 18:43:42,166][1652475] Updated weights for policy 0, policy_version 548599 (0.0012) [2024-06-15 18:43:45,738][1648984] Fps is (10 sec: 39320.6, 60 sec: 43690.4, 300 sec: 43431.4). Total num frames: 1123549184. Throughput: 0: 10956.7. Samples: 280973824. 
Policy #0 lag: (min: 37.0, avg: 164.6, max: 293.0) [2024-06-15 18:43:45,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 18:43:47,148][1652475] Updated weights for policy 0, policy_version 548627 (0.0012) [2024-06-15 18:43:47,911][1652475] Updated weights for policy 0, policy_version 548672 (0.0013) [2024-06-15 18:43:49,583][1652475] Updated weights for policy 0, policy_version 548736 (0.0041) [2024-06-15 18:43:50,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 43690.6, 300 sec: 43542.6). Total num frames: 1123811328. Throughput: 0: 11207.2. Samples: 281008640. Policy #0 lag: (min: 37.0, avg: 164.6, max: 293.0) [2024-06-15 18:43:50,739][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 18:43:53,498][1652475] Updated weights for policy 0, policy_version 548832 (0.0027) [2024-06-15 18:43:55,738][1648984] Fps is (10 sec: 52430.0, 60 sec: 44782.9, 300 sec: 43990.5). Total num frames: 1124073472. Throughput: 0: 10820.3. Samples: 281058304. Policy #0 lag: (min: 37.0, avg: 164.6, max: 293.0) [2024-06-15 18:43:55,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 18:43:59,136][1652475] Updated weights for policy 0, policy_version 548880 (0.0013) [2024-06-15 18:44:00,224][1652475] Updated weights for policy 0, policy_version 548928 (0.0015) [2024-06-15 18:44:00,738][1648984] Fps is (10 sec: 42597.7, 60 sec: 44782.8, 300 sec: 43431.5). Total num frames: 1124237312. Throughput: 0: 11127.4. Samples: 281137152. Policy #0 lag: (min: 11.0, avg: 84.7, max: 267.0) [2024-06-15 18:44:00,739][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 18:44:01,736][1652475] Updated weights for policy 0, policy_version 548988 (0.0016) [2024-06-15 18:44:05,317][1652475] Updated weights for policy 0, policy_version 549059 (0.0114) [2024-06-15 18:44:05,738][1648984] Fps is (10 sec: 42598.5, 60 sec: 44237.0, 300 sec: 43653.9). Total num frames: 1124499456. Throughput: 0: 11070.6. Samples: 281170944. 
Policy #0 lag: (min: 11.0, avg: 84.7, max: 267.0) [2024-06-15 18:44:05,738][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 18:44:06,575][1652475] Updated weights for policy 0, policy_version 549111 (0.0016) [2024-06-15 18:44:10,738][1648984] Fps is (10 sec: 36045.6, 60 sec: 43690.7, 300 sec: 43431.5). Total num frames: 1124597760. Throughput: 0: 10945.6. Samples: 281236480. Policy #0 lag: (min: 11.0, avg: 84.7, max: 267.0) [2024-06-15 18:44:10,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 18:44:12,029][1652475] Updated weights for policy 0, policy_version 549168 (0.0029) [2024-06-15 18:44:13,935][1652475] Updated weights for policy 0, policy_version 549248 (0.0031) [2024-06-15 18:44:15,758][1648984] Fps is (10 sec: 39241.4, 60 sec: 42038.0, 300 sec: 43539.6). Total num frames: 1124892672. Throughput: 0: 11054.2. Samples: 281299968. Policy #0 lag: (min: 11.0, avg: 84.7, max: 267.0) [2024-06-15 18:44:15,759][1648984] Avg episode reward: [(0, '-0.550')] [2024-06-15 18:44:17,422][1652475] Updated weights for policy 0, policy_version 549328 (0.0012) [2024-06-15 18:44:20,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 43690.7, 300 sec: 43986.9). Total num frames: 1125122048. Throughput: 0: 10524.5. Samples: 281319936. Policy #0 lag: (min: 11.0, avg: 84.7, max: 267.0) [2024-06-15 18:44:20,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 18:44:23,356][1652475] Updated weights for policy 0, policy_version 549384 (0.0012) [2024-06-15 18:44:25,738][1648984] Fps is (10 sec: 36118.6, 60 sec: 41506.5, 300 sec: 43431.5). Total num frames: 1125253120. Throughput: 0: 10763.4. Samples: 281386496. Policy #0 lag: (min: 11.0, avg: 84.7, max: 267.0) [2024-06-15 18:44:25,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 18:44:26,730][1652475] Updated weights for policy 0, policy_version 549456 (0.0015) [2024-06-15 18:44:27,817][1651340] Signal inference workers to stop experience collection... 
(28250 times) [2024-06-15 18:44:27,940][1652475] InferenceWorker_p0-w0: stopping experience collection (28250 times) [2024-06-15 18:44:27,966][1651340] Signal inference workers to resume experience collection... (28250 times) [2024-06-15 18:44:28,004][1652475] InferenceWorker_p0-w0: resuming experience collection (28250 times) [2024-06-15 18:44:28,169][1652475] Updated weights for policy 0, policy_version 549507 (0.0101) [2024-06-15 18:44:29,633][1652475] Updated weights for policy 0, policy_version 549568 (0.0021) [2024-06-15 18:44:30,738][1648984] Fps is (10 sec: 45875.4, 60 sec: 42598.5, 300 sec: 43764.7). Total num frames: 1125580800. Throughput: 0: 10615.5. Samples: 281451520. Policy #0 lag: (min: 11.0, avg: 84.7, max: 267.0) [2024-06-15 18:44:30,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 18:44:35,182][1652475] Updated weights for policy 0, policy_version 549635 (0.0105) [2024-06-15 18:44:35,738][1648984] Fps is (10 sec: 42598.6, 60 sec: 42052.3, 300 sec: 43320.4). Total num frames: 1125679104. Throughput: 0: 10524.5. Samples: 281482240. Policy #0 lag: (min: 11.0, avg: 84.7, max: 267.0) [2024-06-15 18:44:35,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 18:44:39,460][1652475] Updated weights for policy 0, policy_version 549699 (0.0012) [2024-06-15 18:44:40,738][1648984] Fps is (10 sec: 29491.0, 60 sec: 40960.1, 300 sec: 43431.5). Total num frames: 1125875712. Throughput: 0: 11070.6. Samples: 281556480. Policy #0 lag: (min: 11.0, avg: 84.7, max: 267.0) [2024-06-15 18:44:40,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 18:44:41,417][1652475] Updated weights for policy 0, policy_version 549776 (0.0013) [2024-06-15 18:44:43,460][1652475] Updated weights for policy 0, policy_version 549860 (0.0014) [2024-06-15 18:44:45,738][1648984] Fps is (10 sec: 49152.1, 60 sec: 43690.9, 300 sec: 43542.6). Total num frames: 1126170624. Throughput: 0: 10444.9. Samples: 281607168. 
Policy #0 lag: (min: 11.0, avg: 84.7, max: 267.0) [2024-06-15 18:44:45,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 18:44:48,117][1652475] Updated weights for policy 0, policy_version 549924 (0.0013) [2024-06-15 18:44:50,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 41506.1, 300 sec: 43098.2). Total num frames: 1126301696. Throughput: 0: 10444.8. Samples: 281640960. Policy #0 lag: (min: 11.0, avg: 84.7, max: 267.0) [2024-06-15 18:44:50,738][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 18:44:51,658][1652475] Updated weights for policy 0, policy_version 549987 (0.0015) [2024-06-15 18:44:53,226][1652475] Updated weights for policy 0, policy_version 550048 (0.0107) [2024-06-15 18:44:54,531][1652475] Updated weights for policy 0, policy_version 550103 (0.0017) [2024-06-15 18:44:55,738][1648984] Fps is (10 sec: 52428.4, 60 sec: 43690.7, 300 sec: 44209.0). Total num frames: 1126694912. Throughput: 0: 10444.8. Samples: 281706496. Policy #0 lag: (min: 11.0, avg: 84.7, max: 267.0) [2024-06-15 18:44:55,740][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 18:44:55,747][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000550144_1126694912.pth... [2024-06-15 18:44:55,795][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000545024_1116209152.pth [2024-06-15 18:45:00,231][1652475] Updated weights for policy 0, policy_version 550192 (0.0022) [2024-06-15 18:45:00,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 43144.7, 300 sec: 43320.4). Total num frames: 1126825984. Throughput: 0: 10472.3. Samples: 281771008. 
Policy #0 lag: (min: 11.0, avg: 84.7, max: 267.0) [2024-06-15 18:45:00,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 18:45:03,664][1652475] Updated weights for policy 0, policy_version 550257 (0.0013) [2024-06-15 18:45:05,331][1652475] Updated weights for policy 0, policy_version 550325 (0.0016) [2024-06-15 18:45:05,738][1648984] Fps is (10 sec: 39320.2, 60 sec: 43144.3, 300 sec: 43987.2). Total num frames: 1127088128. Throughput: 0: 10877.1. Samples: 281809408. Policy #0 lag: (min: 11.0, avg: 84.7, max: 267.0) [2024-06-15 18:45:05,739][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 18:45:10,611][1652475] Updated weights for policy 0, policy_version 550397 (0.0016) [2024-06-15 18:45:10,738][1648984] Fps is (10 sec: 39321.3, 60 sec: 43690.6, 300 sec: 43209.3). Total num frames: 1127219200. Throughput: 0: 10706.5. Samples: 281868288. Policy #0 lag: (min: 11.0, avg: 84.7, max: 267.0) [2024-06-15 18:45:10,738][1648984] Avg episode reward: [(0, '-0.540')] [2024-06-15 18:45:12,390][1651340] Signal inference workers to stop experience collection... (28300 times) [2024-06-15 18:45:12,476][1652475] InferenceWorker_p0-w0: stopping experience collection (28300 times) [2024-06-15 18:45:12,789][1651340] Signal inference workers to resume experience collection... (28300 times) [2024-06-15 18:45:12,790][1652475] InferenceWorker_p0-w0: resuming experience collection (28300 times) [2024-06-15 18:45:12,793][1652475] Updated weights for policy 0, policy_version 550448 (0.0012) [2024-06-15 18:45:14,813][1652475] Updated weights for policy 0, policy_version 550466 (0.0011) [2024-06-15 18:45:15,738][1648984] Fps is (10 sec: 32768.4, 60 sec: 42066.4, 300 sec: 43320.4). Total num frames: 1127415808. Throughput: 0: 10626.8. Samples: 281929728. 
Policy #0 lag: (min: 11.0, avg: 84.7, max: 267.0) [2024-06-15 18:45:15,739][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 18:45:16,844][1652475] Updated weights for policy 0, policy_version 550544 (0.0118) [2024-06-15 18:45:20,738][1648984] Fps is (10 sec: 39321.9, 60 sec: 41506.1, 300 sec: 43098.3). Total num frames: 1127612416. Throughput: 0: 10456.2. Samples: 281952768. Policy #0 lag: (min: 11.0, avg: 84.7, max: 267.0) [2024-06-15 18:45:20,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 18:45:22,344][1652475] Updated weights for policy 0, policy_version 550624 (0.0013) [2024-06-15 18:45:25,738][1648984] Fps is (10 sec: 32768.9, 60 sec: 41506.2, 300 sec: 42653.9). Total num frames: 1127743488. Throughput: 0: 10422.0. Samples: 282025472. Policy #0 lag: (min: 11.0, avg: 84.7, max: 267.0) [2024-06-15 18:45:25,738][1648984] Avg episode reward: [(0, '-0.190')] [2024-06-15 18:45:26,324][1652475] Updated weights for policy 0, policy_version 550676 (0.0011) [2024-06-15 18:45:28,180][1652475] Updated weights for policy 0, policy_version 550752 (0.0015) [2024-06-15 18:45:29,732][1652475] Updated weights for policy 0, policy_version 550832 (0.0012) [2024-06-15 18:45:30,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 42598.3, 300 sec: 43545.1). Total num frames: 1128136704. Throughput: 0: 10615.4. Samples: 282084864. Policy #0 lag: (min: 159.0, avg: 251.8, max: 415.0) [2024-06-15 18:45:30,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 18:45:33,727][1652475] Updated weights for policy 0, policy_version 550883 (0.0012) [2024-06-15 18:45:35,738][1648984] Fps is (10 sec: 52426.9, 60 sec: 43144.2, 300 sec: 43098.2). Total num frames: 1128267776. Throughput: 0: 10740.5. Samples: 282124288. 
Policy #0 lag: (min: 159.0, avg: 251.8, max: 415.0) [2024-06-15 18:45:35,739][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 18:45:38,288][1652475] Updated weights for policy 0, policy_version 550944 (0.0016) [2024-06-15 18:45:40,177][1652475] Updated weights for policy 0, policy_version 551024 (0.0228) [2024-06-15 18:45:40,738][1648984] Fps is (10 sec: 42598.7, 60 sec: 44782.9, 300 sec: 43653.7). Total num frames: 1128562688. Throughput: 0: 10752.0. Samples: 282190336. Policy #0 lag: (min: 159.0, avg: 251.8, max: 415.0) [2024-06-15 18:45:40,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 18:45:41,386][1652475] Updated weights for policy 0, policy_version 551088 (0.0020) [2024-06-15 18:45:45,424][1652475] Updated weights for policy 0, policy_version 551167 (0.0191) [2024-06-15 18:45:45,738][1648984] Fps is (10 sec: 52430.7, 60 sec: 43690.6, 300 sec: 43653.7). Total num frames: 1128792064. Throughput: 0: 10695.1. Samples: 282252288. Policy #0 lag: (min: 159.0, avg: 251.8, max: 415.0) [2024-06-15 18:45:45,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 18:45:50,738][1648984] Fps is (10 sec: 32767.3, 60 sec: 43144.4, 300 sec: 42987.2). Total num frames: 1128890368. Throughput: 0: 10695.2. Samples: 282290688. Policy #0 lag: (min: 159.0, avg: 251.8, max: 415.0) [2024-06-15 18:45:50,739][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 18:45:51,208][1652475] Updated weights for policy 0, policy_version 551248 (0.0013) [2024-06-15 18:45:53,163][1652475] Updated weights for policy 0, policy_version 551344 (0.0141) [2024-06-15 18:45:55,485][1651340] Signal inference workers to stop experience collection... (28350 times) [2024-06-15 18:45:55,533][1652475] InferenceWorker_p0-w0: stopping experience collection (28350 times) [2024-06-15 18:45:55,738][1648984] Fps is (10 sec: 39321.2, 60 sec: 41506.1, 300 sec: 43542.6). Total num frames: 1129185280. Throughput: 0: 10865.8. Samples: 282357248. 
Policy #0 lag: (min: 159.0, avg: 251.8, max: 415.0) [2024-06-15 18:45:55,738][1648984] Avg episode reward: [(0, '-0.210')] [2024-06-15 18:45:55,758][1651340] Signal inference workers to resume experience collection... (28350 times) [2024-06-15 18:45:55,759][1652475] InferenceWorker_p0-w0: resuming experience collection (28350 times) [2024-06-15 18:45:56,651][1652475] Updated weights for policy 0, policy_version 551411 (0.0013) [2024-06-15 18:46:00,750][1648984] Fps is (10 sec: 42599.1, 60 sec: 41506.1, 300 sec: 43098.2). Total num frames: 1129316352. Throughput: 0: 11138.9. Samples: 282430976. Policy #0 lag: (min: 159.0, avg: 251.8, max: 415.0) [2024-06-15 18:46:00,777][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 18:46:01,171][1652475] Updated weights for policy 0, policy_version 551440 (0.0015) [2024-06-15 18:46:02,484][1652475] Updated weights for policy 0, policy_version 551504 (0.0012) [2024-06-15 18:46:03,831][1652475] Updated weights for policy 0, policy_version 551574 (0.0014) [2024-06-15 18:46:05,738][1648984] Fps is (10 sec: 52429.1, 60 sec: 43690.9, 300 sec: 43986.9). Total num frames: 1129709568. Throughput: 0: 11320.9. Samples: 282462208. Policy #0 lag: (min: 159.0, avg: 251.8, max: 415.0) [2024-06-15 18:46:05,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 18:46:06,822][1652475] Updated weights for policy 0, policy_version 551634 (0.0022) [2024-06-15 18:46:10,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 43690.7, 300 sec: 43653.6). Total num frames: 1129840640. Throughput: 0: 11207.1. Samples: 282529792. Policy #0 lag: (min: 159.0, avg: 251.8, max: 415.0) [2024-06-15 18:46:10,738][1648984] Avg episode reward: [(0, '-0.250')] [2024-06-15 18:46:11,673][1652475] Updated weights for policy 0, policy_version 551681 (0.0014) [2024-06-15 18:46:13,221][1652475] Updated weights for policy 0, policy_version 551760 (0.0051) [2024-06-15 18:46:15,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 44783.2, 300 sec: 43542.6). 
Total num frames: 1130102784. Throughput: 0: 11480.2. Samples: 282601472. Policy #0 lag: (min: 159.0, avg: 251.8, max: 415.0) [2024-06-15 18:46:15,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 18:46:16,143][1652475] Updated weights for policy 0, policy_version 551826 (0.0013) [2024-06-15 18:46:17,902][1652475] Updated weights for policy 0, policy_version 551877 (0.0131) [2024-06-15 18:46:20,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 45875.2, 300 sec: 43986.9). Total num frames: 1130364928. Throughput: 0: 11366.5. Samples: 282635776. Policy #0 lag: (min: 159.0, avg: 251.8, max: 415.0) [2024-06-15 18:46:20,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 18:46:23,496][1652475] Updated weights for policy 0, policy_version 551939 (0.0014) [2024-06-15 18:46:24,943][1652475] Updated weights for policy 0, policy_version 552016 (0.0015) [2024-06-15 18:46:25,738][1648984] Fps is (10 sec: 49150.9, 60 sec: 47513.4, 300 sec: 43764.7). Total num frames: 1130594304. Throughput: 0: 11468.7. Samples: 282706432. Policy #0 lag: (min: 159.0, avg: 251.8, max: 415.0) [2024-06-15 18:46:25,739][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 18:46:27,239][1652475] Updated weights for policy 0, policy_version 552096 (0.0015) [2024-06-15 18:46:29,166][1652475] Updated weights for policy 0, policy_version 552144 (0.0012) [2024-06-15 18:46:30,370][1652475] Updated weights for policy 0, policy_version 552192 (0.0024) [2024-06-15 18:46:30,738][1648984] Fps is (10 sec: 52428.4, 60 sec: 45875.2, 300 sec: 43986.9). Total num frames: 1130889216. Throughput: 0: 11446.0. Samples: 282767360. Policy #0 lag: (min: 159.0, avg: 251.8, max: 415.0) [2024-06-15 18:46:30,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 18:46:35,442][1652475] Updated weights for policy 0, policy_version 552250 (0.0015) [2024-06-15 18:46:35,738][1648984] Fps is (10 sec: 42599.1, 60 sec: 45875.5, 300 sec: 43986.9). Total num frames: 1131020288. Throughput: 0: 11468.8. 
Samples: 282806784. Policy #0 lag: (min: 159.0, avg: 251.8, max: 415.0) [2024-06-15 18:46:35,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 18:46:38,649][1651340] Signal inference workers to stop experience collection... (28400 times) [2024-06-15 18:46:38,696][1652475] Updated weights for policy 0, policy_version 552322 (0.0118) [2024-06-15 18:46:38,736][1652475] InferenceWorker_p0-w0: stopping experience collection (28400 times) [2024-06-15 18:46:38,966][1651340] Signal inference workers to resume experience collection... (28400 times) [2024-06-15 18:46:38,967][1652475] InferenceWorker_p0-w0: resuming experience collection (28400 times) [2024-06-15 18:46:40,270][1652475] Updated weights for policy 0, policy_version 552384 (0.0034) [2024-06-15 18:46:40,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 45329.0, 300 sec: 43987.2). Total num frames: 1131282432. Throughput: 0: 11446.1. Samples: 282872320. Policy #0 lag: (min: 159.0, avg: 251.8, max: 415.0) [2024-06-15 18:46:40,738][1648984] Avg episode reward: [(0, '-0.250')] [2024-06-15 18:46:42,077][1652475] Updated weights for policy 0, policy_version 552447 (0.0015) [2024-06-15 18:46:45,738][1648984] Fps is (10 sec: 39320.5, 60 sec: 43690.4, 300 sec: 43764.7). Total num frames: 1131413504. Throughput: 0: 11229.8. Samples: 282936320. Policy #0 lag: (min: 159.0, avg: 251.8, max: 415.0) [2024-06-15 18:46:45,739][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 18:46:47,001][1652475] Updated weights for policy 0, policy_version 552507 (0.0012) [2024-06-15 18:46:50,738][1648984] Fps is (10 sec: 36045.0, 60 sec: 45875.4, 300 sec: 43431.5). Total num frames: 1131642880. Throughput: 0: 11343.7. Samples: 282972672. 
Policy #0 lag: (min: 159.0, avg: 251.8, max: 415.0) [2024-06-15 18:46:50,738][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 18:46:50,918][1652475] Updated weights for policy 0, policy_version 552569 (0.0024) [2024-06-15 18:46:52,403][1652475] Updated weights for policy 0, policy_version 552609 (0.0013) [2024-06-15 18:46:54,060][1652475] Updated weights for policy 0, policy_version 552678 (0.0013) [2024-06-15 18:46:55,738][1648984] Fps is (10 sec: 52430.3, 60 sec: 45875.2, 300 sec: 44431.2). Total num frames: 1131937792. Throughput: 0: 11059.2. Samples: 283027456. Policy #0 lag: (min: 159.0, avg: 251.8, max: 415.0) [2024-06-15 18:46:55,738][1648984] Avg episode reward: [(0, '-0.230')] [2024-06-15 18:46:55,755][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000552704_1131937792.pth... [2024-06-15 18:46:55,830][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000547568_1121419264.pth [2024-06-15 18:46:57,633][1652475] Updated weights for policy 0, policy_version 552707 (0.0017) [2024-06-15 18:46:58,568][1652475] Updated weights for policy 0, policy_version 552764 (0.0013) [2024-06-15 18:47:00,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 45875.2, 300 sec: 43764.7). Total num frames: 1132068864. Throughput: 0: 11161.6. Samples: 283103744. Policy #0 lag: (min: 159.0, avg: 251.8, max: 415.0) [2024-06-15 18:47:00,738][1648984] Avg episode reward: [(0, '-0.230')] [2024-06-15 18:47:03,977][1652475] Updated weights for policy 0, policy_version 552848 (0.0042) [2024-06-15 18:47:05,747][1648984] Fps is (10 sec: 42567.7, 60 sec: 44231.5, 300 sec: 44097.6). Total num frames: 1132363776. Throughput: 0: 11068.8. Samples: 283133952. 
Policy #0 lag: (min: 4.0, avg: 83.9, max: 260.0) [2024-06-15 18:47:05,751][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 18:47:06,571][1652475] Updated weights for policy 0, policy_version 552944 (0.0016) [2024-06-15 18:47:10,682][1652475] Updated weights for policy 0, policy_version 552992 (0.0012) [2024-06-15 18:47:10,738][1648984] Fps is (10 sec: 45874.6, 60 sec: 44782.8, 300 sec: 43986.8). Total num frames: 1132527616. Throughput: 0: 10740.6. Samples: 283189760. Policy #0 lag: (min: 4.0, avg: 83.9, max: 260.0) [2024-06-15 18:47:10,739][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 18:47:14,020][1652475] Updated weights for policy 0, policy_version 553040 (0.0013) [2024-06-15 18:47:15,738][1648984] Fps is (10 sec: 36070.9, 60 sec: 43690.6, 300 sec: 43542.6). Total num frames: 1132724224. Throughput: 0: 10797.5. Samples: 283253248. Policy #0 lag: (min: 4.0, avg: 83.9, max: 260.0) [2024-06-15 18:47:15,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 18:47:15,975][1652475] Updated weights for policy 0, policy_version 553090 (0.0012) [2024-06-15 18:47:17,334][1652475] Updated weights for policy 0, policy_version 553148 (0.0014) [2024-06-15 18:47:20,497][1652475] Updated weights for policy 0, policy_version 553200 (0.0012) [2024-06-15 18:47:20,738][1648984] Fps is (10 sec: 45875.6, 60 sec: 43690.6, 300 sec: 44209.0). Total num frames: 1132986368. Throughput: 0: 10695.1. Samples: 283288064. Policy #0 lag: (min: 4.0, avg: 83.9, max: 260.0) [2024-06-15 18:47:20,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 18:47:22,175][1652475] Updated weights for policy 0, policy_version 553255 (0.0096) [2024-06-15 18:47:25,309][1652475] Updated weights for policy 0, policy_version 553302 (0.0047) [2024-06-15 18:47:25,630][1651340] Signal inference workers to stop experience collection... 
(28450 times) [2024-06-15 18:47:25,659][1652475] InferenceWorker_p0-w0: stopping experience collection (28450 times) [2024-06-15 18:47:25,754][1648984] Fps is (10 sec: 45799.4, 60 sec: 43132.8, 300 sec: 43318.0). Total num frames: 1133182976. Throughput: 0: 10725.3. Samples: 283355136. Policy #0 lag: (min: 4.0, avg: 83.9, max: 260.0) [2024-06-15 18:47:25,755][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 18:47:25,958][1651340] Signal inference workers to resume experience collection... (28450 times) [2024-06-15 18:47:25,959][1652475] InferenceWorker_p0-w0: resuming experience collection (28450 times) [2024-06-15 18:47:27,036][1652475] Updated weights for policy 0, policy_version 553348 (0.0017) [2024-06-15 18:47:30,738][1648984] Fps is (10 sec: 39322.0, 60 sec: 41506.2, 300 sec: 43986.9). Total num frames: 1133379584. Throughput: 0: 10854.5. Samples: 283424768. Policy #0 lag: (min: 4.0, avg: 83.9, max: 260.0) [2024-06-15 18:47:30,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 18:47:32,014][1652475] Updated weights for policy 0, policy_version 553409 (0.0013) [2024-06-15 18:47:33,023][1652475] Updated weights for policy 0, policy_version 553470 (0.0019) [2024-06-15 18:47:35,738][1648984] Fps is (10 sec: 45951.4, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 1133641728. Throughput: 0: 10843.0. Samples: 283460608. Policy #0 lag: (min: 4.0, avg: 83.9, max: 260.0) [2024-06-15 18:47:35,738][1648984] Avg episode reward: [(0, '-0.540')] [2024-06-15 18:47:36,047][1652475] Updated weights for policy 0, policy_version 553552 (0.0014) [2024-06-15 18:47:38,326][1652475] Updated weights for policy 0, policy_version 553617 (0.0013) [2024-06-15 18:47:40,738][1648984] Fps is (10 sec: 52428.4, 60 sec: 43690.7, 300 sec: 43986.9). Total num frames: 1133903872. Throughput: 0: 10899.9. Samples: 283517952. 
Policy #0 lag: (min: 4.0, avg: 83.9, max: 260.0) [2024-06-15 18:47:40,740][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 18:47:44,567][1652475] Updated weights for policy 0, policy_version 553721 (0.0014) [2024-06-15 18:47:45,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 43690.9, 300 sec: 43542.6). Total num frames: 1134034944. Throughput: 0: 10854.4. Samples: 283592192. Policy #0 lag: (min: 4.0, avg: 83.9, max: 260.0) [2024-06-15 18:47:45,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 18:47:47,714][1652475] Updated weights for policy 0, policy_version 553777 (0.0096) [2024-06-15 18:47:49,714][1652475] Updated weights for policy 0, policy_version 553853 (0.0013) [2024-06-15 18:47:50,738][1648984] Fps is (10 sec: 45875.4, 60 sec: 45329.1, 300 sec: 43986.9). Total num frames: 1134362624. Throughput: 0: 10856.2. Samples: 283622400. Policy #0 lag: (min: 4.0, avg: 83.9, max: 260.0) [2024-06-15 18:47:50,738][1648984] Avg episode reward: [(0, '-0.230')] [2024-06-15 18:47:51,092][1652475] Updated weights for policy 0, policy_version 553916 (0.0016) [2024-06-15 18:47:55,738][1648984] Fps is (10 sec: 42598.1, 60 sec: 42052.3, 300 sec: 43764.7). Total num frames: 1134460928. Throughput: 0: 11150.2. Samples: 283691520. Policy #0 lag: (min: 4.0, avg: 83.9, max: 260.0) [2024-06-15 18:47:55,738][1648984] Avg episode reward: [(0, '-0.220')] [2024-06-15 18:47:56,319][1652475] Updated weights for policy 0, policy_version 553971 (0.0013) [2024-06-15 18:47:59,404][1652475] Updated weights for policy 0, policy_version 554018 (0.0043) [2024-06-15 18:48:00,738][1648984] Fps is (10 sec: 36044.8, 60 sec: 44236.8, 300 sec: 43653.7). Total num frames: 1134723072. Throughput: 0: 11161.6. Samples: 283755520. 
Policy #0 lag: (min: 4.0, avg: 83.9, max: 260.0) [2024-06-15 18:48:00,738][1648984] Avg episode reward: [(0, '-0.230')] [2024-06-15 18:48:01,006][1652475] Updated weights for policy 0, policy_version 554083 (0.0014) [2024-06-15 18:48:02,683][1652475] Updated weights for policy 0, policy_version 554160 (0.0012) [2024-06-15 18:48:05,738][1648984] Fps is (10 sec: 49152.3, 60 sec: 43149.8, 300 sec: 43986.9). Total num frames: 1134952448. Throughput: 0: 10945.4. Samples: 283780608. Policy #0 lag: (min: 4.0, avg: 83.9, max: 260.0) [2024-06-15 18:48:05,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 18:48:07,394][1652475] Updated weights for policy 0, policy_version 554193 (0.0013) [2024-06-15 18:48:10,738][1648984] Fps is (10 sec: 36044.7, 60 sec: 42598.5, 300 sec: 43098.3). Total num frames: 1135083520. Throughput: 0: 11154.3. Samples: 283856896. Policy #0 lag: (min: 4.0, avg: 83.9, max: 260.0) [2024-06-15 18:48:10,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 18:48:11,331][1652475] Updated weights for policy 0, policy_version 554256 (0.0013) [2024-06-15 18:48:12,368][1651340] Signal inference workers to stop experience collection... (28500 times) [2024-06-15 18:48:12,435][1652475] InferenceWorker_p0-w0: stopping experience collection (28500 times) [2024-06-15 18:48:12,552][1651340] Signal inference workers to resume experience collection... (28500 times) [2024-06-15 18:48:12,553][1652475] InferenceWorker_p0-w0: resuming experience collection (28500 times) [2024-06-15 18:48:13,756][1652475] Updated weights for policy 0, policy_version 554353 (0.0012) [2024-06-15 18:48:15,693][1652475] Updated weights for policy 0, policy_version 554425 (0.0139) [2024-06-15 18:48:15,738][1648984] Fps is (10 sec: 49152.1, 60 sec: 45329.1, 300 sec: 43875.8). Total num frames: 1135443968. Throughput: 0: 10797.5. Samples: 283910656. 
Policy #0 lag: (min: 4.0, avg: 83.9, max: 260.0) [2024-06-15 18:48:15,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 18:48:19,610][1652475] Updated weights for policy 0, policy_version 554448 (0.0015) [2024-06-15 18:48:20,738][1648984] Fps is (10 sec: 49152.1, 60 sec: 43144.6, 300 sec: 43431.6). Total num frames: 1135575040. Throughput: 0: 10786.1. Samples: 283945984. Policy #0 lag: (min: 4.0, avg: 83.9, max: 260.0) [2024-06-15 18:48:20,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 18:48:20,774][1652475] Updated weights for policy 0, policy_version 554492 (0.0014) [2024-06-15 18:48:25,270][1652475] Updated weights for policy 0, policy_version 554561 (0.0126) [2024-06-15 18:48:25,738][1648984] Fps is (10 sec: 32768.0, 60 sec: 43156.5, 300 sec: 43209.3). Total num frames: 1135771648. Throughput: 0: 11025.1. Samples: 284014080. Policy #0 lag: (min: 4.0, avg: 83.9, max: 260.0) [2024-06-15 18:48:25,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 18:48:27,501][1652475] Updated weights for policy 0, policy_version 554640 (0.0012) [2024-06-15 18:48:28,695][1652475] Updated weights for policy 0, policy_version 554687 (0.0012) [2024-06-15 18:48:30,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 43690.6, 300 sec: 43542.6). Total num frames: 1136001024. Throughput: 0: 10547.2. Samples: 284066816. Policy #0 lag: (min: 4.0, avg: 83.9, max: 260.0) [2024-06-15 18:48:30,741][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 18:48:32,925][1652475] Updated weights for policy 0, policy_version 554736 (0.0016) [2024-06-15 18:48:35,738][1648984] Fps is (10 sec: 36044.7, 60 sec: 41506.1, 300 sec: 43098.3). Total num frames: 1136132096. Throughput: 0: 10695.1. Samples: 284103680. 
Policy #0 lag: (min: 31.0, avg: 140.4, max: 287.0) [2024-06-15 18:48:35,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 18:48:37,238][1652475] Updated weights for policy 0, policy_version 554816 (0.0136) [2024-06-15 18:48:39,079][1652475] Updated weights for policy 0, policy_version 554878 (0.0014) [2024-06-15 18:48:40,738][1648984] Fps is (10 sec: 45875.0, 60 sec: 42598.4, 300 sec: 43764.8). Total num frames: 1136459776. Throughput: 0: 10387.9. Samples: 284158976. Policy #0 lag: (min: 31.0, avg: 140.4, max: 287.0) [2024-06-15 18:48:40,738][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 18:48:45,440][1652475] Updated weights for policy 0, policy_version 554992 (0.0033) [2024-06-15 18:48:45,738][1648984] Fps is (10 sec: 52429.3, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 1136656384. Throughput: 0: 10410.7. Samples: 284224000. Policy #0 lag: (min: 31.0, avg: 140.4, max: 287.0) [2024-06-15 18:48:45,738][1648984] Avg episode reward: [(0, '-0.230')] [2024-06-15 18:48:49,688][1652475] Updated weights for policy 0, policy_version 555040 (0.0014) [2024-06-15 18:48:50,738][1648984] Fps is (10 sec: 32768.1, 60 sec: 40413.8, 300 sec: 43098.3). Total num frames: 1136787456. Throughput: 0: 10638.2. Samples: 284259328. Policy #0 lag: (min: 31.0, avg: 140.4, max: 287.0) [2024-06-15 18:48:50,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 18:48:51,838][1652475] Updated weights for policy 0, policy_version 555094 (0.0013) [2024-06-15 18:48:53,857][1652475] Updated weights for policy 0, policy_version 555189 (0.0109) [2024-06-15 18:48:55,738][1648984] Fps is (10 sec: 39320.4, 60 sec: 43144.4, 300 sec: 43431.5). Total num frames: 1137049600. Throughput: 0: 10126.2. Samples: 284312576. 
Policy #0 lag: (min: 31.0, avg: 140.4, max: 287.0) [2024-06-15 18:48:55,739][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 18:48:55,777][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000555200_1137049600.pth... [2024-06-15 18:48:55,823][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000550144_1126694912.pth [2024-06-15 18:48:57,343][1651340] Signal inference workers to stop experience collection... (28550 times) [2024-06-15 18:48:57,404][1652475] Updated weights for policy 0, policy_version 555234 (0.0155) [2024-06-15 18:48:57,426][1652475] InferenceWorker_p0-w0: stopping experience collection (28550 times) [2024-06-15 18:48:57,592][1651340] Signal inference workers to resume experience collection... (28550 times) [2024-06-15 18:48:57,610][1652475] InferenceWorker_p0-w0: resuming experience collection (28550 times) [2024-06-15 18:49:00,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 40960.0, 300 sec: 42987.2). Total num frames: 1137180672. Throughput: 0: 10581.3. Samples: 284386816. Policy #0 lag: (min: 31.0, avg: 140.4, max: 287.0) [2024-06-15 18:49:00,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 18:49:01,256][1652475] Updated weights for policy 0, policy_version 555296 (0.0038) [2024-06-15 18:49:04,343][1652475] Updated weights for policy 0, policy_version 555376 (0.0015) [2024-06-15 18:49:05,738][1648984] Fps is (10 sec: 45875.8, 60 sec: 42598.3, 300 sec: 43764.7). Total num frames: 1137508352. Throughput: 0: 10467.5. Samples: 284417024. Policy #0 lag: (min: 31.0, avg: 140.4, max: 287.0) [2024-06-15 18:49:05,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 18:49:05,905][1652475] Updated weights for policy 0, policy_version 555430 (0.0015) [2024-06-15 18:49:08,679][1652475] Updated weights for policy 0, policy_version 555477 (0.0014) [2024-06-15 18:49:10,738][1648984] Fps is (10 sec: 52428.5, 60 sec: 43690.7, 300 sec: 43434.5). 
Total num frames: 1137704960. Throughput: 0: 10433.4. Samples: 284483584. Policy #0 lag: (min: 31.0, avg: 140.4, max: 287.0) [2024-06-15 18:49:10,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 18:49:11,655][1652475] Updated weights for policy 0, policy_version 555525 (0.0014) [2024-06-15 18:49:14,661][1652475] Updated weights for policy 0, policy_version 555588 (0.0013) [2024-06-15 18:49:15,738][1648984] Fps is (10 sec: 42598.9, 60 sec: 41506.1, 300 sec: 43431.5). Total num frames: 1137934336. Throughput: 0: 10797.5. Samples: 284552704. Policy #0 lag: (min: 31.0, avg: 140.4, max: 287.0) [2024-06-15 18:49:15,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 18:49:15,807][1652475] Updated weights for policy 0, policy_version 555648 (0.0013) [2024-06-15 18:49:20,352][1652475] Updated weights for policy 0, policy_version 555730 (0.0016) [2024-06-15 18:49:20,738][1648984] Fps is (10 sec: 45875.2, 60 sec: 43144.5, 300 sec: 43764.7). Total num frames: 1138163712. Throughput: 0: 10672.4. Samples: 284583936. Policy #0 lag: (min: 31.0, avg: 140.4, max: 287.0) [2024-06-15 18:49:20,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 18:49:23,884][1652475] Updated weights for policy 0, policy_version 555792 (0.0015) [2024-06-15 18:49:25,738][1648984] Fps is (10 sec: 42597.8, 60 sec: 43144.4, 300 sec: 43320.4). Total num frames: 1138360320. Throughput: 0: 10922.7. Samples: 284650496. Policy #0 lag: (min: 31.0, avg: 140.4, max: 287.0) [2024-06-15 18:49:25,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 18:49:26,064][1652475] Updated weights for policy 0, policy_version 555844 (0.0012) [2024-06-15 18:49:27,288][1652475] Updated weights for policy 0, policy_version 555903 (0.0014) [2024-06-15 18:49:30,738][1648984] Fps is (10 sec: 45875.1, 60 sec: 43690.6, 300 sec: 43875.8). Total num frames: 1138622464. Throughput: 0: 10888.5. Samples: 284713984. 
Policy #0 lag: (min: 31.0, avg: 140.4, max: 287.0) [2024-06-15 18:49:30,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 18:49:33,833][1652475] Updated weights for policy 0, policy_version 555972 (0.0012) [2024-06-15 18:49:35,712][1652475] Updated weights for policy 0, policy_version 556055 (0.0013) [2024-06-15 18:49:35,740][1648984] Fps is (10 sec: 42598.7, 60 sec: 44236.8, 300 sec: 43764.7). Total num frames: 1138786304. Throughput: 0: 10968.2. Samples: 284752896. Policy #0 lag: (min: 31.0, avg: 140.4, max: 287.0) [2024-06-15 18:49:35,741][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 18:49:38,113][1652475] Updated weights for policy 0, policy_version 556115 (0.0016) [2024-06-15 18:49:39,077][1652475] Updated weights for policy 0, policy_version 556158 (0.0016) [2024-06-15 18:49:40,738][1648984] Fps is (10 sec: 45874.7, 60 sec: 43690.6, 300 sec: 43764.7). Total num frames: 1139081216. Throughput: 0: 11104.7. Samples: 284812288. Policy #0 lag: (min: 31.0, avg: 140.4, max: 287.0) [2024-06-15 18:49:40,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 18:49:41,301][1652475] Updated weights for policy 0, policy_version 556222 (0.0013) [2024-06-15 18:49:45,738][1648984] Fps is (10 sec: 36044.9, 60 sec: 41506.1, 300 sec: 43542.6). Total num frames: 1139146752. Throughput: 0: 11036.4. Samples: 284883456. Policy #0 lag: (min: 31.0, avg: 140.4, max: 287.0) [2024-06-15 18:49:45,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 18:49:47,943][1651340] Signal inference workers to stop experience collection... (28600 times) [2024-06-15 18:49:48,016][1652475] InferenceWorker_p0-w0: stopping experience collection (28600 times) [2024-06-15 18:49:48,018][1652475] Updated weights for policy 0, policy_version 556276 (0.0014) [2024-06-15 18:49:48,190][1651340] Signal inference workers to resume experience collection... 
(28600 times) [2024-06-15 18:49:48,192][1652475] InferenceWorker_p0-w0: resuming experience collection (28600 times) [2024-06-15 18:49:50,405][1652475] Updated weights for policy 0, policy_version 556384 (0.0015) [2024-06-15 18:49:50,738][1648984] Fps is (10 sec: 42599.2, 60 sec: 45329.1, 300 sec: 43431.5). Total num frames: 1139507200. Throughput: 0: 11082.0. Samples: 284915712. Policy #0 lag: (min: 31.0, avg: 140.4, max: 287.0) [2024-06-15 18:49:50,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 18:49:52,630][1652475] Updated weights for policy 0, policy_version 556448 (0.0016) [2024-06-15 18:49:55,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 43690.8, 300 sec: 43542.6). Total num frames: 1139671040. Throughput: 0: 10865.8. Samples: 284972544. Policy #0 lag: (min: 31.0, avg: 140.4, max: 287.0) [2024-06-15 18:49:55,738][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 18:49:59,538][1652475] Updated weights for policy 0, policy_version 556512 (0.0013) [2024-06-15 18:50:00,738][1648984] Fps is (10 sec: 32768.0, 60 sec: 44236.8, 300 sec: 43209.4). Total num frames: 1139834880. Throughput: 0: 10911.3. Samples: 285043712. Policy #0 lag: (min: 31.0, avg: 140.4, max: 287.0) [2024-06-15 18:50:00,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 18:50:01,332][1652475] Updated weights for policy 0, policy_version 556592 (0.0014) [2024-06-15 18:50:03,262][1652475] Updated weights for policy 0, policy_version 556669 (0.0013) [2024-06-15 18:50:04,925][1652475] Updated weights for policy 0, policy_version 556729 (0.0024) [2024-06-15 18:50:05,737][1648984] Fps is (10 sec: 52429.8, 60 sec: 44783.1, 300 sec: 43986.9). Total num frames: 1140195328. Throughput: 0: 10683.8. Samples: 285064704. Policy #0 lag: (min: 95.0, avg: 231.8, max: 447.0) [2024-06-15 18:50:05,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 18:50:10,738][1648984] Fps is (10 sec: 36044.7, 60 sec: 41506.2, 300 sec: 43320.4). Total num frames: 1140195328. 
Throughput: 0: 10877.2. Samples: 285139968. Policy #0 lag: (min: 95.0, avg: 231.8, max: 447.0) [2024-06-15 18:50:10,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 18:50:12,337][1652475] Updated weights for policy 0, policy_version 556787 (0.0013) [2024-06-15 18:50:13,991][1652475] Updated weights for policy 0, policy_version 556855 (0.0106) [2024-06-15 18:50:15,738][1648984] Fps is (10 sec: 39321.0, 60 sec: 44236.8, 300 sec: 43986.9). Total num frames: 1140588544. Throughput: 0: 10843.0. Samples: 285201920. Policy #0 lag: (min: 95.0, avg: 231.8, max: 447.0) [2024-06-15 18:50:15,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 18:50:15,858][1652475] Updated weights for policy 0, policy_version 556929 (0.0012) [2024-06-15 18:50:17,268][1652475] Updated weights for policy 0, policy_version 556987 (0.0013) [2024-06-15 18:50:20,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 42598.4, 300 sec: 43986.9). Total num frames: 1140719616. Throughput: 0: 10672.4. Samples: 285233152. Policy #0 lag: (min: 95.0, avg: 231.8, max: 447.0) [2024-06-15 18:50:20,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 18:50:24,349][1652475] Updated weights for policy 0, policy_version 557056 (0.0014) [2024-06-15 18:50:25,751][1648984] Fps is (10 sec: 35998.7, 60 sec: 43135.4, 300 sec: 43429.6). Total num frames: 1140948992. Throughput: 0: 10862.7. Samples: 285301248. Policy #0 lag: (min: 95.0, avg: 231.8, max: 447.0) [2024-06-15 18:50:25,752][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 18:50:25,932][1652475] Updated weights for policy 0, policy_version 557119 (0.0011) [2024-06-15 18:50:27,710][1652475] Updated weights for policy 0, policy_version 557172 (0.0029) [2024-06-15 18:50:28,535][1651340] Signal inference workers to stop experience collection... 
(28650 times) [2024-06-15 18:50:28,569][1652475] InferenceWorker_p0-w0: stopping experience collection (28650 times) [2024-06-15 18:50:28,891][1651340] Signal inference workers to resume experience collection... (28650 times) [2024-06-15 18:50:28,893][1652475] InferenceWorker_p0-w0: resuming experience collection (28650 times) [2024-06-15 18:50:29,520][1652475] Updated weights for policy 0, policy_version 557240 (0.0014) [2024-06-15 18:50:30,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 43690.7, 300 sec: 43986.9). Total num frames: 1141243904. Throughput: 0: 10604.1. Samples: 285360640. Policy #0 lag: (min: 95.0, avg: 231.8, max: 447.0) [2024-06-15 18:50:30,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 18:50:35,721][1652475] Updated weights for policy 0, policy_version 557280 (0.0014) [2024-06-15 18:50:35,738][1648984] Fps is (10 sec: 36091.1, 60 sec: 42052.3, 300 sec: 43209.3). Total num frames: 1141309440. Throughput: 0: 10695.1. Samples: 285396992. Policy #0 lag: (min: 95.0, avg: 231.8, max: 447.0) [2024-06-15 18:50:35,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 18:50:38,744][1652475] Updated weights for policy 0, policy_version 557360 (0.0015) [2024-06-15 18:50:40,440][1652475] Updated weights for policy 0, policy_version 557424 (0.0011) [2024-06-15 18:50:40,738][1648984] Fps is (10 sec: 36044.8, 60 sec: 42052.4, 300 sec: 43431.5). Total num frames: 1141604352. Throughput: 0: 10854.4. Samples: 285460992. Policy #0 lag: (min: 95.0, avg: 231.8, max: 447.0) [2024-06-15 18:50:40,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 18:50:42,349][1652475] Updated weights for policy 0, policy_version 557498 (0.0013) [2024-06-15 18:50:45,738][1648984] Fps is (10 sec: 45875.3, 60 sec: 43690.7, 300 sec: 43653.7). Total num frames: 1141768192. Throughput: 0: 10626.8. Samples: 285521920. 
Policy #0 lag: (min: 95.0, avg: 231.8, max: 447.0) [2024-06-15 18:50:45,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 18:50:47,975][1652475] Updated weights for policy 0, policy_version 557539 (0.0014) [2024-06-15 18:50:50,337][1652475] Updated weights for policy 0, policy_version 557587 (0.0013) [2024-06-15 18:50:50,738][1648984] Fps is (10 sec: 36044.7, 60 sec: 40960.0, 300 sec: 43320.4). Total num frames: 1141964800. Throughput: 0: 10922.6. Samples: 285556224. Policy #0 lag: (min: 95.0, avg: 231.8, max: 447.0) [2024-06-15 18:50:50,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 18:50:51,826][1652475] Updated weights for policy 0, policy_version 557651 (0.0141) [2024-06-15 18:50:54,483][1652475] Updated weights for policy 0, policy_version 557744 (0.0013) [2024-06-15 18:50:55,738][1648984] Fps is (10 sec: 52426.9, 60 sec: 43690.4, 300 sec: 43986.8). Total num frames: 1142292480. Throughput: 0: 10433.3. Samples: 285609472. Policy #0 lag: (min: 95.0, avg: 231.8, max: 447.0) [2024-06-15 18:50:55,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 18:50:55,755][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000557760_1142292480.pth... [2024-06-15 18:50:55,833][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000552704_1131937792.pth [2024-06-15 18:50:55,839][1651340] Saving a milestone train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/milestones/checkpoint_000557760_1142292480.pth [2024-06-15 18:50:59,992][1652475] Updated weights for policy 0, policy_version 557792 (0.0051) [2024-06-15 18:51:00,739][1648984] Fps is (10 sec: 45869.4, 60 sec: 43143.6, 300 sec: 43098.1). Total num frames: 1142423552. Throughput: 0: 10649.3. Samples: 285681152. 
Policy #0 lag: (min: 95.0, avg: 231.8, max: 447.0) [2024-06-15 18:51:00,740][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 18:51:03,403][1652475] Updated weights for policy 0, policy_version 557872 (0.0014) [2024-06-15 18:51:03,896][1652475] Updated weights for policy 0, policy_version 557887 (0.0011) [2024-06-15 18:51:05,635][1652475] Updated weights for policy 0, policy_version 557952 (0.0040) [2024-06-15 18:51:05,738][1648984] Fps is (10 sec: 39322.9, 60 sec: 41506.0, 300 sec: 43542.6). Total num frames: 1142685696. Throughput: 0: 10626.8. Samples: 285711360. Policy #0 lag: (min: 95.0, avg: 231.8, max: 447.0) [2024-06-15 18:51:05,738][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 18:51:07,069][1652475] Updated weights for policy 0, policy_version 558012 (0.0014) [2024-06-15 18:51:10,738][1648984] Fps is (10 sec: 39326.5, 60 sec: 43690.6, 300 sec: 43098.2). Total num frames: 1142816768. Throughput: 0: 10561.6. Samples: 285776384. Policy #0 lag: (min: 95.0, avg: 231.8, max: 447.0) [2024-06-15 18:51:10,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 18:51:12,225][1652475] Updated weights for policy 0, policy_version 558056 (0.0026) [2024-06-15 18:51:14,652][1652475] Updated weights for policy 0, policy_version 558128 (0.0025) [2024-06-15 18:51:15,738][1648984] Fps is (10 sec: 39320.7, 60 sec: 41506.0, 300 sec: 43098.2). Total num frames: 1143078912. Throughput: 0: 10740.6. Samples: 285843968. Policy #0 lag: (min: 95.0, avg: 231.8, max: 447.0) [2024-06-15 18:51:15,739][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 18:51:16,175][1651340] Signal inference workers to stop experience collection... (28700 times) [2024-06-15 18:51:16,206][1652475] InferenceWorker_p0-w0: stopping experience collection (28700 times) [2024-06-15 18:51:16,450][1651340] Signal inference workers to resume experience collection... 
(28700 times) [2024-06-15 18:51:16,451][1652475] InferenceWorker_p0-w0: resuming experience collection (28700 times) [2024-06-15 18:51:17,186][1652475] Updated weights for policy 0, policy_version 558193 (0.0107) [2024-06-15 18:51:18,100][1652475] Updated weights for policy 0, policy_version 558227 (0.0011) [2024-06-15 18:51:19,122][1652475] Updated weights for policy 0, policy_version 558270 (0.0011) [2024-06-15 18:51:20,738][1648984] Fps is (10 sec: 52429.2, 60 sec: 43690.7, 300 sec: 43209.4). Total num frames: 1143341056. Throughput: 0: 10649.6. Samples: 285876224. Policy #0 lag: (min: 95.0, avg: 231.8, max: 447.0) [2024-06-15 18:51:20,739][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 18:51:25,262][1652475] Updated weights for policy 0, policy_version 558338 (0.0126) [2024-06-15 18:51:25,738][1648984] Fps is (10 sec: 42599.1, 60 sec: 42607.5, 300 sec: 42765.0). Total num frames: 1143504896. Throughput: 0: 10797.5. Samples: 285946880. Policy #0 lag: (min: 95.0, avg: 231.8, max: 447.0) [2024-06-15 18:51:25,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 18:51:27,929][1652475] Updated weights for policy 0, policy_version 558432 (0.0013) [2024-06-15 18:51:30,599][1652475] Updated weights for policy 0, policy_version 558481 (0.0012) [2024-06-15 18:51:30,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 42052.3, 300 sec: 43209.3). Total num frames: 1143767040. Throughput: 0: 10934.0. Samples: 286013952. Policy #0 lag: (min: 95.0, avg: 231.8, max: 447.0) [2024-06-15 18:51:30,738][1648984] Avg episode reward: [(0, '-0.250')] [2024-06-15 18:51:33,570][1652475] Updated weights for policy 0, policy_version 558529 (0.0014) [2024-06-15 18:51:34,735][1652475] Updated weights for policy 0, policy_version 558592 (0.0014) [2024-06-15 18:51:35,738][1648984] Fps is (10 sec: 49150.7, 60 sec: 44782.7, 300 sec: 43098.2). Total num frames: 1143996416. Throughput: 0: 10968.1. Samples: 286049792. 
Policy #0 lag: (min: 31.0, avg: 136.8, max: 287.0) [2024-06-15 18:51:35,739][1648984] Avg episode reward: [(0, '-0.250')] [2024-06-15 18:51:37,525][1652475] Updated weights for policy 0, policy_version 558641 (0.0139) [2024-06-15 18:51:39,121][1652475] Updated weights for policy 0, policy_version 558695 (0.0014) [2024-06-15 18:51:40,738][1648984] Fps is (10 sec: 49150.5, 60 sec: 44236.6, 300 sec: 43542.6). Total num frames: 1144258560. Throughput: 0: 11298.2. Samples: 286117888. Policy #0 lag: (min: 31.0, avg: 136.8, max: 287.0) [2024-06-15 18:51:40,739][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 18:51:41,530][1652475] Updated weights for policy 0, policy_version 558722 (0.0013) [2024-06-15 18:51:42,632][1652475] Updated weights for policy 0, policy_version 558784 (0.0014) [2024-06-15 18:51:45,738][1648984] Fps is (10 sec: 39322.8, 60 sec: 43690.6, 300 sec: 43209.3). Total num frames: 1144389632. Throughput: 0: 11389.5. Samples: 286193664. Policy #0 lag: (min: 31.0, avg: 136.8, max: 287.0) [2024-06-15 18:51:45,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 18:51:46,767][1652475] Updated weights for policy 0, policy_version 558842 (0.0014) [2024-06-15 18:51:48,306][1652475] Updated weights for policy 0, policy_version 558896 (0.0017) [2024-06-15 18:51:50,000][1652475] Updated weights for policy 0, policy_version 558972 (0.0012) [2024-06-15 18:51:50,738][1648984] Fps is (10 sec: 52429.9, 60 sec: 46967.4, 300 sec: 43542.6). Total num frames: 1144782848. Throughput: 0: 11400.5. Samples: 286224384. Policy #0 lag: (min: 31.0, avg: 136.8, max: 287.0) [2024-06-15 18:51:50,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 18:51:54,147][1652475] Updated weights for policy 0, policy_version 559029 (0.0013) [2024-06-15 18:51:55,738][1648984] Fps is (10 sec: 52428.5, 60 sec: 43690.9, 300 sec: 43542.5). Total num frames: 1144913920. Throughput: 0: 11355.0. Samples: 286287360. 
Policy #0 lag: (min: 31.0, avg: 136.8, max: 287.0) [2024-06-15 18:51:55,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 18:51:59,260][1652475] Updated weights for policy 0, policy_version 559077 (0.0014) [2024-06-15 18:52:00,486][1651340] Signal inference workers to stop experience collection... (28750 times) [2024-06-15 18:52:00,594][1652475] InferenceWorker_p0-w0: stopping experience collection (28750 times) [2024-06-15 18:52:00,745][1648984] Fps is (10 sec: 32743.4, 60 sec: 44778.2, 300 sec: 43209.3). Total num frames: 1145110528. Throughput: 0: 11421.4. Samples: 286358016. Policy #0 lag: (min: 31.0, avg: 136.8, max: 287.0) [2024-06-15 18:52:00,746][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 18:52:00,846][1651340] Signal inference workers to resume experience collection... (28750 times) [2024-06-15 18:52:00,847][1652475] InferenceWorker_p0-w0: resuming experience collection (28750 times) [2024-06-15 18:52:01,034][1652475] Updated weights for policy 0, policy_version 559155 (0.0015) [2024-06-15 18:52:02,692][1652475] Updated weights for policy 0, policy_version 559226 (0.0013) [2024-06-15 18:52:05,599][1652475] Updated weights for policy 0, policy_version 559267 (0.0148) [2024-06-15 18:52:05,738][1648984] Fps is (10 sec: 49152.5, 60 sec: 45329.1, 300 sec: 43653.7). Total num frames: 1145405440. Throughput: 0: 11218.5. Samples: 286381056. Policy #0 lag: (min: 31.0, avg: 136.8, max: 287.0) [2024-06-15 18:52:05,738][1648984] Avg episode reward: [(0, '-0.520')] [2024-06-15 18:52:10,512][1652475] Updated weights for policy 0, policy_version 559312 (0.0044) [2024-06-15 18:52:10,738][1648984] Fps is (10 sec: 36071.9, 60 sec: 44236.8, 300 sec: 43209.3). Total num frames: 1145470976. Throughput: 0: 11286.8. Samples: 286454784. 
Policy #0 lag: (min: 31.0, avg: 136.8, max: 287.0) [2024-06-15 18:52:10,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 18:52:12,590][1652475] Updated weights for policy 0, policy_version 559393 (0.0038) [2024-06-15 18:52:14,146][1652475] Updated weights for policy 0, policy_version 559444 (0.0021) [2024-06-15 18:52:15,566][1652475] Updated weights for policy 0, policy_version 559490 (0.0012) [2024-06-15 18:52:15,738][1648984] Fps is (10 sec: 42596.4, 60 sec: 45875.0, 300 sec: 43542.5). Total num frames: 1145831424. Throughput: 0: 11172.9. Samples: 286516736. Policy #0 lag: (min: 31.0, avg: 136.8, max: 287.0) [2024-06-15 18:52:15,739][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 18:52:20,738][1648984] Fps is (10 sec: 49151.4, 60 sec: 43690.5, 300 sec: 43322.8). Total num frames: 1145962496. Throughput: 0: 11013.7. Samples: 286545408. Policy #0 lag: (min: 31.0, avg: 136.8, max: 287.0) [2024-06-15 18:52:20,739][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 18:52:22,324][1652475] Updated weights for policy 0, policy_version 559600 (0.0015) [2024-06-15 18:52:25,392][1652475] Updated weights for policy 0, policy_version 559651 (0.0033) [2024-06-15 18:52:25,738][1648984] Fps is (10 sec: 36046.6, 60 sec: 44783.0, 300 sec: 43431.5). Total num frames: 1146191872. Throughput: 0: 11264.1. Samples: 286624768. Policy #0 lag: (min: 31.0, avg: 136.8, max: 287.0) [2024-06-15 18:52:25,738][1648984] Avg episode reward: [(0, '-0.540')] [2024-06-15 18:52:27,092][1652475] Updated weights for policy 0, policy_version 559716 (0.0012) [2024-06-15 18:52:28,457][1652475] Updated weights for policy 0, policy_version 559778 (0.0014) [2024-06-15 18:52:30,738][1648984] Fps is (10 sec: 52429.7, 60 sec: 45329.1, 300 sec: 43542.6). Total num frames: 1146486784. Throughput: 0: 10808.9. Samples: 286680064. 
Policy #0 lag: (min: 31.0, avg: 136.8, max: 287.0) [2024-06-15 18:52:30,738][1648984] Avg episode reward: [(0, '-0.540')] [2024-06-15 18:52:33,112][1652475] Updated weights for policy 0, policy_version 559817 (0.0018) [2024-06-15 18:52:35,738][1648984] Fps is (10 sec: 42598.1, 60 sec: 43690.9, 300 sec: 43098.2). Total num frames: 1146617856. Throughput: 0: 11013.7. Samples: 286720000. Policy #0 lag: (min: 31.0, avg: 136.8, max: 287.0) [2024-06-15 18:52:35,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 18:52:36,740][1652475] Updated weights for policy 0, policy_version 559888 (0.0044) [2024-06-15 18:52:39,229][1652475] Updated weights for policy 0, policy_version 559984 (0.0015) [2024-06-15 18:52:40,715][1652475] Updated weights for policy 0, policy_version 560048 (0.0037) [2024-06-15 18:52:40,738][1648984] Fps is (10 sec: 49152.2, 60 sec: 45329.3, 300 sec: 43875.8). Total num frames: 1146978304. Throughput: 0: 10877.2. Samples: 286776832. Policy #0 lag: (min: 31.0, avg: 136.8, max: 287.0) [2024-06-15 18:52:40,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 18:52:45,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 43690.7, 300 sec: 42876.1). Total num frames: 1147011072. Throughput: 0: 10913.1. Samples: 286849024. Policy #0 lag: (min: 31.0, avg: 136.8, max: 287.0) [2024-06-15 18:52:45,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 18:52:46,830][1651340] Signal inference workers to stop experience collection... (28800 times) [2024-06-15 18:52:46,936][1652475] InferenceWorker_p0-w0: stopping experience collection (28800 times) [2024-06-15 18:52:47,068][1651340] Signal inference workers to resume experience collection... 
(28800 times) [2024-06-15 18:52:47,069][1652475] InferenceWorker_p0-w0: resuming experience collection (28800 times) [2024-06-15 18:52:47,563][1652475] Updated weights for policy 0, policy_version 560112 (0.0017) [2024-06-15 18:52:49,654][1652475] Updated weights for policy 0, policy_version 560163 (0.0012) [2024-06-15 18:52:50,738][1648984] Fps is (10 sec: 32767.9, 60 sec: 42052.3, 300 sec: 43542.6). Total num frames: 1147305984. Throughput: 0: 11070.6. Samples: 286879232. Policy #0 lag: (min: 31.0, avg: 136.8, max: 287.0) [2024-06-15 18:52:50,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 18:52:51,585][1652475] Updated weights for policy 0, policy_version 560256 (0.0014) [2024-06-15 18:52:53,220][1652475] Updated weights for policy 0, policy_version 560315 (0.0012) [2024-06-15 18:52:55,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 43690.7, 300 sec: 43431.5). Total num frames: 1147535360. Throughput: 0: 10626.8. Samples: 286932992. Policy #0 lag: (min: 31.0, avg: 136.8, max: 287.0) [2024-06-15 18:52:55,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 18:52:55,762][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000560320_1147535360.pth... [2024-06-15 18:52:55,823][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000555200_1137049600.pth [2024-06-15 18:53:00,738][1648984] Fps is (10 sec: 29491.1, 60 sec: 41511.3, 300 sec: 42876.1). Total num frames: 1147600896. Throughput: 0: 10934.1. Samples: 287008768. 
Policy #0 lag: (min: 31.0, avg: 136.8, max: 287.0) [2024-06-15 18:53:00,751][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 18:53:00,988][1652475] Updated weights for policy 0, policy_version 560370 (0.0012) [2024-06-15 18:53:03,213][1652475] Updated weights for policy 0, policy_version 560480 (0.0095) [2024-06-15 18:53:05,600][1652475] Updated weights for policy 0, policy_version 560572 (0.0110) [2024-06-15 18:53:05,738][1648984] Fps is (10 sec: 52429.2, 60 sec: 44236.8, 300 sec: 43986.9). Total num frames: 1148059648. Throughput: 0: 10786.2. Samples: 287030784. Policy #0 lag: (min: 79.0, avg: 225.2, max: 319.0) [2024-06-15 18:53:05,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 18:53:10,738][1648984] Fps is (10 sec: 45875.4, 60 sec: 43144.6, 300 sec: 42765.0). Total num frames: 1148059648. Throughput: 0: 10353.8. Samples: 287090688. Policy #0 lag: (min: 79.0, avg: 225.2, max: 319.0) [2024-06-15 18:53:10,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 18:53:13,991][1652475] Updated weights for policy 0, policy_version 560641 (0.0013) [2024-06-15 18:53:15,738][1648984] Fps is (10 sec: 26214.1, 60 sec: 41506.4, 300 sec: 43209.3). Total num frames: 1148321792. Throughput: 0: 10626.8. Samples: 287158272. Policy #0 lag: (min: 79.0, avg: 225.2, max: 319.0) [2024-06-15 18:53:15,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 18:53:16,511][1652475] Updated weights for policy 0, policy_version 560738 (0.0014) [2024-06-15 18:53:18,340][1652475] Updated weights for policy 0, policy_version 560816 (0.0115) [2024-06-15 18:53:20,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 43690.8, 300 sec: 43431.5). Total num frames: 1148583936. Throughput: 0: 10149.0. Samples: 287176704. Policy #0 lag: (min: 79.0, avg: 225.2, max: 319.0) [2024-06-15 18:53:20,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 18:53:25,738][1648984] Fps is (10 sec: 32767.9, 60 sec: 40959.9, 300 sec: 42876.1). Total num frames: 1148649472. 
Throughput: 0: 10581.3. Samples: 287252992. Policy #0 lag: (min: 79.0, avg: 225.2, max: 319.0) [2024-06-15 18:53:25,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 18:53:26,194][1652475] Updated weights for policy 0, policy_version 560896 (0.0013) [2024-06-15 18:53:28,593][1651340] Signal inference workers to stop experience collection... (28850 times) [2024-06-15 18:53:28,706][1652475] InferenceWorker_p0-w0: stopping experience collection (28850 times) [2024-06-15 18:53:28,836][1651340] Signal inference workers to resume experience collection... (28850 times) [2024-06-15 18:53:28,837][1652475] InferenceWorker_p0-w0: resuming experience collection (28850 times) [2024-06-15 18:53:29,047][1652475] Updated weights for policy 0, policy_version 560995 (0.0216) [2024-06-15 18:53:30,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 41506.1, 300 sec: 43542.6). Total num frames: 1148977152. Throughput: 0: 10092.1. Samples: 287303168. Policy #0 lag: (min: 79.0, avg: 225.2, max: 319.0) [2024-06-15 18:53:30,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 18:53:31,648][1652475] Updated weights for policy 0, policy_version 561060 (0.0013) [2024-06-15 18:53:35,738][1648984] Fps is (10 sec: 45875.4, 60 sec: 41506.1, 300 sec: 42876.1). Total num frames: 1149108224. Throughput: 0: 10160.3. Samples: 287336448. Policy #0 lag: (min: 79.0, avg: 225.2, max: 319.0) [2024-06-15 18:53:35,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 18:53:37,205][1652475] Updated weights for policy 0, policy_version 561104 (0.0155) [2024-06-15 18:53:38,645][1652475] Updated weights for policy 0, policy_version 561171 (0.0014) [2024-06-15 18:53:40,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 40413.8, 300 sec: 43209.3). Total num frames: 1149403136. Throughput: 0: 10581.3. Samples: 287409152. 
Policy #0 lag: (min: 79.0, avg: 225.2, max: 319.0) [2024-06-15 18:53:40,738][1648984] Avg episode reward: [(0, '-0.550')] [2024-06-15 18:53:40,741][1652475] Updated weights for policy 0, policy_version 561248 (0.0013) [2024-06-15 18:53:44,251][1652475] Updated weights for policy 0, policy_version 561296 (0.0012) [2024-06-15 18:53:45,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 43690.6, 300 sec: 43542.6). Total num frames: 1149632512. Throughput: 0: 10274.1. Samples: 287471104. Policy #0 lag: (min: 79.0, avg: 225.2, max: 319.0) [2024-06-15 18:53:45,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 18:53:48,076][1652475] Updated weights for policy 0, policy_version 561350 (0.0013) [2024-06-15 18:53:49,817][1652475] Updated weights for policy 0, policy_version 561428 (0.0094) [2024-06-15 18:53:50,738][1648984] Fps is (10 sec: 49152.0, 60 sec: 43144.5, 300 sec: 43542.6). Total num frames: 1149894656. Throughput: 0: 10740.6. Samples: 287514112. Policy #0 lag: (min: 79.0, avg: 225.2, max: 319.0) [2024-06-15 18:53:50,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 18:53:51,299][1652475] Updated weights for policy 0, policy_version 561489 (0.0146) [2024-06-15 18:53:55,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 41506.1, 300 sec: 43542.6). Total num frames: 1150025728. Throughput: 0: 10763.4. Samples: 287575040. Policy #0 lag: (min: 79.0, avg: 225.2, max: 319.0) [2024-06-15 18:53:55,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 18:53:56,122][1652475] Updated weights for policy 0, policy_version 561552 (0.0012) [2024-06-15 18:54:00,155][1652475] Updated weights for policy 0, policy_version 561652 (0.0162) [2024-06-15 18:54:00,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 44782.9, 300 sec: 43320.4). Total num frames: 1150287872. Throughput: 0: 10922.7. Samples: 287649792. 
Policy #0 lag: (min: 79.0, avg: 225.2, max: 319.0) [2024-06-15 18:54:00,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 18:54:02,331][1652475] Updated weights for policy 0, policy_version 561713 (0.0013) [2024-06-15 18:54:04,049][1652475] Updated weights for policy 0, policy_version 561786 (0.0024) [2024-06-15 18:54:05,738][1648984] Fps is (10 sec: 52427.3, 60 sec: 41505.9, 300 sec: 43542.5). Total num frames: 1150550016. Throughput: 0: 11081.9. Samples: 287675392. Policy #0 lag: (min: 79.0, avg: 225.2, max: 319.0) [2024-06-15 18:54:05,739][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 18:54:08,880][1652475] Updated weights for policy 0, policy_version 561843 (0.0011) [2024-06-15 18:54:10,738][1648984] Fps is (10 sec: 39321.3, 60 sec: 43690.6, 300 sec: 43209.3). Total num frames: 1150681088. Throughput: 0: 10945.4. Samples: 287745536. Policy #0 lag: (min: 79.0, avg: 225.2, max: 319.0) [2024-06-15 18:54:10,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 18:54:11,028][1652475] Updated weights for policy 0, policy_version 561872 (0.0011) [2024-06-15 18:54:12,856][1652475] Updated weights for policy 0, policy_version 561922 (0.0014) [2024-06-15 18:54:13,702][1651340] Signal inference workers to stop experience collection... (28900 times) [2024-06-15 18:54:13,742][1652475] InferenceWorker_p0-w0: stopping experience collection (28900 times) [2024-06-15 18:54:14,091][1651340] Signal inference workers to resume experience collection... (28900 times) [2024-06-15 18:54:14,093][1652475] InferenceWorker_p0-w0: resuming experience collection (28900 times) [2024-06-15 18:54:15,568][1652475] Updated weights for policy 0, policy_version 561986 (0.0013) [2024-06-15 18:54:15,738][1648984] Fps is (10 sec: 39322.7, 60 sec: 43690.7, 300 sec: 43320.4). Total num frames: 1150943232. Throughput: 0: 11184.4. Samples: 287806464. 
Policy #0 lag: (min: 79.0, avg: 225.2, max: 319.0) [2024-06-15 18:54:15,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 18:54:16,987][1652475] Updated weights for policy 0, policy_version 562047 (0.0137) [2024-06-15 18:54:19,859][1652475] Updated weights for policy 0, policy_version 562105 (0.0029) [2024-06-15 18:54:20,739][1648984] Fps is (10 sec: 52429.2, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 1151205376. Throughput: 0: 11150.2. Samples: 287838208. Policy #0 lag: (min: 79.0, avg: 225.2, max: 319.0) [2024-06-15 18:54:20,741][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 18:54:24,919][1652475] Updated weights for policy 0, policy_version 562176 (0.0026) [2024-06-15 18:54:25,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 45329.1, 300 sec: 43209.3). Total num frames: 1151369216. Throughput: 0: 11116.1. Samples: 287909376. Policy #0 lag: (min: 79.0, avg: 225.2, max: 319.0) [2024-06-15 18:54:25,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 18:54:26,279][1652475] Updated weights for policy 0, policy_version 562238 (0.0028) [2024-06-15 18:54:29,824][1652475] Updated weights for policy 0, policy_version 562292 (0.0013) [2024-06-15 18:54:30,738][1648984] Fps is (10 sec: 39322.0, 60 sec: 43690.7, 300 sec: 43431.5). Total num frames: 1151598592. Throughput: 0: 11047.9. Samples: 287968256. Policy #0 lag: (min: 79.0, avg: 225.2, max: 319.0) [2024-06-15 18:54:30,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 18:54:31,061][1652475] Updated weights for policy 0, policy_version 562324 (0.0011) [2024-06-15 18:54:35,249][1652475] Updated weights for policy 0, policy_version 562370 (0.0012) [2024-06-15 18:54:35,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 44236.9, 300 sec: 42987.2). Total num frames: 1151762432. Throughput: 0: 10843.0. Samples: 288002048. 
Policy #0 lag: (min: 79.0, avg: 225.2, max: 319.0) [2024-06-15 18:54:35,738][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 18:54:38,524][1652475] Updated weights for policy 0, policy_version 562448 (0.0050) [2024-06-15 18:54:40,738][1648984] Fps is (10 sec: 39321.1, 60 sec: 43144.5, 300 sec: 43542.6). Total num frames: 1151991808. Throughput: 0: 10865.8. Samples: 288064000. Policy #0 lag: (min: 63.0, avg: 144.4, max: 303.0) [2024-06-15 18:54:40,738][1648984] Avg episode reward: [(0, '-0.230')] [2024-06-15 18:54:41,196][1652475] Updated weights for policy 0, policy_version 562512 (0.0112) [2024-06-15 18:54:43,396][1652475] Updated weights for policy 0, policy_version 562595 (0.0094) [2024-06-15 18:54:45,738][1648984] Fps is (10 sec: 49151.8, 60 sec: 43690.7, 300 sec: 43209.3). Total num frames: 1152253952. Throughput: 0: 10717.9. Samples: 288132096. Policy #0 lag: (min: 63.0, avg: 144.4, max: 303.0) [2024-06-15 18:54:45,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 18:54:47,524][1652475] Updated weights for policy 0, policy_version 562657 (0.0014) [2024-06-15 18:54:50,738][1648984] Fps is (10 sec: 39322.0, 60 sec: 41506.2, 300 sec: 43098.3). Total num frames: 1152385024. Throughput: 0: 10877.2. Samples: 288164864. Policy #0 lag: (min: 63.0, avg: 144.4, max: 303.0) [2024-06-15 18:54:50,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 18:54:51,107][1652475] Updated weights for policy 0, policy_version 562704 (0.0015) [2024-06-15 18:54:53,345][1652475] Updated weights for policy 0, policy_version 562769 (0.0012) [2024-06-15 18:54:54,759][1652475] Updated weights for policy 0, policy_version 562832 (0.0034) [2024-06-15 18:54:55,740][1648984] Fps is (10 sec: 49141.5, 60 sec: 45327.4, 300 sec: 43764.4). Total num frames: 1152745472. Throughput: 0: 10762.9. Samples: 288229888. 
Policy #0 lag: (min: 63.0, avg: 144.4, max: 303.0) [2024-06-15 18:54:55,740][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 18:54:55,799][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000562880_1152778240.pth... [2024-06-15 18:54:55,861][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000557760_1142292480.pth [2024-06-15 18:55:00,417][1652475] Updated weights for policy 0, policy_version 562915 (0.0239) [2024-06-15 18:55:00,738][1648984] Fps is (10 sec: 49151.3, 60 sec: 43144.5, 300 sec: 42987.1). Total num frames: 1152876544. Throughput: 0: 10968.2. Samples: 288300032. Policy #0 lag: (min: 63.0, avg: 144.4, max: 303.0) [2024-06-15 18:55:00,739][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 18:55:02,400][1651340] Signal inference workers to stop experience collection... (28950 times) [2024-06-15 18:55:02,439][1652475] InferenceWorker_p0-w0: stopping experience collection (28950 times) [2024-06-15 18:55:02,611][1651340] Signal inference workers to resume experience collection... (28950 times) [2024-06-15 18:55:02,611][1652475] InferenceWorker_p0-w0: resuming experience collection (28950 times) [2024-06-15 18:55:02,613][1652475] Updated weights for policy 0, policy_version 562976 (0.0014) [2024-06-15 18:55:05,026][1652475] Updated weights for policy 0, policy_version 563056 (0.0014) [2024-06-15 18:55:05,738][1648984] Fps is (10 sec: 42607.6, 60 sec: 43690.9, 300 sec: 43986.9). Total num frames: 1153171456. Throughput: 0: 10945.4. Samples: 288330752. Policy #0 lag: (min: 63.0, avg: 144.4, max: 303.0) [2024-06-15 18:55:05,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 18:55:06,541][1652475] Updated weights for policy 0, policy_version 563128 (0.0025) [2024-06-15 18:55:10,738][1648984] Fps is (10 sec: 42598.9, 60 sec: 43690.7, 300 sec: 43098.3). Total num frames: 1153302528. Throughput: 0: 10899.9. Samples: 288399872. 
Policy #0 lag: (min: 63.0, avg: 144.4, max: 303.0) [2024-06-15 18:55:10,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 18:55:13,629][1652475] Updated weights for policy 0, policy_version 563173 (0.0013) [2024-06-15 18:55:15,188][1652475] Updated weights for policy 0, policy_version 563248 (0.0015) [2024-06-15 18:55:15,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 1153564672. Throughput: 0: 10979.5. Samples: 288462336. Policy #0 lag: (min: 63.0, avg: 144.4, max: 303.0) [2024-06-15 18:55:15,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 18:55:16,026][1652475] Updated weights for policy 0, policy_version 563284 (0.0013) [2024-06-15 18:55:17,636][1652475] Updated weights for policy 0, policy_version 563360 (0.0014) [2024-06-15 18:55:18,201][1652475] Updated weights for policy 0, policy_version 563389 (0.0013) [2024-06-15 18:55:20,738][1648984] Fps is (10 sec: 52425.3, 60 sec: 43690.2, 300 sec: 43655.4). Total num frames: 1153826816. Throughput: 0: 10888.4. Samples: 288492032. Policy #0 lag: (min: 63.0, avg: 144.4, max: 303.0) [2024-06-15 18:55:20,739][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 18:55:25,497][1652475] Updated weights for policy 0, policy_version 563447 (0.0014) [2024-06-15 18:55:25,738][1648984] Fps is (10 sec: 36044.4, 60 sec: 42598.4, 300 sec: 42987.2). Total num frames: 1153925120. Throughput: 0: 11320.9. Samples: 288573440. Policy #0 lag: (min: 63.0, avg: 144.4, max: 303.0) [2024-06-15 18:55:25,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 18:55:26,529][1652475] Updated weights for policy 0, policy_version 563488 (0.0197) [2024-06-15 18:55:28,706][1652475] Updated weights for policy 0, policy_version 563584 (0.0021) [2024-06-15 18:55:30,308][1652475] Updated weights for policy 0, policy_version 563646 (0.0012) [2024-06-15 18:55:30,738][1648984] Fps is (10 sec: 52432.1, 60 sec: 45875.1, 300 sec: 44209.0). Total num frames: 1154351104. 
Throughput: 0: 10820.3. Samples: 288619008. Policy #0 lag: (min: 63.0, avg: 144.4, max: 303.0) [2024-06-15 18:55:30,739][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 18:55:35,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 43144.5, 300 sec: 43209.3). Total num frames: 1154351104. Throughput: 0: 11002.3. Samples: 288659968. Policy #0 lag: (min: 63.0, avg: 144.4, max: 303.0) [2024-06-15 18:55:35,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 18:55:38,061][1652475] Updated weights for policy 0, policy_version 563712 (0.0023) [2024-06-15 18:55:40,738][1648984] Fps is (10 sec: 32768.0, 60 sec: 44783.0, 300 sec: 43764.7). Total num frames: 1154678784. Throughput: 0: 11002.8. Samples: 288724992. Policy #0 lag: (min: 63.0, avg: 144.4, max: 303.0) [2024-06-15 18:55:40,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 18:55:40,898][1652475] Updated weights for policy 0, policy_version 563824 (0.0015) [2024-06-15 18:55:41,681][1651340] Signal inference workers to stop experience collection... (29000 times) [2024-06-15 18:55:41,807][1652475] InferenceWorker_p0-w0: stopping experience collection (29000 times) [2024-06-15 18:55:41,949][1651340] Signal inference workers to resume experience collection... (29000 times) [2024-06-15 18:55:41,961][1652475] InferenceWorker_p0-w0: resuming experience collection (29000 times) [2024-06-15 18:55:42,869][1652475] Updated weights for policy 0, policy_version 563888 (0.0102) [2024-06-15 18:55:45,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 43690.6, 300 sec: 43764.7). Total num frames: 1154875392. Throughput: 0: 10729.3. Samples: 288782848. Policy #0 lag: (min: 63.0, avg: 144.4, max: 303.0) [2024-06-15 18:55:45,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 18:55:50,334][1652475] Updated weights for policy 0, policy_version 563953 (0.0013) [2024-06-15 18:55:50,738][1648984] Fps is (10 sec: 32768.0, 60 sec: 43690.6, 300 sec: 43098.3). Total num frames: 1155006464. Throughput: 0: 10911.3. 
Samples: 288821760. Policy #0 lag: (min: 63.0, avg: 144.4, max: 303.0) [2024-06-15 18:55:50,739][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 18:55:52,496][1652475] Updated weights for policy 0, policy_version 564033 (0.0014) [2024-06-15 18:55:53,698][1652475] Updated weights for policy 0, policy_version 564084 (0.0012) [2024-06-15 18:55:55,738][1648984] Fps is (10 sec: 39320.9, 60 sec: 42053.6, 300 sec: 43542.7). Total num frames: 1155268608. Throughput: 0: 10581.3. Samples: 288876032. Policy #0 lag: (min: 63.0, avg: 144.4, max: 303.0) [2024-06-15 18:55:55,739][1648984] Avg episode reward: [(0, '-0.230')] [2024-06-15 18:55:56,441][1652475] Updated weights for policy 0, policy_version 564113 (0.0014) [2024-06-15 18:56:00,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 42052.3, 300 sec: 43098.2). Total num frames: 1155399680. Throughput: 0: 10797.5. Samples: 288948224. Policy #0 lag: (min: 63.0, avg: 144.4, max: 303.0) [2024-06-15 18:56:00,738][1648984] Avg episode reward: [(0, '-0.250')] [2024-06-15 18:56:01,424][1652475] Updated weights for policy 0, policy_version 564163 (0.0011) [2024-06-15 18:56:03,410][1652475] Updated weights for policy 0, policy_version 564241 (0.0013) [2024-06-15 18:56:05,511][1652475] Updated weights for policy 0, policy_version 564320 (0.0011) [2024-06-15 18:56:05,738][1648984] Fps is (10 sec: 45876.4, 60 sec: 42598.4, 300 sec: 43764.7). Total num frames: 1155727360. Throughput: 0: 10740.8. Samples: 288975360. Policy #0 lag: (min: 63.0, avg: 144.4, max: 303.0) [2024-06-15 18:56:05,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 18:56:10,702][1652475] Updated weights for policy 0, policy_version 564386 (0.0013) [2024-06-15 18:56:10,738][1648984] Fps is (10 sec: 45875.2, 60 sec: 42598.4, 300 sec: 43320.4). Total num frames: 1155858432. Throughput: 0: 10285.5. Samples: 289036288. 
Policy #0 lag: (min: 10.0, avg: 111.8, max: 266.0) [2024-06-15 18:56:10,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 18:56:14,062][1652475] Updated weights for policy 0, policy_version 564435 (0.0040) [2024-06-15 18:56:15,738][1648984] Fps is (10 sec: 36044.7, 60 sec: 42052.3, 300 sec: 43209.3). Total num frames: 1156087808. Throughput: 0: 10683.7. Samples: 289099776. Policy #0 lag: (min: 10.0, avg: 111.8, max: 266.0) [2024-06-15 18:56:15,741][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 18:56:16,789][1652475] Updated weights for policy 0, policy_version 564529 (0.0108) [2024-06-15 18:56:18,457][1652475] Updated weights for policy 0, policy_version 564607 (0.0019) [2024-06-15 18:56:20,738][1648984] Fps is (10 sec: 45874.4, 60 sec: 41506.5, 300 sec: 43431.5). Total num frames: 1156317184. Throughput: 0: 10262.7. Samples: 289121792. Policy #0 lag: (min: 10.0, avg: 111.8, max: 266.0) [2024-06-15 18:56:20,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 18:56:23,571][1652475] Updated weights for policy 0, policy_version 564668 (0.0096) [2024-06-15 18:56:25,738][1648984] Fps is (10 sec: 36044.6, 60 sec: 42052.3, 300 sec: 42987.2). Total num frames: 1156448256. Throughput: 0: 10490.3. Samples: 289197056. Policy #0 lag: (min: 10.0, avg: 111.8, max: 266.0) [2024-06-15 18:56:25,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 18:56:27,789][1652475] Updated weights for policy 0, policy_version 564739 (0.0096) [2024-06-15 18:56:28,095][1651340] Signal inference workers to stop experience collection... (29050 times) [2024-06-15 18:56:28,135][1652475] InferenceWorker_p0-w0: stopping experience collection (29050 times) [2024-06-15 18:56:28,300][1651340] Signal inference workers to resume experience collection... 
(29050 times) [2024-06-15 18:56:28,301][1652475] InferenceWorker_p0-w0: resuming experience collection (29050 times) [2024-06-15 18:56:30,176][1652475] Updated weights for policy 0, policy_version 564836 (0.0015) [2024-06-15 18:56:30,738][1648984] Fps is (10 sec: 52429.4, 60 sec: 41506.1, 300 sec: 43542.6). Total num frames: 1156841472. Throughput: 0: 10331.0. Samples: 289247744. Policy #0 lag: (min: 10.0, avg: 111.8, max: 266.0) [2024-06-15 18:56:30,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 18:56:34,935][1652475] Updated weights for policy 0, policy_version 564885 (0.0020) [2024-06-15 18:56:35,738][1648984] Fps is (10 sec: 49148.4, 60 sec: 43144.0, 300 sec: 42987.1). Total num frames: 1156939776. Throughput: 0: 10353.6. Samples: 289287680. Policy #0 lag: (min: 10.0, avg: 111.8, max: 266.0) [2024-06-15 18:56:35,739][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 18:56:38,158][1652475] Updated weights for policy 0, policy_version 564929 (0.0015) [2024-06-15 18:56:39,915][1652475] Updated weights for policy 0, policy_version 564999 (0.0204) [2024-06-15 18:56:40,738][1648984] Fps is (10 sec: 32768.6, 60 sec: 41506.2, 300 sec: 43320.4). Total num frames: 1157169152. Throughput: 0: 10501.8. Samples: 289348608. Policy #0 lag: (min: 10.0, avg: 111.8, max: 266.0) [2024-06-15 18:56:40,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 18:56:41,250][1652475] Updated weights for policy 0, policy_version 565056 (0.0014) [2024-06-15 18:56:44,566][1652475] Updated weights for policy 0, policy_version 565093 (0.0044) [2024-06-15 18:56:45,738][1648984] Fps is (10 sec: 42601.6, 60 sec: 41506.2, 300 sec: 42653.9). Total num frames: 1157365760. Throughput: 0: 10251.4. Samples: 289409536. 
Policy #0 lag: (min: 10.0, avg: 111.8, max: 266.0) [2024-06-15 18:56:45,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 18:56:47,932][1652475] Updated weights for policy 0, policy_version 565168 (0.0013) [2024-06-15 18:56:50,738][1648984] Fps is (10 sec: 39321.3, 60 sec: 42598.4, 300 sec: 42876.1). Total num frames: 1157562368. Throughput: 0: 10240.0. Samples: 289436160. Policy #0 lag: (min: 10.0, avg: 111.8, max: 266.0) [2024-06-15 18:56:50,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 18:56:51,202][1652475] Updated weights for policy 0, policy_version 565237 (0.0013) [2024-06-15 18:56:53,121][1652475] Updated weights for policy 0, policy_version 565296 (0.0012) [2024-06-15 18:56:55,738][1648984] Fps is (10 sec: 39321.3, 60 sec: 41506.2, 300 sec: 42877.2). Total num frames: 1157758976. Throughput: 0: 10387.9. Samples: 289503744. Policy #0 lag: (min: 10.0, avg: 111.8, max: 266.0) [2024-06-15 18:56:55,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 18:56:55,758][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000565312_1157758976.pth... [2024-06-15 18:56:55,831][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000560320_1147535360.pth [2024-06-15 18:56:58,231][1652475] Updated weights for policy 0, policy_version 565360 (0.0011) [2024-06-15 18:57:00,368][1652475] Updated weights for policy 0, policy_version 565432 (0.0012) [2024-06-15 18:57:00,738][1648984] Fps is (10 sec: 45875.1, 60 sec: 43690.7, 300 sec: 42765.0). Total num frames: 1158021120. Throughput: 0: 10194.5. Samples: 289558528. Policy #0 lag: (min: 10.0, avg: 111.8, max: 266.0) [2024-06-15 18:57:00,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 18:57:02,808][1652475] Updated weights for policy 0, policy_version 565488 (0.0118) [2024-06-15 18:57:05,738][1648984] Fps is (10 sec: 39322.1, 60 sec: 40413.9, 300 sec: 42987.2). Total num frames: 1158152192. 
Throughput: 0: 10399.3. Samples: 289589760. Policy #0 lag: (min: 10.0, avg: 111.8, max: 266.0) [2024-06-15 18:57:05,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 18:57:06,893][1652475] Updated weights for policy 0, policy_version 565555 (0.0020) [2024-06-15 18:57:10,270][1652475] Updated weights for policy 0, policy_version 565619 (0.0023) [2024-06-15 18:57:10,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 42598.4, 300 sec: 42654.0). Total num frames: 1158414336. Throughput: 0: 10456.2. Samples: 289667584. Policy #0 lag: (min: 10.0, avg: 111.8, max: 266.0) [2024-06-15 18:57:10,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 18:57:11,950][1652475] Updated weights for policy 0, policy_version 565680 (0.0101) [2024-06-15 18:57:15,225][1652475] Updated weights for policy 0, policy_version 565760 (0.0018) [2024-06-15 18:57:15,738][1648984] Fps is (10 sec: 52428.4, 60 sec: 43144.5, 300 sec: 43098.3). Total num frames: 1158676480. Throughput: 0: 10558.6. Samples: 289722880. Policy #0 lag: (min: 10.0, avg: 111.8, max: 266.0) [2024-06-15 18:57:15,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 18:57:18,490][1651340] Signal inference workers to stop experience collection... (29100 times) [2024-06-15 18:57:18,545][1652475] InferenceWorker_p0-w0: stopping experience collection (29100 times) [2024-06-15 18:57:18,691][1651340] Signal inference workers to resume experience collection... (29100 times) [2024-06-15 18:57:18,692][1652475] InferenceWorker_p0-w0: resuming experience collection (29100 times) [2024-06-15 18:57:19,227][1652475] Updated weights for policy 0, policy_version 565820 (0.0093) [2024-06-15 18:57:20,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 41506.3, 300 sec: 42765.0). Total num frames: 1158807552. Throughput: 0: 10581.5. Samples: 289763840. 
Policy #0 lag: (min: 10.0, avg: 111.8, max: 266.0) [2024-06-15 18:57:20,738][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 18:57:22,195][1652475] Updated weights for policy 0, policy_version 565879 (0.0013) [2024-06-15 18:57:23,878][1652475] Updated weights for policy 0, policy_version 565936 (0.0024) [2024-06-15 18:57:25,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 43690.7, 300 sec: 42653.9). Total num frames: 1159069696. Throughput: 0: 10513.0. Samples: 289821696. Policy #0 lag: (min: 10.0, avg: 111.8, max: 266.0) [2024-06-15 18:57:25,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 18:57:27,961][1652475] Updated weights for policy 0, policy_version 566005 (0.0012) [2024-06-15 18:57:30,019][1652475] Updated weights for policy 0, policy_version 566053 (0.0087) [2024-06-15 18:57:30,738][1648984] Fps is (10 sec: 52429.2, 60 sec: 41506.2, 300 sec: 43098.3). Total num frames: 1159331840. Throughput: 0: 10638.2. Samples: 289888256. Policy #0 lag: (min: 10.0, avg: 111.8, max: 266.0) [2024-06-15 18:57:30,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 18:57:33,748][1652475] Updated weights for policy 0, policy_version 566112 (0.0013) [2024-06-15 18:57:34,967][1652475] Updated weights for policy 0, policy_version 566160 (0.0015) [2024-06-15 18:57:35,738][1648984] Fps is (10 sec: 49152.0, 60 sec: 43691.2, 300 sec: 42653.9). Total num frames: 1159561216. Throughput: 0: 10899.9. Samples: 289926656. Policy #0 lag: (min: 10.0, avg: 111.8, max: 266.0) [2024-06-15 18:57:35,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 18:57:39,754][1652475] Updated weights for policy 0, policy_version 566224 (0.0014) [2024-06-15 18:57:40,738][1648984] Fps is (10 sec: 36044.6, 60 sec: 42052.2, 300 sec: 42987.2). Total num frames: 1159692288. Throughput: 0: 10956.8. Samples: 289996800. 
Policy #0 lag: (min: 10.0, avg: 111.8, max: 266.0) [2024-06-15 18:57:40,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 18:57:41,200][1652475] Updated weights for policy 0, policy_version 566288 (0.0020) [2024-06-15 18:57:43,959][1652475] Updated weights for policy 0, policy_version 566338 (0.0015) [2024-06-15 18:57:45,253][1652475] Updated weights for policy 0, policy_version 566393 (0.0013) [2024-06-15 18:57:45,738][1648984] Fps is (10 sec: 42598.6, 60 sec: 43690.7, 300 sec: 42987.2). Total num frames: 1159987200. Throughput: 0: 11150.2. Samples: 290060288. Policy #0 lag: (min: 46.0, avg: 178.6, max: 350.0) [2024-06-15 18:57:45,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 18:57:46,654][1652475] Updated weights for policy 0, policy_version 566463 (0.0014) [2024-06-15 18:57:50,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 42598.4, 300 sec: 42653.9). Total num frames: 1160118272. Throughput: 0: 11264.0. Samples: 290096640. Policy #0 lag: (min: 46.0, avg: 178.6, max: 350.0) [2024-06-15 18:57:50,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 18:57:51,838][1652475] Updated weights for policy 0, policy_version 566521 (0.0096) [2024-06-15 18:57:53,625][1652475] Updated weights for policy 0, policy_version 566585 (0.0089) [2024-06-15 18:57:55,738][1648984] Fps is (10 sec: 45874.5, 60 sec: 44782.9, 300 sec: 43542.6). Total num frames: 1160445952. Throughput: 0: 11059.2. Samples: 290165248. Policy #0 lag: (min: 46.0, avg: 178.6, max: 350.0) [2024-06-15 18:57:55,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 18:57:56,333][1652475] Updated weights for policy 0, policy_version 566649 (0.0018) [2024-06-15 18:57:58,134][1652475] Updated weights for policy 0, policy_version 566704 (0.0014) [2024-06-15 18:58:00,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 43690.7, 300 sec: 42653.9). Total num frames: 1160642560. Throughput: 0: 11366.4. Samples: 290234368. 
Policy #0 lag: (min: 46.0, avg: 178.6, max: 350.0) [2024-06-15 18:58:00,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 18:58:02,877][1652475] Updated weights for policy 0, policy_version 566777 (0.0013) [2024-06-15 18:58:03,549][1651340] Signal inference workers to stop experience collection... (29150 times) [2024-06-15 18:58:03,612][1652475] InferenceWorker_p0-w0: stopping experience collection (29150 times) [2024-06-15 18:58:03,779][1651340] Signal inference workers to resume experience collection... (29150 times) [2024-06-15 18:58:03,781][1652475] InferenceWorker_p0-w0: resuming experience collection (29150 times) [2024-06-15 18:58:04,342][1652475] Updated weights for policy 0, policy_version 566817 (0.0019) [2024-06-15 18:58:05,738][1648984] Fps is (10 sec: 45875.1, 60 sec: 45875.1, 300 sec: 43542.5). Total num frames: 1160904704. Throughput: 0: 11229.8. Samples: 290269184. Policy #0 lag: (min: 46.0, avg: 178.6, max: 350.0) [2024-06-15 18:58:05,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 18:58:06,983][1652475] Updated weights for policy 0, policy_version 566880 (0.0013) [2024-06-15 18:58:07,795][1652475] Updated weights for policy 0, policy_version 566910 (0.0019) [2024-06-15 18:58:10,738][1648984] Fps is (10 sec: 39320.5, 60 sec: 43690.5, 300 sec: 43098.2). Total num frames: 1161035776. Throughput: 0: 11411.8. Samples: 290335232. Policy #0 lag: (min: 46.0, avg: 178.6, max: 350.0) [2024-06-15 18:58:10,739][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 18:58:12,289][1652475] Updated weights for policy 0, policy_version 566972 (0.0134) [2024-06-15 18:58:14,648][1652475] Updated weights for policy 0, policy_version 567036 (0.0035) [2024-06-15 18:58:15,738][1648984] Fps is (10 sec: 42598.8, 60 sec: 44236.8, 300 sec: 43209.3). Total num frames: 1161330688. Throughput: 0: 11343.6. Samples: 290398720. 
Policy #0 lag: (min: 46.0, avg: 178.6, max: 350.0) [2024-06-15 18:58:15,739][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 18:58:16,091][1652475] Updated weights for policy 0, policy_version 567076 (0.0014) [2024-06-15 18:58:18,078][1652475] Updated weights for policy 0, policy_version 567107 (0.0011) [2024-06-15 18:58:19,329][1652475] Updated weights for policy 0, policy_version 567168 (0.0020) [2024-06-15 18:58:20,737][1648984] Fps is (10 sec: 52431.2, 60 sec: 45875.4, 300 sec: 43764.8). Total num frames: 1161560064. Throughput: 0: 11195.8. Samples: 290430464. Policy #0 lag: (min: 46.0, avg: 178.6, max: 350.0) [2024-06-15 18:58:20,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 18:58:25,739][1648984] Fps is (10 sec: 36044.6, 60 sec: 43690.6, 300 sec: 43098.2). Total num frames: 1161691136. Throughput: 0: 11275.4. Samples: 290504192. Policy #0 lag: (min: 46.0, avg: 178.6, max: 350.0) [2024-06-15 18:58:25,740][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 18:58:26,232][1652475] Updated weights for policy 0, policy_version 567250 (0.0014) [2024-06-15 18:58:28,629][1652475] Updated weights for policy 0, policy_version 567358 (0.0013) [2024-06-15 18:58:30,738][1648984] Fps is (10 sec: 45874.0, 60 sec: 44782.8, 300 sec: 43764.7). Total num frames: 1162018816. Throughput: 0: 11025.0. Samples: 290556416. Policy #0 lag: (min: 46.0, avg: 178.6, max: 350.0) [2024-06-15 18:58:30,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 18:58:31,272][1652475] Updated weights for policy 0, policy_version 567422 (0.0123) [2024-06-15 18:58:35,738][1648984] Fps is (10 sec: 39321.9, 60 sec: 42052.3, 300 sec: 42987.2). Total num frames: 1162084352. Throughput: 0: 10945.4. Samples: 290589184. 
Policy #0 lag: (min: 46.0, avg: 178.6, max: 350.0) [2024-06-15 18:58:35,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 18:58:38,912][1652475] Updated weights for policy 0, policy_version 567520 (0.0172) [2024-06-15 18:58:40,738][1648984] Fps is (10 sec: 39321.2, 60 sec: 45328.9, 300 sec: 43320.4). Total num frames: 1162412032. Throughput: 0: 10865.8. Samples: 290654208. Policy #0 lag: (min: 46.0, avg: 178.6, max: 350.0) [2024-06-15 18:58:40,739][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 18:58:40,900][1652475] Updated weights for policy 0, policy_version 567600 (0.0016) [2024-06-15 18:58:43,217][1652475] Updated weights for policy 0, policy_version 567635 (0.0014) [2024-06-15 18:58:45,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 43690.6, 300 sec: 43098.3). Total num frames: 1162608640. Throughput: 0: 10695.1. Samples: 290715648. Policy #0 lag: (min: 46.0, avg: 178.6, max: 350.0) [2024-06-15 18:58:45,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 18:58:48,741][1652475] Updated weights for policy 0, policy_version 567691 (0.0017) [2024-06-15 18:58:50,004][1651340] Signal inference workers to stop experience collection... (29200 times) [2024-06-15 18:58:50,039][1652475] InferenceWorker_p0-w0: stopping experience collection (29200 times) [2024-06-15 18:58:50,059][1652475] Updated weights for policy 0, policy_version 567747 (0.0024) [2024-06-15 18:58:50,244][1651340] Signal inference workers to resume experience collection... (29200 times) [2024-06-15 18:58:50,245][1652475] InferenceWorker_p0-w0: resuming experience collection (29200 times) [2024-06-15 18:58:50,739][1648984] Fps is (10 sec: 39317.1, 60 sec: 44781.9, 300 sec: 43320.2). Total num frames: 1162805248. Throughput: 0: 10785.8. Samples: 290754560. 
Policy #0 lag: (min: 46.0, avg: 178.6, max: 350.0) [2024-06-15 18:58:50,740][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 18:58:51,302][1652475] Updated weights for policy 0, policy_version 567808 (0.0012) [2024-06-15 18:58:52,924][1652475] Updated weights for policy 0, policy_version 567871 (0.0012) [2024-06-15 18:58:55,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 42598.4, 300 sec: 43098.2). Total num frames: 1163001856. Throughput: 0: 10661.0. Samples: 290814976. Policy #0 lag: (min: 46.0, avg: 178.6, max: 350.0) [2024-06-15 18:58:55,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 18:58:55,743][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000567872_1163001856.pth... [2024-06-15 18:58:55,887][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000562880_1152778240.pth [2024-06-15 18:58:56,779][1652475] Updated weights for policy 0, policy_version 567920 (0.0015) [2024-06-15 18:59:00,738][1648984] Fps is (10 sec: 39326.9, 60 sec: 42598.4, 300 sec: 42876.1). Total num frames: 1163198464. Throughput: 0: 10911.3. Samples: 290889728. Policy #0 lag: (min: 46.0, avg: 178.6, max: 350.0) [2024-06-15 18:59:00,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 18:59:01,057][1652475] Updated weights for policy 0, policy_version 567985 (0.0076) [2024-06-15 18:59:02,785][1652475] Updated weights for policy 0, policy_version 568066 (0.0022) [2024-06-15 18:59:04,109][1652475] Updated weights for policy 0, policy_version 568124 (0.0013) [2024-06-15 18:59:05,738][1648984] Fps is (10 sec: 52428.0, 60 sec: 43690.6, 300 sec: 43542.5). Total num frames: 1163526144. Throughput: 0: 10797.4. Samples: 290916352. 
Policy #0 lag: (min: 46.0, avg: 178.6, max: 350.0) [2024-06-15 18:59:05,739][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 18:59:10,529][1652475] Updated weights for policy 0, policy_version 568180 (0.0013) [2024-06-15 18:59:10,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 43144.7, 300 sec: 42987.2). Total num frames: 1163624448. Throughput: 0: 10763.4. Samples: 290988544. Policy #0 lag: (min: 46.0, avg: 178.6, max: 350.0) [2024-06-15 18:59:10,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 18:59:11,987][1652475] Updated weights for policy 0, policy_version 568240 (0.0012) [2024-06-15 18:59:13,952][1652475] Updated weights for policy 0, policy_version 568275 (0.0014) [2024-06-15 18:59:15,369][1652475] Updated weights for policy 0, policy_version 568325 (0.0015) [2024-06-15 18:59:15,737][1648984] Fps is (10 sec: 42600.0, 60 sec: 43690.8, 300 sec: 43209.4). Total num frames: 1163952128. Throughput: 0: 10911.3. Samples: 291047424. Policy #0 lag: (min: 9.0, avg: 123.5, max: 249.0) [2024-06-15 18:59:15,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 18:59:20,738][1648984] Fps is (10 sec: 42598.6, 60 sec: 41506.0, 300 sec: 42987.2). Total num frames: 1164050432. Throughput: 0: 10911.3. Samples: 291080192. Policy #0 lag: (min: 9.0, avg: 123.5, max: 249.0) [2024-06-15 18:59:20,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 18:59:20,776][1652475] Updated weights for policy 0, policy_version 568385 (0.0022) [2024-06-15 18:59:23,967][1652475] Updated weights for policy 0, policy_version 568496 (0.0014) [2024-06-15 18:59:25,738][1648984] Fps is (10 sec: 39321.0, 60 sec: 44236.9, 300 sec: 43209.3). Total num frames: 1164345344. Throughput: 0: 10820.3. Samples: 291141120. 
Policy #0 lag: (min: 9.0, avg: 123.5, max: 249.0) [2024-06-15 18:59:25,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 18:59:26,605][1652475] Updated weights for policy 0, policy_version 568569 (0.0012) [2024-06-15 18:59:29,256][1652475] Updated weights for policy 0, policy_version 568629 (0.0013) [2024-06-15 18:59:30,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 42598.5, 300 sec: 43431.5). Total num frames: 1164574720. Throughput: 0: 10956.8. Samples: 291208704. Policy #0 lag: (min: 9.0, avg: 123.5, max: 249.0) [2024-06-15 18:59:30,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 18:59:33,401][1652475] Updated weights for policy 0, policy_version 568672 (0.0013) [2024-06-15 18:59:34,890][1651340] Signal inference workers to stop experience collection... (29250 times) [2024-06-15 18:59:34,924][1652475] InferenceWorker_p0-w0: stopping experience collection (29250 times) [2024-06-15 18:59:35,094][1651340] Signal inference workers to resume experience collection... (29250 times) [2024-06-15 18:59:35,095][1652475] InferenceWorker_p0-w0: resuming experience collection (29250 times) [2024-06-15 18:59:35,097][1652475] Updated weights for policy 0, policy_version 568736 (0.0012) [2024-06-15 18:59:35,738][1648984] Fps is (10 sec: 49152.1, 60 sec: 45875.2, 300 sec: 43542.6). Total num frames: 1164836864. Throughput: 0: 10854.7. Samples: 291243008. Policy #0 lag: (min: 9.0, avg: 123.5, max: 249.0) [2024-06-15 18:59:35,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 18:59:39,146][1652475] Updated weights for policy 0, policy_version 568816 (0.0022) [2024-06-15 18:59:40,333][1652475] Updated weights for policy 0, policy_version 568850 (0.0028) [2024-06-15 18:59:40,738][1648984] Fps is (10 sec: 45875.1, 60 sec: 43690.8, 300 sec: 43320.4). Total num frames: 1165033472. Throughput: 0: 10968.2. Samples: 291308544. 
Policy #0 lag: (min: 9.0, avg: 123.5, max: 249.0) [2024-06-15 18:59:40,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 18:59:44,705][1652475] Updated weights for policy 0, policy_version 568902 (0.0012) [2024-06-15 18:59:45,738][1648984] Fps is (10 sec: 36044.7, 60 sec: 43144.6, 300 sec: 43431.5). Total num frames: 1165197312. Throughput: 0: 10763.4. Samples: 291374080. Policy #0 lag: (min: 9.0, avg: 123.5, max: 249.0) [2024-06-15 18:59:45,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 18:59:46,171][1652475] Updated weights for policy 0, policy_version 568963 (0.0012) [2024-06-15 18:59:47,420][1652475] Updated weights for policy 0, policy_version 569019 (0.0035) [2024-06-15 18:59:50,738][1648984] Fps is (10 sec: 32767.8, 60 sec: 42599.3, 300 sec: 42765.3). Total num frames: 1165361152. Throughput: 0: 10797.5. Samples: 291402240. Policy #0 lag: (min: 9.0, avg: 123.5, max: 249.0) [2024-06-15 18:59:50,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 18:59:51,755][1652475] Updated weights for policy 0, policy_version 569072 (0.0028) [2024-06-15 18:59:53,913][1652475] Updated weights for policy 0, policy_version 569136 (0.0013) [2024-06-15 18:59:55,750][1648984] Fps is (10 sec: 42544.5, 60 sec: 43681.5, 300 sec: 43207.5). Total num frames: 1165623296. Throughput: 0: 10703.5. Samples: 291470336. Policy #0 lag: (min: 9.0, avg: 123.5, max: 249.0) [2024-06-15 18:59:55,751][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 18:59:57,215][1652475] Updated weights for policy 0, policy_version 569185 (0.0012) [2024-06-15 18:59:58,647][1652475] Updated weights for policy 0, policy_version 569248 (0.0012) [2024-06-15 19:00:00,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 44782.9, 300 sec: 43098.2). Total num frames: 1165885440. Throughput: 0: 10797.5. Samples: 291533312. 
Policy #0 lag: (min: 9.0, avg: 123.5, max: 249.0) [2024-06-15 19:00:00,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 19:00:03,282][1652475] Updated weights for policy 0, policy_version 569320 (0.0016) [2024-06-15 19:00:05,496][1652475] Updated weights for policy 0, policy_version 569347 (0.0014) [2024-06-15 19:00:05,738][1648984] Fps is (10 sec: 42652.2, 60 sec: 42052.4, 300 sec: 43209.3). Total num frames: 1166049280. Throughput: 0: 10831.6. Samples: 291567616. Policy #0 lag: (min: 9.0, avg: 123.5, max: 249.0) [2024-06-15 19:00:05,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 19:00:06,728][1652475] Updated weights for policy 0, policy_version 569407 (0.0013) [2024-06-15 19:00:08,807][1652475] Updated weights for policy 0, policy_version 569462 (0.0013) [2024-06-15 19:00:10,582][1652475] Updated weights for policy 0, policy_version 569530 (0.0013) [2024-06-15 19:00:10,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 46421.3, 300 sec: 43542.6). Total num frames: 1166409728. Throughput: 0: 11002.3. Samples: 291636224. Policy #0 lag: (min: 9.0, avg: 123.5, max: 249.0) [2024-06-15 19:00:10,740][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 19:00:15,049][1652475] Updated weights for policy 0, policy_version 569600 (0.0044) [2024-06-15 19:00:15,738][1648984] Fps is (10 sec: 49152.2, 60 sec: 43144.4, 300 sec: 43098.3). Total num frames: 1166540800. Throughput: 0: 11002.3. Samples: 291703808. Policy #0 lag: (min: 9.0, avg: 123.5, max: 249.0) [2024-06-15 19:00:15,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 19:00:18,127][1652475] Updated weights for policy 0, policy_version 569656 (0.0012) [2024-06-15 19:00:20,593][1652475] Updated weights for policy 0, policy_version 569724 (0.0016) [2024-06-15 19:00:20,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 45875.2, 300 sec: 43653.6). Total num frames: 1166802944. Throughput: 0: 10990.9. Samples: 291737600. 
Policy #0 lag: (min: 9.0, avg: 123.5, max: 249.0) [2024-06-15 19:00:20,739][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 19:00:22,819][1651340] Signal inference workers to stop experience collection... (29300 times) [2024-06-15 19:00:22,902][1652475] InferenceWorker_p0-w0: stopping experience collection (29300 times) [2024-06-15 19:00:23,144][1651340] Signal inference workers to resume experience collection... (29300 times) [2024-06-15 19:00:23,144][1652475] InferenceWorker_p0-w0: resuming experience collection (29300 times) [2024-06-15 19:00:23,298][1652475] Updated weights for policy 0, policy_version 569782 (0.0011) [2024-06-15 19:00:25,738][1648984] Fps is (10 sec: 39321.9, 60 sec: 43144.6, 300 sec: 42654.0). Total num frames: 1166934016. Throughput: 0: 10968.2. Samples: 291802112. Policy #0 lag: (min: 9.0, avg: 123.5, max: 249.0) [2024-06-15 19:00:25,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 19:00:26,345][1652475] Updated weights for policy 0, policy_version 569830 (0.0012) [2024-06-15 19:00:29,116][1652475] Updated weights for policy 0, policy_version 569875 (0.0013) [2024-06-15 19:00:30,766][1648984] Fps is (10 sec: 39209.2, 60 sec: 43669.8, 300 sec: 43538.3). Total num frames: 1167196160. Throughput: 0: 10927.1. Samples: 291866112. Policy #0 lag: (min: 9.0, avg: 123.5, max: 249.0) [2024-06-15 19:00:30,767][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 19:00:31,524][1652475] Updated weights for policy 0, policy_version 569936 (0.0027) [2024-06-15 19:00:35,738][1648984] Fps is (10 sec: 42597.0, 60 sec: 42052.1, 300 sec: 42987.1). Total num frames: 1167360000. Throughput: 0: 11059.1. Samples: 291899904. 
Policy #0 lag: (min: 9.0, avg: 123.5, max: 249.0) [2024-06-15 19:00:35,739][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 19:00:35,848][1652475] Updated weights for policy 0, policy_version 570004 (0.0014) [2024-06-15 19:00:37,017][1652475] Updated weights for policy 0, policy_version 570050 (0.0015) [2024-06-15 19:00:38,101][1652475] Updated weights for policy 0, policy_version 570109 (0.0153) [2024-06-15 19:00:40,738][1648984] Fps is (10 sec: 46006.4, 60 sec: 43690.5, 300 sec: 43320.4). Total num frames: 1167654912. Throughput: 0: 11096.4. Samples: 291969536. Policy #0 lag: (min: 9.0, avg: 123.5, max: 249.0) [2024-06-15 19:00:40,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 19:00:42,397][1652475] Updated weights for policy 0, policy_version 570177 (0.0014) [2024-06-15 19:00:45,738][1648984] Fps is (10 sec: 49152.8, 60 sec: 44236.7, 300 sec: 43542.5). Total num frames: 1167851520. Throughput: 0: 11252.6. Samples: 292039680. Policy #0 lag: (min: 9.0, avg: 123.5, max: 249.0) [2024-06-15 19:00:45,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 19:00:47,594][1652475] Updated weights for policy 0, policy_version 570242 (0.0013) [2024-06-15 19:00:49,651][1652475] Updated weights for policy 0, policy_version 570323 (0.0014) [2024-06-15 19:00:50,738][1648984] Fps is (10 sec: 45876.1, 60 sec: 45875.3, 300 sec: 43542.6). Total num frames: 1168113664. Throughput: 0: 11366.4. Samples: 292079104. Policy #0 lag: (min: 12.0, avg: 86.0, max: 268.0) [2024-06-15 19:00:50,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 19:00:52,093][1652475] Updated weights for policy 0, policy_version 570400 (0.0109) [2024-06-15 19:00:53,961][1652475] Updated weights for policy 0, policy_version 570449 (0.0015) [2024-06-15 19:00:55,738][1648984] Fps is (10 sec: 52427.7, 60 sec: 45884.6, 300 sec: 43986.8). Total num frames: 1168375808. Throughput: 0: 11059.1. Samples: 292133888. 
Policy #0 lag: (min: 12.0, avg: 86.0, max: 268.0) [2024-06-15 19:00:55,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 19:00:55,743][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000570496_1168375808.pth... [2024-06-15 19:00:55,785][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000565312_1157758976.pth [2024-06-15 19:00:59,484][1652475] Updated weights for policy 0, policy_version 570500 (0.0016) [2024-06-15 19:01:00,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 43690.7, 300 sec: 43320.4). Total num frames: 1168506880. Throughput: 0: 11286.8. Samples: 292211712. Policy #0 lag: (min: 12.0, avg: 86.0, max: 268.0) [2024-06-15 19:01:00,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 19:01:01,736][1652475] Updated weights for policy 0, policy_version 570608 (0.0016) [2024-06-15 19:01:03,805][1652475] Updated weights for policy 0, policy_version 570656 (0.0014) [2024-06-15 19:01:04,610][1652475] Updated weights for policy 0, policy_version 570688 (0.0017) [2024-06-15 19:01:05,738][1648984] Fps is (10 sec: 39322.9, 60 sec: 45329.1, 300 sec: 43764.7). Total num frames: 1168769024. Throughput: 0: 11195.7. Samples: 292241408. Policy #0 lag: (min: 12.0, avg: 86.0, max: 268.0) [2024-06-15 19:01:05,738][1648984] Avg episode reward: [(0, '-0.630')] [2024-06-15 19:01:07,417][1652475] Updated weights for policy 0, policy_version 570745 (0.0018) [2024-06-15 19:01:10,738][1648984] Fps is (10 sec: 39321.3, 60 sec: 41506.1, 300 sec: 43431.5). Total num frames: 1168900096. Throughput: 0: 11207.1. Samples: 292306432. Policy #0 lag: (min: 12.0, avg: 86.0, max: 268.0) [2024-06-15 19:01:10,739][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 19:01:11,459][1651340] Signal inference workers to stop experience collection... 
(29350 times) [2024-06-15 19:01:11,509][1652475] InferenceWorker_p0-w0: stopping experience collection (29350 times) [2024-06-15 19:01:11,639][1651340] Signal inference workers to resume experience collection... (29350 times) [2024-06-15 19:01:11,640][1652475] InferenceWorker_p0-w0: resuming experience collection (29350 times) [2024-06-15 19:01:11,642][1652475] Updated weights for policy 0, policy_version 570784 (0.0012) [2024-06-15 19:01:13,796][1652475] Updated weights for policy 0, policy_version 570868 (0.0015) [2024-06-15 19:01:15,738][1648984] Fps is (10 sec: 49151.9, 60 sec: 45329.1, 300 sec: 43875.8). Total num frames: 1169260544. Throughput: 0: 11089.0. Samples: 292364800. Policy #0 lag: (min: 12.0, avg: 86.0, max: 268.0) [2024-06-15 19:01:15,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 19:01:15,781][1652475] Updated weights for policy 0, policy_version 570942 (0.0014) [2024-06-15 19:01:20,738][1648984] Fps is (10 sec: 49152.6, 60 sec: 43144.6, 300 sec: 43875.8). Total num frames: 1169391616. Throughput: 0: 11127.5. Samples: 292400640. Policy #0 lag: (min: 12.0, avg: 86.0, max: 268.0) [2024-06-15 19:01:20,738][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 19:01:20,829][1652475] Updated weights for policy 0, policy_version 571003 (0.0014) [2024-06-15 19:01:23,707][1652475] Updated weights for policy 0, policy_version 571072 (0.0013) [2024-06-15 19:01:25,738][1648984] Fps is (10 sec: 29491.2, 60 sec: 43690.6, 300 sec: 43098.3). Total num frames: 1169555456. Throughput: 0: 10934.1. Samples: 292461568. Policy #0 lag: (min: 12.0, avg: 86.0, max: 268.0) [2024-06-15 19:01:25,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 19:01:27,924][1652475] Updated weights for policy 0, policy_version 571169 (0.0014) [2024-06-15 19:01:30,738][1648984] Fps is (10 sec: 42598.0, 60 sec: 43711.5, 300 sec: 43653.7). Total num frames: 1169817600. Throughput: 0: 10706.5. Samples: 292521472. 
Policy #0 lag: (min: 12.0, avg: 86.0, max: 268.0) [2024-06-15 19:01:30,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 19:01:33,947][1652475] Updated weights for policy 0, policy_version 571235 (0.0137) [2024-06-15 19:01:35,738][1648984] Fps is (10 sec: 45875.6, 60 sec: 44237.0, 300 sec: 43542.6). Total num frames: 1170014208. Throughput: 0: 10683.7. Samples: 292559872. Policy #0 lag: (min: 12.0, avg: 86.0, max: 268.0) [2024-06-15 19:01:35,738][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 19:01:36,026][1652475] Updated weights for policy 0, policy_version 571312 (0.0013) [2024-06-15 19:01:38,124][1652475] Updated weights for policy 0, policy_version 571344 (0.0026) [2024-06-15 19:01:39,052][1652475] Updated weights for policy 0, policy_version 571389 (0.0014) [2024-06-15 19:01:40,738][1648984] Fps is (10 sec: 45875.5, 60 sec: 43690.8, 300 sec: 43764.7). Total num frames: 1170276352. Throughput: 0: 10706.6. Samples: 292615680. Policy #0 lag: (min: 12.0, avg: 86.0, max: 268.0) [2024-06-15 19:01:40,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 19:01:41,283][1652475] Updated weights for policy 0, policy_version 571453 (0.0015) [2024-06-15 19:01:45,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 42598.5, 300 sec: 43542.6). Total num frames: 1170407424. Throughput: 0: 10626.9. Samples: 292689920. Policy #0 lag: (min: 12.0, avg: 86.0, max: 268.0) [2024-06-15 19:01:45,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 19:01:47,841][1652475] Updated weights for policy 0, policy_version 571579 (0.0123) [2024-06-15 19:01:50,738][1648984] Fps is (10 sec: 36045.0, 60 sec: 42052.3, 300 sec: 43653.7). Total num frames: 1170636800. Throughput: 0: 10376.5. Samples: 292708352. 
Policy #0 lag: (min: 12.0, avg: 86.0, max: 268.0) [2024-06-15 19:01:50,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 19:01:51,492][1652475] Updated weights for policy 0, policy_version 571632 (0.0014) [2024-06-15 19:01:52,449][1652475] Updated weights for policy 0, policy_version 571680 (0.0012) [2024-06-15 19:01:55,738][1648984] Fps is (10 sec: 45874.9, 60 sec: 41506.3, 300 sec: 43542.6). Total num frames: 1170866176. Throughput: 0: 10615.5. Samples: 292784128. Policy #0 lag: (min: 12.0, avg: 86.0, max: 268.0) [2024-06-15 19:01:55,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 19:01:57,184][1652475] Updated weights for policy 0, policy_version 571716 (0.0015) [2024-06-15 19:01:57,963][1651340] Signal inference workers to stop experience collection... (29400 times) [2024-06-15 19:01:57,998][1652475] InferenceWorker_p0-w0: stopping experience collection (29400 times) [2024-06-15 19:01:58,233][1651340] Signal inference workers to resume experience collection... (29400 times) [2024-06-15 19:01:58,235][1652475] InferenceWorker_p0-w0: resuming experience collection (29400 times) [2024-06-15 19:01:59,154][1652475] Updated weights for policy 0, policy_version 571808 (0.0013) [2024-06-15 19:02:00,738][1648984] Fps is (10 sec: 49151.1, 60 sec: 43690.6, 300 sec: 43986.8). Total num frames: 1171128320. Throughput: 0: 10649.6. Samples: 292844032. Policy #0 lag: (min: 12.0, avg: 86.0, max: 268.0) [2024-06-15 19:02:00,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 19:02:03,440][1652475] Updated weights for policy 0, policy_version 571872 (0.0014) [2024-06-15 19:02:05,128][1652475] Updated weights for policy 0, policy_version 571942 (0.0013) [2024-06-15 19:02:05,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 43690.7, 300 sec: 43986.9). Total num frames: 1171390464. Throughput: 0: 10717.9. Samples: 292882944. 
Policy #0 lag: (min: 12.0, avg: 86.0, max: 268.0) [2024-06-15 19:02:05,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 19:02:08,726][1652475] Updated weights for policy 0, policy_version 571985 (0.0013) [2024-06-15 19:02:10,434][1652475] Updated weights for policy 0, policy_version 572037 (0.0014) [2024-06-15 19:02:10,737][1648984] Fps is (10 sec: 42599.7, 60 sec: 44237.0, 300 sec: 43653.7). Total num frames: 1171554304. Throughput: 0: 10786.2. Samples: 292946944. Policy #0 lag: (min: 12.0, avg: 86.0, max: 268.0) [2024-06-15 19:02:10,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 19:02:14,156][1652475] Updated weights for policy 0, policy_version 572098 (0.0011) [2024-06-15 19:02:15,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 42052.3, 300 sec: 43986.9). Total num frames: 1171783680. Throughput: 0: 10968.2. Samples: 293015040. Policy #0 lag: (min: 12.0, avg: 86.0, max: 268.0) [2024-06-15 19:02:15,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 19:02:16,199][1652475] Updated weights for policy 0, policy_version 572176 (0.0129) [2024-06-15 19:02:20,049][1652475] Updated weights for policy 0, policy_version 572225 (0.0013) [2024-06-15 19:02:20,738][1648984] Fps is (10 sec: 39321.2, 60 sec: 42598.4, 300 sec: 43653.6). Total num frames: 1171947520. Throughput: 0: 10911.3. Samples: 293050880. Policy #0 lag: (min: 15.0, avg: 116.7, max: 271.0) [2024-06-15 19:02:20,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 19:02:21,925][1652475] Updated weights for policy 0, policy_version 572307 (0.0014) [2024-06-15 19:02:25,746][1648984] Fps is (10 sec: 39288.2, 60 sec: 43684.5, 300 sec: 43541.3). Total num frames: 1172176896. Throughput: 0: 11205.0. Samples: 293120000. 
Policy #0 lag: (min: 15.0, avg: 116.7, max: 271.0) [2024-06-15 19:02:25,747][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 19:02:25,924][1652475] Updated weights for policy 0, policy_version 572368 (0.0155) [2024-06-15 19:02:27,912][1652475] Updated weights for policy 0, policy_version 572451 (0.0023) [2024-06-15 19:02:30,738][1648984] Fps is (10 sec: 49151.8, 60 sec: 43690.7, 300 sec: 43653.6). Total num frames: 1172439040. Throughput: 0: 11127.4. Samples: 293190656. Policy #0 lag: (min: 15.0, avg: 116.7, max: 271.0) [2024-06-15 19:02:30,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 19:02:32,840][1652475] Updated weights for policy 0, policy_version 572544 (0.0037) [2024-06-15 19:02:34,326][1652475] Updated weights for policy 0, policy_version 572606 (0.0030) [2024-06-15 19:02:35,746][1648984] Fps is (10 sec: 52428.5, 60 sec: 44776.5, 300 sec: 44096.7). Total num frames: 1172701184. Throughput: 0: 11364.2. Samples: 293219840. Policy #0 lag: (min: 15.0, avg: 116.7, max: 271.0) [2024-06-15 19:02:35,747][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 19:02:37,757][1652475] Updated weights for policy 0, policy_version 572672 (0.0015) [2024-06-15 19:02:39,583][1652475] Updated weights for policy 0, policy_version 572724 (0.0012) [2024-06-15 19:02:40,738][1648984] Fps is (10 sec: 52428.1, 60 sec: 44782.8, 300 sec: 43986.8). Total num frames: 1172963328. Throughput: 0: 11298.1. Samples: 293292544. Policy #0 lag: (min: 15.0, avg: 116.7, max: 271.0) [2024-06-15 19:02:40,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 19:02:42,184][1651340] Signal inference workers to stop experience collection... (29450 times) [2024-06-15 19:02:42,234][1652475] InferenceWorker_p0-w0: stopping experience collection (29450 times) [2024-06-15 19:02:42,477][1651340] Signal inference workers to resume experience collection... 
(29450 times) [2024-06-15 19:02:42,480][1652475] InferenceWorker_p0-w0: resuming experience collection (29450 times) [2024-06-15 19:02:42,721][1652475] Updated weights for policy 0, policy_version 572756 (0.0039) [2024-06-15 19:02:45,738][1648984] Fps is (10 sec: 39355.2, 60 sec: 44782.9, 300 sec: 43986.9). Total num frames: 1173094400. Throughput: 0: 11446.1. Samples: 293359104. Policy #0 lag: (min: 15.0, avg: 116.7, max: 271.0) [2024-06-15 19:02:45,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 19:02:45,909][1652475] Updated weights for policy 0, policy_version 572807 (0.0103) [2024-06-15 19:02:48,511][1652475] Updated weights for policy 0, policy_version 572865 (0.0011) [2024-06-15 19:02:49,769][1652475] Updated weights for policy 0, policy_version 572928 (0.0013) [2024-06-15 19:02:50,738][1648984] Fps is (10 sec: 39322.1, 60 sec: 45329.0, 300 sec: 43764.7). Total num frames: 1173356544. Throughput: 0: 11298.1. Samples: 293391360. Policy #0 lag: (min: 15.0, avg: 116.7, max: 271.0) [2024-06-15 19:02:50,738][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 19:02:51,570][1652475] Updated weights for policy 0, policy_version 572985 (0.0034) [2024-06-15 19:02:55,079][1652475] Updated weights for policy 0, policy_version 573046 (0.0014) [2024-06-15 19:02:55,738][1648984] Fps is (10 sec: 52427.1, 60 sec: 45875.0, 300 sec: 43986.8). Total num frames: 1173618688. Throughput: 0: 11263.9. Samples: 293453824. Policy #0 lag: (min: 15.0, avg: 116.7, max: 271.0) [2024-06-15 19:02:55,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 19:02:55,759][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000573056_1173618688.pth... 
[2024-06-15 19:02:55,832][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000567872_1163001856.pth [2024-06-15 19:02:59,768][1652475] Updated weights for policy 0, policy_version 573078 (0.0014) [2024-06-15 19:03:00,738][1648984] Fps is (10 sec: 39320.7, 60 sec: 43690.6, 300 sec: 43542.5). Total num frames: 1173749760. Throughput: 0: 11377.7. Samples: 293527040. Policy #0 lag: (min: 15.0, avg: 116.7, max: 271.0) [2024-06-15 19:03:00,739][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 19:03:01,470][1652475] Updated weights for policy 0, policy_version 573152 (0.0014) [2024-06-15 19:03:03,571][1652475] Updated weights for policy 0, policy_version 573235 (0.0015) [2024-06-15 19:03:05,738][1648984] Fps is (10 sec: 42599.6, 60 sec: 44236.7, 300 sec: 44098.0). Total num frames: 1174044672. Throughput: 0: 11127.4. Samples: 293551616. Policy #0 lag: (min: 15.0, avg: 116.7, max: 271.0) [2024-06-15 19:03:05,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 19:03:06,318][1652475] Updated weights for policy 0, policy_version 573296 (0.0021) [2024-06-15 19:03:10,738][1648984] Fps is (10 sec: 39322.6, 60 sec: 43144.4, 300 sec: 43431.5). Total num frames: 1174142976. Throughput: 0: 11197.8. Samples: 293623808. Policy #0 lag: (min: 15.0, avg: 116.7, max: 271.0) [2024-06-15 19:03:10,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 19:03:11,712][1652475] Updated weights for policy 0, policy_version 573332 (0.0013) [2024-06-15 19:03:13,704][1652475] Updated weights for policy 0, policy_version 573424 (0.0015) [2024-06-15 19:03:15,466][1652475] Updated weights for policy 0, policy_version 573494 (0.0015) [2024-06-15 19:03:15,738][1648984] Fps is (10 sec: 49152.4, 60 sec: 45875.2, 300 sec: 43986.9). Total num frames: 1174536192. Throughput: 0: 10945.4. Samples: 293683200. 
Policy #0 lag: (min: 15.0, avg: 116.7, max: 271.0) [2024-06-15 19:03:15,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 19:03:18,280][1652475] Updated weights for policy 0, policy_version 573538 (0.0014) [2024-06-15 19:03:20,739][1648984] Fps is (10 sec: 52428.5, 60 sec: 45329.0, 300 sec: 43986.9). Total num frames: 1174667264. Throughput: 0: 10981.6. Samples: 293713920. Policy #0 lag: (min: 15.0, avg: 116.7, max: 271.0) [2024-06-15 19:03:20,740][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 19:03:23,936][1652475] Updated weights for policy 0, policy_version 573600 (0.0014) [2024-06-15 19:03:25,732][1652475] Updated weights for policy 0, policy_version 573680 (0.0154) [2024-06-15 19:03:25,738][1648984] Fps is (10 sec: 36044.5, 60 sec: 45335.4, 300 sec: 43653.6). Total num frames: 1174896640. Throughput: 0: 10922.7. Samples: 293784064. Policy #0 lag: (min: 15.0, avg: 116.7, max: 271.0) [2024-06-15 19:03:25,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 19:03:25,867][1651340] Signal inference workers to stop experience collection... (29500 times) [2024-06-15 19:03:25,932][1652475] InferenceWorker_p0-w0: stopping experience collection (29500 times) [2024-06-15 19:03:26,072][1651340] Signal inference workers to resume experience collection... (29500 times) [2024-06-15 19:03:26,073][1652475] InferenceWorker_p0-w0: resuming experience collection (29500 times) [2024-06-15 19:03:27,563][1652475] Updated weights for policy 0, policy_version 573744 (0.0140) [2024-06-15 19:03:30,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 43690.7, 300 sec: 43986.9). Total num frames: 1175060480. Throughput: 0: 10740.6. Samples: 293842432. Policy #0 lag: (min: 15.0, avg: 116.7, max: 271.0) [2024-06-15 19:03:30,738][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 19:03:30,912][1652475] Updated weights for policy 0, policy_version 573763 (0.0012) [2024-06-15 19:03:35,738][1648984] Fps is (10 sec: 29491.4, 60 sec: 41512.0, 300 sec: 43320.4). 
Total num frames: 1175191552. Throughput: 0: 10763.4. Samples: 293875712. Policy #0 lag: (min: 15.0, avg: 116.7, max: 271.0) [2024-06-15 19:03:35,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 19:03:37,047][1652475] Updated weights for policy 0, policy_version 573860 (0.0015) [2024-06-15 19:03:38,974][1652475] Updated weights for policy 0, policy_version 573923 (0.0061) [2024-06-15 19:03:40,396][1652475] Updated weights for policy 0, policy_version 573987 (0.0012) [2024-06-15 19:03:40,738][1648984] Fps is (10 sec: 49151.8, 60 sec: 43144.6, 300 sec: 43875.8). Total num frames: 1175552000. Throughput: 0: 10581.4. Samples: 293929984. Policy #0 lag: (min: 15.0, avg: 116.7, max: 271.0) [2024-06-15 19:03:40,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 19:03:41,025][1652475] Updated weights for policy 0, policy_version 574016 (0.0010) [2024-06-15 19:03:45,738][1648984] Fps is (10 sec: 49151.9, 60 sec: 43144.5, 300 sec: 43653.8). Total num frames: 1175683072. Throughput: 0: 10626.9. Samples: 294005248. Policy #0 lag: (min: 15.0, avg: 116.7, max: 271.0) [2024-06-15 19:03:45,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 19:03:45,824][1652475] Updated weights for policy 0, policy_version 574071 (0.0012) [2024-06-15 19:03:48,659][1652475] Updated weights for policy 0, policy_version 574128 (0.0015) [2024-06-15 19:03:50,358][1652475] Updated weights for policy 0, policy_version 574192 (0.0022) [2024-06-15 19:03:50,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 43690.6, 300 sec: 43986.9). Total num frames: 1175977984. Throughput: 0: 10763.4. Samples: 294035968. Policy #0 lag: (min: 15.0, avg: 116.7, max: 271.0) [2024-06-15 19:03:50,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 19:03:52,158][1652475] Updated weights for policy 0, policy_version 574260 (0.0013) [2024-06-15 19:03:55,740][1648984] Fps is (10 sec: 42598.4, 60 sec: 41506.3, 300 sec: 43764.7). Total num frames: 1176109056. Throughput: 0: 10547.2. 
Samples: 294098432. Policy #0 lag: (min: 68.0, avg: 242.4, max: 408.0) [2024-06-15 19:03:55,743][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 19:03:58,167][1652475] Updated weights for policy 0, policy_version 574304 (0.0011) [2024-06-15 19:04:00,206][1652475] Updated weights for policy 0, policy_version 574384 (0.0014) [2024-06-15 19:04:00,738][1648984] Fps is (10 sec: 39322.0, 60 sec: 43690.9, 300 sec: 43542.6). Total num frames: 1176371200. Throughput: 0: 10524.4. Samples: 294156800. Policy #0 lag: (min: 68.0, avg: 242.4, max: 408.0) [2024-06-15 19:04:00,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 19:04:02,178][1652475] Updated weights for policy 0, policy_version 574460 (0.0014) [2024-06-15 19:04:05,408][1652475] Updated weights for policy 0, policy_version 574517 (0.0012) [2024-06-15 19:04:05,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 43144.5, 300 sec: 44097.9). Total num frames: 1176633344. Throughput: 0: 10501.7. Samples: 294186496. Policy #0 lag: (min: 68.0, avg: 242.4, max: 408.0) [2024-06-15 19:04:05,739][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 19:04:10,625][1652475] Updated weights for policy 0, policy_version 574547 (0.0013) [2024-06-15 19:04:10,738][1648984] Fps is (10 sec: 29491.1, 60 sec: 42052.3, 300 sec: 43098.2). Total num frames: 1176666112. Throughput: 0: 10490.3. Samples: 294256128. Policy #0 lag: (min: 68.0, avg: 242.4, max: 408.0) [2024-06-15 19:04:10,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 19:04:13,069][1652475] Updated weights for policy 0, policy_version 574624 (0.0013) [2024-06-15 19:04:13,150][1651340] Signal inference workers to stop experience collection... (29550 times) [2024-06-15 19:04:13,202][1652475] InferenceWorker_p0-w0: stopping experience collection (29550 times) [2024-06-15 19:04:13,378][1651340] Signal inference workers to resume experience collection... 
(29550 times) [2024-06-15 19:04:13,379][1652475] InferenceWorker_p0-w0: resuming experience collection (29550 times) [2024-06-15 19:04:15,084][1652475] Updated weights for policy 0, policy_version 574710 (0.0100) [2024-06-15 19:04:15,738][1648984] Fps is (10 sec: 39322.0, 60 sec: 41506.1, 300 sec: 43986.9). Total num frames: 1177026560. Throughput: 0: 10365.2. Samples: 294308864. Policy #0 lag: (min: 68.0, avg: 242.4, max: 408.0) [2024-06-15 19:04:15,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 19:04:16,067][1652475] Updated weights for policy 0, policy_version 574723 (0.0010) [2024-06-15 19:04:20,738][1648984] Fps is (10 sec: 49152.0, 60 sec: 41506.2, 300 sec: 43431.5). Total num frames: 1177157632. Throughput: 0: 10331.0. Samples: 294340608. Policy #0 lag: (min: 68.0, avg: 242.4, max: 408.0) [2024-06-15 19:04:20,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 19:04:23,117][1652475] Updated weights for policy 0, policy_version 574789 (0.0013) [2024-06-15 19:04:24,544][1652475] Updated weights for policy 0, policy_version 574847 (0.0013) [2024-06-15 19:04:25,738][1648984] Fps is (10 sec: 32768.0, 60 sec: 40960.1, 300 sec: 43320.4). Total num frames: 1177354240. Throughput: 0: 10604.1. Samples: 294407168. Policy #0 lag: (min: 68.0, avg: 242.4, max: 408.0) [2024-06-15 19:04:25,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 19:04:26,528][1652475] Updated weights for policy 0, policy_version 574912 (0.0015) [2024-06-15 19:04:28,242][1652475] Updated weights for policy 0, policy_version 574976 (0.0010) [2024-06-15 19:04:30,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 1177681920. Throughput: 0: 10160.4. Samples: 294462464. Policy #0 lag: (min: 68.0, avg: 242.4, max: 408.0) [2024-06-15 19:04:30,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 19:04:35,738][1648984] Fps is (10 sec: 32767.9, 60 sec: 41506.1, 300 sec: 42876.1). Total num frames: 1177681920. 
Throughput: 0: 10262.8. Samples: 294497792. Policy #0 lag: (min: 68.0, avg: 242.4, max: 408.0) [2024-06-15 19:04:35,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 19:04:35,973][1652475] Updated weights for policy 0, policy_version 575056 (0.0014) [2024-06-15 19:04:37,294][1652475] Updated weights for policy 0, policy_version 575104 (0.0029) [2024-06-15 19:04:39,065][1652475] Updated weights for policy 0, policy_version 575157 (0.0012) [2024-06-15 19:04:40,738][1648984] Fps is (10 sec: 39321.3, 60 sec: 42052.3, 300 sec: 43653.6). Total num frames: 1178075136. Throughput: 0: 10410.7. Samples: 294566912. Policy #0 lag: (min: 68.0, avg: 242.4, max: 408.0) [2024-06-15 19:04:40,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 19:04:41,016][1652475] Updated weights for policy 0, policy_version 575249 (0.0117) [2024-06-15 19:04:45,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 42052.3, 300 sec: 43542.6). Total num frames: 1178206208. Throughput: 0: 10513.1. Samples: 294629888. Policy #0 lag: (min: 68.0, avg: 242.4, max: 408.0) [2024-06-15 19:04:45,741][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 19:04:48,774][1652475] Updated weights for policy 0, policy_version 575329 (0.0015) [2024-06-15 19:04:49,684][1652475] Updated weights for policy 0, policy_version 575361 (0.0013) [2024-06-15 19:04:50,738][1648984] Fps is (10 sec: 32768.3, 60 sec: 40413.9, 300 sec: 43322.3). Total num frames: 1178402816. Throughput: 0: 10774.8. Samples: 294671360. Policy #0 lag: (min: 68.0, avg: 242.4, max: 408.0) [2024-06-15 19:04:50,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 19:04:51,089][1652475] Updated weights for policy 0, policy_version 575411 (0.0012) [2024-06-15 19:04:52,486][1652475] Updated weights for policy 0, policy_version 575488 (0.0012) [2024-06-15 19:04:55,751][1648984] Fps is (10 sec: 52360.4, 60 sec: 43681.2, 300 sec: 43540.6). Total num frames: 1178730496. Throughput: 0: 10544.1. Samples: 294730752. 
Policy #0 lag: (min: 68.0, avg: 242.4, max: 408.0) [2024-06-15 19:04:55,752][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 19:04:55,766][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000575552_1178730496.pth... [2024-06-15 19:04:55,854][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000570496_1168375808.pth [2024-06-15 19:04:59,277][1651340] Signal inference workers to stop experience collection... (29600 times) [2024-06-15 19:04:59,371][1652475] InferenceWorker_p0-w0: stopping experience collection (29600 times) [2024-06-15 19:04:59,561][1651340] Signal inference workers to resume experience collection... (29600 times) [2024-06-15 19:04:59,562][1652475] InferenceWorker_p0-w0: resuming experience collection (29600 times) [2024-06-15 19:04:59,564][1652475] Updated weights for policy 0, policy_version 575568 (0.0012) [2024-06-15 19:05:00,737][1648984] Fps is (10 sec: 42598.7, 60 sec: 40960.1, 300 sec: 43320.4). Total num frames: 1178828800. Throughput: 0: 10934.1. Samples: 294800896. Policy #0 lag: (min: 68.0, avg: 242.4, max: 408.0) [2024-06-15 19:05:00,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 19:05:02,046][1652475] Updated weights for policy 0, policy_version 575632 (0.0012) [2024-06-15 19:05:03,307][1652475] Updated weights for policy 0, policy_version 575684 (0.0013) [2024-06-15 19:05:04,719][1652475] Updated weights for policy 0, policy_version 575744 (0.0013) [2024-06-15 19:05:05,738][1648984] Fps is (10 sec: 45935.2, 60 sec: 42598.4, 300 sec: 43320.4). Total num frames: 1179189248. Throughput: 0: 10990.9. Samples: 294835200. Policy #0 lag: (min: 68.0, avg: 242.4, max: 408.0) [2024-06-15 19:05:05,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 19:05:06,022][1652475] Updated weights for policy 0, policy_version 575808 (0.0150) [2024-06-15 19:05:10,739][1648984] Fps is (10 sec: 42596.7, 60 sec: 43144.3, 300 sec: 43098.2). 
Total num frames: 1179254784. Throughput: 0: 10888.5. Samples: 294897152. Policy #0 lag: (min: 68.0, avg: 242.4, max: 408.0) [2024-06-15 19:05:10,741][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 19:05:12,269][1652475] Updated weights for policy 0, policy_version 575865 (0.0011) [2024-06-15 19:05:15,170][1652475] Updated weights for policy 0, policy_version 575952 (0.0016) [2024-06-15 19:05:15,739][1648984] Fps is (10 sec: 39316.9, 60 sec: 42597.5, 300 sec: 43320.2). Total num frames: 1179582464. Throughput: 0: 10922.4. Samples: 294953984. Policy #0 lag: (min: 68.0, avg: 242.4, max: 408.0) [2024-06-15 19:05:15,739][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 19:05:16,295][1652475] Updated weights for policy 0, policy_version 575996 (0.0013) [2024-06-15 19:05:20,738][1648984] Fps is (10 sec: 52430.2, 60 sec: 43690.6, 300 sec: 43542.5). Total num frames: 1179779072. Throughput: 0: 10922.7. Samples: 294989312. Policy #0 lag: (min: 68.0, avg: 242.4, max: 408.0) [2024-06-15 19:05:20,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 19:05:22,921][1652475] Updated weights for policy 0, policy_version 576080 (0.0015) [2024-06-15 19:05:25,640][1652475] Updated weights for policy 0, policy_version 576131 (0.0014) [2024-06-15 19:05:25,740][1648984] Fps is (10 sec: 32771.8, 60 sec: 42598.3, 300 sec: 43102.4). Total num frames: 1179910144. Throughput: 0: 10831.6. Samples: 295054336. Policy #0 lag: (min: 68.0, avg: 242.4, max: 408.0) [2024-06-15 19:05:25,741][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 19:05:27,577][1652475] Updated weights for policy 0, policy_version 576212 (0.0015) [2024-06-15 19:05:30,738][1648984] Fps is (10 sec: 39321.0, 60 sec: 41506.0, 300 sec: 43431.5). Total num frames: 1180172288. Throughput: 0: 10877.1. Samples: 295119360. 
Policy #0 lag: (min: 15.0, avg: 108.3, max: 271.0) [2024-06-15 19:05:30,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 19:05:31,801][1652475] Updated weights for policy 0, policy_version 576291 (0.0017) [2024-06-15 19:05:34,534][1652475] Updated weights for policy 0, policy_version 576336 (0.0014) [2024-06-15 19:05:35,739][1648984] Fps is (10 sec: 52424.1, 60 sec: 45874.5, 300 sec: 43320.3). Total num frames: 1180434432. Throughput: 0: 10785.9. Samples: 295156736. Policy #0 lag: (min: 15.0, avg: 108.3, max: 271.0) [2024-06-15 19:05:35,739][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 19:05:37,362][1652475] Updated weights for policy 0, policy_version 576404 (0.0128) [2024-06-15 19:05:38,768][1652475] Updated weights for policy 0, policy_version 576480 (0.0014) [2024-06-15 19:05:40,738][1648984] Fps is (10 sec: 52429.6, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 1180696576. Throughput: 0: 10812.0. Samples: 295217152. Policy #0 lag: (min: 15.0, avg: 108.3, max: 271.0) [2024-06-15 19:05:40,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 19:05:42,476][1651340] Signal inference workers to stop experience collection... (29650 times) [2024-06-15 19:05:42,539][1652475] InferenceWorker_p0-w0: stopping experience collection (29650 times) [2024-06-15 19:05:42,692][1651340] Signal inference workers to resume experience collection... (29650 times) [2024-06-15 19:05:42,693][1652475] InferenceWorker_p0-w0: resuming experience collection (29650 times) [2024-06-15 19:05:43,256][1652475] Updated weights for policy 0, policy_version 576546 (0.0047) [2024-06-15 19:05:45,738][1648984] Fps is (10 sec: 39325.3, 60 sec: 43690.7, 300 sec: 43098.2). Total num frames: 1180827648. Throughput: 0: 10865.7. Samples: 295289856. 
Policy #0 lag: (min: 15.0, avg: 108.3, max: 271.0) [2024-06-15 19:05:45,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 19:05:46,534][1652475] Updated weights for policy 0, policy_version 576592 (0.0011) [2024-06-15 19:05:48,856][1652475] Updated weights for policy 0, policy_version 576641 (0.0016) [2024-06-15 19:05:50,303][1652475] Updated weights for policy 0, policy_version 576707 (0.0013) [2024-06-15 19:05:50,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 45329.0, 300 sec: 43209.4). Total num frames: 1181122560. Throughput: 0: 10820.3. Samples: 295322112. Policy #0 lag: (min: 15.0, avg: 108.3, max: 271.0) [2024-06-15 19:05:50,739][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 19:05:53,568][1652475] Updated weights for policy 0, policy_version 576771 (0.0014) [2024-06-15 19:05:54,838][1652475] Updated weights for policy 0, policy_version 576832 (0.0024) [2024-06-15 19:05:55,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 43700.2, 300 sec: 43542.6). Total num frames: 1181351936. Throughput: 0: 10900.0. Samples: 295387648. Policy #0 lag: (min: 15.0, avg: 108.3, max: 271.0) [2024-06-15 19:05:55,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 19:06:00,722][1652475] Updated weights for policy 0, policy_version 576896 (0.0106) [2024-06-15 19:06:00,738][1648984] Fps is (10 sec: 36045.0, 60 sec: 44236.7, 300 sec: 43098.3). Total num frames: 1181483008. Throughput: 0: 11139.2. Samples: 295455232. Policy #0 lag: (min: 15.0, avg: 108.3, max: 271.0) [2024-06-15 19:06:00,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 19:06:01,750][1652475] Updated weights for policy 0, policy_version 576954 (0.0016) [2024-06-15 19:06:04,828][1652475] Updated weights for policy 0, policy_version 577028 (0.0013) [2024-06-15 19:06:05,738][1648984] Fps is (10 sec: 45875.5, 60 sec: 43690.7, 300 sec: 43764.7). Total num frames: 1181810688. Throughput: 0: 11104.7. Samples: 295489024. 
Policy #0 lag: (min: 15.0, avg: 108.3, max: 271.0) [2024-06-15 19:06:05,738][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 19:06:06,037][1652475] Updated weights for policy 0, policy_version 577088 (0.0015) [2024-06-15 19:06:10,738][1648984] Fps is (10 sec: 39321.1, 60 sec: 43690.8, 300 sec: 42765.0). Total num frames: 1181876224. Throughput: 0: 10945.4. Samples: 295546880. Policy #0 lag: (min: 15.0, avg: 108.3, max: 271.0) [2024-06-15 19:06:10,739][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 19:06:11,766][1652475] Updated weights for policy 0, policy_version 577151 (0.0013) [2024-06-15 19:06:15,722][1652475] Updated weights for policy 0, policy_version 577232 (0.0017) [2024-06-15 19:06:15,737][1648984] Fps is (10 sec: 36045.2, 60 sec: 43145.5, 300 sec: 43320.4). Total num frames: 1182171136. Throughput: 0: 11093.4. Samples: 295618560. Policy #0 lag: (min: 15.0, avg: 108.3, max: 271.0) [2024-06-15 19:06:15,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 19:06:16,882][1652475] Updated weights for policy 0, policy_version 577277 (0.0013) [2024-06-15 19:06:18,638][1652475] Updated weights for policy 0, policy_version 577333 (0.0014) [2024-06-15 19:06:20,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 43690.6, 300 sec: 43542.6). Total num frames: 1182400512. Throughput: 0: 10911.5. Samples: 295647744. Policy #0 lag: (min: 15.0, avg: 108.3, max: 271.0) [2024-06-15 19:06:20,739][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 19:06:22,496][1652475] Updated weights for policy 0, policy_version 577381 (0.0012) [2024-06-15 19:06:25,267][1652475] Updated weights for policy 0, policy_version 577441 (0.0014) [2024-06-15 19:06:25,738][1648984] Fps is (10 sec: 45874.5, 60 sec: 45329.1, 300 sec: 43431.5). Total num frames: 1182629888. Throughput: 0: 11161.6. Samples: 295719424. 
Policy #0 lag: (min: 15.0, avg: 108.3, max: 271.0) [2024-06-15 19:06:25,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 19:06:27,448][1652475] Updated weights for policy 0, policy_version 577488 (0.0013) [2024-06-15 19:06:28,543][1651340] Signal inference workers to stop experience collection... (29700 times) [2024-06-15 19:06:28,636][1652475] InferenceWorker_p0-w0: stopping experience collection (29700 times) [2024-06-15 19:06:28,753][1651340] Signal inference workers to resume experience collection... (29700 times) [2024-06-15 19:06:28,755][1652475] InferenceWorker_p0-w0: resuming experience collection (29700 times) [2024-06-15 19:06:29,443][1652475] Updated weights for policy 0, policy_version 577556 (0.0140) [2024-06-15 19:06:30,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 45875.3, 300 sec: 43764.7). Total num frames: 1182924800. Throughput: 0: 10786.1. Samples: 295775232. Policy #0 lag: (min: 15.0, avg: 108.3, max: 271.0) [2024-06-15 19:06:30,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 19:06:35,063][1652475] Updated weights for policy 0, policy_version 577616 (0.0014) [2024-06-15 19:06:35,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 43145.2, 300 sec: 43209.3). Total num frames: 1183023104. Throughput: 0: 10888.5. Samples: 295812096. Policy #0 lag: (min: 15.0, avg: 108.3, max: 271.0) [2024-06-15 19:06:35,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 19:06:37,364][1652475] Updated weights for policy 0, policy_version 577682 (0.0032) [2024-06-15 19:06:38,383][1652475] Updated weights for policy 0, policy_version 577724 (0.0011) [2024-06-15 19:06:40,154][1652475] Updated weights for policy 0, policy_version 577760 (0.0014) [2024-06-15 19:06:40,738][1648984] Fps is (10 sec: 36045.0, 60 sec: 43144.6, 300 sec: 43653.6). Total num frames: 1183285248. Throughput: 0: 10945.4. Samples: 295880192. 
Policy #0 lag: (min: 15.0, avg: 108.3, max: 271.0) [2024-06-15 19:06:40,740][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 19:06:42,577][1652475] Updated weights for policy 0, policy_version 577847 (0.0015) [2024-06-15 19:06:45,738][1648984] Fps is (10 sec: 42598.7, 60 sec: 43690.7, 300 sec: 43431.5). Total num frames: 1183449088. Throughput: 0: 10740.6. Samples: 295938560. Policy #0 lag: (min: 15.0, avg: 108.3, max: 271.0) [2024-06-15 19:06:45,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 19:06:49,530][1652475] Updated weights for policy 0, policy_version 577904 (0.0033) [2024-06-15 19:06:50,738][1648984] Fps is (10 sec: 36044.9, 60 sec: 42052.3, 300 sec: 43320.4). Total num frames: 1183645696. Throughput: 0: 10888.5. Samples: 295979008. Policy #0 lag: (min: 15.0, avg: 108.3, max: 271.0) [2024-06-15 19:06:50,738][1648984] Avg episode reward: [(0, '-0.250')] [2024-06-15 19:06:51,116][1652475] Updated weights for policy 0, policy_version 577968 (0.0013) [2024-06-15 19:06:52,521][1652475] Updated weights for policy 0, policy_version 578016 (0.0013) [2024-06-15 19:06:55,055][1652475] Updated weights for policy 0, policy_version 578110 (0.0014) [2024-06-15 19:06:55,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 1183973376. Throughput: 0: 10638.2. Samples: 296025600. Policy #0 lag: (min: 15.0, avg: 108.3, max: 271.0) [2024-06-15 19:06:55,738][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 19:06:55,793][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000578112_1183973376.pth... [2024-06-15 19:06:55,840][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000573056_1173618688.pth [2024-06-15 19:07:00,738][1648984] Fps is (10 sec: 32767.9, 60 sec: 41506.1, 300 sec: 42653.9). Total num frames: 1183973376. Throughput: 0: 10695.1. Samples: 296099840. 
Policy #0 lag: (min: 15.0, avg: 108.3, max: 271.0) [2024-06-15 19:07:00,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 19:07:02,085][1652475] Updated weights for policy 0, policy_version 578170 (0.0015) [2024-06-15 19:07:03,464][1652475] Updated weights for policy 0, policy_version 578210 (0.0011) [2024-06-15 19:07:05,738][1648984] Fps is (10 sec: 36044.9, 60 sec: 42052.3, 300 sec: 43320.4). Total num frames: 1184333824. Throughput: 0: 10706.5. Samples: 296129536. Policy #0 lag: (min: 0.0, avg: 56.7, max: 256.0) [2024-06-15 19:07:05,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 19:07:06,430][1652475] Updated weights for policy 0, policy_version 578320 (0.0132) [2024-06-15 19:07:07,245][1652475] Updated weights for policy 0, policy_version 578368 (0.0014) [2024-06-15 19:07:10,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 43690.7, 300 sec: 43098.2). Total num frames: 1184497664. Throughput: 0: 10410.7. Samples: 296187904. Policy #0 lag: (min: 0.0, avg: 56.7, max: 256.0) [2024-06-15 19:07:10,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 19:07:14,171][1651340] Signal inference workers to stop experience collection... (29750 times) [2024-06-15 19:07:14,209][1652475] InferenceWorker_p0-w0: stopping experience collection (29750 times) [2024-06-15 19:07:14,340][1651340] Signal inference workers to resume experience collection... (29750 times) [2024-06-15 19:07:14,341][1652475] InferenceWorker_p0-w0: resuming experience collection (29750 times) [2024-06-15 19:07:14,560][1652475] Updated weights for policy 0, policy_version 578435 (0.0101) [2024-06-15 19:07:15,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 42598.3, 300 sec: 43320.4). Total num frames: 1184727040. Throughput: 0: 10672.4. Samples: 296255488. 
Policy #0 lag: (min: 0.0, avg: 56.7, max: 256.0) [2024-06-15 19:07:15,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 19:07:16,755][1652475] Updated weights for policy 0, policy_version 578497 (0.0013) [2024-06-15 19:07:19,007][1652475] Updated weights for policy 0, policy_version 578598 (0.0015) [2024-06-15 19:07:20,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 43690.7, 300 sec: 43543.8). Total num frames: 1185021952. Throughput: 0: 10399.3. Samples: 296280064. Policy #0 lag: (min: 0.0, avg: 56.7, max: 256.0) [2024-06-15 19:07:20,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 19:07:25,254][1652475] Updated weights for policy 0, policy_version 578640 (0.0015) [2024-06-15 19:07:25,738][1648984] Fps is (10 sec: 36043.9, 60 sec: 40959.8, 300 sec: 42876.1). Total num frames: 1185087488. Throughput: 0: 10626.8. Samples: 296358400. Policy #0 lag: (min: 0.0, avg: 56.7, max: 256.0) [2024-06-15 19:07:25,739][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 19:07:26,931][1652475] Updated weights for policy 0, policy_version 578707 (0.0013) [2024-06-15 19:07:28,042][1652475] Updated weights for policy 0, policy_version 578752 (0.0012) [2024-06-15 19:07:29,248][1652475] Updated weights for policy 0, policy_version 578814 (0.0013) [2024-06-15 19:07:30,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 41506.2, 300 sec: 43099.5). Total num frames: 1185415168. Throughput: 0: 10683.7. Samples: 296419328. Policy #0 lag: (min: 0.0, avg: 56.7, max: 256.0) [2024-06-15 19:07:30,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 19:07:32,299][1652475] Updated weights for policy 0, policy_version 578880 (0.0117) [2024-06-15 19:07:35,738][1648984] Fps is (10 sec: 45876.6, 60 sec: 42052.3, 300 sec: 42654.0). Total num frames: 1185546240. Throughput: 0: 10524.4. Samples: 296452608. 
Policy #0 lag: (min: 0.0, avg: 56.7, max: 256.0) [2024-06-15 19:07:35,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 19:07:38,423][1652475] Updated weights for policy 0, policy_version 578960 (0.0113) [2024-06-15 19:07:40,479][1652475] Updated weights for policy 0, policy_version 579042 (0.0013) [2024-06-15 19:07:40,738][1648984] Fps is (10 sec: 49151.9, 60 sec: 43690.6, 300 sec: 43431.5). Total num frames: 1185906688. Throughput: 0: 10877.2. Samples: 296515072. Policy #0 lag: (min: 0.0, avg: 56.7, max: 256.0) [2024-06-15 19:07:40,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 19:07:44,406][1652475] Updated weights for policy 0, policy_version 579092 (0.0028) [2024-06-15 19:07:45,740][1648984] Fps is (10 sec: 52428.3, 60 sec: 43690.6, 300 sec: 43098.2). Total num frames: 1186070528. Throughput: 0: 10729.2. Samples: 296582656. Policy #0 lag: (min: 0.0, avg: 56.7, max: 256.0) [2024-06-15 19:07:45,741][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 19:07:48,412][1652475] Updated weights for policy 0, policy_version 579152 (0.0107) [2024-06-15 19:07:50,607][1652475] Updated weights for policy 0, policy_version 579232 (0.0014) [2024-06-15 19:07:50,738][1648984] Fps is (10 sec: 36043.4, 60 sec: 43690.3, 300 sec: 42876.1). Total num frames: 1186267136. Throughput: 0: 11002.2. Samples: 296624640. Policy #0 lag: (min: 0.0, avg: 56.7, max: 256.0) [2024-06-15 19:07:50,739][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 19:07:51,777][1652475] Updated weights for policy 0, policy_version 579281 (0.0058) [2024-06-15 19:07:55,303][1651340] Signal inference workers to stop experience collection... (29800 times) [2024-06-15 19:07:55,337][1652475] InferenceWorker_p0-w0: stopping experience collection (29800 times) [2024-06-15 19:07:55,621][1651340] Signal inference workers to resume experience collection... 
(29800 times) [2024-06-15 19:07:55,622][1652475] InferenceWorker_p0-w0: resuming experience collection (29800 times) [2024-06-15 19:07:55,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 42052.2, 300 sec: 43209.4). Total num frames: 1186496512. Throughput: 0: 11025.1. Samples: 296684032. Policy #0 lag: (min: 0.0, avg: 56.7, max: 256.0) [2024-06-15 19:07:55,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 19:07:56,220][1652475] Updated weights for policy 0, policy_version 579362 (0.0013) [2024-06-15 19:08:00,738][1648984] Fps is (10 sec: 39323.3, 60 sec: 44782.9, 300 sec: 42765.0). Total num frames: 1186660352. Throughput: 0: 11116.1. Samples: 296755712. Policy #0 lag: (min: 0.0, avg: 56.7, max: 256.0) [2024-06-15 19:08:00,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 19:08:01,005][1652475] Updated weights for policy 0, policy_version 579447 (0.0015) [2024-06-15 19:08:02,313][1652475] Updated weights for policy 0, policy_version 579488 (0.0012) [2024-06-15 19:08:03,496][1652475] Updated weights for policy 0, policy_version 579537 (0.0015) [2024-06-15 19:08:05,738][1648984] Fps is (10 sec: 49152.6, 60 sec: 44236.8, 300 sec: 43542.6). Total num frames: 1186988032. Throughput: 0: 11252.6. Samples: 296786432. Policy #0 lag: (min: 0.0, avg: 56.7, max: 256.0) [2024-06-15 19:08:05,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 19:08:07,413][1652475] Updated weights for policy 0, policy_version 579618 (0.0134) [2024-06-15 19:08:10,738][1648984] Fps is (10 sec: 45875.1, 60 sec: 43690.7, 300 sec: 42653.9). Total num frames: 1187119104. Throughput: 0: 11070.6. Samples: 296856576. 
Policy #0 lag: (min: 0.0, avg: 56.7, max: 256.0) [2024-06-15 19:08:10,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 19:08:13,013][1652475] Updated weights for policy 0, policy_version 579680 (0.0012) [2024-06-15 19:08:15,275][1652475] Updated weights for policy 0, policy_version 579772 (0.0013) [2024-06-15 19:08:15,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 44236.8, 300 sec: 43098.3). Total num frames: 1187381248. Throughput: 0: 10888.5. Samples: 296909312. Policy #0 lag: (min: 0.0, avg: 56.7, max: 256.0) [2024-06-15 19:08:15,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 19:08:16,726][1652475] Updated weights for policy 0, policy_version 579816 (0.0162) [2024-06-15 19:08:18,938][1652475] Updated weights for policy 0, policy_version 579862 (0.0016) [2024-06-15 19:08:20,738][1648984] Fps is (10 sec: 52429.2, 60 sec: 43690.7, 300 sec: 43209.3). Total num frames: 1187643392. Throughput: 0: 10899.9. Samples: 296943104. Policy #0 lag: (min: 0.0, avg: 56.7, max: 256.0) [2024-06-15 19:08:20,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 19:08:24,896][1652475] Updated weights for policy 0, policy_version 579936 (0.0022) [2024-06-15 19:08:25,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 44783.2, 300 sec: 43098.3). Total num frames: 1187774464. Throughput: 0: 10991.0. Samples: 297009664. Policy #0 lag: (min: 0.0, avg: 56.7, max: 256.0) [2024-06-15 19:08:25,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 19:08:26,884][1652475] Updated weights for policy 0, policy_version 579987 (0.0026) [2024-06-15 19:08:29,223][1652475] Updated weights for policy 0, policy_version 580033 (0.0016) [2024-06-15 19:08:30,678][1652475] Updated weights for policy 0, policy_version 580081 (0.0012) [2024-06-15 19:08:30,738][1648984] Fps is (10 sec: 36044.6, 60 sec: 43144.6, 300 sec: 43431.5). Total num frames: 1188003840. Throughput: 0: 10945.4. Samples: 297075200. 
Policy #0 lag: (min: 0.0, avg: 56.7, max: 256.0) [2024-06-15 19:08:30,738][1648984] Avg episode reward: [(0, '-0.590')] [2024-06-15 19:08:32,183][1652475] Updated weights for policy 0, policy_version 580155 (0.0137) [2024-06-15 19:08:35,738][1648984] Fps is (10 sec: 45874.8, 60 sec: 44782.9, 300 sec: 42987.2). Total num frames: 1188233216. Throughput: 0: 10683.8. Samples: 297105408. Policy #0 lag: (min: 15.0, avg: 118.2, max: 271.0) [2024-06-15 19:08:35,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 19:08:35,861][1652475] Updated weights for policy 0, policy_version 580194 (0.0014) [2024-06-15 19:08:39,237][1652475] Updated weights for policy 0, policy_version 580277 (0.0013) [2024-06-15 19:08:40,738][1648984] Fps is (10 sec: 42598.5, 60 sec: 42052.3, 300 sec: 43209.3). Total num frames: 1188429824. Throughput: 0: 10877.2. Samples: 297173504. Policy #0 lag: (min: 15.0, avg: 118.2, max: 271.0) [2024-06-15 19:08:40,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 19:08:40,772][1652475] Updated weights for policy 0, policy_version 580304 (0.0014) [2024-06-15 19:08:41,278][1651340] Signal inference workers to stop experience collection... (29850 times) [2024-06-15 19:08:41,383][1652475] InferenceWorker_p0-w0: stopping experience collection (29850 times) [2024-06-15 19:08:41,567][1651340] Signal inference workers to resume experience collection... (29850 times) [2024-06-15 19:08:41,568][1652475] InferenceWorker_p0-w0: resuming experience collection (29850 times) [2024-06-15 19:08:42,583][1652475] Updated weights for policy 0, policy_version 580368 (0.0100) [2024-06-15 19:08:45,738][1648984] Fps is (10 sec: 45875.5, 60 sec: 43690.7, 300 sec: 43098.3). Total num frames: 1188691968. Throughput: 0: 10820.3. Samples: 297242624. 
Policy #0 lag: (min: 15.0, avg: 118.2, max: 271.0) [2024-06-15 19:08:45,740][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 19:08:48,522][1652475] Updated weights for policy 0, policy_version 580432 (0.0015) [2024-06-15 19:08:50,188][1652475] Updated weights for policy 0, policy_version 580512 (0.0015) [2024-06-15 19:08:50,738][1648984] Fps is (10 sec: 49152.0, 60 sec: 44237.1, 300 sec: 43431.5). Total num frames: 1188921344. Throughput: 0: 10956.8. Samples: 297279488. Policy #0 lag: (min: 15.0, avg: 118.2, max: 271.0) [2024-06-15 19:08:50,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 19:08:52,201][1652475] Updated weights for policy 0, policy_version 580560 (0.0011) [2024-06-15 19:08:54,353][1652475] Updated weights for policy 0, policy_version 580626 (0.0014) [2024-06-15 19:08:55,738][1648984] Fps is (10 sec: 52428.3, 60 sec: 45329.1, 300 sec: 43542.5). Total num frames: 1189216256. Throughput: 0: 10626.8. Samples: 297334784. Policy #0 lag: (min: 15.0, avg: 118.2, max: 271.0) [2024-06-15 19:08:55,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 19:08:55,742][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000580672_1189216256.pth... [2024-06-15 19:08:55,802][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000575552_1178730496.pth [2024-06-15 19:09:00,738][1648984] Fps is (10 sec: 29491.1, 60 sec: 42598.4, 300 sec: 42653.9). Total num frames: 1189216256. Throughput: 0: 11047.8. Samples: 297406464. 
Policy #0 lag: (min: 15.0, avg: 118.2, max: 271.0) [2024-06-15 19:09:00,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 19:09:01,298][1652475] Updated weights for policy 0, policy_version 580691 (0.0013) [2024-06-15 19:09:02,270][1652475] Updated weights for policy 0, policy_version 580736 (0.0158) [2024-06-15 19:09:04,198][1652475] Updated weights for policy 0, policy_version 580793 (0.0040) [2024-06-15 19:09:05,738][1648984] Fps is (10 sec: 36045.1, 60 sec: 43144.5, 300 sec: 43764.7). Total num frames: 1189576704. Throughput: 0: 10979.5. Samples: 297437184. Policy #0 lag: (min: 15.0, avg: 118.2, max: 271.0) [2024-06-15 19:09:05,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 19:09:05,840][1652475] Updated weights for policy 0, policy_version 580864 (0.0013) [2024-06-15 19:09:07,145][1652475] Updated weights for policy 0, policy_version 580923 (0.0016) [2024-06-15 19:09:10,761][1648984] Fps is (10 sec: 52307.4, 60 sec: 43673.8, 300 sec: 43094.9). Total num frames: 1189740544. Throughput: 0: 10882.9. Samples: 297499648. Policy #0 lag: (min: 15.0, avg: 118.2, max: 271.0) [2024-06-15 19:09:10,762][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 19:09:14,207][1652475] Updated weights for policy 0, policy_version 580984 (0.0013) [2024-06-15 19:09:15,738][1648984] Fps is (10 sec: 32768.0, 60 sec: 42052.3, 300 sec: 43209.3). Total num frames: 1189904384. Throughput: 0: 11025.1. Samples: 297571328. Policy #0 lag: (min: 15.0, avg: 118.2, max: 271.0) [2024-06-15 19:09:15,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 19:09:16,556][1652475] Updated weights for policy 0, policy_version 581041 (0.0135) [2024-06-15 19:09:18,124][1652475] Updated weights for policy 0, policy_version 581108 (0.0016) [2024-06-15 19:09:19,356][1652475] Updated weights for policy 0, policy_version 581168 (0.0014) [2024-06-15 19:09:20,738][1648984] Fps is (10 sec: 52550.8, 60 sec: 43690.6, 300 sec: 43764.7). Total num frames: 1190264832. 
Throughput: 0: 10854.4. Samples: 297593856. Policy #0 lag: (min: 15.0, avg: 118.2, max: 271.0) [2024-06-15 19:09:20,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 19:09:25,604][1652475] Updated weights for policy 0, policy_version 581239 (0.0036) [2024-06-15 19:09:25,738][1648984] Fps is (10 sec: 45875.3, 60 sec: 43144.5, 300 sec: 42987.2). Total num frames: 1190363136. Throughput: 0: 11025.1. Samples: 297669632. Policy #0 lag: (min: 15.0, avg: 118.2, max: 271.0) [2024-06-15 19:09:25,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 19:09:28,139][1652475] Updated weights for policy 0, policy_version 581280 (0.0012) [2024-06-15 19:09:28,325][1651340] Signal inference workers to stop experience collection... (29900 times) [2024-06-15 19:09:28,450][1652475] InferenceWorker_p0-w0: stopping experience collection (29900 times) [2024-06-15 19:09:28,536][1651340] Signal inference workers to resume experience collection... (29900 times) [2024-06-15 19:09:28,537][1652475] InferenceWorker_p0-w0: resuming experience collection (29900 times) [2024-06-15 19:09:30,131][1652475] Updated weights for policy 0, policy_version 581367 (0.0184) [2024-06-15 19:09:30,738][1648984] Fps is (10 sec: 42598.6, 60 sec: 44783.0, 300 sec: 44098.0). Total num frames: 1190690816. Throughput: 0: 10740.6. Samples: 297725952. Policy #0 lag: (min: 15.0, avg: 118.2, max: 271.0) [2024-06-15 19:09:30,740][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 19:09:35,738][1648984] Fps is (10 sec: 42598.0, 60 sec: 42598.4, 300 sec: 43098.2). Total num frames: 1190789120. Throughput: 0: 10569.9. Samples: 297755136. 
Policy #0 lag: (min: 15.0, avg: 118.2, max: 271.0) [2024-06-15 19:09:35,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 19:09:36,573][1652475] Updated weights for policy 0, policy_version 581441 (0.0013) [2024-06-15 19:09:39,904][1652475] Updated weights for policy 0, policy_version 581506 (0.0217) [2024-06-15 19:09:40,738][1648984] Fps is (10 sec: 32767.6, 60 sec: 43144.5, 300 sec: 43431.5). Total num frames: 1191018496. Throughput: 0: 11002.3. Samples: 297829888. Policy #0 lag: (min: 15.0, avg: 118.2, max: 271.0) [2024-06-15 19:09:40,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 19:09:41,808][1652475] Updated weights for policy 0, policy_version 581587 (0.0014) [2024-06-15 19:09:44,042][1652475] Updated weights for policy 0, policy_version 581692 (0.0014) [2024-06-15 19:09:45,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 43690.6, 300 sec: 43764.7). Total num frames: 1191313408. Throughput: 0: 10547.2. Samples: 297881088. Policy #0 lag: (min: 15.0, avg: 118.2, max: 271.0) [2024-06-15 19:09:45,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 19:09:49,360][1652475] Updated weights for policy 0, policy_version 581753 (0.0108) [2024-06-15 19:09:50,738][1648984] Fps is (10 sec: 42598.6, 60 sec: 42052.2, 300 sec: 43100.2). Total num frames: 1191444480. Throughput: 0: 10865.8. Samples: 297926144. Policy #0 lag: (min: 15.0, avg: 118.2, max: 271.0) [2024-06-15 19:09:50,738][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 19:09:52,932][1652475] Updated weights for policy 0, policy_version 581808 (0.0021) [2024-06-15 19:09:54,674][1652475] Updated weights for policy 0, policy_version 581883 (0.0079) [2024-06-15 19:09:55,738][1648984] Fps is (10 sec: 39320.7, 60 sec: 41506.0, 300 sec: 43653.6). Total num frames: 1191706624. Throughput: 0: 10734.7. Samples: 297982464. 
Policy #0 lag: (min: 15.0, avg: 118.2, max: 271.0) [2024-06-15 19:09:55,739][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 19:09:57,435][1652475] Updated weights for policy 0, policy_version 581943 (0.0013) [2024-06-15 19:10:00,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 43690.7, 300 sec: 42876.1). Total num frames: 1191837696. Throughput: 0: 10717.9. Samples: 298053632. Policy #0 lag: (min: 15.0, avg: 118.2, max: 271.0) [2024-06-15 19:10:00,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 19:10:01,659][1652475] Updated weights for policy 0, policy_version 582013 (0.0015) [2024-06-15 19:10:05,626][1652475] Updated weights for policy 0, policy_version 582128 (0.0090) [2024-06-15 19:10:05,738][1648984] Fps is (10 sec: 49153.1, 60 sec: 43690.6, 300 sec: 43875.8). Total num frames: 1192198144. Throughput: 0: 10979.5. Samples: 298087936. Policy #0 lag: (min: 15.0, avg: 118.2, max: 271.0) [2024-06-15 19:10:05,738][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 19:10:09,473][1652475] Updated weights for policy 0, policy_version 582176 (0.0015) [2024-06-15 19:10:10,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 43707.6, 300 sec: 43320.6). Total num frames: 1192361984. Throughput: 0: 10638.2. Samples: 298148352. Policy #0 lag: (min: 10.0, avg: 128.4, max: 266.0) [2024-06-15 19:10:10,738][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 19:10:12,587][1652475] Updated weights for policy 0, policy_version 582224 (0.0014) [2024-06-15 19:10:12,682][1651340] Signal inference workers to stop experience collection... (29950 times) [2024-06-15 19:10:12,747][1652475] InferenceWorker_p0-w0: stopping experience collection (29950 times) [2024-06-15 19:10:12,871][1651340] Signal inference workers to resume experience collection... 
(29950 times) [2024-06-15 19:10:12,873][1652475] InferenceWorker_p0-w0: resuming experience collection (29950 times) [2024-06-15 19:10:14,677][1652475] Updated weights for policy 0, policy_version 582276 (0.0014) [2024-06-15 19:10:15,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 44782.9, 300 sec: 43431.5). Total num frames: 1192591360. Throughput: 0: 10945.4. Samples: 298218496. Policy #0 lag: (min: 10.0, avg: 128.4, max: 266.0) [2024-06-15 19:10:15,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 19:10:16,903][1652475] Updated weights for policy 0, policy_version 582384 (0.0124) [2024-06-15 19:10:20,738][1648984] Fps is (10 sec: 42598.7, 60 sec: 42052.3, 300 sec: 43653.7). Total num frames: 1192787968. Throughput: 0: 10911.3. Samples: 298246144. Policy #0 lag: (min: 10.0, avg: 128.4, max: 266.0) [2024-06-15 19:10:20,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 19:10:20,901][1652475] Updated weights for policy 0, policy_version 582432 (0.0015) [2024-06-15 19:10:25,723][1652475] Updated weights for policy 0, policy_version 582485 (0.0013) [2024-06-15 19:10:25,738][1648984] Fps is (10 sec: 32768.0, 60 sec: 42598.4, 300 sec: 43209.4). Total num frames: 1192919040. Throughput: 0: 10797.5. Samples: 298315776. Policy #0 lag: (min: 10.0, avg: 128.4, max: 266.0) [2024-06-15 19:10:25,738][1648984] Avg episode reward: [(0, '-0.540')] [2024-06-15 19:10:27,936][1652475] Updated weights for policy 0, policy_version 582576 (0.0012) [2024-06-15 19:10:29,692][1652475] Updated weights for policy 0, policy_version 582653 (0.0012) [2024-06-15 19:10:30,738][1648984] Fps is (10 sec: 49152.0, 60 sec: 43144.5, 300 sec: 43542.7). Total num frames: 1193279488. Throughput: 0: 10808.9. Samples: 298367488. 
Policy #0 lag: (min: 10.0, avg: 128.4, max: 266.0) [2024-06-15 19:10:30,740][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 19:10:33,496][1652475] Updated weights for policy 0, policy_version 582717 (0.0014) [2024-06-15 19:10:35,739][1648984] Fps is (10 sec: 49146.0, 60 sec: 43689.8, 300 sec: 43098.1). Total num frames: 1193410560. Throughput: 0: 10626.6. Samples: 298404352. Policy #0 lag: (min: 10.0, avg: 128.4, max: 266.0) [2024-06-15 19:10:35,739][1648984] Avg episode reward: [(0, '-0.230')] [2024-06-15 19:10:38,402][1652475] Updated weights for policy 0, policy_version 582775 (0.0014) [2024-06-15 19:10:40,738][1648984] Fps is (10 sec: 32767.6, 60 sec: 43144.5, 300 sec: 43320.4). Total num frames: 1193607168. Throughput: 0: 10922.7. Samples: 298473984. Policy #0 lag: (min: 10.0, avg: 128.4, max: 266.0) [2024-06-15 19:10:40,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 19:10:41,091][1652475] Updated weights for policy 0, policy_version 582842 (0.0132) [2024-06-15 19:10:42,473][1652475] Updated weights for policy 0, policy_version 582887 (0.0013) [2024-06-15 19:10:43,804][1652475] Updated weights for policy 0, policy_version 582928 (0.0013) [2024-06-15 19:10:45,738][1648984] Fps is (10 sec: 52435.6, 60 sec: 43690.7, 300 sec: 43431.5). Total num frames: 1193934848. Throughput: 0: 10626.9. Samples: 298531840. Policy #0 lag: (min: 10.0, avg: 128.4, max: 266.0) [2024-06-15 19:10:45,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 19:10:48,996][1652475] Updated weights for policy 0, policy_version 582996 (0.0015) [2024-06-15 19:10:50,746][1648984] Fps is (10 sec: 45836.5, 60 sec: 43684.5, 300 sec: 43097.0). Total num frames: 1194065920. Throughput: 0: 10761.4. Samples: 298572288. 
Policy #0 lag: (min: 10.0, avg: 128.4, max: 266.0) [2024-06-15 19:10:50,747][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 19:10:52,124][1652475] Updated weights for policy 0, policy_version 583041 (0.0014) [2024-06-15 19:10:53,643][1652475] Updated weights for policy 0, policy_version 583107 (0.0013) [2024-06-15 19:10:55,028][1652475] Updated weights for policy 0, policy_version 583168 (0.0012) [2024-06-15 19:10:55,740][1648984] Fps is (10 sec: 42597.9, 60 sec: 44236.9, 300 sec: 43653.6). Total num frames: 1194360832. Throughput: 0: 10831.6. Samples: 298635776. Policy #0 lag: (min: 10.0, avg: 128.4, max: 266.0) [2024-06-15 19:10:55,741][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 19:10:56,061][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000583200_1194393600.pth... [2024-06-15 19:10:56,209][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000578112_1183973376.pth [2024-06-15 19:10:56,215][1651340] Saving a milestone train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/milestones/checkpoint_000583200_1194393600.pth [2024-06-15 19:10:56,755][1652475] Updated weights for policy 0, policy_version 583232 (0.0040) [2024-06-15 19:11:00,353][1651340] Signal inference workers to stop experience collection... (30000 times) [2024-06-15 19:11:00,409][1652475] InferenceWorker_p0-w0: stopping experience collection (30000 times) [2024-06-15 19:11:00,698][1651340] Signal inference workers to resume experience collection... (30000 times) [2024-06-15 19:11:00,699][1652475] InferenceWorker_p0-w0: resuming experience collection (30000 times) [2024-06-15 19:11:00,738][1648984] Fps is (10 sec: 42634.7, 60 sec: 44236.8, 300 sec: 42987.2). Total num frames: 1194491904. Throughput: 0: 10820.3. Samples: 298705408. 
Policy #0 lag: (min: 10.0, avg: 128.4, max: 266.0) [2024-06-15 19:11:00,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 19:11:01,726][1652475] Updated weights for policy 0, policy_version 583293 (0.0012) [2024-06-15 19:11:04,994][1652475] Updated weights for policy 0, policy_version 583346 (0.0113) [2024-06-15 19:11:05,746][1648984] Fps is (10 sec: 39288.2, 60 sec: 42592.3, 300 sec: 43652.4). Total num frames: 1194754048. Throughput: 0: 10920.6. Samples: 298737664. Policy #0 lag: (min: 10.0, avg: 128.4, max: 266.0) [2024-06-15 19:11:05,747][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 19:11:06,790][1652475] Updated weights for policy 0, policy_version 583417 (0.0014) [2024-06-15 19:11:08,527][1652475] Updated weights for policy 0, policy_version 583460 (0.0013) [2024-06-15 19:11:10,738][1648984] Fps is (10 sec: 49151.9, 60 sec: 43690.7, 300 sec: 43431.5). Total num frames: 1194983424. Throughput: 0: 10695.1. Samples: 298797056. Policy #0 lag: (min: 10.0, avg: 128.4, max: 266.0) [2024-06-15 19:11:10,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 19:11:14,506][1652475] Updated weights for policy 0, policy_version 583520 (0.0013) [2024-06-15 19:11:15,738][1648984] Fps is (10 sec: 39355.0, 60 sec: 42598.4, 300 sec: 43209.3). Total num frames: 1195147264. Throughput: 0: 11161.6. Samples: 298869760. Policy #0 lag: (min: 10.0, avg: 128.4, max: 266.0) [2024-06-15 19:11:15,739][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 19:11:15,940][1652475] Updated weights for policy 0, policy_version 583572 (0.0013) [2024-06-15 19:11:17,844][1652475] Updated weights for policy 0, policy_version 583648 (0.0012) [2024-06-15 19:11:19,761][1652475] Updated weights for policy 0, policy_version 583697 (0.0014) [2024-06-15 19:11:20,738][1648984] Fps is (10 sec: 52429.1, 60 sec: 45329.1, 300 sec: 43653.7). Total num frames: 1195507712. Throughput: 0: 10945.7. Samples: 298896896. 
Policy #0 lag: (min: 10.0, avg: 128.4, max: 266.0) [2024-06-15 19:11:20,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 19:11:25,738][1648984] Fps is (10 sec: 36044.6, 60 sec: 43144.4, 300 sec: 42653.9). Total num frames: 1195507712. Throughput: 0: 11047.8. Samples: 298971136. Policy #0 lag: (min: 10.0, avg: 128.4, max: 266.0) [2024-06-15 19:11:25,739][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 19:11:26,044][1652475] Updated weights for policy 0, policy_version 583760 (0.0013) [2024-06-15 19:11:28,306][1652475] Updated weights for policy 0, policy_version 583856 (0.0015) [2024-06-15 19:11:29,892][1652475] Updated weights for policy 0, policy_version 583930 (0.0122) [2024-06-15 19:11:30,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 43690.6, 300 sec: 43653.7). Total num frames: 1195900928. Throughput: 0: 10934.0. Samples: 299023872. Policy #0 lag: (min: 10.0, avg: 128.4, max: 266.0) [2024-06-15 19:11:30,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 19:11:32,522][1652475] Updated weights for policy 0, policy_version 583970 (0.0012) [2024-06-15 19:11:35,738][1648984] Fps is (10 sec: 52429.2, 60 sec: 43691.5, 300 sec: 43209.3). Total num frames: 1196032000. Throughput: 0: 10822.3. Samples: 299059200. Policy #0 lag: (min: 10.0, avg: 128.4, max: 266.0) [2024-06-15 19:11:35,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 19:11:38,578][1652475] Updated weights for policy 0, policy_version 584032 (0.0040) [2024-06-15 19:11:40,102][1652475] Updated weights for policy 0, policy_version 584096 (0.0011) [2024-06-15 19:11:40,738][1648984] Fps is (10 sec: 36044.4, 60 sec: 44236.8, 300 sec: 43431.5). Total num frames: 1196261376. Throughput: 0: 11059.2. Samples: 299133440. Policy #0 lag: (min: 10.0, avg: 128.4, max: 266.0) [2024-06-15 19:11:40,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 19:11:41,647][1651340] Signal inference workers to stop experience collection... 
(30050 times) [2024-06-15 19:11:41,668][1652475] Updated weights for policy 0, policy_version 584163 (0.0014) [2024-06-15 19:11:41,701][1652475] InferenceWorker_p0-w0: stopping experience collection (30050 times) [2024-06-15 19:11:41,881][1651340] Signal inference workers to resume experience collection... (30050 times) [2024-06-15 19:11:41,883][1652475] InferenceWorker_p0-w0: resuming experience collection (30050 times) [2024-06-15 19:11:44,537][1652475] Updated weights for policy 0, policy_version 584232 (0.0014) [2024-06-15 19:11:45,738][1648984] Fps is (10 sec: 52427.8, 60 sec: 43690.5, 300 sec: 43764.7). Total num frames: 1196556288. Throughput: 0: 10797.4. Samples: 299191296. Policy #0 lag: (min: 63.0, avg: 180.2, max: 319.0) [2024-06-15 19:11:45,739][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 19:11:49,736][1652475] Updated weights for policy 0, policy_version 584281 (0.0015) [2024-06-15 19:11:50,738][1648984] Fps is (10 sec: 42598.8, 60 sec: 43696.9, 300 sec: 43098.3). Total num frames: 1196687360. Throughput: 0: 11015.8. Samples: 299233280. Policy #0 lag: (min: 63.0, avg: 180.2, max: 319.0) [2024-06-15 19:11:50,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 19:11:51,192][1652475] Updated weights for policy 0, policy_version 584336 (0.0023) [2024-06-15 19:11:52,947][1652475] Updated weights for policy 0, policy_version 584416 (0.0012) [2024-06-15 19:11:55,738][1648984] Fps is (10 sec: 42599.2, 60 sec: 43690.7, 300 sec: 44097.9). Total num frames: 1196982272. Throughput: 0: 11081.9. Samples: 299295744. Policy #0 lag: (min: 63.0, avg: 180.2, max: 319.0) [2024-06-15 19:11:55,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 19:11:56,021][1652475] Updated weights for policy 0, policy_version 584480 (0.0109) [2024-06-15 19:12:00,738][1648984] Fps is (10 sec: 39320.6, 60 sec: 43144.3, 300 sec: 43209.3). Total num frames: 1197080576. Throughput: 0: 11104.7. Samples: 299369472. 
Policy #0 lag: (min: 63.0, avg: 180.2, max: 319.0) [2024-06-15 19:12:00,739][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 19:12:01,247][1652475] Updated weights for policy 0, policy_version 584544 (0.0014) [2024-06-15 19:12:02,864][1652475] Updated weights for policy 0, policy_version 584610 (0.0011) [2024-06-15 19:12:04,655][1652475] Updated weights for policy 0, policy_version 584676 (0.0014) [2024-06-15 19:12:05,738][1648984] Fps is (10 sec: 49152.1, 60 sec: 45335.5, 300 sec: 43986.9). Total num frames: 1197473792. Throughput: 0: 11116.1. Samples: 299397120. Policy #0 lag: (min: 63.0, avg: 180.2, max: 319.0) [2024-06-15 19:12:05,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 19:12:09,997][1652475] Updated weights for policy 0, policy_version 584752 (0.0014) [2024-06-15 19:12:10,738][1648984] Fps is (10 sec: 52429.8, 60 sec: 43690.6, 300 sec: 43653.6). Total num frames: 1197604864. Throughput: 0: 11138.9. Samples: 299472384. Policy #0 lag: (min: 63.0, avg: 180.2, max: 319.0) [2024-06-15 19:12:10,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 19:12:12,718][1652475] Updated weights for policy 0, policy_version 584824 (0.0014) [2024-06-15 19:12:13,688][1652475] Updated weights for policy 0, policy_version 584871 (0.0099) [2024-06-15 19:12:14,976][1652475] Updated weights for policy 0, policy_version 584899 (0.0016) [2024-06-15 19:12:15,738][1648984] Fps is (10 sec: 45875.5, 60 sec: 46421.4, 300 sec: 43764.7). Total num frames: 1197932544. Throughput: 0: 11298.1. Samples: 299532288. Policy #0 lag: (min: 63.0, avg: 180.2, max: 319.0) [2024-06-15 19:12:15,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 19:12:20,738][1648984] Fps is (10 sec: 39321.0, 60 sec: 41505.9, 300 sec: 43764.7). Total num frames: 1197998080. Throughput: 0: 11229.8. Samples: 299564544. 
Policy #0 lag: (min: 63.0, avg: 180.2, max: 319.0) [2024-06-15 19:12:20,739][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 19:12:22,365][1652475] Updated weights for policy 0, policy_version 584966 (0.0097) [2024-06-15 19:12:24,514][1652475] Updated weights for policy 0, policy_version 585058 (0.0015) [2024-06-15 19:12:25,738][1648984] Fps is (10 sec: 36044.7, 60 sec: 46421.4, 300 sec: 43653.6). Total num frames: 1198292992. Throughput: 0: 11116.1. Samples: 299633664. Policy #0 lag: (min: 63.0, avg: 180.2, max: 319.0) [2024-06-15 19:12:25,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 19:12:26,484][1652475] Updated weights for policy 0, policy_version 585142 (0.0016) [2024-06-15 19:12:26,860][1651340] Signal inference workers to stop experience collection... (30100 times) [2024-06-15 19:12:26,936][1652475] InferenceWorker_p0-w0: stopping experience collection (30100 times) [2024-06-15 19:12:27,234][1651340] Signal inference workers to resume experience collection... (30100 times) [2024-06-15 19:12:27,234][1652475] InferenceWorker_p0-w0: resuming experience collection (30100 times) [2024-06-15 19:12:28,311][1652475] Updated weights for policy 0, policy_version 585212 (0.0013) [2024-06-15 19:12:30,738][1648984] Fps is (10 sec: 52430.3, 60 sec: 43690.7, 300 sec: 43986.9). Total num frames: 1198522368. Throughput: 0: 11104.8. Samples: 299691008. Policy #0 lag: (min: 63.0, avg: 180.2, max: 319.0) [2024-06-15 19:12:30,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 19:12:35,738][1648984] Fps is (10 sec: 36044.7, 60 sec: 43690.7, 300 sec: 43209.3). Total num frames: 1198653440. Throughput: 0: 11150.2. Samples: 299735040. 
Policy #0 lag: (min: 63.0, avg: 180.2, max: 319.0) [2024-06-15 19:12:35,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 19:12:35,948][1652475] Updated weights for policy 0, policy_version 585296 (0.0138) [2024-06-15 19:12:36,952][1652475] Updated weights for policy 0, policy_version 585342 (0.0012) [2024-06-15 19:12:38,537][1652475] Updated weights for policy 0, policy_version 585407 (0.0028) [2024-06-15 19:12:40,359][1652475] Updated weights for policy 0, policy_version 585464 (0.0074) [2024-06-15 19:12:40,738][1648984] Fps is (10 sec: 52427.3, 60 sec: 46421.2, 300 sec: 43986.8). Total num frames: 1199046656. Throughput: 0: 11036.4. Samples: 299792384. Policy #0 lag: (min: 63.0, avg: 180.2, max: 319.0) [2024-06-15 19:12:40,739][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 19:12:45,738][1648984] Fps is (10 sec: 39321.9, 60 sec: 41506.3, 300 sec: 43320.5). Total num frames: 1199046656. Throughput: 0: 11002.4. Samples: 299864576. Policy #0 lag: (min: 63.0, avg: 180.2, max: 319.0) [2024-06-15 19:12:45,738][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 19:12:46,886][1652475] Updated weights for policy 0, policy_version 585536 (0.0012) [2024-06-15 19:12:50,317][1652475] Updated weights for policy 0, policy_version 585616 (0.0014) [2024-06-15 19:12:50,738][1648984] Fps is (10 sec: 32768.8, 60 sec: 44782.9, 300 sec: 43653.7). Total num frames: 1199374336. Throughput: 0: 10888.5. Samples: 299887104. Policy #0 lag: (min: 63.0, avg: 180.2, max: 319.0) [2024-06-15 19:12:50,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 19:12:52,762][1652475] Updated weights for policy 0, policy_version 585681 (0.0024) [2024-06-15 19:12:55,738][1648984] Fps is (10 sec: 52426.7, 60 sec: 43144.3, 300 sec: 43764.7). Total num frames: 1199570944. Throughput: 0: 10706.4. Samples: 299954176. 
Policy #0 lag: (min: 63.0, avg: 180.2, max: 319.0) [2024-06-15 19:12:55,739][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 19:12:55,744][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000585728_1199570944.pth... [2024-06-15 19:12:55,836][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000580672_1189216256.pth [2024-06-15 19:12:56,802][1652475] Updated weights for policy 0, policy_version 585760 (0.0014) [2024-06-15 19:12:59,008][1652475] Updated weights for policy 0, policy_version 585827 (0.0013) [2024-06-15 19:13:00,738][1648984] Fps is (10 sec: 45875.3, 60 sec: 45875.4, 300 sec: 43542.6). Total num frames: 1199833088. Throughput: 0: 10865.8. Samples: 300021248. Policy #0 lag: (min: 63.0, avg: 180.2, max: 319.0) [2024-06-15 19:13:00,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 19:13:03,205][1652475] Updated weights for policy 0, policy_version 585888 (0.0014) [2024-06-15 19:13:05,738][1648984] Fps is (10 sec: 45877.0, 60 sec: 42598.5, 300 sec: 43764.7). Total num frames: 1200029696. Throughput: 0: 10900.0. Samples: 300055040. Policy #0 lag: (min: 63.0, avg: 180.2, max: 319.0) [2024-06-15 19:13:05,738][1648984] Avg episode reward: [(0, '-0.550')] [2024-06-15 19:13:05,807][1652475] Updated weights for policy 0, policy_version 585968 (0.0045) [2024-06-15 19:13:08,849][1652475] Updated weights for policy 0, policy_version 586046 (0.0015) [2024-06-15 19:13:10,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 44236.8, 300 sec: 43653.6). Total num frames: 1200259072. Throughput: 0: 10740.6. Samples: 300116992. Policy #0 lag: (min: 63.0, avg: 180.2, max: 319.0) [2024-06-15 19:13:10,738][1648984] Avg episode reward: [(0, '-0.550')] [2024-06-15 19:13:11,402][1652475] Updated weights for policy 0, policy_version 586106 (0.0015) [2024-06-15 19:13:15,333][1651340] Signal inference workers to stop experience collection... 
(30150 times) [2024-06-15 19:13:15,380][1652475] InferenceWorker_p0-w0: stopping experience collection (30150 times) [2024-06-15 19:13:15,534][1651340] Signal inference workers to resume experience collection... (30150 times) [2024-06-15 19:13:15,534][1652475] InferenceWorker_p0-w0: resuming experience collection (30150 times) [2024-06-15 19:13:15,706][1652475] Updated weights for policy 0, policy_version 586165 (0.0019) [2024-06-15 19:13:15,738][1648984] Fps is (10 sec: 42598.1, 60 sec: 42052.2, 300 sec: 43431.5). Total num frames: 1200455680. Throughput: 0: 11047.8. Samples: 300188160. Policy #0 lag: (min: 2.0, avg: 93.0, max: 258.0) [2024-06-15 19:13:15,739][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 19:13:20,410][1652475] Updated weights for policy 0, policy_version 586241 (0.0013) [2024-06-15 19:13:20,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 44237.0, 300 sec: 43653.6). Total num frames: 1200652288. Throughput: 0: 10774.8. Samples: 300219904. Policy #0 lag: (min: 2.0, avg: 93.0, max: 258.0) [2024-06-15 19:13:20,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 19:13:21,825][1652475] Updated weights for policy 0, policy_version 586304 (0.0133) [2024-06-15 19:13:23,363][1652475] Updated weights for policy 0, policy_version 586366 (0.0012) [2024-06-15 19:13:25,738][1648984] Fps is (10 sec: 42598.5, 60 sec: 43144.5, 300 sec: 43653.6). Total num frames: 1200881664. Throughput: 0: 10877.2. Samples: 300281856. Policy #0 lag: (min: 2.0, avg: 93.0, max: 258.0) [2024-06-15 19:13:25,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 19:13:27,600][1652475] Updated weights for policy 0, policy_version 586426 (0.0101) [2024-06-15 19:13:30,738][1648984] Fps is (10 sec: 49152.1, 60 sec: 43690.6, 300 sec: 43764.7). Total num frames: 1201143808. Throughput: 0: 10729.2. Samples: 300347392. 
Policy #0 lag: (min: 2.0, avg: 93.0, max: 258.0) [2024-06-15 19:13:30,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 19:13:33,189][1652475] Updated weights for policy 0, policy_version 586497 (0.0013) [2024-06-15 19:13:34,527][1652475] Updated weights for policy 0, policy_version 586555 (0.0026) [2024-06-15 19:13:35,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 44236.8, 300 sec: 43653.6). Total num frames: 1201307648. Throughput: 0: 11070.6. Samples: 300385280. Policy #0 lag: (min: 2.0, avg: 93.0, max: 258.0) [2024-06-15 19:13:35,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 19:13:36,552][1652475] Updated weights for policy 0, policy_version 586619 (0.0087) [2024-06-15 19:13:40,267][1652475] Updated weights for policy 0, policy_version 586704 (0.0015) [2024-06-15 19:13:40,738][1648984] Fps is (10 sec: 42598.5, 60 sec: 42052.5, 300 sec: 43653.6). Total num frames: 1201569792. Throughput: 0: 10877.2. Samples: 300443648. Policy #0 lag: (min: 2.0, avg: 93.0, max: 258.0) [2024-06-15 19:13:40,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 19:13:45,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 44236.8, 300 sec: 43320.4). Total num frames: 1201700864. Throughput: 0: 10979.6. Samples: 300515328. Policy #0 lag: (min: 2.0, avg: 93.0, max: 258.0) [2024-06-15 19:13:45,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 19:13:46,361][1652475] Updated weights for policy 0, policy_version 586800 (0.0042) [2024-06-15 19:13:49,085][1652475] Updated weights for policy 0, policy_version 586864 (0.0013) [2024-06-15 19:13:50,731][1652475] Updated weights for policy 0, policy_version 586938 (0.0012) [2024-06-15 19:13:50,738][1648984] Fps is (10 sec: 45875.2, 60 sec: 44236.8, 300 sec: 43431.5). Total num frames: 1202028544. Throughput: 0: 10854.4. Samples: 300543488. 
Policy #0 lag: (min: 2.0, avg: 93.0, max: 258.0) [2024-06-15 19:13:50,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 19:13:53,415][1652475] Updated weights for policy 0, policy_version 586992 (0.0014) [2024-06-15 19:13:55,738][1648984] Fps is (10 sec: 49151.6, 60 sec: 43690.9, 300 sec: 43986.9). Total num frames: 1202192384. Throughput: 0: 10956.8. Samples: 300610048. Policy #0 lag: (min: 2.0, avg: 93.0, max: 258.0) [2024-06-15 19:13:55,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 19:13:58,370][1652475] Updated weights for policy 0, policy_version 587063 (0.0037) [2024-06-15 19:14:00,334][1652475] Updated weights for policy 0, policy_version 587120 (0.0023) [2024-06-15 19:14:00,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 43690.7, 300 sec: 43653.6). Total num frames: 1202454528. Throughput: 0: 10968.2. Samples: 300681728. Policy #0 lag: (min: 2.0, avg: 93.0, max: 258.0) [2024-06-15 19:14:00,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 19:14:01,117][1651340] Signal inference workers to stop experience collection... (30200 times) [2024-06-15 19:14:01,173][1652475] InferenceWorker_p0-w0: stopping experience collection (30200 times) [2024-06-15 19:14:01,353][1651340] Signal inference workers to resume experience collection... (30200 times) [2024-06-15 19:14:01,354][1652475] InferenceWorker_p0-w0: resuming experience collection (30200 times) [2024-06-15 19:14:01,838][1652475] Updated weights for policy 0, policy_version 587191 (0.0013) [2024-06-15 19:14:05,544][1652475] Updated weights for policy 0, policy_version 587260 (0.0014) [2024-06-15 19:14:05,738][1648984] Fps is (10 sec: 52428.2, 60 sec: 44782.8, 300 sec: 43990.3). Total num frames: 1202716672. Throughput: 0: 10956.8. Samples: 300712960. 
Policy #0 lag: (min: 2.0, avg: 93.0, max: 258.0) [2024-06-15 19:14:05,739][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 19:14:10,215][1652475] Updated weights for policy 0, policy_version 587312 (0.0011) [2024-06-15 19:14:10,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 43144.5, 300 sec: 43875.8). Total num frames: 1202847744. Throughput: 0: 11207.1. Samples: 300786176. Policy #0 lag: (min: 2.0, avg: 93.0, max: 258.0) [2024-06-15 19:14:10,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 19:14:11,728][1652475] Updated weights for policy 0, policy_version 587380 (0.0013) [2024-06-15 19:14:13,031][1652475] Updated weights for policy 0, policy_version 587440 (0.0012) [2024-06-15 19:14:15,738][1648984] Fps is (10 sec: 39322.2, 60 sec: 44236.8, 300 sec: 43542.6). Total num frames: 1203109888. Throughput: 0: 11138.8. Samples: 300848640. Policy #0 lag: (min: 2.0, avg: 93.0, max: 258.0) [2024-06-15 19:14:15,739][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 19:14:17,766][1652475] Updated weights for policy 0, policy_version 587510 (0.0013) [2024-06-15 19:14:20,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 43144.5, 300 sec: 43653.6). Total num frames: 1203240960. Throughput: 0: 10934.0. Samples: 300877312. Policy #0 lag: (min: 2.0, avg: 93.0, max: 258.0) [2024-06-15 19:14:20,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 19:14:22,679][1652475] Updated weights for policy 0, policy_version 587555 (0.0157) [2024-06-15 19:14:24,217][1652475] Updated weights for policy 0, policy_version 587632 (0.0013) [2024-06-15 19:14:25,738][1648984] Fps is (10 sec: 45875.5, 60 sec: 44783.0, 300 sec: 43653.6). Total num frames: 1203568640. Throughput: 0: 11104.7. Samples: 300943360. 
Policy #0 lag: (min: 2.0, avg: 93.0, max: 258.0) [2024-06-15 19:14:25,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 19:14:25,796][1652475] Updated weights for policy 0, policy_version 587696 (0.0014) [2024-06-15 19:14:28,616][1652475] Updated weights for policy 0, policy_version 587730 (0.0012) [2024-06-15 19:14:29,577][1652475] Updated weights for policy 0, policy_version 587776 (0.0015) [2024-06-15 19:14:30,738][1648984] Fps is (10 sec: 52428.0, 60 sec: 43690.5, 300 sec: 43986.9). Total num frames: 1203765248. Throughput: 0: 10934.0. Samples: 301007360. Policy #0 lag: (min: 2.0, avg: 93.0, max: 258.0) [2024-06-15 19:14:30,742][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 19:14:35,738][1648984] Fps is (10 sec: 36044.7, 60 sec: 43690.7, 300 sec: 43764.7). Total num frames: 1203929088. Throughput: 0: 11138.8. Samples: 301044736. Policy #0 lag: (min: 2.0, avg: 93.0, max: 258.0) [2024-06-15 19:14:35,738][1648984] Avg episode reward: [(0, '-0.610')] [2024-06-15 19:14:35,753][1652475] Updated weights for policy 0, policy_version 587872 (0.0087) [2024-06-15 19:14:36,493][1652475] Updated weights for policy 0, policy_version 587904 (0.0011) [2024-06-15 19:14:38,524][1652475] Updated weights for policy 0, policy_version 587960 (0.0013) [2024-06-15 19:14:40,738][1648984] Fps is (10 sec: 49152.6, 60 sec: 44782.9, 300 sec: 43875.8). Total num frames: 1204256768. Throughput: 0: 11025.1. Samples: 301106176. Policy #0 lag: (min: 2.0, avg: 93.0, max: 258.0) [2024-06-15 19:14:40,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 19:14:40,797][1652475] Updated weights for policy 0, policy_version 588032 (0.0044) [2024-06-15 19:14:45,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 43690.6, 300 sec: 43653.6). Total num frames: 1204322304. Throughput: 0: 10968.2. Samples: 301175296. 
Policy #0 lag: (min: 2.0, avg: 93.0, max: 258.0) [2024-06-15 19:14:45,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 19:14:46,843][1651340] Signal inference workers to stop experience collection... (30250 times) [2024-06-15 19:14:46,856][1652475] Updated weights for policy 0, policy_version 588097 (0.0014) [2024-06-15 19:14:46,935][1652475] InferenceWorker_p0-w0: stopping experience collection (30250 times) [2024-06-15 19:14:47,153][1651340] Signal inference workers to resume experience collection... (30250 times) [2024-06-15 19:14:47,154][1652475] InferenceWorker_p0-w0: resuming experience collection (30250 times) [2024-06-15 19:14:48,430][1652475] Updated weights for policy 0, policy_version 588157 (0.0013) [2024-06-15 19:14:50,738][1648984] Fps is (10 sec: 29491.4, 60 sec: 42052.2, 300 sec: 43542.6). Total num frames: 1204551680. Throughput: 0: 10752.0. Samples: 301196800. Policy #0 lag: (min: 79.0, avg: 156.1, max: 335.0) [2024-06-15 19:14:50,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 19:14:51,927][1652475] Updated weights for policy 0, policy_version 588224 (0.0013) [2024-06-15 19:14:53,978][1652475] Updated weights for policy 0, policy_version 588280 (0.0014) [2024-06-15 19:14:55,738][1648984] Fps is (10 sec: 49150.2, 60 sec: 43690.4, 300 sec: 43986.8). Total num frames: 1204813824. Throughput: 0: 10524.3. Samples: 301259776. Policy #0 lag: (min: 79.0, avg: 156.1, max: 335.0) [2024-06-15 19:14:55,739][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 19:14:55,744][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000588288_1204813824.pth... 
[2024-06-15 19:14:55,805][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000583200_1194393600.pth [2024-06-15 19:14:58,342][1652475] Updated weights for policy 0, policy_version 588342 (0.0013) [2024-06-15 19:14:59,576][1652475] Updated weights for policy 0, policy_version 588384 (0.0014) [2024-06-15 19:15:00,738][1648984] Fps is (10 sec: 52428.3, 60 sec: 43690.6, 300 sec: 43653.6). Total num frames: 1205075968. Throughput: 0: 10592.7. Samples: 301325312. Policy #0 lag: (min: 79.0, avg: 156.1, max: 335.0) [2024-06-15 19:15:00,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 19:15:02,769][1652475] Updated weights for policy 0, policy_version 588432 (0.0015) [2024-06-15 19:15:05,738][1648984] Fps is (10 sec: 39323.3, 60 sec: 41506.3, 300 sec: 43542.6). Total num frames: 1205207040. Throughput: 0: 10672.4. Samples: 301357568. Policy #0 lag: (min: 79.0, avg: 156.1, max: 335.0) [2024-06-15 19:15:05,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 19:15:06,361][1652475] Updated weights for policy 0, policy_version 588528 (0.0028) [2024-06-15 19:15:09,760][1652475] Updated weights for policy 0, policy_version 588577 (0.0015) [2024-06-15 19:15:10,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 43690.6, 300 sec: 43653.6). Total num frames: 1205469184. Throughput: 0: 10638.2. Samples: 301422080. Policy #0 lag: (min: 79.0, avg: 156.1, max: 335.0) [2024-06-15 19:15:10,738][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 19:15:14,031][1652475] Updated weights for policy 0, policy_version 588660 (0.0028) [2024-06-15 19:15:15,738][1648984] Fps is (10 sec: 49152.2, 60 sec: 43144.6, 300 sec: 43764.7). Total num frames: 1205698560. Throughput: 0: 10444.9. Samples: 301477376. 
Policy #0 lag: (min: 79.0, avg: 156.1, max: 335.0) [2024-06-15 19:15:15,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 19:15:19,935][1652475] Updated weights for policy 0, policy_version 588754 (0.0013) [2024-06-15 19:15:20,738][1648984] Fps is (10 sec: 36045.0, 60 sec: 43144.6, 300 sec: 43764.7). Total num frames: 1205829632. Throughput: 0: 10422.0. Samples: 301513728. Policy #0 lag: (min: 79.0, avg: 156.1, max: 335.0) [2024-06-15 19:15:20,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 19:15:21,615][1652475] Updated weights for policy 0, policy_version 588817 (0.0116) [2024-06-15 19:15:25,738][1648984] Fps is (10 sec: 29490.9, 60 sec: 40413.8, 300 sec: 43098.2). Total num frames: 1205993472. Throughput: 0: 10342.4. Samples: 301571584. Policy #0 lag: (min: 79.0, avg: 156.1, max: 335.0) [2024-06-15 19:15:25,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 19:15:26,540][1652475] Updated weights for policy 0, policy_version 588896 (0.0013) [2024-06-15 19:15:30,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 41506.3, 300 sec: 43542.7). Total num frames: 1206255616. Throughput: 0: 10240.0. Samples: 301636096. Policy #0 lag: (min: 79.0, avg: 156.1, max: 335.0) [2024-06-15 19:15:30,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 19:15:32,295][1652475] Updated weights for policy 0, policy_version 588995 (0.0014) [2024-06-15 19:15:34,283][1651340] Signal inference workers to stop experience collection... (30300 times) [2024-06-15 19:15:34,441][1652475] InferenceWorker_p0-w0: stopping experience collection (30300 times) [2024-06-15 19:15:34,571][1651340] Signal inference workers to resume experience collection... (30300 times) [2024-06-15 19:15:34,571][1652475] InferenceWorker_p0-w0: resuming experience collection (30300 times) [2024-06-15 19:15:35,172][1652475] Updated weights for policy 0, policy_version 589114 (0.0088) [2024-06-15 19:15:35,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 43144.5, 300 sec: 43764.7). 
Total num frames: 1206517760. Throughput: 0: 10558.6. Samples: 301671936. Policy #0 lag: (min: 79.0, avg: 156.1, max: 335.0) [2024-06-15 19:15:35,739][1648984] Avg episode reward: [(0, '-0.520')] [2024-06-15 19:15:40,063][1652475] Updated weights for policy 0, policy_version 589216 (0.0014) [2024-06-15 19:15:40,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 42052.3, 300 sec: 43542.6). Total num frames: 1206779904. Throughput: 0: 10410.8. Samples: 301728256. Policy #0 lag: (min: 79.0, avg: 156.1, max: 335.0) [2024-06-15 19:15:40,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 19:15:45,738][1648984] Fps is (10 sec: 36044.8, 60 sec: 42598.4, 300 sec: 43432.7). Total num frames: 1206878208. Throughput: 0: 10467.6. Samples: 301796352. Policy #0 lag: (min: 79.0, avg: 156.1, max: 335.0) [2024-06-15 19:15:45,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 19:15:45,800][1652475] Updated weights for policy 0, policy_version 589310 (0.0114) [2024-06-15 19:15:48,850][1652475] Updated weights for policy 0, policy_version 589368 (0.0013) [2024-06-15 19:15:50,424][1652475] Updated weights for policy 0, policy_version 589435 (0.0013) [2024-06-15 19:15:50,738][1648984] Fps is (10 sec: 39320.3, 60 sec: 43690.4, 300 sec: 43431.4). Total num frames: 1207173120. Throughput: 0: 10513.0. Samples: 301830656. Policy #0 lag: (min: 79.0, avg: 156.1, max: 335.0) [2024-06-15 19:15:50,739][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 19:15:55,738][1648984] Fps is (10 sec: 42596.0, 60 sec: 41506.0, 300 sec: 43431.4). Total num frames: 1207304192. Throughput: 0: 10512.9. Samples: 301895168. 
Policy #0 lag: (min: 79.0, avg: 156.1, max: 335.0) [2024-06-15 19:15:55,739][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 19:15:56,783][1652475] Updated weights for policy 0, policy_version 589522 (0.0016) [2024-06-15 19:16:00,383][1652475] Updated weights for policy 0, policy_version 589584 (0.0012) [2024-06-15 19:16:00,738][1648984] Fps is (10 sec: 29492.2, 60 sec: 39867.8, 300 sec: 43099.5). Total num frames: 1207468032. Throughput: 0: 10786.1. Samples: 301962752. Policy #0 lag: (min: 79.0, avg: 156.1, max: 335.0) [2024-06-15 19:16:00,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 19:16:02,375][1652475] Updated weights for policy 0, policy_version 589664 (0.0012) [2024-06-15 19:16:03,419][1652475] Updated weights for policy 0, policy_version 589712 (0.0015) [2024-06-15 19:16:05,738][1648984] Fps is (10 sec: 52431.8, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 1207828480. Throughput: 0: 10558.6. Samples: 301988864. Policy #0 lag: (min: 79.0, avg: 156.1, max: 335.0) [2024-06-15 19:16:05,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 19:16:08,343][1652475] Updated weights for policy 0, policy_version 589762 (0.0012) [2024-06-15 19:16:10,738][1648984] Fps is (10 sec: 49151.8, 60 sec: 41506.2, 300 sec: 43431.5). Total num frames: 1207959552. Throughput: 0: 10831.6. Samples: 302059008. Policy #0 lag: (min: 79.0, avg: 156.1, max: 335.0) [2024-06-15 19:16:10,745][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 19:16:11,738][1652475] Updated weights for policy 0, policy_version 589827 (0.0013) [2024-06-15 19:16:12,948][1652475] Updated weights for policy 0, policy_version 589888 (0.0014) [2024-06-15 19:16:15,234][1652475] Updated weights for policy 0, policy_version 590000 (0.0014) [2024-06-15 19:16:15,738][1648984] Fps is (10 sec: 52427.8, 60 sec: 44236.6, 300 sec: 43542.5). Total num frames: 1208352768. Throughput: 0: 10843.0. Samples: 302124032. 
Policy #0 lag: (min: 79.0, avg: 156.1, max: 335.0) [2024-06-15 19:16:15,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 19:16:19,931][1651340] Signal inference workers to stop experience collection... (30350 times) [2024-06-15 19:16:20,002][1652475] InferenceWorker_p0-w0: stopping experience collection (30350 times) [2024-06-15 19:16:20,129][1651340] Signal inference workers to resume experience collection... (30350 times) [2024-06-15 19:16:20,130][1652475] InferenceWorker_p0-w0: resuming experience collection (30350 times) [2024-06-15 19:16:20,132][1652475] Updated weights for policy 0, policy_version 590048 (0.0023) [2024-06-15 19:16:20,741][1648984] Fps is (10 sec: 52410.7, 60 sec: 44234.2, 300 sec: 43986.4). Total num frames: 1208483840. Throughput: 0: 10967.3. Samples: 302165504. Policy #0 lag: (min: 63.0, avg: 164.7, max: 319.0) [2024-06-15 19:16:20,742][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 19:16:23,242][1652475] Updated weights for policy 0, policy_version 590112 (0.0021) [2024-06-15 19:16:24,246][1652475] Updated weights for policy 0, policy_version 590147 (0.0071) [2024-06-15 19:16:25,161][1652475] Updated weights for policy 0, policy_version 590203 (0.0013) [2024-06-15 19:16:25,738][1648984] Fps is (10 sec: 39322.3, 60 sec: 45875.2, 300 sec: 43542.6). Total num frames: 1208745984. Throughput: 0: 11241.2. Samples: 302234112. Policy #0 lag: (min: 63.0, avg: 164.7, max: 319.0) [2024-06-15 19:16:25,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 19:16:26,587][1652475] Updated weights for policy 0, policy_version 590269 (0.0016) [2024-06-15 19:16:30,738][1648984] Fps is (10 sec: 39335.2, 60 sec: 43690.6, 300 sec: 43542.6). Total num frames: 1208877056. Throughput: 0: 11446.0. Samples: 302311424. 
Policy #0 lag: (min: 63.0, avg: 164.7, max: 319.0) [2024-06-15 19:16:30,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 19:16:31,955][1652475] Updated weights for policy 0, policy_version 590331 (0.0014) [2024-06-15 19:16:34,362][1652475] Updated weights for policy 0, policy_version 590392 (0.0013) [2024-06-15 19:16:35,739][1648984] Fps is (10 sec: 49143.5, 60 sec: 45327.8, 300 sec: 43986.6). Total num frames: 1209237504. Throughput: 0: 11457.1. Samples: 302346240. Policy #0 lag: (min: 63.0, avg: 164.7, max: 319.0) [2024-06-15 19:16:35,740][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 19:16:35,934][1652475] Updated weights for policy 0, policy_version 590464 (0.0018) [2024-06-15 19:16:39,004][1652475] Updated weights for policy 0, policy_version 590519 (0.0016) [2024-06-15 19:16:40,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 43690.6, 300 sec: 43542.6). Total num frames: 1209401344. Throughput: 0: 11286.9. Samples: 302403072. Policy #0 lag: (min: 63.0, avg: 164.7, max: 319.0) [2024-06-15 19:16:40,754][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 19:16:42,709][1652475] Updated weights for policy 0, policy_version 590576 (0.0014) [2024-06-15 19:16:45,738][1648984] Fps is (10 sec: 36050.6, 60 sec: 45329.0, 300 sec: 43764.7). Total num frames: 1209597952. Throughput: 0: 11400.5. Samples: 302475776. Policy #0 lag: (min: 63.0, avg: 164.7, max: 319.0) [2024-06-15 19:16:45,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 19:16:46,089][1652475] Updated weights for policy 0, policy_version 590647 (0.0012) [2024-06-15 19:16:47,126][1652475] Updated weights for policy 0, policy_version 590688 (0.0013) [2024-06-15 19:16:50,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 43690.8, 300 sec: 43431.5). Total num frames: 1209794560. Throughput: 0: 11491.5. Samples: 302505984. 
Policy #0 lag: (min: 63.0, avg: 164.7, max: 319.0) [2024-06-15 19:16:50,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 19:16:51,835][1652475] Updated weights for policy 0, policy_version 590723 (0.0013) [2024-06-15 19:16:53,584][1652475] Updated weights for policy 0, policy_version 590804 (0.0014) [2024-06-15 19:16:55,739][1648984] Fps is (10 sec: 45872.7, 60 sec: 45875.1, 300 sec: 43986.8). Total num frames: 1210056704. Throughput: 0: 11423.1. Samples: 302573056. Policy #0 lag: (min: 63.0, avg: 164.7, max: 319.0) [2024-06-15 19:16:55,740][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 19:16:55,746][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000590848_1210056704.pth... [2024-06-15 19:16:55,863][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000585728_1199570944.pth [2024-06-15 19:16:57,008][1652475] Updated weights for policy 0, policy_version 590903 (0.0017) [2024-06-15 19:16:58,813][1652475] Updated weights for policy 0, policy_version 590972 (0.0014) [2024-06-15 19:17:00,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 47513.6, 300 sec: 43542.6). Total num frames: 1210318848. Throughput: 0: 11446.1. Samples: 302639104. Policy #0 lag: (min: 63.0, avg: 164.7, max: 319.0) [2024-06-15 19:17:00,738][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 19:17:05,059][1652475] Updated weights for policy 0, policy_version 591029 (0.0056) [2024-06-15 19:17:05,384][1651340] Signal inference workers to stop experience collection... (30400 times) [2024-06-15 19:17:05,473][1652475] InferenceWorker_p0-w0: stopping experience collection (30400 times) [2024-06-15 19:17:05,620][1651340] Signal inference workers to resume experience collection... (30400 times) [2024-06-15 19:17:05,621][1652475] InferenceWorker_p0-w0: resuming experience collection (30400 times) [2024-06-15 19:17:05,738][1648984] Fps is (10 sec: 42601.5, 60 sec: 44236.8, 300 sec: 43653.7). 
Total num frames: 1210482688. Throughput: 0: 11469.7. Samples: 302681600. Policy #0 lag: (min: 63.0, avg: 164.7, max: 319.0) [2024-06-15 19:17:05,738][1648984] Avg episode reward: [(0, '-0.540')] [2024-06-15 19:17:07,490][1652475] Updated weights for policy 0, policy_version 591107 (0.0135) [2024-06-15 19:17:09,428][1652475] Updated weights for policy 0, policy_version 591187 (0.0014) [2024-06-15 19:17:10,738][1648984] Fps is (10 sec: 52428.5, 60 sec: 48059.7, 300 sec: 43764.7). Total num frames: 1210843136. Throughput: 0: 11195.7. Samples: 302737920. Policy #0 lag: (min: 63.0, avg: 164.7, max: 319.0) [2024-06-15 19:17:10,738][1648984] Avg episode reward: [(0, '-0.520')] [2024-06-15 19:17:15,738][1648984] Fps is (10 sec: 36041.9, 60 sec: 41505.7, 300 sec: 43542.5). Total num frames: 1210843136. Throughput: 0: 11161.4. Samples: 302813696. Policy #0 lag: (min: 63.0, avg: 164.7, max: 319.0) [2024-06-15 19:17:15,739][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 19:17:16,585][1652475] Updated weights for policy 0, policy_version 591251 (0.0014) [2024-06-15 19:17:18,181][1652475] Updated weights for policy 0, policy_version 591344 (0.0013) [2024-06-15 19:17:20,165][1652475] Updated weights for policy 0, policy_version 591419 (0.0012) [2024-06-15 19:17:20,738][1648984] Fps is (10 sec: 39322.0, 60 sec: 45877.9, 300 sec: 43875.8). Total num frames: 1211236352. Throughput: 0: 11025.5. Samples: 302842368. Policy #0 lag: (min: 63.0, avg: 164.7, max: 319.0) [2024-06-15 19:17:20,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 19:17:22,678][1652475] Updated weights for policy 0, policy_version 591480 (0.0013) [2024-06-15 19:17:25,738][1648984] Fps is (10 sec: 52432.9, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 1211367424. Throughput: 0: 11116.1. Samples: 302903296. 
Policy #0 lag: (min: 63.0, avg: 164.7, max: 319.0) [2024-06-15 19:17:25,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 19:17:28,595][1652475] Updated weights for policy 0, policy_version 591526 (0.0011) [2024-06-15 19:17:30,173][1652475] Updated weights for policy 0, policy_version 591584 (0.0014) [2024-06-15 19:17:30,738][1648984] Fps is (10 sec: 36044.4, 60 sec: 45329.0, 300 sec: 43875.8). Total num frames: 1211596800. Throughput: 0: 11082.0. Samples: 302974464. Policy #0 lag: (min: 63.0, avg: 164.7, max: 319.0) [2024-06-15 19:17:30,739][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 19:17:32,413][1652475] Updated weights for policy 0, policy_version 591672 (0.0024) [2024-06-15 19:17:35,738][1648984] Fps is (10 sec: 49152.2, 60 sec: 43692.0, 300 sec: 43431.5). Total num frames: 1211858944. Throughput: 0: 10956.8. Samples: 302999040. Policy #0 lag: (min: 63.0, avg: 164.7, max: 319.0) [2024-06-15 19:17:35,738][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 19:17:35,944][1652475] Updated weights for policy 0, policy_version 591738 (0.0011) [2024-06-15 19:17:39,597][1652475] Updated weights for policy 0, policy_version 591796 (0.0019) [2024-06-15 19:17:40,738][1648984] Fps is (10 sec: 42598.6, 60 sec: 43690.6, 300 sec: 43986.9). Total num frames: 1212022784. Throughput: 0: 11059.4. Samples: 303070720. Policy #0 lag: (min: 63.0, avg: 164.7, max: 319.0) [2024-06-15 19:17:40,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 19:17:41,908][1652475] Updated weights for policy 0, policy_version 591849 (0.0015) [2024-06-15 19:17:44,609][1652475] Updated weights for policy 0, policy_version 591920 (0.0016) [2024-06-15 19:17:45,738][1648984] Fps is (10 sec: 42598.0, 60 sec: 44783.0, 300 sec: 43764.7). Total num frames: 1212284928. Throughput: 0: 10899.9. Samples: 303129600. 
Policy #0 lag: (min: 63.0, avg: 164.7, max: 319.0) [2024-06-15 19:17:45,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 19:17:46,924][1652475] Updated weights for policy 0, policy_version 591968 (0.0019) [2024-06-15 19:17:47,672][1652475] Updated weights for policy 0, policy_version 591999 (0.0024) [2024-06-15 19:17:50,738][1648984] Fps is (10 sec: 39322.1, 60 sec: 43690.8, 300 sec: 43542.6). Total num frames: 1212416000. Throughput: 0: 10820.3. Samples: 303168512. Policy #0 lag: (min: 63.0, avg: 164.7, max: 319.0) [2024-06-15 19:17:50,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 19:17:51,264][1651340] Signal inference workers to stop experience collection... (30450 times) [2024-06-15 19:17:51,304][1652475] InferenceWorker_p0-w0: stopping experience collection (30450 times) [2024-06-15 19:17:51,457][1651340] Signal inference workers to resume experience collection... (30450 times) [2024-06-15 19:17:51,457][1652475] InferenceWorker_p0-w0: resuming experience collection (30450 times) [2024-06-15 19:17:51,711][1652475] Updated weights for policy 0, policy_version 592061 (0.0014) [2024-06-15 19:17:55,495][1652475] Updated weights for policy 0, policy_version 592128 (0.0014) [2024-06-15 19:17:55,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 43691.2, 300 sec: 43542.6). Total num frames: 1212678144. Throughput: 0: 11150.2. Samples: 303239680. Policy #0 lag: (min: 5.0, avg: 106.5, max: 261.0) [2024-06-15 19:17:55,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 19:17:57,006][1652475] Updated weights for policy 0, policy_version 592187 (0.0014) [2024-06-15 19:17:59,434][1652475] Updated weights for policy 0, policy_version 592254 (0.0016) [2024-06-15 19:18:00,738][1648984] Fps is (10 sec: 52428.3, 60 sec: 43690.7, 300 sec: 43764.7). Total num frames: 1212940288. Throughput: 0: 10661.2. Samples: 303293440. 
Policy #0 lag: (min: 5.0, avg: 106.5, max: 261.0) [2024-06-15 19:18:00,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 19:18:05,738][1648984] Fps is (10 sec: 39321.3, 60 sec: 43144.4, 300 sec: 43431.5). Total num frames: 1213071360. Throughput: 0: 10763.4. Samples: 303326720. Policy #0 lag: (min: 5.0, avg: 106.5, max: 261.0) [2024-06-15 19:18:05,739][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 19:18:06,786][1652475] Updated weights for policy 0, policy_version 592323 (0.0012) [2024-06-15 19:18:07,924][1652475] Updated weights for policy 0, policy_version 592376 (0.0014) [2024-06-15 19:18:09,357][1652475] Updated weights for policy 0, policy_version 592416 (0.0012) [2024-06-15 19:18:10,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 42052.3, 300 sec: 43764.7). Total num frames: 1213366272. Throughput: 0: 10888.5. Samples: 303393280. Policy #0 lag: (min: 5.0, avg: 106.5, max: 261.0) [2024-06-15 19:18:10,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 19:18:11,310][1652475] Updated weights for policy 0, policy_version 592482 (0.0019) [2024-06-15 19:18:15,411][1652475] Updated weights for policy 0, policy_version 592531 (0.0013) [2024-06-15 19:18:15,738][1648984] Fps is (10 sec: 45875.6, 60 sec: 44783.5, 300 sec: 43653.6). Total num frames: 1213530112. Throughput: 0: 10729.3. Samples: 303457280. Policy #0 lag: (min: 5.0, avg: 106.5, max: 261.0) [2024-06-15 19:18:15,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 19:18:18,985][1652475] Updated weights for policy 0, policy_version 592579 (0.0017) [2024-06-15 19:18:20,738][1648984] Fps is (10 sec: 36044.9, 60 sec: 41506.1, 300 sec: 43542.6). Total num frames: 1213726720. Throughput: 0: 10979.5. Samples: 303493120. 
Policy #0 lag: (min: 5.0, avg: 106.5, max: 261.0) [2024-06-15 19:18:20,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 19:18:21,319][1652475] Updated weights for policy 0, policy_version 592656 (0.0141) [2024-06-15 19:18:22,870][1652475] Updated weights for policy 0, policy_version 592720 (0.0019) [2024-06-15 19:18:25,738][1648984] Fps is (10 sec: 45875.0, 60 sec: 43690.6, 300 sec: 43542.6). Total num frames: 1213988864. Throughput: 0: 10649.6. Samples: 303549952. Policy #0 lag: (min: 5.0, avg: 106.5, max: 261.0) [2024-06-15 19:18:25,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 19:18:27,673][1652475] Updated weights for policy 0, policy_version 592788 (0.0013) [2024-06-15 19:18:30,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 42052.3, 300 sec: 43431.5). Total num frames: 1214119936. Throughput: 0: 10979.6. Samples: 303623680. Policy #0 lag: (min: 5.0, avg: 106.5, max: 261.0) [2024-06-15 19:18:30,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 19:18:31,091][1652475] Updated weights for policy 0, policy_version 592848 (0.0025) [2024-06-15 19:18:32,908][1652475] Updated weights for policy 0, policy_version 592912 (0.0013) [2024-06-15 19:18:34,700][1652475] Updated weights for policy 0, policy_version 592998 (0.0013) [2024-06-15 19:18:35,738][1648984] Fps is (10 sec: 52429.1, 60 sec: 44236.8, 300 sec: 43875.8). Total num frames: 1214513152. Throughput: 0: 10729.2. Samples: 303651328. Policy #0 lag: (min: 5.0, avg: 106.5, max: 261.0) [2024-06-15 19:18:35,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 19:18:39,134][1651340] Signal inference workers to stop experience collection... (30500 times) [2024-06-15 19:18:39,174][1652475] InferenceWorker_p0-w0: stopping experience collection (30500 times) [2024-06-15 19:18:39,417][1651340] Signal inference workers to resume experience collection... 
(30500 times) [2024-06-15 19:18:39,417][1652475] InferenceWorker_p0-w0: resuming experience collection (30500 times) [2024-06-15 19:18:39,792][1652475] Updated weights for policy 0, policy_version 593056 (0.0015) [2024-06-15 19:18:40,731][1652475] Updated weights for policy 0, policy_version 593088 (0.0011) [2024-06-15 19:18:40,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 43690.7, 300 sec: 43875.8). Total num frames: 1214644224. Throughput: 0: 10661.0. Samples: 303719424. Policy #0 lag: (min: 5.0, avg: 106.5, max: 261.0) [2024-06-15 19:18:40,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 19:18:45,615][1652475] Updated weights for policy 0, policy_version 593184 (0.0014) [2024-06-15 19:18:45,738][1648984] Fps is (10 sec: 32767.4, 60 sec: 42598.3, 300 sec: 43431.5). Total num frames: 1214840832. Throughput: 0: 10854.4. Samples: 303781888. Policy #0 lag: (min: 5.0, avg: 106.5, max: 261.0) [2024-06-15 19:18:45,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 19:18:46,949][1652475] Updated weights for policy 0, policy_version 593254 (0.0014) [2024-06-15 19:18:50,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 43690.6, 300 sec: 43542.6). Total num frames: 1215037440. Throughput: 0: 10820.3. Samples: 303813632. Policy #0 lag: (min: 5.0, avg: 106.5, max: 261.0) [2024-06-15 19:18:50,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 19:18:51,620][1652475] Updated weights for policy 0, policy_version 593285 (0.0012) [2024-06-15 19:18:55,200][1652475] Updated weights for policy 0, policy_version 593347 (0.0014) [2024-06-15 19:18:55,738][1648984] Fps is (10 sec: 36044.9, 60 sec: 42052.2, 300 sec: 43209.3). Total num frames: 1215201280. Throughput: 0: 10911.3. Samples: 303884288. Policy #0 lag: (min: 5.0, avg: 106.5, max: 261.0) [2024-06-15 19:18:55,739][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 19:18:56,325][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000593392_1215266816.pth... 
[2024-06-15 19:18:56,508][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000588288_1204813824.pth [2024-06-15 19:18:57,870][1652475] Updated weights for policy 0, policy_version 593463 (0.0013) [2024-06-15 19:18:59,161][1652475] Updated weights for policy 0, policy_version 593533 (0.0012) [2024-06-15 19:19:00,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 1215561728. Throughput: 0: 10763.4. Samples: 303941632. Policy #0 lag: (min: 5.0, avg: 106.5, max: 261.0) [2024-06-15 19:19:00,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 19:19:05,738][1648984] Fps is (10 sec: 39321.9, 60 sec: 42052.3, 300 sec: 43209.3). Total num frames: 1215594496. Throughput: 0: 10786.1. Samples: 303978496. Policy #0 lag: (min: 5.0, avg: 106.5, max: 261.0) [2024-06-15 19:19:05,761][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 19:19:06,484][1652475] Updated weights for policy 0, policy_version 593593 (0.0012) [2024-06-15 19:19:08,543][1652475] Updated weights for policy 0, policy_version 593680 (0.0012) [2024-06-15 19:19:10,040][1652475] Updated weights for policy 0, policy_version 593744 (0.0081) [2024-06-15 19:19:10,738][1648984] Fps is (10 sec: 45874.3, 60 sec: 44236.7, 300 sec: 43764.7). Total num frames: 1216020480. Throughput: 0: 10865.7. Samples: 304038912. Policy #0 lag: (min: 5.0, avg: 106.5, max: 261.0) [2024-06-15 19:19:10,739][1648984] Avg episode reward: [(0, '-0.250')] [2024-06-15 19:19:15,738][1648984] Fps is (10 sec: 49152.4, 60 sec: 42598.4, 300 sec: 43542.6). Total num frames: 1216086016. Throughput: 0: 10752.0. Samples: 304107520. 
Policy #0 lag: (min: 5.0, avg: 106.5, max: 261.0) [2024-06-15 19:19:15,738][1648984] Avg episode reward: [(0, '-0.540')] [2024-06-15 19:19:17,841][1652475] Updated weights for policy 0, policy_version 593793 (0.0011) [2024-06-15 19:19:19,217][1652475] Updated weights for policy 0, policy_version 593848 (0.0012) [2024-06-15 19:19:20,429][1651340] Signal inference workers to stop experience collection... (30550 times) [2024-06-15 19:19:20,460][1652475] InferenceWorker_p0-w0: stopping experience collection (30550 times) [2024-06-15 19:19:20,678][1651340] Signal inference workers to resume experience collection... (30550 times) [2024-06-15 19:19:20,679][1652475] InferenceWorker_p0-w0: resuming experience collection (30550 times) [2024-06-15 19:19:20,738][1648984] Fps is (10 sec: 32768.9, 60 sec: 43690.7, 300 sec: 43320.4). Total num frames: 1216348160. Throughput: 0: 11082.0. Samples: 304150016. Policy #0 lag: (min: 5.0, avg: 106.5, max: 261.0) [2024-06-15 19:19:20,738][1648984] Avg episode reward: [(0, '-0.580')] [2024-06-15 19:19:20,959][1652475] Updated weights for policy 0, policy_version 593936 (0.0085) [2024-06-15 19:19:22,854][1652475] Updated weights for policy 0, policy_version 594035 (0.0014) [2024-06-15 19:19:25,738][1648984] Fps is (10 sec: 52428.5, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 1216610304. Throughput: 0: 10786.1. Samples: 304204800. Policy #0 lag: (min: 5.0, avg: 106.5, max: 261.0) [2024-06-15 19:19:25,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 19:19:30,532][1652475] Updated weights for policy 0, policy_version 594096 (0.0013) [2024-06-15 19:19:30,738][1648984] Fps is (10 sec: 36043.4, 60 sec: 43144.3, 300 sec: 43320.4). Total num frames: 1216708608. Throughput: 0: 11104.7. Samples: 304281600. 
Policy #0 lag: (min: 15.0, avg: 64.0, max: 271.0) [2024-06-15 19:19:30,739][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 19:19:32,719][1652475] Updated weights for policy 0, policy_version 594198 (0.0012) [2024-06-15 19:19:33,780][1652475] Updated weights for policy 0, policy_version 594256 (0.0013) [2024-06-15 19:19:35,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 43690.6, 300 sec: 43653.6). Total num frames: 1217134592. Throughput: 0: 10899.9. Samples: 304304128. Policy #0 lag: (min: 15.0, avg: 64.0, max: 271.0) [2024-06-15 19:19:35,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 19:19:40,738][1648984] Fps is (10 sec: 42599.8, 60 sec: 41506.2, 300 sec: 43431.5). Total num frames: 1217134592. Throughput: 0: 10900.0. Samples: 304374784. Policy #0 lag: (min: 15.0, avg: 64.0, max: 271.0) [2024-06-15 19:19:40,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 19:19:41,163][1652475] Updated weights for policy 0, policy_version 594321 (0.0089) [2024-06-15 19:19:42,909][1652475] Updated weights for policy 0, policy_version 594400 (0.0045) [2024-06-15 19:19:43,780][1652475] Updated weights for policy 0, policy_version 594433 (0.0087) [2024-06-15 19:19:45,739][1648984] Fps is (10 sec: 39318.0, 60 sec: 44782.4, 300 sec: 43986.7). Total num frames: 1217527808. Throughput: 0: 11036.2. Samples: 304438272. Policy #0 lag: (min: 15.0, avg: 64.0, max: 271.0) [2024-06-15 19:19:45,739][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 19:19:46,362][1652475] Updated weights for policy 0, policy_version 594512 (0.0149) [2024-06-15 19:19:50,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 1217658880. Throughput: 0: 10945.4. Samples: 304471040. 
Policy #0 lag: (min: 15.0, avg: 64.0, max: 271.0) [2024-06-15 19:19:50,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 19:19:51,211][1652475] Updated weights for policy 0, policy_version 594563 (0.0020) [2024-06-15 19:19:53,860][1652475] Updated weights for policy 0, policy_version 594640 (0.0013) [2024-06-15 19:19:55,738][1648984] Fps is (10 sec: 39325.2, 60 sec: 45329.2, 300 sec: 43542.6). Total num frames: 1217921024. Throughput: 0: 11116.1. Samples: 304539136. Policy #0 lag: (min: 15.0, avg: 64.0, max: 271.0) [2024-06-15 19:19:55,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 19:19:58,222][1652475] Updated weights for policy 0, policy_version 594720 (0.0014) [2024-06-15 19:20:00,062][1652475] Updated weights for policy 0, policy_version 594785 (0.0021) [2024-06-15 19:20:00,738][1648984] Fps is (10 sec: 52428.4, 60 sec: 43690.6, 300 sec: 43986.9). Total num frames: 1218183168. Throughput: 0: 10911.3. Samples: 304598528. Policy #0 lag: (min: 15.0, avg: 64.0, max: 271.0) [2024-06-15 19:20:00,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 19:20:03,110][1651340] Signal inference workers to stop experience collection... (30600 times) [2024-06-15 19:20:03,212][1652475] InferenceWorker_p0-w0: stopping experience collection (30600 times) [2024-06-15 19:20:03,333][1651340] Signal inference workers to resume experience collection... (30600 times) [2024-06-15 19:20:03,334][1652475] InferenceWorker_p0-w0: resuming experience collection (30600 times) [2024-06-15 19:20:03,546][1652475] Updated weights for policy 0, policy_version 594836 (0.0014) [2024-06-15 19:20:05,399][1652475] Updated weights for policy 0, policy_version 594912 (0.0023) [2024-06-15 19:20:05,738][1648984] Fps is (10 sec: 45875.3, 60 sec: 46421.4, 300 sec: 43764.7). Total num frames: 1218379776. Throughput: 0: 10854.4. Samples: 304638464. 
Policy #0 lag: (min: 15.0, avg: 64.0, max: 271.0) [2024-06-15 19:20:05,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 19:20:09,778][1652475] Updated weights for policy 0, policy_version 594960 (0.0012) [2024-06-15 19:20:10,738][1648984] Fps is (10 sec: 36044.9, 60 sec: 42052.4, 300 sec: 43542.5). Total num frames: 1218543616. Throughput: 0: 11138.9. Samples: 304706048. Policy #0 lag: (min: 15.0, avg: 64.0, max: 271.0) [2024-06-15 19:20:10,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 19:20:12,097][1652475] Updated weights for policy 0, policy_version 595045 (0.0014) [2024-06-15 19:20:15,738][1648984] Fps is (10 sec: 32767.5, 60 sec: 43690.6, 300 sec: 43653.6). Total num frames: 1218707456. Throughput: 0: 10763.4. Samples: 304765952. Policy #0 lag: (min: 15.0, avg: 64.0, max: 271.0) [2024-06-15 19:20:15,739][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 19:20:16,875][1652475] Updated weights for policy 0, policy_version 595089 (0.0014) [2024-06-15 19:20:18,420][1652475] Updated weights for policy 0, policy_version 595156 (0.0012) [2024-06-15 19:20:20,738][1648984] Fps is (10 sec: 42598.7, 60 sec: 43690.7, 300 sec: 43986.9). Total num frames: 1218969600. Throughput: 0: 11002.3. Samples: 304799232. Policy #0 lag: (min: 15.0, avg: 64.0, max: 271.0) [2024-06-15 19:20:20,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 19:20:21,530][1652475] Updated weights for policy 0, policy_version 595218 (0.0013) [2024-06-15 19:20:23,600][1652475] Updated weights for policy 0, policy_version 595296 (0.0016) [2024-06-15 19:20:25,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 43690.6, 300 sec: 43986.8). Total num frames: 1219231744. Throughput: 0: 10592.7. Samples: 304851456. 
Policy #0 lag: (min: 15.0, avg: 64.0, max: 271.0) [2024-06-15 19:20:25,739][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 19:20:29,172][1652475] Updated weights for policy 0, policy_version 595329 (0.0011) [2024-06-15 19:20:30,434][1652475] Updated weights for policy 0, policy_version 595388 (0.0011) [2024-06-15 19:20:30,738][1648984] Fps is (10 sec: 39319.2, 60 sec: 44236.6, 300 sec: 43542.5). Total num frames: 1219362816. Throughput: 0: 10809.0. Samples: 304924672. Policy #0 lag: (min: 15.0, avg: 64.0, max: 271.0) [2024-06-15 19:20:30,739][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 19:20:32,269][1652475] Updated weights for policy 0, policy_version 595453 (0.0014) [2024-06-15 19:20:34,233][1652475] Updated weights for policy 0, policy_version 595504 (0.0011) [2024-06-15 19:20:35,738][1648984] Fps is (10 sec: 45875.9, 60 sec: 42598.4, 300 sec: 43764.7). Total num frames: 1219690496. Throughput: 0: 10820.3. Samples: 304957952. Policy #0 lag: (min: 15.0, avg: 64.0, max: 271.0) [2024-06-15 19:20:35,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 19:20:36,322][1652475] Updated weights for policy 0, policy_version 595581 (0.0110) [2024-06-15 19:20:40,738][1648984] Fps is (10 sec: 39323.7, 60 sec: 43690.6, 300 sec: 43653.6). Total num frames: 1219756032. Throughput: 0: 10661.0. Samples: 305018880. Policy #0 lag: (min: 15.0, avg: 64.0, max: 271.0) [2024-06-15 19:20:40,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 19:20:43,468][1652475] Updated weights for policy 0, policy_version 595653 (0.0031) [2024-06-15 19:20:44,960][1652475] Updated weights for policy 0, policy_version 595711 (0.0013) [2024-06-15 19:20:45,738][1648984] Fps is (10 sec: 32766.3, 60 sec: 41506.4, 300 sec: 43542.5). Total num frames: 1220018176. Throughput: 0: 10797.4. Samples: 305084416. 
Policy #0 lag: (min: 15.0, avg: 64.0, max: 271.0) [2024-06-15 19:20:45,739][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 19:20:46,992][1652475] Updated weights for policy 0, policy_version 595776 (0.0016) [2024-06-15 19:20:47,152][1651340] Signal inference workers to stop experience collection... (30650 times) [2024-06-15 19:20:47,192][1652475] InferenceWorker_p0-w0: stopping experience collection (30650 times) [2024-06-15 19:20:47,479][1651340] Signal inference workers to resume experience collection... (30650 times) [2024-06-15 19:20:47,488][1652475] InferenceWorker_p0-w0: resuming experience collection (30650 times) [2024-06-15 19:20:50,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 43690.6, 300 sec: 43986.9). Total num frames: 1220280320. Throughput: 0: 10478.9. Samples: 305110016. Policy #0 lag: (min: 15.0, avg: 64.0, max: 271.0) [2024-06-15 19:20:50,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 19:20:53,755][1652475] Updated weights for policy 0, policy_version 595845 (0.0016) [2024-06-15 19:20:55,280][1652475] Updated weights for policy 0, policy_version 595901 (0.0015) [2024-06-15 19:20:55,738][1648984] Fps is (10 sec: 39323.4, 60 sec: 41506.1, 300 sec: 43875.8). Total num frames: 1220411392. Throughput: 0: 10717.8. Samples: 305188352. Policy #0 lag: (min: 15.0, avg: 64.0, max: 271.0) [2024-06-15 19:20:55,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 19:20:56,497][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000595936_1220476928.pth... 
[2024-06-15 19:20:56,681][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000590848_1210056704.pth [2024-06-15 19:20:57,231][1652475] Updated weights for policy 0, policy_version 595961 (0.0013) [2024-06-15 19:20:58,921][1652475] Updated weights for policy 0, policy_version 596016 (0.0012) [2024-06-15 19:21:00,710][1652475] Updated weights for policy 0, policy_version 596093 (0.0013) [2024-06-15 19:21:00,738][1648984] Fps is (10 sec: 49152.5, 60 sec: 43144.6, 300 sec: 43875.8). Total num frames: 1220771840. Throughput: 0: 10399.3. Samples: 305233920. Policy #0 lag: (min: 15.0, avg: 64.0, max: 271.0) [2024-06-15 19:21:00,740][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 19:21:05,738][1648984] Fps is (10 sec: 39322.0, 60 sec: 40413.9, 300 sec: 43542.6). Total num frames: 1220804608. Throughput: 0: 10467.5. Samples: 305270272. Policy #0 lag: (min: 15.0, avg: 64.0, max: 271.0) [2024-06-15 19:21:05,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 19:21:07,418][1652475] Updated weights for policy 0, policy_version 596158 (0.0013) [2024-06-15 19:21:09,407][1652475] Updated weights for policy 0, policy_version 596195 (0.0016) [2024-06-15 19:21:10,738][1648984] Fps is (10 sec: 32767.7, 60 sec: 42598.4, 300 sec: 43209.3). Total num frames: 1221099520. Throughput: 0: 10831.7. Samples: 305338880. Policy #0 lag: (min: 15.0, avg: 86.2, max: 271.0) [2024-06-15 19:21:10,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 19:21:11,230][1652475] Updated weights for policy 0, policy_version 596272 (0.0012) [2024-06-15 19:21:12,467][1652475] Updated weights for policy 0, policy_version 596325 (0.0013) [2024-06-15 19:21:15,738][1648984] Fps is (10 sec: 52428.5, 60 sec: 43690.8, 300 sec: 43543.1). Total num frames: 1221328896. Throughput: 0: 10638.3. Samples: 305403392. 
Policy #0 lag: (min: 15.0, avg: 86.2, max: 271.0) [2024-06-15 19:21:15,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 19:21:19,229][1652475] Updated weights for policy 0, policy_version 596374 (0.0080) [2024-06-15 19:21:20,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 42052.2, 300 sec: 43209.3). Total num frames: 1221492736. Throughput: 0: 10808.9. Samples: 305444352. Policy #0 lag: (min: 15.0, avg: 86.2, max: 271.0) [2024-06-15 19:21:20,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 19:21:20,805][1652475] Updated weights for policy 0, policy_version 596434 (0.0093) [2024-06-15 19:21:22,280][1652475] Updated weights for policy 0, policy_version 596498 (0.0019) [2024-06-15 19:21:24,231][1652475] Updated weights for policy 0, policy_version 596579 (0.0134) [2024-06-15 19:21:25,738][1648984] Fps is (10 sec: 52428.5, 60 sec: 43690.7, 300 sec: 43986.9). Total num frames: 1221853184. Throughput: 0: 10604.1. Samples: 305496064. Policy #0 lag: (min: 15.0, avg: 86.2, max: 271.0) [2024-06-15 19:21:25,739][1648984] Avg episode reward: [(0, '-0.530')] [2024-06-15 19:21:30,738][1648984] Fps is (10 sec: 36044.8, 60 sec: 41506.5, 300 sec: 42765.3). Total num frames: 1221853184. Throughput: 0: 10809.0. Samples: 305570816. Policy #0 lag: (min: 15.0, avg: 86.2, max: 271.0) [2024-06-15 19:21:30,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 19:21:32,240][1652475] Updated weights for policy 0, policy_version 596664 (0.0018) [2024-06-15 19:21:33,396][1651340] Signal inference workers to stop experience collection... (30700 times) [2024-06-15 19:21:33,446][1652475] InferenceWorker_p0-w0: stopping experience collection (30700 times) [2024-06-15 19:21:33,617][1651340] Signal inference workers to resume experience collection... 
(30700 times) [2024-06-15 19:21:33,618][1652475] InferenceWorker_p0-w0: resuming experience collection (30700 times) [2024-06-15 19:21:34,339][1652475] Updated weights for policy 0, policy_version 596736 (0.0028) [2024-06-15 19:21:35,738][1648984] Fps is (10 sec: 32768.2, 60 sec: 41506.1, 300 sec: 43320.4). Total num frames: 1222180864. Throughput: 0: 10922.7. Samples: 305601536. Policy #0 lag: (min: 15.0, avg: 86.2, max: 271.0) [2024-06-15 19:21:35,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 19:21:36,661][1652475] Updated weights for policy 0, policy_version 596817 (0.0012) [2024-06-15 19:21:40,740][1648984] Fps is (10 sec: 52418.4, 60 sec: 43689.2, 300 sec: 43320.1). Total num frames: 1222377472. Throughput: 0: 10296.4. Samples: 305651712. Policy #0 lag: (min: 15.0, avg: 86.2, max: 271.0) [2024-06-15 19:21:40,741][1648984] Avg episode reward: [(0, '-0.250')] [2024-06-15 19:21:43,764][1652475] Updated weights for policy 0, policy_version 596867 (0.0013) [2024-06-15 19:21:45,510][1652475] Updated weights for policy 0, policy_version 596930 (0.0087) [2024-06-15 19:21:45,738][1648984] Fps is (10 sec: 32768.1, 60 sec: 41506.5, 300 sec: 43098.3). Total num frames: 1222508544. Throughput: 0: 10934.0. Samples: 305725952. Policy #0 lag: (min: 15.0, avg: 86.2, max: 271.0) [2024-06-15 19:21:45,738][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 19:21:47,550][1652475] Updated weights for policy 0, policy_version 597012 (0.0114) [2024-06-15 19:21:49,233][1652475] Updated weights for policy 0, policy_version 597088 (0.0084) [2024-06-15 19:21:50,738][1648984] Fps is (10 sec: 52439.5, 60 sec: 43690.7, 300 sec: 43542.7). Total num frames: 1222901760. Throughput: 0: 10626.8. Samples: 305748480. Policy #0 lag: (min: 15.0, avg: 86.2, max: 271.0) [2024-06-15 19:21:50,738][1648984] Avg episode reward: [(0, '-0.250')] [2024-06-15 19:21:55,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 41506.2, 300 sec: 42653.9). Total num frames: 1222901760. 
Throughput: 0: 10661.0. Samples: 305818624. Policy #0 lag: (min: 15.0, avg: 86.2, max: 271.0) [2024-06-15 19:21:55,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 19:21:55,795][1652475] Updated weights for policy 0, policy_version 597124 (0.0088) [2024-06-15 19:21:57,006][1652475] Updated weights for policy 0, policy_version 597176 (0.0110) [2024-06-15 19:21:58,294][1652475] Updated weights for policy 0, policy_version 597203 (0.0012) [2024-06-15 19:22:00,204][1652475] Updated weights for policy 0, policy_version 597281 (0.0119) [2024-06-15 19:22:00,738][1648984] Fps is (10 sec: 36044.5, 60 sec: 41506.0, 300 sec: 43320.4). Total num frames: 1223262208. Throughput: 0: 10649.6. Samples: 305882624. Policy #0 lag: (min: 15.0, avg: 86.2, max: 271.0) [2024-06-15 19:22:00,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 19:22:01,618][1652475] Updated weights for policy 0, policy_version 597348 (0.0016) [2024-06-15 19:22:02,337][1652475] Updated weights for policy 0, policy_version 597376 (0.0015) [2024-06-15 19:22:05,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 43690.7, 300 sec: 42654.0). Total num frames: 1223426048. Throughput: 0: 10444.8. Samples: 305914368. Policy #0 lag: (min: 15.0, avg: 86.2, max: 271.0) [2024-06-15 19:22:05,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 19:22:10,263][1652475] Updated weights for policy 0, policy_version 597474 (0.0017) [2024-06-15 19:22:10,738][1648984] Fps is (10 sec: 39321.9, 60 sec: 42598.4, 300 sec: 43431.6). Total num frames: 1223655424. Throughput: 0: 10786.2. Samples: 305981440. Policy #0 lag: (min: 15.0, avg: 86.2, max: 271.0) [2024-06-15 19:22:10,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 19:22:12,186][1652475] Updated weights for policy 0, policy_version 597520 (0.0018) [2024-06-15 19:22:14,088][1652475] Updated weights for policy 0, policy_version 597585 (0.0013) [2024-06-15 19:22:14,474][1651340] Signal inference workers to stop experience collection... 
(30750 times) [2024-06-15 19:22:14,533][1652475] InferenceWorker_p0-w0: stopping experience collection (30750 times) [2024-06-15 19:22:14,841][1651340] Signal inference workers to resume experience collection... (30750 times) [2024-06-15 19:22:14,842][1652475] InferenceWorker_p0-w0: resuming experience collection (30750 times) [2024-06-15 19:22:15,127][1652475] Updated weights for policy 0, policy_version 597630 (0.0013) [2024-06-15 19:22:15,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 43690.7, 300 sec: 43098.3). Total num frames: 1223950336. Throughput: 0: 10353.8. Samples: 306036736. Policy #0 lag: (min: 15.0, avg: 86.2, max: 271.0) [2024-06-15 19:22:15,740][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 19:22:19,551][1652475] Updated weights for policy 0, policy_version 597680 (0.0091) [2024-06-15 19:22:20,739][1648984] Fps is (10 sec: 42594.8, 60 sec: 43144.0, 300 sec: 43098.1). Total num frames: 1224081408. Throughput: 0: 10467.4. Samples: 306072576. Policy #0 lag: (min: 15.0, avg: 86.2, max: 271.0) [2024-06-15 19:22:20,739][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 19:22:22,123][1652475] Updated weights for policy 0, policy_version 597729 (0.0011) [2024-06-15 19:22:24,983][1652475] Updated weights for policy 0, policy_version 597812 (0.0014) [2024-06-15 19:22:25,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 41506.2, 300 sec: 43209.3). Total num frames: 1224343552. Throughput: 0: 10798.0. Samples: 306137600. Policy #0 lag: (min: 15.0, avg: 86.2, max: 271.0) [2024-06-15 19:22:25,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 19:22:27,628][1652475] Updated weights for policy 0, policy_version 597872 (0.0012) [2024-06-15 19:22:30,738][1648984] Fps is (10 sec: 39325.1, 60 sec: 43690.7, 300 sec: 42765.0). Total num frames: 1224474624. Throughput: 0: 10615.5. Samples: 306203648. 
Policy #0 lag: (min: 15.0, avg: 86.2, max: 271.0) [2024-06-15 19:22:30,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 19:22:33,928][1652475] Updated weights for policy 0, policy_version 597970 (0.0109) [2024-06-15 19:22:35,621][1652475] Updated weights for policy 0, policy_version 598032 (0.0018) [2024-06-15 19:22:35,738][1648984] Fps is (10 sec: 42598.6, 60 sec: 43144.6, 300 sec: 43209.3). Total num frames: 1224769536. Throughput: 0: 10831.7. Samples: 306235904. Policy #0 lag: (min: 15.0, avg: 86.2, max: 271.0) [2024-06-15 19:22:35,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 19:22:36,735][1652475] Updated weights for policy 0, policy_version 598076 (0.0015) [2024-06-15 19:22:40,646][1652475] Updated weights for policy 0, policy_version 598141 (0.0042) [2024-06-15 19:22:40,738][1648984] Fps is (10 sec: 52428.5, 60 sec: 43692.1, 300 sec: 43098.3). Total num frames: 1224998912. Throughput: 0: 10626.8. Samples: 306296832. Policy #0 lag: (min: 32.0, avg: 156.2, max: 288.0) [2024-06-15 19:22:40,738][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 19:22:45,738][1648984] Fps is (10 sec: 32767.9, 60 sec: 43144.5, 300 sec: 42987.2). Total num frames: 1225097216. Throughput: 0: 10649.6. Samples: 306361856. Policy #0 lag: (min: 32.0, avg: 156.2, max: 288.0) [2024-06-15 19:22:45,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 19:22:45,847][1652475] Updated weights for policy 0, policy_version 598201 (0.0031) [2024-06-15 19:22:47,916][1652475] Updated weights for policy 0, policy_version 598256 (0.0014) [2024-06-15 19:22:49,716][1652475] Updated weights for policy 0, policy_version 598332 (0.0017) [2024-06-15 19:22:50,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 41506.1, 300 sec: 43098.2). Total num frames: 1225392128. Throughput: 0: 10513.1. Samples: 306387456. 
Policy #0 lag: (min: 32.0, avg: 156.2, max: 288.0) [2024-06-15 19:22:50,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 19:22:53,186][1652475] Updated weights for policy 0, policy_version 598400 (0.0014) [2024-06-15 19:22:55,738][1648984] Fps is (10 sec: 42597.6, 60 sec: 43690.5, 300 sec: 42653.9). Total num frames: 1225523200. Throughput: 0: 10422.0. Samples: 306450432. Policy #0 lag: (min: 32.0, avg: 156.2, max: 288.0) [2024-06-15 19:22:55,739][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 19:22:55,747][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000598400_1225523200.pth... [2024-06-15 19:22:55,827][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000593392_1215266816.pth [2024-06-15 19:22:58,089][1652475] Updated weights for policy 0, policy_version 598464 (0.0015) [2024-06-15 19:23:00,738][1648984] Fps is (10 sec: 36044.9, 60 sec: 41506.2, 300 sec: 42987.2). Total num frames: 1225752576. Throughput: 0: 10672.3. Samples: 306516992. Policy #0 lag: (min: 32.0, avg: 156.2, max: 288.0) [2024-06-15 19:23:00,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 19:23:01,732][1652475] Updated weights for policy 0, policy_version 598560 (0.0013) [2024-06-15 19:23:02,454][1652475] Updated weights for policy 0, policy_version 598592 (0.0016) [2024-06-15 19:23:03,986][1651340] Signal inference workers to stop experience collection... (30800 times) [2024-06-15 19:23:04,038][1652475] InferenceWorker_p0-w0: stopping experience collection (30800 times) [2024-06-15 19:23:04,187][1651340] Signal inference workers to resume experience collection... (30800 times) [2024-06-15 19:23:04,187][1652475] InferenceWorker_p0-w0: resuming experience collection (30800 times) [2024-06-15 19:23:05,738][1648984] Fps is (10 sec: 52429.9, 60 sec: 43690.7, 300 sec: 42987.2). Total num frames: 1226047488. Throughput: 0: 10524.7. Samples: 306546176. 
Policy #0 lag: (min: 32.0, avg: 156.2, max: 288.0) [2024-06-15 19:23:05,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 19:23:09,642][1652475] Updated weights for policy 0, policy_version 598688 (0.0100) [2024-06-15 19:23:10,738][1648984] Fps is (10 sec: 42598.1, 60 sec: 42052.2, 300 sec: 42876.1). Total num frames: 1226178560. Throughput: 0: 10683.7. Samples: 306618368. Policy #0 lag: (min: 32.0, avg: 156.2, max: 288.0) [2024-06-15 19:23:10,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 19:23:12,517][1652475] Updated weights for policy 0, policy_version 598768 (0.0114) [2024-06-15 19:23:13,845][1652475] Updated weights for policy 0, policy_version 598832 (0.0012) [2024-06-15 19:23:15,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 41506.1, 300 sec: 43098.3). Total num frames: 1226440704. Throughput: 0: 10558.6. Samples: 306678784. Policy #0 lag: (min: 32.0, avg: 156.2, max: 288.0) [2024-06-15 19:23:15,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 19:23:16,686][1652475] Updated weights for policy 0, policy_version 598882 (0.0019) [2024-06-15 19:23:20,747][1648984] Fps is (10 sec: 39285.0, 60 sec: 41500.2, 300 sec: 42652.6). Total num frames: 1226571776. Throughput: 0: 10590.5. Samples: 306712576. Policy #0 lag: (min: 32.0, avg: 156.2, max: 288.0) [2024-06-15 19:23:20,748][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 19:23:21,024][1652475] Updated weights for policy 0, policy_version 598931 (0.0013) [2024-06-15 19:23:23,337][1652475] Updated weights for policy 0, policy_version 598979 (0.0013) [2024-06-15 19:23:24,887][1652475] Updated weights for policy 0, policy_version 599056 (0.0012) [2024-06-15 19:23:25,738][1648984] Fps is (10 sec: 49150.9, 60 sec: 43144.4, 300 sec: 43431.5). Total num frames: 1226932224. Throughput: 0: 10763.3. Samples: 306781184. 
Policy #0 lag: (min: 32.0, avg: 156.2, max: 288.0) [2024-06-15 19:23:25,739][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 19:23:28,453][1652475] Updated weights for policy 0, policy_version 599137 (0.0012) [2024-06-15 19:23:30,738][1648984] Fps is (10 sec: 52478.1, 60 sec: 43690.7, 300 sec: 42653.9). Total num frames: 1227096064. Throughput: 0: 10763.4. Samples: 306846208. Policy #0 lag: (min: 32.0, avg: 156.2, max: 288.0) [2024-06-15 19:23:30,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 19:23:33,306][1652475] Updated weights for policy 0, policy_version 599200 (0.0107) [2024-06-15 19:23:35,090][1652475] Updated weights for policy 0, policy_version 599250 (0.0014) [2024-06-15 19:23:35,738][1648984] Fps is (10 sec: 39322.1, 60 sec: 42598.3, 300 sec: 42987.2). Total num frames: 1227325440. Throughput: 0: 11013.7. Samples: 306883072. Policy #0 lag: (min: 32.0, avg: 156.2, max: 288.0) [2024-06-15 19:23:35,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 19:23:37,149][1652475] Updated weights for policy 0, policy_version 599344 (0.0012) [2024-06-15 19:23:40,680][1652475] Updated weights for policy 0, policy_version 599408 (0.0014) [2024-06-15 19:23:40,738][1648984] Fps is (10 sec: 49151.6, 60 sec: 43144.5, 300 sec: 43209.3). Total num frames: 1227587584. Throughput: 0: 10979.6. Samples: 306944512. Policy #0 lag: (min: 32.0, avg: 156.2, max: 288.0) [2024-06-15 19:23:40,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 19:23:45,738][1648984] Fps is (10 sec: 29491.4, 60 sec: 42052.3, 300 sec: 42653.9). Total num frames: 1227620352. Throughput: 0: 11036.4. Samples: 307013632. 
Policy #0 lag: (min: 32.0, avg: 156.2, max: 288.0) [2024-06-15 19:23:45,741][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 19:23:46,558][1652475] Updated weights for policy 0, policy_version 599458 (0.0014) [2024-06-15 19:23:48,021][1652475] Updated weights for policy 0, policy_version 599520 (0.0035) [2024-06-15 19:23:49,263][1651340] Signal inference workers to stop experience collection... (30850 times) [2024-06-15 19:23:49,333][1652475] InferenceWorker_p0-w0: stopping experience collection (30850 times) [2024-06-15 19:23:49,548][1651340] Signal inference workers to resume experience collection... (30850 times) [2024-06-15 19:23:49,549][1652475] InferenceWorker_p0-w0: resuming experience collection (30850 times) [2024-06-15 19:23:49,551][1652475] Updated weights for policy 0, policy_version 599584 (0.0013) [2024-06-15 19:23:50,738][1648984] Fps is (10 sec: 42598.6, 60 sec: 43690.7, 300 sec: 43431.5). Total num frames: 1228013568. Throughput: 0: 11013.7. Samples: 307041792. Policy #0 lag: (min: 32.0, avg: 156.2, max: 288.0) [2024-06-15 19:23:50,762][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 19:23:51,541][1652475] Updated weights for policy 0, policy_version 599623 (0.0025) [2024-06-15 19:23:55,738][1648984] Fps is (10 sec: 52428.4, 60 sec: 43690.7, 300 sec: 42653.9). Total num frames: 1228144640. Throughput: 0: 10706.5. Samples: 307100160. Policy #0 lag: (min: 32.0, avg: 156.2, max: 288.0) [2024-06-15 19:23:55,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 19:23:57,837][1652475] Updated weights for policy 0, policy_version 599698 (0.0015) [2024-06-15 19:23:59,696][1652475] Updated weights for policy 0, policy_version 599776 (0.0015) [2024-06-15 19:24:00,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 44236.8, 300 sec: 43431.5). Total num frames: 1228406784. Throughput: 0: 10763.4. Samples: 307163136. 
Policy #0 lag: (min: 32.0, avg: 156.2, max: 288.0) [2024-06-15 19:24:00,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 19:24:02,211][1652475] Updated weights for policy 0, policy_version 599815 (0.0011) [2024-06-15 19:24:04,582][1652475] Updated weights for policy 0, policy_version 599904 (0.0014) [2024-06-15 19:24:05,738][1648984] Fps is (10 sec: 52428.1, 60 sec: 43690.5, 300 sec: 42876.1). Total num frames: 1228668928. Throughput: 0: 10799.7. Samples: 307198464. Policy #0 lag: (min: 32.0, avg: 156.2, max: 288.0) [2024-06-15 19:24:05,739][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 19:24:09,459][1652475] Updated weights for policy 0, policy_version 599938 (0.0055) [2024-06-15 19:24:10,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 43690.7, 300 sec: 43098.2). Total num frames: 1228800000. Throughput: 0: 10899.9. Samples: 307271680. Policy #0 lag: (min: 32.0, avg: 156.2, max: 288.0) [2024-06-15 19:24:10,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 19:24:11,411][1652475] Updated weights for policy 0, policy_version 600017 (0.0013) [2024-06-15 19:24:12,350][1652475] Updated weights for policy 0, policy_version 600064 (0.0013) [2024-06-15 19:24:15,738][1648984] Fps is (10 sec: 42599.8, 60 sec: 44236.9, 300 sec: 43209.3). Total num frames: 1229094912. Throughput: 0: 10774.8. Samples: 307331072. Policy #0 lag: (min: 4.0, avg: 117.7, max: 260.0) [2024-06-15 19:24:15,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 19:24:15,739][1652475] Updated weights for policy 0, policy_version 600144 (0.0119) [2024-06-15 19:24:16,712][1652475] Updated weights for policy 0, policy_version 600191 (0.0014) [2024-06-15 19:24:20,740][1648984] Fps is (10 sec: 39321.6, 60 sec: 43697.5, 300 sec: 42653.9). Total num frames: 1229193216. Throughput: 0: 10729.3. Samples: 307365888. 
Policy #0 lag: (min: 4.0, avg: 117.7, max: 260.0) [2024-06-15 19:24:20,741][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 19:24:22,687][1652475] Updated weights for policy 0, policy_version 600248 (0.0015) [2024-06-15 19:24:24,291][1652475] Updated weights for policy 0, policy_version 600304 (0.0013) [2024-06-15 19:24:25,738][1648984] Fps is (10 sec: 39321.0, 60 sec: 42598.5, 300 sec: 43320.4). Total num frames: 1229488128. Throughput: 0: 10763.4. Samples: 307428864. Policy #0 lag: (min: 4.0, avg: 117.7, max: 260.0) [2024-06-15 19:24:25,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 19:24:26,251][1652475] Updated weights for policy 0, policy_version 600357 (0.0014) [2024-06-15 19:24:27,574][1652475] Updated weights for policy 0, policy_version 600390 (0.0012) [2024-06-15 19:24:28,986][1652475] Updated weights for policy 0, policy_version 600444 (0.0013) [2024-06-15 19:24:30,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 43690.6, 300 sec: 42653.9). Total num frames: 1229717504. Throughput: 0: 10592.7. Samples: 307490304. Policy #0 lag: (min: 4.0, avg: 117.7, max: 260.0) [2024-06-15 19:24:30,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 19:24:34,212][1652475] Updated weights for policy 0, policy_version 600510 (0.0013) [2024-06-15 19:24:35,738][1648984] Fps is (10 sec: 36044.7, 60 sec: 42052.3, 300 sec: 43098.2). Total num frames: 1229848576. Throughput: 0: 10786.1. Samples: 307527168. Policy #0 lag: (min: 4.0, avg: 117.7, max: 260.0) [2024-06-15 19:24:35,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 19:24:37,496][1651340] Signal inference workers to stop experience collection... (30900 times) [2024-06-15 19:24:37,555][1652475] InferenceWorker_p0-w0: stopping experience collection (30900 times) [2024-06-15 19:24:37,752][1651340] Signal inference workers to resume experience collection... 
(30900 times) [2024-06-15 19:24:37,753][1652475] InferenceWorker_p0-w0: resuming experience collection (30900 times) [2024-06-15 19:24:38,326][1652475] Updated weights for policy 0, policy_version 600593 (0.0012) [2024-06-15 19:24:40,738][1648984] Fps is (10 sec: 39320.1, 60 sec: 42052.0, 300 sec: 42654.0). Total num frames: 1230110720. Throughput: 0: 10660.9. Samples: 307579904. Policy #0 lag: (min: 4.0, avg: 117.7, max: 260.0) [2024-06-15 19:24:40,739][1648984] Avg episode reward: [(0, '-0.180')] [2024-06-15 19:24:41,583][1652475] Updated weights for policy 0, policy_version 600643 (0.0041) [2024-06-15 19:24:42,599][1652475] Updated weights for policy 0, policy_version 600697 (0.0180) [2024-06-15 19:24:45,738][1648984] Fps is (10 sec: 42598.9, 60 sec: 44236.8, 300 sec: 42765.0). Total num frames: 1230274560. Throughput: 0: 10899.9. Samples: 307653632. Policy #0 lag: (min: 4.0, avg: 117.7, max: 260.0) [2024-06-15 19:24:45,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 19:24:46,038][1652475] Updated weights for policy 0, policy_version 600737 (0.0099) [2024-06-15 19:24:48,611][1652475] Updated weights for policy 0, policy_version 600770 (0.0025) [2024-06-15 19:24:50,595][1652475] Updated weights for policy 0, policy_version 600864 (0.0012) [2024-06-15 19:24:50,738][1648984] Fps is (10 sec: 45877.4, 60 sec: 42598.5, 300 sec: 42876.1). Total num frames: 1230569472. Throughput: 0: 10786.2. Samples: 307683840. Policy #0 lag: (min: 4.0, avg: 117.7, max: 260.0) [2024-06-15 19:24:50,738][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 19:24:54,236][1652475] Updated weights for policy 0, policy_version 600928 (0.0012) [2024-06-15 19:24:55,739][1648984] Fps is (10 sec: 49147.6, 60 sec: 43690.1, 300 sec: 42653.8). Total num frames: 1230766080. Throughput: 0: 10547.0. Samples: 307746304. 
Policy #0 lag: (min: 4.0, avg: 117.7, max: 260.0) [2024-06-15 19:24:55,739][1648984] Avg episode reward: [(0, '-0.530')] [2024-06-15 19:24:55,743][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000600960_1230766080.pth... [2024-06-15 19:24:55,790][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000595936_1220476928.pth [2024-06-15 19:24:59,092][1652475] Updated weights for policy 0, policy_version 600978 (0.0013) [2024-06-15 19:25:00,738][1648984] Fps is (10 sec: 36044.3, 60 sec: 42052.2, 300 sec: 42542.9). Total num frames: 1230929920. Throughput: 0: 10638.2. Samples: 307809792. Policy #0 lag: (min: 4.0, avg: 117.7, max: 260.0) [2024-06-15 19:25:00,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 19:25:01,746][1652475] Updated weights for policy 0, policy_version 601090 (0.0219) [2024-06-15 19:25:03,050][1652475] Updated weights for policy 0, policy_version 601147 (0.0011) [2024-06-15 19:25:05,738][1648984] Fps is (10 sec: 39325.2, 60 sec: 41506.3, 300 sec: 42765.0). Total num frames: 1231159296. Throughput: 0: 10467.6. Samples: 307836928. Policy #0 lag: (min: 4.0, avg: 117.7, max: 260.0) [2024-06-15 19:25:05,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 19:25:07,360][1652475] Updated weights for policy 0, policy_version 601210 (0.0017) [2024-06-15 19:25:10,738][1648984] Fps is (10 sec: 36045.2, 60 sec: 41506.2, 300 sec: 42654.0). Total num frames: 1231290368. Throughput: 0: 10626.9. Samples: 307907072. Policy #0 lag: (min: 4.0, avg: 117.7, max: 260.0) [2024-06-15 19:25:10,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 19:25:12,395][1652475] Updated weights for policy 0, policy_version 601278 (0.0114) [2024-06-15 19:25:14,357][1652475] Updated weights for policy 0, policy_version 601348 (0.0012) [2024-06-15 19:25:15,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 43144.5, 300 sec: 43098.3). Total num frames: 1231683584. Throughput: 0: 10490.3. 
Samples: 307962368. Policy #0 lag: (min: 4.0, avg: 117.7, max: 260.0) [2024-06-15 19:25:15,740][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 19:25:18,478][1652475] Updated weights for policy 0, policy_version 601424 (0.0025) [2024-06-15 19:25:20,738][1648984] Fps is (10 sec: 52429.1, 60 sec: 43690.8, 300 sec: 42654.0). Total num frames: 1231814656. Throughput: 0: 10547.2. Samples: 308001792. Policy #0 lag: (min: 4.0, avg: 117.7, max: 260.0) [2024-06-15 19:25:20,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 19:25:24,180][1652475] Updated weights for policy 0, policy_version 601520 (0.0014) [2024-06-15 19:25:24,827][1651340] Signal inference workers to stop experience collection... (30950 times) [2024-06-15 19:25:24,851][1652475] InferenceWorker_p0-w0: stopping experience collection (30950 times) [2024-06-15 19:25:25,050][1651340] Signal inference workers to resume experience collection... (30950 times) [2024-06-15 19:25:25,051][1652475] InferenceWorker_p0-w0: resuming experience collection (30950 times) [2024-06-15 19:25:25,738][1648984] Fps is (10 sec: 32767.6, 60 sec: 42052.2, 300 sec: 42876.2). Total num frames: 1232011264. Throughput: 0: 10854.5. Samples: 308068352. Policy #0 lag: (min: 4.0, avg: 117.7, max: 260.0) [2024-06-15 19:25:25,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 19:25:25,986][1652475] Updated weights for policy 0, policy_version 601586 (0.0012) [2024-06-15 19:25:28,028][1652475] Updated weights for policy 0, policy_version 601664 (0.0022) [2024-06-15 19:25:30,738][1648984] Fps is (10 sec: 45874.7, 60 sec: 42598.4, 300 sec: 42653.9). Total num frames: 1232273408. Throughput: 0: 10626.8. Samples: 308131840. 
Policy #0 lag: (min: 4.0, avg: 117.7, max: 260.0) [2024-06-15 19:25:30,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 19:25:35,168][1652475] Updated weights for policy 0, policy_version 601735 (0.0015) [2024-06-15 19:25:35,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 42598.4, 300 sec: 42876.1). Total num frames: 1232404480. Throughput: 0: 10763.4. Samples: 308168192. Policy #0 lag: (min: 4.0, avg: 117.7, max: 260.0) [2024-06-15 19:25:35,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 19:25:36,147][1652475] Updated weights for policy 0, policy_version 601790 (0.0040) [2024-06-15 19:25:37,677][1652475] Updated weights for policy 0, policy_version 601840 (0.0013) [2024-06-15 19:25:39,735][1652475] Updated weights for policy 0, policy_version 601913 (0.0033) [2024-06-15 19:25:40,738][1648984] Fps is (10 sec: 45875.3, 60 sec: 43691.0, 300 sec: 43098.3). Total num frames: 1232732160. Throughput: 0: 10683.9. Samples: 308227072. Policy #0 lag: (min: 4.0, avg: 117.7, max: 260.0) [2024-06-15 19:25:40,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 19:25:43,203][1652475] Updated weights for policy 0, policy_version 601968 (0.0011) [2024-06-15 19:25:45,738][1648984] Fps is (10 sec: 45875.0, 60 sec: 43144.5, 300 sec: 42653.9). Total num frames: 1232863232. Throughput: 0: 10888.5. Samples: 308299776. Policy #0 lag: (min: 4.0, avg: 117.7, max: 260.0) [2024-06-15 19:25:45,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 19:25:48,135][1652475] Updated weights for policy 0, policy_version 602016 (0.0013) [2024-06-15 19:25:50,272][1652475] Updated weights for policy 0, policy_version 602085 (0.0013) [2024-06-15 19:25:50,738][1648984] Fps is (10 sec: 36045.0, 60 sec: 42052.3, 300 sec: 42987.2). Total num frames: 1233092608. Throughput: 0: 10979.6. Samples: 308331008. 
Policy #0 lag: (min: 31.0, avg: 101.5, max: 287.0) [2024-06-15 19:25:50,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 19:25:52,172][1652475] Updated weights for policy 0, policy_version 602160 (0.0122) [2024-06-15 19:25:55,738][1648984] Fps is (10 sec: 45875.3, 60 sec: 42599.0, 300 sec: 42542.8). Total num frames: 1233321984. Throughput: 0: 10672.3. Samples: 308387328. Policy #0 lag: (min: 31.0, avg: 101.5, max: 287.0) [2024-06-15 19:25:55,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 19:25:55,954][1652475] Updated weights for policy 0, policy_version 602234 (0.0027) [2024-06-15 19:26:00,738][1648984] Fps is (10 sec: 29491.0, 60 sec: 40960.0, 300 sec: 42653.9). Total num frames: 1233387520. Throughput: 0: 11002.3. Samples: 308457472. Policy #0 lag: (min: 31.0, avg: 101.5, max: 287.0) [2024-06-15 19:26:00,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 19:26:01,822][1652475] Updated weights for policy 0, policy_version 602291 (0.0017) [2024-06-15 19:26:04,034][1652475] Updated weights for policy 0, policy_version 602375 (0.0012) [2024-06-15 19:26:05,740][1648984] Fps is (10 sec: 45875.0, 60 sec: 43690.6, 300 sec: 42987.2). Total num frames: 1233780736. Throughput: 0: 10615.4. Samples: 308479488. Policy #0 lag: (min: 31.0, avg: 101.5, max: 287.0) [2024-06-15 19:26:05,742][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 19:26:06,722][1652475] Updated weights for policy 0, policy_version 602436 (0.0014) [2024-06-15 19:26:07,544][1651340] Signal inference workers to stop experience collection... (31000 times) [2024-06-15 19:26:07,657][1652475] InferenceWorker_p0-w0: stopping experience collection (31000 times) [2024-06-15 19:26:07,924][1651340] Signal inference workers to resume experience collection... 
(31000 times) [2024-06-15 19:26:07,925][1652475] InferenceWorker_p0-w0: resuming experience collection (31000 times) [2024-06-15 19:26:08,168][1652475] Updated weights for policy 0, policy_version 602487 (0.0032) [2024-06-15 19:26:10,738][1648984] Fps is (10 sec: 52427.7, 60 sec: 43690.5, 300 sec: 42653.9). Total num frames: 1233911808. Throughput: 0: 10547.2. Samples: 308542976. Policy #0 lag: (min: 31.0, avg: 101.5, max: 287.0) [2024-06-15 19:26:10,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 19:26:13,655][1652475] Updated weights for policy 0, policy_version 602531 (0.0012) [2024-06-15 19:26:15,248][1652475] Updated weights for policy 0, policy_version 602596 (0.0022) [2024-06-15 19:26:15,738][1648984] Fps is (10 sec: 39321.9, 60 sec: 41506.1, 300 sec: 42987.2). Total num frames: 1234173952. Throughput: 0: 10604.1. Samples: 308609024. Policy #0 lag: (min: 31.0, avg: 101.5, max: 287.0) [2024-06-15 19:26:15,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 19:26:17,645][1652475] Updated weights for policy 0, policy_version 602640 (0.0015) [2024-06-15 19:26:19,886][1652475] Updated weights for policy 0, policy_version 602720 (0.0017) [2024-06-15 19:26:20,738][1648984] Fps is (10 sec: 52429.9, 60 sec: 43690.6, 300 sec: 42653.9). Total num frames: 1234436096. Throughput: 0: 10592.7. Samples: 308644864. Policy #0 lag: (min: 31.0, avg: 101.5, max: 287.0) [2024-06-15 19:26:20,738][1648984] Avg episode reward: [(0, '-0.550')] [2024-06-15 19:26:25,761][1648984] Fps is (10 sec: 35974.7, 60 sec: 42038.7, 300 sec: 42984.3). Total num frames: 1234534400. Throughput: 0: 10633.6. Samples: 308705792. 
Policy #0 lag: (min: 31.0, avg: 101.5, max: 287.0) [2024-06-15 19:26:25,767][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 19:26:26,043][1652475] Updated weights for policy 0, policy_version 602816 (0.0016) [2024-06-15 19:26:27,229][1652475] Updated weights for policy 0, policy_version 602875 (0.0014) [2024-06-15 19:26:30,738][1648984] Fps is (10 sec: 32768.1, 60 sec: 41506.2, 300 sec: 42653.9). Total num frames: 1234763776. Throughput: 0: 10399.3. Samples: 308767744. Policy #0 lag: (min: 31.0, avg: 101.5, max: 287.0) [2024-06-15 19:26:30,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 19:26:31,955][1652475] Updated weights for policy 0, policy_version 602960 (0.0014) [2024-06-15 19:26:35,738][1648984] Fps is (10 sec: 42680.6, 60 sec: 42598.2, 300 sec: 42654.2). Total num frames: 1234960384. Throughput: 0: 10262.7. Samples: 308792832. Policy #0 lag: (min: 31.0, avg: 101.5, max: 287.0) [2024-06-15 19:26:35,739][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 19:26:36,660][1652475] Updated weights for policy 0, policy_version 603009 (0.0015) [2024-06-15 19:26:38,650][1652475] Updated weights for policy 0, policy_version 603073 (0.0014) [2024-06-15 19:26:39,807][1652475] Updated weights for policy 0, policy_version 603127 (0.0014) [2024-06-15 19:26:40,738][1648984] Fps is (10 sec: 45875.0, 60 sec: 41506.1, 300 sec: 43098.2). Total num frames: 1235222528. Throughput: 0: 10570.0. Samples: 308862976. Policy #0 lag: (min: 31.0, avg: 101.5, max: 287.0) [2024-06-15 19:26:40,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 19:26:43,193][1652475] Updated weights for policy 0, policy_version 603184 (0.0015) [2024-06-15 19:26:45,060][1652475] Updated weights for policy 0, policy_version 603257 (0.0104) [2024-06-15 19:26:45,738][1648984] Fps is (10 sec: 52430.2, 60 sec: 43690.7, 300 sec: 42653.9). Total num frames: 1235484672. Throughput: 0: 10274.1. Samples: 308919808. 
Policy #0 lag: (min: 31.0, avg: 101.5, max: 287.0) [2024-06-15 19:26:45,738][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 19:26:49,193][1652475] Updated weights for policy 0, policy_version 603312 (0.0014) [2024-06-15 19:26:50,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 42052.2, 300 sec: 43098.2). Total num frames: 1235615744. Throughput: 0: 10661.0. Samples: 308959232. Policy #0 lag: (min: 31.0, avg: 101.5, max: 287.0) [2024-06-15 19:26:50,738][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 19:26:53,133][1652475] Updated weights for policy 0, policy_version 603381 (0.0015) [2024-06-15 19:26:55,186][1652475] Updated weights for policy 0, policy_version 603440 (0.0013) [2024-06-15 19:26:55,738][1648984] Fps is (10 sec: 39321.2, 60 sec: 42598.4, 300 sec: 42765.0). Total num frames: 1235877888. Throughput: 0: 10649.6. Samples: 309022208. Policy #0 lag: (min: 31.0, avg: 101.5, max: 287.0) [2024-06-15 19:26:55,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 19:26:55,743][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000603456_1235877888.pth... [2024-06-15 19:26:55,809][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000598400_1225523200.pth [2024-06-15 19:26:56,537][1651340] Signal inference workers to stop experience collection... (31050 times) [2024-06-15 19:26:56,607][1652475] InferenceWorker_p0-w0: stopping experience collection (31050 times) [2024-06-15 19:26:56,728][1651340] Signal inference workers to resume experience collection... (31050 times) [2024-06-15 19:26:56,729][1652475] InferenceWorker_p0-w0: resuming experience collection (31050 times) [2024-06-15 19:26:57,104][1652475] Updated weights for policy 0, policy_version 603488 (0.0014) [2024-06-15 19:27:00,738][1648984] Fps is (10 sec: 45875.6, 60 sec: 44783.0, 300 sec: 42876.1). Total num frames: 1236074496. Throughput: 0: 10706.5. Samples: 309090816. 
Policy #0 lag: (min: 31.0, avg: 101.5, max: 287.0) [2024-06-15 19:27:00,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 19:27:00,888][1652475] Updated weights for policy 0, policy_version 603568 (0.0021) [2024-06-15 19:27:04,912][1652475] Updated weights for policy 0, policy_version 603642 (0.0014) [2024-06-15 19:27:05,738][1648984] Fps is (10 sec: 39322.0, 60 sec: 41506.2, 300 sec: 42765.0). Total num frames: 1236271104. Throughput: 0: 10626.9. Samples: 309123072. Policy #0 lag: (min: 31.0, avg: 101.5, max: 287.0) [2024-06-15 19:27:05,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 19:27:09,197][1652475] Updated weights for policy 0, policy_version 603715 (0.0014) [2024-06-15 19:27:10,290][1652475] Updated weights for policy 0, policy_version 603774 (0.0028) [2024-06-15 19:27:10,738][1648984] Fps is (10 sec: 45875.0, 60 sec: 43690.8, 300 sec: 42653.9). Total num frames: 1236533248. Throughput: 0: 10654.2. Samples: 309185024. Policy #0 lag: (min: 31.0, avg: 101.5, max: 287.0) [2024-06-15 19:27:10,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 19:27:13,692][1652475] Updated weights for policy 0, policy_version 603833 (0.0022) [2024-06-15 19:27:15,569][1652475] Updated weights for policy 0, policy_version 603904 (0.0018) [2024-06-15 19:27:15,738][1648984] Fps is (10 sec: 52427.0, 60 sec: 43690.4, 300 sec: 43098.3). Total num frames: 1236795392. Throughput: 0: 10763.3. Samples: 309252096. Policy #0 lag: (min: 31.0, avg: 101.5, max: 287.0) [2024-06-15 19:27:15,739][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 19:27:18,154][1652475] Updated weights for policy 0, policy_version 603966 (0.0021) [2024-06-15 19:27:20,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 42052.3, 300 sec: 42765.0). Total num frames: 1236959232. Throughput: 0: 10922.7. Samples: 309284352. 
Policy #0 lag: (min: 31.0, avg: 101.5, max: 287.0) [2024-06-15 19:27:20,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 19:27:21,475][1652475] Updated weights for policy 0, policy_version 604022 (0.0013) [2024-06-15 19:27:25,675][1652475] Updated weights for policy 0, policy_version 604068 (0.0014) [2024-06-15 19:27:25,738][1648984] Fps is (10 sec: 32768.8, 60 sec: 43158.5, 300 sec: 42876.1). Total num frames: 1237123072. Throughput: 0: 11116.1. Samples: 309363200. Policy #0 lag: (min: 15.0, avg: 75.6, max: 207.0) [2024-06-15 19:27:25,739][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 19:27:27,520][1652475] Updated weights for policy 0, policy_version 604144 (0.0095) [2024-06-15 19:27:29,430][1652475] Updated weights for policy 0, policy_version 604194 (0.0013) [2024-06-15 19:27:30,738][1648984] Fps is (10 sec: 49151.8, 60 sec: 44782.9, 300 sec: 42987.2). Total num frames: 1237450752. Throughput: 0: 11025.0. Samples: 309415936. Policy #0 lag: (min: 15.0, avg: 75.6, max: 207.0) [2024-06-15 19:27:30,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 19:27:32,529][1652475] Updated weights for policy 0, policy_version 604272 (0.0012) [2024-06-15 19:27:35,738][1648984] Fps is (10 sec: 45875.7, 60 sec: 43690.8, 300 sec: 42653.9). Total num frames: 1237581824. Throughput: 0: 10854.4. Samples: 309447680. Policy #0 lag: (min: 15.0, avg: 75.6, max: 207.0) [2024-06-15 19:27:35,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 19:27:38,031][1652475] Updated weights for policy 0, policy_version 604336 (0.0015) [2024-06-15 19:27:39,503][1652475] Updated weights for policy 0, policy_version 604384 (0.0054) [2024-06-15 19:27:40,738][1648984] Fps is (10 sec: 39321.0, 60 sec: 43690.5, 300 sec: 43209.3). Total num frames: 1237843968. Throughput: 0: 11127.4. Samples: 309522944. 
Policy #0 lag: (min: 15.0, avg: 75.6, max: 207.0) [2024-06-15 19:27:40,739][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 19:27:41,610][1652475] Updated weights for policy 0, policy_version 604464 (0.0013) [2024-06-15 19:27:43,257][1651340] Signal inference workers to stop experience collection... (31100 times) [2024-06-15 19:27:43,354][1652475] InferenceWorker_p0-w0: stopping experience collection (31100 times) [2024-06-15 19:27:43,442][1651340] Signal inference workers to resume experience collection... (31100 times) [2024-06-15 19:27:43,444][1652475] InferenceWorker_p0-w0: resuming experience collection (31100 times) [2024-06-15 19:27:43,446][1652475] Updated weights for policy 0, policy_version 604512 (0.0097) [2024-06-15 19:27:45,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 43690.7, 300 sec: 43098.3). Total num frames: 1238106112. Throughput: 0: 11070.6. Samples: 309588992. Policy #0 lag: (min: 15.0, avg: 75.6, max: 207.0) [2024-06-15 19:27:45,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 19:27:48,731][1652475] Updated weights for policy 0, policy_version 604564 (0.0017) [2024-06-15 19:27:50,476][1652475] Updated weights for policy 0, policy_version 604640 (0.0015) [2024-06-15 19:27:50,738][1648984] Fps is (10 sec: 45876.4, 60 sec: 44783.0, 300 sec: 43320.4). Total num frames: 1238302720. Throughput: 0: 11332.3. Samples: 309633024. Policy #0 lag: (min: 15.0, avg: 75.6, max: 207.0) [2024-06-15 19:27:50,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 19:27:51,706][1652475] Updated weights for policy 0, policy_version 604675 (0.0012) [2024-06-15 19:27:53,805][1652475] Updated weights for policy 0, policy_version 604737 (0.0037) [2024-06-15 19:27:55,738][1648984] Fps is (10 sec: 52428.5, 60 sec: 45875.2, 300 sec: 43653.6). Total num frames: 1238630400. Throughput: 0: 11275.4. Samples: 309692416. 
Policy #0 lag: (min: 15.0, avg: 75.6, max: 207.0) [2024-06-15 19:27:55,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 19:27:59,942][1652475] Updated weights for policy 0, policy_version 604806 (0.0132) [2024-06-15 19:28:00,738][1648984] Fps is (10 sec: 39321.3, 60 sec: 43690.6, 300 sec: 42876.1). Total num frames: 1238695936. Throughput: 0: 11537.1. Samples: 309771264. Policy #0 lag: (min: 15.0, avg: 75.6, max: 207.0) [2024-06-15 19:28:00,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 19:28:02,679][1652475] Updated weights for policy 0, policy_version 604912 (0.0013) [2024-06-15 19:28:04,138][1652475] Updated weights for policy 0, policy_version 604962 (0.0011) [2024-06-15 19:28:05,738][1648984] Fps is (10 sec: 42598.2, 60 sec: 46421.3, 300 sec: 43653.6). Total num frames: 1239056384. Throughput: 0: 11355.0. Samples: 309795328. Policy #0 lag: (min: 15.0, avg: 75.6, max: 207.0) [2024-06-15 19:28:05,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 19:28:06,036][1652475] Updated weights for policy 0, policy_version 605025 (0.0103) [2024-06-15 19:28:10,738][1648984] Fps is (10 sec: 45874.5, 60 sec: 43690.5, 300 sec: 43098.2). Total num frames: 1239154688. Throughput: 0: 11241.2. Samples: 309869056. Policy #0 lag: (min: 15.0, avg: 75.6, max: 207.0) [2024-06-15 19:28:10,739][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 19:28:11,502][1652475] Updated weights for policy 0, policy_version 605059 (0.0011) [2024-06-15 19:28:12,628][1652475] Updated weights for policy 0, policy_version 605115 (0.0030) [2024-06-15 19:28:14,799][1652475] Updated weights for policy 0, policy_version 605202 (0.0013) [2024-06-15 19:28:15,739][1648984] Fps is (10 sec: 49145.8, 60 sec: 45874.4, 300 sec: 43988.1). Total num frames: 1239547904. Throughput: 0: 11536.7. Samples: 309935104. 
Policy #0 lag: (min: 15.0, avg: 75.6, max: 207.0) [2024-06-15 19:28:15,739][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 19:28:16,672][1652475] Updated weights for policy 0, policy_version 605264 (0.0014) [2024-06-15 19:28:17,554][1652475] Updated weights for policy 0, policy_version 605305 (0.0014) [2024-06-15 19:28:20,738][1648984] Fps is (10 sec: 52429.7, 60 sec: 45329.1, 300 sec: 43209.4). Total num frames: 1239678976. Throughput: 0: 11548.4. Samples: 309967360. Policy #0 lag: (min: 15.0, avg: 75.6, max: 207.0) [2024-06-15 19:28:20,738][1648984] Avg episode reward: [(0, '-0.540')] [2024-06-15 19:28:23,673][1652475] Updated weights for policy 0, policy_version 605344 (0.0014) [2024-06-15 19:28:25,127][1652475] Updated weights for policy 0, policy_version 605408 (0.0096) [2024-06-15 19:28:25,738][1648984] Fps is (10 sec: 36049.8, 60 sec: 46421.4, 300 sec: 43431.5). Total num frames: 1239908352. Throughput: 0: 11525.8. Samples: 310041600. Policy #0 lag: (min: 15.0, avg: 75.6, max: 207.0) [2024-06-15 19:28:25,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 19:28:27,860][1652475] Updated weights for policy 0, policy_version 605475 (0.0013) [2024-06-15 19:28:28,642][1651340] Signal inference workers to stop experience collection... (31150 times) [2024-06-15 19:28:28,679][1652475] InferenceWorker_p0-w0: stopping experience collection (31150 times) [2024-06-15 19:28:28,901][1651340] Signal inference workers to resume experience collection... (31150 times) [2024-06-15 19:28:28,902][1652475] InferenceWorker_p0-w0: resuming experience collection (31150 times) [2024-06-15 19:28:29,435][1652475] Updated weights for policy 0, policy_version 605540 (0.0012) [2024-06-15 19:28:30,035][1652475] Updated weights for policy 0, policy_version 605565 (0.0013) [2024-06-15 19:28:30,738][1648984] Fps is (10 sec: 52429.3, 60 sec: 45875.3, 300 sec: 43653.7). Total num frames: 1240203264. Throughput: 0: 11286.8. Samples: 310096896. 
Policy #0 lag: (min: 15.0, avg: 75.6, max: 207.0) [2024-06-15 19:28:30,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 19:28:35,590][1652475] Updated weights for policy 0, policy_version 605623 (0.0015) [2024-06-15 19:28:35,738][1648984] Fps is (10 sec: 42598.2, 60 sec: 45875.2, 300 sec: 43209.3). Total num frames: 1240334336. Throughput: 0: 11241.2. Samples: 310138880. Policy #0 lag: (min: 15.0, avg: 75.6, max: 207.0) [2024-06-15 19:28:35,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 19:28:36,952][1652475] Updated weights for policy 0, policy_version 605690 (0.0014) [2024-06-15 19:28:39,427][1652475] Updated weights for policy 0, policy_version 605752 (0.0012) [2024-06-15 19:28:40,738][1648984] Fps is (10 sec: 39321.2, 60 sec: 45875.3, 300 sec: 43986.9). Total num frames: 1240596480. Throughput: 0: 11298.1. Samples: 310200832. Policy #0 lag: (min: 15.0, avg: 75.6, max: 207.0) [2024-06-15 19:28:40,738][1648984] Avg episode reward: [(0, '-0.550')] [2024-06-15 19:28:42,715][1652475] Updated weights for policy 0, policy_version 605816 (0.0122) [2024-06-15 19:28:45,738][1648984] Fps is (10 sec: 39320.8, 60 sec: 43690.5, 300 sec: 43098.2). Total num frames: 1240727552. Throughput: 0: 11116.1. Samples: 310271488. Policy #0 lag: (min: 15.0, avg: 75.6, max: 207.0) [2024-06-15 19:28:45,739][1648984] Avg episode reward: [(0, '-0.520')] [2024-06-15 19:28:47,412][1652475] Updated weights for policy 0, policy_version 605872 (0.0013) [2024-06-15 19:28:48,821][1652475] Updated weights for policy 0, policy_version 605936 (0.0011) [2024-06-15 19:28:50,115][1652475] Updated weights for policy 0, policy_version 605984 (0.0014) [2024-06-15 19:28:50,738][1648984] Fps is (10 sec: 49151.7, 60 sec: 46421.2, 300 sec: 43875.8). Total num frames: 1241088000. Throughput: 0: 11298.1. Samples: 310303744. 
Policy #0 lag: (min: 15.0, avg: 75.6, max: 207.0) [2024-06-15 19:28:50,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 19:28:50,866][1652475] Updated weights for policy 0, policy_version 606016 (0.0014) [2024-06-15 19:28:54,433][1652475] Updated weights for policy 0, policy_version 606070 (0.0015) [2024-06-15 19:28:55,738][1648984] Fps is (10 sec: 52429.9, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 1241251840. Throughput: 0: 11150.3. Samples: 310370816. Policy #0 lag: (min: 15.0, avg: 75.6, max: 207.0) [2024-06-15 19:28:55,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 19:28:55,744][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000606080_1241251840.pth... [2024-06-15 19:28:55,805][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000600960_1230766080.pth [2024-06-15 19:28:58,330][1652475] Updated weights for policy 0, policy_version 606138 (0.0096) [2024-06-15 19:29:00,451][1652475] Updated weights for policy 0, policy_version 606180 (0.0091) [2024-06-15 19:29:00,738][1648984] Fps is (10 sec: 39318.6, 60 sec: 46420.7, 300 sec: 43431.4). Total num frames: 1241481216. Throughput: 0: 11207.2. Samples: 310439424. Policy #0 lag: (min: 13.0, avg: 132.5, max: 333.0) [2024-06-15 19:29:00,739][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 19:29:02,843][1652475] Updated weights for policy 0, policy_version 606240 (0.0012) [2024-06-15 19:29:04,220][1652475] Updated weights for policy 0, policy_version 606274 (0.0011) [2024-06-15 19:29:05,738][1648984] Fps is (10 sec: 52429.3, 60 sec: 45329.2, 300 sec: 43986.9). Total num frames: 1241776128. Throughput: 0: 11184.4. Samples: 310470656. 
Policy #0 lag: (min: 13.0, avg: 132.5, max: 333.0) [2024-06-15 19:29:05,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 19:29:09,428][1652475] Updated weights for policy 0, policy_version 606355 (0.0012) [2024-06-15 19:29:10,738][1648984] Fps is (10 sec: 42602.0, 60 sec: 45875.3, 300 sec: 43431.5). Total num frames: 1241907200. Throughput: 0: 10899.9. Samples: 310532096. Policy #0 lag: (min: 13.0, avg: 132.5, max: 333.0) [2024-06-15 19:29:10,738][1648984] Avg episode reward: [(0, '-0.590')] [2024-06-15 19:29:12,315][1652475] Updated weights for policy 0, policy_version 606416 (0.0045) [2024-06-15 19:29:15,517][1652475] Updated weights for policy 0, policy_version 606480 (0.0015) [2024-06-15 19:29:15,738][1648984] Fps is (10 sec: 29490.7, 60 sec: 42053.2, 300 sec: 43653.6). Total num frames: 1242071040. Throughput: 0: 11264.0. Samples: 310603776. Policy #0 lag: (min: 13.0, avg: 132.5, max: 333.0) [2024-06-15 19:29:15,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 19:29:16,524][1651340] Signal inference workers to stop experience collection... (31200 times) [2024-06-15 19:29:16,577][1652475] InferenceWorker_p0-w0: stopping experience collection (31200 times) [2024-06-15 19:29:16,862][1651340] Signal inference workers to resume experience collection... (31200 times) [2024-06-15 19:29:16,863][1652475] InferenceWorker_p0-w0: resuming experience collection (31200 times) [2024-06-15 19:29:17,863][1652475] Updated weights for policy 0, policy_version 606566 (0.0012) [2024-06-15 19:29:20,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 43690.6, 300 sec: 43431.5). Total num frames: 1242300416. Throughput: 0: 10774.7. Samples: 310623744. 
Policy #0 lag: (min: 13.0, avg: 132.5, max: 333.0) [2024-06-15 19:29:20,739][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 19:29:22,097][1652475] Updated weights for policy 0, policy_version 606615 (0.0013) [2024-06-15 19:29:23,702][1652475] Updated weights for policy 0, policy_version 606672 (0.0012) [2024-06-15 19:29:24,693][1652475] Updated weights for policy 0, policy_version 606717 (0.0016) [2024-06-15 19:29:25,738][1648984] Fps is (10 sec: 49151.1, 60 sec: 44236.6, 300 sec: 43542.5). Total num frames: 1242562560. Throughput: 0: 10990.9. Samples: 310695424. Policy #0 lag: (min: 13.0, avg: 132.5, max: 333.0) [2024-06-15 19:29:25,739][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 19:29:28,537][1652475] Updated weights for policy 0, policy_version 606768 (0.0014) [2024-06-15 19:29:30,399][1652475] Updated weights for policy 0, policy_version 606848 (0.0014) [2024-06-15 19:29:30,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 43690.6, 300 sec: 43986.9). Total num frames: 1242824704. Throughput: 0: 10763.4. Samples: 310755840. Policy #0 lag: (min: 13.0, avg: 132.5, max: 333.0) [2024-06-15 19:29:30,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 19:29:35,738][1648984] Fps is (10 sec: 36045.5, 60 sec: 43144.5, 300 sec: 43431.5). Total num frames: 1242923008. Throughput: 0: 10911.3. Samples: 310794752. Policy #0 lag: (min: 13.0, avg: 132.5, max: 333.0) [2024-06-15 19:29:35,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 19:29:36,441][1652475] Updated weights for policy 0, policy_version 606928 (0.0013) [2024-06-15 19:29:40,572][1652475] Updated weights for policy 0, policy_version 607008 (0.0153) [2024-06-15 19:29:40,738][1648984] Fps is (10 sec: 32767.8, 60 sec: 42598.4, 300 sec: 43653.6). Total num frames: 1243152384. Throughput: 0: 10808.9. Samples: 310857216. 
Policy #0 lag: (min: 13.0, avg: 132.5, max: 333.0) [2024-06-15 19:29:40,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 19:29:42,432][1652475] Updated weights for policy 0, policy_version 607088 (0.0019) [2024-06-15 19:29:45,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 43690.8, 300 sec: 43320.4). Total num frames: 1243348992. Throughput: 0: 10661.2. Samples: 310919168. Policy #0 lag: (min: 13.0, avg: 132.5, max: 333.0) [2024-06-15 19:29:45,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 19:29:47,031][1652475] Updated weights for policy 0, policy_version 607125 (0.0013) [2024-06-15 19:29:48,480][1652475] Updated weights for policy 0, policy_version 607169 (0.0021) [2024-06-15 19:29:50,738][1648984] Fps is (10 sec: 45875.2, 60 sec: 42052.3, 300 sec: 43542.7). Total num frames: 1243611136. Throughput: 0: 10752.0. Samples: 310954496. Policy #0 lag: (min: 13.0, avg: 132.5, max: 333.0) [2024-06-15 19:29:50,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 19:29:52,291][1652475] Updated weights for policy 0, policy_version 607264 (0.0012) [2024-06-15 19:29:53,684][1652475] Updated weights for policy 0, policy_version 607328 (0.0037) [2024-06-15 19:29:55,739][1648984] Fps is (10 sec: 52428.7, 60 sec: 43690.6, 300 sec: 43875.8). Total num frames: 1243873280. Throughput: 0: 10752.0. Samples: 311015936. Policy #0 lag: (min: 13.0, avg: 132.5, max: 333.0) [2024-06-15 19:29:55,740][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 19:29:58,453][1652475] Updated weights for policy 0, policy_version 607379 (0.0012) [2024-06-15 19:30:00,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 42052.8, 300 sec: 43542.5). Total num frames: 1244004352. Throughput: 0: 10740.6. Samples: 311087104. 
Policy #0 lag: (min: 13.0, avg: 132.5, max: 333.0) [2024-06-15 19:30:00,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 19:30:01,381][1652475] Updated weights for policy 0, policy_version 607458 (0.0012) [2024-06-15 19:30:03,036][1651340] Signal inference workers to stop experience collection... (31250 times) [2024-06-15 19:30:03,126][1652475] InferenceWorker_p0-w0: stopping experience collection (31250 times) [2024-06-15 19:30:03,360][1651340] Signal inference workers to resume experience collection... (31250 times) [2024-06-15 19:30:03,362][1652475] InferenceWorker_p0-w0: resuming experience collection (31250 times) [2024-06-15 19:30:03,951][1652475] Updated weights for policy 0, policy_version 607521 (0.0019) [2024-06-15 19:30:05,658][1652475] Updated weights for policy 0, policy_version 607600 (0.0013) [2024-06-15 19:30:05,738][1648984] Fps is (10 sec: 49152.1, 60 sec: 43144.4, 300 sec: 44320.1). Total num frames: 1244364800. Throughput: 0: 11036.4. Samples: 311120384. Policy #0 lag: (min: 13.0, avg: 132.5, max: 333.0) [2024-06-15 19:30:05,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 19:30:10,738][1648984] Fps is (10 sec: 49152.2, 60 sec: 43144.5, 300 sec: 43431.5). Total num frames: 1244495872. Throughput: 0: 11013.7. Samples: 311191040. Policy #0 lag: (min: 13.0, avg: 132.5, max: 333.0) [2024-06-15 19:30:10,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 19:30:10,851][1652475] Updated weights for policy 0, policy_version 607676 (0.0014) [2024-06-15 19:30:13,785][1652475] Updated weights for policy 0, policy_version 607737 (0.0014) [2024-06-15 19:30:15,677][1652475] Updated weights for policy 0, policy_version 607784 (0.0015) [2024-06-15 19:30:15,738][1648984] Fps is (10 sec: 36045.0, 60 sec: 44236.8, 300 sec: 43764.7). Total num frames: 1244725248. Throughput: 0: 10968.2. Samples: 311249408. 
Policy #0 lag: (min: 13.0, avg: 132.5, max: 333.0) [2024-06-15 19:30:15,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 19:30:17,152][1652475] Updated weights for policy 0, policy_version 607829 (0.0017) [2024-06-15 19:30:20,738][1648984] Fps is (10 sec: 42598.5, 60 sec: 43690.7, 300 sec: 43764.7). Total num frames: 1244921856. Throughput: 0: 10865.8. Samples: 311283712. Policy #0 lag: (min: 13.0, avg: 132.5, max: 333.0) [2024-06-15 19:30:20,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 19:30:22,153][1652475] Updated weights for policy 0, policy_version 607906 (0.0015) [2024-06-15 19:30:25,309][1652475] Updated weights for policy 0, policy_version 607984 (0.0013) [2024-06-15 19:30:25,738][1648984] Fps is (10 sec: 45875.0, 60 sec: 43690.8, 300 sec: 43764.7). Total num frames: 1245184000. Throughput: 0: 11070.6. Samples: 311355392. Policy #0 lag: (min: 13.0, avg: 132.5, max: 333.0) [2024-06-15 19:30:25,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 19:30:27,602][1652475] Updated weights for policy 0, policy_version 608058 (0.0031) [2024-06-15 19:30:29,498][1652475] Updated weights for policy 0, policy_version 608102 (0.0013) [2024-06-15 19:30:30,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 43690.7, 300 sec: 44209.0). Total num frames: 1245446144. Throughput: 0: 11025.1. Samples: 311415296. Policy #0 lag: (min: 13.0, avg: 132.5, max: 333.0) [2024-06-15 19:30:30,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 19:30:34,897][1652475] Updated weights for policy 0, policy_version 608176 (0.0028) [2024-06-15 19:30:35,738][1648984] Fps is (10 sec: 39322.0, 60 sec: 44236.9, 300 sec: 43542.6). Total num frames: 1245577216. Throughput: 0: 11104.7. Samples: 311454208. 
Policy #0 lag: (min: 23.0, avg: 113.5, max: 279.0) [2024-06-15 19:30:35,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 19:30:36,368][1652475] Updated weights for policy 0, policy_version 608208 (0.0012) [2024-06-15 19:30:39,147][1652475] Updated weights for policy 0, policy_version 608277 (0.0013) [2024-06-15 19:30:40,471][1652475] Updated weights for policy 0, policy_version 608337 (0.0015) [2024-06-15 19:30:40,738][1648984] Fps is (10 sec: 45875.3, 60 sec: 45875.3, 300 sec: 44209.0). Total num frames: 1245904896. Throughput: 0: 11127.5. Samples: 311516672. Policy #0 lag: (min: 23.0, avg: 113.5, max: 279.0) [2024-06-15 19:30:40,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 19:30:45,721][1652475] Updated weights for policy 0, policy_version 608386 (0.0013) [2024-06-15 19:30:45,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 43690.7, 300 sec: 43653.6). Total num frames: 1245970432. Throughput: 0: 11025.1. Samples: 311583232. Policy #0 lag: (min: 23.0, avg: 113.5, max: 279.0) [2024-06-15 19:30:45,740][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 19:30:46,966][1652475] Updated weights for policy 0, policy_version 608446 (0.0014) [2024-06-15 19:30:50,738][1648984] Fps is (10 sec: 32768.2, 60 sec: 43690.8, 300 sec: 43764.7). Total num frames: 1246232576. Throughput: 0: 10968.2. Samples: 311613952. Policy #0 lag: (min: 23.0, avg: 113.5, max: 279.0) [2024-06-15 19:30:50,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 19:30:51,091][1652475] Updated weights for policy 0, policy_version 608513 (0.0014) [2024-06-15 19:30:51,383][1651340] Signal inference workers to stop experience collection... (31300 times) [2024-06-15 19:30:51,424][1652475] InferenceWorker_p0-w0: stopping experience collection (31300 times) [2024-06-15 19:30:51,609][1651340] Signal inference workers to resume experience collection... 
(31300 times) [2024-06-15 19:30:51,610][1652475] InferenceWorker_p0-w0: resuming experience collection (31300 times) [2024-06-15 19:30:53,692][1652475] Updated weights for policy 0, policy_version 608579 (0.0014) [2024-06-15 19:30:55,039][1652475] Updated weights for policy 0, policy_version 608639 (0.0018) [2024-06-15 19:30:55,738][1648984] Fps is (10 sec: 52428.1, 60 sec: 43690.7, 300 sec: 44431.2). Total num frames: 1246494720. Throughput: 0: 10808.9. Samples: 311677440. Policy #0 lag: (min: 23.0, avg: 113.5, max: 279.0) [2024-06-15 19:30:55,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 19:30:55,744][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000608640_1246494720.pth... [2024-06-15 19:30:55,812][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000603456_1235877888.pth [2024-06-15 19:30:55,818][1651340] Saving a milestone train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/milestones/checkpoint_000608640_1246494720.pth [2024-06-15 19:30:58,791][1652475] Updated weights for policy 0, policy_version 608704 (0.0014) [2024-06-15 19:31:00,518][1652475] Updated weights for policy 0, policy_version 608763 (0.0017) [2024-06-15 19:31:00,738][1648984] Fps is (10 sec: 52428.5, 60 sec: 45875.2, 300 sec: 43986.9). Total num frames: 1246756864. Throughput: 0: 10990.9. Samples: 311744000. Policy #0 lag: (min: 23.0, avg: 113.5, max: 279.0) [2024-06-15 19:31:00,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 19:31:03,268][1652475] Updated weights for policy 0, policy_version 608826 (0.0013) [2024-06-15 19:31:05,746][1648984] Fps is (10 sec: 39288.6, 60 sec: 42046.4, 300 sec: 43985.6). Total num frames: 1246887936. Throughput: 0: 10932.0. Samples: 311775744. 
Policy #0 lag: (min: 23.0, avg: 113.5, max: 279.0) [2024-06-15 19:31:05,747][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 19:31:06,972][1652475] Updated weights for policy 0, policy_version 608884 (0.0013) [2024-06-15 19:31:10,524][1652475] Updated weights for policy 0, policy_version 608928 (0.0013) [2024-06-15 19:31:10,738][1648984] Fps is (10 sec: 32767.1, 60 sec: 43144.3, 300 sec: 43764.7). Total num frames: 1247084544. Throughput: 0: 10922.6. Samples: 311846912. Policy #0 lag: (min: 23.0, avg: 113.5, max: 279.0) [2024-06-15 19:31:10,739][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 19:31:12,400][1652475] Updated weights for policy 0, policy_version 609020 (0.0014) [2024-06-15 19:31:15,313][1652475] Updated weights for policy 0, policy_version 609087 (0.0013) [2024-06-15 19:31:15,738][1648984] Fps is (10 sec: 52473.4, 60 sec: 44782.9, 300 sec: 43986.9). Total num frames: 1247412224. Throughput: 0: 10888.5. Samples: 311905280. Policy #0 lag: (min: 23.0, avg: 113.5, max: 279.0) [2024-06-15 19:31:15,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 19:31:18,887][1652475] Updated weights for policy 0, policy_version 609136 (0.0012) [2024-06-15 19:31:20,738][1648984] Fps is (10 sec: 45876.5, 60 sec: 43690.7, 300 sec: 44100.9). Total num frames: 1247543296. Throughput: 0: 10740.6. Samples: 311937536. Policy #0 lag: (min: 23.0, avg: 113.5, max: 279.0) [2024-06-15 19:31:20,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 19:31:22,821][1652475] Updated weights for policy 0, policy_version 609187 (0.0028) [2024-06-15 19:31:24,310][1652475] Updated weights for policy 0, policy_version 609232 (0.0012) [2024-06-15 19:31:25,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 43690.7, 300 sec: 44209.0). Total num frames: 1247805440. Throughput: 0: 10843.0. Samples: 312004608. 
Policy #0 lag: (min: 23.0, avg: 113.5, max: 279.0) [2024-06-15 19:31:25,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 19:31:27,580][1652475] Updated weights for policy 0, policy_version 609282 (0.0012) [2024-06-15 19:31:29,768][1652475] Updated weights for policy 0, policy_version 609376 (0.0013) [2024-06-15 19:31:30,738][1648984] Fps is (10 sec: 52429.1, 60 sec: 43690.7, 300 sec: 44431.2). Total num frames: 1248067584. Throughput: 0: 10706.5. Samples: 312065024. Policy #0 lag: (min: 23.0, avg: 113.5, max: 279.0) [2024-06-15 19:31:30,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 19:31:33,901][1652475] Updated weights for policy 0, policy_version 609426 (0.0017) [2024-06-15 19:31:34,781][1652475] Updated weights for policy 0, policy_version 609468 (0.0013) [2024-06-15 19:31:35,738][1648984] Fps is (10 sec: 45874.8, 60 sec: 44782.9, 300 sec: 44209.0). Total num frames: 1248264192. Throughput: 0: 10888.5. Samples: 312103936. Policy #0 lag: (min: 23.0, avg: 113.5, max: 279.0) [2024-06-15 19:31:35,738][1648984] Avg episode reward: [(0, '-0.590')] [2024-06-15 19:31:35,772][1652475] Updated weights for policy 0, policy_version 609508 (0.0020) [2024-06-15 19:31:36,231][1652475] Updated weights for policy 0, policy_version 609536 (0.0012) [2024-06-15 19:31:39,345][1651340] Signal inference workers to stop experience collection... (31350 times) [2024-06-15 19:31:39,426][1652475] InferenceWorker_p0-w0: stopping experience collection (31350 times) [2024-06-15 19:31:39,708][1651340] Signal inference workers to resume experience collection... (31350 times) [2024-06-15 19:31:39,709][1652475] InferenceWorker_p0-w0: resuming experience collection (31350 times) [2024-06-15 19:31:40,740][1648984] Fps is (10 sec: 39321.3, 60 sec: 42598.4, 300 sec: 43986.9). Total num frames: 1248460800. Throughput: 0: 11184.4. Samples: 312180736. 
Policy #0 lag: (min: 23.0, avg: 113.5, max: 279.0) [2024-06-15 19:31:40,741][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 19:31:40,976][1652475] Updated weights for policy 0, policy_version 609604 (0.0021) [2024-06-15 19:31:42,258][1652475] Updated weights for policy 0, policy_version 609662 (0.0032) [2024-06-15 19:31:45,738][1648984] Fps is (10 sec: 32768.0, 60 sec: 43690.6, 300 sec: 43986.9). Total num frames: 1248591872. Throughput: 0: 11104.7. Samples: 312243712. Policy #0 lag: (min: 23.0, avg: 113.5, max: 279.0) [2024-06-15 19:31:45,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 19:31:46,501][1652475] Updated weights for policy 0, policy_version 609712 (0.0013) [2024-06-15 19:31:47,952][1652475] Updated weights for policy 0, policy_version 609776 (0.0013) [2024-06-15 19:31:50,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 43690.6, 300 sec: 43986.9). Total num frames: 1248854016. Throughput: 0: 11038.5. Samples: 312272384. Policy #0 lag: (min: 23.0, avg: 113.5, max: 279.0) [2024-06-15 19:31:50,738][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 19:31:51,921][1652475] Updated weights for policy 0, policy_version 609826 (0.0012) [2024-06-15 19:31:52,620][1652475] Updated weights for policy 0, policy_version 609856 (0.0047) [2024-06-15 19:31:55,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 43690.7, 300 sec: 44209.0). Total num frames: 1249116160. Throughput: 0: 10774.8. Samples: 312331776. Policy #0 lag: (min: 23.0, avg: 113.5, max: 279.0) [2024-06-15 19:31:55,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 19:31:56,847][1652475] Updated weights for policy 0, policy_version 609922 (0.0029) [2024-06-15 19:31:58,022][1652475] Updated weights for policy 0, policy_version 609983 (0.0012) [2024-06-15 19:32:00,738][1648984] Fps is (10 sec: 45873.8, 60 sec: 42598.2, 300 sec: 44209.0). Total num frames: 1249312768. Throughput: 0: 11218.4. Samples: 312410112. 
Policy #0 lag: (min: 23.0, avg: 113.5, max: 279.0) [2024-06-15 19:32:00,739][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 19:32:01,092][1652475] Updated weights for policy 0, policy_version 610033 (0.0013) [2024-06-15 19:32:03,184][1652475] Updated weights for policy 0, policy_version 610085 (0.0014) [2024-06-15 19:32:04,652][1652475] Updated weights for policy 0, policy_version 610144 (0.0017) [2024-06-15 19:32:05,738][1648984] Fps is (10 sec: 52429.2, 60 sec: 45881.7, 300 sec: 44431.2). Total num frames: 1249640448. Throughput: 0: 11138.8. Samples: 312438784. Policy #0 lag: (min: 23.0, avg: 113.5, max: 279.0) [2024-06-15 19:32:05,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 19:32:08,850][1652475] Updated weights for policy 0, policy_version 610196 (0.0014) [2024-06-15 19:32:10,738][1648984] Fps is (10 sec: 45876.8, 60 sec: 44783.1, 300 sec: 43986.9). Total num frames: 1249771520. Throughput: 0: 11150.2. Samples: 312506368. Policy #0 lag: (min: 6.0, avg: 117.0, max: 262.0) [2024-06-15 19:32:10,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 19:32:12,508][1652475] Updated weights for policy 0, policy_version 610272 (0.0013) [2024-06-15 19:32:14,428][1652475] Updated weights for policy 0, policy_version 610320 (0.0024) [2024-06-15 19:32:15,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 43690.7, 300 sec: 44320.1). Total num frames: 1250033664. Throughput: 0: 11264.0. Samples: 312571904. Policy #0 lag: (min: 6.0, avg: 117.0, max: 262.0) [2024-06-15 19:32:15,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 19:32:16,211][1652475] Updated weights for policy 0, policy_version 610370 (0.0013) [2024-06-15 19:32:17,689][1652475] Updated weights for policy 0, policy_version 610431 (0.0012) [2024-06-15 19:32:20,738][1648984] Fps is (10 sec: 42598.5, 60 sec: 44236.8, 300 sec: 44320.1). Total num frames: 1250197504. Throughput: 0: 11036.5. Samples: 312600576. 
Policy #0 lag: (min: 6.0, avg: 117.0, max: 262.0) [2024-06-15 19:32:20,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 19:32:21,876][1652475] Updated weights for policy 0, policy_version 610492 (0.0013) [2024-06-15 19:32:25,740][1648984] Fps is (10 sec: 39310.9, 60 sec: 43688.7, 300 sec: 43986.5). Total num frames: 1250426880. Throughput: 0: 10853.8. Samples: 312669184. Policy #0 lag: (min: 6.0, avg: 117.0, max: 262.0) [2024-06-15 19:32:25,742][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 19:32:26,711][1651340] Signal inference workers to stop experience collection... (31400 times) [2024-06-15 19:32:26,758][1652475] InferenceWorker_p0-w0: stopping experience collection (31400 times) [2024-06-15 19:32:26,785][1652475] Updated weights for policy 0, policy_version 610565 (0.0131) [2024-06-15 19:32:26,905][1651340] Signal inference workers to resume experience collection... (31400 times) [2024-06-15 19:32:26,906][1652475] InferenceWorker_p0-w0: resuming experience collection (31400 times) [2024-06-15 19:32:28,001][1652475] Updated weights for policy 0, policy_version 610622 (0.0107) [2024-06-15 19:32:29,720][1652475] Updated weights for policy 0, policy_version 610676 (0.0012) [2024-06-15 19:32:30,738][1648984] Fps is (10 sec: 49151.9, 60 sec: 43690.6, 300 sec: 44431.2). Total num frames: 1250689024. Throughput: 0: 10843.0. Samples: 312731648. Policy #0 lag: (min: 6.0, avg: 117.0, max: 262.0) [2024-06-15 19:32:30,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 19:32:32,934][1652475] Updated weights for policy 0, policy_version 610722 (0.0013) [2024-06-15 19:32:35,738][1648984] Fps is (10 sec: 39332.3, 60 sec: 42598.5, 300 sec: 43986.9). Total num frames: 1250820096. Throughput: 0: 11036.5. Samples: 312769024. 
Policy #0 lag: (min: 6.0, avg: 117.0, max: 262.0) [2024-06-15 19:32:35,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 19:32:35,934][1652475] Updated weights for policy 0, policy_version 610773 (0.0012) [2024-06-15 19:32:38,162][1652475] Updated weights for policy 0, policy_version 610835 (0.0018) [2024-06-15 19:32:39,188][1652475] Updated weights for policy 0, policy_version 610880 (0.0013) [2024-06-15 19:32:40,738][1648984] Fps is (10 sec: 42598.2, 60 sec: 44236.8, 300 sec: 44097.9). Total num frames: 1251115008. Throughput: 0: 11173.0. Samples: 312834560. Policy #0 lag: (min: 6.0, avg: 117.0, max: 262.0) [2024-06-15 19:32:40,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 19:32:44,387][1652475] Updated weights for policy 0, policy_version 610964 (0.0015) [2024-06-15 19:32:45,738][1648984] Fps is (10 sec: 52428.4, 60 sec: 45875.2, 300 sec: 44209.0). Total num frames: 1251344384. Throughput: 0: 10900.0. Samples: 312900608. Policy #0 lag: (min: 6.0, avg: 117.0, max: 262.0) [2024-06-15 19:32:45,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 19:32:47,070][1652475] Updated weights for policy 0, policy_version 611011 (0.0013) [2024-06-15 19:32:48,276][1652475] Updated weights for policy 0, policy_version 611069 (0.0014) [2024-06-15 19:32:50,737][1648984] Fps is (10 sec: 42599.2, 60 sec: 44783.1, 300 sec: 43764.7). Total num frames: 1251540992. Throughput: 0: 11082.0. Samples: 312937472. Policy #0 lag: (min: 6.0, avg: 117.0, max: 262.0) [2024-06-15 19:32:50,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 19:32:51,228][1652475] Updated weights for policy 0, policy_version 611128 (0.0088) [2024-06-15 19:32:52,983][1652475] Updated weights for policy 0, policy_version 611184 (0.0017) [2024-06-15 19:32:55,738][1648984] Fps is (10 sec: 42597.8, 60 sec: 44236.7, 300 sec: 44320.1). Total num frames: 1251770368. Throughput: 0: 11013.7. Samples: 313001984. 
Policy #0 lag: (min: 6.0, avg: 117.0, max: 262.0) [2024-06-15 19:32:55,739][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 19:32:56,210][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000611248_1251835904.pth... [2024-06-15 19:32:56,211][1652475] Updated weights for policy 0, policy_version 611248 (0.0060) [2024-06-15 19:32:56,243][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000606080_1241251840.pth [2024-06-15 19:33:00,069][1652475] Updated weights for policy 0, policy_version 611312 (0.0015) [2024-06-15 19:33:00,738][1648984] Fps is (10 sec: 45874.8, 60 sec: 44783.2, 300 sec: 43875.8). Total num frames: 1251999744. Throughput: 0: 10922.7. Samples: 313063424. Policy #0 lag: (min: 6.0, avg: 117.0, max: 262.0) [2024-06-15 19:33:00,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 19:33:03,673][1652475] Updated weights for policy 0, policy_version 611360 (0.0071) [2024-06-15 19:33:05,633][1652475] Updated weights for policy 0, policy_version 611427 (0.0013) [2024-06-15 19:33:05,744][1648984] Fps is (10 sec: 42572.5, 60 sec: 42593.9, 300 sec: 44208.1). Total num frames: 1252196352. Throughput: 0: 11137.3. Samples: 313101824. Policy #0 lag: (min: 6.0, avg: 117.0, max: 262.0) [2024-06-15 19:33:05,744][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 19:33:08,212][1652475] Updated weights for policy 0, policy_version 611488 (0.0013) [2024-06-15 19:33:10,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 43690.7, 300 sec: 43542.8). Total num frames: 1252392960. Throughput: 0: 10775.4. Samples: 313154048. 
Policy #0 lag: (min: 6.0, avg: 117.0, max: 262.0) [2024-06-15 19:33:10,738][1648984] Avg episode reward: [(0, '-0.550')] [2024-06-15 19:33:11,419][1652475] Updated weights for policy 0, policy_version 611538 (0.0012) [2024-06-15 19:33:12,390][1652475] Updated weights for policy 0, policy_version 611580 (0.0012) [2024-06-15 19:33:15,249][1651340] Signal inference workers to stop experience collection... (31450 times) [2024-06-15 19:33:15,297][1652475] InferenceWorker_p0-w0: stopping experience collection (31450 times) [2024-06-15 19:33:15,552][1651340] Signal inference workers to resume experience collection... (31450 times) [2024-06-15 19:33:15,554][1652475] InferenceWorker_p0-w0: resuming experience collection (31450 times) [2024-06-15 19:33:15,738][1648984] Fps is (10 sec: 39346.6, 60 sec: 42598.4, 300 sec: 43764.7). Total num frames: 1252589568. Throughput: 0: 10968.2. Samples: 313225216. Policy #0 lag: (min: 6.0, avg: 117.0, max: 262.0) [2024-06-15 19:33:15,738][1648984] Avg episode reward: [(0, '-0.230')] [2024-06-15 19:33:16,144][1652475] Updated weights for policy 0, policy_version 611636 (0.0014) [2024-06-15 19:33:19,800][1652475] Updated weights for policy 0, policy_version 611683 (0.0016) [2024-06-15 19:33:20,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 43690.7, 300 sec: 43764.7). Total num frames: 1252818944. Throughput: 0: 10820.3. Samples: 313255936. Policy #0 lag: (min: 6.0, avg: 117.0, max: 262.0) [2024-06-15 19:33:20,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 19:33:21,737][1652475] Updated weights for policy 0, policy_version 611771 (0.0101) [2024-06-15 19:33:23,963][1652475] Updated weights for policy 0, policy_version 611810 (0.0012) [2024-06-15 19:33:25,738][1648984] Fps is (10 sec: 45873.2, 60 sec: 43692.4, 300 sec: 43542.5). Total num frames: 1253048320. Throughput: 0: 10604.0. Samples: 313311744. 
Policy #0 lag: (min: 6.0, avg: 117.0, max: 262.0) [2024-06-15 19:33:25,739][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 19:33:27,196][1652475] Updated weights for policy 0, policy_version 611863 (0.0011) [2024-06-15 19:33:27,976][1652475] Updated weights for policy 0, policy_version 611902 (0.0014) [2024-06-15 19:33:30,740][1648984] Fps is (10 sec: 36035.3, 60 sec: 41504.3, 300 sec: 43542.2). Total num frames: 1253179392. Throughput: 0: 10728.6. Samples: 313383424. Policy #0 lag: (min: 6.0, avg: 117.0, max: 262.0) [2024-06-15 19:33:30,741][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 19:33:32,299][1652475] Updated weights for policy 0, policy_version 611961 (0.0014) [2024-06-15 19:33:34,135][1652475] Updated weights for policy 0, policy_version 612002 (0.0012) [2024-06-15 19:33:35,639][1652475] Updated weights for policy 0, policy_version 612067 (0.0012) [2024-06-15 19:33:35,738][1648984] Fps is (10 sec: 45876.9, 60 sec: 44782.9, 300 sec: 43764.7). Total num frames: 1253507072. Throughput: 0: 10638.2. Samples: 313416192. Policy #0 lag: (min: 6.0, avg: 117.0, max: 262.0) [2024-06-15 19:33:35,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 19:33:36,336][1652475] Updated weights for policy 0, policy_version 612093 (0.0010) [2024-06-15 19:33:40,738][1648984] Fps is (10 sec: 45885.9, 60 sec: 42052.1, 300 sec: 43764.7). Total num frames: 1253638144. Throughput: 0: 10649.6. Samples: 313481216. Policy #0 lag: (min: 6.0, avg: 117.0, max: 262.0) [2024-06-15 19:33:40,739][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 19:33:41,167][1652475] Updated weights for policy 0, policy_version 612150 (0.0032) [2024-06-15 19:33:44,697][1652475] Updated weights for policy 0, policy_version 612225 (0.0153) [2024-06-15 19:33:45,738][1648984] Fps is (10 sec: 42598.6, 60 sec: 43144.6, 300 sec: 43542.6). Total num frames: 1253933056. Throughput: 0: 10638.2. Samples: 313542144. 
Policy #0 lag: (min: 33.0, avg: 135.9, max: 289.0) [2024-06-15 19:33:45,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 19:33:47,272][1652475] Updated weights for policy 0, policy_version 612289 (0.0027) [2024-06-15 19:33:48,595][1652475] Updated weights for policy 0, policy_version 612347 (0.0016) [2024-06-15 19:33:50,738][1648984] Fps is (10 sec: 45876.4, 60 sec: 42598.3, 300 sec: 43542.6). Total num frames: 1254096896. Throughput: 0: 10582.8. Samples: 313577984. Policy #0 lag: (min: 33.0, avg: 135.9, max: 289.0) [2024-06-15 19:33:50,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 19:33:54,084][1652475] Updated weights for policy 0, policy_version 612420 (0.0144) [2024-06-15 19:33:55,574][1652475] Updated weights for policy 0, policy_version 612476 (0.0013) [2024-06-15 19:33:55,738][1648984] Fps is (10 sec: 42597.9, 60 sec: 43144.6, 300 sec: 43653.8). Total num frames: 1254359040. Throughput: 0: 10854.4. Samples: 313642496. Policy #0 lag: (min: 33.0, avg: 135.9, max: 289.0) [2024-06-15 19:33:55,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 19:33:58,621][1652475] Updated weights for policy 0, policy_version 612528 (0.0012) [2024-06-15 19:33:59,909][1651340] Signal inference workers to stop experience collection... (31500 times) [2024-06-15 19:33:59,992][1652475] InferenceWorker_p0-w0: stopping experience collection (31500 times) [2024-06-15 19:34:00,210][1651340] Signal inference workers to resume experience collection... (31500 times) [2024-06-15 19:34:00,211][1652475] InferenceWorker_p0-w0: resuming experience collection (31500 times) [2024-06-15 19:34:00,499][1652475] Updated weights for policy 0, policy_version 612605 (0.0180) [2024-06-15 19:34:00,738][1648984] Fps is (10 sec: 52428.5, 60 sec: 43690.6, 300 sec: 43542.5). Total num frames: 1254621184. Throughput: 0: 10501.6. Samples: 313697792. 
Policy #0 lag: (min: 33.0, avg: 135.9, max: 289.0) [2024-06-15 19:34:00,739][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 19:34:05,738][1648984] Fps is (10 sec: 29491.3, 60 sec: 40964.3, 300 sec: 43209.3). Total num frames: 1254653952. Throughput: 0: 10604.1. Samples: 313733120. Policy #0 lag: (min: 33.0, avg: 135.9, max: 289.0) [2024-06-15 19:34:05,739][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 19:34:06,787][1652475] Updated weights for policy 0, policy_version 612672 (0.0013) [2024-06-15 19:34:08,198][1652475] Updated weights for policy 0, policy_version 612728 (0.0014) [2024-06-15 19:34:10,738][1648984] Fps is (10 sec: 32768.1, 60 sec: 42598.3, 300 sec: 43653.6). Total num frames: 1254948864. Throughput: 0: 10740.7. Samples: 313795072. Policy #0 lag: (min: 33.0, avg: 135.9, max: 289.0) [2024-06-15 19:34:10,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 19:34:11,188][1652475] Updated weights for policy 0, policy_version 612785 (0.0012) [2024-06-15 19:34:12,364][1652475] Updated weights for policy 0, policy_version 612804 (0.0015) [2024-06-15 19:34:13,367][1652475] Updated weights for policy 0, policy_version 612855 (0.0017) [2024-06-15 19:34:15,738][1648984] Fps is (10 sec: 49151.9, 60 sec: 42598.3, 300 sec: 43542.6). Total num frames: 1255145472. Throughput: 0: 10695.7. Samples: 313864704. Policy #0 lag: (min: 33.0, avg: 135.9, max: 289.0) [2024-06-15 19:34:15,738][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 19:34:17,930][1652475] Updated weights for policy 0, policy_version 612900 (0.0016) [2024-06-15 19:34:19,636][1652475] Updated weights for policy 0, policy_version 612976 (0.0014) [2024-06-15 19:34:20,738][1648984] Fps is (10 sec: 45875.5, 60 sec: 43144.5, 300 sec: 43542.6). Total num frames: 1255407616. Throughput: 0: 10661.0. Samples: 313895936. 
Policy #0 lag: (min: 33.0, avg: 135.9, max: 289.0) [2024-06-15 19:34:20,738][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 19:34:22,485][1652475] Updated weights for policy 0, policy_version 613046 (0.0014) [2024-06-15 19:34:25,074][1652475] Updated weights for policy 0, policy_version 613074 (0.0013) [2024-06-15 19:34:25,738][1648984] Fps is (10 sec: 49152.0, 60 sec: 43144.7, 300 sec: 43431.5). Total num frames: 1255636992. Throughput: 0: 10661.0. Samples: 313960960. Policy #0 lag: (min: 33.0, avg: 135.9, max: 289.0) [2024-06-15 19:34:25,738][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 19:34:29,295][1652475] Updated weights for policy 0, policy_version 613136 (0.0013) [2024-06-15 19:34:30,737][1648984] Fps is (10 sec: 39322.0, 60 sec: 43692.7, 300 sec: 43653.7). Total num frames: 1255800832. Throughput: 0: 10797.5. Samples: 314028032. Policy #0 lag: (min: 33.0, avg: 135.9, max: 289.0) [2024-06-15 19:34:30,738][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 19:34:31,204][1652475] Updated weights for policy 0, policy_version 613203 (0.0012) [2024-06-15 19:34:33,013][1652475] Updated weights for policy 0, policy_version 613249 (0.0011) [2024-06-15 19:34:34,521][1652475] Updated weights for policy 0, policy_version 613307 (0.0049) [2024-06-15 19:34:35,738][1648984] Fps is (10 sec: 42598.6, 60 sec: 42598.4, 300 sec: 43764.7). Total num frames: 1256062976. Throughput: 0: 10683.7. Samples: 314058752. Policy #0 lag: (min: 33.0, avg: 135.9, max: 289.0) [2024-06-15 19:34:35,738][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 19:34:37,081][1652475] Updated weights for policy 0, policy_version 613348 (0.0018) [2024-06-15 19:34:40,738][1648984] Fps is (10 sec: 39321.2, 60 sec: 42598.6, 300 sec: 43542.6). Total num frames: 1256194048. Throughput: 0: 10820.3. Samples: 314129408. 
Policy #0 lag: (min: 33.0, avg: 135.9, max: 289.0) [2024-06-15 19:34:40,738][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 19:34:42,803][1652475] Updated weights for policy 0, policy_version 613424 (0.0013) [2024-06-15 19:34:44,122][1652475] Updated weights for policy 0, policy_version 613474 (0.0014) [2024-06-15 19:34:45,722][1651340] Signal inference workers to stop experience collection... (31550 times) [2024-06-15 19:34:45,738][1648984] Fps is (10 sec: 45875.3, 60 sec: 43144.5, 300 sec: 43764.7). Total num frames: 1256521728. Throughput: 0: 10888.6. Samples: 314187776. Policy #0 lag: (min: 33.0, avg: 135.9, max: 289.0) [2024-06-15 19:34:45,738][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 19:34:45,756][1652475] InferenceWorker_p0-w0: stopping experience collection (31550 times) [2024-06-15 19:34:45,797][1652475] Updated weights for policy 0, policy_version 613540 (0.0123) [2024-06-15 19:34:46,023][1651340] Signal inference workers to resume experience collection... (31550 times) [2024-06-15 19:34:46,025][1652475] InferenceWorker_p0-w0: resuming experience collection (31550 times) [2024-06-15 19:34:49,662][1652475] Updated weights for policy 0, policy_version 613632 (0.0015) [2024-06-15 19:34:50,738][1648984] Fps is (10 sec: 52428.4, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 1256718336. Throughput: 0: 10911.3. Samples: 314224128. Policy #0 lag: (min: 33.0, avg: 135.9, max: 289.0) [2024-06-15 19:34:50,738][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 19:34:55,359][1652475] Updated weights for policy 0, policy_version 613712 (0.0103) [2024-06-15 19:34:55,738][1648984] Fps is (10 sec: 36044.5, 60 sec: 42052.3, 300 sec: 43653.6). Total num frames: 1256882176. Throughput: 0: 11059.2. Samples: 314292736. 
Policy #0 lag: (min: 33.0, avg: 135.9, max: 289.0) [2024-06-15 19:34:55,738][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 19:34:56,296][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000613744_1256947712.pth... [2024-06-15 19:34:56,490][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000608640_1246494720.pth [2024-06-15 19:34:58,066][1652475] Updated weights for policy 0, policy_version 613808 (0.0013) [2024-06-15 19:35:00,738][1648984] Fps is (10 sec: 39321.9, 60 sec: 41506.2, 300 sec: 43209.3). Total num frames: 1257111552. Throughput: 0: 10752.0. Samples: 314348544. Policy #0 lag: (min: 33.0, avg: 135.9, max: 289.0) [2024-06-15 19:35:00,738][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 19:35:02,566][1652475] Updated weights for policy 0, policy_version 613883 (0.0013) [2024-06-15 19:35:05,738][1648984] Fps is (10 sec: 36044.7, 60 sec: 43144.5, 300 sec: 43209.3). Total num frames: 1257242624. Throughput: 0: 10717.8. Samples: 314378240. Policy #0 lag: (min: 33.0, avg: 135.9, max: 289.0) [2024-06-15 19:35:05,738][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 19:35:08,224][1652475] Updated weights for policy 0, policy_version 613940 (0.0012) [2024-06-15 19:35:09,567][1652475] Updated weights for policy 0, policy_version 614000 (0.0012) [2024-06-15 19:35:10,738][1648984] Fps is (10 sec: 45874.7, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 1257570304. Throughput: 0: 10695.1. Samples: 314442240. Policy #0 lag: (min: 33.0, avg: 135.9, max: 289.0) [2024-06-15 19:35:10,738][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 19:35:11,305][1652475] Updated weights for policy 0, policy_version 614079 (0.0016) [2024-06-15 19:35:14,954][1652475] Updated weights for policy 0, policy_version 614129 (0.0016) [2024-06-15 19:35:15,742][1648984] Fps is (10 sec: 52405.5, 60 sec: 43687.4, 300 sec: 43541.9). Total num frames: 1257766912. 
Throughput: 0: 10557.5. Samples: 314503168. Policy #0 lag: (min: 33.0, avg: 135.9, max: 289.0) [2024-06-15 19:35:15,743][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 19:35:19,778][1652475] Updated weights for policy 0, policy_version 614178 (0.0014) [2024-06-15 19:35:20,334][1652475] Updated weights for policy 0, policy_version 614208 (0.0012) [2024-06-15 19:35:20,738][1648984] Fps is (10 sec: 32768.3, 60 sec: 41506.1, 300 sec: 43098.3). Total num frames: 1257897984. Throughput: 0: 10695.1. Samples: 314540032. Policy #0 lag: (min: 15.0, avg: 94.0, max: 271.0) [2024-06-15 19:35:20,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 19:35:23,232][1652475] Updated weights for policy 0, policy_version 614327 (0.0013) [2024-06-15 19:35:25,738][1648984] Fps is (10 sec: 39339.2, 60 sec: 42052.3, 300 sec: 43098.2). Total num frames: 1258160128. Throughput: 0: 10410.6. Samples: 314597888. Policy #0 lag: (min: 15.0, avg: 94.0, max: 271.0) [2024-06-15 19:35:25,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 19:35:26,178][1652475] Updated weights for policy 0, policy_version 614368 (0.0012) [2024-06-15 19:35:30,770][1648984] Fps is (10 sec: 39193.5, 60 sec: 41483.5, 300 sec: 43093.5). Total num frames: 1258291200. Throughput: 0: 10698.7. Samples: 314669568. Policy #0 lag: (min: 15.0, avg: 94.0, max: 271.0) [2024-06-15 19:35:30,771][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 19:35:31,509][1652475] Updated weights for policy 0, policy_version 614422 (0.0012) [2024-06-15 19:35:32,868][1651340] Signal inference workers to stop experience collection... (31600 times) [2024-06-15 19:35:32,916][1652475] InferenceWorker_p0-w0: stopping experience collection (31600 times) [2024-06-15 19:35:33,156][1651340] Signal inference workers to resume experience collection... 
(31600 times) [2024-06-15 19:35:33,156][1652475] InferenceWorker_p0-w0: resuming experience collection (31600 times) [2024-06-15 19:35:33,830][1652475] Updated weights for policy 0, policy_version 614520 (0.0098) [2024-06-15 19:35:35,738][1648984] Fps is (10 sec: 45875.5, 60 sec: 42598.4, 300 sec: 43098.3). Total num frames: 1258618880. Throughput: 0: 10535.8. Samples: 314698240. Policy #0 lag: (min: 15.0, avg: 94.0, max: 271.0) [2024-06-15 19:35:35,738][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 19:35:36,355][1652475] Updated weights for policy 0, policy_version 614586 (0.0013) [2024-06-15 19:35:40,738][1648984] Fps is (10 sec: 52600.1, 60 sec: 43690.6, 300 sec: 43542.5). Total num frames: 1258815488. Throughput: 0: 10251.4. Samples: 314754048. Policy #0 lag: (min: 15.0, avg: 94.0, max: 271.0) [2024-06-15 19:35:40,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 19:35:43,487][1652475] Updated weights for policy 0, policy_version 614672 (0.0038) [2024-06-15 19:35:45,592][1652475] Updated weights for policy 0, policy_version 614736 (0.0016) [2024-06-15 19:35:45,739][1648984] Fps is (10 sec: 36041.3, 60 sec: 40959.3, 300 sec: 43209.2). Total num frames: 1258979328. Throughput: 0: 10615.2. Samples: 314826240. Policy #0 lag: (min: 15.0, avg: 94.0, max: 271.0) [2024-06-15 19:35:45,739][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 19:35:47,982][1652475] Updated weights for policy 0, policy_version 614786 (0.0011) [2024-06-15 19:35:49,603][1652475] Updated weights for policy 0, policy_version 614850 (0.0167) [2024-06-15 19:35:50,738][1648984] Fps is (10 sec: 49152.6, 60 sec: 43144.6, 300 sec: 43431.5). Total num frames: 1259307008. Throughput: 0: 10638.2. Samples: 314856960. 
Policy #0 lag: (min: 15.0, avg: 94.0, max: 271.0) [2024-06-15 19:35:50,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 19:35:50,966][1652475] Updated weights for policy 0, policy_version 614906 (0.0011) [2024-06-15 19:35:55,738][1648984] Fps is (10 sec: 36047.9, 60 sec: 40960.0, 300 sec: 42653.9). Total num frames: 1259339776. Throughput: 0: 10547.2. Samples: 314916864. Policy #0 lag: (min: 15.0, avg: 94.0, max: 271.0) [2024-06-15 19:35:55,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 19:35:56,816][1652475] Updated weights for policy 0, policy_version 614960 (0.0034) [2024-06-15 19:35:59,874][1652475] Updated weights for policy 0, policy_version 615024 (0.0106) [2024-06-15 19:36:00,744][1648984] Fps is (10 sec: 32747.7, 60 sec: 42047.9, 300 sec: 43209.7). Total num frames: 1259634688. Throughput: 0: 10649.2. Samples: 314982400. Policy #0 lag: (min: 15.0, avg: 94.0, max: 271.0) [2024-06-15 19:36:00,745][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 19:36:01,748][1652475] Updated weights for policy 0, policy_version 615090 (0.0012) [2024-06-15 19:36:03,745][1652475] Updated weights for policy 0, policy_version 615168 (0.0012) [2024-06-15 19:36:05,738][1648984] Fps is (10 sec: 52429.3, 60 sec: 43690.7, 300 sec: 43320.5). Total num frames: 1259864064. Throughput: 0: 10262.8. Samples: 315001856. Policy #0 lag: (min: 15.0, avg: 94.0, max: 271.0) [2024-06-15 19:36:05,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 19:36:10,738][1648984] Fps is (10 sec: 22951.7, 60 sec: 38229.3, 300 sec: 42209.6). Total num frames: 1259864064. Throughput: 0: 10535.8. Samples: 315072000. 
Policy #0 lag: (min: 15.0, avg: 94.0, max: 271.0) [2024-06-15 19:36:10,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 19:36:12,576][1652475] Updated weights for policy 0, policy_version 615248 (0.0017) [2024-06-15 19:36:14,923][1652475] Updated weights for policy 0, policy_version 615330 (0.0107) [2024-06-15 19:36:15,738][1648984] Fps is (10 sec: 39320.2, 60 sec: 41509.0, 300 sec: 43098.2). Total num frames: 1260257280. Throughput: 0: 10019.6. Samples: 315120128. Policy #0 lag: (min: 15.0, avg: 94.0, max: 271.0) [2024-06-15 19:36:15,739][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 19:36:16,971][1652475] Updated weights for policy 0, policy_version 615408 (0.0107) [2024-06-15 19:36:20,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 41506.1, 300 sec: 42653.9). Total num frames: 1260388352. Throughput: 0: 9944.2. Samples: 315145728. Policy #0 lag: (min: 15.0, avg: 94.0, max: 271.0) [2024-06-15 19:36:20,739][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 19:36:23,996][1651340] Signal inference workers to stop experience collection... (31650 times) [2024-06-15 19:36:24,038][1652475] InferenceWorker_p0-w0: stopping experience collection (31650 times) [2024-06-15 19:36:24,278][1651340] Signal inference workers to resume experience collection... (31650 times) [2024-06-15 19:36:24,278][1652475] InferenceWorker_p0-w0: resuming experience collection (31650 times) [2024-06-15 19:36:25,244][1652475] Updated weights for policy 0, policy_version 615472 (0.0013) [2024-06-15 19:36:25,738][1648984] Fps is (10 sec: 26215.1, 60 sec: 39321.6, 300 sec: 42209.6). Total num frames: 1260519424. Throughput: 0: 10365.2. Samples: 315220480. 
Policy #0 lag: (min: 15.0, avg: 94.0, max: 271.0) [2024-06-15 19:36:25,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 19:36:26,814][1652475] Updated weights for policy 0, policy_version 615536 (0.0012) [2024-06-15 19:36:28,713][1652475] Updated weights for policy 0, policy_version 615614 (0.0013) [2024-06-15 19:36:30,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 43714.4, 300 sec: 42876.1). Total num frames: 1260912640. Throughput: 0: 9807.8. Samples: 315267584. Policy #0 lag: (min: 15.0, avg: 94.0, max: 271.0) [2024-06-15 19:36:30,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 19:36:35,661][1652475] Updated weights for policy 0, policy_version 615681 (0.0142) [2024-06-15 19:36:35,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 38229.3, 300 sec: 42209.6). Total num frames: 1260912640. Throughput: 0: 10023.8. Samples: 315308032. Policy #0 lag: (min: 15.0, avg: 94.0, max: 271.0) [2024-06-15 19:36:35,741][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 19:36:37,411][1652475] Updated weights for policy 0, policy_version 615747 (0.0013) [2024-06-15 19:36:39,251][1652475] Updated weights for policy 0, policy_version 615811 (0.0013) [2024-06-15 19:36:40,619][1652475] Updated weights for policy 0, policy_version 615871 (0.0014) [2024-06-15 19:36:40,738][1648984] Fps is (10 sec: 39321.2, 60 sec: 41506.1, 300 sec: 43098.2). Total num frames: 1261305856. Throughput: 0: 10023.8. Samples: 315367936. Policy #0 lag: (min: 15.0, avg: 94.0, max: 271.0) [2024-06-15 19:36:40,738][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 19:36:43,488][1652475] Updated weights for policy 0, policy_version 615920 (0.0012) [2024-06-15 19:36:45,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 40960.6, 300 sec: 42653.9). Total num frames: 1261436928. Throughput: 0: 10173.1. Samples: 315440128. 
Policy #0 lag: (min: 15.0, avg: 94.0, max: 271.0) [2024-06-15 19:36:45,738][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 19:36:47,479][1652475] Updated weights for policy 0, policy_version 615975 (0.0021) [2024-06-15 19:36:49,348][1652475] Updated weights for policy 0, policy_version 616032 (0.0016) [2024-06-15 19:36:50,738][1648984] Fps is (10 sec: 39322.4, 60 sec: 39867.7, 300 sec: 42654.0). Total num frames: 1261699072. Throughput: 0: 10478.9. Samples: 315473408. Policy #0 lag: (min: 15.0, avg: 94.0, max: 271.0) [2024-06-15 19:36:50,738][1648984] Avg episode reward: [(0, '-0.250')] [2024-06-15 19:36:51,270][1652475] Updated weights for policy 0, policy_version 616096 (0.0024) [2024-06-15 19:36:54,155][1652475] Updated weights for policy 0, policy_version 616144 (0.0012) [2024-06-15 19:36:55,236][1652475] Updated weights for policy 0, policy_version 616192 (0.0012) [2024-06-15 19:36:55,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 43690.7, 300 sec: 42876.1). Total num frames: 1261961216. Throughput: 0: 10376.5. Samples: 315538944. Policy #0 lag: (min: 15.0, avg: 141.3, max: 271.0) [2024-06-15 19:36:55,738][1648984] Avg episode reward: [(0, '-0.250')] [2024-06-15 19:36:55,745][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000616192_1261961216.pth... [2024-06-15 19:36:55,810][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000611248_1251835904.pth [2024-06-15 19:36:59,478][1652475] Updated weights for policy 0, policy_version 616254 (0.0019) [2024-06-15 19:37:00,738][1648984] Fps is (10 sec: 39321.3, 60 sec: 40964.2, 300 sec: 42209.6). Total num frames: 1262092288. Throughput: 0: 10854.5. Samples: 315608576. 
Policy #0 lag: (min: 15.0, avg: 141.3, max: 271.0) [2024-06-15 19:37:00,738][1648984] Avg episode reward: [(0, '-0.250')] [2024-06-15 19:37:02,008][1652475] Updated weights for policy 0, policy_version 616315 (0.0013) [2024-06-15 19:37:05,363][1652475] Updated weights for policy 0, policy_version 616385 (0.0016) [2024-06-15 19:37:05,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 42052.2, 300 sec: 42765.0). Total num frames: 1262387200. Throughput: 0: 10922.7. Samples: 315637248. Policy #0 lag: (min: 15.0, avg: 141.3, max: 271.0) [2024-06-15 19:37:05,738][1648984] Avg episode reward: [(0, '-0.250')] [2024-06-15 19:37:05,833][1651340] Signal inference workers to stop experience collection... (31700 times) [2024-06-15 19:37:05,865][1652475] InferenceWorker_p0-w0: stopping experience collection (31700 times) [2024-06-15 19:37:06,094][1651340] Signal inference workers to resume experience collection... (31700 times) [2024-06-15 19:37:06,102][1652475] InferenceWorker_p0-w0: resuming experience collection (31700 times) [2024-06-15 19:37:09,723][1652475] Updated weights for policy 0, policy_version 616450 (0.0105) [2024-06-15 19:37:10,738][1648984] Fps is (10 sec: 45875.2, 60 sec: 44782.9, 300 sec: 42431.8). Total num frames: 1262551040. Throughput: 0: 10843.0. Samples: 315708416. Policy #0 lag: (min: 15.0, avg: 141.3, max: 271.0) [2024-06-15 19:37:10,738][1648984] Avg episode reward: [(0, '-0.250')] [2024-06-15 19:37:12,442][1652475] Updated weights for policy 0, policy_version 616514 (0.0011) [2024-06-15 19:37:13,747][1652475] Updated weights for policy 0, policy_version 616573 (0.0013) [2024-06-15 19:37:15,401][1652475] Updated weights for policy 0, policy_version 616640 (0.0014) [2024-06-15 19:37:15,738][1648984] Fps is (10 sec: 49152.2, 60 sec: 43690.9, 300 sec: 42987.2). Total num frames: 1262878720. Throughput: 0: 11218.5. Samples: 315772416. 
Policy #0 lag: (min: 15.0, avg: 141.3, max: 271.0) [2024-06-15 19:37:15,738][1648984] Avg episode reward: [(0, '-0.250')] [2024-06-15 19:37:17,681][1652475] Updated weights for policy 0, policy_version 616701 (0.0014) [2024-06-15 19:37:20,738][1648984] Fps is (10 sec: 45875.4, 60 sec: 43690.7, 300 sec: 42654.3). Total num frames: 1263009792. Throughput: 0: 11116.1. Samples: 315808256. Policy #0 lag: (min: 15.0, avg: 141.3, max: 271.0) [2024-06-15 19:37:20,738][1648984] Avg episode reward: [(0, '-0.250')] [2024-06-15 19:37:22,055][1652475] Updated weights for policy 0, policy_version 616768 (0.0136) [2024-06-15 19:37:25,466][1652475] Updated weights for policy 0, policy_version 616832 (0.0013) [2024-06-15 19:37:25,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 45875.2, 300 sec: 42653.9). Total num frames: 1263271936. Throughput: 0: 11446.1. Samples: 315883008. Policy #0 lag: (min: 15.0, avg: 141.3, max: 271.0) [2024-06-15 19:37:25,738][1648984] Avg episode reward: [(0, '-0.250')] [2024-06-15 19:37:26,999][1652475] Updated weights for policy 0, policy_version 616892 (0.0015) [2024-06-15 19:37:28,707][1652475] Updated weights for policy 0, policy_version 616929 (0.0012) [2024-06-15 19:37:30,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 43690.7, 300 sec: 43098.2). Total num frames: 1263534080. Throughput: 0: 11343.6. Samples: 315950592. Policy #0 lag: (min: 15.0, avg: 141.3, max: 271.0) [2024-06-15 19:37:30,738][1648984] Avg episode reward: [(0, '-0.250')] [2024-06-15 19:37:32,298][1652475] Updated weights for policy 0, policy_version 616992 (0.0015) [2024-06-15 19:37:35,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 45875.2, 300 sec: 42542.9). Total num frames: 1263665152. Throughput: 0: 11309.5. Samples: 315982336. 
Policy #0 lag: (min: 15.0, avg: 141.3, max: 271.0) [2024-06-15 19:37:35,739][1648984] Avg episode reward: [(0, '-0.250')] [2024-06-15 19:37:36,197][1652475] Updated weights for policy 0, policy_version 617040 (0.0018) [2024-06-15 19:37:37,403][1652475] Updated weights for policy 0, policy_version 617092 (0.0015) [2024-06-15 19:37:39,909][1652475] Updated weights for policy 0, policy_version 617168 (0.0091) [2024-06-15 19:37:40,738][1648984] Fps is (10 sec: 49150.6, 60 sec: 45328.9, 300 sec: 42987.1). Total num frames: 1264025600. Throughput: 0: 11423.2. Samples: 316052992. Policy #0 lag: (min: 15.0, avg: 141.3, max: 271.0) [2024-06-15 19:37:40,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 19:37:43,630][1652475] Updated weights for policy 0, policy_version 617232 (0.0122) [2024-06-15 19:37:45,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 45875.2, 300 sec: 42876.1). Total num frames: 1264189440. Throughput: 0: 11207.1. Samples: 316112896. Policy #0 lag: (min: 15.0, avg: 141.3, max: 271.0) [2024-06-15 19:37:45,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 19:37:48,928][1652475] Updated weights for policy 0, policy_version 617304 (0.0012) [2024-06-15 19:37:50,413][1652475] Updated weights for policy 0, policy_version 617360 (0.0013) [2024-06-15 19:37:50,738][1648984] Fps is (10 sec: 32769.0, 60 sec: 44236.7, 300 sec: 42654.0). Total num frames: 1264353280. Throughput: 0: 11377.8. Samples: 316149248. Policy #0 lag: (min: 15.0, avg: 141.3, max: 271.0) [2024-06-15 19:37:50,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 19:37:52,057][1652475] Updated weights for policy 0, policy_version 617424 (0.0031) [2024-06-15 19:37:52,162][1651340] Signal inference workers to stop experience collection... (31750 times) [2024-06-15 19:37:52,208][1652475] InferenceWorker_p0-w0: stopping experience collection (31750 times) [2024-06-15 19:37:52,348][1651340] Signal inference workers to resume experience collection... 
(31750 times) [2024-06-15 19:37:52,366][1652475] InferenceWorker_p0-w0: resuming experience collection (31750 times) [2024-06-15 19:37:55,644][1652475] Updated weights for policy 0, policy_version 617476 (0.0013) [2024-06-15 19:37:55,738][1648984] Fps is (10 sec: 39320.7, 60 sec: 43690.5, 300 sec: 42653.9). Total num frames: 1264582656. Throughput: 0: 11116.0. Samples: 316208640. Policy #0 lag: (min: 15.0, avg: 141.3, max: 271.0) [2024-06-15 19:37:55,739][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 19:38:00,328][1652475] Updated weights for policy 0, policy_version 617552 (0.0013) [2024-06-15 19:38:00,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 44782.9, 300 sec: 42654.8). Total num frames: 1264779264. Throughput: 0: 11229.9. Samples: 316277760. Policy #0 lag: (min: 15.0, avg: 141.3, max: 271.0) [2024-06-15 19:38:00,738][1648984] Avg episode reward: [(0, '-0.210')] [2024-06-15 19:38:01,964][1652475] Updated weights for policy 0, policy_version 617616 (0.0017) [2024-06-15 19:38:03,066][1652475] Updated weights for policy 0, policy_version 617663 (0.0012) [2024-06-15 19:38:05,383][1652475] Updated weights for policy 0, policy_version 617726 (0.0013) [2024-06-15 19:38:05,738][1648984] Fps is (10 sec: 52430.0, 60 sec: 45329.1, 300 sec: 43098.2). Total num frames: 1265106944. Throughput: 0: 11047.8. Samples: 316305408. Policy #0 lag: (min: 15.0, avg: 141.3, max: 271.0) [2024-06-15 19:38:05,752][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 19:38:09,429][1652475] Updated weights for policy 0, policy_version 617780 (0.0016) [2024-06-15 19:38:10,750][1648984] Fps is (10 sec: 45819.2, 60 sec: 44773.8, 300 sec: 42874.3). Total num frames: 1265238016. Throughput: 0: 10987.9. Samples: 316377600. 
Policy #0 lag: (min: 15.0, avg: 141.3, max: 271.0) [2024-06-15 19:38:10,751][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 19:38:12,235][1652475] Updated weights for policy 0, policy_version 617840 (0.0013) [2024-06-15 19:38:14,297][1652475] Updated weights for policy 0, policy_version 617904 (0.0011) [2024-06-15 19:38:15,738][1648984] Fps is (10 sec: 39321.9, 60 sec: 43690.7, 300 sec: 42987.2). Total num frames: 1265500160. Throughput: 0: 10899.9. Samples: 316441088. Policy #0 lag: (min: 15.0, avg: 141.3, max: 271.0) [2024-06-15 19:38:15,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 19:38:16,165][1652475] Updated weights for policy 0, policy_version 617952 (0.0011) [2024-06-15 19:38:20,738][1648984] Fps is (10 sec: 42651.0, 60 sec: 44236.9, 300 sec: 42765.1). Total num frames: 1265664000. Throughput: 0: 10877.2. Samples: 316471808. Policy #0 lag: (min: 15.0, avg: 141.3, max: 271.0) [2024-06-15 19:38:20,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 19:38:20,883][1652475] Updated weights for policy 0, policy_version 618003 (0.0014) [2024-06-15 19:38:24,405][1652475] Updated weights for policy 0, policy_version 618080 (0.0014) [2024-06-15 19:38:25,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 43690.7, 300 sec: 43098.6). Total num frames: 1265893376. Throughput: 0: 10900.0. Samples: 316543488. Policy #0 lag: (min: 15.0, avg: 141.3, max: 271.0) [2024-06-15 19:38:25,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 19:38:26,055][1652475] Updated weights for policy 0, policy_version 618132 (0.0014) [2024-06-15 19:38:28,105][1652475] Updated weights for policy 0, policy_version 618224 (0.0014) [2024-06-15 19:38:30,738][1648984] Fps is (10 sec: 49149.9, 60 sec: 43690.4, 300 sec: 42876.0). Total num frames: 1266155520. Throughput: 0: 10922.6. Samples: 316604416. 
Policy #0 lag: (min: 15.0, avg: 141.3, max: 271.0) [2024-06-15 19:38:30,739][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 19:38:32,119][1652475] Updated weights for policy 0, policy_version 618241 (0.0033) [2024-06-15 19:38:33,543][1652475] Updated weights for policy 0, policy_version 618304 (0.0022) [2024-06-15 19:38:35,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 43690.7, 300 sec: 42876.1). Total num frames: 1266286592. Throughput: 0: 10865.8. Samples: 316638208. Policy #0 lag: (min: 15.0, avg: 105.5, max: 271.0) [2024-06-15 19:38:35,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 19:38:38,990][1652475] Updated weights for policy 0, policy_version 618402 (0.0016) [2024-06-15 19:38:39,299][1651340] Signal inference workers to stop experience collection... (31800 times) [2024-06-15 19:38:39,377][1652475] InferenceWorker_p0-w0: stopping experience collection (31800 times) [2024-06-15 19:38:39,651][1651340] Signal inference workers to resume experience collection... (31800 times) [2024-06-15 19:38:39,655][1652475] InferenceWorker_p0-w0: resuming experience collection (31800 times) [2024-06-15 19:38:40,738][1648984] Fps is (10 sec: 45876.8, 60 sec: 43144.8, 300 sec: 42987.2). Total num frames: 1266614272. Throughput: 0: 10854.5. Samples: 316697088. Policy #0 lag: (min: 15.0, avg: 105.5, max: 271.0) [2024-06-15 19:38:40,738][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 19:38:40,966][1652475] Updated weights for policy 0, policy_version 618480 (0.0132) [2024-06-15 19:38:44,727][1652475] Updated weights for policy 0, policy_version 618513 (0.0012) [2024-06-15 19:38:45,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 43690.7, 300 sec: 43098.3). Total num frames: 1266810880. Throughput: 0: 10786.1. Samples: 316763136. 
Policy #0 lag: (min: 15.0, avg: 105.5, max: 271.0) [2024-06-15 19:38:45,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 19:38:49,267][1652475] Updated weights for policy 0, policy_version 618576 (0.0081) [2024-06-15 19:38:50,201][1652475] Updated weights for policy 0, policy_version 618624 (0.0015) [2024-06-15 19:38:50,738][1648984] Fps is (10 sec: 32767.8, 60 sec: 43144.5, 300 sec: 42653.9). Total num frames: 1266941952. Throughput: 0: 11013.7. Samples: 316801024. Policy #0 lag: (min: 15.0, avg: 105.5, max: 271.0) [2024-06-15 19:38:50,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 19:38:52,202][1652475] Updated weights for policy 0, policy_version 618690 (0.0013) [2024-06-15 19:38:53,486][1652475] Updated weights for policy 0, policy_version 618747 (0.0011) [2024-06-15 19:38:55,738][1648984] Fps is (10 sec: 39320.5, 60 sec: 43690.6, 300 sec: 42653.9). Total num frames: 1267204096. Throughput: 0: 10686.6. Samples: 316858368. Policy #0 lag: (min: 15.0, avg: 105.5, max: 271.0) [2024-06-15 19:38:55,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 19:38:55,777][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000618752_1267204096.pth... [2024-06-15 19:38:55,867][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000613744_1256947712.pth [2024-06-15 19:39:00,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 42598.4, 300 sec: 42987.2). Total num frames: 1267335168. Throughput: 0: 10945.4. Samples: 316933632. Policy #0 lag: (min: 15.0, avg: 105.5, max: 271.0) [2024-06-15 19:39:00,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 19:39:00,996][1652475] Updated weights for policy 0, policy_version 618833 (0.0015) [2024-06-15 19:39:03,099][1652475] Updated weights for policy 0, policy_version 618896 (0.0013) [2024-06-15 19:39:05,738][1648984] Fps is (10 sec: 49151.9, 60 sec: 43144.3, 300 sec: 43209.3). Total num frames: 1267695616. 
Throughput: 0: 10899.8. Samples: 316962304. Policy #0 lag: (min: 15.0, avg: 105.5, max: 271.0) [2024-06-15 19:39:05,739][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 19:39:05,891][1652475] Updated weights for policy 0, policy_version 619008 (0.0134) [2024-06-15 19:39:09,650][1652475] Updated weights for policy 0, policy_version 619072 (0.0013) [2024-06-15 19:39:10,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 43699.5, 300 sec: 43098.2). Total num frames: 1267859456. Throughput: 0: 10558.6. Samples: 317018624. Policy #0 lag: (min: 15.0, avg: 105.5, max: 271.0) [2024-06-15 19:39:10,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 19:39:14,172][1652475] Updated weights for policy 0, policy_version 619127 (0.0016) [2024-06-15 19:39:15,652][1652475] Updated weights for policy 0, policy_version 619157 (0.0013) [2024-06-15 19:39:15,738][1648984] Fps is (10 sec: 32768.8, 60 sec: 42052.2, 300 sec: 42765.0). Total num frames: 1268023296. Throughput: 0: 10831.7. Samples: 317091840. Policy #0 lag: (min: 15.0, avg: 105.5, max: 271.0) [2024-06-15 19:39:15,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 19:39:17,476][1652475] Updated weights for policy 0, policy_version 619237 (0.0015) [2024-06-15 19:39:20,287][1652475] Updated weights for policy 0, policy_version 619297 (0.0013) [2024-06-15 19:39:20,739][1648984] Fps is (10 sec: 49147.8, 60 sec: 44782.2, 300 sec: 43098.1). Total num frames: 1268350976. Throughput: 0: 10706.3. Samples: 317120000. Policy #0 lag: (min: 15.0, avg: 105.5, max: 271.0) [2024-06-15 19:39:20,739][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 19:39:20,934][1652475] Updated weights for policy 0, policy_version 619328 (0.0013) [2024-06-15 19:39:25,400][1652475] Updated weights for policy 0, policy_version 619389 (0.0013) [2024-06-15 19:39:25,738][1648984] Fps is (10 sec: 49152.7, 60 sec: 43690.7, 300 sec: 43098.2). Total num frames: 1268514816. Throughput: 0: 11161.6. Samples: 317199360. 
Policy #0 lag: (min: 15.0, avg: 105.5, max: 271.0) [2024-06-15 19:39:25,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 19:39:27,174][1651340] Signal inference workers to stop experience collection... (31850 times) [2024-06-15 19:39:27,214][1652475] InferenceWorker_p0-w0: stopping experience collection (31850 times) [2024-06-15 19:39:27,485][1651340] Signal inference workers to resume experience collection... (31850 times) [2024-06-15 19:39:27,486][1652475] InferenceWorker_p0-w0: resuming experience collection (31850 times) [2024-06-15 19:39:28,676][1652475] Updated weights for policy 0, policy_version 619472 (0.0013) [2024-06-15 19:39:29,504][1652475] Updated weights for policy 0, policy_version 619518 (0.0011) [2024-06-15 19:39:30,738][1648984] Fps is (10 sec: 42602.3, 60 sec: 43690.9, 300 sec: 43098.2). Total num frames: 1268776960. Throughput: 0: 11013.7. Samples: 317258752. Policy #0 lag: (min: 15.0, avg: 105.5, max: 271.0) [2024-06-15 19:39:30,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 19:39:31,900][1652475] Updated weights for policy 0, policy_version 619575 (0.0013) [2024-06-15 19:39:35,738][1648984] Fps is (10 sec: 39320.9, 60 sec: 43690.6, 300 sec: 43098.2). Total num frames: 1268908032. Throughput: 0: 10990.9. Samples: 317295616. Policy #0 lag: (min: 15.0, avg: 105.5, max: 271.0) [2024-06-15 19:39:35,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 19:39:36,379][1652475] Updated weights for policy 0, policy_version 619618 (0.0013) [2024-06-15 19:39:39,009][1652475] Updated weights for policy 0, policy_version 619665 (0.0014) [2024-06-15 19:39:40,255][1652475] Updated weights for policy 0, policy_version 619718 (0.0013) [2024-06-15 19:39:40,739][1648984] Fps is (10 sec: 45868.4, 60 sec: 43689.6, 300 sec: 43098.0). Total num frames: 1269235712. Throughput: 0: 11275.1. Samples: 317365760. 
Policy #0 lag: (min: 15.0, avg: 105.5, max: 271.0) [2024-06-15 19:39:40,740][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 19:39:42,575][1652475] Updated weights for policy 0, policy_version 619794 (0.0013) [2024-06-15 19:39:45,738][1648984] Fps is (10 sec: 52429.5, 60 sec: 43690.7, 300 sec: 43098.3). Total num frames: 1269432320. Throughput: 0: 11093.4. Samples: 317432832. Policy #0 lag: (min: 15.0, avg: 105.5, max: 271.0) [2024-06-15 19:39:45,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 19:39:47,430][1652475] Updated weights for policy 0, policy_version 619842 (0.0012) [2024-06-15 19:39:48,921][1652475] Updated weights for policy 0, policy_version 619900 (0.0012) [2024-06-15 19:39:50,738][1648984] Fps is (10 sec: 39327.6, 60 sec: 44783.0, 300 sec: 43209.3). Total num frames: 1269628928. Throughput: 0: 11184.4. Samples: 317465600. Policy #0 lag: (min: 15.0, avg: 105.5, max: 271.0) [2024-06-15 19:39:50,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 19:39:51,074][1652475] Updated weights for policy 0, policy_version 619957 (0.0014) [2024-06-15 19:39:52,200][1652475] Updated weights for policy 0, policy_version 620026 (0.0014) [2024-06-15 19:39:55,123][1652475] Updated weights for policy 0, policy_version 620087 (0.0048) [2024-06-15 19:39:55,738][1648984] Fps is (10 sec: 52428.3, 60 sec: 45875.4, 300 sec: 43542.5). Total num frames: 1269956608. Throughput: 0: 11389.2. Samples: 317531136. Policy #0 lag: (min: 15.0, avg: 105.5, max: 271.0) [2024-06-15 19:39:55,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 19:40:00,739][1648984] Fps is (10 sec: 32764.4, 60 sec: 43690.0, 300 sec: 43098.1). Total num frames: 1269956608. Throughput: 0: 11354.8. Samples: 317602816. 
Policy #0 lag: (min: 15.0, avg: 105.5, max: 271.0) [2024-06-15 19:40:00,739][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 19:40:02,047][1652475] Updated weights for policy 0, policy_version 620160 (0.0014) [2024-06-15 19:40:03,775][1652475] Updated weights for policy 0, policy_version 620240 (0.0013) [2024-06-15 19:40:05,738][1648984] Fps is (10 sec: 39322.0, 60 sec: 44237.1, 300 sec: 43320.4). Total num frames: 1270349824. Throughput: 0: 11287.0. Samples: 317627904. Policy #0 lag: (min: 15.0, avg: 105.5, max: 271.0) [2024-06-15 19:40:05,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 19:40:06,259][1652475] Updated weights for policy 0, policy_version 620289 (0.0013) [2024-06-15 19:40:07,650][1652475] Updated weights for policy 0, policy_version 620352 (0.0012) [2024-06-15 19:40:10,738][1648984] Fps is (10 sec: 52434.1, 60 sec: 43690.7, 300 sec: 43098.9). Total num frames: 1270480896. Throughput: 0: 10922.6. Samples: 317690880. Policy #0 lag: (min: 59.0, avg: 212.4, max: 315.0) [2024-06-15 19:40:10,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 19:40:12,618][1651340] Signal inference workers to stop experience collection... (31900 times) [2024-06-15 19:40:12,648][1652475] InferenceWorker_p0-w0: stopping experience collection (31900 times) [2024-06-15 19:40:12,958][1651340] Signal inference workers to resume experience collection... (31900 times) [2024-06-15 19:40:12,959][1652475] InferenceWorker_p0-w0: resuming experience collection (31900 times) [2024-06-15 19:40:13,910][1652475] Updated weights for policy 0, policy_version 620420 (0.0124) [2024-06-15 19:40:15,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 45329.2, 300 sec: 43542.6). Total num frames: 1270743040. Throughput: 0: 10922.7. Samples: 317750272. 
Policy #0 lag: (min: 59.0, avg: 212.4, max: 315.0) [2024-06-15 19:40:15,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 19:40:16,946][1652475] Updated weights for policy 0, policy_version 620528 (0.0015) [2024-06-15 19:40:20,495][1652475] Updated weights for policy 0, policy_version 620576 (0.0014) [2024-06-15 19:40:20,738][1648984] Fps is (10 sec: 45874.3, 60 sec: 43145.0, 300 sec: 43320.4). Total num frames: 1270939648. Throughput: 0: 10786.1. Samples: 317780992. Policy #0 lag: (min: 59.0, avg: 212.4, max: 315.0) [2024-06-15 19:40:20,739][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 19:40:25,219][1652475] Updated weights for policy 0, policy_version 620656 (0.0128) [2024-06-15 19:40:25,738][1648984] Fps is (10 sec: 39318.0, 60 sec: 43690.0, 300 sec: 43547.3). Total num frames: 1271136256. Throughput: 0: 10752.2. Samples: 317849600. Policy #0 lag: (min: 59.0, avg: 212.4, max: 315.0) [2024-06-15 19:40:25,739][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 19:40:27,140][1652475] Updated weights for policy 0, policy_version 620734 (0.0049) [2024-06-15 19:40:30,738][1648984] Fps is (10 sec: 42599.3, 60 sec: 43144.5, 300 sec: 43209.3). Total num frames: 1271365632. Throughput: 0: 10410.7. Samples: 317901312. Policy #0 lag: (min: 59.0, avg: 212.4, max: 315.0) [2024-06-15 19:40:30,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 19:40:30,796][1652475] Updated weights for policy 0, policy_version 620797 (0.0021) [2024-06-15 19:40:35,159][1652475] Updated weights for policy 0, policy_version 620864 (0.0016) [2024-06-15 19:40:35,738][1648984] Fps is (10 sec: 39324.9, 60 sec: 43690.8, 300 sec: 43098.3). Total num frames: 1271529472. Throughput: 0: 10478.9. Samples: 317937152. 
Policy #0 lag: (min: 59.0, avg: 212.4, max: 315.0) [2024-06-15 19:40:35,740][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 19:40:39,456][1652475] Updated weights for policy 0, policy_version 620946 (0.0128) [2024-06-15 19:40:40,741][1648984] Fps is (10 sec: 42585.7, 60 sec: 42597.3, 300 sec: 43431.2). Total num frames: 1271791616. Throughput: 0: 10364.5. Samples: 317997568. Policy #0 lag: (min: 59.0, avg: 212.4, max: 315.0) [2024-06-15 19:40:40,741][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 19:40:42,287][1652475] Updated weights for policy 0, policy_version 621024 (0.0013) [2024-06-15 19:40:45,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 41506.1, 300 sec: 42765.0). Total num frames: 1271922688. Throughput: 0: 10183.3. Samples: 318061056. Policy #0 lag: (min: 59.0, avg: 212.4, max: 315.0) [2024-06-15 19:40:45,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 19:40:46,265][1652475] Updated weights for policy 0, policy_version 621072 (0.0032) [2024-06-15 19:40:48,927][1652475] Updated weights for policy 0, policy_version 621122 (0.0014) [2024-06-15 19:40:50,738][1648984] Fps is (10 sec: 39333.6, 60 sec: 42598.4, 300 sec: 43542.6). Total num frames: 1272184832. Throughput: 0: 10319.6. Samples: 318092288. Policy #0 lag: (min: 59.0, avg: 212.4, max: 315.0) [2024-06-15 19:40:50,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 19:40:53,855][1652475] Updated weights for policy 0, policy_version 621221 (0.0013) [2024-06-15 19:40:55,738][1648984] Fps is (10 sec: 49151.1, 60 sec: 40959.9, 300 sec: 43321.3). Total num frames: 1272414208. Throughput: 0: 10240.0. Samples: 318151680. 
Policy #0 lag: (min: 59.0, avg: 212.4, max: 315.0) [2024-06-15 19:40:55,739][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 19:40:55,761][1652475] Updated weights for policy 0, policy_version 621303 (0.0014) [2024-06-15 19:40:55,923][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000621312_1272446976.pth... [2024-06-15 19:40:56,002][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000616192_1261961216.pth [2024-06-15 19:40:59,026][1652475] Updated weights for policy 0, policy_version 621348 (0.0012) [2024-06-15 19:41:00,738][1648984] Fps is (10 sec: 39320.6, 60 sec: 43691.3, 300 sec: 43098.2). Total num frames: 1272578048. Throughput: 0: 10239.9. Samples: 318211072. Policy #0 lag: (min: 59.0, avg: 212.4, max: 315.0) [2024-06-15 19:41:00,739][1648984] Avg episode reward: [(0, '-0.520')] [2024-06-15 19:41:02,971][1651340] Signal inference workers to stop experience collection... (31950 times) [2024-06-15 19:41:03,032][1652475] InferenceWorker_p0-w0: stopping experience collection (31950 times) [2024-06-15 19:41:03,233][1651340] Signal inference workers to resume experience collection... (31950 times) [2024-06-15 19:41:03,234][1652475] InferenceWorker_p0-w0: resuming experience collection (31950 times) [2024-06-15 19:41:03,702][1652475] Updated weights for policy 0, policy_version 621408 (0.0018) [2024-06-15 19:41:05,430][1652475] Updated weights for policy 0, policy_version 621456 (0.0023) [2024-06-15 19:41:05,738][1648984] Fps is (10 sec: 32768.5, 60 sec: 39867.7, 300 sec: 43653.6). Total num frames: 1272741888. Throughput: 0: 10331.1. Samples: 318245888. 
Policy #0 lag: (min: 59.0, avg: 212.4, max: 315.0) [2024-06-15 19:41:05,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 19:41:07,251][1652475] Updated weights for policy 0, policy_version 621520 (0.0013) [2024-06-15 19:41:09,392][1652475] Updated weights for policy 0, policy_version 621572 (0.0012) [2024-06-15 19:41:10,645][1652475] Updated weights for policy 0, policy_version 621627 (0.0014) [2024-06-15 19:41:10,740][1648984] Fps is (10 sec: 49152.8, 60 sec: 43144.5, 300 sec: 43431.5). Total num frames: 1273069568. Throughput: 0: 10217.4. Samples: 318309376. Policy #0 lag: (min: 59.0, avg: 212.4, max: 315.0) [2024-06-15 19:41:10,741][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 19:41:15,738][1648984] Fps is (10 sec: 36044.7, 60 sec: 39321.5, 300 sec: 43098.2). Total num frames: 1273102336. Throughput: 0: 10626.8. Samples: 318379520. Policy #0 lag: (min: 59.0, avg: 212.4, max: 315.0) [2024-06-15 19:41:15,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 19:41:16,963][1652475] Updated weights for policy 0, policy_version 621689 (0.0014) [2024-06-15 19:41:18,533][1652475] Updated weights for policy 0, policy_version 621753 (0.0097) [2024-06-15 19:41:20,023][1652475] Updated weights for policy 0, policy_version 621808 (0.0013) [2024-06-15 19:41:20,738][1648984] Fps is (10 sec: 42597.7, 60 sec: 42598.4, 300 sec: 43986.8). Total num frames: 1273495552. Throughput: 0: 10399.2. Samples: 318405120. Policy #0 lag: (min: 59.0, avg: 212.4, max: 315.0) [2024-06-15 19:41:20,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 19:41:21,657][1652475] Updated weights for policy 0, policy_version 621841 (0.0013) [2024-06-15 19:41:22,807][1652475] Updated weights for policy 0, policy_version 621888 (0.0014) [2024-06-15 19:41:25,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 41506.7, 300 sec: 43098.2). Total num frames: 1273626624. Throughput: 0: 10547.9. Samples: 318472192. 
Policy #0 lag: (min: 59.0, avg: 212.4, max: 315.0) [2024-06-15 19:41:25,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 19:41:28,719][1652475] Updated weights for policy 0, policy_version 621944 (0.0015) [2024-06-15 19:41:30,738][1648984] Fps is (10 sec: 36045.6, 60 sec: 41506.1, 300 sec: 43875.8). Total num frames: 1273856000. Throughput: 0: 10626.8. Samples: 318539264. Policy #0 lag: (min: 59.0, avg: 212.4, max: 315.0) [2024-06-15 19:41:30,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 19:41:31,224][1652475] Updated weights for policy 0, policy_version 622032 (0.0110) [2024-06-15 19:41:33,204][1652475] Updated weights for policy 0, policy_version 622083 (0.0012) [2024-06-15 19:41:34,636][1652475] Updated weights for policy 0, policy_version 622141 (0.0013) [2024-06-15 19:41:35,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 43690.6, 300 sec: 43542.6). Total num frames: 1274150912. Throughput: 0: 10592.7. Samples: 318568960. Policy #0 lag: (min: 59.0, avg: 212.4, max: 315.0) [2024-06-15 19:41:35,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 19:41:40,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 40962.0, 300 sec: 43431.5). Total num frames: 1274249216. Throughput: 0: 10843.1. Samples: 318639616. Policy #0 lag: (min: 59.0, avg: 212.4, max: 315.0) [2024-06-15 19:41:40,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 19:41:40,795][1652475] Updated weights for policy 0, policy_version 622198 (0.0140) [2024-06-15 19:41:42,315][1652475] Updated weights for policy 0, policy_version 622256 (0.0014) [2024-06-15 19:41:44,005][1652475] Updated weights for policy 0, policy_version 622307 (0.0012) [2024-06-15 19:41:45,186][1652475] Updated weights for policy 0, policy_version 622353 (0.0012) [2024-06-15 19:41:45,594][1651340] Signal inference workers to stop experience collection... 
(32000 times) [2024-06-15 19:41:45,651][1652475] InferenceWorker_p0-w0: stopping experience collection (32000 times) [2024-06-15 19:41:45,740][1648984] Fps is (10 sec: 45865.5, 60 sec: 44781.4, 300 sec: 43764.4). Total num frames: 1274609664. Throughput: 0: 10910.8. Samples: 318702080. Policy #0 lag: (min: 63.0, avg: 157.1, max: 319.0) [2024-06-15 19:41:45,740][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 19:41:45,962][1651340] Signal inference workers to resume experience collection... (32000 times) [2024-06-15 19:41:45,963][1652475] InferenceWorker_p0-w0: resuming experience collection (32000 times) [2024-06-15 19:41:46,237][1652475] Updated weights for policy 0, policy_version 622398 (0.0011) [2024-06-15 19:41:50,738][1648984] Fps is (10 sec: 42598.7, 60 sec: 41506.1, 300 sec: 43098.3). Total num frames: 1274675200. Throughput: 0: 10899.9. Samples: 318736384. Policy #0 lag: (min: 63.0, avg: 157.1, max: 319.0) [2024-06-15 19:41:50,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 19:41:52,093][1652475] Updated weights for policy 0, policy_version 622458 (0.0032) [2024-06-15 19:41:54,267][1652475] Updated weights for policy 0, policy_version 622519 (0.0013) [2024-06-15 19:41:55,738][1648984] Fps is (10 sec: 39330.0, 60 sec: 43144.7, 300 sec: 43764.7). Total num frames: 1275002880. Throughput: 0: 11036.5. Samples: 318806016. Policy #0 lag: (min: 63.0, avg: 157.1, max: 319.0) [2024-06-15 19:41:55,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 19:41:56,489][1652475] Updated weights for policy 0, policy_version 622592 (0.0013) [2024-06-15 19:41:57,907][1652475] Updated weights for policy 0, policy_version 622655 (0.0013) [2024-06-15 19:42:00,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 43690.9, 300 sec: 43431.5). Total num frames: 1275199488. Throughput: 0: 10843.0. Samples: 318867456. 
Policy #0 lag: (min: 63.0, avg: 157.1, max: 319.0) [2024-06-15 19:42:00,740][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 19:42:04,437][1652475] Updated weights for policy 0, policy_version 622719 (0.0014) [2024-06-15 19:42:05,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 44236.9, 300 sec: 43542.6). Total num frames: 1275396096. Throughput: 0: 11161.7. Samples: 318907392. Policy #0 lag: (min: 63.0, avg: 157.1, max: 319.0) [2024-06-15 19:42:05,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 19:42:07,259][1652475] Updated weights for policy 0, policy_version 622787 (0.0026) [2024-06-15 19:42:08,876][1652475] Updated weights for policy 0, policy_version 622848 (0.0012) [2024-06-15 19:42:10,196][1652475] Updated weights for policy 0, policy_version 622911 (0.0045) [2024-06-15 19:42:10,738][1648984] Fps is (10 sec: 52428.4, 60 sec: 44236.8, 300 sec: 43542.6). Total num frames: 1275723776. Throughput: 0: 10934.0. Samples: 318964224. Policy #0 lag: (min: 63.0, avg: 157.1, max: 319.0) [2024-06-15 19:42:10,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 19:42:15,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 44783.0, 300 sec: 43320.4). Total num frames: 1275789312. Throughput: 0: 11082.0. Samples: 319037952. Policy #0 lag: (min: 63.0, avg: 157.1, max: 319.0) [2024-06-15 19:42:15,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 19:42:16,099][1652475] Updated weights for policy 0, policy_version 622976 (0.0092) [2024-06-15 19:42:19,165][1652475] Updated weights for policy 0, policy_version 623041 (0.0106) [2024-06-15 19:42:20,738][1648984] Fps is (10 sec: 39321.3, 60 sec: 43690.8, 300 sec: 43542.5). Total num frames: 1276116992. Throughput: 0: 11036.4. Samples: 319065600. 
Policy #0 lag: (min: 63.0, avg: 157.1, max: 319.0) [2024-06-15 19:42:20,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 19:42:21,509][1652475] Updated weights for policy 0, policy_version 623139 (0.0012) [2024-06-15 19:42:25,738][1648984] Fps is (10 sec: 45873.3, 60 sec: 43690.4, 300 sec: 43098.2). Total num frames: 1276248064. Throughput: 0: 10842.9. Samples: 319127552. Policy #0 lag: (min: 63.0, avg: 157.1, max: 319.0) [2024-06-15 19:42:25,739][1648984] Avg episode reward: [(0, '-0.250')] [2024-06-15 19:42:29,021][1652475] Updated weights for policy 0, policy_version 623200 (0.0013) [2024-06-15 19:42:30,738][1648984] Fps is (10 sec: 32768.5, 60 sec: 43144.6, 300 sec: 43320.4). Total num frames: 1276444672. Throughput: 0: 10854.9. Samples: 319190528. Policy #0 lag: (min: 63.0, avg: 157.1, max: 319.0) [2024-06-15 19:42:30,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 19:42:31,088][1652475] Updated weights for policy 0, policy_version 623291 (0.0013) [2024-06-15 19:42:32,248][1651340] Signal inference workers to stop experience collection... (32050 times) [2024-06-15 19:42:32,298][1652475] InferenceWorker_p0-w0: stopping experience collection (32050 times) [2024-06-15 19:42:32,497][1651340] Signal inference workers to resume experience collection... (32050 times) [2024-06-15 19:42:32,499][1652475] InferenceWorker_p0-w0: resuming experience collection (32050 times) [2024-06-15 19:42:32,501][1652475] Updated weights for policy 0, policy_version 623344 (0.0015) [2024-06-15 19:42:34,644][1652475] Updated weights for policy 0, policy_version 623413 (0.0012) [2024-06-15 19:42:35,738][1648984] Fps is (10 sec: 52431.2, 60 sec: 43690.7, 300 sec: 43209.4). Total num frames: 1276772352. Throughput: 0: 10683.7. Samples: 319217152. 
Policy #0 lag: (min: 63.0, avg: 157.1, max: 319.0) [2024-06-15 19:42:35,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 19:42:40,596][1652475] Updated weights for policy 0, policy_version 623472 (0.0027) [2024-06-15 19:42:40,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 43690.7, 300 sec: 42987.2). Total num frames: 1276870656. Throughput: 0: 10752.0. Samples: 319289856. Policy #0 lag: (min: 63.0, avg: 157.1, max: 319.0) [2024-06-15 19:42:40,740][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 19:42:43,623][1652475] Updated weights for policy 0, policy_version 623552 (0.0087) [2024-06-15 19:42:44,811][1652475] Updated weights for policy 0, policy_version 623610 (0.0012) [2024-06-15 19:42:45,738][1648984] Fps is (10 sec: 39321.1, 60 sec: 42599.9, 300 sec: 43431.5). Total num frames: 1277165568. Throughput: 0: 10740.6. Samples: 319350784. Policy #0 lag: (min: 63.0, avg: 157.1, max: 319.0) [2024-06-15 19:42:45,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 19:42:47,485][1652475] Updated weights for policy 0, policy_version 623676 (0.0014) [2024-06-15 19:42:50,738][1648984] Fps is (10 sec: 42598.2, 60 sec: 43690.6, 300 sec: 43098.3). Total num frames: 1277296640. Throughput: 0: 10547.2. Samples: 319382016. Policy #0 lag: (min: 63.0, avg: 157.1, max: 319.0) [2024-06-15 19:42:50,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 19:42:54,898][1652475] Updated weights for policy 0, policy_version 623760 (0.0091) [2024-06-15 19:42:55,738][1648984] Fps is (10 sec: 36044.0, 60 sec: 42052.1, 300 sec: 43209.3). Total num frames: 1277526016. Throughput: 0: 10831.6. Samples: 319451648. Policy #0 lag: (min: 63.0, avg: 157.1, max: 319.0) [2024-06-15 19:42:55,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 19:42:56,273][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000623824_1277591552.pth... 
[2024-06-15 19:42:56,407][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000618752_1267204096.pth [2024-06-15 19:42:57,295][1652475] Updated weights for policy 0, policy_version 623859 (0.0088) [2024-06-15 19:42:59,967][1652475] Updated weights for policy 0, policy_version 623926 (0.0012) [2024-06-15 19:43:00,738][1648984] Fps is (10 sec: 52429.2, 60 sec: 43690.7, 300 sec: 43098.3). Total num frames: 1277820928. Throughput: 0: 10501.7. Samples: 319510528. Policy #0 lag: (min: 63.0, avg: 157.1, max: 319.0) [2024-06-15 19:43:00,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 19:43:03,526][1652475] Updated weights for policy 0, policy_version 623968 (0.0013) [2024-06-15 19:43:05,738][1648984] Fps is (10 sec: 42599.4, 60 sec: 42598.3, 300 sec: 43100.0). Total num frames: 1277952000. Throughput: 0: 10649.6. Samples: 319544832. Policy #0 lag: (min: 63.0, avg: 157.1, max: 319.0) [2024-06-15 19:43:05,739][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 19:43:07,076][1652475] Updated weights for policy 0, policy_version 624032 (0.0014) [2024-06-15 19:43:09,417][1652475] Updated weights for policy 0, policy_version 624080 (0.0013) [2024-06-15 19:43:10,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 41506.2, 300 sec: 43098.3). Total num frames: 1278214144. Throughput: 0: 10729.3. Samples: 319610368. Policy #0 lag: (min: 63.0, avg: 157.1, max: 319.0) [2024-06-15 19:43:10,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 19:43:11,884][1652475] Updated weights for policy 0, policy_version 624186 (0.0014) [2024-06-15 19:43:15,737][1648984] Fps is (10 sec: 39322.2, 60 sec: 42598.5, 300 sec: 42987.2). Total num frames: 1278345216. Throughput: 0: 10649.6. Samples: 319669760. 
Policy #0 lag: (min: 63.0, avg: 157.1, max: 319.0) [2024-06-15 19:43:15,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 19:43:17,446][1652475] Updated weights for policy 0, policy_version 624255 (0.0021) [2024-06-15 19:43:20,443][1652475] Updated weights for policy 0, policy_version 624319 (0.0017) [2024-06-15 19:43:20,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 41506.2, 300 sec: 43098.3). Total num frames: 1278607360. Throughput: 0: 10626.8. Samples: 319695360. Policy #0 lag: (min: 5.0, avg: 102.0, max: 261.0) [2024-06-15 19:43:20,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 19:43:22,682][1651340] Signal inference workers to stop experience collection... (32100 times) [2024-06-15 19:43:22,722][1652475] InferenceWorker_p0-w0: stopping experience collection (32100 times) [2024-06-15 19:43:22,955][1651340] Signal inference workers to resume experience collection... (32100 times) [2024-06-15 19:43:22,958][1652475] InferenceWorker_p0-w0: resuming experience collection (32100 times) [2024-06-15 19:43:24,300][1652475] Updated weights for policy 0, policy_version 624416 (0.0033) [2024-06-15 19:43:25,738][1648984] Fps is (10 sec: 52428.3, 60 sec: 43691.0, 300 sec: 43098.3). Total num frames: 1278869504. Throughput: 0: 10308.3. Samples: 319753728. Policy #0 lag: (min: 5.0, avg: 102.0, max: 261.0) [2024-06-15 19:43:25,740][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 19:43:29,881][1652475] Updated weights for policy 0, policy_version 624451 (0.0014) [2024-06-15 19:43:30,738][1648984] Fps is (10 sec: 32767.1, 60 sec: 41506.0, 300 sec: 42876.1). Total num frames: 1278935040. Throughput: 0: 10638.2. Samples: 319829504. 
Policy #0 lag: (min: 5.0, avg: 102.0, max: 261.0) [2024-06-15 19:43:30,739][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 19:43:31,485][1652475] Updated weights for policy 0, policy_version 624512 (0.0011) [2024-06-15 19:43:33,131][1652475] Updated weights for policy 0, policy_version 624572 (0.0013) [2024-06-15 19:43:35,750][1648984] Fps is (10 sec: 39272.1, 60 sec: 41497.4, 300 sec: 42874.3). Total num frames: 1279262720. Throughput: 0: 10419.1. Samples: 319851008. Policy #0 lag: (min: 5.0, avg: 102.0, max: 261.0) [2024-06-15 19:43:35,751][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 19:43:36,354][1652475] Updated weights for policy 0, policy_version 624641 (0.0028) [2024-06-15 19:43:40,738][1648984] Fps is (10 sec: 45875.3, 60 sec: 42052.1, 300 sec: 42653.9). Total num frames: 1279393792. Throughput: 0: 10331.0. Samples: 319916544. Policy #0 lag: (min: 5.0, avg: 102.0, max: 261.0) [2024-06-15 19:43:40,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 19:43:42,913][1652475] Updated weights for policy 0, policy_version 624720 (0.0025) [2024-06-15 19:43:44,805][1652475] Updated weights for policy 0, policy_version 624784 (0.0014) [2024-06-15 19:43:45,738][1648984] Fps is (10 sec: 36089.3, 60 sec: 40959.9, 300 sec: 42987.1). Total num frames: 1279623168. Throughput: 0: 10444.7. Samples: 319980544. Policy #0 lag: (min: 5.0, avg: 102.0, max: 261.0) [2024-06-15 19:43:45,739][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 19:43:45,867][1652475] Updated weights for policy 0, policy_version 624832 (0.0014) [2024-06-15 19:43:47,352][1652475] Updated weights for policy 0, policy_version 624891 (0.0069) [2024-06-15 19:43:49,575][1652475] Updated weights for policy 0, policy_version 624955 (0.0015) [2024-06-15 19:43:50,738][1648984] Fps is (10 sec: 52429.7, 60 sec: 43690.7, 300 sec: 43098.3). Total num frames: 1279918080. Throughput: 0: 10331.0. Samples: 320009728. 
Policy #0 lag: (min: 5.0, avg: 102.0, max: 261.0) [2024-06-15 19:43:50,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 19:43:55,738][1648984] Fps is (10 sec: 39322.1, 60 sec: 41506.3, 300 sec: 42987.2). Total num frames: 1280016384. Throughput: 0: 10581.3. Samples: 320086528. Policy #0 lag: (min: 5.0, avg: 102.0, max: 261.0) [2024-06-15 19:43:55,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 19:43:56,436][1652475] Updated weights for policy 0, policy_version 625046 (0.0018) [2024-06-15 19:43:57,882][1652475] Updated weights for policy 0, policy_version 625104 (0.0013) [2024-06-15 19:43:59,117][1652475] Updated weights for policy 0, policy_version 625152 (0.0012) [2024-06-15 19:44:00,744][1648984] Fps is (10 sec: 42572.6, 60 sec: 42048.0, 300 sec: 42875.3). Total num frames: 1280344064. Throughput: 0: 10523.0. Samples: 320143360. Policy #0 lag: (min: 5.0, avg: 102.0, max: 261.0) [2024-06-15 19:44:00,744][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 19:44:05,738][1648984] Fps is (10 sec: 42598.7, 60 sec: 41506.1, 300 sec: 42653.9). Total num frames: 1280442368. Throughput: 0: 10683.7. Samples: 320176128. Policy #0 lag: (min: 5.0, avg: 102.0, max: 261.0) [2024-06-15 19:44:05,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 19:44:06,693][1652475] Updated weights for policy 0, policy_version 625232 (0.0014) [2024-06-15 19:44:07,613][1651340] Signal inference workers to stop experience collection... (32150 times) [2024-06-15 19:44:07,660][1652475] InferenceWorker_p0-w0: stopping experience collection (32150 times) [2024-06-15 19:44:07,941][1651340] Signal inference workers to resume experience collection... (32150 times) [2024-06-15 19:44:07,944][1652475] InferenceWorker_p0-w0: resuming experience collection (32150 times) [2024-06-15 19:44:09,819][1652475] Updated weights for policy 0, policy_version 625344 (0.0015) [2024-06-15 19:44:10,739][1648984] Fps is (10 sec: 36066.4, 60 sec: 41506.1, 300 sec: 42987.2). 
Total num frames: 1280704512. Throughput: 0: 10695.1. Samples: 320235008. Policy #0 lag: (min: 5.0, avg: 102.0, max: 261.0) [2024-06-15 19:44:10,740][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 19:44:14,056][1652475] Updated weights for policy 0, policy_version 625442 (0.0099) [2024-06-15 19:44:15,738][1648984] Fps is (10 sec: 52429.1, 60 sec: 43690.6, 300 sec: 42765.2). Total num frames: 1280966656. Throughput: 0: 10433.5. Samples: 320299008. Policy #0 lag: (min: 5.0, avg: 102.0, max: 261.0) [2024-06-15 19:44:15,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 19:44:19,029][1652475] Updated weights for policy 0, policy_version 625479 (0.0048) [2024-06-15 19:44:20,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 41506.1, 300 sec: 42653.9). Total num frames: 1281097728. Throughput: 0: 10891.6. Samples: 320340992. Policy #0 lag: (min: 5.0, avg: 102.0, max: 261.0) [2024-06-15 19:44:20,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 19:44:20,861][1652475] Updated weights for policy 0, policy_version 625552 (0.0013) [2024-06-15 19:44:22,372][1652475] Updated weights for policy 0, policy_version 625604 (0.0013) [2024-06-15 19:44:24,994][1652475] Updated weights for policy 0, policy_version 625680 (0.0015) [2024-06-15 19:44:25,738][1648984] Fps is (10 sec: 45874.8, 60 sec: 42598.3, 300 sec: 42876.1). Total num frames: 1281425408. Throughput: 0: 10797.5. Samples: 320402432. Policy #0 lag: (min: 5.0, avg: 102.0, max: 261.0) [2024-06-15 19:44:25,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 19:44:30,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 42598.5, 300 sec: 42654.0). Total num frames: 1281490944. Throughput: 0: 10934.1. Samples: 320472576. 
Policy #0 lag: (min: 5.0, avg: 102.0, max: 261.0) [2024-06-15 19:44:30,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 19:44:31,971][1652475] Updated weights for policy 0, policy_version 625762 (0.0022) [2024-06-15 19:44:32,994][1652475] Updated weights for policy 0, policy_version 625799 (0.0016) [2024-06-15 19:44:34,548][1652475] Updated weights for policy 0, policy_version 625859 (0.0013) [2024-06-15 19:44:35,738][1648984] Fps is (10 sec: 45875.4, 60 sec: 43699.8, 300 sec: 42876.3). Total num frames: 1281884160. Throughput: 0: 10877.2. Samples: 320499200. Policy #0 lag: (min: 5.0, avg: 102.0, max: 261.0) [2024-06-15 19:44:35,740][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 19:44:36,481][1652475] Updated weights for policy 0, policy_version 625921 (0.0038) [2024-06-15 19:44:37,836][1652475] Updated weights for policy 0, policy_version 625978 (0.0016) [2024-06-15 19:44:40,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 43690.7, 300 sec: 42653.9). Total num frames: 1282015232. Throughput: 0: 10729.2. Samples: 320569344. Policy #0 lag: (min: 5.0, avg: 102.0, max: 261.0) [2024-06-15 19:44:40,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 19:44:44,776][1652475] Updated weights for policy 0, policy_version 626064 (0.0015) [2024-06-15 19:44:45,738][1648984] Fps is (10 sec: 39321.9, 60 sec: 44237.0, 300 sec: 42876.1). Total num frames: 1282277376. Throughput: 0: 10901.4. Samples: 320633856. Policy #0 lag: (min: 5.0, avg: 102.0, max: 261.0) [2024-06-15 19:44:45,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 19:44:47,001][1652475] Updated weights for policy 0, policy_version 626144 (0.0090) [2024-06-15 19:44:49,034][1652475] Updated weights for policy 0, policy_version 626208 (0.0112) [2024-06-15 19:44:50,738][1648984] Fps is (10 sec: 52426.3, 60 sec: 43690.3, 300 sec: 42653.9). Total num frames: 1282539520. Throughput: 0: 10808.8. Samples: 320662528. 
Policy #0 lag: (min: 5.0, avg: 102.0, max: 261.0) [2024-06-15 19:44:50,739][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 19:44:55,180][1652475] Updated weights for policy 0, policy_version 626256 (0.0013) [2024-06-15 19:44:55,623][1651340] Signal inference workers to stop experience collection... (32200 times) [2024-06-15 19:44:55,710][1652475] InferenceWorker_p0-w0: stopping experience collection (32200 times) [2024-06-15 19:44:55,738][1648984] Fps is (10 sec: 32767.5, 60 sec: 43144.5, 300 sec: 42876.2). Total num frames: 1282605056. Throughput: 0: 11013.7. Samples: 320730624. Policy #0 lag: (min: 15.0, avg: 74.7, max: 271.0) [2024-06-15 19:44:55,738][1648984] Avg episode reward: [(0, '-0.200')] [2024-06-15 19:44:55,835][1651340] Signal inference workers to resume experience collection... (32200 times) [2024-06-15 19:44:55,837][1652475] InferenceWorker_p0-w0: resuming experience collection (32200 times) [2024-06-15 19:44:56,071][1652475] Updated weights for policy 0, policy_version 626300 (0.0023) [2024-06-15 19:44:56,127][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000626304_1282670592.pth... [2024-06-15 19:44:56,170][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000621312_1272446976.pth [2024-06-15 19:44:58,807][1652475] Updated weights for policy 0, policy_version 626362 (0.0032) [2024-06-15 19:45:00,661][1652475] Updated weights for policy 0, policy_version 626437 (0.0159) [2024-06-15 19:45:00,738][1648984] Fps is (10 sec: 39323.7, 60 sec: 43148.9, 300 sec: 42653.9). Total num frames: 1282932736. Throughput: 0: 10877.1. Samples: 320788480. Policy #0 lag: (min: 15.0, avg: 74.7, max: 271.0) [2024-06-15 19:45:00,738][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 19:45:02,086][1652475] Updated weights for policy 0, policy_version 626489 (0.0011) [2024-06-15 19:45:05,751][1648984] Fps is (10 sec: 45814.3, 60 sec: 43680.9, 300 sec: 42652.0). 
Total num frames: 1283063808. Throughput: 0: 10657.8. Samples: 320820736. Policy #0 lag: (min: 15.0, avg: 74.7, max: 271.0) [2024-06-15 19:45:05,752][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 19:45:06,898][1652475] Updated weights for policy 0, policy_version 626552 (0.0013) [2024-06-15 19:45:10,738][1648984] Fps is (10 sec: 26214.4, 60 sec: 41506.2, 300 sec: 42209.6). Total num frames: 1283194880. Throughput: 0: 10877.2. Samples: 320891904. Policy #0 lag: (min: 15.0, avg: 74.7, max: 271.0) [2024-06-15 19:45:10,741][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 19:45:11,423][1652475] Updated weights for policy 0, policy_version 626608 (0.0017) [2024-06-15 19:45:13,675][1652475] Updated weights for policy 0, policy_version 626691 (0.0027) [2024-06-15 19:45:15,738][1648984] Fps is (10 sec: 52498.2, 60 sec: 43690.5, 300 sec: 42876.1). Total num frames: 1283588096. Throughput: 0: 10490.3. Samples: 320944640. Policy #0 lag: (min: 15.0, avg: 74.7, max: 271.0) [2024-06-15 19:45:15,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 19:45:18,380][1652475] Updated weights for policy 0, policy_version 626784 (0.0013) [2024-06-15 19:45:20,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 43690.7, 300 sec: 42654.1). Total num frames: 1283719168. Throughput: 0: 10774.8. Samples: 320984064. Policy #0 lag: (min: 15.0, avg: 74.7, max: 271.0) [2024-06-15 19:45:20,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 19:45:24,285][1652475] Updated weights for policy 0, policy_version 626872 (0.0013) [2024-06-15 19:45:25,738][1648984] Fps is (10 sec: 32768.6, 60 sec: 41506.2, 300 sec: 42542.9). Total num frames: 1283915776. Throughput: 0: 10581.4. Samples: 321045504. 
Policy #0 lag: (min: 15.0, avg: 74.7, max: 271.0) [2024-06-15 19:45:25,738][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 19:45:26,165][1652475] Updated weights for policy 0, policy_version 626934 (0.0014) [2024-06-15 19:45:28,012][1652475] Updated weights for policy 0, policy_version 627002 (0.0064) [2024-06-15 19:45:30,738][1648984] Fps is (10 sec: 42598.5, 60 sec: 44236.8, 300 sec: 42765.0). Total num frames: 1284145152. Throughput: 0: 10490.3. Samples: 321105920. Policy #0 lag: (min: 15.0, avg: 74.7, max: 271.0) [2024-06-15 19:45:30,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 19:45:31,089][1652475] Updated weights for policy 0, policy_version 627056 (0.0013) [2024-06-15 19:45:35,738][1648984] Fps is (10 sec: 36045.0, 60 sec: 39867.8, 300 sec: 42321.1). Total num frames: 1284276224. Throughput: 0: 10524.6. Samples: 321136128. Policy #0 lag: (min: 15.0, avg: 74.7, max: 271.0) [2024-06-15 19:45:35,738][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 19:45:35,958][1652475] Updated weights for policy 0, policy_version 627107 (0.0013) [2024-06-15 19:45:37,827][1652475] Updated weights for policy 0, policy_version 627190 (0.0162) [2024-06-15 19:45:40,513][1651340] Signal inference workers to stop experience collection... (32250 times) [2024-06-15 19:45:40,561][1652475] InferenceWorker_p0-w0: stopping experience collection (32250 times) [2024-06-15 19:45:40,710][1651340] Signal inference workers to resume experience collection... (32250 times) [2024-06-15 19:45:40,710][1652475] InferenceWorker_p0-w0: resuming experience collection (32250 times) [2024-06-15 19:45:40,712][1652475] Updated weights for policy 0, policy_version 627232 (0.0011) [2024-06-15 19:45:40,738][1648984] Fps is (10 sec: 42597.5, 60 sec: 42598.3, 300 sec: 42876.1). Total num frames: 1284571136. Throughput: 0: 10410.6. Samples: 321199104. 
Policy #0 lag: (min: 15.0, avg: 74.7, max: 271.0) [2024-06-15 19:45:40,739][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 19:45:44,823][1652475] Updated weights for policy 0, policy_version 627312 (0.0013) [2024-06-15 19:45:45,738][1648984] Fps is (10 sec: 49151.6, 60 sec: 41506.1, 300 sec: 42653.9). Total num frames: 1284767744. Throughput: 0: 10581.3. Samples: 321264640. Policy #0 lag: (min: 15.0, avg: 74.7, max: 271.0) [2024-06-15 19:45:45,739][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 19:45:47,338][1652475] Updated weights for policy 0, policy_version 627364 (0.0012) [2024-06-15 19:45:49,331][1652475] Updated weights for policy 0, policy_version 627451 (0.0011) [2024-06-15 19:45:50,738][1648984] Fps is (10 sec: 45874.3, 60 sec: 41506.3, 300 sec: 42765.0). Total num frames: 1285029888. Throughput: 0: 10618.5. Samples: 321298432. Policy #0 lag: (min: 15.0, avg: 74.7, max: 271.0) [2024-06-15 19:45:50,739][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 19:45:53,177][1652475] Updated weights for policy 0, policy_version 627495 (0.0014) [2024-06-15 19:45:55,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 42598.5, 300 sec: 42654.0). Total num frames: 1285160960. Throughput: 0: 10524.5. Samples: 321365504. Policy #0 lag: (min: 15.0, avg: 74.7, max: 271.0) [2024-06-15 19:45:55,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 19:45:57,353][1652475] Updated weights for policy 0, policy_version 627552 (0.0015) [2024-06-15 19:45:59,000][1652475] Updated weights for policy 0, policy_version 627619 (0.0012) [2024-06-15 19:46:00,738][1648984] Fps is (10 sec: 45876.8, 60 sec: 42598.4, 300 sec: 43209.3). Total num frames: 1285488640. Throughput: 0: 10649.6. Samples: 321423872. 
Policy #0 lag: (min: 15.0, avg: 74.7, max: 271.0) [2024-06-15 19:46:00,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 19:46:00,978][1652475] Updated weights for policy 0, policy_version 627704 (0.0012) [2024-06-15 19:46:05,229][1652475] Updated weights for policy 0, policy_version 627744 (0.0088) [2024-06-15 19:46:05,738][1648984] Fps is (10 sec: 49150.9, 60 sec: 43154.0, 300 sec: 42653.9). Total num frames: 1285652480. Throughput: 0: 10547.2. Samples: 321458688. Policy #0 lag: (min: 15.0, avg: 74.7, max: 271.0) [2024-06-15 19:46:05,739][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 19:46:09,545][1652475] Updated weights for policy 0, policy_version 627792 (0.0013) [2024-06-15 19:46:10,738][1648984] Fps is (10 sec: 29491.4, 60 sec: 43144.6, 300 sec: 42987.2). Total num frames: 1285783552. Throughput: 0: 10865.8. Samples: 321534464. Policy #0 lag: (min: 15.0, avg: 74.7, max: 271.0) [2024-06-15 19:46:10,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 19:46:11,547][1652475] Updated weights for policy 0, policy_version 627858 (0.0014) [2024-06-15 19:46:13,316][1652475] Updated weights for policy 0, policy_version 627936 (0.0010) [2024-06-15 19:46:15,738][1648984] Fps is (10 sec: 42598.2, 60 sec: 41506.1, 300 sec: 42653.9). Total num frames: 1286078464. Throughput: 0: 10717.8. Samples: 321588224. Policy #0 lag: (min: 15.0, avg: 74.7, max: 271.0) [2024-06-15 19:46:15,739][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 19:46:17,608][1652475] Updated weights for policy 0, policy_version 628000 (0.0023) [2024-06-15 19:46:20,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 41506.2, 300 sec: 42653.9). Total num frames: 1286209536. Throughput: 0: 10786.1. Samples: 321621504. 
Policy #0 lag: (min: 15.0, avg: 74.7, max: 271.0) [2024-06-15 19:46:20,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 19:46:22,619][1652475] Updated weights for policy 0, policy_version 628083 (0.0099) [2024-06-15 19:46:24,283][1651340] Signal inference workers to stop experience collection... (32300 times) [2024-06-15 19:46:24,354][1652475] InferenceWorker_p0-w0: stopping experience collection (32300 times) [2024-06-15 19:46:24,558][1651340] Signal inference workers to resume experience collection... (32300 times) [2024-06-15 19:46:24,559][1652475] InferenceWorker_p0-w0: resuming experience collection (32300 times) [2024-06-15 19:46:24,719][1652475] Updated weights for policy 0, policy_version 628163 (0.0013) [2024-06-15 19:46:25,738][1648984] Fps is (10 sec: 49153.0, 60 sec: 44236.7, 300 sec: 43098.2). Total num frames: 1286569984. Throughput: 0: 10649.6. Samples: 321678336. Policy #0 lag: (min: 15.0, avg: 74.7, max: 271.0) [2024-06-15 19:46:25,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 19:46:25,801][1652475] Updated weights for policy 0, policy_version 628215 (0.0012) [2024-06-15 19:46:30,722][1652475] Updated weights for policy 0, policy_version 628272 (0.0012) [2024-06-15 19:46:30,738][1648984] Fps is (10 sec: 49151.8, 60 sec: 42598.4, 300 sec: 42542.9). Total num frames: 1286701056. Throughput: 0: 10808.9. Samples: 321751040. Policy #0 lag: (min: 47.0, avg: 146.4, max: 303.0) [2024-06-15 19:46:30,740][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 19:46:34,201][1652475] Updated weights for policy 0, policy_version 628306 (0.0013) [2024-06-15 19:46:35,738][1648984] Fps is (10 sec: 29491.1, 60 sec: 43144.4, 300 sec: 42765.0). Total num frames: 1286864896. Throughput: 0: 10809.0. Samples: 321784832. 
Policy #0 lag: (min: 47.0, avg: 146.4, max: 303.0) [2024-06-15 19:46:35,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 19:46:36,651][1652475] Updated weights for policy 0, policy_version 628400 (0.0031) [2024-06-15 19:46:40,738][1648984] Fps is (10 sec: 42598.5, 60 sec: 42598.5, 300 sec: 42432.1). Total num frames: 1287127040. Throughput: 0: 10422.0. Samples: 321834496. Policy #0 lag: (min: 47.0, avg: 146.4, max: 303.0) [2024-06-15 19:46:40,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 19:46:43,059][1652475] Updated weights for policy 0, policy_version 628484 (0.0014) [2024-06-15 19:46:45,738][1648984] Fps is (10 sec: 39321.2, 60 sec: 41506.0, 300 sec: 42653.9). Total num frames: 1287258112. Throughput: 0: 10717.8. Samples: 321906176. Policy #0 lag: (min: 47.0, avg: 146.4, max: 303.0) [2024-06-15 19:46:45,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 19:46:46,346][1652475] Updated weights for policy 0, policy_version 628560 (0.0015) [2024-06-15 19:46:47,777][1652475] Updated weights for policy 0, policy_version 628612 (0.0062) [2024-06-15 19:46:49,766][1652475] Updated weights for policy 0, policy_version 628706 (0.0014) [2024-06-15 19:46:50,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 43691.0, 300 sec: 42876.1). Total num frames: 1287651328. Throughput: 0: 10558.6. Samples: 321933824. Policy #0 lag: (min: 47.0, avg: 146.4, max: 303.0) [2024-06-15 19:46:50,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 19:46:55,738][1648984] Fps is (10 sec: 42598.9, 60 sec: 42052.2, 300 sec: 42320.7). Total num frames: 1287684096. Throughput: 0: 10422.0. Samples: 322003456. 
Policy #0 lag: (min: 47.0, avg: 146.4, max: 303.0) [2024-06-15 19:46:55,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 19:46:55,902][1652475] Updated weights for policy 0, policy_version 628768 (0.0141) [2024-06-15 19:46:56,226][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000628784_1287749632.pth... [2024-06-15 19:46:56,306][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000623824_1277591552.pth [2024-06-15 19:46:56,588][1652475] Updated weights for policy 0, policy_version 628795 (0.0027) [2024-06-15 19:46:59,607][1652475] Updated weights for policy 0, policy_version 628856 (0.0012) [2024-06-15 19:47:00,738][1648984] Fps is (10 sec: 32767.7, 60 sec: 41506.1, 300 sec: 42653.9). Total num frames: 1287979008. Throughput: 0: 10604.1. Samples: 322065408. Policy #0 lag: (min: 47.0, avg: 146.4, max: 303.0) [2024-06-15 19:47:00,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 19:47:01,691][1652475] Updated weights for policy 0, policy_version 628944 (0.0114) [2024-06-15 19:47:05,738][1648984] Fps is (10 sec: 49152.3, 60 sec: 42052.4, 300 sec: 42209.6). Total num frames: 1288175616. Throughput: 0: 10433.4. Samples: 322091008. Policy #0 lag: (min: 47.0, avg: 146.4, max: 303.0) [2024-06-15 19:47:05,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 19:47:07,796][1652475] Updated weights for policy 0, policy_version 629014 (0.0014) [2024-06-15 19:47:08,856][1652475] Updated weights for policy 0, policy_version 629055 (0.0010) [2024-06-15 19:47:10,738][1648984] Fps is (10 sec: 32767.8, 60 sec: 42052.1, 300 sec: 42431.8). Total num frames: 1288306688. Throughput: 0: 10604.1. Samples: 322155520. Policy #0 lag: (min: 47.0, avg: 146.4, max: 303.0) [2024-06-15 19:47:10,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 19:47:11,040][1651340] Signal inference workers to stop experience collection... 
(32350 times) [2024-06-15 19:47:11,093][1652475] InferenceWorker_p0-w0: stopping experience collection (32350 times) [2024-06-15 19:47:11,240][1651340] Signal inference workers to resume experience collection... (32350 times) [2024-06-15 19:47:11,240][1652475] InferenceWorker_p0-w0: resuming experience collection (32350 times) [2024-06-15 19:47:13,126][1652475] Updated weights for policy 0, policy_version 629121 (0.0014) [2024-06-15 19:47:14,550][1652475] Updated weights for policy 0, policy_version 629184 (0.0111) [2024-06-15 19:47:15,738][1648984] Fps is (10 sec: 49152.2, 60 sec: 43144.8, 300 sec: 42542.9). Total num frames: 1288667136. Throughput: 0: 10342.4. Samples: 322216448. Policy #0 lag: (min: 47.0, avg: 146.4, max: 303.0) [2024-06-15 19:47:15,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 19:47:15,910][1652475] Updated weights for policy 0, policy_version 629245 (0.0013) [2024-06-15 19:47:20,738][1648984] Fps is (10 sec: 49152.4, 60 sec: 43144.5, 300 sec: 42542.9). Total num frames: 1288798208. Throughput: 0: 10285.5. Samples: 322247680. Policy #0 lag: (min: 47.0, avg: 146.4, max: 303.0) [2024-06-15 19:47:20,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 19:47:22,479][1652475] Updated weights for policy 0, policy_version 629322 (0.0013) [2024-06-15 19:47:25,738][1648984] Fps is (10 sec: 29490.9, 60 sec: 39867.7, 300 sec: 42431.8). Total num frames: 1288962048. Throughput: 0: 10456.2. Samples: 322305024. Policy #0 lag: (min: 47.0, avg: 146.4, max: 303.0) [2024-06-15 19:47:25,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 19:47:26,809][1652475] Updated weights for policy 0, policy_version 629378 (0.0162) [2024-06-15 19:47:29,343][1652475] Updated weights for policy 0, policy_version 629456 (0.0013) [2024-06-15 19:47:30,738][1648984] Fps is (10 sec: 42598.7, 60 sec: 42052.3, 300 sec: 42209.6). Total num frames: 1289224192. Throughput: 0: 10240.1. Samples: 322366976. 
Policy #0 lag: (min: 47.0, avg: 146.4, max: 303.0) [2024-06-15 19:47:30,740][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 19:47:31,961][1652475] Updated weights for policy 0, policy_version 629511 (0.0013) [2024-06-15 19:47:34,304][1652475] Updated weights for policy 0, policy_version 629584 (0.0013) [2024-06-15 19:47:35,738][1648984] Fps is (10 sec: 52428.3, 60 sec: 43690.6, 300 sec: 42765.0). Total num frames: 1289486336. Throughput: 0: 10319.6. Samples: 322398208. Policy #0 lag: (min: 47.0, avg: 146.4, max: 303.0) [2024-06-15 19:47:35,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 19:47:39,085][1652475] Updated weights for policy 0, policy_version 629634 (0.0014) [2024-06-15 19:47:40,125][1652475] Updated weights for policy 0, policy_version 629684 (0.0027) [2024-06-15 19:47:40,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 41506.1, 300 sec: 42209.6). Total num frames: 1289617408. Throughput: 0: 10399.3. Samples: 322471424. Policy #0 lag: (min: 47.0, avg: 146.4, max: 303.0) [2024-06-15 19:47:40,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 19:47:42,322][1652475] Updated weights for policy 0, policy_version 629728 (0.0012) [2024-06-15 19:47:44,343][1652475] Updated weights for policy 0, policy_version 629777 (0.0022) [2024-06-15 19:47:45,738][1648984] Fps is (10 sec: 39322.4, 60 sec: 43690.8, 300 sec: 42653.9). Total num frames: 1289879552. Throughput: 0: 10262.8. Samples: 322527232. Policy #0 lag: (min: 47.0, avg: 146.4, max: 303.0) [2024-06-15 19:47:45,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 19:47:46,909][1652475] Updated weights for policy 0, policy_version 629859 (0.0015) [2024-06-15 19:47:50,738][1648984] Fps is (10 sec: 39320.2, 60 sec: 39321.3, 300 sec: 42320.7). Total num frames: 1290010624. Throughput: 0: 10387.8. Samples: 322558464. 
Policy #0 lag: (min: 47.0, avg: 146.4, max: 303.0) [2024-06-15 19:47:50,739][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 19:47:52,137][1652475] Updated weights for policy 0, policy_version 629936 (0.0012) [2024-06-15 19:47:55,086][1652475] Updated weights for policy 0, policy_version 629987 (0.0013) [2024-06-15 19:47:55,738][1648984] Fps is (10 sec: 36044.5, 60 sec: 42598.4, 300 sec: 42098.5). Total num frames: 1290240000. Throughput: 0: 10353.8. Samples: 322621440. Policy #0 lag: (min: 47.0, avg: 146.4, max: 303.0) [2024-06-15 19:47:55,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 19:47:59,238][1652475] Updated weights for policy 0, policy_version 630048 (0.0013) [2024-06-15 19:48:00,722][1651340] Signal inference workers to stop experience collection... (32400 times) [2024-06-15 19:48:00,738][1648984] Fps is (10 sec: 45876.7, 60 sec: 41506.1, 300 sec: 42431.8). Total num frames: 1290469376. Throughput: 0: 10308.2. Samples: 322680320. Policy #0 lag: (min: 47.0, avg: 146.4, max: 303.0) [2024-06-15 19:48:00,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 19:48:00,758][1652475] InferenceWorker_p0-w0: stopping experience collection (32400 times) [2024-06-15 19:48:00,933][1651340] Signal inference workers to resume experience collection... (32400 times) [2024-06-15 19:48:00,951][1652475] InferenceWorker_p0-w0: resuming experience collection (32400 times) [2024-06-15 19:48:01,231][1652475] Updated weights for policy 0, policy_version 630144 (0.0012) [2024-06-15 19:48:04,440][1652475] Updated weights for policy 0, policy_version 630202 (0.0013) [2024-06-15 19:48:05,756][1648984] Fps is (10 sec: 42536.5, 60 sec: 41496.0, 300 sec: 42207.5). Total num frames: 1290665984. Throughput: 0: 10304.9. Samples: 322711552. 
Policy #0 lag: (min: 47.0, avg: 146.4, max: 303.0) [2024-06-15 19:48:05,759][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 19:48:06,765][1652475] Updated weights for policy 0, policy_version 630256 (0.0013) [2024-06-15 19:48:10,738][1648984] Fps is (10 sec: 32768.4, 60 sec: 41506.3, 300 sec: 42209.6). Total num frames: 1290797056. Throughput: 0: 10422.1. Samples: 322774016. Policy #0 lag: (min: 41.0, avg: 166.1, max: 297.0) [2024-06-15 19:48:10,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 19:48:13,288][1652475] Updated weights for policy 0, policy_version 630320 (0.0012) [2024-06-15 19:48:15,715][1652475] Updated weights for policy 0, policy_version 630372 (0.0044) [2024-06-15 19:48:15,738][1648984] Fps is (10 sec: 32815.4, 60 sec: 38775.3, 300 sec: 41987.4). Total num frames: 1290993664. Throughput: 0: 10376.5. Samples: 322833920. Policy #0 lag: (min: 41.0, avg: 166.1, max: 297.0) [2024-06-15 19:48:15,739][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 19:48:17,661][1652475] Updated weights for policy 0, policy_version 630455 (0.0013) [2024-06-15 19:48:18,643][1652475] Updated weights for policy 0, policy_version 630496 (0.0029) [2024-06-15 19:48:20,738][1648984] Fps is (10 sec: 52428.4, 60 sec: 42052.3, 300 sec: 42209.6). Total num frames: 1291321344. Throughput: 0: 10296.9. Samples: 322861568. Policy #0 lag: (min: 41.0, avg: 166.1, max: 297.0) [2024-06-15 19:48:20,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 19:48:25,441][1652475] Updated weights for policy 0, policy_version 630547 (0.0018) [2024-06-15 19:48:25,738][1648984] Fps is (10 sec: 39321.9, 60 sec: 40413.9, 300 sec: 42209.6). Total num frames: 1291386880. Throughput: 0: 10274.1. Samples: 322933760. 
Policy #0 lag: (min: 41.0, avg: 166.1, max: 297.0) [2024-06-15 19:48:25,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 19:48:26,613][1652475] Updated weights for policy 0, policy_version 630592 (0.0013) [2024-06-15 19:48:28,194][1652475] Updated weights for policy 0, policy_version 630656 (0.0012) [2024-06-15 19:48:30,061][1652475] Updated weights for policy 0, policy_version 630721 (0.0012) [2024-06-15 19:48:30,738][1648984] Fps is (10 sec: 45875.0, 60 sec: 42598.3, 300 sec: 42433.6). Total num frames: 1291780096. Throughput: 0: 10285.5. Samples: 322990080. Policy #0 lag: (min: 41.0, avg: 166.1, max: 297.0) [2024-06-15 19:48:30,739][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 19:48:35,738][1648984] Fps is (10 sec: 45875.7, 60 sec: 39321.7, 300 sec: 42209.7). Total num frames: 1291845632. Throughput: 0: 10262.8. Samples: 323020288. Policy #0 lag: (min: 41.0, avg: 166.1, max: 297.0) [2024-06-15 19:48:35,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 19:48:37,827][1652475] Updated weights for policy 0, policy_version 630800 (0.0014) [2024-06-15 19:48:40,097][1652475] Updated weights for policy 0, policy_version 630880 (0.0014) [2024-06-15 19:48:40,738][1648984] Fps is (10 sec: 29491.2, 60 sec: 40960.0, 300 sec: 42209.7). Total num frames: 1292075008. Throughput: 0: 10410.7. Samples: 323089920. Policy #0 lag: (min: 41.0, avg: 166.1, max: 297.0) [2024-06-15 19:48:40,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 19:48:42,512][1652475] Updated weights for policy 0, policy_version 630974 (0.0013) [2024-06-15 19:48:45,738][1648984] Fps is (10 sec: 52428.5, 60 sec: 41506.1, 300 sec: 42209.6). Total num frames: 1292369920. Throughput: 0: 10194.5. Samples: 323139072. 
Policy #0 lag: (min: 41.0, avg: 166.1, max: 297.0) [2024-06-15 19:48:45,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 19:48:50,301][1652475] Updated weights for policy 0, policy_version 631041 (0.0014) [2024-06-15 19:48:50,746][1648984] Fps is (10 sec: 32750.4, 60 sec: 39864.4, 300 sec: 41986.7). Total num frames: 1292402688. Throughput: 0: 10390.0. Samples: 323179008. Policy #0 lag: (min: 41.0, avg: 166.1, max: 297.0) [2024-06-15 19:48:50,749][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 19:48:51,071][1651340] Signal inference workers to stop experience collection... (32450 times) [2024-06-15 19:48:51,133][1652475] InferenceWorker_p0-w0: stopping experience collection (32450 times) [2024-06-15 19:48:51,462][1651340] Signal inference workers to resume experience collection... (32450 times) [2024-06-15 19:48:51,462][1652475] InferenceWorker_p0-w0: resuming experience collection (32450 times) [2024-06-15 19:48:51,810][1652475] Updated weights for policy 0, policy_version 631104 (0.0013) [2024-06-15 19:48:53,718][1652475] Updated weights for policy 0, policy_version 631186 (0.0013) [2024-06-15 19:48:55,157][1652475] Updated weights for policy 0, policy_version 631237 (0.0013) [2024-06-15 19:48:55,738][1648984] Fps is (10 sec: 45875.1, 60 sec: 43144.5, 300 sec: 42321.6). Total num frames: 1292828672. Throughput: 0: 10410.6. Samples: 323242496. Policy #0 lag: (min: 41.0, avg: 166.1, max: 297.0) [2024-06-15 19:48:55,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 19:48:55,991][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000631280_1292861440.pth... [2024-06-15 19:48:56,038][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000626304_1282670592.pth [2024-06-15 19:49:00,738][1648984] Fps is (10 sec: 49178.5, 60 sec: 40413.9, 300 sec: 42209.6). Total num frames: 1292894208. Throughput: 0: 10570.0. Samples: 323309568. 
Policy #0 lag: (min: 41.0, avg: 166.1, max: 297.0) [2024-06-15 19:49:00,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 19:49:02,419][1652475] Updated weights for policy 0, policy_version 631312 (0.0097) [2024-06-15 19:49:03,806][1652475] Updated weights for policy 0, policy_version 631363 (0.0014) [2024-06-15 19:49:05,436][1652475] Updated weights for policy 0, policy_version 631440 (0.0011) [2024-06-15 19:49:05,738][1648984] Fps is (10 sec: 36045.0, 60 sec: 42062.5, 300 sec: 42320.7). Total num frames: 1293189120. Throughput: 0: 10717.9. Samples: 323343872. Policy #0 lag: (min: 41.0, avg: 166.1, max: 297.0) [2024-06-15 19:49:05,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 19:49:06,811][1652475] Updated weights for policy 0, policy_version 631504 (0.0014) [2024-06-15 19:49:10,755][1648984] Fps is (10 sec: 52340.0, 60 sec: 43678.3, 300 sec: 42207.2). Total num frames: 1293418496. Throughput: 0: 10463.6. Samples: 323404800. Policy #0 lag: (min: 41.0, avg: 166.1, max: 297.0) [2024-06-15 19:49:10,755][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 19:49:13,860][1652475] Updated weights for policy 0, policy_version 631570 (0.0015) [2024-06-15 19:49:15,738][1648984] Fps is (10 sec: 36044.6, 60 sec: 42598.5, 300 sec: 42209.6). Total num frames: 1293549568. Throughput: 0: 10865.8. Samples: 323479040. Policy #0 lag: (min: 41.0, avg: 166.1, max: 297.0) [2024-06-15 19:49:15,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 19:49:16,233][1652475] Updated weights for policy 0, policy_version 631648 (0.0013) [2024-06-15 19:49:17,560][1652475] Updated weights for policy 0, policy_version 631712 (0.0013) [2024-06-15 19:49:19,127][1652475] Updated weights for policy 0, policy_version 631760 (0.0053) [2024-06-15 19:49:20,738][1648984] Fps is (10 sec: 52518.2, 60 sec: 43690.7, 300 sec: 42431.8). Total num frames: 1293942784. Throughput: 0: 10808.9. Samples: 323506688. 
Policy #0 lag: (min: 41.0, avg: 166.1, max: 297.0) [2024-06-15 19:49:20,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 19:49:25,444][1652475] Updated weights for policy 0, policy_version 631824 (0.0012) [2024-06-15 19:49:25,738][1648984] Fps is (10 sec: 42598.8, 60 sec: 43144.6, 300 sec: 42320.7). Total num frames: 1293975552. Throughput: 0: 10888.6. Samples: 323579904. Policy #0 lag: (min: 41.0, avg: 166.1, max: 297.0) [2024-06-15 19:49:25,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 19:49:27,797][1652475] Updated weights for policy 0, policy_version 631888 (0.0013) [2024-06-15 19:49:30,291][1652475] Updated weights for policy 0, policy_version 631986 (0.0016) [2024-06-15 19:49:30,738][1648984] Fps is (10 sec: 39321.3, 60 sec: 42598.4, 300 sec: 42209.6). Total num frames: 1294336000. Throughput: 0: 10968.2. Samples: 323632640. Policy #0 lag: (min: 41.0, avg: 166.1, max: 297.0) [2024-06-15 19:49:30,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 19:49:32,130][1651340] Signal inference workers to stop experience collection... (32500 times) [2024-06-15 19:49:32,162][1652475] InferenceWorker_p0-w0: stopping experience collection (32500 times) [2024-06-15 19:49:32,315][1651340] Signal inference workers to resume experience collection... (32500 times) [2024-06-15 19:49:32,315][1652475] InferenceWorker_p0-w0: resuming experience collection (32500 times) [2024-06-15 19:49:33,165][1652475] Updated weights for policy 0, policy_version 632054 (0.0015) [2024-06-15 19:49:35,776][1648984] Fps is (10 sec: 49152.0, 60 sec: 43690.7, 300 sec: 42209.6). Total num frames: 1294467072. Throughput: 0: 10833.0. Samples: 323666432. Policy #0 lag: (min: 41.0, avg: 166.1, max: 297.0) [2024-06-15 19:49:35,777][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 19:49:39,422][1652475] Updated weights for policy 0, policy_version 632129 (0.0016) [2024-06-15 19:49:40,738][1648984] Fps is (10 sec: 36045.0, 60 sec: 43690.7, 300 sec: 42098.5). 
Total num frames: 1294696448. Throughput: 0: 10865.8. Samples: 323731456. Policy #0 lag: (min: 41.0, avg: 166.1, max: 297.0) [2024-06-15 19:49:40,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 19:49:41,965][1652475] Updated weights for policy 0, policy_version 632202 (0.0014) [2024-06-15 19:49:45,502][1652475] Updated weights for policy 0, policy_version 632259 (0.0013) [2024-06-15 19:49:45,739][1648984] Fps is (10 sec: 42592.1, 60 sec: 42051.3, 300 sec: 41876.3). Total num frames: 1294893056. Throughput: 0: 10899.6. Samples: 323800064. Policy #0 lag: (min: 35.0, avg: 172.5, max: 355.0) [2024-06-15 19:49:45,740][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 19:49:46,312][1652475] Updated weights for policy 0, policy_version 632307 (0.0013) [2024-06-15 19:49:48,705][1652475] Updated weights for policy 0, policy_version 632352 (0.0014) [2024-06-15 19:49:50,539][1652475] Updated weights for policy 0, policy_version 632403 (0.0014) [2024-06-15 19:49:50,738][1648984] Fps is (10 sec: 49152.6, 60 sec: 46425.6, 300 sec: 42654.0). Total num frames: 1295187968. Throughput: 0: 10945.5. Samples: 323836416. Policy #0 lag: (min: 35.0, avg: 172.5, max: 355.0) [2024-06-15 19:49:50,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 19:49:53,687][1652475] Updated weights for policy 0, policy_version 632449 (0.0045) [2024-06-15 19:49:55,017][1652475] Updated weights for policy 0, policy_version 632510 (0.0022) [2024-06-15 19:49:55,738][1648984] Fps is (10 sec: 49158.3, 60 sec: 42598.3, 300 sec: 42209.6). Total num frames: 1295384576. Throughput: 0: 11052.0. Samples: 323901952. Policy #0 lag: (min: 35.0, avg: 172.5, max: 355.0) [2024-06-15 19:49:55,739][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 19:49:59,195][1652475] Updated weights for policy 0, policy_version 632576 (0.0106) [2024-06-15 19:50:00,746][1648984] Fps is (10 sec: 39287.9, 60 sec: 44776.6, 300 sec: 42432.5). Total num frames: 1295581184. Throughput: 0: 10784.1. 
Samples: 323964416. Policy #0 lag: (min: 35.0, avg: 172.5, max: 355.0) [2024-06-15 19:50:00,747][1648984] Avg episode reward: [(0, '-0.250')] [2024-06-15 19:50:02,500][1652475] Updated weights for policy 0, policy_version 632656 (0.0014) [2024-06-15 19:50:03,465][1652475] Updated weights for policy 0, policy_version 632704 (0.0100) [2024-06-15 19:50:05,738][1648984] Fps is (10 sec: 42599.1, 60 sec: 43690.7, 300 sec: 42765.0). Total num frames: 1295810560. Throughput: 0: 10774.7. Samples: 323991552. Policy #0 lag: (min: 35.0, avg: 172.5, max: 355.0) [2024-06-15 19:50:05,738][1648984] Avg episode reward: [(0, '-0.170')] [2024-06-15 19:50:06,343][1652475] Updated weights for policy 0, policy_version 632752 (0.0025) [2024-06-15 19:50:10,028][1652475] Updated weights for policy 0, policy_version 632785 (0.0014) [2024-06-15 19:50:10,738][1648984] Fps is (10 sec: 39355.0, 60 sec: 42610.5, 300 sec: 41987.5). Total num frames: 1295974400. Throughput: 0: 10899.9. Samples: 324070400. Policy #0 lag: (min: 35.0, avg: 172.5, max: 355.0) [2024-06-15 19:50:10,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 19:50:13,662][1652475] Updated weights for policy 0, policy_version 632852 (0.0014) [2024-06-15 19:50:15,738][1648984] Fps is (10 sec: 42598.2, 60 sec: 44782.9, 300 sec: 42431.8). Total num frames: 1296236544. Throughput: 0: 10854.4. Samples: 324121088. Policy #0 lag: (min: 35.0, avg: 172.5, max: 355.0) [2024-06-15 19:50:15,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 19:50:15,868][1652475] Updated weights for policy 0, policy_version 632945 (0.0013) [2024-06-15 19:50:18,704][1652475] Updated weights for policy 0, policy_version 633010 (0.0013) [2024-06-15 19:50:20,738][1648984] Fps is (10 sec: 45875.3, 60 sec: 41506.1, 300 sec: 42431.8). Total num frames: 1296433152. Throughput: 0: 10820.3. Samples: 324153344. 
Policy #0 lag: (min: 35.0, avg: 172.5, max: 355.0) [2024-06-15 19:50:20,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 19:50:21,854][1651340] Signal inference workers to stop experience collection... (32550 times) [2024-06-15 19:50:21,907][1652475] InferenceWorker_p0-w0: stopping experience collection (32550 times) [2024-06-15 19:50:22,057][1651340] Signal inference workers to resume experience collection... (32550 times) [2024-06-15 19:50:22,059][1652475] InferenceWorker_p0-w0: resuming experience collection (32550 times) [2024-06-15 19:50:22,208][1652475] Updated weights for policy 0, policy_version 633058 (0.0012) [2024-06-15 19:50:25,738][1648984] Fps is (10 sec: 32767.9, 60 sec: 43144.4, 300 sec: 42098.5). Total num frames: 1296564224. Throughput: 0: 10820.2. Samples: 324218368. Policy #0 lag: (min: 35.0, avg: 172.5, max: 355.0) [2024-06-15 19:50:25,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 19:50:27,618][1652475] Updated weights for policy 0, policy_version 633120 (0.0012) [2024-06-15 19:50:29,436][1652475] Updated weights for policy 0, policy_version 633203 (0.0011) [2024-06-15 19:50:30,738][1648984] Fps is (10 sec: 45875.2, 60 sec: 42598.5, 300 sec: 42765.0). Total num frames: 1296891904. Throughput: 0: 10695.5. Samples: 324281344. Policy #0 lag: (min: 35.0, avg: 172.5, max: 355.0) [2024-06-15 19:50:30,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 19:50:33,222][1652475] Updated weights for policy 0, policy_version 633296 (0.0028) [2024-06-15 19:50:35,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 43690.6, 300 sec: 42431.8). Total num frames: 1297088512. Throughput: 0: 10524.4. Samples: 324310016. Policy #0 lag: (min: 35.0, avg: 172.5, max: 355.0) [2024-06-15 19:50:35,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 19:50:39,750][1652475] Updated weights for policy 0, policy_version 633360 (0.0012) [2024-06-15 19:50:40,738][1648984] Fps is (10 sec: 29491.1, 60 sec: 41506.1, 300 sec: 42098.6). 
Total num frames: 1297186816. Throughput: 0: 10820.3. Samples: 324388864. Policy #0 lag: (min: 35.0, avg: 172.5, max: 355.0) [2024-06-15 19:50:40,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 19:50:41,462][1652475] Updated weights for policy 0, policy_version 633428 (0.0012) [2024-06-15 19:50:43,117][1652475] Updated weights for policy 0, policy_version 633504 (0.0011) [2024-06-15 19:50:45,442][1652475] Updated weights for policy 0, policy_version 633568 (0.0013) [2024-06-15 19:50:45,740][1648984] Fps is (10 sec: 45865.4, 60 sec: 44236.2, 300 sec: 42431.5). Total num frames: 1297547264. Throughput: 0: 10594.2. Samples: 324441088. Policy #0 lag: (min: 35.0, avg: 172.5, max: 355.0) [2024-06-15 19:50:45,741][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 19:50:50,738][1648984] Fps is (10 sec: 42598.5, 60 sec: 40413.8, 300 sec: 42209.6). Total num frames: 1297612800. Throughput: 0: 10740.6. Samples: 324474880. Policy #0 lag: (min: 35.0, avg: 172.5, max: 355.0) [2024-06-15 19:50:50,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 19:50:51,402][1652475] Updated weights for policy 0, policy_version 633618 (0.0014) [2024-06-15 19:50:52,593][1652475] Updated weights for policy 0, policy_version 633665 (0.0016) [2024-06-15 19:50:53,834][1652475] Updated weights for policy 0, policy_version 633728 (0.0015) [2024-06-15 19:50:55,738][1648984] Fps is (10 sec: 45884.9, 60 sec: 43690.7, 300 sec: 42431.8). Total num frames: 1298006016. Throughput: 0: 10535.8. Samples: 324544512. Policy #0 lag: (min: 35.0, avg: 172.5, max: 355.0) [2024-06-15 19:50:55,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 19:50:55,746][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000633792_1298006016.pth... 
[2024-06-15 19:50:55,825][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000628784_1287749632.pth [2024-06-15 19:50:55,831][1651340] Saving a milestone train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/milestones/checkpoint_000633792_1298006016.pth [2024-06-15 19:50:56,373][1652475] Updated weights for policy 0, policy_version 633794 (0.0014) [2024-06-15 19:51:00,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 42604.4, 300 sec: 42320.7). Total num frames: 1298137088. Throughput: 0: 10956.8. Samples: 324614144. Policy #0 lag: (min: 35.0, avg: 172.5, max: 355.0) [2024-06-15 19:51:00,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 19:51:02,673][1652475] Updated weights for policy 0, policy_version 633872 (0.0013) [2024-06-15 19:51:04,424][1652475] Updated weights for policy 0, policy_version 633952 (0.0012) [2024-06-15 19:51:05,480][1651340] Signal inference workers to stop experience collection... (32600 times) [2024-06-15 19:51:05,536][1652475] InferenceWorker_p0-w0: stopping experience collection (32600 times) [2024-06-15 19:51:05,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 43144.5, 300 sec: 42765.0). Total num frames: 1298399232. Throughput: 0: 11104.7. Samples: 324653056. Policy #0 lag: (min: 35.0, avg: 172.5, max: 355.0) [2024-06-15 19:51:05,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 19:51:05,820][1651340] Signal inference workers to resume experience collection... (32600 times) [2024-06-15 19:51:05,821][1652475] InferenceWorker_p0-w0: resuming experience collection (32600 times) [2024-06-15 19:51:05,823][1652475] Updated weights for policy 0, policy_version 634000 (0.0019) [2024-06-15 19:51:08,772][1652475] Updated weights for policy 0, policy_version 634080 (0.0013) [2024-06-15 19:51:09,512][1652475] Updated weights for policy 0, policy_version 634109 (0.0012) [2024-06-15 19:51:10,738][1648984] Fps is (10 sec: 52427.6, 60 sec: 44782.8, 300 sec: 42654.0). 
Total num frames: 1298661376. Throughput: 0: 10808.9. Samples: 324704768. Policy #0 lag: (min: 35.0, avg: 172.5, max: 355.0) [2024-06-15 19:51:10,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 19:51:15,258][1652475] Updated weights for policy 0, policy_version 634169 (0.0012) [2024-06-15 19:51:15,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 42598.4, 300 sec: 42653.9). Total num frames: 1298792448. Throughput: 0: 11127.4. Samples: 324782080. Policy #0 lag: (min: 35.0, avg: 172.5, max: 355.0) [2024-06-15 19:51:15,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 19:51:16,592][1652475] Updated weights for policy 0, policy_version 634210 (0.0012) [2024-06-15 19:51:17,952][1652475] Updated weights for policy 0, policy_version 634262 (0.0047) [2024-06-15 19:51:20,384][1652475] Updated weights for policy 0, policy_version 634321 (0.0016) [2024-06-15 19:51:20,738][1648984] Fps is (10 sec: 45876.2, 60 sec: 44782.9, 300 sec: 42542.9). Total num frames: 1299120128. Throughput: 0: 11138.9. Samples: 324811264. Policy #0 lag: (min: 63.0, avg: 179.3, max: 303.0) [2024-06-15 19:51:20,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 19:51:25,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 43690.7, 300 sec: 42320.7). Total num frames: 1299185664. Throughput: 0: 10956.8. Samples: 324881920. Policy #0 lag: (min: 63.0, avg: 179.3, max: 303.0) [2024-06-15 19:51:25,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 19:51:26,780][1652475] Updated weights for policy 0, policy_version 634416 (0.0015) [2024-06-15 19:51:28,725][1652475] Updated weights for policy 0, policy_version 634480 (0.0013) [2024-06-15 19:51:30,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 44236.8, 300 sec: 42987.2). Total num frames: 1299546112. Throughput: 0: 11139.4. Samples: 324942336. 
Policy #0 lag: (min: 63.0, avg: 179.3, max: 303.0) [2024-06-15 19:51:30,749][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 19:51:30,783][1652475] Updated weights for policy 0, policy_version 634553 (0.0023) [2024-06-15 19:51:33,312][1652475] Updated weights for policy 0, policy_version 634595 (0.0012) [2024-06-15 19:51:35,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 43690.8, 300 sec: 42653.9). Total num frames: 1299709952. Throughput: 0: 11036.4. Samples: 324971520. Policy #0 lag: (min: 63.0, avg: 179.3, max: 303.0) [2024-06-15 19:51:35,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 19:51:38,124][1652475] Updated weights for policy 0, policy_version 634640 (0.0012) [2024-06-15 19:51:39,649][1652475] Updated weights for policy 0, policy_version 634690 (0.0018) [2024-06-15 19:51:40,738][1648984] Fps is (10 sec: 42598.0, 60 sec: 46421.3, 300 sec: 43098.3). Total num frames: 1299972096. Throughput: 0: 11161.6. Samples: 325046784. Policy #0 lag: (min: 63.0, avg: 179.3, max: 303.0) [2024-06-15 19:51:40,739][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 19:51:41,096][1652475] Updated weights for policy 0, policy_version 634768 (0.0014) [2024-06-15 19:51:44,681][1652475] Updated weights for policy 0, policy_version 634834 (0.0013) [2024-06-15 19:51:45,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 44784.6, 300 sec: 42653.9). Total num frames: 1300234240. Throughput: 0: 10934.0. Samples: 325106176. Policy #0 lag: (min: 63.0, avg: 179.3, max: 303.0) [2024-06-15 19:51:45,738][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 19:51:50,738][1648984] Fps is (10 sec: 26214.6, 60 sec: 43690.7, 300 sec: 42542.9). Total num frames: 1300234240. Throughput: 0: 10831.7. Samples: 325140480. 
Policy #0 lag: (min: 63.0, avg: 179.3, max: 303.0) [2024-06-15 19:51:50,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 19:51:51,218][1652475] Updated weights for policy 0, policy_version 634896 (0.0012) [2024-06-15 19:51:53,105][1652475] Updated weights for policy 0, policy_version 634964 (0.0017) [2024-06-15 19:51:53,545][1651340] Signal inference workers to stop experience collection... (32650 times) [2024-06-15 19:51:53,632][1652475] InferenceWorker_p0-w0: stopping experience collection (32650 times) [2024-06-15 19:51:53,777][1651340] Signal inference workers to resume experience collection... (32650 times) [2024-06-15 19:51:53,777][1652475] InferenceWorker_p0-w0: resuming experience collection (32650 times) [2024-06-15 19:51:55,152][1652475] Updated weights for policy 0, policy_version 635040 (0.0013) [2024-06-15 19:51:55,738][1648984] Fps is (10 sec: 36044.4, 60 sec: 43144.5, 300 sec: 42765.0). Total num frames: 1300594688. Throughput: 0: 10899.9. Samples: 325195264. Policy #0 lag: (min: 63.0, avg: 179.3, max: 303.0) [2024-06-15 19:51:55,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 19:51:59,921][1652475] Updated weights for policy 0, policy_version 635120 (0.0013) [2024-06-15 19:52:00,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 43690.6, 300 sec: 42653.9). Total num frames: 1300758528. Throughput: 0: 10501.7. Samples: 325254656. Policy #0 lag: (min: 63.0, avg: 179.3, max: 303.0) [2024-06-15 19:52:00,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 19:52:04,255][1652475] Updated weights for policy 0, policy_version 635168 (0.0013) [2024-06-15 19:52:05,738][1648984] Fps is (10 sec: 32768.3, 60 sec: 42052.3, 300 sec: 42765.0). Total num frames: 1300922368. Throughput: 0: 10683.7. Samples: 325292032. 
Policy #0 lag: (min: 63.0, avg: 179.3, max: 303.0) [2024-06-15 19:52:05,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 19:52:05,913][1652475] Updated weights for policy 0, policy_version 635232 (0.0014) [2024-06-15 19:52:07,434][1652475] Updated weights for policy 0, policy_version 635296 (0.0011) [2024-06-15 19:52:10,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 41506.3, 300 sec: 42320.7). Total num frames: 1301151744. Throughput: 0: 10433.4. Samples: 325351424. Policy #0 lag: (min: 63.0, avg: 179.3, max: 303.0) [2024-06-15 19:52:10,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 19:52:12,736][1652475] Updated weights for policy 0, policy_version 635347 (0.0013) [2024-06-15 19:52:13,658][1652475] Updated weights for policy 0, policy_version 635392 (0.0015) [2024-06-15 19:52:15,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 42598.4, 300 sec: 42542.9). Total num frames: 1301348352. Throughput: 0: 10706.5. Samples: 325424128. Policy #0 lag: (min: 63.0, avg: 179.3, max: 303.0) [2024-06-15 19:52:15,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 19:52:16,395][1652475] Updated weights for policy 0, policy_version 635456 (0.0013) [2024-06-15 19:52:18,615][1652475] Updated weights for policy 0, policy_version 635537 (0.0013) [2024-06-15 19:52:19,553][1652475] Updated weights for policy 0, policy_version 635583 (0.0011) [2024-06-15 19:52:20,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 42598.4, 300 sec: 43098.3). Total num frames: 1301676032. Throughput: 0: 10570.0. Samples: 325447168. Policy #0 lag: (min: 63.0, avg: 179.3, max: 303.0) [2024-06-15 19:52:20,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 19:52:25,731][1652475] Updated weights for policy 0, policy_version 635645 (0.0013) [2024-06-15 19:52:25,738][1648984] Fps is (10 sec: 42597.4, 60 sec: 43144.3, 300 sec: 42542.8). Total num frames: 1301774336. Throughput: 0: 10547.2. Samples: 325521408. 
Policy #0 lag: (min: 63.0, avg: 179.3, max: 303.0) [2024-06-15 19:52:25,738][1648984] Avg episode reward: [(0, '-0.220')] [2024-06-15 19:52:28,897][1652475] Updated weights for policy 0, policy_version 635712 (0.0014) [2024-06-15 19:52:30,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 42052.3, 300 sec: 42654.0). Total num frames: 1302069248. Throughput: 0: 10501.7. Samples: 325578752. Policy #0 lag: (min: 63.0, avg: 179.3, max: 303.0) [2024-06-15 19:52:30,738][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 19:52:31,212][1652475] Updated weights for policy 0, policy_version 635808 (0.0146) [2024-06-15 19:52:35,738][1648984] Fps is (10 sec: 42599.2, 60 sec: 41506.1, 300 sec: 42653.9). Total num frames: 1302200320. Throughput: 0: 10296.9. Samples: 325603840. Policy #0 lag: (min: 63.0, avg: 179.3, max: 303.0) [2024-06-15 19:52:35,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 19:52:37,554][1652475] Updated weights for policy 0, policy_version 635872 (0.0014) [2024-06-15 19:52:40,520][1651340] Signal inference workers to stop experience collection... (32700 times) [2024-06-15 19:52:40,584][1652475] InferenceWorker_p0-w0: stopping experience collection (32700 times) [2024-06-15 19:52:40,738][1648984] Fps is (10 sec: 26214.3, 60 sec: 39321.7, 300 sec: 42209.6). Total num frames: 1302331392. Throughput: 0: 10649.6. Samples: 325674496. Policy #0 lag: (min: 63.0, avg: 179.3, max: 303.0) [2024-06-15 19:52:40,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 19:52:40,767][1651340] Signal inference workers to resume experience collection... 
(32700 times) [2024-06-15 19:52:40,767][1652475] InferenceWorker_p0-w0: resuming experience collection (32700 times) [2024-06-15 19:52:40,919][1652475] Updated weights for policy 0, policy_version 635922 (0.0013) [2024-06-15 19:52:42,834][1652475] Updated weights for policy 0, policy_version 636005 (0.0014) [2024-06-15 19:52:44,608][1652475] Updated weights for policy 0, policy_version 636069 (0.0118) [2024-06-15 19:52:45,738][1648984] Fps is (10 sec: 52429.4, 60 sec: 41506.2, 300 sec: 43098.3). Total num frames: 1302724608. Throughput: 0: 10433.4. Samples: 325724160. Policy #0 lag: (min: 63.0, avg: 179.3, max: 303.0) [2024-06-15 19:52:45,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 19:52:49,811][1652475] Updated weights for policy 0, policy_version 636134 (0.0013) [2024-06-15 19:52:50,738][1648984] Fps is (10 sec: 52428.3, 60 sec: 43690.6, 300 sec: 42765.0). Total num frames: 1302855680. Throughput: 0: 10387.9. Samples: 325759488. Policy #0 lag: (min: 63.0, avg: 179.3, max: 303.0) [2024-06-15 19:52:50,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 19:52:52,010][1652475] Updated weights for policy 0, policy_version 636164 (0.0031) [2024-06-15 19:52:55,738][1648984] Fps is (10 sec: 26213.4, 60 sec: 39867.6, 300 sec: 42431.7). Total num frames: 1302986752. Throughput: 0: 10467.5. Samples: 325822464. Policy #0 lag: (min: 63.0, avg: 179.3, max: 303.0) [2024-06-15 19:52:55,738][1648984] Avg episode reward: [(0, '-0.250')] [2024-06-15 19:52:55,877][1652475] Updated weights for policy 0, policy_version 636230 (0.0013) [2024-06-15 19:52:56,038][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000636240_1303019520.pth... 
[2024-06-15 19:52:56,207][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000631280_1292861440.pth [2024-06-15 19:52:57,844][1652475] Updated weights for policy 0, policy_version 636306 (0.0014) [2024-06-15 19:52:58,708][1652475] Updated weights for policy 0, policy_version 636352 (0.0013) [2024-06-15 19:53:00,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 42052.2, 300 sec: 42767.1). Total num frames: 1303281664. Throughput: 0: 10387.9. Samples: 325891584. Policy #0 lag: (min: 15.0, avg: 123.8, max: 335.0) [2024-06-15 19:53:00,739][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 19:53:01,642][1652475] Updated weights for policy 0, policy_version 636409 (0.0033) [2024-06-15 19:53:04,519][1652475] Updated weights for policy 0, policy_version 636472 (0.0016) [2024-06-15 19:53:05,738][1648984] Fps is (10 sec: 52430.7, 60 sec: 43144.6, 300 sec: 43098.2). Total num frames: 1303511040. Throughput: 0: 10615.5. Samples: 325924864. Policy #0 lag: (min: 15.0, avg: 123.8, max: 335.0) [2024-06-15 19:53:05,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 19:53:08,883][1652475] Updated weights for policy 0, policy_version 636529 (0.0017) [2024-06-15 19:53:10,334][1652475] Updated weights for policy 0, policy_version 636592 (0.0014) [2024-06-15 19:53:10,745][1648984] Fps is (10 sec: 49116.7, 60 sec: 43685.4, 300 sec: 43319.4). Total num frames: 1303773184. Throughput: 0: 10443.2. Samples: 325991424. Policy #0 lag: (min: 15.0, avg: 123.8, max: 335.0) [2024-06-15 19:53:10,746][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 19:53:11,881][1652475] Updated weights for policy 0, policy_version 636628 (0.0013) [2024-06-15 19:53:15,738][1648984] Fps is (10 sec: 42597.9, 60 sec: 43144.5, 300 sec: 42765.0). Total num frames: 1303937024. Throughput: 0: 10683.7. Samples: 326059520. 
Policy #0 lag: (min: 15.0, avg: 123.8, max: 335.0) [2024-06-15 19:53:15,739][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 19:53:16,499][1652475] Updated weights for policy 0, policy_version 636720 (0.0016) [2024-06-15 19:53:20,738][1648984] Fps is (10 sec: 32791.8, 60 sec: 40413.9, 300 sec: 43098.3). Total num frames: 1304100864. Throughput: 0: 10763.4. Samples: 326088192. Policy #0 lag: (min: 15.0, avg: 123.8, max: 335.0) [2024-06-15 19:53:20,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 19:53:21,219][1652475] Updated weights for policy 0, policy_version 636790 (0.0013) [2024-06-15 19:53:22,837][1652475] Updated weights for policy 0, policy_version 636857 (0.0092) [2024-06-15 19:53:24,116][1651340] Signal inference workers to stop experience collection... (32750 times) [2024-06-15 19:53:24,144][1652475] InferenceWorker_p0-w0: stopping experience collection (32750 times) [2024-06-15 19:53:24,375][1651340] Signal inference workers to resume experience collection... (32750 times) [2024-06-15 19:53:24,376][1652475] InferenceWorker_p0-w0: resuming experience collection (32750 times) [2024-06-15 19:53:24,956][1652475] Updated weights for policy 0, policy_version 636923 (0.0014) [2024-06-15 19:53:25,738][1648984] Fps is (10 sec: 49151.9, 60 sec: 44236.9, 300 sec: 42876.1). Total num frames: 1304428544. Throughput: 0: 10558.5. Samples: 326149632. Policy #0 lag: (min: 15.0, avg: 123.8, max: 335.0) [2024-06-15 19:53:25,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 19:53:29,447][1652475] Updated weights for policy 0, policy_version 636985 (0.0059) [2024-06-15 19:53:30,738][1648984] Fps is (10 sec: 45875.3, 60 sec: 41506.1, 300 sec: 43098.3). Total num frames: 1304559616. Throughput: 0: 10899.9. Samples: 326214656. 
Policy #0 lag: (min: 15.0, avg: 123.8, max: 335.0) [2024-06-15 19:53:30,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 19:53:33,827][1652475] Updated weights for policy 0, policy_version 637047 (0.0064) [2024-06-15 19:53:34,938][1652475] Updated weights for policy 0, policy_version 637088 (0.0011) [2024-06-15 19:53:35,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 43690.7, 300 sec: 43209.3). Total num frames: 1304821760. Throughput: 0: 10808.9. Samples: 326245888. Policy #0 lag: (min: 15.0, avg: 123.8, max: 335.0) [2024-06-15 19:53:35,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 19:53:35,841][1652475] Updated weights for policy 0, policy_version 637122 (0.0011) [2024-06-15 19:53:40,738][1648984] Fps is (10 sec: 42597.9, 60 sec: 44236.7, 300 sec: 42765.0). Total num frames: 1304985600. Throughput: 0: 10900.0. Samples: 326312960. Policy #0 lag: (min: 15.0, avg: 123.8, max: 335.0) [2024-06-15 19:53:40,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 19:53:41,034][1652475] Updated weights for policy 0, policy_version 637216 (0.0031) [2024-06-15 19:53:44,803][1652475] Updated weights for policy 0, policy_version 637265 (0.0014) [2024-06-15 19:53:45,738][1648984] Fps is (10 sec: 36045.1, 60 sec: 40960.0, 300 sec: 43321.2). Total num frames: 1305182208. Throughput: 0: 10865.8. Samples: 326380544. Policy #0 lag: (min: 15.0, avg: 123.8, max: 335.0) [2024-06-15 19:53:45,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 19:53:46,025][1652475] Updated weights for policy 0, policy_version 637313 (0.0076) [2024-06-15 19:53:47,773][1652475] Updated weights for policy 0, policy_version 637394 (0.0011) [2024-06-15 19:53:50,738][1648984] Fps is (10 sec: 49150.6, 60 sec: 43690.4, 300 sec: 42876.1). Total num frames: 1305477120. Throughput: 0: 10660.9. Samples: 326404608. 
Policy #0 lag: (min: 15.0, avg: 123.8, max: 335.0) [2024-06-15 19:53:50,739][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 19:53:53,077][1652475] Updated weights for policy 0, policy_version 637443 (0.0018) [2024-06-15 19:53:54,173][1652475] Updated weights for policy 0, policy_version 637492 (0.0014) [2024-06-15 19:53:55,738][1648984] Fps is (10 sec: 42598.5, 60 sec: 43690.9, 300 sec: 43098.3). Total num frames: 1305608192. Throughput: 0: 10765.1. Samples: 326475776. Policy #0 lag: (min: 15.0, avg: 123.8, max: 335.0) [2024-06-15 19:53:55,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 19:53:56,733][1652475] Updated weights for policy 0, policy_version 637537 (0.0012) [2024-06-15 19:53:57,934][1652475] Updated weights for policy 0, policy_version 637571 (0.0012) [2024-06-15 19:53:59,435][1652475] Updated weights for policy 0, policy_version 637636 (0.0012) [2024-06-15 19:54:00,738][1648984] Fps is (10 sec: 49153.8, 60 sec: 44783.0, 300 sec: 43320.4). Total num frames: 1305968640. Throughput: 0: 10740.6. Samples: 326542848. Policy #0 lag: (min: 15.0, avg: 123.8, max: 335.0) [2024-06-15 19:54:00,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 19:54:04,652][1652475] Updated weights for policy 0, policy_version 637699 (0.0012) [2024-06-15 19:54:05,738][1648984] Fps is (10 sec: 45873.4, 60 sec: 42598.1, 300 sec: 42878.5). Total num frames: 1306066944. Throughput: 0: 10888.4. Samples: 326578176. Policy #0 lag: (min: 15.0, avg: 123.8, max: 335.0) [2024-06-15 19:54:05,739][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 19:54:06,128][1652475] Updated weights for policy 0, policy_version 637757 (0.0012) [2024-06-15 19:54:09,605][1652475] Updated weights for policy 0, policy_version 637829 (0.0020) [2024-06-15 19:54:10,147][1651340] Signal inference workers to stop experience collection... 
(32800 times) [2024-06-15 19:54:10,181][1652475] InferenceWorker_p0-w0: stopping experience collection (32800 times) [2024-06-15 19:54:10,307][1651340] Signal inference workers to resume experience collection... (32800 times) [2024-06-15 19:54:10,308][1652475] InferenceWorker_p0-w0: resuming experience collection (32800 times) [2024-06-15 19:54:10,739][1648984] Fps is (10 sec: 42591.1, 60 sec: 43694.7, 300 sec: 43542.3). Total num frames: 1306394624. Throughput: 0: 10808.5. Samples: 326636032. Policy #0 lag: (min: 15.0, avg: 123.8, max: 335.0) [2024-06-15 19:54:10,740][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 19:54:12,501][1652475] Updated weights for policy 0, policy_version 637894 (0.0014) [2024-06-15 19:54:15,749][1648984] Fps is (10 sec: 45827.0, 60 sec: 43136.8, 300 sec: 42652.4). Total num frames: 1306525696. Throughput: 0: 10874.5. Samples: 326704128. Policy #0 lag: (min: 15.0, avg: 123.8, max: 335.0) [2024-06-15 19:54:15,749][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 19:54:18,466][1652475] Updated weights for policy 0, policy_version 637968 (0.0013) [2024-06-15 19:54:20,738][1648984] Fps is (10 sec: 36051.0, 60 sec: 44236.8, 300 sec: 43320.4). Total num frames: 1306755072. Throughput: 0: 11002.3. Samples: 326740992. Policy #0 lag: (min: 15.0, avg: 123.8, max: 335.0) [2024-06-15 19:54:20,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 19:54:20,781][1652475] Updated weights for policy 0, policy_version 638068 (0.0016) [2024-06-15 19:54:22,312][1652475] Updated weights for policy 0, policy_version 638141 (0.0019) [2024-06-15 19:54:25,738][1648984] Fps is (10 sec: 39364.2, 60 sec: 41506.2, 300 sec: 42653.9). Total num frames: 1306918912. Throughput: 0: 10729.2. Samples: 326795776. 
Policy #0 lag: (min: 15.0, avg: 123.8, max: 335.0) [2024-06-15 19:54:25,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 19:54:28,227][1652475] Updated weights for policy 0, policy_version 638208 (0.0017) [2024-06-15 19:54:30,738][1648984] Fps is (10 sec: 29491.3, 60 sec: 41506.1, 300 sec: 42653.9). Total num frames: 1307049984. Throughput: 0: 10706.5. Samples: 326862336. Policy #0 lag: (min: 15.0, avg: 123.8, max: 335.0) [2024-06-15 19:54:30,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 19:54:31,931][1652475] Updated weights for policy 0, policy_version 638274 (0.0017) [2024-06-15 19:54:33,068][1652475] Updated weights for policy 0, policy_version 638327 (0.0101) [2024-06-15 19:54:34,733][1652475] Updated weights for policy 0, policy_version 638396 (0.0013) [2024-06-15 19:54:35,738][1648984] Fps is (10 sec: 52429.3, 60 sec: 43690.8, 300 sec: 43209.3). Total num frames: 1307443200. Throughput: 0: 10854.5. Samples: 326893056. Policy #0 lag: (min: 42.0, avg: 130.0, max: 240.0) [2024-06-15 19:54:35,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 19:54:40,743][1648984] Fps is (10 sec: 52402.8, 60 sec: 43141.0, 300 sec: 42986.7). Total num frames: 1307574272. Throughput: 0: 10796.3. Samples: 326961664. Policy #0 lag: (min: 42.0, avg: 130.0, max: 240.0) [2024-06-15 19:54:40,743][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 19:54:42,795][1652475] Updated weights for policy 0, policy_version 638480 (0.0021) [2024-06-15 19:54:44,297][1652475] Updated weights for policy 0, policy_version 638549 (0.0013) [2024-06-15 19:54:45,738][1648984] Fps is (10 sec: 39321.3, 60 sec: 44236.8, 300 sec: 42876.1). Total num frames: 1307836416. Throughput: 0: 10683.7. Samples: 327023616. 
Policy #0 lag: (min: 42.0, avg: 130.0, max: 240.0) [2024-06-15 19:54:45,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 19:54:46,355][1652475] Updated weights for policy 0, policy_version 638625 (0.0017) [2024-06-15 19:54:50,738][1648984] Fps is (10 sec: 39340.8, 60 sec: 41506.4, 300 sec: 42654.0). Total num frames: 1307967488. Throughput: 0: 10626.9. Samples: 327056384. Policy #0 lag: (min: 42.0, avg: 130.0, max: 240.0) [2024-06-15 19:54:50,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 19:54:51,625][1652475] Updated weights for policy 0, policy_version 638690 (0.0109) [2024-06-15 19:54:54,795][1652475] Updated weights for policy 0, policy_version 638752 (0.0014) [2024-06-15 19:54:55,599][1651340] Signal inference workers to stop experience collection... (32850 times) [2024-06-15 19:54:55,652][1652475] InferenceWorker_p0-w0: stopping experience collection (32850 times) [2024-06-15 19:54:55,738][1648984] Fps is (10 sec: 39321.3, 60 sec: 43690.6, 300 sec: 42877.3). Total num frames: 1308229632. Throughput: 0: 10934.4. Samples: 327128064. Policy #0 lag: (min: 42.0, avg: 130.0, max: 240.0) [2024-06-15 19:54:55,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 19:54:55,867][1651340] Signal inference workers to resume experience collection... (32850 times) [2024-06-15 19:54:55,868][1652475] InferenceWorker_p0-w0: resuming experience collection (32850 times) [2024-06-15 19:54:56,229][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000638816_1308295168.pth... [2024-06-15 19:54:56,364][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000633792_1298006016.pth [2024-06-15 19:54:56,898][1652475] Updated weights for policy 0, policy_version 638835 (0.0103) [2024-06-15 19:54:58,744][1652475] Updated weights for policy 0, policy_version 638909 (0.0012) [2024-06-15 19:55:00,746][1648984] Fps is (10 sec: 52428.8, 60 sec: 42052.2, 300 sec: 42987.2). 
Total num frames: 1308491776. Throughput: 0: 10618.0. Samples: 327181824. Policy #0 lag: (min: 42.0, avg: 130.0, max: 240.0) [2024-06-15 19:55:00,747][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 19:55:03,896][1652475] Updated weights for policy 0, policy_version 638972 (0.0013) [2024-06-15 19:55:05,738][1648984] Fps is (10 sec: 39322.0, 60 sec: 42598.7, 300 sec: 42876.1). Total num frames: 1308622848. Throughput: 0: 10535.8. Samples: 327215104. Policy #0 lag: (min: 42.0, avg: 130.0, max: 240.0) [2024-06-15 19:55:05,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 19:55:07,828][1652475] Updated weights for policy 0, policy_version 639024 (0.0014) [2024-06-15 19:55:10,667][1652475] Updated weights for policy 0, policy_version 639101 (0.0014) [2024-06-15 19:55:10,738][1648984] Fps is (10 sec: 36044.8, 60 sec: 40961.1, 300 sec: 42765.0). Total num frames: 1308852224. Throughput: 0: 10752.0. Samples: 327279616. Policy #0 lag: (min: 42.0, avg: 130.0, max: 240.0) [2024-06-15 19:55:10,738][1648984] Avg episode reward: [(0, '-0.250')] [2024-06-15 19:55:12,133][1652475] Updated weights for policy 0, policy_version 639152 (0.0011) [2024-06-15 19:55:15,153][1652475] Updated weights for policy 0, policy_version 639202 (0.0014) [2024-06-15 19:55:15,738][1648984] Fps is (10 sec: 52428.0, 60 sec: 43698.5, 300 sec: 43098.2). Total num frames: 1309147136. Throughput: 0: 10649.6. Samples: 327341568. Policy #0 lag: (min: 42.0, avg: 130.0, max: 240.0) [2024-06-15 19:55:15,739][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 19:55:18,711][1652475] Updated weights for policy 0, policy_version 639265 (0.0014) [2024-06-15 19:55:19,360][1652475] Updated weights for policy 0, policy_version 639293 (0.0014) [2024-06-15 19:55:20,738][1648984] Fps is (10 sec: 42597.1, 60 sec: 42052.0, 300 sec: 43098.2). Total num frames: 1309278208. Throughput: 0: 10706.4. Samples: 327374848. 
Policy #0 lag: (min: 42.0, avg: 130.0, max: 240.0) [2024-06-15 19:55:20,739][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 19:55:23,440][1652475] Updated weights for policy 0, policy_version 639348 (0.0014) [2024-06-15 19:55:25,738][1648984] Fps is (10 sec: 39322.2, 60 sec: 43690.7, 300 sec: 42876.1). Total num frames: 1309540352. Throughput: 0: 10628.0. Samples: 327439872. Policy #0 lag: (min: 42.0, avg: 130.0, max: 240.0) [2024-06-15 19:55:25,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 19:55:26,516][1652475] Updated weights for policy 0, policy_version 639440 (0.0013) [2024-06-15 19:55:29,691][1652475] Updated weights for policy 0, policy_version 639505 (0.0117) [2024-06-15 19:55:30,738][1648984] Fps is (10 sec: 49153.6, 60 sec: 45329.0, 300 sec: 42987.2). Total num frames: 1309769728. Throughput: 0: 10615.5. Samples: 327501312. Policy #0 lag: (min: 42.0, avg: 130.0, max: 240.0) [2024-06-15 19:55:30,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 19:55:34,462][1652475] Updated weights for policy 0, policy_version 639568 (0.0025) [2024-06-15 19:55:35,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 41506.1, 300 sec: 43209.3). Total num frames: 1309933568. Throughput: 0: 10729.3. Samples: 327539200. Policy #0 lag: (min: 42.0, avg: 130.0, max: 240.0) [2024-06-15 19:55:35,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 19:55:36,740][1652475] Updated weights for policy 0, policy_version 639667 (0.0031) [2024-06-15 19:55:38,800][1652475] Updated weights for policy 0, policy_version 639712 (0.0029) [2024-06-15 19:55:40,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 43694.2, 300 sec: 42876.4). Total num frames: 1310195712. Throughput: 0: 10444.8. Samples: 327598080. Policy #0 lag: (min: 42.0, avg: 130.0, max: 240.0) [2024-06-15 19:55:40,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 19:55:42,152][1651340] Signal inference workers to stop experience collection... 
(32900 times) [2024-06-15 19:55:42,253][1652475] InferenceWorker_p0-w0: stopping experience collection (32900 times) [2024-06-15 19:55:42,369][1651340] Signal inference workers to resume experience collection... (32900 times) [2024-06-15 19:55:42,378][1652475] InferenceWorker_p0-w0: resuming experience collection (32900 times) [2024-06-15 19:55:42,552][1652475] Updated weights for policy 0, policy_version 639777 (0.0012) [2024-06-15 19:55:43,271][1652475] Updated weights for policy 0, policy_version 639808 (0.0011) [2024-06-15 19:55:45,737][1648984] Fps is (10 sec: 39321.8, 60 sec: 41506.2, 300 sec: 43098.3). Total num frames: 1310326784. Throughput: 0: 10956.8. Samples: 327674880. Policy #0 lag: (min: 42.0, avg: 130.0, max: 240.0) [2024-06-15 19:55:45,740][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 19:55:47,399][1652475] Updated weights for policy 0, policy_version 639875 (0.0014) [2024-06-15 19:55:50,418][1652475] Updated weights for policy 0, policy_version 639956 (0.0017) [2024-06-15 19:55:50,738][1648984] Fps is (10 sec: 45875.6, 60 sec: 44783.0, 300 sec: 42876.1). Total num frames: 1310654464. Throughput: 0: 10797.5. Samples: 327700992. Policy #0 lag: (min: 42.0, avg: 130.0, max: 240.0) [2024-06-15 19:55:50,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 19:55:54,768][1652475] Updated weights for policy 0, policy_version 640048 (0.0014) [2024-06-15 19:55:55,738][1648984] Fps is (10 sec: 52427.8, 60 sec: 43690.7, 300 sec: 43098.2). Total num frames: 1310851072. Throughput: 0: 10911.3. Samples: 327770624. Policy #0 lag: (min: 42.0, avg: 130.0, max: 240.0) [2024-06-15 19:55:55,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 19:55:59,080][1652475] Updated weights for policy 0, policy_version 640113 (0.0013) [2024-06-15 19:56:00,559][1652475] Updated weights for policy 0, policy_version 640192 (0.0014) [2024-06-15 19:56:00,738][1648984] Fps is (10 sec: 45875.1, 60 sec: 43690.7, 300 sec: 43098.3). 
Total num frames: 1311113216. Throughput: 0: 11002.3. Samples: 327836672. Policy #0 lag: (min: 42.0, avg: 130.0, max: 240.0) [2024-06-15 19:56:00,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 19:56:03,005][1652475] Updated weights for policy 0, policy_version 640251 (0.0100) [2024-06-15 19:56:05,738][1648984] Fps is (10 sec: 42598.7, 60 sec: 44236.8, 300 sec: 42765.0). Total num frames: 1311277056. Throughput: 0: 10945.5. Samples: 327867392. Policy #0 lag: (min: 42.0, avg: 130.0, max: 240.0) [2024-06-15 19:56:05,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 19:56:06,601][1652475] Updated weights for policy 0, policy_version 640315 (0.0117) [2024-06-15 19:56:10,696][1652475] Updated weights for policy 0, policy_version 640373 (0.0012) [2024-06-15 19:56:10,738][1648984] Fps is (10 sec: 36044.6, 60 sec: 43690.6, 300 sec: 42987.2). Total num frames: 1311473664. Throughput: 0: 11161.6. Samples: 327942144. Policy #0 lag: (min: 5.0, avg: 85.8, max: 261.0) [2024-06-15 19:56:10,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 19:56:14,786][1652475] Updated weights for policy 0, policy_version 640480 (0.0014) [2024-06-15 19:56:15,738][1648984] Fps is (10 sec: 49152.1, 60 sec: 43690.8, 300 sec: 42876.1). Total num frames: 1311768576. Throughput: 0: 11047.8. Samples: 327998464. Policy #0 lag: (min: 5.0, avg: 85.8, max: 261.0) [2024-06-15 19:56:15,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 19:56:17,287][1652475] Updated weights for policy 0, policy_version 640528 (0.0012) [2024-06-15 19:56:18,433][1652475] Updated weights for policy 0, policy_version 640570 (0.0131) [2024-06-15 19:56:20,738][1648984] Fps is (10 sec: 42597.4, 60 sec: 43690.7, 300 sec: 43098.2). Total num frames: 1311899648. Throughput: 0: 10865.7. Samples: 328028160. 
Policy #0 lag: (min: 5.0, avg: 85.8, max: 261.0) [2024-06-15 19:56:20,739][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 19:56:21,695][1652475] Updated weights for policy 0, policy_version 640609 (0.0014) [2024-06-15 19:56:22,660][1652475] Updated weights for policy 0, policy_version 640646 (0.0013) [2024-06-15 19:56:25,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 43690.6, 300 sec: 42765.0). Total num frames: 1312161792. Throughput: 0: 11207.1. Samples: 328102400. Policy #0 lag: (min: 5.0, avg: 85.8, max: 261.0) [2024-06-15 19:56:25,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 19:56:26,562][1652475] Updated weights for policy 0, policy_version 640720 (0.0135) [2024-06-15 19:56:26,949][1651340] Signal inference workers to stop experience collection... (32950 times) [2024-06-15 19:56:26,997][1652475] InferenceWorker_p0-w0: stopping experience collection (32950 times) [2024-06-15 19:56:27,127][1651340] Signal inference workers to resume experience collection... (32950 times) [2024-06-15 19:56:27,127][1652475] InferenceWorker_p0-w0: resuming experience collection (32950 times) [2024-06-15 19:56:27,435][1652475] Updated weights for policy 0, policy_version 640768 (0.0027) [2024-06-15 19:56:30,738][1648984] Fps is (10 sec: 49154.1, 60 sec: 43690.8, 300 sec: 42987.2). Total num frames: 1312391168. Throughput: 0: 10979.6. Samples: 328168960. Policy #0 lag: (min: 5.0, avg: 85.8, max: 261.0) [2024-06-15 19:56:30,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 19:56:32,759][1652475] Updated weights for policy 0, policy_version 640864 (0.0097) [2024-06-15 19:56:34,503][1652475] Updated weights for policy 0, policy_version 640944 (0.0016) [2024-06-15 19:56:35,738][1648984] Fps is (10 sec: 52426.7, 60 sec: 45874.8, 300 sec: 43098.2). Total num frames: 1312686080. Throughput: 0: 11047.7. Samples: 328198144. 
Policy #0 lag: (min: 5.0, avg: 85.8, max: 261.0) [2024-06-15 19:56:35,739][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 19:56:40,536][1652475] Updated weights for policy 0, policy_version 640993 (0.0020) [2024-06-15 19:56:40,738][1648984] Fps is (10 sec: 36044.3, 60 sec: 42598.4, 300 sec: 42431.8). Total num frames: 1312751616. Throughput: 0: 10945.4. Samples: 328263168. Policy #0 lag: (min: 5.0, avg: 85.8, max: 261.0) [2024-06-15 19:56:40,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 19:56:42,807][1652475] Updated weights for policy 0, policy_version 641040 (0.0013) [2024-06-15 19:56:44,826][1652475] Updated weights for policy 0, policy_version 641120 (0.0128) [2024-06-15 19:56:45,738][1648984] Fps is (10 sec: 39323.5, 60 sec: 45875.1, 300 sec: 43542.6). Total num frames: 1313079296. Throughput: 0: 10922.7. Samples: 328328192. Policy #0 lag: (min: 5.0, avg: 85.8, max: 261.0) [2024-06-15 19:56:45,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 19:56:46,037][1652475] Updated weights for policy 0, policy_version 641170 (0.0015) [2024-06-15 19:56:50,738][1648984] Fps is (10 sec: 45875.5, 60 sec: 42598.4, 300 sec: 42765.0). Total num frames: 1313210368. Throughput: 0: 10922.7. Samples: 328358912. Policy #0 lag: (min: 5.0, avg: 85.8, max: 261.0) [2024-06-15 19:56:50,740][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 19:56:53,422][1652475] Updated weights for policy 0, policy_version 641252 (0.0013) [2024-06-15 19:56:55,738][1648984] Fps is (10 sec: 32766.7, 60 sec: 42598.2, 300 sec: 42876.0). Total num frames: 1313406976. Throughput: 0: 10831.6. Samples: 328429568. Policy #0 lag: (min: 5.0, avg: 85.8, max: 261.0) [2024-06-15 19:56:55,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 19:56:56,059][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000641328_1313439744.pth... 
[2024-06-15 19:56:56,068][1652475] Updated weights for policy 0, policy_version 641328 (0.0021) [2024-06-15 19:56:56,209][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000636240_1303019520.pth [2024-06-15 19:56:57,411][1652475] Updated weights for policy 0, policy_version 641380 (0.0131) [2024-06-15 19:56:59,155][1652475] Updated weights for policy 0, policy_version 641458 (0.0013) [2024-06-15 19:57:00,738][1648984] Fps is (10 sec: 52428.5, 60 sec: 43690.7, 300 sec: 43431.5). Total num frames: 1313734656. Throughput: 0: 10763.4. Samples: 328482816. Policy #0 lag: (min: 5.0, avg: 85.8, max: 261.0) [2024-06-15 19:57:00,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 19:57:05,738][1648984] Fps is (10 sec: 39323.1, 60 sec: 42052.3, 300 sec: 42876.1). Total num frames: 1313800192. Throughput: 0: 10991.0. Samples: 328522752. Policy #0 lag: (min: 5.0, avg: 85.8, max: 261.0) [2024-06-15 19:57:05,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 19:57:06,067][1652475] Updated weights for policy 0, policy_version 641520 (0.0013) [2024-06-15 19:57:07,184][1652475] Updated weights for policy 0, policy_version 641555 (0.0019) [2024-06-15 19:57:08,500][1652475] Updated weights for policy 0, policy_version 641618 (0.0155) [2024-06-15 19:57:09,636][1652475] Updated weights for policy 0, policy_version 641667 (0.0014) [2024-06-15 19:57:09,634][1651340] Signal inference workers to stop experience collection... (33000 times) [2024-06-15 19:57:09,705][1652475] InferenceWorker_p0-w0: stopping experience collection (33000 times) [2024-06-15 19:57:09,780][1651340] Signal inference workers to resume experience collection... 
(33000 times) [2024-06-15 19:57:09,818][1652475] InferenceWorker_p0-w0: resuming experience collection (33000 times) [2024-06-15 19:57:10,554][1652475] Updated weights for policy 0, policy_version 641720 (0.0159) [2024-06-15 19:57:10,738][1648984] Fps is (10 sec: 49152.2, 60 sec: 45875.3, 300 sec: 43653.6). Total num frames: 1314226176. Throughput: 0: 10763.4. Samples: 328586752. Policy #0 lag: (min: 5.0, avg: 85.8, max: 261.0) [2024-06-15 19:57:10,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 19:57:15,738][1648984] Fps is (10 sec: 45875.0, 60 sec: 41506.1, 300 sec: 42653.9). Total num frames: 1314258944. Throughput: 0: 10934.0. Samples: 328660992. Policy #0 lag: (min: 5.0, avg: 85.8, max: 261.0) [2024-06-15 19:57:15,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 19:57:17,688][1652475] Updated weights for policy 0, policy_version 641778 (0.0015) [2024-06-15 19:57:20,541][1652475] Updated weights for policy 0, policy_version 641858 (0.0037) [2024-06-15 19:57:20,738][1648984] Fps is (10 sec: 29491.2, 60 sec: 43690.9, 300 sec: 43209.4). Total num frames: 1314521088. Throughput: 0: 10831.8. Samples: 328685568. Policy #0 lag: (min: 5.0, avg: 85.8, max: 261.0) [2024-06-15 19:57:20,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 19:57:21,530][1652475] Updated weights for policy 0, policy_version 641915 (0.0093) [2024-06-15 19:57:24,572][1652475] Updated weights for policy 0, policy_version 641982 (0.0051) [2024-06-15 19:57:25,738][1648984] Fps is (10 sec: 52428.5, 60 sec: 43690.7, 300 sec: 43098.2). Total num frames: 1314783232. Throughput: 0: 10706.5. Samples: 328744960. Policy #0 lag: (min: 5.0, avg: 85.8, max: 261.0) [2024-06-15 19:57:25,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 19:57:29,729][1652475] Updated weights for policy 0, policy_version 642034 (0.0016) [2024-06-15 19:57:30,738][1648984] Fps is (10 sec: 42597.0, 60 sec: 42598.1, 300 sec: 43209.3). Total num frames: 1314947072. 
Throughput: 0: 10717.8. Samples: 328810496. Policy #0 lag: (min: 5.0, avg: 85.8, max: 261.0) [2024-06-15 19:57:30,739][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 19:57:31,315][1652475] Updated weights for policy 0, policy_version 642112 (0.0019) [2024-06-15 19:57:35,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 41506.4, 300 sec: 43542.6). Total num frames: 1315176448. Throughput: 0: 10729.2. Samples: 328841728. Policy #0 lag: (min: 5.0, avg: 85.8, max: 261.0) [2024-06-15 19:57:35,738][1648984] Avg episode reward: [(0, '-0.540')] [2024-06-15 19:57:36,378][1652475] Updated weights for policy 0, policy_version 642177 (0.0014) [2024-06-15 19:57:40,539][1652475] Updated weights for policy 0, policy_version 642258 (0.0015) [2024-06-15 19:57:40,738][1648984] Fps is (10 sec: 39322.3, 60 sec: 43144.5, 300 sec: 42765.0). Total num frames: 1315340288. Throughput: 0: 10661.0. Samples: 328909312. Policy #0 lag: (min: 5.0, avg: 85.8, max: 261.0) [2024-06-15 19:57:40,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 19:57:42,584][1652475] Updated weights for policy 0, policy_version 642338 (0.0012) [2024-06-15 19:57:45,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 41506.1, 300 sec: 43098.2). Total num frames: 1315569664. Throughput: 0: 10911.3. Samples: 328973824. Policy #0 lag: (min: 5.0, avg: 85.8, max: 261.0) [2024-06-15 19:57:45,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 19:57:46,764][1652475] Updated weights for policy 0, policy_version 642400 (0.0022) [2024-06-15 19:57:48,940][1652475] Updated weights for policy 0, policy_version 642435 (0.0014) [2024-06-15 19:57:50,285][1652475] Updated weights for policy 0, policy_version 642496 (0.0016) [2024-06-15 19:57:50,738][1648984] Fps is (10 sec: 49153.0, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 1315831808. Throughput: 0: 10740.6. Samples: 329006080. 
Policy #0 lag: (min: 15.0, avg: 113.0, max: 271.0) [2024-06-15 19:57:50,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 19:57:53,957][1652475] Updated weights for policy 0, policy_version 642565 (0.0134) [2024-06-15 19:57:55,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 44783.2, 300 sec: 43431.5). Total num frames: 1316093952. Throughput: 0: 10661.0. Samples: 329066496. Policy #0 lag: (min: 15.0, avg: 113.0, max: 271.0) [2024-06-15 19:57:55,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 19:57:58,065][1652475] Updated weights for policy 0, policy_version 642640 (0.0012) [2024-06-15 19:57:58,186][1651340] Signal inference workers to stop experience collection... (33050 times) [2024-06-15 19:57:58,250][1652475] InferenceWorker_p0-w0: stopping experience collection (33050 times) [2024-06-15 19:57:58,442][1651340] Signal inference workers to resume experience collection... (33050 times) [2024-06-15 19:57:58,443][1652475] InferenceWorker_p0-w0: resuming experience collection (33050 times) [2024-06-15 19:58:00,738][1648984] Fps is (10 sec: 39321.3, 60 sec: 41506.2, 300 sec: 43098.2). Total num frames: 1316225024. Throughput: 0: 10524.5. Samples: 329134592. Policy #0 lag: (min: 15.0, avg: 113.0, max: 271.0) [2024-06-15 19:58:00,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 19:58:01,640][1652475] Updated weights for policy 0, policy_version 642705 (0.0013) [2024-06-15 19:58:04,174][1652475] Updated weights for policy 0, policy_version 642755 (0.0013) [2024-06-15 19:58:05,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 44782.9, 300 sec: 43099.3). Total num frames: 1316487168. Throughput: 0: 10717.8. Samples: 329167872. 
Policy #0 lag: (min: 15.0, avg: 113.0, max: 271.0) [2024-06-15 19:58:05,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 19:58:05,994][1652475] Updated weights for policy 0, policy_version 642833 (0.0202) [2024-06-15 19:58:06,950][1652475] Updated weights for policy 0, policy_version 642880 (0.0013) [2024-06-15 19:58:10,738][1648984] Fps is (10 sec: 45875.2, 60 sec: 40960.0, 300 sec: 43209.3). Total num frames: 1316683776. Throughput: 0: 10922.7. Samples: 329236480. Policy #0 lag: (min: 15.0, avg: 113.0, max: 271.0) [2024-06-15 19:58:10,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 19:58:11,196][1652475] Updated weights for policy 0, policy_version 642944 (0.0171) [2024-06-15 19:58:14,502][1652475] Updated weights for policy 0, policy_version 643001 (0.0013) [2024-06-15 19:58:15,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 43690.6, 300 sec: 43320.4). Total num frames: 1316880384. Throughput: 0: 10888.6. Samples: 329300480. Policy #0 lag: (min: 15.0, avg: 113.0, max: 271.0) [2024-06-15 19:58:15,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 19:58:17,400][1652475] Updated weights for policy 0, policy_version 643059 (0.0138) [2024-06-15 19:58:19,153][1652475] Updated weights for policy 0, policy_version 643136 (0.0015) [2024-06-15 19:58:20,738][1648984] Fps is (10 sec: 45874.8, 60 sec: 43690.6, 300 sec: 43098.3). Total num frames: 1317142528. Throughput: 0: 10820.3. Samples: 329328640. Policy #0 lag: (min: 15.0, avg: 113.0, max: 271.0) [2024-06-15 19:58:20,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 19:58:23,171][1652475] Updated weights for policy 0, policy_version 643195 (0.0043) [2024-06-15 19:58:25,738][1648984] Fps is (10 sec: 42598.5, 60 sec: 42052.3, 300 sec: 43209.3). Total num frames: 1317306368. Throughput: 0: 10877.2. Samples: 329398784. 
Policy #0 lag: (min: 15.0, avg: 113.0, max: 271.0) [2024-06-15 19:58:25,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 19:58:26,667][1652475] Updated weights for policy 0, policy_version 643256 (0.0013) [2024-06-15 19:58:29,164][1652475] Updated weights for policy 0, policy_version 643321 (0.0013) [2024-06-15 19:58:30,346][1652475] Updated weights for policy 0, policy_version 643389 (0.0034) [2024-06-15 19:58:30,738][1648984] Fps is (10 sec: 52427.8, 60 sec: 45329.1, 300 sec: 43542.5). Total num frames: 1317666816. Throughput: 0: 10808.8. Samples: 329460224. Policy #0 lag: (min: 15.0, avg: 113.0, max: 271.0) [2024-06-15 19:58:30,739][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 19:58:34,857][1652475] Updated weights for policy 0, policy_version 643447 (0.0015) [2024-06-15 19:58:35,738][1648984] Fps is (10 sec: 49152.0, 60 sec: 43690.7, 300 sec: 43431.5). Total num frames: 1317797888. Throughput: 0: 10956.8. Samples: 329499136. Policy #0 lag: (min: 15.0, avg: 113.0, max: 271.0) [2024-06-15 19:58:35,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 19:58:38,244][1652475] Updated weights for policy 0, policy_version 643504 (0.0012) [2024-06-15 19:58:40,739][1648984] Fps is (10 sec: 29489.2, 60 sec: 43690.1, 300 sec: 43320.3). Total num frames: 1317961728. Throughput: 0: 11070.4. Samples: 329564672. Policy #0 lag: (min: 15.0, avg: 113.0, max: 271.0) [2024-06-15 19:58:40,739][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 19:58:41,862][1652475] Updated weights for policy 0, policy_version 643586 (0.0094) [2024-06-15 19:58:42,562][1651340] Signal inference workers to stop experience collection... (33100 times) [2024-06-15 19:58:42,610][1652475] InferenceWorker_p0-w0: stopping experience collection (33100 times) [2024-06-15 19:58:42,748][1651340] Signal inference workers to resume experience collection... 
(33100 times) [2024-06-15 19:58:42,750][1652475] InferenceWorker_p0-w0: resuming experience collection (33100 times) [2024-06-15 19:58:42,952][1652475] Updated weights for policy 0, policy_version 643648 (0.0014) [2024-06-15 19:58:45,738][1648984] Fps is (10 sec: 45875.5, 60 sec: 44783.0, 300 sec: 43320.5). Total num frames: 1318256640. Throughput: 0: 10934.0. Samples: 329626624. Policy #0 lag: (min: 15.0, avg: 113.0, max: 271.0) [2024-06-15 19:58:45,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 19:58:46,077][1652475] Updated weights for policy 0, policy_version 643711 (0.0015) [2024-06-15 19:58:49,958][1652475] Updated weights for policy 0, policy_version 643772 (0.0031) [2024-06-15 19:58:50,738][1648984] Fps is (10 sec: 49156.5, 60 sec: 43690.6, 300 sec: 43542.6). Total num frames: 1318453248. Throughput: 0: 10990.9. Samples: 329662464. Policy #0 lag: (min: 15.0, avg: 113.0, max: 271.0) [2024-06-15 19:58:50,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 19:58:54,424][1652475] Updated weights for policy 0, policy_version 643812 (0.0014) [2024-06-15 19:58:55,738][1648984] Fps is (10 sec: 39320.0, 60 sec: 42598.1, 300 sec: 42987.1). Total num frames: 1318649856. Throughput: 0: 10877.1. Samples: 329725952. Policy #0 lag: (min: 15.0, avg: 113.0, max: 271.0) [2024-06-15 19:58:55,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 19:58:55,909][1652475] Updated weights for policy 0, policy_version 643888 (0.0022) [2024-06-15 19:58:56,295][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000643904_1318715392.pth... [2024-06-15 19:58:56,428][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000638816_1308295168.pth [2024-06-15 19:58:57,732][1652475] Updated weights for policy 0, policy_version 643962 (0.0109) [2024-06-15 19:59:00,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 44236.8, 300 sec: 43431.5). Total num frames: 1318879232. Throughput: 0: 10820.3. 
Samples: 329787392. Policy #0 lag: (min: 15.0, avg: 113.0, max: 271.0) [2024-06-15 19:59:00,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 19:59:01,507][1652475] Updated weights for policy 0, policy_version 644016 (0.0012) [2024-06-15 19:59:05,738][1648984] Fps is (10 sec: 32768.1, 60 sec: 41505.9, 300 sec: 42654.1). Total num frames: 1318977536. Throughput: 0: 10888.5. Samples: 329818624. Policy #0 lag: (min: 15.0, avg: 113.0, max: 271.0) [2024-06-15 19:59:05,739][1648984] Avg episode reward: [(0, '-0.220')] [2024-06-15 19:59:06,526][1652475] Updated weights for policy 0, policy_version 644064 (0.0012) [2024-06-15 19:59:08,689][1652475] Updated weights for policy 0, policy_version 644146 (0.0013) [2024-06-15 19:59:10,507][1652475] Updated weights for policy 0, policy_version 644192 (0.0016) [2024-06-15 19:59:10,738][1648984] Fps is (10 sec: 42598.5, 60 sec: 43690.6, 300 sec: 43322.0). Total num frames: 1319305216. Throughput: 0: 10706.5. Samples: 329880576. Policy #0 lag: (min: 15.0, avg: 113.0, max: 271.0) [2024-06-15 19:59:10,738][1648984] Avg episode reward: [(0, '-0.180')] [2024-06-15 19:59:13,398][1652475] Updated weights for policy 0, policy_version 644256 (0.0110) [2024-06-15 19:59:15,738][1648984] Fps is (10 sec: 52430.5, 60 sec: 43690.7, 300 sec: 43209.3). Total num frames: 1319501824. Throughput: 0: 10683.8. Samples: 329940992. Policy #0 lag: (min: 15.0, avg: 113.0, max: 271.0) [2024-06-15 19:59:15,738][1648984] Avg episode reward: [(0, '-0.200')] [2024-06-15 19:59:18,311][1652475] Updated weights for policy 0, policy_version 644307 (0.0059) [2024-06-15 19:59:19,422][1652475] Updated weights for policy 0, policy_version 644355 (0.0011) [2024-06-15 19:59:20,738][1648984] Fps is (10 sec: 42598.8, 60 sec: 43144.6, 300 sec: 43431.5). Total num frames: 1319731200. Throughput: 0: 10661.0. Samples: 329978880. 
Policy #0 lag: (min: 15.0, avg: 113.0, max: 271.0) [2024-06-15 19:59:20,738][1648984] Avg episode reward: [(0, '-0.180')] [2024-06-15 19:59:20,796][1652475] Updated weights for policy 0, policy_version 644416 (0.0012) [2024-06-15 19:59:25,491][1652475] Updated weights for policy 0, policy_version 644497 (0.0013) [2024-06-15 19:59:25,740][1648984] Fps is (10 sec: 42598.4, 60 sec: 43690.7, 300 sec: 43653.6). Total num frames: 1319927808. Throughput: 0: 10547.4. Samples: 330039296. Policy #0 lag: (min: 36.0, avg: 154.0, max: 292.0) [2024-06-15 19:59:25,743][1648984] Avg episode reward: [(0, '-0.200')] [2024-06-15 19:59:30,738][1648984] Fps is (10 sec: 36044.2, 60 sec: 40414.0, 300 sec: 42876.1). Total num frames: 1320091648. Throughput: 0: 10706.5. Samples: 330108416. Policy #0 lag: (min: 36.0, avg: 154.0, max: 292.0) [2024-06-15 19:59:30,738][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 19:59:30,739][1652475] Updated weights for policy 0, policy_version 644576 (0.0013) [2024-06-15 19:59:31,479][1651340] Signal inference workers to stop experience collection... (33150 times) [2024-06-15 19:59:31,529][1652475] InferenceWorker_p0-w0: stopping experience collection (33150 times) [2024-06-15 19:59:31,645][1651340] Signal inference workers to resume experience collection... (33150 times) [2024-06-15 19:59:31,646][1652475] InferenceWorker_p0-w0: resuming experience collection (33150 times) [2024-06-15 19:59:32,543][1652475] Updated weights for policy 0, policy_version 644643 (0.0012) [2024-06-15 19:59:34,427][1652475] Updated weights for policy 0, policy_version 644675 (0.0022) [2024-06-15 19:59:35,738][1648984] Fps is (10 sec: 45875.5, 60 sec: 43144.6, 300 sec: 43432.2). Total num frames: 1320386560. Throughput: 0: 10410.7. Samples: 330130944. 
Policy #0 lag: (min: 36.0, avg: 154.0, max: 292.0) [2024-06-15 19:59:35,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 19:59:35,826][1652475] Updated weights for policy 0, policy_version 644736 (0.0058) [2024-06-15 19:59:40,738][1648984] Fps is (10 sec: 45875.6, 60 sec: 43145.2, 300 sec: 43098.3). Total num frames: 1320550400. Throughput: 0: 10490.4. Samples: 330198016. Policy #0 lag: (min: 36.0, avg: 154.0, max: 292.0) [2024-06-15 19:59:40,740][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 19:59:41,598][1652475] Updated weights for policy 0, policy_version 644802 (0.0013) [2024-06-15 19:59:42,692][1652475] Updated weights for policy 0, policy_version 644855 (0.0014) [2024-06-15 19:59:45,481][1652475] Updated weights for policy 0, policy_version 644896 (0.0023) [2024-06-15 19:59:45,738][1648984] Fps is (10 sec: 36044.1, 60 sec: 41506.0, 300 sec: 43320.4). Total num frames: 1320747008. Throughput: 0: 10660.9. Samples: 330267136. Policy #0 lag: (min: 36.0, avg: 154.0, max: 292.0) [2024-06-15 19:59:45,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 19:59:47,542][1652475] Updated weights for policy 0, policy_version 644976 (0.0149) [2024-06-15 19:59:50,738][1648984] Fps is (10 sec: 39320.8, 60 sec: 41506.0, 300 sec: 43098.2). Total num frames: 1320943616. Throughput: 0: 10501.7. Samples: 330291200. Policy #0 lag: (min: 36.0, avg: 154.0, max: 292.0) [2024-06-15 19:59:50,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 19:59:52,973][1652475] Updated weights for policy 0, policy_version 645024 (0.0014) [2024-06-15 19:59:54,831][1652475] Updated weights for policy 0, policy_version 645104 (0.0011) [2024-06-15 19:59:55,738][1648984] Fps is (10 sec: 45875.9, 60 sec: 42598.7, 300 sec: 43098.3). Total num frames: 1321205760. Throughput: 0: 10535.8. Samples: 330354688. 
Policy #0 lag: (min: 36.0, avg: 154.0, max: 292.0) [2024-06-15 19:59:55,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 19:59:56,718][1652475] Updated weights for policy 0, policy_version 645122 (0.0012) [2024-06-15 19:59:58,152][1652475] Updated weights for policy 0, policy_version 645177 (0.0018) [2024-06-15 20:00:00,436][1652475] Updated weights for policy 0, policy_version 645247 (0.0014) [2024-06-15 20:00:00,738][1648984] Fps is (10 sec: 52429.6, 60 sec: 43144.5, 300 sec: 43542.6). Total num frames: 1321467904. Throughput: 0: 10547.2. Samples: 330415616. Policy #0 lag: (min: 36.0, avg: 154.0, max: 292.0) [2024-06-15 20:00:00,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 20:00:05,029][1652475] Updated weights for policy 0, policy_version 645305 (0.0013) [2024-06-15 20:00:05,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 43690.9, 300 sec: 43209.3). Total num frames: 1321598976. Throughput: 0: 10592.7. Samples: 330455552. Policy #0 lag: (min: 36.0, avg: 154.0, max: 292.0) [2024-06-15 20:00:05,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 20:00:06,987][1652475] Updated weights for policy 0, policy_version 645371 (0.0013) [2024-06-15 20:00:09,710][1652475] Updated weights for policy 0, policy_version 645431 (0.0013) [2024-06-15 20:00:10,765][1648984] Fps is (10 sec: 39215.9, 60 sec: 42579.3, 300 sec: 43094.3). Total num frames: 1321861120. Throughput: 0: 10666.0. Samples: 330519552. Policy #0 lag: (min: 36.0, avg: 154.0, max: 292.0) [2024-06-15 20:00:10,765][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 20:00:11,928][1652475] Updated weights for policy 0, policy_version 645474 (0.0014) [2024-06-15 20:00:15,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 41506.2, 300 sec: 43098.3). Total num frames: 1321992192. Throughput: 0: 10672.4. Samples: 330588672. 
Policy #0 lag: (min: 36.0, avg: 154.0, max: 292.0) [2024-06-15 20:00:15,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 20:00:16,863][1652475] Updated weights for policy 0, policy_version 645559 (0.0015) [2024-06-15 20:00:18,547][1651340] Signal inference workers to stop experience collection... (33200 times) [2024-06-15 20:00:18,601][1652475] InferenceWorker_p0-w0: stopping experience collection (33200 times) [2024-06-15 20:00:18,856][1651340] Signal inference workers to resume experience collection... (33200 times) [2024-06-15 20:00:18,857][1652475] InferenceWorker_p0-w0: resuming experience collection (33200 times) [2024-06-15 20:00:19,061][1652475] Updated weights for policy 0, policy_version 645624 (0.0013) [2024-06-15 20:00:20,738][1648984] Fps is (10 sec: 39428.2, 60 sec: 42052.2, 300 sec: 43098.3). Total num frames: 1322254336. Throughput: 0: 10888.5. Samples: 330620928. Policy #0 lag: (min: 36.0, avg: 154.0, max: 292.0) [2024-06-15 20:00:20,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 20:00:22,204][1652475] Updated weights for policy 0, policy_version 645690 (0.0014) [2024-06-15 20:00:23,807][1652475] Updated weights for policy 0, policy_version 645755 (0.0021) [2024-06-15 20:00:25,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 43144.5, 300 sec: 43209.3). Total num frames: 1322516480. Throughput: 0: 10672.3. Samples: 330678272. Policy #0 lag: (min: 36.0, avg: 154.0, max: 292.0) [2024-06-15 20:00:25,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 20:00:29,012][1652475] Updated weights for policy 0, policy_version 645816 (0.0013) [2024-06-15 20:00:30,421][1652475] Updated weights for policy 0, policy_version 645856 (0.0017) [2024-06-15 20:00:30,738][1648984] Fps is (10 sec: 45875.0, 60 sec: 43690.7, 300 sec: 43320.4). Total num frames: 1322713088. Throughput: 0: 10774.8. Samples: 330752000. 
Policy #0 lag: (min: 36.0, avg: 154.0, max: 292.0) [2024-06-15 20:00:30,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 20:00:33,028][1652475] Updated weights for policy 0, policy_version 645890 (0.0038) [2024-06-15 20:00:34,579][1652475] Updated weights for policy 0, policy_version 645952 (0.0012) [2024-06-15 20:00:35,738][1648984] Fps is (10 sec: 45875.7, 60 sec: 43144.6, 300 sec: 43320.4). Total num frames: 1322975232. Throughput: 0: 10934.1. Samples: 330783232. Policy #0 lag: (min: 36.0, avg: 154.0, max: 292.0) [2024-06-15 20:00:35,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 20:00:40,116][1652475] Updated weights for policy 0, policy_version 646032 (0.0014) [2024-06-15 20:00:40,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 42598.4, 300 sec: 43320.4). Total num frames: 1323106304. Throughput: 0: 11013.7. Samples: 330850304. Policy #0 lag: (min: 36.0, avg: 154.0, max: 292.0) [2024-06-15 20:00:40,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 20:00:41,410][1652475] Updated weights for policy 0, policy_version 646079 (0.0015) [2024-06-15 20:00:43,117][1652475] Updated weights for policy 0, policy_version 646134 (0.0029) [2024-06-15 20:00:45,738][1648984] Fps is (10 sec: 36044.8, 60 sec: 43144.7, 300 sec: 42987.2). Total num frames: 1323335680. Throughput: 0: 11025.1. Samples: 330911744. Policy #0 lag: (min: 36.0, avg: 154.0, max: 292.0) [2024-06-15 20:00:45,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 20:00:46,192][1652475] Updated weights for policy 0, policy_version 646180 (0.0027) [2024-06-15 20:00:47,263][1652475] Updated weights for policy 0, policy_version 646240 (0.0013) [2024-06-15 20:00:50,774][1648984] Fps is (10 sec: 45707.8, 60 sec: 43664.2, 300 sec: 43092.9). Total num frames: 1323565056. Throughput: 0: 10834.2. Samples: 330943488. 
Policy #0 lag: (min: 36.0, avg: 154.0, max: 292.0) [2024-06-15 20:00:50,775][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 20:00:52,599][1652475] Updated weights for policy 0, policy_version 646320 (0.0014) [2024-06-15 20:00:54,382][1652475] Updated weights for policy 0, policy_version 646370 (0.0014) [2024-06-15 20:00:55,738][1648984] Fps is (10 sec: 49150.2, 60 sec: 43690.5, 300 sec: 43098.2). Total num frames: 1323827200. Throughput: 0: 10883.6. Samples: 331009024. Policy #0 lag: (min: 36.0, avg: 154.0, max: 292.0) [2024-06-15 20:00:55,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 20:00:55,743][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000646400_1323827200.pth... [2024-06-15 20:00:55,783][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000641328_1313439744.pth [2024-06-15 20:00:57,967][1652475] Updated weights for policy 0, policy_version 646448 (0.0015) [2024-06-15 20:00:59,679][1652475] Updated weights for policy 0, policy_version 646512 (0.0015) [2024-06-15 20:01:00,738][1648984] Fps is (10 sec: 52621.3, 60 sec: 43690.7, 300 sec: 43431.5). Total num frames: 1324089344. Throughput: 0: 10808.9. Samples: 331075072. Policy #0 lag: (min: 47.0, avg: 184.3, max: 303.0) [2024-06-15 20:01:00,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 20:01:05,737][1648984] Fps is (10 sec: 32769.3, 60 sec: 42598.5, 300 sec: 42987.2). Total num frames: 1324154880. Throughput: 0: 10956.8. Samples: 331113984. Policy #0 lag: (min: 47.0, avg: 184.3, max: 303.0) [2024-06-15 20:01:05,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 20:01:06,181][1652475] Updated weights for policy 0, policy_version 646580 (0.0013) [2024-06-15 20:01:07,975][1652475] Updated weights for policy 0, policy_version 646647 (0.0012) [2024-06-15 20:01:09,639][1651340] Signal inference workers to stop experience collection... 
(33250 times) [2024-06-15 20:01:09,729][1652475] InferenceWorker_p0-w0: stopping experience collection (33250 times) [2024-06-15 20:01:09,873][1651340] Signal inference workers to resume experience collection... (33250 times) [2024-06-15 20:01:09,873][1652475] InferenceWorker_p0-w0: resuming experience collection (33250 times) [2024-06-15 20:01:10,738][1648984] Fps is (10 sec: 36044.4, 60 sec: 43163.8, 300 sec: 42987.1). Total num frames: 1324449792. Throughput: 0: 10934.0. Samples: 331170304. Policy #0 lag: (min: 47.0, avg: 184.3, max: 303.0) [2024-06-15 20:01:10,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 20:01:10,886][1652475] Updated weights for policy 0, policy_version 646720 (0.0202) [2024-06-15 20:01:12,026][1652475] Updated weights for policy 0, policy_version 646783 (0.0077) [2024-06-15 20:01:15,738][1648984] Fps is (10 sec: 45874.5, 60 sec: 43690.6, 300 sec: 43098.3). Total num frames: 1324613632. Throughput: 0: 10740.6. Samples: 331235328. Policy #0 lag: (min: 47.0, avg: 184.3, max: 303.0) [2024-06-15 20:01:15,738][1648984] Avg episode reward: [(0, '-0.230')] [2024-06-15 20:01:19,311][1652475] Updated weights for policy 0, policy_version 646851 (0.0015) [2024-06-15 20:01:20,459][1652475] Updated weights for policy 0, policy_version 646912 (0.0013) [2024-06-15 20:01:20,738][1648984] Fps is (10 sec: 42599.2, 60 sec: 43690.7, 300 sec: 43098.3). Total num frames: 1324875776. Throughput: 0: 10638.2. Samples: 331261952. Policy #0 lag: (min: 47.0, avg: 184.3, max: 303.0) [2024-06-15 20:01:20,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 20:01:22,255][1652475] Updated weights for policy 0, policy_version 646974 (0.0118) [2024-06-15 20:01:25,328][1652475] Updated weights for policy 0, policy_version 647035 (0.0023) [2024-06-15 20:01:25,738][1648984] Fps is (10 sec: 52429.2, 60 sec: 43690.7, 300 sec: 43209.3). Total num frames: 1325137920. Throughput: 0: 10672.4. Samples: 331330560. 
Policy #0 lag: (min: 47.0, avg: 184.3, max: 303.0) [2024-06-15 20:01:25,740][1648984] Avg episode reward: [(0, '-0.220')] [2024-06-15 20:01:29,384][1652475] Updated weights for policy 0, policy_version 647097 (0.0012) [2024-06-15 20:01:30,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 42598.5, 300 sec: 42654.0). Total num frames: 1325268992. Throughput: 0: 10786.1. Samples: 331397120. Policy #0 lag: (min: 47.0, avg: 184.3, max: 303.0) [2024-06-15 20:01:30,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 20:01:31,782][1652475] Updated weights for policy 0, policy_version 647152 (0.0013) [2024-06-15 20:01:33,103][1652475] Updated weights for policy 0, policy_version 647218 (0.0014) [2024-06-15 20:01:35,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 42598.4, 300 sec: 43320.4). Total num frames: 1325531136. Throughput: 0: 10772.1. Samples: 331427840. Policy #0 lag: (min: 47.0, avg: 184.3, max: 303.0) [2024-06-15 20:01:35,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 20:01:37,943][1652475] Updated weights for policy 0, policy_version 647264 (0.0033) [2024-06-15 20:01:39,448][1652475] Updated weights for policy 0, policy_version 647298 (0.0015) [2024-06-15 20:01:40,738][1648984] Fps is (10 sec: 49151.4, 60 sec: 44236.8, 300 sec: 42987.2). Total num frames: 1325760512. Throughput: 0: 10956.9. Samples: 331502080. Policy #0 lag: (min: 47.0, avg: 184.3, max: 303.0) [2024-06-15 20:01:40,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 20:01:40,795][1652475] Updated weights for policy 0, policy_version 647359 (0.0012) [2024-06-15 20:01:43,391][1652475] Updated weights for policy 0, policy_version 647409 (0.0013) [2024-06-15 20:01:44,856][1652475] Updated weights for policy 0, policy_version 647485 (0.0086) [2024-06-15 20:01:45,740][1648984] Fps is (10 sec: 52428.3, 60 sec: 45329.0, 300 sec: 43542.5). Total num frames: 1326055424. Throughput: 0: 10922.7. Samples: 331566592. 
Policy #0 lag: (min: 47.0, avg: 184.3, max: 303.0) [2024-06-15 20:01:45,741][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 20:01:50,359][1652475] Updated weights for policy 0, policy_version 647546 (0.0015) [2024-06-15 20:01:50,738][1648984] Fps is (10 sec: 42598.5, 60 sec: 43717.3, 300 sec: 43320.5). Total num frames: 1326186496. Throughput: 0: 10934.0. Samples: 331606016. Policy #0 lag: (min: 47.0, avg: 184.3, max: 303.0) [2024-06-15 20:01:50,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 20:01:54,105][1652475] Updated weights for policy 0, policy_version 647622 (0.0014) [2024-06-15 20:01:54,633][1651340] Signal inference workers to stop experience collection... (33300 times) [2024-06-15 20:01:54,695][1652475] InferenceWorker_p0-w0: stopping experience collection (33300 times) [2024-06-15 20:01:54,821][1651340] Signal inference workers to resume experience collection... (33300 times) [2024-06-15 20:01:54,822][1652475] InferenceWorker_p0-w0: resuming experience collection (33300 times) [2024-06-15 20:01:55,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 43690.8, 300 sec: 43098.2). Total num frames: 1326448640. Throughput: 0: 10899.9. Samples: 331660800. Policy #0 lag: (min: 47.0, avg: 184.3, max: 303.0) [2024-06-15 20:01:55,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 20:01:56,899][1652475] Updated weights for policy 0, policy_version 647696 (0.0014) [2024-06-15 20:02:00,738][1648984] Fps is (10 sec: 39321.9, 60 sec: 41506.2, 300 sec: 43320.4). Total num frames: 1326579712. Throughput: 0: 10865.8. Samples: 331724288. Policy #0 lag: (min: 47.0, avg: 184.3, max: 303.0) [2024-06-15 20:02:00,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 20:02:00,899][1652475] Updated weights for policy 0, policy_version 647747 (0.0013) [2024-06-15 20:02:05,328][1652475] Updated weights for policy 0, policy_version 647840 (0.0031) [2024-06-15 20:02:05,738][1648984] Fps is (10 sec: 36045.2, 60 sec: 44236.7, 300 sec: 42653.9). 
Total num frames: 1326809088. Throughput: 0: 11002.3. Samples: 331757056. Policy #0 lag: (min: 47.0, avg: 184.3, max: 303.0) [2024-06-15 20:02:05,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 20:02:06,938][1652475] Updated weights for policy 0, policy_version 647920 (0.0014) [2024-06-15 20:02:09,801][1652475] Updated weights for policy 0, policy_version 647941 (0.0011) [2024-06-15 20:02:10,738][1648984] Fps is (10 sec: 45874.2, 60 sec: 43144.5, 300 sec: 43320.4). Total num frames: 1327038464. Throughput: 0: 10922.6. Samples: 331822080. Policy #0 lag: (min: 47.0, avg: 184.3, max: 303.0) [2024-06-15 20:02:10,739][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 20:02:11,164][1652475] Updated weights for policy 0, policy_version 648000 (0.0013) [2024-06-15 20:02:15,738][1648984] Fps is (10 sec: 42598.0, 60 sec: 43690.7, 300 sec: 43098.2). Total num frames: 1327235072. Throughput: 0: 10956.8. Samples: 331890176. Policy #0 lag: (min: 47.0, avg: 184.3, max: 303.0) [2024-06-15 20:02:15,739][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 20:02:16,406][1652475] Updated weights for policy 0, policy_version 648067 (0.0110) [2024-06-15 20:02:17,665][1652475] Updated weights for policy 0, policy_version 648128 (0.0035) [2024-06-15 20:02:19,854][1652475] Updated weights for policy 0, policy_version 648192 (0.0012) [2024-06-15 20:02:20,738][1648984] Fps is (10 sec: 45876.1, 60 sec: 43690.7, 300 sec: 43098.3). Total num frames: 1327497216. Throughput: 0: 11002.3. Samples: 331922944. Policy #0 lag: (min: 47.0, avg: 184.3, max: 303.0) [2024-06-15 20:02:20,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 20:02:23,797][1652475] Updated weights for policy 0, policy_version 648272 (0.0015) [2024-06-15 20:02:25,738][1648984] Fps is (10 sec: 52429.3, 60 sec: 43690.7, 300 sec: 43431.5). Total num frames: 1327759360. Throughput: 0: 10615.5. Samples: 331979776. 
Policy #0 lag: (min: 47.0, avg: 184.3, max: 303.0) [2024-06-15 20:02:25,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 20:02:28,064][1652475] Updated weights for policy 0, policy_version 648322 (0.0026) [2024-06-15 20:02:29,311][1652475] Updated weights for policy 0, policy_version 648384 (0.0012) [2024-06-15 20:02:30,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 43690.6, 300 sec: 43098.3). Total num frames: 1327890432. Throughput: 0: 10877.2. Samples: 332056064. Policy #0 lag: (min: 47.0, avg: 184.3, max: 303.0) [2024-06-15 20:02:30,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 20:02:32,033][1652475] Updated weights for policy 0, policy_version 648448 (0.0019) [2024-06-15 20:02:35,732][1652475] Updated weights for policy 0, policy_version 648528 (0.0012) [2024-06-15 20:02:35,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 44236.8, 300 sec: 43542.6). Total num frames: 1328185344. Throughput: 0: 10740.6. Samples: 332089344. Policy #0 lag: (min: 47.0, avg: 184.3, max: 303.0) [2024-06-15 20:02:35,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 20:02:36,796][1652475] Updated weights for policy 0, policy_version 648572 (0.0012) [2024-06-15 20:02:40,260][1652475] Updated weights for policy 0, policy_version 648612 (0.0012) [2024-06-15 20:02:40,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 44236.8, 300 sec: 43542.6). Total num frames: 1328414720. Throughput: 0: 11036.5. Samples: 332157440. Policy #0 lag: (min: 107.0, avg: 193.5, max: 315.0) [2024-06-15 20:02:40,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 20:02:42,906][1651340] Signal inference workers to stop experience collection... (33350 times) [2024-06-15 20:02:42,949][1652475] InferenceWorker_p0-w0: stopping experience collection (33350 times) [2024-06-15 20:02:43,247][1651340] Signal inference workers to resume experience collection... 
(33350 times) [2024-06-15 20:02:43,248][1652475] InferenceWorker_p0-w0: resuming experience collection (33350 times) [2024-06-15 20:02:43,453][1652475] Updated weights for policy 0, policy_version 648676 (0.0017) [2024-06-15 20:02:45,737][1648984] Fps is (10 sec: 36045.2, 60 sec: 41506.3, 300 sec: 43098.3). Total num frames: 1328545792. Throughput: 0: 11082.0. Samples: 332222976. Policy #0 lag: (min: 107.0, avg: 193.5, max: 315.0) [2024-06-15 20:02:45,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 20:02:46,518][1652475] Updated weights for policy 0, policy_version 648736 (0.0128) [2024-06-15 20:02:48,691][1652475] Updated weights for policy 0, policy_version 648831 (0.0014) [2024-06-15 20:02:50,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 43690.7, 300 sec: 43098.3). Total num frames: 1328807936. Throughput: 0: 10888.5. Samples: 332247040. Policy #0 lag: (min: 107.0, avg: 193.5, max: 315.0) [2024-06-15 20:02:50,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 20:02:52,124][1652475] Updated weights for policy 0, policy_version 648867 (0.0014) [2024-06-15 20:02:55,738][1648984] Fps is (10 sec: 42597.7, 60 sec: 42052.3, 300 sec: 43209.3). Total num frames: 1328971776. Throughput: 0: 11138.9. Samples: 332323328. Policy #0 lag: (min: 107.0, avg: 193.5, max: 315.0) [2024-06-15 20:02:55,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 20:02:55,772][1652475] Updated weights for policy 0, policy_version 648928 (0.0016) [2024-06-15 20:02:56,161][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000648944_1329037312.pth... 
[2024-06-15 20:02:56,195][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000643904_1318715392.pth [2024-06-15 20:02:58,439][1652475] Updated weights for policy 0, policy_version 648998 (0.0022) [2024-06-15 20:03:00,139][1652475] Updated weights for policy 0, policy_version 649072 (0.0034) [2024-06-15 20:03:00,738][1648984] Fps is (10 sec: 52428.3, 60 sec: 45875.1, 300 sec: 43542.6). Total num frames: 1329332224. Throughput: 0: 10922.7. Samples: 332381696. Policy #0 lag: (min: 107.0, avg: 193.5, max: 315.0) [2024-06-15 20:03:00,739][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 20:03:03,998][1652475] Updated weights for policy 0, policy_version 649150 (0.0015) [2024-06-15 20:03:05,738][1648984] Fps is (10 sec: 49150.0, 60 sec: 44236.4, 300 sec: 43320.3). Total num frames: 1329463296. Throughput: 0: 11161.5. Samples: 332425216. Policy #0 lag: (min: 107.0, avg: 193.5, max: 315.0) [2024-06-15 20:03:05,739][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 20:03:07,898][1652475] Updated weights for policy 0, policy_version 649214 (0.0243) [2024-06-15 20:03:10,195][1652475] Updated weights for policy 0, policy_version 649267 (0.0012) [2024-06-15 20:03:10,738][1648984] Fps is (10 sec: 39321.9, 60 sec: 44783.0, 300 sec: 43542.6). Total num frames: 1329725440. Throughput: 0: 11343.6. Samples: 332490240. Policy #0 lag: (min: 107.0, avg: 193.5, max: 315.0) [2024-06-15 20:03:10,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 20:03:11,347][1652475] Updated weights for policy 0, policy_version 649314 (0.0013) [2024-06-15 20:03:14,875][1652475] Updated weights for policy 0, policy_version 649376 (0.0014) [2024-06-15 20:03:15,738][1648984] Fps is (10 sec: 52430.9, 60 sec: 45875.2, 300 sec: 43542.6). Total num frames: 1329987584. Throughput: 0: 11286.7. Samples: 332563968. 
Policy #0 lag: (min: 107.0, avg: 193.5, max: 315.0) [2024-06-15 20:03:15,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 20:03:17,547][1652475] Updated weights for policy 0, policy_version 649426 (0.0017) [2024-06-15 20:03:18,494][1652475] Updated weights for policy 0, policy_version 649471 (0.0013) [2024-06-15 20:03:20,743][1648984] Fps is (10 sec: 45852.7, 60 sec: 44779.2, 300 sec: 43652.9). Total num frames: 1330184192. Throughput: 0: 11228.6. Samples: 332594688. Policy #0 lag: (min: 107.0, avg: 193.5, max: 315.0) [2024-06-15 20:03:20,743][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 20:03:20,943][1652475] Updated weights for policy 0, policy_version 649522 (0.0014) [2024-06-15 20:03:22,221][1652475] Updated weights for policy 0, policy_version 649573 (0.0012) [2024-06-15 20:03:25,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 43690.6, 300 sec: 43098.3). Total num frames: 1330380800. Throughput: 0: 11286.7. Samples: 332665344. Policy #0 lag: (min: 107.0, avg: 193.5, max: 315.0) [2024-06-15 20:03:25,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 20:03:27,865][1651340] Signal inference workers to stop experience collection... (33400 times) [2024-06-15 20:03:27,911][1652475] InferenceWorker_p0-w0: stopping experience collection (33400 times) [2024-06-15 20:03:28,136][1651340] Signal inference workers to resume experience collection... (33400 times) [2024-06-15 20:03:28,144][1652475] InferenceWorker_p0-w0: resuming experience collection (33400 times) [2024-06-15 20:03:28,349][1652475] Updated weights for policy 0, policy_version 649652 (0.0013) [2024-06-15 20:03:29,754][1652475] Updated weights for policy 0, policy_version 649719 (0.0015) [2024-06-15 20:03:30,760][1648984] Fps is (10 sec: 45796.6, 60 sec: 45858.3, 300 sec: 43539.3). Total num frames: 1330642944. Throughput: 0: 11247.1. Samples: 332729344. 
Policy #0 lag: (min: 107.0, avg: 193.5, max: 315.0) [2024-06-15 20:03:30,761][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 20:03:31,965][1652475] Updated weights for policy 0, policy_version 649780 (0.0095) [2024-06-15 20:03:34,379][1652475] Updated weights for policy 0, policy_version 649847 (0.0134) [2024-06-15 20:03:35,738][1648984] Fps is (10 sec: 52429.3, 60 sec: 45329.1, 300 sec: 43875.9). Total num frames: 1330905088. Throughput: 0: 11468.8. Samples: 332763136. Policy #0 lag: (min: 107.0, avg: 193.5, max: 315.0) [2024-06-15 20:03:35,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 20:03:39,314][1652475] Updated weights for policy 0, policy_version 649891 (0.0050) [2024-06-15 20:03:40,738][1648984] Fps is (10 sec: 39408.7, 60 sec: 43690.7, 300 sec: 43320.4). Total num frames: 1331036160. Throughput: 0: 11309.5. Samples: 332832256. Policy #0 lag: (min: 107.0, avg: 193.5, max: 315.0) [2024-06-15 20:03:40,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 20:03:40,985][1652475] Updated weights for policy 0, policy_version 649938 (0.0014) [2024-06-15 20:03:42,867][1652475] Updated weights for policy 0, policy_version 650019 (0.0130) [2024-06-15 20:03:45,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 45875.1, 300 sec: 43542.6). Total num frames: 1331298304. Throughput: 0: 11559.9. Samples: 332901888. Policy #0 lag: (min: 107.0, avg: 193.5, max: 315.0) [2024-06-15 20:03:45,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 20:03:46,531][1652475] Updated weights for policy 0, policy_version 650064 (0.0012) [2024-06-15 20:03:49,776][1652475] Updated weights for policy 0, policy_version 650144 (0.0127) [2024-06-15 20:03:50,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 45875.2, 300 sec: 43764.8). Total num frames: 1331560448. Throughput: 0: 11377.9. Samples: 332937216. 
Policy #0 lag: (min: 107.0, avg: 193.5, max: 315.0) [2024-06-15 20:03:50,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 20:03:52,148][1652475] Updated weights for policy 0, policy_version 650197 (0.0014) [2024-06-15 20:03:53,685][1652475] Updated weights for policy 0, policy_version 650261 (0.0015) [2024-06-15 20:03:54,587][1652475] Updated weights for policy 0, policy_version 650299 (0.0013) [2024-06-15 20:03:55,738][1648984] Fps is (10 sec: 52428.5, 60 sec: 47513.6, 300 sec: 43875.8). Total num frames: 1331822592. Throughput: 0: 11298.1. Samples: 332998656. Policy #0 lag: (min: 107.0, avg: 193.5, max: 315.0) [2024-06-15 20:03:55,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 20:03:59,234][1652475] Updated weights for policy 0, policy_version 650367 (0.0013) [2024-06-15 20:04:00,738][1648984] Fps is (10 sec: 39319.7, 60 sec: 43690.4, 300 sec: 43986.9). Total num frames: 1331953664. Throughput: 0: 11377.7. Samples: 333075968. Policy #0 lag: (min: 107.0, avg: 193.5, max: 315.0) [2024-06-15 20:04:00,739][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 20:04:01,908][1652475] Updated weights for policy 0, policy_version 650432 (0.0015) [2024-06-15 20:04:03,909][1652475] Updated weights for policy 0, policy_version 650490 (0.0014) [2024-06-15 20:04:05,738][1648984] Fps is (10 sec: 42598.8, 60 sec: 46421.7, 300 sec: 43875.8). Total num frames: 1332248576. Throughput: 0: 11356.3. Samples: 333105664. Policy #0 lag: (min: 107.0, avg: 193.5, max: 315.0) [2024-06-15 20:04:05,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 20:04:06,691][1652475] Updated weights for policy 0, policy_version 650551 (0.0012) [2024-06-15 20:04:10,717][1652475] Updated weights for policy 0, policy_version 650593 (0.0033) [2024-06-15 20:04:10,738][1648984] Fps is (10 sec: 45877.7, 60 sec: 44783.0, 300 sec: 43764.7). Total num frames: 1332412416. Throughput: 0: 11184.4. Samples: 333168640. 
Policy #0 lag: (min: 107.0, avg: 193.5, max: 315.0) [2024-06-15 20:04:10,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 20:04:13,881][1651340] Signal inference workers to stop experience collection... (33450 times) [2024-06-15 20:04:13,963][1652475] InferenceWorker_p0-w0: stopping experience collection (33450 times) [2024-06-15 20:04:14,092][1651340] Signal inference workers to resume experience collection... (33450 times) [2024-06-15 20:04:14,092][1652475] InferenceWorker_p0-w0: resuming experience collection (33450 times) [2024-06-15 20:04:14,094][1652475] Updated weights for policy 0, policy_version 650656 (0.0043) [2024-06-15 20:04:15,567][1652475] Updated weights for policy 0, policy_version 650727 (0.0014) [2024-06-15 20:04:15,738][1648984] Fps is (10 sec: 45874.1, 60 sec: 45329.0, 300 sec: 43986.8). Total num frames: 1332707328. Throughput: 0: 11189.8. Samples: 333232640. Policy #0 lag: (min: 15.0, avg: 83.4, max: 207.0) [2024-06-15 20:04:15,739][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 20:04:19,262][1652475] Updated weights for policy 0, policy_version 650787 (0.0013) [2024-06-15 20:04:20,738][1648984] Fps is (10 sec: 45875.2, 60 sec: 44786.6, 300 sec: 43875.8). Total num frames: 1332871168. Throughput: 0: 11229.9. Samples: 333268480. Policy #0 lag: (min: 15.0, avg: 83.4, max: 207.0) [2024-06-15 20:04:20,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 20:04:21,617][1652475] Updated weights for policy 0, policy_version 650864 (0.0013) [2024-06-15 20:04:25,527][1652475] Updated weights for policy 0, policy_version 650928 (0.0014) [2024-06-15 20:04:25,738][1648984] Fps is (10 sec: 42599.1, 60 sec: 45875.2, 300 sec: 44209.0). Total num frames: 1333133312. Throughput: 0: 11150.2. Samples: 333334016. 
Policy #0 lag: (min: 15.0, avg: 83.4, max: 207.0) [2024-06-15 20:04:25,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 20:04:29,980][1652475] Updated weights for policy 0, policy_version 651001 (0.0014) [2024-06-15 20:04:30,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 43706.8, 300 sec: 43653.6). Total num frames: 1333264384. Throughput: 0: 10922.7. Samples: 333393408. Policy #0 lag: (min: 15.0, avg: 83.4, max: 207.0) [2024-06-15 20:04:30,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 20:04:32,115][1652475] Updated weights for policy 0, policy_version 651056 (0.0013) [2024-06-15 20:04:34,140][1652475] Updated weights for policy 0, policy_version 651127 (0.0014) [2024-06-15 20:04:35,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 43690.6, 300 sec: 43986.9). Total num frames: 1333526528. Throughput: 0: 10797.5. Samples: 333423104. Policy #0 lag: (min: 15.0, avg: 83.4, max: 207.0) [2024-06-15 20:04:35,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 20:04:38,439][1652475] Updated weights for policy 0, policy_version 651184 (0.0014) [2024-06-15 20:04:40,738][1648984] Fps is (10 sec: 39321.2, 60 sec: 43690.6, 300 sec: 43764.7). Total num frames: 1333657600. Throughput: 0: 10877.2. Samples: 333488128. Policy #0 lag: (min: 15.0, avg: 83.4, max: 207.0) [2024-06-15 20:04:40,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 20:04:42,439][1652475] Updated weights for policy 0, policy_version 651248 (0.0012) [2024-06-15 20:04:44,714][1652475] Updated weights for policy 0, policy_version 651328 (0.0071) [2024-06-15 20:04:45,738][1648984] Fps is (10 sec: 45875.4, 60 sec: 44782.9, 300 sec: 44209.1). Total num frames: 1333985280. Throughput: 0: 10570.1. Samples: 333551616. 
Policy #0 lag: (min: 15.0, avg: 83.4, max: 207.0) [2024-06-15 20:04:45,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 20:04:46,322][1652475] Updated weights for policy 0, policy_version 651392 (0.0013) [2024-06-15 20:04:50,490][1652475] Updated weights for policy 0, policy_version 651446 (0.0011) [2024-06-15 20:04:50,738][1648984] Fps is (10 sec: 52429.4, 60 sec: 43690.8, 300 sec: 43986.9). Total num frames: 1334181888. Throughput: 0: 10638.2. Samples: 333584384. Policy #0 lag: (min: 15.0, avg: 83.4, max: 207.0) [2024-06-15 20:04:50,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 20:04:54,562][1652475] Updated weights for policy 0, policy_version 651519 (0.0030) [2024-06-15 20:04:55,738][1648984] Fps is (10 sec: 36044.9, 60 sec: 42052.3, 300 sec: 43653.6). Total num frames: 1334345728. Throughput: 0: 10843.0. Samples: 333656576. Policy #0 lag: (min: 15.0, avg: 83.4, max: 207.0) [2024-06-15 20:04:55,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 20:04:56,252][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000651568_1334411264.pth... [2024-06-15 20:04:56,392][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000646400_1323827200.pth [2024-06-15 20:04:57,052][1652475] Updated weights for policy 0, policy_version 651589 (0.0014) [2024-06-15 20:04:57,774][1651340] Signal inference workers to stop experience collection... (33500 times) [2024-06-15 20:04:57,822][1652475] InferenceWorker_p0-w0: stopping experience collection (33500 times) [2024-06-15 20:04:58,025][1651340] Signal inference workers to resume experience collection... (33500 times) [2024-06-15 20:04:58,026][1652475] InferenceWorker_p0-w0: resuming experience collection (33500 times) [2024-06-15 20:05:00,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 43691.1, 300 sec: 43986.9). Total num frames: 1334575104. Throughput: 0: 10672.4. Samples: 333712896. 
Policy #0 lag: (min: 15.0, avg: 83.4, max: 207.0) [2024-06-15 20:05:00,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 20:05:02,012][1652475] Updated weights for policy 0, policy_version 651664 (0.0014) [2024-06-15 20:05:05,754][1648984] Fps is (10 sec: 35985.1, 60 sec: 40948.6, 300 sec: 43544.1). Total num frames: 1334706176. Throughput: 0: 10588.8. Samples: 333745152. Policy #0 lag: (min: 15.0, avg: 83.4, max: 207.0) [2024-06-15 20:05:05,755][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 20:05:05,969][1652475] Updated weights for policy 0, policy_version 651728 (0.0013) [2024-06-15 20:05:08,349][1652475] Updated weights for policy 0, policy_version 651808 (0.0014) [2024-06-15 20:05:10,238][1652475] Updated weights for policy 0, policy_version 651876 (0.0116) [2024-06-15 20:05:10,738][1648984] Fps is (10 sec: 49151.6, 60 sec: 44236.7, 300 sec: 44320.1). Total num frames: 1335066624. Throughput: 0: 10478.9. Samples: 333805568. Policy #0 lag: (min: 15.0, avg: 83.4, max: 207.0) [2024-06-15 20:05:10,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 20:05:14,489][1652475] Updated weights for policy 0, policy_version 651921 (0.0013) [2024-06-15 20:05:15,738][1648984] Fps is (10 sec: 52516.4, 60 sec: 42052.5, 300 sec: 43986.9). Total num frames: 1335230464. Throughput: 0: 10581.3. Samples: 333869568. Policy #0 lag: (min: 15.0, avg: 83.4, max: 207.0) [2024-06-15 20:05:15,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 20:05:18,408][1652475] Updated weights for policy 0, policy_version 651974 (0.0017) [2024-06-15 20:05:20,383][1652475] Updated weights for policy 0, policy_version 652048 (0.0014) [2024-06-15 20:05:20,738][1648984] Fps is (10 sec: 32768.2, 60 sec: 42052.3, 300 sec: 43653.6). Total num frames: 1335394304. Throughput: 0: 10854.4. Samples: 333911552. 
Policy #0 lag: (min: 15.0, avg: 83.4, max: 207.0) [2024-06-15 20:05:20,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 20:05:22,506][1652475] Updated weights for policy 0, policy_version 652128 (0.0113) [2024-06-15 20:05:25,738][1648984] Fps is (10 sec: 39321.0, 60 sec: 41506.1, 300 sec: 43764.7). Total num frames: 1335623680. Throughput: 0: 10615.5. Samples: 333965824. Policy #0 lag: (min: 15.0, avg: 83.4, max: 207.0) [2024-06-15 20:05:25,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 20:05:27,463][1652475] Updated weights for policy 0, policy_version 652198 (0.0013) [2024-06-15 20:05:30,738][1648984] Fps is (10 sec: 36044.5, 60 sec: 41506.0, 300 sec: 43320.4). Total num frames: 1335754752. Throughput: 0: 10740.6. Samples: 334034944. Policy #0 lag: (min: 15.0, avg: 83.4, max: 207.0) [2024-06-15 20:05:30,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 20:05:31,559][1652475] Updated weights for policy 0, policy_version 652256 (0.0072) [2024-06-15 20:05:33,021][1652475] Updated weights for policy 0, policy_version 652306 (0.0105) [2024-06-15 20:05:35,155][1652475] Updated weights for policy 0, policy_version 652400 (0.0130) [2024-06-15 20:05:35,738][1648984] Fps is (10 sec: 52427.6, 60 sec: 43690.5, 300 sec: 44209.0). Total num frames: 1336147968. Throughput: 0: 10592.6. Samples: 334061056. Policy #0 lag: (min: 15.0, avg: 83.4, max: 207.0) [2024-06-15 20:05:35,739][1648984] Avg episode reward: [(0, '-0.200')] [2024-06-15 20:05:40,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 41506.1, 300 sec: 43431.5). Total num frames: 1336147968. Throughput: 0: 10410.7. Samples: 334125056. 
Policy #0 lag: (min: 15.0, avg: 83.4, max: 207.0) [2024-06-15 20:05:40,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 20:05:42,576][1652475] Updated weights for policy 0, policy_version 652475 (0.0012) [2024-06-15 20:05:44,921][1652475] Updated weights for policy 0, policy_version 652544 (0.0010) [2024-06-15 20:05:45,738][1648984] Fps is (10 sec: 32768.0, 60 sec: 41506.0, 300 sec: 43770.1). Total num frames: 1336475648. Throughput: 0: 10365.1. Samples: 334179328. Policy #0 lag: (min: 15.0, avg: 83.4, max: 207.0) [2024-06-15 20:05:45,739][1648984] Avg episode reward: [(0, '-0.220')] [2024-06-15 20:05:46,435][1652475] Updated weights for policy 0, policy_version 652608 (0.0113) [2024-06-15 20:05:46,967][1651340] Signal inference workers to stop experience collection... (33550 times) [2024-06-15 20:05:47,016][1652475] InferenceWorker_p0-w0: stopping experience collection (33550 times) [2024-06-15 20:05:47,212][1651340] Signal inference workers to resume experience collection... (33550 times) [2024-06-15 20:05:47,212][1652475] InferenceWorker_p0-w0: resuming experience collection (33550 times) [2024-06-15 20:05:48,349][1652475] Updated weights for policy 0, policy_version 652666 (0.0013) [2024-06-15 20:05:50,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 41506.1, 300 sec: 43542.6). Total num frames: 1336672256. Throughput: 0: 10221.0. Samples: 334204928. Policy #0 lag: (min: 15.0, avg: 83.4, max: 207.0) [2024-06-15 20:05:50,739][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 20:05:55,501][1652475] Updated weights for policy 0, policy_version 652720 (0.0011) [2024-06-15 20:05:55,738][1648984] Fps is (10 sec: 29492.0, 60 sec: 40413.9, 300 sec: 42987.2). Total num frames: 1336770560. Throughput: 0: 10478.9. Samples: 334277120. 
Policy #0 lag: (min: 15.0, avg: 53.1, max: 207.0) [2024-06-15 20:05:55,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 20:05:57,883][1652475] Updated weights for policy 0, policy_version 652817 (0.0028) [2024-06-15 20:05:58,833][1652475] Updated weights for policy 0, policy_version 652864 (0.0014) [2024-06-15 20:06:00,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 41506.1, 300 sec: 43764.7). Total num frames: 1337065472. Throughput: 0: 10274.1. Samples: 334331904. Policy #0 lag: (min: 15.0, avg: 53.1, max: 207.0) [2024-06-15 20:06:00,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 20:06:02,857][1652475] Updated weights for policy 0, policy_version 652924 (0.0015) [2024-06-15 20:06:05,738][1648984] Fps is (10 sec: 42598.5, 60 sec: 41517.6, 300 sec: 43209.4). Total num frames: 1337196544. Throughput: 0: 9989.7. Samples: 334361088. Policy #0 lag: (min: 15.0, avg: 53.1, max: 207.0) [2024-06-15 20:06:05,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 20:06:07,253][1652475] Updated weights for policy 0, policy_version 652976 (0.0115) [2024-06-15 20:06:08,818][1652475] Updated weights for policy 0, policy_version 653027 (0.0042) [2024-06-15 20:06:10,738][1648984] Fps is (10 sec: 49151.9, 60 sec: 41506.2, 300 sec: 43875.8). Total num frames: 1337556992. Throughput: 0: 10296.9. Samples: 334429184. Policy #0 lag: (min: 15.0, avg: 53.1, max: 207.0) [2024-06-15 20:06:10,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 20:06:10,887][1652475] Updated weights for policy 0, policy_version 653110 (0.0012) [2024-06-15 20:06:15,738][1648984] Fps is (10 sec: 42598.6, 60 sec: 39867.7, 300 sec: 43209.3). Total num frames: 1337622528. Throughput: 0: 10274.2. Samples: 334497280. 
Policy #0 lag: (min: 15.0, avg: 53.1, max: 207.0) [2024-06-15 20:06:15,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 20:06:15,809][1652475] Updated weights for policy 0, policy_version 653152 (0.0013) [2024-06-15 20:06:18,452][1652475] Updated weights for policy 0, policy_version 653202 (0.0018) [2024-06-15 20:06:20,301][1652475] Updated weights for policy 0, policy_version 653250 (0.0013) [2024-06-15 20:06:20,738][1648984] Fps is (10 sec: 32767.9, 60 sec: 41506.1, 300 sec: 43209.3). Total num frames: 1337884672. Throughput: 0: 10331.1. Samples: 334525952. Policy #0 lag: (min: 15.0, avg: 53.1, max: 207.0) [2024-06-15 20:06:20,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 20:06:21,761][1652475] Updated weights for policy 0, policy_version 653312 (0.0017) [2024-06-15 20:06:25,738][1648984] Fps is (10 sec: 49151.8, 60 sec: 41506.2, 300 sec: 43542.6). Total num frames: 1338114048. Throughput: 0: 10217.3. Samples: 334584832. Policy #0 lag: (min: 15.0, avg: 53.1, max: 207.0) [2024-06-15 20:06:25,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 20:06:27,690][1652475] Updated weights for policy 0, policy_version 653392 (0.0015) [2024-06-15 20:06:30,738][1648984] Fps is (10 sec: 36043.9, 60 sec: 41506.0, 300 sec: 43098.2). Total num frames: 1338245120. Throughput: 0: 10501.7. Samples: 334651904. Policy #0 lag: (min: 15.0, avg: 53.1, max: 207.0) [2024-06-15 20:06:30,739][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 20:06:32,076][1652475] Updated weights for policy 0, policy_version 653495 (0.0022) [2024-06-15 20:06:34,170][1652475] Updated weights for policy 0, policy_version 653561 (0.0016) [2024-06-15 20:06:35,080][1651340] Signal inference workers to stop experience collection... (33600 times) [2024-06-15 20:06:35,133][1652475] InferenceWorker_p0-w0: stopping experience collection (33600 times) [2024-06-15 20:06:35,384][1651340] Signal inference workers to resume experience collection... 
(33600 times) [2024-06-15 20:06:35,385][1652475] InferenceWorker_p0-w0: resuming experience collection (33600 times) [2024-06-15 20:06:35,738][1648984] Fps is (10 sec: 45875.3, 60 sec: 40414.1, 300 sec: 43431.5). Total num frames: 1338572800. Throughput: 0: 10581.4. Samples: 334681088. Policy #0 lag: (min: 15.0, avg: 53.1, max: 207.0) [2024-06-15 20:06:35,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 20:06:35,962][1652475] Updated weights for policy 0, policy_version 653620 (0.0134) [2024-06-15 20:06:39,884][1652475] Updated weights for policy 0, policy_version 653650 (0.0012) [2024-06-15 20:06:40,738][1648984] Fps is (10 sec: 49153.1, 60 sec: 43144.5, 300 sec: 42987.2). Total num frames: 1338736640. Throughput: 0: 10422.0. Samples: 334746112. Policy #0 lag: (min: 15.0, avg: 53.1, max: 207.0) [2024-06-15 20:06:40,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 20:06:44,615][1652475] Updated weights for policy 0, policy_version 653717 (0.0013) [2024-06-15 20:06:45,738][1648984] Fps is (10 sec: 32767.8, 60 sec: 40414.1, 300 sec: 43098.3). Total num frames: 1338900480. Throughput: 0: 10615.5. Samples: 334809600. Policy #0 lag: (min: 15.0, avg: 53.1, max: 207.0) [2024-06-15 20:06:45,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 20:06:45,885][1652475] Updated weights for policy 0, policy_version 653776 (0.0013) [2024-06-15 20:06:48,549][1652475] Updated weights for policy 0, policy_version 653825 (0.0011) [2024-06-15 20:06:50,010][1652475] Updated weights for policy 0, policy_version 653887 (0.0014) [2024-06-15 20:06:50,738][1648984] Fps is (10 sec: 42598.6, 60 sec: 41506.1, 300 sec: 43098.3). Total num frames: 1339162624. Throughput: 0: 10501.7. Samples: 334833664. 
Policy #0 lag: (min: 15.0, avg: 53.1, max: 207.0) [2024-06-15 20:06:50,739][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 20:06:52,530][1652475] Updated weights for policy 0, policy_version 653949 (0.0011) [2024-06-15 20:06:55,738][1648984] Fps is (10 sec: 39319.8, 60 sec: 42051.9, 300 sec: 43098.2). Total num frames: 1339293696. Throughput: 0: 10387.8. Samples: 334896640. Policy #0 lag: (min: 15.0, avg: 53.1, max: 207.0) [2024-06-15 20:06:55,739][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 20:06:55,744][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000653952_1339293696.pth... [2024-06-15 20:06:56,007][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000648944_1329037312.pth [2024-06-15 20:06:57,375][1652475] Updated weights for policy 0, policy_version 654013 (0.0103) [2024-06-15 20:06:59,953][1652475] Updated weights for policy 0, policy_version 654076 (0.0110) [2024-06-15 20:07:00,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 41506.1, 300 sec: 43209.3). Total num frames: 1339555840. Throughput: 0: 10331.0. Samples: 334962176. Policy #0 lag: (min: 15.0, avg: 53.1, max: 207.0) [2024-06-15 20:07:00,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 20:07:02,370][1652475] Updated weights for policy 0, policy_version 654140 (0.0084) [2024-06-15 20:07:03,964][1652475] Updated weights for policy 0, policy_version 654192 (0.0013) [2024-06-15 20:07:05,738][1648984] Fps is (10 sec: 52431.2, 60 sec: 43690.6, 300 sec: 43320.4). Total num frames: 1339817984. Throughput: 0: 10365.2. Samples: 334992384. Policy #0 lag: (min: 15.0, avg: 53.1, max: 207.0) [2024-06-15 20:07:05,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 20:07:09,916][1652475] Updated weights for policy 0, policy_version 654270 (0.0015) [2024-06-15 20:07:10,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 39867.8, 300 sec: 43098.3). Total num frames: 1339949056. Throughput: 0: 10638.2. 
Samples: 335063552. Policy #0 lag: (min: 15.0, avg: 53.1, max: 207.0) [2024-06-15 20:07:10,740][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 20:07:12,417][1652475] Updated weights for policy 0, policy_version 654328 (0.0016) [2024-06-15 20:07:13,497][1652475] Updated weights for policy 0, policy_version 654368 (0.0015) [2024-06-15 20:07:15,738][1648984] Fps is (10 sec: 42598.6, 60 sec: 43690.6, 300 sec: 43209.3). Total num frames: 1340243968. Throughput: 0: 10467.6. Samples: 335122944. Policy #0 lag: (min: 15.0, avg: 53.1, max: 207.0) [2024-06-15 20:07:15,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 20:07:16,177][1652475] Updated weights for policy 0, policy_version 654448 (0.0014) [2024-06-15 20:07:20,738][1648984] Fps is (10 sec: 42598.5, 60 sec: 41506.2, 300 sec: 42765.0). Total num frames: 1340375040. Throughput: 0: 10535.8. Samples: 335155200. Policy #0 lag: (min: 15.0, avg: 53.1, max: 207.0) [2024-06-15 20:07:20,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 20:07:21,598][1652475] Updated weights for policy 0, policy_version 654523 (0.0018) [2024-06-15 20:07:24,846][1652475] Updated weights for policy 0, policy_version 654576 (0.0050) [2024-06-15 20:07:25,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 42052.2, 300 sec: 43209.3). Total num frames: 1340637184. Throughput: 0: 10581.3. Samples: 335222272. Policy #0 lag: (min: 15.0, avg: 53.1, max: 207.0) [2024-06-15 20:07:25,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 20:07:25,789][1651340] Signal inference workers to stop experience collection... (33650 times) [2024-06-15 20:07:25,840][1652475] InferenceWorker_p0-w0: stopping experience collection (33650 times) [2024-06-15 20:07:25,980][1651340] Signal inference workers to resume experience collection... 
(33650 times) [2024-06-15 20:07:25,981][1652475] InferenceWorker_p0-w0: resuming experience collection (33650 times) [2024-06-15 20:07:26,181][1652475] Updated weights for policy 0, policy_version 654628 (0.0015) [2024-06-15 20:07:28,635][1652475] Updated weights for policy 0, policy_version 654711 (0.0014) [2024-06-15 20:07:30,738][1648984] Fps is (10 sec: 49151.9, 60 sec: 43690.9, 300 sec: 42987.2). Total num frames: 1340866560. Throughput: 0: 10490.3. Samples: 335281664. Policy #0 lag: (min: 15.0, avg: 53.1, max: 207.0) [2024-06-15 20:07:30,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 20:07:33,144][1652475] Updated weights for policy 0, policy_version 654773 (0.0015) [2024-06-15 20:07:35,738][1648984] Fps is (10 sec: 36044.6, 60 sec: 40413.8, 300 sec: 42653.9). Total num frames: 1340997632. Throughput: 0: 10774.7. Samples: 335318528. Policy #0 lag: (min: 9.0, avg: 103.2, max: 265.0) [2024-06-15 20:07:35,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 20:07:36,014][1652475] Updated weights for policy 0, policy_version 654807 (0.0014) [2024-06-15 20:07:37,848][1652475] Updated weights for policy 0, policy_version 654884 (0.0013) [2024-06-15 20:07:40,738][1648984] Fps is (10 sec: 45874.8, 60 sec: 43144.6, 300 sec: 43320.4). Total num frames: 1341325312. Throughput: 0: 10797.6. Samples: 335382528. Policy #0 lag: (min: 9.0, avg: 103.2, max: 265.0) [2024-06-15 20:07:40,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 20:07:40,920][1652475] Updated weights for policy 0, policy_version 654960 (0.0017) [2024-06-15 20:07:45,065][1652475] Updated weights for policy 0, policy_version 655024 (0.0014) [2024-06-15 20:07:45,738][1648984] Fps is (10 sec: 52429.5, 60 sec: 43690.7, 300 sec: 43098.3). Total num frames: 1341521920. Throughput: 0: 10854.4. Samples: 335450624. 
Policy #0 lag: (min: 9.0, avg: 103.2, max: 265.0) [2024-06-15 20:07:45,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 20:07:48,750][1652475] Updated weights for policy 0, policy_version 655104 (0.0013) [2024-06-15 20:07:50,738][1648984] Fps is (10 sec: 45875.2, 60 sec: 43690.7, 300 sec: 43431.5). Total num frames: 1341784064. Throughput: 0: 10990.9. Samples: 335486976. Policy #0 lag: (min: 9.0, avg: 103.2, max: 265.0) [2024-06-15 20:07:50,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 20:07:51,761][1652475] Updated weights for policy 0, policy_version 655173 (0.0014) [2024-06-15 20:07:55,738][1648984] Fps is (10 sec: 39321.0, 60 sec: 43690.9, 300 sec: 42653.9). Total num frames: 1341915136. Throughput: 0: 10774.7. Samples: 335548416. Policy #0 lag: (min: 9.0, avg: 103.2, max: 265.0) [2024-06-15 20:07:55,738][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 20:07:56,835][1652475] Updated weights for policy 0, policy_version 655233 (0.0014) [2024-06-15 20:07:59,192][1652475] Updated weights for policy 0, policy_version 655312 (0.0014) [2024-06-15 20:08:00,738][1648984] Fps is (10 sec: 42598.6, 60 sec: 44236.8, 300 sec: 43209.4). Total num frames: 1342210048. Throughput: 0: 10865.8. Samples: 335611904. Policy #0 lag: (min: 9.0, avg: 103.2, max: 265.0) [2024-06-15 20:08:00,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 20:08:01,542][1652475] Updated weights for policy 0, policy_version 655417 (0.0017) [2024-06-15 20:08:05,738][1648984] Fps is (10 sec: 52427.8, 60 sec: 43690.5, 300 sec: 43098.2). Total num frames: 1342439424. Throughput: 0: 10808.8. Samples: 335641600. Policy #0 lag: (min: 9.0, avg: 103.2, max: 265.0) [2024-06-15 20:08:05,739][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 20:08:10,534][1652475] Updated weights for policy 0, policy_version 655504 (0.0108) [2024-06-15 20:08:10,738][1648984] Fps is (10 sec: 26214.3, 60 sec: 42052.2, 300 sec: 42320.7). Total num frames: 1342472192. 
Throughput: 0: 10831.6. Samples: 335709696. Policy #0 lag: (min: 9.0, avg: 103.2, max: 265.0) [2024-06-15 20:08:10,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 20:08:12,309][1652475] Updated weights for policy 0, policy_version 655584 (0.0012) [2024-06-15 20:08:12,824][1651340] Signal inference workers to stop experience collection... (33700 times) [2024-06-15 20:08:12,880][1652475] InferenceWorker_p0-w0: stopping experience collection (33700 times) [2024-06-15 20:08:13,020][1651340] Signal inference workers to resume experience collection... (33700 times) [2024-06-15 20:08:13,021][1652475] InferenceWorker_p0-w0: resuming experience collection (33700 times) [2024-06-15 20:08:13,932][1652475] Updated weights for policy 0, policy_version 655655 (0.0011) [2024-06-15 20:08:15,738][1648984] Fps is (10 sec: 39322.7, 60 sec: 43144.5, 300 sec: 42876.8). Total num frames: 1342832640. Throughput: 0: 10729.2. Samples: 335764480. Policy #0 lag: (min: 9.0, avg: 103.2, max: 265.0) [2024-06-15 20:08:15,738][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 20:08:17,966][1652475] Updated weights for policy 0, policy_version 655701 (0.0015) [2024-06-15 20:08:20,738][1648984] Fps is (10 sec: 49152.7, 60 sec: 43144.6, 300 sec: 42654.0). Total num frames: 1342963712. Throughput: 0: 10638.3. Samples: 335797248. Policy #0 lag: (min: 9.0, avg: 103.2, max: 265.0) [2024-06-15 20:08:20,738][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 20:08:21,668][1652475] Updated weights for policy 0, policy_version 655767 (0.0026) [2024-06-15 20:08:23,444][1652475] Updated weights for policy 0, policy_version 655811 (0.0048) [2024-06-15 20:08:24,247][1652475] Updated weights for policy 0, policy_version 655859 (0.0016) [2024-06-15 20:08:25,738][1648984] Fps is (10 sec: 45875.6, 60 sec: 44236.9, 300 sec: 42879.3). Total num frames: 1343291392. Throughput: 0: 10786.2. Samples: 335867904. 
Policy #0 lag: (min: 9.0, avg: 103.2, max: 265.0) [2024-06-15 20:08:25,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 20:08:25,750][1652475] Updated weights for policy 0, policy_version 655920 (0.0014) [2024-06-15 20:08:30,738][1648984] Fps is (10 sec: 39321.3, 60 sec: 41506.1, 300 sec: 42209.6). Total num frames: 1343356928. Throughput: 0: 10808.9. Samples: 335937024. Policy #0 lag: (min: 9.0, avg: 103.2, max: 265.0) [2024-06-15 20:08:30,738][1648984] Avg episode reward: [(0, '-0.210')] [2024-06-15 20:08:31,733][1652475] Updated weights for policy 0, policy_version 655971 (0.0033) [2024-06-15 20:08:32,885][1652475] Updated weights for policy 0, policy_version 656016 (0.0012) [2024-06-15 20:08:34,637][1652475] Updated weights for policy 0, policy_version 656065 (0.0022) [2024-06-15 20:08:35,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 45329.2, 300 sec: 42987.2). Total num frames: 1343717376. Throughput: 0: 10672.4. Samples: 335967232. Policy #0 lag: (min: 9.0, avg: 103.2, max: 265.0) [2024-06-15 20:08:35,738][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 20:08:35,742][1652475] Updated weights for policy 0, policy_version 656124 (0.0012) [2024-06-15 20:08:37,247][1652475] Updated weights for policy 0, policy_version 656184 (0.0101) [2024-06-15 20:08:40,738][1648984] Fps is (10 sec: 52427.0, 60 sec: 42598.2, 300 sec: 42653.9). Total num frames: 1343881216. Throughput: 0: 10774.7. Samples: 336033280. Policy #0 lag: (min: 9.0, avg: 103.2, max: 265.0) [2024-06-15 20:08:40,739][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 20:08:44,256][1652475] Updated weights for policy 0, policy_version 656240 (0.0015) [2024-06-15 20:08:45,304][1652475] Updated weights for policy 0, policy_version 656273 (0.0013) [2024-06-15 20:08:45,738][1648984] Fps is (10 sec: 36044.5, 60 sec: 42598.4, 300 sec: 42431.8). Total num frames: 1344077824. Throughput: 0: 10934.0. Samples: 336103936. 
Policy #0 lag: (min: 9.0, avg: 103.2, max: 265.0) [2024-06-15 20:08:45,739][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 20:08:47,045][1652475] Updated weights for policy 0, policy_version 656353 (0.0013) [2024-06-15 20:08:48,334][1652475] Updated weights for policy 0, policy_version 656390 (0.0013) [2024-06-15 20:08:50,738][1648984] Fps is (10 sec: 52430.0, 60 sec: 43690.6, 300 sec: 42653.9). Total num frames: 1344405504. Throughput: 0: 10877.2. Samples: 336131072. Policy #0 lag: (min: 9.0, avg: 103.2, max: 265.0) [2024-06-15 20:08:50,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 20:08:55,143][1652475] Updated weights for policy 0, policy_version 656470 (0.0016) [2024-06-15 20:08:55,738][1648984] Fps is (10 sec: 42597.4, 60 sec: 43144.4, 300 sec: 42542.9). Total num frames: 1344503808. Throughput: 0: 10831.6. Samples: 336197120. Policy #0 lag: (min: 9.0, avg: 103.2, max: 265.0) [2024-06-15 20:08:55,739][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 20:08:55,874][1652475] Updated weights for policy 0, policy_version 656511 (0.0013) [2024-06-15 20:08:55,884][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000656512_1344536576.pth... [2024-06-15 20:08:55,949][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000651568_1334411264.pth [2024-06-15 20:08:59,820][1652475] Updated weights for policy 0, policy_version 656564 (0.0011) [2024-06-15 20:09:00,457][1651340] Signal inference workers to stop experience collection... (33750 times) [2024-06-15 20:09:00,554][1652475] InferenceWorker_p0-w0: stopping experience collection (33750 times) [2024-06-15 20:09:00,738][1648984] Fps is (10 sec: 29491.6, 60 sec: 41506.2, 300 sec: 42209.6). Total num frames: 1344700416. Throughput: 0: 11059.2. Samples: 336262144. 
Policy #0 lag: (min: 9.0, avg: 103.2, max: 265.0) [2024-06-15 20:09:00,738][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 20:09:00,795][1651340] Signal inference workers to resume experience collection... (33750 times) [2024-06-15 20:09:00,796][1652475] InferenceWorker_p0-w0: resuming experience collection (33750 times) [2024-06-15 20:09:01,199][1652475] Updated weights for policy 0, policy_version 656624 (0.0086) [2024-06-15 20:09:03,373][1652475] Updated weights for policy 0, policy_version 656704 (0.0042) [2024-06-15 20:09:05,738][1648984] Fps is (10 sec: 42599.4, 60 sec: 41506.3, 300 sec: 42431.8). Total num frames: 1344929792. Throughput: 0: 10706.5. Samples: 336279040. Policy #0 lag: (min: 9.0, avg: 103.2, max: 265.0) [2024-06-15 20:09:05,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 20:09:10,735][1652475] Updated weights for policy 0, policy_version 656771 (0.0014) [2024-06-15 20:09:10,738][1648984] Fps is (10 sec: 36044.4, 60 sec: 43144.5, 300 sec: 41876.4). Total num frames: 1345060864. Throughput: 0: 10638.2. Samples: 336346624. Policy #0 lag: (min: 9.0, avg: 103.2, max: 265.0) [2024-06-15 20:09:10,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 20:09:12,139][1652475] Updated weights for policy 0, policy_version 656832 (0.0012) [2024-06-15 20:09:14,974][1652475] Updated weights for policy 0, policy_version 656912 (0.0110) [2024-06-15 20:09:15,738][1648984] Fps is (10 sec: 49152.2, 60 sec: 43144.6, 300 sec: 42542.9). Total num frames: 1345421312. Throughput: 0: 10422.0. Samples: 336406016. Policy #0 lag: (min: 15.0, avg: 100.5, max: 271.0) [2024-06-15 20:09:15,738][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 20:09:19,925][1652475] Updated weights for policy 0, policy_version 656992 (0.0014) [2024-06-15 20:09:20,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 43690.6, 300 sec: 42209.6). Total num frames: 1345585152. Throughput: 0: 10490.3. Samples: 336439296. 
Policy #0 lag: (min: 15.0, avg: 100.5, max: 271.0) [2024-06-15 20:09:20,738][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 20:09:22,379][1652475] Updated weights for policy 0, policy_version 657040 (0.0014) [2024-06-15 20:09:23,724][1652475] Updated weights for policy 0, policy_version 657088 (0.0013) [2024-06-15 20:09:25,738][1648984] Fps is (10 sec: 36044.7, 60 sec: 41506.1, 300 sec: 42431.8). Total num frames: 1345781760. Throughput: 0: 10558.6. Samples: 336508416. Policy #0 lag: (min: 15.0, avg: 100.5, max: 271.0) [2024-06-15 20:09:25,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 20:09:26,088][1652475] Updated weights for policy 0, policy_version 657149 (0.0018) [2024-06-15 20:09:28,214][1652475] Updated weights for policy 0, policy_version 657216 (0.0013) [2024-06-15 20:09:30,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 43690.7, 300 sec: 42209.6). Total num frames: 1345978368. Throughput: 0: 10422.1. Samples: 336572928. Policy #0 lag: (min: 15.0, avg: 100.5, max: 271.0) [2024-06-15 20:09:30,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 20:09:33,910][1652475] Updated weights for policy 0, policy_version 657282 (0.0014) [2024-06-15 20:09:35,404][1652475] Updated weights for policy 0, policy_version 657344 (0.0013) [2024-06-15 20:09:35,738][1648984] Fps is (10 sec: 45875.5, 60 sec: 42052.3, 300 sec: 42654.0). Total num frames: 1346240512. Throughput: 0: 10558.6. Samples: 336606208. Policy #0 lag: (min: 15.0, avg: 100.5, max: 271.0) [2024-06-15 20:09:35,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 20:09:37,790][1652475] Updated weights for policy 0, policy_version 657394 (0.0013) [2024-06-15 20:09:39,514][1652475] Updated weights for policy 0, policy_version 657456 (0.0013) [2024-06-15 20:09:40,738][1648984] Fps is (10 sec: 52428.4, 60 sec: 43690.9, 300 sec: 42431.8). Total num frames: 1346502656. Throughput: 0: 10524.5. Samples: 336670720. 
Policy #0 lag: (min: 15.0, avg: 100.5, max: 271.0) [2024-06-15 20:09:40,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 20:09:43,409][1652475] Updated weights for policy 0, policy_version 657505 (0.0012) [2024-06-15 20:09:45,738][1648984] Fps is (10 sec: 42598.2, 60 sec: 43144.6, 300 sec: 42320.7). Total num frames: 1346666496. Throughput: 0: 10695.1. Samples: 336743424. Policy #0 lag: (min: 15.0, avg: 100.5, max: 271.0) [2024-06-15 20:09:45,741][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 20:09:46,690][1652475] Updated weights for policy 0, policy_version 657593 (0.0013) [2024-06-15 20:09:49,401][1651340] Signal inference workers to stop experience collection... (33800 times) [2024-06-15 20:09:49,457][1652475] InferenceWorker_p0-w0: stopping experience collection (33800 times) [2024-06-15 20:09:49,721][1651340] Signal inference workers to resume experience collection... (33800 times) [2024-06-15 20:09:49,724][1652475] InferenceWorker_p0-w0: resuming experience collection (33800 times) [2024-06-15 20:09:49,908][1652475] Updated weights for policy 0, policy_version 657654 (0.0012) [2024-06-15 20:09:50,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 41506.2, 300 sec: 42542.9). Total num frames: 1346895872. Throughput: 0: 11036.4. Samples: 336775680. Policy #0 lag: (min: 15.0, avg: 100.5, max: 271.0) [2024-06-15 20:09:50,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 20:09:51,974][1652475] Updated weights for policy 0, policy_version 657720 (0.0013) [2024-06-15 20:09:55,606][1652475] Updated weights for policy 0, policy_version 657763 (0.0014) [2024-06-15 20:09:55,738][1648984] Fps is (10 sec: 42598.2, 60 sec: 43144.7, 300 sec: 42431.8). Total num frames: 1347092480. Throughput: 0: 10968.2. Samples: 336840192. 
Policy #0 lag: (min: 15.0, avg: 100.5, max: 271.0) [2024-06-15 20:09:55,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 20:09:56,120][1652475] Updated weights for policy 0, policy_version 657790 (0.0013) [2024-06-15 20:09:57,914][1652475] Updated weights for policy 0, policy_version 657842 (0.0013) [2024-06-15 20:10:00,738][1648984] Fps is (10 sec: 42598.5, 60 sec: 43690.7, 300 sec: 42767.4). Total num frames: 1347321856. Throughput: 0: 11286.8. Samples: 336913920. Policy #0 lag: (min: 15.0, avg: 100.5, max: 271.0) [2024-06-15 20:10:00,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 20:10:01,049][1652475] Updated weights for policy 0, policy_version 657888 (0.0012) [2024-06-15 20:10:02,882][1652475] Updated weights for policy 0, policy_version 657952 (0.0013) [2024-06-15 20:10:05,681][1652475] Updated weights for policy 0, policy_version 657985 (0.0013) [2024-06-15 20:10:05,738][1648984] Fps is (10 sec: 45874.4, 60 sec: 43690.5, 300 sec: 42320.7). Total num frames: 1347551232. Throughput: 0: 11207.1. Samples: 336943616. Policy #0 lag: (min: 15.0, avg: 100.5, max: 271.0) [2024-06-15 20:10:05,739][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 20:10:06,556][1652475] Updated weights for policy 0, policy_version 658032 (0.0023) [2024-06-15 20:10:08,883][1652475] Updated weights for policy 0, policy_version 658081 (0.0016) [2024-06-15 20:10:10,740][1648984] Fps is (10 sec: 49151.6, 60 sec: 45875.2, 300 sec: 42653.9). Total num frames: 1347813376. Throughput: 0: 11138.8. Samples: 337009664. Policy #0 lag: (min: 15.0, avg: 100.5, max: 271.0) [2024-06-15 20:10:10,741][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 20:10:12,859][1652475] Updated weights for policy 0, policy_version 658130 (0.0017) [2024-06-15 20:10:14,331][1652475] Updated weights for policy 0, policy_version 658208 (0.0013) [2024-06-15 20:10:15,738][1648984] Fps is (10 sec: 52429.1, 60 sec: 44236.7, 300 sec: 42987.1). Total num frames: 1348075520. 
Throughput: 0: 11275.3. Samples: 337080320. Policy #0 lag: (min: 15.0, avg: 100.5, max: 271.0) [2024-06-15 20:10:15,738][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 20:10:17,306][1652475] Updated weights for policy 0, policy_version 658272 (0.0013) [2024-06-15 20:10:20,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 43690.6, 300 sec: 42653.9). Total num frames: 1348206592. Throughput: 0: 11195.7. Samples: 337110016. Policy #0 lag: (min: 15.0, avg: 100.5, max: 271.0) [2024-06-15 20:10:20,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 20:10:22,622][1652475] Updated weights for policy 0, policy_version 658336 (0.0026) [2024-06-15 20:10:24,256][1652475] Updated weights for policy 0, policy_version 658384 (0.0012) [2024-06-15 20:10:25,738][1648984] Fps is (10 sec: 42599.1, 60 sec: 45329.1, 300 sec: 43209.3). Total num frames: 1348501504. Throughput: 0: 11275.4. Samples: 337178112. Policy #0 lag: (min: 15.0, avg: 100.5, max: 271.0) [2024-06-15 20:10:25,738][1648984] Avg episode reward: [(0, '-0.220')] [2024-06-15 20:10:26,502][1652475] Updated weights for policy 0, policy_version 658492 (0.0097) [2024-06-15 20:10:30,738][1648984] Fps is (10 sec: 45874.7, 60 sec: 44782.8, 300 sec: 42431.8). Total num frames: 1348665344. Throughput: 0: 10945.4. Samples: 337235968. Policy #0 lag: (min: 15.0, avg: 100.5, max: 271.0) [2024-06-15 20:10:30,739][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 20:10:31,251][1652475] Updated weights for policy 0, policy_version 658550 (0.0013) [2024-06-15 20:10:35,738][1648984] Fps is (10 sec: 26214.3, 60 sec: 42052.2, 300 sec: 42765.0). Total num frames: 1348763648. Throughput: 0: 10968.2. Samples: 337269248. 
Policy #0 lag: (min: 15.0, avg: 100.5, max: 271.0) [2024-06-15 20:10:35,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 20:10:36,188][1652475] Updated weights for policy 0, policy_version 658608 (0.0018) [2024-06-15 20:10:37,059][1651340] Signal inference workers to stop experience collection... (33850 times) [2024-06-15 20:10:37,125][1652475] InferenceWorker_p0-w0: stopping experience collection (33850 times) [2024-06-15 20:10:37,329][1651340] Signal inference workers to resume experience collection... (33850 times) [2024-06-15 20:10:37,334][1652475] InferenceWorker_p0-w0: resuming experience collection (33850 times) [2024-06-15 20:10:38,050][1652475] Updated weights for policy 0, policy_version 658688 (0.0113) [2024-06-15 20:10:40,738][1648984] Fps is (10 sec: 45874.8, 60 sec: 43690.5, 300 sec: 42876.1). Total num frames: 1349124096. Throughput: 0: 10729.2. Samples: 337323008. Policy #0 lag: (min: 15.0, avg: 100.5, max: 271.0) [2024-06-15 20:10:40,739][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 20:10:44,039][1652475] Updated weights for policy 0, policy_version 658755 (0.0111) [2024-06-15 20:10:45,277][1652475] Updated weights for policy 0, policy_version 658816 (0.0013) [2024-06-15 20:10:45,738][1648984] Fps is (10 sec: 49152.5, 60 sec: 43144.6, 300 sec: 42654.0). Total num frames: 1349255168. Throughput: 0: 10763.4. Samples: 337398272. Policy #0 lag: (min: 15.0, avg: 100.5, max: 271.0) [2024-06-15 20:10:45,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 20:10:49,577][1652475] Updated weights for policy 0, policy_version 658928 (0.0013) [2024-06-15 20:10:50,738][1648984] Fps is (10 sec: 45876.2, 60 sec: 44782.9, 300 sec: 43431.5). Total num frames: 1349582848. Throughput: 0: 10820.3. Samples: 337430528. 
Policy #0 lag: (min: 63.0, avg: 136.5, max: 303.0) [2024-06-15 20:10:50,739][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 20:10:51,261][1652475] Updated weights for policy 0, policy_version 659005 (0.0091) [2024-06-15 20:10:55,738][1648984] Fps is (10 sec: 39321.0, 60 sec: 42598.4, 300 sec: 42653.9). Total num frames: 1349648384. Throughput: 0: 10717.9. Samples: 337491968. Policy #0 lag: (min: 63.0, avg: 136.5, max: 303.0) [2024-06-15 20:10:55,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 20:10:55,743][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000659008_1349648384.pth... [2024-06-15 20:10:55,798][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000653952_1339293696.pth [2024-06-15 20:10:55,803][1651340] Saving a milestone train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/milestones/checkpoint_000659008_1349648384.pth [2024-06-15 20:10:59,115][1652475] Updated weights for policy 0, policy_version 659073 (0.0034) [2024-06-15 20:11:00,738][1648984] Fps is (10 sec: 32768.3, 60 sec: 43144.5, 300 sec: 43098.3). Total num frames: 1349910528. Throughput: 0: 10638.3. Samples: 337559040. Policy #0 lag: (min: 63.0, avg: 136.5, max: 303.0) [2024-06-15 20:11:00,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 20:11:00,740][1652475] Updated weights for policy 0, policy_version 659138 (0.0049) [2024-06-15 20:11:02,814][1652475] Updated weights for policy 0, policy_version 659220 (0.0012) [2024-06-15 20:11:03,835][1652475] Updated weights for policy 0, policy_version 659264 (0.0011) [2024-06-15 20:11:05,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 43690.8, 300 sec: 42765.0). Total num frames: 1350172672. Throughput: 0: 10456.2. Samples: 337580544. 
Policy #0 lag: (min: 63.0, avg: 136.5, max: 303.0) [2024-06-15 20:11:05,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 20:11:10,348][1652475] Updated weights for policy 0, policy_version 659300 (0.0024) [2024-06-15 20:11:10,738][1648984] Fps is (10 sec: 36044.4, 60 sec: 40960.0, 300 sec: 42876.1). Total num frames: 1350270976. Throughput: 0: 10547.2. Samples: 337652736. Policy #0 lag: (min: 63.0, avg: 136.5, max: 303.0) [2024-06-15 20:11:10,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 20:11:11,845][1652475] Updated weights for policy 0, policy_version 659363 (0.0096) [2024-06-15 20:11:13,406][1652475] Updated weights for policy 0, policy_version 659410 (0.0015) [2024-06-15 20:11:14,815][1652475] Updated weights for policy 0, policy_version 659472 (0.0013) [2024-06-15 20:11:15,738][1648984] Fps is (10 sec: 49152.1, 60 sec: 43144.6, 300 sec: 43320.4). Total num frames: 1350664192. Throughput: 0: 10501.7. Samples: 337708544. Policy #0 lag: (min: 63.0, avg: 136.5, max: 303.0) [2024-06-15 20:11:15,739][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 20:11:16,045][1652475] Updated weights for policy 0, policy_version 659519 (0.0012) [2024-06-15 20:11:20,738][1648984] Fps is (10 sec: 42598.8, 60 sec: 41506.2, 300 sec: 42653.9). Total num frames: 1350696960. Throughput: 0: 10513.1. Samples: 337742336. Policy #0 lag: (min: 63.0, avg: 136.5, max: 303.0) [2024-06-15 20:11:20,738][1648984] Avg episode reward: [(0, '-0.250')] [2024-06-15 20:11:21,668][1651340] Signal inference workers to stop experience collection... (33900 times) [2024-06-15 20:11:21,718][1652475] InferenceWorker_p0-w0: stopping experience collection (33900 times) [2024-06-15 20:11:21,823][1651340] Signal inference workers to resume experience collection... 
(33900 times) [2024-06-15 20:11:21,824][1652475] InferenceWorker_p0-w0: resuming experience collection (33900 times) [2024-06-15 20:11:22,089][1652475] Updated weights for policy 0, policy_version 659580 (0.0014) [2024-06-15 20:11:23,988][1652475] Updated weights for policy 0, policy_version 659642 (0.0016) [2024-06-15 20:11:25,738][1648984] Fps is (10 sec: 29490.9, 60 sec: 40959.9, 300 sec: 43098.3). Total num frames: 1350959104. Throughput: 0: 10649.6. Samples: 337802240. Policy #0 lag: (min: 63.0, avg: 136.5, max: 303.0) [2024-06-15 20:11:25,739][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 20:11:28,674][1652475] Updated weights for policy 0, policy_version 659713 (0.0012) [2024-06-15 20:11:29,891][1652475] Updated weights for policy 0, policy_version 659773 (0.0022) [2024-06-15 20:11:30,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 42598.5, 300 sec: 42876.1). Total num frames: 1351221248. Throughput: 0: 10376.5. Samples: 337865216. Policy #0 lag: (min: 63.0, avg: 136.5, max: 303.0) [2024-06-15 20:11:30,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 20:11:33,186][1652475] Updated weights for policy 0, policy_version 659825 (0.0125) [2024-06-15 20:11:35,662][1652475] Updated weights for policy 0, policy_version 659874 (0.0122) [2024-06-15 20:11:35,738][1648984] Fps is (10 sec: 45876.1, 60 sec: 44236.9, 300 sec: 42987.2). Total num frames: 1351417856. Throughput: 0: 10376.6. Samples: 337897472. Policy #0 lag: (min: 63.0, avg: 136.5, max: 303.0) [2024-06-15 20:11:35,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 20:11:39,536][1652475] Updated weights for policy 0, policy_version 659922 (0.0013) [2024-06-15 20:11:40,738][1648984] Fps is (10 sec: 36044.9, 60 sec: 40960.2, 300 sec: 42987.2). Total num frames: 1351581696. Throughput: 0: 10729.3. Samples: 337974784. 
Policy #0 lag: (min: 63.0, avg: 136.5, max: 303.0) [2024-06-15 20:11:40,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 20:11:41,538][1652475] Updated weights for policy 0, policy_version 660000 (0.0012) [2024-06-15 20:11:44,023][1652475] Updated weights for policy 0, policy_version 660048 (0.0011) [2024-06-15 20:11:45,738][1648984] Fps is (10 sec: 45874.6, 60 sec: 43690.6, 300 sec: 43098.2). Total num frames: 1351876608. Throughput: 0: 10433.4. Samples: 338028544. Policy #0 lag: (min: 63.0, avg: 136.5, max: 303.0) [2024-06-15 20:11:45,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 20:11:47,036][1652475] Updated weights for policy 0, policy_version 660115 (0.0092) [2024-06-15 20:11:50,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 40413.9, 300 sec: 43098.3). Total num frames: 1352007680. Throughput: 0: 10740.6. Samples: 338063872. Policy #0 lag: (min: 63.0, avg: 136.5, max: 303.0) [2024-06-15 20:11:50,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 20:11:51,536][1652475] Updated weights for policy 0, policy_version 660176 (0.0042) [2024-06-15 20:11:53,407][1652475] Updated weights for policy 0, policy_version 660256 (0.0012) [2024-06-15 20:11:55,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 43690.7, 300 sec: 43098.2). Total num frames: 1352269824. Throughput: 0: 10604.1. Samples: 338129920. Policy #0 lag: (min: 63.0, avg: 136.5, max: 303.0) [2024-06-15 20:11:55,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 20:11:56,692][1652475] Updated weights for policy 0, policy_version 660336 (0.0016) [2024-06-15 20:11:59,261][1652475] Updated weights for policy 0, policy_version 660402 (0.0012) [2024-06-15 20:12:00,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 43690.6, 300 sec: 43098.2). Total num frames: 1352531968. Throughput: 0: 10899.9. Samples: 338199040. 
Policy #0 lag: (min: 63.0, avg: 136.5, max: 303.0) [2024-06-15 20:12:00,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 20:12:03,983][1652475] Updated weights for policy 0, policy_version 660452 (0.0012) [2024-06-15 20:12:05,566][1652475] Updated weights for policy 0, policy_version 660516 (0.0013) [2024-06-15 20:12:05,738][1648984] Fps is (10 sec: 49151.9, 60 sec: 43144.5, 300 sec: 43431.5). Total num frames: 1352761344. Throughput: 0: 11070.6. Samples: 338240512. Policy #0 lag: (min: 63.0, avg: 136.5, max: 303.0) [2024-06-15 20:12:05,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 20:12:07,200][1651340] Signal inference workers to stop experience collection... (33950 times) [2024-06-15 20:12:07,275][1652475] InferenceWorker_p0-w0: stopping experience collection (33950 times) [2024-06-15 20:12:07,398][1651340] Signal inference workers to resume experience collection... (33950 times) [2024-06-15 20:12:07,400][1652475] InferenceWorker_p0-w0: resuming experience collection (33950 times) [2024-06-15 20:12:07,808][1652475] Updated weights for policy 0, policy_version 660576 (0.0015) [2024-06-15 20:12:08,748][1652475] Updated weights for policy 0, policy_version 660606 (0.0010) [2024-06-15 20:12:10,738][1648984] Fps is (10 sec: 45874.9, 60 sec: 45329.1, 300 sec: 43209.3). Total num frames: 1352990720. Throughput: 0: 11093.3. Samples: 338301440. Policy #0 lag: (min: 63.0, avg: 136.5, max: 303.0) [2024-06-15 20:12:10,739][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 20:12:10,902][1652475] Updated weights for policy 0, policy_version 660658 (0.0014) [2024-06-15 20:12:15,690][1652475] Updated weights for policy 0, policy_version 660720 (0.0012) [2024-06-15 20:12:15,738][1648984] Fps is (10 sec: 39320.5, 60 sec: 41505.9, 300 sec: 43320.3). Total num frames: 1353154560. Throughput: 0: 11309.4. Samples: 338374144. 
Policy #0 lag: (min: 63.0, avg: 136.5, max: 303.0) [2024-06-15 20:12:15,746][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 20:12:17,292][1652475] Updated weights for policy 0, policy_version 660800 (0.0095) [2024-06-15 20:12:20,008][1652475] Updated weights for policy 0, policy_version 660856 (0.0012) [2024-06-15 20:12:20,738][1648984] Fps is (10 sec: 45875.9, 60 sec: 45875.2, 300 sec: 43431.5). Total num frames: 1353449472. Throughput: 0: 11229.9. Samples: 338402816. Policy #0 lag: (min: 63.0, avg: 136.5, max: 303.0) [2024-06-15 20:12:20,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 20:12:22,621][1652475] Updated weights for policy 0, policy_version 660898 (0.0013) [2024-06-15 20:12:25,738][1648984] Fps is (10 sec: 42599.7, 60 sec: 43690.7, 300 sec: 43098.2). Total num frames: 1353580544. Throughput: 0: 10990.9. Samples: 338469376. Policy #0 lag: (min: 63.0, avg: 136.5, max: 303.0) [2024-06-15 20:12:25,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 20:12:27,074][1652475] Updated weights for policy 0, policy_version 660960 (0.0012) [2024-06-15 20:12:28,540][1652475] Updated weights for policy 0, policy_version 661025 (0.0016) [2024-06-15 20:12:30,321][1652475] Updated weights for policy 0, policy_version 661073 (0.0017) [2024-06-15 20:12:30,738][1648984] Fps is (10 sec: 45874.8, 60 sec: 44782.9, 300 sec: 43764.7). Total num frames: 1353908224. Throughput: 0: 11320.9. Samples: 338537984. Policy #0 lag: (min: 13.0, avg: 94.8, max: 269.0) [2024-06-15 20:12:30,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 20:12:34,626][1652475] Updated weights for policy 0, policy_version 661136 (0.0013) [2024-06-15 20:12:35,738][1648984] Fps is (10 sec: 49149.6, 60 sec: 44236.4, 300 sec: 43209.3). Total num frames: 1354072064. Throughput: 0: 11309.4. Samples: 338572800. 
Policy #0 lag: (min: 13.0, avg: 94.8, max: 269.0) [2024-06-15 20:12:35,739][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 20:12:35,906][1652475] Updated weights for policy 0, policy_version 661183 (0.0025) [2024-06-15 20:12:38,784][1652475] Updated weights for policy 0, policy_version 661248 (0.0135) [2024-06-15 20:12:40,144][1652475] Updated weights for policy 0, policy_version 661306 (0.0013) [2024-06-15 20:12:40,738][1648984] Fps is (10 sec: 45874.9, 60 sec: 46421.3, 300 sec: 43542.5). Total num frames: 1354366976. Throughput: 0: 11173.0. Samples: 338632704. Policy #0 lag: (min: 13.0, avg: 94.8, max: 269.0) [2024-06-15 20:12:40,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 20:12:42,413][1652475] Updated weights for policy 0, policy_version 661344 (0.0013) [2024-06-15 20:12:45,738][1648984] Fps is (10 sec: 42600.7, 60 sec: 43690.7, 300 sec: 43098.3). Total num frames: 1354498048. Throughput: 0: 11298.1. Samples: 338707456. Policy #0 lag: (min: 13.0, avg: 94.8, max: 269.0) [2024-06-15 20:12:45,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 20:12:48,141][1652475] Updated weights for policy 0, policy_version 661382 (0.0014) [2024-06-15 20:12:49,947][1652475] Updated weights for policy 0, policy_version 661459 (0.0013) [2024-06-15 20:12:50,738][1648984] Fps is (10 sec: 36044.7, 60 sec: 45329.0, 300 sec: 43431.5). Total num frames: 1354727424. Throughput: 0: 11150.2. Samples: 338742272. Policy #0 lag: (min: 13.0, avg: 94.8, max: 269.0) [2024-06-15 20:12:50,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 20:12:51,217][1651340] Signal inference workers to stop experience collection... (34000 times) [2024-06-15 20:12:51,279][1652475] InferenceWorker_p0-w0: stopping experience collection (34000 times) [2024-06-15 20:12:51,451][1651340] Signal inference workers to resume experience collection... 
(34000 times) [2024-06-15 20:12:51,452][1652475] InferenceWorker_p0-w0: resuming experience collection (34000 times) [2024-06-15 20:12:51,940][1652475] Updated weights for policy 0, policy_version 661536 (0.0015) [2024-06-15 20:12:55,738][1648984] Fps is (10 sec: 39319.8, 60 sec: 43690.4, 300 sec: 42987.1). Total num frames: 1354891264. Throughput: 0: 10945.3. Samples: 338793984. Policy #0 lag: (min: 13.0, avg: 94.8, max: 269.0) [2024-06-15 20:12:55,739][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 20:12:56,228][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000661600_1354956800.pth... [2024-06-15 20:12:56,229][1652475] Updated weights for policy 0, policy_version 661600 (0.0012) [2024-06-15 20:12:56,399][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000656512_1344536576.pth [2024-06-15 20:13:00,526][1652475] Updated weights for policy 0, policy_version 661666 (0.0012) [2024-06-15 20:13:00,738][1648984] Fps is (10 sec: 39322.1, 60 sec: 43144.6, 300 sec: 42987.2). Total num frames: 1355120640. Throughput: 0: 10877.2. Samples: 338863616. Policy #0 lag: (min: 13.0, avg: 94.8, max: 269.0) [2024-06-15 20:13:00,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 20:13:01,491][1652475] Updated weights for policy 0, policy_version 661716 (0.0013) [2024-06-15 20:13:03,619][1652475] Updated weights for policy 0, policy_version 661808 (0.0017) [2024-06-15 20:13:05,738][1648984] Fps is (10 sec: 52431.0, 60 sec: 44236.8, 300 sec: 43875.8). Total num frames: 1355415552. Throughput: 0: 10831.6. Samples: 338890240. Policy #0 lag: (min: 13.0, avg: 94.8, max: 269.0) [2024-06-15 20:13:05,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 20:13:09,078][1652475] Updated weights for policy 0, policy_version 661845 (0.0013) [2024-06-15 20:13:10,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 42598.5, 300 sec: 43098.3). Total num frames: 1355546624. Throughput: 0: 11025.1. 
Samples: 338965504. Policy #0 lag: (min: 13.0, avg: 94.8, max: 269.0) [2024-06-15 20:13:10,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 20:13:11,351][1652475] Updated weights for policy 0, policy_version 661907 (0.0011) [2024-06-15 20:13:12,253][1652475] Updated weights for policy 0, policy_version 661952 (0.0020) [2024-06-15 20:13:15,319][1652475] Updated weights for policy 0, policy_version 662049 (0.0111) [2024-06-15 20:13:15,738][1648984] Fps is (10 sec: 49152.1, 60 sec: 45875.5, 300 sec: 43875.8). Total num frames: 1355907072. Throughput: 0: 10638.2. Samples: 339016704. Policy #0 lag: (min: 13.0, avg: 94.8, max: 269.0) [2024-06-15 20:13:15,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 20:13:20,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 41506.1, 300 sec: 42876.1). Total num frames: 1355939840. Throughput: 0: 10752.1. Samples: 339056640. Policy #0 lag: (min: 13.0, avg: 94.8, max: 269.0) [2024-06-15 20:13:20,739][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 20:13:21,067][1652475] Updated weights for policy 0, policy_version 662081 (0.0010) [2024-06-15 20:13:23,643][1652475] Updated weights for policy 0, policy_version 662160 (0.0012) [2024-06-15 20:13:25,481][1652475] Updated weights for policy 0, policy_version 662228 (0.0012) [2024-06-15 20:13:25,738][1648984] Fps is (10 sec: 36044.6, 60 sec: 44782.9, 300 sec: 43764.7). Total num frames: 1356267520. Throughput: 0: 10945.4. Samples: 339125248. Policy #0 lag: (min: 13.0, avg: 94.8, max: 269.0) [2024-06-15 20:13:25,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 20:13:27,294][1652475] Updated weights for policy 0, policy_version 662288 (0.0012) [2024-06-15 20:13:28,397][1652475] Updated weights for policy 0, policy_version 662336 (0.0012) [2024-06-15 20:13:30,738][1648984] Fps is (10 sec: 52429.2, 60 sec: 42598.4, 300 sec: 43209.3). Total num frames: 1356464128. Throughput: 0: 10570.0. Samples: 339183104. 
Policy #0 lag: (min: 13.0, avg: 94.8, max: 269.0) [2024-06-15 20:13:30,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 20:13:35,472][1652475] Updated weights for policy 0, policy_version 662401 (0.0103) [2024-06-15 20:13:35,738][1648984] Fps is (10 sec: 32768.3, 60 sec: 42052.7, 300 sec: 43098.3). Total num frames: 1356595200. Throughput: 0: 10535.9. Samples: 339216384. Policy #0 lag: (min: 13.0, avg: 94.8, max: 269.0) [2024-06-15 20:13:35,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 20:13:37,283][1652475] Updated weights for policy 0, policy_version 662467 (0.0023) [2024-06-15 20:13:37,674][1651340] Signal inference workers to stop experience collection... (34050 times) [2024-06-15 20:13:37,721][1652475] InferenceWorker_p0-w0: stopping experience collection (34050 times) [2024-06-15 20:13:38,026][1651340] Signal inference workers to resume experience collection... (34050 times) [2024-06-15 20:13:38,027][1652475] InferenceWorker_p0-w0: resuming experience collection (34050 times) [2024-06-15 20:13:38,768][1652475] Updated weights for policy 0, policy_version 662524 (0.0012) [2024-06-15 20:13:40,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 41506.2, 300 sec: 43320.4). Total num frames: 1356857344. Throughput: 0: 10683.8. Samples: 339274752. Policy #0 lag: (min: 13.0, avg: 94.8, max: 269.0) [2024-06-15 20:13:40,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 20:13:44,629][1652475] Updated weights for policy 0, policy_version 662593 (0.0014) [2024-06-15 20:13:45,676][1652475] Updated weights for policy 0, policy_version 662652 (0.0013) [2024-06-15 20:13:45,742][1648984] Fps is (10 sec: 52404.0, 60 sec: 43687.2, 300 sec: 43097.6). Total num frames: 1357119488. Throughput: 0: 10534.7. Samples: 339337728. 
Policy #0 lag: (min: 13.0, avg: 94.8, max: 269.0) [2024-06-15 20:13:45,743][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 20:13:48,108][1652475] Updated weights for policy 0, policy_version 662714 (0.0013) [2024-06-15 20:13:50,738][1648984] Fps is (10 sec: 42597.2, 60 sec: 42598.3, 300 sec: 43320.4). Total num frames: 1357283328. Throughput: 0: 10638.2. Samples: 339368960. Policy #0 lag: (min: 13.0, avg: 94.8, max: 269.0) [2024-06-15 20:13:50,739][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 20:13:51,118][1652475] Updated weights for policy 0, policy_version 662755 (0.0014) [2024-06-15 20:13:54,025][1652475] Updated weights for policy 0, policy_version 662800 (0.0016) [2024-06-15 20:13:55,738][1648984] Fps is (10 sec: 39340.3, 60 sec: 43691.0, 300 sec: 43431.5). Total num frames: 1357512704. Throughput: 0: 10626.9. Samples: 339443712. Policy #0 lag: (min: 13.0, avg: 94.8, max: 269.0) [2024-06-15 20:13:55,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 20:13:56,123][1652475] Updated weights for policy 0, policy_version 662851 (0.0012) [2024-06-15 20:13:57,876][1652475] Updated weights for policy 0, policy_version 662913 (0.0017) [2024-06-15 20:13:59,240][1652475] Updated weights for policy 0, policy_version 662973 (0.0015) [2024-06-15 20:14:00,738][1648984] Fps is (10 sec: 49153.6, 60 sec: 44236.8, 300 sec: 43542.6). Total num frames: 1357774848. Throughput: 0: 10865.8. Samples: 339505664. Policy #0 lag: (min: 13.0, avg: 94.8, max: 269.0) [2024-06-15 20:14:00,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 20:14:03,062][1652475] Updated weights for policy 0, policy_version 663024 (0.0029) [2024-06-15 20:14:05,738][1648984] Fps is (10 sec: 45875.1, 60 sec: 42598.4, 300 sec: 43764.7). Total num frames: 1357971456. Throughput: 0: 10911.3. Samples: 339547648. 
Policy #0 lag: (min: 4.0, avg: 105.4, max: 260.0) [2024-06-15 20:14:05,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 20:14:05,870][1652475] Updated weights for policy 0, policy_version 663074 (0.0011) [2024-06-15 20:14:07,777][1652475] Updated weights for policy 0, policy_version 663110 (0.0032) [2024-06-15 20:14:09,828][1652475] Updated weights for policy 0, policy_version 663190 (0.0015) [2024-06-15 20:14:10,736][1652475] Updated weights for policy 0, policy_version 663232 (0.0012) [2024-06-15 20:14:10,738][1648984] Fps is (10 sec: 52428.2, 60 sec: 45875.2, 300 sec: 43653.6). Total num frames: 1358299136. Throughput: 0: 10797.5. Samples: 339611136. Policy #0 lag: (min: 4.0, avg: 105.4, max: 260.0) [2024-06-15 20:14:10,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 20:14:14,499][1652475] Updated weights for policy 0, policy_version 663290 (0.0038) [2024-06-15 20:14:15,738][1648984] Fps is (10 sec: 45874.9, 60 sec: 42052.2, 300 sec: 43542.6). Total num frames: 1358430208. Throughput: 0: 11127.4. Samples: 339683840. Policy #0 lag: (min: 4.0, avg: 105.4, max: 260.0) [2024-06-15 20:14:15,739][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 20:14:18,087][1652475] Updated weights for policy 0, policy_version 663356 (0.0098) [2024-06-15 20:14:20,624][1652475] Updated weights for policy 0, policy_version 663408 (0.0013) [2024-06-15 20:14:20,738][1648984] Fps is (10 sec: 36044.7, 60 sec: 45329.1, 300 sec: 43653.6). Total num frames: 1358659584. Throughput: 0: 11036.4. Samples: 339713024. Policy #0 lag: (min: 4.0, avg: 105.4, max: 260.0) [2024-06-15 20:14:20,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 20:14:22,366][1652475] Updated weights for policy 0, policy_version 663479 (0.0013) [2024-06-15 20:14:25,738][1648984] Fps is (10 sec: 42598.5, 60 sec: 43144.5, 300 sec: 43653.6). Total num frames: 1358856192. Throughput: 0: 11173.0. Samples: 339777536. 
Policy #0 lag: (min: 4.0, avg: 105.4, max: 260.0) [2024-06-15 20:14:25,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 20:14:26,195][1651340] Signal inference workers to stop experience collection... (34100 times) [2024-06-15 20:14:26,330][1652475] InferenceWorker_p0-w0: stopping experience collection (34100 times) [2024-06-15 20:14:26,440][1651340] Signal inference workers to resume experience collection... (34100 times) [2024-06-15 20:14:26,441][1652475] InferenceWorker_p0-w0: resuming experience collection (34100 times) [2024-06-15 20:14:29,611][1652475] Updated weights for policy 0, policy_version 663569 (0.0141) [2024-06-15 20:14:30,738][1648984] Fps is (10 sec: 42598.8, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 1359085568. Throughput: 0: 11174.2. Samples: 339840512. Policy #0 lag: (min: 4.0, avg: 105.4, max: 260.0) [2024-06-15 20:14:30,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 20:14:31,681][1652475] Updated weights for policy 0, policy_version 663632 (0.0050) [2024-06-15 20:14:32,847][1652475] Updated weights for policy 0, policy_version 663680 (0.0028) [2024-06-15 20:14:34,282][1652475] Updated weights for policy 0, policy_version 663743 (0.0022) [2024-06-15 20:14:35,738][1648984] Fps is (10 sec: 49152.5, 60 sec: 45875.2, 300 sec: 43542.6). Total num frames: 1359347712. Throughput: 0: 11275.5. Samples: 339876352. Policy #0 lag: (min: 4.0, avg: 105.4, max: 260.0) [2024-06-15 20:14:35,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 20:14:38,734][1652475] Updated weights for policy 0, policy_version 663798 (0.0014) [2024-06-15 20:14:40,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 43690.6, 300 sec: 43431.5). Total num frames: 1359478784. Throughput: 0: 11104.7. Samples: 339943424. 
Policy #0 lag: (min: 4.0, avg: 105.4, max: 260.0) [2024-06-15 20:14:40,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 20:14:41,552][1652475] Updated weights for policy 0, policy_version 663856 (0.0012) [2024-06-15 20:14:43,550][1652475] Updated weights for policy 0, policy_version 663890 (0.0033) [2024-06-15 20:14:44,641][1652475] Updated weights for policy 0, policy_version 663937 (0.0012) [2024-06-15 20:14:45,738][1648984] Fps is (10 sec: 49151.4, 60 sec: 45332.6, 300 sec: 43875.8). Total num frames: 1359839232. Throughput: 0: 11252.6. Samples: 340012032. Policy #0 lag: (min: 4.0, avg: 105.4, max: 260.0) [2024-06-15 20:14:45,738][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 20:14:49,324][1652475] Updated weights for policy 0, policy_version 664020 (0.0036) [2024-06-15 20:14:50,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 45329.2, 300 sec: 43764.7). Total num frames: 1360003072. Throughput: 0: 11081.9. Samples: 340046336. Policy #0 lag: (min: 4.0, avg: 105.4, max: 260.0) [2024-06-15 20:14:50,738][1648984] Avg episode reward: [(0, '-0.220')] [2024-06-15 20:14:52,049][1652475] Updated weights for policy 0, policy_version 664080 (0.0011) [2024-06-15 20:14:55,029][1652475] Updated weights for policy 0, policy_version 664130 (0.0011) [2024-06-15 20:14:55,738][1648984] Fps is (10 sec: 36045.0, 60 sec: 44782.9, 300 sec: 43653.6). Total num frames: 1360199680. Throughput: 0: 11127.5. Samples: 340111872. Policy #0 lag: (min: 4.0, avg: 105.4, max: 260.0) [2024-06-15 20:14:55,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 20:14:56,087][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000664176_1360232448.pth... 
[2024-06-15 20:14:56,122][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000659008_1349648384.pth [2024-06-15 20:14:59,224][1652475] Updated weights for policy 0, policy_version 664208 (0.0013) [2024-06-15 20:15:00,738][1648984] Fps is (10 sec: 39321.1, 60 sec: 43690.5, 300 sec: 43542.6). Total num frames: 1360396288. Throughput: 0: 10774.7. Samples: 340168704. Policy #0 lag: (min: 4.0, avg: 105.4, max: 260.0) [2024-06-15 20:15:00,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 20:15:01,316][1652475] Updated weights for policy 0, policy_version 664272 (0.0014) [2024-06-15 20:15:03,444][1652475] Updated weights for policy 0, policy_version 664321 (0.0013) [2024-06-15 20:15:04,572][1652475] Updated weights for policy 0, policy_version 664378 (0.0011) [2024-06-15 20:15:05,738][1648984] Fps is (10 sec: 45875.5, 60 sec: 44783.0, 300 sec: 43542.6). Total num frames: 1360658432. Throughput: 0: 10774.8. Samples: 340197888. Policy #0 lag: (min: 4.0, avg: 105.4, max: 260.0) [2024-06-15 20:15:05,738][1648984] Avg episode reward: [(0, '-0.230')] [2024-06-15 20:15:09,721][1652475] Updated weights for policy 0, policy_version 664439 (0.0012) [2024-06-15 20:15:10,738][1648984] Fps is (10 sec: 39322.0, 60 sec: 41506.1, 300 sec: 43098.3). Total num frames: 1360789504. Throughput: 0: 10808.9. Samples: 340263936. Policy #0 lag: (min: 4.0, avg: 105.4, max: 260.0) [2024-06-15 20:15:10,739][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 20:15:12,987][1652475] Updated weights for policy 0, policy_version 664480 (0.0137) [2024-06-15 20:15:14,183][1651340] Signal inference workers to stop experience collection... (34150 times) [2024-06-15 20:15:14,276][1652475] InferenceWorker_p0-w0: stopping experience collection (34150 times) [2024-06-15 20:15:14,393][1651340] Signal inference workers to resume experience collection... 
(34150 times) [2024-06-15 20:15:14,394][1652475] InferenceWorker_p0-w0: resuming experience collection (34150 times) [2024-06-15 20:15:15,416][1652475] Updated weights for policy 0, policy_version 664582 (0.0012) [2024-06-15 20:15:15,738][1648984] Fps is (10 sec: 42598.2, 60 sec: 44236.8, 300 sec: 43653.7). Total num frames: 1361084416. Throughput: 0: 10638.2. Samples: 340319232. Policy #0 lag: (min: 4.0, avg: 105.4, max: 260.0) [2024-06-15 20:15:15,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 20:15:20,738][1648984] Fps is (10 sec: 39322.2, 60 sec: 42052.4, 300 sec: 42987.2). Total num frames: 1361182720. Throughput: 0: 10490.3. Samples: 340348416. Policy #0 lag: (min: 4.0, avg: 105.4, max: 260.0) [2024-06-15 20:15:20,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 20:15:23,630][1652475] Updated weights for policy 0, policy_version 664656 (0.0013) [2024-06-15 20:15:25,334][1652475] Updated weights for policy 0, policy_version 664736 (0.0013) [2024-06-15 20:15:25,738][1648984] Fps is (10 sec: 32767.5, 60 sec: 42598.3, 300 sec: 43209.3). Total num frames: 1361412096. Throughput: 0: 10558.5. Samples: 340418560. Policy #0 lag: (min: 4.0, avg: 105.4, max: 260.0) [2024-06-15 20:15:25,739][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 20:15:27,403][1652475] Updated weights for policy 0, policy_version 664816 (0.0084) [2024-06-15 20:15:28,881][1652475] Updated weights for policy 0, policy_version 664865 (0.0013) [2024-06-15 20:15:30,738][1648984] Fps is (10 sec: 52427.1, 60 sec: 43690.5, 300 sec: 43875.8). Total num frames: 1361707008. Throughput: 0: 10251.3. Samples: 340473344. Policy #0 lag: (min: 4.0, avg: 105.4, max: 260.0) [2024-06-15 20:15:30,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 20:15:35,738][1648984] Fps is (10 sec: 29491.6, 60 sec: 39321.6, 300 sec: 42654.0). Total num frames: 1361707008. Throughput: 0: 10331.0. Samples: 340511232. 
Policy #0 lag: (min: 4.0, avg: 105.4, max: 260.0) [2024-06-15 20:15:35,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 20:15:36,714][1652475] Updated weights for policy 0, policy_version 664912 (0.0137) [2024-06-15 20:15:38,310][1652475] Updated weights for policy 0, policy_version 664978 (0.0012) [2024-06-15 20:15:40,380][1652475] Updated weights for policy 0, policy_version 665057 (0.0013) [2024-06-15 20:15:40,738][1648984] Fps is (10 sec: 36045.6, 60 sec: 43144.5, 300 sec: 43431.5). Total num frames: 1362067456. Throughput: 0: 10262.8. Samples: 340573696. Policy #0 lag: (min: 4.0, avg: 105.4, max: 260.0) [2024-06-15 20:15:40,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 20:15:42,424][1652475] Updated weights for policy 0, policy_version 665150 (0.0029) [2024-06-15 20:15:45,738][1648984] Fps is (10 sec: 52427.5, 60 sec: 39867.6, 300 sec: 42876.1). Total num frames: 1362231296. Throughput: 0: 10217.2. Samples: 340628480. Policy #0 lag: (min: 4.0, avg: 105.4, max: 260.0) [2024-06-15 20:15:45,739][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 20:15:50,702][1652475] Updated weights for policy 0, policy_version 665223 (0.0011) [2024-06-15 20:15:50,738][1648984] Fps is (10 sec: 29491.1, 60 sec: 39321.6, 300 sec: 43098.3). Total num frames: 1362362368. Throughput: 0: 10444.8. Samples: 340667904. Policy #0 lag: (min: 15.0, avg: 63.4, max: 271.0) [2024-06-15 20:15:50,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 20:15:51,963][1652475] Updated weights for policy 0, policy_version 665283 (0.0013) [2024-06-15 20:15:54,820][1652475] Updated weights for policy 0, policy_version 665360 (0.0015) [2024-06-15 20:15:55,738][1648984] Fps is (10 sec: 49152.8, 60 sec: 42052.2, 300 sec: 43431.5). Total num frames: 1362722816. Throughput: 0: 10171.7. Samples: 340721664. 
Policy #0 lag: (min: 15.0, avg: 63.4, max: 271.0) [2024-06-15 20:15:55,754][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 20:16:00,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 39321.7, 300 sec: 42653.9). Total num frames: 1362755584. Throughput: 0: 10422.0. Samples: 340788224. Policy #0 lag: (min: 15.0, avg: 63.4, max: 271.0) [2024-06-15 20:16:00,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 20:16:01,263][1651340] Signal inference workers to stop experience collection... (34200 times) [2024-06-15 20:16:01,300][1652475] InferenceWorker_p0-w0: stopping experience collection (34200 times) [2024-06-15 20:16:01,584][1651340] Signal inference workers to resume experience collection... (34200 times) [2024-06-15 20:16:01,585][1652475] InferenceWorker_p0-w0: resuming experience collection (34200 times) [2024-06-15 20:16:01,587][1652475] Updated weights for policy 0, policy_version 665440 (0.0012) [2024-06-15 20:16:03,741][1652475] Updated weights for policy 0, policy_version 665536 (0.0019) [2024-06-15 20:16:05,738][1648984] Fps is (10 sec: 32768.4, 60 sec: 39867.7, 300 sec: 43320.4). Total num frames: 1363050496. Throughput: 0: 10274.1. Samples: 340810752. Policy #0 lag: (min: 15.0, avg: 63.4, max: 271.0) [2024-06-15 20:16:05,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 20:16:07,391][1652475] Updated weights for policy 0, policy_version 665601 (0.0012) [2024-06-15 20:16:08,483][1652475] Updated weights for policy 0, policy_version 665651 (0.0058) [2024-06-15 20:16:10,738][1648984] Fps is (10 sec: 52429.1, 60 sec: 41506.2, 300 sec: 42765.0). Total num frames: 1363279872. Throughput: 0: 10171.8. Samples: 340876288. 
Policy #0 lag: (min: 15.0, avg: 63.4, max: 271.0) [2024-06-15 20:16:10,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 20:16:13,736][1652475] Updated weights for policy 0, policy_version 665700 (0.0012) [2024-06-15 20:16:15,533][1652475] Updated weights for policy 0, policy_version 665776 (0.0013) [2024-06-15 20:16:15,749][1648984] Fps is (10 sec: 45824.8, 60 sec: 40406.5, 300 sec: 43429.9). Total num frames: 1363509248. Throughput: 0: 10442.3. Samples: 340943360. Policy #0 lag: (min: 15.0, avg: 63.4, max: 271.0) [2024-06-15 20:16:15,749][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 20:16:19,664][1652475] Updated weights for policy 0, policy_version 665856 (0.0013) [2024-06-15 20:16:20,738][1648984] Fps is (10 sec: 49151.8, 60 sec: 43144.4, 300 sec: 43431.5). Total num frames: 1363771392. Throughput: 0: 10387.9. Samples: 340978688. Policy #0 lag: (min: 15.0, avg: 63.4, max: 271.0) [2024-06-15 20:16:20,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 20:16:20,881][1652475] Updated weights for policy 0, policy_version 665915 (0.0011) [2024-06-15 20:16:25,425][1652475] Updated weights for policy 0, policy_version 665968 (0.0011) [2024-06-15 20:16:25,738][1648984] Fps is (10 sec: 39364.8, 60 sec: 41506.2, 300 sec: 42987.2). Total num frames: 1363902464. Throughput: 0: 10513.1. Samples: 341046784. Policy #0 lag: (min: 15.0, avg: 63.4, max: 271.0) [2024-06-15 20:16:25,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 20:16:27,256][1652475] Updated weights for policy 0, policy_version 666043 (0.0076) [2024-06-15 20:16:30,738][1648984] Fps is (10 sec: 32768.1, 60 sec: 39867.9, 300 sec: 42987.2). Total num frames: 1364099072. Throughput: 0: 10763.4. Samples: 341112832. 
Policy #0 lag: (min: 15.0, avg: 63.4, max: 271.0) [2024-06-15 20:16:30,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 20:16:31,732][1652475] Updated weights for policy 0, policy_version 666114 (0.0106) [2024-06-15 20:16:35,738][1648984] Fps is (10 sec: 42597.4, 60 sec: 43690.5, 300 sec: 43209.3). Total num frames: 1364328448. Throughput: 0: 10399.2. Samples: 341135872. Policy #0 lag: (min: 15.0, avg: 63.4, max: 271.0) [2024-06-15 20:16:35,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 20:16:36,325][1652475] Updated weights for policy 0, policy_version 666180 (0.0155) [2024-06-15 20:16:38,239][1652475] Updated weights for policy 0, policy_version 666256 (0.0013) [2024-06-15 20:16:40,738][1648984] Fps is (10 sec: 49151.8, 60 sec: 42052.3, 300 sec: 43098.3). Total num frames: 1364590592. Throughput: 0: 10661.0. Samples: 341201408. Policy #0 lag: (min: 15.0, avg: 63.4, max: 271.0) [2024-06-15 20:16:40,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 20:16:42,700][1652475] Updated weights for policy 0, policy_version 666320 (0.0013) [2024-06-15 20:16:44,050][1651340] Signal inference workers to stop experience collection... (34250 times) [2024-06-15 20:16:44,098][1652475] InferenceWorker_p0-w0: stopping experience collection (34250 times) [2024-06-15 20:16:44,350][1651340] Signal inference workers to resume experience collection... (34250 times) [2024-06-15 20:16:44,351][1652475] InferenceWorker_p0-w0: resuming experience collection (34250 times) [2024-06-15 20:16:44,353][1652475] Updated weights for policy 0, policy_version 666384 (0.0014) [2024-06-15 20:16:45,750][1648984] Fps is (10 sec: 52364.6, 60 sec: 43681.8, 300 sec: 43540.7). Total num frames: 1364852736. Throughput: 0: 10578.4. Samples: 341264384. 
Policy #0 lag: (min: 15.0, avg: 63.4, max: 271.0) [2024-06-15 20:16:45,751][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 20:16:49,603][1652475] Updated weights for policy 0, policy_version 666492 (0.0021) [2024-06-15 20:16:50,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 44236.8, 300 sec: 43209.3). Total num frames: 1365016576. Throughput: 0: 10979.5. Samples: 341304832. Policy #0 lag: (min: 15.0, avg: 63.4, max: 271.0) [2024-06-15 20:16:50,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 20:16:51,503][1652475] Updated weights for policy 0, policy_version 666560 (0.0013) [2024-06-15 20:16:55,738][1648984] Fps is (10 sec: 32807.8, 60 sec: 40959.8, 300 sec: 42876.0). Total num frames: 1365180416. Throughput: 0: 11002.2. Samples: 341371392. Policy #0 lag: (min: 15.0, avg: 63.4, max: 271.0) [2024-06-15 20:16:55,739][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 20:16:56,237][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000666624_1365245952.pth... [2024-06-15 20:16:56,376][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000661600_1354956800.pth [2024-06-15 20:16:57,244][1652475] Updated weights for policy 0, policy_version 666658 (0.0013) [2024-06-15 20:17:00,749][1648984] Fps is (10 sec: 36004.5, 60 sec: 43682.5, 300 sec: 42763.4). Total num frames: 1365377024. Throughput: 0: 10763.3. Samples: 341427712. Policy #0 lag: (min: 15.0, avg: 63.4, max: 271.0) [2024-06-15 20:17:00,750][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 20:17:02,159][1652475] Updated weights for policy 0, policy_version 666746 (0.0012) [2024-06-15 20:17:03,390][1652475] Updated weights for policy 0, policy_version 666784 (0.0017) [2024-06-15 20:17:05,738][1648984] Fps is (10 sec: 45875.7, 60 sec: 43144.3, 300 sec: 42876.1). Total num frames: 1365639168. Throughput: 0: 10660.9. Samples: 341458432. 
Policy #0 lag: (min: 15.0, avg: 63.4, max: 271.0) [2024-06-15 20:17:05,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 20:17:07,035][1652475] Updated weights for policy 0, policy_version 666818 (0.0025) [2024-06-15 20:17:09,214][1652475] Updated weights for policy 0, policy_version 666912 (0.0012) [2024-06-15 20:17:10,738][1648984] Fps is (10 sec: 52486.6, 60 sec: 43690.5, 300 sec: 43209.3). Total num frames: 1365901312. Throughput: 0: 10513.0. Samples: 341519872. Policy #0 lag: (min: 15.0, avg: 63.4, max: 271.0) [2024-06-15 20:17:10,739][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 20:17:15,045][1652475] Updated weights for policy 0, policy_version 666978 (0.0012) [2024-06-15 20:17:15,738][1648984] Fps is (10 sec: 39322.0, 60 sec: 42059.8, 300 sec: 42653.9). Total num frames: 1366032384. Throughput: 0: 10581.3. Samples: 341588992. Policy #0 lag: (min: 15.0, avg: 63.4, max: 271.0) [2024-06-15 20:17:15,739][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 20:17:17,060][1652475] Updated weights for policy 0, policy_version 667056 (0.0013) [2024-06-15 20:17:19,232][1652475] Updated weights for policy 0, policy_version 667076 (0.0015) [2024-06-15 20:17:20,738][1648984] Fps is (10 sec: 39322.5, 60 sec: 42052.3, 300 sec: 43098.3). Total num frames: 1366294528. Throughput: 0: 10626.9. Samples: 341614080. Policy #0 lag: (min: 15.0, avg: 63.4, max: 271.0) [2024-06-15 20:17:20,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 20:17:20,978][1652475] Updated weights for policy 0, policy_version 667152 (0.0181) [2024-06-15 20:17:25,738][1648984] Fps is (10 sec: 39322.3, 60 sec: 42052.3, 300 sec: 42431.8). Total num frames: 1366425600. Throughput: 0: 10729.3. Samples: 341684224. 
Policy #0 lag: (min: 15.0, avg: 63.4, max: 271.0) [2024-06-15 20:17:25,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 20:17:25,863][1652475] Updated weights for policy 0, policy_version 667203 (0.0014) [2024-06-15 20:17:28,426][1652475] Updated weights for policy 0, policy_version 667266 (0.0025) [2024-06-15 20:17:30,612][1652475] Updated weights for policy 0, policy_version 667330 (0.0014) [2024-06-15 20:17:30,738][1648984] Fps is (10 sec: 39320.3, 60 sec: 43144.3, 300 sec: 42765.0). Total num frames: 1366687744. Throughput: 0: 10732.1. Samples: 341747200. Policy #0 lag: (min: 14.0, avg: 135.0, max: 334.0) [2024-06-15 20:17:30,739][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 20:17:30,950][1651340] Signal inference workers to stop experience collection... (34300 times) [2024-06-15 20:17:31,000][1652475] InferenceWorker_p0-w0: stopping experience collection (34300 times) [2024-06-15 20:17:31,137][1651340] Signal inference workers to resume experience collection... (34300 times) [2024-06-15 20:17:31,143][1652475] InferenceWorker_p0-w0: resuming experience collection (34300 times) [2024-06-15 20:17:31,842][1652475] Updated weights for policy 0, policy_version 667392 (0.0012) [2024-06-15 20:17:35,738][1648984] Fps is (10 sec: 52429.1, 60 sec: 43690.9, 300 sec: 42654.0). Total num frames: 1366949888. Throughput: 0: 10479.0. Samples: 341776384. Policy #0 lag: (min: 14.0, avg: 135.0, max: 334.0) [2024-06-15 20:17:35,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 20:17:39,481][1652475] Updated weights for policy 0, policy_version 667472 (0.0019) [2024-06-15 20:17:40,738][1648984] Fps is (10 sec: 36046.0, 60 sec: 40960.0, 300 sec: 42542.9). Total num frames: 1367048192. Throughput: 0: 10672.4. Samples: 341851648. 
Policy #0 lag: (min: 14.0, avg: 135.0, max: 334.0) [2024-06-15 20:17:40,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 20:17:40,983][1652475] Updated weights for policy 0, policy_version 667525 (0.0075) [2024-06-15 20:17:42,311][1652475] Updated weights for policy 0, policy_version 667589 (0.0013) [2024-06-15 20:17:43,609][1652475] Updated weights for policy 0, policy_version 667644 (0.0013) [2024-06-15 20:17:45,738][1648984] Fps is (10 sec: 52428.5, 60 sec: 43699.8, 300 sec: 43209.3). Total num frames: 1367474176. Throughput: 0: 10572.6. Samples: 341903360. Policy #0 lag: (min: 14.0, avg: 135.0, max: 334.0) [2024-06-15 20:17:45,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 20:17:50,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 40960.0, 300 sec: 42654.0). Total num frames: 1367474176. Throughput: 0: 10774.8. Samples: 341943296. Policy #0 lag: (min: 14.0, avg: 135.0, max: 334.0) [2024-06-15 20:17:50,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 20:17:52,371][1652475] Updated weights for policy 0, policy_version 667714 (0.0012) [2024-06-15 20:17:53,570][1652475] Updated weights for policy 0, policy_version 667776 (0.0012) [2024-06-15 20:17:55,571][1652475] Updated weights for policy 0, policy_version 667872 (0.0011) [2024-06-15 20:17:55,738][1648984] Fps is (10 sec: 32768.0, 60 sec: 43690.9, 300 sec: 42987.2). Total num frames: 1367801856. Throughput: 0: 10922.7. Samples: 342011392. Policy #0 lag: (min: 14.0, avg: 135.0, max: 334.0) [2024-06-15 20:17:55,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 20:17:56,959][1652475] Updated weights for policy 0, policy_version 667936 (0.0015) [2024-06-15 20:18:00,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 43698.8, 300 sec: 42653.9). Total num frames: 1367998464. Throughput: 0: 10820.3. Samples: 342075904. 
Policy #0 lag: (min: 14.0, avg: 135.0, max: 334.0) [2024-06-15 20:18:00,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 20:18:04,625][1652475] Updated weights for policy 0, policy_version 667985 (0.0014) [2024-06-15 20:18:05,738][1648984] Fps is (10 sec: 32766.8, 60 sec: 41506.1, 300 sec: 42653.9). Total num frames: 1368129536. Throughput: 0: 11081.9. Samples: 342112768. Policy #0 lag: (min: 14.0, avg: 135.0, max: 334.0) [2024-06-15 20:18:05,739][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 20:18:06,785][1652475] Updated weights for policy 0, policy_version 668081 (0.0012) [2024-06-15 20:18:08,015][1652475] Updated weights for policy 0, policy_version 668144 (0.0012) [2024-06-15 20:18:09,589][1652475] Updated weights for policy 0, policy_version 668208 (0.0020) [2024-06-15 20:18:10,738][1648984] Fps is (10 sec: 52428.1, 60 sec: 43690.7, 300 sec: 42765.0). Total num frames: 1368522752. Throughput: 0: 10683.7. Samples: 342164992. Policy #0 lag: (min: 14.0, avg: 135.0, max: 334.0) [2024-06-15 20:18:10,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 20:18:15,738][1648984] Fps is (10 sec: 39322.8, 60 sec: 41506.2, 300 sec: 42653.9). Total num frames: 1368522752. Throughput: 0: 10877.2. Samples: 342236672. Policy #0 lag: (min: 14.0, avg: 135.0, max: 334.0) [2024-06-15 20:18:15,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 20:18:15,876][1651340] Signal inference workers to stop experience collection... (34350 times) [2024-06-15 20:18:15,940][1652475] InferenceWorker_p0-w0: stopping experience collection (34350 times) [2024-06-15 20:18:16,059][1651340] Signal inference workers to resume experience collection... 
(34350 times) [2024-06-15 20:18:16,060][1652475] InferenceWorker_p0-w0: resuming experience collection (34350 times) [2024-06-15 20:18:16,247][1652475] Updated weights for policy 0, policy_version 668262 (0.0021) [2024-06-15 20:18:18,440][1652475] Updated weights for policy 0, policy_version 668351 (0.0034) [2024-06-15 20:18:20,348][1652475] Updated weights for policy 0, policy_version 668401 (0.0011) [2024-06-15 20:18:20,738][1648984] Fps is (10 sec: 39322.4, 60 sec: 43690.7, 300 sec: 42876.1). Total num frames: 1368915968. Throughput: 0: 10911.3. Samples: 342267392. Policy #0 lag: (min: 14.0, avg: 135.0, max: 334.0) [2024-06-15 20:18:20,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 20:18:22,140][1652475] Updated weights for policy 0, policy_version 668473 (0.0106) [2024-06-15 20:18:25,750][1648984] Fps is (10 sec: 52428.6, 60 sec: 43690.6, 300 sec: 42653.9). Total num frames: 1369047040. Throughput: 0: 10615.5. Samples: 342329344. Policy #0 lag: (min: 14.0, avg: 135.0, max: 334.0) [2024-06-15 20:18:25,751][1648984] Avg episode reward: [(0, '-0.230')] [2024-06-15 20:18:27,725][1652475] Updated weights for policy 0, policy_version 668541 (0.0029) [2024-06-15 20:18:29,737][1652475] Updated weights for policy 0, policy_version 668592 (0.0013) [2024-06-15 20:18:30,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 43690.9, 300 sec: 43098.3). Total num frames: 1369309184. Throughput: 0: 11002.3. Samples: 342398464. Policy #0 lag: (min: 14.0, avg: 135.0, max: 334.0) [2024-06-15 20:18:30,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 20:18:31,875][1652475] Updated weights for policy 0, policy_version 668626 (0.0044) [2024-06-15 20:18:33,277][1652475] Updated weights for policy 0, policy_version 668673 (0.0036) [2024-06-15 20:18:34,709][1652475] Updated weights for policy 0, policy_version 668726 (0.0012) [2024-06-15 20:18:35,738][1648984] Fps is (10 sec: 52429.4, 60 sec: 43690.6, 300 sec: 43098.3). Total num frames: 1369571328. 
Throughput: 0: 10877.2. Samples: 342432768. Policy #0 lag: (min: 14.0, avg: 135.0, max: 334.0) [2024-06-15 20:18:35,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 20:18:38,352][1652475] Updated weights for policy 0, policy_version 668770 (0.0012) [2024-06-15 20:18:40,738][1648984] Fps is (10 sec: 49151.6, 60 sec: 45875.2, 300 sec: 42987.9). Total num frames: 1369800704. Throughput: 0: 10911.3. Samples: 342502400. Policy #0 lag: (min: 14.0, avg: 135.0, max: 334.0) [2024-06-15 20:18:40,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 20:18:40,895][1652475] Updated weights for policy 0, policy_version 668864 (0.0013) [2024-06-15 20:18:44,133][1652475] Updated weights for policy 0, policy_version 668924 (0.0011) [2024-06-15 20:18:45,738][1648984] Fps is (10 sec: 42598.0, 60 sec: 42052.2, 300 sec: 43098.3). Total num frames: 1369997312. Throughput: 0: 11002.3. Samples: 342571008. Policy #0 lag: (min: 14.0, avg: 135.0, max: 334.0) [2024-06-15 20:18:45,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 20:18:46,183][1652475] Updated weights for policy 0, policy_version 668962 (0.0011) [2024-06-15 20:18:49,403][1652475] Updated weights for policy 0, policy_version 669024 (0.0010) [2024-06-15 20:18:50,738][1648984] Fps is (10 sec: 42598.7, 60 sec: 45875.2, 300 sec: 43098.2). Total num frames: 1370226688. Throughput: 0: 11013.8. Samples: 342608384. Policy #0 lag: (min: 14.0, avg: 135.0, max: 334.0) [2024-06-15 20:18:50,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 20:18:52,189][1652475] Updated weights for policy 0, policy_version 669104 (0.0012) [2024-06-15 20:18:55,317][1652475] Updated weights for policy 0, policy_version 669173 (0.0013) [2024-06-15 20:18:55,738][1648984] Fps is (10 sec: 49149.6, 60 sec: 44782.5, 300 sec: 43098.2). Total num frames: 1370488832. Throughput: 0: 11320.8. Samples: 342674432. 
Policy #0 lag: (min: 14.0, avg: 135.0, max: 334.0) [2024-06-15 20:18:55,739][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 20:18:55,744][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000669184_1370488832.pth... [2024-06-15 20:18:55,812][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000664176_1360232448.pth [2024-06-15 20:18:57,422][1652475] Updated weights for policy 0, policy_version 669200 (0.0011) [2024-06-15 20:18:58,780][1652475] Updated weights for policy 0, policy_version 669248 (0.0013) [2024-06-15 20:19:00,738][1648984] Fps is (10 sec: 39321.3, 60 sec: 43690.7, 300 sec: 42876.1). Total num frames: 1370619904. Throughput: 0: 11184.4. Samples: 342739968. Policy #0 lag: (min: 14.0, avg: 135.0, max: 334.0) [2024-06-15 20:19:00,739][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 20:19:01,269][1651340] Signal inference workers to stop experience collection... (34400 times) [2024-06-15 20:19:01,313][1652475] InferenceWorker_p0-w0: stopping experience collection (34400 times) [2024-06-15 20:19:01,461][1651340] Signal inference workers to resume experience collection... (34400 times) [2024-06-15 20:19:01,461][1652475] InferenceWorker_p0-w0: resuming experience collection (34400 times) [2024-06-15 20:19:03,686][1652475] Updated weights for policy 0, policy_version 669329 (0.0012) [2024-06-15 20:19:05,584][1652475] Updated weights for policy 0, policy_version 669392 (0.0011) [2024-06-15 20:19:05,738][1648984] Fps is (10 sec: 42601.1, 60 sec: 46421.7, 300 sec: 42765.0). Total num frames: 1370914816. Throughput: 0: 11173.0. Samples: 342770176. 
Policy #0 lag: (min: 14.0, avg: 135.0, max: 334.0) [2024-06-15 20:19:05,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 20:19:09,066][1652475] Updated weights for policy 0, policy_version 669441 (0.0011) [2024-06-15 20:19:10,712][1652475] Updated weights for policy 0, policy_version 669504 (0.0035) [2024-06-15 20:19:10,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 43690.8, 300 sec: 43098.3). Total num frames: 1371144192. Throughput: 0: 11525.7. Samples: 342848000. Policy #0 lag: (min: 15.0, avg: 120.8, max: 271.0) [2024-06-15 20:19:10,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 20:19:13,179][1652475] Updated weights for policy 0, policy_version 669565 (0.0114) [2024-06-15 20:19:15,291][1652475] Updated weights for policy 0, policy_version 669603 (0.0021) [2024-06-15 20:19:15,738][1648984] Fps is (10 sec: 49151.7, 60 sec: 48059.8, 300 sec: 43209.3). Total num frames: 1371406336. Throughput: 0: 11309.5. Samples: 342907392. Policy #0 lag: (min: 15.0, avg: 120.8, max: 271.0) [2024-06-15 20:19:15,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 20:19:17,399][1652475] Updated weights for policy 0, policy_version 669664 (0.0014) [2024-06-15 20:19:20,738][1648984] Fps is (10 sec: 39321.0, 60 sec: 43690.5, 300 sec: 42987.2). Total num frames: 1371537408. Throughput: 0: 11264.0. Samples: 342939648. Policy #0 lag: (min: 15.0, avg: 120.8, max: 271.0) [2024-06-15 20:19:20,739][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 20:19:21,894][1652475] Updated weights for policy 0, policy_version 669744 (0.0013) [2024-06-15 20:19:24,250][1652475] Updated weights for policy 0, policy_version 669792 (0.0105) [2024-06-15 20:19:25,738][1648984] Fps is (10 sec: 39321.3, 60 sec: 45875.2, 300 sec: 43098.2). Total num frames: 1371799552. Throughput: 0: 11377.8. Samples: 343014400. 
Policy #0 lag: (min: 15.0, avg: 120.8, max: 271.0) [2024-06-15 20:19:25,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 20:19:26,229][1652475] Updated weights for policy 0, policy_version 669856 (0.0011) [2024-06-15 20:19:28,608][1652475] Updated weights for policy 0, policy_version 669920 (0.0013) [2024-06-15 20:19:30,738][1648984] Fps is (10 sec: 52428.1, 60 sec: 45874.9, 300 sec: 43098.2). Total num frames: 1372061696. Throughput: 0: 11343.6. Samples: 343081472. Policy #0 lag: (min: 15.0, avg: 120.8, max: 271.0) [2024-06-15 20:19:30,739][1648984] Avg episode reward: [(0, '-0.250')] [2024-06-15 20:19:32,752][1652475] Updated weights for policy 0, policy_version 669984 (0.0011) [2024-06-15 20:19:35,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 43690.6, 300 sec: 43098.3). Total num frames: 1372192768. Throughput: 0: 11275.4. Samples: 343115776. Policy #0 lag: (min: 15.0, avg: 120.8, max: 271.0) [2024-06-15 20:19:35,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 20:19:36,983][1652475] Updated weights for policy 0, policy_version 670039 (0.0012) [2024-06-15 20:19:38,244][1652475] Updated weights for policy 0, policy_version 670103 (0.0011) [2024-06-15 20:19:39,160][1652475] Updated weights for policy 0, policy_version 670147 (0.0076) [2024-06-15 20:19:40,236][1652475] Updated weights for policy 0, policy_version 670208 (0.0036) [2024-06-15 20:19:40,738][1648984] Fps is (10 sec: 52430.0, 60 sec: 46421.3, 300 sec: 43209.3). Total num frames: 1372585984. Throughput: 0: 11321.0. Samples: 343183872. Policy #0 lag: (min: 15.0, avg: 120.8, max: 271.0) [2024-06-15 20:19:40,738][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 20:19:43,983][1652475] Updated weights for policy 0, policy_version 670256 (0.0016) [2024-06-15 20:19:45,738][1648984] Fps is (10 sec: 52427.5, 60 sec: 45328.9, 300 sec: 43098.2). Total num frames: 1372717056. Throughput: 0: 11400.5. Samples: 343252992. 
Policy #0 lag: (min: 15.0, avg: 120.8, max: 271.0) [2024-06-15 20:19:45,739][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 20:19:48,032][1652475] Updated weights for policy 0, policy_version 670275 (0.0013) [2024-06-15 20:19:49,443][1651340] Signal inference workers to stop experience collection... (34450 times) [2024-06-15 20:19:49,530][1652475] InferenceWorker_p0-w0: stopping experience collection (34450 times) [2024-06-15 20:19:49,649][1651340] Signal inference workers to resume experience collection... (34450 times) [2024-06-15 20:19:49,649][1652475] InferenceWorker_p0-w0: resuming experience collection (34450 times) [2024-06-15 20:19:49,651][1652475] Updated weights for policy 0, policy_version 670352 (0.0011) [2024-06-15 20:19:50,738][1648984] Fps is (10 sec: 36045.1, 60 sec: 45329.1, 300 sec: 43209.3). Total num frames: 1372946432. Throughput: 0: 11571.2. Samples: 343290880. Policy #0 lag: (min: 15.0, avg: 120.8, max: 271.0) [2024-06-15 20:19:50,738][1648984] Avg episode reward: [(0, '-0.220')] [2024-06-15 20:19:51,280][1652475] Updated weights for policy 0, policy_version 670417 (0.0012) [2024-06-15 20:19:53,985][1652475] Updated weights for policy 0, policy_version 670466 (0.0013) [2024-06-15 20:19:55,164][1652475] Updated weights for policy 0, policy_version 670521 (0.0013) [2024-06-15 20:19:55,738][1648984] Fps is (10 sec: 52430.2, 60 sec: 45875.6, 300 sec: 43542.6). Total num frames: 1373241344. Throughput: 0: 11184.4. Samples: 343351296. Policy #0 lag: (min: 15.0, avg: 120.8, max: 271.0) [2024-06-15 20:19:55,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 20:20:00,738][1648984] Fps is (10 sec: 32768.0, 60 sec: 44236.8, 300 sec: 42765.0). Total num frames: 1373274112. Throughput: 0: 11559.8. Samples: 343427584. 
Policy #0 lag: (min: 15.0, avg: 120.8, max: 271.0) [2024-06-15 20:20:00,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 20:20:00,795][1652475] Updated weights for policy 0, policy_version 670550 (0.0012) [2024-06-15 20:20:02,790][1652475] Updated weights for policy 0, policy_version 670640 (0.0014) [2024-06-15 20:20:04,558][1652475] Updated weights for policy 0, policy_version 670713 (0.0134) [2024-06-15 20:20:05,738][1648984] Fps is (10 sec: 42598.0, 60 sec: 45875.1, 300 sec: 43653.6). Total num frames: 1373667328. Throughput: 0: 11400.6. Samples: 343452672. Policy #0 lag: (min: 15.0, avg: 120.8, max: 271.0) [2024-06-15 20:20:05,738][1648984] Avg episode reward: [(0, '-0.230')] [2024-06-15 20:20:06,222][1652475] Updated weights for policy 0, policy_version 670756 (0.0014) [2024-06-15 20:20:10,738][1648984] Fps is (10 sec: 49151.6, 60 sec: 43690.6, 300 sec: 42987.2). Total num frames: 1373765632. Throughput: 0: 11161.6. Samples: 343516672. Policy #0 lag: (min: 15.0, avg: 120.8, max: 271.0) [2024-06-15 20:20:10,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 20:20:12,715][1652475] Updated weights for policy 0, policy_version 670788 (0.0013) [2024-06-15 20:20:14,066][1652475] Updated weights for policy 0, policy_version 670852 (0.0013) [2024-06-15 20:20:15,360][1652475] Updated weights for policy 0, policy_version 670908 (0.0017) [2024-06-15 20:20:15,738][1648984] Fps is (10 sec: 36044.9, 60 sec: 43690.6, 300 sec: 43542.5). Total num frames: 1374027776. Throughput: 0: 11116.2. Samples: 343581696. Policy #0 lag: (min: 15.0, avg: 120.8, max: 271.0) [2024-06-15 20:20:15,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 20:20:16,937][1652475] Updated weights for policy 0, policy_version 670962 (0.0025) [2024-06-15 20:20:18,881][1652475] Updated weights for policy 0, policy_version 671035 (0.0011) [2024-06-15 20:20:20,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 45875.3, 300 sec: 43653.7). Total num frames: 1374289920. 
Throughput: 0: 10820.3. Samples: 343602688. Policy #0 lag: (min: 15.0, avg: 120.8, max: 271.0) [2024-06-15 20:20:20,738][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 20:20:25,554][1652475] Updated weights for policy 0, policy_version 671090 (0.0019) [2024-06-15 20:20:25,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 43690.7, 300 sec: 43098.3). Total num frames: 1374420992. Throughput: 0: 10922.7. Samples: 343675392. Policy #0 lag: (min: 15.0, avg: 120.8, max: 271.0) [2024-06-15 20:20:25,738][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 20:20:28,287][1652475] Updated weights for policy 0, policy_version 671138 (0.0014) [2024-06-15 20:20:29,899][1652475] Updated weights for policy 0, policy_version 671202 (0.0018) [2024-06-15 20:20:30,738][1648984] Fps is (10 sec: 39321.9, 60 sec: 43690.9, 300 sec: 43986.9). Total num frames: 1374683136. Throughput: 0: 10626.9. Samples: 343731200. Policy #0 lag: (min: 15.0, avg: 120.8, max: 271.0) [2024-06-15 20:20:30,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 20:20:30,774][1651340] Signal inference workers to stop experience collection... (34500 times) [2024-06-15 20:20:30,844][1652475] InferenceWorker_p0-w0: stopping experience collection (34500 times) [2024-06-15 20:20:31,059][1651340] Signal inference workers to resume experience collection... (34500 times) [2024-06-15 20:20:31,060][1652475] InferenceWorker_p0-w0: resuming experience collection (34500 times) [2024-06-15 20:20:32,252][1652475] Updated weights for policy 0, policy_version 671294 (0.0139) [2024-06-15 20:20:35,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 43690.7, 300 sec: 43209.3). Total num frames: 1374814208. Throughput: 0: 10274.1. Samples: 343753216. 
Policy #0 lag: (min: 15.0, avg: 120.8, max: 271.0) [2024-06-15 20:20:35,738][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 20:20:38,560][1652475] Updated weights for policy 0, policy_version 671360 (0.0015) [2024-06-15 20:20:40,738][1648984] Fps is (10 sec: 29491.2, 60 sec: 39867.8, 300 sec: 43209.4). Total num frames: 1374978048. Throughput: 0: 10342.4. Samples: 343816704. Policy #0 lag: (min: 15.0, avg: 120.8, max: 271.0) [2024-06-15 20:20:40,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 20:20:41,808][1652475] Updated weights for policy 0, policy_version 671418 (0.0011) [2024-06-15 20:20:44,089][1652475] Updated weights for policy 0, policy_version 671472 (0.0013) [2024-06-15 20:20:45,410][1652475] Updated weights for policy 0, policy_version 671524 (0.0013) [2024-06-15 20:20:45,740][1648984] Fps is (10 sec: 49141.5, 60 sec: 43143.2, 300 sec: 43875.5). Total num frames: 1375305728. Throughput: 0: 10023.3. Samples: 343878656. Policy #0 lag: (min: 15.0, avg: 120.8, max: 271.0) [2024-06-15 20:20:45,741][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 20:20:48,916][1652475] Updated weights for policy 0, policy_version 671571 (0.0021) [2024-06-15 20:20:50,738][1648984] Fps is (10 sec: 49151.7, 60 sec: 42052.2, 300 sec: 43209.3). Total num frames: 1375469568. Throughput: 0: 10251.4. Samples: 343913984. Policy #0 lag: (min: 14.0, avg: 125.4, max: 270.0) [2024-06-15 20:20:50,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 20:20:52,678][1652475] Updated weights for policy 0, policy_version 671650 (0.0012) [2024-06-15 20:20:53,270][1652475] Updated weights for policy 0, policy_version 671680 (0.0011) [2024-06-15 20:20:55,271][1652475] Updated weights for policy 0, policy_version 671728 (0.0013) [2024-06-15 20:20:55,738][1648984] Fps is (10 sec: 42605.9, 60 sec: 41505.9, 300 sec: 43986.8). Total num frames: 1375731712. Throughput: 0: 10433.4. Samples: 343986176. 
Policy #0 lag: (min: 14.0, avg: 125.4, max: 270.0) [2024-06-15 20:20:55,739][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 20:20:55,769][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000671744_1375731712.pth... [2024-06-15 20:20:55,842][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000666624_1365245952.pth [2024-06-15 20:20:58,264][1652475] Updated weights for policy 0, policy_version 671792 (0.0011) [2024-06-15 20:21:00,738][1648984] Fps is (10 sec: 45875.8, 60 sec: 44236.8, 300 sec: 43653.6). Total num frames: 1375928320. Throughput: 0: 10422.1. Samples: 344050688. Policy #0 lag: (min: 14.0, avg: 125.4, max: 270.0) [2024-06-15 20:21:00,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 20:21:01,084][1652475] Updated weights for policy 0, policy_version 671860 (0.0114) [2024-06-15 20:21:05,010][1652475] Updated weights for policy 0, policy_version 671934 (0.0194) [2024-06-15 20:21:05,738][1648984] Fps is (10 sec: 39322.8, 60 sec: 40960.0, 300 sec: 43542.6). Total num frames: 1376124928. Throughput: 0: 10672.4. Samples: 344082944. Policy #0 lag: (min: 14.0, avg: 125.4, max: 270.0) [2024-06-15 20:21:05,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 20:21:07,302][1652475] Updated weights for policy 0, policy_version 671993 (0.0020) [2024-06-15 20:21:10,738][1648984] Fps is (10 sec: 39320.4, 60 sec: 42598.3, 300 sec: 43433.1). Total num frames: 1376321536. Throughput: 0: 10422.0. Samples: 344144384. Policy #0 lag: (min: 14.0, avg: 125.4, max: 270.0) [2024-06-15 20:21:10,739][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 20:21:11,076][1652475] Updated weights for policy 0, policy_version 672059 (0.0013) [2024-06-15 20:21:13,208][1652475] Updated weights for policy 0, policy_version 672128 (0.0012) [2024-06-15 20:21:15,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 41506.1, 300 sec: 43209.3). Total num frames: 1376518144. 
Throughput: 0: 10729.2. Samples: 344214016. Policy #0 lag: (min: 14.0, avg: 125.4, max: 270.0) [2024-06-15 20:21:15,746][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 20:21:17,998][1652475] Updated weights for policy 0, policy_version 672194 (0.0095) [2024-06-15 20:21:20,738][1648984] Fps is (10 sec: 45876.4, 60 sec: 41506.2, 300 sec: 43653.6). Total num frames: 1376780288. Throughput: 0: 10911.3. Samples: 344244224. Policy #0 lag: (min: 14.0, avg: 125.4, max: 270.0) [2024-06-15 20:21:20,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 20:21:21,867][1652475] Updated weights for policy 0, policy_version 672288 (0.0013) [2024-06-15 20:21:21,987][1651340] Signal inference workers to stop experience collection... (34550 times) [2024-06-15 20:21:22,077][1652475] InferenceWorker_p0-w0: stopping experience collection (34550 times) [2024-06-15 20:21:22,255][1651340] Signal inference workers to resume experience collection... (34550 times) [2024-06-15 20:21:22,255][1652475] InferenceWorker_p0-w0: resuming experience collection (34550 times) [2024-06-15 20:21:24,672][1652475] Updated weights for policy 0, policy_version 672337 (0.0013) [2024-06-15 20:21:25,738][1648984] Fps is (10 sec: 49152.0, 60 sec: 43144.5, 300 sec: 43764.7). Total num frames: 1377009664. Throughput: 0: 11013.7. Samples: 344312320. Policy #0 lag: (min: 14.0, avg: 125.4, max: 270.0) [2024-06-15 20:21:25,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 20:21:27,955][1652475] Updated weights for policy 0, policy_version 672405 (0.0013) [2024-06-15 20:21:30,332][1652475] Updated weights for policy 0, policy_version 672466 (0.0035) [2024-06-15 20:21:30,738][1648984] Fps is (10 sec: 45875.2, 60 sec: 42598.4, 300 sec: 43764.8). Total num frames: 1377239040. Throughput: 0: 11048.3. Samples: 344375808. 
Policy #0 lag: (min: 14.0, avg: 125.4, max: 270.0) [2024-06-15 20:21:30,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 20:21:31,090][1652475] Updated weights for policy 0, policy_version 672512 (0.0011) [2024-06-15 20:21:35,682][1652475] Updated weights for policy 0, policy_version 672577 (0.0021) [2024-06-15 20:21:35,738][1648984] Fps is (10 sec: 42598.7, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 1377435648. Throughput: 0: 11104.7. Samples: 344413696. Policy #0 lag: (min: 14.0, avg: 125.4, max: 270.0) [2024-06-15 20:21:35,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 20:21:39,527][1652475] Updated weights for policy 0, policy_version 672645 (0.0013) [2024-06-15 20:21:40,565][1652475] Updated weights for policy 0, policy_version 672704 (0.0011) [2024-06-15 20:21:40,738][1648984] Fps is (10 sec: 45874.8, 60 sec: 45329.0, 300 sec: 43544.4). Total num frames: 1377697792. Throughput: 0: 10945.5. Samples: 344478720. Policy #0 lag: (min: 14.0, avg: 125.4, max: 270.0) [2024-06-15 20:21:40,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 20:21:42,232][1652475] Updated weights for policy 0, policy_version 672752 (0.0014) [2024-06-15 20:21:45,571][1652475] Updated weights for policy 0, policy_version 672816 (0.0013) [2024-06-15 20:21:45,738][1648984] Fps is (10 sec: 49150.6, 60 sec: 43692.0, 300 sec: 43764.7). Total num frames: 1377927168. Throughput: 0: 11047.7. Samples: 344547840. Policy #0 lag: (min: 14.0, avg: 125.4, max: 270.0) [2024-06-15 20:21:45,739][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 20:21:49,175][1652475] Updated weights for policy 0, policy_version 672882 (0.0011) [2024-06-15 20:21:50,546][1652475] Updated weights for policy 0, policy_version 672912 (0.0012) [2024-06-15 20:21:50,738][1648984] Fps is (10 sec: 42598.9, 60 sec: 44236.9, 300 sec: 43875.9). Total num frames: 1378123776. Throughput: 0: 11104.7. Samples: 344582656. 
Policy #0 lag: (min: 14.0, avg: 125.4, max: 270.0) [2024-06-15 20:21:50,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 20:21:51,547][1652475] Updated weights for policy 0, policy_version 672953 (0.0017) [2024-06-15 20:21:53,692][1652475] Updated weights for policy 0, policy_version 673024 (0.0019) [2024-06-15 20:21:55,738][1648984] Fps is (10 sec: 42599.0, 60 sec: 43690.8, 300 sec: 43988.5). Total num frames: 1378353152. Throughput: 0: 11173.0. Samples: 344647168. Policy #0 lag: (min: 14.0, avg: 125.4, max: 270.0) [2024-06-15 20:21:55,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 20:21:57,046][1652475] Updated weights for policy 0, policy_version 673084 (0.0150) [2024-06-15 20:22:00,738][1648984] Fps is (10 sec: 36043.5, 60 sec: 42598.1, 300 sec: 43542.6). Total num frames: 1378484224. Throughput: 0: 11229.8. Samples: 344719360. Policy #0 lag: (min: 14.0, avg: 125.4, max: 270.0) [2024-06-15 20:22:00,739][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 20:22:03,034][1652475] Updated weights for policy 0, policy_version 673168 (0.0150) [2024-06-15 20:22:05,053][1652475] Updated weights for policy 0, policy_version 673248 (0.0012) [2024-06-15 20:22:05,738][1648984] Fps is (10 sec: 49152.9, 60 sec: 45329.1, 300 sec: 43875.8). Total num frames: 1378844672. Throughput: 0: 11161.6. Samples: 344746496. Policy #0 lag: (min: 14.0, avg: 125.4, max: 270.0) [2024-06-15 20:22:05,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 20:22:08,922][1652475] Updated weights for policy 0, policy_version 673303 (0.0011) [2024-06-15 20:22:09,207][1651340] Signal inference workers to stop experience collection... (34600 times) [2024-06-15 20:22:09,263][1652475] InferenceWorker_p0-w0: stopping experience collection (34600 times) [2024-06-15 20:22:09,467][1651340] Signal inference workers to resume experience collection... 
(34600 times) [2024-06-15 20:22:09,468][1652475] InferenceWorker_p0-w0: resuming experience collection (34600 times) [2024-06-15 20:22:10,738][1648984] Fps is (10 sec: 52430.2, 60 sec: 44783.1, 300 sec: 43986.9). Total num frames: 1379008512. Throughput: 0: 10934.0. Samples: 344804352. Policy #0 lag: (min: 14.0, avg: 125.4, max: 270.0) [2024-06-15 20:22:10,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 20:22:14,173][1652475] Updated weights for policy 0, policy_version 673348 (0.0011) [2024-06-15 20:22:15,743][1648984] Fps is (10 sec: 29476.3, 60 sec: 43687.1, 300 sec: 43541.8). Total num frames: 1379139584. Throughput: 0: 11080.7. Samples: 344874496. Policy #0 lag: (min: 14.0, avg: 125.4, max: 270.0) [2024-06-15 20:22:15,743][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 20:22:17,144][1652475] Updated weights for policy 0, policy_version 673458 (0.0012) [2024-06-15 20:22:18,683][1652475] Updated weights for policy 0, policy_version 673536 (0.0011) [2024-06-15 20:22:20,738][1648984] Fps is (10 sec: 39321.2, 60 sec: 43690.5, 300 sec: 43986.8). Total num frames: 1379401728. Throughput: 0: 10683.7. Samples: 344894464. Policy #0 lag: (min: 14.0, avg: 125.4, max: 270.0) [2024-06-15 20:22:20,739][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 20:22:22,331][1652475] Updated weights for policy 0, policy_version 673600 (0.0058) [2024-06-15 20:22:25,738][1648984] Fps is (10 sec: 39341.4, 60 sec: 42052.3, 300 sec: 43542.6). Total num frames: 1379532800. Throughput: 0: 10683.8. Samples: 344959488. Policy #0 lag: (min: 14.0, avg: 125.4, max: 270.0) [2024-06-15 20:22:25,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 20:22:28,177][1652475] Updated weights for policy 0, policy_version 673668 (0.0012) [2024-06-15 20:22:29,312][1652475] Updated weights for policy 0, policy_version 673721 (0.0012) [2024-06-15 20:22:30,738][1648984] Fps is (10 sec: 42599.1, 60 sec: 43144.5, 300 sec: 43653.6). Total num frames: 1379827712. 
Throughput: 0: 10592.8. Samples: 345024512. Policy #0 lag: (min: 47.0, avg: 112.3, max: 303.0) [2024-06-15 20:22:30,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 20:22:31,350][1652475] Updated weights for policy 0, policy_version 673780 (0.0014) [2024-06-15 20:22:34,175][1652475] Updated weights for policy 0, policy_version 673854 (0.0013) [2024-06-15 20:22:35,738][1648984] Fps is (10 sec: 52428.5, 60 sec: 43690.6, 300 sec: 44097.9). Total num frames: 1380057088. Throughput: 0: 10490.3. Samples: 345054720. Policy #0 lag: (min: 47.0, avg: 112.3, max: 303.0) [2024-06-15 20:22:35,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 20:22:39,380][1652475] Updated weights for policy 0, policy_version 673909 (0.0011) [2024-06-15 20:22:40,763][1648984] Fps is (10 sec: 35952.8, 60 sec: 41488.5, 300 sec: 43094.5). Total num frames: 1380188160. Throughput: 0: 10564.0. Samples: 345122816. Policy #0 lag: (min: 47.0, avg: 112.3, max: 303.0) [2024-06-15 20:22:40,764][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 20:22:40,819][1652475] Updated weights for policy 0, policy_version 673936 (0.0010) [2024-06-15 20:22:42,039][1652475] Updated weights for policy 0, policy_version 673984 (0.0013) [2024-06-15 20:22:45,658][1652475] Updated weights for policy 0, policy_version 674064 (0.0013) [2024-06-15 20:22:45,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 42598.6, 300 sec: 44097.9). Total num frames: 1380483072. Throughput: 0: 10262.8. Samples: 345181184. Policy #0 lag: (min: 47.0, avg: 112.3, max: 303.0) [2024-06-15 20:22:45,738][1648984] Avg episode reward: [(0, '-0.180')] [2024-06-15 20:22:50,252][1652475] Updated weights for policy 0, policy_version 674128 (0.0017) [2024-06-15 20:22:50,738][1648984] Fps is (10 sec: 45992.8, 60 sec: 42052.2, 300 sec: 43542.6). Total num frames: 1380646912. Throughput: 0: 10365.1. Samples: 345212928. 
Policy #0 lag: (min: 47.0, avg: 112.3, max: 303.0) [2024-06-15 20:22:50,739][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 20:22:53,922][1652475] Updated weights for policy 0, policy_version 674192 (0.0097) [2024-06-15 20:22:55,738][1648984] Fps is (10 sec: 39320.4, 60 sec: 42052.1, 300 sec: 43653.6). Total num frames: 1380876288. Throughput: 0: 10547.1. Samples: 345278976. Policy #0 lag: (min: 47.0, avg: 112.3, max: 303.0) [2024-06-15 20:22:55,739][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 20:22:56,189][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000674288_1380941824.pth... [2024-06-15 20:22:56,190][1652475] Updated weights for policy 0, policy_version 674288 (0.0013) [2024-06-15 20:22:56,239][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000669184_1370488832.pth [2024-06-15 20:22:58,952][1651340] Signal inference workers to stop experience collection... (34650 times) [2024-06-15 20:22:58,988][1652475] InferenceWorker_p0-w0: stopping experience collection (34650 times) [2024-06-15 20:22:59,172][1651340] Signal inference workers to resume experience collection... (34650 times) [2024-06-15 20:22:59,173][1652475] InferenceWorker_p0-w0: resuming experience collection (34650 times) [2024-06-15 20:22:59,813][1652475] Updated weights for policy 0, policy_version 674341 (0.0013) [2024-06-15 20:23:00,738][1648984] Fps is (10 sec: 45875.6, 60 sec: 43690.9, 300 sec: 43986.9). Total num frames: 1381105664. Throughput: 0: 10343.6. Samples: 345339904. Policy #0 lag: (min: 47.0, avg: 112.3, max: 303.0) [2024-06-15 20:23:00,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 20:23:01,159][1652475] Updated weights for policy 0, policy_version 674384 (0.0013) [2024-06-15 20:23:05,727][1652475] Updated weights for policy 0, policy_version 674448 (0.0012) [2024-06-15 20:23:05,738][1648984] Fps is (10 sec: 39323.2, 60 sec: 40413.9, 300 sec: 43209.4). 
Total num frames: 1381269504. Throughput: 0: 10661.0. Samples: 345374208. Policy #0 lag: (min: 47.0, avg: 112.3, max: 303.0) [2024-06-15 20:23:05,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 20:23:07,187][1652475] Updated weights for policy 0, policy_version 674499 (0.0014) [2024-06-15 20:23:10,738][1648984] Fps is (10 sec: 39321.0, 60 sec: 41506.1, 300 sec: 43986.9). Total num frames: 1381498880. Throughput: 0: 10717.8. Samples: 345441792. Policy #0 lag: (min: 47.0, avg: 112.3, max: 303.0) [2024-06-15 20:23:10,739][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 20:23:11,647][1652475] Updated weights for policy 0, policy_version 674563 (0.0028) [2024-06-15 20:23:12,887][1652475] Updated weights for policy 0, policy_version 674624 (0.0011) [2024-06-15 20:23:14,216][1652475] Updated weights for policy 0, policy_version 674680 (0.0012) [2024-06-15 20:23:15,738][1648984] Fps is (10 sec: 49151.9, 60 sec: 43694.3, 300 sec: 43542.6). Total num frames: 1381761024. Throughput: 0: 10865.8. Samples: 345513472. Policy #0 lag: (min: 47.0, avg: 112.3, max: 303.0) [2024-06-15 20:23:15,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 20:23:17,840][1652475] Updated weights for policy 0, policy_version 674740 (0.0015) [2024-06-15 20:23:18,989][1652475] Updated weights for policy 0, policy_version 674784 (0.0012) [2024-06-15 20:23:20,738][1648984] Fps is (10 sec: 52429.6, 60 sec: 43690.8, 300 sec: 43986.9). Total num frames: 1382023168. Throughput: 0: 10956.8. Samples: 345547776. Policy #0 lag: (min: 47.0, avg: 112.3, max: 303.0) [2024-06-15 20:23:20,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 20:23:23,259][1652475] Updated weights for policy 0, policy_version 674852 (0.0013) [2024-06-15 20:23:25,037][1652475] Updated weights for policy 0, policy_version 674898 (0.0018) [2024-06-15 20:23:25,738][1648984] Fps is (10 sec: 49151.8, 60 sec: 45329.1, 300 sec: 43875.8). Total num frames: 1382252544. Throughput: 0: 10997.2. 
Samples: 345617408. Policy #0 lag: (min: 47.0, avg: 112.3, max: 303.0) [2024-06-15 20:23:25,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 20:23:26,016][1652475] Updated weights for policy 0, policy_version 674942 (0.0010) [2024-06-15 20:23:29,017][1652475] Updated weights for policy 0, policy_version 674983 (0.0014) [2024-06-15 20:23:30,242][1652475] Updated weights for policy 0, policy_version 675028 (0.0014) [2024-06-15 20:23:30,738][1648984] Fps is (10 sec: 45875.1, 60 sec: 44236.8, 300 sec: 43764.7). Total num frames: 1382481920. Throughput: 0: 11150.2. Samples: 345682944. Policy #0 lag: (min: 47.0, avg: 112.3, max: 303.0) [2024-06-15 20:23:30,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 20:23:34,986][1652475] Updated weights for policy 0, policy_version 675104 (0.0147) [2024-06-15 20:23:35,738][1648984] Fps is (10 sec: 42597.5, 60 sec: 43690.5, 300 sec: 43653.6). Total num frames: 1382678528. Throughput: 0: 11195.7. Samples: 345716736. Policy #0 lag: (min: 47.0, avg: 112.3, max: 303.0) [2024-06-15 20:23:35,739][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 20:23:36,372][1652475] Updated weights for policy 0, policy_version 675137 (0.0025) [2024-06-15 20:23:37,828][1652475] Updated weights for policy 0, policy_version 675196 (0.0120) [2024-06-15 20:23:40,738][1648984] Fps is (10 sec: 39321.2, 60 sec: 44802.0, 300 sec: 43653.6). Total num frames: 1382875136. Throughput: 0: 11218.6. Samples: 345783808. Policy #0 lag: (min: 47.0, avg: 112.3, max: 303.0) [2024-06-15 20:23:40,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 20:23:40,990][1652475] Updated weights for policy 0, policy_version 675257 (0.0026) [2024-06-15 20:23:42,554][1651340] Signal inference workers to stop experience collection... (34700 times) [2024-06-15 20:23:42,594][1652475] InferenceWorker_p0-w0: stopping experience collection (34700 times) [2024-06-15 20:23:42,856][1651340] Signal inference workers to resume experience collection... 
(34700 times) [2024-06-15 20:23:42,857][1652475] InferenceWorker_p0-w0: resuming experience collection (34700 times) [2024-06-15 20:23:42,859][1652475] Updated weights for policy 0, policy_version 675312 (0.0015) [2024-06-15 20:23:45,738][1648984] Fps is (10 sec: 39322.2, 60 sec: 43144.5, 300 sec: 43542.6). Total num frames: 1383071744. Throughput: 0: 11264.0. Samples: 345846784. Policy #0 lag: (min: 47.0, avg: 112.3, max: 303.0) [2024-06-15 20:23:45,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 20:23:46,978][1652475] Updated weights for policy 0, policy_version 675378 (0.0117) [2024-06-15 20:23:49,738][1652475] Updated weights for policy 0, policy_version 675447 (0.0013) [2024-06-15 20:23:50,738][1648984] Fps is (10 sec: 45875.5, 60 sec: 44782.9, 300 sec: 43542.6). Total num frames: 1383333888. Throughput: 0: 11298.1. Samples: 345882624. Policy #0 lag: (min: 47.0, avg: 112.3, max: 303.0) [2024-06-15 20:23:50,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 20:23:52,801][1652475] Updated weights for policy 0, policy_version 675489 (0.0012) [2024-06-15 20:23:54,218][1652475] Updated weights for policy 0, policy_version 675536 (0.0019) [2024-06-15 20:23:55,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 45329.3, 300 sec: 43986.9). Total num frames: 1383596032. Throughput: 0: 11150.2. Samples: 345943552. Policy #0 lag: (min: 47.0, avg: 112.3, max: 303.0) [2024-06-15 20:23:55,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 20:23:58,422][1652475] Updated weights for policy 0, policy_version 675589 (0.0015) [2024-06-15 20:23:59,937][1652475] Updated weights for policy 0, policy_version 675648 (0.0012) [2024-06-15 20:24:00,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 43690.6, 300 sec: 43431.5). Total num frames: 1383727104. Throughput: 0: 11036.4. Samples: 346010112. 
Policy #0 lag: (min: 47.0, avg: 112.3, max: 303.0) [2024-06-15 20:24:00,778][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 20:24:04,313][1652475] Updated weights for policy 0, policy_version 675713 (0.0025) [2024-06-15 20:24:05,702][1652475] Updated weights for policy 0, policy_version 675778 (0.0010) [2024-06-15 20:24:05,739][1648984] Fps is (10 sec: 39317.8, 60 sec: 45328.2, 300 sec: 43542.4). Total num frames: 1383989248. Throughput: 0: 10899.6. Samples: 346038272. Policy #0 lag: (min: 47.0, avg: 112.3, max: 303.0) [2024-06-15 20:24:05,740][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 20:24:10,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 43690.7, 300 sec: 43098.2). Total num frames: 1384120320. Throughput: 0: 10717.9. Samples: 346099712. Policy #0 lag: (min: 64.0, avg: 177.0, max: 288.0) [2024-06-15 20:24:10,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 20:24:12,586][1652475] Updated weights for policy 0, policy_version 675841 (0.0021) [2024-06-15 20:24:13,943][1652475] Updated weights for policy 0, policy_version 675904 (0.0037) [2024-06-15 20:24:15,738][1648984] Fps is (10 sec: 39325.1, 60 sec: 43690.5, 300 sec: 43542.6). Total num frames: 1384382464. Throughput: 0: 10706.4. Samples: 346164736. Policy #0 lag: (min: 64.0, avg: 177.0, max: 288.0) [2024-06-15 20:24:15,739][1648984] Avg episode reward: [(0, '-0.250')] [2024-06-15 20:24:16,763][1652475] Updated weights for policy 0, policy_version 676000 (0.0018) [2024-06-15 20:24:18,490][1652475] Updated weights for policy 0, policy_version 676066 (0.0090) [2024-06-15 20:24:20,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 43690.6, 300 sec: 43542.6). Total num frames: 1384644608. Throughput: 0: 10615.5. Samples: 346194432. 
Policy #0 lag: (min: 64.0, avg: 177.0, max: 288.0) [2024-06-15 20:24:20,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 20:24:24,739][1652475] Updated weights for policy 0, policy_version 676128 (0.0012) [2024-06-15 20:24:25,738][1648984] Fps is (10 sec: 39322.1, 60 sec: 42052.2, 300 sec: 43098.3). Total num frames: 1384775680. Throughput: 0: 10820.3. Samples: 346270720. Policy #0 lag: (min: 64.0, avg: 177.0, max: 288.0) [2024-06-15 20:24:25,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 20:24:27,745][1652475] Updated weights for policy 0, policy_version 676208 (0.0013) [2024-06-15 20:24:29,495][1652475] Updated weights for policy 0, policy_version 676272 (0.0012) [2024-06-15 20:24:29,948][1651340] Signal inference workers to stop experience collection... (34750 times) [2024-06-15 20:24:29,982][1651340] Signal inference workers to resume experience collection... (34750 times) [2024-06-15 20:24:29,989][1652475] InferenceWorker_p0-w0: stopping experience collection (34750 times) [2024-06-15 20:24:30,015][1652475] InferenceWorker_p0-w0: resuming experience collection (34750 times) [2024-06-15 20:24:30,481][1652475] Updated weights for policy 0, policy_version 676307 (0.0010) [2024-06-15 20:24:30,738][1648984] Fps is (10 sec: 45875.2, 60 sec: 43690.7, 300 sec: 43764.7). Total num frames: 1385103360. Throughput: 0: 10638.2. Samples: 346325504. Policy #0 lag: (min: 64.0, avg: 177.0, max: 288.0) [2024-06-15 20:24:30,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 20:24:35,738][1648984] Fps is (10 sec: 42598.6, 60 sec: 42052.4, 300 sec: 42765.0). Total num frames: 1385201664. Throughput: 0: 10558.6. Samples: 346357760. 
Policy #0 lag: (min: 64.0, avg: 177.0, max: 288.0) [2024-06-15 20:24:35,739][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 20:24:35,898][1652475] Updated weights for policy 0, policy_version 676375 (0.0015) [2024-06-15 20:24:39,371][1652475] Updated weights for policy 0, policy_version 676448 (0.0107) [2024-06-15 20:24:40,738][1648984] Fps is (10 sec: 36044.8, 60 sec: 43144.6, 300 sec: 43209.4). Total num frames: 1385463808. Throughput: 0: 10797.5. Samples: 346429440. Policy #0 lag: (min: 64.0, avg: 177.0, max: 288.0) [2024-06-15 20:24:40,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 20:24:41,854][1652475] Updated weights for policy 0, policy_version 676536 (0.0012) [2024-06-15 20:24:45,738][1648984] Fps is (10 sec: 45875.2, 60 sec: 43144.6, 300 sec: 43098.2). Total num frames: 1385660416. Throughput: 0: 10467.6. Samples: 346481152. Policy #0 lag: (min: 64.0, avg: 177.0, max: 288.0) [2024-06-15 20:24:45,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 20:24:45,820][1652475] Updated weights for policy 0, policy_version 676608 (0.0011) [2024-06-15 20:24:50,738][1648984] Fps is (10 sec: 36045.0, 60 sec: 41506.2, 300 sec: 42653.9). Total num frames: 1385824256. Throughput: 0: 10581.6. Samples: 346514432. Policy #0 lag: (min: 64.0, avg: 177.0, max: 288.0) [2024-06-15 20:24:50,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 20:24:51,742][1652475] Updated weights for policy 0, policy_version 676688 (0.0012) [2024-06-15 20:24:53,370][1652475] Updated weights for policy 0, policy_version 676752 (0.0010) [2024-06-15 20:24:55,738][1648984] Fps is (10 sec: 42597.0, 60 sec: 41506.0, 300 sec: 43431.4). Total num frames: 1386086400. Throughput: 0: 10604.0. Samples: 346576896. 
Policy #0 lag: (min: 64.0, avg: 177.0, max: 288.0) [2024-06-15 20:24:55,739][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 20:24:55,748][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000676800_1386086400.pth... [2024-06-15 20:24:55,799][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000671744_1375731712.pth [2024-06-15 20:24:57,049][1652475] Updated weights for policy 0, policy_version 676816 (0.0017) [2024-06-15 20:24:59,820][1652475] Updated weights for policy 0, policy_version 676896 (0.0012) [2024-06-15 20:25:00,738][1648984] Fps is (10 sec: 52428.2, 60 sec: 43690.6, 300 sec: 42987.2). Total num frames: 1386348544. Throughput: 0: 10570.0. Samples: 346640384. Policy #0 lag: (min: 64.0, avg: 177.0, max: 288.0) [2024-06-15 20:25:00,739][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 20:25:03,540][1652475] Updated weights for policy 0, policy_version 676948 (0.0016) [2024-06-15 20:25:04,351][1652475] Updated weights for policy 0, policy_version 676991 (0.0012) [2024-06-15 20:25:05,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 41506.6, 300 sec: 43098.2). Total num frames: 1386479616. Throughput: 0: 10842.9. Samples: 346682368. Policy #0 lag: (min: 64.0, avg: 177.0, max: 288.0) [2024-06-15 20:25:05,739][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 20:25:08,504][1652475] Updated weights for policy 0, policy_version 677074 (0.0014) [2024-06-15 20:25:10,738][1648984] Fps is (10 sec: 39321.9, 60 sec: 43690.7, 300 sec: 43098.3). Total num frames: 1386741760. Throughput: 0: 10387.9. Samples: 346738176. 
Policy #0 lag: (min: 64.0, avg: 177.0, max: 288.0) [2024-06-15 20:25:10,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 20:25:11,644][1652475] Updated weights for policy 0, policy_version 677122 (0.0015) [2024-06-15 20:25:12,950][1652475] Updated weights for policy 0, policy_version 677182 (0.0011) [2024-06-15 20:25:15,464][1652475] Updated weights for policy 0, policy_version 677232 (0.0012) [2024-06-15 20:25:15,738][1648984] Fps is (10 sec: 49153.7, 60 sec: 43144.7, 300 sec: 42987.2). Total num frames: 1386971136. Throughput: 0: 10729.2. Samples: 346808320. Policy #0 lag: (min: 64.0, avg: 177.0, max: 288.0) [2024-06-15 20:25:15,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 20:25:18,595][1652475] Updated weights for policy 0, policy_version 677267 (0.0031) [2024-06-15 20:25:18,918][1651340] Signal inference workers to stop experience collection... (34800 times) [2024-06-15 20:25:19,013][1652475] InferenceWorker_p0-w0: stopping experience collection (34800 times) [2024-06-15 20:25:19,148][1651340] Signal inference workers to resume experience collection... (34800 times) [2024-06-15 20:25:19,150][1652475] InferenceWorker_p0-w0: resuming experience collection (34800 times) [2024-06-15 20:25:20,318][1652475] Updated weights for policy 0, policy_version 677344 (0.0013) [2024-06-15 20:25:20,738][1648984] Fps is (10 sec: 49149.8, 60 sec: 43144.2, 300 sec: 43431.4). Total num frames: 1387233280. Throughput: 0: 10888.4. Samples: 346847744. Policy #0 lag: (min: 64.0, avg: 177.0, max: 288.0) [2024-06-15 20:25:20,739][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 20:25:25,138][1652475] Updated weights for policy 0, policy_version 677408 (0.0014) [2024-06-15 20:25:25,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 43144.6, 300 sec: 42987.2). Total num frames: 1387364352. Throughput: 0: 10831.6. Samples: 346916864. 
Policy #0 lag: (min: 64.0, avg: 177.0, max: 288.0) [2024-06-15 20:25:25,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 20:25:27,216][1652475] Updated weights for policy 0, policy_version 677488 (0.0018) [2024-06-15 20:25:30,296][1652475] Updated weights for policy 0, policy_version 677536 (0.0010) [2024-06-15 20:25:30,738][1648984] Fps is (10 sec: 39323.4, 60 sec: 42052.3, 300 sec: 43431.5). Total num frames: 1387626496. Throughput: 0: 10968.2. Samples: 346974720. Policy #0 lag: (min: 64.0, avg: 177.0, max: 288.0) [2024-06-15 20:25:30,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 20:25:32,308][1652475] Updated weights for policy 0, policy_version 677608 (0.0050) [2024-06-15 20:25:35,738][1648984] Fps is (10 sec: 42598.2, 60 sec: 43144.5, 300 sec: 43431.5). Total num frames: 1387790336. Throughput: 0: 10888.5. Samples: 347004416. Policy #0 lag: (min: 64.0, avg: 177.0, max: 288.0) [2024-06-15 20:25:35,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 20:25:37,243][1652475] Updated weights for policy 0, policy_version 677666 (0.0014) [2024-06-15 20:25:39,118][1652475] Updated weights for policy 0, policy_version 677744 (0.0077) [2024-06-15 20:25:40,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 43144.5, 300 sec: 43209.6). Total num frames: 1388052480. Throughput: 0: 10945.5. Samples: 347069440. Policy #0 lag: (min: 64.0, avg: 177.0, max: 288.0) [2024-06-15 20:25:40,740][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 20:25:42,525][1652475] Updated weights for policy 0, policy_version 677796 (0.0013) [2024-06-15 20:25:44,028][1652475] Updated weights for policy 0, policy_version 677859 (0.0152) [2024-06-15 20:25:45,738][1648984] Fps is (10 sec: 52428.2, 60 sec: 44236.7, 300 sec: 43542.5). Total num frames: 1388314624. Throughput: 0: 11070.5. Samples: 347138560. 
Policy #0 lag: (min: 64.0, avg: 177.0, max: 288.0) [2024-06-15 20:25:45,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 20:25:48,332][1652475] Updated weights for policy 0, policy_version 677906 (0.0014) [2024-06-15 20:25:50,205][1652475] Updated weights for policy 0, policy_version 677971 (0.0011) [2024-06-15 20:25:50,738][1648984] Fps is (10 sec: 45875.5, 60 sec: 44782.9, 300 sec: 43320.5). Total num frames: 1388511232. Throughput: 0: 11059.3. Samples: 347180032. Policy #0 lag: (min: 15.0, avg: 98.4, max: 271.0) [2024-06-15 20:25:50,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 20:25:53,438][1652475] Updated weights for policy 0, policy_version 678032 (0.0013) [2024-06-15 20:25:55,316][1652475] Updated weights for policy 0, policy_version 678086 (0.0015) [2024-06-15 20:25:55,738][1648984] Fps is (10 sec: 42598.9, 60 sec: 44237.0, 300 sec: 43431.5). Total num frames: 1388740608. Throughput: 0: 11161.6. Samples: 347240448. Policy #0 lag: (min: 15.0, avg: 98.4, max: 271.0) [2024-06-15 20:25:55,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 20:25:56,535][1652475] Updated weights for policy 0, policy_version 678141 (0.0012) [2024-06-15 20:26:00,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 42598.5, 300 sec: 43320.4). Total num frames: 1388904448. Throughput: 0: 11150.2. Samples: 347310080. Policy #0 lag: (min: 15.0, avg: 98.4, max: 271.0) [2024-06-15 20:26:00,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 20:26:01,061][1652475] Updated weights for policy 0, policy_version 678200 (0.0013) [2024-06-15 20:26:01,980][1651340] Signal inference workers to stop experience collection... (34850 times) [2024-06-15 20:26:02,056][1652475] InferenceWorker_p0-w0: stopping experience collection (34850 times) [2024-06-15 20:26:02,231][1651340] Signal inference workers to resume experience collection... 
(34850 times) [2024-06-15 20:26:02,232][1652475] InferenceWorker_p0-w0: resuming experience collection (34850 times) [2024-06-15 20:26:02,838][1652475] Updated weights for policy 0, policy_version 678268 (0.0019) [2024-06-15 20:26:05,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 44237.0, 300 sec: 43431.5). Total num frames: 1389133824. Throughput: 0: 10888.6. Samples: 347337728. Policy #0 lag: (min: 15.0, avg: 98.4, max: 271.0) [2024-06-15 20:26:05,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 20:26:06,317][1652475] Updated weights for policy 0, policy_version 678321 (0.0014) [2024-06-15 20:26:07,452][1652475] Updated weights for policy 0, policy_version 678368 (0.0011) [2024-06-15 20:26:10,738][1648984] Fps is (10 sec: 45874.7, 60 sec: 43690.6, 300 sec: 43542.6). Total num frames: 1389363200. Throughput: 0: 10865.8. Samples: 347405824. Policy #0 lag: (min: 15.0, avg: 98.4, max: 271.0) [2024-06-15 20:26:10,739][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 20:26:12,337][1652475] Updated weights for policy 0, policy_version 678436 (0.0033) [2024-06-15 20:26:13,319][1652475] Updated weights for policy 0, policy_version 678480 (0.0012) [2024-06-15 20:26:15,741][1648984] Fps is (10 sec: 49136.4, 60 sec: 44234.4, 300 sec: 43542.1). Total num frames: 1389625344. Throughput: 0: 11149.4. Samples: 347476480. Policy #0 lag: (min: 15.0, avg: 98.4, max: 271.0) [2024-06-15 20:26:15,741][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 20:26:18,004][1652475] Updated weights for policy 0, policy_version 678561 (0.0022) [2024-06-15 20:26:20,148][1652475] Updated weights for policy 0, policy_version 678650 (0.0153) [2024-06-15 20:26:20,766][1648984] Fps is (10 sec: 52279.8, 60 sec: 44216.1, 300 sec: 43649.4). Total num frames: 1389887488. Throughput: 0: 11143.2. Samples: 347506176. 
Policy #0 lag: (min: 15.0, avg: 98.4, max: 271.0) [2024-06-15 20:26:20,767][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 20:26:25,425][1652475] Updated weights for policy 0, policy_version 678704 (0.0012) [2024-06-15 20:26:25,738][1648984] Fps is (10 sec: 36056.1, 60 sec: 43690.6, 300 sec: 43209.3). Total num frames: 1389985792. Throughput: 0: 11195.7. Samples: 347573248. Policy #0 lag: (min: 15.0, avg: 98.4, max: 271.0) [2024-06-15 20:26:25,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 20:26:29,378][1652475] Updated weights for policy 0, policy_version 678786 (0.0029) [2024-06-15 20:26:30,738][1648984] Fps is (10 sec: 39433.8, 60 sec: 44236.7, 300 sec: 43542.6). Total num frames: 1390280704. Throughput: 0: 11047.8. Samples: 347635712. Policy #0 lag: (min: 15.0, avg: 98.4, max: 271.0) [2024-06-15 20:26:30,739][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 20:26:30,797][1652475] Updated weights for policy 0, policy_version 678851 (0.0013) [2024-06-15 20:26:35,738][1648984] Fps is (10 sec: 42598.6, 60 sec: 43690.7, 300 sec: 43098.3). Total num frames: 1390411776. Throughput: 0: 10820.2. Samples: 347666944. Policy #0 lag: (min: 15.0, avg: 98.4, max: 271.0) [2024-06-15 20:26:35,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 20:26:36,204][1652475] Updated weights for policy 0, policy_version 678916 (0.0015) [2024-06-15 20:26:37,509][1652475] Updated weights for policy 0, policy_version 678974 (0.0028) [2024-06-15 20:26:39,484][1652475] Updated weights for policy 0, policy_version 679034 (0.0014) [2024-06-15 20:26:40,738][1648984] Fps is (10 sec: 42598.6, 60 sec: 44236.8, 300 sec: 43320.4). Total num frames: 1390706688. Throughput: 0: 11025.1. Samples: 347736576. 
Policy #0 lag: (min: 15.0, avg: 98.4, max: 271.0) [2024-06-15 20:26:40,741][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 20:26:41,992][1652475] Updated weights for policy 0, policy_version 679120 (0.0243) [2024-06-15 20:26:43,188][1652475] Updated weights for policy 0, policy_version 679164 (0.0016) [2024-06-15 20:26:45,738][1648984] Fps is (10 sec: 52428.3, 60 sec: 43690.7, 300 sec: 43431.5). Total num frames: 1390936064. Throughput: 0: 10854.4. Samples: 347798528. Policy #0 lag: (min: 15.0, avg: 98.4, max: 271.0) [2024-06-15 20:26:45,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 20:26:48,516][1651340] Signal inference workers to stop experience collection... (34900 times) [2024-06-15 20:26:48,575][1652475] InferenceWorker_p0-w0: stopping experience collection (34900 times) [2024-06-15 20:26:48,833][1651340] Signal inference workers to resume experience collection... (34900 times) [2024-06-15 20:26:48,842][1652475] InferenceWorker_p0-w0: resuming experience collection (34900 times) [2024-06-15 20:26:49,521][1652475] Updated weights for policy 0, policy_version 679232 (0.0011) [2024-06-15 20:26:50,738][1648984] Fps is (10 sec: 36045.0, 60 sec: 42598.4, 300 sec: 43098.3). Total num frames: 1391067136. Throughput: 0: 11036.5. Samples: 347834368. Policy #0 lag: (min: 15.0, avg: 98.4, max: 271.0) [2024-06-15 20:26:50,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 20:26:53,262][1652475] Updated weights for policy 0, policy_version 679301 (0.0012) [2024-06-15 20:26:54,533][1652475] Updated weights for policy 0, policy_version 679349 (0.0011) [2024-06-15 20:26:55,558][1652475] Updated weights for policy 0, policy_version 679376 (0.0010) [2024-06-15 20:26:55,738][1648984] Fps is (10 sec: 42598.7, 60 sec: 43690.7, 300 sec: 43653.7). Total num frames: 1391362048. Throughput: 0: 10865.8. Samples: 347894784. 
Policy #0 lag: (min: 15.0, avg: 98.4, max: 271.0) [2024-06-15 20:26:55,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 20:26:56,491][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000679408_1391427584.pth... [2024-06-15 20:26:56,526][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000674288_1380941824.pth [2024-06-15 20:26:56,910][1652475] Updated weights for policy 0, policy_version 679420 (0.0011) [2024-06-15 20:27:00,741][1648984] Fps is (10 sec: 45864.7, 60 sec: 43689.0, 300 sec: 42986.8). Total num frames: 1391525888. Throughput: 0: 10684.0. Samples: 347957248. Policy #0 lag: (min: 15.0, avg: 98.4, max: 271.0) [2024-06-15 20:27:00,745][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 20:27:00,986][1652475] Updated weights for policy 0, policy_version 679483 (0.0012) [2024-06-15 20:27:03,274][1652475] Updated weights for policy 0, policy_version 679536 (0.0014) [2024-06-15 20:27:05,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 43690.7, 300 sec: 43209.3). Total num frames: 1391755264. Throughput: 0: 10758.8. Samples: 347990016. Policy #0 lag: (min: 15.0, avg: 98.4, max: 271.0) [2024-06-15 20:27:05,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 20:27:06,092][1652475] Updated weights for policy 0, policy_version 679586 (0.0013) [2024-06-15 20:27:08,845][1652475] Updated weights for policy 0, policy_version 679620 (0.0012) [2024-06-15 20:27:10,738][1648984] Fps is (10 sec: 45885.2, 60 sec: 43690.6, 300 sec: 43543.3). Total num frames: 1391984640. Throughput: 0: 10797.5. Samples: 348059136. 
Policy #0 lag: (min: 15.0, avg: 98.4, max: 271.0) [2024-06-15 20:27:10,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 20:27:11,889][1652475] Updated weights for policy 0, policy_version 679712 (0.0027) [2024-06-15 20:27:14,264][1652475] Updated weights for policy 0, policy_version 679767 (0.0027) [2024-06-15 20:27:15,023][1652475] Updated weights for policy 0, policy_version 679808 (0.0013) [2024-06-15 20:27:15,738][1648984] Fps is (10 sec: 49150.3, 60 sec: 43692.7, 300 sec: 43542.5). Total num frames: 1392246784. Throughput: 0: 10820.2. Samples: 348122624. Policy #0 lag: (min: 15.0, avg: 98.4, max: 271.0) [2024-06-15 20:27:15,739][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 20:27:18,906][1652475] Updated weights for policy 0, policy_version 679866 (0.0012) [2024-06-15 20:27:20,738][1648984] Fps is (10 sec: 42598.7, 60 sec: 42072.3, 300 sec: 43653.6). Total num frames: 1392410624. Throughput: 0: 10945.4. Samples: 348159488. Policy #0 lag: (min: 15.0, avg: 98.4, max: 271.0) [2024-06-15 20:27:20,738][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 20:27:21,202][1652475] Updated weights for policy 0, policy_version 679906 (0.0010) [2024-06-15 20:27:23,219][1652475] Updated weights for policy 0, policy_version 679952 (0.0012) [2024-06-15 20:27:25,068][1652475] Updated weights for policy 0, policy_version 680016 (0.0014) [2024-06-15 20:27:25,738][1648984] Fps is (10 sec: 49153.8, 60 sec: 45875.2, 300 sec: 43764.7). Total num frames: 1392738304. Throughput: 0: 10774.8. Samples: 348221440. Policy #0 lag: (min: 15.0, avg: 98.4, max: 271.0) [2024-06-15 20:27:25,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 20:27:30,738][1648984] Fps is (10 sec: 36044.9, 60 sec: 41506.2, 300 sec: 43098.3). Total num frames: 1392771072. Throughput: 0: 10922.7. Samples: 348290048. 
Policy #0 lag: (min: 15.0, avg: 98.4, max: 271.0) [2024-06-15 20:27:30,740][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 20:27:31,848][1652475] Updated weights for policy 0, policy_version 680124 (0.0016) [2024-06-15 20:27:33,411][1652475] Updated weights for policy 0, policy_version 680164 (0.0010) [2024-06-15 20:27:35,738][1648984] Fps is (10 sec: 29490.9, 60 sec: 43690.6, 300 sec: 43546.3). Total num frames: 1393033216. Throughput: 0: 10660.9. Samples: 348314112. Policy #0 lag: (min: 13.0, avg: 109.0, max: 269.0) [2024-06-15 20:27:35,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 20:27:36,206][1652475] Updated weights for policy 0, policy_version 680195 (0.0016) [2024-06-15 20:27:38,101][1651340] Signal inference workers to stop experience collection... (34950 times) [2024-06-15 20:27:38,151][1652475] Updated weights for policy 0, policy_version 680258 (0.0011) [2024-06-15 20:27:38,173][1652475] InferenceWorker_p0-w0: stopping experience collection (34950 times) [2024-06-15 20:27:38,483][1651340] Signal inference workers to resume experience collection... (34950 times) [2024-06-15 20:27:38,484][1652475] InferenceWorker_p0-w0: resuming experience collection (34950 times) [2024-06-15 20:27:39,787][1652475] Updated weights for policy 0, policy_version 680320 (0.0025) [2024-06-15 20:27:40,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 43144.6, 300 sec: 43431.5). Total num frames: 1393295360. Throughput: 0: 10626.9. Samples: 348372992. Policy #0 lag: (min: 13.0, avg: 109.0, max: 269.0) [2024-06-15 20:27:40,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 20:27:44,220][1652475] Updated weights for policy 0, policy_version 680381 (0.0010) [2024-06-15 20:27:45,738][1648984] Fps is (10 sec: 49152.9, 60 sec: 43144.7, 300 sec: 43653.7). Total num frames: 1393524736. Throughput: 0: 10672.9. Samples: 348437504. 
Policy #0 lag: (min: 13.0, avg: 109.0, max: 269.0) [2024-06-15 20:27:45,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 20:27:45,790][1652475] Updated weights for policy 0, policy_version 680443 (0.0054) [2024-06-15 20:27:49,240][1652475] Updated weights for policy 0, policy_version 680506 (0.0015) [2024-06-15 20:27:50,738][1648984] Fps is (10 sec: 39321.3, 60 sec: 43690.6, 300 sec: 43431.5). Total num frames: 1393688576. Throughput: 0: 10717.9. Samples: 348472320. Policy #0 lag: (min: 13.0, avg: 109.0, max: 269.0) [2024-06-15 20:27:50,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 20:27:53,436][1652475] Updated weights for policy 0, policy_version 680571 (0.0013) [2024-06-15 20:27:55,738][1648984] Fps is (10 sec: 36044.6, 60 sec: 42052.3, 300 sec: 43320.4). Total num frames: 1393885184. Throughput: 0: 10649.6. Samples: 348538368. Policy #0 lag: (min: 13.0, avg: 109.0, max: 269.0) [2024-06-15 20:27:55,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 20:27:56,149][1652475] Updated weights for policy 0, policy_version 680640 (0.0013) [2024-06-15 20:27:57,444][1652475] Updated weights for policy 0, policy_version 680700 (0.0017) [2024-06-15 20:28:00,738][1648984] Fps is (10 sec: 42598.7, 60 sec: 43146.2, 300 sec: 43542.6). Total num frames: 1394114560. Throughput: 0: 10706.6. Samples: 348604416. Policy #0 lag: (min: 13.0, avg: 109.0, max: 269.0) [2024-06-15 20:28:00,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 20:28:01,558][1652475] Updated weights for policy 0, policy_version 680757 (0.0014) [2024-06-15 20:28:04,873][1652475] Updated weights for policy 0, policy_version 680791 (0.0015) [2024-06-15 20:28:05,738][1648984] Fps is (10 sec: 45874.9, 60 sec: 43144.5, 300 sec: 43542.6). Total num frames: 1394343936. Throughput: 0: 10638.2. Samples: 348638208. 
Policy #0 lag: (min: 13.0, avg: 109.0, max: 269.0) [2024-06-15 20:28:05,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 20:28:06,587][1652475] Updated weights for policy 0, policy_version 680850 (0.0065) [2024-06-15 20:28:07,639][1652475] Updated weights for policy 0, policy_version 680899 (0.0011) [2024-06-15 20:28:08,478][1652475] Updated weights for policy 0, policy_version 680944 (0.0020) [2024-06-15 20:28:10,738][1648984] Fps is (10 sec: 49151.9, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 1394606080. Throughput: 0: 10820.3. Samples: 348708352. Policy #0 lag: (min: 13.0, avg: 109.0, max: 269.0) [2024-06-15 20:28:10,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 20:28:12,247][1652475] Updated weights for policy 0, policy_version 681015 (0.0013) [2024-06-15 20:28:15,742][1648984] Fps is (10 sec: 42579.1, 60 sec: 42049.3, 300 sec: 43208.7). Total num frames: 1394769920. Throughput: 0: 10921.5. Samples: 348781568. Policy #0 lag: (min: 13.0, avg: 109.0, max: 269.0) [2024-06-15 20:28:15,743][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 20:28:16,275][1652475] Updated weights for policy 0, policy_version 681063 (0.0024) [2024-06-15 20:28:17,318][1652475] Updated weights for policy 0, policy_version 681105 (0.0012) [2024-06-15 20:28:18,965][1652475] Updated weights for policy 0, policy_version 681168 (0.0014) [2024-06-15 20:28:20,225][1652475] Updated weights for policy 0, policy_version 681216 (0.0012) [2024-06-15 20:28:20,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 45329.1, 300 sec: 43653.6). Total num frames: 1395130368. Throughput: 0: 11093.4. Samples: 348813312. Policy #0 lag: (min: 13.0, avg: 109.0, max: 269.0) [2024-06-15 20:28:20,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 20:28:23,686][1652475] Updated weights for policy 0, policy_version 681277 (0.0011) [2024-06-15 20:28:25,738][1648984] Fps is (10 sec: 49174.5, 60 sec: 42052.3, 300 sec: 43320.4). Total num frames: 1395261440. 
Throughput: 0: 11298.1. Samples: 348881408. Policy #0 lag: (min: 13.0, avg: 109.0, max: 269.0) [2024-06-15 20:28:25,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 20:28:27,283][1651340] Signal inference workers to stop experience collection... (35000 times) [2024-06-15 20:28:27,322][1652475] InferenceWorker_p0-w0: stopping experience collection (35000 times) [2024-06-15 20:28:27,524][1651340] Signal inference workers to resume experience collection... (35000 times) [2024-06-15 20:28:27,524][1652475] InferenceWorker_p0-w0: resuming experience collection (35000 times) [2024-06-15 20:28:28,665][1652475] Updated weights for policy 0, policy_version 681349 (0.0013) [2024-06-15 20:28:29,893][1652475] Updated weights for policy 0, policy_version 681402 (0.0013) [2024-06-15 20:28:30,738][1648984] Fps is (10 sec: 39320.0, 60 sec: 45874.9, 300 sec: 43542.5). Total num frames: 1395523584. Throughput: 0: 11320.8. Samples: 348946944. Policy #0 lag: (min: 13.0, avg: 109.0, max: 269.0) [2024-06-15 20:28:30,739][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 20:28:31,908][1652475] Updated weights for policy 0, policy_version 681456 (0.0014) [2024-06-15 20:28:35,738][1648984] Fps is (10 sec: 45874.9, 60 sec: 44783.0, 300 sec: 43542.6). Total num frames: 1395720192. Throughput: 0: 11298.1. Samples: 348980736. Policy #0 lag: (min: 13.0, avg: 109.0, max: 269.0) [2024-06-15 20:28:35,739][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 20:28:35,840][1652475] Updated weights for policy 0, policy_version 681507 (0.0012) [2024-06-15 20:28:39,107][1652475] Updated weights for policy 0, policy_version 681552 (0.0012) [2024-06-15 20:28:40,738][1648984] Fps is (10 sec: 45876.6, 60 sec: 44782.8, 300 sec: 43764.7). Total num frames: 1395982336. Throughput: 0: 11320.9. Samples: 349047808. 
Policy #0 lag: (min: 13.0, avg: 109.0, max: 269.0) [2024-06-15 20:28:40,739][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 20:28:41,044][1652475] Updated weights for policy 0, policy_version 681648 (0.0012) [2024-06-15 20:28:43,522][1652475] Updated weights for policy 0, policy_version 681724 (0.0013) [2024-06-15 20:28:45,738][1648984] Fps is (10 sec: 45875.3, 60 sec: 44236.7, 300 sec: 43542.6). Total num frames: 1396178944. Throughput: 0: 11241.2. Samples: 349110272. Policy #0 lag: (min: 13.0, avg: 109.0, max: 269.0) [2024-06-15 20:28:45,739][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 20:28:50,075][1652475] Updated weights for policy 0, policy_version 681789 (0.0108) [2024-06-15 20:28:50,738][1648984] Fps is (10 sec: 36044.9, 60 sec: 44236.8, 300 sec: 43209.3). Total num frames: 1396342784. Throughput: 0: 11377.8. Samples: 349150208. Policy #0 lag: (min: 13.0, avg: 109.0, max: 269.0) [2024-06-15 20:28:50,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 20:28:51,446][1652475] Updated weights for policy 0, policy_version 681852 (0.0015) [2024-06-15 20:28:53,103][1652475] Updated weights for policy 0, policy_version 681910 (0.0029) [2024-06-15 20:28:54,405][1652475] Updated weights for policy 0, policy_version 681952 (0.0011) [2024-06-15 20:28:55,738][1648984] Fps is (10 sec: 52427.0, 60 sec: 46967.2, 300 sec: 43986.8). Total num frames: 1396703232. Throughput: 0: 11047.7. Samples: 349205504. Policy #0 lag: (min: 13.0, avg: 109.0, max: 269.0) [2024-06-15 20:28:55,739][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 20:28:55,746][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000681984_1396703232.pth... [2024-06-15 20:28:55,830][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000676800_1386086400.pth [2024-06-15 20:29:00,738][1648984] Fps is (10 sec: 36045.1, 60 sec: 43144.5, 300 sec: 43098.4). Total num frames: 1396703232. 
Throughput: 0: 11003.4. Samples: 349276672. Policy #0 lag: (min: 13.0, avg: 109.0, max: 269.0) [2024-06-15 20:29:00,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 20:29:02,492][1652475] Updated weights for policy 0, policy_version 682032 (0.0110) [2024-06-15 20:29:04,863][1652475] Updated weights for policy 0, policy_version 682113 (0.0012) [2024-06-15 20:29:05,738][1648984] Fps is (10 sec: 32769.4, 60 sec: 44783.0, 300 sec: 43764.7). Total num frames: 1397030912. Throughput: 0: 10956.8. Samples: 349306368. Policy #0 lag: (min: 13.0, avg: 109.0, max: 269.0) [2024-06-15 20:29:05,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 20:29:07,286][1652475] Updated weights for policy 0, policy_version 682184 (0.0041) [2024-06-15 20:29:08,443][1652475] Updated weights for policy 0, policy_version 682237 (0.0012) [2024-06-15 20:29:10,738][1648984] Fps is (10 sec: 52428.5, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 1397227520. Throughput: 0: 10763.4. Samples: 349365760. Policy #0 lag: (min: 13.0, avg: 109.0, max: 269.0) [2024-06-15 20:29:10,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 20:29:13,045][1651340] Signal inference workers to stop experience collection... (35050 times) [2024-06-15 20:29:13,117][1652475] InferenceWorker_p0-w0: stopping experience collection (35050 times) [2024-06-15 20:29:13,385][1651340] Signal inference workers to resume experience collection... (35050 times) [2024-06-15 20:29:13,402][1652475] InferenceWorker_p0-w0: resuming experience collection (35050 times) [2024-06-15 20:29:14,671][1652475] Updated weights for policy 0, policy_version 682307 (0.0012) [2024-06-15 20:29:15,738][1648984] Fps is (10 sec: 39321.3, 60 sec: 44240.2, 300 sec: 43320.4). Total num frames: 1397424128. Throughput: 0: 10786.2. Samples: 349432320. 
Policy #0 lag: (min: 15.0, avg: 74.3, max: 271.0) [2024-06-15 20:29:15,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 20:29:17,087][1652475] Updated weights for policy 0, policy_version 682402 (0.0011) [2024-06-15 20:29:20,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 41506.1, 300 sec: 43542.6). Total num frames: 1397620736. Throughput: 0: 10615.5. Samples: 349458432. Policy #0 lag: (min: 15.0, avg: 74.3, max: 271.0) [2024-06-15 20:29:20,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 20:29:21,241][1652475] Updated weights for policy 0, policy_version 682448 (0.0013) [2024-06-15 20:29:25,738][1648984] Fps is (10 sec: 36044.6, 60 sec: 42052.2, 300 sec: 42987.2). Total num frames: 1397784576. Throughput: 0: 10717.9. Samples: 349530112. Policy #0 lag: (min: 15.0, avg: 74.3, max: 271.0) [2024-06-15 20:29:25,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 20:29:25,973][1652475] Updated weights for policy 0, policy_version 682528 (0.0015) [2024-06-15 20:29:27,929][1652475] Updated weights for policy 0, policy_version 682599 (0.0125) [2024-06-15 20:29:29,630][1652475] Updated weights for policy 0, policy_version 682677 (0.0014) [2024-06-15 20:29:30,738][1648984] Fps is (10 sec: 52429.1, 60 sec: 43691.0, 300 sec: 43875.8). Total num frames: 1398145024. Throughput: 0: 10558.6. Samples: 349585408. Policy #0 lag: (min: 15.0, avg: 74.3, max: 271.0) [2024-06-15 20:29:30,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 20:29:33,739][1652475] Updated weights for policy 0, policy_version 682704 (0.0014) [2024-06-15 20:29:35,738][1648984] Fps is (10 sec: 49152.3, 60 sec: 42598.4, 300 sec: 43431.5). Total num frames: 1398276096. Throughput: 0: 10570.0. Samples: 349625856. 
Policy #0 lag: (min: 15.0, avg: 74.3, max: 271.0) [2024-06-15 20:29:35,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 20:29:37,084][1652475] Updated weights for policy 0, policy_version 682753 (0.0020) [2024-06-15 20:29:39,257][1652475] Updated weights for policy 0, policy_version 682850 (0.0105) [2024-06-15 20:29:40,738][1648984] Fps is (10 sec: 39321.2, 60 sec: 42598.4, 300 sec: 43653.6). Total num frames: 1398538240. Throughput: 0: 10638.3. Samples: 349684224. Policy #0 lag: (min: 15.0, avg: 74.3, max: 271.0) [2024-06-15 20:29:40,739][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 20:29:41,584][1652475] Updated weights for policy 0, policy_version 682896 (0.0011) [2024-06-15 20:29:45,738][1648984] Fps is (10 sec: 39321.9, 60 sec: 41506.2, 300 sec: 43542.6). Total num frames: 1398669312. Throughput: 0: 10558.6. Samples: 349751808. Policy #0 lag: (min: 15.0, avg: 74.3, max: 271.0) [2024-06-15 20:29:45,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 20:29:45,856][1652475] Updated weights for policy 0, policy_version 682960 (0.0178) [2024-06-15 20:29:49,681][1652475] Updated weights for policy 0, policy_version 683048 (0.0208) [2024-06-15 20:29:50,738][1648984] Fps is (10 sec: 42598.9, 60 sec: 43690.7, 300 sec: 43653.7). Total num frames: 1398964224. Throughput: 0: 10661.0. Samples: 349786112. Policy #0 lag: (min: 15.0, avg: 74.3, max: 271.0) [2024-06-15 20:29:50,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 20:29:50,934][1652475] Updated weights for policy 0, policy_version 683104 (0.0107) [2024-06-15 20:29:54,766][1652475] Updated weights for policy 0, policy_version 683155 (0.0040) [2024-06-15 20:29:55,738][1648984] Fps is (10 sec: 52428.3, 60 sec: 41506.4, 300 sec: 43542.6). Total num frames: 1399193600. Throughput: 0: 10865.8. Samples: 349854720. 
Policy #0 lag: (min: 15.0, avg: 74.3, max: 271.0) [2024-06-15 20:29:55,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 20:29:56,920][1651340] Signal inference workers to stop experience collection... (35100 times) [2024-06-15 20:29:57,010][1652475] InferenceWorker_p0-w0: stopping experience collection (35100 times) [2024-06-15 20:29:57,164][1651340] Signal inference workers to resume experience collection... (35100 times) [2024-06-15 20:29:57,165][1652475] InferenceWorker_p0-w0: resuming experience collection (35100 times) [2024-06-15 20:29:57,939][1652475] Updated weights for policy 0, policy_version 683248 (0.0012) [2024-06-15 20:30:00,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 44782.9, 300 sec: 43764.8). Total num frames: 1399390208. Throughput: 0: 10797.5. Samples: 349918208. Policy #0 lag: (min: 15.0, avg: 74.3, max: 271.0) [2024-06-15 20:30:00,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 20:30:00,965][1652475] Updated weights for policy 0, policy_version 683323 (0.0117) [2024-06-15 20:30:05,738][1648984] Fps is (10 sec: 39322.0, 60 sec: 42598.4, 300 sec: 43542.6). Total num frames: 1399586816. Throughput: 0: 11036.5. Samples: 349955072. Policy #0 lag: (min: 15.0, avg: 74.3, max: 271.0) [2024-06-15 20:30:05,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 20:30:06,292][1652475] Updated weights for policy 0, policy_version 683408 (0.0097) [2024-06-15 20:30:08,928][1652475] Updated weights for policy 0, policy_version 683472 (0.0028) [2024-06-15 20:30:10,083][1652475] Updated weights for policy 0, policy_version 683517 (0.0016) [2024-06-15 20:30:10,738][1648984] Fps is (10 sec: 45875.0, 60 sec: 43690.7, 300 sec: 43653.6). Total num frames: 1399848960. Throughput: 0: 10717.9. Samples: 350012416. 
Policy #0 lag: (min: 15.0, avg: 74.3, max: 271.0) [2024-06-15 20:30:10,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 20:30:12,214][1652475] Updated weights for policy 0, policy_version 683556 (0.0012) [2024-06-15 20:30:15,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 42598.4, 300 sec: 43209.4). Total num frames: 1399980032. Throughput: 0: 11116.1. Samples: 350085632. Policy #0 lag: (min: 15.0, avg: 74.3, max: 271.0) [2024-06-15 20:30:15,740][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 20:30:16,736][1652475] Updated weights for policy 0, policy_version 683605 (0.0014) [2024-06-15 20:30:18,569][1652475] Updated weights for policy 0, policy_version 683683 (0.0011) [2024-06-15 20:30:20,355][1652475] Updated weights for policy 0, policy_version 683717 (0.0011) [2024-06-15 20:30:20,738][1648984] Fps is (10 sec: 42597.9, 60 sec: 44236.8, 300 sec: 43764.7). Total num frames: 1400274944. Throughput: 0: 10843.0. Samples: 350113792. Policy #0 lag: (min: 15.0, avg: 74.3, max: 271.0) [2024-06-15 20:30:20,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 20:30:21,744][1652475] Updated weights for policy 0, policy_version 683770 (0.0011) [2024-06-15 20:30:24,666][1652475] Updated weights for policy 0, policy_version 683839 (0.0014) [2024-06-15 20:30:25,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 45329.1, 300 sec: 43653.6). Total num frames: 1400504320. Throughput: 0: 11002.3. Samples: 350179328. Policy #0 lag: (min: 15.0, avg: 74.3, max: 271.0) [2024-06-15 20:30:25,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 20:30:29,691][1652475] Updated weights for policy 0, policy_version 683904 (0.0012) [2024-06-15 20:30:30,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 42598.3, 300 sec: 43764.7). Total num frames: 1400700928. Throughput: 0: 10990.9. Samples: 350246400. 
Policy #0 lag: (min: 15.0, avg: 74.3, max: 271.0) [2024-06-15 20:30:30,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 20:30:32,406][1652475] Updated weights for policy 0, policy_version 683970 (0.0011) [2024-06-15 20:30:35,048][1652475] Updated weights for policy 0, policy_version 684033 (0.0013) [2024-06-15 20:30:35,738][1648984] Fps is (10 sec: 45875.1, 60 sec: 44782.9, 300 sec: 43764.7). Total num frames: 1400963072. Throughput: 0: 10854.4. Samples: 350274560. Policy #0 lag: (min: 15.0, avg: 74.3, max: 271.0) [2024-06-15 20:30:35,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 20:30:36,415][1652475] Updated weights for policy 0, policy_version 684096 (0.0012) [2024-06-15 20:30:40,740][1648984] Fps is (10 sec: 36045.0, 60 sec: 42052.3, 300 sec: 43209.3). Total num frames: 1401061376. Throughput: 0: 10968.2. Samples: 350348288. Policy #0 lag: (min: 15.0, avg: 74.3, max: 271.0) [2024-06-15 20:30:40,741][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 20:30:41,946][1652475] Updated weights for policy 0, policy_version 684159 (0.0013) [2024-06-15 20:30:42,892][1651340] Signal inference workers to stop experience collection... (35150 times) [2024-06-15 20:30:42,928][1652475] InferenceWorker_p0-w0: stopping experience collection (35150 times) [2024-06-15 20:30:43,202][1651340] Signal inference workers to resume experience collection... (35150 times) [2024-06-15 20:30:43,203][1652475] InferenceWorker_p0-w0: resuming experience collection (35150 times) [2024-06-15 20:30:43,383][1652475] Updated weights for policy 0, policy_version 684214 (0.0011) [2024-06-15 20:30:45,121][1652475] Updated weights for policy 0, policy_version 684244 (0.0012) [2024-06-15 20:30:45,738][1648984] Fps is (10 sec: 42598.7, 60 sec: 45329.1, 300 sec: 43653.6). Total num frames: 1401389056. Throughput: 0: 10911.3. Samples: 350409216. 
Policy #0 lag: (min: 15.0, avg: 74.3, max: 271.0) [2024-06-15 20:30:45,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 20:30:47,317][1652475] Updated weights for policy 0, policy_version 684320 (0.0014) [2024-06-15 20:30:50,738][1648984] Fps is (10 sec: 49150.8, 60 sec: 43144.3, 300 sec: 43431.4). Total num frames: 1401552896. Throughput: 0: 10751.9. Samples: 350438912. Policy #0 lag: (min: 15.0, avg: 74.3, max: 271.0) [2024-06-15 20:30:50,739][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 20:30:53,230][1652475] Updated weights for policy 0, policy_version 684369 (0.0010) [2024-06-15 20:30:54,788][1652475] Updated weights for policy 0, policy_version 684432 (0.0013) [2024-06-15 20:30:55,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 43144.6, 300 sec: 43653.6). Total num frames: 1401782272. Throughput: 0: 11002.3. Samples: 350507520. Policy #0 lag: (min: 15.0, avg: 88.7, max: 271.0) [2024-06-15 20:30:55,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 20:30:55,761][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000684480_1401815040.pth... [2024-06-15 20:30:55,811][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000679408_1391427584.pth [2024-06-15 20:30:55,816][1651340] Saving a milestone train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/milestones/checkpoint_000684480_1401815040.pth [2024-06-15 20:30:56,841][1652475] Updated weights for policy 0, policy_version 684496 (0.0011) [2024-06-15 20:31:00,366][1652475] Updated weights for policy 0, policy_version 684560 (0.0012) [2024-06-15 20:31:00,740][1648984] Fps is (10 sec: 42589.6, 60 sec: 43142.8, 300 sec: 43542.2). Total num frames: 1401978880. Throughput: 0: 10831.1. Samples: 350573056. 
Policy #0 lag: (min: 15.0, avg: 88.7, max: 271.0) [2024-06-15 20:31:00,741][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 20:31:05,738][1648984] Fps is (10 sec: 39321.0, 60 sec: 43144.4, 300 sec: 43431.5). Total num frames: 1402175488. Throughput: 0: 10808.9. Samples: 350600192. Policy #0 lag: (min: 15.0, avg: 88.7, max: 271.0) [2024-06-15 20:31:05,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 20:31:05,942][1652475] Updated weights for policy 0, policy_version 684672 (0.0167) [2024-06-15 20:31:07,202][1652475] Updated weights for policy 0, policy_version 684733 (0.0029) [2024-06-15 20:31:10,079][1652475] Updated weights for policy 0, policy_version 684791 (0.0011) [2024-06-15 20:31:10,738][1648984] Fps is (10 sec: 49163.4, 60 sec: 43690.6, 300 sec: 43543.0). Total num frames: 1402470400. Throughput: 0: 10706.5. Samples: 350661120. Policy #0 lag: (min: 15.0, avg: 88.7, max: 271.0) [2024-06-15 20:31:10,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 20:31:15,097][1652475] Updated weights for policy 0, policy_version 684858 (0.0081) [2024-06-15 20:31:15,738][1648984] Fps is (10 sec: 42598.8, 60 sec: 43690.6, 300 sec: 43102.4). Total num frames: 1402601472. Throughput: 0: 10752.0. Samples: 350730240. Policy #0 lag: (min: 15.0, avg: 88.7, max: 271.0) [2024-06-15 20:31:15,738][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 20:31:17,189][1652475] Updated weights for policy 0, policy_version 684900 (0.0015) [2024-06-15 20:31:20,739][1648984] Fps is (10 sec: 39319.8, 60 sec: 43144.2, 300 sec: 43653.6). Total num frames: 1402863616. Throughput: 0: 10774.6. Samples: 350759424. Policy #0 lag: (min: 15.0, avg: 88.7, max: 271.0) [2024-06-15 20:31:20,741][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 20:31:21,066][1652475] Updated weights for policy 0, policy_version 685008 (0.0012) [2024-06-15 20:31:25,738][1648984] Fps is (10 sec: 39322.1, 60 sec: 41506.2, 300 sec: 43098.3). Total num frames: 1402994688. 
Throughput: 0: 10604.1. Samples: 350825472. Policy #0 lag: (min: 15.0, avg: 88.7, max: 271.0) [2024-06-15 20:31:25,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 20:31:26,646][1652475] Updated weights for policy 0, policy_version 685072 (0.0078) [2024-06-15 20:31:27,895][1652475] Updated weights for policy 0, policy_version 685120 (0.0012) [2024-06-15 20:31:30,155][1651340] Signal inference workers to stop experience collection... (35200 times) [2024-06-15 20:31:30,197][1652475] InferenceWorker_p0-w0: stopping experience collection (35200 times) [2024-06-15 20:31:30,480][1651340] Signal inference workers to resume experience collection... (35200 times) [2024-06-15 20:31:30,481][1652475] InferenceWorker_p0-w0: resuming experience collection (35200 times) [2024-06-15 20:31:30,726][1652475] Updated weights for policy 0, policy_version 685203 (0.0011) [2024-06-15 20:31:30,738][1648984] Fps is (10 sec: 42600.5, 60 sec: 43144.6, 300 sec: 43653.6). Total num frames: 1403289600. Throughput: 0: 10547.2. Samples: 350883840. Policy #0 lag: (min: 15.0, avg: 88.7, max: 271.0) [2024-06-15 20:31:30,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 20:31:35,738][1648984] Fps is (10 sec: 39320.6, 60 sec: 40413.8, 300 sec: 42987.1). Total num frames: 1403387904. Throughput: 0: 10524.5. Samples: 350912512. Policy #0 lag: (min: 15.0, avg: 88.7, max: 271.0) [2024-06-15 20:31:35,739][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 20:31:36,207][1652475] Updated weights for policy 0, policy_version 685266 (0.0042) [2024-06-15 20:31:38,881][1652475] Updated weights for policy 0, policy_version 685328 (0.0013) [2024-06-15 20:31:39,915][1652475] Updated weights for policy 0, policy_version 685375 (0.0011) [2024-06-15 20:31:40,738][1648984] Fps is (10 sec: 36045.0, 60 sec: 43144.6, 300 sec: 43098.3). Total num frames: 1403650048. Throughput: 0: 10501.7. Samples: 350980096. 
Policy #0 lag: (min: 15.0, avg: 88.7, max: 271.0) [2024-06-15 20:31:40,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 20:31:41,503][1652475] Updated weights for policy 0, policy_version 685424 (0.0013) [2024-06-15 20:31:43,446][1652475] Updated weights for policy 0, policy_version 685502 (0.0014) [2024-06-15 20:31:45,738][1648984] Fps is (10 sec: 52429.5, 60 sec: 42052.2, 300 sec: 43542.6). Total num frames: 1403912192. Throughput: 0: 10377.1. Samples: 351040000. Policy #0 lag: (min: 15.0, avg: 88.7, max: 271.0) [2024-06-15 20:31:45,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 20:31:50,738][1648984] Fps is (10 sec: 32767.0, 60 sec: 40413.9, 300 sec: 42765.0). Total num frames: 1403977728. Throughput: 0: 10569.9. Samples: 351075840. Policy #0 lag: (min: 15.0, avg: 88.7, max: 271.0) [2024-06-15 20:31:50,739][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 20:31:51,544][1652475] Updated weights for policy 0, policy_version 685569 (0.0012) [2024-06-15 20:31:53,117][1652475] Updated weights for policy 0, policy_version 685634 (0.0012) [2024-06-15 20:31:55,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 42052.2, 300 sec: 43320.7). Total num frames: 1404305408. Throughput: 0: 10274.1. Samples: 351123456. Policy #0 lag: (min: 15.0, avg: 88.7, max: 271.0) [2024-06-15 20:31:55,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 20:31:56,751][1652475] Updated weights for policy 0, policy_version 685714 (0.0242) [2024-06-15 20:31:57,798][1652475] Updated weights for policy 0, policy_version 685753 (0.0011) [2024-06-15 20:32:00,738][1648984] Fps is (10 sec: 45876.5, 60 sec: 40961.6, 300 sec: 42987.2). Total num frames: 1404436480. Throughput: 0: 10160.4. Samples: 351187456. 
Policy #0 lag: (min: 15.0, avg: 88.7, max: 271.0) [2024-06-15 20:32:00,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 20:32:02,908][1652475] Updated weights for policy 0, policy_version 685797 (0.0012) [2024-06-15 20:32:04,506][1652475] Updated weights for policy 0, policy_version 685858 (0.0087) [2024-06-15 20:32:05,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 42052.3, 300 sec: 43098.3). Total num frames: 1404698624. Throughput: 0: 10274.2. Samples: 351221760. Policy #0 lag: (min: 15.0, avg: 88.7, max: 271.0) [2024-06-15 20:32:05,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 20:32:06,521][1652475] Updated weights for policy 0, policy_version 685920 (0.0019) [2024-06-15 20:32:10,373][1652475] Updated weights for policy 0, policy_version 685968 (0.0012) [2024-06-15 20:32:10,738][1648984] Fps is (10 sec: 45874.7, 60 sec: 40413.9, 300 sec: 42876.1). Total num frames: 1404895232. Throughput: 0: 10194.5. Samples: 351284224. Policy #0 lag: (min: 15.0, avg: 88.7, max: 271.0) [2024-06-15 20:32:10,738][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 20:32:11,173][1652475] Updated weights for policy 0, policy_version 686011 (0.0012) [2024-06-15 20:32:14,415][1652475] Updated weights for policy 0, policy_version 686080 (0.0012) [2024-06-15 20:32:15,738][1648984] Fps is (10 sec: 45875.6, 60 sec: 42598.5, 300 sec: 43209.3). Total num frames: 1405157376. Throughput: 0: 10353.8. Samples: 351349760. Policy #0 lag: (min: 15.0, avg: 88.7, max: 271.0) [2024-06-15 20:32:15,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 20:32:16,178][1652475] Updated weights for policy 0, policy_version 686138 (0.0013) [2024-06-15 20:32:20,052][1651340] Signal inference workers to stop experience collection... (35250 times) [2024-06-15 20:32:20,108][1652475] InferenceWorker_p0-w0: stopping experience collection (35250 times) [2024-06-15 20:32:20,227][1651340] Signal inference workers to resume experience collection... 
(35250 times) [2024-06-15 20:32:20,228][1652475] InferenceWorker_p0-w0: resuming experience collection (35250 times) [2024-06-15 20:32:20,545][1652475] Updated weights for policy 0, policy_version 686192 (0.0011) [2024-06-15 20:32:20,738][1648984] Fps is (10 sec: 42598.9, 60 sec: 40960.4, 300 sec: 42653.9). Total num frames: 1405321216. Throughput: 0: 10467.6. Samples: 351383552. Policy #0 lag: (min: 15.0, avg: 88.7, max: 271.0) [2024-06-15 20:32:20,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 20:32:21,981][1652475] Updated weights for policy 0, policy_version 686224 (0.0012) [2024-06-15 20:32:24,925][1652475] Updated weights for policy 0, policy_version 686304 (0.0012) [2024-06-15 20:32:25,738][1648984] Fps is (10 sec: 45874.8, 60 sec: 43690.6, 300 sec: 43542.6). Total num frames: 1405616128. Throughput: 0: 10478.9. Samples: 351451648. Policy #0 lag: (min: 15.0, avg: 88.7, max: 271.0) [2024-06-15 20:32:25,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 20:32:26,017][1652475] Updated weights for policy 0, policy_version 686337 (0.0041) [2024-06-15 20:32:27,153][1652475] Updated weights for policy 0, policy_version 686390 (0.0012) [2024-06-15 20:32:30,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 40960.0, 300 sec: 43098.3). Total num frames: 1405747200. Throughput: 0: 10638.2. Samples: 351518720. Policy #0 lag: (min: 15.0, avg: 88.7, max: 271.0) [2024-06-15 20:32:30,738][1648984] Avg episode reward: [(0, '-0.210')] [2024-06-15 20:32:33,894][1652475] Updated weights for policy 0, policy_version 686464 (0.0132) [2024-06-15 20:32:35,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 43690.8, 300 sec: 43098.2). Total num frames: 1406009344. Throughput: 0: 10661.0. Samples: 351555584. 
Policy #0 lag: (min: 15.0, avg: 85.6, max: 271.0) [2024-06-15 20:32:35,738][1648984] Avg episode reward: [(0, '-0.190')] [2024-06-15 20:32:36,327][1652475] Updated weights for policy 0, policy_version 686560 (0.0013) [2024-06-15 20:32:38,838][1652475] Updated weights for policy 0, policy_version 686640 (0.0014) [2024-06-15 20:32:40,738][1648984] Fps is (10 sec: 52428.4, 60 sec: 43690.6, 300 sec: 43209.3). Total num frames: 1406271488. Throughput: 0: 10808.9. Samples: 351609856. Policy #0 lag: (min: 15.0, avg: 85.6, max: 271.0) [2024-06-15 20:32:40,738][1648984] Avg episode reward: [(0, '-0.200')] [2024-06-15 20:32:44,585][1652475] Updated weights for policy 0, policy_version 686688 (0.0013) [2024-06-15 20:32:45,738][1648984] Fps is (10 sec: 42598.2, 60 sec: 42052.3, 300 sec: 43209.3). Total num frames: 1406435328. Throughput: 0: 11252.6. Samples: 351693824. Policy #0 lag: (min: 15.0, avg: 85.6, max: 271.0) [2024-06-15 20:32:45,738][1648984] Avg episode reward: [(0, '-0.200')] [2024-06-15 20:32:45,820][1652475] Updated weights for policy 0, policy_version 686752 (0.0129) [2024-06-15 20:32:48,019][1652475] Updated weights for policy 0, policy_version 686840 (0.0034) [2024-06-15 20:32:50,176][1652475] Updated weights for policy 0, policy_version 686896 (0.0019) [2024-06-15 20:32:50,738][1648984] Fps is (10 sec: 52427.0, 60 sec: 46967.4, 300 sec: 43764.7). Total num frames: 1406795776. Throughput: 0: 11036.4. Samples: 351718400. Policy #0 lag: (min: 15.0, avg: 85.6, max: 271.0) [2024-06-15 20:32:50,739][1648984] Avg episode reward: [(0, '-0.200')] [2024-06-15 20:32:55,626][1652475] Updated weights for policy 0, policy_version 686932 (0.0011) [2024-06-15 20:32:55,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 42052.3, 300 sec: 43098.2). Total num frames: 1406828544. Throughput: 0: 11355.0. Samples: 351795200. 
Policy #0 lag: (min: 15.0, avg: 85.6, max: 271.0) [2024-06-15 20:32:55,738][1648984] Avg episode reward: [(0, '-0.200')] [2024-06-15 20:32:56,052][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000686960_1406894080.pth... [2024-06-15 20:32:56,102][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000681984_1396703232.pth [2024-06-15 20:32:56,831][1652475] Updated weights for policy 0, policy_version 686986 (0.0010) [2024-06-15 20:32:58,721][1652475] Updated weights for policy 0, policy_version 687056 (0.0013) [2024-06-15 20:33:00,738][1648984] Fps is (10 sec: 39322.5, 60 sec: 45875.1, 300 sec: 43542.5). Total num frames: 1407188992. Throughput: 0: 11252.6. Samples: 351856128. Policy #0 lag: (min: 15.0, avg: 85.6, max: 271.0) [2024-06-15 20:33:00,739][1648984] Avg episode reward: [(0, '-0.200')] [2024-06-15 20:33:00,794][1652475] Updated weights for policy 0, policy_version 687106 (0.0011) [2024-06-15 20:33:01,170][1651340] Signal inference workers to stop experience collection... (35300 times) [2024-06-15 20:33:01,238][1652475] InferenceWorker_p0-w0: stopping experience collection (35300 times) [2024-06-15 20:33:01,432][1651340] Signal inference workers to resume experience collection... (35300 times) [2024-06-15 20:33:01,433][1652475] InferenceWorker_p0-w0: resuming experience collection (35300 times) [2024-06-15 20:33:02,115][1652475] Updated weights for policy 0, policy_version 687159 (0.0050) [2024-06-15 20:33:05,738][1648984] Fps is (10 sec: 49151.8, 60 sec: 43690.7, 300 sec: 43098.2). Total num frames: 1407320064. Throughput: 0: 11366.4. Samples: 351895040. 
Policy #0 lag: (min: 15.0, avg: 85.6, max: 271.0) [2024-06-15 20:33:05,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 20:33:06,837][1652475] Updated weights for policy 0, policy_version 687200 (0.0013) [2024-06-15 20:33:08,665][1652475] Updated weights for policy 0, policy_version 687252 (0.0012) [2024-06-15 20:33:09,879][1652475] Updated weights for policy 0, policy_version 687298 (0.0011) [2024-06-15 20:33:10,738][1648984] Fps is (10 sec: 45875.5, 60 sec: 45875.2, 300 sec: 43654.3). Total num frames: 1407647744. Throughput: 0: 11457.4. Samples: 351967232. Policy #0 lag: (min: 15.0, avg: 85.6, max: 271.0) [2024-06-15 20:33:10,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 20:33:12,486][1652475] Updated weights for policy 0, policy_version 687376 (0.0013) [2024-06-15 20:33:13,594][1652475] Updated weights for policy 0, policy_version 687419 (0.0036) [2024-06-15 20:33:15,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 44782.9, 300 sec: 43098.2). Total num frames: 1407844352. Throughput: 0: 11423.3. Samples: 352032768. Policy #0 lag: (min: 15.0, avg: 85.6, max: 271.0) [2024-06-15 20:33:15,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 20:33:18,212][1652475] Updated weights for policy 0, policy_version 687487 (0.0135) [2024-06-15 20:33:20,746][1648984] Fps is (10 sec: 42562.3, 60 sec: 45868.6, 300 sec: 43430.2). Total num frames: 1408073728. Throughput: 0: 11307.4. Samples: 352064512. Policy #0 lag: (min: 15.0, avg: 85.6, max: 271.0) [2024-06-15 20:33:20,747][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 20:33:23,565][1652475] Updated weights for policy 0, policy_version 687568 (0.0016) [2024-06-15 20:33:25,738][1648984] Fps is (10 sec: 45875.3, 60 sec: 44783.0, 300 sec: 43320.5). Total num frames: 1408303104. Throughput: 0: 11491.6. Samples: 352126976. 
Policy #0 lag: (min: 15.0, avg: 85.6, max: 271.0) [2024-06-15 20:33:25,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 20:33:25,839][1652475] Updated weights for policy 0, policy_version 687649 (0.0093) [2024-06-15 20:33:30,294][1652475] Updated weights for policy 0, policy_version 687712 (0.0013) [2024-06-15 20:33:30,738][1648984] Fps is (10 sec: 39354.9, 60 sec: 45329.0, 300 sec: 43209.3). Total num frames: 1408466944. Throughput: 0: 11013.7. Samples: 352189440. Policy #0 lag: (min: 15.0, avg: 85.6, max: 271.0) [2024-06-15 20:33:30,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 20:33:32,024][1652475] Updated weights for policy 0, policy_version 687765 (0.0012) [2024-06-15 20:33:35,738][1648984] Fps is (10 sec: 32767.6, 60 sec: 43690.6, 300 sec: 42876.1). Total num frames: 1408630784. Throughput: 0: 11116.1. Samples: 352218624. Policy #0 lag: (min: 15.0, avg: 85.6, max: 271.0) [2024-06-15 20:33:35,739][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 20:33:36,310][1652475] Updated weights for policy 0, policy_version 687824 (0.0011) [2024-06-15 20:33:38,135][1652475] Updated weights for policy 0, policy_version 687891 (0.0011) [2024-06-15 20:33:40,738][1648984] Fps is (10 sec: 42598.6, 60 sec: 43690.7, 300 sec: 43098.3). Total num frames: 1408892928. Throughput: 0: 10888.5. Samples: 352285184. Policy #0 lag: (min: 15.0, avg: 85.6, max: 271.0) [2024-06-15 20:33:40,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 20:33:41,789][1652475] Updated weights for policy 0, policy_version 687938 (0.0013) [2024-06-15 20:33:42,979][1652475] Updated weights for policy 0, policy_version 687997 (0.0013) [2024-06-15 20:33:45,738][1648984] Fps is (10 sec: 52429.6, 60 sec: 45329.1, 300 sec: 43431.5). Total num frames: 1409155072. Throughput: 0: 10956.8. Samples: 352349184. 
Policy #0 lag: (min: 15.0, avg: 85.6, max: 271.0) [2024-06-15 20:33:45,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 20:33:47,815][1652475] Updated weights for policy 0, policy_version 688065 (0.0123) [2024-06-15 20:33:48,624][1651340] Signal inference workers to stop experience collection... (35350 times) [2024-06-15 20:33:48,692][1652475] InferenceWorker_p0-w0: stopping experience collection (35350 times) [2024-06-15 20:33:48,948][1651340] Signal inference workers to resume experience collection... (35350 times) [2024-06-15 20:33:48,948][1652475] InferenceWorker_p0-w0: resuming experience collection (35350 times) [2024-06-15 20:33:49,244][1652475] Updated weights for policy 0, policy_version 688123 (0.0011) [2024-06-15 20:33:50,435][1652475] Updated weights for policy 0, policy_version 688160 (0.0016) [2024-06-15 20:33:50,738][1648984] Fps is (10 sec: 45875.4, 60 sec: 42598.7, 300 sec: 42876.2). Total num frames: 1409351680. Throughput: 0: 10956.8. Samples: 352388096. Policy #0 lag: (min: 15.0, avg: 85.6, max: 271.0) [2024-06-15 20:33:50,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 20:33:53,251][1652475] Updated weights for policy 0, policy_version 688212 (0.0012) [2024-06-15 20:33:54,760][1652475] Updated weights for policy 0, policy_version 688277 (0.0014) [2024-06-15 20:33:55,738][1648984] Fps is (10 sec: 49150.7, 60 sec: 46967.3, 300 sec: 43875.7). Total num frames: 1409646592. Throughput: 0: 10808.8. Samples: 352453632. Policy #0 lag: (min: 15.0, avg: 85.6, max: 271.0) [2024-06-15 20:33:55,739][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 20:33:58,923][1652475] Updated weights for policy 0, policy_version 688322 (0.0011) [2024-06-15 20:34:00,345][1652475] Updated weights for policy 0, policy_version 688383 (0.0012) [2024-06-15 20:34:00,738][1648984] Fps is (10 sec: 45875.2, 60 sec: 43690.8, 300 sec: 43320.4). Total num frames: 1409810432. Throughput: 0: 10820.3. Samples: 352519680. 
Policy #0 lag: (min: 15.0, avg: 85.6, max: 271.0) [2024-06-15 20:34:00,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 20:34:04,700][1652475] Updated weights for policy 0, policy_version 688448 (0.0012) [2024-06-15 20:34:05,738][1648984] Fps is (10 sec: 39322.3, 60 sec: 45329.1, 300 sec: 43431.5). Total num frames: 1410039808. Throughput: 0: 10958.9. Samples: 352557568. Policy #0 lag: (min: 15.0, avg: 85.6, max: 271.0) [2024-06-15 20:34:05,739][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 20:34:07,011][1652475] Updated weights for policy 0, policy_version 688544 (0.0015) [2024-06-15 20:34:10,747][1648984] Fps is (10 sec: 39294.2, 60 sec: 42593.5, 300 sec: 43319.4). Total num frames: 1410203648. Throughput: 0: 10716.2. Samples: 352609280. Policy #0 lag: (min: 15.0, avg: 85.6, max: 271.0) [2024-06-15 20:34:10,751][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 20:34:11,844][1652475] Updated weights for policy 0, policy_version 688609 (0.0012) [2024-06-15 20:34:12,539][1652475] Updated weights for policy 0, policy_version 688640 (0.0013) [2024-06-15 20:34:15,738][1648984] Fps is (10 sec: 29491.3, 60 sec: 41506.1, 300 sec: 43098.2). Total num frames: 1410334720. Throughput: 0: 10865.8. Samples: 352678400. Policy #0 lag: (min: 59.0, avg: 175.7, max: 315.0) [2024-06-15 20:34:15,739][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 20:34:18,119][1652475] Updated weights for policy 0, policy_version 688720 (0.0012) [2024-06-15 20:34:20,145][1652475] Updated weights for policy 0, policy_version 688800 (0.0011) [2024-06-15 20:34:20,738][1648984] Fps is (10 sec: 49186.1, 60 sec: 43696.9, 300 sec: 43764.7). Total num frames: 1410695168. Throughput: 0: 10911.3. Samples: 352709632. 
Policy #0 lag: (min: 59.0, avg: 175.7, max: 315.0) [2024-06-15 20:34:20,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 20:34:22,830][1652475] Updated weights for policy 0, policy_version 688849 (0.0020) [2024-06-15 20:34:23,971][1652475] Updated weights for policy 0, policy_version 688896 (0.0011) [2024-06-15 20:34:25,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 42598.4, 300 sec: 43098.2). Total num frames: 1410859008. Throughput: 0: 10683.7. Samples: 352765952. Policy #0 lag: (min: 59.0, avg: 175.7, max: 315.0) [2024-06-15 20:34:25,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 20:34:30,738][1648984] Fps is (10 sec: 29491.3, 60 sec: 42052.3, 300 sec: 43098.3). Total num frames: 1410990080. Throughput: 0: 10854.4. Samples: 352837632. Policy #0 lag: (min: 59.0, avg: 175.7, max: 315.0) [2024-06-15 20:34:30,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 20:34:31,502][1652475] Updated weights for policy 0, policy_version 688976 (0.0015) [2024-06-15 20:34:33,402][1652475] Updated weights for policy 0, policy_version 689043 (0.0013) [2024-06-15 20:34:34,685][1651340] Signal inference workers to stop experience collection... (35400 times) [2024-06-15 20:34:34,724][1652475] InferenceWorker_p0-w0: stopping experience collection (35400 times) [2024-06-15 20:34:34,962][1651340] Signal inference workers to resume experience collection... (35400 times) [2024-06-15 20:34:34,962][1652475] InferenceWorker_p0-w0: resuming experience collection (35400 times) [2024-06-15 20:34:35,329][1652475] Updated weights for policy 0, policy_version 689120 (0.0012) [2024-06-15 20:34:35,738][1648984] Fps is (10 sec: 45874.7, 60 sec: 44782.9, 300 sec: 43320.4). Total num frames: 1411317760. Throughput: 0: 10592.7. Samples: 352864768. Policy #0 lag: (min: 59.0, avg: 175.7, max: 315.0) [2024-06-15 20:34:35,739][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 20:34:40,738][1648984] Fps is (10 sec: 45875.1, 60 sec: 42598.4, 300 sec: 43320.4). 
Total num frames: 1411448832. Throughput: 0: 10467.6. Samples: 352924672. Policy #0 lag: (min: 59.0, avg: 175.7, max: 315.0) [2024-06-15 20:34:40,740][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 20:34:40,827][1652475] Updated weights for policy 0, policy_version 689200 (0.0021) [2024-06-15 20:34:45,738][1648984] Fps is (10 sec: 22938.1, 60 sec: 39867.8, 300 sec: 42653.9). Total num frames: 1411547136. Throughput: 0: 10570.0. Samples: 352995328. Policy #0 lag: (min: 59.0, avg: 175.7, max: 315.0) [2024-06-15 20:34:45,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 20:34:46,037][1652475] Updated weights for policy 0, policy_version 689252 (0.0013) [2024-06-15 20:34:47,276][1652475] Updated weights for policy 0, policy_version 689316 (0.0014) [2024-06-15 20:34:49,350][1652475] Updated weights for policy 0, policy_version 689399 (0.0139) [2024-06-15 20:34:50,738][1648984] Fps is (10 sec: 45875.2, 60 sec: 42598.4, 300 sec: 43098.3). Total num frames: 1411907584. Throughput: 0: 10308.3. Samples: 353021440. Policy #0 lag: (min: 59.0, avg: 175.7, max: 315.0) [2024-06-15 20:34:50,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 20:34:52,538][1652475] Updated weights for policy 0, policy_version 689456 (0.0012) [2024-06-15 20:34:55,738][1648984] Fps is (10 sec: 49151.6, 60 sec: 39867.9, 300 sec: 42876.1). Total num frames: 1412038656. Throughput: 0: 10639.9. Samples: 353088000. Policy #0 lag: (min: 59.0, avg: 175.7, max: 315.0) [2024-06-15 20:34:55,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 20:34:55,744][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000689472_1412038656.pth... 
[2024-06-15 20:34:55,803][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000684480_1401815040.pth [2024-06-15 20:34:58,172][1652475] Updated weights for policy 0, policy_version 689556 (0.0012) [2024-06-15 20:34:59,736][1652475] Updated weights for policy 0, policy_version 689616 (0.0013) [2024-06-15 20:35:00,738][1648984] Fps is (10 sec: 49152.0, 60 sec: 43144.5, 300 sec: 43431.5). Total num frames: 1412399104. Throughput: 0: 10490.3. Samples: 353150464. Policy #0 lag: (min: 59.0, avg: 175.7, max: 315.0) [2024-06-15 20:35:00,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 20:35:03,489][1652475] Updated weights for policy 0, policy_version 689680 (0.0013) [2024-06-15 20:35:05,738][1648984] Fps is (10 sec: 52429.1, 60 sec: 42052.3, 300 sec: 43098.3). Total num frames: 1412562944. Throughput: 0: 10649.6. Samples: 353188864. Policy #0 lag: (min: 59.0, avg: 175.7, max: 315.0) [2024-06-15 20:35:05,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 20:35:08,827][1652475] Updated weights for policy 0, policy_version 689760 (0.0015) [2024-06-15 20:35:10,471][1652475] Updated weights for policy 0, policy_version 689824 (0.0013) [2024-06-15 20:35:10,738][1648984] Fps is (10 sec: 36044.0, 60 sec: 42603.2, 300 sec: 43320.4). Total num frames: 1412759552. Throughput: 0: 10990.9. Samples: 353260544. Policy #0 lag: (min: 59.0, avg: 175.7, max: 315.0) [2024-06-15 20:35:10,739][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 20:35:12,231][1652475] Updated weights for policy 0, policy_version 689889 (0.0013) [2024-06-15 20:35:15,314][1652475] Updated weights for policy 0, policy_version 689943 (0.0015) [2024-06-15 20:35:15,738][1648984] Fps is (10 sec: 45875.3, 60 sec: 44783.0, 300 sec: 43209.3). Total num frames: 1413021696. Throughput: 0: 10740.6. Samples: 353320960. 
Policy #0 lag: (min: 59.0, avg: 175.7, max: 315.0) [2024-06-15 20:35:15,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 20:35:19,891][1652475] Updated weights for policy 0, policy_version 689985 (0.0014) [2024-06-15 20:35:20,738][1648984] Fps is (10 sec: 39322.3, 60 sec: 40960.0, 300 sec: 42876.1). Total num frames: 1413152768. Throughput: 0: 10956.8. Samples: 353357824. Policy #0 lag: (min: 59.0, avg: 175.7, max: 315.0) [2024-06-15 20:35:20,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 20:35:20,981][1651340] Signal inference workers to stop experience collection... (35450 times) [2024-06-15 20:35:21,022][1652475] InferenceWorker_p0-w0: stopping experience collection (35450 times) [2024-06-15 20:35:21,213][1651340] Signal inference workers to resume experience collection... (35450 times) [2024-06-15 20:35:21,214][1652475] InferenceWorker_p0-w0: resuming experience collection (35450 times) [2024-06-15 20:35:21,764][1652475] Updated weights for policy 0, policy_version 690066 (0.0012) [2024-06-15 20:35:24,087][1652475] Updated weights for policy 0, policy_version 690148 (0.0013) [2024-06-15 20:35:25,738][1648984] Fps is (10 sec: 45874.6, 60 sec: 43690.6, 300 sec: 43320.4). Total num frames: 1413480448. Throughput: 0: 10990.9. Samples: 353419264. Policy #0 lag: (min: 59.0, avg: 175.7, max: 315.0) [2024-06-15 20:35:25,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 20:35:27,250][1652475] Updated weights for policy 0, policy_version 690212 (0.0013) [2024-06-15 20:35:30,739][1648984] Fps is (10 sec: 45875.3, 60 sec: 43690.6, 300 sec: 42876.1). Total num frames: 1413611520. Throughput: 0: 11025.0. Samples: 353491456. 
Policy #0 lag: (min: 59.0, avg: 175.7, max: 315.0) [2024-06-15 20:35:30,741][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 20:35:31,454][1652475] Updated weights for policy 0, policy_version 690259 (0.0011) [2024-06-15 20:35:34,367][1652475] Updated weights for policy 0, policy_version 690310 (0.0013) [2024-06-15 20:35:35,758][1648984] Fps is (10 sec: 39242.8, 60 sec: 42584.2, 300 sec: 43428.5). Total num frames: 1413873664. Throughput: 0: 11202.1. Samples: 353525760. Policy #0 lag: (min: 59.0, avg: 175.7, max: 315.0) [2024-06-15 20:35:35,759][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 20:35:36,138][1652475] Updated weights for policy 0, policy_version 690384 (0.0017) [2024-06-15 20:35:38,626][1652475] Updated weights for policy 0, policy_version 690467 (0.0098) [2024-06-15 20:35:40,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 44782.9, 300 sec: 43209.3). Total num frames: 1414135808. Throughput: 0: 10934.0. Samples: 353580032. Policy #0 lag: (min: 59.0, avg: 175.7, max: 315.0) [2024-06-15 20:35:40,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 20:35:43,607][1652475] Updated weights for policy 0, policy_version 690544 (0.0012) [2024-06-15 20:35:45,738][1648984] Fps is (10 sec: 39401.0, 60 sec: 45329.0, 300 sec: 43098.3). Total num frames: 1414266880. Throughput: 0: 11161.6. Samples: 353652736. Policy #0 lag: (min: 59.0, avg: 175.7, max: 315.0) [2024-06-15 20:35:45,738][1648984] Avg episode reward: [(0, '-0.190')] [2024-06-15 20:35:46,994][1652475] Updated weights for policy 0, policy_version 690593 (0.0012) [2024-06-15 20:35:49,397][1652475] Updated weights for policy 0, policy_version 690656 (0.0012) [2024-06-15 20:35:50,738][1648984] Fps is (10 sec: 42597.8, 60 sec: 44236.7, 300 sec: 43320.4). Total num frames: 1414561792. Throughput: 0: 11002.3. Samples: 353683968. 
Policy #0 lag: (min: 59.0, avg: 175.7, max: 315.0) [2024-06-15 20:35:50,739][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 20:35:51,398][1652475] Updated weights for policy 0, policy_version 690742 (0.0029) [2024-06-15 20:35:54,832][1652475] Updated weights for policy 0, policy_version 690784 (0.0020) [2024-06-15 20:35:55,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 45875.2, 300 sec: 43431.8). Total num frames: 1414791168. Throughput: 0: 10934.1. Samples: 353752576. Policy #0 lag: (min: 15.0, avg: 119.8, max: 271.0) [2024-06-15 20:35:55,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 20:35:59,363][1652475] Updated weights for policy 0, policy_version 690870 (0.0016) [2024-06-15 20:36:00,738][1648984] Fps is (10 sec: 36045.5, 60 sec: 42052.3, 300 sec: 43209.4). Total num frames: 1414922240. Throughput: 0: 10934.0. Samples: 353812992. Policy #0 lag: (min: 15.0, avg: 119.8, max: 271.0) [2024-06-15 20:36:00,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 20:36:02,563][1652475] Updated weights for policy 0, policy_version 690930 (0.0107) [2024-06-15 20:36:04,182][1652475] Updated weights for policy 0, policy_version 691000 (0.0010) [2024-06-15 20:36:05,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 43690.6, 300 sec: 43098.3). Total num frames: 1415184384. Throughput: 0: 10752.0. Samples: 353841664. Policy #0 lag: (min: 15.0, avg: 119.8, max: 271.0) [2024-06-15 20:36:05,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 20:36:07,063][1651340] Signal inference workers to stop experience collection... (35500 times) [2024-06-15 20:36:07,145][1652475] InferenceWorker_p0-w0: stopping experience collection (35500 times) [2024-06-15 20:36:07,165][1652475] Updated weights for policy 0, policy_version 691032 (0.0014) [2024-06-15 20:36:07,302][1651340] Signal inference workers to resume experience collection... 
(35500 times) [2024-06-15 20:36:07,303][1652475] InferenceWorker_p0-w0: resuming experience collection (35500 times) [2024-06-15 20:36:10,738][1648984] Fps is (10 sec: 42597.7, 60 sec: 43144.6, 300 sec: 43209.3). Total num frames: 1415348224. Throughput: 0: 10854.4. Samples: 353907712. Policy #0 lag: (min: 15.0, avg: 119.8, max: 271.0) [2024-06-15 20:36:10,739][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 20:36:10,835][1652475] Updated weights for policy 0, policy_version 691089 (0.0130) [2024-06-15 20:36:12,921][1652475] Updated weights for policy 0, policy_version 691138 (0.0012) [2024-06-15 20:36:14,050][1652475] Updated weights for policy 0, policy_version 691200 (0.0012) [2024-06-15 20:36:15,738][1648984] Fps is (10 sec: 39322.0, 60 sec: 42598.4, 300 sec: 43098.3). Total num frames: 1415577600. Throughput: 0: 10752.0. Samples: 353975296. Policy #0 lag: (min: 15.0, avg: 119.8, max: 271.0) [2024-06-15 20:36:15,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 20:36:16,969][1652475] Updated weights for policy 0, policy_version 691259 (0.0012) [2024-06-15 20:36:19,418][1652475] Updated weights for policy 0, policy_version 691321 (0.0013) [2024-06-15 20:36:20,738][1648984] Fps is (10 sec: 49152.9, 60 sec: 44783.0, 300 sec: 43542.6). Total num frames: 1415839744. Throughput: 0: 10688.5. Samples: 354006528. Policy #0 lag: (min: 15.0, avg: 119.8, max: 271.0) [2024-06-15 20:36:20,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 20:36:22,540][1652475] Updated weights for policy 0, policy_version 691384 (0.0056) [2024-06-15 20:36:24,661][1652475] Updated weights for policy 0, policy_version 691424 (0.0016) [2024-06-15 20:36:25,738][1648984] Fps is (10 sec: 52428.1, 60 sec: 43690.7, 300 sec: 43431.5). Total num frames: 1416101888. Throughput: 0: 10968.2. Samples: 354073600. 
Policy #0 lag: (min: 15.0, avg: 119.8, max: 271.0) [2024-06-15 20:36:25,738][1648984] Avg episode reward: [(0, '-0.250')] [2024-06-15 20:36:28,433][1652475] Updated weights for policy 0, policy_version 691465 (0.0013) [2024-06-15 20:36:29,572][1652475] Updated weights for policy 0, policy_version 691520 (0.0012) [2024-06-15 20:36:30,738][1648984] Fps is (10 sec: 39321.3, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 1416232960. Throughput: 0: 10865.8. Samples: 354141696. Policy #0 lag: (min: 15.0, avg: 119.8, max: 271.0) [2024-06-15 20:36:30,738][1648984] Avg episode reward: [(0, '-0.200')] [2024-06-15 20:36:33,523][1652475] Updated weights for policy 0, policy_version 691587 (0.0020) [2024-06-15 20:36:35,109][1652475] Updated weights for policy 0, policy_version 691648 (0.0012) [2024-06-15 20:36:35,746][1648984] Fps is (10 sec: 39288.2, 60 sec: 43699.1, 300 sec: 43541.3). Total num frames: 1416495104. Throughput: 0: 10784.1. Samples: 354169344. Policy #0 lag: (min: 15.0, avg: 119.8, max: 271.0) [2024-06-15 20:36:35,747][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 20:36:40,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 41506.1, 300 sec: 43098.2). Total num frames: 1416626176. Throughput: 0: 10535.8. Samples: 354226688. Policy #0 lag: (min: 15.0, avg: 119.8, max: 271.0) [2024-06-15 20:36:40,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 20:36:41,011][1652475] Updated weights for policy 0, policy_version 691717 (0.0011) [2024-06-15 20:36:45,738][1648984] Fps is (10 sec: 29516.0, 60 sec: 42052.2, 300 sec: 43431.5). Total num frames: 1416790016. Throughput: 0: 10660.9. Samples: 354292736. 
Policy #0 lag: (min: 15.0, avg: 119.8, max: 271.0) [2024-06-15 20:36:45,739][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 20:36:45,823][1652475] Updated weights for policy 0, policy_version 691808 (0.0104) [2024-06-15 20:36:47,629][1652475] Updated weights for policy 0, policy_version 691856 (0.0017) [2024-06-15 20:36:49,487][1652475] Updated weights for policy 0, policy_version 691936 (0.0015) [2024-06-15 20:36:50,742][1648984] Fps is (10 sec: 52405.8, 60 sec: 43141.5, 300 sec: 43541.9). Total num frames: 1417150464. Throughput: 0: 10580.3. Samples: 354317824. Policy #0 lag: (min: 15.0, avg: 119.8, max: 271.0) [2024-06-15 20:36:50,743][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 20:36:54,351][1652475] Updated weights for policy 0, policy_version 692016 (0.0147) [2024-06-15 20:36:55,738][1648984] Fps is (10 sec: 49152.7, 60 sec: 41506.1, 300 sec: 43542.6). Total num frames: 1417281536. Throughput: 0: 10353.8. Samples: 354373632. Policy #0 lag: (min: 15.0, avg: 119.8, max: 271.0) [2024-06-15 20:36:55,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 20:36:55,743][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000692032_1417281536.pth... [2024-06-15 20:36:55,816][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000686960_1406894080.pth [2024-06-15 20:36:59,396][1651340] Signal inference workers to stop experience collection... (35550 times) [2024-06-15 20:36:59,468][1652475] InferenceWorker_p0-w0: stopping experience collection (35550 times) [2024-06-15 20:36:59,633][1651340] Signal inference workers to resume experience collection... (35550 times) [2024-06-15 20:36:59,634][1652475] InferenceWorker_p0-w0: resuming experience collection (35550 times) [2024-06-15 20:37:00,689][1652475] Updated weights for policy 0, policy_version 692101 (0.0020) [2024-06-15 20:37:00,738][1648984] Fps is (10 sec: 26226.1, 60 sec: 41506.1, 300 sec: 43098.3). 
Total num frames: 1417412608. Throughput: 0: 10387.9. Samples: 354442752. Policy #0 lag: (min: 15.0, avg: 119.8, max: 271.0) [2024-06-15 20:37:00,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 20:37:02,415][1652475] Updated weights for policy 0, policy_version 692177 (0.0011) [2024-06-15 20:37:03,235][1652475] Updated weights for policy 0, policy_version 692223 (0.0012) [2024-06-15 20:37:05,738][1648984] Fps is (10 sec: 45875.2, 60 sec: 42598.4, 300 sec: 43542.6). Total num frames: 1417740288. Throughput: 0: 10217.2. Samples: 354466304. Policy #0 lag: (min: 15.0, avg: 119.8, max: 271.0) [2024-06-15 20:37:05,738][1648984] Avg episode reward: [(0, '-0.210')] [2024-06-15 20:37:10,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 40960.1, 300 sec: 42876.1). Total num frames: 1417805824. Throughput: 0: 10365.2. Samples: 354540032. Policy #0 lag: (min: 15.0, avg: 119.8, max: 271.0) [2024-06-15 20:37:10,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 20:37:10,940][1652475] Updated weights for policy 0, policy_version 692291 (0.0012) [2024-06-15 20:37:13,260][1652475] Updated weights for policy 0, policy_version 692384 (0.0150) [2024-06-15 20:37:15,102][1652475] Updated weights for policy 0, policy_version 692477 (0.0115) [2024-06-15 20:37:15,738][1648984] Fps is (10 sec: 45875.2, 60 sec: 43690.6, 300 sec: 43653.6). Total num frames: 1418199040. Throughput: 0: 10092.1. Samples: 354595840. Policy #0 lag: (min: 15.0, avg: 119.8, max: 271.0) [2024-06-15 20:37:15,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 20:37:18,182][1652475] Updated weights for policy 0, policy_version 692536 (0.0103) [2024-06-15 20:37:20,738][1648984] Fps is (10 sec: 52429.2, 60 sec: 41506.2, 300 sec: 43098.3). Total num frames: 1418330112. Throughput: 0: 10264.7. Samples: 354631168. 
Policy #0 lag: (min: 15.0, avg: 119.8, max: 271.0) [2024-06-15 20:37:20,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 20:37:23,639][1652475] Updated weights for policy 0, policy_version 692592 (0.0013) [2024-06-15 20:37:25,065][1652475] Updated weights for policy 0, policy_version 692640 (0.0011) [2024-06-15 20:37:25,738][1648984] Fps is (10 sec: 36044.8, 60 sec: 40960.0, 300 sec: 43431.5). Total num frames: 1418559488. Throughput: 0: 10638.2. Samples: 354705408. Policy #0 lag: (min: 15.0, avg: 119.8, max: 271.0) [2024-06-15 20:37:25,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 20:37:25,842][1652475] Updated weights for policy 0, policy_version 692672 (0.0011) [2024-06-15 20:37:27,246][1652475] Updated weights for policy 0, policy_version 692727 (0.0017) [2024-06-15 20:37:29,836][1652475] Updated weights for policy 0, policy_version 692797 (0.0012) [2024-06-15 20:37:30,738][1648984] Fps is (10 sec: 52428.4, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 1418854400. Throughput: 0: 10456.2. Samples: 354763264. Policy #0 lag: (min: 15.0, avg: 119.8, max: 271.0) [2024-06-15 20:37:30,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 20:37:35,738][1648984] Fps is (10 sec: 32768.3, 60 sec: 39873.5, 300 sec: 42765.0). Total num frames: 1418887168. Throughput: 0: 10639.3. Samples: 354796544. Policy #0 lag: (min: 15.0, avg: 92.1, max: 271.0) [2024-06-15 20:37:35,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 20:37:37,961][1652475] Updated weights for policy 0, policy_version 692896 (0.0012) [2024-06-15 20:37:40,738][1648984] Fps is (10 sec: 29491.1, 60 sec: 42052.3, 300 sec: 43098.3). Total num frames: 1419149312. Throughput: 0: 10478.9. Samples: 354845184. 
Policy #0 lag: (min: 15.0, avg: 92.1, max: 271.0) [2024-06-15 20:37:40,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 20:37:41,299][1652475] Updated weights for policy 0, policy_version 692976 (0.0013) [2024-06-15 20:37:42,729][1651340] Signal inference workers to stop experience collection... (35600 times) [2024-06-15 20:37:42,850][1652475] InferenceWorker_p0-w0: stopping experience collection (35600 times) [2024-06-15 20:37:42,935][1651340] Signal inference workers to resume experience collection... (35600 times) [2024-06-15 20:37:42,936][1652475] InferenceWorker_p0-w0: resuming experience collection (35600 times) [2024-06-15 20:37:43,505][1652475] Updated weights for policy 0, policy_version 693049 (0.0135) [2024-06-15 20:37:45,738][1648984] Fps is (10 sec: 49151.3, 60 sec: 43144.6, 300 sec: 42654.0). Total num frames: 1419378688. Throughput: 0: 10365.1. Samples: 354909184. Policy #0 lag: (min: 15.0, avg: 92.1, max: 271.0) [2024-06-15 20:37:45,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 20:37:50,197][1652475] Updated weights for policy 0, policy_version 693104 (0.0043) [2024-06-15 20:37:50,740][1648984] Fps is (10 sec: 36037.1, 60 sec: 39323.1, 300 sec: 42986.9). Total num frames: 1419509760. Throughput: 0: 10637.7. Samples: 354945024. Policy #0 lag: (min: 15.0, avg: 92.1, max: 271.0) [2024-06-15 20:37:50,741][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 20:37:51,895][1652475] Updated weights for policy 0, policy_version 693181 (0.0010) [2024-06-15 20:37:54,203][1652475] Updated weights for policy 0, policy_version 693242 (0.0032) [2024-06-15 20:37:55,738][1648984] Fps is (10 sec: 42598.5, 60 sec: 42052.2, 300 sec: 42765.0). Total num frames: 1419804672. Throughput: 0: 10114.8. Samples: 354995200. Policy #0 lag: (min: 15.0, avg: 92.1, max: 271.0) [2024-06-15 20:37:55,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 20:38:00,738][1648984] Fps is (10 sec: 39329.8, 60 sec: 41506.1, 300 sec: 42653.9). 
Total num frames: 1419902976. Throughput: 0: 10012.4. Samples: 355046400. Policy #0 lag: (min: 15.0, avg: 92.1, max: 271.0) [2024-06-15 20:38:00,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 20:38:03,560][1652475] Updated weights for policy 0, policy_version 693344 (0.0095) [2024-06-15 20:38:05,738][1648984] Fps is (10 sec: 22937.7, 60 sec: 38229.3, 300 sec: 41987.5). Total num frames: 1420034048. Throughput: 0: 9966.9. Samples: 355079680. Policy #0 lag: (min: 15.0, avg: 92.1, max: 271.0) [2024-06-15 20:38:05,763][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 20:38:06,617][1652475] Updated weights for policy 0, policy_version 693394 (0.0012) [2024-06-15 20:38:09,191][1652475] Updated weights for policy 0, policy_version 693459 (0.0012) [2024-06-15 20:38:10,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 42052.2, 300 sec: 42320.7). Total num frames: 1420328960. Throughput: 0: 9477.7. Samples: 355131904. Policy #0 lag: (min: 15.0, avg: 92.1, max: 271.0) [2024-06-15 20:38:10,738][1648984] Avg episode reward: [(0, '-0.220')] [2024-06-15 20:38:11,537][1652475] Updated weights for policy 0, policy_version 693566 (0.0013) [2024-06-15 20:38:15,745][1648984] Fps is (10 sec: 39293.9, 60 sec: 37132.7, 300 sec: 41876.6). Total num frames: 1420427264. Throughput: 0: 9430.7. Samples: 355187712. Policy #0 lag: (min: 15.0, avg: 92.1, max: 271.0) [2024-06-15 20:38:15,745][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 20:38:20,479][1652475] Updated weights for policy 0, policy_version 693638 (0.0013) [2024-06-15 20:38:20,738][1648984] Fps is (10 sec: 26214.6, 60 sec: 37683.1, 300 sec: 41654.2). Total num frames: 1420591104. Throughput: 0: 9238.7. Samples: 355212288. 
Policy #0 lag: (min: 15.0, avg: 92.1, max: 271.0) [2024-06-15 20:38:20,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 20:38:21,632][1652475] Updated weights for policy 0, policy_version 693690 (0.0011) [2024-06-15 20:38:25,740][1648984] Fps is (10 sec: 39349.1, 60 sec: 37683.2, 300 sec: 41876.4). Total num frames: 1420820480. Throughput: 0: 9386.7. Samples: 355267584. Policy #0 lag: (min: 15.0, avg: 92.1, max: 271.0) [2024-06-15 20:38:25,741][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 20:38:26,320][1652475] Updated weights for policy 0, policy_version 693763 (0.0014) [2024-06-15 20:38:27,209][1652475] Updated weights for policy 0, policy_version 693812 (0.0011) [2024-06-15 20:38:30,331][1652475] Updated weights for policy 0, policy_version 693888 (0.0012) [2024-06-15 20:38:30,738][1648984] Fps is (10 sec: 49152.1, 60 sec: 37137.1, 300 sec: 42209.6). Total num frames: 1421082624. Throughput: 0: 9307.0. Samples: 355328000. Policy #0 lag: (min: 15.0, avg: 92.1, max: 271.0) [2024-06-15 20:38:30,738][1648984] Avg episode reward: [(0, '-0.250')] [2024-06-15 20:38:35,738][1648984] Fps is (10 sec: 39322.0, 60 sec: 38775.4, 300 sec: 41765.3). Total num frames: 1421213696. Throughput: 0: 9205.1. Samples: 355359232. Policy #0 lag: (min: 15.0, avg: 92.1, max: 271.0) [2024-06-15 20:38:35,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 20:38:37,160][1652475] Updated weights for policy 0, policy_version 693955 (0.0015) [2024-06-15 20:38:38,976][1651340] Signal inference workers to stop experience collection... (35650 times) [2024-06-15 20:38:39,033][1652475] InferenceWorker_p0-w0: stopping experience collection (35650 times) [2024-06-15 20:38:39,182][1651340] Signal inference workers to resume experience collection... 
(35650 times) [2024-06-15 20:38:39,182][1652475] InferenceWorker_p0-w0: resuming experience collection (35650 times) [2024-06-15 20:38:39,185][1652475] Updated weights for policy 0, policy_version 694032 (0.0012) [2024-06-15 20:38:40,738][1648984] Fps is (10 sec: 39322.0, 60 sec: 38775.5, 300 sec: 41765.3). Total num frames: 1421475840. Throughput: 0: 9443.6. Samples: 355420160. Policy #0 lag: (min: 15.0, avg: 92.1, max: 271.0) [2024-06-15 20:38:40,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 20:38:40,948][1652475] Updated weights for policy 0, policy_version 694083 (0.0012) [2024-06-15 20:38:42,234][1652475] Updated weights for policy 0, policy_version 694141 (0.0013) [2024-06-15 20:38:45,738][1648984] Fps is (10 sec: 42598.5, 60 sec: 37683.3, 300 sec: 41654.2). Total num frames: 1421639680. Throughput: 0: 9671.1. Samples: 355481600. Policy #0 lag: (min: 15.0, avg: 92.1, max: 271.0) [2024-06-15 20:38:45,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 20:38:46,318][1652475] Updated weights for policy 0, policy_version 694192 (0.0011) [2024-06-15 20:38:50,738][1648984] Fps is (10 sec: 29491.1, 60 sec: 37684.6, 300 sec: 41098.9). Total num frames: 1421770752. Throughput: 0: 9534.6. Samples: 355508736. Policy #0 lag: (min: 15.0, avg: 92.1, max: 271.0) [2024-06-15 20:38:50,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 20:38:51,515][1652475] Updated weights for policy 0, policy_version 694260 (0.0131) [2024-06-15 20:38:54,855][1652475] Updated weights for policy 0, policy_version 694310 (0.0013) [2024-06-15 20:38:55,739][1648984] Fps is (10 sec: 36038.5, 60 sec: 36589.9, 300 sec: 41320.8). Total num frames: 1422000128. Throughput: 0: 9659.4. Samples: 355566592. Policy #0 lag: (min: 15.0, avg: 92.1, max: 271.0) [2024-06-15 20:38:55,740][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 20:38:56,265][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000694368_1422065664.pth... 
[2024-06-15 20:38:56,421][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000689472_1412038656.pth [2024-06-15 20:38:59,170][1652475] Updated weights for policy 0, policy_version 694418 (0.0014) [2024-06-15 20:39:00,738][1648984] Fps is (10 sec: 49152.0, 60 sec: 39321.7, 300 sec: 41432.1). Total num frames: 1422262272. Throughput: 0: 9479.2. Samples: 355614208. Policy #0 lag: (min: 15.0, avg: 92.1, max: 271.0) [2024-06-15 20:39:00,738][1648984] Avg episode reward: [(0, '-0.230')] [2024-06-15 20:39:03,764][1652475] Updated weights for policy 0, policy_version 694480 (0.0014) [2024-06-15 20:39:05,738][1648984] Fps is (10 sec: 39328.2, 60 sec: 39321.6, 300 sec: 41322.0). Total num frames: 1422393344. Throughput: 0: 9682.5. Samples: 355648000. Policy #0 lag: (min: 15.0, avg: 92.1, max: 271.0) [2024-06-15 20:39:05,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 20:39:08,458][1652475] Updated weights for policy 0, policy_version 694547 (0.0013) [2024-06-15 20:39:10,738][1648984] Fps is (10 sec: 26214.0, 60 sec: 36590.9, 300 sec: 41321.0). Total num frames: 1422524416. Throughput: 0: 9739.4. Samples: 355705856. Policy #0 lag: (min: 15.0, avg: 92.1, max: 271.0) [2024-06-15 20:39:10,738][1648984] Avg episode reward: [(0, '-0.250')] [2024-06-15 20:39:12,442][1652475] Updated weights for policy 0, policy_version 694656 (0.0111) [2024-06-15 20:39:15,757][1648984] Fps is (10 sec: 39246.0, 60 sec: 39313.6, 300 sec: 40985.1). Total num frames: 1422786560. Throughput: 0: 9689.7. Samples: 355764224. Policy #0 lag: (min: 15.0, avg: 92.1, max: 271.0) [2024-06-15 20:39:15,757][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 20:39:15,925][1652475] Updated weights for policy 0, policy_version 694721 (0.0010) [2024-06-15 20:39:20,442][1652475] Updated weights for policy 0, policy_version 694816 (0.0099) [2024-06-15 20:39:20,738][1648984] Fps is (10 sec: 49152.5, 60 sec: 40413.9, 300 sec: 41209.9). Total num frames: 1423015936. 
Throughput: 0: 9705.2. Samples: 355795968. Policy #0 lag: (min: 15.0, avg: 146.6, max: 271.0) [2024-06-15 20:39:20,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 20:39:25,194][1652475] Updated weights for policy 0, policy_version 694904 (0.0016) [2024-06-15 20:39:25,738][1648984] Fps is (10 sec: 39397.2, 60 sec: 39321.6, 300 sec: 41321.0). Total num frames: 1423179776. Throughput: 0: 9875.9. Samples: 355864576. Policy #0 lag: (min: 15.0, avg: 146.6, max: 271.0) [2024-06-15 20:39:25,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 20:39:26,830][1652475] Updated weights for policy 0, policy_version 694969 (0.0011) [2024-06-15 20:39:29,013][1651340] Signal inference workers to stop experience collection... (35700 times) [2024-06-15 20:39:29,061][1652475] InferenceWorker_p0-w0: stopping experience collection (35700 times) [2024-06-15 20:39:29,369][1651340] Signal inference workers to resume experience collection... (35700 times) [2024-06-15 20:39:29,370][1652475] InferenceWorker_p0-w0: resuming experience collection (35700 times) [2024-06-15 20:39:29,490][1652475] Updated weights for policy 0, policy_version 695028 (0.0015) [2024-06-15 20:39:30,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 39321.6, 300 sec: 41098.9). Total num frames: 1423441920. Throughput: 0: 9716.6. Samples: 355918848. Policy #0 lag: (min: 15.0, avg: 146.6, max: 271.0) [2024-06-15 20:39:30,738][1648984] Avg episode reward: [(0, '-0.250')] [2024-06-15 20:39:32,116][1652475] Updated weights for policy 0, policy_version 695060 (0.0012) [2024-06-15 20:39:35,738][1648984] Fps is (10 sec: 39321.9, 60 sec: 39321.6, 300 sec: 41098.8). Total num frames: 1423572992. Throughput: 0: 9910.0. Samples: 355954688. 
Policy #0 lag: (min: 15.0, avg: 146.6, max: 271.0) [2024-06-15 20:39:35,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 20:39:36,279][1652475] Updated weights for policy 0, policy_version 695110 (0.0012) [2024-06-15 20:39:39,118][1652475] Updated weights for policy 0, policy_version 695184 (0.0011) [2024-06-15 20:39:40,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 39321.6, 300 sec: 41654.2). Total num frames: 1423835136. Throughput: 0: 10217.6. Samples: 356026368. Policy #0 lag: (min: 15.0, avg: 146.6, max: 271.0) [2024-06-15 20:39:40,740][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 20:39:41,327][1652475] Updated weights for policy 0, policy_version 695264 (0.0065) [2024-06-15 20:39:44,093][1652475] Updated weights for policy 0, policy_version 695331 (0.0016) [2024-06-15 20:39:45,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 40960.0, 300 sec: 41321.0). Total num frames: 1424097280. Throughput: 0: 10444.8. Samples: 356084224. Policy #0 lag: (min: 15.0, avg: 146.6, max: 271.0) [2024-06-15 20:39:45,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 20:39:48,403][1652475] Updated weights for policy 0, policy_version 695381 (0.0011) [2024-06-15 20:39:50,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 41506.1, 300 sec: 41432.1). Total num frames: 1424261120. Throughput: 0: 10490.3. Samples: 356120064. Policy #0 lag: (min: 15.0, avg: 146.6, max: 271.0) [2024-06-15 20:39:50,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 20:39:51,109][1652475] Updated weights for policy 0, policy_version 695456 (0.0018) [2024-06-15 20:39:53,322][1652475] Updated weights for policy 0, policy_version 695508 (0.0033) [2024-06-15 20:39:55,738][1648984] Fps is (10 sec: 39321.0, 60 sec: 41507.2, 300 sec: 40987.8). Total num frames: 1424490496. Throughput: 0: 10592.7. Samples: 356182528. 
Policy #0 lag: (min: 15.0, avg: 146.6, max: 271.0) [2024-06-15 20:39:55,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 20:39:57,189][1652475] Updated weights for policy 0, policy_version 695606 (0.0114) [2024-06-15 20:40:00,305][1652475] Updated weights for policy 0, policy_version 695648 (0.0046) [2024-06-15 20:40:00,738][1648984] Fps is (10 sec: 45874.7, 60 sec: 40959.9, 300 sec: 41209.9). Total num frames: 1424719872. Throughput: 0: 10859.0. Samples: 356252672. Policy #0 lag: (min: 15.0, avg: 146.6, max: 271.0) [2024-06-15 20:40:00,739][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 20:40:02,757][1652475] Updated weights for policy 0, policy_version 695713 (0.0011) [2024-06-15 20:40:05,352][1652475] Updated weights for policy 0, policy_version 695776 (0.0065) [2024-06-15 20:40:05,738][1648984] Fps is (10 sec: 49153.2, 60 sec: 43144.6, 300 sec: 41432.1). Total num frames: 1424982016. Throughput: 0: 10831.7. Samples: 356283392. Policy #0 lag: (min: 15.0, avg: 146.6, max: 271.0) [2024-06-15 20:40:05,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 20:40:06,025][1652475] Updated weights for policy 0, policy_version 695807 (0.0017) [2024-06-15 20:40:09,342][1652475] Updated weights for policy 0, policy_version 695872 (0.0013) [2024-06-15 20:40:10,738][1648984] Fps is (10 sec: 42598.6, 60 sec: 43690.7, 300 sec: 41098.8). Total num frames: 1425145856. Throughput: 0: 10672.4. Samples: 356344832. Policy #0 lag: (min: 15.0, avg: 146.6, max: 271.0) [2024-06-15 20:40:10,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 20:40:12,417][1652475] Updated weights for policy 0, policy_version 695924 (0.0012) [2024-06-15 20:40:14,624][1652475] Updated weights for policy 0, policy_version 695968 (0.0013) [2024-06-15 20:40:15,740][1648984] Fps is (10 sec: 42597.7, 60 sec: 43704.7, 300 sec: 41543.2). Total num frames: 1425408000. Throughput: 0: 11025.1. Samples: 356414976. 
Policy #0 lag: (min: 15.0, avg: 146.6, max: 271.0) [2024-06-15 20:40:15,741][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 20:40:17,810][1652475] Updated weights for policy 0, policy_version 696057 (0.0017) [2024-06-15 20:40:19,865][1651340] Signal inference workers to stop experience collection... (35750 times) [2024-06-15 20:40:19,888][1652475] InferenceWorker_p0-w0: stopping experience collection (35750 times) [2024-06-15 20:40:20,069][1651340] Signal inference workers to resume experience collection... (35750 times) [2024-06-15 20:40:20,070][1652475] InferenceWorker_p0-w0: resuming experience collection (35750 times) [2024-06-15 20:40:20,738][1648984] Fps is (10 sec: 45875.3, 60 sec: 43144.5, 300 sec: 41098.9). Total num frames: 1425604608. Throughput: 0: 10911.3. Samples: 356445696. Policy #0 lag: (min: 15.0, avg: 146.6, max: 271.0) [2024-06-15 20:40:20,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 20:40:21,090][1652475] Updated weights for policy 0, policy_version 696123 (0.0018) [2024-06-15 20:40:23,702][1652475] Updated weights for policy 0, policy_version 696188 (0.0086) [2024-06-15 20:40:25,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 43690.7, 300 sec: 41321.0). Total num frames: 1425801216. Throughput: 0: 10808.9. Samples: 356512768. Policy #0 lag: (min: 15.0, avg: 146.6, max: 271.0) [2024-06-15 20:40:25,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 20:40:29,439][1652475] Updated weights for policy 0, policy_version 696260 (0.0036) [2024-06-15 20:40:30,738][1648984] Fps is (10 sec: 42598.5, 60 sec: 43144.5, 300 sec: 41212.7). Total num frames: 1426030592. Throughput: 0: 10877.1. Samples: 356573696. 
Policy #0 lag: (min: 15.0, avg: 146.6, max: 271.0) [2024-06-15 20:40:30,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 20:40:30,741][1652475] Updated weights for policy 0, policy_version 696316 (0.0013) [2024-06-15 20:40:33,306][1652475] Updated weights for policy 0, policy_version 696381 (0.0034) [2024-06-15 20:40:35,397][1652475] Updated weights for policy 0, policy_version 696438 (0.0012) [2024-06-15 20:40:35,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 45875.2, 300 sec: 41321.0). Total num frames: 1426325504. Throughput: 0: 10774.7. Samples: 356604928. Policy #0 lag: (min: 15.0, avg: 146.6, max: 271.0) [2024-06-15 20:40:35,738][1648984] Avg episode reward: [(0, '-0.200')] [2024-06-15 20:40:40,738][1648984] Fps is (10 sec: 36044.6, 60 sec: 42598.3, 300 sec: 41098.8). Total num frames: 1426391040. Throughput: 0: 10945.4. Samples: 356675072. Policy #0 lag: (min: 15.0, avg: 146.6, max: 271.0) [2024-06-15 20:40:40,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 20:40:41,070][1652475] Updated weights for policy 0, policy_version 696504 (0.0013) [2024-06-15 20:40:45,218][1652475] Updated weights for policy 0, policy_version 696608 (0.0022) [2024-06-15 20:40:45,738][1648984] Fps is (10 sec: 36044.5, 60 sec: 43144.4, 300 sec: 41098.9). Total num frames: 1426685952. Throughput: 0: 10672.4. Samples: 356732928. Policy #0 lag: (min: 15.0, avg: 146.6, max: 271.0) [2024-06-15 20:40:45,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 20:40:46,199][1652475] Updated weights for policy 0, policy_version 696649 (0.0011) [2024-06-15 20:40:47,322][1652475] Updated weights for policy 0, policy_version 696704 (0.0016) [2024-06-15 20:40:50,738][1648984] Fps is (10 sec: 45875.2, 60 sec: 43144.5, 300 sec: 40876.7). Total num frames: 1426849792. Throughput: 0: 10752.0. Samples: 356767232. 
Policy #0 lag: (min: 15.0, avg: 146.6, max: 271.0) [2024-06-15 20:40:50,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 20:40:53,301][1652475] Updated weights for policy 0, policy_version 696758 (0.0012) [2024-06-15 20:40:55,738][1648984] Fps is (10 sec: 29491.6, 60 sec: 41506.2, 300 sec: 40876.7). Total num frames: 1426980864. Throughput: 0: 10888.6. Samples: 356834816. Policy #0 lag: (min: 15.0, avg: 146.6, max: 271.0) [2024-06-15 20:40:55,740][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 20:40:56,267][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000696800_1427046400.pth... [2024-06-15 20:40:56,431][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000692032_1417281536.pth [2024-06-15 20:40:57,081][1652475] Updated weights for policy 0, policy_version 696832 (0.0016) [2024-06-15 20:40:59,893][1652475] Updated weights for policy 0, policy_version 696944 (0.0153) [2024-06-15 20:41:00,738][1648984] Fps is (10 sec: 52429.2, 60 sec: 44236.9, 300 sec: 41321.0). Total num frames: 1427374080. Throughput: 0: 10410.7. Samples: 356883456. Policy #0 lag: (min: 31.0, avg: 109.2, max: 287.0) [2024-06-15 20:41:00,744][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 20:41:05,076][1652475] Updated weights for policy 0, policy_version 696992 (0.0013) [2024-06-15 20:41:05,738][1648984] Fps is (10 sec: 52428.1, 60 sec: 42052.1, 300 sec: 41209.9). Total num frames: 1427505152. Throughput: 0: 10547.2. Samples: 356920320. Policy #0 lag: (min: 31.0, avg: 109.2, max: 287.0) [2024-06-15 20:41:05,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 20:41:08,024][1651340] Signal inference workers to stop experience collection... (35800 times) [2024-06-15 20:41:08,085][1652475] InferenceWorker_p0-w0: stopping experience collection (35800 times) [2024-06-15 20:41:08,226][1651340] Signal inference workers to resume experience collection... 
(35800 times) [2024-06-15 20:41:08,227][1652475] InferenceWorker_p0-w0: resuming experience collection (35800 times) [2024-06-15 20:41:08,229][1652475] Updated weights for policy 0, policy_version 697040 (0.0012) [2024-06-15 20:41:09,845][1652475] Updated weights for policy 0, policy_version 697106 (0.0011) [2024-06-15 20:41:10,754][1648984] Fps is (10 sec: 35985.3, 60 sec: 43132.7, 300 sec: 41207.6). Total num frames: 1427734528. Throughput: 0: 10657.1. Samples: 356992512. Policy #0 lag: (min: 31.0, avg: 109.2, max: 287.0) [2024-06-15 20:41:10,755][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 20:41:11,438][1652475] Updated weights for policy 0, policy_version 697157 (0.0013) [2024-06-15 20:41:15,259][1652475] Updated weights for policy 0, policy_version 697222 (0.0027) [2024-06-15 20:41:15,738][1648984] Fps is (10 sec: 42598.0, 60 sec: 42052.2, 300 sec: 40987.7). Total num frames: 1427931136. Throughput: 0: 10638.2. Samples: 357052416. Policy #0 lag: (min: 31.0, avg: 109.2, max: 287.0) [2024-06-15 20:41:15,739][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 20:41:16,626][1652475] Updated weights for policy 0, policy_version 697280 (0.0013) [2024-06-15 20:41:20,738][1648984] Fps is (10 sec: 36104.7, 60 sec: 41506.2, 300 sec: 40654.6). Total num frames: 1428094976. Throughput: 0: 10604.1. Samples: 357082112. Policy #0 lag: (min: 31.0, avg: 109.2, max: 287.0) [2024-06-15 20:41:20,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 20:41:21,017][1652475] Updated weights for policy 0, policy_version 697339 (0.0116) [2024-06-15 20:41:24,427][1652475] Updated weights for policy 0, policy_version 697399 (0.0050) [2024-06-15 20:41:25,738][1648984] Fps is (10 sec: 39322.2, 60 sec: 42052.2, 300 sec: 40987.8). Total num frames: 1428324352. Throughput: 0: 10501.7. Samples: 357147648. 
Policy #0 lag: (min: 31.0, avg: 109.2, max: 287.0) [2024-06-15 20:41:25,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 20:41:26,602][1652475] Updated weights for policy 0, policy_version 697456 (0.0011) [2024-06-15 20:41:28,009][1652475] Updated weights for policy 0, policy_version 697520 (0.0014) [2024-06-15 20:41:30,738][1648984] Fps is (10 sec: 45873.3, 60 sec: 42052.0, 300 sec: 40877.8). Total num frames: 1428553728. Throughput: 0: 10581.3. Samples: 357209088. Policy #0 lag: (min: 31.0, avg: 109.2, max: 287.0) [2024-06-15 20:41:30,739][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 20:41:31,710][1652475] Updated weights for policy 0, policy_version 697584 (0.0067) [2024-06-15 20:41:35,738][1648984] Fps is (10 sec: 36044.6, 60 sec: 39321.5, 300 sec: 40876.7). Total num frames: 1428684800. Throughput: 0: 10626.8. Samples: 357245440. Policy #0 lag: (min: 31.0, avg: 109.2, max: 287.0) [2024-06-15 20:41:35,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 20:41:37,510][1652475] Updated weights for policy 0, policy_version 697634 (0.0014) [2024-06-15 20:41:39,703][1652475] Updated weights for policy 0, policy_version 697728 (0.0011) [2024-06-15 20:41:40,738][1648984] Fps is (10 sec: 45875.6, 60 sec: 43690.5, 300 sec: 41432.1). Total num frames: 1429012480. Throughput: 0: 10456.1. Samples: 357305344. Policy #0 lag: (min: 31.0, avg: 109.2, max: 287.0) [2024-06-15 20:41:40,739][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 20:41:42,837][1652475] Updated weights for policy 0, policy_version 697794 (0.0015) [2024-06-15 20:41:44,240][1652475] Updated weights for policy 0, policy_version 697854 (0.0085) [2024-06-15 20:41:45,738][1648984] Fps is (10 sec: 52429.3, 60 sec: 42052.3, 300 sec: 40877.3). Total num frames: 1429209088. Throughput: 0: 10717.9. Samples: 357365760. 
Policy #0 lag: (min: 31.0, avg: 109.2, max: 287.0) [2024-06-15 20:41:45,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 20:41:49,728][1652475] Updated weights for policy 0, policy_version 697919 (0.0013) [2024-06-15 20:41:50,738][1648984] Fps is (10 sec: 36046.1, 60 sec: 42052.4, 300 sec: 40987.8). Total num frames: 1429372928. Throughput: 0: 10752.0. Samples: 357404160. Policy #0 lag: (min: 31.0, avg: 109.2, max: 287.0) [2024-06-15 20:41:50,738][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 20:41:51,336][1652475] Updated weights for policy 0, policy_version 697980 (0.0033) [2024-06-15 20:41:53,688][1651340] Signal inference workers to stop experience collection... (35850 times) [2024-06-15 20:41:53,729][1652475] InferenceWorker_p0-w0: stopping experience collection (35850 times) [2024-06-15 20:41:53,921][1651340] Signal inference workers to resume experience collection... (35850 times) [2024-06-15 20:41:53,937][1652475] InferenceWorker_p0-w0: resuming experience collection (35850 times) [2024-06-15 20:41:55,018][1652475] Updated weights for policy 0, policy_version 698049 (0.0133) [2024-06-15 20:41:55,738][1648984] Fps is (10 sec: 42598.8, 60 sec: 44236.8, 300 sec: 41432.1). Total num frames: 1429635072. Throughput: 0: 10482.8. Samples: 357464064. Policy #0 lag: (min: 31.0, avg: 109.2, max: 287.0) [2024-06-15 20:41:55,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 20:42:00,738][1648984] Fps is (10 sec: 36044.5, 60 sec: 39321.6, 300 sec: 40654.5). Total num frames: 1429733376. Throughput: 0: 10581.4. Samples: 357528576. 
Policy #0 lag: (min: 31.0, avg: 109.2, max: 287.0) [2024-06-15 20:42:00,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 20:42:01,342][1652475] Updated weights for policy 0, policy_version 698144 (0.0014) [2024-06-15 20:42:02,881][1652475] Updated weights for policy 0, policy_version 698214 (0.0014) [2024-06-15 20:42:05,563][1652475] Updated weights for policy 0, policy_version 698256 (0.0015) [2024-06-15 20:42:05,750][1648984] Fps is (10 sec: 39272.8, 60 sec: 42043.7, 300 sec: 41430.3). Total num frames: 1430028288. Throughput: 0: 10532.9. Samples: 357556224. Policy #0 lag: (min: 31.0, avg: 109.2, max: 287.0) [2024-06-15 20:42:05,751][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 20:42:07,512][1652475] Updated weights for policy 0, policy_version 698311 (0.0012) [2024-06-15 20:42:10,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 42063.9, 300 sec: 40876.7). Total num frames: 1430257664. Throughput: 0: 10661.0. Samples: 357627392. Policy #0 lag: (min: 31.0, avg: 109.2, max: 287.0) [2024-06-15 20:42:10,740][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 20:42:11,782][1652475] Updated weights for policy 0, policy_version 698370 (0.0012) [2024-06-15 20:42:13,102][1652475] Updated weights for policy 0, policy_version 698424 (0.0012) [2024-06-15 20:42:14,745][1652475] Updated weights for policy 0, policy_version 698491 (0.0012) [2024-06-15 20:42:15,738][1648984] Fps is (10 sec: 49212.9, 60 sec: 43144.7, 300 sec: 41321.0). Total num frames: 1430519808. Throughput: 0: 10797.6. Samples: 357694976. Policy #0 lag: (min: 31.0, avg: 109.2, max: 287.0) [2024-06-15 20:42:15,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 20:42:18,130][1652475] Updated weights for policy 0, policy_version 698554 (0.0012) [2024-06-15 20:42:20,390][1652475] Updated weights for policy 0, policy_version 698609 (0.0012) [2024-06-15 20:42:20,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 44782.9, 300 sec: 41432.1). Total num frames: 1430781952. 
Throughput: 0: 10695.1. Samples: 357726720. Policy #0 lag: (min: 31.0, avg: 109.2, max: 287.0) [2024-06-15 20:42:20,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 20:42:24,261][1652475] Updated weights for policy 0, policy_version 698661 (0.0011) [2024-06-15 20:42:25,738][1648984] Fps is (10 sec: 45874.8, 60 sec: 44236.8, 300 sec: 41098.8). Total num frames: 1430978560. Throughput: 0: 11025.1. Samples: 357801472. Policy #0 lag: (min: 31.0, avg: 109.2, max: 287.0) [2024-06-15 20:42:25,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 20:42:28,739][1652475] Updated weights for policy 0, policy_version 698753 (0.0108) [2024-06-15 20:42:30,738][1648984] Fps is (10 sec: 39321.2, 60 sec: 43690.9, 300 sec: 41654.2). Total num frames: 1431175168. Throughput: 0: 10877.1. Samples: 357855232. Policy #0 lag: (min: 31.0, avg: 109.2, max: 287.0) [2024-06-15 20:42:30,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 20:42:31,565][1652475] Updated weights for policy 0, policy_version 698817 (0.0039) [2024-06-15 20:42:33,038][1652475] Updated weights for policy 0, policy_version 698880 (0.0011) [2024-06-15 20:42:35,738][1648984] Fps is (10 sec: 32767.1, 60 sec: 43690.5, 300 sec: 41209.9). Total num frames: 1431306240. Throughput: 0: 10695.0. Samples: 357885440. Policy #0 lag: (min: 31.0, avg: 109.2, max: 287.0) [2024-06-15 20:42:35,739][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 20:42:37,472][1652475] Updated weights for policy 0, policy_version 698944 (0.0028) [2024-06-15 20:42:38,199][1651340] Signal inference workers to stop experience collection... (35900 times) [2024-06-15 20:42:38,230][1652475] InferenceWorker_p0-w0: stopping experience collection (35900 times) [2024-06-15 20:42:38,437][1651340] Signal inference workers to resume experience collection... 
(35900 times) [2024-06-15 20:42:38,437][1652475] InferenceWorker_p0-w0: resuming experience collection (35900 times) [2024-06-15 20:42:40,738][1648984] Fps is (10 sec: 39321.9, 60 sec: 42598.6, 300 sec: 41321.0). Total num frames: 1431568384. Throughput: 0: 10854.4. Samples: 357952512. Policy #0 lag: (min: 20.0, avg: 101.5, max: 276.0) [2024-06-15 20:42:40,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 20:42:42,539][1652475] Updated weights for policy 0, policy_version 699056 (0.0014) [2024-06-15 20:42:44,680][1652475] Updated weights for policy 0, policy_version 699134 (0.0012) [2024-06-15 20:42:45,738][1648984] Fps is (10 sec: 52429.8, 60 sec: 43690.6, 300 sec: 41765.6). Total num frames: 1431830528. Throughput: 0: 10831.6. Samples: 358016000. Policy #0 lag: (min: 20.0, avg: 101.5, max: 276.0) [2024-06-15 20:42:45,739][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 20:42:49,743][1652475] Updated weights for policy 0, policy_version 699200 (0.0013) [2024-06-15 20:42:50,738][1648984] Fps is (10 sec: 45875.4, 60 sec: 44236.8, 300 sec: 41432.1). Total num frames: 1432027136. Throughput: 0: 11050.9. Samples: 358053376. Policy #0 lag: (min: 20.0, avg: 101.5, max: 276.0) [2024-06-15 20:42:50,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 20:42:51,295][1652475] Updated weights for policy 0, policy_version 699255 (0.0012) [2024-06-15 20:42:55,101][1652475] Updated weights for policy 0, policy_version 699282 (0.0011) [2024-06-15 20:42:55,738][1648984] Fps is (10 sec: 32768.1, 60 sec: 42052.1, 300 sec: 41543.2). Total num frames: 1432158208. Throughput: 0: 10865.7. Samples: 358116352. Policy #0 lag: (min: 20.0, avg: 101.5, max: 276.0) [2024-06-15 20:42:55,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 20:42:56,130][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000699328_1432223744.pth... 
[2024-06-15 20:42:56,327][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000694368_1422065664.pth [2024-06-15 20:42:57,443][1652475] Updated weights for policy 0, policy_version 699385 (0.0100) [2024-06-15 20:43:00,766][1648984] Fps is (10 sec: 32674.2, 60 sec: 43669.8, 300 sec: 41761.3). Total num frames: 1432354816. Throughput: 0: 10711.0. Samples: 358177280. Policy #0 lag: (min: 20.0, avg: 101.5, max: 276.0) [2024-06-15 20:43:00,767][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 20:43:02,669][1652475] Updated weights for policy 0, policy_version 699458 (0.0013) [2024-06-15 20:43:03,772][1652475] Updated weights for policy 0, policy_version 699518 (0.0014) [2024-06-15 20:43:05,738][1648984] Fps is (10 sec: 45875.7, 60 sec: 43153.4, 300 sec: 41654.3). Total num frames: 1432616960. Throughput: 0: 10524.4. Samples: 358200320. Policy #0 lag: (min: 20.0, avg: 101.5, max: 276.0) [2024-06-15 20:43:05,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 20:43:08,055][1652475] Updated weights for policy 0, policy_version 699575 (0.0016) [2024-06-15 20:43:09,647][1652475] Updated weights for policy 0, policy_version 699616 (0.0012) [2024-06-15 20:43:10,366][1652475] Updated weights for policy 0, policy_version 699646 (0.0011) [2024-06-15 20:43:10,738][1648984] Fps is (10 sec: 52579.5, 60 sec: 43690.6, 300 sec: 42210.6). Total num frames: 1432879104. Throughput: 0: 10433.4. Samples: 358270976. Policy #0 lag: (min: 20.0, avg: 101.5, max: 276.0) [2024-06-15 20:43:10,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 20:43:14,672][1652475] Updated weights for policy 0, policy_version 699728 (0.0222) [2024-06-15 20:43:15,738][1648984] Fps is (10 sec: 49152.2, 60 sec: 43144.5, 300 sec: 42431.8). Total num frames: 1433108480. Throughput: 0: 10683.8. Samples: 358336000. 
Policy #0 lag: (min: 20.0, avg: 101.5, max: 276.0) [2024-06-15 20:43:15,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 20:43:15,799][1652475] Updated weights for policy 0, policy_version 699776 (0.0014) [2024-06-15 20:43:19,872][1652475] Updated weights for policy 0, policy_version 699829 (0.0012) [2024-06-15 20:43:20,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 42052.3, 300 sec: 42320.7). Total num frames: 1433305088. Throughput: 0: 10797.6. Samples: 358371328. Policy #0 lag: (min: 20.0, avg: 101.5, max: 276.0) [2024-06-15 20:43:20,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 20:43:21,021][1652475] Updated weights for policy 0, policy_version 699873 (0.0014) [2024-06-15 20:43:24,090][1652475] Updated weights for policy 0, policy_version 699906 (0.0015) [2024-06-15 20:43:25,133][1652475] Updated weights for policy 0, policy_version 699966 (0.0012) [2024-06-15 20:43:25,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 42598.5, 300 sec: 42209.6). Total num frames: 1433534464. Throughput: 0: 10877.2. Samples: 358441984. Policy #0 lag: (min: 20.0, avg: 101.5, max: 276.0) [2024-06-15 20:43:25,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 20:43:26,951][1651340] Signal inference workers to stop experience collection... (35950 times) [2024-06-15 20:43:27,051][1652475] InferenceWorker_p0-w0: stopping experience collection (35950 times) [2024-06-15 20:43:27,237][1651340] Signal inference workers to resume experience collection... (35950 times) [2024-06-15 20:43:27,258][1652475] InferenceWorker_p0-w0: resuming experience collection (35950 times) [2024-06-15 20:43:28,311][1652475] Updated weights for policy 0, policy_version 700028 (0.0028) [2024-06-15 20:43:30,738][1648984] Fps is (10 sec: 45875.5, 60 sec: 43144.6, 300 sec: 42542.9). Total num frames: 1433763840. Throughput: 0: 10900.0. Samples: 358506496. 
Policy #0 lag: (min: 20.0, avg: 101.5, max: 276.0) [2024-06-15 20:43:30,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 20:43:31,458][1652475] Updated weights for policy 0, policy_version 700112 (0.0029) [2024-06-15 20:43:32,474][1652475] Updated weights for policy 0, policy_version 700160 (0.0015) [2024-06-15 20:43:35,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 43691.0, 300 sec: 42209.6). Total num frames: 1433927680. Throughput: 0: 10752.0. Samples: 358537216. Policy #0 lag: (min: 20.0, avg: 101.5, max: 276.0) [2024-06-15 20:43:35,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 20:43:37,731][1652475] Updated weights for policy 0, policy_version 700222 (0.0013) [2024-06-15 20:43:40,738][1648984] Fps is (10 sec: 29489.9, 60 sec: 41505.9, 300 sec: 42098.5). Total num frames: 1434058752. Throughput: 0: 10877.1. Samples: 358605824. Policy #0 lag: (min: 20.0, avg: 101.5, max: 276.0) [2024-06-15 20:43:40,739][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 20:43:41,915][1652475] Updated weights for policy 0, policy_version 700274 (0.0012) [2024-06-15 20:43:43,449][1652475] Updated weights for policy 0, policy_version 700352 (0.0016) [2024-06-15 20:43:45,738][1648984] Fps is (10 sec: 52428.2, 60 sec: 43690.8, 300 sec: 42987.2). Total num frames: 1434451968. Throughput: 0: 10736.1. Samples: 358660096. Policy #0 lag: (min: 20.0, avg: 101.5, max: 276.0) [2024-06-15 20:43:45,738][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 20:43:49,719][1652475] Updated weights for policy 0, policy_version 700417 (0.0014) [2024-06-15 20:43:50,738][1648984] Fps is (10 sec: 45877.0, 60 sec: 41506.1, 300 sec: 42432.0). Total num frames: 1434517504. Throughput: 0: 11093.3. Samples: 358699520. 
Policy #0 lag: (min: 20.0, avg: 101.5, max: 276.0) [2024-06-15 20:43:50,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 20:43:51,113][1652475] Updated weights for policy 0, policy_version 700477 (0.0022) [2024-06-15 20:43:53,857][1652475] Updated weights for policy 0, policy_version 700544 (0.0013) [2024-06-15 20:43:55,740][1648984] Fps is (10 sec: 39321.6, 60 sec: 44783.0, 300 sec: 42653.9). Total num frames: 1434845184. Throughput: 0: 10934.0. Samples: 358763008. Policy #0 lag: (min: 20.0, avg: 101.5, max: 276.0) [2024-06-15 20:43:55,741][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 20:43:55,986][1652475] Updated weights for policy 0, policy_version 700624 (0.0113) [2024-06-15 20:43:57,059][1652475] Updated weights for policy 0, policy_version 700672 (0.0023) [2024-06-15 20:44:00,738][1648984] Fps is (10 sec: 45875.3, 60 sec: 43711.6, 300 sec: 42653.9). Total num frames: 1434976256. Throughput: 0: 10865.8. Samples: 358824960. Policy #0 lag: (min: 20.0, avg: 101.5, max: 276.0) [2024-06-15 20:44:00,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 20:44:03,982][1652475] Updated weights for policy 0, policy_version 700734 (0.0012) [2024-06-15 20:44:05,140][1652475] Updated weights for policy 0, policy_version 700784 (0.0012) [2024-06-15 20:44:05,738][1648984] Fps is (10 sec: 39321.9, 60 sec: 43690.7, 300 sec: 43098.3). Total num frames: 1435238400. Throughput: 0: 10934.1. Samples: 358863360. Policy #0 lag: (min: 20.0, avg: 101.5, max: 276.0) [2024-06-15 20:44:05,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 20:44:06,723][1652475] Updated weights for policy 0, policy_version 700836 (0.0011) [2024-06-15 20:44:07,813][1651340] Signal inference workers to stop experience collection... (36000 times) [2024-06-15 20:44:07,884][1652475] InferenceWorker_p0-w0: stopping experience collection (36000 times) [2024-06-15 20:44:08,067][1651340] Signal inference workers to resume experience collection... 
(36000 times) [2024-06-15 20:44:08,068][1652475] InferenceWorker_p0-w0: resuming experience collection (36000 times) [2024-06-15 20:44:08,340][1652475] Updated weights for policy 0, policy_version 700904 (0.0013) [2024-06-15 20:44:10,738][1648984] Fps is (10 sec: 52429.2, 60 sec: 43690.8, 300 sec: 43101.1). Total num frames: 1435500544. Throughput: 0: 10547.2. Samples: 358916608. Policy #0 lag: (min: 20.0, avg: 101.5, max: 276.0) [2024-06-15 20:44:10,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 20:44:14,889][1652475] Updated weights for policy 0, policy_version 700946 (0.0014) [2024-06-15 20:44:15,738][1648984] Fps is (10 sec: 39321.0, 60 sec: 42052.2, 300 sec: 42765.0). Total num frames: 1435631616. Throughput: 0: 10899.9. Samples: 358996992. Policy #0 lag: (min: 20.0, avg: 101.5, max: 276.0) [2024-06-15 20:44:15,739][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 20:44:16,210][1652475] Updated weights for policy 0, policy_version 701012 (0.0012) [2024-06-15 20:44:17,897][1652475] Updated weights for policy 0, policy_version 701075 (0.0012) [2024-06-15 20:44:18,832][1652475] Updated weights for policy 0, policy_version 701120 (0.0012) [2024-06-15 20:44:20,738][1648984] Fps is (10 sec: 39321.0, 60 sec: 43144.5, 300 sec: 43098.3). Total num frames: 1435893760. Throughput: 0: 10706.5. Samples: 359019008. Policy #0 lag: (min: 63.0, avg: 148.0, max: 287.0) [2024-06-15 20:44:20,738][1648984] Avg episode reward: [(0, '-0.250')] [2024-06-15 20:44:22,123][1652475] Updated weights for policy 0, policy_version 701172 (0.0013) [2024-06-15 20:44:25,738][1648984] Fps is (10 sec: 39322.1, 60 sec: 41506.1, 300 sec: 42653.9). Total num frames: 1436024832. Throughput: 0: 10820.4. Samples: 359092736. 
Policy #0 lag: (min: 63.0, avg: 148.0, max: 287.0) [2024-06-15 20:44:25,738][1648984] Avg episode reward: [(0, '-0.250')] [2024-06-15 20:44:27,657][1652475] Updated weights for policy 0, policy_version 701254 (0.0078) [2024-06-15 20:44:29,752][1652475] Updated weights for policy 0, policy_version 701333 (0.0011) [2024-06-15 20:44:30,738][1648984] Fps is (10 sec: 52429.1, 60 sec: 44236.8, 300 sec: 43542.6). Total num frames: 1436418048. Throughput: 0: 10899.9. Samples: 359150592. Policy #0 lag: (min: 63.0, avg: 148.0, max: 287.0) [2024-06-15 20:44:30,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 20:44:33,912][1652475] Updated weights for policy 0, policy_version 701424 (0.0013) [2024-06-15 20:44:35,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 43690.6, 300 sec: 43098.2). Total num frames: 1436549120. Throughput: 0: 10854.4. Samples: 359187968. Policy #0 lag: (min: 63.0, avg: 148.0, max: 287.0) [2024-06-15 20:44:35,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 20:44:38,737][1652475] Updated weights for policy 0, policy_version 701472 (0.0009) [2024-06-15 20:44:40,739][1648984] Fps is (10 sec: 32767.9, 60 sec: 44783.2, 300 sec: 42876.1). Total num frames: 1436745728. Throughput: 0: 10911.3. Samples: 359254016. Policy #0 lag: (min: 63.0, avg: 148.0, max: 287.0) [2024-06-15 20:44:40,740][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 20:44:40,825][1652475] Updated weights for policy 0, policy_version 701541 (0.0012) [2024-06-15 20:44:42,801][1652475] Updated weights for policy 0, policy_version 701626 (0.0014) [2024-06-15 20:44:45,738][1648984] Fps is (10 sec: 42598.5, 60 sec: 42052.3, 300 sec: 43098.3). Total num frames: 1436975104. Throughput: 0: 10934.0. Samples: 359316992. 
Policy #0 lag: (min: 63.0, avg: 148.0, max: 287.0) [2024-06-15 20:44:45,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 20:44:46,235][1652475] Updated weights for policy 0, policy_version 701670 (0.0010) [2024-06-15 20:44:50,738][1648984] Fps is (10 sec: 36044.6, 60 sec: 43144.5, 300 sec: 42765.0). Total num frames: 1437106176. Throughput: 0: 10808.9. Samples: 359349760. Policy #0 lag: (min: 63.0, avg: 148.0, max: 287.0) [2024-06-15 20:44:50,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 20:44:51,032][1652475] Updated weights for policy 0, policy_version 701728 (0.0012) [2024-06-15 20:44:52,756][1651340] Signal inference workers to stop experience collection... (36050 times) [2024-06-15 20:44:52,809][1652475] InferenceWorker_p0-w0: stopping experience collection (36050 times) [2024-06-15 20:44:52,981][1651340] Signal inference workers to resume experience collection... (36050 times) [2024-06-15 20:44:52,982][1652475] InferenceWorker_p0-w0: resuming experience collection (36050 times) [2024-06-15 20:44:53,496][1652475] Updated weights for policy 0, policy_version 701824 (0.0012) [2024-06-15 20:44:55,063][1652475] Updated weights for policy 0, policy_version 701884 (0.0012) [2024-06-15 20:44:55,760][1648984] Fps is (10 sec: 49042.4, 60 sec: 43674.5, 300 sec: 43206.1). Total num frames: 1437466624. Throughput: 0: 10814.9. Samples: 359403520. Policy #0 lag: (min: 63.0, avg: 148.0, max: 287.0) [2024-06-15 20:44:55,761][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 20:44:55,793][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000701888_1437466624.pth... [2024-06-15 20:44:55,842][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000696800_1427046400.pth [2024-06-15 20:44:59,077][1652475] Updated weights for policy 0, policy_version 701923 (0.0012) [2024-06-15 20:45:00,738][1648984] Fps is (10 sec: 49152.3, 60 sec: 43690.6, 300 sec: 42765.0). 
Total num frames: 1437597696. Throughput: 0: 10626.9. Samples: 359475200. Policy #0 lag: (min: 63.0, avg: 148.0, max: 287.0) [2024-06-15 20:45:00,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 20:45:02,639][1652475] Updated weights for policy 0, policy_version 701972 (0.0011) [2024-06-15 20:45:05,445][1652475] Updated weights for policy 0, policy_version 702064 (0.0013) [2024-06-15 20:45:05,737][1648984] Fps is (10 sec: 36126.1, 60 sec: 43144.6, 300 sec: 42987.2). Total num frames: 1437827072. Throughput: 0: 10888.6. Samples: 359508992. Policy #0 lag: (min: 63.0, avg: 148.0, max: 287.0) [2024-06-15 20:45:05,739][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 20:45:06,433][1652475] Updated weights for policy 0, policy_version 702112 (0.0011) [2024-06-15 20:45:10,678][1652475] Updated weights for policy 0, policy_version 702176 (0.0014) [2024-06-15 20:45:10,738][1648984] Fps is (10 sec: 45875.0, 60 sec: 42598.3, 300 sec: 42876.1). Total num frames: 1438056448. Throughput: 0: 10672.3. Samples: 359572992. Policy #0 lag: (min: 63.0, avg: 148.0, max: 287.0) [2024-06-15 20:45:10,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 20:45:13,776][1652475] Updated weights for policy 0, policy_version 702213 (0.0013) [2024-06-15 20:45:14,987][1652475] Updated weights for policy 0, policy_version 702269 (0.0014) [2024-06-15 20:45:15,738][1648984] Fps is (10 sec: 42597.5, 60 sec: 43690.7, 300 sec: 42876.1). Total num frames: 1438253056. Throughput: 0: 10843.0. Samples: 359638528. Policy #0 lag: (min: 63.0, avg: 148.0, max: 287.0) [2024-06-15 20:45:15,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 20:45:17,135][1652475] Updated weights for policy 0, policy_version 702328 (0.0011) [2024-06-15 20:45:19,941][1652475] Updated weights for policy 0, policy_version 702395 (0.0012) [2024-06-15 20:45:20,738][1648984] Fps is (10 sec: 45875.6, 60 sec: 43690.7, 300 sec: 43098.3). Total num frames: 1438515200. Throughput: 0: 10740.6. 
Samples: 359671296. Policy #0 lag: (min: 63.0, avg: 148.0, max: 287.0) [2024-06-15 20:45:20,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 20:45:23,085][1652475] Updated weights for policy 0, policy_version 702457 (0.0011) [2024-06-15 20:45:25,738][1648984] Fps is (10 sec: 42598.6, 60 sec: 44236.8, 300 sec: 42876.1). Total num frames: 1438679040. Throughput: 0: 10763.4. Samples: 359738368. Policy #0 lag: (min: 63.0, avg: 148.0, max: 287.0) [2024-06-15 20:45:25,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 20:45:26,449][1652475] Updated weights for policy 0, policy_version 702523 (0.0105) [2024-06-15 20:45:28,584][1652475] Updated weights for policy 0, policy_version 702576 (0.0011) [2024-06-15 20:45:30,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 41506.1, 300 sec: 42653.9). Total num frames: 1438908416. Throughput: 0: 10797.5. Samples: 359802880. Policy #0 lag: (min: 63.0, avg: 148.0, max: 287.0) [2024-06-15 20:45:30,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 20:45:31,820][1652475] Updated weights for policy 0, policy_version 702628 (0.0015) [2024-06-15 20:45:34,469][1652475] Updated weights for policy 0, policy_version 702679 (0.0013) [2024-06-15 20:45:35,738][1648984] Fps is (10 sec: 49152.0, 60 sec: 43690.7, 300 sec: 43320.4). Total num frames: 1439170560. Throughput: 0: 10797.5. Samples: 359835648. Policy #0 lag: (min: 63.0, avg: 148.0, max: 287.0) [2024-06-15 20:45:35,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 20:45:36,388][1652475] Updated weights for policy 0, policy_version 702722 (0.0015) [2024-06-15 20:45:39,951][1652475] Updated weights for policy 0, policy_version 702785 (0.0020) [2024-06-15 20:45:40,518][1651340] Signal inference workers to stop experience collection... (36100 times) [2024-06-15 20:45:40,552][1652475] InferenceWorker_p0-w0: stopping experience collection (36100 times) [2024-06-15 20:45:40,738][1648984] Fps is (10 sec: 45875.3, 60 sec: 43690.7, 300 sec: 42987.2). 
Total num frames: 1439367168. Throughput: 0: 11144.4. Samples: 359904768. Policy #0 lag: (min: 63.0, avg: 148.0, max: 287.0) [2024-06-15 20:45:40,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 20:45:40,756][1651340] Signal inference workers to resume experience collection... (36100 times) [2024-06-15 20:45:40,756][1652475] InferenceWorker_p0-w0: resuming experience collection (36100 times) [2024-06-15 20:45:42,089][1652475] Updated weights for policy 0, policy_version 702864 (0.0012) [2024-06-15 20:45:45,713][1652475] Updated weights for policy 0, policy_version 702944 (0.0015) [2024-06-15 20:45:45,750][1648984] Fps is (10 sec: 45817.8, 60 sec: 44227.6, 300 sec: 43318.6). Total num frames: 1439629312. Throughput: 0: 11022.0. Samples: 359971328. Policy #0 lag: (min: 63.0, avg: 148.0, max: 287.0) [2024-06-15 20:45:45,751][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 20:45:49,513][1652475] Updated weights for policy 0, policy_version 703024 (0.0011) [2024-06-15 20:45:50,738][1648984] Fps is (10 sec: 45874.9, 60 sec: 45329.1, 300 sec: 43542.6). Total num frames: 1439825920. Throughput: 0: 10990.9. Samples: 360003584. Policy #0 lag: (min: 63.0, avg: 148.0, max: 287.0) [2024-06-15 20:45:50,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 20:45:52,514][1652475] Updated weights for policy 0, policy_version 703047 (0.0010) [2024-06-15 20:45:54,797][1652475] Updated weights for policy 0, policy_version 703120 (0.0013) [2024-06-15 20:45:55,738][1648984] Fps is (10 sec: 42651.0, 60 sec: 43160.5, 300 sec: 42987.1). Total num frames: 1440055296. Throughput: 0: 11025.0. Samples: 360069120. Policy #0 lag: (min: 63.0, avg: 148.0, max: 287.0) [2024-06-15 20:45:55,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 20:45:58,221][1652475] Updated weights for policy 0, policy_version 703226 (0.0013) [2024-06-15 20:46:00,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 43690.7, 300 sec: 43098.3). Total num frames: 1440219136. 
Throughput: 0: 10911.3. Samples: 360129536. Policy #0 lag: (min: 47.0, avg: 176.1, max: 303.0) [2024-06-15 20:46:00,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 20:46:03,052][1652475] Updated weights for policy 0, policy_version 703294 (0.0012) [2024-06-15 20:46:05,738][1648984] Fps is (10 sec: 42598.8, 60 sec: 44236.6, 300 sec: 43211.7). Total num frames: 1440481280. Throughput: 0: 10911.3. Samples: 360162304. Policy #0 lag: (min: 47.0, avg: 176.1, max: 303.0) [2024-06-15 20:46:05,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 20:46:07,376][1652475] Updated weights for policy 0, policy_version 703376 (0.0034) [2024-06-15 20:46:09,278][1652475] Updated weights for policy 0, policy_version 703458 (0.0011) [2024-06-15 20:46:10,738][1648984] Fps is (10 sec: 52428.1, 60 sec: 44782.9, 300 sec: 43431.5). Total num frames: 1440743424. Throughput: 0: 10820.2. Samples: 360225280. Policy #0 lag: (min: 47.0, avg: 176.1, max: 303.0) [2024-06-15 20:46:10,738][1648984] Avg episode reward: [(0, '-0.250')] [2024-06-15 20:46:15,441][1652475] Updated weights for policy 0, policy_version 703536 (0.0096) [2024-06-15 20:46:15,738][1648984] Fps is (10 sec: 36044.9, 60 sec: 43144.5, 300 sec: 43209.3). Total num frames: 1440841728. Throughput: 0: 10990.9. Samples: 360297472. Policy #0 lag: (min: 47.0, avg: 176.1, max: 303.0) [2024-06-15 20:46:15,738][1648984] Avg episode reward: [(0, '-0.250')] [2024-06-15 20:46:17,245][1652475] Updated weights for policy 0, policy_version 703613 (0.0011) [2024-06-15 20:46:20,597][1652475] Updated weights for policy 0, policy_version 703677 (0.0012) [2024-06-15 20:46:20,738][1648984] Fps is (10 sec: 39322.4, 60 sec: 43690.7, 300 sec: 43431.5). Total num frames: 1441136640. Throughput: 0: 10888.5. Samples: 360325632. 
Policy #0 lag: (min: 47.0, avg: 176.1, max: 303.0) [2024-06-15 20:46:20,738][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 20:46:21,879][1652475] Updated weights for policy 0, policy_version 703728 (0.0025) [2024-06-15 20:46:25,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 43144.5, 300 sec: 43098.3). Total num frames: 1441267712. Throughput: 0: 10877.1. Samples: 360394240. Policy #0 lag: (min: 47.0, avg: 176.1, max: 303.0) [2024-06-15 20:46:25,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 20:46:26,441][1652475] Updated weights for policy 0, policy_version 703792 (0.0013) [2024-06-15 20:46:27,639][1652475] Updated weights for policy 0, policy_version 703840 (0.0012) [2024-06-15 20:46:27,755][1651340] Signal inference workers to stop experience collection... (36150 times) [2024-06-15 20:46:27,863][1652475] InferenceWorker_p0-w0: stopping experience collection (36150 times) [2024-06-15 20:46:28,013][1651340] Signal inference workers to resume experience collection... (36150 times) [2024-06-15 20:46:28,016][1652475] InferenceWorker_p0-w0: resuming experience collection (36150 times) [2024-06-15 20:46:30,637][1652475] Updated weights for policy 0, policy_version 703878 (0.0015) [2024-06-15 20:46:30,738][1648984] Fps is (10 sec: 39321.3, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 1441529856. Throughput: 0: 10914.3. Samples: 360462336. Policy #0 lag: (min: 47.0, avg: 176.1, max: 303.0) [2024-06-15 20:46:30,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 20:46:31,694][1652475] Updated weights for policy 0, policy_version 703932 (0.0014) [2024-06-15 20:46:35,682][1652475] Updated weights for policy 0, policy_version 703996 (0.0011) [2024-06-15 20:46:35,738][1648984] Fps is (10 sec: 49152.7, 60 sec: 43144.6, 300 sec: 43209.4). Total num frames: 1441759232. Throughput: 0: 10968.2. Samples: 360497152. 
Policy #0 lag: (min: 47.0, avg: 176.1, max: 303.0) [2024-06-15 20:46:35,738][1648984] Avg episode reward: [(0, '-0.250')] [2024-06-15 20:46:37,866][1652475] Updated weights for policy 0, policy_version 704061 (0.0012) [2024-06-15 20:46:39,683][1652475] Updated weights for policy 0, policy_version 704128 (0.0120) [2024-06-15 20:46:40,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 44782.9, 300 sec: 43542.6). Total num frames: 1442054144. Throughput: 0: 10820.3. Samples: 360556032. Policy #0 lag: (min: 47.0, avg: 176.1, max: 303.0) [2024-06-15 20:46:40,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 20:46:43,245][1652475] Updated weights for policy 0, policy_version 704164 (0.0010) [2024-06-15 20:46:45,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 42607.3, 300 sec: 43431.5). Total num frames: 1442185216. Throughput: 0: 11116.1. Samples: 360629760. Policy #0 lag: (min: 47.0, avg: 176.1, max: 303.0) [2024-06-15 20:46:45,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 20:46:47,691][1652475] Updated weights for policy 0, policy_version 704226 (0.0011) [2024-06-15 20:46:49,661][1652475] Updated weights for policy 0, policy_version 704320 (0.0111) [2024-06-15 20:46:50,738][1648984] Fps is (10 sec: 45875.4, 60 sec: 44783.0, 300 sec: 43653.6). Total num frames: 1442512896. Throughput: 0: 11173.0. Samples: 360665088. Policy #0 lag: (min: 47.0, avg: 176.1, max: 303.0) [2024-06-15 20:46:50,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 20:46:51,185][1652475] Updated weights for policy 0, policy_version 704376 (0.0012) [2024-06-15 20:46:54,561][1652475] Updated weights for policy 0, policy_version 704418 (0.0014) [2024-06-15 20:46:55,321][1652475] Updated weights for policy 0, policy_version 704448 (0.0020) [2024-06-15 20:46:55,738][1648984] Fps is (10 sec: 52427.0, 60 sec: 44236.7, 300 sec: 43986.8). Total num frames: 1442709504. Throughput: 0: 11241.2. Samples: 360731136. 
Policy #0 lag: (min: 47.0, avg: 176.1, max: 303.0) [2024-06-15 20:46:55,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 20:46:55,742][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000704448_1442709504.pth... [2024-06-15 20:46:55,857][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000699328_1432223744.pth [2024-06-15 20:46:59,742][1652475] Updated weights for policy 0, policy_version 704512 (0.0012) [2024-06-15 20:47:00,738][1648984] Fps is (10 sec: 39321.0, 60 sec: 44782.9, 300 sec: 43655.5). Total num frames: 1442906112. Throughput: 0: 11298.1. Samples: 360805888. Policy #0 lag: (min: 47.0, avg: 176.1, max: 303.0) [2024-06-15 20:47:00,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 20:47:01,010][1652475] Updated weights for policy 0, policy_version 704561 (0.0130) [2024-06-15 20:47:03,074][1652475] Updated weights for policy 0, policy_version 704638 (0.0015) [2024-06-15 20:47:05,738][1648984] Fps is (10 sec: 45876.5, 60 sec: 44783.0, 300 sec: 43764.7). Total num frames: 1443168256. Throughput: 0: 11138.8. Samples: 360826880. Policy #0 lag: (min: 47.0, avg: 176.1, max: 303.0) [2024-06-15 20:47:05,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 20:47:05,871][1652475] Updated weights for policy 0, policy_version 704688 (0.0016) [2024-06-15 20:47:10,738][1648984] Fps is (10 sec: 36045.2, 60 sec: 42052.4, 300 sec: 43209.3). Total num frames: 1443266560. Throughput: 0: 11355.1. Samples: 360905216. Policy #0 lag: (min: 47.0, avg: 176.1, max: 303.0) [2024-06-15 20:47:10,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 20:47:10,778][1652475] Updated weights for policy 0, policy_version 704721 (0.0011) [2024-06-15 20:47:12,279][1652475] Updated weights for policy 0, policy_version 704789 (0.0018) [2024-06-15 20:47:13,112][1651340] Signal inference workers to stop experience collection... 
(36200 times) [2024-06-15 20:47:13,160][1652475] InferenceWorker_p0-w0: stopping experience collection (36200 times) [2024-06-15 20:47:13,401][1651340] Signal inference workers to resume experience collection... (36200 times) [2024-06-15 20:47:13,402][1652475] InferenceWorker_p0-w0: resuming experience collection (36200 times) [2024-06-15 20:47:14,725][1652475] Updated weights for policy 0, policy_version 704884 (0.0081) [2024-06-15 20:47:15,738][1648984] Fps is (10 sec: 45874.8, 60 sec: 46421.3, 300 sec: 43542.5). Total num frames: 1443627008. Throughput: 0: 11036.4. Samples: 360958976. Policy #0 lag: (min: 47.0, avg: 176.1, max: 303.0) [2024-06-15 20:47:15,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 20:47:17,994][1652475] Updated weights for policy 0, policy_version 704948 (0.0012) [2024-06-15 20:47:20,738][1648984] Fps is (10 sec: 49151.6, 60 sec: 43690.6, 300 sec: 43320.4). Total num frames: 1443758080. Throughput: 0: 10979.5. Samples: 360991232. Policy #0 lag: (min: 47.0, avg: 176.1, max: 303.0) [2024-06-15 20:47:20,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 20:47:23,042][1652475] Updated weights for policy 0, policy_version 704992 (0.0015) [2024-06-15 20:47:23,980][1652475] Updated weights for policy 0, policy_version 705029 (0.0020) [2024-06-15 20:47:25,210][1652475] Updated weights for policy 0, policy_version 705078 (0.0013) [2024-06-15 20:47:25,740][1648984] Fps is (10 sec: 39321.8, 60 sec: 45875.2, 300 sec: 43542.6). Total num frames: 1444020224. Throughput: 0: 11184.3. Samples: 361059328. Policy #0 lag: (min: 47.0, avg: 176.1, max: 303.0) [2024-06-15 20:47:25,741][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 20:47:27,518][1652475] Updated weights for policy 0, policy_version 705138 (0.0024) [2024-06-15 20:47:29,887][1652475] Updated weights for policy 0, policy_version 705214 (0.0015) [2024-06-15 20:47:30,738][1648984] Fps is (10 sec: 52428.5, 60 sec: 45875.1, 300 sec: 43986.9). 
Total num frames: 1444282368. Throughput: 0: 10956.8. Samples: 361122816. Policy #0 lag: (min: 47.0, avg: 176.1, max: 303.0) [2024-06-15 20:47:30,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 20:47:35,453][1652475] Updated weights for policy 0, policy_version 705281 (0.0143) [2024-06-15 20:47:35,738][1648984] Fps is (10 sec: 42598.0, 60 sec: 44782.8, 300 sec: 43653.6). Total num frames: 1444446208. Throughput: 0: 11013.6. Samples: 361160704. Policy #0 lag: (min: 47.0, avg: 176.1, max: 303.0) [2024-06-15 20:47:35,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 20:47:36,739][1652475] Updated weights for policy 0, policy_version 705341 (0.0013) [2024-06-15 20:47:40,020][1652475] Updated weights for policy 0, policy_version 705402 (0.0011) [2024-06-15 20:47:40,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 43690.6, 300 sec: 43542.6). Total num frames: 1444675584. Throughput: 0: 11127.5. Samples: 361231872. Policy #0 lag: (min: 0.0, avg: 106.2, max: 256.0) [2024-06-15 20:47:40,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 20:47:41,718][1652475] Updated weights for policy 0, policy_version 705466 (0.0011) [2024-06-15 20:47:45,738][1648984] Fps is (10 sec: 36045.5, 60 sec: 43690.7, 300 sec: 43320.4). Total num frames: 1444806656. Throughput: 0: 10934.1. Samples: 361297920. Policy #0 lag: (min: 0.0, avg: 106.2, max: 256.0) [2024-06-15 20:47:45,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 20:47:47,368][1652475] Updated weights for policy 0, policy_version 705536 (0.0013) [2024-06-15 20:47:49,012][1652475] Updated weights for policy 0, policy_version 705596 (0.0012) [2024-06-15 20:47:50,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 42598.3, 300 sec: 43764.7). Total num frames: 1445068800. Throughput: 0: 10968.2. Samples: 361320448. 
Policy #0 lag: (min: 0.0, avg: 106.2, max: 256.0) [2024-06-15 20:47:50,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 20:47:52,358][1652475] Updated weights for policy 0, policy_version 705664 (0.0012) [2024-06-15 20:47:54,336][1652475] Updated weights for policy 0, policy_version 705724 (0.0012) [2024-06-15 20:47:55,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 43690.9, 300 sec: 43991.2). Total num frames: 1445330944. Throughput: 0: 10649.6. Samples: 361384448. Policy #0 lag: (min: 0.0, avg: 106.2, max: 256.0) [2024-06-15 20:47:55,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 20:47:59,948][1651340] Signal inference workers to stop experience collection... (36250 times) [2024-06-15 20:48:00,029][1652475] InferenceWorker_p0-w0: stopping experience collection (36250 times) [2024-06-15 20:48:00,051][1652475] Updated weights for policy 0, policy_version 705795 (0.0090) [2024-06-15 20:48:00,305][1651340] Signal inference workers to resume experience collection... (36250 times) [2024-06-15 20:48:00,314][1652475] InferenceWorker_p0-w0: resuming experience collection (36250 times) [2024-06-15 20:48:00,738][1648984] Fps is (10 sec: 42598.1, 60 sec: 43144.5, 300 sec: 43653.6). Total num frames: 1445494784. Throughput: 0: 10820.3. Samples: 361445888. Policy #0 lag: (min: 0.0, avg: 106.2, max: 256.0) [2024-06-15 20:48:00,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 20:48:04,846][1652475] Updated weights for policy 0, policy_version 705893 (0.0017) [2024-06-15 20:48:05,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 42598.4, 300 sec: 43542.6). Total num frames: 1445724160. Throughput: 0: 10717.9. Samples: 361473536. Policy #0 lag: (min: 0.0, avg: 106.2, max: 256.0) [2024-06-15 20:48:05,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 20:48:09,727][1652475] Updated weights for policy 0, policy_version 705968 (0.0094) [2024-06-15 20:48:10,738][1648984] Fps is (10 sec: 36044.1, 60 sec: 43144.3, 300 sec: 43209.3). 
Total num frames: 1445855232. Throughput: 0: 10547.2. Samples: 361533952. Policy #0 lag: (min: 0.0, avg: 106.2, max: 256.0) [2024-06-15 20:48:10,739][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 20:48:12,560][1652475] Updated weights for policy 0, policy_version 706047 (0.0017) [2024-06-15 20:48:14,536][1652475] Updated weights for policy 0, policy_version 706106 (0.0103) [2024-06-15 20:48:15,738][1648984] Fps is (10 sec: 39320.6, 60 sec: 41506.1, 300 sec: 43431.5). Total num frames: 1446117376. Throughput: 0: 10365.1. Samples: 361589248. Policy #0 lag: (min: 0.0, avg: 106.2, max: 256.0) [2024-06-15 20:48:15,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 20:48:17,607][1652475] Updated weights for policy 0, policy_version 706168 (0.0011) [2024-06-15 20:48:20,738][1648984] Fps is (10 sec: 39322.9, 60 sec: 41506.2, 300 sec: 43098.3). Total num frames: 1446248448. Throughput: 0: 10126.3. Samples: 361616384. Policy #0 lag: (min: 0.0, avg: 106.2, max: 256.0) [2024-06-15 20:48:20,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 20:48:25,219][1652475] Updated weights for policy 0, policy_version 706276 (0.0234) [2024-06-15 20:48:25,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 41506.1, 300 sec: 43209.3). Total num frames: 1446510592. Throughput: 0: 9955.5. Samples: 361679872. Policy #0 lag: (min: 0.0, avg: 106.2, max: 256.0) [2024-06-15 20:48:25,739][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 20:48:29,621][1652475] Updated weights for policy 0, policy_version 706368 (0.0014) [2024-06-15 20:48:30,738][1648984] Fps is (10 sec: 45875.0, 60 sec: 40414.0, 300 sec: 43320.4). Total num frames: 1446707200. Throughput: 0: 9705.2. Samples: 361734656. 
Policy #0 lag: (min: 0.0, avg: 106.2, max: 256.0) [2024-06-15 20:48:30,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 20:48:31,148][1652475] Updated weights for policy 0, policy_version 706432 (0.0093) [2024-06-15 20:48:35,738][1648984] Fps is (10 sec: 32768.7, 60 sec: 39867.9, 300 sec: 43320.5). Total num frames: 1446838272. Throughput: 0: 10092.1. Samples: 361774592. Policy #0 lag: (min: 0.0, avg: 106.2, max: 256.0) [2024-06-15 20:48:35,738][1648984] Avg episode reward: [(0, '-0.180')] [2024-06-15 20:48:36,874][1652475] Updated weights for policy 0, policy_version 706515 (0.0011) [2024-06-15 20:48:40,738][1648984] Fps is (10 sec: 36044.5, 60 sec: 39867.7, 300 sec: 42765.0). Total num frames: 1447067648. Throughput: 0: 10023.8. Samples: 361835520. Policy #0 lag: (min: 0.0, avg: 106.2, max: 256.0) [2024-06-15 20:48:40,738][1648984] Avg episode reward: [(0, '-0.230')] [2024-06-15 20:48:41,136][1652475] Updated weights for policy 0, policy_version 706608 (0.0013) [2024-06-15 20:48:43,126][1652475] Updated weights for policy 0, policy_version 706682 (0.0012) [2024-06-15 20:48:45,738][1648984] Fps is (10 sec: 45875.0, 60 sec: 41506.1, 300 sec: 43320.4). Total num frames: 1447297024. Throughput: 0: 10069.3. Samples: 361899008. Policy #0 lag: (min: 0.0, avg: 106.2, max: 256.0) [2024-06-15 20:48:45,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 20:48:47,920][1652475] Updated weights for policy 0, policy_version 706707 (0.0010) [2024-06-15 20:48:49,380][1652475] Updated weights for policy 0, policy_version 706757 (0.0012) [2024-06-15 20:48:50,060][1651340] Signal inference workers to stop experience collection... (36300 times) [2024-06-15 20:48:50,110][1652475] InferenceWorker_p0-w0: stopping experience collection (36300 times) [2024-06-15 20:48:50,288][1651340] Signal inference workers to resume experience collection... 
(36300 times) [2024-06-15 20:48:50,290][1652475] InferenceWorker_p0-w0: resuming experience collection (36300 times) [2024-06-15 20:48:50,337][1652475] Updated weights for policy 0, policy_version 706801 (0.0014) [2024-06-15 20:48:50,738][1648984] Fps is (10 sec: 49152.4, 60 sec: 41506.2, 300 sec: 43098.3). Total num frames: 1447559168. Throughput: 0: 10251.4. Samples: 361934848. Policy #0 lag: (min: 0.0, avg: 106.2, max: 256.0) [2024-06-15 20:48:50,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 20:48:52,652][1652475] Updated weights for policy 0, policy_version 706877 (0.0014) [2024-06-15 20:48:54,758][1652475] Updated weights for policy 0, policy_version 706935 (0.0020) [2024-06-15 20:48:55,738][1648984] Fps is (10 sec: 52428.3, 60 sec: 41506.0, 300 sec: 43542.5). Total num frames: 1447821312. Throughput: 0: 10194.5. Samples: 361992704. Policy #0 lag: (min: 0.0, avg: 106.2, max: 256.0) [2024-06-15 20:48:55,739][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 20:48:55,746][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000706944_1447821312.pth... [2024-06-15 20:48:55,786][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000701888_1437466624.pth [2024-06-15 20:48:59,406][1652475] Updated weights for policy 0, policy_version 706993 (0.0014) [2024-06-15 20:49:00,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 40960.1, 300 sec: 43098.2). Total num frames: 1447952384. Throughput: 0: 10570.0. Samples: 362064896. Policy #0 lag: (min: 0.0, avg: 106.2, max: 256.0) [2024-06-15 20:49:00,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 20:49:03,483][1652475] Updated weights for policy 0, policy_version 707043 (0.0015) [2024-06-15 20:49:05,181][1652475] Updated weights for policy 0, policy_version 707106 (0.0012) [2024-06-15 20:49:05,738][1648984] Fps is (10 sec: 36045.2, 60 sec: 40960.0, 300 sec: 42987.2). Total num frames: 1448181760. Throughput: 0: 10763.4. 
Samples: 362100736. Policy #0 lag: (min: 0.0, avg: 106.2, max: 256.0) [2024-06-15 20:49:05,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 20:49:07,099][1652475] Updated weights for policy 0, policy_version 707198 (0.0196) [2024-06-15 20:49:10,738][1648984] Fps is (10 sec: 42598.1, 60 sec: 42052.4, 300 sec: 43209.3). Total num frames: 1448378368. Throughput: 0: 10774.8. Samples: 362164736. Policy #0 lag: (min: 0.0, avg: 106.2, max: 256.0) [2024-06-15 20:49:10,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 20:49:11,698][1652475] Updated weights for policy 0, policy_version 707256 (0.0015) [2024-06-15 20:49:15,366][1652475] Updated weights for policy 0, policy_version 707312 (0.0011) [2024-06-15 20:49:15,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 40960.1, 300 sec: 42987.2). Total num frames: 1448574976. Throughput: 0: 11104.7. Samples: 362234368. Policy #0 lag: (min: 0.0, avg: 106.2, max: 256.0) [2024-06-15 20:49:15,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 20:49:17,569][1652475] Updated weights for policy 0, policy_version 707394 (0.0011) [2024-06-15 20:49:20,738][1648984] Fps is (10 sec: 49151.5, 60 sec: 43690.5, 300 sec: 43542.5). Total num frames: 1448869888. Throughput: 0: 10638.2. Samples: 362253312. Policy #0 lag: (min: 0.0, avg: 106.2, max: 256.0) [2024-06-15 20:49:20,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 20:49:22,790][1652475] Updated weights for policy 0, policy_version 707459 (0.0012) [2024-06-15 20:49:24,079][1652475] Updated weights for policy 0, policy_version 707519 (0.0014) [2024-06-15 20:49:25,738][1648984] Fps is (10 sec: 42598.2, 60 sec: 41506.2, 300 sec: 42653.9). Total num frames: 1449000960. Throughput: 0: 10934.0. Samples: 362327552. 
Policy #0 lag: (min: 9.0, avg: 109.9, max: 265.0) [2024-06-15 20:49:25,738][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 20:49:29,126][1652475] Updated weights for policy 0, policy_version 707616 (0.0012) [2024-06-15 20:49:30,738][1648984] Fps is (10 sec: 45875.5, 60 sec: 43690.6, 300 sec: 43320.4). Total num frames: 1449328640. Throughput: 0: 10649.6. Samples: 362378240. Policy #0 lag: (min: 9.0, avg: 109.9, max: 265.0) [2024-06-15 20:49:30,739][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 20:49:30,994][1652475] Updated weights for policy 0, policy_version 707708 (0.0016) [2024-06-15 20:49:35,644][1651340] Signal inference workers to stop experience collection... (36350 times) [2024-06-15 20:49:35,679][1652475] InferenceWorker_p0-w0: stopping experience collection (36350 times) [2024-06-15 20:49:35,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 43144.4, 300 sec: 42987.2). Total num frames: 1449426944. Throughput: 0: 10661.0. Samples: 362414592. Policy #0 lag: (min: 9.0, avg: 109.9, max: 265.0) [2024-06-15 20:49:35,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 20:49:35,881][1651340] Signal inference workers to resume experience collection... (36350 times) [2024-06-15 20:49:35,882][1652475] InferenceWorker_p0-w0: resuming experience collection (36350 times) [2024-06-15 20:49:39,677][1652475] Updated weights for policy 0, policy_version 707777 (0.0012) [2024-06-15 20:49:40,738][1648984] Fps is (10 sec: 26214.7, 60 sec: 42052.3, 300 sec: 42765.0). Total num frames: 1449590784. Throughput: 0: 11002.3. Samples: 362487808. Policy #0 lag: (min: 9.0, avg: 109.9, max: 265.0) [2024-06-15 20:49:40,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 20:49:41,818][1652475] Updated weights for policy 0, policy_version 707856 (0.0034) [2024-06-15 20:49:44,104][1652475] Updated weights for policy 0, policy_version 707952 (0.0090) [2024-06-15 20:49:45,738][1648984] Fps is (10 sec: 49152.3, 60 sec: 43690.7, 300 sec: 43431.5). 
Total num frames: 1449918464. Throughput: 0: 10535.8. Samples: 362539008. Policy #0 lag: (min: 9.0, avg: 109.9, max: 265.0) [2024-06-15 20:49:45,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 20:49:49,218][1652475] Updated weights for policy 0, policy_version 708025 (0.0011) [2024-06-15 20:49:50,738][1648984] Fps is (10 sec: 45874.8, 60 sec: 41506.1, 300 sec: 42657.2). Total num frames: 1450049536. Throughput: 0: 10467.5. Samples: 362571776. Policy #0 lag: (min: 9.0, avg: 109.9, max: 265.0) [2024-06-15 20:49:50,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 20:49:53,089][1652475] Updated weights for policy 0, policy_version 708085 (0.0104) [2024-06-15 20:49:53,941][1652475] Updated weights for policy 0, policy_version 708117 (0.0018) [2024-06-15 20:49:54,653][1652475] Updated weights for policy 0, policy_version 708157 (0.0048) [2024-06-15 20:49:55,738][1648984] Fps is (10 sec: 39320.1, 60 sec: 41505.9, 300 sec: 43098.2). Total num frames: 1450311680. Throughput: 0: 10387.8. Samples: 362632192. Policy #0 lag: (min: 9.0, avg: 109.9, max: 265.0) [2024-06-15 20:49:55,739][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 20:49:57,316][1652475] Updated weights for policy 0, policy_version 708223 (0.0011) [2024-06-15 20:50:00,738][1648984] Fps is (10 sec: 42598.5, 60 sec: 42052.2, 300 sec: 42876.1). Total num frames: 1450475520. Throughput: 0: 10285.5. Samples: 362697216. Policy #0 lag: (min: 9.0, avg: 109.9, max: 265.0) [2024-06-15 20:50:00,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 20:50:04,121][1652475] Updated weights for policy 0, policy_version 708294 (0.0014) [2024-06-15 20:50:05,738][1648984] Fps is (10 sec: 39323.1, 60 sec: 42052.2, 300 sec: 42876.1). Total num frames: 1450704896. Throughput: 0: 10524.5. Samples: 362726912. 
Policy #0 lag: (min: 9.0, avg: 109.9, max: 265.0) [2024-06-15 20:50:05,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 20:50:05,911][1652475] Updated weights for policy 0, policy_version 708354 (0.0012) [2024-06-15 20:50:08,974][1652475] Updated weights for policy 0, policy_version 708448 (0.0011) [2024-06-15 20:50:10,738][1648984] Fps is (10 sec: 49152.1, 60 sec: 43144.6, 300 sec: 43098.3). Total num frames: 1450967040. Throughput: 0: 10251.4. Samples: 362788864. Policy #0 lag: (min: 9.0, avg: 109.9, max: 265.0) [2024-06-15 20:50:10,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 20:50:12,992][1652475] Updated weights for policy 0, policy_version 708498 (0.0029) [2024-06-15 20:50:13,706][1652475] Updated weights for policy 0, policy_version 708541 (0.0011) [2024-06-15 20:50:15,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 42598.4, 300 sec: 42765.0). Total num frames: 1451130880. Throughput: 0: 10854.4. Samples: 362866688. Policy #0 lag: (min: 9.0, avg: 109.9, max: 265.0) [2024-06-15 20:50:15,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 20:50:17,537][1652475] Updated weights for policy 0, policy_version 708611 (0.0088) [2024-06-15 20:50:18,707][1652475] Updated weights for policy 0, policy_version 708667 (0.0012) [2024-06-15 20:50:20,738][1648984] Fps is (10 sec: 49151.9, 60 sec: 43144.6, 300 sec: 43320.4). Total num frames: 1451458560. Throughput: 0: 10672.4. Samples: 362894848. Policy #0 lag: (min: 9.0, avg: 109.9, max: 265.0) [2024-06-15 20:50:20,739][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 20:50:20,828][1652475] Updated weights for policy 0, policy_version 708732 (0.0018) [2024-06-15 20:50:25,031][1651340] Signal inference workers to stop experience collection... (36400 times) [2024-06-15 20:50:25,088][1652475] InferenceWorker_p0-w0: stopping experience collection (36400 times) [2024-06-15 20:50:25,296][1651340] Signal inference workers to resume experience collection... 
(36400 times) [2024-06-15 20:50:25,297][1652475] InferenceWorker_p0-w0: resuming experience collection (36400 times) [2024-06-15 20:50:25,738][1648984] Fps is (10 sec: 42598.7, 60 sec: 42598.5, 300 sec: 42876.1). Total num frames: 1451556864. Throughput: 0: 10592.7. Samples: 362964480. Policy #0 lag: (min: 9.0, avg: 109.9, max: 265.0) [2024-06-15 20:50:25,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 20:50:26,204][1652475] Updated weights for policy 0, policy_version 708792 (0.0016) [2024-06-15 20:50:28,551][1652475] Updated weights for policy 0, policy_version 708864 (0.0014) [2024-06-15 20:50:30,307][1652475] Updated weights for policy 0, policy_version 708926 (0.0013) [2024-06-15 20:50:30,738][1648984] Fps is (10 sec: 42598.7, 60 sec: 42598.5, 300 sec: 43098.3). Total num frames: 1451884544. Throughput: 0: 10695.1. Samples: 363020288. Policy #0 lag: (min: 9.0, avg: 109.9, max: 265.0) [2024-06-15 20:50:30,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 20:50:32,478][1652475] Updated weights for policy 0, policy_version 708976 (0.0013) [2024-06-15 20:50:35,738][1648984] Fps is (10 sec: 45874.5, 60 sec: 43144.5, 300 sec: 42876.1). Total num frames: 1452015616. Throughput: 0: 10649.6. Samples: 363051008. Policy #0 lag: (min: 9.0, avg: 109.9, max: 265.0) [2024-06-15 20:50:35,739][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 20:50:38,835][1652475] Updated weights for policy 0, policy_version 709010 (0.0014) [2024-06-15 20:50:40,155][1652475] Updated weights for policy 0, policy_version 709061 (0.0016) [2024-06-15 20:50:40,738][1648984] Fps is (10 sec: 29490.8, 60 sec: 43144.4, 300 sec: 42544.6). Total num frames: 1452179456. Throughput: 0: 10979.6. Samples: 363126272. 
Policy #0 lag: (min: 9.0, avg: 109.9, max: 265.0) [2024-06-15 20:50:40,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 20:50:41,490][1652475] Updated weights for policy 0, policy_version 709111 (0.0011) [2024-06-15 20:50:43,953][1652475] Updated weights for policy 0, policy_version 709202 (0.0118) [2024-06-15 20:50:45,738][1648984] Fps is (10 sec: 52429.7, 60 sec: 43690.7, 300 sec: 43098.3). Total num frames: 1452539904. Throughput: 0: 10592.7. Samples: 363173888. Policy #0 lag: (min: 9.0, avg: 109.9, max: 265.0) [2024-06-15 20:50:45,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 20:50:50,738][1648984] Fps is (10 sec: 36045.2, 60 sec: 41506.2, 300 sec: 42320.7). Total num frames: 1452539904. Throughput: 0: 10797.5. Samples: 363212800. Policy #0 lag: (min: 9.0, avg: 109.9, max: 265.0) [2024-06-15 20:50:50,742][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 20:50:52,423][1652475] Updated weights for policy 0, policy_version 709280 (0.0012) [2024-06-15 20:50:54,807][1652475] Updated weights for policy 0, policy_version 709376 (0.0071) [2024-06-15 20:50:55,738][1648984] Fps is (10 sec: 32766.6, 60 sec: 42598.4, 300 sec: 42876.0). Total num frames: 1452867584. Throughput: 0: 10774.7. Samples: 363273728. Policy #0 lag: (min: 9.0, avg: 109.9, max: 265.0) [2024-06-15 20:50:55,739][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 20:50:56,107][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000709440_1452933120.pth... 
[2024-06-15 20:50:56,228][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000704448_1442709504.pth [2024-06-15 20:50:56,242][1651340] Saving a milestone train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/milestones/checkpoint_000709440_1452933120.pth [2024-06-15 20:50:56,836][1652475] Updated weights for policy 0, policy_version 709472 (0.0015) [2024-06-15 20:51:00,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 43144.6, 300 sec: 42654.0). Total num frames: 1453064192. Throughput: 0: 10353.8. Samples: 363332608. Policy #0 lag: (min: 9.0, avg: 109.9, max: 265.0) [2024-06-15 20:51:00,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 20:51:04,975][1652475] Updated weights for policy 0, policy_version 709520 (0.0014) [2024-06-15 20:51:05,738][1648984] Fps is (10 sec: 29491.8, 60 sec: 40959.9, 300 sec: 42098.5). Total num frames: 1453162496. Throughput: 0: 10535.8. Samples: 363368960. Policy #0 lag: (min: 15.0, avg: 61.3, max: 271.0) [2024-06-15 20:51:05,739][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 20:51:05,824][1652475] Updated weights for policy 0, policy_version 709564 (0.0024) [2024-06-15 20:51:07,307][1652475] Updated weights for policy 0, policy_version 709637 (0.0071) [2024-06-15 20:51:07,634][1651340] Signal inference workers to stop experience collection... (36450 times) [2024-06-15 20:51:07,679][1652475] InferenceWorker_p0-w0: stopping experience collection (36450 times) [2024-06-15 20:51:07,957][1651340] Signal inference workers to resume experience collection... (36450 times) [2024-06-15 20:51:07,958][1652475] InferenceWorker_p0-w0: resuming experience collection (36450 times) [2024-06-15 20:51:09,160][1652475] Updated weights for policy 0, policy_version 709712 (0.0100) [2024-06-15 20:51:10,738][1648984] Fps is (10 sec: 52428.1, 60 sec: 43690.6, 300 sec: 43209.3). Total num frames: 1453588480. Throughput: 0: 10205.8. Samples: 363423744. 
Policy #0 lag: (min: 15.0, avg: 61.3, max: 271.0) [2024-06-15 20:51:10,739][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 20:51:15,738][1648984] Fps is (10 sec: 42599.4, 60 sec: 40960.1, 300 sec: 42209.6). Total num frames: 1453588480. Throughput: 0: 10581.3. Samples: 363496448. Policy #0 lag: (min: 15.0, avg: 61.3, max: 271.0) [2024-06-15 20:51:15,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 20:51:16,373][1652475] Updated weights for policy 0, policy_version 709761 (0.0012) [2024-06-15 20:51:18,728][1652475] Updated weights for policy 0, policy_version 709856 (0.0013) [2024-06-15 20:51:20,097][1652475] Updated weights for policy 0, policy_version 709904 (0.0011) [2024-06-15 20:51:20,738][1648984] Fps is (10 sec: 36045.4, 60 sec: 41506.2, 300 sec: 42987.2). Total num frames: 1453948928. Throughput: 0: 10558.6. Samples: 363526144. Policy #0 lag: (min: 15.0, avg: 61.3, max: 271.0) [2024-06-15 20:51:20,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 20:51:21,815][1652475] Updated weights for policy 0, policy_version 709984 (0.0090) [2024-06-15 20:51:25,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 42598.4, 300 sec: 42653.9). Total num frames: 1454112768. Throughput: 0: 10285.6. Samples: 363589120. Policy #0 lag: (min: 15.0, avg: 61.3, max: 271.0) [2024-06-15 20:51:25,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 20:51:28,569][1652475] Updated weights for policy 0, policy_version 710064 (0.0014) [2024-06-15 20:51:29,762][1652475] Updated weights for policy 0, policy_version 710112 (0.0015) [2024-06-15 20:51:30,738][1648984] Fps is (10 sec: 42598.1, 60 sec: 41506.1, 300 sec: 42765.0). Total num frames: 1454374912. Throughput: 0: 10786.1. Samples: 363659264. 
Policy #0 lag: (min: 15.0, avg: 61.3, max: 271.0) [2024-06-15 20:51:30,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 20:51:31,816][1652475] Updated weights for policy 0, policy_version 710182 (0.0020) [2024-06-15 20:51:32,919][1652475] Updated weights for policy 0, policy_version 710213 (0.0014) [2024-06-15 20:51:34,366][1652475] Updated weights for policy 0, policy_version 710272 (0.0028) [2024-06-15 20:51:35,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 43690.8, 300 sec: 42653.9). Total num frames: 1454637056. Throughput: 0: 10706.5. Samples: 363694592. Policy #0 lag: (min: 15.0, avg: 61.3, max: 271.0) [2024-06-15 20:51:35,740][1648984] Avg episode reward: [(0, '-0.230')] [2024-06-15 20:51:40,738][1648984] Fps is (10 sec: 39320.8, 60 sec: 43144.5, 300 sec: 42653.9). Total num frames: 1454768128. Throughput: 0: 10831.7. Samples: 363761152. Policy #0 lag: (min: 15.0, avg: 61.3, max: 271.0) [2024-06-15 20:51:40,738][1648984] Avg episode reward: [(0, '-0.210')] [2024-06-15 20:51:41,812][1652475] Updated weights for policy 0, policy_version 710368 (0.0019) [2024-06-15 20:51:43,318][1652475] Updated weights for policy 0, policy_version 710417 (0.0012) [2024-06-15 20:51:44,057][1652475] Updated weights for policy 0, policy_version 710460 (0.0012) [2024-06-15 20:51:45,216][1652475] Updated weights for policy 0, policy_version 710497 (0.0014) [2024-06-15 20:51:45,738][1648984] Fps is (10 sec: 49152.0, 60 sec: 43144.5, 300 sec: 42765.0). Total num frames: 1455128576. Throughput: 0: 11036.4. Samples: 363829248. Policy #0 lag: (min: 15.0, avg: 61.3, max: 271.0) [2024-06-15 20:51:45,738][1648984] Avg episode reward: [(0, '-0.220')] [2024-06-15 20:51:50,139][1652475] Updated weights for policy 0, policy_version 710544 (0.0013) [2024-06-15 20:51:50,738][1648984] Fps is (10 sec: 45876.4, 60 sec: 44783.0, 300 sec: 42431.8). Total num frames: 1455226880. Throughput: 0: 10991.0. Samples: 363863552. 
Policy #0 lag: (min: 15.0, avg: 61.3, max: 271.0) [2024-06-15 20:51:50,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 20:51:53,798][1652475] Updated weights for policy 0, policy_version 710624 (0.0119) [2024-06-15 20:51:54,279][1651340] Signal inference workers to stop experience collection... (36500 times) [2024-06-15 20:51:54,312][1652475] InferenceWorker_p0-w0: stopping experience collection (36500 times) [2024-06-15 20:51:54,517][1651340] Signal inference workers to resume experience collection... (36500 times) [2024-06-15 20:51:54,519][1652475] InferenceWorker_p0-w0: resuming experience collection (36500 times) [2024-06-15 20:51:54,979][1652475] Updated weights for policy 0, policy_version 710676 (0.0011) [2024-06-15 20:51:55,738][1648984] Fps is (10 sec: 39321.3, 60 sec: 44237.0, 300 sec: 42765.0). Total num frames: 1455521792. Throughput: 0: 11423.3. Samples: 363937792. Policy #0 lag: (min: 15.0, avg: 61.3, max: 271.0) [2024-06-15 20:51:55,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 20:51:57,097][1652475] Updated weights for policy 0, policy_version 710752 (0.0127) [2024-06-15 20:52:00,738][1648984] Fps is (10 sec: 45874.7, 60 sec: 43690.6, 300 sec: 42431.8). Total num frames: 1455685632. Throughput: 0: 11081.9. Samples: 363995136. Policy #0 lag: (min: 15.0, avg: 61.3, max: 271.0) [2024-06-15 20:52:00,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 20:52:02,363][1652475] Updated weights for policy 0, policy_version 710817 (0.0017) [2024-06-15 20:52:05,511][1652475] Updated weights for policy 0, policy_version 710869 (0.0011) [2024-06-15 20:52:05,738][1648984] Fps is (10 sec: 36045.0, 60 sec: 45329.2, 300 sec: 42765.0). Total num frames: 1455882240. Throughput: 0: 11150.2. Samples: 364027904. 
Policy #0 lag: (min: 15.0, avg: 61.3, max: 271.0) [2024-06-15 20:52:05,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 20:52:07,710][1652475] Updated weights for policy 0, policy_version 710944 (0.0011) [2024-06-15 20:52:10,227][1652475] Updated weights for policy 0, policy_version 711031 (0.0142) [2024-06-15 20:52:10,738][1648984] Fps is (10 sec: 52427.8, 60 sec: 43690.5, 300 sec: 42653.9). Total num frames: 1456209920. Throughput: 0: 11150.1. Samples: 364090880. Policy #0 lag: (min: 15.0, avg: 61.3, max: 271.0) [2024-06-15 20:52:10,739][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 20:52:13,961][1652475] Updated weights for policy 0, policy_version 711072 (0.0015) [2024-06-15 20:52:15,738][1648984] Fps is (10 sec: 45875.5, 60 sec: 45875.2, 300 sec: 42654.0). Total num frames: 1456340992. Throughput: 0: 11025.1. Samples: 364155392. Policy #0 lag: (min: 15.0, avg: 61.3, max: 271.0) [2024-06-15 20:52:15,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 20:52:17,701][1652475] Updated weights for policy 0, policy_version 711139 (0.0013) [2024-06-15 20:52:20,738][1648984] Fps is (10 sec: 32768.7, 60 sec: 43144.5, 300 sec: 42431.8). Total num frames: 1456537600. Throughput: 0: 11036.4. Samples: 364191232. Policy #0 lag: (min: 15.0, avg: 61.3, max: 271.0) [2024-06-15 20:52:20,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 20:52:20,978][1652475] Updated weights for policy 0, policy_version 711218 (0.0108) [2024-06-15 20:52:22,863][1652475] Updated weights for policy 0, policy_version 711295 (0.0013) [2024-06-15 20:52:25,738][1648984] Fps is (10 sec: 39321.0, 60 sec: 43690.5, 300 sec: 42209.6). Total num frames: 1456734208. Throughput: 0: 10820.3. Samples: 364248064. 
Policy #0 lag: (min: 15.0, avg: 61.3, max: 271.0) [2024-06-15 20:52:25,739][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 20:52:26,893][1652475] Updated weights for policy 0, policy_version 711358 (0.0011) [2024-06-15 20:52:30,335][1652475] Updated weights for policy 0, policy_version 711423 (0.0109) [2024-06-15 20:52:30,738][1648984] Fps is (10 sec: 45875.6, 60 sec: 43690.7, 300 sec: 42542.9). Total num frames: 1456996352. Throughput: 0: 10899.9. Samples: 364319744. Policy #0 lag: (min: 15.0, avg: 61.3, max: 271.0) [2024-06-15 20:52:30,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 20:52:33,330][1652475] Updated weights for policy 0, policy_version 711488 (0.0020) [2024-06-15 20:52:35,738][1648984] Fps is (10 sec: 52428.3, 60 sec: 43690.5, 300 sec: 42653.9). Total num frames: 1457258496. Throughput: 0: 10934.0. Samples: 364355584. Policy #0 lag: (min: 15.0, avg: 61.3, max: 271.0) [2024-06-15 20:52:35,739][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 20:52:37,710][1652475] Updated weights for policy 0, policy_version 711554 (0.0011) [2024-06-15 20:52:40,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 43690.9, 300 sec: 42653.9). Total num frames: 1457389568. Throughput: 0: 10695.1. Samples: 364419072. Policy #0 lag: (min: 15.0, avg: 61.3, max: 271.0) [2024-06-15 20:52:40,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 20:52:41,177][1651340] Signal inference workers to stop experience collection... (36550 times) [2024-06-15 20:52:41,201][1652475] InferenceWorker_p0-w0: stopping experience collection (36550 times) [2024-06-15 20:52:41,447][1651340] Signal inference workers to resume experience collection... 
(36550 times) [2024-06-15 20:52:41,449][1652475] InferenceWorker_p0-w0: resuming experience collection (36550 times) [2024-06-15 20:52:42,458][1652475] Updated weights for policy 0, policy_version 711672 (0.0013) [2024-06-15 20:52:44,033][1652475] Updated weights for policy 0, policy_version 711716 (0.0012) [2024-06-15 20:52:45,738][1648984] Fps is (10 sec: 45876.1, 60 sec: 43144.5, 300 sec: 42876.1). Total num frames: 1457717248. Throughput: 0: 10763.4. Samples: 364479488. Policy #0 lag: (min: 15.0, avg: 61.3, max: 271.0) [2024-06-15 20:52:45,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 20:52:46,202][1652475] Updated weights for policy 0, policy_version 711803 (0.0013) [2024-06-15 20:52:50,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 42598.4, 300 sec: 42209.6). Total num frames: 1457782784. Throughput: 0: 10752.0. Samples: 364511744. Policy #0 lag: (min: 111.0, avg: 215.7, max: 367.0) [2024-06-15 20:52:50,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 20:52:52,743][1652475] Updated weights for policy 0, policy_version 711862 (0.0012) [2024-06-15 20:52:55,267][1652475] Updated weights for policy 0, policy_version 711923 (0.0013) [2024-06-15 20:52:55,738][1648984] Fps is (10 sec: 32768.0, 60 sec: 42052.3, 300 sec: 42542.9). Total num frames: 1458044928. Throughput: 0: 10945.5. Samples: 364583424. Policy #0 lag: (min: 111.0, avg: 215.7, max: 367.0) [2024-06-15 20:52:55,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 20:52:56,363][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000711968_1458110464.pth... 
[2024-06-15 20:52:56,564][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000706944_1447821312.pth [2024-06-15 20:52:57,164][1652475] Updated weights for policy 0, policy_version 712000 (0.0012) [2024-06-15 20:52:58,817][1652475] Updated weights for policy 0, policy_version 712063 (0.0013) [2024-06-15 20:53:00,742][1648984] Fps is (10 sec: 52405.2, 60 sec: 43687.5, 300 sec: 42653.3). Total num frames: 1458307072. Throughput: 0: 10614.4. Samples: 364633088. Policy #0 lag: (min: 111.0, avg: 215.7, max: 367.0) [2024-06-15 20:53:00,743][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 20:53:05,738][1648984] Fps is (10 sec: 29491.2, 60 sec: 40960.0, 300 sec: 42320.7). Total num frames: 1458339840. Throughput: 0: 10717.9. Samples: 364673536. Policy #0 lag: (min: 111.0, avg: 215.7, max: 367.0) [2024-06-15 20:53:05,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 20:53:07,538][1652475] Updated weights for policy 0, policy_version 712160 (0.0011) [2024-06-15 20:53:09,987][1652475] Updated weights for policy 0, policy_version 712241 (0.0013) [2024-06-15 20:53:10,738][1648984] Fps is (10 sec: 42617.1, 60 sec: 42052.4, 300 sec: 42765.0). Total num frames: 1458733056. Throughput: 0: 10661.0. Samples: 364727808. Policy #0 lag: (min: 111.0, avg: 215.7, max: 367.0) [2024-06-15 20:53:10,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 20:53:11,380][1652475] Updated weights for policy 0, policy_version 712311 (0.0192) [2024-06-15 20:53:15,738][1648984] Fps is (10 sec: 49152.1, 60 sec: 41506.1, 300 sec: 42653.9). Total num frames: 1458831360. Throughput: 0: 10490.3. Samples: 364791808. 
Policy #0 lag: (min: 111.0, avg: 215.7, max: 367.0) [2024-06-15 20:53:15,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 20:53:18,888][1652475] Updated weights for policy 0, policy_version 712368 (0.0014) [2024-06-15 20:53:20,635][1652475] Updated weights for policy 0, policy_version 712432 (0.0012) [2024-06-15 20:53:20,738][1648984] Fps is (10 sec: 32768.1, 60 sec: 42052.3, 300 sec: 42542.9). Total num frames: 1459060736. Throughput: 0: 10524.5. Samples: 364829184. Policy #0 lag: (min: 111.0, avg: 215.7, max: 367.0) [2024-06-15 20:53:20,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 20:53:22,297][1652475] Updated weights for policy 0, policy_version 712512 (0.0014) [2024-06-15 20:53:22,782][1651340] Signal inference workers to stop experience collection... (36600 times) [2024-06-15 20:53:22,797][1652475] InferenceWorker_p0-w0: stopping experience collection (36600 times) [2024-06-15 20:53:23,005][1651340] Signal inference workers to resume experience collection... (36600 times) [2024-06-15 20:53:23,006][1652475] InferenceWorker_p0-w0: resuming experience collection (36600 times) [2024-06-15 20:53:25,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 43690.7, 300 sec: 42876.1). Total num frames: 1459355648. Throughput: 0: 10194.5. Samples: 364877824. Policy #0 lag: (min: 111.0, avg: 215.7, max: 367.0) [2024-06-15 20:53:25,739][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 20:53:30,649][1652475] Updated weights for policy 0, policy_version 712592 (0.0217) [2024-06-15 20:53:30,738][1648984] Fps is (10 sec: 32768.1, 60 sec: 39867.7, 300 sec: 42542.9). Total num frames: 1459388416. Throughput: 0: 10570.0. Samples: 364955136. 
Policy #0 lag: (min: 111.0, avg: 215.7, max: 367.0) [2024-06-15 20:53:30,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 20:53:32,693][1652475] Updated weights for policy 0, policy_version 712672 (0.0012) [2024-06-15 20:53:34,305][1652475] Updated weights for policy 0, policy_version 712752 (0.0013) [2024-06-15 20:53:35,738][1648984] Fps is (10 sec: 49152.5, 60 sec: 43144.7, 300 sec: 43320.4). Total num frames: 1459847168. Throughput: 0: 10387.9. Samples: 364979200. Policy #0 lag: (min: 111.0, avg: 215.7, max: 367.0) [2024-06-15 20:53:35,738][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 20:53:35,857][1652475] Updated weights for policy 0, policy_version 712822 (0.0078) [2024-06-15 20:53:40,738][1648984] Fps is (10 sec: 49150.8, 60 sec: 41506.0, 300 sec: 42653.9). Total num frames: 1459879936. Throughput: 0: 10274.1. Samples: 365045760. Policy #0 lag: (min: 111.0, avg: 215.7, max: 367.0) [2024-06-15 20:53:40,739][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 20:53:42,958][1652475] Updated weights for policy 0, policy_version 712891 (0.0010) [2024-06-15 20:53:44,287][1652475] Updated weights for policy 0, policy_version 712936 (0.0014) [2024-06-15 20:53:45,277][1652475] Updated weights for policy 0, policy_version 712986 (0.0072) [2024-06-15 20:53:45,738][1648984] Fps is (10 sec: 39321.2, 60 sec: 42052.3, 300 sec: 42987.2). Total num frames: 1460240384. Throughput: 0: 10696.2. Samples: 365114368. Policy #0 lag: (min: 111.0, avg: 215.7, max: 367.0) [2024-06-15 20:53:45,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 20:53:47,358][1652475] Updated weights for policy 0, policy_version 713043 (0.0012) [2024-06-15 20:53:50,738][1648984] Fps is (10 sec: 52430.2, 60 sec: 43690.7, 300 sec: 42654.0). Total num frames: 1460404224. Throughput: 0: 10422.1. Samples: 365142528. 
Policy #0 lag: (min: 111.0, avg: 215.7, max: 367.0) [2024-06-15 20:53:50,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 20:53:54,215][1652475] Updated weights for policy 0, policy_version 713120 (0.0013) [2024-06-15 20:53:55,738][1648984] Fps is (10 sec: 32768.2, 60 sec: 42052.3, 300 sec: 42765.0). Total num frames: 1460568064. Throughput: 0: 10922.7. Samples: 365219328. Policy #0 lag: (min: 111.0, avg: 215.7, max: 367.0) [2024-06-15 20:53:55,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 20:53:56,757][1652475] Updated weights for policy 0, policy_version 713219 (0.0127) [2024-06-15 20:53:57,963][1652475] Updated weights for policy 0, policy_version 713280 (0.0025) [2024-06-15 20:54:00,129][1652475] Updated weights for policy 0, policy_version 713344 (0.0011) [2024-06-15 20:54:00,738][1648984] Fps is (10 sec: 52426.6, 60 sec: 43693.6, 300 sec: 43209.3). Total num frames: 1460928512. Throughput: 0: 10706.4. Samples: 365273600. Policy #0 lag: (min: 111.0, avg: 215.7, max: 367.0) [2024-06-15 20:54:00,739][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 20:54:05,738][1648984] Fps is (10 sec: 36044.6, 60 sec: 43144.5, 300 sec: 42542.9). Total num frames: 1460928512. Throughput: 0: 10763.4. Samples: 365313536. Policy #0 lag: (min: 111.0, avg: 215.7, max: 367.0) [2024-06-15 20:54:05,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 20:54:06,777][1652475] Updated weights for policy 0, policy_version 713403 (0.0013) [2024-06-15 20:54:08,447][1651340] Signal inference workers to stop experience collection... (36650 times) [2024-06-15 20:54:08,519][1652475] InferenceWorker_p0-w0: stopping experience collection (36650 times) [2024-06-15 20:54:08,757][1651340] Signal inference workers to resume experience collection... 
(36650 times) [2024-06-15 20:54:08,758][1652475] InferenceWorker_p0-w0: resuming experience collection (36650 times) [2024-06-15 20:54:09,412][1652475] Updated weights for policy 0, policy_version 713506 (0.0019) [2024-06-15 20:54:10,738][1648984] Fps is (10 sec: 39322.7, 60 sec: 43144.5, 300 sec: 43209.3). Total num frames: 1461321728. Throughput: 0: 11036.4. Samples: 365374464. Policy #0 lag: (min: 111.0, avg: 215.7, max: 367.0) [2024-06-15 20:54:10,739][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 20:54:11,551][1652475] Updated weights for policy 0, policy_version 713584 (0.0020) [2024-06-15 20:54:15,738][1648984] Fps is (10 sec: 52427.9, 60 sec: 43690.5, 300 sec: 42653.9). Total num frames: 1461452800. Throughput: 0: 10820.2. Samples: 365442048. Policy #0 lag: (min: 111.0, avg: 215.7, max: 367.0) [2024-06-15 20:54:15,739][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 20:54:17,955][1652475] Updated weights for policy 0, policy_version 713622 (0.0012) [2024-06-15 20:54:19,302][1652475] Updated weights for policy 0, policy_version 713674 (0.0055) [2024-06-15 20:54:20,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 44236.8, 300 sec: 43098.2). Total num frames: 1461714944. Throughput: 0: 11207.1. Samples: 365483520. Policy #0 lag: (min: 111.0, avg: 215.7, max: 367.0) [2024-06-15 20:54:20,738][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 20:54:20,880][1652475] Updated weights for policy 0, policy_version 713731 (0.0137) [2024-06-15 20:54:23,536][1652475] Updated weights for policy 0, policy_version 713840 (0.0253) [2024-06-15 20:54:25,738][1648984] Fps is (10 sec: 52429.3, 60 sec: 43690.6, 300 sec: 42876.1). Total num frames: 1461977088. Throughput: 0: 10786.2. Samples: 365531136. 
Policy #0 lag: (min: 111.0, avg: 215.7, max: 367.0) [2024-06-15 20:54:25,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 20:54:29,863][1652475] Updated weights for policy 0, policy_version 713893 (0.0013) [2024-06-15 20:54:30,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 45329.0, 300 sec: 42987.2). Total num frames: 1462108160. Throughput: 0: 11047.8. Samples: 365611520. Policy #0 lag: (min: 15.0, avg: 80.6, max: 271.0) [2024-06-15 20:54:30,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 20:54:31,374][1652475] Updated weights for policy 0, policy_version 713952 (0.0025) [2024-06-15 20:54:35,114][1652475] Updated weights for policy 0, policy_version 714033 (0.0131) [2024-06-15 20:54:35,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 42052.2, 300 sec: 43320.4). Total num frames: 1462370304. Throughput: 0: 11059.2. Samples: 365640192. Policy #0 lag: (min: 15.0, avg: 80.6, max: 271.0) [2024-06-15 20:54:35,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 20:54:36,829][1652475] Updated weights for policy 0, policy_version 714104 (0.0012) [2024-06-15 20:54:40,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 43690.8, 300 sec: 42653.9). Total num frames: 1462501376. Throughput: 0: 10649.6. Samples: 365698560. Policy #0 lag: (min: 15.0, avg: 80.6, max: 271.0) [2024-06-15 20:54:40,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 20:54:42,682][1652475] Updated weights for policy 0, policy_version 714165 (0.0012) [2024-06-15 20:54:44,005][1652475] Updated weights for policy 0, policy_version 714208 (0.0014) [2024-06-15 20:54:44,843][1652475] Updated weights for policy 0, policy_version 714240 (0.0011) [2024-06-15 20:54:45,738][1648984] Fps is (10 sec: 39322.0, 60 sec: 42052.3, 300 sec: 43098.3). Total num frames: 1462763520. Throughput: 0: 11025.2. Samples: 365769728. 
Policy #0 lag: (min: 15.0, avg: 80.6, max: 271.0) [2024-06-15 20:54:45,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 20:54:47,781][1652475] Updated weights for policy 0, policy_version 714320 (0.0012) [2024-06-15 20:54:49,075][1652475] Updated weights for policy 0, policy_version 714368 (0.0011) [2024-06-15 20:54:50,738][1648984] Fps is (10 sec: 52429.1, 60 sec: 43690.7, 300 sec: 43098.3). Total num frames: 1463025664. Throughput: 0: 10615.5. Samples: 365791232. Policy #0 lag: (min: 15.0, avg: 80.6, max: 271.0) [2024-06-15 20:54:50,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 20:54:55,032][1652475] Updated weights for policy 0, policy_version 714429 (0.0016) [2024-06-15 20:54:55,258][1651340] Signal inference workers to stop experience collection... (36700 times) [2024-06-15 20:54:55,318][1652475] InferenceWorker_p0-w0: stopping experience collection (36700 times) [2024-06-15 20:54:55,626][1651340] Signal inference workers to resume experience collection... (36700 times) [2024-06-15 20:54:55,627][1652475] InferenceWorker_p0-w0: resuming experience collection (36700 times) [2024-06-15 20:54:55,738][1648984] Fps is (10 sec: 42596.9, 60 sec: 43690.4, 300 sec: 43098.2). Total num frames: 1463189504. Throughput: 0: 10945.4. Samples: 365867008. Policy #0 lag: (min: 15.0, avg: 80.6, max: 271.0) [2024-06-15 20:54:55,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 20:54:56,167][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000714464_1463222272.pth... [2024-06-15 20:54:56,309][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000709440_1452933120.pth [2024-06-15 20:54:56,875][1652475] Updated weights for policy 0, policy_version 714495 (0.0015) [2024-06-15 20:55:00,251][1652475] Updated weights for policy 0, policy_version 714561 (0.0140) [2024-06-15 20:55:00,738][1648984] Fps is (10 sec: 42597.1, 60 sec: 42052.3, 300 sec: 43209.3). 
Total num frames: 1463451648. Throughput: 0: 10626.8. Samples: 365920256. Policy #0 lag: (min: 15.0, avg: 80.6, max: 271.0) [2024-06-15 20:55:00,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 20:55:01,570][1652475] Updated weights for policy 0, policy_version 714617 (0.0013) [2024-06-15 20:55:05,738][1648984] Fps is (10 sec: 36045.4, 60 sec: 43690.6, 300 sec: 42653.9). Total num frames: 1463549952. Throughput: 0: 10433.4. Samples: 365953024. Policy #0 lag: (min: 15.0, avg: 80.6, max: 271.0) [2024-06-15 20:55:05,739][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 20:55:09,273][1652475] Updated weights for policy 0, policy_version 714690 (0.0012) [2024-06-15 20:55:10,737][1648984] Fps is (10 sec: 36046.3, 60 sec: 41506.3, 300 sec: 42987.2). Total num frames: 1463812096. Throughput: 0: 10786.2. Samples: 366016512. Policy #0 lag: (min: 15.0, avg: 80.6, max: 271.0) [2024-06-15 20:55:10,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 20:55:11,442][1652475] Updated weights for policy 0, policy_version 714769 (0.0010) [2024-06-15 20:55:13,348][1652475] Updated weights for policy 0, policy_version 714848 (0.0011) [2024-06-15 20:55:15,738][1648984] Fps is (10 sec: 52430.0, 60 sec: 43690.9, 300 sec: 42765.0). Total num frames: 1464074240. Throughput: 0: 10103.5. Samples: 366066176. Policy #0 lag: (min: 15.0, avg: 80.6, max: 271.0) [2024-06-15 20:55:15,740][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 20:55:20,738][1648984] Fps is (10 sec: 26214.1, 60 sec: 39321.7, 300 sec: 42431.8). Total num frames: 1464074240. Throughput: 0: 10376.6. Samples: 366107136. 
Policy #0 lag: (min: 15.0, avg: 80.6, max: 271.0) [2024-06-15 20:55:20,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 20:55:22,338][1652475] Updated weights for policy 0, policy_version 714914 (0.0012) [2024-06-15 20:55:24,187][1652475] Updated weights for policy 0, policy_version 714992 (0.0092) [2024-06-15 20:55:25,541][1652475] Updated weights for policy 0, policy_version 715041 (0.0010) [2024-06-15 20:55:25,738][1648984] Fps is (10 sec: 36044.5, 60 sec: 40960.1, 300 sec: 42542.9). Total num frames: 1464434688. Throughput: 0: 10399.3. Samples: 366166528. Policy #0 lag: (min: 15.0, avg: 80.6, max: 271.0) [2024-06-15 20:55:25,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 20:55:26,906][1652475] Updated weights for policy 0, policy_version 715120 (0.0011) [2024-06-15 20:55:30,738][1648984] Fps is (10 sec: 52428.3, 60 sec: 41506.1, 300 sec: 42654.0). Total num frames: 1464598528. Throughput: 0: 10319.6. Samples: 366234112. Policy #0 lag: (min: 15.0, avg: 80.6, max: 271.0) [2024-06-15 20:55:30,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 20:55:33,584][1652475] Updated weights for policy 0, policy_version 715168 (0.0011) [2024-06-15 20:55:35,121][1652475] Updated weights for policy 0, policy_version 715232 (0.0010) [2024-06-15 20:55:35,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 41506.2, 300 sec: 42987.2). Total num frames: 1464860672. Throughput: 0: 10604.1. Samples: 366268416. Policy #0 lag: (min: 15.0, avg: 80.6, max: 271.0) [2024-06-15 20:55:35,738][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 20:55:36,488][1652475] Updated weights for policy 0, policy_version 715296 (0.0047) [2024-06-15 20:55:36,913][1651340] Signal inference workers to stop experience collection... (36750 times) [2024-06-15 20:55:36,998][1652475] InferenceWorker_p0-w0: stopping experience collection (36750 times) [2024-06-15 20:55:37,079][1651340] Signal inference workers to resume experience collection... 
(36750 times) [2024-06-15 20:55:37,079][1652475] InferenceWorker_p0-w0: resuming experience collection (36750 times) [2024-06-15 20:55:37,420][1652475] Updated weights for policy 0, policy_version 715344 (0.0014) [2024-06-15 20:55:38,321][1652475] Updated weights for policy 0, policy_version 715389 (0.0012) [2024-06-15 20:55:40,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 43690.6, 300 sec: 42653.9). Total num frames: 1465122816. Throughput: 0: 10342.5. Samples: 366332416. Policy #0 lag: (min: 15.0, avg: 80.6, max: 271.0) [2024-06-15 20:55:40,738][1648984] Avg episode reward: [(0, '-0.250')] [2024-06-15 20:55:45,738][1648984] Fps is (10 sec: 36045.0, 60 sec: 40960.0, 300 sec: 42987.2). Total num frames: 1465221120. Throughput: 0: 10809.0. Samples: 366406656. Policy #0 lag: (min: 15.0, avg: 80.6, max: 271.0) [2024-06-15 20:55:45,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 20:55:46,121][1652475] Updated weights for policy 0, policy_version 715461 (0.0015) [2024-06-15 20:55:48,536][1652475] Updated weights for policy 0, policy_version 715584 (0.0014) [2024-06-15 20:55:49,906][1652475] Updated weights for policy 0, policy_version 715644 (0.0029) [2024-06-15 20:55:50,738][1648984] Fps is (10 sec: 52429.4, 60 sec: 43690.7, 300 sec: 43320.5). Total num frames: 1465647104. Throughput: 0: 10570.0. Samples: 366428672. Policy #0 lag: (min: 15.0, avg: 80.6, max: 271.0) [2024-06-15 20:55:50,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 20:55:55,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 40960.2, 300 sec: 42653.9). Total num frames: 1465647104. Throughput: 0: 10820.2. Samples: 366503424. 
Policy #0 lag: (min: 15.0, avg: 80.6, max: 271.0) [2024-06-15 20:55:55,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 20:55:58,436][1652475] Updated weights for policy 0, policy_version 715744 (0.0078) [2024-06-15 20:56:00,136][1652475] Updated weights for policy 0, policy_version 715808 (0.0013) [2024-06-15 20:56:00,738][1648984] Fps is (10 sec: 36044.8, 60 sec: 42598.6, 300 sec: 43542.6). Total num frames: 1466007552. Throughput: 0: 10922.7. Samples: 366557696. Policy #0 lag: (min: 15.0, avg: 80.6, max: 271.0) [2024-06-15 20:56:00,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 20:56:01,692][1652475] Updated weights for policy 0, policy_version 715875 (0.0027) [2024-06-15 20:56:05,738][1648984] Fps is (10 sec: 52425.8, 60 sec: 43690.4, 300 sec: 42653.9). Total num frames: 1466171392. Throughput: 0: 10751.9. Samples: 366590976. Policy #0 lag: (min: 15.0, avg: 80.6, max: 271.0) [2024-06-15 20:56:05,739][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 20:56:08,561][1652475] Updated weights for policy 0, policy_version 715920 (0.0011) [2024-06-15 20:56:10,448][1652475] Updated weights for policy 0, policy_version 716005 (0.0013) [2024-06-15 20:56:10,738][1648984] Fps is (10 sec: 39321.0, 60 sec: 43144.3, 300 sec: 43431.5). Total num frames: 1466400768. Throughput: 0: 11047.8. Samples: 366663680. Policy #0 lag: (min: 15.0, avg: 80.6, max: 271.0) [2024-06-15 20:56:10,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 20:56:11,186][1652475] Updated weights for policy 0, policy_version 716033 (0.0012) [2024-06-15 20:56:13,266][1652475] Updated weights for policy 0, policy_version 716112 (0.0012) [2024-06-15 20:56:14,539][1652475] Updated weights for policy 0, policy_version 716160 (0.0013) [2024-06-15 20:56:15,738][1648984] Fps is (10 sec: 52431.6, 60 sec: 43690.6, 300 sec: 43209.3). Total num frames: 1466695680. Throughput: 0: 10888.5. Samples: 366724096. 
Policy #0 lag: (min: 127.0, avg: 167.5, max: 367.0) [2024-06-15 20:56:15,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 20:56:20,738][1648984] Fps is (10 sec: 36045.3, 60 sec: 44782.9, 300 sec: 42876.1). Total num frames: 1466761216. Throughput: 0: 11070.6. Samples: 366766592. Policy #0 lag: (min: 127.0, avg: 167.5, max: 367.0) [2024-06-15 20:56:20,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 20:56:21,639][1652475] Updated weights for policy 0, policy_version 716240 (0.0013) [2024-06-15 20:56:21,780][1651340] Signal inference workers to stop experience collection... (36800 times) [2024-06-15 20:56:21,831][1652475] InferenceWorker_p0-w0: stopping experience collection (36800 times) [2024-06-15 20:56:22,073][1651340] Signal inference workers to resume experience collection... (36800 times) [2024-06-15 20:56:22,074][1652475] InferenceWorker_p0-w0: resuming experience collection (36800 times) [2024-06-15 20:56:24,000][1652475] Updated weights for policy 0, policy_version 716336 (0.0013) [2024-06-15 20:56:25,738][1648984] Fps is (10 sec: 39320.7, 60 sec: 44236.6, 300 sec: 43098.2). Total num frames: 1467088896. Throughput: 0: 10865.7. Samples: 366821376. Policy #0 lag: (min: 127.0, avg: 167.5, max: 367.0) [2024-06-15 20:56:25,739][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 20:56:26,429][1652475] Updated weights for policy 0, policy_version 716386 (0.0020) [2024-06-15 20:56:30,738][1648984] Fps is (10 sec: 45874.9, 60 sec: 43690.7, 300 sec: 42653.9). Total num frames: 1467219968. Throughput: 0: 10899.9. Samples: 366897152. Policy #0 lag: (min: 127.0, avg: 167.5, max: 367.0) [2024-06-15 20:56:30,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 20:56:32,945][1652475] Updated weights for policy 0, policy_version 716438 (0.0012) [2024-06-15 20:56:34,632][1652475] Updated weights for policy 0, policy_version 716515 (0.0013) [2024-06-15 20:56:35,738][1648984] Fps is (10 sec: 42599.4, 60 sec: 44236.8, 300 sec: 43209.4). 
Total num frames: 1467514880. Throughput: 0: 11161.6. Samples: 366930944. Policy #0 lag: (min: 127.0, avg: 167.5, max: 367.0) [2024-06-15 20:56:35,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 20:56:38,003][1652475] Updated weights for policy 0, policy_version 716613 (0.0012) [2024-06-15 20:56:40,738][1648984] Fps is (10 sec: 52428.5, 60 sec: 43690.6, 300 sec: 42765.0). Total num frames: 1467744256. Throughput: 0: 10638.2. Samples: 366982144. Policy #0 lag: (min: 127.0, avg: 167.5, max: 367.0) [2024-06-15 20:56:40,738][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 20:56:44,226][1652475] Updated weights for policy 0, policy_version 716677 (0.0025) [2024-06-15 20:56:45,738][1648984] Fps is (10 sec: 36043.8, 60 sec: 44236.6, 300 sec: 42876.0). Total num frames: 1467875328. Throughput: 0: 11070.5. Samples: 367055872. Policy #0 lag: (min: 127.0, avg: 167.5, max: 367.0) [2024-06-15 20:56:45,739][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 20:56:46,924][1652475] Updated weights for policy 0, policy_version 716784 (0.0012) [2024-06-15 20:56:49,859][1652475] Updated weights for policy 0, policy_version 716855 (0.0015) [2024-06-15 20:56:50,738][1648984] Fps is (10 sec: 42598.6, 60 sec: 42052.2, 300 sec: 42876.1). Total num frames: 1468170240. Throughput: 0: 10934.2. Samples: 367083008. Policy #0 lag: (min: 127.0, avg: 167.5, max: 367.0) [2024-06-15 20:56:50,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 20:56:51,482][1652475] Updated weights for policy 0, policy_version 716924 (0.0013) [2024-06-15 20:56:55,738][1648984] Fps is (10 sec: 39322.5, 60 sec: 43690.6, 300 sec: 42653.9). Total num frames: 1468268544. Throughput: 0: 10626.9. Samples: 367141888. Policy #0 lag: (min: 127.0, avg: 167.5, max: 367.0) [2024-06-15 20:56:55,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 20:56:55,744][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000716928_1468268544.pth... 
[2024-06-15 20:56:55,822][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000711968_1458110464.pth [2024-06-15 20:56:59,377][1652475] Updated weights for policy 0, policy_version 717024 (0.0015) [2024-06-15 20:57:00,232][1652475] Updated weights for policy 0, policy_version 717056 (0.0009) [2024-06-15 20:57:00,738][1648984] Fps is (10 sec: 36044.7, 60 sec: 42052.2, 300 sec: 42876.1). Total num frames: 1468530688. Throughput: 0: 10467.5. Samples: 367195136. Policy #0 lag: (min: 127.0, avg: 167.5, max: 367.0) [2024-06-15 20:57:00,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 20:57:03,893][1652475] Updated weights for policy 0, policy_version 717136 (0.0191) [2024-06-15 20:57:04,270][1651340] Signal inference workers to stop experience collection... (36850 times) [2024-06-15 20:57:04,319][1652475] InferenceWorker_p0-w0: stopping experience collection (36850 times) [2024-06-15 20:57:04,524][1651340] Signal inference workers to resume experience collection... (36850 times) [2024-06-15 20:57:04,525][1652475] InferenceWorker_p0-w0: resuming experience collection (36850 times) [2024-06-15 20:57:05,738][1648984] Fps is (10 sec: 52427.3, 60 sec: 43690.8, 300 sec: 42653.9). Total num frames: 1468792832. Throughput: 0: 10262.7. Samples: 367228416. Policy #0 lag: (min: 127.0, avg: 167.5, max: 367.0) [2024-06-15 20:57:05,739][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 20:57:10,720][1652475] Updated weights for policy 0, policy_version 717216 (0.0077) [2024-06-15 20:57:10,738][1648984] Fps is (10 sec: 32768.3, 60 sec: 40960.1, 300 sec: 42431.8). Total num frames: 1468858368. Throughput: 0: 10581.4. Samples: 367297536. 
Policy #0 lag: (min: 127.0, avg: 167.5, max: 367.0) [2024-06-15 20:57:10,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 20:57:13,256][1652475] Updated weights for policy 0, policy_version 717306 (0.0114) [2024-06-15 20:57:15,739][1648984] Fps is (10 sec: 32766.2, 60 sec: 40413.3, 300 sec: 42653.8). Total num frames: 1469120512. Throughput: 0: 10114.7. Samples: 367352320. Policy #0 lag: (min: 127.0, avg: 167.5, max: 367.0) [2024-06-15 20:57:15,739][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 20:57:15,917][1652475] Updated weights for policy 0, policy_version 717363 (0.0013) [2024-06-15 20:57:16,854][1652475] Updated weights for policy 0, policy_version 717408 (0.0044) [2024-06-15 20:57:20,738][1648984] Fps is (10 sec: 45875.1, 60 sec: 42598.4, 300 sec: 42654.0). Total num frames: 1469317120. Throughput: 0: 10069.3. Samples: 367384064. Policy #0 lag: (min: 127.0, avg: 167.5, max: 367.0) [2024-06-15 20:57:20,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 20:57:22,891][1652475] Updated weights for policy 0, policy_version 717472 (0.0012) [2024-06-15 20:57:24,798][1652475] Updated weights for policy 0, policy_version 717543 (0.0012) [2024-06-15 20:57:25,738][1648984] Fps is (10 sec: 45879.5, 60 sec: 41506.4, 300 sec: 42653.9). Total num frames: 1469579264. Throughput: 0: 10342.4. Samples: 367447552. Policy #0 lag: (min: 127.0, avg: 167.5, max: 367.0) [2024-06-15 20:57:25,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 20:57:27,229][1652475] Updated weights for policy 0, policy_version 717585 (0.0011) [2024-06-15 20:57:28,956][1652475] Updated weights for policy 0, policy_version 717664 (0.0011) [2024-06-15 20:57:30,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 43690.7, 300 sec: 42654.0). Total num frames: 1469841408. Throughput: 0: 10126.3. Samples: 367511552. 
Policy #0 lag: (min: 127.0, avg: 167.5, max: 367.0) [2024-06-15 20:57:30,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 20:57:34,876][1652475] Updated weights for policy 0, policy_version 717697 (0.0015) [2024-06-15 20:57:35,738][1648984] Fps is (10 sec: 32767.4, 60 sec: 39867.7, 300 sec: 42431.8). Total num frames: 1469906944. Throughput: 0: 10387.9. Samples: 367550464. Policy #0 lag: (min: 127.0, avg: 167.5, max: 367.0) [2024-06-15 20:57:35,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 20:57:36,234][1652475] Updated weights for policy 0, policy_version 717760 (0.0012) [2024-06-15 20:57:38,740][1652475] Updated weights for policy 0, policy_version 717844 (0.0097) [2024-06-15 20:57:39,972][1652475] Updated weights for policy 0, policy_version 717890 (0.0013) [2024-06-15 20:57:40,738][1648984] Fps is (10 sec: 45875.1, 60 sec: 42598.5, 300 sec: 42653.9). Total num frames: 1470300160. Throughput: 0: 10319.7. Samples: 367606272. Policy #0 lag: (min: 127.0, avg: 167.5, max: 367.0) [2024-06-15 20:57:40,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 20:57:41,296][1652475] Updated weights for policy 0, policy_version 717952 (0.0012) [2024-06-15 20:57:45,738][1648984] Fps is (10 sec: 45875.7, 60 sec: 41506.3, 300 sec: 42653.9). Total num frames: 1470365696. Throughput: 0: 10752.0. Samples: 367678976. Policy #0 lag: (min: 127.0, avg: 167.5, max: 367.0) [2024-06-15 20:57:45,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 20:57:49,903][1652475] Updated weights for policy 0, policy_version 718039 (0.0012) [2024-06-15 20:57:50,738][1648984] Fps is (10 sec: 32767.9, 60 sec: 40960.0, 300 sec: 42653.9). Total num frames: 1470627840. Throughput: 0: 10809.0. Samples: 367714816. Policy #0 lag: (min: 127.0, avg: 167.5, max: 367.0) [2024-06-15 20:57:50,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 20:57:51,230][1651340] Signal inference workers to stop experience collection... 
(36900 times) [2024-06-15 20:57:51,264][1652475] InferenceWorker_p0-w0: stopping experience collection (36900 times) [2024-06-15 20:57:51,436][1651340] Signal inference workers to resume experience collection... (36900 times) [2024-06-15 20:57:51,438][1652475] InferenceWorker_p0-w0: resuming experience collection (36900 times) [2024-06-15 20:57:51,598][1652475] Updated weights for policy 0, policy_version 718114 (0.0013) [2024-06-15 20:57:53,067][1652475] Updated weights for policy 0, policy_version 718179 (0.0113) [2024-06-15 20:57:55,738][1648984] Fps is (10 sec: 52428.1, 60 sec: 43690.6, 300 sec: 42654.6). Total num frames: 1470889984. Throughput: 0: 10456.1. Samples: 367768064. Policy #0 lag: (min: 111.0, avg: 226.9, max: 399.0) [2024-06-15 20:57:55,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 20:58:00,176][1652475] Updated weights for policy 0, policy_version 718259 (0.0014) [2024-06-15 20:58:00,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 41506.2, 300 sec: 42987.2). Total num frames: 1471021056. Throughput: 0: 10957.0. Samples: 367845376. Policy #0 lag: (min: 111.0, avg: 226.9, max: 399.0) [2024-06-15 20:58:00,738][1648984] Avg episode reward: [(0, '-0.230')] [2024-06-15 20:58:02,844][1652475] Updated weights for policy 0, policy_version 718352 (0.0012) [2024-06-15 20:58:05,294][1652475] Updated weights for policy 0, policy_version 718460 (0.0017) [2024-06-15 20:58:05,738][1648984] Fps is (10 sec: 52429.5, 60 sec: 43690.9, 300 sec: 42987.2). Total num frames: 1471414272. Throughput: 0: 10911.3. Samples: 367875072. Policy #0 lag: (min: 111.0, avg: 226.9, max: 399.0) [2024-06-15 20:58:05,738][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 20:58:10,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 42598.4, 300 sec: 42653.9). Total num frames: 1471414272. Throughput: 0: 11013.7. Samples: 367943168. 
Policy #0 lag: (min: 111.0, avg: 226.9, max: 399.0) [2024-06-15 20:58:10,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 20:58:12,155][1652475] Updated weights for policy 0, policy_version 718536 (0.0099) [2024-06-15 20:58:15,372][1652475] Updated weights for policy 0, policy_version 718625 (0.0124) [2024-06-15 20:58:15,738][1648984] Fps is (10 sec: 36044.9, 60 sec: 44237.5, 300 sec: 43098.3). Total num frames: 1471774720. Throughput: 0: 10945.4. Samples: 368004096. Policy #0 lag: (min: 111.0, avg: 226.9, max: 399.0) [2024-06-15 20:58:15,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 20:58:17,370][1652475] Updated weights for policy 0, policy_version 718672 (0.0012) [2024-06-15 20:58:18,564][1652475] Updated weights for policy 0, policy_version 718715 (0.0014) [2024-06-15 20:58:20,738][1648984] Fps is (10 sec: 52429.4, 60 sec: 43690.7, 300 sec: 42654.0). Total num frames: 1471938560. Throughput: 0: 10763.4. Samples: 368034816. Policy #0 lag: (min: 111.0, avg: 226.9, max: 399.0) [2024-06-15 20:58:20,738][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 20:58:22,959][1652475] Updated weights for policy 0, policy_version 718768 (0.0015) [2024-06-15 20:58:24,508][1652475] Updated weights for policy 0, policy_version 718836 (0.0016) [2024-06-15 20:58:25,747][1648984] Fps is (10 sec: 42575.9, 60 sec: 43686.8, 300 sec: 43430.7). Total num frames: 1472200704. Throughput: 0: 11046.5. Samples: 368103424. Policy #0 lag: (min: 111.0, avg: 226.9, max: 399.0) [2024-06-15 20:58:25,750][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 20:58:26,814][1652475] Updated weights for policy 0, policy_version 718880 (0.0011) [2024-06-15 20:58:30,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 41506.1, 300 sec: 42320.7). Total num frames: 1472331776. Throughput: 0: 11036.4. Samples: 368175616. 
Policy #0 lag: (min: 111.0, avg: 226.9, max: 399.0) [2024-06-15 20:58:30,738][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 20:58:31,311][1652475] Updated weights for policy 0, policy_version 718944 (0.0015) [2024-06-15 20:58:34,438][1652475] Updated weights for policy 0, policy_version 719024 (0.0020) [2024-06-15 20:58:35,738][1648984] Fps is (10 sec: 45899.4, 60 sec: 45875.3, 300 sec: 43320.4). Total num frames: 1472659456. Throughput: 0: 10877.2. Samples: 368204288. Policy #0 lag: (min: 111.0, avg: 226.9, max: 399.0) [2024-06-15 20:58:35,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 20:58:36,068][1652475] Updated weights for policy 0, policy_version 719102 (0.0108) [2024-06-15 20:58:37,845][1651340] Signal inference workers to stop experience collection... (36950 times) [2024-06-15 20:58:37,893][1652475] InferenceWorker_p0-w0: stopping experience collection (36950 times) [2024-06-15 20:58:38,094][1651340] Signal inference workers to resume experience collection... (36950 times) [2024-06-15 20:58:38,095][1652475] InferenceWorker_p0-w0: resuming experience collection (36950 times) [2024-06-15 20:58:40,738][1648984] Fps is (10 sec: 52428.4, 60 sec: 42598.4, 300 sec: 42765.0). Total num frames: 1472856064. Throughput: 0: 11002.3. Samples: 368263168. Policy #0 lag: (min: 111.0, avg: 226.9, max: 399.0) [2024-06-15 20:58:40,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 20:58:42,623][1652475] Updated weights for policy 0, policy_version 719171 (0.0012) [2024-06-15 20:58:45,738][1648984] Fps is (10 sec: 36044.5, 60 sec: 44236.7, 300 sec: 42765.0). Total num frames: 1473019904. Throughput: 0: 11104.7. Samples: 368345088. 
Policy #0 lag: (min: 111.0, avg: 226.9, max: 399.0) [2024-06-15 20:58:45,739][1648984] Avg episode reward: [(0, '-0.230')] [2024-06-15 20:58:45,760][1652475] Updated weights for policy 0, policy_version 719264 (0.0120) [2024-06-15 20:58:47,259][1652475] Updated weights for policy 0, policy_version 719344 (0.0109) [2024-06-15 20:58:50,253][1652475] Updated weights for policy 0, policy_version 719415 (0.0011) [2024-06-15 20:58:50,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 45875.2, 300 sec: 43431.5). Total num frames: 1473380352. Throughput: 0: 11059.2. Samples: 368372736. Policy #0 lag: (min: 111.0, avg: 226.9, max: 399.0) [2024-06-15 20:58:50,738][1648984] Avg episode reward: [(0, '-0.230')] [2024-06-15 20:58:55,329][1652475] Updated weights for policy 0, policy_version 719484 (0.0020) [2024-06-15 20:58:55,738][1648984] Fps is (10 sec: 49150.6, 60 sec: 43690.5, 300 sec: 42653.9). Total num frames: 1473511424. Throughput: 0: 11195.7. Samples: 368446976. Policy #0 lag: (min: 111.0, avg: 226.9, max: 399.0) [2024-06-15 20:58:55,739][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 20:58:55,744][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000719488_1473511424.pth... [2024-06-15 20:58:55,804][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000714464_1463222272.pth [2024-06-15 20:58:58,521][1652475] Updated weights for policy 0, policy_version 719556 (0.0021) [2024-06-15 20:58:59,634][1652475] Updated weights for policy 0, policy_version 719611 (0.0017) [2024-06-15 20:59:00,748][1648984] Fps is (10 sec: 42570.2, 60 sec: 46416.2, 300 sec: 43652.7). Total num frames: 1473806336. Throughput: 0: 11273.7. Samples: 368511488. 
Policy #0 lag: (min: 111.0, avg: 226.9, max: 399.0) [2024-06-15 20:59:00,753][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 20:59:01,342][1652475] Updated weights for policy 0, policy_version 719670 (0.0017) [2024-06-15 20:59:05,738][1648984] Fps is (10 sec: 45876.7, 60 sec: 42598.4, 300 sec: 42876.1). Total num frames: 1473970176. Throughput: 0: 11377.8. Samples: 368546816. Policy #0 lag: (min: 111.0, avg: 226.9, max: 399.0) [2024-06-15 20:59:05,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 20:59:06,205][1652475] Updated weights for policy 0, policy_version 719738 (0.0015) [2024-06-15 20:59:08,059][1652475] Updated weights for policy 0, policy_version 719792 (0.0028) [2024-06-15 20:59:10,738][1648984] Fps is (10 sec: 36068.7, 60 sec: 45875.2, 300 sec: 43098.3). Total num frames: 1474166784. Throughput: 0: 11390.5. Samples: 368615936. Policy #0 lag: (min: 111.0, avg: 226.9, max: 399.0) [2024-06-15 20:59:10,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 20:59:11,490][1652475] Updated weights for policy 0, policy_version 719840 (0.0015) [2024-06-15 20:59:13,103][1652475] Updated weights for policy 0, policy_version 719904 (0.0012) [2024-06-15 20:59:15,738][1648984] Fps is (10 sec: 45875.6, 60 sec: 44236.8, 300 sec: 43098.3). Total num frames: 1474428928. Throughput: 0: 11207.1. Samples: 368679936. Policy #0 lag: (min: 111.0, avg: 226.9, max: 399.0) [2024-06-15 20:59:15,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 20:59:17,028][1652475] Updated weights for policy 0, policy_version 719954 (0.0013) [2024-06-15 20:59:19,569][1652475] Updated weights for policy 0, policy_version 720039 (0.0012) [2024-06-15 20:59:20,738][1648984] Fps is (10 sec: 52427.3, 60 sec: 45874.9, 300 sec: 43098.2). Total num frames: 1474691072. Throughput: 0: 11343.6. Samples: 368714752. 
Policy #0 lag: (min: 111.0, avg: 226.9, max: 399.0) [2024-06-15 20:59:20,739][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 20:59:22,671][1652475] Updated weights for policy 0, policy_version 720096 (0.0013) [2024-06-15 20:59:23,310][1651340] Signal inference workers to stop experience collection... (37000 times) [2024-06-15 20:59:23,336][1651340] Signal inference workers to resume experience collection... (37000 times) [2024-06-15 20:59:23,372][1652475] InferenceWorker_p0-w0: stopping experience collection (37000 times) [2024-06-15 20:59:23,373][1652475] InferenceWorker_p0-w0: resuming experience collection (37000 times) [2024-06-15 20:59:23,889][1652475] Updated weights for policy 0, policy_version 720146 (0.0023) [2024-06-15 20:59:25,751][1648984] Fps is (10 sec: 52357.7, 60 sec: 45868.9, 300 sec: 43540.6). Total num frames: 1474953216. Throughput: 0: 11522.2. Samples: 368781824. Policy #0 lag: (min: 111.0, avg: 226.9, max: 399.0) [2024-06-15 20:59:25,752][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 20:59:27,623][1652475] Updated weights for policy 0, policy_version 720208 (0.0015) [2024-06-15 20:59:30,738][1648984] Fps is (10 sec: 45876.6, 60 sec: 46967.4, 300 sec: 43320.4). Total num frames: 1475149824. Throughput: 0: 11411.9. Samples: 368858624. Policy #0 lag: (min: 111.0, avg: 226.9, max: 399.0) [2024-06-15 20:59:30,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 20:59:31,047][1652475] Updated weights for policy 0, policy_version 720304 (0.0024) [2024-06-15 20:59:35,009][1652475] Updated weights for policy 0, policy_version 720384 (0.0118) [2024-06-15 20:59:35,738][1648984] Fps is (10 sec: 42656.2, 60 sec: 45329.1, 300 sec: 43653.6). Total num frames: 1475379200. Throughput: 0: 11537.1. Samples: 368891904. 
Policy #0 lag: (min: 21.0, avg: 115.3, max: 277.0) [2024-06-15 20:59:35,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 20:59:39,525][1652475] Updated weights for policy 0, policy_version 720454 (0.0013) [2024-06-15 20:59:40,738][1648984] Fps is (10 sec: 45875.5, 60 sec: 45875.3, 300 sec: 43542.6). Total num frames: 1475608576. Throughput: 0: 11264.1. Samples: 368953856. Policy #0 lag: (min: 21.0, avg: 115.3, max: 277.0) [2024-06-15 20:59:40,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 20:59:42,888][1652475] Updated weights for policy 0, policy_version 720528 (0.0020) [2024-06-15 20:59:43,698][1652475] Updated weights for policy 0, policy_version 720573 (0.0015) [2024-06-15 20:59:45,738][1648984] Fps is (10 sec: 36043.2, 60 sec: 45328.8, 300 sec: 43098.2). Total num frames: 1475739648. Throughput: 0: 11356.6. Samples: 369022464. Policy #0 lag: (min: 21.0, avg: 115.3, max: 277.0) [2024-06-15 20:59:45,740][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 20:59:47,128][1652475] Updated weights for policy 0, policy_version 720640 (0.0166) [2024-06-15 20:59:48,537][1652475] Updated weights for policy 0, policy_version 720700 (0.0021) [2024-06-15 20:59:50,738][1648984] Fps is (10 sec: 39321.2, 60 sec: 43690.7, 300 sec: 43431.5). Total num frames: 1476001792. Throughput: 0: 11082.0. Samples: 369045504. Policy #0 lag: (min: 21.0, avg: 115.3, max: 277.0) [2024-06-15 20:59:50,738][1648984] Avg episode reward: [(0, '-0.580')] [2024-06-15 20:59:51,436][1652475] Updated weights for policy 0, policy_version 720752 (0.0012) [2024-06-15 20:59:55,738][1648984] Fps is (10 sec: 39323.3, 60 sec: 43690.9, 300 sec: 42987.2). Total num frames: 1476132864. Throughput: 0: 11150.2. Samples: 369117696. 
Policy #0 lag: (min: 21.0, avg: 115.3, max: 277.0) [2024-06-15 20:59:55,740][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 20:59:56,880][1652475] Updated weights for policy 0, policy_version 720825 (0.0014) [2024-06-15 20:59:58,688][1652475] Updated weights for policy 0, policy_version 720889 (0.0112) [2024-06-15 21:00:00,738][1648984] Fps is (10 sec: 39321.1, 60 sec: 43149.2, 300 sec: 43542.6). Total num frames: 1476395008. Throughput: 0: 11002.3. Samples: 369175040. Policy #0 lag: (min: 21.0, avg: 115.3, max: 277.0) [2024-06-15 21:00:00,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 21:00:02,615][1652475] Updated weights for policy 0, policy_version 720946 (0.0012) [2024-06-15 21:00:05,738][1648984] Fps is (10 sec: 52428.5, 60 sec: 44782.9, 300 sec: 43542.5). Total num frames: 1476657152. Throughput: 0: 10877.2. Samples: 369204224. Policy #0 lag: (min: 21.0, avg: 115.3, max: 277.0) [2024-06-15 21:00:05,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 21:00:08,884][1652475] Updated weights for policy 0, policy_version 721029 (0.0013) [2024-06-15 21:00:10,627][1651340] Signal inference workers to stop experience collection... (37050 times) [2024-06-15 21:00:10,688][1652475] InferenceWorker_p0-w0: stopping experience collection (37050 times) [2024-06-15 21:00:10,738][1648984] Fps is (10 sec: 39322.2, 60 sec: 43690.7, 300 sec: 43098.2). Total num frames: 1476788224. Throughput: 0: 10982.9. Samples: 369275904. Policy #0 lag: (min: 21.0, avg: 115.3, max: 277.0) [2024-06-15 21:00:10,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 21:00:11,058][1651340] Signal inference workers to resume experience collection... 
(37050 times) [2024-06-15 21:00:11,059][1652475] InferenceWorker_p0-w0: resuming experience collection (37050 times) [2024-06-15 21:00:11,162][1652475] Updated weights for policy 0, policy_version 721104 (0.0305) [2024-06-15 21:00:12,249][1652475] Updated weights for policy 0, policy_version 721147 (0.0024) [2024-06-15 21:00:15,028][1652475] Updated weights for policy 0, policy_version 721200 (0.0014) [2024-06-15 21:00:15,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 44236.7, 300 sec: 44097.9). Total num frames: 1477083136. Throughput: 0: 10547.2. Samples: 369333248. Policy #0 lag: (min: 21.0, avg: 115.3, max: 277.0) [2024-06-15 21:00:15,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 21:00:16,628][1652475] Updated weights for policy 0, policy_version 721275 (0.0012) [2024-06-15 21:00:20,738][1648984] Fps is (10 sec: 42598.1, 60 sec: 42052.4, 300 sec: 43320.4). Total num frames: 1477214208. Throughput: 0: 10535.8. Samples: 369366016. Policy #0 lag: (min: 21.0, avg: 115.3, max: 277.0) [2024-06-15 21:00:20,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 21:00:21,705][1652475] Updated weights for policy 0, policy_version 721331 (0.0011) [2024-06-15 21:00:23,642][1652475] Updated weights for policy 0, policy_version 721405 (0.0011) [2024-06-15 21:00:25,738][1648984] Fps is (10 sec: 36045.0, 60 sec: 41515.5, 300 sec: 43542.6). Total num frames: 1477443584. Throughput: 0: 10467.6. Samples: 369424896. Policy #0 lag: (min: 21.0, avg: 115.3, max: 277.0) [2024-06-15 21:00:25,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 21:00:28,183][1652475] Updated weights for policy 0, policy_version 721456 (0.0112) [2024-06-15 21:00:29,851][1652475] Updated weights for policy 0, policy_version 721534 (0.0014) [2024-06-15 21:00:30,738][1648984] Fps is (10 sec: 49152.7, 60 sec: 42598.5, 300 sec: 43542.6). Total num frames: 1477705728. Throughput: 0: 10353.9. Samples: 369488384. 
Policy #0 lag: (min: 21.0, avg: 115.3, max: 277.0) [2024-06-15 21:00:30,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 21:00:33,957][1652475] Updated weights for policy 0, policy_version 721600 (0.0013) [2024-06-15 21:00:35,428][1652475] Updated weights for policy 0, policy_version 721663 (0.0012) [2024-06-15 21:00:35,738][1648984] Fps is (10 sec: 52428.4, 60 sec: 43144.5, 300 sec: 43542.6). Total num frames: 1477967872. Throughput: 0: 10581.3. Samples: 369521664. Policy #0 lag: (min: 21.0, avg: 115.3, max: 277.0) [2024-06-15 21:00:35,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 21:00:39,871][1652475] Updated weights for policy 0, policy_version 721725 (0.0033) [2024-06-15 21:00:40,738][1648984] Fps is (10 sec: 39321.0, 60 sec: 41506.0, 300 sec: 43653.6). Total num frames: 1478098944. Throughput: 0: 10513.0. Samples: 369590784. Policy #0 lag: (min: 21.0, avg: 115.3, max: 277.0) [2024-06-15 21:00:40,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 21:00:43,672][1652475] Updated weights for policy 0, policy_version 721782 (0.0017) [2024-06-15 21:00:45,047][1652475] Updated weights for policy 0, policy_version 721827 (0.0010) [2024-06-15 21:00:45,738][1648984] Fps is (10 sec: 39321.9, 60 sec: 43691.0, 300 sec: 43098.2). Total num frames: 1478361088. Throughput: 0: 10581.4. Samples: 369651200. Policy #0 lag: (min: 21.0, avg: 115.3, max: 277.0) [2024-06-15 21:00:45,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 21:00:46,889][1652475] Updated weights for policy 0, policy_version 721916 (0.0012) [2024-06-15 21:00:50,738][1648984] Fps is (10 sec: 39321.9, 60 sec: 41506.2, 300 sec: 43542.6). Total num frames: 1478492160. Throughput: 0: 10570.0. Samples: 369679872. 
Policy #0 lag: (min: 21.0, avg: 115.3, max: 277.0) [2024-06-15 21:00:50,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 21:00:51,662][1652475] Updated weights for policy 0, policy_version 721958 (0.0013) [2024-06-15 21:00:55,633][1652475] Updated weights for policy 0, policy_version 722001 (0.0013) [2024-06-15 21:00:55,738][1648984] Fps is (10 sec: 29490.2, 60 sec: 42052.0, 300 sec: 42876.0). Total num frames: 1478656000. Throughput: 0: 10638.1. Samples: 369754624. Policy #0 lag: (min: 21.0, avg: 115.3, max: 277.0) [2024-06-15 21:00:55,739][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 21:00:56,468][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000722032_1478721536.pth... [2024-06-15 21:00:56,543][1651340] Signal inference workers to stop experience collection... (37100 times) [2024-06-15 21:00:56,631][1652475] InferenceWorker_p0-w0: stopping experience collection (37100 times) [2024-06-15 21:00:56,634][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000716928_1468268544.pth [2024-06-15 21:00:56,946][1651340] Signal inference workers to resume experience collection... (37100 times) [2024-06-15 21:00:56,947][1652475] InferenceWorker_p0-w0: resuming experience collection (37100 times) [2024-06-15 21:00:58,215][1652475] Updated weights for policy 0, policy_version 722096 (0.0111) [2024-06-15 21:01:00,030][1652475] Updated weights for policy 0, policy_version 722173 (0.0011) [2024-06-15 21:01:00,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 43690.8, 300 sec: 43542.6). Total num frames: 1479016448. Throughput: 0: 10456.2. Samples: 369803776. Policy #0 lag: (min: 21.0, avg: 115.3, max: 277.0) [2024-06-15 21:01:00,759][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 21:01:03,988][1652475] Updated weights for policy 0, policy_version 722209 (0.0015) [2024-06-15 21:01:05,737][1648984] Fps is (10 sec: 49154.3, 60 sec: 41506.3, 300 sec: 43209.4). 
Total num frames: 1479147520. Throughput: 0: 10638.3. Samples: 369844736. Policy #0 lag: (min: 21.0, avg: 115.3, max: 277.0) [2024-06-15 21:01:05,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 21:01:07,971][1652475] Updated weights for policy 0, policy_version 722273 (0.0020) [2024-06-15 21:01:09,764][1652475] Updated weights for policy 0, policy_version 722352 (0.0013) [2024-06-15 21:01:10,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 44236.8, 300 sec: 43209.3). Total num frames: 1479442432. Throughput: 0: 10820.3. Samples: 369911808. Policy #0 lag: (min: 21.0, avg: 115.3, max: 277.0) [2024-06-15 21:01:10,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 21:01:11,400][1652475] Updated weights for policy 0, policy_version 722420 (0.0062) [2024-06-15 21:01:15,738][1648984] Fps is (10 sec: 42597.6, 60 sec: 41506.1, 300 sec: 43431.5). Total num frames: 1479573504. Throughput: 0: 10877.1. Samples: 369977856. Policy #0 lag: (min: 21.0, avg: 115.3, max: 277.0) [2024-06-15 21:01:15,739][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 21:01:16,476][1652475] Updated weights for policy 0, policy_version 722485 (0.0011) [2024-06-15 21:01:20,160][1652475] Updated weights for policy 0, policy_version 722528 (0.0102) [2024-06-15 21:01:20,738][1648984] Fps is (10 sec: 32767.6, 60 sec: 42598.3, 300 sec: 42987.2). Total num frames: 1479770112. Throughput: 0: 10786.1. Samples: 370007040. Policy #0 lag: (min: 30.0, avg: 141.5, max: 286.0) [2024-06-15 21:01:20,739][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 21:01:22,312][1652475] Updated weights for policy 0, policy_version 722624 (0.0012) [2024-06-15 21:01:25,011][1652475] Updated weights for policy 0, policy_version 722686 (0.0145) [2024-06-15 21:01:25,741][1648984] Fps is (10 sec: 49151.8, 60 sec: 43690.6, 300 sec: 43542.6). Total num frames: 1480065024. Throughput: 0: 10478.9. Samples: 370062336. 
Policy #0 lag: (min: 30.0, avg: 141.5, max: 286.0) [2024-06-15 21:01:25,742][1648984] Avg episode reward: [(0, '-0.230')] [2024-06-15 21:01:28,566][1652475] Updated weights for policy 0, policy_version 722745 (0.0011) [2024-06-15 21:01:30,738][1648984] Fps is (10 sec: 42597.2, 60 sec: 41505.8, 300 sec: 42987.1). Total num frames: 1480196096. Throughput: 0: 10695.0. Samples: 370132480. Policy #0 lag: (min: 30.0, avg: 141.5, max: 286.0) [2024-06-15 21:01:30,739][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 21:01:32,497][1652475] Updated weights for policy 0, policy_version 722801 (0.0012) [2024-06-15 21:01:33,986][1652475] Updated weights for policy 0, policy_version 722874 (0.0013) [2024-06-15 21:01:35,738][1648984] Fps is (10 sec: 39321.9, 60 sec: 41506.2, 300 sec: 43098.3). Total num frames: 1480458240. Throughput: 0: 10843.0. Samples: 370167808. Policy #0 lag: (min: 30.0, avg: 141.5, max: 286.0) [2024-06-15 21:01:35,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 21:01:36,716][1652475] Updated weights for policy 0, policy_version 722943 (0.0013) [2024-06-15 21:01:39,483][1651340] Signal inference workers to stop experience collection... (37150 times) [2024-06-15 21:01:39,515][1652475] InferenceWorker_p0-w0: stopping experience collection (37150 times) [2024-06-15 21:01:39,733][1651340] Signal inference workers to resume experience collection... (37150 times) [2024-06-15 21:01:39,734][1652475] InferenceWorker_p0-w0: resuming experience collection (37150 times) [2024-06-15 21:01:40,738][1648984] Fps is (10 sec: 52431.2, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 1480720384. Throughput: 0: 10683.8. Samples: 370235392. Policy #0 lag: (min: 30.0, avg: 141.5, max: 286.0) [2024-06-15 21:01:40,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 21:01:43,324][1652475] Updated weights for policy 0, policy_version 723009 (0.0017) [2024-06-15 21:01:45,738][1648984] Fps is (10 sec: 45874.8, 60 sec: 42598.3, 300 sec: 43209.3). 
Total num frames: 1480916992. Throughput: 0: 10911.3. Samples: 370294784. Policy #0 lag: (min: 30.0, avg: 141.5, max: 286.0) [2024-06-15 21:01:45,739][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 21:01:45,797][1652475] Updated weights for policy 0, policy_version 723120 (0.0149) [2024-06-15 21:01:48,443][1652475] Updated weights for policy 0, policy_version 723155 (0.0013) [2024-06-15 21:01:50,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 1481113600. Throughput: 0: 10752.0. Samples: 370328576. Policy #0 lag: (min: 30.0, avg: 141.5, max: 286.0) [2024-06-15 21:01:50,739][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 21:01:51,735][1652475] Updated weights for policy 0, policy_version 723221 (0.0010) [2024-06-15 21:01:55,738][1648984] Fps is (10 sec: 32768.2, 60 sec: 43144.7, 300 sec: 43098.3). Total num frames: 1481244672. Throughput: 0: 10786.1. Samples: 370397184. Policy #0 lag: (min: 30.0, avg: 141.5, max: 286.0) [2024-06-15 21:01:55,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 21:01:56,202][1652475] Updated weights for policy 0, policy_version 723296 (0.0016) [2024-06-15 21:01:58,163][1652475] Updated weights for policy 0, policy_version 723386 (0.0014) [2024-06-15 21:02:00,611][1652475] Updated weights for policy 0, policy_version 723446 (0.0013) [2024-06-15 21:02:00,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 43690.6, 300 sec: 43542.6). Total num frames: 1481637888. Throughput: 0: 10615.5. Samples: 370455552. Policy #0 lag: (min: 30.0, avg: 141.5, max: 286.0) [2024-06-15 21:02:00,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 21:02:04,479][1652475] Updated weights for policy 0, policy_version 723514 (0.0014) [2024-06-15 21:02:05,738][1648984] Fps is (10 sec: 52429.1, 60 sec: 43690.6, 300 sec: 43764.7). Total num frames: 1481768960. Throughput: 0: 10808.9. Samples: 370493440. 
Policy #0 lag: (min: 30.0, avg: 141.5, max: 286.0) [2024-06-15 21:02:05,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 21:02:08,964][1652475] Updated weights for policy 0, policy_version 723577 (0.0091) [2024-06-15 21:02:10,738][1648984] Fps is (10 sec: 32768.2, 60 sec: 42052.3, 300 sec: 43542.7). Total num frames: 1481965568. Throughput: 0: 10922.7. Samples: 370553856. Policy #0 lag: (min: 30.0, avg: 141.5, max: 286.0) [2024-06-15 21:02:10,740][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 21:02:10,855][1652475] Updated weights for policy 0, policy_version 723637 (0.0039) [2024-06-15 21:02:12,748][1652475] Updated weights for policy 0, policy_version 723705 (0.0143) [2024-06-15 21:02:15,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 43144.6, 300 sec: 43542.6). Total num frames: 1482162176. Throughput: 0: 10820.4. Samples: 370619392. Policy #0 lag: (min: 30.0, avg: 141.5, max: 286.0) [2024-06-15 21:02:15,738][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 21:02:19,239][1652475] Updated weights for policy 0, policy_version 723777 (0.0013) [2024-06-15 21:02:20,576][1652475] Updated weights for policy 0, policy_version 723840 (0.0012) [2024-06-15 21:02:20,738][1648984] Fps is (10 sec: 45875.0, 60 sec: 44236.9, 300 sec: 43542.5). Total num frames: 1482424320. Throughput: 0: 10729.2. Samples: 370650624. Policy #0 lag: (min: 30.0, avg: 141.5, max: 286.0) [2024-06-15 21:02:20,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 21:02:24,469][1652475] Updated weights for policy 0, policy_version 723921 (0.0111) [2024-06-15 21:02:25,596][1652475] Updated weights for policy 0, policy_version 723967 (0.0011) [2024-06-15 21:02:25,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 1482686464. Throughput: 0: 10444.8. Samples: 370705408. 
Policy #0 lag: (min: 30.0, avg: 141.5, max: 286.0) [2024-06-15 21:02:25,742][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 21:02:30,569][1651340] Signal inference workers to stop experience collection... (37200 times) [2024-06-15 21:02:30,609][1652475] InferenceWorker_p0-w0: stopping experience collection (37200 times) [2024-06-15 21:02:30,721][1651340] Signal inference workers to resume experience collection... (37200 times) [2024-06-15 21:02:30,726][1652475] InferenceWorker_p0-w0: resuming experience collection (37200 times) [2024-06-15 21:02:30,738][1648984] Fps is (10 sec: 32767.9, 60 sec: 42598.7, 300 sec: 43542.6). Total num frames: 1482752000. Throughput: 0: 10865.8. Samples: 370783744. Policy #0 lag: (min: 30.0, avg: 141.5, max: 286.0) [2024-06-15 21:02:30,738][1648984] Avg episode reward: [(0, '-0.180')] [2024-06-15 21:02:31,529][1652475] Updated weights for policy 0, policy_version 724035 (0.0012) [2024-06-15 21:02:32,820][1652475] Updated weights for policy 0, policy_version 724096 (0.0011) [2024-06-15 21:02:34,843][1652475] Updated weights for policy 0, policy_version 724157 (0.0013) [2024-06-15 21:02:35,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 43690.7, 300 sec: 43320.4). Total num frames: 1483079680. Throughput: 0: 10729.3. Samples: 370811392. Policy #0 lag: (min: 30.0, avg: 141.5, max: 286.0) [2024-06-15 21:02:35,738][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 21:02:36,746][1652475] Updated weights for policy 0, policy_version 724208 (0.0013) [2024-06-15 21:02:40,738][1648984] Fps is (10 sec: 45875.4, 60 sec: 41506.1, 300 sec: 43542.6). Total num frames: 1483210752. Throughput: 0: 10729.3. Samples: 370880000. 
Policy #0 lag: (min: 30.0, avg: 141.5, max: 286.0) [2024-06-15 21:02:40,738][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 21:02:42,434][1652475] Updated weights for policy 0, policy_version 724242 (0.0014) [2024-06-15 21:02:44,759][1652475] Updated weights for policy 0, policy_version 724351 (0.0013) [2024-06-15 21:02:45,738][1648984] Fps is (10 sec: 45874.8, 60 sec: 43690.7, 300 sec: 43764.7). Total num frames: 1483538432. Throughput: 0: 10797.5. Samples: 370941440. Policy #0 lag: (min: 30.0, avg: 141.5, max: 286.0) [2024-06-15 21:02:45,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 21:02:46,368][1652475] Updated weights for policy 0, policy_version 724410 (0.0027) [2024-06-15 21:02:48,649][1652475] Updated weights for policy 0, policy_version 724480 (0.0013) [2024-06-15 21:02:50,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 1483735040. Throughput: 0: 10490.3. Samples: 370965504. Policy #0 lag: (min: 30.0, avg: 141.5, max: 286.0) [2024-06-15 21:02:50,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 21:02:55,531][1652475] Updated weights for policy 0, policy_version 724544 (0.0012) [2024-06-15 21:02:55,738][1648984] Fps is (10 sec: 32767.9, 60 sec: 43690.6, 300 sec: 43542.6). Total num frames: 1483866112. Throughput: 0: 10877.1. Samples: 371043328. Policy #0 lag: (min: 30.0, avg: 141.5, max: 286.0) [2024-06-15 21:02:55,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 21:02:55,743][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000724544_1483866112.pth... 
[2024-06-15 21:02:55,793][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000719488_1473511424.pth [2024-06-15 21:02:57,979][1652475] Updated weights for policy 0, policy_version 724593 (0.0013) [2024-06-15 21:03:00,151][1652475] Updated weights for policy 0, policy_version 724688 (0.0013) [2024-06-15 21:03:00,738][1648984] Fps is (10 sec: 45875.1, 60 sec: 42598.4, 300 sec: 43320.4). Total num frames: 1484193792. Throughput: 0: 10501.7. Samples: 371091968. Policy #0 lag: (min: 38.0, avg: 110.3, max: 230.0) [2024-06-15 21:03:00,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 21:03:05,738][1648984] Fps is (10 sec: 39321.9, 60 sec: 41506.1, 300 sec: 43542.6). Total num frames: 1484259328. Throughput: 0: 10592.7. Samples: 371127296. Policy #0 lag: (min: 38.0, avg: 110.3, max: 230.0) [2024-06-15 21:03:05,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 21:03:07,145][1652475] Updated weights for policy 0, policy_version 724754 (0.0098) [2024-06-15 21:03:08,083][1652475] Updated weights for policy 0, policy_version 724800 (0.0048) [2024-06-15 21:03:09,609][1652475] Updated weights for policy 0, policy_version 724835 (0.0011) [2024-06-15 21:03:10,738][1648984] Fps is (10 sec: 36044.7, 60 sec: 43144.5, 300 sec: 43320.4). Total num frames: 1484554240. Throughput: 0: 10899.9. Samples: 371195904. Policy #0 lag: (min: 38.0, avg: 110.3, max: 230.0) [2024-06-15 21:03:10,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 21:03:11,666][1652475] Updated weights for policy 0, policy_version 724921 (0.0016) [2024-06-15 21:03:13,262][1651340] Signal inference workers to stop experience collection... (37250 times) [2024-06-15 21:03:13,318][1652475] InferenceWorker_p0-w0: stopping experience collection (37250 times) [2024-06-15 21:03:13,556][1651340] Signal inference workers to resume experience collection... 
(37250 times) [2024-06-15 21:03:13,557][1652475] InferenceWorker_p0-w0: resuming experience collection (37250 times) [2024-06-15 21:03:13,888][1652475] Updated weights for policy 0, policy_version 724990 (0.0097) [2024-06-15 21:03:15,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 43690.6, 300 sec: 43542.5). Total num frames: 1484783616. Throughput: 0: 10535.8. Samples: 371257856. Policy #0 lag: (min: 38.0, avg: 110.3, max: 230.0) [2024-06-15 21:03:15,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 21:03:19,790][1652475] Updated weights for policy 0, policy_version 725047 (0.0012) [2024-06-15 21:03:20,738][1648984] Fps is (10 sec: 36044.7, 60 sec: 41506.1, 300 sec: 43099.0). Total num frames: 1484914688. Throughput: 0: 10706.5. Samples: 371293184. Policy #0 lag: (min: 38.0, avg: 110.3, max: 230.0) [2024-06-15 21:03:20,738][1648984] Avg episode reward: [(0, '-0.250')] [2024-06-15 21:03:21,923][1652475] Updated weights for policy 0, policy_version 725091 (0.0011) [2024-06-15 21:03:23,723][1652475] Updated weights for policy 0, policy_version 725168 (0.0079) [2024-06-15 21:03:25,738][1648984] Fps is (10 sec: 42597.8, 60 sec: 42052.1, 300 sec: 43653.6). Total num frames: 1485209600. Throughput: 0: 10478.9. Samples: 371351552. Policy #0 lag: (min: 38.0, avg: 110.3, max: 230.0) [2024-06-15 21:03:25,739][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 21:03:30,336][1652475] Updated weights for policy 0, policy_version 725250 (0.0062) [2024-06-15 21:03:30,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 43144.6, 300 sec: 42987.2). Total num frames: 1485340672. Throughput: 0: 10672.4. Samples: 371421696. Policy #0 lag: (min: 38.0, avg: 110.3, max: 230.0) [2024-06-15 21:03:30,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 21:03:31,879][1652475] Updated weights for policy 0, policy_version 725312 (0.0012) [2024-06-15 21:03:35,738][1648984] Fps is (10 sec: 36045.4, 60 sec: 41506.1, 300 sec: 43098.3). Total num frames: 1485570048. 
Throughput: 0: 10797.5. Samples: 371451392. Policy #0 lag: (min: 38.0, avg: 110.3, max: 230.0) [2024-06-15 21:03:35,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 21:03:35,755][1652475] Updated weights for policy 0, policy_version 725377 (0.0012) [2024-06-15 21:03:37,120][1652475] Updated weights for policy 0, policy_version 725438 (0.0012) [2024-06-15 21:03:38,726][1652475] Updated weights for policy 0, policy_version 725494 (0.0014) [2024-06-15 21:03:40,738][1648984] Fps is (10 sec: 49152.5, 60 sec: 43690.7, 300 sec: 43431.5). Total num frames: 1485832192. Throughput: 0: 10387.9. Samples: 371510784. Policy #0 lag: (min: 38.0, avg: 110.3, max: 230.0) [2024-06-15 21:03:40,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 21:03:43,531][1652475] Updated weights for policy 0, policy_version 725560 (0.0012) [2024-06-15 21:03:45,738][1648984] Fps is (10 sec: 49152.0, 60 sec: 42052.3, 300 sec: 42987.2). Total num frames: 1486061568. Throughput: 0: 10752.0. Samples: 371575808. Policy #0 lag: (min: 38.0, avg: 110.3, max: 230.0) [2024-06-15 21:03:45,739][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 21:03:45,904][1652475] Updated weights for policy 0, policy_version 725627 (0.0014) [2024-06-15 21:03:50,184][1652475] Updated weights for policy 0, policy_version 725696 (0.0011) [2024-06-15 21:03:50,738][1648984] Fps is (10 sec: 42597.9, 60 sec: 42052.2, 300 sec: 43209.4). Total num frames: 1486258176. Throughput: 0: 10808.9. Samples: 371613696. Policy #0 lag: (min: 38.0, avg: 110.3, max: 230.0) [2024-06-15 21:03:50,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 21:03:54,619][1652475] Updated weights for policy 0, policy_version 725792 (0.0012) [2024-06-15 21:03:55,738][1648984] Fps is (10 sec: 42598.2, 60 sec: 43690.7, 300 sec: 42988.1). Total num frames: 1486487552. Throughput: 0: 10854.4. Samples: 371684352. 
Policy #0 lag: (min: 38.0, avg: 110.3, max: 230.0) [2024-06-15 21:03:55,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 21:03:56,389][1652475] Updated weights for policy 0, policy_version 725845 (0.0012) [2024-06-15 21:04:00,307][1652475] Updated weights for policy 0, policy_version 725904 (0.0041) [2024-06-15 21:04:00,738][1648984] Fps is (10 sec: 42598.7, 60 sec: 41506.1, 300 sec: 43098.3). Total num frames: 1486684160. Throughput: 0: 11059.2. Samples: 371755520. Policy #0 lag: (min: 38.0, avg: 110.3, max: 230.0) [2024-06-15 21:04:00,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 21:04:01,655][1651340] Signal inference workers to stop experience collection... (37300 times) [2024-06-15 21:04:01,672][1652475] Updated weights for policy 0, policy_version 725969 (0.0172) [2024-06-15 21:04:01,734][1652475] InferenceWorker_p0-w0: stopping experience collection (37300 times) [2024-06-15 21:04:01,861][1651340] Signal inference workers to resume experience collection... (37300 times) [2024-06-15 21:04:01,862][1652475] InferenceWorker_p0-w0: resuming experience collection (37300 times) [2024-06-15 21:04:04,853][1652475] Updated weights for policy 0, policy_version 726017 (0.0013) [2024-06-15 21:04:05,738][1648984] Fps is (10 sec: 49151.4, 60 sec: 45328.9, 300 sec: 43431.5). Total num frames: 1486979072. Throughput: 0: 10968.1. Samples: 371786752. Policy #0 lag: (min: 38.0, avg: 110.3, max: 230.0) [2024-06-15 21:04:05,739][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 21:04:06,066][1652475] Updated weights for policy 0, policy_version 726080 (0.0020) [2024-06-15 21:04:10,738][1648984] Fps is (10 sec: 45875.0, 60 sec: 43144.5, 300 sec: 43098.2). Total num frames: 1487142912. Throughput: 0: 11104.7. Samples: 371851264. 
Policy #0 lag: (min: 38.0, avg: 110.3, max: 230.0) [2024-06-15 21:04:10,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 21:04:12,803][1652475] Updated weights for policy 0, policy_version 726176 (0.0015) [2024-06-15 21:04:14,667][1652475] Updated weights for policy 0, policy_version 726256 (0.0015) [2024-06-15 21:04:15,738][1648984] Fps is (10 sec: 42599.1, 60 sec: 43690.7, 300 sec: 43098.3). Total num frames: 1487405056. Throughput: 0: 10956.8. Samples: 371914752. Policy #0 lag: (min: 38.0, avg: 110.3, max: 230.0) [2024-06-15 21:04:15,738][1648984] Avg episode reward: [(0, '-0.250')] [2024-06-15 21:04:17,844][1652475] Updated weights for policy 0, policy_version 726304 (0.0012) [2024-06-15 21:04:19,961][1652475] Updated weights for policy 0, policy_version 726352 (0.0050) [2024-06-15 21:04:20,738][1648984] Fps is (10 sec: 45875.5, 60 sec: 44783.0, 300 sec: 42878.1). Total num frames: 1487601664. Throughput: 0: 11047.8. Samples: 371948544. Policy #0 lag: (min: 38.0, avg: 110.3, max: 230.0) [2024-06-15 21:04:20,738][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 21:04:21,214][1652475] Updated weights for policy 0, policy_version 726398 (0.0011) [2024-06-15 21:04:25,738][1648984] Fps is (10 sec: 39322.1, 60 sec: 43144.7, 300 sec: 42876.1). Total num frames: 1487798272. Throughput: 0: 11218.5. Samples: 372015616. Policy #0 lag: (min: 38.0, avg: 110.3, max: 230.0) [2024-06-15 21:04:25,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 21:04:26,397][1652475] Updated weights for policy 0, policy_version 726496 (0.0012) [2024-06-15 21:04:30,342][1652475] Updated weights for policy 0, policy_version 726530 (0.0012) [2024-06-15 21:04:30,738][1648984] Fps is (10 sec: 36043.8, 60 sec: 43690.5, 300 sec: 42653.9). Total num frames: 1487962112. Throughput: 0: 11059.1. Samples: 372073472. 
Policy #0 lag: (min: 38.0, avg: 110.3, max: 230.0) [2024-06-15 21:04:30,739][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 21:04:31,802][1652475] Updated weights for policy 0, policy_version 726586 (0.0013) [2024-06-15 21:04:34,716][1652475] Updated weights for policy 0, policy_version 726640 (0.0012) [2024-06-15 21:04:35,738][1648984] Fps is (10 sec: 39321.0, 60 sec: 43690.6, 300 sec: 42653.9). Total num frames: 1488191488. Throughput: 0: 10877.2. Samples: 372103168. Policy #0 lag: (min: 38.0, avg: 110.3, max: 230.0) [2024-06-15 21:04:35,739][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 21:04:36,124][1652475] Updated weights for policy 0, policy_version 726676 (0.0012) [2024-06-15 21:04:37,483][1652475] Updated weights for policy 0, policy_version 726737 (0.0024) [2024-06-15 21:04:38,644][1652475] Updated weights for policy 0, policy_version 726784 (0.0038) [2024-06-15 21:04:40,738][1648984] Fps is (10 sec: 49153.4, 60 sec: 43690.6, 300 sec: 43098.3). Total num frames: 1488453632. Throughput: 0: 10638.2. Samples: 372163072. Policy #0 lag: (min: 38.0, avg: 110.3, max: 230.0) [2024-06-15 21:04:40,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 21:04:44,879][1652475] Updated weights for policy 0, policy_version 726845 (0.0025) [2024-06-15 21:04:45,738][1648984] Fps is (10 sec: 39321.9, 60 sec: 42052.3, 300 sec: 42654.0). Total num frames: 1488584704. Throughput: 0: 10615.5. Samples: 372233216. Policy #0 lag: (min: 15.0, avg: 105.4, max: 271.0) [2024-06-15 21:04:45,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 21:04:47,954][1652475] Updated weights for policy 0, policy_version 726912 (0.0018) [2024-06-15 21:04:48,881][1651340] Signal inference workers to stop experience collection... (37350 times) [2024-06-15 21:04:48,946][1652475] InferenceWorker_p0-w0: stopping experience collection (37350 times) [2024-06-15 21:04:49,233][1651340] Signal inference workers to resume experience collection... 
(37350 times) [2024-06-15 21:04:49,233][1652475] InferenceWorker_p0-w0: resuming experience collection (37350 times) [2024-06-15 21:04:50,274][1652475] Updated weights for policy 0, policy_version 726997 (0.0094) [2024-06-15 21:04:50,738][1648984] Fps is (10 sec: 49152.2, 60 sec: 44783.0, 300 sec: 43431.5). Total num frames: 1488945152. Throughput: 0: 10581.4. Samples: 372262912. Policy #0 lag: (min: 15.0, avg: 105.4, max: 271.0) [2024-06-15 21:04:50,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 21:04:55,738][1648984] Fps is (10 sec: 39321.3, 60 sec: 41506.1, 300 sec: 42654.0). Total num frames: 1488977920. Throughput: 0: 10604.1. Samples: 372328448. Policy #0 lag: (min: 15.0, avg: 105.4, max: 271.0) [2024-06-15 21:04:55,739][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 21:04:55,756][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000727040_1488977920.pth... [2024-06-15 21:04:55,846][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000722032_1478721536.pth [2024-06-15 21:04:56,881][1652475] Updated weights for policy 0, policy_version 727072 (0.0014) [2024-06-15 21:05:00,168][1652475] Updated weights for policy 0, policy_version 727171 (0.0012) [2024-06-15 21:05:00,738][1648984] Fps is (10 sec: 32767.8, 60 sec: 43144.5, 300 sec: 42765.0). Total num frames: 1489272832. Throughput: 0: 10547.2. Samples: 372389376. Policy #0 lag: (min: 15.0, avg: 105.4, max: 271.0) [2024-06-15 21:05:00,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 21:05:02,418][1652475] Updated weights for policy 0, policy_version 727266 (0.0140) [2024-06-15 21:05:05,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 42052.4, 300 sec: 43098.2). Total num frames: 1489502208. Throughput: 0: 10353.8. Samples: 372414464. 
Policy #0 lag: (min: 15.0, avg: 105.4, max: 271.0) [2024-06-15 21:05:05,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 21:05:08,920][1652475] Updated weights for policy 0, policy_version 727314 (0.0013) [2024-06-15 21:05:09,938][1652475] Updated weights for policy 0, policy_version 727359 (0.0011) [2024-06-15 21:05:10,738][1648984] Fps is (10 sec: 36044.1, 60 sec: 41506.0, 300 sec: 42542.8). Total num frames: 1489633280. Throughput: 0: 10592.6. Samples: 372492288. Policy #0 lag: (min: 15.0, avg: 105.4, max: 271.0) [2024-06-15 21:05:10,739][1648984] Avg episode reward: [(0, '-0.560')] [2024-06-15 21:05:12,456][1652475] Updated weights for policy 0, policy_version 727440 (0.0012) [2024-06-15 21:05:13,575][1652475] Updated weights for policy 0, policy_version 727488 (0.0010) [2024-06-15 21:05:15,738][1648984] Fps is (10 sec: 45875.3, 60 sec: 42598.4, 300 sec: 43209.3). Total num frames: 1489960960. Throughput: 0: 10547.3. Samples: 372548096. Policy #0 lag: (min: 15.0, avg: 105.4, max: 271.0) [2024-06-15 21:05:15,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 21:05:16,082][1652475] Updated weights for policy 0, policy_version 727548 (0.0010) [2024-06-15 21:05:20,738][1648984] Fps is (10 sec: 39322.5, 60 sec: 40413.9, 300 sec: 42653.9). Total num frames: 1490026496. Throughput: 0: 10604.1. Samples: 372580352. Policy #0 lag: (min: 15.0, avg: 105.4, max: 271.0) [2024-06-15 21:05:20,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 21:05:22,218][1652475] Updated weights for policy 0, policy_version 727615 (0.0011) [2024-06-15 21:05:24,447][1652475] Updated weights for policy 0, policy_version 727670 (0.0012) [2024-06-15 21:05:25,611][1652475] Updated weights for policy 0, policy_version 727730 (0.0010) [2024-06-15 21:05:25,738][1648984] Fps is (10 sec: 42598.5, 60 sec: 43144.5, 300 sec: 42987.2). Total num frames: 1490386944. Throughput: 0: 10490.3. Samples: 372635136. 
Policy #0 lag: (min: 15.0, avg: 105.4, max: 271.0) [2024-06-15 21:05:25,738][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 21:05:30,118][1652475] Updated weights for policy 0, policy_version 727776 (0.0015) [2024-06-15 21:05:30,738][1648984] Fps is (10 sec: 52429.1, 60 sec: 43144.8, 300 sec: 42654.0). Total num frames: 1490550784. Throughput: 0: 10478.9. Samples: 372704768. Policy #0 lag: (min: 15.0, avg: 105.4, max: 271.0) [2024-06-15 21:05:30,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 21:05:34,130][1652475] Updated weights for policy 0, policy_version 727867 (0.0011) [2024-06-15 21:05:35,413][1651340] Signal inference workers to stop experience collection... (37400 times) [2024-06-15 21:05:35,470][1652475] InferenceWorker_p0-w0: stopping experience collection (37400 times) [2024-06-15 21:05:35,499][1652475] Updated weights for policy 0, policy_version 727908 (0.0012) [2024-06-15 21:05:35,630][1651340] Signal inference workers to resume experience collection... (37400 times) [2024-06-15 21:05:35,631][1652475] InferenceWorker_p0-w0: resuming experience collection (37400 times) [2024-06-15 21:05:35,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 43144.6, 300 sec: 42987.2). Total num frames: 1490780160. Throughput: 0: 10558.6. Samples: 372738048. Policy #0 lag: (min: 15.0, avg: 105.4, max: 271.0) [2024-06-15 21:05:35,740][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 21:05:37,031][1652475] Updated weights for policy 0, policy_version 727971 (0.0011) [2024-06-15 21:05:40,738][1648984] Fps is (10 sec: 39320.9, 60 sec: 41506.0, 300 sec: 42653.9). Total num frames: 1490944000. Throughput: 0: 10410.7. Samples: 372796928. Policy #0 lag: (min: 15.0, avg: 105.4, max: 271.0) [2024-06-15 21:05:40,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 21:05:44,397][1652475] Updated weights for policy 0, policy_version 728048 (0.0021) [2024-06-15 21:05:45,738][1648984] Fps is (10 sec: 36044.6, 60 sec: 42598.4, 300 sec: 42876.1). 
Total num frames: 1491140608. Throughput: 0: 10513.1. Samples: 372862464. Policy #0 lag: (min: 15.0, avg: 105.4, max: 271.0) [2024-06-15 21:05:45,739][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 21:05:46,052][1652475] Updated weights for policy 0, policy_version 728112 (0.0013) [2024-06-15 21:05:48,217][1652475] Updated weights for policy 0, policy_version 728165 (0.0013) [2024-06-15 21:05:50,040][1652475] Updated weights for policy 0, policy_version 728227 (0.0012) [2024-06-15 21:05:50,738][1648984] Fps is (10 sec: 52429.4, 60 sec: 42052.2, 300 sec: 43431.5). Total num frames: 1491468288. Throughput: 0: 10592.7. Samples: 372891136. Policy #0 lag: (min: 15.0, avg: 105.4, max: 271.0) [2024-06-15 21:05:50,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 21:05:55,738][1648984] Fps is (10 sec: 32767.3, 60 sec: 41506.0, 300 sec: 42209.6). Total num frames: 1491468288. Throughput: 0: 10240.0. Samples: 372953088. Policy #0 lag: (min: 15.0, avg: 105.4, max: 271.0) [2024-06-15 21:05:55,739][1648984] Avg episode reward: [(0, '-0.230')] [2024-06-15 21:05:56,999][1652475] Updated weights for policy 0, policy_version 728313 (0.0012) [2024-06-15 21:05:58,747][1652475] Updated weights for policy 0, policy_version 728382 (0.0012) [2024-06-15 21:06:00,738][1648984] Fps is (10 sec: 36044.7, 60 sec: 42598.4, 300 sec: 42987.2). Total num frames: 1491828736. Throughput: 0: 10342.4. Samples: 373013504. Policy #0 lag: (min: 15.0, avg: 105.4, max: 271.0) [2024-06-15 21:06:00,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 21:06:00,933][1652475] Updated weights for policy 0, policy_version 728448 (0.0111) [2024-06-15 21:06:03,468][1652475] Updated weights for policy 0, policy_version 728511 (0.0017) [2024-06-15 21:06:05,738][1648984] Fps is (10 sec: 52429.9, 60 sec: 41506.1, 300 sec: 42542.9). Total num frames: 1491992576. Throughput: 0: 10285.5. Samples: 373043200. 
Policy #0 lag: (min: 15.0, avg: 105.4, max: 271.0) [2024-06-15 21:06:05,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 21:06:08,571][1652475] Updated weights for policy 0, policy_version 728568 (0.0013) [2024-06-15 21:06:09,923][1652475] Updated weights for policy 0, policy_version 728610 (0.0017) [2024-06-15 21:06:10,583][1652475] Updated weights for policy 0, policy_version 728640 (0.0011) [2024-06-15 21:06:10,738][1648984] Fps is (10 sec: 42597.9, 60 sec: 43690.7, 300 sec: 42987.2). Total num frames: 1492254720. Throughput: 0: 10763.4. Samples: 373119488. Policy #0 lag: (min: 15.0, avg: 105.4, max: 271.0) [2024-06-15 21:06:10,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 21:06:12,313][1652475] Updated weights for policy 0, policy_version 728703 (0.0070) [2024-06-15 21:06:15,338][1652475] Updated weights for policy 0, policy_version 728766 (0.0012) [2024-06-15 21:06:15,738][1648984] Fps is (10 sec: 52429.1, 60 sec: 42598.4, 300 sec: 43209.4). Total num frames: 1492516864. Throughput: 0: 10581.3. Samples: 373180928. Policy #0 lag: (min: 15.0, avg: 105.4, max: 271.0) [2024-06-15 21:06:15,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 21:06:19,662][1652475] Updated weights for policy 0, policy_version 728816 (0.0011) [2024-06-15 21:06:20,738][1648984] Fps is (10 sec: 42597.9, 60 sec: 44236.6, 300 sec: 42765.0). Total num frames: 1492680704. Throughput: 0: 10820.2. Samples: 373224960. Policy #0 lag: (min: 15.0, avg: 105.4, max: 271.0) [2024-06-15 21:06:20,739][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 21:06:21,145][1652475] Updated weights for policy 0, policy_version 728864 (0.0156) [2024-06-15 21:06:22,443][1652475] Updated weights for policy 0, policy_version 728912 (0.0015) [2024-06-15 21:06:22,899][1651340] Signal inference workers to stop experience collection... 
(37450 times) [2024-06-15 21:06:22,941][1652475] InferenceWorker_p0-w0: stopping experience collection (37450 times) [2024-06-15 21:06:23,199][1651340] Signal inference workers to resume experience collection... (37450 times) [2024-06-15 21:06:23,200][1652475] InferenceWorker_p0-w0: resuming experience collection (37450 times) [2024-06-15 21:06:25,738][1648984] Fps is (10 sec: 39321.3, 60 sec: 42052.2, 300 sec: 43098.3). Total num frames: 1492910080. Throughput: 0: 10877.2. Samples: 373286400. Policy #0 lag: (min: 15.0, avg: 105.4, max: 271.0) [2024-06-15 21:06:25,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 21:06:26,448][1652475] Updated weights for policy 0, policy_version 729008 (0.0025) [2024-06-15 21:06:30,738][1648984] Fps is (10 sec: 36045.0, 60 sec: 41506.0, 300 sec: 42653.9). Total num frames: 1493041152. Throughput: 0: 11081.9. Samples: 373361152. Policy #0 lag: (min: 31.0, avg: 162.4, max: 287.0) [2024-06-15 21:06:30,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 21:06:31,432][1652475] Updated weights for policy 0, policy_version 729072 (0.0011) [2024-06-15 21:06:33,249][1652475] Updated weights for policy 0, policy_version 729150 (0.0012) [2024-06-15 21:06:35,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 44236.7, 300 sec: 43098.2). Total num frames: 1493434368. Throughput: 0: 11059.2. Samples: 373388800. Policy #0 lag: (min: 31.0, avg: 162.4, max: 287.0) [2024-06-15 21:06:35,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 21:06:38,041][1652475] Updated weights for policy 0, policy_version 729235 (0.0025) [2024-06-15 21:06:40,738][1648984] Fps is (10 sec: 52427.6, 60 sec: 43690.4, 300 sec: 42876.1). Total num frames: 1493565440. Throughput: 0: 11002.3. Samples: 373448192. 
Policy #0 lag: (min: 31.0, avg: 162.4, max: 287.0) [2024-06-15 21:06:40,739][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 21:06:42,650][1652475] Updated weights for policy 0, policy_version 729298 (0.0012) [2024-06-15 21:06:44,143][1652475] Updated weights for policy 0, policy_version 729349 (0.0012) [2024-06-15 21:06:45,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 44782.9, 300 sec: 43098.3). Total num frames: 1493827584. Throughput: 0: 11275.4. Samples: 373520896. Policy #0 lag: (min: 31.0, avg: 162.4, max: 287.0) [2024-06-15 21:06:45,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 21:06:46,748][1652475] Updated weights for policy 0, policy_version 729409 (0.0014) [2024-06-15 21:06:48,592][1652475] Updated weights for policy 0, policy_version 729474 (0.0033) [2024-06-15 21:06:50,738][1648984] Fps is (10 sec: 52430.9, 60 sec: 43690.6, 300 sec: 43542.6). Total num frames: 1494089728. Throughput: 0: 11286.8. Samples: 373551104. Policy #0 lag: (min: 31.0, avg: 162.4, max: 287.0) [2024-06-15 21:06:50,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 21:06:53,816][1652475] Updated weights for policy 0, policy_version 729571 (0.0011) [2024-06-15 21:06:55,738][1648984] Fps is (10 sec: 39321.2, 60 sec: 45875.3, 300 sec: 42653.9). Total num frames: 1494220800. Throughput: 0: 11138.8. Samples: 373620736. Policy #0 lag: (min: 31.0, avg: 162.4, max: 287.0) [2024-06-15 21:06:55,739][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 21:06:55,745][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000729600_1494220800.pth... [2024-06-15 21:06:55,783][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000724544_1483866112.pth [2024-06-15 21:06:59,587][1652475] Updated weights for policy 0, policy_version 729686 (0.0014) [2024-06-15 21:07:00,738][1648984] Fps is (10 sec: 39321.1, 60 sec: 44236.7, 300 sec: 43098.2). Total num frames: 1494482944. 
Throughput: 0: 11081.9. Samples: 373679616. Policy #0 lag: (min: 31.0, avg: 162.4, max: 287.0) [2024-06-15 21:07:00,739][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 21:07:01,380][1652475] Updated weights for policy 0, policy_version 729745 (0.0094) [2024-06-15 21:07:04,028][1652475] Updated weights for policy 0, policy_version 729795 (0.0012) [2024-06-15 21:07:05,197][1652475] Updated weights for policy 0, policy_version 729854 (0.0012) [2024-06-15 21:07:05,738][1648984] Fps is (10 sec: 52429.3, 60 sec: 45875.2, 300 sec: 43320.4). Total num frames: 1494745088. Throughput: 0: 10900.0. Samples: 373715456. Policy #0 lag: (min: 31.0, avg: 162.4, max: 287.0) [2024-06-15 21:07:05,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 21:07:10,738][1648984] Fps is (10 sec: 29491.4, 60 sec: 42052.3, 300 sec: 42765.0). Total num frames: 1494777856. Throughput: 0: 11104.7. Samples: 373786112. Policy #0 lag: (min: 31.0, avg: 162.4, max: 287.0) [2024-06-15 21:07:10,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 21:07:12,575][1651340] Signal inference workers to stop experience collection... (37500 times) [2024-06-15 21:07:12,637][1652475] InferenceWorker_p0-w0: stopping experience collection (37500 times) [2024-06-15 21:07:12,643][1652475] Updated weights for policy 0, policy_version 729957 (0.0012) [2024-06-15 21:07:12,793][1651340] Signal inference workers to resume experience collection... (37500 times) [2024-06-15 21:07:12,794][1652475] InferenceWorker_p0-w0: resuming experience collection (37500 times) [2024-06-15 21:07:14,030][1652475] Updated weights for policy 0, policy_version 730016 (0.0014) [2024-06-15 21:07:15,738][1648984] Fps is (10 sec: 39321.3, 60 sec: 43690.6, 300 sec: 43098.2). Total num frames: 1495138304. Throughput: 0: 10661.0. Samples: 373840896. 
Policy #0 lag: (min: 31.0, avg: 162.4, max: 287.0) [2024-06-15 21:07:15,739][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 21:07:16,349][1652475] Updated weights for policy 0, policy_version 730064 (0.0012) [2024-06-15 21:07:20,738][1648984] Fps is (10 sec: 49151.8, 60 sec: 43144.6, 300 sec: 42653.9). Total num frames: 1495269376. Throughput: 0: 10763.4. Samples: 373873152. Policy #0 lag: (min: 31.0, avg: 162.4, max: 287.0) [2024-06-15 21:07:20,739][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 21:07:22,650][1652475] Updated weights for policy 0, policy_version 730113 (0.0015) [2024-06-15 21:07:24,132][1652475] Updated weights for policy 0, policy_version 730181 (0.0031) [2024-06-15 21:07:25,738][1648984] Fps is (10 sec: 39322.1, 60 sec: 43690.7, 300 sec: 43320.4). Total num frames: 1495531520. Throughput: 0: 10991.0. Samples: 373942784. Policy #0 lag: (min: 31.0, avg: 162.4, max: 287.0) [2024-06-15 21:07:25,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 21:07:25,823][1652475] Updated weights for policy 0, policy_version 730241 (0.0010) [2024-06-15 21:07:28,932][1652475] Updated weights for policy 0, policy_version 730305 (0.0011) [2024-06-15 21:07:30,420][1652475] Updated weights for policy 0, policy_version 730366 (0.0023) [2024-06-15 21:07:30,738][1648984] Fps is (10 sec: 52429.5, 60 sec: 45875.4, 300 sec: 43098.2). Total num frames: 1495793664. Throughput: 0: 10695.1. Samples: 374002176. Policy #0 lag: (min: 31.0, avg: 162.4, max: 287.0) [2024-06-15 21:07:30,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 21:07:35,738][1648984] Fps is (10 sec: 36044.9, 60 sec: 40960.0, 300 sec: 42987.2). Total num frames: 1495891968. Throughput: 0: 10854.4. Samples: 374039552. 
Policy #0 lag: (min: 31.0, avg: 162.4, max: 287.0) [2024-06-15 21:07:35,738][1648984] Avg episode reward: [(0, '-0.250')] [2024-06-15 21:07:36,550][1652475] Updated weights for policy 0, policy_version 730448 (0.0015) [2024-06-15 21:07:38,618][1652475] Updated weights for policy 0, policy_version 730528 (0.0014) [2024-06-15 21:07:40,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 43691.0, 300 sec: 42876.1). Total num frames: 1496186880. Throughput: 0: 10456.2. Samples: 374091264. Policy #0 lag: (min: 31.0, avg: 162.4, max: 287.0) [2024-06-15 21:07:40,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 21:07:43,841][1652475] Updated weights for policy 0, policy_version 730578 (0.0009) [2024-06-15 21:07:45,738][1648984] Fps is (10 sec: 42598.6, 60 sec: 41506.2, 300 sec: 42653.9). Total num frames: 1496317952. Throughput: 0: 10649.6. Samples: 374158848. Policy #0 lag: (min: 31.0, avg: 162.4, max: 287.0) [2024-06-15 21:07:45,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 21:07:48,265][1652475] Updated weights for policy 0, policy_version 730672 (0.0011) [2024-06-15 21:07:49,376][1652475] Updated weights for policy 0, policy_version 730708 (0.0138) [2024-06-15 21:07:50,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 41506.1, 300 sec: 43098.3). Total num frames: 1496580096. Throughput: 0: 10535.8. Samples: 374189568. Policy #0 lag: (min: 31.0, avg: 162.4, max: 287.0) [2024-06-15 21:07:50,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 21:07:51,380][1652475] Updated weights for policy 0, policy_version 730789 (0.0103) [2024-06-15 21:07:55,737][1648984] Fps is (10 sec: 39321.8, 60 sec: 41506.3, 300 sec: 42431.8). Total num frames: 1496711168. Throughput: 0: 10240.0. Samples: 374246912. 
Policy #0 lag: (min: 31.0, avg: 162.4, max: 287.0) [2024-06-15 21:07:55,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 21:07:58,115][1652475] Updated weights for policy 0, policy_version 730871 (0.0013) [2024-06-15 21:08:00,517][1651340] Signal inference workers to stop experience collection... (37550 times) [2024-06-15 21:08:00,566][1652475] InferenceWorker_p0-w0: stopping experience collection (37550 times) [2024-06-15 21:08:00,738][1648984] Fps is (10 sec: 32767.8, 60 sec: 40413.9, 300 sec: 42876.1). Total num frames: 1496907776. Throughput: 0: 10581.3. Samples: 374317056. Policy #0 lag: (min: 31.0, avg: 162.4, max: 287.0) [2024-06-15 21:08:00,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 21:08:00,752][1651340] Signal inference workers to resume experience collection... (37550 times) [2024-06-15 21:08:00,753][1652475] InferenceWorker_p0-w0: resuming experience collection (37550 times) [2024-06-15 21:08:00,815][1652475] Updated weights for policy 0, policy_version 730928 (0.0070) [2024-06-15 21:08:02,988][1652475] Updated weights for policy 0, policy_version 731027 (0.0012) [2024-06-15 21:08:05,738][1648984] Fps is (10 sec: 52428.1, 60 sec: 41506.1, 300 sec: 42987.2). Total num frames: 1497235456. Throughput: 0: 10285.5. Samples: 374336000. Policy #0 lag: (min: 31.0, avg: 162.4, max: 287.0) [2024-06-15 21:08:05,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 21:08:10,474][1652475] Updated weights for policy 0, policy_version 731090 (0.0011) [2024-06-15 21:08:10,738][1648984] Fps is (10 sec: 39321.9, 60 sec: 42052.3, 300 sec: 42431.8). Total num frames: 1497300992. Throughput: 0: 10342.4. Samples: 374408192. Policy #0 lag: (min: 15.0, avg: 80.4, max: 271.0) [2024-06-15 21:08:10,738][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 21:08:13,649][1652475] Updated weights for policy 0, policy_version 731169 (0.0013) [2024-06-15 21:08:15,738][1648984] Fps is (10 sec: 36045.0, 60 sec: 40960.1, 300 sec: 42987.2). 
Total num frames: 1497595904. Throughput: 0: 10240.0. Samples: 374462976. Policy #0 lag: (min: 15.0, avg: 80.4, max: 271.0) [2024-06-15 21:08:15,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 21:08:15,753][1652475] Updated weights for policy 0, policy_version 731253 (0.0014) [2024-06-15 21:08:20,738][1648984] Fps is (10 sec: 45875.3, 60 sec: 41506.2, 300 sec: 42542.9). Total num frames: 1497759744. Throughput: 0: 10046.6. Samples: 374491648. Policy #0 lag: (min: 15.0, avg: 80.4, max: 271.0) [2024-06-15 21:08:20,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 21:08:22,943][1652475] Updated weights for policy 0, policy_version 731345 (0.0013) [2024-06-15 21:08:25,283][1652475] Updated weights for policy 0, policy_version 731397 (0.0033) [2024-06-15 21:08:25,738][1648984] Fps is (10 sec: 32767.4, 60 sec: 39867.6, 300 sec: 42653.9). Total num frames: 1497923584. Throughput: 0: 10410.6. Samples: 374559744. Policy #0 lag: (min: 15.0, avg: 80.4, max: 271.0) [2024-06-15 21:08:25,739][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 21:08:27,950][1652475] Updated weights for policy 0, policy_version 731504 (0.0017) [2024-06-15 21:08:30,590][1652475] Updated weights for policy 0, policy_version 731584 (0.0117) [2024-06-15 21:08:30,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 41506.1, 300 sec: 43098.2). Total num frames: 1498284032. Throughput: 0: 10092.1. Samples: 374612992. Policy #0 lag: (min: 15.0, avg: 80.4, max: 271.0) [2024-06-15 21:08:30,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 21:08:35,738][1648984] Fps is (10 sec: 39322.1, 60 sec: 40413.8, 300 sec: 42320.7). Total num frames: 1498316800. Throughput: 0: 10274.1. Samples: 374651904. 
Policy #0 lag: (min: 15.0, avg: 80.4, max: 271.0) [2024-06-15 21:08:35,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 21:08:37,442][1652475] Updated weights for policy 0, policy_version 731652 (0.0013) [2024-06-15 21:08:39,243][1652475] Updated weights for policy 0, policy_version 731733 (0.0012) [2024-06-15 21:08:39,877][1652475] Updated weights for policy 0, policy_version 731775 (0.0021) [2024-06-15 21:08:40,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 41506.1, 300 sec: 42765.0). Total num frames: 1498677248. Throughput: 0: 10376.5. Samples: 374713856. Policy #0 lag: (min: 15.0, avg: 80.4, max: 271.0) [2024-06-15 21:08:40,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 21:08:42,044][1652475] Updated weights for policy 0, policy_version 731840 (0.0013) [2024-06-15 21:08:45,738][1648984] Fps is (10 sec: 49152.3, 60 sec: 41506.1, 300 sec: 42542.9). Total num frames: 1498808320. Throughput: 0: 10433.5. Samples: 374786560. Policy #0 lag: (min: 15.0, avg: 80.4, max: 271.0) [2024-06-15 21:08:45,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 21:08:47,518][1651340] Signal inference workers to stop experience collection... (37600 times) [2024-06-15 21:08:47,554][1652475] InferenceWorker_p0-w0: stopping experience collection (37600 times) [2024-06-15 21:08:47,746][1651340] Signal inference workers to resume experience collection... (37600 times) [2024-06-15 21:08:47,746][1652475] InferenceWorker_p0-w0: resuming experience collection (37600 times) [2024-06-15 21:08:48,461][1652475] Updated weights for policy 0, policy_version 731898 (0.0018) [2024-06-15 21:08:49,729][1652475] Updated weights for policy 0, policy_version 731952 (0.0012) [2024-06-15 21:08:50,570][1652475] Updated weights for policy 0, policy_version 731990 (0.0013) [2024-06-15 21:08:50,738][1648984] Fps is (10 sec: 45875.7, 60 sec: 42598.4, 300 sec: 42876.1). Total num frames: 1499136000. Throughput: 0: 10740.6. Samples: 374819328. 
Policy #0 lag: (min: 15.0, avg: 80.4, max: 271.0) [2024-06-15 21:08:50,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 21:08:52,955][1652475] Updated weights for policy 0, policy_version 732064 (0.0010) [2024-06-15 21:08:55,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 43690.6, 300 sec: 42876.1). Total num frames: 1499332608. Throughput: 0: 10672.4. Samples: 374888448. Policy #0 lag: (min: 15.0, avg: 80.4, max: 271.0) [2024-06-15 21:08:55,740][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 21:08:55,792][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000732096_1499332608.pth... [2024-06-15 21:08:55,900][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000727040_1488977920.pth [2024-06-15 21:08:59,058][1652475] Updated weights for policy 0, policy_version 732129 (0.0014) [2024-06-15 21:09:00,650][1652475] Updated weights for policy 0, policy_version 732180 (0.0011) [2024-06-15 21:09:00,738][1648984] Fps is (10 sec: 36044.1, 60 sec: 43144.5, 300 sec: 42431.8). Total num frames: 1499496448. Throughput: 0: 11104.7. Samples: 374962688. Policy #0 lag: (min: 15.0, avg: 80.4, max: 271.0) [2024-06-15 21:09:00,739][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 21:09:02,300][1652475] Updated weights for policy 0, policy_version 732272 (0.0013) [2024-06-15 21:09:03,469][1652475] Updated weights for policy 0, policy_version 732306 (0.0010) [2024-06-15 21:09:05,738][1648984] Fps is (10 sec: 52428.5, 60 sec: 43690.7, 300 sec: 43098.2). Total num frames: 1499856896. Throughput: 0: 11184.3. Samples: 374994944. Policy #0 lag: (min: 15.0, avg: 80.4, max: 271.0) [2024-06-15 21:09:05,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 21:09:09,615][1652475] Updated weights for policy 0, policy_version 732384 (0.0090) [2024-06-15 21:09:10,738][1648984] Fps is (10 sec: 49152.3, 60 sec: 44782.9, 300 sec: 42653.9). Total num frames: 1499987968. Throughput: 0: 11264.0. 
Samples: 375066624. Policy #0 lag: (min: 15.0, avg: 80.4, max: 271.0) [2024-06-15 21:09:10,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 21:09:13,507][1652475] Updated weights for policy 0, policy_version 732448 (0.0013) [2024-06-15 21:09:14,370][1652475] Updated weights for policy 0, policy_version 732484 (0.0011) [2024-06-15 21:09:15,738][1648984] Fps is (10 sec: 39320.4, 60 sec: 44236.5, 300 sec: 42876.0). Total num frames: 1500250112. Throughput: 0: 11502.9. Samples: 375130624. Policy #0 lag: (min: 15.0, avg: 80.4, max: 271.0) [2024-06-15 21:09:15,739][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 21:09:16,557][1652475] Updated weights for policy 0, policy_version 732576 (0.0011) [2024-06-15 21:09:20,738][1648984] Fps is (10 sec: 39321.9, 60 sec: 43690.7, 300 sec: 42653.9). Total num frames: 1500381184. Throughput: 0: 11184.4. Samples: 375155200. Policy #0 lag: (min: 15.0, avg: 80.4, max: 271.0) [2024-06-15 21:09:20,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 21:09:21,506][1652475] Updated weights for policy 0, policy_version 732642 (0.0015) [2024-06-15 21:09:25,595][1652475] Updated weights for policy 0, policy_version 732704 (0.0015) [2024-06-15 21:09:25,738][1648984] Fps is (10 sec: 32769.0, 60 sec: 44236.9, 300 sec: 42765.0). Total num frames: 1500577792. Throughput: 0: 11229.9. Samples: 375219200. Policy #0 lag: (min: 15.0, avg: 80.4, max: 271.0) [2024-06-15 21:09:25,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 21:09:26,378][1652475] Updated weights for policy 0, policy_version 732736 (0.0013) [2024-06-15 21:09:28,797][1651340] Signal inference workers to stop experience collection... (37650 times) [2024-06-15 21:09:28,847][1652475] Updated weights for policy 0, policy_version 732801 (0.0113) [2024-06-15 21:09:28,874][1652475] InferenceWorker_p0-w0: stopping experience collection (37650 times) [2024-06-15 21:09:29,069][1651340] Signal inference workers to resume experience collection... 
(37650 times) [2024-06-15 21:09:29,070][1652475] InferenceWorker_p0-w0: resuming experience collection (37650 times) [2024-06-15 21:09:29,993][1652475] Updated weights for policy 0, policy_version 732853 (0.0013) [2024-06-15 21:09:30,738][1648984] Fps is (10 sec: 52428.4, 60 sec: 43690.6, 300 sec: 43098.2). Total num frames: 1500905472. Throughput: 0: 10968.2. Samples: 375280128. Policy #0 lag: (min: 15.0, avg: 80.4, max: 271.0) [2024-06-15 21:09:30,739][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 21:09:32,916][1652475] Updated weights for policy 0, policy_version 732896 (0.0013) [2024-06-15 21:09:35,738][1648984] Fps is (10 sec: 45875.6, 60 sec: 45329.1, 300 sec: 42653.9). Total num frames: 1501036544. Throughput: 0: 11002.3. Samples: 375314432. Policy #0 lag: (min: 15.0, avg: 80.4, max: 271.0) [2024-06-15 21:09:35,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 21:09:36,615][1652475] Updated weights for policy 0, policy_version 732960 (0.0013) [2024-06-15 21:09:39,337][1652475] Updated weights for policy 0, policy_version 733008 (0.0015) [2024-06-15 21:09:40,738][1648984] Fps is (10 sec: 39321.9, 60 sec: 43690.7, 300 sec: 43098.2). Total num frames: 1501298688. Throughput: 0: 11013.7. Samples: 375384064. Policy #0 lag: (min: 15.0, avg: 80.4, max: 271.0) [2024-06-15 21:09:40,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 21:09:42,715][1652475] Updated weights for policy 0, policy_version 733089 (0.0014) [2024-06-15 21:09:44,607][1652475] Updated weights for policy 0, policy_version 733138 (0.0014) [2024-06-15 21:09:45,738][1648984] Fps is (10 sec: 52429.1, 60 sec: 45875.2, 300 sec: 42765.0). Total num frames: 1501560832. Throughput: 0: 10706.5. Samples: 375444480. 
Policy #0 lag: (min: 15.0, avg: 80.4, max: 271.0) [2024-06-15 21:09:45,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 21:09:48,616][1652475] Updated weights for policy 0, policy_version 733203 (0.0012) [2024-06-15 21:09:49,632][1652475] Updated weights for policy 0, policy_version 733248 (0.0030) [2024-06-15 21:09:50,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 42598.4, 300 sec: 43098.3). Total num frames: 1501691904. Throughput: 0: 10797.5. Samples: 375480832. Policy #0 lag: (min: 15.0, avg: 80.4, max: 271.0) [2024-06-15 21:09:50,741][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 21:09:52,138][1652475] Updated weights for policy 0, policy_version 733308 (0.0011) [2024-06-15 21:09:54,891][1652475] Updated weights for policy 0, policy_version 733368 (0.0028) [2024-06-15 21:09:55,738][1648984] Fps is (10 sec: 39321.1, 60 sec: 43690.6, 300 sec: 42987.2). Total num frames: 1501954048. Throughput: 0: 10581.3. Samples: 375542784. Policy #0 lag: (min: 0.0, avg: 115.9, max: 256.0) [2024-06-15 21:09:55,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 21:09:58,323][1652475] Updated weights for policy 0, policy_version 733410 (0.0024) [2024-06-15 21:10:00,091][1652475] Updated weights for policy 0, policy_version 733457 (0.0044) [2024-06-15 21:10:00,738][1648984] Fps is (10 sec: 49152.7, 60 sec: 44783.1, 300 sec: 42987.2). Total num frames: 1502183424. Throughput: 0: 10638.3. Samples: 375609344. Policy #0 lag: (min: 0.0, avg: 115.9, max: 256.0) [2024-06-15 21:10:00,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 21:10:02,436][1652475] Updated weights for policy 0, policy_version 733509 (0.0080) [2024-06-15 21:10:05,738][1648984] Fps is (10 sec: 42598.8, 60 sec: 42052.3, 300 sec: 43209.4). Total num frames: 1502380032. Throughput: 0: 10740.6. Samples: 375638528. 
Policy #0 lag: (min: 0.0, avg: 115.9, max: 256.0) [2024-06-15 21:10:05,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 21:10:06,153][1652475] Updated weights for policy 0, policy_version 733600 (0.0012) [2024-06-15 21:10:06,974][1652475] Updated weights for policy 0, policy_version 733632 (0.0032) [2024-06-15 21:10:10,740][1648984] Fps is (10 sec: 29484.4, 60 sec: 41504.7, 300 sec: 42431.5). Total num frames: 1502478336. Throughput: 0: 10922.1. Samples: 375710720. Policy #0 lag: (min: 0.0, avg: 115.9, max: 256.0) [2024-06-15 21:10:10,741][1648984] Avg episode reward: [(0, '-0.570')] [2024-06-15 21:10:12,220][1652475] Updated weights for policy 0, policy_version 733690 (0.0014) [2024-06-15 21:10:13,838][1652475] Updated weights for policy 0, policy_version 733758 (0.0010) [2024-06-15 21:10:15,738][1648984] Fps is (10 sec: 45874.9, 60 sec: 43144.8, 300 sec: 43431.5). Total num frames: 1502838784. Throughput: 0: 10695.1. Samples: 375761408. Policy #0 lag: (min: 0.0, avg: 115.9, max: 256.0) [2024-06-15 21:10:15,738][1648984] Avg episode reward: [(0, '-0.640')] [2024-06-15 21:10:15,844][1652475] Updated weights for policy 0, policy_version 733820 (0.0046) [2024-06-15 21:10:17,863][1651340] Signal inference workers to stop experience collection... (37700 times) [2024-06-15 21:10:17,906][1652475] InferenceWorker_p0-w0: stopping experience collection (37700 times) [2024-06-15 21:10:18,141][1651340] Signal inference workers to resume experience collection... (37700 times) [2024-06-15 21:10:18,144][1652475] InferenceWorker_p0-w0: resuming experience collection (37700 times) [2024-06-15 21:10:18,753][1652475] Updated weights for policy 0, policy_version 733879 (0.0020) [2024-06-15 21:10:20,738][1648984] Fps is (10 sec: 52440.5, 60 sec: 43690.7, 300 sec: 42765.0). Total num frames: 1503002624. Throughput: 0: 10649.6. Samples: 375793664. 
Policy #0 lag: (min: 0.0, avg: 115.9, max: 256.0) [2024-06-15 21:10:20,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 21:10:25,241][1652475] Updated weights for policy 0, policy_version 733936 (0.0012) [2024-06-15 21:10:25,738][1648984] Fps is (10 sec: 29491.2, 60 sec: 42598.4, 300 sec: 42653.9). Total num frames: 1503133696. Throughput: 0: 10683.7. Samples: 375864832. Policy #0 lag: (min: 0.0, avg: 115.9, max: 256.0) [2024-06-15 21:10:25,738][1648984] Avg episode reward: [(0, '-0.520')] [2024-06-15 21:10:27,050][1652475] Updated weights for policy 0, policy_version 734021 (0.0017) [2024-06-15 21:10:28,189][1652475] Updated weights for policy 0, policy_version 734072 (0.0011) [2024-06-15 21:10:30,495][1652475] Updated weights for policy 0, policy_version 734116 (0.0121) [2024-06-15 21:10:30,738][1648984] Fps is (10 sec: 49151.4, 60 sec: 43144.5, 300 sec: 43098.2). Total num frames: 1503494144. Throughput: 0: 10626.8. Samples: 375922688. Policy #0 lag: (min: 0.0, avg: 115.9, max: 256.0) [2024-06-15 21:10:30,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 21:10:35,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 41506.1, 300 sec: 42653.9). Total num frames: 1503526912. Throughput: 0: 10558.6. Samples: 375955968. Policy #0 lag: (min: 0.0, avg: 115.9, max: 256.0) [2024-06-15 21:10:35,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 21:10:36,937][1652475] Updated weights for policy 0, policy_version 734192 (0.0013) [2024-06-15 21:10:38,623][1652475] Updated weights for policy 0, policy_version 734272 (0.0090) [2024-06-15 21:10:40,738][1648984] Fps is (10 sec: 42598.6, 60 sec: 43690.6, 300 sec: 43320.4). Total num frames: 1503920128. Throughput: 0: 10558.6. Samples: 376017920. 
Policy #0 lag: (min: 0.0, avg: 115.9, max: 256.0) [2024-06-15 21:10:40,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 21:10:42,818][1652475] Updated weights for policy 0, policy_version 734368 (0.0020) [2024-06-15 21:10:45,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 41506.0, 300 sec: 42653.9). Total num frames: 1504051200. Throughput: 0: 10513.0. Samples: 376082432. Policy #0 lag: (min: 0.0, avg: 115.9, max: 256.0) [2024-06-15 21:10:45,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 21:10:49,232][1652475] Updated weights for policy 0, policy_version 734432 (0.0011) [2024-06-15 21:10:50,738][1648984] Fps is (10 sec: 32768.3, 60 sec: 42598.5, 300 sec: 43320.5). Total num frames: 1504247808. Throughput: 0: 10717.9. Samples: 376120832. Policy #0 lag: (min: 0.0, avg: 115.9, max: 256.0) [2024-06-15 21:10:50,740][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 21:10:51,193][1652475] Updated weights for policy 0, policy_version 734514 (0.0013) [2024-06-15 21:10:52,755][1652475] Updated weights for policy 0, policy_version 734576 (0.0016) [2024-06-15 21:10:55,272][1652475] Updated weights for policy 0, policy_version 734649 (0.0109) [2024-06-15 21:10:55,738][1648984] Fps is (10 sec: 52427.1, 60 sec: 43690.4, 300 sec: 43209.3). Total num frames: 1504575488. Throughput: 0: 10433.8. Samples: 376180224. Policy #0 lag: (min: 0.0, avg: 115.9, max: 256.0) [2024-06-15 21:10:55,739][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 21:10:55,745][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000734656_1504575488.pth... 
[2024-06-15 21:10:55,791][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000729600_1494220800.pth [2024-06-15 21:10:55,797][1651340] Saving a milestone train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/milestones/checkpoint_000734656_1504575488.pth [2024-06-15 21:11:00,738][1648984] Fps is (10 sec: 32767.7, 60 sec: 39867.6, 300 sec: 42653.9). Total num frames: 1504575488. Throughput: 0: 10888.5. Samples: 376251392. Policy #0 lag: (min: 0.0, avg: 115.9, max: 256.0) [2024-06-15 21:11:00,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 21:11:01,770][1652475] Updated weights for policy 0, policy_version 734691 (0.0016) [2024-06-15 21:11:03,591][1651340] Signal inference workers to stop experience collection... (37750 times) [2024-06-15 21:11:03,639][1652475] InferenceWorker_p0-w0: stopping experience collection (37750 times) [2024-06-15 21:11:03,815][1651340] Signal inference workers to resume experience collection... (37750 times) [2024-06-15 21:11:03,816][1652475] InferenceWorker_p0-w0: resuming experience collection (37750 times) [2024-06-15 21:11:04,028][1652475] Updated weights for policy 0, policy_version 734769 (0.0011) [2024-06-15 21:11:05,727][1652475] Updated weights for policy 0, policy_version 734839 (0.0084) [2024-06-15 21:11:05,738][1648984] Fps is (10 sec: 36046.0, 60 sec: 42598.3, 300 sec: 42987.2). Total num frames: 1504935936. Throughput: 0: 10774.7. Samples: 376278528. Policy #0 lag: (min: 0.0, avg: 115.9, max: 256.0) [2024-06-15 21:11:05,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 21:11:06,538][1652475] Updated weights for policy 0, policy_version 734865 (0.0012) [2024-06-15 21:11:07,641][1652475] Updated weights for policy 0, policy_version 734912 (0.0012) [2024-06-15 21:11:10,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 43692.3, 300 sec: 42653.9). Total num frames: 1505099776. Throughput: 0: 10501.7. Samples: 376337408. 
Policy #0 lag: (min: 0.0, avg: 115.9, max: 256.0) [2024-06-15 21:11:10,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 21:11:15,451][1652475] Updated weights for policy 0, policy_version 735008 (0.0116) [2024-06-15 21:11:15,738][1648984] Fps is (10 sec: 36044.9, 60 sec: 40960.0, 300 sec: 42765.0). Total num frames: 1505296384. Throughput: 0: 10729.3. Samples: 376405504. Policy #0 lag: (min: 0.0, avg: 115.9, max: 256.0) [2024-06-15 21:11:15,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 21:11:17,069][1652475] Updated weights for policy 0, policy_version 735072 (0.0013) [2024-06-15 21:11:19,760][1652475] Updated weights for policy 0, policy_version 735152 (0.0013) [2024-06-15 21:11:20,738][1648984] Fps is (10 sec: 52428.5, 60 sec: 43690.6, 300 sec: 43098.3). Total num frames: 1505624064. Throughput: 0: 10501.7. Samples: 376428544. Policy #0 lag: (min: 0.0, avg: 115.9, max: 256.0) [2024-06-15 21:11:20,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 21:11:25,738][1648984] Fps is (10 sec: 36043.9, 60 sec: 42052.1, 300 sec: 42765.0). Total num frames: 1505656832. Throughput: 0: 10763.3. Samples: 376502272. Policy #0 lag: (min: 0.0, avg: 115.9, max: 256.0) [2024-06-15 21:11:25,739][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 21:11:26,239][1652475] Updated weights for policy 0, policy_version 735216 (0.0081) [2024-06-15 21:11:27,329][1652475] Updated weights for policy 0, policy_version 735248 (0.0012) [2024-06-15 21:11:30,118][1652475] Updated weights for policy 0, policy_version 735356 (0.0121) [2024-06-15 21:11:30,738][1648984] Fps is (10 sec: 39320.8, 60 sec: 42052.2, 300 sec: 42653.9). Total num frames: 1506017280. Throughput: 0: 10558.5. Samples: 376557568. 
Policy #0 lag: (min: 0.0, avg: 115.9, max: 256.0) [2024-06-15 21:11:30,739][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 21:11:32,058][1652475] Updated weights for policy 0, policy_version 735408 (0.0013) [2024-06-15 21:11:35,738][1648984] Fps is (10 sec: 49153.2, 60 sec: 43690.7, 300 sec: 42654.0). Total num frames: 1506148352. Throughput: 0: 10399.3. Samples: 376588800. Policy #0 lag: (min: 0.0, avg: 115.9, max: 256.0) [2024-06-15 21:11:35,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 21:11:38,127][1652475] Updated weights for policy 0, policy_version 735472 (0.0014) [2024-06-15 21:11:39,779][1652475] Updated weights for policy 0, policy_version 735543 (0.0017) [2024-06-15 21:11:40,738][1648984] Fps is (10 sec: 39322.3, 60 sec: 41506.1, 300 sec: 42653.9). Total num frames: 1506410496. Throughput: 0: 10490.4. Samples: 376652288. Policy #0 lag: (min: 0.0, avg: 73.0, max: 256.0) [2024-06-15 21:11:40,739][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 21:11:43,546][1652475] Updated weights for policy 0, policy_version 735589 (0.0023) [2024-06-15 21:11:45,738][1648984] Fps is (10 sec: 39321.3, 60 sec: 41506.1, 300 sec: 42209.6). Total num frames: 1506541568. Throughput: 0: 10353.8. Samples: 376717312. Policy #0 lag: (min: 0.0, avg: 73.0, max: 256.0) [2024-06-15 21:11:45,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 21:11:47,037][1652475] Updated weights for policy 0, policy_version 735648 (0.0016) [2024-06-15 21:11:49,311][1652475] Updated weights for policy 0, policy_version 735712 (0.0014) [2024-06-15 21:11:49,401][1651340] Signal inference workers to stop experience collection... (37800 times) [2024-06-15 21:11:49,462][1652475] InferenceWorker_p0-w0: stopping experience collection (37800 times) [2024-06-15 21:11:49,687][1651340] Signal inference workers to resume experience collection... 
(37800 times) [2024-06-15 21:11:49,688][1652475] InferenceWorker_p0-w0: resuming experience collection (37800 times) [2024-06-15 21:11:50,738][1648984] Fps is (10 sec: 42599.1, 60 sec: 43144.5, 300 sec: 42765.0). Total num frames: 1506836480. Throughput: 0: 10433.4. Samples: 376748032. Policy #0 lag: (min: 0.0, avg: 73.0, max: 256.0) [2024-06-15 21:11:50,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 21:11:50,863][1652475] Updated weights for policy 0, policy_version 735776 (0.0011) [2024-06-15 21:11:55,465][1652475] Updated weights for policy 0, policy_version 735840 (0.0016) [2024-06-15 21:11:55,738][1648984] Fps is (10 sec: 45875.4, 60 sec: 40414.1, 300 sec: 42431.8). Total num frames: 1507000320. Throughput: 0: 10638.2. Samples: 376816128. Policy #0 lag: (min: 0.0, avg: 73.0, max: 256.0) [2024-06-15 21:11:55,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 21:12:00,538][1652475] Updated weights for policy 0, policy_version 735907 (0.0015) [2024-06-15 21:12:00,738][1648984] Fps is (10 sec: 32767.7, 60 sec: 43144.5, 300 sec: 42098.6). Total num frames: 1507164160. Throughput: 0: 10638.2. Samples: 376884224. Policy #0 lag: (min: 0.0, avg: 73.0, max: 256.0) [2024-06-15 21:12:00,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 21:12:02,316][1652475] Updated weights for policy 0, policy_version 735984 (0.0101) [2024-06-15 21:12:03,888][1652475] Updated weights for policy 0, policy_version 736055 (0.0015) [2024-06-15 21:12:05,738][1648984] Fps is (10 sec: 45875.6, 60 sec: 42052.3, 300 sec: 42987.2). Total num frames: 1507459072. Throughput: 0: 10604.1. Samples: 376905728. Policy #0 lag: (min: 0.0, avg: 73.0, max: 256.0) [2024-06-15 21:12:05,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 21:12:07,898][1652475] Updated weights for policy 0, policy_version 736112 (0.0013) [2024-06-15 21:12:10,771][1648984] Fps is (10 sec: 42457.3, 60 sec: 41483.1, 300 sec: 42204.9). Total num frames: 1507590144. 
Throughput: 0: 10562.2. Samples: 376977920. Policy #0 lag: (min: 0.0, avg: 73.0, max: 256.0) [2024-06-15 21:12:10,771][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 21:12:13,299][1652475] Updated weights for policy 0, policy_version 736176 (0.0013) [2024-06-15 21:12:14,911][1652475] Updated weights for policy 0, policy_version 736241 (0.0011) [2024-06-15 21:12:15,738][1648984] Fps is (10 sec: 42598.2, 60 sec: 43144.5, 300 sec: 42765.0). Total num frames: 1507885056. Throughput: 0: 10695.2. Samples: 377038848. Policy #0 lag: (min: 0.0, avg: 73.0, max: 256.0) [2024-06-15 21:12:15,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 21:12:16,466][1652475] Updated weights for policy 0, policy_version 736316 (0.0013) [2024-06-15 21:12:20,737][1648984] Fps is (10 sec: 49316.6, 60 sec: 40960.1, 300 sec: 42542.9). Total num frames: 1508081664. Throughput: 0: 10683.8. Samples: 377069568. Policy #0 lag: (min: 0.0, avg: 73.0, max: 256.0) [2024-06-15 21:12:20,738][1648984] Avg episode reward: [(0, '-0.200')] [2024-06-15 21:12:23,864][1652475] Updated weights for policy 0, policy_version 736385 (0.0108) [2024-06-15 21:12:25,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 43690.8, 300 sec: 42320.7). Total num frames: 1508278272. Throughput: 0: 10683.7. Samples: 377133056. Policy #0 lag: (min: 0.0, avg: 73.0, max: 256.0) [2024-06-15 21:12:25,738][1648984] Avg episode reward: [(0, '-0.250')] [2024-06-15 21:12:26,002][1652475] Updated weights for policy 0, policy_version 736484 (0.0012) [2024-06-15 21:12:28,633][1652475] Updated weights for policy 0, policy_version 736560 (0.0013) [2024-06-15 21:12:30,738][1648984] Fps is (10 sec: 42598.0, 60 sec: 41506.3, 300 sec: 42765.0). Total num frames: 1508507648. Throughput: 0: 10683.8. Samples: 377198080. 
Policy #0 lag: (min: 0.0, avg: 73.0, max: 256.0) [2024-06-15 21:12:30,740][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 21:12:31,418][1652475] Updated weights for policy 0, policy_version 736594 (0.0011) [2024-06-15 21:12:32,607][1652475] Updated weights for policy 0, policy_version 736640 (0.0127) [2024-06-15 21:12:35,738][1648984] Fps is (10 sec: 36045.0, 60 sec: 41506.2, 300 sec: 42209.6). Total num frames: 1508638720. Throughput: 0: 10638.2. Samples: 377226752. Policy #0 lag: (min: 0.0, avg: 73.0, max: 256.0) [2024-06-15 21:12:35,738][1648984] Avg episode reward: [(0, '-0.200')] [2024-06-15 21:12:35,915][1651340] Signal inference workers to stop experience collection... (37850 times) [2024-06-15 21:12:35,951][1652475] InferenceWorker_p0-w0: stopping experience collection (37850 times) [2024-06-15 21:12:36,167][1651340] Signal inference workers to resume experience collection... (37850 times) [2024-06-15 21:12:36,169][1652475] InferenceWorker_p0-w0: resuming experience collection (37850 times) [2024-06-15 21:12:37,382][1652475] Updated weights for policy 0, policy_version 736710 (0.0015) [2024-06-15 21:12:40,738][1648984] Fps is (10 sec: 39321.3, 60 sec: 41506.2, 300 sec: 42653.9). Total num frames: 1508900864. Throughput: 0: 10558.6. Samples: 377291264. Policy #0 lag: (min: 0.0, avg: 73.0, max: 256.0) [2024-06-15 21:12:40,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 21:12:40,986][1652475] Updated weights for policy 0, policy_version 736774 (0.0088) [2024-06-15 21:12:42,101][1652475] Updated weights for policy 0, policy_version 736824 (0.0012) [2024-06-15 21:12:43,464][1652475] Updated weights for policy 0, policy_version 736854 (0.0012) [2024-06-15 21:12:45,738][1648984] Fps is (10 sec: 52428.5, 60 sec: 43690.7, 300 sec: 42653.9). Total num frames: 1509163008. Throughput: 0: 10490.3. Samples: 377356288. 
Policy #0 lag: (min: 0.0, avg: 73.0, max: 256.0) [2024-06-15 21:12:45,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 21:12:46,791][1652475] Updated weights for policy 0, policy_version 736898 (0.0014) [2024-06-15 21:12:50,739][1648984] Fps is (10 sec: 39317.3, 60 sec: 40959.2, 300 sec: 42653.8). Total num frames: 1509294080. Throughput: 0: 10740.3. Samples: 377389056. Policy #0 lag: (min: 0.0, avg: 73.0, max: 256.0) [2024-06-15 21:12:50,739][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 21:12:50,959][1652475] Updated weights for policy 0, policy_version 736966 (0.0014) [2024-06-15 21:12:52,933][1652475] Updated weights for policy 0, policy_version 737057 (0.0014) [2024-06-15 21:12:54,216][1652475] Updated weights for policy 0, policy_version 737092 (0.0013) [2024-06-15 21:12:55,742][1648984] Fps is (10 sec: 52404.3, 60 sec: 44779.5, 300 sec: 43319.7). Total num frames: 1509687296. Throughput: 0: 10758.8. Samples: 377461760. Policy #0 lag: (min: 0.0, avg: 73.0, max: 256.0) [2024-06-15 21:12:55,743][1648984] Avg episode reward: [(0, '-0.250')] [2024-06-15 21:12:55,747][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000737152_1509687296.pth... [2024-06-15 21:12:55,821][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000732096_1499332608.pth [2024-06-15 21:12:58,779][1652475] Updated weights for policy 0, policy_version 737184 (0.0195) [2024-06-15 21:13:00,738][1648984] Fps is (10 sec: 52434.5, 60 sec: 44236.8, 300 sec: 42653.9). Total num frames: 1509818368. Throughput: 0: 10854.4. Samples: 377527296. Policy #0 lag: (min: 0.0, avg: 73.0, max: 256.0) [2024-06-15 21:13:00,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 21:13:03,861][1652475] Updated weights for policy 0, policy_version 737264 (0.0014) [2024-06-15 21:13:05,738][1648984] Fps is (10 sec: 36061.7, 60 sec: 43144.5, 300 sec: 43209.3). Total num frames: 1510047744. Throughput: 0: 11013.7. 
Samples: 377565184. Policy #0 lag: (min: 0.0, avg: 73.0, max: 256.0) [2024-06-15 21:13:05,740][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 21:13:06,639][1652475] Updated weights for policy 0, policy_version 737360 (0.0012) [2024-06-15 21:13:10,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 43714.9, 300 sec: 42765.0). Total num frames: 1510211584. Throughput: 0: 10774.8. Samples: 377617920. Policy #0 lag: (min: 0.0, avg: 73.0, max: 256.0) [2024-06-15 21:13:10,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 21:13:11,193][1652475] Updated weights for policy 0, policy_version 737411 (0.0055) [2024-06-15 21:13:12,476][1652475] Updated weights for policy 0, policy_version 737469 (0.0013) [2024-06-15 21:13:15,738][1648984] Fps is (10 sec: 32768.1, 60 sec: 41506.2, 300 sec: 42765.0). Total num frames: 1510375424. Throughput: 0: 10968.2. Samples: 377691648. Policy #0 lag: (min: 0.0, avg: 73.0, max: 256.0) [2024-06-15 21:13:15,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 21:13:17,016][1652475] Updated weights for policy 0, policy_version 737536 (0.0012) [2024-06-15 21:13:19,234][1652475] Updated weights for policy 0, policy_version 737621 (0.0089) [2024-06-15 21:13:19,544][1651340] Signal inference workers to stop experience collection... (37900 times) [2024-06-15 21:13:19,583][1652475] InferenceWorker_p0-w0: stopping experience collection (37900 times) [2024-06-15 21:13:19,821][1651340] Signal inference workers to resume experience collection... (37900 times) [2024-06-15 21:13:19,821][1652475] InferenceWorker_p0-w0: resuming experience collection (37900 times) [2024-06-15 21:13:20,207][1652475] Updated weights for policy 0, policy_version 737660 (0.0048) [2024-06-15 21:13:20,738][1648984] Fps is (10 sec: 52429.2, 60 sec: 44236.7, 300 sec: 43431.5). Total num frames: 1510735872. Throughput: 0: 10797.5. Samples: 377712640. 
Policy #0 lag: (min: 96.0, avg: 208.8, max: 320.0) [2024-06-15 21:13:20,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 21:13:24,387][1652475] Updated weights for policy 0, policy_version 737712 (0.0014) [2024-06-15 21:13:25,738][1648984] Fps is (10 sec: 49151.8, 60 sec: 43144.5, 300 sec: 42653.9). Total num frames: 1510866944. Throughput: 0: 10854.4. Samples: 377779712. Policy #0 lag: (min: 96.0, avg: 208.8, max: 320.0) [2024-06-15 21:13:25,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 21:13:28,282][1652475] Updated weights for policy 0, policy_version 737760 (0.0012) [2024-06-15 21:13:29,935][1652475] Updated weights for policy 0, policy_version 737824 (0.0010) [2024-06-15 21:13:30,751][1648984] Fps is (10 sec: 39268.1, 60 sec: 43680.7, 300 sec: 43429.5). Total num frames: 1511129088. Throughput: 0: 10930.7. Samples: 377848320. Policy #0 lag: (min: 96.0, avg: 208.8, max: 320.0) [2024-06-15 21:13:30,752][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 21:13:31,272][1652475] Updated weights for policy 0, policy_version 737875 (0.0117) [2024-06-15 21:13:35,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 43690.6, 300 sec: 42653.9). Total num frames: 1511260160. Throughput: 0: 10831.9. Samples: 377876480. Policy #0 lag: (min: 96.0, avg: 208.8, max: 320.0) [2024-06-15 21:13:35,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 21:13:35,972][1652475] Updated weights for policy 0, policy_version 737936 (0.0015) [2024-06-15 21:13:40,055][1652475] Updated weights for policy 0, policy_version 738005 (0.0013) [2024-06-15 21:13:40,738][1648984] Fps is (10 sec: 36093.7, 60 sec: 43144.5, 300 sec: 42987.2). Total num frames: 1511489536. Throughput: 0: 10730.3. Samples: 377944576. 
Policy #0 lag: (min: 96.0, avg: 208.8, max: 320.0) [2024-06-15 21:13:40,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 21:13:41,856][1652475] Updated weights for policy 0, policy_version 738080 (0.0011) [2024-06-15 21:13:44,202][1652475] Updated weights for policy 0, policy_version 738146 (0.0033) [2024-06-15 21:13:45,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 43690.7, 300 sec: 42876.1). Total num frames: 1511784448. Throughput: 0: 10524.5. Samples: 378000896. Policy #0 lag: (min: 96.0, avg: 208.8, max: 320.0) [2024-06-15 21:13:45,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 21:13:48,689][1652475] Updated weights for policy 0, policy_version 738213 (0.0014) [2024-06-15 21:13:50,738][1648984] Fps is (10 sec: 42598.7, 60 sec: 43691.5, 300 sec: 42653.9). Total num frames: 1511915520. Throughput: 0: 10467.6. Samples: 378036224. Policy #0 lag: (min: 96.0, avg: 208.8, max: 320.0) [2024-06-15 21:13:50,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 21:13:52,814][1652475] Updated weights for policy 0, policy_version 738288 (0.0011) [2024-06-15 21:13:54,959][1652475] Updated weights for policy 0, policy_version 738367 (0.0026) [2024-06-15 21:13:55,740][1648984] Fps is (10 sec: 39321.4, 60 sec: 41509.3, 300 sec: 42987.2). Total num frames: 1512177664. Throughput: 0: 10638.2. Samples: 378096640. Policy #0 lag: (min: 96.0, avg: 208.8, max: 320.0) [2024-06-15 21:13:55,741][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 21:13:59,518][1652475] Updated weights for policy 0, policy_version 738422 (0.0013) [2024-06-15 21:14:00,738][1648984] Fps is (10 sec: 39318.7, 60 sec: 41505.7, 300 sec: 42209.5). Total num frames: 1512308736. Throughput: 0: 10353.6. Samples: 378157568. 
Policy #0 lag: (min: 96.0, avg: 208.8, max: 320.0) [2024-06-15 21:14:00,739][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 21:14:02,112][1652475] Updated weights for policy 0, policy_version 738464 (0.0012) [2024-06-15 21:14:03,342][1652475] Updated weights for policy 0, policy_version 738497 (0.0013) [2024-06-15 21:14:05,271][1652475] Updated weights for policy 0, policy_version 738576 (0.0261) [2024-06-15 21:14:05,738][1648984] Fps is (10 sec: 45875.8, 60 sec: 43144.6, 300 sec: 42876.1). Total num frames: 1512636416. Throughput: 0: 10592.7. Samples: 378189312. Policy #0 lag: (min: 96.0, avg: 208.8, max: 320.0) [2024-06-15 21:14:05,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 21:14:05,792][1651340] Signal inference workers to stop experience collection... (37950 times) [2024-06-15 21:14:05,832][1652475] InferenceWorker_p0-w0: stopping experience collection (37950 times) [2024-06-15 21:14:06,108][1651340] Signal inference workers to resume experience collection... (37950 times) [2024-06-15 21:14:06,109][1652475] InferenceWorker_p0-w0: resuming experience collection (37950 times) [2024-06-15 21:14:10,738][1648984] Fps is (10 sec: 39324.5, 60 sec: 41506.2, 300 sec: 42209.7). Total num frames: 1512701952. Throughput: 0: 10376.5. Samples: 378246656. Policy #0 lag: (min: 96.0, avg: 208.8, max: 320.0) [2024-06-15 21:14:10,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 21:14:12,634][1652475] Updated weights for policy 0, policy_version 738672 (0.0013) [2024-06-15 21:14:15,228][1652475] Updated weights for policy 0, policy_version 738724 (0.0011) [2024-06-15 21:14:15,738][1648984] Fps is (10 sec: 29491.1, 60 sec: 42598.4, 300 sec: 42542.9). Total num frames: 1512931328. Throughput: 0: 10436.6. Samples: 378317824. 
Policy #0 lag: (min: 96.0, avg: 208.8, max: 320.0) [2024-06-15 21:14:15,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 21:14:17,343][1652475] Updated weights for policy 0, policy_version 738803 (0.0013) [2024-06-15 21:14:18,785][1652475] Updated weights for policy 0, policy_version 738876 (0.0027) [2024-06-15 21:14:20,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 41506.1, 300 sec: 42876.1). Total num frames: 1513226240. Throughput: 0: 10217.3. Samples: 378336256. Policy #0 lag: (min: 96.0, avg: 208.8, max: 320.0) [2024-06-15 21:14:20,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 21:14:25,556][1652475] Updated weights for policy 0, policy_version 738930 (0.0011) [2024-06-15 21:14:25,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 40960.0, 300 sec: 42098.6). Total num frames: 1513324544. Throughput: 0: 10365.2. Samples: 378411008. Policy #0 lag: (min: 96.0, avg: 208.8, max: 320.0) [2024-06-15 21:14:25,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 21:14:28,597][1652475] Updated weights for policy 0, policy_version 738995 (0.0013) [2024-06-15 21:14:29,856][1652475] Updated weights for policy 0, policy_version 739030 (0.0012) [2024-06-15 21:14:30,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 41515.6, 300 sec: 42653.9). Total num frames: 1513619456. Throughput: 0: 10353.8. Samples: 378466816. Policy #0 lag: (min: 96.0, avg: 208.8, max: 320.0) [2024-06-15 21:14:30,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 21:14:31,557][1652475] Updated weights for policy 0, policy_version 739120 (0.0011) [2024-06-15 21:14:35,738][1648984] Fps is (10 sec: 42598.5, 60 sec: 41506.2, 300 sec: 42209.6). Total num frames: 1513750528. Throughput: 0: 10274.1. Samples: 378498560. 
Policy #0 lag: (min: 96.0, avg: 208.8, max: 320.0) [2024-06-15 21:14:35,738][1648984] Avg episode reward: [(0, '-0.250')] [2024-06-15 21:14:36,867][1652475] Updated weights for policy 0, policy_version 739184 (0.0040) [2024-06-15 21:14:40,014][1652475] Updated weights for policy 0, policy_version 739237 (0.0095) [2024-06-15 21:14:40,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 42052.3, 300 sec: 42209.6). Total num frames: 1514012672. Throughput: 0: 10376.6. Samples: 378563584. Policy #0 lag: (min: 96.0, avg: 208.8, max: 320.0) [2024-06-15 21:14:40,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 21:14:42,559][1652475] Updated weights for policy 0, policy_version 739312 (0.0015) [2024-06-15 21:14:42,969][1652475] Updated weights for policy 0, policy_version 739328 (0.0012) [2024-06-15 21:14:44,961][1652475] Updated weights for policy 0, policy_version 739387 (0.0012) [2024-06-15 21:14:45,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 41506.2, 300 sec: 42653.9). Total num frames: 1514274816. Throughput: 0: 10456.4. Samples: 378628096. Policy #0 lag: (min: 96.0, avg: 208.8, max: 320.0) [2024-06-15 21:14:45,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 21:14:48,473][1652475] Updated weights for policy 0, policy_version 739456 (0.0012) [2024-06-15 21:14:50,738][1648984] Fps is (10 sec: 39322.0, 60 sec: 41506.2, 300 sec: 42209.6). Total num frames: 1514405888. Throughput: 0: 10467.6. Samples: 378660352. Policy #0 lag: (min: 96.0, avg: 208.8, max: 320.0) [2024-06-15 21:14:50,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 21:14:51,735][1652475] Updated weights for policy 0, policy_version 739518 (0.0130) [2024-06-15 21:14:53,955][1652475] Updated weights for policy 0, policy_version 739579 (0.0012) [2024-06-15 21:14:55,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 41506.2, 300 sec: 42320.7). Total num frames: 1514668032. Throughput: 0: 10661.0. Samples: 378726400. 
Policy #0 lag: (min: 96.0, avg: 208.8, max: 320.0) [2024-06-15 21:14:55,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 21:14:55,744][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000739584_1514668032.pth... [2024-06-15 21:14:55,787][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000734656_1504575488.pth [2024-06-15 21:14:57,329][1651340] Signal inference workers to stop experience collection... (38000 times) [2024-06-15 21:14:57,388][1652475] InferenceWorker_p0-w0: stopping experience collection (38000 times) [2024-06-15 21:14:57,667][1651340] Signal inference workers to resume experience collection... (38000 times) [2024-06-15 21:14:57,667][1652475] InferenceWorker_p0-w0: resuming experience collection (38000 times) [2024-06-15 21:14:58,104][1652475] Updated weights for policy 0, policy_version 739632 (0.0011) [2024-06-15 21:14:59,756][1652475] Updated weights for policy 0, policy_version 739708 (0.0012) [2024-06-15 21:15:00,738][1648984] Fps is (10 sec: 52428.1, 60 sec: 43691.2, 300 sec: 42542.9). Total num frames: 1514930176. Throughput: 0: 10410.7. Samples: 378786304. Policy #0 lag: (min: 96.0, avg: 208.8, max: 320.0) [2024-06-15 21:15:00,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 21:15:04,852][1652475] Updated weights for policy 0, policy_version 739761 (0.0013) [2024-06-15 21:15:05,738][1648984] Fps is (10 sec: 42598.5, 60 sec: 40960.0, 300 sec: 42765.3). Total num frames: 1515094016. Throughput: 0: 10888.5. Samples: 378826240. Policy #0 lag: (min: 15.0, avg: 85.2, max: 207.0) [2024-06-15 21:15:05,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 21:15:06,224][1652475] Updated weights for policy 0, policy_version 739810 (0.0009) [2024-06-15 21:15:10,199][1652475] Updated weights for policy 0, policy_version 739873 (0.0012) [2024-06-15 21:15:10,738][1648984] Fps is (10 sec: 36044.6, 60 sec: 43144.5, 300 sec: 42209.6). 
Total num frames: 1515290624. Throughput: 0: 10661.0. Samples: 378890752. Policy #0 lag: (min: 15.0, avg: 85.2, max: 207.0) [2024-06-15 21:15:10,738][1648984] Avg episode reward: [(0, '-0.250')] [2024-06-15 21:15:11,889][1652475] Updated weights for policy 0, policy_version 739937 (0.0011) [2024-06-15 21:15:15,740][1648984] Fps is (10 sec: 36044.6, 60 sec: 42052.2, 300 sec: 42209.6). Total num frames: 1515454464. Throughput: 0: 10831.6. Samples: 378954240. Policy #0 lag: (min: 15.0, avg: 85.2, max: 207.0) [2024-06-15 21:15:15,741][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 21:15:16,602][1652475] Updated weights for policy 0, policy_version 740000 (0.0011) [2024-06-15 21:15:19,751][1652475] Updated weights for policy 0, policy_version 740080 (0.0110) [2024-06-15 21:15:20,738][1648984] Fps is (10 sec: 42598.8, 60 sec: 41506.1, 300 sec: 42653.9). Total num frames: 1515716608. Throughput: 0: 10843.0. Samples: 378986496. Policy #0 lag: (min: 15.0, avg: 85.2, max: 207.0) [2024-06-15 21:15:20,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 21:15:22,145][1652475] Updated weights for policy 0, policy_version 740153 (0.0014) [2024-06-15 21:15:23,371][1652475] Updated weights for policy 0, policy_version 740197 (0.0022) [2024-06-15 21:15:25,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 44236.7, 300 sec: 42320.7). Total num frames: 1515978752. Throughput: 0: 10638.2. Samples: 379042304. Policy #0 lag: (min: 15.0, avg: 85.2, max: 207.0) [2024-06-15 21:15:25,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 21:15:29,300][1652475] Updated weights for policy 0, policy_version 740272 (0.0015) [2024-06-15 21:15:30,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 41506.1, 300 sec: 42653.9). Total num frames: 1516109824. Throughput: 0: 10820.2. Samples: 379115008. 
Policy #0 lag: (min: 15.0, avg: 85.2, max: 207.0) [2024-06-15 21:15:30,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 21:15:31,721][1652475] Updated weights for policy 0, policy_version 740321 (0.0012) [2024-06-15 21:15:33,340][1652475] Updated weights for policy 0, policy_version 740385 (0.0097) [2024-06-15 21:15:34,821][1652475] Updated weights for policy 0, policy_version 740437 (0.0059) [2024-06-15 21:15:35,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 45875.2, 300 sec: 42653.9). Total num frames: 1516503040. Throughput: 0: 10717.8. Samples: 379142656. Policy #0 lag: (min: 15.0, avg: 85.2, max: 207.0) [2024-06-15 21:15:35,740][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 21:15:40,738][1648984] Fps is (10 sec: 42597.2, 60 sec: 42052.0, 300 sec: 42320.7). Total num frames: 1516535808. Throughput: 0: 10877.1. Samples: 379215872. Policy #0 lag: (min: 15.0, avg: 85.2, max: 207.0) [2024-06-15 21:15:40,739][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 21:15:40,931][1652475] Updated weights for policy 0, policy_version 740512 (0.0036) [2024-06-15 21:15:43,357][1651340] Signal inference workers to stop experience collection... (38050 times) [2024-06-15 21:15:43,455][1652475] Updated weights for policy 0, policy_version 740580 (0.0043) [2024-06-15 21:15:43,477][1652475] InferenceWorker_p0-w0: stopping experience collection (38050 times) [2024-06-15 21:15:43,643][1651340] Signal inference workers to resume experience collection... (38050 times) [2024-06-15 21:15:43,644][1652475] InferenceWorker_p0-w0: resuming experience collection (38050 times) [2024-06-15 21:15:44,929][1652475] Updated weights for policy 0, policy_version 740624 (0.0029) [2024-06-15 21:15:45,669][1652475] Updated weights for policy 0, policy_version 740672 (0.0011) [2024-06-15 21:15:45,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 43690.6, 300 sec: 42876.1). Total num frames: 1516896256. Throughput: 0: 10888.5. Samples: 379276288. 
Policy #0 lag: (min: 15.0, avg: 85.2, max: 207.0) [2024-06-15 21:15:45,739][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 21:15:50,738][1648984] Fps is (10 sec: 49153.7, 60 sec: 43690.6, 300 sec: 42209.7). Total num frames: 1517027328. Throughput: 0: 10820.3. Samples: 379313152. Policy #0 lag: (min: 15.0, avg: 85.2, max: 207.0) [2024-06-15 21:15:50,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 21:15:51,654][1652475] Updated weights for policy 0, policy_version 740740 (0.0012) [2024-06-15 21:15:54,757][1652475] Updated weights for policy 0, policy_version 740817 (0.0013) [2024-06-15 21:15:55,740][1648984] Fps is (10 sec: 39313.4, 60 sec: 43689.1, 300 sec: 43097.9). Total num frames: 1517289472. Throughput: 0: 10888.0. Samples: 379380736. Policy #0 lag: (min: 15.0, avg: 85.2, max: 207.0) [2024-06-15 21:15:55,740][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 21:15:57,368][1652475] Updated weights for policy 0, policy_version 740898 (0.0011) [2024-06-15 21:15:59,141][1652475] Updated weights for policy 0, policy_version 740976 (0.0011) [2024-06-15 21:16:00,740][1648984] Fps is (10 sec: 52428.5, 60 sec: 43690.7, 300 sec: 42765.0). Total num frames: 1517551616. Throughput: 0: 10786.1. Samples: 379439616. Policy #0 lag: (min: 15.0, avg: 85.2, max: 207.0) [2024-06-15 21:16:00,742][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 21:16:05,026][1652475] Updated weights for policy 0, policy_version 741025 (0.0012) [2024-06-15 21:16:05,754][1648984] Fps is (10 sec: 39266.5, 60 sec: 43132.9, 300 sec: 42651.6). Total num frames: 1517682688. Throughput: 0: 10918.7. Samples: 379478016. 
Policy #0 lag: (min: 15.0, avg: 85.2, max: 207.0) [2024-06-15 21:16:05,754][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 21:16:06,472][1652475] Updated weights for policy 0, policy_version 741073 (0.0012) [2024-06-15 21:16:09,800][1652475] Updated weights for policy 0, policy_version 741122 (0.0013) [2024-06-15 21:16:10,738][1648984] Fps is (10 sec: 36044.8, 60 sec: 43690.7, 300 sec: 42765.0). Total num frames: 1517912064. Throughput: 0: 11002.3. Samples: 379537408. Policy #0 lag: (min: 15.0, avg: 85.2, max: 207.0) [2024-06-15 21:16:10,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 21:16:11,414][1652475] Updated weights for policy 0, policy_version 741200 (0.0082) [2024-06-15 21:16:12,604][1652475] Updated weights for policy 0, policy_version 741248 (0.0013) [2024-06-15 21:16:15,738][1648984] Fps is (10 sec: 39385.4, 60 sec: 43690.7, 300 sec: 42209.6). Total num frames: 1518075904. Throughput: 0: 10820.3. Samples: 379601920. Policy #0 lag: (min: 15.0, avg: 85.2, max: 207.0) [2024-06-15 21:16:15,740][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 21:16:18,623][1652475] Updated weights for policy 0, policy_version 741328 (0.0037) [2024-06-15 21:16:20,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 43690.6, 300 sec: 42987.2). Total num frames: 1518338048. Throughput: 0: 10888.5. Samples: 379632640. Policy #0 lag: (min: 15.0, avg: 85.2, max: 207.0) [2024-06-15 21:16:20,739][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 21:16:22,109][1652475] Updated weights for policy 0, policy_version 741414 (0.0012) [2024-06-15 21:16:24,806][1652475] Updated weights for policy 0, policy_version 741488 (0.0094) [2024-06-15 21:16:25,738][1648984] Fps is (10 sec: 52429.1, 60 sec: 43690.8, 300 sec: 42654.0). Total num frames: 1518600192. Throughput: 0: 10649.7. Samples: 379695104. 
Policy #0 lag: (min: 15.0, avg: 85.2, max: 207.0) [2024-06-15 21:16:25,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 21:16:29,527][1652475] Updated weights for policy 0, policy_version 741552 (0.0013) [2024-06-15 21:16:30,738][1648984] Fps is (10 sec: 39322.0, 60 sec: 43690.7, 300 sec: 42653.9). Total num frames: 1518731264. Throughput: 0: 10774.8. Samples: 379761152. Policy #0 lag: (min: 15.0, avg: 85.2, max: 207.0) [2024-06-15 21:16:30,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 21:16:32,019][1651340] Signal inference workers to stop experience collection... (38100 times) [2024-06-15 21:16:32,064][1652475] InferenceWorker_p0-w0: stopping experience collection (38100 times) [2024-06-15 21:16:32,273][1651340] Signal inference workers to resume experience collection... (38100 times) [2024-06-15 21:16:32,274][1652475] InferenceWorker_p0-w0: resuming experience collection (38100 times) [2024-06-15 21:16:32,890][1652475] Updated weights for policy 0, policy_version 741625 (0.0012) [2024-06-15 21:16:35,738][1648984] Fps is (10 sec: 39320.9, 60 sec: 41506.1, 300 sec: 42653.9). Total num frames: 1518993408. Throughput: 0: 10604.1. Samples: 379790336. Policy #0 lag: (min: 15.0, avg: 85.2, max: 207.0) [2024-06-15 21:16:35,739][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 21:16:36,056][1652475] Updated weights for policy 0, policy_version 741697 (0.0012) [2024-06-15 21:16:40,738][1648984] Fps is (10 sec: 39321.3, 60 sec: 43144.7, 300 sec: 42653.9). Total num frames: 1519124480. Throughput: 0: 10468.0. Samples: 379851776. Policy #0 lag: (min: 15.0, avg: 85.2, max: 207.0) [2024-06-15 21:16:40,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 21:16:40,989][1652475] Updated weights for policy 0, policy_version 741763 (0.0011) [2024-06-15 21:16:44,498][1652475] Updated weights for policy 0, policy_version 741825 (0.0013) [2024-06-15 21:16:45,738][1648984] Fps is (10 sec: 39321.9, 60 sec: 41506.2, 300 sec: 42542.9). 
Total num frames: 1519386624. Throughput: 0: 10729.3. Samples: 379922432. Policy #0 lag: (min: 15.0, avg: 85.2, max: 207.0) [2024-06-15 21:16:45,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 21:16:46,072][1652475] Updated weights for policy 0, policy_version 741904 (0.0013) [2024-06-15 21:16:47,426][1652475] Updated weights for policy 0, policy_version 741952 (0.0013) [2024-06-15 21:16:49,300][1652475] Updated weights for policy 0, policy_version 742007 (0.0012) [2024-06-15 21:16:50,738][1648984] Fps is (10 sec: 52429.3, 60 sec: 43690.7, 300 sec: 42876.1). Total num frames: 1519648768. Throughput: 0: 10494.1. Samples: 379950080. Policy #0 lag: (min: 127.0, avg: 202.1, max: 383.0) [2024-06-15 21:16:50,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 21:16:53,465][1652475] Updated weights for policy 0, policy_version 742064 (0.0012) [2024-06-15 21:16:55,738][1648984] Fps is (10 sec: 39320.8, 60 sec: 41507.5, 300 sec: 42765.0). Total num frames: 1519779840. Throughput: 0: 10763.3. Samples: 380021760. Policy #0 lag: (min: 127.0, avg: 202.1, max: 383.0) [2024-06-15 21:16:55,739][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 21:16:55,744][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000742080_1519779840.pth... [2024-06-15 21:16:55,795][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000737152_1509687296.pth [2024-06-15 21:16:57,675][1652475] Updated weights for policy 0, policy_version 742114 (0.0017) [2024-06-15 21:17:00,238][1652475] Updated weights for policy 0, policy_version 742210 (0.0012) [2024-06-15 21:17:00,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 42052.3, 300 sec: 42765.0). Total num frames: 1520074752. Throughput: 0: 10570.0. Samples: 380077568. 
Policy #0 lag: (min: 127.0, avg: 202.1, max: 383.0) [2024-06-15 21:17:00,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 21:17:04,713][1652475] Updated weights for policy 0, policy_version 742275 (0.0013) [2024-06-15 21:17:05,738][1648984] Fps is (10 sec: 52430.3, 60 sec: 43702.5, 300 sec: 43103.1). Total num frames: 1520304128. Throughput: 0: 10558.6. Samples: 380107776. Policy #0 lag: (min: 127.0, avg: 202.1, max: 383.0) [2024-06-15 21:17:05,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 21:17:08,686][1652475] Updated weights for policy 0, policy_version 742337 (0.0099) [2024-06-15 21:17:09,639][1652475] Updated weights for policy 0, policy_version 742396 (0.0123) [2024-06-15 21:17:10,738][1648984] Fps is (10 sec: 39320.9, 60 sec: 42598.3, 300 sec: 42653.9). Total num frames: 1520467968. Throughput: 0: 10729.2. Samples: 380177920. Policy #0 lag: (min: 127.0, avg: 202.1, max: 383.0) [2024-06-15 21:17:10,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 21:17:11,963][1652475] Updated weights for policy 0, policy_version 742460 (0.0053) [2024-06-15 21:17:14,556][1652475] Updated weights for policy 0, policy_version 742521 (0.0013) [2024-06-15 21:17:15,738][1648984] Fps is (10 sec: 39321.2, 60 sec: 43690.6, 300 sec: 42765.0). Total num frames: 1520697344. Throughput: 0: 10604.1. Samples: 380238336. Policy #0 lag: (min: 127.0, avg: 202.1, max: 383.0) [2024-06-15 21:17:15,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 21:17:17,975][1651340] Signal inference workers to stop experience collection... (38150 times) [2024-06-15 21:17:18,088][1652475] InferenceWorker_p0-w0: stopping experience collection (38150 times) [2024-06-15 21:17:18,239][1651340] Signal inference workers to resume experience collection... 
(38150 times) [2024-06-15 21:17:18,241][1652475] InferenceWorker_p0-w0: resuming experience collection (38150 times) [2024-06-15 21:17:18,433][1652475] Updated weights for policy 0, policy_version 742588 (0.0012) [2024-06-15 21:17:20,738][1648984] Fps is (10 sec: 42597.5, 60 sec: 42598.2, 300 sec: 42765.0). Total num frames: 1520893952. Throughput: 0: 10717.8. Samples: 380272640. Policy #0 lag: (min: 127.0, avg: 202.1, max: 383.0) [2024-06-15 21:17:20,739][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 21:17:20,969][1652475] Updated weights for policy 0, policy_version 742656 (0.0091) [2024-06-15 21:17:23,063][1652475] Updated weights for policy 0, policy_version 742712 (0.0015) [2024-06-15 21:17:25,744][1648984] Fps is (10 sec: 39296.0, 60 sec: 41501.6, 300 sec: 42653.0). Total num frames: 1521090560. Throughput: 0: 10830.1. Samples: 380339200. Policy #0 lag: (min: 127.0, avg: 202.1, max: 383.0) [2024-06-15 21:17:25,745][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 21:17:26,647][1652475] Updated weights for policy 0, policy_version 742768 (0.0012) [2024-06-15 21:17:30,738][1648984] Fps is (10 sec: 32769.1, 60 sec: 41506.1, 300 sec: 42653.9). Total num frames: 1521221632. Throughput: 0: 10797.5. Samples: 380408320. Policy #0 lag: (min: 127.0, avg: 202.1, max: 383.0) [2024-06-15 21:17:30,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 21:17:32,053][1652475] Updated weights for policy 0, policy_version 742848 (0.0013) [2024-06-15 21:17:33,957][1652475] Updated weights for policy 0, policy_version 742931 (0.0135) [2024-06-15 21:17:35,738][1648984] Fps is (10 sec: 52462.9, 60 sec: 43690.7, 300 sec: 43098.3). Total num frames: 1521614848. Throughput: 0: 10763.4. Samples: 380434432. 
Policy #0 lag: (min: 127.0, avg: 202.1, max: 383.0) [2024-06-15 21:17:35,739][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 21:17:38,820][1652475] Updated weights for policy 0, policy_version 743010 (0.0011) [2024-06-15 21:17:40,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 43690.6, 300 sec: 42653.9). Total num frames: 1521745920. Throughput: 0: 10649.6. Samples: 380500992. Policy #0 lag: (min: 127.0, avg: 202.1, max: 383.0) [2024-06-15 21:17:40,738][1648984] Avg episode reward: [(0, '-0.530')] [2024-06-15 21:17:44,524][1652475] Updated weights for policy 0, policy_version 743120 (0.0013) [2024-06-15 21:17:45,738][1648984] Fps is (10 sec: 39321.9, 60 sec: 43690.7, 300 sec: 43098.4). Total num frames: 1522008064. Throughput: 0: 10752.0. Samples: 380561408. Policy #0 lag: (min: 127.0, avg: 202.1, max: 383.0) [2024-06-15 21:17:45,738][1648984] Avg episode reward: [(0, '-0.530')] [2024-06-15 21:17:46,152][1652475] Updated weights for policy 0, policy_version 743188 (0.0013) [2024-06-15 21:17:50,675][1652475] Updated weights for policy 0, policy_version 743234 (0.0026) [2024-06-15 21:17:50,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 41506.1, 300 sec: 42210.3). Total num frames: 1522139136. Throughput: 0: 10843.0. Samples: 380595712. Policy #0 lag: (min: 127.0, avg: 202.1, max: 383.0) [2024-06-15 21:17:50,738][1648984] Avg episode reward: [(0, '-0.530')] [2024-06-15 21:17:55,257][1652475] Updated weights for policy 0, policy_version 743302 (0.0014) [2024-06-15 21:17:55,738][1648984] Fps is (10 sec: 32767.6, 60 sec: 42598.5, 300 sec: 42431.8). Total num frames: 1522335744. Throughput: 0: 10888.6. Samples: 380667904. 
Policy #0 lag: (min: 127.0, avg: 202.1, max: 383.0) [2024-06-15 21:17:55,738][1648984] Avg episode reward: [(0, '-0.530')] [2024-06-15 21:17:57,277][1652475] Updated weights for policy 0, policy_version 743392 (0.0012) [2024-06-15 21:17:59,190][1652475] Updated weights for policy 0, policy_version 743460 (0.0020) [2024-06-15 21:18:00,738][1648984] Fps is (10 sec: 52427.4, 60 sec: 43144.3, 300 sec: 42765.0). Total num frames: 1522663424. Throughput: 0: 10786.1. Samples: 380723712. Policy #0 lag: (min: 127.0, avg: 202.1, max: 383.0) [2024-06-15 21:18:00,739][1648984] Avg episode reward: [(0, '-0.530')] [2024-06-15 21:18:03,163][1652475] Updated weights for policy 0, policy_version 743496 (0.0012) [2024-06-15 21:18:03,386][1651340] Signal inference workers to stop experience collection... (38200 times) [2024-06-15 21:18:03,435][1652475] InferenceWorker_p0-w0: stopping experience collection (38200 times) [2024-06-15 21:18:03,671][1651340] Signal inference workers to resume experience collection... (38200 times) [2024-06-15 21:18:03,672][1652475] InferenceWorker_p0-w0: resuming experience collection (38200 times) [2024-06-15 21:18:04,162][1652475] Updated weights for policy 0, policy_version 743543 (0.0034) [2024-06-15 21:18:05,738][1648984] Fps is (10 sec: 45875.5, 60 sec: 41506.1, 300 sec: 42653.9). Total num frames: 1522794496. Throughput: 0: 10922.8. Samples: 380764160. Policy #0 lag: (min: 127.0, avg: 202.1, max: 383.0) [2024-06-15 21:18:05,738][1648984] Avg episode reward: [(0, '-0.530')] [2024-06-15 21:18:08,553][1652475] Updated weights for policy 0, policy_version 743632 (0.0013) [2024-06-15 21:18:10,738][1648984] Fps is (10 sec: 42599.9, 60 sec: 43690.8, 300 sec: 43098.3). Total num frames: 1523089408. Throughput: 0: 10810.5. Samples: 380825600. 
Policy #0 lag: (min: 127.0, avg: 202.1, max: 383.0) [2024-06-15 21:18:10,738][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 21:18:10,844][1652475] Updated weights for policy 0, policy_version 743712 (0.0013) [2024-06-15 21:18:15,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 42052.3, 300 sec: 42320.7). Total num frames: 1523220480. Throughput: 0: 10729.3. Samples: 380891136. Policy #0 lag: (min: 127.0, avg: 202.1, max: 383.0) [2024-06-15 21:18:15,738][1648984] Avg episode reward: [(0, '-0.560')] [2024-06-15 21:18:16,149][1652475] Updated weights for policy 0, policy_version 743779 (0.0013) [2024-06-15 21:18:19,997][1652475] Updated weights for policy 0, policy_version 743827 (0.0013) [2024-06-15 21:18:20,738][1648984] Fps is (10 sec: 32767.9, 60 sec: 42052.5, 300 sec: 42542.9). Total num frames: 1523417088. Throughput: 0: 10797.5. Samples: 380920320. Policy #0 lag: (min: 127.0, avg: 202.1, max: 383.0) [2024-06-15 21:18:20,738][1648984] Avg episode reward: [(0, '-0.540')] [2024-06-15 21:18:22,293][1652475] Updated weights for policy 0, policy_version 743907 (0.0011) [2024-06-15 21:18:24,387][1652475] Updated weights for policy 0, policy_version 743984 (0.0012) [2024-06-15 21:18:25,738][1648984] Fps is (10 sec: 49151.8, 60 sec: 43695.4, 300 sec: 42655.9). Total num frames: 1523712000. Throughput: 0: 10353.8. Samples: 380966912. Policy #0 lag: (min: 127.0, avg: 202.1, max: 383.0) [2024-06-15 21:18:25,738][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 21:18:29,022][1652475] Updated weights for policy 0, policy_version 744032 (0.0012) [2024-06-15 21:18:30,742][1648984] Fps is (10 sec: 42578.7, 60 sec: 43687.3, 300 sec: 42653.3). Total num frames: 1523843072. Throughput: 0: 10682.6. Samples: 381042176. 
Policy #0 lag: (min: 127.0, avg: 202.1, max: 383.0) [2024-06-15 21:18:30,743][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 21:18:32,931][1652475] Updated weights for policy 0, policy_version 744112 (0.0022) [2024-06-15 21:18:34,971][1652475] Updated weights for policy 0, policy_version 744190 (0.0010) [2024-06-15 21:18:35,738][1648984] Fps is (10 sec: 39321.9, 60 sec: 41506.2, 300 sec: 42765.0). Total num frames: 1524105216. Throughput: 0: 10638.2. Samples: 381074432. Policy #0 lag: (min: 13.0, avg: 96.0, max: 269.0) [2024-06-15 21:18:35,740][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 21:18:37,479][1652475] Updated weights for policy 0, policy_version 744252 (0.0120) [2024-06-15 21:18:40,738][1648984] Fps is (10 sec: 49173.8, 60 sec: 43144.4, 300 sec: 42542.8). Total num frames: 1524334592. Throughput: 0: 10422.0. Samples: 381136896. Policy #0 lag: (min: 13.0, avg: 96.0, max: 269.0) [2024-06-15 21:18:40,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 21:18:43,702][1652475] Updated weights for policy 0, policy_version 744322 (0.0093) [2024-06-15 21:18:45,043][1652475] Updated weights for policy 0, policy_version 744375 (0.0011) [2024-06-15 21:18:45,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 41506.1, 300 sec: 42653.9). Total num frames: 1524498432. Throughput: 0: 10535.9. Samples: 381197824. Policy #0 lag: (min: 13.0, avg: 96.0, max: 269.0) [2024-06-15 21:18:45,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 21:18:46,897][1652475] Updated weights for policy 0, policy_version 744416 (0.0012) [2024-06-15 21:18:48,192][1652475] Updated weights for policy 0, policy_version 744450 (0.0025) [2024-06-15 21:18:48,454][1651340] Signal inference workers to stop experience collection... (38250 times) [2024-06-15 21:18:48,508][1652475] InferenceWorker_p0-w0: stopping experience collection (38250 times) [2024-06-15 21:18:48,685][1651340] Signal inference workers to resume experience collection... 
(38250 times) [2024-06-15 21:18:48,688][1652475] InferenceWorker_p0-w0: resuming experience collection (38250 times) [2024-06-15 21:18:50,738][1648984] Fps is (10 sec: 42599.5, 60 sec: 43690.7, 300 sec: 42654.0). Total num frames: 1524760576. Throughput: 0: 10444.8. Samples: 381234176. Policy #0 lag: (min: 13.0, avg: 96.0, max: 269.0) [2024-06-15 21:18:50,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 21:18:51,611][1652475] Updated weights for policy 0, policy_version 744536 (0.0016) [2024-06-15 21:18:55,013][1652475] Updated weights for policy 0, policy_version 744597 (0.0012) [2024-06-15 21:18:55,738][1648984] Fps is (10 sec: 49151.8, 60 sec: 44236.8, 300 sec: 42987.3). Total num frames: 1524989952. Throughput: 0: 10558.6. Samples: 381300736. Policy #0 lag: (min: 13.0, avg: 96.0, max: 269.0) [2024-06-15 21:18:55,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 21:18:55,863][1652475] Updated weights for policy 0, policy_version 744638 (0.0010) [2024-06-15 21:18:55,891][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000744640_1525022720.pth... [2024-06-15 21:18:55,965][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000739584_1514668032.pth [2024-06-15 21:18:59,813][1652475] Updated weights for policy 0, policy_version 744695 (0.0013) [2024-06-15 21:19:00,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 42052.5, 300 sec: 42542.9). Total num frames: 1525186560. Throughput: 0: 10683.7. Samples: 381371904. Policy #0 lag: (min: 13.0, avg: 96.0, max: 269.0) [2024-06-15 21:19:00,740][1648984] Avg episode reward: [(0, '-0.250')] [2024-06-15 21:19:00,910][1652475] Updated weights for policy 0, policy_version 744724 (0.0033) [2024-06-15 21:19:03,203][1652475] Updated weights for policy 0, policy_version 744800 (0.0011) [2024-06-15 21:19:05,738][1648984] Fps is (10 sec: 42598.2, 60 sec: 43690.6, 300 sec: 43098.2). Total num frames: 1525415936. Throughput: 0: 10672.3. 
Samples: 381400576. Policy #0 lag: (min: 13.0, avg: 96.0, max: 269.0) [2024-06-15 21:19:05,739][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 21:19:06,257][1652475] Updated weights for policy 0, policy_version 744835 (0.0014) [2024-06-15 21:19:10,578][1652475] Updated weights for policy 0, policy_version 744912 (0.0014) [2024-06-15 21:19:10,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 41506.1, 300 sec: 42876.1). Total num frames: 1525579776. Throughput: 0: 11161.6. Samples: 381469184. Policy #0 lag: (min: 13.0, avg: 96.0, max: 269.0) [2024-06-15 21:19:10,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 21:19:11,550][1652475] Updated weights for policy 0, policy_version 744960 (0.0012) [2024-06-15 21:19:13,768][1652475] Updated weights for policy 0, policy_version 745015 (0.0013) [2024-06-15 21:19:15,726][1652475] Updated weights for policy 0, policy_version 745087 (0.0012) [2024-06-15 21:19:15,738][1648984] Fps is (10 sec: 52429.5, 60 sec: 45329.1, 300 sec: 43098.3). Total num frames: 1525940224. Throughput: 0: 10923.8. Samples: 381533696. Policy #0 lag: (min: 13.0, avg: 96.0, max: 269.0) [2024-06-15 21:19:15,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 21:19:20,738][1648984] Fps is (10 sec: 49152.0, 60 sec: 44236.8, 300 sec: 43209.3). Total num frames: 1526071296. Throughput: 0: 10843.0. Samples: 381562368. Policy #0 lag: (min: 13.0, avg: 96.0, max: 269.0) [2024-06-15 21:19:20,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 21:19:21,996][1652475] Updated weights for policy 0, policy_version 745155 (0.0014) [2024-06-15 21:19:23,253][1652475] Updated weights for policy 0, policy_version 745215 (0.0012) [2024-06-15 21:19:25,640][1652475] Updated weights for policy 0, policy_version 745250 (0.0019) [2024-06-15 21:19:25,738][1648984] Fps is (10 sec: 32767.2, 60 sec: 42598.3, 300 sec: 42876.1). Total num frames: 1526267904. Throughput: 0: 10979.6. Samples: 381630976. 
Policy #0 lag: (min: 13.0, avg: 96.0, max: 269.0) [2024-06-15 21:19:25,739][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 21:19:29,955][1652475] Updated weights for policy 0, policy_version 745344 (0.0013) [2024-06-15 21:19:30,747][1648984] Fps is (10 sec: 42556.7, 60 sec: 44233.0, 300 sec: 43207.9). Total num frames: 1526497280. Throughput: 0: 10965.8. Samples: 381691392. Policy #0 lag: (min: 13.0, avg: 96.0, max: 269.0) [2024-06-15 21:19:30,748][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 21:19:31,345][1652475] Updated weights for policy 0, policy_version 745402 (0.0027) [2024-06-15 21:19:34,377][1652475] Updated weights for policy 0, policy_version 745456 (0.0010) [2024-06-15 21:19:35,738][1648984] Fps is (10 sec: 45875.9, 60 sec: 43690.6, 300 sec: 43098.2). Total num frames: 1526726656. Throughput: 0: 10934.0. Samples: 381726208. Policy #0 lag: (min: 13.0, avg: 96.0, max: 269.0) [2024-06-15 21:19:35,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 21:19:36,838][1651340] Signal inference workers to stop experience collection... (38300 times) [2024-06-15 21:19:36,901][1652475] InferenceWorker_p0-w0: stopping experience collection (38300 times) [2024-06-15 21:19:37,025][1651340] Signal inference workers to resume experience collection... (38300 times) [2024-06-15 21:19:37,026][1652475] InferenceWorker_p0-w0: resuming experience collection (38300 times) [2024-06-15 21:19:37,528][1652475] Updated weights for policy 0, policy_version 745522 (0.0014) [2024-06-15 21:19:40,738][1648984] Fps is (10 sec: 36080.4, 60 sec: 42052.5, 300 sec: 42653.9). Total num frames: 1526857728. Throughput: 0: 10956.8. Samples: 381793792. 
Policy #0 lag: (min: 13.0, avg: 96.0, max: 269.0) [2024-06-15 21:19:40,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 21:19:42,534][1652475] Updated weights for policy 0, policy_version 745568 (0.0012) [2024-06-15 21:19:44,800][1652475] Updated weights for policy 0, policy_version 745655 (0.0010) [2024-06-15 21:19:45,738][1648984] Fps is (10 sec: 42598.8, 60 sec: 44236.8, 300 sec: 43209.3). Total num frames: 1527152640. Throughput: 0: 10695.1. Samples: 381853184. Policy #0 lag: (min: 13.0, avg: 96.0, max: 269.0) [2024-06-15 21:19:45,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 21:19:45,861][1652475] Updated weights for policy 0, policy_version 745696 (0.0010) [2024-06-15 21:19:48,753][1652475] Updated weights for policy 0, policy_version 745767 (0.0013) [2024-06-15 21:19:50,738][1648984] Fps is (10 sec: 52428.3, 60 sec: 43690.6, 300 sec: 43098.2). Total num frames: 1527382016. Throughput: 0: 10695.1. Samples: 381881856. Policy #0 lag: (min: 13.0, avg: 96.0, max: 269.0) [2024-06-15 21:19:50,738][1648984] Avg episode reward: [(0, '-0.520')] [2024-06-15 21:19:54,349][1652475] Updated weights for policy 0, policy_version 745796 (0.0014) [2024-06-15 21:19:55,738][1648984] Fps is (10 sec: 36044.5, 60 sec: 42052.3, 300 sec: 42653.9). Total num frames: 1527513088. Throughput: 0: 10843.0. Samples: 381957120. Policy #0 lag: (min: 13.0, avg: 96.0, max: 269.0) [2024-06-15 21:19:55,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 21:19:56,242][1652475] Updated weights for policy 0, policy_version 745872 (0.0012) [2024-06-15 21:19:57,442][1652475] Updated weights for policy 0, policy_version 745919 (0.0012) [2024-06-15 21:19:58,761][1652475] Updated weights for policy 0, policy_version 745984 (0.0012) [2024-06-15 21:20:00,738][1648984] Fps is (10 sec: 45875.4, 60 sec: 44236.8, 300 sec: 43209.3). Total num frames: 1527840768. Throughput: 0: 10626.8. Samples: 382011904. 
Policy #0 lag: (min: 13.0, avg: 96.0, max: 269.0) [2024-06-15 21:20:00,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 21:20:01,112][1652475] Updated weights for policy 0, policy_version 746041 (0.0014) [2024-06-15 21:20:05,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 41506.2, 300 sec: 42765.0). Total num frames: 1527906304. Throughput: 0: 10854.4. Samples: 382050816. Policy #0 lag: (min: 13.0, avg: 96.0, max: 269.0) [2024-06-15 21:20:05,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 21:20:06,860][1652475] Updated weights for policy 0, policy_version 746096 (0.0014) [2024-06-15 21:20:08,555][1652475] Updated weights for policy 0, policy_version 746176 (0.0012) [2024-06-15 21:20:10,540][1652475] Updated weights for policy 0, policy_version 746237 (0.0016) [2024-06-15 21:20:10,740][1648984] Fps is (10 sec: 45874.6, 60 sec: 45329.0, 300 sec: 43542.6). Total num frames: 1528299520. Throughput: 0: 10854.4. Samples: 382119424. Policy #0 lag: (min: 13.0, avg: 96.0, max: 269.0) [2024-06-15 21:20:10,741][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 21:20:12,398][1652475] Updated weights for policy 0, policy_version 746294 (0.0015) [2024-06-15 21:20:15,738][1648984] Fps is (10 sec: 52429.2, 60 sec: 41506.1, 300 sec: 43098.3). Total num frames: 1528430592. Throughput: 0: 11175.4. Samples: 382194176. Policy #0 lag: (min: 13.0, avg: 96.0, max: 269.0) [2024-06-15 21:20:15,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 21:20:17,630][1652475] Updated weights for policy 0, policy_version 746323 (0.0013) [2024-06-15 21:20:19,588][1652475] Updated weights for policy 0, policy_version 746416 (0.0201) [2024-06-15 21:20:20,060][1651340] Signal inference workers to stop experience collection... (38350 times) [2024-06-15 21:20:20,113][1652475] InferenceWorker_p0-w0: stopping experience collection (38350 times) [2024-06-15 21:20:20,307][1651340] Signal inference workers to resume experience collection... 
(38350 times) [2024-06-15 21:20:20,308][1652475] InferenceWorker_p0-w0: resuming experience collection (38350 times) [2024-06-15 21:20:20,738][1648984] Fps is (10 sec: 45875.7, 60 sec: 44782.9, 300 sec: 43320.4). Total num frames: 1528758272. Throughput: 0: 11195.7. Samples: 382230016. Policy #0 lag: (min: 8.0, avg: 57.3, max: 228.0) [2024-06-15 21:20:20,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 21:20:20,875][1652475] Updated weights for policy 0, policy_version 746467 (0.0119) [2024-06-15 21:20:23,361][1652475] Updated weights for policy 0, policy_version 746549 (0.0014) [2024-06-15 21:20:25,738][1648984] Fps is (10 sec: 52428.2, 60 sec: 44783.0, 300 sec: 43542.6). Total num frames: 1528954880. Throughput: 0: 10956.8. Samples: 382286848. Policy #0 lag: (min: 8.0, avg: 57.3, max: 228.0) [2024-06-15 21:20:25,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 21:20:29,276][1652475] Updated weights for policy 0, policy_version 746581 (0.0011) [2024-06-15 21:20:30,738][1648984] Fps is (10 sec: 36044.8, 60 sec: 43697.8, 300 sec: 42765.0). Total num frames: 1529118720. Throughput: 0: 11389.1. Samples: 382365696. Policy #0 lag: (min: 8.0, avg: 57.3, max: 228.0) [2024-06-15 21:20:30,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 21:20:31,295][1652475] Updated weights for policy 0, policy_version 746672 (0.0102) [2024-06-15 21:20:32,886][1652475] Updated weights for policy 0, policy_version 746744 (0.0012) [2024-06-15 21:20:34,858][1652475] Updated weights for policy 0, policy_version 746800 (0.0011) [2024-06-15 21:20:35,738][1648984] Fps is (10 sec: 52427.3, 60 sec: 45875.0, 300 sec: 43875.8). Total num frames: 1529479168. Throughput: 0: 11309.4. Samples: 382390784. Policy #0 lag: (min: 8.0, avg: 57.3, max: 228.0) [2024-06-15 21:20:35,739][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 21:20:40,738][1648984] Fps is (10 sec: 36044.7, 60 sec: 43690.6, 300 sec: 42653.9). Total num frames: 1529479168. 
Throughput: 0: 11286.8. Samples: 382465024. Policy #0 lag: (min: 8.0, avg: 57.3, max: 228.0) [2024-06-15 21:20:40,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 21:20:41,556][1652475] Updated weights for policy 0, policy_version 746864 (0.0012) [2024-06-15 21:20:44,052][1652475] Updated weights for policy 0, policy_version 746944 (0.0017) [2024-06-15 21:20:45,721][1652475] Updated weights for policy 0, policy_version 747004 (0.0015) [2024-06-15 21:20:45,778][1648984] Fps is (10 sec: 35900.7, 60 sec: 44752.7, 300 sec: 43425.5). Total num frames: 1529839616. Throughput: 0: 11253.9. Samples: 382518784. Policy #0 lag: (min: 8.0, avg: 57.3, max: 228.0) [2024-06-15 21:20:45,779][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 21:20:47,259][1652475] Updated weights for policy 0, policy_version 747066 (0.0011) [2024-06-15 21:20:50,746][1648984] Fps is (10 sec: 52384.4, 60 sec: 43684.5, 300 sec: 43097.3). Total num frames: 1530003456. Throughput: 0: 11148.1. Samples: 382552576. Policy #0 lag: (min: 8.0, avg: 57.3, max: 228.0) [2024-06-15 21:20:50,747][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 21:20:55,026][1652475] Updated weights for policy 0, policy_version 747156 (0.0015) [2024-06-15 21:20:55,738][1648984] Fps is (10 sec: 39481.5, 60 sec: 45329.1, 300 sec: 42987.2). Total num frames: 1530232832. Throughput: 0: 11047.8. Samples: 382616576. Policy #0 lag: (min: 8.0, avg: 57.3, max: 228.0) [2024-06-15 21:20:55,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 21:20:55,870][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000747200_1530265600.pth... 
[2024-06-15 21:20:55,927][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000742080_1519779840.pth [2024-06-15 21:20:58,235][1652475] Updated weights for policy 0, policy_version 747221 (0.0014) [2024-06-15 21:21:00,048][1652475] Updated weights for policy 0, policy_version 747299 (0.0014) [2024-06-15 21:21:00,738][1648984] Fps is (10 sec: 52473.6, 60 sec: 44782.9, 300 sec: 43545.0). Total num frames: 1530527744. Throughput: 0: 10820.3. Samples: 382681088. Policy #0 lag: (min: 8.0, avg: 57.3, max: 228.0) [2024-06-15 21:21:00,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 21:21:04,443][1652475] Updated weights for policy 0, policy_version 747345 (0.0011) [2024-06-15 21:21:05,738][1648984] Fps is (10 sec: 42598.0, 60 sec: 45875.1, 300 sec: 43209.3). Total num frames: 1530658816. Throughput: 0: 10774.7. Samples: 382714880. Policy #0 lag: (min: 8.0, avg: 57.3, max: 228.0) [2024-06-15 21:21:05,739][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 21:21:06,202][1651340] Signal inference workers to stop experience collection... (38400 times) [2024-06-15 21:21:06,266][1652475] InferenceWorker_p0-w0: stopping experience collection (38400 times) [2024-06-15 21:21:06,414][1651340] Signal inference workers to resume experience collection... (38400 times) [2024-06-15 21:21:06,415][1652475] InferenceWorker_p0-w0: resuming experience collection (38400 times) [2024-06-15 21:21:06,417][1652475] Updated weights for policy 0, policy_version 747408 (0.0011) [2024-06-15 21:21:10,222][1652475] Updated weights for policy 0, policy_version 747472 (0.0011) [2024-06-15 21:21:10,738][1648984] Fps is (10 sec: 32768.1, 60 sec: 42598.5, 300 sec: 43320.4). Total num frames: 1530855424. Throughput: 0: 11036.5. Samples: 382783488. 
Policy #0 lag: (min: 8.0, avg: 57.3, max: 228.0) [2024-06-15 21:21:10,740][1648984] Avg episode reward: [(0, '-0.250')] [2024-06-15 21:21:11,936][1652475] Updated weights for policy 0, policy_version 747536 (0.0092) [2024-06-15 21:21:12,903][1652475] Updated weights for policy 0, policy_version 747581 (0.0020) [2024-06-15 21:21:15,738][1648984] Fps is (10 sec: 45875.8, 60 sec: 44782.9, 300 sec: 43320.4). Total num frames: 1531117568. Throughput: 0: 10740.6. Samples: 382849024. Policy #0 lag: (min: 8.0, avg: 57.3, max: 228.0) [2024-06-15 21:21:15,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 21:21:16,299][1652475] Updated weights for policy 0, policy_version 747644 (0.0102) [2024-06-15 21:21:18,631][1652475] Updated weights for policy 0, policy_version 747696 (0.0045) [2024-06-15 21:21:20,738][1648984] Fps is (10 sec: 45875.0, 60 sec: 42598.4, 300 sec: 43098.2). Total num frames: 1531314176. Throughput: 0: 10911.4. Samples: 382881792. Policy #0 lag: (min: 8.0, avg: 57.3, max: 228.0) [2024-06-15 21:21:20,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 21:21:21,631][1652475] Updated weights for policy 0, policy_version 747728 (0.0011) [2024-06-15 21:21:22,690][1652475] Updated weights for policy 0, policy_version 747776 (0.0013) [2024-06-15 21:21:25,738][1648984] Fps is (10 sec: 45875.0, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 1531576320. Throughput: 0: 10740.6. Samples: 382948352. Policy #0 lag: (min: 8.0, avg: 57.3, max: 228.0) [2024-06-15 21:21:25,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 21:21:26,880][1652475] Updated weights for policy 0, policy_version 747856 (0.0010) [2024-06-15 21:21:30,082][1652475] Updated weights for policy 0, policy_version 747923 (0.0013) [2024-06-15 21:21:30,738][1648984] Fps is (10 sec: 49152.2, 60 sec: 44783.0, 300 sec: 43431.5). Total num frames: 1531805696. Throughput: 0: 11000.8. Samples: 383013376. 
Policy #0 lag: (min: 8.0, avg: 57.3, max: 228.0) [2024-06-15 21:21:30,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 21:21:30,976][1652475] Updated weights for policy 0, policy_version 747967 (0.0018) [2024-06-15 21:21:34,365][1652475] Updated weights for policy 0, policy_version 748024 (0.0013) [2024-06-15 21:21:35,738][1648984] Fps is (10 sec: 42598.5, 60 sec: 42052.5, 300 sec: 43653.7). Total num frames: 1532002304. Throughput: 0: 11106.8. Samples: 383052288. Policy #0 lag: (min: 8.0, avg: 57.3, max: 228.0) [2024-06-15 21:21:35,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 21:21:35,854][1652475] Updated weights for policy 0, policy_version 748064 (0.0014) [2024-06-15 21:21:40,609][1652475] Updated weights for policy 0, policy_version 748144 (0.0034) [2024-06-15 21:21:40,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 45329.1, 300 sec: 43431.5). Total num frames: 1532198912. Throughput: 0: 11093.3. Samples: 383115776. Policy #0 lag: (min: 8.0, avg: 57.3, max: 228.0) [2024-06-15 21:21:40,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 21:21:42,391][1652475] Updated weights for policy 0, policy_version 748199 (0.0012) [2024-06-15 21:21:45,459][1652475] Updated weights for policy 0, policy_version 748242 (0.0012) [2024-06-15 21:21:45,738][1648984] Fps is (10 sec: 42598.5, 60 sec: 43173.7, 300 sec: 43320.4). Total num frames: 1532428288. Throughput: 0: 11070.6. Samples: 383179264. Policy #0 lag: (min: 8.0, avg: 57.3, max: 228.0) [2024-06-15 21:21:45,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 21:21:47,618][1652475] Updated weights for policy 0, policy_version 748324 (0.0174) [2024-06-15 21:21:50,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 43696.8, 300 sec: 43542.6). Total num frames: 1532624896. Throughput: 0: 10911.3. Samples: 383205888. 
Policy #0 lag: (min: 8.0, avg: 57.3, max: 228.0) [2024-06-15 21:21:50,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 21:21:54,471][1652475] Updated weights for policy 0, policy_version 748407 (0.0141) [2024-06-15 21:21:54,883][1651340] Signal inference workers to stop experience collection... (38450 times) [2024-06-15 21:21:54,904][1652475] InferenceWorker_p0-w0: stopping experience collection (38450 times) [2024-06-15 21:21:55,136][1651340] Signal inference workers to resume experience collection... (38450 times) [2024-06-15 21:21:55,136][1652475] InferenceWorker_p0-w0: resuming experience collection (38450 times) [2024-06-15 21:21:55,738][1648984] Fps is (10 sec: 39320.5, 60 sec: 43144.4, 300 sec: 43209.3). Total num frames: 1532821504. Throughput: 0: 10854.3. Samples: 383271936. Policy #0 lag: (min: 8.0, avg: 57.3, max: 228.0) [2024-06-15 21:21:55,739][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 21:21:56,087][1652475] Updated weights for policy 0, policy_version 748469 (0.0012) [2024-06-15 21:21:57,838][1652475] Updated weights for policy 0, policy_version 748514 (0.0011) [2024-06-15 21:21:59,311][1652475] Updated weights for policy 0, policy_version 748580 (0.0010) [2024-06-15 21:22:00,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 43690.6, 300 sec: 43542.5). Total num frames: 1533149184. Throughput: 0: 10626.8. Samples: 383327232. Policy #0 lag: (min: 8.0, avg: 57.3, max: 228.0) [2024-06-15 21:22:00,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 21:22:05,738][1648984] Fps is (10 sec: 32768.5, 60 sec: 41506.2, 300 sec: 42987.2). Total num frames: 1533149184. Throughput: 0: 10695.1. Samples: 383363072. 
Policy #0 lag: (min: 8.0, avg: 57.3, max: 228.0) [2024-06-15 21:22:05,739][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 21:22:07,807][1652475] Updated weights for policy 0, policy_version 748641 (0.0013) [2024-06-15 21:22:10,031][1652475] Updated weights for policy 0, policy_version 748736 (0.0011) [2024-06-15 21:22:10,738][1648984] Fps is (10 sec: 29491.3, 60 sec: 43144.5, 300 sec: 43209.3). Total num frames: 1533444096. Throughput: 0: 10604.1. Samples: 383425536. Policy #0 lag: (min: 15.0, avg: 70.4, max: 271.0) [2024-06-15 21:22:10,741][1648984] Avg episode reward: [(0, '-0.190')] [2024-06-15 21:22:12,335][1652475] Updated weights for policy 0, policy_version 748816 (0.0011) [2024-06-15 21:22:15,786][1648984] Fps is (10 sec: 52177.3, 60 sec: 42564.1, 300 sec: 43313.4). Total num frames: 1533673472. Throughput: 0: 10229.0. Samples: 383474176. Policy #0 lag: (min: 15.0, avg: 70.4, max: 271.0) [2024-06-15 21:22:15,787][1648984] Avg episode reward: [(0, '-0.250')] [2024-06-15 21:22:20,505][1652475] Updated weights for policy 0, policy_version 748880 (0.0014) [2024-06-15 21:22:20,738][1648984] Fps is (10 sec: 26214.3, 60 sec: 39867.7, 300 sec: 42766.0). Total num frames: 1533706240. Throughput: 0: 10160.3. Samples: 383509504. Policy #0 lag: (min: 15.0, avg: 70.4, max: 271.0) [2024-06-15 21:22:20,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 21:22:22,249][1652475] Updated weights for policy 0, policy_version 748944 (0.0037) [2024-06-15 21:22:24,222][1652475] Updated weights for policy 0, policy_version 749008 (0.0010) [2024-06-15 21:22:25,738][1648984] Fps is (10 sec: 42805.6, 60 sec: 42052.4, 300 sec: 43653.7). Total num frames: 1534099456. Throughput: 0: 10137.6. Samples: 383571968. 
Policy #0 lag: (min: 15.0, avg: 70.4, max: 271.0) [2024-06-15 21:22:25,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 21:22:26,638][1652475] Updated weights for policy 0, policy_version 749112 (0.0014) [2024-06-15 21:22:30,738][1648984] Fps is (10 sec: 49152.3, 60 sec: 39867.7, 300 sec: 42653.9). Total num frames: 1534197760. Throughput: 0: 10092.1. Samples: 383633408. Policy #0 lag: (min: 15.0, avg: 70.4, max: 271.0) [2024-06-15 21:22:30,740][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 21:22:33,644][1652475] Updated weights for policy 0, policy_version 749173 (0.0012) [2024-06-15 21:22:35,738][1648984] Fps is (10 sec: 32767.8, 60 sec: 40413.9, 300 sec: 42987.2). Total num frames: 1534427136. Throughput: 0: 10467.6. Samples: 383676928. Policy #0 lag: (min: 15.0, avg: 70.4, max: 271.0) [2024-06-15 21:22:35,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 21:22:35,942][1652475] Updated weights for policy 0, policy_version 749250 (0.0013) [2024-06-15 21:22:36,943][1651340] Signal inference workers to stop experience collection... (38500 times) [2024-06-15 21:22:37,007][1652475] InferenceWorker_p0-w0: stopping experience collection (38500 times) [2024-06-15 21:22:37,140][1651340] Signal inference workers to resume experience collection... (38500 times) [2024-06-15 21:22:37,141][1652475] InferenceWorker_p0-w0: resuming experience collection (38500 times) [2024-06-15 21:22:37,600][1652475] Updated weights for policy 0, policy_version 749333 (0.0014) [2024-06-15 21:22:40,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 42052.2, 300 sec: 43098.2). Total num frames: 1534722048. Throughput: 0: 10126.3. Samples: 383727616. Policy #0 lag: (min: 15.0, avg: 70.4, max: 271.0) [2024-06-15 21:22:40,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 21:22:44,998][1652475] Updated weights for policy 0, policy_version 749382 (0.0014) [2024-06-15 21:22:45,738][1648984] Fps is (10 sec: 36044.9, 60 sec: 39321.6, 300 sec: 42876.1). 
Total num frames: 1534787584. Throughput: 0: 10661.0. Samples: 383806976. Policy #0 lag: (min: 15.0, avg: 70.4, max: 271.0) [2024-06-15 21:22:45,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 21:22:47,338][1652475] Updated weights for policy 0, policy_version 749459 (0.0193) [2024-06-15 21:22:49,112][1652475] Updated weights for policy 0, policy_version 749539 (0.0104) [2024-06-15 21:22:50,399][1652475] Updated weights for policy 0, policy_version 749601 (0.0011) [2024-06-15 21:22:50,738][1648984] Fps is (10 sec: 49151.9, 60 sec: 43144.5, 300 sec: 43653.6). Total num frames: 1535213568. Throughput: 0: 10285.5. Samples: 383825920. Policy #0 lag: (min: 15.0, avg: 70.4, max: 271.0) [2024-06-15 21:22:50,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 21:22:55,738][1648984] Fps is (10 sec: 45873.0, 60 sec: 40413.8, 300 sec: 42653.9). Total num frames: 1535246336. Throughput: 0: 10467.5. Samples: 383896576. Policy #0 lag: (min: 15.0, avg: 70.4, max: 271.0) [2024-06-15 21:22:55,739][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 21:22:55,745][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000749632_1535246336.pth... [2024-06-15 21:22:55,816][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000744640_1525022720.pth [2024-06-15 21:22:58,015][1652475] Updated weights for policy 0, policy_version 749648 (0.0014) [2024-06-15 21:22:59,828][1652475] Updated weights for policy 0, policy_version 749717 (0.0136) [2024-06-15 21:23:00,738][1648984] Fps is (10 sec: 29491.2, 60 sec: 39321.6, 300 sec: 43098.2). Total num frames: 1535508480. Throughput: 0: 10809.1. Samples: 383960064. 
Policy #0 lag: (min: 15.0, avg: 70.4, max: 271.0) [2024-06-15 21:23:00,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 21:23:00,846][1652475] Updated weights for policy 0, policy_version 749764 (0.0010) [2024-06-15 21:23:02,449][1652475] Updated weights for policy 0, policy_version 749841 (0.0015) [2024-06-15 21:23:05,738][1648984] Fps is (10 sec: 52430.4, 60 sec: 43690.7, 300 sec: 42987.2). Total num frames: 1535770624. Throughput: 0: 10558.6. Samples: 383984640. Policy #0 lag: (min: 15.0, avg: 70.4, max: 271.0) [2024-06-15 21:23:05,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 21:23:10,738][1648984] Fps is (10 sec: 32768.2, 60 sec: 39867.7, 300 sec: 42765.0). Total num frames: 1535836160. Throughput: 0: 10797.5. Samples: 384057856. Policy #0 lag: (min: 15.0, avg: 70.4, max: 271.0) [2024-06-15 21:23:10,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 21:23:11,063][1652475] Updated weights for policy 0, policy_version 749937 (0.0012) [2024-06-15 21:23:12,872][1652475] Updated weights for policy 0, policy_version 750032 (0.0015) [2024-06-15 21:23:14,873][1652475] Updated weights for policy 0, policy_version 750099 (0.0013) [2024-06-15 21:23:15,738][1648984] Fps is (10 sec: 49152.1, 60 sec: 43179.2, 300 sec: 43542.6). Total num frames: 1536262144. Throughput: 0: 10547.2. Samples: 384108032. Policy #0 lag: (min: 15.0, avg: 70.4, max: 271.0) [2024-06-15 21:23:15,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 21:23:15,873][1652475] Updated weights for policy 0, policy_version 750144 (0.0015) [2024-06-15 21:23:20,738][1648984] Fps is (10 sec: 45874.8, 60 sec: 43144.5, 300 sec: 42653.9). Total num frames: 1536294912. Throughput: 0: 10365.1. Samples: 384143360. Policy #0 lag: (min: 15.0, avg: 70.4, max: 271.0) [2024-06-15 21:23:20,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 21:23:22,764][1651340] Signal inference workers to stop experience collection... 
(38550 times) [2024-06-15 21:23:22,797][1652475] InferenceWorker_p0-w0: stopping experience collection (38550 times) [2024-06-15 21:23:22,944][1651340] Signal inference workers to resume experience collection... (38550 times) [2024-06-15 21:23:22,945][1652475] InferenceWorker_p0-w0: resuming experience collection (38550 times) [2024-06-15 21:23:24,120][1652475] Updated weights for policy 0, policy_version 750240 (0.0121) [2024-06-15 21:23:25,011][1652475] Updated weights for policy 0, policy_version 750273 (0.0026) [2024-06-15 21:23:25,738][1648984] Fps is (10 sec: 36045.2, 60 sec: 42052.2, 300 sec: 43321.1). Total num frames: 1536622592. Throughput: 0: 10695.1. Samples: 384208896. Policy #0 lag: (min: 15.0, avg: 70.4, max: 271.0) [2024-06-15 21:23:25,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 21:23:26,295][1652475] Updated weights for policy 0, policy_version 750337 (0.0011) [2024-06-15 21:23:27,726][1652475] Updated weights for policy 0, policy_version 750394 (0.0026) [2024-06-15 21:23:30,738][1648984] Fps is (10 sec: 52429.2, 60 sec: 43690.7, 300 sec: 43098.2). Total num frames: 1536819200. Throughput: 0: 10319.6. Samples: 384271360. Policy #0 lag: (min: 15.0, avg: 70.4, max: 271.0) [2024-06-15 21:23:30,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 21:23:35,020][1652475] Updated weights for policy 0, policy_version 750452 (0.0020) [2024-06-15 21:23:35,740][1648984] Fps is (10 sec: 36044.5, 60 sec: 42598.3, 300 sec: 42876.1). Total num frames: 1536983040. Throughput: 0: 10729.2. Samples: 384308736. 
Policy #0 lag: (min: 15.0, avg: 70.4, max: 271.0) [2024-06-15 21:23:35,742][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 21:23:36,343][1652475] Updated weights for policy 0, policy_version 750513 (0.0010) [2024-06-15 21:23:37,186][1652475] Updated weights for policy 0, policy_version 750547 (0.0011) [2024-06-15 21:23:40,502][1652475] Updated weights for policy 0, policy_version 750608 (0.0011) [2024-06-15 21:23:40,738][1648984] Fps is (10 sec: 42597.8, 60 sec: 42052.2, 300 sec: 43209.3). Total num frames: 1537245184. Throughput: 0: 10535.9. Samples: 384370688. Policy #0 lag: (min: 15.0, avg: 70.4, max: 271.0) [2024-06-15 21:23:40,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 21:23:44,941][1652475] Updated weights for policy 0, policy_version 750658 (0.0015) [2024-06-15 21:23:45,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 43690.6, 300 sec: 42876.1). Total num frames: 1537409024. Throughput: 0: 10683.7. Samples: 384440832. Policy #0 lag: (min: 15.0, avg: 70.4, max: 271.0) [2024-06-15 21:23:45,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 21:23:46,120][1652475] Updated weights for policy 0, policy_version 750720 (0.0122) [2024-06-15 21:23:47,402][1652475] Updated weights for policy 0, policy_version 750769 (0.0012) [2024-06-15 21:23:48,656][1652475] Updated weights for policy 0, policy_version 750818 (0.0014) [2024-06-15 21:23:50,738][1648984] Fps is (10 sec: 49152.9, 60 sec: 42052.3, 300 sec: 43209.3). Total num frames: 1537736704. Throughput: 0: 10865.8. Samples: 384473600. Policy #0 lag: (min: 15.0, avg: 70.4, max: 271.0) [2024-06-15 21:23:50,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 21:23:53,090][1652475] Updated weights for policy 0, policy_version 750880 (0.0014) [2024-06-15 21:23:55,738][1648984] Fps is (10 sec: 45875.4, 60 sec: 43691.0, 300 sec: 42987.2). Total num frames: 1537867776. Throughput: 0: 10683.7. Samples: 384538624. 
Policy #0 lag: (min: 15.0, avg: 128.4, max: 271.0) [2024-06-15 21:23:55,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 21:23:56,921][1652475] Updated weights for policy 0, policy_version 750946 (0.0012) [2024-06-15 21:23:58,935][1652475] Updated weights for policy 0, policy_version 751012 (0.0166) [2024-06-15 21:24:00,738][1648984] Fps is (10 sec: 45874.9, 60 sec: 44783.0, 300 sec: 43320.4). Total num frames: 1538195456. Throughput: 0: 10990.9. Samples: 384602624. Policy #0 lag: (min: 15.0, avg: 128.4, max: 271.0) [2024-06-15 21:24:00,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 21:24:01,118][1652475] Updated weights for policy 0, policy_version 751098 (0.0014) [2024-06-15 21:24:05,422][1651340] Signal inference workers to stop experience collection... (38600 times) [2024-06-15 21:24:05,522][1652475] Updated weights for policy 0, policy_version 751142 (0.0011) [2024-06-15 21:24:05,551][1652475] InferenceWorker_p0-w0: stopping experience collection (38600 times) [2024-06-15 21:24:05,687][1651340] Signal inference workers to resume experience collection... (38600 times) [2024-06-15 21:24:05,687][1652475] InferenceWorker_p0-w0: resuming experience collection (38600 times) [2024-06-15 21:24:05,738][1648984] Fps is (10 sec: 49151.0, 60 sec: 43144.4, 300 sec: 43320.4). Total num frames: 1538359296. Throughput: 0: 10968.2. Samples: 384636928. Policy #0 lag: (min: 15.0, avg: 128.4, max: 271.0) [2024-06-15 21:24:05,749][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 21:24:08,540][1652475] Updated weights for policy 0, policy_version 751201 (0.0128) [2024-06-15 21:24:10,738][1648984] Fps is (10 sec: 32768.0, 60 sec: 44782.9, 300 sec: 42653.9). Total num frames: 1538523136. Throughput: 0: 10945.4. Samples: 384701440. 
Policy #0 lag: (min: 15.0, avg: 128.4, max: 271.0) [2024-06-15 21:24:10,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 21:24:11,719][1652475] Updated weights for policy 0, policy_version 751280 (0.0011) [2024-06-15 21:24:13,776][1652475] Updated weights for policy 0, policy_version 751328 (0.0011) [2024-06-15 21:24:15,765][1648984] Fps is (10 sec: 42482.4, 60 sec: 42033.0, 300 sec: 43094.2). Total num frames: 1538785280. Throughput: 0: 10836.4. Samples: 384759296. Policy #0 lag: (min: 15.0, avg: 128.4, max: 271.0) [2024-06-15 21:24:15,766][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 21:24:17,509][1652475] Updated weights for policy 0, policy_version 751414 (0.0043) [2024-06-15 21:24:20,738][1648984] Fps is (10 sec: 39320.8, 60 sec: 43690.6, 300 sec: 42876.1). Total num frames: 1538916352. Throughput: 0: 10729.2. Samples: 384791552. Policy #0 lag: (min: 15.0, avg: 128.4, max: 271.0) [2024-06-15 21:24:20,739][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 21:24:22,360][1652475] Updated weights for policy 0, policy_version 751444 (0.0012) [2024-06-15 21:24:23,669][1652475] Updated weights for policy 0, policy_version 751504 (0.0011) [2024-06-15 21:24:25,738][1648984] Fps is (10 sec: 39429.7, 60 sec: 42598.3, 300 sec: 42988.6). Total num frames: 1539178496. Throughput: 0: 10706.5. Samples: 384852480. Policy #0 lag: (min: 15.0, avg: 128.4, max: 271.0) [2024-06-15 21:24:25,739][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 21:24:27,998][1652475] Updated weights for policy 0, policy_version 751570 (0.0118) [2024-06-15 21:24:29,997][1652475] Updated weights for policy 0, policy_version 751653 (0.0013) [2024-06-15 21:24:30,738][1648984] Fps is (10 sec: 52430.0, 60 sec: 43690.7, 300 sec: 43098.3). Total num frames: 1539440640. Throughput: 0: 10467.6. Samples: 384911872. 
Policy #0 lag: (min: 15.0, avg: 128.4, max: 271.0) [2024-06-15 21:24:30,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 21:24:33,519][1652475] Updated weights for policy 0, policy_version 751704 (0.0013) [2024-06-15 21:24:35,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 43690.6, 300 sec: 43209.3). Total num frames: 1539604480. Throughput: 0: 10569.9. Samples: 384949248. Policy #0 lag: (min: 15.0, avg: 128.4, max: 271.0) [2024-06-15 21:24:35,739][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 21:24:35,812][1652475] Updated weights for policy 0, policy_version 751763 (0.0012) [2024-06-15 21:24:39,618][1652475] Updated weights for policy 0, policy_version 751810 (0.0015) [2024-06-15 21:24:40,738][1648984] Fps is (10 sec: 36044.9, 60 sec: 42598.5, 300 sec: 42876.1). Total num frames: 1539801088. Throughput: 0: 10626.8. Samples: 385016832. Policy #0 lag: (min: 15.0, avg: 128.4, max: 271.0) [2024-06-15 21:24:40,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 21:24:41,419][1652475] Updated weights for policy 0, policy_version 751888 (0.0012) [2024-06-15 21:24:42,427][1652475] Updated weights for policy 0, policy_version 751936 (0.0013) [2024-06-15 21:24:45,738][1648984] Fps is (10 sec: 45875.7, 60 sec: 44236.8, 300 sec: 42987.2). Total num frames: 1540063232. Throughput: 0: 10604.1. Samples: 385079808. Policy #0 lag: (min: 15.0, avg: 128.4, max: 271.0) [2024-06-15 21:24:45,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 21:24:45,883][1652475] Updated weights for policy 0, policy_version 751993 (0.0011) [2024-06-15 21:24:49,268][1652475] Updated weights for policy 0, policy_version 752048 (0.0014) [2024-06-15 21:24:50,738][1648984] Fps is (10 sec: 42598.1, 60 sec: 41506.1, 300 sec: 43098.2). Total num frames: 1540227072. Throughput: 0: 10626.9. Samples: 385115136. 
Policy #0 lag: (min: 15.0, avg: 128.4, max: 271.0) [2024-06-15 21:24:50,739][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 21:24:52,221][1652475] Updated weights for policy 0, policy_version 752097 (0.0011) [2024-06-15 21:24:53,027][1651340] Signal inference workers to stop experience collection... (38650 times) [2024-06-15 21:24:53,075][1652475] InferenceWorker_p0-w0: stopping experience collection (38650 times) [2024-06-15 21:24:53,230][1651340] Signal inference workers to resume experience collection... (38650 times) [2024-06-15 21:24:53,231][1652475] InferenceWorker_p0-w0: resuming experience collection (38650 times) [2024-06-15 21:24:53,958][1652475] Updated weights for policy 0, policy_version 752176 (0.0012) [2024-06-15 21:24:55,738][1648984] Fps is (10 sec: 42596.2, 60 sec: 43690.3, 300 sec: 42876.0). Total num frames: 1540489216. Throughput: 0: 10478.8. Samples: 385172992. Policy #0 lag: (min: 15.0, avg: 128.4, max: 271.0) [2024-06-15 21:24:55,739][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 21:24:55,765][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000752192_1540489216.pth... [2024-06-15 21:24:55,817][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000747200_1530265600.pth [2024-06-15 21:24:57,018][1652475] Updated weights for policy 0, policy_version 752213 (0.0011) [2024-06-15 21:25:00,497][1652475] Updated weights for policy 0, policy_version 752289 (0.0017) [2024-06-15 21:25:00,738][1648984] Fps is (10 sec: 49152.7, 60 sec: 42052.3, 300 sec: 43431.5). Total num frames: 1540718592. Throughput: 0: 10804.1. Samples: 385245184. Policy #0 lag: (min: 15.0, avg: 128.4, max: 271.0) [2024-06-15 21:25:00,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 21:25:04,578][1652475] Updated weights for policy 0, policy_version 752372 (0.0012) [2024-06-15 21:25:05,738][1648984] Fps is (10 sec: 45877.5, 60 sec: 43144.7, 300 sec: 42876.1). 
Total num frames: 1540947968. Throughput: 0: 10888.6. Samples: 385281536. Policy #0 lag: (min: 15.0, avg: 128.4, max: 271.0) [2024-06-15 21:25:05,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 21:25:06,116][1652475] Updated weights for policy 0, policy_version 752432 (0.0137) [2024-06-15 21:25:09,772][1652475] Updated weights for policy 0, policy_version 752480 (0.0032) [2024-06-15 21:25:10,738][1648984] Fps is (10 sec: 42597.9, 60 sec: 43690.7, 300 sec: 43098.2). Total num frames: 1541144576. Throughput: 0: 10797.5. Samples: 385338368. Policy #0 lag: (min: 15.0, avg: 128.4, max: 271.0) [2024-06-15 21:25:10,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 21:25:11,859][1652475] Updated weights for policy 0, policy_version 752531 (0.0014) [2024-06-15 21:25:15,738][1648984] Fps is (10 sec: 32767.4, 60 sec: 41525.1, 300 sec: 42431.8). Total num frames: 1541275648. Throughput: 0: 10865.7. Samples: 385400832. Policy #0 lag: (min: 15.0, avg: 128.4, max: 271.0) [2024-06-15 21:25:15,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 21:25:16,999][1652475] Updated weights for policy 0, policy_version 752599 (0.0015) [2024-06-15 21:25:18,820][1652475] Updated weights for policy 0, policy_version 752674 (0.0014) [2024-06-15 21:25:20,743][1648984] Fps is (10 sec: 39302.0, 60 sec: 43687.2, 300 sec: 42653.2). Total num frames: 1541537792. Throughput: 0: 10648.4. Samples: 385428480. Policy #0 lag: (min: 15.0, avg: 128.4, max: 271.0) [2024-06-15 21:25:20,743][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 21:25:24,122][1652475] Updated weights for policy 0, policy_version 752752 (0.0110) [2024-06-15 21:25:25,448][1652475] Updated weights for policy 0, policy_version 752828 (0.0013) [2024-06-15 21:25:25,738][1648984] Fps is (10 sec: 52429.3, 60 sec: 43690.7, 300 sec: 42987.2). Total num frames: 1541799936. Throughput: 0: 10604.1. Samples: 385494016. 
Policy #0 lag: (min: 15.0, avg: 128.4, max: 271.0) [2024-06-15 21:25:25,738][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 21:25:30,147][1652475] Updated weights for policy 0, policy_version 752912 (0.0127) [2024-06-15 21:25:30,738][1648984] Fps is (10 sec: 45898.5, 60 sec: 42598.4, 300 sec: 42431.8). Total num frames: 1541996544. Throughput: 0: 10535.8. Samples: 385553920. Policy #0 lag: (min: 15.0, avg: 128.4, max: 271.0) [2024-06-15 21:25:30,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 21:25:35,738][1648984] Fps is (10 sec: 29491.2, 60 sec: 41506.1, 300 sec: 42765.0). Total num frames: 1542094848. Throughput: 0: 10433.4. Samples: 385584640. Policy #0 lag: (min: 15.0, avg: 128.4, max: 271.0) [2024-06-15 21:25:35,738][1648984] Avg episode reward: [(0, '-0.590')] [2024-06-15 21:25:36,220][1652475] Updated weights for policy 0, policy_version 753008 (0.0017) [2024-06-15 21:25:38,426][1652475] Updated weights for policy 0, policy_version 753072 (0.0012) [2024-06-15 21:25:40,738][1648984] Fps is (10 sec: 32767.8, 60 sec: 42052.3, 300 sec: 42326.5). Total num frames: 1542324224. Throughput: 0: 10638.3. Samples: 385651712. Policy #0 lag: (min: 52.0, avg: 128.4, max: 308.0) [2024-06-15 21:25:40,739][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 21:25:41,573][1651340] Signal inference workers to stop experience collection... (38700 times) [2024-06-15 21:25:41,618][1652475] InferenceWorker_p0-w0: stopping experience collection (38700 times) [2024-06-15 21:25:41,822][1651340] Signal inference workers to resume experience collection... (38700 times) [2024-06-15 21:25:41,823][1652475] InferenceWorker_p0-w0: resuming experience collection (38700 times) [2024-06-15 21:25:43,371][1652475] Updated weights for policy 0, policy_version 753170 (0.0174) [2024-06-15 21:25:45,738][1648984] Fps is (10 sec: 49152.2, 60 sec: 42052.2, 300 sec: 42655.2). Total num frames: 1542586368. Throughput: 0: 10228.6. Samples: 385705472. 
Policy #0 lag: (min: 52.0, avg: 128.4, max: 308.0) [2024-06-15 21:25:45,738][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 21:25:47,886][1652475] Updated weights for policy 0, policy_version 753250 (0.0014) [2024-06-15 21:25:50,407][1652475] Updated weights for policy 0, policy_version 753288 (0.0011) [2024-06-15 21:25:50,737][1648984] Fps is (10 sec: 42599.0, 60 sec: 42052.4, 300 sec: 42431.8). Total num frames: 1542750208. Throughput: 0: 10126.2. Samples: 385737216. Policy #0 lag: (min: 52.0, avg: 128.4, max: 308.0) [2024-06-15 21:25:50,738][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 21:25:54,047][1652475] Updated weights for policy 0, policy_version 753360 (0.0013) [2024-06-15 21:25:55,133][1652475] Updated weights for policy 0, policy_version 753402 (0.0012) [2024-06-15 21:25:55,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 41506.5, 300 sec: 42209.6). Total num frames: 1542979584. Throughput: 0: 10444.8. Samples: 385808384. Policy #0 lag: (min: 52.0, avg: 128.4, max: 308.0) [2024-06-15 21:25:55,740][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 21:25:57,926][1652475] Updated weights for policy 0, policy_version 753472 (0.0011) [2024-06-15 21:25:59,937][1652475] Updated weights for policy 0, policy_version 753534 (0.0157) [2024-06-15 21:26:00,738][1648984] Fps is (10 sec: 49151.6, 60 sec: 42052.2, 300 sec: 42654.0). Total num frames: 1543241728. Throughput: 0: 10444.8. Samples: 385870848. Policy #0 lag: (min: 52.0, avg: 128.4, max: 308.0) [2024-06-15 21:26:00,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 21:26:04,637][1652475] Updated weights for policy 0, policy_version 753604 (0.0011) [2024-06-15 21:26:05,738][1648984] Fps is (10 sec: 49152.0, 60 sec: 42052.3, 300 sec: 42765.0). Total num frames: 1543471104. Throughput: 0: 10593.9. Samples: 385905152. 
Policy #0 lag: (min: 52.0, avg: 128.4, max: 308.0) [2024-06-15 21:26:05,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 21:26:09,809][1652475] Updated weights for policy 0, policy_version 753681 (0.0135) [2024-06-15 21:26:10,738][1648984] Fps is (10 sec: 36044.5, 60 sec: 40960.0, 300 sec: 42320.7). Total num frames: 1543602176. Throughput: 0: 10740.6. Samples: 385977344. Policy #0 lag: (min: 52.0, avg: 128.4, max: 308.0) [2024-06-15 21:26:10,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 21:26:11,292][1652475] Updated weights for policy 0, policy_version 753744 (0.0039) [2024-06-15 21:26:14,025][1652475] Updated weights for policy 0, policy_version 753813 (0.0011) [2024-06-15 21:26:15,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 43690.8, 300 sec: 42653.9). Total num frames: 1543897088. Throughput: 0: 10604.1. Samples: 386031104. Policy #0 lag: (min: 52.0, avg: 128.4, max: 308.0) [2024-06-15 21:26:15,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 21:26:17,094][1652475] Updated weights for policy 0, policy_version 753858 (0.0011) [2024-06-15 21:26:20,738][1648984] Fps is (10 sec: 42597.3, 60 sec: 41509.4, 300 sec: 42209.6). Total num frames: 1544028160. Throughput: 0: 10695.1. Samples: 386065920. Policy #0 lag: (min: 52.0, avg: 128.4, max: 308.0) [2024-06-15 21:26:20,739][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 21:26:21,857][1652475] Updated weights for policy 0, policy_version 753925 (0.0093) [2024-06-15 21:26:23,822][1652475] Updated weights for policy 0, policy_version 754004 (0.0213) [2024-06-15 21:26:25,549][1652475] Updated weights for policy 0, policy_version 754051 (0.0022) [2024-06-15 21:26:25,738][1648984] Fps is (10 sec: 39321.3, 60 sec: 41506.1, 300 sec: 42320.7). Total num frames: 1544290304. Throughput: 0: 10638.2. Samples: 386130432. 
Policy #0 lag: (min: 52.0, avg: 128.4, max: 308.0) [2024-06-15 21:26:25,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 21:26:26,906][1652475] Updated weights for policy 0, policy_version 754111 (0.0026) [2024-06-15 21:26:30,327][1651340] Signal inference workers to stop experience collection... (38750 times) [2024-06-15 21:26:30,369][1652475] InferenceWorker_p0-w0: stopping experience collection (38750 times) [2024-06-15 21:26:30,525][1651340] Signal inference workers to resume experience collection... (38750 times) [2024-06-15 21:26:30,526][1652475] InferenceWorker_p0-w0: resuming experience collection (38750 times) [2024-06-15 21:26:30,738][1648984] Fps is (10 sec: 42599.7, 60 sec: 40960.0, 300 sec: 42209.6). Total num frames: 1544454144. Throughput: 0: 10956.8. Samples: 386198528. Policy #0 lag: (min: 52.0, avg: 128.4, max: 308.0) [2024-06-15 21:26:30,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 21:26:31,206][1652475] Updated weights for policy 0, policy_version 754160 (0.0012) [2024-06-15 21:26:34,271][1652475] Updated weights for policy 0, policy_version 754224 (0.0028) [2024-06-15 21:26:35,738][1648984] Fps is (10 sec: 42597.8, 60 sec: 43690.6, 300 sec: 42431.8). Total num frames: 1544716288. Throughput: 0: 11002.2. Samples: 386232320. Policy #0 lag: (min: 52.0, avg: 128.4, max: 308.0) [2024-06-15 21:26:35,739][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 21:26:36,381][1652475] Updated weights for policy 0, policy_version 754275 (0.0012) [2024-06-15 21:26:38,547][1652475] Updated weights for policy 0, policy_version 754364 (0.0011) [2024-06-15 21:26:40,738][1648984] Fps is (10 sec: 49152.0, 60 sec: 43690.7, 300 sec: 42431.8). Total num frames: 1544945664. Throughput: 0: 10626.8. Samples: 386286592. 
Policy #0 lag: (min: 52.0, avg: 128.4, max: 308.0) [2024-06-15 21:26:40,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 21:26:45,498][1652475] Updated weights for policy 0, policy_version 754419 (0.0011) [2024-06-15 21:26:45,738][1648984] Fps is (10 sec: 36045.8, 60 sec: 41506.2, 300 sec: 42209.6). Total num frames: 1545076736. Throughput: 0: 10706.5. Samples: 386352640. Policy #0 lag: (min: 52.0, avg: 128.4, max: 308.0) [2024-06-15 21:26:45,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 21:26:47,043][1652475] Updated weights for policy 0, policy_version 754496 (0.0011) [2024-06-15 21:26:49,279][1652475] Updated weights for policy 0, policy_version 754560 (0.0033) [2024-06-15 21:26:50,738][1648984] Fps is (10 sec: 45874.7, 60 sec: 44236.6, 300 sec: 42654.0). Total num frames: 1545404416. Throughput: 0: 10683.7. Samples: 386385920. Policy #0 lag: (min: 52.0, avg: 128.4, max: 308.0) [2024-06-15 21:26:50,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 21:26:50,991][1652475] Updated weights for policy 0, policy_version 754617 (0.0012) [2024-06-15 21:26:55,738][1648984] Fps is (10 sec: 39319.8, 60 sec: 41505.9, 300 sec: 41765.3). Total num frames: 1545469952. Throughput: 0: 10501.6. Samples: 386449920. Policy #0 lag: (min: 52.0, avg: 128.4, max: 308.0) [2024-06-15 21:26:55,739][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 21:26:55,748][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000754624_1545469952.pth... [2024-06-15 21:26:56,024][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000749632_1535246336.pth [2024-06-15 21:26:57,679][1652475] Updated weights for policy 0, policy_version 754692 (0.0012) [2024-06-15 21:26:58,935][1652475] Updated weights for policy 0, policy_version 754752 (0.0015) [2024-06-15 21:27:00,738][1648984] Fps is (10 sec: 39322.2, 60 sec: 42598.4, 300 sec: 42876.1). Total num frames: 1545797632. 
Throughput: 0: 10672.4. Samples: 386511360. Policy #0 lag: (min: 52.0, avg: 128.4, max: 308.0) [2024-06-15 21:27:00,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 21:27:01,216][1652475] Updated weights for policy 0, policy_version 754809 (0.0011) [2024-06-15 21:27:04,225][1652475] Updated weights for policy 0, policy_version 754864 (0.0036) [2024-06-15 21:27:05,738][1648984] Fps is (10 sec: 52430.4, 60 sec: 42052.2, 300 sec: 42542.9). Total num frames: 1545994240. Throughput: 0: 10592.8. Samples: 386542592. Policy #0 lag: (min: 52.0, avg: 128.4, max: 308.0) [2024-06-15 21:27:05,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 21:27:08,624][1652475] Updated weights for policy 0, policy_version 754896 (0.0012) [2024-06-15 21:27:10,479][1652475] Updated weights for policy 0, policy_version 754963 (0.0147) [2024-06-15 21:27:10,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 43144.6, 300 sec: 42438.7). Total num frames: 1546190848. Throughput: 0: 10683.7. Samples: 386611200. Policy #0 lag: (min: 52.0, avg: 128.4, max: 308.0) [2024-06-15 21:27:10,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 21:27:13,140][1652475] Updated weights for policy 0, policy_version 755040 (0.0012) [2024-06-15 21:27:15,738][1648984] Fps is (10 sec: 42598.8, 60 sec: 42052.3, 300 sec: 43098.3). Total num frames: 1546420224. Throughput: 0: 10513.1. Samples: 386671616. Policy #0 lag: (min: 52.0, avg: 128.4, max: 308.0) [2024-06-15 21:27:15,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 21:27:15,849][1651340] Signal inference workers to stop experience collection... (38800 times) [2024-06-15 21:27:15,922][1652475] InferenceWorker_p0-w0: stopping experience collection (38800 times) [2024-06-15 21:27:16,069][1651340] Signal inference workers to resume experience collection... 
(38800 times) [2024-06-15 21:27:16,070][1652475] InferenceWorker_p0-w0: resuming experience collection (38800 times) [2024-06-15 21:27:16,483][1652475] Updated weights for policy 0, policy_version 755120 (0.0151) [2024-06-15 21:27:20,738][1648984] Fps is (10 sec: 32767.9, 60 sec: 41506.3, 300 sec: 42098.5). Total num frames: 1546518528. Throughput: 0: 10513.1. Samples: 386705408. Policy #0 lag: (min: 52.0, avg: 128.4, max: 308.0) [2024-06-15 21:27:20,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 21:27:21,830][1652475] Updated weights for policy 0, policy_version 755189 (0.0011) [2024-06-15 21:27:23,461][1652475] Updated weights for policy 0, policy_version 755257 (0.0013) [2024-06-15 21:27:25,738][1648984] Fps is (10 sec: 45875.1, 60 sec: 43144.6, 300 sec: 42987.2). Total num frames: 1546878976. Throughput: 0: 10649.6. Samples: 386765824. Policy #0 lag: (min: 1.0, avg: 81.4, max: 257.0) [2024-06-15 21:27:25,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 21:27:27,733][1652475] Updated weights for policy 0, policy_version 755333 (0.0011) [2024-06-15 21:27:30,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 43144.5, 300 sec: 42765.0). Total num frames: 1547042816. Throughput: 0: 10672.3. Samples: 386832896. Policy #0 lag: (min: 1.0, avg: 81.4, max: 257.0) [2024-06-15 21:27:30,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 21:27:33,045][1652475] Updated weights for policy 0, policy_version 755394 (0.0013) [2024-06-15 21:27:34,954][1652475] Updated weights for policy 0, policy_version 755473 (0.0013) [2024-06-15 21:27:35,738][1648984] Fps is (10 sec: 39321.2, 60 sec: 42598.5, 300 sec: 42542.9). Total num frames: 1547272192. Throughput: 0: 10740.6. Samples: 386869248. 
Policy #0 lag: (min: 1.0, avg: 81.4, max: 257.0) [2024-06-15 21:27:35,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 21:27:37,455][1652475] Updated weights for policy 0, policy_version 755522 (0.0014) [2024-06-15 21:27:40,001][1652475] Updated weights for policy 0, policy_version 755601 (0.0018) [2024-06-15 21:27:40,738][1648984] Fps is (10 sec: 49151.7, 60 sec: 43144.4, 300 sec: 43209.3). Total num frames: 1547534336. Throughput: 0: 10547.3. Samples: 386924544. Policy #0 lag: (min: 1.0, avg: 81.4, max: 257.0) [2024-06-15 21:27:40,739][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 21:27:45,026][1652475] Updated weights for policy 0, policy_version 755653 (0.0015) [2024-06-15 21:27:45,738][1648984] Fps is (10 sec: 36045.3, 60 sec: 42598.4, 300 sec: 42098.6). Total num frames: 1547632640. Throughput: 0: 10763.4. Samples: 386995712. Policy #0 lag: (min: 1.0, avg: 81.4, max: 257.0) [2024-06-15 21:27:45,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 21:27:46,487][1652475] Updated weights for policy 0, policy_version 755728 (0.0013) [2024-06-15 21:27:47,530][1652475] Updated weights for policy 0, policy_version 755776 (0.0014) [2024-06-15 21:27:50,738][1648984] Fps is (10 sec: 36045.2, 60 sec: 41506.2, 300 sec: 42876.1). Total num frames: 1547894784. Throughput: 0: 10752.0. Samples: 387026432. Policy #0 lag: (min: 1.0, avg: 81.4, max: 257.0) [2024-06-15 21:27:50,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 21:27:51,440][1652475] Updated weights for policy 0, policy_version 755840 (0.0096) [2024-06-15 21:27:52,728][1652475] Updated weights for policy 0, policy_version 755904 (0.0010) [2024-06-15 21:27:55,738][1648984] Fps is (10 sec: 45874.5, 60 sec: 43690.9, 300 sec: 42653.9). Total num frames: 1548091392. Throughput: 0: 10729.2. Samples: 387094016. 
Policy #0 lag: (min: 1.0, avg: 81.4, max: 257.0) [2024-06-15 21:27:55,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 21:27:59,277][1652475] Updated weights for policy 0, policy_version 756031 (0.0117) [2024-06-15 21:28:00,738][1648984] Fps is (10 sec: 45875.5, 60 sec: 42598.4, 300 sec: 42654.0). Total num frames: 1548353536. Throughput: 0: 10706.5. Samples: 387153408. Policy #0 lag: (min: 1.0, avg: 81.4, max: 257.0) [2024-06-15 21:28:00,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 21:28:02,290][1651340] Signal inference workers to stop experience collection... (38850 times) [2024-06-15 21:28:02,345][1652475] InferenceWorker_p0-w0: stopping experience collection (38850 times) [2024-06-15 21:28:02,562][1651340] Signal inference workers to resume experience collection... (38850 times) [2024-06-15 21:28:02,563][1652475] InferenceWorker_p0-w0: resuming experience collection (38850 times) [2024-06-15 21:28:03,094][1652475] Updated weights for policy 0, policy_version 756087 (0.0013) [2024-06-15 21:28:05,023][1652475] Updated weights for policy 0, policy_version 756132 (0.0028) [2024-06-15 21:28:05,738][1648984] Fps is (10 sec: 52429.1, 60 sec: 43690.7, 300 sec: 43320.4). Total num frames: 1548615680. Throughput: 0: 10717.9. Samples: 387187712. Policy #0 lag: (min: 1.0, avg: 81.4, max: 257.0) [2024-06-15 21:28:05,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 21:28:09,789][1652475] Updated weights for policy 0, policy_version 756208 (0.0016) [2024-06-15 21:28:10,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 42598.4, 300 sec: 42320.7). Total num frames: 1548746752. Throughput: 0: 10808.9. Samples: 387252224. 
Policy #0 lag: (min: 1.0, avg: 81.4, max: 257.0) [2024-06-15 21:28:10,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 21:28:12,199][1652475] Updated weights for policy 0, policy_version 756256 (0.0103) [2024-06-15 21:28:13,728][1652475] Updated weights for policy 0, policy_version 756304 (0.0015) [2024-06-15 21:28:15,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 43144.5, 300 sec: 43098.3). Total num frames: 1549008896. Throughput: 0: 10672.4. Samples: 387313152. Policy #0 lag: (min: 1.0, avg: 81.4, max: 257.0) [2024-06-15 21:28:15,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 21:28:17,408][1652475] Updated weights for policy 0, policy_version 756386 (0.0031) [2024-06-15 21:28:20,598][1652475] Updated weights for policy 0, policy_version 756433 (0.0048) [2024-06-15 21:28:20,738][1648984] Fps is (10 sec: 42598.7, 60 sec: 44236.9, 300 sec: 42542.9). Total num frames: 1549172736. Throughput: 0: 10547.2. Samples: 387343872. Policy #0 lag: (min: 1.0, avg: 81.4, max: 257.0) [2024-06-15 21:28:20,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 21:28:21,571][1652475] Updated weights for policy 0, policy_version 756479 (0.0013) [2024-06-15 21:28:25,740][1648984] Fps is (10 sec: 32759.8, 60 sec: 40958.3, 300 sec: 42431.4). Total num frames: 1549336576. Throughput: 0: 10922.1. Samples: 387416064. Policy #0 lag: (min: 1.0, avg: 81.4, max: 257.0) [2024-06-15 21:28:25,741][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 21:28:26,586][1652475] Updated weights for policy 0, policy_version 756548 (0.0011) [2024-06-15 21:28:28,946][1652475] Updated weights for policy 0, policy_version 756611 (0.0012) [2024-06-15 21:28:30,738][1648984] Fps is (10 sec: 49152.0, 60 sec: 43690.7, 300 sec: 42987.2). Total num frames: 1549664256. Throughput: 0: 10410.7. Samples: 387464192. 
Policy #0 lag: (min: 1.0, avg: 81.4, max: 257.0) [2024-06-15 21:28:30,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 21:28:32,511][1652475] Updated weights for policy 0, policy_version 756688 (0.0123) [2024-06-15 21:28:33,328][1652475] Updated weights for policy 0, policy_version 756735 (0.0022) [2024-06-15 21:28:35,738][1648984] Fps is (10 sec: 45886.6, 60 sec: 42052.3, 300 sec: 42542.9). Total num frames: 1549795328. Throughput: 0: 10513.1. Samples: 387499520. Policy #0 lag: (min: 1.0, avg: 81.4, max: 257.0) [2024-06-15 21:28:35,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 21:28:39,144][1652475] Updated weights for policy 0, policy_version 756816 (0.0011) [2024-06-15 21:28:40,443][1652475] Updated weights for policy 0, policy_version 756864 (0.0012) [2024-06-15 21:28:40,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 42052.4, 300 sec: 42876.1). Total num frames: 1550057472. Throughput: 0: 10547.2. Samples: 387568640. Policy #0 lag: (min: 1.0, avg: 81.4, max: 257.0) [2024-06-15 21:28:40,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 21:28:42,739][1652475] Updated weights for policy 0, policy_version 756928 (0.0012) [2024-06-15 21:28:45,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 44782.9, 300 sec: 42653.9). Total num frames: 1550319616. Throughput: 0: 10615.5. Samples: 387631104. Policy #0 lag: (min: 1.0, avg: 81.4, max: 257.0) [2024-06-15 21:28:45,741][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 21:28:49,873][1652475] Updated weights for policy 0, policy_version 756993 (0.0014) [2024-06-15 21:28:50,750][1648984] Fps is (10 sec: 32727.3, 60 sec: 41497.6, 300 sec: 42430.0). Total num frames: 1550385152. Throughput: 0: 10635.3. Samples: 387666432. Policy #0 lag: (min: 1.0, avg: 81.4, max: 257.0) [2024-06-15 21:28:50,751][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 21:28:51,491][1651340] Signal inference workers to stop experience collection... 
(38900 times) [2024-06-15 21:28:51,538][1652475] InferenceWorker_p0-w0: stopping experience collection (38900 times) [2024-06-15 21:28:51,883][1651340] Signal inference workers to resume experience collection... (38900 times) [2024-06-15 21:28:51,888][1652475] InferenceWorker_p0-w0: resuming experience collection (38900 times) [2024-06-15 21:28:51,890][1652475] Updated weights for policy 0, policy_version 757072 (0.0012) [2024-06-15 21:28:54,434][1652475] Updated weights for policy 0, policy_version 757176 (0.0124) [2024-06-15 21:28:55,754][1648984] Fps is (10 sec: 39256.0, 60 sec: 43678.6, 300 sec: 42429.4). Total num frames: 1550712832. Throughput: 0: 10338.6. Samples: 387717632. Policy #0 lag: (min: 1.0, avg: 81.4, max: 257.0) [2024-06-15 21:28:55,755][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 21:28:55,760][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000757184_1550712832.pth... [2024-06-15 21:28:55,839][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000752192_1540489216.pth [2024-06-15 21:29:00,309][1652475] Updated weights for policy 0, policy_version 757232 (0.0014) [2024-06-15 21:29:00,738][1648984] Fps is (10 sec: 45931.9, 60 sec: 41506.1, 300 sec: 42320.7). Total num frames: 1550843904. Throughput: 0: 10501.7. Samples: 387785728. Policy #0 lag: (min: 1.0, avg: 81.4, max: 257.0) [2024-06-15 21:29:00,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 21:29:03,530][1652475] Updated weights for policy 0, policy_version 757302 (0.0012) [2024-06-15 21:29:05,230][1652475] Updated weights for policy 0, policy_version 757376 (0.0084) [2024-06-15 21:29:05,738][1648984] Fps is (10 sec: 42669.5, 60 sec: 42052.2, 300 sec: 42765.0). Total num frames: 1551138816. Throughput: 0: 10456.2. Samples: 387814400. 
Policy #0 lag: (min: 21.0, avg: 130.0, max: 227.0) [2024-06-15 21:29:05,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 21:29:10,738][1648984] Fps is (10 sec: 39320.7, 60 sec: 41505.9, 300 sec: 42213.5). Total num frames: 1551237120. Throughput: 0: 10206.4. Samples: 387875328. Policy #0 lag: (min: 21.0, avg: 130.0, max: 227.0) [2024-06-15 21:29:10,739][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 21:29:14,300][1652475] Updated weights for policy 0, policy_version 757459 (0.0014) [2024-06-15 21:29:15,738][1648984] Fps is (10 sec: 26214.8, 60 sec: 39867.8, 300 sec: 42320.8). Total num frames: 1551400960. Throughput: 0: 10547.2. Samples: 387938816. Policy #0 lag: (min: 21.0, avg: 130.0, max: 227.0) [2024-06-15 21:29:15,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 21:29:15,894][1652475] Updated weights for policy 0, policy_version 757527 (0.0012) [2024-06-15 21:29:18,020][1652475] Updated weights for policy 0, policy_version 757616 (0.0013) [2024-06-15 21:29:20,180][1652475] Updated weights for policy 0, policy_version 757688 (0.0013) [2024-06-15 21:29:20,738][1648984] Fps is (10 sec: 52430.1, 60 sec: 43144.5, 300 sec: 42653.9). Total num frames: 1551761408. Throughput: 0: 10296.9. Samples: 387962880. Policy #0 lag: (min: 21.0, avg: 130.0, max: 227.0) [2024-06-15 21:29:20,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 21:29:25,738][1648984] Fps is (10 sec: 36044.5, 60 sec: 40415.5, 300 sec: 41765.3). Total num frames: 1551761408. Throughput: 0: 10285.5. Samples: 388031488. Policy #0 lag: (min: 21.0, avg: 130.0, max: 227.0) [2024-06-15 21:29:25,741][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 21:29:27,177][1652475] Updated weights for policy 0, policy_version 757733 (0.0012) [2024-06-15 21:29:29,963][1652475] Updated weights for policy 0, policy_version 757840 (0.0097) [2024-06-15 21:29:30,739][1648984] Fps is (10 sec: 36040.8, 60 sec: 40959.2, 300 sec: 42431.6). Total num frames: 1552121856. 
Throughput: 0: 10137.3. Samples: 388087296. Policy #0 lag: (min: 21.0, avg: 130.0, max: 227.0) [2024-06-15 21:29:30,739][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 21:29:32,576][1652475] Updated weights for policy 0, policy_version 757904 (0.0013) [2024-06-15 21:29:35,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 41506.1, 300 sec: 42320.7). Total num frames: 1552285696. Throughput: 0: 10026.6. Samples: 388117504. Policy #0 lag: (min: 21.0, avg: 130.0, max: 227.0) [2024-06-15 21:29:35,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 21:29:39,278][1652475] Updated weights for policy 0, policy_version 757984 (0.0043) [2024-06-15 21:29:39,380][1651340] Signal inference workers to stop experience collection... (38950 times) [2024-06-15 21:29:39,451][1652475] InferenceWorker_p0-w0: stopping experience collection (38950 times) [2024-06-15 21:29:39,751][1651340] Signal inference workers to resume experience collection... (38950 times) [2024-06-15 21:29:39,764][1652475] InferenceWorker_p0-w0: resuming experience collection (38950 times) [2024-06-15 21:29:40,738][1648984] Fps is (10 sec: 29494.7, 60 sec: 39321.6, 300 sec: 41876.4). Total num frames: 1552416768. Throughput: 0: 10573.9. Samples: 388193280. Policy #0 lag: (min: 21.0, avg: 130.0, max: 227.0) [2024-06-15 21:29:40,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 21:29:41,896][1652475] Updated weights for policy 0, policy_version 758065 (0.0014) [2024-06-15 21:29:45,266][1652475] Updated weights for policy 0, policy_version 758176 (0.0013) [2024-06-15 21:29:45,738][1648984] Fps is (10 sec: 49151.6, 60 sec: 40960.0, 300 sec: 42542.9). Total num frames: 1552777216. Throughput: 0: 10171.7. Samples: 388243456. Policy #0 lag: (min: 21.0, avg: 130.0, max: 227.0) [2024-06-15 21:29:45,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 21:29:50,738][1648984] Fps is (10 sec: 39320.6, 60 sec: 40422.1, 300 sec: 41765.4). Total num frames: 1552809984. Throughput: 0: 10456.1. 
Samples: 388284928. Policy #0 lag: (min: 21.0, avg: 130.0, max: 227.0) [2024-06-15 21:29:50,739][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 21:29:52,246][1652475] Updated weights for policy 0, policy_version 758257 (0.0011) [2024-06-15 21:29:54,116][1652475] Updated weights for policy 0, policy_version 758340 (0.0015) [2024-06-15 21:29:55,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 41517.7, 300 sec: 42320.7). Total num frames: 1553203200. Throughput: 0: 10433.5. Samples: 388344832. Policy #0 lag: (min: 21.0, avg: 130.0, max: 227.0) [2024-06-15 21:29:55,739][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 21:29:56,214][1652475] Updated weights for policy 0, policy_version 758401 (0.0014) [2024-06-15 21:29:57,397][1652475] Updated weights for policy 0, policy_version 758459 (0.0013) [2024-06-15 21:30:00,738][1648984] Fps is (10 sec: 52430.2, 60 sec: 41506.2, 300 sec: 41987.5). Total num frames: 1553334272. Throughput: 0: 10569.9. Samples: 388414464. Policy #0 lag: (min: 21.0, avg: 130.0, max: 227.0) [2024-06-15 21:30:00,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 21:30:03,919][1652475] Updated weights for policy 0, policy_version 758521 (0.0013) [2024-06-15 21:30:05,403][1652475] Updated weights for policy 0, policy_version 758589 (0.0020) [2024-06-15 21:30:05,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 40960.0, 300 sec: 42209.6). Total num frames: 1553596416. Throughput: 0: 10831.6. Samples: 388450304. Policy #0 lag: (min: 21.0, avg: 130.0, max: 227.0) [2024-06-15 21:30:05,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 21:30:07,527][1652475] Updated weights for policy 0, policy_version 758643 (0.0011) [2024-06-15 21:30:09,007][1652475] Updated weights for policy 0, policy_version 758713 (0.0012) [2024-06-15 21:30:10,738][1648984] Fps is (10 sec: 52428.4, 60 sec: 43690.9, 300 sec: 42654.0). Total num frames: 1553858560. Throughput: 0: 10501.7. Samples: 388504064. 
Policy #0 lag: (min: 21.0, avg: 130.0, max: 227.0) [2024-06-15 21:30:10,741][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 21:30:15,419][1652475] Updated weights for policy 0, policy_version 758779 (0.0156) [2024-06-15 21:30:15,738][1648984] Fps is (10 sec: 39320.8, 60 sec: 43144.3, 300 sec: 42210.3). Total num frames: 1553989632. Throughput: 0: 10911.5. Samples: 388578304. Policy #0 lag: (min: 21.0, avg: 130.0, max: 227.0) [2024-06-15 21:30:15,739][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 21:30:16,637][1652475] Updated weights for policy 0, policy_version 758832 (0.0012) [2024-06-15 21:30:18,898][1652475] Updated weights for policy 0, policy_version 758896 (0.0017) [2024-06-15 21:30:20,711][1652475] Updated weights for policy 0, policy_version 758928 (0.0010) [2024-06-15 21:30:20,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 42052.3, 300 sec: 42320.7). Total num frames: 1554284544. Throughput: 0: 10956.8. Samples: 388610560. Policy #0 lag: (min: 21.0, avg: 130.0, max: 227.0) [2024-06-15 21:30:20,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 21:30:20,817][1651340] Signal inference workers to stop experience collection... (39000 times) [2024-06-15 21:30:20,895][1652475] InferenceWorker_p0-w0: stopping experience collection (39000 times) [2024-06-15 21:30:21,090][1651340] Signal inference workers to resume experience collection... (39000 times) [2024-06-15 21:30:21,091][1652475] InferenceWorker_p0-w0: resuming experience collection (39000 times) [2024-06-15 21:30:25,738][1648984] Fps is (10 sec: 39322.3, 60 sec: 43690.6, 300 sec: 41987.5). Total num frames: 1554382848. Throughput: 0: 10956.8. Samples: 388686336. 
Policy #0 lag: (min: 21.0, avg: 130.0, max: 227.0) [2024-06-15 21:30:25,739][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 21:30:26,606][1652475] Updated weights for policy 0, policy_version 759011 (0.0015) [2024-06-15 21:30:27,626][1652475] Updated weights for policy 0, policy_version 759059 (0.0012) [2024-06-15 21:30:29,025][1652475] Updated weights for policy 0, policy_version 759120 (0.0015) [2024-06-15 21:30:30,738][1648984] Fps is (10 sec: 49152.4, 60 sec: 44237.7, 300 sec: 42987.2). Total num frames: 1554776064. Throughput: 0: 11184.4. Samples: 388746752. Policy #0 lag: (min: 21.0, avg: 130.0, max: 227.0) [2024-06-15 21:30:30,740][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 21:30:32,341][1652475] Updated weights for policy 0, policy_version 759185 (0.0013) [2024-06-15 21:30:35,744][1648984] Fps is (10 sec: 52397.3, 60 sec: 43686.2, 300 sec: 42653.1). Total num frames: 1554907136. Throughput: 0: 11012.3. Samples: 388780544. Policy #0 lag: (min: 21.0, avg: 130.0, max: 227.0) [2024-06-15 21:30:35,744][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 21:30:37,944][1652475] Updated weights for policy 0, policy_version 759280 (0.0013) [2024-06-15 21:30:40,548][1652475] Updated weights for policy 0, policy_version 759360 (0.0011) [2024-06-15 21:30:40,738][1648984] Fps is (10 sec: 39321.0, 60 sec: 45875.1, 300 sec: 42653.9). Total num frames: 1555169280. Throughput: 0: 11320.9. Samples: 388854272. Policy #0 lag: (min: 21.0, avg: 130.0, max: 227.0) [2024-06-15 21:30:40,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 21:30:41,818][1652475] Updated weights for policy 0, policy_version 759415 (0.0011) [2024-06-15 21:30:44,096][1652475] Updated weights for policy 0, policy_version 759457 (0.0074) [2024-06-15 21:30:45,738][1648984] Fps is (10 sec: 52460.8, 60 sec: 44236.8, 300 sec: 42987.2). Total num frames: 1555431424. Throughput: 0: 11059.2. Samples: 388912128. 
Policy #0 lag: (min: 21.0, avg: 130.0, max: 227.0) [2024-06-15 21:30:45,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 21:30:49,061][1652475] Updated weights for policy 0, policy_version 759520 (0.0013) [2024-06-15 21:30:50,737][1648984] Fps is (10 sec: 39322.4, 60 sec: 45875.5, 300 sec: 42654.0). Total num frames: 1555562496. Throughput: 0: 11173.0. Samples: 388953088. Policy #0 lag: (min: 6.0, avg: 81.8, max: 262.0) [2024-06-15 21:30:50,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 21:30:50,852][1652475] Updated weights for policy 0, policy_version 759568 (0.0011) [2024-06-15 21:30:53,735][1652475] Updated weights for policy 0, policy_version 759635 (0.0013) [2024-06-15 21:30:55,738][1648984] Fps is (10 sec: 45874.9, 60 sec: 44783.0, 300 sec: 42876.1). Total num frames: 1555890176. Throughput: 0: 11400.5. Samples: 389017088. Policy #0 lag: (min: 6.0, avg: 81.8, max: 262.0) [2024-06-15 21:30:55,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 21:30:56,085][1652475] Updated weights for policy 0, policy_version 759740 (0.0228) [2024-06-15 21:30:56,151][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000759744_1555955712.pth... [2024-06-15 21:30:56,232][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000754624_1545469952.pth [2024-06-15 21:30:56,239][1651340] Saving a milestone train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/milestones/checkpoint_000759744_1555955712.pth [2024-06-15 21:31:00,754][1648984] Fps is (10 sec: 39256.4, 60 sec: 43678.6, 300 sec: 42318.3). Total num frames: 1555955712. Throughput: 0: 11180.3. Samples: 389081600. 
Policy #0 lag: (min: 6.0, avg: 81.8, max: 262.0) [2024-06-15 21:31:00,755][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 21:31:03,469][1652475] Updated weights for policy 0, policy_version 759824 (0.0096) [2024-06-15 21:31:04,758][1652475] Updated weights for policy 0, policy_version 759872 (0.0100) [2024-06-15 21:31:05,738][1648984] Fps is (10 sec: 32768.1, 60 sec: 43690.7, 300 sec: 42765.0). Total num frames: 1556217856. Throughput: 0: 11150.2. Samples: 389112320. Policy #0 lag: (min: 6.0, avg: 81.8, max: 262.0) [2024-06-15 21:31:05,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 21:31:06,877][1651340] Signal inference workers to stop experience collection... (39050 times) [2024-06-15 21:31:06,910][1652475] InferenceWorker_p0-w0: stopping experience collection (39050 times) [2024-06-15 21:31:07,156][1651340] Signal inference workers to resume experience collection... (39050 times) [2024-06-15 21:31:07,157][1652475] InferenceWorker_p0-w0: resuming experience collection (39050 times) [2024-06-15 21:31:08,033][1652475] Updated weights for policy 0, policy_version 759952 (0.0013) [2024-06-15 21:31:09,268][1652475] Updated weights for policy 0, policy_version 760000 (0.0013) [2024-06-15 21:31:10,738][1648984] Fps is (10 sec: 52515.2, 60 sec: 43690.7, 300 sec: 42653.9). Total num frames: 1556480000. Throughput: 0: 10638.2. Samples: 389165056. Policy #0 lag: (min: 6.0, avg: 81.8, max: 262.0) [2024-06-15 21:31:10,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 21:31:15,737][1648984] Fps is (10 sec: 32768.7, 60 sec: 42598.7, 300 sec: 42431.9). Total num frames: 1556545536. Throughput: 0: 10865.8. Samples: 389235712. 
Policy #0 lag: (min: 6.0, avg: 81.8, max: 262.0) [2024-06-15 21:31:15,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 21:31:16,129][1652475] Updated weights for policy 0, policy_version 760061 (0.0015) [2024-06-15 21:31:18,496][1652475] Updated weights for policy 0, policy_version 760130 (0.0013) [2024-06-15 21:31:20,556][1652475] Updated weights for policy 0, policy_version 760193 (0.0043) [2024-06-15 21:31:20,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 43144.5, 300 sec: 42653.9). Total num frames: 1556873216. Throughput: 0: 10730.7. Samples: 389263360. Policy #0 lag: (min: 6.0, avg: 81.8, max: 262.0) [2024-06-15 21:31:20,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 21:31:25,738][1648984] Fps is (10 sec: 45873.0, 60 sec: 43690.5, 300 sec: 42542.8). Total num frames: 1557004288. Throughput: 0: 10285.5. Samples: 389317120. Policy #0 lag: (min: 6.0, avg: 81.8, max: 262.0) [2024-06-15 21:31:25,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 21:31:29,188][1652475] Updated weights for policy 0, policy_version 760272 (0.0180) [2024-06-15 21:31:30,738][1648984] Fps is (10 sec: 29491.3, 60 sec: 39867.7, 300 sec: 42209.7). Total num frames: 1557168128. Throughput: 0: 10581.3. Samples: 389388288. Policy #0 lag: (min: 6.0, avg: 81.8, max: 262.0) [2024-06-15 21:31:30,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 21:31:32,098][1652475] Updated weights for policy 0, policy_version 760387 (0.0013) [2024-06-15 21:31:34,270][1652475] Updated weights for policy 0, policy_version 760480 (0.0012) [2024-06-15 21:31:35,738][1648984] Fps is (10 sec: 52430.4, 60 sec: 43695.1, 300 sec: 42653.9). Total num frames: 1557528576. Throughput: 0: 10046.6. Samples: 389405184. Policy #0 lag: (min: 6.0, avg: 81.8, max: 262.0) [2024-06-15 21:31:35,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 21:31:40,738][1648984] Fps is (10 sec: 36043.8, 60 sec: 39321.5, 300 sec: 42209.6). Total num frames: 1557528576. 
Throughput: 0: 10308.2. Samples: 389480960. Policy #0 lag: (min: 6.0, avg: 81.8, max: 262.0) [2024-06-15 21:31:40,739][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 21:31:41,624][1652475] Updated weights for policy 0, policy_version 760545 (0.0070) [2024-06-15 21:31:44,248][1652475] Updated weights for policy 0, policy_version 760656 (0.0124) [2024-06-15 21:31:45,342][1652475] Updated weights for policy 0, policy_version 760704 (0.0017) [2024-06-15 21:31:45,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 41506.1, 300 sec: 42431.8). Total num frames: 1557921792. Throughput: 0: 10004.7. Samples: 389531648. Policy #0 lag: (min: 6.0, avg: 81.8, max: 262.0) [2024-06-15 21:31:45,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 21:31:47,120][1652475] Updated weights for policy 0, policy_version 760762 (0.0013) [2024-06-15 21:31:50,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 41505.9, 300 sec: 42654.0). Total num frames: 1558052864. Throughput: 0: 10171.7. Samples: 389570048. Policy #0 lag: (min: 6.0, avg: 81.8, max: 262.0) [2024-06-15 21:31:50,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 21:31:53,181][1651340] Signal inference workers to stop experience collection... (39100 times) [2024-06-15 21:31:53,225][1652475] InferenceWorker_p0-w0: stopping experience collection (39100 times) [2024-06-15 21:31:53,463][1651340] Signal inference workers to resume experience collection... (39100 times) [2024-06-15 21:31:53,464][1652475] InferenceWorker_p0-w0: resuming experience collection (39100 times) [2024-06-15 21:31:53,569][1652475] Updated weights for policy 0, policy_version 760816 (0.0119) [2024-06-15 21:31:55,513][1652475] Updated weights for policy 0, policy_version 760895 (0.0013) [2024-06-15 21:31:55,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 40413.9, 300 sec: 42431.8). Total num frames: 1558315008. Throughput: 0: 10592.7. Samples: 389641728. 
Policy #0 lag: (min: 6.0, avg: 81.8, max: 262.0) [2024-06-15 21:31:55,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 21:31:58,062][1652475] Updated weights for policy 0, policy_version 760976 (0.0013) [2024-06-15 21:31:59,338][1652475] Updated weights for policy 0, policy_version 761024 (0.0113) [2024-06-15 21:32:00,738][1648984] Fps is (10 sec: 52429.9, 60 sec: 43702.6, 300 sec: 42653.9). Total num frames: 1558577152. Throughput: 0: 10353.7. Samples: 389701632. Policy #0 lag: (min: 6.0, avg: 81.8, max: 262.0) [2024-06-15 21:32:00,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 21:32:05,738][1648984] Fps is (10 sec: 39321.9, 60 sec: 41506.2, 300 sec: 42431.8). Total num frames: 1558708224. Throughput: 0: 10706.5. Samples: 389745152. Policy #0 lag: (min: 6.0, avg: 81.8, max: 262.0) [2024-06-15 21:32:05,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 21:32:06,132][1652475] Updated weights for policy 0, policy_version 761104 (0.0017) [2024-06-15 21:32:07,641][1652475] Updated weights for policy 0, policy_version 761156 (0.0014) [2024-06-15 21:32:09,297][1652475] Updated weights for policy 0, policy_version 761221 (0.0012) [2024-06-15 21:32:10,627][1652475] Updated weights for policy 0, policy_version 761276 (0.0011) [2024-06-15 21:32:10,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 43690.6, 300 sec: 42987.2). Total num frames: 1559101440. Throughput: 0: 10752.1. Samples: 389800960. Policy #0 lag: (min: 6.0, avg: 81.8, max: 262.0) [2024-06-15 21:32:10,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 21:32:15,738][1648984] Fps is (10 sec: 39321.1, 60 sec: 42598.2, 300 sec: 42653.9). Total num frames: 1559101440. Throughput: 0: 10854.4. Samples: 389876736. 
Policy #0 lag: (min: 6.0, avg: 81.8, max: 262.0) [2024-06-15 21:32:15,739][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 21:32:16,820][1652475] Updated weights for policy 0, policy_version 761318 (0.0012) [2024-06-15 21:32:18,922][1652475] Updated weights for policy 0, policy_version 761376 (0.0012) [2024-06-15 21:32:20,661][1652475] Updated weights for policy 0, policy_version 761440 (0.0012) [2024-06-15 21:32:20,738][1648984] Fps is (10 sec: 32768.3, 60 sec: 42598.4, 300 sec: 42542.9). Total num frames: 1559429120. Throughput: 0: 11252.6. Samples: 389911552. Policy #0 lag: (min: 6.0, avg: 81.8, max: 262.0) [2024-06-15 21:32:20,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 21:32:22,319][1652475] Updated weights for policy 0, policy_version 761505 (0.0011) [2024-06-15 21:32:25,738][1648984] Fps is (10 sec: 52429.1, 60 sec: 43690.9, 300 sec: 42653.9). Total num frames: 1559625728. Throughput: 0: 10877.2. Samples: 389970432. Policy #0 lag: (min: 6.0, avg: 81.8, max: 262.0) [2024-06-15 21:32:25,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 21:32:27,866][1652475] Updated weights for policy 0, policy_version 761552 (0.0012) [2024-06-15 21:32:28,791][1652475] Updated weights for policy 0, policy_version 761595 (0.0031) [2024-06-15 21:32:30,738][1648984] Fps is (10 sec: 36044.9, 60 sec: 43690.7, 300 sec: 42431.8). Total num frames: 1559789568. Throughput: 0: 11343.7. Samples: 390042112. Policy #0 lag: (min: 6.0, avg: 81.8, max: 262.0) [2024-06-15 21:32:30,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 21:32:31,918][1652475] Updated weights for policy 0, policy_version 761664 (0.0010) [2024-06-15 21:32:33,768][1651340] Signal inference workers to stop experience collection... 
(39150 times) [2024-06-15 21:32:33,829][1652475] InferenceWorker_p0-w0: stopping experience collection (39150 times) [2024-06-15 21:32:33,837][1652475] Updated weights for policy 0, policy_version 761730 (0.0012) [2024-06-15 21:32:34,112][1651340] Signal inference workers to resume experience collection... (39150 times) [2024-06-15 21:32:34,113][1652475] InferenceWorker_p0-w0: resuming experience collection (39150 times) [2024-06-15 21:32:35,740][1648984] Fps is (10 sec: 52428.7, 60 sec: 43690.7, 300 sec: 42765.0). Total num frames: 1560150016. Throughput: 0: 10979.6. Samples: 390064128. Policy #0 lag: (min: 101.0, avg: 186.4, max: 357.0) [2024-06-15 21:32:35,741][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 21:32:40,738][1648984] Fps is (10 sec: 36043.6, 60 sec: 43690.6, 300 sec: 42431.7). Total num frames: 1560150016. Throughput: 0: 10751.9. Samples: 390125568. Policy #0 lag: (min: 101.0, avg: 186.4, max: 357.0) [2024-06-15 21:32:40,739][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 21:32:41,116][1652475] Updated weights for policy 0, policy_version 761808 (0.0059) [2024-06-15 21:32:44,466][1652475] Updated weights for policy 0, policy_version 761904 (0.0163) [2024-06-15 21:32:45,738][1648984] Fps is (10 sec: 29491.5, 60 sec: 42052.3, 300 sec: 42542.9). Total num frames: 1560444928. Throughput: 0: 10661.0. Samples: 390181376. Policy #0 lag: (min: 101.0, avg: 186.4, max: 357.0) [2024-06-15 21:32:45,738][1648984] Avg episode reward: [(0, '-0.250')] [2024-06-15 21:32:46,234][1652475] Updated weights for policy 0, policy_version 761968 (0.0012) [2024-06-15 21:32:48,691][1652475] Updated weights for policy 0, policy_version 762033 (0.0011) [2024-06-15 21:32:50,738][1648984] Fps is (10 sec: 52430.5, 60 sec: 43690.9, 300 sec: 42654.0). Total num frames: 1560674304. Throughput: 0: 10296.9. Samples: 390208512. 
Policy #0 lag: (min: 101.0, avg: 186.4, max: 357.0) [2024-06-15 21:32:50,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 21:32:55,738][1648984] Fps is (10 sec: 26214.2, 60 sec: 39867.7, 300 sec: 41876.4). Total num frames: 1560707072. Throughput: 0: 10672.4. Samples: 390281216. Policy #0 lag: (min: 101.0, avg: 186.4, max: 357.0) [2024-06-15 21:32:55,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 21:32:56,275][1652475] Updated weights for policy 0, policy_version 762096 (0.0023) [2024-06-15 21:32:56,282][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000762096_1560772608.pth... [2024-06-15 21:32:56,416][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000757184_1550712832.pth [2024-06-15 21:32:59,387][1652475] Updated weights for policy 0, policy_version 762208 (0.0090) [2024-06-15 21:33:00,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 41506.2, 300 sec: 42209.6). Total num frames: 1561067520. Throughput: 0: 9875.9. Samples: 390321152. Policy #0 lag: (min: 101.0, avg: 186.4, max: 357.0) [2024-06-15 21:33:00,740][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 21:33:01,768][1652475] Updated weights for policy 0, policy_version 762276 (0.0012) [2024-06-15 21:33:05,738][1648984] Fps is (10 sec: 49152.3, 60 sec: 41506.1, 300 sec: 42209.6). Total num frames: 1561198592. Throughput: 0: 9898.7. Samples: 390356992. Policy #0 lag: (min: 101.0, avg: 186.4, max: 357.0) [2024-06-15 21:33:05,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 21:33:08,126][1652475] Updated weights for policy 0, policy_version 762321 (0.0014) [2024-06-15 21:33:10,481][1652475] Updated weights for policy 0, policy_version 762402 (0.0024) [2024-06-15 21:33:10,738][1648984] Fps is (10 sec: 36044.5, 60 sec: 38775.5, 300 sec: 42098.5). Total num frames: 1561427968. Throughput: 0: 10171.7. Samples: 390428160. 
Policy #0 lag: (min: 101.0, avg: 186.4, max: 357.0) [2024-06-15 21:33:10,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 21:33:12,383][1652475] Updated weights for policy 0, policy_version 762480 (0.0015) [2024-06-15 21:33:13,759][1652475] Updated weights for policy 0, policy_version 762554 (0.0011) [2024-06-15 21:33:15,738][1648984] Fps is (10 sec: 52428.4, 60 sec: 43690.7, 300 sec: 42542.9). Total num frames: 1561722880. Throughput: 0: 9830.4. Samples: 390484480. Policy #0 lag: (min: 101.0, avg: 186.4, max: 357.0) [2024-06-15 21:33:15,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 21:33:20,738][1648984] Fps is (10 sec: 36044.6, 60 sec: 39321.5, 300 sec: 42210.0). Total num frames: 1561788416. Throughput: 0: 10228.6. Samples: 390524416. Policy #0 lag: (min: 101.0, avg: 186.4, max: 357.0) [2024-06-15 21:33:20,739][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 21:33:22,040][1652475] Updated weights for policy 0, policy_version 762643 (0.0013) [2024-06-15 21:33:22,394][1651340] Signal inference workers to stop experience collection... (39200 times) [2024-06-15 21:33:22,482][1652475] InferenceWorker_p0-w0: stopping experience collection (39200 times) [2024-06-15 21:33:22,678][1651340] Signal inference workers to resume experience collection... (39200 times) [2024-06-15 21:33:22,679][1652475] InferenceWorker_p0-w0: resuming experience collection (39200 times) [2024-06-15 21:33:24,043][1652475] Updated weights for policy 0, policy_version 762694 (0.0026) [2024-06-15 21:33:25,738][1648984] Fps is (10 sec: 39321.0, 60 sec: 41506.0, 300 sec: 42209.6). Total num frames: 1562116096. Throughput: 0: 10149.0. Samples: 390582272. Policy #0 lag: (min: 101.0, avg: 186.4, max: 357.0) [2024-06-15 21:33:25,739][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 21:33:25,769][1652475] Updated weights for policy 0, policy_version 762768 (0.0011) [2024-06-15 21:33:30,738][1648984] Fps is (10 sec: 45875.2, 60 sec: 40959.9, 300 sec: 42209.6). 
Total num frames: 1562247168. Throughput: 0: 10365.1. Samples: 390647808. Policy #0 lag: (min: 101.0, avg: 186.4, max: 357.0) [2024-06-15 21:33:30,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 21:33:33,463][1652475] Updated weights for policy 0, policy_version 762832 (0.0013) [2024-06-15 21:33:34,975][1652475] Updated weights for policy 0, policy_version 762903 (0.0013) [2024-06-15 21:33:35,738][1648984] Fps is (10 sec: 36045.4, 60 sec: 38775.5, 300 sec: 42098.5). Total num frames: 1562476544. Throughput: 0: 10569.9. Samples: 390684160. Policy #0 lag: (min: 101.0, avg: 186.4, max: 357.0) [2024-06-15 21:33:35,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 21:33:36,369][1652475] Updated weights for policy 0, policy_version 762969 (0.0013) [2024-06-15 21:33:38,369][1652475] Updated weights for policy 0, policy_version 763061 (0.0011) [2024-06-15 21:33:40,738][1648984] Fps is (10 sec: 52428.1, 60 sec: 43690.7, 300 sec: 42209.6). Total num frames: 1562771456. Throughput: 0: 10148.9. Samples: 390737920. Policy #0 lag: (min: 101.0, avg: 186.4, max: 357.0) [2024-06-15 21:33:40,739][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 21:33:45,738][1648984] Fps is (10 sec: 29491.1, 60 sec: 38775.4, 300 sec: 41989.2). Total num frames: 1562771456. Throughput: 0: 10888.5. Samples: 390811136. Policy #0 lag: (min: 101.0, avg: 186.4, max: 357.0) [2024-06-15 21:33:45,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 21:33:46,271][1652475] Updated weights for policy 0, policy_version 763092 (0.0016) [2024-06-15 21:33:47,868][1652475] Updated weights for policy 0, policy_version 763169 (0.0011) [2024-06-15 21:33:49,547][1652475] Updated weights for policy 0, policy_version 763234 (0.0011) [2024-06-15 21:33:50,738][1648984] Fps is (10 sec: 42599.1, 60 sec: 42052.2, 300 sec: 42323.1). Total num frames: 1563197440. Throughput: 0: 10740.6. Samples: 390840320. 
Policy #0 lag: (min: 101.0, avg: 186.4, max: 357.0) [2024-06-15 21:33:50,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 21:33:51,757][1652475] Updated weights for policy 0, policy_version 763318 (0.0012) [2024-06-15 21:33:55,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 43144.5, 300 sec: 42209.6). Total num frames: 1563295744. Throughput: 0: 10456.2. Samples: 390898688. Policy #0 lag: (min: 101.0, avg: 186.4, max: 357.0) [2024-06-15 21:33:55,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 21:33:57,541][1652475] Updated weights for policy 0, policy_version 763360 (0.0012) [2024-06-15 21:33:59,420][1652475] Updated weights for policy 0, policy_version 763440 (0.0015) [2024-06-15 21:34:00,738][1648984] Fps is (10 sec: 36045.1, 60 sec: 41506.1, 300 sec: 42098.6). Total num frames: 1563557888. Throughput: 0: 10615.5. Samples: 390962176. Policy #0 lag: (min: 101.0, avg: 186.4, max: 357.0) [2024-06-15 21:34:00,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 21:34:02,715][1652475] Updated weights for policy 0, policy_version 763520 (0.0065) [2024-06-15 21:34:03,301][1651340] Signal inference workers to stop experience collection... (39250 times) [2024-06-15 21:34:03,450][1652475] InferenceWorker_p0-w0: stopping experience collection (39250 times) [2024-06-15 21:34:03,610][1651340] Signal inference workers to resume experience collection... (39250 times) [2024-06-15 21:34:03,611][1652475] InferenceWorker_p0-w0: resuming experience collection (39250 times) [2024-06-15 21:34:04,227][1652475] Updated weights for policy 0, policy_version 763577 (0.0138) [2024-06-15 21:34:05,738][1648984] Fps is (10 sec: 52429.2, 60 sec: 43690.7, 300 sec: 42654.0). Total num frames: 1563820032. Throughput: 0: 10262.8. Samples: 390986240. 
Policy #0 lag: (min: 101.0, avg: 186.4, max: 357.0) [2024-06-15 21:34:05,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 21:34:10,503][1652475] Updated weights for policy 0, policy_version 763635 (0.0012) [2024-06-15 21:34:10,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 42052.3, 300 sec: 42542.9). Total num frames: 1563951104. Throughput: 0: 10649.6. Samples: 391061504. Policy #0 lag: (min: 101.0, avg: 186.4, max: 357.0) [2024-06-15 21:34:10,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 21:34:12,182][1652475] Updated weights for policy 0, policy_version 763706 (0.0009) [2024-06-15 21:34:15,628][1652475] Updated weights for policy 0, policy_version 763792 (0.0013) [2024-06-15 21:34:15,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 42052.3, 300 sec: 42320.7). Total num frames: 1564246016. Throughput: 0: 10353.8. Samples: 391113728. Policy #0 lag: (min: 101.0, avg: 186.4, max: 357.0) [2024-06-15 21:34:15,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 21:34:16,819][1652475] Updated weights for policy 0, policy_version 763837 (0.0010) [2024-06-15 21:34:20,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 42598.5, 300 sec: 42653.9). Total num frames: 1564344320. Throughput: 0: 10274.2. Samples: 391146496. Policy #0 lag: (min: 101.0, avg: 186.4, max: 357.0) [2024-06-15 21:34:20,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 21:34:23,464][1652475] Updated weights for policy 0, policy_version 763905 (0.0112) [2024-06-15 21:34:24,889][1652475] Updated weights for policy 0, policy_version 763960 (0.0015) [2024-06-15 21:34:25,738][1648984] Fps is (10 sec: 36044.5, 60 sec: 41506.2, 300 sec: 42320.9). Total num frames: 1564606464. Throughput: 0: 10661.0. Samples: 391217664. 
Policy #0 lag: (min: 15.0, avg: 82.0, max: 271.0) [2024-06-15 21:34:25,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 21:34:27,129][1652475] Updated weights for policy 0, policy_version 764016 (0.0123) [2024-06-15 21:34:28,896][1652475] Updated weights for policy 0, policy_version 764089 (0.0011) [2024-06-15 21:34:30,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 43690.8, 300 sec: 42653.9). Total num frames: 1564868608. Throughput: 0: 10342.4. Samples: 391276544. Policy #0 lag: (min: 15.0, avg: 82.0, max: 271.0) [2024-06-15 21:34:30,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 21:34:35,285][1652475] Updated weights for policy 0, policy_version 764149 (0.0017) [2024-06-15 21:34:35,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 42052.2, 300 sec: 42653.9). Total num frames: 1564999680. Throughput: 0: 10615.5. Samples: 391318016. Policy #0 lag: (min: 15.0, avg: 82.0, max: 271.0) [2024-06-15 21:34:35,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 21:34:37,067][1652475] Updated weights for policy 0, policy_version 764212 (0.0040) [2024-06-15 21:34:38,754][1652475] Updated weights for policy 0, policy_version 764257 (0.0012) [2024-06-15 21:34:40,738][1648984] Fps is (10 sec: 45874.9, 60 sec: 42598.5, 300 sec: 42542.9). Total num frames: 1565327360. Throughput: 0: 10490.3. Samples: 391370752. Policy #0 lag: (min: 15.0, avg: 82.0, max: 271.0) [2024-06-15 21:34:40,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 21:34:41,004][1652475] Updated weights for policy 0, policy_version 764344 (0.0012) [2024-06-15 21:34:45,738][1648984] Fps is (10 sec: 39321.0, 60 sec: 43690.5, 300 sec: 42653.9). Total num frames: 1565392896. Throughput: 0: 10626.8. Samples: 391440384. 
Policy #0 lag: (min: 15.0, avg: 82.0, max: 271.0) [2024-06-15 21:34:45,740][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 21:34:47,210][1652475] Updated weights for policy 0, policy_version 764393 (0.0012) [2024-06-15 21:34:49,771][1652475] Updated weights for policy 0, policy_version 764448 (0.0097) [2024-06-15 21:34:50,738][1648984] Fps is (10 sec: 32768.0, 60 sec: 40960.0, 300 sec: 42209.6). Total num frames: 1565655040. Throughput: 0: 10797.5. Samples: 391472128. Policy #0 lag: (min: 15.0, avg: 82.0, max: 271.0) [2024-06-15 21:34:50,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 21:34:50,759][1651340] Signal inference workers to stop experience collection... (39300 times) [2024-06-15 21:34:50,804][1652475] InferenceWorker_p0-w0: stopping experience collection (39300 times) [2024-06-15 21:34:51,033][1651340] Signal inference workers to resume experience collection... (39300 times) [2024-06-15 21:34:51,033][1652475] InferenceWorker_p0-w0: resuming experience collection (39300 times) [2024-06-15 21:34:51,175][1652475] Updated weights for policy 0, policy_version 764497 (0.0013) [2024-06-15 21:34:52,890][1652475] Updated weights for policy 0, policy_version 764562 (0.0046) [2024-06-15 21:34:55,739][1648984] Fps is (10 sec: 52420.8, 60 sec: 43689.4, 300 sec: 42653.7). Total num frames: 1565917184. Throughput: 0: 10410.3. Samples: 391529984. Policy #0 lag: (min: 15.0, avg: 82.0, max: 271.0) [2024-06-15 21:34:55,740][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 21:34:55,747][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000764608_1565917184.pth... [2024-06-15 21:34:55,791][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000759744_1555955712.pth [2024-06-15 21:34:58,204][1652475] Updated weights for policy 0, policy_version 764611 (0.0011) [2024-06-15 21:35:00,738][1648984] Fps is (10 sec: 39321.2, 60 sec: 41506.0, 300 sec: 42209.6). 
Total num frames: 1566048256. Throughput: 0: 10706.5. Samples: 391595520. Policy #0 lag: (min: 15.0, avg: 82.0, max: 271.0) [2024-06-15 21:35:00,739][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 21:35:01,285][1652475] Updated weights for policy 0, policy_version 764673 (0.0014) [2024-06-15 21:35:03,339][1652475] Updated weights for policy 0, policy_version 764740 (0.0131) [2024-06-15 21:35:04,547][1652475] Updated weights for policy 0, policy_version 764800 (0.0011) [2024-06-15 21:35:05,738][1648984] Fps is (10 sec: 45883.3, 60 sec: 42598.4, 300 sec: 42431.8). Total num frames: 1566375936. Throughput: 0: 10615.5. Samples: 391624192. Policy #0 lag: (min: 15.0, avg: 82.0, max: 271.0) [2024-06-15 21:35:05,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 21:35:10,738][1648984] Fps is (10 sec: 39321.9, 60 sec: 41506.1, 300 sec: 42209.7). Total num frames: 1566441472. Throughput: 0: 10490.3. Samples: 391689728. Policy #0 lag: (min: 15.0, avg: 82.0, max: 271.0) [2024-06-15 21:35:10,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 21:35:11,966][1652475] Updated weights for policy 0, policy_version 764869 (0.0047) [2024-06-15 21:35:14,061][1652475] Updated weights for policy 0, policy_version 764947 (0.0014) [2024-06-15 21:35:15,738][1648984] Fps is (10 sec: 32768.1, 60 sec: 40960.0, 300 sec: 42098.6). Total num frames: 1566703616. Throughput: 0: 10547.2. Samples: 391751168. Policy #0 lag: (min: 15.0, avg: 82.0, max: 271.0) [2024-06-15 21:35:15,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 21:35:15,882][1652475] Updated weights for policy 0, policy_version 765009 (0.0026) [2024-06-15 21:35:17,177][1652475] Updated weights for policy 0, policy_version 765072 (0.0010) [2024-06-15 21:35:20,738][1648984] Fps is (10 sec: 52429.2, 60 sec: 43690.6, 300 sec: 42654.0). Total num frames: 1566965760. Throughput: 0: 10240.0. Samples: 391778816. 
Policy #0 lag: (min: 15.0, avg: 82.0, max: 271.0) [2024-06-15 21:35:20,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 21:35:24,769][1652475] Updated weights for policy 0, policy_version 765136 (0.0014) [2024-06-15 21:35:25,738][1648984] Fps is (10 sec: 36044.7, 60 sec: 40960.1, 300 sec: 41654.2). Total num frames: 1567064064. Throughput: 0: 10695.1. Samples: 391852032. Policy #0 lag: (min: 15.0, avg: 82.0, max: 271.0) [2024-06-15 21:35:25,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 21:35:26,608][1652475] Updated weights for policy 0, policy_version 765190 (0.0012) [2024-06-15 21:35:27,913][1652475] Updated weights for policy 0, policy_version 765251 (0.0013) [2024-06-15 21:35:29,435][1652475] Updated weights for policy 0, policy_version 765316 (0.0011) [2024-06-15 21:35:30,466][1652475] Updated weights for policy 0, policy_version 765374 (0.0012) [2024-06-15 21:35:30,743][1648984] Fps is (10 sec: 52401.8, 60 sec: 43686.9, 300 sec: 42654.1). Total num frames: 1567490048. Throughput: 0: 10307.1. Samples: 391904256. Policy #0 lag: (min: 15.0, avg: 82.0, max: 271.0) [2024-06-15 21:35:30,743][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 21:35:35,738][1648984] Fps is (10 sec: 42598.5, 60 sec: 41506.2, 300 sec: 41765.3). Total num frames: 1567490048. Throughput: 0: 10422.1. Samples: 391941120. Policy #0 lag: (min: 15.0, avg: 82.0, max: 271.0) [2024-06-15 21:35:35,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 21:35:37,821][1652475] Updated weights for policy 0, policy_version 765440 (0.0012) [2024-06-15 21:35:38,554][1651340] Signal inference workers to stop experience collection... (39350 times) [2024-06-15 21:35:38,588][1652475] InferenceWorker_p0-w0: stopping experience collection (39350 times) [2024-06-15 21:35:38,894][1651340] Signal inference workers to resume experience collection... 
(39350 times) [2024-06-15 21:35:38,895][1652475] InferenceWorker_p0-w0: resuming experience collection (39350 times) [2024-06-15 21:35:40,608][1652475] Updated weights for policy 0, policy_version 765509 (0.0011) [2024-06-15 21:35:40,738][1648984] Fps is (10 sec: 26227.7, 60 sec: 40413.8, 300 sec: 41765.3). Total num frames: 1567752192. Throughput: 0: 10604.5. Samples: 392007168. Policy #0 lag: (min: 15.0, avg: 82.0, max: 271.0) [2024-06-15 21:35:40,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 21:35:43,135][1652475] Updated weights for policy 0, policy_version 765622 (0.0225) [2024-06-15 21:35:45,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 43690.8, 300 sec: 42209.6). Total num frames: 1568014336. Throughput: 0: 10513.1. Samples: 392068608. Policy #0 lag: (min: 15.0, avg: 82.0, max: 271.0) [2024-06-15 21:35:45,739][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 21:35:49,747][1652475] Updated weights for policy 0, policy_version 765696 (0.0013) [2024-06-15 21:35:50,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 41506.1, 300 sec: 41543.2). Total num frames: 1568145408. Throughput: 0: 10808.9. Samples: 392110592. Policy #0 lag: (min: 15.0, avg: 82.0, max: 271.0) [2024-06-15 21:35:50,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 21:35:52,600][1652475] Updated weights for policy 0, policy_version 765778 (0.0022) [2024-06-15 21:35:54,115][1652475] Updated weights for policy 0, policy_version 765840 (0.0014) [2024-06-15 21:35:55,738][1648984] Fps is (10 sec: 52428.4, 60 sec: 43691.9, 300 sec: 42656.3). Total num frames: 1568538624. Throughput: 0: 10535.8. Samples: 392163840. Policy #0 lag: (min: 15.0, avg: 82.0, max: 271.0) [2024-06-15 21:35:55,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 21:36:00,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 41506.2, 300 sec: 41765.3). Total num frames: 1568538624. Throughput: 0: 10695.1. Samples: 392232448. 
Policy #0 lag: (min: 15.0, avg: 82.0, max: 271.0) [2024-06-15 21:36:00,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 21:36:01,647][1652475] Updated weights for policy 0, policy_version 765905 (0.0019) [2024-06-15 21:36:04,844][1652475] Updated weights for policy 0, policy_version 766001 (0.0134) [2024-06-15 21:36:05,738][1648984] Fps is (10 sec: 29490.6, 60 sec: 40959.8, 300 sec: 41876.4). Total num frames: 1568833536. Throughput: 0: 10808.8. Samples: 392265216. Policy #0 lag: (min: 15.0, avg: 82.0, max: 271.0) [2024-06-15 21:36:05,739][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 21:36:06,691][1652475] Updated weights for policy 0, policy_version 766065 (0.0030) [2024-06-15 21:36:08,476][1652475] Updated weights for policy 0, policy_version 766139 (0.0099) [2024-06-15 21:36:10,738][1648984] Fps is (10 sec: 52429.2, 60 sec: 43690.7, 300 sec: 42431.8). Total num frames: 1569062912. Throughput: 0: 10228.6. Samples: 392312320. Policy #0 lag: (min: 61.0, avg: 205.5, max: 381.0) [2024-06-15 21:36:10,738][1648984] Avg episode reward: [(0, '-0.230')] [2024-06-15 21:36:15,749][1648984] Fps is (10 sec: 29458.6, 60 sec: 40406.2, 300 sec: 41541.6). Total num frames: 1569128448. Throughput: 0: 10705.0. Samples: 392386048. Policy #0 lag: (min: 61.0, avg: 205.5, max: 381.0) [2024-06-15 21:36:15,750][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 21:36:16,036][1652475] Updated weights for policy 0, policy_version 766192 (0.0107) [2024-06-15 21:36:18,727][1652475] Updated weights for policy 0, policy_version 766288 (0.0011) [2024-06-15 21:36:20,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 41506.1, 300 sec: 42209.7). Total num frames: 1569456128. Throughput: 0: 10285.5. Samples: 392403968. Policy #0 lag: (min: 61.0, avg: 205.5, max: 381.0) [2024-06-15 21:36:20,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 21:36:21,804][1651340] Signal inference workers to stop experience collection... 
(39400 times) [2024-06-15 21:36:21,850][1652475] InferenceWorker_p0-w0: stopping experience collection (39400 times) [2024-06-15 21:36:21,884][1652475] Updated weights for policy 0, policy_version 766371 (0.0014) [2024-06-15 21:36:22,090][1651340] Signal inference workers to resume experience collection... (39400 times) [2024-06-15 21:36:22,091][1652475] InferenceWorker_p0-w0: resuming experience collection (39400 times) [2024-06-15 21:36:25,738][1648984] Fps is (10 sec: 45927.2, 60 sec: 42052.2, 300 sec: 42098.5). Total num frames: 1569587200. Throughput: 0: 10183.1. Samples: 392465408. Policy #0 lag: (min: 61.0, avg: 205.5, max: 381.0) [2024-06-15 21:36:25,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 21:36:28,575][1652475] Updated weights for policy 0, policy_version 766417 (0.0011) [2024-06-15 21:36:30,738][1648984] Fps is (10 sec: 36044.6, 60 sec: 38778.8, 300 sec: 41654.2). Total num frames: 1569816576. Throughput: 0: 10285.5. Samples: 392531456. Policy #0 lag: (min: 61.0, avg: 205.5, max: 381.0) [2024-06-15 21:36:30,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 21:36:30,976][1652475] Updated weights for policy 0, policy_version 766522 (0.0241) [2024-06-15 21:36:32,339][1652475] Updated weights for policy 0, policy_version 766583 (0.0012) [2024-06-15 21:36:34,872][1652475] Updated weights for policy 0, policy_version 766628 (0.0012) [2024-06-15 21:36:35,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 43690.6, 300 sec: 42654.0). Total num frames: 1570111488. Throughput: 0: 10046.6. Samples: 392562688. Policy #0 lag: (min: 61.0, avg: 205.5, max: 381.0) [2024-06-15 21:36:35,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 21:36:40,715][1652475] Updated weights for policy 0, policy_version 766704 (0.0014) [2024-06-15 21:36:40,740][1648984] Fps is (10 sec: 39311.0, 60 sec: 40958.2, 300 sec: 41653.9). Total num frames: 1570209792. Throughput: 0: 10467.0. Samples: 392634880. 
Policy #0 lag: (min: 61.0, avg: 205.5, max: 381.0) [2024-06-15 21:36:40,742][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 21:36:42,609][1652475] Updated weights for policy 0, policy_version 766800 (0.0135) [2024-06-15 21:36:45,698][1652475] Updated weights for policy 0, policy_version 766864 (0.0014) [2024-06-15 21:36:45,738][1648984] Fps is (10 sec: 42597.9, 60 sec: 42052.2, 300 sec: 42320.7). Total num frames: 1570537472. Throughput: 0: 10262.7. Samples: 392694272. Policy #0 lag: (min: 61.0, avg: 205.5, max: 381.0) [2024-06-15 21:36:45,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 21:36:50,738][1648984] Fps is (10 sec: 42607.5, 60 sec: 41505.8, 300 sec: 41765.2). Total num frames: 1570635776. Throughput: 0: 10251.3. Samples: 392726528. Policy #0 lag: (min: 61.0, avg: 205.5, max: 381.0) [2024-06-15 21:36:50,740][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 21:36:51,781][1652475] Updated weights for policy 0, policy_version 766930 (0.0012) [2024-06-15 21:36:53,206][1652475] Updated weights for policy 0, policy_version 766999 (0.0042) [2024-06-15 21:36:54,729][1652475] Updated weights for policy 0, policy_version 767056 (0.0244) [2024-06-15 21:36:55,738][1648984] Fps is (10 sec: 49150.6, 60 sec: 41505.9, 300 sec: 42209.6). Total num frames: 1571028992. Throughput: 0: 10808.8. Samples: 392798720. Policy #0 lag: (min: 61.0, avg: 205.5, max: 381.0) [2024-06-15 21:36:55,739][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 21:36:55,744][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000767104_1571028992.pth... 
[2024-06-15 21:36:55,749][1652475] Updated weights for policy 0, policy_version 767104 (0.0019) [2024-06-15 21:36:55,788][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000762096_1560772608.pth [2024-06-15 21:36:58,234][1652475] Updated weights for policy 0, policy_version 767159 (0.0012) [2024-06-15 21:37:00,738][1648984] Fps is (10 sec: 52431.4, 60 sec: 43690.7, 300 sec: 42209.6). Total num frames: 1571160064. Throughput: 0: 10675.0. Samples: 392866304. Policy #0 lag: (min: 61.0, avg: 205.5, max: 381.0) [2024-06-15 21:37:00,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 21:37:03,535][1652475] Updated weights for policy 0, policy_version 767226 (0.0129) [2024-06-15 21:37:05,738][1648984] Fps is (10 sec: 36046.3, 60 sec: 42598.6, 300 sec: 41654.2). Total num frames: 1571389440. Throughput: 0: 11081.9. Samples: 392902656. Policy #0 lag: (min: 61.0, avg: 205.5, max: 381.0) [2024-06-15 21:37:05,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 21:37:05,924][1652475] Updated weights for policy 0, policy_version 767283 (0.0012) [2024-06-15 21:37:06,263][1651340] Signal inference workers to stop experience collection... (39450 times) [2024-06-15 21:37:06,367][1652475] InferenceWorker_p0-w0: stopping experience collection (39450 times) [2024-06-15 21:37:06,525][1651340] Signal inference workers to resume experience collection... (39450 times) [2024-06-15 21:37:06,526][1652475] InferenceWorker_p0-w0: resuming experience collection (39450 times) [2024-06-15 21:37:07,539][1652475] Updated weights for policy 0, policy_version 767360 (0.0153) [2024-06-15 21:37:09,630][1652475] Updated weights for policy 0, policy_version 767423 (0.0011) [2024-06-15 21:37:10,738][1648984] Fps is (10 sec: 52428.5, 60 sec: 43690.6, 300 sec: 42653.9). Total num frames: 1571684352. Throughput: 0: 10968.2. Samples: 392958976. 
Policy #0 lag: (min: 61.0, avg: 205.5, max: 381.0) [2024-06-15 21:37:10,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 21:37:14,615][1652475] Updated weights for policy 0, policy_version 767472 (0.0014) [2024-06-15 21:37:15,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 44791.4, 300 sec: 41987.5). Total num frames: 1571815424. Throughput: 0: 11150.2. Samples: 393033216. Policy #0 lag: (min: 61.0, avg: 205.5, max: 381.0) [2024-06-15 21:37:15,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 21:37:17,377][1652475] Updated weights for policy 0, policy_version 767543 (0.0015) [2024-06-15 21:37:19,799][1652475] Updated weights for policy 0, policy_version 767609 (0.0015) [2024-06-15 21:37:20,738][1648984] Fps is (10 sec: 42598.7, 60 sec: 44236.7, 300 sec: 42320.7). Total num frames: 1572110336. Throughput: 0: 11264.0. Samples: 393069568. Policy #0 lag: (min: 61.0, avg: 205.5, max: 381.0) [2024-06-15 21:37:20,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 21:37:21,712][1652475] Updated weights for policy 0, policy_version 767680 (0.0012) [2024-06-15 21:37:25,739][1648984] Fps is (10 sec: 42598.1, 60 sec: 44236.8, 300 sec: 42209.6). Total num frames: 1572241408. Throughput: 0: 11082.6. Samples: 393133568. Policy #0 lag: (min: 61.0, avg: 205.5, max: 381.0) [2024-06-15 21:37:25,740][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 21:37:26,393][1652475] Updated weights for policy 0, policy_version 767737 (0.0015) [2024-06-15 21:37:28,922][1652475] Updated weights for policy 0, policy_version 767804 (0.0014) [2024-06-15 21:37:30,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 44782.9, 300 sec: 41876.4). Total num frames: 1572503552. Throughput: 0: 11161.6. Samples: 393196544. 
Policy #0 lag: (min: 61.0, avg: 205.5, max: 381.0) [2024-06-15 21:37:30,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 21:37:31,442][1652475] Updated weights for policy 0, policy_version 767863 (0.0011) [2024-06-15 21:37:33,485][1652475] Updated weights for policy 0, policy_version 767920 (0.0012) [2024-06-15 21:37:35,738][1648984] Fps is (10 sec: 49152.6, 60 sec: 43690.7, 300 sec: 42654.0). Total num frames: 1572732928. Throughput: 0: 11127.6. Samples: 393227264. Policy #0 lag: (min: 61.0, avg: 205.5, max: 381.0) [2024-06-15 21:37:35,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 21:37:39,246][1652475] Updated weights for policy 0, policy_version 767971 (0.0012) [2024-06-15 21:37:40,738][1648984] Fps is (10 sec: 42598.5, 60 sec: 45331.1, 300 sec: 42320.7). Total num frames: 1572929536. Throughput: 0: 11127.6. Samples: 393299456. Policy #0 lag: (min: 61.0, avg: 205.5, max: 381.0) [2024-06-15 21:37:40,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 21:37:40,998][1652475] Updated weights for policy 0, policy_version 768048 (0.0013) [2024-06-15 21:37:43,259][1652475] Updated weights for policy 0, policy_version 768115 (0.0011) [2024-06-15 21:37:45,421][1652475] Updated weights for policy 0, policy_version 768146 (0.0011) [2024-06-15 21:37:45,738][1648984] Fps is (10 sec: 45874.0, 60 sec: 44236.8, 300 sec: 42431.7). Total num frames: 1573191680. Throughput: 0: 10854.4. Samples: 393354752. Policy #0 lag: (min: 61.0, avg: 205.5, max: 381.0) [2024-06-15 21:37:45,739][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 21:37:50,738][1648984] Fps is (10 sec: 32767.6, 60 sec: 43691.0, 300 sec: 42542.8). Total num frames: 1573257216. Throughput: 0: 10774.7. Samples: 393387520. 
Policy #0 lag: (min: 61.0, avg: 205.5, max: 381.0) [2024-06-15 21:37:50,739][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 21:37:51,544][1652475] Updated weights for policy 0, policy_version 768225 (0.0164) [2024-06-15 21:37:54,001][1652475] Updated weights for policy 0, policy_version 768272 (0.0011) [2024-06-15 21:37:55,337][1651340] Signal inference workers to stop experience collection... (39500 times) [2024-06-15 21:37:55,378][1652475] InferenceWorker_p0-w0: stopping experience collection (39500 times) [2024-06-15 21:37:55,669][1651340] Signal inference workers to resume experience collection... (39500 times) [2024-06-15 21:37:55,671][1652475] InferenceWorker_p0-w0: resuming experience collection (39500 times) [2024-06-15 21:37:55,738][1648984] Fps is (10 sec: 36045.7, 60 sec: 42052.6, 300 sec: 42320.7). Total num frames: 1573552128. Throughput: 0: 10945.5. Samples: 393451520. Policy #0 lag: (min: 55.0, avg: 130.2, max: 311.0) [2024-06-15 21:37:55,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 21:37:56,672][1652475] Updated weights for policy 0, policy_version 768377 (0.0083) [2024-06-15 21:37:58,575][1652475] Updated weights for policy 0, policy_version 768417 (0.0012) [2024-06-15 21:38:00,738][1648984] Fps is (10 sec: 52429.5, 60 sec: 43690.7, 300 sec: 42653.9). Total num frames: 1573781504. Throughput: 0: 10581.3. Samples: 393509376. Policy #0 lag: (min: 55.0, avg: 130.2, max: 311.0) [2024-06-15 21:38:00,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 21:38:03,740][1652475] Updated weights for policy 0, policy_version 768496 (0.0153) [2024-06-15 21:38:05,738][1648984] Fps is (10 sec: 36044.4, 60 sec: 42052.2, 300 sec: 42320.7). Total num frames: 1573912576. Throughput: 0: 10661.0. Samples: 393549312. 
Policy #0 lag: (min: 55.0, avg: 130.2, max: 311.0) [2024-06-15 21:38:05,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 21:38:07,074][1652475] Updated weights for policy 0, policy_version 768563 (0.0040) [2024-06-15 21:38:08,924][1652475] Updated weights for policy 0, policy_version 768635 (0.0132) [2024-06-15 21:38:10,686][1652475] Updated weights for policy 0, policy_version 768688 (0.0011) [2024-06-15 21:38:10,738][1648984] Fps is (10 sec: 49151.0, 60 sec: 43144.5, 300 sec: 42542.8). Total num frames: 1574273024. Throughput: 0: 10399.3. Samples: 393601536. Policy #0 lag: (min: 55.0, avg: 130.2, max: 311.0) [2024-06-15 21:38:10,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 21:38:15,738][1648984] Fps is (10 sec: 42598.0, 60 sec: 42052.2, 300 sec: 42542.9). Total num frames: 1574338560. Throughput: 0: 10695.1. Samples: 393677824. Policy #0 lag: (min: 55.0, avg: 130.2, max: 311.0) [2024-06-15 21:38:15,739][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 21:38:16,770][1652475] Updated weights for policy 0, policy_version 768765 (0.0016) [2024-06-15 21:38:19,411][1652475] Updated weights for policy 0, policy_version 768805 (0.0015) [2024-06-15 21:38:20,738][1648984] Fps is (10 sec: 36045.1, 60 sec: 42052.2, 300 sec: 42431.8). Total num frames: 1574633472. Throughput: 0: 10660.9. Samples: 393707008. Policy #0 lag: (min: 55.0, avg: 130.2, max: 311.0) [2024-06-15 21:38:20,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 21:38:21,661][1652475] Updated weights for policy 0, policy_version 768901 (0.0123) [2024-06-15 21:38:25,738][1648984] Fps is (10 sec: 49152.3, 60 sec: 43144.5, 300 sec: 42653.9). Total num frames: 1574830080. Throughput: 0: 10262.7. Samples: 393761280. 
Policy #0 lag: (min: 55.0, avg: 130.2, max: 311.0) [2024-06-15 21:38:25,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 21:38:29,939][1652475] Updated weights for policy 0, policy_version 768963 (0.0016) [2024-06-15 21:38:30,738][1648984] Fps is (10 sec: 26214.6, 60 sec: 39867.7, 300 sec: 42098.6). Total num frames: 1574895616. Throughput: 0: 10592.8. Samples: 393831424. Policy #0 lag: (min: 55.0, avg: 130.2, max: 311.0) [2024-06-15 21:38:30,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 21:38:32,666][1652475] Updated weights for policy 0, policy_version 769088 (0.0013) [2024-06-15 21:38:34,180][1652475] Updated weights for policy 0, policy_version 769152 (0.0011) [2024-06-15 21:38:35,738][1648984] Fps is (10 sec: 49152.3, 60 sec: 43144.5, 300 sec: 42542.9). Total num frames: 1575321600. Throughput: 0: 10308.3. Samples: 393851392. Policy #0 lag: (min: 55.0, avg: 130.2, max: 311.0) [2024-06-15 21:38:35,738][1648984] Avg episode reward: [(0, '-0.200')] [2024-06-15 21:38:40,738][1648984] Fps is (10 sec: 45874.7, 60 sec: 40413.8, 300 sec: 42653.9). Total num frames: 1575354368. Throughput: 0: 10194.5. Samples: 393910272. Policy #0 lag: (min: 55.0, avg: 130.2, max: 311.0) [2024-06-15 21:38:40,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 21:38:42,134][1652475] Updated weights for policy 0, policy_version 769220 (0.0190) [2024-06-15 21:38:42,766][1651340] Signal inference workers to stop experience collection... (39550 times) [2024-06-15 21:38:42,824][1652475] InferenceWorker_p0-w0: stopping experience collection (39550 times) [2024-06-15 21:38:42,926][1651340] Signal inference workers to resume experience collection... 
(39550 times) [2024-06-15 21:38:42,926][1652475] InferenceWorker_p0-w0: resuming experience collection (39550 times) [2024-06-15 21:38:43,676][1652475] Updated weights for policy 0, policy_version 769285 (0.0011) [2024-06-15 21:38:45,568][1652475] Updated weights for policy 0, policy_version 769360 (0.0013) [2024-06-15 21:38:45,738][1648984] Fps is (10 sec: 32767.8, 60 sec: 40960.1, 300 sec: 42209.6). Total num frames: 1575649280. Throughput: 0: 10456.1. Samples: 393979904. Policy #0 lag: (min: 55.0, avg: 130.2, max: 311.0) [2024-06-15 21:38:45,739][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 21:38:47,472][1652475] Updated weights for policy 0, policy_version 769412 (0.0012) [2024-06-15 21:38:48,812][1652475] Updated weights for policy 0, policy_version 769464 (0.0039) [2024-06-15 21:38:50,738][1648984] Fps is (10 sec: 52429.2, 60 sec: 43690.7, 300 sec: 42653.9). Total num frames: 1575878656. Throughput: 0: 10217.3. Samples: 394009088. Policy #0 lag: (min: 55.0, avg: 130.2, max: 311.0) [2024-06-15 21:38:50,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 21:38:53,821][1652475] Updated weights for policy 0, policy_version 769488 (0.0012) [2024-06-15 21:38:55,790][1648984] Fps is (10 sec: 42376.2, 60 sec: 42015.5, 300 sec: 42424.2). Total num frames: 1576075264. Throughput: 0: 10716.8. Samples: 394084352. Policy #0 lag: (min: 55.0, avg: 130.2, max: 311.0) [2024-06-15 21:38:55,791][1648984] Avg episode reward: [(0, '-0.250')] [2024-06-15 21:38:55,825][1652475] Updated weights for policy 0, policy_version 769584 (0.0015) [2024-06-15 21:38:56,039][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000769600_1576140800.pth... 
[2024-06-15 21:38:56,123][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000764608_1565917184.pth [2024-06-15 21:38:58,573][1652475] Updated weights for policy 0, policy_version 769657 (0.0015) [2024-06-15 21:38:59,440][1652475] Updated weights for policy 0, policy_version 769683 (0.0105) [2024-06-15 21:39:00,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 43690.6, 300 sec: 42653.9). Total num frames: 1576402944. Throughput: 0: 10308.3. Samples: 394141696. Policy #0 lag: (min: 55.0, avg: 130.2, max: 311.0) [2024-06-15 21:39:00,738][1648984] Avg episode reward: [(0, '-0.250')] [2024-06-15 21:39:05,738][1648984] Fps is (10 sec: 39529.2, 60 sec: 42598.4, 300 sec: 42431.8). Total num frames: 1576468480. Throughput: 0: 10615.5. Samples: 394184704. Policy #0 lag: (min: 55.0, avg: 130.2, max: 311.0) [2024-06-15 21:39:05,738][1648984] Avg episode reward: [(0, '-0.250')] [2024-06-15 21:39:05,750][1652475] Updated weights for policy 0, policy_version 769764 (0.0011) [2024-06-15 21:39:07,194][1652475] Updated weights for policy 0, policy_version 769824 (0.0065) [2024-06-15 21:39:08,897][1652475] Updated weights for policy 0, policy_version 769877 (0.0014) [2024-06-15 21:39:10,738][1648984] Fps is (10 sec: 42598.0, 60 sec: 42598.5, 300 sec: 42653.9). Total num frames: 1576828928. Throughput: 0: 10979.6. Samples: 394255360. Policy #0 lag: (min: 55.0, avg: 130.2, max: 311.0) [2024-06-15 21:39:10,738][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 21:39:11,437][1652475] Updated weights for policy 0, policy_version 769979 (0.0011) [2024-06-15 21:39:15,738][1648984] Fps is (10 sec: 45875.2, 60 sec: 43144.6, 300 sec: 42653.9). Total num frames: 1576927232. Throughput: 0: 10854.4. Samples: 394319872. 
Policy #0 lag: (min: 55.0, avg: 130.2, max: 311.0) [2024-06-15 21:39:15,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 21:39:17,765][1652475] Updated weights for policy 0, policy_version 770032 (0.0013) [2024-06-15 21:39:19,520][1652475] Updated weights for policy 0, policy_version 770082 (0.0120) [2024-06-15 21:39:20,746][1648984] Fps is (10 sec: 36014.4, 60 sec: 42592.4, 300 sec: 42652.7). Total num frames: 1577189376. Throughput: 0: 11136.7. Samples: 394352640. Policy #0 lag: (min: 55.0, avg: 130.2, max: 311.0) [2024-06-15 21:39:20,747][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 21:39:21,192][1652475] Updated weights for policy 0, policy_version 770144 (0.0013) [2024-06-15 21:39:22,832][1652475] Updated weights for policy 0, policy_version 770208 (0.0035) [2024-06-15 21:39:25,738][1648984] Fps is (10 sec: 52427.3, 60 sec: 43690.5, 300 sec: 42653.9). Total num frames: 1577451520. Throughput: 0: 11036.4. Samples: 394406912. Policy #0 lag: (min: 55.0, avg: 130.2, max: 311.0) [2024-06-15 21:39:25,739][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 21:39:29,821][1651340] Signal inference workers to stop experience collection... (39600 times) [2024-06-15 21:39:29,825][1652475] Updated weights for policy 0, policy_version 770275 (0.0019) [2024-06-15 21:39:29,884][1652475] InferenceWorker_p0-w0: stopping experience collection (39600 times) [2024-06-15 21:39:30,022][1651340] Signal inference workers to resume experience collection... (39600 times) [2024-06-15 21:39:30,022][1652475] InferenceWorker_p0-w0: resuming experience collection (39600 times) [2024-06-15 21:39:30,738][1648984] Fps is (10 sec: 39354.9, 60 sec: 44782.9, 300 sec: 42653.9). Total num frames: 1577582592. Throughput: 0: 11002.3. Samples: 394475008. 
Policy #0 lag: (min: 55.0, avg: 130.2, max: 311.0) [2024-06-15 21:39:30,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 21:39:32,457][1652475] Updated weights for policy 0, policy_version 770320 (0.0012) [2024-06-15 21:39:34,294][1652475] Updated weights for policy 0, policy_version 770400 (0.0040) [2024-06-15 21:39:35,738][1648984] Fps is (10 sec: 42599.7, 60 sec: 42598.4, 300 sec: 42542.9). Total num frames: 1577877504. Throughput: 0: 11127.5. Samples: 394509824. Policy #0 lag: (min: 55.0, avg: 130.2, max: 311.0) [2024-06-15 21:39:35,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 21:39:35,981][1652475] Updated weights for policy 0, policy_version 770464 (0.0012) [2024-06-15 21:39:40,738][1648984] Fps is (10 sec: 39320.4, 60 sec: 43690.5, 300 sec: 42653.9). Total num frames: 1577975808. Throughput: 0: 10730.3. Samples: 394566656. Policy #0 lag: (min: 55.0, avg: 130.2, max: 311.0) [2024-06-15 21:39:40,739][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 21:39:41,103][1652475] Updated weights for policy 0, policy_version 770502 (0.0020) [2024-06-15 21:39:44,028][1652475] Updated weights for policy 0, policy_version 770562 (0.0011) [2024-06-15 21:39:45,738][1648984] Fps is (10 sec: 36045.0, 60 sec: 43144.6, 300 sec: 42653.9). Total num frames: 1578237952. Throughput: 0: 11036.5. Samples: 394638336. Policy #0 lag: (min: 15.0, avg: 105.5, max: 271.0) [2024-06-15 21:39:45,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 21:39:45,775][1652475] Updated weights for policy 0, policy_version 770640 (0.0062) [2024-06-15 21:39:48,289][1652475] Updated weights for policy 0, policy_version 770704 (0.0101) [2024-06-15 21:39:50,739][1648984] Fps is (10 sec: 52424.5, 60 sec: 43689.8, 300 sec: 42654.0). Total num frames: 1578500096. Throughput: 0: 10740.3. Samples: 394668032. 
Policy #0 lag: (min: 15.0, avg: 105.5, max: 271.0) [2024-06-15 21:39:50,740][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 21:39:54,261][1652475] Updated weights for policy 0, policy_version 770771 (0.0018) [2024-06-15 21:39:55,738][1648984] Fps is (10 sec: 39321.3, 60 sec: 42635.7, 300 sec: 42654.0). Total num frames: 1578631168. Throughput: 0: 10695.1. Samples: 394736640. Policy #0 lag: (min: 15.0, avg: 105.5, max: 271.0) [2024-06-15 21:39:55,738][1648984] Avg episode reward: [(0, '-0.530')] [2024-06-15 21:39:56,230][1652475] Updated weights for policy 0, policy_version 770848 (0.0014) [2024-06-15 21:39:57,840][1652475] Updated weights for policy 0, policy_version 770912 (0.0013) [2024-06-15 21:40:00,738][1648984] Fps is (10 sec: 42603.4, 60 sec: 42052.3, 300 sec: 42542.9). Total num frames: 1578926080. Throughput: 0: 10467.6. Samples: 394790912. Policy #0 lag: (min: 15.0, avg: 105.5, max: 271.0) [2024-06-15 21:40:00,742][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 21:40:01,275][1652475] Updated weights for policy 0, policy_version 770982 (0.0012) [2024-06-15 21:40:05,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 42598.4, 300 sec: 42653.9). Total num frames: 1579024384. Throughput: 0: 10458.1. Samples: 394823168. Policy #0 lag: (min: 15.0, avg: 105.5, max: 271.0) [2024-06-15 21:40:05,738][1648984] Avg episode reward: [(0, '-0.220')] [2024-06-15 21:40:07,242][1652475] Updated weights for policy 0, policy_version 771024 (0.0013) [2024-06-15 21:40:09,366][1652475] Updated weights for policy 0, policy_version 771104 (0.0013) [2024-06-15 21:40:10,738][1648984] Fps is (10 sec: 36044.7, 60 sec: 40960.0, 300 sec: 42653.9). Total num frames: 1579286528. Throughput: 0: 10695.2. Samples: 394888192. 
Policy #0 lag: (min: 15.0, avg: 105.5, max: 271.0) [2024-06-15 21:40:10,738][1648984] Avg episode reward: [(0, '-0.230')] [2024-06-15 21:40:11,651][1652475] Updated weights for policy 0, policy_version 771184 (0.0012) [2024-06-15 21:40:13,245][1651340] Signal inference workers to stop experience collection... (39650 times) [2024-06-15 21:40:13,356][1652475] InferenceWorker_p0-w0: stopping experience collection (39650 times) [2024-06-15 21:40:13,606][1651340] Signal inference workers to resume experience collection... (39650 times) [2024-06-15 21:40:13,607][1652475] InferenceWorker_p0-w0: resuming experience collection (39650 times) [2024-06-15 21:40:14,137][1652475] Updated weights for policy 0, policy_version 771259 (0.0092) [2024-06-15 21:40:15,738][1648984] Fps is (10 sec: 52429.2, 60 sec: 43690.7, 300 sec: 42653.9). Total num frames: 1579548672. Throughput: 0: 10308.3. Samples: 394938880. Policy #0 lag: (min: 15.0, avg: 105.5, max: 271.0) [2024-06-15 21:40:15,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 21:40:20,738][1648984] Fps is (10 sec: 32768.3, 60 sec: 40419.6, 300 sec: 42542.9). Total num frames: 1579614208. Throughput: 0: 10433.4. Samples: 394979328. Policy #0 lag: (min: 15.0, avg: 105.5, max: 271.0) [2024-06-15 21:40:20,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 21:40:21,389][1652475] Updated weights for policy 0, policy_version 771316 (0.0012) [2024-06-15 21:40:22,690][1652475] Updated weights for policy 0, policy_version 771380 (0.0124) [2024-06-15 21:40:23,619][1652475] Updated weights for policy 0, policy_version 771410 (0.0012) [2024-06-15 21:40:25,305][1652475] Updated weights for policy 0, policy_version 771457 (0.0011) [2024-06-15 21:40:25,738][1648984] Fps is (10 sec: 42598.0, 60 sec: 42052.5, 300 sec: 42321.4). Total num frames: 1579974656. Throughput: 0: 10467.6. Samples: 395037696. 
Policy #0 lag: (min: 15.0, avg: 105.5, max: 271.0) [2024-06-15 21:40:25,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 21:40:30,738][1648984] Fps is (10 sec: 45875.2, 60 sec: 41506.2, 300 sec: 42653.9). Total num frames: 1580072960. Throughput: 0: 10444.8. Samples: 395108352. Policy #0 lag: (min: 15.0, avg: 105.5, max: 271.0) [2024-06-15 21:40:30,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 21:40:31,887][1652475] Updated weights for policy 0, policy_version 771522 (0.0014) [2024-06-15 21:40:33,956][1652475] Updated weights for policy 0, policy_version 771600 (0.0012) [2024-06-15 21:40:35,738][1648984] Fps is (10 sec: 36044.8, 60 sec: 40960.0, 300 sec: 42653.9). Total num frames: 1580335104. Throughput: 0: 10502.0. Samples: 395140608. Policy #0 lag: (min: 15.0, avg: 105.5, max: 271.0) [2024-06-15 21:40:35,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 21:40:36,844][1652475] Updated weights for policy 0, policy_version 771711 (0.0119) [2024-06-15 21:40:38,837][1652475] Updated weights for policy 0, policy_version 771772 (0.0012) [2024-06-15 21:40:40,738][1648984] Fps is (10 sec: 52427.9, 60 sec: 43690.8, 300 sec: 42653.9). Total num frames: 1580597248. Throughput: 0: 10103.4. Samples: 395191296. Policy #0 lag: (min: 15.0, avg: 105.5, max: 271.0) [2024-06-15 21:40:40,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 21:40:45,738][1648984] Fps is (10 sec: 26214.6, 60 sec: 39321.6, 300 sec: 42209.6). Total num frames: 1580597248. Throughput: 0: 10615.5. Samples: 395268608. Policy #0 lag: (min: 15.0, avg: 105.5, max: 271.0) [2024-06-15 21:40:45,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 21:40:47,098][1652475] Updated weights for policy 0, policy_version 771840 (0.0014) [2024-06-15 21:40:49,288][1652475] Updated weights for policy 0, policy_version 771936 (0.0034) [2024-06-15 21:40:50,740][1648984] Fps is (10 sec: 45875.5, 60 sec: 42599.2, 300 sec: 42431.8). Total num frames: 1581056000. 
Throughput: 0: 10422.0. Samples: 395292160. Policy #0 lag: (min: 15.0, avg: 105.5, max: 271.0) [2024-06-15 21:40:50,741][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 21:40:51,361][1652475] Updated weights for policy 0, policy_version 772026 (0.0012) [2024-06-15 21:40:55,738][1648984] Fps is (10 sec: 52427.1, 60 sec: 41505.9, 300 sec: 42653.9). Total num frames: 1581121536. Throughput: 0: 10285.4. Samples: 395351040. Policy #0 lag: (min: 15.0, avg: 105.5, max: 271.0) [2024-06-15 21:40:55,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 21:40:55,743][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000772032_1581121536.pth... [2024-06-15 21:40:55,808][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000767104_1571028992.pth [2024-06-15 21:40:58,890][1652475] Updated weights for policy 0, policy_version 772086 (0.0015) [2024-06-15 21:41:00,738][1648984] Fps is (10 sec: 26213.7, 60 sec: 39867.5, 300 sec: 42320.7). Total num frames: 1581318144. Throughput: 0: 10660.9. Samples: 395418624. Policy #0 lag: (min: 15.0, avg: 105.5, max: 271.0) [2024-06-15 21:41:00,739][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 21:41:01,075][1651340] Signal inference workers to stop experience collection... (39700 times) [2024-06-15 21:41:01,095][1652475] Updated weights for policy 0, policy_version 772145 (0.0012) [2024-06-15 21:41:01,109][1652475] InferenceWorker_p0-w0: stopping experience collection (39700 times) [2024-06-15 21:41:01,379][1651340] Signal inference workers to resume experience collection... (39700 times) [2024-06-15 21:41:01,380][1652475] InferenceWorker_p0-w0: resuming experience collection (39700 times) [2024-06-15 21:41:03,622][1652475] Updated weights for policy 0, policy_version 772240 (0.0010) [2024-06-15 21:41:05,738][1648984] Fps is (10 sec: 52430.4, 60 sec: 43690.7, 300 sec: 42653.9). Total num frames: 1581645824. Throughput: 0: 10217.2. 
Samples: 395439104. Policy #0 lag: (min: 15.0, avg: 105.5, max: 271.0) [2024-06-15 21:41:05,741][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 21:41:09,798][1652475] Updated weights for policy 0, policy_version 772291 (0.0011) [2024-06-15 21:41:10,738][1648984] Fps is (10 sec: 39322.9, 60 sec: 40413.9, 300 sec: 42655.6). Total num frames: 1581711360. Throughput: 0: 10456.2. Samples: 395508224. Policy #0 lag: (min: 15.0, avg: 105.5, max: 271.0) [2024-06-15 21:41:10,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 21:41:11,100][1652475] Updated weights for policy 0, policy_version 772349 (0.0071) [2024-06-15 21:41:14,086][1652475] Updated weights for policy 0, policy_version 772411 (0.0038) [2024-06-15 21:41:15,062][1652475] Updated weights for policy 0, policy_version 772449 (0.0010) [2024-06-15 21:41:15,738][1648984] Fps is (10 sec: 36044.9, 60 sec: 40960.0, 300 sec: 42542.9). Total num frames: 1582006272. Throughput: 0: 10240.0. Samples: 395569152. Policy #0 lag: (min: 15.0, avg: 105.5, max: 271.0) [2024-06-15 21:41:15,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 21:41:17,135][1652475] Updated weights for policy 0, policy_version 772528 (0.0011) [2024-06-15 21:41:17,631][1652475] Updated weights for policy 0, policy_version 772544 (0.0016) [2024-06-15 21:41:20,738][1648984] Fps is (10 sec: 45875.3, 60 sec: 42598.4, 300 sec: 42653.9). Total num frames: 1582170112. Throughput: 0: 10126.2. Samples: 395596288. Policy #0 lag: (min: 15.0, avg: 105.5, max: 271.0) [2024-06-15 21:41:20,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 21:41:23,136][1652475] Updated weights for policy 0, policy_version 772605 (0.0015) [2024-06-15 21:41:25,738][1648984] Fps is (10 sec: 36044.4, 60 sec: 39867.7, 300 sec: 42542.8). Total num frames: 1582366720. Throughput: 0: 10524.5. Samples: 395664896. 
Policy #0 lag: (min: 15.0, avg: 105.5, max: 271.0) [2024-06-15 21:41:25,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 21:41:26,152][1652475] Updated weights for policy 0, policy_version 772665 (0.0016) [2024-06-15 21:41:27,996][1652475] Updated weights for policy 0, policy_version 772720 (0.0011) [2024-06-15 21:41:29,918][1652475] Updated weights for policy 0, policy_version 772792 (0.0032) [2024-06-15 21:41:30,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 43690.6, 300 sec: 42654.0). Total num frames: 1582694400. Throughput: 0: 10046.6. Samples: 395720704. Policy #0 lag: (min: 15.0, avg: 135.8, max: 279.0) [2024-06-15 21:41:30,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 21:41:35,738][1648984] Fps is (10 sec: 42598.7, 60 sec: 40960.0, 300 sec: 42654.3). Total num frames: 1582792704. Throughput: 0: 10365.2. Samples: 395758592. Policy #0 lag: (min: 15.0, avg: 135.8, max: 279.0) [2024-06-15 21:41:35,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 21:41:35,824][1652475] Updated weights for policy 0, policy_version 772856 (0.0108) [2024-06-15 21:41:38,470][1652475] Updated weights for policy 0, policy_version 772912 (0.0019) [2024-06-15 21:41:40,127][1652475] Updated weights for policy 0, policy_version 772960 (0.0029) [2024-06-15 21:41:40,738][1648984] Fps is (10 sec: 36044.4, 60 sec: 40960.0, 300 sec: 42431.8). Total num frames: 1583054848. Throughput: 0: 10479.0. Samples: 395822592. Policy #0 lag: (min: 15.0, avg: 135.8, max: 279.0) [2024-06-15 21:41:40,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 21:41:42,406][1652475] Updated weights for policy 0, policy_version 773047 (0.0017) [2024-06-15 21:41:45,738][1648984] Fps is (10 sec: 42598.1, 60 sec: 43690.6, 300 sec: 42654.0). Total num frames: 1583218688. Throughput: 0: 10285.6. Samples: 395881472. 
Policy #0 lag: (min: 15.0, avg: 135.8, max: 279.0) [2024-06-15 21:41:45,738][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 21:41:47,605][1651340] Signal inference workers to stop experience collection... (39750 times) [2024-06-15 21:41:47,684][1652475] InferenceWorker_p0-w0: stopping experience collection (39750 times) [2024-06-15 21:41:47,822][1651340] Signal inference workers to resume experience collection... (39750 times) [2024-06-15 21:41:47,822][1652475] InferenceWorker_p0-w0: resuming experience collection (39750 times) [2024-06-15 21:41:47,824][1652475] Updated weights for policy 0, policy_version 773104 (0.0034) [2024-06-15 21:41:50,738][1648984] Fps is (10 sec: 29491.2, 60 sec: 38229.3, 300 sec: 41765.4). Total num frames: 1583349760. Throughput: 0: 10569.9. Samples: 395914752. Policy #0 lag: (min: 15.0, avg: 135.8, max: 279.0) [2024-06-15 21:41:50,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 21:41:51,387][1652475] Updated weights for policy 0, policy_version 773152 (0.0013) [2024-06-15 21:41:52,929][1652475] Updated weights for policy 0, policy_version 773217 (0.0012) [2024-06-15 21:41:55,302][1652475] Updated weights for policy 0, policy_version 773310 (0.0115) [2024-06-15 21:41:55,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 43690.8, 300 sec: 42653.9). Total num frames: 1583742976. Throughput: 0: 10365.1. Samples: 395974656. Policy #0 lag: (min: 15.0, avg: 135.8, max: 279.0) [2024-06-15 21:41:55,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 21:42:00,151][1652475] Updated weights for policy 0, policy_version 773363 (0.0015) [2024-06-15 21:42:00,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 42598.6, 300 sec: 42320.7). Total num frames: 1583874048. Throughput: 0: 10376.5. Samples: 396036096. 
Policy #0 lag: (min: 15.0, avg: 135.8, max: 279.0) [2024-06-15 21:42:00,739][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 21:42:03,780][1652475] Updated weights for policy 0, policy_version 773412 (0.0013) [2024-06-15 21:42:05,241][1652475] Updated weights for policy 0, policy_version 773472 (0.0009) [2024-06-15 21:42:05,738][1648984] Fps is (10 sec: 36044.7, 60 sec: 40959.9, 300 sec: 42098.6). Total num frames: 1584103424. Throughput: 0: 10592.7. Samples: 396072960. Policy #0 lag: (min: 15.0, avg: 135.8, max: 279.0) [2024-06-15 21:42:05,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 21:42:06,493][1652475] Updated weights for policy 0, policy_version 773520 (0.0013) [2024-06-15 21:42:10,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 42598.4, 300 sec: 42209.6). Total num frames: 1584267264. Throughput: 0: 10365.2. Samples: 396131328. Policy #0 lag: (min: 15.0, avg: 135.8, max: 279.0) [2024-06-15 21:42:10,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 21:42:11,659][1652475] Updated weights for policy 0, policy_version 773584 (0.0016) [2024-06-15 21:42:15,480][1652475] Updated weights for policy 0, policy_version 773664 (0.0015) [2024-06-15 21:42:15,738][1648984] Fps is (10 sec: 36045.4, 60 sec: 40960.0, 300 sec: 41876.4). Total num frames: 1584463872. Throughput: 0: 10615.5. Samples: 396198400. Policy #0 lag: (min: 15.0, avg: 135.8, max: 279.0) [2024-06-15 21:42:15,738][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 21:42:16,619][1652475] Updated weights for policy 0, policy_version 773698 (0.0012) [2024-06-15 21:42:19,004][1652475] Updated weights for policy 0, policy_version 773766 (0.0011) [2024-06-15 21:42:19,869][1652475] Updated weights for policy 0, policy_version 773818 (0.0012) [2024-06-15 21:42:20,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 43690.6, 300 sec: 42542.9). Total num frames: 1584791552. Throughput: 0: 10399.3. Samples: 396226560. 
Policy #0 lag: (min: 15.0, avg: 135.8, max: 279.0) [2024-06-15 21:42:20,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 21:42:25,738][1648984] Fps is (10 sec: 42597.5, 60 sec: 42052.2, 300 sec: 41987.4). Total num frames: 1584889856. Throughput: 0: 10604.1. Samples: 396299776. Policy #0 lag: (min: 15.0, avg: 135.8, max: 279.0) [2024-06-15 21:42:25,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 21:42:25,955][1652475] Updated weights for policy 0, policy_version 773883 (0.0114) [2024-06-15 21:42:27,670][1652475] Updated weights for policy 0, policy_version 773945 (0.0027) [2024-06-15 21:42:29,162][1652475] Updated weights for policy 0, policy_version 774000 (0.0013) [2024-06-15 21:42:30,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 41506.1, 300 sec: 42209.6). Total num frames: 1585184768. Throughput: 0: 10570.0. Samples: 396357120. Policy #0 lag: (min: 15.0, avg: 135.8, max: 279.0) [2024-06-15 21:42:30,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 21:42:31,144][1651340] Signal inference workers to stop experience collection... (39800 times) [2024-06-15 21:42:31,201][1652475] InferenceWorker_p0-w0: stopping experience collection (39800 times) [2024-06-15 21:42:31,347][1651340] Signal inference workers to resume experience collection... (39800 times) [2024-06-15 21:42:31,348][1652475] InferenceWorker_p0-w0: resuming experience collection (39800 times) [2024-06-15 21:42:31,350][1652475] Updated weights for policy 0, policy_version 774048 (0.0015) [2024-06-15 21:42:35,738][1648984] Fps is (10 sec: 42599.2, 60 sec: 42052.3, 300 sec: 41987.5). Total num frames: 1585315840. Throughput: 0: 10592.7. Samples: 396391424. 
Policy #0 lag: (min: 15.0, avg: 135.8, max: 279.0) [2024-06-15 21:42:35,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 21:42:38,852][1652475] Updated weights for policy 0, policy_version 774144 (0.0019) [2024-06-15 21:42:40,628][1652475] Updated weights for policy 0, policy_version 774194 (0.0012) [2024-06-15 21:42:40,738][1648984] Fps is (10 sec: 36044.9, 60 sec: 41506.2, 300 sec: 41876.4). Total num frames: 1585545216. Throughput: 0: 10717.9. Samples: 396456960. Policy #0 lag: (min: 15.0, avg: 135.8, max: 279.0) [2024-06-15 21:42:40,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 21:42:42,208][1652475] Updated weights for policy 0, policy_version 774268 (0.0012) [2024-06-15 21:42:43,819][1652475] Updated weights for policy 0, policy_version 774305 (0.0012) [2024-06-15 21:42:45,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 43690.7, 300 sec: 42654.0). Total num frames: 1585840128. Throughput: 0: 10615.5. Samples: 396513792. Policy #0 lag: (min: 15.0, avg: 135.8, max: 279.0) [2024-06-15 21:42:45,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 21:42:50,738][1648984] Fps is (10 sec: 32767.7, 60 sec: 42052.3, 300 sec: 41765.3). Total num frames: 1585872896. Throughput: 0: 10683.7. Samples: 396553728. Policy #0 lag: (min: 15.0, avg: 135.8, max: 279.0) [2024-06-15 21:42:50,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 21:42:50,889][1652475] Updated weights for policy 0, policy_version 774368 (0.0121) [2024-06-15 21:42:52,348][1652475] Updated weights for policy 0, policy_version 774423 (0.0012) [2024-06-15 21:42:54,075][1652475] Updated weights for policy 0, policy_version 774501 (0.0157) [2024-06-15 21:42:55,738][1648984] Fps is (10 sec: 42598.0, 60 sec: 42052.3, 300 sec: 42320.7). Total num frames: 1586266112. Throughput: 0: 10729.2. Samples: 396614144. 
Policy #0 lag: (min: 15.0, avg: 135.8, max: 279.0) [2024-06-15 21:42:55,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 21:42:55,948][1652475] Updated weights for policy 0, policy_version 774565 (0.0023) [2024-06-15 21:42:56,107][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000774576_1586331648.pth... [2024-06-15 21:42:56,170][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000769600_1576140800.pth [2024-06-15 21:43:00,738][1648984] Fps is (10 sec: 49152.2, 60 sec: 41506.2, 300 sec: 42209.6). Total num frames: 1586364416. Throughput: 0: 10854.4. Samples: 396686848. Policy #0 lag: (min: 15.0, avg: 135.8, max: 279.0) [2024-06-15 21:43:00,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 21:43:01,880][1652475] Updated weights for policy 0, policy_version 774595 (0.0013) [2024-06-15 21:43:04,098][1652475] Updated weights for policy 0, policy_version 774688 (0.0010) [2024-06-15 21:43:05,296][1652475] Updated weights for policy 0, policy_version 774742 (0.0099) [2024-06-15 21:43:05,738][1648984] Fps is (10 sec: 42596.4, 60 sec: 43144.2, 300 sec: 42098.5). Total num frames: 1586692096. Throughput: 0: 10911.2. Samples: 396717568. Policy #0 lag: (min: 15.0, avg: 135.8, max: 279.0) [2024-06-15 21:43:05,739][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 21:43:05,980][1652475] Updated weights for policy 0, policy_version 774783 (0.0028) [2024-06-15 21:43:10,738][1648984] Fps is (10 sec: 52428.2, 60 sec: 43690.6, 300 sec: 42542.9). Total num frames: 1586888704. Throughput: 0: 10604.1. Samples: 396776960. 
Policy #0 lag: (min: 15.0, avg: 135.8, max: 279.0) [2024-06-15 21:43:10,739][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 21:43:13,184][1652475] Updated weights for policy 0, policy_version 774850 (0.0012) [2024-06-15 21:43:14,546][1652475] Updated weights for policy 0, policy_version 774912 (0.0029) [2024-06-15 21:43:15,738][1648984] Fps is (10 sec: 32769.6, 60 sec: 42598.3, 300 sec: 41987.5). Total num frames: 1587019776. Throughput: 0: 10865.8. Samples: 396846080. Policy #0 lag: (min: 11.0, avg: 76.3, max: 267.0) [2024-06-15 21:43:15,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 21:43:17,771][1651340] Signal inference workers to stop experience collection... (39850 times) [2024-06-15 21:43:17,846][1652475] InferenceWorker_p0-w0: stopping experience collection (39850 times) [2024-06-15 21:43:18,020][1651340] Signal inference workers to resume experience collection... (39850 times) [2024-06-15 21:43:18,020][1652475] InferenceWorker_p0-w0: resuming experience collection (39850 times) [2024-06-15 21:43:18,844][1652475] Updated weights for policy 0, policy_version 775008 (0.0104) [2024-06-15 21:43:20,086][1652475] Updated weights for policy 0, policy_version 775061 (0.0011) [2024-06-15 21:43:20,738][1648984] Fps is (10 sec: 49152.8, 60 sec: 43144.6, 300 sec: 42542.9). Total num frames: 1587380224. Throughput: 0: 10808.9. Samples: 396877824. Policy #0 lag: (min: 11.0, avg: 76.3, max: 267.0) [2024-06-15 21:43:20,738][1648984] Avg episode reward: [(0, '-0.250')] [2024-06-15 21:43:24,783][1652475] Updated weights for policy 0, policy_version 775106 (0.0015) [2024-06-15 21:43:25,738][1648984] Fps is (10 sec: 49152.4, 60 sec: 43690.8, 300 sec: 42765.0). Total num frames: 1587511296. Throughput: 0: 10763.4. Samples: 396941312. 
Policy #0 lag: (min: 11.0, avg: 76.3, max: 267.0) [2024-06-15 21:43:25,738][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 21:43:26,051][1652475] Updated weights for policy 0, policy_version 775168 (0.0012) [2024-06-15 21:43:30,234][1652475] Updated weights for policy 0, policy_version 775235 (0.0136) [2024-06-15 21:43:30,738][1648984] Fps is (10 sec: 32767.0, 60 sec: 42052.1, 300 sec: 41987.4). Total num frames: 1587707904. Throughput: 0: 10934.0. Samples: 397005824. Policy #0 lag: (min: 11.0, avg: 76.3, max: 267.0) [2024-06-15 21:43:30,739][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 21:43:31,600][1652475] Updated weights for policy 0, policy_version 775290 (0.0012) [2024-06-15 21:43:33,495][1652475] Updated weights for policy 0, policy_version 775360 (0.0013) [2024-06-15 21:43:35,738][1648984] Fps is (10 sec: 42598.0, 60 sec: 43690.6, 300 sec: 42653.9). Total num frames: 1587937280. Throughput: 0: 10581.3. Samples: 397029888. Policy #0 lag: (min: 11.0, avg: 76.3, max: 267.0) [2024-06-15 21:43:35,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 21:43:37,839][1652475] Updated weights for policy 0, policy_version 775419 (0.0012) [2024-06-15 21:43:40,738][1648984] Fps is (10 sec: 39322.5, 60 sec: 42598.4, 300 sec: 42209.6). Total num frames: 1588101120. Throughput: 0: 10899.9. Samples: 397104640. Policy #0 lag: (min: 11.0, avg: 76.3, max: 267.0) [2024-06-15 21:43:40,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 21:43:41,449][1652475] Updated weights for policy 0, policy_version 775478 (0.0110) [2024-06-15 21:43:43,187][1652475] Updated weights for policy 0, policy_version 775522 (0.0012) [2024-06-15 21:43:44,951][1652475] Updated weights for policy 0, policy_version 775587 (0.0012) [2024-06-15 21:43:45,738][1648984] Fps is (10 sec: 52429.1, 60 sec: 43690.6, 300 sec: 42653.9). Total num frames: 1588461568. Throughput: 0: 10513.1. Samples: 397159936. 
Policy #0 lag: (min: 11.0, avg: 76.3, max: 267.0) [2024-06-15 21:43:45,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 21:43:48,328][1652475] Updated weights for policy 0, policy_version 775617 (0.0012) [2024-06-15 21:43:50,738][1648984] Fps is (10 sec: 49152.4, 60 sec: 45329.1, 300 sec: 42439.3). Total num frames: 1588592640. Throughput: 0: 10797.7. Samples: 397203456. Policy #0 lag: (min: 11.0, avg: 76.3, max: 267.0) [2024-06-15 21:43:50,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 21:43:52,510][1652475] Updated weights for policy 0, policy_version 775702 (0.0014) [2024-06-15 21:43:54,611][1652475] Updated weights for policy 0, policy_version 775760 (0.0013) [2024-06-15 21:43:55,738][1648984] Fps is (10 sec: 36044.6, 60 sec: 42598.4, 300 sec: 42098.5). Total num frames: 1588822016. Throughput: 0: 10956.8. Samples: 397270016. Policy #0 lag: (min: 11.0, avg: 76.3, max: 267.0) [2024-06-15 21:43:55,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 21:43:56,492][1652475] Updated weights for policy 0, policy_version 775824 (0.0015) [2024-06-15 21:44:00,738][1648984] Fps is (10 sec: 39319.2, 60 sec: 43690.3, 300 sec: 42431.7). Total num frames: 1588985856. Throughput: 0: 10751.9. Samples: 397329920. Policy #0 lag: (min: 11.0, avg: 76.3, max: 267.0) [2024-06-15 21:44:00,739][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 21:44:01,533][1652475] Updated weights for policy 0, policy_version 775904 (0.0037) [2024-06-15 21:44:04,880][1651340] Signal inference workers to stop experience collection... (39900 times) [2024-06-15 21:44:04,914][1652475] InferenceWorker_p0-w0: stopping experience collection (39900 times) [2024-06-15 21:44:05,091][1651340] Signal inference workers to resume experience collection... 
(39900 times) [2024-06-15 21:44:05,092][1652475] InferenceWorker_p0-w0: resuming experience collection (39900 times) [2024-06-15 21:44:05,210][1652475] Updated weights for policy 0, policy_version 775952 (0.0127) [2024-06-15 21:44:05,738][1648984] Fps is (10 sec: 36044.7, 60 sec: 41506.5, 300 sec: 41876.4). Total num frames: 1589182464. Throughput: 0: 10740.6. Samples: 397361152. Policy #0 lag: (min: 11.0, avg: 76.3, max: 267.0) [2024-06-15 21:44:05,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 21:44:07,651][1652475] Updated weights for policy 0, policy_version 776032 (0.0104) [2024-06-15 21:44:09,204][1652475] Updated weights for policy 0, policy_version 776096 (0.0011) [2024-06-15 21:44:10,739][1648984] Fps is (10 sec: 52423.3, 60 sec: 43689.6, 300 sec: 42653.7). Total num frames: 1589510144. Throughput: 0: 10512.7. Samples: 397414400. Policy #0 lag: (min: 11.0, avg: 76.3, max: 267.0) [2024-06-15 21:44:10,740][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 21:44:13,354][1652475] Updated weights for policy 0, policy_version 776144 (0.0009) [2024-06-15 21:44:15,738][1648984] Fps is (10 sec: 45874.7, 60 sec: 43690.6, 300 sec: 42210.8). Total num frames: 1589641216. Throughput: 0: 10570.0. Samples: 397481472. Policy #0 lag: (min: 11.0, avg: 76.3, max: 267.0) [2024-06-15 21:44:15,739][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 21:44:18,598][1652475] Updated weights for policy 0, policy_version 776195 (0.0012) [2024-06-15 21:44:20,197][1652475] Updated weights for policy 0, policy_version 776272 (0.0200) [2024-06-15 21:44:20,738][1648984] Fps is (10 sec: 32772.1, 60 sec: 40959.7, 300 sec: 41987.5). Total num frames: 1589837824. Throughput: 0: 10877.1. Samples: 397519360. 
Policy #0 lag: (min: 11.0, avg: 76.3, max: 267.0) [2024-06-15 21:44:20,739][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 21:44:22,525][1652475] Updated weights for policy 0, policy_version 776368 (0.0100) [2024-06-15 21:44:25,583][1652475] Updated weights for policy 0, policy_version 776405 (0.0012) [2024-06-15 21:44:25,738][1648984] Fps is (10 sec: 45875.5, 60 sec: 43144.4, 300 sec: 42431.8). Total num frames: 1590099968. Throughput: 0: 10513.1. Samples: 397577728. Policy #0 lag: (min: 11.0, avg: 76.3, max: 267.0) [2024-06-15 21:44:25,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 21:44:30,738][1648984] Fps is (10 sec: 32769.3, 60 sec: 40960.2, 300 sec: 41654.2). Total num frames: 1590165504. Throughput: 0: 10808.9. Samples: 397646336. Policy #0 lag: (min: 11.0, avg: 76.3, max: 267.0) [2024-06-15 21:44:30,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 21:44:30,778][1652475] Updated weights for policy 0, policy_version 776464 (0.0020) [2024-06-15 21:44:32,134][1652475] Updated weights for policy 0, policy_version 776528 (0.0011) [2024-06-15 21:44:34,779][1652475] Updated weights for policy 0, policy_version 776613 (0.0017) [2024-06-15 21:44:35,738][1648984] Fps is (10 sec: 45875.8, 60 sec: 43690.7, 300 sec: 42654.0). Total num frames: 1590558720. Throughput: 0: 10444.8. Samples: 397673472. Policy #0 lag: (min: 11.0, avg: 76.3, max: 267.0) [2024-06-15 21:44:35,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 21:44:40,738][1648984] Fps is (10 sec: 49152.0, 60 sec: 42598.4, 300 sec: 42098.5). Total num frames: 1590657024. Throughput: 0: 10319.7. Samples: 397734400. 
Policy #0 lag: (min: 11.0, avg: 76.3, max: 267.0) [2024-06-15 21:44:40,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 21:44:40,802][1652475] Updated weights for policy 0, policy_version 776698 (0.0015) [2024-06-15 21:44:45,183][1652475] Updated weights for policy 0, policy_version 776784 (0.0133) [2024-06-15 21:44:45,738][1648984] Fps is (10 sec: 32768.0, 60 sec: 40413.9, 300 sec: 41987.6). Total num frames: 1590886400. Throughput: 0: 10194.6. Samples: 397788672. Policy #0 lag: (min: 11.0, avg: 76.3, max: 267.0) [2024-06-15 21:44:45,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 21:44:46,762][1651340] Signal inference workers to stop experience collection... (39950 times) [2024-06-15 21:44:46,803][1652475] InferenceWorker_p0-w0: stopping experience collection (39950 times) [2024-06-15 21:44:47,072][1651340] Signal inference workers to resume experience collection... (39950 times) [2024-06-15 21:44:47,083][1652475] InferenceWorker_p0-w0: resuming experience collection (39950 times) [2024-06-15 21:44:47,242][1652475] Updated weights for policy 0, policy_version 776869 (0.0079) [2024-06-15 21:44:50,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 41506.1, 300 sec: 42209.6). Total num frames: 1591083008. Throughput: 0: 10046.6. Samples: 397813248. Policy #0 lag: (min: 11.0, avg: 76.3, max: 267.0) [2024-06-15 21:44:50,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 21:44:55,429][1652475] Updated weights for policy 0, policy_version 776944 (0.0014) [2024-06-15 21:44:55,738][1648984] Fps is (10 sec: 29491.0, 60 sec: 39321.6, 300 sec: 41543.2). Total num frames: 1591181312. Throughput: 0: 10638.6. Samples: 397893120. Policy #0 lag: (min: 11.0, avg: 76.3, max: 267.0) [2024-06-15 21:44:55,740][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 21:44:56,229][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000776976_1591246848.pth... 
[2024-06-15 21:44:56,399][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000772032_1581121536.pth [2024-06-15 21:44:57,827][1652475] Updated weights for policy 0, policy_version 777047 (0.0013) [2024-06-15 21:44:59,820][1652475] Updated weights for policy 0, policy_version 777123 (0.0097) [2024-06-15 21:45:00,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 43691.1, 300 sec: 42653.9). Total num frames: 1591607296. Throughput: 0: 10171.8. Samples: 397939200. Policy #0 lag: (min: 172.0, avg: 230.2, max: 396.0) [2024-06-15 21:45:00,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 21:45:05,738][1648984] Fps is (10 sec: 42598.6, 60 sec: 40413.9, 300 sec: 41765.3). Total num frames: 1591607296. Throughput: 0: 10194.6. Samples: 397978112. Policy #0 lag: (min: 172.0, avg: 230.2, max: 396.0) [2024-06-15 21:45:05,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 21:45:07,977][1652475] Updated weights for policy 0, policy_version 777186 (0.0014) [2024-06-15 21:45:10,043][1652475] Updated weights for policy 0, policy_version 777267 (0.0011) [2024-06-15 21:45:10,738][1648984] Fps is (10 sec: 29491.4, 60 sec: 39868.9, 300 sec: 41876.4). Total num frames: 1591902208. Throughput: 0: 10353.8. Samples: 398043648. Policy #0 lag: (min: 172.0, avg: 230.2, max: 396.0) [2024-06-15 21:45:10,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 21:45:11,717][1652475] Updated weights for policy 0, policy_version 777330 (0.0011) [2024-06-15 21:45:13,358][1652475] Updated weights for policy 0, policy_version 777408 (0.0013) [2024-06-15 21:45:15,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 41506.3, 300 sec: 42431.8). Total num frames: 1592131584. Throughput: 0: 10058.0. Samples: 398098944. Policy #0 lag: (min: 172.0, avg: 230.2, max: 396.0) [2024-06-15 21:45:15,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 21:45:20,740][1648984] Fps is (10 sec: 29490.8, 60 sec: 39321.8, 300 sec: 41432.1). 
Total num frames: 1592197120. Throughput: 0: 10296.9. Samples: 398136832. Policy #0 lag: (min: 172.0, avg: 230.2, max: 396.0) [2024-06-15 21:45:20,741][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 21:45:21,494][1652475] Updated weights for policy 0, policy_version 777474 (0.0047) [2024-06-15 21:45:23,414][1652475] Updated weights for policy 0, policy_version 777552 (0.0167) [2024-06-15 21:45:24,852][1652475] Updated weights for policy 0, policy_version 777605 (0.0013) [2024-06-15 21:45:25,738][1648984] Fps is (10 sec: 45875.2, 60 sec: 41506.2, 300 sec: 42431.8). Total num frames: 1592590336. Throughput: 0: 10160.4. Samples: 398191616. Policy #0 lag: (min: 172.0, avg: 230.2, max: 396.0) [2024-06-15 21:45:25,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 21:45:26,178][1652475] Updated weights for policy 0, policy_version 777664 (0.0201) [2024-06-15 21:45:30,738][1648984] Fps is (10 sec: 45875.2, 60 sec: 41506.1, 300 sec: 41765.3). Total num frames: 1592655872. Throughput: 0: 10410.6. Samples: 398257152. Policy #0 lag: (min: 172.0, avg: 230.2, max: 396.0) [2024-06-15 21:45:30,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 21:45:34,502][1651340] Signal inference workers to stop experience collection... (40000 times) [2024-06-15 21:45:34,564][1652475] InferenceWorker_p0-w0: stopping experience collection (40000 times) [2024-06-15 21:45:34,728][1651340] Signal inference workers to resume experience collection... (40000 times) [2024-06-15 21:45:34,729][1652475] InferenceWorker_p0-w0: resuming experience collection (40000 times) [2024-06-15 21:45:34,962][1652475] Updated weights for policy 0, policy_version 777732 (0.0105) [2024-06-15 21:45:35,738][1648984] Fps is (10 sec: 26214.5, 60 sec: 38229.3, 300 sec: 41543.2). Total num frames: 1592852480. Throughput: 0: 10661.0. Samples: 398292992. 
Policy #0 lag: (min: 172.0, avg: 230.2, max: 396.0) [2024-06-15 21:45:35,738][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 21:45:36,866][1652475] Updated weights for policy 0, policy_version 777808 (0.0011) [2024-06-15 21:45:38,896][1652475] Updated weights for policy 0, policy_version 777889 (0.0014) [2024-06-15 21:45:40,738][1648984] Fps is (10 sec: 52427.9, 60 sec: 42052.1, 300 sec: 42653.9). Total num frames: 1593180160. Throughput: 0: 9898.6. Samples: 398338560. Policy #0 lag: (min: 172.0, avg: 230.2, max: 396.0) [2024-06-15 21:45:40,739][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 21:45:45,576][1652475] Updated weights for policy 0, policy_version 777936 (0.0014) [2024-06-15 21:45:45,738][1648984] Fps is (10 sec: 36044.8, 60 sec: 38775.5, 300 sec: 41209.9). Total num frames: 1593212928. Throughput: 0: 10524.5. Samples: 398412800. Policy #0 lag: (min: 172.0, avg: 230.2, max: 396.0) [2024-06-15 21:45:45,738][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 21:45:47,423][1652475] Updated weights for policy 0, policy_version 778002 (0.0012) [2024-06-15 21:45:49,679][1652475] Updated weights for policy 0, policy_version 778110 (0.0011) [2024-06-15 21:45:50,738][1648984] Fps is (10 sec: 39322.5, 60 sec: 41506.1, 300 sec: 42209.7). Total num frames: 1593573376. Throughput: 0: 10240.0. Samples: 398438912. Policy #0 lag: (min: 172.0, avg: 230.2, max: 396.0) [2024-06-15 21:45:50,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 21:45:51,869][1652475] Updated weights for policy 0, policy_version 778167 (0.0011) [2024-06-15 21:45:55,738][1648984] Fps is (10 sec: 49151.6, 60 sec: 42052.3, 300 sec: 41987.5). Total num frames: 1593704448. Throughput: 0: 10057.9. Samples: 398496256. 
Policy #0 lag: (min: 172.0, avg: 230.2, max: 396.0) [2024-06-15 21:45:55,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 21:45:58,005][1652475] Updated weights for policy 0, policy_version 778214 (0.0012) [2024-06-15 21:45:59,862][1652475] Updated weights for policy 0, policy_version 778303 (0.0011) [2024-06-15 21:46:00,738][1648984] Fps is (10 sec: 42598.5, 60 sec: 39867.8, 300 sec: 41876.4). Total num frames: 1593999360. Throughput: 0: 10296.9. Samples: 398562304. Policy #0 lag: (min: 172.0, avg: 230.2, max: 396.0) [2024-06-15 21:46:00,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 21:46:01,839][1652475] Updated weights for policy 0, policy_version 778366 (0.0011) [2024-06-15 21:46:05,032][1652475] Updated weights for policy 0, policy_version 778426 (0.0020) [2024-06-15 21:46:05,739][1648984] Fps is (10 sec: 52423.7, 60 sec: 43689.9, 300 sec: 42431.6). Total num frames: 1594228736. Throughput: 0: 10126.0. Samples: 398592512. Policy #0 lag: (min: 172.0, avg: 230.2, max: 396.0) [2024-06-15 21:46:05,739][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 21:46:10,012][1652475] Updated weights for policy 0, policy_version 778488 (0.0010) [2024-06-15 21:46:10,738][1648984] Fps is (10 sec: 39321.3, 60 sec: 41506.1, 300 sec: 41987.5). Total num frames: 1594392576. Throughput: 0: 10547.2. Samples: 398666240. Policy #0 lag: (min: 172.0, avg: 230.2, max: 396.0) [2024-06-15 21:46:10,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 21:46:10,957][1652475] Updated weights for policy 0, policy_version 778529 (0.0011) [2024-06-15 21:46:13,141][1652475] Updated weights for policy 0, policy_version 778608 (0.0026) [2024-06-15 21:46:15,738][1648984] Fps is (10 sec: 39325.4, 60 sec: 41506.1, 300 sec: 42209.6). Total num frames: 1594621952. Throughput: 0: 10433.4. Samples: 398726656. 
Policy #0 lag: (min: 172.0, avg: 230.2, max: 396.0) [2024-06-15 21:46:15,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 21:46:16,046][1651340] Signal inference workers to stop experience collection... (40050 times) [2024-06-15 21:46:16,101][1652475] InferenceWorker_p0-w0: stopping experience collection (40050 times) [2024-06-15 21:46:16,336][1651340] Signal inference workers to resume experience collection... (40050 times) [2024-06-15 21:46:16,337][1652475] InferenceWorker_p0-w0: resuming experience collection (40050 times) [2024-06-15 21:46:16,626][1652475] Updated weights for policy 0, policy_version 778672 (0.0012) [2024-06-15 21:46:20,738][1648984] Fps is (10 sec: 39321.9, 60 sec: 43144.6, 300 sec: 42098.6). Total num frames: 1594785792. Throughput: 0: 10387.9. Samples: 398760448. Policy #0 lag: (min: 172.0, avg: 230.2, max: 396.0) [2024-06-15 21:46:20,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 21:46:21,217][1652475] Updated weights for policy 0, policy_version 778720 (0.0012) [2024-06-15 21:46:22,817][1652475] Updated weights for policy 0, policy_version 778768 (0.0012) [2024-06-15 21:46:24,763][1652475] Updated weights for policy 0, policy_version 778817 (0.0014) [2024-06-15 21:46:25,737][1648984] Fps is (10 sec: 49153.0, 60 sec: 42052.4, 300 sec: 42098.6). Total num frames: 1595113472. Throughput: 0: 10706.6. Samples: 398820352. Policy #0 lag: (min: 172.0, avg: 230.2, max: 396.0) [2024-06-15 21:46:25,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 21:46:26,116][1652475] Updated weights for policy 0, policy_version 778878 (0.0013) [2024-06-15 21:46:29,059][1652475] Updated weights for policy 0, policy_version 778939 (0.0114) [2024-06-15 21:46:30,740][1648984] Fps is (10 sec: 49151.6, 60 sec: 43690.7, 300 sec: 42320.7). Total num frames: 1595277312. Throughput: 0: 10501.7. Samples: 398885376. 
Policy #0 lag: (min: 172.0, avg: 230.2, max: 396.0) [2024-06-15 21:46:30,741][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 21:46:35,738][1648984] Fps is (10 sec: 26213.7, 60 sec: 42052.2, 300 sec: 41765.3). Total num frames: 1595375616. Throughput: 0: 10774.7. Samples: 398923776. Policy #0 lag: (min: 172.0, avg: 230.2, max: 396.0) [2024-06-15 21:46:35,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 21:46:36,197][1652475] Updated weights for policy 0, policy_version 779024 (0.0013) [2024-06-15 21:46:38,460][1652475] Updated weights for policy 0, policy_version 779120 (0.0016) [2024-06-15 21:46:40,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 41506.2, 300 sec: 42209.6). Total num frames: 1595670528. Throughput: 0: 10581.3. Samples: 398972416. Policy #0 lag: (min: 172.0, avg: 230.2, max: 396.0) [2024-06-15 21:46:40,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 21:46:42,081][1652475] Updated weights for policy 0, policy_version 779200 (0.0017) [2024-06-15 21:46:45,738][1648984] Fps is (10 sec: 42598.6, 60 sec: 43144.5, 300 sec: 42209.6). Total num frames: 1595801600. Throughput: 0: 10581.3. Samples: 399038464. Policy #0 lag: (min: 172.0, avg: 230.2, max: 396.0) [2024-06-15 21:46:45,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 21:46:48,836][1652475] Updated weights for policy 0, policy_version 779281 (0.0012) [2024-06-15 21:46:49,733][1652475] Updated weights for policy 0, policy_version 779322 (0.0013) [2024-06-15 21:46:50,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 41506.0, 300 sec: 41765.3). Total num frames: 1596063744. Throughput: 0: 10627.1. Samples: 399070720. 
Policy #0 lag: (min: 3.0, avg: 67.1, max: 259.0) [2024-06-15 21:46:50,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 21:46:51,579][1652475] Updated weights for policy 0, policy_version 779376 (0.0012) [2024-06-15 21:46:53,471][1652475] Updated weights for policy 0, policy_version 779424 (0.0046) [2024-06-15 21:46:55,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 43690.7, 300 sec: 42209.6). Total num frames: 1596325888. Throughput: 0: 10308.3. Samples: 399130112. Policy #0 lag: (min: 3.0, avg: 67.1, max: 259.0) [2024-06-15 21:46:55,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 21:46:55,744][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000779456_1596325888.pth... [2024-06-15 21:46:55,793][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000774576_1586331648.pth [2024-06-15 21:46:58,844][1652475] Updated weights for policy 0, policy_version 779459 (0.0010) [2024-06-15 21:47:00,351][1652475] Updated weights for policy 0, policy_version 779522 (0.0012) [2024-06-15 21:47:00,738][1648984] Fps is (10 sec: 42599.1, 60 sec: 41506.1, 300 sec: 41987.5). Total num frames: 1596489728. Throughput: 0: 10513.1. Samples: 399199744. Policy #0 lag: (min: 3.0, avg: 67.1, max: 259.0) [2024-06-15 21:47:00,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 21:47:03,793][1652475] Updated weights for policy 0, policy_version 779600 (0.0010) [2024-06-15 21:47:04,671][1651340] Signal inference workers to stop experience collection... (40100 times) [2024-06-15 21:47:04,719][1652475] InferenceWorker_p0-w0: stopping experience collection (40100 times) [2024-06-15 21:47:04,980][1651340] Signal inference workers to resume experience collection... 
(40100 times) [2024-06-15 21:47:04,981][1652475] InferenceWorker_p0-w0: resuming experience collection (40100 times) [2024-06-15 21:47:05,420][1652475] Updated weights for policy 0, policy_version 779664 (0.0010) [2024-06-15 21:47:05,738][1648984] Fps is (10 sec: 42598.7, 60 sec: 42053.0, 300 sec: 42320.7). Total num frames: 1596751872. Throughput: 0: 10376.5. Samples: 399227392. Policy #0 lag: (min: 3.0, avg: 67.1, max: 259.0) [2024-06-15 21:47:05,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 21:47:10,738][1648984] Fps is (10 sec: 36044.8, 60 sec: 40960.0, 300 sec: 41987.5). Total num frames: 1596850176. Throughput: 0: 10353.7. Samples: 399286272. Policy #0 lag: (min: 3.0, avg: 67.1, max: 259.0) [2024-06-15 21:47:10,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 21:47:11,395][1652475] Updated weights for policy 0, policy_version 779728 (0.0012) [2024-06-15 21:47:15,421][1652475] Updated weights for policy 0, policy_version 779792 (0.0107) [2024-06-15 21:47:15,738][1648984] Fps is (10 sec: 26214.2, 60 sec: 39867.7, 300 sec: 41432.1). Total num frames: 1597014016. Throughput: 0: 10467.6. Samples: 399356416. Policy #0 lag: (min: 3.0, avg: 67.1, max: 259.0) [2024-06-15 21:47:15,738][1648984] Avg episode reward: [(0, '-0.220')] [2024-06-15 21:47:17,264][1652475] Updated weights for policy 0, policy_version 779858 (0.0010) [2024-06-15 21:47:18,873][1652475] Updated weights for policy 0, policy_version 779923 (0.0015) [2024-06-15 21:47:20,738][1648984] Fps is (10 sec: 52428.5, 60 sec: 43144.5, 300 sec: 42320.7). Total num frames: 1597374464. Throughput: 0: 10103.5. Samples: 399378432. Policy #0 lag: (min: 3.0, avg: 67.1, max: 259.0) [2024-06-15 21:47:20,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 21:47:24,562][1652475] Updated weights for policy 0, policy_version 780007 (0.0012) [2024-06-15 21:47:25,738][1648984] Fps is (10 sec: 49152.3, 60 sec: 39867.6, 300 sec: 41765.3). Total num frames: 1597505536. 
Throughput: 0: 10501.7. Samples: 399444992. Policy #0 lag: (min: 3.0, avg: 67.1, max: 259.0) [2024-06-15 21:47:25,738][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 21:47:27,043][1652475] Updated weights for policy 0, policy_version 780049 (0.0033) [2024-06-15 21:47:28,277][1652475] Updated weights for policy 0, policy_version 780096 (0.0011) [2024-06-15 21:47:30,705][1652475] Updated weights for policy 0, policy_version 780192 (0.0124) [2024-06-15 21:47:30,739][1648984] Fps is (10 sec: 45870.1, 60 sec: 42597.6, 300 sec: 42431.6). Total num frames: 1597833216. Throughput: 0: 10444.5. Samples: 399508480. Policy #0 lag: (min: 3.0, avg: 67.1, max: 259.0) [2024-06-15 21:47:30,739][1648984] Avg episode reward: [(0, '-0.520')] [2024-06-15 21:47:35,738][1648984] Fps is (10 sec: 42598.1, 60 sec: 42598.4, 300 sec: 41987.5). Total num frames: 1597931520. Throughput: 0: 10444.8. Samples: 399540736. Policy #0 lag: (min: 3.0, avg: 67.1, max: 259.0) [2024-06-15 21:47:35,738][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 21:47:35,975][1652475] Updated weights for policy 0, policy_version 780256 (0.0012) [2024-06-15 21:47:39,151][1652475] Updated weights for policy 0, policy_version 780340 (0.0013) [2024-06-15 21:47:40,738][1648984] Fps is (10 sec: 32771.5, 60 sec: 41506.2, 300 sec: 41765.3). Total num frames: 1598160896. Throughput: 0: 10649.6. Samples: 399609344. Policy #0 lag: (min: 3.0, avg: 67.1, max: 259.0) [2024-06-15 21:47:40,738][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 21:47:42,703][1652475] Updated weights for policy 0, policy_version 780417 (0.0013) [2024-06-15 21:47:43,693][1652475] Updated weights for policy 0, policy_version 780480 (0.0014) [2024-06-15 21:47:45,738][1648984] Fps is (10 sec: 49152.3, 60 sec: 43690.7, 300 sec: 42542.9). Total num frames: 1598423040. Throughput: 0: 10490.3. Samples: 399671808. 
Policy #0 lag: (min: 3.0, avg: 67.1, max: 259.0) [2024-06-15 21:47:45,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 21:47:48,543][1652475] Updated weights for policy 0, policy_version 780539 (0.0011) [2024-06-15 21:47:49,763][1651340] Signal inference workers to stop experience collection... (40150 times) [2024-06-15 21:47:49,833][1652475] InferenceWorker_p0-w0: stopping experience collection (40150 times) [2024-06-15 21:47:50,042][1651340] Signal inference workers to resume experience collection... (40150 times) [2024-06-15 21:47:50,043][1652475] InferenceWorker_p0-w0: resuming experience collection (40150 times) [2024-06-15 21:47:50,249][1652475] Updated weights for policy 0, policy_version 780603 (0.0136) [2024-06-15 21:47:50,738][1648984] Fps is (10 sec: 52429.3, 60 sec: 43690.8, 300 sec: 42098.6). Total num frames: 1598685184. Throughput: 0: 10672.4. Samples: 399707648. Policy #0 lag: (min: 3.0, avg: 67.1, max: 259.0) [2024-06-15 21:47:50,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 21:47:54,376][1652475] Updated weights for policy 0, policy_version 780658 (0.0012) [2024-06-15 21:47:55,738][1648984] Fps is (10 sec: 49150.2, 60 sec: 43144.3, 300 sec: 42542.8). Total num frames: 1598914560. Throughput: 0: 10899.8. Samples: 399776768. Policy #0 lag: (min: 3.0, avg: 67.1, max: 259.0) [2024-06-15 21:47:55,739][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 21:47:55,760][1652475] Updated weights for policy 0, policy_version 780732 (0.0012) [2024-06-15 21:47:59,942][1652475] Updated weights for policy 0, policy_version 780772 (0.0013) [2024-06-15 21:48:00,738][1648984] Fps is (10 sec: 36044.5, 60 sec: 42598.3, 300 sec: 41876.5). Total num frames: 1599045632. Throughput: 0: 10683.7. Samples: 399837184. 
Policy #0 lag: (min: 3.0, avg: 67.1, max: 259.0) [2024-06-15 21:48:00,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 21:48:00,912][1652475] Updated weights for policy 0, policy_version 780801 (0.0015) [2024-06-15 21:48:04,660][1652475] Updated weights for policy 0, policy_version 780880 (0.0014) [2024-06-15 21:48:05,738][1648984] Fps is (10 sec: 42599.8, 60 sec: 43144.5, 300 sec: 42209.6). Total num frames: 1599340544. Throughput: 0: 10979.6. Samples: 399872512. Policy #0 lag: (min: 3.0, avg: 67.1, max: 259.0) [2024-06-15 21:48:05,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 21:48:09,196][1652475] Updated weights for policy 0, policy_version 780960 (0.0014) [2024-06-15 21:48:10,738][1648984] Fps is (10 sec: 42598.5, 60 sec: 43690.6, 300 sec: 42209.6). Total num frames: 1599471616. Throughput: 0: 10854.4. Samples: 399933440. Policy #0 lag: (min: 3.0, avg: 67.1, max: 259.0) [2024-06-15 21:48:10,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 21:48:11,402][1652475] Updated weights for policy 0, policy_version 781024 (0.0023) [2024-06-15 21:48:13,935][1652475] Updated weights for policy 0, policy_version 781072 (0.0021) [2024-06-15 21:48:15,085][1652475] Updated weights for policy 0, policy_version 781120 (0.0010) [2024-06-15 21:48:15,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 45329.1, 300 sec: 41876.4). Total num frames: 1599733760. Throughput: 0: 10900.2. Samples: 399998976. Policy #0 lag: (min: 3.0, avg: 67.1, max: 259.0) [2024-06-15 21:48:15,738][1648984] Avg episode reward: [(0, '-0.250')] [2024-06-15 21:48:20,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 41506.1, 300 sec: 41876.4). Total num frames: 1599864832. Throughput: 0: 10922.7. Samples: 400032256. 
Policy #0 lag: (min: 3.0, avg: 67.1, max: 259.0) [2024-06-15 21:48:20,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 21:48:21,325][1652475] Updated weights for policy 0, policy_version 781188 (0.0018) [2024-06-15 21:48:22,612][1652475] Updated weights for policy 0, policy_version 781250 (0.0119) [2024-06-15 21:48:23,770][1652475] Updated weights for policy 0, policy_version 781306 (0.0016) [2024-06-15 21:48:25,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 44236.8, 300 sec: 42209.7). Total num frames: 1600159744. Throughput: 0: 10934.1. Samples: 400101376. Policy #0 lag: (min: 3.0, avg: 67.1, max: 259.0) [2024-06-15 21:48:25,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 21:48:26,078][1652475] Updated weights for policy 0, policy_version 781344 (0.0011) [2024-06-15 21:48:26,912][1652475] Updated weights for policy 0, policy_version 781376 (0.0012) [2024-06-15 21:48:28,816][1652475] Updated weights for policy 0, policy_version 781437 (0.0022) [2024-06-15 21:48:30,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 42599.2, 300 sec: 42209.6). Total num frames: 1600389120. Throughput: 0: 10979.5. Samples: 400165888. Policy #0 lag: (min: 3.0, avg: 67.1, max: 259.0) [2024-06-15 21:48:30,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 21:48:34,288][1652475] Updated weights for policy 0, policy_version 781507 (0.0084) [2024-06-15 21:48:35,738][1648984] Fps is (10 sec: 49152.1, 60 sec: 45329.1, 300 sec: 42542.9). Total num frames: 1600651264. Throughput: 0: 11013.7. Samples: 400203264. Policy #0 lag: (min: 11.0, avg: 93.7, max: 267.0) [2024-06-15 21:48:35,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 21:48:36,953][1651340] Signal inference workers to stop experience collection... (40200 times) [2024-06-15 21:48:37,015][1652475] InferenceWorker_p0-w0: stopping experience collection (40200 times) [2024-06-15 21:48:37,288][1651340] Signal inference workers to resume experience collection... 
(40200 times) [2024-06-15 21:48:37,289][1652475] InferenceWorker_p0-w0: resuming experience collection (40200 times) [2024-06-15 21:48:37,491][1652475] Updated weights for policy 0, policy_version 781604 (0.0109) [2024-06-15 21:48:40,516][1652475] Updated weights for policy 0, policy_version 781652 (0.0011) [2024-06-15 21:48:40,738][1648984] Fps is (10 sec: 45874.9, 60 sec: 44782.9, 300 sec: 41987.5). Total num frames: 1600847872. Throughput: 0: 10877.2. Samples: 400266240. Policy #0 lag: (min: 11.0, avg: 93.7, max: 267.0) [2024-06-15 21:48:40,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 21:48:45,435][1652475] Updated weights for policy 0, policy_version 781714 (0.0013) [2024-06-15 21:48:45,738][1648984] Fps is (10 sec: 32767.7, 60 sec: 42598.4, 300 sec: 41987.5). Total num frames: 1600978944. Throughput: 0: 11093.3. Samples: 400336384. Policy #0 lag: (min: 11.0, avg: 93.7, max: 267.0) [2024-06-15 21:48:45,738][1648984] Avg episode reward: [(0, '-0.230')] [2024-06-15 21:48:47,021][1652475] Updated weights for policy 0, policy_version 781782 (0.0013) [2024-06-15 21:48:49,074][1652475] Updated weights for policy 0, policy_version 781843 (0.0011) [2024-06-15 21:48:50,738][1648984] Fps is (10 sec: 45874.6, 60 sec: 43690.5, 300 sec: 42320.7). Total num frames: 1601306624. Throughput: 0: 10843.0. Samples: 400360448. Policy #0 lag: (min: 11.0, avg: 93.7, max: 267.0) [2024-06-15 21:48:50,739][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 21:48:52,440][1652475] Updated weights for policy 0, policy_version 781895 (0.0015) [2024-06-15 21:48:53,499][1652475] Updated weights for policy 0, policy_version 781952 (0.0011) [2024-06-15 21:48:55,738][1648984] Fps is (10 sec: 45873.5, 60 sec: 42052.2, 300 sec: 42209.7). Total num frames: 1601437696. Throughput: 0: 10899.8. Samples: 400423936. 
Policy #0 lag: (min: 11.0, avg: 93.7, max: 267.0) [2024-06-15 21:48:55,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 21:48:55,795][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000781952_1601437696.pth... [2024-06-15 21:48:55,914][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000776976_1591246848.pth [2024-06-15 21:48:59,343][1652475] Updated weights for policy 0, policy_version 782016 (0.0011) [2024-06-15 21:49:00,527][1652475] Updated weights for policy 0, policy_version 782064 (0.0013) [2024-06-15 21:49:00,738][1648984] Fps is (10 sec: 36045.2, 60 sec: 43690.6, 300 sec: 42320.7). Total num frames: 1601667072. Throughput: 0: 10888.5. Samples: 400488960. Policy #0 lag: (min: 11.0, avg: 93.7, max: 267.0) [2024-06-15 21:49:00,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 21:49:02,456][1652475] Updated weights for policy 0, policy_version 782137 (0.0012) [2024-06-15 21:49:05,740][1648984] Fps is (10 sec: 45866.6, 60 sec: 42596.8, 300 sec: 41987.4). Total num frames: 1601896448. Throughput: 0: 10705.9. Samples: 400514048. Policy #0 lag: (min: 11.0, avg: 93.7, max: 267.0) [2024-06-15 21:49:05,741][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 21:49:05,997][1652475] Updated weights for policy 0, policy_version 782205 (0.0012) [2024-06-15 21:49:10,738][1648984] Fps is (10 sec: 29491.6, 60 sec: 41506.1, 300 sec: 41765.3). Total num frames: 1601961984. Throughput: 0: 10683.7. Samples: 400582144. Policy #0 lag: (min: 11.0, avg: 93.7, max: 267.0) [2024-06-15 21:49:10,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 21:49:12,120][1652475] Updated weights for policy 0, policy_version 782274 (0.0012) [2024-06-15 21:49:14,371][1652475] Updated weights for policy 0, policy_version 782368 (0.0012) [2024-06-15 21:49:15,738][1648984] Fps is (10 sec: 45885.8, 60 sec: 43690.7, 300 sec: 42431.8). Total num frames: 1602355200. Throughput: 0: 10490.3. 
Samples: 400637952. Policy #0 lag: (min: 11.0, avg: 93.7, max: 267.0) [2024-06-15 21:49:15,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 21:49:18,005][1652475] Updated weights for policy 0, policy_version 782408 (0.0013) [2024-06-15 21:49:20,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 43690.7, 300 sec: 41987.5). Total num frames: 1602486272. Throughput: 0: 10456.2. Samples: 400673792. Policy #0 lag: (min: 11.0, avg: 93.7, max: 267.0) [2024-06-15 21:49:20,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 21:49:23,243][1652475] Updated weights for policy 0, policy_version 782481 (0.0012) [2024-06-15 21:49:24,691][1651340] Signal inference workers to stop experience collection... (40250 times) [2024-06-15 21:49:24,791][1652475] InferenceWorker_p0-w0: stopping experience collection (40250 times) [2024-06-15 21:49:24,815][1652475] Updated weights for policy 0, policy_version 782551 (0.0012) [2024-06-15 21:49:24,961][1651340] Signal inference workers to resume experience collection... (40250 times) [2024-06-15 21:49:24,963][1652475] InferenceWorker_p0-w0: resuming experience collection (40250 times) [2024-06-15 21:49:25,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 43144.5, 300 sec: 42653.9). Total num frames: 1602748416. Throughput: 0: 10513.1. Samples: 400739328. Policy #0 lag: (min: 11.0, avg: 93.7, max: 267.0) [2024-06-15 21:49:25,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 21:49:27,111][1652475] Updated weights for policy 0, policy_version 782640 (0.0011) [2024-06-15 21:49:30,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 41506.1, 300 sec: 41765.3). Total num frames: 1602879488. Throughput: 0: 10308.3. Samples: 400800256. 
Policy #0 lag: (min: 11.0, avg: 93.7, max: 267.0) [2024-06-15 21:49:30,738][1648984] Avg episode reward: [(0, '-0.230')] [2024-06-15 21:49:33,787][1652475] Updated weights for policy 0, policy_version 782710 (0.0011) [2024-06-15 21:49:35,754][1648984] Fps is (10 sec: 32767.6, 60 sec: 40413.8, 300 sec: 42098.5). Total num frames: 1603076096. Throughput: 0: 10456.2. Samples: 400830976. Policy #0 lag: (min: 11.0, avg: 93.7, max: 267.0) [2024-06-15 21:49:35,756][1648984] Avg episode reward: [(0, '-0.250')] [2024-06-15 21:49:36,057][1652475] Updated weights for policy 0, policy_version 782768 (0.0014) [2024-06-15 21:49:37,726][1652475] Updated weights for policy 0, policy_version 782832 (0.0012) [2024-06-15 21:49:38,675][1652475] Updated weights for policy 0, policy_version 782867 (0.0011) [2024-06-15 21:49:40,738][1648984] Fps is (10 sec: 52429.1, 60 sec: 42598.5, 300 sec: 42431.8). Total num frames: 1603403776. Throughput: 0: 10228.7. Samples: 400884224. Policy #0 lag: (min: 11.0, avg: 93.7, max: 267.0) [2024-06-15 21:49:40,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 21:49:45,738][1648984] Fps is (10 sec: 32768.5, 60 sec: 40413.9, 300 sec: 41765.3). Total num frames: 1603403776. Throughput: 0: 10501.7. Samples: 400961536. Policy #0 lag: (min: 11.0, avg: 93.7, max: 267.0) [2024-06-15 21:49:45,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 21:49:46,509][1652475] Updated weights for policy 0, policy_version 782929 (0.0010) [2024-06-15 21:49:48,681][1652475] Updated weights for policy 0, policy_version 783012 (0.0013) [2024-06-15 21:49:50,172][1652475] Updated weights for policy 0, policy_version 783103 (0.0043) [2024-06-15 21:49:50,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 42052.4, 300 sec: 42876.1). Total num frames: 1603829760. Throughput: 0: 10547.7. Samples: 400988672. 
Policy #0 lag: (min: 11.0, avg: 93.7, max: 267.0) [2024-06-15 21:49:50,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 21:49:51,662][1652475] Updated weights for policy 0, policy_version 783162 (0.0100) [2024-06-15 21:49:55,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 41506.4, 300 sec: 41765.3). Total num frames: 1603928064. Throughput: 0: 10387.9. Samples: 401049600. Policy #0 lag: (min: 11.0, avg: 93.7, max: 267.0) [2024-06-15 21:49:55,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 21:50:00,720][1652475] Updated weights for policy 0, policy_version 783264 (0.0163) [2024-06-15 21:50:00,772][1648984] Fps is (10 sec: 29391.8, 60 sec: 40937.0, 300 sec: 42426.9). Total num frames: 1604124672. Throughput: 0: 10607.5. Samples: 401115648. Policy #0 lag: (min: 11.0, avg: 93.7, max: 267.0) [2024-06-15 21:50:00,772][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 21:50:02,341][1652475] Updated weights for policy 0, policy_version 783344 (0.0014) [2024-06-15 21:50:04,204][1652475] Updated weights for policy 0, policy_version 783424 (0.0012) [2024-06-15 21:50:05,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 42600.0, 300 sec: 42542.8). Total num frames: 1604452352. Throughput: 0: 10274.1. Samples: 401136128. Policy #0 lag: (min: 11.0, avg: 93.7, max: 267.0) [2024-06-15 21:50:05,738][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 21:50:10,738][1648984] Fps is (10 sec: 32879.1, 60 sec: 41506.1, 300 sec: 41765.3). Total num frames: 1604452352. Throughput: 0: 10376.5. Samples: 401206272. Policy #0 lag: (min: 11.0, avg: 93.7, max: 267.0) [2024-06-15 21:50:10,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 21:50:11,978][1651340] Signal inference workers to stop experience collection... (40300 times) [2024-06-15 21:50:12,071][1652475] InferenceWorker_p0-w0: stopping experience collection (40300 times) [2024-06-15 21:50:12,178][1651340] Signal inference workers to resume experience collection... 
(40300 times) [2024-06-15 21:50:12,199][1652475] InferenceWorker_p0-w0: resuming experience collection (40300 times) [2024-06-15 21:50:12,656][1652475] Updated weights for policy 0, policy_version 783489 (0.0092) [2024-06-15 21:50:14,270][1652475] Updated weights for policy 0, policy_version 783555 (0.0012) [2024-06-15 21:50:15,699][1652475] Updated weights for policy 0, policy_version 783632 (0.0010) [2024-06-15 21:50:15,738][1648984] Fps is (10 sec: 42598.7, 60 sec: 42052.3, 300 sec: 42987.2). Total num frames: 1604878336. Throughput: 0: 10376.6. Samples: 401267200. Policy #0 lag: (min: 11.0, avg: 93.7, max: 267.0) [2024-06-15 21:50:15,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 21:50:20,738][1648984] Fps is (10 sec: 52429.1, 60 sec: 41506.2, 300 sec: 41987.5). Total num frames: 1604976640. Throughput: 0: 10433.5. Samples: 401300480. Policy #0 lag: (min: 11.0, avg: 93.7, max: 267.0) [2024-06-15 21:50:20,738][1648984] Avg episode reward: [(0, '-0.250')] [2024-06-15 21:50:23,050][1652475] Updated weights for policy 0, policy_version 783684 (0.0021) [2024-06-15 21:50:24,146][1652475] Updated weights for policy 0, policy_version 783736 (0.0011) [2024-06-15 21:50:25,738][1648984] Fps is (10 sec: 32768.1, 60 sec: 40960.1, 300 sec: 42542.9). Total num frames: 1605206016. Throughput: 0: 10763.4. Samples: 401368576. Policy #0 lag: (min: 8.0, avg: 61.0, max: 264.0) [2024-06-15 21:50:25,738][1648984] Avg episode reward: [(0, '-0.140')] [2024-06-15 21:50:25,742][1652475] Updated weights for policy 0, policy_version 783793 (0.0013) [2024-06-15 21:50:27,200][1652475] Updated weights for policy 0, policy_version 783868 (0.0012) [2024-06-15 21:50:29,244][1652475] Updated weights for policy 0, policy_version 783929 (0.0012) [2024-06-15 21:50:30,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 43690.7, 300 sec: 42876.1). Total num frames: 1605500928. Throughput: 0: 10342.4. Samples: 401426944. 
Policy #0 lag: (min: 8.0, avg: 61.0, max: 264.0) [2024-06-15 21:50:30,738][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 21:50:35,738][1648984] Fps is (10 sec: 39321.1, 60 sec: 42052.3, 300 sec: 42098.6). Total num frames: 1605599232. Throughput: 0: 10524.4. Samples: 401462272. Policy #0 lag: (min: 8.0, avg: 61.0, max: 264.0) [2024-06-15 21:50:35,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 21:50:35,748][1652475] Updated weights for policy 0, policy_version 783992 (0.0016) [2024-06-15 21:50:37,469][1652475] Updated weights for policy 0, policy_version 784064 (0.0012) [2024-06-15 21:50:40,665][1652475] Updated weights for policy 0, policy_version 784132 (0.0066) [2024-06-15 21:50:40,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 41506.1, 300 sec: 42987.2). Total num frames: 1605894144. Throughput: 0: 10649.6. Samples: 401528832. Policy #0 lag: (min: 8.0, avg: 61.0, max: 264.0) [2024-06-15 21:50:40,738][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 21:50:42,051][1652475] Updated weights for policy 0, policy_version 784192 (0.0012) [2024-06-15 21:50:45,738][1648984] Fps is (10 sec: 42598.9, 60 sec: 43690.7, 300 sec: 42209.6). Total num frames: 1606025216. Throughput: 0: 10612.1. Samples: 401592832. Policy #0 lag: (min: 8.0, avg: 61.0, max: 264.0) [2024-06-15 21:50:45,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 21:50:47,514][1652475] Updated weights for policy 0, policy_version 784242 (0.0012) [2024-06-15 21:50:48,841][1652475] Updated weights for policy 0, policy_version 784291 (0.0012) [2024-06-15 21:50:50,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 40960.0, 300 sec: 42653.9). Total num frames: 1606287360. Throughput: 0: 10899.9. Samples: 401626624. 
Policy #0 lag: (min: 8.0, avg: 61.0, max: 264.0) [2024-06-15 21:50:50,745][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 21:50:51,502][1652475] Updated weights for policy 0, policy_version 784337 (0.0012) [2024-06-15 21:50:53,130][1651340] Signal inference workers to stop experience collection... (40350 times) [2024-06-15 21:50:53,169][1652475] InferenceWorker_p0-w0: stopping experience collection (40350 times) [2024-06-15 21:50:53,320][1651340] Signal inference workers to resume experience collection... (40350 times) [2024-06-15 21:50:53,321][1652475] InferenceWorker_p0-w0: resuming experience collection (40350 times) [2024-06-15 21:50:53,916][1652475] Updated weights for policy 0, policy_version 784443 (0.0012) [2024-06-15 21:50:55,757][1648984] Fps is (10 sec: 52326.4, 60 sec: 43676.5, 300 sec: 42540.0). Total num frames: 1606549504. Throughput: 0: 10599.5. Samples: 401683456. Policy #0 lag: (min: 8.0, avg: 61.0, max: 264.0) [2024-06-15 21:50:55,759][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 21:50:55,771][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000784448_1606549504.pth... [2024-06-15 21:50:55,850][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000779456_1596325888.pth [2024-06-15 21:50:55,857][1651340] Saving a milestone train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/milestones/checkpoint_000784448_1606549504.pth [2024-06-15 21:50:59,793][1652475] Updated weights for policy 0, policy_version 784512 (0.0018) [2024-06-15 21:51:00,738][1648984] Fps is (10 sec: 42598.2, 60 sec: 43168.9, 300 sec: 42320.8). Total num frames: 1606713344. Throughput: 0: 10865.8. Samples: 401756160. 
Policy #0 lag: (min: 8.0, avg: 61.0, max: 264.0) [2024-06-15 21:51:00,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 21:51:01,745][1652475] Updated weights for policy 0, policy_version 784575 (0.0012) [2024-06-15 21:51:04,647][1652475] Updated weights for policy 0, policy_version 784636 (0.0093) [2024-06-15 21:51:05,738][1648984] Fps is (10 sec: 45964.0, 60 sec: 42598.3, 300 sec: 42765.0). Total num frames: 1607008256. Throughput: 0: 10820.2. Samples: 401787392. Policy #0 lag: (min: 8.0, avg: 61.0, max: 264.0) [2024-06-15 21:51:05,739][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 21:51:06,225][1652475] Updated weights for policy 0, policy_version 784695 (0.0102) [2024-06-15 21:51:10,738][1648984] Fps is (10 sec: 36044.6, 60 sec: 43690.6, 300 sec: 42209.6). Total num frames: 1607073792. Throughput: 0: 10638.2. Samples: 401847296. Policy #0 lag: (min: 8.0, avg: 61.0, max: 264.0) [2024-06-15 21:51:10,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 21:51:11,960][1652475] Updated weights for policy 0, policy_version 784741 (0.0013) [2024-06-15 21:51:14,775][1652475] Updated weights for policy 0, policy_version 784801 (0.0014) [2024-06-15 21:51:15,738][1648984] Fps is (10 sec: 32768.6, 60 sec: 40960.0, 300 sec: 42542.9). Total num frames: 1607335936. Throughput: 0: 10786.1. Samples: 401912320. Policy #0 lag: (min: 8.0, avg: 61.0, max: 264.0) [2024-06-15 21:51:15,739][1648984] Avg episode reward: [(0, '-0.160')] [2024-06-15 21:51:16,740][1652475] Updated weights for policy 0, policy_version 784880 (0.0011) [2024-06-15 21:51:17,804][1652475] Updated weights for policy 0, policy_version 784928 (0.0011) [2024-06-15 21:51:20,738][1648984] Fps is (10 sec: 52429.4, 60 sec: 43690.7, 300 sec: 42320.7). Total num frames: 1607598080. Throughput: 0: 10535.8. Samples: 401936384. 
Policy #0 lag: (min: 8.0, avg: 61.0, max: 264.0) [2024-06-15 21:51:20,742][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 21:51:23,718][1652475] Updated weights for policy 0, policy_version 784992 (0.0016) [2024-06-15 21:51:25,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 42052.2, 300 sec: 42209.6). Total num frames: 1607729152. Throughput: 0: 10535.8. Samples: 402002944. Policy #0 lag: (min: 8.0, avg: 61.0, max: 264.0) [2024-06-15 21:51:25,738][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 21:51:28,040][1652475] Updated weights for policy 0, policy_version 785072 (0.0015) [2024-06-15 21:51:29,128][1652475] Updated weights for policy 0, policy_version 785105 (0.0013) [2024-06-15 21:51:30,687][1652475] Updated weights for policy 0, policy_version 785175 (0.0012) [2024-06-15 21:51:30,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 42052.3, 300 sec: 42876.1). Total num frames: 1608024064. Throughput: 0: 10478.9. Samples: 402064384. Policy #0 lag: (min: 8.0, avg: 61.0, max: 264.0) [2024-06-15 21:51:30,751][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 21:51:31,770][1652475] Updated weights for policy 0, policy_version 785214 (0.0013) [2024-06-15 21:51:35,738][1648984] Fps is (10 sec: 42598.0, 60 sec: 42598.4, 300 sec: 42320.7). Total num frames: 1608155136. Throughput: 0: 10410.6. Samples: 402095104. Policy #0 lag: (min: 8.0, avg: 61.0, max: 264.0) [2024-06-15 21:51:35,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 21:51:36,917][1652475] Updated weights for policy 0, policy_version 785277 (0.0032) [2024-06-15 21:51:40,738][1648984] Fps is (10 sec: 36044.4, 60 sec: 41506.1, 300 sec: 42653.9). Total num frames: 1608384512. Throughput: 0: 10722.5. Samples: 402165760. 
Policy #0 lag: (min: 8.0, avg: 61.0, max: 264.0) [2024-06-15 21:51:40,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 21:51:41,054][1652475] Updated weights for policy 0, policy_version 785360 (0.0012) [2024-06-15 21:51:41,459][1651340] Signal inference workers to stop experience collection... (40400 times) [2024-06-15 21:51:41,493][1652475] InferenceWorker_p0-w0: stopping experience collection (40400 times) [2024-06-15 21:51:41,674][1651340] Signal inference workers to resume experience collection... (40400 times) [2024-06-15 21:51:41,684][1652475] InferenceWorker_p0-w0: resuming experience collection (40400 times) [2024-06-15 21:51:42,939][1652475] Updated weights for policy 0, policy_version 785440 (0.0012) [2024-06-15 21:51:45,738][1648984] Fps is (10 sec: 49152.3, 60 sec: 43690.6, 300 sec: 42654.0). Total num frames: 1608646656. Throughput: 0: 10353.8. Samples: 402222080. Policy #0 lag: (min: 8.0, avg: 61.0, max: 264.0) [2024-06-15 21:51:45,738][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 21:51:49,938][1652475] Updated weights for policy 0, policy_version 785524 (0.0013) [2024-06-15 21:51:50,738][1648984] Fps is (10 sec: 39321.2, 60 sec: 41506.0, 300 sec: 42209.6). Total num frames: 1608777728. Throughput: 0: 10547.2. Samples: 402262016. Policy #0 lag: (min: 8.0, avg: 61.0, max: 264.0) [2024-06-15 21:51:50,739][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 21:51:52,052][1652475] Updated weights for policy 0, policy_version 785600 (0.0052) [2024-06-15 21:51:54,465][1652475] Updated weights for policy 0, policy_version 785663 (0.0018) [2024-06-15 21:51:55,738][1648984] Fps is (10 sec: 45875.3, 60 sec: 42612.2, 300 sec: 42765.0). Total num frames: 1609105408. Throughput: 0: 10331.0. Samples: 402312192. 
Policy #0 lag: (min: 8.0, avg: 61.0, max: 264.0) [2024-06-15 21:51:55,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 21:51:56,189][1652475] Updated weights for policy 0, policy_version 785728 (0.0012) [2024-06-15 21:52:00,738][1648984] Fps is (10 sec: 39322.5, 60 sec: 40960.0, 300 sec: 42098.6). Total num frames: 1609170944. Throughput: 0: 10376.5. Samples: 402379264. Policy #0 lag: (min: 8.0, avg: 61.0, max: 264.0) [2024-06-15 21:52:00,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 21:52:04,680][1652475] Updated weights for policy 0, policy_version 785808 (0.0090) [2024-06-15 21:52:05,738][1648984] Fps is (10 sec: 29491.2, 60 sec: 39867.9, 300 sec: 42542.9). Total num frames: 1609400320. Throughput: 0: 10729.2. Samples: 402419200. Policy #0 lag: (min: 8.0, avg: 61.0, max: 264.0) [2024-06-15 21:52:05,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 21:52:06,584][1652475] Updated weights for policy 0, policy_version 785875 (0.0023) [2024-06-15 21:52:08,256][1652475] Updated weights for policy 0, policy_version 785952 (0.0012) [2024-06-15 21:52:10,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 43690.7, 300 sec: 42987.2). Total num frames: 1609695232. Throughput: 0: 10285.5. Samples: 402465792. Policy #0 lag: (min: 8.0, avg: 61.0, max: 264.0) [2024-06-15 21:52:10,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 21:52:15,738][1648984] Fps is (10 sec: 32768.0, 60 sec: 39867.8, 300 sec: 41876.4). Total num frames: 1609728000. Throughput: 0: 10649.6. Samples: 402543616. 
Policy #0 lag: (min: 15.0, avg: 68.3, max: 271.0) [2024-06-15 21:52:15,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 21:52:16,208][1652475] Updated weights for policy 0, policy_version 786017 (0.0012) [2024-06-15 21:52:18,231][1652475] Updated weights for policy 0, policy_version 786112 (0.0109) [2024-06-15 21:52:20,474][1652475] Updated weights for policy 0, policy_version 786208 (0.0123) [2024-06-15 21:52:20,738][1648984] Fps is (10 sec: 45875.0, 60 sec: 42598.4, 300 sec: 42876.1). Total num frames: 1610153984. Throughput: 0: 10501.7. Samples: 402567680. Policy #0 lag: (min: 15.0, avg: 68.3, max: 271.0) [2024-06-15 21:52:20,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 21:52:25,738][1648984] Fps is (10 sec: 49151.5, 60 sec: 41506.1, 300 sec: 41987.6). Total num frames: 1610219520. Throughput: 0: 10274.1. Samples: 402628096. Policy #0 lag: (min: 15.0, avg: 68.3, max: 271.0) [2024-06-15 21:52:25,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 21:52:28,910][1652475] Updated weights for policy 0, policy_version 786288 (0.0012) [2024-06-15 21:52:29,358][1651340] Signal inference workers to stop experience collection... (40450 times) [2024-06-15 21:52:29,391][1652475] InferenceWorker_p0-w0: stopping experience collection (40450 times) [2024-06-15 21:52:29,530][1651340] Signal inference workers to resume experience collection... (40450 times) [2024-06-15 21:52:29,531][1652475] InferenceWorker_p0-w0: resuming experience collection (40450 times) [2024-06-15 21:52:30,725][1652475] Updated weights for policy 0, policy_version 786368 (0.0092) [2024-06-15 21:52:30,738][1648984] Fps is (10 sec: 32767.8, 60 sec: 40959.9, 300 sec: 42542.9). Total num frames: 1610481664. Throughput: 0: 10524.4. Samples: 402695680. 
Policy #0 lag: (min: 15.0, avg: 68.3, max: 271.0) [2024-06-15 21:52:30,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 21:52:32,024][1652475] Updated weights for policy 0, policy_version 786419 (0.0037) [2024-06-15 21:52:33,445][1652475] Updated weights for policy 0, policy_version 786492 (0.0011) [2024-06-15 21:52:35,740][1648984] Fps is (10 sec: 52428.9, 60 sec: 43144.5, 300 sec: 42653.9). Total num frames: 1610743808. Throughput: 0: 10205.9. Samples: 402721280. Policy #0 lag: (min: 15.0, avg: 68.3, max: 271.0) [2024-06-15 21:52:35,741][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 21:52:40,738][1648984] Fps is (10 sec: 29491.0, 60 sec: 39867.7, 300 sec: 41876.4). Total num frames: 1610776576. Throughput: 0: 10752.0. Samples: 402796032. Policy #0 lag: (min: 15.0, avg: 68.3, max: 271.0) [2024-06-15 21:52:40,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 21:52:41,007][1652475] Updated weights for policy 0, policy_version 786533 (0.0012) [2024-06-15 21:52:43,148][1652475] Updated weights for policy 0, policy_version 786624 (0.0012) [2024-06-15 21:52:45,738][1648984] Fps is (10 sec: 45875.6, 60 sec: 42598.4, 300 sec: 42431.8). Total num frames: 1611202560. Throughput: 0: 10342.4. Samples: 402844672. Policy #0 lag: (min: 15.0, avg: 68.3, max: 271.0) [2024-06-15 21:52:45,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 21:52:45,807][1652475] Updated weights for policy 0, policy_version 786721 (0.0012) [2024-06-15 21:52:50,738][1648984] Fps is (10 sec: 49152.7, 60 sec: 41506.2, 300 sec: 41876.4). Total num frames: 1611268096. Throughput: 0: 10114.8. Samples: 402874368. Policy #0 lag: (min: 15.0, avg: 68.3, max: 271.0) [2024-06-15 21:52:50,739][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 21:52:53,063][1652475] Updated weights for policy 0, policy_version 786771 (0.0102) [2024-06-15 21:52:55,738][1648984] Fps is (10 sec: 29490.9, 60 sec: 39867.7, 300 sec: 42209.6). Total num frames: 1611497472. 
Throughput: 0: 10649.6. Samples: 402945024. Policy #0 lag: (min: 15.0, avg: 68.3, max: 271.0) [2024-06-15 21:52:55,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 21:52:55,883][1652475] Updated weights for policy 0, policy_version 786880 (0.0013) [2024-06-15 21:52:56,304][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000786896_1611563008.pth... [2024-06-15 21:52:56,449][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000781952_1601437696.pth [2024-06-15 21:52:57,731][1652475] Updated weights for policy 0, policy_version 786949 (0.0011) [2024-06-15 21:52:59,045][1652475] Updated weights for policy 0, policy_version 787006 (0.0013) [2024-06-15 21:53:00,738][1648984] Fps is (10 sec: 52429.2, 60 sec: 43690.7, 300 sec: 42209.6). Total num frames: 1611792384. Throughput: 0: 10058.0. Samples: 402996224. Policy #0 lag: (min: 15.0, avg: 68.3, max: 271.0) [2024-06-15 21:53:00,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 21:53:05,738][1648984] Fps is (10 sec: 32768.3, 60 sec: 40413.9, 300 sec: 41876.4). Total num frames: 1611825152. Throughput: 0: 10376.5. Samples: 403034624. Policy #0 lag: (min: 15.0, avg: 68.3, max: 271.0) [2024-06-15 21:53:05,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 21:53:06,482][1652475] Updated weights for policy 0, policy_version 787066 (0.0010) [2024-06-15 21:53:07,660][1652475] Updated weights for policy 0, policy_version 787104 (0.0012) [2024-06-15 21:53:08,453][1652475] Updated weights for policy 0, policy_version 787134 (0.0013) [2024-06-15 21:53:10,311][1652475] Updated weights for policy 0, policy_version 787187 (0.0014) [2024-06-15 21:53:10,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 41506.1, 300 sec: 42209.6). Total num frames: 1612185600. Throughput: 0: 10433.5. Samples: 403097600. 
Policy #0 lag: (min: 15.0, avg: 68.3, max: 271.0) [2024-06-15 21:53:10,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 21:53:11,069][1651340] Signal inference workers to stop experience collection... (40500 times) [2024-06-15 21:53:11,122][1652475] InferenceWorker_p0-w0: stopping experience collection (40500 times) [2024-06-15 21:53:11,335][1651340] Signal inference workers to resume experience collection... (40500 times) [2024-06-15 21:53:11,339][1652475] InferenceWorker_p0-w0: resuming experience collection (40500 times) [2024-06-15 21:53:11,938][1652475] Updated weights for policy 0, policy_version 787260 (0.0012) [2024-06-15 21:53:15,738][1648984] Fps is (10 sec: 49151.9, 60 sec: 43144.5, 300 sec: 42209.6). Total num frames: 1612316672. Throughput: 0: 10376.5. Samples: 403162624. Policy #0 lag: (min: 15.0, avg: 68.3, max: 271.0) [2024-06-15 21:53:15,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 21:53:18,783][1652475] Updated weights for policy 0, policy_version 787325 (0.0013) [2024-06-15 21:53:20,717][1652475] Updated weights for policy 0, policy_version 787380 (0.0112) [2024-06-15 21:53:20,738][1648984] Fps is (10 sec: 36044.8, 60 sec: 39867.8, 300 sec: 41987.5). Total num frames: 1612546048. Throughput: 0: 10467.6. Samples: 403192320. Policy #0 lag: (min: 15.0, avg: 68.3, max: 271.0) [2024-06-15 21:53:20,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 21:53:23,326][1652475] Updated weights for policy 0, policy_version 787459 (0.0012) [2024-06-15 21:53:24,637][1652475] Updated weights for policy 0, policy_version 787513 (0.0011) [2024-06-15 21:53:25,738][1648984] Fps is (10 sec: 52427.9, 60 sec: 43690.6, 300 sec: 42209.6). Total num frames: 1612840960. Throughput: 0: 9989.7. Samples: 403245568. Policy #0 lag: (min: 15.0, avg: 68.3, max: 271.0) [2024-06-15 21:53:25,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 21:53:30,738][1648984] Fps is (10 sec: 32767.7, 60 sec: 39867.8, 300 sec: 41432.1). 
Total num frames: 1612873728. Throughput: 0: 10638.2. Samples: 403323392. Policy #0 lag: (min: 15.0, avg: 68.3, max: 271.0) [2024-06-15 21:53:30,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 21:53:31,410][1652475] Updated weights for policy 0, policy_version 787568 (0.0013) [2024-06-15 21:53:33,301][1652475] Updated weights for policy 0, policy_version 787640 (0.0011) [2024-06-15 21:53:35,370][1652475] Updated weights for policy 0, policy_version 787696 (0.0015) [2024-06-15 21:53:35,766][1648984] Fps is (10 sec: 39210.4, 60 sec: 41486.5, 300 sec: 41983.4). Total num frames: 1613234176. Throughput: 0: 10415.4. Samples: 403343360. Policy #0 lag: (min: 15.0, avg: 68.3, max: 271.0) [2024-06-15 21:53:35,767][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 21:53:40,738][1648984] Fps is (10 sec: 49152.1, 60 sec: 43144.6, 300 sec: 41987.5). Total num frames: 1613365248. Throughput: 0: 10217.3. Samples: 403404800. Policy #0 lag: (min: 15.0, avg: 68.3, max: 271.0) [2024-06-15 21:53:40,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 21:53:43,777][1652475] Updated weights for policy 0, policy_version 787793 (0.0016) [2024-06-15 21:53:45,738][1648984] Fps is (10 sec: 29574.9, 60 sec: 38775.3, 300 sec: 41432.1). Total num frames: 1613529088. Throughput: 0: 10592.6. Samples: 403472896. Policy #0 lag: (min: 15.0, avg: 68.3, max: 271.0) [2024-06-15 21:53:45,739][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 21:53:45,851][1652475] Updated weights for policy 0, policy_version 787857 (0.0022) [2024-06-15 21:53:46,923][1652475] Updated weights for policy 0, policy_version 787905 (0.0013) [2024-06-15 21:53:48,247][1652475] Updated weights for policy 0, policy_version 787969 (0.0012) [2024-06-15 21:53:50,738][1648984] Fps is (10 sec: 52427.7, 60 sec: 43690.5, 300 sec: 42209.6). Total num frames: 1613889536. Throughput: 0: 10387.9. Samples: 403502080. 
Policy #0 lag: (min: 15.0, avg: 68.3, max: 271.0) [2024-06-15 21:53:50,739][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 21:53:54,765][1652475] Updated weights for policy 0, policy_version 788034 (0.0137) [2024-06-15 21:53:55,738][1648984] Fps is (10 sec: 42599.2, 60 sec: 40960.0, 300 sec: 41654.2). Total num frames: 1613955072. Throughput: 0: 10740.6. Samples: 403580928. Policy #0 lag: (min: 15.0, avg: 68.3, max: 271.0) [2024-06-15 21:53:55,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 21:53:56,241][1652475] Updated weights for policy 0, policy_version 788095 (0.0013) [2024-06-15 21:53:57,997][1652475] Updated weights for policy 0, policy_version 788160 (0.0012) [2024-06-15 21:53:58,139][1651340] Signal inference workers to stop experience collection... (40550 times) [2024-06-15 21:53:58,192][1652475] InferenceWorker_p0-w0: stopping experience collection (40550 times) [2024-06-15 21:53:58,451][1651340] Signal inference workers to resume experience collection... (40550 times) [2024-06-15 21:53:58,452][1652475] InferenceWorker_p0-w0: resuming experience collection (40550 times) [2024-06-15 21:54:00,086][1652475] Updated weights for policy 0, policy_version 788256 (0.0017) [2024-06-15 21:54:00,738][1648984] Fps is (10 sec: 52429.8, 60 sec: 43690.6, 300 sec: 42432.1). Total num frames: 1614413824. Throughput: 0: 10467.5. Samples: 403633664. Policy #0 lag: (min: 15.0, avg: 68.3, max: 271.0) [2024-06-15 21:54:00,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 21:54:05,738][1648984] Fps is (10 sec: 45875.2, 60 sec: 43144.5, 300 sec: 42209.6). Total num frames: 1614413824. Throughput: 0: 10706.5. Samples: 403674112. 
Policy #0 lag: (min: 15.0, avg: 68.3, max: 271.0) [2024-06-15 21:54:05,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 21:54:07,962][1652475] Updated weights for policy 0, policy_version 788320 (0.0181) [2024-06-15 21:54:09,231][1652475] Updated weights for policy 0, policy_version 788384 (0.0033) [2024-06-15 21:54:10,609][1652475] Updated weights for policy 0, policy_version 788435 (0.0012) [2024-06-15 21:54:10,738][1648984] Fps is (10 sec: 32768.1, 60 sec: 42598.4, 300 sec: 41987.5). Total num frames: 1614741504. Throughput: 0: 10991.0. Samples: 403740160. Policy #0 lag: (min: 14.0, avg: 78.0, max: 270.0) [2024-06-15 21:54:10,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 21:54:15,739][1648984] Fps is (10 sec: 52428.9, 60 sec: 43690.7, 300 sec: 42209.6). Total num frames: 1614938112. Throughput: 0: 10695.1. Samples: 403804672. Policy #0 lag: (min: 14.0, avg: 78.0, max: 270.0) [2024-06-15 21:54:15,740][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 21:54:19,251][1652475] Updated weights for policy 0, policy_version 788547 (0.0022) [2024-06-15 21:54:20,738][1648984] Fps is (10 sec: 32767.9, 60 sec: 42052.2, 300 sec: 41765.3). Total num frames: 1615069184. Throughput: 0: 11100.4. Samples: 403842560. Policy #0 lag: (min: 14.0, avg: 78.0, max: 270.0) [2024-06-15 21:54:20,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 21:54:21,324][1652475] Updated weights for policy 0, policy_version 788656 (0.0013) [2024-06-15 21:54:22,361][1652475] Updated weights for policy 0, policy_version 788708 (0.0015) [2024-06-15 21:54:25,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 43690.8, 300 sec: 42653.9). Total num frames: 1615462400. Throughput: 0: 10956.8. Samples: 403897856. 
Policy #0 lag: (min: 14.0, avg: 78.0, max: 270.0) [2024-06-15 21:54:25,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 21:54:30,391][1652475] Updated weights for policy 0, policy_version 788804 (0.0012) [2024-06-15 21:54:30,739][1648984] Fps is (10 sec: 42592.0, 60 sec: 43689.6, 300 sec: 42098.3). Total num frames: 1615495168. Throughput: 0: 11309.2. Samples: 403981824. Policy #0 lag: (min: 14.0, avg: 78.0, max: 270.0) [2024-06-15 21:54:30,740][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 21:54:31,469][1652475] Updated weights for policy 0, policy_version 788854 (0.0018) [2024-06-15 21:54:32,596][1652475] Updated weights for policy 0, policy_version 788912 (0.0012) [2024-06-15 21:54:33,795][1652475] Updated weights for policy 0, policy_version 788965 (0.0013) [2024-06-15 21:54:35,682][1651340] Signal inference workers to stop experience collection... (40600 times) [2024-06-15 21:54:35,733][1652475] InferenceWorker_p0-w0: stopping experience collection (40600 times) [2024-06-15 21:54:35,738][1648984] Fps is (10 sec: 45875.5, 60 sec: 44804.3, 300 sec: 42431.8). Total num frames: 1615921152. Throughput: 0: 11355.1. Samples: 404013056. Policy #0 lag: (min: 14.0, avg: 78.0, max: 270.0) [2024-06-15 21:54:35,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 21:54:35,944][1651340] Signal inference workers to resume experience collection... (40600 times) [2024-06-15 21:54:35,945][1652475] InferenceWorker_p0-w0: resuming experience collection (40600 times) [2024-06-15 21:54:35,947][1652475] Updated weights for policy 0, policy_version 789040 (0.0012) [2024-06-15 21:54:40,738][1648984] Fps is (10 sec: 49159.1, 60 sec: 43690.6, 300 sec: 42653.9). Total num frames: 1615986688. Throughput: 0: 11173.0. Samples: 404083712. 
Policy #0 lag: (min: 14.0, avg: 78.0, max: 270.0) [2024-06-15 21:54:40,738][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 21:54:41,847][1652475] Updated weights for policy 0, policy_version 789088 (0.0012) [2024-06-15 21:54:42,856][1652475] Updated weights for policy 0, policy_version 789136 (0.0127) [2024-06-15 21:54:44,526][1652475] Updated weights for policy 0, policy_version 789216 (0.0013) [2024-06-15 21:54:45,750][1648984] Fps is (10 sec: 45817.5, 60 sec: 47503.9, 300 sec: 42541.1). Total num frames: 1616379904. Throughput: 0: 11442.9. Samples: 404148736. Policy #0 lag: (min: 14.0, avg: 78.0, max: 270.0) [2024-06-15 21:54:45,751][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 21:54:46,814][1652475] Updated weights for policy 0, policy_version 789253 (0.0011) [2024-06-15 21:54:48,150][1652475] Updated weights for policy 0, policy_version 789311 (0.0011) [2024-06-15 21:54:50,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 43690.8, 300 sec: 42653.9). Total num frames: 1616510976. Throughput: 0: 11252.6. Samples: 404180480. Policy #0 lag: (min: 14.0, avg: 78.0, max: 270.0) [2024-06-15 21:54:50,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 21:54:54,782][1652475] Updated weights for policy 0, policy_version 789378 (0.0014) [2024-06-15 21:54:55,738][1648984] Fps is (10 sec: 36089.2, 60 sec: 46421.2, 300 sec: 42769.9). Total num frames: 1616740352. Throughput: 0: 11332.2. Samples: 404250112. Policy #0 lag: (min: 14.0, avg: 78.0, max: 270.0) [2024-06-15 21:54:55,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 21:54:56,031][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000789440_1616773120.pth... 
[2024-06-15 21:54:56,210][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000784448_1606549504.pth [2024-06-15 21:54:56,842][1652475] Updated weights for policy 0, policy_version 789472 (0.0014) [2024-06-15 21:54:59,679][1652475] Updated weights for policy 0, policy_version 789537 (0.0013) [2024-06-15 21:55:00,738][1648984] Fps is (10 sec: 52429.2, 60 sec: 43690.7, 300 sec: 42653.9). Total num frames: 1617035264. Throughput: 0: 11150.2. Samples: 404306432. Policy #0 lag: (min: 14.0, avg: 78.0, max: 270.0) [2024-06-15 21:55:00,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 21:55:04,074][1652475] Updated weights for policy 0, policy_version 789570 (0.0010) [2024-06-15 21:55:05,738][1648984] Fps is (10 sec: 42598.8, 60 sec: 45875.1, 300 sec: 43098.2). Total num frames: 1617166336. Throughput: 0: 11116.1. Samples: 404342784. Policy #0 lag: (min: 14.0, avg: 78.0, max: 270.0) [2024-06-15 21:55:05,739][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 21:55:08,426][1652475] Updated weights for policy 0, policy_version 789634 (0.0016) [2024-06-15 21:55:10,327][1652475] Updated weights for policy 0, policy_version 789712 (0.0010) [2024-06-15 21:55:10,738][1648984] Fps is (10 sec: 32767.7, 60 sec: 43690.6, 300 sec: 42320.7). Total num frames: 1617362944. Throughput: 0: 11355.0. Samples: 404408832. Policy #0 lag: (min: 14.0, avg: 78.0, max: 270.0) [2024-06-15 21:55:10,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 21:55:11,817][1652475] Updated weights for policy 0, policy_version 789763 (0.0011) [2024-06-15 21:55:13,271][1652475] Updated weights for policy 0, policy_version 789824 (0.0012) [2024-06-15 21:55:15,738][1648984] Fps is (10 sec: 39322.3, 60 sec: 43690.7, 300 sec: 42653.9). Total num frames: 1617559552. Throughput: 0: 10695.5. Samples: 404463104. 
Policy #0 lag: (min: 14.0, avg: 78.0, max: 270.0) [2024-06-15 21:55:15,738][1648984] Avg episode reward: [(0, '-0.220')] [2024-06-15 21:55:16,901][1652475] Updated weights for policy 0, policy_version 789883 (0.0024) [2024-06-15 21:55:20,738][1648984] Fps is (10 sec: 32768.4, 60 sec: 43690.7, 300 sec: 42320.7). Total num frames: 1617690624. Throughput: 0: 10740.6. Samples: 404496384. Policy #0 lag: (min: 14.0, avg: 78.0, max: 270.0) [2024-06-15 21:55:20,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 21:55:23,196][1652475] Updated weights for policy 0, policy_version 789956 (0.0015) [2024-06-15 21:55:24,055][1651340] Signal inference workers to stop experience collection... (40650 times) [2024-06-15 21:55:24,111][1652475] InferenceWorker_p0-w0: stopping experience collection (40650 times) [2024-06-15 21:55:24,369][1651340] Signal inference workers to resume experience collection... (40650 times) [2024-06-15 21:55:24,370][1652475] InferenceWorker_p0-w0: resuming experience collection (40650 times) [2024-06-15 21:55:25,641][1652475] Updated weights for policy 0, policy_version 790055 (0.0098) [2024-06-15 21:55:25,738][1648984] Fps is (10 sec: 45875.0, 60 sec: 42598.4, 300 sec: 42431.8). Total num frames: 1618018304. Throughput: 0: 10490.3. Samples: 404555776. Policy #0 lag: (min: 14.0, avg: 78.0, max: 270.0) [2024-06-15 21:55:25,740][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 21:55:29,053][1652475] Updated weights for policy 0, policy_version 790112 (0.0011) [2024-06-15 21:55:30,738][1648984] Fps is (10 sec: 52428.2, 60 sec: 45330.2, 300 sec: 42765.0). Total num frames: 1618214912. Throughput: 0: 10390.8. Samples: 404616192. Policy #0 lag: (min: 14.0, avg: 78.0, max: 270.0) [2024-06-15 21:55:30,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 21:55:34,868][1652475] Updated weights for policy 0, policy_version 790202 (0.0014) [2024-06-15 21:55:35,738][1648984] Fps is (10 sec: 32768.2, 60 sec: 40413.9, 300 sec: 42209.6). 
Total num frames: 1618345984. Throughput: 0: 10524.5. Samples: 404654080. Policy #0 lag: (min: 14.0, avg: 78.0, max: 270.0) [2024-06-15 21:55:35,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 21:55:37,316][1652475] Updated weights for policy 0, policy_version 790288 (0.0080) [2024-06-15 21:55:40,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 43690.7, 300 sec: 42653.9). Total num frames: 1618608128. Throughput: 0: 10171.8. Samples: 404707840. Policy #0 lag: (min: 14.0, avg: 78.0, max: 270.0) [2024-06-15 21:55:40,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 21:55:41,226][1652475] Updated weights for policy 0, policy_version 790353 (0.0013) [2024-06-15 21:55:45,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 39329.8, 300 sec: 42209.6). Total num frames: 1618739200. Throughput: 0: 10456.2. Samples: 404776960. Policy #0 lag: (min: 14.0, avg: 78.0, max: 270.0) [2024-06-15 21:55:45,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 21:55:45,872][1652475] Updated weights for policy 0, policy_version 790403 (0.0014) [2024-06-15 21:55:46,886][1652475] Updated weights for policy 0, policy_version 790458 (0.0011) [2024-06-15 21:55:48,698][1652475] Updated weights for policy 0, policy_version 790497 (0.0012) [2024-06-15 21:55:49,907][1652475] Updated weights for policy 0, policy_version 790551 (0.0012) [2024-06-15 21:55:50,738][1648984] Fps is (10 sec: 49150.8, 60 sec: 43144.4, 300 sec: 42545.6). Total num frames: 1619099648. Throughput: 0: 10422.0. Samples: 404811776. Policy #0 lag: (min: 14.0, avg: 78.0, max: 270.0) [2024-06-15 21:55:50,739][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 21:55:54,294][1652475] Updated weights for policy 0, policy_version 790608 (0.0042) [2024-06-15 21:55:55,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 42052.4, 300 sec: 42542.9). Total num frames: 1619263488. Throughput: 0: 10410.7. Samples: 404877312. 
Policy #0 lag: (min: 15.0, avg: 141.0, max: 271.0) [2024-06-15 21:55:55,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 21:55:57,767][1652475] Updated weights for policy 0, policy_version 790673 (0.0013) [2024-06-15 21:55:59,916][1652475] Updated weights for policy 0, policy_version 790724 (0.0030) [2024-06-15 21:56:00,738][1648984] Fps is (10 sec: 36045.7, 60 sec: 40413.8, 300 sec: 42209.7). Total num frames: 1619460096. Throughput: 0: 10592.7. Samples: 404939776. Policy #0 lag: (min: 15.0, avg: 141.0, max: 271.0) [2024-06-15 21:56:00,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 21:56:01,778][1652475] Updated weights for policy 0, policy_version 790801 (0.0014) [2024-06-15 21:56:02,727][1652475] Updated weights for policy 0, policy_version 790848 (0.0013) [2024-06-15 21:56:05,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 41506.2, 300 sec: 42653.9). Total num frames: 1619656704. Throughput: 0: 10433.4. Samples: 404965888. Policy #0 lag: (min: 15.0, avg: 141.0, max: 271.0) [2024-06-15 21:56:05,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 21:56:08,654][1652475] Updated weights for policy 0, policy_version 790910 (0.0014) [2024-06-15 21:56:10,044][1652475] Updated weights for policy 0, policy_version 790961 (0.0013) [2024-06-15 21:56:10,738][1648984] Fps is (10 sec: 45875.1, 60 sec: 42598.4, 300 sec: 42653.9). Total num frames: 1619918848. Throughput: 0: 10683.7. Samples: 405036544. Policy #0 lag: (min: 15.0, avg: 141.0, max: 271.0) [2024-06-15 21:56:10,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 21:56:11,466][1651340] Signal inference workers to stop experience collection... (40700 times) [2024-06-15 21:56:11,506][1652475] InferenceWorker_p0-w0: stopping experience collection (40700 times) [2024-06-15 21:56:11,798][1651340] Signal inference workers to resume experience collection... 
(40700 times) [2024-06-15 21:56:11,799][1652475] InferenceWorker_p0-w0: resuming experience collection (40700 times) [2024-06-15 21:56:12,831][1652475] Updated weights for policy 0, policy_version 791033 (0.0012) [2024-06-15 21:56:14,227][1652475] Updated weights for policy 0, policy_version 791097 (0.0012) [2024-06-15 21:56:15,738][1648984] Fps is (10 sec: 52428.5, 60 sec: 43690.6, 300 sec: 42653.9). Total num frames: 1620180992. Throughput: 0: 10661.0. Samples: 405095936. Policy #0 lag: (min: 15.0, avg: 141.0, max: 271.0) [2024-06-15 21:56:15,739][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 21:56:20,715][1652475] Updated weights for policy 0, policy_version 791152 (0.0011) [2024-06-15 21:56:20,738][1648984] Fps is (10 sec: 36045.1, 60 sec: 43144.5, 300 sec: 42542.9). Total num frames: 1620279296. Throughput: 0: 10672.4. Samples: 405134336. Policy #0 lag: (min: 15.0, avg: 141.0, max: 271.0) [2024-06-15 21:56:20,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 21:56:21,870][1652475] Updated weights for policy 0, policy_version 791202 (0.0017) [2024-06-15 21:56:24,855][1652475] Updated weights for policy 0, policy_version 791282 (0.0135) [2024-06-15 21:56:25,738][1648984] Fps is (10 sec: 42598.9, 60 sec: 43144.6, 300 sec: 42653.9). Total num frames: 1620606976. Throughput: 0: 10945.4. Samples: 405200384. Policy #0 lag: (min: 15.0, avg: 141.0, max: 271.0) [2024-06-15 21:56:25,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 21:56:30,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 41506.2, 300 sec: 42542.9). Total num frames: 1620705280. Throughput: 0: 10877.2. Samples: 405266432. 
Policy #0 lag: (min: 15.0, avg: 141.0, max: 271.0) [2024-06-15 21:56:30,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 21:56:31,923][1652475] Updated weights for policy 0, policy_version 791376 (0.0012) [2024-06-15 21:56:33,374][1652475] Updated weights for policy 0, policy_version 791440 (0.0010) [2024-06-15 21:56:35,738][1648984] Fps is (10 sec: 36044.7, 60 sec: 43690.6, 300 sec: 42654.0). Total num frames: 1620967424. Throughput: 0: 10797.6. Samples: 405297664. Policy #0 lag: (min: 15.0, avg: 141.0, max: 271.0) [2024-06-15 21:56:35,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 21:56:35,892][1652475] Updated weights for policy 0, policy_version 791504 (0.0013) [2024-06-15 21:56:37,488][1652475] Updated weights for policy 0, policy_version 791553 (0.0014) [2024-06-15 21:56:38,734][1652475] Updated weights for policy 0, policy_version 791606 (0.0012) [2024-06-15 21:56:40,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 43690.7, 300 sec: 42653.9). Total num frames: 1621229568. Throughput: 0: 10774.8. Samples: 405362176. Policy #0 lag: (min: 15.0, avg: 141.0, max: 271.0) [2024-06-15 21:56:40,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 21:56:43,344][1652475] Updated weights for policy 0, policy_version 791648 (0.0012) [2024-06-15 21:56:44,592][1652475] Updated weights for policy 0, policy_version 791696 (0.0111) [2024-06-15 21:56:45,738][1648984] Fps is (10 sec: 52429.1, 60 sec: 45875.3, 300 sec: 43098.3). Total num frames: 1621491712. Throughput: 0: 10854.4. Samples: 405428224. Policy #0 lag: (min: 15.0, avg: 141.0, max: 271.0) [2024-06-15 21:56:45,738][1648984] Avg episode reward: [(0, '-0.150')] [2024-06-15 21:56:47,407][1652475] Updated weights for policy 0, policy_version 791760 (0.0012) [2024-06-15 21:56:48,478][1652475] Updated weights for policy 0, policy_version 791805 (0.0012) [2024-06-15 21:56:50,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 42052.5, 300 sec: 42431.8). Total num frames: 1621622784. 
Throughput: 0: 10968.2. Samples: 405459456. Policy #0 lag: (min: 15.0, avg: 141.0, max: 271.0) [2024-06-15 21:56:50,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 21:56:52,168][1652475] Updated weights for policy 0, policy_version 791865 (0.0104) [2024-06-15 21:56:55,056][1652475] Updated weights for policy 0, policy_version 791908 (0.0012) [2024-06-15 21:56:55,738][1648984] Fps is (10 sec: 39321.2, 60 sec: 43690.6, 300 sec: 43098.2). Total num frames: 1621884928. Throughput: 0: 10888.5. Samples: 405526528. Policy #0 lag: (min: 15.0, avg: 141.0, max: 271.0) [2024-06-15 21:56:55,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 21:56:55,742][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000791936_1621884928.pth... [2024-06-15 21:56:55,796][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000786896_1611563008.pth [2024-06-15 21:56:56,384][1651340] Signal inference workers to stop experience collection... (40750 times) [2024-06-15 21:56:56,461][1652475] InferenceWorker_p0-w0: stopping experience collection (40750 times) [2024-06-15 21:56:56,731][1651340] Signal inference workers to resume experience collection... (40750 times) [2024-06-15 21:56:56,732][1652475] InferenceWorker_p0-w0: resuming experience collection (40750 times) [2024-06-15 21:56:56,944][1652475] Updated weights for policy 0, policy_version 791970 (0.0014) [2024-06-15 21:56:59,519][1652475] Updated weights for policy 0, policy_version 792032 (0.0012) [2024-06-15 21:57:00,742][1648984] Fps is (10 sec: 52404.2, 60 sec: 44779.5, 300 sec: 43208.6). Total num frames: 1622147072. Throughput: 0: 10989.8. Samples: 405590528. 
Policy #0 lag: (min: 15.0, avg: 141.0, max: 271.0) [2024-06-15 21:57:00,743][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 21:57:05,005][1652475] Updated weights for policy 0, policy_version 792099 (0.0012) [2024-06-15 21:57:05,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 43690.7, 300 sec: 42653.9). Total num frames: 1622278144. Throughput: 0: 10990.9. Samples: 405628928. Policy #0 lag: (min: 15.0, avg: 141.0, max: 271.0) [2024-06-15 21:57:05,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 21:57:06,497][1652475] Updated weights for policy 0, policy_version 792161 (0.0124) [2024-06-15 21:57:09,680][1652475] Updated weights for policy 0, policy_version 792240 (0.0117) [2024-06-15 21:57:10,738][1648984] Fps is (10 sec: 39340.2, 60 sec: 43690.7, 300 sec: 43431.5). Total num frames: 1622540288. Throughput: 0: 10911.3. Samples: 405691392. Policy #0 lag: (min: 15.0, avg: 141.0, max: 271.0) [2024-06-15 21:57:10,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 21:57:12,675][1652475] Updated weights for policy 0, policy_version 792314 (0.0012) [2024-06-15 21:57:15,738][1648984] Fps is (10 sec: 39319.5, 60 sec: 41505.8, 300 sec: 42431.7). Total num frames: 1622671360. Throughput: 0: 10877.0. Samples: 405755904. Policy #0 lag: (min: 15.0, avg: 141.0, max: 271.0) [2024-06-15 21:57:15,739][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 21:57:16,521][1652475] Updated weights for policy 0, policy_version 792353 (0.0012) [2024-06-15 21:57:18,444][1652475] Updated weights for policy 0, policy_version 792444 (0.0013) [2024-06-15 21:57:20,765][1648984] Fps is (10 sec: 39215.7, 60 sec: 44216.9, 300 sec: 43094.3). Total num frames: 1622933504. Throughput: 0: 10734.2. Samples: 405780992. 
Policy #0 lag: (min: 15.0, avg: 141.0, max: 271.0) [2024-06-15 21:57:20,765][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 21:57:23,888][1652475] Updated weights for policy 0, policy_version 792512 (0.0011) [2024-06-15 21:57:25,738][1648984] Fps is (10 sec: 49153.3, 60 sec: 42598.2, 300 sec: 42987.1). Total num frames: 1623162880. Throughput: 0: 10831.6. Samples: 405849600. Policy #0 lag: (min: 15.0, avg: 141.0, max: 271.0) [2024-06-15 21:57:25,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 21:57:25,898][1652475] Updated weights for policy 0, policy_version 792574 (0.0013) [2024-06-15 21:57:27,120][1652475] Updated weights for policy 0, policy_version 792624 (0.0113) [2024-06-15 21:57:29,391][1652475] Updated weights for policy 0, policy_version 792679 (0.0012) [2024-06-15 21:57:29,944][1652475] Updated weights for policy 0, policy_version 792704 (0.0012) [2024-06-15 21:57:30,738][1648984] Fps is (10 sec: 52570.7, 60 sec: 45875.2, 300 sec: 43098.3). Total num frames: 1623457792. Throughput: 0: 10763.4. Samples: 405912576. Policy #0 lag: (min: 15.0, avg: 141.0, max: 271.0) [2024-06-15 21:57:30,738][1648984] Avg episode reward: [(0, '-0.230')] [2024-06-15 21:57:35,533][1652475] Updated weights for policy 0, policy_version 792764 (0.0024) [2024-06-15 21:57:35,738][1648984] Fps is (10 sec: 42599.8, 60 sec: 43690.7, 300 sec: 43431.5). Total num frames: 1623588864. Throughput: 0: 11013.7. Samples: 405955072. Policy #0 lag: (min: 15.0, avg: 141.0, max: 271.0) [2024-06-15 21:57:35,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 21:57:37,533][1652475] Updated weights for policy 0, policy_version 792823 (0.0011) [2024-06-15 21:57:39,218][1652475] Updated weights for policy 0, policy_version 792884 (0.0011) [2024-06-15 21:57:40,738][1648984] Fps is (10 sec: 45875.0, 60 sec: 44782.9, 300 sec: 43098.2). Total num frames: 1623916544. Throughput: 0: 10934.1. Samples: 406018560. 
Policy #0 lag: (min: 15.0, avg: 141.0, max: 271.0) [2024-06-15 21:57:40,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 21:57:40,853][1652475] Updated weights for policy 0, policy_version 792946 (0.0015) [2024-06-15 21:57:45,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 42052.2, 300 sec: 43209.3). Total num frames: 1624014848. Throughput: 0: 11094.5. Samples: 406089728. Policy #0 lag: (min: 64.0, avg: 220.5, max: 315.0) [2024-06-15 21:57:45,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 21:57:45,804][1651340] Signal inference workers to stop experience collection... (40800 times) [2024-06-15 21:57:45,937][1652475] InferenceWorker_p0-w0: stopping experience collection (40800 times) [2024-06-15 21:57:45,939][1652475] Updated weights for policy 0, policy_version 792984 (0.0010) [2024-06-15 21:57:46,065][1651340] Signal inference workers to resume experience collection... (40800 times) [2024-06-15 21:57:46,065][1652475] InferenceWorker_p0-w0: resuming experience collection (40800 times) [2024-06-15 21:57:48,311][1652475] Updated weights for policy 0, policy_version 793056 (0.0046) [2024-06-15 21:57:50,738][1648984] Fps is (10 sec: 32768.2, 60 sec: 43690.7, 300 sec: 43209.3). Total num frames: 1624244224. Throughput: 0: 10934.1. Samples: 406120960. Policy #0 lag: (min: 64.0, avg: 220.5, max: 315.0) [2024-06-15 21:57:50,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 21:57:51,257][1652475] Updated weights for policy 0, policy_version 793092 (0.0018) [2024-06-15 21:57:53,763][1652475] Updated weights for policy 0, policy_version 793200 (0.0015) [2024-06-15 21:57:55,738][1648984] Fps is (10 sec: 49151.4, 60 sec: 43690.6, 300 sec: 43098.2). Total num frames: 1624506368. Throughput: 0: 10808.8. Samples: 406177792. 
Policy #0 lag: (min: 64.0, avg: 220.5, max: 315.0) [2024-06-15 21:57:55,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 21:57:57,128][1652475] Updated weights for policy 0, policy_version 793232 (0.0012) [2024-06-15 21:58:00,738][1648984] Fps is (10 sec: 39320.5, 60 sec: 41509.2, 300 sec: 43431.5). Total num frames: 1624637440. Throughput: 0: 10968.3. Samples: 406249472. Policy #0 lag: (min: 64.0, avg: 220.5, max: 315.0) [2024-06-15 21:58:00,739][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 21:58:01,260][1652475] Updated weights for policy 0, policy_version 793299 (0.0017) [2024-06-15 21:58:03,180][1652475] Updated weights for policy 0, policy_version 793350 (0.0012) [2024-06-15 21:58:05,005][1652475] Updated weights for policy 0, policy_version 793427 (0.0014) [2024-06-15 21:58:05,738][1648984] Fps is (10 sec: 49152.5, 60 sec: 45329.1, 300 sec: 43431.5). Total num frames: 1624997888. Throughput: 0: 11179.7. Samples: 406283776. Policy #0 lag: (min: 64.0, avg: 220.5, max: 315.0) [2024-06-15 21:58:05,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 21:58:06,000][1652475] Updated weights for policy 0, policy_version 793472 (0.0013) [2024-06-15 21:58:09,392][1652475] Updated weights for policy 0, policy_version 793530 (0.0014) [2024-06-15 21:58:10,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 43690.5, 300 sec: 43542.5). Total num frames: 1625161728. Throughput: 0: 11002.3. Samples: 406344704. Policy #0 lag: (min: 64.0, avg: 220.5, max: 315.0) [2024-06-15 21:58:10,739][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 21:58:14,875][1652475] Updated weights for policy 0, policy_version 793586 (0.0012) [2024-06-15 21:58:15,738][1648984] Fps is (10 sec: 32768.0, 60 sec: 44237.2, 300 sec: 43320.4). Total num frames: 1625325568. Throughput: 0: 11104.7. Samples: 406412288. 
Policy #0 lag: (min: 64.0, avg: 220.5, max: 315.0) [2024-06-15 21:58:15,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 21:58:16,972][1652475] Updated weights for policy 0, policy_version 793667 (0.0012) [2024-06-15 21:58:18,360][1652475] Updated weights for policy 0, policy_version 793728 (0.0013) [2024-06-15 21:58:20,738][1648984] Fps is (10 sec: 45875.5, 60 sec: 44803.0, 300 sec: 43320.4). Total num frames: 1625620480. Throughput: 0: 10672.3. Samples: 406435328. Policy #0 lag: (min: 64.0, avg: 220.5, max: 315.0) [2024-06-15 21:58:20,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 21:58:25,759][1648984] Fps is (10 sec: 35966.7, 60 sec: 42037.3, 300 sec: 43428.3). Total num frames: 1625686016. Throughput: 0: 10860.5. Samples: 406507520. Policy #0 lag: (min: 64.0, avg: 220.5, max: 315.0) [2024-06-15 21:58:25,760][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 21:58:27,560][1652475] Updated weights for policy 0, policy_version 793824 (0.0025) [2024-06-15 21:58:29,655][1652475] Updated weights for policy 0, policy_version 793904 (0.0016) [2024-06-15 21:58:30,556][1651340] Signal inference workers to stop experience collection... (40850 times) [2024-06-15 21:58:30,616][1652475] InferenceWorker_p0-w0: stopping experience collection (40850 times) [2024-06-15 21:58:30,738][1648984] Fps is (10 sec: 36045.2, 60 sec: 42052.2, 300 sec: 43213.5). Total num frames: 1625980928. Throughput: 0: 10717.9. Samples: 406572032. Policy #0 lag: (min: 64.0, avg: 220.5, max: 315.0) [2024-06-15 21:58:30,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 21:58:30,801][1651340] Signal inference workers to resume experience collection... (40850 times) [2024-06-15 21:58:30,802][1652475] InferenceWorker_p0-w0: resuming experience collection (40850 times) [2024-06-15 21:58:31,552][1652475] Updated weights for policy 0, policy_version 793984 (0.0013) [2024-06-15 21:58:35,738][1648984] Fps is (10 sec: 52543.1, 60 sec: 43690.7, 300 sec: 43542.6). 
Total num frames: 1626210304. Throughput: 0: 10513.1. Samples: 406594048. Policy #0 lag: (min: 64.0, avg: 220.5, max: 315.0) [2024-06-15 21:58:35,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 21:58:39,288][1652475] Updated weights for policy 0, policy_version 794064 (0.0036) [2024-06-15 21:58:40,738][1648984] Fps is (10 sec: 36044.9, 60 sec: 40413.9, 300 sec: 43431.5). Total num frames: 1626341376. Throughput: 0: 11036.5. Samples: 406674432. Policy #0 lag: (min: 64.0, avg: 220.5, max: 315.0) [2024-06-15 21:58:40,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 21:58:41,209][1652475] Updated weights for policy 0, policy_version 794144 (0.0133) [2024-06-15 21:58:43,332][1652475] Updated weights for policy 0, policy_version 794234 (0.0016) [2024-06-15 21:58:44,913][1652475] Updated weights for policy 0, policy_version 794274 (0.0014) [2024-06-15 21:58:45,738][1648984] Fps is (10 sec: 52428.1, 60 sec: 45329.0, 300 sec: 43542.6). Total num frames: 1626734592. Throughput: 0: 10570.0. Samples: 406725120. Policy #0 lag: (min: 64.0, avg: 220.5, max: 315.0) [2024-06-15 21:58:45,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 21:58:50,686][1652475] Updated weights for policy 0, policy_version 794307 (0.0013) [2024-06-15 21:58:50,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 41506.1, 300 sec: 43320.4). Total num frames: 1626734592. Throughput: 0: 10752.0. Samples: 406767616. Policy #0 lag: (min: 64.0, avg: 220.5, max: 315.0) [2024-06-15 21:58:50,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 21:58:52,506][1652475] Updated weights for policy 0, policy_version 794384 (0.0117) [2024-06-15 21:58:54,859][1652475] Updated weights for policy 0, policy_version 794480 (0.0013) [2024-06-15 21:58:55,738][1648984] Fps is (10 sec: 39321.9, 60 sec: 43690.7, 300 sec: 43098.3). Total num frames: 1627127808. Throughput: 0: 10729.3. Samples: 406827520. 
Policy #0 lag: (min: 64.0, avg: 220.5, max: 315.0) [2024-06-15 21:58:55,738][1648984] Avg episode reward: [(0, '-0.230')] [2024-06-15 21:58:55,747][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000794496_1627127808.pth... [2024-06-15 21:58:55,919][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000789440_1616773120.pth [2024-06-15 21:58:56,912][1652475] Updated weights for policy 0, policy_version 794532 (0.0014) [2024-06-15 21:59:00,740][1648984] Fps is (10 sec: 52418.4, 60 sec: 43689.4, 300 sec: 43542.3). Total num frames: 1627258880. Throughput: 0: 10683.3. Samples: 406893056. Policy #0 lag: (min: 64.0, avg: 220.5, max: 315.0) [2024-06-15 21:59:00,741][1648984] Avg episode reward: [(0, '-0.210')] [2024-06-15 21:59:04,126][1652475] Updated weights for policy 0, policy_version 794612 (0.0012) [2024-06-15 21:59:05,277][1652475] Updated weights for policy 0, policy_version 794657 (0.0010) [2024-06-15 21:59:05,738][1648984] Fps is (10 sec: 36044.7, 60 sec: 41506.1, 300 sec: 43209.3). Total num frames: 1627488256. Throughput: 0: 11002.3. Samples: 406930432. Policy #0 lag: (min: 64.0, avg: 220.5, max: 315.0) [2024-06-15 21:59:05,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 21:59:08,157][1652475] Updated weights for policy 0, policy_version 794736 (0.0020) [2024-06-15 21:59:09,749][1652475] Updated weights for policy 0, policy_version 794816 (0.0018) [2024-06-15 21:59:10,738][1648984] Fps is (10 sec: 52439.4, 60 sec: 43690.8, 300 sec: 43542.6). Total num frames: 1627783168. Throughput: 0: 10552.3. Samples: 406982144. Policy #0 lag: (min: 64.0, avg: 220.5, max: 315.0) [2024-06-15 21:59:10,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 21:59:15,738][1648984] Fps is (10 sec: 42596.8, 60 sec: 43144.2, 300 sec: 43542.5). Total num frames: 1627914240. Throughput: 0: 10740.5. Samples: 407055360. 
Policy #0 lag: (min: 64.0, avg: 220.5, max: 315.0) [2024-06-15 21:59:15,739][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 21:59:15,947][1651340] Signal inference workers to stop experience collection... (40900 times) [2024-06-15 21:59:15,979][1652475] InferenceWorker_p0-w0: stopping experience collection (40900 times) [2024-06-15 21:59:16,232][1651340] Signal inference workers to resume experience collection... (40900 times) [2024-06-15 21:59:16,233][1652475] InferenceWorker_p0-w0: resuming experience collection (40900 times) [2024-06-15 21:59:16,603][1652475] Updated weights for policy 0, policy_version 794912 (0.0013) [2024-06-15 21:59:20,083][1652475] Updated weights for policy 0, policy_version 794992 (0.0019) [2024-06-15 21:59:20,738][1648984] Fps is (10 sec: 39320.7, 60 sec: 42598.3, 300 sec: 43098.2). Total num frames: 1628176384. Throughput: 0: 10854.3. Samples: 407082496. Policy #0 lag: (min: 64.0, avg: 220.5, max: 315.0) [2024-06-15 21:59:20,738][1648984] Avg episode reward: [(0, '-0.230')] [2024-06-15 21:59:22,385][1652475] Updated weights for policy 0, policy_version 795062 (0.0011) [2024-06-15 21:59:25,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 43706.2, 300 sec: 43431.6). Total num frames: 1628307456. Throughput: 0: 10638.1. Samples: 407153152. Policy #0 lag: (min: 64.0, avg: 220.5, max: 315.0) [2024-06-15 21:59:25,739][1648984] Avg episode reward: [(0, '-0.210')] [2024-06-15 21:59:27,225][1652475] Updated weights for policy 0, policy_version 795133 (0.0184) [2024-06-15 21:59:29,125][1652475] Updated weights for policy 0, policy_version 795198 (0.0013) [2024-06-15 21:59:30,738][1648984] Fps is (10 sec: 39322.4, 60 sec: 43144.5, 300 sec: 42876.1). Total num frames: 1628569600. Throughput: 0: 10797.5. Samples: 407211008. 
Policy #0 lag: (min: 15.0, avg: 134.9, max: 271.0) [2024-06-15 21:59:30,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 21:59:34,099][1652475] Updated weights for policy 0, policy_version 795264 (0.0014) [2024-06-15 21:59:35,740][1648984] Fps is (10 sec: 52430.6, 60 sec: 43690.6, 300 sec: 43542.6). Total num frames: 1628831744. Throughput: 0: 10729.2. Samples: 407250432. Policy #0 lag: (min: 15.0, avg: 134.9, max: 271.0) [2024-06-15 21:59:35,741][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 21:59:37,440][1652475] Updated weights for policy 0, policy_version 795329 (0.0010) [2024-06-15 21:59:38,641][1652475] Updated weights for policy 0, policy_version 795388 (0.0013) [2024-06-15 21:59:40,730][1652475] Updated weights for policy 0, policy_version 795449 (0.0012) [2024-06-15 21:59:40,738][1648984] Fps is (10 sec: 49151.5, 60 sec: 45329.0, 300 sec: 42989.0). Total num frames: 1629061120. Throughput: 0: 10717.8. Samples: 407309824. Policy #0 lag: (min: 15.0, avg: 134.9, max: 271.0) [2024-06-15 21:59:40,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 21:59:45,184][1652475] Updated weights for policy 0, policy_version 795514 (0.0012) [2024-06-15 21:59:45,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 41506.2, 300 sec: 43098.3). Total num frames: 1629224960. Throughput: 0: 10741.1. Samples: 407376384. Policy #0 lag: (min: 15.0, avg: 134.9, max: 271.0) [2024-06-15 21:59:45,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 21:59:48,055][1652475] Updated weights for policy 0, policy_version 795576 (0.0043) [2024-06-15 21:59:49,739][1652475] Updated weights for policy 0, policy_version 795641 (0.0015) [2024-06-15 21:59:50,738][1648984] Fps is (10 sec: 42599.0, 60 sec: 45875.2, 300 sec: 43209.4). Total num frames: 1629487104. Throughput: 0: 10581.4. Samples: 407406592. 
Policy #0 lag: (min: 15.0, avg: 134.9, max: 271.0) [2024-06-15 21:59:50,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 21:59:54,643][1652475] Updated weights for policy 0, policy_version 795696 (0.0013) [2024-06-15 21:59:55,764][1648984] Fps is (10 sec: 39217.2, 60 sec: 41487.7, 300 sec: 42650.1). Total num frames: 1629618176. Throughput: 0: 10870.7. Samples: 407471616. Policy #0 lag: (min: 15.0, avg: 134.9, max: 271.0) [2024-06-15 21:59:55,765][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 21:59:56,724][1652475] Updated weights for policy 0, policy_version 795748 (0.0012) [2024-06-15 21:59:57,294][1652475] Updated weights for policy 0, policy_version 795776 (0.0028) [2024-06-15 21:59:59,829][1652475] Updated weights for policy 0, policy_version 795829 (0.0012) [2024-06-15 22:00:00,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 44238.3, 300 sec: 43209.4). Total num frames: 1629913088. Throughput: 0: 10797.6. Samples: 407541248. Policy #0 lag: (min: 15.0, avg: 134.9, max: 271.0) [2024-06-15 22:00:00,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 22:00:01,364][1652475] Updated weights for policy 0, policy_version 795901 (0.0102) [2024-06-15 22:00:05,738][1648984] Fps is (10 sec: 39426.9, 60 sec: 42052.3, 300 sec: 42876.1). Total num frames: 1630011392. Throughput: 0: 10922.7. Samples: 407574016. Policy #0 lag: (min: 15.0, avg: 134.9, max: 271.0) [2024-06-15 22:00:05,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 22:00:06,846][1651340] Signal inference workers to stop experience collection... (40950 times) [2024-06-15 22:00:06,951][1652475] InferenceWorker_p0-w0: stopping experience collection (40950 times) [2024-06-15 22:00:07,125][1651340] Signal inference workers to resume experience collection... 
(40950 times) [2024-06-15 22:00:07,125][1652475] InferenceWorker_p0-w0: resuming experience collection (40950 times) [2024-06-15 22:00:09,056][1652475] Updated weights for policy 0, policy_version 795989 (0.0011) [2024-06-15 22:00:10,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 42052.3, 300 sec: 43209.3). Total num frames: 1630306304. Throughput: 0: 10729.3. Samples: 407635968. Policy #0 lag: (min: 15.0, avg: 134.9, max: 271.0) [2024-06-15 22:00:10,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 22:00:11,200][1652475] Updated weights for policy 0, policy_version 796067 (0.0036) [2024-06-15 22:00:12,168][1652475] Updated weights for policy 0, policy_version 796097 (0.0014) [2024-06-15 22:00:15,758][1648984] Fps is (10 sec: 52321.2, 60 sec: 43676.1, 300 sec: 43539.5). Total num frames: 1630535680. Throughput: 0: 10735.7. Samples: 407694336. Policy #0 lag: (min: 15.0, avg: 134.9, max: 271.0) [2024-06-15 22:00:15,759][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 22:00:19,709][1652475] Updated weights for policy 0, policy_version 796178 (0.0013) [2024-06-15 22:00:20,738][1648984] Fps is (10 sec: 32767.9, 60 sec: 40960.1, 300 sec: 42765.0). Total num frames: 1630633984. Throughput: 0: 10695.1. Samples: 407731712. Policy #0 lag: (min: 15.0, avg: 134.9, max: 271.0) [2024-06-15 22:00:20,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 22:00:22,025][1652475] Updated weights for policy 0, policy_version 796288 (0.0092) [2024-06-15 22:00:25,738][1648984] Fps is (10 sec: 39402.1, 60 sec: 43690.9, 300 sec: 43098.2). Total num frames: 1630928896. Throughput: 0: 10558.6. Samples: 407784960. Policy #0 lag: (min: 15.0, avg: 134.9, max: 271.0) [2024-06-15 22:00:25,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 22:00:26,243][1652475] Updated weights for policy 0, policy_version 796356 (0.0015) [2024-06-15 22:00:30,738][1648984] Fps is (10 sec: 42597.8, 60 sec: 41506.0, 300 sec: 43098.2). Total num frames: 1631059968. 
Throughput: 0: 10683.7. Samples: 407857152. Policy #0 lag: (min: 15.0, avg: 134.9, max: 271.0) [2024-06-15 22:00:30,739][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 22:00:31,778][1652475] Updated weights for policy 0, policy_version 796434 (0.0014) [2024-06-15 22:00:33,470][1652475] Updated weights for policy 0, policy_version 796512 (0.0013) [2024-06-15 22:00:35,042][1652475] Updated weights for policy 0, policy_version 796560 (0.0015) [2024-06-15 22:00:35,738][1648984] Fps is (10 sec: 45875.4, 60 sec: 42598.4, 300 sec: 43320.4). Total num frames: 1631387648. Throughput: 0: 10626.8. Samples: 407884800. Policy #0 lag: (min: 15.0, avg: 134.9, max: 271.0) [2024-06-15 22:00:35,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 22:00:39,925][1652475] Updated weights for policy 0, policy_version 796624 (0.0127) [2024-06-15 22:00:40,738][1648984] Fps is (10 sec: 49151.4, 60 sec: 41506.0, 300 sec: 43431.4). Total num frames: 1631551488. Throughput: 0: 10621.7. Samples: 407949312. Policy #0 lag: (min: 15.0, avg: 134.9, max: 271.0) [2024-06-15 22:00:40,739][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 22:00:41,038][1652475] Updated weights for policy 0, policy_version 796672 (0.0015) [2024-06-15 22:00:44,108][1652475] Updated weights for policy 0, policy_version 796736 (0.0014) [2024-06-15 22:00:45,738][1648984] Fps is (10 sec: 45875.3, 60 sec: 43690.7, 300 sec: 43209.4). Total num frames: 1631846400. Throughput: 0: 10490.3. Samples: 408013312. Policy #0 lag: (min: 15.0, avg: 134.9, max: 271.0) [2024-06-15 22:00:45,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 22:00:47,578][1652475] Updated weights for policy 0, policy_version 796816 (0.0102) [2024-06-15 22:00:50,738][1648984] Fps is (10 sec: 42599.8, 60 sec: 41506.1, 300 sec: 43098.3). Total num frames: 1631977472. Throughput: 0: 10365.1. Samples: 408040448. 
Policy #0 lag: (min: 15.0, avg: 134.9, max: 271.0) [2024-06-15 22:00:50,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 22:00:52,661][1651340] Signal inference workers to stop experience collection... (41000 times) [2024-06-15 22:00:52,688][1652475] InferenceWorker_p0-w0: stopping experience collection (41000 times) [2024-06-15 22:00:52,942][1651340] Signal inference workers to resume experience collection... (41000 times) [2024-06-15 22:00:52,942][1652475] InferenceWorker_p0-w0: resuming experience collection (41000 times) [2024-06-15 22:00:52,944][1652475] Updated weights for policy 0, policy_version 796896 (0.0013) [2024-06-15 22:00:55,738][1648984] Fps is (10 sec: 29491.2, 60 sec: 42070.9, 300 sec: 42987.2). Total num frames: 1632141312. Throughput: 0: 10581.3. Samples: 408112128. Policy #0 lag: (min: 15.0, avg: 134.9, max: 271.0) [2024-06-15 22:00:55,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 22:00:56,138][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000796976_1632206848.pth... [2024-06-15 22:00:56,196][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000791936_1621884928.pth [2024-06-15 22:00:56,384][1652475] Updated weights for policy 0, policy_version 796982 (0.0030) [2024-06-15 22:00:57,933][1652475] Updated weights for policy 0, policy_version 797040 (0.0014) [2024-06-15 22:00:59,540][1652475] Updated weights for policy 0, policy_version 797088 (0.0012) [2024-06-15 22:01:00,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 43144.5, 300 sec: 43542.6). Total num frames: 1632501760. Throughput: 0: 10563.4. Samples: 408169472. Policy #0 lag: (min: 15.0, avg: 134.9, max: 271.0) [2024-06-15 22:01:00,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 22:01:04,926][1652475] Updated weights for policy 0, policy_version 797156 (0.0132) [2024-06-15 22:01:05,738][1648984] Fps is (10 sec: 49151.9, 60 sec: 43690.6, 300 sec: 43098.2). 
Total num frames: 1632632832. Throughput: 0: 10581.3. Samples: 408207872. Policy #0 lag: (min: 15.0, avg: 134.9, max: 271.0) [2024-06-15 22:01:05,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 22:01:07,978][1652475] Updated weights for policy 0, policy_version 797219 (0.0011) [2024-06-15 22:01:09,697][1652475] Updated weights for policy 0, policy_version 797280 (0.0013) [2024-06-15 22:01:10,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 43144.5, 300 sec: 43098.3). Total num frames: 1632894976. Throughput: 0: 10911.3. Samples: 408275968. Policy #0 lag: (min: 15.0, avg: 134.9, max: 271.0) [2024-06-15 22:01:10,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 22:01:11,843][1652475] Updated weights for policy 0, policy_version 797371 (0.0013) [2024-06-15 22:01:15,738][1648984] Fps is (10 sec: 39321.1, 60 sec: 41520.2, 300 sec: 43209.3). Total num frames: 1633026048. Throughput: 0: 10752.0. Samples: 408340992. Policy #0 lag: (min: 15.0, avg: 134.9, max: 271.0) [2024-06-15 22:01:15,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 22:01:16,986][1652475] Updated weights for policy 0, policy_version 797425 (0.0011) [2024-06-15 22:01:20,738][1648984] Fps is (10 sec: 29491.3, 60 sec: 42598.5, 300 sec: 42653.9). Total num frames: 1633189888. Throughput: 0: 10934.1. Samples: 408376832. Policy #0 lag: (min: 15.0, avg: 112.4, max: 271.0) [2024-06-15 22:01:20,738][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 22:01:21,011][1652475] Updated weights for policy 0, policy_version 797477 (0.0011) [2024-06-15 22:01:22,417][1652475] Updated weights for policy 0, policy_version 797537 (0.0016) [2024-06-15 22:01:24,459][1652475] Updated weights for policy 0, policy_version 797622 (0.0012) [2024-06-15 22:01:25,738][1648984] Fps is (10 sec: 52429.5, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 1633550336. Throughput: 0: 10809.0. Samples: 408435712. 
Policy #0 lag: (min: 15.0, avg: 112.4, max: 271.0) [2024-06-15 22:01:25,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 22:01:28,773][1652475] Updated weights for policy 0, policy_version 797689 (0.0012) [2024-06-15 22:01:30,738][1648984] Fps is (10 sec: 49152.2, 60 sec: 43690.9, 300 sec: 43098.3). Total num frames: 1633681408. Throughput: 0: 10877.2. Samples: 408502784. Policy #0 lag: (min: 15.0, avg: 112.4, max: 271.0) [2024-06-15 22:01:30,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 22:01:33,069][1652475] Updated weights for policy 0, policy_version 797760 (0.0011) [2024-06-15 22:01:34,550][1652475] Updated weights for policy 0, policy_version 797813 (0.0010) [2024-06-15 22:01:34,861][1651340] Signal inference workers to stop experience collection... (41050 times) [2024-06-15 22:01:34,893][1652475] InferenceWorker_p0-w0: stopping experience collection (41050 times) [2024-06-15 22:01:35,115][1651340] Signal inference workers to resume experience collection... (41050 times) [2024-06-15 22:01:35,116][1652475] InferenceWorker_p0-w0: resuming experience collection (41050 times) [2024-06-15 22:01:35,738][1648984] Fps is (10 sec: 45874.1, 60 sec: 43690.5, 300 sec: 43320.4). Total num frames: 1634009088. Throughput: 0: 11104.6. Samples: 408540160. Policy #0 lag: (min: 15.0, avg: 112.4, max: 271.0) [2024-06-15 22:01:35,739][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 22:01:39,472][1652475] Updated weights for policy 0, policy_version 797909 (0.0012) [2024-06-15 22:01:40,742][1648984] Fps is (10 sec: 52403.5, 60 sec: 44233.5, 300 sec: 43097.6). Total num frames: 1634205696. Throughput: 0: 10762.3. Samples: 408596480. Policy #0 lag: (min: 15.0, avg: 112.4, max: 271.0) [2024-06-15 22:01:40,743][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 22:01:44,760][1652475] Updated weights for policy 0, policy_version 797984 (0.0011) [2024-06-15 22:01:45,738][1648984] Fps is (10 sec: 32768.9, 60 sec: 41506.1, 300 sec: 43098.2). 
Total num frames: 1634336768. Throughput: 0: 10956.8. Samples: 408662528. Policy #0 lag: (min: 15.0, avg: 112.4, max: 271.0) [2024-06-15 22:01:45,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 22:01:47,651][1652475] Updated weights for policy 0, policy_version 798072 (0.0020) [2024-06-15 22:01:48,948][1652475] Updated weights for policy 0, policy_version 798128 (0.0013) [2024-06-15 22:01:50,496][1652475] Updated weights for policy 0, policy_version 798147 (0.0015) [2024-06-15 22:01:50,738][1648984] Fps is (10 sec: 42618.2, 60 sec: 44236.7, 300 sec: 43209.3). Total num frames: 1634631680. Throughput: 0: 10740.6. Samples: 408691200. Policy #0 lag: (min: 15.0, avg: 112.4, max: 271.0) [2024-06-15 22:01:50,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 22:01:55,654][1652475] Updated weights for policy 0, policy_version 798209 (0.0028) [2024-06-15 22:01:55,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 43144.5, 300 sec: 42654.6). Total num frames: 1634729984. Throughput: 0: 10763.4. Samples: 408760320. Policy #0 lag: (min: 15.0, avg: 112.4, max: 271.0) [2024-06-15 22:01:55,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 22:01:56,956][1652475] Updated weights for policy 0, policy_version 798271 (0.0012) [2024-06-15 22:01:59,427][1652475] Updated weights for policy 0, policy_version 798320 (0.0013) [2024-06-15 22:02:00,520][1652475] Updated weights for policy 0, policy_version 798357 (0.0012) [2024-06-15 22:02:00,738][1648984] Fps is (10 sec: 42599.0, 60 sec: 42598.4, 300 sec: 43320.4). Total num frames: 1635057664. Throughput: 0: 10888.6. Samples: 408830976. Policy #0 lag: (min: 15.0, avg: 112.4, max: 271.0) [2024-06-15 22:02:00,740][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 22:02:02,838][1652475] Updated weights for policy 0, policy_version 798454 (0.0016) [2024-06-15 22:02:05,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 43690.7, 300 sec: 43098.2). Total num frames: 1635254272. Throughput: 0: 10581.3. 
Samples: 408852992. Policy #0 lag: (min: 15.0, avg: 112.4, max: 271.0) [2024-06-15 22:02:05,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 22:02:09,467][1652475] Updated weights for policy 0, policy_version 798528 (0.0013) [2024-06-15 22:02:10,738][1648984] Fps is (10 sec: 32767.4, 60 sec: 41506.0, 300 sec: 43098.3). Total num frames: 1635385344. Throughput: 0: 10911.3. Samples: 408926720. Policy #0 lag: (min: 15.0, avg: 112.4, max: 271.0) [2024-06-15 22:02:10,738][1648984] Avg episode reward: [(0, '-0.190')] [2024-06-15 22:02:12,084][1652475] Updated weights for policy 0, policy_version 798590 (0.0012) [2024-06-15 22:02:15,367][1652475] Updated weights for policy 0, policy_version 798704 (0.0014) [2024-06-15 22:02:15,738][1648984] Fps is (10 sec: 49152.5, 60 sec: 45329.2, 300 sec: 43435.5). Total num frames: 1635745792. Throughput: 0: 10467.5. Samples: 408973824. Policy #0 lag: (min: 15.0, avg: 112.4, max: 271.0) [2024-06-15 22:02:15,738][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 22:02:20,738][1648984] Fps is (10 sec: 39322.2, 60 sec: 43144.5, 300 sec: 42765.1). Total num frames: 1635778560. Throughput: 0: 10467.6. Samples: 409011200. Policy #0 lag: (min: 15.0, avg: 112.4, max: 271.0) [2024-06-15 22:02:20,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 22:02:22,479][1652475] Updated weights for policy 0, policy_version 798723 (0.0012) [2024-06-15 22:02:24,075][1652475] Updated weights for policy 0, policy_version 798800 (0.0090) [2024-06-15 22:02:24,182][1651340] Signal inference workers to stop experience collection... (41100 times) [2024-06-15 22:02:24,245][1652475] InferenceWorker_p0-w0: stopping experience collection (41100 times) [2024-06-15 22:02:24,411][1651340] Signal inference workers to resume experience collection... 
(41100 times) [2024-06-15 22:02:24,412][1652475] InferenceWorker_p0-w0: resuming experience collection (41100 times) [2024-06-15 22:02:25,738][1648984] Fps is (10 sec: 32767.7, 60 sec: 42052.3, 300 sec: 42765.0). Total num frames: 1636073472. Throughput: 0: 10798.6. Samples: 409082368. Policy #0 lag: (min: 15.0, avg: 112.4, max: 271.0) [2024-06-15 22:02:25,738][1648984] Avg episode reward: [(0, '-0.520')] [2024-06-15 22:02:26,020][1652475] Updated weights for policy 0, policy_version 798880 (0.0012) [2024-06-15 22:02:28,117][1652475] Updated weights for policy 0, policy_version 798947 (0.0010) [2024-06-15 22:02:30,738][1648984] Fps is (10 sec: 52428.5, 60 sec: 43690.6, 300 sec: 43098.2). Total num frames: 1636302848. Throughput: 0: 10501.7. Samples: 409135104. Policy #0 lag: (min: 15.0, avg: 112.4, max: 271.0) [2024-06-15 22:02:30,739][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 22:02:34,778][1652475] Updated weights for policy 0, policy_version 798992 (0.0011) [2024-06-15 22:02:35,738][1648984] Fps is (10 sec: 32768.0, 60 sec: 39867.9, 300 sec: 42320.7). Total num frames: 1636401152. Throughput: 0: 10763.4. Samples: 409175552. Policy #0 lag: (min: 15.0, avg: 112.4, max: 271.0) [2024-06-15 22:02:35,738][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 22:02:36,910][1652475] Updated weights for policy 0, policy_version 799041 (0.0013) [2024-06-15 22:02:38,861][1652475] Updated weights for policy 0, policy_version 799120 (0.0016) [2024-06-15 22:02:40,738][1648984] Fps is (10 sec: 45875.7, 60 sec: 42601.8, 300 sec: 43209.3). Total num frames: 1636761600. Throughput: 0: 10444.8. Samples: 409230336. Policy #0 lag: (min: 15.0, avg: 112.4, max: 271.0) [2024-06-15 22:02:40,738][1648984] Avg episode reward: [(0, '-0.520')] [2024-06-15 22:02:40,739][1652475] Updated weights for policy 0, policy_version 799200 (0.0022) [2024-06-15 22:02:45,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 41506.1, 300 sec: 42653.9). Total num frames: 1636827136. 
Throughput: 0: 10365.1. Samples: 409297408. Policy #0 lag: (min: 15.0, avg: 112.4, max: 271.0) [2024-06-15 22:02:45,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 22:02:46,734][1652475] Updated weights for policy 0, policy_version 799251 (0.0012) [2024-06-15 22:02:47,467][1652475] Updated weights for policy 0, policy_version 799289 (0.0012) [2024-06-15 22:02:50,035][1652475] Updated weights for policy 0, policy_version 799360 (0.0092) [2024-06-15 22:02:50,738][1648984] Fps is (10 sec: 36044.5, 60 sec: 41506.2, 300 sec: 42765.0). Total num frames: 1637122048. Throughput: 0: 10683.7. Samples: 409333760. Policy #0 lag: (min: 15.0, avg: 112.4, max: 271.0) [2024-06-15 22:02:50,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 22:02:52,720][1652475] Updated weights for policy 0, policy_version 799426 (0.0014) [2024-06-15 22:02:54,021][1652475] Updated weights for policy 0, policy_version 799488 (0.0012) [2024-06-15 22:02:55,740][1648984] Fps is (10 sec: 52418.3, 60 sec: 43689.2, 300 sec: 43098.0). Total num frames: 1637351424. Throughput: 0: 10239.6. Samples: 409387520. Policy #0 lag: (min: 15.0, avg: 112.4, max: 271.0) [2024-06-15 22:02:55,740][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 22:02:55,746][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000799488_1637351424.pth... [2024-06-15 22:02:55,792][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000794496_1627127808.pth [2024-06-15 22:02:58,965][1652475] Updated weights for policy 0, policy_version 799547 (0.0013) [2024-06-15 22:03:00,738][1648984] Fps is (10 sec: 36045.0, 60 sec: 40413.8, 300 sec: 42320.7). Total num frames: 1637482496. Throughput: 0: 10831.6. Samples: 409461248. 
Policy #0 lag: (min: 15.0, avg: 112.4, max: 271.0) [2024-06-15 22:03:00,738][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 22:03:02,211][1652475] Updated weights for policy 0, policy_version 799616 (0.0011) [2024-06-15 22:03:03,574][1652475] Updated weights for policy 0, policy_version 799674 (0.0012) [2024-06-15 22:03:05,738][1648984] Fps is (10 sec: 39329.7, 60 sec: 41506.2, 300 sec: 42654.0). Total num frames: 1637744640. Throughput: 0: 10467.6. Samples: 409482240. Policy #0 lag: (min: 102.0, avg: 177.5, max: 343.0) [2024-06-15 22:03:05,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 22:03:06,664][1652475] Updated weights for policy 0, policy_version 799715 (0.0039) [2024-06-15 22:03:09,633][1651340] Signal inference workers to stop experience collection... (41150 times) [2024-06-15 22:03:09,772][1652475] InferenceWorker_p0-w0: stopping experience collection (41150 times) [2024-06-15 22:03:09,958][1651340] Signal inference workers to resume experience collection... (41150 times) [2024-06-15 22:03:09,964][1652475] InferenceWorker_p0-w0: resuming experience collection (41150 times) [2024-06-15 22:03:10,170][1652475] Updated weights for policy 0, policy_version 799763 (0.0012) [2024-06-15 22:03:10,738][1648984] Fps is (10 sec: 45875.1, 60 sec: 42598.5, 300 sec: 42765.0). Total num frames: 1637941248. Throughput: 0: 10604.1. Samples: 409559552. Policy #0 lag: (min: 102.0, avg: 177.5, max: 343.0) [2024-06-15 22:03:10,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 22:03:12,354][1652475] Updated weights for policy 0, policy_version 799812 (0.0014) [2024-06-15 22:03:13,902][1652475] Updated weights for policy 0, policy_version 799874 (0.0147) [2024-06-15 22:03:15,174][1652475] Updated weights for policy 0, policy_version 799931 (0.0013) [2024-06-15 22:03:15,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 42052.2, 300 sec: 42876.1). Total num frames: 1638268928. Throughput: 0: 10706.5. Samples: 409616896. 
Policy #0 lag: (min: 102.0, avg: 177.5, max: 343.0) [2024-06-15 22:03:15,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 22:03:18,250][1652475] Updated weights for policy 0, policy_version 799972 (0.0010) [2024-06-15 22:03:20,738][1648984] Fps is (10 sec: 45873.8, 60 sec: 43690.4, 300 sec: 43101.4). Total num frames: 1638400000. Throughput: 0: 10717.8. Samples: 409657856. Policy #0 lag: (min: 102.0, avg: 177.5, max: 343.0) [2024-06-15 22:03:20,739][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 22:03:22,800][1652475] Updated weights for policy 0, policy_version 800059 (0.0012) [2024-06-15 22:03:25,002][1652475] Updated weights for policy 0, policy_version 800120 (0.0012) [2024-06-15 22:03:25,738][1648984] Fps is (10 sec: 39321.1, 60 sec: 43144.5, 300 sec: 42987.2). Total num frames: 1638662144. Throughput: 0: 11002.3. Samples: 409725440. Policy #0 lag: (min: 102.0, avg: 177.5, max: 343.0) [2024-06-15 22:03:25,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 22:03:26,633][1652475] Updated weights for policy 0, policy_version 800163 (0.0016) [2024-06-15 22:03:28,484][1652475] Updated weights for policy 0, policy_version 800199 (0.0010) [2024-06-15 22:03:30,738][1648984] Fps is (10 sec: 52430.7, 60 sec: 43690.7, 300 sec: 43098.3). Total num frames: 1638924288. Throughput: 0: 11002.3. Samples: 409792512. Policy #0 lag: (min: 102.0, avg: 177.5, max: 343.0) [2024-06-15 22:03:30,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 22:03:33,698][1652475] Updated weights for policy 0, policy_version 800258 (0.0110) [2024-06-15 22:03:35,739][1648984] Fps is (10 sec: 42593.8, 60 sec: 44782.1, 300 sec: 43209.2). Total num frames: 1639088128. Throughput: 0: 11081.7. Samples: 409832448. 
Policy #0 lag: (min: 102.0, avg: 177.5, max: 343.0) [2024-06-15 22:03:35,739][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 22:03:35,846][1652475] Updated weights for policy 0, policy_version 800339 (0.0011) [2024-06-15 22:03:38,156][1652475] Updated weights for policy 0, policy_version 800400 (0.0014) [2024-06-15 22:03:39,172][1652475] Updated weights for policy 0, policy_version 800444 (0.0012) [2024-06-15 22:03:40,738][1648984] Fps is (10 sec: 49151.9, 60 sec: 44236.8, 300 sec: 42987.2). Total num frames: 1639415808. Throughput: 0: 11105.2. Samples: 409887232. Policy #0 lag: (min: 102.0, avg: 177.5, max: 343.0) [2024-06-15 22:03:40,740][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 22:03:40,787][1652475] Updated weights for policy 0, policy_version 800508 (0.0011) [2024-06-15 22:03:45,738][1648984] Fps is (10 sec: 36049.2, 60 sec: 43690.7, 300 sec: 43098.3). Total num frames: 1639448576. Throughput: 0: 11070.6. Samples: 409959424. Policy #0 lag: (min: 102.0, avg: 177.5, max: 343.0) [2024-06-15 22:03:45,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 22:03:47,190][1652475] Updated weights for policy 0, policy_version 800546 (0.0018) [2024-06-15 22:03:49,205][1652475] Updated weights for policy 0, policy_version 800637 (0.0012) [2024-06-15 22:03:50,606][1652475] Updated weights for policy 0, policy_version 800693 (0.0103) [2024-06-15 22:03:50,738][1648984] Fps is (10 sec: 42598.8, 60 sec: 45329.2, 300 sec: 43098.3). Total num frames: 1639841792. Throughput: 0: 11320.9. Samples: 409991680. Policy #0 lag: (min: 102.0, avg: 177.5, max: 343.0) [2024-06-15 22:03:50,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 22:03:52,392][1651340] Signal inference workers to stop experience collection... (41200 times) [2024-06-15 22:03:52,425][1652475] InferenceWorker_p0-w0: stopping experience collection (41200 times) [2024-06-15 22:03:52,711][1651340] Signal inference workers to resume experience collection... 
(41200 times) [2024-06-15 22:03:52,712][1652475] InferenceWorker_p0-w0: resuming experience collection (41200 times) [2024-06-15 22:03:52,992][1652475] Updated weights for policy 0, policy_version 800767 (0.0012) [2024-06-15 22:03:55,738][1648984] Fps is (10 sec: 52427.2, 60 sec: 43692.0, 300 sec: 43098.5). Total num frames: 1639972864. Throughput: 0: 10854.3. Samples: 410048000. Policy #0 lag: (min: 102.0, avg: 177.5, max: 343.0) [2024-06-15 22:03:55,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 22:03:59,870][1652475] Updated weights for policy 0, policy_version 800832 (0.0013) [2024-06-15 22:04:00,738][1648984] Fps is (10 sec: 26213.4, 60 sec: 43690.5, 300 sec: 42765.0). Total num frames: 1640103936. Throughput: 0: 11138.8. Samples: 410118144. Policy #0 lag: (min: 102.0, avg: 177.5, max: 343.0) [2024-06-15 22:04:00,739][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 22:04:03,025][1652475] Updated weights for policy 0, policy_version 800944 (0.0014) [2024-06-15 22:04:04,678][1652475] Updated weights for policy 0, policy_version 801018 (0.0013) [2024-06-15 22:04:05,738][1648984] Fps is (10 sec: 52430.5, 60 sec: 45875.2, 300 sec: 43098.3). Total num frames: 1640497152. Throughput: 0: 10774.8. Samples: 410142720. Policy #0 lag: (min: 102.0, avg: 177.5, max: 343.0) [2024-06-15 22:04:05,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 22:04:10,738][1648984] Fps is (10 sec: 39322.5, 60 sec: 42598.4, 300 sec: 42654.0). Total num frames: 1640497152. Throughput: 0: 10854.4. Samples: 410213888. Policy #0 lag: (min: 102.0, avg: 177.5, max: 343.0) [2024-06-15 22:04:10,738][1648984] Avg episode reward: [(0, '-0.230')] [2024-06-15 22:04:12,313][1652475] Updated weights for policy 0, policy_version 801072 (0.0012) [2024-06-15 22:04:14,164][1652475] Updated weights for policy 0, policy_version 801156 (0.0012) [2024-06-15 22:04:15,738][1648984] Fps is (10 sec: 42598.7, 60 sec: 44236.9, 300 sec: 43209.4). Total num frames: 1640923136. 
Throughput: 0: 10763.4. Samples: 410276864. Policy #0 lag: (min: 102.0, avg: 177.5, max: 343.0) [2024-06-15 22:04:15,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 22:04:16,871][1652475] Updated weights for policy 0, policy_version 801274 (0.0116) [2024-06-15 22:04:20,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 43690.9, 300 sec: 43098.3). Total num frames: 1641021440. Throughput: 0: 10433.7. Samples: 410301952. Policy #0 lag: (min: 102.0, avg: 177.5, max: 343.0) [2024-06-15 22:04:20,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 22:04:25,319][1652475] Updated weights for policy 0, policy_version 801361 (0.0013) [2024-06-15 22:04:25,738][1648984] Fps is (10 sec: 29490.7, 60 sec: 42598.4, 300 sec: 42876.1). Total num frames: 1641218048. Throughput: 0: 11093.3. Samples: 410386432. Policy #0 lag: (min: 102.0, avg: 177.5, max: 343.0) [2024-06-15 22:04:25,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 22:04:27,085][1652475] Updated weights for policy 0, policy_version 801442 (0.0099) [2024-06-15 22:04:29,197][1652475] Updated weights for policy 0, policy_version 801532 (0.0012) [2024-06-15 22:04:30,738][1648984] Fps is (10 sec: 52429.3, 60 sec: 43690.7, 300 sec: 43098.3). Total num frames: 1641545728. Throughput: 0: 10592.7. Samples: 410436096. Policy #0 lag: (min: 102.0, avg: 177.5, max: 343.0) [2024-06-15 22:04:30,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 22:04:35,738][1648984] Fps is (10 sec: 32768.4, 60 sec: 40960.9, 300 sec: 42320.7). Total num frames: 1641545728. Throughput: 0: 10774.7. Samples: 410476544. Policy #0 lag: (min: 102.0, avg: 177.5, max: 343.0) [2024-06-15 22:04:35,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 22:04:36,700][1652475] Updated weights for policy 0, policy_version 801597 (0.0105) [2024-06-15 22:04:37,259][1651340] Signal inference workers to stop experience collection... 
(41250 times) [2024-06-15 22:04:37,292][1652475] InferenceWorker_p0-w0: stopping experience collection (41250 times) [2024-06-15 22:04:37,446][1651340] Signal inference workers to resume experience collection... (41250 times) [2024-06-15 22:04:37,458][1652475] InferenceWorker_p0-w0: resuming experience collection (41250 times) [2024-06-15 22:04:39,003][1652475] Updated weights for policy 0, policy_version 801696 (0.0116) [2024-06-15 22:04:40,738][1648984] Fps is (10 sec: 42598.2, 60 sec: 42598.4, 300 sec: 43209.3). Total num frames: 1641971712. Throughput: 0: 10786.2. Samples: 410533376. Policy #0 lag: (min: 102.0, avg: 177.5, max: 343.0) [2024-06-15 22:04:40,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 22:04:41,299][1652475] Updated weights for policy 0, policy_version 801776 (0.0013) [2024-06-15 22:04:45,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 43690.7, 300 sec: 42653.9). Total num frames: 1642070016. Throughput: 0: 10729.3. Samples: 410600960. Policy #0 lag: (min: 102.0, avg: 177.5, max: 343.0) [2024-06-15 22:04:45,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 22:04:49,926][1652475] Updated weights for policy 0, policy_version 801872 (0.0108) [2024-06-15 22:04:50,738][1648984] Fps is (10 sec: 29490.2, 60 sec: 40413.6, 300 sec: 42879.9). Total num frames: 1642266624. Throughput: 0: 11002.2. Samples: 410637824. Policy #0 lag: (min: 95.0, avg: 123.9, max: 288.0) [2024-06-15 22:04:50,739][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 22:04:51,364][1652475] Updated weights for policy 0, policy_version 801920 (0.0013) [2024-06-15 22:04:54,538][1652475] Updated weights for policy 0, policy_version 802016 (0.0014) [2024-06-15 22:04:55,738][1648984] Fps is (10 sec: 52428.4, 60 sec: 43690.9, 300 sec: 42987.2). Total num frames: 1642594304. Throughput: 0: 10399.3. Samples: 410681856. 
Policy #0 lag: (min: 95.0, avg: 123.9, max: 288.0) [2024-06-15 22:04:55,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 22:04:55,745][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000802048_1642594304.pth... [2024-06-15 22:04:55,803][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000796976_1632206848.pth [2024-06-15 22:05:00,738][1648984] Fps is (10 sec: 32768.9, 60 sec: 41506.3, 300 sec: 42653.9). Total num frames: 1642594304. Throughput: 0: 10592.7. Samples: 410753536. Policy #0 lag: (min: 95.0, avg: 123.9, max: 288.0) [2024-06-15 22:05:00,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 22:05:00,784][1652475] Updated weights for policy 0, policy_version 802064 (0.0011) [2024-06-15 22:05:03,029][1652475] Updated weights for policy 0, policy_version 802149 (0.0011) [2024-06-15 22:05:05,742][1648984] Fps is (10 sec: 26202.7, 60 sec: 39318.7, 300 sec: 42542.2). Total num frames: 1642856448. Throughput: 0: 10489.3. Samples: 410774016. Policy #0 lag: (min: 95.0, avg: 123.9, max: 288.0) [2024-06-15 22:05:05,743][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 22:05:06,056][1652475] Updated weights for policy 0, policy_version 802192 (0.0011) [2024-06-15 22:05:08,296][1652475] Updated weights for policy 0, policy_version 802288 (0.0012) [2024-06-15 22:05:10,738][1648984] Fps is (10 sec: 52427.3, 60 sec: 43690.4, 300 sec: 42656.9). Total num frames: 1643118592. Throughput: 0: 10023.8. Samples: 410837504. Policy #0 lag: (min: 95.0, avg: 123.9, max: 288.0) [2024-06-15 22:05:10,739][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 22:05:13,531][1652475] Updated weights for policy 0, policy_version 802340 (0.0049) [2024-06-15 22:05:15,622][1652475] Updated weights for policy 0, policy_version 802431 (0.0181) [2024-06-15 22:05:15,742][1648984] Fps is (10 sec: 52432.0, 60 sec: 40957.3, 300 sec: 43208.8). Total num frames: 1643380736. 
Throughput: 0: 10227.7. Samples: 410896384. Policy #0 lag: (min: 95.0, avg: 123.9, max: 288.0) [2024-06-15 22:05:15,742][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 22:05:20,092][1652475] Updated weights for policy 0, policy_version 802496 (0.0012) [2024-06-15 22:05:20,539][1651340] Signal inference workers to stop experience collection... (41300 times) [2024-06-15 22:05:20,631][1652475] InferenceWorker_p0-w0: stopping experience collection (41300 times) [2024-06-15 22:05:20,738][1648984] Fps is (10 sec: 42599.6, 60 sec: 42052.3, 300 sec: 42765.0). Total num frames: 1643544576. Throughput: 0: 10274.1. Samples: 410938880. Policy #0 lag: (min: 95.0, avg: 123.9, max: 288.0) [2024-06-15 22:05:20,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 22:05:20,776][1651340] Signal inference workers to resume experience collection... (41300 times) [2024-06-15 22:05:20,778][1652475] InferenceWorker_p0-w0: resuming experience collection (41300 times) [2024-06-15 22:05:21,538][1652475] Updated weights for policy 0, policy_version 802560 (0.0286) [2024-06-15 22:05:25,738][1648984] Fps is (10 sec: 36058.7, 60 sec: 42052.3, 300 sec: 42987.2). Total num frames: 1643741184. Throughput: 0: 10376.5. Samples: 411000320. Policy #0 lag: (min: 95.0, avg: 123.9, max: 288.0) [2024-06-15 22:05:25,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 22:05:26,536][1652475] Updated weights for policy 0, policy_version 802643 (0.0119) [2024-06-15 22:05:30,738][1648984] Fps is (10 sec: 36045.0, 60 sec: 39321.6, 300 sec: 42431.8). Total num frames: 1643905024. Throughput: 0: 10274.1. Samples: 411063296. 
Policy #0 lag: (min: 95.0, avg: 123.9, max: 288.0) [2024-06-15 22:05:30,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 22:05:31,410][1652475] Updated weights for policy 0, policy_version 802709 (0.0015) [2024-06-15 22:05:32,658][1652475] Updated weights for policy 0, policy_version 802768 (0.0030) [2024-06-15 22:05:33,683][1652475] Updated weights for policy 0, policy_version 802816 (0.0011) [2024-06-15 22:05:35,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 43690.6, 300 sec: 42765.1). Total num frames: 1644167168. Throughput: 0: 10126.3. Samples: 411093504. Policy #0 lag: (min: 95.0, avg: 123.9, max: 288.0) [2024-06-15 22:05:35,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 22:05:37,729][1652475] Updated weights for policy 0, policy_version 802880 (0.0013) [2024-06-15 22:05:39,178][1652475] Updated weights for policy 0, policy_version 802944 (0.0012) [2024-06-15 22:05:40,740][1648984] Fps is (10 sec: 52428.5, 60 sec: 40960.0, 300 sec: 42653.9). Total num frames: 1644429312. Throughput: 0: 10547.2. Samples: 411156480. Policy #0 lag: (min: 95.0, avg: 123.9, max: 288.0) [2024-06-15 22:05:40,741][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 22:05:44,539][1652475] Updated weights for policy 0, policy_version 803024 (0.0014) [2024-06-15 22:05:45,741][1648984] Fps is (10 sec: 52410.2, 60 sec: 43688.0, 300 sec: 43097.7). Total num frames: 1644691456. Throughput: 0: 10489.5. Samples: 411225600. Policy #0 lag: (min: 95.0, avg: 123.9, max: 288.0) [2024-06-15 22:05:45,742][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 22:05:48,229][1652475] Updated weights for policy 0, policy_version 803077 (0.0013) [2024-06-15 22:05:49,865][1652475] Updated weights for policy 0, policy_version 803138 (0.0076) [2024-06-15 22:05:50,738][1648984] Fps is (10 sec: 45875.9, 60 sec: 43691.0, 300 sec: 43209.3). Total num frames: 1644888064. Throughput: 0: 10889.6. Samples: 411264000. 
Policy #0 lag: (min: 95.0, avg: 123.9, max: 288.0) [2024-06-15 22:05:50,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 22:05:51,023][1652475] Updated weights for policy 0, policy_version 803195 (0.0012) [2024-06-15 22:05:55,738][1648984] Fps is (10 sec: 36056.9, 60 sec: 40959.9, 300 sec: 42542.8). Total num frames: 1645051904. Throughput: 0: 11002.3. Samples: 411332608. Policy #0 lag: (min: 95.0, avg: 123.9, max: 288.0) [2024-06-15 22:05:55,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 22:05:56,022][1652475] Updated weights for policy 0, policy_version 803267 (0.0126) [2024-06-15 22:05:57,163][1652475] Updated weights for policy 0, policy_version 803324 (0.0012) [2024-06-15 22:06:00,094][1652475] Updated weights for policy 0, policy_version 803362 (0.0011) [2024-06-15 22:06:00,738][1648984] Fps is (10 sec: 45875.0, 60 sec: 45875.3, 300 sec: 43098.3). Total num frames: 1645346816. Throughput: 0: 11094.3. Samples: 411395584. Policy #0 lag: (min: 95.0, avg: 123.9, max: 288.0) [2024-06-15 22:06:00,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 22:06:04,026][1652475] Updated weights for policy 0, policy_version 803440 (0.0010) [2024-06-15 22:06:05,738][1648984] Fps is (10 sec: 42599.4, 60 sec: 43694.0, 300 sec: 42653.9). Total num frames: 1645477888. Throughput: 0: 10956.8. Samples: 411431936. Policy #0 lag: (min: 95.0, avg: 123.9, max: 288.0) [2024-06-15 22:06:05,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 22:06:06,821][1652475] Updated weights for policy 0, policy_version 803474 (0.0012) [2024-06-15 22:06:07,635][1651340] Signal inference workers to stop experience collection... (41350 times) [2024-06-15 22:06:07,686][1652475] InferenceWorker_p0-w0: stopping experience collection (41350 times) [2024-06-15 22:06:07,928][1651340] Signal inference workers to resume experience collection... 
(41350 times) [2024-06-15 22:06:07,929][1652475] InferenceWorker_p0-w0: resuming experience collection (41350 times) [2024-06-15 22:06:08,586][1652475] Updated weights for policy 0, policy_version 803552 (0.0011) [2024-06-15 22:06:10,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 43690.9, 300 sec: 43098.3). Total num frames: 1645740032. Throughput: 0: 10786.1. Samples: 411485696. Policy #0 lag: (min: 95.0, avg: 123.9, max: 288.0) [2024-06-15 22:06:10,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 22:06:11,308][1652475] Updated weights for policy 0, policy_version 803600 (0.0025) [2024-06-15 22:06:15,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 41508.8, 300 sec: 42987.2). Total num frames: 1645871104. Throughput: 0: 10899.9. Samples: 411553792. Policy #0 lag: (min: 95.0, avg: 123.9, max: 288.0) [2024-06-15 22:06:15,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 22:06:19,239][1652475] Updated weights for policy 0, policy_version 803699 (0.0010) [2024-06-15 22:06:20,738][1648984] Fps is (10 sec: 36044.0, 60 sec: 42598.3, 300 sec: 42542.8). Total num frames: 1646100480. Throughput: 0: 11036.4. Samples: 411590144. Policy #0 lag: (min: 95.0, avg: 123.9, max: 288.0) [2024-06-15 22:06:20,739][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 22:06:21,079][1652475] Updated weights for policy 0, policy_version 803782 (0.0257) [2024-06-15 22:06:22,321][1652475] Updated weights for policy 0, policy_version 803832 (0.0016) [2024-06-15 22:06:23,786][1652475] Updated weights for policy 0, policy_version 803859 (0.0012) [2024-06-15 22:06:25,738][1648984] Fps is (10 sec: 52428.3, 60 sec: 44236.7, 300 sec: 43098.2). Total num frames: 1646395392. Throughput: 0: 10877.1. Samples: 411645952. 
Policy #0 lag: (min: 95.0, avg: 123.9, max: 288.0) [2024-06-15 22:06:25,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 22:06:30,570][1652475] Updated weights for policy 0, policy_version 803936 (0.0011) [2024-06-15 22:06:30,738][1648984] Fps is (10 sec: 36045.8, 60 sec: 42598.4, 300 sec: 42209.7). Total num frames: 1646460928. Throughput: 0: 10900.8. Samples: 411716096. Policy #0 lag: (min: 95.0, avg: 123.9, max: 288.0) [2024-06-15 22:06:30,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 22:06:33,567][1652475] Updated weights for policy 0, policy_version 804064 (0.0011) [2024-06-15 22:06:35,738][1648984] Fps is (10 sec: 39321.9, 60 sec: 43690.6, 300 sec: 42654.6). Total num frames: 1646788608. Throughput: 0: 10422.0. Samples: 411732992. Policy #0 lag: (min: 95.0, avg: 123.9, max: 288.0) [2024-06-15 22:06:35,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 22:06:37,884][1652475] Updated weights for policy 0, policy_version 804112 (0.0011) [2024-06-15 22:06:40,738][1648984] Fps is (10 sec: 45874.6, 60 sec: 41506.1, 300 sec: 42653.9). Total num frames: 1646919680. Throughput: 0: 10444.8. Samples: 411802624. Policy #0 lag: (min: 10.0, avg: 136.3, max: 266.0) [2024-06-15 22:06:40,739][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 22:06:42,300][1652475] Updated weights for policy 0, policy_version 804208 (0.0016) [2024-06-15 22:06:44,197][1652475] Updated weights for policy 0, policy_version 804277 (0.0011) [2024-06-15 22:06:45,738][1648984] Fps is (10 sec: 45875.0, 60 sec: 42600.9, 300 sec: 42765.0). Total num frames: 1647247360. Throughput: 0: 10399.3. Samples: 411863552. Policy #0 lag: (min: 10.0, avg: 136.3, max: 266.0) [2024-06-15 22:06:45,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 22:06:46,081][1652475] Updated weights for policy 0, policy_version 804352 (0.0011) [2024-06-15 22:06:50,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 40413.8, 300 sec: 42653.9). Total num frames: 1647312896. 
Throughput: 0: 10319.6. Samples: 411896320. Policy #0 lag: (min: 10.0, avg: 136.3, max: 266.0) [2024-06-15 22:06:50,739][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 22:06:51,595][1652475] Updated weights for policy 0, policy_version 804411 (0.0083) [2024-06-15 22:06:53,911][1651340] Signal inference workers to stop experience collection... (41400 times) [2024-06-15 22:06:53,961][1652475] InferenceWorker_p0-w0: stopping experience collection (41400 times) [2024-06-15 22:06:54,152][1651340] Signal inference workers to resume experience collection... (41400 times) [2024-06-15 22:06:54,153][1652475] InferenceWorker_p0-w0: resuming experience collection (41400 times) [2024-06-15 22:06:54,374][1652475] Updated weights for policy 0, policy_version 804479 (0.0017) [2024-06-15 22:06:55,738][1648984] Fps is (10 sec: 36043.7, 60 sec: 42598.3, 300 sec: 42542.8). Total num frames: 1647607808. Throughput: 0: 10638.1. Samples: 411964416. Policy #0 lag: (min: 10.0, avg: 136.3, max: 266.0) [2024-06-15 22:06:55,739][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 22:06:56,367][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000804528_1647673344.pth... [2024-06-15 22:06:56,405][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000799488_1637351424.pth [2024-06-15 22:06:57,084][1652475] Updated weights for policy 0, policy_version 804545 (0.0010) [2024-06-15 22:06:58,373][1652475] Updated weights for policy 0, policy_version 804597 (0.0054) [2024-06-15 22:07:00,738][1648984] Fps is (10 sec: 52429.4, 60 sec: 41506.1, 300 sec: 42654.0). Total num frames: 1647837184. Throughput: 0: 10524.5. Samples: 412027392. 
Policy #0 lag: (min: 10.0, avg: 136.3, max: 266.0) [2024-06-15 22:07:00,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 22:07:02,449][1652475] Updated weights for policy 0, policy_version 804629 (0.0035) [2024-06-15 22:07:05,192][1652475] Updated weights for policy 0, policy_version 804680 (0.0062) [2024-06-15 22:07:05,738][1648984] Fps is (10 sec: 42600.0, 60 sec: 42598.4, 300 sec: 42876.1). Total num frames: 1648033792. Throughput: 0: 10422.1. Samples: 412059136. Policy #0 lag: (min: 10.0, avg: 136.3, max: 266.0) [2024-06-15 22:07:05,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 22:07:06,327][1652475] Updated weights for policy 0, policy_version 804736 (0.0012) [2024-06-15 22:07:08,062][1652475] Updated weights for policy 0, policy_version 804795 (0.0129) [2024-06-15 22:07:10,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 41506.2, 300 sec: 42320.7). Total num frames: 1648230400. Throughput: 0: 10706.5. Samples: 412127744. Policy #0 lag: (min: 10.0, avg: 136.3, max: 266.0) [2024-06-15 22:07:10,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 22:07:12,926][1652475] Updated weights for policy 0, policy_version 804861 (0.0012) [2024-06-15 22:07:15,738][1648984] Fps is (10 sec: 45875.2, 60 sec: 43690.7, 300 sec: 43098.2). Total num frames: 1648492544. Throughput: 0: 10399.3. Samples: 412184064. Policy #0 lag: (min: 10.0, avg: 136.3, max: 266.0) [2024-06-15 22:07:15,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 22:07:18,467][1652475] Updated weights for policy 0, policy_version 804946 (0.0011) [2024-06-15 22:07:19,757][1652475] Updated weights for policy 0, policy_version 804997 (0.0011) [2024-06-15 22:07:20,738][1648984] Fps is (10 sec: 45874.8, 60 sec: 43144.7, 300 sec: 42765.0). Total num frames: 1648689152. Throughput: 0: 10877.1. Samples: 412222464. 
Policy #0 lag: (min: 10.0, avg: 136.3, max: 266.0) [2024-06-15 22:07:20,739][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 22:07:21,137][1652475] Updated weights for policy 0, policy_version 805055 (0.0012) [2024-06-15 22:07:25,640][1652475] Updated weights for policy 0, policy_version 805105 (0.0013) [2024-06-15 22:07:25,738][1648984] Fps is (10 sec: 36044.5, 60 sec: 40960.0, 300 sec: 42542.9). Total num frames: 1648852992. Throughput: 0: 10717.9. Samples: 412284928. Policy #0 lag: (min: 10.0, avg: 136.3, max: 266.0) [2024-06-15 22:07:25,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 22:07:26,706][1652475] Updated weights for policy 0, policy_version 805169 (0.0013) [2024-06-15 22:07:30,738][1648984] Fps is (10 sec: 42598.5, 60 sec: 44236.8, 300 sec: 43098.3). Total num frames: 1649115136. Throughput: 0: 10888.5. Samples: 412353536. Policy #0 lag: (min: 10.0, avg: 136.3, max: 266.0) [2024-06-15 22:07:30,738][1648984] Avg episode reward: [(0, '-0.200')] [2024-06-15 22:07:30,890][1652475] Updated weights for policy 0, policy_version 805240 (0.0012) [2024-06-15 22:07:33,244][1652475] Updated weights for policy 0, policy_version 805312 (0.0018) [2024-06-15 22:07:35,738][1648984] Fps is (10 sec: 42598.2, 60 sec: 41506.1, 300 sec: 42431.8). Total num frames: 1649278976. Throughput: 0: 10706.5. Samples: 412378112. Policy #0 lag: (min: 10.0, avg: 136.3, max: 266.0) [2024-06-15 22:07:35,739][1648984] Avg episode reward: [(0, '-0.230')] [2024-06-15 22:07:38,111][1652475] Updated weights for policy 0, policy_version 805434 (0.0015) [2024-06-15 22:07:40,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 43690.7, 300 sec: 43098.3). Total num frames: 1649541120. Throughput: 0: 10661.1. Samples: 412444160. Policy #0 lag: (min: 10.0, avg: 136.3, max: 266.0) [2024-06-15 22:07:40,738][1648984] Avg episode reward: [(0, '-0.250')] [2024-06-15 22:07:41,867][1651340] Signal inference workers to stop experience collection... 
(41450 times) [2024-06-15 22:07:41,914][1652475] InferenceWorker_p0-w0: stopping experience collection (41450 times) [2024-06-15 22:07:42,151][1651340] Signal inference workers to resume experience collection... (41450 times) [2024-06-15 22:07:42,152][1652475] InferenceWorker_p0-w0: resuming experience collection (41450 times) [2024-06-15 22:07:42,596][1652475] Updated weights for policy 0, policy_version 805488 (0.0092) [2024-06-15 22:07:45,571][1652475] Updated weights for policy 0, policy_version 805528 (0.0015) [2024-06-15 22:07:45,738][1648984] Fps is (10 sec: 45875.7, 60 sec: 41506.2, 300 sec: 42765.0). Total num frames: 1649737728. Throughput: 0: 10831.6. Samples: 412514816. Policy #0 lag: (min: 10.0, avg: 136.3, max: 266.0) [2024-06-15 22:07:45,738][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 22:07:48,084][1652475] Updated weights for policy 0, policy_version 805584 (0.0013) [2024-06-15 22:07:49,243][1652475] Updated weights for policy 0, policy_version 805648 (0.0014) [2024-06-15 22:07:50,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 45875.2, 300 sec: 43098.5). Total num frames: 1650065408. Throughput: 0: 10877.1. Samples: 412548608. Policy #0 lag: (min: 10.0, avg: 136.3, max: 266.0) [2024-06-15 22:07:50,738][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 22:07:53,824][1652475] Updated weights for policy 0, policy_version 805728 (0.0012) [2024-06-15 22:07:55,738][1648984] Fps is (10 sec: 45875.0, 60 sec: 43144.7, 300 sec: 43098.2). Total num frames: 1650196480. Throughput: 0: 10843.0. Samples: 412615680. Policy #0 lag: (min: 10.0, avg: 136.3, max: 266.0) [2024-06-15 22:07:55,738][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 22:07:56,668][1652475] Updated weights for policy 0, policy_version 805776 (0.0014) [2024-06-15 22:07:57,947][1652475] Updated weights for policy 0, policy_version 805824 (0.0018) [2024-06-15 22:08:00,738][1648984] Fps is (10 sec: 36045.0, 60 sec: 43144.5, 300 sec: 42987.2). 
Total num frames: 1650425856. Throughput: 0: 11047.8. Samples: 412681216. Policy #0 lag: (min: 10.0, avg: 136.3, max: 266.0) [2024-06-15 22:08:00,738][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 22:08:01,266][1652475] Updated weights for policy 0, policy_version 805905 (0.0012) [2024-06-15 22:08:05,738][1648984] Fps is (10 sec: 42598.8, 60 sec: 43144.5, 300 sec: 42987.2). Total num frames: 1650622464. Throughput: 0: 10911.3. Samples: 412713472. Policy #0 lag: (min: 10.0, avg: 136.3, max: 266.0) [2024-06-15 22:08:05,738][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 22:08:05,851][1652475] Updated weights for policy 0, policy_version 805971 (0.0013) [2024-06-15 22:08:08,123][1652475] Updated weights for policy 0, policy_version 806018 (0.0013) [2024-06-15 22:08:10,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 43690.6, 300 sec: 42653.9). Total num frames: 1650851840. Throughput: 0: 11036.5. Samples: 412781568. Policy #0 lag: (min: 10.0, avg: 136.3, max: 266.0) [2024-06-15 22:08:10,738][1648984] Avg episode reward: [(0, '-0.230')] [2024-06-15 22:08:11,256][1652475] Updated weights for policy 0, policy_version 806086 (0.0013) [2024-06-15 22:08:12,749][1652475] Updated weights for policy 0, policy_version 806164 (0.0013) [2024-06-15 22:08:15,738][1648984] Fps is (10 sec: 49151.8, 60 sec: 43690.7, 300 sec: 43098.3). Total num frames: 1651113984. Throughput: 0: 10990.9. Samples: 412848128. Policy #0 lag: (min: 10.0, avg: 136.3, max: 266.0) [2024-06-15 22:08:15,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 22:08:17,379][1652475] Updated weights for policy 0, policy_version 806210 (0.0017) [2024-06-15 22:08:18,644][1652475] Updated weights for policy 0, policy_version 806271 (0.0011) [2024-06-15 22:08:20,738][1648984] Fps is (10 sec: 49152.2, 60 sec: 44236.9, 300 sec: 42987.2). Total num frames: 1651343360. Throughput: 0: 11207.1. Samples: 412882432. 
Policy #0 lag: (min: 10.0, avg: 136.3, max: 266.0) [2024-06-15 22:08:20,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 22:08:20,741][1652475] Updated weights for policy 0, policy_version 806329 (0.0012) [2024-06-15 22:08:23,396][1652475] Updated weights for policy 0, policy_version 806390 (0.0012) [2024-06-15 22:08:24,411][1651340] Signal inference workers to stop experience collection... (41500 times) [2024-06-15 22:08:24,464][1652475] InferenceWorker_p0-w0: stopping experience collection (41500 times) [2024-06-15 22:08:24,751][1651340] Signal inference workers to resume experience collection... (41500 times) [2024-06-15 22:08:24,752][1652475] InferenceWorker_p0-w0: resuming experience collection (41500 times) [2024-06-15 22:08:25,001][1652475] Updated weights for policy 0, policy_version 806461 (0.0012) [2024-06-15 22:08:25,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 46421.4, 300 sec: 43098.2). Total num frames: 1651638272. Throughput: 0: 11207.1. Samples: 412948480. Policy #0 lag: (min: 10.0, avg: 136.3, max: 266.0) [2024-06-15 22:08:25,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 22:08:30,738][1648984] Fps is (10 sec: 29491.1, 60 sec: 42052.3, 300 sec: 42543.0). Total num frames: 1651638272. Throughput: 0: 11195.7. Samples: 413018624. Policy #0 lag: (min: 10.0, avg: 136.3, max: 266.0) [2024-06-15 22:08:30,738][1648984] Avg episode reward: [(0, '-0.560')] [2024-06-15 22:08:32,544][1652475] Updated weights for policy 0, policy_version 806560 (0.0013) [2024-06-15 22:08:35,202][1652475] Updated weights for policy 0, policy_version 806631 (0.0032) [2024-06-15 22:08:35,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 45875.3, 300 sec: 42765.0). Total num frames: 1652031488. Throughput: 0: 11047.8. Samples: 413045760. 
Policy #0 lag: (min: 8.0, avg: 91.0, max: 264.0) [2024-06-15 22:08:35,738][1648984] Avg episode reward: [(0, '-0.250')] [2024-06-15 22:08:37,123][1652475] Updated weights for policy 0, policy_version 806712 (0.0014) [2024-06-15 22:08:40,738][1648984] Fps is (10 sec: 52428.0, 60 sec: 43690.6, 300 sec: 43098.2). Total num frames: 1652162560. Throughput: 0: 10831.6. Samples: 413103104. Policy #0 lag: (min: 8.0, avg: 91.0, max: 264.0) [2024-06-15 22:08:40,739][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 22:08:43,518][1652475] Updated weights for policy 0, policy_version 806773 (0.0076) [2024-06-15 22:08:45,194][1652475] Updated weights for policy 0, policy_version 806832 (0.0011) [2024-06-15 22:08:45,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 44782.9, 300 sec: 42653.9). Total num frames: 1652424704. Throughput: 0: 10934.0. Samples: 413173248. Policy #0 lag: (min: 8.0, avg: 91.0, max: 264.0) [2024-06-15 22:08:45,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 22:08:46,901][1652475] Updated weights for policy 0, policy_version 806896 (0.0012) [2024-06-15 22:08:49,064][1652475] Updated weights for policy 0, policy_version 806928 (0.0011) [2024-06-15 22:08:50,738][1648984] Fps is (10 sec: 52428.1, 60 sec: 43690.5, 300 sec: 43098.2). Total num frames: 1652686848. Throughput: 0: 10877.1. Samples: 413202944. Policy #0 lag: (min: 8.0, avg: 91.0, max: 264.0) [2024-06-15 22:08:50,739][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 22:08:53,875][1652475] Updated weights for policy 0, policy_version 806996 (0.0013) [2024-06-15 22:08:55,738][1648984] Fps is (10 sec: 39321.1, 60 sec: 43690.6, 300 sec: 43098.3). Total num frames: 1652817920. Throughput: 0: 10968.1. Samples: 413275136. Policy #0 lag: (min: 8.0, avg: 91.0, max: 264.0) [2024-06-15 22:08:55,739][1648984] Avg episode reward: [(0, '-0.250')] [2024-06-15 22:08:55,746][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000807040_1652817920.pth... 
[2024-06-15 22:08:55,832][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000802048_1642594304.pth [2024-06-15 22:08:56,746][1652475] Updated weights for policy 0, policy_version 807059 (0.0012) [2024-06-15 22:08:59,249][1652475] Updated weights for policy 0, policy_version 807162 (0.0013) [2024-06-15 22:09:00,738][1648984] Fps is (10 sec: 39322.5, 60 sec: 44236.8, 300 sec: 42653.9). Total num frames: 1653080064. Throughput: 0: 10706.5. Samples: 413329920. Policy #0 lag: (min: 8.0, avg: 91.0, max: 264.0) [2024-06-15 22:09:00,738][1648984] Avg episode reward: [(0, '-0.250')] [2024-06-15 22:09:03,311][1652475] Updated weights for policy 0, policy_version 807216 (0.0039) [2024-06-15 22:09:05,738][1648984] Fps is (10 sec: 49153.2, 60 sec: 44782.9, 300 sec: 43431.5). Total num frames: 1653309440. Throughput: 0: 10820.3. Samples: 413369344. Policy #0 lag: (min: 8.0, avg: 91.0, max: 264.0) [2024-06-15 22:09:05,738][1648984] Avg episode reward: [(0, '-0.230')] [2024-06-15 22:09:05,812][1652475] Updated weights for policy 0, policy_version 807284 (0.0058) [2024-06-15 22:09:08,617][1652475] Updated weights for policy 0, policy_version 807328 (0.0014) [2024-06-15 22:09:10,738][1648984] Fps is (10 sec: 49152.2, 60 sec: 45329.1, 300 sec: 42876.1). Total num frames: 1653571584. Throughput: 0: 10888.5. Samples: 413438464. Policy #0 lag: (min: 8.0, avg: 91.0, max: 264.0) [2024-06-15 22:09:10,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 22:09:10,940][1652475] Updated weights for policy 0, policy_version 807423 (0.0086) [2024-06-15 22:09:14,144][1651340] Signal inference workers to stop experience collection... (41550 times) [2024-06-15 22:09:14,175][1652475] InferenceWorker_p0-w0: stopping experience collection (41550 times) [2024-06-15 22:09:14,340][1651340] Signal inference workers to resume experience collection... 
(41550 times) [2024-06-15 22:09:14,341][1652475] InferenceWorker_p0-w0: resuming experience collection (41550 times) [2024-06-15 22:09:15,278][1652475] Updated weights for policy 0, policy_version 807485 (0.0021) [2024-06-15 22:09:15,738][1648984] Fps is (10 sec: 42597.9, 60 sec: 43690.6, 300 sec: 43098.2). Total num frames: 1653735424. Throughput: 0: 10740.6. Samples: 413501952. Policy #0 lag: (min: 8.0, avg: 91.0, max: 264.0) [2024-06-15 22:09:15,739][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 22:09:17,390][1652475] Updated weights for policy 0, policy_version 807547 (0.0013) [2024-06-15 22:09:20,738][1648984] Fps is (10 sec: 36044.8, 60 sec: 43144.5, 300 sec: 43098.3). Total num frames: 1653932032. Throughput: 0: 10831.7. Samples: 413533184. Policy #0 lag: (min: 8.0, avg: 91.0, max: 264.0) [2024-06-15 22:09:20,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 22:09:23,127][1652475] Updated weights for policy 0, policy_version 807622 (0.0016) [2024-06-15 22:09:24,281][1652475] Updated weights for policy 0, policy_version 807680 (0.0012) [2024-06-15 22:09:25,738][1648984] Fps is (10 sec: 39321.9, 60 sec: 41506.1, 300 sec: 42653.9). Total num frames: 1654128640. Throughput: 0: 11025.1. Samples: 413599232. Policy #0 lag: (min: 8.0, avg: 91.0, max: 264.0) [2024-06-15 22:09:25,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 22:09:28,450][1652475] Updated weights for policy 0, policy_version 807760 (0.0012) [2024-06-15 22:09:30,738][1648984] Fps is (10 sec: 45875.0, 60 sec: 45875.2, 300 sec: 43542.5). Total num frames: 1654390784. Throughput: 0: 10831.6. Samples: 413660672. Policy #0 lag: (min: 8.0, avg: 91.0, max: 264.0) [2024-06-15 22:09:30,738][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 22:09:32,752][1652475] Updated weights for policy 0, policy_version 807872 (0.0014) [2024-06-15 22:09:35,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 41506.2, 300 sec: 42542.9). Total num frames: 1654521856. 
Throughput: 0: 10797.6. Samples: 413688832. Policy #0 lag: (min: 8.0, avg: 91.0, max: 264.0) [2024-06-15 22:09:35,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 22:09:38,311][1652475] Updated weights for policy 0, policy_version 807924 (0.0014) [2024-06-15 22:09:39,806][1652475] Updated weights for policy 0, policy_version 807998 (0.0011) [2024-06-15 22:09:40,738][1648984] Fps is (10 sec: 39321.2, 60 sec: 43690.7, 300 sec: 43098.2). Total num frames: 1654784000. Throughput: 0: 10615.5. Samples: 413752832. Policy #0 lag: (min: 8.0, avg: 91.0, max: 264.0) [2024-06-15 22:09:40,739][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 22:09:43,485][1652475] Updated weights for policy 0, policy_version 808052 (0.0134) [2024-06-15 22:09:45,101][1652475] Updated weights for policy 0, policy_version 808125 (0.0019) [2024-06-15 22:09:45,739][1648984] Fps is (10 sec: 52422.8, 60 sec: 43689.9, 300 sec: 43320.3). Total num frames: 1655046144. Throughput: 0: 10740.4. Samples: 413813248. Policy #0 lag: (min: 8.0, avg: 91.0, max: 264.0) [2024-06-15 22:09:45,739][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 22:09:50,511][1652475] Updated weights for policy 0, policy_version 808187 (0.0013) [2024-06-15 22:09:50,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 41506.3, 300 sec: 42653.9). Total num frames: 1655177216. Throughput: 0: 10660.9. Samples: 413849088. Policy #0 lag: (min: 8.0, avg: 91.0, max: 264.0) [2024-06-15 22:09:50,739][1648984] Avg episode reward: [(0, '-0.250')] [2024-06-15 22:09:54,611][1652475] Updated weights for policy 0, policy_version 808260 (0.0022) [2024-06-15 22:09:55,742][1648984] Fps is (10 sec: 36034.7, 60 sec: 43141.8, 300 sec: 43430.9). Total num frames: 1655406592. Throughput: 0: 10557.7. Samples: 413913600. 
Policy #0 lag: (min: 8.0, avg: 91.0, max: 264.0) [2024-06-15 22:09:55,743][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 22:09:56,823][1652475] Updated weights for policy 0, policy_version 808321 (0.0012) [2024-06-15 22:09:58,098][1652475] Updated weights for policy 0, policy_version 808384 (0.0013) [2024-06-15 22:10:00,738][1648984] Fps is (10 sec: 39321.9, 60 sec: 41506.1, 300 sec: 43098.9). Total num frames: 1655570432. Throughput: 0: 10581.3. Samples: 413978112. Policy #0 lag: (min: 8.0, avg: 91.0, max: 264.0) [2024-06-15 22:10:00,739][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 22:10:01,324][1651340] Signal inference workers to stop experience collection... (41600 times) [2024-06-15 22:10:01,414][1652475] InferenceWorker_p0-w0: stopping experience collection (41600 times) [2024-06-15 22:10:01,548][1651340] Signal inference workers to resume experience collection... (41600 times) [2024-06-15 22:10:01,549][1652475] InferenceWorker_p0-w0: resuming experience collection (41600 times) [2024-06-15 22:10:01,718][1652475] Updated weights for policy 0, policy_version 808444 (0.0040) [2024-06-15 22:10:03,372][1652475] Updated weights for policy 0, policy_version 808500 (0.0014) [2024-06-15 22:10:05,738][1648984] Fps is (10 sec: 42615.1, 60 sec: 42052.2, 300 sec: 43098.3). Total num frames: 1655832576. Throughput: 0: 10592.7. Samples: 414009856. Policy #0 lag: (min: 8.0, avg: 91.0, max: 264.0) [2024-06-15 22:10:05,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 22:10:07,108][1652475] Updated weights for policy 0, policy_version 808560 (0.0011) [2024-06-15 22:10:09,302][1652475] Updated weights for policy 0, policy_version 808624 (0.0011) [2024-06-15 22:10:10,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 42052.3, 300 sec: 43098.8). Total num frames: 1656094720. Throughput: 0: 10706.5. Samples: 414081024. 
Policy #0 lag: (min: 8.0, avg: 91.0, max: 264.0) [2024-06-15 22:10:10,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 22:10:12,032][1652475] Updated weights for policy 0, policy_version 808672 (0.0014) [2024-06-15 22:10:14,757][1652475] Updated weights for policy 0, policy_version 808752 (0.0115) [2024-06-15 22:10:15,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 43690.7, 300 sec: 43431.5). Total num frames: 1656356864. Throughput: 0: 10774.8. Samples: 414145536. Policy #0 lag: (min: 8.0, avg: 91.0, max: 264.0) [2024-06-15 22:10:15,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 22:10:18,958][1652475] Updated weights for policy 0, policy_version 808832 (0.0074) [2024-06-15 22:10:20,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 42598.4, 300 sec: 43209.3). Total num frames: 1656487936. Throughput: 0: 11025.1. Samples: 414184960. Policy #0 lag: (min: 27.0, avg: 125.3, max: 283.0) [2024-06-15 22:10:20,739][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 22:10:22,035][1652475] Updated weights for policy 0, policy_version 808896 (0.0011) [2024-06-15 22:10:24,538][1652475] Updated weights for policy 0, policy_version 808958 (0.0011) [2024-06-15 22:10:25,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 1656750080. Throughput: 0: 10979.6. Samples: 414246912. Policy #0 lag: (min: 27.0, avg: 125.3, max: 283.0) [2024-06-15 22:10:25,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 22:10:26,783][1652475] Updated weights for policy 0, policy_version 809008 (0.0022) [2024-06-15 22:10:30,738][1648984] Fps is (10 sec: 49152.4, 60 sec: 43144.6, 300 sec: 43431.5). Total num frames: 1656979456. Throughput: 0: 11184.7. Samples: 414316544. 
Policy #0 lag: (min: 27.0, avg: 125.3, max: 283.0) [2024-06-15 22:10:30,738][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 22:10:30,849][1652475] Updated weights for policy 0, policy_version 809087 (0.0105) [2024-06-15 22:10:33,387][1652475] Updated weights for policy 0, policy_version 809152 (0.0011) [2024-06-15 22:10:35,738][1648984] Fps is (10 sec: 49152.0, 60 sec: 45329.1, 300 sec: 43431.5). Total num frames: 1657241600. Throughput: 0: 11002.3. Samples: 414344192. Policy #0 lag: (min: 27.0, avg: 125.3, max: 283.0) [2024-06-15 22:10:35,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 22:10:38,340][1652475] Updated weights for policy 0, policy_version 809234 (0.0012) [2024-06-15 22:10:40,738][1648984] Fps is (10 sec: 42598.5, 60 sec: 43690.8, 300 sec: 43098.8). Total num frames: 1657405440. Throughput: 0: 10991.9. Samples: 414408192. Policy #0 lag: (min: 27.0, avg: 125.3, max: 283.0) [2024-06-15 22:10:40,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 22:10:43,302][1652475] Updated weights for policy 0, policy_version 809300 (0.0012) [2024-06-15 22:10:45,653][1652475] Updated weights for policy 0, policy_version 809406 (0.0012) [2024-06-15 22:10:45,738][1648984] Fps is (10 sec: 42598.8, 60 sec: 43691.6, 300 sec: 43320.4). Total num frames: 1657667584. Throughput: 0: 11002.3. Samples: 414473216. Policy #0 lag: (min: 27.0, avg: 125.3, max: 283.0) [2024-06-15 22:10:45,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 22:10:49,803][1651340] Signal inference workers to stop experience collection... (41650 times) [2024-06-15 22:10:49,822][1652475] Updated weights for policy 0, policy_version 809473 (0.0013) [2024-06-15 22:10:49,858][1652475] InferenceWorker_p0-w0: stopping experience collection (41650 times) [2024-06-15 22:10:50,077][1651340] Signal inference workers to resume experience collection... 
(41650 times) [2024-06-15 22:10:50,078][1652475] InferenceWorker_p0-w0: resuming experience collection (41650 times) [2024-06-15 22:10:50,738][1648984] Fps is (10 sec: 45874.6, 60 sec: 44783.0, 300 sec: 43431.5). Total num frames: 1657864192. Throughput: 0: 10968.2. Samples: 414503424. Policy #0 lag: (min: 27.0, avg: 125.3, max: 283.0) [2024-06-15 22:10:50,739][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 22:10:55,280][1652475] Updated weights for policy 0, policy_version 809537 (0.0011) [2024-06-15 22:10:55,738][1648984] Fps is (10 sec: 29490.9, 60 sec: 42601.2, 300 sec: 42765.0). Total num frames: 1657962496. Throughput: 0: 10922.7. Samples: 414572544. Policy #0 lag: (min: 27.0, avg: 125.3, max: 283.0) [2024-06-15 22:10:55,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 22:10:56,308][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000809584_1658028032.pth... [2024-06-15 22:10:56,355][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000804528_1647673344.pth [2024-06-15 22:10:56,361][1651340] Saving a milestone train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/milestones/checkpoint_000809584_1658028032.pth [2024-06-15 22:10:56,644][1652475] Updated weights for policy 0, policy_version 809598 (0.0010) [2024-06-15 22:10:58,243][1652475] Updated weights for policy 0, policy_version 809648 (0.0013) [2024-06-15 22:11:00,425][1652475] Updated weights for policy 0, policy_version 809696 (0.0027) [2024-06-15 22:11:00,738][1648984] Fps is (10 sec: 39321.3, 60 sec: 44782.8, 300 sec: 43320.4). Total num frames: 1658257408. Throughput: 0: 10899.9. Samples: 414636032. 
Policy #0 lag: (min: 27.0, avg: 125.3, max: 283.0) [2024-06-15 22:11:00,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 22:11:02,602][1652475] Updated weights for policy 0, policy_version 809782 (0.0088) [2024-06-15 22:11:05,738][1648984] Fps is (10 sec: 49152.1, 60 sec: 43690.7, 300 sec: 43098.3). Total num frames: 1658454016. Throughput: 0: 10570.0. Samples: 414660608. Policy #0 lag: (min: 27.0, avg: 125.3, max: 283.0) [2024-06-15 22:11:05,741][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 22:11:07,011][1652475] Updated weights for policy 0, policy_version 809824 (0.0026) [2024-06-15 22:11:10,525][1652475] Updated weights for policy 0, policy_version 809862 (0.0023) [2024-06-15 22:11:10,738][1648984] Fps is (10 sec: 36044.3, 60 sec: 42052.1, 300 sec: 43209.3). Total num frames: 1658617856. Throughput: 0: 10820.2. Samples: 414733824. Policy #0 lag: (min: 27.0, avg: 125.3, max: 283.0) [2024-06-15 22:11:10,739][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 22:11:11,732][1652475] Updated weights for policy 0, policy_version 809920 (0.0017) [2024-06-15 22:11:14,414][1652475] Updated weights for policy 0, policy_version 810000 (0.0013) [2024-06-15 22:11:15,625][1652475] Updated weights for policy 0, policy_version 810046 (0.0012) [2024-06-15 22:11:15,774][1648984] Fps is (10 sec: 52240.1, 60 sec: 43664.4, 300 sec: 43648.3). Total num frames: 1658978304. Throughput: 0: 10447.8. Samples: 414787072. Policy #0 lag: (min: 27.0, avg: 125.3, max: 283.0) [2024-06-15 22:11:15,775][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 22:11:20,738][1648984] Fps is (10 sec: 49153.3, 60 sec: 43690.7, 300 sec: 43098.3). Total num frames: 1659109376. Throughput: 0: 10717.9. Samples: 414826496. 
Policy #0 lag: (min: 27.0, avg: 125.3, max: 283.0) [2024-06-15 22:11:20,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 22:11:23,273][1652475] Updated weights for policy 0, policy_version 810113 (0.0044) [2024-06-15 22:11:25,316][1652475] Updated weights for policy 0, policy_version 810194 (0.0014) [2024-06-15 22:11:25,738][1648984] Fps is (10 sec: 32886.6, 60 sec: 42598.4, 300 sec: 43542.6). Total num frames: 1659305984. Throughput: 0: 10626.8. Samples: 414886400. Policy #0 lag: (min: 27.0, avg: 125.3, max: 283.0) [2024-06-15 22:11:25,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 22:11:28,200][1652475] Updated weights for policy 0, policy_version 810241 (0.0016) [2024-06-15 22:11:28,938][1652475] Updated weights for policy 0, policy_version 810274 (0.0013) [2024-06-15 22:11:30,109][1652475] Updated weights for policy 0, policy_version 810320 (0.0022) [2024-06-15 22:11:30,738][1648984] Fps is (10 sec: 45874.2, 60 sec: 43144.3, 300 sec: 43320.4). Total num frames: 1659568128. Throughput: 0: 10581.3. Samples: 414949376. Policy #0 lag: (min: 27.0, avg: 125.3, max: 283.0) [2024-06-15 22:11:30,739][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 22:11:31,014][1652475] Updated weights for policy 0, policy_version 810359 (0.0011) [2024-06-15 22:11:35,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 40960.0, 300 sec: 43320.4). Total num frames: 1659699200. Throughput: 0: 10706.5. Samples: 414985216. Policy #0 lag: (min: 27.0, avg: 125.3, max: 283.0) [2024-06-15 22:11:35,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 22:11:36,000][1652475] Updated weights for policy 0, policy_version 810431 (0.0012) [2024-06-15 22:11:38,436][1651340] Signal inference workers to stop experience collection... (41700 times) [2024-06-15 22:11:38,476][1652475] InferenceWorker_p0-w0: stopping experience collection (41700 times) [2024-06-15 22:11:38,690][1651340] Signal inference workers to resume experience collection... 
(41700 times) [2024-06-15 22:11:38,691][1652475] InferenceWorker_p0-w0: resuming experience collection (41700 times) [2024-06-15 22:11:38,752][1652475] Updated weights for policy 0, policy_version 810483 (0.0013) [2024-06-15 22:11:40,301][1652475] Updated weights for policy 0, policy_version 810515 (0.0012) [2024-06-15 22:11:40,738][1648984] Fps is (10 sec: 39322.4, 60 sec: 42598.3, 300 sec: 43098.3). Total num frames: 1659961344. Throughput: 0: 10581.3. Samples: 415048704. Policy #0 lag: (min: 27.0, avg: 125.3, max: 283.0) [2024-06-15 22:11:40,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 22:11:41,849][1652475] Updated weights for policy 0, policy_version 810576 (0.0011) [2024-06-15 22:11:45,738][1648984] Fps is (10 sec: 45875.2, 60 sec: 41506.1, 300 sec: 43542.6). Total num frames: 1660157952. Throughput: 0: 10547.2. Samples: 415110656. Policy #0 lag: (min: 27.0, avg: 125.3, max: 283.0) [2024-06-15 22:11:45,738][1648984] Avg episode reward: [(0, '-0.530')] [2024-06-15 22:11:47,496][1652475] Updated weights for policy 0, policy_version 810672 (0.0124) [2024-06-15 22:11:50,738][1648984] Fps is (10 sec: 32767.8, 60 sec: 40413.9, 300 sec: 42987.2). Total num frames: 1660289024. Throughput: 0: 10626.8. Samples: 415138816. Policy #0 lag: (min: 27.0, avg: 125.3, max: 283.0) [2024-06-15 22:11:50,739][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 22:11:52,926][1652475] Updated weights for policy 0, policy_version 810752 (0.0013) [2024-06-15 22:11:54,172][1652475] Updated weights for policy 0, policy_version 810808 (0.0011) [2024-06-15 22:11:55,738][1648984] Fps is (10 sec: 42598.1, 60 sec: 43690.6, 300 sec: 43209.3). Total num frames: 1660583936. Throughput: 0: 10410.7. Samples: 415202304. 
Policy #0 lag: (min: 27.0, avg: 125.3, max: 283.0) [2024-06-15 22:11:55,738][1648984] Avg episode reward: [(0, '-0.230')] [2024-06-15 22:11:56,567][1652475] Updated weights for policy 0, policy_version 810873 (0.0090) [2024-06-15 22:11:58,416][1652475] Updated weights for policy 0, policy_version 810914 (0.0011) [2024-06-15 22:11:58,996][1652475] Updated weights for policy 0, policy_version 810943 (0.0065) [2024-06-15 22:12:00,738][1648984] Fps is (10 sec: 52429.6, 60 sec: 42598.6, 300 sec: 43320.4). Total num frames: 1660813312. Throughput: 0: 10680.9. Samples: 415267328. Policy #0 lag: (min: 27.0, avg: 125.3, max: 283.0) [2024-06-15 22:12:00,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 22:12:05,228][1652475] Updated weights for policy 0, policy_version 811009 (0.0012) [2024-06-15 22:12:05,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 42052.2, 300 sec: 43209.3). Total num frames: 1660977152. Throughput: 0: 10638.2. Samples: 415305216. Policy #0 lag: (min: 27.0, avg: 125.3, max: 283.0) [2024-06-15 22:12:05,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 22:12:06,514][1652475] Updated weights for policy 0, policy_version 811069 (0.0024) [2024-06-15 22:12:09,129][1652475] Updated weights for policy 0, policy_version 811120 (0.0014) [2024-06-15 22:12:10,738][1648984] Fps is (10 sec: 45874.8, 60 sec: 44237.0, 300 sec: 43320.4). Total num frames: 1661272064. Throughput: 0: 10683.7. Samples: 415367168. Policy #0 lag: (min: 47.0, avg: 154.4, max: 351.0) [2024-06-15 22:12:10,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 22:12:11,245][1652475] Updated weights for policy 0, policy_version 811199 (0.0011) [2024-06-15 22:12:15,738][1648984] Fps is (10 sec: 45874.5, 60 sec: 40984.5, 300 sec: 43209.3). Total num frames: 1661435904. Throughput: 0: 10729.3. Samples: 415432192. 
Policy #0 lag: (min: 47.0, avg: 154.4, max: 351.0) [2024-06-15 22:12:15,739][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 22:12:16,030][1652475] Updated weights for policy 0, policy_version 811260 (0.0012) [2024-06-15 22:12:18,030][1652475] Updated weights for policy 0, policy_version 811321 (0.0012) [2024-06-15 22:12:20,738][1648984] Fps is (10 sec: 32767.8, 60 sec: 41506.1, 300 sec: 43209.3). Total num frames: 1661599744. Throughput: 0: 10592.7. Samples: 415461888. Policy #0 lag: (min: 47.0, avg: 154.4, max: 351.0) [2024-06-15 22:12:20,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 22:12:21,747][1652475] Updated weights for policy 0, policy_version 811362 (0.0020) [2024-06-15 22:12:24,173][1652475] Updated weights for policy 0, policy_version 811445 (0.0012) [2024-06-15 22:12:25,738][1648984] Fps is (10 sec: 42599.0, 60 sec: 42598.4, 300 sec: 43209.3). Total num frames: 1661861888. Throughput: 0: 10444.8. Samples: 415518720. Policy #0 lag: (min: 47.0, avg: 154.4, max: 351.0) [2024-06-15 22:12:25,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 22:12:26,685][1651340] Signal inference workers to stop experience collection... (41750 times) [2024-06-15 22:12:26,747][1652475] InferenceWorker_p0-w0: stopping experience collection (41750 times) [2024-06-15 22:12:26,955][1651340] Signal inference workers to resume experience collection... (41750 times) [2024-06-15 22:12:26,956][1652475] InferenceWorker_p0-w0: resuming experience collection (41750 times) [2024-06-15 22:12:27,073][1652475] Updated weights for policy 0, policy_version 811488 (0.0127) [2024-06-15 22:12:30,264][1652475] Updated weights for policy 0, policy_version 811555 (0.0013) [2024-06-15 22:12:30,737][1648984] Fps is (10 sec: 52430.0, 60 sec: 42598.7, 300 sec: 43542.6). Total num frames: 1662124032. Throughput: 0: 10740.7. Samples: 415593984. 
Policy #0 lag: (min: 47.0, avg: 154.4, max: 351.0) [2024-06-15 22:12:30,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 22:12:33,996][1652475] Updated weights for policy 0, policy_version 811632 (0.0011) [2024-06-15 22:12:35,738][1648984] Fps is (10 sec: 45875.2, 60 sec: 43690.6, 300 sec: 43320.4). Total num frames: 1662320640. Throughput: 0: 10956.8. Samples: 415631872. Policy #0 lag: (min: 47.0, avg: 154.4, max: 351.0) [2024-06-15 22:12:35,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 22:12:35,959][1652475] Updated weights for policy 0, policy_version 811704 (0.0046) [2024-06-15 22:12:38,599][1652475] Updated weights for policy 0, policy_version 811760 (0.0025) [2024-06-15 22:12:40,738][1648984] Fps is (10 sec: 39320.9, 60 sec: 42598.4, 300 sec: 43320.4). Total num frames: 1662517248. Throughput: 0: 10877.2. Samples: 415691776. Policy #0 lag: (min: 47.0, avg: 154.4, max: 351.0) [2024-06-15 22:12:40,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 22:12:41,766][1652475] Updated weights for policy 0, policy_version 811793 (0.0016) [2024-06-15 22:12:44,464][1652475] Updated weights for policy 0, policy_version 811841 (0.0015) [2024-06-15 22:12:45,738][1648984] Fps is (10 sec: 42598.8, 60 sec: 43144.6, 300 sec: 42987.2). Total num frames: 1662746624. Throughput: 0: 10990.9. Samples: 415761920. Policy #0 lag: (min: 47.0, avg: 154.4, max: 351.0) [2024-06-15 22:12:45,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 22:12:46,895][1652475] Updated weights for policy 0, policy_version 811940 (0.0013) [2024-06-15 22:12:49,760][1652475] Updated weights for policy 0, policy_version 812000 (0.0103) [2024-06-15 22:12:50,738][1648984] Fps is (10 sec: 52429.3, 60 sec: 45875.3, 300 sec: 43542.6). Total num frames: 1663041536. Throughput: 0: 10774.8. Samples: 415790080. 
Policy #0 lag: (min: 47.0, avg: 154.4, max: 351.0) [2024-06-15 22:12:50,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 22:12:53,719][1652475] Updated weights for policy 0, policy_version 812064 (0.0013) [2024-06-15 22:12:55,738][1648984] Fps is (10 sec: 42598.1, 60 sec: 43144.6, 300 sec: 43209.3). Total num frames: 1663172608. Throughput: 0: 10934.0. Samples: 415859200. Policy #0 lag: (min: 47.0, avg: 154.4, max: 351.0) [2024-06-15 22:12:55,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 22:12:55,744][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000812096_1663172608.pth... [2024-06-15 22:12:55,817][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000807040_1652817920.pth [2024-06-15 22:12:57,263][1652475] Updated weights for policy 0, policy_version 812134 (0.0012) [2024-06-15 22:12:58,588][1652475] Updated weights for policy 0, policy_version 812179 (0.0012) [2024-06-15 22:13:00,737][1648984] Fps is (10 sec: 39322.3, 60 sec: 43690.8, 300 sec: 43431.5). Total num frames: 1663434752. Throughput: 0: 11013.8. Samples: 415927808. Policy #0 lag: (min: 47.0, avg: 154.4, max: 351.0) [2024-06-15 22:13:00,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 22:13:01,172][1652475] Updated weights for policy 0, policy_version 812256 (0.0077) [2024-06-15 22:13:05,738][1648984] Fps is (10 sec: 49152.5, 60 sec: 44783.0, 300 sec: 43431.5). Total num frames: 1663664128. Throughput: 0: 11082.0. Samples: 415960576. Policy #0 lag: (min: 47.0, avg: 154.4, max: 351.0) [2024-06-15 22:13:05,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 22:13:05,874][1652475] Updated weights for policy 0, policy_version 812343 (0.0020) [2024-06-15 22:13:09,355][1652475] Updated weights for policy 0, policy_version 812415 (0.0019) [2024-06-15 22:13:10,738][1648984] Fps is (10 sec: 39320.0, 60 sec: 42598.3, 300 sec: 43098.2). Total num frames: 1663827968. 
Throughput: 0: 11161.6. Samples: 416020992. Policy #0 lag: (min: 47.0, avg: 154.4, max: 351.0) [2024-06-15 22:13:10,739][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 22:13:13,351][1651340] Signal inference workers to stop experience collection... (41800 times) [2024-06-15 22:13:13,471][1652475] InferenceWorker_p0-w0: stopping experience collection (41800 times) [2024-06-15 22:13:13,649][1651340] Signal inference workers to resume experience collection... (41800 times) [2024-06-15 22:13:13,649][1652475] InferenceWorker_p0-w0: resuming experience collection (41800 times) [2024-06-15 22:13:14,114][1652475] Updated weights for policy 0, policy_version 812517 (0.0161) [2024-06-15 22:13:15,738][1648984] Fps is (10 sec: 42598.2, 60 sec: 44237.0, 300 sec: 43209.3). Total num frames: 1664090112. Throughput: 0: 10786.1. Samples: 416079360. Policy #0 lag: (min: 47.0, avg: 154.4, max: 351.0) [2024-06-15 22:13:15,738][1648984] Avg episode reward: [(0, '-0.180')] [2024-06-15 22:13:17,761][1652475] Updated weights for policy 0, policy_version 812576 (0.0155) [2024-06-15 22:13:20,738][1648984] Fps is (10 sec: 39322.2, 60 sec: 43690.7, 300 sec: 42653.9). Total num frames: 1664221184. Throughput: 0: 10513.1. Samples: 416104960. Policy #0 lag: (min: 47.0, avg: 154.4, max: 351.0) [2024-06-15 22:13:20,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 22:13:21,316][1652475] Updated weights for policy 0, policy_version 812640 (0.0048) [2024-06-15 22:13:25,747][1648984] Fps is (10 sec: 26190.2, 60 sec: 41499.8, 300 sec: 43096.9). Total num frames: 1664352256. Throughput: 0: 10761.2. Samples: 416176128. 
Policy #0 lag: (min: 47.0, avg: 154.4, max: 351.0) [2024-06-15 22:13:25,748][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 22:13:27,434][1652475] Updated weights for policy 0, policy_version 812739 (0.0060) [2024-06-15 22:13:29,424][1652475] Updated weights for policy 0, policy_version 812819 (0.0219) [2024-06-15 22:13:30,739][1648984] Fps is (10 sec: 52423.3, 60 sec: 43689.8, 300 sec: 43098.1). Total num frames: 1664745472. Throughput: 0: 10330.8. Samples: 416226816. Policy #0 lag: (min: 47.0, avg: 154.4, max: 351.0) [2024-06-15 22:13:30,739][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 22:13:33,972][1652475] Updated weights for policy 0, policy_version 812899 (0.0081) [2024-06-15 22:13:35,769][1648984] Fps is (10 sec: 52315.2, 60 sec: 42576.5, 300 sec: 43093.8). Total num frames: 1664876544. Throughput: 0: 10619.5. Samples: 416268288. Policy #0 lag: (min: 47.0, avg: 154.4, max: 351.0) [2024-06-15 22:13:35,769][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 22:13:38,944][1652475] Updated weights for policy 0, policy_version 812960 (0.0017) [2024-06-15 22:13:40,738][1648984] Fps is (10 sec: 29494.6, 60 sec: 42052.3, 300 sec: 42765.0). Total num frames: 1665040384. Throughput: 0: 10535.8. Samples: 416333312. Policy #0 lag: (min: 47.0, avg: 154.4, max: 351.0) [2024-06-15 22:13:40,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 22:13:41,023][1652475] Updated weights for policy 0, policy_version 813026 (0.0015) [2024-06-15 22:13:42,444][1652475] Updated weights for policy 0, policy_version 813088 (0.0167) [2024-06-15 22:13:44,559][1652475] Updated weights for policy 0, policy_version 813136 (0.0014) [2024-06-15 22:13:45,738][1648984] Fps is (10 sec: 49304.7, 60 sec: 43690.7, 300 sec: 42987.2). Total num frames: 1665368064. Throughput: 0: 10308.2. Samples: 416391680. 
Policy #0 lag: (min: 47.0, avg: 154.4, max: 351.0) [2024-06-15 22:13:45,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 22:13:45,762][1652475] Updated weights for policy 0, policy_version 813178 (0.0011) [2024-06-15 22:13:50,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 39867.7, 300 sec: 42765.0). Total num frames: 1665433600. Throughput: 0: 10422.0. Samples: 416429568. Policy #0 lag: (min: 47.0, avg: 154.4, max: 351.0) [2024-06-15 22:13:50,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 22:13:51,487][1652475] Updated weights for policy 0, policy_version 813240 (0.0014) [2024-06-15 22:13:54,329][1652475] Updated weights for policy 0, policy_version 813307 (0.0027) [2024-06-15 22:13:55,738][1648984] Fps is (10 sec: 39321.3, 60 sec: 43144.5, 300 sec: 42987.2). Total num frames: 1665761280. Throughput: 0: 10535.9. Samples: 416495104. Policy #0 lag: (min: 60.0, avg: 135.4, max: 261.0) [2024-06-15 22:13:55,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 22:13:56,295][1652475] Updated weights for policy 0, policy_version 813379 (0.0120) [2024-06-15 22:13:57,668][1652475] Updated weights for policy 0, policy_version 813434 (0.0012) [2024-06-15 22:14:00,738][1648984] Fps is (10 sec: 49152.1, 60 sec: 41506.0, 300 sec: 42765.0). Total num frames: 1665925120. Throughput: 0: 10547.2. Samples: 416553984. Policy #0 lag: (min: 60.0, avg: 135.4, max: 261.0) [2024-06-15 22:14:00,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 22:14:02,860][1651340] Signal inference workers to stop experience collection... (41850 times) [2024-06-15 22:14:02,924][1652475] InferenceWorker_p0-w0: stopping experience collection (41850 times) [2024-06-15 22:14:03,141][1651340] Signal inference workers to resume experience collection... 
(41850 times) [2024-06-15 22:14:03,142][1652475] InferenceWorker_p0-w0: resuming experience collection (41850 times) [2024-06-15 22:14:03,377][1652475] Updated weights for policy 0, policy_version 813480 (0.0016) [2024-06-15 22:14:05,724][1652475] Updated weights for policy 0, policy_version 813539 (0.0056) [2024-06-15 22:14:05,738][1648984] Fps is (10 sec: 36045.0, 60 sec: 40960.0, 300 sec: 42542.9). Total num frames: 1666121728. Throughput: 0: 10717.9. Samples: 416587264. Policy #0 lag: (min: 60.0, avg: 135.4, max: 261.0) [2024-06-15 22:14:05,738][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 22:14:07,608][1652475] Updated weights for policy 0, policy_version 813600 (0.0014) [2024-06-15 22:14:09,429][1652475] Updated weights for policy 0, policy_version 813664 (0.0012) [2024-06-15 22:14:10,738][1648984] Fps is (10 sec: 52428.5, 60 sec: 43690.8, 300 sec: 43098.3). Total num frames: 1666449408. Throughput: 0: 10560.7. Samples: 416651264. Policy #0 lag: (min: 60.0, avg: 135.4, max: 261.0) [2024-06-15 22:14:10,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 22:14:14,521][1652475] Updated weights for policy 0, policy_version 813728 (0.0013) [2024-06-15 22:14:15,738][1648984] Fps is (10 sec: 45875.2, 60 sec: 41506.1, 300 sec: 42876.1). Total num frames: 1666580480. Throughput: 0: 10922.9. Samples: 416718336. Policy #0 lag: (min: 60.0, avg: 135.4, max: 261.0) [2024-06-15 22:14:15,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 22:14:15,892][1652475] Updated weights for policy 0, policy_version 813761 (0.0013) [2024-06-15 22:14:20,260][1652475] Updated weights for policy 0, policy_version 813825 (0.0020) [2024-06-15 22:14:20,738][1648984] Fps is (10 sec: 29491.2, 60 sec: 42052.3, 300 sec: 42765.0). Total num frames: 1666744320. Throughput: 0: 10634.2. Samples: 416746496. 
Policy #0 lag: (min: 60.0, avg: 135.4, max: 261.0) [2024-06-15 22:14:20,738][1648984] Avg episode reward: [(0, '-0.200')] [2024-06-15 22:14:22,307][1652475] Updated weights for policy 0, policy_version 813895 (0.0149) [2024-06-15 22:14:25,738][1648984] Fps is (10 sec: 42598.0, 60 sec: 44243.5, 300 sec: 42765.0). Total num frames: 1667006464. Throughput: 0: 10592.7. Samples: 416809984. Policy #0 lag: (min: 60.0, avg: 135.4, max: 261.0) [2024-06-15 22:14:25,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 22:14:25,849][1652475] Updated weights for policy 0, policy_version 813974 (0.0013) [2024-06-15 22:14:28,808][1652475] Updated weights for policy 0, policy_version 814064 (0.0011) [2024-06-15 22:14:30,738][1648984] Fps is (10 sec: 49151.7, 60 sec: 41506.8, 300 sec: 43098.2). Total num frames: 1667235840. Throughput: 0: 10661.0. Samples: 416871424. Policy #0 lag: (min: 60.0, avg: 135.4, max: 261.0) [2024-06-15 22:14:30,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 22:14:33,624][1652475] Updated weights for policy 0, policy_version 814128 (0.0098) [2024-06-15 22:14:35,738][1648984] Fps is (10 sec: 36044.8, 60 sec: 41527.5, 300 sec: 42654.0). Total num frames: 1667366912. Throughput: 0: 10672.3. Samples: 416909824. Policy #0 lag: (min: 60.0, avg: 135.4, max: 261.0) [2024-06-15 22:14:35,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 22:14:37,499][1652475] Updated weights for policy 0, policy_version 814198 (0.0013) [2024-06-15 22:14:39,192][1652475] Updated weights for policy 0, policy_version 814272 (0.0013) [2024-06-15 22:14:40,711][1652475] Updated weights for policy 0, policy_version 814326 (0.0014) [2024-06-15 22:14:40,738][1648984] Fps is (10 sec: 49151.1, 60 sec: 44782.7, 300 sec: 42987.3). Total num frames: 1667727360. Throughput: 0: 10467.5. Samples: 416966144. 
Policy #0 lag: (min: 60.0, avg: 135.4, max: 261.0) [2024-06-15 22:14:40,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 22:14:45,738][1648984] Fps is (10 sec: 45875.8, 60 sec: 40960.0, 300 sec: 42876.1). Total num frames: 1667825664. Throughput: 0: 10786.1. Samples: 417039360. Policy #0 lag: (min: 60.0, avg: 135.4, max: 261.0) [2024-06-15 22:14:45,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 22:14:45,912][1652475] Updated weights for policy 0, policy_version 814369 (0.0013) [2024-06-15 22:14:49,272][1652475] Updated weights for policy 0, policy_version 814448 (0.0012) [2024-06-15 22:14:49,880][1651340] Signal inference workers to stop experience collection... (41900 times) [2024-06-15 22:14:49,945][1652475] InferenceWorker_p0-w0: stopping experience collection (41900 times) [2024-06-15 22:14:50,125][1651340] Signal inference workers to resume experience collection... (41900 times) [2024-06-15 22:14:50,127][1652475] InferenceWorker_p0-w0: resuming experience collection (41900 times) [2024-06-15 22:14:50,615][1652475] Updated weights for policy 0, policy_version 814497 (0.0105) [2024-06-15 22:14:50,738][1648984] Fps is (10 sec: 36045.6, 60 sec: 44236.8, 300 sec: 42987.7). Total num frames: 1668087808. Throughput: 0: 10808.9. Samples: 417073664. Policy #0 lag: (min: 60.0, avg: 135.4, max: 261.0) [2024-06-15 22:14:50,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 22:14:52,739][1652475] Updated weights for policy 0, policy_version 814585 (0.0012) [2024-06-15 22:14:55,738][1648984] Fps is (10 sec: 45874.9, 60 sec: 42052.3, 300 sec: 43098.3). Total num frames: 1668284416. Throughput: 0: 10626.8. Samples: 417129472. Policy #0 lag: (min: 60.0, avg: 135.4, max: 261.0) [2024-06-15 22:14:55,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 22:14:55,745][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000814592_1668284416.pth... 
[2024-06-15 22:14:55,842][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000809584_1658028032.pth [2024-06-15 22:14:58,035][1652475] Updated weights for policy 0, policy_version 814645 (0.0011) [2024-06-15 22:15:00,738][1648984] Fps is (10 sec: 32768.1, 60 sec: 41506.1, 300 sec: 42653.9). Total num frames: 1668415488. Throughput: 0: 10695.1. Samples: 417199616. Policy #0 lag: (min: 60.0, avg: 135.4, max: 261.0) [2024-06-15 22:15:00,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 22:15:01,213][1652475] Updated weights for policy 0, policy_version 814688 (0.0012) [2024-06-15 22:15:02,495][1652475] Updated weights for policy 0, policy_version 814724 (0.0011) [2024-06-15 22:15:05,495][1652475] Updated weights for policy 0, policy_version 814846 (0.0012) [2024-06-15 22:15:05,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 44782.9, 300 sec: 43098.3). Total num frames: 1668808704. Throughput: 0: 10706.5. Samples: 417228288. Policy #0 lag: (min: 60.0, avg: 135.4, max: 261.0) [2024-06-15 22:15:05,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 22:15:10,204][1652475] Updated weights for policy 0, policy_version 814887 (0.0013) [2024-06-15 22:15:10,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 41506.1, 300 sec: 42653.9). Total num frames: 1668939776. Throughput: 0: 10740.6. Samples: 417293312. Policy #0 lag: (min: 60.0, avg: 135.4, max: 261.0) [2024-06-15 22:15:10,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 22:15:13,693][1652475] Updated weights for policy 0, policy_version 814947 (0.0014) [2024-06-15 22:15:14,774][1652475] Updated weights for policy 0, policy_version 814979 (0.0011) [2024-06-15 22:15:15,738][1648984] Fps is (10 sec: 32767.9, 60 sec: 42598.4, 300 sec: 42876.1). Total num frames: 1669136384. Throughput: 0: 10854.4. Samples: 417359872. 
Policy #0 lag: (min: 60.0, avg: 135.4, max: 261.0) [2024-06-15 22:15:15,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 22:15:16,877][1652475] Updated weights for policy 0, policy_version 815060 (0.0012) [2024-06-15 22:15:20,738][1648984] Fps is (10 sec: 39321.9, 60 sec: 43144.6, 300 sec: 42653.9). Total num frames: 1669332992. Throughput: 0: 10615.5. Samples: 417387520. Policy #0 lag: (min: 60.0, avg: 135.4, max: 261.0) [2024-06-15 22:15:20,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 22:15:20,763][1652475] Updated weights for policy 0, policy_version 815105 (0.0011) [2024-06-15 22:15:23,981][1652475] Updated weights for policy 0, policy_version 815172 (0.0012) [2024-06-15 22:15:25,738][1648984] Fps is (10 sec: 45875.3, 60 sec: 43144.6, 300 sec: 42765.0). Total num frames: 1669595136. Throughput: 0: 11013.7. Samples: 417461760. Policy #0 lag: (min: 60.0, avg: 135.4, max: 261.0) [2024-06-15 22:15:25,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 22:15:26,348][1652475] Updated weights for policy 0, policy_version 815234 (0.0011) [2024-06-15 22:15:28,648][1652475] Updated weights for policy 0, policy_version 815300 (0.0088) [2024-06-15 22:15:29,938][1652475] Updated weights for policy 0, policy_version 815352 (0.0142) [2024-06-15 22:15:30,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 43690.8, 300 sec: 42765.0). Total num frames: 1669857280. Throughput: 0: 10672.4. Samples: 417519616. Policy #0 lag: (min: 60.0, avg: 135.4, max: 261.0) [2024-06-15 22:15:30,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 22:15:33,223][1652475] Updated weights for policy 0, policy_version 815408 (0.0012) [2024-06-15 22:15:35,738][1648984] Fps is (10 sec: 42598.6, 60 sec: 44236.9, 300 sec: 42765.0). Total num frames: 1670021120. Throughput: 0: 10729.3. Samples: 417556480. 
Policy #0 lag: (min: 60.0, avg: 135.4, max: 261.0) [2024-06-15 22:15:35,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 22:15:35,918][1651340] Signal inference workers to stop experience collection... (41950 times) [2024-06-15 22:15:35,950][1652475] InferenceWorker_p0-w0: stopping experience collection (41950 times) [2024-06-15 22:15:36,146][1651340] Signal inference workers to resume experience collection... (41950 times) [2024-06-15 22:15:36,147][1652475] InferenceWorker_p0-w0: resuming experience collection (41950 times) [2024-06-15 22:15:40,369][1652475] Updated weights for policy 0, policy_version 815490 (0.0012) [2024-06-15 22:15:40,738][1648984] Fps is (10 sec: 29490.9, 60 sec: 40414.0, 300 sec: 42320.7). Total num frames: 1670152192. Throughput: 0: 10888.5. Samples: 417619456. Policy #0 lag: (min: 60.0, avg: 135.4, max: 261.0) [2024-06-15 22:15:40,739][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 22:15:42,894][1652475] Updated weights for policy 0, policy_version 815570 (0.0088) [2024-06-15 22:15:44,460][1652475] Updated weights for policy 0, policy_version 815636 (0.0013) [2024-06-15 22:15:45,738][1648984] Fps is (10 sec: 49151.4, 60 sec: 44782.8, 300 sec: 42876.1). Total num frames: 1670512640. Throughput: 0: 10604.1. Samples: 417676800. Policy #0 lag: (min: 3.0, avg: 119.5, max: 259.0) [2024-06-15 22:15:45,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 22:15:47,146][1652475] Updated weights for policy 0, policy_version 815700 (0.0107) [2024-06-15 22:15:50,738][1648984] Fps is (10 sec: 49151.7, 60 sec: 42598.3, 300 sec: 42987.2). Total num frames: 1670643712. Throughput: 0: 10649.6. Samples: 417707520. Policy #0 lag: (min: 3.0, avg: 119.5, max: 259.0) [2024-06-15 22:15:50,739][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 22:15:54,297][1652475] Updated weights for policy 0, policy_version 815763 (0.0015) [2024-06-15 22:15:55,738][1648984] Fps is (10 sec: 29491.2, 60 sec: 42052.2, 300 sec: 42542.9). 
Total num frames: 1670807552. Throughput: 0: 10922.7. Samples: 417784832. Policy #0 lag: (min: 3.0, avg: 119.5, max: 259.0) [2024-06-15 22:15:55,738][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 22:15:55,998][1652475] Updated weights for policy 0, policy_version 815840 (0.0028) [2024-06-15 22:15:57,860][1652475] Updated weights for policy 0, policy_version 815909 (0.0033) [2024-06-15 22:15:59,832][1652475] Updated weights for policy 0, policy_version 815993 (0.0015) [2024-06-15 22:16:00,738][1648984] Fps is (10 sec: 52429.7, 60 sec: 45875.2, 300 sec: 43098.3). Total num frames: 1671168000. Throughput: 0: 10399.3. Samples: 417827840. Policy #0 lag: (min: 3.0, avg: 119.5, max: 259.0) [2024-06-15 22:16:00,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 22:16:05,738][1648984] Fps is (10 sec: 36045.0, 60 sec: 39321.6, 300 sec: 42542.9). Total num frames: 1671168000. Throughput: 0: 10752.0. Samples: 417871360. Policy #0 lag: (min: 3.0, avg: 119.5, max: 259.0) [2024-06-15 22:16:05,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 22:16:07,923][1652475] Updated weights for policy 0, policy_version 816052 (0.0013) [2024-06-15 22:16:10,017][1652475] Updated weights for policy 0, policy_version 816144 (0.0116) [2024-06-15 22:16:10,741][1648984] Fps is (10 sec: 32764.1, 60 sec: 42597.6, 300 sec: 42436.8). Total num frames: 1671495680. Throughput: 0: 10524.2. Samples: 417935360. Policy #0 lag: (min: 3.0, avg: 119.5, max: 259.0) [2024-06-15 22:16:10,743][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 22:16:15,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 42598.4, 300 sec: 42653.9). Total num frames: 1671692288. Throughput: 0: 10513.1. Samples: 417992704. 
Policy #0 lag: (min: 3.0, avg: 119.5, max: 259.0) [2024-06-15 22:16:15,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 22:16:19,344][1652475] Updated weights for policy 0, policy_version 816257 (0.0110) [2024-06-15 22:16:20,738][1648984] Fps is (10 sec: 32771.8, 60 sec: 41506.1, 300 sec: 42431.8). Total num frames: 1671823360. Throughput: 0: 10535.8. Samples: 418030592. Policy #0 lag: (min: 3.0, avg: 119.5, max: 259.0) [2024-06-15 22:16:20,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 22:16:20,796][1652475] Updated weights for policy 0, policy_version 816322 (0.0031) [2024-06-15 22:16:21,724][1651340] Signal inference workers to stop experience collection... (42000 times) [2024-06-15 22:16:21,835][1652475] InferenceWorker_p0-w0: stopping experience collection (42000 times) [2024-06-15 22:16:22,038][1651340] Signal inference workers to resume experience collection... (42000 times) [2024-06-15 22:16:22,040][1652475] InferenceWorker_p0-w0: resuming experience collection (42000 times) [2024-06-15 22:16:22,499][1652475] Updated weights for policy 0, policy_version 816384 (0.0012) [2024-06-15 22:16:24,659][1652475] Updated weights for policy 0, policy_version 816465 (0.0024) [2024-06-15 22:16:25,738][1648984] Fps is (10 sec: 52428.3, 60 sec: 43690.6, 300 sec: 42876.1). Total num frames: 1672216576. Throughput: 0: 10205.9. Samples: 418078720. Policy #0 lag: (min: 3.0, avg: 119.5, max: 259.0) [2024-06-15 22:16:25,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 22:16:30,738][1648984] Fps is (10 sec: 39321.1, 60 sec: 39321.5, 300 sec: 42431.8). Total num frames: 1672216576. Throughput: 0: 10422.0. Samples: 418145792. 
Policy #0 lag: (min: 3.0, avg: 119.5, max: 259.0) [2024-06-15 22:16:30,739][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 22:16:33,601][1652475] Updated weights for policy 0, policy_version 816592 (0.0011) [2024-06-15 22:16:34,791][1652475] Updated weights for policy 0, policy_version 816636 (0.0010) [2024-06-15 22:16:35,738][1648984] Fps is (10 sec: 26214.9, 60 sec: 40960.0, 300 sec: 42431.8). Total num frames: 1672478720. Throughput: 0: 10285.6. Samples: 418170368. Policy #0 lag: (min: 3.0, avg: 119.5, max: 259.0) [2024-06-15 22:16:35,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 22:16:37,344][1652475] Updated weights for policy 0, policy_version 816700 (0.0024) [2024-06-15 22:16:39,190][1652475] Updated weights for policy 0, policy_version 816752 (0.0013) [2024-06-15 22:16:40,738][1648984] Fps is (10 sec: 52429.5, 60 sec: 43144.6, 300 sec: 42653.9). Total num frames: 1672740864. Throughput: 0: 9932.8. Samples: 418231808. Policy #0 lag: (min: 3.0, avg: 119.5, max: 259.0) [2024-06-15 22:16:40,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 22:16:44,377][1652475] Updated weights for policy 0, policy_version 816788 (0.0017) [2024-06-15 22:16:45,738][1648984] Fps is (10 sec: 39321.2, 60 sec: 39321.7, 300 sec: 42654.0). Total num frames: 1672871936. Throughput: 0: 10444.8. Samples: 418297856. Policy #0 lag: (min: 3.0, avg: 119.5, max: 259.0) [2024-06-15 22:16:45,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 22:16:45,953][1652475] Updated weights for policy 0, policy_version 816855 (0.0010) [2024-06-15 22:16:49,647][1652475] Updated weights for policy 0, policy_version 816902 (0.0081) [2024-06-15 22:16:50,738][1648984] Fps is (10 sec: 39321.2, 60 sec: 41506.2, 300 sec: 42542.9). Total num frames: 1673134080. Throughput: 0: 10205.9. Samples: 418330624. 
Policy #0 lag: (min: 3.0, avg: 119.5, max: 259.0) [2024-06-15 22:16:50,739][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 22:16:51,278][1652475] Updated weights for policy 0, policy_version 816980 (0.0018) [2024-06-15 22:16:55,439][1652475] Updated weights for policy 0, policy_version 817027 (0.0119) [2024-06-15 22:16:55,738][1648984] Fps is (10 sec: 42597.6, 60 sec: 41506.1, 300 sec: 42320.7). Total num frames: 1673297920. Throughput: 0: 10274.4. Samples: 418397696. Policy #0 lag: (min: 3.0, avg: 119.5, max: 259.0) [2024-06-15 22:16:55,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 22:16:56,031][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000817056_1673330688.pth... [2024-06-15 22:16:56,167][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000812096_1663172608.pth [2024-06-15 22:16:57,855][1652475] Updated weights for policy 0, policy_version 817122 (0.0091) [2024-06-15 22:17:00,738][1648984] Fps is (10 sec: 39321.9, 60 sec: 39321.6, 300 sec: 42542.9). Total num frames: 1673527296. Throughput: 0: 10410.7. Samples: 418461184. Policy #0 lag: (min: 3.0, avg: 119.5, max: 259.0) [2024-06-15 22:17:00,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 22:17:02,227][1652475] Updated weights for policy 0, policy_version 817153 (0.0010) [2024-06-15 22:17:03,871][1652475] Updated weights for policy 0, policy_version 817232 (0.0012) [2024-06-15 22:17:05,738][1648984] Fps is (10 sec: 49153.0, 60 sec: 43690.7, 300 sec: 42431.8). Total num frames: 1673789440. Throughput: 0: 10308.3. Samples: 418494464. Policy #0 lag: (min: 3.0, avg: 119.5, max: 259.0) [2024-06-15 22:17:05,740][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 22:17:07,310][1651340] Signal inference workers to stop experience collection... 
(42050 times) [2024-06-15 22:17:07,400][1652475] InferenceWorker_p0-w0: stopping experience collection (42050 times) [2024-06-15 22:17:07,409][1652475] Updated weights for policy 0, policy_version 817282 (0.0012) [2024-06-15 22:17:07,672][1651340] Signal inference workers to resume experience collection... (42050 times) [2024-06-15 22:17:07,673][1652475] InferenceWorker_p0-w0: resuming experience collection (42050 times) [2024-06-15 22:17:10,165][1652475] Updated weights for policy 0, policy_version 817376 (0.0013) [2024-06-15 22:17:10,778][1648984] Fps is (10 sec: 48955.3, 60 sec: 42024.9, 300 sec: 42648.2). Total num frames: 1674018816. Throughput: 0: 10515.1. Samples: 418552320. Policy #0 lag: (min: 3.0, avg: 119.5, max: 259.0) [2024-06-15 22:17:10,778][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 22:17:15,319][1652475] Updated weights for policy 0, policy_version 817425 (0.0022) [2024-06-15 22:17:15,738][1648984] Fps is (10 sec: 32767.8, 60 sec: 40413.8, 300 sec: 42431.8). Total num frames: 1674117120. Throughput: 0: 10535.8. Samples: 418619904. Policy #0 lag: (min: 3.0, avg: 119.5, max: 259.0) [2024-06-15 22:17:15,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 22:17:17,153][1652475] Updated weights for policy 0, policy_version 817504 (0.0013) [2024-06-15 22:17:20,738][1648984] Fps is (10 sec: 32899.9, 60 sec: 42052.2, 300 sec: 42320.7). Total num frames: 1674346496. Throughput: 0: 10615.4. Samples: 418648064. Policy #0 lag: (min: 3.0, avg: 119.5, max: 259.0) [2024-06-15 22:17:20,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 22:17:21,516][1652475] Updated weights for policy 0, policy_version 817584 (0.0122) [2024-06-15 22:17:23,449][1652475] Updated weights for policy 0, policy_version 817663 (0.0013) [2024-06-15 22:17:25,738][1648984] Fps is (10 sec: 45875.4, 60 sec: 39321.6, 300 sec: 42209.6). Total num frames: 1674575872. Throughput: 0: 10513.1. Samples: 418704896. 
Policy #0 lag: (min: 3.0, avg: 119.5, max: 259.0) [2024-06-15 22:17:25,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 22:17:28,950][1652475] Updated weights for policy 0, policy_version 817729 (0.0090) [2024-06-15 22:17:30,738][1648984] Fps is (10 sec: 49151.7, 60 sec: 43690.6, 300 sec: 42431.8). Total num frames: 1674838016. Throughput: 0: 10456.1. Samples: 418768384. Policy #0 lag: (min: 3.0, avg: 119.5, max: 259.0) [2024-06-15 22:17:30,739][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 22:17:32,896][1652475] Updated weights for policy 0, policy_version 817793 (0.0014) [2024-06-15 22:17:34,968][1652475] Updated weights for policy 0, policy_version 817881 (0.0015) [2024-06-15 22:17:35,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 43690.6, 300 sec: 42653.9). Total num frames: 1675100160. Throughput: 0: 10535.8. Samples: 418804736. Policy #0 lag: (min: 15.0, avg: 117.9, max: 271.0) [2024-06-15 22:17:35,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 22:17:40,620][1652475] Updated weights for policy 0, policy_version 817968 (0.0128) [2024-06-15 22:17:40,738][1648984] Fps is (10 sec: 36045.5, 60 sec: 40960.0, 300 sec: 42209.6). Total num frames: 1675198464. Throughput: 0: 10604.1. Samples: 418874880. Policy #0 lag: (min: 15.0, avg: 117.9, max: 271.0) [2024-06-15 22:17:40,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 22:17:45,736][1652475] Updated weights for policy 0, policy_version 818064 (0.0012) [2024-06-15 22:17:45,739][1648984] Fps is (10 sec: 29490.9, 60 sec: 42052.2, 300 sec: 41876.4). Total num frames: 1675395072. Throughput: 0: 10376.5. Samples: 418928128. 
Policy #0 lag: (min: 15.0, avg: 117.9, max: 271.0) [2024-06-15 22:17:45,740][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 22:17:46,692][1652475] Updated weights for policy 0, policy_version 818112 (0.0015) [2024-06-15 22:17:48,186][1652475] Updated weights for policy 0, policy_version 818176 (0.0054) [2024-06-15 22:17:50,738][1648984] Fps is (10 sec: 42598.1, 60 sec: 41506.2, 300 sec: 42209.6). Total num frames: 1675624448. Throughput: 0: 10274.1. Samples: 418956800. Policy #0 lag: (min: 15.0, avg: 117.9, max: 271.0) [2024-06-15 22:17:50,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 22:17:51,729][1651340] Signal inference workers to stop experience collection... (42100 times) [2024-06-15 22:17:51,775][1652475] InferenceWorker_p0-w0: stopping experience collection (42100 times) [2024-06-15 22:17:51,919][1651340] Signal inference workers to resume experience collection... (42100 times) [2024-06-15 22:17:51,920][1652475] InferenceWorker_p0-w0: resuming experience collection (42100 times) [2024-06-15 22:17:52,298][1652475] Updated weights for policy 0, policy_version 818240 (0.0012) [2024-06-15 22:17:53,905][1652475] Updated weights for policy 0, policy_version 818303 (0.0012) [2024-06-15 22:17:55,738][1648984] Fps is (10 sec: 49152.5, 60 sec: 43144.7, 300 sec: 42209.6). Total num frames: 1675886592. Throughput: 0: 10522.5. Samples: 419025408. Policy #0 lag: (min: 15.0, avg: 117.9, max: 271.0) [2024-06-15 22:17:55,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 22:17:57,805][1652475] Updated weights for policy 0, policy_version 818363 (0.0120) [2024-06-15 22:18:00,740][1648984] Fps is (10 sec: 39321.4, 60 sec: 41506.1, 300 sec: 41876.4). Total num frames: 1676017664. Throughput: 0: 10626.8. Samples: 419098112. 
Policy #0 lag: (min: 15.0, avg: 117.9, max: 271.0) [2024-06-15 22:18:00,741][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 22:18:02,207][1652475] Updated weights for policy 0, policy_version 818422 (0.0011) [2024-06-15 22:18:03,500][1652475] Updated weights for policy 0, policy_version 818464 (0.0013) [2024-06-15 22:18:05,738][1648984] Fps is (10 sec: 49152.2, 60 sec: 43144.5, 300 sec: 42542.9). Total num frames: 1676378112. Throughput: 0: 10683.8. Samples: 419128832. Policy #0 lag: (min: 15.0, avg: 117.9, max: 271.0) [2024-06-15 22:18:05,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 22:18:05,924][1652475] Updated weights for policy 0, policy_version 818558 (0.0013) [2024-06-15 22:18:10,738][1648984] Fps is (10 sec: 42597.8, 60 sec: 40440.8, 300 sec: 41876.4). Total num frames: 1676443648. Throughput: 0: 10649.5. Samples: 419184128. Policy #0 lag: (min: 15.0, avg: 117.9, max: 271.0) [2024-06-15 22:18:10,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 22:18:11,191][1652475] Updated weights for policy 0, policy_version 818612 (0.0055) [2024-06-15 22:18:15,277][1652475] Updated weights for policy 0, policy_version 818688 (0.0011) [2024-06-15 22:18:15,740][1648984] Fps is (10 sec: 32767.5, 60 sec: 43144.5, 300 sec: 42320.7). Total num frames: 1676705792. Throughput: 0: 10808.9. Samples: 419254784. Policy #0 lag: (min: 15.0, avg: 117.9, max: 271.0) [2024-06-15 22:18:15,741][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 22:18:17,262][1652475] Updated weights for policy 0, policy_version 818769 (0.0012) [2024-06-15 22:18:18,429][1652475] Updated weights for policy 0, policy_version 818816 (0.0011) [2024-06-15 22:18:20,738][1648984] Fps is (10 sec: 49152.4, 60 sec: 43144.5, 300 sec: 42655.2). Total num frames: 1676935168. Throughput: 0: 10478.9. Samples: 419276288. 
Policy #0 lag: (min: 15.0, avg: 117.9, max: 271.0) [2024-06-15 22:18:20,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 22:18:25,738][1648984] Fps is (10 sec: 36045.0, 60 sec: 41506.1, 300 sec: 41765.5). Total num frames: 1677066240. Throughput: 0: 10490.3. Samples: 419346944. Policy #0 lag: (min: 15.0, avg: 117.9, max: 271.0) [2024-06-15 22:18:25,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 22:18:26,060][1652475] Updated weights for policy 0, policy_version 818885 (0.0011) [2024-06-15 22:18:28,767][1652475] Updated weights for policy 0, policy_version 818977 (0.0023) [2024-06-15 22:18:30,738][1648984] Fps is (10 sec: 49152.6, 60 sec: 43144.6, 300 sec: 42547.3). Total num frames: 1677426688. Throughput: 0: 10365.2. Samples: 419394560. Policy #0 lag: (min: 15.0, avg: 117.9, max: 271.0) [2024-06-15 22:18:30,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 22:18:30,775][1652475] Updated weights for policy 0, policy_version 819068 (0.0032) [2024-06-15 22:18:35,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 39321.6, 300 sec: 42098.5). Total num frames: 1677459456. Throughput: 0: 10558.6. Samples: 419431936. Policy #0 lag: (min: 15.0, avg: 117.9, max: 271.0) [2024-06-15 22:18:35,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 22:18:38,011][1652475] Updated weights for policy 0, policy_version 819129 (0.0129) [2024-06-15 22:18:38,317][1651340] Signal inference workers to stop experience collection... (42150 times) [2024-06-15 22:18:38,395][1652475] InferenceWorker_p0-w0: stopping experience collection (42150 times) [2024-06-15 22:18:38,588][1651340] Signal inference workers to resume experience collection... (42150 times) [2024-06-15 22:18:38,588][1652475] InferenceWorker_p0-w0: resuming experience collection (42150 times) [2024-06-15 22:18:39,600][1652475] Updated weights for policy 0, policy_version 819192 (0.0013) [2024-06-15 22:18:40,738][1648984] Fps is (10 sec: 32767.4, 60 sec: 42598.2, 300 sec: 41987.4). 
Total num frames: 1677754368. Throughput: 0: 10569.9. Samples: 419501056. Policy #0 lag: (min: 15.0, avg: 117.9, max: 271.0) [2024-06-15 22:18:40,739][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 22:18:41,836][1652475] Updated weights for policy 0, policy_version 819264 (0.0012) [2024-06-15 22:18:45,738][1648984] Fps is (10 sec: 52429.1, 60 sec: 43144.6, 300 sec: 42542.9). Total num frames: 1677983744. Throughput: 0: 10194.5. Samples: 419556864. Policy #0 lag: (min: 15.0, avg: 117.9, max: 271.0) [2024-06-15 22:18:45,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 22:18:48,908][1652475] Updated weights for policy 0, policy_version 819333 (0.0025) [2024-06-15 22:18:50,738][1648984] Fps is (10 sec: 36045.5, 60 sec: 41506.1, 300 sec: 41876.4). Total num frames: 1678114816. Throughput: 0: 10376.5. Samples: 419595776. Policy #0 lag: (min: 15.0, avg: 117.9, max: 271.0) [2024-06-15 22:18:50,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 22:18:51,359][1652475] Updated weights for policy 0, policy_version 819397 (0.0012) [2024-06-15 22:18:54,085][1652475] Updated weights for policy 0, policy_version 819518 (0.0012) [2024-06-15 22:18:55,738][1648984] Fps is (10 sec: 39320.7, 60 sec: 41506.0, 300 sec: 42209.6). Total num frames: 1678376960. Throughput: 0: 10240.0. Samples: 419644928. Policy #0 lag: (min: 15.0, avg: 117.9, max: 271.0) [2024-06-15 22:18:55,739][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 22:18:55,779][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000819520_1678376960.pth... [2024-06-15 22:18:55,823][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000814592_1668284416.pth [2024-06-15 22:18:57,424][1652475] Updated weights for policy 0, policy_version 819577 (0.0011) [2024-06-15 22:19:00,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 41506.2, 300 sec: 41987.5). Total num frames: 1678508032. Throughput: 0: 10194.5. Samples: 419713536. 
Policy #0 lag: (min: 15.0, avg: 117.9, max: 271.0) [2024-06-15 22:19:00,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 22:19:02,264][1652475] Updated weights for policy 0, policy_version 819638 (0.0014) [2024-06-15 22:19:04,251][1652475] Updated weights for policy 0, policy_version 819696 (0.0032) [2024-06-15 22:19:05,738][1648984] Fps is (10 sec: 39322.3, 60 sec: 39867.7, 300 sec: 41765.3). Total num frames: 1678770176. Throughput: 0: 10387.9. Samples: 419743744. Policy #0 lag: (min: 15.0, avg: 117.9, max: 271.0) [2024-06-15 22:19:05,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 22:19:10,245][1652475] Updated weights for policy 0, policy_version 819780 (0.0203) [2024-06-15 22:19:10,740][1648984] Fps is (10 sec: 42598.1, 60 sec: 41506.2, 300 sec: 41876.4). Total num frames: 1678934016. Throughput: 0: 10126.2. Samples: 419802624. Policy #0 lag: (min: 15.0, avg: 117.9, max: 271.0) [2024-06-15 22:19:10,741][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 22:19:11,479][1652475] Updated weights for policy 0, policy_version 819834 (0.0018) [2024-06-15 22:19:14,372][1652475] Updated weights for policy 0, policy_version 819896 (0.0017) [2024-06-15 22:19:15,738][1648984] Fps is (10 sec: 45873.1, 60 sec: 42052.0, 300 sec: 42320.6). Total num frames: 1679228928. Throughput: 0: 10478.8. Samples: 419866112. Policy #0 lag: (min: 15.0, avg: 117.9, max: 271.0) [2024-06-15 22:19:15,739][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 22:19:16,120][1652475] Updated weights for policy 0, policy_version 819963 (0.0010) [2024-06-15 22:19:20,296][1652475] Updated weights for policy 0, policy_version 820026 (0.0070) [2024-06-15 22:19:20,738][1648984] Fps is (10 sec: 49152.0, 60 sec: 41506.2, 300 sec: 42098.5). Total num frames: 1679425536. Throughput: 0: 10433.4. Samples: 419901440. 
Policy #0 lag: (min: 15.0, avg: 117.9, max: 271.0) [2024-06-15 22:19:20,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 22:19:25,042][1652475] Updated weights for policy 0, policy_version 820098 (0.0014) [2024-06-15 22:19:25,738][1648984] Fps is (10 sec: 36046.1, 60 sec: 42052.2, 300 sec: 41876.4). Total num frames: 1679589376. Throughput: 0: 10319.7. Samples: 419965440. Policy #0 lag: (min: 12.0, avg: 117.9, max: 268.0) [2024-06-15 22:19:25,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 22:19:25,931][1651340] Signal inference workers to stop experience collection... (42200 times) [2024-06-15 22:19:25,970][1652475] InferenceWorker_p0-w0: stopping experience collection (42200 times) [2024-06-15 22:19:26,140][1651340] Signal inference workers to resume experience collection... (42200 times) [2024-06-15 22:19:26,140][1652475] InferenceWorker_p0-w0: resuming experience collection (42200 times) [2024-06-15 22:19:27,030][1652475] Updated weights for policy 0, policy_version 820163 (0.0011) [2024-06-15 22:19:27,975][1652475] Updated weights for policy 0, policy_version 820208 (0.0012) [2024-06-15 22:19:30,738][1648984] Fps is (10 sec: 39321.9, 60 sec: 39867.8, 300 sec: 42209.6). Total num frames: 1679818752. Throughput: 0: 10626.8. Samples: 420035072. Policy #0 lag: (min: 12.0, avg: 117.9, max: 268.0) [2024-06-15 22:19:30,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 22:19:31,907][1652475] Updated weights for policy 0, policy_version 820272 (0.0025) [2024-06-15 22:19:35,568][1652475] Updated weights for policy 0, policy_version 820343 (0.0011) [2024-06-15 22:19:35,738][1648984] Fps is (10 sec: 49152.3, 60 sec: 43690.7, 300 sec: 41876.4). Total num frames: 1680080896. Throughput: 0: 10433.4. Samples: 420065280. 
Policy #0 lag: (min: 12.0, avg: 117.9, max: 268.0) [2024-06-15 22:19:35,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 22:19:37,298][1652475] Updated weights for policy 0, policy_version 820384 (0.0012) [2024-06-15 22:19:39,084][1652475] Updated weights for policy 0, policy_version 820436 (0.0012) [2024-06-15 22:19:40,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 43144.7, 300 sec: 42431.8). Total num frames: 1680343040. Throughput: 0: 10854.4. Samples: 420133376. Policy #0 lag: (min: 12.0, avg: 117.9, max: 268.0) [2024-06-15 22:19:40,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 22:19:43,029][1652475] Updated weights for policy 0, policy_version 820512 (0.0018) [2024-06-15 22:19:45,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 41506.0, 300 sec: 41987.5). Total num frames: 1680474112. Throughput: 0: 10865.8. Samples: 420202496. Policy #0 lag: (min: 12.0, avg: 117.9, max: 268.0) [2024-06-15 22:19:45,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 22:19:46,121][1652475] Updated weights for policy 0, policy_version 820560 (0.0021) [2024-06-15 22:19:47,127][1652475] Updated weights for policy 0, policy_version 820605 (0.0011) [2024-06-15 22:19:49,366][1652475] Updated weights for policy 0, policy_version 820669 (0.0015) [2024-06-15 22:19:50,738][1648984] Fps is (10 sec: 39321.9, 60 sec: 43690.7, 300 sec: 42209.6). Total num frames: 1680736256. Throughput: 0: 10945.4. Samples: 420236288. Policy #0 lag: (min: 12.0, avg: 117.9, max: 268.0) [2024-06-15 22:19:50,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 22:19:51,608][1652475] Updated weights for policy 0, policy_version 820728 (0.0090) [2024-06-15 22:19:55,380][1652475] Updated weights for policy 0, policy_version 820794 (0.0013) [2024-06-15 22:19:55,738][1648984] Fps is (10 sec: 52429.1, 60 sec: 43690.8, 300 sec: 42653.9). Total num frames: 1680998400. Throughput: 0: 11127.5. Samples: 420303360. 
Policy #0 lag: (min: 12.0, avg: 117.9, max: 268.0) [2024-06-15 22:19:55,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 22:19:58,794][1652475] Updated weights for policy 0, policy_version 820835 (0.0019) [2024-06-15 22:20:00,291][1652475] Updated weights for policy 0, policy_version 820880 (0.0018) [2024-06-15 22:20:00,738][1648984] Fps is (10 sec: 45875.5, 60 sec: 44783.0, 300 sec: 41987.5). Total num frames: 1681195008. Throughput: 0: 11116.2. Samples: 420366336. Policy #0 lag: (min: 12.0, avg: 117.9, max: 268.0) [2024-06-15 22:20:00,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 22:20:02,913][1652475] Updated weights for policy 0, policy_version 820944 (0.0012) [2024-06-15 22:20:05,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 43690.7, 300 sec: 42209.6). Total num frames: 1681391616. Throughput: 0: 11047.8. Samples: 420398592. Policy #0 lag: (min: 12.0, avg: 117.9, max: 268.0) [2024-06-15 22:20:05,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 22:20:06,696][1652475] Updated weights for policy 0, policy_version 821024 (0.0026) [2024-06-15 22:20:10,738][1648984] Fps is (10 sec: 36044.6, 60 sec: 43690.8, 300 sec: 42098.6). Total num frames: 1681555456. Throughput: 0: 11070.6. Samples: 420463616. Policy #0 lag: (min: 12.0, avg: 117.9, max: 268.0) [2024-06-15 22:20:10,738][1648984] Avg episode reward: [(0, '-0.220')] [2024-06-15 22:20:11,000][1652475] Updated weights for policy 0, policy_version 821093 (0.0011) [2024-06-15 22:20:12,184][1652475] Updated weights for policy 0, policy_version 821123 (0.0012) [2024-06-15 22:20:13,427][1652475] Updated weights for policy 0, policy_version 821184 (0.0011) [2024-06-15 22:20:15,738][1648984] Fps is (10 sec: 39319.1, 60 sec: 42598.3, 300 sec: 42209.5). Total num frames: 1681784832. Throughput: 0: 10933.9. Samples: 420527104. 
Policy #0 lag: (min: 12.0, avg: 117.9, max: 268.0) [2024-06-15 22:20:15,739][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 22:20:17,164][1651340] Signal inference workers to stop experience collection... (42250 times) [2024-06-15 22:20:17,252][1652475] InferenceWorker_p0-w0: stopping experience collection (42250 times) [2024-06-15 22:20:17,560][1651340] Signal inference workers to resume experience collection... (42250 times) [2024-06-15 22:20:17,561][1652475] InferenceWorker_p0-w0: resuming experience collection (42250 times) [2024-06-15 22:20:18,520][1652475] Updated weights for policy 0, policy_version 821254 (0.0013) [2024-06-15 22:20:19,484][1652475] Updated weights for policy 0, policy_version 821310 (0.0013) [2024-06-15 22:20:20,738][1648984] Fps is (10 sec: 49152.2, 60 sec: 43690.8, 300 sec: 42209.6). Total num frames: 1682046976. Throughput: 0: 10865.8. Samples: 420554240. Policy #0 lag: (min: 12.0, avg: 117.9, max: 268.0) [2024-06-15 22:20:20,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 22:20:22,780][1652475] Updated weights for policy 0, policy_version 821368 (0.0012) [2024-06-15 22:20:25,738][1648984] Fps is (10 sec: 49155.0, 60 sec: 44783.0, 300 sec: 42098.5). Total num frames: 1682276352. Throughput: 0: 10911.3. Samples: 420624384. Policy #0 lag: (min: 12.0, avg: 117.9, max: 268.0) [2024-06-15 22:20:25,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 22:20:25,982][1652475] Updated weights for policy 0, policy_version 821439 (0.0013) [2024-06-15 22:20:30,546][1652475] Updated weights for policy 0, policy_version 821504 (0.0011) [2024-06-15 22:20:30,738][1648984] Fps is (10 sec: 39320.5, 60 sec: 43690.5, 300 sec: 42098.5). Total num frames: 1682440192. Throughput: 0: 10661.0. Samples: 420682240. 
Policy #0 lag: (min: 12.0, avg: 117.9, max: 268.0) [2024-06-15 22:20:30,739][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 22:20:32,015][1652475] Updated weights for policy 0, policy_version 821557 (0.0010) [2024-06-15 22:20:33,597][1652475] Updated weights for policy 0, policy_version 821600 (0.0011) [2024-06-15 22:20:34,247][1652475] Updated weights for policy 0, policy_version 821632 (0.0013) [2024-06-15 22:20:35,738][1648984] Fps is (10 sec: 42598.7, 60 sec: 43690.7, 300 sec: 42542.9). Total num frames: 1682702336. Throughput: 0: 10592.7. Samples: 420712960. Policy #0 lag: (min: 12.0, avg: 117.9, max: 268.0) [2024-06-15 22:20:35,738][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 22:20:39,565][1652475] Updated weights for policy 0, policy_version 821693 (0.0012) [2024-06-15 22:20:40,738][1648984] Fps is (10 sec: 42599.0, 60 sec: 42052.3, 300 sec: 41876.4). Total num frames: 1682866176. Throughput: 0: 10706.5. Samples: 420785152. Policy #0 lag: (min: 12.0, avg: 117.9, max: 268.0) [2024-06-15 22:20:40,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 22:20:41,793][1652475] Updated weights for policy 0, policy_version 821760 (0.0131) [2024-06-15 22:20:45,142][1652475] Updated weights for policy 0, policy_version 821825 (0.0011) [2024-06-15 22:20:45,738][1648984] Fps is (10 sec: 45875.2, 60 sec: 44783.0, 300 sec: 42431.8). Total num frames: 1683161088. Throughput: 0: 10558.6. Samples: 420841472. Policy #0 lag: (min: 12.0, avg: 117.9, max: 268.0) [2024-06-15 22:20:45,739][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 22:20:46,373][1652475] Updated weights for policy 0, policy_version 821884 (0.0015) [2024-06-15 22:20:50,738][1648984] Fps is (10 sec: 36044.2, 60 sec: 41506.0, 300 sec: 42098.5). Total num frames: 1683226624. Throughput: 0: 10626.8. Samples: 420876800. 
Policy #0 lag: (min: 12.0, avg: 117.9, max: 268.0) [2024-06-15 22:20:50,738][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 22:20:53,683][1652475] Updated weights for policy 0, policy_version 821984 (0.0013) [2024-06-15 22:20:55,485][1652475] Updated weights for policy 0, policy_version 822017 (0.0013) [2024-06-15 22:20:55,738][1648984] Fps is (10 sec: 36043.5, 60 sec: 42052.1, 300 sec: 41876.3). Total num frames: 1683521536. Throughput: 0: 10524.4. Samples: 420937216. Policy #0 lag: (min: 12.0, avg: 117.9, max: 268.0) [2024-06-15 22:20:55,738][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 22:20:56,099][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000822048_1683554304.pth... [2024-06-15 22:20:56,228][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000817056_1673330688.pth [2024-06-15 22:20:57,301][1652475] Updated weights for policy 0, policy_version 822096 (0.0014) [2024-06-15 22:20:58,425][1652475] Updated weights for policy 0, policy_version 822140 (0.0011) [2024-06-15 22:21:00,738][1648984] Fps is (10 sec: 52430.0, 60 sec: 42598.3, 300 sec: 42653.9). Total num frames: 1683750912. Throughput: 0: 10638.4. Samples: 421005824. Policy #0 lag: (min: 12.0, avg: 117.9, max: 268.0) [2024-06-15 22:21:00,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 22:21:04,122][1651340] Signal inference workers to stop experience collection... (42300 times) [2024-06-15 22:21:04,226][1652475] InferenceWorker_p0-w0: stopping experience collection (42300 times) [2024-06-15 22:21:04,348][1651340] Signal inference workers to resume experience collection... (42300 times) [2024-06-15 22:21:04,350][1652475] InferenceWorker_p0-w0: resuming experience collection (42300 times) [2024-06-15 22:21:04,935][1652475] Updated weights for policy 0, policy_version 822212 (0.0143) [2024-06-15 22:21:05,738][1648984] Fps is (10 sec: 42599.6, 60 sec: 42598.4, 300 sec: 42209.8). 
Total num frames: 1683947520. Throughput: 0: 10865.8. Samples: 421043200. Policy #0 lag: (min: 12.0, avg: 117.9, max: 268.0) [2024-06-15 22:21:05,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 22:21:07,541][1652475] Updated weights for policy 0, policy_version 822275 (0.0013) [2024-06-15 22:21:08,906][1652475] Updated weights for policy 0, policy_version 822330 (0.0010) [2024-06-15 22:21:10,738][1648984] Fps is (10 sec: 39319.9, 60 sec: 43144.2, 300 sec: 42209.6). Total num frames: 1684144128. Throughput: 0: 10513.0. Samples: 421097472. Policy #0 lag: (min: 12.0, avg: 117.9, max: 268.0) [2024-06-15 22:21:10,739][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 22:21:11,845][1652475] Updated weights for policy 0, policy_version 822384 (0.0013) [2024-06-15 22:21:15,738][1648984] Fps is (10 sec: 42598.2, 60 sec: 43145.0, 300 sec: 42542.9). Total num frames: 1684373504. Throughput: 0: 10752.0. Samples: 421166080. Policy #0 lag: (min: 15.0, avg: 149.1, max: 274.0) [2024-06-15 22:21:15,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 22:21:15,747][1652475] Updated weights for policy 0, policy_version 822460 (0.0013) [2024-06-15 22:21:18,709][1652475] Updated weights for policy 0, policy_version 822516 (0.0012) [2024-06-15 22:21:20,705][1652475] Updated weights for policy 0, policy_version 822584 (0.0012) [2024-06-15 22:21:20,738][1648984] Fps is (10 sec: 49154.4, 60 sec: 43144.5, 300 sec: 42098.6). Total num frames: 1684635648. Throughput: 0: 10706.5. Samples: 421194752. Policy #0 lag: (min: 15.0, avg: 149.1, max: 274.0) [2024-06-15 22:21:20,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 22:21:25,738][1648984] Fps is (10 sec: 29490.8, 60 sec: 39867.6, 300 sec: 42209.6). Total num frames: 1684668416. Throughput: 0: 10410.6. Samples: 421253632. 
Policy #0 lag: (min: 15.0, avg: 149.1, max: 274.0) [2024-06-15 22:21:25,739][1648984] Avg episode reward: [(0, '-0.210')] [2024-06-15 22:21:27,195][1652475] Updated weights for policy 0, policy_version 822656 (0.0013) [2024-06-15 22:21:30,738][1648984] Fps is (10 sec: 29490.9, 60 sec: 41506.2, 300 sec: 42209.6). Total num frames: 1684930560. Throughput: 0: 10535.8. Samples: 421315584. Policy #0 lag: (min: 15.0, avg: 149.1, max: 274.0) [2024-06-15 22:21:30,738][1648984] Avg episode reward: [(0, '-0.230')] [2024-06-15 22:21:30,947][1652475] Updated weights for policy 0, policy_version 822738 (0.0070) [2024-06-15 22:21:32,539][1652475] Updated weights for policy 0, policy_version 822787 (0.0011) [2024-06-15 22:21:35,738][1648984] Fps is (10 sec: 52429.9, 60 sec: 41506.1, 300 sec: 42209.6). Total num frames: 1685192704. Throughput: 0: 10353.8. Samples: 421342720. Policy #0 lag: (min: 15.0, avg: 149.1, max: 274.0) [2024-06-15 22:21:35,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 22:21:37,365][1652475] Updated weights for policy 0, policy_version 822853 (0.0012) [2024-06-15 22:21:40,327][1652475] Updated weights for policy 0, policy_version 822920 (0.0013) [2024-06-15 22:21:40,739][1648984] Fps is (10 sec: 42598.5, 60 sec: 41506.1, 300 sec: 42320.7). Total num frames: 1685356544. Throughput: 0: 10479.0. Samples: 421408768. Policy #0 lag: (min: 15.0, avg: 149.1, max: 274.0) [2024-06-15 22:21:40,740][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 22:21:42,274][1652475] Updated weights for policy 0, policy_version 822977 (0.0013) [2024-06-15 22:21:43,638][1652475] Updated weights for policy 0, policy_version 823039 (0.0015) [2024-06-15 22:21:45,316][1652475] Updated weights for policy 0, policy_version 823074 (0.0012) [2024-06-15 22:21:45,738][1648984] Fps is (10 sec: 49151.1, 60 sec: 42052.1, 300 sec: 42542.8). Total num frames: 1685684224. Throughput: 0: 10490.3. Samples: 421477888. 
Policy #0 lag: (min: 15.0, avg: 149.1, max: 274.0) [2024-06-15 22:21:45,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 22:21:49,488][1652475] Updated weights for policy 0, policy_version 823153 (0.0016) [2024-06-15 22:21:50,738][1648984] Fps is (10 sec: 49152.2, 60 sec: 43690.8, 300 sec: 42542.9). Total num frames: 1685848064. Throughput: 0: 10467.6. Samples: 421514240. Policy #0 lag: (min: 15.0, avg: 149.1, max: 274.0) [2024-06-15 22:21:50,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 22:21:52,644][1651340] Signal inference workers to stop experience collection... (42350 times) [2024-06-15 22:21:52,697][1652475] InferenceWorker_p0-w0: stopping experience collection (42350 times) [2024-06-15 22:21:52,886][1651340] Signal inference workers to resume experience collection... (42350 times) [2024-06-15 22:21:52,887][1652475] InferenceWorker_p0-w0: resuming experience collection (42350 times) [2024-06-15 22:21:52,890][1652475] Updated weights for policy 0, policy_version 823216 (0.0115) [2024-06-15 22:21:55,480][1652475] Updated weights for policy 0, policy_version 823295 (0.0013) [2024-06-15 22:21:55,738][1648984] Fps is (10 sec: 42598.8, 60 sec: 43144.7, 300 sec: 42653.9). Total num frames: 1686110208. Throughput: 0: 10740.7. Samples: 421580800. Policy #0 lag: (min: 15.0, avg: 149.1, max: 274.0) [2024-06-15 22:21:55,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 22:21:57,514][1652475] Updated weights for policy 0, policy_version 823352 (0.0012) [2024-06-15 22:22:00,696][1652475] Updated weights for policy 0, policy_version 823394 (0.0012) [2024-06-15 22:22:00,738][1648984] Fps is (10 sec: 45874.9, 60 sec: 42598.4, 300 sec: 42431.8). Total num frames: 1686306816. Throughput: 0: 10649.6. Samples: 421645312. 
Policy #0 lag: (min: 15.0, avg: 149.1, max: 274.0) [2024-06-15 22:22:00,739][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 22:22:03,696][1652475] Updated weights for policy 0, policy_version 823428 (0.0011) [2024-06-15 22:22:04,545][1652475] Updated weights for policy 0, policy_version 823482 (0.0012) [2024-06-15 22:22:05,738][1648984] Fps is (10 sec: 39322.0, 60 sec: 42598.4, 300 sec: 42326.5). Total num frames: 1686503424. Throughput: 0: 10865.8. Samples: 421683712. Policy #0 lag: (min: 15.0, avg: 149.1, max: 274.0) [2024-06-15 22:22:05,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 22:22:06,750][1652475] Updated weights for policy 0, policy_version 823526 (0.0012) [2024-06-15 22:22:07,757][1652475] Updated weights for policy 0, policy_version 823570 (0.0012) [2024-06-15 22:22:08,765][1652475] Updated weights for policy 0, policy_version 823615 (0.0012) [2024-06-15 22:22:10,738][1648984] Fps is (10 sec: 49151.8, 60 sec: 44237.0, 300 sec: 42987.2). Total num frames: 1686798336. Throughput: 0: 11093.4. Samples: 421752832. Policy #0 lag: (min: 15.0, avg: 149.1, max: 274.0) [2024-06-15 22:22:10,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 22:22:15,083][1652475] Updated weights for policy 0, policy_version 823682 (0.0012) [2024-06-15 22:22:15,739][1648984] Fps is (10 sec: 45870.3, 60 sec: 43143.8, 300 sec: 42764.9). Total num frames: 1686962176. Throughput: 0: 11263.8. Samples: 421822464. Policy #0 lag: (min: 15.0, avg: 149.1, max: 274.0) [2024-06-15 22:22:15,739][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 22:22:16,077][1652475] Updated weights for policy 0, policy_version 823742 (0.0011) [2024-06-15 22:22:18,727][1652475] Updated weights for policy 0, policy_version 823808 (0.0012) [2024-06-15 22:22:20,625][1652475] Updated weights for policy 0, policy_version 823865 (0.0139) [2024-06-15 22:22:20,738][1648984] Fps is (10 sec: 49152.6, 60 sec: 44236.8, 300 sec: 43098.3). Total num frames: 1687289856. 
Throughput: 0: 11355.0. Samples: 421853696. Policy #0 lag: (min: 15.0, avg: 149.1, max: 274.0) [2024-06-15 22:22:20,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 22:22:23,342][1652475] Updated weights for policy 0, policy_version 823932 (0.0014) [2024-06-15 22:22:25,738][1648984] Fps is (10 sec: 45879.5, 60 sec: 45875.3, 300 sec: 42653.9). Total num frames: 1687420928. Throughput: 0: 11343.6. Samples: 421919232. Policy #0 lag: (min: 15.0, avg: 149.1, max: 274.0) [2024-06-15 22:22:25,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 22:22:30,738][1648984] Fps is (10 sec: 29491.1, 60 sec: 44236.8, 300 sec: 42320.7). Total num frames: 1687584768. Throughput: 0: 11377.8. Samples: 421989888. Policy #0 lag: (min: 15.0, avg: 149.1, max: 274.0) [2024-06-15 22:22:30,738][1648984] Avg episode reward: [(0, '-0.230')] [2024-06-15 22:22:31,040][1652475] Updated weights for policy 0, policy_version 824033 (0.0015) [2024-06-15 22:22:32,764][1652475] Updated weights for policy 0, policy_version 824097 (0.0011) [2024-06-15 22:22:35,738][1648984] Fps is (10 sec: 45875.9, 60 sec: 44782.9, 300 sec: 42987.2). Total num frames: 1687879680. Throughput: 0: 11047.8. Samples: 422011392. Policy #0 lag: (min: 15.0, avg: 149.1, max: 274.0) [2024-06-15 22:22:35,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 22:22:36,148][1652475] Updated weights for policy 0, policy_version 824189 (0.0015) [2024-06-15 22:22:39,450][1651340] Signal inference workers to stop experience collection... (42400 times) [2024-06-15 22:22:39,479][1652475] InferenceWorker_p0-w0: stopping experience collection (42400 times) [2024-06-15 22:22:39,747][1651340] Signal inference workers to resume experience collection... 
(42400 times) [2024-06-15 22:22:39,748][1652475] InferenceWorker_p0-w0: resuming experience collection (42400 times) [2024-06-15 22:22:39,936][1652475] Updated weights for policy 0, policy_version 824249 (0.0012) [2024-06-15 22:22:40,738][1648984] Fps is (10 sec: 49151.6, 60 sec: 45329.0, 300 sec: 42987.2). Total num frames: 1688076288. Throughput: 0: 10945.4. Samples: 422073344. Policy #0 lag: (min: 15.0, avg: 149.1, max: 274.0) [2024-06-15 22:22:40,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 22:22:43,763][1652475] Updated weights for policy 0, policy_version 824312 (0.0012) [2024-06-15 22:22:45,738][1648984] Fps is (10 sec: 36044.7, 60 sec: 42598.5, 300 sec: 42765.0). Total num frames: 1688240128. Throughput: 0: 10911.3. Samples: 422136320. Policy #0 lag: (min: 15.0, avg: 149.1, max: 274.0) [2024-06-15 22:22:45,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 22:22:46,864][1652475] Updated weights for policy 0, policy_version 824383 (0.0013) [2024-06-15 22:22:50,738][1648984] Fps is (10 sec: 32768.4, 60 sec: 42598.4, 300 sec: 42431.8). Total num frames: 1688403968. Throughput: 0: 10649.6. Samples: 422162944. Policy #0 lag: (min: 15.0, avg: 149.1, max: 274.0) [2024-06-15 22:22:50,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 22:22:51,668][1652475] Updated weights for policy 0, policy_version 824464 (0.0133) [2024-06-15 22:22:55,738][1648984] Fps is (10 sec: 45875.1, 60 sec: 43144.6, 300 sec: 42987.2). Total num frames: 1688698880. Throughput: 0: 10422.1. Samples: 422221824. Policy #0 lag: (min: 15.0, avg: 149.1, max: 274.0) [2024-06-15 22:22:55,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 22:22:55,821][1652475] Updated weights for policy 0, policy_version 824569 (0.0012) [2024-06-15 22:22:55,934][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000824576_1688731648.pth... 
[2024-06-15 22:22:55,984][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000819520_1678376960.pth [2024-06-15 22:23:00,738][1648984] Fps is (10 sec: 39321.3, 60 sec: 41506.1, 300 sec: 42098.5). Total num frames: 1688797184. Throughput: 0: 10240.2. Samples: 422283264. Policy #0 lag: (min: 7.0, avg: 101.2, max: 263.0) [2024-06-15 22:23:00,738][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 22:23:01,115][1652475] Updated weights for policy 0, policy_version 824636 (0.0013) [2024-06-15 22:23:03,776][1652475] Updated weights for policy 0, policy_version 824677 (0.0014) [2024-06-15 22:23:05,738][1648984] Fps is (10 sec: 36044.7, 60 sec: 42598.4, 300 sec: 42765.1). Total num frames: 1689059328. Throughput: 0: 10296.9. Samples: 422317056. Policy #0 lag: (min: 7.0, avg: 101.2, max: 263.0) [2024-06-15 22:23:05,738][1648984] Avg episode reward: [(0, '-0.250')] [2024-06-15 22:23:06,327][1652475] Updated weights for policy 0, policy_version 824768 (0.0011) [2024-06-15 22:23:07,727][1652475] Updated weights for policy 0, policy_version 824825 (0.0010) [2024-06-15 22:23:10,738][1648984] Fps is (10 sec: 45875.2, 60 sec: 40960.0, 300 sec: 42542.9). Total num frames: 1689255936. Throughput: 0: 10092.1. Samples: 422373376. Policy #0 lag: (min: 7.0, avg: 101.2, max: 263.0) [2024-06-15 22:23:10,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 22:23:14,087][1652475] Updated weights for policy 0, policy_version 824893 (0.0014) [2024-06-15 22:23:15,738][1648984] Fps is (10 sec: 32767.9, 60 sec: 40414.5, 300 sec: 42209.6). Total num frames: 1689387008. Throughput: 0: 10092.1. Samples: 422444032. 
Policy #0 lag: (min: 7.0, avg: 101.2, max: 263.0) [2024-06-15 22:23:15,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 22:23:16,598][1652475] Updated weights for policy 0, policy_version 824933 (0.0011) [2024-06-15 22:23:18,807][1652475] Updated weights for policy 0, policy_version 825024 (0.0012) [2024-06-15 22:23:20,474][1652475] Updated weights for policy 0, policy_version 825088 (0.0012) [2024-06-15 22:23:20,738][1648984] Fps is (10 sec: 52429.4, 60 sec: 41506.2, 300 sec: 43098.3). Total num frames: 1689780224. Throughput: 0: 10251.4. Samples: 422472704. Policy #0 lag: (min: 7.0, avg: 101.2, max: 263.0) [2024-06-15 22:23:20,740][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 22:23:25,738][1648984] Fps is (10 sec: 49152.0, 60 sec: 40960.0, 300 sec: 42209.6). Total num frames: 1689878528. Throughput: 0: 10308.3. Samples: 422537216. Policy #0 lag: (min: 7.0, avg: 101.2, max: 263.0) [2024-06-15 22:23:25,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 22:23:25,818][1652475] Updated weights for policy 0, policy_version 825152 (0.0014) [2024-06-15 22:23:28,954][1651340] Signal inference workers to stop experience collection... (42450 times) [2024-06-15 22:23:28,985][1652475] InferenceWorker_p0-w0: stopping experience collection (42450 times) [2024-06-15 22:23:29,279][1651340] Signal inference workers to resume experience collection... (42450 times) [2024-06-15 22:23:29,280][1652475] InferenceWorker_p0-w0: resuming experience collection (42450 times) [2024-06-15 22:23:29,627][1652475] Updated weights for policy 0, policy_version 825216 (0.0012) [2024-06-15 22:23:30,738][1648984] Fps is (10 sec: 32767.7, 60 sec: 42052.2, 300 sec: 42876.1). Total num frames: 1690107904. Throughput: 0: 10319.6. Samples: 422600704. 
Policy #0 lag: (min: 7.0, avg: 101.2, max: 263.0) [2024-06-15 22:23:30,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 22:23:31,810][1652475] Updated weights for policy 0, policy_version 825297 (0.0013) [2024-06-15 22:23:35,738][1648984] Fps is (10 sec: 42598.8, 60 sec: 40413.9, 300 sec: 42542.9). Total num frames: 1690304512. Throughput: 0: 10217.3. Samples: 422622720. Policy #0 lag: (min: 7.0, avg: 101.2, max: 263.0) [2024-06-15 22:23:35,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 22:23:36,535][1652475] Updated weights for policy 0, policy_version 825345 (0.0012) [2024-06-15 22:23:37,861][1652475] Updated weights for policy 0, policy_version 825403 (0.0013) [2024-06-15 22:23:40,738][1648984] Fps is (10 sec: 39321.2, 60 sec: 40413.8, 300 sec: 42431.8). Total num frames: 1690501120. Throughput: 0: 10331.0. Samples: 422686720. Policy #0 lag: (min: 7.0, avg: 101.2, max: 263.0) [2024-06-15 22:23:40,739][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 22:23:41,255][1652475] Updated weights for policy 0, policy_version 825468 (0.0017) [2024-06-15 22:23:45,738][1648984] Fps is (10 sec: 36044.6, 60 sec: 40413.9, 300 sec: 42542.9). Total num frames: 1690664960. Throughput: 0: 10376.5. Samples: 422750208. Policy #0 lag: (min: 7.0, avg: 101.2, max: 263.0) [2024-06-15 22:23:45,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 22:23:46,237][1652475] Updated weights for policy 0, policy_version 825552 (0.0097) [2024-06-15 22:23:48,969][1652475] Updated weights for policy 0, policy_version 825607 (0.0013) [2024-06-15 22:23:50,738][1648984] Fps is (10 sec: 45875.9, 60 sec: 42598.4, 300 sec: 42654.0). Total num frames: 1690959872. Throughput: 0: 10171.7. Samples: 422774784. 
Policy #0 lag: (min: 7.0, avg: 101.2, max: 263.0) [2024-06-15 22:23:50,738][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 22:23:51,806][1652475] Updated weights for policy 0, policy_version 825667 (0.0012) [2024-06-15 22:23:55,758][1648984] Fps is (10 sec: 42510.9, 60 sec: 39854.1, 300 sec: 42651.0). Total num frames: 1691090944. Throughput: 0: 10383.2. Samples: 422840832. Policy #0 lag: (min: 7.0, avg: 101.2, max: 263.0) [2024-06-15 22:23:55,759][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 22:23:57,707][1652475] Updated weights for policy 0, policy_version 825747 (0.0019) [2024-06-15 22:23:58,787][1652475] Updated weights for policy 0, policy_version 825794 (0.0142) [2024-06-15 22:24:00,209][1652475] Updated weights for policy 0, policy_version 825849 (0.0014) [2024-06-15 22:24:00,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 42598.4, 300 sec: 42653.9). Total num frames: 1691353088. Throughput: 0: 10285.5. Samples: 422906880. Policy #0 lag: (min: 7.0, avg: 101.2, max: 263.0) [2024-06-15 22:24:00,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 22:24:01,747][1652475] Updated weights for policy 0, policy_version 825906 (0.0072) [2024-06-15 22:24:03,334][1652475] Updated weights for policy 0, policy_version 825936 (0.0033) [2024-06-15 22:24:05,738][1648984] Fps is (10 sec: 52537.0, 60 sec: 42598.4, 300 sec: 42987.2). Total num frames: 1691615232. Throughput: 0: 10365.1. Samples: 422939136. Policy #0 lag: (min: 7.0, avg: 101.2, max: 263.0) [2024-06-15 22:24:05,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 22:24:09,520][1652475] Updated weights for policy 0, policy_version 825985 (0.0013) [2024-06-15 22:24:10,731][1652475] Updated weights for policy 0, policy_version 826046 (0.0014) [2024-06-15 22:24:10,738][1648984] Fps is (10 sec: 36044.8, 60 sec: 40960.1, 300 sec: 42320.8). Total num frames: 1691713536. Throughput: 0: 10581.3. Samples: 423013376. 
Policy #0 lag: (min: 7.0, avg: 101.2, max: 263.0) [2024-06-15 22:24:10,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 22:24:12,557][1651340] Signal inference workers to stop experience collection... (42500 times) [2024-06-15 22:24:12,597][1652475] InferenceWorker_p0-w0: stopping experience collection (42500 times) [2024-06-15 22:24:12,598][1652475] Updated weights for policy 0, policy_version 826116 (0.0011) [2024-06-15 22:24:12,849][1651340] Signal inference workers to resume experience collection... (42500 times) [2024-06-15 22:24:12,858][1652475] InferenceWorker_p0-w0: resuming experience collection (42500 times) [2024-06-15 22:24:15,082][1652475] Updated weights for policy 0, policy_version 826178 (0.0013) [2024-06-15 22:24:15,738][1648984] Fps is (10 sec: 45874.7, 60 sec: 44782.9, 300 sec: 42876.1). Total num frames: 1692073984. Throughput: 0: 10444.8. Samples: 423070720. Policy #0 lag: (min: 7.0, avg: 101.2, max: 263.0) [2024-06-15 22:24:15,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 22:24:20,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 39321.6, 300 sec: 42542.9). Total num frames: 1692139520. Throughput: 0: 10729.2. Samples: 423105536. Policy #0 lag: (min: 7.0, avg: 101.2, max: 263.0) [2024-06-15 22:24:20,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 22:24:21,427][1652475] Updated weights for policy 0, policy_version 826241 (0.0016) [2024-06-15 22:24:23,234][1652475] Updated weights for policy 0, policy_version 826307 (0.0029) [2024-06-15 22:24:25,712][1652475] Updated weights for policy 0, policy_version 826404 (0.0011) [2024-06-15 22:24:25,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 43144.5, 300 sec: 42876.1). Total num frames: 1692467200. Throughput: 0: 10752.0. Samples: 423170560. 
Policy #0 lag: (min: 7.0, avg: 101.2, max: 263.0) [2024-06-15 22:24:25,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 22:24:27,456][1652475] Updated weights for policy 0, policy_version 826448 (0.0029) [2024-06-15 22:24:28,689][1652475] Updated weights for policy 0, policy_version 826495 (0.0014) [2024-06-15 22:24:30,738][1648984] Fps is (10 sec: 52428.5, 60 sec: 42598.4, 300 sec: 42653.9). Total num frames: 1692663808. Throughput: 0: 10661.0. Samples: 423229952. Policy #0 lag: (min: 7.0, avg: 101.2, max: 263.0) [2024-06-15 22:24:30,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 22:24:35,480][1652475] Updated weights for policy 0, policy_version 826557 (0.0017) [2024-06-15 22:24:35,738][1648984] Fps is (10 sec: 32768.0, 60 sec: 41506.1, 300 sec: 42209.6). Total num frames: 1692794880. Throughput: 0: 11002.3. Samples: 423269888. Policy #0 lag: (min: 7.0, avg: 101.2, max: 263.0) [2024-06-15 22:24:35,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 22:24:37,246][1652475] Updated weights for policy 0, policy_version 826624 (0.0033) [2024-06-15 22:24:39,434][1652475] Updated weights for policy 0, policy_version 826705 (0.0104) [2024-06-15 22:24:40,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 44783.0, 300 sec: 43098.3). Total num frames: 1693188096. Throughput: 0: 10688.6. Samples: 423321600. Policy #0 lag: (min: 7.0, avg: 101.2, max: 263.0) [2024-06-15 22:24:40,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 22:24:45,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 42052.2, 300 sec: 42209.6). Total num frames: 1693188096. Throughput: 0: 10877.1. Samples: 423396352. 
Policy #0 lag: (min: 7.0, avg: 101.2, max: 263.0) [2024-06-15 22:24:45,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 22:24:46,300][1652475] Updated weights for policy 0, policy_version 826755 (0.0079) [2024-06-15 22:24:47,448][1652475] Updated weights for policy 0, policy_version 826811 (0.0021) [2024-06-15 22:24:49,415][1652475] Updated weights for policy 0, policy_version 826864 (0.0013) [2024-06-15 22:24:50,738][1648984] Fps is (10 sec: 29491.3, 60 sec: 42052.2, 300 sec: 42320.7). Total num frames: 1693483008. Throughput: 0: 10899.9. Samples: 423429632. Policy #0 lag: (min: 31.0, avg: 116.6, max: 287.0) [2024-06-15 22:24:50,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 22:24:51,658][1652475] Updated weights for policy 0, policy_version 826944 (0.0103) [2024-06-15 22:24:53,282][1652475] Updated weights for policy 0, policy_version 826997 (0.0037) [2024-06-15 22:24:55,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 43705.6, 300 sec: 42431.8). Total num frames: 1693712384. Throughput: 0: 10376.5. Samples: 423480320. Policy #0 lag: (min: 31.0, avg: 116.6, max: 287.0) [2024-06-15 22:24:55,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 22:24:55,743][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000827008_1693712384.pth... [2024-06-15 22:24:55,812][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000822048_1683554304.pth [2024-06-15 22:24:59,438][1652475] Updated weights for policy 0, policy_version 827056 (0.0013) [2024-06-15 22:25:00,645][1651340] Signal inference workers to stop experience collection... (42550 times) [2024-06-15 22:25:00,695][1652475] InferenceWorker_p0-w0: stopping experience collection (42550 times) [2024-06-15 22:25:00,738][1648984] Fps is (10 sec: 36044.9, 60 sec: 41506.2, 300 sec: 42209.6). Total num frames: 1693843456. Throughput: 0: 10604.1. Samples: 423547904. 
Policy #0 lag: (min: 31.0, avg: 116.6, max: 287.0) [2024-06-15 22:25:00,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 22:25:00,937][1651340] Signal inference workers to resume experience collection... (42550 times) [2024-06-15 22:25:00,937][1652475] InferenceWorker_p0-w0: resuming experience collection (42550 times) [2024-06-15 22:25:01,854][1652475] Updated weights for policy 0, policy_version 827129 (0.0033) [2024-06-15 22:25:03,929][1652475] Updated weights for policy 0, policy_version 827174 (0.0012) [2024-06-15 22:25:05,344][1652475] Updated weights for policy 0, policy_version 827232 (0.0020) [2024-06-15 22:25:05,738][1648984] Fps is (10 sec: 49150.7, 60 sec: 43144.3, 300 sec: 42876.0). Total num frames: 1694203904. Throughput: 0: 10535.7. Samples: 423579648. Policy #0 lag: (min: 31.0, avg: 116.6, max: 287.0) [2024-06-15 22:25:05,739][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 22:25:10,738][1648984] Fps is (10 sec: 42598.0, 60 sec: 42598.3, 300 sec: 42320.8). Total num frames: 1694269440. Throughput: 0: 10444.8. Samples: 423640576. Policy #0 lag: (min: 31.0, avg: 116.6, max: 287.0) [2024-06-15 22:25:10,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 22:25:11,515][1652475] Updated weights for policy 0, policy_version 827324 (0.0016) [2024-06-15 22:25:13,719][1652475] Updated weights for policy 0, policy_version 827380 (0.0011) [2024-06-15 22:25:15,738][1648984] Fps is (10 sec: 29492.1, 60 sec: 40413.9, 300 sec: 42209.6). Total num frames: 1694498816. Throughput: 0: 10535.8. Samples: 423704064. Policy #0 lag: (min: 31.0, avg: 116.6, max: 287.0) [2024-06-15 22:25:15,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 22:25:17,610][1652475] Updated weights for policy 0, policy_version 827440 (0.0077) [2024-06-15 22:25:19,202][1652475] Updated weights for policy 0, policy_version 827520 (0.0054) [2024-06-15 22:25:20,738][1648984] Fps is (10 sec: 49151.2, 60 sec: 43690.5, 300 sec: 42320.7). 
Total num frames: 1694760960. Throughput: 0: 10444.8. Samples: 423739904. Policy #0 lag: (min: 31.0, avg: 116.6, max: 287.0) [2024-06-15 22:25:20,739][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 22:25:22,407][1652475] Updated weights for policy 0, policy_version 827573 (0.0012) [2024-06-15 22:25:24,462][1652475] Updated weights for policy 0, policy_version 827616 (0.0047) [2024-06-15 22:25:25,167][1652475] Updated weights for policy 0, policy_version 827646 (0.0013) [2024-06-15 22:25:25,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 42598.4, 300 sec: 42654.0). Total num frames: 1695023104. Throughput: 0: 10729.3. Samples: 423804416. Policy #0 lag: (min: 31.0, avg: 116.6, max: 287.0) [2024-06-15 22:25:25,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 22:25:29,627][1652475] Updated weights for policy 0, policy_version 827715 (0.0012) [2024-06-15 22:25:30,738][1648984] Fps is (10 sec: 49153.3, 60 sec: 43144.6, 300 sec: 42542.9). Total num frames: 1695252480. Throughput: 0: 10649.6. Samples: 423875584. Policy #0 lag: (min: 31.0, avg: 116.6, max: 287.0) [2024-06-15 22:25:30,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 22:25:32,107][1652475] Updated weights for policy 0, policy_version 827780 (0.0139) [2024-06-15 22:25:33,521][1652475] Updated weights for policy 0, policy_version 827838 (0.0012) [2024-06-15 22:25:35,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 43690.7, 300 sec: 42542.9). Total num frames: 1695416320. Throughput: 0: 10501.7. Samples: 423902208. Policy #0 lag: (min: 31.0, avg: 116.6, max: 287.0) [2024-06-15 22:25:35,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 22:25:36,878][1652475] Updated weights for policy 0, policy_version 827895 (0.0015) [2024-06-15 22:25:40,738][1648984] Fps is (10 sec: 36044.6, 60 sec: 40413.9, 300 sec: 42209.6). Total num frames: 1695612928. Throughput: 0: 11059.2. Samples: 423977984. 
Policy #0 lag: (min: 31.0, avg: 116.6, max: 287.0) [2024-06-15 22:25:40,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 22:25:41,211][1652475] Updated weights for policy 0, policy_version 827960 (0.0014) [2024-06-15 22:25:42,207][1652475] Updated weights for policy 0, policy_version 828003 (0.0012) [2024-06-15 22:25:43,636][1651340] Signal inference workers to stop experience collection... (42600 times) [2024-06-15 22:25:43,668][1652475] InferenceWorker_p0-w0: stopping experience collection (42600 times) [2024-06-15 22:25:43,793][1651340] Signal inference workers to resume experience collection... (42600 times) [2024-06-15 22:25:43,794][1652475] InferenceWorker_p0-w0: resuming experience collection (42600 times) [2024-06-15 22:25:44,376][1652475] Updated weights for policy 0, policy_version 828085 (0.0015) [2024-06-15 22:25:45,738][1648984] Fps is (10 sec: 52427.7, 60 sec: 45875.1, 300 sec: 43098.3). Total num frames: 1695940608. Throughput: 0: 10968.1. Samples: 424041472. Policy #0 lag: (min: 31.0, avg: 116.6, max: 287.0) [2024-06-15 22:25:45,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 22:25:48,287][1652475] Updated weights for policy 0, policy_version 828129 (0.0012) [2024-06-15 22:25:50,738][1648984] Fps is (10 sec: 45875.4, 60 sec: 43144.6, 300 sec: 42542.9). Total num frames: 1696071680. Throughput: 0: 11013.8. Samples: 424075264. Policy #0 lag: (min: 31.0, avg: 116.6, max: 287.0) [2024-06-15 22:25:50,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 22:25:51,214][1652475] Updated weights for policy 0, policy_version 828176 (0.0012) [2024-06-15 22:25:52,912][1652475] Updated weights for policy 0, policy_version 828240 (0.0011) [2024-06-15 22:25:55,738][1648984] Fps is (10 sec: 39322.3, 60 sec: 43690.7, 300 sec: 42653.9). Total num frames: 1696333824. Throughput: 0: 11138.9. Samples: 424141824. 
Policy #0 lag: (min: 31.0, avg: 116.6, max: 287.0) [2024-06-15 22:25:55,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 22:25:57,578][1652475] Updated weights for policy 0, policy_version 828347 (0.0013) [2024-06-15 22:26:00,738][1648984] Fps is (10 sec: 39321.2, 60 sec: 43690.6, 300 sec: 42431.8). Total num frames: 1696464896. Throughput: 0: 11161.6. Samples: 424206336. Policy #0 lag: (min: 31.0, avg: 116.6, max: 287.0) [2024-06-15 22:26:00,739][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 22:26:01,772][1652475] Updated weights for policy 0, policy_version 828404 (0.0013) [2024-06-15 22:26:04,031][1652475] Updated weights for policy 0, policy_version 828475 (0.0012) [2024-06-15 22:26:05,354][1652475] Updated weights for policy 0, policy_version 828540 (0.0130) [2024-06-15 22:26:05,738][1648984] Fps is (10 sec: 52428.2, 60 sec: 44237.0, 300 sec: 43098.3). Total num frames: 1696858112. Throughput: 0: 11002.3. Samples: 424235008. Policy #0 lag: (min: 31.0, avg: 116.6, max: 287.0) [2024-06-15 22:26:05,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 22:26:10,738][1648984] Fps is (10 sec: 42598.7, 60 sec: 43690.7, 300 sec: 42431.8). Total num frames: 1696890880. Throughput: 0: 11082.0. Samples: 424303104. Policy #0 lag: (min: 31.0, avg: 116.6, max: 287.0) [2024-06-15 22:26:10,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 22:26:11,140][1652475] Updated weights for policy 0, policy_version 828579 (0.0012) [2024-06-15 22:26:14,489][1652475] Updated weights for policy 0, policy_version 828672 (0.0013) [2024-06-15 22:26:15,738][1648984] Fps is (10 sec: 36045.2, 60 sec: 45329.1, 300 sec: 42653.9). Total num frames: 1697218560. Throughput: 0: 10831.6. Samples: 424363008. 
Policy #0 lag: (min: 31.0, avg: 116.6, max: 287.0) [2024-06-15 22:26:15,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 22:26:15,845][1652475] Updated weights for policy 0, policy_version 828736 (0.0085) [2024-06-15 22:26:17,055][1652475] Updated weights for policy 0, policy_version 828800 (0.0021) [2024-06-15 22:26:20,738][1648984] Fps is (10 sec: 49152.0, 60 sec: 43690.8, 300 sec: 43098.3). Total num frames: 1697382400. Throughput: 0: 10922.7. Samples: 424393728. Policy #0 lag: (min: 31.0, avg: 116.6, max: 287.0) [2024-06-15 22:26:20,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 22:26:24,807][1652475] Updated weights for policy 0, policy_version 828873 (0.0012) [2024-06-15 22:26:25,738][1648984] Fps is (10 sec: 39320.7, 60 sec: 43144.4, 300 sec: 42987.2). Total num frames: 1697611776. Throughput: 0: 10843.0. Samples: 424465920. Policy #0 lag: (min: 31.0, avg: 116.6, max: 287.0) [2024-06-15 22:26:25,739][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 22:26:26,014][1652475] Updated weights for policy 0, policy_version 828925 (0.0012) [2024-06-15 22:26:28,724][1652475] Updated weights for policy 0, policy_version 829024 (0.0229) [2024-06-15 22:26:28,823][1651340] Signal inference workers to stop experience collection... (42650 times) [2024-06-15 22:26:28,866][1652475] InferenceWorker_p0-w0: stopping experience collection (42650 times) [2024-06-15 22:26:29,070][1651340] Signal inference workers to resume experience collection... (42650 times) [2024-06-15 22:26:29,074][1652475] InferenceWorker_p0-w0: resuming experience collection (42650 times) [2024-06-15 22:26:30,738][1648984] Fps is (10 sec: 52428.5, 60 sec: 44236.7, 300 sec: 43098.2). Total num frames: 1697906688. Throughput: 0: 10774.8. Samples: 424526336. 
Policy #0 lag: (min: 31.0, avg: 116.6, max: 287.0) [2024-06-15 22:26:30,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 22:26:35,404][1652475] Updated weights for policy 0, policy_version 829104 (0.0012) [2024-06-15 22:26:35,738][1648984] Fps is (10 sec: 42599.3, 60 sec: 43690.7, 300 sec: 42987.2). Total num frames: 1698037760. Throughput: 0: 10911.3. Samples: 424566272. Policy #0 lag: (min: 31.0, avg: 116.6, max: 287.0) [2024-06-15 22:26:35,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 22:26:37,821][1652475] Updated weights for policy 0, policy_version 829184 (0.0015) [2024-06-15 22:26:40,090][1652475] Updated weights for policy 0, policy_version 829249 (0.0015) [2024-06-15 22:26:40,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 45329.0, 300 sec: 42876.1). Total num frames: 1698332672. Throughput: 0: 10820.2. Samples: 424628736. Policy #0 lag: (min: 15.0, avg: 128.5, max: 271.0) [2024-06-15 22:26:40,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 22:26:41,356][1652475] Updated weights for policy 0, policy_version 829305 (0.0013) [2024-06-15 22:26:45,738][1648984] Fps is (10 sec: 39321.3, 60 sec: 41506.2, 300 sec: 42653.9). Total num frames: 1698430976. Throughput: 0: 10934.1. Samples: 424698368. Policy #0 lag: (min: 15.0, avg: 128.5, max: 271.0) [2024-06-15 22:26:45,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 22:26:47,128][1652475] Updated weights for policy 0, policy_version 829350 (0.0011) [2024-06-15 22:26:48,675][1652475] Updated weights for policy 0, policy_version 829408 (0.0011) [2024-06-15 22:26:50,543][1652475] Updated weights for policy 0, policy_version 829456 (0.0013) [2024-06-15 22:26:50,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 44236.7, 300 sec: 42765.0). Total num frames: 1698725888. Throughput: 0: 11036.5. Samples: 424731648. 
Policy #0 lag: (min: 15.0, avg: 128.5, max: 271.0) [2024-06-15 22:26:50,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 22:26:52,405][1652475] Updated weights for policy 0, policy_version 829522 (0.0012) [2024-06-15 22:26:55,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 43690.6, 300 sec: 42876.1). Total num frames: 1698955264. Throughput: 0: 10854.4. Samples: 424791552. Policy #0 lag: (min: 15.0, avg: 128.5, max: 271.0) [2024-06-15 22:26:55,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 22:26:55,746][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000829568_1698955264.pth... [2024-06-15 22:26:55,791][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000824576_1688731648.pth [2024-06-15 22:26:58,823][1652475] Updated weights for policy 0, policy_version 829629 (0.0013) [2024-06-15 22:27:00,738][1648984] Fps is (10 sec: 45875.5, 60 sec: 45329.1, 300 sec: 42987.2). Total num frames: 1699184640. Throughput: 0: 11138.8. Samples: 424864256. Policy #0 lag: (min: 15.0, avg: 128.5, max: 271.0) [2024-06-15 22:27:00,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 22:27:00,837][1652475] Updated weights for policy 0, policy_version 829693 (0.0014) [2024-06-15 22:27:04,786][1652475] Updated weights for policy 0, policy_version 829792 (0.0012) [2024-06-15 22:27:05,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 43690.7, 300 sec: 42987.2). Total num frames: 1699479552. Throughput: 0: 11184.3. Samples: 424897024. Policy #0 lag: (min: 15.0, avg: 128.5, max: 271.0) [2024-06-15 22:27:05,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 22:27:10,505][1652475] Updated weights for policy 0, policy_version 829860 (0.0013) [2024-06-15 22:27:10,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 44782.9, 300 sec: 42765.2). Total num frames: 1699577856. Throughput: 0: 11013.7. Samples: 424961536. 
Policy #0 lag: (min: 15.0, avg: 128.5, max: 271.0) [2024-06-15 22:27:10,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 22:27:13,281][1652475] Updated weights for policy 0, policy_version 829923 (0.0012) [2024-06-15 22:27:15,178][1652475] Updated weights for policy 0, policy_version 829968 (0.0013) [2024-06-15 22:27:15,738][1648984] Fps is (10 sec: 32768.2, 60 sec: 43144.5, 300 sec: 42431.8). Total num frames: 1699807232. Throughput: 0: 11093.4. Samples: 425025536. Policy #0 lag: (min: 15.0, avg: 128.5, max: 271.0) [2024-06-15 22:27:15,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 22:27:17,024][1651340] Signal inference workers to stop experience collection... (42700 times) [2024-06-15 22:27:17,074][1652475] InferenceWorker_p0-w0: stopping experience collection (42700 times) [2024-06-15 22:27:17,404][1651340] Signal inference workers to resume experience collection... (42700 times) [2024-06-15 22:27:17,405][1652475] InferenceWorker_p0-w0: resuming experience collection (42700 times) [2024-06-15 22:27:18,071][1652475] Updated weights for policy 0, policy_version 830076 (0.0011) [2024-06-15 22:27:20,740][1648984] Fps is (10 sec: 42588.7, 60 sec: 43689.0, 300 sec: 42653.6). Total num frames: 1700003840. Throughput: 0: 10603.5. Samples: 425043456. Policy #0 lag: (min: 15.0, avg: 128.5, max: 271.0) [2024-06-15 22:27:20,741][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 22:27:22,156][1652475] Updated weights for policy 0, policy_version 830128 (0.0012) [2024-06-15 22:27:25,737][1648984] Fps is (10 sec: 32768.2, 60 sec: 42052.5, 300 sec: 42542.9). Total num frames: 1700134912. Throughput: 0: 10797.5. Samples: 425114624. 
Policy #0 lag: (min: 15.0, avg: 128.5, max: 271.0) [2024-06-15 22:27:25,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 22:27:28,092][1652475] Updated weights for policy 0, policy_version 830208 (0.0087) [2024-06-15 22:27:30,535][1652475] Updated weights for policy 0, policy_version 830278 (0.0020) [2024-06-15 22:27:30,738][1648984] Fps is (10 sec: 42607.8, 60 sec: 42052.2, 300 sec: 42542.8). Total num frames: 1700429824. Throughput: 0: 10524.4. Samples: 425171968. Policy #0 lag: (min: 15.0, avg: 128.5, max: 271.0) [2024-06-15 22:27:30,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 22:27:31,824][1652475] Updated weights for policy 0, policy_version 830328 (0.0012) [2024-06-15 22:27:33,397][1652475] Updated weights for policy 0, policy_version 830355 (0.0011) [2024-06-15 22:27:35,738][1648984] Fps is (10 sec: 52428.2, 60 sec: 43690.6, 300 sec: 42654.0). Total num frames: 1700659200. Throughput: 0: 10513.1. Samples: 425204736. Policy #0 lag: (min: 15.0, avg: 128.5, max: 271.0) [2024-06-15 22:27:35,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 22:27:39,087][1652475] Updated weights for policy 0, policy_version 830416 (0.0028) [2024-06-15 22:27:40,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 41506.1, 300 sec: 42653.9). Total num frames: 1700823040. Throughput: 0: 10683.7. Samples: 425272320. Policy #0 lag: (min: 15.0, avg: 128.5, max: 271.0) [2024-06-15 22:27:40,739][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 22:27:41,141][1652475] Updated weights for policy 0, policy_version 830496 (0.0018) [2024-06-15 22:27:43,623][1652475] Updated weights for policy 0, policy_version 830533 (0.0107) [2024-06-15 22:27:45,340][1652475] Updated weights for policy 0, policy_version 830609 (0.0106) [2024-06-15 22:27:45,738][1648984] Fps is (10 sec: 45875.4, 60 sec: 44783.0, 300 sec: 43098.3). Total num frames: 1701117952. Throughput: 0: 10342.4. Samples: 425329664. 
Policy #0 lag: (min: 15.0, avg: 128.5, max: 271.0) [2024-06-15 22:27:45,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 22:27:50,738][1648984] Fps is (10 sec: 36045.3, 60 sec: 40960.0, 300 sec: 42320.7). Total num frames: 1701183488. Throughput: 0: 10410.7. Samples: 425365504. Policy #0 lag: (min: 15.0, avg: 128.5, max: 271.0) [2024-06-15 22:27:50,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 22:27:51,447][1652475] Updated weights for policy 0, policy_version 830689 (0.0012) [2024-06-15 22:27:52,260][1652475] Updated weights for policy 0, policy_version 830719 (0.0012) [2024-06-15 22:27:53,569][1652475] Updated weights for policy 0, policy_version 830775 (0.0013) [2024-06-15 22:27:55,738][1648984] Fps is (10 sec: 36044.6, 60 sec: 42052.3, 300 sec: 42987.2). Total num frames: 1701478400. Throughput: 0: 10490.3. Samples: 425433600. Policy #0 lag: (min: 15.0, avg: 128.5, max: 271.0) [2024-06-15 22:27:55,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 22:27:56,010][1652475] Updated weights for policy 0, policy_version 830819 (0.0011) [2024-06-15 22:27:56,893][1652475] Updated weights for policy 0, policy_version 830864 (0.0011) [2024-06-15 22:28:00,738][1648984] Fps is (10 sec: 52426.4, 60 sec: 42051.9, 300 sec: 42876.0). Total num frames: 1701707776. Throughput: 0: 10547.1. Samples: 425500160. Policy #0 lag: (min: 15.0, avg: 128.5, max: 271.0) [2024-06-15 22:28:00,739][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 22:28:02,876][1652475] Updated weights for policy 0, policy_version 830944 (0.0163) [2024-06-15 22:28:03,825][1652475] Updated weights for policy 0, policy_version 830976 (0.0015) [2024-06-15 22:28:04,332][1651340] Signal inference workers to stop experience collection... (42750 times) [2024-06-15 22:28:04,431][1652475] InferenceWorker_p0-w0: stopping experience collection (42750 times) [2024-06-15 22:28:04,610][1651340] Signal inference workers to resume experience collection... 
(42750 times) [2024-06-15 22:28:04,611][1652475] InferenceWorker_p0-w0: resuming experience collection (42750 times) [2024-06-15 22:28:05,356][1652475] Updated weights for policy 0, policy_version 831036 (0.0158) [2024-06-15 22:28:05,738][1648984] Fps is (10 sec: 49152.0, 60 sec: 41506.1, 300 sec: 43098.3). Total num frames: 1701969920. Throughput: 0: 10957.4. Samples: 425536512. Policy #0 lag: (min: 15.0, avg: 128.5, max: 271.0) [2024-06-15 22:28:05,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 22:28:08,259][1652475] Updated weights for policy 0, policy_version 831100 (0.0014) [2024-06-15 22:28:10,365][1652475] Updated weights for policy 0, policy_version 831152 (0.0014) [2024-06-15 22:28:10,738][1648984] Fps is (10 sec: 52431.0, 60 sec: 44236.8, 300 sec: 43542.6). Total num frames: 1702232064. Throughput: 0: 10763.3. Samples: 425598976. Policy #0 lag: (min: 15.0, avg: 128.5, max: 271.0) [2024-06-15 22:28:10,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 22:28:15,738][1648984] Fps is (10 sec: 32768.1, 60 sec: 41506.1, 300 sec: 42431.8). Total num frames: 1702297600. Throughput: 0: 10956.8. Samples: 425665024. Policy #0 lag: (min: 15.0, avg: 128.5, max: 271.0) [2024-06-15 22:28:15,738][1648984] Avg episode reward: [(0, '-0.200')] [2024-06-15 22:28:16,033][1652475] Updated weights for policy 0, policy_version 831216 (0.0014) [2024-06-15 22:28:17,756][1652475] Updated weights for policy 0, policy_version 831287 (0.0012) [2024-06-15 22:28:20,161][1652475] Updated weights for policy 0, policy_version 831357 (0.0011) [2024-06-15 22:28:20,745][1648984] Fps is (10 sec: 39291.6, 60 sec: 43686.8, 300 sec: 43208.2). Total num frames: 1702625280. Throughput: 0: 10738.8. Samples: 425688064. 
Policy #0 lag: (min: 15.0, avg: 128.5, max: 271.0) [2024-06-15 22:28:20,746][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 22:28:24,944][1652475] Updated weights for policy 0, policy_version 831408 (0.0024) [2024-06-15 22:28:25,738][1648984] Fps is (10 sec: 45872.7, 60 sec: 43690.2, 300 sec: 42876.0). Total num frames: 1702756352. Throughput: 0: 10660.9. Samples: 425752064. Policy #0 lag: (min: 15.0, avg: 128.5, max: 271.0) [2024-06-15 22:28:25,739][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 22:28:30,622][1652475] Updated weights for policy 0, policy_version 831520 (0.0012) [2024-06-15 22:28:30,738][1648984] Fps is (10 sec: 32793.2, 60 sec: 42052.4, 300 sec: 42876.1). Total num frames: 1702952960. Throughput: 0: 10820.3. Samples: 425816576. Policy #0 lag: (min: 15.0, avg: 102.8, max: 207.0) [2024-06-15 22:28:30,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 22:28:32,410][1652475] Updated weights for policy 0, policy_version 831600 (0.0189) [2024-06-15 22:28:35,738][1648984] Fps is (10 sec: 39323.8, 60 sec: 41506.1, 300 sec: 42876.1). Total num frames: 1703149568. Throughput: 0: 10558.6. Samples: 425840640. Policy #0 lag: (min: 15.0, avg: 102.8, max: 207.0) [2024-06-15 22:28:35,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 22:28:36,719][1652475] Updated weights for policy 0, policy_version 831633 (0.0028) [2024-06-15 22:28:38,245][1652475] Updated weights for policy 0, policy_version 831696 (0.0010) [2024-06-15 22:28:40,738][1648984] Fps is (10 sec: 45875.2, 60 sec: 43144.6, 300 sec: 43209.3). Total num frames: 1703411712. Throughput: 0: 10524.5. Samples: 425907200. 
Policy #0 lag: (min: 15.0, avg: 102.8, max: 207.0) [2024-06-15 22:28:40,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 22:28:42,423][1652475] Updated weights for policy 0, policy_version 831792 (0.0012) [2024-06-15 22:28:45,427][1652475] Updated weights for policy 0, policy_version 831840 (0.0013) [2024-06-15 22:28:45,738][1648984] Fps is (10 sec: 49152.2, 60 sec: 42052.3, 300 sec: 42987.2). Total num frames: 1703641088. Throughput: 0: 10581.5. Samples: 425976320. Policy #0 lag: (min: 15.0, avg: 102.8, max: 207.0) [2024-06-15 22:28:45,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 22:28:49,021][1652475] Updated weights for policy 0, policy_version 831904 (0.0011) [2024-06-15 22:28:50,402][1652475] Updated weights for policy 0, policy_version 831968 (0.0023) [2024-06-15 22:28:50,530][1651340] Signal inference workers to stop experience collection... (42800 times) [2024-06-15 22:28:50,585][1652475] InferenceWorker_p0-w0: stopping experience collection (42800 times) [2024-06-15 22:28:50,738][1648984] Fps is (10 sec: 45875.2, 60 sec: 44782.9, 300 sec: 43323.4). Total num frames: 1703870464. Throughput: 0: 10604.1. Samples: 426013696. Policy #0 lag: (min: 15.0, avg: 102.8, max: 207.0) [2024-06-15 22:28:50,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 22:28:50,743][1651340] Signal inference workers to resume experience collection... (42800 times) [2024-06-15 22:28:50,744][1652475] InferenceWorker_p0-w0: resuming experience collection (42800 times) [2024-06-15 22:28:51,065][1652475] Updated weights for policy 0, policy_version 831996 (0.0085) [2024-06-15 22:28:53,446][1652475] Updated weights for policy 0, policy_version 832059 (0.0014) [2024-06-15 22:28:55,738][1648984] Fps is (10 sec: 42598.0, 60 sec: 43144.5, 300 sec: 43098.2). Total num frames: 1704067072. Throughput: 0: 10763.4. Samples: 426083328. 
Policy #0 lag: (min: 15.0, avg: 102.8, max: 207.0) [2024-06-15 22:28:55,740][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 22:28:56,092][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000832080_1704099840.pth... [2024-06-15 22:28:56,214][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000827008_1693712384.pth [2024-06-15 22:28:59,482][1652475] Updated weights for policy 0, policy_version 832130 (0.0012) [2024-06-15 22:29:00,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 43144.8, 300 sec: 42987.2). Total num frames: 1704296448. Throughput: 0: 10843.0. Samples: 426152960. Policy #0 lag: (min: 15.0, avg: 102.8, max: 207.0) [2024-06-15 22:29:00,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 22:29:01,788][1652475] Updated weights for policy 0, policy_version 832229 (0.0107) [2024-06-15 22:29:05,178][1652475] Updated weights for policy 0, policy_version 832313 (0.0025) [2024-06-15 22:29:05,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 43690.6, 300 sec: 43653.6). Total num frames: 1704591360. Throughput: 0: 10913.1. Samples: 426179072. Policy #0 lag: (min: 15.0, avg: 102.8, max: 207.0) [2024-06-15 22:29:05,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 22:29:08,364][1652475] Updated weights for policy 0, policy_version 832341 (0.0012) [2024-06-15 22:29:09,355][1652475] Updated weights for policy 0, policy_version 832382 (0.0015) [2024-06-15 22:29:10,738][1648984] Fps is (10 sec: 42597.1, 60 sec: 41505.9, 300 sec: 42876.1). Total num frames: 1704722432. Throughput: 0: 10956.9. Samples: 426245120. 
Policy #0 lag: (min: 15.0, avg: 102.8, max: 207.0) [2024-06-15 22:29:10,739][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 22:29:13,174][1652475] Updated weights for policy 0, policy_version 832450 (0.0101) [2024-06-15 22:29:14,323][1652475] Updated weights for policy 0, policy_version 832502 (0.0012) [2024-06-15 22:29:15,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 44782.9, 300 sec: 43542.6). Total num frames: 1704984576. Throughput: 0: 11025.0. Samples: 426312704. Policy #0 lag: (min: 15.0, avg: 102.8, max: 207.0) [2024-06-15 22:29:15,739][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 22:29:17,151][1652475] Updated weights for policy 0, policy_version 832565 (0.0026) [2024-06-15 22:29:20,487][1652475] Updated weights for policy 0, policy_version 832608 (0.0120) [2024-06-15 22:29:20,738][1648984] Fps is (10 sec: 45876.9, 60 sec: 42603.9, 300 sec: 43098.3). Total num frames: 1705181184. Throughput: 0: 11229.9. Samples: 426345984. Policy #0 lag: (min: 15.0, avg: 102.8, max: 207.0) [2024-06-15 22:29:20,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 22:29:21,379][1652475] Updated weights for policy 0, policy_version 832639 (0.0012) [2024-06-15 22:29:24,766][1652475] Updated weights for policy 0, policy_version 832701 (0.0015) [2024-06-15 22:29:25,767][1648984] Fps is (10 sec: 42476.1, 60 sec: 44215.9, 300 sec: 43205.1). Total num frames: 1705410560. Throughput: 0: 11188.6. Samples: 426411008. Policy #0 lag: (min: 15.0, avg: 102.8, max: 207.0) [2024-06-15 22:29:25,767][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 22:29:28,591][1652475] Updated weights for policy 0, policy_version 832784 (0.0013) [2024-06-15 22:29:30,738][1648984] Fps is (10 sec: 45875.1, 60 sec: 44783.0, 300 sec: 43542.6). Total num frames: 1705639936. Throughput: 0: 11002.3. Samples: 426471424. 
Policy #0 lag: (min: 15.0, avg: 102.8, max: 207.0) [2024-06-15 22:29:30,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 22:29:33,145][1652475] Updated weights for policy 0, policy_version 832882 (0.0088) [2024-06-15 22:29:35,479][1652475] Updated weights for policy 0, policy_version 832912 (0.0014) [2024-06-15 22:29:35,738][1648984] Fps is (10 sec: 39435.4, 60 sec: 44236.8, 300 sec: 42765.0). Total num frames: 1705803776. Throughput: 0: 10854.4. Samples: 426502144. Policy #0 lag: (min: 15.0, avg: 102.8, max: 207.0) [2024-06-15 22:29:35,738][1648984] Avg episode reward: [(0, '-0.230')] [2024-06-15 22:29:39,585][1652475] Updated weights for policy 0, policy_version 832982 (0.0012) [2024-06-15 22:29:39,936][1651340] Signal inference workers to stop experience collection... (42850 times) [2024-06-15 22:29:39,992][1652475] InferenceWorker_p0-w0: stopping experience collection (42850 times) [2024-06-15 22:29:40,204][1651340] Signal inference workers to resume experience collection... (42850 times) [2024-06-15 22:29:40,205][1652475] InferenceWorker_p0-w0: resuming experience collection (42850 times) [2024-06-15 22:29:40,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 43690.7, 300 sec: 43542.6). Total num frames: 1706033152. Throughput: 0: 10786.2. Samples: 426568704. Policy #0 lag: (min: 15.0, avg: 102.8, max: 207.0) [2024-06-15 22:29:40,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 22:29:42,359][1652475] Updated weights for policy 0, policy_version 833042 (0.0027) [2024-06-15 22:29:43,696][1652475] Updated weights for policy 0, policy_version 833092 (0.0009) [2024-06-15 22:29:45,738][1648984] Fps is (10 sec: 49151.9, 60 sec: 44236.7, 300 sec: 43431.5). Total num frames: 1706295296. Throughput: 0: 10524.4. Samples: 426626560. 
Policy #0 lag: (min: 15.0, avg: 102.8, max: 207.0) [2024-06-15 22:29:45,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 22:29:47,394][1652475] Updated weights for policy 0, policy_version 833154 (0.0013) [2024-06-15 22:29:48,716][1652475] Updated weights for policy 0, policy_version 833211 (0.0018) [2024-06-15 22:29:50,738][1648984] Fps is (10 sec: 39320.2, 60 sec: 42598.2, 300 sec: 43098.2). Total num frames: 1706426368. Throughput: 0: 10649.5. Samples: 426658304. Policy #0 lag: (min: 15.0, avg: 102.8, max: 207.0) [2024-06-15 22:29:50,739][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 22:29:53,595][1652475] Updated weights for policy 0, policy_version 833264 (0.0014) [2024-06-15 22:29:55,539][1652475] Updated weights for policy 0, policy_version 833314 (0.0011) [2024-06-15 22:29:55,738][1648984] Fps is (10 sec: 32767.7, 60 sec: 42598.3, 300 sec: 43320.4). Total num frames: 1706622976. Throughput: 0: 10672.4. Samples: 426725376. Policy #0 lag: (min: 15.0, avg: 102.8, max: 207.0) [2024-06-15 22:29:55,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 22:29:56,677][1652475] Updated weights for policy 0, policy_version 833361 (0.0019) [2024-06-15 22:29:57,767][1652475] Updated weights for policy 0, policy_version 833408 (0.0024) [2024-06-15 22:29:59,976][1652475] Updated weights for policy 0, policy_version 833472 (0.0013) [2024-06-15 22:30:00,738][1648984] Fps is (10 sec: 52430.5, 60 sec: 44236.8, 300 sec: 43209.4). Total num frames: 1706950656. Throughput: 0: 10467.6. Samples: 426783744. Policy #0 lag: (min: 15.0, avg: 102.8, max: 207.0) [2024-06-15 22:30:00,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 22:30:05,738][1648984] Fps is (10 sec: 36045.3, 60 sec: 39867.8, 300 sec: 43098.3). Total num frames: 1706983424. Throughput: 0: 10592.7. Samples: 426822656. 
Policy #0 lag: (min: 15.0, avg: 102.8, max: 207.0) [2024-06-15 22:30:05,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 22:30:07,221][1652475] Updated weights for policy 0, policy_version 833555 (0.0014) [2024-06-15 22:30:09,074][1652475] Updated weights for policy 0, policy_version 833601 (0.0012) [2024-06-15 22:30:10,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 43690.9, 300 sec: 43542.6). Total num frames: 1707343872. Throughput: 0: 10474.3. Samples: 426882048. Policy #0 lag: (min: 15.0, avg: 102.8, max: 207.0) [2024-06-15 22:30:10,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 22:30:11,421][1652475] Updated weights for policy 0, policy_version 833680 (0.0065) [2024-06-15 22:30:12,427][1652475] Updated weights for policy 0, policy_version 833724 (0.0013) [2024-06-15 22:30:15,738][1648984] Fps is (10 sec: 49151.9, 60 sec: 41506.2, 300 sec: 43098.3). Total num frames: 1707474944. Throughput: 0: 10672.4. Samples: 426951680. Policy #0 lag: (min: 15.0, avg: 102.8, max: 207.0) [2024-06-15 22:30:15,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 22:30:19,130][1652475] Updated weights for policy 0, policy_version 833795 (0.0018) [2024-06-15 22:30:20,542][1652475] Updated weights for policy 0, policy_version 833861 (0.0010) [2024-06-15 22:30:20,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 43144.5, 300 sec: 43209.3). Total num frames: 1707769856. Throughput: 0: 10740.6. Samples: 426985472. Policy #0 lag: (min: 49.0, avg: 125.4, max: 289.0) [2024-06-15 22:30:20,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 22:30:23,461][1652475] Updated weights for policy 0, policy_version 833926 (0.0012) [2024-06-15 22:30:24,220][1651340] Signal inference workers to stop experience collection... (42900 times) [2024-06-15 22:30:24,273][1652475] InferenceWorker_p0-w0: stopping experience collection (42900 times) [2024-06-15 22:30:24,575][1651340] Signal inference workers to resume experience collection... 
(42900 times) [2024-06-15 22:30:24,575][1652475] InferenceWorker_p0-w0: resuming experience collection (42900 times) [2024-06-15 22:30:24,900][1652475] Updated weights for policy 0, policy_version 833984 (0.0011) [2024-06-15 22:30:25,744][1648984] Fps is (10 sec: 52428.4, 60 sec: 43165.3, 300 sec: 43209.3). Total num frames: 1707999232. Throughput: 0: 10501.7. Samples: 427041280. Policy #0 lag: (min: 49.0, avg: 125.4, max: 289.0) [2024-06-15 22:30:25,745][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 22:30:30,739][1648984] Fps is (10 sec: 29486.9, 60 sec: 40412.9, 300 sec: 42875.9). Total num frames: 1708064768. Throughput: 0: 10819.9. Samples: 427113472. Policy #0 lag: (min: 49.0, avg: 125.4, max: 289.0) [2024-06-15 22:30:30,739][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 22:30:31,456][1652475] Updated weights for policy 0, policy_version 834064 (0.0012) [2024-06-15 22:30:32,891][1652475] Updated weights for policy 0, policy_version 834128 (0.0013) [2024-06-15 22:30:35,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 43144.5, 300 sec: 43320.4). Total num frames: 1708392448. Throughput: 0: 10626.9. Samples: 427136512. Policy #0 lag: (min: 49.0, avg: 125.4, max: 289.0) [2024-06-15 22:30:35,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 22:30:36,629][1652475] Updated weights for policy 0, policy_version 834179 (0.0125) [2024-06-15 22:30:37,862][1652475] Updated weights for policy 0, policy_version 834240 (0.0019) [2024-06-15 22:30:40,740][1648984] Fps is (10 sec: 45881.3, 60 sec: 41506.0, 300 sec: 42653.9). Total num frames: 1708523520. Throughput: 0: 10695.1. Samples: 427206656. Policy #0 lag: (min: 49.0, avg: 125.4, max: 289.0) [2024-06-15 22:30:40,741][1648984] Avg episode reward: [(0, '-0.540')] [2024-06-15 22:30:44,400][1652475] Updated weights for policy 0, policy_version 834336 (0.0012) [2024-06-15 22:30:45,738][1648984] Fps is (10 sec: 39320.8, 60 sec: 41506.0, 300 sec: 43098.2). Total num frames: 1708785664. 
Throughput: 0: 10649.5. Samples: 427262976. Policy #0 lag: (min: 49.0, avg: 125.4, max: 289.0) [2024-06-15 22:30:45,738][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 22:30:46,535][1652475] Updated weights for policy 0, policy_version 834384 (0.0012) [2024-06-15 22:30:47,458][1652475] Updated weights for policy 0, policy_version 834432 (0.0011) [2024-06-15 22:30:50,738][1648984] Fps is (10 sec: 42598.8, 60 sec: 42052.5, 300 sec: 42765.0). Total num frames: 1708949504. Throughput: 0: 10524.4. Samples: 427296256. Policy #0 lag: (min: 49.0, avg: 125.4, max: 289.0) [2024-06-15 22:30:50,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 22:30:51,785][1652475] Updated weights for policy 0, policy_version 834492 (0.0014) [2024-06-15 22:30:53,476][1652475] Updated weights for policy 0, policy_version 834560 (0.0010) [2024-06-15 22:30:55,738][1648984] Fps is (10 sec: 42599.2, 60 sec: 43144.6, 300 sec: 43209.3). Total num frames: 1709211648. Throughput: 0: 10581.3. Samples: 427358208. Policy #0 lag: (min: 49.0, avg: 125.4, max: 289.0) [2024-06-15 22:30:55,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 22:30:56,154][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000834608_1709277184.pth... [2024-06-15 22:30:56,196][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000829568_1698955264.pth [2024-06-15 22:30:56,200][1651340] Saving a milestone train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/milestones/checkpoint_000834608_1709277184.pth [2024-06-15 22:30:56,646][1652475] Updated weights for policy 0, policy_version 834624 (0.0014) [2024-06-15 22:31:00,738][1648984] Fps is (10 sec: 39321.1, 60 sec: 39867.6, 300 sec: 42320.7). Total num frames: 1709342720. Throughput: 0: 10535.8. Samples: 427425792. 
Policy #0 lag: (min: 49.0, avg: 125.4, max: 289.0) [2024-06-15 22:31:00,739][1648984] Avg episode reward: [(0, '-0.200')] [2024-06-15 22:31:01,648][1652475] Updated weights for policy 0, policy_version 834680 (0.0014) [2024-06-15 22:31:04,186][1652475] Updated weights for policy 0, policy_version 834736 (0.0012) [2024-06-15 22:31:05,738][1648984] Fps is (10 sec: 45875.4, 60 sec: 44782.9, 300 sec: 43320.4). Total num frames: 1709670400. Throughput: 0: 10467.5. Samples: 427456512. Policy #0 lag: (min: 49.0, avg: 125.4, max: 289.0) [2024-06-15 22:31:05,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 22:31:05,869][1652475] Updated weights for policy 0, policy_version 834809 (0.0013) [2024-06-15 22:31:08,257][1652475] Updated weights for policy 0, policy_version 834873 (0.0012) [2024-06-15 22:31:10,770][1648984] Fps is (10 sec: 48993.3, 60 sec: 41483.6, 300 sec: 42760.3). Total num frames: 1709834240. Throughput: 0: 10471.4. Samples: 427512832. Policy #0 lag: (min: 49.0, avg: 125.4, max: 289.0) [2024-06-15 22:31:10,771][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 22:31:14,423][1652475] Updated weights for policy 0, policy_version 834940 (0.0010) [2024-06-15 22:31:15,738][1648984] Fps is (10 sec: 29491.2, 60 sec: 41506.1, 300 sec: 42653.9). Total num frames: 1709965312. Throughput: 0: 10479.3. Samples: 427585024. Policy #0 lag: (min: 49.0, avg: 125.4, max: 289.0) [2024-06-15 22:31:15,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 22:31:16,317][1651340] Signal inference workers to stop experience collection... (42950 times) [2024-06-15 22:31:16,376][1652475] InferenceWorker_p0-w0: stopping experience collection (42950 times) [2024-06-15 22:31:16,542][1651340] Signal inference workers to resume experience collection... 
(42950 times) [2024-06-15 22:31:16,543][1652475] InferenceWorker_p0-w0: resuming experience collection (42950 times) [2024-06-15 22:31:17,136][1652475] Updated weights for policy 0, policy_version 835011 (0.0012) [2024-06-15 22:31:18,663][1652475] Updated weights for policy 0, policy_version 835067 (0.0011) [2024-06-15 22:31:20,424][1652475] Updated weights for policy 0, policy_version 835128 (0.0012) [2024-06-15 22:31:20,738][1648984] Fps is (10 sec: 52599.4, 60 sec: 43144.5, 300 sec: 43209.3). Total num frames: 1710358528. Throughput: 0: 10513.1. Samples: 427609600. Policy #0 lag: (min: 49.0, avg: 125.4, max: 289.0) [2024-06-15 22:31:20,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 22:31:25,738][1648984] Fps is (10 sec: 42598.5, 60 sec: 39867.8, 300 sec: 42320.7). Total num frames: 1710391296. Throughput: 0: 10592.7. Samples: 427683328. Policy #0 lag: (min: 49.0, avg: 125.4, max: 289.0) [2024-06-15 22:31:25,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 22:31:26,499][1652475] Updated weights for policy 0, policy_version 835188 (0.0014) [2024-06-15 22:31:28,914][1652475] Updated weights for policy 0, policy_version 835266 (0.0012) [2024-06-15 22:31:30,251][1652475] Updated weights for policy 0, policy_version 835320 (0.0012) [2024-06-15 22:31:30,738][1648984] Fps is (10 sec: 39322.0, 60 sec: 44784.0, 300 sec: 43098.2). Total num frames: 1710751744. Throughput: 0: 10558.6. Samples: 427738112. Policy #0 lag: (min: 49.0, avg: 125.4, max: 289.0) [2024-06-15 22:31:30,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 22:31:32,848][1652475] Updated weights for policy 0, policy_version 835391 (0.0013) [2024-06-15 22:31:35,738][1648984] Fps is (10 sec: 49152.2, 60 sec: 41506.2, 300 sec: 42542.9). Total num frames: 1710882816. Throughput: 0: 10513.1. Samples: 427769344. 
Policy #0 lag: (min: 49.0, avg: 125.4, max: 289.0) [2024-06-15 22:31:35,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 22:31:39,416][1652475] Updated weights for policy 0, policy_version 835472 (0.0014) [2024-06-15 22:31:40,634][1652475] Updated weights for policy 0, policy_version 835518 (0.0012) [2024-06-15 22:31:40,738][1648984] Fps is (10 sec: 36044.4, 60 sec: 43144.5, 300 sec: 42987.2). Total num frames: 1711112192. Throughput: 0: 10786.1. Samples: 427843584. Policy #0 lag: (min: 49.0, avg: 125.4, max: 289.0) [2024-06-15 22:31:40,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 22:31:42,231][1652475] Updated weights for policy 0, policy_version 835572 (0.0020) [2024-06-15 22:31:44,129][1652475] Updated weights for policy 0, policy_version 835616 (0.0011) [2024-06-15 22:31:44,811][1652475] Updated weights for policy 0, policy_version 835644 (0.0017) [2024-06-15 22:31:45,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 43690.9, 300 sec: 42987.2). Total num frames: 1711407104. Throughput: 0: 10661.0. Samples: 427905536. Policy #0 lag: (min: 49.0, avg: 125.4, max: 289.0) [2024-06-15 22:31:45,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 22:31:49,801][1652475] Updated weights for policy 0, policy_version 835684 (0.0018) [2024-06-15 22:31:50,738][1648984] Fps is (10 sec: 42598.8, 60 sec: 43144.5, 300 sec: 42653.9). Total num frames: 1711538176. Throughput: 0: 10831.6. Samples: 427943936. Policy #0 lag: (min: 49.0, avg: 125.4, max: 289.0) [2024-06-15 22:31:50,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 22:31:51,305][1652475] Updated weights for policy 0, policy_version 835744 (0.0013) [2024-06-15 22:31:53,565][1652475] Updated weights for policy 0, policy_version 835792 (0.0011) [2024-06-15 22:31:55,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 43144.6, 300 sec: 42765.0). Total num frames: 1711800320. Throughput: 0: 10953.3. Samples: 428005376. 
Policy #0 lag: (min: 49.0, avg: 125.4, max: 289.0) [2024-06-15 22:31:55,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 22:31:55,854][1652475] Updated weights for policy 0, policy_version 835842 (0.0012) [2024-06-15 22:32:00,738][1648984] Fps is (10 sec: 39320.3, 60 sec: 43144.4, 300 sec: 42209.6). Total num frames: 1711931392. Throughput: 0: 10786.1. Samples: 428070400. Policy #0 lag: (min: 49.0, avg: 125.4, max: 289.0) [2024-06-15 22:32:00,739][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 22:32:01,169][1652475] Updated weights for policy 0, policy_version 835922 (0.0014) [2024-06-15 22:32:02,027][1651340] Signal inference workers to stop experience collection... (43000 times) [2024-06-15 22:32:02,096][1652475] InferenceWorker_p0-w0: stopping experience collection (43000 times) [2024-06-15 22:32:02,098][1652475] Updated weights for policy 0, policy_version 835975 (0.0013) [2024-06-15 22:32:02,258][1651340] Signal inference workers to resume experience collection... (43000 times) [2024-06-15 22:32:02,259][1652475] InferenceWorker_p0-w0: resuming experience collection (43000 times) [2024-06-15 22:32:03,533][1652475] Updated weights for policy 0, policy_version 836030 (0.0011) [2024-06-15 22:32:05,750][1648984] Fps is (10 sec: 39271.7, 60 sec: 42043.3, 300 sec: 42763.2). Total num frames: 1712193536. Throughput: 0: 10805.8. Samples: 428096000. Policy #0 lag: (min: 49.0, avg: 125.4, max: 289.0) [2024-06-15 22:32:05,751][1648984] Avg episode reward: [(0, '-0.230')] [2024-06-15 22:32:07,863][1652475] Updated weights for policy 0, policy_version 836089 (0.0014) [2024-06-15 22:32:10,738][1648984] Fps is (10 sec: 45876.6, 60 sec: 42621.5, 300 sec: 42653.9). Total num frames: 1712390144. Throughput: 0: 10808.9. Samples: 428169728. 
Policy #0 lag: (min: 11.0, avg: 124.8, max: 267.0) [2024-06-15 22:32:10,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 22:32:10,951][1652475] Updated weights for policy 0, policy_version 836136 (0.0020) [2024-06-15 22:32:12,449][1652475] Updated weights for policy 0, policy_version 836183 (0.0011) [2024-06-15 22:32:13,687][1652475] Updated weights for policy 0, policy_version 836240 (0.0012) [2024-06-15 22:32:15,738][1648984] Fps is (10 sec: 52495.7, 60 sec: 45875.2, 300 sec: 43098.6). Total num frames: 1712717824. Throughput: 0: 10922.7. Samples: 428229632. Policy #0 lag: (min: 11.0, avg: 124.8, max: 267.0) [2024-06-15 22:32:15,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 22:32:19,740][1652475] Updated weights for policy 0, policy_version 836304 (0.0015) [2024-06-15 22:32:20,738][1648984] Fps is (10 sec: 42598.5, 60 sec: 40960.1, 300 sec: 42987.2). Total num frames: 1712816128. Throughput: 0: 11138.8. Samples: 428270592. Policy #0 lag: (min: 11.0, avg: 124.8, max: 267.0) [2024-06-15 22:32:20,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 22:32:23,228][1652475] Updated weights for policy 0, policy_version 836384 (0.0012) [2024-06-15 22:32:25,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 45329.0, 300 sec: 42987.2). Total num frames: 1713111040. Throughput: 0: 10854.4. Samples: 428332032. Policy #0 lag: (min: 11.0, avg: 124.8, max: 267.0) [2024-06-15 22:32:25,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 22:32:25,780][1652475] Updated weights for policy 0, policy_version 836496 (0.0012) [2024-06-15 22:32:30,738][1648984] Fps is (10 sec: 42598.6, 60 sec: 41506.2, 300 sec: 42653.9). Total num frames: 1713242112. Throughput: 0: 10899.9. Samples: 428396032. 
Policy #0 lag: (min: 11.0, avg: 124.8, max: 267.0) [2024-06-15 22:32:30,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 22:32:31,913][1652475] Updated weights for policy 0, policy_version 836546 (0.0011) [2024-06-15 22:32:33,044][1652475] Updated weights for policy 0, policy_version 836608 (0.0013) [2024-06-15 22:32:35,738][1648984] Fps is (10 sec: 36044.8, 60 sec: 43144.5, 300 sec: 42876.1). Total num frames: 1713471488. Throughput: 0: 10808.9. Samples: 428430336. Policy #0 lag: (min: 11.0, avg: 124.8, max: 267.0) [2024-06-15 22:32:35,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 22:32:36,630][1652475] Updated weights for policy 0, policy_version 836704 (0.0021) [2024-06-15 22:32:38,157][1652475] Updated weights for policy 0, policy_version 836754 (0.0017) [2024-06-15 22:32:39,268][1652475] Updated weights for policy 0, policy_version 836799 (0.0010) [2024-06-15 22:32:40,738][1648984] Fps is (10 sec: 52428.2, 60 sec: 44236.8, 300 sec: 42876.1). Total num frames: 1713766400. Throughput: 0: 10706.5. Samples: 428487168. Policy #0 lag: (min: 11.0, avg: 124.8, max: 267.0) [2024-06-15 22:32:40,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 22:32:45,086][1652475] Updated weights for policy 0, policy_version 836860 (0.0024) [2024-06-15 22:32:45,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 41506.1, 300 sec: 43098.2). Total num frames: 1713897472. Throughput: 0: 10809.0. Samples: 428556800. Policy #0 lag: (min: 11.0, avg: 124.8, max: 267.0) [2024-06-15 22:32:45,740][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 22:32:47,719][1651340] Signal inference workers to stop experience collection... (43050 times) [2024-06-15 22:32:47,781][1652475] InferenceWorker_p0-w0: stopping experience collection (43050 times) [2024-06-15 22:32:48,038][1651340] Signal inference workers to resume experience collection... 
(43050 times) [2024-06-15 22:32:48,039][1652475] InferenceWorker_p0-w0: resuming experience collection (43050 times) [2024-06-15 22:32:48,042][1652475] Updated weights for policy 0, policy_version 836928 (0.0136) [2024-06-15 22:32:49,652][1652475] Updated weights for policy 0, policy_version 836989 (0.0014) [2024-06-15 22:32:50,738][1648984] Fps is (10 sec: 42598.5, 60 sec: 44236.8, 300 sec: 43098.2). Total num frames: 1714192384. Throughput: 0: 10891.6. Samples: 428585984. Policy #0 lag: (min: 11.0, avg: 124.8, max: 267.0) [2024-06-15 22:32:50,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 22:32:51,292][1652475] Updated weights for policy 0, policy_version 837042 (0.0013) [2024-06-15 22:32:55,738][1648984] Fps is (10 sec: 39321.0, 60 sec: 41506.0, 300 sec: 42654.0). Total num frames: 1714290688. Throughput: 0: 10717.8. Samples: 428652032. Policy #0 lag: (min: 11.0, avg: 124.8, max: 267.0) [2024-06-15 22:32:55,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 22:32:56,152][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000837088_1714356224.pth... [2024-06-15 22:32:56,312][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000832080_1704099840.pth [2024-06-15 22:32:56,620][1652475] Updated weights for policy 0, policy_version 837104 (0.0012) [2024-06-15 22:33:00,101][1652475] Updated weights for policy 0, policy_version 837168 (0.0011) [2024-06-15 22:33:00,738][1648984] Fps is (10 sec: 36044.7, 60 sec: 43690.9, 300 sec: 42653.9). Total num frames: 1714552832. Throughput: 0: 10649.6. Samples: 428708864. 
Policy #0 lag: (min: 11.0, avg: 124.8, max: 267.0) [2024-06-15 22:33:00,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 22:33:01,771][1652475] Updated weights for policy 0, policy_version 837240 (0.0012) [2024-06-15 22:33:05,654][1652475] Updated weights for policy 0, policy_version 837283 (0.0011) [2024-06-15 22:33:05,738][1648984] Fps is (10 sec: 45875.7, 60 sec: 42607.4, 300 sec: 42431.8). Total num frames: 1714749440. Throughput: 0: 10399.3. Samples: 428738560. Policy #0 lag: (min: 11.0, avg: 124.8, max: 267.0) [2024-06-15 22:33:05,739][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 22:33:08,637][1652475] Updated weights for policy 0, policy_version 837360 (0.0013) [2024-06-15 22:33:10,738][1648984] Fps is (10 sec: 39322.0, 60 sec: 42598.4, 300 sec: 42876.1). Total num frames: 1714946048. Throughput: 0: 10365.2. Samples: 428798464. Policy #0 lag: (min: 11.0, avg: 124.8, max: 267.0) [2024-06-15 22:33:10,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 22:33:11,890][1652475] Updated weights for policy 0, policy_version 837428 (0.0056) [2024-06-15 22:33:15,381][1652475] Updated weights for policy 0, policy_version 837472 (0.0013) [2024-06-15 22:33:15,738][1648984] Fps is (10 sec: 42598.8, 60 sec: 40960.0, 300 sec: 42544.0). Total num frames: 1715175424. Throughput: 0: 10490.3. Samples: 428868096. Policy #0 lag: (min: 11.0, avg: 124.8, max: 267.0) [2024-06-15 22:33:15,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 22:33:18,650][1652475] Updated weights for policy 0, policy_version 837564 (0.0015) [2024-06-15 22:33:20,573][1652475] Updated weights for policy 0, policy_version 837631 (0.0082) [2024-06-15 22:33:20,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 44236.8, 300 sec: 43098.3). Total num frames: 1715470336. Throughput: 0: 10342.4. Samples: 428895744. 
Policy #0 lag: (min: 11.0, avg: 124.8, max: 267.0) [2024-06-15 22:33:20,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 22:33:25,739][1648984] Fps is (10 sec: 42597.8, 60 sec: 41506.1, 300 sec: 42876.1). Total num frames: 1715601408. Throughput: 0: 10501.7. Samples: 428959744. Policy #0 lag: (min: 11.0, avg: 124.8, max: 267.0) [2024-06-15 22:33:25,740][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 22:33:28,281][1652475] Updated weights for policy 0, policy_version 837712 (0.0025) [2024-06-15 22:33:30,172][1652475] Updated weights for policy 0, policy_version 837792 (0.0012) [2024-06-15 22:33:30,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 43690.7, 300 sec: 43098.3). Total num frames: 1715863552. Throughput: 0: 10444.8. Samples: 429026816. Policy #0 lag: (min: 11.0, avg: 124.8, max: 267.0) [2024-06-15 22:33:30,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 22:33:31,185][1652475] Updated weights for policy 0, policy_version 837840 (0.0012) [2024-06-15 22:33:32,422][1652475] Updated weights for policy 0, policy_version 837885 (0.0011) [2024-06-15 22:33:35,238][1651340] Signal inference workers to stop experience collection... (43100 times) [2024-06-15 22:33:35,304][1652475] InferenceWorker_p0-w0: stopping experience collection (43100 times) [2024-06-15 22:33:35,537][1651340] Signal inference workers to resume experience collection... (43100 times) [2024-06-15 22:33:35,538][1652475] InferenceWorker_p0-w0: resuming experience collection (43100 times) [2024-06-15 22:33:35,738][1648984] Fps is (10 sec: 49152.5, 60 sec: 43690.7, 300 sec: 42987.2). Total num frames: 1716092928. Throughput: 0: 10501.7. Samples: 429058560. Policy #0 lag: (min: 11.0, avg: 124.8, max: 267.0) [2024-06-15 22:33:35,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 22:33:35,893][1652475] Updated weights for policy 0, policy_version 837952 (0.0010) [2024-06-15 22:33:40,738][1648984] Fps is (10 sec: 36044.5, 60 sec: 40960.0, 300 sec: 42653.9). 
Total num frames: 1716224000. Throughput: 0: 10649.6. Samples: 429131264. Policy #0 lag: (min: 11.0, avg: 124.8, max: 267.0) [2024-06-15 22:33:40,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 22:33:42,199][1652475] Updated weights for policy 0, policy_version 838064 (0.0074) [2024-06-15 22:33:44,569][1652475] Updated weights for policy 0, policy_version 838137 (0.0010) [2024-06-15 22:33:45,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 43690.7, 300 sec: 42876.1). Total num frames: 1716518912. Throughput: 0: 10547.2. Samples: 429183488. Policy #0 lag: (min: 11.0, avg: 124.8, max: 267.0) [2024-06-15 22:33:45,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 22:33:50,740][1648984] Fps is (10 sec: 42587.7, 60 sec: 40958.3, 300 sec: 42653.6). Total num frames: 1716649984. Throughput: 0: 10671.8. Samples: 429218816. Policy #0 lag: (min: 11.0, avg: 124.8, max: 267.0) [2024-06-15 22:33:50,741][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 22:33:51,639][1652475] Updated weights for policy 0, policy_version 838210 (0.0046) [2024-06-15 22:33:54,075][1652475] Updated weights for policy 0, policy_version 838290 (0.0012) [2024-06-15 22:33:55,738][1648984] Fps is (10 sec: 42598.5, 60 sec: 44236.9, 300 sec: 42876.1). Total num frames: 1716944896. Throughput: 0: 10956.8. Samples: 429291520. Policy #0 lag: (min: 11.0, avg: 124.8, max: 267.0) [2024-06-15 22:33:55,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 22:33:56,155][1652475] Updated weights for policy 0, policy_version 838373 (0.0012) [2024-06-15 22:33:56,650][1652475] Updated weights for policy 0, policy_version 838400 (0.0011) [2024-06-15 22:34:00,738][1648984] Fps is (10 sec: 52442.0, 60 sec: 43690.7, 300 sec: 42653.9). Total num frames: 1717174272. Throughput: 0: 10797.5. Samples: 429353984. 
Policy #0 lag: (min: 15.0, avg: 128.8, max: 271.0) [2024-06-15 22:34:00,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 22:34:03,128][1652475] Updated weights for policy 0, policy_version 838465 (0.0012) [2024-06-15 22:34:05,738][1648984] Fps is (10 sec: 36043.8, 60 sec: 42598.3, 300 sec: 42653.9). Total num frames: 1717305344. Throughput: 0: 10990.9. Samples: 429390336. Policy #0 lag: (min: 15.0, avg: 128.8, max: 271.0) [2024-06-15 22:34:05,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 22:34:06,290][1652475] Updated weights for policy 0, policy_version 838544 (0.0068) [2024-06-15 22:34:08,312][1652475] Updated weights for policy 0, policy_version 838624 (0.0012) [2024-06-15 22:34:10,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 43690.6, 300 sec: 42653.9). Total num frames: 1717567488. Throughput: 0: 10786.1. Samples: 429445120. Policy #0 lag: (min: 15.0, avg: 128.8, max: 271.0) [2024-06-15 22:34:10,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 22:34:13,069][1652475] Updated weights for policy 0, policy_version 838696 (0.0012) [2024-06-15 22:34:15,737][1652475] Updated weights for policy 0, policy_version 838768 (0.0012) [2024-06-15 22:34:15,738][1648984] Fps is (10 sec: 49153.8, 60 sec: 43690.7, 300 sec: 42765.0). Total num frames: 1717796864. Throughput: 0: 10740.6. Samples: 429510144. Policy #0 lag: (min: 15.0, avg: 128.8, max: 271.0) [2024-06-15 22:34:15,740][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 22:34:19,488][1652475] Updated weights for policy 0, policy_version 838816 (0.0010) [2024-06-15 22:34:20,738][1648984] Fps is (10 sec: 42597.3, 60 sec: 42052.1, 300 sec: 42658.1). Total num frames: 1717993472. Throughput: 0: 10786.1. Samples: 429543936. 
Policy #0 lag: (min: 15.0, avg: 128.8, max: 271.0) [2024-06-15 22:34:20,739][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 22:34:21,120][1652475] Updated weights for policy 0, policy_version 838886 (0.0013) [2024-06-15 22:34:25,738][1648984] Fps is (10 sec: 29490.9, 60 sec: 41506.2, 300 sec: 42209.6). Total num frames: 1718091776. Throughput: 0: 10524.5. Samples: 429604864. Policy #0 lag: (min: 15.0, avg: 128.8, max: 271.0) [2024-06-15 22:34:25,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 22:34:26,628][1651340] Signal inference workers to stop experience collection... (43150 times) [2024-06-15 22:34:26,687][1652475] InferenceWorker_p0-w0: stopping experience collection (43150 times) [2024-06-15 22:34:26,689][1652475] Updated weights for policy 0, policy_version 838947 (0.0013) [2024-06-15 22:34:26,918][1651340] Signal inference workers to resume experience collection... (43150 times) [2024-06-15 22:34:26,919][1652475] InferenceWorker_p0-w0: resuming experience collection (43150 times) [2024-06-15 22:34:28,667][1652475] Updated weights for policy 0, policy_version 839034 (0.0011) [2024-06-15 22:34:30,738][1648984] Fps is (10 sec: 39323.0, 60 sec: 42052.3, 300 sec: 42653.9). Total num frames: 1718386688. Throughput: 0: 10683.7. Samples: 429664256. Policy #0 lag: (min: 15.0, avg: 128.8, max: 271.0) [2024-06-15 22:34:30,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 22:34:31,541][1652475] Updated weights for policy 0, policy_version 839092 (0.0011) [2024-06-15 22:34:35,657][1652475] Updated weights for policy 0, policy_version 839163 (0.0012) [2024-06-15 22:34:35,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 42052.3, 300 sec: 42653.9). Total num frames: 1718616064. Throughput: 0: 10593.3. Samples: 429695488. 
Policy #0 lag: (min: 15.0, avg: 128.8, max: 271.0) [2024-06-15 22:34:35,738][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 22:34:39,438][1652475] Updated weights for policy 0, policy_version 839226 (0.0121) [2024-06-15 22:34:40,738][1648984] Fps is (10 sec: 36044.8, 60 sec: 42052.3, 300 sec: 42209.6). Total num frames: 1718747136. Throughput: 0: 10296.9. Samples: 429754880. Policy #0 lag: (min: 15.0, avg: 128.8, max: 271.0) [2024-06-15 22:34:40,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 22:34:41,091][1652475] Updated weights for policy 0, policy_version 839264 (0.0014) [2024-06-15 22:34:42,926][1652475] Updated weights for policy 0, policy_version 839312 (0.0095) [2024-06-15 22:34:45,738][1648984] Fps is (10 sec: 39321.3, 60 sec: 41506.1, 300 sec: 42654.0). Total num frames: 1719009280. Throughput: 0: 10274.1. Samples: 429816320. Policy #0 lag: (min: 15.0, avg: 128.8, max: 271.0) [2024-06-15 22:34:45,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 22:34:48,113][1652475] Updated weights for policy 0, policy_version 839419 (0.0095) [2024-06-15 22:34:50,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 42054.1, 300 sec: 42542.9). Total num frames: 1719173120. Throughput: 0: 10160.4. Samples: 429847552. Policy #0 lag: (min: 15.0, avg: 128.8, max: 271.0) [2024-06-15 22:34:50,740][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 22:34:51,172][1652475] Updated weights for policy 0, policy_version 839460 (0.0013) [2024-06-15 22:34:53,589][1652475] Updated weights for policy 0, policy_version 839507 (0.0015) [2024-06-15 22:34:55,110][1652475] Updated weights for policy 0, policy_version 839568 (0.0014) [2024-06-15 22:34:55,738][1648984] Fps is (10 sec: 45873.3, 60 sec: 42051.9, 300 sec: 42431.7). Total num frames: 1719468032. Throughput: 0: 10456.1. Samples: 429915648. 
Policy #0 lag: (min: 15.0, avg: 128.8, max: 271.0) [2024-06-15 22:34:55,739][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 22:34:56,033][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000839616_1719533568.pth... [2024-06-15 22:34:56,081][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000834608_1709277184.pth [2024-06-15 22:34:59,616][1652475] Updated weights for policy 0, policy_version 839664 (0.0015) [2024-06-15 22:35:00,763][1648984] Fps is (10 sec: 49028.9, 60 sec: 41488.8, 300 sec: 42983.5). Total num frames: 1719664640. Throughput: 0: 10484.4. Samples: 429982208. Policy #0 lag: (min: 15.0, avg: 128.8, max: 271.0) [2024-06-15 22:35:00,764][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 22:35:03,106][1652475] Updated weights for policy 0, policy_version 839737 (0.0024) [2024-06-15 22:35:05,738][1648984] Fps is (10 sec: 39323.0, 60 sec: 42598.5, 300 sec: 42431.8). Total num frames: 1719861248. Throughput: 0: 10524.5. Samples: 430017536. Policy #0 lag: (min: 15.0, avg: 128.8, max: 271.0) [2024-06-15 22:35:05,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 22:35:06,193][1652475] Updated weights for policy 0, policy_version 839807 (0.0016) [2024-06-15 22:35:07,215][1652475] Updated weights for policy 0, policy_version 839861 (0.0011) [2024-06-15 22:35:10,738][1648984] Fps is (10 sec: 49276.1, 60 sec: 43144.6, 300 sec: 42987.2). Total num frames: 1720156160. Throughput: 0: 10717.9. Samples: 430087168. Policy #0 lag: (min: 15.0, avg: 128.8, max: 271.0) [2024-06-15 22:35:10,740][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 22:35:10,741][1652475] Updated weights for policy 0, policy_version 839929 (0.0012) [2024-06-15 22:35:14,452][1651340] Signal inference workers to stop experience collection... 
(43200 times) [2024-06-15 22:35:14,501][1652475] InferenceWorker_p0-w0: stopping experience collection (43200 times) [2024-06-15 22:35:14,616][1651340] Signal inference workers to resume experience collection... (43200 times) [2024-06-15 22:35:14,617][1652475] InferenceWorker_p0-w0: resuming experience collection (43200 times) [2024-06-15 22:35:15,168][1652475] Updated weights for policy 0, policy_version 839994 (0.0079) [2024-06-15 22:35:15,738][1648984] Fps is (10 sec: 45875.4, 60 sec: 42052.1, 300 sec: 42542.8). Total num frames: 1720320000. Throughput: 0: 11025.0. Samples: 430160384. Policy #0 lag: (min: 15.0, avg: 128.8, max: 271.0) [2024-06-15 22:35:15,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 22:35:18,129][1652475] Updated weights for policy 0, policy_version 840080 (0.0013) [2024-06-15 22:35:20,609][1652475] Updated weights for policy 0, policy_version 840145 (0.0021) [2024-06-15 22:35:20,738][1648984] Fps is (10 sec: 45875.0, 60 sec: 43690.9, 300 sec: 42765.0). Total num frames: 1720614912. Throughput: 0: 10956.8. Samples: 430188544. Policy #0 lag: (min: 15.0, avg: 128.8, max: 271.0) [2024-06-15 22:35:20,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 22:35:25,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 43690.6, 300 sec: 42876.3). Total num frames: 1720713216. Throughput: 0: 11229.8. Samples: 430260224. Policy #0 lag: (min: 15.0, avg: 128.8, max: 271.0) [2024-06-15 22:35:25,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 22:35:27,531][1652475] Updated weights for policy 0, policy_version 840208 (0.0158) [2024-06-15 22:35:28,746][1652475] Updated weights for policy 0, policy_version 840263 (0.0012) [2024-06-15 22:35:29,845][1652475] Updated weights for policy 0, policy_version 840318 (0.0013) [2024-06-15 22:35:30,738][1648984] Fps is (10 sec: 39320.2, 60 sec: 43690.4, 300 sec: 42765.0). Total num frames: 1721008128. Throughput: 0: 11229.8. Samples: 430321664. 
Policy #0 lag: (min: 15.0, avg: 128.8, max: 271.0) [2024-06-15 22:35:30,739][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 22:35:31,736][1652475] Updated weights for policy 0, policy_version 840376 (0.0012) [2024-06-15 22:35:33,470][1652475] Updated weights for policy 0, policy_version 840438 (0.0012) [2024-06-15 22:35:35,738][1648984] Fps is (10 sec: 52429.4, 60 sec: 43690.7, 300 sec: 43098.3). Total num frames: 1721237504. Throughput: 0: 11116.1. Samples: 430347776. Policy #0 lag: (min: 15.0, avg: 128.8, max: 271.0) [2024-06-15 22:35:35,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 22:35:40,287][1652475] Updated weights for policy 0, policy_version 840496 (0.0013) [2024-06-15 22:35:40,738][1648984] Fps is (10 sec: 36046.0, 60 sec: 43690.7, 300 sec: 42654.0). Total num frames: 1721368576. Throughput: 0: 11286.9. Samples: 430423552. Policy #0 lag: (min: 15.0, avg: 128.8, max: 271.0) [2024-06-15 22:35:40,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 22:35:42,955][1652475] Updated weights for policy 0, policy_version 840592 (0.0015) [2024-06-15 22:35:45,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 43690.8, 300 sec: 42987.2). Total num frames: 1721630720. Throughput: 0: 10849.1. Samples: 430470144. Policy #0 lag: (min: 15.0, avg: 128.8, max: 271.0) [2024-06-15 22:35:45,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 22:35:47,259][1652475] Updated weights for policy 0, policy_version 840659 (0.0012) [2024-06-15 22:35:50,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 43144.5, 300 sec: 42542.9). Total num frames: 1721761792. Throughput: 0: 10831.7. Samples: 430504960. 
Policy #0 lag: (min: 15.0, avg: 128.8, max: 271.0) [2024-06-15 22:35:50,740][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 22:35:52,567][1652475] Updated weights for policy 0, policy_version 840752 (0.0012) [2024-06-15 22:35:54,248][1652475] Updated weights for policy 0, policy_version 840817 (0.0011) [2024-06-15 22:35:55,307][1652475] Updated weights for policy 0, policy_version 840864 (0.0014) [2024-06-15 22:35:55,738][1648984] Fps is (10 sec: 49151.5, 60 sec: 44237.1, 300 sec: 43320.4). Total num frames: 1722122240. Throughput: 0: 10615.4. Samples: 430564864. Policy #0 lag: (min: 15.0, avg: 86.4, max: 271.0) [2024-06-15 22:35:55,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 22:36:00,053][1651340] Signal inference workers to stop experience collection... (43250 times) [2024-06-15 22:36:00,081][1652475] InferenceWorker_p0-w0: stopping experience collection (43250 times) [2024-06-15 22:36:00,220][1651340] Signal inference workers to resume experience collection... (43250 times) [2024-06-15 22:36:00,222][1652475] InferenceWorker_p0-w0: resuming experience collection (43250 times) [2024-06-15 22:36:00,698][1652475] Updated weights for policy 0, policy_version 840949 (0.0039) [2024-06-15 22:36:00,738][1648984] Fps is (10 sec: 49151.8, 60 sec: 43162.6, 300 sec: 42653.9). Total num frames: 1722253312. Throughput: 0: 10626.8. Samples: 430638592. Policy #0 lag: (min: 15.0, avg: 86.4, max: 271.0) [2024-06-15 22:36:00,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 22:36:04,439][1652475] Updated weights for policy 0, policy_version 841009 (0.0012) [2024-06-15 22:36:05,738][1648984] Fps is (10 sec: 36045.0, 60 sec: 43690.8, 300 sec: 42880.8). Total num frames: 1722482688. Throughput: 0: 10797.5. Samples: 430674432. 
Policy #0 lag: (min: 15.0, avg: 86.4, max: 271.0) [2024-06-15 22:36:05,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 22:36:06,463][1652475] Updated weights for policy 0, policy_version 841088 (0.0012) [2024-06-15 22:36:08,067][1652475] Updated weights for policy 0, policy_version 841142 (0.0012) [2024-06-15 22:36:10,738][1648984] Fps is (10 sec: 42598.5, 60 sec: 42052.2, 300 sec: 43098.2). Total num frames: 1722679296. Throughput: 0: 10387.9. Samples: 430727680. Policy #0 lag: (min: 15.0, avg: 86.4, max: 271.0) [2024-06-15 22:36:10,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 22:36:12,556][1652475] Updated weights for policy 0, policy_version 841174 (0.0011) [2024-06-15 22:36:15,544][1652475] Updated weights for policy 0, policy_version 841220 (0.0011) [2024-06-15 22:36:15,738][1648984] Fps is (10 sec: 32768.0, 60 sec: 41506.2, 300 sec: 42209.6). Total num frames: 1722810368. Throughput: 0: 10615.6. Samples: 430799360. Policy #0 lag: (min: 15.0, avg: 86.4, max: 271.0) [2024-06-15 22:36:15,740][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 22:36:16,864][1652475] Updated weights for policy 0, policy_version 841275 (0.0012) [2024-06-15 22:36:18,376][1652475] Updated weights for policy 0, policy_version 841328 (0.0023) [2024-06-15 22:36:19,980][1652475] Updated weights for policy 0, policy_version 841392 (0.0025) [2024-06-15 22:36:20,738][1648984] Fps is (10 sec: 52427.6, 60 sec: 43144.3, 300 sec: 43431.4). Total num frames: 1723203584. Throughput: 0: 10706.4. Samples: 430829568. Policy #0 lag: (min: 15.0, avg: 86.4, max: 271.0) [2024-06-15 22:36:20,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 22:36:25,246][1652475] Updated weights for policy 0, policy_version 841461 (0.0016) [2024-06-15 22:36:25,738][1648984] Fps is (10 sec: 52428.4, 60 sec: 43690.7, 300 sec: 42653.9). Total num frames: 1723334656. Throughput: 0: 10535.8. Samples: 430897664. 
Policy #0 lag: (min: 15.0, avg: 86.4, max: 271.0) [2024-06-15 22:36:25,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 22:36:27,636][1652475] Updated weights for policy 0, policy_version 841504 (0.0011) [2024-06-15 22:36:29,455][1652475] Updated weights for policy 0, policy_version 841568 (0.0012) [2024-06-15 22:36:30,738][1648984] Fps is (10 sec: 39322.1, 60 sec: 43144.7, 300 sec: 43098.2). Total num frames: 1723596800. Throughput: 0: 10865.7. Samples: 430959104. Policy #0 lag: (min: 15.0, avg: 86.4, max: 271.0) [2024-06-15 22:36:30,738][1648984] Avg episode reward: [(0, '-0.200')] [2024-06-15 22:36:31,022][1652475] Updated weights for policy 0, policy_version 841616 (0.0014) [2024-06-15 22:36:35,740][1648984] Fps is (10 sec: 39321.5, 60 sec: 41506.1, 300 sec: 42765.0). Total num frames: 1723727872. Throughput: 0: 10752.0. Samples: 430988800. Policy #0 lag: (min: 15.0, avg: 86.4, max: 271.0) [2024-06-15 22:36:35,741][1648984] Avg episode reward: [(0, '-0.220')] [2024-06-15 22:36:37,042][1652475] Updated weights for policy 0, policy_version 841696 (0.0119) [2024-06-15 22:36:40,201][1652475] Updated weights for policy 0, policy_version 841760 (0.0012) [2024-06-15 22:36:40,738][1648984] Fps is (10 sec: 32768.4, 60 sec: 42598.4, 300 sec: 42431.8). Total num frames: 1723924480. Throughput: 0: 10831.6. Samples: 431052288. Policy #0 lag: (min: 15.0, avg: 86.4, max: 271.0) [2024-06-15 22:36:40,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 22:36:43,847][1652475] Updated weights for policy 0, policy_version 841840 (0.0014) [2024-06-15 22:36:45,738][1648984] Fps is (10 sec: 42599.0, 60 sec: 42052.3, 300 sec: 42765.0). Total num frames: 1724153856. Throughput: 0: 10444.8. Samples: 431108608. 
Policy #0 lag: (min: 15.0, avg: 86.4, max: 271.0) [2024-06-15 22:36:45,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 22:36:46,341][1652475] Updated weights for policy 0, policy_version 841891 (0.0031) [2024-06-15 22:36:48,238][1651340] Signal inference workers to stop experience collection... (43300 times) [2024-06-15 22:36:48,304][1652475] InferenceWorker_p0-w0: stopping experience collection (43300 times) [2024-06-15 22:36:48,475][1651340] Signal inference workers to resume experience collection... (43300 times) [2024-06-15 22:36:48,482][1652475] InferenceWorker_p0-w0: resuming experience collection (43300 times) [2024-06-15 22:36:48,849][1652475] Updated weights for policy 0, policy_version 841952 (0.0010) [2024-06-15 22:36:50,738][1648984] Fps is (10 sec: 45875.0, 60 sec: 43690.6, 300 sec: 42653.9). Total num frames: 1724383232. Throughput: 0: 10387.9. Samples: 431141888. Policy #0 lag: (min: 15.0, avg: 86.4, max: 271.0) [2024-06-15 22:36:50,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 22:36:51,399][1652475] Updated weights for policy 0, policy_version 841989 (0.0012) [2024-06-15 22:36:52,831][1652475] Updated weights for policy 0, policy_version 842048 (0.0011) [2024-06-15 22:36:55,738][1648984] Fps is (10 sec: 36043.1, 60 sec: 39867.5, 300 sec: 42653.9). Total num frames: 1724514304. Throughput: 0: 10490.2. Samples: 431199744. Policy #0 lag: (min: 15.0, avg: 86.4, max: 271.0) [2024-06-15 22:36:55,739][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 22:36:55,752][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000842048_1724514304.pth... 
[2024-06-15 22:36:55,801][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000837088_1714356224.pth [2024-06-15 22:36:58,509][1652475] Updated weights for policy 0, policy_version 842102 (0.0101) [2024-06-15 22:37:00,444][1652475] Updated weights for policy 0, policy_version 842176 (0.0014) [2024-06-15 22:37:00,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 42052.3, 300 sec: 42655.8). Total num frames: 1724776448. Throughput: 0: 10296.9. Samples: 431262720. Policy #0 lag: (min: 15.0, avg: 86.4, max: 271.0) [2024-06-15 22:37:00,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 22:37:01,950][1652475] Updated weights for policy 0, policy_version 842240 (0.0016) [2024-06-15 22:37:05,738][1648984] Fps is (10 sec: 52430.6, 60 sec: 42598.3, 300 sec: 42876.1). Total num frames: 1725038592. Throughput: 0: 10262.8. Samples: 431291392. Policy #0 lag: (min: 15.0, avg: 86.4, max: 271.0) [2024-06-15 22:37:05,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 22:37:10,633][1652475] Updated weights for policy 0, policy_version 842307 (0.0013) [2024-06-15 22:37:10,738][1648984] Fps is (10 sec: 26214.3, 60 sec: 39321.6, 300 sec: 41765.3). Total num frames: 1725038592. Throughput: 0: 10228.6. Samples: 431357952. Policy #0 lag: (min: 15.0, avg: 86.4, max: 271.0) [2024-06-15 22:37:10,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 22:37:12,219][1652475] Updated weights for policy 0, policy_version 842373 (0.0013) [2024-06-15 22:37:14,170][1652475] Updated weights for policy 0, policy_version 842449 (0.0011) [2024-06-15 22:37:15,132][1652475] Updated weights for policy 0, policy_version 842495 (0.0012) [2024-06-15 22:37:15,738][1648984] Fps is (10 sec: 39320.5, 60 sec: 43690.4, 300 sec: 42765.0). Total num frames: 1725431808. Throughput: 0: 10148.9. Samples: 431415808. 
Policy #0 lag: (min: 15.0, avg: 86.4, max: 271.0) [2024-06-15 22:37:15,739][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 22:37:17,218][1652475] Updated weights for policy 0, policy_version 842554 (0.0013) [2024-06-15 22:37:20,738][1648984] Fps is (10 sec: 52428.0, 60 sec: 39321.6, 300 sec: 42209.6). Total num frames: 1725562880. Throughput: 0: 10114.8. Samples: 431443968. Policy #0 lag: (min: 15.0, avg: 86.4, max: 271.0) [2024-06-15 22:37:20,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 22:37:23,186][1652475] Updated weights for policy 0, policy_version 842596 (0.0011) [2024-06-15 22:37:25,525][1652475] Updated weights for policy 0, policy_version 842688 (0.0087) [2024-06-15 22:37:25,738][1648984] Fps is (10 sec: 39322.7, 60 sec: 41506.1, 300 sec: 42653.9). Total num frames: 1725825024. Throughput: 0: 10274.1. Samples: 431514624. Policy #0 lag: (min: 15.0, avg: 86.4, max: 271.0) [2024-06-15 22:37:25,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 22:37:26,888][1652475] Updated weights for policy 0, policy_version 842748 (0.0012) [2024-06-15 22:37:30,745][1648984] Fps is (10 sec: 42566.8, 60 sec: 39862.8, 300 sec: 42430.7). Total num frames: 1725988864. Throughput: 0: 10386.1. Samples: 431576064. Policy #0 lag: (min: 15.0, avg: 86.4, max: 271.0) [2024-06-15 22:37:30,746][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 22:37:31,477][1652475] Updated weights for policy 0, policy_version 842812 (0.0021) [2024-06-15 22:37:35,064][1652475] Updated weights for policy 0, policy_version 842850 (0.0014) [2024-06-15 22:37:35,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 41506.2, 300 sec: 42209.6). Total num frames: 1726218240. Throughput: 0: 10331.0. Samples: 431606784. Policy #0 lag: (min: 15.0, avg: 86.4, max: 271.0) [2024-06-15 22:37:35,738][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 22:37:36,211][1651340] Signal inference workers to stop experience collection... 
(43350 times) [2024-06-15 22:37:36,271][1652475] Updated weights for policy 0, policy_version 842885 (0.0011) [2024-06-15 22:37:36,327][1652475] InferenceWorker_p0-w0: stopping experience collection (43350 times) [2024-06-15 22:37:36,452][1651340] Signal inference workers to resume experience collection... (43350 times) [2024-06-15 22:37:36,454][1652475] InferenceWorker_p0-w0: resuming experience collection (43350 times) [2024-06-15 22:37:37,790][1652475] Updated weights for policy 0, policy_version 842946 (0.0012) [2024-06-15 22:37:40,738][1648984] Fps is (10 sec: 49189.3, 60 sec: 42598.4, 300 sec: 42653.9). Total num frames: 1726480384. Throughput: 0: 10376.6. Samples: 431666688. Policy #0 lag: (min: 15.0, avg: 86.4, max: 271.0) [2024-06-15 22:37:40,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 22:37:43,959][1652475] Updated weights for policy 0, policy_version 843024 (0.0013) [2024-06-15 22:37:44,795][1652475] Updated weights for policy 0, policy_version 843067 (0.0013) [2024-06-15 22:37:45,738][1648984] Fps is (10 sec: 42598.6, 60 sec: 41506.1, 300 sec: 42209.6). Total num frames: 1726644224. Throughput: 0: 10615.5. Samples: 431740416. Policy #0 lag: (min: 3.0, avg: 102.0, max: 259.0) [2024-06-15 22:37:45,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 22:37:46,358][1652475] Updated weights for policy 0, policy_version 843131 (0.0012) [2024-06-15 22:37:50,495][1652475] Updated weights for policy 0, policy_version 843232 (0.0086) [2024-06-15 22:37:50,738][1648984] Fps is (10 sec: 49151.8, 60 sec: 43144.5, 300 sec: 42987.2). Total num frames: 1726971904. Throughput: 0: 10649.6. Samples: 431770624. Policy #0 lag: (min: 3.0, avg: 102.0, max: 259.0) [2024-06-15 22:37:50,739][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 22:37:55,738][1648984] Fps is (10 sec: 36044.9, 60 sec: 41506.4, 300 sec: 42209.6). Total num frames: 1727004672. Throughput: 0: 10661.0. Samples: 431837696. 
Policy #0 lag: (min: 3.0, avg: 102.0, max: 259.0) [2024-06-15 22:37:55,740][1648984] Avg episode reward: [(0, '-0.250')] [2024-06-15 22:37:56,641][1652475] Updated weights for policy 0, policy_version 843280 (0.0130) [2024-06-15 22:37:58,107][1652475] Updated weights for policy 0, policy_version 843344 (0.0011) [2024-06-15 22:38:00,227][1652475] Updated weights for policy 0, policy_version 843408 (0.0014) [2024-06-15 22:38:00,738][1648984] Fps is (10 sec: 36045.0, 60 sec: 42598.4, 300 sec: 42653.9). Total num frames: 1727332352. Throughput: 0: 10706.6. Samples: 431897600. Policy #0 lag: (min: 3.0, avg: 102.0, max: 259.0) [2024-06-15 22:38:00,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 22:38:03,840][1652475] Updated weights for policy 0, policy_version 843490 (0.0012) [2024-06-15 22:38:04,476][1652475] Updated weights for policy 0, policy_version 843520 (0.0011) [2024-06-15 22:38:05,738][1648984] Fps is (10 sec: 52428.5, 60 sec: 41506.2, 300 sec: 42653.9). Total num frames: 1727528960. Throughput: 0: 10797.6. Samples: 431929856. Policy #0 lag: (min: 3.0, avg: 102.0, max: 259.0) [2024-06-15 22:38:05,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 22:38:10,545][1652475] Updated weights for policy 0, policy_version 843621 (0.0012) [2024-06-15 22:38:10,738][1648984] Fps is (10 sec: 42598.7, 60 sec: 45329.1, 300 sec: 42653.9). Total num frames: 1727758336. Throughput: 0: 10695.1. Samples: 431995904. Policy #0 lag: (min: 3.0, avg: 102.0, max: 259.0) [2024-06-15 22:38:10,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 22:38:12,581][1652475] Updated weights for policy 0, policy_version 843696 (0.0013) [2024-06-15 22:38:15,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 41506.4, 300 sec: 42209.6). Total num frames: 1727922176. Throughput: 0: 10856.2. Samples: 432064512. 
Policy #0 lag: (min: 3.0, avg: 102.0, max: 259.0) [2024-06-15 22:38:15,740][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 22:38:16,305][1652475] Updated weights for policy 0, policy_version 843744 (0.0010) [2024-06-15 22:38:20,018][1652475] Updated weights for policy 0, policy_version 843792 (0.0014) [2024-06-15 22:38:20,738][1648984] Fps is (10 sec: 39320.6, 60 sec: 43144.5, 300 sec: 42542.9). Total num frames: 1728151552. Throughput: 0: 10911.3. Samples: 432097792. Policy #0 lag: (min: 3.0, avg: 102.0, max: 259.0) [2024-06-15 22:38:20,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 22:38:21,239][1651340] Signal inference workers to stop experience collection... (43400 times) [2024-06-15 22:38:21,279][1652475] InferenceWorker_p0-w0: stopping experience collection (43400 times) [2024-06-15 22:38:21,510][1651340] Signal inference workers to resume experience collection... (43400 times) [2024-06-15 22:38:21,511][1652475] InferenceWorker_p0-w0: resuming experience collection (43400 times) [2024-06-15 22:38:21,513][1652475] Updated weights for policy 0, policy_version 843856 (0.0012) [2024-06-15 22:38:22,534][1652475] Updated weights for policy 0, policy_version 843904 (0.0012) [2024-06-15 22:38:24,126][1652475] Updated weights for policy 0, policy_version 843963 (0.0012) [2024-06-15 22:38:25,738][1648984] Fps is (10 sec: 52427.8, 60 sec: 43690.6, 300 sec: 42653.9). Total num frames: 1728446464. Throughput: 0: 11013.7. Samples: 432162304. Policy #0 lag: (min: 3.0, avg: 102.0, max: 259.0) [2024-06-15 22:38:25,739][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 22:38:27,758][1652475] Updated weights for policy 0, policy_version 844004 (0.0011) [2024-06-15 22:38:30,738][1648984] Fps is (10 sec: 42599.3, 60 sec: 43150.0, 300 sec: 42320.7). Total num frames: 1728577536. Throughput: 0: 11104.7. Samples: 432240128. 
Policy #0 lag: (min: 3.0, avg: 102.0, max: 259.0) [2024-06-15 22:38:30,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 22:38:31,625][1652475] Updated weights for policy 0, policy_version 844070 (0.0011) [2024-06-15 22:38:34,693][1652475] Updated weights for policy 0, policy_version 844179 (0.0103) [2024-06-15 22:38:35,738][1648984] Fps is (10 sec: 52429.8, 60 sec: 45875.2, 300 sec: 43209.3). Total num frames: 1728970752. Throughput: 0: 10979.6. Samples: 432264704. Policy #0 lag: (min: 3.0, avg: 102.0, max: 259.0) [2024-06-15 22:38:35,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 22:38:40,183][1652475] Updated weights for policy 0, policy_version 844258 (0.0013) [2024-06-15 22:38:40,738][1648984] Fps is (10 sec: 49151.8, 60 sec: 43144.5, 300 sec: 42542.9). Total num frames: 1729069056. Throughput: 0: 11025.1. Samples: 432333824. Policy #0 lag: (min: 3.0, avg: 102.0, max: 259.0) [2024-06-15 22:38:40,740][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 22:38:43,319][1652475] Updated weights for policy 0, policy_version 844311 (0.0013) [2024-06-15 22:38:44,979][1652475] Updated weights for policy 0, policy_version 844400 (0.0015) [2024-06-15 22:38:45,738][1648984] Fps is (10 sec: 39321.2, 60 sec: 45329.0, 300 sec: 43098.6). Total num frames: 1729363968. Throughput: 0: 11104.7. Samples: 432397312. Policy #0 lag: (min: 3.0, avg: 102.0, max: 259.0) [2024-06-15 22:38:45,738][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 22:38:46,447][1652475] Updated weights for policy 0, policy_version 844448 (0.0023) [2024-06-15 22:38:50,738][1648984] Fps is (10 sec: 42598.2, 60 sec: 42052.3, 300 sec: 42542.9). Total num frames: 1729495040. Throughput: 0: 11081.9. Samples: 432428544. 
Policy #0 lag: (min: 3.0, avg: 102.0, max: 259.0) [2024-06-15 22:38:50,750][1648984] Avg episode reward: [(0, '-0.150')] [2024-06-15 22:38:52,327][1652475] Updated weights for policy 0, policy_version 844528 (0.0012) [2024-06-15 22:38:55,742][1648984] Fps is (10 sec: 36028.8, 60 sec: 45325.6, 300 sec: 42542.2). Total num frames: 1729724416. Throughput: 0: 11137.7. Samples: 432497152. Policy #0 lag: (min: 3.0, avg: 102.0, max: 259.0) [2024-06-15 22:38:55,743][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 22:38:55,777][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000844608_1729757184.pth... [2024-06-15 22:38:55,785][1652475] Updated weights for policy 0, policy_version 844608 (0.0013) [2024-06-15 22:38:55,807][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000839616_1719533568.pth [2024-06-15 22:38:58,498][1652475] Updated weights for policy 0, policy_version 844662 (0.0093) [2024-06-15 22:39:00,027][1652475] Updated weights for policy 0, policy_version 844730 (0.0010) [2024-06-15 22:39:00,739][1648984] Fps is (10 sec: 52423.0, 60 sec: 44782.1, 300 sec: 43098.1). Total num frames: 1730019328. Throughput: 0: 10888.2. Samples: 432554496. Policy #0 lag: (min: 3.0, avg: 102.0, max: 259.0) [2024-06-15 22:39:00,739][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 22:39:03,651][1652475] Updated weights for policy 0, policy_version 844769 (0.0014) [2024-06-15 22:39:05,738][1648984] Fps is (10 sec: 42617.4, 60 sec: 43690.6, 300 sec: 42653.9). Total num frames: 1730150400. Throughput: 0: 10888.6. Samples: 432587776. Policy #0 lag: (min: 3.0, avg: 102.0, max: 259.0) [2024-06-15 22:39:05,738][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 22:39:06,050][1652475] Updated weights for policy 0, policy_version 844806 (0.0051) [2024-06-15 22:39:06,664][1651340] Signal inference workers to stop experience collection... 
(43450 times) [2024-06-15 22:39:06,708][1652475] InferenceWorker_p0-w0: stopping experience collection (43450 times) [2024-06-15 22:39:06,938][1651340] Signal inference workers to resume experience collection... (43450 times) [2024-06-15 22:39:06,950][1652475] InferenceWorker_p0-w0: resuming experience collection (43450 times) [2024-06-15 22:39:10,738][1648984] Fps is (10 sec: 26217.4, 60 sec: 42052.2, 300 sec: 42320.7). Total num frames: 1730281472. Throughput: 0: 10934.1. Samples: 432654336. Policy #0 lag: (min: 3.0, avg: 102.0, max: 259.0) [2024-06-15 22:39:10,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 22:39:10,912][1652475] Updated weights for policy 0, policy_version 844881 (0.0013) [2024-06-15 22:39:11,987][1652475] Updated weights for policy 0, policy_version 844928 (0.0010) [2024-06-15 22:39:13,335][1652475] Updated weights for policy 0, policy_version 844992 (0.0103) [2024-06-15 22:39:15,738][1648984] Fps is (10 sec: 52429.3, 60 sec: 45875.2, 300 sec: 42987.2). Total num frames: 1730674688. Throughput: 0: 10524.4. Samples: 432713728. Policy #0 lag: (min: 3.0, avg: 102.0, max: 259.0) [2024-06-15 22:39:15,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 22:39:18,490][1652475] Updated weights for policy 0, policy_version 845074 (0.0012) [2024-06-15 22:39:20,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 44236.9, 300 sec: 43098.2). Total num frames: 1730805760. Throughput: 0: 10717.9. Samples: 432747008. Policy #0 lag: (min: 3.0, avg: 102.0, max: 259.0) [2024-06-15 22:39:20,738][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 22:39:24,501][1652475] Updated weights for policy 0, policy_version 845152 (0.0012) [2024-06-15 22:39:25,738][1648984] Fps is (10 sec: 26214.4, 60 sec: 41506.3, 300 sec: 42542.9). Total num frames: 1730936832. Throughput: 0: 10843.0. Samples: 432821760. 
Policy #0 lag: (min: 3.0, avg: 102.0, max: 259.0) [2024-06-15 22:39:25,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 22:39:26,175][1652475] Updated weights for policy 0, policy_version 845216 (0.0112) [2024-06-15 22:39:30,314][1652475] Updated weights for policy 0, policy_version 845313 (0.0013) [2024-06-15 22:39:30,738][1648984] Fps is (10 sec: 42598.5, 60 sec: 44236.8, 300 sec: 42765.0). Total num frames: 1731231744. Throughput: 0: 10638.2. Samples: 432876032. Policy #0 lag: (min: 3.0, avg: 102.0, max: 259.0) [2024-06-15 22:39:30,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 22:39:31,780][1652475] Updated weights for policy 0, policy_version 845372 (0.0017) [2024-06-15 22:39:35,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 39321.6, 300 sec: 42653.9). Total num frames: 1731330048. Throughput: 0: 10706.5. Samples: 432910336. Policy #0 lag: (min: 3.0, avg: 102.0, max: 259.0) [2024-06-15 22:39:35,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 22:39:37,176][1652475] Updated weights for policy 0, policy_version 845440 (0.0015) [2024-06-15 22:39:39,359][1652475] Updated weights for policy 0, policy_version 845520 (0.0028) [2024-06-15 22:39:40,738][1648984] Fps is (10 sec: 49151.8, 60 sec: 44236.8, 300 sec: 43098.3). Total num frames: 1731723264. Throughput: 0: 10548.3. Samples: 432971776. Policy #0 lag: (min: 12.0, avg: 71.4, max: 268.0) [2024-06-15 22:39:40,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 22:39:43,186][1652475] Updated weights for policy 0, policy_version 845584 (0.0031) [2024-06-15 22:39:44,205][1652475] Updated weights for policy 0, policy_version 845632 (0.0011) [2024-06-15 22:39:45,738][1648984] Fps is (10 sec: 52429.2, 60 sec: 41506.2, 300 sec: 42987.2). Total num frames: 1731854336. Throughput: 0: 10831.9. Samples: 433041920. 
Policy #0 lag: (min: 12.0, avg: 71.4, max: 268.0) [2024-06-15 22:39:45,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 22:39:50,502][1652475] Updated weights for policy 0, policy_version 845728 (0.0013) [2024-06-15 22:39:50,738][1648984] Fps is (10 sec: 32767.8, 60 sec: 42598.4, 300 sec: 42654.0). Total num frames: 1732050944. Throughput: 0: 10888.5. Samples: 433077760. Policy #0 lag: (min: 12.0, avg: 71.4, max: 268.0) [2024-06-15 22:39:50,758][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 22:39:51,496][1651340] Signal inference workers to stop experience collection... (43500 times) [2024-06-15 22:39:51,533][1652475] InferenceWorker_p0-w0: stopping experience collection (43500 times) [2024-06-15 22:39:51,823][1651340] Signal inference workers to resume experience collection... (43500 times) [2024-06-15 22:39:51,823][1652475] InferenceWorker_p0-w0: resuming experience collection (43500 times) [2024-06-15 22:39:52,763][1652475] Updated weights for policy 0, policy_version 845822 (0.0011) [2024-06-15 22:39:55,738][1648984] Fps is (10 sec: 39321.0, 60 sec: 42055.4, 300 sec: 42657.6). Total num frames: 1732247552. Throughput: 0: 10478.9. Samples: 433125888. Policy #0 lag: (min: 12.0, avg: 71.4, max: 268.0) [2024-06-15 22:39:55,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 22:40:00,359][1652475] Updated weights for policy 0, policy_version 845889 (0.0016) [2024-06-15 22:40:00,738][1648984] Fps is (10 sec: 36043.9, 60 sec: 39868.3, 300 sec: 42542.8). Total num frames: 1732411392. Throughput: 0: 10774.7. Samples: 433198592. Policy #0 lag: (min: 12.0, avg: 71.4, max: 268.0) [2024-06-15 22:40:00,739][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 22:40:02,715][1652475] Updated weights for policy 0, policy_version 845984 (0.0120) [2024-06-15 22:40:04,178][1652475] Updated weights for policy 0, policy_version 846048 (0.0013) [2024-06-15 22:40:05,738][1648984] Fps is (10 sec: 52429.2, 60 sec: 43690.7, 300 sec: 42765.0). 
Total num frames: 1732771840. Throughput: 0: 10478.9. Samples: 433218560. Policy #0 lag: (min: 12.0, avg: 71.4, max: 268.0) [2024-06-15 22:40:05,739][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 22:40:10,738][1648984] Fps is (10 sec: 36045.8, 60 sec: 41506.1, 300 sec: 42209.6). Total num frames: 1732771840. Throughput: 0: 10410.7. Samples: 433290240. Policy #0 lag: (min: 12.0, avg: 71.4, max: 268.0) [2024-06-15 22:40:10,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 22:40:11,886][1652475] Updated weights for policy 0, policy_version 846118 (0.0029) [2024-06-15 22:40:14,464][1652475] Updated weights for policy 0, policy_version 846212 (0.0013) [2024-06-15 22:40:15,738][1648984] Fps is (10 sec: 36044.9, 60 sec: 40960.0, 300 sec: 42431.8). Total num frames: 1733132288. Throughput: 0: 10319.6. Samples: 433340416. Policy #0 lag: (min: 12.0, avg: 71.4, max: 268.0) [2024-06-15 22:40:15,741][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 22:40:16,067][1652475] Updated weights for policy 0, policy_version 846278 (0.0012) [2024-06-15 22:40:20,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 41506.1, 300 sec: 42653.9). Total num frames: 1733296128. Throughput: 0: 10205.9. Samples: 433369600. Policy #0 lag: (min: 12.0, avg: 71.4, max: 268.0) [2024-06-15 22:40:20,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 22:40:23,951][1652475] Updated weights for policy 0, policy_version 846337 (0.0026) [2024-06-15 22:40:25,738][1648984] Fps is (10 sec: 32768.0, 60 sec: 42052.3, 300 sec: 42209.7). Total num frames: 1733459968. Throughput: 0: 10535.8. Samples: 433445888. 
Policy #0 lag: (min: 12.0, avg: 71.4, max: 268.0) [2024-06-15 22:40:25,738][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 22:40:26,019][1652475] Updated weights for policy 0, policy_version 846432 (0.0259) [2024-06-15 22:40:27,696][1652475] Updated weights for policy 0, policy_version 846512 (0.0012) [2024-06-15 22:40:29,875][1652475] Updated weights for policy 0, policy_version 846531 (0.0015) [2024-06-15 22:40:30,738][1648984] Fps is (10 sec: 45875.5, 60 sec: 42052.3, 300 sec: 42431.8). Total num frames: 1733754880. Throughput: 0: 10240.0. Samples: 433502720. Policy #0 lag: (min: 12.0, avg: 71.4, max: 268.0) [2024-06-15 22:40:30,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 22:40:35,738][1648984] Fps is (10 sec: 36044.9, 60 sec: 41506.2, 300 sec: 42209.6). Total num frames: 1733820416. Throughput: 0: 10194.5. Samples: 433536512. Policy #0 lag: (min: 12.0, avg: 71.4, max: 268.0) [2024-06-15 22:40:35,738][1648984] Avg episode reward: [(0, '-0.220')] [2024-06-15 22:40:35,994][1652475] Updated weights for policy 0, policy_version 846608 (0.0012) [2024-06-15 22:40:37,536][1651340] Signal inference workers to stop experience collection... (43550 times) [2024-06-15 22:40:37,640][1652475] InferenceWorker_p0-w0: stopping experience collection (43550 times) [2024-06-15 22:40:37,780][1651340] Signal inference workers to resume experience collection... (43550 times) [2024-06-15 22:40:37,781][1652475] InferenceWorker_p0-w0: resuming experience collection (43550 times) [2024-06-15 22:40:38,170][1652475] Updated weights for policy 0, policy_version 846704 (0.0201) [2024-06-15 22:40:40,105][1652475] Updated weights for policy 0, policy_version 846784 (0.0026) [2024-06-15 22:40:40,738][1648984] Fps is (10 sec: 45874.8, 60 sec: 41506.1, 300 sec: 42653.9). Total num frames: 1734213632. Throughput: 0: 10399.3. Samples: 433593856. 
Policy #0 lag: (min: 12.0, avg: 71.4, max: 268.0) [2024-06-15 22:40:40,738][1648984] Avg episode reward: [(0, '-0.220')] [2024-06-15 22:40:44,184][1652475] Updated weights for policy 0, policy_version 846846 (0.0012) [2024-06-15 22:40:45,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 41506.1, 300 sec: 42653.9). Total num frames: 1734344704. Throughput: 0: 10262.8. Samples: 433660416. Policy #0 lag: (min: 12.0, avg: 71.4, max: 268.0) [2024-06-15 22:40:45,738][1648984] Avg episode reward: [(0, '-0.220')] [2024-06-15 22:40:49,407][1652475] Updated weights for policy 0, policy_version 846902 (0.0036) [2024-06-15 22:40:50,738][1648984] Fps is (10 sec: 36043.9, 60 sec: 42052.1, 300 sec: 42209.6). Total num frames: 1734574080. Throughput: 0: 10729.2. Samples: 433701376. Policy #0 lag: (min: 12.0, avg: 71.4, max: 268.0) [2024-06-15 22:40:50,739][1648984] Avg episode reward: [(0, '-0.220')] [2024-06-15 22:40:50,777][1652475] Updated weights for policy 0, policy_version 846965 (0.0010) [2024-06-15 22:40:52,725][1652475] Updated weights for policy 0, policy_version 847039 (0.0018) [2024-06-15 22:40:55,738][1648984] Fps is (10 sec: 45874.8, 60 sec: 42598.4, 300 sec: 42542.9). Total num frames: 1734803456. Throughput: 0: 10376.5. Samples: 433757184. Policy #0 lag: (min: 12.0, avg: 71.4, max: 268.0) [2024-06-15 22:40:55,738][1648984] Avg episode reward: [(0, '-0.220')] [2024-06-15 22:40:55,935][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000847088_1734836224.pth... [2024-06-15 22:40:55,986][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000842048_1724514304.pth [2024-06-15 22:40:56,270][1652475] Updated weights for policy 0, policy_version 847104 (0.0017) [2024-06-15 22:41:00,738][1648984] Fps is (10 sec: 39322.7, 60 sec: 42598.6, 300 sec: 42320.7). Total num frames: 1734967296. Throughput: 0: 10945.4. Samples: 433832960. 
Policy #0 lag: (min: 12.0, avg: 71.4, max: 268.0) [2024-06-15 22:41:00,738][1648984] Avg episode reward: [(0, '-0.220')] [2024-06-15 22:41:01,201][1652475] Updated weights for policy 0, policy_version 847173 (0.0014) [2024-06-15 22:41:02,376][1652475] Updated weights for policy 0, policy_version 847232 (0.0011) [2024-06-15 22:41:03,919][1652475] Updated weights for policy 0, policy_version 847289 (0.0018) [2024-06-15 22:41:05,738][1648984] Fps is (10 sec: 45875.6, 60 sec: 41506.2, 300 sec: 42653.9). Total num frames: 1735262208. Throughput: 0: 10877.2. Samples: 433859072. Policy #0 lag: (min: 12.0, avg: 71.4, max: 268.0) [2024-06-15 22:41:05,738][1648984] Avg episode reward: [(0, '-0.180')] [2024-06-15 22:41:07,727][1652475] Updated weights for policy 0, policy_version 847355 (0.0013) [2024-06-15 22:41:10,738][1648984] Fps is (10 sec: 42598.5, 60 sec: 43690.7, 300 sec: 42653.9). Total num frames: 1735393280. Throughput: 0: 10729.2. Samples: 433928704. Policy #0 lag: (min: 12.0, avg: 71.4, max: 268.0) [2024-06-15 22:41:10,738][1648984] Avg episode reward: [(0, '-0.210')] [2024-06-15 22:41:12,620][1652475] Updated weights for policy 0, policy_version 847415 (0.0011) [2024-06-15 22:41:14,148][1652475] Updated weights for policy 0, policy_version 847479 (0.0032) [2024-06-15 22:41:15,366][1652475] Updated weights for policy 0, policy_version 847522 (0.0013) [2024-06-15 22:41:15,742][1648984] Fps is (10 sec: 49129.3, 60 sec: 43687.3, 300 sec: 42542.2). Total num frames: 1735753728. Throughput: 0: 10750.9. Samples: 433986560. Policy #0 lag: (min: 12.0, avg: 71.4, max: 268.0) [2024-06-15 22:41:15,743][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 22:41:18,868][1652475] Updated weights for policy 0, policy_version 847573 (0.0014) [2024-06-15 22:41:20,738][1648984] Fps is (10 sec: 52428.2, 60 sec: 43690.6, 300 sec: 42653.9). Total num frames: 1735917568. Throughput: 0: 10797.5. Samples: 434022400. 
Policy #0 lag: (min: 12.0, avg: 71.4, max: 268.0) [2024-06-15 22:41:20,739][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 22:41:22,137][1651340] Signal inference workers to stop experience collection... (43600 times) [2024-06-15 22:41:22,167][1652475] InferenceWorker_p0-w0: stopping experience collection (43600 times) [2024-06-15 22:41:22,398][1651340] Signal inference workers to resume experience collection... (43600 times) [2024-06-15 22:41:22,398][1652475] InferenceWorker_p0-w0: resuming experience collection (43600 times) [2024-06-15 22:41:22,520][1652475] Updated weights for policy 0, policy_version 847632 (0.0133) [2024-06-15 22:41:23,391][1652475] Updated weights for policy 0, policy_version 847680 (0.0012) [2024-06-15 22:41:25,738][1648984] Fps is (10 sec: 29504.6, 60 sec: 43144.5, 300 sec: 42209.6). Total num frames: 1736048640. Throughput: 0: 11070.6. Samples: 434092032. Policy #0 lag: (min: 12.0, avg: 71.4, max: 268.0) [2024-06-15 22:41:25,738][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 22:41:27,864][1652475] Updated weights for policy 0, policy_version 847744 (0.0097) [2024-06-15 22:41:29,903][1652475] Updated weights for policy 0, policy_version 847812 (0.0011) [2024-06-15 22:41:30,738][1648984] Fps is (10 sec: 45875.8, 60 sec: 43690.7, 300 sec: 42876.1). Total num frames: 1736376320. Throughput: 0: 10854.4. Samples: 434148864. Policy #0 lag: (min: 13.0, avg: 100.3, max: 275.0) [2024-06-15 22:41:30,738][1648984] Avg episode reward: [(0, '-0.170')] [2024-06-15 22:41:34,030][1652475] Updated weights for policy 0, policy_version 847877 (0.0011) [2024-06-15 22:41:35,491][1652475] Updated weights for policy 0, policy_version 847933 (0.0018) [2024-06-15 22:41:35,738][1648984] Fps is (10 sec: 52429.2, 60 sec: 45875.2, 300 sec: 42876.1). Total num frames: 1736572928. Throughput: 0: 10695.2. Samples: 434182656. 
Policy #0 lag: (min: 13.0, avg: 100.3, max: 275.0) [2024-06-15 22:41:35,740][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 22:41:40,297][1652475] Updated weights for policy 0, policy_version 847974 (0.0024) [2024-06-15 22:41:40,738][1648984] Fps is (10 sec: 29491.1, 60 sec: 40960.1, 300 sec: 42431.8). Total num frames: 1736671232. Throughput: 0: 10956.8. Samples: 434250240. Policy #0 lag: (min: 13.0, avg: 100.3, max: 275.0) [2024-06-15 22:41:40,738][1648984] Avg episode reward: [(0, '-0.220')] [2024-06-15 22:41:41,588][1652475] Updated weights for policy 0, policy_version 848027 (0.0151) [2024-06-15 22:41:43,133][1652475] Updated weights for policy 0, policy_version 848097 (0.0013) [2024-06-15 22:41:45,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 44236.8, 300 sec: 42765.0). Total num frames: 1736998912. Throughput: 0: 10592.7. Samples: 434309632. Policy #0 lag: (min: 13.0, avg: 100.3, max: 275.0) [2024-06-15 22:41:45,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 22:41:45,756][1652475] Updated weights for policy 0, policy_version 848145 (0.0017) [2024-06-15 22:41:50,738][1648984] Fps is (10 sec: 42598.2, 60 sec: 42052.5, 300 sec: 42654.0). Total num frames: 1737097216. Throughput: 0: 10740.6. Samples: 434342400. Policy #0 lag: (min: 13.0, avg: 100.3, max: 275.0) [2024-06-15 22:41:50,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 22:41:51,375][1652475] Updated weights for policy 0, policy_version 848224 (0.0228) [2024-06-15 22:41:53,502][1652475] Updated weights for policy 0, policy_version 848274 (0.0090) [2024-06-15 22:41:55,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 43144.6, 300 sec: 42765.0). Total num frames: 1737392128. Throughput: 0: 10661.0. Samples: 434408448. 
Policy #0 lag: (min: 13.0, avg: 100.3, max: 275.0) [2024-06-15 22:41:55,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 22:41:56,179][1652475] Updated weights for policy 0, policy_version 848368 (0.0013) [2024-06-15 22:41:58,767][1652475] Updated weights for policy 0, policy_version 848445 (0.0048) [2024-06-15 22:42:00,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 44236.8, 300 sec: 42653.9). Total num frames: 1737621504. Throughput: 0: 10798.6. Samples: 434472448. Policy #0 lag: (min: 13.0, avg: 100.3, max: 275.0) [2024-06-15 22:42:00,739][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 22:42:03,677][1652475] Updated weights for policy 0, policy_version 848507 (0.0011) [2024-06-15 22:42:05,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 42052.2, 300 sec: 43209.3). Total num frames: 1737785344. Throughput: 0: 10797.5. Samples: 434508288. Policy #0 lag: (min: 13.0, avg: 100.3, max: 275.0) [2024-06-15 22:42:05,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 22:42:05,929][1652475] Updated weights for policy 0, policy_version 848544 (0.0013) [2024-06-15 22:42:08,105][1651340] Signal inference workers to stop experience collection... (43650 times) [2024-06-15 22:42:08,142][1652475] InferenceWorker_p0-w0: stopping experience collection (43650 times) [2024-06-15 22:42:08,493][1651340] Signal inference workers to resume experience collection... (43650 times) [2024-06-15 22:42:08,494][1652475] InferenceWorker_p0-w0: resuming experience collection (43650 times) [2024-06-15 22:42:08,496][1652475] Updated weights for policy 0, policy_version 848624 (0.0013) [2024-06-15 22:42:10,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 44236.7, 300 sec: 42765.1). Total num frames: 1738047488. Throughput: 0: 10558.6. Samples: 434567168. 
Policy #0 lag: (min: 13.0, avg: 100.3, max: 275.0) [2024-06-15 22:42:10,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 22:42:10,759][1652475] Updated weights for policy 0, policy_version 848661 (0.0012) [2024-06-15 22:42:14,810][1652475] Updated weights for policy 0, policy_version 848720 (0.0011) [2024-06-15 22:42:15,597][1652475] Updated weights for policy 0, policy_version 848768 (0.0014) [2024-06-15 22:42:15,738][1648984] Fps is (10 sec: 49151.0, 60 sec: 42055.3, 300 sec: 43098.2). Total num frames: 1738276864. Throughput: 0: 10865.7. Samples: 434637824. Policy #0 lag: (min: 13.0, avg: 100.3, max: 275.0) [2024-06-15 22:42:15,739][1648984] Avg episode reward: [(0, '-0.230')] [2024-06-15 22:42:19,057][1652475] Updated weights for policy 0, policy_version 848848 (0.0012) [2024-06-15 22:42:20,738][1648984] Fps is (10 sec: 49152.3, 60 sec: 43690.7, 300 sec: 43098.3). Total num frames: 1738539008. Throughput: 0: 10808.9. Samples: 434669056. Policy #0 lag: (min: 13.0, avg: 100.3, max: 275.0) [2024-06-15 22:42:20,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 22:42:24,266][1652475] Updated weights for policy 0, policy_version 848928 (0.0013) [2024-06-15 22:42:25,738][1648984] Fps is (10 sec: 39322.5, 60 sec: 43690.7, 300 sec: 42988.3). Total num frames: 1738670080. Throughput: 0: 10729.2. Samples: 434733056. Policy #0 lag: (min: 13.0, avg: 100.3, max: 275.0) [2024-06-15 22:42:25,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 22:42:26,425][1652475] Updated weights for policy 0, policy_version 848992 (0.0013) [2024-06-15 22:42:30,738][1648984] Fps is (10 sec: 36045.0, 60 sec: 42052.3, 300 sec: 42987.2). Total num frames: 1738899456. Throughput: 0: 10877.2. Samples: 434799104. 
Policy #0 lag: (min: 13.0, avg: 100.3, max: 275.0) [2024-06-15 22:42:30,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 22:42:30,888][1652475] Updated weights for policy 0, policy_version 849088 (0.0093) [2024-06-15 22:42:35,738][1648984] Fps is (10 sec: 39321.9, 60 sec: 41506.2, 300 sec: 42653.9). Total num frames: 1739063296. Throughput: 0: 10592.7. Samples: 434819072. Policy #0 lag: (min: 13.0, avg: 100.3, max: 275.0) [2024-06-15 22:42:35,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 22:42:36,293][1652475] Updated weights for policy 0, policy_version 849153 (0.0118) [2024-06-15 22:42:38,015][1652475] Updated weights for policy 0, policy_version 849232 (0.0129) [2024-06-15 22:42:39,117][1652475] Updated weights for policy 0, policy_version 849276 (0.0012) [2024-06-15 22:42:40,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 44236.8, 300 sec: 42987.2). Total num frames: 1739325440. Throughput: 0: 10592.7. Samples: 434885120. Policy #0 lag: (min: 13.0, avg: 100.3, max: 275.0) [2024-06-15 22:42:40,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 22:42:43,049][1652475] Updated weights for policy 0, policy_version 849312 (0.0012) [2024-06-15 22:42:45,601][1652475] Updated weights for policy 0, policy_version 849377 (0.0099) [2024-06-15 22:42:45,738][1648984] Fps is (10 sec: 45875.2, 60 sec: 42052.3, 300 sec: 42542.9). Total num frames: 1739522048. Throughput: 0: 10706.5. Samples: 434954240. Policy #0 lag: (min: 13.0, avg: 100.3, max: 275.0) [2024-06-15 22:42:45,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 22:42:48,600][1652475] Updated weights for policy 0, policy_version 849426 (0.0012) [2024-06-15 22:42:50,395][1652475] Updated weights for policy 0, policy_version 849504 (0.0012) [2024-06-15 22:42:50,738][1648984] Fps is (10 sec: 45875.2, 60 sec: 44783.0, 300 sec: 43320.4). Total num frames: 1739784192. Throughput: 0: 10661.0. Samples: 434988032. 
Policy #0 lag: (min: 13.0, avg: 100.3, max: 275.0) [2024-06-15 22:42:50,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 22:42:54,590][1652475] Updated weights for policy 0, policy_version 849568 (0.0012) [2024-06-15 22:42:55,738][1648984] Fps is (10 sec: 45874.6, 60 sec: 43144.5, 300 sec: 42876.1). Total num frames: 1739980800. Throughput: 0: 10729.2. Samples: 435049984. Policy #0 lag: (min: 13.0, avg: 100.3, max: 275.0) [2024-06-15 22:42:55,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 22:42:55,744][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000849600_1739980800.pth... [2024-06-15 22:42:55,797][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000844608_1729757184.pth [2024-06-15 22:42:58,117][1651340] Signal inference workers to stop experience collection... (43700 times) [2024-06-15 22:42:58,163][1652475] InferenceWorker_p0-w0: stopping experience collection (43700 times) [2024-06-15 22:42:58,377][1651340] Signal inference workers to resume experience collection... (43700 times) [2024-06-15 22:42:58,378][1652475] InferenceWorker_p0-w0: resuming experience collection (43700 times) [2024-06-15 22:42:58,614][1652475] Updated weights for policy 0, policy_version 849641 (0.0014) [2024-06-15 22:43:00,698][1652475] Updated weights for policy 0, policy_version 849699 (0.0012) [2024-06-15 22:43:00,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 42598.4, 300 sec: 42876.1). Total num frames: 1740177408. Throughput: 0: 10626.9. Samples: 435116032. Policy #0 lag: (min: 13.0, avg: 100.3, max: 275.0) [2024-06-15 22:43:00,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 22:43:01,937][1652475] Updated weights for policy 0, policy_version 849760 (0.0013) [2024-06-15 22:43:05,738][1648984] Fps is (10 sec: 39322.2, 60 sec: 43144.6, 300 sec: 42765.0). Total num frames: 1740374016. Throughput: 0: 10570.0. Samples: 435144704. 
Policy #0 lag: (min: 13.0, avg: 100.3, max: 275.0) [2024-06-15 22:43:05,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 22:43:06,141][1652475] Updated weights for policy 0, policy_version 849798 (0.0053) [2024-06-15 22:43:07,188][1652475] Updated weights for policy 0, policy_version 849848 (0.0015) [2024-06-15 22:43:10,595][1652475] Updated weights for policy 0, policy_version 849892 (0.0013) [2024-06-15 22:43:10,738][1648984] Fps is (10 sec: 42598.5, 60 sec: 42598.4, 300 sec: 42987.2). Total num frames: 1740603392. Throughput: 0: 10786.1. Samples: 435218432. Policy #0 lag: (min: 13.0, avg: 100.3, max: 275.0) [2024-06-15 22:43:10,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 22:43:12,192][1652475] Updated weights for policy 0, policy_version 849955 (0.0029) [2024-06-15 22:43:13,586][1652475] Updated weights for policy 0, policy_version 850000 (0.0012) [2024-06-15 22:43:15,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 43690.9, 300 sec: 43209.4). Total num frames: 1740898304. Throughput: 0: 10626.8. Samples: 435277312. Policy #0 lag: (min: 13.0, avg: 100.3, max: 275.0) [2024-06-15 22:43:15,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 22:43:18,357][1652475] Updated weights for policy 0, policy_version 850049 (0.0012) [2024-06-15 22:43:19,834][1652475] Updated weights for policy 0, policy_version 850107 (0.0011) [2024-06-15 22:43:20,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 41506.1, 300 sec: 42654.0). Total num frames: 1741029376. Throughput: 0: 11047.8. Samples: 435316224. Policy #0 lag: (min: 5.0, avg: 107.9, max: 261.0) [2024-06-15 22:43:20,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 22:43:23,189][1652475] Updated weights for policy 0, policy_version 850165 (0.0014) [2024-06-15 22:43:24,894][1652475] Updated weights for policy 0, policy_version 850224 (0.0013) [2024-06-15 22:43:25,738][1648984] Fps is (10 sec: 39321.2, 60 sec: 43690.6, 300 sec: 43098.2). Total num frames: 1741291520. 
Throughput: 0: 10888.5. Samples: 435375104. Policy #0 lag: (min: 5.0, avg: 107.9, max: 261.0) [2024-06-15 22:43:25,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 22:43:26,615][1652475] Updated weights for policy 0, policy_version 850280 (0.0013) [2024-06-15 22:43:30,570][1652475] Updated weights for policy 0, policy_version 850307 (0.0011) [2024-06-15 22:43:30,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 42052.2, 300 sec: 42209.6). Total num frames: 1741422592. Throughput: 0: 10888.5. Samples: 435444224. Policy #0 lag: (min: 5.0, avg: 107.9, max: 261.0) [2024-06-15 22:43:30,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 22:43:31,897][1652475] Updated weights for policy 0, policy_version 850368 (0.0011) [2024-06-15 22:43:35,738][1648984] Fps is (10 sec: 42597.4, 60 sec: 44236.5, 300 sec: 42876.1). Total num frames: 1741717504. Throughput: 0: 10854.3. Samples: 435476480. Policy #0 lag: (min: 5.0, avg: 107.9, max: 261.0) [2024-06-15 22:43:35,739][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 22:43:36,110][1652475] Updated weights for policy 0, policy_version 850464 (0.0120) [2024-06-15 22:43:40,217][1652475] Updated weights for policy 0, policy_version 850499 (0.0028) [2024-06-15 22:43:40,738][1648984] Fps is (10 sec: 42598.2, 60 sec: 42052.2, 300 sec: 42320.7). Total num frames: 1741848576. Throughput: 0: 10786.1. Samples: 435535360. Policy #0 lag: (min: 5.0, avg: 107.9, max: 261.0) [2024-06-15 22:43:40,738][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 22:43:43,119][1652475] Updated weights for policy 0, policy_version 850576 (0.0019) [2024-06-15 22:43:45,738][1648984] Fps is (10 sec: 36046.0, 60 sec: 42598.4, 300 sec: 42654.0). Total num frames: 1742077952. Throughput: 0: 10683.7. Samples: 435596800. 
Policy #0 lag: (min: 5.0, avg: 107.9, max: 261.0) [2024-06-15 22:43:45,738][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 22:43:46,280][1652475] Updated weights for policy 0, policy_version 850629 (0.0105) [2024-06-15 22:43:46,550][1651340] Signal inference workers to stop experience collection... (43750 times) [2024-06-15 22:43:46,682][1652475] InferenceWorker_p0-w0: stopping experience collection (43750 times) [2024-06-15 22:43:46,845][1651340] Signal inference workers to resume experience collection... (43750 times) [2024-06-15 22:43:46,850][1652475] InferenceWorker_p0-w0: resuming experience collection (43750 times) [2024-06-15 22:43:48,690][1652475] Updated weights for policy 0, policy_version 850725 (0.0012) [2024-06-15 22:43:50,748][1648984] Fps is (10 sec: 49102.0, 60 sec: 42591.1, 300 sec: 42764.2). Total num frames: 1742340096. Throughput: 0: 10658.5. Samples: 435624448. Policy #0 lag: (min: 5.0, avg: 107.9, max: 261.0) [2024-06-15 22:43:50,749][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 22:43:55,057][1652475] Updated weights for policy 0, policy_version 850815 (0.0013) [2024-06-15 22:43:55,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 41506.2, 300 sec: 42209.8). Total num frames: 1742471168. Throughput: 0: 10558.6. Samples: 435693568. Policy #0 lag: (min: 5.0, avg: 107.9, max: 261.0) [2024-06-15 22:43:55,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 22:43:58,424][1652475] Updated weights for policy 0, policy_version 850864 (0.0013) [2024-06-15 22:44:00,191][1652475] Updated weights for policy 0, policy_version 850933 (0.0011) [2024-06-15 22:44:00,738][1648984] Fps is (10 sec: 42642.2, 60 sec: 43144.6, 300 sec: 42765.0). Total num frames: 1742766080. Throughput: 0: 10478.9. Samples: 435748864. Policy #0 lag: (min: 5.0, avg: 107.9, max: 261.0) [2024-06-15 22:44:00,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 22:44:05,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 41506.1, 300 sec: 42653.9). 
Total num frames: 1742864384. Throughput: 0: 10274.1. Samples: 435778560. Policy #0 lag: (min: 5.0, avg: 107.9, max: 261.0) [2024-06-15 22:44:05,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 22:44:05,971][1652475] Updated weights for policy 0, policy_version 851010 (0.0159) [2024-06-15 22:44:07,347][1652475] Updated weights for policy 0, policy_version 851072 (0.0013) [2024-06-15 22:44:09,847][1652475] Updated weights for policy 0, policy_version 851126 (0.0102) [2024-06-15 22:44:10,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 43144.5, 300 sec: 42431.8). Total num frames: 1743192064. Throughput: 0: 10604.1. Samples: 435852288. Policy #0 lag: (min: 5.0, avg: 107.9, max: 261.0) [2024-06-15 22:44:10,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 22:44:11,404][1652475] Updated weights for policy 0, policy_version 851200 (0.0205) [2024-06-15 22:44:13,729][1652475] Updated weights for policy 0, policy_version 851263 (0.0011) [2024-06-15 22:44:15,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 41506.1, 300 sec: 42653.9). Total num frames: 1743388672. Throughput: 0: 10501.7. Samples: 435916800. Policy #0 lag: (min: 5.0, avg: 107.9, max: 261.0) [2024-06-15 22:44:15,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 22:44:20,738][1648984] Fps is (10 sec: 32767.8, 60 sec: 41506.1, 300 sec: 42653.9). Total num frames: 1743519744. Throughput: 0: 10467.6. Samples: 435947520. Policy #0 lag: (min: 5.0, avg: 107.9, max: 261.0) [2024-06-15 22:44:20,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 22:44:21,235][1652475] Updated weights for policy 0, policy_version 851344 (0.0015) [2024-06-15 22:44:23,837][1652475] Updated weights for policy 0, policy_version 851447 (0.0206) [2024-06-15 22:44:25,738][1648984] Fps is (10 sec: 42597.0, 60 sec: 42052.1, 300 sec: 42653.9). Total num frames: 1743814656. Throughput: 0: 10387.9. Samples: 436002816. 
Policy #0 lag: (min: 5.0, avg: 107.9, max: 261.0) [2024-06-15 22:44:25,739][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 22:44:26,403][1652475] Updated weights for policy 0, policy_version 851513 (0.0012) [2024-06-15 22:44:30,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 43690.7, 300 sec: 43098.2). Total num frames: 1744044032. Throughput: 0: 10524.4. Samples: 436070400. Policy #0 lag: (min: 5.0, avg: 107.9, max: 261.0) [2024-06-15 22:44:30,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 22:44:33,529][1652475] Updated weights for policy 0, policy_version 851588 (0.0014) [2024-06-15 22:44:33,833][1651340] Signal inference workers to stop experience collection... (43800 times) [2024-06-15 22:44:33,871][1652475] InferenceWorker_p0-w0: stopping experience collection (43800 times) [2024-06-15 22:44:34,045][1651340] Signal inference workers to resume experience collection... (43800 times) [2024-06-15 22:44:34,046][1652475] InferenceWorker_p0-w0: resuming experience collection (43800 times) [2024-06-15 22:44:34,872][1652475] Updated weights for policy 0, policy_version 851644 (0.0109) [2024-06-15 22:44:35,738][1648984] Fps is (10 sec: 36046.1, 60 sec: 40960.3, 300 sec: 42209.6). Total num frames: 1744175104. Throughput: 0: 10720.3. Samples: 436106752. Policy #0 lag: (min: 5.0, avg: 107.9, max: 261.0) [2024-06-15 22:44:35,738][1648984] Avg episode reward: [(0, '-0.210')] [2024-06-15 22:44:38,364][1652475] Updated weights for policy 0, policy_version 851744 (0.0013) [2024-06-15 22:44:40,738][1648984] Fps is (10 sec: 39320.4, 60 sec: 43144.3, 300 sec: 42653.9). Total num frames: 1744437248. Throughput: 0: 10319.6. Samples: 436157952. Policy #0 lag: (min: 5.0, avg: 107.9, max: 261.0) [2024-06-15 22:44:40,739][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 22:44:42,127][1652475] Updated weights for policy 0, policy_version 851808 (0.0010) [2024-06-15 22:44:45,738][1648984] Fps is (10 sec: 39320.0, 60 sec: 41505.9, 300 sec: 42431.7). 
Total num frames: 1744568320. Throughput: 0: 10592.6. Samples: 436225536. Policy #0 lag: (min: 5.0, avg: 107.9, max: 261.0) [2024-06-15 22:44:45,739][1648984] Avg episode reward: [(0, '-0.250')] [2024-06-15 22:44:46,447][1652475] Updated weights for policy 0, policy_version 851842 (0.0014) [2024-06-15 22:44:47,814][1652475] Updated weights for policy 0, policy_version 851900 (0.0019) [2024-06-15 22:44:50,197][1652475] Updated weights for policy 0, policy_version 851953 (0.0013) [2024-06-15 22:44:50,738][1648984] Fps is (10 sec: 39322.8, 60 sec: 41513.2, 300 sec: 42653.9). Total num frames: 1744830464. Throughput: 0: 10615.4. Samples: 436256256. Policy #0 lag: (min: 5.0, avg: 107.9, max: 261.0) [2024-06-15 22:44:50,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 22:44:51,257][1652475] Updated weights for policy 0, policy_version 852000 (0.0012) [2024-06-15 22:44:53,269][1652475] Updated weights for policy 0, policy_version 852050 (0.0012) [2024-06-15 22:44:55,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 43690.4, 300 sec: 42987.2). Total num frames: 1745092608. Throughput: 0: 10342.3. Samples: 436317696. Policy #0 lag: (min: 5.0, avg: 107.9, max: 261.0) [2024-06-15 22:44:55,738][1648984] Avg episode reward: [(0, '-0.250')] [2024-06-15 22:44:55,743][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000852096_1745092608.pth... [2024-06-15 22:44:55,825][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000847088_1734836224.pth [2024-06-15 22:45:00,471][1652475] Updated weights for policy 0, policy_version 852119 (0.0011) [2024-06-15 22:45:00,738][1648984] Fps is (10 sec: 32768.3, 60 sec: 39867.7, 300 sec: 41987.5). Total num frames: 1745158144. Throughput: 0: 10558.6. Samples: 436391936. 
Policy #0 lag: (min: 5.0, avg: 107.9, max: 261.0) [2024-06-15 22:45:00,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 22:45:01,945][1652475] Updated weights for policy 0, policy_version 852192 (0.0012) [2024-06-15 22:45:03,334][1652475] Updated weights for policy 0, policy_version 852242 (0.0105) [2024-06-15 22:45:05,516][1652475] Updated weights for policy 0, policy_version 852322 (0.0028) [2024-06-15 22:45:05,738][1648984] Fps is (10 sec: 49152.7, 60 sec: 45328.9, 300 sec: 43431.5). Total num frames: 1745584128. Throughput: 0: 10524.4. Samples: 436421120. Policy #0 lag: (min: 5.0, avg: 107.9, max: 261.0) [2024-06-15 22:45:05,739][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 22:45:10,738][1648984] Fps is (10 sec: 45874.6, 60 sec: 40413.8, 300 sec: 42320.7). Total num frames: 1745616896. Throughput: 0: 10672.4. Samples: 436483072. Policy #0 lag: (min: 5.0, avg: 107.9, max: 261.0) [2024-06-15 22:45:10,739][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 22:45:11,769][1652475] Updated weights for policy 0, policy_version 852372 (0.0012) [2024-06-15 22:45:14,119][1652475] Updated weights for policy 0, policy_version 852432 (0.0025) [2024-06-15 22:45:15,738][1648984] Fps is (10 sec: 29491.9, 60 sec: 41506.1, 300 sec: 42653.9). Total num frames: 1745879040. Throughput: 0: 10683.7. Samples: 436551168. Policy #0 lag: (min: 2.0, avg: 68.8, max: 258.0) [2024-06-15 22:45:15,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 22:45:15,827][1652475] Updated weights for policy 0, policy_version 852496 (0.0012) [2024-06-15 22:45:17,151][1651340] Signal inference workers to stop experience collection... (43850 times) [2024-06-15 22:45:17,186][1652475] InferenceWorker_p0-w0: stopping experience collection (43850 times) [2024-06-15 22:45:17,378][1651340] Signal inference workers to resume experience collection... 
(43850 times) [2024-06-15 22:45:17,379][1652475] InferenceWorker_p0-w0: resuming experience collection (43850 times) [2024-06-15 22:45:18,226][1652475] Updated weights for policy 0, policy_version 852596 (0.0105) [2024-06-15 22:45:20,738][1648984] Fps is (10 sec: 52429.4, 60 sec: 43690.7, 300 sec: 42987.2). Total num frames: 1746141184. Throughput: 0: 10319.6. Samples: 436571136. Policy #0 lag: (min: 2.0, avg: 68.8, max: 258.0) [2024-06-15 22:45:20,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 22:45:23,817][1652475] Updated weights for policy 0, policy_version 852629 (0.0011) [2024-06-15 22:45:25,738][1648984] Fps is (10 sec: 39321.3, 60 sec: 40960.2, 300 sec: 42431.8). Total num frames: 1746272256. Throughput: 0: 10956.9. Samples: 436651008. Policy #0 lag: (min: 2.0, avg: 68.8, max: 258.0) [2024-06-15 22:45:25,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 22:45:26,472][1652475] Updated weights for policy 0, policy_version 852674 (0.0012) [2024-06-15 22:45:28,465][1652475] Updated weights for policy 0, policy_version 852759 (0.0013) [2024-06-15 22:45:29,766][1652475] Updated weights for policy 0, policy_version 852816 (0.0011) [2024-06-15 22:45:30,738][1648984] Fps is (10 sec: 49151.8, 60 sec: 43144.5, 300 sec: 43431.5). Total num frames: 1746632704. Throughput: 0: 10740.7. Samples: 436708864. Policy #0 lag: (min: 2.0, avg: 68.8, max: 258.0) [2024-06-15 22:45:30,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 22:45:35,280][1652475] Updated weights for policy 0, policy_version 852880 (0.0012) [2024-06-15 22:45:35,738][1648984] Fps is (10 sec: 45875.1, 60 sec: 42598.3, 300 sec: 42431.8). Total num frames: 1746731008. Throughput: 0: 10831.6. Samples: 436743680. 
Policy #0 lag: (min: 2.0, avg: 68.8, max: 258.0) [2024-06-15 22:45:35,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 22:45:39,100][1652475] Updated weights for policy 0, policy_version 852944 (0.0014) [2024-06-15 22:45:40,083][1652475] Updated weights for policy 0, policy_version 852980 (0.0011) [2024-06-15 22:45:40,738][1648984] Fps is (10 sec: 32767.4, 60 sec: 42052.4, 300 sec: 42765.0). Total num frames: 1746960384. Throughput: 0: 11013.7. Samples: 436813312. Policy #0 lag: (min: 2.0, avg: 68.8, max: 258.0) [2024-06-15 22:45:40,739][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 22:45:42,007][1652475] Updated weights for policy 0, policy_version 853056 (0.0012) [2024-06-15 22:45:43,060][1652475] Updated weights for policy 0, policy_version 853104 (0.0013) [2024-06-15 22:45:45,740][1648984] Fps is (10 sec: 45875.3, 60 sec: 43690.9, 300 sec: 42765.1). Total num frames: 1747189760. Throughput: 0: 10774.7. Samples: 436876800. Policy #0 lag: (min: 2.0, avg: 68.8, max: 258.0) [2024-06-15 22:45:45,741][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 22:45:48,153][1652475] Updated weights for policy 0, policy_version 853168 (0.0013) [2024-06-15 22:45:50,738][1648984] Fps is (10 sec: 36045.5, 60 sec: 41506.2, 300 sec: 42431.8). Total num frames: 1747320832. Throughput: 0: 10729.3. Samples: 436903936. Policy #0 lag: (min: 2.0, avg: 68.8, max: 258.0) [2024-06-15 22:45:50,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 22:45:52,286][1652475] Updated weights for policy 0, policy_version 853248 (0.0012) [2024-06-15 22:45:53,923][1652475] Updated weights for policy 0, policy_version 853312 (0.0011) [2024-06-15 22:45:55,226][1652475] Updated weights for policy 0, policy_version 853372 (0.0012) [2024-06-15 22:45:55,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 43690.9, 300 sec: 43209.3). Total num frames: 1747714048. Throughput: 0: 10638.2. Samples: 436961792. 
Policy #0 lag: (min: 2.0, avg: 68.8, max: 258.0) [2024-06-15 22:45:55,738][1648984] Avg episode reward: [(0, '-0.230')] [2024-06-15 22:46:00,738][1648984] Fps is (10 sec: 45875.5, 60 sec: 43690.7, 300 sec: 42431.8). Total num frames: 1747779584. Throughput: 0: 10592.7. Samples: 437027840. Policy #0 lag: (min: 2.0, avg: 68.8, max: 258.0) [2024-06-15 22:46:00,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 22:46:03,174][1652475] Updated weights for policy 0, policy_version 853441 (0.0160) [2024-06-15 22:46:04,071][1651340] Signal inference workers to stop experience collection... (43900 times) [2024-06-15 22:46:04,099][1652475] InferenceWorker_p0-w0: stopping experience collection (43900 times) [2024-06-15 22:46:04,297][1651340] Signal inference workers to resume experience collection... (43900 times) [2024-06-15 22:46:04,297][1652475] InferenceWorker_p0-w0: resuming experience collection (43900 times) [2024-06-15 22:46:05,018][1652475] Updated weights for policy 0, policy_version 853520 (0.0012) [2024-06-15 22:46:05,738][1648984] Fps is (10 sec: 32768.3, 60 sec: 40960.1, 300 sec: 42876.1). Total num frames: 1748041728. Throughput: 0: 10899.9. Samples: 437061632. Policy #0 lag: (min: 2.0, avg: 68.8, max: 258.0) [2024-06-15 22:46:05,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 22:46:06,041][1652475] Updated weights for policy 0, policy_version 853567 (0.0012) [2024-06-15 22:46:09,213][1652475] Updated weights for policy 0, policy_version 853621 (0.0015) [2024-06-15 22:46:10,738][1648984] Fps is (10 sec: 45873.2, 60 sec: 43690.4, 300 sec: 42321.3). Total num frames: 1748238336. Throughput: 0: 10422.0. Samples: 437120000. Policy #0 lag: (min: 2.0, avg: 68.8, max: 258.0) [2024-06-15 22:46:10,739][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 22:46:14,454][1652475] Updated weights for policy 0, policy_version 853688 (0.0025) [2024-06-15 22:46:15,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 42598.4, 300 sec: 42431.8). 
Total num frames: 1748434944. Throughput: 0: 10513.1. Samples: 437181952. Policy #0 lag: (min: 2.0, avg: 68.8, max: 258.0) [2024-06-15 22:46:15,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 22:46:15,889][1652475] Updated weights for policy 0, policy_version 853746 (0.0013) [2024-06-15 22:46:17,423][1652475] Updated weights for policy 0, policy_version 853808 (0.0024) [2024-06-15 22:46:20,738][1648984] Fps is (10 sec: 39323.1, 60 sec: 41506.1, 300 sec: 42653.9). Total num frames: 1748631552. Throughput: 0: 10319.7. Samples: 437208064. Policy #0 lag: (min: 2.0, avg: 68.8, max: 258.0) [2024-06-15 22:46:20,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 22:46:22,896][1652475] Updated weights for policy 0, policy_version 853856 (0.0096) [2024-06-15 22:46:25,738][1648984] Fps is (10 sec: 32767.9, 60 sec: 41506.1, 300 sec: 41987.5). Total num frames: 1748762624. Throughput: 0: 10183.2. Samples: 437271552. Policy #0 lag: (min: 2.0, avg: 68.8, max: 258.0) [2024-06-15 22:46:25,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 22:46:26,959][1652475] Updated weights for policy 0, policy_version 853920 (0.0023) [2024-06-15 22:46:29,452][1652475] Updated weights for policy 0, policy_version 854017 (0.0101) [2024-06-15 22:46:30,471][1652475] Updated weights for policy 0, policy_version 854077 (0.0023) [2024-06-15 22:46:30,738][1648984] Fps is (10 sec: 52428.5, 60 sec: 42052.2, 300 sec: 42653.9). Total num frames: 1749155840. Throughput: 0: 10103.5. Samples: 437331456. Policy #0 lag: (min: 2.0, avg: 68.8, max: 258.0) [2024-06-15 22:46:30,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 22:46:35,212][1652475] Updated weights for policy 0, policy_version 854135 (0.0011) [2024-06-15 22:46:35,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 42598.4, 300 sec: 42765.0). Total num frames: 1749286912. Throughput: 0: 10365.2. Samples: 437370368. 
Policy #0 lag: (min: 2.0, avg: 68.8, max: 258.0) [2024-06-15 22:46:35,739][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 22:46:39,252][1652475] Updated weights for policy 0, policy_version 854208 (0.0012) [2024-06-15 22:46:40,738][1648984] Fps is (10 sec: 36044.9, 60 sec: 42598.5, 300 sec: 42431.8). Total num frames: 1749516288. Throughput: 0: 10570.0. Samples: 437437440. Policy #0 lag: (min: 2.0, avg: 68.8, max: 258.0) [2024-06-15 22:46:40,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 22:46:41,356][1652475] Updated weights for policy 0, policy_version 854275 (0.0012) [2024-06-15 22:46:45,406][1652475] Updated weights for policy 0, policy_version 854354 (0.0011) [2024-06-15 22:46:45,738][1648984] Fps is (10 sec: 45875.4, 60 sec: 42598.4, 300 sec: 42876.1). Total num frames: 1749745664. Throughput: 0: 10524.4. Samples: 437501440. Policy #0 lag: (min: 2.0, avg: 68.8, max: 258.0) [2024-06-15 22:46:45,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 22:46:49,627][1652475] Updated weights for policy 0, policy_version 854402 (0.0012) [2024-06-15 22:46:49,937][1651340] Signal inference workers to stop experience collection... (43950 times) [2024-06-15 22:46:49,984][1652475] InferenceWorker_p0-w0: stopping experience collection (43950 times) [2024-06-15 22:46:50,131][1651340] Signal inference workers to resume experience collection... (43950 times) [2024-06-15 22:46:50,132][1652475] InferenceWorker_p0-w0: resuming experience collection (43950 times) [2024-06-15 22:46:50,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 43144.6, 300 sec: 42431.8). Total num frames: 1749909504. Throughput: 0: 10490.3. Samples: 437533696. 
Policy #0 lag: (min: 2.0, avg: 68.8, max: 258.0) [2024-06-15 22:46:50,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 22:46:51,055][1652475] Updated weights for policy 0, policy_version 854463 (0.0015) [2024-06-15 22:46:53,640][1652475] Updated weights for policy 0, policy_version 854531 (0.0011) [2024-06-15 22:46:54,494][1652475] Updated weights for policy 0, policy_version 854582 (0.0012) [2024-06-15 22:46:55,738][1648984] Fps is (10 sec: 45874.3, 60 sec: 41506.1, 300 sec: 42653.9). Total num frames: 1750204416. Throughput: 0: 10626.9. Samples: 437598208. Policy #0 lag: (min: 2.0, avg: 68.8, max: 258.0) [2024-06-15 22:46:55,739][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 22:46:55,743][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000854592_1750204416.pth... [2024-06-15 22:46:55,800][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000849600_1739980800.pth [2024-06-15 22:46:57,563][1652475] Updated weights for policy 0, policy_version 854640 (0.0015) [2024-06-15 22:47:00,738][1648984] Fps is (10 sec: 42598.2, 60 sec: 42598.3, 300 sec: 42542.9). Total num frames: 1750335488. Throughput: 0: 10831.6. Samples: 437669376. Policy #0 lag: (min: 2.0, avg: 68.8, max: 258.0) [2024-06-15 22:47:00,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 22:47:02,720][1652475] Updated weights for policy 0, policy_version 854714 (0.0012) [2024-06-15 22:47:05,253][1652475] Updated weights for policy 0, policy_version 854769 (0.0102) [2024-06-15 22:47:05,738][1648984] Fps is (10 sec: 39319.9, 60 sec: 42598.0, 300 sec: 42542.8). Total num frames: 1750597632. Throughput: 0: 10842.9. Samples: 437696000. 
Policy #0 lag: (min: 31.0, avg: 120.3, max: 287.0) [2024-06-15 22:47:05,739][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 22:47:06,180][1652475] Updated weights for policy 0, policy_version 854804 (0.0012) [2024-06-15 22:47:08,453][1652475] Updated weights for policy 0, policy_version 854867 (0.0012) [2024-06-15 22:47:10,738][1648984] Fps is (10 sec: 52427.8, 60 sec: 43690.8, 300 sec: 42653.9). Total num frames: 1750859776. Throughput: 0: 10763.3. Samples: 437755904. Policy #0 lag: (min: 31.0, avg: 120.3, max: 287.0) [2024-06-15 22:47:10,739][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 22:47:15,142][1652475] Updated weights for policy 0, policy_version 854944 (0.0102) [2024-06-15 22:47:15,738][1648984] Fps is (10 sec: 39324.2, 60 sec: 42598.4, 300 sec: 42209.6). Total num frames: 1750990848. Throughput: 0: 11059.2. Samples: 437829120. Policy #0 lag: (min: 31.0, avg: 120.3, max: 287.0) [2024-06-15 22:47:15,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 22:47:16,556][1652475] Updated weights for policy 0, policy_version 855012 (0.0032) [2024-06-15 22:47:19,678][1652475] Updated weights for policy 0, policy_version 855104 (0.0013) [2024-06-15 22:47:20,738][1648984] Fps is (10 sec: 49153.4, 60 sec: 45329.1, 300 sec: 42987.2). Total num frames: 1751351296. Throughput: 0: 10968.2. Samples: 437863936. Policy #0 lag: (min: 31.0, avg: 120.3, max: 287.0) [2024-06-15 22:47:20,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 22:47:25,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 43690.7, 300 sec: 42320.7). Total num frames: 1751384064. Throughput: 0: 10774.8. Samples: 437922304. 
Policy #0 lag: (min: 31.0, avg: 120.3, max: 287.0) [2024-06-15 22:47:25,740][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 22:47:26,666][1652475] Updated weights for policy 0, policy_version 855172 (0.0012) [2024-06-15 22:47:29,243][1652475] Updated weights for policy 0, policy_version 855250 (0.0012) [2024-06-15 22:47:30,738][1648984] Fps is (10 sec: 32767.9, 60 sec: 42052.3, 300 sec: 42765.0). Total num frames: 1751678976. Throughput: 0: 10854.4. Samples: 437989888. Policy #0 lag: (min: 31.0, avg: 120.3, max: 287.0) [2024-06-15 22:47:30,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 22:47:31,615][1652475] Updated weights for policy 0, policy_version 855360 (0.0011) [2024-06-15 22:47:32,675][1651340] Signal inference workers to stop experience collection... (44000 times) [2024-06-15 22:47:32,717][1652475] InferenceWorker_p0-w0: stopping experience collection (44000 times) [2024-06-15 22:47:33,067][1651340] Signal inference workers to resume experience collection... (44000 times) [2024-06-15 22:47:33,068][1652475] InferenceWorker_p0-w0: resuming experience collection (44000 times) [2024-06-15 22:47:35,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 43690.7, 300 sec: 42653.9). Total num frames: 1751908352. Throughput: 0: 10558.6. Samples: 438008832. Policy #0 lag: (min: 31.0, avg: 120.3, max: 287.0) [2024-06-15 22:47:35,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 22:47:38,910][1652475] Updated weights for policy 0, policy_version 855427 (0.0015) [2024-06-15 22:47:40,738][1648984] Fps is (10 sec: 36044.7, 60 sec: 42052.3, 300 sec: 42431.8). Total num frames: 1752039424. Throughput: 0: 10831.7. Samples: 438085632. 
Policy #0 lag: (min: 31.0, avg: 120.3, max: 287.0) [2024-06-15 22:47:40,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 22:47:41,858][1652475] Updated weights for policy 0, policy_version 855490 (0.0015) [2024-06-15 22:47:43,393][1652475] Updated weights for policy 0, policy_version 855556 (0.0011) [2024-06-15 22:47:45,235][1652475] Updated weights for policy 0, policy_version 855632 (0.0012) [2024-06-15 22:47:45,738][1648984] Fps is (10 sec: 45875.2, 60 sec: 43690.7, 300 sec: 42653.9). Total num frames: 1752367104. Throughput: 0: 10478.9. Samples: 438140928. Policy #0 lag: (min: 31.0, avg: 120.3, max: 287.0) [2024-06-15 22:47:45,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 22:47:50,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 42052.3, 300 sec: 42209.6). Total num frames: 1752432640. Throughput: 0: 10649.8. Samples: 438175232. Policy #0 lag: (min: 31.0, avg: 120.3, max: 287.0) [2024-06-15 22:47:50,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 22:47:50,791][1652475] Updated weights for policy 0, policy_version 855696 (0.0012) [2024-06-15 22:47:53,436][1652475] Updated weights for policy 0, policy_version 855748 (0.0044) [2024-06-15 22:47:54,603][1652475] Updated weights for policy 0, policy_version 855808 (0.0013) [2024-06-15 22:47:55,740][1648984] Fps is (10 sec: 36044.4, 60 sec: 42052.3, 300 sec: 42542.8). Total num frames: 1752727552. Throughput: 0: 10888.6. Samples: 438245888. Policy #0 lag: (min: 31.0, avg: 120.3, max: 287.0) [2024-06-15 22:47:55,740][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 22:47:56,756][1652475] Updated weights for policy 0, policy_version 855872 (0.0011) [2024-06-15 22:48:00,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 43690.7, 300 sec: 42653.9). Total num frames: 1752956928. Throughput: 0: 10467.6. Samples: 438300160. 
Policy #0 lag: (min: 31.0, avg: 120.3, max: 287.0) [2024-06-15 22:48:00,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 22:48:03,209][1652475] Updated weights for policy 0, policy_version 855937 (0.0017) [2024-06-15 22:48:05,404][1652475] Updated weights for policy 0, policy_version 856001 (0.0012) [2024-06-15 22:48:05,738][1648984] Fps is (10 sec: 39322.0, 60 sec: 42052.7, 300 sec: 42431.8). Total num frames: 1753120768. Throughput: 0: 10581.3. Samples: 438340096. Policy #0 lag: (min: 31.0, avg: 120.3, max: 287.0) [2024-06-15 22:48:05,739][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 22:48:06,854][1652475] Updated weights for policy 0, policy_version 856060 (0.0013) [2024-06-15 22:48:09,875][1652475] Updated weights for policy 0, policy_version 856116 (0.0011) [2024-06-15 22:48:10,738][1648984] Fps is (10 sec: 42598.0, 60 sec: 42052.4, 300 sec: 42320.7). Total num frames: 1753382912. Throughput: 0: 10695.1. Samples: 438403584. Policy #0 lag: (min: 31.0, avg: 120.3, max: 287.0) [2024-06-15 22:48:10,738][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 22:48:15,738][1648984] Fps is (10 sec: 36044.9, 60 sec: 41506.1, 300 sec: 42209.6). Total num frames: 1753481216. Throughput: 0: 10535.8. Samples: 438464000. Policy #0 lag: (min: 31.0, avg: 120.3, max: 287.0) [2024-06-15 22:48:15,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 22:48:15,930][1652475] Updated weights for policy 0, policy_version 856210 (0.0151) [2024-06-15 22:48:18,477][1652475] Updated weights for policy 0, policy_version 856304 (0.0013) [2024-06-15 22:48:20,738][1648984] Fps is (10 sec: 36044.9, 60 sec: 39867.7, 300 sec: 42209.6). Total num frames: 1753743360. Throughput: 0: 10672.3. Samples: 438489088. 
Policy #0 lag: (min: 31.0, avg: 120.3, max: 287.0) [2024-06-15 22:48:20,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 22:48:21,859][1652475] Updated weights for policy 0, policy_version 856339 (0.0010) [2024-06-15 22:48:22,202][1651340] Signal inference workers to stop experience collection... (44050 times) [2024-06-15 22:48:22,266][1652475] InferenceWorker_p0-w0: stopping experience collection (44050 times) [2024-06-15 22:48:22,595][1651340] Signal inference workers to resume experience collection... (44050 times) [2024-06-15 22:48:22,596][1652475] InferenceWorker_p0-w0: resuming experience collection (44050 times) [2024-06-15 22:48:24,055][1652475] Updated weights for policy 0, policy_version 856389 (0.0012) [2024-06-15 22:48:25,738][1648984] Fps is (10 sec: 52428.4, 60 sec: 43690.6, 300 sec: 42653.9). Total num frames: 1754005504. Throughput: 0: 10444.8. Samples: 438555648. Policy #0 lag: (min: 31.0, avg: 120.3, max: 287.0) [2024-06-15 22:48:25,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 22:48:28,129][1652475] Updated weights for policy 0, policy_version 856464 (0.0012) [2024-06-15 22:48:30,685][1652475] Updated weights for policy 0, policy_version 856566 (0.0086) [2024-06-15 22:48:30,738][1648984] Fps is (10 sec: 49152.2, 60 sec: 42598.4, 300 sec: 42431.8). Total num frames: 1754234880. Throughput: 0: 10490.3. Samples: 438612992. Policy #0 lag: (min: 31.0, avg: 120.3, max: 287.0) [2024-06-15 22:48:30,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 22:48:34,918][1652475] Updated weights for policy 0, policy_version 856640 (0.0017) [2024-06-15 22:48:35,738][1648984] Fps is (10 sec: 39322.0, 60 sec: 41506.1, 300 sec: 42542.9). Total num frames: 1754398720. Throughput: 0: 10444.8. Samples: 438645248. Policy #0 lag: (min: 31.0, avg: 120.3, max: 287.0) [2024-06-15 22:48:35,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 22:48:40,738][1648984] Fps is (10 sec: 29491.0, 60 sec: 41506.1, 300 sec: 42209.6). 
Total num frames: 1754529792. Throughput: 0: 10240.0. Samples: 438706688. Policy #0 lag: (min: 31.0, avg: 120.3, max: 287.0) [2024-06-15 22:48:40,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 22:48:40,880][1652475] Updated weights for policy 0, policy_version 856720 (0.0013) [2024-06-15 22:48:42,230][1652475] Updated weights for policy 0, policy_version 856767 (0.0013) [2024-06-15 22:48:43,620][1652475] Updated weights for policy 0, policy_version 856823 (0.0017) [2024-06-15 22:48:45,738][1648984] Fps is (10 sec: 42598.6, 60 sec: 40960.0, 300 sec: 42322.2). Total num frames: 1754824704. Throughput: 0: 10490.3. Samples: 438772224. Policy #0 lag: (min: 31.0, avg: 120.3, max: 287.0) [2024-06-15 22:48:45,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 22:48:46,009][1652475] Updated weights for policy 0, policy_version 856866 (0.0012) [2024-06-15 22:48:50,204][1652475] Updated weights for policy 0, policy_version 856928 (0.0012) [2024-06-15 22:48:50,738][1648984] Fps is (10 sec: 49152.4, 60 sec: 43144.5, 300 sec: 42542.9). Total num frames: 1755021312. Throughput: 0: 10342.4. Samples: 438805504. Policy #0 lag: (min: 31.0, avg: 120.3, max: 287.0) [2024-06-15 22:48:50,738][1648984] Avg episode reward: [(0, '-0.230')] [2024-06-15 22:48:52,029][1652475] Updated weights for policy 0, policy_version 856967 (0.0011) [2024-06-15 22:48:52,822][1652475] Updated weights for policy 0, policy_version 857015 (0.0012) [2024-06-15 22:48:55,738][1648984] Fps is (10 sec: 39320.5, 60 sec: 41506.1, 300 sec: 42209.6). Total num frames: 1755217920. Throughput: 0: 10615.4. Samples: 438881280. Policy #0 lag: (min: 63.0, avg: 163.3, max: 319.0) [2024-06-15 22:48:55,739][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 22:48:56,090][1652475] Updated weights for policy 0, policy_version 857059 (0.0011) [2024-06-15 22:48:56,305][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000857072_1755283456.pth... 
[2024-06-15 22:48:56,506][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000852096_1745092608.pth [2024-06-15 22:49:00,716][1652475] Updated weights for policy 0, policy_version 857159 (0.0013) [2024-06-15 22:49:00,738][1648984] Fps is (10 sec: 42598.3, 60 sec: 41506.1, 300 sec: 42653.9). Total num frames: 1755447296. Throughput: 0: 10490.3. Samples: 438936064. Policy #0 lag: (min: 63.0, avg: 163.3, max: 319.0) [2024-06-15 22:49:00,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 22:49:01,894][1652475] Updated weights for policy 0, policy_version 857216 (0.0011) [2024-06-15 22:49:05,738][1648984] Fps is (10 sec: 42599.1, 60 sec: 42052.2, 300 sec: 42209.6). Total num frames: 1755643904. Throughput: 0: 10672.4. Samples: 438969344. Policy #0 lag: (min: 63.0, avg: 163.3, max: 319.0) [2024-06-15 22:49:05,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 22:49:08,494][1652475] Updated weights for policy 0, policy_version 857282 (0.0033) [2024-06-15 22:49:09,940][1651340] Signal inference workers to stop experience collection... (44100 times) [2024-06-15 22:49:10,066][1652475] InferenceWorker_p0-w0: stopping experience collection (44100 times) [2024-06-15 22:49:10,195][1651340] Signal inference workers to resume experience collection... (44100 times) [2024-06-15 22:49:10,196][1652475] InferenceWorker_p0-w0: resuming experience collection (44100 times) [2024-06-15 22:49:10,610][1652475] Updated weights for policy 0, policy_version 857376 (0.0215) [2024-06-15 22:49:10,738][1648984] Fps is (10 sec: 45875.1, 60 sec: 42052.3, 300 sec: 42431.8). Total num frames: 1755906048. Throughput: 0: 10615.5. Samples: 439033344. 
Policy #0 lag: (min: 63.0, avg: 163.3, max: 319.0) [2024-06-15 22:49:10,738][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 22:49:13,522][1652475] Updated weights for policy 0, policy_version 857465 (0.0014) [2024-06-15 22:49:15,738][1648984] Fps is (10 sec: 45875.2, 60 sec: 43690.6, 300 sec: 42653.9). Total num frames: 1756102656. Throughput: 0: 10672.3. Samples: 439093248. Policy #0 lag: (min: 63.0, avg: 163.3, max: 319.0) [2024-06-15 22:49:15,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 22:49:20,672][1652475] Updated weights for policy 0, policy_version 857539 (0.0015) [2024-06-15 22:49:20,738][1648984] Fps is (10 sec: 32768.0, 60 sec: 41506.1, 300 sec: 42098.6). Total num frames: 1756233728. Throughput: 0: 10865.8. Samples: 439134208. Policy #0 lag: (min: 63.0, avg: 163.3, max: 319.0) [2024-06-15 22:49:20,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 22:49:22,168][1652475] Updated weights for policy 0, policy_version 857610 (0.0012) [2024-06-15 22:49:24,224][1652475] Updated weights for policy 0, policy_version 857690 (0.0141) [2024-06-15 22:49:25,156][1652475] Updated weights for policy 0, policy_version 857728 (0.0016) [2024-06-15 22:49:25,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 43690.7, 300 sec: 42653.9). Total num frames: 1756626944. Throughput: 0: 10672.4. Samples: 439186944. Policy #0 lag: (min: 63.0, avg: 163.3, max: 319.0) [2024-06-15 22:49:25,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 22:49:30,738][1648984] Fps is (10 sec: 39321.3, 60 sec: 39867.7, 300 sec: 42209.6). Total num frames: 1756626944. Throughput: 0: 10945.4. Samples: 439264768. 
Policy #0 lag: (min: 63.0, avg: 163.3, max: 319.0) [2024-06-15 22:49:30,739][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 22:49:31,916][1652475] Updated weights for policy 0, policy_version 857786 (0.0013) [2024-06-15 22:49:33,013][1652475] Updated weights for policy 0, policy_version 857840 (0.0014) [2024-06-15 22:49:35,036][1652475] Updated weights for policy 0, policy_version 857922 (0.0013) [2024-06-15 22:49:35,738][1648984] Fps is (10 sec: 42598.8, 60 sec: 44236.8, 300 sec: 42765.1). Total num frames: 1757052928. Throughput: 0: 10934.0. Samples: 439297536. Policy #0 lag: (min: 63.0, avg: 163.3, max: 319.0) [2024-06-15 22:49:35,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 22:49:40,738][1648984] Fps is (10 sec: 52429.5, 60 sec: 43690.7, 300 sec: 42654.0). Total num frames: 1757151232. Throughput: 0: 10638.3. Samples: 439360000. Policy #0 lag: (min: 63.0, avg: 163.3, max: 319.0) [2024-06-15 22:49:40,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 22:49:41,568][1652475] Updated weights for policy 0, policy_version 857985 (0.0013) [2024-06-15 22:49:44,483][1652475] Updated weights for policy 0, policy_version 858064 (0.0012) [2024-06-15 22:49:45,738][1648984] Fps is (10 sec: 36044.7, 60 sec: 43144.5, 300 sec: 42653.9). Total num frames: 1757413376. Throughput: 0: 11025.1. Samples: 439432192. Policy #0 lag: (min: 63.0, avg: 163.3, max: 319.0) [2024-06-15 22:49:45,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 22:49:46,122][1652475] Updated weights for policy 0, policy_version 858128 (0.0011) [2024-06-15 22:49:47,941][1652475] Updated weights for policy 0, policy_version 858195 (0.0012) [2024-06-15 22:49:49,131][1652475] Updated weights for policy 0, policy_version 858238 (0.0011) [2024-06-15 22:49:50,738][1648984] Fps is (10 sec: 52428.2, 60 sec: 44236.7, 300 sec: 42654.0). Total num frames: 1757675520. Throughput: 0: 10729.2. Samples: 439452160. 
Policy #0 lag: (min: 63.0, avg: 163.3, max: 319.0) [2024-06-15 22:49:50,739][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 22:49:52,978][1651340] Signal inference workers to stop experience collection... (44150 times) [2024-06-15 22:49:53,057][1652475] InferenceWorker_p0-w0: stopping experience collection (44150 times) [2024-06-15 22:49:53,195][1651340] Signal inference workers to resume experience collection... (44150 times) [2024-06-15 22:49:53,196][1652475] InferenceWorker_p0-w0: resuming experience collection (44150 times) [2024-06-15 22:49:53,707][1652475] Updated weights for policy 0, policy_version 858299 (0.0011) [2024-06-15 22:49:55,738][1648984] Fps is (10 sec: 42598.7, 60 sec: 43690.9, 300 sec: 42987.2). Total num frames: 1757839360. Throughput: 0: 10991.0. Samples: 439527936. Policy #0 lag: (min: 63.0, avg: 163.3, max: 319.0) [2024-06-15 22:49:55,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 22:49:57,685][1652475] Updated weights for policy 0, policy_version 858370 (0.0014) [2024-06-15 22:50:00,738][1648984] Fps is (10 sec: 39320.9, 60 sec: 43690.5, 300 sec: 42320.7). Total num frames: 1758068736. Throughput: 0: 11013.6. Samples: 439588864. Policy #0 lag: (min: 63.0, avg: 163.3, max: 319.0) [2024-06-15 22:50:00,739][1648984] Avg episode reward: [(0, '-0.200')] [2024-06-15 22:50:01,143][1652475] Updated weights for policy 0, policy_version 858437 (0.0046) [2024-06-15 22:50:02,520][1652475] Updated weights for policy 0, policy_version 858494 (0.0102) [2024-06-15 22:50:05,738][1648984] Fps is (10 sec: 49151.9, 60 sec: 44783.0, 300 sec: 43098.3). Total num frames: 1758330880. Throughput: 0: 10854.4. Samples: 439622656. 
Policy #0 lag: (min: 63.0, avg: 163.3, max: 319.0) [2024-06-15 22:50:05,738][1648984] Avg episode reward: [(0, '-0.210')] [2024-06-15 22:50:07,249][1652475] Updated weights for policy 0, policy_version 858576 (0.0013) [2024-06-15 22:50:08,345][1652475] Updated weights for policy 0, policy_version 858623 (0.0013) [2024-06-15 22:50:10,738][1648984] Fps is (10 sec: 49153.3, 60 sec: 44236.8, 300 sec: 42987.2). Total num frames: 1758560256. Throughput: 0: 11241.3. Samples: 439692800. Policy #0 lag: (min: 63.0, avg: 163.3, max: 319.0) [2024-06-15 22:50:10,741][1648984] Avg episode reward: [(0, '-0.210')] [2024-06-15 22:50:10,796][1652475] Updated weights for policy 0, policy_version 858683 (0.0013) [2024-06-15 22:50:14,539][1652475] Updated weights for policy 0, policy_version 858743 (0.0011) [2024-06-15 22:50:15,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 43690.7, 300 sec: 42653.9). Total num frames: 1758724096. Throughput: 0: 10956.8. Samples: 439757824. Policy #0 lag: (min: 63.0, avg: 163.3, max: 319.0) [2024-06-15 22:50:15,738][1648984] Avg episode reward: [(0, '-0.180')] [2024-06-15 22:50:16,436][1652475] Updated weights for policy 0, policy_version 858788 (0.0011) [2024-06-15 22:50:19,382][1652475] Updated weights for policy 0, policy_version 858871 (0.0014) [2024-06-15 22:50:20,738][1648984] Fps is (10 sec: 42598.2, 60 sec: 45875.2, 300 sec: 43098.3). Total num frames: 1758986240. Throughput: 0: 11036.4. Samples: 439794176. Policy #0 lag: (min: 63.0, avg: 163.3, max: 319.0) [2024-06-15 22:50:20,738][1648984] Avg episode reward: [(0, '-0.170')] [2024-06-15 22:50:24,852][1652475] Updated weights for policy 0, policy_version 858947 (0.0022) [2024-06-15 22:50:25,738][1648984] Fps is (10 sec: 45875.1, 60 sec: 42598.4, 300 sec: 42542.9). Total num frames: 1759182848. Throughput: 0: 11127.4. Samples: 439860736. 
Policy #0 lag: (min: 63.0, avg: 163.3, max: 319.0) [2024-06-15 22:50:25,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 22:50:26,275][1652475] Updated weights for policy 0, policy_version 859004 (0.0084) [2024-06-15 22:50:27,796][1652475] Updated weights for policy 0, policy_version 859056 (0.0012) [2024-06-15 22:50:30,429][1652475] Updated weights for policy 0, policy_version 859109 (0.0031) [2024-06-15 22:50:30,738][1648984] Fps is (10 sec: 49152.1, 60 sec: 47513.7, 300 sec: 43209.3). Total num frames: 1759477760. Throughput: 0: 11025.1. Samples: 439928320. Policy #0 lag: (min: 63.0, avg: 163.3, max: 319.0) [2024-06-15 22:50:30,740][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 22:50:30,988][1652475] Updated weights for policy 0, policy_version 859136 (0.0011) [2024-06-15 22:50:35,596][1652475] Updated weights for policy 0, policy_version 859199 (0.0013) [2024-06-15 22:50:35,738][1648984] Fps is (10 sec: 45875.9, 60 sec: 43144.6, 300 sec: 42987.2). Total num frames: 1759641600. Throughput: 0: 11400.6. Samples: 439965184. Policy #0 lag: (min: 63.0, avg: 163.3, max: 319.0) [2024-06-15 22:50:35,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 22:50:37,298][1652475] Updated weights for policy 0, policy_version 859248 (0.0012) [2024-06-15 22:50:38,380][1652475] Updated weights for policy 0, policy_version 859271 (0.0012) [2024-06-15 22:50:40,304][1651340] Signal inference workers to stop experience collection... (44200 times) [2024-06-15 22:50:40,462][1652475] InferenceWorker_p0-w0: stopping experience collection (44200 times) [2024-06-15 22:50:40,495][1651340] Signal inference workers to resume experience collection... (44200 times) [2024-06-15 22:50:40,498][1652475] InferenceWorker_p0-w0: resuming experience collection (44200 times) [2024-06-15 22:50:40,587][1652475] Updated weights for policy 0, policy_version 859344 (0.0101) [2024-06-15 22:50:40,738][1648984] Fps is (10 sec: 45875.0, 60 sec: 46421.3, 300 sec: 43209.3). 
Total num frames: 1759936512. Throughput: 0: 11104.7. Samples: 440027648. Policy #0 lag: (min: 63.0, avg: 163.3, max: 319.0) [2024-06-15 22:50:40,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 22:50:45,738][1648984] Fps is (10 sec: 39321.0, 60 sec: 43690.6, 300 sec: 43098.2). Total num frames: 1760034816. Throughput: 0: 11309.6. Samples: 440097792. Policy #0 lag: (min: 63.0, avg: 163.3, max: 319.0) [2024-06-15 22:50:45,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 22:50:47,679][1652475] Updated weights for policy 0, policy_version 859424 (0.0026) [2024-06-15 22:50:49,421][1652475] Updated weights for policy 0, policy_version 859488 (0.0150) [2024-06-15 22:50:50,738][1648984] Fps is (10 sec: 36043.8, 60 sec: 43690.5, 300 sec: 42653.9). Total num frames: 1760296960. Throughput: 0: 11252.5. Samples: 440129024. Policy #0 lag: (min: 13.0, avg: 82.0, max: 269.0) [2024-06-15 22:50:50,739][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 22:50:51,738][1652475] Updated weights for policy 0, policy_version 859543 (0.0012) [2024-06-15 22:50:53,431][1652475] Updated weights for policy 0, policy_version 859619 (0.0132) [2024-06-15 22:50:55,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 45329.0, 300 sec: 43320.4). Total num frames: 1760559104. Throughput: 0: 10899.9. Samples: 440183296. Policy #0 lag: (min: 13.0, avg: 82.0, max: 269.0) [2024-06-15 22:50:55,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 22:50:55,754][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000859648_1760559104.pth... 
[2024-06-15 22:50:55,832][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000854592_1750204416.pth [2024-06-15 22:50:55,837][1651340] Saving a milestone train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/milestones/checkpoint_000859648_1760559104.pth [2024-06-15 22:51:00,531][1652475] Updated weights for policy 0, policy_version 859700 (0.0042) [2024-06-15 22:51:00,738][1648984] Fps is (10 sec: 39322.9, 60 sec: 43690.8, 300 sec: 42876.1). Total num frames: 1760690176. Throughput: 0: 11104.7. Samples: 440257536. Policy #0 lag: (min: 13.0, avg: 82.0, max: 269.0) [2024-06-15 22:51:00,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 22:51:03,039][1652475] Updated weights for policy 0, policy_version 859782 (0.0013) [2024-06-15 22:51:05,738][1648984] Fps is (10 sec: 39322.0, 60 sec: 43690.7, 300 sec: 43098.3). Total num frames: 1760952320. Throughput: 0: 10843.0. Samples: 440282112. Policy #0 lag: (min: 13.0, avg: 82.0, max: 269.0) [2024-06-15 22:51:05,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 22:51:06,115][1652475] Updated weights for policy 0, policy_version 859844 (0.0012) [2024-06-15 22:51:10,738][1648984] Fps is (10 sec: 39320.9, 60 sec: 42052.1, 300 sec: 42876.1). Total num frames: 1761083392. Throughput: 0: 10865.7. Samples: 440349696. Policy #0 lag: (min: 13.0, avg: 82.0, max: 269.0) [2024-06-15 22:51:10,739][1648984] Avg episode reward: [(0, '-0.200')] [2024-06-15 22:51:12,108][1652475] Updated weights for policy 0, policy_version 859952 (0.0177) [2024-06-15 22:51:13,624][1652475] Updated weights for policy 0, policy_version 860021 (0.0013) [2024-06-15 22:51:15,738][1648984] Fps is (10 sec: 45875.1, 60 sec: 44783.0, 300 sec: 43320.4). Total num frames: 1761411072. Throughput: 0: 10820.3. Samples: 440415232. 
Policy #0 lag: (min: 13.0, avg: 82.0, max: 269.0) [2024-06-15 22:51:15,738][1648984] Avg episode reward: [(0, '-0.170')] [2024-06-15 22:51:15,960][1652475] Updated weights for policy 0, policy_version 860080 (0.0011) [2024-06-15 22:51:20,443][1652475] Updated weights for policy 0, policy_version 860155 (0.0013) [2024-06-15 22:51:20,738][1648984] Fps is (10 sec: 52429.4, 60 sec: 43690.6, 300 sec: 43542.6). Total num frames: 1761607680. Throughput: 0: 10740.6. Samples: 440448512. Policy #0 lag: (min: 13.0, avg: 82.0, max: 269.0) [2024-06-15 22:51:20,738][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 22:51:24,485][1652475] Updated weights for policy 0, policy_version 860227 (0.0027) [2024-06-15 22:51:25,738][1648984] Fps is (10 sec: 42598.5, 60 sec: 44236.9, 300 sec: 42987.2). Total num frames: 1761837056. Throughput: 0: 10615.5. Samples: 440505344. Policy #0 lag: (min: 13.0, avg: 82.0, max: 269.0) [2024-06-15 22:51:25,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 22:51:28,860][1651340] Signal inference workers to stop experience collection... (44250 times) [2024-06-15 22:51:28,896][1652475] Updated weights for policy 0, policy_version 860289 (0.0014) [2024-06-15 22:51:28,917][1652475] InferenceWorker_p0-w0: stopping experience collection (44250 times) [2024-06-15 22:51:29,120][1651340] Signal inference workers to resume experience collection... (44250 times) [2024-06-15 22:51:29,150][1652475] InferenceWorker_p0-w0: resuming experience collection (44250 times) [2024-06-15 22:51:30,310][1652475] Updated weights for policy 0, policy_version 860345 (0.0012) [2024-06-15 22:51:30,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 42052.2, 300 sec: 43098.3). Total num frames: 1762000896. Throughput: 0: 10467.6. Samples: 440568832. 
Policy #0 lag: (min: 13.0, avg: 82.0, max: 269.0) [2024-06-15 22:51:30,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 22:51:32,896][1652475] Updated weights for policy 0, policy_version 860400 (0.0023) [2024-06-15 22:51:35,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 43144.5, 300 sec: 43098.3). Total num frames: 1762230272. Throughput: 0: 10524.5. Samples: 440602624. Policy #0 lag: (min: 13.0, avg: 82.0, max: 269.0) [2024-06-15 22:51:35,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 22:51:36,029][1652475] Updated weights for policy 0, policy_version 860479 (0.0111) [2024-06-15 22:51:37,308][1652475] Updated weights for policy 0, policy_version 860544 (0.0013) [2024-06-15 22:51:40,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 40960.0, 300 sec: 42876.1). Total num frames: 1762394112. Throughput: 0: 10706.5. Samples: 440665088. Policy #0 lag: (min: 13.0, avg: 82.0, max: 269.0) [2024-06-15 22:51:40,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 22:51:44,796][1652475] Updated weights for policy 0, policy_version 860640 (0.0080) [2024-06-15 22:51:45,738][1648984] Fps is (10 sec: 42598.5, 60 sec: 43690.7, 300 sec: 43209.3). Total num frames: 1762656256. Throughput: 0: 10456.2. Samples: 440728064. Policy #0 lag: (min: 13.0, avg: 82.0, max: 269.0) [2024-06-15 22:51:45,738][1648984] Avg episode reward: [(0, '-0.230')] [2024-06-15 22:51:47,077][1652475] Updated weights for policy 0, policy_version 860690 (0.0011) [2024-06-15 22:51:48,815][1652475] Updated weights for policy 0, policy_version 860771 (0.0017) [2024-06-15 22:51:50,738][1648984] Fps is (10 sec: 52429.1, 60 sec: 43690.9, 300 sec: 43098.3). Total num frames: 1762918400. Throughput: 0: 10547.2. Samples: 440756736. Policy #0 lag: (min: 13.0, avg: 82.0, max: 269.0) [2024-06-15 22:51:50,738][1648984] Avg episode reward: [(0, '-0.230')] [2024-06-15 22:51:55,738][1648984] Fps is (10 sec: 29491.4, 60 sec: 39867.9, 300 sec: 42765.0). Total num frames: 1762951168. 
Throughput: 0: 10763.4. Samples: 440834048. Policy #0 lag: (min: 13.0, avg: 82.0, max: 269.0) [2024-06-15 22:51:55,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 22:51:56,665][1652475] Updated weights for policy 0, policy_version 860864 (0.0012) [2024-06-15 22:51:59,344][1652475] Updated weights for policy 0, policy_version 860946 (0.0012) [2024-06-15 22:52:00,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 43690.7, 300 sec: 43098.3). Total num frames: 1763311616. Throughput: 0: 10387.9. Samples: 440882688. Policy #0 lag: (min: 13.0, avg: 82.0, max: 269.0) [2024-06-15 22:52:00,741][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 22:52:01,416][1652475] Updated weights for policy 0, policy_version 861008 (0.0010) [2024-06-15 22:52:02,206][1652475] Updated weights for policy 0, policy_version 861056 (0.0014) [2024-06-15 22:52:05,750][1648984] Fps is (10 sec: 49090.1, 60 sec: 41497.5, 300 sec: 42652.2). Total num frames: 1763442688. Throughput: 0: 10521.5. Samples: 440922112. Policy #0 lag: (min: 13.0, avg: 82.0, max: 269.0) [2024-06-15 22:52:05,751][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 22:52:09,286][1652475] Updated weights for policy 0, policy_version 861152 (0.0012) [2024-06-15 22:52:10,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 43690.8, 300 sec: 43098.2). Total num frames: 1763704832. Throughput: 0: 10626.8. Samples: 440983552. Policy #0 lag: (min: 13.0, avg: 82.0, max: 269.0) [2024-06-15 22:52:10,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 22:52:11,108][1652475] Updated weights for policy 0, policy_version 861216 (0.0012) [2024-06-15 22:52:15,682][1651340] Signal inference workers to stop experience collection... (44300 times) [2024-06-15 22:52:15,738][1648984] Fps is (10 sec: 39370.7, 60 sec: 40413.8, 300 sec: 42320.7). Total num frames: 1763835904. Throughput: 0: 10649.6. Samples: 441048064. 
Policy #0 lag: (min: 13.0, avg: 82.0, max: 269.0) [2024-06-15 22:52:15,738][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 22:52:15,749][1652475] InferenceWorker_p0-w0: stopping experience collection (44300 times) [2024-06-15 22:52:15,929][1651340] Signal inference workers to resume experience collection... (44300 times) [2024-06-15 22:52:15,929][1652475] InferenceWorker_p0-w0: resuming experience collection (44300 times) [2024-06-15 22:52:16,559][1652475] Updated weights for policy 0, policy_version 861283 (0.0013) [2024-06-15 22:52:19,673][1652475] Updated weights for policy 0, policy_version 861330 (0.0011) [2024-06-15 22:52:20,738][1648984] Fps is (10 sec: 36045.0, 60 sec: 40960.1, 300 sec: 42987.2). Total num frames: 1764065280. Throughput: 0: 10604.1. Samples: 441079808. Policy #0 lag: (min: 13.0, avg: 82.0, max: 269.0) [2024-06-15 22:52:20,738][1648984] Avg episode reward: [(0, '-0.230')] [2024-06-15 22:52:21,489][1652475] Updated weights for policy 0, policy_version 861408 (0.0012) [2024-06-15 22:52:23,510][1652475] Updated weights for policy 0, policy_version 861480 (0.0081) [2024-06-15 22:52:25,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 42052.2, 300 sec: 42987.2). Total num frames: 1764360192. Throughput: 0: 10444.8. Samples: 441135104. Policy #0 lag: (min: 13.0, avg: 82.0, max: 269.0) [2024-06-15 22:52:25,738][1648984] Avg episode reward: [(0, '-0.210')] [2024-06-15 22:52:29,917][1652475] Updated weights for policy 0, policy_version 861536 (0.0012) [2024-06-15 22:52:30,738][1648984] Fps is (10 sec: 42598.0, 60 sec: 41506.2, 300 sec: 42653.9). Total num frames: 1764491264. Throughput: 0: 10740.6. Samples: 441211392. 
Policy #0 lag: (min: 13.0, avg: 82.0, max: 269.0) [2024-06-15 22:52:30,738][1648984] Avg episode reward: [(0, '-0.180')] [2024-06-15 22:52:31,863][1652475] Updated weights for policy 0, policy_version 861605 (0.0012) [2024-06-15 22:52:33,624][1652475] Updated weights for policy 0, policy_version 861688 (0.0080) [2024-06-15 22:52:35,511][1652475] Updated weights for policy 0, policy_version 861744 (0.0076) [2024-06-15 22:52:35,738][1648984] Fps is (10 sec: 49150.3, 60 sec: 43690.4, 300 sec: 43431.4). Total num frames: 1764851712. Throughput: 0: 10569.9. Samples: 441232384. Policy #0 lag: (min: 13.0, avg: 82.0, max: 269.0) [2024-06-15 22:52:35,739][1648984] Avg episode reward: [(0, '-0.230')] [2024-06-15 22:52:40,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 41506.2, 300 sec: 42431.8). Total num frames: 1764884480. Throughput: 0: 10433.4. Samples: 441303552. Policy #0 lag: (min: 13.0, avg: 82.0, max: 269.0) [2024-06-15 22:52:40,738][1648984] Avg episode reward: [(0, '-0.230')] [2024-06-15 22:52:42,016][1652475] Updated weights for policy 0, policy_version 861779 (0.0011) [2024-06-15 22:52:44,080][1652475] Updated weights for policy 0, policy_version 861856 (0.0059) [2024-06-15 22:52:45,716][1652475] Updated weights for policy 0, policy_version 861920 (0.0017) [2024-06-15 22:52:45,738][1648984] Fps is (10 sec: 36045.8, 60 sec: 42598.3, 300 sec: 43320.4). Total num frames: 1765212160. Throughput: 0: 10615.4. Samples: 441360384. Policy #0 lag: (min: 15.0, avg: 72.9, max: 271.0) [2024-06-15 22:52:45,738][1648984] Avg episode reward: [(0, '-0.230')] [2024-06-15 22:52:48,703][1652475] Updated weights for policy 0, policy_version 862011 (0.0128) [2024-06-15 22:52:50,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 41506.1, 300 sec: 42987.2). Total num frames: 1765408768. Throughput: 0: 10379.4. Samples: 441389056. 
Policy #0 lag: (min: 15.0, avg: 72.9, max: 271.0) [2024-06-15 22:52:50,738][1648984] Avg episode reward: [(0, '-0.190')] [2024-06-15 22:52:55,738][1648984] Fps is (10 sec: 32767.5, 60 sec: 43144.3, 300 sec: 42653.9). Total num frames: 1765539840. Throughput: 0: 10729.2. Samples: 441466368. Policy #0 lag: (min: 15.0, avg: 72.9, max: 271.0) [2024-06-15 22:52:55,739][1648984] Avg episode reward: [(0, '-0.230')] [2024-06-15 22:52:55,913][1652475] Updated weights for policy 0, policy_version 862087 (0.0011) [2024-06-15 22:52:56,044][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000862096_1765572608.pth... [2024-06-15 22:52:56,190][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000857072_1755283456.pth [2024-06-15 22:52:58,196][1651340] Signal inference workers to stop experience collection... (44350 times) [2024-06-15 22:52:58,229][1652475] InferenceWorker_p0-w0: stopping experience collection (44350 times) [2024-06-15 22:52:58,411][1651340] Signal inference workers to resume experience collection... (44350 times) [2024-06-15 22:52:58,412][1652475] InferenceWorker_p0-w0: resuming experience collection (44350 times) [2024-06-15 22:52:58,415][1652475] Updated weights for policy 0, policy_version 862192 (0.0013) [2024-06-15 22:53:00,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 41506.1, 300 sec: 42987.2). Total num frames: 1765801984. Throughput: 0: 10478.9. Samples: 441519616. Policy #0 lag: (min: 15.0, avg: 72.9, max: 271.0) [2024-06-15 22:53:00,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 22:53:02,534][1652475] Updated weights for policy 0, policy_version 862270 (0.0015) [2024-06-15 22:53:05,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 41514.6, 300 sec: 42542.8). Total num frames: 1765933056. Throughput: 0: 10535.8. Samples: 441553920. 
Policy #0 lag: (min: 15.0, avg: 72.9, max: 271.0) [2024-06-15 22:53:05,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 22:53:07,871][1652475] Updated weights for policy 0, policy_version 862352 (0.0016) [2024-06-15 22:53:09,893][1652475] Updated weights for policy 0, policy_version 862416 (0.0013) [2024-06-15 22:53:10,738][1648984] Fps is (10 sec: 49151.6, 60 sec: 43144.5, 300 sec: 43431.5). Total num frames: 1766293504. Throughput: 0: 10615.4. Samples: 441612800. Policy #0 lag: (min: 15.0, avg: 72.9, max: 271.0) [2024-06-15 22:53:10,739][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 22:53:15,655][1652475] Updated weights for policy 0, policy_version 862496 (0.0012) [2024-06-15 22:53:15,738][1648984] Fps is (10 sec: 45876.1, 60 sec: 42598.4, 300 sec: 42876.1). Total num frames: 1766391808. Throughput: 0: 10444.8. Samples: 441681408. Policy #0 lag: (min: 15.0, avg: 72.9, max: 271.0) [2024-06-15 22:53:15,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 22:53:17,693][1652475] Updated weights for policy 0, policy_version 862530 (0.0013) [2024-06-15 22:53:19,611][1652475] Updated weights for policy 0, policy_version 862598 (0.0103) [2024-06-15 22:53:20,738][1648984] Fps is (10 sec: 39322.1, 60 sec: 43690.6, 300 sec: 42987.2). Total num frames: 1766686720. Throughput: 0: 10649.7. Samples: 441711616. Policy #0 lag: (min: 15.0, avg: 72.9, max: 271.0) [2024-06-15 22:53:20,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 22:53:23,865][1652475] Updated weights for policy 0, policy_version 862677 (0.0016) [2024-06-15 22:53:25,737][1648984] Fps is (10 sec: 45875.8, 60 sec: 41506.2, 300 sec: 42765.0). Total num frames: 1766850560. Throughput: 0: 10365.2. Samples: 441769984. 
Policy #0 lag: (min: 15.0, avg: 72.9, max: 271.0) [2024-06-15 22:53:25,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 22:53:28,728][1652475] Updated weights for policy 0, policy_version 862774 (0.0106) [2024-06-15 22:53:30,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 43144.5, 300 sec: 42987.2). Total num frames: 1767079936. Throughput: 0: 10490.3. Samples: 441832448. Policy #0 lag: (min: 15.0, avg: 72.9, max: 271.0) [2024-06-15 22:53:30,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 22:53:31,059][1652475] Updated weights for policy 0, policy_version 862847 (0.0062) [2024-06-15 22:53:32,412][1652475] Updated weights for policy 0, policy_version 862906 (0.0011) [2024-06-15 22:53:35,738][1648984] Fps is (10 sec: 39321.2, 60 sec: 39868.0, 300 sec: 43098.3). Total num frames: 1767243776. Throughput: 0: 10478.9. Samples: 441860608. Policy #0 lag: (min: 15.0, avg: 72.9, max: 271.0) [2024-06-15 22:53:35,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 22:53:38,803][1652475] Updated weights for policy 0, policy_version 862969 (0.0013) [2024-06-15 22:53:40,738][1648984] Fps is (10 sec: 36044.7, 60 sec: 42598.4, 300 sec: 42765.0). Total num frames: 1767440384. Throughput: 0: 10319.7. Samples: 441930752. Policy #0 lag: (min: 15.0, avg: 72.9, max: 271.0) [2024-06-15 22:53:40,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 22:53:41,157][1652475] Updated weights for policy 0, policy_version 863040 (0.0014) [2024-06-15 22:53:43,140][1652475] Updated weights for policy 0, policy_version 863091 (0.0012) [2024-06-15 22:53:44,456][1652475] Updated weights for policy 0, policy_version 863155 (0.0015) [2024-06-15 22:53:45,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 42598.5, 300 sec: 43209.3). Total num frames: 1767768064. Throughput: 0: 10513.1. Samples: 441992704. 
Policy #0 lag: (min: 15.0, avg: 72.9, max: 271.0) [2024-06-15 22:53:45,738][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 22:53:49,606][1651340] Signal inference workers to stop experience collection... (44400 times) [2024-06-15 22:53:49,664][1652475] InferenceWorker_p0-w0: stopping experience collection (44400 times) [2024-06-15 22:53:49,960][1651340] Signal inference workers to resume experience collection... (44400 times) [2024-06-15 22:53:49,961][1652475] InferenceWorker_p0-w0: resuming experience collection (44400 times) [2024-06-15 22:53:50,738][1648984] Fps is (10 sec: 42598.8, 60 sec: 40960.0, 300 sec: 42876.1). Total num frames: 1767866368. Throughput: 0: 10626.9. Samples: 442032128. Policy #0 lag: (min: 15.0, avg: 72.9, max: 271.0) [2024-06-15 22:53:50,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 22:53:51,267][1652475] Updated weights for policy 0, policy_version 863248 (0.0013) [2024-06-15 22:53:54,334][1652475] Updated weights for policy 0, policy_version 863312 (0.0017) [2024-06-15 22:53:55,400][1652475] Updated weights for policy 0, policy_version 863360 (0.0011) [2024-06-15 22:53:55,738][1648984] Fps is (10 sec: 39320.9, 60 sec: 43690.7, 300 sec: 43098.2). Total num frames: 1768161280. Throughput: 0: 10683.7. Samples: 442093568. Policy #0 lag: (min: 15.0, avg: 72.9, max: 271.0) [2024-06-15 22:53:55,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 22:53:57,120][1652475] Updated weights for policy 0, policy_version 863416 (0.0011) [2024-06-15 22:54:00,738][1648984] Fps is (10 sec: 42598.1, 60 sec: 41506.1, 300 sec: 42876.1). Total num frames: 1768292352. Throughput: 0: 10638.2. Samples: 442160128. Policy #0 lag: (min: 15.0, avg: 72.9, max: 271.0) [2024-06-15 22:54:00,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 22:54:02,920][1652475] Updated weights for policy 0, policy_version 863488 (0.0015) [2024-06-15 22:54:05,738][1648984] Fps is (10 sec: 39321.8, 60 sec: 43690.8, 300 sec: 42876.1). 
Total num frames: 1768554496. Throughput: 0: 10604.1. Samples: 442188800. Policy #0 lag: (min: 15.0, avg: 72.9, max: 271.0) [2024-06-15 22:54:05,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 22:54:06,423][1652475] Updated weights for policy 0, policy_version 863571 (0.0011) [2024-06-15 22:54:09,695][1652475] Updated weights for policy 0, policy_version 863648 (0.0013) [2024-06-15 22:54:10,466][1652475] Updated weights for policy 0, policy_version 863680 (0.0028) [2024-06-15 22:54:10,752][1648984] Fps is (10 sec: 52355.5, 60 sec: 42042.5, 300 sec: 43096.2). Total num frames: 1768816640. Throughput: 0: 10805.5. Samples: 442256384. Policy #0 lag: (min: 15.0, avg: 72.9, max: 271.0) [2024-06-15 22:54:10,798][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 22:54:15,002][1652475] Updated weights for policy 0, policy_version 863750 (0.0016) [2024-06-15 22:54:15,738][1648984] Fps is (10 sec: 45875.5, 60 sec: 43690.7, 300 sec: 43320.4). Total num frames: 1769013248. Throughput: 0: 10808.9. Samples: 442318848. Policy #0 lag: (min: 15.0, avg: 72.9, max: 271.0) [2024-06-15 22:54:15,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 22:54:18,377][1652475] Updated weights for policy 0, policy_version 863824 (0.0013) [2024-06-15 22:54:19,447][1652475] Updated weights for policy 0, policy_version 863872 (0.0010) [2024-06-15 22:54:20,738][1648984] Fps is (10 sec: 39376.8, 60 sec: 42052.2, 300 sec: 42654.0). Total num frames: 1769209856. Throughput: 0: 10979.5. Samples: 442354688. Policy #0 lag: (min: 15.0, avg: 72.9, max: 271.0) [2024-06-15 22:54:20,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 22:54:22,015][1652475] Updated weights for policy 0, policy_version 863930 (0.0024) [2024-06-15 22:54:25,738][1648984] Fps is (10 sec: 39321.3, 60 sec: 42598.3, 300 sec: 43320.4). Total num frames: 1769406464. Throughput: 0: 10922.7. Samples: 442422272. 
Policy #0 lag: (min: 15.0, avg: 72.9, max: 271.0) [2024-06-15 22:54:25,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 22:54:26,140][1652475] Updated weights for policy 0, policy_version 863991 (0.0011) [2024-06-15 22:54:30,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 42052.3, 300 sec: 42542.9). Total num frames: 1769603072. Throughput: 0: 10956.8. Samples: 442485760. Policy #0 lag: (min: 15.0, avg: 72.9, max: 271.0) [2024-06-15 22:54:30,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 22:54:31,401][1652475] Updated weights for policy 0, policy_version 864067 (0.0157) [2024-06-15 22:54:32,845][1651340] Signal inference workers to stop experience collection... (44450 times) [2024-06-15 22:54:32,906][1652475] InferenceWorker_p0-w0: stopping experience collection (44450 times) [2024-06-15 22:54:33,185][1651340] Signal inference workers to resume experience collection... (44450 times) [2024-06-15 22:54:33,186][1652475] InferenceWorker_p0-w0: resuming experience collection (44450 times) [2024-06-15 22:54:33,630][1652475] Updated weights for policy 0, policy_version 864160 (0.0012) [2024-06-15 22:54:35,738][1648984] Fps is (10 sec: 45875.1, 60 sec: 43690.6, 300 sec: 43098.2). Total num frames: 1769865216. Throughput: 0: 10706.5. Samples: 442513920. Policy #0 lag: (min: 47.0, avg: 143.7, max: 271.0) [2024-06-15 22:54:35,739][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 22:54:37,149][1652475] Updated weights for policy 0, policy_version 864211 (0.0011) [2024-06-15 22:54:39,008][1652475] Updated weights for policy 0, policy_version 864291 (0.0027) [2024-06-15 22:54:40,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 44783.0, 300 sec: 43098.3). Total num frames: 1770127360. Throughput: 0: 10740.6. Samples: 442576896. 
Policy #0 lag: (min: 47.0, avg: 143.7, max: 271.0) [2024-06-15 22:54:40,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 22:54:45,025][1652475] Updated weights for policy 0, policy_version 864368 (0.0017) [2024-06-15 22:54:45,738][1648984] Fps is (10 sec: 39322.0, 60 sec: 41506.1, 300 sec: 42654.0). Total num frames: 1770258432. Throughput: 0: 10831.7. Samples: 442647552. Policy #0 lag: (min: 47.0, avg: 143.7, max: 271.0) [2024-06-15 22:54:45,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 22:54:46,566][1652475] Updated weights for policy 0, policy_version 864432 (0.0020) [2024-06-15 22:54:49,008][1652475] Updated weights for policy 0, policy_version 864480 (0.0012) [2024-06-15 22:54:50,738][1648984] Fps is (10 sec: 45875.2, 60 sec: 45329.0, 300 sec: 43209.3). Total num frames: 1770586112. Throughput: 0: 10899.9. Samples: 442679296. Policy #0 lag: (min: 47.0, avg: 143.7, max: 271.0) [2024-06-15 22:54:50,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 22:54:51,088][1652475] Updated weights for policy 0, policy_version 864568 (0.0014) [2024-06-15 22:54:55,748][1648984] Fps is (10 sec: 39285.9, 60 sec: 41499.9, 300 sec: 42652.7). Total num frames: 1770651648. Throughput: 0: 10798.7. Samples: 442742272. Policy #0 lag: (min: 47.0, avg: 143.7, max: 271.0) [2024-06-15 22:54:55,754][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 22:54:55,762][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000864576_1770651648.pth... [2024-06-15 22:54:55,804][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000859648_1760559104.pth [2024-06-15 22:54:57,464][1652475] Updated weights for policy 0, policy_version 864624 (0.0020) [2024-06-15 22:54:59,338][1652475] Updated weights for policy 0, policy_version 864695 (0.0014) [2024-06-15 22:55:00,738][1648984] Fps is (10 sec: 32767.9, 60 sec: 43690.7, 300 sec: 42653.9). Total num frames: 1770913792. 
Throughput: 0: 10888.5. Samples: 442808832. Policy #0 lag: (min: 47.0, avg: 143.7, max: 271.0) [2024-06-15 22:55:00,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 22:55:00,843][1652475] Updated weights for policy 0, policy_version 864709 (0.0011) [2024-06-15 22:55:02,924][1652475] Updated weights for policy 0, policy_version 864800 (0.0153) [2024-06-15 22:55:05,738][1648984] Fps is (10 sec: 52476.2, 60 sec: 43690.7, 300 sec: 42765.0). Total num frames: 1771175936. Throughput: 0: 10626.8. Samples: 442832896. Policy #0 lag: (min: 47.0, avg: 143.7, max: 271.0) [2024-06-15 22:55:05,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 22:55:09,542][1652475] Updated weights for policy 0, policy_version 864886 (0.0018) [2024-06-15 22:55:10,415][1652475] Updated weights for policy 0, policy_version 864917 (0.0013) [2024-06-15 22:55:10,738][1648984] Fps is (10 sec: 45874.9, 60 sec: 42608.3, 300 sec: 42876.1). Total num frames: 1771372544. Throughput: 0: 10683.7. Samples: 442903040. Policy #0 lag: (min: 47.0, avg: 143.7, max: 271.0) [2024-06-15 22:55:10,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 22:55:14,548][1652475] Updated weights for policy 0, policy_version 865023 (0.0013) [2024-06-15 22:55:15,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 42598.3, 300 sec: 42653.9). Total num frames: 1771569152. Throughput: 0: 10558.6. Samples: 442960896. Policy #0 lag: (min: 47.0, avg: 143.7, max: 271.0) [2024-06-15 22:55:15,739][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 22:55:17,218][1652475] Updated weights for policy 0, policy_version 865088 (0.0011) [2024-06-15 22:55:20,659][1651340] Signal inference workers to stop experience collection... (44500 times) [2024-06-15 22:55:20,703][1652475] InferenceWorker_p0-w0: stopping experience collection (44500 times) [2024-06-15 22:55:20,738][1648984] Fps is (10 sec: 32768.2, 60 sec: 41506.1, 300 sec: 42431.8). Total num frames: 1771700224. Throughput: 0: 10615.5. Samples: 442991616. 
Policy #0 lag: (min: 47.0, avg: 143.7, max: 271.0) [2024-06-15 22:55:20,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 22:55:20,905][1651340] Signal inference workers to resume experience collection... (44500 times) [2024-06-15 22:55:20,905][1652475] InferenceWorker_p0-w0: resuming experience collection (44500 times) [2024-06-15 22:55:22,477][1652475] Updated weights for policy 0, policy_version 865156 (0.0014) [2024-06-15 22:55:25,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 42598.4, 300 sec: 42320.7). Total num frames: 1771962368. Throughput: 0: 10524.4. Samples: 443050496. Policy #0 lag: (min: 47.0, avg: 143.7, max: 271.0) [2024-06-15 22:55:25,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 22:55:26,128][1652475] Updated weights for policy 0, policy_version 865235 (0.0012) [2024-06-15 22:55:26,978][1652475] Updated weights for policy 0, policy_version 865280 (0.0018) [2024-06-15 22:55:30,738][1648984] Fps is (10 sec: 45874.9, 60 sec: 42598.3, 300 sec: 42431.8). Total num frames: 1772158976. Throughput: 0: 10513.0. Samples: 443120640. Policy #0 lag: (min: 47.0, avg: 143.7, max: 271.0) [2024-06-15 22:55:30,739][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 22:55:31,135][1652475] Updated weights for policy 0, policy_version 865344 (0.0239) [2024-06-15 22:55:34,442][1652475] Updated weights for policy 0, policy_version 865408 (0.0017) [2024-06-15 22:55:35,738][1648984] Fps is (10 sec: 45875.7, 60 sec: 42598.5, 300 sec: 42320.7). Total num frames: 1772421120. Throughput: 0: 10535.8. Samples: 443153408. Policy #0 lag: (min: 47.0, avg: 143.7, max: 271.0) [2024-06-15 22:55:35,738][1648984] Avg episode reward: [(0, '-0.250')] [2024-06-15 22:55:39,142][1652475] Updated weights for policy 0, policy_version 865473 (0.0013) [2024-06-15 22:55:40,738][1648984] Fps is (10 sec: 45875.4, 60 sec: 41506.1, 300 sec: 42653.9). Total num frames: 1772617728. Throughput: 0: 10526.6. Samples: 443215872. 
Policy #0 lag: (min: 47.0, avg: 143.7, max: 271.0) [2024-06-15 22:55:40,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 22:55:42,986][1652475] Updated weights for policy 0, policy_version 865552 (0.0011) [2024-06-15 22:55:45,729][1652475] Updated weights for policy 0, policy_version 865632 (0.0162) [2024-06-15 22:55:45,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 42598.4, 300 sec: 42431.8). Total num frames: 1772814336. Throughput: 0: 10331.0. Samples: 443273728. Policy #0 lag: (min: 47.0, avg: 143.7, max: 271.0) [2024-06-15 22:55:45,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 22:55:47,756][1652475] Updated weights for policy 0, policy_version 865717 (0.0011) [2024-06-15 22:55:50,738][1648984] Fps is (10 sec: 39322.0, 60 sec: 40413.9, 300 sec: 42209.6). Total num frames: 1773010944. Throughput: 0: 10240.0. Samples: 443293696. Policy #0 lag: (min: 47.0, avg: 143.7, max: 271.0) [2024-06-15 22:55:50,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 22:55:55,438][1652475] Updated weights for policy 0, policy_version 865792 (0.0021) [2024-06-15 22:55:55,738][1648984] Fps is (10 sec: 32767.8, 60 sec: 41512.4, 300 sec: 42209.6). Total num frames: 1773142016. Throughput: 0: 10433.4. Samples: 443372544. Policy #0 lag: (min: 47.0, avg: 143.7, max: 271.0) [2024-06-15 22:55:55,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 22:55:57,765][1652475] Updated weights for policy 0, policy_version 865862 (0.0015) [2024-06-15 22:55:59,306][1652475] Updated weights for policy 0, policy_version 865936 (0.0138) [2024-06-15 22:56:00,738][1648984] Fps is (10 sec: 52429.1, 60 sec: 43690.8, 300 sec: 42654.0). Total num frames: 1773535232. Throughput: 0: 10092.1. Samples: 443415040. Policy #0 lag: (min: 47.0, avg: 143.7, max: 271.0) [2024-06-15 22:56:00,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 22:56:05,738][1648984] Fps is (10 sec: 39321.3, 60 sec: 39321.6, 300 sec: 42209.6). Total num frames: 1773535232. 
Throughput: 0: 10319.6. Samples: 443456000. Policy #0 lag: (min: 47.0, avg: 143.7, max: 271.0) [2024-06-15 22:56:05,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 22:56:07,591][1652475] Updated weights for policy 0, policy_version 866020 (0.0012) [2024-06-15 22:56:09,164][1651340] Signal inference workers to stop experience collection... (44550 times) [2024-06-15 22:56:09,300][1651340] Signal inference workers to resume experience collection... (44550 times) [2024-06-15 22:56:09,321][1652475] InferenceWorker_p0-w0: stopping experience collection (44550 times) [2024-06-15 22:56:09,359][1652475] InferenceWorker_p0-w0: resuming experience collection (44550 times) [2024-06-15 22:56:09,680][1652475] Updated weights for policy 0, policy_version 866096 (0.0012) [2024-06-15 22:56:10,720][1652475] Updated weights for policy 0, policy_version 866144 (0.0012) [2024-06-15 22:56:10,738][1648984] Fps is (10 sec: 32766.8, 60 sec: 41506.0, 300 sec: 42209.6). Total num frames: 1773862912. Throughput: 0: 10376.5. Samples: 443517440. Policy #0 lag: (min: 47.0, avg: 143.7, max: 271.0) [2024-06-15 22:56:10,739][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 22:56:11,904][1652475] Updated weights for policy 0, policy_version 866192 (0.0018) [2024-06-15 22:56:15,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 41506.2, 300 sec: 42209.6). Total num frames: 1774059520. Throughput: 0: 10228.6. Samples: 443580928. Policy #0 lag: (min: 47.0, avg: 143.7, max: 271.0) [2024-06-15 22:56:15,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 22:56:18,603][1652475] Updated weights for policy 0, policy_version 866244 (0.0011) [2024-06-15 22:56:20,631][1652475] Updated weights for policy 0, policy_version 866312 (0.0099) [2024-06-15 22:56:20,738][1648984] Fps is (10 sec: 36045.9, 60 sec: 42052.3, 300 sec: 41987.5). Total num frames: 1774223360. Throughput: 0: 10274.1. Samples: 443615744. 
Policy #0 lag: (min: 47.0, avg: 143.7, max: 271.0) [2024-06-15 22:56:20,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 22:56:21,713][1652475] Updated weights for policy 0, policy_version 866365 (0.0013) [2024-06-15 22:56:23,067][1652475] Updated weights for policy 0, policy_version 866427 (0.0012) [2024-06-15 22:56:25,739][1648984] Fps is (10 sec: 39321.6, 60 sec: 41506.2, 300 sec: 42209.6). Total num frames: 1774452736. Throughput: 0: 10228.6. Samples: 443676160. Policy #0 lag: (min: 47.0, avg: 143.7, max: 271.0) [2024-06-15 22:56:25,741][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 22:56:26,973][1652475] Updated weights for policy 0, policy_version 866480 (0.0012) [2024-06-15 22:56:30,707][1652475] Updated weights for policy 0, policy_version 866557 (0.0128) [2024-06-15 22:56:30,738][1648984] Fps is (10 sec: 45875.0, 60 sec: 42052.3, 300 sec: 42209.6). Total num frames: 1774682112. Throughput: 0: 10433.4. Samples: 443743232. Policy #0 lag: (min: 15.0, avg: 144.2, max: 271.0) [2024-06-15 22:56:30,738][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 22:56:33,305][1652475] Updated weights for policy 0, policy_version 866624 (0.0012) [2024-06-15 22:56:34,761][1652475] Updated weights for policy 0, policy_version 866683 (0.0016) [2024-06-15 22:56:35,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 42598.4, 300 sec: 42653.9). Total num frames: 1774977024. Throughput: 0: 10683.7. Samples: 443774464. Policy #0 lag: (min: 15.0, avg: 144.2, max: 271.0) [2024-06-15 22:56:35,738][1648984] Avg episode reward: [(0, '-0.250')] [2024-06-15 22:56:39,803][1652475] Updated weights for policy 0, policy_version 866746 (0.0014) [2024-06-15 22:56:40,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 41506.2, 300 sec: 42209.6). Total num frames: 1775108096. Throughput: 0: 10456.2. Samples: 443843072. 
Policy #0 lag: (min: 15.0, avg: 144.2, max: 271.0) [2024-06-15 22:56:40,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 22:56:42,367][1652475] Updated weights for policy 0, policy_version 866787 (0.0044) [2024-06-15 22:56:43,656][1652475] Updated weights for policy 0, policy_version 866832 (0.0012) [2024-06-15 22:56:45,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 42598.3, 300 sec: 42209.6). Total num frames: 1775370240. Throughput: 0: 11025.0. Samples: 443911168. Policy #0 lag: (min: 15.0, avg: 144.2, max: 271.0) [2024-06-15 22:56:45,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 22:56:45,838][1652475] Updated weights for policy 0, policy_version 866884 (0.0012) [2024-06-15 22:56:47,138][1652475] Updated weights for policy 0, policy_version 866942 (0.0041) [2024-06-15 22:56:50,265][1652475] Updated weights for policy 0, policy_version 866998 (0.0010) [2024-06-15 22:56:50,738][1648984] Fps is (10 sec: 52429.2, 60 sec: 43690.7, 300 sec: 42987.2). Total num frames: 1775632384. Throughput: 0: 10922.7. Samples: 443947520. Policy #0 lag: (min: 15.0, avg: 144.2, max: 271.0) [2024-06-15 22:56:50,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 22:56:53,630][1652475] Updated weights for policy 0, policy_version 867066 (0.0011) [2024-06-15 22:56:54,774][1651340] Signal inference workers to stop experience collection... (44600 times) [2024-06-15 22:56:54,796][1652475] InferenceWorker_p0-w0: stopping experience collection (44600 times) [2024-06-15 22:56:54,953][1651340] Signal inference workers to resume experience collection... (44600 times) [2024-06-15 22:56:54,954][1652475] InferenceWorker_p0-w0: resuming experience collection (44600 times) [2024-06-15 22:56:55,598][1652475] Updated weights for policy 0, policy_version 867120 (0.0012) [2024-06-15 22:56:55,738][1648984] Fps is (10 sec: 49151.9, 60 sec: 45329.0, 300 sec: 42542.9). Total num frames: 1775861760. Throughput: 0: 10934.1. Samples: 444009472. 
Policy #0 lag: (min: 15.0, avg: 144.2, max: 271.0) [2024-06-15 22:56:55,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 22:56:55,818][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000867136_1775894528.pth... [2024-06-15 22:56:55,883][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000862096_1765572608.pth [2024-06-15 22:57:00,166][1652475] Updated weights for policy 0, policy_version 867196 (0.0012) [2024-06-15 22:57:00,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 41506.1, 300 sec: 42655.8). Total num frames: 1776025600. Throughput: 0: 11059.2. Samples: 444078592. Policy #0 lag: (min: 15.0, avg: 144.2, max: 271.0) [2024-06-15 22:57:00,738][1648984] Avg episode reward: [(0, '-0.250')] [2024-06-15 22:57:02,613][1652475] Updated weights for policy 0, policy_version 867258 (0.0122) [2024-06-15 22:57:05,438][1652475] Updated weights for policy 0, policy_version 867328 (0.0013) [2024-06-15 22:57:05,738][1648984] Fps is (10 sec: 42598.7, 60 sec: 45875.3, 300 sec: 42653.9). Total num frames: 1776287744. Throughput: 0: 10968.2. Samples: 444109312. Policy #0 lag: (min: 15.0, avg: 144.2, max: 271.0) [2024-06-15 22:57:05,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 22:57:07,402][1652475] Updated weights for policy 0, policy_version 867389 (0.0011) [2024-06-15 22:57:10,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 42598.6, 300 sec: 42653.9). Total num frames: 1776418816. Throughput: 0: 11150.2. Samples: 444177920. Policy #0 lag: (min: 15.0, avg: 144.2, max: 271.0) [2024-06-15 22:57:10,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 22:57:12,802][1652475] Updated weights for policy 0, policy_version 867445 (0.0013) [2024-06-15 22:57:14,511][1652475] Updated weights for policy 0, policy_version 867511 (0.0013) [2024-06-15 22:57:15,739][1648984] Fps is (10 sec: 39321.3, 60 sec: 43690.6, 300 sec: 42765.0). Total num frames: 1776680960. 
Throughput: 0: 11138.8. Samples: 444244480. Policy #0 lag: (min: 15.0, avg: 144.2, max: 271.0) [2024-06-15 22:57:15,739][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 22:57:16,944][1652475] Updated weights for policy 0, policy_version 867582 (0.0114) [2024-06-15 22:57:18,613][1652475] Updated weights for policy 0, policy_version 867638 (0.0012) [2024-06-15 22:57:20,738][1648984] Fps is (10 sec: 52428.4, 60 sec: 45329.0, 300 sec: 42653.9). Total num frames: 1776943104. Throughput: 0: 11150.2. Samples: 444276224. Policy #0 lag: (min: 15.0, avg: 144.2, max: 271.0) [2024-06-15 22:57:20,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 22:57:23,991][1652475] Updated weights for policy 0, policy_version 867696 (0.0011) [2024-06-15 22:57:25,738][1648984] Fps is (10 sec: 45875.3, 60 sec: 44782.9, 300 sec: 42876.1). Total num frames: 1777139712. Throughput: 0: 11377.8. Samples: 444355072. Policy #0 lag: (min: 15.0, avg: 144.2, max: 271.0) [2024-06-15 22:57:25,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 22:57:25,957][1652475] Updated weights for policy 0, policy_version 867766 (0.0012) [2024-06-15 22:57:29,077][1652475] Updated weights for policy 0, policy_version 867824 (0.0010) [2024-06-15 22:57:30,738][1648984] Fps is (10 sec: 45875.7, 60 sec: 45329.1, 300 sec: 42542.9). Total num frames: 1777401856. Throughput: 0: 11138.9. Samples: 444412416. Policy #0 lag: (min: 15.0, avg: 144.2, max: 271.0) [2024-06-15 22:57:30,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 22:57:34,387][1652475] Updated weights for policy 0, policy_version 867907 (0.0030) [2024-06-15 22:57:35,738][1648984] Fps is (10 sec: 42598.1, 60 sec: 43144.5, 300 sec: 42987.2). Total num frames: 1777565696. Throughput: 0: 11138.8. Samples: 444448768. 
Policy #0 lag: (min: 15.0, avg: 144.2, max: 271.0) [2024-06-15 22:57:35,739][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 22:57:35,868][1652475] Updated weights for policy 0, policy_version 867968 (0.0018) [2024-06-15 22:57:39,986][1652475] Updated weights for policy 0, policy_version 868048 (0.0236) [2024-06-15 22:57:40,230][1651340] Signal inference workers to stop experience collection... (44650 times) [2024-06-15 22:57:40,302][1652475] InferenceWorker_p0-w0: stopping experience collection (44650 times) [2024-06-15 22:57:40,547][1651340] Signal inference workers to resume experience collection... (44650 times) [2024-06-15 22:57:40,548][1652475] InferenceWorker_p0-w0: resuming experience collection (44650 times) [2024-06-15 22:57:40,738][1648984] Fps is (10 sec: 39321.3, 60 sec: 44782.9, 300 sec: 42653.9). Total num frames: 1777795072. Throughput: 0: 11320.9. Samples: 444518912. Policy #0 lag: (min: 15.0, avg: 144.2, max: 271.0) [2024-06-15 22:57:40,738][1648984] Avg episode reward: [(0, '-0.210')] [2024-06-15 22:57:41,857][1652475] Updated weights for policy 0, policy_version 868097 (0.0013) [2024-06-15 22:57:45,451][1652475] Updated weights for policy 0, policy_version 868164 (0.0068) [2024-06-15 22:57:45,738][1648984] Fps is (10 sec: 45875.8, 60 sec: 44236.8, 300 sec: 42765.0). Total num frames: 1778024448. Throughput: 0: 11286.7. Samples: 444586496. Policy #0 lag: (min: 15.0, avg: 144.2, max: 271.0) [2024-06-15 22:57:45,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 22:57:46,904][1652475] Updated weights for policy 0, policy_version 868224 (0.0012) [2024-06-15 22:57:50,738][1648984] Fps is (10 sec: 45875.4, 60 sec: 43690.6, 300 sec: 43098.3). Total num frames: 1778253824. Throughput: 0: 11093.3. Samples: 444608512. 
Policy #0 lag: (min: 15.0, avg: 144.2, max: 271.0) [2024-06-15 22:57:50,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 22:57:52,281][1652475] Updated weights for policy 0, policy_version 868320 (0.0012) [2024-06-15 22:57:54,760][1652475] Updated weights for policy 0, policy_version 868374 (0.0012) [2024-06-15 22:57:55,738][1648984] Fps is (10 sec: 49151.7, 60 sec: 44236.8, 300 sec: 43098.2). Total num frames: 1778515968. Throughput: 0: 11298.1. Samples: 444686336. Policy #0 lag: (min: 15.0, avg: 144.2, max: 271.0) [2024-06-15 22:57:55,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 22:57:56,959][1652475] Updated weights for policy 0, policy_version 868448 (0.0013) [2024-06-15 22:57:58,539][1652475] Updated weights for policy 0, policy_version 868512 (0.0011) [2024-06-15 22:58:00,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 45875.1, 300 sec: 43542.6). Total num frames: 1778778112. Throughput: 0: 11275.4. Samples: 444751872. Policy #0 lag: (min: 15.0, avg: 144.2, max: 271.0) [2024-06-15 22:58:00,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 22:58:05,205][1652475] Updated weights for policy 0, policy_version 868608 (0.0013) [2024-06-15 22:58:05,738][1648984] Fps is (10 sec: 42598.7, 60 sec: 44236.8, 300 sec: 42876.1). Total num frames: 1778941952. Throughput: 0: 11559.8. Samples: 444796416. Policy #0 lag: (min: 15.0, avg: 144.2, max: 271.0) [2024-06-15 22:58:05,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 22:58:06,612][1652475] Updated weights for policy 0, policy_version 868672 (0.0010) [2024-06-15 22:58:08,703][1652475] Updated weights for policy 0, policy_version 868738 (0.0021) [2024-06-15 22:58:10,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 48059.7, 300 sec: 43764.7). Total num frames: 1779302400. Throughput: 0: 11104.7. Samples: 444854784. 
Policy #0 lag: (min: 15.0, avg: 144.2, max: 271.0) [2024-06-15 22:58:10,741][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 22:58:15,738][1648984] Fps is (10 sec: 42598.2, 60 sec: 44783.0, 300 sec: 42987.2). Total num frames: 1779367936. Throughput: 0: 11730.5. Samples: 444940288. Policy #0 lag: (min: 15.0, avg: 144.2, max: 271.0) [2024-06-15 22:58:15,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 22:58:15,832][1652475] Updated weights for policy 0, policy_version 868833 (0.0012) [2024-06-15 22:58:17,932][1652475] Updated weights for policy 0, policy_version 868917 (0.0111) [2024-06-15 22:58:19,423][1652475] Updated weights for policy 0, policy_version 868960 (0.0013) [2024-06-15 22:58:20,112][1651340] Signal inference workers to stop experience collection... (44700 times) [2024-06-15 22:58:20,170][1652475] InferenceWorker_p0-w0: stopping experience collection (44700 times) [2024-06-15 22:58:20,277][1651340] Signal inference workers to resume experience collection... (44700 times) [2024-06-15 22:58:20,278][1652475] InferenceWorker_p0-w0: resuming experience collection (44700 times) [2024-06-15 22:58:20,738][1648984] Fps is (10 sec: 45875.1, 60 sec: 46967.5, 300 sec: 43764.7). Total num frames: 1779761152. Throughput: 0: 11434.7. Samples: 444963328. Policy #0 lag: (min: 95.0, avg: 175.9, max: 383.0) [2024-06-15 22:58:20,766][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 22:58:20,950][1652475] Updated weights for policy 0, policy_version 869042 (0.0083) [2024-06-15 22:58:25,738][1648984] Fps is (10 sec: 52429.1, 60 sec: 45875.3, 300 sec: 43431.5). Total num frames: 1779892224. Throughput: 0: 11628.1. Samples: 445042176. 
Policy #0 lag: (min: 95.0, avg: 175.9, max: 383.0) [2024-06-15 22:58:25,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 22:58:26,062][1652475] Updated weights for policy 0, policy_version 869104 (0.0014) [2024-06-15 22:58:28,970][1652475] Updated weights for policy 0, policy_version 869179 (0.0014) [2024-06-15 22:58:30,693][1652475] Updated weights for policy 0, policy_version 869232 (0.0014) [2024-06-15 22:58:30,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 46421.3, 300 sec: 43875.8). Total num frames: 1780187136. Throughput: 0: 11616.7. Samples: 445109248. Policy #0 lag: (min: 95.0, avg: 175.9, max: 383.0) [2024-06-15 22:58:30,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 22:58:32,634][1652475] Updated weights for policy 0, policy_version 869281 (0.0040) [2024-06-15 22:58:35,738][1648984] Fps is (10 sec: 45875.3, 60 sec: 46421.5, 300 sec: 43764.7). Total num frames: 1780350976. Throughput: 0: 11867.0. Samples: 445142528. Policy #0 lag: (min: 95.0, avg: 175.9, max: 383.0) [2024-06-15 22:58:35,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 22:58:36,418][1652475] Updated weights for policy 0, policy_version 869344 (0.0011) [2024-06-15 22:58:38,748][1652475] Updated weights for policy 0, policy_version 869408 (0.0017) [2024-06-15 22:58:40,738][1648984] Fps is (10 sec: 49152.1, 60 sec: 48059.8, 300 sec: 43764.7). Total num frames: 1780678656. Throughput: 0: 11776.0. Samples: 445216256. Policy #0 lag: (min: 95.0, avg: 175.9, max: 383.0) [2024-06-15 22:58:40,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 22:58:40,842][1652475] Updated weights for policy 0, policy_version 869474 (0.0012) [2024-06-15 22:58:44,026][1652475] Updated weights for policy 0, policy_version 869525 (0.0013) [2024-06-15 22:58:44,770][1652475] Updated weights for policy 0, policy_version 869562 (0.0012) [2024-06-15 22:58:45,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 47513.6, 300 sec: 44097.9). Total num frames: 1780875264. 
Throughput: 0: 12014.9. Samples: 445292544. Policy #0 lag: (min: 95.0, avg: 175.9, max: 383.0) [2024-06-15 22:58:45,738][1648984] Avg episode reward: [(0, '-0.530')] [2024-06-15 22:58:47,372][1652475] Updated weights for policy 0, policy_version 869601 (0.0011) [2024-06-15 22:58:48,969][1652475] Updated weights for policy 0, policy_version 869664 (0.0017) [2024-06-15 22:58:50,739][1648984] Fps is (10 sec: 45868.9, 60 sec: 48058.6, 300 sec: 43986.7). Total num frames: 1781137408. Throughput: 0: 11855.3. Samples: 445329920. Policy #0 lag: (min: 95.0, avg: 175.9, max: 383.0) [2024-06-15 22:58:50,740][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 22:58:51,435][1652475] Updated weights for policy 0, policy_version 869730 (0.0025) [2024-06-15 22:58:55,209][1652475] Updated weights for policy 0, policy_version 869795 (0.0013) [2024-06-15 22:58:55,738][1648984] Fps is (10 sec: 49150.5, 60 sec: 47513.4, 300 sec: 44320.1). Total num frames: 1781366784. Throughput: 0: 12219.7. Samples: 445404672. Policy #0 lag: (min: 95.0, avg: 175.9, max: 383.0) [2024-06-15 22:58:55,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 22:58:55,806][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000869824_1781399552.pth... [2024-06-15 22:58:55,847][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000864576_1770651648.pth [2024-06-15 22:58:57,044][1652475] Updated weights for policy 0, policy_version 869825 (0.0012) [2024-06-15 22:58:58,173][1652475] Updated weights for policy 0, policy_version 869883 (0.0012) [2024-06-15 22:58:59,988][1652475] Updated weights for policy 0, policy_version 869944 (0.0011) [2024-06-15 22:59:00,738][1648984] Fps is (10 sec: 52436.2, 60 sec: 48059.8, 300 sec: 44431.2). Total num frames: 1781661696. Throughput: 0: 11776.0. Samples: 445470208. 
Policy #0 lag: (min: 95.0, avg: 175.9, max: 383.0) [2024-06-15 22:59:00,738][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 22:59:04,549][1652475] Updated weights for policy 0, policy_version 870000 (0.0018) [2024-06-15 22:59:05,508][1651340] Signal inference workers to stop experience collection... (44750 times) [2024-06-15 22:59:05,544][1652475] InferenceWorker_p0-w0: stopping experience collection (44750 times) [2024-06-15 22:59:05,738][1648984] Fps is (10 sec: 45876.5, 60 sec: 48059.7, 300 sec: 44100.0). Total num frames: 1781825536. Throughput: 0: 12231.1. Samples: 445513728. Policy #0 lag: (min: 95.0, avg: 175.9, max: 383.0) [2024-06-15 22:59:05,741][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 22:59:05,823][1651340] Signal inference workers to resume experience collection... (44750 times) [2024-06-15 22:59:05,823][1652475] InferenceWorker_p0-w0: resuming experience collection (44750 times) [2024-06-15 22:59:06,452][1652475] Updated weights for policy 0, policy_version 870072 (0.0012) [2024-06-15 22:59:09,105][1652475] Updated weights for policy 0, policy_version 870112 (0.0012) [2024-06-15 22:59:10,738][1648984] Fps is (10 sec: 42598.0, 60 sec: 46421.3, 300 sec: 44320.1). Total num frames: 1782087680. Throughput: 0: 11776.0. Samples: 445572096. Policy #0 lag: (min: 95.0, avg: 175.9, max: 383.0) [2024-06-15 22:59:10,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 22:59:11,275][1652475] Updated weights for policy 0, policy_version 870178 (0.0011) [2024-06-15 22:59:15,738][1648984] Fps is (10 sec: 39321.7, 60 sec: 47513.6, 300 sec: 44098.0). Total num frames: 1782218752. Throughput: 0: 11958.0. Samples: 445647360. 
Policy #0 lag: (min: 95.0, avg: 175.9, max: 383.0) [2024-06-15 22:59:15,746][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 22:59:16,571][1652475] Updated weights for policy 0, policy_version 870260 (0.0011) [2024-06-15 22:59:17,720][1652475] Updated weights for policy 0, policy_version 870305 (0.0011) [2024-06-15 22:59:19,541][1652475] Updated weights for policy 0, policy_version 870353 (0.0014) [2024-06-15 22:59:20,738][1648984] Fps is (10 sec: 49151.9, 60 sec: 46967.4, 300 sec: 44653.3). Total num frames: 1782579200. Throughput: 0: 11889.7. Samples: 445677568. Policy #0 lag: (min: 95.0, avg: 175.9, max: 383.0) [2024-06-15 22:59:20,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 22:59:21,786][1652475] Updated weights for policy 0, policy_version 870419 (0.0014) [2024-06-15 22:59:25,738][1648984] Fps is (10 sec: 49151.6, 60 sec: 46967.4, 300 sec: 44431.2). Total num frames: 1782710272. Throughput: 0: 11855.6. Samples: 445749760. Policy #0 lag: (min: 95.0, avg: 175.9, max: 383.0) [2024-06-15 22:59:25,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 22:59:26,941][1652475] Updated weights for policy 0, policy_version 870482 (0.0012) [2024-06-15 22:59:28,625][1652475] Updated weights for policy 0, policy_version 870544 (0.0013) [2024-06-15 22:59:30,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 46421.3, 300 sec: 44431.2). Total num frames: 1782972416. Throughput: 0: 11719.1. Samples: 445819904. Policy #0 lag: (min: 95.0, avg: 175.9, max: 383.0) [2024-06-15 22:59:30,738][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 22:59:30,773][1652475] Updated weights for policy 0, policy_version 870608 (0.0011) [2024-06-15 22:59:31,706][1652475] Updated weights for policy 0, policy_version 870655 (0.0056) [2024-06-15 22:59:33,349][1652475] Updated weights for policy 0, policy_version 870711 (0.0024) [2024-06-15 22:59:35,738][1648984] Fps is (10 sec: 52429.6, 60 sec: 48059.8, 300 sec: 44431.2). Total num frames: 1783234560. 
Throughput: 0: 11526.1. Samples: 445848576. Policy #0 lag: (min: 95.0, avg: 175.9, max: 383.0) [2024-06-15 22:59:35,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 22:59:38,744][1652475] Updated weights for policy 0, policy_version 870768 (0.0011) [2024-06-15 22:59:40,502][1652475] Updated weights for policy 0, policy_version 870832 (0.0076) [2024-06-15 22:59:40,738][1648984] Fps is (10 sec: 49152.6, 60 sec: 46421.4, 300 sec: 44764.4). Total num frames: 1783463936. Throughput: 0: 11571.3. Samples: 445925376. Policy #0 lag: (min: 95.0, avg: 175.9, max: 383.0) [2024-06-15 22:59:40,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 22:59:42,682][1652475] Updated weights for policy 0, policy_version 870869 (0.0010) [2024-06-15 22:59:44,113][1652475] Updated weights for policy 0, policy_version 870928 (0.0014) [2024-06-15 22:59:45,738][1648984] Fps is (10 sec: 52428.5, 60 sec: 48059.8, 300 sec: 44653.3). Total num frames: 1783758848. Throughput: 0: 11480.2. Samples: 445986816. Policy #0 lag: (min: 95.0, avg: 175.9, max: 383.0) [2024-06-15 22:59:45,738][1648984] Avg episode reward: [(0, '-0.220')] [2024-06-15 22:59:49,178][1652475] Updated weights for policy 0, policy_version 870994 (0.0012) [2024-06-15 22:59:50,137][1652475] Updated weights for policy 0, policy_version 871042 (0.0012) [2024-06-15 22:59:50,738][1648984] Fps is (10 sec: 49151.6, 60 sec: 46968.5, 300 sec: 45099.0). Total num frames: 1783955456. Throughput: 0: 11480.2. Samples: 446030336. Policy #0 lag: (min: 95.0, avg: 175.9, max: 383.0) [2024-06-15 22:59:50,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 22:59:50,818][1651340] Signal inference workers to stop experience collection... (44800 times) [2024-06-15 22:59:50,841][1652475] InferenceWorker_p0-w0: stopping experience collection (44800 times) [2024-06-15 22:59:51,140][1651340] Signal inference workers to resume experience collection... 
(44800 times) [2024-06-15 22:59:51,141][1652475] InferenceWorker_p0-w0: resuming experience collection (44800 times) [2024-06-15 22:59:51,439][1652475] Updated weights for policy 0, policy_version 871102 (0.0012) [2024-06-15 22:59:53,897][1652475] Updated weights for policy 0, policy_version 871154 (0.0014) [2024-06-15 22:59:55,738][1648984] Fps is (10 sec: 39321.4, 60 sec: 46421.5, 300 sec: 44875.5). Total num frames: 1784152064. Throughput: 0: 11628.1. Samples: 446095360. Policy #0 lag: (min: 95.0, avg: 175.9, max: 383.0) [2024-06-15 22:59:55,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 22:59:56,738][1652475] Updated weights for policy 0, policy_version 871203 (0.0011) [2024-06-15 22:59:59,476][1652475] Updated weights for policy 0, policy_version 871239 (0.0012) [2024-06-15 23:00:00,583][1652475] Updated weights for policy 0, policy_version 871292 (0.0013) [2024-06-15 23:00:00,738][1648984] Fps is (10 sec: 45875.4, 60 sec: 45875.2, 300 sec: 44875.5). Total num frames: 1784414208. Throughput: 0: 11707.7. Samples: 446174208. Policy #0 lag: (min: 95.0, avg: 175.9, max: 383.0) [2024-06-15 23:00:00,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 23:00:02,075][1652475] Updated weights for policy 0, policy_version 871360 (0.0012) [2024-06-15 23:00:05,675][1652475] Updated weights for policy 0, policy_version 871421 (0.0124) [2024-06-15 23:00:05,738][1648984] Fps is (10 sec: 52429.2, 60 sec: 47513.7, 300 sec: 45097.7). Total num frames: 1784676352. Throughput: 0: 11764.7. Samples: 446206976. Policy #0 lag: (min: 95.0, avg: 175.9, max: 383.0) [2024-06-15 23:00:05,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 23:00:08,343][1652475] Updated weights for policy 0, policy_version 871480 (0.0012) [2024-06-15 23:00:10,738][1648984] Fps is (10 sec: 45875.2, 60 sec: 46421.4, 300 sec: 45097.7). Total num frames: 1784872960. Throughput: 0: 11764.6. Samples: 446279168. 
Policy #0 lag: (min: 15.0, avg: 118.5, max: 271.0) [2024-06-15 23:00:10,738][1648984] Avg episode reward: [(0, '-0.250')] [2024-06-15 23:00:11,541][1652475] Updated weights for policy 0, policy_version 871555 (0.0095) [2024-06-15 23:00:12,811][1652475] Updated weights for policy 0, policy_version 871612 (0.0011) [2024-06-15 23:00:15,738][1648984] Fps is (10 sec: 39320.3, 60 sec: 47513.4, 300 sec: 45319.8). Total num frames: 1785069568. Throughput: 0: 11753.2. Samples: 446348800. Policy #0 lag: (min: 15.0, avg: 118.5, max: 271.0) [2024-06-15 23:00:15,739][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 23:00:18,086][1652475] Updated weights for policy 0, policy_version 871664 (0.0015) [2024-06-15 23:00:19,300][1652475] Updated weights for policy 0, policy_version 871712 (0.0018) [2024-06-15 23:00:20,738][1648984] Fps is (10 sec: 45874.8, 60 sec: 45875.2, 300 sec: 45319.8). Total num frames: 1785331712. Throughput: 0: 11980.8. Samples: 446387712. Policy #0 lag: (min: 15.0, avg: 118.5, max: 271.0) [2024-06-15 23:00:20,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 23:00:21,134][1652475] Updated weights for policy 0, policy_version 871760 (0.0013) [2024-06-15 23:00:22,555][1652475] Updated weights for policy 0, policy_version 871810 (0.0015) [2024-06-15 23:00:23,903][1652475] Updated weights for policy 0, policy_version 871872 (0.0136) [2024-06-15 23:00:25,738][1648984] Fps is (10 sec: 52428.5, 60 sec: 48059.5, 300 sec: 45541.9). Total num frames: 1785593856. Throughput: 0: 11605.2. Samples: 446447616. Policy #0 lag: (min: 15.0, avg: 118.5, max: 271.0) [2024-06-15 23:00:25,739][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 23:00:29,572][1652475] Updated weights for policy 0, policy_version 871924 (0.0011) [2024-06-15 23:00:30,738][1648984] Fps is (10 sec: 49152.5, 60 sec: 47513.7, 300 sec: 45430.9). Total num frames: 1785823232. Throughput: 0: 11878.4. Samples: 446521344. 
Policy #0 lag: (min: 15.0, avg: 118.5, max: 271.0) [2024-06-15 23:00:30,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 23:00:30,796][1652475] Updated weights for policy 0, policy_version 872000 (0.0012) [2024-06-15 23:00:33,447][1651340] Signal inference workers to stop experience collection... (44850 times) [2024-06-15 23:00:33,483][1652475] InferenceWorker_p0-w0: stopping experience collection (44850 times) [2024-06-15 23:00:33,834][1651340] Signal inference workers to resume experience collection... (44850 times) [2024-06-15 23:00:33,835][1652475] InferenceWorker_p0-w0: resuming experience collection (44850 times) [2024-06-15 23:00:34,003][1652475] Updated weights for policy 0, policy_version 872049 (0.0013) [2024-06-15 23:00:35,738][1648984] Fps is (10 sec: 52430.7, 60 sec: 48059.7, 300 sec: 45764.1). Total num frames: 1786118144. Throughput: 0: 11810.1. Samples: 446561792. Policy #0 lag: (min: 15.0, avg: 118.5, max: 271.0) [2024-06-15 23:00:35,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 23:00:39,472][1652475] Updated weights for policy 0, policy_version 872146 (0.0012) [2024-06-15 23:00:40,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 46421.3, 300 sec: 45542.0). Total num frames: 1786249216. Throughput: 0: 11889.8. Samples: 446630400. Policy #0 lag: (min: 15.0, avg: 118.5, max: 271.0) [2024-06-15 23:00:40,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 23:00:41,441][1652475] Updated weights for policy 0, policy_version 872211 (0.0013) [2024-06-15 23:00:43,669][1652475] Updated weights for policy 0, policy_version 872261 (0.0013) [2024-06-15 23:00:44,833][1652475] Updated weights for policy 0, policy_version 872319 (0.0013) [2024-06-15 23:00:45,738][1648984] Fps is (10 sec: 39321.2, 60 sec: 45875.1, 300 sec: 45764.1). Total num frames: 1786511360. Throughput: 0: 11571.2. Samples: 446694912. 
Policy #0 lag: (min: 15.0, avg: 118.5, max: 271.0) [2024-06-15 23:00:45,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 23:00:46,703][1652475] Updated weights for policy 0, policy_version 872374 (0.0013) [2024-06-15 23:00:50,284][1652475] Updated weights for policy 0, policy_version 872434 (0.0011) [2024-06-15 23:00:50,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 46967.5, 300 sec: 46208.4). Total num frames: 1786773504. Throughput: 0: 11741.9. Samples: 446735360. Policy #0 lag: (min: 15.0, avg: 118.5, max: 271.0) [2024-06-15 23:00:50,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 23:00:52,338][1652475] Updated weights for policy 0, policy_version 872496 (0.0012) [2024-06-15 23:00:55,375][1652475] Updated weights for policy 0, policy_version 872560 (0.0012) [2024-06-15 23:00:55,738][1648984] Fps is (10 sec: 52427.9, 60 sec: 48059.6, 300 sec: 45764.1). Total num frames: 1787035648. Throughput: 0: 11821.4. Samples: 446811136. Policy #0 lag: (min: 15.0, avg: 118.5, max: 271.0) [2024-06-15 23:00:55,739][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 23:00:55,748][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000872576_1787035648.pth... [2024-06-15 23:00:55,816][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000867136_1775894528.pth [2024-06-15 23:00:56,906][1652475] Updated weights for policy 0, policy_version 872594 (0.0012) [2024-06-15 23:00:57,928][1652475] Updated weights for policy 0, policy_version 872639 (0.0011) [2024-06-15 23:01:00,738][1648984] Fps is (10 sec: 45875.1, 60 sec: 46967.4, 300 sec: 46430.6). Total num frames: 1787232256. Throughput: 0: 11901.2. Samples: 446884352. 
Policy #0 lag: (min: 15.0, avg: 118.5, max: 271.0) [2024-06-15 23:01:00,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 23:01:01,013][1652475] Updated weights for policy 0, policy_version 872701 (0.0025) [2024-06-15 23:01:03,141][1652475] Updated weights for policy 0, policy_version 872759 (0.0092) [2024-06-15 23:01:05,738][1648984] Fps is (10 sec: 39322.3, 60 sec: 45875.1, 300 sec: 45986.3). Total num frames: 1787428864. Throughput: 0: 11753.2. Samples: 446916608. Policy #0 lag: (min: 15.0, avg: 118.5, max: 271.0) [2024-06-15 23:01:05,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 23:01:07,071][1652475] Updated weights for policy 0, policy_version 872816 (0.0012) [2024-06-15 23:01:10,738][1648984] Fps is (10 sec: 45875.0, 60 sec: 46967.4, 300 sec: 46208.4). Total num frames: 1787691008. Throughput: 0: 11787.4. Samples: 446978048. Policy #0 lag: (min: 15.0, avg: 118.5, max: 271.0) [2024-06-15 23:01:10,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 23:01:11,388][1652475] Updated weights for policy 0, policy_version 872898 (0.0111) [2024-06-15 23:01:14,006][1652475] Updated weights for policy 0, policy_version 872979 (0.0012) [2024-06-15 23:01:15,738][1648984] Fps is (10 sec: 52429.1, 60 sec: 48059.9, 300 sec: 46541.7). Total num frames: 1787953152. Throughput: 0: 11719.1. Samples: 447048704. Policy #0 lag: (min: 15.0, avg: 118.5, max: 271.0) [2024-06-15 23:01:15,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 23:01:18,107][1652475] Updated weights for policy 0, policy_version 873027 (0.0009) [2024-06-15 23:01:18,827][1651340] Signal inference workers to stop experience collection... (44900 times) [2024-06-15 23:01:18,903][1652475] InferenceWorker_p0-w0: stopping experience collection (44900 times) [2024-06-15 23:01:19,120][1651340] Signal inference workers to resume experience collection... 
(44900 times) [2024-06-15 23:01:19,121][1652475] InferenceWorker_p0-w0: resuming experience collection (44900 times) [2024-06-15 23:01:19,684][1652475] Updated weights for policy 0, policy_version 873090 (0.0011) [2024-06-15 23:01:20,599][1652475] Updated weights for policy 0, policy_version 873142 (0.0012) [2024-06-15 23:01:20,738][1648984] Fps is (10 sec: 49152.4, 60 sec: 47513.7, 300 sec: 46541.7). Total num frames: 1788182528. Throughput: 0: 11821.5. Samples: 447093760. Policy #0 lag: (min: 15.0, avg: 118.5, max: 271.0) [2024-06-15 23:01:20,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 23:01:22,587][1652475] Updated weights for policy 0, policy_version 873184 (0.0011) [2024-06-15 23:01:25,517][1652475] Updated weights for policy 0, policy_version 873268 (0.0016) [2024-06-15 23:01:25,738][1648984] Fps is (10 sec: 52427.6, 60 sec: 48059.8, 300 sec: 46763.8). Total num frames: 1788477440. Throughput: 0: 11810.1. Samples: 447161856. Policy #0 lag: (min: 15.0, avg: 118.5, max: 271.0) [2024-06-15 23:01:25,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 23:01:29,313][1652475] Updated weights for policy 0, policy_version 873303 (0.0010) [2024-06-15 23:01:30,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 46421.3, 300 sec: 46208.4). Total num frames: 1788608512. Throughput: 0: 12049.1. Samples: 447237120. Policy #0 lag: (min: 15.0, avg: 118.5, max: 271.0) [2024-06-15 23:01:30,738][1648984] Avg episode reward: [(0, '-0.250')] [2024-06-15 23:01:31,980][1652475] Updated weights for policy 0, policy_version 873392 (0.0012) [2024-06-15 23:01:33,617][1652475] Updated weights for policy 0, policy_version 873465 (0.0011) [2024-06-15 23:01:35,738][1648984] Fps is (10 sec: 49153.2, 60 sec: 47513.6, 300 sec: 46986.0). Total num frames: 1788968960. Throughput: 0: 11753.3. Samples: 447264256. 
Policy #0 lag: (min: 15.0, avg: 118.5, max: 271.0) [2024-06-15 23:01:35,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 23:01:35,859][1652475] Updated weights for policy 0, policy_version 873533 (0.0012) [2024-06-15 23:01:40,407][1652475] Updated weights for policy 0, policy_version 873599 (0.0012) [2024-06-15 23:01:40,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 48059.7, 300 sec: 46652.8). Total num frames: 1789132800. Throughput: 0: 11935.4. Samples: 447348224. Policy #0 lag: (min: 15.0, avg: 118.5, max: 271.0) [2024-06-15 23:01:40,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 23:01:43,207][1652475] Updated weights for policy 0, policy_version 873657 (0.0015) [2024-06-15 23:01:44,742][1652475] Updated weights for policy 0, policy_version 873718 (0.0011) [2024-06-15 23:01:45,635][1652475] Updated weights for policy 0, policy_version 873760 (0.0011) [2024-06-15 23:01:45,738][1648984] Fps is (10 sec: 49152.1, 60 sec: 49152.1, 300 sec: 46874.9). Total num frames: 1789460480. Throughput: 0: 11798.8. Samples: 447415296. Policy #0 lag: (min: 15.0, avg: 118.5, max: 271.0) [2024-06-15 23:01:45,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 23:01:49,727][1652475] Updated weights for policy 0, policy_version 873826 (0.0015) [2024-06-15 23:01:50,217][1652475] Updated weights for policy 0, policy_version 873856 (0.0012) [2024-06-15 23:01:50,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 48059.8, 300 sec: 46763.8). Total num frames: 1789657088. Throughput: 0: 11946.7. Samples: 447454208. Policy #0 lag: (min: 15.0, avg: 118.5, max: 271.0) [2024-06-15 23:01:50,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 23:01:55,049][1652475] Updated weights for policy 0, policy_version 873922 (0.0015) [2024-06-15 23:01:55,738][1648984] Fps is (10 sec: 39321.5, 60 sec: 46967.7, 300 sec: 46874.9). Total num frames: 1789853696. Throughput: 0: 12219.8. Samples: 447527936. 
Policy #0 lag: (min: 15.0, avg: 118.5, max: 271.0) [2024-06-15 23:01:55,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 23:01:56,783][1652475] Updated weights for policy 0, policy_version 874000 (0.0012) [2024-06-15 23:01:57,945][1652475] Updated weights for policy 0, policy_version 874047 (0.0078) [2024-06-15 23:01:59,752][1651340] Signal inference workers to stop experience collection... (44950 times) [2024-06-15 23:01:59,780][1652475] InferenceWorker_p0-w0: stopping experience collection (44950 times) [2024-06-15 23:01:59,918][1651340] Signal inference workers to resume experience collection... (44950 times) [2024-06-15 23:01:59,919][1652475] InferenceWorker_p0-w0: resuming experience collection (44950 times) [2024-06-15 23:02:00,738][1648984] Fps is (10 sec: 49152.0, 60 sec: 48605.9, 300 sec: 46986.0). Total num frames: 1790148608. Throughput: 0: 12140.1. Samples: 447595008. Policy #0 lag: (min: 15.0, avg: 118.5, max: 271.0) [2024-06-15 23:02:00,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 23:02:03,893][1652475] Updated weights for policy 0, policy_version 874114 (0.0209) [2024-06-15 23:02:05,037][1652475] Updated weights for policy 0, policy_version 874173 (0.0014) [2024-06-15 23:02:05,738][1648984] Fps is (10 sec: 45873.9, 60 sec: 48059.6, 300 sec: 47097.0). Total num frames: 1790312448. Throughput: 0: 12003.5. Samples: 447633920. Policy #0 lag: (min: 15.0, avg: 105.5, max: 271.0) [2024-06-15 23:02:05,739][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 23:02:07,667][1652475] Updated weights for policy 0, policy_version 874233 (0.0021) [2024-06-15 23:02:09,046][1652475] Updated weights for policy 0, policy_version 874275 (0.0012) [2024-06-15 23:02:10,738][1648984] Fps is (10 sec: 42595.8, 60 sec: 48059.3, 300 sec: 47097.0). Total num frames: 1790574592. Throughput: 0: 11878.3. Samples: 447696384. 
Policy #0 lag: (min: 15.0, avg: 105.5, max: 271.0) [2024-06-15 23:02:10,739][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 23:02:11,604][1652475] Updated weights for policy 0, policy_version 874358 (0.0015) [2024-06-15 23:02:15,124][1652475] Updated weights for policy 0, policy_version 874392 (0.0019) [2024-06-15 23:02:15,738][1648984] Fps is (10 sec: 49152.9, 60 sec: 47513.5, 300 sec: 46986.0). Total num frames: 1790803968. Throughput: 0: 11878.4. Samples: 447771648. Policy #0 lag: (min: 15.0, avg: 105.5, max: 271.0) [2024-06-15 23:02:15,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 23:02:18,871][1652475] Updated weights for policy 0, policy_version 874448 (0.0014) [2024-06-15 23:02:20,738][1648984] Fps is (10 sec: 45877.1, 60 sec: 47513.5, 300 sec: 47097.0). Total num frames: 1791033344. Throughput: 0: 12196.9. Samples: 447813120. Policy #0 lag: (min: 15.0, avg: 105.5, max: 271.0) [2024-06-15 23:02:20,739][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 23:02:20,885][1652475] Updated weights for policy 0, policy_version 874532 (0.0011) [2024-06-15 23:02:22,372][1652475] Updated weights for policy 0, policy_version 874608 (0.0012) [2024-06-15 23:02:22,806][1652475] Updated weights for policy 0, policy_version 874624 (0.0017) [2024-06-15 23:02:25,738][1648984] Fps is (10 sec: 45875.8, 60 sec: 46421.5, 300 sec: 46986.0). Total num frames: 1791262720. Throughput: 0: 11639.5. Samples: 447872000. Policy #0 lag: (min: 15.0, avg: 105.5, max: 271.0) [2024-06-15 23:02:25,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 23:02:26,697][1652475] Updated weights for policy 0, policy_version 874687 (0.0020) [2024-06-15 23:02:30,738][1648984] Fps is (10 sec: 45875.9, 60 sec: 48059.7, 300 sec: 47208.2). Total num frames: 1791492096. Throughput: 0: 11901.1. Samples: 447950848. 
Policy #0 lag: (min: 15.0, avg: 105.5, max: 271.0) [2024-06-15 23:02:30,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 23:02:31,731][1652475] Updated weights for policy 0, policy_version 874754 (0.0011) [2024-06-15 23:02:33,204][1652475] Updated weights for policy 0, policy_version 874816 (0.0011) [2024-06-15 23:02:34,731][1652475] Updated weights for policy 0, policy_version 874878 (0.0011) [2024-06-15 23:02:35,738][1648984] Fps is (10 sec: 49150.5, 60 sec: 46421.1, 300 sec: 47319.2). Total num frames: 1791754240. Throughput: 0: 11764.5. Samples: 447983616. Policy #0 lag: (min: 15.0, avg: 105.5, max: 271.0) [2024-06-15 23:02:35,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 23:02:37,546][1652475] Updated weights for policy 0, policy_version 874937 (0.0013) [2024-06-15 23:02:40,267][1652475] Updated weights for policy 0, policy_version 874979 (0.0011) [2024-06-15 23:02:40,738][1648984] Fps is (10 sec: 49152.1, 60 sec: 47513.6, 300 sec: 47319.2). Total num frames: 1791983616. Throughput: 0: 11707.7. Samples: 448054784. Policy #0 lag: (min: 15.0, avg: 105.5, max: 271.0) [2024-06-15 23:02:40,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 23:02:43,815][1652475] Updated weights for policy 0, policy_version 875040 (0.0014) [2024-06-15 23:02:43,927][1651340] Signal inference workers to stop experience collection... (45000 times) [2024-06-15 23:02:44,003][1652475] InferenceWorker_p0-w0: stopping experience collection (45000 times) [2024-06-15 23:02:44,206][1651340] Signal inference workers to resume experience collection... (45000 times) [2024-06-15 23:02:44,207][1652475] InferenceWorker_p0-w0: resuming experience collection (45000 times) [2024-06-15 23:02:44,987][1652475] Updated weights for policy 0, policy_version 875088 (0.0010) [2024-06-15 23:02:45,738][1648984] Fps is (10 sec: 49153.9, 60 sec: 46421.4, 300 sec: 47430.3). Total num frames: 1792245760. Throughput: 0: 11798.8. Samples: 448125952. 
Policy #0 lag: (min: 15.0, avg: 105.5, max: 271.0) [2024-06-15 23:02:45,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 23:02:46,019][1652475] Updated weights for policy 0, policy_version 875135 (0.0011) [2024-06-15 23:02:47,744][1652475] Updated weights for policy 0, policy_version 875191 (0.0013) [2024-06-15 23:02:50,737][1648984] Fps is (10 sec: 49152.6, 60 sec: 46967.5, 300 sec: 47319.2). Total num frames: 1792475136. Throughput: 0: 11651.0. Samples: 448158208. Policy #0 lag: (min: 15.0, avg: 105.5, max: 271.0) [2024-06-15 23:02:50,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 23:02:51,149][1652475] Updated weights for policy 0, policy_version 875261 (0.0028) [2024-06-15 23:02:55,046][1652475] Updated weights for policy 0, policy_version 875318 (0.0012) [2024-06-15 23:02:55,738][1648984] Fps is (10 sec: 42598.0, 60 sec: 46967.5, 300 sec: 47097.1). Total num frames: 1792671744. Throughput: 0: 12015.1. Samples: 448237056. Policy #0 lag: (min: 15.0, avg: 105.5, max: 271.0) [2024-06-15 23:02:55,738][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 23:02:55,746][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000875328_1792671744.pth... [2024-06-15 23:02:55,796][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000869824_1781399552.pth [2024-06-15 23:02:56,552][1652475] Updated weights for policy 0, policy_version 875348 (0.0012) [2024-06-15 23:02:58,068][1652475] Updated weights for policy 0, policy_version 875411 (0.0011) [2024-06-15 23:02:59,144][1652475] Updated weights for policy 0, policy_version 875456 (0.0011) [2024-06-15 23:03:00,738][1648984] Fps is (10 sec: 45874.2, 60 sec: 46421.2, 300 sec: 47430.3). Total num frames: 1792933888. Throughput: 0: 11832.9. Samples: 448304128. 
Policy #0 lag: (min: 15.0, avg: 105.5, max: 271.0) [2024-06-15 23:03:00,739][1648984] Avg episode reward: [(0, '-0.480')] [2024-06-15 23:03:02,110][1652475] Updated weights for policy 0, policy_version 875520 (0.0011) [2024-06-15 23:03:05,738][1648984] Fps is (10 sec: 49151.9, 60 sec: 47513.8, 300 sec: 46986.0). Total num frames: 1793163264. Throughput: 0: 11776.0. Samples: 448343040. Policy #0 lag: (min: 15.0, avg: 105.5, max: 271.0) [2024-06-15 23:03:05,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 23:03:06,927][1652475] Updated weights for policy 0, policy_version 875587 (0.0018) [2024-06-15 23:03:07,989][1652475] Updated weights for policy 0, policy_version 875640 (0.0010) [2024-06-15 23:03:09,491][1652475] Updated weights for policy 0, policy_version 875696 (0.0011) [2024-06-15 23:03:10,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 48060.1, 300 sec: 47763.5). Total num frames: 1793458176. Throughput: 0: 12026.3. Samples: 448413184. Policy #0 lag: (min: 15.0, avg: 105.5, max: 271.0) [2024-06-15 23:03:10,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 23:03:11,968][1652475] Updated weights for policy 0, policy_version 875731 (0.0014) [2024-06-15 23:03:12,979][1652475] Updated weights for policy 0, policy_version 875774 (0.0013) [2024-06-15 23:03:15,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 46421.4, 300 sec: 46874.9). Total num frames: 1793589248. Throughput: 0: 11912.5. Samples: 448486912. Policy #0 lag: (min: 15.0, avg: 105.5, max: 271.0) [2024-06-15 23:03:15,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 23:03:16,836][1652475] Updated weights for policy 0, policy_version 875824 (0.0011) [2024-06-15 23:03:18,193][1652475] Updated weights for policy 0, policy_version 875872 (0.0012) [2024-06-15 23:03:18,939][1652475] Updated weights for policy 0, policy_version 875900 (0.0011) [2024-06-15 23:03:20,744][1648984] Fps is (10 sec: 39304.2, 60 sec: 46964.1, 300 sec: 47318.5). Total num frames: 1793851392. 
Throughput: 0: 11877.3. Samples: 448518144. Policy #0 lag: (min: 15.0, avg: 105.5, max: 271.0) [2024-06-15 23:03:20,747][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 23:03:21,807][1652475] Updated weights for policy 0, policy_version 875958 (0.0011) [2024-06-15 23:03:23,375][1652475] Updated weights for policy 0, policy_version 876023 (0.0012) [2024-06-15 23:03:25,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 47513.5, 300 sec: 47208.1). Total num frames: 1794113536. Throughput: 0: 11844.3. Samples: 448587776. Policy #0 lag: (min: 15.0, avg: 105.5, max: 271.0) [2024-06-15 23:03:25,738][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 23:03:26,924][1651340] Signal inference workers to stop experience collection... (45050 times) [2024-06-15 23:03:27,004][1652475] InferenceWorker_p0-w0: stopping experience collection (45050 times) [2024-06-15 23:03:27,197][1651340] Signal inference workers to resume experience collection... (45050 times) [2024-06-15 23:03:27,198][1652475] InferenceWorker_p0-w0: resuming experience collection (45050 times) [2024-06-15 23:03:27,588][1652475] Updated weights for policy 0, policy_version 876064 (0.0011) [2024-06-15 23:03:29,322][1652475] Updated weights for policy 0, policy_version 876128 (0.0012) [2024-06-15 23:03:30,738][1648984] Fps is (10 sec: 52452.4, 60 sec: 48059.8, 300 sec: 47541.4). Total num frames: 1794375680. Throughput: 0: 11798.7. Samples: 448656896. Policy #0 lag: (min: 15.0, avg: 105.5, max: 271.0) [2024-06-15 23:03:30,738][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 23:03:32,447][1652475] Updated weights for policy 0, policy_version 876181 (0.0014) [2024-06-15 23:03:33,291][1652475] Updated weights for policy 0, policy_version 876224 (0.0012) [2024-06-15 23:03:34,863][1652475] Updated weights for policy 0, policy_version 876284 (0.0121) [2024-06-15 23:03:35,743][1648984] Fps is (10 sec: 52403.5, 60 sec: 48056.1, 300 sec: 47318.4). Total num frames: 1794637824. Throughput: 0: 11945.3. 
Samples: 448695808. Policy #0 lag: (min: 15.0, avg: 105.5, max: 271.0) [2024-06-15 23:03:35,744][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 23:03:38,628][1652475] Updated weights for policy 0, policy_version 876342 (0.0012) [2024-06-15 23:03:40,385][1652475] Updated weights for policy 0, policy_version 876387 (0.0014) [2024-06-15 23:03:40,738][1648984] Fps is (10 sec: 49151.8, 60 sec: 48059.7, 300 sec: 47430.3). Total num frames: 1794867200. Throughput: 0: 11867.0. Samples: 448771072. Policy #0 lag: (min: 15.0, avg: 105.5, max: 271.0) [2024-06-15 23:03:40,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 23:03:43,538][1652475] Updated weights for policy 0, policy_version 876448 (0.0019) [2024-06-15 23:03:45,622][1652475] Updated weights for policy 0, policy_version 876512 (0.0014) [2024-06-15 23:03:45,738][1648984] Fps is (10 sec: 45897.6, 60 sec: 47513.5, 300 sec: 47319.4). Total num frames: 1795096576. Throughput: 0: 11923.9. Samples: 448840704. Policy #0 lag: (min: 15.0, avg: 105.5, max: 271.0) [2024-06-15 23:03:45,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 23:03:46,237][1652475] Updated weights for policy 0, policy_version 876536 (0.0012) [2024-06-15 23:03:49,082][1652475] Updated weights for policy 0, policy_version 876578 (0.0019) [2024-06-15 23:03:50,738][1648984] Fps is (10 sec: 49152.3, 60 sec: 48059.6, 300 sec: 47430.3). Total num frames: 1795358720. Throughput: 0: 11969.4. Samples: 448881664. Policy #0 lag: (min: 15.0, avg: 105.5, max: 271.0) [2024-06-15 23:03:50,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 23:03:51,136][1652475] Updated weights for policy 0, policy_version 876664 (0.0012) [2024-06-15 23:03:55,744][1648984] Fps is (10 sec: 42581.4, 60 sec: 47510.5, 300 sec: 46985.3). Total num frames: 1795522560. Throughput: 0: 11900.1. Samples: 448948736. 
Policy #0 lag: (min: 15.0, avg: 105.5, max: 271.0) [2024-06-15 23:03:55,747][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 23:03:55,750][1652475] Updated weights for policy 0, policy_version 876721 (0.0106) [2024-06-15 23:03:56,659][1652475] Updated weights for policy 0, policy_version 876756 (0.0011) [2024-06-15 23:03:59,928][1652475] Updated weights for policy 0, policy_version 876818 (0.0042) [2024-06-15 23:04:00,738][1648984] Fps is (10 sec: 42598.5, 60 sec: 47513.7, 300 sec: 47319.2). Total num frames: 1795784704. Throughput: 0: 11798.8. Samples: 449017856. Policy #0 lag: (min: 30.0, avg: 138.3, max: 271.0) [2024-06-15 23:04:00,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 23:04:01,886][1652475] Updated weights for policy 0, policy_version 876899 (0.0101) [2024-06-15 23:04:05,738][1648984] Fps is (10 sec: 42615.4, 60 sec: 46421.4, 300 sec: 46986.0). Total num frames: 1795948544. Throughput: 0: 11697.5. Samples: 449044480. Policy #0 lag: (min: 30.0, avg: 138.3, max: 271.0) [2024-06-15 23:04:05,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 23:04:07,512][1652475] Updated weights for policy 0, policy_version 876960 (0.0011) [2024-06-15 23:04:08,472][1651340] Signal inference workers to stop experience collection... (45100 times) [2024-06-15 23:04:08,523][1652475] InferenceWorker_p0-w0: stopping experience collection (45100 times) [2024-06-15 23:04:08,743][1651340] Signal inference workers to resume experience collection... (45100 times) [2024-06-15 23:04:08,744][1652475] InferenceWorker_p0-w0: resuming experience collection (45100 times) [2024-06-15 23:04:09,250][1652475] Updated weights for policy 0, policy_version 877026 (0.0011) [2024-06-15 23:04:10,738][1648984] Fps is (10 sec: 45875.1, 60 sec: 46421.4, 300 sec: 47541.4). Total num frames: 1796243456. Throughput: 0: 11844.3. Samples: 449120768. 
Policy #0 lag: (min: 30.0, avg: 138.3, max: 271.0) [2024-06-15 23:04:10,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 23:04:11,240][1652475] Updated weights for policy 0, policy_version 877104 (0.0011) [2024-06-15 23:04:12,322][1652475] Updated weights for policy 0, policy_version 877137 (0.0015) [2024-06-15 23:04:13,257][1652475] Updated weights for policy 0, policy_version 877181 (0.0014) [2024-06-15 23:04:15,739][1648984] Fps is (10 sec: 52428.3, 60 sec: 48059.7, 300 sec: 47097.1). Total num frames: 1796472832. Throughput: 0: 11753.2. Samples: 449185792. Policy #0 lag: (min: 30.0, avg: 138.3, max: 271.0) [2024-06-15 23:04:15,740][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 23:04:18,745][1652475] Updated weights for policy 0, policy_version 877220 (0.0096) [2024-06-15 23:04:20,089][1652475] Updated weights for policy 0, policy_version 877283 (0.0098) [2024-06-15 23:04:20,738][1648984] Fps is (10 sec: 49151.8, 60 sec: 48063.3, 300 sec: 47541.4). Total num frames: 1796734976. Throughput: 0: 11834.2. Samples: 449228288. Policy #0 lag: (min: 30.0, avg: 138.3, max: 271.0) [2024-06-15 23:04:20,738][1648984] Avg episode reward: [(0, '-0.210')] [2024-06-15 23:04:21,272][1652475] Updated weights for policy 0, policy_version 877328 (0.0010) [2024-06-15 23:04:22,325][1652475] Updated weights for policy 0, policy_version 877371 (0.0011) [2024-06-15 23:04:25,140][1652475] Updated weights for policy 0, policy_version 877412 (0.0011) [2024-06-15 23:04:25,738][1648984] Fps is (10 sec: 52429.9, 60 sec: 48059.9, 300 sec: 47541.4). Total num frames: 1796997120. Throughput: 0: 11673.6. Samples: 449296384. 
Policy #0 lag: (min: 30.0, avg: 138.3, max: 271.0) [2024-06-15 23:04:25,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 23:04:25,741][1652475] Updated weights for policy 0, policy_version 877440 (0.0013) [2024-06-15 23:04:29,758][1652475] Updated weights for policy 0, policy_version 877507 (0.0021) [2024-06-15 23:04:30,738][1648984] Fps is (10 sec: 45875.0, 60 sec: 46967.4, 300 sec: 47319.2). Total num frames: 1797193728. Throughput: 0: 11730.5. Samples: 449368576. Policy #0 lag: (min: 30.0, avg: 138.3, max: 271.0) [2024-06-15 23:04:30,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 23:04:31,121][1652475] Updated weights for policy 0, policy_version 877567 (0.0010) [2024-06-15 23:04:32,928][1652475] Updated weights for policy 0, policy_version 877625 (0.0018) [2024-06-15 23:04:35,738][1648984] Fps is (10 sec: 39319.9, 60 sec: 45878.7, 300 sec: 47208.1). Total num frames: 1797390336. Throughput: 0: 11446.0. Samples: 449396736. Policy #0 lag: (min: 30.0, avg: 138.3, max: 271.0) [2024-06-15 23:04:35,739][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 23:04:37,889][1652475] Updated weights for policy 0, policy_version 877665 (0.0011) [2024-06-15 23:04:40,388][1652475] Updated weights for policy 0, policy_version 877753 (0.0011) [2024-06-15 23:04:40,738][1648984] Fps is (10 sec: 45875.5, 60 sec: 46421.4, 300 sec: 47097.1). Total num frames: 1797652480. Throughput: 0: 11560.8. Samples: 449468928. Policy #0 lag: (min: 30.0, avg: 138.3, max: 271.0) [2024-06-15 23:04:40,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 23:04:41,530][1652475] Updated weights for policy 0, policy_version 877794 (0.0012) [2024-06-15 23:04:42,119][1652475] Updated weights for policy 0, policy_version 877822 (0.0010) [2024-06-15 23:04:44,721][1652475] Updated weights for policy 0, policy_version 877881 (0.0012) [2024-06-15 23:04:45,738][1648984] Fps is (10 sec: 52430.3, 60 sec: 46967.5, 300 sec: 47319.2). Total num frames: 1797914624. 
Throughput: 0: 11548.4. Samples: 449537536. Policy #0 lag: (min: 30.0, avg: 138.3, max: 271.0) [2024-06-15 23:04:45,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 23:04:49,117][1652475] Updated weights for policy 0, policy_version 877938 (0.0011) [2024-06-15 23:04:50,659][1651340] Signal inference workers to stop experience collection... (45150 times) [2024-06-15 23:04:50,682][1652475] Updated weights for policy 0, policy_version 877986 (0.0015) [2024-06-15 23:04:50,700][1652475] InferenceWorker_p0-w0: stopping experience collection (45150 times) [2024-06-15 23:04:50,738][1648984] Fps is (10 sec: 45875.1, 60 sec: 45875.2, 300 sec: 47319.2). Total num frames: 1798111232. Throughput: 0: 11946.7. Samples: 449582080. Policy #0 lag: (min: 30.0, avg: 138.3, max: 271.0) [2024-06-15 23:04:50,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 23:04:50,881][1651340] Signal inference workers to resume experience collection... (45150 times) [2024-06-15 23:04:50,882][1652475] InferenceWorker_p0-w0: resuming experience collection (45150 times) [2024-06-15 23:04:52,769][1652475] Updated weights for policy 0, policy_version 878070 (0.0013) [2024-06-15 23:04:55,738][1648984] Fps is (10 sec: 39320.4, 60 sec: 46424.2, 300 sec: 47097.0). Total num frames: 1798307840. Throughput: 0: 11650.8. Samples: 449645056. Policy #0 lag: (min: 30.0, avg: 138.3, max: 271.0) [2024-06-15 23:04:55,739][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 23:04:56,394][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000878112_1798373376.pth... [2024-06-15 23:04:56,536][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000872576_1787035648.pth [2024-06-15 23:04:59,723][1652475] Updated weights for policy 0, policy_version 878148 (0.0012) [2024-06-15 23:05:00,738][1648984] Fps is (10 sec: 45875.4, 60 sec: 46421.3, 300 sec: 47097.1). Total num frames: 1798569984. Throughput: 0: 11821.5. 
Samples: 449717760. Policy #0 lag: (min: 30.0, avg: 138.3, max: 271.0) [2024-06-15 23:05:00,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 23:05:01,099][1652475] Updated weights for policy 0, policy_version 878224 (0.0017) [2024-06-15 23:05:03,075][1652475] Updated weights for policy 0, policy_version 878293 (0.0012) [2024-06-15 23:05:05,738][1648984] Fps is (10 sec: 52430.6, 60 sec: 48059.7, 300 sec: 47319.2). Total num frames: 1798832128. Throughput: 0: 11457.4. Samples: 449743872. Policy #0 lag: (min: 30.0, avg: 138.3, max: 271.0) [2024-06-15 23:05:05,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 23:05:07,506][1652475] Updated weights for policy 0, policy_version 878341 (0.0014) [2024-06-15 23:05:10,738][1648984] Fps is (10 sec: 39321.3, 60 sec: 45329.0, 300 sec: 47097.1). Total num frames: 1798963200. Throughput: 0: 11673.6. Samples: 449821696. Policy #0 lag: (min: 30.0, avg: 138.3, max: 271.0) [2024-06-15 23:05:10,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 23:05:10,866][1652475] Updated weights for policy 0, policy_version 878416 (0.0012) [2024-06-15 23:05:12,276][1652475] Updated weights for policy 0, policy_version 878480 (0.0033) [2024-06-15 23:05:14,307][1652475] Updated weights for policy 0, policy_version 878545 (0.0010) [2024-06-15 23:05:15,408][1652475] Updated weights for policy 0, policy_version 878592 (0.0011) [2024-06-15 23:05:15,738][1648984] Fps is (10 sec: 52428.5, 60 sec: 48059.8, 300 sec: 47541.4). Total num frames: 1799356416. Throughput: 0: 11343.7. Samples: 449879040. Policy #0 lag: (min: 30.0, avg: 138.3, max: 271.0) [2024-06-15 23:05:15,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 23:05:19,907][1652475] Updated weights for policy 0, policy_version 878654 (0.0015) [2024-06-15 23:05:20,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 45875.2, 300 sec: 47097.1). Total num frames: 1799487488. Throughput: 0: 11662.3. Samples: 449921536. 
Policy #0 lag: (min: 30.0, avg: 138.3, max: 271.0) [2024-06-15 23:05:20,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 23:05:23,113][1652475] Updated weights for policy 0, policy_version 878704 (0.0012) [2024-06-15 23:05:24,715][1652475] Updated weights for policy 0, policy_version 878754 (0.0034) [2024-06-15 23:05:25,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 45875.1, 300 sec: 47208.1). Total num frames: 1799749632. Throughput: 0: 11628.1. Samples: 449992192. Policy #0 lag: (min: 30.0, avg: 138.3, max: 271.0) [2024-06-15 23:05:25,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 23:05:26,363][1652475] Updated weights for policy 0, policy_version 878800 (0.0012) [2024-06-15 23:05:27,396][1652475] Updated weights for policy 0, policy_version 878846 (0.0013) [2024-06-15 23:05:30,217][1652475] Updated weights for policy 0, policy_version 878903 (0.0012) [2024-06-15 23:05:30,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 46967.5, 300 sec: 47097.1). Total num frames: 1800011776. Throughput: 0: 11685.0. Samples: 450063360. Policy #0 lag: (min: 30.0, avg: 138.3, max: 271.0) [2024-06-15 23:05:30,740][1648984] Avg episode reward: [(0, '-0.250')] [2024-06-15 23:05:32,957][1652475] Updated weights for policy 0, policy_version 878949 (0.0012) [2024-06-15 23:05:33,472][1651340] Signal inference workers to stop experience collection... (45200 times) [2024-06-15 23:05:33,528][1652475] InferenceWorker_p0-w0: stopping experience collection (45200 times) [2024-06-15 23:05:33,639][1651340] Signal inference workers to resume experience collection... (45200 times) [2024-06-15 23:05:33,639][1652475] InferenceWorker_p0-w0: resuming experience collection (45200 times) [2024-06-15 23:05:33,815][1652475] Updated weights for policy 0, policy_version 878980 (0.0080) [2024-06-15 23:05:35,051][1652475] Updated weights for policy 0, policy_version 879035 (0.0012) [2024-06-15 23:05:35,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 48059.9, 300 sec: 47541.4). 
Total num frames: 1800273920. Throughput: 0: 11605.3. Samples: 450104320. Policy #0 lag: (min: 30.0, avg: 138.3, max: 271.0) [2024-06-15 23:05:35,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 23:05:38,613][1652475] Updated weights for policy 0, policy_version 879088 (0.0012) [2024-06-15 23:05:39,981][1652475] Updated weights for policy 0, policy_version 879139 (0.0011) [2024-06-15 23:05:40,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 48059.7, 300 sec: 47541.4). Total num frames: 1800536064. Throughput: 0: 11980.9. Samples: 450184192. Policy #0 lag: (min: 30.0, avg: 138.3, max: 271.0) [2024-06-15 23:05:40,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 23:05:42,794][1652475] Updated weights for policy 0, policy_version 879185 (0.0012) [2024-06-15 23:05:44,476][1652475] Updated weights for policy 0, policy_version 879250 (0.0011) [2024-06-15 23:05:45,483][1652475] Updated weights for policy 0, policy_version 879293 (0.0012) [2024-06-15 23:05:45,738][1648984] Fps is (10 sec: 52429.2, 60 sec: 48059.8, 300 sec: 47541.4). Total num frames: 1800798208. Throughput: 0: 11696.4. Samples: 450244096. Policy #0 lag: (min: 30.0, avg: 138.3, max: 271.0) [2024-06-15 23:05:45,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 23:05:50,010][1652475] Updated weights for policy 0, policy_version 879348 (0.0010) [2024-06-15 23:05:50,738][1648984] Fps is (10 sec: 42598.4, 60 sec: 47513.6, 300 sec: 47208.2). Total num frames: 1800962048. Throughput: 0: 12219.7. Samples: 450293760. 
Policy #0 lag: (min: 15.0, avg: 80.8, max: 223.0) [2024-06-15 23:05:50,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 23:05:51,148][1652475] Updated weights for policy 0, policy_version 879414 (0.0011) [2024-06-15 23:05:53,824][1652475] Updated weights for policy 0, policy_version 879472 (0.0012) [2024-06-15 23:05:54,657][1652475] Updated weights for policy 0, policy_version 879504 (0.0012) [2024-06-15 23:05:55,672][1652475] Updated weights for policy 0, policy_version 879550 (0.0010) [2024-06-15 23:05:55,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 50244.6, 300 sec: 47763.5). Total num frames: 1801322496. Throughput: 0: 12151.5. Samples: 450368512. Policy #0 lag: (min: 15.0, avg: 80.8, max: 223.0) [2024-06-15 23:05:55,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 23:05:59,528][1652475] Updated weights for policy 0, policy_version 879602 (0.0033) [2024-06-15 23:06:00,486][1652475] Updated weights for policy 0, policy_version 879651 (0.0011) [2024-06-15 23:06:00,738][1648984] Fps is (10 sec: 58982.3, 60 sec: 49698.1, 300 sec: 47874.6). Total num frames: 1801551872. Throughput: 0: 12674.8. Samples: 450449408. Policy #0 lag: (min: 15.0, avg: 80.8, max: 223.0) [2024-06-15 23:06:00,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 23:06:02,751][1652475] Updated weights for policy 0, policy_version 879696 (0.0010) [2024-06-15 23:06:03,702][1652475] Updated weights for policy 0, policy_version 879735 (0.0011) [2024-06-15 23:06:05,105][1652475] Updated weights for policy 0, policy_version 879776 (0.0012) [2024-06-15 23:06:05,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 50244.2, 300 sec: 47985.7). Total num frames: 1801846784. Throughput: 0: 12686.2. Samples: 450492416. 
Policy #0 lag: (min: 15.0, avg: 80.8, max: 223.0) [2024-06-15 23:06:05,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 23:06:07,294][1652475] Updated weights for policy 0, policy_version 879811 (0.0013) [2024-06-15 23:06:09,000][1652475] Updated weights for policy 0, policy_version 879889 (0.0011) [2024-06-15 23:06:09,385][1651340] Signal inference workers to stop experience collection... (45250 times) [2024-06-15 23:06:09,445][1652475] InferenceWorker_p0-w0: stopping experience collection (45250 times) [2024-06-15 23:06:09,647][1651340] Signal inference workers to resume experience collection... (45250 times) [2024-06-15 23:06:09,647][1652475] InferenceWorker_p0-w0: resuming experience collection (45250 times) [2024-06-15 23:06:10,738][1648984] Fps is (10 sec: 55705.0, 60 sec: 52428.7, 300 sec: 47985.7). Total num frames: 1802108928. Throughput: 0: 12868.2. Samples: 450571264. Policy #0 lag: (min: 15.0, avg: 80.8, max: 223.0) [2024-06-15 23:06:10,739][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 23:06:12,613][1652475] Updated weights for policy 0, policy_version 879952 (0.0017) [2024-06-15 23:06:14,850][1652475] Updated weights for policy 0, policy_version 880001 (0.0012) [2024-06-15 23:06:15,738][1648984] Fps is (10 sec: 49151.8, 60 sec: 49698.1, 300 sec: 47985.7). Total num frames: 1802338304. Throughput: 0: 13209.6. Samples: 450657792. Policy #0 lag: (min: 15.0, avg: 80.8, max: 223.0) [2024-06-15 23:06:15,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 23:06:16,541][1652475] Updated weights for policy 0, policy_version 880067 (0.0016) [2024-06-15 23:06:17,771][1652475] Updated weights for policy 0, policy_version 880122 (0.0029) [2024-06-15 23:06:19,502][1652475] Updated weights for policy 0, policy_version 880183 (0.0079) [2024-06-15 23:06:20,738][1648984] Fps is (10 sec: 52429.4, 60 sec: 52428.8, 300 sec: 47985.7). Total num frames: 1802633216. Throughput: 0: 13095.8. Samples: 450693632. 
Policy #0 lag: (min: 15.0, avg: 80.8, max: 223.0) [2024-06-15 23:06:20,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 23:06:23,714][1652475] Updated weights for policy 0, policy_version 880230 (0.0010) [2024-06-15 23:06:24,801][1652475] Updated weights for policy 0, policy_version 880294 (0.0011) [2024-06-15 23:06:25,738][1648984] Fps is (10 sec: 55705.9, 60 sec: 52428.8, 300 sec: 48430.0). Total num frames: 1802895360. Throughput: 0: 13221.0. Samples: 450779136. Policy #0 lag: (min: 15.0, avg: 80.8, max: 223.0) [2024-06-15 23:06:25,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 23:06:27,108][1652475] Updated weights for policy 0, policy_version 880375 (0.0148) [2024-06-15 23:06:29,260][1652475] Updated weights for policy 0, policy_version 880436 (0.0011) [2024-06-15 23:06:30,738][1648984] Fps is (10 sec: 52428.5, 60 sec: 52428.7, 300 sec: 48096.7). Total num frames: 1803157504. Throughput: 0: 13437.1. Samples: 450848768. Policy #0 lag: (min: 15.0, avg: 80.8, max: 223.0) [2024-06-15 23:06:30,739][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 23:06:33,511][1652475] Updated weights for policy 0, policy_version 880464 (0.0010) [2024-06-15 23:06:34,407][1652475] Updated weights for policy 0, policy_version 880511 (0.0013) [2024-06-15 23:06:35,704][1652475] Updated weights for policy 0, policy_version 880572 (0.0012) [2024-06-15 23:06:35,738][1648984] Fps is (10 sec: 49151.9, 60 sec: 51882.7, 300 sec: 48318.9). Total num frames: 1803386880. Throughput: 0: 13448.5. Samples: 450898944. 
Policy #0 lag: (min: 15.0, avg: 80.8, max: 223.0) [2024-06-15 23:06:35,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 23:06:36,897][1652475] Updated weights for policy 0, policy_version 880611 (0.0010) [2024-06-15 23:06:38,154][1652475] Updated weights for policy 0, policy_version 880657 (0.0011) [2024-06-15 23:06:39,227][1652475] Updated weights for policy 0, policy_version 880704 (0.0079) [2024-06-15 23:06:40,738][1648984] Fps is (10 sec: 52429.1, 60 sec: 52428.8, 300 sec: 48207.8). Total num frames: 1803681792. Throughput: 0: 13186.8. Samples: 450961920. Policy #0 lag: (min: 15.0, avg: 80.8, max: 223.0) [2024-06-15 23:06:40,738][1648984] Avg episode reward: [(0, '-0.550')] [2024-06-15 23:06:44,011][1652475] Updated weights for policy 0, policy_version 880764 (0.0010) [2024-06-15 23:06:45,738][1648984] Fps is (10 sec: 42598.6, 60 sec: 50244.3, 300 sec: 47985.7). Total num frames: 1803812864. Throughput: 0: 13391.7. Samples: 451052032. Policy #0 lag: (min: 15.0, avg: 80.8, max: 223.0) [2024-06-15 23:06:45,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 23:06:47,006][1652475] Updated weights for policy 0, policy_version 880816 (0.0011) [2024-06-15 23:06:48,312][1652475] Updated weights for policy 0, policy_version 880867 (0.0010) [2024-06-15 23:06:48,685][1651340] Signal inference workers to stop experience collection... (45300 times) [2024-06-15 23:06:48,745][1652475] InferenceWorker_p0-w0: stopping experience collection (45300 times) [2024-06-15 23:06:48,955][1651340] Signal inference workers to resume experience collection... (45300 times) [2024-06-15 23:06:48,956][1652475] InferenceWorker_p0-w0: resuming experience collection (45300 times) [2024-06-15 23:06:50,177][1652475] Updated weights for policy 0, policy_version 880944 (0.0011) [2024-06-15 23:06:50,738][1648984] Fps is (10 sec: 52429.2, 60 sec: 54067.3, 300 sec: 48652.2). Total num frames: 1804206080. Throughput: 0: 13243.8. Samples: 451088384. 
Policy #0 lag: (min: 15.0, avg: 80.8, max: 223.0) [2024-06-15 23:06:50,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 23:06:53,771][1652475] Updated weights for policy 0, policy_version 881008 (0.0055) [2024-06-15 23:06:55,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 50244.2, 300 sec: 48096.8). Total num frames: 1804337152. Throughput: 0: 13095.9. Samples: 451160576. Policy #0 lag: (min: 15.0, avg: 80.8, max: 223.0) [2024-06-15 23:06:55,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 23:06:55,742][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000881024_1804337152.pth... [2024-06-15 23:06:55,947][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000875328_1792671744.pth [2024-06-15 23:06:57,223][1652475] Updated weights for policy 0, policy_version 881073 (0.0011) [2024-06-15 23:06:58,510][1652475] Updated weights for policy 0, policy_version 881124 (0.0012) [2024-06-15 23:07:00,096][1652475] Updated weights for policy 0, policy_version 881189 (0.0091) [2024-06-15 23:07:00,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 52975.0, 300 sec: 48874.4). Total num frames: 1804730368. Throughput: 0: 12697.6. Samples: 451229184. Policy #0 lag: (min: 15.0, avg: 80.8, max: 223.0) [2024-06-15 23:07:00,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 23:07:02,948][1652475] Updated weights for policy 0, policy_version 881236 (0.0010) [2024-06-15 23:07:05,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 50244.3, 300 sec: 48430.1). Total num frames: 1804861440. Throughput: 0: 12925.2. Samples: 451275264. 
Policy #0 lag: (min: 15.0, avg: 80.8, max: 223.0) [2024-06-15 23:07:05,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 23:07:06,637][1652475] Updated weights for policy 0, policy_version 881298 (0.0012) [2024-06-15 23:07:08,068][1652475] Updated weights for policy 0, policy_version 881360 (0.0012) [2024-06-15 23:07:09,869][1652475] Updated weights for policy 0, policy_version 881426 (0.0011) [2024-06-15 23:07:10,740][1648984] Fps is (10 sec: 52427.9, 60 sec: 52428.8, 300 sec: 48985.4). Total num frames: 1805254656. Throughput: 0: 12720.3. Samples: 451351552. Policy #0 lag: (min: 15.0, avg: 80.8, max: 223.0) [2024-06-15 23:07:10,741][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 23:07:12,715][1652475] Updated weights for policy 0, policy_version 881498 (0.0014) [2024-06-15 23:07:13,491][1652475] Updated weights for policy 0, policy_version 881536 (0.0010) [2024-06-15 23:07:15,738][1648984] Fps is (10 sec: 52429.2, 60 sec: 50790.5, 300 sec: 48652.2). Total num frames: 1805385728. Throughput: 0: 13050.4. Samples: 451436032. Policy #0 lag: (min: 15.0, avg: 80.8, max: 223.0) [2024-06-15 23:07:15,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 23:07:18,078][1652475] Updated weights for policy 0, policy_version 881600 (0.0011) [2024-06-15 23:07:19,838][1652475] Updated weights for policy 0, policy_version 881680 (0.0013) [2024-06-15 23:07:20,664][1652475] Updated weights for policy 0, policy_version 881720 (0.0136) [2024-06-15 23:07:20,738][1648984] Fps is (10 sec: 49152.7, 60 sec: 51882.7, 300 sec: 49096.5). Total num frames: 1805746176. Throughput: 0: 12800.0. Samples: 451474944. Policy #0 lag: (min: 15.0, avg: 80.8, max: 223.0) [2024-06-15 23:07:20,738][1648984] Avg episode reward: [(0, '-0.230')] [2024-06-15 23:07:23,544][1652475] Updated weights for policy 0, policy_version 881776 (0.0088) [2024-06-15 23:07:25,738][1648984] Fps is (10 sec: 52427.6, 60 sec: 50244.1, 300 sec: 48874.3). Total num frames: 1805910016. 
Throughput: 0: 12834.1. Samples: 451539456. Policy #0 lag: (min: 15.0, avg: 80.8, max: 223.0) [2024-06-15 23:07:25,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 23:07:27,037][1651340] Signal inference workers to stop experience collection... (45350 times) [2024-06-15 23:07:27,080][1652475] InferenceWorker_p0-w0: stopping experience collection (45350 times) [2024-06-15 23:07:27,191][1651340] Signal inference workers to resume experience collection... (45350 times) [2024-06-15 23:07:27,192][1652475] InferenceWorker_p0-w0: resuming experience collection (45350 times) [2024-06-15 23:07:27,529][1652475] Updated weights for policy 0, policy_version 881842 (0.0020) [2024-06-15 23:07:28,655][1652475] Updated weights for policy 0, policy_version 881890 (0.0010) [2024-06-15 23:07:29,950][1652475] Updated weights for policy 0, policy_version 881936 (0.0010) [2024-06-15 23:07:30,739][1648984] Fps is (10 sec: 52428.1, 60 sec: 51882.7, 300 sec: 49207.6). Total num frames: 1806270464. Throughput: 0: 12697.6. Samples: 451623424. Policy #0 lag: (min: 15.0, avg: 80.8, max: 223.0) [2024-06-15 23:07:30,740][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 23:07:32,504][1652475] Updated weights for policy 0, policy_version 882000 (0.0014) [2024-06-15 23:07:35,738][1648984] Fps is (10 sec: 52429.7, 60 sec: 50790.4, 300 sec: 48985.4). Total num frames: 1806434304. Throughput: 0: 12686.2. Samples: 451659264. Policy #0 lag: (min: 15.0, avg: 80.8, max: 223.0) [2024-06-15 23:07:35,738][1648984] Avg episode reward: [(0, '-0.220')] [2024-06-15 23:07:36,249][1652475] Updated weights for policy 0, policy_version 882050 (0.0013) [2024-06-15 23:07:37,664][1652475] Updated weights for policy 0, policy_version 882128 (0.0012) [2024-06-15 23:07:40,738][1648984] Fps is (10 sec: 45875.9, 60 sec: 50790.5, 300 sec: 49096.5). Total num frames: 1806729216. Throughput: 0: 12970.7. Samples: 451744256. 
Policy #0 lag: (min: 15.0, avg: 80.8, max: 223.0) [2024-06-15 23:07:40,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 23:07:40,969][1652475] Updated weights for policy 0, policy_version 882208 (0.0012) [2024-06-15 23:07:42,828][1652475] Updated weights for policy 0, policy_version 882256 (0.0036) [2024-06-15 23:07:43,773][1652475] Updated weights for policy 0, policy_version 882296 (0.0009) [2024-06-15 23:07:45,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 52428.8, 300 sec: 49096.4). Total num frames: 1806958592. Throughput: 0: 13334.7. Samples: 451829248. Policy #0 lag: (min: 63.0, avg: 184.1, max: 319.0) [2024-06-15 23:07:45,738][1648984] Avg episode reward: [(0, '-0.250')] [2024-06-15 23:07:45,794][1652475] Updated weights for policy 0, policy_version 882320 (0.0017) [2024-06-15 23:07:47,552][1652475] Updated weights for policy 0, policy_version 882402 (0.0029) [2024-06-15 23:07:49,845][1652475] Updated weights for policy 0, policy_version 882435 (0.0011) [2024-06-15 23:07:50,738][1648984] Fps is (10 sec: 55705.6, 60 sec: 51336.5, 300 sec: 49540.8). Total num frames: 1807286272. Throughput: 0: 13141.4. Samples: 451866624. Policy #0 lag: (min: 63.0, avg: 184.1, max: 319.0) [2024-06-15 23:07:50,738][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 23:07:51,084][1652475] Updated weights for policy 0, policy_version 882496 (0.0011) [2024-06-15 23:07:54,071][1652475] Updated weights for policy 0, policy_version 882554 (0.0011) [2024-06-15 23:07:55,738][1648984] Fps is (10 sec: 55705.4, 60 sec: 52974.9, 300 sec: 49429.7). Total num frames: 1807515648. Throughput: 0: 13255.1. Samples: 451948032. 
Policy #0 lag: (min: 63.0, avg: 184.1, max: 319.0) [2024-06-15 23:07:55,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 23:07:56,569][1652475] Updated weights for policy 0, policy_version 882624 (0.0086) [2024-06-15 23:07:57,777][1652475] Updated weights for policy 0, policy_version 882686 (0.0012) [2024-06-15 23:08:00,408][1652475] Updated weights for policy 0, policy_version 882750 (0.0073) [2024-06-15 23:08:00,738][1648984] Fps is (10 sec: 58982.4, 60 sec: 52428.8, 300 sec: 49874.0). Total num frames: 1807876096. Throughput: 0: 13141.3. Samples: 452027392. Policy #0 lag: (min: 63.0, avg: 184.1, max: 319.0) [2024-06-15 23:08:00,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 23:08:03,762][1651340] Signal inference workers to stop experience collection... (45400 times) [2024-06-15 23:08:03,836][1652475] InferenceWorker_p0-w0: stopping experience collection (45400 times) [2024-06-15 23:08:04,031][1651340] Signal inference workers to resume experience collection... (45400 times) [2024-06-15 23:08:04,032][1652475] InferenceWorker_p0-w0: resuming experience collection (45400 times) [2024-06-15 23:08:04,200][1652475] Updated weights for policy 0, policy_version 882811 (0.0012) [2024-06-15 23:08:05,532][1652475] Updated weights for policy 0, policy_version 882864 (0.0011) [2024-06-15 23:08:05,738][1648984] Fps is (10 sec: 58983.0, 60 sec: 54067.3, 300 sec: 49651.9). Total num frames: 1808105472. Throughput: 0: 13414.4. Samples: 452078592. Policy #0 lag: (min: 63.0, avg: 184.1, max: 319.0) [2024-06-15 23:08:05,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 23:08:07,112][1652475] Updated weights for policy 0, policy_version 882941 (0.0065) [2024-06-15 23:08:09,612][1652475] Updated weights for policy 0, policy_version 883005 (0.0072) [2024-06-15 23:08:10,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 52428.9, 300 sec: 50207.2). Total num frames: 1808400384. Throughput: 0: 13596.5. Samples: 452151296. 
Policy #0 lag: (min: 63.0, avg: 184.1, max: 319.0) [2024-06-15 23:08:10,740][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 23:08:13,756][1652475] Updated weights for policy 0, policy_version 883061 (0.0011) [2024-06-15 23:08:14,427][1652475] Updated weights for policy 0, policy_version 883090 (0.0016) [2024-06-15 23:08:15,738][1648984] Fps is (10 sec: 55704.7, 60 sec: 54613.2, 300 sec: 50208.0). Total num frames: 1808662528. Throughput: 0: 13607.8. Samples: 452235776. Policy #0 lag: (min: 63.0, avg: 184.1, max: 319.0) [2024-06-15 23:08:15,738][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 23:08:16,238][1652475] Updated weights for policy 0, policy_version 883168 (0.0205) [2024-06-15 23:08:17,070][1652475] Updated weights for policy 0, policy_version 883200 (0.0018) [2024-06-15 23:08:19,825][1652475] Updated weights for policy 0, policy_version 883264 (0.0098) [2024-06-15 23:08:20,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 52974.9, 300 sec: 50207.3). Total num frames: 1808924672. Throughput: 0: 13585.1. Samples: 452270592. Policy #0 lag: (min: 63.0, avg: 184.1, max: 319.0) [2024-06-15 23:08:20,738][1648984] Avg episode reward: [(0, '-0.200')] [2024-06-15 23:08:23,322][1652475] Updated weights for policy 0, policy_version 883320 (0.0012) [2024-06-15 23:08:25,673][1652475] Updated weights for policy 0, policy_version 883369 (0.0011) [2024-06-15 23:08:25,738][1648984] Fps is (10 sec: 45875.5, 60 sec: 53521.2, 300 sec: 49985.1). Total num frames: 1809121280. Throughput: 0: 13596.4. Samples: 452356096. Policy #0 lag: (min: 63.0, avg: 184.1, max: 319.0) [2024-06-15 23:08:25,738][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 23:08:27,244][1652475] Updated weights for policy 0, policy_version 883440 (0.0010) [2024-06-15 23:08:28,919][1652475] Updated weights for policy 0, policy_version 883514 (0.0010) [2024-06-15 23:08:30,738][1648984] Fps is (10 sec: 52428.2, 60 sec: 52975.0, 300 sec: 50208.1). Total num frames: 1809448960. 
Throughput: 0: 13255.1. Samples: 452425728. Policy #0 lag: (min: 63.0, avg: 184.1, max: 319.0) [2024-06-15 23:08:30,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 23:08:32,718][1652475] Updated weights for policy 0, policy_version 883568 (0.0012) [2024-06-15 23:08:35,737][1648984] Fps is (10 sec: 45875.8, 60 sec: 52428.9, 300 sec: 49874.0). Total num frames: 1809580032. Throughput: 0: 13277.9. Samples: 452464128. Policy #0 lag: (min: 63.0, avg: 184.1, max: 319.0) [2024-06-15 23:08:35,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 23:08:36,362][1652475] Updated weights for policy 0, policy_version 883602 (0.0010) [2024-06-15 23:08:38,721][1652475] Updated weights for policy 0, policy_version 883712 (0.0012) [2024-06-15 23:08:39,179][1651340] Signal inference workers to stop experience collection... (45450 times) [2024-06-15 23:08:39,226][1652475] InferenceWorker_p0-w0: stopping experience collection (45450 times) [2024-06-15 23:08:39,449][1651340] Signal inference workers to resume experience collection... (45450 times) [2024-06-15 23:08:39,449][1652475] InferenceWorker_p0-w0: resuming experience collection (45450 times) [2024-06-15 23:08:40,021][1652475] Updated weights for policy 0, policy_version 883766 (0.0010) [2024-06-15 23:08:40,738][1648984] Fps is (10 sec: 52429.2, 60 sec: 54067.2, 300 sec: 50429.4). Total num frames: 1809973248. Throughput: 0: 13084.4. Samples: 452536832. Policy #0 lag: (min: 63.0, avg: 184.1, max: 319.0) [2024-06-15 23:08:40,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 23:08:42,390][1652475] Updated weights for policy 0, policy_version 883810 (0.0010) [2024-06-15 23:08:45,738][1648984] Fps is (10 sec: 52428.4, 60 sec: 52428.8, 300 sec: 49985.1). Total num frames: 1810104320. Throughput: 0: 13095.8. Samples: 452616704. 
Policy #0 lag: (min: 63.0, avg: 184.1, max: 319.0) [2024-06-15 23:08:45,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 23:08:46,997][1652475] Updated weights for policy 0, policy_version 883857 (0.0011) [2024-06-15 23:08:48,739][1652475] Updated weights for policy 0, policy_version 883922 (0.0028) [2024-06-15 23:08:50,088][1652475] Updated weights for policy 0, policy_version 883984 (0.0023) [2024-06-15 23:08:50,738][1648984] Fps is (10 sec: 45874.6, 60 sec: 52428.6, 300 sec: 50541.1). Total num frames: 1810432000. Throughput: 0: 12834.1. Samples: 452656128. Policy #0 lag: (min: 63.0, avg: 184.1, max: 319.0) [2024-06-15 23:08:50,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 23:08:51,904][1652475] Updated weights for policy 0, policy_version 884034 (0.0010) [2024-06-15 23:08:53,028][1652475] Updated weights for policy 0, policy_version 884095 (0.0011) [2024-06-15 23:08:55,738][1648984] Fps is (10 sec: 52428.1, 60 sec: 51882.6, 300 sec: 50318.3). Total num frames: 1810628608. Throughput: 0: 12788.6. Samples: 452726784. Policy #0 lag: (min: 63.0, avg: 184.1, max: 319.0) [2024-06-15 23:08:55,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 23:08:55,756][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000884096_1810628608.pth... [2024-06-15 23:08:55,816][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000878112_1798373376.pth [2024-06-15 23:08:58,832][1652475] Updated weights for policy 0, policy_version 884176 (0.0011) [2024-06-15 23:09:00,100][1652475] Updated weights for policy 0, policy_version 884226 (0.0010) [2024-06-15 23:09:00,738][1648984] Fps is (10 sec: 52429.7, 60 sec: 51336.5, 300 sec: 50873.7). Total num frames: 1810956288. Throughput: 0: 12652.1. Samples: 452805120. 
Policy #0 lag: (min: 63.0, avg: 184.1, max: 319.0) [2024-06-15 23:09:00,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 23:09:01,892][1652475] Updated weights for policy 0, policy_version 884291 (0.0012) [2024-06-15 23:09:03,112][1652475] Updated weights for policy 0, policy_version 884339 (0.0012) [2024-06-15 23:09:05,759][1648984] Fps is (10 sec: 52318.9, 60 sec: 50772.5, 300 sec: 50536.9). Total num frames: 1811152896. Throughput: 0: 12600.7. Samples: 452837888. Policy #0 lag: (min: 63.0, avg: 184.1, max: 319.0) [2024-06-15 23:09:05,759][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 23:09:07,786][1652475] Updated weights for policy 0, policy_version 884371 (0.0010) [2024-06-15 23:09:09,267][1652475] Updated weights for policy 0, policy_version 884432 (0.0011) [2024-06-15 23:09:10,738][1648984] Fps is (10 sec: 49152.2, 60 sec: 50790.5, 300 sec: 50762.7). Total num frames: 1811447808. Throughput: 0: 12561.1. Samples: 452921344. Policy #0 lag: (min: 63.0, avg: 184.1, max: 319.0) [2024-06-15 23:09:10,741][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 23:09:11,176][1652475] Updated weights for policy 0, policy_version 884513 (0.0020) [2024-06-15 23:09:11,810][1652475] Updated weights for policy 0, policy_version 884544 (0.0009) [2024-06-15 23:09:13,376][1652475] Updated weights for policy 0, policy_version 884606 (0.0011) [2024-06-15 23:09:15,738][1648984] Fps is (10 sec: 52539.6, 60 sec: 50244.3, 300 sec: 50651.6). Total num frames: 1811677184. Throughput: 0: 12686.2. Samples: 452996608. Policy #0 lag: (min: 63.0, avg: 184.1, max: 319.0) [2024-06-15 23:09:15,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 23:09:18,498][1652475] Updated weights for policy 0, policy_version 884665 (0.0080) [2024-06-15 23:09:19,176][1651340] Signal inference workers to stop experience collection... 
(45500 times) [2024-06-15 23:09:19,218][1652475] InferenceWorker_p0-w0: stopping experience collection (45500 times) [2024-06-15 23:09:19,405][1651340] Signal inference workers to resume experience collection... (45500 times) [2024-06-15 23:09:19,406][1652475] InferenceWorker_p0-w0: resuming experience collection (45500 times) [2024-06-15 23:09:20,296][1652475] Updated weights for policy 0, policy_version 884724 (0.0105) [2024-06-15 23:09:20,738][1648984] Fps is (10 sec: 49150.9, 60 sec: 50244.1, 300 sec: 50651.5). Total num frames: 1811939328. Throughput: 0: 12788.5. Samples: 453039616. Policy #0 lag: (min: 63.0, avg: 184.1, max: 319.0) [2024-06-15 23:09:20,739][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 23:09:21,909][1652475] Updated weights for policy 0, policy_version 884796 (0.0010) [2024-06-15 23:09:23,673][1652475] Updated weights for policy 0, policy_version 884864 (0.0013) [2024-06-15 23:09:25,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 51336.5, 300 sec: 50873.7). Total num frames: 1812201472. Throughput: 0: 12561.1. Samples: 453102080. Policy #0 lag: (min: 63.0, avg: 184.1, max: 319.0) [2024-06-15 23:09:25,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 23:09:28,005][1652475] Updated weights for policy 0, policy_version 884922 (0.0012) [2024-06-15 23:09:29,644][1652475] Updated weights for policy 0, policy_version 884976 (0.0012) [2024-06-15 23:09:30,739][1648984] Fps is (10 sec: 52429.3, 60 sec: 50244.3, 300 sec: 51095.9). Total num frames: 1812463616. Throughput: 0: 12765.8. Samples: 453191168. Policy #0 lag: (min: 63.0, avg: 184.1, max: 319.0) [2024-06-15 23:09:30,740][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 23:09:31,622][1652475] Updated weights for policy 0, policy_version 885024 (0.0010) [2024-06-15 23:09:33,354][1652475] Updated weights for policy 0, policy_version 885104 (0.0012) [2024-06-15 23:09:35,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 52428.7, 300 sec: 51095.9). 
Total num frames: 1812725760. Throughput: 0: 12629.4. Samples: 453224448. Policy #0 lag: (min: 63.0, avg: 184.1, max: 319.0) [2024-06-15 23:09:35,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 23:09:36,555][1652475] Updated weights for policy 0, policy_version 885139 (0.0020) [2024-06-15 23:09:37,485][1652475] Updated weights for policy 0, policy_version 885183 (0.0011) [2024-06-15 23:09:39,012][1652475] Updated weights for policy 0, policy_version 885234 (0.0108) [2024-06-15 23:09:40,738][1648984] Fps is (10 sec: 55705.7, 60 sec: 50790.4, 300 sec: 51206.9). Total num frames: 1813020672. Throughput: 0: 13118.6. Samples: 453317120. Policy #0 lag: (min: 12.0, avg: 108.1, max: 268.0) [2024-06-15 23:09:40,738][1648984] Avg episode reward: [(0, '-0.230')] [2024-06-15 23:09:40,890][1652475] Updated weights for policy 0, policy_version 885267 (0.0011) [2024-06-15 23:09:41,950][1652475] Updated weights for policy 0, policy_version 885328 (0.0012) [2024-06-15 23:09:43,048][1652475] Updated weights for policy 0, policy_version 885376 (0.0012) [2024-06-15 23:09:45,738][1648984] Fps is (10 sec: 55705.9, 60 sec: 52974.9, 300 sec: 51429.1). Total num frames: 1813282816. Throughput: 0: 13266.5. Samples: 453402112. Policy #0 lag: (min: 12.0, avg: 108.1, max: 268.0) [2024-06-15 23:09:45,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 23:09:46,417][1652475] Updated weights for policy 0, policy_version 885431 (0.0011) [2024-06-15 23:09:48,109][1652475] Updated weights for policy 0, policy_version 885488 (0.0030) [2024-06-15 23:09:50,387][1652475] Updated weights for policy 0, policy_version 885536 (0.0015) [2024-06-15 23:09:50,738][1648984] Fps is (10 sec: 58982.2, 60 sec: 52975.0, 300 sec: 51873.5). Total num frames: 1813610496. Throughput: 0: 13261.3. Samples: 453434368. 
Policy #0 lag: (min: 12.0, avg: 108.1, max: 268.0) [2024-06-15 23:09:50,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 23:09:52,714][1652475] Updated weights for policy 0, policy_version 885600 (0.0010) [2024-06-15 23:09:55,180][1652475] Updated weights for policy 0, policy_version 885648 (0.0014) [2024-06-15 23:09:55,738][1648984] Fps is (10 sec: 55705.7, 60 sec: 53521.2, 300 sec: 51762.3). Total num frames: 1813839872. Throughput: 0: 13312.0. Samples: 453520384. Policy #0 lag: (min: 12.0, avg: 108.1, max: 268.0) [2024-06-15 23:09:55,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 23:09:55,755][1651340] Signal inference workers to stop experience collection... (45550 times) [2024-06-15 23:09:55,777][1652475] InferenceWorker_p0-w0: stopping experience collection (45550 times) [2024-06-15 23:09:56,090][1651340] Signal inference workers to resume experience collection... (45550 times) [2024-06-15 23:09:56,091][1652475] InferenceWorker_p0-w0: resuming experience collection (45550 times) [2024-06-15 23:09:56,432][1652475] Updated weights for policy 0, policy_version 885696 (0.0011) [2024-06-15 23:09:57,810][1652475] Updated weights for policy 0, policy_version 885747 (0.0012) [2024-06-15 23:10:00,001][1652475] Updated weights for policy 0, policy_version 885778 (0.0033) [2024-06-15 23:10:00,620][1652475] Updated weights for policy 0, policy_version 885822 (0.0013) [2024-06-15 23:10:00,738][1648984] Fps is (10 sec: 55706.1, 60 sec: 53521.1, 300 sec: 51984.5). Total num frames: 1814167552. Throughput: 0: 13459.9. Samples: 453602304. Policy #0 lag: (min: 12.0, avg: 108.1, max: 268.0) [2024-06-15 23:10:00,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 23:10:02,961][1652475] Updated weights for policy 0, policy_version 885862 (0.0021) [2024-06-15 23:10:04,711][1652475] Updated weights for policy 0, policy_version 885907 (0.0010) [2024-06-15 23:10:05,738][1648984] Fps is (10 sec: 55705.8, 60 sec: 54086.3, 300 sec: 52317.7). 
Total num frames: 1814396928. Throughput: 0: 13437.2. Samples: 453644288. Policy #0 lag: (min: 12.0, avg: 108.1, max: 268.0) [2024-06-15 23:10:05,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 23:10:05,754][1652475] Updated weights for policy 0, policy_version 885948 (0.0033) [2024-06-15 23:10:07,045][1652475] Updated weights for policy 0, policy_version 886005 (0.0015) [2024-06-15 23:10:09,690][1652475] Updated weights for policy 0, policy_version 886048 (0.0011) [2024-06-15 23:10:10,738][1648984] Fps is (10 sec: 52428.4, 60 sec: 54067.1, 300 sec: 51984.5). Total num frames: 1814691840. Throughput: 0: 13903.6. Samples: 453727744. Policy #0 lag: (min: 12.0, avg: 108.1, max: 268.0) [2024-06-15 23:10:10,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 23:10:12,448][1652475] Updated weights for policy 0, policy_version 886112 (0.0098) [2024-06-15 23:10:15,738][1648984] Fps is (10 sec: 52428.1, 60 sec: 54067.1, 300 sec: 52317.7). Total num frames: 1814921216. Throughput: 0: 13698.8. Samples: 453807616. Policy #0 lag: (min: 12.0, avg: 108.1, max: 268.0) [2024-06-15 23:10:15,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 23:10:15,980][1652475] Updated weights for policy 0, policy_version 886208 (0.0023) [2024-06-15 23:10:17,235][1652475] Updated weights for policy 0, policy_version 886263 (0.0010) [2024-06-15 23:10:19,883][1652475] Updated weights for policy 0, policy_version 886325 (0.0011) [2024-06-15 23:10:20,738][1648984] Fps is (10 sec: 52428.4, 60 sec: 54613.3, 300 sec: 52428.8). Total num frames: 1815216128. Throughput: 0: 13698.8. Samples: 453840896. 
Policy #0 lag: (min: 12.0, avg: 108.1, max: 268.0) [2024-06-15 23:10:20,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 23:10:21,191][1652475] Updated weights for policy 0, policy_version 886340 (0.0009) [2024-06-15 23:10:22,512][1652475] Updated weights for policy 0, policy_version 886397 (0.0107) [2024-06-15 23:10:25,737][1648984] Fps is (10 sec: 49153.1, 60 sec: 53521.2, 300 sec: 52206.7). Total num frames: 1815412736. Throughput: 0: 13551.0. Samples: 453926912. Policy #0 lag: (min: 12.0, avg: 108.1, max: 268.0) [2024-06-15 23:10:25,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 23:10:26,017][1652475] Updated weights for policy 0, policy_version 886455 (0.0010) [2024-06-15 23:10:27,163][1652475] Updated weights for policy 0, policy_version 886483 (0.0011) [2024-06-15 23:10:29,424][1652475] Updated weights for policy 0, policy_version 886583 (0.0132) [2024-06-15 23:10:30,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 54613.3, 300 sec: 52428.8). Total num frames: 1815740416. Throughput: 0: 13129.9. Samples: 453992960. Policy #0 lag: (min: 12.0, avg: 108.1, max: 268.0) [2024-06-15 23:10:30,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 23:10:31,675][1652475] Updated weights for policy 0, policy_version 886626 (0.0066) [2024-06-15 23:10:35,738][1648984] Fps is (10 sec: 45874.7, 60 sec: 52428.9, 300 sec: 51984.5). Total num frames: 1815871488. Throughput: 0: 13323.4. Samples: 454033920. Policy #0 lag: (min: 12.0, avg: 108.1, max: 268.0) [2024-06-15 23:10:35,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 23:10:36,627][1651340] Signal inference workers to stop experience collection... (45600 times) [2024-06-15 23:10:36,707][1652475] InferenceWorker_p0-w0: stopping experience collection (45600 times) [2024-06-15 23:10:36,708][1652475] Updated weights for policy 0, policy_version 886677 (0.0012) [2024-06-15 23:10:36,890][1651340] Signal inference workers to resume experience collection... 
(45600 times) [2024-06-15 23:10:36,891][1652475] InferenceWorker_p0-w0: resuming experience collection (45600 times) [2024-06-15 23:10:38,018][1652475] Updated weights for policy 0, policy_version 886737 (0.0015) [2024-06-15 23:10:39,453][1652475] Updated weights for policy 0, policy_version 886801 (0.0011) [2024-06-15 23:10:40,738][1648984] Fps is (10 sec: 52429.5, 60 sec: 54067.2, 300 sec: 52428.8). Total num frames: 1816264704. Throughput: 0: 13198.2. Samples: 454114304. Policy #0 lag: (min: 12.0, avg: 108.1, max: 268.0) [2024-06-15 23:10:40,738][1648984] Avg episode reward: [(0, '-0.250')] [2024-06-15 23:10:40,989][1652475] Updated weights for policy 0, policy_version 886865 (0.0011) [2024-06-15 23:10:42,000][1652475] Updated weights for policy 0, policy_version 886912 (0.0010) [2024-06-15 23:10:45,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 51882.7, 300 sec: 52317.7). Total num frames: 1816395776. Throughput: 0: 13198.2. Samples: 454196224. Policy #0 lag: (min: 12.0, avg: 108.1, max: 268.0) [2024-06-15 23:10:45,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 23:10:48,391][1652475] Updated weights for policy 0, policy_version 886980 (0.0010) [2024-06-15 23:10:50,201][1652475] Updated weights for policy 0, policy_version 887056 (0.0015) [2024-06-15 23:10:50,738][1648984] Fps is (10 sec: 45875.3, 60 sec: 51882.8, 300 sec: 52206.6). Total num frames: 1816723456. Throughput: 0: 13107.2. Samples: 454234112. Policy #0 lag: (min: 12.0, avg: 108.1, max: 268.0) [2024-06-15 23:10:50,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 23:10:51,852][1652475] Updated weights for policy 0, policy_version 887120 (0.0029) [2024-06-15 23:10:52,849][1652475] Updated weights for policy 0, policy_version 887160 (0.0023) [2024-06-15 23:10:55,738][1648984] Fps is (10 sec: 52427.4, 60 sec: 51336.3, 300 sec: 52095.5). Total num frames: 1816920064. Throughput: 0: 12617.9. Samples: 454295552. 
Policy #0 lag: (min: 12.0, avg: 108.1, max: 268.0) [2024-06-15 23:10:55,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 23:10:55,741][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000887168_1816920064.pth... [2024-06-15 23:10:55,775][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000881024_1804337152.pth [2024-06-15 23:10:55,780][1651340] Saving a milestone train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/milestones/checkpoint_000887168_1816920064.pth [2024-06-15 23:10:58,258][1652475] Updated weights for policy 0, policy_version 887216 (0.0011) [2024-06-15 23:10:59,531][1652475] Updated weights for policy 0, policy_version 887267 (0.0011) [2024-06-15 23:11:00,738][1648984] Fps is (10 sec: 49151.7, 60 sec: 50790.4, 300 sec: 52095.6). Total num frames: 1817214976. Throughput: 0: 12754.5. Samples: 454381568. Policy #0 lag: (min: 12.0, avg: 108.1, max: 268.0) [2024-06-15 23:11:00,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 23:11:01,405][1652475] Updated weights for policy 0, policy_version 887344 (0.0010) [2024-06-15 23:11:03,231][1652475] Updated weights for policy 0, policy_version 887414 (0.0012) [2024-06-15 23:11:05,738][1648984] Fps is (10 sec: 52430.0, 60 sec: 50790.4, 300 sec: 51984.5). Total num frames: 1817444352. Throughput: 0: 12561.1. Samples: 454406144. Policy #0 lag: (min: 12.0, avg: 108.1, max: 268.0) [2024-06-15 23:11:05,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 23:11:09,648][1652475] Updated weights for policy 0, policy_version 887488 (0.0010) [2024-06-15 23:11:10,687][1651340] Signal inference workers to stop experience collection... (45650 times) [2024-06-15 23:11:10,732][1652475] InferenceWorker_p0-w0: stopping experience collection (45650 times) [2024-06-15 23:11:10,738][1648984] Fps is (10 sec: 42597.4, 60 sec: 49151.8, 300 sec: 51873.4). Total num frames: 1817640960. Throughput: 0: 12697.5. 
Samples: 454498304. Policy #0 lag: (min: 12.0, avg: 108.1, max: 268.0) [2024-06-15 23:11:10,739][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 23:11:10,987][1651340] Signal inference workers to resume experience collection... (45650 times) [2024-06-15 23:11:10,988][1652475] InferenceWorker_p0-w0: resuming experience collection (45650 times) [2024-06-15 23:11:11,182][1652475] Updated weights for policy 0, policy_version 887539 (0.0017) [2024-06-15 23:11:13,482][1652475] Updated weights for policy 0, policy_version 887632 (0.0061) [2024-06-15 23:11:14,487][1652475] Updated weights for policy 0, policy_version 887677 (0.0011) [2024-06-15 23:11:15,738][1648984] Fps is (10 sec: 52428.5, 60 sec: 50790.4, 300 sec: 51984.5). Total num frames: 1817968640. Throughput: 0: 12447.3. Samples: 454553088. Policy #0 lag: (min: 12.0, avg: 108.1, max: 268.0) [2024-06-15 23:11:15,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 23:11:20,122][1652475] Updated weights for policy 0, policy_version 887728 (0.0011) [2024-06-15 23:11:20,738][1648984] Fps is (10 sec: 45876.6, 60 sec: 48059.9, 300 sec: 51540.2). Total num frames: 1818099712. Throughput: 0: 12743.1. Samples: 454607360. Policy #0 lag: (min: 12.0, avg: 108.1, max: 268.0) [2024-06-15 23:11:20,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 23:11:21,526][1652475] Updated weights for policy 0, policy_version 887793 (0.0012) [2024-06-15 23:11:23,538][1652475] Updated weights for policy 0, policy_version 887876 (0.0011) [2024-06-15 23:11:25,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 51336.4, 300 sec: 51984.5). Total num frames: 1818492928. Throughput: 0: 12162.8. Samples: 454661632. Policy #0 lag: (min: 12.0, avg: 108.1, max: 268.0) [2024-06-15 23:11:25,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 23:11:30,006][1652475] Updated weights for policy 0, policy_version 887972 (0.0106) [2024-06-15 23:11:30,738][1648984] Fps is (10 sec: 52428.4, 60 sec: 48059.8, 300 sec: 51651.3). 
Total num frames: 1818624000. Throughput: 0: 12288.0. Samples: 454749184. Policy #0 lag: (min: 6.0, avg: 61.7, max: 262.0) [2024-06-15 23:11:30,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 23:11:31,528][1652475] Updated weights for policy 0, policy_version 888036 (0.0010) [2024-06-15 23:11:33,366][1652475] Updated weights for policy 0, policy_version 888112 (0.0092) [2024-06-15 23:11:35,424][1652475] Updated weights for policy 0, policy_version 888163 (0.0012) [2024-06-15 23:11:35,738][1648984] Fps is (10 sec: 49152.6, 60 sec: 51882.7, 300 sec: 51873.4). Total num frames: 1818984448. Throughput: 0: 12026.3. Samples: 454775296. Policy #0 lag: (min: 6.0, avg: 61.7, max: 262.0) [2024-06-15 23:11:35,738][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 23:11:40,102][1652475] Updated weights for policy 0, policy_version 888241 (0.0012) [2024-06-15 23:11:40,738][1648984] Fps is (10 sec: 55705.9, 60 sec: 48605.9, 300 sec: 52095.6). Total num frames: 1819181056. Throughput: 0: 12674.9. Samples: 454865920. Policy #0 lag: (min: 6.0, avg: 61.7, max: 262.0) [2024-06-15 23:11:40,738][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 23:11:41,254][1652475] Updated weights for policy 0, policy_version 888289 (0.0010) [2024-06-15 23:11:42,249][1652475] Updated weights for policy 0, policy_version 888322 (0.0009) [2024-06-15 23:11:43,241][1652475] Updated weights for policy 0, policy_version 888380 (0.0073) [2024-06-15 23:11:45,290][1651340] Signal inference workers to stop experience collection... (45700 times) [2024-06-15 23:11:45,340][1652475] InferenceWorker_p0-w0: stopping experience collection (45700 times) [2024-06-15 23:11:45,510][1651340] Signal inference workers to resume experience collection... 
(45700 times) [2024-06-15 23:11:45,510][1652475] InferenceWorker_p0-w0: resuming experience collection (45700 times) [2024-06-15 23:11:45,733][1652475] Updated weights for policy 0, policy_version 888444 (0.0013) [2024-06-15 23:11:45,738][1648984] Fps is (10 sec: 52428.5, 60 sec: 51882.7, 300 sec: 51873.4). Total num frames: 1819508736. Throughput: 0: 12265.3. Samples: 454933504. Policy #0 lag: (min: 6.0, avg: 61.7, max: 262.0) [2024-06-15 23:11:45,747][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 23:11:49,522][1652475] Updated weights for policy 0, policy_version 888509 (0.0011) [2024-06-15 23:11:50,738][1648984] Fps is (10 sec: 52428.3, 60 sec: 49698.0, 300 sec: 52095.6). Total num frames: 1819705344. Throughput: 0: 12777.2. Samples: 454981120. Policy #0 lag: (min: 6.0, avg: 61.7, max: 262.0) [2024-06-15 23:11:50,739][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 23:11:51,346][1652475] Updated weights for policy 0, policy_version 888565 (0.0010) [2024-06-15 23:11:53,336][1652475] Updated weights for policy 0, policy_version 888608 (0.0011) [2024-06-15 23:11:55,027][1652475] Updated weights for policy 0, policy_version 888674 (0.0106) [2024-06-15 23:11:55,738][1648984] Fps is (10 sec: 55705.4, 60 sec: 52429.0, 300 sec: 51984.5). Total num frames: 1820065792. Throughput: 0: 12345.0. Samples: 455053824. Policy #0 lag: (min: 6.0, avg: 61.7, max: 262.0) [2024-06-15 23:11:55,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 23:11:58,517][1652475] Updated weights for policy 0, policy_version 888737 (0.0012) [2024-06-15 23:11:59,083][1652475] Updated weights for policy 0, policy_version 888768 (0.0012) [2024-06-15 23:12:00,738][1648984] Fps is (10 sec: 55705.9, 60 sec: 50790.4, 300 sec: 52206.6). Total num frames: 1820262400. Throughput: 0: 13016.2. Samples: 455138816. 
Policy #0 lag: (min: 6.0, avg: 61.7, max: 262.0) [2024-06-15 23:12:00,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 23:12:01,280][1652475] Updated weights for policy 0, policy_version 888829 (0.0012) [2024-06-15 23:12:03,739][1652475] Updated weights for policy 0, policy_version 888887 (0.0011) [2024-06-15 23:12:04,672][1652475] Updated weights for policy 0, policy_version 888917 (0.0013) [2024-06-15 23:12:05,528][1652475] Updated weights for policy 0, policy_version 888958 (0.0010) [2024-06-15 23:12:05,738][1648984] Fps is (10 sec: 52429.1, 60 sec: 52428.9, 300 sec: 51984.5). Total num frames: 1820590080. Throughput: 0: 12583.8. Samples: 455173632. Policy #0 lag: (min: 6.0, avg: 61.7, max: 262.0) [2024-06-15 23:12:05,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 23:12:08,310][1652475] Updated weights for policy 0, policy_version 889008 (0.0106) [2024-06-15 23:12:09,943][1652475] Updated weights for policy 0, policy_version 889048 (0.0011) [2024-06-15 23:12:10,738][1648984] Fps is (10 sec: 58982.5, 60 sec: 53521.3, 300 sec: 52428.8). Total num frames: 1820852224. Throughput: 0: 13368.9. Samples: 455263232. Policy #0 lag: (min: 6.0, avg: 61.7, max: 262.0) [2024-06-15 23:12:10,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 23:12:12,472][1652475] Updated weights for policy 0, policy_version 889104 (0.0018) [2024-06-15 23:12:14,656][1652475] Updated weights for policy 0, policy_version 889200 (0.0093) [2024-06-15 23:12:15,738][1648984] Fps is (10 sec: 52427.3, 60 sec: 52428.6, 300 sec: 52095.5). Total num frames: 1821114368. Throughput: 0: 12925.1. Samples: 455330816. 
Policy #0 lag: (min: 6.0, avg: 61.7, max: 262.0) [2024-06-15 23:12:15,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 23:12:18,453][1652475] Updated weights for policy 0, policy_version 889248 (0.0010) [2024-06-15 23:12:19,317][1652475] Updated weights for policy 0, policy_version 889278 (0.0010) [2024-06-15 23:12:20,738][1648984] Fps is (10 sec: 39321.6, 60 sec: 52428.8, 300 sec: 51984.5). Total num frames: 1821245440. Throughput: 0: 13414.4. Samples: 455378944. Policy #0 lag: (min: 6.0, avg: 61.7, max: 262.0) [2024-06-15 23:12:20,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 23:12:21,760][1652475] Updated weights for policy 0, policy_version 889328 (0.0013) [2024-06-15 23:12:23,434][1651340] Signal inference workers to stop experience collection... (45750 times) [2024-06-15 23:12:23,472][1652475] InferenceWorker_p0-w0: stopping experience collection (45750 times) [2024-06-15 23:12:23,486][1652475] Updated weights for policy 0, policy_version 889410 (0.0012) [2024-06-15 23:12:23,643][1651340] Signal inference workers to resume experience collection... (45750 times) [2024-06-15 23:12:23,644][1652475] InferenceWorker_p0-w0: resuming experience collection (45750 times) [2024-06-15 23:12:25,738][1648984] Fps is (10 sec: 52430.2, 60 sec: 52428.9, 300 sec: 52095.6). Total num frames: 1821638656. Throughput: 0: 12800.0. Samples: 455441920. Policy #0 lag: (min: 6.0, avg: 61.7, max: 262.0) [2024-06-15 23:12:25,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 23:12:28,794][1652475] Updated weights for policy 0, policy_version 889489 (0.0014) [2024-06-15 23:12:30,519][1652475] Updated weights for policy 0, policy_version 889537 (0.0011) [2024-06-15 23:12:30,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 52428.8, 300 sec: 51984.5). Total num frames: 1821769728. Throughput: 0: 13437.1. Samples: 455538176. 
Policy #0 lag: (min: 6.0, avg: 61.7, max: 262.0) [2024-06-15 23:12:30,738][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 23:12:32,185][1652475] Updated weights for policy 0, policy_version 889616 (0.0010) [2024-06-15 23:12:33,456][1652475] Updated weights for policy 0, policy_version 889667 (0.0020) [2024-06-15 23:12:34,646][1652475] Updated weights for policy 0, policy_version 889719 (0.0012) [2024-06-15 23:12:35,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 52974.9, 300 sec: 52317.7). Total num frames: 1822162944. Throughput: 0: 13027.6. Samples: 455567360. Policy #0 lag: (min: 6.0, avg: 61.7, max: 262.0) [2024-06-15 23:12:35,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 23:12:40,449][1652475] Updated weights for policy 0, policy_version 889776 (0.0101) [2024-06-15 23:12:40,738][1648984] Fps is (10 sec: 49152.0, 60 sec: 51336.5, 300 sec: 51873.4). Total num frames: 1822261248. Throughput: 0: 13482.7. Samples: 455660544. Policy #0 lag: (min: 6.0, avg: 61.7, max: 262.0) [2024-06-15 23:12:40,738][1648984] Avg episode reward: [(0, '-0.500')] [2024-06-15 23:12:41,941][1652475] Updated weights for policy 0, policy_version 889841 (0.0012) [2024-06-15 23:12:43,965][1652475] Updated weights for policy 0, policy_version 889920 (0.0112) [2024-06-15 23:12:45,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 52974.9, 300 sec: 52206.6). Total num frames: 1822687232. Throughput: 0: 12618.0. Samples: 455706624. Policy #0 lag: (min: 6.0, avg: 61.7, max: 262.0) [2024-06-15 23:12:45,738][1648984] Avg episode reward: [(0, '-0.540')] [2024-06-15 23:12:50,274][1652475] Updated weights for policy 0, policy_version 889986 (0.0011) [2024-06-15 23:12:50,738][1648984] Fps is (10 sec: 45875.1, 60 sec: 50244.3, 300 sec: 51540.2). Total num frames: 1822720000. Throughput: 0: 12993.4. Samples: 455758336. 
Policy #0 lag: (min: 6.0, avg: 61.7, max: 262.0) [2024-06-15 23:12:50,738][1648984] Avg episode reward: [(0, '-0.580')] [2024-06-15 23:12:51,525][1652475] Updated weights for policy 0, policy_version 890051 (0.0012) [2024-06-15 23:12:53,203][1652475] Updated weights for policy 0, policy_version 890115 (0.0012) [2024-06-15 23:12:55,029][1652475] Updated weights for policy 0, policy_version 890192 (0.0109) [2024-06-15 23:12:55,738][1648984] Fps is (10 sec: 45874.9, 60 sec: 51336.5, 300 sec: 51762.3). Total num frames: 1823145984. Throughput: 0: 12640.7. Samples: 455832064. Policy #0 lag: (min: 6.0, avg: 61.7, max: 262.0) [2024-06-15 23:12:55,738][1648984] Avg episode reward: [(0, '-0.590')] [2024-06-15 23:12:56,222][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000890240_1823211520.pth... [2024-06-15 23:12:56,248][1652475] Updated weights for policy 0, policy_version 890240 (0.0013) [2024-06-15 23:12:56,258][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000884096_1810628608.pth [2024-06-15 23:13:00,738][1648984] Fps is (10 sec: 49151.0, 60 sec: 49151.9, 300 sec: 51206.9). Total num frames: 1823211520. Throughput: 0: 13038.9. Samples: 455917568. Policy #0 lag: (min: 6.0, avg: 61.7, max: 262.0) [2024-06-15 23:13:00,739][1648984] Avg episode reward: [(0, '-0.540')] [2024-06-15 23:13:01,650][1651340] Signal inference workers to stop experience collection... (45800 times) [2024-06-15 23:13:01,747][1652475] InferenceWorker_p0-w0: stopping experience collection (45800 times) [2024-06-15 23:13:01,957][1651340] Signal inference workers to resume experience collection... 
(45800 times) [2024-06-15 23:13:01,957][1652475] InferenceWorker_p0-w0: resuming experience collection (45800 times) [2024-06-15 23:13:02,129][1652475] Updated weights for policy 0, policy_version 890305 (0.0010) [2024-06-15 23:13:03,581][1652475] Updated weights for policy 0, policy_version 890369 (0.0013) [2024-06-15 23:13:04,829][1652475] Updated weights for policy 0, policy_version 890432 (0.0010) [2024-06-15 23:13:05,738][1648984] Fps is (10 sec: 52429.5, 60 sec: 51336.6, 300 sec: 51762.3). Total num frames: 1823670272. Throughput: 0: 12629.4. Samples: 455947264. Policy #0 lag: (min: 6.0, avg: 61.7, max: 262.0) [2024-06-15 23:13:05,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 23:13:05,923][1652475] Updated weights for policy 0, policy_version 890496 (0.0011) [2024-06-15 23:13:10,509][1652475] Updated weights for policy 0, policy_version 890560 (0.0014) [2024-06-15 23:13:10,738][1648984] Fps is (10 sec: 65536.9, 60 sec: 50244.2, 300 sec: 51540.2). Total num frames: 1823866880. Throughput: 0: 13437.1. Samples: 456046592. Policy #0 lag: (min: 6.0, avg: 61.7, max: 262.0) [2024-06-15 23:13:10,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 23:13:11,665][1652475] Updated weights for policy 0, policy_version 890620 (0.0011) [2024-06-15 23:13:12,570][1652475] Updated weights for policy 0, policy_version 890658 (0.0028) [2024-06-15 23:13:13,424][1652475] Updated weights for policy 0, policy_version 890704 (0.0011) [2024-06-15 23:13:15,738][1648984] Fps is (10 sec: 58981.6, 60 sec: 52429.0, 300 sec: 51984.5). Total num frames: 1824260096. Throughput: 0: 13198.2. Samples: 456132096. 
Policy #0 lag: (min: 6.0, avg: 61.7, max: 262.0) [2024-06-15 23:13:15,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 23:13:17,553][1652475] Updated weights for policy 0, policy_version 890769 (0.0011) [2024-06-15 23:13:18,935][1652475] Updated weights for policy 0, policy_version 890848 (0.0014) [2024-06-15 23:13:20,073][1652475] Updated weights for policy 0, policy_version 890898 (0.0010) [2024-06-15 23:13:20,738][1648984] Fps is (10 sec: 78643.7, 60 sec: 56797.9, 300 sec: 52651.0). Total num frames: 1824653312. Throughput: 0: 13767.1. Samples: 456186880. Policy #0 lag: (min: 6.0, avg: 61.7, max: 262.0) [2024-06-15 23:13:20,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 23:13:22,958][1652475] Updated weights for policy 0, policy_version 890960 (0.0069) [2024-06-15 23:13:25,738][1648984] Fps is (10 sec: 55705.5, 60 sec: 52974.9, 300 sec: 52095.6). Total num frames: 1824817152. Throughput: 0: 13676.1. Samples: 456275968. Policy #0 lag: (min: 15.0, avg: 139.3, max: 271.0) [2024-06-15 23:13:25,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 23:13:25,987][1652475] Updated weights for policy 0, policy_version 891044 (0.0078) [2024-06-15 23:13:27,797][1652475] Updated weights for policy 0, policy_version 891136 (0.0011) [2024-06-15 23:13:28,859][1652475] Updated weights for policy 0, policy_version 891199 (0.0009) [2024-06-15 23:13:30,738][1648984] Fps is (10 sec: 52428.8, 60 sec: 56797.9, 300 sec: 52873.1). Total num frames: 1825177600. Throughput: 0: 14700.1. Samples: 456368128. Policy #0 lag: (min: 15.0, avg: 139.3, max: 271.0) [2024-06-15 23:13:30,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 23:13:31,023][1651340] Signal inference workers to stop experience collection... (45850 times) [2024-06-15 23:13:31,084][1652475] InferenceWorker_p0-w0: stopping experience collection (45850 times) [2024-06-15 23:13:31,177][1651340] Signal inference workers to resume experience collection... 
(45850 times) [2024-06-15 23:13:31,178][1652475] InferenceWorker_p0-w0: resuming experience collection (45850 times) [2024-06-15 23:13:31,598][1652475] Updated weights for policy 0, policy_version 891256 (0.0011) [2024-06-15 23:13:33,705][1652475] Updated weights for policy 0, policy_version 891296 (0.0010) [2024-06-15 23:13:34,276][1652475] Updated weights for policy 0, policy_version 891327 (0.0010) [2024-06-15 23:13:35,518][1652475] Updated weights for policy 0, policy_version 891392 (0.0010) [2024-06-15 23:13:35,738][1648984] Fps is (10 sec: 75367.0, 60 sec: 56797.9, 300 sec: 52873.1). Total num frames: 1825570816. Throughput: 0: 14859.4. Samples: 456427008. Policy #0 lag: (min: 15.0, avg: 139.3, max: 271.0) [2024-06-15 23:13:35,738][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 23:13:36,780][1652475] Updated weights for policy 0, policy_version 891452 (0.0009) [2024-06-15 23:13:39,088][1652475] Updated weights for policy 0, policy_version 891510 (0.0010) [2024-06-15 23:13:40,738][1648984] Fps is (10 sec: 65536.2, 60 sec: 59528.5, 300 sec: 53317.4). Total num frames: 1825832960. Throughput: 0: 15291.8. Samples: 456520192. Policy #0 lag: (min: 15.0, avg: 139.3, max: 271.0) [2024-06-15 23:13:40,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 23:13:41,093][1652475] Updated weights for policy 0, policy_version 891539 (0.0009) [2024-06-15 23:13:41,664][1652475] Updated weights for policy 0, policy_version 891576 (0.0010) [2024-06-15 23:13:42,703][1652475] Updated weights for policy 0, policy_version 891616 (0.0011) [2024-06-15 23:13:43,612][1652475] Updated weights for policy 0, policy_version 891667 (0.0009) [2024-06-15 23:13:45,738][1648984] Fps is (10 sec: 65535.8, 60 sec: 58982.4, 300 sec: 53539.6). Total num frames: 1826226176. Throughput: 0: 15815.2. Samples: 456629248. 
Policy #0 lag: (min: 15.0, avg: 139.3, max: 271.0) [2024-06-15 23:13:45,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 23:13:46,009][1652475] Updated weights for policy 0, policy_version 891728 (0.0010) [2024-06-15 23:13:46,778][1652475] Updated weights for policy 0, policy_version 891776 (0.0027) [2024-06-15 23:13:48,938][1652475] Updated weights for policy 0, policy_version 891833 (0.0011) [2024-06-15 23:13:50,738][1648984] Fps is (10 sec: 72089.5, 60 sec: 63897.6, 300 sec: 53983.9). Total num frames: 1826553856. Throughput: 0: 16327.1. Samples: 456681984. Policy #0 lag: (min: 15.0, avg: 139.3, max: 271.0) [2024-06-15 23:13:50,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 23:13:51,124][1652475] Updated weights for policy 0, policy_version 891890 (0.0010) [2024-06-15 23:13:52,200][1652475] Updated weights for policy 0, policy_version 891962 (0.0065) [2024-06-15 23:13:54,691][1652475] Updated weights for policy 0, policy_version 892032 (0.0078) [2024-06-15 23:13:55,738][1648984] Fps is (10 sec: 65536.4, 60 sec: 62259.3, 300 sec: 53983.9). Total num frames: 1826881536. Throughput: 0: 16258.9. Samples: 456778240. Policy #0 lag: (min: 15.0, avg: 139.3, max: 271.0) [2024-06-15 23:13:55,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 23:13:57,992][1652475] Updated weights for policy 0, policy_version 892098 (0.0011) [2024-06-15 23:13:58,864][1652475] Updated weights for policy 0, policy_version 892152 (0.0011) [2024-06-15 23:14:00,143][1652475] Updated weights for policy 0, policy_version 892208 (0.0010) [2024-06-15 23:14:00,738][1648984] Fps is (10 sec: 72089.6, 60 sec: 67720.8, 300 sec: 54654.3). Total num frames: 1827274752. Throughput: 0: 16497.8. Samples: 456874496. Policy #0 lag: (min: 15.0, avg: 139.3, max: 271.0) [2024-06-15 23:14:00,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 23:14:01,443][1651340] Signal inference workers to stop experience collection... 
(45900 times) [2024-06-15 23:14:01,489][1652475] InferenceWorker_p0-w0: stopping experience collection (45900 times) [2024-06-15 23:14:01,610][1651340] Signal inference workers to resume experience collection... (45900 times) [2024-06-15 23:14:01,611][1652475] InferenceWorker_p0-w0: resuming experience collection (45900 times) [2024-06-15 23:14:02,226][1652475] Updated weights for policy 0, policy_version 892272 (0.0010) [2024-06-15 23:14:04,562][1652475] Updated weights for policy 0, policy_version 892322 (0.0010) [2024-06-15 23:14:05,550][1652475] Updated weights for policy 0, policy_version 892384 (0.0009) [2024-06-15 23:14:05,738][1648984] Fps is (10 sec: 72089.5, 60 sec: 65535.9, 300 sec: 54761.4). Total num frames: 1827602432. Throughput: 0: 16475.0. Samples: 456928256. Policy #0 lag: (min: 15.0, avg: 139.3, max: 271.0) [2024-06-15 23:14:05,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 23:14:08,594][1652475] Updated weights for policy 0, policy_version 892421 (0.0016) [2024-06-15 23:14:09,958][1652475] Updated weights for policy 0, policy_version 892498 (0.0104) [2024-06-15 23:14:10,738][1648984] Fps is (10 sec: 65535.9, 60 sec: 67720.6, 300 sec: 55094.7). Total num frames: 1827930112. Throughput: 0: 16748.1. Samples: 457029632. Policy #0 lag: (min: 15.0, avg: 139.3, max: 271.0) [2024-06-15 23:14:10,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 23:14:11,900][1652475] Updated weights for policy 0, policy_version 892551 (0.0010) [2024-06-15 23:14:13,509][1652475] Updated weights for policy 0, policy_version 892624 (0.0079) [2024-06-15 23:14:14,386][1652475] Updated weights for policy 0, policy_version 892672 (0.0011) [2024-06-15 23:14:15,738][1648984] Fps is (10 sec: 58982.4, 60 sec: 65536.1, 300 sec: 55094.7). Total num frames: 1828192256. Throughput: 0: 16634.3. Samples: 457116672. 
Policy #0 lag: (min: 15.0, avg: 139.3, max: 271.0) [2024-06-15 23:14:15,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 23:14:17,686][1652475] Updated weights for policy 0, policy_version 892726 (0.0010) [2024-06-15 23:14:18,622][1652475] Updated weights for policy 0, policy_version 892769 (0.0009) [2024-06-15 23:14:19,884][1652475] Updated weights for policy 0, policy_version 892822 (0.0028) [2024-06-15 23:14:20,423][1652475] Updated weights for policy 0, policy_version 892864 (0.0010) [2024-06-15 23:14:20,738][1648984] Fps is (10 sec: 65536.2, 60 sec: 65536.0, 300 sec: 55539.0). Total num frames: 1828585472. Throughput: 0: 16531.9. Samples: 457170944. Policy #0 lag: (min: 15.0, avg: 139.3, max: 271.0) [2024-06-15 23:14:20,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 23:14:22,407][1652475] Updated weights for policy 0, policy_version 892922 (0.0013) [2024-06-15 23:14:25,128][1652475] Updated weights for policy 0, policy_version 892976 (0.0010) [2024-06-15 23:14:25,738][1648984] Fps is (10 sec: 65535.3, 60 sec: 67174.4, 300 sec: 55539.0). Total num frames: 1828847616. Throughput: 0: 16827.7. Samples: 457277440. Policy #0 lag: (min: 15.0, avg: 139.3, max: 271.0) [2024-06-15 23:14:25,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 23:14:25,916][1652475] Updated weights for policy 0, policy_version 893008 (0.0091) [2024-06-15 23:14:26,834][1652475] Updated weights for policy 0, policy_version 893059 (0.0011) [2024-06-15 23:14:27,695][1652475] Updated weights for policy 0, policy_version 893114 (0.0015) [2024-06-15 23:14:29,454][1652475] Updated weights for policy 0, policy_version 893152 (0.0107) [2024-06-15 23:14:30,738][1648984] Fps is (10 sec: 65535.8, 60 sec: 67720.5, 300 sec: 55983.3). Total num frames: 1829240832. Throughput: 0: 16588.8. Samples: 457375744. 
Policy #0 lag: (min: 15.0, avg: 139.3, max: 271.0) [2024-06-15 23:14:30,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 23:14:32,300][1651340] Signal inference workers to stop experience collection... (45950 times) [2024-06-15 23:14:32,333][1652475] Updated weights for policy 0, policy_version 893204 (0.0018) [2024-06-15 23:14:32,351][1652475] InferenceWorker_p0-w0: stopping experience collection (45950 times) [2024-06-15 23:14:32,508][1651340] Signal inference workers to resume experience collection... (45950 times) [2024-06-15 23:14:32,509][1652475] InferenceWorker_p0-w0: resuming experience collection (45950 times) [2024-06-15 23:14:33,641][1652475] Updated weights for policy 0, policy_version 893249 (0.0012) [2024-06-15 23:14:34,862][1652475] Updated weights for policy 0, policy_version 893316 (0.0010) [2024-06-15 23:14:35,738][1648984] Fps is (10 sec: 75367.4, 60 sec: 67174.4, 300 sec: 56205.5). Total num frames: 1829601280. Throughput: 0: 16588.8. Samples: 457428480. Policy #0 lag: (min: 15.0, avg: 139.3, max: 271.0) [2024-06-15 23:14:35,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 23:14:36,711][1652475] Updated weights for policy 0, policy_version 893378 (0.0012) [2024-06-15 23:14:40,738][1648984] Fps is (10 sec: 52428.7, 60 sec: 65535.9, 300 sec: 55872.2). Total num frames: 1829765120. Throughput: 0: 16497.8. Samples: 457520640. 
Policy #0 lag: (min: 15.0, avg: 139.3, max: 271.0) [2024-06-15 23:14:40,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 23:14:40,761][1652475] Updated weights for policy 0, policy_version 893442 (0.0012) [2024-06-15 23:14:42,011][1652475] Updated weights for policy 0, policy_version 893507 (0.0044) [2024-06-15 23:14:43,190][1652475] Updated weights for policy 0, policy_version 893570 (0.0010) [2024-06-15 23:14:43,957][1652475] Updated weights for policy 0, policy_version 893627 (0.0011) [2024-06-15 23:14:45,697][1652475] Updated weights for policy 0, policy_version 893690 (0.0011) [2024-06-15 23:14:45,738][1648984] Fps is (10 sec: 65534.0, 60 sec: 67174.1, 300 sec: 56427.6). Total num frames: 1830256640. Throughput: 0: 16486.3. Samples: 457616384. Policy #0 lag: (min: 15.0, avg: 139.3, max: 271.0) [2024-06-15 23:14:45,739][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 23:14:49,614][1652475] Updated weights for policy 0, policy_version 893744 (0.0012) [2024-06-15 23:14:50,738][1648984] Fps is (10 sec: 75365.8, 60 sec: 66082.0, 300 sec: 56538.7). Total num frames: 1830518784. Throughput: 0: 16600.1. Samples: 457675264. Policy #0 lag: (min: 15.0, avg: 139.3, max: 271.0) [2024-06-15 23:14:50,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 23:14:51,209][1652475] Updated weights for policy 0, policy_version 893840 (0.0009) [2024-06-15 23:14:51,949][1652475] Updated weights for policy 0, policy_version 893881 (0.0009) [2024-06-15 23:14:53,127][1652475] Updated weights for policy 0, policy_version 893945 (0.0009) [2024-06-15 23:14:55,738][1648984] Fps is (10 sec: 55707.1, 60 sec: 65536.0, 300 sec: 56427.6). Total num frames: 1830813696. Throughput: 0: 16293.0. Samples: 457762816. 
Policy #0 lag: (min: 15.0, avg: 139.3, max: 271.0) [2024-06-15 23:14:55,738][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 23:14:55,743][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000893952_1830813696.pth... [2024-06-15 23:14:55,790][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000887168_1816920064.pth [2024-06-15 23:14:56,907][1652475] Updated weights for policy 0, policy_version 893987 (0.0010) [2024-06-15 23:14:58,772][1652475] Updated weights for policy 0, policy_version 894054 (0.0011) [2024-06-15 23:15:00,024][1651340] Signal inference workers to stop experience collection... (46000 times) [2024-06-15 23:15:00,076][1652475] InferenceWorker_p0-w0: stopping experience collection (46000 times) [2024-06-15 23:15:00,197][1651340] Signal inference workers to resume experience collection... (46000 times) [2024-06-15 23:15:00,198][1652475] InferenceWorker_p0-w0: resuming experience collection (46000 times) [2024-06-15 23:15:00,199][1652475] Updated weights for policy 0, policy_version 894144 (0.0010) [2024-06-15 23:15:00,738][1648984] Fps is (10 sec: 72090.7, 60 sec: 66082.2, 300 sec: 57094.1). Total num frames: 1831239680. Throughput: 0: 16554.7. Samples: 457861632. Policy #0 lag: (min: 15.0, avg: 139.3, max: 271.0) [2024-06-15 23:15:00,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 23:15:01,202][1652475] Updated weights for policy 0, policy_version 894200 (0.0011) [2024-06-15 23:15:04,488][1652475] Updated weights for policy 0, policy_version 894242 (0.0012) [2024-06-15 23:15:05,738][1648984] Fps is (10 sec: 65535.8, 60 sec: 64443.7, 300 sec: 56871.9). Total num frames: 1831469056. Throughput: 0: 16634.3. Samples: 457919488. 
Policy #0 lag: (min: 15.0, avg: 139.3, max: 271.0) [2024-06-15 23:15:05,741][1648984] Avg episode reward: [(0, '-0.250')] [2024-06-15 23:15:06,171][1652475] Updated weights for policy 0, policy_version 894293 (0.0009) [2024-06-15 23:15:07,155][1652475] Updated weights for policy 0, policy_version 894353 (0.0011) [2024-06-15 23:15:07,873][1652475] Updated weights for policy 0, policy_version 894400 (0.0019) [2024-06-15 23:15:09,643][1652475] Updated weights for policy 0, policy_version 894457 (0.0012) [2024-06-15 23:15:10,738][1648984] Fps is (10 sec: 62258.9, 60 sec: 65536.0, 300 sec: 57427.3). Total num frames: 1831862272. Throughput: 0: 16406.8. Samples: 458015744. Policy #0 lag: (min: 15.0, avg: 139.3, max: 271.0) [2024-06-15 23:15:10,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 23:15:12,066][1652475] Updated weights for policy 0, policy_version 894498 (0.0082) [2024-06-15 23:15:14,134][1652475] Updated weights for policy 0, policy_version 894581 (0.0012) [2024-06-15 23:15:15,030][1652475] Updated weights for policy 0, policy_version 894625 (0.0119) [2024-06-15 23:15:15,738][1648984] Fps is (10 sec: 78643.0, 60 sec: 67720.5, 300 sec: 57760.6). Total num frames: 1832255488. Throughput: 0: 16475.0. Samples: 458117120. Policy #0 lag: (min: 15.0, avg: 139.3, max: 271.0) [2024-06-15 23:15:15,738][1648984] Avg episode reward: [(0, '-0.250')] [2024-06-15 23:15:16,695][1652475] Updated weights for policy 0, policy_version 894678 (0.0010) [2024-06-15 23:15:17,432][1652475] Updated weights for policy 0, policy_version 894720 (0.0009) [2024-06-15 23:15:19,873][1652475] Updated weights for policy 0, policy_version 894783 (0.0011) [2024-06-15 23:15:20,737][1648984] Fps is (10 sec: 65536.8, 60 sec: 65536.1, 300 sec: 57982.7). Total num frames: 1832517632. Throughput: 0: 16577.5. Samples: 458174464. 
Policy #0 lag: (min: 15.0, avg: 149.5, max: 271.0) [2024-06-15 23:15:20,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 23:15:21,552][1652475] Updated weights for policy 0, policy_version 894838 (0.0015) [2024-06-15 23:15:22,498][1652475] Updated weights for policy 0, policy_version 894880 (0.0011) [2024-06-15 23:15:24,197][1652475] Updated weights for policy 0, policy_version 894928 (0.0010) [2024-06-15 23:15:25,738][1648984] Fps is (10 sec: 65536.1, 60 sec: 67720.6, 300 sec: 58204.9). Total num frames: 1832910848. Throughput: 0: 16770.8. Samples: 458275328. Policy #0 lag: (min: 15.0, avg: 149.5, max: 271.0) [2024-06-15 23:15:25,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 23:15:27,013][1652475] Updated weights for policy 0, policy_version 894980 (0.0010) [2024-06-15 23:15:27,881][1652475] Updated weights for policy 0, policy_version 895037 (0.0011) [2024-06-15 23:15:29,388][1652475] Updated weights for policy 0, policy_version 895075 (0.0011) [2024-06-15 23:15:30,303][1651340] Signal inference workers to stop experience collection... (46050 times) [2024-06-15 23:15:30,359][1652475] InferenceWorker_p0-w0: stopping experience collection (46050 times) [2024-06-15 23:15:30,498][1651340] Signal inference workers to resume experience collection... (46050 times) [2024-06-15 23:15:30,499][1652475] InferenceWorker_p0-w0: resuming experience collection (46050 times) [2024-06-15 23:15:30,501][1652475] Updated weights for policy 0, policy_version 895136 (0.0012) [2024-06-15 23:15:30,738][1648984] Fps is (10 sec: 72087.1, 60 sec: 66628.0, 300 sec: 58871.3). Total num frames: 1833238528. Throughput: 0: 16793.6. Samples: 458372096. 
Policy #0 lag: (min: 15.0, avg: 149.5, max: 271.0) [2024-06-15 23:15:30,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 23:15:31,494][1652475] Updated weights for policy 0, policy_version 895169 (0.0014) [2024-06-15 23:15:32,516][1652475] Updated weights for policy 0, policy_version 895228 (0.0011) [2024-06-15 23:15:35,738][1648984] Fps is (10 sec: 52428.9, 60 sec: 63897.5, 300 sec: 58204.9). Total num frames: 1833435136. Throughput: 0: 16497.8. Samples: 458417664. Policy #0 lag: (min: 15.0, avg: 149.5, max: 271.0) [2024-06-15 23:15:35,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 23:15:36,444][1652475] Updated weights for policy 0, policy_version 895290 (0.0058) [2024-06-15 23:15:37,574][1652475] Updated weights for policy 0, policy_version 895344 (0.0011) [2024-06-15 23:15:38,466][1652475] Updated weights for policy 0, policy_version 895378 (0.0010) [2024-06-15 23:15:40,007][1652475] Updated weights for policy 0, policy_version 895456 (0.0011) [2024-06-15 23:15:40,738][1648984] Fps is (10 sec: 72091.2, 60 sec: 69905.1, 300 sec: 59537.8). Total num frames: 1833959424. Throughput: 0: 16748.1. Samples: 458516480. Policy #0 lag: (min: 15.0, avg: 149.5, max: 271.0) [2024-06-15 23:15:40,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 23:15:44,497][1652475] Updated weights for policy 0, policy_version 895505 (0.0010) [2024-06-15 23:15:45,738][1648984] Fps is (10 sec: 68811.1, 60 sec: 64443.7, 300 sec: 58982.3). Total num frames: 1834123264. Throughput: 0: 16804.9. Samples: 458617856. 
Policy #0 lag: (min: 15.0, avg: 149.5, max: 271.0) [2024-06-15 23:15:45,739][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 23:15:45,763][1652475] Updated weights for policy 0, policy_version 895572 (0.0094) [2024-06-15 23:15:47,426][1652475] Updated weights for policy 0, policy_version 895664 (0.0011) [2024-06-15 23:15:48,569][1652475] Updated weights for policy 0, policy_version 895712 (0.0010) [2024-06-15 23:15:50,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 66082.3, 300 sec: 59537.8). Total num frames: 1834483712. Throughput: 0: 16293.0. Samples: 458652672. Policy #0 lag: (min: 15.0, avg: 149.5, max: 271.0) [2024-06-15 23:15:50,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 23:15:52,629][1652475] Updated weights for policy 0, policy_version 895746 (0.0011) [2024-06-15 23:15:53,891][1652475] Updated weights for policy 0, policy_version 895810 (0.0079) [2024-06-15 23:15:55,575][1652475] Updated weights for policy 0, policy_version 895920 (0.0124) [2024-06-15 23:15:55,738][1648984] Fps is (10 sec: 72091.4, 60 sec: 67174.4, 300 sec: 59760.0). Total num frames: 1834844160. Throughput: 0: 16486.4. Samples: 458757632. Policy #0 lag: (min: 15.0, avg: 149.5, max: 271.0) [2024-06-15 23:15:55,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 23:15:56,665][1652475] Updated weights for policy 0, policy_version 895970 (0.0009) [2024-06-15 23:16:00,738][1648984] Fps is (10 sec: 52428.5, 60 sec: 62805.3, 300 sec: 59537.8). Total num frames: 1835008000. Throughput: 0: 16440.9. Samples: 458856960. Policy #0 lag: (min: 15.0, avg: 149.5, max: 271.0) [2024-06-15 23:16:00,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 23:16:01,521][1652475] Updated weights for policy 0, policy_version 896038 (0.0014) [2024-06-15 23:16:01,968][1651340] Signal inference workers to stop experience collection... 
(46100 times) [2024-06-15 23:16:02,013][1652475] InferenceWorker_p0-w0: stopping experience collection (46100 times) [2024-06-15 23:16:02,115][1651340] Signal inference workers to resume experience collection... (46100 times) [2024-06-15 23:16:02,115][1652475] InferenceWorker_p0-w0: resuming experience collection (46100 times) [2024-06-15 23:16:02,236][1652475] Updated weights for policy 0, policy_version 896083 (0.0012) [2024-06-15 23:16:03,429][1652475] Updated weights for policy 0, policy_version 896160 (0.0014) [2024-06-15 23:16:04,612][1652475] Updated weights for policy 0, policy_version 896224 (0.0088) [2024-06-15 23:16:05,137][1652475] Updated weights for policy 0, policy_version 896256 (0.0010) [2024-06-15 23:16:05,738][1648984] Fps is (10 sec: 68812.3, 60 sec: 67720.5, 300 sec: 60648.6). Total num frames: 1835532288. Throughput: 0: 16190.5. Samples: 458903040. Policy #0 lag: (min: 15.0, avg: 149.5, max: 271.0) [2024-06-15 23:16:05,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 23:16:09,438][1652475] Updated weights for policy 0, policy_version 896320 (0.0009) [2024-06-15 23:16:10,738][1648984] Fps is (10 sec: 78643.4, 60 sec: 65536.0, 300 sec: 60426.4). Total num frames: 1835794432. Throughput: 0: 16440.9. Samples: 459015168. Policy #0 lag: (min: 15.0, avg: 149.5, max: 271.0) [2024-06-15 23:16:10,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 23:16:10,740][1652475] Updated weights for policy 0, policy_version 896384 (0.0010) [2024-06-15 23:16:11,928][1652475] Updated weights for policy 0, policy_version 896448 (0.0011) [2024-06-15 23:16:12,850][1652475] Updated weights for policy 0, policy_version 896496 (0.0010) [2024-06-15 23:16:15,738][1648984] Fps is (10 sec: 52429.0, 60 sec: 63351.5, 300 sec: 60870.7). Total num frames: 1836056576. Throughput: 0: 16395.4. Samples: 459109888. 
Policy #0 lag: (min: 15.0, avg: 149.5, max: 271.0) [2024-06-15 23:16:15,738][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 23:16:16,490][1652475] Updated weights for policy 0, policy_version 896544 (0.0010) [2024-06-15 23:16:17,522][1652475] Updated weights for policy 0, policy_version 896592 (0.0012) [2024-06-15 23:16:18,601][1652475] Updated weights for policy 0, policy_version 896647 (0.0076) [2024-06-15 23:16:19,506][1652475] Updated weights for policy 0, policy_version 896691 (0.0010) [2024-06-15 23:16:20,721][1652475] Updated weights for policy 0, policy_version 896758 (0.0010) [2024-06-15 23:16:20,738][1648984] Fps is (10 sec: 75366.4, 60 sec: 67174.3, 300 sec: 61204.0). Total num frames: 1836548096. Throughput: 0: 16452.3. Samples: 459158016. Policy #0 lag: (min: 15.0, avg: 149.5, max: 271.0) [2024-06-15 23:16:20,738][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 23:16:24,967][1652475] Updated weights for policy 0, policy_version 896789 (0.0010) [2024-06-15 23:16:25,738][1648984] Fps is (10 sec: 65535.8, 60 sec: 63351.4, 300 sec: 61315.0). Total num frames: 1836711936. Throughput: 0: 16531.9. Samples: 459260416. Policy #0 lag: (min: 15.0, avg: 149.5, max: 271.0) [2024-06-15 23:16:25,739][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 23:16:25,946][1652475] Updated weights for policy 0, policy_version 896848 (0.0011) [2024-06-15 23:16:27,317][1652475] Updated weights for policy 0, policy_version 896925 (0.0059) [2024-06-15 23:16:27,481][1651340] Signal inference workers to stop experience collection... (46150 times) [2024-06-15 23:16:27,528][1652475] InferenceWorker_p0-w0: stopping experience collection (46150 times) [2024-06-15 23:16:27,655][1651340] Signal inference workers to resume experience collection... 
(46150 times) [2024-06-15 23:16:27,656][1652475] InferenceWorker_p0-w0: resuming experience collection (46150 times) [2024-06-15 23:16:27,975][1652475] Updated weights for policy 0, policy_version 896960 (0.0011) [2024-06-15 23:16:29,229][1652475] Updated weights for policy 0, policy_version 897015 (0.0010) [2024-06-15 23:16:30,738][1648984] Fps is (10 sec: 55705.3, 60 sec: 64443.9, 300 sec: 61426.1). Total num frames: 1837105152. Throughput: 0: 16213.4. Samples: 459347456. Policy #0 lag: (min: 15.0, avg: 149.5, max: 271.0) [2024-06-15 23:16:30,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 23:16:32,270][1652475] Updated weights for policy 0, policy_version 897041 (0.0012) [2024-06-15 23:16:33,501][1652475] Updated weights for policy 0, policy_version 897120 (0.0012) [2024-06-15 23:16:34,822][1652475] Updated weights for policy 0, policy_version 897168 (0.0021) [2024-06-15 23:16:35,737][1648984] Fps is (10 sec: 78644.5, 60 sec: 67720.7, 300 sec: 62092.6). Total num frames: 1837498368. Throughput: 0: 16531.9. Samples: 459396608. Policy #0 lag: (min: 15.0, avg: 149.5, max: 271.0) [2024-06-15 23:16:35,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 23:16:36,641][1652475] Updated weights for policy 0, policy_version 897232 (0.0011) [2024-06-15 23:16:40,140][1652475] Updated weights for policy 0, policy_version 897299 (0.0011) [2024-06-15 23:16:40,738][1648984] Fps is (10 sec: 62259.5, 60 sec: 62805.4, 300 sec: 61759.3). Total num frames: 1837727744. Throughput: 0: 16463.7. Samples: 459498496. 
Policy #0 lag: (min: 15.0, avg: 149.5, max: 271.0) [2024-06-15 23:16:40,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 23:16:41,330][1652475] Updated weights for policy 0, policy_version 897364 (0.0010) [2024-06-15 23:16:41,911][1652475] Updated weights for policy 0, policy_version 897408 (0.0012) [2024-06-15 23:16:43,813][1652475] Updated weights for policy 0, policy_version 897469 (0.0010) [2024-06-15 23:16:45,162][1652475] Updated weights for policy 0, policy_version 897520 (0.0012) [2024-06-15 23:16:45,738][1648984] Fps is (10 sec: 65535.6, 60 sec: 67174.7, 300 sec: 62536.9). Total num frames: 1838153728. Throughput: 0: 16349.9. Samples: 459592704. Policy #0 lag: (min: 15.0, avg: 149.5, max: 271.0) [2024-06-15 23:16:45,740][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 23:16:48,450][1652475] Updated weights for policy 0, policy_version 897586 (0.0082) [2024-06-15 23:16:49,818][1652475] Updated weights for policy 0, policy_version 897659 (0.0011) [2024-06-15 23:16:50,743][1648984] Fps is (10 sec: 68778.7, 60 sec: 65530.6, 300 sec: 62202.6). Total num frames: 1838415872. Throughput: 0: 16587.0. Samples: 459649536. Policy #0 lag: (min: 15.0, avg: 149.5, max: 271.0) [2024-06-15 23:16:50,743][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 23:16:51,722][1652475] Updated weights for policy 0, policy_version 897727 (0.0013) [2024-06-15 23:16:53,948][1652475] Updated weights for policy 0, policy_version 897792 (0.0088) [2024-06-15 23:16:55,738][1648984] Fps is (10 sec: 52428.1, 60 sec: 63897.5, 300 sec: 62425.8). Total num frames: 1838678016. Throughput: 0: 15985.7. Samples: 459734528. Policy #0 lag: (min: 15.0, avg: 149.5, max: 271.0) [2024-06-15 23:16:55,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 23:16:56,163][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000897824_1838743552.pth... 
[2024-06-15 23:16:56,291][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000890240_1823211520.pth [2024-06-15 23:16:56,702][1652475] Updated weights for policy 0, policy_version 897853 (0.0014) [2024-06-15 23:16:57,855][1652475] Updated weights for policy 0, policy_version 897910 (0.0011) [2024-06-15 23:16:58,912][1651340] Signal inference workers to stop experience collection... (46200 times) [2024-06-15 23:16:58,943][1652475] InferenceWorker_p0-w0: stopping experience collection (46200 times) [2024-06-15 23:16:59,099][1651340] Signal inference workers to resume experience collection... (46200 times) [2024-06-15 23:16:59,100][1652475] InferenceWorker_p0-w0: resuming experience collection (46200 times) [2024-06-15 23:16:59,252][1652475] Updated weights for policy 0, policy_version 897956 (0.0011) [2024-06-15 23:17:00,738][1648984] Fps is (10 sec: 68846.7, 60 sec: 68266.7, 300 sec: 62759.0). Total num frames: 1839104000. Throughput: 0: 16179.2. Samples: 459837952. Policy #0 lag: (min: 15.0, avg: 149.5, max: 271.0) [2024-06-15 23:17:00,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 23:17:01,007][1652475] Updated weights for policy 0, policy_version 898016 (0.0091) [2024-06-15 23:17:04,594][1652475] Updated weights for policy 0, policy_version 898096 (0.0011) [2024-06-15 23:17:05,738][1648984] Fps is (10 sec: 68813.9, 60 sec: 63897.7, 300 sec: 62759.1). Total num frames: 1839366144. Throughput: 0: 16304.4. Samples: 459891712. 
Policy #0 lag: (min: 15.0, avg: 149.5, max: 271.0) [2024-06-15 23:17:05,738][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 23:17:05,743][1652475] Updated weights for policy 0, policy_version 898144 (0.0016) [2024-06-15 23:17:06,956][1652475] Updated weights for policy 0, policy_version 898225 (0.0084) [2024-06-15 23:17:08,535][1652475] Updated weights for policy 0, policy_version 898261 (0.0011) [2024-06-15 23:17:10,738][1648984] Fps is (10 sec: 62257.3, 60 sec: 65535.6, 300 sec: 63092.3). Total num frames: 1839726592. Throughput: 0: 16019.8. Samples: 459981312. Policy #0 lag: (min: 15.0, avg: 149.5, max: 271.0) [2024-06-15 23:17:10,740][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 23:17:12,567][1652475] Updated weights for policy 0, policy_version 898336 (0.0012) [2024-06-15 23:17:13,659][1652475] Updated weights for policy 0, policy_version 898403 (0.0011) [2024-06-15 23:17:14,972][1652475] Updated weights for policy 0, policy_version 898466 (0.0010) [2024-06-15 23:17:15,737][1648984] Fps is (10 sec: 75366.8, 60 sec: 67720.7, 300 sec: 63980.9). Total num frames: 1840119808. Throughput: 0: 16384.1. Samples: 460084736. Policy #0 lag: (min: 0.0, avg: 43.1, max: 192.0) [2024-06-15 23:17:15,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 23:17:16,032][1652475] Updated weights for policy 0, policy_version 898515 (0.0011) [2024-06-15 23:17:19,536][1652475] Updated weights for policy 0, policy_version 898567 (0.0011) [2024-06-15 23:17:20,220][1652475] Updated weights for policy 0, policy_version 898617 (0.0012) [2024-06-15 23:17:20,738][1648984] Fps is (10 sec: 65538.4, 60 sec: 63897.6, 300 sec: 63536.6). Total num frames: 1840381952. Throughput: 0: 16463.6. Samples: 460137472. 
Policy #0 lag: (min: 0.0, avg: 43.1, max: 192.0) [2024-06-15 23:17:20,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 23:17:21,716][1652475] Updated weights for policy 0, policy_version 898672 (0.0079) [2024-06-15 23:17:22,580][1652475] Updated weights for policy 0, policy_version 898707 (0.0012) [2024-06-15 23:17:23,971][1652475] Updated weights for policy 0, policy_version 898784 (0.0096) [2024-06-15 23:17:25,738][1648984] Fps is (10 sec: 65535.2, 60 sec: 67720.6, 300 sec: 64425.2). Total num frames: 1840775168. Throughput: 0: 16361.2. Samples: 460234752. Policy #0 lag: (min: 0.0, avg: 43.1, max: 192.0) [2024-06-15 23:17:25,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 23:17:27,292][1652475] Updated weights for policy 0, policy_version 898848 (0.0013) [2024-06-15 23:17:28,956][1652475] Updated weights for policy 0, policy_version 898896 (0.0010) [2024-06-15 23:17:29,764][1652475] Updated weights for policy 0, policy_version 898942 (0.0012) [2024-06-15 23:17:29,859][1651340] Signal inference workers to stop experience collection... (46250 times) [2024-06-15 23:17:29,895][1651340] Signal inference workers to resume experience collection... (46250 times) [2024-06-15 23:17:29,914][1652475] InferenceWorker_p0-w0: stopping experience collection (46250 times) [2024-06-15 23:17:29,963][1652475] InferenceWorker_p0-w0: resuming experience collection (46250 times) [2024-06-15 23:17:30,738][1648984] Fps is (10 sec: 72089.2, 60 sec: 66628.3, 300 sec: 64203.1). Total num frames: 1841102848. Throughput: 0: 16736.7. Samples: 460345856. 
Policy #0 lag: (min: 0.0, avg: 43.1, max: 192.0) [2024-06-15 23:17:30,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 23:17:31,211][1652475] Updated weights for policy 0, policy_version 899008 (0.0011) [2024-06-15 23:17:32,236][1652475] Updated weights for policy 0, policy_version 899062 (0.0011) [2024-06-15 23:17:35,133][1652475] Updated weights for policy 0, policy_version 899120 (0.0012) [2024-06-15 23:17:35,738][1648984] Fps is (10 sec: 65536.1, 60 sec: 65535.9, 300 sec: 64980.6). Total num frames: 1841430528. Throughput: 0: 16499.6. Samples: 460391936. Policy #0 lag: (min: 0.0, avg: 43.1, max: 192.0) [2024-06-15 23:17:35,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 23:17:36,745][1652475] Updated weights for policy 0, policy_version 899154 (0.0009) [2024-06-15 23:17:37,578][1652475] Updated weights for policy 0, policy_version 899200 (0.0012) [2024-06-15 23:17:39,167][1652475] Updated weights for policy 0, policy_version 899265 (0.0012) [2024-06-15 23:17:40,272][1652475] Updated weights for policy 0, policy_version 899327 (0.0012) [2024-06-15 23:17:40,738][1648984] Fps is (10 sec: 72089.6, 60 sec: 68266.6, 300 sec: 64869.5). Total num frames: 1841823744. Throughput: 0: 16873.3. Samples: 460493824. Policy #0 lag: (min: 0.0, avg: 43.1, max: 192.0) [2024-06-15 23:17:40,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 23:17:43,064][1652475] Updated weights for policy 0, policy_version 899388 (0.0159) [2024-06-15 23:17:45,179][1652475] Updated weights for policy 0, policy_version 899451 (0.0017) [2024-06-15 23:17:45,738][1648984] Fps is (10 sec: 65536.4, 60 sec: 65536.0, 300 sec: 65647.1). Total num frames: 1842085888. Throughput: 0: 16770.9. Samples: 460592640. 
Policy #0 lag: (min: 0.0, avg: 43.1, max: 192.0) [2024-06-15 23:17:45,738][1648984] Avg episode reward: [(0, '-0.250')] [2024-06-15 23:17:46,526][1652475] Updated weights for policy 0, policy_version 899495 (0.0010) [2024-06-15 23:17:48,115][1652475] Updated weights for policy 0, policy_version 899542 (0.0010) [2024-06-15 23:17:50,034][1652475] Updated weights for policy 0, policy_version 899585 (0.0012) [2024-06-15 23:17:50,738][1648984] Fps is (10 sec: 62259.0, 60 sec: 67179.9, 300 sec: 65424.9). Total num frames: 1842446336. Throughput: 0: 16645.6. Samples: 460640768. Policy #0 lag: (min: 0.0, avg: 43.1, max: 192.0) [2024-06-15 23:17:50,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 23:17:50,858][1652475] Updated weights for policy 0, policy_version 899648 (0.0010) [2024-06-15 23:17:53,343][1652475] Updated weights for policy 0, policy_version 899713 (0.0012) [2024-06-15 23:17:54,309][1652475] Updated weights for policy 0, policy_version 899768 (0.0012) [2024-06-15 23:17:55,738][1648984] Fps is (10 sec: 65535.8, 60 sec: 67720.7, 300 sec: 66202.5). Total num frames: 1842741248. Throughput: 0: 16680.0. Samples: 460731904. Policy #0 lag: (min: 0.0, avg: 43.1, max: 192.0) [2024-06-15 23:17:55,738][1648984] Avg episode reward: [(0, '-0.210')] [2024-06-15 23:17:57,067][1652475] Updated weights for policy 0, policy_version 899814 (0.0011) [2024-06-15 23:17:59,479][1652475] Updated weights for policy 0, policy_version 899875 (0.0010) [2024-06-15 23:18:00,401][1651340] Signal inference workers to stop experience collection... (46300 times) [2024-06-15 23:18:00,451][1652475] InferenceWorker_p0-w0: stopping experience collection (46300 times) [2024-06-15 23:18:00,620][1651340] Signal inference workers to resume experience collection... (46300 times) [2024-06-15 23:18:00,621][1652475] InferenceWorker_p0-w0: resuming experience collection (46300 times) [2024-06-15 23:18:00,738][1648984] Fps is (10 sec: 62259.7, 60 sec: 66082.2, 300 sec: 65758.1). 
Total num frames: 1843068928. Throughput: 0: 16657.0. Samples: 460834304. Policy #0 lag: (min: 0.0, avg: 43.1, max: 192.0) [2024-06-15 23:18:00,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 23:18:01,120][1652475] Updated weights for policy 0, policy_version 899953 (0.0011) [2024-06-15 23:18:01,960][1652475] Updated weights for policy 0, policy_version 900001 (0.0010) [2024-06-15 23:18:02,372][1652475] Updated weights for policy 0, policy_version 900030 (0.0011) [2024-06-15 23:18:05,605][1652475] Updated weights for policy 0, policy_version 900080 (0.0012) [2024-06-15 23:18:05,738][1648984] Fps is (10 sec: 62258.8, 60 sec: 66628.2, 300 sec: 66091.4). Total num frames: 1843363840. Throughput: 0: 16509.1. Samples: 460880384. Policy #0 lag: (min: 0.0, avg: 43.1, max: 192.0) [2024-06-15 23:18:05,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 23:18:07,548][1652475] Updated weights for policy 0, policy_version 900152 (0.0013) [2024-06-15 23:18:09,254][1652475] Updated weights for policy 0, policy_version 900224 (0.0010) [2024-06-15 23:18:10,412][1652475] Updated weights for policy 0, policy_version 900282 (0.0010) [2024-06-15 23:18:10,738][1648984] Fps is (10 sec: 72087.6, 60 sec: 67720.6, 300 sec: 66202.4). Total num frames: 1843789824. Throughput: 0: 16566.0. Samples: 460980224. Policy #0 lag: (min: 0.0, avg: 43.1, max: 192.0) [2024-06-15 23:18:10,739][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 23:18:13,403][1652475] Updated weights for policy 0, policy_version 900327 (0.0021) [2024-06-15 23:18:14,274][1652475] Updated weights for policy 0, policy_version 900354 (0.0048) [2024-06-15 23:18:15,103][1652475] Updated weights for policy 0, policy_version 900405 (0.0012) [2024-06-15 23:18:15,738][1648984] Fps is (10 sec: 68813.1, 60 sec: 65535.9, 300 sec: 65758.2). Total num frames: 1844051968. Throughput: 0: 16384.0. Samples: 461083136. 
Policy #0 lag: (min: 0.0, avg: 43.1, max: 192.0) [2024-06-15 23:18:15,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 23:18:16,521][1652475] Updated weights for policy 0, policy_version 900451 (0.0010) [2024-06-15 23:18:17,517][1652475] Updated weights for policy 0, policy_version 900512 (0.0010) [2024-06-15 23:18:20,415][1652475] Updated weights for policy 0, policy_version 900546 (0.0011) [2024-06-15 23:18:20,738][1648984] Fps is (10 sec: 55707.4, 60 sec: 66082.2, 300 sec: 66202.5). Total num frames: 1844346880. Throughput: 0: 16418.2. Samples: 461130752. Policy #0 lag: (min: 0.0, avg: 43.1, max: 192.0) [2024-06-15 23:18:20,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 23:18:21,515][1652475] Updated weights for policy 0, policy_version 900608 (0.0011) [2024-06-15 23:18:23,074][1652475] Updated weights for policy 0, policy_version 900670 (0.0012) [2024-06-15 23:18:24,592][1652475] Updated weights for policy 0, policy_version 900720 (0.0009) [2024-06-15 23:18:25,738][1648984] Fps is (10 sec: 72089.4, 60 sec: 66628.3, 300 sec: 66424.6). Total num frames: 1844772864. Throughput: 0: 16543.3. Samples: 461238272. Policy #0 lag: (min: 0.0, avg: 43.1, max: 192.0) [2024-06-15 23:18:25,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 23:18:25,936][1652475] Updated weights for policy 0, policy_version 900791 (0.0016) [2024-06-15 23:18:28,940][1652475] Updated weights for policy 0, policy_version 900834 (0.0012) [2024-06-15 23:18:30,738][1648984] Fps is (10 sec: 68812.3, 60 sec: 65536.0, 300 sec: 65980.3). Total num frames: 1845035008. Throughput: 0: 16577.4. Samples: 461338624. Policy #0 lag: (min: 0.0, avg: 43.1, max: 192.0) [2024-06-15 23:18:30,738][1648984] Avg episode reward: [(0, '-0.230')] [2024-06-15 23:18:30,759][1651340] Signal inference workers to stop experience collection... 
(46350 times) [2024-06-15 23:18:30,795][1652475] Updated weights for policy 0, policy_version 900899 (0.0011) [2024-06-15 23:18:30,817][1652475] InferenceWorker_p0-w0: stopping experience collection (46350 times) [2024-06-15 23:18:31,001][1651340] Signal inference workers to resume experience collection... (46350 times) [2024-06-15 23:18:31,001][1652475] InferenceWorker_p0-w0: resuming experience collection (46350 times) [2024-06-15 23:18:32,356][1652475] Updated weights for policy 0, policy_version 900945 (0.0012) [2024-06-15 23:18:33,512][1652475] Updated weights for policy 0, policy_version 901009 (0.0012) [2024-06-15 23:18:35,738][1648984] Fps is (10 sec: 58982.6, 60 sec: 65536.0, 300 sec: 66202.5). Total num frames: 1845362688. Throughput: 0: 16418.2. Samples: 461379584. Policy #0 lag: (min: 0.0, avg: 43.1, max: 192.0) [2024-06-15 23:18:35,738][1648984] Avg episode reward: [(0, '-0.150')] [2024-06-15 23:18:36,052][1652475] Updated weights for policy 0, policy_version 901059 (0.0009) [2024-06-15 23:18:39,191][1652475] Updated weights for policy 0, policy_version 901136 (0.0013) [2024-06-15 23:18:40,707][1652475] Updated weights for policy 0, policy_version 901212 (0.0010) [2024-06-15 23:18:40,738][1648984] Fps is (10 sec: 65532.4, 60 sec: 64443.2, 300 sec: 65980.2). Total num frames: 1845690368. Throughput: 0: 16747.9. Samples: 461485568. Policy #0 lag: (min: 0.0, avg: 43.1, max: 192.0) [2024-06-15 23:18:40,739][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 23:18:41,726][1652475] Updated weights for policy 0, policy_version 901280 (0.0055) [2024-06-15 23:18:43,793][1652475] Updated weights for policy 0, policy_version 901328 (0.0012) [2024-06-15 23:18:44,575][1652475] Updated weights for policy 0, policy_version 901374 (0.0011) [2024-06-15 23:18:45,738][1648984] Fps is (10 sec: 65534.8, 60 sec: 65535.8, 300 sec: 65980.3). Total num frames: 1846018048. Throughput: 0: 16452.2. Samples: 461574656. 
Policy #0 lag: (min: 0.0, avg: 43.1, max: 192.0) [2024-06-15 23:18:45,739][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 23:18:47,379][1652475] Updated weights for policy 0, policy_version 901424 (0.0011) [2024-06-15 23:18:48,738][1652475] Updated weights for policy 0, policy_version 901488 (0.0011) [2024-06-15 23:18:50,661][1652475] Updated weights for policy 0, policy_version 901538 (0.0018) [2024-06-15 23:18:50,738][1648984] Fps is (10 sec: 65539.4, 60 sec: 64989.9, 300 sec: 65980.3). Total num frames: 1846345728. Throughput: 0: 16611.6. Samples: 461627904. Policy #0 lag: (min: 0.0, avg: 43.1, max: 192.0) [2024-06-15 23:18:50,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 23:18:51,941][1652475] Updated weights for policy 0, policy_version 901604 (0.0010) [2024-06-15 23:18:54,094][1652475] Updated weights for policy 0, policy_version 901633 (0.0011) [2024-06-15 23:18:54,853][1652475] Updated weights for policy 0, policy_version 901688 (0.0011) [2024-06-15 23:18:55,747][1648984] Fps is (10 sec: 65484.3, 60 sec: 65527.2, 300 sec: 65756.4). Total num frames: 1846673408. Throughput: 0: 16620.1. Samples: 461728256. Policy #0 lag: (min: 0.0, avg: 43.1, max: 192.0) [2024-06-15 23:18:55,762][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 23:18:56,140][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000901728_1846738944.pth... [2024-06-15 23:18:56,146][1652475] Updated weights for policy 0, policy_version 901728 (0.0010) [2024-06-15 23:18:56,251][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000893952_1830813696.pth [2024-06-15 23:18:58,313][1652475] Updated weights for policy 0, policy_version 901792 (0.0014) [2024-06-15 23:18:59,286][1652475] Updated weights for policy 0, policy_version 901840 (0.0010) [2024-06-15 23:18:59,664][1651340] Signal inference workers to stop experience collection... 
(46400 times) [2024-06-15 23:18:59,691][1652475] InferenceWorker_p0-w0: stopping experience collection (46400 times) [2024-06-15 23:18:59,870][1651340] Signal inference workers to resume experience collection... (46400 times) [2024-06-15 23:18:59,870][1652475] InferenceWorker_p0-w0: resuming experience collection (46400 times) [2024-06-15 23:19:00,088][1652475] Updated weights for policy 0, policy_version 901884 (0.0010) [2024-06-15 23:19:00,738][1648984] Fps is (10 sec: 72089.6, 60 sec: 66628.2, 300 sec: 65980.3). Total num frames: 1847066624. Throughput: 0: 16543.3. Samples: 461827584. Policy #0 lag: (min: 0.0, avg: 43.1, max: 192.0) [2024-06-15 23:19:00,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 23:19:02,181][1652475] Updated weights for policy 0, policy_version 901949 (0.0074) [2024-06-15 23:19:05,604][1652475] Updated weights for policy 0, policy_version 902021 (0.0011) [2024-06-15 23:19:05,738][1648984] Fps is (10 sec: 65588.8, 60 sec: 66082.2, 300 sec: 65758.2). Total num frames: 1847328768. Throughput: 0: 16736.7. Samples: 461883904. Policy #0 lag: (min: 0.0, avg: 43.1, max: 192.0) [2024-06-15 23:19:05,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 23:19:06,538][1652475] Updated weights for policy 0, policy_version 902074 (0.0010) [2024-06-15 23:19:07,571][1652475] Updated weights for policy 0, policy_version 902114 (0.0023) [2024-06-15 23:19:09,435][1652475] Updated weights for policy 0, policy_version 902163 (0.0010) [2024-06-15 23:19:10,082][1652475] Updated weights for policy 0, policy_version 902208 (0.0010) [2024-06-15 23:19:10,738][1648984] Fps is (10 sec: 68812.9, 60 sec: 66082.4, 300 sec: 66313.5). Total num frames: 1847754752. Throughput: 0: 16668.4. Samples: 461988352. 
Policy #0 lag: (min: 16.0, avg: 114.6, max: 272.0) [2024-06-15 23:19:10,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 23:19:11,261][1652475] Updated weights for policy 0, policy_version 902268 (0.0009) [2024-06-15 23:19:13,747][1652475] Updated weights for policy 0, policy_version 902320 (0.0012) [2024-06-15 23:19:14,500][1652475] Updated weights for policy 0, policy_version 902353 (0.0009) [2024-06-15 23:19:15,739][1648984] Fps is (10 sec: 78641.9, 60 sec: 67720.3, 300 sec: 66202.4). Total num frames: 1848115200. Throughput: 0: 16668.4. Samples: 462088704. Policy #0 lag: (min: 16.0, avg: 114.6, max: 272.0) [2024-06-15 23:19:15,740][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 23:19:16,412][1652475] Updated weights for policy 0, policy_version 902401 (0.0010) [2024-06-15 23:19:17,899][1652475] Updated weights for policy 0, policy_version 902480 (0.0010) [2024-06-15 23:19:20,362][1652475] Updated weights for policy 0, policy_version 902529 (0.0011) [2024-06-15 23:19:20,738][1648984] Fps is (10 sec: 65534.4, 60 sec: 67720.2, 300 sec: 66313.5). Total num frames: 1848410112. Throughput: 0: 16884.5. Samples: 462139392. Policy #0 lag: (min: 16.0, avg: 114.6, max: 272.0) [2024-06-15 23:19:20,738][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 23:19:21,303][1652475] Updated weights for policy 0, policy_version 902588 (0.0011) [2024-06-15 23:19:23,678][1652475] Updated weights for policy 0, policy_version 902640 (0.0010) [2024-06-15 23:19:24,875][1652475] Updated weights for policy 0, policy_version 902688 (0.0028) [2024-06-15 23:19:25,738][1648984] Fps is (10 sec: 68813.5, 60 sec: 67174.4, 300 sec: 66313.5). Total num frames: 1848803328. Throughput: 0: 16884.8. Samples: 462245376. 
Policy #0 lag: (min: 16.0, avg: 114.6, max: 272.0) [2024-06-15 23:19:25,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 23:19:26,244][1652475] Updated weights for policy 0, policy_version 902777 (0.0011) [2024-06-15 23:19:28,268][1652475] Updated weights for policy 0, policy_version 902817 (0.0009) [2024-06-15 23:19:30,738][1648984] Fps is (10 sec: 62260.8, 60 sec: 66628.3, 300 sec: 65869.2). Total num frames: 1849032704. Throughput: 0: 17078.1. Samples: 462343168. Policy #0 lag: (min: 16.0, avg: 114.6, max: 272.0) [2024-06-15 23:19:30,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 23:19:31,526][1652475] Updated weights for policy 0, policy_version 902864 (0.0012) [2024-06-15 23:19:32,118][1651340] Signal inference workers to stop experience collection... (46450 times) [2024-06-15 23:19:32,149][1652475] InferenceWorker_p0-w0: stopping experience collection (46450 times) [2024-06-15 23:19:32,271][1651340] Signal inference workers to resume experience collection... (46450 times) [2024-06-15 23:19:32,272][1652475] InferenceWorker_p0-w0: resuming experience collection (46450 times) [2024-06-15 23:19:32,830][1652475] Updated weights for policy 0, policy_version 902944 (0.0022) [2024-06-15 23:19:34,508][1652475] Updated weights for policy 0, policy_version 902996 (0.0011) [2024-06-15 23:19:35,524][1652475] Updated weights for policy 0, policy_version 903056 (0.0011) [2024-06-15 23:19:35,738][1648984] Fps is (10 sec: 65536.5, 60 sec: 68266.7, 300 sec: 66757.9). Total num frames: 1849458688. Throughput: 0: 16941.5. Samples: 462390272. 
Policy #0 lag: (min: 16.0, avg: 114.6, max: 272.0) [2024-06-15 23:19:35,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 23:19:36,296][1652475] Updated weights for policy 0, policy_version 903097 (0.0010) [2024-06-15 23:19:39,750][1652475] Updated weights for policy 0, policy_version 903162 (0.0010) [2024-06-15 23:19:40,618][1652475] Updated weights for policy 0, policy_version 903204 (0.0010) [2024-06-15 23:19:40,744][1648984] Fps is (10 sec: 75340.9, 60 sec: 68263.4, 300 sec: 66201.8). Total num frames: 1849786368. Throughput: 0: 17022.9. Samples: 462494208. Policy #0 lag: (min: 16.0, avg: 114.6, max: 272.0) [2024-06-15 23:19:40,747][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 23:19:41,852][1652475] Updated weights for policy 0, policy_version 903234 (0.0011) [2024-06-15 23:19:42,990][1652475] Updated weights for policy 0, policy_version 903298 (0.0072) [2024-06-15 23:19:43,904][1652475] Updated weights for policy 0, policy_version 903353 (0.0012) [2024-06-15 23:19:45,738][1648984] Fps is (10 sec: 62258.9, 60 sec: 67720.7, 300 sec: 66313.6). Total num frames: 1850081280. Throughput: 0: 17157.7. Samples: 462599680. Policy #0 lag: (min: 16.0, avg: 114.6, max: 272.0) [2024-06-15 23:19:45,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 23:19:46,982][1652475] Updated weights for policy 0, policy_version 903393 (0.0012) [2024-06-15 23:19:47,736][1652475] Updated weights for policy 0, policy_version 903443 (0.0010) [2024-06-15 23:19:49,275][1652475] Updated weights for policy 0, policy_version 903489 (0.0012) [2024-06-15 23:19:50,597][1652475] Updated weights for policy 0, policy_version 903553 (0.0011) [2024-06-15 23:19:50,738][1648984] Fps is (10 sec: 68835.9, 60 sec: 68812.8, 300 sec: 66646.8). Total num frames: 1850474496. Throughput: 0: 17043.9. Samples: 462650880. 
Policy #0 lag: (min: 16.0, avg: 114.6, max: 272.0) [2024-06-15 23:19:50,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 23:19:51,482][1652475] Updated weights for policy 0, policy_version 903604 (0.0011) [2024-06-15 23:19:55,443][1652475] Updated weights for policy 0, policy_version 903682 (0.0010) [2024-06-15 23:19:55,738][1648984] Fps is (10 sec: 68812.8, 60 sec: 68275.8, 300 sec: 66202.5). Total num frames: 1850769408. Throughput: 0: 17055.3. Samples: 462755840. Policy #0 lag: (min: 16.0, avg: 114.6, max: 272.0) [2024-06-15 23:19:55,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 23:19:57,247][1652475] Updated weights for policy 0, policy_version 903746 (0.0065) [2024-06-15 23:19:58,412][1652475] Updated weights for policy 0, policy_version 903812 (0.0010) [2024-06-15 23:19:58,933][1651340] Signal inference workers to stop experience collection... (46500 times) [2024-06-15 23:19:59,005][1652475] InferenceWorker_p0-w0: stopping experience collection (46500 times) [2024-06-15 23:19:59,128][1651340] Signal inference workers to resume experience collection... (46500 times) [2024-06-15 23:19:59,128][1652475] InferenceWorker_p0-w0: resuming experience collection (46500 times) [2024-06-15 23:19:59,380][1652475] Updated weights for policy 0, policy_version 903868 (0.0010) [2024-06-15 23:20:00,738][1648984] Fps is (10 sec: 65536.8, 60 sec: 67720.7, 300 sec: 66646.8). Total num frames: 1851129856. Throughput: 0: 16827.8. Samples: 462845952. Policy #0 lag: (min: 16.0, avg: 114.6, max: 272.0) [2024-06-15 23:20:00,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 23:20:03,834][1652475] Updated weights for policy 0, policy_version 903924 (0.0011) [2024-06-15 23:20:05,115][1652475] Updated weights for policy 0, policy_version 903994 (0.0089) [2024-06-15 23:20:05,701][1652475] Updated weights for policy 0, policy_version 904032 (0.0013) [2024-06-15 23:20:05,738][1648984] Fps is (10 sec: 68811.0, 60 sec: 68812.5, 300 sec: 66424.6). 
Total num frames: 1851457536. Throughput: 0: 16952.9. Samples: 462902272. Policy #0 lag: (min: 16.0, avg: 114.6, max: 272.0) [2024-06-15 23:20:05,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 23:20:06,757][1652475] Updated weights for policy 0, policy_version 904066 (0.0010) [2024-06-15 23:20:07,683][1652475] Updated weights for policy 0, policy_version 904127 (0.0010) [2024-06-15 23:20:10,738][1648984] Fps is (10 sec: 55703.4, 60 sec: 65535.7, 300 sec: 65869.2). Total num frames: 1851686912. Throughput: 0: 16554.6. Samples: 462990336. Policy #0 lag: (min: 16.0, avg: 114.6, max: 272.0) [2024-06-15 23:20:10,739][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 23:20:11,497][1652475] Updated weights for policy 0, policy_version 904188 (0.0010) [2024-06-15 23:20:12,893][1652475] Updated weights for policy 0, policy_version 904248 (0.0012) [2024-06-15 23:20:13,791][1652475] Updated weights for policy 0, policy_version 904288 (0.0074) [2024-06-15 23:20:15,738][1648984] Fps is (10 sec: 58983.9, 60 sec: 65536.1, 300 sec: 66202.4). Total num frames: 1852047360. Throughput: 0: 16429.5. Samples: 463082496. Policy #0 lag: (min: 16.0, avg: 114.6, max: 272.0) [2024-06-15 23:20:15,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 23:20:16,320][1652475] Updated weights for policy 0, policy_version 904340 (0.0014) [2024-06-15 23:20:18,724][1652475] Updated weights for policy 0, policy_version 904401 (0.0014) [2024-06-15 23:20:19,707][1652475] Updated weights for policy 0, policy_version 904450 (0.0050) [2024-06-15 23:20:20,588][1652475] Updated weights for policy 0, policy_version 904506 (0.0010) [2024-06-15 23:20:20,738][1648984] Fps is (10 sec: 75368.5, 60 sec: 67174.7, 300 sec: 66202.5). Total num frames: 1852440576. Throughput: 0: 16600.2. Samples: 463137280. 
Policy #0 lag: (min: 16.0, avg: 114.6, max: 272.0) [2024-06-15 23:20:20,738][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 23:20:22,125][1652475] Updated weights for policy 0, policy_version 904560 (0.0011) [2024-06-15 23:20:25,098][1652475] Updated weights for policy 0, policy_version 904610 (0.0011) [2024-06-15 23:20:25,749][1648984] Fps is (10 sec: 65471.5, 60 sec: 64979.2, 300 sec: 65978.1). Total num frames: 1852702720. Throughput: 0: 16449.9. Samples: 463234560. Policy #0 lag: (min: 16.0, avg: 114.6, max: 272.0) [2024-06-15 23:20:25,753][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 23:20:25,973][1652475] Updated weights for policy 0, policy_version 904641 (0.0021) [2024-06-15 23:20:26,994][1652475] Updated weights for policy 0, policy_version 904704 (0.0009) [2024-06-15 23:20:27,966][1652475] Updated weights for policy 0, policy_version 904766 (0.0012) [2024-06-15 23:20:29,386][1651340] Signal inference workers to stop experience collection... (46550 times) [2024-06-15 23:20:29,422][1652475] InferenceWorker_p0-w0: stopping experience collection (46550 times) [2024-06-15 23:20:29,602][1651340] Signal inference workers to resume experience collection... (46550 times) [2024-06-15 23:20:29,604][1652475] InferenceWorker_p0-w0: resuming experience collection (46550 times) [2024-06-15 23:20:29,764][1652475] Updated weights for policy 0, policy_version 904826 (0.0012) [2024-06-15 23:20:30,738][1648984] Fps is (10 sec: 65536.3, 60 sec: 67720.6, 300 sec: 66646.8). Total num frames: 1853095936. Throughput: 0: 16133.7. Samples: 463325696. Policy #0 lag: (min: 16.0, avg: 114.6, max: 272.0) [2024-06-15 23:20:30,738][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 23:20:34,069][1652475] Updated weights for policy 0, policy_version 904887 (0.0029) [2024-06-15 23:20:35,507][1652475] Updated weights for policy 0, policy_version 904946 (0.0010) [2024-06-15 23:20:35,738][1648984] Fps is (10 sec: 65599.6, 60 sec: 64989.6, 300 sec: 65758.1). 
Total num frames: 1853358080. Throughput: 0: 16361.2. Samples: 463387136. Policy #0 lag: (min: 16.0, avg: 114.6, max: 272.0) [2024-06-15 23:20:35,738][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 23:20:36,465][1652475] Updated weights for policy 0, policy_version 904999 (0.0010) [2024-06-15 23:20:37,711][1652475] Updated weights for policy 0, policy_version 905072 (0.0082) [2024-06-15 23:20:40,738][1648984] Fps is (10 sec: 52428.6, 60 sec: 63901.2, 300 sec: 66091.4). Total num frames: 1853620224. Throughput: 0: 15872.0. Samples: 463470080. Policy #0 lag: (min: 16.0, avg: 114.6, max: 272.0) [2024-06-15 23:20:40,738][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 23:20:41,878][1652475] Updated weights for policy 0, policy_version 905120 (0.0010) [2024-06-15 23:20:42,691][1652475] Updated weights for policy 0, policy_version 905157 (0.0010) [2024-06-15 23:20:44,206][1652475] Updated weights for policy 0, policy_version 905239 (0.0086) [2024-06-15 23:20:45,738][1648984] Fps is (10 sec: 72090.7, 60 sec: 66628.2, 300 sec: 66424.6). Total num frames: 1854078976. Throughput: 0: 15951.6. Samples: 463563776. Policy #0 lag: (min: 16.0, avg: 114.6, max: 272.0) [2024-06-15 23:20:45,738][1648984] Avg episode reward: [(0, '-0.250')] [2024-06-15 23:20:45,808][1652475] Updated weights for policy 0, policy_version 905328 (0.0145) [2024-06-15 23:20:49,587][1652475] Updated weights for policy 0, policy_version 905363 (0.0023) [2024-06-15 23:20:50,738][1648984] Fps is (10 sec: 65536.3, 60 sec: 63351.5, 300 sec: 65869.2). Total num frames: 1854275584. Throughput: 0: 15906.2. Samples: 463618048. 
Policy #0 lag: (min: 16.0, avg: 114.6, max: 272.0) [2024-06-15 23:20:50,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 23:20:51,665][1652475] Updated weights for policy 0, policy_version 905440 (0.0014) [2024-06-15 23:20:53,070][1652475] Updated weights for policy 0, policy_version 905511 (0.0012) [2024-06-15 23:20:54,060][1652475] Updated weights for policy 0, policy_version 905569 (0.0012) [2024-06-15 23:20:55,738][1648984] Fps is (10 sec: 58980.9, 60 sec: 64989.6, 300 sec: 66646.7). Total num frames: 1854668800. Throughput: 0: 15837.9. Samples: 463703040. Policy #0 lag: (min: 16.0, avg: 114.6, max: 272.0) [2024-06-15 23:20:55,739][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 23:20:55,743][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000905600_1854668800.pth... [2024-06-15 23:20:55,800][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000897824_1838743552.pth [2024-06-15 23:20:56,956][1652475] Updated weights for policy 0, policy_version 905602 (0.0010) [2024-06-15 23:20:57,926][1652475] Updated weights for policy 0, policy_version 905663 (0.0009) [2024-06-15 23:21:00,192][1652475] Updated weights for policy 0, policy_version 905721 (0.0011) [2024-06-15 23:21:00,738][1648984] Fps is (10 sec: 65535.7, 60 sec: 63351.3, 300 sec: 65758.2). Total num frames: 1854930944. Throughput: 0: 15997.2. Samples: 463802368. Policy #0 lag: (min: 47.0, avg: 117.9, max: 255.0) [2024-06-15 23:21:00,738][1648984] Avg episode reward: [(0, '-0.200')] [2024-06-15 23:21:01,219][1651340] Signal inference workers to stop experience collection... (46600 times) [2024-06-15 23:21:01,281][1652475] InferenceWorker_p0-w0: stopping experience collection (46600 times) [2024-06-15 23:21:01,480][1651340] Signal inference workers to resume experience collection... 
(46600 times) [2024-06-15 23:21:01,481][1652475] InferenceWorker_p0-w0: resuming experience collection (46600 times) [2024-06-15 23:21:01,617][1652475] Updated weights for policy 0, policy_version 905763 (0.0010) [2024-06-15 23:21:03,626][1652475] Updated weights for policy 0, policy_version 905848 (0.0091) [2024-06-15 23:21:05,330][1652475] Updated weights for policy 0, policy_version 905888 (0.0009) [2024-06-15 23:21:05,738][1648984] Fps is (10 sec: 65538.5, 60 sec: 64444.1, 300 sec: 66202.5). Total num frames: 1855324160. Throughput: 0: 15837.9. Samples: 463849984. Policy #0 lag: (min: 47.0, avg: 117.9, max: 255.0) [2024-06-15 23:21:05,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 23:21:07,077][1652475] Updated weights for policy 0, policy_version 905940 (0.0010) [2024-06-15 23:21:09,562][1652475] Updated weights for policy 0, policy_version 906000 (0.0084) [2024-06-15 23:21:10,207][1652475] Updated weights for policy 0, policy_version 906042 (0.0010) [2024-06-15 23:21:10,738][1648984] Fps is (10 sec: 65536.0, 60 sec: 64990.2, 300 sec: 66202.5). Total num frames: 1855586304. Throughput: 0: 16034.8. Samples: 463955968. Policy #0 lag: (min: 47.0, avg: 117.9, max: 255.0) [2024-06-15 23:21:10,739][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 23:21:11,611][1652475] Updated weights for policy 0, policy_version 906096 (0.0014) [2024-06-15 23:21:12,537][1652475] Updated weights for policy 0, policy_version 906144 (0.0090) [2024-06-15 23:21:14,775][1652475] Updated weights for policy 0, policy_version 906208 (0.0085) [2024-06-15 23:21:15,738][1648984] Fps is (10 sec: 65535.5, 60 sec: 65536.0, 300 sec: 65869.2). Total num frames: 1855979520. Throughput: 0: 16156.4. Samples: 464052736. 
Policy #0 lag: (min: 47.0, avg: 117.9, max: 255.0) [2024-06-15 23:21:15,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 23:21:16,924][1652475] Updated weights for policy 0, policy_version 906260 (0.0011) [2024-06-15 23:21:17,772][1652475] Updated weights for policy 0, policy_version 906306 (0.0009) [2024-06-15 23:21:18,580][1652475] Updated weights for policy 0, policy_version 906365 (0.0009) [2024-06-15 23:21:19,205][1652475] Updated weights for policy 0, policy_version 906414 (0.0028) [2024-06-15 23:21:20,438][1652475] Updated weights for policy 0, policy_version 906436 (0.0009) [2024-06-15 23:21:20,738][1648984] Fps is (10 sec: 81919.0, 60 sec: 66082.0, 300 sec: 66757.8). Total num frames: 1856405504. Throughput: 0: 16384.0. Samples: 464124416. Policy #0 lag: (min: 47.0, avg: 117.9, max: 255.0) [2024-06-15 23:21:20,739][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 23:21:22,483][1652475] Updated weights for policy 0, policy_version 906497 (0.0079) [2024-06-15 23:21:23,055][1652475] Updated weights for policy 0, policy_version 906544 (0.0010) [2024-06-15 23:21:23,943][1652475] Updated weights for policy 0, policy_version 906595 (0.0010) [2024-06-15 23:21:24,966][1652475] Updated weights for policy 0, policy_version 906659 (0.0010) [2024-06-15 23:21:25,740][1648984] Fps is (10 sec: 91739.6, 60 sec: 69915.2, 300 sec: 67090.8). Total num frames: 1856897024. Throughput: 0: 17510.0. Samples: 464258048. Policy #0 lag: (min: 47.0, avg: 117.9, max: 255.0) [2024-06-15 23:21:25,745][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 23:21:26,743][1652475] Updated weights for policy 0, policy_version 906710 (0.0013) [2024-06-15 23:21:27,189][1652475] Updated weights for policy 0, policy_version 906751 (0.0009) [2024-06-15 23:21:28,190][1651340] Signal inference workers to stop experience collection... 
(46650 times) [2024-06-15 23:21:28,234][1652475] InferenceWorker_p0-w0: stopping experience collection (46650 times) [2024-06-15 23:21:28,329][1651340] Signal inference workers to resume experience collection... (46650 times) [2024-06-15 23:21:28,329][1652475] InferenceWorker_p0-w0: resuming experience collection (46650 times) [2024-06-15 23:21:28,650][1652475] Updated weights for policy 0, policy_version 906813 (0.0010) [2024-06-15 23:21:29,625][1652475] Updated weights for policy 0, policy_version 906849 (0.0010) [2024-06-15 23:21:30,626][1652475] Updated weights for policy 0, policy_version 906904 (0.0011) [2024-06-15 23:21:30,738][1648984] Fps is (10 sec: 95028.1, 60 sec: 70997.3, 300 sec: 67313.2). Total num frames: 1857355776. Throughput: 0: 18397.9. Samples: 464391680. Policy #0 lag: (min: 47.0, avg: 117.9, max: 255.0) [2024-06-15 23:21:30,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 23:21:33,210][1652475] Updated weights for policy 0, policy_version 906960 (0.0011) [2024-06-15 23:21:34,104][1652475] Updated weights for policy 0, policy_version 907030 (0.0055) [2024-06-15 23:21:34,693][1652475] Updated weights for policy 0, policy_version 907073 (0.0010) [2024-06-15 23:21:35,386][1652475] Updated weights for policy 0, policy_version 907134 (0.0010) [2024-06-15 23:21:35,738][1648984] Fps is (10 sec: 91761.5, 60 sec: 74274.4, 300 sec: 68090.8). Total num frames: 1857814528. Throughput: 0: 18807.5. Samples: 464464384. Policy #0 lag: (min: 47.0, avg: 117.9, max: 255.0) [2024-06-15 23:21:35,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 23:21:37,846][1652475] Updated weights for policy 0, policy_version 907184 (0.0012) [2024-06-15 23:21:39,370][1652475] Updated weights for policy 0, policy_version 907235 (0.0011) [2024-06-15 23:21:40,324][1652475] Updated weights for policy 0, policy_version 907300 (0.0010) [2024-06-15 23:21:40,738][1648984] Fps is (10 sec: 85195.8, 60 sec: 76458.5, 300 sec: 67979.7). Total num frames: 1858207744. 
Throughput: 0: 19649.5. Samples: 464587264. Policy #0 lag: (min: 47.0, avg: 117.9, max: 255.0) [2024-06-15 23:21:40,739][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 23:21:41,042][1652475] Updated weights for policy 0, policy_version 907360 (0.0011) [2024-06-15 23:21:44,338][1652475] Updated weights for policy 0, policy_version 907411 (0.0010) [2024-06-15 23:21:45,157][1652475] Updated weights for policy 0, policy_version 907477 (0.0010) [2024-06-15 23:21:45,738][1648984] Fps is (10 sec: 78642.6, 60 sec: 75366.4, 300 sec: 68425.2). Total num frames: 1858600960. Throughput: 0: 20297.9. Samples: 464715776. Policy #0 lag: (min: 47.0, avg: 117.9, max: 255.0) [2024-06-15 23:21:45,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 23:21:45,787][1652475] Updated weights for policy 0, policy_version 907522 (0.0010) [2024-06-15 23:21:46,406][1652475] Updated weights for policy 0, policy_version 907579 (0.0010) [2024-06-15 23:21:47,552][1652475] Updated weights for policy 0, policy_version 907648 (0.0011) [2024-06-15 23:21:50,738][1648984] Fps is (10 sec: 78644.7, 60 sec: 78643.2, 300 sec: 68868.4). Total num frames: 1858994176. Throughput: 0: 20571.0. Samples: 464775680. Policy #0 lag: (min: 47.0, avg: 117.9, max: 255.0) [2024-06-15 23:21:50,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 23:21:50,753][1652475] Updated weights for policy 0, policy_version 907714 (0.0063) [2024-06-15 23:21:51,122][1651340] Signal inference workers to stop experience collection... (46700 times) [2024-06-15 23:21:51,154][1652475] InferenceWorker_p0-w0: stopping experience collection (46700 times) [2024-06-15 23:21:51,243][1651340] Signal inference workers to resume experience collection... 
(46700 times) [2024-06-15 23:21:51,243][1652475] InferenceWorker_p0-w0: resuming experience collection (46700 times) [2024-06-15 23:21:51,421][1652475] Updated weights for policy 0, policy_version 907776 (0.0011) [2024-06-15 23:21:52,629][1652475] Updated weights for policy 0, policy_version 907834 (0.0011) [2024-06-15 23:21:53,209][1652475] Updated weights for policy 0, policy_version 907876 (0.0011) [2024-06-15 23:21:55,738][1648984] Fps is (10 sec: 78643.3, 60 sec: 78643.6, 300 sec: 68757.3). Total num frames: 1859387392. Throughput: 0: 21208.2. Samples: 464910336. Policy #0 lag: (min: 47.0, avg: 117.9, max: 255.0) [2024-06-15 23:21:55,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 23:21:56,253][1652475] Updated weights for policy 0, policy_version 907952 (0.0011) [2024-06-15 23:21:56,823][1652475] Updated weights for policy 0, policy_version 907989 (0.0009) [2024-06-15 23:21:58,003][1652475] Updated weights for policy 0, policy_version 908052 (0.0011) [2024-06-15 23:21:58,627][1652475] Updated weights for policy 0, policy_version 908101 (0.0009) [2024-06-15 23:21:59,237][1652475] Updated weights for policy 0, policy_version 908154 (0.0057) [2024-06-15 23:22:00,738][1648984] Fps is (10 sec: 91750.1, 60 sec: 83012.3, 300 sec: 69645.9). Total num frames: 1859911680. Throughput: 0: 22107.0. Samples: 465047552. Policy #0 lag: (min: 47.0, avg: 117.9, max: 255.0) [2024-06-15 23:22:00,738][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 23:22:01,939][1652475] Updated weights for policy 0, policy_version 908216 (0.0012) [2024-06-15 23:22:02,577][1652475] Updated weights for policy 0, policy_version 908256 (0.0010) [2024-06-15 23:22:04,434][1652475] Updated weights for policy 0, policy_version 908340 (0.0009) [2024-06-15 23:22:05,238][1652475] Updated weights for policy 0, policy_version 908409 (0.0012) [2024-06-15 23:22:05,738][1648984] Fps is (10 sec: 104855.5, 60 sec: 85196.4, 300 sec: 70201.3). Total num frames: 1860435968. 
Throughput: 0: 22061.5. Samples: 465117184. Policy #0 lag: (min: 47.0, avg: 117.9, max: 255.0) [2024-06-15 23:22:05,739][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 23:22:07,969][1652475] Updated weights for policy 0, policy_version 908450 (0.0009) [2024-06-15 23:22:08,760][1652475] Updated weights for policy 0, policy_version 908512 (0.0010) [2024-06-15 23:22:09,563][1652475] Updated weights for policy 0, policy_version 908562 (0.0010) [2024-06-15 23:22:10,738][1648984] Fps is (10 sec: 91750.7, 60 sec: 87381.4, 300 sec: 70201.3). Total num frames: 1860829184. Throughput: 0: 21936.9. Samples: 465245184. Policy #0 lag: (min: 47.0, avg: 117.9, max: 255.0) [2024-06-15 23:22:10,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 23:22:11,561][1652475] Updated weights for policy 0, policy_version 908609 (0.0012) [2024-06-15 23:22:12,249][1652475] Updated weights for policy 0, policy_version 908672 (0.0011) [2024-06-15 23:22:13,424][1651340] Signal inference workers to stop experience collection... (46750 times) [2024-06-15 23:22:13,470][1652475] InferenceWorker_p0-w0: stopping experience collection (46750 times) [2024-06-15 23:22:13,553][1651340] Signal inference workers to resume experience collection... (46750 times) [2024-06-15 23:22:13,554][1652475] InferenceWorker_p0-w0: resuming experience collection (46750 times) [2024-06-15 23:22:14,377][1652475] Updated weights for policy 0, policy_version 908737 (0.0010) [2024-06-15 23:22:15,043][1652475] Updated weights for policy 0, policy_version 908800 (0.0011) [2024-06-15 23:22:15,738][1648984] Fps is (10 sec: 81922.0, 60 sec: 87927.5, 300 sec: 70756.7). Total num frames: 1861255168. Throughput: 0: 21743.0. Samples: 465370112. 
Policy #0 lag: (min: 47.0, avg: 117.9, max: 255.0) [2024-06-15 23:22:15,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 23:22:16,275][1652475] Updated weights for policy 0, policy_version 908864 (0.0009) [2024-06-15 23:22:18,815][1652475] Updated weights for policy 0, policy_version 908912 (0.0009) [2024-06-15 23:22:19,649][1652475] Updated weights for policy 0, policy_version 908976 (0.0010) [2024-06-15 23:22:20,595][1652475] Updated weights for policy 0, policy_version 909051 (0.0012) [2024-06-15 23:22:20,738][1648984] Fps is (10 sec: 91748.9, 60 sec: 89019.7, 300 sec: 71089.9). Total num frames: 1861746688. Throughput: 0: 21754.2. Samples: 465443328. Policy #0 lag: (min: 47.0, avg: 117.9, max: 255.0) [2024-06-15 23:22:20,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 23:22:22,339][1652475] Updated weights for policy 0, policy_version 909105 (0.0011) [2024-06-15 23:22:24,691][1652475] Updated weights for policy 0, policy_version 909152 (0.0036) [2024-06-15 23:22:25,379][1652475] Updated weights for policy 0, policy_version 909201 (0.0009) [2024-06-15 23:22:25,738][1648984] Fps is (10 sec: 85196.5, 60 sec: 86836.9, 300 sec: 71201.0). Total num frames: 1862107136. Throughput: 0: 21959.2. Samples: 465575424. Policy #0 lag: (min: 47.0, avg: 117.9, max: 255.0) [2024-06-15 23:22:25,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 23:22:26,018][1652475] Updated weights for policy 0, policy_version 909255 (0.0011) [2024-06-15 23:22:26,680][1652475] Updated weights for policy 0, policy_version 909312 (0.0012) [2024-06-15 23:22:28,199][1652475] Updated weights for policy 0, policy_version 909363 (0.0010) [2024-06-15 23:22:30,339][1652475] Updated weights for policy 0, policy_version 909395 (0.0012) [2024-06-15 23:22:30,738][1648984] Fps is (10 sec: 75367.4, 60 sec: 85743.0, 300 sec: 71423.1). Total num frames: 1862500352. Throughput: 0: 22209.4. Samples: 465715200. 
Policy #0 lag: (min: 47.0, avg: 117.9, max: 255.0) [2024-06-15 23:22:30,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 23:22:31,120][1652475] Updated weights for policy 0, policy_version 909459 (0.0010) [2024-06-15 23:22:31,824][1652475] Updated weights for policy 0, policy_version 909520 (0.0011) [2024-06-15 23:22:32,418][1652475] Updated weights for policy 0, policy_version 909568 (0.0009) [2024-06-15 23:22:33,764][1652475] Updated weights for policy 0, policy_version 909616 (0.0010) [2024-06-15 23:22:35,738][1648984] Fps is (10 sec: 81920.7, 60 sec: 85196.9, 300 sec: 71534.2). Total num frames: 1862926336. Throughput: 0: 22198.1. Samples: 465774592. Policy #0 lag: (min: 47.0, avg: 117.9, max: 255.0) [2024-06-15 23:22:35,738][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 23:22:35,739][1651340] Signal inference workers to stop experience collection... (46800 times) [2024-06-15 23:22:35,847][1652475] InferenceWorker_p0-w0: stopping experience collection (46800 times) [2024-06-15 23:22:35,883][1651340] Signal inference workers to resume experience collection... (46800 times) [2024-06-15 23:22:35,884][1652475] InferenceWorker_p0-w0: resuming experience collection (46800 times) [2024-06-15 23:22:36,087][1652475] Updated weights for policy 0, policy_version 909664 (0.0009) [2024-06-15 23:22:36,820][1652475] Updated weights for policy 0, policy_version 909715 (0.0010) [2024-06-15 23:22:38,589][1652475] Updated weights for policy 0, policy_version 909781 (0.0011) [2024-06-15 23:22:39,377][1652475] Updated weights for policy 0, policy_version 909845 (0.0009) [2024-06-15 23:22:40,738][1648984] Fps is (10 sec: 95027.4, 60 sec: 87381.6, 300 sec: 72422.8). Total num frames: 1863450624. Throughput: 0: 22095.7. Samples: 465904640. 
Policy #0 lag: (min: 47.0, avg: 117.9, max: 255.0) [2024-06-15 23:22:40,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 23:22:41,254][1652475] Updated weights for policy 0, policy_version 909890 (0.0011) [2024-06-15 23:22:41,940][1652475] Updated weights for policy 0, policy_version 909951 (0.0010) [2024-06-15 23:22:43,128][1652475] Updated weights for policy 0, policy_version 910002 (0.0011) [2024-06-15 23:22:44,412][1652475] Updated weights for policy 0, policy_version 910039 (0.0016) [2024-06-15 23:22:45,609][1652475] Updated weights for policy 0, policy_version 910096 (0.0010) [2024-06-15 23:22:45,738][1648984] Fps is (10 sec: 95026.5, 60 sec: 87927.5, 300 sec: 72645.0). Total num frames: 1863876608. Throughput: 0: 22016.0. Samples: 466038272. Policy #0 lag: (min: 47.0, avg: 117.9, max: 255.0) [2024-06-15 23:22:45,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 23:22:46,139][1652475] Updated weights for policy 0, policy_version 910142 (0.0009) [2024-06-15 23:22:47,171][1652475] Updated weights for policy 0, policy_version 910192 (0.0009) [2024-06-15 23:22:49,581][1652475] Updated weights for policy 0, policy_version 910243 (0.0017) [2024-06-15 23:22:50,297][1652475] Updated weights for policy 0, policy_version 910304 (0.0065) [2024-06-15 23:22:50,738][1648984] Fps is (10 sec: 91748.9, 60 sec: 89565.6, 300 sec: 73311.4). Total num frames: 1864368128. Throughput: 0: 21959.2. Samples: 466105344. 
Policy #0 lag: (min: 47.0, avg: 117.9, max: 255.0) [2024-06-15 23:22:50,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 23:22:51,782][1652475] Updated weights for policy 0, policy_version 910356 (0.0013) [2024-06-15 23:22:52,608][1652475] Updated weights for policy 0, policy_version 910421 (0.0009) [2024-06-15 23:22:53,094][1652475] Updated weights for policy 0, policy_version 910464 (0.0010) [2024-06-15 23:22:55,488][1652475] Updated weights for policy 0, policy_version 910522 (0.0013) [2024-06-15 23:22:55,738][1648984] Fps is (10 sec: 88473.1, 60 sec: 89565.8, 300 sec: 73533.6). Total num frames: 1864761344. Throughput: 0: 22152.5. Samples: 466242048. Policy #0 lag: (min: 15.0, avg: 102.9, max: 271.0) [2024-06-15 23:22:55,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 23:22:55,746][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000910528_1864761344.pth... [2024-06-15 23:22:55,799][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000901728_1846738944.pth [2024-06-15 23:22:56,470][1652475] Updated weights for policy 0, policy_version 910563 (0.0010) [2024-06-15 23:22:56,842][1651340] Signal inference workers to stop experience collection... (46850 times) [2024-06-15 23:22:56,876][1652475] InferenceWorker_p0-w0: stopping experience collection (46850 times) [2024-06-15 23:22:56,974][1651340] Signal inference workers to resume experience collection... (46850 times) [2024-06-15 23:22:56,974][1652475] InferenceWorker_p0-w0: resuming experience collection (46850 times) [2024-06-15 23:22:57,098][1652475] Updated weights for policy 0, policy_version 910613 (0.0010) [2024-06-15 23:22:58,101][1652475] Updated weights for policy 0, policy_version 910689 (0.0011) [2024-06-15 23:23:00,738][1648984] Fps is (10 sec: 85196.8, 60 sec: 88473.4, 300 sec: 74089.0). Total num frames: 1865220096. Throughput: 0: 22550.7. Samples: 466384896. 
Policy #0 lag: (min: 15.0, avg: 102.9, max: 271.0) [2024-06-15 23:23:00,739][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 23:23:00,770][1652475] Updated weights for policy 0, policy_version 910758 (0.0010) [2024-06-15 23:23:02,049][1652475] Updated weights for policy 0, policy_version 910800 (0.0011) [2024-06-15 23:23:02,754][1652475] Updated weights for policy 0, policy_version 910852 (0.0011) [2024-06-15 23:23:03,566][1652475] Updated weights for policy 0, policy_version 910914 (0.0010) [2024-06-15 23:23:04,300][1652475] Updated weights for policy 0, policy_version 910970 (0.0015) [2024-06-15 23:23:05,738][1648984] Fps is (10 sec: 91751.3, 60 sec: 87381.7, 300 sec: 74200.2). Total num frames: 1865678848. Throughput: 0: 22255.0. Samples: 466444800. Policy #0 lag: (min: 15.0, avg: 102.9, max: 271.0) [2024-06-15 23:23:05,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 23:23:06,584][1652475] Updated weights for policy 0, policy_version 911024 (0.0010) [2024-06-15 23:23:08,240][1652475] Updated weights for policy 0, policy_version 911075 (0.0010) [2024-06-15 23:23:09,258][1652475] Updated weights for policy 0, policy_version 911162 (0.0011) [2024-06-15 23:23:09,901][1652475] Updated weights for policy 0, policy_version 911201 (0.0010) [2024-06-15 23:23:10,738][1648984] Fps is (10 sec: 98303.2, 60 sec: 89565.5, 300 sec: 75088.6). Total num frames: 1866203136. Throughput: 0: 22266.2. Samples: 466577408. 
Policy #0 lag: (min: 15.0, avg: 102.9, max: 271.0) [2024-06-15 23:23:10,739][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 23:23:12,317][1652475] Updated weights for policy 0, policy_version 911249 (0.0019) [2024-06-15 23:23:12,857][1652475] Updated weights for policy 0, policy_version 911296 (0.0010) [2024-06-15 23:23:13,774][1652475] Updated weights for policy 0, policy_version 911345 (0.0011) [2024-06-15 23:23:14,578][1652475] Updated weights for policy 0, policy_version 911417 (0.0011) [2024-06-15 23:23:15,738][1648984] Fps is (10 sec: 95026.4, 60 sec: 89565.8, 300 sec: 75533.0). Total num frames: 1866629120. Throughput: 0: 22209.4. Samples: 466714624. Policy #0 lag: (min: 15.0, avg: 102.9, max: 271.0) [2024-06-15 23:23:15,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 23:23:15,995][1652475] Updated weights for policy 0, policy_version 911457 (0.0010) [2024-06-15 23:23:18,557][1652475] Updated weights for policy 0, policy_version 911520 (0.0012) [2024-06-15 23:23:18,830][1651340] Signal inference workers to stop experience collection... (46900 times) [2024-06-15 23:23:18,862][1652475] InferenceWorker_p0-w0: stopping experience collection (46900 times) [2024-06-15 23:23:18,962][1651340] Signal inference workers to resume experience collection... (46900 times) [2024-06-15 23:23:18,962][1652475] InferenceWorker_p0-w0: resuming experience collection (46900 times) [2024-06-15 23:23:19,268][1652475] Updated weights for policy 0, policy_version 911571 (0.0009) [2024-06-15 23:23:20,123][1652475] Updated weights for policy 0, policy_version 911635 (0.0010) [2024-06-15 23:23:20,738][1648984] Fps is (10 sec: 91752.6, 60 sec: 89566.1, 300 sec: 75755.2). Total num frames: 1867120640. Throughput: 0: 22630.4. Samples: 466792960. 
Policy #0 lag: (min: 15.0, avg: 102.9, max: 271.0) [2024-06-15 23:23:20,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 23:23:22,432][1652475] Updated weights for policy 0, policy_version 911686 (0.0011) [2024-06-15 23:23:23,673][1652475] Updated weights for policy 0, policy_version 911746 (0.0011) [2024-06-15 23:23:24,500][1652475] Updated weights for policy 0, policy_version 911811 (0.0081) [2024-06-15 23:23:25,105][1652475] Updated weights for policy 0, policy_version 911863 (0.0011) [2024-06-15 23:23:25,738][1648984] Fps is (10 sec: 88474.6, 60 sec: 90112.1, 300 sec: 76199.5). Total num frames: 1867513856. Throughput: 0: 22505.3. Samples: 466917376. Policy #0 lag: (min: 15.0, avg: 102.9, max: 271.0) [2024-06-15 23:23:25,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 23:23:26,522][1652475] Updated weights for policy 0, policy_version 911920 (0.0010) [2024-06-15 23:23:28,830][1652475] Updated weights for policy 0, policy_version 911968 (0.0009) [2024-06-15 23:23:29,664][1652475] Updated weights for policy 0, policy_version 912032 (0.0070) [2024-06-15 23:23:30,479][1652475] Updated weights for policy 0, policy_version 912096 (0.0014) [2024-06-15 23:23:30,738][1648984] Fps is (10 sec: 88473.0, 60 sec: 91750.3, 300 sec: 76754.9). Total num frames: 1868005376. Throughput: 0: 22562.1. Samples: 467053568. Policy #0 lag: (min: 15.0, avg: 102.9, max: 271.0) [2024-06-15 23:23:30,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 23:23:32,301][1652475] Updated weights for policy 0, policy_version 912149 (0.0010) [2024-06-15 23:23:34,895][1652475] Updated weights for policy 0, policy_version 912212 (0.0016) [2024-06-15 23:23:35,718][1652475] Updated weights for policy 0, policy_version 912274 (0.0010) [2024-06-15 23:23:35,738][1648984] Fps is (10 sec: 81919.6, 60 sec: 90111.9, 300 sec: 76755.0). Total num frames: 1868333056. Throughput: 0: 22425.7. Samples: 467114496. 
Policy #0 lag: (min: 15.0, avg: 102.9, max: 271.0) [2024-06-15 23:23:35,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 23:23:36,490][1652475] Updated weights for policy 0, policy_version 912336 (0.0013) [2024-06-15 23:23:38,089][1652475] Updated weights for policy 0, policy_version 912401 (0.0013) [2024-06-15 23:23:38,559][1652475] Updated weights for policy 0, policy_version 912447 (0.0011) [2024-06-15 23:23:40,738][1648984] Fps is (10 sec: 68812.9, 60 sec: 87381.3, 300 sec: 76866.0). Total num frames: 1868693504. Throughput: 0: 22380.1. Samples: 467249152. Policy #0 lag: (min: 15.0, avg: 102.9, max: 271.0) [2024-06-15 23:23:40,740][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 23:23:41,006][1651340] Signal inference workers to stop experience collection... (46950 times) [2024-06-15 23:23:41,058][1652475] InferenceWorker_p0-w0: stopping experience collection (46950 times) [2024-06-15 23:23:41,130][1651340] Signal inference workers to resume experience collection... (46950 times) [2024-06-15 23:23:41,130][1652475] InferenceWorker_p0-w0: resuming experience collection (46950 times) [2024-06-15 23:23:41,686][1652475] Updated weights for policy 0, policy_version 912516 (0.0011) [2024-06-15 23:23:42,346][1652475] Updated weights for policy 0, policy_version 912568 (0.0012) [2024-06-15 23:23:43,147][1652475] Updated weights for policy 0, policy_version 912635 (0.0011) [2024-06-15 23:23:44,181][1652475] Updated weights for policy 0, policy_version 912678 (0.0011) [2024-06-15 23:23:45,738][1648984] Fps is (10 sec: 88473.8, 60 sec: 89019.8, 300 sec: 77532.4). Total num frames: 1869217792. Throughput: 0: 22027.5. Samples: 467376128. 
Policy #0 lag: (min: 15.0, avg: 102.9, max: 271.0) [2024-06-15 23:23:45,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 23:23:46,775][1652475] Updated weights for policy 0, policy_version 912727 (0.0011) [2024-06-15 23:23:47,439][1652475] Updated weights for policy 0, policy_version 912784 (0.0013) [2024-06-15 23:23:48,849][1652475] Updated weights for policy 0, policy_version 912836 (0.0010) [2024-06-15 23:23:49,712][1652475] Updated weights for policy 0, policy_version 912899 (0.0011) [2024-06-15 23:23:50,738][1648984] Fps is (10 sec: 104858.3, 60 sec: 89566.1, 300 sec: 78201.0). Total num frames: 1869742080. Throughput: 0: 22163.9. Samples: 467442176. Policy #0 lag: (min: 15.0, avg: 102.9, max: 271.0) [2024-06-15 23:23:50,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 23:23:52,128][1652475] Updated weights for policy 0, policy_version 912976 (0.0013) [2024-06-15 23:23:53,831][1652475] Updated weights for policy 0, policy_version 913026 (0.0010) [2024-06-15 23:23:54,465][1652475] Updated weights for policy 0, policy_version 913086 (0.0011) [2024-06-15 23:23:55,331][1652475] Updated weights for policy 0, policy_version 913148 (0.0014) [2024-06-15 23:23:55,738][1648984] Fps is (10 sec: 91748.7, 60 sec: 89565.7, 300 sec: 78198.9). Total num frames: 1870135296. Throughput: 0: 22118.4. Samples: 467572736. Policy #0 lag: (min: 15.0, avg: 102.9, max: 271.0) [2024-06-15 23:23:55,739][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 23:23:56,365][1652475] Updated weights for policy 0, policy_version 913185 (0.0011) [2024-06-15 23:23:57,640][1652475] Updated weights for policy 0, policy_version 913248 (0.0018) [2024-06-15 23:24:00,037][1652475] Updated weights for policy 0, policy_version 913300 (0.0010) [2024-06-15 23:24:00,738][1648984] Fps is (10 sec: 78642.5, 60 sec: 88473.7, 300 sec: 78643.2). Total num frames: 1870528512. Throughput: 0: 22141.1. Samples: 467710976. 
Policy #0 lag: (min: 15.0, avg: 102.9, max: 271.0) [2024-06-15 23:24:00,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 23:24:00,853][1652475] Updated weights for policy 0, policy_version 913347 (0.0011) [2024-06-15 23:24:02,007][1652475] Updated weights for policy 0, policy_version 913428 (0.0012) [2024-06-15 23:24:02,176][1651340] Signal inference workers to stop experience collection... (47000 times) [2024-06-15 23:24:02,210][1652475] InferenceWorker_p0-w0: stopping experience collection (47000 times) [2024-06-15 23:24:02,345][1651340] Signal inference workers to resume experience collection... (47000 times) [2024-06-15 23:24:02,346][1652475] InferenceWorker_p0-w0: resuming experience collection (47000 times) [2024-06-15 23:24:02,732][1652475] Updated weights for policy 0, policy_version 913473 (0.0011) [2024-06-15 23:24:03,344][1652475] Updated weights for policy 0, policy_version 913532 (0.0011) [2024-06-15 23:24:05,738][1648984] Fps is (10 sec: 81921.1, 60 sec: 87927.4, 300 sec: 78643.2). Total num frames: 1870954496. Throughput: 0: 21674.7. Samples: 467768320. Policy #0 lag: (min: 15.0, avg: 102.9, max: 271.0) [2024-06-15 23:24:05,738][1648984] Avg episode reward: [(0, '-0.550')] [2024-06-15 23:24:06,098][1652475] Updated weights for policy 0, policy_version 913584 (0.0012) [2024-06-15 23:24:07,033][1652475] Updated weights for policy 0, policy_version 913632 (0.0011) [2024-06-15 23:24:07,585][1652475] Updated weights for policy 0, policy_version 913666 (0.0009) [2024-06-15 23:24:08,391][1652475] Updated weights for policy 0, policy_version 913729 (0.0017) [2024-06-15 23:24:08,970][1652475] Updated weights for policy 0, policy_version 913779 (0.0010) [2024-06-15 23:24:10,738][1648984] Fps is (10 sec: 91750.9, 60 sec: 87381.6, 300 sec: 79087.6). Total num frames: 1871446016. Throughput: 0: 22016.0. Samples: 467908096. 
Policy #0 lag: (min: 15.0, avg: 102.9, max: 271.0) [2024-06-15 23:24:10,738][1648984] Avg episode reward: [(0, '-0.550')] [2024-06-15 23:24:11,510][1652475] Updated weights for policy 0, policy_version 913824 (0.0010) [2024-06-15 23:24:12,490][1652475] Updated weights for policy 0, policy_version 913888 (0.0012) [2024-06-15 23:24:13,679][1652475] Updated weights for policy 0, policy_version 913938 (0.0010) [2024-06-15 23:24:14,204][1652475] Updated weights for policy 0, policy_version 913984 (0.0009) [2024-06-15 23:24:14,962][1652475] Updated weights for policy 0, policy_version 914045 (0.0009) [2024-06-15 23:24:15,738][1648984] Fps is (10 sec: 101580.8, 60 sec: 89019.8, 300 sec: 79865.1). Total num frames: 1871970304. Throughput: 0: 22027.4. Samples: 468044800. Policy #0 lag: (min: 15.0, avg: 102.9, max: 271.0) [2024-06-15 23:24:15,738][1648984] Avg episode reward: [(0, '-0.540')] [2024-06-15 23:24:17,844][1652475] Updated weights for policy 0, policy_version 914112 (0.0011) [2024-06-15 23:24:19,055][1652475] Updated weights for policy 0, policy_version 914180 (0.0069) [2024-06-15 23:24:19,714][1652475] Updated weights for policy 0, policy_version 914235 (0.0010) [2024-06-15 23:24:20,738][1648984] Fps is (10 sec: 98304.2, 60 sec: 88473.6, 300 sec: 80087.2). Total num frames: 1872429056. Throughput: 0: 22334.6. Samples: 468119552. Policy #0 lag: (min: 15.0, avg: 102.9, max: 271.0) [2024-06-15 23:24:20,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 23:24:20,860][1652475] Updated weights for policy 0, policy_version 914297 (0.0023) [2024-06-15 23:24:23,531][1652475] Updated weights for policy 0, policy_version 914337 (0.0011) [2024-06-15 23:24:24,213][1652475] Updated weights for policy 0, policy_version 914391 (0.0011) [2024-06-15 23:24:24,358][1651340] Signal inference workers to stop experience collection... 
(47050 times) [2024-06-15 23:24:24,407][1652475] InferenceWorker_p0-w0: stopping experience collection (47050 times) [2024-06-15 23:24:24,479][1651340] Signal inference workers to resume experience collection... (47050 times) [2024-06-15 23:24:24,479][1652475] InferenceWorker_p0-w0: resuming experience collection (47050 times) [2024-06-15 23:24:25,007][1652475] Updated weights for policy 0, policy_version 914452 (0.0011) [2024-06-15 23:24:25,738][1648984] Fps is (10 sec: 91750.5, 60 sec: 89565.8, 300 sec: 80864.8). Total num frames: 1872887808. Throughput: 0: 22254.9. Samples: 468250624. Policy #0 lag: (min: 15.0, avg: 102.9, max: 271.0) [2024-06-15 23:24:25,738][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 23:24:27,048][1652475] Updated weights for policy 0, policy_version 914497 (0.0011) [2024-06-15 23:24:27,702][1652475] Updated weights for policy 0, policy_version 914559 (0.0009) [2024-06-15 23:24:28,840][1652475] Updated weights for policy 0, policy_version 914597 (0.0010) [2024-06-15 23:24:29,505][1652475] Updated weights for policy 0, policy_version 914644 (0.0010) [2024-06-15 23:24:30,738][1648984] Fps is (10 sec: 85194.8, 60 sec: 87927.2, 300 sec: 80753.6). Total num frames: 1873281024. Throughput: 0: 22254.8. Samples: 468377600. Policy #0 lag: (min: 15.0, avg: 102.9, max: 271.0) [2024-06-15 23:24:30,739][1648984] Avg episode reward: [(0, '-0.250')] [2024-06-15 23:24:31,460][1652475] Updated weights for policy 0, policy_version 914704 (0.0010) [2024-06-15 23:24:32,066][1652475] Updated weights for policy 0, policy_version 914752 (0.0009) [2024-06-15 23:24:34,029][1652475] Updated weights for policy 0, policy_version 914801 (0.0010) [2024-06-15 23:24:34,788][1652475] Updated weights for policy 0, policy_version 914864 (0.0009) [2024-06-15 23:24:35,543][1652475] Updated weights for policy 0, policy_version 914914 (0.0008) [2024-06-15 23:24:35,738][1648984] Fps is (10 sec: 88473.4, 60 sec: 90658.1, 300 sec: 81310.0). Total num frames: 1873772544. 
Throughput: 0: 22493.8. Samples: 468454400. Policy #0 lag: (min: 15.0, avg: 102.9, max: 271.0) [2024-06-15 23:24:35,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 23:24:37,816][1652475] Updated weights for policy 0, policy_version 914968 (0.0079) [2024-06-15 23:24:38,213][1652475] Updated weights for policy 0, policy_version 915005 (0.0010) [2024-06-15 23:24:39,391][1652475] Updated weights for policy 0, policy_version 915048 (0.0009) [2024-06-15 23:24:40,017][1652475] Updated weights for policy 0, policy_version 915093 (0.0009) [2024-06-15 23:24:40,738][1648984] Fps is (10 sec: 95028.7, 60 sec: 92296.5, 300 sec: 81864.4). Total num frames: 1874231296. Throughput: 0: 22607.7. Samples: 468590080. Policy #0 lag: (min: 15.0, avg: 102.9, max: 271.0) [2024-06-15 23:24:40,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 23:24:40,919][1652475] Updated weights for policy 0, policy_version 915168 (0.0010) [2024-06-15 23:24:43,271][1652475] Updated weights for policy 0, policy_version 915222 (0.0012) [2024-06-15 23:24:44,936][1652475] Updated weights for policy 0, policy_version 915284 (0.0077) [2024-06-15 23:24:45,738][1648984] Fps is (10 sec: 85196.8, 60 sec: 90111.9, 300 sec: 81864.5). Total num frames: 1874624512. Throughput: 0: 22573.5. Samples: 468726784. Policy #0 lag: (min: 15.0, avg: 102.9, max: 271.0) [2024-06-15 23:24:45,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 23:24:45,820][1651340] Signal inference workers to stop experience collection... (47100 times) [2024-06-15 23:24:45,849][1652475] Updated weights for policy 0, policy_version 915346 (0.0078) [2024-06-15 23:24:45,896][1652475] InferenceWorker_p0-w0: stopping experience collection (47100 times) [2024-06-15 23:24:45,969][1651340] Signal inference workers to resume experience collection... 
(47100 times) [2024-06-15 23:24:45,970][1652475] InferenceWorker_p0-w0: resuming experience collection (47100 times) [2024-06-15 23:24:46,482][1652475] Updated weights for policy 0, policy_version 915393 (0.0007) [2024-06-15 23:24:47,150][1652475] Updated weights for policy 0, policy_version 915451 (0.0011) [2024-06-15 23:24:49,008][1652475] Updated weights for policy 0, policy_version 915493 (0.0008) [2024-06-15 23:24:50,554][1652475] Updated weights for policy 0, policy_version 915539 (0.0009) [2024-06-15 23:24:50,738][1648984] Fps is (10 sec: 81917.5, 60 sec: 88473.0, 300 sec: 82308.7). Total num frames: 1875050496. Throughput: 0: 22687.1. Samples: 468789248. Policy #0 lag: (min: 15.0, avg: 100.9, max: 271.0) [2024-06-15 23:24:50,739][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 23:24:51,052][1652475] Updated weights for policy 0, policy_version 915584 (0.0010) [2024-06-15 23:24:51,874][1652475] Updated weights for policy 0, policy_version 915641 (0.0068) [2024-06-15 23:24:53,323][1652475] Updated weights for policy 0, policy_version 915683 (0.0015) [2024-06-15 23:24:54,110][1652475] Updated weights for policy 0, policy_version 915730 (0.0010) [2024-06-15 23:24:55,738][1648984] Fps is (10 sec: 88472.6, 60 sec: 89565.9, 300 sec: 82641.9). Total num frames: 1875509248. Throughput: 0: 22584.8. Samples: 468924416. Policy #0 lag: (min: 15.0, avg: 100.9, max: 271.0) [2024-06-15 23:24:55,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 23:24:55,745][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000915776_1875509248.pth... 
[2024-06-15 23:24:55,779][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000905600_1854668800.pth [2024-06-15 23:24:56,192][1652475] Updated weights for policy 0, policy_version 915777 (0.0008) [2024-06-15 23:24:56,849][1652475] Updated weights for policy 0, policy_version 915831 (0.0012) [2024-06-15 23:24:57,630][1652475] Updated weights for policy 0, policy_version 915899 (0.0009) [2024-06-15 23:24:59,780][1652475] Updated weights for policy 0, policy_version 915952 (0.0012) [2024-06-15 23:25:00,408][1652475] Updated weights for policy 0, policy_version 916000 (0.0009) [2024-06-15 23:25:00,738][1648984] Fps is (10 sec: 95030.5, 60 sec: 91204.3, 300 sec: 83197.5). Total num frames: 1876000768. Throughput: 0: 22471.1. Samples: 469056000. Policy #0 lag: (min: 15.0, avg: 100.9, max: 271.0) [2024-06-15 23:25:00,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 23:25:01,830][1652475] Updated weights for policy 0, policy_version 916050 (0.0009) [2024-06-15 23:25:02,265][1652475] Updated weights for policy 0, policy_version 916094 (0.0009) [2024-06-15 23:25:03,997][1652475] Updated weights for policy 0, policy_version 916135 (0.0008) [2024-06-15 23:25:04,516][1652475] Updated weights for policy 0, policy_version 916176 (0.0010) [2024-06-15 23:25:05,030][1652475] Updated weights for policy 0, policy_version 916218 (0.0009) [2024-06-15 23:25:05,738][1648984] Fps is (10 sec: 91752.1, 60 sec: 91204.4, 300 sec: 83864.0). Total num frames: 1876426752. Throughput: 0: 22414.2. Samples: 469128192. Policy #0 lag: (min: 15.0, avg: 100.9, max: 271.0) [2024-06-15 23:25:05,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 23:25:06,449][1652475] Updated weights for policy 0, policy_version 916260 (0.0011) [2024-06-15 23:25:07,037][1651340] Signal inference workers to stop experience collection... 
(47150 times) [2024-06-15 23:25:07,061][1652475] Updated weights for policy 0, policy_version 916306 (0.0010) [2024-06-15 23:25:07,090][1652475] InferenceWorker_p0-w0: stopping experience collection (47150 times) [2024-06-15 23:25:07,187][1651340] Signal inference workers to resume experience collection... (47150 times) [2024-06-15 23:25:07,188][1652475] InferenceWorker_p0-w0: resuming experience collection (47150 times) [2024-06-15 23:25:07,532][1652475] Updated weights for policy 0, policy_version 916346 (0.0014) [2024-06-15 23:25:09,874][1652475] Updated weights for policy 0, policy_version 916386 (0.0008) [2024-06-15 23:25:10,510][1652475] Updated weights for policy 0, policy_version 916434 (0.0012) [2024-06-15 23:25:10,738][1648984] Fps is (10 sec: 88471.6, 60 sec: 90657.8, 300 sec: 84197.0). Total num frames: 1876885504. Throughput: 0: 22653.0. Samples: 469270016. Policy #0 lag: (min: 15.0, avg: 100.9, max: 271.0) [2024-06-15 23:25:10,739][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 23:25:11,012][1652475] Updated weights for policy 0, policy_version 916480 (0.0009) [2024-06-15 23:25:12,247][1652475] Updated weights for policy 0, policy_version 916532 (0.0010) [2024-06-15 23:25:13,065][1652475] Updated weights for policy 0, policy_version 916596 (0.0066) [2024-06-15 23:25:15,083][1652475] Updated weights for policy 0, policy_version 916637 (0.0010) [2024-06-15 23:25:15,480][1652475] Updated weights for policy 0, policy_version 916672 (0.0009) [2024-06-15 23:25:15,738][1648984] Fps is (10 sec: 91750.2, 60 sec: 89565.9, 300 sec: 84419.3). Total num frames: 1877344256. Throughput: 0: 22949.1. Samples: 469410304. 
Policy #0 lag: (min: 15.0, avg: 100.9, max: 271.0) [2024-06-15 23:25:15,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 23:25:16,346][1652475] Updated weights for policy 0, policy_version 916732 (0.0018) [2024-06-15 23:25:17,964][1652475] Updated weights for policy 0, policy_version 916788 (0.0010) [2024-06-15 23:25:18,806][1652475] Updated weights for policy 0, policy_version 916861 (0.0013) [2024-06-15 23:25:20,738][1648984] Fps is (10 sec: 85199.0, 60 sec: 88473.6, 300 sec: 84866.4). Total num frames: 1877737472. Throughput: 0: 22687.3. Samples: 469475328. Policy #0 lag: (min: 15.0, avg: 100.9, max: 271.0) [2024-06-15 23:25:20,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 23:25:21,105][1652475] Updated weights for policy 0, policy_version 916900 (0.0010) [2024-06-15 23:25:21,776][1652475] Updated weights for policy 0, policy_version 916960 (0.0012) [2024-06-15 23:25:23,296][1652475] Updated weights for policy 0, policy_version 917011 (0.0011) [2024-06-15 23:25:24,050][1652475] Updated weights for policy 0, policy_version 917072 (0.0009) [2024-06-15 23:25:25,738][1648984] Fps is (10 sec: 91750.3, 60 sec: 89565.9, 300 sec: 85307.9). Total num frames: 1878261760. Throughput: 0: 22664.6. Samples: 469609984. Policy #0 lag: (min: 15.0, avg: 100.9, max: 271.0) [2024-06-15 23:25:25,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 23:25:26,121][1652475] Updated weights for policy 0, policy_version 917121 (0.0010) [2024-06-15 23:25:26,763][1652475] Updated weights for policy 0, policy_version 917180 (0.0073) [2024-06-15 23:25:28,527][1652475] Updated weights for policy 0, policy_version 917242 (0.0011) [2024-06-15 23:25:29,115][1651340] Signal inference workers to stop experience collection... (47200 times) [2024-06-15 23:25:29,162][1652475] InferenceWorker_p0-w0: stopping experience collection (47200 times) [2024-06-15 23:25:29,278][1651340] Signal inference workers to resume experience collection... 
(47200 times) [2024-06-15 23:25:29,279][1652475] InferenceWorker_p0-w0: resuming experience collection (47200 times) [2024-06-15 23:25:29,453][1652475] Updated weights for policy 0, policy_version 917286 (0.0074) [2024-06-15 23:25:30,426][1652475] Updated weights for policy 0, policy_version 917333 (0.0011) [2024-06-15 23:25:30,738][1648984] Fps is (10 sec: 101579.9, 60 sec: 91204.5, 300 sec: 86085.5). Total num frames: 1878753280. Throughput: 0: 22493.8. Samples: 469739008. Policy #0 lag: (min: 15.0, avg: 100.9, max: 271.0) [2024-06-15 23:25:30,739][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 23:25:31,584][1652475] Updated weights for policy 0, policy_version 917377 (0.0011) [2024-06-15 23:25:32,223][1652475] Updated weights for policy 0, policy_version 917432 (0.0010) [2024-06-15 23:25:34,519][1652475] Updated weights for policy 0, policy_version 917476 (0.0011) [2024-06-15 23:25:35,411][1652475] Updated weights for policy 0, policy_version 917521 (0.0009) [2024-06-15 23:25:35,738][1648984] Fps is (10 sec: 88473.1, 60 sec: 89565.9, 300 sec: 86529.7). Total num frames: 1879146496. Throughput: 0: 22641.9. Samples: 469808128. Policy #0 lag: (min: 15.0, avg: 100.9, max: 271.0) [2024-06-15 23:25:35,741][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 23:25:35,840][1652475] Updated weights for policy 0, policy_version 917562 (0.0008) [2024-06-15 23:25:37,196][1652475] Updated weights for policy 0, policy_version 917610 (0.0010) [2024-06-15 23:25:37,886][1652475] Updated weights for policy 0, policy_version 917664 (0.0010) [2024-06-15 23:25:39,901][1652475] Updated weights for policy 0, policy_version 917712 (0.0010) [2024-06-15 23:25:40,738][1648984] Fps is (10 sec: 81920.5, 60 sec: 89019.8, 300 sec: 86418.7). Total num frames: 1879572480. Throughput: 0: 22493.9. Samples: 469936640. 
Policy #0 lag: (min: 15.0, avg: 100.9, max: 271.0) [2024-06-15 23:25:40,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 23:25:41,560][1652475] Updated weights for policy 0, policy_version 917761 (0.0012) [2024-06-15 23:25:42,536][1652475] Updated weights for policy 0, policy_version 917840 (0.0066) [2024-06-15 23:25:43,959][1652475] Updated weights for policy 0, policy_version 917920 (0.0136) [2024-06-15 23:25:45,738][1648984] Fps is (10 sec: 81919.5, 60 sec: 89019.6, 300 sec: 87085.1). Total num frames: 1879965696. Throughput: 0: 22380.1. Samples: 470063104. Policy #0 lag: (min: 15.0, avg: 100.9, max: 271.0) [2024-06-15 23:25:45,739][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 23:25:45,814][1652475] Updated weights for policy 0, policy_version 917953 (0.0011) [2024-06-15 23:25:46,516][1652475] Updated weights for policy 0, policy_version 918011 (0.0011) [2024-06-15 23:25:48,398][1652475] Updated weights for policy 0, policy_version 918064 (0.0009) [2024-06-15 23:25:49,147][1652475] Updated weights for policy 0, policy_version 918115 (0.0008) [2024-06-15 23:25:49,915][1652475] Updated weights for policy 0, policy_version 918176 (0.0010) [2024-06-15 23:25:50,738][1648984] Fps is (10 sec: 91751.1, 60 sec: 90658.8, 300 sec: 87529.5). Total num frames: 1880489984. Throughput: 0: 22471.1. Samples: 470139392. Policy #0 lag: (min: 15.0, avg: 100.9, max: 271.0) [2024-06-15 23:25:50,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 23:25:51,742][1652475] Updated weights for policy 0, policy_version 918218 (0.0011) [2024-06-15 23:25:52,070][1651340] Signal inference workers to stop experience collection... (47250 times) [2024-06-15 23:25:52,113][1652475] InferenceWorker_p0-w0: stopping experience collection (47250 times) [2024-06-15 23:25:52,222][1651340] Signal inference workers to resume experience collection... 
(47250 times) [2024-06-15 23:25:52,223][1652475] InferenceWorker_p0-w0: resuming experience collection (47250 times) [2024-06-15 23:25:52,455][1652475] Updated weights for policy 0, policy_version 918272 (0.0011) [2024-06-15 23:25:54,594][1652475] Updated weights for policy 0, policy_version 918338 (0.0013) [2024-06-15 23:25:55,721][1652475] Updated weights for policy 0, policy_version 918416 (0.0011) [2024-06-15 23:25:55,738][1648984] Fps is (10 sec: 95025.2, 60 sec: 90111.7, 300 sec: 88084.7). Total num frames: 1880915968. Throughput: 0: 22232.1. Samples: 470270464. Policy #0 lag: (min: 15.0, avg: 100.9, max: 271.0) [2024-06-15 23:25:55,739][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 23:25:56,336][1652475] Updated weights for policy 0, policy_version 918464 (0.0011) [2024-06-15 23:25:58,741][1652475] Updated weights for policy 0, policy_version 918527 (0.0010) [2024-06-15 23:26:00,207][1652475] Updated weights for policy 0, policy_version 918581 (0.0011) [2024-06-15 23:26:00,738][1648984] Fps is (10 sec: 78642.5, 60 sec: 87927.5, 300 sec: 87973.7). Total num frames: 1881276416. Throughput: 0: 21799.8. Samples: 470391296. Policy #0 lag: (min: 15.0, avg: 100.9, max: 271.0) [2024-06-15 23:26:00,738][1648984] Avg episode reward: [(0, '-0.230')] [2024-06-15 23:26:00,915][1652475] Updated weights for policy 0, policy_version 918611 (0.0010) [2024-06-15 23:26:01,730][1652475] Updated weights for policy 0, policy_version 918676 (0.0012) [2024-06-15 23:26:04,887][1652475] Updated weights for policy 0, policy_version 918752 (0.0012) [2024-06-15 23:26:05,640][1652475] Updated weights for policy 0, policy_version 918802 (0.0010) [2024-06-15 23:26:05,738][1648984] Fps is (10 sec: 78646.5, 60 sec: 87927.5, 300 sec: 88529.2). Total num frames: 1881702400. Throughput: 0: 21743.0. Samples: 470453760. 
Policy #0 lag: (min: 15.0, avg: 100.9, max: 271.0) [2024-06-15 23:26:05,738][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 23:26:06,070][1652475] Updated weights for policy 0, policy_version 918841 (0.0009) [2024-06-15 23:26:06,855][1652475] Updated weights for policy 0, policy_version 918880 (0.0011) [2024-06-15 23:26:08,330][1652475] Updated weights for policy 0, policy_version 918928 (0.0010) [2024-06-15 23:26:10,555][1652475] Updated weights for policy 0, policy_version 918995 (0.0065) [2024-06-15 23:26:10,738][1648984] Fps is (10 sec: 85196.8, 60 sec: 87381.7, 300 sec: 88640.2). Total num frames: 1882128384. Throughput: 0: 21560.9. Samples: 470580224. Policy #0 lag: (min: 15.0, avg: 100.9, max: 271.0) [2024-06-15 23:26:10,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 23:26:12,104][1652475] Updated weights for policy 0, policy_version 919059 (0.0010) [2024-06-15 23:26:12,973][1652475] Updated weights for policy 0, policy_version 919127 (0.0010) [2024-06-15 23:26:15,510][1652475] Updated weights for policy 0, policy_version 919184 (0.0009) [2024-06-15 23:26:15,738][1648984] Fps is (10 sec: 81919.6, 60 sec: 86289.1, 300 sec: 88529.2). Total num frames: 1882521600. Throughput: 0: 21617.8. Samples: 470711808. Policy #0 lag: (min: 15.0, avg: 100.9, max: 271.0) [2024-06-15 23:26:15,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 23:26:16,021][1651340] Signal inference workers to stop experience collection... (47300 times) [2024-06-15 23:26:16,050][1652475] InferenceWorker_p0-w0: stopping experience collection (47300 times) [2024-06-15 23:26:16,171][1651340] Signal inference workers to resume experience collection... 
(47300 times) [2024-06-15 23:26:16,172][1652475] InferenceWorker_p0-w0: resuming experience collection (47300 times) [2024-06-15 23:26:16,174][1652475] Updated weights for policy 0, policy_version 919232 (0.0011) [2024-06-15 23:26:16,865][1652475] Updated weights for policy 0, policy_version 919282 (0.0013) [2024-06-15 23:26:18,180][1652475] Updated weights for policy 0, policy_version 919344 (0.0011) [2024-06-15 23:26:18,899][1652475] Updated weights for policy 0, policy_version 919396 (0.0073) [2024-06-15 23:26:20,738][1648984] Fps is (10 sec: 85197.4, 60 sec: 87381.4, 300 sec: 88418.4). Total num frames: 1882980352. Throughput: 0: 21572.3. Samples: 470778880. Policy #0 lag: (min: 15.0, avg: 100.9, max: 271.0) [2024-06-15 23:26:20,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 23:26:21,424][1652475] Updated weights for policy 0, policy_version 919440 (0.0011) [2024-06-15 23:26:22,487][1652475] Updated weights for policy 0, policy_version 919520 (0.0009) [2024-06-15 23:26:23,307][1652475] Updated weights for policy 0, policy_version 919553 (0.0009) [2024-06-15 23:26:23,999][1652475] Updated weights for policy 0, policy_version 919613 (0.0011) [2024-06-15 23:26:24,672][1652475] Updated weights for policy 0, policy_version 919652 (0.0010) [2024-06-15 23:26:25,738][1648984] Fps is (10 sec: 98303.3, 60 sec: 87381.3, 300 sec: 88640.2). Total num frames: 1883504640. Throughput: 0: 21640.5. Samples: 470910464. 
Policy #0 lag: (min: 15.0, avg: 100.9, max: 271.0) [2024-06-15 23:26:25,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 23:26:27,712][1652475] Updated weights for policy 0, policy_version 919712 (0.0010) [2024-06-15 23:26:28,540][1652475] Updated weights for policy 0, policy_version 919767 (0.0010) [2024-06-15 23:26:29,237][1652475] Updated weights for policy 0, policy_version 919824 (0.0009) [2024-06-15 23:26:29,762][1652475] Updated weights for policy 0, policy_version 919868 (0.0010) [2024-06-15 23:26:30,587][1652475] Updated weights for policy 0, policy_version 919920 (0.0010) [2024-06-15 23:26:30,737][1648984] Fps is (10 sec: 101581.3, 60 sec: 87381.6, 300 sec: 88751.3). Total num frames: 1883996160. Throughput: 0: 21868.2. Samples: 471047168. Policy #0 lag: (min: 15.0, avg: 100.9, max: 271.0) [2024-06-15 23:26:30,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 23:26:33,257][1652475] Updated weights for policy 0, policy_version 919956 (0.0011) [2024-06-15 23:26:33,998][1652475] Updated weights for policy 0, policy_version 920016 (0.0034) [2024-06-15 23:26:34,894][1652475] Updated weights for policy 0, policy_version 920080 (0.0010) [2024-06-15 23:26:35,738][1648984] Fps is (10 sec: 91750.7, 60 sec: 87927.6, 300 sec: 88862.4). Total num frames: 1884422144. Throughput: 0: 21799.8. Samples: 471120384. Policy #0 lag: (min: 15.0, avg: 100.9, max: 271.0) [2024-06-15 23:26:35,739][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 23:26:36,497][1651340] Signal inference workers to stop experience collection... (47350 times) [2024-06-15 23:26:36,525][1652475] InferenceWorker_p0-w0: stopping experience collection (47350 times) [2024-06-15 23:26:36,616][1651340] Signal inference workers to resume experience collection... 
(47350 times) [2024-06-15 23:26:36,616][1652475] InferenceWorker_p0-w0: resuming experience collection (47350 times) [2024-06-15 23:26:36,618][1652475] Updated weights for policy 0, policy_version 920144 (0.0012) [2024-06-15 23:26:37,137][1652475] Updated weights for policy 0, policy_version 920183 (0.0018) [2024-06-15 23:26:39,152][1652475] Updated weights for policy 0, policy_version 920226 (0.0011) [2024-06-15 23:26:39,874][1652475] Updated weights for policy 0, policy_version 920277 (0.0009) [2024-06-15 23:26:40,738][1648984] Fps is (10 sec: 81919.2, 60 sec: 87381.3, 300 sec: 88862.4). Total num frames: 1884815360. Throughput: 0: 21709.0. Samples: 471247360. Policy #0 lag: (min: 15.0, avg: 100.9, max: 271.0) [2024-06-15 23:26:40,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 23:26:40,931][1652475] Updated weights for policy 0, policy_version 920341 (0.0017) [2024-06-15 23:26:43,256][1652475] Updated weights for policy 0, policy_version 920389 (0.0012) [2024-06-15 23:26:43,946][1652475] Updated weights for policy 0, policy_version 920448 (0.0017) [2024-06-15 23:26:44,911][1652475] Updated weights for policy 0, policy_version 920509 (0.0076) [2024-06-15 23:26:45,723][1652475] Updated weights for policy 0, policy_version 920576 (0.0011) [2024-06-15 23:26:45,740][1648984] Fps is (10 sec: 91729.6, 60 sec: 89562.7, 300 sec: 89306.0). Total num frames: 1885339648. Throughput: 0: 21685.0. Samples: 471367168. Policy #0 lag: (min: 41.0, avg: 164.2, max: 287.0) [2024-06-15 23:26:45,741][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 23:26:49,700][1652475] Updated weights for policy 0, policy_version 920645 (0.0012) [2024-06-15 23:26:50,331][1652475] Updated weights for policy 0, policy_version 920690 (0.0009) [2024-06-15 23:26:50,738][1648984] Fps is (10 sec: 81920.6, 60 sec: 85742.9, 300 sec: 88973.5). Total num frames: 1885634560. Throughput: 0: 21845.3. Samples: 471436800. 
Policy #0 lag: (min: 41.0, avg: 164.2, max: 287.0) [2024-06-15 23:26:50,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 23:26:51,192][1652475] Updated weights for policy 0, policy_version 920753 (0.0011) [2024-06-15 23:26:51,990][1652475] Updated weights for policy 0, policy_version 920816 (0.0075) [2024-06-15 23:26:53,976][1652475] Updated weights for policy 0, policy_version 920864 (0.0011) [2024-06-15 23:26:55,443][1652475] Updated weights for policy 0, policy_version 920897 (0.0010) [2024-06-15 23:26:55,738][1648984] Fps is (10 sec: 68828.3, 60 sec: 85197.2, 300 sec: 88529.1). Total num frames: 1886027776. Throughput: 0: 21981.9. Samples: 471569408. Policy #0 lag: (min: 41.0, avg: 164.2, max: 287.0) [2024-06-15 23:26:55,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 23:26:56,029][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000920944_1886093312.pth... [2024-06-15 23:26:56,131][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000910528_1864761344.pth [2024-06-15 23:26:56,416][1652475] Updated weights for policy 0, policy_version 920963 (0.0038) [2024-06-15 23:26:57,073][1652475] Updated weights for policy 0, policy_version 921016 (0.0012) [2024-06-15 23:26:57,641][1652475] Updated weights for policy 0, policy_version 921056 (0.0009) [2024-06-15 23:26:59,362][1651340] Signal inference workers to stop experience collection... (47400 times) [2024-06-15 23:26:59,393][1652475] InferenceWorker_p0-w0: stopping experience collection (47400 times) [2024-06-15 23:26:59,513][1651340] Signal inference workers to resume experience collection... 
(47400 times) [2024-06-15 23:26:59,514][1652475] InferenceWorker_p0-w0: resuming experience collection (47400 times) [2024-06-15 23:26:59,645][1652475] Updated weights for policy 0, policy_version 921110 (0.0008) [2024-06-15 23:27:00,134][1652475] Updated weights for policy 0, policy_version 921152 (0.0010) [2024-06-15 23:27:00,738][1648984] Fps is (10 sec: 88472.7, 60 sec: 87381.3, 300 sec: 88418.1). Total num frames: 1886519296. Throughput: 0: 22095.6. Samples: 471706112. Policy #0 lag: (min: 41.0, avg: 164.2, max: 287.0) [2024-06-15 23:27:00,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 23:27:01,916][1652475] Updated weights for policy 0, policy_version 921216 (0.0010) [2024-06-15 23:27:02,657][1652475] Updated weights for policy 0, policy_version 921278 (0.0010) [2024-06-15 23:27:03,545][1652475] Updated weights for policy 0, policy_version 921340 (0.0011) [2024-06-15 23:27:05,578][1652475] Updated weights for policy 0, policy_version 921392 (0.0010) [2024-06-15 23:27:05,737][1648984] Fps is (10 sec: 101582.6, 60 sec: 89019.8, 300 sec: 88862.4). Total num frames: 1887043584. Throughput: 0: 21970.5. Samples: 471767552. Policy #0 lag: (min: 41.0, avg: 164.2, max: 287.0) [2024-06-15 23:27:05,738][1648984] Avg episode reward: [(0, '-0.150')] [2024-06-15 23:27:07,645][1652475] Updated weights for policy 0, policy_version 921428 (0.0013) [2024-06-15 23:27:08,456][1652475] Updated weights for policy 0, policy_version 921488 (0.0011) [2024-06-15 23:27:09,086][1652475] Updated weights for policy 0, policy_version 921536 (0.0023) [2024-06-15 23:27:09,783][1652475] Updated weights for policy 0, policy_version 921595 (0.0011) [2024-06-15 23:27:10,737][1648984] Fps is (10 sec: 91751.8, 60 sec: 88473.8, 300 sec: 88751.3). Total num frames: 1887436800. Throughput: 0: 21993.3. Samples: 471900160. 
Policy #0 lag: (min: 41.0, avg: 164.2, max: 287.0) [2024-06-15 23:27:10,738][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 23:27:11,208][1652475] Updated weights for policy 0, policy_version 921648 (0.0010) [2024-06-15 23:27:13,474][1652475] Updated weights for policy 0, policy_version 921682 (0.0009) [2024-06-15 23:27:14,162][1652475] Updated weights for policy 0, policy_version 921744 (0.0010) [2024-06-15 23:27:15,698][1652475] Updated weights for policy 0, policy_version 921795 (0.0013) [2024-06-15 23:27:15,738][1648984] Fps is (10 sec: 78642.7, 60 sec: 88473.7, 300 sec: 88418.1). Total num frames: 1887830016. Throughput: 0: 21959.1. Samples: 472035328. Policy #0 lag: (min: 41.0, avg: 164.2, max: 287.0) [2024-06-15 23:27:15,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 23:27:16,316][1652475] Updated weights for policy 0, policy_version 921844 (0.0013) [2024-06-15 23:27:17,020][1652475] Updated weights for policy 0, policy_version 921905 (0.0010) [2024-06-15 23:27:19,275][1652475] Updated weights for policy 0, policy_version 921957 (0.0010) [2024-06-15 23:27:20,622][1652475] Updated weights for policy 0, policy_version 922001 (0.0011) [2024-06-15 23:27:20,738][1648984] Fps is (10 sec: 81919.8, 60 sec: 87927.5, 300 sec: 88640.3). Total num frames: 1888256000. Throughput: 0: 21811.2. Samples: 472101888. Policy #0 lag: (min: 41.0, avg: 164.2, max: 287.0) [2024-06-15 23:27:20,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 23:27:21,211][1651340] Signal inference workers to stop experience collection... (47450 times) [2024-06-15 23:27:21,247][1652475] InferenceWorker_p0-w0: stopping experience collection (47450 times) [2024-06-15 23:27:21,338][1651340] Signal inference workers to resume experience collection... 
(47450 times) [2024-06-15 23:27:21,339][1652475] InferenceWorker_p0-w0: resuming experience collection (47450 times) [2024-06-15 23:27:21,660][1652475] Updated weights for policy 0, policy_version 922084 (0.0013) [2024-06-15 23:27:22,754][1652475] Updated weights for policy 0, policy_version 922128 (0.0013) [2024-06-15 23:27:23,335][1652475] Updated weights for policy 0, policy_version 922176 (0.0011) [2024-06-15 23:27:25,124][1652475] Updated weights for policy 0, policy_version 922237 (0.0011) [2024-06-15 23:27:25,738][1648984] Fps is (10 sec: 91749.5, 60 sec: 87381.4, 300 sec: 88973.5). Total num frames: 1888747520. Throughput: 0: 21879.5. Samples: 472231936. Policy #0 lag: (min: 41.0, avg: 164.2, max: 287.0) [2024-06-15 23:27:25,739][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 23:27:27,000][1652475] Updated weights for policy 0, policy_version 922272 (0.0009) [2024-06-15 23:27:27,691][1652475] Updated weights for policy 0, policy_version 922324 (0.0012) [2024-06-15 23:27:28,147][1652475] Updated weights for policy 0, policy_version 922368 (0.0010) [2024-06-15 23:27:28,975][1652475] Updated weights for policy 0, policy_version 922422 (0.0009) [2024-06-15 23:27:30,738][1648984] Fps is (10 sec: 95027.2, 60 sec: 86835.2, 300 sec: 89084.5). Total num frames: 1889206272. Throughput: 0: 22244.7. Samples: 472368128. Policy #0 lag: (min: 41.0, avg: 164.2, max: 287.0) [2024-06-15 23:27:30,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 23:27:30,751][1652475] Updated weights for policy 0, policy_version 922473 (0.0011) [2024-06-15 23:27:32,523][1652475] Updated weights for policy 0, policy_version 922501 (0.0012) [2024-06-15 23:27:33,420][1652475] Updated weights for policy 0, policy_version 922568 (0.0011) [2024-06-15 23:27:34,340][1652475] Updated weights for policy 0, policy_version 922640 (0.0012) [2024-06-15 23:27:35,738][1648984] Fps is (10 sec: 91750.7, 60 sec: 87381.4, 300 sec: 88862.4). Total num frames: 1889665024. 
Throughput: 0: 22175.3. Samples: 472434688. Policy #0 lag: (min: 41.0, avg: 164.2, max: 287.0) [2024-06-15 23:27:35,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 23:27:36,782][1652475] Updated weights for policy 0, policy_version 922708 (0.0078) [2024-06-15 23:27:38,524][1652475] Updated weights for policy 0, policy_version 922755 (0.0010) [2024-06-15 23:27:39,323][1652475] Updated weights for policy 0, policy_version 922822 (0.0009) [2024-06-15 23:27:39,922][1652475] Updated weights for policy 0, policy_version 922869 (0.0009) [2024-06-15 23:27:40,678][1652475] Updated weights for policy 0, policy_version 922928 (0.0010) [2024-06-15 23:27:40,738][1648984] Fps is (10 sec: 95026.1, 60 sec: 89019.7, 300 sec: 89084.5). Total num frames: 1890156544. Throughput: 0: 22198.0. Samples: 472568320. Policy #0 lag: (min: 41.0, avg: 164.2, max: 287.0) [2024-06-15 23:27:40,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 23:27:42,765][1652475] Updated weights for policy 0, policy_version 922979 (0.0010) [2024-06-15 23:27:44,256][1651340] Signal inference workers to stop experience collection... (47500 times) [2024-06-15 23:27:44,305][1652475] InferenceWorker_p0-w0: stopping experience collection (47500 times) [2024-06-15 23:27:44,388][1651340] Signal inference workers to resume experience collection... (47500 times) [2024-06-15 23:27:44,389][1652475] InferenceWorker_p0-w0: resuming experience collection (47500 times) [2024-06-15 23:27:44,511][1652475] Updated weights for policy 0, policy_version 923027 (0.0009) [2024-06-15 23:27:45,215][1652475] Updated weights for policy 0, policy_version 923088 (0.0010) [2024-06-15 23:27:45,737][1648984] Fps is (10 sec: 88474.7, 60 sec: 86838.7, 300 sec: 88751.4). Total num frames: 1890549760. Throughput: 0: 22141.3. Samples: 472702464. 
Policy #0 lag: (min: 41.0, avg: 164.2, max: 287.0) [2024-06-15 23:27:45,738][1648984] Avg episode reward: [(0, '-0.250')] [2024-06-15 23:27:45,910][1652475] Updated weights for policy 0, policy_version 923139 (0.0008) [2024-06-15 23:27:49,146][1652475] Updated weights for policy 0, policy_version 923203 (0.0012) [2024-06-15 23:27:49,856][1652475] Updated weights for policy 0, policy_version 923264 (0.0010) [2024-06-15 23:27:50,617][1652475] Updated weights for policy 0, policy_version 923320 (0.0009) [2024-06-15 23:27:50,738][1648984] Fps is (10 sec: 81921.3, 60 sec: 89019.8, 300 sec: 88862.4). Total num frames: 1890975744. Throughput: 0: 22186.6. Samples: 472765952. Policy #0 lag: (min: 41.0, avg: 164.2, max: 287.0) [2024-06-15 23:27:50,738][1648984] Avg episode reward: [(0, '-0.220')] [2024-06-15 23:27:51,349][1652475] Updated weights for policy 0, policy_version 923387 (0.0009) [2024-06-15 23:27:52,705][1652475] Updated weights for policy 0, policy_version 923429 (0.0009) [2024-06-15 23:27:54,891][1652475] Updated weights for policy 0, policy_version 923475 (0.0008) [2024-06-15 23:27:55,738][1648984] Fps is (10 sec: 81918.4, 60 sec: 89019.7, 300 sec: 88640.3). Total num frames: 1891368960. Throughput: 0: 22107.0. Samples: 472894976. Policy #0 lag: (min: 41.0, avg: 164.2, max: 287.0) [2024-06-15 23:27:55,738][1648984] Avg episode reward: [(0, '-0.160')] [2024-06-15 23:27:55,761][1652475] Updated weights for policy 0, policy_version 923522 (0.0063) [2024-06-15 23:27:56,405][1652475] Updated weights for policy 0, policy_version 923582 (0.0010) [2024-06-15 23:27:57,452][1652475] Updated weights for policy 0, policy_version 923645 (0.0009) [2024-06-15 23:27:59,529][1652475] Updated weights for policy 0, policy_version 923706 (0.0012) [2024-06-15 23:28:00,170][1652475] Updated weights for policy 0, policy_version 923735 (0.0010) [2024-06-15 23:28:00,737][1648984] Fps is (10 sec: 91750.5, 60 sec: 89566.1, 300 sec: 88862.4). Total num frames: 1891893248. 
Throughput: 0: 22061.5. Samples: 473028096. Policy #0 lag: (min: 41.0, avg: 164.2, max: 287.0) [2024-06-15 23:28:00,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 23:28:01,618][1652475] Updated weights for policy 0, policy_version 923808 (0.0011) [2024-06-15 23:28:02,857][1652475] Updated weights for policy 0, policy_version 923842 (0.0013) [2024-06-15 23:28:04,646][1652475] Updated weights for policy 0, policy_version 923905 (0.0010) [2024-06-15 23:28:05,282][1652475] Updated weights for policy 0, policy_version 923960 (0.0011) [2024-06-15 23:28:05,738][1648984] Fps is (10 sec: 95027.3, 60 sec: 87927.2, 300 sec: 88529.2). Total num frames: 1892319232. Throughput: 0: 22118.3. Samples: 473097216. Policy #0 lag: (min: 41.0, avg: 164.2, max: 287.0) [2024-06-15 23:28:05,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 23:28:05,762][1651340] Signal inference workers to stop experience collection... (47550 times) [2024-06-15 23:28:05,806][1652475] InferenceWorker_p0-w0: stopping experience collection (47550 times) [2024-06-15 23:28:05,895][1651340] Signal inference workers to resume experience collection... (47550 times) [2024-06-15 23:28:05,896][1652475] InferenceWorker_p0-w0: resuming experience collection (47550 times) [2024-06-15 23:28:05,999][1652475] Updated weights for policy 0, policy_version 924004 (0.0010) [2024-06-15 23:28:06,935][1652475] Updated weights for policy 0, policy_version 924036 (0.0009) [2024-06-15 23:28:07,511][1652475] Updated weights for policy 0, policy_version 924087 (0.0010) [2024-06-15 23:28:08,969][1652475] Updated weights for policy 0, policy_version 924128 (0.0010) [2024-06-15 23:28:09,922][1652475] Updated weights for policy 0, policy_version 924161 (0.0010) [2024-06-15 23:28:10,557][1652475] Updated weights for policy 0, policy_version 924220 (0.0008) [2024-06-15 23:28:10,738][1648984] Fps is (10 sec: 91748.9, 60 sec: 89565.6, 300 sec: 88751.3). Total num frames: 1892810752. Throughput: 0: 22380.1. 
Samples: 473239040. Policy #0 lag: (min: 41.0, avg: 164.2, max: 287.0) [2024-06-15 23:28:10,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 23:28:11,591][1652475] Updated weights for policy 0, policy_version 924260 (0.0009) [2024-06-15 23:28:12,771][1652475] Updated weights for policy 0, policy_version 924304 (0.0008) [2024-06-15 23:28:13,313][1652475] Updated weights for policy 0, policy_version 924352 (0.0008) [2024-06-15 23:28:14,698][1652475] Updated weights for policy 0, policy_version 924390 (0.0009) [2024-06-15 23:28:15,729][1652475] Updated weights for policy 0, policy_version 924480 (0.0010) [2024-06-15 23:28:15,737][1648984] Fps is (10 sec: 101582.4, 60 sec: 91750.5, 300 sec: 88862.4). Total num frames: 1893335040. Throughput: 0: 22402.9. Samples: 473376256. Policy #0 lag: (min: 41.0, avg: 164.2, max: 287.0) [2024-06-15 23:28:15,742][1648984] Avg episode reward: [(0, '-0.220')] [2024-06-15 23:28:17,774][1652475] Updated weights for policy 0, policy_version 924541 (0.0013) [2024-06-15 23:28:19,671][1652475] Updated weights for policy 0, policy_version 924582 (0.0008) [2024-06-15 23:28:20,407][1652475] Updated weights for policy 0, policy_version 924644 (0.0010) [2024-06-15 23:28:20,737][1648984] Fps is (10 sec: 91751.9, 60 sec: 91204.3, 300 sec: 88862.4). Total num frames: 1893728256. Throughput: 0: 22391.5. Samples: 473442304. Policy #0 lag: (min: 41.0, avg: 164.2, max: 287.0) [2024-06-15 23:28:20,743][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 23:28:21,280][1652475] Updated weights for policy 0, policy_version 924709 (0.0009) [2024-06-15 23:28:23,974][1652475] Updated weights for policy 0, policy_version 924741 (0.0010) [2024-06-15 23:28:24,573][1652475] Updated weights for policy 0, policy_version 924797 (0.0008) [2024-06-15 23:28:25,285][1652475] Updated weights for policy 0, policy_version 924837 (0.0010) [2024-06-15 23:28:25,738][1648984] Fps is (10 sec: 78642.7, 60 sec: 89566.0, 300 sec: 88529.2). 
Total num frames: 1894121472. Throughput: 0: 22459.8. Samples: 473579008. Policy #0 lag: (min: 41.0, avg: 164.2, max: 287.0) [2024-06-15 23:28:25,738][1648984] Avg episode reward: [(0, '-0.250')] [2024-06-15 23:28:26,084][1652475] Updated weights for policy 0, policy_version 924901 (0.0009) [2024-06-15 23:28:26,570][1652475] Updated weights for policy 0, policy_version 924944 (0.0009) [2024-06-15 23:28:26,649][1651340] Signal inference workers to stop experience collection... (47600 times) [2024-06-15 23:28:26,697][1652475] InferenceWorker_p0-w0: stopping experience collection (47600 times) [2024-06-15 23:28:26,762][1651340] Signal inference workers to resume experience collection... (47600 times) [2024-06-15 23:28:26,764][1652475] InferenceWorker_p0-w0: resuming experience collection (47600 times) [2024-06-15 23:28:30,088][1652475] Updated weights for policy 0, policy_version 924994 (0.0009) [2024-06-15 23:28:30,703][1652475] Updated weights for policy 0, policy_version 925046 (0.0009) [2024-06-15 23:28:30,738][1648984] Fps is (10 sec: 75365.8, 60 sec: 87927.4, 300 sec: 88640.2). Total num frames: 1894481920. Throughput: 0: 22630.3. Samples: 473720832. Policy #0 lag: (min: 41.0, avg: 164.2, max: 287.0) [2024-06-15 23:28:30,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 23:28:31,427][1652475] Updated weights for policy 0, policy_version 925104 (0.0053) [2024-06-15 23:28:32,073][1652475] Updated weights for policy 0, policy_version 925155 (0.0010) [2024-06-15 23:28:32,749][1652475] Updated weights for policy 0, policy_version 925208 (0.0010) [2024-06-15 23:28:35,737][1648984] Fps is (10 sec: 78643.6, 60 sec: 87381.5, 300 sec: 88862.4). Total num frames: 1894907904. Throughput: 0: 22368.7. Samples: 473772544. 
Policy #0 lag: (min: 41.0, avg: 164.2, max: 287.0) [2024-06-15 23:28:35,744][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 23:28:36,234][1652475] Updated weights for policy 0, policy_version 925268 (0.0010) [2024-06-15 23:28:37,092][1652475] Updated weights for policy 0, policy_version 925340 (0.0076) [2024-06-15 23:28:37,731][1652475] Updated weights for policy 0, policy_version 925392 (0.0009) [2024-06-15 23:28:38,420][1652475] Updated weights for policy 0, policy_version 925445 (0.0010) [2024-06-15 23:28:39,055][1652475] Updated weights for policy 0, policy_version 925501 (0.0012) [2024-06-15 23:28:40,738][1648984] Fps is (10 sec: 95027.2, 60 sec: 87927.6, 300 sec: 88862.4). Total num frames: 1895432192. Throughput: 0: 22584.9. Samples: 473911296. Policy #0 lag: (min: 41.0, avg: 164.2, max: 287.0) [2024-06-15 23:28:40,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 23:28:42,194][1652475] Updated weights for policy 0, policy_version 925552 (0.0012) [2024-06-15 23:28:42,767][1652475] Updated weights for policy 0, policy_version 925600 (0.0008) [2024-06-15 23:28:43,662][1652475] Updated weights for policy 0, policy_version 925668 (0.0010) [2024-06-15 23:28:44,551][1652475] Updated weights for policy 0, policy_version 925744 (0.0016) [2024-06-15 23:28:45,737][1648984] Fps is (10 sec: 104857.7, 60 sec: 90112.0, 300 sec: 88862.4). Total num frames: 1895956480. Throughput: 0: 22619.0. Samples: 474045952. Policy #0 lag: (min: 15.0, avg: 81.0, max: 271.0) [2024-06-15 23:28:45,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 23:28:47,447][1652475] Updated weights for policy 0, policy_version 925764 (0.0010) [2024-06-15 23:28:48,345][1651340] Signal inference workers to stop experience collection... 
(47650 times) [2024-06-15 23:28:48,353][1652475] Updated weights for policy 0, policy_version 925825 (0.0011) [2024-06-15 23:28:48,367][1652475] InferenceWorker_p0-w0: stopping experience collection (47650 times) [2024-06-15 23:28:48,464][1651340] Signal inference workers to resume experience collection... (47650 times) [2024-06-15 23:28:48,465][1652475] InferenceWorker_p0-w0: resuming experience collection (47650 times) [2024-06-15 23:28:49,006][1652475] Updated weights for policy 0, policy_version 925879 (0.0010) [2024-06-15 23:28:49,911][1652475] Updated weights for policy 0, policy_version 925952 (0.0010) [2024-06-15 23:28:50,677][1652475] Updated weights for policy 0, policy_version 926012 (0.0012) [2024-06-15 23:28:50,737][1648984] Fps is (10 sec: 104858.1, 60 sec: 91750.4, 300 sec: 89306.8). Total num frames: 1896480768. Throughput: 0: 22698.7. Samples: 474118656. Policy #0 lag: (min: 15.0, avg: 81.0, max: 271.0) [2024-06-15 23:28:50,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 23:28:54,361][1652475] Updated weights for policy 0, policy_version 926064 (0.0009) [2024-06-15 23:28:55,379][1652475] Updated weights for policy 0, policy_version 926129 (0.0082) [2024-06-15 23:28:55,738][1648984] Fps is (10 sec: 81918.2, 60 sec: 90111.9, 300 sec: 88973.4). Total num frames: 1896775680. Throughput: 0: 22459.7. Samples: 474249728. Policy #0 lag: (min: 15.0, avg: 81.0, max: 271.0) [2024-06-15 23:28:55,739][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 23:28:55,916][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000926176_1896808448.pth... 
[2024-06-15 23:28:56,008][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000915776_1875509248.pth [2024-06-15 23:28:56,390][1652475] Updated weights for policy 0, policy_version 926208 (0.0069) [2024-06-15 23:28:57,281][1652475] Updated weights for policy 0, policy_version 926272 (0.0013) [2024-06-15 23:29:00,738][1648984] Fps is (10 sec: 65535.9, 60 sec: 87381.3, 300 sec: 88751.3). Total num frames: 1897136128. Throughput: 0: 22016.0. Samples: 474366976. Policy #0 lag: (min: 15.0, avg: 81.0, max: 271.0) [2024-06-15 23:29:00,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 23:29:00,761][1652475] Updated weights for policy 0, policy_version 926340 (0.0010) [2024-06-15 23:29:01,468][1652475] Updated weights for policy 0, policy_version 926400 (0.0012) [2024-06-15 23:29:02,721][1652475] Updated weights for policy 0, policy_version 926455 (0.0010) [2024-06-15 23:29:03,585][1652475] Updated weights for policy 0, policy_version 926498 (0.0009) [2024-06-15 23:29:05,607][1652475] Updated weights for policy 0, policy_version 926544 (0.0009) [2024-06-15 23:29:05,738][1648984] Fps is (10 sec: 78643.5, 60 sec: 87381.3, 300 sec: 88529.1). Total num frames: 1897562112. Throughput: 0: 21765.6. Samples: 474421760. Policy #0 lag: (min: 15.0, avg: 81.0, max: 271.0) [2024-06-15 23:29:05,739][1648984] Avg episode reward: [(0, '-0.230')] [2024-06-15 23:29:06,322][1652475] Updated weights for policy 0, policy_version 926597 (0.0009) [2024-06-15 23:29:08,790][1652475] Updated weights for policy 0, policy_version 926679 (0.0009) [2024-06-15 23:29:09,367][1651340] Signal inference workers to stop experience collection... (47700 times) [2024-06-15 23:29:09,393][1652475] InferenceWorker_p0-w0: stopping experience collection (47700 times) [2024-06-15 23:29:09,395][1652475] Updated weights for policy 0, policy_version 926723 (0.0009) [2024-06-15 23:29:09,496][1651340] Signal inference workers to resume experience collection... 
(47700 times) [2024-06-15 23:29:09,497][1652475] InferenceWorker_p0-w0: resuming experience collection (47700 times) [2024-06-15 23:29:10,098][1652475] Updated weights for policy 0, policy_version 926783 (0.0010) [2024-06-15 23:29:10,742][1648984] Fps is (10 sec: 91708.8, 60 sec: 87374.9, 300 sec: 88416.7). Total num frames: 1898053632. Throughput: 0: 21695.2. Samples: 474555392. Policy #0 lag: (min: 15.0, avg: 81.0, max: 271.0) [2024-06-15 23:29:10,742][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 23:29:12,460][1652475] Updated weights for policy 0, policy_version 926853 (0.0013) [2024-06-15 23:29:14,803][1652475] Updated weights for policy 0, policy_version 926915 (0.0014) [2024-06-15 23:29:15,550][1652475] Updated weights for policy 0, policy_version 926976 (0.0009) [2024-06-15 23:29:15,738][1648984] Fps is (10 sec: 88474.5, 60 sec: 85196.7, 300 sec: 88195.9). Total num frames: 1898446848. Throughput: 0: 21458.5. Samples: 474686464. Policy #0 lag: (min: 15.0, avg: 81.0, max: 271.0) [2024-06-15 23:29:15,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 23:29:16,377][1652475] Updated weights for policy 0, policy_version 927040 (0.0009) [2024-06-15 23:29:18,280][1652475] Updated weights for policy 0, policy_version 927094 (0.0010) [2024-06-15 23:29:18,984][1652475] Updated weights for policy 0, policy_version 927152 (0.0010) [2024-06-15 23:29:20,572][1652475] Updated weights for policy 0, policy_version 927200 (0.0012) [2024-06-15 23:29:20,738][1648984] Fps is (10 sec: 85235.3, 60 sec: 86289.0, 300 sec: 88195.9). Total num frames: 1898905600. Throughput: 0: 21788.4. Samples: 474753024. 
Policy #0 lag: (min: 15.0, avg: 81.0, max: 271.0) [2024-06-15 23:29:20,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 23:29:21,404][1652475] Updated weights for policy 0, policy_version 927264 (0.0012) [2024-06-15 23:29:23,717][1652475] Updated weights for policy 0, policy_version 927318 (0.0011) [2024-06-15 23:29:25,045][1652475] Updated weights for policy 0, policy_version 927363 (0.0013) [2024-06-15 23:29:25,738][1648984] Fps is (10 sec: 88469.6, 60 sec: 86834.5, 300 sec: 88306.9). Total num frames: 1899331584. Throughput: 0: 21674.4. Samples: 474886656. Policy #0 lag: (min: 15.0, avg: 81.0, max: 271.0) [2024-06-15 23:29:25,740][1648984] Avg episode reward: [(0, '-0.220')] [2024-06-15 23:29:25,769][1652475] Updated weights for policy 0, policy_version 927424 (0.0009) [2024-06-15 23:29:26,336][1652475] Updated weights for policy 0, policy_version 927466 (0.0011) [2024-06-15 23:29:26,979][1652475] Updated weights for policy 0, policy_version 927520 (0.0012) [2024-06-15 23:29:30,210][1652475] Updated weights for policy 0, policy_version 927569 (0.0011) [2024-06-15 23:29:30,738][1648984] Fps is (10 sec: 85197.0, 60 sec: 87927.5, 300 sec: 88084.9). Total num frames: 1899757568. Throughput: 0: 21731.5. Samples: 475023872. Policy #0 lag: (min: 15.0, avg: 81.0, max: 271.0) [2024-06-15 23:29:30,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 23:29:30,850][1652475] Updated weights for policy 0, policy_version 927617 (0.0011) [2024-06-15 23:29:31,264][1651340] Signal inference workers to stop experience collection... (47750 times) [2024-06-15 23:29:31,331][1652475] InferenceWorker_p0-w0: stopping experience collection (47750 times) [2024-06-15 23:29:31,386][1651340] Signal inference workers to resume experience collection... 
(47750 times) [2024-06-15 23:29:31,386][1652475] InferenceWorker_p0-w0: resuming experience collection (47750 times) [2024-06-15 23:29:31,599][1652475] Updated weights for policy 0, policy_version 927680 (0.0074) [2024-06-15 23:29:32,314][1652475] Updated weights for policy 0, policy_version 927732 (0.0009) [2024-06-15 23:29:33,028][1652475] Updated weights for policy 0, policy_version 927792 (0.0010) [2024-06-15 23:29:35,739][1648984] Fps is (10 sec: 81915.2, 60 sec: 87379.7, 300 sec: 87862.4). Total num frames: 1900150784. Throughput: 0: 21241.8. Samples: 475074560. Policy #0 lag: (min: 15.0, avg: 81.0, max: 271.0) [2024-06-15 23:29:35,739][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 23:29:36,225][1652475] Updated weights for policy 0, policy_version 927824 (0.0010) [2024-06-15 23:29:36,954][1652475] Updated weights for policy 0, policy_version 927876 (0.0062) [2024-06-15 23:29:38,092][1652475] Updated weights for policy 0, policy_version 927958 (0.0012) [2024-06-15 23:29:38,911][1652475] Updated weights for policy 0, policy_version 928016 (0.0012) [2024-06-15 23:29:40,737][1648984] Fps is (10 sec: 91750.9, 60 sec: 87381.5, 300 sec: 88307.0). Total num frames: 1900675072. Throughput: 0: 21196.9. Samples: 475203584. 
Policy #0 lag: (min: 15.0, avg: 81.0, max: 271.0) [2024-06-15 23:29:40,738][1648984] Avg episode reward: [(0, '-0.210')] [2024-06-15 23:29:42,026][1652475] Updated weights for policy 0, policy_version 928065 (0.0011) [2024-06-15 23:29:42,682][1652475] Updated weights for policy 0, policy_version 928113 (0.0009) [2024-06-15 23:29:43,338][1652475] Updated weights for policy 0, policy_version 928163 (0.0011) [2024-06-15 23:29:44,429][1652475] Updated weights for policy 0, policy_version 928220 (0.0013) [2024-06-15 23:29:45,002][1652475] Updated weights for policy 0, policy_version 928261 (0.0054) [2024-06-15 23:29:45,593][1652475] Updated weights for policy 0, policy_version 928312 (0.0009) [2024-06-15 23:29:45,738][1648984] Fps is (10 sec: 104868.8, 60 sec: 87381.2, 300 sec: 88640.4). Total num frames: 1901199360. Throughput: 0: 21697.4. Samples: 475343360. Policy #0 lag: (min: 15.0, avg: 81.0, max: 271.0) [2024-06-15 23:29:45,738][1648984] Avg episode reward: [(0, '-0.180')] [2024-06-15 23:29:48,014][1652475] Updated weights for policy 0, policy_version 928357 (0.0010) [2024-06-15 23:29:48,885][1652475] Updated weights for policy 0, policy_version 928423 (0.0010) [2024-06-15 23:29:50,079][1652475] Updated weights for policy 0, policy_version 928480 (0.0010) [2024-06-15 23:29:50,738][1648984] Fps is (10 sec: 91744.5, 60 sec: 85195.9, 300 sec: 88418.0). Total num frames: 1901592576. Throughput: 0: 22175.1. Samples: 475419648. Policy #0 lag: (min: 15.0, avg: 81.0, max: 271.0) [2024-06-15 23:29:50,739][1648984] Avg episode reward: [(0, '-0.180')] [2024-06-15 23:29:50,986][1652475] Updated weights for policy 0, policy_version 928544 (0.0009) [2024-06-15 23:29:51,058][1651340] Signal inference workers to stop experience collection... (47800 times) [2024-06-15 23:29:51,093][1652475] InferenceWorker_p0-w0: stopping experience collection (47800 times) [2024-06-15 23:29:51,199][1651340] Signal inference workers to resume experience collection... 
(47800 times) [2024-06-15 23:29:51,199][1652475] InferenceWorker_p0-w0: resuming experience collection (47800 times) [2024-06-15 23:29:53,633][1652475] Updated weights for policy 0, policy_version 928596 (0.0009) [2024-06-15 23:29:54,436][1652475] Updated weights for policy 0, policy_version 928660 (0.0011) [2024-06-15 23:29:55,696][1652475] Updated weights for policy 0, policy_version 928731 (0.0069) [2024-06-15 23:29:55,738][1648984] Fps is (10 sec: 85196.4, 60 sec: 87927.6, 300 sec: 88307.0). Total num frames: 1902051328. Throughput: 0: 22177.5. Samples: 475553280. Policy #0 lag: (min: 15.0, avg: 81.0, max: 271.0) [2024-06-15 23:29:55,738][1648984] Avg episode reward: [(0, '-0.170')] [2024-06-15 23:29:56,249][1652475] Updated weights for policy 0, policy_version 928770 (0.0012) [2024-06-15 23:29:56,908][1652475] Updated weights for policy 0, policy_version 928832 (0.0010) [2024-06-15 23:29:59,859][1652475] Updated weights for policy 0, policy_version 928894 (0.0013) [2024-06-15 23:30:00,738][1648984] Fps is (10 sec: 81922.0, 60 sec: 87927.0, 300 sec: 88084.7). Total num frames: 1902411776. Throughput: 0: 22266.2. Samples: 475688448. Policy #0 lag: (min: 15.0, avg: 81.0, max: 271.0) [2024-06-15 23:30:00,738][1648984] Avg episode reward: [(0, '-0.490')] [2024-06-15 23:30:01,080][1652475] Updated weights for policy 0, policy_version 928951 (0.0011) [2024-06-15 23:30:01,781][1652475] Updated weights for policy 0, policy_version 929008 (0.0010) [2024-06-15 23:30:02,659][1652475] Updated weights for policy 0, policy_version 929072 (0.0013) [2024-06-15 23:30:05,737][1648984] Fps is (10 sec: 72090.2, 60 sec: 86835.4, 300 sec: 87751.7). Total num frames: 1902772224. Throughput: 0: 22072.9. Samples: 475746304. 
Policy #0 lag: (min: 15.0, avg: 81.0, max: 271.0) [2024-06-15 23:30:05,738][1648984] Avg episode reward: [(0, '-0.450')] [2024-06-15 23:30:06,186][1652475] Updated weights for policy 0, policy_version 929128 (0.0010) [2024-06-15 23:30:07,098][1652475] Updated weights for policy 0, policy_version 929200 (0.0010) [2024-06-15 23:30:08,155][1652475] Updated weights for policy 0, policy_version 929283 (0.0015) [2024-06-15 23:30:08,890][1652475] Updated weights for policy 0, policy_version 929344 (0.0020) [2024-06-15 23:30:10,747][1648984] Fps is (10 sec: 88416.3, 60 sec: 87378.0, 300 sec: 87971.7). Total num frames: 1903296512. Throughput: 0: 21842.3. Samples: 475869696. Policy #0 lag: (min: 15.0, avg: 81.0, max: 271.0) [2024-06-15 23:30:10,770][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 23:30:11,996][1652475] Updated weights for policy 0, policy_version 929400 (0.0011) [2024-06-15 23:30:12,753][1652475] Updated weights for policy 0, policy_version 929447 (0.0010) [2024-06-15 23:30:13,535][1651340] Signal inference workers to stop experience collection... (47850 times) [2024-06-15 23:30:13,584][1652475] InferenceWorker_p0-w0: stopping experience collection (47850 times) [2024-06-15 23:30:13,691][1651340] Signal inference workers to resume experience collection... (47850 times) [2024-06-15 23:30:13,692][1652475] InferenceWorker_p0-w0: resuming experience collection (47850 times) [2024-06-15 23:30:13,802][1652475] Updated weights for policy 0, policy_version 929492 (0.0011) [2024-06-15 23:30:14,979][1652475] Updated weights for policy 0, policy_version 929538 (0.0013) [2024-06-15 23:30:15,676][1652475] Updated weights for policy 0, policy_version 929600 (0.0011) [2024-06-15 23:30:15,738][1648984] Fps is (10 sec: 104857.0, 60 sec: 89565.9, 300 sec: 88418.1). Total num frames: 1903820800. Throughput: 0: 21902.2. Samples: 476009472. 
Policy #0 lag: (min: 15.0, avg: 81.0, max: 271.0) [2024-06-15 23:30:15,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 23:30:17,183][1652475] Updated weights for policy 0, policy_version 929648 (0.0011) [2024-06-15 23:30:18,252][1652475] Updated weights for policy 0, policy_version 929699 (0.0012) [2024-06-15 23:30:19,591][1652475] Updated weights for policy 0, policy_version 929744 (0.0010) [2024-06-15 23:30:20,197][1652475] Updated weights for policy 0, policy_version 929792 (0.0010) [2024-06-15 23:30:20,738][1648984] Fps is (10 sec: 95091.3, 60 sec: 89019.6, 300 sec: 88084.8). Total num frames: 1904246784. Throughput: 0: 22209.9. Samples: 476073984. Policy #0 lag: (min: 15.0, avg: 81.0, max: 271.0) [2024-06-15 23:30:20,738][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 23:30:21,256][1652475] Updated weights for policy 0, policy_version 929852 (0.0009) [2024-06-15 23:30:22,783][1652475] Updated weights for policy 0, policy_version 929897 (0.0010) [2024-06-15 23:30:23,669][1652475] Updated weights for policy 0, policy_version 929941 (0.0012) [2024-06-15 23:30:24,091][1652475] Updated weights for policy 0, policy_version 929983 (0.0009) [2024-06-15 23:30:25,599][1652475] Updated weights for policy 0, policy_version 930041 (0.0082) [2024-06-15 23:30:25,738][1648984] Fps is (10 sec: 91750.5, 60 sec: 90112.7, 300 sec: 88084.9). Total num frames: 1904738304. Throughput: 0: 22493.8. Samples: 476215808. 
Policy #0 lag: (min: 15.0, avg: 81.0, max: 271.0) [2024-06-15 23:30:25,740][1648984] Avg episode reward: [(0, '-0.220')] [2024-06-15 23:30:26,635][1652475] Updated weights for policy 0, policy_version 930084 (0.0012) [2024-06-15 23:30:28,059][1652475] Updated weights for policy 0, policy_version 930128 (0.0009) [2024-06-15 23:30:28,680][1652475] Updated weights for policy 0, policy_version 930176 (0.0010) [2024-06-15 23:30:30,462][1652475] Updated weights for policy 0, policy_version 930239 (0.0010) [2024-06-15 23:30:30,738][1648984] Fps is (10 sec: 88473.4, 60 sec: 89565.7, 300 sec: 88084.8). Total num frames: 1905131520. Throughput: 0: 22448.3. Samples: 476353536. Policy #0 lag: (min: 15.0, avg: 81.0, max: 271.0) [2024-06-15 23:30:30,738][1648984] Avg episode reward: [(0, '-0.250')] [2024-06-15 23:30:31,240][1652475] Updated weights for policy 0, policy_version 930288 (0.0009) [2024-06-15 23:30:32,262][1652475] Updated weights for policy 0, policy_version 930362 (0.0012) [2024-06-15 23:30:34,639][1652475] Updated weights for policy 0, policy_version 930416 (0.0011) [2024-06-15 23:30:35,738][1648984] Fps is (10 sec: 78642.8, 60 sec: 89567.4, 300 sec: 87973.8). Total num frames: 1905524736. Throughput: 0: 22130.0. Samples: 476415488. Policy #0 lag: (min: 15.0, avg: 81.0, max: 271.0) [2024-06-15 23:30:35,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 23:30:36,262][1652475] Updated weights for policy 0, policy_version 930454 (0.0010) [2024-06-15 23:30:36,987][1652475] Updated weights for policy 0, policy_version 930512 (0.0010) [2024-06-15 23:30:37,277][1651340] Signal inference workers to stop experience collection... (47900 times) [2024-06-15 23:30:37,317][1652475] InferenceWorker_p0-w0: stopping experience collection (47900 times) [2024-06-15 23:30:37,411][1651340] Signal inference workers to resume experience collection... 
(47900 times) [2024-06-15 23:30:37,412][1652475] InferenceWorker_p0-w0: resuming experience collection (47900 times) [2024-06-15 23:30:38,152][1652475] Updated weights for policy 0, policy_version 930598 (0.0012) [2024-06-15 23:30:40,738][1648984] Fps is (10 sec: 78642.0, 60 sec: 87380.9, 300 sec: 87973.7). Total num frames: 1905917952. Throughput: 0: 21924.9. Samples: 476539904. Policy #0 lag: (min: 50.0, avg: 185.2, max: 367.0) [2024-06-15 23:30:40,739][1648984] Avg episode reward: [(0, '-0.250')] [2024-06-15 23:30:41,444][1652475] Updated weights for policy 0, policy_version 930656 (0.0012) [2024-06-15 23:30:42,263][1652475] Updated weights for policy 0, policy_version 930712 (0.0011) [2024-06-15 23:30:42,970][1652475] Updated weights for policy 0, policy_version 930768 (0.0009) [2024-06-15 23:30:43,518][1652475] Updated weights for policy 0, policy_version 930813 (0.0011) [2024-06-15 23:30:44,689][1652475] Updated weights for policy 0, policy_version 930877 (0.0014) [2024-06-15 23:30:45,738][1648984] Fps is (10 sec: 91750.5, 60 sec: 87381.3, 300 sec: 87973.7). Total num frames: 1906442240. Throughput: 0: 21743.1. Samples: 476666880. Policy #0 lag: (min: 50.0, avg: 185.2, max: 367.0) [2024-06-15 23:30:45,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 23:30:47,695][1652475] Updated weights for policy 0, policy_version 930919 (0.0012) [2024-06-15 23:30:48,312][1652475] Updated weights for policy 0, policy_version 930967 (0.0011) [2024-06-15 23:30:48,992][1652475] Updated weights for policy 0, policy_version 931021 (0.0011) [2024-06-15 23:30:49,579][1652475] Updated weights for policy 0, policy_version 931068 (0.0013) [2024-06-15 23:30:50,343][1652475] Updated weights for policy 0, policy_version 931125 (0.0009) [2024-06-15 23:30:50,738][1648984] Fps is (10 sec: 104860.3, 60 sec: 89566.7, 300 sec: 88307.1). Total num frames: 1906966528. Throughput: 0: 22152.5. Samples: 476743168. 
Policy #0 lag: (min: 50.0, avg: 185.2, max: 367.0) [2024-06-15 23:30:50,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 23:30:53,333][1652475] Updated weights for policy 0, policy_version 931171 (0.0009) [2024-06-15 23:30:53,852][1652475] Updated weights for policy 0, policy_version 931216 (0.0057) [2024-06-15 23:30:54,393][1652475] Updated weights for policy 0, policy_version 931252 (0.0008) [2024-06-15 23:30:55,133][1652475] Updated weights for policy 0, policy_version 931312 (0.0012) [2024-06-15 23:30:55,738][1648984] Fps is (10 sec: 95027.4, 60 sec: 89019.8, 300 sec: 88529.2). Total num frames: 1907392512. Throughput: 0: 22520.0. Samples: 476882944. Policy #0 lag: (min: 50.0, avg: 185.2, max: 367.0) [2024-06-15 23:30:55,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 23:30:55,965][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000931376_1907458048.pth... [2024-06-15 23:30:55,965][1652475] Updated weights for policy 0, policy_version 931376 (0.0010) [2024-06-15 23:30:56,007][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000920944_1886093312.pth [2024-06-15 23:30:56,010][1651340] Saving a milestone train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/milestones/checkpoint_000931376_1907458048.pth [2024-06-15 23:30:58,796][1652475] Updated weights for policy 0, policy_version 931397 (0.0016) [2024-06-15 23:30:58,937][1651340] Signal inference workers to stop experience collection... (47950 times) [2024-06-15 23:30:58,988][1652475] InferenceWorker_p0-w0: stopping experience collection (47950 times) [2024-06-15 23:30:59,073][1651340] Signal inference workers to resume experience collection... 
(47950 times) [2024-06-15 23:30:59,073][1652475] InferenceWorker_p0-w0: resuming experience collection (47950 times) [2024-06-15 23:30:59,500][1652475] Updated weights for policy 0, policy_version 931456 (0.0010) [2024-06-15 23:31:00,450][1652475] Updated weights for policy 0, policy_version 931522 (0.0011) [2024-06-15 23:31:00,738][1648984] Fps is (10 sec: 81919.8, 60 sec: 89566.3, 300 sec: 88418.0). Total num frames: 1907785728. Throughput: 0: 22277.7. Samples: 477011968. Policy #0 lag: (min: 50.0, avg: 185.2, max: 367.0) [2024-06-15 23:31:00,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 23:31:01,146][1652475] Updated weights for policy 0, policy_version 931584 (0.0013) [2024-06-15 23:31:01,958][1652475] Updated weights for policy 0, policy_version 931642 (0.0069) [2024-06-15 23:31:05,146][1652475] Updated weights for policy 0, policy_version 931697 (0.0010) [2024-06-15 23:31:05,738][1648984] Fps is (10 sec: 81914.8, 60 sec: 90657.1, 300 sec: 88417.9). Total num frames: 1908211712. Throughput: 0: 22231.9. Samples: 477074432. Policy #0 lag: (min: 50.0, avg: 185.2, max: 367.0) [2024-06-15 23:31:05,739][1648984] Avg episode reward: [(0, '-0.510')] [2024-06-15 23:31:05,771][1652475] Updated weights for policy 0, policy_version 931747 (0.0011) [2024-06-15 23:31:06,698][1652475] Updated weights for policy 0, policy_version 931795 (0.0018) [2024-06-15 23:31:07,299][1652475] Updated weights for policy 0, policy_version 931841 (0.0010) [2024-06-15 23:31:07,947][1652475] Updated weights for policy 0, policy_version 931898 (0.0011) [2024-06-15 23:31:10,738][1648984] Fps is (10 sec: 81920.1, 60 sec: 88483.6, 300 sec: 88418.1). Total num frames: 1908604928. Throughput: 0: 22198.0. Samples: 477214720. 
Policy #0 lag: (min: 50.0, avg: 185.2, max: 367.0) [2024-06-15 23:31:10,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 23:31:10,753][1652475] Updated weights for policy 0, policy_version 931941 (0.0055) [2024-06-15 23:31:11,570][1652475] Updated weights for policy 0, policy_version 932006 (0.0011) [2024-06-15 23:31:12,426][1652475] Updated weights for policy 0, policy_version 932064 (0.0009) [2024-06-15 23:31:13,462][1652475] Updated weights for policy 0, policy_version 932116 (0.0011) [2024-06-15 23:31:15,737][1648984] Fps is (10 sec: 85202.8, 60 sec: 87381.4, 300 sec: 88418.1). Total num frames: 1909063680. Throughput: 0: 22095.7. Samples: 477347840. Policy #0 lag: (min: 50.0, avg: 185.2, max: 367.0) [2024-06-15 23:31:15,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 23:31:15,828][1652475] Updated weights for policy 0, policy_version 932176 (0.0014) [2024-06-15 23:31:16,374][1652475] Updated weights for policy 0, policy_version 932224 (0.0009) [2024-06-15 23:31:17,563][1652475] Updated weights for policy 0, policy_version 932274 (0.0011) [2024-06-15 23:31:18,220][1652475] Updated weights for policy 0, policy_version 932320 (0.0012) [2024-06-15 23:31:20,250][1651340] Signal inference workers to stop experience collection... (48000 times) [2024-06-15 23:31:20,289][1652475] InferenceWorker_p0-w0: stopping experience collection (48000 times) [2024-06-15 23:31:20,371][1651340] Signal inference workers to resume experience collection... (48000 times) [2024-06-15 23:31:20,371][1652475] InferenceWorker_p0-w0: resuming experience collection (48000 times) [2024-06-15 23:31:20,481][1652475] Updated weights for policy 0, policy_version 932388 (0.0010) [2024-06-15 23:31:20,738][1648984] Fps is (10 sec: 98304.5, 60 sec: 89019.9, 300 sec: 88418.1). Total num frames: 1909587968. Throughput: 0: 22152.6. Samples: 477412352. 
Policy #0 lag: (min: 50.0, avg: 185.2, max: 367.0) [2024-06-15 23:31:20,738][1648984] Avg episode reward: [(0, '-0.190')] [2024-06-15 23:31:21,521][1652475] Updated weights for policy 0, policy_version 932440 (0.0017) [2024-06-15 23:31:22,517][1652475] Updated weights for policy 0, policy_version 932502 (0.0013) [2024-06-15 23:31:24,488][1652475] Updated weights for policy 0, policy_version 932545 (0.0012) [2024-06-15 23:31:25,082][1652475] Updated weights for policy 0, policy_version 932601 (0.0010) [2024-06-15 23:31:25,738][1648984] Fps is (10 sec: 91749.8, 60 sec: 87381.3, 300 sec: 88084.8). Total num frames: 1909981184. Throughput: 0: 22380.2. Samples: 477547008. Policy #0 lag: (min: 50.0, avg: 185.2, max: 367.0) [2024-06-15 23:31:25,738][1648984] Avg episode reward: [(0, '-0.210')] [2024-06-15 23:31:26,717][1652475] Updated weights for policy 0, policy_version 932662 (0.0011) [2024-06-15 23:31:27,465][1652475] Updated weights for policy 0, policy_version 932728 (0.0010) [2024-06-15 23:31:28,299][1652475] Updated weights for policy 0, policy_version 932768 (0.0010) [2024-06-15 23:31:30,250][1652475] Updated weights for policy 0, policy_version 932806 (0.0011) [2024-06-15 23:31:30,738][1648984] Fps is (10 sec: 88472.8, 60 sec: 89019.8, 300 sec: 88307.0). Total num frames: 1910472704. Throughput: 0: 22675.9. Samples: 477687296. Policy #0 lag: (min: 50.0, avg: 185.2, max: 367.0) [2024-06-15 23:31:30,738][1648984] Avg episode reward: [(0, '-0.220')] [2024-06-15 23:31:31,962][1652475] Updated weights for policy 0, policy_version 932880 (0.0057) [2024-06-15 23:31:32,733][1652475] Updated weights for policy 0, policy_version 932944 (0.0011) [2024-06-15 23:31:33,260][1652475] Updated weights for policy 0, policy_version 932987 (0.0010) [2024-06-15 23:31:34,245][1652475] Updated weights for policy 0, policy_version 933040 (0.0057) [2024-06-15 23:31:35,738][1648984] Fps is (10 sec: 91750.7, 60 sec: 89566.0, 300 sec: 88418.1). Total num frames: 1910898688. 
Throughput: 0: 22391.5. Samples: 477750784. Policy #0 lag: (min: 50.0, avg: 185.2, max: 367.0) [2024-06-15 23:31:35,741][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 23:31:36,288][1652475] Updated weights for policy 0, policy_version 933104 (0.0011) [2024-06-15 23:31:37,987][1652475] Updated weights for policy 0, policy_version 933168 (0.0010) [2024-06-15 23:31:38,606][1652475] Updated weights for policy 0, policy_version 933216 (0.0009) [2024-06-15 23:31:40,703][1652475] Updated weights for policy 0, policy_version 933266 (0.0010) [2024-06-15 23:31:40,738][1648984] Fps is (10 sec: 85197.3, 60 sec: 90112.4, 300 sec: 88085.5). Total num frames: 1911324672. Throughput: 0: 22220.8. Samples: 477882880. Policy #0 lag: (min: 50.0, avg: 185.2, max: 367.0) [2024-06-15 23:31:40,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 23:31:41,333][1652475] Updated weights for policy 0, policy_version 933315 (0.0010) [2024-06-15 23:31:41,694][1651340] Signal inference workers to stop experience collection... (48050 times) [2024-06-15 23:31:41,730][1652475] InferenceWorker_p0-w0: stopping experience collection (48050 times) [2024-06-15 23:31:41,847][1651340] Signal inference workers to resume experience collection... (48050 times) [2024-06-15 23:31:41,848][1652475] InferenceWorker_p0-w0: resuming experience collection (48050 times) [2024-06-15 23:31:41,996][1652475] Updated weights for policy 0, policy_version 933376 (0.0009) [2024-06-15 23:31:43,411][1652475] Updated weights for policy 0, policy_version 933434 (0.0010) [2024-06-15 23:31:45,294][1652475] Updated weights for policy 0, policy_version 933476 (0.0009) [2024-06-15 23:31:45,738][1648984] Fps is (10 sec: 91750.5, 60 sec: 89566.0, 300 sec: 88751.3). Total num frames: 1911816192. Throughput: 0: 22471.1. Samples: 478023168. 
Policy #0 lag: (min: 50.0, avg: 185.2, max: 367.0) [2024-06-15 23:31:45,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 23:31:46,151][1652475] Updated weights for policy 0, policy_version 933520 (0.0008) [2024-06-15 23:31:47,766][1652475] Updated weights for policy 0, policy_version 933569 (0.0013) [2024-06-15 23:31:48,392][1652475] Updated weights for policy 0, policy_version 933621 (0.0010) [2024-06-15 23:31:49,170][1652475] Updated weights for policy 0, policy_version 933687 (0.0011) [2024-06-15 23:31:50,738][1648984] Fps is (10 sec: 88473.7, 60 sec: 87381.3, 300 sec: 88751.3). Total num frames: 1912209408. Throughput: 0: 22539.7. Samples: 478088704. Policy #0 lag: (min: 50.0, avg: 185.2, max: 367.0) [2024-06-15 23:31:50,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 23:31:51,527][1652475] Updated weights for policy 0, policy_version 933744 (0.0011) [2024-06-15 23:31:52,327][1652475] Updated weights for policy 0, policy_version 933799 (0.0010) [2024-06-15 23:31:53,311][1652475] Updated weights for policy 0, policy_version 933826 (0.0011) [2024-06-15 23:31:53,902][1652475] Updated weights for policy 0, policy_version 933880 (0.0010) [2024-06-15 23:31:54,599][1652475] Updated weights for policy 0, policy_version 933925 (0.0009) [2024-06-15 23:31:55,738][1648984] Fps is (10 sec: 91750.3, 60 sec: 89019.8, 300 sec: 88862.4). Total num frames: 1912733696. Throughput: 0: 22414.2. Samples: 478223360. 
Policy #0 lag: (min: 50.0, avg: 185.2, max: 367.0) [2024-06-15 23:31:55,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 23:31:56,723][1652475] Updated weights for policy 0, policy_version 933984 (0.0011) [2024-06-15 23:31:58,160][1652475] Updated weights for policy 0, policy_version 934036 (0.0012) [2024-06-15 23:31:59,310][1652475] Updated weights for policy 0, policy_version 934103 (0.0012) [2024-06-15 23:32:00,134][1652475] Updated weights for policy 0, policy_version 934160 (0.0010) [2024-06-15 23:32:00,715][1652475] Updated weights for policy 0, policy_version 934205 (0.0073) [2024-06-15 23:32:00,737][1648984] Fps is (10 sec: 104858.2, 60 sec: 91204.4, 300 sec: 88862.4). Total num frames: 1913257984. Throughput: 0: 22357.3. Samples: 478353920. Policy #0 lag: (min: 50.0, avg: 185.2, max: 367.0) [2024-06-15 23:32:00,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 23:32:03,647][1652475] Updated weights for policy 0, policy_version 934276 (0.0061) [2024-06-15 23:32:04,300][1652475] Updated weights for policy 0, policy_version 934335 (0.0073) [2024-06-15 23:32:05,067][1651340] Signal inference workers to stop experience collection... (48100 times) [2024-06-15 23:32:05,096][1652475] InferenceWorker_p0-w0: stopping experience collection (48100 times) [2024-06-15 23:32:05,190][1651340] Signal inference workers to resume experience collection... (48100 times) [2024-06-15 23:32:05,191][1652475] InferenceWorker_p0-w0: resuming experience collection (48100 times) [2024-06-15 23:32:05,193][1652475] Updated weights for policy 0, policy_version 934384 (0.0010) [2024-06-15 23:32:05,738][1648984] Fps is (10 sec: 91750.2, 60 sec: 90659.1, 300 sec: 88862.4). Total num frames: 1913651200. Throughput: 0: 22539.4. Samples: 478426624. 
Policy #0 lag: (min: 50.0, avg: 185.2, max: 367.0) [2024-06-15 23:32:05,738][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 23:32:06,150][1652475] Updated weights for policy 0, policy_version 934436 (0.0009) [2024-06-15 23:32:08,296][1652475] Updated weights for policy 0, policy_version 934496 (0.0011) [2024-06-15 23:32:09,892][1652475] Updated weights for policy 0, policy_version 934560 (0.0012) [2024-06-15 23:32:10,738][1648984] Fps is (10 sec: 81919.1, 60 sec: 91204.2, 300 sec: 88973.4). Total num frames: 1914077184. Throughput: 0: 22596.3. Samples: 478563840. Policy #0 lag: (min: 50.0, avg: 185.2, max: 367.0) [2024-06-15 23:32:10,740][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 23:32:10,961][1652475] Updated weights for policy 0, policy_version 934624 (0.0010) [2024-06-15 23:32:12,074][1652475] Updated weights for policy 0, policy_version 934694 (0.0010) [2024-06-15 23:32:14,457][1652475] Updated weights for policy 0, policy_version 934738 (0.0010) [2024-06-15 23:32:15,368][1652475] Updated weights for policy 0, policy_version 934785 (0.0010) [2024-06-15 23:32:15,738][1648984] Fps is (10 sec: 85196.4, 60 sec: 90658.0, 300 sec: 88973.4). Total num frames: 1914503168. Throughput: 0: 22346.0. Samples: 478692864. Policy #0 lag: (min: 50.0, avg: 185.2, max: 367.0) [2024-06-15 23:32:15,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 23:32:16,021][1652475] Updated weights for policy 0, policy_version 934846 (0.0011) [2024-06-15 23:32:17,637][1652475] Updated weights for policy 0, policy_version 934912 (0.0012) [2024-06-15 23:32:18,350][1652475] Updated weights for policy 0, policy_version 934974 (0.0009) [2024-06-15 23:32:20,738][1648984] Fps is (10 sec: 78643.4, 60 sec: 87927.4, 300 sec: 88529.2). Total num frames: 1914863616. Throughput: 0: 22232.2. Samples: 478751232. 
Policy #0 lag: (min: 50.0, avg: 185.2, max: 367.0) [2024-06-15 23:32:20,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 23:32:21,074][1652475] Updated weights for policy 0, policy_version 935024 (0.0013) [2024-06-15 23:32:21,795][1652475] Updated weights for policy 0, policy_version 935079 (0.0008) [2024-06-15 23:32:22,617][1652475] Updated weights for policy 0, policy_version 935136 (0.0010) [2024-06-15 23:32:24,399][1652475] Updated weights for policy 0, policy_version 935172 (0.0011) [2024-06-15 23:32:25,738][1648984] Fps is (10 sec: 85196.3, 60 sec: 89565.7, 300 sec: 88640.2). Total num frames: 1915355136. Throughput: 0: 22334.5. Samples: 478887936. Policy #0 lag: (min: 50.0, avg: 185.2, max: 367.0) [2024-06-15 23:32:25,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 23:32:25,986][1652475] Updated weights for policy 0, policy_version 935233 (0.0009) [2024-06-15 23:32:26,685][1652475] Updated weights for policy 0, policy_version 935293 (0.0011) [2024-06-15 23:32:27,876][1652475] Updated weights for policy 0, policy_version 935344 (0.0012) [2024-06-15 23:32:28,361][1652475] Updated weights for policy 0, policy_version 935376 (0.0017) [2024-06-15 23:32:28,630][1651340] Signal inference workers to stop experience collection... (48150 times) [2024-06-15 23:32:28,679][1652475] InferenceWorker_p0-w0: stopping experience collection (48150 times) [2024-06-15 23:32:28,769][1651340] Signal inference workers to resume experience collection... (48150 times) [2024-06-15 23:32:28,770][1652475] InferenceWorker_p0-w0: resuming experience collection (48150 times) [2024-06-15 23:32:30,651][1652475] Updated weights for policy 0, policy_version 935427 (0.0013) [2024-06-15 23:32:30,738][1648984] Fps is (10 sec: 88473.8, 60 sec: 87927.6, 300 sec: 88418.1). Total num frames: 1915748352. Throughput: 0: 22254.9. Samples: 479024640. 
Policy #0 lag: (min: 50.0, avg: 185.2, max: 367.0) [2024-06-15 23:32:30,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 23:32:31,482][1652475] Updated weights for policy 0, policy_version 935489 (0.0012) [2024-06-15 23:32:32,970][1652475] Updated weights for policy 0, policy_version 935557 (0.0013) [2024-06-15 23:32:33,637][1652475] Updated weights for policy 0, policy_version 935616 (0.0010) [2024-06-15 23:32:35,058][1652475] Updated weights for policy 0, policy_version 935670 (0.0010) [2024-06-15 23:32:35,737][1648984] Fps is (10 sec: 91752.1, 60 sec: 89565.9, 300 sec: 88529.2). Total num frames: 1916272640. Throughput: 0: 22163.9. Samples: 479086080. Policy #0 lag: (min: 18.0, avg: 140.0, max: 274.0) [2024-06-15 23:32:35,738][1648984] Avg episode reward: [(0, '-0.210')] [2024-06-15 23:32:36,815][1652475] Updated weights for policy 0, policy_version 935715 (0.0011) [2024-06-15 23:32:37,571][1652475] Updated weights for policy 0, policy_version 935763 (0.0011) [2024-06-15 23:32:38,666][1652475] Updated weights for policy 0, policy_version 935810 (0.0011) [2024-06-15 23:32:39,258][1652475] Updated weights for policy 0, policy_version 935862 (0.0011) [2024-06-15 23:32:40,738][1648984] Fps is (10 sec: 95027.0, 60 sec: 89565.9, 300 sec: 88640.2). Total num frames: 1916698624. Throughput: 0: 22220.8. Samples: 479223296. 
Policy #0 lag: (min: 18.0, avg: 140.0, max: 274.0) [2024-06-15 23:32:40,738][1648984] Avg episode reward: [(0, '-0.220')] [2024-06-15 23:32:40,914][1652475] Updated weights for policy 0, policy_version 935909 (0.0075) [2024-06-15 23:32:42,013][1652475] Updated weights for policy 0, policy_version 935968 (0.0013) [2024-06-15 23:32:43,336][1652475] Updated weights for policy 0, policy_version 936017 (0.0014) [2024-06-15 23:32:43,852][1652475] Updated weights for policy 0, policy_version 936064 (0.0011) [2024-06-15 23:32:45,628][1652475] Updated weights for policy 0, policy_version 936128 (0.0011) [2024-06-15 23:32:45,738][1648984] Fps is (10 sec: 91742.8, 60 sec: 89564.7, 300 sec: 88862.1). Total num frames: 1917190144. Throughput: 0: 22459.3. Samples: 479364608. Policy #0 lag: (min: 18.0, avg: 140.0, max: 274.0) [2024-06-15 23:32:45,739][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 23:32:46,787][1652475] Updated weights for policy 0, policy_version 936190 (0.0009) [2024-06-15 23:32:47,769][1652475] Updated weights for policy 0, policy_version 936252 (0.0011) [2024-06-15 23:32:49,728][1652475] Updated weights for policy 0, policy_version 936304 (0.0081) [2024-06-15 23:32:50,738][1648984] Fps is (10 sec: 88473.8, 60 sec: 89565.9, 300 sec: 88862.4). Total num frames: 1917583360. Throughput: 0: 22152.5. Samples: 479423488. Policy #0 lag: (min: 18.0, avg: 140.0, max: 274.0) [2024-06-15 23:32:50,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 23:32:51,961][1652475] Updated weights for policy 0, policy_version 936368 (0.0009) [2024-06-15 23:32:52,680][1652475] Updated weights for policy 0, policy_version 936423 (0.0009) [2024-06-15 23:32:52,799][1651340] Signal inference workers to stop experience collection... (48200 times) [2024-06-15 23:32:52,850][1652475] InferenceWorker_p0-w0: stopping experience collection (48200 times) [2024-06-15 23:32:52,923][1651340] Signal inference workers to resume experience collection... 
(48200 times) [2024-06-15 23:32:52,923][1652475] InferenceWorker_p0-w0: resuming experience collection (48200 times) [2024-06-15 23:32:53,619][1652475] Updated weights for policy 0, policy_version 936506 (0.0011) [2024-06-15 23:32:55,738][1648984] Fps is (10 sec: 78648.8, 60 sec: 87381.2, 300 sec: 88418.0). Total num frames: 1917976576. Throughput: 0: 21868.1. Samples: 479547904. Policy #0 lag: (min: 18.0, avg: 140.0, max: 274.0) [2024-06-15 23:32:55,740][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 23:32:55,747][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000936512_1917976576.pth... [2024-06-15 23:32:55,807][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000926176_1896808448.pth [2024-06-15 23:32:56,760][1652475] Updated weights for policy 0, policy_version 936550 (0.0010) [2024-06-15 23:32:57,567][1652475] Updated weights for policy 0, policy_version 936610 (0.0009) [2024-06-15 23:32:58,186][1652475] Updated weights for policy 0, policy_version 936660 (0.0010) [2024-06-15 23:32:58,777][1652475] Updated weights for policy 0, policy_version 936706 (0.0011) [2024-06-15 23:32:59,464][1652475] Updated weights for policy 0, policy_version 936768 (0.0010) [2024-06-15 23:33:00,737][1648984] Fps is (10 sec: 91750.6, 60 sec: 87381.3, 300 sec: 88751.3). Total num frames: 1918500864. Throughput: 0: 21993.3. Samples: 479682560. Policy #0 lag: (min: 18.0, avg: 140.0, max: 274.0) [2024-06-15 23:33:00,738][1648984] Avg episode reward: [(0, '-0.160')] [2024-06-15 23:33:03,196][1652475] Updated weights for policy 0, policy_version 936832 (0.0124) [2024-06-15 23:33:03,998][1652475] Updated weights for policy 0, policy_version 936885 (0.0009) [2024-06-15 23:33:04,849][1652475] Updated weights for policy 0, policy_version 936951 (0.0011) [2024-06-15 23:33:05,738][1648984] Fps is (10 sec: 104858.4, 60 sec: 89565.9, 300 sec: 88862.4). Total num frames: 1919025152. Throughput: 0: 22357.3. 
Samples: 479757312. Policy #0 lag: (min: 18.0, avg: 140.0, max: 274.0) [2024-06-15 23:33:05,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 23:33:08,896][1652475] Updated weights for policy 0, policy_version 937042 (0.0011) [2024-06-15 23:33:09,588][1652475] Updated weights for policy 0, policy_version 937089 (0.0015) [2024-06-15 23:33:10,267][1652475] Updated weights for policy 0, policy_version 937143 (0.0009) [2024-06-15 23:33:10,737][1648984] Fps is (10 sec: 85197.3, 60 sec: 87927.7, 300 sec: 88195.9). Total num frames: 1919352832. Throughput: 0: 22198.1. Samples: 479886848. Policy #0 lag: (min: 18.0, avg: 140.0, max: 274.0) [2024-06-15 23:33:10,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 23:33:10,846][1652475] Updated weights for policy 0, policy_version 937190 (0.0008) [2024-06-15 23:33:11,455][1652475] Updated weights for policy 0, policy_version 937236 (0.0009) [2024-06-15 23:33:11,884][1652475] Updated weights for policy 0, policy_version 937279 (0.0010) [2024-06-15 23:33:14,478][1651340] Signal inference workers to stop experience collection... (48250 times) [2024-06-15 23:33:14,515][1652475] InferenceWorker_p0-w0: stopping experience collection (48250 times) [2024-06-15 23:33:14,614][1651340] Signal inference workers to resume experience collection... (48250 times) [2024-06-15 23:33:14,615][1652475] InferenceWorker_p0-w0: resuming experience collection (48250 times) [2024-06-15 23:33:14,955][1652475] Updated weights for policy 0, policy_version 937329 (0.0009) [2024-06-15 23:33:15,700][1652475] Updated weights for policy 0, policy_version 937392 (0.0085) [2024-06-15 23:33:15,738][1648984] Fps is (10 sec: 75365.5, 60 sec: 87927.4, 300 sec: 88306.9). Total num frames: 1919778816. Throughput: 0: 22209.4. Samples: 480024064. 
Policy #0 lag: (min: 18.0, avg: 140.0, max: 274.0) [2024-06-15 23:33:15,739][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 23:33:16,384][1652475] Updated weights for policy 0, policy_version 937443 (0.0010) [2024-06-15 23:33:17,014][1652475] Updated weights for policy 0, policy_version 937494 (0.0012) [2024-06-15 23:33:20,334][1652475] Updated weights for policy 0, policy_version 937552 (0.0070) [2024-06-15 23:33:20,738][1648984] Fps is (10 sec: 81918.4, 60 sec: 88473.5, 300 sec: 88306.9). Total num frames: 1920172032. Throughput: 0: 22118.3. Samples: 480081408. Policy #0 lag: (min: 18.0, avg: 140.0, max: 274.0) [2024-06-15 23:33:20,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 23:33:21,176][1652475] Updated weights for policy 0, policy_version 937616 (0.0011) [2024-06-15 23:33:21,975][1652475] Updated weights for policy 0, policy_version 937669 (0.0011) [2024-06-15 23:33:22,727][1652475] Updated weights for policy 0, policy_version 937731 (0.0011) [2024-06-15 23:33:25,738][1648984] Fps is (10 sec: 81920.6, 60 sec: 87381.5, 300 sec: 88529.1). Total num frames: 1920598016. Throughput: 0: 21936.3. Samples: 480210432. Policy #0 lag: (min: 18.0, avg: 140.0, max: 274.0) [2024-06-15 23:33:25,738][1648984] Avg episode reward: [(0, '-0.170')] [2024-06-15 23:33:26,449][1652475] Updated weights for policy 0, policy_version 937795 (0.0059) [2024-06-15 23:33:27,530][1652475] Updated weights for policy 0, policy_version 937879 (0.0014) [2024-06-15 23:33:28,315][1652475] Updated weights for policy 0, policy_version 937941 (0.0011) [2024-06-15 23:33:29,263][1652475] Updated weights for policy 0, policy_version 938005 (0.0016) [2024-06-15 23:33:29,716][1652475] Updated weights for policy 0, policy_version 938047 (0.0011) [2024-06-15 23:33:30,738][1648984] Fps is (10 sec: 95027.2, 60 sec: 89565.7, 300 sec: 88862.3). Total num frames: 1921122304. Throughput: 0: 21584.0. Samples: 480335872. 
Policy #0 lag: (min: 18.0, avg: 140.0, max: 274.0) [2024-06-15 23:33:30,739][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 23:33:32,689][1652475] Updated weights for policy 0, policy_version 938097 (0.0019) [2024-06-15 23:33:33,372][1652475] Updated weights for policy 0, policy_version 938160 (0.0014) [2024-06-15 23:33:34,382][1651340] Signal inference workers to stop experience collection... (48300 times) [2024-06-15 23:33:34,420][1652475] InferenceWorker_p0-w0: stopping experience collection (48300 times) [2024-06-15 23:33:34,529][1651340] Signal inference workers to resume experience collection... (48300 times) [2024-06-15 23:33:34,530][1652475] InferenceWorker_p0-w0: resuming experience collection (48300 times) [2024-06-15 23:33:34,531][1652475] Updated weights for policy 0, policy_version 938208 (0.0011) [2024-06-15 23:33:34,919][1652475] Updated weights for policy 0, policy_version 938239 (0.0012) [2024-06-15 23:33:35,742][1648984] Fps is (10 sec: 98259.6, 60 sec: 88466.8, 300 sec: 88638.9). Total num frames: 1921581056. Throughput: 0: 21854.5. Samples: 480407040. Policy #0 lag: (min: 18.0, avg: 140.0, max: 274.0) [2024-06-15 23:33:35,743][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 23:33:36,036][1652475] Updated weights for policy 0, policy_version 938302 (0.0010) [2024-06-15 23:33:37,854][1652475] Updated weights for policy 0, policy_version 938339 (0.0011) [2024-06-15 23:33:38,701][1652475] Updated weights for policy 0, policy_version 938403 (0.0011) [2024-06-15 23:33:40,737][1648984] Fps is (10 sec: 78644.3, 60 sec: 86835.3, 300 sec: 87973.7). Total num frames: 1921908736. Throughput: 0: 21936.4. Samples: 480535040. 
Policy #0 lag: (min: 18.0, avg: 140.0, max: 274.0) [2024-06-15 23:33:40,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 23:33:41,365][1652475] Updated weights for policy 0, policy_version 938464 (0.0011) [2024-06-15 23:33:42,114][1652475] Updated weights for policy 0, policy_version 938516 (0.0011) [2024-06-15 23:33:43,250][1652475] Updated weights for policy 0, policy_version 938576 (0.0012) [2024-06-15 23:33:43,879][1652475] Updated weights for policy 0, policy_version 938624 (0.0009) [2024-06-15 23:33:44,513][1652475] Updated weights for policy 0, policy_version 938672 (0.0008) [2024-06-15 23:33:45,738][1648984] Fps is (10 sec: 85234.2, 60 sec: 87382.2, 300 sec: 87973.7). Total num frames: 1922433024. Throughput: 0: 22038.7. Samples: 480674304. Policy #0 lag: (min: 18.0, avg: 140.0, max: 274.0) [2024-06-15 23:33:45,739][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 23:33:47,117][1652475] Updated weights for policy 0, policy_version 938736 (0.0010) [2024-06-15 23:33:47,846][1652475] Updated weights for policy 0, policy_version 938786 (0.0072) [2024-06-15 23:33:49,233][1652475] Updated weights for policy 0, policy_version 938838 (0.0012) [2024-06-15 23:33:50,105][1652475] Updated weights for policy 0, policy_version 938896 (0.0010) [2024-06-15 23:33:50,738][1648984] Fps is (10 sec: 104856.9, 60 sec: 89565.8, 300 sec: 88751.3). Total num frames: 1922957312. Throughput: 0: 21822.6. Samples: 480739328. 
Policy #0 lag: (min: 18.0, avg: 140.0, max: 274.0) [2024-06-15 23:33:50,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 23:33:52,494][1652475] Updated weights for policy 0, policy_version 938946 (0.0009) [2024-06-15 23:33:53,077][1652475] Updated weights for policy 0, policy_version 939000 (0.0011) [2024-06-15 23:33:53,718][1652475] Updated weights for policy 0, policy_version 939042 (0.0068) [2024-06-15 23:33:55,627][1652475] Updated weights for policy 0, policy_version 939104 (0.0019) [2024-06-15 23:33:55,738][1648984] Fps is (10 sec: 85196.4, 60 sec: 88473.4, 300 sec: 88640.2). Total num frames: 1923284992. Throughput: 0: 21765.5. Samples: 480866304. Policy #0 lag: (min: 18.0, avg: 140.0, max: 274.0) [2024-06-15 23:33:55,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 23:33:55,971][1652475] Updated weights for policy 0, policy_version 939133 (0.0009) [2024-06-15 23:33:56,810][1651340] Signal inference workers to stop experience collection... (48350 times) [2024-06-15 23:33:56,864][1652475] InferenceWorker_p0-w0: stopping experience collection (48350 times) [2024-06-15 23:33:56,936][1651340] Signal inference workers to resume experience collection... (48350 times) [2024-06-15 23:33:56,937][1652475] InferenceWorker_p0-w0: resuming experience collection (48350 times) [2024-06-15 23:33:57,244][1652475] Updated weights for policy 0, policy_version 939192 (0.0011) [2024-06-15 23:33:58,253][1652475] Updated weights for policy 0, policy_version 939237 (0.0010) [2024-06-15 23:33:59,424][1652475] Updated weights for policy 0, policy_version 939299 (0.0012) [2024-06-15 23:34:00,738][1648984] Fps is (10 sec: 78642.4, 60 sec: 87381.1, 300 sec: 88751.3). Total num frames: 1923743744. Throughput: 0: 21617.8. Samples: 480996864. 
Policy #0 lag: (min: 18.0, avg: 140.0, max: 274.0) [2024-06-15 23:34:00,739][1648984] Avg episode reward: [(0, '-0.410')] [2024-06-15 23:34:02,381][1652475] Updated weights for policy 0, policy_version 939352 (0.0012) [2024-06-15 23:34:03,375][1652475] Updated weights for policy 0, policy_version 939425 (0.0071) [2024-06-15 23:34:04,230][1652475] Updated weights for policy 0, policy_version 939492 (0.0012) [2024-06-15 23:34:05,068][1652475] Updated weights for policy 0, policy_version 939555 (0.0010) [2024-06-15 23:34:05,742][1648984] Fps is (10 sec: 98260.3, 60 sec: 87374.5, 300 sec: 88862.3). Total num frames: 1924268032. Throughput: 0: 21922.8. Samples: 481068032. Policy #0 lag: (min: 18.0, avg: 140.0, max: 274.0) [2024-06-15 23:34:05,743][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 23:34:08,592][1652475] Updated weights for policy 0, policy_version 939589 (0.0012) [2024-06-15 23:34:09,206][1652475] Updated weights for policy 0, policy_version 939641 (0.0011) [2024-06-15 23:34:10,042][1652475] Updated weights for policy 0, policy_version 939684 (0.0011) [2024-06-15 23:34:10,737][1648984] Fps is (10 sec: 81921.7, 60 sec: 86835.2, 300 sec: 88529.2). Total num frames: 1924562944. Throughput: 0: 21868.1. Samples: 481194496. Policy #0 lag: (min: 18.0, avg: 140.0, max: 274.0) [2024-06-15 23:34:10,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 23:34:10,766][1652475] Updated weights for policy 0, policy_version 939744 (0.0012) [2024-06-15 23:34:11,632][1652475] Updated weights for policy 0, policy_version 939801 (0.0089) [2024-06-15 23:34:14,871][1652475] Updated weights for policy 0, policy_version 939856 (0.0013) [2024-06-15 23:34:15,467][1652475] Updated weights for policy 0, policy_version 939897 (0.0023) [2024-06-15 23:34:15,737][1648984] Fps is (10 sec: 68845.4, 60 sec: 86289.3, 300 sec: 88307.0). Total num frames: 1924956160. Throughput: 0: 21993.3. Samples: 481325568. 
Policy #0 lag: (min: 18.0, avg: 140.0, max: 274.0) [2024-06-15 23:34:15,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 23:34:16,068][1652475] Updated weights for policy 0, policy_version 939937 (0.0009) [2024-06-15 23:34:16,559][1652475] Updated weights for policy 0, policy_version 939974 (0.0010) [2024-06-15 23:34:17,233][1652475] Updated weights for policy 0, policy_version 940032 (0.0010) [2024-06-15 23:34:17,556][1651340] Signal inference workers to stop experience collection... (48400 times) [2024-06-15 23:34:17,604][1652475] InferenceWorker_p0-w0: stopping experience collection (48400 times) [2024-06-15 23:34:17,685][1651340] Signal inference workers to resume experience collection... (48400 times) [2024-06-15 23:34:17,687][1652475] InferenceWorker_p0-w0: resuming experience collection (48400 times) [2024-06-15 23:34:18,072][1652475] Updated weights for policy 0, policy_version 940096 (0.0061) [2024-06-15 23:34:20,738][1648984] Fps is (10 sec: 75365.1, 60 sec: 85743.0, 300 sec: 88084.9). Total num frames: 1925316608. Throughput: 0: 21665.4. Samples: 481381888. Policy #0 lag: (min: 18.0, avg: 140.0, max: 274.0) [2024-06-15 23:34:20,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 23:34:21,509][1652475] Updated weights for policy 0, policy_version 940160 (0.0009) [2024-06-15 23:34:22,397][1652475] Updated weights for policy 0, policy_version 940224 (0.0010) [2024-06-15 23:34:23,185][1652475] Updated weights for policy 0, policy_version 940276 (0.0010) [2024-06-15 23:34:23,833][1652475] Updated weights for policy 0, policy_version 940323 (0.0009) [2024-06-15 23:34:25,738][1648984] Fps is (10 sec: 88472.4, 60 sec: 87381.3, 300 sec: 88418.0). Total num frames: 1925840896. Throughput: 0: 21697.4. Samples: 481511424. 
Policy #0 lag: (min: 18.0, avg: 140.0, max: 274.0) [2024-06-15 23:34:25,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 23:34:26,999][1652475] Updated weights for policy 0, policy_version 940384 (0.0011) [2024-06-15 23:34:27,729][1652475] Updated weights for policy 0, policy_version 940434 (0.0010) [2024-06-15 23:34:28,382][1652475] Updated weights for policy 0, policy_version 940485 (0.0009) [2024-06-15 23:34:29,041][1652475] Updated weights for policy 0, policy_version 940533 (0.0010) [2024-06-15 23:34:29,910][1652475] Updated weights for policy 0, policy_version 940603 (0.0079) [2024-06-15 23:34:30,738][1648984] Fps is (10 sec: 104855.8, 60 sec: 87381.1, 300 sec: 88862.6). Total num frames: 1926365184. Throughput: 0: 21617.7. Samples: 481647104. Policy #0 lag: (min: 123.0, avg: 228.6, max: 399.0) [2024-06-15 23:34:30,739][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 23:34:32,853][1652475] Updated weights for policy 0, policy_version 940656 (0.0010) [2024-06-15 23:34:33,802][1652475] Updated weights for policy 0, policy_version 940705 (0.0011) [2024-06-15 23:34:34,574][1652475] Updated weights for policy 0, policy_version 940768 (0.0020) [2024-06-15 23:34:35,419][1652475] Updated weights for policy 0, policy_version 940832 (0.0012) [2024-06-15 23:34:35,738][1648984] Fps is (10 sec: 101581.4, 60 sec: 87934.2, 300 sec: 88751.3). Total num frames: 1926856704. Throughput: 0: 21845.4. Samples: 481722368. Policy #0 lag: (min: 123.0, avg: 228.6, max: 399.0) [2024-06-15 23:34:35,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 23:34:38,179][1652475] Updated weights for policy 0, policy_version 940883 (0.0011) [2024-06-15 23:34:38,650][1652475] Updated weights for policy 0, policy_version 940926 (0.0010) [2024-06-15 23:34:39,342][1651340] Signal inference workers to stop experience collection... 
(48450 times) [2024-06-15 23:34:39,381][1652475] InferenceWorker_p0-w0: stopping experience collection (48450 times) [2024-06-15 23:34:39,463][1651340] Signal inference workers to resume experience collection... (48450 times) [2024-06-15 23:34:39,470][1652475] InferenceWorker_p0-w0: resuming experience collection (48450 times) [2024-06-15 23:34:39,704][1652475] Updated weights for policy 0, policy_version 940976 (0.0010) [2024-06-15 23:34:40,738][1648984] Fps is (10 sec: 78645.3, 60 sec: 87381.3, 300 sec: 87973.7). Total num frames: 1927151616. Throughput: 0: 21890.9. Samples: 481851392. Policy #0 lag: (min: 123.0, avg: 228.6, max: 399.0) [2024-06-15 23:34:40,738][1648984] Avg episode reward: [(0, '-0.180')] [2024-06-15 23:34:40,923][1652475] Updated weights for policy 0, policy_version 941011 (0.0009) [2024-06-15 23:34:41,574][1652475] Updated weights for policy 0, policy_version 941061 (0.0008) [2024-06-15 23:34:42,203][1652475] Updated weights for policy 0, policy_version 941117 (0.0010) [2024-06-15 23:34:44,125][1652475] Updated weights for policy 0, policy_version 941181 (0.0012) [2024-06-15 23:34:45,389][1652475] Updated weights for policy 0, policy_version 941232 (0.0011) [2024-06-15 23:34:45,738][1648984] Fps is (10 sec: 81917.6, 60 sec: 87381.2, 300 sec: 88418.1). Total num frames: 1927675904. Throughput: 0: 21720.1. Samples: 481974272. Policy #0 lag: (min: 123.0, avg: 228.6, max: 399.0) [2024-06-15 23:34:45,739][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 23:34:47,509][1652475] Updated weights for policy 0, policy_version 941273 (0.0009) [2024-06-15 23:34:48,324][1652475] Updated weights for policy 0, policy_version 941334 (0.0009) [2024-06-15 23:34:49,445][1652475] Updated weights for policy 0, policy_version 941393 (0.0010) [2024-06-15 23:34:50,635][1652475] Updated weights for policy 0, policy_version 941458 (0.0012) [2024-06-15 23:34:50,737][1648984] Fps is (10 sec: 95027.5, 60 sec: 85743.0, 300 sec: 88307.0). 
Total num frames: 1928101888. Throughput: 0: 21631.4. Samples: 482041344. Policy #0 lag: (min: 123.0, avg: 228.6, max: 399.0) [2024-06-15 23:34:50,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 23:34:51,137][1652475] Updated weights for policy 0, policy_version 941503 (0.0010) [2024-06-15 23:34:54,018][1652475] Updated weights for policy 0, policy_version 941555 (0.0071) [2024-06-15 23:34:54,668][1652475] Updated weights for policy 0, policy_version 941603 (0.0009) [2024-06-15 23:34:55,548][1652475] Updated weights for policy 0, policy_version 941665 (0.0010) [2024-06-15 23:34:55,742][1648984] Fps is (10 sec: 88437.5, 60 sec: 87921.4, 300 sec: 88639.0). Total num frames: 1928560640. Throughput: 0: 21945.6. Samples: 482182144. Policy #0 lag: (min: 123.0, avg: 228.6, max: 399.0) [2024-06-15 23:34:55,742][1648984] Avg episode reward: [(0, '-0.250')] [2024-06-15 23:34:55,899][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000941696_1928593408.pth... [2024-06-15 23:34:56,017][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000931376_1907458048.pth [2024-06-15 23:34:56,613][1652475] Updated weights for policy 0, policy_version 941744 (0.0011) [2024-06-15 23:34:59,594][1652475] Updated weights for policy 0, policy_version 941792 (0.0010) [2024-06-15 23:35:00,509][1652475] Updated weights for policy 0, policy_version 941856 (0.0077) [2024-06-15 23:35:00,738][1648984] Fps is (10 sec: 85196.5, 60 sec: 86835.4, 300 sec: 88751.3). Total num frames: 1928953856. Throughput: 0: 21845.3. Samples: 482308608. Policy #0 lag: (min: 123.0, avg: 228.6, max: 399.0) [2024-06-15 23:35:00,740][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 23:35:01,557][1651340] Signal inference workers to stop experience collection... 
(48500 times) [2024-06-15 23:35:01,575][1652475] InferenceWorker_p0-w0: stopping experience collection (48500 times) [2024-06-15 23:35:01,682][1651340] Signal inference workers to resume experience collection... (48500 times) [2024-06-15 23:35:01,683][1652475] InferenceWorker_p0-w0: resuming experience collection (48500 times) [2024-06-15 23:35:01,784][1652475] Updated weights for policy 0, policy_version 941906 (0.0010) [2024-06-15 23:35:02,406][1652475] Updated weights for policy 0, policy_version 941953 (0.0010) [2024-06-15 23:35:03,070][1652475] Updated weights for policy 0, policy_version 942013 (0.0009) [2024-06-15 23:35:05,476][1652475] Updated weights for policy 0, policy_version 942064 (0.0011) [2024-06-15 23:35:05,738][1648984] Fps is (10 sec: 81955.5, 60 sec: 85203.4, 300 sec: 88420.1). Total num frames: 1929379840. Throughput: 0: 21890.9. Samples: 482366976. Policy #0 lag: (min: 123.0, avg: 228.6, max: 399.0) [2024-06-15 23:35:05,738][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 23:35:06,133][1652475] Updated weights for policy 0, policy_version 942112 (0.0010) [2024-06-15 23:35:06,513][1652475] Updated weights for policy 0, policy_version 942143 (0.0009) [2024-06-15 23:35:08,457][1652475] Updated weights for policy 0, policy_version 942204 (0.0011) [2024-06-15 23:35:09,147][1652475] Updated weights for policy 0, policy_version 942245 (0.0014) [2024-06-15 23:35:10,738][1648984] Fps is (10 sec: 85190.9, 60 sec: 87380.2, 300 sec: 88084.6). Total num frames: 1929805824. Throughput: 0: 21981.5. Samples: 482500608. 
Policy #0 lag: (min: 123.0, avg: 228.6, max: 399.0) [2024-06-15 23:35:10,739][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 23:35:10,795][1652475] Updated weights for policy 0, policy_version 942304 (0.0010) [2024-06-15 23:35:11,811][1652475] Updated weights for policy 0, policy_version 942384 (0.0010) [2024-06-15 23:35:15,639][1652475] Updated weights for policy 0, policy_version 942468 (0.0010) [2024-06-15 23:35:15,738][1648984] Fps is (10 sec: 81918.8, 60 sec: 87381.0, 300 sec: 87973.7). Total num frames: 1930199040. Throughput: 0: 21970.5. Samples: 482635776. Policy #0 lag: (min: 123.0, avg: 228.6, max: 399.0) [2024-06-15 23:35:15,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 23:35:16,266][1652475] Updated weights for policy 0, policy_version 942514 (0.0010) [2024-06-15 23:35:16,883][1652475] Updated weights for policy 0, policy_version 942562 (0.0011) [2024-06-15 23:35:17,744][1652475] Updated weights for policy 0, policy_version 942627 (0.0010) [2024-06-15 23:35:20,737][1648984] Fps is (10 sec: 75372.4, 60 sec: 87381.6, 300 sec: 87529.5). Total num frames: 1930559488. Throughput: 0: 21390.3. Samples: 482684928. Policy #0 lag: (min: 123.0, avg: 228.6, max: 399.0) [2024-06-15 23:35:20,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 23:35:21,428][1652475] Updated weights for policy 0, policy_version 942692 (0.0009) [2024-06-15 23:35:22,060][1652475] Updated weights for policy 0, policy_version 942743 (0.0010) [2024-06-15 23:35:22,698][1651340] Signal inference workers to stop experience collection... (48550 times) [2024-06-15 23:35:22,756][1652475] InferenceWorker_p0-w0: stopping experience collection (48550 times) [2024-06-15 23:35:22,836][1651340] Signal inference workers to resume experience collection... 
(48550 times) [2024-06-15 23:35:22,837][1652475] InferenceWorker_p0-w0: resuming experience collection (48550 times) [2024-06-15 23:35:22,940][1652475] Updated weights for policy 0, policy_version 942802 (0.0010) [2024-06-15 23:35:23,622][1652475] Updated weights for policy 0, policy_version 942851 (0.0009) [2024-06-15 23:35:24,240][1652475] Updated weights for policy 0, policy_version 942899 (0.0009) [2024-06-15 23:35:25,737][1648984] Fps is (10 sec: 88475.3, 60 sec: 87381.4, 300 sec: 87973.8). Total num frames: 1931083776. Throughput: 0: 21629.2. Samples: 482824704. Policy #0 lag: (min: 123.0, avg: 228.6, max: 399.0) [2024-06-15 23:35:25,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 23:35:26,807][1652475] Updated weights for policy 0, policy_version 942944 (0.0012) [2024-06-15 23:35:27,802][1652475] Updated weights for policy 0, policy_version 943010 (0.0009) [2024-06-15 23:35:28,653][1652475] Updated weights for policy 0, policy_version 943074 (0.0010) [2024-06-15 23:35:29,600][1652475] Updated weights for policy 0, policy_version 943140 (0.0010) [2024-06-15 23:35:30,738][1648984] Fps is (10 sec: 104856.9, 60 sec: 87381.8, 300 sec: 88418.1). Total num frames: 1931608064. Throughput: 0: 21708.9. Samples: 482951168. Policy #0 lag: (min: 123.0, avg: 228.6, max: 399.0) [2024-06-15 23:35:30,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 23:35:32,787][1652475] Updated weights for policy 0, policy_version 943201 (0.0012) [2024-06-15 23:35:33,547][1652475] Updated weights for policy 0, policy_version 943254 (0.0009) [2024-06-15 23:35:34,489][1652475] Updated weights for policy 0, policy_version 943328 (0.0011) [2024-06-15 23:35:35,231][1652475] Updated weights for policy 0, policy_version 943377 (0.0014) [2024-06-15 23:35:35,738][1648984] Fps is (10 sec: 104857.1, 60 sec: 87927.4, 300 sec: 88862.4). Total num frames: 1932132352. Throughput: 0: 22050.1. Samples: 483033600. 
Policy #0 lag: (min: 123.0, avg: 228.6, max: 399.0) [2024-06-15 23:35:35,738][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 23:35:38,123][1652475] Updated weights for policy 0, policy_version 943441 (0.0008) [2024-06-15 23:35:39,327][1652475] Updated weights for policy 0, policy_version 943491 (0.0011) [2024-06-15 23:35:40,210][1652475] Updated weights for policy 0, policy_version 943559 (0.0009) [2024-06-15 23:35:40,738][1648984] Fps is (10 sec: 88472.1, 60 sec: 89019.5, 300 sec: 88307.0). Total num frames: 1932492800. Throughput: 0: 21870.1. Samples: 483166208. Policy #0 lag: (min: 123.0, avg: 228.6, max: 399.0) [2024-06-15 23:35:40,739][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 23:35:41,012][1652475] Updated weights for policy 0, policy_version 943617 (0.0010) [2024-06-15 23:35:41,679][1652475] Updated weights for policy 0, policy_version 943679 (0.0011) [2024-06-15 23:35:43,602][1651340] Signal inference workers to stop experience collection... (48600 times) [2024-06-15 23:35:43,636][1652475] InferenceWorker_p0-w0: stopping experience collection (48600 times) [2024-06-15 23:35:43,728][1651340] Signal inference workers to resume experience collection... (48600 times) [2024-06-15 23:35:43,728][1652475] InferenceWorker_p0-w0: resuming experience collection (48600 times) [2024-06-15 23:35:44,287][1652475] Updated weights for policy 0, policy_version 943739 (0.0011) [2024-06-15 23:35:45,737][1648984] Fps is (10 sec: 72090.1, 60 sec: 86289.5, 300 sec: 87751.6). Total num frames: 1932853248. Throughput: 0: 21913.6. Samples: 483294720. 
Policy #0 lag: (min: 123.0, avg: 228.6, max: 399.0) [2024-06-15 23:35:45,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 23:35:45,861][1652475] Updated weights for policy 0, policy_version 943796 (0.0071) [2024-06-15 23:35:46,448][1652475] Updated weights for policy 0, policy_version 943840 (0.0012) [2024-06-15 23:35:47,615][1652475] Updated weights for policy 0, policy_version 943888 (0.0012) [2024-06-15 23:35:49,190][1652475] Updated weights for policy 0, policy_version 943959 (0.0010) [2024-06-15 23:35:50,738][1648984] Fps is (10 sec: 81919.3, 60 sec: 86834.8, 300 sec: 87862.6). Total num frames: 1933312000. Throughput: 0: 21868.0. Samples: 483351040. Policy #0 lag: (min: 123.0, avg: 228.6, max: 399.0) [2024-06-15 23:35:50,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 23:35:50,746][1652475] Updated weights for policy 0, policy_version 944003 (0.0010) [2024-06-15 23:35:51,442][1652475] Updated weights for policy 0, policy_version 944060 (0.0010) [2024-06-15 23:35:53,518][1652475] Updated weights for policy 0, policy_version 944127 (0.0011) [2024-06-15 23:35:54,719][1652475] Updated weights for policy 0, policy_version 944176 (0.0010) [2024-06-15 23:35:55,639][1652475] Updated weights for policy 0, policy_version 944249 (0.0010) [2024-06-15 23:35:55,738][1648984] Fps is (10 sec: 98303.7, 60 sec: 87933.9, 300 sec: 88307.0). Total num frames: 1933836288. Throughput: 0: 21970.8. Samples: 483489280. Policy #0 lag: (min: 123.0, avg: 228.6, max: 399.0) [2024-06-15 23:35:55,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 23:35:57,055][1652475] Updated weights for policy 0, policy_version 944318 (0.0011) [2024-06-15 23:36:00,012][1652475] Updated weights for policy 0, policy_version 944357 (0.0025) [2024-06-15 23:36:00,637][1652475] Updated weights for policy 0, policy_version 944401 (0.0010) [2024-06-15 23:36:00,737][1648984] Fps is (10 sec: 81922.3, 60 sec: 86289.2, 300 sec: 87862.9). Total num frames: 1934131200. 
Throughput: 0: 21947.8. Samples: 483623424. Policy #0 lag: (min: 123.0, avg: 228.6, max: 399.0) [2024-06-15 23:36:00,738][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 23:36:01,361][1652475] Updated weights for policy 0, policy_version 944464 (0.0009) [2024-06-15 23:36:02,446][1652475] Updated weights for policy 0, policy_version 944544 (0.0011) [2024-06-15 23:36:05,369][1652475] Updated weights for policy 0, policy_version 944579 (0.0011) [2024-06-15 23:36:05,738][1648984] Fps is (10 sec: 72089.7, 60 sec: 86289.1, 300 sec: 87973.8). Total num frames: 1934557184. Throughput: 0: 22163.9. Samples: 483682304. Policy #0 lag: (min: 123.0, avg: 228.6, max: 399.0) [2024-06-15 23:36:05,738][1648984] Avg episode reward: [(0, '-0.170')] [2024-06-15 23:36:05,972][1652475] Updated weights for policy 0, policy_version 944626 (0.0010) [2024-06-15 23:36:06,164][1651340] Signal inference workers to stop experience collection... (48650 times) [2024-06-15 23:36:06,191][1652475] InferenceWorker_p0-w0: stopping experience collection (48650 times) [2024-06-15 23:36:06,293][1651340] Signal inference workers to resume experience collection... (48650 times) [2024-06-15 23:36:06,295][1652475] InferenceWorker_p0-w0: resuming experience collection (48650 times) [2024-06-15 23:36:06,623][1652475] Updated weights for policy 0, policy_version 944675 (0.0011) [2024-06-15 23:36:07,549][1652475] Updated weights for policy 0, policy_version 944752 (0.0062) [2024-06-15 23:36:08,214][1652475] Updated weights for policy 0, policy_version 944800 (0.0011) [2024-06-15 23:36:10,738][1648984] Fps is (10 sec: 88472.5, 60 sec: 86836.1, 300 sec: 87973.7). Total num frames: 1935015936. Throughput: 0: 22095.6. Samples: 483819008. 
Policy #0 lag: (min: 123.0, avg: 228.6, max: 399.0) [2024-06-15 23:36:10,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 23:36:10,814][1652475] Updated weights for policy 0, policy_version 944848 (0.0012) [2024-06-15 23:36:11,651][1652475] Updated weights for policy 0, policy_version 944912 (0.0008) [2024-06-15 23:36:12,252][1652475] Updated weights for policy 0, policy_version 944956 (0.0010) [2024-06-15 23:36:13,369][1652475] Updated weights for policy 0, policy_version 945018 (0.0011) [2024-06-15 23:36:14,127][1652475] Updated weights for policy 0, policy_version 945058 (0.0010) [2024-06-15 23:36:15,737][1648984] Fps is (10 sec: 98304.4, 60 sec: 89020.1, 300 sec: 87973.8). Total num frames: 1935540224. Throughput: 0: 22220.8. Samples: 483951104. Policy #0 lag: (min: 123.0, avg: 228.6, max: 399.0) [2024-06-15 23:36:15,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 23:36:16,279][1652475] Updated weights for policy 0, policy_version 945094 (0.0010) [2024-06-15 23:36:17,008][1652475] Updated weights for policy 0, policy_version 945152 (0.0074) [2024-06-15 23:36:17,863][1652475] Updated weights for policy 0, policy_version 945216 (0.0011) [2024-06-15 23:36:20,544][1652475] Updated weights for policy 0, policy_version 945278 (0.0010) [2024-06-15 23:36:20,738][1648984] Fps is (10 sec: 91750.4, 60 sec: 89565.6, 300 sec: 87973.7). Total num frames: 1935933440. Throughput: 0: 21765.7. Samples: 484013056. 
Policy #0 lag: (min: 123.0, avg: 228.6, max: 399.0) [2024-06-15 23:36:20,739][1648984] Avg episode reward: [(0, '-0.230')] [2024-06-15 23:36:21,287][1652475] Updated weights for policy 0, policy_version 945335 (0.0012) [2024-06-15 23:36:22,241][1652475] Updated weights for policy 0, policy_version 945376 (0.0008) [2024-06-15 23:36:22,871][1652475] Updated weights for policy 0, policy_version 945424 (0.0010) [2024-06-15 23:36:23,463][1652475] Updated weights for policy 0, policy_version 945471 (0.0018) [2024-06-15 23:36:25,738][1648984] Fps is (10 sec: 81919.6, 60 sec: 87927.4, 300 sec: 87751.6). Total num frames: 1936359424. Throughput: 0: 21777.1. Samples: 484146176. Policy #0 lag: (min: 123.0, avg: 228.6, max: 399.0) [2024-06-15 23:36:25,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 23:36:26,073][1652475] Updated weights for policy 0, policy_version 945532 (0.0011) [2024-06-15 23:36:27,797][1652475] Updated weights for policy 0, policy_version 945584 (0.0011) [2024-06-15 23:36:27,892][1651340] Signal inference workers to stop experience collection... (48700 times) [2024-06-15 23:36:27,915][1652475] InferenceWorker_p0-w0: stopping experience collection (48700 times) [2024-06-15 23:36:28,025][1651340] Signal inference workers to resume experience collection... (48700 times) [2024-06-15 23:36:28,026][1652475] InferenceWorker_p0-w0: resuming experience collection (48700 times) [2024-06-15 23:36:28,446][1652475] Updated weights for policy 0, policy_version 945632 (0.0008) [2024-06-15 23:36:29,011][1652475] Updated weights for policy 0, policy_version 945670 (0.0007) [2024-06-15 23:36:29,611][1652475] Updated weights for policy 0, policy_version 945721 (0.0007) [2024-06-15 23:36:30,738][1648984] Fps is (10 sec: 95027.7, 60 sec: 87927.4, 300 sec: 88084.8). Total num frames: 1936883712. Throughput: 0: 22209.4. Samples: 484294144. 
Policy #0 lag: (min: 47.0, avg: 128.2, max: 303.0) [2024-06-15 23:36:30,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 23:36:30,979][1652475] Updated weights for policy 0, policy_version 945762 (0.0007) [2024-06-15 23:36:32,257][1652475] Updated weights for policy 0, policy_version 945811 (0.0007) [2024-06-15 23:36:32,865][1652475] Updated weights for policy 0, policy_version 945860 (0.0007) [2024-06-15 23:36:33,549][1652475] Updated weights for policy 0, policy_version 945920 (0.0009) [2024-06-15 23:36:34,210][1652475] Updated weights for policy 0, policy_version 945977 (0.0009) [2024-06-15 23:36:35,672][1652475] Updated weights for policy 0, policy_version 946016 (0.0009) [2024-06-15 23:36:35,738][1648984] Fps is (10 sec: 108133.6, 60 sec: 88473.5, 300 sec: 88529.1). Total num frames: 1937440768. Throughput: 0: 22721.5. Samples: 484373504. Policy #0 lag: (min: 47.0, avg: 128.2, max: 303.0) [2024-06-15 23:36:35,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 23:36:36,920][1652475] Updated weights for policy 0, policy_version 946064 (0.0015) [2024-06-15 23:36:37,642][1652475] Updated weights for policy 0, policy_version 946113 (0.0008) [2024-06-15 23:36:38,278][1652475] Updated weights for policy 0, policy_version 946161 (0.0012) [2024-06-15 23:36:39,063][1652475] Updated weights for policy 0, policy_version 946224 (0.0009) [2024-06-15 23:36:40,637][1652475] Updated weights for policy 0, policy_version 946272 (0.0008) [2024-06-15 23:36:40,738][1648984] Fps is (10 sec: 108133.4, 60 sec: 91204.3, 300 sec: 88640.2). Total num frames: 1937965056. Throughput: 0: 23074.1. Samples: 484527616. 
Policy #0 lag: (min: 47.0, avg: 128.2, max: 303.0) [2024-06-15 23:36:40,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 23:36:42,113][1652475] Updated weights for policy 0, policy_version 946322 (0.0010) [2024-06-15 23:36:42,637][1652475] Updated weights for policy 0, policy_version 946368 (0.0009) [2024-06-15 23:36:43,518][1652475] Updated weights for policy 0, policy_version 946421 (0.0008) [2024-06-15 23:36:44,216][1652475] Updated weights for policy 0, policy_version 946480 (0.0009) [2024-06-15 23:36:45,329][1651340] Signal inference workers to stop experience collection... (48750 times) [2024-06-15 23:36:45,371][1652475] InferenceWorker_p0-w0: stopping experience collection (48750 times) [2024-06-15 23:36:45,457][1651340] Signal inference workers to resume experience collection... (48750 times) [2024-06-15 23:36:45,458][1652475] InferenceWorker_p0-w0: resuming experience collection (48750 times) [2024-06-15 23:36:45,519][1652475] Updated weights for policy 0, policy_version 946512 (0.0070) [2024-06-15 23:36:45,738][1648984] Fps is (10 sec: 104858.5, 60 sec: 93934.9, 300 sec: 89084.5). Total num frames: 1938489344. Throughput: 0: 23745.4. Samples: 484691968. Policy #0 lag: (min: 47.0, avg: 128.2, max: 303.0) [2024-06-15 23:36:45,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 23:36:46,004][1652475] Updated weights for policy 0, policy_version 946559 (0.0009) [2024-06-15 23:36:47,668][1652475] Updated weights for policy 0, policy_version 946598 (0.0010) [2024-06-15 23:36:48,362][1652475] Updated weights for policy 0, policy_version 946656 (0.0009) [2024-06-15 23:36:49,063][1652475] Updated weights for policy 0, policy_version 946708 (0.0009) [2024-06-15 23:36:50,056][1652475] Updated weights for policy 0, policy_version 946754 (0.0010) [2024-06-15 23:36:50,660][1652475] Updated weights for policy 0, policy_version 946812 (0.0011) [2024-06-15 23:36:50,738][1648984] Fps is (10 sec: 111412.1, 60 sec: 96119.8, 300 sec: 89306.7). 
Total num frames: 1939079168. Throughput: 0: 24177.8. Samples: 484770304. Policy #0 lag: (min: 47.0, avg: 128.2, max: 303.0) [2024-06-15 23:36:50,762][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 23:36:52,673][1652475] Updated weights for policy 0, policy_version 946873 (0.0009) [2024-06-15 23:36:53,808][1652475] Updated weights for policy 0, policy_version 946928 (0.0009) [2024-06-15 23:36:54,438][1652475] Updated weights for policy 0, policy_version 946963 (0.0009) [2024-06-15 23:36:55,038][1652475] Updated weights for policy 0, policy_version 947010 (0.0012) [2024-06-15 23:36:55,601][1652475] Updated weights for policy 0, policy_version 947062 (0.0010) [2024-06-15 23:36:55,738][1648984] Fps is (10 sec: 111409.7, 60 sec: 96119.3, 300 sec: 89306.6). Total num frames: 1939603456. Throughput: 0: 24450.8. Samples: 484919296. Policy #0 lag: (min: 47.0, avg: 128.2, max: 303.0) [2024-06-15 23:36:55,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 23:36:55,742][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000947072_1939603456.pth... [2024-06-15 23:36:55,781][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000936512_1917976576.pth [2024-06-15 23:36:57,079][1652475] Updated weights for policy 0, policy_version 947108 (0.0010) [2024-06-15 23:36:58,560][1652475] Updated weights for policy 0, policy_version 947152 (0.0009) [2024-06-15 23:36:59,070][1652475] Updated weights for policy 0, policy_version 947198 (0.0011) [2024-06-15 23:36:59,797][1652475] Updated weights for policy 0, policy_version 947248 (0.0009) [2024-06-15 23:37:00,463][1652475] Updated weights for policy 0, policy_version 947296 (0.0009) [2024-06-15 23:37:00,738][1648984] Fps is (10 sec: 101578.1, 60 sec: 99395.7, 300 sec: 89639.8). Total num frames: 1940094976. Throughput: 0: 25087.8. Samples: 485080064. 
Policy #0 lag: (min: 47.0, avg: 128.2, max: 303.0) [2024-06-15 23:37:00,739][1648984] Avg episode reward: [(0, '-0.210')] [2024-06-15 23:37:01,612][1652475] Updated weights for policy 0, policy_version 947346 (0.0009) [2024-06-15 23:37:03,417][1652475] Updated weights for policy 0, policy_version 947393 (0.0009) [2024-06-15 23:37:04,025][1652475] Updated weights for policy 0, policy_version 947452 (0.0009) [2024-06-15 23:37:04,641][1651340] Signal inference workers to stop experience collection... (48800 times) [2024-06-15 23:37:04,690][1652475] InferenceWorker_p0-w0: stopping experience collection (48800 times) [2024-06-15 23:37:04,790][1651340] Signal inference workers to resume experience collection... (48800 times) [2024-06-15 23:37:04,791][1652475] InferenceWorker_p0-w0: resuming experience collection (48800 times) [2024-06-15 23:37:05,003][1652475] Updated weights for policy 0, policy_version 947504 (0.0008) [2024-06-15 23:37:05,710][1652475] Updated weights for policy 0, policy_version 947554 (0.0009) [2024-06-15 23:37:05,738][1648984] Fps is (10 sec: 98304.3, 60 sec: 100488.4, 300 sec: 89862.1). Total num frames: 1940586496. Throughput: 0: 25452.1. Samples: 485158400. Policy #0 lag: (min: 47.0, avg: 128.2, max: 303.0) [2024-06-15 23:37:05,739][1648984] Avg episode reward: [(0, '-0.210')] [2024-06-15 23:37:06,484][1652475] Updated weights for policy 0, policy_version 947600 (0.0010) [2024-06-15 23:37:08,945][1652475] Updated weights for policy 0, policy_version 947649 (0.0011) [2024-06-15 23:37:09,559][1652475] Updated weights for policy 0, policy_version 947698 (0.0010) [2024-06-15 23:37:10,162][1652475] Updated weights for policy 0, policy_version 947747 (0.0007) [2024-06-15 23:37:10,738][1648984] Fps is (10 sec: 98306.6, 60 sec: 101034.8, 300 sec: 90084.2). Total num frames: 1941078016. Throughput: 0: 26009.6. Samples: 485316608. 
Policy #0 lag: (min: 47.0, avg: 128.2, max: 303.0) [2024-06-15 23:37:10,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 23:37:10,849][1652475] Updated weights for policy 0, policy_version 947808 (0.0010) [2024-06-15 23:37:11,500][1652475] Updated weights for policy 0, policy_version 947856 (0.0071) [2024-06-15 23:37:12,040][1652475] Updated weights for policy 0, policy_version 947904 (0.0010) [2024-06-15 23:37:14,503][1652475] Updated weights for policy 0, policy_version 947955 (0.0009) [2024-06-15 23:37:15,223][1652475] Updated weights for policy 0, policy_version 948016 (0.0010) [2024-06-15 23:37:15,738][1648984] Fps is (10 sec: 98304.0, 60 sec: 100488.3, 300 sec: 90528.5). Total num frames: 1941569536. Throughput: 0: 25804.8. Samples: 485455360. Policy #0 lag: (min: 47.0, avg: 128.2, max: 303.0) [2024-06-15 23:37:15,739][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 23:37:16,422][1652475] Updated weights for policy 0, policy_version 948035 (0.0010) [2024-06-15 23:37:17,031][1652475] Updated weights for policy 0, policy_version 948087 (0.0009) [2024-06-15 23:37:17,685][1652475] Updated weights for policy 0, policy_version 948145 (0.0009) [2024-06-15 23:37:18,689][1652475] Updated weights for policy 0, policy_version 948181 (0.0011) [2024-06-15 23:37:19,275][1652475] Updated weights for policy 0, policy_version 948229 (0.0053) [2024-06-15 23:37:20,738][1648984] Fps is (10 sec: 101580.7, 60 sec: 102673.1, 300 sec: 90639.6). Total num frames: 1942093824. Throughput: 0: 25782.1. Samples: 485533696. 
Policy #0 lag: (min: 47.0, avg: 128.2, max: 303.0) [2024-06-15 23:37:20,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 23:37:21,866][1652475] Updated weights for policy 0, policy_version 948289 (0.0010) [2024-06-15 23:37:22,503][1652475] Updated weights for policy 0, policy_version 948345 (0.0008) [2024-06-15 23:37:23,155][1652475] Updated weights for policy 0, policy_version 948374 (0.0011) [2024-06-15 23:37:23,274][1651340] Signal inference workers to stop experience collection... (48850 times) [2024-06-15 23:37:23,324][1652475] InferenceWorker_p0-w0: stopping experience collection (48850 times) [2024-06-15 23:37:23,416][1651340] Signal inference workers to resume experience collection... (48850 times) [2024-06-15 23:37:23,417][1652475] InferenceWorker_p0-w0: resuming experience collection (48850 times) [2024-06-15 23:37:23,703][1652475] Updated weights for policy 0, policy_version 948419 (0.0009) [2024-06-15 23:37:24,484][1652475] Updated weights for policy 0, policy_version 948481 (0.0009) [2024-06-15 23:37:25,204][1652475] Updated weights for policy 0, policy_version 948538 (0.0010) [2024-06-15 23:37:25,738][1648984] Fps is (10 sec: 104858.5, 60 sec: 104311.4, 300 sec: 91083.9). Total num frames: 1942618112. Throughput: 0: 25782.1. Samples: 485687808. Policy #0 lag: (min: 47.0, avg: 128.2, max: 303.0) [2024-06-15 23:37:25,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 23:37:27,249][1652475] Updated weights for policy 0, policy_version 948576 (0.0011) [2024-06-15 23:37:28,061][1652475] Updated weights for policy 0, policy_version 948626 (0.0009) [2024-06-15 23:37:28,636][1652475] Updated weights for policy 0, policy_version 948674 (0.0008) [2024-06-15 23:37:29,288][1652475] Updated weights for policy 0, policy_version 948726 (0.0008) [2024-06-15 23:37:30,014][1652475] Updated weights for policy 0, policy_version 948787 (0.0010) [2024-06-15 23:37:30,738][1648984] Fps is (10 sec: 104856.9, 60 sec: 104311.3, 300 sec: 91083.9). 
Total num frames: 1943142400. Throughput: 0: 25417.9. Samples: 485835776. Policy #0 lag: (min: 47.0, avg: 128.2, max: 303.0) [2024-06-15 23:37:30,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 23:37:32,140][1652475] Updated weights for policy 0, policy_version 948836 (0.0009) [2024-06-15 23:37:32,849][1652475] Updated weights for policy 0, policy_version 948885 (0.0010) [2024-06-15 23:37:33,458][1652475] Updated weights for policy 0, policy_version 948933 (0.0018) [2024-06-15 23:37:34,309][1652475] Updated weights for policy 0, policy_version 949008 (0.0008) [2024-06-15 23:37:34,870][1652475] Updated weights for policy 0, policy_version 949056 (0.0009) [2024-06-15 23:37:35,737][1648984] Fps is (10 sec: 104857.9, 60 sec: 103765.5, 300 sec: 91417.2). Total num frames: 1943666688. Throughput: 0: 25736.6. Samples: 485928448. Policy #0 lag: (min: 47.0, avg: 128.2, max: 303.0) [2024-06-15 23:37:35,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 23:37:37,561][1652475] Updated weights for policy 0, policy_version 949113 (0.0010) [2024-06-15 23:37:38,200][1652475] Updated weights for policy 0, policy_version 949157 (0.0010) [2024-06-15 23:37:38,862][1652475] Updated weights for policy 0, policy_version 949216 (0.0008) [2024-06-15 23:37:39,572][1651340] Signal inference workers to stop experience collection... (48900 times) [2024-06-15 23:37:39,582][1652475] Updated weights for policy 0, policy_version 949265 (0.0008) [2024-06-15 23:37:39,593][1652475] InferenceWorker_p0-w0: stopping experience collection (48900 times) [2024-06-15 23:37:39,695][1651340] Signal inference workers to resume experience collection... (48900 times) [2024-06-15 23:37:39,696][1652475] InferenceWorker_p0-w0: resuming experience collection (48900 times) [2024-06-15 23:37:40,030][1652475] Updated weights for policy 0, policy_version 949309 (0.0010) [2024-06-15 23:37:40,738][1648984] Fps is (10 sec: 104857.3, 60 sec: 103765.3, 300 sec: 91528.4). Total num frames: 1944190976. 
Throughput: 0: 25782.0. Samples: 486079488. Policy #0 lag: (min: 47.0, avg: 128.2, max: 303.0) [2024-06-15 23:37:40,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 23:37:42,105][1652475] Updated weights for policy 0, policy_version 949349 (0.0009) [2024-06-15 23:37:42,839][1652475] Updated weights for policy 0, policy_version 949392 (0.0009) [2024-06-15 23:37:43,444][1652475] Updated weights for policy 0, policy_version 949439 (0.0009) [2024-06-15 23:37:43,934][1652475] Updated weights for policy 0, policy_version 949474 (0.0009) [2024-06-15 23:37:44,511][1652475] Updated weights for policy 0, policy_version 949522 (0.0009) [2024-06-15 23:37:44,938][1652475] Updated weights for policy 0, policy_version 949563 (0.0009) [2024-06-15 23:37:45,738][1648984] Fps is (10 sec: 104856.4, 60 sec: 103765.2, 300 sec: 91972.5). Total num frames: 1944715264. Throughput: 0: 25645.6. Samples: 486234112. Policy #0 lag: (min: 47.0, avg: 128.2, max: 303.0) [2024-06-15 23:37:45,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 23:37:46,671][1652475] Updated weights for policy 0, policy_version 949602 (0.0009) [2024-06-15 23:37:47,786][1652475] Updated weights for policy 0, policy_version 949634 (0.0010) [2024-06-15 23:37:48,458][1652475] Updated weights for policy 0, policy_version 949696 (0.0010) [2024-06-15 23:37:49,141][1652475] Updated weights for policy 0, policy_version 949754 (0.0011) [2024-06-15 23:37:49,786][1652475] Updated weights for policy 0, policy_version 949796 (0.0011) [2024-06-15 23:37:50,053][1652475] Updated weights for policy 0, policy_version 949824 (0.0010) [2024-06-15 23:37:50,738][1648984] Fps is (10 sec: 104858.7, 60 sec: 102673.1, 300 sec: 92416.9). Total num frames: 1945239552. Throughput: 0: 25918.6. Samples: 486324736. 
Policy #0 lag: (min: 47.0, avg: 128.2, max: 303.0) [2024-06-15 23:37:50,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 23:37:51,384][1652475] Updated weights for policy 0, policy_version 949878 (0.0012) [2024-06-15 23:37:52,467][1652475] Updated weights for policy 0, policy_version 949906 (0.0009) [2024-06-15 23:37:54,082][1652475] Updated weights for policy 0, policy_version 949953 (0.0010) [2024-06-15 23:37:54,663][1652475] Updated weights for policy 0, policy_version 950007 (0.0011) [2024-06-15 23:37:55,361][1652475] Updated weights for policy 0, policy_version 950064 (0.0055) [2024-06-15 23:37:55,738][1648984] Fps is (10 sec: 104858.2, 60 sec: 102673.2, 300 sec: 92416.9). Total num frames: 1945763840. Throughput: 0: 26009.6. Samples: 486487040. Policy #0 lag: (min: 47.0, avg: 128.2, max: 303.0) [2024-06-15 23:37:55,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 23:37:55,883][1652475] Updated weights for policy 0, policy_version 950102 (0.0008) [2024-06-15 23:37:57,085][1652475] Updated weights for policy 0, policy_version 950161 (0.0012) [2024-06-15 23:37:57,554][1652475] Updated weights for policy 0, policy_version 950205 (0.0077) [2024-06-15 23:37:59,782][1651340] Signal inference workers to stop experience collection... (48950 times) [2024-06-15 23:37:59,848][1652475] InferenceWorker_p0-w0: stopping experience collection (48950 times) [2024-06-15 23:37:59,905][1651340] Signal inference workers to resume experience collection... (48950 times) [2024-06-15 23:37:59,906][1652475] InferenceWorker_p0-w0: resuming experience collection (48950 times) [2024-06-15 23:37:59,997][1652475] Updated weights for policy 0, policy_version 950241 (0.0009) [2024-06-15 23:38:00,451][1652475] Updated weights for policy 0, policy_version 950278 (0.0008) [2024-06-15 23:38:00,738][1648984] Fps is (10 sec: 95026.8, 60 sec: 101581.2, 300 sec: 92083.6). Total num frames: 1946189824. Throughput: 0: 26339.6. Samples: 486640640. 
Policy #0 lag: (min: 47.0, avg: 128.2, max: 303.0) [2024-06-15 23:38:00,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 23:38:01,042][1652475] Updated weights for policy 0, policy_version 950321 (0.0008) [2024-06-15 23:38:01,589][1652475] Updated weights for policy 0, policy_version 950368 (0.0008) [2024-06-15 23:38:02,111][1652475] Updated weights for policy 0, policy_version 950402 (0.0008) [2024-06-15 23:38:02,766][1652475] Updated weights for policy 0, policy_version 950464 (0.0011) [2024-06-15 23:38:05,106][1652475] Updated weights for policy 0, policy_version 950516 (0.0010) [2024-06-15 23:38:05,738][1648984] Fps is (10 sec: 98304.6, 60 sec: 102673.3, 300 sec: 92861.2). Total num frames: 1946746880. Throughput: 0: 26134.8. Samples: 486709760. Policy #0 lag: (min: 47.0, avg: 128.2, max: 303.0) [2024-06-15 23:38:05,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 23:38:05,785][1652475] Updated weights for policy 0, policy_version 950576 (0.0009) [2024-06-15 23:38:06,457][1652475] Updated weights for policy 0, policy_version 950625 (0.0009) [2024-06-15 23:38:07,037][1652475] Updated weights for policy 0, policy_version 950675 (0.0008) [2024-06-15 23:38:09,301][1652475] Updated weights for policy 0, policy_version 950722 (0.0010) [2024-06-15 23:38:09,898][1652475] Updated weights for policy 0, policy_version 950771 (0.0010) [2024-06-15 23:38:10,524][1652475] Updated weights for policy 0, policy_version 950820 (0.0010) [2024-06-15 23:38:10,738][1648984] Fps is (10 sec: 111411.2, 60 sec: 103765.3, 300 sec: 93305.5). Total num frames: 1947303936. Throughput: 0: 26373.7. Samples: 486874624. 
Policy #0 lag: (min: 47.0, avg: 128.2, max: 303.0) [2024-06-15 23:38:10,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 23:38:11,466][1652475] Updated weights for policy 0, policy_version 950869 (0.0011) [2024-06-15 23:38:11,889][1652475] Updated weights for policy 0, policy_version 950912 (0.0010) [2024-06-15 23:38:12,783][1652475] Updated weights for policy 0, policy_version 950960 (0.0011) [2024-06-15 23:38:14,259][1652475] Updated weights for policy 0, policy_version 951008 (0.0009) [2024-06-15 23:38:14,969][1652475] Updated weights for policy 0, policy_version 951064 (0.0011) [2024-06-15 23:38:15,738][1648984] Fps is (10 sec: 111409.7, 60 sec: 104857.6, 300 sec: 93860.9). Total num frames: 1947860992. Throughput: 0: 26214.4. Samples: 487015424. Policy #0 lag: (min: 47.0, avg: 128.2, max: 303.0) [2024-06-15 23:38:15,742][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 23:38:17,409][1651340] Signal inference workers to stop experience collection... (49000 times) [2024-06-15 23:38:17,423][1652475] Updated weights for policy 0, policy_version 951121 (0.0011) [2024-06-15 23:38:17,456][1652475] InferenceWorker_p0-w0: stopping experience collection (49000 times) [2024-06-15 23:38:17,536][1651340] Signal inference workers to resume experience collection... (49000 times) [2024-06-15 23:38:17,537][1652475] InferenceWorker_p0-w0: resuming experience collection (49000 times) [2024-06-15 23:38:17,857][1652475] Updated weights for policy 0, policy_version 951163 (0.0009) [2024-06-15 23:38:18,465][1652475] Updated weights for policy 0, policy_version 951203 (0.0009) [2024-06-15 23:38:19,130][1652475] Updated weights for policy 0, policy_version 951264 (0.0010) [2024-06-15 23:38:19,813][1652475] Updated weights for policy 0, policy_version 951314 (0.0011) [2024-06-15 23:38:20,738][1648984] Fps is (10 sec: 108135.0, 60 sec: 104857.6, 300 sec: 94194.1). Total num frames: 1948385280. Throughput: 0: 26134.7. Samples: 487104512. 
Policy #0 lag: (min: 47.0, avg: 128.2, max: 303.0) [2024-06-15 23:38:20,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 23:38:22,644][1652475] Updated weights for policy 0, policy_version 951363 (0.0010) [2024-06-15 23:38:23,276][1652475] Updated weights for policy 0, policy_version 951413 (0.0009) [2024-06-15 23:38:24,104][1652475] Updated weights for policy 0, policy_version 951479 (0.0010) [2024-06-15 23:38:24,690][1652475] Updated weights for policy 0, policy_version 951527 (0.0009) [2024-06-15 23:38:25,359][1652475] Updated weights for policy 0, policy_version 951584 (0.0009) [2024-06-15 23:38:25,738][1648984] Fps is (10 sec: 104858.8, 60 sec: 104857.6, 300 sec: 94194.2). Total num frames: 1948909568. Throughput: 0: 26032.4. Samples: 487250944. Policy #0 lag: (min: 31.0, avg: 63.4, max: 207.0) [2024-06-15 23:38:25,738][1648984] Avg episode reward: [(0, '-0.400')] [2024-06-15 23:38:27,496][1652475] Updated weights for policy 0, policy_version 951618 (0.0009) [2024-06-15 23:38:28,334][1652475] Updated weights for policy 0, policy_version 951680 (0.0075) [2024-06-15 23:38:28,983][1652475] Updated weights for policy 0, policy_version 951730 (0.0008) [2024-06-15 23:38:29,628][1652475] Updated weights for policy 0, policy_version 951783 (0.0013) [2024-06-15 23:38:30,206][1652475] Updated weights for policy 0, policy_version 951827 (0.0010) [2024-06-15 23:38:30,737][1648984] Fps is (10 sec: 104858.1, 60 sec: 104857.8, 300 sec: 94417.7). Total num frames: 1949433856. Throughput: 0: 25986.9. Samples: 487403520. 
Policy #0 lag: (min: 31.0, avg: 63.4, max: 207.0) [2024-06-15 23:38:30,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 23:38:32,521][1652475] Updated weights for policy 0, policy_version 951875 (0.0010) [2024-06-15 23:38:33,134][1652475] Updated weights for policy 0, policy_version 951921 (0.0009) [2024-06-15 23:38:33,783][1652475] Updated weights for policy 0, policy_version 951968 (0.0009) [2024-06-15 23:38:33,861][1651340] Signal inference workers to stop experience collection... (49050 times) [2024-06-15 23:38:33,909][1652475] InferenceWorker_p0-w0: stopping experience collection (49050 times) [2024-06-15 23:38:34,016][1651340] Signal inference workers to resume experience collection... (49050 times) [2024-06-15 23:38:34,016][1652475] InferenceWorker_p0-w0: resuming experience collection (49050 times) [2024-06-15 23:38:34,505][1652475] Updated weights for policy 0, policy_version 952022 (0.0008) [2024-06-15 23:38:35,259][1652475] Updated weights for policy 0, policy_version 952083 (0.0010) [2024-06-15 23:38:35,738][1648984] Fps is (10 sec: 104857.5, 60 sec: 104857.6, 300 sec: 95082.7). Total num frames: 1949958144. Throughput: 0: 26032.4. Samples: 487496192. Policy #0 lag: (min: 31.0, avg: 63.4, max: 207.0) [2024-06-15 23:38:35,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 23:38:35,759][1652475] Updated weights for policy 0, policy_version 952128 (0.0007) [2024-06-15 23:38:38,142][1652475] Updated weights for policy 0, policy_version 952181 (0.0008) [2024-06-15 23:38:38,823][1652475] Updated weights for policy 0, policy_version 952240 (0.0008) [2024-06-15 23:38:39,767][1652475] Updated weights for policy 0, policy_version 952314 (0.0011) [2024-06-15 23:38:40,381][1652475] Updated weights for policy 0, policy_version 952355 (0.0009) [2024-06-15 23:38:40,738][1648984] Fps is (10 sec: 104856.8, 60 sec: 104857.8, 300 sec: 95082.8). Total num frames: 1950482432. Throughput: 0: 25736.5. Samples: 487645184. 
Policy #0 lag: (min: 31.0, avg: 63.4, max: 207.0) [2024-06-15 23:38:40,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 23:38:42,734][1652475] Updated weights for policy 0, policy_version 952405 (0.0010) [2024-06-15 23:38:43,453][1652475] Updated weights for policy 0, policy_version 952464 (0.0008) [2024-06-15 23:38:44,034][1652475] Updated weights for policy 0, policy_version 952512 (0.0010) [2024-06-15 23:38:44,913][1652475] Updated weights for policy 0, policy_version 952562 (0.0010) [2024-06-15 23:38:45,450][1652475] Updated weights for policy 0, policy_version 952608 (0.0009) [2024-06-15 23:38:45,737][1648984] Fps is (10 sec: 101581.1, 60 sec: 104311.7, 300 sec: 94971.7). Total num frames: 1950973952. Throughput: 0: 25474.9. Samples: 487787008. Policy #0 lag: (min: 31.0, avg: 63.4, max: 207.0) [2024-06-15 23:38:45,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 23:38:45,786][1652475] Updated weights for policy 0, policy_version 952640 (0.0010) [2024-06-15 23:38:47,722][1652475] Updated weights for policy 0, policy_version 952695 (0.0011) [2024-06-15 23:38:49,038][1652475] Updated weights for policy 0, policy_version 952736 (0.0008) [2024-06-15 23:38:49,891][1652475] Updated weights for policy 0, policy_version 952800 (0.0017) [2024-06-15 23:38:50,227][1652475] Updated weights for policy 0, policy_version 952827 (0.0008) [2024-06-15 23:38:50,738][1648984] Fps is (10 sec: 95027.5, 60 sec: 103219.2, 300 sec: 95416.0). Total num frames: 1951432704. Throughput: 0: 25793.4. Samples: 487870464. Policy #0 lag: (min: 31.0, avg: 63.4, max: 207.0) [2024-06-15 23:38:50,739][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 23:38:50,982][1651340] Signal inference workers to stop experience collection... 
(49100 times) [2024-06-15 23:38:51,033][1652475] InferenceWorker_p0-w0: stopping experience collection (49100 times) [2024-06-15 23:38:51,036][1652475] Updated weights for policy 0, policy_version 952870 (0.0010) [2024-06-15 23:38:51,117][1651340] Signal inference workers to resume experience collection... (49100 times) [2024-06-15 23:38:51,118][1652475] InferenceWorker_p0-w0: resuming experience collection (49100 times) [2024-06-15 23:38:51,954][1652475] Updated weights for policy 0, policy_version 952915 (0.0011) [2024-06-15 23:38:52,444][1652475] Updated weights for policy 0, policy_version 952960 (0.0010) [2024-06-15 23:38:54,045][1652475] Updated weights for policy 0, policy_version 952998 (0.0011) [2024-06-15 23:38:55,320][1652475] Updated weights for policy 0, policy_version 953042 (0.0009) [2024-06-15 23:38:55,738][1648984] Fps is (10 sec: 91749.9, 60 sec: 102126.9, 300 sec: 95416.0). Total num frames: 1951891456. Throughput: 0: 25486.2. Samples: 488021504. Policy #0 lag: (min: 31.0, avg: 63.4, max: 207.0) [2024-06-15 23:38:55,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 23:38:56,002][1652475] Updated weights for policy 0, policy_version 953089 (0.0009) [2024-06-15 23:38:56,115][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000953104_1951956992.pth... [2024-06-15 23:38:56,213][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000941696_1928593408.pth [2024-06-15 23:38:56,765][1652475] Updated weights for policy 0, policy_version 953152 (0.0008) [2024-06-15 23:38:57,467][1652475] Updated weights for policy 0, policy_version 953209 (0.0008) [2024-06-15 23:38:58,980][1652475] Updated weights for policy 0, policy_version 953254 (0.0008) [2024-06-15 23:39:00,622][1652475] Updated weights for policy 0, policy_version 953312 (0.0010) [2024-06-15 23:39:00,738][1648984] Fps is (10 sec: 95027.2, 60 sec: 103219.3, 300 sec: 95306.4). Total num frames: 1952382976. 
Throughput: 0: 26009.7. Samples: 488185856. Policy #0 lag: (min: 31.0, avg: 63.4, max: 207.0) [2024-06-15 23:39:00,738][1648984] Avg episode reward: [(0, '-0.250')] [2024-06-15 23:39:01,331][1652475] Updated weights for policy 0, policy_version 953364 (0.0008) [2024-06-15 23:39:01,982][1652475] Updated weights for policy 0, policy_version 953414 (0.0009) [2024-06-15 23:39:02,611][1652475] Updated weights for policy 0, policy_version 953472 (0.0008) [2024-06-15 23:39:03,886][1652475] Updated weights for policy 0, policy_version 953520 (0.0008) [2024-06-15 23:39:05,303][1652475] Updated weights for policy 0, policy_version 953568 (0.0007) [2024-06-15 23:39:05,739][1648984] Fps is (10 sec: 108131.0, 60 sec: 103764.7, 300 sec: 96304.5). Total num frames: 1952972800. Throughput: 0: 25599.8. Samples: 488256512. Policy #0 lag: (min: 31.0, avg: 63.4, max: 207.0) [2024-06-15 23:39:05,740][1648984] Avg episode reward: [(0, '-0.560')] [2024-06-15 23:39:05,940][1652475] Updated weights for policy 0, policy_version 953616 (0.0007) [2024-06-15 23:39:06,479][1652475] Updated weights for policy 0, policy_version 953659 (0.0009) [2024-06-15 23:39:07,667][1652475] Updated weights for policy 0, policy_version 953722 (0.0012) [2024-06-15 23:39:08,838][1652475] Updated weights for policy 0, policy_version 953776 (0.0010) [2024-06-15 23:39:10,042][1652475] Updated weights for policy 0, policy_version 953815 (0.0008) [2024-06-15 23:39:10,182][1651340] Signal inference workers to stop experience collection... (49150 times) [2024-06-15 23:39:10,232][1652475] InferenceWorker_p0-w0: stopping experience collection (49150 times) [2024-06-15 23:39:10,308][1651340] Signal inference workers to resume experience collection... (49150 times) [2024-06-15 23:39:10,309][1652475] InferenceWorker_p0-w0: resuming experience collection (49150 times) [2024-06-15 23:39:10,738][1648984] Fps is (10 sec: 114686.0, 60 sec: 103765.1, 300 sec: 96859.9). Total num frames: 1953529856. Throughput: 0: 25850.2. 
Samples: 488414208. Policy #0 lag: (min: 31.0, avg: 63.4, max: 207.0) [2024-06-15 23:39:10,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 23:39:10,785][1652475] Updated weights for policy 0, policy_version 953874 (0.0056) [2024-06-15 23:39:11,201][1652475] Updated weights for policy 0, policy_version 953916 (0.0010) [2024-06-15 23:39:13,678][1652475] Updated weights for policy 0, policy_version 953977 (0.0010) [2024-06-15 23:39:14,357][1652475] Updated weights for policy 0, policy_version 954022 (0.0011) [2024-06-15 23:39:15,066][1652475] Updated weights for policy 0, policy_version 954080 (0.0010) [2024-06-15 23:39:15,742][1648984] Fps is (10 sec: 108107.0, 60 sec: 103214.4, 300 sec: 97414.5). Total num frames: 1954054144. Throughput: 0: 25700.7. Samples: 488560128. Policy #0 lag: (min: 31.0, avg: 63.4, max: 207.0) [2024-06-15 23:39:15,743][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 23:39:15,755][1652475] Updated weights for policy 0, policy_version 954132 (0.0008) [2024-06-15 23:39:18,774][1652475] Updated weights for policy 0, policy_version 954181 (0.0008) [2024-06-15 23:39:19,363][1652475] Updated weights for policy 0, policy_version 954228 (0.0010) [2024-06-15 23:39:20,013][1652475] Updated weights for policy 0, policy_version 954288 (0.0009) [2024-06-15 23:39:20,737][1648984] Fps is (10 sec: 95029.2, 60 sec: 101580.9, 300 sec: 97082.2). Total num frames: 1954480128. Throughput: 0: 25463.5. Samples: 488642048. 
Policy #0 lag: (min: 31.0, avg: 63.4, max: 207.0) [2024-06-15 23:39:20,738][1648984] Avg episode reward: [(0, '-0.220')] [2024-06-15 23:39:20,803][1652475] Updated weights for policy 0, policy_version 954352 (0.0009) [2024-06-15 23:39:21,452][1652475] Updated weights for policy 0, policy_version 954401 (0.0010) [2024-06-15 23:39:23,843][1652475] Updated weights for policy 0, policy_version 954449 (0.0012) [2024-06-15 23:39:24,391][1652475] Updated weights for policy 0, policy_version 954496 (0.0008) [2024-06-15 23:39:24,931][1652475] Updated weights for policy 0, policy_version 954530 (0.0007) [2024-06-15 23:39:25,564][1652475] Updated weights for policy 0, policy_version 954581 (0.0007) [2024-06-15 23:39:25,738][1648984] Fps is (10 sec: 95053.9, 60 sec: 101580.7, 300 sec: 97082.2). Total num frames: 1955004416. Throughput: 0: 25622.7. Samples: 488798208. Policy #0 lag: (min: 31.0, avg: 63.4, max: 207.0) [2024-06-15 23:39:25,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 23:39:26,494][1652475] Updated weights for policy 0, policy_version 954626 (0.0011) [2024-06-15 23:39:27,118][1652475] Updated weights for policy 0, policy_version 954677 (0.0009) [2024-06-15 23:39:28,872][1651340] Signal inference workers to stop experience collection... (49200 times) [2024-06-15 23:39:28,882][1652475] Updated weights for policy 0, policy_version 954721 (0.0008) [2024-06-15 23:39:28,927][1652475] InferenceWorker_p0-w0: stopping experience collection (49200 times) [2024-06-15 23:39:29,055][1651340] Signal inference workers to resume experience collection... (49200 times) [2024-06-15 23:39:29,056][1652475] InferenceWorker_p0-w0: resuming experience collection (49200 times) [2024-06-15 23:39:29,717][1652475] Updated weights for policy 0, policy_version 954784 (0.0011) [2024-06-15 23:39:30,510][1652475] Updated weights for policy 0, policy_version 954834 (0.0010) [2024-06-15 23:39:30,738][1648984] Fps is (10 sec: 104857.0, 60 sec: 101580.7, 300 sec: 97193.2). 
Total num frames: 1955528704. Throughput: 0: 25816.1. Samples: 488948736. Policy #0 lag: (min: 31.0, avg: 63.4, max: 207.0) [2024-06-15 23:39:30,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 23:39:31,871][1652475] Updated weights for policy 0, policy_version 954898 (0.0010) [2024-06-15 23:39:32,360][1652475] Updated weights for policy 0, policy_version 954942 (0.0008) [2024-06-15 23:39:34,011][1652475] Updated weights for policy 0, policy_version 954993 (0.0010) [2024-06-15 23:39:34,482][1652475] Updated weights for policy 0, policy_version 955031 (0.0007) [2024-06-15 23:39:35,119][1652475] Updated weights for policy 0, policy_version 955080 (0.0008) [2024-06-15 23:39:35,738][1648984] Fps is (10 sec: 111411.6, 60 sec: 102673.0, 300 sec: 98192.9). Total num frames: 1956118528. Throughput: 0: 25838.9. Samples: 489033216. Policy #0 lag: (min: 31.0, avg: 63.4, max: 207.0) [2024-06-15 23:39:35,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 23:39:35,742][1652475] Updated weights for policy 0, policy_version 955136 (0.0008) [2024-06-15 23:39:37,115][1652475] Updated weights for policy 0, policy_version 955174 (0.0008) [2024-06-15 23:39:38,716][1652475] Updated weights for policy 0, policy_version 955219 (0.0019) [2024-06-15 23:39:39,307][1652475] Updated weights for policy 0, policy_version 955265 (0.0008) [2024-06-15 23:39:39,974][1652475] Updated weights for policy 0, policy_version 955315 (0.0009) [2024-06-15 23:39:40,579][1652475] Updated weights for policy 0, policy_version 955361 (0.0008) [2024-06-15 23:39:40,738][1648984] Fps is (10 sec: 108133.9, 60 sec: 102126.9, 300 sec: 98081.9). Total num frames: 1956610048. Throughput: 0: 25964.1. Samples: 489189888. 
Policy #0 lag: (min: 31.0, avg: 63.4, max: 207.0) [2024-06-15 23:39:40,740][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 23:39:41,839][1652475] Updated weights for policy 0, policy_version 955409 (0.0031) [2024-06-15 23:39:42,294][1652475] Updated weights for policy 0, policy_version 955451 (0.0010) [2024-06-15 23:39:43,892][1652475] Updated weights for policy 0, policy_version 955489 (0.0010) [2024-06-15 23:39:44,567][1652475] Updated weights for policy 0, policy_version 955539 (0.0008) [2024-06-15 23:39:45,238][1652475] Updated weights for policy 0, policy_version 955587 (0.0010) [2024-06-15 23:39:45,738][1648984] Fps is (10 sec: 98303.5, 60 sec: 102126.8, 300 sec: 98304.0). Total num frames: 1957101568. Throughput: 0: 25543.1. Samples: 489335296. Policy #0 lag: (min: 31.0, avg: 63.4, max: 207.0) [2024-06-15 23:39:45,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 23:39:45,859][1652475] Updated weights for policy 0, policy_version 955641 (0.0009) [2024-06-15 23:39:46,359][1651340] Signal inference workers to stop experience collection... (49250 times) [2024-06-15 23:39:46,406][1652475] InferenceWorker_p0-w0: stopping experience collection (49250 times) [2024-06-15 23:39:46,507][1651340] Signal inference workers to resume experience collection... (49250 times) [2024-06-15 23:39:46,509][1652475] InferenceWorker_p0-w0: resuming experience collection (49250 times) [2024-06-15 23:39:46,629][1652475] Updated weights for policy 0, policy_version 955683 (0.0010) [2024-06-15 23:39:48,386][1652475] Updated weights for policy 0, policy_version 955733 (0.0010) [2024-06-15 23:39:50,086][1652475] Updated weights for policy 0, policy_version 955779 (0.0011) [2024-06-15 23:39:50,737][1648984] Fps is (10 sec: 91751.4, 60 sec: 101580.9, 300 sec: 98194.4). Total num frames: 1957527552. Throughput: 0: 25702.6. Samples: 489413120. 
Policy #0 lag: (min: 31.0, avg: 63.4, max: 207.0) [2024-06-15 23:39:50,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 23:39:50,975][1652475] Updated weights for policy 0, policy_version 955841 (0.0008) [2024-06-15 23:39:51,515][1652475] Updated weights for policy 0, policy_version 955888 (0.0008) [2024-06-15 23:39:52,232][1652475] Updated weights for policy 0, policy_version 955937 (0.0009) [2024-06-15 23:39:53,417][1652475] Updated weights for policy 0, policy_version 955993 (0.0064) [2024-06-15 23:39:55,226][1652475] Updated weights for policy 0, policy_version 956048 (0.0011) [2024-06-15 23:39:55,738][1648984] Fps is (10 sec: 95027.7, 60 sec: 102673.1, 300 sec: 98637.2). Total num frames: 1958051840. Throughput: 0: 25418.0. Samples: 489558016. Policy #0 lag: (min: 31.0, avg: 63.4, max: 207.0) [2024-06-15 23:39:55,738][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 23:39:55,867][1652475] Updated weights for policy 0, policy_version 956096 (0.0008) [2024-06-15 23:39:56,827][1652475] Updated weights for policy 0, policy_version 956146 (0.0009) [2024-06-15 23:39:57,471][1652475] Updated weights for policy 0, policy_version 956195 (0.0010) [2024-06-15 23:39:58,572][1652475] Updated weights for policy 0, policy_version 956256 (0.0010) [2024-06-15 23:40:00,080][1652475] Updated weights for policy 0, policy_version 956304 (0.0008) [2024-06-15 23:40:00,738][1648984] Fps is (10 sec: 108133.6, 60 sec: 103765.3, 300 sec: 99081.5). Total num frames: 1958608896. Throughput: 0: 25863.3. Samples: 489723904. 
Policy #0 lag: (min: 31.0, avg: 63.4, max: 207.0) [2024-06-15 23:40:00,738][1648984] Avg episode reward: [(0, '-0.200')] [2024-06-15 23:40:01,265][1652475] Updated weights for policy 0, policy_version 956368 (0.0010) [2024-06-15 23:40:01,977][1652475] Updated weights for policy 0, policy_version 956418 (0.0011) [2024-06-15 23:40:02,609][1652475] Updated weights for policy 0, policy_version 956477 (0.0009) [2024-06-15 23:40:04,000][1652475] Updated weights for policy 0, policy_version 956518 (0.0010) [2024-06-15 23:40:05,215][1652475] Updated weights for policy 0, policy_version 956580 (0.0010) [2024-06-15 23:40:05,737][1648984] Fps is (10 sec: 108135.1, 60 sec: 102673.7, 300 sec: 99415.0). Total num frames: 1959133184. Throughput: 0: 25622.8. Samples: 489795072. Policy #0 lag: (min: 31.0, avg: 63.4, max: 207.0) [2024-06-15 23:40:05,738][1648984] Avg episode reward: [(0, '-0.220')] [2024-06-15 23:40:06,091][1652475] Updated weights for policy 0, policy_version 956624 (0.0010) [2024-06-15 23:40:06,392][1651340] Signal inference workers to stop experience collection... (49300 times) [2024-06-15 23:40:06,440][1652475] InferenceWorker_p0-w0: stopping experience collection (49300 times) [2024-06-15 23:40:06,526][1651340] Signal inference workers to resume experience collection... (49300 times) [2024-06-15 23:40:06,526][1652475] InferenceWorker_p0-w0: resuming experience collection (49300 times) [2024-06-15 23:40:06,631][1652475] Updated weights for policy 0, policy_version 956658 (0.0008) [2024-06-15 23:40:07,165][1652475] Updated weights for policy 0, policy_version 956704 (0.0008) [2024-06-15 23:40:09,506][1652475] Updated weights for policy 0, policy_version 956752 (0.0009) [2024-06-15 23:40:10,033][1652475] Updated weights for policy 0, policy_version 956786 (0.0009) [2024-06-15 23:40:10,708][1652475] Updated weights for policy 0, policy_version 956839 (0.0008) [2024-06-15 23:40:10,737][1648984] Fps is (10 sec: 98304.7, 60 sec: 101035.0, 300 sec: 99637.0). 
Total num frames: 1959591936. Throughput: 0: 25782.1. Samples: 489958400. Policy #0 lag: (min: 31.0, avg: 63.4, max: 207.0) [2024-06-15 23:40:10,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 23:40:11,387][1652475] Updated weights for policy 0, policy_version 956896 (0.0012) [2024-06-15 23:40:12,042][1652475] Updated weights for policy 0, policy_version 956948 (0.0010) [2024-06-15 23:40:12,440][1652475] Updated weights for policy 0, policy_version 956988 (0.0011) [2024-06-15 23:40:15,296][1652475] Updated weights for policy 0, policy_version 957029 (0.0011) [2024-06-15 23:40:15,738][1648984] Fps is (10 sec: 91746.7, 60 sec: 99946.6, 300 sec: 99970.0). Total num frames: 1960050688. Throughput: 0: 25872.9. Samples: 490113024. Policy #0 lag: (min: 31.0, avg: 63.4, max: 207.0) [2024-06-15 23:40:15,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 23:40:16,035][1652475] Updated weights for policy 0, policy_version 957090 (0.0012) [2024-06-15 23:40:16,663][1652475] Updated weights for policy 0, policy_version 957140 (0.0008) [2024-06-15 23:40:17,355][1652475] Updated weights for policy 0, policy_version 957200 (0.0011) [2024-06-15 23:40:17,984][1652475] Updated weights for policy 0, policy_version 957248 (0.0011) [2024-06-15 23:40:20,255][1652475] Updated weights for policy 0, policy_version 957301 (0.0009) [2024-06-15 23:40:20,737][1648984] Fps is (10 sec: 104858.8, 60 sec: 102673.2, 300 sec: 100192.4). Total num frames: 1960640512. Throughput: 0: 25224.6. Samples: 490168320. 
Policy #0 lag: (min: 63.0, avg: 130.7, max: 319.0) [2024-06-15 23:40:20,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 23:40:20,939][1652475] Updated weights for policy 0, policy_version 957360 (0.0009) [2024-06-15 23:40:21,641][1652475] Updated weights for policy 0, policy_version 957409 (0.0008) [2024-06-15 23:40:22,957][1652475] Updated weights for policy 0, policy_version 957458 (0.0010) [2024-06-15 23:40:24,482][1652475] Updated weights for policy 0, policy_version 957520 (0.0010) [2024-06-15 23:40:24,549][1651340] Signal inference workers to stop experience collection... (49350 times) [2024-06-15 23:40:24,594][1652475] InferenceWorker_p0-w0: stopping experience collection (49350 times) [2024-06-15 23:40:24,685][1651340] Signal inference workers to resume experience collection... (49350 times) [2024-06-15 23:40:24,686][1652475] InferenceWorker_p0-w0: resuming experience collection (49350 times) [2024-06-15 23:40:25,317][1652475] Updated weights for policy 0, policy_version 957584 (0.0009) [2024-06-15 23:40:25,738][1648984] Fps is (10 sec: 114691.1, 60 sec: 103219.1, 300 sec: 100303.4). Total num frames: 1961197568. Throughput: 0: 25554.5. Samples: 490339840. Policy #0 lag: (min: 63.0, avg: 130.7, max: 319.0) [2024-06-15 23:40:25,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 23:40:25,991][1652475] Updated weights for policy 0, policy_version 957636 (0.0009) [2024-06-15 23:40:26,552][1652475] Updated weights for policy 0, policy_version 957685 (0.0010) [2024-06-15 23:40:28,051][1652475] Updated weights for policy 0, policy_version 957715 (0.0009) [2024-06-15 23:40:29,108][1652475] Updated weights for policy 0, policy_version 957762 (0.0009) [2024-06-15 23:40:29,726][1652475] Updated weights for policy 0, policy_version 957815 (0.0007) [2024-06-15 23:40:30,315][1652475] Updated weights for policy 0, policy_version 957866 (0.0009) [2024-06-15 23:40:30,738][1648984] Fps is (10 sec: 111410.0, 60 sec: 103765.4, 300 sec: 100414.5). 
Total num frames: 1961754624. Throughput: 0: 25873.1. Samples: 490499584. Policy #0 lag: (min: 63.0, avg: 130.7, max: 319.0) [2024-06-15 23:40:30,739][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 23:40:30,932][1652475] Updated weights for policy 0, policy_version 957920 (0.0008) [2024-06-15 23:40:32,949][1652475] Updated weights for policy 0, policy_version 957971 (0.0012) [2024-06-15 23:40:33,380][1652475] Updated weights for policy 0, policy_version 958014 (0.0011) [2024-06-15 23:40:34,316][1652475] Updated weights for policy 0, policy_version 958064 (0.0011) [2024-06-15 23:40:34,958][1652475] Updated weights for policy 0, policy_version 958112 (0.0010) [2024-06-15 23:40:35,528][1652475] Updated weights for policy 0, policy_version 958160 (0.0009) [2024-06-15 23:40:35,738][1648984] Fps is (10 sec: 114688.7, 60 sec: 103765.3, 300 sec: 101192.1). Total num frames: 1962344448. Throughput: 0: 25952.7. Samples: 490580992. Policy #0 lag: (min: 63.0, avg: 130.7, max: 319.0) [2024-06-15 23:40:35,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 23:40:36,036][1652475] Updated weights for policy 0, policy_version 958203 (0.0009) [2024-06-15 23:40:37,525][1652475] Updated weights for policy 0, policy_version 958243 (0.0009) [2024-06-15 23:40:38,732][1652475] Updated weights for policy 0, policy_version 958288 (0.0009) [2024-06-15 23:40:39,478][1652475] Updated weights for policy 0, policy_version 958340 (0.0011) [2024-06-15 23:40:40,101][1652475] Updated weights for policy 0, policy_version 958396 (0.0010) [2024-06-15 23:40:40,742][1648984] Fps is (10 sec: 104808.9, 60 sec: 103211.4, 300 sec: 101523.7). Total num frames: 1962803200. Throughput: 0: 26427.9. Samples: 490747392. 
Policy #0 lag: (min: 63.0, avg: 130.7, max: 319.0) [2024-06-15 23:40:40,743][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 23:40:41,309][1652475] Updated weights for policy 0, policy_version 958454 (0.0011) [2024-06-15 23:40:41,732][1651340] Signal inference workers to stop experience collection... (49400 times) [2024-06-15 23:40:41,768][1652475] InferenceWorker_p0-w0: stopping experience collection (49400 times) [2024-06-15 23:40:41,855][1651340] Signal inference workers to resume experience collection... (49400 times) [2024-06-15 23:40:41,855][1652475] InferenceWorker_p0-w0: resuming experience collection (49400 times) [2024-06-15 23:40:42,062][1652475] Updated weights for policy 0, policy_version 958496 (0.0010) [2024-06-15 23:40:42,396][1652475] Updated weights for policy 0, policy_version 958527 (0.0009) [2024-06-15 23:40:44,028][1652475] Updated weights for policy 0, policy_version 958586 (0.0059) [2024-06-15 23:40:45,227][1652475] Updated weights for policy 0, policy_version 958631 (0.0009) [2024-06-15 23:40:45,738][1648984] Fps is (10 sec: 98304.1, 60 sec: 103765.4, 300 sec: 101747.5). Total num frames: 1963327488. Throughput: 0: 26123.4. Samples: 490899456. Policy #0 lag: (min: 63.0, avg: 130.7, max: 319.0) [2024-06-15 23:40:45,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 23:40:45,944][1652475] Updated weights for policy 0, policy_version 958659 (0.0010) [2024-06-15 23:40:46,528][1652475] Updated weights for policy 0, policy_version 958717 (0.0009) [2024-06-15 23:40:47,226][1652475] Updated weights for policy 0, policy_version 958768 (0.0009) [2024-06-15 23:40:48,298][1652475] Updated weights for policy 0, policy_version 958820 (0.0009) [2024-06-15 23:40:50,448][1652475] Updated weights for policy 0, policy_version 958870 (0.0009) [2024-06-15 23:40:50,738][1648984] Fps is (10 sec: 98349.0, 60 sec: 104311.3, 300 sec: 101525.2). Total num frames: 1963786240. Throughput: 0: 26259.9. Samples: 490976768. 
Policy #0 lag: (min: 63.0, avg: 130.7, max: 319.0) [2024-06-15 23:40:50,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 23:40:51,153][1652475] Updated weights for policy 0, policy_version 958928 (0.0010) [2024-06-15 23:40:52,030][1652475] Updated weights for policy 0, policy_version 958978 (0.0012) [2024-06-15 23:40:52,705][1652475] Updated weights for policy 0, policy_version 959034 (0.0011) [2024-06-15 23:40:53,320][1652475] Updated weights for policy 0, policy_version 959075 (0.0010) [2024-06-15 23:40:55,731][1652475] Updated weights for policy 0, policy_version 959132 (0.0011) [2024-06-15 23:40:55,738][1648984] Fps is (10 sec: 95024.8, 60 sec: 103764.9, 300 sec: 102191.6). Total num frames: 1964277760. Throughput: 0: 26032.2. Samples: 491129856. Policy #0 lag: (min: 63.0, avg: 130.7, max: 319.0) [2024-06-15 23:40:55,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 23:40:55,995][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000959152_1964343296.pth... [2024-06-15 23:40:56,026][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000947072_1939603456.pth [2024-06-15 23:40:56,360][1652475] Updated weights for policy 0, policy_version 959170 (0.0009) [2024-06-15 23:40:56,970][1652475] Updated weights for policy 0, policy_version 959222 (0.0011) [2024-06-15 23:40:57,821][1652475] Updated weights for policy 0, policy_version 959292 (0.0011) [2024-06-15 23:40:58,437][1652475] Updated weights for policy 0, policy_version 959330 (0.0009) [2024-06-15 23:41:00,624][1652475] Updated weights for policy 0, policy_version 959382 (0.0008) [2024-06-15 23:41:00,738][1648984] Fps is (10 sec: 104855.8, 60 sec: 103765.0, 300 sec: 102636.0). Total num frames: 1964834816. Throughput: 0: 26225.9. Samples: 491293184. 
Policy #0 lag: (min: 63.0, avg: 130.7, max: 319.0) [2024-06-15 23:41:00,739][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 23:41:01,150][1651340] Signal inference workers to stop experience collection... (49450 times) [2024-06-15 23:41:01,180][1652475] InferenceWorker_p0-w0: stopping experience collection (49450 times) [2024-06-15 23:41:01,279][1651340] Signal inference workers to resume experience collection... (49450 times) [2024-06-15 23:41:01,280][1652475] InferenceWorker_p0-w0: resuming experience collection (49450 times) [2024-06-15 23:41:01,383][1652475] Updated weights for policy 0, policy_version 959444 (0.0010) [2024-06-15 23:41:02,208][1652475] Updated weights for policy 0, policy_version 959506 (0.0009) [2024-06-15 23:41:03,907][1652475] Updated weights for policy 0, policy_version 959568 (0.0010) [2024-06-15 23:41:04,457][1652475] Updated weights for policy 0, policy_version 959615 (0.0007) [2024-06-15 23:41:05,640][1652475] Updated weights for policy 0, policy_version 959651 (0.0011) [2024-06-15 23:41:05,737][1648984] Fps is (10 sec: 111415.1, 60 sec: 104311.5, 300 sec: 102969.3). Total num frames: 1965391872. Throughput: 0: 26510.2. Samples: 491361280. Policy #0 lag: (min: 63.0, avg: 130.7, max: 319.0) [2024-06-15 23:41:05,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 23:41:06,156][1652475] Updated weights for policy 0, policy_version 959696 (0.0010) [2024-06-15 23:41:06,817][1652475] Updated weights for policy 0, policy_version 959748 (0.0009) [2024-06-15 23:41:07,417][1652475] Updated weights for policy 0, policy_version 959801 (0.0012) [2024-06-15 23:41:10,086][1652475] Updated weights for policy 0, policy_version 959840 (0.0010) [2024-06-15 23:41:10,738][1648984] Fps is (10 sec: 101582.2, 60 sec: 104311.3, 300 sec: 102747.1). Total num frames: 1965850624. Throughput: 0: 26350.9. Samples: 491525632. 
Policy #0 lag: (min: 63.0, avg: 130.7, max: 319.0) [2024-06-15 23:41:10,738][1648984] Avg episode reward: [(0, '-0.260')] [2024-06-15 23:41:10,848][1652475] Updated weights for policy 0, policy_version 959894 (0.0010) [2024-06-15 23:41:11,453][1652475] Updated weights for policy 0, policy_version 959939 (0.0009) [2024-06-15 23:41:12,121][1652475] Updated weights for policy 0, policy_version 959990 (0.0010) [2024-06-15 23:41:12,795][1652475] Updated weights for policy 0, policy_version 960048 (0.0010) [2024-06-15 23:41:15,020][1652475] Updated weights for policy 0, policy_version 960067 (0.0010) [2024-06-15 23:41:15,678][1652475] Updated weights for policy 0, policy_version 960119 (0.0009) [2024-06-15 23:41:15,738][1648984] Fps is (10 sec: 91749.1, 60 sec: 104312.0, 300 sec: 102969.3). Total num frames: 1966309376. Throughput: 0: 26146.1. Samples: 491676160. Policy #0 lag: (min: 63.0, avg: 130.7, max: 319.0) [2024-06-15 23:41:15,738][1648984] Avg episode reward: [(0, '-0.280')] [2024-06-15 23:41:16,353][1652475] Updated weights for policy 0, policy_version 960176 (0.0009) [2024-06-15 23:41:16,870][1652475] Updated weights for policy 0, policy_version 960211 (0.0009) [2024-06-15 23:41:17,362][1652475] Updated weights for policy 0, policy_version 960256 (0.0008) [2024-06-15 23:41:18,164][1652475] Updated weights for policy 0, policy_version 960304 (0.0009) [2024-06-15 23:41:19,659][1651340] Signal inference workers to stop experience collection... (49500 times) [2024-06-15 23:41:19,674][1652475] InferenceWorker_p0-w0: stopping experience collection (49500 times) [2024-06-15 23:41:19,788][1651340] Signal inference workers to resume experience collection... 
(49500 times) [2024-06-15 23:41:19,788][1652475] InferenceWorker_p0-w0: resuming experience collection (49500 times) [2024-06-15 23:41:19,974][1652475] Updated weights for policy 0, policy_version 960352 (0.0009) [2024-06-15 23:41:20,673][1652475] Updated weights for policy 0, policy_version 960401 (0.0010) [2024-06-15 23:41:20,737][1648984] Fps is (10 sec: 104858.6, 60 sec: 104311.3, 300 sec: 103524.7). Total num frames: 1966899200. Throughput: 0: 25804.8. Samples: 491742208. Policy #0 lag: (min: 63.0, avg: 130.7, max: 319.0) [2024-06-15 23:41:20,738][1648984] Avg episode reward: [(0, '-0.470')] [2024-06-15 23:41:21,409][1652475] Updated weights for policy 0, policy_version 960464 (0.0010) [2024-06-15 23:41:21,928][1652475] Updated weights for policy 0, policy_version 960510 (0.0009) [2024-06-15 23:41:23,912][1652475] Updated weights for policy 0, policy_version 960575 (0.0011) [2024-06-15 23:41:25,379][1652475] Updated weights for policy 0, policy_version 960611 (0.0008) [2024-06-15 23:41:25,738][1648984] Fps is (10 sec: 108131.8, 60 sec: 103218.8, 300 sec: 103413.5). Total num frames: 1967390720. Throughput: 0: 25670.7. Samples: 491902464. Policy #0 lag: (min: 63.0, avg: 130.7, max: 319.0) [2024-06-15 23:41:25,739][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 23:41:26,040][1652475] Updated weights for policy 0, policy_version 960659 (0.0008) [2024-06-15 23:41:26,654][1652475] Updated weights for policy 0, policy_version 960710 (0.0009) [2024-06-15 23:41:27,217][1652475] Updated weights for policy 0, policy_version 960760 (0.0009) [2024-06-15 23:41:28,716][1652475] Updated weights for policy 0, policy_version 960802 (0.0012) [2024-06-15 23:41:30,380][1652475] Updated weights for policy 0, policy_version 960864 (0.0010) [2024-06-15 23:41:30,737][1648984] Fps is (10 sec: 98304.0, 60 sec: 102126.9, 300 sec: 103191.5). Total num frames: 1967882240. Throughput: 0: 25770.7. Samples: 492059136. 
Policy #0 lag: (min: 63.0, avg: 130.7, max: 319.0) [2024-06-15 23:41:30,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 23:41:31,073][1652475] Updated weights for policy 0, policy_version 960915 (0.0008) [2024-06-15 23:41:31,764][1652475] Updated weights for policy 0, policy_version 960976 (0.0011) [2024-06-15 23:41:33,353][1652475] Updated weights for policy 0, policy_version 961028 (0.0009) [2024-06-15 23:41:34,029][1652475] Updated weights for policy 0, policy_version 961084 (0.0080) [2024-06-15 23:41:35,487][1652475] Updated weights for policy 0, policy_version 961122 (0.0009) [2024-06-15 23:41:35,738][1648984] Fps is (10 sec: 101584.1, 60 sec: 101034.8, 300 sec: 103191.5). Total num frames: 1968406528. Throughput: 0: 25679.7. Samples: 492132352. Policy #0 lag: (min: 63.0, avg: 130.7, max: 319.0) [2024-06-15 23:41:35,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 23:41:36,016][1652475] Updated weights for policy 0, policy_version 961168 (0.0009) [2024-06-15 23:41:36,659][1652475] Updated weights for policy 0, policy_version 961216 (0.0008) [2024-06-15 23:41:36,728][1651340] Signal inference workers to stop experience collection... (49550 times) [2024-06-15 23:41:36,775][1652475] InferenceWorker_p0-w0: stopping experience collection (49550 times) [2024-06-15 23:41:36,867][1651340] Signal inference workers to resume experience collection... (49550 times) [2024-06-15 23:41:36,868][1652475] InferenceWorker_p0-w0: resuming experience collection (49550 times) [2024-06-15 23:41:37,348][1652475] Updated weights for policy 0, policy_version 961272 (0.0009) [2024-06-15 23:41:38,674][1652475] Updated weights for policy 0, policy_version 961318 (0.0009) [2024-06-15 23:41:39,917][1652475] Updated weights for policy 0, policy_version 961356 (0.0010) [2024-06-15 23:41:40,500][1652475] Updated weights for policy 0, policy_version 961406 (0.0043) [2024-06-15 23:41:40,738][1648984] Fps is (10 sec: 108133.6, 60 sec: 102680.9, 300 sec: 103302.5). 
Total num frames: 1968963584. Throughput: 0: 25531.9. Samples: 492278784. Policy #0 lag: (min: 63.0, avg: 130.7, max: 319.0) [2024-06-15 23:41:40,738][1648984] Avg episode reward: [(0, '-0.420')] [2024-06-15 23:41:42,259][1652475] Updated weights for policy 0, policy_version 961458 (0.0009) [2024-06-15 23:41:42,848][1652475] Updated weights for policy 0, policy_version 961504 (0.0010) [2024-06-15 23:41:43,374][1652475] Updated weights for policy 0, policy_version 961539 (0.0009) [2024-06-15 23:41:43,924][1652475] Updated weights for policy 0, policy_version 961586 (0.0011) [2024-06-15 23:41:44,823][1652475] Updated weights for policy 0, policy_version 961617 (0.0010) [2024-06-15 23:41:45,737][1648984] Fps is (10 sec: 108134.8, 60 sec: 102673.2, 300 sec: 103080.4). Total num frames: 1969487872. Throughput: 0: 25190.6. Samples: 492426752. Policy #0 lag: (min: 63.0, avg: 130.7, max: 319.0) [2024-06-15 23:41:45,738][1648984] Avg episode reward: [(0, '-0.200')] [2024-06-15 23:41:47,017][1652475] Updated weights for policy 0, policy_version 961666 (0.0009) [2024-06-15 23:41:47,649][1652475] Updated weights for policy 0, policy_version 961713 (0.0008) [2024-06-15 23:41:48,120][1652475] Updated weights for policy 0, policy_version 961748 (0.0008) [2024-06-15 23:41:48,746][1652475] Updated weights for policy 0, policy_version 961793 (0.0008) [2024-06-15 23:41:49,227][1652475] Updated weights for policy 0, policy_version 961832 (0.0007) [2024-06-15 23:41:49,827][1652475] Updated weights for policy 0, policy_version 961874 (0.0008) [2024-06-15 23:41:50,343][1652475] Updated weights for policy 0, policy_version 961920 (0.0008) [2024-06-15 23:41:50,737][1648984] Fps is (10 sec: 104858.4, 60 sec: 103765.5, 300 sec: 103080.4). Total num frames: 1970012160. Throughput: 0: 25520.3. Samples: 492509696. 
Policy #0 lag: (min: 63.0, avg: 130.7, max: 319.0) [2024-06-15 23:41:50,738][1648984] Avg episode reward: [(0, '-0.460')] [2024-06-15 23:41:52,548][1652475] Updated weights for policy 0, policy_version 961958 (0.0007) [2024-06-15 23:41:53,205][1652475] Updated weights for policy 0, policy_version 962003 (0.0008) [2024-06-15 23:41:53,798][1652475] Updated weights for policy 0, policy_version 962049 (0.0007) [2024-06-15 23:41:54,272][1652475] Updated weights for policy 0, policy_version 962087 (0.0007) [2024-06-15 23:41:54,833][1651340] Signal inference workers to stop experience collection... (49600 times) [2024-06-15 23:41:54,862][1652475] Updated weights for policy 0, policy_version 962131 (0.0007) [2024-06-15 23:41:54,897][1652475] InferenceWorker_p0-w0: stopping experience collection (49600 times) [2024-06-15 23:41:54,957][1651340] Signal inference workers to resume experience collection... (49600 times) [2024-06-15 23:41:54,957][1652475] InferenceWorker_p0-w0: resuming experience collection (49600 times) [2024-06-15 23:41:55,744][1648984] Fps is (10 sec: 104805.4, 60 sec: 104303.4, 300 sec: 103189.8). Total num frames: 1970536448. Throughput: 0: 25347.0. Samples: 492666368. Policy #0 lag: (min: 63.0, avg: 130.7, max: 319.0) [2024-06-15 23:41:55,753][1648984] Avg episode reward: [(0, '-0.440')] [2024-06-15 23:41:56,683][1652475] Updated weights for policy 0, policy_version 962183 (0.0009) [2024-06-15 23:41:57,256][1652475] Updated weights for policy 0, policy_version 962233 (0.0020) [2024-06-15 23:41:58,265][1652475] Updated weights for policy 0, policy_version 962273 (0.0008) [2024-06-15 23:41:58,702][1652475] Updated weights for policy 0, policy_version 962310 (0.0008) [2024-06-15 23:41:59,307][1652475] Updated weights for policy 0, policy_version 962356 (0.0009) [2024-06-15 23:41:59,905][1652475] Updated weights for policy 0, policy_version 962401 (0.0008) [2024-06-15 23:42:00,738][1648984] Fps is (10 sec: 104857.3, 60 sec: 103765.7, 300 sec: 103302.5). 
Total num frames: 1971060736. Throughput: 0: 25361.1. Samples: 492817408. Policy #0 lag: (min: 63.0, avg: 130.7, max: 319.0) [2024-06-15 23:42:00,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 23:42:01,729][1652475] Updated weights for policy 0, policy_version 962449 (0.0010) [2024-06-15 23:42:02,211][1652475] Updated weights for policy 0, policy_version 962496 (0.0008) [2024-06-15 23:42:03,307][1652475] Updated weights for policy 0, policy_version 962544 (0.0008) [2024-06-15 23:42:04,087][1652475] Updated weights for policy 0, policy_version 962593 (0.0010) [2024-06-15 23:42:05,644][1652475] Updated weights for policy 0, policy_version 962646 (0.0011) [2024-06-15 23:42:05,737][1648984] Fps is (10 sec: 98352.4, 60 sec: 102126.8, 300 sec: 103191.4). Total num frames: 1971519488. Throughput: 0: 25793.4. Samples: 492902912. Policy #0 lag: (min: 63.0, avg: 130.7, max: 319.0) [2024-06-15 23:42:05,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 23:42:07,034][1652475] Updated weights for policy 0, policy_version 962704 (0.0011) [2024-06-15 23:42:07,741][1652475] Updated weights for policy 0, policy_version 962753 (0.0009) [2024-06-15 23:42:08,369][1652475] Updated weights for policy 0, policy_version 962802 (0.0010) [2024-06-15 23:42:09,076][1652475] Updated weights for policy 0, policy_version 962849 (0.0014) [2024-06-15 23:42:10,738][1648984] Fps is (10 sec: 91748.8, 60 sec: 102126.8, 300 sec: 103080.3). Total num frames: 1971978240. Throughput: 0: 25315.6. Samples: 493041664. 
Policy #0 lag: (min: 63.0, avg: 130.7, max: 319.0) [2024-06-15 23:42:10,739][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 23:42:11,484][1652475] Updated weights for policy 0, policy_version 962899 (0.0010) [2024-06-15 23:42:12,133][1652475] Updated weights for policy 0, policy_version 962946 (0.0012) [2024-06-15 23:42:12,779][1652475] Updated weights for policy 0, policy_version 962999 (0.0009) [2024-06-15 23:42:13,338][1651340] Signal inference workers to stop experience collection... (49650 times) [2024-06-15 23:42:13,394][1652475] InferenceWorker_p0-w0: stopping experience collection (49650 times) [2024-06-15 23:42:13,499][1651340] Signal inference workers to resume experience collection... (49650 times) [2024-06-15 23:42:13,500][1652475] InferenceWorker_p0-w0: resuming experience collection (49650 times) [2024-06-15 23:42:13,650][1652475] Updated weights for policy 0, policy_version 963071 (0.0056) [2024-06-15 23:42:14,275][1652475] Updated weights for policy 0, policy_version 963107 (0.0012) [2024-06-15 23:42:15,738][1648984] Fps is (10 sec: 98303.5, 60 sec: 103219.2, 300 sec: 103080.3). Total num frames: 1972502528. Throughput: 0: 25258.6. Samples: 493195776. Policy #0 lag: (min: 63.0, avg: 130.7, max: 319.0) [2024-06-15 23:42:15,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 23:42:16,493][1652475] Updated weights for policy 0, policy_version 963152 (0.0011) [2024-06-15 23:42:17,020][1652475] Updated weights for policy 0, policy_version 963186 (0.0008) [2024-06-15 23:42:17,536][1652475] Updated weights for policy 0, policy_version 963232 (0.0009) [2024-06-15 23:42:18,279][1652475] Updated weights for policy 0, policy_version 963281 (0.0008) [2024-06-15 23:42:18,774][1652475] Updated weights for policy 0, policy_version 963328 (0.0008) [2024-06-15 23:42:19,946][1652475] Updated weights for policy 0, policy_version 963366 (0.0008) [2024-06-15 23:42:20,737][1648984] Fps is (10 sec: 104859.6, 60 sec: 102126.9, 300 sec: 103080.4). 
Total num frames: 1973026816. Throughput: 0: 25395.2. Samples: 493275136. Policy #0 lag: (min: 114.0, avg: 168.9, max: 322.0) [2024-06-15 23:42:20,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 23:42:21,558][1652475] Updated weights for policy 0, policy_version 963421 (0.0013) [2024-06-15 23:42:22,120][1652475] Updated weights for policy 0, policy_version 963459 (0.0011) [2024-06-15 23:42:22,951][1652475] Updated weights for policy 0, policy_version 963525 (0.0054) [2024-06-15 23:42:23,528][1652475] Updated weights for policy 0, policy_version 963575 (0.0009) [2024-06-15 23:42:25,019][1652475] Updated weights for policy 0, policy_version 963606 (0.0009) [2024-06-15 23:42:25,741][1648984] Fps is (10 sec: 104855.8, 60 sec: 102673.2, 300 sec: 103080.3). Total num frames: 1973551104. Throughput: 0: 25656.8. Samples: 493433344. Policy #0 lag: (min: 114.0, avg: 168.9, max: 322.0) [2024-06-15 23:42:25,746][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 23:42:26,401][1652475] Updated weights for policy 0, policy_version 963655 (0.0008) [2024-06-15 23:42:26,992][1652475] Updated weights for policy 0, policy_version 963697 (0.0009) [2024-06-15 23:42:27,606][1652475] Updated weights for policy 0, policy_version 963745 (0.0008) [2024-06-15 23:42:28,078][1652475] Updated weights for policy 0, policy_version 963783 (0.0008) [2024-06-15 23:42:28,678][1652475] Updated weights for policy 0, policy_version 963833 (0.0008) [2024-06-15 23:42:30,357][1652475] Updated weights for policy 0, policy_version 963878 (0.0007) [2024-06-15 23:42:30,738][1648984] Fps is (10 sec: 104857.5, 60 sec: 103219.2, 300 sec: 103080.3). Total num frames: 1974075392. Throughput: 0: 25918.5. Samples: 493593088. 
Policy #0 lag: (min: 114.0, avg: 168.9, max: 322.0) [2024-06-15 23:42:30,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 23:42:31,548][1652475] Updated weights for policy 0, policy_version 963936 (0.0010) [2024-06-15 23:42:31,843][1651340] Signal inference workers to stop experience collection... (49700 times) [2024-06-15 23:42:31,906][1652475] InferenceWorker_p0-w0: stopping experience collection (49700 times) [2024-06-15 23:42:31,968][1651340] Signal inference workers to resume experience collection... (49700 times) [2024-06-15 23:42:31,969][1652475] InferenceWorker_p0-w0: resuming experience collection (49700 times) [2024-06-15 23:42:32,175][1652475] Updated weights for policy 0, policy_version 963984 (0.0070) [2024-06-15 23:42:32,871][1652475] Updated weights for policy 0, policy_version 964033 (0.0009) [2024-06-15 23:42:33,569][1652475] Updated weights for policy 0, policy_version 964094 (0.0009) [2024-06-15 23:42:35,315][1652475] Updated weights for policy 0, policy_version 964133 (0.0009) [2024-06-15 23:42:35,738][1648984] Fps is (10 sec: 104860.0, 60 sec: 103219.2, 300 sec: 103080.4). Total num frames: 1974599680. Throughput: 0: 25543.1. Samples: 493659136. Policy #0 lag: (min: 114.0, avg: 168.9, max: 322.0) [2024-06-15 23:42:35,738][1648984] Avg episode reward: [(0, '-0.230')] [2024-06-15 23:42:36,346][1652475] Updated weights for policy 0, policy_version 964177 (0.0034) [2024-06-15 23:42:36,878][1652475] Updated weights for policy 0, policy_version 964224 (0.0011) [2024-06-15 23:42:38,241][1652475] Updated weights for policy 0, policy_version 964275 (0.0012) [2024-06-15 23:42:38,890][1652475] Updated weights for policy 0, policy_version 964322 (0.0021) [2024-06-15 23:42:39,794][1652475] Updated weights for policy 0, policy_version 964372 (0.0010) [2024-06-15 23:42:40,253][1652475] Updated weights for policy 0, policy_version 964414 (0.0013) [2024-06-15 23:42:40,738][1648984] Fps is (10 sec: 104856.0, 60 sec: 102672.9, 300 sec: 103080.3). 
Total num frames: 1975123968. Throughput: 0: 25500.3. Samples: 493813760. Policy #0 lag: (min: 114.0, avg: 168.9, max: 322.0) [2024-06-15 23:42:40,739][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 23:42:41,198][1652475] Updated weights for policy 0, policy_version 964454 (0.0010) [2024-06-15 23:42:43,510][1652475] Updated weights for policy 0, policy_version 964498 (0.0015) [2024-06-15 23:42:44,110][1652475] Updated weights for policy 0, policy_version 964549 (0.0008) [2024-06-15 23:42:44,665][1652475] Updated weights for policy 0, policy_version 964596 (0.0010) [2024-06-15 23:42:45,281][1652475] Updated weights for policy 0, policy_version 964643 (0.0008) [2024-06-15 23:42:45,738][1648984] Fps is (10 sec: 104857.5, 60 sec: 102673.0, 300 sec: 103080.4). Total num frames: 1975648256. Throughput: 0: 25634.1. Samples: 493970944. Policy #0 lag: (min: 114.0, avg: 168.9, max: 322.0) [2024-06-15 23:42:45,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 23:42:45,897][1652475] Updated weights for policy 0, policy_version 964691 (0.0009) [2024-06-15 23:42:48,170][1652475] Updated weights for policy 0, policy_version 964739 (0.0009) [2024-06-15 23:42:48,738][1652475] Updated weights for policy 0, policy_version 964789 (0.0009) [2024-06-15 23:42:49,724][1652475] Updated weights for policy 0, policy_version 964848 (0.0081) [2024-06-15 23:42:50,226][1651340] Signal inference workers to stop experience collection... (49750 times) [2024-06-15 23:42:50,257][1652475] Updated weights for policy 0, policy_version 964883 (0.0010) [2024-06-15 23:42:50,296][1652475] InferenceWorker_p0-w0: stopping experience collection (49750 times) [2024-06-15 23:42:50,359][1651340] Signal inference workers to resume experience collection... (49750 times) [2024-06-15 23:42:50,359][1652475] InferenceWorker_p0-w0: resuming experience collection (49750 times) [2024-06-15 23:42:50,737][1648984] Fps is (10 sec: 101582.4, 60 sec: 102126.9, 300 sec: 102969.3). 
Total num frames: 1976139776. Throughput: 0: 25679.7. Samples: 494058496. Policy #0 lag: (min: 114.0, avg: 168.9, max: 322.0) [2024-06-15 23:42:50,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 23:42:50,863][1652475] Updated weights for policy 0, policy_version 964930 (0.0008) [2024-06-15 23:42:51,326][1652475] Updated weights for policy 0, policy_version 964967 (0.0007) [2024-06-15 23:42:53,093][1652475] Updated weights for policy 0, policy_version 965015 (0.0011) [2024-06-15 23:42:54,394][1652475] Updated weights for policy 0, policy_version 965072 (0.0014) [2024-06-15 23:42:54,916][1652475] Updated weights for policy 0, policy_version 965109 (0.0009) [2024-06-15 23:42:55,537][1652475] Updated weights for policy 0, policy_version 965155 (0.0009) [2024-06-15 23:42:55,738][1648984] Fps is (10 sec: 101579.5, 60 sec: 102135.1, 300 sec: 103302.5). Total num frames: 1976664064. Throughput: 0: 26089.3. Samples: 494215680. Policy #0 lag: (min: 114.0, avg: 168.9, max: 322.0) [2024-06-15 23:42:55,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 23:42:56,093][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000965200_1976729600.pth... 
[2024-06-15 23:42:56,094][1652475] Updated weights for policy 0, policy_version 965200 (0.0010) [2024-06-15 23:42:56,188][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000953104_1951956992.pth [2024-06-15 23:42:56,680][1652475] Updated weights for policy 0, policy_version 965246 (0.0010) [2024-06-15 23:42:58,279][1652475] Updated weights for policy 0, policy_version 965283 (0.0009) [2024-06-15 23:42:59,236][1652475] Updated weights for policy 0, policy_version 965328 (0.0009) [2024-06-15 23:42:59,745][1652475] Updated weights for policy 0, policy_version 965363 (0.0008) [2024-06-15 23:43:00,333][1652475] Updated weights for policy 0, policy_version 965409 (0.0008) [2024-06-15 23:43:00,738][1648984] Fps is (10 sec: 108132.3, 60 sec: 102672.7, 300 sec: 103302.4). Total num frames: 1977221120. Throughput: 0: 25929.9. Samples: 494362624. Policy #0 lag: (min: 114.0, avg: 168.9, max: 322.0) [2024-06-15 23:43:00,739][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 23:43:00,962][1652475] Updated weights for policy 0, policy_version 965456 (0.0009) [2024-06-15 23:43:01,531][1652475] Updated weights for policy 0, policy_version 965502 (0.0008) [2024-06-15 23:43:03,048][1652475] Updated weights for policy 0, policy_version 965552 (0.0011) [2024-06-15 23:43:03,873][1652475] Updated weights for policy 0, policy_version 965588 (0.0008) [2024-06-15 23:43:04,476][1652475] Updated weights for policy 0, policy_version 965634 (0.0008) [2024-06-15 23:43:05,086][1652475] Updated weights for policy 0, policy_version 965690 (0.0007) [2024-06-15 23:43:05,738][1648984] Fps is (10 sec: 108135.7, 60 sec: 103765.3, 300 sec: 103191.4). Total num frames: 1977745408. Throughput: 0: 26055.1. Samples: 494447616. 
Policy #0 lag: (min: 114.0, avg: 168.9, max: 322.0) [2024-06-15 23:43:05,738][1648984] Avg episode reward: [(0, '-0.380')] [2024-06-15 23:43:06,808][1652475] Updated weights for policy 0, policy_version 965730 (0.0008) [2024-06-15 23:43:08,070][1652475] Updated weights for policy 0, policy_version 965776 (0.0009) [2024-06-15 23:43:08,588][1652475] Updated weights for policy 0, policy_version 965811 (0.0009) [2024-06-15 23:43:08,768][1651340] Signal inference workers to stop experience collection... (49800 times) [2024-06-15 23:43:08,787][1652475] InferenceWorker_p0-w0: stopping experience collection (49800 times) [2024-06-15 23:43:08,877][1651340] Signal inference workers to resume experience collection... (49800 times) [2024-06-15 23:43:08,877][1652475] InferenceWorker_p0-w0: resuming experience collection (49800 times) [2024-06-15 23:43:09,191][1652475] Updated weights for policy 0, policy_version 965861 (0.0008) [2024-06-15 23:43:09,833][1652475] Updated weights for policy 0, policy_version 965908 (0.0008) [2024-06-15 23:43:10,275][1652475] Updated weights for policy 0, policy_version 965952 (0.0009) [2024-06-15 23:43:10,739][1648984] Fps is (10 sec: 104857.6, 60 sec: 104857.6, 300 sec: 103080.3). Total num frames: 1978269696. Throughput: 0: 26043.8. Samples: 494605312. Policy #0 lag: (min: 114.0, avg: 168.9, max: 322.0) [2024-06-15 23:43:10,740][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 23:43:11,999][1652475] Updated weights for policy 0, policy_version 965993 (0.0010) [2024-06-15 23:43:12,826][1652475] Updated weights for policy 0, policy_version 966035 (0.0009) [2024-06-15 23:43:13,497][1652475] Updated weights for policy 0, policy_version 966096 (0.0007) [2024-06-15 23:43:14,101][1652475] Updated weights for policy 0, policy_version 966144 (0.0008) [2024-06-15 23:43:14,755][1652475] Updated weights for policy 0, policy_version 966194 (0.0009) [2024-06-15 23:43:15,737][1648984] Fps is (10 sec: 104858.1, 60 sec: 104857.8, 300 sec: 103080.4). 
Total num frames: 1978793984. Throughput: 0: 25964.1. Samples: 494761472. Policy #0 lag: (min: 114.0, avg: 168.9, max: 322.0) [2024-06-15 23:43:15,738][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 23:43:16,370][1652475] Updated weights for policy 0, policy_version 966224 (0.0011) [2024-06-15 23:43:17,378][1652475] Updated weights for policy 0, policy_version 966273 (0.0009) [2024-06-15 23:43:17,917][1652475] Updated weights for policy 0, policy_version 966320 (0.0010) [2024-06-15 23:43:18,472][1652475] Updated weights for policy 0, policy_version 966358 (0.0008) [2024-06-15 23:43:19,681][1652475] Updated weights for policy 0, policy_version 966419 (0.0009) [2024-06-15 23:43:20,738][1648984] Fps is (10 sec: 104859.3, 60 sec: 104857.5, 300 sec: 103080.3). Total num frames: 1979318272. Throughput: 0: 26339.5. Samples: 494844416. Policy #0 lag: (min: 114.0, avg: 168.9, max: 322.0) [2024-06-15 23:43:20,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 23:43:21,287][1652475] Updated weights for policy 0, policy_version 966465 (0.0012) [2024-06-15 23:43:21,903][1652475] Updated weights for policy 0, policy_version 966521 (0.0012) [2024-06-15 23:43:22,717][1652475] Updated weights for policy 0, policy_version 966561 (0.0009) [2024-06-15 23:43:23,254][1652475] Updated weights for policy 0, policy_version 966608 (0.0009) [2024-06-15 23:43:24,353][1652475] Updated weights for policy 0, policy_version 966657 (0.0009) [2024-06-15 23:43:25,074][1652475] Updated weights for policy 0, policy_version 966720 (0.0010) [2024-06-15 23:43:25,738][1648984] Fps is (10 sec: 104856.9, 60 sec: 104857.9, 300 sec: 103080.3). Total num frames: 1979842560. Throughput: 0: 26373.8. Samples: 495000576. 
Policy #0 lag: (min: 114.0, avg: 168.9, max: 322.0) [2024-06-15 23:43:25,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 23:43:26,991][1652475] Updated weights for policy 0, policy_version 966783 (0.0008) [2024-06-15 23:43:27,260][1651340] Signal inference workers to stop experience collection... (49850 times) [2024-06-15 23:43:27,283][1652475] InferenceWorker_p0-w0: stopping experience collection (49850 times) [2024-06-15 23:43:27,376][1651340] Signal inference workers to resume experience collection... (49850 times) [2024-06-15 23:43:27,376][1652475] InferenceWorker_p0-w0: resuming experience collection (49850 times) [2024-06-15 23:43:27,716][1652475] Updated weights for policy 0, policy_version 966819 (0.0010) [2024-06-15 23:43:28,331][1652475] Updated weights for policy 0, policy_version 966866 (0.0008) [2024-06-15 23:43:29,399][1652475] Updated weights for policy 0, policy_version 966928 (0.0008) [2024-06-15 23:43:29,938][1652475] Updated weights for policy 0, policy_version 966972 (0.0009) [2024-06-15 23:43:30,738][1648984] Fps is (10 sec: 104858.0, 60 sec: 104857.6, 300 sec: 103080.4). Total num frames: 1980366848. Throughput: 0: 26407.8. Samples: 495159296. Policy #0 lag: (min: 114.0, avg: 168.9, max: 322.0) [2024-06-15 23:43:30,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 23:43:31,429][1652475] Updated weights for policy 0, policy_version 967009 (0.0009) [2024-06-15 23:43:32,412][1652475] Updated weights for policy 0, policy_version 967056 (0.0009) [2024-06-15 23:43:33,050][1652475] Updated weights for policy 0, policy_version 967104 (0.0009) [2024-06-15 23:43:33,665][1652475] Updated weights for policy 0, policy_version 967152 (0.0008) [2024-06-15 23:43:34,253][1652475] Updated weights for policy 0, policy_version 967184 (0.0010) [2024-06-15 23:43:34,786][1652475] Updated weights for policy 0, policy_version 967228 (0.0009) [2024-06-15 23:43:35,737][1648984] Fps is (10 sec: 108135.0, 60 sec: 105403.8, 300 sec: 103191.5). 
Total num frames: 1980923904. Throughput: 0: 26271.3. Samples: 495240704. Policy #0 lag: (min: 114.0, avg: 168.9, max: 322.0) [2024-06-15 23:43:35,738][1648984] Avg episode reward: [(0, '-0.250')] [2024-06-15 23:43:35,967][1652475] Updated weights for policy 0, policy_version 967270 (0.0008) [2024-06-15 23:43:38,077][1652475] Updated weights for policy 0, policy_version 967313 (0.0012) [2024-06-15 23:43:38,748][1652475] Updated weights for policy 0, policy_version 967363 (0.0009) [2024-06-15 23:43:39,393][1652475] Updated weights for policy 0, policy_version 967409 (0.0009) [2024-06-15 23:43:40,001][1652475] Updated weights for policy 0, policy_version 967456 (0.0009) [2024-06-15 23:43:40,547][1652475] Updated weights for policy 0, policy_version 967491 (0.0008) [2024-06-15 23:43:40,738][1648984] Fps is (10 sec: 108132.2, 60 sec: 105403.7, 300 sec: 103302.4). Total num frames: 1981448192. Throughput: 0: 26259.9. Samples: 495397376. Policy #0 lag: (min: 114.0, avg: 168.9, max: 322.0) [2024-06-15 23:43:40,739][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 23:43:41,189][1652475] Updated weights for policy 0, policy_version 967548 (0.0008) [2024-06-15 23:43:43,282][1652475] Updated weights for policy 0, policy_version 967585 (0.0011) [2024-06-15 23:43:44,224][1652475] Updated weights for policy 0, policy_version 967632 (0.0010) [2024-06-15 23:43:44,738][1652475] Updated weights for policy 0, policy_version 967667 (0.0008) [2024-06-15 23:43:45,340][1651340] Signal inference workers to stop experience collection... (49900 times) [2024-06-15 23:43:45,349][1652475] Updated weights for policy 0, policy_version 967713 (0.0008) [2024-06-15 23:43:45,398][1652475] InferenceWorker_p0-w0: stopping experience collection (49900 times) [2024-06-15 23:43:45,470][1651340] Signal inference workers to resume experience collection... 
(49900 times) [2024-06-15 23:43:45,471][1652475] InferenceWorker_p0-w0: resuming experience collection (49900 times) [2024-06-15 23:43:45,737][1648984] Fps is (10 sec: 101581.0, 60 sec: 104857.7, 300 sec: 103413.6). Total num frames: 1981939712. Throughput: 0: 26260.0. Samples: 495544320. Policy #0 lag: (min: 114.0, avg: 168.9, max: 322.0) [2024-06-15 23:43:45,738][1648984] Avg episode reward: [(0, '-0.430')] [2024-06-15 23:43:45,967][1652475] Updated weights for policy 0, policy_version 967762 (0.0008) [2024-06-15 23:43:47,770][1652475] Updated weights for policy 0, policy_version 967812 (0.0010) [2024-06-15 23:43:48,325][1652475] Updated weights for policy 0, policy_version 967865 (0.0008) [2024-06-15 23:43:49,573][1652475] Updated weights for policy 0, policy_version 967904 (0.0008) [2024-06-15 23:43:50,238][1652475] Updated weights for policy 0, policy_version 967954 (0.0008) [2024-06-15 23:43:50,738][1648984] Fps is (10 sec: 98305.1, 60 sec: 104857.5, 300 sec: 103524.7). Total num frames: 1982431232. Throughput: 0: 26112.0. Samples: 495622656. Policy #0 lag: (min: 114.0, avg: 168.9, max: 322.0) [2024-06-15 23:43:50,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 23:43:50,850][1652475] Updated weights for policy 0, policy_version 968003 (0.0008) [2024-06-15 23:43:51,367][1652475] Updated weights for policy 0, policy_version 968048 (0.0007) [2024-06-15 23:43:52,838][1652475] Updated weights for policy 0, policy_version 968097 (0.0010) [2024-06-15 23:43:54,362][1652475] Updated weights for policy 0, policy_version 968148 (0.0012) [2024-06-15 23:43:54,949][1652475] Updated weights for policy 0, policy_version 968197 (0.0009) [2024-06-15 23:43:55,738][1648984] Fps is (10 sec: 104856.2, 60 sec: 105403.8, 300 sec: 103746.8). Total num frames: 1982988288. Throughput: 0: 26351.0. Samples: 495791104. 
Policy #0 lag: (min: 114.0, avg: 168.9, max: 322.0) [2024-06-15 23:43:55,741][1648984] Avg episode reward: [(0, '-0.230')] [2024-06-15 23:43:55,765][1652475] Updated weights for policy 0, policy_version 968263 (0.0012) [2024-06-15 23:43:56,356][1652475] Updated weights for policy 0, policy_version 968317 (0.0011) [2024-06-15 23:43:57,993][1652475] Updated weights for policy 0, policy_version 968368 (0.0011) [2024-06-15 23:43:59,121][1652475] Updated weights for policy 0, policy_version 968420 (0.0012) [2024-06-15 23:43:59,621][1652475] Updated weights for policy 0, policy_version 968464 (0.0009) [2024-06-15 23:44:00,158][1652475] Updated weights for policy 0, policy_version 968512 (0.0013) [2024-06-15 23:44:00,738][1648984] Fps is (10 sec: 108134.6, 60 sec: 104857.8, 300 sec: 103524.8). Total num frames: 1983512576. Throughput: 0: 26123.3. Samples: 495937024. Policy #0 lag: (min: 114.0, avg: 168.9, max: 322.0) [2024-06-15 23:44:00,738][1648984] Avg episode reward: [(0, '-0.340')] [2024-06-15 23:44:01,730][1652475] Updated weights for policy 0, policy_version 968567 (0.0010) [2024-06-15 23:44:02,696][1652475] Updated weights for policy 0, policy_version 968608 (0.0010) [2024-06-15 23:44:03,217][1652475] Updated weights for policy 0, policy_version 968645 (0.0009) [2024-06-15 23:44:03,364][1651340] Signal inference workers to stop experience collection... (49950 times) [2024-06-15 23:44:03,405][1652475] InferenceWorker_p0-w0: stopping experience collection (49950 times) [2024-06-15 23:44:03,519][1651340] Signal inference workers to resume experience collection... (49950 times) [2024-06-15 23:44:03,520][1652475] InferenceWorker_p0-w0: resuming experience collection (49950 times) [2024-06-15 23:44:03,947][1652475] Updated weights for policy 0, policy_version 968704 (0.0014) [2024-06-15 23:44:04,787][1652475] Updated weights for policy 0, policy_version 968765 (0.0011) [2024-06-15 23:44:05,738][1648984] Fps is (10 sec: 104858.8, 60 sec: 104857.7, 300 sec: 103413.7). 
Total num frames: 1984036864. Throughput: 0: 26180.3. Samples: 496022528. Policy #0 lag: (min: 114.0, avg: 168.9, max: 322.0) [2024-06-15 23:44:05,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 23:44:07,605][1652475] Updated weights for policy 0, policy_version 968817 (0.0009) [2024-06-15 23:44:08,244][1652475] Updated weights for policy 0, policy_version 968867 (0.0010) [2024-06-15 23:44:08,920][1652475] Updated weights for policy 0, policy_version 968917 (0.0009) [2024-06-15 23:44:09,571][1652475] Updated weights for policy 0, policy_version 968962 (0.0008) [2024-06-15 23:44:10,087][1652475] Updated weights for policy 0, policy_version 969008 (0.0008) [2024-06-15 23:44:10,738][1648984] Fps is (10 sec: 104856.5, 60 sec: 104857.7, 300 sec: 103414.5). Total num frames: 1984561152. Throughput: 0: 25941.3. Samples: 496167936. Policy #0 lag: (min: 114.0, avg: 168.9, max: 322.0) [2024-06-15 23:44:10,739][1648984] Avg episode reward: [(0, '-0.220')] [2024-06-15 23:44:12,392][1652475] Updated weights for policy 0, policy_version 969059 (0.0076) [2024-06-15 23:44:12,736][1652475] Updated weights for policy 0, policy_version 969088 (0.0071) [2024-06-15 23:44:13,434][1652475] Updated weights for policy 0, policy_version 969148 (0.0011) [2024-06-15 23:44:14,081][1652475] Updated weights for policy 0, policy_version 969185 (0.0009) [2024-06-15 23:44:15,455][1652475] Updated weights for policy 0, policy_version 969240 (0.0011) [2024-06-15 23:44:15,738][1648984] Fps is (10 sec: 101579.1, 60 sec: 104311.2, 300 sec: 103635.7). Total num frames: 1985052672. Throughput: 0: 25918.5. Samples: 496325632. 
Policy #0 lag: (min: 114.0, avg: 168.9, max: 322.0) [2024-06-15 23:44:15,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 23:44:16,606][1652475] Updated weights for policy 0, policy_version 969283 (0.0009) [2024-06-15 23:44:17,639][1652475] Updated weights for policy 0, policy_version 969360 (0.0010) [2024-06-15 23:44:18,856][1652475] Updated weights for policy 0, policy_version 969409 (0.0036) [2024-06-15 23:44:19,452][1652475] Updated weights for policy 0, policy_version 969465 (0.0010) [2024-06-15 23:44:20,737][1648984] Fps is (10 sec: 95029.3, 60 sec: 103219.4, 300 sec: 103413.6). Total num frames: 1985511424. Throughput: 0: 25622.8. Samples: 496393728. Policy #0 lag: (min: 11.0, avg: 110.5, max: 267.0) [2024-06-15 23:44:20,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 23:44:20,990][1652475] Updated weights for policy 0, policy_version 969520 (0.0012) [2024-06-15 23:44:22,211][1652475] Updated weights for policy 0, policy_version 969568 (0.0009) [2024-06-15 23:44:22,856][1651340] Signal inference workers to stop experience collection... (50000 times) [2024-06-15 23:44:22,877][1652475] InferenceWorker_p0-w0: stopping experience collection (50000 times) [2024-06-15 23:44:22,878][1652475] Updated weights for policy 0, policy_version 969618 (0.0008) [2024-06-15 23:44:22,983][1651340] Signal inference workers to resume experience collection... (50000 times) [2024-06-15 23:44:22,984][1652475] InferenceWorker_p0-w0: resuming experience collection (50000 times) [2024-06-15 23:44:23,547][1652475] Updated weights for policy 0, policy_version 969668 (0.0009) [2024-06-15 23:44:24,181][1652475] Updated weights for policy 0, policy_version 969728 (0.0009) [2024-06-15 23:44:25,737][1648984] Fps is (10 sec: 101582.5, 60 sec: 103765.4, 300 sec: 103524.7). Total num frames: 1986068480. Throughput: 0: 25668.4. Samples: 496552448. 
Policy #0 lag: (min: 11.0, avg: 110.5, max: 267.0) [2024-06-15 23:44:25,738][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 23:44:26,033][1652475] Updated weights for policy 0, policy_version 969787 (0.0013) [2024-06-15 23:44:27,465][1652475] Updated weights for policy 0, policy_version 969849 (0.0009) [2024-06-15 23:44:28,134][1652475] Updated weights for policy 0, policy_version 969889 (0.0008) [2024-06-15 23:44:28,417][1652475] Updated weights for policy 0, policy_version 969920 (0.0009) [2024-06-15 23:44:29,271][1652475] Updated weights for policy 0, policy_version 969975 (0.0012) [2024-06-15 23:44:30,732][1652475] Updated weights for policy 0, policy_version 970017 (0.0010) [2024-06-15 23:44:30,743][1648984] Fps is (10 sec: 108089.7, 60 sec: 103758.3, 300 sec: 103301.1). Total num frames: 1986592768. Throughput: 0: 26041.4. Samples: 496716288. Policy #0 lag: (min: 11.0, avg: 110.5, max: 267.0) [2024-06-15 23:44:30,744][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 23:44:31,460][1652475] Updated weights for policy 0, policy_version 970053 (0.0008) [2024-06-15 23:44:32,053][1652475] Updated weights for policy 0, policy_version 970109 (0.0008) [2024-06-15 23:44:33,690][1652475] Updated weights for policy 0, policy_version 970149 (0.0008) [2024-06-15 23:44:34,258][1652475] Updated weights for policy 0, policy_version 970196 (0.0009) [2024-06-15 23:44:34,908][1652475] Updated weights for policy 0, policy_version 970243 (0.0013) [2024-06-15 23:44:35,503][1652475] Updated weights for policy 0, policy_version 970302 (0.0010) [2024-06-15 23:44:35,738][1648984] Fps is (10 sec: 111410.2, 60 sec: 104311.3, 300 sec: 103635.7). Total num frames: 1987182592. Throughput: 0: 26157.5. Samples: 496799744. 
Policy #0 lag: (min: 11.0, avg: 110.5, max: 267.0) [2024-06-15 23:44:35,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 23:44:36,450][1652475] Updated weights for policy 0, policy_version 970352 (0.0011) [2024-06-15 23:44:39,049][1652475] Updated weights for policy 0, policy_version 970400 (0.0009) [2024-06-15 23:44:39,380][1652475] Updated weights for policy 0, policy_version 970430 (0.0010) [2024-06-15 23:44:40,366][1652475] Updated weights for policy 0, policy_version 970481 (0.0018) [2024-06-15 23:44:40,738][1648984] Fps is (10 sec: 101621.2, 60 sec: 102673.2, 300 sec: 103413.6). Total num frames: 1987608576. Throughput: 0: 25804.8. Samples: 496952320. Policy #0 lag: (min: 11.0, avg: 110.5, max: 267.0) [2024-06-15 23:44:40,741][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 23:44:41,058][1652475] Updated weights for policy 0, policy_version 970544 (0.0009) [2024-06-15 23:44:41,746][1651340] Signal inference workers to stop experience collection... (50050 times) [2024-06-15 23:44:41,788][1652475] Updated weights for policy 0, policy_version 970596 (0.0010) [2024-06-15 23:44:41,801][1652475] InferenceWorker_p0-w0: stopping experience collection (50050 times) [2024-06-15 23:44:41,887][1651340] Signal inference workers to resume experience collection... (50050 times) [2024-06-15 23:44:41,888][1652475] InferenceWorker_p0-w0: resuming experience collection (50050 times) [2024-06-15 23:44:42,090][1652475] Updated weights for policy 0, policy_version 970624 (0.0011) [2024-06-15 23:44:44,296][1652475] Updated weights for policy 0, policy_version 970683 (0.0009) [2024-06-15 23:44:45,407][1652475] Updated weights for policy 0, policy_version 970740 (0.0011) [2024-06-15 23:44:45,738][1648984] Fps is (10 sec: 95027.1, 60 sec: 103219.0, 300 sec: 103746.8). Total num frames: 1988132864. Throughput: 0: 25895.8. Samples: 497102336. 
Policy #0 lag: (min: 11.0, avg: 110.5, max: 267.0) [2024-06-15 23:44:45,738][1648984] Avg episode reward: [(0, '-0.250')] [2024-06-15 23:44:46,001][1652475] Updated weights for policy 0, policy_version 970790 (0.0009) [2024-06-15 23:44:46,800][1652475] Updated weights for policy 0, policy_version 970835 (0.0011) [2024-06-15 23:44:48,260][1652475] Updated weights for policy 0, policy_version 970882 (0.0011) [2024-06-15 23:44:48,918][1652475] Updated weights for policy 0, policy_version 970943 (0.0009) [2024-06-15 23:44:50,311][1652475] Updated weights for policy 0, policy_version 970999 (0.0011) [2024-06-15 23:44:50,738][1648984] Fps is (10 sec: 104858.0, 60 sec: 103765.4, 300 sec: 103746.8). Total num frames: 1988657152. Throughput: 0: 25782.0. Samples: 497182720. Policy #0 lag: (min: 11.0, avg: 110.5, max: 267.0) [2024-06-15 23:44:50,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 23:44:50,988][1652475] Updated weights for policy 0, policy_version 971056 (0.0009) [2024-06-15 23:44:52,008][1652475] Updated weights for policy 0, policy_version 971104 (0.0010) [2024-06-15 23:44:53,689][1652475] Updated weights for policy 0, policy_version 971156 (0.0011) [2024-06-15 23:44:54,646][1652475] Updated weights for policy 0, policy_version 971205 (0.0009) [2024-06-15 23:44:55,219][1652475] Updated weights for policy 0, policy_version 971261 (0.0008) [2024-06-15 23:44:55,738][1648984] Fps is (10 sec: 101580.3, 60 sec: 102673.0, 300 sec: 103524.6). Total num frames: 1989148672. Throughput: 0: 25747.9. Samples: 497326592. Policy #0 lag: (min: 11.0, avg: 110.5, max: 267.0) [2024-06-15 23:44:55,740][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 23:44:55,745][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000971264_1989148672.pth... 
[2024-06-15 23:44:55,790][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000959152_1964343296.pth [2024-06-15 23:44:56,622][1652475] Updated weights for policy 0, policy_version 971302 (0.0008) [2024-06-15 23:44:57,294][1652475] Updated weights for policy 0, policy_version 971360 (0.0010) [2024-06-15 23:44:57,625][1652475] Updated weights for policy 0, policy_version 971390 (0.0011) [2024-06-15 23:44:58,875][1652475] Updated weights for policy 0, policy_version 971440 (0.0012) [2024-06-15 23:44:59,664][1652475] Updated weights for policy 0, policy_version 971504 (0.0011) [2024-06-15 23:45:00,738][1648984] Fps is (10 sec: 101580.6, 60 sec: 102673.0, 300 sec: 103524.6). Total num frames: 1989672960. Throughput: 0: 25645.5. Samples: 497479680. Policy #0 lag: (min: 11.0, avg: 110.5, max: 267.0) [2024-06-15 23:45:00,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 23:45:01,230][1652475] Updated weights for policy 0, policy_version 971552 (0.0009) [2024-06-15 23:45:02,626][1651340] Signal inference workers to stop experience collection... (50100 times) [2024-06-15 23:45:02,653][1652475] InferenceWorker_p0-w0: stopping experience collection (50100 times) [2024-06-15 23:45:02,747][1651340] Signal inference workers to resume experience collection... 
(50100 times) [2024-06-15 23:45:02,748][1652475] InferenceWorker_p0-w0: resuming experience collection (50100 times) [2024-06-15 23:45:02,844][1652475] Updated weights for policy 0, policy_version 971601 (0.0012) [2024-06-15 23:45:03,241][1652475] Updated weights for policy 0, policy_version 971640 (0.0010) [2024-06-15 23:45:03,949][1652475] Updated weights for policy 0, policy_version 971683 (0.0009) [2024-06-15 23:45:04,474][1652475] Updated weights for policy 0, policy_version 971716 (0.0009) [2024-06-15 23:45:05,105][1652475] Updated weights for policy 0, policy_version 971769 (0.0011) [2024-06-15 23:45:05,738][1648984] Fps is (10 sec: 104858.9, 60 sec: 102673.0, 300 sec: 103746.8). Total num frames: 1990197248. Throughput: 0: 25986.8. Samples: 497563136. Policy #0 lag: (min: 11.0, avg: 110.5, max: 267.0) [2024-06-15 23:45:05,738][1648984] Avg episode reward: [(0, '-0.290')] [2024-06-15 23:45:06,131][1652475] Updated weights for policy 0, policy_version 971809 (0.0009) [2024-06-15 23:45:07,770][1652475] Updated weights for policy 0, policy_version 971856 (0.0011) [2024-06-15 23:45:08,509][1652475] Updated weights for policy 0, policy_version 971905 (0.0009) [2024-06-15 23:45:08,986][1652475] Updated weights for policy 0, policy_version 971944 (0.0009) [2024-06-15 23:45:10,399][1652475] Updated weights for policy 0, policy_version 972000 (0.0009) [2024-06-15 23:45:10,738][1648984] Fps is (10 sec: 101580.4, 60 sec: 102127.0, 300 sec: 103858.0). Total num frames: 1990688768. Throughput: 0: 25850.2. Samples: 497715712. 
Policy #0 lag: (min: 11.0, avg: 110.5, max: 267.0) [2024-06-15 23:45:10,738][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 23:45:11,127][1652475] Updated weights for policy 0, policy_version 972052 (0.0010) [2024-06-15 23:45:12,616][1652475] Updated weights for policy 0, policy_version 972099 (0.0009) [2024-06-15 23:45:13,289][1652475] Updated weights for policy 0, policy_version 972160 (0.0008) [2024-06-15 23:45:13,865][1652475] Updated weights for policy 0, policy_version 972208 (0.0008) [2024-06-15 23:45:15,172][1652475] Updated weights for policy 0, policy_version 972256 (0.0008) [2024-06-15 23:45:15,737][1648984] Fps is (10 sec: 104857.8, 60 sec: 103219.5, 300 sec: 103746.8). Total num frames: 1991245824. Throughput: 0: 25886.8. Samples: 497881088. Policy #0 lag: (min: 11.0, avg: 110.5, max: 267.0) [2024-06-15 23:45:15,738][1648984] Avg episode reward: [(0, '-0.230')] [2024-06-15 23:45:15,852][1652475] Updated weights for policy 0, policy_version 972305 (0.0009) [2024-06-15 23:45:17,441][1652475] Updated weights for policy 0, policy_version 972355 (0.0009) [2024-06-15 23:45:18,264][1652475] Updated weights for policy 0, policy_version 972420 (0.0015) [2024-06-15 23:45:18,863][1652475] Updated weights for policy 0, policy_version 972474 (0.0009) [2024-06-15 23:45:20,245][1652475] Updated weights for policy 0, policy_version 972516 (0.0009) [2024-06-15 23:45:20,384][1651340] Signal inference workers to stop experience collection... (50150 times) [2024-06-15 23:45:20,443][1652475] InferenceWorker_p0-w0: stopping experience collection (50150 times) [2024-06-15 23:45:20,530][1651340] Signal inference workers to resume experience collection... (50150 times) [2024-06-15 23:45:20,531][1652475] InferenceWorker_p0-w0: resuming experience collection (50150 times) [2024-06-15 23:45:20,738][1648984] Fps is (10 sec: 111412.0, 60 sec: 104857.4, 300 sec: 103746.8). Total num frames: 1991802880. Throughput: 0: 25770.7. Samples: 497959424. 
Policy #0 lag: (min: 11.0, avg: 110.5, max: 267.0) [2024-06-15 23:45:20,738][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 23:45:20,930][1652475] Updated weights for policy 0, policy_version 972576 (0.0008) [2024-06-15 23:45:22,513][1652475] Updated weights for policy 0, policy_version 972629 (0.0010) [2024-06-15 23:45:23,143][1652475] Updated weights for policy 0, policy_version 972678 (0.0060) [2024-06-15 23:45:23,745][1652475] Updated weights for policy 0, policy_version 972732 (0.0009) [2024-06-15 23:45:25,378][1652475] Updated weights for policy 0, policy_version 972784 (0.0009) [2024-06-15 23:45:25,738][1648984] Fps is (10 sec: 104854.1, 60 sec: 103764.7, 300 sec: 103524.5). Total num frames: 1992294400. Throughput: 0: 25793.3. Samples: 498113024. Policy #0 lag: (min: 11.0, avg: 110.5, max: 267.0) [2024-06-15 23:45:25,739][1648984] Avg episode reward: [(0, '-0.310')] [2024-06-15 23:45:26,143][1652475] Updated weights for policy 0, policy_version 972833 (0.0010) [2024-06-15 23:45:27,246][1652475] Updated weights for policy 0, policy_version 972868 (0.0010) [2024-06-15 23:45:27,861][1652475] Updated weights for policy 0, policy_version 972923 (0.0010) [2024-06-15 23:45:28,735][1652475] Updated weights for policy 0, policy_version 972963 (0.0009) [2024-06-15 23:45:29,747][1652475] Updated weights for policy 0, policy_version 973008 (0.0010) [2024-06-15 23:45:30,598][1652475] Updated weights for policy 0, policy_version 973060 (0.0011) [2024-06-15 23:45:30,742][1648984] Fps is (10 sec: 104807.1, 60 sec: 104310.1, 300 sec: 103411.9). Total num frames: 1992851456. Throughput: 0: 25972.7. Samples: 498271232. 
Policy #0 lag: (min: 11.0, avg: 110.5, max: 267.0) [2024-06-15 23:45:30,743][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 23:45:31,111][1652475] Updated weights for policy 0, policy_version 973104 (0.0009) [2024-06-15 23:45:32,047][1652475] Updated weights for policy 0, policy_version 973140 (0.0008) [2024-06-15 23:45:34,312][1652475] Updated weights for policy 0, policy_version 973186 (0.0010) [2024-06-15 23:45:34,969][1652475] Updated weights for policy 0, policy_version 973238 (0.0011) [2024-06-15 23:45:35,737][1648984] Fps is (10 sec: 101584.3, 60 sec: 102127.1, 300 sec: 103415.2). Total num frames: 1993310208. Throughput: 0: 25861.7. Samples: 498346496. Policy #0 lag: (min: 11.0, avg: 110.5, max: 267.0) [2024-06-15 23:45:35,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 23:45:35,799][1652475] Updated weights for policy 0, policy_version 973302 (0.0064) [2024-06-15 23:45:36,512][1652475] Updated weights for policy 0, policy_version 973360 (0.0011) [2024-06-15 23:45:37,044][1652475] Updated weights for policy 0, policy_version 973395 (0.0009) [2024-06-15 23:45:37,478][1652475] Updated weights for policy 0, policy_version 973434 (0.0010) [2024-06-15 23:45:39,881][1652475] Updated weights for policy 0, policy_version 973477 (0.0009) [2024-06-15 23:45:40,319][1651340] Signal inference workers to stop experience collection... (50200 times) [2024-06-15 23:45:40,345][1652475] InferenceWorker_p0-w0: stopping experience collection (50200 times) [2024-06-15 23:45:40,443][1651340] Signal inference workers to resume experience collection... (50200 times) [2024-06-15 23:45:40,444][1652475] InferenceWorker_p0-w0: resuming experience collection (50200 times) [2024-06-15 23:45:40,598][1652475] Updated weights for policy 0, policy_version 973530 (0.0009) [2024-06-15 23:45:40,738][1648984] Fps is (10 sec: 95072.9, 60 sec: 103219.3, 300 sec: 103302.5). Total num frames: 1993801728. Throughput: 0: 26066.5. Samples: 498499584. 
Policy #0 lag: (min: 11.0, avg: 110.5, max: 267.0) [2024-06-15 23:45:40,738][1648984] Avg episode reward: [(0, '-0.350')] [2024-06-15 23:45:41,293][1652475] Updated weights for policy 0, policy_version 973584 (0.0010) [2024-06-15 23:45:41,782][1652475] Updated weights for policy 0, policy_version 973626 (0.0009) [2024-06-15 23:45:42,607][1652475] Updated weights for policy 0, policy_version 973680 (0.0011) [2024-06-15 23:45:44,500][1652475] Updated weights for policy 0, policy_version 973728 (0.0009) [2024-06-15 23:45:45,096][1652475] Updated weights for policy 0, policy_version 973761 (0.0008) [2024-06-15 23:45:45,738][1648984] Fps is (10 sec: 104856.7, 60 sec: 103765.4, 300 sec: 103635.7). Total num frames: 1994358784. Throughput: 0: 26168.9. Samples: 498657280. Policy #0 lag: (min: 11.0, avg: 110.5, max: 267.0) [2024-06-15 23:45:45,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 23:45:45,752][1652475] Updated weights for policy 0, policy_version 973817 (0.0009) [2024-06-15 23:45:46,537][1652475] Updated weights for policy 0, policy_version 973857 (0.0009) [2024-06-15 23:45:47,472][1652475] Updated weights for policy 0, policy_version 973906 (0.0021) [2024-06-15 23:45:47,984][1652475] Updated weights for policy 0, policy_version 973952 (0.0010) [2024-06-15 23:45:49,677][1652475] Updated weights for policy 0, policy_version 974000 (0.0009) [2024-06-15 23:45:50,283][1652475] Updated weights for policy 0, policy_version 974048 (0.0009) [2024-06-15 23:45:50,738][1648984] Fps is (10 sec: 111411.4, 60 sec: 104311.5, 300 sec: 103858.0). Total num frames: 1994915840. Throughput: 0: 25895.8. Samples: 498728448. 
Policy #0 lag: (min: 11.0, avg: 110.5, max: 267.0) [2024-06-15 23:45:50,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 23:45:51,770][1652475] Updated weights for policy 0, policy_version 974096 (0.0010) [2024-06-15 23:45:52,427][1652475] Updated weights for policy 0, policy_version 974147 (0.0009) [2024-06-15 23:45:53,026][1652475] Updated weights for policy 0, policy_version 974205 (0.0009) [2024-06-15 23:45:54,376][1652475] Updated weights for policy 0, policy_version 974256 (0.0010) [2024-06-15 23:45:54,875][1652475] Updated weights for policy 0, policy_version 974290 (0.0008) [2024-06-15 23:45:55,365][1652475] Updated weights for policy 0, policy_version 974336 (0.0009) [2024-06-15 23:45:55,738][1648984] Fps is (10 sec: 108134.3, 60 sec: 104857.7, 300 sec: 103746.9). Total num frames: 1995440128. Throughput: 0: 26066.5. Samples: 498888704. Policy #0 lag: (min: 11.0, avg: 110.5, max: 267.0) [2024-06-15 23:45:55,738][1648984] Avg episode reward: [(0, '-0.240')] [2024-06-15 23:45:57,102][1652475] Updated weights for policy 0, policy_version 974392 (0.0012) [2024-06-15 23:45:58,471][1652475] Updated weights for policy 0, policy_version 974436 (0.0009) [2024-06-15 23:45:59,012][1652475] Updated weights for policy 0, policy_version 974469 (0.0010) [2024-06-15 23:45:59,367][1651340] Signal inference workers to stop experience collection... (50250 times) [2024-06-15 23:45:59,398][1652475] InferenceWorker_p0-w0: stopping experience collection (50250 times) [2024-06-15 23:45:59,484][1651340] Signal inference workers to resume experience collection... (50250 times) [2024-06-15 23:45:59,485][1652475] InferenceWorker_p0-w0: resuming experience collection (50250 times) [2024-06-15 23:45:59,595][1652475] Updated weights for policy 0, policy_version 974517 (0.0010) [2024-06-15 23:46:00,165][1652475] Updated weights for policy 0, policy_version 974566 (0.0010) [2024-06-15 23:46:00,738][1648984] Fps is (10 sec: 104857.9, 60 sec: 104857.7, 300 sec: 103635.7). 
Total num frames: 1995964416. Throughput: 0: 25770.7. Samples: 499040768. Policy #0 lag: (min: 11.0, avg: 110.5, max: 267.0) [2024-06-15 23:46:00,738][1648984] Avg episode reward: [(0, '-0.360')] [2024-06-15 23:46:01,246][1652475] Updated weights for policy 0, policy_version 974614 (0.0009) [2024-06-15 23:46:03,269][1652475] Updated weights for policy 0, policy_version 974658 (0.0010) [2024-06-15 23:46:03,837][1652475] Updated weights for policy 0, policy_version 974707 (0.0008) [2024-06-15 23:46:04,513][1652475] Updated weights for policy 0, policy_version 974768 (0.0009) [2024-06-15 23:46:05,623][1652475] Updated weights for policy 0, policy_version 974822 (0.0012) [2024-06-15 23:46:05,738][1648984] Fps is (10 sec: 101579.2, 60 sec: 104311.1, 300 sec: 103746.8). Total num frames: 1996455936. Throughput: 0: 26089.1. Samples: 499133440. Policy #0 lag: (min: 11.0, avg: 110.5, max: 267.0) [2024-06-15 23:46:05,739][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 23:46:06,243][1652475] Updated weights for policy 0, policy_version 974868 (0.0009) [2024-06-15 23:46:06,713][1652475] Updated weights for policy 0, policy_version 974912 (0.0009) [2024-06-15 23:46:08,836][1652475] Updated weights for policy 0, policy_version 974976 (0.0010) [2024-06-15 23:46:09,528][1652475] Updated weights for policy 0, policy_version 975024 (0.0013) [2024-06-15 23:46:10,738][1648984] Fps is (10 sec: 98303.8, 60 sec: 104311.6, 300 sec: 103857.9). Total num frames: 1996947456. Throughput: 0: 25850.5. Samples: 499276288. 
Policy #0 lag: (min: 11.0, avg: 110.5, max: 267.0) [2024-06-15 23:46:10,738][1648984] Avg episode reward: [(0, '-0.270')] [2024-06-15 23:46:10,766][1652475] Updated weights for policy 0, policy_version 975074 (0.0010) [2024-06-15 23:46:11,614][1652475] Updated weights for policy 0, policy_version 975122 (0.0010) [2024-06-15 23:46:12,144][1652475] Updated weights for policy 0, policy_version 975168 (0.0008) [2024-06-15 23:46:13,395][1652475] Updated weights for policy 0, policy_version 975208 (0.0008) [2024-06-15 23:46:14,003][1652475] Updated weights for policy 0, policy_version 975251 (0.0010) [2024-06-15 23:46:15,710][1652475] Updated weights for policy 0, policy_version 975297 (0.0008) [2024-06-15 23:46:15,737][1648984] Fps is (10 sec: 95029.6, 60 sec: 102673.1, 300 sec: 103413.6). Total num frames: 1997406208. Throughput: 0: 25784.8. Samples: 499431424. Policy #0 lag: (min: 11.0, avg: 110.5, max: 267.0) [2024-06-15 23:46:15,738][1648984] Avg episode reward: [(0, '-0.370')] [2024-06-15 23:46:16,230][1652475] Updated weights for policy 0, policy_version 975344 (0.0008) [2024-06-15 23:46:17,338][1652475] Updated weights for policy 0, policy_version 975408 (0.0011) [2024-06-15 23:46:18,052][1652475] Updated weights for policy 0, policy_version 975456 (0.0009) [2024-06-15 23:46:18,128][1651340] Signal inference workers to stop experience collection... (50300 times) [2024-06-15 23:46:18,154][1652475] InferenceWorker_p0-w0: stopping experience collection (50300 times) [2024-06-15 23:46:18,259][1651340] Signal inference workers to resume experience collection... (50300 times) [2024-06-15 23:46:18,260][1652475] InferenceWorker_p0-w0: resuming experience collection (50300 times) [2024-06-15 23:46:18,944][1652475] Updated weights for policy 0, policy_version 975520 (0.0009) [2024-06-15 23:46:20,738][1648984] Fps is (10 sec: 101579.2, 60 sec: 102672.8, 300 sec: 103635.8). Total num frames: 1997963264. Throughput: 0: 25827.4. Samples: 499508736. 
Policy #0 lag: (min: 15.0, avg: 127.3, max: 271.0) [2024-06-15 23:46:20,739][1648984] Avg episode reward: [(0, '-0.390')] [2024-06-15 23:46:20,796][1652475] Updated weights for policy 0, policy_version 975570 (0.0010) [2024-06-15 23:46:21,771][1652475] Updated weights for policy 0, policy_version 975620 (0.0010) [2024-06-15 23:46:22,423][1652475] Updated weights for policy 0, policy_version 975675 (0.0010) [2024-06-15 23:46:23,230][1652475] Updated weights for policy 0, policy_version 975718 (0.0007) [2024-06-15 23:46:23,977][1652475] Updated weights for policy 0, policy_version 975776 (0.0010) [2024-06-15 23:46:25,531][1652475] Updated weights for policy 0, policy_version 975829 (0.0012) [2024-06-15 23:46:25,738][1648984] Fps is (10 sec: 111406.6, 60 sec: 103765.2, 300 sec: 103857.8). Total num frames: 1998520320. Throughput: 0: 25884.3. Samples: 499664384. Policy #0 lag: (min: 15.0, avg: 127.3, max: 271.0) [2024-06-15 23:46:25,739][1648984] Avg episode reward: [(0, '-0.300')] [2024-06-15 23:46:25,995][1652475] Updated weights for policy 0, policy_version 975872 (0.0010) [2024-06-15 23:46:27,324][1652475] Updated weights for policy 0, policy_version 975931 (0.0010) [2024-06-15 23:46:28,344][1652475] Updated weights for policy 0, policy_version 975971 (0.0019) [2024-06-15 23:46:29,268][1652475] Updated weights for policy 0, policy_version 976016 (0.0011) [2024-06-15 23:46:29,802][1652475] Updated weights for policy 0, policy_version 976058 (0.0008) [2024-06-15 23:46:30,738][1648984] Fps is (10 sec: 104857.8, 60 sec: 102681.1, 300 sec: 103746.8). Total num frames: 1999011840. Throughput: 0: 25975.4. Samples: 499826176. 
Policy #0 lag: (min: 15.0, avg: 127.3, max: 271.0) [2024-06-15 23:46:30,738][1648984] Avg episode reward: [(0, '-0.320')] [2024-06-15 23:46:30,844][1652475] Updated weights for policy 0, policy_version 976097 (0.0010) [2024-06-15 23:46:31,122][1652475] Updated weights for policy 0, policy_version 976128 (0.0009) [2024-06-15 23:46:31,728][1652475] Updated weights for policy 0, policy_version 976165 (0.0009) [2024-06-15 23:46:32,855][1652475] Updated weights for policy 0, policy_version 976211 (0.0010) [2024-06-15 23:46:33,259][1652475] Updated weights for policy 0, policy_version 976248 (0.0010) [2024-06-15 23:46:35,002][1652475] Updated weights for policy 0, policy_version 976288 (0.0009) [2024-06-15 23:46:35,552][1652475] Updated weights for policy 0, policy_version 976325 (0.0008) [2024-06-15 23:46:35,738][1648984] Fps is (10 sec: 101584.1, 60 sec: 103765.2, 300 sec: 103635.7). Total num frames: 1999536128. Throughput: 0: 26032.3. Samples: 499899904. Policy #0 lag: (min: 15.0, avg: 127.3, max: 271.0) [2024-06-15 23:46:35,738][1648984] Avg episode reward: [(0, '-0.330')] [2024-06-15 23:46:36,254][1652475] Updated weights for policy 0, policy_version 976384 (0.0008) [2024-06-15 23:46:36,891][1652475] Updated weights for policy 0, policy_version 976432 (0.0008) [2024-06-15 23:46:37,460][1651340] Signal inference workers to stop experience collection... (50350 times) [2024-06-15 23:46:37,499][1652475] InferenceWorker_p0-w0: stopping experience collection (50350 times) [2024-06-15 23:46:37,594][1651340] Signal inference workers to resume experience collection... 
(50350 times) [2024-06-15 23:46:37,595][1652475] InferenceWorker_p0-w0: resuming experience collection (50350 times) [2024-06-15 23:46:37,597][1652475] Updated weights for policy 0, policy_version 976480 (0.0009) [2024-06-15 23:46:39,721][1652475] Updated weights for policy 0, policy_version 976515 (0.0009) [2024-06-15 23:46:40,357][1652475] Updated weights for policy 0, policy_version 976565 (0.0008) [2024-06-15 23:46:40,642][1652487] Stopping RolloutWorker_w3... [2024-06-15 23:46:40,642][1652477] Stopping RolloutWorker_w1... [2024-06-15 23:46:40,642][1648984] Component RolloutWorker_w1 stopped! [2024-06-15 23:46:40,643][1652477] Loop rollout_proc1_evt_loop terminating... [2024-06-15 23:46:40,643][1652487] Loop rollout_proc3_evt_loop terminating... [2024-06-15 23:46:40,643][1648984] Component RolloutWorker_w2 stopped! [2024-06-15 23:46:40,643][1648984] Component RolloutWorker_w3 stopped! [2024-06-15 23:46:40,642][1652479] Stopping RolloutWorker_w2... [2024-06-15 23:46:40,643][1648984] Component RolloutWorker_w0 stopped! [2024-06-15 23:46:40,644][1652479] Loop rollout_proc2_evt_loop terminating... [2024-06-15 23:46:40,642][1652476] Stopping RolloutWorker_w0... [2024-06-15 23:46:40,644][1652476] Loop rollout_proc0_evt_loop terminating... [2024-06-15 23:46:40,665][1652475] Weights refcount: 2 0 [2024-06-15 23:46:40,666][1652475] Stopping InferenceWorker_p0-w0... [2024-06-15 23:46:40,666][1652475] Loop inference_proc0-0_evt_loop terminating... [2024-06-15 23:46:40,666][1648984] Component InferenceWorker_p0-w0 stopped! [2024-06-15 23:46:40,701][1648984] Component Batcher_0 stopped! [2024-06-15 23:46:40,701][1651340] Stopping Batcher_0... [2024-06-15 23:46:40,702][1651340] Loop batcher_evt_loop terminating... [2024-06-15 23:46:40,857][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000976608_2000093184.pth... 
[2024-06-15 23:46:40,900][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000965200_1976729600.pth [2024-06-15 23:46:41,038][1651340] Saving train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000976624_2000125952.pth... [2024-06-15 23:46:41,069][1651340] Removing train_dir/atari_2B_atari_journeyescape_1111/checkpoint_p0/checkpoint_000971264_1989148672.pth [2024-06-15 23:46:41,072][1651340] Stopping LearnerWorker_p0... [2024-06-15 23:46:41,072][1651340] Loop learner_proc0_evt_loop terminating... [2024-06-15 23:46:41,072][1648984] Component LearnerWorker_p0 stopped! [2024-06-15 23:46:41,072][1648984] Waiting for process learner_proc0 to stop... [2024-06-15 23:46:42,097][1648984] Waiting for process inference_proc0-0 to join... [2024-06-15 23:46:42,098][1648984] Waiting for process rollout_proc0 to join... [2024-06-15 23:46:42,098][1648984] Waiting for process rollout_proc1 to join... [2024-06-15 23:46:42,099][1648984] Waiting for process rollout_proc2 to join... [2024-06-15 23:46:42,099][1648984] Waiting for process rollout_proc3 to join... 
[2024-06-15 23:46:42,099][1648984] Batcher 0 profile tree view:
batching: 2545.8504, releasing_batches: 4860.8707
[2024-06-15 23:46:42,100][1648984] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0000
wait_policy_total: 14474.2321
update_model: 529.4970
weight_update: 0.0009
one_step: 0.0373
handle_policy_step: 20765.4692
deserialize: 17.9929, stack: 3254.2655, obs_to_device_normalize: 12052.9636, forward: 4111.5162, prepare_outputs: 887.4402, send_messages: 168.3345
[2024-06-15 23:46:42,100][1648984] Learner 0 profile tree view:
misc: 0.3909, prepare_batch: 6016.1444
train: 15512.5140
epoch_init: 3.1919, minibatch_init: 187.4616, losses_postprocess: 2296.6268, kl_divergence: 1172.0836, update: 5796.4933, after_optimizer: 2989.5784
calculate_losses: 2846.9326
losses_init: 5.6336, forward_head: 1133.8963, bptt_initial: 18.4757, bptt: 25.4415, tail: 604.3835, advantages_returns: 174.1003, losses: 703.7908
[2024-06-15 23:46:42,100][1648984] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.6084, enqueue_policy_requests: 1690.7816, process_policy_outputs: 70.3390, env_step: 23826.5997, finalize_trajectories: 24.3051, complete_rollouts: 5.4270
post_env_step: 111.8782
process_env_step: 24.5237
[2024-06-15 23:46:42,101][1648984] RolloutWorker_w3 profile tree view:
wait_for_trajectories: 0.6490, enqueue_policy_requests: 1672.7538, process_policy_outputs: 69.1022, env_step: 24441.0194, finalize_trajectories: 24.6177, complete_rollouts: 5.6299
post_env_step: 110.9511
process_env_step: 24.6968
[2024-06-15 23:46:42,101][1648984] Loop Runner_EvtLoop terminating...
[2024-06-15 23:46:42,102][1648984] Runner profile tree view:
main_loop: 44132.9735
[2024-06-15 23:46:42,102][1648984] Collected {0: 2000125952}, FPS: 45320.4