[2023-09-20 19:12:24,648][105923] Saving configuration to ./train_dir/Swimmer/config.json...
[2023-09-20 19:12:24,714][105923] Rollout worker 0 uses device cpu
[2023-09-20 19:12:24,714][105923] Rollout worker 1 uses device cpu
[2023-09-20 19:12:24,715][105923] Rollout worker 2 uses device cpu
[2023-09-20 19:12:24,715][105923] Rollout worker 3 uses device cpu
[2023-09-20 19:12:24,716][105923] Rollout worker 4 uses device cpu
[2023-09-20 19:12:24,716][105923] Rollout worker 5 uses device cpu
[2023-09-20 19:12:24,717][105923] Rollout worker 6 uses device cpu
[2023-09-20 19:12:24,717][105923] Rollout worker 7 uses device cpu
[2023-09-20 19:12:24,718][105923] In synchronous mode, we only accumulate one batch. Setting num_batches_to_accumulate to 1
[2023-09-20 19:12:24,764][105923] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-09-20 19:12:24,764][105923] InferenceWorker_p0-w0: min num requests: 1
[2023-09-20 19:12:24,767][105923] Using GPUs [1] for process 1 (actually maps to GPUs [1])
[2023-09-20 19:12:24,768][105923] InferenceWorker_p1-w0: min num requests: 1
[2023-09-20 19:12:24,793][105923] Starting all processes...
[2023-09-20 19:12:24,794][105923] Starting process learner_proc0
[2023-09-20 19:12:24,822][105923] Starting process learner_proc1
[2023-09-20 19:12:24,843][105923] Starting all processes...
[2023-09-20 19:12:24,850][105923] Starting process inference_proc0-0
[2023-09-20 19:12:24,850][105923] Starting process inference_proc1-0
[2023-09-20 19:12:24,850][105923] Starting process rollout_proc0
[2023-09-20 19:12:24,850][105923] Starting process rollout_proc1
[2023-09-20 19:12:24,851][105923] Starting process rollout_proc2
[2023-09-20 19:12:24,851][105923] Starting process rollout_proc3
[2023-09-20 19:12:24,851][105923] Starting process rollout_proc4
[2023-09-20 19:12:24,855][105923] Starting process rollout_proc5
[2023-09-20 19:12:24,855][105923] Starting process rollout_proc6
[2023-09-20 19:12:24,856][105923] Starting process rollout_proc7
[2023-09-20 19:12:26,718][106486] Using GPUs [1] for process 1 (actually maps to GPUs [1])
[2023-09-20 19:12:26,718][106486] Set environment var CUDA_VISIBLE_DEVICES to '1' (GPU indices [1]) for inference process 1
[2023-09-20 19:12:26,736][106467] Using GPUs [1] for process 1 (actually maps to GPUs [1])
[2023-09-20 19:12:26,736][106467] Set environment var CUDA_VISIBLE_DEVICES to '1' (GPU indices [1]) for learning process 1
[2023-09-20 19:12:26,736][106486] Num visible devices: 1
[2023-09-20 19:12:26,755][106467] Num visible devices: 1
[2023-09-20 19:12:26,758][106485] Worker 0 uses CPU cores [0, 1, 2, 3]
[2023-09-20 19:12:26,760][106484] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-09-20 19:12:26,760][106484] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2023-09-20 19:12:26,778][106484] Num visible devices: 1
[2023-09-20 19:12:26,795][106487] Worker 1 uses CPU cores [4, 5, 6, 7]
[2023-09-20 19:12:26,799][106463] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-09-20 19:12:26,799][106463] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2023-09-20 19:12:26,806][106467] Starting seed is not provided
[2023-09-20 19:12:26,806][106467] Using GPUs [0] for process 1 (actually maps to GPUs [1])
[2023-09-20 19:12:26,806][106467] Initializing actor-critic model on device cuda:0
[2023-09-20 19:12:26,806][106467] RunningMeanStd input shape: (8,)
[2023-09-20 19:12:26,807][106467] RunningMeanStd input shape: (1,)
[2023-09-20 19:12:26,818][106463] Num visible devices: 1
[2023-09-20 19:12:26,825][106491] Worker 6 uses CPU cores [24, 25, 26, 27]
[2023-09-20 19:12:26,829][106524] Worker 4 uses CPU cores [16, 17, 18, 19]
[2023-09-20 19:12:26,835][106489] Worker 2 uses CPU cores [8, 9, 10, 11]
[2023-09-20 19:12:26,841][106463] Starting seed is not provided
[2023-09-20 19:12:26,841][106463] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-09-20 19:12:26,841][106463] Initializing actor-critic model on device cuda:0
[2023-09-20 19:12:26,842][106463] RunningMeanStd input shape: (8,)
[2023-09-20 19:12:26,843][106463] RunningMeanStd input shape: (1,)
[2023-09-20 19:12:26,850][106488] Worker 3 uses CPU cores [12, 13, 14, 15]
[2023-09-20 19:12:26,855][106467] Created Actor Critic model with architecture:
[2023-09-20 19:12:26,855][106467] ActorCriticSharedWeights( (obs_normalizer): ObservationNormalizer( (running_mean_std): RunningMeanStdDictInPlace( (running_mean_std): ModuleDict( (obs): RunningMeanStdInPlace() ) ) ) (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) (encoder): MultiInputEncoder( (encoders): ModuleDict( (obs): MlpEncoder( (mlp_head): RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Linear) (1): RecursiveScriptModule(original_name=Tanh) (2): RecursiveScriptModule(original_name=Linear) (3): RecursiveScriptModule(original_name=Tanh) ) ) ) ) (core): ModelCoreIdentity() (decoder): MlpDecoder( (mlp): Identity() ) (critic_linear): Linear(in_features=64, out_features=1, bias=True) (action_parameterization): ActionParameterizationContinuousNonAdaptiveStddev( (distribution_linear): Linear(in_features=64, out_features=2, bias=True) ) )
[2023-09-20 19:12:26,916][106463] Created Actor Critic model with architecture:
[2023-09-20 19:12:26,916][106463] ActorCriticSharedWeights( (obs_normalizer): ObservationNormalizer( (running_mean_std): RunningMeanStdDictInPlace( (running_mean_std): ModuleDict( (obs): RunningMeanStdInPlace() ) ) ) (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) (encoder): MultiInputEncoder( (encoders): ModuleDict( (obs): MlpEncoder( (mlp_head): RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Linear) (1): RecursiveScriptModule(original_name=Tanh) (2): RecursiveScriptModule(original_name=Linear) (3): RecursiveScriptModule(original_name=Tanh) ) ) ) ) (core): ModelCoreIdentity() (decoder): MlpDecoder( (mlp): Identity() ) (critic_linear): Linear(in_features=64, out_features=1, bias=True) (action_parameterization): ActionParameterizationContinuousNonAdaptiveStddev( (distribution_linear): Linear(in_features=64, out_features=2, bias=True) ) )
[2023-09-20 19:12:26,928][106490] Worker 5 uses CPU cores [20, 21, 22, 23]
[2023-09-20 19:12:26,929][106556] Worker 7 uses CPU cores [28, 29, 30, 31]
[2023-09-20 19:12:27,468][106463] Using optimizer
[2023-09-20 19:12:27,469][106463] No checkpoints found
[2023-09-20 19:12:27,469][106463] Did not load from checkpoint, starting from scratch!
[2023-09-20 19:12:27,469][106463] Initialized policy 0 weights for model version 0
[2023-09-20 19:12:27,471][106463] LearnerWorker_p0 finished initialization!
[2023-09-20 19:12:27,471][106463] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-09-20 19:12:27,474][106467] Using optimizer
[2023-09-20 19:12:27,474][106467] No checkpoints found
[2023-09-20 19:12:27,474][106467] Did not load from checkpoint, starting from scratch!
[2023-09-20 19:12:27,475][106467] Initialized policy 1 weights for model version 0
[2023-09-20 19:12:27,476][106467] LearnerWorker_p1 finished initialization!
[2023-09-20 19:12:27,476][106467] Using GPUs [0] for process 1 (actually maps to GPUs [1])
[2023-09-20 19:12:28,046][106484] RunningMeanStd input shape: (8,)
[2023-09-20 19:12:28,047][106484] RunningMeanStd input shape: (1,)
[2023-09-20 19:12:28,057][106486] RunningMeanStd input shape: (8,)
[2023-09-20 19:12:28,057][106486] RunningMeanStd input shape: (1,)
[2023-09-20 19:12:28,079][105923] Inference worker 0-0 is ready!
[2023-09-20 19:12:28,089][105923] Inference worker 1-0 is ready!
[2023-09-20 19:12:28,090][105923] All inference workers are ready! Signal rollout workers to start!
[2023-09-20 19:12:28,182][106489] Decorrelating experience for 0 frames...
[2023-09-20 19:12:28,183][106489] Decorrelating experience for 64 frames...
[2023-09-20 19:12:28,187][106485] Decorrelating experience for 0 frames...
[2023-09-20 19:12:28,187][106485] Decorrelating experience for 64 frames...
[2023-09-20 19:12:28,188][106524] Decorrelating experience for 0 frames...
[2023-09-20 19:12:28,189][106524] Decorrelating experience for 64 frames...
[2023-09-20 19:12:28,191][106491] Decorrelating experience for 0 frames...
[2023-09-20 19:12:28,192][106491] Decorrelating experience for 64 frames...
[2023-09-20 19:12:28,194][106489] Decorrelating experience for 128 frames...
[2023-09-20 19:12:28,199][106485] Decorrelating experience for 128 frames...
[2023-09-20 19:12:28,200][106524] Decorrelating experience for 128 frames...
[2023-09-20 19:12:28,203][106491] Decorrelating experience for 128 frames...
[2023-09-20 19:12:28,213][106556] Decorrelating experience for 0 frames...
[2023-09-20 19:12:28,214][106556] Decorrelating experience for 64 frames...
[2023-09-20 19:12:28,215][106488] Decorrelating experience for 0 frames...
[2023-09-20 19:12:28,216][106488] Decorrelating experience for 64 frames...
[2023-09-20 19:12:28,216][106489] Decorrelating experience for 192 frames...
[2023-09-20 19:12:28,221][106485] Decorrelating experience for 192 frames...
[2023-09-20 19:12:28,222][106524] Decorrelating experience for 192 frames...
[2023-09-20 19:12:28,225][106556] Decorrelating experience for 128 frames...
[2023-09-20 19:12:28,226][106491] Decorrelating experience for 192 frames...
[2023-09-20 19:12:28,227][106488] Decorrelating experience for 128 frames...
[2023-09-20 19:12:28,238][106490] Decorrelating experience for 0 frames...
[2023-09-20 19:12:28,239][106490] Decorrelating experience for 64 frames...
[2023-09-20 19:12:28,239][106487] Decorrelating experience for 0 frames...
[2023-09-20 19:12:28,240][106487] Decorrelating experience for 64 frames...
[2023-09-20 19:12:28,247][106556] Decorrelating experience for 192 frames...
[2023-09-20 19:12:28,250][106488] Decorrelating experience for 192 frames...
[2023-09-20 19:12:28,260][106490] Decorrelating experience for 128 frames...
[2023-09-20 19:12:28,261][106487] Decorrelating experience for 128 frames...
[2023-09-20 19:12:28,262][106489] Decorrelating experience for 256 frames...
[2023-09-20 19:12:28,267][106524] Decorrelating experience for 256 frames...
[2023-09-20 19:12:28,268][106485] Decorrelating experience for 256 frames...
[2023-09-20 19:12:28,274][106491] Decorrelating experience for 256 frames...
[2023-09-20 19:12:28,293][106556] Decorrelating experience for 256 frames...
[2023-09-20 19:12:28,296][106488] Decorrelating experience for 256 frames...
[2023-09-20 19:12:28,299][106490] Decorrelating experience for 192 frames...
[2023-09-20 19:12:28,300][106487] Decorrelating experience for 192 frames...
[2023-09-20 19:12:28,305][106489] Decorrelating experience for 320 frames...
[2023-09-20 19:12:28,312][106524] Decorrelating experience for 320 frames...
[2023-09-20 19:12:28,313][106485] Decorrelating experience for 320 frames...
[2023-09-20 19:12:28,319][106491] Decorrelating experience for 320 frames...
[2023-09-20 19:12:28,336][106556] Decorrelating experience for 320 frames...
[2023-09-20 19:12:28,340][106488] Decorrelating experience for 320 frames...
[2023-09-20 19:12:28,358][106489] Decorrelating experience for 384 frames...
[2023-09-20 19:12:28,365][106524] Decorrelating experience for 384 frames...
[2023-09-20 19:12:28,366][106485] Decorrelating experience for 384 frames...
[2023-09-20 19:12:28,372][106491] Decorrelating experience for 384 frames...
[2023-09-20 19:12:28,375][106490] Decorrelating experience for 256 frames...
[2023-09-20 19:12:28,384][106487] Decorrelating experience for 256 frames...
[2023-09-20 19:12:28,389][106556] Decorrelating experience for 384 frames...
[2023-09-20 19:12:28,392][106488] Decorrelating experience for 384 frames...
[2023-09-20 19:12:28,419][106490] Decorrelating experience for 320 frames...
[2023-09-20 19:12:28,427][106489] Decorrelating experience for 448 frames...
[2023-09-20 19:12:28,429][106524] Decorrelating experience for 448 frames...
[2023-09-20 19:12:28,430][106487] Decorrelating experience for 320 frames...
[2023-09-20 19:12:28,438][106491] Decorrelating experience for 448 frames...
[2023-09-20 19:12:28,445][106485] Decorrelating experience for 448 frames...
[2023-09-20 19:12:28,458][106488] Decorrelating experience for 448 frames...
[2023-09-20 19:12:28,467][106556] Decorrelating experience for 448 frames...
[2023-09-20 19:12:28,471][106490] Decorrelating experience for 384 frames...
[2023-09-20 19:12:28,484][106487] Decorrelating experience for 384 frames...
[2023-09-20 19:12:28,537][106490] Decorrelating experience for 448 frames...
[2023-09-20 19:12:28,551][106487] Decorrelating experience for 448 frames...
[2023-09-20 19:12:31,100][105923] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 8192. Throughput: 0: nan, 1: nan. Samples: 14132. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0)
[2023-09-20 19:12:36,100][105923] Fps is (10 sec: 9830.2, 60 sec: 9830.2, 300 sec: 9830.2). Total num frames: 57344. Throughput: 0: 2423.1, 1: 2414.7. Samples: 38322. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-20 19:12:36,101][105923] Avg episode reward: [(0, '17.932'), (1, '11.271')]
[2023-09-20 19:12:36,105][106467] Saving ./train_dir/Swimmer/checkpoint_p1/checkpoint_000000056_28672.pth...
[2023-09-20 19:12:36,106][106463] Saving ./train_dir/Swimmer/checkpoint_p0/checkpoint_000000056_28672.pth...
[2023-09-20 19:12:37,478][106484] Updated weights for policy 0, policy_version 80 (0.0016)
[2023-09-20 19:12:37,478][106486] Updated weights for policy 1, policy_version 80 (0.0010)
[2023-09-20 19:12:41,100][105923] Fps is (10 sec: 11468.6, 60 sec: 11468.6, 300 sec: 11468.6). Total num frames: 122880. Throughput: 0: 5038.5, 1: 5036.3. Samples: 114882. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-09-20 19:12:41,101][105923] Avg episode reward: [(0, '19.217'), (1, '14.513')]
[2023-09-20 19:12:43,916][106484] Updated weights for policy 0, policy_version 160 (0.0014)
[2023-09-20 19:12:43,917][106486] Updated weights for policy 1, policy_version 160 (0.0015)
[2023-09-20 19:12:44,750][105923] Heartbeat connected on Batcher_0
[2023-09-20 19:12:44,754][105923] Heartbeat connected on LearnerWorker_p0
[2023-09-20 19:12:44,757][105923] Heartbeat connected on Batcher_1
[2023-09-20 19:12:44,760][105923] Heartbeat connected on LearnerWorker_p1
[2023-09-20 19:12:44,766][105923] Heartbeat connected on InferenceWorker_p0-w0
[2023-09-20 19:12:44,770][105923] Heartbeat connected on InferenceWorker_p1-w0
[2023-09-20 19:12:44,774][105923] Heartbeat connected on RolloutWorker_w0
[2023-09-20 19:12:44,775][105923] Heartbeat connected on RolloutWorker_w1
[2023-09-20 19:12:44,778][105923] Heartbeat connected on RolloutWorker_w2
[2023-09-20 19:12:44,780][105923] Heartbeat connected on RolloutWorker_w3
[2023-09-20 19:12:44,784][105923] Heartbeat connected on RolloutWorker_w4
[2023-09-20 19:12:44,786][105923] Heartbeat connected on RolloutWorker_w5
[2023-09-20 19:12:44,789][105923] Heartbeat connected on RolloutWorker_w6
[2023-09-20 19:12:44,793][105923] Heartbeat connected on RolloutWorker_w7
[2023-09-20 19:12:46,100][105923] Fps is (10 sec: 13107.2, 60 sec: 12014.8, 300 sec: 12014.8). Total num frames: 188416. Throughput: 0: 5900.3, 1: 5899.3. Samples: 191128. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-20 19:12:46,101][105923] Avg episode reward: [(0, '20.024'), (1, '18.865')]
[2023-09-20 19:12:50,360][106484] Updated weights for policy 0, policy_version 240 (0.0013)
[2023-09-20 19:12:50,360][106486] Updated weights for policy 1, policy_version 240 (0.0014)
[2023-09-20 19:12:51,100][105923] Fps is (10 sec: 13107.4, 60 sec: 12288.0, 300 sec: 12288.0). Total num frames: 253952. Throughput: 0: 5385.1, 1: 5380.3. Samples: 229440. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0)
[2023-09-20 19:12:51,101][105923] Avg episode reward: [(0, '22.765'), (1, '25.187')]
[2023-09-20 19:12:51,107][106467] Saving ./train_dir/Swimmer/checkpoint_p1/checkpoint_000000248_126976.pth...
[2023-09-20 19:12:51,107][106463] Saving ./train_dir/Swimmer/checkpoint_p0/checkpoint_000000248_126976.pth...
[2023-09-20 19:12:51,114][106467] Saving new best policy, reward=25.187!
[2023-09-20 19:12:51,114][106463] Saving new best policy, reward=22.765!
[2023-09-20 19:12:56,100][105923] Fps is (10 sec: 12287.9, 60 sec: 12124.1, 300 sec: 12124.1). Total num frames: 311296. Throughput: 0: 5781.3, 1: 5777.5. Samples: 303106. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0)
[2023-09-20 19:12:56,101][105923] Avg episode reward: [(0, '27.521'), (1, '30.851')]
[2023-09-20 19:12:56,103][106467] Saving new best policy, reward=30.851!
[2023-09-20 19:12:56,103][106463] Saving new best policy, reward=27.521!
[2023-09-20 19:12:56,905][106484] Updated weights for policy 0, policy_version 320 (0.0016)
[2023-09-20 19:12:56,905][106486] Updated weights for policy 1, policy_version 320 (0.0013)
[2023-09-20 19:13:01,100][105923] Fps is (10 sec: 12287.9, 60 sec: 12287.9, 300 sec: 12287.9). Total num frames: 376832. Throughput: 0: 6169.6, 1: 6168.6. Samples: 384278. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0)
[2023-09-20 19:13:01,101][105923] Avg episode reward: [(0, '32.430'), (1, '33.387')]
[2023-09-20 19:13:01,102][106463] Saving new best policy, reward=32.430!
[2023-09-20 19:13:01,103][106467] Saving new best policy, reward=33.387!
[2023-09-20 19:13:03,074][106486] Updated weights for policy 1, policy_version 400 (0.0010)
[2023-09-20 19:13:03,074][106484] Updated weights for policy 0, policy_version 400 (0.0013)
[2023-09-20 19:13:06,100][105923] Fps is (10 sec: 13926.5, 60 sec: 12639.0, 300 sec: 12639.0). Total num frames: 450560. Throughput: 0: 5872.0, 1: 5872.2. Samples: 425180. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0)
[2023-09-20 19:13:06,101][105923] Avg episode reward: [(0, '35.503'), (1, '32.729')]
[2023-09-20 19:13:06,109][106463] Saving ./train_dir/Swimmer/checkpoint_p0/checkpoint_000000440_225280.pth...
[2023-09-20 19:13:06,109][106467] Saving ./train_dir/Swimmer/checkpoint_p1/checkpoint_000000440_225280.pth...
[2023-09-20 19:13:06,112][106463] Removing ./train_dir/Swimmer/checkpoint_p0/checkpoint_000000056_28672.pth
[2023-09-20 19:13:06,113][106463] Saving new best policy, reward=35.503!
[2023-09-20 19:13:06,114][106467] Removing ./train_dir/Swimmer/checkpoint_p1/checkpoint_000000056_28672.pth
[2023-09-20 19:13:09,227][106486] Updated weights for policy 1, policy_version 480 (0.0011)
[2023-09-20 19:13:09,227][106484] Updated weights for policy 0, policy_version 480 (0.0013)
[2023-09-20 19:13:11,100][105923] Fps is (10 sec: 13926.3, 60 sec: 12697.5, 300 sec: 12697.5). Total num frames: 516096. Throughput: 0: 6123.1, 1: 6123.0. Samples: 503978. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-20 19:13:11,101][105923] Avg episode reward: [(0, '36.154'), (1, '32.562')]
[2023-09-20 19:13:11,102][106463] Saving new best policy, reward=36.154!
[2023-09-20 19:13:15,483][106486] Updated weights for policy 1, policy_version 560 (0.0013)
[2023-09-20 19:13:15,484][106484] Updated weights for policy 0, policy_version 560 (0.0014)
[2023-09-20 19:13:16,100][105923] Fps is (10 sec: 12287.9, 60 sec: 12561.0, 300 sec: 12561.0). Total num frames: 573440. Throughput: 0: 5882.1, 1: 5881.3. Samples: 543488. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-20 19:13:16,101][105923] Avg episode reward: [(0, '36.666'), (1, '31.694')]
[2023-09-20 19:13:16,147][106463] Saving new best policy, reward=36.666!
[2023-09-20 19:13:21,100][105923] Fps is (10 sec: 12288.1, 60 sec: 12615.6, 300 sec: 12615.6). Total num frames: 638976. Throughput: 0: 6466.0, 1: 6466.4. Samples: 620282. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-20 19:13:21,101][105923] Avg episode reward: [(0, '35.156'), (1, '33.331')]
[2023-09-20 19:13:21,109][106463] Saving ./train_dir/Swimmer/checkpoint_p0/checkpoint_000000624_319488.pth...
[2023-09-20 19:13:21,111][106467] Saving ./train_dir/Swimmer/checkpoint_p1/checkpoint_000000624_319488.pth...
[2023-09-20 19:13:21,114][106463] Removing ./train_dir/Swimmer/checkpoint_p0/checkpoint_000000248_126976.pth
[2023-09-20 19:13:21,119][106467] Removing ./train_dir/Swimmer/checkpoint_p1/checkpoint_000000248_126976.pth
[2023-09-20 19:13:22,035][106484] Updated weights for policy 0, policy_version 640 (0.0015)
[2023-09-20 19:13:22,036][106486] Updated weights for policy 1, policy_version 640 (0.0015)
[2023-09-20 19:13:26,100][105923] Fps is (10 sec: 13107.5, 60 sec: 12660.4, 300 sec: 12660.4). Total num frames: 704512. Throughput: 0: 6449.7, 1: 6450.1. Samples: 695370. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-20 19:13:26,101][105923] Avg episode reward: [(0, '35.088'), (1, '35.366')]
[2023-09-20 19:13:26,101][106467] Saving new best policy, reward=35.366!
[2023-09-20 19:13:28,475][106484] Updated weights for policy 0, policy_version 720 (0.0016)
[2023-09-20 19:13:28,475][106486] Updated weights for policy 1, policy_version 720 (0.0015)
[2023-09-20 19:13:31,100][105923] Fps is (10 sec: 12288.0, 60 sec: 12561.0, 300 sec: 12561.0). Total num frames: 761856. Throughput: 0: 6023.0, 1: 6022.6. Samples: 733180. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-20 19:13:31,101][105923] Avg episode reward: [(0, '36.033'), (1, '38.665')]
[2023-09-20 19:13:31,113][106467] Saving new best policy, reward=38.665!
[2023-09-20 19:13:34,854][106486] Updated weights for policy 1, policy_version 800 (0.0015)
[2023-09-20 19:13:34,854][106484] Updated weights for policy 0, policy_version 800 (0.0013)
[2023-09-20 19:13:36,100][105923] Fps is (10 sec: 12287.9, 60 sec: 12834.1, 300 sec: 12603.1). Total num frames: 827392. Throughput: 0: 6458.6, 1: 6459.1. Samples: 810738. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
[2023-09-20 19:13:36,101][105923] Avg episode reward: [(0, '37.505'), (1, '40.142')]
[2023-09-20 19:13:36,124][106467] Saving ./train_dir/Swimmer/checkpoint_p1/checkpoint_000000816_417792.pth...
[2023-09-20 19:13:36,128][106467] Removing ./train_dir/Swimmer/checkpoint_p1/checkpoint_000000440_225280.pth
[2023-09-20 19:13:36,129][106467] Saving new best policy, reward=40.142!
[2023-09-20 19:13:36,131][106463] Saving ./train_dir/Swimmer/checkpoint_p0/checkpoint_000000816_417792.pth...
[2023-09-20 19:13:36,134][106463] Removing ./train_dir/Swimmer/checkpoint_p0/checkpoint_000000440_225280.pth
[2023-09-20 19:13:36,134][106463] Saving new best policy, reward=37.505!
[2023-09-20 19:13:40,981][106486] Updated weights for policy 1, policy_version 880 (0.0013)
[2023-09-20 19:13:40,981][106484] Updated weights for policy 0, policy_version 880 (0.0010)
[2023-09-20 19:13:41,100][105923] Fps is (10 sec: 13926.7, 60 sec: 12970.7, 300 sec: 12756.1). Total num frames: 901120. Throughput: 0: 6532.4, 1: 6534.6. Samples: 891120. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0)
[2023-09-20 19:13:41,101][105923] Avg episode reward: [(0, '37.446'), (1, '39.856')]
[2023-09-20 19:13:46,100][105923] Fps is (10 sec: 13107.1, 60 sec: 12834.1, 300 sec: 12670.3). Total num frames: 958464. Throughput: 0: 6471.2, 1: 6470.0. Samples: 966632. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-20 19:13:46,101][105923] Avg episode reward: [(0, '36.302'), (1, '39.682')]
[2023-09-20 19:13:47,447][106486] Updated weights for policy 1, policy_version 960 (0.0014)
[2023-09-20 19:13:47,448][106484] Updated weights for policy 0, policy_version 960 (0.0014)
[2023-09-20 19:13:51,100][105923] Fps is (10 sec: 12287.8, 60 sec: 12834.1, 300 sec: 12697.6). Total num frames: 1024000. Throughput: 0: 6462.6, 1: 6462.2. Samples: 1006796. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-20 19:13:51,101][105923] Avg episode reward: [(0, '36.063'), (1, '39.127')]
[2023-09-20 19:13:51,145][106463] Saving ./train_dir/Swimmer/checkpoint_p0/checkpoint_000001008_516096.pth...
[2023-09-20 19:13:51,148][106463] Removing ./train_dir/Swimmer/checkpoint_p0/checkpoint_000000624_319488.pth
[2023-09-20 19:13:51,155][106467] Saving ./train_dir/Swimmer/checkpoint_p1/checkpoint_000001008_516096.pth...
[2023-09-20 19:13:51,158][106467] Removing ./train_dir/Swimmer/checkpoint_p1/checkpoint_000000624_319488.pth
[2023-09-20 19:13:53,631][106484] Updated weights for policy 0, policy_version 1040 (0.0013)
[2023-09-20 19:13:53,633][106486] Updated weights for policy 1, policy_version 1040 (0.0013)
[2023-09-20 19:13:56,100][105923] Fps is (10 sec: 13107.3, 60 sec: 12970.7, 300 sec: 12721.7). Total num frames: 1089536. Throughput: 0: 6438.4, 1: 6438.2. Samples: 1083422. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-20 19:13:56,101][105923] Avg episode reward: [(0, '36.746'), (1, '39.015')]
[2023-09-20 19:13:59,903][106484] Updated weights for policy 0, policy_version 1120 (0.0013)
[2023-09-20 19:13:59,904][106486] Updated weights for policy 1, policy_version 1120 (0.0015)
[2023-09-20 19:14:01,100][105923] Fps is (10 sec: 13926.3, 60 sec: 13107.2, 300 sec: 12834.1). Total num frames: 1163264. Throughput: 0: 6897.2, 1: 6897.9. Samples: 1164268. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-20 19:14:01,101][105923] Avg episode reward: [(0, '37.161'), (1, '38.234')]
[2023-09-20 19:14:06,100][105923] Fps is (10 sec: 13516.4, 60 sec: 12902.4, 300 sec: 12805.3). Total num frames: 1224704. Throughput: 0: 6485.4, 1: 6484.7. Samples: 1203940. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-20 19:14:06,101][105923] Avg episode reward: [(0, '38.054'), (1, '38.090')]
[2023-09-20 19:14:06,104][106484] Updated weights for policy 0, policy_version 1200 (0.0015)
[2023-09-20 19:14:06,104][106486] Updated weights for policy 1, policy_version 1200 (0.0013)
[2023-09-20 19:14:06,108][106467] Saving ./train_dir/Swimmer/checkpoint_p1/checkpoint_000001200_614400.pth...
[2023-09-20 19:14:06,109][106463] Saving ./train_dir/Swimmer/checkpoint_p0/checkpoint_000001200_614400.pth...
[2023-09-20 19:14:06,112][106467] Removing ./train_dir/Swimmer/checkpoint_p1/checkpoint_000000816_417792.pth
[2023-09-20 19:14:06,115][106463] Removing ./train_dir/Swimmer/checkpoint_p0/checkpoint_000000816_417792.pth
[2023-09-20 19:14:06,116][106463] Saving new best policy, reward=38.054!
[2023-09-20 19:14:11,100][105923] Fps is (10 sec: 13107.2, 60 sec: 12970.7, 300 sec: 12861.4). Total num frames: 1294336. Throughput: 0: 6539.7, 1: 6539.1. Samples: 1283920. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
[2023-09-20 19:14:11,101][105923] Avg episode reward: [(0, '38.800'), (1, '40.398')]
[2023-09-20 19:14:11,102][106463] Saving new best policy, reward=38.800!
[2023-09-20 19:14:11,103][106467] Saving new best policy, reward=40.398!
[2023-09-20 19:14:12,263][106484] Updated weights for policy 0, policy_version 1280 (0.0010)
[2023-09-20 19:14:12,263][106486] Updated weights for policy 1, policy_version 1280 (0.0013)
[2023-09-20 19:14:16,100][105923] Fps is (10 sec: 13517.0, 60 sec: 13107.2, 300 sec: 12873.1). Total num frames: 1359872. Throughput: 0: 6985.4, 1: 6985.4. Samples: 1361868. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-20 19:14:16,101][105923] Avg episode reward: [(0, '41.008'), (1, '42.087')]
[2023-09-20 19:14:16,102][106463] Saving new best policy, reward=41.008!
[2023-09-20 19:14:16,103][106467] Saving new best policy, reward=42.087!
[2023-09-20 19:14:18,539][106486] Updated weights for policy 1, policy_version 1360 (0.0014)
[2023-09-20 19:14:18,540][106484] Updated weights for policy 0, policy_version 1360 (0.0015)
[2023-09-20 19:14:21,112][105923] Fps is (10 sec: 13091.4, 60 sec: 13104.6, 300 sec: 12882.3). Total num frames: 1425408. Throughput: 0: 6544.0, 1: 6545.3. Samples: 1399916. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-20 19:14:21,114][105923] Avg episode reward: [(0, '43.821'), (1, '44.023')]
[2023-09-20 19:14:21,122][106463] Saving ./train_dir/Swimmer/checkpoint_p0/checkpoint_000001392_712704.pth...
[2023-09-20 19:14:21,122][106467] Saving ./train_dir/Swimmer/checkpoint_p1/checkpoint_000001392_712704.pth...
[2023-09-20 19:14:21,127][106463] Removing ./train_dir/Swimmer/checkpoint_p0/checkpoint_000001008_516096.pth
[2023-09-20 19:14:21,128][106467] Removing ./train_dir/Swimmer/checkpoint_p1/checkpoint_000001008_516096.pth
[2023-09-20 19:14:21,128][106463] Saving new best policy, reward=43.821!
[2023-09-20 19:14:21,128][106467] Saving new best policy, reward=44.023!
[2023-09-20 19:14:24,449][106467] Early stopping after 2 epochs (8 sgd steps), loss delta 0.0000008
[2023-09-20 19:14:25,106][106486] Updated weights for policy 1, policy_version 1440 (0.0013)
[2023-09-20 19:14:25,107][106484] Updated weights for policy 0, policy_version 1440 (0.0015)
[2023-09-20 19:14:26,100][105923] Fps is (10 sec: 12288.1, 60 sec: 12970.6, 300 sec: 12822.3). Total num frames: 1482752. Throughput: 0: 6484.5, 1: 6482.3. Samples: 1474626. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0)
[2023-09-20 19:14:26,101][105923] Avg episode reward: [(0, '49.531'), (1, '44.385')]
[2023-09-20 19:14:26,102][106463] Saving new best policy, reward=49.531!
[2023-09-20 19:14:26,102][106467] Saving new best policy, reward=44.385!
[2023-09-20 19:14:31,100][105923] Fps is (10 sec: 12302.7, 60 sec: 13107.2, 300 sec: 12834.1). Total num frames: 1548288. Throughput: 0: 6498.2, 1: 6499.4. Samples: 1551526. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-20 19:14:31,102][105923] Avg episode reward: [(0, '53.447'), (1, '46.314')]
[2023-09-20 19:14:31,103][106463] Saving new best policy, reward=53.447!
[2023-09-20 19:14:31,103][106467] Saving new best policy, reward=46.314!
[2023-09-20 19:14:31,532][106484] Updated weights for policy 0, policy_version 1520 (0.0013)
[2023-09-20 19:14:31,532][106486] Updated weights for policy 1, policy_version 1520 (0.0013)
[2023-09-20 19:14:36,100][105923] Fps is (10 sec: 12288.0, 60 sec: 12970.7, 300 sec: 12779.5). Total num frames: 1605632. Throughput: 0: 6473.4, 1: 6471.5. Samples: 1589316. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-20 19:14:36,101][105923] Avg episode reward: [(0, '58.202'), (1, '52.955')]
[2023-09-20 19:14:36,136][106467] Saving ./train_dir/Swimmer/checkpoint_p1/checkpoint_000001576_806912.pth...
[2023-09-20 19:14:36,137][106463] Saving ./train_dir/Swimmer/checkpoint_p0/checkpoint_000001576_806912.pth...
[2023-09-20 19:14:36,140][106463] Removing ./train_dir/Swimmer/checkpoint_p0/checkpoint_000001200_614400.pth
[2023-09-20 19:14:36,140][106463] Saving new best policy, reward=58.202!
[2023-09-20 19:14:36,141][106467] Removing ./train_dir/Swimmer/checkpoint_p1/checkpoint_000001200_614400.pth
[2023-09-20 19:14:36,141][106467] Saving new best policy, reward=52.955!
[2023-09-20 19:14:37,853][106484] Updated weights for policy 0, policy_version 1600 (0.0012)
[2023-09-20 19:14:37,854][106486] Updated weights for policy 1, policy_version 1600 (0.0017)
[2023-09-20 19:14:41,100][105923] Fps is (10 sec: 13107.6, 60 sec: 12970.7, 300 sec: 12855.1). Total num frames: 1679360. Throughput: 0: 6495.6, 1: 6496.0. Samples: 1668044. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-20 19:14:41,101][105923] Avg episode reward: [(0, '62.241'), (1, '65.394')]
[2023-09-20 19:14:41,101][106463] Saving new best policy, reward=62.241!
[2023-09-20 19:14:41,102][106467] Saving new best policy, reward=65.394!
[2023-09-20 19:14:44,148][106484] Updated weights for policy 0, policy_version 1680 (0.0012)
[2023-09-20 19:14:44,149][106486] Updated weights for policy 1, policy_version 1680 (0.0015)
[2023-09-20 19:14:46,100][105923] Fps is (10 sec: 13926.2, 60 sec: 13107.2, 300 sec: 12864.5). Total num frames: 1744896. Throughput: 0: 6457.5, 1: 6456.5. Samples: 1745398. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
[2023-09-20 19:14:46,101][105923] Avg episode reward: [(0, '68.057'), (1, '77.795')]
[2023-09-20 19:14:46,102][106463] Saving new best policy, reward=68.057!
[2023-09-20 19:14:46,103][106467] Saving new best policy, reward=77.795!
[2023-09-20 19:14:50,392][106486] Updated weights for policy 1, policy_version 1760 (0.0013)
[2023-09-20 19:14:50,393][106484] Updated weights for policy 0, policy_version 1760 (0.0013)
[2023-09-20 19:14:51,100][105923] Fps is (10 sec: 13107.1, 60 sec: 13107.2, 300 sec: 12873.1). Total num frames: 1810432. Throughput: 0: 6463.6, 1: 6463.5. Samples: 1785658. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-20 19:14:51,101][105923] Avg episode reward: [(0, '73.434'), (1, '92.593')]
[2023-09-20 19:14:51,108][106463] Saving ./train_dir/Swimmer/checkpoint_p0/checkpoint_000001768_905216.pth...
[2023-09-20 19:14:51,108][106467] Saving ./train_dir/Swimmer/checkpoint_p1/checkpoint_000001768_905216.pth...
[2023-09-20 19:14:51,112][106463] Removing ./train_dir/Swimmer/checkpoint_p0/checkpoint_000001392_712704.pth
[2023-09-20 19:14:51,112][106463] Saving new best policy, reward=73.434!
[2023-09-20 19:14:51,114][106467] Removing ./train_dir/Swimmer/checkpoint_p1/checkpoint_000001392_712704.pth
[2023-09-20 19:14:51,115][106467] Saving new best policy, reward=92.593!
[2023-09-20 19:14:56,100][105923] Fps is (10 sec: 12288.2, 60 sec: 12970.7, 300 sec: 12824.7). Total num frames: 1867776. Throughput: 0: 6428.4, 1: 6428.6. Samples: 1862480. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-20 19:14:56,101][105923] Avg episode reward: [(0, '76.125'), (1, '105.086')]
[2023-09-20 19:14:56,102][106463] Saving new best policy, reward=76.125!
[2023-09-20 19:14:56,102][106467] Saving new best policy, reward=105.086!
[2023-09-20 19:14:56,908][106484] Updated weights for policy 0, policy_version 1840 (0.0015)
[2023-09-20 19:14:56,908][106486] Updated weights for policy 1, policy_version 1840 (0.0014)
[2023-09-20 19:15:01,100][105923] Fps is (10 sec: 12288.1, 60 sec: 12834.2, 300 sec: 12834.1). Total num frames: 1933312. Throughput: 0: 6361.7, 1: 5971.9. Samples: 1916880. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0)
[2023-09-20 19:15:01,101][105923] Avg episode reward: [(0, '78.070'), (1, '117.899')]
[2023-09-20 19:15:01,101][106463] Saving new best policy, reward=78.070!
[2023-09-20 19:15:01,101][106467] Saving new best policy, reward=117.899!
[2023-09-20 19:15:03,457][106484] Updated weights for policy 0, policy_version 1920 (0.0014) [2023-09-20 19:15:03,457][106486] Updated weights for policy 1, policy_version 1920 (0.0014) [2023-09-20 19:15:06,100][105923] Fps is (10 sec: 12287.7, 60 sec: 12765.9, 300 sec: 12790.1). Total num frames: 1990656. Throughput: 0: 6384.4, 1: 6382.6. Samples: 1974280. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-20 19:15:06,101][105923] Avg episode reward: [(0, '81.527'), (1, '122.922')] [2023-09-20 19:15:06,109][106463] Saving ./train_dir/Swimmer/checkpoint_p0/checkpoint_000001952_999424.pth... [2023-09-20 19:15:06,112][106463] Removing ./train_dir/Swimmer/checkpoint_p0/checkpoint_000001576_806912.pth [2023-09-20 19:15:06,112][106463] Saving new best policy, reward=81.527! [2023-09-20 19:15:06,113][106467] Saving ./train_dir/Swimmer/checkpoint_p1/checkpoint_000001952_999424.pth... [2023-09-20 19:15:06,116][106467] Removing ./train_dir/Swimmer/checkpoint_p1/checkpoint_000001576_806912.pth [2023-09-20 19:15:06,116][106467] Saving new best policy, reward=122.922! [2023-09-20 19:15:09,578][106486] Updated weights for policy 1, policy_version 2000 (0.0014) [2023-09-20 19:15:09,579][106484] Updated weights for policy 0, policy_version 2000 (0.0015) [2023-09-20 19:15:11,100][105923] Fps is (10 sec: 13107.0, 60 sec: 12834.1, 300 sec: 12851.2). Total num frames: 2064384. Throughput: 0: 6461.9, 1: 6461.9. Samples: 2056196. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-20 19:15:11,101][105923] Avg episode reward: [(0, '84.889'), (1, '125.522')] [2023-09-20 19:15:11,102][106463] Saving new best policy, reward=84.889! [2023-09-20 19:15:11,103][106467] Saving new best policy, reward=125.522! [2023-09-20 19:15:15,930][106484] Updated weights for policy 0, policy_version 2080 (0.0014) [2023-09-20 19:15:15,931][106486] Updated weights for policy 1, policy_version 2080 (0.0013) [2023-09-20 19:15:16,100][105923] Fps is (10 sec: 13926.8, 60 sec: 12834.2, 300 sec: 12859.0). 
Total num frames: 2129920. Throughput: 0: 6449.4, 1: 6449.1. Samples: 2131954. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-20 19:15:16,101][105923] Avg episode reward: [(0, '87.044'), (1, '123.567')] [2023-09-20 19:15:16,101][106463] Saving new best policy, reward=87.044! [2023-09-20 19:15:21,100][105923] Fps is (10 sec: 12288.0, 60 sec: 12700.2, 300 sec: 12818.1). Total num frames: 2187264. Throughput: 0: 6441.0, 1: 6442.9. Samples: 2169094. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-20 19:15:21,101][105923] Avg episode reward: [(0, '87.603'), (1, '122.461')] [2023-09-20 19:15:21,106][106467] Saving ./train_dir/Swimmer/checkpoint_p1/checkpoint_000002136_1093632.pth... [2023-09-20 19:15:21,107][106463] Saving ./train_dir/Swimmer/checkpoint_p0/checkpoint_000002136_1093632.pth... [2023-09-20 19:15:21,111][106467] Removing ./train_dir/Swimmer/checkpoint_p1/checkpoint_000001768_905216.pth [2023-09-20 19:15:21,114][106463] Removing ./train_dir/Swimmer/checkpoint_p0/checkpoint_000001768_905216.pth [2023-09-20 19:15:21,115][106463] Saving new best policy, reward=87.603! [2023-09-20 19:15:22,414][106486] Updated weights for policy 1, policy_version 2160 (0.0012) [2023-09-20 19:15:22,414][106484] Updated weights for policy 0, policy_version 2160 (0.0017) [2023-09-20 19:15:26,100][105923] Fps is (10 sec: 12287.7, 60 sec: 12834.1, 300 sec: 12826.3). Total num frames: 2252800. Throughput: 0: 6428.2, 1: 6428.1. Samples: 2246580. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-20 19:15:26,101][105923] Avg episode reward: [(0, '87.080'), (1, '121.341')] [2023-09-20 19:15:28,568][106486] Updated weights for policy 1, policy_version 2240 (0.0015) [2023-09-20 19:15:28,568][106484] Updated weights for policy 0, policy_version 2240 (0.0015) [2023-09-20 19:15:31,100][105923] Fps is (10 sec: 13926.6, 60 sec: 12970.7, 300 sec: 12879.6). Total num frames: 2326528. Throughput: 0: 6484.8, 1: 6485.6. Samples: 2329064. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-20 19:15:31,101][105923] Avg episode reward: [(0, '87.983'), (1, '118.775')] [2023-09-20 19:15:31,102][106463] Saving new best policy, reward=87.983! [2023-09-20 19:15:34,468][106484] Updated weights for policy 0, policy_version 2320 (0.0012) [2023-09-20 19:15:34,468][106486] Updated weights for policy 1, policy_version 2320 (0.0014) [2023-09-20 19:15:36,100][105923] Fps is (10 sec: 13926.4, 60 sec: 13107.1, 300 sec: 12885.8). Total num frames: 2392064. Throughput: 0: 6499.9, 1: 6500.6. Samples: 2370684. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-20 19:15:36,101][105923] Avg episode reward: [(0, '88.927'), (1, '112.677')] [2023-09-20 19:15:36,109][106463] Saving ./train_dir/Swimmer/checkpoint_p0/checkpoint_000002336_1196032.pth... [2023-09-20 19:15:36,109][106467] Saving ./train_dir/Swimmer/checkpoint_p1/checkpoint_000002336_1196032.pth... [2023-09-20 19:15:36,116][106467] Removing ./train_dir/Swimmer/checkpoint_p1/checkpoint_000001952_999424.pth [2023-09-20 19:15:36,116][106463] Removing ./train_dir/Swimmer/checkpoint_p0/checkpoint_000001952_999424.pth [2023-09-20 19:15:36,117][106463] Saving new best policy, reward=88.927! [2023-09-20 19:15:40,653][106484] Updated weights for policy 0, policy_version 2400 (0.0014) [2023-09-20 19:15:40,653][106486] Updated weights for policy 1, policy_version 2400 (0.0014) [2023-09-20 19:15:41,100][105923] Fps is (10 sec: 13107.1, 60 sec: 12970.7, 300 sec: 12891.6). Total num frames: 2457600. Throughput: 0: 6527.0, 1: 6527.5. Samples: 2449934. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-09-20 19:15:41,101][105923] Avg episode reward: [(0, '91.199'), (1, '104.138')] [2023-09-20 19:15:41,102][106463] Saving new best policy, reward=91.199! [2023-09-20 19:15:46,100][105923] Fps is (10 sec: 13107.3, 60 sec: 12970.7, 300 sec: 12897.1). Total num frames: 2523136. Throughput: 0: 6607.8, 1: 6997.9. Samples: 2529140. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-20 19:15:46,101][105923] Avg episode reward: [(0, '91.102'), (1, '94.899')] [2023-09-20 19:15:46,964][106486] Updated weights for policy 1, policy_version 2480 (0.0016) [2023-09-20 19:15:46,964][106484] Updated weights for policy 0, policy_version 2480 (0.0014) [2023-09-20 19:15:51,100][105923] Fps is (10 sec: 13107.0, 60 sec: 12970.6, 300 sec: 12902.4). Total num frames: 2588672. Throughput: 0: 6559.4, 1: 6561.1. Samples: 2564702. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-20 19:15:51,101][105923] Avg episode reward: [(0, '90.916'), (1, '82.982')] [2023-09-20 19:15:51,108][106463] Saving ./train_dir/Swimmer/checkpoint_p0/checkpoint_000002528_1294336.pth... [2023-09-20 19:15:51,110][106467] Saving ./train_dir/Swimmer/checkpoint_p1/checkpoint_000002528_1294336.pth... [2023-09-20 19:15:51,116][106463] Removing ./train_dir/Swimmer/checkpoint_p0/checkpoint_000002136_1093632.pth [2023-09-20 19:15:51,117][106467] Removing ./train_dir/Swimmer/checkpoint_p1/checkpoint_000002136_1093632.pth [2023-09-20 19:15:53,558][106484] Updated weights for policy 0, policy_version 2560 (0.0014) [2023-09-20 19:15:53,558][106486] Updated weights for policy 1, policy_version 2560 (0.0016) [2023-09-20 19:15:56,100][105923] Fps is (10 sec: 13107.2, 60 sec: 13107.2, 300 sec: 12907.4). Total num frames: 2654208. Throughput: 0: 6515.7, 1: 6517.8. Samples: 2642704. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-20 19:15:56,101][105923] Avg episode reward: [(0, '90.377'), (1, '70.315')] [2023-09-20 19:15:59,662][106486] Updated weights for policy 1, policy_version 2640 (0.0012) [2023-09-20 19:15:59,662][106484] Updated weights for policy 0, policy_version 2640 (0.0010) [2023-09-20 19:16:01,100][105923] Fps is (10 sec: 13107.5, 60 sec: 13107.2, 300 sec: 12912.2). Total num frames: 2719744. Throughput: 0: 6572.2, 1: 6573.0. Samples: 2723488. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-20 19:16:01,101][105923] Avg episode reward: [(0, '91.860'), (1, '58.392')] [2023-09-20 19:16:01,102][106463] Saving new best policy, reward=91.860! [2023-09-20 19:16:05,921][106486] Updated weights for policy 1, policy_version 2720 (0.0014) [2023-09-20 19:16:05,921][106484] Updated weights for policy 0, policy_version 2720 (0.0015) [2023-09-20 19:16:06,100][105923] Fps is (10 sec: 13107.3, 60 sec: 13243.8, 300 sec: 12916.7). Total num frames: 2785280. Throughput: 0: 6580.2, 1: 6580.1. Samples: 2761308. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-20 19:16:06,101][105923] Avg episode reward: [(0, '93.955'), (1, '53.196')] [2023-09-20 19:16:06,107][106463] Saving ./train_dir/Swimmer/checkpoint_p0/checkpoint_000002720_1392640.pth... [2023-09-20 19:16:06,107][106467] Saving ./train_dir/Swimmer/checkpoint_p1/checkpoint_000002720_1392640.pth... [2023-09-20 19:16:06,110][106463] Removing ./train_dir/Swimmer/checkpoint_p0/checkpoint_000002336_1196032.pth [2023-09-20 19:16:06,110][106463] Saving new best policy, reward=93.955! [2023-09-20 19:16:06,113][106467] Removing ./train_dir/Swimmer/checkpoint_p1/checkpoint_000002336_1196032.pth [2023-09-20 19:16:11,100][105923] Fps is (10 sec: 12287.9, 60 sec: 12970.7, 300 sec: 12883.8). Total num frames: 2842624. Throughput: 0: 6533.5, 1: 6532.2. Samples: 2834536. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-09-20 19:16:11,101][105923] Avg episode reward: [(0, '97.096'), (1, '51.152')] [2023-09-20 19:16:11,102][106463] Saving new best policy, reward=97.096! [2023-09-20 19:16:12,778][106484] Updated weights for policy 0, policy_version 2800 (0.0012) [2023-09-20 19:16:12,779][106486] Updated weights for policy 1, policy_version 2800 (0.0016) [2023-09-20 19:16:16,100][105923] Fps is (10 sec: 12288.1, 60 sec: 12970.6, 300 sec: 12888.7). Total num frames: 2908160. Throughput: 0: 6468.3, 1: 6467.7. Samples: 2911184. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-20 19:16:16,101][105923] Avg episode reward: [(0, '100.149'), (1, '49.455')] [2023-09-20 19:16:16,102][106463] Saving new best policy, reward=100.149! [2023-09-20 19:16:19,320][106484] Updated weights for policy 0, policy_version 2880 (0.0013) [2023-09-20 19:16:19,321][106486] Updated weights for policy 1, policy_version 2880 (0.0015) [2023-09-20 19:16:21,100][105923] Fps is (10 sec: 12287.9, 60 sec: 12970.7, 300 sec: 12857.9). Total num frames: 2965504. Throughput: 0: 6401.3, 1: 6402.0. Samples: 2946834. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-20 19:16:21,101][105923] Avg episode reward: [(0, '103.951'), (1, '48.156')] [2023-09-20 19:16:21,157][106463] Saving ./train_dir/Swimmer/checkpoint_p0/checkpoint_000002904_1486848.pth... [2023-09-20 19:16:21,159][106467] Saving ./train_dir/Swimmer/checkpoint_p1/checkpoint_000002904_1486848.pth... [2023-09-20 19:16:21,160][106463] Removing ./train_dir/Swimmer/checkpoint_p0/checkpoint_000002528_1294336.pth [2023-09-20 19:16:21,161][106463] Saving new best policy, reward=103.951! [2023-09-20 19:16:21,163][106467] Removing ./train_dir/Swimmer/checkpoint_p1/checkpoint_000002528_1294336.pth [2023-09-20 19:16:25,456][106486] Updated weights for policy 1, policy_version 2960 (0.0013) [2023-09-20 19:16:25,456][106484] Updated weights for policy 0, policy_version 2960 (0.0013) [2023-09-20 19:16:26,100][105923] Fps is (10 sec: 12288.0, 60 sec: 12970.7, 300 sec: 12863.2). Total num frames: 3031040. Throughput: 0: 6407.4, 1: 6407.4. Samples: 3026600. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-09-20 19:16:26,101][105923] Avg episode reward: [(0, '107.772'), (1, '49.476')] [2023-09-20 19:16:26,135][106463] Saving new best policy, reward=107.772! [2023-09-20 19:16:31,100][105923] Fps is (10 sec: 13926.5, 60 sec: 12970.6, 300 sec: 12902.4). Total num frames: 3104768. Throughput: 0: 6397.5, 1: 6395.8. Samples: 3104840. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-20 19:16:31,101][105923] Avg episode reward: [(0, '109.267'), (1, '49.848')] [2023-09-20 19:16:31,102][106463] Saving new best policy, reward=109.267! [2023-09-20 19:16:31,760][106484] Updated weights for policy 0, policy_version 3040 (0.0011) [2023-09-20 19:16:31,761][106486] Updated weights for policy 1, policy_version 3040 (0.0014) [2023-09-20 19:16:36,100][105923] Fps is (10 sec: 13107.2, 60 sec: 12834.2, 300 sec: 12873.1). Total num frames: 3162112. Throughput: 0: 6415.9, 1: 6415.4. Samples: 3142110. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-20 19:16:36,101][105923] Avg episode reward: [(0, '108.928'), (1, '51.064')] [2023-09-20 19:16:36,107][106467] Saving ./train_dir/Swimmer/checkpoint_p1/checkpoint_000003088_1581056.pth... [2023-09-20 19:16:36,107][106463] Saving ./train_dir/Swimmer/checkpoint_p0/checkpoint_000003088_1581056.pth... [2023-09-20 19:16:36,113][106467] Removing ./train_dir/Swimmer/checkpoint_p1/checkpoint_000002720_1392640.pth [2023-09-20 19:16:36,115][106463] Removing ./train_dir/Swimmer/checkpoint_p0/checkpoint_000002720_1392640.pth [2023-09-20 19:16:38,126][106486] Updated weights for policy 1, policy_version 3120 (0.0013) [2023-09-20 19:16:38,126][106484] Updated weights for policy 0, policy_version 3120 (0.0013) [2023-09-20 19:16:41,100][105923] Fps is (10 sec: 12288.1, 60 sec: 12834.1, 300 sec: 12877.8). Total num frames: 3227648. Throughput: 0: 6428.6, 1: 6428.8. Samples: 3221284. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-20 19:16:41,101][105923] Avg episode reward: [(0, '106.536'), (1, '54.543')] [2023-09-20 19:16:44,503][106486] Updated weights for policy 1, policy_version 3200 (0.0015) [2023-09-20 19:16:44,504][106484] Updated weights for policy 0, policy_version 3200 (0.0014) [2023-09-20 19:16:45,129][106463] Early stopping after 2 epochs (8 sgd steps), loss delta 0.0000007 [2023-09-20 19:16:46,100][105923] Fps is (10 sec: 13107.1, 60 sec: 12834.1, 300 sec: 12882.3). Total num frames: 3293184. Throughput: 0: 6375.5, 1: 6375.6. Samples: 3297288. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-09-20 19:16:46,101][105923] Avg episode reward: [(0, '103.855'), (1, '58.359')] [2023-09-20 19:16:50,772][106486] Updated weights for policy 1, policy_version 3280 (0.0014) [2023-09-20 19:16:50,773][106484] Updated weights for policy 0, policy_version 3280 (0.0012) [2023-09-20 19:16:51,100][105923] Fps is (10 sec: 13107.1, 60 sec: 12834.2, 300 sec: 12886.6). Total num frames: 3358720. Throughput: 0: 6397.5, 1: 6397.8. Samples: 3337096. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-20 19:16:51,101][105923] Avg episode reward: [(0, '102.222'), (1, '61.192')] [2023-09-20 19:16:51,108][106463] Saving ./train_dir/Swimmer/checkpoint_p0/checkpoint_000003280_1679360.pth... [2023-09-20 19:16:51,109][106467] Saving ./train_dir/Swimmer/checkpoint_p1/checkpoint_000003280_1679360.pth... [2023-09-20 19:16:51,112][106467] Removing ./train_dir/Swimmer/checkpoint_p1/checkpoint_000002904_1486848.pth [2023-09-20 19:16:51,115][106463] Removing ./train_dir/Swimmer/checkpoint_p0/checkpoint_000002904_1486848.pth [2023-09-20 19:16:56,100][105923] Fps is (10 sec: 13107.2, 60 sec: 12834.1, 300 sec: 12890.8). Total num frames: 3424256. Throughput: 0: 6486.8, 1: 6488.0. Samples: 3418406. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-20 19:16:56,101][105923] Avg episode reward: [(0, '100.234'), (1, '59.851')] [2023-09-20 19:16:56,814][106486] Updated weights for policy 1, policy_version 3360 (0.0015) [2023-09-20 19:16:56,815][106484] Updated weights for policy 0, policy_version 3360 (0.0016) [2023-09-20 19:17:01,100][105923] Fps is (10 sec: 13926.4, 60 sec: 12970.6, 300 sec: 12925.1). Total num frames: 3497984. Throughput: 0: 6533.1, 1: 6533.3. Samples: 3499174. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-20 19:17:01,101][105923] Avg episode reward: [(0, '100.015'), (1, '59.450')] [2023-09-20 19:17:02,859][106486] Updated weights for policy 1, policy_version 3440 (0.0015) [2023-09-20 19:17:02,860][106484] Updated weights for policy 0, policy_version 3440 (0.0015) [2023-09-20 19:17:06,100][105923] Fps is (10 sec: 13926.3, 60 sec: 12970.6, 300 sec: 12928.5). Total num frames: 3563520. Throughput: 0: 6594.8, 1: 6594.3. Samples: 3540344. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-20 19:17:06,101][105923] Avg episode reward: [(0, '100.079'), (1, '57.903')] [2023-09-20 19:17:06,111][106463] Saving ./train_dir/Swimmer/checkpoint_p0/checkpoint_000003480_1781760.pth... [2023-09-20 19:17:06,111][106467] Saving ./train_dir/Swimmer/checkpoint_p1/checkpoint_000003480_1781760.pth... [2023-09-20 19:17:06,120][106463] Removing ./train_dir/Swimmer/checkpoint_p0/checkpoint_000003088_1581056.pth [2023-09-20 19:17:06,120][106467] Removing ./train_dir/Swimmer/checkpoint_p1/checkpoint_000003088_1581056.pth [2023-09-20 19:17:08,965][106484] Updated weights for policy 0, policy_version 3520 (0.0015) [2023-09-20 19:17:08,966][106486] Updated weights for policy 1, policy_version 3520 (0.0009) [2023-09-20 19:17:11,100][105923] Fps is (10 sec: 13107.4, 60 sec: 13107.2, 300 sec: 12931.7). Total num frames: 3629056. Throughput: 0: 6596.0, 1: 6595.5. Samples: 3620214. 
Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-09-20 19:17:11,101][105923] Avg episode reward: [(0, '100.135'), (1, '56.415')] [2023-09-20 19:17:15,350][106484] Updated weights for policy 0, policy_version 3600 (0.0013) [2023-09-20 19:17:15,350][106486] Updated weights for policy 1, policy_version 3600 (0.0014) [2023-09-20 19:17:16,100][105923] Fps is (10 sec: 13107.4, 60 sec: 13107.2, 300 sec: 12934.7). Total num frames: 3694592. Throughput: 0: 6567.2, 1: 6568.8. Samples: 3695960. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-20 19:17:16,101][105923] Avg episode reward: [(0, '96.148'), (1, '56.204')] [2023-09-20 19:17:21,100][105923] Fps is (10 sec: 13106.9, 60 sec: 13243.7, 300 sec: 12937.7). Total num frames: 3760128. Throughput: 0: 6595.0, 1: 6594.0. Samples: 3735618. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-20 19:17:21,101][105923] Avg episode reward: [(0, '89.670'), (1, '58.843')] [2023-09-20 19:17:21,110][106463] Saving ./train_dir/Swimmer/checkpoint_p0/checkpoint_000003672_1880064.pth... [2023-09-20 19:17:21,110][106467] Saving ./train_dir/Swimmer/checkpoint_p1/checkpoint_000003672_1880064.pth... [2023-09-20 19:17:21,116][106467] Removing ./train_dir/Swimmer/checkpoint_p1/checkpoint_000003280_1679360.pth [2023-09-20 19:17:21,118][106463] Removing ./train_dir/Swimmer/checkpoint_p0/checkpoint_000003280_1679360.pth [2023-09-20 19:17:21,590][106486] Updated weights for policy 1, policy_version 3680 (0.0012) [2023-09-20 19:17:21,591][106484] Updated weights for policy 0, policy_version 3680 (0.0011) [2023-09-20 19:17:26,100][105923] Fps is (10 sec: 13107.1, 60 sec: 13243.7, 300 sec: 12940.6). Total num frames: 3825664. Throughput: 0: 6587.9, 1: 6587.4. Samples: 3814176. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-20 19:17:26,101][105923] Avg episode reward: [(0, '78.716'), (1, '62.822')] [2023-09-20 19:17:27,977][106486] Updated weights for policy 1, policy_version 3760 (0.0014) [2023-09-20 19:17:27,977][106484] Updated weights for policy 0, policy_version 3760 (0.0013) [2023-09-20 19:17:31,100][105923] Fps is (10 sec: 12288.4, 60 sec: 12970.7, 300 sec: 12968.4). Total num frames: 3883008. Throughput: 0: 6570.5, 1: 6570.0. Samples: 3888610. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-20 19:17:31,101][105923] Avg episode reward: [(0, '71.375'), (1, '69.420')] [2023-09-20 19:17:34,664][106486] Updated weights for policy 1, policy_version 3840 (0.0015) [2023-09-20 19:17:34,665][106484] Updated weights for policy 0, policy_version 3840 (0.0015) [2023-09-20 19:17:36,100][105923] Fps is (10 sec: 12287.9, 60 sec: 13107.2, 300 sec: 12968.3). Total num frames: 3948544. Throughput: 0: 6525.9, 1: 6525.3. Samples: 3924402. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-09-20 19:17:36,101][105923] Avg episode reward: [(0, '67.571'), (1, '76.763')] [2023-09-20 19:17:36,109][106463] Saving ./train_dir/Swimmer/checkpoint_p0/checkpoint_000003856_1974272.pth... [2023-09-20 19:17:36,109][106467] Saving ./train_dir/Swimmer/checkpoint_p1/checkpoint_000003856_1974272.pth... [2023-09-20 19:17:36,116][106463] Removing ./train_dir/Swimmer/checkpoint_p0/checkpoint_000003480_1781760.pth [2023-09-20 19:17:36,120][106467] Removing ./train_dir/Swimmer/checkpoint_p1/checkpoint_000003480_1781760.pth [2023-09-20 19:17:41,100][105923] Fps is (10 sec: 12287.9, 60 sec: 12970.7, 300 sec: 12940.6). Total num frames: 4005888. Throughput: 0: 6476.0, 1: 6475.8. Samples: 4001234. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-20 19:17:41,101][105923] Avg episode reward: [(0, '64.807'), (1, '82.967')] [2023-09-20 19:17:41,242][106484] Updated weights for policy 0, policy_version 3920 (0.0010) [2023-09-20 19:17:41,243][106486] Updated weights for policy 1, policy_version 3920 (0.0016) [2023-09-20 19:17:46,100][105923] Fps is (10 sec: 12288.2, 60 sec: 12970.7, 300 sec: 12940.6). Total num frames: 4071424. Throughput: 0: 6444.9, 1: 6444.4. Samples: 4079194. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-20 19:17:46,101][105923] Avg episode reward: [(0, '64.873'), (1, '86.099')] [2023-09-20 19:17:47,540][106484] Updated weights for policy 0, policy_version 4000 (0.0012) [2023-09-20 19:17:47,541][106486] Updated weights for policy 1, policy_version 4000 (0.0013) [2023-09-20 19:17:51,100][105923] Fps is (10 sec: 13107.3, 60 sec: 12970.7, 300 sec: 12968.4). Total num frames: 4136960. Throughput: 0: 6388.0, 1: 6387.5. Samples: 4115240. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-20 19:17:51,101][105923] Avg episode reward: [(0, '63.626'), (1, '86.231')] [2023-09-20 19:17:51,106][106463] Saving ./train_dir/Swimmer/checkpoint_p0/checkpoint_000004040_2068480.pth... [2023-09-20 19:17:51,107][106467] Saving ./train_dir/Swimmer/checkpoint_p1/checkpoint_000004040_2068480.pth... [2023-09-20 19:17:51,116][106463] Removing ./train_dir/Swimmer/checkpoint_p0/checkpoint_000003672_1880064.pth [2023-09-20 19:17:51,117][106467] Removing ./train_dir/Swimmer/checkpoint_p1/checkpoint_000003672_1880064.pth [2023-09-20 19:17:54,097][106484] Updated weights for policy 0, policy_version 4080 (0.0014) [2023-09-20 19:17:54,098][106486] Updated weights for policy 1, policy_version 4080 (0.0014) [2023-09-20 19:17:56,100][105923] Fps is (10 sec: 13107.0, 60 sec: 12970.7, 300 sec: 12968.3). Total num frames: 4202496. Throughput: 0: 6362.5, 1: 6362.8. Samples: 4192858. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-20 19:17:56,101][105923] Avg episode reward: [(0, '66.852'), (1, '86.914')] [2023-09-20 19:18:00,219][106484] Updated weights for policy 0, policy_version 4160 (0.0013) [2023-09-20 19:18:00,219][106486] Updated weights for policy 1, policy_version 4160 (0.0015) [2023-09-20 19:18:01,100][105923] Fps is (10 sec: 13107.2, 60 sec: 12834.2, 300 sec: 12940.6). Total num frames: 4268032. Throughput: 0: 6402.6, 1: 6403.0. Samples: 4272212. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-20 19:18:01,101][105923] Avg episode reward: [(0, '74.766'), (1, '86.777')] [2023-09-20 19:18:06,100][105923] Fps is (10 sec: 13107.3, 60 sec: 12834.2, 300 sec: 12940.6). Total num frames: 4333568. Throughput: 0: 6378.9, 1: 6380.9. Samples: 4309804. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-09-20 19:18:06,101][105923] Avg episode reward: [(0, '78.228'), (1, '83.675')] [2023-09-20 19:18:06,107][106463] Saving ./train_dir/Swimmer/checkpoint_p0/checkpoint_000004232_2166784.pth... [2023-09-20 19:18:06,107][106467] Saving ./train_dir/Swimmer/checkpoint_p1/checkpoint_000004232_2166784.pth... [2023-09-20 19:18:06,113][106463] Removing ./train_dir/Swimmer/checkpoint_p0/checkpoint_000003856_1974272.pth [2023-09-20 19:18:06,114][106467] Removing ./train_dir/Swimmer/checkpoint_p1/checkpoint_000003856_1974272.pth [2023-09-20 19:18:06,345][106486] Updated weights for policy 1, policy_version 4240 (0.0013) [2023-09-20 19:18:06,345][106484] Updated weights for policy 0, policy_version 4240 (0.0014) [2023-09-20 19:18:11,100][105923] Fps is (10 sec: 13107.1, 60 sec: 12834.1, 300 sec: 12968.4). Total num frames: 4399104. Throughput: 0: 6411.3, 1: 6411.6. Samples: 4391208. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-20 19:18:11,101][105923] Avg episode reward: [(0, '76.991'), (1, '82.452')] [2023-09-20 19:18:12,763][106484] Updated weights for policy 0, policy_version 4320 (0.0013) [2023-09-20 19:18:12,763][106486] Updated weights for policy 1, policy_version 4320 (0.0011) [2023-09-20 19:18:16,100][105923] Fps is (10 sec: 13107.2, 60 sec: 12834.1, 300 sec: 12968.4). Total num frames: 4464640. Throughput: 0: 6424.5, 1: 6425.0. Samples: 4466840. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-20 19:18:16,101][105923] Avg episode reward: [(0, '72.423'), (1, '82.210')] [2023-09-20 19:18:19,167][106484] Updated weights for policy 0, policy_version 4400 (0.0013) [2023-09-20 19:18:19,167][106486] Updated weights for policy 1, policy_version 4400 (0.0015) [2023-09-20 19:18:21,100][105923] Fps is (10 sec: 12288.0, 60 sec: 12697.6, 300 sec: 12940.6). Total num frames: 4521984. Throughput: 0: 6458.2, 1: 6457.1. Samples: 4505592. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-20 19:18:21,101][105923] Avg episode reward: [(0, '70.378'), (1, '83.121')] [2023-09-20 19:18:21,125][106467] Saving ./train_dir/Swimmer/checkpoint_p1/checkpoint_000004424_2265088.pth... [2023-09-20 19:18:21,129][106467] Removing ./train_dir/Swimmer/checkpoint_p1/checkpoint_000004040_2068480.pth [2023-09-20 19:18:21,130][106463] Saving ./train_dir/Swimmer/checkpoint_p0/checkpoint_000004424_2265088.pth... [2023-09-20 19:18:21,134][106463] Removing ./train_dir/Swimmer/checkpoint_p0/checkpoint_000004040_2068480.pth [2023-09-20 19:18:25,735][106484] Updated weights for policy 0, policy_version 4480 (0.0011) [2023-09-20 19:18:25,735][106486] Updated weights for policy 1, policy_version 4480 (0.0012) [2023-09-20 19:18:26,100][105923] Fps is (10 sec: 12288.1, 60 sec: 12697.6, 300 sec: 12968.4). Total num frames: 4587520. Throughput: 0: 6424.1, 1: 6422.4. Samples: 4579328. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-20 19:18:26,101][105923] Avg episode reward: [(0, '71.825'), (1, '82.972')] [2023-09-20 19:18:31,100][105923] Fps is (10 sec: 13107.3, 60 sec: 12834.1, 300 sec: 12968.4). Total num frames: 4653056. Throughput: 0: 6450.8, 1: 6451.4. Samples: 4659794. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-09-20 19:18:31,101][105923] Avg episode reward: [(0, '73.702'), (1, '83.612')] [2023-09-20 19:18:32,007][106486] Updated weights for policy 1, policy_version 4560 (0.0014) [2023-09-20 19:18:32,007][106484] Updated weights for policy 0, policy_version 4560 (0.0015) [2023-09-20 19:18:36,100][105923] Fps is (10 sec: 13107.2, 60 sec: 12834.2, 300 sec: 12940.6). Total num frames: 4718592. Throughput: 0: 6463.8, 1: 6464.8. Samples: 4697028. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-20 19:18:36,101][105923] Avg episode reward: [(0, '75.029'), (1, '85.306')] [2023-09-20 19:18:36,108][106467] Saving ./train_dir/Swimmer/checkpoint_p1/checkpoint_000004608_2359296.pth... [2023-09-20 19:18:36,108][106463] Saving ./train_dir/Swimmer/checkpoint_p0/checkpoint_000004608_2359296.pth... [2023-09-20 19:18:36,111][106467] Removing ./train_dir/Swimmer/checkpoint_p1/checkpoint_000004232_2166784.pth [2023-09-20 19:18:36,113][106463] Removing ./train_dir/Swimmer/checkpoint_p0/checkpoint_000004232_2166784.pth [2023-09-20 19:18:38,348][106486] Updated weights for policy 1, policy_version 4640 (0.0015) [2023-09-20 19:18:38,348][106484] Updated weights for policy 0, policy_version 4640 (0.0012) [2023-09-20 19:18:41,100][105923] Fps is (10 sec: 13107.0, 60 sec: 12970.6, 300 sec: 12968.3). Total num frames: 4784128. Throughput: 0: 6449.1, 1: 6448.6. Samples: 4773252. 
Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-09-20 19:18:41,101][105923] Avg episode reward: [(0, '75.141'), (1, '85.356')] [2023-09-20 19:18:44,987][106484] Updated weights for policy 0, policy_version 4720 (0.0013) [2023-09-20 19:18:44,988][106486] Updated weights for policy 1, policy_version 4720 (0.0013) [2023-09-20 19:18:46,100][105923] Fps is (10 sec: 12287.8, 60 sec: 12834.1, 300 sec: 12940.6). Total num frames: 4841472. Throughput: 0: 6415.3, 1: 6413.8. Samples: 4849524. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-09-20 19:18:46,101][105923] Avg episode reward: [(0, '74.283'), (1, '82.999')] [2023-09-20 19:18:51,100][105923] Fps is (10 sec: 12288.0, 60 sec: 12834.1, 300 sec: 12940.6). Total num frames: 4907008. Throughput: 0: 6417.9, 1: 6417.2. Samples: 4887384. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-20 19:18:51,101][105923] Avg episode reward: [(0, '74.755'), (1, '78.887')] [2023-09-20 19:18:51,109][106467] Saving ./train_dir/Swimmer/checkpoint_p1/checkpoint_000004792_2453504.pth... [2023-09-20 19:18:51,109][106463] Saving ./train_dir/Swimmer/checkpoint_p0/checkpoint_000004792_2453504.pth... [2023-09-20 19:18:51,116][106467] Removing ./train_dir/Swimmer/checkpoint_p1/checkpoint_000004424_2265088.pth [2023-09-20 19:18:51,116][106463] Removing ./train_dir/Swimmer/checkpoint_p0/checkpoint_000004424_2265088.pth [2023-09-20 19:18:51,197][106484] Updated weights for policy 0, policy_version 4800 (0.0013) [2023-09-20 19:18:51,198][106486] Updated weights for policy 1, policy_version 4800 (0.0013) [2023-09-20 19:18:56,100][105923] Fps is (10 sec: 13107.3, 60 sec: 12834.2, 300 sec: 12912.8). Total num frames: 4972544. Throughput: 0: 6389.9, 1: 6389.9. Samples: 4966300. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-20 19:18:56,101][105923] Avg episode reward: [(0, '74.822'), (1, '78.510')] [2023-09-20 19:18:57,425][106484] Updated weights for policy 0, policy_version 4880 (0.0013) [2023-09-20 19:18:57,425][106486] Updated weights for policy 1, policy_version 4880 (0.0015) [2023-09-20 19:19:01,100][105923] Fps is (10 sec: 13107.3, 60 sec: 12834.1, 300 sec: 12926.7). Total num frames: 5038080. Throughput: 0: 6438.7, 1: 6437.0. Samples: 5046248. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-20 19:19:01,101][105923] Avg episode reward: [(0, '75.999'), (1, '79.474')] [2023-09-20 19:19:03,568][106486] Updated weights for policy 1, policy_version 4960 (0.0015) [2023-09-20 19:19:03,569][106484] Updated weights for policy 0, policy_version 4960 (0.0009) [2023-09-20 19:19:06,100][105923] Fps is (10 sec: 13107.1, 60 sec: 12834.1, 300 sec: 12912.8). Total num frames: 5103616. Throughput: 0: 6458.7, 1: 6460.1. Samples: 5086940. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-20 19:19:06,101][105923] Avg episode reward: [(0, '77.729'), (1, '82.603')] [2023-09-20 19:19:06,110][106463] Saving ./train_dir/Swimmer/checkpoint_p0/checkpoint_000004992_2555904.pth... [2023-09-20 19:19:06,110][106467] Saving ./train_dir/Swimmer/checkpoint_p1/checkpoint_000004992_2555904.pth... [2023-09-20 19:19:06,113][106463] Removing ./train_dir/Swimmer/checkpoint_p0/checkpoint_000004608_2359296.pth [2023-09-20 19:19:06,113][106467] Removing ./train_dir/Swimmer/checkpoint_p1/checkpoint_000004608_2359296.pth [2023-09-20 19:19:09,962][106484] Updated weights for policy 0, policy_version 5040 (0.0015) [2023-09-20 19:19:09,962][106486] Updated weights for policy 1, policy_version 5040 (0.0014) [2023-09-20 19:19:11,100][105923] Fps is (10 sec: 13107.3, 60 sec: 12834.1, 300 sec: 12912.8). Total num frames: 5169152. Throughput: 0: 6485.6, 1: 6487.9. Samples: 5163134. 
Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-20 19:19:11,101][105923] Avg episode reward: [(0, '80.705'), (1, '88.946')] [2023-09-20 19:19:16,100][105923] Fps is (10 sec: 13107.4, 60 sec: 12834.1, 300 sec: 12913.3). Total num frames: 5234688. Throughput: 0: 6445.1, 1: 6445.3. Samples: 5239862. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-20 19:19:16,101][105923] Avg episode reward: [(0, '83.016'), (1, '93.844')] [2023-09-20 19:19:16,370][106484] Updated weights for policy 0, policy_version 5120 (0.0015) [2023-09-20 19:19:16,370][106486] Updated weights for policy 1, policy_version 5120 (0.0013) [2023-09-20 19:19:21,100][105923] Fps is (10 sec: 13107.0, 60 sec: 12970.6, 300 sec: 12940.6). Total num frames: 5300224. Throughput: 0: 6454.2, 1: 6453.3. Samples: 5277868. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-20 19:19:21,101][105923] Avg episode reward: [(0, '82.461'), (1, '101.325')] [2023-09-20 19:19:21,110][106467] Saving ./train_dir/Swimmer/checkpoint_p1/checkpoint_000005176_2650112.pth... [2023-09-20 19:19:21,110][106463] Saving ./train_dir/Swimmer/checkpoint_p0/checkpoint_000005176_2650112.pth... [2023-09-20 19:19:21,113][106467] Removing ./train_dir/Swimmer/checkpoint_p1/checkpoint_000004792_2453504.pth [2023-09-20 19:19:21,117][106463] Removing ./train_dir/Swimmer/checkpoint_p0/checkpoint_000004792_2453504.pth [2023-09-20 19:19:22,712][106484] Updated weights for policy 0, policy_version 5200 (0.0013) [2023-09-20 19:19:22,712][106486] Updated weights for policy 1, policy_version 5200 (0.0014) [2023-09-20 19:19:26,100][105923] Fps is (10 sec: 13107.1, 60 sec: 12970.6, 300 sec: 12940.6). Total num frames: 5365760. Throughput: 0: 6483.1, 1: 6483.1. Samples: 5356732. 
Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-20 19:19:26,101][105923] Avg episode reward: [(0, '79.790'), (1, '104.354')] [2023-09-20 19:19:29,076][106486] Updated weights for policy 1, policy_version 5280 (0.0013) [2023-09-20 19:19:29,076][106484] Updated weights for policy 0, policy_version 5280 (0.0014) [2023-09-20 19:19:31,100][105923] Fps is (10 sec: 12288.1, 60 sec: 12834.1, 300 sec: 12940.6). Total num frames: 5423104. Throughput: 0: 6464.3, 1: 6063.7. Samples: 5413286. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-20 19:19:31,101][105923] Avg episode reward: [(0, '71.890'), (1, '107.285')] [2023-09-20 19:19:35,757][106484] Updated weights for policy 0, policy_version 5360 (0.0016) [2023-09-20 19:19:35,757][106486] Updated weights for policy 1, policy_version 5360 (0.0015) [2023-09-20 19:19:36,100][105923] Fps is (10 sec: 12287.9, 60 sec: 12834.1, 300 sec: 12912.8). Total num frames: 5488640. Throughput: 0: 6444.6, 1: 6445.2. Samples: 5467428. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-20 19:19:36,101][105923] Avg episode reward: [(0, '63.056'), (1, '108.531')] [2023-09-20 19:19:36,111][106467] Saving ./train_dir/Swimmer/checkpoint_p1/checkpoint_000005360_2744320.pth... [2023-09-20 19:19:36,111][106463] Saving ./train_dir/Swimmer/checkpoint_p0/checkpoint_000005360_2744320.pth... [2023-09-20 19:19:36,116][106467] Removing ./train_dir/Swimmer/checkpoint_p1/checkpoint_000004992_2555904.pth [2023-09-20 19:19:36,119][106463] Removing ./train_dir/Swimmer/checkpoint_p0/checkpoint_000004992_2555904.pth [2023-09-20 19:19:41,100][105923] Fps is (10 sec: 13107.2, 60 sec: 12834.1, 300 sec: 12912.8). Total num frames: 5554176. Throughput: 0: 6441.6, 1: 6440.0. Samples: 5545968. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-20 19:19:41,101][105923] Avg episode reward: [(0, '58.532'), (1, '112.171')] [2023-09-20 19:19:42,114][106484] Updated weights for policy 0, policy_version 5440 (0.0011) [2023-09-20 19:19:42,115][106486] Updated weights for policy 1, policy_version 5440 (0.0013) [2023-09-20 19:19:46,100][105923] Fps is (10 sec: 12288.3, 60 sec: 12834.2, 300 sec: 12885.0). Total num frames: 5611520. Throughput: 0: 6363.9, 1: 6365.8. Samples: 5619086. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-20 19:19:46,101][105923] Avg episode reward: [(0, '58.717'), (1, '113.557')] [2023-09-20 19:19:48,858][106486] Updated weights for policy 1, policy_version 5520 (0.0015) [2023-09-20 19:19:48,858][106484] Updated weights for policy 0, policy_version 5520 (0.0012) [2023-09-20 19:19:51,100][105923] Fps is (10 sec: 12288.1, 60 sec: 12834.2, 300 sec: 12912.8). Total num frames: 5677056. Throughput: 0: 6326.5, 1: 6326.7. Samples: 5656332. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-20 19:19:51,101][105923] Avg episode reward: [(0, '61.865'), (1, '116.299')] [2023-09-20 19:19:51,108][106463] Saving ./train_dir/Swimmer/checkpoint_p0/checkpoint_000005544_2838528.pth... [2023-09-20 19:19:51,108][106467] Saving ./train_dir/Swimmer/checkpoint_p1/checkpoint_000005544_2838528.pth... [2023-09-20 19:19:51,115][106463] Removing ./train_dir/Swimmer/checkpoint_p0/checkpoint_000005176_2650112.pth [2023-09-20 19:19:51,116][106467] Removing ./train_dir/Swimmer/checkpoint_p1/checkpoint_000005176_2650112.pth [2023-09-20 19:19:55,561][106486] Updated weights for policy 1, policy_version 5600 (0.0014) [2023-09-20 19:19:55,561][106484] Updated weights for policy 0, policy_version 5600 (0.0014) [2023-09-20 19:19:56,100][105923] Fps is (10 sec: 12287.8, 60 sec: 12697.6, 300 sec: 12885.0). Total num frames: 5734400. Throughput: 0: 6288.0, 1: 6287.8. Samples: 5729044. 
Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-09-20 19:19:56,101][105923] Avg episode reward: [(0, '63.568'), (1, '116.711')] [2023-09-20 19:20:01,100][105923] Fps is (10 sec: 13107.2, 60 sec: 12834.1, 300 sec: 12940.6). Total num frames: 5808128. Throughput: 0: 6341.8, 1: 6341.6. Samples: 5810616. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-09-20 19:20:01,101][105923] Avg episode reward: [(0, '63.447'), (1, '115.213')] [2023-09-20 19:20:01,572][106484] Updated weights for policy 0, policy_version 5680 (0.0014) [2023-09-20 19:20:01,572][106486] Updated weights for policy 1, policy_version 5680 (0.0013) [2023-09-20 19:20:06,100][105923] Fps is (10 sec: 13926.5, 60 sec: 12834.2, 300 sec: 12912.8). Total num frames: 5873664. Throughput: 0: 6355.4, 1: 6355.9. Samples: 5849876. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-09-20 19:20:06,101][105923] Avg episode reward: [(0, '63.695'), (1, '111.690')] [2023-09-20 19:20:06,106][106463] Saving ./train_dir/Swimmer/checkpoint_p0/checkpoint_000005736_2936832.pth... [2023-09-20 19:20:06,107][106467] Saving ./train_dir/Swimmer/checkpoint_p1/checkpoint_000005736_2936832.pth... [2023-09-20 19:20:06,109][106463] Removing ./train_dir/Swimmer/checkpoint_p0/checkpoint_000005360_2744320.pth [2023-09-20 19:20:06,114][106467] Removing ./train_dir/Swimmer/checkpoint_p1/checkpoint_000005360_2744320.pth [2023-09-20 19:20:07,625][106486] Updated weights for policy 1, policy_version 5760 (0.0012) [2023-09-20 19:20:07,626][106484] Updated weights for policy 0, policy_version 5760 (0.0013) [2023-09-20 19:20:11,100][105923] Fps is (10 sec: 13107.1, 60 sec: 12834.1, 300 sec: 12912.8). Total num frames: 5939200. Throughput: 0: 6400.6, 1: 6401.2. Samples: 5932812. 
Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-09-20 19:20:11,101][105923] Avg episode reward: [(0, '64.855'), (1, '110.458')] [2023-09-20 19:20:13,712][106484] Updated weights for policy 0, policy_version 5840 (0.0010) [2023-09-20 19:20:13,713][106486] Updated weights for policy 1, policy_version 5840 (0.0012) [2023-09-20 19:20:16,100][105923] Fps is (10 sec: 13107.3, 60 sec: 12834.1, 300 sec: 12940.6). Total num frames: 6004736. Throughput: 0: 6436.3, 1: 6838.4. Samples: 6010646. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-20 19:20:16,101][105923] Avg episode reward: [(0, '67.828'), (1, '110.089')] [2023-09-20 19:20:20,108][106484] Updated weights for policy 0, policy_version 5920 (0.0013) [2023-09-20 19:20:20,108][106486] Updated weights for policy 1, policy_version 5920 (0.0014) [2023-09-20 19:20:21,100][105923] Fps is (10 sec: 13107.3, 60 sec: 12834.2, 300 sec: 12940.6). Total num frames: 6070272. Throughput: 0: 6465.7, 1: 6465.7. Samples: 6049338. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-20 19:20:21,101][105923] Avg episode reward: [(0, '72.288'), (1, '110.366')] [2023-09-20 19:20:21,109][106463] Saving ./train_dir/Swimmer/checkpoint_p0/checkpoint_000005928_3035136.pth... [2023-09-20 19:20:21,109][106467] Saving ./train_dir/Swimmer/checkpoint_p1/checkpoint_000005928_3035136.pth... [2023-09-20 19:20:21,114][106467] Removing ./train_dir/Swimmer/checkpoint_p1/checkpoint_000005544_2838528.pth [2023-09-20 19:20:21,119][106463] Removing ./train_dir/Swimmer/checkpoint_p0/checkpoint_000005544_2838528.pth [2023-09-20 19:20:26,100][105923] Fps is (10 sec: 13107.3, 60 sec: 12834.2, 300 sec: 12912.8). Total num frames: 6135808. Throughput: 0: 6459.8, 1: 6461.1. Samples: 6127404. 
Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-20 19:20:26,101][105923] Avg episode reward: [(0, '77.299'), (1, '110.018')] [2023-09-20 19:20:26,436][106484] Updated weights for policy 0, policy_version 6000 (0.0014) [2023-09-20 19:20:26,437][106486] Updated weights for policy 1, policy_version 6000 (0.0014) [2023-09-20 19:20:31,100][105923] Fps is (10 sec: 13107.2, 60 sec: 12970.7, 300 sec: 12912.8). Total num frames: 6201344. Throughput: 0: 6512.9, 1: 6512.5. Samples: 6205232. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-20 19:20:31,101][105923] Avg episode reward: [(0, '78.529'), (1, '110.071')] [2023-09-20 19:20:32,727][106484] Updated weights for policy 0, policy_version 6080 (0.0013) [2023-09-20 19:20:32,727][106486] Updated weights for policy 1, policy_version 6080 (0.0013) [2023-09-20 19:20:36,100][105923] Fps is (10 sec: 13106.9, 60 sec: 12970.7, 300 sec: 12912.8). Total num frames: 6266880. Throughput: 0: 6525.4, 1: 6525.0. Samples: 6243600. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-20 19:20:36,101][105923] Avg episode reward: [(0, '76.892'), (1, '111.237')] [2023-09-20 19:20:36,111][106463] Saving ./train_dir/Swimmer/checkpoint_p0/checkpoint_000006120_3133440.pth... [2023-09-20 19:20:36,112][106467] Saving ./train_dir/Swimmer/checkpoint_p1/checkpoint_000006120_3133440.pth... [2023-09-20 19:20:36,114][106463] Removing ./train_dir/Swimmer/checkpoint_p0/checkpoint_000005736_2936832.pth [2023-09-20 19:20:36,119][106467] Removing ./train_dir/Swimmer/checkpoint_p1/checkpoint_000005736_2936832.pth [2023-09-20 19:20:39,060][106484] Updated weights for policy 0, policy_version 6160 (0.0010) [2023-09-20 19:20:39,060][106486] Updated weights for policy 1, policy_version 6160 (0.0012) [2023-09-20 19:20:41,100][105923] Fps is (10 sec: 13107.2, 60 sec: 12970.7, 300 sec: 12912.8). Total num frames: 6332416. Throughput: 0: 6610.4, 1: 6609.6. Samples: 6323942. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-20 19:20:41,101][105923] Avg episode reward: [(0, '73.666'), (1, '111.722')] [2023-09-20 19:20:45,273][106486] Updated weights for policy 1, policy_version 6240 (0.0011) [2023-09-20 19:20:45,273][106484] Updated weights for policy 0, policy_version 6240 (0.0015) [2023-09-20 19:20:46,100][105923] Fps is (10 sec: 13107.4, 60 sec: 13107.2, 300 sec: 12912.8). Total num frames: 6397952. Throughput: 0: 6556.3, 1: 6556.1. Samples: 6400676. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-20 19:20:46,101][105923] Avg episode reward: [(0, '72.114'), (1, '112.654')] [2023-09-20 19:20:51,100][105923] Fps is (10 sec: 12288.1, 60 sec: 12970.7, 300 sec: 12885.1). Total num frames: 6455296. Throughput: 0: 6545.8, 1: 6544.7. Samples: 6438946. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-20 19:20:51,101][105923] Avg episode reward: [(0, '71.516'), (1, '112.621')] [2023-09-20 19:20:51,115][106463] Saving ./train_dir/Swimmer/checkpoint_p0/checkpoint_000006312_3231744.pth... [2023-09-20 19:20:51,118][106463] Removing ./train_dir/Swimmer/checkpoint_p0/checkpoint_000005928_3035136.pth [2023-09-20 19:20:51,123][106467] Saving ./train_dir/Swimmer/checkpoint_p1/checkpoint_000006312_3231744.pth... [2023-09-20 19:20:51,127][106467] Removing ./train_dir/Swimmer/checkpoint_p1/checkpoint_000005928_3035136.pth [2023-09-20 19:20:51,749][106486] Updated weights for policy 1, policy_version 6320 (0.0016) [2023-09-20 19:20:51,749][106484] Updated weights for policy 0, policy_version 6320 (0.0012) [2023-09-20 19:20:56,100][105923] Fps is (10 sec: 12287.8, 60 sec: 13107.2, 300 sec: 12885.0). Total num frames: 6520832. Throughput: 0: 6485.0, 1: 6484.7. Samples: 6516446. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-20 19:20:56,101][105923] Avg episode reward: [(0, '70.091'), (1, '113.639')] [2023-09-20 19:20:58,010][106486] Updated weights for policy 1, policy_version 6400 (0.0013) [2023-09-20 19:20:58,011][106484] Updated weights for policy 0, policy_version 6400 (0.0010) [2023-09-20 19:21:01,100][105923] Fps is (10 sec: 13107.2, 60 sec: 12970.7, 300 sec: 12885.0). Total num frames: 6586368. Throughput: 0: 6485.2, 1: 6483.8. Samples: 6594250. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-20 19:21:01,101][105923] Avg episode reward: [(0, '70.267'), (1, '114.190')] [2023-09-20 19:21:04,205][106484] Updated weights for policy 0, policy_version 6480 (0.0009) [2023-09-20 19:21:04,206][106486] Updated weights for policy 1, policy_version 6480 (0.0012) [2023-09-20 19:21:06,101][105923] Fps is (10 sec: 13107.0, 60 sec: 12970.6, 300 sec: 12912.8). Total num frames: 6651904. Throughput: 0: 6510.1, 1: 6508.8. Samples: 6635192. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-20 19:21:06,101][105923] Avg episode reward: [(0, '70.277'), (1, '112.894')] [2023-09-20 19:21:06,109][106463] Saving ./train_dir/Swimmer/checkpoint_p0/checkpoint_000006496_3325952.pth... [2023-09-20 19:21:06,109][106467] Saving ./train_dir/Swimmer/checkpoint_p1/checkpoint_000006496_3325952.pth... [2023-09-20 19:21:06,124][106463] Removing ./train_dir/Swimmer/checkpoint_p0/checkpoint_000006120_3133440.pth [2023-09-20 19:21:06,125][106467] Removing ./train_dir/Swimmer/checkpoint_p1/checkpoint_000006120_3133440.pth [2023-09-20 19:21:10,808][106486] Updated weights for policy 1, policy_version 6560 (0.0016) [2023-09-20 19:21:10,808][106484] Updated weights for policy 0, policy_version 6560 (0.0009) [2023-09-20 19:21:11,100][105923] Fps is (10 sec: 13106.9, 60 sec: 12970.6, 300 sec: 12912.8). Total num frames: 6717440. Throughput: 0: 6465.8, 1: 6464.8. Samples: 6709282. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-20 19:21:11,101][105923] Avg episode reward: [(0, '71.053'), (1, '111.534')] [2023-09-20 19:21:16,100][105923] Fps is (10 sec: 12288.4, 60 sec: 12834.1, 300 sec: 12912.8). Total num frames: 6774784. Throughput: 0: 6005.2, 1: 6005.0. Samples: 6745692. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-09-20 19:21:16,101][105923] Avg episode reward: [(0, '71.141'), (1, '110.991')] [2023-09-20 19:21:17,531][106484] Updated weights for policy 0, policy_version 6640 (0.0013) [2023-09-20 19:21:17,531][106486] Updated weights for policy 1, policy_version 6640 (0.0016) [2023-09-20 19:21:21,100][105923] Fps is (10 sec: 12288.3, 60 sec: 12834.2, 300 sec: 12912.8). Total num frames: 6840320. Throughput: 0: 6423.1, 1: 6423.7. Samples: 6821706. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-20 19:21:21,101][105923] Avg episode reward: [(0, '70.393'), (1, '112.596')] [2023-09-20 19:21:21,107][106467] Saving ./train_dir/Swimmer/checkpoint_p1/checkpoint_000006680_3420160.pth... [2023-09-20 19:21:21,107][106463] Saving ./train_dir/Swimmer/checkpoint_p0/checkpoint_000006680_3420160.pth... [2023-09-20 19:21:21,113][106467] Removing ./train_dir/Swimmer/checkpoint_p1/checkpoint_000006312_3231744.pth [2023-09-20 19:21:21,114][106463] Removing ./train_dir/Swimmer/checkpoint_p0/checkpoint_000006312_3231744.pth [2023-09-20 19:21:23,506][106484] Updated weights for policy 0, policy_version 6720 (0.0013) [2023-09-20 19:21:23,507][106486] Updated weights for policy 1, policy_version 6720 (0.0015) [2023-09-20 19:21:26,100][105923] Fps is (10 sec: 13926.3, 60 sec: 12970.6, 300 sec: 12912.8). Total num frames: 6914048. Throughput: 0: 6449.8, 1: 6450.2. Samples: 6904442. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-20 19:21:26,101][105923] Avg episode reward: [(0, '70.056'), (1, '112.603')] [2023-09-20 19:21:29,574][106484] Updated weights for policy 0, policy_version 6800 (0.0014) [2023-09-20 19:21:29,575][106486] Updated weights for policy 1, policy_version 6800 (0.0013) [2023-09-20 19:21:31,100][105923] Fps is (10 sec: 13926.2, 60 sec: 12970.7, 300 sec: 12940.6). Total num frames: 6979584. Throughput: 0: 6474.7, 1: 6475.1. Samples: 6983416. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-20 19:21:31,101][105923] Avg episode reward: [(0, '70.166'), (1, '101.489')] [2023-09-20 19:21:36,100][105923] Fps is (10 sec: 12287.8, 60 sec: 12834.1, 300 sec: 12912.8). Total num frames: 7036928. Throughput: 0: 6462.5, 1: 6461.9. Samples: 7020546. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-20 19:21:36,101][105923] Avg episode reward: [(0, '70.966'), (1, '84.684')] [2023-09-20 19:21:36,109][106463] Saving ./train_dir/Swimmer/checkpoint_p0/checkpoint_000006872_3518464.pth... [2023-09-20 19:21:36,110][106467] Saving ./train_dir/Swimmer/checkpoint_p1/checkpoint_000006872_3518464.pth... [2023-09-20 19:21:36,112][106463] Removing ./train_dir/Swimmer/checkpoint_p0/checkpoint_000006496_3325952.pth [2023-09-20 19:21:36,117][106467] Removing ./train_dir/Swimmer/checkpoint_p1/checkpoint_000006496_3325952.pth [2023-09-20 19:21:36,201][106484] Updated weights for policy 0, policy_version 6880 (0.0012) [2023-09-20 19:21:36,202][106486] Updated weights for policy 1, policy_version 6880 (0.0014) [2023-09-20 19:21:41,100][105923] Fps is (10 sec: 12287.9, 60 sec: 12834.1, 300 sec: 12912.8). Total num frames: 7102464. Throughput: 0: 6429.9, 1: 6429.8. Samples: 7095132. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-20 19:21:41,101][105923] Avg episode reward: [(0, '70.786'), (1, '67.575')] [2023-09-20 19:21:42,759][106486] Updated weights for policy 1, policy_version 6960 (0.0008) [2023-09-20 19:21:42,759][106484] Updated weights for policy 0, policy_version 6960 (0.0014) [2023-09-20 19:21:46,100][105923] Fps is (10 sec: 13107.4, 60 sec: 12834.1, 300 sec: 12912.8). Total num frames: 7168000. Throughput: 0: 6451.4, 1: 6452.5. Samples: 7174928. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-20 19:21:46,101][105923] Avg episode reward: [(0, '70.715'), (1, '61.214')] [2023-09-20 19:21:48,690][106484] Updated weights for policy 0, policy_version 7040 (0.0015) [2023-09-20 19:21:48,691][106486] Updated weights for policy 1, policy_version 7040 (0.0013) [2023-09-20 19:21:51,100][105923] Fps is (10 sec: 13107.3, 60 sec: 12970.6, 300 sec: 12912.8). Total num frames: 7233536. Throughput: 0: 6439.1, 1: 6440.5. Samples: 7214768. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-20 19:21:51,101][105923] Avg episode reward: [(0, '70.969'), (1, '67.289')] [2023-09-20 19:21:51,108][106463] Saving ./train_dir/Swimmer/checkpoint_p0/checkpoint_000007064_3616768.pth... [2023-09-20 19:21:51,109][106467] Saving ./train_dir/Swimmer/checkpoint_p1/checkpoint_000007064_3616768.pth... [2023-09-20 19:21:51,113][106467] Removing ./train_dir/Swimmer/checkpoint_p1/checkpoint_000006680_3420160.pth [2023-09-20 19:21:51,114][106463] Removing ./train_dir/Swimmer/checkpoint_p0/checkpoint_000006680_3420160.pth [2023-09-20 19:21:54,976][106484] Updated weights for policy 0, policy_version 7120 (0.0013) [2023-09-20 19:21:54,977][106486] Updated weights for policy 1, policy_version 7120 (0.0014) [2023-09-20 19:21:56,100][105923] Fps is (10 sec: 13107.4, 60 sec: 12970.7, 300 sec: 12885.1). Total num frames: 7299072. Throughput: 0: 6478.4, 1: 6479.0. Samples: 7292360. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-20 19:21:56,101][105923] Avg episode reward: [(0, '71.919'), (1, '77.150')] [2023-09-20 19:22:01,100][105923] Fps is (10 sec: 13107.3, 60 sec: 12970.7, 300 sec: 12885.1). Total num frames: 7364608. Throughput: 0: 6967.2, 1: 6966.0. Samples: 7372690. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-09-20 19:22:01,101][105923] Avg episode reward: [(0, '71.559'), (1, '88.384')] [2023-09-20 19:22:01,179][106484] Updated weights for policy 0, policy_version 7200 (0.0014) [2023-09-20 19:22:01,180][106486] Updated weights for policy 1, policy_version 7200 (0.0015) [2023-09-20 19:22:06,100][105923] Fps is (10 sec: 13107.1, 60 sec: 12970.7, 300 sec: 12885.0). Total num frames: 7430144. Throughput: 0: 6562.2, 1: 6562.4. Samples: 7412316. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-20 19:22:06,101][105923] Avg episode reward: [(0, '70.156'), (1, '95.747')] [2023-09-20 19:22:06,108][106463] Saving ./train_dir/Swimmer/checkpoint_p0/checkpoint_000007256_3715072.pth... [2023-09-20 19:22:06,108][106467] Saving ./train_dir/Swimmer/checkpoint_p1/checkpoint_000007256_3715072.pth... [2023-09-20 19:22:06,114][106463] Removing ./train_dir/Swimmer/checkpoint_p0/checkpoint_000006872_3518464.pth [2023-09-20 19:22:06,118][106467] Removing ./train_dir/Swimmer/checkpoint_p1/checkpoint_000006872_3518464.pth [2023-09-20 19:22:07,428][106484] Updated weights for policy 0, policy_version 7280 (0.0015) [2023-09-20 19:22:07,428][106486] Updated weights for policy 1, policy_version 7280 (0.0013) [2023-09-20 19:22:11,100][105923] Fps is (10 sec: 13106.9, 60 sec: 12970.7, 300 sec: 12885.0). Total num frames: 7495680. Throughput: 0: 6496.8, 1: 6497.3. Samples: 7489178. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-20 19:22:11,101][105923] Avg episode reward: [(0, '68.449'), (1, '101.139')] [2023-09-20 19:22:13,770][106486] Updated weights for policy 1, policy_version 7360 (0.0014) [2023-09-20 19:22:13,772][106484] Updated weights for policy 0, policy_version 7360 (0.0015) [2023-09-20 19:22:16,100][105923] Fps is (10 sec: 13107.2, 60 sec: 13107.2, 300 sec: 12885.1). Total num frames: 7561216. Throughput: 0: 6477.4, 1: 6477.2. Samples: 7566372. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-20 19:22:16,101][105923] Avg episode reward: [(0, '67.223'), (1, '100.990')] [2023-09-20 19:22:20,435][106484] Updated weights for policy 0, policy_version 7440 (0.0015) [2023-09-20 19:22:20,435][106486] Updated weights for policy 1, policy_version 7440 (0.0013) [2023-09-20 19:22:21,100][105923] Fps is (10 sec: 12287.9, 60 sec: 12970.6, 300 sec: 12857.3). Total num frames: 7618560. Throughput: 0: 6463.6, 1: 6464.3. Samples: 7602300. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-20 19:22:21,101][105923] Avg episode reward: [(0, '66.906'), (1, '98.093')] [2023-09-20 19:22:21,118][106463] Saving ./train_dir/Swimmer/checkpoint_p0/checkpoint_000007448_3813376.pth... [2023-09-20 19:22:21,119][106467] Saving ./train_dir/Swimmer/checkpoint_p1/checkpoint_000007448_3813376.pth... [2023-09-20 19:22:21,122][106463] Removing ./train_dir/Swimmer/checkpoint_p0/checkpoint_000007064_3616768.pth [2023-09-20 19:22:21,127][106467] Removing ./train_dir/Swimmer/checkpoint_p1/checkpoint_000007064_3616768.pth [2023-09-20 19:22:26,100][105923] Fps is (10 sec: 13107.1, 60 sec: 12970.7, 300 sec: 12912.8). Total num frames: 7692288. Throughput: 0: 6536.7, 1: 6536.9. Samples: 7683446. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-20 19:22:26,101][105923] Avg episode reward: [(0, '67.062'), (1, '98.624')] [2023-09-20 19:22:26,471][106484] Updated weights for policy 0, policy_version 7520 (0.0013) [2023-09-20 19:22:26,471][106486] Updated weights for policy 1, policy_version 7520 (0.0014) [2023-09-20 19:22:31,100][105923] Fps is (10 sec: 13926.7, 60 sec: 12970.7, 300 sec: 12912.8). Total num frames: 7757824. Throughput: 0: 6549.4, 1: 6549.6. Samples: 7764382. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-09-20 19:22:31,101][105923] Avg episode reward: [(0, '67.616'), (1, '100.104')] [2023-09-20 19:22:32,616][106486] Updated weights for policy 1, policy_version 7600 (0.0012) [2023-09-20 19:22:32,618][106484] Updated weights for policy 0, policy_version 7600 (0.0016) [2023-09-20 19:22:36,100][105923] Fps is (10 sec: 13107.0, 60 sec: 13107.2, 300 sec: 12940.6). Total num frames: 7823360. Throughput: 0: 6525.1, 1: 6524.9. Samples: 7802018. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-09-20 19:22:36,101][105923] Avg episode reward: [(0, '66.763'), (1, '101.194')] [2023-09-20 19:22:36,110][106463] Saving ./train_dir/Swimmer/checkpoint_p0/checkpoint_000007640_3911680.pth... [2023-09-20 19:22:36,111][106467] Saving ./train_dir/Swimmer/checkpoint_p1/checkpoint_000007640_3911680.pth... [2023-09-20 19:22:36,119][106467] Removing ./train_dir/Swimmer/checkpoint_p1/checkpoint_000007256_3715072.pth [2023-09-20 19:22:36,119][106463] Removing ./train_dir/Swimmer/checkpoint_p0/checkpoint_000007256_3715072.pth [2023-09-20 19:22:39,035][106484] Updated weights for policy 0, policy_version 7680 (0.0016) [2023-09-20 19:22:39,035][106486] Updated weights for policy 1, policy_version 7680 (0.0014) [2023-09-20 19:22:41,100][105923] Fps is (10 sec: 13107.3, 60 sec: 13107.2, 300 sec: 12940.6). Total num frames: 7888896. Throughput: 0: 6528.0, 1: 6528.3. Samples: 7879892. 
Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-20 19:22:41,101][105923] Avg episode reward: [(0, '64.189'), (1, '98.076')] [2023-09-20 19:22:45,524][106484] Updated weights for policy 0, policy_version 7760 (0.0012) [2023-09-20 19:22:45,525][106486] Updated weights for policy 1, policy_version 7760 (0.0014) [2023-09-20 19:22:46,100][105923] Fps is (10 sec: 12288.1, 60 sec: 12970.7, 300 sec: 12912.8). Total num frames: 7946240. Throughput: 0: 6036.8, 1: 6038.2. Samples: 7916068. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-09-20 19:22:46,101][105923] Avg episode reward: [(0, '60.756'), (1, '97.350')] [2023-09-20 19:22:51,100][105923] Fps is (10 sec: 12287.6, 60 sec: 12970.6, 300 sec: 12912.8). Total num frames: 8011776. Throughput: 0: 6467.9, 1: 6467.0. Samples: 7994390. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-20 19:22:51,102][105923] Avg episode reward: [(0, '61.351'), (1, '97.021')] [2023-09-20 19:22:51,153][106463] Saving ./train_dir/Swimmer/checkpoint_p0/checkpoint_000007832_4009984.pth... [2023-09-20 19:22:51,155][106467] Saving ./train_dir/Swimmer/checkpoint_p1/checkpoint_000007832_4009984.pth... [2023-09-20 19:22:51,158][106463] Removing ./train_dir/Swimmer/checkpoint_p0/checkpoint_000007448_3813376.pth [2023-09-20 19:22:51,159][106467] Removing ./train_dir/Swimmer/checkpoint_p1/checkpoint_000007448_3813376.pth [2023-09-20 19:22:51,808][106486] Updated weights for policy 1, policy_version 7840 (0.0012) [2023-09-20 19:22:51,808][106484] Updated weights for policy 0, policy_version 7840 (0.0016) [2023-09-20 19:22:56,100][105923] Fps is (10 sec: 13107.2, 60 sec: 12970.6, 300 sec: 12912.8). Total num frames: 8077312. Throughput: 0: 6467.2, 1: 6466.4. Samples: 8071186. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-20 19:22:56,101][105923] Avg episode reward: [(0, '66.884'), (1, '96.892')] [2023-09-20 19:22:58,038][106484] Updated weights for policy 0, policy_version 7920 (0.0013) [2023-09-20 19:22:58,039][106486] Updated weights for policy 1, policy_version 7920 (0.0014) [2023-09-20 19:23:01,100][105923] Fps is (10 sec: 13107.4, 60 sec: 12970.6, 300 sec: 12912.8). Total num frames: 8142848. Throughput: 0: 6465.5, 1: 6464.9. Samples: 8148244. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-20 19:23:01,101][105923] Avg episode reward: [(0, '74.671'), (1, '95.614')] [2023-09-20 19:23:04,607][106484] Updated weights for policy 0, policy_version 8000 (0.0015) [2023-09-20 19:23:04,607][106486] Updated weights for policy 1, policy_version 8000 (0.0017) [2023-09-20 19:23:06,101][105923] Fps is (10 sec: 13106.8, 60 sec: 12970.6, 300 sec: 12912.8). Total num frames: 8208384. Throughput: 0: 6479.9, 1: 6481.1. Samples: 8185548. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-20 19:23:06,102][105923] Avg episode reward: [(0, '81.169'), (1, '95.781')] [2023-09-20 19:23:06,109][106463] Saving ./train_dir/Swimmer/checkpoint_p0/checkpoint_000008016_4104192.pth... [2023-09-20 19:23:06,109][106467] Saving ./train_dir/Swimmer/checkpoint_p1/checkpoint_000008016_4104192.pth... [2023-09-20 19:23:06,113][106467] Removing ./train_dir/Swimmer/checkpoint_p1/checkpoint_000007640_3911680.pth [2023-09-20 19:23:06,116][106463] Removing ./train_dir/Swimmer/checkpoint_p0/checkpoint_000007640_3911680.pth [2023-09-20 19:23:11,100][105923] Fps is (10 sec: 12288.0, 60 sec: 12834.1, 300 sec: 12885.0). Total num frames: 8265728. Throughput: 0: 6429.2, 1: 6429.1. Samples: 8262068. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-20 19:23:11,101][105923] Avg episode reward: [(0, '81.374'), (1, '98.140')] [2023-09-20 19:23:11,152][106484] Updated weights for policy 0, policy_version 8080 (0.0015) [2023-09-20 19:23:11,153][106486] Updated weights for policy 1, policy_version 8080 (0.0015) [2023-09-20 19:23:16,100][105923] Fps is (10 sec: 12288.4, 60 sec: 12834.1, 300 sec: 12912.8). Total num frames: 8331264. Throughput: 0: 5945.5, 1: 5945.3. Samples: 8299472. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-09-20 19:23:16,101][105923] Avg episode reward: [(0, '81.666'), (1, '98.268')] [2023-09-20 19:23:17,258][106486] Updated weights for policy 1, policy_version 8160 (0.0012) [2023-09-20 19:23:17,259][106484] Updated weights for policy 0, policy_version 8160 (0.0015) [2023-09-20 19:23:21,100][105923] Fps is (10 sec: 13926.3, 60 sec: 13107.2, 300 sec: 12940.6). Total num frames: 8404992. Throughput: 0: 6427.0, 1: 6425.5. Samples: 8380380. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-20 19:23:21,101][105923] Avg episode reward: [(0, '78.809'), (1, '95.823')] [2023-09-20 19:23:21,110][106463] Saving ./train_dir/Swimmer/checkpoint_p0/checkpoint_000008208_4202496.pth... [2023-09-20 19:23:21,111][106467] Saving ./train_dir/Swimmer/checkpoint_p1/checkpoint_000008208_4202496.pth... [2023-09-20 19:23:21,116][106463] Removing ./train_dir/Swimmer/checkpoint_p0/checkpoint_000007832_4009984.pth [2023-09-20 19:23:21,118][106467] Removing ./train_dir/Swimmer/checkpoint_p1/checkpoint_000007832_4009984.pth [2023-09-20 19:23:23,487][106484] Updated weights for policy 0, policy_version 8240 (0.0011) [2023-09-20 19:23:23,488][106486] Updated weights for policy 1, policy_version 8240 (0.0014) [2023-09-20 19:23:26,100][105923] Fps is (10 sec: 13926.6, 60 sec: 12970.7, 300 sec: 12940.6). Total num frames: 8470528. Throughput: 0: 6454.1, 1: 6454.7. Samples: 8460788. 
Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-20 19:23:26,101][105923] Avg episode reward: [(0, '79.446'), (1, '92.539')] [2023-09-20 19:23:29,777][106486] Updated weights for policy 1, policy_version 8320 (0.0013) [2023-09-20 19:23:29,778][106484] Updated weights for policy 0, policy_version 8320 (0.0012) [2023-09-20 19:23:31,100][105923] Fps is (10 sec: 12288.1, 60 sec: 12834.1, 300 sec: 12912.8). Total num frames: 8527872. Throughput: 0: 6482.6, 1: 6482.1. Samples: 8499480. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-20 19:23:31,101][105923] Avg episode reward: [(0, '78.401'), (1, '90.873')] [2023-09-20 19:23:36,033][106486] Updated weights for policy 1, policy_version 8400 (0.0011) [2023-09-20 19:23:36,034][106484] Updated weights for policy 0, policy_version 8400 (0.0014) [2023-09-20 19:23:36,101][105923] Fps is (10 sec: 13106.8, 60 sec: 12970.6, 300 sec: 12940.6). Total num frames: 8601600. Throughput: 0: 6473.5, 1: 6472.7. Samples: 8576972. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-09-20 19:23:36,102][105923] Avg episode reward: [(0, '75.964'), (1, '87.928')] [2023-09-20 19:23:36,110][106467] Saving ./train_dir/Swimmer/checkpoint_p1/checkpoint_000008400_4300800.pth... [2023-09-20 19:23:36,110][106463] Saving ./train_dir/Swimmer/checkpoint_p0/checkpoint_000008400_4300800.pth... [2023-09-20 19:23:36,113][106467] Removing ./train_dir/Swimmer/checkpoint_p1/checkpoint_000008016_4104192.pth [2023-09-20 19:23:36,118][106463] Removing ./train_dir/Swimmer/checkpoint_p0/checkpoint_000008016_4104192.pth [2023-09-20 19:23:41,100][105923] Fps is (10 sec: 13926.4, 60 sec: 12970.6, 300 sec: 12968.4). Total num frames: 8667136. Throughput: 0: 6499.0, 1: 6499.6. Samples: 8656124. 
Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-09-20 19:23:41,101][105923] Avg episode reward: [(0, '72.585'), (1, '88.299')] [2023-09-20 19:23:42,193][106484] Updated weights for policy 0, policy_version 8480 (0.0013) [2023-09-20 19:23:42,194][106486] Updated weights for policy 1, policy_version 8480 (0.0013) [2023-09-20 19:23:46,100][105923] Fps is (10 sec: 13107.5, 60 sec: 13107.2, 300 sec: 12968.4). Total num frames: 8732672. Throughput: 0: 6495.1, 1: 6493.8. Samples: 8732742. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-09-20 19:23:46,101][105923] Avg episode reward: [(0, '65.323'), (1, '87.291')] [2023-09-20 19:23:48,571][106484] Updated weights for policy 0, policy_version 8560 (0.0012) [2023-09-20 19:23:48,571][106486] Updated weights for policy 1, policy_version 8560 (0.0013) [2023-09-20 19:23:51,100][105923] Fps is (10 sec: 12288.2, 60 sec: 12970.7, 300 sec: 12940.6). Total num frames: 8790016. Throughput: 0: 6534.0, 1: 6532.6. Samples: 8773538. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-20 19:23:51,101][105923] Avg episode reward: [(0, '63.775'), (1, '86.614')] [2023-09-20 19:23:51,106][106463] Saving ./train_dir/Swimmer/checkpoint_p0/checkpoint_000008584_4395008.pth... [2023-09-20 19:23:51,107][106467] Saving ./train_dir/Swimmer/checkpoint_p1/checkpoint_000008584_4395008.pth... [2023-09-20 19:23:51,110][106463] Removing ./train_dir/Swimmer/checkpoint_p0/checkpoint_000008208_4202496.pth [2023-09-20 19:23:51,118][106467] Removing ./train_dir/Swimmer/checkpoint_p1/checkpoint_000008208_4202496.pth [2023-09-20 19:23:55,062][106486] Updated weights for policy 1, policy_version 8640 (0.0013) [2023-09-20 19:23:55,062][106484] Updated weights for policy 0, policy_version 8640 (0.0012) [2023-09-20 19:23:56,100][105923] Fps is (10 sec: 12288.0, 60 sec: 12970.7, 300 sec: 12940.6). Total num frames: 8855552. Throughput: 0: 6506.3, 1: 6506.3. Samples: 8847632. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-20 19:23:56,101][105923] Avg episode reward: [(0, '63.553'), (1, '82.791')] [2023-09-20 19:24:01,100][105923] Fps is (10 sec: 13107.3, 60 sec: 12970.7, 300 sec: 12940.6). Total num frames: 8921088. Throughput: 0: 6923.5, 1: 6922.9. Samples: 8922560. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-20 19:24:01,101][105923] Avg episode reward: [(0, '66.442'), (1, '78.846')] [2023-09-20 19:24:01,681][106484] Updated weights for policy 0, policy_version 8720 (0.0015) [2023-09-20 19:24:01,681][106486] Updated weights for policy 1, policy_version 8720 (0.0012) [2023-09-20 19:24:06,100][105923] Fps is (10 sec: 12288.2, 60 sec: 12834.2, 300 sec: 12912.8). Total num frames: 8978432. Throughput: 0: 6456.9, 1: 6457.5. Samples: 8961524. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-20 19:24:06,101][105923] Avg episode reward: [(0, '66.567'), (1, '73.281')] [2023-09-20 19:24:06,147][106463] Saving ./train_dir/Swimmer/checkpoint_p0/checkpoint_000008776_4493312.pth... [2023-09-20 19:24:06,152][106463] Removing ./train_dir/Swimmer/checkpoint_p0/checkpoint_000008400_4300800.pth [2023-09-20 19:24:06,153][106467] Saving ./train_dir/Swimmer/checkpoint_p1/checkpoint_000008776_4493312.pth... [2023-09-20 19:24:06,156][106467] Removing ./train_dir/Swimmer/checkpoint_p1/checkpoint_000008400_4300800.pth [2023-09-20 19:24:08,046][106486] Updated weights for policy 1, policy_version 8800 (0.0011) [2023-09-20 19:24:08,047][106484] Updated weights for policy 0, policy_version 8800 (0.0012) [2023-09-20 19:24:11,100][105923] Fps is (10 sec: 12287.7, 60 sec: 12970.7, 300 sec: 12912.8). Total num frames: 9043968. Throughput: 0: 6390.6, 1: 6388.9. Samples: 9035864. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-20 19:24:11,101][105923] Avg episode reward: [(0, '65.479'), (1, '71.329')] [2023-09-20 19:24:14,404][106486] Updated weights for policy 1, policy_version 8880 (0.0013) [2023-09-20 19:24:14,405][106484] Updated weights for policy 0, policy_version 8880 (0.0015) [2023-09-20 19:24:16,100][105923] Fps is (10 sec: 13107.0, 60 sec: 12970.7, 300 sec: 12912.8). Total num frames: 9109504. Throughput: 0: 6853.2, 1: 6853.7. Samples: 9116288. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-20 19:24:16,101][105923] Avg episode reward: [(0, '65.600'), (1, '72.063')] [2023-09-20 19:24:21,051][106484] Updated weights for policy 0, policy_version 8960 (0.0015) [2023-09-20 19:24:21,051][106486] Updated weights for policy 1, policy_version 8960 (0.0015) [2023-09-20 19:24:21,101][105923] Fps is (10 sec: 13107.0, 60 sec: 12834.1, 300 sec: 12912.8). Total num frames: 9175040. Throughput: 0: 6391.6, 1: 6393.4. Samples: 9152294. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-09-20 19:24:21,101][105923] Avg episode reward: [(0, '65.903'), (1, '74.320')] [2023-09-20 19:24:21,110][106463] Saving ./train_dir/Swimmer/checkpoint_p0/checkpoint_000008960_4587520.pth... [2023-09-20 19:24:21,110][106467] Saving ./train_dir/Swimmer/checkpoint_p1/checkpoint_000008960_4587520.pth... [2023-09-20 19:24:21,116][106467] Removing ./train_dir/Swimmer/checkpoint_p1/checkpoint_000008584_4395008.pth [2023-09-20 19:24:21,118][106463] Removing ./train_dir/Swimmer/checkpoint_p0/checkpoint_000008584_4395008.pth [2023-09-20 19:24:26,100][105923] Fps is (10 sec: 12288.0, 60 sec: 12697.6, 300 sec: 12912.8). Total num frames: 9232384. Throughput: 0: 6313.6, 1: 6311.8. Samples: 9224266. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-20 19:24:26,101][105923] Avg episode reward: [(0, '65.493'), (1, '74.078')] [2023-09-20 19:24:27,657][106484] Updated weights for policy 0, policy_version 9040 (0.0015) [2023-09-20 19:24:27,658][106486] Updated weights for policy 1, policy_version 9040 (0.0012) [2023-09-20 19:24:31,100][105923] Fps is (10 sec: 12288.4, 60 sec: 12834.2, 300 sec: 12912.8). Total num frames: 9297920. Throughput: 0: 6284.7, 1: 6286.1. Samples: 9298424. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-20 19:24:31,101][105923] Avg episode reward: [(0, '64.309'), (1, '72.238')] [2023-09-20 19:24:34,153][106484] Updated weights for policy 0, policy_version 9120 (0.0010) [2023-09-20 19:24:34,154][106486] Updated weights for policy 1, policy_version 9120 (0.0014) [2023-09-20 19:24:36,100][105923] Fps is (10 sec: 13107.2, 60 sec: 12697.7, 300 sec: 12912.8). Total num frames: 9363456. Throughput: 0: 6281.8, 1: 6281.4. Samples: 9338882. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-20 19:24:36,101][105923] Avg episode reward: [(0, '64.514'), (1, '72.166')] [2023-09-20 19:24:36,106][106463] Saving ./train_dir/Swimmer/checkpoint_p0/checkpoint_000009144_4681728.pth... [2023-09-20 19:24:36,106][106467] Saving ./train_dir/Swimmer/checkpoint_p1/checkpoint_000009144_4681728.pth... [2023-09-20 19:24:36,112][106463] Removing ./train_dir/Swimmer/checkpoint_p0/checkpoint_000008776_4493312.pth [2023-09-20 19:24:36,118][106467] Removing ./train_dir/Swimmer/checkpoint_p1/checkpoint_000008776_4493312.pth [2023-09-20 19:24:40,220][106484] Updated weights for policy 0, policy_version 9200 (0.0010) [2023-09-20 19:24:40,221][106486] Updated weights for policy 1, policy_version 9200 (0.0014) [2023-09-20 19:24:41,100][105923] Fps is (10 sec: 13107.0, 60 sec: 12697.6, 300 sec: 12940.6). Total num frames: 9428992. Throughput: 0: 6360.0, 1: 6359.6. Samples: 9420014. 
Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-09-20 19:24:41,101][105923] Avg episode reward: [(0, '65.429'), (1, '76.684')] [2023-09-20 19:24:46,100][105923] Fps is (10 sec: 13107.4, 60 sec: 12697.6, 300 sec: 12940.6). Total num frames: 9494528. Throughput: 0: 6376.3, 1: 6377.6. Samples: 9496488. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-09-20 19:24:46,101][105923] Avg episode reward: [(0, '66.119'), (1, '82.320')] [2023-09-20 19:24:46,563][106486] Updated weights for policy 1, policy_version 9280 (0.0014) [2023-09-20 19:24:46,563][106484] Updated weights for policy 0, policy_version 9280 (0.0014) [2023-09-20 19:24:51,100][105923] Fps is (10 sec: 13107.1, 60 sec: 12834.1, 300 sec: 12968.3). Total num frames: 9560064. Throughput: 0: 6377.9, 1: 6377.5. Samples: 9535522. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-09-20 19:24:51,101][105923] Avg episode reward: [(0, '64.929'), (1, '83.564')] [2023-09-20 19:24:51,112][106463] Saving ./train_dir/Swimmer/checkpoint_p0/checkpoint_000009336_4780032.pth... [2023-09-20 19:24:51,112][106467] Saving ./train_dir/Swimmer/checkpoint_p1/checkpoint_000009336_4780032.pth... [2023-09-20 19:24:51,118][106463] Removing ./train_dir/Swimmer/checkpoint_p0/checkpoint_000008960_4587520.pth [2023-09-20 19:24:51,118][106467] Removing ./train_dir/Swimmer/checkpoint_p1/checkpoint_000008960_4587520.pth [2023-09-20 19:24:52,990][106484] Updated weights for policy 0, policy_version 9360 (0.0011) [2023-09-20 19:24:52,990][106486] Updated weights for policy 1, policy_version 9360 (0.0011) [2023-09-20 19:24:56,100][105923] Fps is (10 sec: 12288.0, 60 sec: 12697.6, 300 sec: 12912.8). Total num frames: 9617408. Throughput: 0: 6396.5, 1: 6397.9. Samples: 9611608. 
Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-09-20 19:24:56,101][105923] Avg episode reward: [(0, '64.342'), (1, '82.472')] [2023-09-20 19:24:59,347][106486] Updated weights for policy 1, policy_version 9440 (0.0015) [2023-09-20 19:24:59,348][106484] Updated weights for policy 0, policy_version 9440 (0.0012) [2023-09-20 19:25:01,100][105923] Fps is (10 sec: 12288.3, 60 sec: 12697.6, 300 sec: 12912.8). Total num frames: 9682944. Throughput: 0: 6351.9, 1: 6352.6. Samples: 9687992. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-20 19:25:01,101][105923] Avg episode reward: [(0, '65.477'), (1, '82.731')] [2023-09-20 19:25:05,936][106484] Updated weights for policy 0, policy_version 9520 (0.0010) [2023-09-20 19:25:05,937][106486] Updated weights for policy 1, policy_version 9520 (0.0012) [2023-09-20 19:25:06,100][105923] Fps is (10 sec: 13107.0, 60 sec: 12834.1, 300 sec: 12912.8). Total num frames: 9748480. Throughput: 0: 6381.5, 1: 6380.8. Samples: 9726596. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-09-20 19:25:06,101][105923] Avg episode reward: [(0, '65.928'), (1, '84.092')] [2023-09-20 19:25:06,108][106467] Saving ./train_dir/Swimmer/checkpoint_p1/checkpoint_000009520_4874240.pth... [2023-09-20 19:25:06,108][106463] Saving ./train_dir/Swimmer/checkpoint_p0/checkpoint_000009520_4874240.pth... [2023-09-20 19:25:06,114][106467] Removing ./train_dir/Swimmer/checkpoint_p1/checkpoint_000009144_4681728.pth [2023-09-20 19:25:06,116][106463] Removing ./train_dir/Swimmer/checkpoint_p0/checkpoint_000009144_4681728.pth [2023-09-20 19:25:11,100][105923] Fps is (10 sec: 12288.0, 60 sec: 12697.6, 300 sec: 12885.0). Total num frames: 9805824. Throughput: 0: 6380.9, 1: 6382.2. Samples: 9798604. 
Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-09-20 19:25:11,101][105923] Avg episode reward: [(0, '65.550'), (1, '82.984')] [2023-09-20 19:25:12,587][106484] Updated weights for policy 0, policy_version 9600 (0.0012) [2023-09-20 19:25:12,588][106486] Updated weights for policy 1, policy_version 9600 (0.0015) [2023-09-20 19:25:16,100][105923] Fps is (10 sec: 12288.0, 60 sec: 12697.6, 300 sec: 12885.0). Total num frames: 9871360. Throughput: 0: 6419.8, 1: 6420.2. Samples: 9876228. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-20 19:25:16,101][105923] Avg episode reward: [(0, '64.094'), (1, '81.130')] [2023-09-20 19:25:19,058][106486] Updated weights for policy 1, policy_version 9680 (0.0013) [2023-09-20 19:25:19,059][106484] Updated weights for policy 0, policy_version 9680 (0.0014) [2023-09-20 19:25:21,100][105923] Fps is (10 sec: 13107.3, 60 sec: 12697.7, 300 sec: 12885.0). Total num frames: 9936896. Throughput: 0: 6374.7, 1: 6376.9. Samples: 9912704. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-20 19:25:21,101][105923] Avg episode reward: [(0, '65.623'), (1, '78.421')] [2023-09-20 19:25:21,107][106463] Saving ./train_dir/Swimmer/checkpoint_p0/checkpoint_000009704_4968448.pth... [2023-09-20 19:25:21,107][106467] Saving ./train_dir/Swimmer/checkpoint_p1/checkpoint_000009704_4968448.pth... [2023-09-20 19:25:21,112][106463] Removing ./train_dir/Swimmer/checkpoint_p0/checkpoint_000009336_4780032.pth [2023-09-20 19:25:21,114][106467] Removing ./train_dir/Swimmer/checkpoint_p1/checkpoint_000009336_4780032.pth [2023-09-20 19:25:25,488][106484] Updated weights for policy 0, policy_version 9760 (0.0014) [2023-09-20 19:25:25,489][106486] Updated weights for policy 1, policy_version 9760 (0.0013) [2023-09-20 19:25:26,100][105923] Fps is (10 sec: 12288.0, 60 sec: 12697.6, 300 sec: 12857.3). Total num frames: 9994240. Throughput: 0: 6335.7, 1: 6336.3. Samples: 9990252. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-20 19:25:26,101][105923] Avg episode reward: [(0, '67.355'), (1, '79.619')] [2023-09-20 19:25:31,100][105923] Fps is (10 sec: 13107.2, 60 sec: 12834.1, 300 sec: 12885.1). Total num frames: 10067968. Throughput: 0: 6383.4, 1: 6383.3. Samples: 10070990. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-20 19:25:31,101][105923] Avg episode reward: [(0, '68.429'), (1, '81.404')] [2023-09-20 19:25:31,494][106484] Updated weights for policy 0, policy_version 9840 (0.0009) [2023-09-20 19:25:31,494][106486] Updated weights for policy 1, policy_version 9840 (0.0015) [2023-09-20 19:25:36,101][105923] Fps is (10 sec: 13926.1, 60 sec: 12834.1, 300 sec: 12885.0). Total num frames: 10133504. Throughput: 0: 6381.5, 1: 6382.4. Samples: 10109902. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-20 19:25:36,102][105923] Avg episode reward: [(0, '68.087'), (1, '84.588')] [2023-09-20 19:25:36,110][106463] Saving ./train_dir/Swimmer/checkpoint_p0/checkpoint_000009896_5066752.pth... [2023-09-20 19:25:36,110][106467] Saving ./train_dir/Swimmer/checkpoint_p1/checkpoint_000009896_5066752.pth... [2023-09-20 19:25:36,115][106463] Removing ./train_dir/Swimmer/checkpoint_p0/checkpoint_000009520_4874240.pth [2023-09-20 19:25:36,117][106467] Removing ./train_dir/Swimmer/checkpoint_p1/checkpoint_000009520_4874240.pth [2023-09-20 19:25:37,621][106486] Updated weights for policy 1, policy_version 9920 (0.0012) [2023-09-20 19:25:37,622][106484] Updated weights for policy 0, policy_version 9920 (0.0015) [2023-09-20 19:25:41,100][105923] Fps is (10 sec: 13106.9, 60 sec: 12834.1, 300 sec: 12885.0). Total num frames: 10199040. Throughput: 0: 6437.6, 1: 6435.8. Samples: 10190912. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-20 19:25:41,101][105923] Avg episode reward: [(0, '66.984'), (1, '85.819')] [2023-09-20 19:25:43,841][106486] Updated weights for policy 1, policy_version 10000 (0.0013) [2023-09-20 19:25:43,841][106484] Updated weights for policy 0, policy_version 10000 (0.0014) [2023-09-20 19:25:46,100][105923] Fps is (10 sec: 13107.8, 60 sec: 12834.1, 300 sec: 12912.8). Total num frames: 10264576. Throughput: 0: 6458.8, 1: 6458.4. Samples: 10269262. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-20 19:25:46,101][105923] Avg episode reward: [(0, '65.340'), (1, '88.712')] [2023-09-20 19:25:50,217][106484] Updated weights for policy 0, policy_version 10080 (0.0014) [2023-09-20 19:25:50,218][106486] Updated weights for policy 1, policy_version 10080 (0.0015) [2023-09-20 19:25:51,100][105923] Fps is (10 sec: 13107.3, 60 sec: 12834.2, 300 sec: 12912.8). Total num frames: 10330112. Throughput: 0: 6458.4, 1: 6458.1. Samples: 10307836. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-20 19:25:51,101][105923] Avg episode reward: [(0, '63.741'), (1, '88.084')] [2023-09-20 19:25:51,110][106463] Saving ./train_dir/Swimmer/checkpoint_p0/checkpoint_000010088_5165056.pth... [2023-09-20 19:25:51,110][106467] Saving ./train_dir/Swimmer/checkpoint_p1/checkpoint_000010088_5165056.pth... [2023-09-20 19:25:51,115][106463] Removing ./train_dir/Swimmer/checkpoint_p0/checkpoint_000009704_4968448.pth [2023-09-20 19:25:51,117][106467] Removing ./train_dir/Swimmer/checkpoint_p1/checkpoint_000009704_4968448.pth [2023-09-20 19:25:56,100][105923] Fps is (10 sec: 12287.8, 60 sec: 12834.1, 300 sec: 12885.0). Total num frames: 10387456. Throughput: 0: 6492.7, 1: 6493.1. Samples: 10382964. 
Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-09-20 19:25:56,101][105923] Avg episode reward: [(0, '63.543'), (1, '85.212')] [2023-09-20 19:25:56,869][106484] Updated weights for policy 0, policy_version 10160 (0.0014) [2023-09-20 19:25:56,869][106486] Updated weights for policy 1, policy_version 10160 (0.0012) [2023-09-20 19:26:01,100][105923] Fps is (10 sec: 12288.1, 60 sec: 12834.1, 300 sec: 12885.1). Total num frames: 10452992. Throughput: 0: 6473.6, 1: 6473.2. Samples: 10458832. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-20 19:26:01,101][105923] Avg episode reward: [(0, '65.838'), (1, '82.677')] [2023-09-20 19:26:03,378][106484] Updated weights for policy 0, policy_version 10240 (0.0014) [2023-09-20 19:26:03,379][106486] Updated weights for policy 1, policy_version 10240 (0.0015) [2023-09-20 19:26:06,100][105923] Fps is (10 sec: 13107.1, 60 sec: 12834.1, 300 sec: 12885.0). Total num frames: 10518528. Throughput: 0: 6465.3, 1: 6464.9. Samples: 10494564. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-20 19:26:06,101][105923] Avg episode reward: [(0, '69.064'), (1, '79.798')] [2023-09-20 19:26:06,110][106463] Saving ./train_dir/Swimmer/checkpoint_p0/checkpoint_000010272_5259264.pth... [2023-09-20 19:26:06,110][106467] Saving ./train_dir/Swimmer/checkpoint_p1/checkpoint_000010272_5259264.pth... [2023-09-20 19:26:06,117][106467] Removing ./train_dir/Swimmer/checkpoint_p1/checkpoint_000009896_5066752.pth [2023-09-20 19:26:06,121][106463] Removing ./train_dir/Swimmer/checkpoint_p0/checkpoint_000009896_5066752.pth [2023-09-20 19:26:10,106][106484] Updated weights for policy 0, policy_version 10320 (0.0015) [2023-09-20 19:26:10,106][106486] Updated weights for policy 1, policy_version 10320 (0.0014) [2023-09-20 19:26:11,100][105923] Fps is (10 sec: 12288.0, 60 sec: 12834.1, 300 sec: 12885.0). Total num frames: 10575872. Throughput: 0: 6417.6, 1: 6415.7. Samples: 10567746. 
Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-09-20 19:26:11,101][105923] Avg episode reward: [(0, '71.873'), (1, '81.602')] [2023-09-20 19:26:16,100][105923] Fps is (10 sec: 11468.9, 60 sec: 12697.6, 300 sec: 12857.3). Total num frames: 10633216. Throughput: 0: 5928.4, 1: 5927.3. Samples: 10604498. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-09-20 19:26:16,101][105923] Avg episode reward: [(0, '73.206'), (1, '83.770')] [2023-09-20 19:26:16,745][106486] Updated weights for policy 1, policy_version 10400 (0.0015) [2023-09-20 19:26:16,746][106484] Updated weights for policy 0, policy_version 10400 (0.0015) [2023-09-20 19:26:21,100][105923] Fps is (10 sec: 12288.0, 60 sec: 12697.6, 300 sec: 12829.5). Total num frames: 10698752. Throughput: 0: 6322.3, 1: 6322.5. Samples: 10678916. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-20 19:26:21,101][105923] Avg episode reward: [(0, '73.387'), (1, '83.757')] [2023-09-20 19:26:21,106][106467] Saving ./train_dir/Swimmer/checkpoint_p1/checkpoint_000010448_5349376.pth... [2023-09-20 19:26:21,106][106463] Saving ./train_dir/Swimmer/checkpoint_p0/checkpoint_000010448_5349376.pth... [2023-09-20 19:26:21,110][106467] Removing ./train_dir/Swimmer/checkpoint_p1/checkpoint_000010088_5165056.pth [2023-09-20 19:26:21,113][106463] Removing ./train_dir/Swimmer/checkpoint_p0/checkpoint_000010088_5165056.pth [2023-09-20 19:26:23,480][106484] Updated weights for policy 0, policy_version 10480 (0.0016) [2023-09-20 19:26:23,480][106486] Updated weights for policy 1, policy_version 10480 (0.0010) [2023-09-20 19:26:26,100][105923] Fps is (10 sec: 13107.2, 60 sec: 12834.1, 300 sec: 12829.5). Total num frames: 10764288. Throughput: 0: 6229.2, 1: 6230.5. Samples: 10751598. 
Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-09-20 19:26:26,101][105923] Avg episode reward: [(0, '73.297'), (1, '82.234')] [2023-09-20 19:26:29,881][106486] Updated weights for policy 1, policy_version 10560 (0.0015) [2023-09-20 19:26:29,881][106484] Updated weights for policy 0, policy_version 10560 (0.0016) [2023-09-20 19:26:31,100][105923] Fps is (10 sec: 12287.8, 60 sec: 12561.0, 300 sec: 12829.5). Total num frames: 10821632. Throughput: 0: 5796.1, 1: 5795.8. Samples: 10790900. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-09-20 19:26:31,101][105923] Avg episode reward: [(0, '74.243'), (1, '80.318')] [2023-09-20 19:26:36,100][105923] Fps is (10 sec: 12287.9, 60 sec: 12561.1, 300 sec: 12829.5). Total num frames: 10887168. Throughput: 0: 6252.5, 1: 6251.7. Samples: 10870526. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-20 19:26:36,101][105923] Avg episode reward: [(0, '75.511'), (1, '80.546')] [2023-09-20 19:26:36,110][106463] Saving ./train_dir/Swimmer/checkpoint_p0/checkpoint_000010632_5443584.pth... [2023-09-20 19:26:36,111][106467] Saving ./train_dir/Swimmer/checkpoint_p1/checkpoint_000010632_5443584.pth... [2023-09-20 19:26:36,116][106463] Removing ./train_dir/Swimmer/checkpoint_p0/checkpoint_000010272_5259264.pth [2023-09-20 19:26:36,119][106467] Removing ./train_dir/Swimmer/checkpoint_p1/checkpoint_000010272_5259264.pth [2023-09-20 19:26:36,182][106484] Updated weights for policy 0, policy_version 10640 (0.0012) [2023-09-20 19:26:36,183][106486] Updated weights for policy 1, policy_version 10640 (0.0015) [2023-09-20 19:26:41,100][105923] Fps is (10 sec: 13107.2, 60 sec: 12561.1, 300 sec: 12829.5). Total num frames: 10952704. Throughput: 0: 6270.1, 1: 6270.3. Samples: 10947282. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-20 19:26:41,101][105923] Avg episode reward: [(0, '77.357'), (1, '81.344')] [2023-09-20 19:26:42,322][106484] Updated weights for policy 0, policy_version 10720 (0.0014) [2023-09-20 19:26:42,322][106486] Updated weights for policy 1, policy_version 10720 (0.0015) [2023-09-20 19:26:46,100][105923] Fps is (10 sec: 13926.6, 60 sec: 12697.6, 300 sec: 12857.3). Total num frames: 11026432. Throughput: 0: 6308.1, 1: 6307.4. Samples: 11026532. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-20 19:26:46,101][105923] Avg episode reward: [(0, '78.118'), (1, '85.323')] [2023-09-20 19:26:48,659][106486] Updated weights for policy 1, policy_version 10800 (0.0011) [2023-09-20 19:26:48,659][106484] Updated weights for policy 0, policy_version 10800 (0.0011) [2023-09-20 19:26:51,100][105923] Fps is (10 sec: 13926.3, 60 sec: 12697.6, 300 sec: 12857.3). Total num frames: 11091968. Throughput: 0: 6344.8, 1: 6344.7. Samples: 11065590. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-20 19:26:51,101][105923] Avg episode reward: [(0, '78.540'), (1, '88.936')] [2023-09-20 19:26:51,111][106463] Saving ./train_dir/Swimmer/checkpoint_p0/checkpoint_000010832_5545984.pth... [2023-09-20 19:26:51,112][106467] Saving ./train_dir/Swimmer/checkpoint_p1/checkpoint_000010832_5545984.pth... [2023-09-20 19:26:51,118][106463] Removing ./train_dir/Swimmer/checkpoint_p0/checkpoint_000010448_5349376.pth [2023-09-20 19:26:51,124][106467] Removing ./train_dir/Swimmer/checkpoint_p1/checkpoint_000010448_5349376.pth [2023-09-20 19:26:54,716][106484] Updated weights for policy 0, policy_version 10880 (0.0013) [2023-09-20 19:26:54,717][106486] Updated weights for policy 1, policy_version 10880 (0.0014) [2023-09-20 19:26:56,100][105923] Fps is (10 sec: 13107.1, 60 sec: 12834.1, 300 sec: 12857.3). Total num frames: 11157504. Throughput: 0: 6427.4, 1: 6428.1. Samples: 11146242. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-20 19:26:56,101][105923] Avg episode reward: [(0, '79.420'), (1, '88.292')] [2023-09-20 19:27:00,932][106484] Updated weights for policy 0, policy_version 10960 (0.0013) [2023-09-20 19:27:00,934][106486] Updated weights for policy 1, policy_version 10960 (0.0012) [2023-09-20 19:27:01,100][105923] Fps is (10 sec: 13107.4, 60 sec: 12834.1, 300 sec: 12857.3). Total num frames: 11223040. Throughput: 0: 6897.8, 1: 6898.2. Samples: 11225312. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-20 19:27:01,101][105923] Avg episode reward: [(0, '80.633'), (1, '84.194')] [2023-09-20 19:27:06,100][105923] Fps is (10 sec: 13107.3, 60 sec: 12834.2, 300 sec: 12857.3). Total num frames: 11288576. Throughput: 0: 6503.3, 1: 6503.1. Samples: 11264202. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-20 19:27:06,101][105923] Avg episode reward: [(0, '82.331'), (1, '79.437')] [2023-09-20 19:27:06,109][106463] Saving ./train_dir/Swimmer/checkpoint_p0/checkpoint_000011024_5644288.pth... [2023-09-20 19:27:06,109][106467] Saving ./train_dir/Swimmer/checkpoint_p1/checkpoint_000011024_5644288.pth... [2023-09-20 19:27:06,116][106467] Removing ./train_dir/Swimmer/checkpoint_p1/checkpoint_000010632_5443584.pth [2023-09-20 19:27:06,118][106463] Removing ./train_dir/Swimmer/checkpoint_p0/checkpoint_000010632_5443584.pth [2023-09-20 19:27:07,323][106484] Updated weights for policy 0, policy_version 11040 (0.0013) [2023-09-20 19:27:07,324][106486] Updated weights for policy 1, policy_version 11040 (0.0013) [2023-09-20 19:27:11,100][105923] Fps is (10 sec: 12288.0, 60 sec: 12834.1, 300 sec: 12829.5). Total num frames: 11345920. Throughput: 0: 6546.7, 1: 6546.9. Samples: 11340806. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-20 19:27:11,101][105923] Avg episode reward: [(0, '83.218'), (1, '79.897')] [2023-09-20 19:27:13,694][106486] Updated weights for policy 1, policy_version 11120 (0.0012) [2023-09-20 19:27:13,696][106484] Updated weights for policy 0, policy_version 11120 (0.0012) [2023-09-20 19:27:16,100][105923] Fps is (10 sec: 12288.0, 60 sec: 12970.7, 300 sec: 12857.3). Total num frames: 11411456. Throughput: 0: 6985.0, 1: 6984.0. Samples: 11419502. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-20 19:27:16,101][105923] Avg episode reward: [(0, '83.807'), (1, '83.756')] [2023-09-20 19:27:19,931][106486] Updated weights for policy 1, policy_version 11200 (0.0015) [2023-09-20 19:27:19,931][106484] Updated weights for policy 0, policy_version 11200 (0.0014) [2023-09-20 19:27:21,101][105923] Fps is (10 sec: 13106.8, 60 sec: 12970.6, 300 sec: 12829.5). Total num frames: 11476992. Throughput: 0: 6535.4, 1: 6537.0. Samples: 11458786. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-20 19:27:21,102][105923] Avg episode reward: [(0, '83.819'), (1, '85.676')] [2023-09-20 19:27:21,109][106463] Saving ./train_dir/Swimmer/checkpoint_p0/checkpoint_000011208_5738496.pth... [2023-09-20 19:27:21,110][106467] Saving ./train_dir/Swimmer/checkpoint_p1/checkpoint_000011208_5738496.pth... [2023-09-20 19:27:21,113][106467] Removing ./train_dir/Swimmer/checkpoint_p1/checkpoint_000010832_5545984.pth [2023-09-20 19:27:21,114][106463] Removing ./train_dir/Swimmer/checkpoint_p0/checkpoint_000010832_5545984.pth [2023-09-20 19:27:26,100][105923] Fps is (10 sec: 13107.3, 60 sec: 12970.7, 300 sec: 12829.5). Total num frames: 11542528. Throughput: 0: 6494.1, 1: 6494.4. Samples: 11531762. 
Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-20 19:27:26,101][105923] Avg episode reward: [(0, '83.576'), (1, '84.517')] [2023-09-20 19:27:26,544][106484] Updated weights for policy 0, policy_version 11280 (0.0015) [2023-09-20 19:27:26,545][106486] Updated weights for policy 1, policy_version 11280 (0.0014) [2023-09-20 19:27:31,100][105923] Fps is (10 sec: 13107.6, 60 sec: 13107.2, 300 sec: 12829.5). Total num frames: 11608064. Throughput: 0: 6462.6, 1: 6461.9. Samples: 11608136. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-20 19:27:31,101][105923] Avg episode reward: [(0, '82.681'), (1, '78.893')] [2023-09-20 19:27:32,945][106484] Updated weights for policy 0, policy_version 11360 (0.0015) [2023-09-20 19:27:32,945][106486] Updated weights for policy 1, policy_version 11360 (0.0016) [2023-09-20 19:27:36,100][105923] Fps is (10 sec: 13107.2, 60 sec: 13107.2, 300 sec: 12829.5). Total num frames: 11673600. Throughput: 0: 6484.2, 1: 6482.5. Samples: 11649090. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-09-20 19:27:36,101][105923] Avg episode reward: [(0, '81.887'), (1, '74.401')] [2023-09-20 19:27:36,108][106467] Saving ./train_dir/Swimmer/checkpoint_p1/checkpoint_000011400_5836800.pth... [2023-09-20 19:27:36,108][106463] Saving ./train_dir/Swimmer/checkpoint_p0/checkpoint_000011400_5836800.pth... [2023-09-20 19:27:36,113][106467] Removing ./train_dir/Swimmer/checkpoint_p1/checkpoint_000011024_5644288.pth [2023-09-20 19:27:36,115][106463] Removing ./train_dir/Swimmer/checkpoint_p0/checkpoint_000011024_5644288.pth [2023-09-20 19:27:39,066][106486] Updated weights for policy 1, policy_version 11440 (0.0010) [2023-09-20 19:27:39,067][106484] Updated weights for policy 0, policy_version 11440 (0.0011) [2023-09-20 19:27:41,100][105923] Fps is (10 sec: 13107.2, 60 sec: 13107.2, 300 sec: 12857.3). Total num frames: 11739136. Throughput: 0: 6477.9, 1: 6479.1. Samples: 11729304. 
Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-09-20 19:27:41,101][105923] Avg episode reward: [(0, '81.385'), (1, '73.571')] [2023-09-20 19:27:45,264][106484] Updated weights for policy 0, policy_version 11520 (0.0014) [2023-09-20 19:27:45,265][106486] Updated weights for policy 1, policy_version 11520 (0.0014) [2023-09-20 19:27:46,100][105923] Fps is (10 sec: 13107.2, 60 sec: 12970.7, 300 sec: 12857.3). Total num frames: 11804672. Throughput: 0: 6474.4, 1: 6474.7. Samples: 11808020. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-20 19:27:46,101][105923] Avg episode reward: [(0, '80.596'), (1, '73.309')] [2023-09-20 19:27:51,100][105923] Fps is (10 sec: 13106.9, 60 sec: 12970.6, 300 sec: 12857.3). Total num frames: 11870208. Throughput: 0: 6473.6, 1: 6472.8. Samples: 11846794. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-20 19:27:51,101][105923] Avg episode reward: [(0, '79.591'), (1, '74.107')] [2023-09-20 19:27:51,110][106463] Saving ./train_dir/Swimmer/checkpoint_p0/checkpoint_000011592_5935104.pth... [2023-09-20 19:27:51,110][106467] Saving ./train_dir/Swimmer/checkpoint_p1/checkpoint_000011592_5935104.pth... [2023-09-20 19:27:51,118][106467] Removing ./train_dir/Swimmer/checkpoint_p1/checkpoint_000011208_5738496.pth [2023-09-20 19:27:51,120][106463] Removing ./train_dir/Swimmer/checkpoint_p0/checkpoint_000011208_5738496.pth [2023-09-20 19:27:51,538][106484] Updated weights for policy 0, policy_version 11600 (0.0013) [2023-09-20 19:27:51,539][106486] Updated weights for policy 1, policy_version 11600 (0.0015) [2023-09-20 19:27:56,100][105923] Fps is (10 sec: 13106.9, 60 sec: 12970.6, 300 sec: 12857.3). Total num frames: 11935744. Throughput: 0: 6520.9, 1: 6519.7. Samples: 11927632. 
Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-20 19:27:56,101][105923] Avg episode reward: [(0, '78.407'), (1, '74.637')] [2023-09-20 19:27:57,615][106484] Updated weights for policy 0, policy_version 11680 (0.0011) [2023-09-20 19:27:57,616][106486] Updated weights for policy 1, policy_version 11680 (0.0014) [2023-09-20 19:28:01,100][105923] Fps is (10 sec: 13107.4, 60 sec: 12970.6, 300 sec: 12857.3). Total num frames: 12001280. Throughput: 0: 6490.7, 1: 6491.5. Samples: 12003700. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-20 19:28:01,101][105923] Avg episode reward: [(0, '77.377'), (1, '73.982')] [2023-09-20 19:28:04,311][106486] Updated weights for policy 1, policy_version 11760 (0.0013) [2023-09-20 19:28:04,312][106484] Updated weights for policy 0, policy_version 11760 (0.0015) [2023-09-20 19:28:06,100][105923] Fps is (10 sec: 12288.0, 60 sec: 12834.1, 300 sec: 12857.3). Total num frames: 12058624. Throughput: 0: 6460.4, 1: 6460.3. Samples: 12040216. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-20 19:28:06,101][105923] Avg episode reward: [(0, '76.567'), (1, '72.740')] [2023-09-20 19:28:06,110][106467] Saving ./train_dir/Swimmer/checkpoint_p1/checkpoint_000011776_6029312.pth... [2023-09-20 19:28:06,111][106463] Saving ./train_dir/Swimmer/checkpoint_p0/checkpoint_000011776_6029312.pth... [2023-09-20 19:28:06,114][106467] Removing ./train_dir/Swimmer/checkpoint_p1/checkpoint_000011400_5836800.pth [2023-09-20 19:28:06,117][106463] Removing ./train_dir/Swimmer/checkpoint_p0/checkpoint_000011400_5836800.pth [2023-09-20 19:28:10,785][106486] Updated weights for policy 1, policy_version 11840 (0.0013) [2023-09-20 19:28:10,785][106484] Updated weights for policy 0, policy_version 11840 (0.0013) [2023-09-20 19:28:11,100][105923] Fps is (10 sec: 12288.1, 60 sec: 12970.7, 300 sec: 12857.3). Total num frames: 12124160. Throughput: 0: 6491.6, 1: 6489.8. Samples: 12115924. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-20 19:28:11,101][105923] Avg episode reward: [(0, '76.449'), (1, '69.816')] [2023-09-20 19:28:16,100][105923] Fps is (10 sec: 13107.4, 60 sec: 12970.7, 300 sec: 12829.5). Total num frames: 12189696. Throughput: 0: 6468.6, 1: 6470.6. Samples: 12190400. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-20 19:28:16,101][105923] Avg episode reward: [(0, '76.626'), (1, '63.435')] [2023-09-20 19:28:17,199][106486] Updated weights for policy 1, policy_version 11920 (0.0009) [2023-09-20 19:28:17,199][106484] Updated weights for policy 0, policy_version 11920 (0.0014) [2023-09-20 19:28:21,100][105923] Fps is (10 sec: 13107.0, 60 sec: 12970.7, 300 sec: 12829.5). Total num frames: 12255232. Throughput: 0: 6462.0, 1: 6461.9. Samples: 12230668. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-20 19:28:21,101][105923] Avg episode reward: [(0, '77.166'), (1, '53.948')] [2023-09-20 19:28:21,110][106463] Saving ./train_dir/Swimmer/checkpoint_p0/checkpoint_000011968_6127616.pth... [2023-09-20 19:28:21,110][106467] Saving ./train_dir/Swimmer/checkpoint_p1/checkpoint_000011968_6127616.pth... [2023-09-20 19:28:21,117][106463] Removing ./train_dir/Swimmer/checkpoint_p0/checkpoint_000011592_5935104.pth [2023-09-20 19:28:21,117][106467] Removing ./train_dir/Swimmer/checkpoint_p1/checkpoint_000011592_5935104.pth [2023-09-20 19:28:23,554][106486] Updated weights for policy 1, policy_version 12000 (0.0009) [2023-09-20 19:28:23,555][106484] Updated weights for policy 0, policy_version 12000 (0.0011) [2023-09-20 19:28:26,100][105923] Fps is (10 sec: 12697.6, 60 sec: 12902.4, 300 sec: 12843.4). Total num frames: 12316672. Throughput: 0: 6431.2, 1: 6431.2. Samples: 12308108. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-20 19:28:26,101][105923] Avg episode reward: [(0, '76.923'), (1, '49.542')] [2023-09-20 19:28:29,958][106484] Updated weights for policy 0, policy_version 12080 (0.0013) [2023-09-20 19:28:29,958][106486] Updated weights for policy 1, policy_version 12080 (0.0014) [2023-09-20 19:28:31,100][105923] Fps is (10 sec: 12288.0, 60 sec: 12834.1, 300 sec: 12801.7). Total num frames: 12378112. Throughput: 0: 5973.4, 1: 5973.1. Samples: 12345616. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-20 19:28:31,101][105923] Avg episode reward: [(0, '76.662'), (1, '50.583')] [2023-09-20 19:28:36,100][105923] Fps is (10 sec: 12697.4, 60 sec: 12834.1, 300 sec: 12801.7). Total num frames: 12443648. Throughput: 0: 6437.8, 1: 6438.8. Samples: 12426240. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-20 19:28:36,105][105923] Avg episode reward: [(0, '76.529'), (1, '54.473')] [2023-09-20 19:28:36,113][106463] Saving ./train_dir/Swimmer/checkpoint_p0/checkpoint_000012152_6221824.pth... [2023-09-20 19:28:36,114][106467] Saving ./train_dir/Swimmer/checkpoint_p1/checkpoint_000012152_6221824.pth... [2023-09-20 19:28:36,121][106463] Removing ./train_dir/Swimmer/checkpoint_p0/checkpoint_000011776_6029312.pth [2023-09-20 19:28:36,123][106467] Removing ./train_dir/Swimmer/checkpoint_p1/checkpoint_000011776_6029312.pth [2023-09-20 19:28:36,193][106486] Updated weights for policy 1, policy_version 12160 (0.0011) [2023-09-20 19:28:36,193][106484] Updated weights for policy 0, policy_version 12160 (0.0013) [2023-09-20 19:28:41,100][105923] Fps is (10 sec: 13926.6, 60 sec: 12970.6, 300 sec: 12829.5). Total num frames: 12517376. Throughput: 0: 6410.3, 1: 6412.1. Samples: 12504640. 
Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-09-20 19:28:41,101][105923] Avg episode reward: [(0, '76.859'), (1, '54.055')] [2023-09-20 19:28:42,255][106484] Updated weights for policy 0, policy_version 12240 (0.0014) [2023-09-20 19:28:42,255][106486] Updated weights for policy 1, policy_version 12240 (0.0013) [2023-09-20 19:28:46,100][105923] Fps is (10 sec: 13926.5, 60 sec: 12970.6, 300 sec: 12857.3). Total num frames: 12582912. Throughput: 0: 6437.3, 1: 6436.3. Samples: 12583014. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-09-20 19:28:46,101][105923] Avg episode reward: [(0, '76.557'), (1, '51.046')] [2023-09-20 19:28:48,535][106486] Updated weights for policy 1, policy_version 12320 (0.0013) [2023-09-20 19:28:48,537][106484] Updated weights for policy 0, policy_version 12320 (0.0012) [2023-09-20 19:28:51,100][105923] Fps is (10 sec: 13107.1, 60 sec: 12970.7, 300 sec: 12857.3). Total num frames: 12648448. Throughput: 0: 6485.1, 1: 6483.6. Samples: 12623808. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-09-20 19:28:51,101][105923] Avg episode reward: [(0, '74.979'), (1, '50.235')] [2023-09-20 19:28:51,112][106463] Saving ./train_dir/Swimmer/checkpoint_p0/checkpoint_000012352_6324224.pth... [2023-09-20 19:28:51,112][106467] Saving ./train_dir/Swimmer/checkpoint_p1/checkpoint_000012352_6324224.pth... [2023-09-20 19:28:51,115][106467] Removing ./train_dir/Swimmer/checkpoint_p1/checkpoint_000011968_6127616.pth [2023-09-20 19:28:51,117][106463] Removing ./train_dir/Swimmer/checkpoint_p0/checkpoint_000011968_6127616.pth [2023-09-20 19:28:54,710][106484] Updated weights for policy 0, policy_version 12400 (0.0010) [2023-09-20 19:28:54,710][106486] Updated weights for policy 1, policy_version 12400 (0.0012) [2023-09-20 19:28:56,100][105923] Fps is (10 sec: 13107.2, 60 sec: 12970.7, 300 sec: 12857.3). Total num frames: 12713984. Throughput: 0: 6527.3, 1: 6528.0. Samples: 12703416. 
Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-09-20 19:28:56,101][105923] Avg episode reward: [(0, '72.551'), (1, '51.410')] [2023-09-20 19:29:00,931][106486] Updated weights for policy 1, policy_version 12480 (0.0012) [2023-09-20 19:29:00,932][106484] Updated weights for policy 0, policy_version 12480 (0.0014) [2023-09-20 19:29:01,100][105923] Fps is (10 sec: 13107.4, 60 sec: 12970.7, 300 sec: 12885.0). Total num frames: 12779520. Throughput: 0: 6571.7, 1: 6571.5. Samples: 12781844. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-20 19:29:01,101][105923] Avg episode reward: [(0, '70.111'), (1, '59.079')] [2023-09-20 19:29:06,100][105923] Fps is (10 sec: 12697.8, 60 sec: 13039.0, 300 sec: 12871.2). Total num frames: 12840960. Throughput: 0: 6554.2, 1: 6554.2. Samples: 12820546. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-20 19:29:06,101][105923] Avg episode reward: [(0, '69.317'), (1, '64.445')] [2023-09-20 19:29:06,107][106467] Saving ./train_dir/Swimmer/checkpoint_p1/checkpoint_000012544_6422528.pth... [2023-09-20 19:29:06,107][106463] Saving ./train_dir/Swimmer/checkpoint_p0/checkpoint_000012544_6422528.pth... [2023-09-20 19:29:06,110][106467] Removing ./train_dir/Swimmer/checkpoint_p1/checkpoint_000012152_6221824.pth [2023-09-20 19:29:06,115][106463] Removing ./train_dir/Swimmer/checkpoint_p0/checkpoint_000012152_6221824.pth [2023-09-20 19:29:07,399][106484] Updated weights for policy 0, policy_version 12560 (0.0011) [2023-09-20 19:29:07,400][106486] Updated weights for policy 1, policy_version 12560 (0.0014) [2023-09-20 19:29:11,100][105923] Fps is (10 sec: 12287.9, 60 sec: 12970.6, 300 sec: 12857.3). Total num frames: 12902400. Throughput: 0: 6532.8, 1: 6532.2. Samples: 12896034. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-20 19:29:11,101][105923] Avg episode reward: [(0, '69.374'), (1, '65.657')] [2023-09-20 19:29:13,706][106484] Updated weights for policy 0, policy_version 12640 (0.0015) [2023-09-20 19:29:13,706][106486] Updated weights for policy 1, policy_version 12640 (0.0013) [2023-09-20 19:29:16,100][105923] Fps is (10 sec: 12697.5, 60 sec: 12970.7, 300 sec: 12857.3). Total num frames: 12967936. Throughput: 0: 6999.4, 1: 6998.6. Samples: 12975526. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-20 19:29:16,101][105923] Avg episode reward: [(0, '69.388'), (1, '63.188')] [2023-09-20 19:29:20,131][106486] Updated weights for policy 1, policy_version 12720 (0.0015) [2023-09-20 19:29:20,131][106484] Updated weights for policy 0, policy_version 12720 (0.0015) [2023-09-20 19:29:21,100][105923] Fps is (10 sec: 13107.1, 60 sec: 12970.7, 300 sec: 12885.0). Total num frames: 13033472. Throughput: 0: 6515.5, 1: 6515.0. Samples: 13012610. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-20 19:29:21,101][105923] Avg episode reward: [(0, '68.892'), (1, '61.342')] [2023-09-20 19:29:21,111][106463] Saving ./train_dir/Swimmer/checkpoint_p0/checkpoint_000012728_6516736.pth... [2023-09-20 19:29:21,111][106467] Saving ./train_dir/Swimmer/checkpoint_p1/checkpoint_000012728_6516736.pth... [2023-09-20 19:29:21,116][106467] Removing ./train_dir/Swimmer/checkpoint_p1/checkpoint_000012352_6324224.pth [2023-09-20 19:29:21,118][106463] Removing ./train_dir/Swimmer/checkpoint_p0/checkpoint_000012352_6324224.pth [2023-09-20 19:29:26,100][105923] Fps is (10 sec: 13107.2, 60 sec: 13038.9, 300 sec: 12885.0). Total num frames: 13099008. Throughput: 0: 6518.6, 1: 6518.6. Samples: 13091314. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-20 19:29:26,101][105923] Avg episode reward: [(0, '68.737'), (1, '60.976')] [2023-09-20 19:29:26,298][106484] Updated weights for policy 0, policy_version 12800 (0.0014) [2023-09-20 19:29:26,299][106486] Updated weights for policy 1, policy_version 12800 (0.0015) [2023-09-20 19:29:31,100][105923] Fps is (10 sec: 13107.4, 60 sec: 13107.2, 300 sec: 12885.0). Total num frames: 13164544. Throughput: 0: 6543.3, 1: 6544.1. Samples: 13171942. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-20 19:29:31,101][105923] Avg episode reward: [(0, '69.833'), (1, '59.048')] [2023-09-20 19:29:32,380][106484] Updated weights for policy 0, policy_version 12880 (0.0013) [2023-09-20 19:29:32,380][106486] Updated weights for policy 1, policy_version 12880 (0.0013) [2023-09-20 19:29:36,100][105923] Fps is (10 sec: 13107.3, 60 sec: 13107.2, 300 sec: 12885.0). Total num frames: 13230080. Throughput: 0: 6548.6, 1: 6549.0. Samples: 13213200. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-20 19:29:36,101][105923] Avg episode reward: [(0, '71.867'), (1, '58.814')] [2023-09-20 19:29:36,106][106463] Saving ./train_dir/Swimmer/checkpoint_p0/checkpoint_000012920_6615040.pth... [2023-09-20 19:29:36,106][106467] Saving ./train_dir/Swimmer/checkpoint_p1/checkpoint_000012920_6615040.pth... [2023-09-20 19:29:36,110][106463] Removing ./train_dir/Swimmer/checkpoint_p0/checkpoint_000012544_6422528.pth [2023-09-20 19:29:36,113][106467] Removing ./train_dir/Swimmer/checkpoint_p1/checkpoint_000012544_6422528.pth [2023-09-20 19:29:38,728][106486] Updated weights for policy 1, policy_version 12960 (0.0007) [2023-09-20 19:29:38,729][106484] Updated weights for policy 0, policy_version 12960 (0.0015) [2023-09-20 19:29:41,100][105923] Fps is (10 sec: 13107.3, 60 sec: 12970.7, 300 sec: 12885.0). Total num frames: 13295616. Throughput: 0: 6490.8, 1: 6490.6. Samples: 13287576. 
Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-20 19:29:41,101][105923] Avg episode reward: [(0, '74.702'), (1, '59.789')] [2023-09-20 19:29:45,042][105923] Keyboard interrupt detected in the event loop EvtLoop [Runner_EvtLoop, process=main process 105923], exiting... [2023-09-20 19:29:45,043][106490] Stopping RolloutWorker_w5... [2023-09-20 19:29:45,043][106489] Stopping RolloutWorker_w2... [2023-09-20 19:29:45,044][106490] Loop rollout_proc5_evt_loop terminating... [2023-09-20 19:29:45,044][106491] Stopping RolloutWorker_w6... [2023-09-20 19:29:45,044][106488] Stopping RolloutWorker_w3... [2023-09-20 19:29:45,044][106487] Stopping RolloutWorker_w1... [2023-09-20 19:29:45,044][106489] Loop rollout_proc2_evt_loop terminating... [2023-09-20 19:29:45,044][106524] Stopping RolloutWorker_w4... [2023-09-20 19:29:45,044][106556] Stopping RolloutWorker_w7... [2023-09-20 19:29:45,044][106485] Stopping RolloutWorker_w0... [2023-09-20 19:29:45,044][106488] Loop rollout_proc3_evt_loop terminating... [2023-09-20 19:29:45,044][106487] Loop rollout_proc1_evt_loop terminating... [2023-09-20 19:29:45,044][106491] Loop rollout_proc6_evt_loop terminating... [2023-09-20 19:29:45,044][106556] Loop rollout_proc7_evt_loop terminating... [2023-09-20 19:29:45,044][105923] Runner profile tree view: main_loop: 1040.2506 [2023-09-20 19:29:45,044][106524] Loop rollout_proc4_evt_loop terminating... [2023-09-20 19:29:45,044][106485] Loop rollout_proc0_evt_loop terminating... [2023-09-20 19:29:45,044][105923] Collected {0: 6672384, 1: 6672384}, FPS: 12828.4 [2023-09-20 19:29:45,049][106467] Stopping Batcher_1... [2023-09-20 19:29:45,049][106467] Loop batcher_evt_loop terminating... [2023-09-20 19:29:45,050][106463] Stopping Batcher_0... [2023-09-20 19:29:45,051][106463] Loop batcher_evt_loop terminating... [2023-09-20 19:29:45,103][106484] Weights refcount: 2 0 [2023-09-20 19:29:45,104][106486] Weights refcount: 2 0 [2023-09-20 19:29:45,104][106484] Stopping InferenceWorker_p0-w0... 
[2023-09-20 19:29:45,105][106486] Stopping InferenceWorker_p1-w0... [2023-09-20 19:29:45,105][106484] Loop inference_proc0-0_evt_loop terminating... [2023-09-20 19:29:45,105][106486] Loop inference_proc1-0_evt_loop terminating... [2023-09-20 19:29:45,105][106467] Saving ./train_dir/Swimmer/checkpoint_p1/checkpoint_000013040_6676480.pth... [2023-09-20 19:29:45,108][106467] Removing ./train_dir/Swimmer/checkpoint_p1/checkpoint_000012728_6516736.pth [2023-09-20 19:29:45,109][106467] Stopping LearnerWorker_p1... [2023-09-20 19:29:45,109][106467] Loop learner_proc1_evt_loop terminating... [2023-09-20 19:29:45,109][106463] Saving ./train_dir/Swimmer/checkpoint_p0/checkpoint_000013040_6676480.pth... [2023-09-20 19:29:45,112][106463] Removing ./train_dir/Swimmer/checkpoint_p0/checkpoint_000012728_6516736.pth [2023-09-20 19:29:45,113][106463] Stopping LearnerWorker_p0... [2023-09-20 19:29:45,113][106463] Loop learner_proc0_evt_loop terminating...