[2023-09-21 13:23:17,827][52220] Saving configuration to ./train_dir/DoublePendulum/config.json...
[2023-09-21 13:23:17,892][52220] Rollout worker 0 uses device cpu
[2023-09-21 13:23:17,893][52220] Rollout worker 1 uses device cpu
[2023-09-21 13:23:17,894][52220] Rollout worker 2 uses device cpu
[2023-09-21 13:23:17,894][52220] Rollout worker 3 uses device cpu
[2023-09-21 13:23:17,895][52220] Rollout worker 4 uses device cpu
[2023-09-21 13:23:17,896][52220] Rollout worker 5 uses device cpu
[2023-09-21 13:23:17,896][52220] Rollout worker 6 uses device cpu
[2023-09-21 13:23:17,897][52220] Rollout worker 7 uses device cpu
[2023-09-21 13:23:17,897][52220] In synchronous mode, we only accumulate one batch. Setting num_batches_to_accumulate to 1
[2023-09-21 13:23:17,949][52220] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-09-21 13:23:17,950][52220] InferenceWorker_p0-w0: min num requests: 1
[2023-09-21 13:23:17,953][52220] Using GPUs [1] for process 1 (actually maps to GPUs [1])
[2023-09-21 13:23:17,953][52220] InferenceWorker_p1-w0: min num requests: 1
[2023-09-21 13:23:17,977][52220] Starting all processes...
[2023-09-21 13:23:17,977][52220] Starting process learner_proc0
[2023-09-21 13:23:17,980][52220] Starting process learner_proc1
[2023-09-21 13:23:18,027][52220] Starting all processes...
[2023-09-21 13:23:18,033][52220] Starting process inference_proc0-0
[2023-09-21 13:23:18,034][52220] Starting process inference_proc1-0
[2023-09-21 13:23:18,034][52220] Starting process rollout_proc0
[2023-09-21 13:23:18,034][52220] Starting process rollout_proc1
[2023-09-21 13:23:18,035][52220] Starting process rollout_proc2
[2023-09-21 13:23:18,035][52220] Starting process rollout_proc3
[2023-09-21 13:23:18,039][52220] Starting process rollout_proc4
[2023-09-21 13:23:18,039][52220] Starting process rollout_proc5
[2023-09-21 13:23:18,040][52220] Starting process rollout_proc6
[2023-09-21 13:23:18,043][52220] Starting process rollout_proc7
[2023-09-21 13:23:19,835][52884] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-09-21 13:23:19,835][52884] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2023-09-21 13:23:19,840][52885] Using GPUs [1] for process 1 (actually maps to GPUs [1])
[2023-09-21 13:23:19,840][52885] Set environment var CUDA_VISIBLE_DEVICES to '1' (GPU indices [1]) for learning process 1
[2023-09-21 13:23:19,853][52884] Num visible devices: 1
[2023-09-21 13:23:19,858][52885] Num visible devices: 1
[2023-09-21 13:23:19,875][52884] Starting seed is not provided
[2023-09-21 13:23:19,875][52884] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-09-21 13:23:19,875][52884] Initializing actor-critic model on device cuda:0
[2023-09-21 13:23:19,876][52884] RunningMeanStd input shape: (11,)
[2023-09-21 13:23:19,876][52884] RunningMeanStd input shape: (1,)
[2023-09-21 13:23:19,896][52885] Starting seed is not provided
[2023-09-21 13:23:19,896][52885] Using GPUs [0] for process 1 (actually maps to GPUs [1])
[2023-09-21 13:23:19,896][52885] Initializing actor-critic model on device cuda:0
[2023-09-21 13:23:19,897][52885] RunningMeanStd input shape: (11,)
[2023-09-21 13:23:19,897][52885] RunningMeanStd input shape: (1,)
[2023-09-21 13:23:19,912][52984] Worker 1 uses CPU cores [4, 5, 6, 7]
[2023-09-21 13:23:19,917][52979] Using GPUs [1] for process 1 (actually maps to GPUs [1])
[2023-09-21 13:23:19,917][52979] Set environment var CUDA_VISIBLE_DEVICES to '1' (GPU indices [1]) for inference process 1
[2023-09-21 13:23:19,925][52884] Created Actor Critic model with architecture:
[2023-09-21 13:23:19,925][52884] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): MultiInputEncoder(
    (encoders): ModuleDict(
      (obs): MlpEncoder(
        (mlp_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=Tanh)
          (2): RecursiveScriptModule(original_name=Linear)
          (3): RecursiveScriptModule(original_name=Tanh)
        )
      )
    )
  )
  (core): ModelCoreIdentity()
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=64, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationContinuousNonAdaptiveStddev(
    (distribution_linear): Linear(in_features=64, out_features=1, bias=True)
  )
)
[2023-09-21 13:23:19,932][52986] Worker 3 uses CPU cores [12, 13, 14, 15]
[2023-09-21 13:23:19,962][52979] Num visible devices: 1
[2023-09-21 13:23:19,984][52990] Worker 7 uses CPU cores [28, 29, 30, 31]
[2023-09-21 13:23:19,986][52985] Worker 2 uses CPU cores [8, 9, 10, 11]
[2023-09-21 13:23:19,986][52885] Created Actor Critic model with architecture:
[2023-09-21 13:23:19,986][52885] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): MultiInputEncoder(
    (encoders): ModuleDict(
      (obs): MlpEncoder(
        (mlp_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=Tanh)
          (2): RecursiveScriptModule(original_name=Linear)
          (3): RecursiveScriptModule(original_name=Tanh)
        )
      )
    )
  )
  (core): ModelCoreIdentity()
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=64, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationContinuousNonAdaptiveStddev(
    (distribution_linear): Linear(in_features=64, out_features=1, bias=True)
  )
)
[2023-09-21 13:23:20,065][52982] Worker 0 uses CPU cores [0, 1, 2, 3]
[2023-09-21 13:23:20,079][52980] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-09-21 13:23:20,079][52980] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2023-09-21 13:23:20,093][52988] Worker 5 uses CPU cores [20, 21, 22, 23]
[2023-09-21 13:23:20,097][52980] Num visible devices: 1
[2023-09-21 13:23:20,121][52987] Worker 4 uses CPU cores [16, 17, 18, 19]
[2023-09-21 13:23:20,153][52989] Worker 6 uses CPU cores [24, 25, 26, 27]
[2023-09-21 13:23:20,537][52884] Using optimizer
[2023-09-21 13:23:20,537][52884] No checkpoints found
[2023-09-21 13:23:20,538][52884] Did not load from checkpoint, starting from scratch!
[2023-09-21 13:23:20,538][52884] Initialized policy 0 weights for model version 0
[2023-09-21 13:23:20,539][52884] LearnerWorker_p0 finished initialization!
[2023-09-21 13:23:20,540][52884] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-09-21 13:23:20,583][52885] Using optimizer
[2023-09-21 13:23:20,584][52885] No checkpoints found
[2023-09-21 13:23:20,584][52885] Did not load from checkpoint, starting from scratch!
[2023-09-21 13:23:20,584][52885] Initialized policy 1 weights for model version 0
[2023-09-21 13:23:20,586][52885] LearnerWorker_p1 finished initialization!
[2023-09-21 13:23:20,586][52885] Using GPUs [0] for process 1 (actually maps to GPUs [1])
[2023-09-21 13:23:21,118][52980] RunningMeanStd input shape: (11,)
[2023-09-21 13:23:21,118][52980] RunningMeanStd input shape: (1,)
[2023-09-21 13:23:21,129][52979] RunningMeanStd input shape: (11,)
[2023-09-21 13:23:21,130][52979] RunningMeanStd input shape: (1,)
[2023-09-21 13:23:21,151][52220] Inference worker 0-0 is ready!
[2023-09-21 13:23:21,163][52220] Inference worker 1-0 is ready!
[2023-09-21 13:23:21,164][52220] All inference workers are ready! Signal rollout workers to start!
[2023-09-21 13:23:21,247][52986] Decorrelating experience for 0 frames...
[2023-09-21 13:23:21,248][52986] Decorrelating experience for 64 frames...
[2023-09-21 13:23:21,248][52990] Decorrelating experience for 0 frames...
[2023-09-21 13:23:21,249][52990] Decorrelating experience for 64 frames...
[2023-09-21 13:23:21,252][52985] Decorrelating experience for 0 frames...
[2023-09-21 13:23:21,253][52985] Decorrelating experience for 64 frames...
[2023-09-21 13:23:21,256][52989] Decorrelating experience for 0 frames...
[2023-09-21 13:23:21,257][52982] Decorrelating experience for 0 frames...
[2023-09-21 13:23:21,257][52989] Decorrelating experience for 64 frames...
[2023-09-21 13:23:21,257][52982] Decorrelating experience for 64 frames...
[2023-09-21 13:23:21,260][52987] Decorrelating experience for 0 frames...
[2023-09-21 13:23:21,260][52987] Decorrelating experience for 64 frames...
[2023-09-21 13:23:21,262][52986] Decorrelating experience for 128 frames...
[2023-09-21 13:23:21,262][52990] Decorrelating experience for 128 frames...
[2023-09-21 13:23:21,266][52985] Decorrelating experience for 128 frames...
[2023-09-21 13:23:21,270][52989] Decorrelating experience for 128 frames...
[2023-09-21 13:23:21,271][52982] Decorrelating experience for 128 frames...
[2023-09-21 13:23:21,273][52987] Decorrelating experience for 128 frames...
[2023-09-21 13:23:21,287][52986] Decorrelating experience for 192 frames...
[2023-09-21 13:23:21,288][52990] Decorrelating experience for 192 frames...
[2023-09-21 13:23:21,290][52985] Decorrelating experience for 192 frames...
[2023-09-21 13:23:21,295][52989] Decorrelating experience for 192 frames...
[2023-09-21 13:23:21,296][52982] Decorrelating experience for 192 frames...
[2023-09-21 13:23:21,295][52984] Decorrelating experience for 0 frames...
[2023-09-21 13:23:21,295][52988] Decorrelating experience for 0 frames...
[2023-09-21 13:23:21,296][52984] Decorrelating experience for 64 frames...
[2023-09-21 13:23:21,296][52988] Decorrelating experience for 64 frames...
[2023-09-21 13:23:21,299][52987] Decorrelating experience for 192 frames...
[2023-09-21 13:23:21,320][52984] Decorrelating experience for 128 frames...
[2023-09-21 13:23:21,320][52988] Decorrelating experience for 128 frames...
[2023-09-21 13:23:21,332][52990] Decorrelating experience for 256 frames...
[2023-09-21 13:23:21,333][52986] Decorrelating experience for 256 frames...
[2023-09-21 13:23:21,334][52985] Decorrelating experience for 256 frames...
[2023-09-21 13:23:21,339][52989] Decorrelating experience for 256 frames...
[2023-09-21 13:23:21,342][52982] Decorrelating experience for 256 frames...
[2023-09-21 13:23:21,344][52987] Decorrelating experience for 256 frames...
[2023-09-21 13:23:21,366][52984] Decorrelating experience for 192 frames...
[2023-09-21 13:23:21,367][52988] Decorrelating experience for 192 frames...
[2023-09-21 13:23:21,381][52990] Decorrelating experience for 320 frames...
[2023-09-21 13:23:21,382][52985] Decorrelating experience for 320 frames...
[2023-09-21 13:23:21,382][52986] Decorrelating experience for 320 frames...
[2023-09-21 13:23:21,388][52989] Decorrelating experience for 320 frames...
[2023-09-21 13:23:21,390][52982] Decorrelating experience for 320 frames...
[2023-09-21 13:23:21,394][52987] Decorrelating experience for 320 frames...
[2023-09-21 13:23:21,433][52988] Decorrelating experience for 256 frames...
[2023-09-21 13:23:21,438][52984] Decorrelating experience for 256 frames...
[2023-09-21 13:23:21,441][52990] Decorrelating experience for 384 frames...
[2023-09-21 13:23:21,441][52985] Decorrelating experience for 384 frames...
[2023-09-21 13:23:21,444][52986] Decorrelating experience for 384 frames...
[2023-09-21 13:23:21,449][52989] Decorrelating experience for 384 frames...
[2023-09-21 13:23:21,451][52982] Decorrelating experience for 384 frames...
[2023-09-21 13:23:21,457][52987] Decorrelating experience for 384 frames...
[2023-09-21 13:23:21,483][52988] Decorrelating experience for 320 frames...
[2023-09-21 13:23:21,486][52984] Decorrelating experience for 320 frames...
[2023-09-21 13:23:21,513][52985] Decorrelating experience for 448 frames...
[2023-09-21 13:23:21,514][52990] Decorrelating experience for 448 frames...
[2023-09-21 13:23:21,519][52986] Decorrelating experience for 448 frames...
[2023-09-21 13:23:21,522][52989] Decorrelating experience for 448 frames...
[2023-09-21 13:23:21,526][52982] Decorrelating experience for 448 frames...
[2023-09-21 13:23:21,532][52987] Decorrelating experience for 448 frames...
[2023-09-21 13:23:21,544][52988] Decorrelating experience for 384 frames...
[2023-09-21 13:23:21,547][52984] Decorrelating experience for 384 frames...
[2023-09-21 13:23:21,618][52988] Decorrelating experience for 448 frames...
[2023-09-21 13:23:21,621][52984] Decorrelating experience for 448 frames...
[2023-09-21 13:23:24,286][52220] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 8192. Throughput: 0: nan, 1: nan. Samples: 12962. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:23:24,287][52220] Avg episode reward: [(0, '58.179'), (1, '48.837')]
[2023-09-21 13:23:29,286][52220] Fps is (10 sec: 9830.6, 60 sec: 9830.6, 300 sec: 9830.6). Total num frames: 57344. Throughput: 0: 2432.8, 1: 2433.2. Samples: 37292. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
[2023-09-21 13:23:29,287][52220] Avg episode reward: [(0, '112.394'), (1, '100.528')]
[2023-09-21 13:23:29,291][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000000056_28672.pth...
[2023-09-21 13:23:29,291][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000000056_28672.pth...
[2023-09-21 13:23:30,770][52980] Updated weights for policy 0, policy_version 80 (0.0015)
[2023-09-21 13:23:30,770][52979] Updated weights for policy 1, policy_version 80 (0.0015)
[2023-09-21 13:23:34,287][52220] Fps is (10 sec: 11468.5, 60 sec: 11468.5, 300 sec: 11468.5). Total num frames: 122880. Throughput: 0: 5028.3, 1: 5027.7. Samples: 113524. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:23:34,288][52220] Avg episode reward: [(0, '177.707'), (1, '178.329')]
[2023-09-21 13:23:37,163][52980] Updated weights for policy 0, policy_version 160 (0.0012)
[2023-09-21 13:23:37,164][52979] Updated weights for policy 1, policy_version 160 (0.0014)
[2023-09-21 13:23:37,937][52220] Heartbeat connected on Batcher_0
[2023-09-21 13:23:37,940][52220] Heartbeat connected on LearnerWorker_p0
[2023-09-21 13:23:37,943][52220] Heartbeat connected on Batcher_1
[2023-09-21 13:23:37,946][52220] Heartbeat connected on LearnerWorker_p1
[2023-09-21 13:23:37,952][52220] Heartbeat connected on InferenceWorker_p0-w0
[2023-09-21 13:23:37,956][52220] Heartbeat connected on RolloutWorker_w0
[2023-09-21 13:23:37,957][52220] Heartbeat connected on InferenceWorker_p1-w0
[2023-09-21 13:23:37,961][52220] Heartbeat connected on RolloutWorker_w1
[2023-09-21 13:23:37,963][52220] Heartbeat connected on RolloutWorker_w2
[2023-09-21 13:23:37,966][52220] Heartbeat connected on RolloutWorker_w3
[2023-09-21 13:23:37,968][52220] Heartbeat connected on RolloutWorker_w4
[2023-09-21 13:23:37,970][52220] Heartbeat connected on RolloutWorker_w5
[2023-09-21 13:23:37,973][52220] Heartbeat connected on RolloutWorker_w6
[2023-09-21 13:23:37,975][52220] Heartbeat connected on RolloutWorker_w7
[2023-09-21 13:23:39,287][52220] Fps is (10 sec: 13106.7, 60 sec: 12014.7, 300 sec: 12014.7). Total num frames: 188416. Throughput: 0: 5892.0, 1: 5893.5. Samples: 189748. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
[2023-09-21 13:23:39,288][52220] Avg episode reward: [(0, '224.041'), (1, '220.758')]
[2023-09-21 13:23:43,737][52980] Updated weights for policy 0, policy_version 240 (0.0012)
[2023-09-21 13:23:43,738][52979] Updated weights for policy 1, policy_version 240 (0.0015)
[2023-09-21 13:23:44,287][52220] Fps is (10 sec: 12288.1, 60 sec: 11878.3, 300 sec: 11878.3). Total num frames: 245760. Throughput: 0: 5328.7, 1: 5329.1. Samples: 226122. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-09-21 13:23:44,287][52220] Avg episode reward: [(0, '297.844'), (1, '299.326')]
[2023-09-21 13:23:44,294][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000000240_122880.pth...
[2023-09-21 13:23:44,294][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000000240_122880.pth...
[2023-09-21 13:23:44,303][52885] Saving new best policy, reward=299.326!
[2023-09-21 13:23:44,304][52884] Saving new best policy, reward=297.844!
[2023-09-21 13:23:49,286][52220] Fps is (10 sec: 12288.2, 60 sec: 12124.1, 300 sec: 12124.1). Total num frames: 311296. Throughput: 0: 5801.9, 1: 5799.6. Samples: 303000. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
[2023-09-21 13:23:49,287][52220] Avg episode reward: [(0, '333.954'), (1, '360.469')]
[2023-09-21 13:23:49,289][52885] Saving new best policy, reward=360.469!
[2023-09-21 13:23:49,289][52884] Saving new best policy, reward=333.954!
[2023-09-21 13:23:50,207][52980] Updated weights for policy 0, policy_version 320 (0.0014)
[2023-09-21 13:23:50,207][52979] Updated weights for policy 1, policy_version 320 (0.0014)
[2023-09-21 13:23:54,287][52220] Fps is (10 sec: 13107.1, 60 sec: 12287.9, 300 sec: 12287.9). Total num frames: 376832. Throughput: 0: 6077.8, 1: 6078.5. Samples: 377654. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:23:54,288][52220] Avg episode reward: [(0, '405.556'), (1, '490.471')]
[2023-09-21 13:23:54,289][52885] Saving new best policy, reward=490.471!
[2023-09-21 13:23:54,289][52884] Saving new best policy, reward=405.556!
[2023-09-21 13:23:56,804][52980] Updated weights for policy 0, policy_version 400 (0.0015)
[2023-09-21 13:23:56,804][52979] Updated weights for policy 1, policy_version 400 (0.0013)
[2023-09-21 13:23:59,287][52220] Fps is (10 sec: 12287.5, 60 sec: 12170.8, 300 sec: 12170.8). Total num frames: 434176. Throughput: 0: 5736.0, 1: 5736.4. Samples: 414500. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-09-21 13:23:59,288][52220] Avg episode reward: [(0, '530.836'), (1, '625.912')]
[2023-09-21 13:23:59,297][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000000424_217088.pth...
[2023-09-21 13:23:59,297][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000000424_217088.pth...
[2023-09-21 13:23:59,304][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000000056_28672.pth
[2023-09-21 13:23:59,305][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000000056_28672.pth
[2023-09-21 13:23:59,305][52885] Saving new best policy, reward=625.912!
[2023-09-21 13:23:59,305][52884] Saving new best policy, reward=530.836!
[2023-09-21 13:24:03,331][52980] Updated weights for policy 0, policy_version 480 (0.0013)
[2023-09-21 13:24:03,331][52979] Updated weights for policy 1, policy_version 480 (0.0013)
[2023-09-21 13:24:04,286][52220] Fps is (10 sec: 12288.3, 60 sec: 12288.0, 300 sec: 12288.0). Total num frames: 499712. Throughput: 0: 5977.2, 1: 5976.6. Samples: 491118. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:24:04,287][52220] Avg episode reward: [(0, '811.461'), (1, '833.450')]
[2023-09-21 13:24:04,288][52884] Saving new best policy, reward=811.461!
[2023-09-21 13:24:04,288][52885] Saving new best policy, reward=833.450!
[2023-09-21 13:24:09,286][52220] Fps is (10 sec: 13107.8, 60 sec: 12379.0, 300 sec: 12379.0). Total num frames: 565248. Throughput: 0: 6138.2, 1: 6136.3. Samples: 565318. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:24:09,287][52220] Avg episode reward: [(0, '1132.216'), (1, '1290.527')]
[2023-09-21 13:24:09,288][52884] Saving new best policy, reward=1132.216!
[2023-09-21 13:24:09,288][52885] Saving new best policy, reward=1290.527!
[2023-09-21 13:24:09,772][52980] Updated weights for policy 0, policy_version 560 (0.0014)
[2023-09-21 13:24:09,772][52979] Updated weights for policy 1, policy_version 560 (0.0014)
[2023-09-21 13:24:14,287][52220] Fps is (10 sec: 13107.0, 60 sec: 12451.8, 300 sec: 12451.8). Total num frames: 630784. Throughput: 0: 6322.2, 1: 6320.4. Samples: 606212. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:24:14,287][52220] Avg episode reward: [(0, '1820.030'), (1, '2335.145')]
[2023-09-21 13:24:14,297][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000000616_315392.pth...
[2023-09-21 13:24:14,297][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000000616_315392.pth...
[2023-09-21 13:24:14,302][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000000240_122880.pth
[2023-09-21 13:24:14,302][52885] Saving new best policy, reward=2335.145!
[2023-09-21 13:24:14,304][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000000240_122880.pth
[2023-09-21 13:24:14,305][52884] Saving new best policy, reward=1820.030!
[2023-09-21 13:24:15,764][52979] Updated weights for policy 1, policy_version 640 (0.0014)
[2023-09-21 13:24:15,765][52980] Updated weights for policy 0, policy_version 640 (0.0013)
[2023-09-21 13:24:19,287][52220] Fps is (10 sec: 13107.0, 60 sec: 12511.4, 300 sec: 12511.4). Total num frames: 696320. Throughput: 0: 6386.1, 1: 6384.4. Samples: 688194. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0)
[2023-09-21 13:24:19,288][52220] Avg episode reward: [(0, '3468.930'), (1, '4412.869')]
[2023-09-21 13:24:19,289][52884] Saving new best policy, reward=3468.930!
[2023-09-21 13:24:19,289][52885] Saving new best policy, reward=4412.869!
[2023-09-21 13:24:21,997][52979] Updated weights for policy 1, policy_version 720 (0.0013)
[2023-09-21 13:24:21,998][52980] Updated weights for policy 0, policy_version 720 (0.0016)
[2023-09-21 13:24:24,287][52220] Fps is (10 sec: 13107.3, 60 sec: 12561.0, 300 sec: 12561.0). Total num frames: 761856. Throughput: 0: 6405.2, 1: 6405.3. Samples: 766218. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0)
[2023-09-21 13:24:24,288][52220] Avg episode reward: [(0, '5840.580'), (1, '5940.595')]
[2023-09-21 13:24:24,289][52885] Saving new best policy, reward=5940.595!
[2023-09-21 13:24:24,289][52884] Saving new best policy, reward=5840.580!
[2023-09-21 13:24:28,294][52980] Updated weights for policy 0, policy_version 800 (0.0012)
[2023-09-21 13:24:28,294][52979] Updated weights for policy 1, policy_version 800 (0.0013)
[2023-09-21 13:24:29,287][52220] Fps is (10 sec: 13107.1, 60 sec: 12834.0, 300 sec: 12603.0). Total num frames: 827392. Throughput: 0: 6422.8, 1: 6422.5. Samples: 804164. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0)
[2023-09-21 13:24:29,288][52220] Avg episode reward: [(0, '7623.753'), (1, '6997.287')]
[2023-09-21 13:24:29,294][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000000808_413696.pth...
[2023-09-21 13:24:29,295][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000000808_413696.pth...
[2023-09-21 13:24:29,298][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000000424_217088.pth
[2023-09-21 13:24:29,298][52884] Saving new best policy, reward=7623.753!
[2023-09-21 13:24:29,303][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000000424_217088.pth
[2023-09-21 13:24:29,303][52885] Saving new best policy, reward=6997.287!
[2023-09-21 13:24:34,286][52220] Fps is (10 sec: 13107.3, 60 sec: 12834.2, 300 sec: 12639.1). Total num frames: 892928. Throughput: 0: 6415.3, 1: 6417.3. Samples: 880470. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-09-21 13:24:34,287][52220] Avg episode reward: [(0, '8302.445'), (1, '7779.651')]
[2023-09-21 13:24:34,288][52884] Saving new best policy, reward=8302.445!
[2023-09-21 13:24:34,288][52885] Saving new best policy, reward=7779.651!
[2023-09-21 13:24:34,804][52980] Updated weights for policy 0, policy_version 880 (0.0013)
[2023-09-21 13:24:34,806][52979] Updated weights for policy 1, policy_version 880 (0.0016)
[2023-09-21 13:24:39,287][52220] Fps is (10 sec: 12288.1, 60 sec: 12697.6, 300 sec: 12561.0). Total num frames: 950272. Throughput: 0: 6449.9, 1: 6449.0. Samples: 958104. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:24:39,288][52220] Avg episode reward: [(0, '8446.668'), (1, '8495.056')]
[2023-09-21 13:24:39,295][52885] Saving new best policy, reward=8495.056!
[2023-09-21 13:24:39,302][52884] Saving new best policy, reward=8446.668!
[2023-09-21 13:24:41,205][52980] Updated weights for policy 0, policy_version 960 (0.0014)
[2023-09-21 13:24:41,205][52979] Updated weights for policy 1, policy_version 960 (0.0015)
[2023-09-21 13:24:44,286][52220] Fps is (10 sec: 12288.1, 60 sec: 12834.2, 300 sec: 12595.2). Total num frames: 1015808. Throughput: 0: 6460.1, 1: 6460.2. Samples: 995908. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
[2023-09-21 13:24:44,287][52220] Avg episode reward: [(0, '8409.842'), (1, '8561.365')]
[2023-09-21 13:24:44,291][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000000992_507904.pth...
[2023-09-21 13:24:44,292][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000000992_507904.pth...
[2023-09-21 13:24:44,296][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000000616_315392.pth
[2023-09-21 13:24:44,296][52885] Saving new best policy, reward=8561.365!
[2023-09-21 13:24:44,297][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000000616_315392.pth
[2023-09-21 13:24:47,855][52979] Updated weights for policy 1, policy_version 1040 (0.0009)
[2023-09-21 13:24:47,856][52980] Updated weights for policy 0, policy_version 1040 (0.0015)
[2023-09-21 13:24:49,286][52220] Fps is (10 sec: 13107.4, 60 sec: 12834.1, 300 sec: 12625.3). Total num frames: 1081344. Throughput: 0: 6419.3, 1: 6419.3. Samples: 1068854. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:24:49,287][52220] Avg episode reward: [(0, '8587.504'), (1, '8914.431')]
[2023-09-21 13:24:49,288][52885] Saving new best policy, reward=8914.431!
[2023-09-21 13:24:49,288][52884] Saving new best policy, reward=8587.504!
[2023-09-21 13:24:54,166][52979] Updated weights for policy 1, policy_version 1120 (0.0011)
[2023-09-21 13:24:54,166][52980] Updated weights for policy 0, policy_version 1120 (0.0015)
[2023-09-21 13:24:54,287][52220] Fps is (10 sec: 13107.0, 60 sec: 12834.2, 300 sec: 12652.1). Total num frames: 1146880. Throughput: 0: 6462.6, 1: 6462.5. Samples: 1146948. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:24:54,288][52220] Avg episode reward: [(0, '8752.820'), (1, '8903.087')]
[2023-09-21 13:24:54,289][52884] Saving new best policy, reward=8752.820!
[2023-09-21 13:24:59,287][52220] Fps is (10 sec: 13106.9, 60 sec: 12970.7, 300 sec: 12676.0). Total num frames: 1212416. Throughput: 0: 6445.4, 1: 6448.0. Samples: 1186416. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:24:59,287][52220] Avg episode reward: [(0, '8885.027'), (1, '8883.166')]
[2023-09-21 13:24:59,293][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000001184_606208.pth...
[2023-09-21 13:24:59,293][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000001184_606208.pth...
[2023-09-21 13:24:59,296][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000000808_413696.pth
[2023-09-21 13:24:59,296][52884] Saving new best policy, reward=8885.027!
[2023-09-21 13:24:59,297][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000000808_413696.pth
[2023-09-21 13:25:00,488][52979] Updated weights for policy 1, policy_version 1200 (0.0013)
[2023-09-21 13:25:00,488][52980] Updated weights for policy 0, policy_version 1200 (0.0011)
[2023-09-21 13:25:04,286][52220] Fps is (10 sec: 13107.4, 60 sec: 12970.7, 300 sec: 12697.6). Total num frames: 1277952. Throughput: 0: 6396.5, 1: 6398.3. Samples: 1263958. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0)
[2023-09-21 13:25:04,287][52220] Avg episode reward: [(0, '8887.110'), (1, '8976.377')]
[2023-09-21 13:25:04,288][52885] Saving new best policy, reward=8976.377!
[2023-09-21 13:25:04,288][52884] Saving new best policy, reward=8887.110!
[2023-09-21 13:25:06,706][52980] Updated weights for policy 0, policy_version 1280 (0.0014)
[2023-09-21 13:25:06,706][52979] Updated weights for policy 1, policy_version 1280 (0.0009)
[2023-09-21 13:25:09,287][52220] Fps is (10 sec: 13107.3, 60 sec: 12970.6, 300 sec: 12717.1). Total num frames: 1343488. Throughput: 0: 6416.0, 1: 6413.6. Samples: 1343552. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:25:09,287][52220] Avg episode reward: [(0, '9073.447'), (1, '9070.914')]
[2023-09-21 13:25:09,289][52884] Saving new best policy, reward=9073.447!
[2023-09-21 13:25:09,289][52885] Saving new best policy, reward=9070.914!
[2023-09-21 13:25:12,903][52980] Updated weights for policy 0, policy_version 1360 (0.0012)
[2023-09-21 13:25:12,903][52979] Updated weights for policy 1, policy_version 1360 (0.0015)
[2023-09-21 13:25:14,286][52220] Fps is (10 sec: 13107.3, 60 sec: 12970.7, 300 sec: 12734.8). Total num frames: 1409024. Throughput: 0: 6430.3, 1: 6430.1. Samples: 1382876. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
[2023-09-21 13:25:14,287][52220] Avg episode reward: [(0, '9257.818'), (1, '9254.347')]
[2023-09-21 13:25:14,292][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000001376_704512.pth...
[2023-09-21 13:25:14,292][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000001376_704512.pth...
[2023-09-21 13:25:14,299][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000000992_507904.pth
[2023-09-21 13:25:14,299][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000000992_507904.pth
[2023-09-21 13:25:14,300][52884] Saving new best policy, reward=9257.818!
[2023-09-21 13:25:14,300][52885] Saving new best policy, reward=9254.347!
[2023-09-21 13:25:19,286][52220] Fps is (10 sec: 12288.0, 60 sec: 12834.1, 300 sec: 12679.8). Total num frames: 1466368. Throughput: 0: 6424.4, 1: 6423.9. Samples: 1458646. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:25:19,287][52220] Avg episode reward: [(0, '9349.989'), (1, '9256.270')]
[2023-09-21 13:25:19,289][52884] Saving new best policy, reward=9349.989!
[2023-09-21 13:25:19,289][52885] Saving new best policy, reward=9256.270!
[2023-09-21 13:25:19,413][52980] Updated weights for policy 0, policy_version 1440 (0.0010)
[2023-09-21 13:25:19,414][52979] Updated weights for policy 1, policy_version 1440 (0.0016)
[2023-09-21 13:25:24,286][52220] Fps is (10 sec: 13107.2, 60 sec: 12970.7, 300 sec: 12765.9). Total num frames: 1540096. Throughput: 0: 6464.2, 1: 6463.7. Samples: 1539856. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0)
[2023-09-21 13:25:24,287][52220] Avg episode reward: [(0, '9351.115'), (1, '9350.083')]
[2023-09-21 13:25:24,288][52885] Saving new best policy, reward=9350.083!
[2023-09-21 13:25:24,288][52884] Saving new best policy, reward=9351.115!
[2023-09-21 13:25:25,490][52979] Updated weights for policy 1, policy_version 1520 (0.0012)
[2023-09-21 13:25:25,492][52980] Updated weights for policy 0, policy_version 1520 (0.0014)
[2023-09-21 13:25:29,287][52220] Fps is (10 sec: 13926.2, 60 sec: 12970.7, 300 sec: 12779.5). Total num frames: 1605632. Throughput: 0: 6502.8, 1: 6501.2. Samples: 1581092. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:25:29,288][52220] Avg episode reward: [(0, '8733.420'), (1, '9351.004')]
[2023-09-21 13:25:29,296][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000001568_802816.pth...
[2023-09-21 13:25:29,296][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000001568_802816.pth...
[2023-09-21 13:25:29,304][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000001184_606208.pth
[2023-09-21 13:25:29,304][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000001184_606208.pth
[2023-09-21 13:25:29,304][52885] Saving new best policy, reward=9351.004!
[2023-09-21 13:25:31,687][52979] Updated weights for policy 1, policy_version 1600 (0.0016)
[2023-09-21 13:25:31,687][52980] Updated weights for policy 0, policy_version 1600 (0.0018)
[2023-09-21 13:25:34,286][52220] Fps is (10 sec: 13107.0, 60 sec: 12970.7, 300 sec: 12792.1). Total num frames: 1671168. Throughput: 0: 6560.7, 1: 6561.7. Samples: 1659366. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:25:34,287][52220] Avg episode reward: [(0, '8198.340'), (1, '9351.573')]
[2023-09-21 13:25:34,289][52885] Saving new best policy, reward=9351.573!
[2023-09-21 13:25:37,878][52980] Updated weights for policy 0, policy_version 1680 (0.0015)
[2023-09-21 13:25:37,878][52979] Updated weights for policy 1, policy_version 1680 (0.0014)
[2023-09-21 13:25:39,286][52220] Fps is (10 sec: 13107.6, 60 sec: 13107.3, 300 sec: 12803.8). Total num frames: 1736704. Throughput: 0: 6552.9, 1: 6552.9. Samples: 1736706. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0)
[2023-09-21 13:25:39,287][52220] Avg episode reward: [(0, '7933.386'), (1, '9259.461')]
[2023-09-21 13:25:44,260][52979] Updated weights for policy 1, policy_version 1760 (0.0013)
[2023-09-21 13:25:44,260][52980] Updated weights for policy 0, policy_version 1760 (0.0013)
[2023-09-21 13:25:44,287][52220] Fps is (10 sec: 13106.7, 60 sec: 13107.1, 300 sec: 12814.6). Total num frames: 1802240. Throughput: 0: 6544.5, 1: 6544.1. Samples: 1775406. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0)
[2023-09-21 13:25:44,288][52220] Avg episode reward: [(0, '8092.241'), (1, '9260.352')]
[2023-09-21 13:25:44,295][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000001760_901120.pth...
[2023-09-21 13:25:44,296][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000001760_901120.pth...
[2023-09-21 13:25:44,299][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000001376_704512.pth
[2023-09-21 13:25:44,301][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000001376_704512.pth
[2023-09-21 13:25:49,286][52220] Fps is (10 sec: 13107.1, 60 sec: 13107.2, 300 sec: 12824.7). Total num frames: 1867776. Throughput: 0: 6560.9, 1: 6561.2. Samples: 1854456. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0)
[2023-09-21 13:25:49,287][52220] Avg episode reward: [(0, '8264.465'), (1, '9261.111')]
[2023-09-21 13:25:50,550][52980] Updated weights for policy 0, policy_version 1840 (0.0014)
[2023-09-21 13:25:50,550][52979] Updated weights for policy 1, policy_version 1840 (0.0014)
[2023-09-21 13:25:54,287][52220] Fps is (10 sec: 13107.6, 60 sec: 13107.2, 300 sec: 12834.1). Total num frames: 1933312. Throughput: 0: 6553.7, 1: 6553.7. Samples: 1933384. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0)
[2023-09-21 13:25:54,287][52220] Avg episode reward: [(0, '7817.697'), (1, '9354.608')]
[2023-09-21 13:25:54,289][52885] Saving new best policy, reward=9354.608!
[2023-09-21 13:25:56,795][52979] Updated weights for policy 1, policy_version 1920 (0.0012)
[2023-09-21 13:25:56,795][52980] Updated weights for policy 0, policy_version 1920 (0.0014)
[2023-09-21 13:25:59,286][52220] Fps is (10 sec: 12288.1, 60 sec: 12970.7, 300 sec: 12790.1). Total num frames: 1990656. Throughput: 0: 6544.5, 1: 6545.2. Samples: 1971916. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:25:59,287][52220] Avg episode reward: [(0, '6932.151'), (1, '9354.989')]
[2023-09-21 13:25:59,292][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000001944_995328.pth...
[2023-09-21 13:25:59,292][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000001944_995328.pth...
[2023-09-21 13:25:59,298][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000001568_802816.pth
[2023-09-21 13:25:59,298][52885] Saving new best policy, reward=9354.989!
[2023-09-21 13:25:59,301][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000001568_802816.pth
[2023-09-21 13:26:03,240][52979] Updated weights for policy 1, policy_version 2000 (0.0013)
[2023-09-21 13:26:03,240][52980] Updated weights for policy 0, policy_version 2000 (0.0013)
[2023-09-21 13:26:04,286][52220] Fps is (10 sec: 12288.3, 60 sec: 12970.7, 300 sec: 12800.0). Total num frames: 2056192. Throughput: 0: 6549.4, 1: 6547.3. Samples: 2047996. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:26:04,287][52220] Avg episode reward: [(0, '6922.979'), (1, '8869.455')]
[2023-09-21 13:26:09,286][52220] Fps is (10 sec: 13107.0, 60 sec: 12970.7, 300 sec: 12809.3). Total num frames: 2121728. Throughput: 0: 6532.9, 1: 6534.0. Samples: 2127868. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:26:09,288][52220] Avg episode reward: [(0, '6832.589'), (1, '1018.921')]
[2023-09-21 13:26:09,399][52979] Updated weights for policy 1, policy_version 2080 (0.0015)
[2023-09-21 13:26:09,400][52980] Updated weights for policy 0, policy_version 2080 (0.0015)
[2023-09-21 13:26:14,286][52220] Fps is (10 sec: 13106.9, 60 sec: 12970.6, 300 sec: 12818.1). Total num frames: 2187264. Throughput: 0: 6488.6, 1: 6489.9. Samples: 2165124. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:26:14,287][52220] Avg episode reward: [(0, '6299.769'), (1, '2917.314')]
[2023-09-21 13:26:14,295][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000002136_1093632.pth...
[2023-09-21 13:26:14,295][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000002136_1093632.pth...
[2023-09-21 13:26:14,300][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000001760_901120.pth
[2023-09-21 13:26:14,301][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000001760_901120.pth
[2023-09-21 13:26:15,925][52980] Updated weights for policy 0, policy_version 2160 (0.0012)
[2023-09-21 13:26:15,925][52979] Updated weights for policy 1, policy_version 2160 (0.0015)
[2023-09-21 13:26:19,286][52220] Fps is (10 sec: 13107.4, 60 sec: 13107.2, 300 sec: 12826.3). Total num frames: 2252800. Throughput: 0: 6467.7, 1: 6467.4. Samples: 2241444. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0)
[2023-09-21 13:26:19,287][52220] Avg episode reward: [(0, '417.647'), (1, '5806.103')]
[2023-09-21 13:26:22,247][52980] Updated weights for policy 0, policy_version 2240 (0.0014)
[2023-09-21 13:26:22,247][52979] Updated weights for policy 1, policy_version 2240 (0.0015)
[2023-09-21 13:26:24,287][52220] Fps is (10 sec: 13107.2, 60 sec: 12970.6, 300 sec: 12834.1). Total num frames: 2318336. Throughput: 0: 6473.8, 1: 6476.3. Samples: 2319464. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:26:24,288][52220] Avg episode reward: [(0, '674.690'), (1, '8673.284')]
[2023-09-21 13:26:28,391][52979] Updated weights for policy 1, policy_version 2320 (0.0014)
[2023-09-21 13:26:28,391][52980] Updated weights for policy 0, policy_version 2320 (0.0015)
[2023-09-21 13:26:29,286][52220] Fps is (10 sec: 13107.1, 60 sec: 12970.7, 300 sec: 12841.5). Total num frames: 2383872. Throughput: 0: 6492.4, 1: 6492.4. Samples: 2359718. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:26:29,287][52220] Avg episode reward: [(0, '3037.838'), (1, '7623.096')]
[2023-09-21 13:26:29,293][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000002328_1191936.pth...
[2023-09-21 13:26:29,293][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000002328_1191936.pth...
[2023-09-21 13:26:29,301][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000001944_995328.pth
[2023-09-21 13:26:29,301][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000001944_995328.pth
[2023-09-21 13:26:34,286][52220] Fps is (10 sec: 13107.2, 60 sec: 12970.7, 300 sec: 12848.5). Total num frames: 2449408. Throughput: 0: 6479.8, 1: 6480.2. Samples: 2437656. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:26:34,287][52220] Avg episode reward: [(0, '5721.071'), (1, '463.553')]
[2023-09-21 13:26:34,774][52979] Updated weights for policy 1, policy_version 2400 (0.0014)
[2023-09-21 13:26:34,774][52980] Updated weights for policy 0, policy_version 2400 (0.0014)
[2023-09-21 13:26:39,286][52220] Fps is (10 sec: 13107.1, 60 sec: 12970.6, 300 sec: 12855.1). Total num frames: 2514944. Throughput: 0: 6461.5, 1: 6461.8. Samples: 2514930. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:26:39,287][52220] Avg episode reward: [(0, '6041.492'), (1, '619.276')]
[2023-09-21 13:26:41,066][52980] Updated weights for policy 0, policy_version 2480 (0.0013)
[2023-09-21 13:26:41,066][52979] Updated weights for policy 1, policy_version 2480 (0.0012)
[2023-09-21 13:26:44,286][52220] Fps is (10 sec: 13107.3, 60 sec: 12970.7, 300 sec: 12861.4). Total num frames: 2580480. Throughput: 0: 6489.9, 1: 6487.6. Samples: 2555906. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0)
[2023-09-21 13:26:44,287][52220] Avg episode reward: [(0, '5855.563'), (1, '3436.696')]
[2023-09-21 13:26:44,293][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000002520_1290240.pth...
[2023-09-21 13:26:44,294][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000002520_1290240.pth...
[2023-09-21 13:26:44,299][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000002136_1093632.pth
[2023-09-21 13:26:44,300][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000002136_1093632.pth
[2023-09-21 13:26:47,194][52980] Updated weights for policy 0, policy_version 2560 (0.0013)
[2023-09-21 13:26:47,194][52979] Updated weights for policy 1, policy_version 2560 (0.0015)
[2023-09-21 13:26:49,286][52220] Fps is (10 sec: 13107.1, 60 sec: 12970.6, 300 sec: 12867.4). Total num frames: 2646016. Throughput: 0: 6546.0, 1: 6547.4. Samples: 2637204. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:26:49,287][52220] Avg episode reward: [(0, '6457.100'), (1, '6204.589')]
[2023-09-21 13:26:53,289][52980] Updated weights for policy 0, policy_version 2640 (0.0014)
[2023-09-21 13:26:53,289][52979] Updated weights for policy 1, policy_version 2640 (0.0013)
[2023-09-21 13:26:54,287][52220] Fps is (10 sec: 13107.1, 60 sec: 12970.7, 300 sec: 12873.1). Total num frames: 2711552. Throughput: 0: 6526.7, 1: 6526.7. Samples: 2715270. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0)
[2023-09-21 13:26:54,288][52220] Avg episode reward: [(0, '6807.248'), (1, '7389.904')]
[2023-09-21 13:26:59,287][52220] Fps is (10 sec: 13107.0, 60 sec: 13107.1, 300 sec: 12878.6). Total num frames: 2777088. Throughput: 0: 6551.0, 1: 6551.5. Samples: 2754738. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:26:59,288][52220] Avg episode reward: [(0, '6351.091'), (1, '7023.754')]
[2023-09-21 13:26:59,298][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000002712_1388544.pth...
[2023-09-21 13:26:59,298][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000002712_1388544.pth...
[2023-09-21 13:26:59,302][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000002328_1191936.pth
[2023-09-21 13:26:59,303][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000002328_1191936.pth
[2023-09-21 13:26:59,506][52980] Updated weights for policy 0, policy_version 2720 (0.0013)
[2023-09-21 13:26:59,506][52979] Updated weights for policy 1, policy_version 2720 (0.0013)
[2023-09-21 13:27:04,287][52220] Fps is (10 sec: 13107.2, 60 sec: 13107.1, 300 sec: 12883.8). Total num frames: 2842624. Throughput: 0: 6555.4, 1: 6555.6. Samples: 2831442. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:27:04,288][52220] Avg episode reward: [(0, '6102.821'), (1, '6401.047')]
[2023-09-21 13:27:06,082][52980] Updated weights for policy 0, policy_version 2800 (0.0015)
[2023-09-21 13:27:06,082][52979] Updated weights for policy 1, policy_version 2800 (0.0014)
[2023-09-21 13:27:09,287][52220] Fps is (10 sec: 13107.4, 60 sec: 13107.2, 300 sec: 12888.7). Total num frames: 2908160. Throughput: 0: 6544.5, 1: 6543.5. Samples: 2908426. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0)
[2023-09-21 13:27:09,288][52220] Avg episode reward: [(0, '5230.132'), (1, '5837.226')]
[2023-09-21 13:27:12,422][52980] Updated weights for policy 0, policy_version 2880 (0.0014)
[2023-09-21 13:27:12,424][52979] Updated weights for policy 1, policy_version 2880 (0.0014)
[2023-09-21 13:27:14,287][52220] Fps is (10 sec: 12287.9, 60 sec: 12970.7, 300 sec: 12857.9). Total num frames: 2965504. Throughput: 0: 6520.6, 1: 6520.3. Samples: 2946558. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0)
[2023-09-21 13:27:14,287][52220] Avg episode reward: [(0, '1027.219'), (1, '2280.409')]
[2023-09-21 13:27:14,295][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000002896_1482752.pth...
[2023-09-21 13:27:14,295][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000002896_1482752.pth...
[2023-09-21 13:27:14,301][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000002520_1290240.pth
[2023-09-21 13:27:14,303][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000002520_1290240.pth
[2023-09-21 13:27:18,730][52980] Updated weights for policy 0, policy_version 2960 (0.0014)
[2023-09-21 13:27:18,730][52979] Updated weights for policy 1, policy_version 2960 (0.0015)
[2023-09-21 13:27:19,286][52220] Fps is (10 sec: 12288.0, 60 sec: 12970.6, 300 sec: 12863.2). Total num frames: 3031040. Throughput: 0: 6506.2, 1: 6506.0. Samples: 3023208. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
[2023-09-21 13:27:19,287][52220] Avg episode reward: [(0, '2016.999'), (1, '1212.339')]
[2023-09-21 13:27:24,286][52220] Fps is (10 sec: 13107.3, 60 sec: 12970.7, 300 sec: 12868.3). Total num frames: 3096576. Throughput: 0: 6541.0, 1: 6542.6. Samples: 3103690. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:27:24,287][52220] Avg episode reward: [(0, '743.100'), (1, '3084.323')]
[2023-09-21 13:27:24,974][52979] Updated weights for policy 1, policy_version 3040 (0.0013)
[2023-09-21 13:27:24,974][52980] Updated weights for policy 0, policy_version 3040 (0.0011)
[2023-09-21 13:27:29,286][52220] Fps is (10 sec: 13107.3, 60 sec: 12970.7, 300 sec: 12873.1). Total num frames: 3162112. Throughput: 0: 6505.3, 1: 6507.4. Samples: 3141476. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:27:29,287][52220] Avg episode reward: [(0, '1835.633'), (1, '1015.822')]
[2023-09-21 13:27:29,294][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000003088_1581056.pth...
[2023-09-21 13:27:29,295][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000003088_1581056.pth...
[2023-09-21 13:27:29,299][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000002712_1388544.pth
[2023-09-21 13:27:29,303][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000002712_1388544.pth
[2023-09-21 13:27:31,482][52979] Updated weights for policy 1, policy_version 3120 (0.0016)
[2023-09-21 13:27:31,482][52980] Updated weights for policy 0, policy_version 3120 (0.0015)
[2023-09-21 13:27:34,286][52220] Fps is (10 sec: 13107.2, 60 sec: 12970.7, 300 sec: 12877.8). Total num frames: 3227648. Throughput: 0: 6447.3, 1: 6447.9. Samples: 3217490. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:27:34,287][52220] Avg episode reward: [(0, '1079.976'), (1, '919.988')]
[2023-09-21 13:27:37,734][52980] Updated weights for policy 0, policy_version 3200 (0.0014)
[2023-09-21 13:27:37,734][52979] Updated weights for policy 1, policy_version 3200 (0.0016)
[2023-09-21 13:27:39,286][52220] Fps is (10 sec: 13107.1, 60 sec: 12970.7, 300 sec: 12882.3). Total num frames: 3293184. Throughput: 0: 6436.5, 1: 6436.7. Samples: 3294560. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:27:39,287][52220] Avg episode reward: [(0, '815.120'), (1, '554.483')]
[2023-09-21 13:27:44,167][52980] Updated weights for policy 0, policy_version 3280 (0.0014)
[2023-09-21 13:27:44,167][52979] Updated weights for policy 1, policy_version 3280 (0.0014)
[2023-09-21 13:27:44,287][52220] Fps is (10 sec: 13107.0, 60 sec: 12970.6, 300 sec: 12886.6). Total num frames: 3358720. Throughput: 0: 6432.0, 1: 6431.4. Samples: 3333588. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
[2023-09-21 13:27:44,288][52220] Avg episode reward: [(0, '1107.621'), (1, '3291.924')]
[2023-09-21 13:27:44,296][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000003280_1679360.pth...
[2023-09-21 13:27:44,297][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000003280_1679360.pth...
[2023-09-21 13:27:44,300][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000002896_1482752.pth
[2023-09-21 13:27:44,304][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000002896_1482752.pth
[2023-09-21 13:27:49,286][52220] Fps is (10 sec: 12288.0, 60 sec: 12834.1, 300 sec: 12859.9). Total num frames: 3416064. Throughput: 0: 6431.5, 1: 6431.1. Samples: 3410258. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-09-21 13:27:49,288][52220] Avg episode reward: [(0, '1492.838'), (1, '4366.645')]
[2023-09-21 13:27:50,625][52980] Updated weights for policy 0, policy_version 3360 (0.0010)
[2023-09-21 13:27:50,625][52979] Updated weights for policy 1, policy_version 3360 (0.0012)
[2023-09-21 13:27:54,286][52220] Fps is (10 sec: 12288.1, 60 sec: 12834.1, 300 sec: 12864.5). Total num frames: 3481600. Throughput: 0: 6446.5, 1: 6447.1. Samples: 3488638. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:27:54,287][52220] Avg episode reward: [(0, '2174.187'), (1, '5011.108')]
[2023-09-21 13:27:56,841][52980] Updated weights for policy 0, policy_version 3440 (0.0013)
[2023-09-21 13:27:56,841][52979] Updated weights for policy 1, policy_version 3440 (0.0012)
[2023-09-21 13:27:59,286][52220] Fps is (10 sec: 13107.3, 60 sec: 12834.2, 300 sec: 12868.9). Total num frames: 3547136. Throughput: 0: 6460.1, 1: 6459.6. Samples: 3527942. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:27:59,287][52220] Avg episode reward: [(0, '467.350'), (1, '600.526')]
[2023-09-21 13:27:59,291][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000003464_1773568.pth...
[2023-09-21 13:27:59,295][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000003088_1581056.pth
[2023-09-21 13:27:59,337][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000003472_1777664.pth...
[2023-09-21 13:27:59,340][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000003088_1581056.pth
[2023-09-21 13:28:03,068][52980] Updated weights for policy 0, policy_version 3520 (0.0015)
[2023-09-21 13:28:03,068][52979] Updated weights for policy 1, policy_version 3520 (0.0013)
[2023-09-21 13:28:04,286][52220] Fps is (10 sec: 13926.5, 60 sec: 12970.7, 300 sec: 12902.4). Total num frames: 3620864. Throughput: 0: 6476.0, 1: 6476.1. Samples: 3606052. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:28:04,287][52220] Avg episode reward: [(0, '1380.872'), (1, '3287.548')]
[2023-09-21 13:28:09,163][52980] Updated weights for policy 0, policy_version 3600 (0.0015)
[2023-09-21 13:28:09,163][52979] Updated weights for policy 1, policy_version 3600 (0.0012)
[2023-09-21 13:28:09,286][52220] Fps is (10 sec: 13926.3, 60 sec: 12970.7, 300 sec: 12906.0). Total num frames: 3686400. Throughput: 0: 6476.3, 1: 6474.4. Samples: 3686468. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-09-21 13:28:09,288][52220] Avg episode reward: [(0, '958.101'), (1, '5374.573')]
[2023-09-21 13:28:14,287][52220] Fps is (10 sec: 13107.0, 60 sec: 13107.2, 300 sec: 12909.5). Total num frames: 3751936. Throughput: 0: 6506.9, 1: 6505.6. Samples: 3727042. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0)
[2023-09-21 13:28:14,288][52220] Avg episode reward: [(0, '828.671'), (1, '5105.236')]
[2023-09-21 13:28:14,296][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000003664_1875968.pth...
[2023-09-21 13:28:14,296][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000003664_1875968.pth...
[2023-09-21 13:28:14,299][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000003280_1679360.pth
[2023-09-21 13:28:14,303][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000003280_1679360.pth
[2023-09-21 13:28:15,450][52979] Updated weights for policy 1, policy_version 3680 (0.0014)
[2023-09-21 13:28:15,450][52980] Updated weights for policy 0, policy_version 3680 (0.0013)
[2023-09-21 13:28:19,287][52220] Fps is (10 sec: 13107.2, 60 sec: 13107.2, 300 sec: 12912.8). Total num frames: 3817472. Throughput: 0: 6547.3, 1: 6547.5. Samples: 3806760. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-09-21 13:28:19,287][52220] Avg episode reward: [(0, '927.280'), (1, '6190.526')]
[2023-09-21 13:28:21,753][52980] Updated weights for policy 0, policy_version 3760 (0.0012)
[2023-09-21 13:28:21,754][52979] Updated weights for policy 1, policy_version 3760 (0.0016)
[2023-09-21 13:28:24,286][52220] Fps is (10 sec: 12288.3, 60 sec: 12970.7, 300 sec: 12940.6). Total num frames: 3874816. Throughput: 0: 6534.3, 1: 6532.7. Samples: 3882572. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:28:24,287][52220] Avg episode reward: [(0, '1593.569'), (1, '6727.270')]
[2023-09-21 13:28:28,119][52980] Updated weights for policy 0, policy_version 3840 (0.0013)
[2023-09-21 13:28:28,120][52979] Updated weights for policy 1, policy_version 3840 (0.0011)
[2023-09-21 13:28:29,286][52220] Fps is (10 sec: 12288.2, 60 sec: 12970.7, 300 sec: 12940.6). Total num frames: 3940352. Throughput: 0: 6511.5, 1: 6511.5. Samples: 3919624. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:28:29,287][52220] Avg episode reward: [(0, '430.176'), (1, '6822.899')]
[2023-09-21 13:28:29,292][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000003848_1970176.pth...
[2023-09-21 13:28:29,292][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000003848_1970176.pth...
[2023-09-21 13:28:29,297][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000003472_1777664.pth
[2023-09-21 13:28:29,300][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000003464_1773568.pth
[2023-09-21 13:28:34,287][52220] Fps is (10 sec: 13107.0, 60 sec: 12970.7, 300 sec: 12940.6). Total num frames: 4005888. Throughput: 0: 6528.7, 1: 6526.9. Samples: 3997760. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:28:34,287][52220] Avg episode reward: [(0, '1148.194'), (1, '6026.845')]
[2023-09-21 13:28:34,403][52979] Updated weights for policy 1, policy_version 3920 (0.0011)
[2023-09-21 13:28:34,403][52980] Updated weights for policy 0, policy_version 3920 (0.0015)
[2023-09-21 13:28:39,287][52220] Fps is (10 sec: 13106.9, 60 sec: 12970.6, 300 sec: 12968.4). Total num frames: 4071424. Throughput: 0: 6515.7, 1: 6515.4. Samples: 4075036. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
[2023-09-21 13:28:39,288][52220] Avg episode reward: [(0, '285.021'), (1, '6219.559')]
[2023-09-21 13:28:40,922][52980] Updated weights for policy 0, policy_version 4000 (0.0014)
[2023-09-21 13:28:40,923][52979] Updated weights for policy 1, policy_version 4000 (0.0016)
[2023-09-21 13:28:44,286][52220] Fps is (10 sec: 13107.3, 60 sec: 12970.7, 300 sec: 12968.4). Total num frames: 4136960. Throughput: 0: 6494.6, 1: 6493.8. Samples: 4112418. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:28:44,287][52220] Avg episode reward: [(0, '644.599'), (1, '6672.147')]
[2023-09-21 13:28:44,294][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000004040_2068480.pth...
[2023-09-21 13:28:44,294][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000004040_2068480.pth...
[2023-09-21 13:28:44,299][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000003664_1875968.pth
[2023-09-21 13:28:44,301][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000003664_1875968.pth
[2023-09-21 13:28:46,847][52980] Updated weights for policy 0, policy_version 4080 (0.0011)
[2023-09-21 13:28:46,848][52979] Updated weights for policy 1, policy_version 4080 (0.0013)
[2023-09-21 13:28:49,287][52220] Fps is (10 sec: 13107.3, 60 sec: 13107.2, 300 sec: 12968.4). Total num frames: 4202496. Throughput: 0: 6555.2, 1: 6554.8. Samples: 4196006. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0)
[2023-09-21 13:28:49,288][52220] Avg episode reward: [(0, '918.545'), (1, '7838.863')]
[2023-09-21 13:28:53,047][52979] Updated weights for policy 1, policy_version 4160 (0.0013)
[2023-09-21 13:28:53,048][52980] Updated weights for policy 0, policy_version 4160 (0.0012)
[2023-09-21 13:28:54,286][52220] Fps is (10 sec: 13107.3, 60 sec: 13107.2, 300 sec: 12996.1). Total num frames: 4268032. Throughput: 0: 6543.2, 1: 6545.3. Samples: 4275446. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:28:54,287][52220] Avg episode reward: [(0, '706.591'), (1, '8731.683')]
[2023-09-21 13:28:59,170][52979] Updated weights for policy 1, policy_version 4240 (0.0013)
[2023-09-21 13:28:59,170][52980] Updated weights for policy 0, policy_version 4240 (0.0012)
[2023-09-21 13:28:59,286][52220] Fps is (10 sec: 13926.6, 60 sec: 13243.7, 300 sec: 13023.9). Total num frames: 4341760. Throughput: 0: 6545.1, 1: 6546.6. Samples: 4316168. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0)
[2023-09-21 13:28:59,287][52220] Avg episode reward: [(0, '367.257'), (1, '9270.249')]
[2023-09-21 13:28:59,294][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000004240_2170880.pth...
[2023-09-21 13:28:59,295][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000004240_2170880.pth...
[2023-09-21 13:28:59,297][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000003848_1970176.pth
[2023-09-21 13:28:59,302][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000003848_1970176.pth
[2023-09-21 13:29:04,287][52220] Fps is (10 sec: 13106.7, 60 sec: 12970.6, 300 sec: 12996.1). Total num frames: 4399104. Throughput: 0: 6518.0, 1: 6518.1. Samples: 4393388. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-09-21 13:29:04,288][52220] Avg episode reward: [(0, '1165.006'), (1, '9356.618')]
[2023-09-21 13:29:04,298][52885] Saving new best policy, reward=9356.618!
[2023-09-21 13:29:05,674][52979] Updated weights for policy 1, policy_version 4320 (0.0013)
[2023-09-21 13:29:05,674][52980] Updated weights for policy 0, policy_version 4320 (0.0014)
[2023-09-21 13:29:09,287][52220] Fps is (10 sec: 12287.8, 60 sec: 12970.7, 300 sec: 12996.1). Total num frames: 4464640. Throughput: 0: 6485.0, 1: 6486.5. Samples: 4466292. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0)
[2023-09-21 13:29:09,288][52220] Avg episode reward: [(0, '3001.533'), (1, '9003.651')]
[2023-09-21 13:29:12,403][52980] Updated weights for policy 0, policy_version 4400 (0.0013)
[2023-09-21 13:29:12,403][52979] Updated weights for policy 1, policy_version 4400 (0.0013)
[2023-09-21 13:29:14,287][52220] Fps is (10 sec: 13107.3, 60 sec: 12970.7, 300 sec: 12996.1). Total num frames: 4530176. Throughput: 0: 6484.3, 1: 6483.9. Samples: 4503198. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
[2023-09-21 13:29:14,288][52220] Avg episode reward: [(0, '3265.888'), (1, '8376.869')]
[2023-09-21 13:29:14,296][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000004424_2265088.pth...
[2023-09-21 13:29:14,296][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000004424_2265088.pth...
[2023-09-21 13:29:14,301][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000004040_2068480.pth
[2023-09-21 13:29:14,303][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000004040_2068480.pth
[2023-09-21 13:29:18,776][52980] Updated weights for policy 0, policy_version 4480 (0.0013)
[2023-09-21 13:29:18,777][52979] Updated weights for policy 1, policy_version 4480 (0.0014)
[2023-09-21 13:29:19,286][52220] Fps is (10 sec: 12288.2, 60 sec: 12834.2, 300 sec: 12968.4). Total num frames: 4587520. Throughput: 0: 6469.4, 1: 6470.9. Samples: 4580072. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:29:19,287][52220] Avg episode reward: [(0, '3622.771'), (1, '6926.232')]
[2023-09-21 13:29:24,287][52220] Fps is (10 sec: 12288.1, 60 sec: 12970.6, 300 sec: 12968.4). Total num frames: 4653056. Throughput: 0: 6488.6, 1: 6489.2. Samples: 4659038. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0)
[2023-09-21 13:29:24,288][52220] Avg episode reward: [(0, '3888.915'), (1, '6733.320')]
[2023-09-21 13:29:24,975][52979] Updated weights for policy 1, policy_version 4560 (0.0014)
[2023-09-21 13:29:24,975][52980] Updated weights for policy 0, policy_version 4560 (0.0015)
[2023-09-21 13:29:29,287][52220] Fps is (10 sec: 13107.0, 60 sec: 12970.6, 300 sec: 12968.3). Total num frames: 4718592. Throughput: 0: 6528.4, 1: 6529.7. Samples: 4700036. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:29:29,287][52220] Avg episode reward: [(0, '4531.947'), (1, '7272.334')]
[2023-09-21 13:29:29,335][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000004616_2363392.pth...
[2023-09-21 13:29:29,338][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000004240_2170880.pth
[2023-09-21 13:29:29,343][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000004616_2363392.pth...
[2023-09-21 13:29:29,347][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000004240_2170880.pth
[2023-09-21 13:29:31,071][52980] Updated weights for policy 0, policy_version 4640 (0.0015)
[2023-09-21 13:29:31,071][52979] Updated weights for policy 1, policy_version 4640 (0.0011)
[2023-09-21 13:29:34,287][52220] Fps is (10 sec: 13926.4, 60 sec: 13107.2, 300 sec: 13023.9). Total num frames: 4792320. Throughput: 0: 6489.5, 1: 6489.9. Samples: 4780080. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:29:34,287][52220] Avg episode reward: [(0, '3506.421'), (1, '7730.738')]
[2023-09-21 13:29:37,116][52980] Updated weights for policy 0, policy_version 4720 (0.0014)
[2023-09-21 13:29:37,116][52979] Updated weights for policy 1, policy_version 4720 (0.0014)
[2023-09-21 13:29:39,286][52220] Fps is (10 sec: 13926.7, 60 sec: 13107.2, 300 sec: 13023.9). Total num frames: 4857856. Throughput: 0: 6523.8, 1: 6524.4. Samples: 4862616. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:29:39,287][52220] Avg episode reward: [(0, '3920.986'), (1, '3772.676')]
[2023-09-21 13:29:42,959][52980] Updated weights for policy 0, policy_version 4800 (0.0014)
[2023-09-21 13:29:42,959][52979] Updated weights for policy 1, policy_version 4800 (0.0015)
[2023-09-21 13:29:44,286][52220] Fps is (10 sec: 13926.5, 60 sec: 13243.7, 300 sec: 13051.7). Total num frames: 4931584. Throughput: 0: 6553.8, 1: 6553.1. Samples: 4905978. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:29:44,287][52220] Avg episode reward: [(0, '4582.768'), (1, '1280.545')]
[2023-09-21 13:29:44,294][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000004816_2465792.pth...
[2023-09-21 13:29:44,294][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000004816_2465792.pth...
[2023-09-21 13:29:44,306][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000004424_2265088.pth
[2023-09-21 13:29:44,306][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000004424_2265088.pth
[2023-09-21 13:29:49,227][52980] Updated weights for policy 0, policy_version 4880 (0.0012)
[2023-09-21 13:29:49,227][52979] Updated weights for policy 1, policy_version 4880 (0.0012)
[2023-09-21 13:29:49,286][52220] Fps is (10 sec: 13926.3, 60 sec: 13243.7, 300 sec: 13051.7). Total num frames: 4997120. Throughput: 0: 6577.6, 1: 6576.7. Samples: 4985328. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0)
[2023-09-21 13:29:49,287][52220] Avg episode reward: [(0, '350.718'), (1, '1158.513')]
[2023-09-21 13:29:54,286][52220] Fps is (10 sec: 13107.3, 60 sec: 13243.7, 300 sec: 13051.7). Total num frames: 5062656. Throughput: 0: 6628.2, 1: 6625.8. Samples: 5062720. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:29:54,287][52220] Avg episode reward: [(0, '1454.240'), (1, '377.826')]
[2023-09-21 13:29:55,414][52980] Updated weights for policy 0, policy_version 4960 (0.0011)
[2023-09-21 13:29:55,416][52979] Updated weights for policy 1, policy_version 4960 (0.0013)
[2023-09-21 13:29:59,287][52220] Fps is (10 sec: 13107.1, 60 sec: 13107.2, 300 sec: 13051.7). Total num frames: 5128192. Throughput: 0: 6669.3, 1: 6668.4. Samples: 5103396. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0)
[2023-09-21 13:29:59,288][52220] Avg episode reward: [(0, '1881.211'), (1, '516.736')]
[2023-09-21 13:29:59,295][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000005008_2564096.pth...
[2023-09-21 13:29:59,295][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000005008_2564096.pth...
[2023-09-21 13:29:59,301][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000004616_2363392.pth
[2023-09-21 13:29:59,301][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000004616_2363392.pth
[2023-09-21 13:30:01,428][52980] Updated weights for policy 0, policy_version 5040 (0.0014)
[2023-09-21 13:30:01,428][52979] Updated weights for policy 1, policy_version 5040 (0.0015)
[2023-09-21 13:30:04,286][52220] Fps is (10 sec: 13516.6, 60 sec: 13312.0, 300 sec: 13065.5). Total num frames: 5197824. Throughput: 0: 6729.2, 1: 6729.8. Samples: 5185728. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
[2023-09-21 13:30:04,287][52220] Avg episode reward: [(0, '294.281'), (1, '2939.865')]
[2023-09-21 13:30:07,369][52980] Updated weights for policy 0, policy_version 5120 (0.0012)
[2023-09-21 13:30:07,370][52979] Updated weights for policy 1, policy_version 5120 (0.0014)
[2023-09-21 13:30:09,286][52220] Fps is (10 sec: 13926.7, 60 sec: 13380.3, 300 sec: 13079.4). Total num frames: 5267456. Throughput: 0: 6762.1, 1: 6759.8. Samples: 5267520. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
[2023-09-21 13:30:09,287][52220] Avg episode reward: [(0, '824.120'), (1, '4656.765')]
[2023-09-21 13:30:13,682][52979] Updated weights for policy 1, policy_version 5200 (0.0013)
[2023-09-21 13:30:13,683][52980] Updated weights for policy 0, policy_version 5200 (0.0014)
[2023-09-21 13:30:14,287][52220] Fps is (10 sec: 13107.1, 60 sec: 13312.0, 300 sec: 13093.3). Total num frames: 5328896. Throughput: 0: 6705.6, 1: 6706.1. Samples: 5303564. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:30:14,288][52220] Avg episode reward: [(0, '444.057'), (1, '5805.610')]
[2023-09-21 13:30:14,296][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000005208_2666496.pth...
[2023-09-21 13:30:14,297][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000005208_2666496.pth...
[2023-09-21 13:30:14,300][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000004816_2465792.pth
[2023-09-21 13:30:14,300][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000004816_2465792.pth
[2023-09-21 13:30:19,286][52220] Fps is (10 sec: 12287.9, 60 sec: 13380.3, 300 sec: 13051.7). Total num frames: 5390336. Throughput: 0: 6702.2, 1: 6701.1. Samples: 5383228. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-09-21 13:30:19,287][52220] Avg episode reward: [(0, '376.089'), (1, '4874.545')]
[2023-09-21 13:30:20,037][52979] Updated weights for policy 1, policy_version 5280 (0.0012)
[2023-09-21 13:30:20,037][52980] Updated weights for policy 0, policy_version 5280 (0.0014)
[2023-09-21 13:30:24,286][52220] Fps is (10 sec: 12697.7, 60 sec: 13380.3, 300 sec: 13051.7). Total num frames: 5455872. Throughput: 0: 6656.3, 1: 6655.6. Samples: 5461656. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
[2023-09-21 13:30:24,287][52220] Avg episode reward: [(0, '578.371'), (1, '3591.697')]
[2023-09-21 13:30:26,486][52979] Updated weights for policy 1, policy_version 5360 (0.0014)
[2023-09-21 13:30:26,487][52980] Updated weights for policy 0, policy_version 5360 (0.0015)
[2023-09-21 13:30:29,286][52220] Fps is (10 sec: 13107.2, 60 sec: 13380.3, 300 sec: 13051.7). Total num frames: 5521408. Throughput: 0: 6565.9, 1: 6564.9. Samples: 5496866. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0)
[2023-09-21 13:30:29,287][52220] Avg episode reward: [(0, '365.739'), (1, '4877.659')]
[2023-09-21 13:30:29,294][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000005392_2760704.pth...
[2023-09-21 13:30:29,294][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000005392_2760704.pth...
[2023-09-21 13:30:29,302][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000005008_2564096.pth
[2023-09-21 13:30:29,302][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000005008_2564096.pth
[2023-09-21 13:30:32,916][52979] Updated weights for policy 1, policy_version 5440 (0.0012)
[2023-09-21 13:30:32,917][52980] Updated weights for policy 0, policy_version 5440 (0.0015)
[2023-09-21 13:30:34,286][52220] Fps is (10 sec: 13107.2, 60 sec: 13243.7, 300 sec: 13051.7). Total num frames: 5586944. Throughput: 0: 6545.7, 1: 6546.7. Samples: 5574486. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0)
[2023-09-21 13:30:34,287][52220] Avg episode reward: [(0, '372.220'), (1, '1004.327')]
[2023-09-21 13:30:39,286][52220] Fps is (10 sec: 12287.9, 60 sec: 13107.2, 300 sec: 13023.9). Total num frames: 5644288. Throughput: 0: 6526.7, 1: 6528.9. Samples: 5650224.
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:30:39,287][52220] Avg episode reward: [(0, '335.218'), (1, '2058.538')] [2023-09-21 13:30:39,406][52979] Updated weights for policy 1, policy_version 5520 (0.0011) [2023-09-21 13:30:39,407][52980] Updated weights for policy 0, policy_version 5520 (0.0015) [2023-09-21 13:30:44,286][52220] Fps is (10 sec: 12288.1, 60 sec: 12970.7, 300 sec: 13023.9). Total num frames: 5709824. Throughput: 0: 6490.5, 1: 6491.4. Samples: 5687580. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-09-21 13:30:44,287][52220] Avg episode reward: [(0, '480.115'), (1, '3705.854')] [2023-09-21 13:30:44,295][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000005576_2854912.pth... [2023-09-21 13:30:44,295][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000005576_2854912.pth... [2023-09-21 13:30:44,302][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000005208_2666496.pth [2023-09-21 13:30:44,302][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000005208_2666496.pth [2023-09-21 13:30:45,754][52980] Updated weights for policy 0, policy_version 5600 (0.0015) [2023-09-21 13:30:45,754][52979] Updated weights for policy 1, policy_version 5600 (0.0013) [2023-09-21 13:30:49,286][52220] Fps is (10 sec: 13107.4, 60 sec: 12970.7, 300 sec: 13023.9). Total num frames: 5775360. Throughput: 0: 6443.5, 1: 6443.0. Samples: 5765616. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-09-21 13:30:49,287][52220] Avg episode reward: [(0, '184.421'), (1, '5297.616')] [2023-09-21 13:30:52,259][52979] Updated weights for policy 1, policy_version 5680 (0.0010) [2023-09-21 13:30:52,259][52980] Updated weights for policy 0, policy_version 5680 (0.0014) [2023-09-21 13:30:54,286][52220] Fps is (10 sec: 13107.3, 60 sec: 12970.7, 300 sec: 13051.7). Total num frames: 5840896. Throughput: 0: 6370.7, 1: 6370.9. Samples: 5840892. 
Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-09-21 13:30:54,287][52220] Avg episode reward: [(0, '306.843'), (1, '6361.076')] [2023-09-21 13:30:58,840][52980] Updated weights for policy 0, policy_version 5760 (0.0013) [2023-09-21 13:30:58,841][52979] Updated weights for policy 1, policy_version 5760 (0.0013) [2023-09-21 13:30:59,286][52220] Fps is (10 sec: 12287.9, 60 sec: 12834.2, 300 sec: 13023.9). Total num frames: 5898240. Throughput: 0: 6380.6, 1: 6379.5. Samples: 5877764. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-09-21 13:30:59,287][52220] Avg episode reward: [(0, '69.112'), (1, '6973.074')] [2023-09-21 13:30:59,292][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000005760_2949120.pth... [2023-09-21 13:30:59,292][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000005760_2949120.pth... [2023-09-21 13:30:59,297][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000005392_2760704.pth [2023-09-21 13:30:59,300][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000005392_2760704.pth [2023-09-21 13:31:04,287][52220] Fps is (10 sec: 12287.7, 60 sec: 12765.9, 300 sec: 13023.9). Total num frames: 5963776. Throughput: 0: 6306.4, 1: 6307.1. Samples: 5950838. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:31:04,287][52220] Avg episode reward: [(0, '914.490'), (1, '6972.267')] [2023-09-21 13:31:05,449][52979] Updated weights for policy 1, policy_version 5840 (0.0014) [2023-09-21 13:31:05,450][52980] Updated weights for policy 0, policy_version 5840 (0.0015) [2023-09-21 13:31:09,286][52220] Fps is (10 sec: 13107.1, 60 sec: 12697.6, 300 sec: 13023.9). Total num frames: 6029312. Throughput: 0: 6315.0, 1: 6315.2. Samples: 6030016. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:31:09,287][52220] Avg episode reward: [(0, '85.445'), (1, '7152.617')] [2023-09-21 13:31:11,738][52979] Updated weights for policy 1, policy_version 5920 (0.0013) [2023-09-21 13:31:11,738][52980] Updated weights for policy 0, policy_version 5920 (0.0013) [2023-09-21 13:31:14,287][52220] Fps is (10 sec: 12287.9, 60 sec: 12629.3, 300 sec: 12996.1). Total num frames: 6086656. Throughput: 0: 6354.4, 1: 6355.8. Samples: 6068826. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:31:14,288][52220] Avg episode reward: [(0, '1651.748'), (1, '6784.776')] [2023-09-21 13:31:14,313][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000005952_3047424.pth... [2023-09-21 13:31:14,316][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000005576_2854912.pth [2023-09-21 13:31:14,318][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000005952_3047424.pth... [2023-09-21 13:31:14,321][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000005576_2854912.pth [2023-09-21 13:31:17,748][52979] Updated weights for policy 1, policy_version 6000 (0.0014) [2023-09-21 13:31:17,748][52980] Updated weights for policy 0, policy_version 6000 (0.0013) [2023-09-21 13:31:19,286][52220] Fps is (10 sec: 13107.4, 60 sec: 12834.2, 300 sec: 13023.9). Total num frames: 6160384. Throughput: 0: 6401.9, 1: 6401.6. Samples: 6150644. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-09-21 13:31:19,287][52220] Avg episode reward: [(0, '413.373'), (1, '6414.272')] [2023-09-21 13:31:23,981][52979] Updated weights for policy 1, policy_version 6080 (0.0013) [2023-09-21 13:31:23,982][52980] Updated weights for policy 0, policy_version 6080 (0.0014) [2023-09-21 13:31:24,286][52220] Fps is (10 sec: 13926.7, 60 sec: 12834.2, 300 sec: 13023.9). Total num frames: 6225920. Throughput: 0: 6422.3, 1: 6421.6. Samples: 6228200. 
Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-21 13:31:24,287][52220] Avg episode reward: [(0, '355.671'), (1, '6870.944')] [2023-09-21 13:31:29,286][52220] Fps is (10 sec: 13107.1, 60 sec: 12834.1, 300 sec: 13023.9). Total num frames: 6291456. Throughput: 0: 6437.4, 1: 6436.0. Samples: 6266880. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-21 13:31:29,287][52220] Avg episode reward: [(0, '113.754'), (1, '7331.743')] [2023-09-21 13:31:29,291][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000006144_3145728.pth... [2023-09-21 13:31:29,291][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000006144_3145728.pth... [2023-09-21 13:31:29,294][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000005760_2949120.pth [2023-09-21 13:31:29,296][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000005760_2949120.pth [2023-09-21 13:31:30,453][52979] Updated weights for policy 1, policy_version 6160 (0.0014) [2023-09-21 13:31:30,454][52980] Updated weights for policy 0, policy_version 6160 (0.0015) [2023-09-21 13:31:34,286][52220] Fps is (10 sec: 13107.3, 60 sec: 12834.2, 300 sec: 13023.9). Total num frames: 6356992. Throughput: 0: 6416.6, 1: 6417.7. Samples: 6343160. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:31:34,287][52220] Avg episode reward: [(0, '54.877'), (1, '8621.272')] [2023-09-21 13:31:36,645][52979] Updated weights for policy 1, policy_version 6240 (0.0011) [2023-09-21 13:31:36,646][52980] Updated weights for policy 0, policy_version 6240 (0.0015) [2023-09-21 13:31:39,286][52220] Fps is (10 sec: 13107.2, 60 sec: 12970.7, 300 sec: 13023.9). Total num frames: 6422528. Throughput: 0: 6462.8, 1: 6463.2. Samples: 6422564. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:31:39,287][52220] Avg episode reward: [(0, '2101.810'), (1, '9082.155')] [2023-09-21 13:31:43,038][52980] Updated weights for policy 0, policy_version 6320 (0.0015) [2023-09-21 13:31:43,039][52979] Updated weights for policy 1, policy_version 6320 (0.0014) [2023-09-21 13:31:44,287][52220] Fps is (10 sec: 12287.7, 60 sec: 12834.1, 300 sec: 12996.1). Total num frames: 6479872. Throughput: 0: 6474.8, 1: 6475.7. Samples: 6460540. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:31:44,288][52220] Avg episode reward: [(0, '2284.566'), (1, '9265.721')] [2023-09-21 13:31:44,315][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000006336_3244032.pth... [2023-09-21 13:31:44,318][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000005952_3047424.pth [2023-09-21 13:31:44,319][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000006336_3244032.pth... [2023-09-21 13:31:44,322][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000005952_3047424.pth [2023-09-21 13:31:49,286][52220] Fps is (10 sec: 12288.2, 60 sec: 12834.1, 300 sec: 12996.1). Total num frames: 6545408. Throughput: 0: 6517.1, 1: 6515.3. Samples: 6537296. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:31:49,287][52220] Avg episode reward: [(0, '2004.841'), (1, '9265.750')] [2023-09-21 13:31:49,456][52979] Updated weights for policy 1, policy_version 6400 (0.0014) [2023-09-21 13:31:49,456][52980] Updated weights for policy 0, policy_version 6400 (0.0016) [2023-09-21 13:31:54,286][52220] Fps is (10 sec: 13107.6, 60 sec: 12834.1, 300 sec: 12996.1). Total num frames: 6610944. Throughput: 0: 6499.7, 1: 6499.5. Samples: 6614976. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:31:54,287][52220] Avg episode reward: [(0, '3119.889'), (1, '9082.109')] [2023-09-21 13:31:55,681][52979] Updated weights for policy 1, policy_version 6480 (0.0013) [2023-09-21 13:31:55,681][52980] Updated weights for policy 0, policy_version 6480 (0.0010) [2023-09-21 13:31:59,287][52220] Fps is (10 sec: 13925.9, 60 sec: 13107.1, 300 sec: 13023.9). Total num frames: 6684672. Throughput: 0: 6568.6, 1: 6567.9. Samples: 6659966. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:31:59,288][52220] Avg episode reward: [(0, '5542.930'), (1, '8899.402')] [2023-09-21 13:31:59,296][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000006528_3342336.pth... [2023-09-21 13:31:59,296][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000006528_3342336.pth... [2023-09-21 13:31:59,301][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000006144_3145728.pth [2023-09-21 13:31:59,303][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000006144_3145728.pth [2023-09-21 13:32:01,656][52980] Updated weights for policy 0, policy_version 6560 (0.0013) [2023-09-21 13:32:01,656][52979] Updated weights for policy 1, policy_version 6560 (0.0014) [2023-09-21 13:32:04,286][52220] Fps is (10 sec: 13926.4, 60 sec: 13107.2, 300 sec: 13023.9). Total num frames: 6750208. Throughput: 0: 6538.2, 1: 6538.6. Samples: 6739100. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:32:04,287][52220] Avg episode reward: [(0, '5077.679'), (1, '8902.310')] [2023-09-21 13:32:07,582][52979] Updated weights for policy 1, policy_version 6640 (0.0015) [2023-09-21 13:32:07,582][52980] Updated weights for policy 0, policy_version 6640 (0.0015) [2023-09-21 13:32:09,286][52220] Fps is (10 sec: 13107.5, 60 sec: 13107.2, 300 sec: 13051.7). Total num frames: 6815744. Throughput: 0: 6563.0, 1: 6563.8. Samples: 6818904. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:32:09,287][52220] Avg episode reward: [(0, '5082.925'), (1, '8460.693')] [2023-09-21 13:32:13,354][52884] KL-divergence is very high: 114.0149 [2023-09-21 13:32:13,359][52884] KL-divergence is very high: 121.4593 [2023-09-21 13:32:14,015][52979] Updated weights for policy 1, policy_version 6720 (0.0013) [2023-09-21 13:32:14,015][52980] Updated weights for policy 0, policy_version 6720 (0.0011) [2023-09-21 13:32:14,287][52220] Fps is (10 sec: 13106.7, 60 sec: 13243.7, 300 sec: 13051.7). Total num frames: 6881280. Throughput: 0: 6559.7, 1: 6562.1. Samples: 6857364. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:32:14,288][52220] Avg episode reward: [(0, '5185.753'), (1, '8294.328')] [2023-09-21 13:32:14,297][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000006720_3440640.pth... [2023-09-21 13:32:14,297][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000006720_3440640.pth... [2023-09-21 13:32:14,302][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000006336_3244032.pth [2023-09-21 13:32:14,308][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000006336_3244032.pth [2023-09-21 13:32:19,286][52220] Fps is (10 sec: 13107.2, 60 sec: 13107.2, 300 sec: 13051.7). Total num frames: 6946816. Throughput: 0: 6617.5, 1: 6615.0. Samples: 6938624. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:32:19,287][52220] Avg episode reward: [(0, '5546.651'), (1, '7885.490')] [2023-09-21 13:32:20,121][52979] Updated weights for policy 1, policy_version 6800 (0.0014) [2023-09-21 13:32:20,121][52980] Updated weights for policy 0, policy_version 6800 (0.0013) [2023-09-21 13:32:24,287][52220] Fps is (10 sec: 13926.6, 60 sec: 13243.7, 300 sec: 13079.4). Total num frames: 7020544. Throughput: 0: 6644.5, 1: 6643.9. Samples: 7020546. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:32:24,287][52220] Avg episode reward: [(0, '5820.198'), (1, '7771.782')] [2023-09-21 13:32:26,116][52980] Updated weights for policy 0, policy_version 6880 (0.0012) [2023-09-21 13:32:26,116][52979] Updated weights for policy 1, policy_version 6880 (0.0012) [2023-09-21 13:32:28,055][52884] KL-divergence is very high: 174.7079 [2023-09-21 13:32:28,060][52884] KL-divergence is very high: 127.8993 [2023-09-21 13:32:28,752][52884] KL-divergence is very high: 108.4238 [2023-09-21 13:32:29,287][52220] Fps is (10 sec: 13106.9, 60 sec: 13107.2, 300 sec: 13051.7). Total num frames: 7077888. Throughput: 0: 6638.7, 1: 6638.4. Samples: 7058014. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:32:29,288][52220] Avg episode reward: [(0, '6021.697'), (1, '6924.544')] [2023-09-21 13:32:29,295][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000006912_3538944.pth... [2023-09-21 13:32:29,295][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000006912_3538944.pth... [2023-09-21 13:32:29,300][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000006528_3342336.pth [2023-09-21 13:32:29,301][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000006528_3342336.pth [2023-09-21 13:32:32,786][52980] Updated weights for policy 0, policy_version 6960 (0.0017) [2023-09-21 13:32:32,786][52979] Updated weights for policy 1, policy_version 6960 (0.0016) [2023-09-21 13:32:34,043][52885] KL-divergence is very high: 162.5900 [2023-09-21 13:32:34,286][52220] Fps is (10 sec: 12288.1, 60 sec: 13107.2, 300 sec: 13051.7). Total num frames: 7143424. Throughput: 0: 6608.1, 1: 6609.9. Samples: 7132108. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:32:34,287][52220] Avg episode reward: [(0, '5944.587'), (1, '6984.766')] [2023-09-21 13:32:38,501][52885] KL-divergence is very high: 112.2562 [2023-09-21 13:32:38,506][52885] KL-divergence is very high: 182.1912 [2023-09-21 13:32:38,517][52885] KL-divergence is very high: 237.8661 [2023-09-21 13:32:39,123][52884] KL-divergence is very high: 111.6947 [2023-09-21 13:32:39,128][52884] KL-divergence is very high: 195.2763 [2023-09-21 13:32:39,143][52979] Updated weights for policy 1, policy_version 7040 (0.0013) [2023-09-21 13:32:39,144][52980] Updated weights for policy 0, policy_version 7040 (0.0014) [2023-09-21 13:32:39,286][52220] Fps is (10 sec: 13107.5, 60 sec: 13107.2, 300 sec: 13051.7). Total num frames: 7208960. Throughput: 0: 6602.2, 1: 6601.8. Samples: 7209156. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:32:39,287][52220] Avg episode reward: [(0, '4853.285'), (1, '6706.128')] [2023-09-21 13:32:44,287][52220] Fps is (10 sec: 13107.0, 60 sec: 13243.7, 300 sec: 13079.4). Total num frames: 7274496. Throughput: 0: 6555.0, 1: 6554.1. Samples: 7249874. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:32:44,288][52220] Avg episode reward: [(0, '1385.633'), (1, '6167.174')] [2023-09-21 13:32:44,296][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000007104_3637248.pth... [2023-09-21 13:32:44,296][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000007104_3637248.pth... 
[2023-09-21 13:32:44,304][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000006720_3440640.pth
[2023-09-21 13:32:44,304][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000006720_3440640.pth
[2023-09-21 13:32:45,396][52980] Updated weights for policy 0, policy_version 7120 (0.0014)
[2023-09-21 13:32:45,396][52979] Updated weights for policy 1, policy_version 7120 (0.0016)
[2023-09-21 13:32:49,286][52220] Fps is (10 sec: 12287.9, 60 sec: 13107.2, 300 sec: 13051.7). Total num frames: 7331840. Throughput: 0: 6499.0, 1: 6498.0. Samples: 7323970. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
[2023-09-21 13:32:49,288][52220] Avg episode reward: [(0, '1190.850'), (1, '5245.301')]
[2023-09-21 13:32:51,877][52980] Updated weights for policy 0, policy_version 7200 (0.0014)
[2023-09-21 13:32:51,877][52979] Updated weights for policy 1, policy_version 7200 (0.0016)
[2023-09-21 13:32:54,286][52220] Fps is (10 sec: 12288.3, 60 sec: 13107.2, 300 sec: 13051.7). Total num frames: 7397376. Throughput: 0: 6472.2, 1: 6471.4. Samples: 7401364. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:32:54,287][52220] Avg episode reward: [(0, '1688.535'), (1, '3485.425')]
[2023-09-21 13:32:58,292][52979] Updated weights for policy 1, policy_version 7280 (0.0013)
[2023-09-21 13:32:58,294][52980] Updated weights for policy 0, policy_version 7280 (0.0015)
[2023-09-21 13:32:59,286][52220] Fps is (10 sec: 13107.4, 60 sec: 12970.7, 300 sec: 13023.9). Total num frames: 7462912. Throughput: 0: 6470.2, 1: 6470.1. Samples: 7439676. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:32:59,287][52220] Avg episode reward: [(0, '428.059'), (1, '3129.384')]
[2023-09-21 13:32:59,293][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000007288_3731456.pth...
[2023-09-21 13:32:59,293][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000007288_3731456.pth...
[2023-09-21 13:32:59,300][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000006912_3538944.pth
[2023-09-21 13:32:59,301][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000006912_3538944.pth
[2023-09-21 13:33:04,287][52220] Fps is (10 sec: 13107.0, 60 sec: 12970.6, 300 sec: 13023.9). Total num frames: 7528448. Throughput: 0: 6456.7, 1: 6457.7. Samples: 7519776. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:33:04,288][52220] Avg episode reward: [(0, '1320.466'), (1, '2583.518')]
[2023-09-21 13:33:04,644][52979] Updated weights for policy 1, policy_version 7360 (0.0013)
[2023-09-21 13:33:04,646][52980] Updated weights for policy 0, policy_version 7360 (0.0016)
[2023-09-21 13:33:09,287][52220] Fps is (10 sec: 12287.8, 60 sec: 12834.1, 300 sec: 12996.1). Total num frames: 7585792. Throughput: 0: 6362.0, 1: 6364.0. Samples: 7593220. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0)
[2023-09-21 13:33:09,287][52220] Avg episode reward: [(0, '3285.713'), (1, '4140.794')]
[2023-09-21 13:33:11,235][52979] Updated weights for policy 1, policy_version 7440 (0.0014)
[2023-09-21 13:33:11,236][52980] Updated weights for policy 0, policy_version 7440 (0.0013)
[2023-09-21 13:33:14,286][52220] Fps is (10 sec: 13107.3, 60 sec: 12970.7, 300 sec: 13023.9). Total num frames: 7659520. Throughput: 0: 6372.5, 1: 6373.0. Samples: 7631556. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:33:14,287][52220] Avg episode reward: [(0, '327.184'), (1, '5978.137')]
[2023-09-21 13:33:14,295][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000007480_3829760.pth...
[2023-09-21 13:33:14,295][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000007480_3829760.pth...
[2023-09-21 13:33:14,302][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000007104_3637248.pth
[2023-09-21 13:33:14,323][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000007104_3637248.pth
[2023-09-21 13:33:17,369][52980] Updated weights for policy 0, policy_version 7520 (0.0014)
[2023-09-21 13:33:17,370][52979] Updated weights for policy 1, policy_version 7520 (0.0013)
[2023-09-21 13:33:19,112][52885] KL-divergence is very high: 127.2461
[2023-09-21 13:33:19,121][52885] KL-divergence is very high: 160.4157
[2023-09-21 13:33:19,125][52885] KL-divergence is very high: 220.9313
[2023-09-21 13:33:19,130][52885] KL-divergence is very high: 173.6443
[2023-09-21 13:33:19,286][52220] Fps is (10 sec: 13926.5, 60 sec: 12970.7, 300 sec: 13051.7). Total num frames: 7725056. Throughput: 0: 6457.0, 1: 6456.9. Samples: 7713232. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:33:19,287][52220] Avg episode reward: [(0, '1051.801'), (1, '6242.662')]
[2023-09-21 13:33:23,620][52979] Updated weights for policy 1, policy_version 7600 (0.0015)
[2023-09-21 13:33:23,620][52980] Updated weights for policy 0, policy_version 7600 (0.0014)
[2023-09-21 13:33:24,286][52220] Fps is (10 sec: 13107.2, 60 sec: 12834.2, 300 sec: 13051.7). Total num frames: 7790592. Throughput: 0: 6458.1, 1: 6457.9. Samples: 7790376. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:33:24,287][52220] Avg episode reward: [(0, '948.608'), (1, '5786.479')]
[2023-09-21 13:33:29,287][52220] Fps is (10 sec: 13106.8, 60 sec: 12970.6, 300 sec: 13051.7). Total num frames: 7856128. Throughput: 0: 6453.8, 1: 6455.7. Samples: 7830804. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:33:29,288][52220] Avg episode reward: [(0, '850.104'), (1, '5709.365')]
[2023-09-21 13:33:29,295][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000007672_3928064.pth...
[2023-09-21 13:33:29,296][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000007672_3928064.pth...
[2023-09-21 13:33:29,299][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000007288_3731456.pth
[2023-09-21 13:33:29,308][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000007288_3731456.pth
[2023-09-21 13:33:29,739][52979] Updated weights for policy 1, policy_version 7680 (0.0012)
[2023-09-21 13:33:29,740][52980] Updated weights for policy 0, policy_version 7680 (0.0013)
[2023-09-21 13:33:34,287][52220] Fps is (10 sec: 13107.0, 60 sec: 12970.6, 300 sec: 13051.7). Total num frames: 7921664. Throughput: 0: 6545.9, 1: 6546.1. Samples: 7913108. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0)
[2023-09-21 13:33:34,287][52220] Avg episode reward: [(0, '1505.381'), (1, '4029.727')]
[2023-09-21 13:33:35,672][52980] Updated weights for policy 0, policy_version 7760 (0.0014)
[2023-09-21 13:33:35,673][52979] Updated weights for policy 1, policy_version 7760 (0.0014)
[2023-09-21 13:33:39,287][52220] Fps is (10 sec: 13926.6, 60 sec: 13107.2, 300 sec: 13079.4). Total num frames: 7995392. Throughput: 0: 6601.0, 1: 6599.6. Samples: 7995392. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:33:39,287][52220] Avg episode reward: [(0, '3677.659'), (1, '625.058')]
[2023-09-21 13:33:41,818][52979] Updated weights for policy 1, policy_version 7840 (0.0013)
[2023-09-21 13:33:41,819][52980] Updated weights for policy 0, policy_version 7840 (0.0014)
[2023-09-21 13:33:44,287][52220] Fps is (10 sec: 13106.7, 60 sec: 12970.6, 300 sec: 13051.6). Total num frames: 8052736. Throughput: 0: 6597.9, 1: 6597.9. Samples: 8033494. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:33:44,288][52220] Avg episode reward: [(0, '6019.572'), (1, '632.349')]
[2023-09-21 13:33:44,340][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000007872_4030464.pth...
[2023-09-21 13:33:44,343][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000007480_3829760.pth
[2023-09-21 13:33:44,346][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000007872_4030464.pth...
[2023-09-21 13:33:44,349][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000007480_3829760.pth
[2023-09-21 13:33:48,318][52979] Updated weights for policy 1, policy_version 7920 (0.0011)
[2023-09-21 13:33:48,319][52980] Updated weights for policy 0, policy_version 7920 (0.0016)
[2023-09-21 13:33:49,286][52220] Fps is (10 sec: 12288.2, 60 sec: 13107.2, 300 sec: 13051.7). Total num frames: 8118272. Throughput: 0: 6549.2, 1: 6549.5. Samples: 8109216. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-09-21 13:33:49,287][52220] Avg episode reward: [(0, '6462.962'), (1, '653.049')]
[2023-09-21 13:33:54,287][52220] Fps is (10 sec: 13107.7, 60 sec: 13107.2, 300 sec: 13023.9). Total num frames: 8183808. Throughput: 0: 6578.4, 1: 6577.8. Samples: 8185248. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-09-21 13:33:54,288][52220] Avg episode reward: [(0, '6169.284'), (1, '937.395')]
[2023-09-21 13:33:54,732][52979] Updated weights for policy 1, policy_version 8000 (0.0013)
[2023-09-21 13:33:54,732][52980] Updated weights for policy 0, policy_version 8000 (0.0014)
[2023-09-21 13:33:59,286][52220] Fps is (10 sec: 13107.1, 60 sec: 13107.2, 300 sec: 13051.7). Total num frames: 8249344. Throughput: 0: 6593.2, 1: 6590.7. Samples: 8224832. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
[2023-09-21 13:33:59,287][52220] Avg episode reward: [(0, '6363.465'), (1, '2655.908')]
[2023-09-21 13:33:59,296][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000008056_4124672.pth...
[2023-09-21 13:33:59,296][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000008056_4124672.pth...
[2023-09-21 13:33:59,301][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000007672_3928064.pth
[2023-09-21 13:33:59,303][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000007672_3928064.pth
[2023-09-21 13:34:00,953][52980] Updated weights for policy 0, policy_version 8080 (0.0014)
[2023-09-21 13:34:00,953][52979] Updated weights for policy 1, policy_version 8080 (0.0017)
[2023-09-21 13:34:04,286][52220] Fps is (10 sec: 13107.2, 60 sec: 13107.2, 300 sec: 13051.7). Total num frames: 8314880. Throughput: 0: 6567.0, 1: 6567.3. Samples: 8304278. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
[2023-09-21 13:34:04,287][52220] Avg episode reward: [(0, '6284.006'), (1, '3940.957')]
[2023-09-21 13:34:07,249][52979] Updated weights for policy 1, policy_version 8160 (0.0012)
[2023-09-21 13:34:07,250][52980] Updated weights for policy 0, policy_version 8160 (0.0012)
[2023-09-21 13:34:09,287][52220] Fps is (10 sec: 13106.9, 60 sec: 13243.7, 300 sec: 13051.7). Total num frames: 8380416. Throughput: 0: 6565.4, 1: 6566.1. Samples: 8381298. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0)
[2023-09-21 13:34:09,288][52220] Avg episode reward: [(0, '6191.470'), (1, '3735.090')]
[2023-09-21 13:34:13,319][52979] Updated weights for policy 1, policy_version 8240 (0.0011)
[2023-09-21 13:34:13,320][52980] Updated weights for policy 0, policy_version 8240 (0.0015)
[2023-09-21 13:34:14,286][52220] Fps is (10 sec: 13107.3, 60 sec: 13107.2, 300 sec: 13079.4). Total num frames: 8445952. Throughput: 0: 6568.1, 1: 6568.2. Samples: 8421936. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:34:14,287][52220] Avg episode reward: [(0, '5728.107'), (1, '5522.777')]
[2023-09-21 13:34:14,295][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000008248_4222976.pth...
[2023-09-21 13:34:14,295][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000008248_4222976.pth...
[2023-09-21 13:34:14,301][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000007872_4030464.pth
[2023-09-21 13:34:14,303][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000007872_4030464.pth
[2023-09-21 13:34:19,286][52220] Fps is (10 sec: 13107.5, 60 sec: 13107.2, 300 sec: 13079.4). Total num frames: 8511488. Throughput: 0: 6558.5, 1: 6557.5. Samples: 8503330. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:34:19,287][52220] Avg episode reward: [(0, '6006.540'), (1, '7048.366')]
[2023-09-21 13:34:19,511][52980] Updated weights for policy 0, policy_version 8320 (0.0015)
[2023-09-21 13:34:19,511][52979] Updated weights for policy 1, policy_version 8320 (0.0015)
[2023-09-21 13:34:24,286][52220] Fps is (10 sec: 13107.2, 60 sec: 13107.2, 300 sec: 13079.4). Total num frames: 8577024. Throughput: 0: 6460.6, 1: 6461.9. Samples: 8576902. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-09-21 13:34:24,287][52220] Avg episode reward: [(0, '6099.005'), (1, '795.161')]
[2023-09-21 13:34:26,332][52980] Updated weights for policy 0, policy_version 8400 (0.0013)
[2023-09-21 13:34:26,332][52979] Updated weights for policy 1, policy_version 8400 (0.0015)
[2023-09-21 13:34:29,286][52220] Fps is (10 sec: 12288.2, 60 sec: 12970.8, 300 sec: 13023.9). Total num frames: 8634368. Throughput: 0: 6428.6, 1: 6428.5. Samples: 8612056. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0)
[2023-09-21 13:34:29,287][52220] Avg episode reward: [(0, '6844.153'), (1, '1821.965')]
[2023-09-21 13:34:29,295][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000008432_4317184.pth...
[2023-09-21 13:34:29,295][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000008432_4317184.pth...
[2023-09-21 13:34:29,299][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000008056_4124672.pth [2023-09-21 13:34:29,302][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000008056_4124672.pth [2023-09-21 13:34:32,574][52979] Updated weights for policy 1, policy_version 8480 (0.0014) [2023-09-21 13:34:32,575][52980] Updated weights for policy 0, policy_version 8480 (0.0015) [2023-09-21 13:34:34,287][52220] Fps is (10 sec: 12287.9, 60 sec: 12970.7, 300 sec: 13023.9). Total num frames: 8699904. Throughput: 0: 6473.7, 1: 6474.0. Samples: 8691864. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-09-21 13:34:34,287][52220] Avg episode reward: [(0, '7216.463'), (1, '1158.496')] [2023-09-21 13:34:38,729][52979] Updated weights for policy 1, policy_version 8560 (0.0011) [2023-09-21 13:34:38,731][52980] Updated weights for policy 0, policy_version 8560 (0.0013) [2023-09-21 13:34:39,286][52220] Fps is (10 sec: 13107.2, 60 sec: 12834.2, 300 sec: 12996.1). Total num frames: 8765440. Throughput: 0: 6524.3, 1: 6525.4. Samples: 8772482. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-21 13:34:39,287][52220] Avg episode reward: [(0, '8054.671'), (1, '784.632')] [2023-09-21 13:34:44,286][52220] Fps is (10 sec: 13107.4, 60 sec: 12970.8, 300 sec: 12996.1). Total num frames: 8830976. Throughput: 0: 6500.3, 1: 6501.8. Samples: 8809924. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:34:44,287][52220] Avg episode reward: [(0, '7314.468'), (1, '1509.728')] [2023-09-21 13:34:44,293][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000008624_4415488.pth... [2023-09-21 13:34:44,293][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000008624_4415488.pth... 
[2023-09-21 13:34:44,300][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000008248_4222976.pth [2023-09-21 13:34:44,301][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000008248_4222976.pth [2023-09-21 13:34:45,087][52979] Updated weights for policy 1, policy_version 8640 (0.0012) [2023-09-21 13:34:45,088][52980] Updated weights for policy 0, policy_version 8640 (0.0015) [2023-09-21 13:34:49,287][52220] Fps is (10 sec: 13106.9, 60 sec: 12970.6, 300 sec: 12996.1). Total num frames: 8896512. Throughput: 0: 6490.6, 1: 6488.2. Samples: 8888324. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:34:49,288][52220] Avg episode reward: [(0, '7315.971'), (1, '3849.357')] [2023-09-21 13:34:51,377][52980] Updated weights for policy 0, policy_version 8720 (0.0012) [2023-09-21 13:34:51,378][52979] Updated weights for policy 1, policy_version 8720 (0.0011) [2023-09-21 13:34:54,287][52220] Fps is (10 sec: 13107.0, 60 sec: 12970.7, 300 sec: 12996.1). Total num frames: 8962048. Throughput: 0: 6460.1, 1: 6459.8. Samples: 8962688. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-09-21 13:34:54,288][52220] Avg episode reward: [(0, '7406.052'), (1, '4832.839')] [2023-09-21 13:34:57,777][52980] Updated weights for policy 0, policy_version 8800 (0.0010) [2023-09-21 13:34:57,777][52979] Updated weights for policy 1, policy_version 8800 (0.0012) [2023-09-21 13:34:59,287][52220] Fps is (10 sec: 13107.2, 60 sec: 12970.6, 300 sec: 12982.2). Total num frames: 9027584. Throughput: 0: 6457.5, 1: 6455.9. Samples: 9003042. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:34:59,288][52220] Avg episode reward: [(0, '7689.771'), (1, '6585.865')] [2023-09-21 13:34:59,297][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000008816_4513792.pth... [2023-09-21 13:34:59,297][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000008816_4513792.pth... 
[2023-09-21 13:34:59,301][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000008432_4317184.pth [2023-09-21 13:34:59,305][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000008432_4317184.pth [2023-09-21 13:35:04,093][52979] Updated weights for policy 1, policy_version 8880 (0.0014) [2023-09-21 13:35:04,093][52980] Updated weights for policy 0, policy_version 8880 (0.0011) [2023-09-21 13:35:04,287][52220] Fps is (10 sec: 13107.2, 60 sec: 12970.7, 300 sec: 12968.3). Total num frames: 9093120. Throughput: 0: 6405.5, 1: 6407.7. Samples: 9079926. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:35:04,287][52220] Avg episode reward: [(0, '6344.982'), (1, '8266.991')] [2023-09-21 13:35:09,287][52220] Fps is (10 sec: 13107.3, 60 sec: 12970.7, 300 sec: 12982.2). Total num frames: 9158656. Throughput: 0: 6465.3, 1: 6464.0. Samples: 9158724. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:35:09,287][52220] Avg episode reward: [(0, '5886.401'), (1, '8260.217')] [2023-09-21 13:35:10,541][52980] Updated weights for policy 0, policy_version 8960 (0.0014) [2023-09-21 13:35:10,542][52979] Updated weights for policy 1, policy_version 8960 (0.0014) [2023-09-21 13:35:14,287][52220] Fps is (10 sec: 13107.2, 60 sec: 12970.6, 300 sec: 12996.1). Total num frames: 9224192. Throughput: 0: 6501.5, 1: 6501.1. Samples: 9197176. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:35:14,287][52220] Avg episode reward: [(0, '6321.409'), (1, '7973.051')] [2023-09-21 13:35:14,296][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000009008_4612096.pth... [2023-09-21 13:35:14,296][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000009008_4612096.pth... 
[2023-09-21 13:35:14,302][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000008624_4415488.pth [2023-09-21 13:35:14,302][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000008624_4415488.pth [2023-09-21 13:35:16,683][52884] KL-divergence is very high: 147.5208 [2023-09-21 13:35:16,702][52979] Updated weights for policy 1, policy_version 9040 (0.0013) [2023-09-21 13:35:16,703][52980] Updated weights for policy 0, policy_version 9040 (0.0014) [2023-09-21 13:35:19,287][52220] Fps is (10 sec: 13107.2, 60 sec: 12970.7, 300 sec: 12996.1). Total num frames: 9289728. Throughput: 0: 6503.1, 1: 6503.6. Samples: 9277164. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:35:19,288][52220] Avg episode reward: [(0, '6772.299'), (1, '8248.228')] [2023-09-21 13:35:23,189][52979] Updated weights for policy 1, policy_version 9120 (0.0013) [2023-09-21 13:35:23,189][52980] Updated weights for policy 0, policy_version 9120 (0.0015) [2023-09-21 13:35:24,286][52220] Fps is (10 sec: 12288.1, 60 sec: 12834.1, 300 sec: 12968.3). Total num frames: 9347072. Throughput: 0: 6450.2, 1: 6450.3. Samples: 9353006. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-09-21 13:35:24,288][52220] Avg episode reward: [(0, '6876.287'), (1, '8247.524')] [2023-09-21 13:35:29,286][52220] Fps is (10 sec: 12288.1, 60 sec: 12970.6, 300 sec: 12968.4). Total num frames: 9412608. Throughput: 0: 6452.2, 1: 6453.6. Samples: 9390684. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:35:29,287][52220] Avg episode reward: [(0, '6970.759'), (1, '8433.254')] [2023-09-21 13:35:29,295][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000009192_4706304.pth... [2023-09-21 13:35:29,295][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000009192_4706304.pth... 
[2023-09-21 13:35:29,304][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000008816_4513792.pth [2023-09-21 13:35:29,304][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000008816_4513792.pth [2023-09-21 13:35:29,690][52979] Updated weights for policy 1, policy_version 9200 (0.0014) [2023-09-21 13:35:29,691][52980] Updated weights for policy 0, policy_version 9200 (0.0013) [2023-09-21 13:35:34,286][52220] Fps is (10 sec: 13107.5, 60 sec: 12970.7, 300 sec: 12996.1). Total num frames: 9478144. Throughput: 0: 6412.3, 1: 6414.5. Samples: 9465524. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:35:34,287][52220] Avg episode reward: [(0, '7790.873'), (1, '8158.990')] [2023-09-21 13:35:36,222][52980] Updated weights for policy 0, policy_version 9280 (0.0014) [2023-09-21 13:35:36,222][52979] Updated weights for policy 1, policy_version 9280 (0.0014) [2023-09-21 13:35:39,286][52220] Fps is (10 sec: 12288.1, 60 sec: 12834.1, 300 sec: 12968.4). Total num frames: 9535488. Throughput: 0: 6439.4, 1: 6439.4. Samples: 9542234. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:35:39,287][52220] Avg episode reward: [(0, '7686.583'), (1, '8435.588')] [2023-09-21 13:35:42,494][52980] Updated weights for policy 0, policy_version 9360 (0.0010) [2023-09-21 13:35:42,495][52979] Updated weights for policy 1, policy_version 9360 (0.0015) [2023-09-21 13:35:44,287][52220] Fps is (10 sec: 12287.6, 60 sec: 12834.1, 300 sec: 12968.3). Total num frames: 9601024. Throughput: 0: 6419.4, 1: 6420.9. Samples: 9580856. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:35:44,288][52220] Avg episode reward: [(0, '7499.111'), (1, '8810.491')] [2023-09-21 13:35:44,306][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000009384_4804608.pth... 
[2023-09-21 13:35:44,310][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000009008_4612096.pth [2023-09-21 13:35:44,312][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000009384_4804608.pth... [2023-09-21 13:35:44,315][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000009008_4612096.pth [2023-09-21 13:35:48,737][52979] Updated weights for policy 1, policy_version 9440 (0.0015) [2023-09-21 13:35:48,737][52980] Updated weights for policy 0, policy_version 9440 (0.0013) [2023-09-21 13:35:49,287][52220] Fps is (10 sec: 13107.0, 60 sec: 12834.1, 300 sec: 12968.3). Total num frames: 9666560. Throughput: 0: 6442.4, 1: 6441.5. Samples: 9659702. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:35:49,287][52220] Avg episode reward: [(0, '7780.731'), (1, '8995.249')] [2023-09-21 13:35:54,286][52220] Fps is (10 sec: 13107.6, 60 sec: 12834.2, 300 sec: 12996.1). Total num frames: 9732096. Throughput: 0: 6434.7, 1: 6436.9. Samples: 9737944. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:35:54,287][52220] Avg episode reward: [(0, '7785.500'), (1, '8905.386')] [2023-09-21 13:35:55,014][52980] Updated weights for policy 0, policy_version 9520 (0.0010) [2023-09-21 13:35:55,014][52979] Updated weights for policy 1, policy_version 9520 (0.0015) [2023-09-21 13:35:59,286][52220] Fps is (10 sec: 13107.3, 60 sec: 12834.2, 300 sec: 12996.1). Total num frames: 9797632. Throughput: 0: 6421.8, 1: 6421.5. Samples: 9775124. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:35:59,287][52220] Avg episode reward: [(0, '8345.138'), (1, '4258.503')] [2023-09-21 13:35:59,296][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000009568_4898816.pth... [2023-09-21 13:35:59,296][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000009568_4898816.pth... 
[2023-09-21 13:35:59,300][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000009192_4706304.pth [2023-09-21 13:35:59,300][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000009192_4706304.pth [2023-09-21 13:36:01,481][52979] Updated weights for policy 1, policy_version 9600 (0.0011) [2023-09-21 13:36:01,481][52980] Updated weights for policy 0, policy_version 9600 (0.0014) [2023-09-21 13:36:04,287][52220] Fps is (10 sec: 13106.9, 60 sec: 12834.1, 300 sec: 12996.1). Total num frames: 9863168. Throughput: 0: 6397.5, 1: 6397.3. Samples: 9852930. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:36:04,288][52220] Avg episode reward: [(0, '6398.098'), (1, '3352.109')] [2023-09-21 13:36:07,758][52980] Updated weights for policy 0, policy_version 9680 (0.0014) [2023-09-21 13:36:07,758][52979] Updated weights for policy 1, policy_version 9680 (0.0013) [2023-09-21 13:36:09,286][52220] Fps is (10 sec: 13107.3, 60 sec: 12834.2, 300 sec: 13023.9). Total num frames: 9928704. Throughput: 0: 6421.9, 1: 6421.4. Samples: 9930952. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-21 13:36:09,287][52220] Avg episode reward: [(0, '5843.385'), (1, '3512.802')] [2023-09-21 13:36:13,897][52979] Updated weights for policy 1, policy_version 9760 (0.0012) [2023-09-21 13:36:13,897][52980] Updated weights for policy 0, policy_version 9760 (0.0013) [2023-09-21 13:36:14,287][52220] Fps is (10 sec: 13107.1, 60 sec: 12834.1, 300 sec: 12996.1). Total num frames: 9994240. Throughput: 0: 6449.6, 1: 6449.6. Samples: 9971150. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-21 13:36:14,288][52220] Avg episode reward: [(0, '7050.183'), (1, '1346.937')] [2023-09-21 13:36:14,298][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000009760_4997120.pth... [2023-09-21 13:36:14,299][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000009760_4997120.pth... 
[2023-09-21 13:36:14,306][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000009384_4804608.pth [2023-09-21 13:36:14,307][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000009384_4804608.pth [2023-09-21 13:36:19,286][52220] Fps is (10 sec: 13107.0, 60 sec: 12834.1, 300 sec: 12996.1). Total num frames: 10059776. Throughput: 0: 6508.6, 1: 6507.3. Samples: 10051246. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-09-21 13:36:19,287][52220] Avg episode reward: [(0, '8058.993'), (1, '413.387')] [2023-09-21 13:36:20,323][52980] Updated weights for policy 0, policy_version 9840 (0.0015) [2023-09-21 13:36:20,324][52979] Updated weights for policy 1, policy_version 9840 (0.0014) [2023-09-21 13:36:24,286][52220] Fps is (10 sec: 13107.5, 60 sec: 12970.7, 300 sec: 12996.1). Total num frames: 10125312. Throughput: 0: 6488.9, 1: 6489.0. Samples: 10126240. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-09-21 13:36:24,287][52220] Avg episode reward: [(0, '8148.723'), (1, '2379.849')] [2023-09-21 13:36:26,694][52979] Updated weights for policy 1, policy_version 9920 (0.0014) [2023-09-21 13:36:26,694][52980] Updated weights for policy 0, policy_version 9920 (0.0013) [2023-09-21 13:36:29,287][52220] Fps is (10 sec: 12287.8, 60 sec: 12834.1, 300 sec: 12968.3). Total num frames: 10182656. Throughput: 0: 6498.2, 1: 6497.2. Samples: 10165648. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:36:29,287][52220] Avg episode reward: [(0, '8428.183'), (1, '4691.143')] [2023-09-21 13:36:29,296][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000009944_5091328.pth... [2023-09-21 13:36:29,296][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000009944_5091328.pth... 
[2023-09-21 13:36:29,305][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000009568_4898816.pth [2023-09-21 13:36:29,305][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000009568_4898816.pth [2023-09-21 13:36:33,318][52980] Updated weights for policy 0, policy_version 10000 (0.0016) [2023-09-21 13:36:33,319][52979] Updated weights for policy 1, policy_version 10000 (0.0014) [2023-09-21 13:36:34,287][52220] Fps is (10 sec: 12287.9, 60 sec: 12834.1, 300 sec: 12968.4). Total num frames: 10248192. Throughput: 0: 6432.7, 1: 6432.5. Samples: 10238636. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:36:34,287][52220] Avg episode reward: [(0, '6512.251'), (1, '5747.198')] [2023-09-21 13:36:39,286][52220] Fps is (10 sec: 13107.5, 60 sec: 12970.7, 300 sec: 12996.1). Total num frames: 10313728. Throughput: 0: 6457.1, 1: 6457.1. Samples: 10319088. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:36:39,287][52220] Avg episode reward: [(0, '6034.451'), (1, '6649.274')] [2023-09-21 13:36:39,421][52979] Updated weights for policy 1, policy_version 10080 (0.0013) [2023-09-21 13:36:39,421][52980] Updated weights for policy 0, policy_version 10080 (0.0015) [2023-09-21 13:36:44,286][52220] Fps is (10 sec: 13107.4, 60 sec: 12970.7, 300 sec: 12996.1). Total num frames: 10379264. Throughput: 0: 6465.2, 1: 6465.9. Samples: 10357024. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-09-21 13:36:44,287][52220] Avg episode reward: [(0, '6474.585'), (1, '7822.539')] [2023-09-21 13:36:44,294][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000010136_5189632.pth... [2023-09-21 13:36:44,294][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000010136_5189632.pth... 
[2023-09-21 13:36:44,299][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000009760_4997120.pth [2023-09-21 13:36:44,301][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000009760_4997120.pth [2023-09-21 13:36:45,746][52980] Updated weights for policy 0, policy_version 10160 (0.0007) [2023-09-21 13:36:45,746][52979] Updated weights for policy 1, policy_version 10160 (0.0011) [2023-09-21 13:36:49,286][52220] Fps is (10 sec: 13926.5, 60 sec: 13107.3, 300 sec: 13023.9). Total num frames: 10452992. Throughput: 0: 6500.3, 1: 6500.8. Samples: 10437978. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:36:49,287][52220] Avg episode reward: [(0, '5752.454'), (1, '8363.105')] [2023-09-21 13:36:51,686][52979] Updated weights for policy 1, policy_version 10240 (0.0014) [2023-09-21 13:36:51,687][52980] Updated weights for policy 0, policy_version 10240 (0.0011) [2023-09-21 13:36:54,286][52220] Fps is (10 sec: 13926.2, 60 sec: 13107.2, 300 sec: 12996.1). Total num frames: 10518528. Throughput: 0: 6530.3, 1: 6528.3. Samples: 10518592. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:36:54,287][52220] Avg episode reward: [(0, '5499.023'), (1, '8265.828')] [2023-09-21 13:36:57,890][52979] Updated weights for policy 1, policy_version 10320 (0.0013) [2023-09-21 13:36:57,890][52980] Updated weights for policy 0, policy_version 10320 (0.0013) [2023-09-21 13:36:59,287][52220] Fps is (10 sec: 13106.8, 60 sec: 13107.2, 300 sec: 12996.1). Total num frames: 10584064. Throughput: 0: 6531.3, 1: 6530.0. Samples: 10558912. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-21 13:36:59,288][52220] Avg episode reward: [(0, '2764.041'), (1, '8536.378')] [2023-09-21 13:36:59,298][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000010336_5292032.pth... [2023-09-21 13:36:59,299][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000010336_5292032.pth... 
[2023-09-21 13:36:59,303][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000009944_5091328.pth [2023-09-21 13:36:59,305][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000009944_5091328.pth [2023-09-21 13:37:04,287][52220] Fps is (10 sec: 12287.9, 60 sec: 12970.7, 300 sec: 12968.3). Total num frames: 10641408. Throughput: 0: 6475.6, 1: 6476.1. Samples: 10634076. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-21 13:37:04,288][52220] Avg episode reward: [(0, '2261.539'), (1, '8898.543')] [2023-09-21 13:37:04,341][52979] Updated weights for policy 1, policy_version 10400 (0.0015) [2023-09-21 13:37:04,341][52980] Updated weights for policy 0, policy_version 10400 (0.0015) [2023-09-21 13:37:09,286][52220] Fps is (10 sec: 13107.5, 60 sec: 13107.2, 300 sec: 12996.1). Total num frames: 10715136. Throughput: 0: 6563.3, 1: 6563.3. Samples: 10716940. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-21 13:37:09,287][52220] Avg episode reward: [(0, '4187.421'), (1, '9086.295')] [2023-09-21 13:37:10,335][52980] Updated weights for policy 0, policy_version 10480 (0.0010) [2023-09-21 13:37:10,336][52979] Updated weights for policy 1, policy_version 10480 (0.0014) [2023-09-21 13:37:14,287][52220] Fps is (10 sec: 13926.3, 60 sec: 13107.2, 300 sec: 12996.1). Total num frames: 10780672. Throughput: 0: 6558.8, 1: 6558.5. Samples: 10755928. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:37:14,288][52220] Avg episode reward: [(0, '5168.250'), (1, '8814.144')] [2023-09-21 13:37:14,296][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000010528_5390336.pth... [2023-09-21 13:37:14,298][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000010528_5390336.pth... 
[2023-09-21 13:37:14,304][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000010136_5189632.pth [2023-09-21 13:37:14,305][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000010136_5189632.pth [2023-09-21 13:37:16,736][52979] Updated weights for policy 1, policy_version 10560 (0.0015) [2023-09-21 13:37:16,736][52980] Updated weights for policy 0, policy_version 10560 (0.0015) [2023-09-21 13:37:19,287][52220] Fps is (10 sec: 13107.0, 60 sec: 13107.2, 300 sec: 12968.4). Total num frames: 10846208. Throughput: 0: 6626.3, 1: 6626.6. Samples: 10835016. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:37:19,288][52220] Avg episode reward: [(0, '6051.410'), (1, '7824.859')] [2023-09-21 13:37:22,912][52980] Updated weights for policy 0, policy_version 10640 (0.0015) [2023-09-21 13:37:22,912][52979] Updated weights for policy 1, policy_version 10640 (0.0014) [2023-09-21 13:37:24,286][52220] Fps is (10 sec: 13107.7, 60 sec: 13107.2, 300 sec: 12996.1). Total num frames: 10911744. Throughput: 0: 6602.1, 1: 6601.4. Samples: 10913244. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-21 13:37:24,287][52220] Avg episode reward: [(0, '6501.744'), (1, '7278.530')] [2023-09-21 13:37:27,810][52885] KL-divergence is very high: 143.8693 [2023-09-21 13:37:29,190][52980] Updated weights for policy 0, policy_version 10720 (0.0014) [2023-09-21 13:37:29,190][52979] Updated weights for policy 1, policy_version 10720 (0.0012) [2023-09-21 13:37:29,287][52220] Fps is (10 sec: 13107.1, 60 sec: 13243.7, 300 sec: 12996.1). Total num frames: 10977280. Throughput: 0: 6619.7, 1: 6618.3. Samples: 10952738. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-21 13:37:29,288][52220] Avg episode reward: [(0, '6683.727'), (1, '7822.196')] [2023-09-21 13:37:29,297][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000010720_5488640.pth... 
[2023-09-21 13:37:29,297][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000010720_5488640.pth... [2023-09-21 13:37:29,301][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000010336_5292032.pth [2023-09-21 13:37:29,302][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000010336_5292032.pth [2023-09-21 13:37:34,287][52220] Fps is (10 sec: 12287.7, 60 sec: 13107.2, 300 sec: 12968.3). Total num frames: 11034624. Throughput: 0: 6518.6, 1: 6518.2. Samples: 11024638. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:37:34,287][52220] Avg episode reward: [(0, '6778.857'), (1, '6895.518')] [2023-09-21 13:37:35,908][52980] Updated weights for policy 0, policy_version 10800 (0.0012) [2023-09-21 13:37:35,909][52979] Updated weights for policy 1, policy_version 10800 (0.0016) [2023-09-21 13:37:39,287][52220] Fps is (10 sec: 12288.0, 60 sec: 13107.2, 300 sec: 12968.4). Total num frames: 11100160. Throughput: 0: 6486.9, 1: 6488.4. Samples: 11102482. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:37:39,288][52220] Avg episode reward: [(0, '7150.504'), (1, '6519.836')] [2023-09-21 13:37:42,034][52980] Updated weights for policy 0, policy_version 10880 (0.0013) [2023-09-21 13:37:42,035][52979] Updated weights for policy 1, policy_version 10880 (0.0014) [2023-09-21 13:37:42,606][52885] KL-divergence is very high: 138.0579 [2023-09-21 13:37:44,286][52220] Fps is (10 sec: 13107.3, 60 sec: 13107.2, 300 sec: 12996.1). Total num frames: 11165696. Throughput: 0: 6489.3, 1: 6489.9. Samples: 11142974. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-09-21 13:37:44,287][52220] Avg episode reward: [(0, '6963.117'), (1, '7246.856')] [2023-09-21 13:37:44,298][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000010904_5582848.pth... [2023-09-21 13:37:44,298][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000010904_5582848.pth... 
[2023-09-21 13:37:44,302][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000010528_5390336.pth [2023-09-21 13:37:44,306][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000010528_5390336.pth [2023-09-21 13:37:48,122][52980] Updated weights for policy 0, policy_version 10960 (0.0012) [2023-09-21 13:37:48,122][52979] Updated weights for policy 1, policy_version 10960 (0.0013) [2023-09-21 13:37:49,287][52220] Fps is (10 sec: 13107.3, 60 sec: 12970.6, 300 sec: 12996.1). Total num frames: 11231232. Throughput: 0: 6550.0, 1: 6551.1. Samples: 11223624. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-21 13:37:49,288][52220] Avg episode reward: [(0, '7147.782'), (1, '8438.793')] [2023-09-21 13:37:54,252][52980] Updated weights for policy 0, policy_version 11040 (0.0012) [2023-09-21 13:37:54,252][52979] Updated weights for policy 1, policy_version 11040 (0.0014) [2023-09-21 13:37:54,286][52220] Fps is (10 sec: 13926.3, 60 sec: 13107.2, 300 sec: 13023.9). Total num frames: 11304960. Throughput: 0: 6531.1, 1: 6530.9. Samples: 11304730. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-21 13:37:54,287][52220] Avg episode reward: [(0, '7782.139'), (1, '8537.772')] [2023-09-21 13:37:57,931][52885] KL-divergence is very high: 115.5991 [2023-09-21 13:37:59,286][52220] Fps is (10 sec: 13926.6, 60 sec: 13107.2, 300 sec: 13023.9). Total num frames: 11370496. Throughput: 0: 6548.2, 1: 6549.4. Samples: 11345316. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:37:59,287][52220] Avg episode reward: [(0, '7417.448'), (1, '7447.728')] [2023-09-21 13:37:59,295][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000011104_5685248.pth... [2023-09-21 13:37:59,296][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000011104_5685248.pth... 
[2023-09-21 13:37:59,300][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000010720_5488640.pth [2023-09-21 13:37:59,303][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000010720_5488640.pth [2023-09-21 13:38:00,539][52980] Updated weights for policy 0, policy_version 11120 (0.0015) [2023-09-21 13:38:00,539][52979] Updated weights for policy 1, policy_version 11120 (0.0014) [2023-09-21 13:38:01,151][52884] KL-divergence is very high: 230.7247 [2023-09-21 13:38:04,286][52220] Fps is (10 sec: 12697.7, 60 sec: 13175.5, 300 sec: 13037.8). Total num frames: 11431936. Throughput: 0: 6512.6, 1: 6512.1. Samples: 11421128. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:38:04,288][52220] Avg episode reward: [(0, '6873.014'), (1, '6362.016')] [2023-09-21 13:38:06,743][52979] Updated weights for policy 1, policy_version 11200 (0.0013) [2023-09-21 13:38:06,744][52980] Updated weights for policy 0, policy_version 11200 (0.0014) [2023-09-21 13:38:07,990][52885] KL-divergence is very high: 212.7733 [2023-09-21 13:38:08,000][52885] KL-divergence is very high: 139.2826 [2023-09-21 13:38:09,287][52220] Fps is (10 sec: 12287.9, 60 sec: 12970.6, 300 sec: 12996.1). Total num frames: 11493376. Throughput: 0: 6517.9, 1: 6518.6. Samples: 11499888. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:38:09,287][52220] Avg episode reward: [(0, '6786.092'), (1, '6250.348')] [2023-09-21 13:38:13,076][52885] KL-divergence is very high: 109.2257 [2023-09-21 13:38:13,086][52885] KL-divergence is very high: 150.3513 [2023-09-21 13:38:13,109][52980] Updated weights for policy 0, policy_version 11280 (0.0012) [2023-09-21 13:38:13,110][52979] Updated weights for policy 1, policy_version 11280 (0.0016) [2023-09-21 13:38:14,286][52220] Fps is (10 sec: 12697.5, 60 sec: 12970.7, 300 sec: 12996.1). Total num frames: 11558912. Throughput: 0: 6512.6, 1: 6514.2. Samples: 11538944. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:38:14,287][52220] Avg episode reward: [(0, '7521.599'), (1, '5948.663')] [2023-09-21 13:38:14,293][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000011288_5779456.pth... [2023-09-21 13:38:14,293][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000011288_5779456.pth... [2023-09-21 13:38:14,296][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000010904_5582848.pth [2023-09-21 13:38:14,296][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000010904_5582848.pth [2023-09-21 13:38:19,287][52220] Fps is (10 sec: 13107.2, 60 sec: 12970.7, 300 sec: 12996.1). Total num frames: 11624448. Throughput: 0: 6574.6, 1: 6572.5. Samples: 11616260. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-09-21 13:38:19,287][52220] Avg episode reward: [(0, '8163.203'), (1, '6327.272')] [2023-09-21 13:38:19,616][52980] Updated weights for policy 0, policy_version 11360 (0.0016) [2023-09-21 13:38:19,616][52979] Updated weights for policy 1, policy_version 11360 (0.0014) [2023-09-21 13:38:24,287][52220] Fps is (10 sec: 13107.1, 60 sec: 12970.6, 300 sec: 12996.1). Total num frames: 11689984. Throughput: 0: 6565.7, 1: 6566.3. Samples: 11693422. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:38:24,287][52220] Avg episode reward: [(0, '8707.112'), (1, '7533.837')] [2023-09-21 13:38:25,719][52980] Updated weights for policy 0, policy_version 11440 (0.0012) [2023-09-21 13:38:25,719][52979] Updated weights for policy 1, policy_version 11440 (0.0014) [2023-09-21 13:38:29,287][52220] Fps is (10 sec: 13107.1, 60 sec: 12970.7, 300 sec: 12996.1). Total num frames: 11755520. Throughput: 0: 6542.6, 1: 6542.6. Samples: 11731810. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:38:29,288][52220] Avg episode reward: [(0, '8334.238'), (1, '7161.629')] [2023-09-21 13:38:29,296][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000011480_5877760.pth... [2023-09-21 13:38:29,296][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000011480_5877760.pth... [2023-09-21 13:38:29,301][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000011104_5685248.pth [2023-09-21 13:38:29,303][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000011104_5685248.pth [2023-09-21 13:38:32,316][52979] Updated weights for policy 1, policy_version 11520 (0.0014) [2023-09-21 13:38:32,316][52980] Updated weights for policy 0, policy_version 11520 (0.0014) [2023-09-21 13:38:34,287][52220] Fps is (10 sec: 13106.9, 60 sec: 13107.2, 300 sec: 12968.3). Total num frames: 11821056. Throughput: 0: 6506.0, 1: 6505.5. Samples: 11809142. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-21 13:38:34,288][52220] Avg episode reward: [(0, '8241.042'), (1, '7426.216')] [2023-09-21 13:38:38,500][52980] Updated weights for policy 0, policy_version 11600 (0.0014) [2023-09-21 13:38:38,500][52979] Updated weights for policy 1, policy_version 11600 (0.0014) [2023-09-21 13:38:39,287][52220] Fps is (10 sec: 13107.3, 60 sec: 13107.2, 300 sec: 12996.1). Total num frames: 11886592. Throughput: 0: 6467.5, 1: 6467.5. Samples: 11886808. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-21 13:38:39,288][52220] Avg episode reward: [(0, '8147.919'), (1, '6684.867')] [2023-09-21 13:38:44,287][52220] Fps is (10 sec: 13107.3, 60 sec: 13107.2, 300 sec: 12996.1). Total num frames: 11952128. Throughput: 0: 6434.1, 1: 6434.2. Samples: 11924392. 
Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-09-21 13:38:44,288][52220] Avg episode reward: [(0, '8057.555'), (1, '4198.564')] [2023-09-21 13:38:44,296][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000011672_5976064.pth... [2023-09-21 13:38:44,297][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000011672_5976064.pth... [2023-09-21 13:38:44,304][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000011288_5779456.pth [2023-09-21 13:38:44,307][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000011288_5779456.pth [2023-09-21 13:38:44,662][52979] Updated weights for policy 1, policy_version 11680 (0.0015) [2023-09-21 13:38:44,663][52980] Updated weights for policy 0, policy_version 11680 (0.0015) [2023-09-21 13:38:49,287][52220] Fps is (10 sec: 13107.1, 60 sec: 13107.2, 300 sec: 12996.1). Total num frames: 12017664. Throughput: 0: 6514.3, 1: 6514.4. Samples: 12007422. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-09-21 13:38:49,288][52220] Avg episode reward: [(0, '7413.246'), (1, '3924.282')] [2023-09-21 13:38:50,814][52979] Updated weights for policy 1, policy_version 11760 (0.0010) [2023-09-21 13:38:50,815][52980] Updated weights for policy 0, policy_version 11760 (0.0014) [2023-09-21 13:38:54,286][52220] Fps is (10 sec: 13107.6, 60 sec: 12970.7, 300 sec: 12996.1). Total num frames: 12083200. Throughput: 0: 6511.6, 1: 6510.8. Samples: 12085896. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:38:54,287][52220] Avg episode reward: [(0, '7505.948'), (1, '5026.154')] [2023-09-21 13:38:57,041][52980] Updated weights for policy 0, policy_version 11840 (0.0011) [2023-09-21 13:38:57,042][52979] Updated weights for policy 1, policy_version 11840 (0.0014) [2023-09-21 13:38:59,286][52220] Fps is (10 sec: 13107.5, 60 sec: 12970.7, 300 sec: 12996.1). Total num frames: 12148736. Throughput: 0: 6525.1, 1: 6525.3. Samples: 12126212. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:38:59,287][52220] Avg episode reward: [(0, '7775.590'), (1, '5951.310')] [2023-09-21 13:38:59,292][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000011864_6074368.pth... [2023-09-21 13:38:59,292][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000011864_6074368.pth... [2023-09-21 13:38:59,296][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000011480_5877760.pth [2023-09-21 13:38:59,298][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000011480_5877760.pth [2023-09-21 13:39:03,339][52979] Updated weights for policy 1, policy_version 11920 (0.0014) [2023-09-21 13:39:03,339][52980] Updated weights for policy 0, policy_version 11920 (0.0011) [2023-09-21 13:39:04,286][52220] Fps is (10 sec: 13107.0, 60 sec: 13038.9, 300 sec: 12996.1). Total num frames: 12214272. Throughput: 0: 6540.5, 1: 6542.7. Samples: 12205004. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:39:04,287][52220] Avg episode reward: [(0, '7961.348'), (1, '6505.100')] [2023-09-21 13:39:09,287][52220] Fps is (10 sec: 13107.0, 60 sec: 13107.2, 300 sec: 12996.1). Total num frames: 12279808. Throughput: 0: 6536.7, 1: 6537.0. Samples: 12281738. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:39:09,287][52220] Avg episode reward: [(0, '7495.365'), (1, '6685.848')] [2023-09-21 13:39:09,636][52980] Updated weights for policy 0, policy_version 12000 (0.0016) [2023-09-21 13:39:09,636][52979] Updated weights for policy 1, policy_version 12000 (0.0014) [2023-09-21 13:39:14,287][52220] Fps is (10 sec: 13107.0, 60 sec: 13107.2, 300 sec: 12996.1). Total num frames: 12345344. Throughput: 0: 6563.3, 1: 6563.6. Samples: 12322520. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:39:14,288][52220] Avg episode reward: [(0, '5639.209'), (1, '7243.011')] [2023-09-21 13:39:14,296][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000012056_6172672.pth... [2023-09-21 13:39:14,296][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000012056_6172672.pth... [2023-09-21 13:39:14,303][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000011672_5976064.pth [2023-09-21 13:39:14,304][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000011672_5976064.pth [2023-09-21 13:39:15,687][52979] Updated weights for policy 1, policy_version 12080 (0.0011) [2023-09-21 13:39:15,688][52980] Updated weights for policy 0, policy_version 12080 (0.0010) [2023-09-21 13:39:19,287][52220] Fps is (10 sec: 13107.2, 60 sec: 13107.2, 300 sec: 12996.1). Total num frames: 12410880. Throughput: 0: 6590.5, 1: 6589.8. Samples: 12402256. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-09-21 13:39:19,288][52220] Avg episode reward: [(0, '5827.376'), (1, '7798.181')] [2023-09-21 13:39:22,172][52884] KL-divergence is very high: 129.4762 [2023-09-21 13:39:22,191][52979] Updated weights for policy 1, policy_version 12160 (0.0008) [2023-09-21 13:39:22,192][52980] Updated weights for policy 0, policy_version 12160 (0.0015) [2023-09-21 13:39:24,286][52220] Fps is (10 sec: 13107.5, 60 sec: 13107.2, 300 sec: 13023.9). Total num frames: 12476416. Throughput: 0: 6586.4, 1: 6587.3. Samples: 12479620. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-09-21 13:39:24,287][52220] Avg episode reward: [(0, '5543.639'), (1, '8535.429')] [2023-09-21 13:39:28,185][52980] Updated weights for policy 0, policy_version 12240 (0.0014) [2023-09-21 13:39:28,185][52979] Updated weights for policy 1, policy_version 12240 (0.0015) [2023-09-21 13:39:29,287][52220] Fps is (10 sec: 13107.1, 60 sec: 13107.2, 300 sec: 13023.9). Total num frames: 12541952. 
Throughput: 0: 6631.4, 1: 6631.0. Samples: 12521200. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-21 13:39:29,288][52220] Avg episode reward: [(0, '5169.488'), (1, '8902.665')] [2023-09-21 13:39:29,298][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000012248_6270976.pth... [2023-09-21 13:39:29,298][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000012248_6270976.pth... [2023-09-21 13:39:29,310][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000011864_6074368.pth [2023-09-21 13:39:29,314][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000011864_6074368.pth [2023-09-21 13:39:34,286][52220] Fps is (10 sec: 13107.3, 60 sec: 13107.3, 300 sec: 13023.9). Total num frames: 12607488. Throughput: 0: 6538.6, 1: 6538.5. Samples: 12595888. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-21 13:39:34,287][52220] Avg episode reward: [(0, '5544.157'), (1, '8813.978')] [2023-09-21 13:39:34,748][52979] Updated weights for policy 1, policy_version 12320 (0.0015) [2023-09-21 13:39:34,748][52980] Updated weights for policy 0, policy_version 12320 (0.0011) [2023-09-21 13:39:35,923][52885] KL-divergence is very high: 109.0132 [2023-09-21 13:39:39,286][52220] Fps is (10 sec: 13107.5, 60 sec: 13107.2, 300 sec: 13023.9). Total num frames: 12673024. Throughput: 0: 6524.4, 1: 6523.6. Samples: 12673058. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-21 13:39:39,287][52220] Avg episode reward: [(0, '5093.072'), (1, '8631.845')] [2023-09-21 13:39:41,072][52979] Updated weights for policy 1, policy_version 12400 (0.0014) [2023-09-21 13:39:41,072][52980] Updated weights for policy 0, policy_version 12400 (0.0011) [2023-09-21 13:39:44,287][52220] Fps is (10 sec: 12287.6, 60 sec: 12970.7, 300 sec: 12996.1). Total num frames: 12730368. Throughput: 0: 6501.5, 1: 6501.3. Samples: 12711342. 
Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-21 13:39:44,288][52220] Avg episode reward: [(0, '3407.951'), (1, '8258.386')] [2023-09-21 13:39:44,296][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000012432_6365184.pth... [2023-09-21 13:39:44,296][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000012432_6365184.pth... [2023-09-21 13:39:44,302][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000012056_6172672.pth [2023-09-21 13:39:44,304][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000012056_6172672.pth [2023-09-21 13:39:47,504][52980] Updated weights for policy 0, policy_version 12480 (0.0014) [2023-09-21 13:39:47,504][52979] Updated weights for policy 1, policy_version 12480 (0.0012) [2023-09-21 13:39:49,286][52220] Fps is (10 sec: 12287.9, 60 sec: 12970.7, 300 sec: 12996.1). Total num frames: 12795904. Throughput: 0: 6491.6, 1: 6491.5. Samples: 12789246. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:39:49,287][52220] Avg episode reward: [(0, '2658.060'), (1, '2674.528')] [2023-09-21 13:39:50,492][52885] KL-divergence is very high: 115.5594 [2023-09-21 13:39:53,672][52980] Updated weights for policy 0, policy_version 12560 (0.0013) [2023-09-21 13:39:53,674][52979] Updated weights for policy 1, policy_version 12560 (0.0013) [2023-09-21 13:39:54,286][52220] Fps is (10 sec: 13517.1, 60 sec: 13038.9, 300 sec: 13010.0). Total num frames: 12865536. Throughput: 0: 6524.7, 1: 6523.8. Samples: 12868920. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:39:54,287][52220] Avg episode reward: [(0, '3869.231'), (1, '1042.752')] [2023-09-21 13:39:59,286][52220] Fps is (10 sec: 13107.2, 60 sec: 12970.6, 300 sec: 12996.1). Total num frames: 12926976. Throughput: 0: 6491.7, 1: 6491.7. Samples: 12906770. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:39:59,287][52220] Avg episode reward: [(0, '4981.854'), (1, '1124.933')] [2023-09-21 13:39:59,340][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000012632_6467584.pth... [2023-09-21 13:39:59,342][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000012632_6467584.pth... [2023-09-21 13:39:59,343][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000012248_6270976.pth [2023-09-21 13:39:59,345][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000012248_6270976.pth [2023-09-21 13:39:59,892][52980] Updated weights for policy 0, policy_version 12640 (0.0012) [2023-09-21 13:39:59,892][52979] Updated weights for policy 1, policy_version 12640 (0.0015) [2023-09-21 13:40:02,465][52885] KL-divergence is very high: 110.5634 [2023-09-21 13:40:04,286][52220] Fps is (10 sec: 12697.5, 60 sec: 12970.7, 300 sec: 12996.1). Total num frames: 12992512. Throughput: 0: 6473.3, 1: 6473.4. Samples: 12984858. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:40:04,287][52220] Avg episode reward: [(0, '4702.406'), (1, '1019.731')] [2023-09-21 13:40:06,275][52980] Updated weights for policy 0, policy_version 12720 (0.0014) [2023-09-21 13:40:06,275][52979] Updated weights for policy 1, policy_version 12720 (0.0014) [2023-09-21 13:40:09,287][52220] Fps is (10 sec: 13107.1, 60 sec: 12970.7, 300 sec: 12996.1). Total num frames: 13058048. Throughput: 0: 6496.7, 1: 6495.8. Samples: 13064286. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:40:09,288][52220] Avg episode reward: [(0, '5727.771'), (1, '851.461')] [2023-09-21 13:40:12,550][52980] Updated weights for policy 0, policy_version 12800 (0.0015) [2023-09-21 13:40:12,550][52979] Updated weights for policy 1, policy_version 12800 (0.0015) [2023-09-21 13:40:14,287][52220] Fps is (10 sec: 13107.1, 60 sec: 12970.7, 300 sec: 12996.1). Total num frames: 13123584. 
Throughput: 0: 6459.3, 1: 6459.3. Samples: 13102536. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:40:14,287][52220] Avg episode reward: [(0, '6286.904'), (1, '2510.005')] [2023-09-21 13:40:14,297][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000012816_6561792.pth... [2023-09-21 13:40:14,298][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000012816_6561792.pth... [2023-09-21 13:40:14,301][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000012432_6365184.pth [2023-09-21 13:40:14,306][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000012432_6365184.pth [2023-09-21 13:40:18,934][52980] Updated weights for policy 0, policy_version 12880 (0.0014) [2023-09-21 13:40:18,934][52979] Updated weights for policy 1, policy_version 12880 (0.0010) [2023-09-21 13:40:19,286][52220] Fps is (10 sec: 13107.3, 60 sec: 12970.7, 300 sec: 13023.9). Total num frames: 13189120. Throughput: 0: 6495.7, 1: 6496.0. Samples: 13180516. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-09-21 13:40:19,287][52220] Avg episode reward: [(0, '6403.772'), (1, '1094.830')] [2023-09-21 13:40:24,286][52220] Fps is (10 sec: 13107.3, 60 sec: 12970.6, 300 sec: 13023.9). Total num frames: 13254656. Throughput: 0: 6480.1, 1: 6480.6. Samples: 13256292. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-09-21 13:40:24,287][52220] Avg episode reward: [(0, '6589.868'), (1, '3891.651')] [2023-09-21 13:40:25,346][52979] Updated weights for policy 1, policy_version 12960 (0.0011) [2023-09-21 13:40:25,346][52980] Updated weights for policy 0, policy_version 12960 (0.0014) [2023-09-21 13:40:29,287][52220] Fps is (10 sec: 13107.1, 60 sec: 12970.7, 300 sec: 13023.9). Total num frames: 13320192. Throughput: 0: 6493.9, 1: 6491.5. Samples: 13295684. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:40:29,288][52220] Avg episode reward: [(0, '5357.029'), (1, '4019.919')] [2023-09-21 13:40:29,296][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000013008_6660096.pth... [2023-09-21 13:40:29,296][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000013008_6660096.pth... [2023-09-21 13:40:29,302][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000012632_6467584.pth [2023-09-21 13:40:29,304][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000012632_6467584.pth [2023-09-21 13:40:31,604][52980] Updated weights for policy 0, policy_version 13040 (0.0015) [2023-09-21 13:40:31,605][52979] Updated weights for policy 1, policy_version 13040 (0.0014) [2023-09-21 13:40:34,287][52220] Fps is (10 sec: 13107.1, 60 sec: 12970.6, 300 sec: 13051.7). Total num frames: 13385728. Throughput: 0: 6513.6, 1: 6512.9. Samples: 13375438. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:40:34,288][52220] Avg episode reward: [(0, '4798.645'), (1, '1322.321')] [2023-09-21 13:40:37,698][52980] Updated weights for policy 0, policy_version 13120 (0.0012) [2023-09-21 13:40:37,698][52979] Updated weights for policy 1, policy_version 13120 (0.0013) [2023-09-21 13:40:39,287][52220] Fps is (10 sec: 13107.3, 60 sec: 12970.6, 300 sec: 13051.7). Total num frames: 13451264. Throughput: 0: 6515.2, 1: 6516.0. Samples: 13455328. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:40:39,287][52220] Avg episode reward: [(0, '4894.226'), (1, '1733.578')] [2023-09-21 13:40:43,916][52980] Updated weights for policy 0, policy_version 13200 (0.0012) [2023-09-21 13:40:43,917][52979] Updated weights for policy 1, policy_version 13200 (0.0015) [2023-09-21 13:40:44,286][52220] Fps is (10 sec: 13107.5, 60 sec: 13107.3, 300 sec: 13051.7). Total num frames: 13516800. Throughput: 0: 6536.4, 1: 6536.6. Samples: 13495052. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:40:44,287][52220] Avg episode reward: [(0, '5821.654'), (1, '2273.474')] [2023-09-21 13:40:44,294][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000013200_6758400.pth... [2023-09-21 13:40:44,294][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000013200_6758400.pth... [2023-09-21 13:40:44,302][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000012816_6561792.pth [2023-09-21 13:40:44,302][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000012816_6561792.pth [2023-09-21 13:40:49,287][52220] Fps is (10 sec: 13107.2, 60 sec: 13107.2, 300 sec: 13051.7). Total num frames: 13582336. Throughput: 0: 6536.3, 1: 6537.0. Samples: 13573156. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:40:49,288][52220] Avg episode reward: [(0, '6473.417'), (1, '2034.321')] [2023-09-21 13:40:50,262][52980] Updated weights for policy 0, policy_version 13280 (0.0011) [2023-09-21 13:40:50,263][52979] Updated weights for policy 1, policy_version 13280 (0.0014) [2023-09-21 13:40:54,287][52220] Fps is (10 sec: 13106.9, 60 sec: 13038.9, 300 sec: 13051.7). Total num frames: 13647872. Throughput: 0: 6534.6, 1: 6535.8. Samples: 13652452. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:40:54,287][52220] Avg episode reward: [(0, '6846.870'), (1, '4313.712')] [2023-09-21 13:40:56,350][52979] Updated weights for policy 1, policy_version 13360 (0.0015) [2023-09-21 13:40:56,351][52980] Updated weights for policy 0, policy_version 13360 (0.0015) [2023-09-21 13:40:59,287][52220] Fps is (10 sec: 13107.1, 60 sec: 13107.2, 300 sec: 13051.7). Total num frames: 13713408. Throughput: 0: 6553.2, 1: 6553.4. Samples: 13692332. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:40:59,287][52220] Avg episode reward: [(0, '6939.912'), (1, '6488.300')] [2023-09-21 13:40:59,296][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000013392_6856704.pth... [2023-09-21 13:40:59,296][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000013392_6856704.pth... [2023-09-21 13:40:59,302][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000013008_6660096.pth [2023-09-21 13:40:59,303][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000013008_6660096.pth [2023-09-21 13:41:02,541][52979] Updated weights for policy 1, policy_version 13440 (0.0012) [2023-09-21 13:41:02,541][52980] Updated weights for policy 0, policy_version 13440 (0.0014) [2023-09-21 13:41:04,287][52220] Fps is (10 sec: 13107.2, 60 sec: 13107.2, 300 sec: 13051.7). Total num frames: 13778944. Throughput: 0: 6570.4, 1: 6571.0. Samples: 13771876. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:41:04,287][52220] Avg episode reward: [(0, '6845.519'), (1, '6178.206')] [2023-09-21 13:41:08,761][52979] Updated weights for policy 1, policy_version 13520 (0.0012) [2023-09-21 13:41:08,761][52980] Updated weights for policy 0, policy_version 13520 (0.0010) [2023-09-21 13:41:09,287][52220] Fps is (10 sec: 13107.3, 60 sec: 13107.2, 300 sec: 13051.7). Total num frames: 13844480. Throughput: 0: 6601.5, 1: 6602.3. Samples: 13850466. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-09-21 13:41:09,288][52220] Avg episode reward: [(0, '6752.829'), (1, '1113.059')] [2023-09-21 13:41:14,287][52220] Fps is (10 sec: 13107.1, 60 sec: 13107.2, 300 sec: 13051.7). Total num frames: 13910016. Throughput: 0: 6612.3, 1: 6614.8. Samples: 13890904. 
Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-09-21 13:41:14,288][52220] Avg episode reward: [(0, '6938.709'), (1, '775.866')] [2023-09-21 13:41:14,314][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000013592_6959104.pth... [2023-09-21 13:41:14,318][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000013592_6959104.pth... [2023-09-21 13:41:14,321][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000013200_6758400.pth [2023-09-21 13:41:14,321][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000013200_6758400.pth [2023-09-21 13:41:14,951][52979] Updated weights for policy 1, policy_version 13600 (0.0014) [2023-09-21 13:41:14,951][52980] Updated weights for policy 0, policy_version 13600 (0.0014) [2023-09-21 13:41:19,286][52220] Fps is (10 sec: 13107.2, 60 sec: 13107.2, 300 sec: 13051.7). Total num frames: 13975552. Throughput: 0: 6578.3, 1: 6577.0. Samples: 13967426. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-09-21 13:41:19,287][52220] Avg episode reward: [(0, '7684.580'), (1, '875.549')] [2023-09-21 13:41:21,564][52979] Updated weights for policy 1, policy_version 13680 (0.0015) [2023-09-21 13:41:21,565][52980] Updated weights for policy 0, policy_version 13680 (0.0014) [2023-09-21 13:41:24,286][52220] Fps is (10 sec: 13107.6, 60 sec: 13107.2, 300 sec: 13079.4). Total num frames: 14041088. Throughput: 0: 6522.3, 1: 6522.0. Samples: 14042316. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-09-21 13:41:24,287][52220] Avg episode reward: [(0, '7871.616'), (1, '901.659')] [2023-09-21 13:41:27,942][52980] Updated weights for policy 0, policy_version 13760 (0.0016) [2023-09-21 13:41:27,942][52979] Updated weights for policy 1, policy_version 13760 (0.0013) [2023-09-21 13:41:29,287][52220] Fps is (10 sec: 13106.8, 60 sec: 13107.2, 300 sec: 13079.4). Total num frames: 14106624. Throughput: 0: 6510.1, 1: 6509.6. Samples: 14080944. 
Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-21 13:41:29,288][52220] Avg episode reward: [(0, '8058.714'), (1, '986.991')] [2023-09-21 13:41:29,297][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000013776_7053312.pth... [2023-09-21 13:41:29,297][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000013776_7053312.pth... [2023-09-21 13:41:29,304][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000013392_6856704.pth [2023-09-21 13:41:29,307][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000013392_6856704.pth [2023-09-21 13:41:34,286][52220] Fps is (10 sec: 12287.9, 60 sec: 12970.7, 300 sec: 13051.7). Total num frames: 14163968. Throughput: 0: 6475.9, 1: 6475.1. Samples: 14155952. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-21 13:41:34,287][52220] Avg episode reward: [(0, '7422.855'), (1, '973.493')] [2023-09-21 13:41:34,417][52980] Updated weights for policy 0, policy_version 13840 (0.0011) [2023-09-21 13:41:34,418][52979] Updated weights for policy 1, policy_version 13840 (0.0011) [2023-09-21 13:41:39,286][52220] Fps is (10 sec: 13107.6, 60 sec: 13107.2, 300 sec: 13079.4). Total num frames: 14237696. Throughput: 0: 6505.0, 1: 6503.5. Samples: 14237834. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:41:39,288][52220] Avg episode reward: [(0, '6406.380'), (1, '944.796')] [2023-09-21 13:41:40,439][52980] Updated weights for policy 0, policy_version 13920 (0.0013) [2023-09-21 13:41:40,440][52979] Updated weights for policy 1, policy_version 13920 (0.0015) [2023-09-21 13:41:44,287][52220] Fps is (10 sec: 13926.3, 60 sec: 13107.1, 300 sec: 13051.6). Total num frames: 14303232. Throughput: 0: 6514.6, 1: 6512.8. Samples: 14278562. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:41:44,287][52220] Avg episode reward: [(0, '5668.432'), (1, '1637.056')] [2023-09-21 13:41:44,295][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000013968_7151616.pth... [2023-09-21 13:41:44,296][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000013968_7151616.pth... [2023-09-21 13:41:44,302][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000013592_6959104.pth [2023-09-21 13:41:44,302][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000013592_6959104.pth [2023-09-21 13:41:46,779][52980] Updated weights for policy 0, policy_version 14000 (0.0010) [2023-09-21 13:41:46,779][52979] Updated weights for policy 1, policy_version 14000 (0.0014) [2023-09-21 13:41:49,286][52220] Fps is (10 sec: 13107.2, 60 sec: 13107.2, 300 sec: 13051.7). Total num frames: 14368768. Throughput: 0: 6501.0, 1: 6500.4. Samples: 14356936. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:41:49,287][52220] Avg episode reward: [(0, '5751.232'), (1, '1077.826')] [2023-09-21 13:41:52,943][52980] Updated weights for policy 0, policy_version 14080 (0.0014) [2023-09-21 13:41:52,943][52979] Updated weights for policy 1, policy_version 14080 (0.0014) [2023-09-21 13:41:54,286][52220] Fps is (10 sec: 13107.4, 60 sec: 13107.2, 300 sec: 13051.7). Total num frames: 14434304. Throughput: 0: 6487.6, 1: 6486.1. Samples: 14434282. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:41:54,287][52220] Avg episode reward: [(0, '5288.614'), (1, '1222.203')] [2023-09-21 13:41:59,074][52979] Updated weights for policy 1, policy_version 14160 (0.0012) [2023-09-21 13:41:59,075][52980] Updated weights for policy 0, policy_version 14160 (0.0011) [2023-09-21 13:41:59,287][52220] Fps is (10 sec: 13107.1, 60 sec: 13107.2, 300 sec: 13079.4). Total num frames: 14499840. Throughput: 0: 6491.3, 1: 6489.6. Samples: 14475046. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:41:59,288][52220] Avg episode reward: [(0, '5381.397'), (1, '4453.209')] [2023-09-21 13:41:59,295][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000014160_7249920.pth... [2023-09-21 13:41:59,296][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000014160_7249920.pth... [2023-09-21 13:41:59,301][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000013776_7053312.pth [2023-09-21 13:41:59,305][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000013776_7053312.pth [2023-09-21 13:42:04,287][52220] Fps is (10 sec: 13107.0, 60 sec: 13107.2, 300 sec: 13051.7). Total num frames: 14565376. Throughput: 0: 6495.6, 1: 6497.6. Samples: 14552122. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:42:04,287][52220] Avg episode reward: [(0, '5913.161'), (1, '6389.361')] [2023-09-21 13:42:05,472][52979] Updated weights for policy 1, policy_version 14240 (0.0009) [2023-09-21 13:42:05,473][52980] Updated weights for policy 0, policy_version 14240 (0.0012) [2023-09-21 13:42:09,287][52220] Fps is (10 sec: 12288.1, 60 sec: 12970.7, 300 sec: 13023.9). Total num frames: 14622720. Throughput: 0: 6529.7, 1: 6529.5. Samples: 14629980. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:42:09,288][52220] Avg episode reward: [(0, '4700.708'), (1, '1918.955')] [2023-09-21 13:42:11,933][52979] Updated weights for policy 1, policy_version 14320 (0.0012) [2023-09-21 13:42:11,934][52980] Updated weights for policy 0, policy_version 14320 (0.0013) [2023-09-21 13:42:14,286][52220] Fps is (10 sec: 12288.1, 60 sec: 12970.7, 300 sec: 13023.9). Total num frames: 14688256. Throughput: 0: 6520.9, 1: 6521.3. Samples: 14667840. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:42:14,287][52220] Avg episode reward: [(0, '4606.087'), (1, '4103.747')] [2023-09-21 13:42:14,337][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000014352_7348224.pth... [2023-09-21 13:42:14,339][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000014352_7348224.pth... [2023-09-21 13:42:14,340][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000013968_7151616.pth [2023-09-21 13:42:14,342][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000013968_7151616.pth [2023-09-21 13:42:18,049][52980] Updated weights for policy 0, policy_version 14400 (0.0013) [2023-09-21 13:42:18,050][52979] Updated weights for policy 1, policy_version 14400 (0.0009) [2023-09-21 13:42:19,286][52220] Fps is (10 sec: 13107.3, 60 sec: 12970.7, 300 sec: 13023.9). Total num frames: 14753792. Throughput: 0: 6568.6, 1: 6569.8. Samples: 14747182. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:42:19,287][52220] Avg episode reward: [(0, '5632.569'), (1, '1881.283')] [2023-09-21 13:42:24,286][52220] Fps is (10 sec: 13107.3, 60 sec: 12970.7, 300 sec: 13023.9). Total num frames: 14819328. Throughput: 0: 6508.1, 1: 6509.0. Samples: 14823600. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-09-21 13:42:24,287][52220] Avg episode reward: [(0, '6099.656'), (1, '2057.043')] [2023-09-21 13:42:24,532][52980] Updated weights for policy 0, policy_version 14480 (0.0016) [2023-09-21 13:42:24,532][52979] Updated weights for policy 1, policy_version 14480 (0.0011) [2023-09-21 13:42:29,287][52220] Fps is (10 sec: 13106.8, 60 sec: 12970.7, 300 sec: 13051.7). Total num frames: 14884864. Throughput: 0: 6464.5, 1: 6465.3. Samples: 14860408. 
Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-09-21 13:42:29,288][52220] Avg episode reward: [(0, '6192.287'), (1, '1154.242')] [2023-09-21 13:42:29,296][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000014536_7442432.pth... [2023-09-21 13:42:29,296][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000014536_7442432.pth... [2023-09-21 13:42:29,303][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000014160_7249920.pth [2023-09-21 13:42:29,305][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000014160_7249920.pth [2023-09-21 13:42:30,892][52979] Updated weights for policy 1, policy_version 14560 (0.0014) [2023-09-21 13:42:30,892][52980] Updated weights for policy 0, policy_version 14560 (0.0014) [2023-09-21 13:42:34,286][52220] Fps is (10 sec: 13107.1, 60 sec: 13107.2, 300 sec: 13051.7). Total num frames: 14950400. Throughput: 0: 6484.9, 1: 6483.7. Samples: 14940524. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:42:34,287][52220] Avg episode reward: [(0, '6564.418'), (1, '776.134')] [2023-09-21 13:42:37,348][52979] Updated weights for policy 1, policy_version 14640 (0.0011) [2023-09-21 13:42:37,350][52980] Updated weights for policy 0, policy_version 14640 (0.0012) [2023-09-21 13:42:39,286][52220] Fps is (10 sec: 12288.4, 60 sec: 12834.2, 300 sec: 13023.9). Total num frames: 15007744. Throughput: 0: 6449.1, 1: 6450.1. Samples: 15014748. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:42:39,287][52220] Avg episode reward: [(0, '6752.231'), (1, '798.317')] [2023-09-21 13:42:43,911][52979] Updated weights for policy 1, policy_version 14720 (0.0011) [2023-09-21 13:42:43,911][52980] Updated weights for policy 0, policy_version 14720 (0.0013) [2023-09-21 13:42:44,287][52220] Fps is (10 sec: 12287.8, 60 sec: 12834.1, 300 sec: 13023.9). Total num frames: 15073280. Throughput: 0: 6414.2, 1: 6414.8. Samples: 15052352. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:42:44,288][52220] Avg episode reward: [(0, '7218.806'), (1, '952.032')] [2023-09-21 13:42:44,295][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000014720_7536640.pth... [2023-09-21 13:42:44,295][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000014720_7536640.pth... [2023-09-21 13:42:44,300][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000014352_7348224.pth [2023-09-21 13:42:44,304][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000014352_7348224.pth [2023-09-21 13:42:49,286][52220] Fps is (10 sec: 13107.2, 60 sec: 12834.1, 300 sec: 12996.1). Total num frames: 15138816. Throughput: 0: 6432.9, 1: 6433.1. Samples: 15131090. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:42:49,287][52220] Avg episode reward: [(0, '7591.005'), (1, '1153.393')] [2023-09-21 13:42:50,055][52979] Updated weights for policy 1, policy_version 14800 (0.0013) [2023-09-21 13:42:50,055][52980] Updated weights for policy 0, policy_version 14800 (0.0014) [2023-09-21 13:42:54,287][52220] Fps is (10 sec: 13107.3, 60 sec: 12834.1, 300 sec: 12996.1). Total num frames: 15204352. Throughput: 0: 6436.1, 1: 6436.5. Samples: 15209242. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:42:54,288][52220] Avg episode reward: [(0, '7315.933'), (1, '505.392')] [2023-09-21 13:42:56,471][52980] Updated weights for policy 0, policy_version 14880 (0.0016) [2023-09-21 13:42:56,471][52979] Updated weights for policy 1, policy_version 14880 (0.0015) [2023-09-21 13:42:59,286][52220] Fps is (10 sec: 13107.2, 60 sec: 12834.2, 300 sec: 13010.0). Total num frames: 15269888. Throughput: 0: 6419.2, 1: 6418.8. Samples: 15245548. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:42:59,287][52220] Avg episode reward: [(0, '7593.644'), (1, '585.633')] [2023-09-21 13:42:59,296][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000014912_7634944.pth... [2023-09-21 13:42:59,296][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000014912_7634944.pth... [2023-09-21 13:42:59,308][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000014536_7442432.pth [2023-09-21 13:42:59,308][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000014536_7442432.pth [2023-09-21 13:43:02,766][52980] Updated weights for policy 0, policy_version 14960 (0.0015) [2023-09-21 13:43:02,766][52979] Updated weights for policy 1, policy_version 14960 (0.0015) [2023-09-21 13:43:04,287][52220] Fps is (10 sec: 13107.2, 60 sec: 12834.1, 300 sec: 13023.9). Total num frames: 15335424. Throughput: 0: 6411.9, 1: 6411.4. Samples: 15324232. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-21 13:43:04,288][52220] Avg episode reward: [(0, '7965.551'), (1, '611.235')] [2023-09-21 13:43:09,287][52220] Fps is (10 sec: 12287.9, 60 sec: 12834.1, 300 sec: 12996.1). Total num frames: 15392768. Throughput: 0: 6396.2, 1: 6396.4. Samples: 15399270. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-21 13:43:09,287][52220] Avg episode reward: [(0, '7776.205'), (1, '863.828')] [2023-09-21 13:43:09,400][52979] Updated weights for policy 1, policy_version 15040 (0.0013) [2023-09-21 13:43:09,401][52980] Updated weights for policy 0, policy_version 15040 (0.0016) [2023-09-21 13:43:14,287][52220] Fps is (10 sec: 13107.2, 60 sec: 12970.6, 300 sec: 13023.9). Total num frames: 15466496. Throughput: 0: 6432.4, 1: 6433.6. Samples: 15439374. 
Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-21 13:43:14,287][52220] Avg episode reward: [(0, '7590.204'), (1, '841.888')] [2023-09-21 13:43:14,295][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000015104_7733248.pth... [2023-09-21 13:43:14,296][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000015104_7733248.pth... [2023-09-21 13:43:14,303][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000014720_7536640.pth [2023-09-21 13:43:14,304][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000014720_7536640.pth [2023-09-21 13:43:15,499][52884] KL-divergence is very high: 348.9812 [2023-09-21 13:43:15,531][52980] Updated weights for policy 0, policy_version 15120 (0.0013) [2023-09-21 13:43:15,531][52979] Updated weights for policy 1, policy_version 15120 (0.0014) [2023-09-21 13:43:19,287][52220] Fps is (10 sec: 13107.2, 60 sec: 12834.1, 300 sec: 12996.1). Total num frames: 15523840. Throughput: 0: 6417.9, 1: 6419.4. Samples: 15518202. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-21 13:43:19,288][52220] Avg episode reward: [(0, '7496.744'), (1, '2087.163')] [2023-09-21 13:43:21,834][52980] Updated weights for policy 0, policy_version 15200 (0.0009) [2023-09-21 13:43:21,835][52979] Updated weights for policy 1, policy_version 15200 (0.0014) [2023-09-21 13:43:24,286][52220] Fps is (10 sec: 13107.5, 60 sec: 12970.7, 300 sec: 13023.9). Total num frames: 15597568. Throughput: 0: 6471.3, 1: 6472.0. Samples: 15597196. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-21 13:43:24,287][52220] Avg episode reward: [(0, '7870.136'), (1, '3205.815')] [2023-09-21 13:43:28,151][52980] Updated weights for policy 0, policy_version 15280 (0.0016) [2023-09-21 13:43:28,151][52979] Updated weights for policy 1, policy_version 15280 (0.0016) [2023-09-21 13:43:29,287][52220] Fps is (10 sec: 13107.2, 60 sec: 12834.2, 300 sec: 12996.1). Total num frames: 15654912. 
Throughput: 0: 6473.3, 1: 6473.8. Samples: 15634972. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-21 13:43:29,287][52220] Avg episode reward: [(0, '7966.122'), (1, '2217.232')] [2023-09-21 13:43:29,295][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000015288_7827456.pth... [2023-09-21 13:43:29,296][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000015288_7827456.pth... [2023-09-21 13:43:29,300][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000014912_7634944.pth [2023-09-21 13:43:29,301][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000014912_7634944.pth [2023-09-21 13:43:34,286][52220] Fps is (10 sec: 12287.9, 60 sec: 12834.1, 300 sec: 12996.1). Total num frames: 15720448. Throughput: 0: 6427.9, 1: 6427.4. Samples: 15709580. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-21 13:43:34,287][52220] Avg episode reward: [(0, '7873.016'), (1, '1097.525')] [2023-09-21 13:43:34,707][52979] Updated weights for policy 1, policy_version 15360 (0.0013) [2023-09-21 13:43:34,708][52980] Updated weights for policy 0, policy_version 15360 (0.0015) [2023-09-21 13:43:39,286][52220] Fps is (10 sec: 12288.3, 60 sec: 12834.2, 300 sec: 12968.4). Total num frames: 15777792. Throughput: 0: 6392.9, 1: 6392.1. Samples: 15784562. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:43:39,287][52220] Avg episode reward: [(0, '7228.105'), (1, '3462.300')] [2023-09-21 13:43:41,356][52980] Updated weights for policy 0, policy_version 15440 (0.0015) [2023-09-21 13:43:41,356][52979] Updated weights for policy 1, policy_version 15440 (0.0010) [2023-09-21 13:43:44,287][52220] Fps is (10 sec: 12287.8, 60 sec: 12834.1, 300 sec: 12968.4). Total num frames: 15843328. Throughput: 0: 6399.0, 1: 6398.8. Samples: 15821450. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:43:44,288][52220] Avg episode reward: [(0, '6114.203'), (1, '5123.172')] [2023-09-21 13:43:44,295][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000015472_7921664.pth... [2023-09-21 13:43:44,295][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000015472_7921664.pth... [2023-09-21 13:43:44,298][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000015104_7733248.pth [2023-09-21 13:43:44,300][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000015104_7733248.pth [2023-09-21 13:43:47,434][52979] Updated weights for policy 1, policy_version 15520 (0.0013) [2023-09-21 13:43:47,435][52980] Updated weights for policy 0, policy_version 15520 (0.0014) [2023-09-21 13:43:49,286][52220] Fps is (10 sec: 13107.0, 60 sec: 12834.1, 300 sec: 12968.3). Total num frames: 15908864. Throughput: 0: 6427.6, 1: 6427.6. Samples: 15902720. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-21 13:43:49,288][52220] Avg episode reward: [(0, '6109.852'), (1, '5100.013')] [2023-09-21 13:43:53,865][52979] Updated weights for policy 1, policy_version 15600 (0.0015) [2023-09-21 13:43:53,865][52980] Updated weights for policy 0, policy_version 15600 (0.0016) [2023-09-21 13:43:54,287][52220] Fps is (10 sec: 13107.3, 60 sec: 12834.1, 300 sec: 12968.3). Total num frames: 15974400. Throughput: 0: 6445.1, 1: 6445.4. Samples: 15979340. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-21 13:43:54,288][52220] Avg episode reward: [(0, '6944.572'), (1, '3918.626')] [2023-09-21 13:43:59,286][52220] Fps is (10 sec: 13107.4, 60 sec: 12834.1, 300 sec: 12968.4). Total num frames: 16039936. Throughput: 0: 6401.8, 1: 6399.4. Samples: 16015424. 
Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-21 13:43:59,287][52220] Avg episode reward: [(0, '6940.350'), (1, '4835.653')] [2023-09-21 13:43:59,294][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000015664_8019968.pth... [2023-09-21 13:43:59,294][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000015664_8019968.pth... [2023-09-21 13:43:59,299][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000015288_7827456.pth [2023-09-21 13:43:59,302][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000015288_7827456.pth [2023-09-21 13:44:00,598][52979] Updated weights for policy 1, policy_version 15680 (0.0016) [2023-09-21 13:44:00,598][52980] Updated weights for policy 0, policy_version 15680 (0.0017) [2023-09-21 13:44:04,286][52220] Fps is (10 sec: 12288.2, 60 sec: 12697.6, 300 sec: 12940.6). Total num frames: 16097280. Throughput: 0: 6339.2, 1: 6338.0. Samples: 16088674. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-21 13:44:04,287][52220] Avg episode reward: [(0, '7590.234'), (1, '5486.204')] [2023-09-21 13:44:07,116][52980] Updated weights for policy 0, policy_version 15760 (0.0013) [2023-09-21 13:44:07,117][52979] Updated weights for policy 1, policy_version 15760 (0.0013) [2023-09-21 13:44:09,287][52220] Fps is (10 sec: 12287.6, 60 sec: 12834.1, 300 sec: 12940.6). Total num frames: 16162816. Throughput: 0: 6310.4, 1: 6310.2. Samples: 16165130. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-21 13:44:09,288][52220] Avg episode reward: [(0, '7685.314'), (1, '5132.896')] [2023-09-21 13:44:13,630][52979] Updated weights for policy 1, policy_version 15840 (0.0016) [2023-09-21 13:44:13,630][52980] Updated weights for policy 0, policy_version 15840 (0.0012) [2023-09-21 13:44:14,286][52220] Fps is (10 sec: 13107.0, 60 sec: 12697.6, 300 sec: 12940.6). Total num frames: 16228352. Throughput: 0: 6316.1, 1: 6315.9. Samples: 16203410. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:44:14,287][52220] Avg episode reward: [(0, '7316.430'), (1, '5339.681')] [2023-09-21 13:44:14,294][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000015848_8114176.pth... [2023-09-21 13:44:14,296][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000015848_8114176.pth... [2023-09-21 13:44:14,302][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000015472_7921664.pth [2023-09-21 13:44:14,303][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000015472_7921664.pth [2023-09-21 13:44:19,287][52220] Fps is (10 sec: 12288.2, 60 sec: 12697.6, 300 sec: 12912.8). Total num frames: 16285696. Throughput: 0: 6334.7, 1: 6335.3. Samples: 16279732. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:44:19,288][52220] Avg episode reward: [(0, '6666.683'), (1, '5483.146')] [2023-09-21 13:44:19,984][52980] Updated weights for policy 0, policy_version 15920 (0.0015) [2023-09-21 13:44:19,984][52979] Updated weights for policy 1, policy_version 15920 (0.0017) [2023-09-21 13:44:24,287][52220] Fps is (10 sec: 12287.9, 60 sec: 12561.0, 300 sec: 12912.8). Total num frames: 16351232. Throughput: 0: 6353.3, 1: 6354.3. Samples: 16356406. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:44:24,288][52220] Avg episode reward: [(0, '6300.850'), (1, '5569.612')] [2023-09-21 13:44:26,416][52979] Updated weights for policy 1, policy_version 16000 (0.0014) [2023-09-21 13:44:26,417][52980] Updated weights for policy 0, policy_version 16000 (0.0012) [2023-09-21 13:44:29,287][52220] Fps is (10 sec: 13107.1, 60 sec: 12697.6, 300 sec: 12912.8). Total num frames: 16416768. Throughput: 0: 6350.6, 1: 6350.9. Samples: 16393020. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:44:29,288][52220] Avg episode reward: [(0, '6671.813'), (1, '4860.758')] [2023-09-21 13:44:29,296][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000016032_8208384.pth... [2023-09-21 13:44:29,297][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000016032_8208384.pth... [2023-09-21 13:44:29,305][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000015664_8019968.pth [2023-09-21 13:44:29,306][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000015664_8019968.pth [2023-09-21 13:44:32,893][52980] Updated weights for policy 0, policy_version 16080 (0.0015) [2023-09-21 13:44:32,893][52979] Updated weights for policy 1, policy_version 16080 (0.0016) [2023-09-21 13:44:34,286][52220] Fps is (10 sec: 13107.3, 60 sec: 12697.6, 300 sec: 12912.8). Total num frames: 16482304. Throughput: 0: 6305.5, 1: 6305.4. Samples: 16470210. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:44:34,287][52220] Avg episode reward: [(0, '7041.262'), (1, '4043.707')] [2023-09-21 13:44:39,117][52979] Updated weights for policy 1, policy_version 16160 (0.0015) [2023-09-21 13:44:39,117][52980] Updated weights for policy 0, policy_version 16160 (0.0014) [2023-09-21 13:44:39,286][52220] Fps is (10 sec: 13107.3, 60 sec: 12834.1, 300 sec: 12940.6). Total num frames: 16547840. Throughput: 0: 6323.3, 1: 6323.0. Samples: 16548424. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-21 13:44:39,287][52220] Avg episode reward: [(0, '7127.936'), (1, '4871.255')] [2023-09-21 13:44:44,287][52220] Fps is (10 sec: 13107.0, 60 sec: 12834.1, 300 sec: 12940.6). Total num frames: 16613376. Throughput: 0: 6362.1, 1: 6364.0. Samples: 16588100. 
Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-21 13:44:44,288][52220] Avg episode reward: [(0, '7220.699'), (1, '5130.083')] [2023-09-21 13:44:44,297][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000016224_8306688.pth... [2023-09-21 13:44:44,297][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000016224_8306688.pth... [2023-09-21 13:44:44,304][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000015848_8114176.pth [2023-09-21 13:44:44,308][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000015848_8114176.pth [2023-09-21 13:44:45,530][52979] Updated weights for policy 1, policy_version 16240 (0.0013) [2023-09-21 13:44:45,530][52980] Updated weights for policy 0, policy_version 16240 (0.0012) [2023-09-21 13:44:49,286][52220] Fps is (10 sec: 12288.1, 60 sec: 12697.6, 300 sec: 12898.9). Total num frames: 16670720. Throughput: 0: 6389.4, 1: 6390.5. Samples: 16663772. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:44:49,287][52220] Avg episode reward: [(0, '6941.879'), (1, '5579.268')] [2023-09-21 13:44:51,904][52979] Updated weights for policy 1, policy_version 16320 (0.0009) [2023-09-21 13:44:51,905][52980] Updated weights for policy 0, policy_version 16320 (0.0014) [2023-09-21 13:44:54,287][52220] Fps is (10 sec: 12288.1, 60 sec: 12697.6, 300 sec: 12912.8). Total num frames: 16736256. Throughput: 0: 6430.6, 1: 6430.0. Samples: 16743856. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:44:54,288][52220] Avg episode reward: [(0, '6757.726'), (1, '6578.346')] [2023-09-21 13:44:57,876][52979] Updated weights for policy 1, policy_version 16400 (0.0009) [2023-09-21 13:44:57,876][52980] Updated weights for policy 0, policy_version 16400 (0.0012) [2023-09-21 13:44:59,287][52220] Fps is (10 sec: 13925.8, 60 sec: 12834.0, 300 sec: 12940.6). Total num frames: 16809984. Throughput: 0: 6467.2, 1: 6465.7. Samples: 16785394. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:44:59,288][52220] Avg episode reward: [(0, '7129.192'), (1, '6559.431')] [2023-09-21 13:44:59,296][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000016416_8404992.pth... [2023-09-21 13:44:59,296][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000016416_8404992.pth... [2023-09-21 13:44:59,302][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000016032_8208384.pth [2023-09-21 13:44:59,303][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000016032_8208384.pth [2023-09-21 13:45:04,287][52220] Fps is (10 sec: 13107.2, 60 sec: 12834.1, 300 sec: 12912.8). Total num frames: 16867328. Throughput: 0: 6440.1, 1: 6438.8. Samples: 16859282. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:45:04,288][52220] Avg episode reward: [(0, '7497.588'), (1, '6464.558')] [2023-09-21 13:45:04,465][52980] Updated weights for policy 0, policy_version 16480 (0.0012) [2023-09-21 13:45:04,466][52979] Updated weights for policy 1, policy_version 16480 (0.0013) [2023-09-21 13:45:09,286][52220] Fps is (10 sec: 12288.6, 60 sec: 12834.2, 300 sec: 12912.8). Total num frames: 16932864. Throughput: 0: 6431.1, 1: 6430.9. Samples: 16935194. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:45:09,287][52220] Avg episode reward: [(0, '7218.519'), (1, '6990.016')] [2023-09-21 13:45:10,959][52979] Updated weights for policy 1, policy_version 16560 (0.0016) [2023-09-21 13:45:10,960][52980] Updated weights for policy 0, policy_version 16560 (0.0013) [2023-09-21 13:45:14,286][52220] Fps is (10 sec: 12288.1, 60 sec: 12697.6, 300 sec: 12885.0). Total num frames: 16990208. Throughput: 0: 6453.1, 1: 6451.8. Samples: 16973736. 
Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-09-21 13:45:14,287][52220] Avg episode reward: [(0, '7314.189'), (1, '6807.945')] [2023-09-21 13:45:14,294][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000016600_8499200.pth... [2023-09-21 13:45:14,298][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000016224_8306688.pth [2023-09-21 13:45:14,301][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000016600_8499200.pth... [2023-09-21 13:45:14,304][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000016224_8306688.pth [2023-09-21 13:45:17,374][52980] Updated weights for policy 0, policy_version 16640 (0.0017) [2023-09-21 13:45:17,375][52979] Updated weights for policy 1, policy_version 16640 (0.0013) [2023-09-21 13:45:19,287][52220] Fps is (10 sec: 13106.7, 60 sec: 12970.6, 300 sec: 12912.8). Total num frames: 17063936. Throughput: 0: 6445.4, 1: 6445.6. Samples: 17050306. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-09-21 13:45:19,288][52220] Avg episode reward: [(0, '7129.868'), (1, '4730.448')] [2023-09-21 13:45:23,757][52980] Updated weights for policy 0, policy_version 16720 (0.0014) [2023-09-21 13:45:23,757][52979] Updated weights for policy 1, policy_version 16720 (0.0013) [2023-09-21 13:45:24,287][52220] Fps is (10 sec: 13107.1, 60 sec: 12834.1, 300 sec: 12885.0). Total num frames: 17121280. Throughput: 0: 6444.9, 1: 6445.2. Samples: 17128480. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-21 13:45:24,288][52220] Avg episode reward: [(0, '7221.769'), (1, '4524.132')] [2023-09-21 13:45:29,286][52220] Fps is (10 sec: 13107.5, 60 sec: 12970.7, 300 sec: 12912.8). Total num frames: 17195008. Throughput: 0: 6471.3, 1: 6469.4. Samples: 17170430. 
Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-21 13:45:29,287][52220] Avg episode reward: [(0, '6757.209'), (1, '5528.581')] [2023-09-21 13:45:29,294][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000016792_8597504.pth... [2023-09-21 13:45:29,295][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000016792_8597504.pth... [2023-09-21 13:45:29,302][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000016416_8404992.pth [2023-09-21 13:45:29,303][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000016416_8404992.pth [2023-09-21 13:45:29,823][52979] Updated weights for policy 1, policy_version 16800 (0.0012) [2023-09-21 13:45:29,823][52980] Updated weights for policy 0, policy_version 16800 (0.0013) [2023-09-21 13:45:34,286][52220] Fps is (10 sec: 13926.7, 60 sec: 12970.7, 300 sec: 12912.8). Total num frames: 17260544. Throughput: 0: 6484.9, 1: 6485.3. Samples: 17247432. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-21 13:45:34,287][52220] Avg episode reward: [(0, '6760.660'), (1, '6089.937')] [2023-09-21 13:45:36,145][52980] Updated weights for policy 0, policy_version 16880 (0.0011) [2023-09-21 13:45:36,146][52979] Updated weights for policy 1, policy_version 16880 (0.0012) [2023-09-21 13:45:39,287][52220] Fps is (10 sec: 13107.1, 60 sec: 12970.7, 300 sec: 12912.8). Total num frames: 17326080. Throughput: 0: 6468.8, 1: 6468.0. Samples: 17326014. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:45:39,288][52220] Avg episode reward: [(0, '6953.498'), (1, '6911.194')] [2023-09-21 13:45:42,392][52979] Updated weights for policy 1, policy_version 16960 (0.0010) [2023-09-21 13:45:42,393][52980] Updated weights for policy 0, policy_version 16960 (0.0013) [2023-09-21 13:45:44,286][52220] Fps is (10 sec: 12287.8, 60 sec: 12834.2, 300 sec: 12885.0). Total num frames: 17383424. Throughput: 0: 6443.1, 1: 6445.0. Samples: 17365356. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:45:44,287][52220] Avg episode reward: [(0, '6680.874'), (1, '7279.392')] [2023-09-21 13:45:44,330][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000016984_8695808.pth... [2023-09-21 13:45:44,333][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000016600_8499200.pth [2023-09-21 13:45:44,336][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000016984_8695808.pth... [2023-09-21 13:45:44,340][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000016600_8499200.pth [2023-09-21 13:45:48,771][52980] Updated weights for policy 0, policy_version 17040 (0.0011) [2023-09-21 13:45:48,772][52979] Updated weights for policy 1, policy_version 17040 (0.0012) [2023-09-21 13:45:49,287][52220] Fps is (10 sec: 12288.0, 60 sec: 12970.6, 300 sec: 12885.0). Total num frames: 17448960. Throughput: 0: 6463.4, 1: 6464.4. Samples: 17441036. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-21 13:45:49,287][52220] Avg episode reward: [(0, '6579.048'), (1, '7368.600')] [2023-09-21 13:45:54,286][52220] Fps is (10 sec: 13107.4, 60 sec: 12970.7, 300 sec: 12885.1). Total num frames: 17514496. Throughput: 0: 6449.2, 1: 6449.1. Samples: 17515620. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-21 13:45:54,287][52220] Avg episode reward: [(0, '7315.701'), (1, '7733.072')] [2023-09-21 13:45:55,452][52980] Updated weights for policy 0, policy_version 17120 (0.0010) [2023-09-21 13:45:55,452][52979] Updated weights for policy 1, policy_version 17120 (0.0014) [2023-09-21 13:45:59,286][52220] Fps is (10 sec: 12288.1, 60 sec: 12697.7, 300 sec: 12857.3). Total num frames: 17571840. Throughput: 0: 6446.7, 1: 6448.4. Samples: 17554018. 
Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-21 13:45:59,287][52220] Avg episode reward: [(0, '7225.743'), (1, '7824.554')] [2023-09-21 13:45:59,336][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000017168_8790016.pth... [2023-09-21 13:45:59,339][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000016792_8597504.pth [2023-09-21 13:45:59,347][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000017168_8790016.pth... [2023-09-21 13:45:59,350][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000016792_8597504.pth [2023-09-21 13:46:01,961][52979] Updated weights for policy 1, policy_version 17200 (0.0012) [2023-09-21 13:46:01,961][52980] Updated weights for policy 0, policy_version 17200 (0.0015) [2023-09-21 13:46:04,286][52220] Fps is (10 sec: 13107.0, 60 sec: 12970.7, 300 sec: 12885.0). Total num frames: 17645568. Throughput: 0: 6437.0, 1: 6436.7. Samples: 17629618. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:46:04,287][52220] Avg episode reward: [(0, '7133.540'), (1, '7552.845')] [2023-09-21 13:46:07,869][52979] Updated weights for policy 1, policy_version 17280 (0.0013) [2023-09-21 13:46:07,869][52980] Updated weights for policy 0, policy_version 17280 (0.0014) [2023-09-21 13:46:09,286][52220] Fps is (10 sec: 13926.4, 60 sec: 12970.6, 300 sec: 12885.0). Total num frames: 17711104. Throughput: 0: 6499.3, 1: 6499.1. Samples: 17713410. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:46:09,287][52220] Avg episode reward: [(0, '7224.133'), (1, '7191.170')] [2023-09-21 13:46:14,028][52979] Updated weights for policy 1, policy_version 17360 (0.0014) [2023-09-21 13:46:14,028][52980] Updated weights for policy 0, policy_version 17360 (0.0013) [2023-09-21 13:46:14,287][52220] Fps is (10 sec: 13106.9, 60 sec: 13107.2, 300 sec: 12885.0). Total num frames: 17776640. Throughput: 0: 6461.5, 1: 6461.9. Samples: 17751988. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:46:14,288][52220] Avg episode reward: [(0, '7406.381'), (1, '6908.129')] [2023-09-21 13:46:14,296][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000017360_8888320.pth... [2023-09-21 13:46:14,296][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000017360_8888320.pth... [2023-09-21 13:46:14,301][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000016984_8695808.pth [2023-09-21 13:46:14,301][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000016984_8695808.pth [2023-09-21 13:46:19,287][52220] Fps is (10 sec: 13107.1, 60 sec: 12970.7, 300 sec: 12885.0). Total num frames: 17842176. Throughput: 0: 6503.4, 1: 6502.8. Samples: 17832712. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:46:19,288][52220] Avg episode reward: [(0, '7033.404'), (1, '6989.562')] [2023-09-21 13:46:20,321][52979] Updated weights for policy 1, policy_version 17440 (0.0011) [2023-09-21 13:46:20,321][52980] Updated weights for policy 0, policy_version 17440 (0.0016) [2023-09-21 13:46:24,287][52220] Fps is (10 sec: 13107.4, 60 sec: 13107.2, 300 sec: 12885.1). Total num frames: 17907712. Throughput: 0: 6464.3, 1: 6463.8. Samples: 17907776. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-21 13:46:24,287][52220] Avg episode reward: [(0, '7033.707'), (1, '7080.749')] [2023-09-21 13:46:26,690][52980] Updated weights for policy 0, policy_version 17520 (0.0013) [2023-09-21 13:46:26,690][52979] Updated weights for policy 1, policy_version 17520 (0.0012) [2023-09-21 13:46:29,286][52220] Fps is (10 sec: 13107.4, 60 sec: 12970.7, 300 sec: 12912.8). Total num frames: 17973248. Throughput: 0: 6475.4, 1: 6475.6. Samples: 17948152. 
Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-21 13:46:29,287][52220] Avg episode reward: [(0, '7032.291'), (1, '7447.172')] [2023-09-21 13:46:29,294][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000017552_8986624.pth... [2023-09-21 13:46:29,294][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000017552_8986624.pth... [2023-09-21 13:46:29,301][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000017168_8790016.pth [2023-09-21 13:46:29,302][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000017168_8790016.pth [2023-09-21 13:46:33,080][52980] Updated weights for policy 0, policy_version 17600 (0.0011) [2023-09-21 13:46:33,080][52979] Updated weights for policy 1, policy_version 17600 (0.0013) [2023-09-21 13:46:34,287][52220] Fps is (10 sec: 12288.0, 60 sec: 12834.1, 300 sec: 12857.3). Total num frames: 18030592. Throughput: 0: 6474.1, 1: 6474.6. Samples: 18023728. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-21 13:46:34,288][52220] Avg episode reward: [(0, '7311.416'), (1, '7624.408')] [2023-09-21 13:46:39,287][52220] Fps is (10 sec: 12287.9, 60 sec: 12834.1, 300 sec: 12857.3). Total num frames: 18096128. Throughput: 0: 6526.3, 1: 6526.3. Samples: 18102990. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-21 13:46:39,288][52220] Avg episode reward: [(0, '7127.357'), (1, '7631.744')] [2023-09-21 13:46:39,332][52979] Updated weights for policy 1, policy_version 17680 (0.0014) [2023-09-21 13:46:39,332][52980] Updated weights for policy 0, policy_version 17680 (0.0011) [2023-09-21 13:46:44,286][52220] Fps is (10 sec: 13107.4, 60 sec: 12970.7, 300 sec: 12857.3). Total num frames: 18161664. Throughput: 0: 6517.1, 1: 6516.0. Samples: 18140504. 
Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-21 13:46:44,287][52220] Avg episode reward: [(0, '7406.257'), (1, '7722.659')] [2023-09-21 13:46:44,293][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000017736_9080832.pth... [2023-09-21 13:46:44,294][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000017736_9080832.pth... [2023-09-21 13:46:44,298][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000017360_8888320.pth [2023-09-21 13:46:44,300][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000017360_8888320.pth [2023-09-21 13:46:45,896][52979] Updated weights for policy 1, policy_version 17760 (0.0015) [2023-09-21 13:46:45,896][52980] Updated weights for policy 0, policy_version 17760 (0.0014) [2023-09-21 13:46:49,286][52220] Fps is (10 sec: 13107.3, 60 sec: 12970.7, 300 sec: 12857.3). Total num frames: 18227200. Throughput: 0: 6497.4, 1: 6497.0. Samples: 18214364. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:46:49,287][52220] Avg episode reward: [(0, '7312.815'), (1, '7638.060')] [2023-09-21 13:46:52,597][52979] Updated weights for policy 1, policy_version 17840 (0.0011) [2023-09-21 13:46:52,598][52980] Updated weights for policy 0, policy_version 17840 (0.0014) [2023-09-21 13:46:54,286][52220] Fps is (10 sec: 12287.8, 60 sec: 12834.1, 300 sec: 12829.5). Total num frames: 18284544. Throughput: 0: 6381.2, 1: 6380.8. Samples: 18287702. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:46:54,287][52220] Avg episode reward: [(0, '7124.114'), (1, '7818.946')] [2023-09-21 13:46:59,078][52979] Updated weights for policy 1, policy_version 17920 (0.0013) [2023-09-21 13:46:59,079][52980] Updated weights for policy 0, policy_version 17920 (0.0014) [2023-09-21 13:46:59,287][52220] Fps is (10 sec: 12287.8, 60 sec: 12970.6, 300 sec: 12829.5). Total num frames: 18350080. Throughput: 0: 6373.4, 1: 6372.9. Samples: 18325572. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:46:59,288][52220] Avg episode reward: [(0, '7589.081'), (1, '7720.539')] [2023-09-21 13:46:59,296][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000017920_9175040.pth... [2023-09-21 13:46:59,296][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000017920_9175040.pth... [2023-09-21 13:46:59,303][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000017552_8986624.pth [2023-09-21 13:46:59,304][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000017552_8986624.pth [2023-09-21 13:47:04,287][52220] Fps is (10 sec: 13107.2, 60 sec: 12834.1, 300 sec: 12857.3). Total num frames: 18415616. Throughput: 0: 6328.9, 1: 6329.4. Samples: 18402336. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:47:04,287][52220] Avg episode reward: [(0, '7495.950'), (1, '8078.740')] [2023-09-21 13:47:05,541][52980] Updated weights for policy 0, policy_version 18000 (0.0011) [2023-09-21 13:47:05,542][52979] Updated weights for policy 1, policy_version 18000 (0.0012) [2023-09-21 13:47:09,287][52220] Fps is (10 sec: 13107.4, 60 sec: 12834.1, 300 sec: 12857.3). Total num frames: 18481152. Throughput: 0: 6377.3, 1: 6379.5. Samples: 18481830. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:47:09,288][52220] Avg episode reward: [(0, '7682.699'), (1, '8168.326')] [2023-09-21 13:47:11,515][52980] Updated weights for policy 0, policy_version 18080 (0.0012) [2023-09-21 13:47:11,515][52979] Updated weights for policy 1, policy_version 18080 (0.0014) [2023-09-21 13:47:14,287][52220] Fps is (10 sec: 13107.1, 60 sec: 12834.1, 300 sec: 12857.3). Total num frames: 18546688. Throughput: 0: 6380.0, 1: 6379.8. Samples: 18522342. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:47:14,288][52220] Avg episode reward: [(0, '7682.805'), (1, '8267.660')] [2023-09-21 13:47:14,295][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000018112_9273344.pth... [2023-09-21 13:47:14,296][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000018112_9273344.pth... [2023-09-21 13:47:14,300][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000017736_9080832.pth [2023-09-21 13:47:14,303][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000017736_9080832.pth [2023-09-21 13:47:17,693][52980] Updated weights for policy 0, policy_version 18160 (0.0015) [2023-09-21 13:47:17,693][52979] Updated weights for policy 1, policy_version 18160 (0.0011) [2023-09-21 13:47:19,286][52220] Fps is (10 sec: 13107.4, 60 sec: 12834.2, 300 sec: 12857.3). Total num frames: 18612224. Throughput: 0: 6444.6, 1: 6442.6. Samples: 18603646. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:47:19,287][52220] Avg episode reward: [(0, '7684.587'), (1, '8452.451')] [2023-09-21 13:47:24,195][52980] Updated weights for policy 0, policy_version 18240 (0.0014) [2023-09-21 13:47:24,195][52979] Updated weights for policy 1, policy_version 18240 (0.0014) [2023-09-21 13:47:24,286][52220] Fps is (10 sec: 13107.5, 60 sec: 12834.2, 300 sec: 12857.3). Total num frames: 18677760. Throughput: 0: 6387.4, 1: 6386.1. Samples: 18677794. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:47:24,287][52220] Avg episode reward: [(0, '7500.530'), (1, '8360.955')] [2023-09-21 13:47:29,287][52220] Fps is (10 sec: 13106.9, 60 sec: 12834.1, 300 sec: 12857.3). Total num frames: 18743296. Throughput: 0: 6420.5, 1: 6420.1. Samples: 18718336. 
Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-21 13:47:29,288][52220] Avg episode reward: [(0, '7414.395'), (1, '8452.178')] [2023-09-21 13:47:29,297][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000018304_9371648.pth... [2023-09-21 13:47:29,297][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000018304_9371648.pth... [2023-09-21 13:47:29,303][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000017920_9175040.pth [2023-09-21 13:47:29,304][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000017920_9175040.pth [2023-09-21 13:47:30,246][52979] Updated weights for policy 1, policy_version 18320 (0.0012) [2023-09-21 13:47:30,246][52980] Updated weights for policy 0, policy_version 18320 (0.0014) [2023-09-21 13:47:34,287][52220] Fps is (10 sec: 13107.0, 60 sec: 12970.7, 300 sec: 12885.0). Total num frames: 18808832. Throughput: 0: 6471.3, 1: 6471.8. Samples: 18796804. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-21 13:47:34,287][52220] Avg episode reward: [(0, '6953.468'), (1, '8446.490')] [2023-09-21 13:47:36,669][52980] Updated weights for policy 0, policy_version 18400 (0.0012) [2023-09-21 13:47:36,670][52979] Updated weights for policy 1, policy_version 18400 (0.0016) [2023-09-21 13:47:39,286][52220] Fps is (10 sec: 13107.6, 60 sec: 12970.7, 300 sec: 12885.1). Total num frames: 18874368. Throughput: 0: 6524.5, 1: 6524.3. Samples: 18874896. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-21 13:47:39,287][52220] Avg episode reward: [(0, '6953.002'), (1, '8442.549')] [2023-09-21 13:47:43,014][52980] Updated weights for policy 0, policy_version 18480 (0.0014) [2023-09-21 13:47:43,014][52979] Updated weights for policy 1, policy_version 18480 (0.0011) [2023-09-21 13:47:44,286][52220] Fps is (10 sec: 13107.3, 60 sec: 12970.6, 300 sec: 12885.0). Total num frames: 18939904. Throughput: 0: 6533.9, 1: 6535.9. Samples: 18913714. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:47:44,287][52220] Avg episode reward: [(0, '6484.211'), (1, '8712.539')] [2023-09-21 13:47:44,295][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000018496_9469952.pth... [2023-09-21 13:47:44,295][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000018496_9469952.pth... [2023-09-21 13:47:44,300][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000018112_9273344.pth [2023-09-21 13:47:44,302][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000018112_9273344.pth [2023-09-21 13:47:48,928][52979] Updated weights for policy 1, policy_version 18560 (0.0012) [2023-09-21 13:47:48,929][52980] Updated weights for policy 0, policy_version 18560 (0.0013) [2023-09-21 13:47:49,287][52220] Fps is (10 sec: 13106.9, 60 sec: 12970.7, 300 sec: 12885.0). Total num frames: 19005440. Throughput: 0: 6599.1, 1: 6599.0. Samples: 18996248. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:47:49,288][52220] Avg episode reward: [(0, '6296.169'), (1, '8897.060')] [2023-09-21 13:47:54,286][52220] Fps is (10 sec: 13107.2, 60 sec: 13107.2, 300 sec: 12885.0). Total num frames: 19070976. Throughput: 0: 6568.1, 1: 6568.0. Samples: 19072954. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 13:47:54,287][52220] Avg episode reward: [(0, '6668.544'), (1, '8988.684')] [2023-09-21 13:47:55,314][52980] Updated weights for policy 0, policy_version 18640 (0.0014) [2023-09-21 13:47:55,315][52979] Updated weights for policy 1, policy_version 18640 (0.0015) [2023-09-21 13:47:59,287][52220] Fps is (10 sec: 13107.1, 60 sec: 13107.2, 300 sec: 12885.0). Total num frames: 19136512. Throughput: 0: 6552.9, 1: 6550.7. Samples: 19112004. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:47:59,288][52220] Avg episode reward: [(0, '7318.485'), (1, '8989.116')]
[2023-09-21 13:47:59,297][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000018688_9568256.pth...
[2023-09-21 13:47:59,298][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000018688_9568256.pth...
[2023-09-21 13:47:59,300][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000018304_9371648.pth
[2023-09-21 13:47:59,302][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000018304_9371648.pth
[2023-09-21 13:48:01,469][52979] Updated weights for policy 1, policy_version 18720 (0.0013)
[2023-09-21 13:48:01,469][52980] Updated weights for policy 0, policy_version 18720 (0.0013)
[2023-09-21 13:48:04,287][52220] Fps is (10 sec: 13107.1, 60 sec: 13107.2, 300 sec: 12912.8). Total num frames: 19202048. Throughput: 0: 6533.9, 1: 6535.2. Samples: 19191760. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:48:04,288][52220] Avg episode reward: [(0, '6765.058'), (1, '9081.704')]
[2023-09-21 13:48:08,066][52980] Updated weights for policy 0, policy_version 18800 (0.0011)
[2023-09-21 13:48:08,068][52979] Updated weights for policy 1, policy_version 18800 (0.0016)
[2023-09-21 13:48:09,286][52220] Fps is (10 sec: 12288.3, 60 sec: 12970.7, 300 sec: 12857.3). Total num frames: 19259392. Throughput: 0: 6537.6, 1: 6538.8. Samples: 19266228. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:48:09,287][52220] Avg episode reward: [(0, '6766.459'), (1, '8991.223')]
[2023-09-21 13:48:14,232][52979] Updated weights for policy 1, policy_version 18880 (0.0010)
[2023-09-21 13:48:14,232][52980] Updated weights for policy 0, policy_version 18880 (0.0012)
[2023-09-21 13:48:14,287][52220] Fps is (10 sec: 13107.1, 60 sec: 13107.2, 300 sec: 12912.8). Total num frames: 19333120. Throughput: 0: 6519.4, 1: 6520.5. Samples: 19305134.
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:48:14,288][52220] Avg episode reward: [(0, '7039.722'), (1, '8621.066')]
[2023-09-21 13:48:14,297][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000018880_9666560.pth...
[2023-09-21 13:48:14,297][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000018880_9666560.pth...
[2023-09-21 13:48:14,304][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000018496_9469952.pth
[2023-09-21 13:48:14,307][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000018496_9469952.pth
[2023-09-21 13:48:19,286][52220] Fps is (10 sec: 13926.2, 60 sec: 13107.2, 300 sec: 12885.0). Total num frames: 19398656. Throughput: 0: 6548.1, 1: 6547.7. Samples: 19386116. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:48:19,287][52220] Avg episode reward: [(0, '7041.964'), (1, '8620.213')]
[2023-09-21 13:48:20,512][52979] Updated weights for policy 1, policy_version 18960 (0.0010)
[2023-09-21 13:48:20,513][52980] Updated weights for policy 0, policy_version 18960 (0.0014)
[2023-09-21 13:48:24,286][52220] Fps is (10 sec: 13107.3, 60 sec: 13107.2, 300 sec: 12912.8). Total num frames: 19464192. Throughput: 0: 6545.2, 1: 6544.5. Samples: 19463938. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0)
[2023-09-21 13:48:24,287][52220] Avg episode reward: [(0, '6856.232'), (1, '8529.884')]
[2023-09-21 13:48:26,835][52980] Updated weights for policy 0, policy_version 19040 (0.0014)
[2023-09-21 13:48:26,835][52979] Updated weights for policy 1, policy_version 19040 (0.0014)
[2023-09-21 13:48:29,286][52220] Fps is (10 sec: 12288.0, 60 sec: 12970.7, 300 sec: 12885.0). Total num frames: 19521536. Throughput: 0: 6538.3, 1: 6538.8. Samples: 19502180.
Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0)
[2023-09-21 13:48:29,287][52220] Avg episode reward: [(0, '6569.805'), (1, '8345.107')]
[2023-09-21 13:48:29,295][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000019064_9760768.pth...
[2023-09-21 13:48:29,295][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000019064_9760768.pth...
[2023-09-21 13:48:29,299][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000018688_9568256.pth
[2023-09-21 13:48:29,301][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000018688_9568256.pth
[2023-09-21 13:48:33,115][52980] Updated weights for policy 0, policy_version 19120 (0.0010)
[2023-09-21 13:48:33,115][52979] Updated weights for policy 1, policy_version 19120 (0.0016)
[2023-09-21 13:48:34,287][52220] Fps is (10 sec: 12288.0, 60 sec: 12970.7, 300 sec: 12912.8). Total num frames: 19587072. Throughput: 0: 6481.6, 1: 6481.7. Samples: 19579596. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0)
[2023-09-21 13:48:34,287][52220] Avg episode reward: [(0, '6385.226'), (1, '8530.049')]
[2023-09-21 13:48:39,255][52979] Updated weights for policy 1, policy_version 19200 (0.0014)
[2023-09-21 13:48:39,256][52980] Updated weights for policy 0, policy_version 19200 (0.0013)
[2023-09-21 13:48:39,286][52220] Fps is (10 sec: 13926.5, 60 sec: 13107.2, 300 sec: 12940.6). Total num frames: 19660800. Throughput: 0: 6531.7, 1: 6530.3. Samples: 19660742. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0)
[2023-09-21 13:48:39,287][52220] Avg episode reward: [(0, '6850.179'), (1, '8527.774')]
[2023-09-21 13:48:44,287][52220] Fps is (10 sec: 13926.3, 60 sec: 13107.2, 300 sec: 12940.6). Total num frames: 19726336. Throughput: 0: 6547.3, 1: 6548.7. Samples: 19701322.
Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0)
[2023-09-21 13:48:44,288][52220] Avg episode reward: [(0, '7131.582'), (1, '8528.641')]
[2023-09-21 13:48:44,297][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000019264_9863168.pth...
[2023-09-21 13:48:44,297][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000019264_9863168.pth...
[2023-09-21 13:48:44,301][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000018880_9666560.pth
[2023-09-21 13:48:44,303][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000018880_9666560.pth
[2023-09-21 13:48:45,625][52979] Updated weights for policy 1, policy_version 19280 (0.0012)
[2023-09-21 13:48:45,625][52980] Updated weights for policy 0, policy_version 19280 (0.0015)
[2023-09-21 13:48:49,286][52220] Fps is (10 sec: 12288.1, 60 sec: 12970.7, 300 sec: 12912.8). Total num frames: 19783680. Throughput: 0: 6504.2, 1: 6504.5. Samples: 19777148. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
[2023-09-21 13:48:49,287][52220] Avg episode reward: [(0, '7595.058'), (1, '8712.437')]
[2023-09-21 13:48:51,930][52980] Updated weights for policy 0, policy_version 19360 (0.0014)
[2023-09-21 13:48:51,931][52979] Updated weights for policy 1, policy_version 19360 (0.0013)
[2023-09-21 13:48:54,287][52220] Fps is (10 sec: 12288.1, 60 sec: 12970.7, 300 sec: 12912.8). Total num frames: 19849216. Throughput: 0: 6521.0, 1: 6520.6. Samples: 19853104. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
[2023-09-21 13:48:54,287][52220] Avg episode reward: [(0, '7593.209'), (1, '8711.769')]
[2023-09-21 13:48:58,371][52980] Updated weights for policy 0, policy_version 19440 (0.0014)
[2023-09-21 13:48:58,371][52979] Updated weights for policy 1, policy_version 19440 (0.0013)
[2023-09-21 13:48:59,287][52220] Fps is (10 sec: 13106.8, 60 sec: 12970.7, 300 sec: 12940.6). Total num frames: 19914752. Throughput: 0: 6514.9, 1: 6513.9. Samples: 19891434.
Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
[2023-09-21 13:48:59,288][52220] Avg episode reward: [(0, '7314.703'), (1, '8711.742')]
[2023-09-21 13:48:59,297][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000019448_9957376.pth...
[2023-09-21 13:48:59,297][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000019448_9957376.pth...
[2023-09-21 13:48:59,304][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000019064_9760768.pth
[2023-09-21 13:48:59,314][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000019064_9760768.pth
[2023-09-21 13:49:04,287][52220] Fps is (10 sec: 13107.2, 60 sec: 12970.7, 300 sec: 12940.6). Total num frames: 19980288. Throughput: 0: 6483.0, 1: 6484.0. Samples: 19969630. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:49:04,287][52220] Avg episode reward: [(0, '7501.187'), (1, '8711.551')]
[2023-09-21 13:49:04,683][52980] Updated weights for policy 0, policy_version 19520 (0.0012)
[2023-09-21 13:49:04,684][52979] Updated weights for policy 1, policy_version 19520 (0.0013)
[2023-09-21 13:49:06,521][52885] Early stopping after 2 epochs (8 sgd steps), loss delta 0.0000000
[2023-09-21 13:49:06,522][52985] Stopping RolloutWorker_w2...
[2023-09-21 13:49:06,523][52985] Loop rollout_proc2_evt_loop terminating...
[2023-09-21 13:49:06,522][52982] Stopping RolloutWorker_w0...
[2023-09-21 13:49:06,522][52220] Component RolloutWorker_w2 stopped!
[2023-09-21 13:49:06,522][52986] Stopping RolloutWorker_w3...
[2023-09-21 13:49:06,522][52988] Stopping RolloutWorker_w5...
[2023-09-21 13:49:06,523][52990] Stopping RolloutWorker_w7...
[2023-09-21 13:49:06,523][52989] Stopping RolloutWorker_w6...
[2023-09-21 13:49:06,523][52987] Stopping RolloutWorker_w4...
[2023-09-21 13:49:06,523][52984] Stopping RolloutWorker_w1...
[2023-09-21 13:49:06,523][52982] Loop rollout_proc0_evt_loop terminating...
[2023-09-21 13:49:06,523][52986] Loop rollout_proc3_evt_loop terminating...
[2023-09-21 13:49:06,523][52220] Component RolloutWorker_w3 stopped!
[2023-09-21 13:49:06,523][52884] Stopping Batcher_0...
[2023-09-21 13:49:06,523][52990] Loop rollout_proc7_evt_loop terminating...
[2023-09-21 13:49:06,523][52988] Loop rollout_proc5_evt_loop terminating...
[2023-09-21 13:49:06,523][52220] Component RolloutWorker_w0 stopped!
[2023-09-21 13:49:06,523][52989] Loop rollout_proc6_evt_loop terminating...
[2023-09-21 13:49:06,523][52984] Loop rollout_proc1_evt_loop terminating...
[2023-09-21 13:49:06,523][52987] Loop rollout_proc4_evt_loop terminating...
[2023-09-21 13:49:06,524][52220] Component RolloutWorker_w5 stopped!
[2023-09-21 13:49:06,524][52884] Loop batcher_evt_loop terminating...
[2023-09-21 13:49:06,524][52220] Component RolloutWorker_w4 stopped!
[2023-09-21 13:49:06,524][52220] Component RolloutWorker_w6 stopped!
[2023-09-21 13:49:06,525][52220] Component RolloutWorker_w1 stopped!
[2023-09-21 13:49:06,524][52885] Stopping Batcher_1...
[2023-09-21 13:49:06,525][52220] Component RolloutWorker_w7 stopped!
[2023-09-21 13:49:06,525][52220] Component Batcher_0 stopped!
[2023-09-21 13:49:06,525][52220] Component Batcher_1 stopped!
[2023-09-21 13:49:06,526][52884] Early stopping after 2 epochs (8 sgd steps), loss delta 0.0000000
[2023-09-21 13:49:06,525][52885] Loop batcher_evt_loop terminating...
[2023-09-21 13:49:06,526][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000019544_10006528.pth...
[2023-09-21 13:49:06,526][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000019544_10006528.pth...
[2023-09-21 13:49:06,529][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000019264_9863168.pth
[2023-09-21 13:49:06,530][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000019544_10006528.pth...
[2023-09-21 13:49:06,533][52884] Stopping LearnerWorker_p0...
[2023-09-21 13:49:06,533][52884] Loop learner_proc0_evt_loop terminating...
[2023-09-21 13:49:06,533][52220] Component LearnerWorker_p0 stopped!
[2023-09-21 13:49:06,534][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000019264_9863168.pth
[2023-09-21 13:49:06,535][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000019544_10006528.pth...
[2023-09-21 13:49:06,539][52885] Stopping LearnerWorker_p1...
[2023-09-21 13:49:06,539][52885] Loop learner_proc1_evt_loop terminating...
[2023-09-21 13:49:06,539][52220] Component LearnerWorker_p1 stopped!
[2023-09-21 13:49:06,578][52979] Weights refcount: 2 0
[2023-09-21 13:49:06,579][52979] Stopping InferenceWorker_p1-w0...
[2023-09-21 13:49:06,579][52980] Weights refcount: 2 0
[2023-09-21 13:49:06,579][52979] Loop inference_proc1-0_evt_loop terminating...
[2023-09-21 13:49:06,579][52220] Component InferenceWorker_p1-w0 stopped!
[2023-09-21 13:49:06,580][52980] Stopping InferenceWorker_p0-w0...
[2023-09-21 13:49:06,580][52980] Loop inference_proc0-0_evt_loop terminating...
[2023-09-21 13:49:06,580][52220] Component InferenceWorker_p0-w0 stopped!
[2023-09-21 13:49:06,580][52220] Waiting for process learner_proc0 to stop...
[2023-09-21 13:49:07,107][52220] Waiting for process learner_proc1 to stop...
[2023-09-21 13:49:07,131][52220] Waiting for process inference_proc0-0 to join...
[2023-09-21 13:49:07,189][52220] Waiting for process inference_proc1-0 to join...
[2023-09-21 13:49:07,190][52220] Waiting for process rollout_proc0 to join...
[2023-09-21 13:49:07,190][52220] Waiting for process rollout_proc1 to join...
[2023-09-21 13:49:07,191][52220] Waiting for process rollout_proc2 to join...
[2023-09-21 13:49:07,192][52220] Waiting for process rollout_proc3 to join...
[2023-09-21 13:49:07,192][52220] Waiting for process rollout_proc4 to join...
[2023-09-21 13:49:07,193][52220] Waiting for process rollout_proc5 to join...
[2023-09-21 13:49:07,193][52220] Waiting for process rollout_proc6 to join...
[2023-09-21 13:49:07,194][52220] Waiting for process rollout_proc7 to join...
[2023-09-21 13:49:07,194][52220] Batcher 0 profile tree view:
batching: 40.1539, releasing_batches: 3.4137
[2023-09-21 13:49:07,195][52220] Batcher 1 profile tree view:
batching: 39.8640, releasing_batches: 3.4962
[2023-09-21 13:49:07,195][52220] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0052
wait_policy_total: 206.1246
update_model: 19.4573
weight_update: 0.0014
one_step: 0.0012
handle_policy_step: 1227.2543
deserialize: 33.0974, stack: 7.3991, obs_to_device_normalize: 246.6086, forward: 612.8423, send_messages: 100.2734
prepare_outputs: 157.0522
to_cpu: 78.7695
[2023-09-21 13:49:07,196][52220] InferenceWorker_p1-w0 profile tree view:
wait_policy: 0.0052
wait_policy_total: 206.1224
update_model: 19.1857
weight_update: 0.0015
one_step: 0.0011
handle_policy_step: 1228.1887
deserialize: 33.5167, stack: 7.7549, obs_to_device_normalize: 247.0163, forward: 616.7188, send_messages: 97.8848
prepare_outputs: 154.8795
to_cpu: 78.1772
[2023-09-21 13:49:07,197][52220] Learner 0 profile tree view:
misc: 0.0148, prepare_batch: 21.9078
train: 106.3967
epoch_init: 0.0623, minibatch_init: 1.7514, losses_postprocess: 2.8950, kl_divergence: 1.3439, after_optimizer: 1.6311
calculate_losses: 31.6922
losses_init: 0.0586, forward_head: 3.6187, bptt_initial: 0.2110, bptt: 0.2228, tail: 11.9533, advantages_returns: 1.6383, losses: 12.0433
update: 64.8146
clip: 8.3106
[2023-09-21 13:49:07,198][52220] Learner 1 profile tree view:
misc: 0.0145, prepare_batch: 21.7773
train: 106.0387
epoch_init: 0.0630, minibatch_init: 1.7017, losses_postprocess: 2.8528, kl_divergence: 1.3355, after_optimizer: 1.5977
calculate_losses: 31.4837
losses_init: 0.0557, forward_head: 3.6214, bptt_initial: 0.2100, bptt: 0.2298, tail: 11.8125, advantages_returns: 1.6091, losses: 11.9833
update: 64.7614
clip: 8.2701
[2023-09-21 13:49:07,198][52220] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 1.6159, enqueue_policy_requests: 75.0153, complete_rollouts: 2.5135, env_step: 470.5115, overhead: 91.9338
save_policy_outputs: 175.1953
split_output_tensors: 59.5794
[2023-09-21 13:49:07,199][52220] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 1.5498, enqueue_policy_requests: 71.7281, complete_rollouts: 2.4267, env_step: 455.7171, overhead: 88.3418
save_policy_outputs: 170.8656
split_output_tensors: 57.4403
[2023-09-21 13:49:07,200][52220] Loop Runner_EvtLoop terminating...
[2023-09-21 13:49:07,201][52220] Runner profile tree view:
main_loop: 1549.2240
[2023-09-21 13:49:07,201][52220] Collected {0: 10006528, 1: 10006528}, FPS: 12918.1