[2023-02-26 12:39:42,832][00203] Saving configuration to /content/train_dir/default_experiment/config.json...
[2023-02-26 12:39:42,836][00203] Rollout worker 0 uses device cpu
[2023-02-26 12:39:42,838][00203] Rollout worker 1 uses device cpu
[2023-02-26 12:39:42,840][00203] Rollout worker 2 uses device cpu
[2023-02-26 12:39:42,841][00203] Rollout worker 3 uses device cpu
[2023-02-26 12:39:42,843][00203] Rollout worker 4 uses device cpu
[2023-02-26 12:39:42,844][00203] Rollout worker 5 uses device cpu
[2023-02-26 12:39:42,846][00203] Rollout worker 6 uses device cpu
[2023-02-26 12:39:42,847][00203] Rollout worker 7 uses device cpu
[2023-02-26 12:48:49,195][00203] Environment doom_basic already registered, overwriting...
[2023-02-26 12:48:49,207][00203] Environment doom_two_colors_easy already registered, overwriting...
[2023-02-26 12:48:49,210][00203] Environment doom_two_colors_hard already registered, overwriting...
[2023-02-26 12:48:49,213][00203] Environment doom_dm already registered, overwriting...
[2023-02-26 12:48:49,214][00203] Environment doom_dwango5 already registered, overwriting...
[2023-02-26 12:48:49,215][00203] Environment doom_my_way_home_flat_actions already registered, overwriting...
[2023-02-26 12:48:49,216][00203] Environment doom_defend_the_center_flat_actions already registered, overwriting...
[2023-02-26 12:48:49,217][00203] Environment doom_my_way_home already registered, overwriting...
[2023-02-26 12:48:49,219][00203] Environment doom_deadly_corridor already registered, overwriting...
[2023-02-26 12:48:49,221][00203] Environment doom_defend_the_center already registered, overwriting...
[2023-02-26 12:48:49,222][00203] Environment doom_defend_the_line already registered, overwriting...
[2023-02-26 12:48:49,223][00203] Environment doom_health_gathering already registered, overwriting...
[2023-02-26 12:48:49,224][00203] Environment doom_health_gathering_supreme already registered, overwriting...
[2023-02-26 12:48:49,227][00203] Environment doom_battle already registered, overwriting...
[2023-02-26 12:48:49,228][00203] Environment doom_battle2 already registered, overwriting...
[2023-02-26 12:48:49,229][00203] Environment doom_duel_bots already registered, overwriting...
[2023-02-26 12:48:49,230][00203] Environment doom_deathmatch_bots already registered, overwriting...
[2023-02-26 12:48:49,232][00203] Environment doom_duel already registered, overwriting...
[2023-02-26 12:48:49,234][00203] Environment doom_deathmatch_full already registered, overwriting...
[2023-02-26 12:48:49,236][00203] Environment doom_benchmark already registered, overwriting...
[2023-02-26 12:48:49,237][00203] register_encoder_factory:
[2023-02-26 12:48:49,290][00203] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2023-02-26 12:48:49,292][00203] Overriding arg 'device' with value 'cpu' passed from command line
[2023-02-26 12:48:49,300][00203] Experiment dir /content/train_dir/default_experiment already exists!
[2023-02-26 12:48:49,302][00203] Resuming existing experiment from /content/train_dir/default_experiment...
[2023-02-26 12:48:49,304][00203] Weights and Biases integration disabled
[2023-02-26 12:48:49,310][00203] Environment var CUDA_VISIBLE_DEVICES is
[2023-02-26 12:48:52,073][00203] Starting experiment with the following configuration:
help=False
algo=APPO
env=doom_health_gathering_supreme
experiment=default_experiment
train_dir=/content/train_dir
restart_behavior=resume
device=cpu
seed=None
num_policies=1
async_rl=True
serial_mode=False
batched_sampling=False
num_batches_to_accumulate=2
worker_num_splits=2
policy_workers_per_policy=1
max_policy_lag=1000
num_workers=8
num_envs_per_worker=4
batch_size=1024
num_batches_per_epoch=1
num_epochs=1
rollout=32
recurrence=32
shuffle_minibatches=False
gamma=0.99
reward_scale=1.0
reward_clip=1000.0
value_bootstrap=False
normalize_returns=True
exploration_loss_coeff=0.001
value_loss_coeff=0.5
kl_loss_coeff=0.0
exploration_loss=symmetric_kl
gae_lambda=0.95
ppo_clip_ratio=0.1
ppo_clip_value=0.2
with_vtrace=False
vtrace_rho=1.0
vtrace_c=1.0
optimizer=adam
adam_eps=1e-06
adam_beta1=0.9
adam_beta2=0.999
max_grad_norm=4.0
learning_rate=0.0001
lr_schedule=constant
lr_schedule_kl_threshold=0.008
lr_adaptive_min=1e-06
lr_adaptive_max=0.01
obs_subtract_mean=0.0
obs_scale=255.0
normalize_input=True
normalize_input_keys=None
decorrelate_experience_max_seconds=0
decorrelate_envs_on_one_worker=True
actor_worker_gpus=[]
set_workers_cpu_affinity=True
force_envs_single_thread=False
default_niceness=0
log_to_file=True
experiment_summaries_interval=10
flush_summaries_interval=30
stats_avg=100
summaries_use_frameskip=True
heartbeat_interval=20
heartbeat_reporting_interval=600
train_for_env_steps=4000000
train_for_seconds=10000000000
save_every_sec=120
keep_checkpoints=2
load_checkpoint_kind=latest
save_milestones_sec=-1
save_best_every_sec=5
save_best_metric=reward
save_best_after=100000
benchmark=False
encoder_mlp_layers=[512, 512]
encoder_conv_architecture=convnet_simple
encoder_conv_mlp_layers=[512]
use_rnn=True
rnn_size=512
rnn_type=gru
rnn_num_layers=1
decoder_mlp_layers=[]
nonlinearity=elu
policy_initialization=orthogonal
policy_init_gain=1.0
actor_critic_share_weights=True
adaptive_stddev=True
continuous_tanh_scale=0.0
initial_stddev=1.0
use_env_info_cache=False
env_gpu_actions=False
env_gpu_observations=True
env_frameskip=4
env_framestack=1
pixel_format=CHW
use_record_episode_statistics=False
with_wandb=False
wandb_user=None
wandb_project=sample_factory
wandb_group=None
wandb_job_type=SF
wandb_tags=[]
with_pbt=False
pbt_mix_policies_in_one_env=True
pbt_period_env_steps=5000000
pbt_start_mutation=20000000
pbt_replace_fraction=0.3
pbt_mutation_rate=0.15
pbt_replace_reward_gap=0.1
pbt_replace_reward_gap_absolute=1e-06
pbt_optimize_gamma=False
pbt_target_objective=true_objective
pbt_perturb_min=1.1
pbt_perturb_max=1.5
num_agents=-1
num_humans=0
num_bots=-1
start_bot_difficulty=None
timelimit=None
res_w=128
res_h=72
wide_aspect_ratio=False
eval_env_frameskip=1
fps=35
command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=4000000
cli_args={'env': 'doom_health_gathering_supreme', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 4000000}
git_hash=unknown
git_repo_name=not a git repository
[2023-02-26 12:48:52,076][00203] Saving configuration to /content/train_dir/default_experiment/config.json...
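The configuration block above is also persisted to the config.json path shown in the surrounding log lines. A minimal sketch (assuming that file is plain JSON keyed by the same names printed in the dump) for checking the values supplied on the command line:

import json

# Path taken from the log above; assumes config.json is plain JSON keyed by the
# same names printed in the configuration dump.
config_path = "/content/train_dir/default_experiment/config.json"
with open(config_path) as f:
    cfg = json.load(f)

# The values from command_line/cli_args above, plus the 'device' override
# noted earlier in the log.
for key in ("env", "num_workers", "num_envs_per_worker", "train_for_env_steps", "device"):
    print(f"{key} = {cfg[key]}")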
[2023-02-26 12:48:52,080][00203] Rollout worker 0 uses device cpu
[2023-02-26 12:48:52,083][00203] Rollout worker 1 uses device cpu
[2023-02-26 12:48:52,086][00203] Rollout worker 2 uses device cpu
[2023-02-26 12:48:52,088][00203] Rollout worker 3 uses device cpu
[2023-02-26 12:48:52,089][00203] Rollout worker 4 uses device cpu
[2023-02-26 12:48:52,091][00203] Rollout worker 5 uses device cpu
[2023-02-26 12:48:52,093][00203] Rollout worker 6 uses device cpu
[2023-02-26 12:48:52,096][00203] Rollout worker 7 uses device cpu
[2023-02-26 12:48:52,253][00203] InferenceWorker_p0-w0: min num requests: 2
[2023-02-26 12:48:52,292][00203] Starting all processes...
[2023-02-26 12:48:52,295][00203] Starting process learner_proc0
[2023-02-26 12:48:52,368][00203] Starting all processes...
[2023-02-26 12:48:52,379][00203] Starting process inference_proc0-0
[2023-02-26 12:48:52,380][00203] Starting process rollout_proc0
[2023-02-26 12:48:52,382][00203] Starting process rollout_proc1
[2023-02-26 12:48:52,382][00203] Starting process rollout_proc2
[2023-02-26 12:48:52,382][00203] Starting process rollout_proc3
[2023-02-26 12:48:52,382][00203] Starting process rollout_proc4
[2023-02-26 12:48:52,382][00203] Starting process rollout_proc5
[2023-02-26 12:48:52,382][00203] Starting process rollout_proc6
[2023-02-26 12:48:52,382][00203] Starting process rollout_proc7
[2023-02-26 12:49:08,777][13314] Starting seed is not provided
[2023-02-26 12:49:08,779][13314] Initializing actor-critic model on device cpu
[2023-02-26 12:49:08,781][13314] RunningMeanStd input shape: (3, 72, 128)
[2023-02-26 12:49:08,788][13314] RunningMeanStd input shape: (1,)
[2023-02-26 12:49:08,903][13314] ConvEncoder: input_channels=3
[2023-02-26 12:49:08,951][13337] Worker 3 uses CPU cores [1]
[2023-02-26 12:49:09,160][13334] Worker 1 uses CPU cores [1]
[2023-02-26 12:49:09,232][13340] Worker 7 uses CPU cores [1]
[2023-02-26 12:49:09,389][13332] Worker 0 uses CPU cores [0]
[2023-02-26 12:49:09,459][13338] Worker 5 uses CPU cores [1]
[2023-02-26 12:49:09,461][13336] Worker 4 uses CPU cores [0]
[2023-02-26 12:49:09,509][13339] Worker 6 uses CPU cores [0]
[2023-02-26 12:49:09,568][13335] Worker 2 uses CPU cores [0]
[2023-02-26 12:49:09,645][13314] Conv encoder output size: 512
[2023-02-26 12:49:09,645][13314] Policy head output size: 512
[2023-02-26 12:49:09,719][13314] Created Actor Critic model with architecture:
[2023-02-26 12:49:09,720][13314] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2023-02-26 12:49:10,354][13314] Using optimizer
[2023-02-26 12:49:10,355][13314] No checkpoints found
[2023-02-26 12:49:10,356][13314] Did not load from checkpoint, starting from scratch!
[2023-02-26 12:49:10,357][13314] Initialized policy 0 weights for model version 0
[2023-02-26 12:49:10,362][13333] RunningMeanStd input shape: (3, 72, 128)
[2023-02-26 12:49:10,365][13314] LearnerWorker_p0 finished initialization!
[2023-02-26 12:49:10,364][13333] RunningMeanStd input shape: (1,)
[2023-02-26 12:49:10,388][13333] ConvEncoder: input_channels=3
[2023-02-26 12:49:10,587][13333] Conv encoder output size: 512
[2023-02-26 12:49:10,589][13333] Policy head output size: 512
[2023-02-26 12:49:10,616][00203] Inference worker 0-0 is ready!
[2023-02-26 12:49:10,618][00203] All inference workers are ready! Signal rollout workers to start!
[2023-02-26 12:49:10,818][13338] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-26 12:49:10,824][13337] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-26 12:49:10,822][13334] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-26 12:49:10,826][13340] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-26 12:49:10,826][13336] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-26 12:49:10,829][13335] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-26 12:49:10,844][13332] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-26 12:49:10,854][13339] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-26 12:49:12,244][00203] Heartbeat connected on Batcher_0
[2023-02-26 12:49:12,249][00203] Heartbeat connected on LearnerWorker_p0
[2023-02-26 12:49:12,309][00203] Heartbeat connected on InferenceWorker_p0-w0
[2023-02-26 12:49:12,500][13336] Decorrelating experience for 0 frames...
[2023-02-26 12:49:12,502][13339] Decorrelating experience for 0 frames...
[2023-02-26 12:49:12,504][13332] Decorrelating experience for 0 frames...
[2023-02-26 12:49:12,816][13338] Decorrelating experience for 0 frames...
[2023-02-26 12:49:12,833][13334] Decorrelating experience for 0 frames...
[2023-02-26 12:49:12,831][13337] Decorrelating experience for 0 frames...
[2023-02-26 12:49:12,852][13340] Decorrelating experience for 0 frames...
[2023-02-26 12:49:13,277][13332] Decorrelating experience for 32 frames...
[2023-02-26 12:49:14,311][00203] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-02-26 12:49:14,494][13336] Decorrelating experience for 32 frames...
[2023-02-26 12:49:14,521][13335] Decorrelating experience for 0 frames...
[2023-02-26 12:49:14,706][13337] Decorrelating experience for 32 frames...
[2023-02-26 12:49:14,723][13334] Decorrelating experience for 32 frames...
[2023-02-26 12:49:14,711][13338] Decorrelating experience for 32 frames...
[2023-02-26 12:49:14,757][13340] Decorrelating experience for 32 frames...
[2023-02-26 12:49:14,831][13339] Decorrelating experience for 32 frames...
[2023-02-26 12:49:16,389][13336] Decorrelating experience for 64 frames...
[2023-02-26 12:49:16,515][13332] Decorrelating experience for 64 frames...
[2023-02-26 12:49:17,251][13338] Decorrelating experience for 64 frames...
[2023-02-26 12:49:17,265][13334] Decorrelating experience for 64 frames...
[2023-02-26 12:49:17,279][13339] Decorrelating experience for 64 frames...
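For orientation, the ActorCriticSharedWeights module printed above can be roughly re-sketched in PyTorch. The 512-dimensional encoder output, the GRU(512, 512) core, the 1-dimensional value head and the 5-way action head come from the log; the convolutional channel counts, kernel sizes and strides are assumptions, since the TorchScript repr hides them:

import torch
import torch.nn as nn

class SketchActorCritic(nn.Module):
    """Rough stand-in for the ActorCriticSharedWeights module printed above."""

    def __init__(self, num_actions: int = 5):
        super().__init__()
        # "convnet_simple"-style conv head; channels/kernels/strides are assumptions.
        self.conv_head = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ELU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ELU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=2), nn.ELU(),
            nn.Flatten(),
        )
        self.mlp = nn.Sequential(nn.LazyLinear(512), nn.ELU())   # "Conv encoder output size: 512"
        self.core = nn.GRU(512, 512)                             # ModelCoreRNN: GRU(512, 512)
        self.critic_linear = nn.Linear(512, 1)                   # value head
        self.action_head = nn.Linear(512, num_actions)           # distribution_linear, 5 actions

    def forward(self, obs, rnn_state=None):
        x = self.mlp(self.conv_head(obs))                     # (batch, 512)
        x, rnn_state = self.core(x.unsqueeze(0), rnn_state)   # single-step GRU
        x = x.squeeze(0)
        return self.action_head(x), self.critic_linear(x), rnn_state

# Observations are (3, 72, 128) CHW frames, matching the RunningMeanStd shape above.
model = SketchActorCritic()
logits, value, _ = model(torch.zeros(1, 3, 72, 128))
print(logits.shape, value.shape)  # torch.Size([1, 5]) torch.Size([1, 1])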
[2023-02-26 12:49:17,420][13340] Decorrelating experience for 64 frames... [2023-02-26 12:49:18,699][13335] Decorrelating experience for 32 frames... [2023-02-26 12:49:18,805][13336] Decorrelating experience for 96 frames... [2023-02-26 12:49:19,159][13337] Decorrelating experience for 64 frames... [2023-02-26 12:49:19,235][00203] Heartbeat connected on RolloutWorker_w4 [2023-02-26 12:49:19,311][00203] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2023-02-26 12:49:19,332][13338] Decorrelating experience for 96 frames... [2023-02-26 12:49:19,533][13332] Decorrelating experience for 96 frames... [2023-02-26 12:49:19,552][00203] Heartbeat connected on RolloutWorker_w5 [2023-02-26 12:49:20,201][00203] Heartbeat connected on RolloutWorker_w0 [2023-02-26 12:49:22,771][13337] Decorrelating experience for 96 frames... [2023-02-26 12:49:23,360][13340] Decorrelating experience for 96 frames... [2023-02-26 12:49:23,560][00203] Heartbeat connected on RolloutWorker_w3 [2023-02-26 12:49:23,593][13339] Decorrelating experience for 96 frames... [2023-02-26 12:49:23,745][13335] Decorrelating experience for 64 frames... [2023-02-26 12:49:24,312][00203] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 50.6. Samples: 506. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2023-02-26 12:49:24,320][00203] Avg episode reward: [(0, '1.408')] [2023-02-26 12:49:24,369][00203] Heartbeat connected on RolloutWorker_w6 [2023-02-26 12:49:24,575][00203] Heartbeat connected on RolloutWorker_w7 [2023-02-26 12:49:27,815][13334] Decorrelating experience for 96 frames... [2023-02-26 12:49:27,946][00203] Heartbeat connected on RolloutWorker_w1 [2023-02-26 12:49:28,216][13335] Decorrelating experience for 96 frames... [2023-02-26 12:49:28,572][00203] Heartbeat connected on RolloutWorker_w2 [2023-02-26 12:49:29,311][00203] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 138.9. Samples: 2084. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2023-02-26 12:49:29,316][00203] Avg episode reward: [(0, '2.558')] [2023-02-26 12:49:29,856][13314] Signal inference workers to stop experience collection... [2023-02-26 12:49:29,926][13333] InferenceWorker_p0-w0: stopping experience collection [2023-02-26 12:49:31,741][13314] Signal inference workers to resume experience collection... [2023-02-26 12:49:31,743][13333] InferenceWorker_p0-w0: resuming experience collection [2023-02-26 12:49:34,311][00203] Fps is (10 sec: 409.6, 60 sec: 204.8, 300 sec: 204.8). Total num frames: 4096. Throughput: 0: 125.0. Samples: 2500. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-02-26 12:49:34,321][00203] Avg episode reward: [(0, '2.663')] [2023-02-26 12:49:39,314][00203] Fps is (10 sec: 818.9, 60 sec: 327.6, 300 sec: 327.6). Total num frames: 8192. Throughput: 0: 157.3. Samples: 3932. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-02-26 12:49:39,325][00203] Avg episode reward: [(0, '2.936')] [2023-02-26 12:49:44,311][00203] Fps is (10 sec: 819.2, 60 sec: 409.6, 300 sec: 409.6). Total num frames: 12288. Throughput: 0: 160.8. Samples: 4824. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2023-02-26 12:49:44,316][00203] Avg episode reward: [(0, '3.133')] [2023-02-26 12:49:49,311][00203] Fps is (10 sec: 819.5, 60 sec: 468.1, 300 sec: 468.1). Total num frames: 16384. Throughput: 0: 154.5. Samples: 5408. 
Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2023-02-26 12:49:49,315][00203] Avg episode reward: [(0, '3.356')] [2023-02-26 12:49:54,311][00203] Fps is (10 sec: 819.2, 60 sec: 512.0, 300 sec: 512.0). Total num frames: 20480. Throughput: 0: 166.7. Samples: 6668. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2023-02-26 12:49:54,322][00203] Avg episode reward: [(0, '3.577')] [2023-02-26 12:49:59,318][00203] Fps is (10 sec: 818.6, 60 sec: 546.0, 300 sec: 546.0). Total num frames: 24576. Throughput: 0: 181.5. Samples: 8168. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2023-02-26 12:49:59,326][00203] Avg episode reward: [(0, '3.649')] [2023-02-26 12:50:04,311][00203] Fps is (10 sec: 819.2, 60 sec: 573.4, 300 sec: 573.4). Total num frames: 28672. Throughput: 0: 191.3. Samples: 8608. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2023-02-26 12:50:04,314][00203] Avg episode reward: [(0, '3.767')] [2023-02-26 12:50:09,311][00203] Fps is (10 sec: 819.8, 60 sec: 595.8, 300 sec: 595.8). Total num frames: 32768. Throughput: 0: 202.4. Samples: 9612. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2023-02-26 12:50:09,313][00203] Avg episode reward: [(0, '3.809')] [2023-02-26 12:50:14,311][00203] Fps is (10 sec: 819.2, 60 sec: 614.4, 300 sec: 614.4). Total num frames: 36864. Throughput: 0: 199.5. Samples: 11060. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2023-02-26 12:50:14,316][00203] Avg episode reward: [(0, '4.035')] [2023-02-26 12:50:16,562][13333] Updated weights for policy 0, policy_version 10 (0.1186) [2023-02-26 12:50:19,311][00203] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 630.2). Total num frames: 40960. Throughput: 0: 204.8. Samples: 11716. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2023-02-26 12:50:19,315][00203] Avg episode reward: [(0, '4.256')] [2023-02-26 12:50:24,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 643.7). Total num frames: 45056. Throughput: 0: 198.2. Samples: 12852. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2023-02-26 12:50:24,316][00203] Avg episode reward: [(0, '4.410')] [2023-02-26 12:50:29,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 655.4). Total num frames: 49152. Throughput: 0: 199.6. Samples: 13804. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2023-02-26 12:50:29,315][00203] Avg episode reward: [(0, '4.454')] [2023-02-26 12:50:34,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 665.6). Total num frames: 53248. Throughput: 0: 206.5. Samples: 14702. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2023-02-26 12:50:34,315][00203] Avg episode reward: [(0, '4.527')] [2023-02-26 12:50:39,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 674.6). Total num frames: 57344. Throughput: 0: 207.5. Samples: 16006. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2023-02-26 12:50:39,317][00203] Avg episode reward: [(0, '4.537')] [2023-02-26 12:50:44,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 682.7). Total num frames: 61440. Throughput: 0: 200.8. Samples: 17202. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2023-02-26 12:50:44,315][00203] Avg episode reward: [(0, '4.543')] [2023-02-26 12:50:49,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 689.9). Total num frames: 65536. Throughput: 0: 204.0. Samples: 17786. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2023-02-26 12:50:49,319][00203] Avg episode reward: [(0, '4.563')] [2023-02-26 12:50:52,745][13314] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000017_69632.pth... 
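The checkpoint name saved above appears to encode two numbers: the policy version at save time and the cumulative environment-frame count (here version 17 at 69,632 frames); with keep_checkpoints=2, older files are removed later in the log. A small parsing sketch under that assumed naming scheme:

import re

# Assumed reading of the checkpoint file names in this log:
# checkpoint_{policy_version:09d}_{env_frames}.pth
name = "checkpoint_000000017_69632.pth"
match = re.fullmatch(r"checkpoint_(\d{9})_(\d+)\.pth", name)
if match:
    policy_version, env_frames = int(match.group(1)), int(match.group(2))
    print(policy_version, env_frames)  # 17 69632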
[2023-02-26 12:50:54,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 696.3). Total num frames: 69632. Throughput: 0: 205.4. Samples: 18854. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2023-02-26 12:50:54,315][00203] Avg episode reward: [(0, '4.501')] [2023-02-26 12:50:59,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.3, 300 sec: 702.2). Total num frames: 73728. Throughput: 0: 209.0. Samples: 20466. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2023-02-26 12:50:59,315][00203] Avg episode reward: [(0, '4.406')] [2023-02-26 12:51:04,313][00203] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 707.5). Total num frames: 77824. Throughput: 0: 204.3. Samples: 20910. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2023-02-26 12:51:04,321][00203] Avg episode reward: [(0, '4.406')] [2023-02-26 12:51:08,441][13333] Updated weights for policy 0, policy_version 20 (0.0970) [2023-02-26 12:51:09,313][00203] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 712.3). Total num frames: 81920. Throughput: 0: 201.6. Samples: 21926. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2023-02-26 12:51:09,319][00203] Avg episode reward: [(0, '4.393')] [2023-02-26 12:51:14,311][00203] Fps is (10 sec: 819.4, 60 sec: 819.2, 300 sec: 716.8). Total num frames: 86016. Throughput: 0: 206.6. Samples: 23100. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2023-02-26 12:51:14,316][00203] Avg episode reward: [(0, '4.343')] [2023-02-26 12:51:19,311][00203] Fps is (10 sec: 819.4, 60 sec: 819.2, 300 sec: 720.9). Total num frames: 90112. Throughput: 0: 207.0. Samples: 24018. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2023-02-26 12:51:19,319][00203] Avg episode reward: [(0, '4.457')] [2023-02-26 12:51:24,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 724.7). Total num frames: 94208. Throughput: 0: 204.0. Samples: 25184. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2023-02-26 12:51:24,321][00203] Avg episode reward: [(0, '4.444')] [2023-02-26 12:51:29,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 728.2). Total num frames: 98304. Throughput: 0: 197.7. Samples: 26098. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2023-02-26 12:51:29,316][00203] Avg episode reward: [(0, '4.515')] [2023-02-26 12:51:34,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 731.4). Total num frames: 102400. Throughput: 0: 204.4. Samples: 26986. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2023-02-26 12:51:34,314][00203] Avg episode reward: [(0, '4.527')] [2023-02-26 12:51:37,327][13314] Saving new best policy, reward=4.527! [2023-02-26 12:51:39,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 734.5). Total num frames: 106496. Throughput: 0: 210.5. Samples: 28328. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2023-02-26 12:51:39,320][00203] Avg episode reward: [(0, '4.566')] [2023-02-26 12:51:41,933][13314] Saving new best policy, reward=4.566! [2023-02-26 12:51:44,313][00203] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 737.3). Total num frames: 110592. Throughput: 0: 202.9. Samples: 29598. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 12:51:44,318][00203] Avg episode reward: [(0, '4.561')] [2023-02-26 12:51:49,312][00203] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 739.9). Total num frames: 114688. Throughput: 0: 206.2. Samples: 30190. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 12:51:49,317][00203] Avg episode reward: [(0, '4.594')] [2023-02-26 12:51:53,784][13314] Saving new best policy, reward=4.594! 
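The periodic report lines ("Fps is ...", "Total num frames: ...", "Avg episode reward: ...") are easy to scrape for a quick learning curve. A sketch assuming exactly the format shown here, reading from a hypothetical saved, line-separated copy of this log (sf_log.txt):

import re

# Assumes the exact report format shown in this log.
frames_re = re.compile(r"Total num frames: (\d+)\.")
reward_re = re.compile(r"Avg episode reward: \[\(0, '([\d.]+)'\)\]")

frames, rewards = [], []
with open("sf_log.txt") as f:  # hypothetical path to a saved copy of this log
    for line in f:
        if (m := frames_re.search(line)):
            frames.append(int(m.group(1)))
        if (m := reward_re.search(line)):
            rewards.append(float(m.group(1)))

if frames and rewards:
    print(f"frames: {frames[-1]}, latest avg reward: {rewards[-1]}")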
[2023-02-26 12:51:54,311][00203] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 742.4). Total num frames: 118784. Throughput: 0: 206.9. Samples: 31238. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 12:51:54,318][00203] Avg episode reward: [(0, '4.630')] [2023-02-26 12:51:58,046][13314] Saving new best policy, reward=4.630! [2023-02-26 12:51:58,053][13333] Updated weights for policy 0, policy_version 30 (0.0075) [2023-02-26 12:51:59,311][00203] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 744.7). Total num frames: 122880. Throughput: 0: 209.5. Samples: 32528. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 12:51:59,315][00203] Avg episode reward: [(0, '4.660')] [2023-02-26 12:52:02,695][13314] Saving new best policy, reward=4.660! [2023-02-26 12:52:04,313][00203] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 746.9). Total num frames: 126976. Throughput: 0: 207.1. Samples: 33336. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 12:52:04,324][00203] Avg episode reward: [(0, '4.640')] [2023-02-26 12:52:09,311][00203] Fps is (10 sec: 409.6, 60 sec: 751.0, 300 sec: 725.6). Total num frames: 126976. Throughput: 0: 203.3. Samples: 34334. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 12:52:09,314][00203] Avg episode reward: [(0, '4.614')] [2023-02-26 12:52:14,311][00203] Fps is (10 sec: 409.7, 60 sec: 750.9, 300 sec: 728.2). Total num frames: 131072. Throughput: 0: 204.5. Samples: 35302. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 12:52:14,316][00203] Avg episode reward: [(0, '4.470')] [2023-02-26 12:52:19,311][00203] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 752.8). Total num frames: 139264. Throughput: 0: 198.0. Samples: 35896. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 12:52:19,317][00203] Avg episode reward: [(0, '4.385')] [2023-02-26 12:52:24,311][00203] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 754.5). Total num frames: 143360. Throughput: 0: 201.9. Samples: 37412. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2023-02-26 12:52:24,317][00203] Avg episode reward: [(0, '4.345')] [2023-02-26 12:52:29,311][00203] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 735.2). Total num frames: 143360. Throughput: 0: 196.8. Samples: 38452. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2023-02-26 12:52:29,314][00203] Avg episode reward: [(0, '4.260')] [2023-02-26 12:52:34,311][00203] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 737.3). Total num frames: 147456. Throughput: 0: 192.4. Samples: 38848. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2023-02-26 12:52:34,315][00203] Avg episode reward: [(0, '4.241')] [2023-02-26 12:52:39,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 739.3). Total num frames: 151552. Throughput: 0: 202.9. Samples: 40370. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 12:52:39,323][00203] Avg episode reward: [(0, '4.245')] [2023-02-26 12:52:44,316][00203] Fps is (10 sec: 1228.2, 60 sec: 819.2, 300 sec: 760.7). Total num frames: 159744. Throughput: 0: 203.8. Samples: 41698. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 12:52:44,323][00203] Avg episode reward: [(0, '4.274')] [2023-02-26 12:52:49,132][13333] Updated weights for policy 0, policy_version 40 (0.1278) [2023-02-26 12:52:49,314][00203] Fps is (10 sec: 1228.4, 60 sec: 819.2, 300 sec: 762.0). Total num frames: 163840. Throughput: 0: 201.3. Samples: 42394. 
Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 12:52:49,322][00203] Avg episode reward: [(0, '4.278')] [2023-02-26 12:52:54,311][00203] Fps is (10 sec: 409.8, 60 sec: 750.9, 300 sec: 744.7). Total num frames: 163840. Throughput: 0: 199.1. Samples: 43292. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 12:52:54,315][00203] Avg episode reward: [(0, '4.298')] [2023-02-26 12:52:55,428][13314] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000041_167936.pth... [2023-02-26 12:52:59,311][00203] Fps is (10 sec: 409.7, 60 sec: 750.9, 300 sec: 746.4). Total num frames: 167936. Throughput: 0: 207.5. Samples: 44640. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 12:52:59,315][00203] Avg episode reward: [(0, '4.367')] [2023-02-26 12:53:04,311][00203] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 765.8). Total num frames: 176128. Throughput: 0: 209.2. Samples: 45308. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 12:53:04,318][00203] Avg episode reward: [(0, '4.380')] [2023-02-26 12:53:09,312][00203] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 749.5). Total num frames: 176128. Throughput: 0: 198.4. Samples: 46342. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 12:53:09,318][00203] Avg episode reward: [(0, '4.358')] [2023-02-26 12:53:14,311][00203] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 750.9). Total num frames: 180224. Throughput: 0: 198.2. Samples: 47372. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 12:53:14,320][00203] Avg episode reward: [(0, '4.358')] [2023-02-26 12:53:19,311][00203] Fps is (10 sec: 819.3, 60 sec: 750.9, 300 sec: 752.3). Total num frames: 184320. Throughput: 0: 198.0. Samples: 47756. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 12:53:19,325][00203] Avg episode reward: [(0, '4.341')] [2023-02-26 12:53:24,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 753.7). Total num frames: 188416. Throughput: 0: 201.8. Samples: 49450. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 12:53:24,315][00203] Avg episode reward: [(0, '4.348')] [2023-02-26 12:53:29,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 754.9). Total num frames: 192512. Throughput: 0: 198.5. Samples: 50630. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 12:53:29,316][00203] Avg episode reward: [(0, '4.381')] [2023-02-26 12:53:34,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 756.2). Total num frames: 196608. Throughput: 0: 188.6. Samples: 50882. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 12:53:34,317][00203] Avg episode reward: [(0, '4.381')] [2023-02-26 12:53:39,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 757.4). Total num frames: 200704. Throughput: 0: 191.8. Samples: 51922. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 12:53:39,319][00203] Avg episode reward: [(0, '4.426')] [2023-02-26 12:53:41,260][13333] Updated weights for policy 0, policy_version 50 (0.0598) [2023-02-26 12:53:43,730][13314] Signal inference workers to stop experience collection... (50 times) [2023-02-26 12:53:43,767][13333] InferenceWorker_p0-w0: stopping experience collection (50 times) [2023-02-26 12:53:44,311][00203] Fps is (10 sec: 819.2, 60 sec: 751.0, 300 sec: 758.5). Total num frames: 204800. Throughput: 0: 201.4. Samples: 53702. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 12:53:44,322][00203] Avg episode reward: [(0, '4.384')] [2023-02-26 12:53:45,157][13314] Signal inference workers to resume experience collection... 
(50 times) [2023-02-26 12:53:45,159][13333] InferenceWorker_p0-w0: resuming experience collection (50 times) [2023-02-26 12:53:49,311][00203] Fps is (10 sec: 819.2, 60 sec: 751.0, 300 sec: 759.6). Total num frames: 208896. Throughput: 0: 195.6. Samples: 54108. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 12:53:49,317][00203] Avg episode reward: [(0, '4.397')] [2023-02-26 12:53:54,317][00203] Fps is (10 sec: 818.7, 60 sec: 819.1, 300 sec: 760.7). Total num frames: 212992. Throughput: 0: 194.1. Samples: 55076. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 12:53:54,320][00203] Avg episode reward: [(0, '4.337')] [2023-02-26 12:53:59,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 761.7). Total num frames: 217088. Throughput: 0: 196.9. Samples: 56232. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2023-02-26 12:53:59,316][00203] Avg episode reward: [(0, '4.342')] [2023-02-26 12:54:04,311][00203] Fps is (10 sec: 819.7, 60 sec: 750.9, 300 sec: 762.7). Total num frames: 221184. Throughput: 0: 202.2. Samples: 56854. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2023-02-26 12:54:04,315][00203] Avg episode reward: [(0, '4.374')] [2023-02-26 12:54:09,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 763.7). Total num frames: 225280. Throughput: 0: 202.3. Samples: 58552. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2023-02-26 12:54:09,316][00203] Avg episode reward: [(0, '4.407')] [2023-02-26 12:54:14,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 777.5). Total num frames: 229376. Throughput: 0: 194.9. Samples: 59400. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2023-02-26 12:54:14,319][00203] Avg episode reward: [(0, '4.469')] [2023-02-26 12:54:19,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 233472. Throughput: 0: 200.5. Samples: 59904. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 12:54:19,317][00203] Avg episode reward: [(0, '4.567')] [2023-02-26 12:54:24,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 237568. Throughput: 0: 208.8. Samples: 61320. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 12:54:24,315][00203] Avg episode reward: [(0, '4.515')] [2023-02-26 12:54:29,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 241664. Throughput: 0: 205.7. Samples: 62958. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2023-02-26 12:54:29,314][00203] Avg episode reward: [(0, '4.544')] [2023-02-26 12:54:30,919][13333] Updated weights for policy 0, policy_version 60 (0.0586) [2023-02-26 12:54:34,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 245760. Throughput: 0: 201.4. Samples: 63170. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2023-02-26 12:54:34,317][00203] Avg episode reward: [(0, '4.522')] [2023-02-26 12:54:39,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 249856. Throughput: 0: 200.1. Samples: 64078. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 12:54:39,315][00203] Avg episode reward: [(0, '4.590')] [2023-02-26 12:54:44,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 253952. Throughput: 0: 209.5. Samples: 65658. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 12:54:44,319][00203] Avg episode reward: [(0, '4.669')] [2023-02-26 12:54:46,539][13314] Saving new best policy, reward=4.669! [2023-02-26 12:54:49,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). 
Total num frames: 258048. Throughput: 0: 208.0. Samples: 66214. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 12:54:49,317][00203] Avg episode reward: [(0, '4.710')] [2023-02-26 12:54:52,510][13314] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000064_262144.pth... [2023-02-26 12:54:52,649][13314] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000017_69632.pth [2023-02-26 12:54:52,681][13314] Saving new best policy, reward=4.710! [2023-02-26 12:54:54,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.3, 300 sec: 805.3). Total num frames: 262144. Throughput: 0: 192.6. Samples: 67218. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 12:54:54,325][00203] Avg episode reward: [(0, '4.740')] [2023-02-26 12:54:58,862][13314] Saving new best policy, reward=4.740! [2023-02-26 12:54:59,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 266240. Throughput: 0: 195.9. Samples: 68216. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 12:54:59,316][00203] Avg episode reward: [(0, '4.694')] [2023-02-26 12:55:04,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 270336. Throughput: 0: 207.8. Samples: 69256. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 12:55:04,324][00203] Avg episode reward: [(0, '4.603')] [2023-02-26 12:55:09,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 274432. Throughput: 0: 204.8. Samples: 70534. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 12:55:09,317][00203] Avg episode reward: [(0, '4.585')] [2023-02-26 12:55:14,317][00203] Fps is (10 sec: 818.7, 60 sec: 819.1, 300 sec: 805.3). Total num frames: 278528. Throughput: 0: 189.2. Samples: 71472. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 12:55:14,325][00203] Avg episode reward: [(0, '4.555')] [2023-02-26 12:55:19,311][00203] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 278528. Throughput: 0: 197.0. Samples: 72036. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 12:55:19,315][00203] Avg episode reward: [(0, '4.480')] [2023-02-26 12:55:24,172][13333] Updated weights for policy 0, policy_version 70 (0.0633) [2023-02-26 12:55:24,311][00203] Fps is (10 sec: 819.7, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 286720. Throughput: 0: 206.1. Samples: 73352. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 12:55:24,316][00203] Avg episode reward: [(0, '4.467')] [2023-02-26 12:55:29,314][00203] Fps is (10 sec: 1228.4, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 290816. Throughput: 0: 198.6. Samples: 74594. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2023-02-26 12:55:29,324][00203] Avg episode reward: [(0, '4.342')] [2023-02-26 12:55:34,314][00203] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 294912. Throughput: 0: 200.7. Samples: 75248. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2023-02-26 12:55:34,319][00203] Avg episode reward: [(0, '4.336')] [2023-02-26 12:55:39,313][00203] Fps is (10 sec: 409.7, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 294912. Throughput: 0: 197.5. Samples: 76104. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2023-02-26 12:55:39,320][00203] Avg episode reward: [(0, '4.277')] [2023-02-26 12:55:44,311][00203] Fps is (10 sec: 409.7, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 299008. Throughput: 0: 205.2. Samples: 77450. 
Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 12:55:44,323][00203] Avg episode reward: [(0, '4.244')] [2023-02-26 12:55:49,311][00203] Fps is (10 sec: 1229.0, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 307200. Throughput: 0: 199.5. Samples: 78234. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 12:55:49,315][00203] Avg episode reward: [(0, '4.246')] [2023-02-26 12:55:54,311][00203] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 311296. Throughput: 0: 199.7. Samples: 79520. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 12:55:54,313][00203] Avg episode reward: [(0, '4.318')] [2023-02-26 12:55:59,311][00203] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 311296. Throughput: 0: 196.4. Samples: 80310. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 12:55:59,321][00203] Avg episode reward: [(0, '4.263')] [2023-02-26 12:56:04,311][00203] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 315392. Throughput: 0: 197.2. Samples: 80910. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0) [2023-02-26 12:56:04,318][00203] Avg episode reward: [(0, '4.299')] [2023-02-26 12:56:09,311][00203] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 323584. Throughput: 0: 203.9. Samples: 82526. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 12:56:09,314][00203] Avg episode reward: [(0, '4.340')] [2023-02-26 12:56:14,085][13333] Updated weights for policy 0, policy_version 80 (0.0752) [2023-02-26 12:56:14,313][00203] Fps is (10 sec: 1228.5, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 327680. Throughput: 0: 199.0. Samples: 83548. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 12:56:14,321][00203] Avg episode reward: [(0, '4.321')] [2023-02-26 12:56:19,311][00203] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 327680. Throughput: 0: 196.7. Samples: 84098. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 12:56:19,315][00203] Avg episode reward: [(0, '4.484')] [2023-02-26 12:56:24,311][00203] Fps is (10 sec: 409.7, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 331776. Throughput: 0: 202.5. Samples: 85216. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 12:56:24,318][00203] Avg episode reward: [(0, '4.490')] [2023-02-26 12:56:29,311][00203] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 339968. Throughput: 0: 201.8. Samples: 86532. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 12:56:29,315][00203] Avg episode reward: [(0, '4.500')] [2023-02-26 12:56:34,311][00203] Fps is (10 sec: 819.2, 60 sec: 751.0, 300 sec: 791.4). Total num frames: 339968. Throughput: 0: 201.6. Samples: 87308. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 12:56:34,314][00203] Avg episode reward: [(0, '4.458')] [2023-02-26 12:56:39,311][00203] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 344064. Throughput: 0: 192.0. Samples: 88158. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 12:56:39,317][00203] Avg episode reward: [(0, '4.471')] [2023-02-26 12:56:44,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 348160. Throughput: 0: 200.1. Samples: 89316. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 12:56:44,315][00203] Avg episode reward: [(0, '4.490')] [2023-02-26 12:56:49,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 352256. Throughput: 0: 201.9. Samples: 89994. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 12:56:49,315][00203] Avg episode reward: [(0, '4.475')] [2023-02-26 12:56:54,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 356352. Throughput: 0: 201.7. Samples: 91604. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 12:56:54,317][00203] Avg episode reward: [(0, '4.475')] [2023-02-26 12:56:54,986][13314] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000088_360448.pth... [2023-02-26 12:56:55,086][13314] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000041_167936.pth [2023-02-26 12:56:59,314][00203] Fps is (10 sec: 818.9, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 360448. Throughput: 0: 197.4. Samples: 92432. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 12:56:59,327][00203] Avg episode reward: [(0, '4.419')] [2023-02-26 12:57:04,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 364544. Throughput: 0: 192.9. Samples: 92778. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 12:57:04,317][00203] Avg episode reward: [(0, '4.344')] [2023-02-26 12:57:05,531][13333] Updated weights for policy 0, policy_version 90 (0.1043) [2023-02-26 12:57:09,311][00203] Fps is (10 sec: 819.4, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 368640. Throughput: 0: 207.6. Samples: 94560. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 12:57:09,315][00203] Avg episode reward: [(0, '4.344')] [2023-02-26 12:57:14,311][00203] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 376832. Throughput: 0: 206.2. Samples: 95812. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 12:57:14,321][00203] Avg episode reward: [(0, '4.321')] [2023-02-26 12:57:19,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 376832. Throughput: 0: 199.6. Samples: 96288. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 12:57:19,320][00203] Avg episode reward: [(0, '4.304')] [2023-02-26 12:57:24,311][00203] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 380928. Throughput: 0: 201.4. Samples: 97222. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 12:57:24,322][00203] Avg episode reward: [(0, '4.367')] [2023-02-26 12:57:29,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 385024. Throughput: 0: 211.1. Samples: 98816. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 12:57:29,314][00203] Avg episode reward: [(0, '4.472')] [2023-02-26 12:57:34,314][00203] Fps is (10 sec: 818.9, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 389120. Throughput: 0: 206.3. Samples: 99278. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 12:57:34,327][00203] Avg episode reward: [(0, '4.428')] [2023-02-26 12:57:39,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 393216. Throughput: 0: 182.2. Samples: 99804. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 12:57:39,315][00203] Avg episode reward: [(0, '4.428')] [2023-02-26 12:57:44,311][00203] Fps is (10 sec: 409.7, 60 sec: 750.9, 300 sec: 777.6). Total num frames: 393216. Throughput: 0: 177.7. Samples: 100426. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 12:57:44,315][00203] Avg episode reward: [(0, '4.460')] [2023-02-26 12:57:49,311][00203] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 397312. Throughput: 0: 177.6. Samples: 100772. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 12:57:49,322][00203] Avg episode reward: [(0, '4.390')] [2023-02-26 12:57:54,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 401408. Throughput: 0: 176.2. Samples: 102490. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 12:57:54,319][00203] Avg episode reward: [(0, '4.439')] [2023-02-26 12:57:59,311][00203] Fps is (10 sec: 819.2, 60 sec: 751.0, 300 sec: 777.5). Total num frames: 405504. Throughput: 0: 174.8. Samples: 103680. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2023-02-26 12:57:59,318][00203] Avg episode reward: [(0, '4.390')] [2023-02-26 12:58:01,252][13333] Updated weights for policy 0, policy_version 100 (0.2009) [2023-02-26 12:58:04,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 409600. Throughput: 0: 170.5. Samples: 103960. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2023-02-26 12:58:04,318][00203] Avg episode reward: [(0, '4.400')] [2023-02-26 12:58:06,134][13314] Signal inference workers to stop experience collection... (100 times) [2023-02-26 12:58:06,266][13333] InferenceWorker_p0-w0: stopping experience collection (100 times) [2023-02-26 12:58:07,395][13314] Signal inference workers to resume experience collection... (100 times) [2023-02-26 12:58:07,396][13333] InferenceWorker_p0-w0: resuming experience collection (100 times) [2023-02-26 12:58:09,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 413696. Throughput: 0: 172.2. Samples: 104970. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2023-02-26 12:58:09,318][00203] Avg episode reward: [(0, '4.416')] [2023-02-26 12:58:14,311][00203] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 791.4). Total num frames: 417792. Throughput: 0: 176.4. Samples: 106756. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2023-02-26 12:58:14,322][00203] Avg episode reward: [(0, '4.435')] [2023-02-26 12:58:19,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 421888. Throughput: 0: 180.3. Samples: 107390. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2023-02-26 12:58:19,317][00203] Avg episode reward: [(0, '4.446')] [2023-02-26 12:58:24,313][00203] Fps is (10 sec: 819.1, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 425984. Throughput: 0: 186.7. Samples: 108204. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2023-02-26 12:58:24,318][00203] Avg episode reward: [(0, '4.459')] [2023-02-26 12:58:29,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 430080. Throughput: 0: 196.4. Samples: 109262. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2023-02-26 12:58:29,316][00203] Avg episode reward: [(0, '4.402')] [2023-02-26 12:58:34,311][00203] Fps is (10 sec: 819.3, 60 sec: 751.0, 300 sec: 791.4). Total num frames: 434176. Throughput: 0: 204.2. Samples: 109960. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2023-02-26 12:58:34,314][00203] Avg episode reward: [(0, '4.341')] [2023-02-26 12:58:39,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 438272. Throughput: 0: 201.7. Samples: 111568. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2023-02-26 12:58:39,316][00203] Avg episode reward: [(0, '4.356')] [2023-02-26 12:58:44,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 442368. Throughput: 0: 192.8. Samples: 112354. 
Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2023-02-26 12:58:44,319][00203] Avg episode reward: [(0, '4.467')] [2023-02-26 12:58:49,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 446464. Throughput: 0: 199.8. Samples: 112952. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2023-02-26 12:58:49,320][00203] Avg episode reward: [(0, '4.512')] [2023-02-26 12:58:52,288][13314] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000110_450560.pth... [2023-02-26 12:58:52,294][13333] Updated weights for policy 0, policy_version 110 (0.0588) [2023-02-26 12:58:52,429][13314] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000064_262144.pth [2023-02-26 12:58:54,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 450560. Throughput: 0: 205.5. Samples: 114218. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2023-02-26 12:58:54,319][00203] Avg episode reward: [(0, '4.577')] [2023-02-26 12:58:59,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 454656. Throughput: 0: 201.2. Samples: 115812. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2023-02-26 12:58:59,315][00203] Avg episode reward: [(0, '4.568')] [2023-02-26 12:59:04,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 458752. Throughput: 0: 195.5. Samples: 116188. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2023-02-26 12:59:04,315][00203] Avg episode reward: [(0, '4.585')] [2023-02-26 12:59:09,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 462848. Throughput: 0: 199.7. Samples: 117188. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2023-02-26 12:59:09,319][00203] Avg episode reward: [(0, '4.483')] [2023-02-26 12:59:14,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 466944. Throughput: 0: 205.7. Samples: 118518. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2023-02-26 12:59:14,314][00203] Avg episode reward: [(0, '4.478')] [2023-02-26 12:59:19,312][00203] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 471040. Throughput: 0: 205.7. Samples: 119216. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2023-02-26 12:59:19,316][00203] Avg episode reward: [(0, '4.478')] [2023-02-26 12:59:24,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 475136. Throughput: 0: 194.5. Samples: 120322. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2023-02-26 12:59:24,316][00203] Avg episode reward: [(0, '4.494')] [2023-02-26 12:59:29,311][00203] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 479232. Throughput: 0: 199.5. Samples: 121332. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2023-02-26 12:59:29,316][00203] Avg episode reward: [(0, '4.546')] [2023-02-26 12:59:34,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 483328. Throughput: 0: 207.7. Samples: 122298. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2023-02-26 12:59:34,315][00203] Avg episode reward: [(0, '4.506')] [2023-02-26 12:59:39,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 487424. Throughput: 0: 204.0. Samples: 123398. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2023-02-26 12:59:39,322][00203] Avg episode reward: [(0, '4.496')] [2023-02-26 12:59:43,666][13333] Updated weights for policy 0, policy_version 120 (0.1310) [2023-02-26 12:59:44,313][00203] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 491520. Throughput: 0: 190.5. Samples: 124386. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2023-02-26 12:59:44,319][00203] Avg episode reward: [(0, '4.496')] [2023-02-26 12:59:49,311][00203] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 491520. Throughput: 0: 195.4. Samples: 124982. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2023-02-26 12:59:49,318][00203] Avg episode reward: [(0, '4.486')] [2023-02-26 12:59:54,311][00203] Fps is (10 sec: 819.4, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 499712. Throughput: 0: 203.6. Samples: 126348. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2023-02-26 12:59:54,323][00203] Avg episode reward: [(0, '4.525')] [2023-02-26 12:59:59,311][00203] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 503808. Throughput: 0: 203.8. Samples: 127688. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2023-02-26 12:59:59,316][00203] Avg episode reward: [(0, '4.531')] [2023-02-26 13:00:04,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 507904. Throughput: 0: 203.9. Samples: 128390. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2023-02-26 13:00:04,319][00203] Avg episode reward: [(0, '4.525')] [2023-02-26 13:00:09,311][00203] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 777.6). Total num frames: 507904. Throughput: 0: 197.7. Samples: 129218. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2023-02-26 13:00:09,317][00203] Avg episode reward: [(0, '4.545')] [2023-02-26 13:00:14,311][00203] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 512000. Throughput: 0: 206.3. Samples: 130616. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2023-02-26 13:00:14,316][00203] Avg episode reward: [(0, '4.574')] [2023-02-26 13:00:19,311][00203] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 520192. Throughput: 0: 201.2. Samples: 131354. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 13:00:19,318][00203] Avg episode reward: [(0, '4.643')] [2023-02-26 13:00:24,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 777.6). Total num frames: 520192. Throughput: 0: 207.4. Samples: 132732. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 13:00:24,314][00203] Avg episode reward: [(0, '4.582')] [2023-02-26 13:00:29,311][00203] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 777.6). Total num frames: 524288. Throughput: 0: 205.5. Samples: 133632. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 13:00:29,318][00203] Avg episode reward: [(0, '4.558')] [2023-02-26 13:00:34,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 528384. Throughput: 0: 201.1. Samples: 134030. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:00:34,317][00203] Avg episode reward: [(0, '4.542')] [2023-02-26 13:00:35,382][13333] Updated weights for policy 0, policy_version 130 (0.0057) [2023-02-26 13:00:39,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 532480. Throughput: 0: 208.6. Samples: 135736. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:00:39,321][00203] Avg episode reward: [(0, '4.546')] [2023-02-26 13:00:44,311][00203] Fps is (10 sec: 819.2, 60 sec: 751.0, 300 sec: 777.5). Total num frames: 536576. Throughput: 0: 204.9. Samples: 136908. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2023-02-26 13:00:44,319][00203] Avg episode reward: [(0, '4.495')] [2023-02-26 13:00:49,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 777.5). Total num frames: 540672. Throughput: 0: 197.5. Samples: 137278. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2023-02-26 13:00:49,321][00203] Avg episode reward: [(0, '4.530')] [2023-02-26 13:00:50,961][13314] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000133_544768.pth... [2023-02-26 13:00:51,096][13314] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000088_360448.pth [2023-02-26 13:00:54,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 544768. Throughput: 0: 209.4. Samples: 138640. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2023-02-26 13:00:54,318][00203] Avg episode reward: [(0, '4.592')] [2023-02-26 13:00:59,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 548864. Throughput: 0: 206.1. Samples: 139892. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2023-02-26 13:00:59,322][00203] Avg episode reward: [(0, '4.599')] [2023-02-26 13:01:04,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 552960. Throughput: 0: 207.2. Samples: 140678. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2023-02-26 13:01:04,315][00203] Avg episode reward: [(0, '4.664')] [2023-02-26 13:01:09,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 777.6). Total num frames: 557056. Throughput: 0: 193.2. Samples: 141428. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2023-02-26 13:01:09,321][00203] Avg episode reward: [(0, '4.569')] [2023-02-26 13:01:14,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 561152. Throughput: 0: 200.7. Samples: 142664. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:01:14,316][00203] Avg episode reward: [(0, '4.540')] [2023-02-26 13:01:19,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 565248. Throughput: 0: 203.8. Samples: 143200. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:01:19,316][00203] Avg episode reward: [(0, '4.446')] [2023-02-26 13:01:24,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 777.5). Total num frames: 569344. Throughput: 0: 197.3. Samples: 144614. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:01:24,317][00203] Avg episode reward: [(0, '4.425')] [2023-02-26 13:01:26,329][13333] Updated weights for policy 0, policy_version 140 (0.0067) [2023-02-26 13:01:29,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 573440. Throughput: 0: 189.0. Samples: 145414. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:01:29,317][00203] Avg episode reward: [(0, '4.487')] [2023-02-26 13:01:34,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 577536. Throughput: 0: 193.0. Samples: 145962. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:01:34,318][00203] Avg episode reward: [(0, '4.535')] [2023-02-26 13:01:39,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 581632. Throughput: 0: 193.0. Samples: 147324. 
Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:01:39,316][00203] Avg episode reward: [(0, '4.471')] [2023-02-26 13:01:44,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 585728. Throughput: 0: 200.9. Samples: 148934. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:01:44,322][00203] Avg episode reward: [(0, '4.351')] [2023-02-26 13:01:49,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 589824. Throughput: 0: 188.4. Samples: 149154. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:01:49,315][00203] Avg episode reward: [(0, '4.372')] [2023-02-26 13:01:54,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 593920. Throughput: 0: 192.8. Samples: 150104. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:01:54,319][00203] Avg episode reward: [(0, '4.348')] [2023-02-26 13:01:59,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 598016. Throughput: 0: 203.9. Samples: 151840. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:01:59,318][00203] Avg episode reward: [(0, '4.266')] [2023-02-26 13:02:04,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 602112. Throughput: 0: 203.4. Samples: 152354. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:02:04,315][00203] Avg episode reward: [(0, '4.293')] [2023-02-26 13:02:09,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 777.5). Total num frames: 606208. Throughput: 0: 192.4. Samples: 153270. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:02:09,314][00203] Avg episode reward: [(0, '4.322')] [2023-02-26 13:02:14,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 610304. Throughput: 0: 196.3. Samples: 154248. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2023-02-26 13:02:14,313][00203] Avg episode reward: [(0, '4.333')] [2023-02-26 13:02:18,160][13333] Updated weights for policy 0, policy_version 150 (0.0598) [2023-02-26 13:02:19,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 614400. Throughput: 0: 206.0. Samples: 155232. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2023-02-26 13:02:19,319][00203] Avg episode reward: [(0, '4.239')] [2023-02-26 13:02:20,963][13314] Signal inference workers to stop experience collection... (150 times) [2023-02-26 13:02:21,021][13333] InferenceWorker_p0-w0: stopping experience collection (150 times) [2023-02-26 13:02:22,459][13314] Signal inference workers to resume experience collection... (150 times) [2023-02-26 13:02:22,459][13333] InferenceWorker_p0-w0: resuming experience collection (150 times) [2023-02-26 13:02:24,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 618496. Throughput: 0: 202.0. Samples: 156412. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:02:24,318][00203] Avg episode reward: [(0, '4.340')] [2023-02-26 13:02:29,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 622592. Throughput: 0: 188.7. Samples: 157426. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:02:29,315][00203] Avg episode reward: [(0, '4.337')] [2023-02-26 13:02:34,311][00203] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 622592. Throughput: 0: 193.9. Samples: 157878. 
Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:02:34,317][00203] Avg episode reward: [(0, '4.394')] [2023-02-26 13:02:39,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 630784. Throughput: 0: 204.6. Samples: 159310. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2023-02-26 13:02:39,315][00203] Avg episode reward: [(0, '4.472')] [2023-02-26 13:02:44,311][00203] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 634880. Throughput: 0: 194.3. Samples: 160582. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2023-02-26 13:02:44,314][00203] Avg episode reward: [(0, '4.478')] [2023-02-26 13:02:49,311][00203] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 634880. Throughput: 0: 197.7. Samples: 161252. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2023-02-26 13:02:49,315][00203] Avg episode reward: [(0, '4.484')] [2023-02-26 13:02:54,311][00203] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 638976. Throughput: 0: 196.4. Samples: 162106. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2023-02-26 13:02:54,316][00203] Avg episode reward: [(0, '4.448')] [2023-02-26 13:02:55,668][13314] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000157_643072.pth... [2023-02-26 13:02:55,774][13314] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000110_450560.pth [2023-02-26 13:02:59,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 643072. Throughput: 0: 206.1. Samples: 163522. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0) [2023-02-26 13:02:59,315][00203] Avg episode reward: [(0, '4.487')] [2023-02-26 13:03:04,311][00203] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 651264. Throughput: 0: 197.7. Samples: 164130. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2023-02-26 13:03:04,315][00203] Avg episode reward: [(0, '4.523')] [2023-02-26 13:03:09,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 651264. Throughput: 0: 198.5. Samples: 165344. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2023-02-26 13:03:09,317][00203] Avg episode reward: [(0, '4.611')] [2023-02-26 13:03:10,238][13333] Updated weights for policy 0, policy_version 160 (0.1172) [2023-02-26 13:03:14,311][00203] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 655360. Throughput: 0: 195.6. Samples: 166228. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2023-02-26 13:03:14,322][00203] Avg episode reward: [(0, '4.595')] [2023-02-26 13:03:19,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 659456. Throughput: 0: 196.0. Samples: 166698. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2023-02-26 13:03:19,318][00203] Avg episode reward: [(0, '4.513')] [2023-02-26 13:03:24,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 663552. Throughput: 0: 201.0. Samples: 168356. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2023-02-26 13:03:24,319][00203] Avg episode reward: [(0, '4.439')] [2023-02-26 13:03:29,314][00203] Fps is (10 sec: 819.0, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 667648. Throughput: 0: 198.6. Samples: 169518. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0) [2023-02-26 13:03:29,321][00203] Avg episode reward: [(0, '4.445')] [2023-02-26 13:03:34,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 671744. Throughput: 0: 188.7. Samples: 169744. 
Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0) [2023-02-26 13:03:34,317][00203] Avg episode reward: [(0, '4.481')] [2023-02-26 13:03:39,311][00203] Fps is (10 sec: 819.4, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 675840. Throughput: 0: 191.1. Samples: 170706. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2023-02-26 13:03:39,319][00203] Avg episode reward: [(0, '4.488')] [2023-02-26 13:03:44,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 679936. Throughput: 0: 199.5. Samples: 172498. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2023-02-26 13:03:44,315][00203] Avg episode reward: [(0, '4.529')] [2023-02-26 13:03:49,317][00203] Fps is (10 sec: 818.7, 60 sec: 819.1, 300 sec: 791.4). Total num frames: 684032. Throughput: 0: 200.5. Samples: 173152. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2023-02-26 13:03:49,331][00203] Avg episode reward: [(0, '4.522')] [2023-02-26 13:03:54,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 688128. Throughput: 0: 190.9. Samples: 173934. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2023-02-26 13:03:54,316][00203] Avg episode reward: [(0, '4.522')] [2023-02-26 13:03:59,311][00203] Fps is (10 sec: 819.7, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 692224. Throughput: 0: 194.5. Samples: 174982. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0) [2023-02-26 13:03:59,319][00203] Avg episode reward: [(0, '4.509')] [2023-02-26 13:04:02,299][13333] Updated weights for policy 0, policy_version 170 (0.0524) [2023-02-26 13:04:04,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 696320. Throughput: 0: 201.6. Samples: 175772. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0) [2023-02-26 13:04:04,316][00203] Avg episode reward: [(0, '4.464')] [2023-02-26 13:04:09,313][00203] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 700416. Throughput: 0: 199.1. Samples: 177316. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2023-02-26 13:04:09,323][00203] Avg episode reward: [(0, '4.513')] [2023-02-26 13:04:14,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 704512. Throughput: 0: 191.4. Samples: 178130. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2023-02-26 13:04:14,320][00203] Avg episode reward: [(0, '4.594')] [2023-02-26 13:04:19,311][00203] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 708608. Throughput: 0: 198.1. Samples: 178658. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2023-02-26 13:04:19,316][00203] Avg episode reward: [(0, '4.555')] [2023-02-26 13:04:24,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 712704. Throughput: 0: 203.9. Samples: 179880. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2023-02-26 13:04:24,320][00203] Avg episode reward: [(0, '4.514')] [2023-02-26 13:04:29,313][00203] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 716800. Throughput: 0: 201.5. Samples: 181568. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0) [2023-02-26 13:04:29,319][00203] Avg episode reward: [(0, '4.474')] [2023-02-26 13:04:34,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 720896. Throughput: 0: 195.0. Samples: 181928. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0) [2023-02-26 13:04:34,321][00203] Avg episode reward: [(0, '4.487')] [2023-02-26 13:04:39,311][00203] Fps is (10 sec: 819.4, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 724992. Throughput: 0: 199.8. 
Samples: 182926. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0) [2023-02-26 13:04:39,317][00203] Avg episode reward: [(0, '4.543')] [2023-02-26 13:04:44,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 729088. Throughput: 0: 201.4. Samples: 184046. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0) [2023-02-26 13:04:44,327][00203] Avg episode reward: [(0, '4.616')] [2023-02-26 13:04:49,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.3, 300 sec: 791.4). Total num frames: 733184. Throughput: 0: 204.9. Samples: 184994. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2023-02-26 13:04:49,317][00203] Avg episode reward: [(0, '4.567')] [2023-02-26 13:04:53,672][13314] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000180_737280.pth... [2023-02-26 13:04:53,689][13333] Updated weights for policy 0, policy_version 180 (0.1071) [2023-02-26 13:04:53,775][13314] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000133_544768.pth [2023-02-26 13:04:54,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 737280. Throughput: 0: 194.8. Samples: 186080. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2023-02-26 13:04:54,317][00203] Avg episode reward: [(0, '4.591')] [2023-02-26 13:04:59,311][00203] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 737280. Throughput: 0: 197.1. Samples: 187000. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2023-02-26 13:04:59,314][00203] Avg episode reward: [(0, '4.686')] [2023-02-26 13:05:04,311][00203] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 741376. Throughput: 0: 198.7. Samples: 187600. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2023-02-26 13:05:04,315][00203] Avg episode reward: [(0, '4.582')] [2023-02-26 13:05:09,311][00203] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 749568. Throughput: 0: 206.2. Samples: 189158. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2023-02-26 13:05:09,320][00203] Avg episode reward: [(0, '4.592')] [2023-02-26 13:05:14,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 749568. Throughput: 0: 190.4. Samples: 190134. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2023-02-26 13:05:14,314][00203] Avg episode reward: [(0, '4.527')] [2023-02-26 13:05:19,311][00203] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 753664. Throughput: 0: 189.2. Samples: 190444. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2023-02-26 13:05:19,318][00203] Avg episode reward: [(0, '4.613')] [2023-02-26 13:05:24,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 757760. Throughput: 0: 193.0. Samples: 191612. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2023-02-26 13:05:24,325][00203] Avg episode reward: [(0, '4.632')] [2023-02-26 13:05:29,311][00203] Fps is (10 sec: 819.2, 60 sec: 751.0, 300 sec: 791.4). Total num frames: 761856. Throughput: 0: 202.9. Samples: 193178. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2023-02-26 13:05:29,320][00203] Avg episode reward: [(0, '4.654')] [2023-02-26 13:05:34,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 765952. Throughput: 0: 198.1. Samples: 193908. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0) [2023-02-26 13:05:34,317][00203] Avg episode reward: [(0, '4.611')] [2023-02-26 13:05:39,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 770048. Throughput: 0: 190.4. 
Samples: 194650. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0) [2023-02-26 13:05:39,322][00203] Avg episode reward: [(0, '4.703')] [2023-02-26 13:05:44,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 774144. Throughput: 0: 196.4. Samples: 195840. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0) [2023-02-26 13:05:44,318][00203] Avg episode reward: [(0, '4.724')] [2023-02-26 13:05:46,156][13333] Updated weights for policy 0, policy_version 190 (0.1247) [2023-02-26 13:05:49,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 778240. Throughput: 0: 197.2. Samples: 196474. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0) [2023-02-26 13:05:49,325][00203] Avg episode reward: [(0, '4.779')] [2023-02-26 13:05:54,314][00203] Fps is (10 sec: 818.9, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 782336. Throughput: 0: 194.3. Samples: 197900. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0) [2023-02-26 13:05:54,317][00203] Avg episode reward: [(0, '4.827')] [2023-02-26 13:05:56,305][13314] Saving new best policy, reward=4.779! [2023-02-26 13:05:59,316][00203] Fps is (10 sec: 818.8, 60 sec: 819.1, 300 sec: 791.4). Total num frames: 786432. Throughput: 0: 189.7. Samples: 198670. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0) [2023-02-26 13:05:59,326][00203] Avg episode reward: [(0, '4.821')] [2023-02-26 13:06:02,709][13314] Saving new best policy, reward=4.827! [2023-02-26 13:06:04,311][00203] Fps is (10 sec: 819.4, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 790528. Throughput: 0: 196.0. Samples: 199262. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:06:04,322][00203] Avg episode reward: [(0, '4.909')] [2023-02-26 13:06:07,043][13314] Saving new best policy, reward=4.909! [2023-02-26 13:06:09,311][00203] Fps is (10 sec: 819.6, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 794624. Throughput: 0: 197.9. Samples: 200518. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:06:09,322][00203] Avg episode reward: [(0, '4.890')] [2023-02-26 13:06:14,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 798720. Throughput: 0: 193.5. Samples: 201884. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:06:14,313][00203] Avg episode reward: [(0, '4.999')] [2023-02-26 13:06:17,505][13314] Saving new best policy, reward=4.999! [2023-02-26 13:06:19,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 802816. Throughput: 0: 186.5. Samples: 202300. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:06:19,320][00203] Avg episode reward: [(0, '4.950')] [2023-02-26 13:06:24,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 806912. Throughput: 0: 194.1. Samples: 203384. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:06:24,314][00203] Avg episode reward: [(0, '4.884')] [2023-02-26 13:06:29,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 811008. Throughput: 0: 198.4. Samples: 204768. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:06:29,315][00203] Avg episode reward: [(0, '4.822')] [2023-02-26 13:06:34,319][00203] Fps is (10 sec: 818.5, 60 sec: 819.1, 300 sec: 791.4). Total num frames: 815104. Throughput: 0: 198.7. Samples: 205416. 
Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:06:34,331][00203] Avg episode reward: [(0, '4.864')] [2023-02-26 13:06:37,741][13333] Updated weights for policy 0, policy_version 200 (0.0063) [2023-02-26 13:06:39,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 819200. Throughput: 0: 191.2. Samples: 206502. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:06:39,319][00203] Avg episode reward: [(0, '4.812')] [2023-02-26 13:06:42,864][13314] Signal inference workers to stop experience collection... (200 times) [2023-02-26 13:06:42,961][13333] InferenceWorker_p0-w0: stopping experience collection (200 times) [2023-02-26 13:06:44,311][00203] Fps is (10 sec: 409.9, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 819200. Throughput: 0: 196.5. Samples: 207510. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:06:44,317][00203] Avg episode reward: [(0, '4.694')] [2023-02-26 13:06:44,415][13314] Signal inference workers to resume experience collection... (200 times) [2023-02-26 13:06:44,417][13333] InferenceWorker_p0-w0: resuming experience collection (200 times) [2023-02-26 13:06:49,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 827392. Throughput: 0: 199.4. Samples: 208236. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:06:49,314][00203] Avg episode reward: [(0, '4.642')] [2023-02-26 13:06:52,171][13314] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000203_831488.pth... [2023-02-26 13:06:52,281][13314] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000157_643072.pth [2023-02-26 13:06:54,311][00203] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 831488. Throughput: 0: 203.2. Samples: 209660. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:06:54,321][00203] Avg episode reward: [(0, '4.688')] [2023-02-26 13:06:59,313][00203] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 835584. Throughput: 0: 196.4. Samples: 210722. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:06:59,319][00203] Avg episode reward: [(0, '4.664')] [2023-02-26 13:07:04,311][00203] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 835584. Throughput: 0: 199.4. Samples: 211272. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:07:04,315][00203] Avg episode reward: [(0, '4.690')] [2023-02-26 13:07:09,311][00203] Fps is (10 sec: 819.4, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 843776. Throughput: 0: 205.8. Samples: 212646. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:07:09,320][00203] Avg episode reward: [(0, '4.603')] [2023-02-26 13:07:14,311][00203] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 847872. Throughput: 0: 201.6. Samples: 213840. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:07:14,319][00203] Avg episode reward: [(0, '4.626')] [2023-02-26 13:07:19,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 851968. Throughput: 0: 203.5. Samples: 214574. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:07:19,319][00203] Avg episode reward: [(0, '4.555')] [2023-02-26 13:07:24,311][00203] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 851968. Throughput: 0: 198.1. Samples: 215416. 
Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:07:24,314][00203] Avg episode reward: [(0, '4.496')] [2023-02-26 13:07:29,311][00203] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 856064. Throughput: 0: 204.8. Samples: 216724. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:07:29,316][00203] Avg episode reward: [(0, '4.502')] [2023-02-26 13:07:30,225][13333] Updated weights for policy 0, policy_version 210 (0.1634) [2023-02-26 13:07:34,311][00203] Fps is (10 sec: 1228.8, 60 sec: 819.3, 300 sec: 791.4). Total num frames: 864256. Throughput: 0: 199.9. Samples: 217232. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:07:34,319][00203] Avg episode reward: [(0, '4.620')] [2023-02-26 13:07:39,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 864256. Throughput: 0: 197.1. Samples: 218530. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:07:39,322][00203] Avg episode reward: [(0, '4.690')] [2023-02-26 13:07:44,315][00203] Fps is (10 sec: 409.4, 60 sec: 819.1, 300 sec: 791.4). Total num frames: 868352. Throughput: 0: 192.0. Samples: 219362. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:07:44,320][00203] Avg episode reward: [(0, '4.645')] [2023-02-26 13:07:49,311][00203] Fps is (10 sec: 409.6, 60 sec: 682.7, 300 sec: 777.5). Total num frames: 868352. Throughput: 0: 187.2. Samples: 219696. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:07:49,333][00203] Avg episode reward: [(0, '4.661')] [2023-02-26 13:07:54,311][00203] Fps is (10 sec: 409.8, 60 sec: 682.7, 300 sec: 777.5). Total num frames: 872448. Throughput: 0: 168.6. Samples: 220232. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:07:54,314][00203] Avg episode reward: [(0, '4.681')] [2023-02-26 13:07:59,311][00203] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 763.7). Total num frames: 876544. Throughput: 0: 174.7. Samples: 221702. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:07:59,318][00203] Avg episode reward: [(0, '4.736')] [2023-02-26 13:08:04,312][00203] Fps is (10 sec: 819.1, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 880640. Throughput: 0: 167.5. Samples: 222110. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2023-02-26 13:08:04,326][00203] Avg episode reward: [(0, '4.656')] [2023-02-26 13:08:09,311][00203] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 777.5). Total num frames: 884736. Throughput: 0: 169.0. Samples: 223022. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2023-02-26 13:08:09,317][00203] Avg episode reward: [(0, '4.747')] [2023-02-26 13:08:14,311][00203] Fps is (10 sec: 819.3, 60 sec: 682.7, 300 sec: 777.5). Total num frames: 888832. Throughput: 0: 166.2. Samples: 224204. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:08:14,323][00203] Avg episode reward: [(0, '4.682')] [2023-02-26 13:08:19,311][00203] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 777.5). Total num frames: 892928. Throughput: 0: 168.3. Samples: 224806. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:08:19,316][00203] Avg episode reward: [(0, '4.845')] [2023-02-26 13:08:24,313][00203] Fps is (10 sec: 819.0, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 897024. Throughput: 0: 167.4. Samples: 226062. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:08:24,325][00203] Avg episode reward: [(0, '4.803')] [2023-02-26 13:08:28,312][13333] Updated weights for policy 0, policy_version 220 (0.1344) [2023-02-26 13:08:29,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 901120. Throughput: 0: 168.4. Samples: 226938. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:08:29,317][00203] Avg episode reward: [(0, '4.796')] [2023-02-26 13:08:34,311][00203] Fps is (10 sec: 819.4, 60 sec: 682.7, 300 sec: 777.5). Total num frames: 905216. Throughput: 0: 175.7. Samples: 227602. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:08:34,320][00203] Avg episode reward: [(0, '4.711')] [2023-02-26 13:08:39,312][00203] Fps is (10 sec: 819.1, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 909312. Throughput: 0: 192.8. Samples: 228908. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:08:39,320][00203] Avg episode reward: [(0, '4.661')] [2023-02-26 13:08:44,311][00203] Fps is (10 sec: 819.2, 60 sec: 751.0, 300 sec: 777.6). Total num frames: 913408. Throughput: 0: 186.8. Samples: 230108. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:08:44,316][00203] Avg episode reward: [(0, '4.708')] [2023-02-26 13:08:49,311][00203] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 777.5). Total num frames: 917504. Throughput: 0: 190.4. Samples: 230676. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:08:49,314][00203] Avg episode reward: [(0, '4.671')] [2023-02-26 13:08:54,311][00203] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 763.7). Total num frames: 917504. Throughput: 0: 189.4. Samples: 231546. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:08:54,322][00203] Avg episode reward: [(0, '4.651')] [2023-02-26 13:08:54,537][13314] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000225_921600.pth... [2023-02-26 13:08:54,673][13314] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000180_737280.pth [2023-02-26 13:08:59,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 777.5). Total num frames: 925696. Throughput: 0: 196.1. Samples: 233030. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:08:59,319][00203] Avg episode reward: [(0, '4.701')] [2023-02-26 13:09:04,312][00203] Fps is (10 sec: 1228.7, 60 sec: 819.2, 300 sec: 777.5). Total num frames: 929792. Throughput: 0: 200.0. Samples: 233806. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:09:04,315][00203] Avg episode reward: [(0, '4.760')] [2023-02-26 13:09:09,311][00203] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 763.7). Total num frames: 929792. Throughput: 0: 194.6. Samples: 234820. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:09:09,315][00203] Avg episode reward: [(0, '4.832')] [2023-02-26 13:09:14,311][00203] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 763.7). Total num frames: 933888. Throughput: 0: 198.8. Samples: 235884. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:09:14,315][00203] Avg episode reward: [(0, '4.724')] [2023-02-26 13:09:19,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 763.7). Total num frames: 937984. Throughput: 0: 199.0. Samples: 236558. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:09:19,321][00203] Avg episode reward: [(0, '4.740')] [2023-02-26 13:09:19,381][13333] Updated weights for policy 0, policy_version 230 (0.1797) [2023-02-26 13:09:24,311][00203] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 777.6). Total num frames: 946176. Throughput: 0: 203.8. Samples: 238078. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:09:24,320][00203] Avg episode reward: [(0, '4.768')] [2023-02-26 13:09:29,313][00203] Fps is (10 sec: 819.0, 60 sec: 750.9, 300 sec: 763.7). Total num frames: 946176. Throughput: 0: 195.7. Samples: 238916. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:09:29,319][00203] Avg episode reward: [(0, '4.768')] [2023-02-26 13:09:34,311][00203] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 763.7). Total num frames: 950272. Throughput: 0: 193.7. Samples: 239392. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:09:34,315][00203] Avg episode reward: [(0, '4.745')] [2023-02-26 13:09:39,311][00203] Fps is (10 sec: 819.4, 60 sec: 750.9, 300 sec: 763.7). Total num frames: 954368. Throughput: 0: 204.4. Samples: 240746. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2023-02-26 13:09:39,323][00203] Avg episode reward: [(0, '4.728')] [2023-02-26 13:09:44,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 763.7). Total num frames: 958464. Throughput: 0: 202.5. Samples: 242144. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2023-02-26 13:09:44,316][00203] Avg episode reward: [(0, '4.818')] [2023-02-26 13:09:49,312][00203] Fps is (10 sec: 819.1, 60 sec: 750.9, 300 sec: 763.7). Total num frames: 962560. Throughput: 0: 194.4. Samples: 242556. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2023-02-26 13:09:49,316][00203] Avg episode reward: [(0, '4.808')] [2023-02-26 13:09:54,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 777.5). Total num frames: 966656. Throughput: 0: 191.8. Samples: 243450. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2023-02-26 13:09:54,314][00203] Avg episode reward: [(0, '4.854')] [2023-02-26 13:09:59,311][00203] Fps is (10 sec: 819.3, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 970752. Throughput: 0: 198.3. Samples: 244808. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2023-02-26 13:09:59,325][00203] Avg episode reward: [(0, '4.763')] [2023-02-26 13:10:04,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 763.7). Total num frames: 974848. Throughput: 0: 194.6. Samples: 245316. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2023-02-26 13:10:04,317][00203] Avg episode reward: [(0, '4.699')] [2023-02-26 13:10:09,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 777.5). Total num frames: 978944. Throughput: 0: 189.0. Samples: 246584. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2023-02-26 13:10:09,315][00203] Avg episode reward: [(0, '4.735')] [2023-02-26 13:10:12,935][13333] Updated weights for policy 0, policy_version 240 (0.1245) [2023-02-26 13:10:14,316][00203] Fps is (10 sec: 818.8, 60 sec: 819.1, 300 sec: 777.5). Total num frames: 983040. Throughput: 0: 189.6. Samples: 247448. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2023-02-26 13:10:14,320][00203] Avg episode reward: [(0, '4.717')] [2023-02-26 13:10:19,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 777.5). Total num frames: 987136. Throughput: 0: 196.7. Samples: 248242. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2023-02-26 13:10:19,318][00203] Avg episode reward: [(0, '4.809')] [2023-02-26 13:10:24,311][00203] Fps is (10 sec: 819.6, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 991232. Throughput: 0: 198.2. Samples: 249664. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2023-02-26 13:10:24,316][00203] Avg episode reward: [(0, '4.740')] [2023-02-26 13:10:29,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 777.5). Total num frames: 995328. Throughput: 0: 193.6. Samples: 250856. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2023-02-26 13:10:29,315][00203] Avg episode reward: [(0, '4.707')] [2023-02-26 13:10:34,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 777.5). Total num frames: 999424. Throughput: 0: 197.3. Samples: 251434. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2023-02-26 13:10:34,321][00203] Avg episode reward: [(0, '4.709')] [2023-02-26 13:10:39,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 777.5). Total num frames: 1003520. Throughput: 0: 198.9. Samples: 252400. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2023-02-26 13:10:39,321][00203] Avg episode reward: [(0, '4.703')] [2023-02-26 13:10:44,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 777.5). Total num frames: 1007616. Throughput: 0: 197.4. Samples: 253690. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2023-02-26 13:10:44,315][00203] Avg episode reward: [(0, '4.600')] [2023-02-26 13:10:49,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 777.6). Total num frames: 1011712. Throughput: 0: 203.3. Samples: 254464. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2023-02-26 13:10:49,314][00203] Avg episode reward: [(0, '4.518')] [2023-02-26 13:10:54,262][13314] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000248_1015808.pth... [2023-02-26 13:10:54,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 777.6). Total num frames: 1015808. Throughput: 0: 197.2. Samples: 255458. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2023-02-26 13:10:54,315][00203] Avg episode reward: [(0, '4.537')] [2023-02-26 13:10:54,372][13314] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000203_831488.pth [2023-02-26 13:10:59,311][00203] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 763.7). Total num frames: 1015808. Throughput: 0: 200.0. Samples: 256448. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2023-02-26 13:10:59,325][00203] Avg episode reward: [(0, '4.580')] [2023-02-26 13:11:04,311][00203] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 763.7). Total num frames: 1019904. Throughput: 0: 196.6. Samples: 257088. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:11:04,323][00203] Avg episode reward: [(0, '4.633')] [2023-02-26 13:11:05,027][13333] Updated weights for policy 0, policy_version 250 (0.1210) [2023-02-26 13:11:07,733][13314] Signal inference workers to stop experience collection... (250 times) [2023-02-26 13:11:07,787][13333] InferenceWorker_p0-w0: stopping experience collection (250 times) [2023-02-26 13:11:09,103][13314] Signal inference workers to resume experience collection... (250 times) [2023-02-26 13:11:09,107][13333] InferenceWorker_p0-w0: resuming experience collection (250 times) [2023-02-26 13:11:09,311][00203] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 777.5). Total num frames: 1028096. Throughput: 0: 196.4. Samples: 258500. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:11:09,315][00203] Avg episode reward: [(0, '4.462')] [2023-02-26 13:11:14,311][00203] Fps is (10 sec: 819.2, 60 sec: 751.0, 300 sec: 763.7). Total num frames: 1028096. Throughput: 0: 190.1. Samples: 259410. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:11:14,323][00203] Avg episode reward: [(0, '4.449')] [2023-02-26 13:11:19,313][00203] Fps is (10 sec: 409.5, 60 sec: 750.9, 300 sec: 763.7). Total num frames: 1032192. Throughput: 0: 184.9. Samples: 259754. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:11:19,321][00203] Avg episode reward: [(0, '4.479')] [2023-02-26 13:11:24,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 763.7). Total num frames: 1036288. Throughput: 0: 190.7. Samples: 260980. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2023-02-26 13:11:24,323][00203] Avg episode reward: [(0, '4.471')] [2023-02-26 13:11:29,311][00203] Fps is (10 sec: 819.3, 60 sec: 750.9, 300 sec: 763.7). Total num frames: 1040384. Throughput: 0: 199.2. Samples: 262652. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2023-02-26 13:11:29,318][00203] Avg episode reward: [(0, '4.545')] [2023-02-26 13:11:34,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 763.7). Total num frames: 1044480. Throughput: 0: 190.6. Samples: 263040. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2023-02-26 13:11:34,320][00203] Avg episode reward: [(0, '4.574')] [2023-02-26 13:11:39,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 1048576. Throughput: 0: 188.0. Samples: 263918. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2023-02-26 13:11:39,320][00203] Avg episode reward: [(0, '4.527')] [2023-02-26 13:11:44,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 763.7). Total num frames: 1052672. Throughput: 0: 195.2. Samples: 265234. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:11:44,315][00203] Avg episode reward: [(0, '4.642')] [2023-02-26 13:11:49,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 763.7). Total num frames: 1056768. Throughput: 0: 194.9. Samples: 265858. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:11:49,323][00203] Avg episode reward: [(0, '4.554')] [2023-02-26 13:11:54,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 763.7). Total num frames: 1060864. Throughput: 0: 195.8. Samples: 267310. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 13:11:54,329][00203] Avg episode reward: [(0, '4.620')] [2023-02-26 13:11:57,873][13333] Updated weights for policy 0, policy_version 260 (0.0760) [2023-02-26 13:11:59,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 777.5). Total num frames: 1064960. Throughput: 0: 192.7. Samples: 268080. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 13:11:59,318][00203] Avg episode reward: [(0, '4.572')] [2023-02-26 13:12:04,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 763.7). Total num frames: 1069056. Throughput: 0: 199.9. Samples: 268748. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 13:12:04,322][00203] Avg episode reward: [(0, '4.677')] [2023-02-26 13:12:09,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 763.7). Total num frames: 1073152. Throughput: 0: 199.0. Samples: 269934. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 13:12:09,315][00203] Avg episode reward: [(0, '4.706')] [2023-02-26 13:12:14,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 763.7). Total num frames: 1077248. 
Throughput: 0: 189.9. Samples: 271196. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 13:12:14,319][00203] Avg episode reward: [(0, '4.706')] [2023-02-26 13:12:19,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 777.5). Total num frames: 1081344. Throughput: 0: 193.4. Samples: 271744. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 13:12:19,314][00203] Avg episode reward: [(0, '4.779')] [2023-02-26 13:12:24,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 777.5). Total num frames: 1085440. Throughput: 0: 198.4. Samples: 272848. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2023-02-26 13:12:24,322][00203] Avg episode reward: [(0, '4.683')] [2023-02-26 13:12:29,320][00203] Fps is (10 sec: 818.4, 60 sec: 819.1, 300 sec: 763.6). Total num frames: 1089536. Throughput: 0: 200.5. Samples: 274258. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2023-02-26 13:12:29,324][00203] Avg episode reward: [(0, '4.891')] [2023-02-26 13:12:34,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 777.5). Total num frames: 1093632. Throughput: 0: 204.5. Samples: 275060. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:12:34,318][00203] Avg episode reward: [(0, '4.836')] [2023-02-26 13:12:39,311][00203] Fps is (10 sec: 820.0, 60 sec: 819.2, 300 sec: 777.6). Total num frames: 1097728. Throughput: 0: 195.0. Samples: 276086. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:12:39,319][00203] Avg episode reward: [(0, '4.829')] [2023-02-26 13:12:44,311][00203] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 1097728. Throughput: 0: 201.3. Samples: 277138. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:12:44,315][00203] Avg episode reward: [(0, '4.898')] [2023-02-26 13:12:48,868][13333] Updated weights for policy 0, policy_version 270 (0.1254) [2023-02-26 13:12:49,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 1105920. Throughput: 0: 203.0. Samples: 277882. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 13:12:49,315][00203] Avg episode reward: [(0, '4.862')] [2023-02-26 13:12:53,002][13314] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000271_1110016.pth... [2023-02-26 13:12:53,113][13314] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000225_921600.pth [2023-02-26 13:12:54,311][00203] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 1110016. Throughput: 0: 206.3. Samples: 279218. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2023-02-26 13:12:54,316][00203] Avg episode reward: [(0, '5.019')] [2023-02-26 13:12:58,921][13314] Saving new best policy, reward=5.019! [2023-02-26 13:12:59,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 1114112. Throughput: 0: 201.4. Samples: 280260. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2023-02-26 13:12:59,316][00203] Avg episode reward: [(0, '5.016')] [2023-02-26 13:13:04,311][00203] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 1114112. Throughput: 0: 202.1. Samples: 280840. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2023-02-26 13:13:04,314][00203] Avg episode reward: [(0, '5.082')] [2023-02-26 13:13:09,114][13314] Saving new best policy, reward=5.082! [2023-02-26 13:13:09,329][00203] Fps is (10 sec: 817.7, 60 sec: 819.0, 300 sec: 791.4). Total num frames: 1122304. Throughput: 0: 209.6. Samples: 282284. 
Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2023-02-26 13:13:09,334][00203] Avg episode reward: [(0, '5.101')] [2023-02-26 13:13:13,544][13314] Saving new best policy, reward=5.101! [2023-02-26 13:13:14,311][00203] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 1126400. Throughput: 0: 203.1. Samples: 283396. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2023-02-26 13:13:14,315][00203] Avg episode reward: [(0, '5.235')] [2023-02-26 13:13:19,311][00203] Fps is (10 sec: 410.3, 60 sec: 750.9, 300 sec: 777.6). Total num frames: 1126400. Throughput: 0: 203.1. Samples: 284198. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2023-02-26 13:13:19,314][00203] Avg episode reward: [(0, '5.312')] [2023-02-26 13:13:19,779][13314] Saving new best policy, reward=5.235! [2023-02-26 13:13:24,311][00203] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 1130496. Throughput: 0: 197.8. Samples: 284988. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2023-02-26 13:13:24,319][00203] Avg episode reward: [(0, '5.240')] [2023-02-26 13:13:26,198][13314] Saving new best policy, reward=5.312! [2023-02-26 13:13:29,311][00203] Fps is (10 sec: 819.2, 60 sec: 751.0, 300 sec: 777.5). Total num frames: 1134592. Throughput: 0: 204.1. Samples: 286322. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0) [2023-02-26 13:13:29,315][00203] Avg episode reward: [(0, '5.264')] [2023-02-26 13:13:34,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 1138688. Throughput: 0: 202.0. Samples: 286972. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0) [2023-02-26 13:13:34,318][00203] Avg episode reward: [(0, '5.293')] [2023-02-26 13:13:39,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 1142784. Throughput: 0: 202.4. Samples: 288326. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0) [2023-02-26 13:13:39,315][00203] Avg episode reward: [(0, '5.304')] [2023-02-26 13:13:40,430][13333] Updated weights for policy 0, policy_version 280 (0.0080) [2023-02-26 13:13:44,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 777.5). Total num frames: 1146880. Throughput: 0: 196.6. Samples: 289108. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0) [2023-02-26 13:13:44,320][00203] Avg episode reward: [(0, '5.369')] [2023-02-26 13:13:46,756][13314] Saving new best policy, reward=5.369! [2023-02-26 13:13:49,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 1150976. Throughput: 0: 192.7. Samples: 289510. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2023-02-26 13:13:49,316][00203] Avg episode reward: [(0, '5.293')] [2023-02-26 13:13:54,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 1155072. Throughput: 0: 199.1. Samples: 291238. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2023-02-26 13:13:54,315][00203] Avg episode reward: [(0, '5.254')] [2023-02-26 13:13:59,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 1159168. Throughput: 0: 202.2. Samples: 292494. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0) [2023-02-26 13:13:59,321][00203] Avg episode reward: [(0, '5.260')] [2023-02-26 13:14:04,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 1163264. Throughput: 0: 189.4. Samples: 292720. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0) [2023-02-26 13:14:04,316][00203] Avg episode reward: [(0, '5.348')] [2023-02-26 13:14:09,311][00203] Fps is (10 sec: 819.2, 60 sec: 751.2, 300 sec: 791.4). Total num frames: 1167360. 
Throughput: 0: 194.9. Samples: 293758. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2023-02-26 13:14:09,321][00203] Avg episode reward: [(0, '5.151')] [2023-02-26 13:14:14,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 1171456. Throughput: 0: 201.5. Samples: 295388. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2023-02-26 13:14:14,315][00203] Avg episode reward: [(0, '5.121')] [2023-02-26 13:14:19,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 777.5). Total num frames: 1175552. Throughput: 0: 201.4. Samples: 296036. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0) [2023-02-26 13:14:19,324][00203] Avg episode reward: [(0, '5.171')] [2023-02-26 13:14:24,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 1179648. Throughput: 0: 190.6. Samples: 296904. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0) [2023-02-26 13:14:24,315][00203] Avg episode reward: [(0, '5.090')] [2023-02-26 13:14:29,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 1183744. Throughput: 0: 195.5. Samples: 297906. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0) [2023-02-26 13:14:29,314][00203] Avg episode reward: [(0, '5.110')] [2023-02-26 13:14:32,455][13333] Updated weights for policy 0, policy_version 290 (0.0067) [2023-02-26 13:14:34,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 1187840. Throughput: 0: 204.0. Samples: 298688. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0) [2023-02-26 13:14:34,316][00203] Avg episode reward: [(0, '4.988')] [2023-02-26 13:14:39,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 1191936. Throughput: 0: 194.5. Samples: 299990. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2023-02-26 13:14:39,319][00203] Avg episode reward: [(0, '4.879')] [2023-02-26 13:14:44,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 1196032. Throughput: 0: 185.6. Samples: 300844. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2023-02-26 13:14:44,318][00203] Avg episode reward: [(0, '5.005')] [2023-02-26 13:14:49,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 1200128. Throughput: 0: 195.9. Samples: 301534. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 13:14:49,323][00203] Avg episode reward: [(0, '5.011')] [2023-02-26 13:14:52,843][13314] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000294_1204224.pth... [2023-02-26 13:14:52,955][13314] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000248_1015808.pth [2023-02-26 13:14:54,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 1204224. Throughput: 0: 198.2. Samples: 302676. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 13:14:54,314][00203] Avg episode reward: [(0, '4.853')] [2023-02-26 13:14:59,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 1208320. Throughput: 0: 200.8. Samples: 304426. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 13:14:59,321][00203] Avg episode reward: [(0, '4.830')] [2023-02-26 13:15:04,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 1212416. Throughput: 0: 194.1. Samples: 304772. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 13:15:04,320][00203] Avg episode reward: [(0, '4.843')] [2023-02-26 13:15:09,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). 
Total num frames: 1216512. Throughput: 0: 197.6. Samples: 305794. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 13:15:09,313][00203] Avg episode reward: [(0, '4.803')] [2023-02-26 13:15:14,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 1220608. Throughput: 0: 206.3. Samples: 307188. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 13:15:14,318][00203] Avg episode reward: [(0, '4.778')] [2023-02-26 13:15:19,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 1224704. Throughput: 0: 203.2. Samples: 307832. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 13:15:19,322][00203] Avg episode reward: [(0, '4.775')] [2023-02-26 13:15:21,590][13333] Updated weights for policy 0, policy_version 300 (0.0532) [2023-02-26 13:15:24,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 1228800. Throughput: 0: 199.8. Samples: 308982. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 13:15:24,318][00203] Avg episode reward: [(0, '4.863')] [2023-02-26 13:15:26,769][13314] Signal inference workers to stop experience collection... (300 times) [2023-02-26 13:15:26,860][13333] InferenceWorker_p0-w0: stopping experience collection (300 times) [2023-02-26 13:15:28,206][13314] Signal inference workers to resume experience collection... (300 times) [2023-02-26 13:15:28,212][13333] InferenceWorker_p0-w0: resuming experience collection (300 times) [2023-02-26 13:15:29,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 1232896. Throughput: 0: 200.9. Samples: 309884. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 13:15:29,319][00203] Avg episode reward: [(0, '5.082')] [2023-02-26 13:15:34,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 1236992. Throughput: 0: 205.7. Samples: 310790. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 13:15:34,316][00203] Avg episode reward: [(0, '5.137')] [2023-02-26 13:15:39,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 1241088. Throughput: 0: 207.8. Samples: 312026. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2023-02-26 13:15:39,316][00203] Avg episode reward: [(0, '5.125')] [2023-02-26 13:15:44,313][00203] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 1245184. Throughput: 0: 199.4. Samples: 313398. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2023-02-26 13:15:44,320][00203] Avg episode reward: [(0, '5.177')] [2023-02-26 13:15:49,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 1249280. Throughput: 0: 202.8. Samples: 313896. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:15:49,315][00203] Avg episode reward: [(0, '5.196')] [2023-02-26 13:15:54,311][00203] Fps is (10 sec: 819.4, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 1253376. Throughput: 0: 205.2. Samples: 315026. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:15:54,318][00203] Avg episode reward: [(0, '5.271')] [2023-02-26 13:15:59,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 1257472. Throughput: 0: 212.8. Samples: 316762. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2023-02-26 13:15:59,324][00203] Avg episode reward: [(0, '5.474')] [2023-02-26 13:16:04,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 1261568. Throughput: 0: 205.9. Samples: 317098. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2023-02-26 13:16:04,314][00203] Avg episode reward: [(0, '5.526')] [2023-02-26 13:16:07,599][13314] Saving new best policy, reward=5.474! [2023-02-26 13:16:07,738][13314] Saving new best policy, reward=5.526! [2023-02-26 13:16:09,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 1265664. Throughput: 0: 201.7. Samples: 318058. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 13:16:09,319][00203] Avg episode reward: [(0, '5.474')] [2023-02-26 13:16:13,272][13333] Updated weights for policy 0, policy_version 310 (0.0536) [2023-02-26 13:16:14,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 1269760. Throughput: 0: 206.3. Samples: 319166. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:16:14,317][00203] Avg episode reward: [(0, '5.577')] [2023-02-26 13:16:17,730][13314] Saving new best policy, reward=5.577! [2023-02-26 13:16:19,312][00203] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 1273856. Throughput: 0: 206.3. Samples: 320072. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:16:19,316][00203] Avg episode reward: [(0, '5.571')] [2023-02-26 13:16:24,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 1277952. Throughput: 0: 201.1. Samples: 321074. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 13:16:24,324][00203] Avg episode reward: [(0, '5.378')] [2023-02-26 13:16:29,311][00203] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 1277952. Throughput: 0: 194.2. Samples: 322136. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 13:16:29,315][00203] Avg episode reward: [(0, '5.352')] [2023-02-26 13:16:34,311][00203] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 1282048. Throughput: 0: 192.3. Samples: 322548. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 13:16:34,318][00203] Avg episode reward: [(0, '5.446')] [2023-02-26 13:16:39,311][00203] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 1290240. Throughput: 0: 202.7. Samples: 324148. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 13:16:39,321][00203] Avg episode reward: [(0, '5.404')] [2023-02-26 13:16:44,313][00203] Fps is (10 sec: 1228.6, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 1294336. Throughput: 0: 186.9. Samples: 325172. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2023-02-26 13:16:44,318][00203] Avg episode reward: [(0, '5.352')] [2023-02-26 13:16:49,311][00203] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 1294336. Throughput: 0: 193.4. Samples: 325802. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2023-02-26 13:16:49,313][00203] Avg episode reward: [(0, '5.417')] [2023-02-26 13:16:54,311][00203] Fps is (10 sec: 409.7, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 1298432. Throughput: 0: 193.3. Samples: 326758. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2023-02-26 13:16:54,314][00203] Avg episode reward: [(0, '5.375')] [2023-02-26 13:16:54,708][13314] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000318_1302528.pth... [2023-02-26 13:16:54,831][13314] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000271_1110016.pth [2023-02-26 13:16:59,311][00203] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 1306624. Throughput: 0: 200.7. Samples: 328196. 
Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:16:59,317][00203] Avg episode reward: [(0, '5.355')] [2023-02-26 13:17:03,509][13333] Updated weights for policy 0, policy_version 320 (0.0644) [2023-02-26 13:17:04,311][00203] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 1310720. Throughput: 0: 203.7. Samples: 329240. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:17:04,315][00203] Avg episode reward: [(0, '5.342')] [2023-02-26 13:17:09,311][00203] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 1310720. Throughput: 0: 201.0. Samples: 330118. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:17:09,314][00203] Avg episode reward: [(0, '5.352')] [2023-02-26 13:17:14,311][00203] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 1314816. Throughput: 0: 201.3. Samples: 331196. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:17:14,321][00203] Avg episode reward: [(0, '5.338')] [2023-02-26 13:17:19,311][00203] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 1323008. Throughput: 0: 209.6. Samples: 331982. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:17:19,316][00203] Avg episode reward: [(0, '5.406')] [2023-02-26 13:17:24,311][00203] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 1327104. Throughput: 0: 205.2. Samples: 333384. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 13:17:24,323][00203] Avg episode reward: [(0, '5.393')] [2023-02-26 13:17:29,311][00203] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 1327104. Throughput: 0: 204.5. Samples: 334376. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 13:17:29,317][00203] Avg episode reward: [(0, '5.268')] [2023-02-26 13:17:34,311][00203] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 1331200. Throughput: 0: 199.0. Samples: 334756. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 13:17:34,319][00203] Avg episode reward: [(0, '5.251')] [2023-02-26 13:17:39,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 1335296. Throughput: 0: 210.3. Samples: 336222. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 13:17:39,315][00203] Avg episode reward: [(0, '5.480')] [2023-02-26 13:17:44,311][00203] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 1343488. Throughput: 0: 206.4. Samples: 337482. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:17:44,314][00203] Avg episode reward: [(0, '5.331')] [2023-02-26 13:17:49,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 1343488. Throughput: 0: 194.4. Samples: 337988. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:17:49,314][00203] Avg episode reward: [(0, '5.226')] [2023-02-26 13:17:54,311][00203] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 1347584. Throughput: 0: 195.4. Samples: 338910. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:17:54,321][00203] Avg episode reward: [(0, '5.216')] [2023-02-26 13:17:59,313][00203] Fps is (10 sec: 409.5, 60 sec: 682.6, 300 sec: 791.4). Total num frames: 1347584. Throughput: 0: 184.1. Samples: 339482. 
Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:17:59,322][00203] Avg episode reward: [(0, '5.194')] [2023-02-26 13:18:00,017][13333] Updated weights for policy 0, policy_version 330 (0.0617) [2023-02-26 13:18:04,311][00203] Fps is (10 sec: 409.6, 60 sec: 682.7, 300 sec: 777.6). Total num frames: 1351680. Throughput: 0: 171.9. Samples: 339718. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:18:04,319][00203] Avg episode reward: [(0, '5.303')] [2023-02-26 13:18:09,313][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 1355776. Throughput: 0: 169.5. Samples: 341014. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 13:18:09,318][00203] Avg episode reward: [(0, '5.234')] [2023-02-26 13:18:14,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 1359872. Throughput: 0: 166.1. Samples: 341850. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 13:18:14,319][00203] Avg episode reward: [(0, '5.217')] [2023-02-26 13:18:19,311][00203] Fps is (10 sec: 819.4, 60 sec: 682.7, 300 sec: 791.4). Total num frames: 1363968. Throughput: 0: 170.2. Samples: 342416. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2023-02-26 13:18:19,316][00203] Avg episode reward: [(0, '5.312')] [2023-02-26 13:18:24,311][00203] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 791.4). Total num frames: 1368064. Throughput: 0: 161.2. Samples: 343474. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2023-02-26 13:18:24,322][00203] Avg episode reward: [(0, '5.250')] [2023-02-26 13:18:29,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 1372160. Throughput: 0: 165.4. Samples: 344926. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2023-02-26 13:18:29,314][00203] Avg episode reward: [(0, '5.342')] [2023-02-26 13:18:34,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 1376256. Throughput: 0: 167.1. Samples: 345508. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2023-02-26 13:18:34,317][00203] Avg episode reward: [(0, '5.362')] [2023-02-26 13:18:39,311][00203] Fps is (10 sec: 409.6, 60 sec: 682.7, 300 sec: 777.5). Total num frames: 1376256. Throughput: 0: 166.8. Samples: 346416. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2023-02-26 13:18:39,324][00203] Avg episode reward: [(0, '5.418')] [2023-02-26 13:18:44,311][00203] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 791.4). Total num frames: 1384448. Throughput: 0: 179.4. Samples: 347554. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2023-02-26 13:18:44,315][00203] Avg episode reward: [(0, '5.395')] [2023-02-26 13:18:49,311][00203] Fps is (10 sec: 1228.8, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 1388544. Throughput: 0: 193.8. Samples: 348440. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2023-02-26 13:18:49,316][00203] Avg episode reward: [(0, '5.598')] [2023-02-26 13:18:54,311][00203] Fps is (10 sec: 409.6, 60 sec: 682.7, 300 sec: 777.5). Total num frames: 1388544. Throughput: 0: 190.0. Samples: 349564. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2023-02-26 13:18:54,315][00203] Avg episode reward: [(0, '5.588')] [2023-02-26 13:18:54,507][13314] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000340_1392640.pth... 
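
The recurring Saving/Removing pairs in this log (e.g. checkpoint_000000318_1302528.pth saved while checkpoint_000000271_1110016.pth is deleted) show that only a couple of recent checkpoint files are kept on disk at a time. Below is a minimal sketch of that keep-latest-N rotation, assuming hypothetical `save_checkpoint`/`rotate_checkpoints` helpers and a window of two files; it illustrates the pattern visible in the log rather than the trainer's actual implementation.

```python
# Minimal sketch of the keep-latest-N checkpoint rotation visible in the log:
# each "Saving .../checkpoint_<version>_<env_steps>.pth..." is followed by a
# "Removing ..." of the oldest file. Helper names and the window size of 2 are
# assumptions for the illustration.
import glob
import os


def rotate_checkpoints(checkpoint_dir: str, keep: int = 2) -> None:
    """Delete all but the `keep` most recent checkpoint_*.pth files."""
    # Zero-padded policy versions make a lexicographic sort chronological.
    paths = sorted(glob.glob(os.path.join(checkpoint_dir, "checkpoint_*.pth")))
    for stale in paths[:-keep]:
        print(f"Removing {stale}")
        os.remove(stale)


def save_checkpoint(checkpoint_dir: str, policy_version: int, env_steps: int,
                    state_bytes: bytes, keep: int = 2) -> str:
    """Write a new checkpoint, then prune older ones (illustrative only)."""
    os.makedirs(checkpoint_dir, exist_ok=True)
    name = f"checkpoint_{policy_version:09d}_{env_steps}.pth"
    path = os.path.join(checkpoint_dir, name)
    with open(path, "wb") as f:
        f.write(state_bytes)  # real code would serialize model state, e.g. torch.save
    print(f"Saving {path}...")
    rotate_checkpoints(checkpoint_dir, keep=keep)
    return path
```

With a window of two, saving a new checkpoint evicts the oldest remaining one, which matches the Saving/Removing pairs that repeat every couple of minutes in this log.
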
[2023-02-26 13:18:54,509][13333] Updated weights for policy 0, policy_version 340 (0.0556) [2023-02-26 13:18:54,629][13314] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000294_1204224.pth [2023-02-26 13:18:54,651][13314] Saving new best policy, reward=5.598! [2023-02-26 13:18:59,311][00203] Fps is (10 sec: 409.6, 60 sec: 751.0, 300 sec: 777.5). Total num frames: 1392640. Throughput: 0: 189.6. Samples: 350384. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2023-02-26 13:18:59,316][00203] Avg episode reward: [(0, '5.545')] [2023-02-26 13:19:04,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 1396736. Throughput: 0: 187.0. Samples: 350830. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2023-02-26 13:19:04,325][00203] Avg episode reward: [(0, '5.468')] [2023-02-26 13:19:09,311][00203] Fps is (10 sec: 819.2, 60 sec: 751.0, 300 sec: 777.5). Total num frames: 1400832. Throughput: 0: 198.9. Samples: 352426. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2023-02-26 13:19:09,319][00203] Avg episode reward: [(0, '5.602')] [2023-02-26 13:19:14,312][00203] Fps is (10 sec: 819.1, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 1404928. Throughput: 0: 187.6. Samples: 353370. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2023-02-26 13:19:14,353][00203] Avg episode reward: [(0, '5.631')] [2023-02-26 13:19:16,195][13314] Saving new best policy, reward=5.602! [2023-02-26 13:19:16,353][13314] Saving new best policy, reward=5.631! [2023-02-26 13:19:19,312][00203] Fps is (10 sec: 819.1, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 1409024. Throughput: 0: 182.4. Samples: 353718. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2023-02-26 13:19:19,322][00203] Avg episode reward: [(0, '5.635')] [2023-02-26 13:19:22,081][13314] Saving new best policy, reward=5.635! [2023-02-26 13:19:24,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 1413120. Throughput: 0: 185.7. Samples: 354772. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2023-02-26 13:19:24,315][00203] Avg episode reward: [(0, '5.628')] [2023-02-26 13:19:29,311][00203] Fps is (10 sec: 819.3, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 1417216. Throughput: 0: 194.7. Samples: 356314. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2023-02-26 13:19:29,319][00203] Avg episode reward: [(0, '5.572')] [2023-02-26 13:19:34,316][00203] Fps is (10 sec: 818.8, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 1421312. Throughput: 0: 185.6. Samples: 356792. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0) [2023-02-26 13:19:34,320][00203] Avg episode reward: [(0, '5.517')] [2023-02-26 13:19:39,313][00203] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 777.5). Total num frames: 1425408. Throughput: 0: 180.4. Samples: 357682. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0) [2023-02-26 13:19:39,317][00203] Avg episode reward: [(0, '5.341')] [2023-02-26 13:19:44,311][00203] Fps is (10 sec: 819.6, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 1429504. Throughput: 0: 184.6. Samples: 358690. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0) [2023-02-26 13:19:44,314][00203] Avg episode reward: [(0, '5.311')] [2023-02-26 13:19:47,581][13333] Updated weights for policy 0, policy_version 350 (0.0533) [2023-02-26 13:19:49,311][00203] Fps is (10 sec: 819.4, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 1433600. Throughput: 0: 196.4. Samples: 359666. 
Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0) [2023-02-26 13:19:49,315][00203] Avg episode reward: [(0, '5.067')] [2023-02-26 13:19:50,140][13314] Signal inference workers to stop experience collection... (350 times) [2023-02-26 13:19:50,178][13333] InferenceWorker_p0-w0: stopping experience collection (350 times) [2023-02-26 13:19:51,581][13314] Signal inference workers to resume experience collection... (350 times) [2023-02-26 13:19:51,583][13333] InferenceWorker_p0-w0: resuming experience collection (350 times) [2023-02-26 13:19:54,318][00203] Fps is (10 sec: 818.6, 60 sec: 819.1, 300 sec: 777.5). Total num frames: 1437696. Throughput: 0: 187.7. Samples: 360874. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0) [2023-02-26 13:19:54,321][00203] Avg episode reward: [(0, '5.083')] [2023-02-26 13:19:59,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 777.5). Total num frames: 1441792. Throughput: 0: 189.7. Samples: 361906. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0) [2023-02-26 13:19:59,319][00203] Avg episode reward: [(0, '5.019')] [2023-02-26 13:20:04,311][00203] Fps is (10 sec: 819.8, 60 sec: 819.2, 300 sec: 777.5). Total num frames: 1445888. Throughput: 0: 195.6. Samples: 362520. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0) [2023-02-26 13:20:04,320][00203] Avg episode reward: [(0, '5.028')] [2023-02-26 13:20:09,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 777.5). Total num frames: 1449984. Throughput: 0: 198.1. Samples: 363686. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0) [2023-02-26 13:20:09,321][00203] Avg episode reward: [(0, '4.977')] [2023-02-26 13:20:14,317][00203] Fps is (10 sec: 818.7, 60 sec: 819.1, 300 sec: 777.5). Total num frames: 1454080. Throughput: 0: 200.7. Samples: 365346. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0) [2023-02-26 13:20:14,327][00203] Avg episode reward: [(0, '5.061')] [2023-02-26 13:20:19,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 777.5). Total num frames: 1458176. Throughput: 0: 200.2. Samples: 365802. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0) [2023-02-26 13:20:19,319][00203] Avg episode reward: [(0, '5.140')] [2023-02-26 13:20:24,311][00203] Fps is (10 sec: 409.8, 60 sec: 750.9, 300 sec: 763.7). Total num frames: 1458176. Throughput: 0: 203.2. Samples: 366824. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0) [2023-02-26 13:20:24,314][00203] Avg episode reward: [(0, '5.209')] [2023-02-26 13:20:29,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 777.5). Total num frames: 1466368. Throughput: 0: 206.0. Samples: 367962. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0) [2023-02-26 13:20:29,315][00203] Avg episode reward: [(0, '5.169')] [2023-02-26 13:20:34,311][00203] Fps is (10 sec: 1228.8, 60 sec: 819.3, 300 sec: 777.5). Total num frames: 1470464. Throughput: 0: 205.4. Samples: 368910. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0) [2023-02-26 13:20:34,318][00203] Avg episode reward: [(0, '5.267')] [2023-02-26 13:20:39,103][13333] Updated weights for policy 0, policy_version 360 (0.0620) [2023-02-26 13:20:39,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 777.6). Total num frames: 1474560. Throughput: 0: 200.4. Samples: 369892. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0) [2023-02-26 13:20:39,317][00203] Avg episode reward: [(0, '5.326')] [2023-02-26 13:20:44,311][00203] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 763.7). Total num frames: 1474560. Throughput: 0: 198.1. Samples: 370822. 
Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0) [2023-02-26 13:20:44,319][00203] Avg episode reward: [(0, '5.367')] [2023-02-26 13:20:49,311][00203] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 763.7). Total num frames: 1478656. Throughput: 0: 195.5. Samples: 371316. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2023-02-26 13:20:49,323][00203] Avg episode reward: [(0, '5.440')] [2023-02-26 13:20:54,101][13314] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000363_1486848.pth... [2023-02-26 13:20:54,216][13314] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000318_1302528.pth [2023-02-26 13:20:54,311][00203] Fps is (10 sec: 1228.8, 60 sec: 819.3, 300 sec: 777.5). Total num frames: 1486848. Throughput: 0: 205.1. Samples: 372914. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0) [2023-02-26 13:20:54,326][00203] Avg episode reward: [(0, '5.446')] [2023-02-26 13:20:59,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 763.7). Total num frames: 1486848. Throughput: 0: 190.7. Samples: 373928. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0) [2023-02-26 13:20:59,314][00203] Avg episode reward: [(0, '5.453')] [2023-02-26 13:21:04,311][00203] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 763.7). Total num frames: 1490944. Throughput: 0: 185.8. Samples: 374164. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0) [2023-02-26 13:21:04,318][00203] Avg episode reward: [(0, '5.462')] [2023-02-26 13:21:09,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 763.7). Total num frames: 1495040. Throughput: 0: 189.5. Samples: 375350. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0) [2023-02-26 13:21:09,324][00203] Avg episode reward: [(0, '5.385')] [2023-02-26 13:21:14,311][00203] Fps is (10 sec: 819.2, 60 sec: 751.0, 300 sec: 763.7). Total num frames: 1499136. Throughput: 0: 203.3. Samples: 377110. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0) [2023-02-26 13:21:14,324][00203] Avg episode reward: [(0, '5.290')] [2023-02-26 13:21:19,313][00203] Fps is (10 sec: 819.0, 60 sec: 750.9, 300 sec: 763.7). Total num frames: 1503232. Throughput: 0: 188.8. Samples: 377408. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2023-02-26 13:21:19,321][00203] Avg episode reward: [(0, '5.256')] [2023-02-26 13:21:24,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 777.5). Total num frames: 1507328. Throughput: 0: 190.2. Samples: 378452. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2023-02-26 13:21:24,318][00203] Avg episode reward: [(0, '5.424')] [2023-02-26 13:21:29,311][00203] Fps is (10 sec: 819.4, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 1511424. Throughput: 0: 194.7. Samples: 379582. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2023-02-26 13:21:29,313][00203] Avg episode reward: [(0, '5.357')] [2023-02-26 13:21:31,550][13333] Updated weights for policy 0, policy_version 370 (0.0537) [2023-02-26 13:21:34,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 763.7). Total num frames: 1515520. Throughput: 0: 198.4. Samples: 380242. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2023-02-26 13:21:34,314][00203] Avg episode reward: [(0, '5.279')] [2023-02-26 13:21:39,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 763.7). Total num frames: 1519616. Throughput: 0: 199.1. Samples: 381874. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0) [2023-02-26 13:21:39,325][00203] Avg episode reward: [(0, '5.276')] [2023-02-26 13:21:44,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 777.5). Total num frames: 1523712. Throughput: 0: 194.9. 
Samples: 382698. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0) [2023-02-26 13:21:44,319][00203] Avg episode reward: [(0, '5.263')] [2023-02-26 13:21:49,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 777.5). Total num frames: 1527808. Throughput: 0: 202.0. Samples: 383256. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2023-02-26 13:21:49,314][00203] Avg episode reward: [(0, '5.263')] [2023-02-26 13:21:54,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 763.7). Total num frames: 1531904. Throughput: 0: 198.6. Samples: 384288. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2023-02-26 13:21:54,325][00203] Avg episode reward: [(0, '5.334')] [2023-02-26 13:21:59,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 763.7). Total num frames: 1536000. Throughput: 0: 195.9. Samples: 385926. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2023-02-26 13:21:59,316][00203] Avg episode reward: [(0, '5.343')] [2023-02-26 13:22:04,313][00203] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 777.5). Total num frames: 1540096. Throughput: 0: 198.5. Samples: 386342. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2023-02-26 13:22:04,319][00203] Avg episode reward: [(0, '5.402')] [2023-02-26 13:22:09,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 777.5). Total num frames: 1544192. Throughput: 0: 197.4. Samples: 387336. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2023-02-26 13:22:09,316][00203] Avg episode reward: [(0, '5.355')] [2023-02-26 13:22:14,311][00203] Fps is (10 sec: 819.4, 60 sec: 819.2, 300 sec: 763.7). Total num frames: 1548288. Throughput: 0: 199.2. Samples: 388544. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2023-02-26 13:22:14,318][00203] Avg episode reward: [(0, '5.103')] [2023-02-26 13:22:19,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 763.7). Total num frames: 1552384. Throughput: 0: 203.8. Samples: 389414. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2023-02-26 13:22:19,321][00203] Avg episode reward: [(0, '5.086')] [2023-02-26 13:22:23,046][13333] Updated weights for policy 0, policy_version 380 (0.0600) [2023-02-26 13:22:24,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 777.5). Total num frames: 1556480. Throughput: 0: 189.6. Samples: 390408. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2023-02-26 13:22:24,318][00203] Avg episode reward: [(0, '5.112')] [2023-02-26 13:22:29,311][00203] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 763.7). Total num frames: 1556480. Throughput: 0: 194.4. Samples: 391448. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2023-02-26 13:22:29,319][00203] Avg episode reward: [(0, '5.073')] [2023-02-26 13:22:34,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 777.5). Total num frames: 1564672. Throughput: 0: 197.7. Samples: 392152. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2023-02-26 13:22:34,319][00203] Avg episode reward: [(0, '5.198')] [2023-02-26 13:22:39,311][00203] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 763.7). Total num frames: 1568768. Throughput: 0: 204.4. Samples: 393484. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2023-02-26 13:22:39,322][00203] Avg episode reward: [(0, '5.199')] [2023-02-26 13:22:44,312][00203] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 777.5). Total num frames: 1572864. Throughput: 0: 191.1. Samples: 394524. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2023-02-26 13:22:44,315][00203] Avg episode reward: [(0, '5.048')] [2023-02-26 13:22:49,311][00203] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 763.7). Total num frames: 1572864. 
Throughput: 0: 196.4. Samples: 395180. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2023-02-26 13:22:49,319][00203] Avg episode reward: [(0, '5.169')] [2023-02-26 13:22:54,250][13314] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000386_1581056.pth... [2023-02-26 13:22:54,311][00203] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 1581056. Throughput: 0: 205.2. Samples: 396572. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2023-02-26 13:22:54,314][00203] Avg episode reward: [(0, '5.130')] [2023-02-26 13:22:54,369][13314] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000340_1392640.pth [2023-02-26 13:22:59,311][00203] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 1585152. Throughput: 0: 202.3. Samples: 397648. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2023-02-26 13:22:59,315][00203] Avg episode reward: [(0, '5.278')] [2023-02-26 13:23:04,311][00203] Fps is (10 sec: 409.6, 60 sec: 751.0, 300 sec: 777.6). Total num frames: 1585152. Throughput: 0: 196.2. Samples: 398244. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2023-02-26 13:23:04,316][00203] Avg episode reward: [(0, '5.286')] [2023-02-26 13:23:09,311][00203] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 1589248. Throughput: 0: 194.8. Samples: 399176. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2023-02-26 13:23:09,317][00203] Avg episode reward: [(0, '5.276')] [2023-02-26 13:23:14,313][00203] Fps is (10 sec: 819.0, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 1593344. Throughput: 0: 201.0. Samples: 400492. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2023-02-26 13:23:14,325][00203] Avg episode reward: [(0, '5.325')] [2023-02-26 13:23:15,823][13333] Updated weights for policy 0, policy_version 390 (0.0989) [2023-02-26 13:23:19,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 1597440. Throughput: 0: 199.6. Samples: 401136. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2023-02-26 13:23:19,320][00203] Avg episode reward: [(0, '5.380')] [2023-02-26 13:23:24,311][00203] Fps is (10 sec: 819.4, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 1601536. Throughput: 0: 203.7. Samples: 402652. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2023-02-26 13:23:24,314][00203] Avg episode reward: [(0, '5.504')] [2023-02-26 13:23:29,314][00203] Fps is (10 sec: 818.9, 60 sec: 819.2, 300 sec: 777.5). Total num frames: 1605632. Throughput: 0: 199.5. Samples: 403500. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2023-02-26 13:23:29,320][00203] Avg episode reward: [(0, '5.366')] [2023-02-26 13:23:34,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 1609728. Throughput: 0: 193.1. Samples: 403868. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2023-02-26 13:23:34,321][00203] Avg episode reward: [(0, '5.316')] [2023-02-26 13:23:39,311][00203] Fps is (10 sec: 819.5, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 1613824. Throughput: 0: 196.7. Samples: 405424. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2023-02-26 13:23:39,318][00203] Avg episode reward: [(0, '5.342')] [2023-02-26 13:23:44,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 1617920. Throughput: 0: 206.2. Samples: 406928. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2023-02-26 13:23:44,316][00203] Avg episode reward: [(0, '5.339')] [2023-02-26 13:23:49,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). 
Total num frames: 1622016. Throughput: 0: 198.8. Samples: 407192. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2023-02-26 13:23:49,318][00203] Avg episode reward: [(0, '5.398')] [2023-02-26 13:23:54,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 1626112. Throughput: 0: 200.4. Samples: 408194. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2023-02-26 13:23:54,324][00203] Avg episode reward: [(0, '5.322')] [2023-02-26 13:23:59,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 1630208. Throughput: 0: 209.7. Samples: 409930. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2023-02-26 13:23:59,316][00203] Avg episode reward: [(0, '5.465')] [2023-02-26 13:24:04,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 1634304. Throughput: 0: 209.7. Samples: 410572. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2023-02-26 13:24:04,315][00203] Avg episode reward: [(0, '5.423')] [2023-02-26 13:24:06,078][13333] Updated weights for policy 0, policy_version 400 (0.0049) [2023-02-26 13:24:09,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 1638400. Throughput: 0: 194.2. Samples: 411390. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2023-02-26 13:24:09,313][00203] Avg episode reward: [(0, '5.379')] [2023-02-26 13:24:11,022][13314] Signal inference workers to stop experience collection... (400 times) [2023-02-26 13:24:11,106][13333] InferenceWorker_p0-w0: stopping experience collection (400 times) [2023-02-26 13:24:12,345][13314] Signal inference workers to resume experience collection... (400 times) [2023-02-26 13:24:12,347][13333] InferenceWorker_p0-w0: resuming experience collection (400 times) [2023-02-26 13:24:14,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 1642496. Throughput: 0: 198.3. Samples: 412424. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2023-02-26 13:24:14,315][00203] Avg episode reward: [(0, '5.357')] [2023-02-26 13:24:19,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 1646592. Throughput: 0: 206.4. Samples: 413156. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2023-02-26 13:24:19,321][00203] Avg episode reward: [(0, '5.211')] [2023-02-26 13:24:24,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 1650688. Throughput: 0: 210.0. Samples: 414874. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:24:24,321][00203] Avg episode reward: [(0, '5.221')] [2023-02-26 13:24:29,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 1654784. Throughput: 0: 193.5. Samples: 415636. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:24:29,321][00203] Avg episode reward: [(0, '5.214')] [2023-02-26 13:24:34,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 1658880. Throughput: 0: 202.0. Samples: 416282. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2023-02-26 13:24:34,317][00203] Avg episode reward: [(0, '5.237')] [2023-02-26 13:24:39,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 1662976. Throughput: 0: 207.5. Samples: 417530. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2023-02-26 13:24:39,321][00203] Avg episode reward: [(0, '5.286')] [2023-02-26 13:24:44,312][00203] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 1667072. Throughput: 0: 203.5. Samples: 419088. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:24:44,319][00203] Avg episode reward: [(0, '5.505')] [2023-02-26 13:24:49,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 1671168. Throughput: 0: 196.8. Samples: 419426. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:24:49,315][00203] Avg episode reward: [(0, '5.528')] [2023-02-26 13:24:53,398][13314] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000409_1675264.pth... [2023-02-26 13:24:53,515][13314] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000363_1486848.pth [2023-02-26 13:24:54,311][00203] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 1675264. Throughput: 0: 201.3. Samples: 420450. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:24:54,314][00203] Avg episode reward: [(0, '5.570')] [2023-02-26 13:24:57,969][13333] Updated weights for policy 0, policy_version 410 (0.1238) [2023-02-26 13:24:59,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 1679360. Throughput: 0: 207.1. Samples: 421742. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:24:59,314][00203] Avg episode reward: [(0, '5.602')] [2023-02-26 13:25:04,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 1683456. Throughput: 0: 207.6. Samples: 422500. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:25:04,320][00203] Avg episode reward: [(0, '5.541')] [2023-02-26 13:25:09,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 1687552. Throughput: 0: 191.2. Samples: 423480. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:25:09,325][00203] Avg episode reward: [(0, '5.629')] [2023-02-26 13:25:14,311][00203] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 1687552. Throughput: 0: 197.8. Samples: 424538. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:25:14,318][00203] Avg episode reward: [(0, '5.742')] [2023-02-26 13:25:19,126][13314] Saving new best policy, reward=5.742! [2023-02-26 13:25:19,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 1695744. Throughput: 0: 198.6. Samples: 425218. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:25:19,315][00203] Avg episode reward: [(0, '5.754')] [2023-02-26 13:25:23,125][13314] Saving new best policy, reward=5.754! [2023-02-26 13:25:24,311][00203] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 1699840. Throughput: 0: 201.7. Samples: 426606. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 13:25:24,317][00203] Avg episode reward: [(0, '5.851')] [2023-02-26 13:25:29,143][13314] Saving new best policy, reward=5.851! [2023-02-26 13:25:29,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 1703936. Throughput: 0: 189.2. Samples: 427602. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 13:25:29,313][00203] Avg episode reward: [(0, '5.857')] [2023-02-26 13:25:34,311][00203] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 1703936. Throughput: 0: 193.4. Samples: 428128. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 13:25:34,316][00203] Avg episode reward: [(0, '5.790')] [2023-02-26 13:25:35,522][13314] Saving new best policy, reward=5.857! [2023-02-26 13:25:39,311][00203] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 1708032. 
Throughput: 0: 197.6. Samples: 429344. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:25:39,321][00203] Avg episode reward: [(0, '5.783')] [2023-02-26 13:25:44,311][00203] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 1716224. Throughput: 0: 198.6. Samples: 430680. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:25:44,321][00203] Avg episode reward: [(0, '5.728')] [2023-02-26 13:25:49,313][00203] Fps is (10 sec: 819.0, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 1716224. Throughput: 0: 191.8. Samples: 431130. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:25:49,319][00203] Avg episode reward: [(0, '5.613')] [2023-02-26 13:25:50,784][13333] Updated weights for policy 0, policy_version 420 (0.0637) [2023-02-26 13:25:54,314][00203] Fps is (10 sec: 409.5, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 1720320. Throughput: 0: 192.1. Samples: 432124. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:25:54,318][00203] Avg episode reward: [(0, '5.668')] [2023-02-26 13:25:59,311][00203] Fps is (10 sec: 819.4, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 1724416. Throughput: 0: 193.7. Samples: 433254. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:25:59,315][00203] Avg episode reward: [(0, '5.758')] [2023-02-26 13:26:04,311][00203] Fps is (10 sec: 819.5, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 1728512. Throughput: 0: 190.7. Samples: 433800. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:26:04,318][00203] Avg episode reward: [(0, '5.897')] [2023-02-26 13:26:06,170][13314] Saving new best policy, reward=5.897! [2023-02-26 13:26:09,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 1732608. Throughput: 0: 189.6. Samples: 435136. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:26:09,316][00203] Avg episode reward: [(0, '5.940')] [2023-02-26 13:26:12,918][13314] Saving new best policy, reward=5.940! [2023-02-26 13:26:14,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 1736704. Throughput: 0: 185.2. Samples: 435934. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:26:14,323][00203] Avg episode reward: [(0, '5.871')] [2023-02-26 13:26:19,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 1740800. Throughput: 0: 186.8. Samples: 436536. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:26:19,315][00203] Avg episode reward: [(0, '5.874')] [2023-02-26 13:26:24,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 1744896. Throughput: 0: 189.2. Samples: 437860. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:26:24,322][00203] Avg episode reward: [(0, '5.868')] [2023-02-26 13:26:29,315][00203] Fps is (10 sec: 818.8, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 1748992. Throughput: 0: 186.7. Samples: 439082. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:26:29,319][00203] Avg episode reward: [(0, '5.872')] [2023-02-26 13:26:34,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 1753088. Throughput: 0: 190.1. Samples: 439686. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:26:34,317][00203] Avg episode reward: [(0, '5.860')] [2023-02-26 13:26:39,311][00203] Fps is (10 sec: 409.8, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 1753088. Throughput: 0: 189.6. Samples: 440654. 
Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:26:39,324][00203] Avg episode reward: [(0, '5.821')] [2023-02-26 13:26:44,060][13333] Updated weights for policy 0, policy_version 430 (0.0547) [2023-02-26 13:26:44,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 1761280. Throughput: 0: 192.0. Samples: 441892. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:26:44,319][00203] Avg episode reward: [(0, '5.903')] [2023-02-26 13:26:49,311][00203] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 1765376. Throughput: 0: 200.9. Samples: 442840. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:26:49,316][00203] Avg episode reward: [(0, '5.939')] [2023-02-26 13:26:54,311][00203] Fps is (10 sec: 409.6, 60 sec: 751.0, 300 sec: 777.5). Total num frames: 1765376. Throughput: 0: 192.8. Samples: 443812. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:26:54,314][00203] Avg episode reward: [(0, '5.966')] [2023-02-26 13:26:55,087][13314] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000432_1769472.pth... [2023-02-26 13:26:55,246][13314] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000386_1581056.pth [2023-02-26 13:26:59,311][00203] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 777.6). Total num frames: 1769472. Throughput: 0: 197.1. Samples: 444802. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:26:59,324][00203] Avg episode reward: [(0, '5.957')] [2023-02-26 13:27:01,072][13314] Saving new best policy, reward=5.966! [2023-02-26 13:27:04,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 1773568. Throughput: 0: 192.5. Samples: 445198. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:27:04,324][00203] Avg episode reward: [(0, '5.955')] [2023-02-26 13:27:09,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 1777664. Throughput: 0: 196.0. Samples: 446678. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:27:09,315][00203] Avg episode reward: [(0, '5.840')] [2023-02-26 13:27:14,314][00203] Fps is (10 sec: 819.0, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 1781760. Throughput: 0: 194.4. Samples: 447830. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:27:14,321][00203] Avg episode reward: [(0, '6.020')] [2023-02-26 13:27:16,997][13314] Saving new best policy, reward=6.020! [2023-02-26 13:27:19,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 1785856. Throughput: 0: 187.2. Samples: 448108. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:27:19,315][00203] Avg episode reward: [(0, '6.098')] [2023-02-26 13:27:23,120][13314] Saving new best policy, reward=6.098! [2023-02-26 13:27:24,311][00203] Fps is (10 sec: 819.5, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 1789952. Throughput: 0: 189.2. Samples: 449168. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:27:24,315][00203] Avg episode reward: [(0, '6.281')] [2023-02-26 13:27:27,159][13314] Saving new best policy, reward=6.281! [2023-02-26 13:27:29,311][00203] Fps is (10 sec: 819.2, 60 sec: 751.0, 300 sec: 777.5). Total num frames: 1794048. Throughput: 0: 196.8. Samples: 450750. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:27:29,316][00203] Avg episode reward: [(0, '6.556')] [2023-02-26 13:27:31,560][13314] Saving new best policy, reward=6.556! 
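
The "Saving new best policy, reward=X!" lines fire whenever the averaged episode reward climbs above the best value recorded so far, independently of the periodic checkpoints. A minimal tracker that reproduces this behaviour might look like the sketch below; `BestPolicyTracker`, `persist_fn` and `min_env_steps` are illustrative names, not the trainer's API.

```python
# Sketch of the "Saving new best policy, reward=X!" pattern: remember the best
# averaged episode reward seen so far and persist the policy whenever it is
# exceeded. `persist_fn` is a hypothetical callback, not Sample Factory's API.
from typing import Callable, Optional


class BestPolicyTracker:
    def __init__(self, persist_fn: Callable[[float], None],
                 min_env_steps: int = 0) -> None:
        self.best_reward: Optional[float] = None
        self.persist_fn = persist_fn
        self.min_env_steps = min_env_steps  # optional warm-up before saving

    def update(self, avg_episode_reward: float, env_steps: int) -> bool:
        if env_steps < self.min_env_steps:
            return False
        if self.best_reward is None or avg_episode_reward > self.best_reward:
            self.best_reward = avg_episode_reward
            print(f"Saving new best policy, reward={avg_episode_reward:.3f}!")
            self.persist_fn(avg_episode_reward)
            return True
        return False


# Example: each improvement over the running best triggers a save; dips do not.
tracker = BestPolicyTracker(persist_fn=lambda r: None)
tracker.best_reward = 6.281  # pretend this was restored from a previous run
for r in (6.523, 6.556, 6.611):
    tracker.update(r, env_steps=1_800_000)
```
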
[2023-02-26 13:27:34,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 1798144. Throughput: 0: 186.2. Samples: 451218. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:27:34,316][00203] Avg episode reward: [(0, '6.523')] [2023-02-26 13:27:37,912][13333] Updated weights for policy 0, policy_version 440 (0.0660) [2023-02-26 13:27:39,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 777.5). Total num frames: 1802240. Throughput: 0: 185.3. Samples: 452152. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:27:39,317][00203] Avg episode reward: [(0, '6.611')] [2023-02-26 13:27:43,453][13314] Saving new best policy, reward=6.611! [2023-02-26 13:27:44,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 1806336. Throughput: 0: 188.7. Samples: 453292. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:27:44,320][00203] Avg episode reward: [(0, '6.658')] [2023-02-26 13:27:47,707][13314] Saving new best policy, reward=6.658! [2023-02-26 13:27:49,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 1810432. Throughput: 0: 200.6. Samples: 454224. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:27:49,321][00203] Avg episode reward: [(0, '6.923')] [2023-02-26 13:27:51,973][13314] Saving new best policy, reward=6.923! [2023-02-26 13:27:54,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 777.5). Total num frames: 1814528. Throughput: 0: 198.1. Samples: 455592. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:27:54,318][00203] Avg episode reward: [(0, '6.802')] [2023-02-26 13:27:59,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 1818624. Throughput: 0: 189.5. Samples: 456356. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:27:59,314][00203] Avg episode reward: [(0, '6.808')] [2023-02-26 13:28:04,311][00203] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 1818624. Throughput: 0: 196.4. Samples: 456946. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:28:04,315][00203] Avg episode reward: [(0, '6.844')] [2023-02-26 13:28:09,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 1826816. Throughput: 0: 202.6. Samples: 458286. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 13:28:09,321][00203] Avg episode reward: [(0, '6.795')] [2023-02-26 13:28:14,331][00203] Fps is (10 sec: 1226.3, 60 sec: 819.0, 300 sec: 791.4). Total num frames: 1830912. Throughput: 0: 196.7. Samples: 459604. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 13:28:14,338][00203] Avg episode reward: [(0, '6.657')] [2023-02-26 13:28:19,314][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 1835008. Throughput: 0: 200.7. Samples: 460248. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 13:28:19,317][00203] Avg episode reward: [(0, '6.748')] [2023-02-26 13:28:24,311][00203] Fps is (10 sec: 410.4, 60 sec: 750.9, 300 sec: 777.6). Total num frames: 1835008. Throughput: 0: 199.5. Samples: 461128. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 13:28:24,317][00203] Avg episode reward: [(0, '6.793')] [2023-02-26 13:28:28,826][13333] Updated weights for policy 0, policy_version 450 (0.0515) [2023-02-26 13:28:29,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 1843200. Throughput: 0: 202.1. Samples: 462386. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2023-02-26 13:28:29,319][00203] Avg episode reward: [(0, '6.756')] [2023-02-26 13:28:31,308][13314] Signal inference workers to stop experience collection... (450 times) [2023-02-26 13:28:31,389][13333] InferenceWorker_p0-w0: stopping experience collection (450 times) [2023-02-26 13:28:32,756][13314] Signal inference workers to resume experience collection... (450 times) [2023-02-26 13:28:32,759][13333] InferenceWorker_p0-w0: resuming experience collection (450 times) [2023-02-26 13:28:34,311][00203] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 1847296. Throughput: 0: 205.5. Samples: 463470. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:28:34,315][00203] Avg episode reward: [(0, '6.831')] [2023-02-26 13:28:39,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 1851392. Throughput: 0: 197.3. Samples: 464470. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:28:39,317][00203] Avg episode reward: [(0, '6.818')] [2023-02-26 13:28:44,311][00203] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 1851392. Throughput: 0: 199.5. Samples: 465332. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:28:44,315][00203] Avg episode reward: [(0, '6.981')] [2023-02-26 13:28:49,311][00203] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 1855488. Throughput: 0: 203.1. Samples: 466086. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 13:28:49,320][00203] Avg episode reward: [(0, '6.912')] [2023-02-26 13:28:49,454][13314] Saving new best policy, reward=6.981! [2023-02-26 13:28:53,553][13314] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000455_1863680.pth... [2023-02-26 13:28:53,669][13314] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000409_1675264.pth [2023-02-26 13:28:54,311][00203] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 1863680. Throughput: 0: 206.5. Samples: 467578. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 13:28:54,314][00203] Avg episode reward: [(0, '6.669')] [2023-02-26 13:28:59,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 1863680. Throughput: 0: 199.7. Samples: 468586. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 13:28:59,315][00203] Avg episode reward: [(0, '6.672')] [2023-02-26 13:29:04,311][00203] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 777.5). Total num frames: 1867776. Throughput: 0: 195.6. Samples: 469048. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 13:29:04,314][00203] Avg episode reward: [(0, '6.885')] [2023-02-26 13:29:09,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 1871872. Throughput: 0: 201.9. Samples: 470212. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:29:09,315][00203] Avg episode reward: [(0, '7.037')] [2023-02-26 13:29:14,289][13314] Saving new best policy, reward=7.037! [2023-02-26 13:29:14,311][00203] Fps is (10 sec: 1228.8, 60 sec: 819.5, 300 sec: 791.4). Total num frames: 1880064. Throughput: 0: 206.6. Samples: 471682. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 13:29:14,314][00203] Avg episode reward: [(0, '7.089')] [2023-02-26 13:29:19,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 1880064. Throughput: 0: 200.6. Samples: 472496. 
Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 13:29:19,314][00203] Avg episode reward: [(0, '7.031')] [2023-02-26 13:29:19,807][13314] Saving new best policy, reward=7.089! [2023-02-26 13:29:19,806][13333] Updated weights for policy 0, policy_version 460 (0.0706) [2023-02-26 13:29:24,312][00203] Fps is (10 sec: 409.5, 60 sec: 819.2, 300 sec: 777.5). Total num frames: 1884160. Throughput: 0: 195.2. Samples: 473254. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 13:29:24,320][00203] Avg episode reward: [(0, '6.828')] [2023-02-26 13:29:29,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 1888256. Throughput: 0: 206.4. Samples: 474622. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 13:29:29,315][00203] Avg episode reward: [(0, '6.810')] [2023-02-26 13:29:34,311][00203] Fps is (10 sec: 819.3, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 1892352. Throughput: 0: 203.0. Samples: 475220. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 13:29:34,325][00203] Avg episode reward: [(0, '7.059')] [2023-02-26 13:29:39,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 1896448. Throughput: 0: 198.0. Samples: 476488. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 13:29:39,315][00203] Avg episode reward: [(0, '7.059')] [2023-02-26 13:29:44,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 777.5). Total num frames: 1900544. Throughput: 0: 196.9. Samples: 477448. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 13:29:44,315][00203] Avg episode reward: [(0, '7.125')] [2023-02-26 13:29:46,837][13314] Saving new best policy, reward=7.125! [2023-02-26 13:29:49,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 777.5). Total num frames: 1904640. Throughput: 0: 193.9. Samples: 477774. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:29:49,314][00203] Avg episode reward: [(0, '7.020')] [2023-02-26 13:29:54,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 1908736. Throughput: 0: 204.8. Samples: 479430. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:29:54,315][00203] Avg episode reward: [(0, '7.300')] [2023-02-26 13:29:59,312][00203] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 777.5). Total num frames: 1912832. Throughput: 0: 200.8. Samples: 480718. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 13:29:59,324][00203] Avg episode reward: [(0, '7.238')] [2023-02-26 13:30:01,298][13314] Saving new best policy, reward=7.300! [2023-02-26 13:30:04,321][00203] Fps is (10 sec: 818.4, 60 sec: 819.1, 300 sec: 777.5). Total num frames: 1916928. Throughput: 0: 187.8. Samples: 480950. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 13:30:04,331][00203] Avg episode reward: [(0, '7.254')] [2023-02-26 13:30:09,311][00203] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 1921024. Throughput: 0: 191.4. Samples: 481868. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 13:30:09,318][00203] Avg episode reward: [(0, '6.967')] [2023-02-26 13:30:12,319][13333] Updated weights for policy 0, policy_version 470 (0.1127) [2023-02-26 13:30:14,311][00203] Fps is (10 sec: 820.0, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 1925120. Throughput: 0: 195.2. Samples: 483404. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 13:30:14,316][00203] Avg episode reward: [(0, '6.862')] [2023-02-26 13:30:19,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 777.5). 
Total num frames: 1929216. Throughput: 0: 198.0. Samples: 484128. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:30:19,319][00203] Avg episode reward: [(0, '7.041')] [2023-02-26 13:30:24,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 777.5). Total num frames: 1933312. Throughput: 0: 191.4. Samples: 485102. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:30:24,314][00203] Avg episode reward: [(0, '7.025')] [2023-02-26 13:30:29,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 1937408. Throughput: 0: 192.2. Samples: 486098. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:30:29,318][00203] Avg episode reward: [(0, '7.179')] [2023-02-26 13:30:34,325][00203] Fps is (10 sec: 818.0, 60 sec: 819.0, 300 sec: 791.4). Total num frames: 1941504. Throughput: 0: 203.7. Samples: 486944. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:30:34,335][00203] Avg episode reward: [(0, '6.872')] [2023-02-26 13:30:39,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 777.5). Total num frames: 1945600. Throughput: 0: 192.3. Samples: 488084. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2023-02-26 13:30:39,317][00203] Avg episode reward: [(0, '6.738')] [2023-02-26 13:30:44,311][00203] Fps is (10 sec: 820.4, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 1949696. Throughput: 0: 187.9. Samples: 489174. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2023-02-26 13:30:44,315][00203] Avg episode reward: [(0, '6.608')] [2023-02-26 13:30:49,311][00203] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 777.6). Total num frames: 1949696. Throughput: 0: 195.3. Samples: 489738. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2023-02-26 13:30:49,324][00203] Avg episode reward: [(0, '6.631')] [2023-02-26 13:30:53,679][13314] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000478_1957888.pth... [2023-02-26 13:30:53,793][13314] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000432_1769472.pth [2023-02-26 13:30:54,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 1957888. Throughput: 0: 204.7. Samples: 491078. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:30:54,324][00203] Avg episode reward: [(0, '6.476')] [2023-02-26 13:30:59,311][00203] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 1961984. Throughput: 0: 199.6. Samples: 492384. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:30:59,314][00203] Avg episode reward: [(0, '6.589')] [2023-02-26 13:31:03,915][13333] Updated weights for policy 0, policy_version 480 (0.0510) [2023-02-26 13:31:04,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.3, 300 sec: 791.4). Total num frames: 1966080. Throughput: 0: 197.8. Samples: 493030. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:31:04,318][00203] Avg episode reward: [(0, '6.657')] [2023-02-26 13:31:09,311][00203] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 1966080. Throughput: 0: 195.9. Samples: 493918. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:31:09,316][00203] Avg episode reward: [(0, '6.711')] [2023-02-26 13:31:14,311][00203] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 1970176. Throughput: 0: 201.5. Samples: 495166. 
Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:31:14,315][00203] Avg episode reward: [(0, '6.921')] [2023-02-26 13:31:19,311][00203] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 1978368. Throughput: 0: 199.4. Samples: 495914. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:31:19,323][00203] Avg episode reward: [(0, '6.849')] [2023-02-26 13:31:24,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 777.6). Total num frames: 1978368. Throughput: 0: 202.1. Samples: 497180. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:31:24,315][00203] Avg episode reward: [(0, '6.734')] [2023-02-26 13:31:29,311][00203] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 1982464. Throughput: 0: 197.2. Samples: 498046. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:31:29,322][00203] Avg episode reward: [(0, '6.859')] [2023-02-26 13:31:34,311][00203] Fps is (10 sec: 819.2, 60 sec: 751.1, 300 sec: 791.4). Total num frames: 1986560. Throughput: 0: 195.2. Samples: 498520. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:31:34,320][00203] Avg episode reward: [(0, '6.808')] [2023-02-26 13:31:39,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 1990656. Throughput: 0: 202.7. Samples: 500198. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:31:39,316][00203] Avg episode reward: [(0, '6.830')] [2023-02-26 13:31:44,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 1994752. Throughput: 0: 197.5. Samples: 501270. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:31:44,319][00203] Avg episode reward: [(0, '6.981')] [2023-02-26 13:31:49,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 1998848. Throughput: 0: 189.7. Samples: 501566. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:31:49,314][00203] Avg episode reward: [(0, '7.044')] [2023-02-26 13:31:54,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 2002944. Throughput: 0: 192.2. Samples: 502566. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:31:54,320][00203] Avg episode reward: [(0, '7.051')] [2023-02-26 13:31:55,811][13333] Updated weights for policy 0, policy_version 490 (0.0053) [2023-02-26 13:31:59,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 2007040. Throughput: 0: 202.4. Samples: 504276. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:31:59,323][00203] Avg episode reward: [(0, '6.937')] [2023-02-26 13:32:04,359][00203] Fps is (10 sec: 815.3, 60 sec: 750.3, 300 sec: 791.3). Total num frames: 2011136. Throughput: 0: 198.3. Samples: 504846. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:32:04,365][00203] Avg episode reward: [(0, '6.838')] [2023-02-26 13:32:09,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 2015232. Throughput: 0: 189.8. Samples: 505720. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:32:09,315][00203] Avg episode reward: [(0, '6.912')] [2023-02-26 13:32:14,311][00203] Fps is (10 sec: 823.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 2019328. Throughput: 0: 196.6. Samples: 506892. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:32:14,315][00203] Avg episode reward: [(0, '6.761')] [2023-02-26 13:32:19,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 2023424. 
Throughput: 0: 198.4. Samples: 507450. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:32:19,320][00203] Avg episode reward: [(0, '6.866')] [2023-02-26 13:32:24,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 2027520. Throughput: 0: 194.3. Samples: 508942. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:32:24,322][00203] Avg episode reward: [(0, '6.771')] [2023-02-26 13:32:29,314][00203] Fps is (10 sec: 818.9, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 2031616. Throughput: 0: 190.9. Samples: 509862. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:32:29,323][00203] Avg episode reward: [(0, '7.384')] [2023-02-26 13:32:33,574][13314] Saving new best policy, reward=7.384! [2023-02-26 13:32:34,312][00203] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 2035712. Throughput: 0: 197.7. Samples: 510462. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:32:34,315][00203] Avg episode reward: [(0, '7.128')] [2023-02-26 13:32:39,311][00203] Fps is (10 sec: 819.5, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 2039808. Throughput: 0: 201.9. Samples: 511650. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:32:39,316][00203] Avg episode reward: [(0, '7.146')] [2023-02-26 13:32:44,311][00203] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 2043904. Throughput: 0: 200.0. Samples: 513276. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0) [2023-02-26 13:32:44,315][00203] Avg episode reward: [(0, '7.199')] [2023-02-26 13:32:47,416][13333] Updated weights for policy 0, policy_version 500 (0.0066) [2023-02-26 13:32:49,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 2048000. Throughput: 0: 197.2. Samples: 513712. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0) [2023-02-26 13:32:49,314][00203] Avg episode reward: [(0, '7.253')] [2023-02-26 13:32:52,301][13314] Signal inference workers to stop experience collection... (500 times) [2023-02-26 13:32:52,365][13333] InferenceWorker_p0-w0: stopping experience collection (500 times) [2023-02-26 13:32:53,971][13314] Signal inference workers to resume experience collection... (500 times) [2023-02-26 13:32:53,974][13314] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000501_2052096.pth... [2023-02-26 13:32:53,974][13333] InferenceWorker_p0-w0: resuming experience collection (500 times) [2023-02-26 13:32:54,088][13314] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000455_1863680.pth [2023-02-26 13:32:54,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 2052096. Throughput: 0: 200.4. Samples: 514738. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:32:54,318][00203] Avg episode reward: [(0, '7.323')] [2023-02-26 13:32:59,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2056192. Throughput: 0: 200.4. Samples: 515908. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:32:59,316][00203] Avg episode reward: [(0, '7.408')] [2023-02-26 13:33:02,223][13314] Saving new best policy, reward=7.408! [2023-02-26 13:33:04,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.9, 300 sec: 791.4). Total num frames: 2060288. Throughput: 0: 207.2. Samples: 516774. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2023-02-26 13:33:04,316][00203] Avg episode reward: [(0, '7.597')] [2023-02-26 13:33:08,283][13314] Saving new best policy, reward=7.597! 
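
The recurring "Signal inference workers to stop experience collection..." / "...resume experience collection" pairs mark a backpressure handshake between the learner and the samplers: collection pauses while the rollouts already queued are consumed, then resumes. The sketch below mimics that handshake with an in-process threading.Event; the real system coordinates separate processes over queues, so this is only an illustration of the control flow, not the actual async RL code.

```python
# Generic sketch of the stop/resume handshake seen in the log: the learner
# pauses sampling while it catches up on queued rollouts, then resumes it.
# Illustrative threads + Event only; the logged system spans multiple processes.
import threading
import time

collect = threading.Event()
collect.set()                      # workers start in the "collecting" state
stop_resume_count = 0


def inference_worker(worker_id: int, steps: int = 5) -> None:
    for _ in range(steps):
        collect.wait()             # blocks while collection is paused
        time.sleep(0.01)           # stand-in for a policy forward pass


def learner_pause_resume() -> None:
    global stop_resume_count
    stop_resume_count += 1
    print(f"Signal inference workers to stop experience collection... ({stop_resume_count} times)")
    collect.clear()                # workers park at collect.wait()
    time.sleep(0.05)               # stand-in for training on queued rollouts
    print(f"Signal inference workers to resume experience collection... ({stop_resume_count} times)")
    collect.set()


workers = [threading.Thread(target=inference_worker, args=(i,)) for i in range(2)]
for w in workers:
    w.start()
learner_pause_resume()
for w in workers:
    w.join()
```
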
[2023-02-26 13:33:09,312][00203] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 791.5). Total num frames: 2064384. Throughput: 0: 197.3. Samples: 517820. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2023-02-26 13:33:09,324][00203] Avg episode reward: [(0, '7.627')] [2023-02-26 13:33:14,311][00203] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 2064384. Throughput: 0: 199.6. Samples: 518844. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2023-02-26 13:33:14,315][00203] Avg episode reward: [(0, '7.590')] [2023-02-26 13:33:14,983][13314] Saving new best policy, reward=7.627! [2023-02-26 13:33:19,311][00203] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 2068480. Throughput: 0: 198.4. Samples: 519388. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2023-02-26 13:33:19,323][00203] Avg episode reward: [(0, '7.856')] [2023-02-26 13:33:23,758][13314] Saving new best policy, reward=7.856! [2023-02-26 13:33:24,311][00203] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 2076672. Throughput: 0: 204.8. Samples: 520864. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2023-02-26 13:33:24,315][00203] Avg episode reward: [(0, '7.827')] [2023-02-26 13:33:29,311][00203] Fps is (10 sec: 819.2, 60 sec: 751.0, 300 sec: 777.5). Total num frames: 2076672. Throughput: 0: 191.7. Samples: 521904. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2023-02-26 13:33:29,314][00203] Avg episode reward: [(0, '7.726')] [2023-02-26 13:33:34,311][00203] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 2080768. Throughput: 0: 189.5. Samples: 522238. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2023-02-26 13:33:34,320][00203] Avg episode reward: [(0, '7.433')] [2023-02-26 13:33:39,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 2084864. Throughput: 0: 193.8. Samples: 523460. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2023-02-26 13:33:39,317][00203] Avg episode reward: [(0, '7.771')] [2023-02-26 13:33:40,592][13333] Updated weights for policy 0, policy_version 510 (0.0089) [2023-02-26 13:33:44,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 2088960. Throughput: 0: 201.6. Samples: 524978. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2023-02-26 13:33:44,319][00203] Avg episode reward: [(0, '7.624')] [2023-02-26 13:33:49,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 2093056. Throughput: 0: 192.4. Samples: 525434. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0) [2023-02-26 13:33:49,317][00203] Avg episode reward: [(0, '7.367')] [2023-02-26 13:33:54,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 2097152. Throughput: 0: 188.9. Samples: 526320. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0) [2023-02-26 13:33:54,329][00203] Avg episode reward: [(0, '7.151')] [2023-02-26 13:33:59,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 2101248. Throughput: 0: 192.6. Samples: 527512. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2023-02-26 13:33:59,314][00203] Avg episode reward: [(0, '7.175')] [2023-02-26 13:34:04,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 2105344. Throughput: 0: 193.8. Samples: 528108. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2023-02-26 13:34:04,320][00203] Avg episode reward: [(0, '7.132')] [2023-02-26 13:34:09,316][00203] Fps is (10 sec: 818.8, 60 sec: 750.9, 300 sec: 777.5). 
Total num frames: 2109440. Throughput: 0: 196.3. Samples: 529700. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2023-02-26 13:34:09,326][00203] Avg episode reward: [(0, '7.031')] [2023-02-26 13:34:14,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 2113536. Throughput: 0: 190.7. Samples: 530484. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2023-02-26 13:34:14,317][00203] Avg episode reward: [(0, '6.974')] [2023-02-26 13:34:19,317][00203] Fps is (10 sec: 819.2, 60 sec: 819.1, 300 sec: 791.4). Total num frames: 2117632. Throughput: 0: 196.0. Samples: 531058. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2023-02-26 13:34:19,321][00203] Avg episode reward: [(0, '6.943')] [2023-02-26 13:34:24,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 2121728. Throughput: 0: 192.6. Samples: 532126. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2023-02-26 13:34:24,315][00203] Avg episode reward: [(0, '6.901')] [2023-02-26 13:34:29,311][00203] Fps is (10 sec: 819.7, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 2125824. Throughput: 0: 192.9. Samples: 533658. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:34:29,321][00203] Avg episode reward: [(0, '7.082')] [2023-02-26 13:34:32,817][13333] Updated weights for policy 0, policy_version 520 (0.0617) [2023-02-26 13:34:34,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 2129920. Throughput: 0: 192.3. Samples: 534086. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:34:34,315][00203] Avg episode reward: [(0, '6.990')] [2023-02-26 13:34:39,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 2134016. Throughput: 0: 195.6. Samples: 535124. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 13:34:39,321][00203] Avg episode reward: [(0, '6.971')] [2023-02-26 13:34:44,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 2138112. Throughput: 0: 197.1. Samples: 536382. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 13:34:44,320][00203] Avg episode reward: [(0, '6.915')] [2023-02-26 13:34:49,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 2142208. Throughput: 0: 201.9. Samples: 537194. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:34:49,317][00203] Avg episode reward: [(0, '7.146')] [2023-02-26 13:34:52,971][13314] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000524_2146304.pth... [2023-02-26 13:34:53,081][13314] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000478_1957888.pth [2023-02-26 13:34:54,314][00203] Fps is (10 sec: 818.9, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 2146304. Throughput: 0: 188.2. Samples: 538168. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:34:54,325][00203] Avg episode reward: [(0, '7.306')] [2023-02-26 13:34:59,311][00203] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 777.6). Total num frames: 2146304. Throughput: 0: 194.1. Samples: 539218. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:34:59,323][00203] Avg episode reward: [(0, '7.273')] [2023-02-26 13:35:04,311][00203] Fps is (10 sec: 819.4, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 2154496. Throughput: 0: 198.5. Samples: 539990. 
Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:35:04,315][00203] Avg episode reward: [(0, '7.179')] [2023-02-26 13:35:09,311][00203] Fps is (10 sec: 1228.8, 60 sec: 819.3, 300 sec: 791.4). Total num frames: 2158592. Throughput: 0: 203.2. Samples: 541268. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:35:09,317][00203] Avg episode reward: [(0, '7.046')] [2023-02-26 13:35:14,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 2162688. Throughput: 0: 191.9. Samples: 542294. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:35:14,314][00203] Avg episode reward: [(0, '6.982')] [2023-02-26 13:35:19,311][00203] Fps is (10 sec: 409.6, 60 sec: 751.0, 300 sec: 777.5). Total num frames: 2162688. Throughput: 0: 193.7. Samples: 542802. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:35:19,320][00203] Avg episode reward: [(0, '7.023')] [2023-02-26 13:35:24,311][00203] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 2166784. Throughput: 0: 202.6. Samples: 544240. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 13:35:24,317][00203] Avg episode reward: [(0, '6.892')] [2023-02-26 13:35:24,418][13333] Updated weights for policy 0, policy_version 530 (0.0072) [2023-02-26 13:35:29,311][00203] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 791.5). Total num frames: 2174976. Throughput: 0: 199.2. Samples: 545348. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:35:29,322][00203] Avg episode reward: [(0, '7.135')] [2023-02-26 13:35:34,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 2174976. Throughput: 0: 199.2. Samples: 546160. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:35:34,318][00203] Avg episode reward: [(0, '6.984')] [2023-02-26 13:35:39,311][00203] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 2179072. Throughput: 0: 198.5. Samples: 547100. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:35:39,315][00203] Avg episode reward: [(0, '6.885')] [2023-02-26 13:35:44,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 2183168. Throughput: 0: 204.9. Samples: 548440. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:35:44,316][00203] Avg episode reward: [(0, '6.721')] [2023-02-26 13:35:49,311][00203] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 2191360. Throughput: 0: 204.8. Samples: 549208. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2023-02-26 13:35:49,322][00203] Avg episode reward: [(0, '6.744')] [2023-02-26 13:35:54,311][00203] Fps is (10 sec: 819.2, 60 sec: 751.0, 300 sec: 777.5). Total num frames: 2191360. Throughput: 0: 204.4. Samples: 550468. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2023-02-26 13:35:54,320][00203] Avg episode reward: [(0, '6.691')] [2023-02-26 13:35:59,311][00203] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 777.5). Total num frames: 2195456. Throughput: 0: 198.8. Samples: 551238. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2023-02-26 13:35:59,315][00203] Avg episode reward: [(0, '6.701')] [2023-02-26 13:36:04,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 2199552. Throughput: 0: 197.3. Samples: 551682. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 13:36:04,322][00203] Avg episode reward: [(0, '6.956')] [2023-02-26 13:36:09,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 2203648. 
Throughput: 0: 204.6. Samples: 553446. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 13:36:09,316][00203] Avg episode reward: [(0, '7.126')] [2023-02-26 13:36:14,314][00203] Fps is (10 sec: 818.9, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 2207744. Throughput: 0: 201.5. Samples: 554418. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 13:36:14,324][00203] Avg episode reward: [(0, '7.181')] [2023-02-26 13:36:15,359][13333] Updated weights for policy 0, policy_version 540 (0.0522) [2023-02-26 13:36:19,312][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 2211840. Throughput: 0: 192.7. Samples: 554832. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 13:36:19,322][00203] Avg episode reward: [(0, '7.187')] [2023-02-26 13:36:24,311][00203] Fps is (10 sec: 819.5, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 2215936. Throughput: 0: 196.1. Samples: 555926. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 13:36:24,316][00203] Avg episode reward: [(0, '7.241')] [2023-02-26 13:36:29,311][00203] Fps is (10 sec: 819.3, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 2220032. Throughput: 0: 203.9. Samples: 557614. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 13:36:29,322][00203] Avg episode reward: [(0, '7.190')] [2023-02-26 13:36:34,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 2224128. Throughput: 0: 201.7. Samples: 558284. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2023-02-26 13:36:34,315][00203] Avg episode reward: [(0, '7.040')] [2023-02-26 13:36:39,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 2228224. Throughput: 0: 192.9. Samples: 559150. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2023-02-26 13:36:39,318][00203] Avg episode reward: [(0, '7.236')] [2023-02-26 13:36:44,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 2232320. Throughput: 0: 201.6. Samples: 560312. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:36:44,315][00203] Avg episode reward: [(0, '7.324')] [2023-02-26 13:36:49,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 2236416. Throughput: 0: 204.7. Samples: 560894. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:36:49,318][00203] Avg episode reward: [(0, '7.151')] [2023-02-26 13:36:54,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 2240512. Throughput: 0: 200.6. Samples: 562474. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 13:36:54,315][00203] Avg episode reward: [(0, '7.075')] [2023-02-26 13:36:56,160][13314] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000548_2244608.pth... [2023-02-26 13:36:56,241][13314] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000501_2052096.pth [2023-02-26 13:36:59,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.6). Total num frames: 2244608. Throughput: 0: 197.0. Samples: 563282. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 13:36:59,316][00203] Avg episode reward: [(0, '6.945')] [2023-02-26 13:37:04,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 2248704. Throughput: 0: 198.7. Samples: 563774. 
Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 13:37:04,313][00203] Avg episode reward: [(0, '7.088')] [2023-02-26 13:37:06,499][13333] Updated weights for policy 0, policy_version 550 (0.0055) [2023-02-26 13:37:08,910][13314] Signal inference workers to stop experience collection... (550 times) [2023-02-26 13:37:08,948][13333] InferenceWorker_p0-w0: stopping experience collection (550 times) [2023-02-26 13:37:09,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 2252800. Throughput: 0: 209.1. Samples: 565334. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 13:37:09,323][00203] Avg episode reward: [(0, '7.441')] [2023-02-26 13:37:10,548][13314] Signal inference workers to resume experience collection... (550 times) [2023-02-26 13:37:10,551][13333] InferenceWorker_p0-w0: resuming experience collection (550 times) [2023-02-26 13:37:14,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 2256896. Throughput: 0: 204.4. Samples: 566810. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 13:37:14,318][00203] Avg episode reward: [(0, '7.452')] [2023-02-26 13:37:19,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 2260992. Throughput: 0: 193.6. Samples: 566998. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 13:37:19,323][00203] Avg episode reward: [(0, '7.425')] [2023-02-26 13:37:24,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 2265088. Throughput: 0: 193.4. Samples: 567852. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:37:24,320][00203] Avg episode reward: [(0, '7.363')] [2023-02-26 13:37:29,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 2269184. Throughput: 0: 203.2. Samples: 569454. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:37:29,315][00203] Avg episode reward: [(0, '7.399')] [2023-02-26 13:37:34,320][00203] Fps is (10 sec: 818.5, 60 sec: 819.1, 300 sec: 791.4). Total num frames: 2273280. Throughput: 0: 203.9. Samples: 570070. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:37:34,324][00203] Avg episode reward: [(0, '7.759')] [2023-02-26 13:37:39,312][00203] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 2277376. Throughput: 0: 188.7. Samples: 570966. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:37:39,323][00203] Avg episode reward: [(0, '8.120')] [2023-02-26 13:37:43,844][13314] Saving new best policy, reward=8.120! [2023-02-26 13:37:44,311][00203] Fps is (10 sec: 819.9, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 2281472. Throughput: 0: 193.3. Samples: 571982. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:37:44,318][00203] Avg episode reward: [(0, '8.009')] [2023-02-26 13:37:49,311][00203] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 2285568. Throughput: 0: 204.3. Samples: 572968. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:37:49,316][00203] Avg episode reward: [(0, '7.718')] [2023-02-26 13:37:54,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 2289664. Throughput: 0: 195.0. Samples: 574108. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:37:54,314][00203] Avg episode reward: [(0, '7.708')] [2023-02-26 13:37:58,543][13333] Updated weights for policy 0, policy_version 560 (0.0774) [2023-02-26 13:37:59,315][00203] Fps is (10 sec: 818.9, 60 sec: 819.1, 300 sec: 791.4). 
Total num frames: 2293760. Throughput: 0: 184.0. Samples: 575092. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:37:59,326][00203] Avg episode reward: [(0, '7.964')] [2023-02-26 13:38:04,311][00203] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 2293760. Throughput: 0: 194.2. Samples: 575736. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:38:04,319][00203] Avg episode reward: [(0, '8.127')] [2023-02-26 13:38:08,621][13314] Saving new best policy, reward=8.127! [2023-02-26 13:38:09,311][00203] Fps is (10 sec: 819.5, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2301952. Throughput: 0: 204.5. Samples: 577054. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:38:09,317][00203] Avg episode reward: [(0, '8.122')] [2023-02-26 13:38:14,311][00203] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2306048. Throughput: 0: 199.5. Samples: 578430. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:38:14,321][00203] Avg episode reward: [(0, '7.980')] [2023-02-26 13:38:19,314][00203] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 2310144. Throughput: 0: 200.2. Samples: 579080. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:38:19,320][00203] Avg episode reward: [(0, '7.653')] [2023-02-26 13:38:24,315][00203] Fps is (10 sec: 409.4, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 2310144. Throughput: 0: 200.9. Samples: 580006. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:38:24,318][00203] Avg episode reward: [(0, '7.705')] [2023-02-26 13:38:29,311][00203] Fps is (10 sec: 819.5, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2318336. Throughput: 0: 206.3. Samples: 581266. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:38:29,313][00203] Avg episode reward: [(0, '8.060')] [2023-02-26 13:38:34,311][00203] Fps is (10 sec: 1229.3, 60 sec: 819.3, 300 sec: 805.3). Total num frames: 2322432. Throughput: 0: 203.7. Samples: 582134. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:38:34,316][00203] Avg episode reward: [(0, '7.751')] [2023-02-26 13:38:39,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2326528. Throughput: 0: 202.3. Samples: 583210. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:38:39,316][00203] Avg episode reward: [(0, '7.649')] [2023-02-26 13:38:44,313][00203] Fps is (10 sec: 409.5, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 2326528. Throughput: 0: 202.9. Samples: 584222. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:38:44,318][00203] Avg episode reward: [(0, '7.451')] [2023-02-26 13:38:48,454][13333] Updated weights for policy 0, policy_version 570 (0.0060) [2023-02-26 13:38:49,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2334720. Throughput: 0: 207.2. Samples: 585062. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:38:49,319][00203] Avg episode reward: [(0, '7.520')] [2023-02-26 13:38:52,574][13314] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000571_2338816.pth... [2023-02-26 13:38:52,677][13314] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000524_2146304.pth [2023-02-26 13:38:54,311][00203] Fps is (10 sec: 1229.1, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2338816. Throughput: 0: 204.9. Samples: 586276. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:38:54,320][00203] Avg episode reward: [(0, '7.845')] [2023-02-26 13:38:59,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.3, 300 sec: 805.3). Total num frames: 2342912. Throughput: 0: 203.0. Samples: 587564. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:38:59,315][00203] Avg episode reward: [(0, '7.898')] [2023-02-26 13:39:04,311][00203] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 805.3). Total num frames: 2347008. Throughput: 0: 203.3. Samples: 588226. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:39:04,316][00203] Avg episode reward: [(0, '8.057')] [2023-02-26 13:39:09,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2351104. Throughput: 0: 206.7. Samples: 589308. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:39:09,317][00203] Avg episode reward: [(0, '7.967')] [2023-02-26 13:39:14,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2355200. Throughput: 0: 209.8. Samples: 590706. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:39:14,315][00203] Avg episode reward: [(0, '7.968')] [2023-02-26 13:39:19,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2359296. Throughput: 0: 204.4. Samples: 591334. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:39:19,318][00203] Avg episode reward: [(0, '8.102')] [2023-02-26 13:39:24,311][00203] Fps is (10 sec: 409.6, 60 sec: 819.3, 300 sec: 791.4). Total num frames: 2359296. Throughput: 0: 201.6. Samples: 592284. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:39:24,315][00203] Avg episode reward: [(0, '7.939')] [2023-02-26 13:39:29,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2367488. Throughput: 0: 203.7. Samples: 593388. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:39:29,316][00203] Avg episode reward: [(0, '7.958')] [2023-02-26 13:39:34,311][00203] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2371584. Throughput: 0: 207.9. Samples: 594418. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:39:34,319][00203] Avg episode reward: [(0, '8.017')] [2023-02-26 13:39:38,671][13333] Updated weights for policy 0, policy_version 580 (0.1297) [2023-02-26 13:39:39,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2375680. Throughput: 0: 205.1. Samples: 595504. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:39:39,314][00203] Avg episode reward: [(0, '8.097')] [2023-02-26 13:39:44,311][00203] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 2375680. Throughput: 0: 197.6. Samples: 596454. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:39:44,316][00203] Avg episode reward: [(0, '8.208')] [2023-02-26 13:39:49,311][00203] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 2379776. Throughput: 0: 193.4. Samples: 596930. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:39:49,321][00203] Avg episode reward: [(0, '8.318')] [2023-02-26 13:39:49,601][13314] Saving new best policy, reward=8.208! [2023-02-26 13:39:53,942][13314] Saving new best policy, reward=8.318! [2023-02-26 13:39:54,311][00203] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2387968. Throughput: 0: 205.0. Samples: 598532. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:39:54,319][00203] Avg episode reward: [(0, '8.139')] [2023-02-26 13:39:59,311][00203] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2392064. Throughput: 0: 196.6. Samples: 599554. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:39:59,313][00203] Avg episode reward: [(0, '7.941')] [2023-02-26 13:40:04,311][00203] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 2392064. Throughput: 0: 193.9. Samples: 600058. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:40:04,315][00203] Avg episode reward: [(0, '7.882')] [2023-02-26 13:40:09,311][00203] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 2396160. Throughput: 0: 200.4. Samples: 601302. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:40:09,319][00203] Avg episode reward: [(0, '7.918')] [2023-02-26 13:40:14,311][00203] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2404352. Throughput: 0: 203.6. Samples: 602548. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:40:14,322][00203] Avg episode reward: [(0, '7.900')] [2023-02-26 13:40:19,311][00203] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2408448. Throughput: 0: 199.1. Samples: 603378. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:40:19,319][00203] Avg episode reward: [(0, '8.112')] [2023-02-26 13:40:24,311][00203] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 2408448. Throughput: 0: 196.0. Samples: 604324. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:40:24,315][00203] Avg episode reward: [(0, '8.158')] [2023-02-26 13:40:29,311][00203] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 2412544. Throughput: 0: 204.4. Samples: 605654. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2023-02-26 13:40:29,316][00203] Avg episode reward: [(0, '8.024')] [2023-02-26 13:40:30,353][13333] Updated weights for policy 0, policy_version 590 (0.0525) [2023-02-26 13:40:34,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 2416640. Throughput: 0: 205.0. Samples: 606154. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2023-02-26 13:40:34,320][00203] Avg episode reward: [(0, '7.968')] [2023-02-26 13:40:39,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 2420736. Throughput: 0: 203.3. Samples: 607680. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:40:39,316][00203] Avg episode reward: [(0, '7.752')] [2023-02-26 13:40:44,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 2424832. Throughput: 0: 199.4. Samples: 608526. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:40:44,319][00203] Avg episode reward: [(0, '7.737')] [2023-02-26 13:40:49,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2428928. Throughput: 0: 195.5. Samples: 608854. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2023-02-26 13:40:49,315][00203] Avg episode reward: [(0, '7.813')] [2023-02-26 13:40:54,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 2433024. Throughput: 0: 205.3. Samples: 610540. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2023-02-26 13:40:54,317][00203] Avg episode reward: [(0, '7.732')] [2023-02-26 13:40:54,974][13314] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000595_2437120.pth... 
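The checkpoint housekeeping visible here and at the earlier saves follows a simple rotation: each periodic save of checkpoint_<policy_version>_<env_frames>.pth is paired with removal of the oldest remaining checkpoint, while the "Saving new best policy, reward=..." events write a separate best snapshot whenever the average episode reward exceeds the previous best. The sketch below reproduces that bookkeeping; it is not Sample Factory's actual code, and the keep-two-files limit, the file layout, and the stand-in for torch.save are assumptions read off this log.

import os
from collections import deque

class CheckpointKeeper:
    """Sketch of the save/remove and 'new best policy' pattern seen in this log.

    Not the actual Sample Factory logic; keep_last=2 and the naming scheme are
    inferred from the log, where every new periodic save is followed by removal
    of the oldest remaining checkpoint file.
    """

    def __init__(self, checkpoint_dir, keep_last=2):
        self.checkpoint_dir = checkpoint_dir
        self.keep_last = keep_last
        self.recent = deque()               # periodic checkpoint paths, oldest first
        self.best_reward = float("-inf")
        os.makedirs(checkpoint_dir, exist_ok=True)

    def _write(self, path, state_bytes):
        # Stand-in for torch.save(model.state_dict(), path).
        with open(path, "wb") as f:
            f.write(state_bytes)

    def save_periodic(self, policy_version, env_frames, state_bytes):
        name = f"checkpoint_{policy_version:09d}_{env_frames}.pth"
        path = os.path.join(self.checkpoint_dir, name)
        print(f"Saving {path}...")
        self._write(path, state_bytes)
        self.recent.append(path)
        while len(self.recent) > self.keep_last:
            old = self.recent.popleft()
            print(f"Removing {old}")
            os.remove(old)

    def maybe_save_best(self, avg_reward, state_bytes):
        if avg_reward > self.best_reward:
            self.best_reward = avg_reward
            print(f"Saving new best policy, reward={avg_reward:.3f}!")
            self._write(os.path.join(self.checkpoint_dir, "best.pth"), state_bytes)

With keep_last=2, saving versions 548, 571 and 595 in turn leaves only the last two periodic files on disk, matching the Removing lines that accompany each save in this section.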
[2023-02-26 13:40:55,090][13314] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000548_2244608.pth [2023-02-26 13:40:59,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 2437120. Throughput: 0: 206.5. Samples: 611842. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:40:59,314][00203] Avg episode reward: [(0, '7.901')] [2023-02-26 13:41:04,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2441216. Throughput: 0: 194.4. Samples: 612128. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:41:04,323][00203] Avg episode reward: [(0, '8.216')] [2023-02-26 13:41:09,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2445312. Throughput: 0: 195.7. Samples: 613130. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2023-02-26 13:41:09,314][00203] Avg episode reward: [(0, '8.124')] [2023-02-26 13:41:14,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 2449408. Throughput: 0: 206.1. Samples: 614928. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2023-02-26 13:41:14,325][00203] Avg episode reward: [(0, '8.048')] [2023-02-26 13:41:19,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 2453504. Throughput: 0: 209.1. Samples: 615562. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:41:19,316][00203] Avg episode reward: [(0, '8.168')] [2023-02-26 13:41:19,746][13333] Updated weights for policy 0, policy_version 600 (0.0514) [2023-02-26 13:41:24,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2457600. Throughput: 0: 196.7. Samples: 616530. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:41:24,317][00203] Avg episode reward: [(0, '8.130')] [2023-02-26 13:41:24,843][13314] Signal inference workers to stop experience collection... (600 times) [2023-02-26 13:41:24,930][13333] InferenceWorker_p0-w0: stopping experience collection (600 times) [2023-02-26 13:41:26,379][13314] Signal inference workers to resume experience collection... (600 times) [2023-02-26 13:41:26,381][13333] InferenceWorker_p0-w0: resuming experience collection (600 times) [2023-02-26 13:41:29,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2461696. Throughput: 0: 201.5. Samples: 617594. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:41:29,318][00203] Avg episode reward: [(0, '7.996')] [2023-02-26 13:41:34,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2465792. Throughput: 0: 211.1. Samples: 618354. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:41:34,315][00203] Avg episode reward: [(0, '8.070')] [2023-02-26 13:41:39,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2469888. Throughput: 0: 209.5. Samples: 619966. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:41:39,321][00203] Avg episode reward: [(0, '8.270')] [2023-02-26 13:41:44,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2473984. Throughput: 0: 197.9. Samples: 620748. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:41:44,318][00203] Avg episode reward: [(0, '8.451')] [2023-02-26 13:41:46,962][13314] Saving new best policy, reward=8.451! [2023-02-26 13:41:49,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2478080. Throughput: 0: 200.6. Samples: 621154. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2023-02-26 13:41:49,321][00203] Avg episode reward: [(0, '8.809')] [2023-02-26 13:41:51,069][13314] Saving new best policy, reward=8.809! [2023-02-26 13:41:54,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2482176. Throughput: 0: 213.5. Samples: 622738. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2023-02-26 13:41:54,321][00203] Avg episode reward: [(0, '8.838')] [2023-02-26 13:41:59,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2486272. Throughput: 0: 206.8. Samples: 624232. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 13:41:59,314][00203] Avg episode reward: [(0, '8.840')] [2023-02-26 13:42:00,460][13314] Saving new best policy, reward=8.838! [2023-02-26 13:42:04,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2490368. Throughput: 0: 199.0. Samples: 624516. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 13:42:04,317][00203] Avg episode reward: [(0, '8.748')] [2023-02-26 13:42:06,801][13314] Saving new best policy, reward=8.840! [2023-02-26 13:42:09,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2494464. Throughput: 0: 197.8. Samples: 625430. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:42:09,317][00203] Avg episode reward: [(0, '8.607')] [2023-02-26 13:42:11,282][13333] Updated weights for policy 0, policy_version 610 (0.0681) [2023-02-26 13:42:14,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2498560. Throughput: 0: 214.7. Samples: 627256. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:42:14,316][00203] Avg episode reward: [(0, '8.486')] [2023-02-26 13:42:19,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2502656. Throughput: 0: 209.0. Samples: 627760. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 13:42:19,318][00203] Avg episode reward: [(0, '8.460')] [2023-02-26 13:42:24,315][00203] Fps is (10 sec: 818.9, 60 sec: 819.1, 300 sec: 805.3). Total num frames: 2506752. Throughput: 0: 195.1. Samples: 628746. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 13:42:24,320][00203] Avg episode reward: [(0, '8.253')] [2023-02-26 13:42:29,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2510848. Throughput: 0: 199.6. Samples: 629730. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:42:29,322][00203] Avg episode reward: [(0, '8.145')] [2023-02-26 13:42:34,311][00203] Fps is (10 sec: 819.5, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2514944. Throughput: 0: 204.7. Samples: 630364. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:42:34,322][00203] Avg episode reward: [(0, '8.085')] [2023-02-26 13:42:39,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2519040. Throughput: 0: 205.3. Samples: 631978. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 13:42:39,316][00203] Avg episode reward: [(0, '8.122')] [2023-02-26 13:42:44,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2523136. Throughput: 0: 192.2. Samples: 632880. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 13:42:44,316][00203] Avg episode reward: [(0, '8.192')] [2023-02-26 13:42:49,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2527232. Throughput: 0: 197.3. Samples: 633394. 
Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 13:42:49,315][00203] Avg episode reward: [(0, '8.106')] [2023-02-26 13:42:52,005][13314] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000618_2531328.pth... [2023-02-26 13:42:52,116][13314] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000571_2338816.pth [2023-02-26 13:42:54,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2531328. Throughput: 0: 206.1. Samples: 634706. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 13:42:54,315][00203] Avg episode reward: [(0, '8.099')] [2023-02-26 13:42:59,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2535424. Throughput: 0: 195.2. Samples: 636038. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 13:42:59,318][00203] Avg episode reward: [(0, '7.871')] [2023-02-26 13:43:02,181][13333] Updated weights for policy 0, policy_version 620 (0.0612) [2023-02-26 13:43:04,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2539520. Throughput: 0: 195.0. Samples: 636534. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 13:43:04,321][00203] Avg episode reward: [(0, '8.534')] [2023-02-26 13:43:09,315][00203] Fps is (10 sec: 818.8, 60 sec: 819.1, 300 sec: 805.3). Total num frames: 2543616. Throughput: 0: 195.5. Samples: 637542. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 13:43:09,324][00203] Avg episode reward: [(0, '8.477')] [2023-02-26 13:43:14,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2547712. Throughput: 0: 201.2. Samples: 638782. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 13:43:14,318][00203] Avg episode reward: [(0, '8.471')] [2023-02-26 13:43:19,311][00203] Fps is (10 sec: 819.6, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2551808. Throughput: 0: 204.8. Samples: 639578. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 13:43:19,319][00203] Avg episode reward: [(0, '8.116')] [2023-02-26 13:43:24,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.3, 300 sec: 805.3). Total num frames: 2555904. Throughput: 0: 192.1. Samples: 640622. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 13:43:24,324][00203] Avg episode reward: [(0, '8.330')] [2023-02-26 13:43:29,311][00203] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 2555904. Throughput: 0: 196.0. Samples: 641702. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 13:43:29,319][00203] Avg episode reward: [(0, '8.320')] [2023-02-26 13:43:34,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2564096. Throughput: 0: 199.3. Samples: 642362. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 13:43:34,324][00203] Avg episode reward: [(0, '8.457')] [2023-02-26 13:43:39,314][00203] Fps is (10 sec: 1228.4, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2568192. Throughput: 0: 200.5. Samples: 643730. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2023-02-26 13:43:39,321][00203] Avg episode reward: [(0, '8.578')] [2023-02-26 13:43:44,311][00203] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 2568192. Throughput: 0: 192.2. Samples: 644688. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2023-02-26 13:43:44,314][00203] Avg episode reward: [(0, '8.724')] [2023-02-26 13:43:49,314][00203] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 2572288. Throughput: 0: 191.6. 
Samples: 645158. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2023-02-26 13:43:49,320][00203] Avg episode reward: [(0, '8.784')] [2023-02-26 13:43:54,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 2576384. Throughput: 0: 195.8. Samples: 646354. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 13:43:54,323][00203] Avg episode reward: [(0, '8.716')] [2023-02-26 13:43:55,209][13333] Updated weights for policy 0, policy_version 630 (0.1573) [2023-02-26 13:43:59,311][00203] Fps is (10 sec: 819.4, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 2580480. Throughput: 0: 201.7. Samples: 647860. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 13:43:59,322][00203] Avg episode reward: [(0, '9.252')] [2023-02-26 13:44:04,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 2584576. Throughput: 0: 197.9. Samples: 648484. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 13:44:04,314][00203] Avg episode reward: [(0, '9.596')] [2023-02-26 13:44:05,525][13314] Saving new best policy, reward=9.252! [2023-02-26 13:44:09,311][00203] Fps is (10 sec: 819.2, 60 sec: 751.0, 300 sec: 791.4). Total num frames: 2588672. Throughput: 0: 192.8. Samples: 649298. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 13:44:09,319][00203] Avg episode reward: [(0, '9.470')] [2023-02-26 13:44:11,610][13314] Saving new best policy, reward=9.596! [2023-02-26 13:44:14,315][00203] Fps is (10 sec: 818.8, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 2592768. Throughput: 0: 199.0. Samples: 650658. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2023-02-26 13:44:14,320][00203] Avg episode reward: [(0, '9.480')] [2023-02-26 13:44:19,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 2596864. Throughput: 0: 195.1. Samples: 651142. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2023-02-26 13:44:19,322][00203] Avg episode reward: [(0, '9.126')] [2023-02-26 13:44:24,313][00203] Fps is (10 sec: 819.4, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 2600960. Throughput: 0: 194.8. Samples: 652496. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 13:44:24,316][00203] Avg episode reward: [(0, '8.989')] [2023-02-26 13:44:29,312][00203] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 2605056. Throughput: 0: 193.4. Samples: 653392. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 13:44:29,325][00203] Avg episode reward: [(0, '9.281')] [2023-02-26 13:44:34,311][00203] Fps is (10 sec: 819.4, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 2609152. Throughput: 0: 195.4. Samples: 653952. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:44:34,318][00203] Avg episode reward: [(0, '9.357')] [2023-02-26 13:44:39,311][00203] Fps is (10 sec: 819.3, 60 sec: 751.0, 300 sec: 805.3). Total num frames: 2613248. Throughput: 0: 197.8. Samples: 655256. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:44:39,322][00203] Avg episode reward: [(0, '9.286')] [2023-02-26 13:44:44,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2617344. Throughput: 0: 194.4. Samples: 656606. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:44:44,317][00203] Avg episode reward: [(0, '9.411')] [2023-02-26 13:44:47,589][13333] Updated weights for policy 0, policy_version 640 (0.0079) [2023-02-26 13:44:49,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 2621440. Throughput: 0: 188.9. 
Samples: 656984. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:44:49,317][00203] Avg episode reward: [(0, '9.495')] [2023-02-26 13:44:53,913][13314] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000641_2625536.pth... [2023-02-26 13:44:54,031][13314] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000595_2437120.pth [2023-02-26 13:44:54,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 2625536. Throughput: 0: 194.1. Samples: 658034. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:44:54,321][00203] Avg episode reward: [(0, '9.633')] [2023-02-26 13:44:58,278][13314] Saving new best policy, reward=9.633! [2023-02-26 13:44:59,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2629632. Throughput: 0: 191.0. Samples: 659250. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:44:59,324][00203] Avg episode reward: [(0, '9.595')] [2023-02-26 13:45:04,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2633728. Throughput: 0: 199.0. Samples: 660098. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0) [2023-02-26 13:45:04,317][00203] Avg episode reward: [(0, '9.562')] [2023-02-26 13:45:09,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 2637824. Throughput: 0: 191.7. Samples: 661124. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0) [2023-02-26 13:45:09,324][00203] Avg episode reward: [(0, '9.524')] [2023-02-26 13:45:14,311][00203] Fps is (10 sec: 409.6, 60 sec: 751.0, 300 sec: 777.5). Total num frames: 2637824. Throughput: 0: 193.9. Samples: 662118. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0) [2023-02-26 13:45:14,321][00203] Avg episode reward: [(0, '9.369')] [2023-02-26 13:45:19,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2646016. Throughput: 0: 195.3. Samples: 662740. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:45:19,323][00203] Avg episode reward: [(0, '9.435')] [2023-02-26 13:45:24,312][00203] Fps is (10 sec: 1228.7, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2650112. Throughput: 0: 198.8. Samples: 664202. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 13:45:24,327][00203] Avg episode reward: [(0, '9.657')] [2023-02-26 13:45:29,311][00203] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 2650112. Throughput: 0: 189.5. Samples: 665134. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 13:45:29,316][00203] Avg episode reward: [(0, '9.460')] [2023-02-26 13:45:29,395][13314] Saving new best policy, reward=9.657! [2023-02-26 13:45:34,311][00203] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 2654208. Throughput: 0: 191.1. Samples: 665584. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 13:45:34,320][00203] Avg episode reward: [(0, '9.650')] [2023-02-26 13:45:39,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 2658304. Throughput: 0: 195.5. Samples: 666830. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 13:45:39,315][00203] Avg episode reward: [(0, '9.684')] [2023-02-26 13:45:39,884][13333] Updated weights for policy 0, policy_version 650 (0.0600) [2023-02-26 13:45:42,165][13314] Signal inference workers to stop experience collection... 
(650 times) [2023-02-26 13:45:42,207][13333] InferenceWorker_p0-w0: stopping experience collection (650 times) [2023-02-26 13:45:43,640][13314] Signal inference workers to resume experience collection... (650 times) [2023-02-26 13:45:43,641][13333] InferenceWorker_p0-w0: resuming experience collection (650 times) [2023-02-26 13:45:43,653][13314] Saving new best policy, reward=9.684! [2023-02-26 13:45:44,311][00203] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2666496. Throughput: 0: 197.5. Samples: 668136. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:45:44,320][00203] Avg episode reward: [(0, '9.751')] [2023-02-26 13:45:49,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 2666496. Throughput: 0: 196.8. Samples: 668954. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:45:49,314][00203] Avg episode reward: [(0, '9.620')] [2023-02-26 13:45:49,800][13314] Saving new best policy, reward=9.751! [2023-02-26 13:45:54,311][00203] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 2670592. Throughput: 0: 191.9. Samples: 669758. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:45:54,313][00203] Avg episode reward: [(0, '9.708')] [2023-02-26 13:45:59,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 2674688. Throughput: 0: 199.6. Samples: 671100. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:45:59,316][00203] Avg episode reward: [(0, '9.807')] [2023-02-26 13:46:04,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 2678784. Throughput: 0: 196.8. Samples: 671598. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:46:04,318][00203] Avg episode reward: [(0, '9.815')] [2023-02-26 13:46:04,531][13314] Saving new best policy, reward=9.807! [2023-02-26 13:46:09,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 2682880. Throughput: 0: 195.5. Samples: 673000. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:46:09,313][00203] Avg episode reward: [(0, '9.589')] [2023-02-26 13:46:10,481][13314] Saving new best policy, reward=9.815! [2023-02-26 13:46:14,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 2686976. Throughput: 0: 195.3. Samples: 673922. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:46:14,322][00203] Avg episode reward: [(0, '9.874')] [2023-02-26 13:46:16,779][13314] Saving new best policy, reward=9.874! [2023-02-26 13:46:19,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 2691072. Throughput: 0: 192.5. Samples: 674248. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2023-02-26 13:46:19,315][00203] Avg episode reward: [(0, '10.061')] [2023-02-26 13:46:24,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 2695168. Throughput: 0: 204.7. Samples: 676042. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2023-02-26 13:46:24,315][00203] Avg episode reward: [(0, '9.867')] [2023-02-26 13:46:24,903][13314] Saving new best policy, reward=10.061! [2023-02-26 13:46:29,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 2699264. Throughput: 0: 204.0. Samples: 677316. 
Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2023-02-26 13:46:29,314][00203] Avg episode reward: [(0, '9.848')] [2023-02-26 13:46:30,650][13333] Updated weights for policy 0, policy_version 660 (0.0079) [2023-02-26 13:46:34,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 2703360. Throughput: 0: 190.6. Samples: 677530. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2023-02-26 13:46:34,314][00203] Avg episode reward: [(0, '9.867')] [2023-02-26 13:46:39,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 2707456. Throughput: 0: 193.2. Samples: 678454. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:46:39,315][00203] Avg episode reward: [(0, '9.402')] [2023-02-26 13:46:44,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 2711552. Throughput: 0: 200.0. Samples: 680098. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:46:44,324][00203] Avg episode reward: [(0, '9.467')] [2023-02-26 13:46:49,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 2715648. Throughput: 0: 201.2. Samples: 680650. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:46:49,318][00203] Avg episode reward: [(0, '9.562')] [2023-02-26 13:46:51,775][13314] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000664_2719744.pth... [2023-02-26 13:46:51,920][13314] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000618_2531328.pth [2023-02-26 13:46:54,333][00203] Fps is (10 sec: 817.4, 60 sec: 818.9, 300 sec: 791.4). Total num frames: 2719744. Throughput: 0: 191.0. Samples: 681600. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:46:54,339][00203] Avg episode reward: [(0, '9.689')] [2023-02-26 13:46:59,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 2723840. Throughput: 0: 195.3. Samples: 682712. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:46:59,315][00203] Avg episode reward: [(0, '9.711')] [2023-02-26 13:47:04,311][00203] Fps is (10 sec: 821.0, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 2727936. Throughput: 0: 208.3. Samples: 683620. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:47:04,316][00203] Avg episode reward: [(0, '9.787')] [2023-02-26 13:47:09,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 2732032. Throughput: 0: 194.7. Samples: 684804. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:47:09,315][00203] Avg episode reward: [(0, '9.813')] [2023-02-26 13:47:14,315][00203] Fps is (10 sec: 818.9, 60 sec: 819.1, 300 sec: 791.4). Total num frames: 2736128. Throughput: 0: 191.2. Samples: 685922. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:47:14,318][00203] Avg episode reward: [(0, '10.055')] [2023-02-26 13:47:19,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 2740224. Throughput: 0: 200.0. Samples: 686530. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:47:19,324][00203] Avg episode reward: [(0, '10.009')] [2023-02-26 13:47:22,757][13333] Updated weights for policy 0, policy_version 670 (0.0591) [2023-02-26 13:47:24,311][00203] Fps is (10 sec: 819.5, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 2744320. Throughput: 0: 202.8. Samples: 687578. 
Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:47:24,319][00203] Avg episode reward: [(0, '10.310')] [2023-02-26 13:47:27,087][13314] Saving new best policy, reward=10.310! [2023-02-26 13:47:29,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 2748416. Throughput: 0: 202.3. Samples: 689202. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:47:29,316][00203] Avg episode reward: [(0, '10.385')] [2023-02-26 13:47:32,695][13314] Saving new best policy, reward=10.385! [2023-02-26 13:47:34,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 2752512. Throughput: 0: 198.6. Samples: 689588. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:47:34,317][00203] Avg episode reward: [(0, '10.253')] [2023-02-26 13:47:39,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 2756608. Throughput: 0: 199.7. Samples: 690584. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:47:39,317][00203] Avg episode reward: [(0, '10.363')] [2023-02-26 13:47:44,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 2760704. Throughput: 0: 203.3. Samples: 691862. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:47:44,317][00203] Avg episode reward: [(0, '10.190')] [2023-02-26 13:47:49,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 2764800. Throughput: 0: 202.0. Samples: 692708. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:47:49,319][00203] Avg episode reward: [(0, '10.188')] [2023-02-26 13:47:54,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.5, 300 sec: 791.4). Total num frames: 2768896. Throughput: 0: 198.0. Samples: 693716. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:47:54,317][00203] Avg episode reward: [(0, '10.141')] [2023-02-26 13:47:59,311][00203] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 2768896. Throughput: 0: 195.9. Samples: 694736. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:47:59,315][00203] Avg episode reward: [(0, '10.551')] [2023-02-26 13:48:04,116][13314] Saving new best policy, reward=10.551! [2023-02-26 13:48:04,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 2777088. Throughput: 0: 198.8. Samples: 695476. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:48:04,323][00203] Avg episode reward: [(0, '10.615')] [2023-02-26 13:48:08,530][13314] Saving new best policy, reward=10.615! [2023-02-26 13:48:09,311][00203] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 2781184. Throughput: 0: 207.3. Samples: 696906. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:48:09,319][00203] Avg episode reward: [(0, '10.697')] [2023-02-26 13:48:14,312][00203] Fps is (10 sec: 409.6, 60 sec: 751.0, 300 sec: 777.5). Total num frames: 2781184. Throughput: 0: 193.1. Samples: 697890. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:48:14,314][00203] Avg episode reward: [(0, '10.869')] [2023-02-26 13:48:14,386][13314] Saving new best policy, reward=10.697! [2023-02-26 13:48:14,396][13333] Updated weights for policy 0, policy_version 680 (0.0104) [2023-02-26 13:48:19,311][00203] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 2785280. Throughput: 0: 194.0. Samples: 698318. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:48:19,320][00203] Avg episode reward: [(0, '10.819')] [2023-02-26 13:48:20,803][13314] Saving new best policy, reward=10.869! [2023-02-26 13:48:24,311][00203] Fps is (10 sec: 819.3, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 2789376. Throughput: 0: 201.2. Samples: 699640. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:48:24,318][00203] Avg episode reward: [(0, '10.852')] [2023-02-26 13:48:29,311][00203] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 2797568. Throughput: 0: 201.8. Samples: 700942. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:48:29,321][00203] Avg episode reward: [(0, '10.527')] [2023-02-26 13:48:34,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 777.6). Total num frames: 2797568. Throughput: 0: 201.3. Samples: 701768. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:48:34,321][00203] Avg episode reward: [(0, '10.328')] [2023-02-26 13:48:39,311][00203] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 2801664. Throughput: 0: 197.5. Samples: 702604. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:48:39,315][00203] Avg episode reward: [(0, '10.750')] [2023-02-26 13:48:44,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 2805760. Throughput: 0: 207.1. Samples: 704054. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:48:44,319][00203] Avg episode reward: [(0, '10.773')] [2023-02-26 13:48:49,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 2809856. Throughput: 0: 202.5. Samples: 704588. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:48:49,320][00203] Avg episode reward: [(0, '10.510')] [2023-02-26 13:48:54,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 2813952. Throughput: 0: 204.2. Samples: 706096. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:48:54,315][00203] Avg episode reward: [(0, '10.786')] [2023-02-26 13:48:55,071][13314] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000688_2818048.pth... [2023-02-26 13:48:55,165][13314] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000641_2625536.pth [2023-02-26 13:48:59,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 2818048. Throughput: 0: 201.4. Samples: 706954. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:48:59,321][00203] Avg episode reward: [(0, '10.749')] [2023-02-26 13:49:04,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 2822144. Throughput: 0: 199.6. Samples: 707298. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 13:49:04,321][00203] Avg episode reward: [(0, '11.036')] [2023-02-26 13:49:06,224][13333] Updated weights for policy 0, policy_version 690 (0.0592) [2023-02-26 13:49:09,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 2826240. Throughput: 0: 203.6. Samples: 708804. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 13:49:09,316][00203] Avg episode reward: [(0, '11.231')] [2023-02-26 13:49:09,988][13314] Saving new best policy, reward=11.036! [2023-02-26 13:49:14,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 2830336. Throughput: 0: 205.2. Samples: 710178. 
Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 13:49:14,314][00203] Avg episode reward: [(0, '10.920')] [2023-02-26 13:49:15,930][13314] Saving new best policy, reward=11.231! [2023-02-26 13:49:19,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 2834432. Throughput: 0: 191.2. Samples: 710372. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 13:49:19,315][00203] Avg episode reward: [(0, '10.898')] [2023-02-26 13:49:24,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 2838528. Throughput: 0: 192.9. Samples: 711286. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:49:24,315][00203] Avg episode reward: [(0, '10.895')] [2023-02-26 13:49:29,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 2842624. Throughput: 0: 198.6. Samples: 712990. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:49:29,317][00203] Avg episode reward: [(0, '10.448')] [2023-02-26 13:49:34,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 2846720. Throughput: 0: 198.2. Samples: 713508. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:49:34,316][00203] Avg episode reward: [(0, '10.205')] [2023-02-26 13:49:39,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 2850816. Throughput: 0: 184.4. Samples: 714394. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:49:39,319][00203] Avg episode reward: [(0, '10.205')] [2023-02-26 13:49:44,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 2854912. Throughput: 0: 189.1. Samples: 715462. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 13:49:44,316][00203] Avg episode reward: [(0, '10.141')] [2023-02-26 13:49:49,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 2859008. Throughput: 0: 201.5. Samples: 716364. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 13:49:49,321][00203] Avg episode reward: [(0, '10.239')] [2023-02-26 13:49:54,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 2863104. Throughput: 0: 196.1. Samples: 717630. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 13:49:54,316][00203] Avg episode reward: [(0, '10.340')] [2023-02-26 13:49:57,454][13333] Updated weights for policy 0, policy_version 700 (0.0675) [2023-02-26 13:49:59,314][00203] Fps is (10 sec: 818.9, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 2867200. Throughput: 0: 188.0. Samples: 718638. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 13:49:59,319][00203] Avg episode reward: [(0, '10.180')] [2023-02-26 13:50:02,381][13314] Signal inference workers to stop experience collection... (700 times) [2023-02-26 13:50:02,456][13333] InferenceWorker_p0-w0: stopping experience collection (700 times) [2023-02-26 13:50:03,896][13314] Signal inference workers to resume experience collection... (700 times) [2023-02-26 13:50:03,900][13333] InferenceWorker_p0-w0: resuming experience collection (700 times) [2023-02-26 13:50:04,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 2871296. Throughput: 0: 196.5. Samples: 719214. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:50:04,315][00203] Avg episode reward: [(0, '10.390')] [2023-02-26 13:50:09,311][00203] Fps is (10 sec: 819.5, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2875392. Throughput: 0: 202.9. Samples: 720416. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:50:09,319][00203] Avg episode reward: [(0, '10.222')] [2023-02-26 13:50:14,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 2879488. Throughput: 0: 198.8. Samples: 721934. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:50:14,316][00203] Avg episode reward: [(0, '10.108')] [2023-02-26 13:50:19,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 2883584. Throughput: 0: 199.0. Samples: 722464. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:50:19,316][00203] Avg episode reward: [(0, '10.076')] [2023-02-26 13:50:24,311][00203] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 2883584. Throughput: 0: 200.6. Samples: 723422. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:50:24,322][00203] Avg episode reward: [(0, '9.923')] [2023-02-26 13:50:29,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2891776. Throughput: 0: 204.0. Samples: 724644. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:50:29,321][00203] Avg episode reward: [(0, '10.010')] [2023-02-26 13:50:34,311][00203] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2895872. Throughput: 0: 204.4. Samples: 725562. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:50:34,315][00203] Avg episode reward: [(0, '10.015')] [2023-02-26 13:50:39,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 2899968. Throughput: 0: 197.6. Samples: 726520. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:50:39,321][00203] Avg episode reward: [(0, '10.259')] [2023-02-26 13:50:44,311][00203] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 2899968. Throughput: 0: 197.2. Samples: 727512. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:50:44,314][00203] Avg episode reward: [(0, '10.406')] [2023-02-26 13:50:49,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2908160. Throughput: 0: 199.8. Samples: 728206. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 13:50:49,315][00203] Avg episode reward: [(0, '10.768')] [2023-02-26 13:50:49,629][13333] Updated weights for policy 0, policy_version 710 (0.1994) [2023-02-26 13:50:53,167][13314] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000711_2912256.pth... [2023-02-26 13:50:53,291][13314] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000664_2719744.pth [2023-02-26 13:50:54,311][00203] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2912256. Throughput: 0: 203.1. Samples: 729554. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:50:54,319][00203] Avg episode reward: [(0, '11.401')] [2023-02-26 13:50:58,941][13314] Saving new best policy, reward=11.401! [2023-02-26 13:50:59,313][00203] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2916352. Throughput: 0: 191.7. Samples: 730560. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:50:59,317][00203] Avg episode reward: [(0, '11.807')] [2023-02-26 13:51:04,311][00203] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 2916352. Throughput: 0: 192.8. Samples: 731142. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:51:04,313][00203] Avg episode reward: [(0, '12.643')] [2023-02-26 13:51:05,414][13314] Saving new best policy, reward=11.807! [2023-02-26 13:51:09,311][00203] Fps is (10 sec: 409.7, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 2920448. Throughput: 0: 200.5. Samples: 732444. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:51:09,316][00203] Avg episode reward: [(0, '12.679')] [2023-02-26 13:51:09,889][13314] Saving new best policy, reward=12.643! [2023-02-26 13:51:14,228][13314] Saving new best policy, reward=12.679! [2023-02-26 13:51:14,311][00203] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2928640. Throughput: 0: 201.0. Samples: 733690. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2023-02-26 13:51:14,316][00203] Avg episode reward: [(0, '12.669')] [2023-02-26 13:51:19,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 2928640. Throughput: 0: 193.9. Samples: 734286. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2023-02-26 13:51:19,315][00203] Avg episode reward: [(0, '12.871')] [2023-02-26 13:51:24,311][00203] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 2932736. Throughput: 0: 188.3. Samples: 734994. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2023-02-26 13:51:24,316][00203] Avg episode reward: [(0, '13.661')] [2023-02-26 13:51:26,856][13314] Saving new best policy, reward=12.871! [2023-02-26 13:51:27,013][13314] Saving new best policy, reward=13.661! [2023-02-26 13:51:29,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 2936832. Throughput: 0: 195.5. Samples: 736310. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2023-02-26 13:51:29,318][00203] Avg episode reward: [(0, '13.739')] [2023-02-26 13:51:31,341][13314] Saving new best policy, reward=13.739! [2023-02-26 13:51:34,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 2940928. Throughput: 0: 191.8. Samples: 736838. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2023-02-26 13:51:34,323][00203] Avg episode reward: [(0, '13.642')] [2023-02-26 13:51:39,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 2945024. Throughput: 0: 194.7. Samples: 738314. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2023-02-26 13:51:39,323][00203] Avg episode reward: [(0, '13.402')] [2023-02-26 13:51:41,718][13333] Updated weights for policy 0, policy_version 720 (0.0718) [2023-02-26 13:51:44,312][00203] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 2949120. Throughput: 0: 191.8. Samples: 739192. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2023-02-26 13:51:44,319][00203] Avg episode reward: [(0, '13.560')] [2023-02-26 13:51:49,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.5). Total num frames: 2953216. Throughput: 0: 191.6. Samples: 739762. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2023-02-26 13:51:49,318][00203] Avg episode reward: [(0, '13.649')] [2023-02-26 13:51:54,311][00203] Fps is (10 sec: 819.3, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 2957312. Throughput: 0: 189.5. Samples: 740972. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2023-02-26 13:51:54,315][00203] Avg episode reward: [(0, '13.565')] [2023-02-26 13:51:59,311][00203] Fps is (10 sec: 819.2, 60 sec: 751.0, 300 sec: 791.4). Total num frames: 2961408. Throughput: 0: 193.2. Samples: 742386. 
Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2023-02-26 13:51:59,321][00203] Avg episode reward: [(0, '13.811')] [2023-02-26 13:52:03,514][13314] Saving new best policy, reward=13.811! [2023-02-26 13:52:04,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 2965504. Throughput: 0: 192.0. Samples: 742924. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2023-02-26 13:52:04,324][00203] Avg episode reward: [(0, '14.013')] [2023-02-26 13:52:09,311][00203] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 777.6). Total num frames: 2965504. Throughput: 0: 194.8. Samples: 743760. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2023-02-26 13:52:09,321][00203] Avg episode reward: [(0, '13.970')] [2023-02-26 13:52:09,960][13314] Saving new best policy, reward=14.013! [2023-02-26 13:52:14,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 2973696. Throughput: 0: 192.7. Samples: 744982. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:52:14,316][00203] Avg episode reward: [(0, '13.774')] [2023-02-26 13:52:19,311][00203] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 2977792. Throughput: 0: 201.2. Samples: 745890. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 13:52:19,313][00203] Avg episode reward: [(0, '14.262')] [2023-02-26 13:52:24,311][00203] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 2977792. Throughput: 0: 191.7. Samples: 746942. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 13:52:24,314][00203] Avg episode reward: [(0, '14.157')] [2023-02-26 13:52:24,702][13314] Saving new best policy, reward=14.262! [2023-02-26 13:52:29,311][00203] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 2981888. Throughput: 0: 192.5. Samples: 747854. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 13:52:29,313][00203] Avg episode reward: [(0, '13.463')] [2023-02-26 13:52:34,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 2985984. Throughput: 0: 190.8. Samples: 748348. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:52:34,323][00203] Avg episode reward: [(0, '13.471')] [2023-02-26 13:52:35,003][13333] Updated weights for policy 0, policy_version 730 (0.0623) [2023-02-26 13:52:39,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 2990080. Throughput: 0: 201.6. Samples: 750046. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:52:39,324][00203] Avg episode reward: [(0, '12.913')] [2023-02-26 13:52:44,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 2994176. Throughput: 0: 189.8. Samples: 750926. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:52:44,322][00203] Avg episode reward: [(0, '12.825')] [2023-02-26 13:52:49,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 2998272. Throughput: 0: 188.1. Samples: 751390. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 13:52:49,322][00203] Avg episode reward: [(0, '12.694')] [2023-02-26 13:52:52,042][13314] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000733_3002368.pth... [2023-02-26 13:52:52,160][13314] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000688_2818048.pth [2023-02-26 13:52:54,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 3002368. Throughput: 0: 192.1. Samples: 752404. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:52:54,319][00203] Avg episode reward: [(0, '13.022')] [2023-02-26 13:52:59,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 3006464. Throughput: 0: 199.6. Samples: 753962. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:52:59,318][00203] Avg episode reward: [(0, '12.962')] [2023-02-26 13:53:04,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 3010560. Throughput: 0: 191.8. Samples: 754522. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2023-02-26 13:53:04,316][00203] Avg episode reward: [(0, '12.830')] [2023-02-26 13:53:09,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 3014656. Throughput: 0: 187.9. Samples: 755396. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2023-02-26 13:53:09,323][00203] Avg episode reward: [(0, '12.871')] [2023-02-26 13:53:14,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 3018752. Throughput: 0: 190.1. Samples: 756408. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:53:14,316][00203] Avg episode reward: [(0, '12.930')] [2023-02-26 13:53:19,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 3022848. Throughput: 0: 197.3. Samples: 757228. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:53:19,317][00203] Avg episode reward: [(0, '13.609')] [2023-02-26 13:53:24,315][00203] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 777.5). Total num frames: 3026944. Throughput: 0: 190.4. Samples: 758616. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:53:24,319][00203] Avg episode reward: [(0, '13.681')] [2023-02-26 13:53:27,914][13333] Updated weights for policy 0, policy_version 740 (0.0551) [2023-02-26 13:53:29,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 3031040. Throughput: 0: 190.3. Samples: 759488. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:53:29,315][00203] Avg episode reward: [(0, '14.166')] [2023-02-26 13:53:34,311][00203] Fps is (10 sec: 409.7, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 3031040. Throughput: 0: 192.7. Samples: 760060. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:53:34,317][00203] Avg episode reward: [(0, '13.955')] [2023-02-26 13:53:39,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 3039232. Throughput: 0: 199.2. Samples: 761370. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:53:39,318][00203] Avg episode reward: [(0, '14.084')] [2023-02-26 13:53:44,312][00203] Fps is (10 sec: 1228.7, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 3043328. Throughput: 0: 194.2. Samples: 762702. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:53:44,323][00203] Avg episode reward: [(0, '14.013')] [2023-02-26 13:53:49,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 3047424. Throughput: 0: 196.3. Samples: 763356. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:53:49,319][00203] Avg episode reward: [(0, '14.191')] [2023-02-26 13:53:54,311][00203] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 3047424. Throughput: 0: 195.8. Samples: 764206. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:53:54,315][00203] Avg episode reward: [(0, '14.212')] [2023-02-26 13:53:59,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 3055616. 
Throughput: 0: 200.9. Samples: 765448. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:53:59,317][00203] Avg episode reward: [(0, '14.158')] [2023-02-26 13:54:04,311][00203] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 3059712. Throughput: 0: 201.3. Samples: 766286. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:54:04,322][00203] Avg episode reward: [(0, '14.301')] [2023-02-26 13:54:09,311][00203] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 3059712. Throughput: 0: 195.2. Samples: 767400. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:54:09,314][00203] Avg episode reward: [(0, '14.407')] [2023-02-26 13:54:09,690][13314] Saving new best policy, reward=14.301! [2023-02-26 13:54:14,311][00203] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 3063808. Throughput: 0: 194.9. Samples: 768258. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:54:14,320][00203] Avg episode reward: [(0, '14.235')] [2023-02-26 13:54:16,047][13314] Saving new best policy, reward=14.407! [2023-02-26 13:54:19,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 3067904. Throughput: 0: 193.6. Samples: 768774. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:54:19,315][00203] Avg episode reward: [(0, '14.563')] [2023-02-26 13:54:20,423][13333] Updated weights for policy 0, policy_version 750 (0.1145) [2023-02-26 13:54:22,818][13314] Signal inference workers to stop experience collection... (750 times) [2023-02-26 13:54:22,885][13333] InferenceWorker_p0-w0: stopping experience collection (750 times) [2023-02-26 13:54:24,311][00203] Fps is (10 sec: 819.2, 60 sec: 751.0, 300 sec: 777.5). Total num frames: 3072000. Throughput: 0: 203.1. Samples: 770508. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:54:24,321][00203] Avg episode reward: [(0, '14.831')] [2023-02-26 13:54:24,485][13314] Saving new best policy, reward=14.563! [2023-02-26 13:54:24,503][13314] Signal inference workers to resume experience collection... (750 times) [2023-02-26 13:54:24,507][13333] InferenceWorker_p0-w0: resuming experience collection (750 times) [2023-02-26 13:54:29,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 3076096. Throughput: 0: 197.0. Samples: 771568. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:54:29,317][00203] Avg episode reward: [(0, '14.142')] [2023-02-26 13:54:30,896][13314] Saving new best policy, reward=14.831! [2023-02-26 13:54:34,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 777.5). Total num frames: 3080192. Throughput: 0: 187.3. Samples: 771784. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:54:34,320][00203] Avg episode reward: [(0, '14.240')] [2023-02-26 13:54:39,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 3084288. Throughput: 0: 193.2. Samples: 772902. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2023-02-26 13:54:39,317][00203] Avg episode reward: [(0, '14.491')] [2023-02-26 13:54:44,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 3088384. Throughput: 0: 202.2. Samples: 774546. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2023-02-26 13:54:44,316][00203] Avg episode reward: [(0, '14.568')] [2023-02-26 13:54:49,314][00203] Fps is (10 sec: 818.9, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 3092480. Throughput: 0: 190.5. Samples: 774858. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2023-02-26 13:54:49,320][00203] Avg episode reward: [(0, '13.929')] [2023-02-26 13:54:52,089][13314] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000756_3096576.pth... [2023-02-26 13:54:52,229][13314] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000711_2912256.pth [2023-02-26 13:54:54,314][00203] Fps is (10 sec: 818.9, 60 sec: 819.2, 300 sec: 777.5). Total num frames: 3096576. Throughput: 0: 187.5. Samples: 775836. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2023-02-26 13:54:54,326][00203] Avg episode reward: [(0, '14.034')] [2023-02-26 13:54:59,311][00203] Fps is (10 sec: 819.5, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 3100672. Throughput: 0: 190.3. Samples: 776822. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:54:59,315][00203] Avg episode reward: [(0, '14.173')] [2023-02-26 13:55:04,311][00203] Fps is (10 sec: 819.5, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 3104768. Throughput: 0: 199.7. Samples: 777760. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:55:04,313][00203] Avg episode reward: [(0, '14.018')] [2023-02-26 13:55:09,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 777.5). Total num frames: 3108864. Throughput: 0: 190.1. Samples: 779062. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:55:09,322][00203] Avg episode reward: [(0, '14.072')] [2023-02-26 13:55:12,887][13333] Updated weights for policy 0, policy_version 760 (0.1089) [2023-02-26 13:55:14,312][00203] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 777.5). Total num frames: 3112960. Throughput: 0: 186.4. Samples: 779958. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:55:14,321][00203] Avg episode reward: [(0, '14.300')] [2023-02-26 13:55:19,311][00203] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 3112960. Throughput: 0: 194.4. Samples: 780530. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:55:19,317][00203] Avg episode reward: [(0, '13.906')] [2023-02-26 13:55:24,311][00203] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 777.5). Total num frames: 3121152. Throughput: 0: 197.6. Samples: 781796. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2023-02-26 13:55:24,320][00203] Avg episode reward: [(0, '14.079')] [2023-02-26 13:55:29,311][00203] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 777.5). Total num frames: 3125248. Throughput: 0: 185.5. Samples: 782892. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2023-02-26 13:55:29,318][00203] Avg episode reward: [(0, '13.652')] [2023-02-26 13:55:34,311][00203] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 763.7). Total num frames: 3125248. Throughput: 0: 192.0. Samples: 783496. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2023-02-26 13:55:34,314][00203] Avg episode reward: [(0, '13.723')] [2023-02-26 13:55:39,311][00203] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 3129344. Throughput: 0: 188.6. Samples: 784322. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2023-02-26 13:55:39,318][00203] Avg episode reward: [(0, '13.453')] [2023-02-26 13:55:44,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 763.7). Total num frames: 3133440. Throughput: 0: 200.0. Samples: 785822. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:55:44,317][00203] Avg episode reward: [(0, '13.431')] [2023-02-26 13:55:49,311][00203] Fps is (10 sec: 819.2, 60 sec: 751.0, 300 sec: 763.7). Total num frames: 3137536. 
Throughput: 0: 193.2. Samples: 786452. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:55:49,319][00203] Avg episode reward: [(0, '13.621')] [2023-02-26 13:55:54,311][00203] Fps is (10 sec: 819.2, 60 sec: 751.0, 300 sec: 763.7). Total num frames: 3141632. Throughput: 0: 193.9. Samples: 787788. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2023-02-26 13:55:54,314][00203] Avg episode reward: [(0, '13.912')] [2023-02-26 13:55:59,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 3145728. Throughput: 0: 194.1. Samples: 788692. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2023-02-26 13:55:59,318][00203] Avg episode reward: [(0, '13.854')] [2023-02-26 13:56:04,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 3149824. Throughput: 0: 190.7. Samples: 789112. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2023-02-26 13:56:04,326][00203] Avg episode reward: [(0, '14.281')] [2023-02-26 13:56:05,337][13333] Updated weights for policy 0, policy_version 770 (0.0057) [2023-02-26 13:56:09,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 763.7). Total num frames: 3153920. Throughput: 0: 204.5. Samples: 791000. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2023-02-26 13:56:09,315][00203] Avg episode reward: [(0, '14.497')] [2023-02-26 13:56:14,311][00203] Fps is (10 sec: 819.2, 60 sec: 751.0, 300 sec: 777.5). Total num frames: 3158016. Throughput: 0: 203.7. Samples: 792058. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2023-02-26 13:56:14,315][00203] Avg episode reward: [(0, '14.504')] [2023-02-26 13:56:19,312][00203] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 777.5). Total num frames: 3162112. Throughput: 0: 198.0. Samples: 792406. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2023-02-26 13:56:19,319][00203] Avg episode reward: [(0, '14.217')] [2023-02-26 13:56:24,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 3166208. Throughput: 0: 202.9. Samples: 793454. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2023-02-26 13:56:24,322][00203] Avg episode reward: [(0, '14.626')] [2023-02-26 13:56:29,311][00203] Fps is (10 sec: 819.3, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 3170304. Throughput: 0: 206.0. Samples: 795094. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2023-02-26 13:56:29,317][00203] Avg episode reward: [(0, '14.732')] [2023-02-26 13:56:34,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 777.5). Total num frames: 3174400. Throughput: 0: 203.4. Samples: 795604. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2023-02-26 13:56:34,322][00203] Avg episode reward: [(0, '15.156')] [2023-02-26 13:56:36,460][13314] Saving new best policy, reward=15.156! [2023-02-26 13:56:39,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 777.5). Total num frames: 3178496. Throughput: 0: 191.4. Samples: 796402. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2023-02-26 13:56:39,320][00203] Avg episode reward: [(0, '15.064')] [2023-02-26 13:56:44,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 777.5). Total num frames: 3182592. Throughput: 0: 194.4. Samples: 797440. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:56:44,315][00203] Avg episode reward: [(0, '14.867')] [2023-02-26 13:56:49,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 777.5). Total num frames: 3186688. Throughput: 0: 202.0. Samples: 798200. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:56:49,318][00203] Avg episode reward: [(0, '15.267')] [2023-02-26 13:56:51,407][13314] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000779_3190784.pth... [2023-02-26 13:56:51,523][13314] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000733_3002368.pth [2023-02-26 13:56:51,543][13314] Saving new best policy, reward=15.267! [2023-02-26 13:56:54,314][00203] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 777.5). Total num frames: 3190784. Throughput: 0: 192.4. Samples: 799660. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2023-02-26 13:56:54,318][00203] Avg episode reward: [(0, '15.798')] [2023-02-26 13:56:57,326][13333] Updated weights for policy 0, policy_version 780 (0.0072) [2023-02-26 13:56:57,329][13314] Saving new best policy, reward=15.798! [2023-02-26 13:56:59,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 777.5). Total num frames: 3194880. Throughput: 0: 187.9. Samples: 800512. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2023-02-26 13:56:59,318][00203] Avg episode reward: [(0, '15.948')] [2023-02-26 13:57:03,331][13314] Saving new best policy, reward=15.948! [2023-02-26 13:57:04,311][00203] Fps is (10 sec: 819.4, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 3198976. Throughput: 0: 194.9. Samples: 801178. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:57:04,318][00203] Avg episode reward: [(0, '16.241')] [2023-02-26 13:57:07,628][13314] Saving new best policy, reward=16.241! [2023-02-26 13:57:09,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 777.5). Total num frames: 3203072. Throughput: 0: 197.3. Samples: 802334. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:57:09,313][00203] Avg episode reward: [(0, '16.180')] [2023-02-26 13:57:14,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 777.5). Total num frames: 3207168. Throughput: 0: 193.2. Samples: 803790. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:57:14,316][00203] Avg episode reward: [(0, '17.127')] [2023-02-26 13:57:17,216][13314] Saving new best policy, reward=17.127! [2023-02-26 13:57:19,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 3211264. Throughput: 0: 193.6. Samples: 804316. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:57:19,319][00203] Avg episode reward: [(0, '17.704')] [2023-02-26 13:57:23,651][13314] Saving new best policy, reward=17.704! [2023-02-26 13:57:24,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 3215360. Throughput: 0: 197.4. Samples: 805286. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2023-02-26 13:57:24,321][00203] Avg episode reward: [(0, '17.726')] [2023-02-26 13:57:27,982][13314] Saving new best policy, reward=17.726! [2023-02-26 13:57:29,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 3219456. Throughput: 0: 202.8. Samples: 806568. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2023-02-26 13:57:29,314][00203] Avg episode reward: [(0, '18.409')] [2023-02-26 13:57:32,258][13314] Saving new best policy, reward=18.409! [2023-02-26 13:57:34,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 3223552. Throughput: 0: 203.6. Samples: 807362. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2023-02-26 13:57:34,315][00203] Avg episode reward: [(0, '19.332')] [2023-02-26 13:57:37,962][13314] Saving new best policy, reward=19.332! 
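Note on the checkpointing pattern visible in the entries above: the learner keeps a small rolling window of regular checkpoints (the paired "Saving checkpoint_..." / "Removing checkpoint_..." lines, consistent with keep_checkpoints=2 in this run's configuration) and, independently, refreshes a best-policy snapshot whenever the tracked average episode reward improves (the "Saving new best policy, reward=..." lines climbing from ~15.3 toward ~19.3 here). The sketch below is a minimal, illustrative reconstruction of that bookkeeping, not Sample Factory's actual implementation; class and method names such as CheckpointManager and save_checkpoint are hypothetical, and only the file-name pattern (checkpoint_<policy_version>_<env_steps>.pth) is taken from the log itself.

```python
# Illustrative sketch of rolling-checkpoint rotation plus best-policy tracking,
# mirroring the "Saving ... / Removing ..." and "Saving new best policy" log lines.
# Assumptions: keep_checkpoints=2 as in this run's config; names are hypothetical.
import os
from collections import deque


class CheckpointManager:
    def __init__(self, checkpoint_dir, keep_checkpoints=2):
        self.checkpoint_dir = checkpoint_dir
        self.keep_checkpoints = keep_checkpoints
        self.recent = deque()              # rolling checkpoint paths, oldest first
        self.best_reward = float("-inf")   # best average episode reward seen so far

    def save_checkpoint(self, state_bytes, policy_version, env_steps):
        # File name follows the pattern seen in the log:
        # checkpoint_<policy_version zero-padded to 9 digits>_<env_steps>.pth
        path = os.path.join(
            self.checkpoint_dir,
            f"checkpoint_{policy_version:09d}_{env_steps}.pth",
        )
        with open(path, "wb") as f:
            f.write(state_bytes)
        self.recent.append(path)
        # Drop the oldest rolling checkpoint once the limit is exceeded,
        # which is what produces the paired "Saving ... / Removing ..." entries.
        while len(self.recent) > self.keep_checkpoints:
            old = self.recent.popleft()
            if os.path.exists(old):
                os.remove(old)
        return path

    def maybe_save_best(self, state_bytes, avg_reward):
        # Separate best-policy snapshot, refreshed only on improvement;
        # this corresponds to the "Saving new best policy, reward=..." lines.
        if avg_reward > self.best_reward:
            self.best_reward = avg_reward
            best_path = os.path.join(self.checkpoint_dir, "best_policy.pth")
            with open(best_path, "wb") as f:
                f.write(state_bytes)
            return best_path
        return None
```

Keeping the two mechanisms separate is what lets a run resume from the most recent rolling checkpoint while still preserving the highest-reward policy for later evaluation, even if training subsequently regresses.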
[2023-02-26 13:57:39,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 3227648. Throughput: 0: 194.5. Samples: 808410. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2023-02-26 13:57:39,315][00203] Avg episode reward: [(0, '19.329')] [2023-02-26 13:57:44,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 3231744. Throughput: 0: 198.4. Samples: 809442. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2023-02-26 13:57:44,315][00203] Avg episode reward: [(0, '19.861')] [2023-02-26 13:57:48,530][13314] Saving new best policy, reward=19.861! [2023-02-26 13:57:48,536][13333] Updated weights for policy 0, policy_version 790 (0.0611) [2023-02-26 13:57:49,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 3235840. Throughput: 0: 202.8. Samples: 810304. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2023-02-26 13:57:49,319][00203] Avg episode reward: [(0, '20.056')] [2023-02-26 13:57:52,966][13314] Saving new best policy, reward=20.056! [2023-02-26 13:57:54,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 3239936. Throughput: 0: 203.7. Samples: 811500. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:57:54,317][00203] Avg episode reward: [(0, '19.667')] [2023-02-26 13:57:59,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 3244032. Throughput: 0: 194.5. Samples: 812542. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:57:59,317][00203] Avg episode reward: [(0, '19.773')] [2023-02-26 13:58:04,311][00203] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 3244032. Throughput: 0: 197.4. Samples: 813198. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:58:04,314][00203] Avg episode reward: [(0, '19.919')] [2023-02-26 13:58:09,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 3252224. Throughput: 0: 204.8. Samples: 814504. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:58:09,315][00203] Avg episode reward: [(0, '19.736')] [2023-02-26 13:58:14,311][00203] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 3256320. Throughput: 0: 200.0. Samples: 815570. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2023-02-26 13:58:14,315][00203] Avg episode reward: [(0, '19.925')] [2023-02-26 13:58:19,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 3260416. Throughput: 0: 197.3. Samples: 816242. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2023-02-26 13:58:19,325][00203] Avg episode reward: [(0, '20.324')] [2023-02-26 13:58:24,311][00203] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 3260416. Throughput: 0: 196.7. Samples: 817262. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2023-02-26 13:58:24,315][00203] Avg episode reward: [(0, '19.993')] [2023-02-26 13:58:25,388][13314] Saving new best policy, reward=20.324! [2023-02-26 13:58:29,311][00203] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 3264512. Throughput: 0: 202.8. Samples: 818566. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2023-02-26 13:58:29,322][00203] Avg episode reward: [(0, '20.170')] [2023-02-26 13:58:34,311][00203] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 3272704. Throughput: 0: 200.4. Samples: 819320. 
Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2023-02-26 13:58:34,319][00203] Avg episode reward: [(0, '20.496')] [2023-02-26 13:58:39,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 3272704. Throughput: 0: 202.4. Samples: 820606. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2023-02-26 13:58:39,317][00203] Avg episode reward: [(0, '19.571')] [2023-02-26 13:58:39,820][13314] Saving new best policy, reward=20.496! [2023-02-26 13:58:39,839][13333] Updated weights for policy 0, policy_version 800 (0.0516) [2023-02-26 13:58:44,311][00203] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 3276800. Throughput: 0: 198.0. Samples: 821454. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2023-02-26 13:58:44,316][00203] Avg episode reward: [(0, '19.154')] [2023-02-26 13:58:44,806][13314] Signal inference workers to stop experience collection... (800 times) [2023-02-26 13:58:44,900][13333] InferenceWorker_p0-w0: stopping experience collection (800 times) [2023-02-26 13:58:46,296][13314] Signal inference workers to resume experience collection... (800 times) [2023-02-26 13:58:46,297][13333] InferenceWorker_p0-w0: resuming experience collection (800 times) [2023-02-26 13:58:49,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 3280896. Throughput: 0: 192.6. Samples: 821864. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2023-02-26 13:58:49,319][00203] Avg episode reward: [(0, '18.997')] [2023-02-26 13:58:54,129][13314] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000803_3289088.pth... [2023-02-26 13:58:54,230][13314] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000756_3096576.pth [2023-02-26 13:58:54,311][00203] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 3289088. Throughput: 0: 203.5. Samples: 823662. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2023-02-26 13:58:54,321][00203] Avg episode reward: [(0, '18.394')] [2023-02-26 13:58:59,311][00203] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 3293184. Throughput: 0: 205.1. Samples: 824800. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2023-02-26 13:58:59,316][00203] Avg episode reward: [(0, '18.155')] [2023-02-26 13:59:04,311][00203] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 3293184. Throughput: 0: 201.2. Samples: 825296. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2023-02-26 13:59:04,315][00203] Avg episode reward: [(0, '18.339')] [2023-02-26 13:59:09,311][00203] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 3297280. Throughput: 0: 201.2. Samples: 826316. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2023-02-26 13:59:09,315][00203] Avg episode reward: [(0, '17.538')] [2023-02-26 13:59:14,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 3301376. Throughput: 0: 206.9. Samples: 827878. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2023-02-26 13:59:14,315][00203] Avg episode reward: [(0, '17.312')] [2023-02-26 13:59:19,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 3305472. Throughput: 0: 206.8. Samples: 828628. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:59:19,314][00203] Avg episode reward: [(0, '17.334')] [2023-02-26 13:59:24,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 3309568. Throughput: 0: 193.8. Samples: 829328. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:59:24,317][00203] Avg episode reward: [(0, '17.293')] [2023-02-26 13:59:29,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 3313664. Throughput: 0: 208.0. Samples: 830814. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:59:29,319][00203] Avg episode reward: [(0, '17.417')] [2023-02-26 13:59:30,384][13333] Updated weights for policy 0, policy_version 810 (0.1262) [2023-02-26 13:59:34,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 3317760. Throughput: 0: 211.5. Samples: 831380. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 13:59:34,319][00203] Avg episode reward: [(0, '17.528')] [2023-02-26 13:59:39,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 3321856. Throughput: 0: 204.3. Samples: 832854. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2023-02-26 13:59:39,315][00203] Avg episode reward: [(0, '17.262')] [2023-02-26 13:59:44,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 3325952. Throughput: 0: 196.3. Samples: 833632. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2023-02-26 13:59:44,323][00203] Avg episode reward: [(0, '17.028')] [2023-02-26 13:59:49,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 3330048. Throughput: 0: 192.4. Samples: 833956. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2023-02-26 13:59:49,321][00203] Avg episode reward: [(0, '16.912')] [2023-02-26 13:59:54,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 3334144. Throughput: 0: 206.2. Samples: 835596. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2023-02-26 13:59:54,321][00203] Avg episode reward: [(0, '17.158')] [2023-02-26 13:59:59,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 3338240. Throughput: 0: 203.7. Samples: 837044. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2023-02-26 13:59:59,317][00203] Avg episode reward: [(0, '16.859')] [2023-02-26 14:00:04,313][00203] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 3342336. Throughput: 0: 193.5. Samples: 837336. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2023-02-26 14:00:04,321][00203] Avg episode reward: [(0, '16.924')] [2023-02-26 14:00:09,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 3346432. Throughput: 0: 199.9. Samples: 838324. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2023-02-26 14:00:09,320][00203] Avg episode reward: [(0, '17.757')] [2023-02-26 14:00:14,311][00203] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 3350528. Throughput: 0: 199.2. Samples: 839776. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2023-02-26 14:00:14,330][00203] Avg episode reward: [(0, '17.455')] [2023-02-26 14:00:19,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 3354624. Throughput: 0: 193.3. Samples: 840080. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2023-02-26 14:00:19,317][00203] Avg episode reward: [(0, '17.189')] [2023-02-26 14:00:24,313][00203] Fps is (10 sec: 409.5, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 3354624. Throughput: 0: 174.9. Samples: 840724. 
Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2023-02-26 14:00:24,321][00203] Avg episode reward: [(0, '16.809')] [2023-02-26 14:00:27,122][13333] Updated weights for policy 0, policy_version 820 (0.1349) [2023-02-26 14:00:29,311][00203] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 3358720. Throughput: 0: 174.6. Samples: 841488. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2023-02-26 14:00:29,314][00203] Avg episode reward: [(0, '16.991')] [2023-02-26 14:00:34,311][00203] Fps is (10 sec: 819.4, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 3362816. Throughput: 0: 181.9. Samples: 842140. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2023-02-26 14:00:34,313][00203] Avg episode reward: [(0, '16.801')] [2023-02-26 14:00:39,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 3366912. Throughput: 0: 172.2. Samples: 843346. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2023-02-26 14:00:39,315][00203] Avg episode reward: [(0, '16.458')] [2023-02-26 14:00:44,315][00203] Fps is (10 sec: 818.9, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 3371008. Throughput: 0: 167.0. Samples: 844560. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2023-02-26 14:00:44,318][00203] Avg episode reward: [(0, '16.468')] [2023-02-26 14:00:49,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 3375104. Throughput: 0: 173.0. Samples: 845122. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2023-02-26 14:00:49,316][00203] Avg episode reward: [(0, '16.208')] [2023-02-26 14:00:54,040][13314] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000825_3379200.pth... [2023-02-26 14:00:54,157][13314] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000779_3190784.pth [2023-02-26 14:00:54,311][00203] Fps is (10 sec: 819.5, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 3379200. Throughput: 0: 174.8. Samples: 846190. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2023-02-26 14:00:54,317][00203] Avg episode reward: [(0, '15.865')] [2023-02-26 14:00:59,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 3383296. Throughput: 0: 169.2. Samples: 847392. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2023-02-26 14:00:59,315][00203] Avg episode reward: [(0, '15.933')] [2023-02-26 14:01:04,311][00203] Fps is (10 sec: 819.2, 60 sec: 751.0, 300 sec: 791.4). Total num frames: 3387392. Throughput: 0: 182.6. Samples: 848298. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2023-02-26 14:01:04,325][00203] Avg episode reward: [(0, '16.148')] [2023-02-26 14:01:09,311][00203] Fps is (10 sec: 409.6, 60 sec: 682.7, 300 sec: 777.5). Total num frames: 3387392. Throughput: 0: 187.9. Samples: 849178. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2023-02-26 14:01:09,318][00203] Avg episode reward: [(0, '16.509')] [2023-02-26 14:01:14,311][00203] Fps is (10 sec: 409.6, 60 sec: 682.7, 300 sec: 777.5). Total num frames: 3391488. Throughput: 0: 196.1. Samples: 850312. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2023-02-26 14:01:14,316][00203] Avg episode reward: [(0, '16.994')] [2023-02-26 14:01:19,311][00203] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 777.5). Total num frames: 3395584. Throughput: 0: 191.7. Samples: 850766. 
Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2023-02-26 14:01:19,314][00203] Avg episode reward: [(0, '16.849')] [2023-02-26 14:01:19,838][13333] Updated weights for policy 0, policy_version 830 (0.0615) [2023-02-26 14:01:24,311][00203] Fps is (10 sec: 819.2, 60 sec: 751.0, 300 sec: 777.5). Total num frames: 3399680. Throughput: 0: 204.5. Samples: 852548. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2023-02-26 14:01:24,314][00203] Avg episode reward: [(0, '16.919')] [2023-02-26 14:01:29,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 3403776. Throughput: 0: 195.6. Samples: 853360. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 14:01:29,320][00203] Avg episode reward: [(0, '17.020')] [2023-02-26 14:01:34,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 3407872. Throughput: 0: 190.6. Samples: 853700. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 14:01:34,315][00203] Avg episode reward: [(0, '17.096')] [2023-02-26 14:01:39,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 3411968. Throughput: 0: 197.7. Samples: 855088. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 14:01:39,324][00203] Avg episode reward: [(0, '17.187')] [2023-02-26 14:01:44,311][00203] Fps is (10 sec: 819.2, 60 sec: 751.0, 300 sec: 777.5). Total num frames: 3416064. Throughput: 0: 205.0. Samples: 856616. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 14:01:44,317][00203] Avg episode reward: [(0, '17.300')] [2023-02-26 14:01:49,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 777.6). Total num frames: 3420160. Throughput: 0: 193.4. Samples: 857002. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 14:01:49,325][00203] Avg episode reward: [(0, '16.650')] [2023-02-26 14:01:54,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 3424256. Throughput: 0: 195.3. Samples: 857966. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 14:01:54,319][00203] Avg episode reward: [(0, '16.966')] [2023-02-26 14:01:59,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 3428352. Throughput: 0: 199.9. Samples: 859308. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 14:01:59,316][00203] Avg episode reward: [(0, '17.665')] [2023-02-26 14:02:04,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 3432448. Throughput: 0: 202.2. Samples: 859866. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 14:02:04,315][00203] Avg episode reward: [(0, '17.341')] [2023-02-26 14:02:09,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 777.5). Total num frames: 3436544. Throughput: 0: 188.6. Samples: 861036. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 14:02:09,317][00203] Avg episode reward: [(0, '17.417')] [2023-02-26 14:02:13,910][13333] Updated weights for policy 0, policy_version 840 (0.0755) [2023-02-26 14:02:14,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 777.5). Total num frames: 3440640. Throughput: 0: 189.1. Samples: 861870. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 14:02:14,324][00203] Avg episode reward: [(0, '17.271')] [2023-02-26 14:02:19,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 777.5). Total num frames: 3444736. Throughput: 0: 198.5. Samples: 862632. 
Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 14:02:19,317][00203] Avg episode reward: [(0, '17.293')] [2023-02-26 14:02:24,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 777.5). Total num frames: 3448832. Throughput: 0: 195.1. Samples: 863868. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 14:02:24,314][00203] Avg episode reward: [(0, '17.878')] [2023-02-26 14:02:29,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 777.5). Total num frames: 3452928. Throughput: 0: 188.4. Samples: 865096. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 14:02:29,316][00203] Avg episode reward: [(0, '18.800')] [2023-02-26 14:02:34,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 777.5). Total num frames: 3457024. Throughput: 0: 194.3. Samples: 865746. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 14:02:34,314][00203] Avg episode reward: [(0, '19.092')] [2023-02-26 14:02:39,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 777.5). Total num frames: 3461120. Throughput: 0: 197.6. Samples: 866858. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 14:02:39,323][00203] Avg episode reward: [(0, '19.099')] [2023-02-26 14:02:44,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 777.5). Total num frames: 3465216. Throughput: 0: 198.7. Samples: 868248. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 14:02:44,315][00203] Avg episode reward: [(0, '19.286')] [2023-02-26 14:02:49,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 777.5). Total num frames: 3469312. Throughput: 0: 201.5. Samples: 868932. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 14:02:49,316][00203] Avg episode reward: [(0, '18.730')] [2023-02-26 14:02:53,884][13314] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000848_3473408.pth... [2023-02-26 14:02:54,031][13314] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000803_3289088.pth [2023-02-26 14:02:54,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 777.5). Total num frames: 3473408. Throughput: 0: 200.0. Samples: 870038. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 14:02:54,315][00203] Avg episode reward: [(0, '19.238')] [2023-02-26 14:02:59,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 3477504. Throughput: 0: 203.9. Samples: 871046. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 14:02:59,314][00203] Avg episode reward: [(0, '19.040')] [2023-02-26 14:03:02,937][13333] Updated weights for policy 0, policy_version 850 (0.1731) [2023-02-26 14:03:04,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 777.5). Total num frames: 3481600. Throughput: 0: 207.7. Samples: 871978. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 14:03:04,322][00203] Avg episode reward: [(0, '18.412')] [2023-02-26 14:03:05,846][13314] Signal inference workers to stop experience collection... (850 times) [2023-02-26 14:03:05,911][13333] InferenceWorker_p0-w0: stopping experience collection (850 times) [2023-02-26 14:03:07,848][13314] Signal inference workers to resume experience collection... (850 times) [2023-02-26 14:03:07,850][13333] InferenceWorker_p0-w0: resuming experience collection (850 times) [2023-02-26 14:03:09,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 777.5). Total num frames: 3485696. Throughput: 0: 205.4. Samples: 873112. 
Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 14:03:09,318][00203] Avg episode reward: [(0, '18.381')] [2023-02-26 14:03:14,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 777.5). Total num frames: 3489792. Throughput: 0: 200.2. Samples: 874104. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 14:03:14,316][00203] Avg episode reward: [(0, '18.466')] [2023-02-26 14:03:19,311][00203] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 3489792. Throughput: 0: 198.5. Samples: 874680. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 14:03:19,313][00203] Avg episode reward: [(0, '18.456')] [2023-02-26 14:03:24,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 3497984. Throughput: 0: 208.4. Samples: 876238. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 14:03:24,317][00203] Avg episode reward: [(0, '19.061')] [2023-02-26 14:03:29,311][00203] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 777.5). Total num frames: 3502080. Throughput: 0: 201.4. Samples: 877312. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 14:03:29,317][00203] Avg episode reward: [(0, '19.305')] [2023-02-26 14:03:34,311][00203] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 3502080. Throughput: 0: 200.0. Samples: 877934. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 14:03:34,314][00203] Avg episode reward: [(0, '18.844')] [2023-02-26 14:03:39,311][00203] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 3506176. Throughput: 0: 196.9. Samples: 878898. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 14:03:39,320][00203] Avg episode reward: [(0, '18.855')] [2023-02-26 14:03:44,311][00203] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 3514368. Throughput: 0: 204.8. Samples: 880260. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2023-02-26 14:03:44,316][00203] Avg episode reward: [(0, '19.130')] [2023-02-26 14:03:49,311][00203] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 777.5). Total num frames: 3518464. Throughput: 0: 201.7. Samples: 881054. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 14:03:49,316][00203] Avg episode reward: [(0, '19.128')] [2023-02-26 14:03:54,311][00203] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 763.7). Total num frames: 3518464. Throughput: 0: 199.0. Samples: 882066. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 14:03:54,315][00203] Avg episode reward: [(0, '19.416')] [2023-02-26 14:03:55,721][13333] Updated weights for policy 0, policy_version 860 (0.0068) [2023-02-26 14:03:59,311][00203] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 3522560. Throughput: 0: 196.4. Samples: 882940. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 14:03:59,318][00203] Avg episode reward: [(0, '19.169')] [2023-02-26 14:04:04,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 3526656. Throughput: 0: 193.5. Samples: 883388. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 14:04:04,317][00203] Avg episode reward: [(0, '19.471')] [2023-02-26 14:04:09,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 3530752. Throughput: 0: 196.9. Samples: 885100. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 14:04:09,319][00203] Avg episode reward: [(0, '19.391')] [2023-02-26 14:04:14,312][00203] Fps is (10 sec: 819.1, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 3534848. 
Throughput: 0: 193.5. Samples: 886020. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 14:04:14,319][00203] Avg episode reward: [(0, '19.498')] [2023-02-26 14:04:19,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 777.5). Total num frames: 3538944. Throughput: 0: 189.9. Samples: 886480. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 14:04:19,318][00203] Avg episode reward: [(0, '19.646')] [2023-02-26 14:04:24,311][00203] Fps is (10 sec: 819.3, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 3543040. Throughput: 0: 190.5. Samples: 887472. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0) [2023-02-26 14:04:24,317][00203] Avg episode reward: [(0, '19.698')] [2023-02-26 14:04:29,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 3547136. Throughput: 0: 199.2. Samples: 889224. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0) [2023-02-26 14:04:29,318][00203] Avg episode reward: [(0, '19.264')] [2023-02-26 14:04:34,320][00203] Fps is (10 sec: 818.4, 60 sec: 819.1, 300 sec: 777.5). Total num frames: 3551232. Throughput: 0: 193.0. Samples: 889740. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0) [2023-02-26 14:04:34,329][00203] Avg episode reward: [(0, '19.233')] [2023-02-26 14:04:39,312][00203] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 777.5). Total num frames: 3555328. Throughput: 0: 188.8. Samples: 890560. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0) [2023-02-26 14:04:39,320][00203] Avg episode reward: [(0, '19.547')] [2023-02-26 14:04:44,311][00203] Fps is (10 sec: 819.9, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 3559424. Throughput: 0: 192.9. Samples: 891622. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0) [2023-02-26 14:04:44,319][00203] Avg episode reward: [(0, '20.096')] [2023-02-26 14:04:48,224][13333] Updated weights for policy 0, policy_version 870 (0.0595) [2023-02-26 14:04:49,311][00203] Fps is (10 sec: 819.3, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 3563520. Throughput: 0: 205.1. Samples: 892616. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0) [2023-02-26 14:04:49,314][00203] Avg episode reward: [(0, '20.215')] [2023-02-26 14:04:52,311][13314] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000871_3567616.pth... [2023-02-26 14:04:52,410][13314] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000825_3379200.pth [2023-02-26 14:04:54,314][00203] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 777.5). Total num frames: 3567616. Throughput: 0: 193.0. Samples: 893786. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2023-02-26 14:04:54,325][00203] Avg episode reward: [(0, '20.514')] [2023-02-26 14:04:58,977][13314] Saving new best policy, reward=20.514! [2023-02-26 14:04:59,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 777.6). Total num frames: 3571712. Throughput: 0: 192.6. Samples: 894688. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2023-02-26 14:04:59,322][00203] Avg episode reward: [(0, '20.625')] [2023-02-26 14:05:04,311][00203] Fps is (10 sec: 409.7, 60 sec: 750.9, 300 sec: 763.7). Total num frames: 3571712. Throughput: 0: 194.5. Samples: 895232. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2023-02-26 14:05:04,323][00203] Avg episode reward: [(0, '21.013')] [2023-02-26 14:05:05,156][13314] Saving new best policy, reward=20.625! [2023-02-26 14:05:09,311][00203] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 763.7). Total num frames: 3575808. Throughput: 0: 203.5. Samples: 896630. 
Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2023-02-26 14:05:09,318][00203] Avg episode reward: [(0, '20.921')] [2023-02-26 14:05:09,427][13314] Saving new best policy, reward=21.013! [2023-02-26 14:05:14,311][00203] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 777.5). Total num frames: 3584000. Throughput: 0: 189.2. Samples: 897738. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2023-02-26 14:05:14,323][00203] Avg episode reward: [(0, '20.515')] [2023-02-26 14:05:19,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 777.6). Total num frames: 3584000. Throughput: 0: 189.0. Samples: 898244. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2023-02-26 14:05:19,327][00203] Avg episode reward: [(0, '20.515')] [2023-02-26 14:05:24,313][00203] Fps is (10 sec: 409.5, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 3588096. Throughput: 0: 192.7. Samples: 899232. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2023-02-26 14:05:24,316][00203] Avg episode reward: [(0, '20.410')] [2023-02-26 14:05:29,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 3592192. Throughput: 0: 198.6. Samples: 900560. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2023-02-26 14:05:29,323][00203] Avg episode reward: [(0, '20.420')] [2023-02-26 14:05:34,311][00203] Fps is (10 sec: 819.4, 60 sec: 751.0, 300 sec: 777.5). Total num frames: 3596288. Throughput: 0: 192.9. Samples: 901296. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2023-02-26 14:05:34,325][00203] Avg episode reward: [(0, '20.108')] [2023-02-26 14:05:39,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 777.6). Total num frames: 3600384. Throughput: 0: 194.9. Samples: 902554. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0) [2023-02-26 14:05:39,314][00203] Avg episode reward: [(0, '20.324')] [2023-02-26 14:05:40,920][13333] Updated weights for policy 0, policy_version 880 (0.1136) [2023-02-26 14:05:44,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 3604480. Throughput: 0: 194.1. Samples: 903424. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0) [2023-02-26 14:05:44,320][00203] Avg episode reward: [(0, '20.200')] [2023-02-26 14:05:49,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 3608576. Throughput: 0: 192.5. Samples: 903894. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0) [2023-02-26 14:05:49,318][00203] Avg episode reward: [(0, '19.724')] [2023-02-26 14:05:54,311][00203] Fps is (10 sec: 819.2, 60 sec: 751.0, 300 sec: 777.5). Total num frames: 3612672. Throughput: 0: 197.2. Samples: 905502. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0) [2023-02-26 14:05:54,318][00203] Avg episode reward: [(0, '20.095')] [2023-02-26 14:05:59,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 3616768. Throughput: 0: 203.2. Samples: 906880. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0) [2023-02-26 14:05:59,318][00203] Avg episode reward: [(0, '20.314')] [2023-02-26 14:06:04,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 3620864. Throughput: 0: 197.1. Samples: 907114. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0) [2023-02-26 14:06:04,322][00203] Avg episode reward: [(0, '20.055')] [2023-02-26 14:06:09,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 3624960. Throughput: 0: 197.2. Samples: 908104. 
Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 14:06:09,321][00203] Avg episode reward: [(0, '20.682')] [2023-02-26 14:06:14,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 3629056. Throughput: 0: 205.5. Samples: 909808. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 14:06:14,315][00203] Avg episode reward: [(0, '20.777')] [2023-02-26 14:06:19,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 3633152. Throughput: 0: 201.5. Samples: 910364. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 14:06:19,315][00203] Avg episode reward: [(0, '21.128')] [2023-02-26 14:06:21,530][13314] Saving new best policy, reward=21.128! [2023-02-26 14:06:24,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 3637248. Throughput: 0: 192.1. Samples: 911200. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 14:06:24,319][00203] Avg episode reward: [(0, '21.343')] [2023-02-26 14:06:27,821][13314] Saving new best policy, reward=21.343! [2023-02-26 14:06:29,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 3641344. Throughput: 0: 197.1. Samples: 912292. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 14:06:29,315][00203] Avg episode reward: [(0, '21.612')] [2023-02-26 14:06:31,822][13314] Saving new best policy, reward=21.612! [2023-02-26 14:06:31,828][13333] Updated weights for policy 0, policy_version 890 (0.0065) [2023-02-26 14:06:34,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 3645440. Throughput: 0: 202.4. Samples: 913002. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 14:06:34,321][00203] Avg episode reward: [(0, '21.333')] [2023-02-26 14:06:39,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 3649536. Throughput: 0: 202.4. Samples: 914612. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 14:06:39,314][00203] Avg episode reward: [(0, '21.260')] [2023-02-26 14:06:44,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 3653632. Throughput: 0: 189.0. Samples: 915386. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 14:06:44,320][00203] Avg episode reward: [(0, '20.716')] [2023-02-26 14:06:49,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 3657728. Throughput: 0: 196.5. Samples: 915958. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 14:06:49,315][00203] Avg episode reward: [(0, '20.771')] [2023-02-26 14:06:53,353][13314] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000894_3661824.pth... [2023-02-26 14:06:53,479][13314] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000848_3473408.pth [2023-02-26 14:06:54,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 3661824. Throughput: 0: 201.2. Samples: 917156. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 14:06:54,318][00203] Avg episode reward: [(0, '20.850')] [2023-02-26 14:06:59,313][00203] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 3665920. Throughput: 0: 192.1. Samples: 918452. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 14:06:59,316][00203] Avg episode reward: [(0, '20.462')] [2023-02-26 14:07:04,318][00203] Fps is (10 sec: 818.6, 60 sec: 819.1, 300 sec: 791.4). Total num frames: 3670016. Throughput: 0: 192.3. Samples: 919020. 
Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 14:07:04,330][00203] Avg episode reward: [(0, '20.409')] [2023-02-26 14:07:09,311][00203] Fps is (10 sec: 409.7, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 3670016. Throughput: 0: 193.6. Samples: 919914. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 14:07:09,313][00203] Avg episode reward: [(0, '20.083')] [2023-02-26 14:07:14,311][00203] Fps is (10 sec: 409.9, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 3674112. Throughput: 0: 197.2. Samples: 921164. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 14:07:14,324][00203] Avg episode reward: [(0, '19.672')] [2023-02-26 14:07:19,311][00203] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 3682304. Throughput: 0: 199.5. Samples: 921978. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 14:07:19,318][00203] Avg episode reward: [(0, '19.832')] [2023-02-26 14:07:24,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 3682304. Throughput: 0: 184.0. Samples: 922890. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 14:07:24,314][00203] Avg episode reward: [(0, '20.037')] [2023-02-26 14:07:25,060][13333] Updated weights for policy 0, policy_version 900 (0.0606) [2023-02-26 14:07:29,315][00203] Fps is (10 sec: 409.4, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 3686400. Throughput: 0: 188.9. Samples: 923888. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 14:07:29,321][00203] Avg episode reward: [(0, '20.008')] [2023-02-26 14:07:30,204][13314] Signal inference workers to stop experience collection... (900 times) [2023-02-26 14:07:30,319][13333] InferenceWorker_p0-w0: stopping experience collection (900 times) [2023-02-26 14:07:31,562][13314] Signal inference workers to resume experience collection... (900 times) [2023-02-26 14:07:31,565][13333] InferenceWorker_p0-w0: resuming experience collection (900 times) [2023-02-26 14:07:34,312][00203] Fps is (10 sec: 819.1, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 3690496. Throughput: 0: 184.0. Samples: 924240. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 14:07:34,323][00203] Avg episode reward: [(0, '19.824')] [2023-02-26 14:07:39,311][00203] Fps is (10 sec: 819.5, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 3694592. Throughput: 0: 192.5. Samples: 925818. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 14:07:39,316][00203] Avg episode reward: [(0, '20.582')] [2023-02-26 14:07:44,314][00203] Fps is (10 sec: 819.1, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 3698688. Throughput: 0: 186.6. Samples: 926850. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2023-02-26 14:07:44,320][00203] Avg episode reward: [(0, '20.413')] [2023-02-26 14:07:49,312][00203] Fps is (10 sec: 819.1, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 3702784. Throughput: 0: 183.3. Samples: 927266. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2023-02-26 14:07:49,320][00203] Avg episode reward: [(0, '21.428')] [2023-02-26 14:07:54,311][00203] Fps is (10 sec: 819.5, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 3706880. Throughput: 0: 186.8. Samples: 928318. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2023-02-26 14:07:54,315][00203] Avg episode reward: [(0, '21.396')] [2023-02-26 14:07:59,311][00203] Fps is (10 sec: 819.3, 60 sec: 751.0, 300 sec: 777.5). Total num frames: 3710976. Throughput: 0: 190.8. Samples: 929750. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2023-02-26 14:07:59,319][00203] Avg episode reward: [(0, '20.414')] [2023-02-26 14:08:04,312][00203] Fps is (10 sec: 819.1, 60 sec: 751.0, 300 sec: 777.5). Total num frames: 3715072. Throughput: 0: 186.0. Samples: 930346. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 14:08:04,325][00203] Avg episode reward: [(0, '19.643')] [2023-02-26 14:08:09,312][00203] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 777.5). Total num frames: 3719168. Throughput: 0: 190.0. Samples: 931440. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 14:08:09,320][00203] Avg episode reward: [(0, '19.652')] [2023-02-26 14:08:14,311][00203] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 3723264. Throughput: 0: 190.2. Samples: 932444. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2023-02-26 14:08:14,316][00203] Avg episode reward: [(0, '20.101')] [2023-02-26 14:08:18,737][13333] Updated weights for policy 0, policy_version 910 (0.1547) [2023-02-26 14:08:19,311][00203] Fps is (10 sec: 819.3, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 3727360. Throughput: 0: 203.9. Samples: 933414. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2023-02-26 14:08:19,314][00203] Avg episode reward: [(0, '19.386')] [2023-02-26 14:08:24,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 777.5). Total num frames: 3731456. Throughput: 0: 197.4. Samples: 934702. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 14:08:24,314][00203] Avg episode reward: [(0, '19.912')] [2023-02-26 14:08:29,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.3, 300 sec: 791.4). Total num frames: 3735552. Throughput: 0: 195.1. Samples: 935628. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 14:08:29,320][00203] Avg episode reward: [(0, '19.912')] [2023-02-26 14:08:34,311][00203] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 3735552. Throughput: 0: 197.9. Samples: 936172. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 14:08:34,317][00203] Avg episode reward: [(0, '19.883')] [2023-02-26 14:08:39,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 777.5). Total num frames: 3743744. Throughput: 0: 204.3. Samples: 937510. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 14:08:39,324][00203] Avg episode reward: [(0, '20.243')] [2023-02-26 14:08:44,314][00203] Fps is (10 sec: 1228.6, 60 sec: 819.2, 300 sec: 777.5). Total num frames: 3747840. Throughput: 0: 202.1. Samples: 938846. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 14:08:44,324][00203] Avg episode reward: [(0, '20.297')] [2023-02-26 14:08:49,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 3751936. Throughput: 0: 203.4. Samples: 939498. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 14:08:49,317][00203] Avg episode reward: [(0, '20.297')] [2023-02-26 14:08:54,311][00203] Fps is (10 sec: 409.7, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 3751936. Throughput: 0: 196.6. Samples: 940288. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 14:08:54,320][00203] Avg episode reward: [(0, '20.566')] [2023-02-26 14:08:55,315][13314] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000917_3756032.pth... [2023-02-26 14:08:55,444][13314] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000871_3567616.pth [2023-02-26 14:08:59,311][00203] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 3756032. 
Throughput: 0: 202.7. Samples: 941566. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2023-02-26 14:08:59,315][00203] Avg episode reward: [(0, '20.977')] [2023-02-26 14:09:04,311][00203] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 3764224. Throughput: 0: 197.0. Samples: 942278. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 14:09:04,315][00203] Avg episode reward: [(0, '21.458')] [2023-02-26 14:09:09,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 3764224. Throughput: 0: 194.0. Samples: 943432. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 14:09:09,318][00203] Avg episode reward: [(0, '21.410')] [2023-02-26 14:09:10,304][13333] Updated weights for policy 0, policy_version 920 (0.0061) [2023-02-26 14:09:14,311][00203] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 3768320. Throughput: 0: 193.4. Samples: 944330. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 14:09:14,319][00203] Avg episode reward: [(0, '21.812')] [2023-02-26 14:09:16,535][13314] Saving new best policy, reward=21.812! [2023-02-26 14:09:19,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 3772416. Throughput: 0: 189.2. Samples: 944684. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 14:09:19,315][00203] Avg episode reward: [(0, '21.685')] [2023-02-26 14:09:24,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 3776512. Throughput: 0: 195.2. Samples: 946296. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 14:09:24,316][00203] Avg episode reward: [(0, '21.881')] [2023-02-26 14:09:29,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 777.6). Total num frames: 3780608. Throughput: 0: 193.2. Samples: 947538. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2023-02-26 14:09:29,318][00203] Avg episode reward: [(0, '21.989')] [2023-02-26 14:09:31,470][13314] Saving new best policy, reward=21.881! [2023-02-26 14:09:31,651][13314] Saving new best policy, reward=21.989! [2023-02-26 14:09:34,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 777.5). Total num frames: 3784704. Throughput: 0: 184.5. Samples: 947802. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2023-02-26 14:09:34,316][00203] Avg episode reward: [(0, '22.100')] [2023-02-26 14:09:37,831][13314] Saving new best policy, reward=22.100! [2023-02-26 14:09:39,314][00203] Fps is (10 sec: 818.9, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 3788800. Throughput: 0: 188.2. Samples: 948756. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2023-02-26 14:09:39,318][00203] Avg episode reward: [(0, '22.750')] [2023-02-26 14:09:42,093][13314] Saving new best policy, reward=22.750! [2023-02-26 14:09:44,311][00203] Fps is (10 sec: 819.2, 60 sec: 751.0, 300 sec: 777.5). Total num frames: 3792896. Throughput: 0: 195.9. Samples: 950380. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2023-02-26 14:09:44,317][00203] Avg episode reward: [(0, '22.722')] [2023-02-26 14:09:49,319][00203] Fps is (10 sec: 818.8, 60 sec: 750.8, 300 sec: 777.5). Total num frames: 3796992. Throughput: 0: 190.9. Samples: 950872. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2023-02-26 14:09:49,327][00203] Avg episode reward: [(0, '22.184')] [2023-02-26 14:09:54,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 777.5). Total num frames: 3801088. Throughput: 0: 187.1. Samples: 951850. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2023-02-26 14:09:54,317][00203] Avg episode reward: [(0, '21.584')] [2023-02-26 14:09:59,311][00203] Fps is (10 sec: 819.9, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 3805184. Throughput: 0: 190.4. Samples: 952896. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2023-02-26 14:09:59,315][00203] Avg episode reward: [(0, '21.814')] [2023-02-26 14:10:03,470][13333] Updated weights for policy 0, policy_version 930 (0.0619) [2023-02-26 14:10:04,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 3809280. Throughput: 0: 199.5. Samples: 953662. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2023-02-26 14:10:04,324][00203] Avg episode reward: [(0, '21.641')] [2023-02-26 14:10:09,312][00203] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 777.5). Total num frames: 3813376. Throughput: 0: 192.8. Samples: 954972. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 14:10:09,319][00203] Avg episode reward: [(0, '21.743')] [2023-02-26 14:10:14,312][00203] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 3817472. Throughput: 0: 187.0. Samples: 955952. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 14:10:14,315][00203] Avg episode reward: [(0, '21.712')] [2023-02-26 14:10:19,311][00203] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 777.6). Total num frames: 3817472. Throughput: 0: 193.6. Samples: 956516. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 14:10:19,322][00203] Avg episode reward: [(0, '21.179')] [2023-02-26 14:10:24,317][00203] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 3821568. Throughput: 0: 204.1. Samples: 957940. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 14:10:24,325][00203] Avg episode reward: [(0, '20.835')] [2023-02-26 14:10:29,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 3825664. Throughput: 0: 190.3. Samples: 958944. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 14:10:29,316][00203] Avg episode reward: [(0, '20.658')] [2023-02-26 14:10:34,311][00203] Fps is (10 sec: 409.7, 60 sec: 682.7, 300 sec: 763.7). Total num frames: 3825664. Throughput: 0: 181.1. Samples: 959022. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 14:10:34,314][00203] Avg episode reward: [(0, '20.658')] [2023-02-26 14:10:39,311][00203] Fps is (10 sec: 409.6, 60 sec: 682.7, 300 sec: 763.7). Total num frames: 3829760. Throughput: 0: 173.4. Samples: 959652. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2023-02-26 14:10:39,314][00203] Avg episode reward: [(0, '20.507')] [2023-02-26 14:10:44,311][00203] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 763.7). Total num frames: 3833856. Throughput: 0: 178.9. Samples: 960948. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2023-02-26 14:10:44,322][00203] Avg episode reward: [(0, '20.249')] [2023-02-26 14:10:49,311][00203] Fps is (10 sec: 819.2, 60 sec: 682.8, 300 sec: 763.7). Total num frames: 3837952. Throughput: 0: 170.0. Samples: 961314. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 14:10:49,317][00203] Avg episode reward: [(0, '20.411')] [2023-02-26 14:10:54,311][00203] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 763.7). Total num frames: 3842048. Throughput: 0: 176.0. Samples: 962890. 
Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 14:10:54,316][00203] Avg episode reward: [(0, '19.971')] [2023-02-26 14:10:55,647][13314] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000939_3846144.pth... [2023-02-26 14:10:55,784][13314] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000894_3661824.pth [2023-02-26 14:10:59,311][00203] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 763.7). Total num frames: 3846144. Throughput: 0: 171.3. Samples: 963660. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 14:10:59,316][00203] Avg episode reward: [(0, '19.795')] [2023-02-26 14:11:02,200][13333] Updated weights for policy 0, policy_version 940 (0.1527) [2023-02-26 14:11:04,318][00203] Fps is (10 sec: 818.6, 60 sec: 682.6, 300 sec: 763.6). Total num frames: 3850240. Throughput: 0: 169.9. Samples: 964162. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 14:11:04,329][00203] Avg episode reward: [(0, '19.845')] [2023-02-26 14:11:09,311][00203] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 763.7). Total num frames: 3854336. Throughput: 0: 170.3. Samples: 965604. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 14:11:09,314][00203] Avg episode reward: [(0, '19.854')] [2023-02-26 14:11:14,311][00203] Fps is (10 sec: 819.8, 60 sec: 682.7, 300 sec: 763.7). Total num frames: 3858432. Throughput: 0: 176.4. Samples: 966880. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 14:11:14,314][00203] Avg episode reward: [(0, '19.983')] [2023-02-26 14:11:19,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 763.7). Total num frames: 3862528. Throughput: 0: 183.1. Samples: 967260. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 14:11:19,318][00203] Avg episode reward: [(0, '20.271')] [2023-02-26 14:11:24,311][00203] Fps is (10 sec: 819.2, 60 sec: 751.0, 300 sec: 763.7). Total num frames: 3866624. Throughput: 0: 192.0. Samples: 968290. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 14:11:24,314][00203] Avg episode reward: [(0, '19.646')] [2023-02-26 14:11:29,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 763.7). Total num frames: 3870720. Throughput: 0: 195.5. Samples: 969746. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 14:11:29,316][00203] Avg episode reward: [(0, '19.901')] [2023-02-26 14:11:34,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 763.7). Total num frames: 3874816. Throughput: 0: 200.7. Samples: 970346. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 14:11:34,328][00203] Avg episode reward: [(0, '19.995')] [2023-02-26 14:11:39,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 763.7). Total num frames: 3878912. Throughput: 0: 187.2. Samples: 971314. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 14:11:39,314][00203] Avg episode reward: [(0, '20.123')] [2023-02-26 14:11:44,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 763.7). Total num frames: 3883008. Throughput: 0: 192.5. Samples: 972324. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 14:11:44,318][00203] Avg episode reward: [(0, '19.763')] [2023-02-26 14:11:49,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 763.7). Total num frames: 3887104. Throughput: 0: 201.0. Samples: 973206. 
Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 14:11:49,314][00203] Avg episode reward: [(0, '19.936')] [2023-02-26 14:11:53,587][13333] Updated weights for policy 0, policy_version 950 (0.1913) [2023-02-26 14:11:54,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 763.7). Total num frames: 3891200. Throughput: 0: 196.4. Samples: 974442. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 14:11:54,316][00203] Avg episode reward: [(0, '19.864')] [2023-02-26 14:11:57,727][13314] Signal inference workers to stop experience collection... (950 times) [2023-02-26 14:11:57,900][13333] InferenceWorker_p0-w0: stopping experience collection (950 times) [2023-02-26 14:11:59,162][13314] Signal inference workers to resume experience collection... (950 times) [2023-02-26 14:11:59,163][13333] InferenceWorker_p0-w0: resuming experience collection (950 times) [2023-02-26 14:11:59,314][00203] Fps is (10 sec: 818.9, 60 sec: 819.2, 300 sec: 763.7). Total num frames: 3895296. Throughput: 0: 189.8. Samples: 975420. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 14:11:59,318][00203] Avg episode reward: [(0, '19.966')] [2023-02-26 14:12:04,311][00203] Fps is (10 sec: 409.6, 60 sec: 751.0, 300 sec: 763.7). Total num frames: 3895296. Throughput: 0: 193.9. Samples: 975984. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 14:12:04,317][00203] Avg episode reward: [(0, '19.872')] [2023-02-26 14:12:09,311][00203] Fps is (10 sec: 409.7, 60 sec: 750.9, 300 sec: 763.7). Total num frames: 3899392. Throughput: 0: 200.4. Samples: 977306. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2023-02-26 14:12:09,323][00203] Avg episode reward: [(0, '20.524')] [2023-02-26 14:12:14,311][00203] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 763.7). Total num frames: 3907584. Throughput: 0: 193.8. Samples: 978466. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-26 14:12:14,313][00203] Avg episode reward: [(0, '20.501')] [2023-02-26 14:12:19,311][00203] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 777.5). Total num frames: 3911680. Throughput: 0: 198.0. Samples: 979254. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 14:12:19,314][00203] Avg episode reward: [(0, '20.414')] [2023-02-26 14:12:24,311][00203] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 763.7). Total num frames: 3911680. Throughput: 0: 194.2. Samples: 980054. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 14:12:24,317][00203] Avg episode reward: [(0, '20.863')] [2023-02-26 14:12:29,311][00203] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 763.7). Total num frames: 3915776. Throughput: 0: 201.4. Samples: 981386. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2023-02-26 14:12:29,316][00203] Avg episode reward: [(0, '20.973')] [2023-02-26 14:12:34,311][00203] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 777.5). Total num frames: 3923968. Throughput: 0: 199.6. Samples: 982188. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2023-02-26 14:12:34,315][00203] Avg episode reward: [(0, '21.250')] [2023-02-26 14:12:39,311][00203] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 777.6). Total num frames: 3928064. Throughput: 0: 198.5. Samples: 983374. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2023-02-26 14:12:39,316][00203] Avg episode reward: [(0, '21.254')] [2023-02-26 14:12:44,311][00203] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 763.7). Total num frames: 3928064. Throughput: 0: 197.4. Samples: 984302. 
Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2023-02-26 14:12:44,315][00203] Avg episode reward: [(0, '21.177')] [2023-02-26 14:12:45,407][13333] Updated weights for policy 0, policy_version 960 (0.0064) [2023-02-26 14:12:49,311][00203] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 763.7). Total num frames: 3932160. Throughput: 0: 195.3. Samples: 984772. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2023-02-26 14:12:49,316][00203] Avg episode reward: [(0, '21.578')] [2023-02-26 14:12:54,214][13314] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000962_3940352.pth... [2023-02-26 14:12:54,311][00203] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 777.5). Total num frames: 3940352. Throughput: 0: 201.6. Samples: 986380. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2023-02-26 14:12:54,315][00203] Avg episode reward: [(0, '22.412')] [2023-02-26 14:12:54,378][13314] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000917_3756032.pth [2023-02-26 14:12:59,311][00203] Fps is (10 sec: 819.2, 60 sec: 751.0, 300 sec: 763.7). Total num frames: 3940352. Throughput: 0: 200.4. Samples: 987486. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2023-02-26 14:12:59,317][00203] Avg episode reward: [(0, '22.624')] [2023-02-26 14:13:04,311][00203] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 763.7). Total num frames: 3944448. Throughput: 0: 190.8. Samples: 987842. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2023-02-26 14:13:04,321][00203] Avg episode reward: [(0, '22.711')] [2023-02-26 14:13:09,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 763.7). Total num frames: 3948544. Throughput: 0: 196.9. Samples: 988916. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2023-02-26 14:13:09,316][00203] Avg episode reward: [(0, '22.248')] [2023-02-26 14:13:14,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 763.7). Total num frames: 3952640. Throughput: 0: 204.8. Samples: 990602. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2023-02-26 14:13:14,315][00203] Avg episode reward: [(0, '21.633')] [2023-02-26 14:13:19,314][00203] Fps is (10 sec: 819.0, 60 sec: 750.9, 300 sec: 763.7). Total num frames: 3956736. Throughput: 0: 196.7. Samples: 991040. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2023-02-26 14:13:19,324][00203] Avg episode reward: [(0, '22.051')] [2023-02-26 14:13:24,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 763.7). Total num frames: 3960832. Throughput: 0: 188.4. Samples: 991850. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2023-02-26 14:13:24,320][00203] Avg episode reward: [(0, '21.990')] [2023-02-26 14:13:29,311][00203] Fps is (10 sec: 819.4, 60 sec: 819.2, 300 sec: 777.5). Total num frames: 3964928. Throughput: 0: 192.9. Samples: 992984. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2023-02-26 14:13:29,323][00203] Avg episode reward: [(0, '22.502')] [2023-02-26 14:13:34,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 763.7). Total num frames: 3969024. Throughput: 0: 195.6. Samples: 993576. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2023-02-26 14:13:34,322][00203] Avg episode reward: [(0, '22.130')] [2023-02-26 14:13:36,050][13333] Updated weights for policy 0, policy_version 970 (0.0059) [2023-02-26 14:13:39,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 763.7). Total num frames: 3973120. Throughput: 0: 193.6. Samples: 995094. 
Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2023-02-26 14:13:39,315][00203] Avg episode reward: [(0, '21.163')] [2023-02-26 14:13:44,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 763.7). Total num frames: 3977216. Throughput: 0: 188.4. Samples: 995964. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2023-02-26 14:13:44,320][00203] Avg episode reward: [(0, '21.224')] [2023-02-26 14:13:49,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 777.5). Total num frames: 3981312. Throughput: 0: 194.0. Samples: 996572. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2023-02-26 14:13:49,315][00203] Avg episode reward: [(0, '20.952')] [2023-02-26 14:13:54,311][00203] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 3985408. Throughput: 0: 196.5. Samples: 997760. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2023-02-26 14:13:54,316][00203] Avg episode reward: [(0, '20.935')] [2023-02-26 14:13:59,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 763.7). Total num frames: 3989504. Throughput: 0: 194.4. Samples: 999348. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2023-02-26 14:13:59,318][00203] Avg episode reward: [(0, '20.935')] [2023-02-26 14:14:04,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 777.5). Total num frames: 3993600. Throughput: 0: 192.5. Samples: 999704. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2023-02-26 14:14:04,317][00203] Avg episode reward: [(0, '21.139')] [2023-02-26 14:14:09,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 777.5). Total num frames: 3997696. Throughput: 0: 197.1. Samples: 1000718. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2023-02-26 14:14:09,319][00203] Avg episode reward: [(0, '22.172')] [2023-02-26 14:14:14,311][00203] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 777.5). Total num frames: 4001792. Throughput: 0: 201.8. Samples: 1002066. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2023-02-26 14:14:14,316][00203] Avg episode reward: [(0, '22.004')] [2023-02-26 14:14:17,186][13314] Stopping Batcher_0... [2023-02-26 14:14:17,195][13314] Loop batcher_evt_loop terminating... [2023-02-26 14:14:17,197][00203] Component Batcher_0 stopped! [2023-02-26 14:14:17,270][13338] Stopping RolloutWorker_w5... [2023-02-26 14:14:17,270][00203] Component RolloutWorker_w5 stopped! [2023-02-26 14:14:17,279][00203] Component RolloutWorker_w3 stopped! [2023-02-26 14:14:17,286][13334] Stopping RolloutWorker_w1... [2023-02-26 14:14:17,287][13334] Loop rollout_proc1_evt_loop terminating... [2023-02-26 14:14:17,276][13338] Loop rollout_proc5_evt_loop terminating... [2023-02-26 14:14:17,287][00203] Component RolloutWorker_w1 stopped! [2023-02-26 14:14:17,278][13337] Stopping RolloutWorker_w3... [2023-02-26 14:14:17,292][13337] Loop rollout_proc3_evt_loop terminating... [2023-02-26 14:14:17,294][13340] Stopping RolloutWorker_w7... [2023-02-26 14:14:17,297][13340] Loop rollout_proc7_evt_loop terminating... [2023-02-26 14:14:17,294][00203] Component RolloutWorker_w7 stopped! [2023-02-26 14:14:17,378][13332] Stopping RolloutWorker_w0... [2023-02-26 14:14:17,378][00203] Component RolloutWorker_w0 stopped! [2023-02-26 14:14:17,383][00203] Component RolloutWorker_w6 stopped! [2023-02-26 14:14:17,382][13339] Stopping RolloutWorker_w6... [2023-02-26 14:14:17,379][13332] Loop rollout_proc0_evt_loop terminating... [2023-02-26 14:14:17,430][13339] Loop rollout_proc6_evt_loop terminating... [2023-02-26 14:14:17,427][00203] Component RolloutWorker_w2 stopped! 
[2023-02-26 14:14:17,436][13336] Stopping RolloutWorker_w4... [2023-02-26 14:14:17,427][13335] Stopping RolloutWorker_w2... [2023-02-26 14:14:17,440][13335] Loop rollout_proc2_evt_loop terminating... [2023-02-26 14:14:17,444][00203] Component RolloutWorker_w4 stopped! [2023-02-26 14:14:17,461][13336] Loop rollout_proc4_evt_loop terminating... [2023-02-26 14:14:17,535][13333] Weights refcount: 2 0 [2023-02-26 14:14:17,546][00203] Component InferenceWorker_p0-w0 stopped! [2023-02-26 14:14:17,550][13333] Stopping InferenceWorker_p0-w0... [2023-02-26 14:14:17,553][13333] Loop inference_proc0-0_evt_loop terminating... [2023-02-26 14:14:23,732][13314] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000979_4009984.pth... [2023-02-26 14:14:23,854][13314] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000939_3846144.pth [2023-02-26 14:14:23,870][13314] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000979_4009984.pth... [2023-02-26 14:14:24,075][13314] Stopping LearnerWorker_p0... [2023-02-26 14:14:24,076][13314] Loop learner_proc0_evt_loop terminating... [2023-02-26 14:14:24,077][00203] Component LearnerWorker_p0 stopped! [2023-02-26 14:14:24,088][00203] Waiting for process learner_proc0 to stop... [2023-02-26 14:14:25,120][00203] Waiting for process inference_proc0-0 to join... [2023-02-26 14:14:25,129][00203] Waiting for process rollout_proc0 to join... [2023-02-26 14:14:25,341][00203] Waiting for process rollout_proc1 to join... [2023-02-26 14:14:25,343][00203] Waiting for process rollout_proc2 to join... [2023-02-26 14:14:25,353][00203] Waiting for process rollout_proc3 to join... [2023-02-26 14:14:25,356][00203] Waiting for process rollout_proc4 to join... [2023-02-26 14:14:25,358][00203] Waiting for process rollout_proc5 to join... [2023-02-26 14:14:25,361][00203] Waiting for process rollout_proc6 to join... [2023-02-26 14:14:25,363][00203] Waiting for process rollout_proc7 to join... [2023-02-26 14:14:25,365][00203] Batcher 0 profile tree view: batching: 28.8926, releasing_batches: 0.2719 [2023-02-26 14:14:25,368][00203] InferenceWorker_p0-w0 profile tree view: wait_policy: 0.0075 wait_policy_total: 61.6237 update_model: 95.6018 weight_update: 0.0118 one_step: 0.1583 handle_policy_step: 3414.2416 deserialize: 122.6422, stack: 16.5447, obs_to_device_normalize: 490.1901, forward: 2579.0537, send_messages: 76.1875 prepare_outputs: 46.5024 to_cpu: 4.6580 [2023-02-26 14:14:25,371][00203] Learner 0 profile tree view: misc: 0.0092, prepare_batch: 1342.0179 train: 3719.0938 epoch_init: 0.0210, minibatch_init: 0.0167, losses_postprocess: 0.2210, kl_divergence: 0.5868, after_optimizer: 2.7668 calculate_losses: 1775.9654 losses_init: 0.0121, forward_head: 1570.1221, bptt_initial: 6.3348, tail: 3.9013, advantages_returns: 0.2416, losses: 1.9079 bptt: 192.6915 bptt_forward_core: 191.4537 update: 1938.6131 clip: 4.2957 [2023-02-26 14:14:25,373][00203] RolloutWorker_w0 profile tree view: wait_for_trajectories: 0.9135, enqueue_policy_requests: 76.7528, env_step: 1938.0132, overhead: 77.7525, complete_rollouts: 18.5781 save_policy_outputs: 46.9979 split_output_tensors: 23.0825 [2023-02-26 14:14:25,376][00203] RolloutWorker_w7 profile tree view: wait_for_trajectories: 0.9207, enqueue_policy_requests: 74.8061, env_step: 1908.2730, overhead: 76.2911, complete_rollouts: 20.5078 save_policy_outputs: 42.6051 split_output_tensors: 20.9161 [2023-02-26 14:14:25,379][00203] Loop Runner_EvtLoop terminating... 
[2023-02-26 14:14:25,382][00203] Runner profile tree view: main_loop: 5133.0908 [2023-02-26 14:14:25,384][00203] Collected {0: 4009984}, FPS: 781.2 [2023-02-26 14:21:03,847][00203] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2023-02-26 14:21:03,850][00203] Overriding arg 'num_workers' with value 1 passed from command line [2023-02-26 14:21:03,852][00203] Adding new argument 'no_render'=True that is not in the saved config file! [2023-02-26 14:21:03,855][00203] Adding new argument 'save_video'=True that is not in the saved config file! [2023-02-26 14:21:03,858][00203] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2023-02-26 14:21:03,861][00203] Adding new argument 'video_name'=None that is not in the saved config file! [2023-02-26 14:21:03,864][00203] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! [2023-02-26 14:21:03,865][00203] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2023-02-26 14:21:03,868][00203] Adding new argument 'push_to_hub'=False that is not in the saved config file! [2023-02-26 14:21:03,870][00203] Adding new argument 'hf_repository'=None that is not in the saved config file! [2023-02-26 14:21:03,871][00203] Adding new argument 'policy_index'=0 that is not in the saved config file! [2023-02-26 14:21:03,874][00203] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2023-02-26 14:21:03,875][00203] Adding new argument 'train_script'=None that is not in the saved config file! [2023-02-26 14:21:03,878][00203] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2023-02-26 14:21:03,879][00203] Using frameskip 1 and render_action_repeat=4 for evaluation [2023-02-26 14:21:03,946][00203] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-26 14:21:03,952][00203] RunningMeanStd input shape: (3, 72, 128) [2023-02-26 14:21:03,959][00203] RunningMeanStd input shape: (1,) [2023-02-26 14:21:04,028][00203] ConvEncoder: input_channels=3 [2023-02-26 14:21:04,292][00203] Conv encoder output size: 512 [2023-02-26 14:21:04,298][00203] Policy head output size: 512 [2023-02-26 14:21:04,340][00203] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000979_4009984.pth... [2023-02-26 14:21:05,510][00203] Num frames 100... [2023-02-26 14:21:05,827][00203] Num frames 200... [2023-02-26 14:21:06,126][00203] Num frames 300... [2023-02-26 14:21:06,419][00203] Num frames 400... [2023-02-26 14:21:06,727][00203] Num frames 500... [2023-02-26 14:21:07,018][00203] Num frames 600... [2023-02-26 14:21:07,311][00203] Num frames 700... [2023-02-26 14:21:07,387][00203] Avg episode rewards: #0: 15.040, true rewards: #0: 7.040 [2023-02-26 14:21:07,390][00203] Avg episode reward: 15.040, avg true_objective: 7.040 [2023-02-26 14:21:07,666][00203] Num frames 800... [2023-02-26 14:21:07,906][00203] Num frames 900... [2023-02-26 14:21:08,176][00203] Num frames 1000... [2023-02-26 14:21:08,425][00203] Num frames 1100... [2023-02-26 14:21:08,677][00203] Num frames 1200... [2023-02-26 14:21:08,935][00203] Num frames 1300... [2023-02-26 14:21:09,099][00203] Avg episode rewards: #0: 14.720, true rewards: #0: 6.720 [2023-02-26 14:21:09,101][00203] Avg episode reward: 14.720, avg true_objective: 6.720 [2023-02-26 14:21:09,250][00203] Num frames 1400... [2023-02-26 14:21:09,491][00203] Num frames 1500... 
[2023-02-26 14:21:09,734][00203] Num frames 1600... [2023-02-26 14:21:09,969][00203] Num frames 1700... [2023-02-26 14:21:10,211][00203] Num frames 1800... [2023-02-26 14:21:10,449][00203] Num frames 1900... [2023-02-26 14:21:10,686][00203] Num frames 2000... [2023-02-26 14:21:10,911][00203] Num frames 2100... [2023-02-26 14:21:11,122][00203] Num frames 2200... [2023-02-26 14:21:11,340][00203] Num frames 2300... [2023-02-26 14:21:11,553][00203] Num frames 2400... [2023-02-26 14:21:11,757][00203] Num frames 2500... [2023-02-26 14:21:12,001][00203] Num frames 2600... [2023-02-26 14:21:12,256][00203] Num frames 2700... [2023-02-26 14:21:12,469][00203] Avg episode rewards: #0: 22.207, true rewards: #0: 9.207 [2023-02-26 14:21:12,478][00203] Avg episode reward: 22.207, avg true_objective: 9.207 [2023-02-26 14:21:12,567][00203] Num frames 2800... [2023-02-26 14:21:12,818][00203] Num frames 2900... [2023-02-26 14:21:13,042][00203] Num frames 3000... [2023-02-26 14:21:13,273][00203] Num frames 3100... [2023-02-26 14:21:13,515][00203] Num frames 3200... [2023-02-26 14:21:13,746][00203] Num frames 3300... [2023-02-26 14:21:13,966][00203] Num frames 3400... [2023-02-26 14:21:14,175][00203] Num frames 3500... [2023-02-26 14:21:14,379][00203] Num frames 3600... [2023-02-26 14:21:14,541][00203] Avg episode rewards: #0: 21.375, true rewards: #0: 9.125 [2023-02-26 14:21:14,543][00203] Avg episode reward: 21.375, avg true_objective: 9.125 [2023-02-26 14:21:14,649][00203] Num frames 3700... [2023-02-26 14:21:14,851][00203] Num frames 3800... [2023-02-26 14:21:15,125][00203] Num frames 3900... [2023-02-26 14:21:15,374][00203] Num frames 4000... [2023-02-26 14:21:15,613][00203] Num frames 4100... [2023-02-26 14:21:15,862][00203] Num frames 4200... [2023-02-26 14:21:16,099][00203] Num frames 4300... [2023-02-26 14:21:16,346][00203] Num frames 4400... [2023-02-26 14:21:16,592][00203] Num frames 4500... [2023-02-26 14:21:16,838][00203] Num frames 4600... [2023-02-26 14:21:17,051][00203] Num frames 4700... [2023-02-26 14:21:17,249][00203] Num frames 4800... [2023-02-26 14:21:17,463][00203] Num frames 4900... [2023-02-26 14:21:17,676][00203] Num frames 5000... [2023-02-26 14:21:17,957][00203] Num frames 5100... [2023-02-26 14:21:18,219][00203] Num frames 5200... [2023-02-26 14:21:18,330][00203] Avg episode rewards: #0: 25.236, true rewards: #0: 10.436 [2023-02-26 14:21:18,332][00203] Avg episode reward: 25.236, avg true_objective: 10.436 [2023-02-26 14:21:18,557][00203] Num frames 5300... [2023-02-26 14:21:18,815][00203] Num frames 5400... [2023-02-26 14:21:19,091][00203] Num frames 5500... [2023-02-26 14:21:19,394][00203] Num frames 5600... [2023-02-26 14:21:19,655][00203] Avg episode rewards: #0: 21.943, true rewards: #0: 9.443 [2023-02-26 14:21:19,658][00203] Avg episode reward: 21.943, avg true_objective: 9.443 [2023-02-26 14:21:19,770][00203] Num frames 5700... [2023-02-26 14:21:20,112][00203] Num frames 5800... [2023-02-26 14:21:20,452][00203] Num frames 5900... [2023-02-26 14:21:20,800][00203] Num frames 6000... [2023-02-26 14:21:21,135][00203] Num frames 6100... [2023-02-26 14:21:21,419][00203] Num frames 6200... [2023-02-26 14:21:21,638][00203] Avg episode rewards: #0: 20.504, true rewards: #0: 8.933 [2023-02-26 14:21:21,640][00203] Avg episode reward: 20.504, avg true_objective: 8.933 [2023-02-26 14:21:21,787][00203] Num frames 6300... [2023-02-26 14:21:22,080][00203] Num frames 6400... [2023-02-26 14:21:22,304][00203] Num frames 6500... [2023-02-26 14:21:22,512][00203] Num frames 6600... 
[2023-02-26 14:21:22,727][00203] Num frames 6700... [2023-02-26 14:21:22,931][00203] Num frames 6800... [2023-02-26 14:21:23,167][00203] Num frames 6900... [2023-02-26 14:21:23,428][00203] Num frames 7000... [2023-02-26 14:21:23,697][00203] Num frames 7100... [2023-02-26 14:21:23,955][00203] Num frames 7200... [2023-02-26 14:21:24,220][00203] Num frames 7300... [2023-02-26 14:21:24,463][00203] Num frames 7400... [2023-02-26 14:21:24,710][00203] Num frames 7500... [2023-02-26 14:21:24,967][00203] Num frames 7600... [2023-02-26 14:21:25,211][00203] Num frames 7700... [2023-02-26 14:21:25,479][00203] Num frames 7800... [2023-02-26 14:21:25,719][00203] Num frames 7900... [2023-02-26 14:21:25,964][00203] Num frames 8000... [2023-02-26 14:21:26,206][00203] Num frames 8100... [2023-02-26 14:21:26,426][00203] Num frames 8200... [2023-02-26 14:21:26,648][00203] Num frames 8300... [2023-02-26 14:21:26,713][00203] Avg episode rewards: #0: 23.876, true rewards: #0: 10.376 [2023-02-26 14:21:26,715][00203] Avg episode reward: 23.876, avg true_objective: 10.376 [2023-02-26 14:21:26,917][00203] Num frames 8400... [2023-02-26 14:21:27,117][00203] Num frames 8500... [2023-02-26 14:21:27,351][00203] Num frames 8600... [2023-02-26 14:21:27,584][00203] Num frames 8700... [2023-02-26 14:21:27,874][00203] Num frames 8800... [2023-02-26 14:21:28,272][00203] Num frames 8900... [2023-02-26 14:21:28,752][00203] Num frames 9000... [2023-02-26 14:21:29,007][00203] Num frames 9100... [2023-02-26 14:21:29,250][00203] Num frames 9200... [2023-02-26 14:21:29,451][00203] Num frames 9300... [2023-02-26 14:21:29,663][00203] Num frames 9400... [2023-02-26 14:21:29,874][00203] Num frames 9500... [2023-02-26 14:21:30,083][00203] Num frames 9600... [2023-02-26 14:21:30,295][00203] Num frames 9700... [2023-02-26 14:21:30,494][00203] Num frames 9800... [2023-02-26 14:21:30,698][00203] Num frames 9900... [2023-02-26 14:21:30,908][00203] Num frames 10000... [2023-02-26 14:21:31,119][00203] Num frames 10100... [2023-02-26 14:21:31,329][00203] Avg episode rewards: #0: 26.744, true rewards: #0: 11.300 [2023-02-26 14:21:31,332][00203] Avg episode reward: 26.744, avg true_objective: 11.300 [2023-02-26 14:21:31,403][00203] Num frames 10200... [2023-02-26 14:21:31,637][00203] Num frames 10300... [2023-02-26 14:21:31,889][00203] Num frames 10400... [2023-02-26 14:21:32,123][00203] Num frames 10500... [2023-02-26 14:21:32,440][00203] Num frames 10600... [2023-02-26 14:21:32,713][00203] Num frames 10700... [2023-02-26 14:21:33,002][00203] Num frames 10800... [2023-02-26 14:21:33,277][00203] Num frames 10900... [2023-02-26 14:21:33,547][00203] Num frames 11000... [2023-02-26 14:21:33,835][00203] Num frames 11100... [2023-02-26 14:21:34,116][00203] Num frames 11200... [2023-02-26 14:21:34,407][00203] Num frames 11300... [2023-02-26 14:21:34,746][00203] Num frames 11400... [2023-02-26 14:21:35,081][00203] Num frames 11500... [2023-02-26 14:21:35,409][00203] Num frames 11600... [2023-02-26 14:21:35,724][00203] Avg episode rewards: #0: 28.492, true rewards: #0: 11.692 [2023-02-26 14:21:35,727][00203] Avg episode reward: 28.492, avg true_objective: 11.692 [2023-02-26 14:23:03,693][00203] Replay video saved to /content/train_dir/default_experiment/replay.mp4! 
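The block above is the first evaluation pass: the saved training config is reloaded, the evaluation-only arguments listed in the "Adding new argument ..." entries (no_render, save_video, max_num_episodes=10, and so on) are layered on top, checkpoint_000000979_4009984.pth is restored, ten episodes are rolled out, and the rollout is written to replay.mp4. A minimal sketch of how such a pass can be launched with sample-factory 2.x follows; it is a reconstruction under stated assumptions, not taken from this run: parse_vizdoom_cfg is a helper defined here for illustration, the imports assume the sf_examples VizDoom helpers bundled with sample-factory, and the Doom environments and encoder factory are assumed to be registered in the process before enjoy() is called.

from sample_factory.cfg.arguments import parse_full_cfg, parse_sf_args
from sample_factory.enjoy import enjoy
from sf_examples.vizdoom.doom.doom_params import add_doom_env_args, doom_override_defaults


def parse_vizdoom_cfg(argv=None, evaluation=False):
    # Illustrative helper (assumption, see lead-in): build the config the same
    # way training did - sample-factory defaults, Doom-specific arguments,
    # then the Doom overrides for algo parameters.
    parser, _ = parse_sf_args(argv=argv, evaluation=evaluation)
    add_doom_env_args(parser)
    doom_override_defaults(parser)
    return parse_full_cfg(parser, argv)


# Evaluation overrides mirroring the "Adding new argument ..." entries above.
eval_cfg = parse_vizdoom_cfg(
    argv=[
        "--env=doom_health_gathering_supreme",
        "--train_dir=/content/train_dir",
        "--experiment=default_experiment",
        "--num_workers=1",
        "--no_render",
        "--save_video",
        "--max_num_episodes=10",
    ],
    evaluation=True,
)
status = enjoy(eval_cfg)  # loads the latest checkpoint and writes replay.mp4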
[2023-02-26 14:24:38,738][00203] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2023-02-26 14:24:38,742][00203] Overriding arg 'num_workers' with value 1 passed from command line [2023-02-26 14:24:38,755][00203] Adding new argument 'no_render'=True that is not in the saved config file! [2023-02-26 14:24:38,763][00203] Adding new argument 'save_video'=True that is not in the saved config file! [2023-02-26 14:24:38,772][00203] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2023-02-26 14:24:38,779][00203] Adding new argument 'video_name'=None that is not in the saved config file! [2023-02-26 14:24:38,796][00203] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! [2023-02-26 14:24:38,802][00203] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2023-02-26 14:24:38,814][00203] Adding new argument 'push_to_hub'=True that is not in the saved config file! [2023-02-26 14:24:38,817][00203] Adding new argument 'hf_repository'='habanoz/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! [2023-02-26 14:24:38,820][00203] Adding new argument 'policy_index'=0 that is not in the saved config file! [2023-02-26 14:24:38,826][00203] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2023-02-26 14:24:38,828][00203] Adding new argument 'train_script'=None that is not in the saved config file! [2023-02-26 14:24:38,834][00203] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2023-02-26 14:24:38,837][00203] Using frameskip 1 and render_action_repeat=4 for evaluation [2023-02-26 14:24:38,898][00203] RunningMeanStd input shape: (3, 72, 128) [2023-02-26 14:24:38,902][00203] RunningMeanStd input shape: (1,) [2023-02-26 14:24:38,927][00203] ConvEncoder: input_channels=3 [2023-02-26 14:24:39,062][00203] Conv encoder output size: 512 [2023-02-26 14:24:39,066][00203] Policy head output size: 512 [2023-02-26 14:24:39,098][00203] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000979_4009984.pth... [2023-02-26 14:24:40,206][00203] Num frames 100... [2023-02-26 14:24:40,384][00203] Num frames 200... [2023-02-26 14:24:40,572][00203] Num frames 300... [2023-02-26 14:24:40,770][00203] Num frames 400... [2023-02-26 14:24:40,957][00203] Num frames 500... [2023-02-26 14:24:41,159][00203] Num frames 600... [2023-02-26 14:24:41,344][00203] Num frames 700... [2023-02-26 14:24:41,525][00203] Num frames 800... [2023-02-26 14:24:41,709][00203] Num frames 900... [2023-02-26 14:24:41,885][00203] Num frames 1000... [2023-02-26 14:24:42,229][00203] Num frames 1100... [2023-02-26 14:24:42,404][00203] Avg episode rewards: #0: 27.620, true rewards: #0: 11.620 [2023-02-26 14:24:42,406][00203] Avg episode reward: 27.620, avg true_objective: 11.620 [2023-02-26 14:24:42,484][00203] Num frames 1200... [2023-02-26 14:24:42,669][00203] Num frames 1300... [2023-02-26 14:24:43,057][00203] Num frames 1400... [2023-02-26 14:24:43,280][00203] Num frames 1500... [2023-02-26 14:24:43,543][00203] Num frames 1600... [2023-02-26 14:24:43,797][00203] Avg episode rewards: #0: 17.740, true rewards: #0: 8.240 [2023-02-26 14:24:43,801][00203] Avg episode reward: 17.740, avg true_objective: 8.240 [2023-02-26 14:24:43,913][00203] Num frames 1700... [2023-02-26 14:24:44,147][00203] Num frames 1800... [2023-02-26 14:24:44,367][00203] Num frames 1900... 
[2023-02-26 14:24:44,585][00203] Num frames 2000... [2023-02-26 14:24:44,810][00203] Num frames 2100... [2023-02-26 14:24:45,037][00203] Num frames 2200... [2023-02-26 14:24:45,259][00203] Num frames 2300... [2023-02-26 14:24:45,483][00203] Num frames 2400... [2023-02-26 14:24:45,702][00203] Num frames 2500... [2023-02-26 14:24:45,918][00203] Num frames 2600... [2023-02-26 14:24:46,138][00203] Num frames 2700... [2023-02-26 14:24:46,365][00203] Num frames 2800... [2023-02-26 14:24:46,612][00203] Num frames 2900... [2023-02-26 14:24:46,912][00203] Num frames 3000... [2023-02-26 14:24:47,214][00203] Avg episode rewards: #0: 22.960, true rewards: #0: 10.293 [2023-02-26 14:24:47,219][00203] Avg episode reward: 22.960, avg true_objective: 10.293 [2023-02-26 14:24:47,255][00203] Num frames 3100... [2023-02-26 14:24:47,498][00203] Num frames 3200... [2023-02-26 14:24:47,752][00203] Num frames 3300... [2023-02-26 14:24:48,011][00203] Num frames 3400... [2023-02-26 14:24:48,263][00203] Num frames 3500... [2023-02-26 14:24:48,512][00203] Num frames 3600... [2023-02-26 14:24:48,759][00203] Num frames 3700... [2023-02-26 14:24:49,019][00203] Num frames 3800... [2023-02-26 14:24:49,265][00203] Num frames 3900... [2023-02-26 14:24:49,515][00203] Num frames 4000... [2023-02-26 14:24:49,769][00203] Num frames 4100... [2023-02-26 14:24:49,862][00203] Avg episode rewards: #0: 22.780, true rewards: #0: 10.280 [2023-02-26 14:24:49,865][00203] Avg episode reward: 22.780, avg true_objective: 10.280 [2023-02-26 14:24:50,100][00203] Num frames 4200... [2023-02-26 14:24:50,358][00203] Num frames 4300... [2023-02-26 14:24:50,607][00203] Num frames 4400... [2023-02-26 14:24:50,858][00203] Num frames 4500... [2023-02-26 14:24:51,084][00203] Num frames 4600... [2023-02-26 14:24:51,200][00203] Avg episode rewards: #0: 19.648, true rewards: #0: 9.248 [2023-02-26 14:24:51,202][00203] Avg episode reward: 19.648, avg true_objective: 9.248 [2023-02-26 14:24:51,377][00203] Num frames 4700... [2023-02-26 14:24:51,601][00203] Num frames 4800... [2023-02-26 14:24:51,823][00203] Num frames 4900... [2023-02-26 14:24:52,042][00203] Num frames 5000... [2023-02-26 14:24:52,252][00203] Num frames 5100... [2023-02-26 14:24:52,477][00203] Num frames 5200... [2023-02-26 14:24:52,687][00203] Num frames 5300... [2023-02-26 14:24:52,914][00203] Num frames 5400... [2023-02-26 14:24:53,144][00203] Num frames 5500... [2023-02-26 14:24:53,372][00203] Num frames 5600... [2023-02-26 14:24:53,593][00203] Num frames 5700... [2023-02-26 14:24:53,834][00203] Num frames 5800... [2023-02-26 14:24:54,081][00203] Num frames 5900... [2023-02-26 14:24:54,307][00203] Num frames 6000... [2023-02-26 14:24:54,541][00203] Num frames 6100... [2023-02-26 14:24:54,768][00203] Num frames 6200... [2023-02-26 14:24:54,986][00203] Num frames 6300... [2023-02-26 14:24:55,200][00203] Num frames 6400... [2023-02-26 14:24:55,435][00203] Num frames 6500... [2023-02-26 14:24:55,665][00203] Num frames 6600... [2023-02-26 14:24:55,905][00203] Num frames 6700... [2023-02-26 14:24:56,019][00203] Avg episode rewards: #0: 26.040, true rewards: #0: 11.207 [2023-02-26 14:24:56,021][00203] Avg episode reward: 26.040, avg true_objective: 11.207 [2023-02-26 14:24:56,173][00203] Num frames 6800... [2023-02-26 14:24:56,357][00203] Num frames 6900... [2023-02-26 14:24:56,541][00203] Num frames 7000... [2023-02-26 14:24:56,733][00203] Num frames 7100... [2023-02-26 14:24:56,909][00203] Num frames 7200... [2023-02-26 14:24:57,100][00203] Num frames 7300... 
[2023-02-26 14:24:57,328][00203] Num frames 7400... [2023-02-26 14:24:57,576][00203] Num frames 7500... [2023-02-26 14:24:57,833][00203] Avg episode rewards: #0: 24.840, true rewards: #0: 10.840 [2023-02-26 14:24:57,836][00203] Avg episode reward: 24.840, avg true_objective: 10.840 [2023-02-26 14:24:57,871][00203] Num frames 7600... [2023-02-26 14:24:58,099][00203] Num frames 7700... [2023-02-26 14:24:58,284][00203] Num frames 7800... [2023-02-26 14:24:58,486][00203] Num frames 7900... [2023-02-26 14:24:58,659][00203] Num frames 8000... [2023-02-26 14:24:58,851][00203] Num frames 8100... [2023-02-26 14:24:59,029][00203] Num frames 8200... [2023-02-26 14:24:59,229][00203] Num frames 8300... [2023-02-26 14:24:59,445][00203] Num frames 8400... [2023-02-26 14:24:59,673][00203] Num frames 8500... [2023-02-26 14:24:59,897][00203] Num frames 8600... [2023-02-26 14:25:00,124][00203] Num frames 8700... [2023-02-26 14:25:00,314][00203] Avg episode rewards: #0: 24.840, true rewards: #0: 10.965 [2023-02-26 14:25:00,318][00203] Avg episode reward: 24.840, avg true_objective: 10.965 [2023-02-26 14:25:00,385][00203] Num frames 8800... [2023-02-26 14:25:00,578][00203] Num frames 8900... [2023-02-26 14:25:00,762][00203] Num frames 9000... [2023-02-26 14:25:00,956][00203] Num frames 9100... [2023-02-26 14:25:01,216][00203] Num frames 9200... [2023-02-26 14:25:01,486][00203] Num frames 9300... [2023-02-26 14:25:01,749][00203] Num frames 9400... [2023-02-26 14:25:02,010][00203] Num frames 9500... [2023-02-26 14:25:02,174][00203] Avg episode rewards: #0: 23.489, true rewards: #0: 10.600 [2023-02-26 14:25:02,177][00203] Avg episode reward: 23.489, avg true_objective: 10.600 [2023-02-26 14:25:02,368][00203] Num frames 9600... [2023-02-26 14:25:02,666][00203] Num frames 9700... [2023-02-26 14:25:02,973][00203] Num frames 9800... [2023-02-26 14:25:03,271][00203] Num frames 9900... [2023-02-26 14:25:03,518][00203] Num frames 10000... [2023-02-26 14:25:03,779][00203] Num frames 10100... [2023-02-26 14:25:04,042][00203] Num frames 10200... [2023-02-26 14:25:04,334][00203] Num frames 10300... [2023-02-26 14:25:04,628][00203] Avg episode rewards: #0: 22.672, true rewards: #0: 10.372 [2023-02-26 14:25:04,632][00203] Avg episode reward: 22.672, avg true_objective: 10.372 [2023-02-26 14:26:21,693][00203] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
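This final pass repeats the evaluation with push_to_hub=True and hf_repository='habanoz/rl_course_vizdoom_health_gathering_supreme': after the ten episodes finish and replay.mp4 is regenerated, the experiment directory (config, latest checkpoint and the video) is uploaded to that repository on the Hugging Face Hub. Below is a sketch under the same assumptions as the previous snippet, reusing the illustrative parse_vizdoom_cfg helper defined there; uploading additionally requires that a Hugging Face access token is configured beforehand (for example via huggingface-cli login).

from sample_factory.enjoy import enjoy

# parse_vizdoom_cfg is the illustrative helper from the previous sketch.
# The extra flags match the overrides logged for this run, including the
# tighter max_num_frames=100000 cap and the Hub upload options.
push_cfg = parse_vizdoom_cfg(
    argv=[
        "--env=doom_health_gathering_supreme",
        "--train_dir=/content/train_dir",
        "--experiment=default_experiment",
        "--num_workers=1",
        "--no_render",
        "--save_video",
        "--max_num_frames=100000",
        "--max_num_episodes=10",
        "--push_to_hub",
        "--hf_repository=habanoz/rl_course_vizdoom_health_gathering_supreme",
    ],
    evaluation=True,
)
status = enjoy(push_cfg)  # evaluates, saves the video, then uploads the experiment to the Hub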