[2024-10-06 11:11:26,114][00491] Saving configuration to /content/train_dir/default_experiment/config.json...
[2024-10-06 11:11:26,119][00491] Rollout worker 0 uses device cpu
[2024-10-06 11:11:26,123][00491] Rollout worker 1 uses device cpu
[2024-10-06 11:11:26,131][00491] Rollout worker 2 uses device cpu
[2024-10-06 11:11:26,132][00491] Rollout worker 3 uses device cpu
[2024-10-06 11:11:26,135][00491] Rollout worker 4 uses device cpu
[2024-10-06 11:11:26,137][00491] Rollout worker 5 uses device cpu
[2024-10-06 11:11:26,140][00491] Rollout worker 6 uses device cpu
[2024-10-06 11:11:26,143][00491] Rollout worker 7 uses device cpu
[2024-10-06 11:14:15,798][00491] Environment doom_basic already registered, overwriting...
[2024-10-06 11:14:15,805][00491] Environment doom_two_colors_easy already registered, overwriting...
[2024-10-06 11:14:15,807][00491] Environment doom_two_colors_hard already registered, overwriting...
[2024-10-06 11:14:15,809][00491] Environment doom_dm already registered, overwriting...
[2024-10-06 11:14:15,814][00491] Environment doom_dwango5 already registered, overwriting...
[2024-10-06 11:14:15,816][00491] Environment doom_my_way_home_flat_actions already registered, overwriting...
[2024-10-06 11:14:15,817][00491] Environment doom_defend_the_center_flat_actions already registered, overwriting...
[2024-10-06 11:14:15,818][00491] Environment doom_my_way_home already registered, overwriting...
[2024-10-06 11:14:15,823][00491] Environment doom_deadly_corridor already registered, overwriting...
[2024-10-06 11:14:15,825][00491] Environment doom_defend_the_center already registered, overwriting...
[2024-10-06 11:14:15,826][00491] Environment doom_defend_the_line already registered, overwriting...
[2024-10-06 11:14:15,831][00491] Environment doom_health_gathering already registered, overwriting...
[2024-10-06 11:14:15,833][00491] Environment doom_health_gathering_supreme already registered, overwriting...
[2024-10-06 11:14:15,835][00491] Environment doom_battle already registered, overwriting...
[2024-10-06 11:14:15,836][00491] Environment doom_battle2 already registered, overwriting...
[2024-10-06 11:14:15,839][00491] Environment doom_duel_bots already registered, overwriting...
[2024-10-06 11:14:15,841][00491] Environment doom_deathmatch_bots already registered, overwriting...
[2024-10-06 11:14:15,843][00491] Environment doom_duel already registered, overwriting...
[2024-10-06 11:14:15,845][00491] Environment doom_deathmatch_full already registered, overwriting...
[2024-10-06 11:14:15,847][00491] Environment doom_benchmark already registered, overwriting...
[2024-10-06 11:14:15,849][00491] register_encoder_factory: 
[2024-10-06 11:14:15,879][00491] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2024-10-06 11:14:15,888][00491] Experiment dir /content/train_dir/default_experiment already exists!
[2024-10-06 11:14:15,890][00491] Resuming existing experiment from /content/train_dir/default_experiment...
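The block of "already registered, overwriting..." messages above is what Sample Factory prints when the environment registration helpers run a second time in the same process (for example, when a Colab cell is re-executed before resuming a run). A minimal sketch of that registration step, assuming the `sf_examples` VizDoom helpers that ship with Sample Factory 2.x (`DOOM_ENVS`, `make_doom_env_from_spec`, `register_env` are the example's names, not something recorded in this log):

```python
import functools

from sample_factory.envs.env_utils import register_env
from sf_examples.vizdoom.doom.doom_utils import DOOM_ENVS, make_doom_env_from_spec


def register_vizdoom_envs():
    # Bind each scenario spec (doom_basic, doom_health_gathering_supreme, ...)
    # to a factory function; running this twice in one process produces the
    # "already registered, overwriting..." warnings seen in the log.
    for env_spec in DOOM_ENVS:
        make_env_func = functools.partial(make_doom_env_from_spec, env_spec)
        register_env(env_spec.name, make_env_func)
```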
[2024-10-06 11:14:15,893][00491] Weights and Biases integration disabled
[2024-10-06 11:14:15,899][00491] Environment var CUDA_VISIBLE_DEVICES is 
[2024-10-06 11:14:19,175][00491] Starting experiment with the following configuration:
help=False
algo=APPO
env=doom_health_gathering_supreme
experiment=default_experiment
train_dir=/content/train_dir
restart_behavior=resume
device=gpu
seed=None
num_policies=1
async_rl=True
serial_mode=False
batched_sampling=False
num_batches_to_accumulate=2
worker_num_splits=2
policy_workers_per_policy=1
max_policy_lag=1000
num_workers=8
num_envs_per_worker=4
batch_size=1024
num_batches_per_epoch=1
num_epochs=1
rollout=32
recurrence=32
shuffle_minibatches=False
gamma=0.99
reward_scale=1.0
reward_clip=1000.0
value_bootstrap=False
normalize_returns=True
exploration_loss_coeff=0.001
value_loss_coeff=0.5
kl_loss_coeff=0.0
exploration_loss=symmetric_kl
gae_lambda=0.95
ppo_clip_ratio=0.1
ppo_clip_value=0.2
with_vtrace=False
vtrace_rho=1.0
vtrace_c=1.0
optimizer=adam
adam_eps=1e-06
adam_beta1=0.9
adam_beta2=0.999
max_grad_norm=4.0
learning_rate=0.0001
lr_schedule=constant
lr_schedule_kl_threshold=0.008
lr_adaptive_min=1e-06
lr_adaptive_max=0.01
obs_subtract_mean=0.0
obs_scale=255.0
normalize_input=True
normalize_input_keys=None
decorrelate_experience_max_seconds=0
decorrelate_envs_on_one_worker=True
actor_worker_gpus=[]
set_workers_cpu_affinity=True
force_envs_single_thread=False
default_niceness=0
log_to_file=True
experiment_summaries_interval=10
flush_summaries_interval=30
stats_avg=100
summaries_use_frameskip=True
heartbeat_interval=20
heartbeat_reporting_interval=600
train_for_env_steps=10000000
train_for_seconds=10000000000
save_every_sec=120
keep_checkpoints=2
load_checkpoint_kind=latest
save_milestones_sec=-1
save_best_every_sec=5
save_best_metric=reward
save_best_after=100000
benchmark=False
encoder_mlp_layers=[512, 512]
encoder_conv_architecture=convnet_simple
encoder_conv_mlp_layers=[512]
use_rnn=True
rnn_size=512
rnn_type=gru
rnn_num_layers=1
decoder_mlp_layers=[]
nonlinearity=elu
policy_initialization=orthogonal
policy_init_gain=1.0
actor_critic_share_weights=True
adaptive_stddev=True
continuous_tanh_scale=0.0
initial_stddev=1.0
use_env_info_cache=False
env_gpu_actions=False
env_gpu_observations=True
env_frameskip=4
env_framestack=1
pixel_format=CHW
use_record_episode_statistics=False
with_wandb=False
wandb_user=None
wandb_project=sample_factory
wandb_group=None
wandb_job_type=SF
wandb_tags=[]
with_pbt=False
pbt_mix_policies_in_one_env=True
pbt_period_env_steps=5000000
pbt_start_mutation=20000000
pbt_replace_fraction=0.3
pbt_mutation_rate=0.15
pbt_replace_reward_gap=0.1
pbt_replace_reward_gap_absolute=1e-06
pbt_optimize_gamma=False
pbt_target_objective=true_objective
pbt_perturb_min=1.1
pbt_perturb_max=1.5
num_agents=-1
num_humans=0
num_bots=-1
start_bot_difficulty=None
timelimit=None
res_w=128
res_h=72
wide_aspect_ratio=False
eval_env_frameskip=1
fps=35
command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=10000000
cli_args={'env': 'doom_health_gathering_supreme', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 10000000}
git_hash=unknown
git_repo_name=not a git repository
[2024-10-06 11:14:19,179][00491] Saving configuration to /content/train_dir/default_experiment/config.json...
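The `command_line` and `cli_args` fields in the configuration above correspond to a launch along the following lines, run after registering the environments as in the earlier sketch. This is a sketch based on Sample Factory's VizDoom example; `parse_sf_args`, `parse_full_cfg`, `run_rl`, and the Doom parameter helpers are the example's names, not something recorded in this log:

```python
from sample_factory.cfg.arguments import parse_full_cfg, parse_sf_args
from sample_factory.train import run_rl
from sf_examples.vizdoom.doom.doom_params import add_doom_env_args, doom_override_defaults


def parse_vizdoom_cfg(argv=None, evaluation=False):
    parser, _ = parse_sf_args(argv=argv, evaluation=evaluation)
    add_doom_env_args(parser)       # Doom-specific arguments
    doom_override_defaults(parser)  # Doom-tuned default hyperparameters
    return parse_full_cfg(parser, argv)


# Matches command_line in the logged config; with restart_behavior=resume,
# an existing experiment dir is picked up instead of being overwritten.
cfg = parse_vizdoom_cfg(
    argv=[
        "--env=doom_health_gathering_supreme",
        "--num_workers=8",
        "--num_envs_per_worker=4",
        "--train_for_env_steps=10000000",
    ]
)
status = run_rl(cfg)
```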
[2024-10-06 11:14:19,184][00491] Rollout worker 0 uses device cpu
[2024-10-06 11:14:19,186][00491] Rollout worker 1 uses device cpu
[2024-10-06 11:14:19,188][00491] Rollout worker 2 uses device cpu
[2024-10-06 11:14:19,190][00491] Rollout worker 3 uses device cpu
[2024-10-06 11:14:19,194][00491] Rollout worker 4 uses device cpu
[2024-10-06 11:14:19,196][00491] Rollout worker 5 uses device cpu
[2024-10-06 11:14:19,201][00491] Rollout worker 6 uses device cpu
[2024-10-06 11:14:19,203][00491] Rollout worker 7 uses device cpu
[2024-10-06 11:17:44,853][00491] Environment doom_basic already registered, overwriting...
[2024-10-06 11:17:44,858][00491] Environment doom_two_colors_easy already registered, overwriting...
[2024-10-06 11:17:44,861][00491] Environment doom_two_colors_hard already registered, overwriting...
[2024-10-06 11:17:44,862][00491] Environment doom_dm already registered, overwriting...
[2024-10-06 11:17:44,867][00491] Environment doom_dwango5 already registered, overwriting...
[2024-10-06 11:17:44,868][00491] Environment doom_my_way_home_flat_actions already registered, overwriting...
[2024-10-06 11:17:44,870][00491] Environment doom_defend_the_center_flat_actions already registered, overwriting...
[2024-10-06 11:17:44,871][00491] Environment doom_my_way_home already registered, overwriting...
[2024-10-06 11:17:44,874][00491] Environment doom_deadly_corridor already registered, overwriting...
[2024-10-06 11:17:44,876][00491] Environment doom_defend_the_center already registered, overwriting...
[2024-10-06 11:17:44,878][00491] Environment doom_defend_the_line already registered, overwriting...
[2024-10-06 11:17:44,880][00491] Environment doom_health_gathering already registered, overwriting...
[2024-10-06 11:17:44,884][00491] Environment doom_health_gathering_supreme already registered, overwriting...
[2024-10-06 11:17:44,885][00491] Environment doom_battle already registered, overwriting...
[2024-10-06 11:17:44,886][00491] Environment doom_battle2 already registered, overwriting...
[2024-10-06 11:17:44,890][00491] Environment doom_duel_bots already registered, overwriting...
[2024-10-06 11:17:44,891][00491] Environment doom_deathmatch_bots already registered, overwriting...
[2024-10-06 11:17:44,892][00491] Environment doom_duel already registered, overwriting...
[2024-10-06 11:17:44,894][00491] Environment doom_deathmatch_full already registered, overwriting...
[2024-10-06 11:17:44,897][00491] Environment doom_benchmark already registered, overwriting...
[2024-10-06 11:17:44,898][00491] register_encoder_factory: 
[2024-10-06 11:17:44,938][00491] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2024-10-06 11:17:44,940][00491] Overriding arg 'device' with value 'cpu' passed from command line
[2024-10-06 11:17:44,962][00491] Experiment dir /content/train_dir/default_experiment already exists!
[2024-10-06 11:17:44,964][00491] Resuming existing experiment from /content/train_dir/default_experiment...
[2024-10-06 11:17:44,966][00491] Weights and Biases integration disabled
[2024-10-06 11:17:44,972][00491] Environment var CUDA_VISIBLE_DEVICES is 
[2024-10-06 11:17:48,272][00491] Starting experiment with the following configuration:
help=False
algo=APPO
env=doom_health_gathering_supreme
experiment=default_experiment
train_dir=/content/train_dir
restart_behavior=resume
device=cpu
seed=None
num_policies=1
async_rl=True
serial_mode=False
batched_sampling=False
num_batches_to_accumulate=2
worker_num_splits=2
policy_workers_per_policy=1
max_policy_lag=1000
num_workers=8
num_envs_per_worker=4
batch_size=1024
num_batches_per_epoch=1
num_epochs=1
rollout=32
recurrence=32
shuffle_minibatches=False
gamma=0.99
reward_scale=1.0
reward_clip=1000.0
value_bootstrap=False
normalize_returns=True
exploration_loss_coeff=0.001
value_loss_coeff=0.5
kl_loss_coeff=0.0
exploration_loss=symmetric_kl
gae_lambda=0.95
ppo_clip_ratio=0.1
ppo_clip_value=0.2
with_vtrace=False
vtrace_rho=1.0
vtrace_c=1.0
optimizer=adam
adam_eps=1e-06
adam_beta1=0.9
adam_beta2=0.999
max_grad_norm=4.0
learning_rate=0.0001
lr_schedule=constant
lr_schedule_kl_threshold=0.008
lr_adaptive_min=1e-06
lr_adaptive_max=0.01
obs_subtract_mean=0.0
obs_scale=255.0
normalize_input=True
normalize_input_keys=None
decorrelate_experience_max_seconds=0
decorrelate_envs_on_one_worker=True
actor_worker_gpus=[]
set_workers_cpu_affinity=True
force_envs_single_thread=False
default_niceness=0
log_to_file=True
experiment_summaries_interval=10
flush_summaries_interval=30
stats_avg=100
summaries_use_frameskip=True
heartbeat_interval=20
heartbeat_reporting_interval=600
train_for_env_steps=10000000
train_for_seconds=10000000000
save_every_sec=120
keep_checkpoints=2
load_checkpoint_kind=latest
save_milestones_sec=-1
save_best_every_sec=5
save_best_metric=reward
save_best_after=100000
benchmark=False
encoder_mlp_layers=[512, 512]
encoder_conv_architecture=convnet_simple
encoder_conv_mlp_layers=[512]
use_rnn=True
rnn_size=512
rnn_type=gru
rnn_num_layers=1
decoder_mlp_layers=[]
nonlinearity=elu
policy_initialization=orthogonal
policy_init_gain=1.0
actor_critic_share_weights=True
adaptive_stddev=True
continuous_tanh_scale=0.0
initial_stddev=1.0
use_env_info_cache=False
env_gpu_actions=False
env_gpu_observations=True
env_frameskip=4
env_framestack=1
pixel_format=CHW
use_record_episode_statistics=False
with_wandb=False
wandb_user=None
wandb_project=sample_factory
wandb_group=None
wandb_job_type=SF
wandb_tags=[]
with_pbt=False
pbt_mix_policies_in_one_env=True
pbt_period_env_steps=5000000
pbt_start_mutation=20000000
pbt_replace_fraction=0.3
pbt_mutation_rate=0.15
pbt_replace_reward_gap=0.1
pbt_replace_reward_gap_absolute=1e-06
pbt_optimize_gamma=False
pbt_target_objective=true_objective
pbt_perturb_min=1.1
pbt_perturb_max=1.5
num_agents=-1
num_humans=0
num_bots=-1
start_bot_difficulty=None
timelimit=None
res_w=128
res_h=72
wide_aspect_ratio=False
eval_env_frameskip=1
fps=35
command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=10000000
cli_args={'env': 'doom_health_gathering_supreme', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 10000000}
git_hash=unknown
git_repo_name=not a git repository
[2024-10-06 11:17:48,274][00491] Saving configuration to /content/train_dir/default_experiment/config.json...
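`CUDA_VISIBLE_DEVICES` is empty in this runtime, so the `device=gpu` setting stored in config.json cannot be honored; the run is restarted with the device overridden from the command line, which is what produces the "Overriding arg 'device' with value 'cpu'" line above. A hypothetical override, reusing the `parse_vizdoom_cfg` helper from the earlier sketch:

```python
# Flags passed on the command line take precedence over values loaded from
# the resumed config.json (hence "Overriding arg 'device' with value 'cpu'").
cfg = parse_vizdoom_cfg(
    argv=[
        "--env=doom_health_gathering_supreme",
        "--num_workers=8",
        "--num_envs_per_worker=4",
        "--train_for_env_steps=10000000",
        "--device=cpu",  # no GPU visible in this runtime
    ]
)
status = run_rl(cfg)
```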
[2024-10-06 11:17:48,279][00491] Rollout worker 0 uses device cpu
[2024-10-06 11:17:48,281][00491] Rollout worker 1 uses device cpu
[2024-10-06 11:17:48,284][00491] Rollout worker 2 uses device cpu
[2024-10-06 11:17:48,285][00491] Rollout worker 3 uses device cpu
[2024-10-06 11:17:48,286][00491] Rollout worker 4 uses device cpu
[2024-10-06 11:17:48,288][00491] Rollout worker 5 uses device cpu
[2024-10-06 11:17:48,290][00491] Rollout worker 6 uses device cpu
[2024-10-06 11:17:48,291][00491] Rollout worker 7 uses device cpu
[2024-10-06 11:17:48,422][00491] InferenceWorker_p0-w0: min num requests: 2
[2024-10-06 11:17:48,463][00491] Starting all processes...
[2024-10-06 11:17:48,465][00491] Starting process learner_proc0
[2024-10-06 11:17:49,384][00491] Starting all processes...
[2024-10-06 11:17:49,399][00491] Starting process inference_proc0-0
[2024-10-06 11:17:49,400][00491] Starting process rollout_proc0
[2024-10-06 11:17:49,402][00491] Starting process rollout_proc1
[2024-10-06 11:17:49,402][00491] Starting process rollout_proc2
[2024-10-06 11:17:49,402][00491] Starting process rollout_proc3
[2024-10-06 11:17:49,402][00491] Starting process rollout_proc4
[2024-10-06 11:17:49,402][00491] Starting process rollout_proc5
[2024-10-06 11:17:49,402][00491] Starting process rollout_proc6
[2024-10-06 11:17:49,402][00491] Starting process rollout_proc7
[2024-10-06 11:18:10,759][04742] Starting seed is not provided
[2024-10-06 11:18:10,761][04742] Initializing actor-critic model on device cpu
[2024-10-06 11:18:10,762][04742] RunningMeanStd input shape: (3, 72, 128)
[2024-10-06 11:18:10,766][04742] RunningMeanStd input shape: (1,)
[2024-10-06 11:18:10,764][00491] Heartbeat connected on Batcher_0
[2024-10-06 11:18:10,974][04742] ConvEncoder: input_channels=3
[2024-10-06 11:18:11,987][00491] Heartbeat connected on InferenceWorker_p0-w0
[2024-10-06 11:18:12,093][04762] Worker 7 uses CPU cores [1]
[2024-10-06 11:18:12,145][00491] Heartbeat connected on RolloutWorker_w7
[2024-10-06 11:18:12,309][04761] Worker 5 uses CPU cores [1]
[2024-10-06 11:18:12,346][04756] Worker 0 uses CPU cores [0]
[2024-10-06 11:18:12,381][04760] Worker 4 uses CPU cores [0]
[2024-10-06 11:18:12,392][04758] Worker 3 uses CPU cores [1]
[2024-10-06 11:18:12,432][00491] Heartbeat connected on RolloutWorker_w5
[2024-10-06 11:18:12,459][04763] Worker 6 uses CPU cores [0]
[2024-10-06 11:18:12,465][00491] Heartbeat connected on RolloutWorker_w3
[2024-10-06 11:18:12,506][00491] Heartbeat connected on RolloutWorker_w0
[2024-10-06 11:18:12,511][04759] Worker 2 uses CPU cores [0]
[2024-10-06 11:18:12,527][00491] Heartbeat connected on RolloutWorker_w4
[2024-10-06 11:18:12,535][04757] Worker 1 uses CPU cores [1]
[2024-10-06 11:18:12,539][00491] Heartbeat connected on RolloutWorker_w6
[2024-10-06 11:18:12,545][00491] Heartbeat connected on RolloutWorker_w2
[2024-10-06 11:18:12,576][00491] Heartbeat connected on RolloutWorker_w1
[2024-10-06 11:18:12,625][04742] Conv encoder output size: 512
[2024-10-06 11:18:12,626][04742] Policy head output size: 512
[2024-10-06 11:18:12,731][04742] Created Actor Critic model with architecture:
[2024-10-06 11:18:12,732][04742] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2024-10-06 11:18:13,637][04742] Using optimizer 
[2024-10-06 11:18:14,858][04742] No checkpoints found
[2024-10-06 11:18:14,858][04742] Did not load from checkpoint, starting from scratch!
[2024-10-06 11:18:14,859][04742] Initialized policy 0 weights for model version 0
[2024-10-06 11:18:14,863][04742] LearnerWorker_p0 finished initialization!
[2024-10-06 11:18:14,864][00491] Heartbeat connected on LearnerWorker_p0
[2024-10-06 11:18:14,872][04755] RunningMeanStd input shape: (3, 72, 128)
[2024-10-06 11:18:14,874][04755] RunningMeanStd input shape: (1,)
[2024-10-06 11:18:14,898][04755] ConvEncoder: input_channels=3
[2024-10-06 11:18:14,973][00491] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-10-06 11:18:15,110][04755] Conv encoder output size: 512
[2024-10-06 11:18:15,111][04755] Policy head output size: 512
[2024-10-06 11:18:15,142][00491] Inference worker 0-0 is ready!
[2024-10-06 11:18:15,143][00491] All inference workers are ready! Signal rollout workers to start!
[2024-10-06 11:18:15,482][04757] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-10-06 11:18:15,478][04761] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-10-06 11:18:15,480][04762] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-10-06 11:18:15,484][04758] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-10-06 11:18:15,494][04760] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-10-06 11:18:15,499][04756] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-10-06 11:18:15,497][04759] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-10-06 11:18:15,501][04763] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-10-06 11:18:17,251][04760] Decorrelating experience for 0 frames...
[2024-10-06 11:18:17,254][04756] Decorrelating experience for 0 frames...
[2024-10-06 11:18:17,252][04759] Decorrelating experience for 0 frames...
[2024-10-06 11:18:17,627][04758] Decorrelating experience for 0 frames...
[2024-10-06 11:18:17,622][04761] Decorrelating experience for 0 frames...
[2024-10-06 11:18:17,625][04762] Decorrelating experience for 0 frames...
[2024-10-06 11:18:17,638][04757] Decorrelating experience for 0 frames...
[2024-10-06 11:18:18,270][04760] Decorrelating experience for 32 frames...
[2024-10-06 11:18:18,273][04763] Decorrelating experience for 0 frames...
[2024-10-06 11:18:19,475][04763] Decorrelating experience for 32 frames...
[2024-10-06 11:18:19,497][04761] Decorrelating experience for 32 frames...
[2024-10-06 11:18:19,529][04762] Decorrelating experience for 32 frames...
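The module tree printed above reads as a small shared-weights actor-critic: a three-stage conv encoder, a Linear+ELU projection to 512, a GRU(512, 512) core, a 1-dim value head, and 5 action logits. A standalone PyTorch approximation is sketched below; the conv filter sizes are an assumption taken from Sample Factory's convnet_simple architecture (the log only shows three Conv2d/ELU pairs), and the TorchScript-wrapped observation/returns normalizers are omitted:

```python
import torch
import torch.nn as nn


class DoomActorCritic(nn.Module):
    """Rough stand-in for the ActorCriticSharedWeights model printed in the log."""

    def __init__(self, obs_shape=(3, 72, 128), rnn_size=512, num_actions=5):
        super().__init__()
        c, h, w = obs_shape
        # Assumed convnet_simple filters: (32, 8x8, s4) -> (64, 4x4, s2) -> (128, 3x3, s2).
        self.conv_head = nn.Sequential(
            nn.Conv2d(c, 32, kernel_size=8, stride=4), nn.ELU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ELU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=2), nn.ELU(),
        )
        with torch.no_grad():  # infer the flattened conv output size from a dummy pass
            n_flat = self.conv_head(torch.zeros(1, c, h, w)).numel()
        self.mlp_layers = nn.Sequential(nn.Linear(n_flat, rnn_size), nn.ELU())
        self.core = nn.GRU(rnn_size, rnn_size)            # ModelCoreRNN equivalent
        self.critic_linear = nn.Linear(rnn_size, 1)       # value head
        self.distribution_linear = nn.Linear(rnn_size, num_actions)  # action logits

    def forward(self, obs, rnn_state=None):
        x = self.mlp_layers(self.conv_head(obs).flatten(1))
        x, new_state = self.core(x.unsqueeze(0), rnn_state)  # single-step sequence
        x = x.squeeze(0)
        return self.distribution_linear(x), self.critic_linear(x), new_state
```

For a (3, 72, 128) observation this produces a 512-dim encoder output feeding the GRU, consistent with the "Conv encoder output size: 512" and "Policy head output size: 512" lines above.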
[2024-10-06 11:18:19,973][00491] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-10-06 11:18:20,116][04758] Decorrelating experience for 32 frames...
[2024-10-06 11:18:20,142][04757] Decorrelating experience for 32 frames...
[2024-10-06 11:18:21,016][04763] Decorrelating experience for 64 frames...
[2024-10-06 11:18:21,212][04760] Decorrelating experience for 64 frames...
[2024-10-06 11:18:21,676][04761] Decorrelating experience for 64 frames...
[2024-10-06 11:18:21,760][04762] Decorrelating experience for 64 frames...
[2024-10-06 11:18:22,269][04756] Decorrelating experience for 32 frames...
[2024-10-06 11:18:22,419][04758] Decorrelating experience for 64 frames...
[2024-10-06 11:18:22,499][04757] Decorrelating experience for 64 frames...
[2024-10-06 11:18:22,638][04760] Decorrelating experience for 96 frames...
[2024-10-06 11:18:23,632][04762] Decorrelating experience for 96 frames...
[2024-10-06 11:18:23,680][04763] Decorrelating experience for 96 frames...
[2024-10-06 11:18:24,206][04761] Decorrelating experience for 96 frames...
[2024-10-06 11:18:24,372][04758] Decorrelating experience for 96 frames...
[2024-10-06 11:18:24,740][04759] Decorrelating experience for 32 frames...
[2024-10-06 11:18:24,973][00491] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 15.6. Samples: 156. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-10-06 11:18:24,980][00491] Avg episode reward: [(0, '0.640')]
[2024-10-06 11:18:26,097][04756] Decorrelating experience for 64 frames...
[2024-10-06 11:18:28,485][04757] Decorrelating experience for 96 frames...
[2024-10-06 11:18:29,973][00491] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 56.3. Samples: 844. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-10-06 11:18:29,976][00491] Avg episode reward: [(0, '2.048')]
[2024-10-06 11:18:30,801][04759] Decorrelating experience for 64 frames...
[2024-10-06 11:18:34,194][04742] Signal inference workers to stop experience collection...
[2024-10-06 11:18:34,236][04755] InferenceWorker_p0-w0: stopping experience collection
[2024-10-06 11:18:34,456][04756] Decorrelating experience for 96 frames...
[2024-10-06 11:18:34,973][00491] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 105.0. Samples: 2100. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-10-06 11:18:34,977][00491] Avg episode reward: [(0, '2.531')]
[2024-10-06 11:18:35,060][04759] Decorrelating experience for 96 frames...
[2024-10-06 11:18:35,701][04742] Signal inference workers to resume experience collection...
[2024-10-06 11:18:35,703][04755] InferenceWorker_p0-w0: resuming experience collection
[2024-10-06 11:18:39,977][00491] Fps is (10 sec: 409.5, 60 sec: 163.8, 300 sec: 163.8). Total num frames: 4096. Throughput: 0: 142.1. Samples: 3552. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2024-10-06 11:18:39,982][00491] Avg episode reward: [(0, '2.774')]
[2024-10-06 11:18:44,973][00491] Fps is (10 sec: 1228.7, 60 sec: 409.6, 300 sec: 409.6). Total num frames: 12288. Throughput: 0: 147.2. Samples: 4416. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-10-06 11:18:44,980][00491] Avg episode reward: [(0, '3.125')]
[2024-10-06 11:18:49,973][00491] Fps is (10 sec: 819.5, 60 sec: 351.1, 300 sec: 351.1). Total num frames: 12288. Throughput: 0: 153.1. Samples: 5358. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-10-06 11:18:49,976][00491] Avg episode reward: [(0, '3.469')]
[2024-10-06 11:18:54,973][00491] Fps is (10 sec: 409.6, 60 sec: 409.6, 300 sec: 409.6). Total num frames: 16384. Throughput: 0: 166.7. Samples: 6670. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-10-06 11:18:54,976][00491] Avg episode reward: [(0, '3.657')]
[2024-10-06 11:18:59,973][00491] Fps is (10 sec: 1228.8, 60 sec: 546.1, 300 sec: 546.1). Total num frames: 24576. Throughput: 0: 162.8. Samples: 7328. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-10-06 11:18:59,975][00491] Avg episode reward: [(0, '3.861')]
[2024-10-06 11:19:04,976][00491] Fps is (10 sec: 1228.4, 60 sec: 573.4, 300 sec: 573.4). Total num frames: 28672. Throughput: 0: 193.9. Samples: 8726. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-10-06 11:19:04,980][00491] Avg episode reward: [(0, '3.895')]
[2024-10-06 11:19:09,973][00491] Fps is (10 sec: 409.6, 60 sec: 521.3, 300 sec: 521.3). Total num frames: 28672. Throughput: 0: 212.7. Samples: 9728. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-10-06 11:19:09,985][00491] Avg episode reward: [(0, '3.947')]
[2024-10-06 11:19:14,973][00491] Fps is (10 sec: 409.7, 60 sec: 546.1, 300 sec: 546.1). Total num frames: 32768. Throughput: 0: 204.3. Samples: 10036. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-10-06 11:19:14,980][00491] Avg episode reward: [(0, '4.021')]
[2024-10-06 11:19:19,637][04755] Updated weights for policy 0, policy_version 10 (0.1202)
[2024-10-06 11:19:19,973][00491] Fps is (10 sec: 1228.8, 60 sec: 682.7, 300 sec: 630.2). Total num frames: 40960. Throughput: 0: 215.9. Samples: 11814. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-10-06 11:19:19,979][00491] Avg episode reward: [(0, '4.178')]
[2024-10-06 11:19:24,973][00491] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 585.1). Total num frames: 40960. Throughput: 0: 205.4. Samples: 12792. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-10-06 11:19:24,977][00491] Avg episode reward: [(0, '4.310')]
[2024-10-06 11:19:29,973][00491] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 600.7). Total num frames: 45056. Throughput: 0: 192.9. Samples: 13098. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-10-06 11:19:29,984][00491] Avg episode reward: [(0, '4.364')]
[2024-10-06 11:19:34,973][00491] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 665.6). Total num frames: 53248. Throughput: 0: 207.5. Samples: 14696. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-10-06 11:19:34,981][00491] Avg episode reward: [(0, '4.383')]
[2024-10-06 11:19:39,973][00491] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 674.6). Total num frames: 57344. Throughput: 0: 205.9. Samples: 15936. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-10-06 11:19:39,976][00491] Avg episode reward: [(0, '4.421')]
[2024-10-06 11:19:44,973][00491] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 637.2). Total num frames: 57344. Throughput: 0: 205.0. Samples: 16552. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-10-06 11:19:44,976][00491] Avg episode reward: [(0, '4.356')]
[2024-10-06 11:19:49,857][04742] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000016_65536.pth...
[2024-10-06 11:19:49,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 689.9). Total num frames: 65536. Throughput: 0: 206.2. Samples: 18004. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-10-06 11:19:49,978][00491] Avg episode reward: [(0, '4.333')]
[2024-10-06 11:19:54,973][00491] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 696.3). Total num frames: 69632. Throughput: 0: 208.3. Samples: 19100. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-10-06 11:19:54,978][00491] Avg episode reward: [(0, '4.340')]
[2024-10-06 11:19:59,975][00491] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 702.2). Total num frames: 73728. Throughput: 0: 220.7. Samples: 19966. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-10-06 11:19:59,981][00491] Avg episode reward: [(0, '4.356')]
[2024-10-06 11:20:04,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.3, 300 sec: 707.5). Total num frames: 77824. Throughput: 0: 200.9. Samples: 20854. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0)
[2024-10-06 11:20:04,976][00491] Avg episode reward: [(0, '4.461')]
[2024-10-06 11:20:09,057][04755] Updated weights for policy 0, policy_version 20 (0.0511)
[2024-10-06 11:20:09,973][00491] Fps is (10 sec: 819.4, 60 sec: 887.5, 300 sec: 712.3). Total num frames: 81920. Throughput: 0: 208.0. Samples: 22154. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0)
[2024-10-06 11:20:09,976][00491] Avg episode reward: [(0, '4.507')]
[2024-10-06 11:20:14,974][00491] Fps is (10 sec: 819.1, 60 sec: 887.4, 300 sec: 716.8). Total num frames: 86016. Throughput: 0: 222.8. Samples: 23126. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-10-06 11:20:14,981][00491] Avg episode reward: [(0, '4.530')]
[2024-10-06 11:20:19,977][00491] Fps is (10 sec: 818.9, 60 sec: 819.1, 300 sec: 720.9). Total num frames: 90112. Throughput: 0: 210.5. Samples: 24170. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-10-06 11:20:19,979][00491] Avg episode reward: [(0, '4.530')]
[2024-10-06 11:20:24,973][00491] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 724.7). Total num frames: 94208. Throughput: 0: 207.2. Samples: 25260. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-10-06 11:20:24,977][00491] Avg episode reward: [(0, '4.503')]
[2024-10-06 11:20:29,973][00491] Fps is (10 sec: 819.5, 60 sec: 887.5, 300 sec: 728.2). Total num frames: 98304. Throughput: 0: 215.4. Samples: 26244. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-10-06 11:20:29,976][00491] Avg episode reward: [(0, '4.493')]
[2024-10-06 11:20:34,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 731.4). Total num frames: 102400. Throughput: 0: 208.7. Samples: 27394. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-10-06 11:20:34,977][00491] Avg episode reward: [(0, '4.510')]
[2024-10-06 11:20:38,620][04742] Saving new best policy, reward=4.510!
[2024-10-06 11:20:39,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 734.5). Total num frames: 106496. Throughput: 0: 208.7. Samples: 28490. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-10-06 11:20:39,978][00491] Avg episode reward: [(0, '4.608')]
[2024-10-06 11:20:42,585][04742] Saving new best policy, reward=4.608!
[2024-10-06 11:20:44,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 737.3). Total num frames: 110592. Throughput: 0: 206.3. Samples: 29248. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-10-06 11:20:44,978][00491] Avg episode reward: [(0, '4.558')]
[2024-10-06 11:20:49,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 739.9). Total num frames: 114688. Throughput: 0: 224.6. Samples: 30960. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-10-06 11:20:49,976][00491] Avg episode reward: [(0, '4.490')]
[2024-10-06 11:20:54,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 742.4). Total num frames: 118784. Throughput: 0: 214.9. Samples: 31826. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-10-06 11:20:54,976][00491] Avg episode reward: [(0, '4.503')]
[2024-10-06 11:20:57,957][04755] Updated weights for policy 0, policy_version 30 (0.0493)
[2024-10-06 11:20:59,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 744.7). Total num frames: 122880. Throughput: 0: 204.4. Samples: 32322. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-10-06 11:20:59,976][00491] Avg episode reward: [(0, '4.398')]
[2024-10-06 11:21:04,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 746.9). Total num frames: 126976. Throughput: 0: 215.7. Samples: 33874. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-10-06 11:21:04,981][00491] Avg episode reward: [(0, '4.381')]
[2024-10-06 11:21:09,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 749.0). Total num frames: 131072. Throughput: 0: 226.2. Samples: 35438. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-10-06 11:21:09,976][00491] Avg episode reward: [(0, '4.403')]
[2024-10-06 11:21:14,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 750.9). Total num frames: 135168. Throughput: 0: 209.6. Samples: 35678. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-10-06 11:21:14,975][00491] Avg episode reward: [(0, '4.456')]
[2024-10-06 11:21:19,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.3, 300 sec: 752.8). Total num frames: 139264. Throughput: 0: 219.8. Samples: 37286. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-10-06 11:21:19,980][00491] Avg episode reward: [(0, '4.436')]
[2024-10-06 11:21:24,973][00491] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 776.1). Total num frames: 147456. Throughput: 0: 223.6. Samples: 38550. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-10-06 11:21:24,979][00491] Avg episode reward: [(0, '4.384')]
[2024-10-06 11:21:29,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 756.2). Total num frames: 147456. Throughput: 0: 221.5. Samples: 39216. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-10-06 11:21:29,976][00491] Avg episode reward: [(0, '4.394')]
[2024-10-06 11:21:34,973][00491] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 757.8). Total num frames: 151552. Throughput: 0: 208.4. Samples: 40338. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-10-06 11:21:34,982][00491] Avg episode reward: [(0, '4.486')]
[2024-10-06 11:21:39,973][00491] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 779.2). Total num frames: 159744. Throughput: 0: 218.0. Samples: 41636. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0)
[2024-10-06 11:21:39,978][00491] Avg episode reward: [(0, '4.502')]
[2024-10-06 11:21:43,558][04755] Updated weights for policy 0, policy_version 40 (0.1459)
[2024-10-06 11:21:44,973][00491] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 780.2). Total num frames: 163840. Throughput: 0: 228.7. Samples: 42614. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0)
[2024-10-06 11:21:44,975][00491] Avg episode reward: [(0, '4.523')]
[2024-10-06 11:21:49,907][04742] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000041_167936.pth...
[2024-10-06 11:21:49,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 781.1). Total num frames: 167936. Throughput: 0: 217.6. Samples: 43664. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0)
[2024-10-06 11:21:49,981][00491] Avg episode reward: [(0, '4.510')]
[2024-10-06 11:21:54,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 782.0). Total num frames: 172032. Throughput: 0: 208.4. Samples: 44816. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0)
[2024-10-06 11:21:54,981][00491] Avg episode reward: [(0, '4.498')]
[2024-10-06 11:21:59,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 782.8). Total num frames: 176128. Throughput: 0: 223.2. Samples: 45722. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0)
[2024-10-06 11:21:59,976][00491] Avg episode reward: [(0, '4.352')]
[2024-10-06 11:22:04,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 783.6). Total num frames: 180224. Throughput: 0: 211.3. Samples: 46794. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0)
[2024-10-06 11:22:04,976][00491] Avg episode reward: [(0, '4.326')]
[2024-10-06 11:22:09,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 784.3). Total num frames: 184320. Throughput: 0: 209.9. Samples: 47996. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0)
[2024-10-06 11:22:09,978][00491] Avg episode reward: [(0, '4.362')]
[2024-10-06 11:22:14,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 785.1). Total num frames: 188416. Throughput: 0: 213.6. Samples: 48830. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0)
[2024-10-06 11:22:14,976][00491] Avg episode reward: [(0, '4.208')]
[2024-10-06 11:22:19,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 785.8). Total num frames: 192512. Throughput: 0: 223.4. Samples: 50392. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0)
[2024-10-06 11:22:19,978][00491] Avg episode reward: [(0, '4.188')]
[2024-10-06 11:22:24,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 786.4). Total num frames: 196608. Throughput: 0: 214.0. Samples: 51268. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0)
[2024-10-06 11:22:24,980][00491] Avg episode reward: [(0, '4.250')]
[2024-10-06 11:22:29,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 787.1). Total num frames: 200704. Throughput: 0: 206.5. Samples: 51908. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0)
[2024-10-06 11:22:29,976][00491] Avg episode reward: [(0, '4.246')]
[2024-10-06 11:22:31,825][04755] Updated weights for policy 0, policy_version 50 (0.0503)
[2024-10-06 11:22:34,285][04742] Signal inference workers to stop experience collection... (50 times)
[2024-10-06 11:22:34,354][04755] InferenceWorker_p0-w0: stopping experience collection (50 times)
[2024-10-06 11:22:34,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 787.7). Total num frames: 204800. Throughput: 0: 218.5. Samples: 53498. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0)
[2024-10-06 11:22:34,981][00491] Avg episode reward: [(0, '4.252')]
[2024-10-06 11:22:35,418][04742] Signal inference workers to resume experience collection... (50 times)
[2024-10-06 11:22:35,419][04755] InferenceWorker_p0-w0: resuming experience collection (50 times)
[2024-10-06 11:22:39,975][00491] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 788.3). Total num frames: 208896. Throughput: 0: 222.4. Samples: 54826. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0)
[2024-10-06 11:22:39,979][00491] Avg episode reward: [(0, '4.263')]
[2024-10-06 11:22:44,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 788.9). Total num frames: 212992. Throughput: 0: 209.3. Samples: 55142. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0)
[2024-10-06 11:22:44,976][00491] Avg episode reward: [(0, '4.298')]
[2024-10-06 11:22:49,973][00491] Fps is (10 sec: 819.4, 60 sec: 819.2, 300 sec: 789.4). Total num frames: 217088. Throughput: 0: 222.3. Samples: 56796. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0)
[2024-10-06 11:22:49,983][00491] Avg episode reward: [(0, '4.409')]
[2024-10-06 11:22:54,973][00491] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 804.6). Total num frames: 225280. Throughput: 0: 221.8. Samples: 57978. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-10-06 11:22:54,978][00491] Avg episode reward: [(0, '4.436')]
[2024-10-06 11:22:59,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 790.5). Total num frames: 225280. Throughput: 0: 218.1. Samples: 58644. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-10-06 11:22:59,975][00491] Avg episode reward: [(0, '4.440')]
[2024-10-06 11:23:04,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 805.1). Total num frames: 233472. Throughput: 0: 211.6. Samples: 59912. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-10-06 11:23:04,976][00491] Avg episode reward: [(0, '4.472')]
[2024-10-06 11:23:09,973][00491] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 805.3). Total num frames: 237568. Throughput: 0: 221.6. Samples: 61242. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-10-06 11:23:09,976][00491] Avg episode reward: [(0, '4.472')]
[2024-10-06 11:23:14,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 819.2). Total num frames: 241664. Throughput: 0: 225.2. Samples: 62040. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-10-06 11:23:14,976][00491] Avg episode reward: [(0, '4.486')]
[2024-10-06 11:23:19,343][04755] Updated weights for policy 0, policy_version 60 (0.0526)
[2024-10-06 11:23:19,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 833.1). Total num frames: 245760. Throughput: 0: 213.3. Samples: 63098. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-10-06 11:23:19,976][00491] Avg episode reward: [(0, '4.482')]
[2024-10-06 11:23:24,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 249856. Throughput: 0: 209.9. Samples: 64270. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-10-06 11:23:24,976][00491] Avg episode reward: [(0, '4.476')]
[2024-10-06 11:23:29,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 253952. Throughput: 0: 223.2. Samples: 65184. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-10-06 11:23:29,982][00491] Avg episode reward: [(0, '4.516')]
[2024-10-06 11:23:34,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 258048. Throughput: 0: 212.0. Samples: 66338. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-10-06 11:23:34,975][00491] Avg episode reward: [(0, '4.503')]
[2024-10-06 11:23:39,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 262144. Throughput: 0: 213.6. Samples: 67592. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-10-06 11:23:39,976][00491] Avg episode reward: [(0, '4.483')]
[2024-10-06 11:23:44,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 266240. Throughput: 0: 213.4. Samples: 68248. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0)
[2024-10-06 11:23:44,981][00491] Avg episode reward: [(0, '4.390')]
[2024-10-06 11:23:49,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 270336. Throughput: 0: 221.3. Samples: 69870. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0)
[2024-10-06 11:23:49,975][00491] Avg episode reward: [(0, '4.396')]
[2024-10-06 11:23:51,816][04742] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000067_274432.pth...
[2024-10-06 11:23:52,046][04742] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000016_65536.pth
[2024-10-06 11:23:54,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 274432. Throughput: 0: 212.3. Samples: 70796. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-10-06 11:23:54,978][00491] Avg episode reward: [(0, '4.416')]
[2024-10-06 11:23:59,976][00491] Fps is (10 sec: 818.9, 60 sec: 887.4, 300 sec: 847.0). Total num frames: 278528. Throughput: 0: 206.0. Samples: 71312. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-10-06 11:23:59,979][00491] Avg episode reward: [(0, '4.488')]
[2024-10-06 11:24:04,973][00491] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 847.0). Total num frames: 278528. Throughput: 0: 206.4. Samples: 72384. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-10-06 11:24:04,978][00491] Avg episode reward: [(0, '4.494')]
[2024-10-06 11:24:09,973][00491] Fps is (10 sec: 409.7, 60 sec: 750.9, 300 sec: 847.0). Total num frames: 282624. Throughput: 0: 199.6. Samples: 73252. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-10-06 11:24:09,978][00491] Avg episode reward: [(0, '4.508')]
[2024-10-06 11:24:11,825][04755] Updated weights for policy 0, policy_version 70 (0.0514)
[2024-10-06 11:24:14,973][00491] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 833.1). Total num frames: 286720. Throughput: 0: 184.9. Samples: 73504. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-10-06 11:24:14,976][00491] Avg episode reward: [(0, '4.423')]
[2024-10-06 11:24:19,973][00491] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 847.0). Total num frames: 290816. Throughput: 0: 195.3. Samples: 75128. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0)
[2024-10-06 11:24:19,982][00491] Avg episode reward: [(0, '4.401')]
[2024-10-06 11:24:24,973][00491] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 847.0). Total num frames: 294912. Throughput: 0: 198.1. Samples: 76508. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0)
[2024-10-06 11:24:24,976][00491] Avg episode reward: [(0, '4.422')]
[2024-10-06 11:24:29,973][00491] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 833.1). Total num frames: 299008. Throughput: 0: 192.1. Samples: 76894. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0)
[2024-10-06 11:24:29,977][00491] Avg episode reward: [(0, '4.357')]
[2024-10-06 11:24:34,973][00491] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 833.1). Total num frames: 303104. Throughput: 0: 183.3. Samples: 78120. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0)
[2024-10-06 11:24:34,976][00491] Avg episode reward: [(0, '4.412')]
[2024-10-06 11:24:39,973][00491] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 860.9). Total num frames: 311296. Throughput: 0: 193.7. Samples: 79512. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0)
[2024-10-06 11:24:39,975][00491] Avg episode reward: [(0, '4.406')]
[2024-10-06 11:24:44,973][00491] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 315392. Throughput: 0: 202.0. Samples: 80400. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-10-06 11:24:44,978][00491] Avg episode reward: [(0, '4.450')]
[2024-10-06 11:24:49,973][00491] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 833.1). Total num frames: 315392. Throughput: 0: 197.2. Samples: 81258. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-10-06 11:24:49,984][00491] Avg episode reward: [(0, '4.355')]
[2024-10-06 11:24:54,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 323584. Throughput: 0: 209.3. Samples: 82670. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-10-06 11:24:54,980][00491] Avg episode reward: [(0, '4.215')]
[2024-10-06 11:24:58,673][04755] Updated weights for policy 0, policy_version 80 (0.1010)
[2024-10-06 11:24:59,973][00491] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 327680. Throughput: 0: 222.1. Samples: 83498. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-10-06 11:24:59,977][00491] Avg episode reward: [(0, '4.347')]
[2024-10-06 11:25:04,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 331776. Throughput: 0: 211.5. Samples: 84644. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0)
[2024-10-06 11:25:04,976][00491] Avg episode reward: [(0, '4.307')]
[2024-10-06 11:25:09,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 335872. Throughput: 0: 205.9. Samples: 85772. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0)
[2024-10-06 11:25:09,978][00491] Avg episode reward: [(0, '4.242')]
[2024-10-06 11:25:14,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 339968. Throughput: 0: 218.5. Samples: 86728. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0)
[2024-10-06 11:25:14,978][00491] Avg episode reward: [(0, '4.271')]
[2024-10-06 11:25:19,975][00491] Fps is (10 sec: 819.0, 60 sec: 887.4, 300 sec: 847.0). Total num frames: 344064. Throughput: 0: 214.2. Samples: 87760. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0)
[2024-10-06 11:25:19,979][00491] Avg episode reward: [(0, '4.258')]
[2024-10-06 11:25:24,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 348160. Throughput: 0: 207.7. Samples: 88860. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0)
[2024-10-06 11:25:24,982][00491] Avg episode reward: [(0, '4.233')]
[2024-10-06 11:25:29,973][00491] Fps is (10 sec: 819.4, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 352256. Throughput: 0: 204.6. Samples: 89606. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0)
[2024-10-06 11:25:29,976][00491] Avg episode reward: [(0, '4.262')]
[2024-10-06 11:25:34,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 356352. Throughput: 0: 212.2. Samples: 90808. Policy #0 lag: (min: 1.0, avg: 1.8, max: 3.0)
[2024-10-06 11:25:34,981][00491] Avg episode reward: [(0, '4.265')]
[2024-10-06 11:25:39,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 360448. Throughput: 0: 207.7. Samples: 92018. Policy #0 lag: (min: 1.0, avg: 1.8, max: 3.0)
[2024-10-06 11:25:39,978][00491] Avg episode reward: [(0, '4.313')]
[2024-10-06 11:25:44,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 364544. Throughput: 0: 203.7. Samples: 92666. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-10-06 11:25:44,979][00491] Avg episode reward: [(0, '4.371')]
[2024-10-06 11:25:47,758][04742] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000090_368640.pth...
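Checkpoints are written as checkpoint_<policy_version>_<env_steps>.pth, and with keep_checkpoints=2 the learner removes older files as new ones appear (checkpoint_000000016_65536.pth and checkpoint_000000041_167936.pth above). A minimal sketch of locating and inspecting the latest checkpoint; the exact contents of the saved dict depend on the Sample Factory version, so only its keys are printed here:

```python
import glob
import os
import re

import torch

ckpt_dir = "/content/train_dir/default_experiment/checkpoint_p0"
ckpts = sorted(glob.glob(os.path.join(ckpt_dir, "checkpoint_*.pth")))
latest = ckpts[-1]  # zero-padded versions make lexicographic order chronological

# Parse policy_version and env_steps out of the file name.
version, env_steps = map(int, re.match(r".*checkpoint_(\d+)_(\d+)\.pth$", latest).groups())
print(f"policy_version={version}, env_steps={env_steps}")

state = torch.load(latest, map_location="cpu")
print(list(state.keys()))  # model/optimizer state etc.; exact keys vary by version
```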
[2024-10-06 11:25:47,763][04755] Updated weights for policy 0, policy_version 90 (0.1428) [2024-10-06 11:25:47,872][04742] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000041_167936.pth [2024-10-06 11:25:49,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 368640. Throughput: 0: 208.1. Samples: 94010. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-10-06 11:25:49,979][00491] Avg episode reward: [(0, '4.381')] [2024-10-06 11:25:54,977][00491] Fps is (10 sec: 818.9, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 372736. Throughput: 0: 214.9. Samples: 95444. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:25:54,983][00491] Avg episode reward: [(0, '4.410')] [2024-10-06 11:25:59,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 376832. Throughput: 0: 202.7. Samples: 95848. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:25:59,984][00491] Avg episode reward: [(0, '4.460')] [2024-10-06 11:26:04,973][00491] Fps is (10 sec: 819.5, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 380928. Throughput: 0: 205.3. Samples: 96996. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:26:04,977][00491] Avg episode reward: [(0, '4.490')] [2024-10-06 11:26:09,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 385024. Throughput: 0: 217.3. Samples: 98638. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:26:09,975][00491] Avg episode reward: [(0, '4.482')] [2024-10-06 11:26:14,977][00491] Fps is (10 sec: 818.9, 60 sec: 819.1, 300 sec: 847.0). Total num frames: 389120. Throughput: 0: 211.5. Samples: 99126. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:26:14,987][00491] Avg episode reward: [(0, '4.453')] [2024-10-06 11:26:19,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 393216. Throughput: 0: 206.0. Samples: 100080. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:26:19,976][00491] Avg episode reward: [(0, '4.447')] [2024-10-06 11:26:24,973][00491] Fps is (10 sec: 819.5, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 397312. Throughput: 0: 215.9. Samples: 101732. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:26:24,981][00491] Avg episode reward: [(0, '4.470')] [2024-10-06 11:26:29,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 401408. Throughput: 0: 216.7. Samples: 102418. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:26:29,976][00491] Avg episode reward: [(0, '4.424')] [2024-10-06 11:26:34,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 405504. Throughput: 0: 208.7. Samples: 103402. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-10-06 11:26:34,975][00491] Avg episode reward: [(0, '4.427')] [2024-10-06 11:26:36,967][04755] Updated weights for policy 0, policy_version 100 (0.0980) [2024-10-06 11:26:39,459][04742] Signal inference workers to stop experience collection... (100 times) [2024-10-06 11:26:39,543][04755] InferenceWorker_p0-w0: stopping experience collection (100 times) [2024-10-06 11:26:39,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 409600. Throughput: 0: 210.6. Samples: 104920. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-10-06 11:26:39,982][00491] Avg episode reward: [(0, '4.450')] [2024-10-06 11:26:41,016][04742] Signal inference workers to resume experience collection... 
(100 times) [2024-10-06 11:26:41,018][04755] InferenceWorker_p0-w0: resuming experience collection (100 times) [2024-10-06 11:26:44,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 413696. Throughput: 0: 215.6. Samples: 105548. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:26:44,976][00491] Avg episode reward: [(0, '4.492')] [2024-10-06 11:26:49,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 417792. Throughput: 0: 216.8. Samples: 106750. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:26:49,979][00491] Avg episode reward: [(0, '4.464')] [2024-10-06 11:26:54,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 421888. Throughput: 0: 204.8. Samples: 107852. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-10-06 11:26:54,976][00491] Avg episode reward: [(0, '4.538')] [2024-10-06 11:26:59,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 425984. Throughput: 0: 209.7. Samples: 108560. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-10-06 11:26:59,979][00491] Avg episode reward: [(0, '4.520')] [2024-10-06 11:27:04,980][00491] Fps is (10 sec: 1228.0, 60 sec: 887.4, 300 sec: 847.0). Total num frames: 434176. Throughput: 0: 223.3. Samples: 110130. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:27:04,982][00491] Avg episode reward: [(0, '4.519')] [2024-10-06 11:27:09,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 434176. Throughput: 0: 208.2. Samples: 111100. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:27:09,976][00491] Avg episode reward: [(0, '4.509')] [2024-10-06 11:27:14,973][00491] Fps is (10 sec: 819.7, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 442368. Throughput: 0: 202.1. Samples: 111514. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:27:14,981][00491] Avg episode reward: [(0, '4.405')] [2024-10-06 11:27:19,973][00491] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 446464. Throughput: 0: 215.1. Samples: 113080. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 11:27:19,980][00491] Avg episode reward: [(0, '4.346')] [2024-10-06 11:27:23,540][04755] Updated weights for policy 0, policy_version 110 (0.0536) [2024-10-06 11:27:24,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 450560. Throughput: 0: 207.8. Samples: 114270. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 11:27:24,977][00491] Avg episode reward: [(0, '4.352')] [2024-10-06 11:27:29,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 454656. Throughput: 0: 208.4. Samples: 114928. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-10-06 11:27:29,980][00491] Avg episode reward: [(0, '4.363')] [2024-10-06 11:27:34,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 458752. Throughput: 0: 209.1. Samples: 116158. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-10-06 11:27:34,976][00491] Avg episode reward: [(0, '4.242')] [2024-10-06 11:27:39,975][00491] Fps is (10 sec: 819.0, 60 sec: 887.4, 300 sec: 847.0). Total num frames: 462848. Throughput: 0: 218.7. Samples: 117692. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 11:27:39,978][00491] Avg episode reward: [(0, '4.246')] [2024-10-06 11:27:44,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 466944. Throughput: 0: 215.3. 
Samples: 118248. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 11:27:44,981][00491] Avg episode reward: [(0, '4.240')] [2024-10-06 11:27:48,754][04742] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000115_471040.pth... [2024-10-06 11:27:48,867][04742] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000067_274432.pth [2024-10-06 11:27:49,973][00491] Fps is (10 sec: 819.4, 60 sec: 887.5, 300 sec: 833.1). Total num frames: 471040. Throughput: 0: 202.2. Samples: 119228. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-10-06 11:27:49,978][00491] Avg episode reward: [(0, '4.338')] [2024-10-06 11:27:54,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 475136. Throughput: 0: 214.4. Samples: 120746. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-10-06 11:27:54,977][00491] Avg episode reward: [(0, '4.403')] [2024-10-06 11:27:59,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 833.1). Total num frames: 479232. Throughput: 0: 218.3. Samples: 121338. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-10-06 11:27:59,975][00491] Avg episode reward: [(0, '4.413')] [2024-10-06 11:28:04,976][00491] Fps is (10 sec: 818.9, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 483328. Throughput: 0: 203.7. Samples: 122248. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-10-06 11:28:04,980][00491] Avg episode reward: [(0, '4.440')] [2024-10-06 11:28:09,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 833.1). Total num frames: 487424. Throughput: 0: 211.5. Samples: 123788. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-10-06 11:28:09,975][00491] Avg episode reward: [(0, '4.505')] [2024-10-06 11:28:11,852][04755] Updated weights for policy 0, policy_version 120 (0.0529) [2024-10-06 11:28:14,973][00491] Fps is (10 sec: 819.5, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 491520. Throughput: 0: 211.6. Samples: 124450. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-10-06 11:28:14,975][00491] Avg episode reward: [(0, '4.432')] [2024-10-06 11:28:19,977][00491] Fps is (10 sec: 818.9, 60 sec: 819.1, 300 sec: 833.1). Total num frames: 495616. Throughput: 0: 213.7. Samples: 125776. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-10-06 11:28:19,980][00491] Avg episode reward: [(0, '4.487')] [2024-10-06 11:28:24,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 499712. Throughput: 0: 205.1. Samples: 126920. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-10-06 11:28:24,976][00491] Avg episode reward: [(0, '4.498')] [2024-10-06 11:28:29,973][00491] Fps is (10 sec: 819.5, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 503808. Throughput: 0: 207.3. Samples: 127576. Policy #0 lag: (min: 1.0, avg: 1.3, max: 3.0) [2024-10-06 11:28:29,980][00491] Avg episode reward: [(0, '4.512')] [2024-10-06 11:28:34,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 507904. Throughput: 0: 225.0. Samples: 129354. Policy #0 lag: (min: 1.0, avg: 1.3, max: 3.0) [2024-10-06 11:28:34,977][00491] Avg episode reward: [(0, '4.469')] [2024-10-06 11:28:39,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 512000. Throughput: 0: 207.5. Samples: 130084. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-10-06 11:28:39,975][00491] Avg episode reward: [(0, '4.482')] [2024-10-06 11:28:44,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 516096. Throughput: 0: 205.3. 
Samples: 130578. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-10-06 11:28:44,977][00491] Avg episode reward: [(0, '4.441')] [2024-10-06 11:28:49,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 520192. Throughput: 0: 227.6. Samples: 132488. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-10-06 11:28:49,982][00491] Avg episode reward: [(0, '4.457')] [2024-10-06 11:28:54,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 524288. Throughput: 0: 219.1. Samples: 133646. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-10-06 11:28:54,979][00491] Avg episode reward: [(0, '4.519')] [2024-10-06 11:28:59,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 528384. Throughput: 0: 211.3. Samples: 133958. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 11:28:59,982][00491] Avg episode reward: [(0, '4.519')] [2024-10-06 11:29:01,025][04755] Updated weights for policy 0, policy_version 130 (0.1132) [2024-10-06 11:29:04,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 532480. Throughput: 0: 218.0. Samples: 135584. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 11:29:04,976][00491] Avg episode reward: [(0, '4.510')] [2024-10-06 11:29:09,980][00491] Fps is (10 sec: 1228.0, 60 sec: 887.4, 300 sec: 860.8). Total num frames: 540672. Throughput: 0: 216.9. Samples: 136682. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 11:29:09,984][00491] Avg episode reward: [(0, '4.528')] [2024-10-06 11:29:14,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 540672. Throughput: 0: 216.3. Samples: 137308. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 11:29:14,975][00491] Avg episode reward: [(0, '4.554')] [2024-10-06 11:29:19,973][00491] Fps is (10 sec: 819.8, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 548864. Throughput: 0: 205.9. Samples: 138620. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 11:29:19,978][00491] Avg episode reward: [(0, '4.518')] [2024-10-06 11:29:24,973][00491] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 552960. Throughput: 0: 217.4. Samples: 139866. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 11:29:24,979][00491] Avg episode reward: [(0, '4.508')] [2024-10-06 11:29:29,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 557056. Throughput: 0: 228.0. Samples: 140838. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 11:29:29,976][00491] Avg episode reward: [(0, '4.500')] [2024-10-06 11:29:34,973][00491] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 557056. Throughput: 0: 201.9. Samples: 141572. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 11:29:34,982][00491] Avg episode reward: [(0, '4.472')] [2024-10-06 11:29:39,973][00491] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 561152. Throughput: 0: 206.6. Samples: 142942. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 11:29:39,981][00491] Avg episode reward: [(0, '4.367')] [2024-10-06 11:29:44,975][00491] Fps is (10 sec: 1228.6, 60 sec: 887.4, 300 sec: 860.8). Total num frames: 569344. Throughput: 0: 218.5. Samples: 143790. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:29:44,978][00491] Avg episode reward: [(0, '4.305')] [2024-10-06 11:29:49,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 569344. 
Throughput: 0: 208.7. Samples: 144974. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:29:49,976][00491] Avg episode reward: [(0, '4.312')] [2024-10-06 11:29:50,192][04755] Updated weights for policy 0, policy_version 140 (0.0999) [2024-10-06 11:29:50,195][04742] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000140_573440.pth... [2024-10-06 11:29:50,351][04742] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000090_368640.pth [2024-10-06 11:29:54,973][00491] Fps is (10 sec: 409.7, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 573440. Throughput: 0: 206.3. Samples: 145966. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:29:54,976][00491] Avg episode reward: [(0, '4.397')] [2024-10-06 11:29:59,973][00491] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 581632. Throughput: 0: 210.0. Samples: 146760. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:29:59,976][00491] Avg episode reward: [(0, '4.338')] [2024-10-06 11:30:04,973][00491] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 585728. Throughput: 0: 207.6. Samples: 147962. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 11:30:04,979][00491] Avg episode reward: [(0, '4.294')] [2024-10-06 11:30:09,973][00491] Fps is (10 sec: 409.6, 60 sec: 751.0, 300 sec: 833.1). Total num frames: 585728. Throughput: 0: 203.3. Samples: 149016. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 11:30:09,976][00491] Avg episode reward: [(0, '4.357')] [2024-10-06 11:30:14,974][00491] Fps is (10 sec: 819.1, 60 sec: 887.4, 300 sec: 847.0). Total num frames: 593920. Throughput: 0: 198.5. Samples: 149772. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 11:30:14,983][00491] Avg episode reward: [(0, '4.324')] [2024-10-06 11:30:19,973][00491] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 598016. Throughput: 0: 211.0. Samples: 151066. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 11:30:19,978][00491] Avg episode reward: [(0, '4.436')] [2024-10-06 11:30:24,978][00491] Fps is (10 sec: 818.9, 60 sec: 819.1, 300 sec: 847.0). Total num frames: 602112. Throughput: 0: 204.8. Samples: 152160. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 11:30:24,984][00491] Avg episode reward: [(0, '4.485')] [2024-10-06 11:30:29,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 606208. Throughput: 0: 200.2. Samples: 152800. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 11:30:29,976][00491] Avg episode reward: [(0, '4.531')] [2024-10-06 11:30:34,973][00491] Fps is (10 sec: 819.6, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 610304. Throughput: 0: 204.4. Samples: 154172. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 11:30:34,981][00491] Avg episode reward: [(0, '4.398')] [2024-10-06 11:30:38,311][04755] Updated weights for policy 0, policy_version 150 (0.1062) [2024-10-06 11:30:39,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 614400. Throughput: 0: 209.1. Samples: 155376. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 11:30:39,979][00491] Avg episode reward: [(0, '4.433')] [2024-10-06 11:30:43,043][04742] Signal inference workers to stop experience collection... 
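(150 times)

The learner has now paused and resumed experience collection 150 times. In Sample Factory's asynchronous APPO this is back-pressure: when more unprocessed training batches are queued than the learner can absorb (cf. num_batches_to_accumulate=2 in the config), it signals the inference/rollout workers to stop collecting until it catches up, which also keeps policy lag bounded. A minimal sketch of such a gate, with illustrative class and method names rather than Sample Factory's actual API:

```python
import threading

class ExperienceGate:
    """Back-pressure between learner and rollout workers (illustrative sketch).
    The learner pauses collection when too many unprocessed batches pile up
    and resumes once it has caught up -- mirroring the log's
    'stopping/resuming experience collection' messages."""

    def __init__(self, max_pending_batches=2):  # cf. num_batches_to_accumulate
        self.max_pending = max_pending_batches
        self.pending = 0
        self._lock = threading.Lock()
        self._resume = threading.Event()
        self._resume.set()  # start unpaused

    def on_batch_queued(self):
        # Sampler side: a full batch was handed to the learner.
        with self._lock:
            self.pending += 1
            if self.pending >= self.max_pending:
                self._resume.clear()  # -> "stopping experience collection"

    def on_batch_trained(self):
        # Learner side: one SGD pass over a batch finished.
        with self._lock:
            self.pending -= 1
            if self.pending < self.max_pending:
                self._resume.set()    # -> "resuming experience collection"

    def wait_if_paused(self):
        # Rollout workers call this before collecting the next rollout.
        self._resume.wait()
```

Rollout workers would call wait_if_paused() before each rollout, while the learner drives the other two hooks as batches arrive and finish.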
[2024-10-06 11:30:43,120][04755] InferenceWorker_p0-w0: stopping experience collection (150 times) [2024-10-06 11:30:43,810][04742] Signal inference workers to resume experience collection... (150 times) [2024-10-06 11:30:43,811][04755] InferenceWorker_p0-w0: resuming experience collection (150 times) [2024-10-06 11:30:44,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 618496. Throughput: 0: 205.6. Samples: 156012. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-10-06 11:30:44,983][00491] Avg episode reward: [(0, '4.357')] [2024-10-06 11:30:49,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 622592. Throughput: 0: 206.5. Samples: 157254. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-10-06 11:30:49,976][00491] Avg episode reward: [(0, '4.384')] [2024-10-06 11:30:54,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 626688. Throughput: 0: 218.8. Samples: 158862. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:30:54,976][00491] Avg episode reward: [(0, '4.409')] [2024-10-06 11:30:59,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 630784. Throughput: 0: 212.7. Samples: 159344. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:30:59,981][00491] Avg episode reward: [(0, '4.460')] [2024-10-06 11:31:04,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 634880. Throughput: 0: 205.8. Samples: 160326. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:31:04,981][00491] Avg episode reward: [(0, '4.454')] [2024-10-06 11:31:09,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 638976. Throughput: 0: 209.3. Samples: 161578. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:31:09,978][00491] Avg episode reward: [(0, '4.399')] [2024-10-06 11:31:14,975][00491] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 643072. Throughput: 0: 212.5. Samples: 162362. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:31:14,977][00491] Avg episode reward: [(0, '4.353')] [2024-10-06 11:31:19,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 647168. Throughput: 0: 207.8. Samples: 163524. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:31:19,975][00491] Avg episode reward: [(0, '4.395')] [2024-10-06 11:31:24,973][00491] Fps is (10 sec: 819.3, 60 sec: 819.3, 300 sec: 847.0). Total num frames: 651264. Throughput: 0: 207.8. Samples: 164726. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:31:24,976][00491] Avg episode reward: [(0, '4.428')] [2024-10-06 11:31:27,995][04755] Updated weights for policy 0, policy_version 160 (0.0524) [2024-10-06 11:31:29,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 655360. Throughput: 0: 210.5. Samples: 165484. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:31:29,975][00491] Avg episode reward: [(0, '4.396')] [2024-10-06 11:31:34,974][00491] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 659456. Throughput: 0: 216.2. Samples: 166982. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:31:34,977][00491] Avg episode reward: [(0, '4.380')] [2024-10-06 11:31:39,974][00491] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 663552. Throughput: 0: 200.4. Samples: 167878.
Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:31:39,981][00491] Avg episode reward: [(0, '4.365')] [2024-10-06 11:31:44,973][00491] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 667648. Throughput: 0: 205.4. Samples: 168588. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:31:44,975][00491] Avg episode reward: [(0, '4.383')] [2024-10-06 11:31:46,871][04742] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000164_671744.pth... [2024-10-06 11:31:46,991][04742] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000115_471040.pth [2024-10-06 11:31:49,973][00491] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 671744. Throughput: 0: 218.4. Samples: 170154. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:31:49,975][00491] Avg episode reward: [(0, '4.408')] [2024-10-06 11:31:54,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 675840. Throughput: 0: 210.6. Samples: 171054. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-10-06 11:31:54,981][00491] Avg episode reward: [(0, '4.405')] [2024-10-06 11:31:59,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 679936. Throughput: 0: 206.8. Samples: 171666. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-10-06 11:31:59,978][00491] Avg episode reward: [(0, '4.360')] [2024-10-06 11:32:04,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 684032. Throughput: 0: 207.8. Samples: 172876. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-10-06 11:32:04,977][00491] Avg episode reward: [(0, '4.378')] [2024-10-06 11:32:09,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 688128. Throughput: 0: 216.5. Samples: 174470. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-10-06 11:32:09,975][00491] Avg episode reward: [(0, '4.314')] [2024-10-06 11:32:14,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 692224. Throughput: 0: 206.7. Samples: 174784. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-10-06 11:32:14,977][00491] Avg episode reward: [(0, '4.330')] [2024-10-06 11:32:18,136][04755] Updated weights for policy 0, policy_version 170 (0.0520) [2024-10-06 11:32:19,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 696320. Throughput: 0: 196.4. Samples: 175818. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-10-06 11:32:19,976][00491] Avg episode reward: [(0, '4.301')] [2024-10-06 11:32:24,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 700416. Throughput: 0: 218.6. Samples: 177716. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2024-10-06 11:32:24,975][00491] Avg episode reward: [(0, '4.352')] [2024-10-06 11:32:29,975][00491] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 704512. Throughput: 0: 205.0. Samples: 177812. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2024-10-06 11:32:29,984][00491] Avg episode reward: [(0, '4.299')] [2024-10-06 11:32:34,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 708608. Throughput: 0: 192.9. Samples: 178836. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:32:34,978][00491] Avg episode reward: [(0, '4.309')] [2024-10-06 11:32:39,973][00491] Fps is (10 sec: 819.4, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 712704. Throughput: 0: 203.6. Samples: 180214. 
Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:32:39,975][00491] Avg episode reward: [(0, '4.266')] [2024-10-06 11:32:44,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 716800. Throughput: 0: 207.1. Samples: 180986. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:32:44,976][00491] Avg episode reward: [(0, '4.278')] [2024-10-06 11:32:49,976][00491] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 720896. Throughput: 0: 207.3. Samples: 182206. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:32:49,979][00491] Avg episode reward: [(0, '4.320')] [2024-10-06 11:32:54,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 724992. Throughput: 0: 200.7. Samples: 183502. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2024-10-06 11:32:54,976][00491] Avg episode reward: [(0, '4.341')] [2024-10-06 11:32:59,973][00491] Fps is (10 sec: 819.4, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 729088. Throughput: 0: 207.6. Samples: 184126. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2024-10-06 11:32:59,975][00491] Avg episode reward: [(0, '4.441')] [2024-10-06 11:33:04,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 733184. Throughput: 0: 219.6. Samples: 185700. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0) [2024-10-06 11:33:04,977][00491] Avg episode reward: [(0, '4.503')] [2024-10-06 11:33:06,549][04755] Updated weights for policy 0, policy_version 180 (0.0528) [2024-10-06 11:33:09,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 737280. Throughput: 0: 197.5. Samples: 186604. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0) [2024-10-06 11:33:09,978][00491] Avg episode reward: [(0, '4.568')] [2024-10-06 11:33:14,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 741376. Throughput: 0: 208.2. Samples: 187180. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:33:14,982][00491] Avg episode reward: [(0, '4.649')] [2024-10-06 11:33:19,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 745472. Throughput: 0: 221.6. Samples: 188810. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:33:19,979][00491] Avg episode reward: [(0, '4.694')] [2024-10-06 11:33:19,973][04742] Saving new best policy, reward=4.649! [2024-10-06 11:33:24,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 749568. Throughput: 0: 218.5. Samples: 190046. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:33:24,976][00491] Avg episode reward: [(0, '4.726')] [2024-10-06 11:33:26,402][04742] Saving new best policy, reward=4.694! [2024-10-06 11:33:29,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 753664. Throughput: 0: 206.0. Samples: 190254. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:33:29,981][00491] Avg episode reward: [(0, '4.725')] [2024-10-06 11:33:30,881][04742] Saving new best policy, reward=4.726! [2024-10-06 11:33:34,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 757760. Throughput: 0: 216.8. Samples: 191960. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:33:34,981][00491] Avg episode reward: [(0, '4.807')] [2024-10-06 11:33:39,809][04742] Saving new best policy, reward=4.807! [2024-10-06 11:33:39,973][00491] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 765952. 
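Throughput: 0: 215.0. Samples: 193176. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)

The run has started improving: each "Saving new best policy, reward=...!" line above fires when the averaged episode reward beats the best seen so far (the config trains with save_best_metric=reward and begins best-tracking only after save_best_after=100000 env steps, which this run passed long ago). A rough sketch of that bookkeeping, using an illustrative helper rather than Sample Factory's own code:

```python
import shutil

def maybe_save_best(avg_reward, best_reward, env_steps, ckpt_path, best_path,
                    save_best_after=100_000):
    """Illustrative sketch: once past `save_best_after` env steps, keep a
    copy of the current checkpoint whenever the averaged episode reward
    beats the previous best."""
    if env_steps >= save_best_after and avg_reward > best_reward:
        # -> log: "Saving new best policy, reward=4.826!"
        shutil.copyfile(ckpt_path, best_path)
        return avg_reward
    return best_reward
```

Called once per reporting interval, this reproduces the pattern above, where each best-policy save trails the reward report that triggered it by a moment.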
[2024-10-06 11:33:39,976][00491] Avg episode reward: [(0, '4.826')] [2024-10-06 11:33:44,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 765952. Throughput: 0: 212.7. Samples: 193698. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-10-06 11:33:44,978][00491] Avg episode reward: [(0, '5.010')] [2024-10-06 11:33:45,943][04742] Saving new best policy, reward=4.826! [2024-10-06 11:33:49,851][04742] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000189_774144.pth... [2024-10-06 11:33:49,970][04742] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000140_573440.pth [2024-10-06 11:33:49,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 774144. Throughput: 0: 209.3. Samples: 195118. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:33:49,978][00491] Avg episode reward: [(0, '5.147')] [2024-10-06 11:33:49,987][04742] Saving new best policy, reward=5.010! [2024-10-06 11:33:54,009][04742] Saving new best policy, reward=5.147! [2024-10-06 11:33:54,014][04755] Updated weights for policy 0, policy_version 190 (0.0568) [2024-10-06 11:33:54,973][00491] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 778240. Throughput: 0: 215.7. Samples: 196310. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:33:54,979][00491] Avg episode reward: [(0, '5.072')] [2024-10-06 11:33:59,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 782336. Throughput: 0: 223.7. Samples: 197246. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:33:59,981][00491] Avg episode reward: [(0, '5.076')] [2024-10-06 11:34:04,973][00491] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 782336. Throughput: 0: 206.1. Samples: 198084. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:34:04,977][00491] Avg episode reward: [(0, '5.031')] [2024-10-06 11:34:09,973][00491] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 786432. Throughput: 0: 203.4. Samples: 199198. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:34:09,988][00491] Avg episode reward: [(0, '4.888')] [2024-10-06 11:34:14,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 790528. Throughput: 0: 200.8. Samples: 199288. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:34:14,976][00491] Avg episode reward: [(0, '4.901')] [2024-10-06 11:34:19,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 794624. Throughput: 0: 185.8. Samples: 200322. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:34:19,980][00491] Avg episode reward: [(0, '4.856')] [2024-10-06 11:34:24,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 798720. Throughput: 0: 181.9. Samples: 201360. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-10-06 11:34:24,978][00491] Avg episode reward: [(0, '4.913')] [2024-10-06 11:34:29,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 802816. Throughput: 0: 193.3. Samples: 202396. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-10-06 11:34:29,975][00491] Avg episode reward: [(0, '4.850')] [2024-10-06 11:34:34,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 806912. Throughput: 0: 185.9. Samples: 203484.
Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-10-06 11:34:34,976][00491] Avg episode reward: [(0, '4.765')] [2024-10-06 11:34:39,973][00491] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 819.2). Total num frames: 811008. Throughput: 0: 181.5. Samples: 204478. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-10-06 11:34:39,976][00491] Avg episode reward: [(0, '4.627')] [2024-10-06 11:34:44,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 815104. Throughput: 0: 182.7. Samples: 205468. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:34:44,984][00491] Avg episode reward: [(0, '4.526')] [2024-10-06 11:34:47,772][04755] Updated weights for policy 0, policy_version 200 (0.1324) [2024-10-06 11:34:49,974][00491] Fps is (10 sec: 819.1, 60 sec: 750.9, 300 sec: 833.1). Total num frames: 819200. Throughput: 0: 189.6. Samples: 206618. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:34:49,979][00491] Avg episode reward: [(0, '4.477')] [2024-10-06 11:34:50,720][04742] Signal inference workers to stop experience collection... (200 times) [2024-10-06 11:34:50,810][04755] InferenceWorker_p0-w0: stopping experience collection (200 times) [2024-10-06 11:34:52,352][04742] Signal inference workers to resume experience collection... (200 times) [2024-10-06 11:34:52,354][04755] InferenceWorker_p0-w0: resuming experience collection (200 times) [2024-10-06 11:34:54,973][00491] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 819.2). Total num frames: 823296. Throughput: 0: 194.4. Samples: 207944. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2024-10-06 11:34:54,976][00491] Avg episode reward: [(0, '4.444')] [2024-10-06 11:34:59,973][00491] Fps is (10 sec: 819.3, 60 sec: 750.9, 300 sec: 819.2). Total num frames: 827392. Throughput: 0: 205.0. Samples: 208512. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2024-10-06 11:34:59,977][00491] Avg episode reward: [(0, '4.479')] [2024-10-06 11:35:04,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 831488. Throughput: 0: 213.2. Samples: 209916. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:35:04,977][00491] Avg episode reward: [(0, '4.486')] [2024-10-06 11:35:09,974][00491] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 835584. Throughput: 0: 224.1. Samples: 211446. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:35:09,977][00491] Avg episode reward: [(0, '4.477')] [2024-10-06 11:35:14,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 839680. Throughput: 0: 206.4. Samples: 211684. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:35:14,978][00491] Avg episode reward: [(0, '4.511')] [2024-10-06 11:35:19,973][00491] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 843776. Throughput: 0: 211.9. Samples: 213020. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:35:19,980][00491] Avg episode reward: [(0, '4.469')] [2024-10-06 11:35:24,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 847872. Throughput: 0: 225.4. Samples: 214622. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:35:24,976][00491] Avg episode reward: [(0, '4.403')] [2024-10-06 11:35:29,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 851968. Throughput: 0: 212.6. Samples: 215036. 
Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:35:29,980][00491] Avg episode reward: [(0, '4.378')] [2024-10-06 11:35:34,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 856064. Throughput: 0: 211.8. Samples: 216148. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:35:34,976][00491] Avg episode reward: [(0, '4.473')] [2024-10-06 11:35:36,593][04755] Updated weights for policy 0, policy_version 210 (0.0998) [2024-10-06 11:35:39,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 860160. Throughput: 0: 217.2. Samples: 217718. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:35:39,976][00491] Avg episode reward: [(0, '4.568')] [2024-10-06 11:35:44,974][00491] Fps is (10 sec: 1228.7, 60 sec: 887.5, 300 sec: 833.1). Total num frames: 868352. Throughput: 0: 222.5. Samples: 218524. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:35:44,976][00491] Avg episode reward: [(0, '4.667')] [2024-10-06 11:35:49,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 868352. Throughput: 0: 215.6. Samples: 219618. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:35:49,976][00491] Avg episode reward: [(0, '4.653')] [2024-10-06 11:35:50,454][04742] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000213_872448.pth... [2024-10-06 11:35:50,584][04742] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000164_671744.pth [2024-10-06 11:35:54,973][00491] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 872448. Throughput: 0: 207.6. Samples: 220790. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:35:54,981][00491] Avg episode reward: [(0, '4.673')] [2024-10-06 11:35:59,973][00491] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 833.1). Total num frames: 880640. Throughput: 0: 217.6. Samples: 221474. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:35:59,980][00491] Avg episode reward: [(0, '4.538')] [2024-10-06 11:36:04,973][00491] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 833.1). Total num frames: 884736. Throughput: 0: 217.8. Samples: 222822. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:36:04,985][00491] Avg episode reward: [(0, '4.569')] [2024-10-06 11:36:09,973][00491] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 884736. Throughput: 0: 205.1. Samples: 223852. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:36:09,979][00491] Avg episode reward: [(0, '4.524')] [2024-10-06 11:36:14,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 833.1). Total num frames: 892928. Throughput: 0: 213.9. Samples: 224662. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:36:14,977][00491] Avg episode reward: [(0, '4.553')] [2024-10-06 11:36:19,973][00491] Fps is (10 sec: 1228.9, 60 sec: 887.5, 300 sec: 833.1). Total num frames: 897024. Throughput: 0: 215.9. Samples: 225862. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:36:19,977][00491] Avg episode reward: [(0, '4.455')] [2024-10-06 11:36:23,236][04755] Updated weights for policy 0, policy_version 220 (0.0530) [2024-10-06 11:36:24,980][00491] Fps is (10 sec: 818.6, 60 sec: 887.4, 300 sec: 833.1). Total num frames: 901120. Throughput: 0: 207.7. Samples: 227064. 
Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:36:24,985][00491] Avg episode reward: [(0, '4.354')] [2024-10-06 11:36:29,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 833.1). Total num frames: 905216. Throughput: 0: 205.5. Samples: 227772. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-10-06 11:36:29,976][00491] Avg episode reward: [(0, '4.296')] [2024-10-06 11:36:34,973][00491] Fps is (10 sec: 819.8, 60 sec: 887.5, 300 sec: 833.1). Total num frames: 909312. Throughput: 0: 207.9. Samples: 228972. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-10-06 11:36:34,976][00491] Avg episode reward: [(0, '4.236')] [2024-10-06 11:36:39,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 833.1). Total num frames: 913408. Throughput: 0: 215.8. Samples: 230500. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-10-06 11:36:39,979][00491] Avg episode reward: [(0, '4.255')] [2024-10-06 11:36:44,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 917504. Throughput: 0: 212.0. Samples: 231014. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-10-06 11:36:44,979][00491] Avg episode reward: [(0, '4.288')] [2024-10-06 11:36:49,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 833.1). Total num frames: 921600. Throughput: 0: 204.2. Samples: 232010. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:36:49,976][00491] Avg episode reward: [(0, '4.298')] [2024-10-06 11:36:54,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 833.1). Total num frames: 925696. Throughput: 0: 220.6. Samples: 233780. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:36:54,978][00491] Avg episode reward: [(0, '4.288')] [2024-10-06 11:36:59,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 929792. Throughput: 0: 210.9. Samples: 234152. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:36:59,976][00491] Avg episode reward: [(0, '4.298')] [2024-10-06 11:37:04,976][00491] Fps is (10 sec: 818.9, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 933888. Throughput: 0: 204.7. Samples: 235076. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:37:04,983][00491] Avg episode reward: [(0, '4.288')] [2024-10-06 11:37:09,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 833.1). Total num frames: 937984. Throughput: 0: 216.5. Samples: 236806. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:37:09,976][00491] Avg episode reward: [(0, '4.349')] [2024-10-06 11:37:11,506][04755] Updated weights for policy 0, policy_version 230 (0.0556) [2024-10-06 11:37:14,973][00491] Fps is (10 sec: 819.5, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 942080. Throughput: 0: 213.4. Samples: 237376. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:37:14,975][00491] Avg episode reward: [(0, '4.373')] [2024-10-06 11:37:19,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 946176. Throughput: 0: 211.6. Samples: 238492. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:37:19,981][00491] Avg episode reward: [(0, '4.354')] [2024-10-06 11:37:24,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.3, 300 sec: 833.1). Total num frames: 950272. Throughput: 0: 209.0. Samples: 239904. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:37:24,982][00491] Avg episode reward: [(0, '4.423')] [2024-10-06 11:37:29,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 954368. Throughput: 0: 206.7. 
Samples: 240316. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-10-06 11:37:29,981][00491] Avg episode reward: [(0, '4.410')] [2024-10-06 11:37:34,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 958464. Throughput: 0: 221.4. Samples: 241974. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-10-06 11:37:34,977][00491] Avg episode reward: [(0, '4.417')] [2024-10-06 11:37:39,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 962560. Throughput: 0: 202.5. Samples: 242892. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:37:39,975][00491] Avg episode reward: [(0, '4.475')] [2024-10-06 11:37:44,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 966656. Throughput: 0: 209.1. Samples: 243560. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:37:44,976][00491] Avg episode reward: [(0, '4.541')] [2024-10-06 11:37:49,346][04742] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000238_974848.pth... [2024-10-06 11:37:49,463][04742] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000189_774144.pth [2024-10-06 11:37:49,973][00491] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 974848. Throughput: 0: 226.0. Samples: 245246. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:37:49,979][00491] Avg episode reward: [(0, '4.565')] [2024-10-06 11:37:54,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 974848. Throughput: 0: 211.6. Samples: 246326. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:37:54,976][00491] Avg episode reward: [(0, '4.605')] [2024-10-06 11:37:59,973][00491] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 978944. Throughput: 0: 207.4. Samples: 246708. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 11:37:59,979][00491] Avg episode reward: [(0, '4.594')] [2024-10-06 11:38:00,317][04755] Updated weights for policy 0, policy_version 240 (0.0544) [2024-10-06 11:38:04,973][00491] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 987136. Throughput: 0: 219.1. Samples: 248352. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-10-06 11:38:04,978][00491] Avg episode reward: [(0, '4.642')] [2024-10-06 11:38:09,973][00491] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 991232. Throughput: 0: 212.0. Samples: 249446. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-10-06 11:38:09,976][00491] Avg episode reward: [(0, '4.678')] [2024-10-06 11:38:14,973][00491] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 991232. Throughput: 0: 217.7. Samples: 250114. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-10-06 11:38:14,976][00491] Avg episode reward: [(0, '4.562')] [2024-10-06 11:38:19,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 999424. Throughput: 0: 211.0. Samples: 251468. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 11:38:19,978][00491] Avg episode reward: [(0, '4.622')] [2024-10-06 11:38:24,973][00491] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 1003520. Throughput: 0: 215.6. Samples: 252596. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-10-06 11:38:24,975][00491] Avg episode reward: [(0, '4.579')] [2024-10-06 11:38:29,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 1007616. Throughput: 0: 220.3. 
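Samples: 253474. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)

The paired "Saving .../checkpoint_..." and "Removing .../checkpoint_..." lines above show the periodic checkpoint rotation: a snapshot roughly every two minutes (save_every_sec=120) while only the two newest files are kept (keep_checkpoints=2). A small sketch of that rotation under those assumptions; the helper name is illustrative, not Sample Factory's API:

```python
from pathlib import Path

def rotate_checkpoints(ckpt_dir, keep=2, pattern="checkpoint_*.pth"):
    """Illustrative sketch of keep_checkpoints=2: after each periodic save,
    delete the oldest checkpoints beyond the limit. The zero-padded names
    (e.g. checkpoint_000000238_974848.pth) make lexicographic order match
    policy_version order."""
    checkpoints = sorted(Path(ckpt_dir).glob(pattern))
    for stale in checkpoints[:-keep]:
        # -> log: "Removing .../checkpoint_000000189_774144.pth"
        stale.unlink()
```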
[2024-10-06 11:38:29,980][00491] Avg episode reward: [(0, '4.629')] [2024-10-06 11:38:34,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 833.1). Total num frames: 1011712. Throughput: 0: 206.9. Samples: 254556. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-10-06 11:38:34,976][00491] Avg episode reward: [(0, '4.567')] [2024-10-06 11:38:39,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 1015808. Throughput: 0: 206.8. Samples: 255632. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-10-06 11:38:39,976][00491] Avg episode reward: [(0, '4.579')] [2024-10-06 11:38:44,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 833.1). Total num frames: 1019904. Throughput: 0: 220.5. Samples: 256632. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:38:44,980][00491] Avg episode reward: [(0, '4.552')] [2024-10-06 11:38:49,020][04755] Updated weights for policy 0, policy_version 250 (0.0971) [2024-10-06 11:38:49,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 1024000. Throughput: 0: 206.6. Samples: 257648. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:38:49,981][00491] Avg episode reward: [(0, '4.575')] [2024-10-06 11:38:52,393][04742] Signal inference workers to stop experience collection... (250 times) [2024-10-06 11:38:52,440][04755] InferenceWorker_p0-w0: stopping experience collection (250 times) [2024-10-06 11:38:53,833][04742] Signal inference workers to resume experience collection... (250 times) [2024-10-06 11:38:53,834][04755] InferenceWorker_p0-w0: resuming experience collection (250 times) [2024-10-06 11:38:54,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 833.1). Total num frames: 1028096. Throughput: 0: 207.5. Samples: 258782. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:38:54,977][00491] Avg episode reward: [(0, '4.661')] [2024-10-06 11:38:59,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 1032192. Throughput: 0: 213.0. Samples: 259700. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:38:59,975][00491] Avg episode reward: [(0, '4.665')] [2024-10-06 11:39:04,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 1036288. Throughput: 0: 209.2. Samples: 260880. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:39:04,978][00491] Avg episode reward: [(0, '4.760')] [2024-10-06 11:39:09,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 1040384. Throughput: 0: 205.9. Samples: 261860. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:39:09,982][00491] Avg episode reward: [(0, '4.714')] [2024-10-06 11:39:14,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 1044480. Throughput: 0: 206.8. Samples: 262778. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:39:14,975][00491] Avg episode reward: [(0, '4.603')] [2024-10-06 11:39:19,974][00491] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 1048576. Throughput: 0: 213.3. Samples: 264156. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:39:19,978][00491] Avg episode reward: [(0, '4.721')] [2024-10-06 11:39:24,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 1052672. Throughput: 0: 215.6. Samples: 265334.
Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:39:24,985][00491] Avg episode reward: [(0, '4.704')] [2024-10-06 11:39:29,973][00491] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 1056768. Throughput: 0: 205.0. Samples: 265858. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:39:29,981][00491] Avg episode reward: [(0, '4.630')] [2024-10-06 11:39:34,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 1060864. Throughput: 0: 217.1. Samples: 267416. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:39:34,983][00491] Avg episode reward: [(0, '4.627')] [2024-10-06 11:39:35,992][04755] Updated weights for policy 0, policy_version 260 (0.1003) [2024-10-06 11:39:39,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 1064960. Throughput: 0: 223.4. Samples: 268836. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:39:39,979][00491] Avg episode reward: [(0, '4.517')] [2024-10-06 11:39:44,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 1069056. Throughput: 0: 207.2. Samples: 269024. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:39:44,976][00491] Avg episode reward: [(0, '4.547')] [2024-10-06 11:39:46,817][04742] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000262_1073152.pth... [2024-10-06 11:39:46,941][04742] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000213_872448.pth [2024-10-06 11:39:49,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 1073152. Throughput: 0: 212.5. Samples: 270444. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:39:49,978][00491] Avg episode reward: [(0, '4.511')] [2024-10-06 11:39:54,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 1077248. Throughput: 0: 222.8. Samples: 271886. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:39:54,976][00491] Avg episode reward: [(0, '4.475')] [2024-10-06 11:39:59,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 1081344. Throughput: 0: 210.7. Samples: 272260. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:39:59,980][00491] Avg episode reward: [(0, '4.487')] [2024-10-06 11:40:04,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 1085440. Throughput: 0: 203.6. Samples: 273316. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:40:04,976][00491] Avg episode reward: [(0, '4.536')] [2024-10-06 11:40:09,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 1089536. Throughput: 0: 213.5. Samples: 274940. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:40:09,985][00491] Avg episode reward: [(0, '4.445')] [2024-10-06 11:40:14,975][00491] Fps is (10 sec: 1228.5, 60 sec: 887.4, 300 sec: 860.8). Total num frames: 1097728. Throughput: 0: 220.1. Samples: 275762. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 11:40:14,982][00491] Avg episode reward: [(0, '4.580')] [2024-10-06 11:40:19,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 1097728. Throughput: 0: 208.5. Samples: 276800. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 11:40:19,980][00491] Avg episode reward: [(0, '4.664')] [2024-10-06 11:40:24,740][04755] Updated weights for policy 0, policy_version 270 (0.0529) [2024-10-06 11:40:24,973][00491] Fps is (10 sec: 819.4, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 1105920. Throughput: 0: 203.8. Samples: 278006. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 11:40:24,976][00491] Avg episode reward: [(0, '4.729')] [2024-10-06 11:40:29,973][00491] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 1110016. Throughput: 0: 221.6. Samples: 278994. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 11:40:29,975][00491] Avg episode reward: [(0, '4.731')] [2024-10-06 11:40:34,976][00491] Fps is (10 sec: 818.9, 60 sec: 887.4, 300 sec: 860.8). Total num frames: 1114112. Throughput: 0: 213.4. Samples: 280046. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 11:40:34,983][00491] Avg episode reward: [(0, '4.781')] [2024-10-06 11:40:39,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 1118208. Throughput: 0: 205.5. Samples: 281134. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 11:40:39,976][00491] Avg episode reward: [(0, '4.876')] [2024-10-06 11:40:44,973][00491] Fps is (10 sec: 819.5, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 1122304. Throughput: 0: 218.4. Samples: 282090. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 11:40:44,976][00491] Avg episode reward: [(0, '4.869')] [2024-10-06 11:40:49,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 1126400. Throughput: 0: 220.2. Samples: 283224. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 11:40:49,984][00491] Avg episode reward: [(0, '4.801')] [2024-10-06 11:40:54,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 1130496. Throughput: 0: 207.5. Samples: 284276. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 11:40:54,978][00491] Avg episode reward: [(0, '4.758')] [2024-10-06 11:40:59,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 1134592. Throughput: 0: 205.7. Samples: 285018. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 11:40:59,979][00491] Avg episode reward: [(0, '4.784')] [2024-10-06 11:41:04,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 1138688. Throughput: 0: 211.9. Samples: 286334. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 11:41:04,976][00491] Avg episode reward: [(0, '4.656')] [2024-10-06 11:41:09,974][00491] Fps is (10 sec: 819.1, 60 sec: 887.4, 300 sec: 847.0). Total num frames: 1142784. Throughput: 0: 214.3. Samples: 287652. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-10-06 11:41:09,978][00491] Avg episode reward: [(0, '4.751')] [2024-10-06 11:41:14,238][04755] Updated weights for policy 0, policy_version 280 (0.1453) [2024-10-06 11:41:14,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 1146880. Throughput: 0: 207.2. Samples: 288318. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-10-06 11:41:14,983][00491] Avg episode reward: [(0, '4.774')] [2024-10-06 11:41:19,973][00491] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 1150976. Throughput: 0: 208.0. Samples: 289406. 
Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:41:19,978][00491] Avg episode reward: [(0, '4.817')] [2024-10-06 11:41:24,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 1155072. Throughput: 0: 220.0. Samples: 291032. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:41:24,976][00491] Avg episode reward: [(0, '4.744')] [2024-10-06 11:41:29,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 1159168. Throughput: 0: 207.5. Samples: 291426. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:41:29,981][00491] Avg episode reward: [(0, '4.780')] [2024-10-06 11:41:34,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 1163264. Throughput: 0: 204.3. Samples: 292418. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:41:34,975][00491] Avg episode reward: [(0, '4.744')] [2024-10-06 11:41:39,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 1167360. Throughput: 0: 217.0. Samples: 294040. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-10-06 11:41:39,982][00491] Avg episode reward: [(0, '4.845')] [2024-10-06 11:41:44,974][00491] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 1171456. Throughput: 0: 210.3. Samples: 294482. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-10-06 11:41:44,978][00491] Avg episode reward: [(0, '4.881')] [2024-10-06 11:41:47,570][04742] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000287_1175552.pth... [2024-10-06 11:41:47,731][04742] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000238_974848.pth [2024-10-06 11:41:49,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 1175552. Throughput: 0: 206.3. Samples: 295616. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-10-06 11:41:49,977][00491] Avg episode reward: [(0, '4.963')] [2024-10-06 11:41:54,973][00491] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 1179648. Throughput: 0: 203.5. Samples: 296810. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-10-06 11:41:54,981][00491] Avg episode reward: [(0, '5.048')] [2024-10-06 11:41:59,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 1183744. Throughput: 0: 205.9. Samples: 297582. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 11:41:59,982][00491] Avg episode reward: [(0, '4.919')] [2024-10-06 11:42:01,671][04755] Updated weights for policy 0, policy_version 290 (0.2207) [2024-10-06 11:42:04,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 1187840. Throughput: 0: 209.8. Samples: 298848. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 11:42:04,980][00491] Avg episode reward: [(0, '4.763')] [2024-10-06 11:42:09,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 1191936. Throughput: 0: 200.4. Samples: 300052. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-10-06 11:42:09,976][00491] Avg episode reward: [(0, '4.871')] [2024-10-06 11:42:14,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 1196032. Throughput: 0: 205.7. Samples: 300684. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-10-06 11:42:14,976][00491] Avg episode reward: [(0, '4.901')] [2024-10-06 11:42:19,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 1200128. Throughput: 0: 219.3. 
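Samples: 302286. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)

The 10-second FPS readings keep landing on exactly 409.6, 819.2, or 1228.8 because the frame counter only advances in whole training batches. Assuming this run's settings (8 workers × 4 envs = 32 environments, rollouts of 32 steps, frameskip of 4), one batch accounts for 4096 environment frames, so a 10-second window containing one, two, or three batches yields precisely those figures:

```python
# One training batch, under the assumed settings:
envs = 8 * 4                       # 32 parallel environments
frames_per_batch = envs * 32 * 4   # 32 envs * 32 steps * frameskip 4 = 4096

# FPS over a 10-second window depends only on how many batches landed in it:
for batches_in_window in (1, 2, 3):
    print(batches_in_window * frames_per_batch / 10)  # 409.6, 819.2, 1228.8
```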
[2024-10-06 11:42:19,981][00491] Avg episode reward: [(0, '4.808')] [2024-10-06 11:42:24,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 1204224. Throughput: 0: 207.3. Samples: 303370. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 11:42:24,982][00491] Avg episode reward: [(0, '4.859')] [2024-10-06 11:42:29,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 1208320. Throughput: 0: 206.4. Samples: 303768. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 11:42:29,982][00491] Avg episode reward: [(0, '4.882')] [2024-10-06 11:42:34,973][00491] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 1216512. Throughput: 0: 221.4. Samples: 305578. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 11:42:34,976][00491] Avg episode reward: [(0, '4.873')] [2024-10-06 11:42:39,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 1216512. Throughput: 0: 219.3. Samples: 306680. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 11:42:39,978][00491] Avg episode reward: [(0, '4.824')] [2024-10-06 11:42:44,973][00491] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 1220608. Throughput: 0: 210.6. Samples: 307058. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 11:42:44,975][00491] Avg episode reward: [(0, '4.755')] [2024-10-06 11:42:49,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 1224704. Throughput: 0: 212.0. Samples: 308390. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 11:42:49,981][00491] Avg episode reward: [(0, '4.861')] [2024-10-06 11:42:50,219][04755] Updated weights for policy 0, policy_version 300 (0.1444) [2024-10-06 11:42:52,602][04742] Signal inference workers to stop experience collection... (300 times) [2024-10-06 11:42:52,663][04755] InferenceWorker_p0-w0: stopping experience collection (300 times) [2024-10-06 11:42:54,219][04742] Signal inference workers to resume experience collection... (300 times) [2024-10-06 11:42:54,223][04755] InferenceWorker_p0-w0: resuming experience collection (300 times) [2024-10-06 11:42:54,973][00491] Fps is (10 sec: 1228.7, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 1232896. Throughput: 0: 216.0. Samples: 309772. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 11:42:54,981][00491] Avg episode reward: [(0, '4.835')] [2024-10-06 11:42:59,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 1232896. Throughput: 0: 214.8. Samples: 310348. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 11:42:59,976][00491] Avg episode reward: [(0, '4.872')] [2024-10-06 11:43:04,973][00491] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 1236992. Throughput: 0: 202.0. Samples: 311376. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 11:43:04,976][00491] Avg episode reward: [(0, '4.797')] [2024-10-06 11:43:09,973][00491] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 1245184. Throughput: 0: 211.3. Samples: 312880. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 11:43:09,976][00491] Avg episode reward: [(0, '4.917')] [2024-10-06 11:43:14,976][00491] Fps is (10 sec: 1228.4, 60 sec: 887.4, 300 sec: 847.0). Total num frames: 1249280. Throughput: 0: 223.4. Samples: 313820.
Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 11:43:14,983][00491] Avg episode reward: [(0, '4.852')] [2024-10-06 11:43:19,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 1253376. Throughput: 0: 205.4. Samples: 314822. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 11:43:19,976][00491] Avg episode reward: [(0, '4.747')] [2024-10-06 11:43:24,974][00491] Fps is (10 sec: 819.4, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 1257472. Throughput: 0: 204.3. Samples: 315872. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 11:43:24,981][00491] Avg episode reward: [(0, '4.834')] [2024-10-06 11:43:29,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 1261568. Throughput: 0: 216.4. Samples: 316794. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 11:43:29,980][00491] Avg episode reward: [(0, '4.851')] [2024-10-06 11:43:34,975][00491] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 1265664. Throughput: 0: 210.7. Samples: 317870. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 11:43:34,979][00491] Avg episode reward: [(0, '4.850')] [2024-10-06 11:43:39,973][00491] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 1265664. Throughput: 0: 203.9. Samples: 318946. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 11:43:39,976][00491] Avg episode reward: [(0, '4.785')] [2024-10-06 11:43:40,335][04755] Updated weights for policy 0, policy_version 310 (0.2060) [2024-10-06 11:43:44,973][00491] Fps is (10 sec: 819.4, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 1273856. Throughput: 0: 206.5. Samples: 319640. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-10-06 11:43:44,982][00491] Avg episode reward: [(0, '4.811')] [2024-10-06 11:43:48,665][04742] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000312_1277952.pth... [2024-10-06 11:43:48,782][04742] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000262_1073152.pth [2024-10-06 11:43:49,973][00491] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 1277952. Throughput: 0: 213.3. Samples: 320974. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-10-06 11:43:49,976][00491] Avg episode reward: [(0, '4.733')] [2024-10-06 11:43:54,974][00491] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 1282048. Throughput: 0: 201.2. Samples: 321934. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-10-06 11:43:54,982][00491] Avg episode reward: [(0, '4.730')] [2024-10-06 11:43:59,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 1286144. Throughput: 0: 193.9. Samples: 322544. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-10-06 11:43:59,979][00491] Avg episode reward: [(0, '4.812')] [2024-10-06 11:44:04,973][00491] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 1290240. Throughput: 0: 204.8. Samples: 324036. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-10-06 11:44:04,982][00491] Avg episode reward: [(0, '4.737')] [2024-10-06 11:44:09,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 1294336. Throughput: 0: 209.2. Samples: 325286. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-10-06 11:44:09,980][00491] Avg episode reward: [(0, '4.812')] [2024-10-06 11:44:14,976][00491] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 1298432. Throughput: 0: 203.1. 
Samples: 325934. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-10-06 11:44:14,984][00491] Avg episode reward: [(0, '4.842')] [2024-10-06 11:44:19,973][00491] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 833.1). Total num frames: 1298432. Throughput: 0: 203.5. Samples: 327026. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-10-06 11:44:19,985][00491] Avg episode reward: [(0, '4.845')] [2024-10-06 11:44:24,979][00491] Fps is (10 sec: 409.5, 60 sec: 750.9, 300 sec: 833.1). Total num frames: 1302528. Throughput: 0: 194.4. Samples: 327694. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-10-06 11:44:24,995][00491] Avg episode reward: [(0, '4.806')] [2024-10-06 11:44:29,973][00491] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 833.1). Total num frames: 1306624. Throughput: 0: 188.0. Samples: 328100. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 11:44:29,978][00491] Avg episode reward: [(0, '4.836')] [2024-10-06 11:44:34,973][00491] Fps is (10 sec: 409.8, 60 sec: 682.7, 300 sec: 819.2). Total num frames: 1306624. Throughput: 0: 176.7. Samples: 328926. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 11:44:34,975][00491] Avg episode reward: [(0, '4.829')] [2024-10-06 11:44:35,772][04755] Updated weights for policy 0, policy_version 320 (0.1530) [2024-10-06 11:44:39,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 1314816. Throughput: 0: 182.0. Samples: 330124. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-10-06 11:44:39,980][00491] Avg episode reward: [(0, '4.878')] [2024-10-06 11:44:44,977][00491] Fps is (10 sec: 1228.2, 60 sec: 750.9, 300 sec: 833.1). Total num frames: 1318912. Throughput: 0: 190.4. Samples: 331114. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-10-06 11:44:44,980][00491] Avg episode reward: [(0, '4.786')] [2024-10-06 11:44:49,973][00491] Fps is (10 sec: 409.6, 60 sec: 682.7, 300 sec: 819.2). Total num frames: 1318912. Throughput: 0: 179.6. Samples: 332120. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-10-06 11:44:49,976][00491] Avg episode reward: [(0, '4.819')] [2024-10-06 11:44:54,973][00491] Fps is (10 sec: 409.8, 60 sec: 682.7, 300 sec: 819.2). Total num frames: 1323008. Throughput: 0: 176.1. Samples: 333212. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-10-06 11:44:54,981][00491] Avg episode reward: [(0, '4.870')] [2024-10-06 11:44:59,973][00491] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 819.2). Total num frames: 1327104. Throughput: 0: 174.6. Samples: 333792. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-10-06 11:44:59,984][00491] Avg episode reward: [(0, '4.797')] [2024-10-06 11:45:04,973][00491] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 819.2). Total num frames: 1331200. Throughput: 0: 178.0. Samples: 335034. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-10-06 11:45:04,977][00491] Avg episode reward: [(0, '4.699')] [2024-10-06 11:45:09,973][00491] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 805.3). Total num frames: 1335296. Throughput: 0: 183.9. Samples: 335968. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-10-06 11:45:09,980][00491] Avg episode reward: [(0, '4.644')] [2024-10-06 11:45:14,973][00491] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 819.2). Total num frames: 1339392. Throughput: 0: 183.0. Samples: 336336. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-10-06 11:45:14,976][00491] Avg episode reward: [(0, '4.594')] [2024-10-06 11:45:19,973][00491] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 1343488. 
Throughput: 0: 203.3. Samples: 338074. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-10-06 11:45:19,981][00491] Avg episode reward: [(0, '4.657')] [2024-10-06 11:45:24,973][00491] Fps is (10 sec: 819.2, 60 sec: 751.0, 300 sec: 805.3). Total num frames: 1347584. Throughput: 0: 198.8. Samples: 339072. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 11:45:24,978][00491] Avg episode reward: [(0, '4.775')] [2024-10-06 11:45:26,824][04755] Updated weights for policy 0, policy_version 330 (0.1592) [2024-10-06 11:45:29,973][00491] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 1351680. Throughput: 0: 185.9. Samples: 339478. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 11:45:29,976][00491] Avg episode reward: [(0, '4.742')] [2024-10-06 11:45:34,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 1355776. Throughput: 0: 191.1. Samples: 340720. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-10-06 11:45:34,976][00491] Avg episode reward: [(0, '4.759')] [2024-10-06 11:45:39,973][00491] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 1359872. Throughput: 0: 205.5. Samples: 342458. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-10-06 11:45:39,976][00491] Avg episode reward: [(0, '4.893')] [2024-10-06 11:45:44,977][00491] Fps is (10 sec: 818.8, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 1363968. Throughput: 0: 198.8. Samples: 342738. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-10-06 11:45:44,982][00491] Avg episode reward: [(0, '4.819')] [2024-10-06 11:45:47,410][04742] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000334_1368064.pth... [2024-10-06 11:45:47,497][04742] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000287_1175552.pth [2024-10-06 11:45:49,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 1368064. Throughput: 0: 193.4. Samples: 343738. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-10-06 11:45:49,976][00491] Avg episode reward: [(0, '4.823')] [2024-10-06 11:45:54,973][00491] Fps is (10 sec: 819.6, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 1372160. Throughput: 0: 210.7. Samples: 345450. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-10-06 11:45:54,980][00491] Avg episode reward: [(0, '4.938')] [2024-10-06 11:45:59,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 1376256. Throughput: 0: 214.4. Samples: 345986. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-10-06 11:45:59,980][00491] Avg episode reward: [(0, '4.876')] [2024-10-06 11:46:04,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 1380352. Throughput: 0: 199.2. Samples: 347036. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 11:46:04,981][00491] Avg episode reward: [(0, '4.935')] [2024-10-06 11:46:09,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 1384448. Throughput: 0: 206.1. Samples: 348346. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 11:46:09,976][00491] Avg episode reward: [(0, '4.874')] [2024-10-06 11:46:14,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 1388544. Throughput: 0: 205.3. Samples: 348718. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 11:46:14,983][00491] Avg episode reward: [(0, '4.834')] [2024-10-06 11:46:15,942][04755] Updated weights for policy 0, policy_version 340 (0.1075) [2024-10-06 11:46:19,973][00491] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 1392640. Throughput: 0: 211.3. Samples: 350230. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 11:46:19,980][00491] Avg episode reward: [(0, '4.854')] [2024-10-06 11:46:24,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 1396736. Throughput: 0: 192.7. Samples: 351128. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 11:46:24,982][00491] Avg episode reward: [(0, '4.828')] [2024-10-06 11:46:29,973][00491] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 1400832. Throughput: 0: 200.1. Samples: 351742. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 11:46:29,984][00491] Avg episode reward: [(0, '4.882')] [2024-10-06 11:46:34,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 1404928. Throughput: 0: 213.8. Samples: 353360. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 11:46:34,976][00491] Avg episode reward: [(0, '4.862')] [2024-10-06 11:46:39,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 1409024. Throughput: 0: 197.2. Samples: 354326. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 11:46:39,981][00491] Avg episode reward: [(0, '4.971')] [2024-10-06 11:46:44,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.3, 300 sec: 805.3). Total num frames: 1413120. Throughput: 0: 196.5. Samples: 354828. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 11:46:44,975][00491] Avg episode reward: [(0, '4.935')] [2024-10-06 11:46:49,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 1417216. Throughput: 0: 203.1. Samples: 356174. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 11:46:49,983][00491] Avg episode reward: [(0, '5.008')] [2024-10-06 11:46:54,983][00491] Fps is (10 sec: 818.4, 60 sec: 819.1, 300 sec: 805.3). Total num frames: 1421312. Throughput: 0: 206.8. Samples: 357652. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 11:46:54,991][00491] Avg episode reward: [(0, '4.944')] [2024-10-06 11:46:59,975][00491] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 1425408. Throughput: 0: 204.3. Samples: 357910. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 11:46:59,983][00491] Avg episode reward: [(0, '4.931')] [2024-10-06 11:47:04,973][00491] Fps is (10 sec: 820.0, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 1429504. Throughput: 0: 193.4. Samples: 358934. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 11:47:04,976][00491] Avg episode reward: [(0, '5.023')] [2024-10-06 11:47:07,936][04755] Updated weights for policy 0, policy_version 350 (0.0054) [2024-10-06 11:47:09,973][00491] Fps is (10 sec: 819.4, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 1433600. Throughput: 0: 206.4. Samples: 360416. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 11:47:09,983][00491] Avg episode reward: [(0, '4.929')] [2024-10-06 11:47:10,793][04742] Signal inference workers to stop experience collection... (350 times) [2024-10-06 11:47:10,839][04755] InferenceWorker_p0-w0: stopping experience collection (350 times) [2024-10-06 11:47:13,164][04742] Signal inference workers to resume experience collection... 
(350 times) [2024-10-06 11:47:13,165][04755] InferenceWorker_p0-w0: resuming experience collection (350 times) [2024-10-06 11:47:14,974][00491] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 1437696. Throughput: 0: 206.3. Samples: 361026. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 11:47:14,979][00491] Avg episode reward: [(0, '4.936')] [2024-10-06 11:47:19,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 1441792. Throughput: 0: 192.0. Samples: 361998. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 11:47:19,976][00491] Avg episode reward: [(0, '4.876')] [2024-10-06 11:47:24,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 1445888. Throughput: 0: 197.4. Samples: 363210. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-10-06 11:47:24,978][00491] Avg episode reward: [(0, '4.843')] [2024-10-06 11:47:29,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 1449984. Throughput: 0: 205.0. Samples: 364054. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-10-06 11:47:29,978][00491] Avg episode reward: [(0, '4.853')] [2024-10-06 11:47:34,975][00491] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 1454080. Throughput: 0: 198.3. Samples: 365096. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 11:47:34,986][00491] Avg episode reward: [(0, '4.787')] [2024-10-06 11:47:39,973][00491] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 1454080. Throughput: 0: 188.3. Samples: 366122. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 11:47:39,983][00491] Avg episode reward: [(0, '4.839')] [2024-10-06 11:47:44,973][00491] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 1462272. Throughput: 0: 200.6. Samples: 366938. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 11:47:44,986][00491] Avg episode reward: [(0, '4.931')] [2024-10-06 11:47:49,401][04742] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000358_1466368.pth... [2024-10-06 11:47:49,542][04742] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000312_1277952.pth [2024-10-06 11:47:49,979][00491] Fps is (10 sec: 1228.1, 60 sec: 819.1, 300 sec: 791.4). Total num frames: 1466368. Throughput: 0: 205.5. Samples: 368182. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 11:47:49,988][00491] Avg episode reward: [(0, '4.923')] [2024-10-06 11:47:54,973][00491] Fps is (10 sec: 409.6, 60 sec: 751.1, 300 sec: 791.4). Total num frames: 1466368. Throughput: 0: 194.1. Samples: 369150. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 11:47:54,980][00491] Avg episode reward: [(0, '4.946')] [2024-10-06 11:47:59,973][00491] Fps is (10 sec: 409.8, 60 sec: 751.0, 300 sec: 791.4). Total num frames: 1470464. Throughput: 0: 186.4. Samples: 369412. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:47:59,983][00491] Avg episode reward: [(0, '5.021')] [2024-10-06 11:48:01,298][04755] Updated weights for policy 0, policy_version 360 (0.1528) [2024-10-06 11:48:04,973][00491] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 1474560. Throughput: 0: 199.7. Samples: 370984. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:48:04,982][00491] Avg episode reward: [(0, '5.178')] [2024-10-06 11:48:09,973][00491] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 777.6). Total num frames: 1478656. Throughput: 0: 201.0. 
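
The "Signal inference workers to stop/resume experience collection" pairs above are the async learner applying backpressure: when rollout data queues up faster than training consumes it, collection pauses briefly and resumes once the learner catches up. A rough sketch of such a handshake with multiprocessing events; this is illustrative only, not Sample Factory's actual mechanism:

    import multiprocessing as mp
    import time

    def rollout_worker(collect: mp.Event, shutdown: mp.Event):
        while not shutdown.is_set():
            collect.wait(timeout=1.0)   # park here while collection is paused
            if collect.is_set():
                time.sleep(0.01)        # stand-in for stepping the environments

    if __name__ == "__main__":
        collect, shutdown = mp.Event(), mp.Event()
        collect.set()
        worker = mp.Process(target=rollout_worker, args=(collect, shutdown))
        worker.start()
        collect.clear()                 # "stopping experience collection"
        time.sleep(0.1)                 # learner drains its queued batches
        collect.set()                   # "resuming experience collection"
        time.sleep(0.1)
        shutdown.set()
        worker.join()
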
Samples: 372256. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 11:48:09,976][00491] Avg episode reward: [(0, '5.116')] [2024-10-06 11:48:10,389][04742] Saving new best policy, reward=5.178! [2024-10-06 11:48:14,973][00491] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 1482752. Throughput: 0: 191.3. Samples: 372662. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 11:48:14,978][00491] Avg episode reward: [(0, '5.188')] [2024-10-06 11:48:19,973][00491] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 1486848. Throughput: 0: 197.3. Samples: 373976. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 11:48:19,982][00491] Avg episode reward: [(0, '5.253')] [2024-10-06 11:48:20,493][04742] Saving new best policy, reward=5.188! [2024-10-06 11:48:24,655][04742] Saving new best policy, reward=5.253! [2024-10-06 11:48:24,973][00491] Fps is (10 sec: 1228.7, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 1495040. Throughput: 0: 205.0. Samples: 375348. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 11:48:24,977][00491] Avg episode reward: [(0, '5.321')] [2024-10-06 11:48:29,973][00491] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 777.6). Total num frames: 1495040. Throughput: 0: 201.4. Samples: 376000. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 11:48:29,976][00491] Avg episode reward: [(0, '5.262')] [2024-10-06 11:48:30,729][04742] Saving new best policy, reward=5.321! [2024-10-06 11:48:34,973][00491] Fps is (10 sec: 409.6, 60 sec: 751.0, 300 sec: 791.4). Total num frames: 1499136. Throughput: 0: 193.4. Samples: 376882. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 11:48:34,985][00491] Avg episode reward: [(0, '5.112')] [2024-10-06 11:48:39,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 777.5). Total num frames: 1503232. Throughput: 0: 204.9. Samples: 378370. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 11:48:39,982][00491] Avg episode reward: [(0, '5.138')] [2024-10-06 11:48:44,973][00491] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 1511424. Throughput: 0: 217.2. Samples: 379186. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 11:48:44,980][00491] Avg episode reward: [(0, '5.180')] [2024-10-06 11:48:49,973][00491] Fps is (10 sec: 819.2, 60 sec: 751.0, 300 sec: 777.5). Total num frames: 1511424. Throughput: 0: 205.0. Samples: 380210. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 11:48:49,976][00491] Avg episode reward: [(0, '5.131')] [2024-10-06 11:48:51,229][04755] Updated weights for policy 0, policy_version 370 (0.0517) [2024-10-06 11:48:54,973][00491] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 777.5). Total num frames: 1515520. Throughput: 0: 205.2. Samples: 381488. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 11:48:54,988][00491] Avg episode reward: [(0, '5.089')] [2024-10-06 11:48:59,973][00491] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 791.4). Total num frames: 1523712. Throughput: 0: 209.6. Samples: 382094. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 11:48:59,978][00491] Avg episode reward: [(0, '5.184')] [2024-10-06 11:49:04,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 777.5). Total num frames: 1523712. Throughput: 0: 210.4. Samples: 383446. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 11:49:04,977][00491] Avg episode reward: [(0, '5.207')] [2024-10-06 11:49:09,973][00491] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 777.6). 
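
The "Saving new best policy, reward=…!" entries fire whenever the reported average episode reward exceeds the best value seen so far, snapshotting the weights separately from the rotating checkpoints. A compact sketch of that tracking logic; the class, threshold, and seed value are hypothetical, not the learner's actual code:

    class BestPolicyTracker:
        """Save a snapshot whenever the average reward beats the previous best."""

        def __init__(self, save_fn, min_improvement=1e-6):
            self.save_fn = save_fn          # e.g. lambda r: torch.save(state, path)
            self.min_improvement = min_improvement
            self.best = None

        def update(self, avg_reward):
            if self.best is None or avg_reward > self.best + self.min_improvement:
                self.best = avg_reward
                self.save_fn(avg_reward)    # -> "Saving new best policy, reward=…!"
                return True
            return False

    tracker = BestPolicyTracker(lambda r: print(f"Saving new best policy, reward={r}!"))
    tracker.best = 5.116                      # assume this was the best so far
    for r in (5.178, 5.188, 5.180, 5.253):    # rewards as reported above
        tracker.update(r)                     # saves fire at 5.178, 5.188, 5.253
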
Total num frames: 1527808. Throughput: 0: 200.4. Samples: 384364. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 11:49:09,976][00491] Avg episode reward: [(0, '5.342')] [2024-10-06 11:49:14,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 1531904. Throughput: 0: 194.8. Samples: 384764. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 11:49:14,981][00491] Avg episode reward: [(0, '5.336')] [2024-10-06 11:49:15,398][04742] Saving new best policy, reward=5.342! [2024-10-06 11:49:19,973][00491] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 805.3). Total num frames: 1540096. Throughput: 0: 215.1. Samples: 386562. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 11:49:19,978][00491] Avg episode reward: [(0, '5.252')] [2024-10-06 11:49:24,973][00491] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 1540096. Throughput: 0: 204.1. Samples: 387556. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 11:49:24,978][00491] Avg episode reward: [(0, '5.439')] [2024-10-06 11:49:29,976][00491] Fps is (10 sec: 409.5, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 1544192. Throughput: 0: 193.9. Samples: 387912. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 11:49:29,979][00491] Avg episode reward: [(0, '5.571')] [2024-10-06 11:49:30,490][04742] Saving new best policy, reward=5.439! [2024-10-06 11:49:34,602][04742] Saving new best policy, reward=5.571! [2024-10-06 11:49:34,973][00491] Fps is (10 sec: 1228.9, 60 sec: 887.5, 300 sec: 805.3). Total num frames: 1552384. Throughput: 0: 209.6. Samples: 389644. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 11:49:34,978][00491] Avg episode reward: [(0, '5.545')] [2024-10-06 11:49:39,240][04755] Updated weights for policy 0, policy_version 380 (0.1043) [2024-10-06 11:49:39,973][00491] Fps is (10 sec: 1229.2, 60 sec: 887.5, 300 sec: 805.3). Total num frames: 1556480. Throughput: 0: 202.8. Samples: 390616. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 11:49:39,982][00491] Avg episode reward: [(0, '5.620')] [2024-10-06 11:49:44,973][00491] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 1556480. Throughput: 0: 205.1. Samples: 391324. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 11:49:44,976][00491] Avg episode reward: [(0, '5.632')] [2024-10-06 11:49:45,678][04742] Saving new best policy, reward=5.620! [2024-10-06 11:49:49,897][04742] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000382_1564672.pth... [2024-10-06 11:49:49,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 819.2). Total num frames: 1564672. Throughput: 0: 203.8. Samples: 392618. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 11:49:49,983][00491] Avg episode reward: [(0, '5.565')] [2024-10-06 11:49:50,027][04742] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000334_1368064.pth [2024-10-06 11:49:50,046][04742] Saving new best policy, reward=5.632! [2024-10-06 11:49:54,973][00491] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 819.2). Total num frames: 1568768. Throughput: 0: 208.0. Samples: 393724. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 11:49:54,975][00491] Avg episode reward: [(0, '5.519')] [2024-10-06 11:49:59,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1572864. Throughput: 0: 219.1. Samples: 394622. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 11:49:59,976][00491] Avg episode reward: [(0, '5.565')] [2024-10-06 11:50:04,973][00491] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 1572864. Throughput: 0: 196.8. Samples: 395418. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 11:50:04,976][00491] Avg episode reward: [(0, '5.414')] [2024-10-06 11:50:09,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 819.2). Total num frames: 1581056. Throughput: 0: 204.9. Samples: 396776. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 11:50:09,981][00491] Avg episode reward: [(0, '5.405')] [2024-10-06 11:50:14,977][00491] Fps is (10 sec: 1228.3, 60 sec: 887.4, 300 sec: 819.2). Total num frames: 1585152. Throughput: 0: 220.1. Samples: 397818. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-10-06 11:50:14,981][00491] Avg episode reward: [(0, '5.330')] [2024-10-06 11:50:19,973][00491] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 1585152. Throughput: 0: 202.8. Samples: 398768. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-10-06 11:50:19,976][00491] Avg episode reward: [(0, '5.408')] [2024-10-06 11:50:24,973][00491] Fps is (10 sec: 819.5, 60 sec: 887.5, 300 sec: 819.2). Total num frames: 1593344. Throughput: 0: 204.4. Samples: 399812. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 11:50:24,976][00491] Avg episode reward: [(0, '5.334')] [2024-10-06 11:50:29,442][04755] Updated weights for policy 0, policy_version 390 (0.0560) [2024-10-06 11:50:29,973][00491] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 819.2). Total num frames: 1597440. Throughput: 0: 209.4. Samples: 400748. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 11:50:29,975][00491] Avg episode reward: [(0, '5.482')] [2024-10-06 11:50:34,974][00491] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1601536. Throughput: 0: 206.3. Samples: 401900. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-10-06 11:50:34,977][00491] Avg episode reward: [(0, '5.376')] [2024-10-06 11:50:39,973][00491] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 1601536. Throughput: 0: 205.4. Samples: 402968. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-10-06 11:50:39,976][00491] Avg episode reward: [(0, '5.403')] [2024-10-06 11:50:44,973][00491] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 819.2). Total num frames: 1609728. Throughput: 0: 199.4. Samples: 403596. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 11:50:44,976][00491] Avg episode reward: [(0, '5.489')] [2024-10-06 11:50:49,973][00491] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1613824. Throughput: 0: 212.1. Samples: 404964. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 11:50:49,975][00491] Avg episode reward: [(0, '5.560')] [2024-10-06 11:50:54,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1617920. Throughput: 0: 204.1. Samples: 405960. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 11:50:54,982][00491] Avg episode reward: [(0, '5.579')] [2024-10-06 11:50:59,973][00491] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 1617920. Throughput: 0: 193.3. Samples: 406516. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 11:50:59,976][00491] Avg episode reward: [(0, '5.537')] [2024-10-06 11:51:04,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 819.2). Total num frames: 1626112. 
Throughput: 0: 206.4. Samples: 408058. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-10-06 11:51:04,976][00491] Avg episode reward: [(0, '5.536')] [2024-10-06 11:51:09,973][00491] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1630208. Throughput: 0: 209.9. Samples: 409256. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-10-06 11:51:09,977][00491] Avg episode reward: [(0, '5.519')] [2024-10-06 11:51:14,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.3, 300 sec: 819.2). Total num frames: 1634304. Throughput: 0: 203.9. Samples: 409922. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 11:51:14,978][00491] Avg episode reward: [(0, '5.560')] [2024-10-06 11:51:19,936][04755] Updated weights for policy 0, policy_version 400 (0.1022) [2024-10-06 11:51:19,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 819.2). Total num frames: 1638400. Throughput: 0: 203.5. Samples: 411056. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 11:51:19,977][00491] Avg episode reward: [(0, '5.472')] [2024-10-06 11:51:22,414][04742] Signal inference workers to stop experience collection... (400 times) [2024-10-06 11:51:22,487][04755] InferenceWorker_p0-w0: stopping experience collection (400 times) [2024-10-06 11:51:23,620][04742] Signal inference workers to resume experience collection... (400 times) [2024-10-06 11:51:23,622][04755] InferenceWorker_p0-w0: resuming experience collection (400 times) [2024-10-06 11:51:24,973][00491] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1642496. Throughput: 0: 203.2. Samples: 412112. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 11:51:24,980][00491] Avg episode reward: [(0, '5.464')] [2024-10-06 11:51:29,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1646592. Throughput: 0: 211.6. Samples: 413116. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 11:51:29,983][00491] Avg episode reward: [(0, '5.455')] [2024-10-06 11:51:34,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1650688. Throughput: 0: 203.8. Samples: 414136. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-10-06 11:51:34,977][00491] Avg episode reward: [(0, '5.340')] [2024-10-06 11:51:39,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 819.2). Total num frames: 1654784. Throughput: 0: 208.5. Samples: 415344. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-10-06 11:51:39,976][00491] Avg episode reward: [(0, '5.328')] [2024-10-06 11:51:44,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1658880. Throughput: 0: 217.5. Samples: 416302. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 11:51:44,976][00491] Avg episode reward: [(0, '5.208')] [2024-10-06 11:51:48,916][04742] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000406_1662976.pth... [2024-10-06 11:51:49,035][04742] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000358_1466368.pth [2024-10-06 11:51:49,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1662976. Throughput: 0: 206.5. Samples: 417352. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 11:51:49,980][00491] Avg episode reward: [(0, '5.243')] [2024-10-06 11:51:54,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1667072. Throughput: 0: 203.4. Samples: 418410. 
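
The paired "Saving …/checkpoint_…" and "Removing …/checkpoint_…" entries implement rotating retention: a fresh checkpoint is written about every two minutes here and the oldest is pruned so only the newest two remain, with best-policy snapshots kept separately. A sketch under that assumption; the filename pattern checkpoint_{version:09d}_{frames}.pth is inferred from the names in the log:

    from pathlib import Path
    import torch

    def save_rotating_checkpoint(state, ckpt_dir, policy_version, env_frames, keep=2):
        """Write checkpoint_{version:09d}_{frames}.pth, then prune beyond `keep`."""
        ckpt_dir = Path(ckpt_dir)
        ckpt_dir.mkdir(parents=True, exist_ok=True)
        path = ckpt_dir / f"checkpoint_{policy_version:09d}_{env_frames}.pth"
        torch.save(state, path)
        # zero-padded versions make lexicographic order match version order
        for old in sorted(ckpt_dir.glob("checkpoint_*.pth"))[:-keep]:
            old.unlink()   # -> the "Removing …" entries above
        return path
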
Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 11:51:54,980][00491] Avg episode reward: [(0, '5.334')] [2024-10-06 11:51:59,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 819.2). Total num frames: 1671168. Throughput: 0: 211.7. Samples: 419450. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 11:51:59,975][00491] Avg episode reward: [(0, '5.457')] [2024-10-06 11:52:04,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1675264. Throughput: 0: 212.9. Samples: 420636. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-10-06 11:52:04,977][00491] Avg episode reward: [(0, '5.428')] [2024-10-06 11:52:08,627][04755] Updated weights for policy 0, policy_version 410 (0.0549) [2024-10-06 11:52:09,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1679360. Throughput: 0: 210.8. Samples: 421596. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-10-06 11:52:09,984][00491] Avg episode reward: [(0, '5.545')] [2024-10-06 11:52:14,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1683456. Throughput: 0: 208.4. Samples: 422494. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:52:14,975][00491] Avg episode reward: [(0, '5.591')] [2024-10-06 11:52:19,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1687552. Throughput: 0: 210.1. Samples: 423592. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:52:19,975][00491] Avg episode reward: [(0, '5.880')] [2024-10-06 11:52:22,866][04742] Saving new best policy, reward=5.880! [2024-10-06 11:52:24,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1691648. Throughput: 0: 211.4. Samples: 424856. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-10-06 11:52:24,980][00491] Avg episode reward: [(0, '5.932')] [2024-10-06 11:52:29,162][04742] Saving new best policy, reward=5.932! [2024-10-06 11:52:29,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1695744. Throughput: 0: 205.0. Samples: 425526. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-10-06 11:52:29,982][00491] Avg episode reward: [(0, '6.021')] [2024-10-06 11:52:33,420][04742] Saving new best policy, reward=6.021! [2024-10-06 11:52:34,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 1699840. Throughput: 0: 205.6. Samples: 426606. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:52:34,982][00491] Avg episode reward: [(0, '5.997')] [2024-10-06 11:52:39,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1703936. Throughput: 0: 212.8. Samples: 427986. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:52:39,980][00491] Avg episode reward: [(0, '6.261')] [2024-10-06 11:52:43,704][04742] Saving new best policy, reward=6.261! [2024-10-06 11:52:44,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1708032. Throughput: 0: 204.2. Samples: 428640. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:52:44,982][00491] Avg episode reward: [(0, '6.682')] [2024-10-06 11:52:49,216][04742] Saving new best policy, reward=6.682! [2024-10-06 11:52:49,974][00491] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 1712128. Throughput: 0: 201.5. Samples: 429706. 
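
"Policy #0 lag: (min/avg/max)" measures staleness: the gap, in policy versions, between the learner's newest weights and the version that produced each trajectory still in flight. With weight updates landing regularly, the lag in this run stays between 1 and 3. A one-line illustration of the accounting (not the library's own implementation):

    def lag_stats(learner_version, behaviour_versions):
        """Version gap between the learner and the policies that produced the data."""
        lags = [learner_version - v for v in behaviour_versions]
        return {"min": min(lags), "avg": sum(lags) / len(lags), "max": max(lags)}

    # e.g. learner at version 410, in-flight rollouts from versions 407-409:
    print(lag_stats(410, [409, 409, 408, 407, 409, 409, 408, 409]))
    # {'min': 1, 'avg': 1.5, 'max': 3}
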
Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:52:49,977][00491] Avg episode reward: [(0, '6.834')] [2024-10-06 11:52:53,291][04742] Saving new best policy, reward=6.834! [2024-10-06 11:52:54,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 1716224. Throughput: 0: 210.9. Samples: 431086. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:52:54,980][00491] Avg episode reward: [(0, '6.678')] [2024-10-06 11:52:57,632][04755] Updated weights for policy 0, policy_version 420 (0.1516) [2024-10-06 11:52:59,973][00491] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 1720320. Throughput: 0: 205.7. Samples: 431750. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:52:59,975][00491] Avg episode reward: [(0, '6.804')] [2024-10-06 11:53:04,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 1724416. Throughput: 0: 204.3. Samples: 432784. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:53:04,975][00491] Avg episode reward: [(0, '6.768')] [2024-10-06 11:53:09,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 1728512. Throughput: 0: 206.5. Samples: 434150. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:53:09,980][00491] Avg episode reward: [(0, '6.930')] [2024-10-06 11:53:12,223][04742] Saving new best policy, reward=6.930! [2024-10-06 11:53:14,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 1732608. Throughput: 0: 205.2. Samples: 434762. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:53:14,975][00491] Avg episode reward: [(0, '6.837')] [2024-10-06 11:53:19,975][00491] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1736704. Throughput: 0: 209.8. Samples: 436048. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:53:19,979][00491] Avg episode reward: [(0, '6.931')] [2024-10-06 11:53:23,406][04742] Saving new best policy, reward=6.931! [2024-10-06 11:53:24,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 1740800. Throughput: 0: 202.8. Samples: 437110. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-10-06 11:53:24,985][00491] Avg episode reward: [(0, '7.076')] [2024-10-06 11:53:27,592][04742] Saving new best policy, reward=7.076! [2024-10-06 11:53:29,973][00491] Fps is (10 sec: 819.4, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 1744896. Throughput: 0: 204.4. Samples: 437838. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-10-06 11:53:29,981][00491] Avg episode reward: [(0, '7.015')] [2024-10-06 11:53:34,974][00491] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 1748992. Throughput: 0: 216.1. Samples: 439430. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-10-06 11:53:34,981][00491] Avg episode reward: [(0, '7.067')] [2024-10-06 11:53:39,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1753088. Throughput: 0: 204.5. Samples: 440290. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-10-06 11:53:39,983][00491] Avg episode reward: [(0, '6.916')] [2024-10-06 11:53:44,973][00491] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 1757184. Throughput: 0: 203.8. Samples: 440922. 
Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-10-06 11:53:44,978][00491] Avg episode reward: [(0, '6.658')] [2024-10-06 11:53:46,669][04742] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000430_1761280.pth... [2024-10-06 11:53:46,674][04755] Updated weights for policy 0, policy_version 430 (0.0057) [2024-10-06 11:53:46,803][04742] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000382_1564672.pth [2024-10-06 11:53:49,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 1761280. Throughput: 0: 216.8. Samples: 442542. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-10-06 11:53:49,975][00491] Avg episode reward: [(0, '6.584')] [2024-10-06 11:53:54,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1765376. Throughput: 0: 211.2. Samples: 443652. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:53:54,975][00491] Avg episode reward: [(0, '7.021')] [2024-10-06 11:53:59,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 1769472. Throughput: 0: 205.3. Samples: 444000. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:53:59,976][00491] Avg episode reward: [(0, '7.180')] [2024-10-06 11:54:01,870][04742] Saving new best policy, reward=7.180! [2024-10-06 11:54:04,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 1773568. Throughput: 0: 211.8. Samples: 445580. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:54:04,982][00491] Avg episode reward: [(0, '7.334')] [2024-10-06 11:54:09,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 1777664. Throughput: 0: 220.4. Samples: 447030. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:54:09,976][00491] Avg episode reward: [(0, '7.101')] [2024-10-06 11:54:10,861][04742] Saving new best policy, reward=7.334! [2024-10-06 11:54:14,979][00491] Fps is (10 sec: 818.7, 60 sec: 819.1, 300 sec: 819.2). Total num frames: 1781760. Throughput: 0: 210.8. Samples: 447326. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:54:14,982][00491] Avg episode reward: [(0, '7.337')] [2024-10-06 11:54:17,013][04742] Saving new best policy, reward=7.337! [2024-10-06 11:54:19,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 1785856. Throughput: 0: 202.4. Samples: 448538. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:54:19,982][00491] Avg episode reward: [(0, '7.578')] [2024-10-06 11:54:24,973][00491] Fps is (10 sec: 819.7, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 1789952. Throughput: 0: 218.2. Samples: 450108. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:54:24,976][00491] Avg episode reward: [(0, '7.368')] [2024-10-06 11:54:25,272][04742] Saving new best policy, reward=7.578! [2024-10-06 11:54:29,974][00491] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1794048. Throughput: 0: 217.5. Samples: 450708. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:54:29,985][00491] Avg episode reward: [(0, '7.443')] [2024-10-06 11:54:34,974][00491] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1798144. Throughput: 0: 190.8. Samples: 451128. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:54:35,003][00491] Avg episode reward: [(0, '7.508')] [2024-10-06 11:54:39,973][00491] Fps is (10 sec: 409.7, 60 sec: 750.9, 300 sec: 819.2). Total num frames: 1798144. 
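
The checkpoint names encode the frame counter: checkpoint_000000430_1761280.pth is policy version 430 at 430 x 4096 = 1,761,280 env frames, each version consuming one 1024-sample batch at frameskip 4. A quick check of that relationship against the filenames in this log:

    BATCH_SIZE = 1024   # samples consumed per learner step (per the run's config)
    FRAMESKIP = 4       # env frames per sample

    def frames_for_version(version: int) -> int:
        return version * BATCH_SIZE * FRAMESKIP   # 4096 frames per policy version

    assert frames_for_version(430) == 1_761_280   # checkpoint_000000430_1761280.pth
    assert frames_for_version(406) == 1_662_976   # checkpoint_000000406_1662976.pth
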
Throughput: 0: 179.5. Samples: 451730. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:54:39,976][00491] Avg episode reward: [(0, '7.375')] [2024-10-06 11:54:42,386][04755] Updated weights for policy 0, policy_version 440 (0.0558) [2024-10-06 11:54:44,973][00491] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 1802240. Throughput: 0: 181.0. Samples: 452144. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:54:44,983][00491] Avg episode reward: [(0, '7.385')] [2024-10-06 11:54:49,973][00491] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 1806336. Throughput: 0: 185.0. Samples: 453906. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:54:49,976][00491] Avg episode reward: [(0, '7.488')] [2024-10-06 11:54:54,973][00491] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 1810432. Throughput: 0: 175.9. Samples: 454944. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 11:54:54,976][00491] Avg episode reward: [(0, '7.631')] [2024-10-06 11:54:56,981][04742] Saving new best policy, reward=7.631! [2024-10-06 11:54:59,973][00491] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 819.2). Total num frames: 1814528. Throughput: 0: 175.5. Samples: 455224. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 11:54:59,980][00491] Avg episode reward: [(0, '7.884')] [2024-10-06 11:55:04,973][00491] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 1818624. Throughput: 0: 186.6. Samples: 456936. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 11:55:04,976][00491] Avg episode reward: [(0, '7.693')] [2024-10-06 11:55:05,457][04742] Saving new best policy, reward=7.884! [2024-10-06 11:55:09,973][00491] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 1822720. Throughput: 0: 180.9. Samples: 458250. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 11:55:09,982][00491] Avg episode reward: [(0, '7.627')] [2024-10-06 11:55:14,973][00491] Fps is (10 sec: 819.2, 60 sec: 751.0, 300 sec: 819.2). Total num frames: 1826816. Throughput: 0: 173.6. Samples: 458522. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 11:55:14,981][00491] Avg episode reward: [(0, '7.477')] [2024-10-06 11:55:19,973][00491] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 1830912. Throughput: 0: 195.2. Samples: 459914. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 11:55:19,981][00491] Avg episode reward: [(0, '7.830')] [2024-10-06 11:55:24,973][00491] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1839104. Throughput: 0: 212.3. Samples: 461282. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 11:55:24,976][00491] Avg episode reward: [(0, '7.877')] [2024-10-06 11:55:29,973][00491] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 1839104. Throughput: 0: 210.8. Samples: 461628. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 11:55:29,977][00491] Avg episode reward: [(0, '8.194')] [2024-10-06 11:55:31,083][04755] Updated weights for policy 0, policy_version 450 (0.0650) [2024-10-06 11:55:34,812][04742] Signal inference workers to stop experience collection... (450 times) [2024-10-06 11:55:34,857][04755] InferenceWorker_p0-w0: stopping experience collection (450 times) [2024-10-06 11:55:34,973][00491] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 819.2). Total num frames: 1843200. Throughput: 0: 197.1. Samples: 462774. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 11:55:34,985][00491] Avg episode reward: [(0, '8.031')] [2024-10-06 11:55:36,127][04742] Signal inference workers to resume experience collection... (450 times) [2024-10-06 11:55:36,129][04755] InferenceWorker_p0-w0: resuming experience collection (450 times) [2024-10-06 11:55:36,127][04742] Saving new best policy, reward=8.194! [2024-10-06 11:55:39,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 1847296. Throughput: 0: 211.0. Samples: 464440. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 11:55:39,982][00491] Avg episode reward: [(0, '8.093')] [2024-10-06 11:55:44,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 1851392. Throughput: 0: 216.6. Samples: 464970. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-10-06 11:55:44,976][00491] Avg episode reward: [(0, '8.180')] [2024-10-06 11:55:49,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 1855488. Throughput: 0: 201.3. Samples: 465996. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-10-06 11:55:49,980][00491] Avg episode reward: [(0, '8.178')] [2024-10-06 11:55:51,521][04742] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000454_1859584.pth... [2024-10-06 11:55:51,641][04742] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000406_1662976.pth [2024-10-06 11:55:54,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1859584. Throughput: 0: 204.4. Samples: 467450. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-10-06 11:55:54,983][00491] Avg episode reward: [(0, '8.232')] [2024-10-06 11:55:59,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 1863680. Throughput: 0: 210.5. Samples: 467994. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-10-06 11:55:59,976][00491] Avg episode reward: [(0, '8.239')] [2024-10-06 11:55:59,991][04742] Saving new best policy, reward=8.232! [2024-10-06 11:56:04,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 1867776. Throughput: 0: 211.3. Samples: 469422. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 11:56:04,976][00491] Avg episode reward: [(0, '8.250')] [2024-10-06 11:56:05,828][04742] Saving new best policy, reward=8.239! [2024-10-06 11:56:09,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 1871872. Throughput: 0: 199.6. Samples: 470266. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 11:56:09,982][00491] Avg episode reward: [(0, '8.092')] [2024-10-06 11:56:11,191][04742] Saving new best policy, reward=8.250! [2024-10-06 11:56:14,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 1875968. Throughput: 0: 206.8. Samples: 470934. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 11:56:14,982][00491] Avg episode reward: [(0, '8.373')] [2024-10-06 11:56:19,944][04742] Saving new best policy, reward=8.373! [2024-10-06 11:56:19,945][04755] Updated weights for policy 0, policy_version 460 (0.1684) [2024-10-06 11:56:19,974][00491] Fps is (10 sec: 1228.7, 60 sec: 887.5, 300 sec: 819.2). Total num frames: 1884160. Throughput: 0: 215.1. Samples: 472452. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 11:56:19,977][00491] Avg episode reward: [(0, '8.249')] [2024-10-06 11:56:24,973][00491] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 1884160. 
Throughput: 0: 201.4. Samples: 473502. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 11:56:24,980][00491] Avg episode reward: [(0, '8.254')] [2024-10-06 11:56:29,973][00491] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 1888256. Throughput: 0: 195.2. Samples: 473756. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 11:56:29,976][00491] Avg episode reward: [(0, '8.172')] [2024-10-06 11:56:34,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 1892352. Throughput: 0: 212.5. Samples: 475560. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 11:56:34,982][00491] Avg episode reward: [(0, '7.974')] [2024-10-06 11:56:39,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 1896448. Throughput: 0: 205.2. Samples: 476682. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 11:56:39,975][00491] Avg episode reward: [(0, '8.085')] [2024-10-06 11:56:44,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 1900544. Throughput: 0: 202.6. Samples: 477112. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 11:56:44,976][00491] Avg episode reward: [(0, '8.195')] [2024-10-06 11:56:49,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 1904640. Throughput: 0: 200.4. Samples: 478438. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 11:56:49,980][00491] Avg episode reward: [(0, '8.261')] [2024-10-06 11:56:54,973][00491] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 819.2). Total num frames: 1912832. Throughput: 0: 211.0. Samples: 479760. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 11:56:54,975][00491] Avg episode reward: [(0, '8.533')] [2024-10-06 11:56:59,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 1912832. Throughput: 0: 208.5. Samples: 480318. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 11:56:59,976][00491] Avg episode reward: [(0, '8.926')] [2024-10-06 11:57:00,844][04742] Saving new best policy, reward=8.533! [2024-10-06 11:57:04,973][00491] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 1916928. Throughput: 0: 198.1. Samples: 481366. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 11:57:04,978][00491] Avg episode reward: [(0, '9.231')] [2024-10-06 11:57:06,057][04742] Saving new best policy, reward=8.926! [2024-10-06 11:57:09,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 1921024. Throughput: 0: 209.5. Samples: 482928. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 11:57:09,981][00491] Avg episode reward: [(0, '9.323')] [2024-10-06 11:57:10,305][04755] Updated weights for policy 0, policy_version 470 (0.1028) [2024-10-06 11:57:10,312][04742] Saving new best policy, reward=9.231! [2024-10-06 11:57:14,871][04742] Saving new best policy, reward=9.323! [2024-10-06 11:57:14,979][00491] Fps is (10 sec: 1228.1, 60 sec: 887.4, 300 sec: 819.2). Total num frames: 1929216. Throughput: 0: 220.5. Samples: 483682. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-10-06 11:57:14,981][00491] Avg episode reward: [(0, '9.397')] [2024-10-06 11:57:19,973][00491] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 1929216. Throughput: 0: 202.3. Samples: 484664. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-10-06 11:57:19,976][00491] Avg episode reward: [(0, '9.441')] [2024-10-06 11:57:21,530][04742] Saving new best policy, reward=9.397! 
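
"Throughput" is reported in policy samples per second, so at frameskip 4 the steady state of roughly 205 samples/s in this stretch lines up with the ~819 FPS figure, and dips to 409.6 mean only one 4096-frame batch landed in the 10-second window. The arithmetic:

    samples_per_sec = 204.8              # typical 'Throughput' value in this stretch
    frameskip = 4
    print(samples_per_sec * frameskip)   # 819.2 -> the '10 sec' FPS the log settles at
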
[2024-10-06 11:57:24,973][00491] Fps is (10 sec: 409.8, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 1933312. Throughput: 0: 206.8. Samples: 485988. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:57:24,982][00491] Avg episode reward: [(0, '9.401')] [2024-10-06 11:57:25,778][04742] Saving new best policy, reward=9.441! [2024-10-06 11:57:29,973][00491] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 819.2). Total num frames: 1941504. Throughput: 0: 210.0. Samples: 486562. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:57:29,981][00491] Avg episode reward: [(0, '9.356')] [2024-10-06 11:57:34,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 1941504. Throughput: 0: 213.2. Samples: 488034. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:57:34,975][00491] Avg episode reward: [(0, '9.524')] [2024-10-06 11:57:39,973][00491] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 1945600. Throughput: 0: 203.6. Samples: 488922. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 11:57:39,975][00491] Avg episode reward: [(0, '9.417')] [2024-10-06 11:57:40,993][04742] Saving new best policy, reward=9.524! [2024-10-06 11:57:44,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 1949696. Throughput: 0: 205.0. Samples: 489542. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-10-06 11:57:44,985][00491] Avg episode reward: [(0, '9.586')] [2024-10-06 11:57:49,280][04742] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000478_1957888.pth... [2024-10-06 11:57:49,392][04742] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000430_1761280.pth [2024-10-06 11:57:49,415][04742] Saving new best policy, reward=9.586! [2024-10-06 11:57:49,973][00491] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 819.2). Total num frames: 1957888. Throughput: 0: 216.0. Samples: 491084. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-10-06 11:57:49,979][00491] Avg episode reward: [(0, '9.426')] [2024-10-06 11:57:54,973][00491] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 1957888. Throughput: 0: 203.6. Samples: 492090. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-10-06 11:57:54,977][00491] Avg episode reward: [(0, '9.259')] [2024-10-06 11:57:59,973][00491] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 1961984. Throughput: 0: 194.4. Samples: 492430. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-10-06 11:57:59,975][00491] Avg episode reward: [(0, '9.013')] [2024-10-06 11:58:01,145][04755] Updated weights for policy 0, policy_version 480 (0.1052) [2024-10-06 11:58:04,973][00491] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 819.2). Total num frames: 1970176. Throughput: 0: 206.0. Samples: 493932. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-10-06 11:58:04,977][00491] Avg episode reward: [(0, '8.605')] [2024-10-06 11:58:09,973][00491] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 819.2). Total num frames: 1974272. Throughput: 0: 201.4. Samples: 495052. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 11:58:09,978][00491] Avg episode reward: [(0, '8.264')] [2024-10-06 11:58:14,973][00491] Fps is (10 sec: 409.6, 60 sec: 751.0, 300 sec: 805.3). Total num frames: 1974272. Throughput: 0: 203.8. Samples: 495732. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 11:58:14,976][00491] Avg episode reward: [(0, '8.307')] [2024-10-06 11:58:19,973][00491] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 1978368. Throughput: 0: 198.7. Samples: 496976. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 11:58:19,982][00491] Avg episode reward: [(0, '8.186')] [2024-10-06 11:58:24,974][00491] Fps is (10 sec: 1228.7, 60 sec: 887.5, 300 sec: 819.2). Total num frames: 1986560. Throughput: 0: 203.2. Samples: 498064. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-10-06 11:58:24,979][00491] Avg episode reward: [(0, '8.305')] [2024-10-06 11:58:29,975][00491] Fps is (10 sec: 1228.5, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1990656. Throughput: 0: 207.7. Samples: 498888. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-10-06 11:58:29,984][00491] Avg episode reward: [(0, '8.419')] [2024-10-06 11:58:34,973][00491] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 1990656. Throughput: 0: 194.9. Samples: 499856. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-10-06 11:58:34,975][00491] Avg episode reward: [(0, '8.482')] [2024-10-06 11:58:39,973][00491] Fps is (10 sec: 819.4, 60 sec: 887.5, 300 sec: 819.2). Total num frames: 1998848. Throughput: 0: 200.5. Samples: 501114. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-10-06 11:58:39,976][00491] Avg episode reward: [(0, '8.635')] [2024-10-06 11:58:44,973][00491] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 819.2). Total num frames: 2002944. Throughput: 0: 213.4. Samples: 502032. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-10-06 11:58:44,978][00491] Avg episode reward: [(0, '8.684')] [2024-10-06 11:58:49,804][04755] Updated weights for policy 0, policy_version 490 (0.1486) [2024-10-06 11:58:49,974][00491] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2007040. Throughput: 0: 205.6. Samples: 503186. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-10-06 11:58:49,983][00491] Avg episode reward: [(0, '8.759')] [2024-10-06 11:58:54,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 819.2). Total num frames: 2011136. Throughput: 0: 203.4. Samples: 504206. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-10-06 11:58:54,975][00491] Avg episode reward: [(0, '8.840')] [2024-10-06 11:58:59,973][00491] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 819.2). Total num frames: 2015232. Throughput: 0: 209.5. Samples: 505158. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-10-06 11:58:59,978][00491] Avg episode reward: [(0, '9.682')] [2024-10-06 11:59:03,324][04742] Saving new best policy, reward=9.682! [2024-10-06 11:59:04,976][00491] Fps is (10 sec: 818.9, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2019328. Throughput: 0: 206.1. Samples: 506250. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-10-06 11:59:04,979][00491] Avg episode reward: [(0, '9.560')] [2024-10-06 11:59:09,981][00491] Fps is (10 sec: 818.5, 60 sec: 819.1, 300 sec: 819.2). Total num frames: 2023424. Throughput: 0: 205.2. Samples: 507300. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-10-06 11:59:09,989][00491] Avg episode reward: [(0, '9.529')] [2024-10-06 11:59:14,973][00491] Fps is (10 sec: 819.5, 60 sec: 887.5, 300 sec: 819.2). Total num frames: 2027520. Throughput: 0: 204.9. Samples: 508110. 
Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-10-06 11:59:14,975][00491] Avg episode reward: [(0, '9.955')] [2024-10-06 11:59:18,417][04742] Saving new best policy, reward=9.955! [2024-10-06 11:59:19,973][00491] Fps is (10 sec: 819.9, 60 sec: 887.5, 300 sec: 819.2). Total num frames: 2031616. Throughput: 0: 210.4. Samples: 509322. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-10-06 11:59:19,980][00491] Avg episode reward: [(0, '9.970')] [2024-10-06 11:59:23,693][04742] Saving new best policy, reward=9.970! [2024-10-06 11:59:24,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2035712. Throughput: 0: 206.5. Samples: 510408. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-10-06 11:59:24,976][00491] Avg episode reward: [(0, '10.302')] [2024-10-06 11:59:29,950][04742] Saving new best policy, reward=10.302! [2024-10-06 11:59:29,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2039808. Throughput: 0: 199.3. Samples: 511002. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-10-06 11:59:29,986][00491] Avg episode reward: [(0, '10.242')] [2024-10-06 11:59:34,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 833.1). Total num frames: 2043904. Throughput: 0: 204.2. Samples: 512376. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-10-06 11:59:34,976][00491] Avg episode reward: [(0, '10.253')] [2024-10-06 11:59:38,317][04755] Updated weights for policy 0, policy_version 500 (0.0077) [2024-10-06 11:59:39,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 2048000. Throughput: 0: 211.0. Samples: 513702. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-10-06 11:59:39,978][00491] Avg episode reward: [(0, '10.299')] [2024-10-06 11:59:42,485][04742] Signal inference workers to stop experience collection... (500 times) [2024-10-06 11:59:42,662][04755] InferenceWorker_p0-w0: stopping experience collection (500 times) [2024-10-06 11:59:44,503][04742] Signal inference workers to resume experience collection... (500 times) [2024-10-06 11:59:44,506][04755] InferenceWorker_p0-w0: resuming experience collection (500 times) [2024-10-06 11:59:44,973][00491] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 2052096. Throughput: 0: 203.9. Samples: 514332. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-10-06 11:59:44,984][00491] Avg episode reward: [(0, '10.243')] [2024-10-06 11:59:49,666][04742] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000502_2056192.pth... [2024-10-06 11:59:49,778][04742] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000454_1859584.pth [2024-10-06 11:59:49,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 2056192. Throughput: 0: 203.6. Samples: 515410. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-10-06 11:59:49,975][00491] Avg episode reward: [(0, '10.426')] [2024-10-06 11:59:54,155][04742] Saving new best policy, reward=10.426! [2024-10-06 11:59:54,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 2060288. Throughput: 0: 205.1. Samples: 516530. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-10-06 11:59:54,989][00491] Avg episode reward: [(0, '10.393')] [2024-10-06 11:59:59,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 2064384. Throughput: 0: 209.4. Samples: 517532. 
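
Because the versions in the checkpoint filenames are zero-padded, plain lexicographic sorting recovers the newest one, which is all a "resume latest" restart needs. A minimal loader sketch; this is a hypothetical helper, as the experiment's own resume path is handled inside the library:

    from pathlib import Path
    import torch

    def load_latest_checkpoint(ckpt_dir="/content/train_dir/default_experiment/checkpoint_p0"):
        ckpts = sorted(Path(ckpt_dir).glob("checkpoint_*.pth"))  # zero-padding => version order
        if not ckpts:
            return None
        # newest file, e.g. checkpoint_000000502_2056192.pth at this point in the run
        return torch.load(ckpts[-1], map_location="cpu")
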
Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-10-06 11:59:59,981][00491] Avg episode reward: [(0, '10.554')] [2024-10-06 12:00:04,973][00491] Fps is (10 sec: 409.6, 60 sec: 751.0, 300 sec: 819.2). Total num frames: 2064384. Throughput: 0: 201.3. Samples: 518380. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-10-06 12:00:04,977][00491] Avg episode reward: [(0, '10.646')] [2024-10-06 12:00:05,809][04742] Saving new best policy, reward=10.554! [2024-10-06 12:00:09,973][00491] Fps is (10 sec: 409.6, 60 sec: 751.0, 300 sec: 819.2). Total num frames: 2068480. Throughput: 0: 204.6. Samples: 519614. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-10-06 12:00:09,976][00491] Avg episode reward: [(0, '10.787')] [2024-10-06 12:00:10,130][04742] Saving new best policy, reward=10.646! [2024-10-06 12:00:14,387][04742] Saving new best policy, reward=10.787! [2024-10-06 12:00:14,973][00491] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 2076672. Throughput: 0: 209.1. Samples: 520410. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-10-06 12:00:14,976][00491] Avg episode reward: [(0, '11.175')] [2024-10-06 12:00:19,973][00491] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 2076672. Throughput: 0: 204.9. Samples: 521598. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-10-06 12:00:19,976][00491] Avg episode reward: [(0, '10.738')] [2024-10-06 12:00:20,559][04742] Saving new best policy, reward=11.175! [2024-10-06 12:00:24,973][00491] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 819.2). Total num frames: 2080768. Throughput: 0: 194.9. Samples: 522472. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-10-06 12:00:24,978][00491] Avg episode reward: [(0, '10.841')] [2024-10-06 12:00:29,973][00491] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 819.2). Total num frames: 2084864. Throughput: 0: 196.3. Samples: 523166. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-10-06 12:00:29,981][00491] Avg episode reward: [(0, '10.768')] [2024-10-06 12:00:30,120][04755] Updated weights for policy 0, policy_version 510 (0.0047) [2024-10-06 12:00:34,973][00491] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 2093056. Throughput: 0: 207.4. Samples: 524742. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-10-06 12:00:34,978][00491] Avg episode reward: [(0, '10.652')] [2024-10-06 12:00:39,973][00491] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 819.2). Total num frames: 2093056. Throughput: 0: 204.4. Samples: 525730. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-10-06 12:00:39,979][00491] Avg episode reward: [(0, '10.372')] [2024-10-06 12:00:44,973][00491] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 819.2). Total num frames: 2097152. Throughput: 0: 190.2. Samples: 526090. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-10-06 12:00:44,981][00491] Avg episode reward: [(0, '10.509')] [2024-10-06 12:00:49,973][00491] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 2105344. Throughput: 0: 207.8. Samples: 527732. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-10-06 12:00:49,978][00491] Avg episode reward: [(0, '10.709')] [2024-10-06 12:00:54,976][00491] Fps is (10 sec: 1228.4, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 2109440. Throughput: 0: 204.4. Samples: 528812. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-10-06 12:00:54,982][00491] Avg episode reward: [(0, '10.733')] [2024-10-06 12:00:59,973][00491] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 819.2). 
[2024-10-06 12:00:59,976][00491] Avg episode reward: [(0, '11.008')]
[2024-10-06 12:01:04,973][00491] Fps is (10 sec: 819.5, 60 sec: 887.5, 300 sec: 833.1). Total num frames: 2117632. Throughput: 0: 203.2. Samples: 530742. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-10-06 12:01:04,977][00491] Avg episode reward: [(0, '10.747')]
[2024-10-06 12:01:09,973][00491] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 833.1). Total num frames: 2121728. Throughput: 0: 210.5. Samples: 531944. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-10-06 12:01:09,984][00491] Avg episode reward: [(0, '10.684')]
[2024-10-06 12:01:14,973][00491] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 2121728. Throughput: 0: 208.6. Samples: 532552. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-10-06 12:01:14,976][00491] Avg episode reward: [(0, '10.800')]
[2024-10-06 12:01:19,973][00491] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2125824. Throughput: 0: 196.4. Samples: 533580. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-10-06 12:01:19,982][00491] Avg episode reward: [(0, '10.752')]
[2024-10-06 12:01:20,220][04755] Updated weights for policy 0, policy_version 520 (0.2025)
[2024-10-06 12:01:24,973][00491] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 833.1). Total num frames: 2134016. Throughput: 0: 204.8. Samples: 534946. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-10-06 12:01:24,976][00491] Avg episode reward: [(0, '10.570')]
[2024-10-06 12:01:29,973][00491] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 833.1). Total num frames: 2138112. Throughput: 0: 218.1. Samples: 535904. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-10-06 12:01:29,975][00491] Avg episode reward: [(0, '10.574')]
[2024-10-06 12:01:34,973][00491] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 819.2). Total num frames: 2138112. Throughput: 0: 204.5. Samples: 536934. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-10-06 12:01:34,976][00491] Avg episode reward: [(0, '10.139')]
[2024-10-06 12:01:39,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 833.1). Total num frames: 2146304. Throughput: 0: 205.4. Samples: 538056. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-10-06 12:01:39,981][00491] Avg episode reward: [(0, '10.299')]
[2024-10-06 12:01:44,973][00491] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 833.1). Total num frames: 2150400. Throughput: 0: 214.0. Samples: 538972. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-10-06 12:01:44,976][00491] Avg episode reward: [(0, '10.385')]
[2024-10-06 12:01:49,236][04742] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000526_2154496.pth...
[2024-10-06 12:01:49,364][04742] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000478_1957888.pth
[2024-10-06 12:01:49,978][00491] Fps is (10 sec: 818.7, 60 sec: 819.1, 300 sec: 819.2). Total num frames: 2154496. Throughput: 0: 207.8. Samples: 540096. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-10-06 12:01:49,982][00491] Avg episode reward: [(0, '10.488')]
[2024-10-06 12:01:54,973][00491] Fps is (10 sec: 409.6, 60 sec: 751.0, 300 sec: 819.2). Total num frames: 2154496. Throughput: 0: 206.0. Samples: 541214. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
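Each "Saving .../checkpoint_..." entry is immediately followed by removal of the oldest checkpoint, so only the two most recent files remain (above, 526 is written and 478 is deleted, leaving 502 and 526). The helper below reproduces that retention policy; the filename pattern and directory layout are taken from the log, but the function itself is an illustrative reconstruction rather than the library's code.

    import os
    import re
    from pathlib import Path

    def rotate_checkpoints(ckpt_dir, keep=2):
        # After a new checkpoint_<version>_<frames>.pth is written, delete
        # the oldest ones so that at most `keep` remain.
        pattern = re.compile(r"checkpoint_(\d+)_(\d+)\.pth$")
        ckpts = sorted(
            (p for p in Path(ckpt_dir).iterdir() if pattern.search(p.name)),
            key=lambda p: int(pattern.search(p.name).group(1)),
        )
        for old in ckpts[:-keep]:
            print(f"Removing {old}")
            os.remove(old)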
[2024-10-06 12:01:54,977][00491] Avg episode reward: [(0, '10.460')]
[2024-10-06 12:01:59,973][00491] Fps is (10 sec: 819.6, 60 sec: 887.5, 300 sec: 833.1). Total num frames: 2162688. Throughput: 0: 207.5. Samples: 541888. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-10-06 12:01:59,975][00491] Avg episode reward: [(0, '10.397')]
[2024-10-06 12:02:04,974][00491] Fps is (10 sec: 1228.6, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 2166784. Throughput: 0: 214.9. Samples: 543252. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-10-06 12:02:04,982][00491] Avg episode reward: [(0, '10.018')]
[2024-10-06 12:02:09,659][04755] Updated weights for policy 0, policy_version 530 (0.0558)
[2024-10-06 12:02:09,979][00491] Fps is (10 sec: 818.7, 60 sec: 819.1, 300 sec: 819.2). Total num frames: 2170880. Throughput: 0: 206.6. Samples: 544242. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-10-06 12:02:09,982][00491] Avg episode reward: [(0, '10.080')]
[2024-10-06 12:02:14,973][00491] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 833.1). Total num frames: 2174976. Throughput: 0: 198.4. Samples: 544830. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-10-06 12:02:14,976][00491] Avg episode reward: [(0, '9.919')]
[2024-10-06 12:02:19,973][00491] Fps is (10 sec: 819.7, 60 sec: 887.5, 300 sec: 833.1). Total num frames: 2179072. Throughput: 0: 206.3. Samples: 546218. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-10-06 12:02:19,982][00491] Avg episode reward: [(0, '10.288')]
[2024-10-06 12:02:24,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2183168. Throughput: 0: 210.1. Samples: 547510. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-10-06 12:02:24,980][00491] Avg episode reward: [(0, '10.350')]
[2024-10-06 12:02:29,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 2187264. Throughput: 0: 204.2. Samples: 548162. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-10-06 12:02:29,981][00491] Avg episode reward: [(0, '10.181')]
[2024-10-06 12:02:34,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 833.1). Total num frames: 2191360. Throughput: 0: 205.3. Samples: 549334. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-10-06 12:02:34,978][00491] Avg episode reward: [(0, '10.226')]
[2024-10-06 12:02:39,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 2195456. Throughput: 0: 211.1. Samples: 550712. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-10-06 12:02:39,975][00491] Avg episode reward: [(0, '10.511')]
[2024-10-06 12:02:44,979][00491] Fps is (10 sec: 818.9, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2199552. Throughput: 0: 210.8. Samples: 551374. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-10-06 12:02:44,981][00491] Avg episode reward: [(0, '10.563')]
[2024-10-06 12:02:49,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.3, 300 sec: 833.1). Total num frames: 2203648. Throughput: 0: 203.2. Samples: 552394. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-10-06 12:02:49,982][00491] Avg episode reward: [(0, '10.215')]
[2024-10-06 12:02:54,975][00491] Fps is (10 sec: 819.5, 60 sec: 887.5, 300 sec: 833.1). Total num frames: 2207744. Throughput: 0: 216.9. Samples: 554000. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-10-06 12:02:54,982][00491] Avg episode reward: [(0, '9.923')]
[2024-10-06 12:02:56,929][04755] Updated weights for policy 0, policy_version 540 (0.1053)
[2024-10-06 12:02:59,976][00491] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2211840. Throughput: 0: 216.0. Samples: 554550. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-10-06 12:02:59,979][00491] Avg episode reward: [(0, '10.175')]
[2024-10-06 12:03:04,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2215936. Throughput: 0: 205.7. Samples: 555476. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-10-06 12:03:04,978][00491] Avg episode reward: [(0, '10.175')]
[2024-10-06 12:03:09,973][00491] Fps is (10 sec: 819.4, 60 sec: 819.3, 300 sec: 833.1). Total num frames: 2220032. Throughput: 0: 213.4. Samples: 557114. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-10-06 12:03:09,975][00491] Avg episode reward: [(0, '10.454')]
[2024-10-06 12:03:14,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 2224128. Throughput: 0: 212.5. Samples: 557726. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-10-06 12:03:14,976][00491] Avg episode reward: [(0, '10.307')]
[2024-10-06 12:03:19,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2228224. Throughput: 0: 213.8. Samples: 558956. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-10-06 12:03:19,984][00491] Avg episode reward: [(0, '10.614')]
[2024-10-06 12:03:24,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2232320. Throughput: 0: 210.5. Samples: 560186. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-10-06 12:03:24,985][00491] Avg episode reward: [(0, '11.109')]
[2024-10-06 12:03:29,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 2236416. Throughput: 0: 205.3. Samples: 560610. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-10-06 12:03:29,978][00491] Avg episode reward: [(0, '11.230')]
[2024-10-06 12:03:34,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2240512. Throughput: 0: 223.6. Samples: 562454. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-10-06 12:03:34,980][00491] Avg episode reward: [(0, '11.427')]
[2024-10-06 12:03:35,416][04742] Saving new best policy, reward=11.230!
[2024-10-06 12:03:39,981][00491] Fps is (10 sec: 818.6, 60 sec: 819.1, 300 sec: 819.2). Total num frames: 2244608. Throughput: 0: 210.2. Samples: 563462. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-10-06 12:03:39,985][00491] Avg episode reward: [(0, '11.607')]
[2024-10-06 12:03:41,528][04742] Saving new best policy, reward=11.427!
[2024-10-06 12:03:44,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2248704. Throughput: 0: 207.3. Samples: 563876. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-10-06 12:03:44,978][00491] Avg episode reward: [(0, '11.575')]
[2024-10-06 12:03:45,778][04742] Saving new best policy, reward=11.607!
[2024-10-06 12:03:45,791][04755] Updated weights for policy 0, policy_version 550 (0.2154)
[2024-10-06 12:03:48,244][04742] Signal inference workers to stop experience collection... (550 times)
[2024-10-06 12:03:48,272][04755] InferenceWorker_p0-w0: stopping experience collection (550 times)
[2024-10-06 12:03:49,786][04742] Signal inference workers to resume experience collection... (550 times)
[2024-10-06 12:03:49,788][04755] InferenceWorker_p0-w0: resuming experience collection (550 times)
[2024-10-06 12:03:49,788][04742] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000551_2256896.pth...
[2024-10-06 12:03:49,909][04742] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000502_2056192.pth
[2024-10-06 12:03:49,973][00491] Fps is (10 sec: 1229.8, 60 sec: 887.5, 300 sec: 833.1). Total num frames: 2256896. Throughput: 0: 225.2. Samples: 565608. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-10-06 12:03:49,976][00491] Avg episode reward: [(0, '11.880')]
[2024-10-06 12:03:54,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2256896. Throughput: 0: 213.2. Samples: 566706. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-10-06 12:03:54,980][00491] Avg episode reward: [(0, '11.866')]
[2024-10-06 12:03:55,933][04742] Saving new best policy, reward=11.880!
[2024-10-06 12:03:59,973][00491] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2260992. Throughput: 0: 205.8. Samples: 566986. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-10-06 12:03:59,981][00491] Avg episode reward: [(0, '11.695')]
[2024-10-06 12:04:04,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2265088. Throughput: 0: 211.8. Samples: 568486. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-10-06 12:04:04,978][00491] Avg episode reward: [(0, '11.580')]
[2024-10-06 12:04:09,973][00491] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 833.1). Total num frames: 2273280. Throughput: 0: 212.2. Samples: 569734. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-10-06 12:04:09,981][00491] Avg episode reward: [(0, '11.730')]
[2024-10-06 12:04:14,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2273280. Throughput: 0: 215.6. Samples: 570310. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-10-06 12:04:14,977][00491] Avg episode reward: [(0, '11.647')]
[2024-10-06 12:04:19,973][00491] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2277376. Throughput: 0: 200.7. Samples: 571484. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-10-06 12:04:19,984][00491] Avg episode reward: [(0, '11.899')]
[2024-10-06 12:04:24,523][04742] Saving new best policy, reward=11.899!
[2024-10-06 12:04:24,973][00491] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 833.1). Total num frames: 2285568. Throughput: 0: 207.8. Samples: 572812. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-10-06 12:04:24,984][00491] Avg episode reward: [(0, '12.510')]
[2024-10-06 12:04:29,525][04742] Saving new best policy, reward=12.510!
[2024-10-06 12:04:29,973][00491] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 833.1). Total num frames: 2289664. Throughput: 0: 220.0. Samples: 573778. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-10-06 12:04:29,979][00491] Avg episode reward: [(0, '12.484')]
[2024-10-06 12:04:34,973][00491] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2289664. Throughput: 0: 199.2. Samples: 574570. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-10-06 12:04:34,977][00491] Avg episode reward: [(0, '12.449')]
[2024-10-06 12:04:36,091][04755] Updated weights for policy 0, policy_version 560 (0.1482)
[2024-10-06 12:04:39,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.6, 300 sec: 833.1). Total num frames: 2297856. Throughput: 0: 203.2. Samples: 575852. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
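The "Signal inference workers to stop/resume experience collection..." pairs show the learner ([04742]) briefly pausing the inference worker ([04755]) while it consumes queued experience. The "(550 times)" suffix reads as a running repeat counter: in this log it advances in lockstep with policy_version, which suggests one pause/resume cycle per learner update, with the message itself only printed every 50th occurrence. A sketch of that log-throttling pattern follows; the threshold of 50 is inferred from the round counts above, and the class is purely illustrative.

    class ThrottledLog:
        # Print a repeated message only on every `every`-th occurrence,
        # appending a running count like the "(550 times)" suffixes above.
        def __init__(self, every=50):
            self.every = every
            self.count = 0

        def __call__(self, msg):
            self.count += 1
            if self.count % self.every == 0:
                print(f"{msg} ({self.count} times)")

    stop_msg = ThrottledLog()
    for _ in range(550):
        stop_msg("Signal inference workers to stop experience collection...")
    # -> printed at counts 50, 100, ..., 550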
[2024-10-06 12:04:39,986][00491] Avg episode reward: [(0, '12.108')]
[2024-10-06 12:04:44,973][00491] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 833.1). Total num frames: 2301952. Throughput: 0: 214.2. Samples: 576626. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-10-06 12:04:44,981][00491] Avg episode reward: [(0, '11.899')]
[2024-10-06 12:04:49,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 2306048. Throughput: 0: 208.5. Samples: 577868. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-10-06 12:04:49,978][00491] Avg episode reward: [(0, '11.983')]
[2024-10-06 12:04:54,973][00491] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2306048. Throughput: 0: 204.4. Samples: 578934. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-10-06 12:04:54,976][00491] Avg episode reward: [(0, '11.772')]
[2024-10-06 12:04:59,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 2314240. Throughput: 0: 209.0. Samples: 579714. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-10-06 12:04:59,985][00491] Avg episode reward: [(0, '11.826')]
[2024-10-06 12:05:04,973][00491] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 2318336. Throughput: 0: 212.9. Samples: 581064. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-10-06 12:05:04,982][00491] Avg episode reward: [(0, '11.741')]
[2024-10-06 12:05:09,973][00491] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 819.2). Total num frames: 2318336. Throughput: 0: 205.9. Samples: 582078. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-10-06 12:05:09,977][00491] Avg episode reward: [(0, '12.111')]
[2024-10-06 12:05:14,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 2326528. Throughput: 0: 199.8. Samples: 582768. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-10-06 12:05:14,976][00491] Avg episode reward: [(0, '11.687')]
[2024-10-06 12:05:19,973][00491] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 2330624. Throughput: 0: 211.3. Samples: 584078. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-10-06 12:05:19,978][00491] Avg episode reward: [(0, '11.920')]
[2024-10-06 12:05:23,249][04755] Updated weights for policy 0, policy_version 570 (0.1014)
[2024-10-06 12:05:24,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 2334720. Throughput: 0: 211.7. Samples: 585380. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-10-06 12:05:24,981][00491] Avg episode reward: [(0, '11.961')]
[2024-10-06 12:05:29,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 2338816. Throughput: 0: 208.6. Samples: 586012. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-10-06 12:05:29,980][00491] Avg episode reward: [(0, '12.102')]
[2024-10-06 12:05:34,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 2342912. Throughput: 0: 207.2. Samples: 587190. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-10-06 12:05:34,977][00491] Avg episode reward: [(0, '12.398')]
[2024-10-06 12:05:39,980][00491] Fps is (10 sec: 818.6, 60 sec: 819.1, 300 sec: 846.9). Total num frames: 2347008. Throughput: 0: 218.2. Samples: 588756. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-10-06 12:05:39,991][00491] Avg episode reward: [(0, '12.199')]
[2024-10-06 12:05:44,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 2351104. Throughput: 0: 210.3. Samples: 589176. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
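The "Updated weights for policy 0, policy_version N (...)" lines mark the inference worker loading fresh parameters from the learner; they appear at every tenth version in this log, again consistent with throttled logging, and the number in parentheses plausibly reads as a timing figure for the update (an assumption, as the log does not label it). "Policy #0 lag" then summarizes staleness: how many versions behind the learner the policy that generated each sample in the batch was. A small sketch of that computation, with made-up example versions:

    def policy_lag_stats(sample_versions, learner_version):
        # "Policy #0 lag" as min/avg/max staleness of the samples in a batch.
        lags = [learner_version - v for v in sample_versions]
        return min(lags), sum(lags) / len(lags), max(lags)

    # A batch gathered under versions 568-570 while the learner is at 571:
    print(policy_lag_stats([570, 570, 569, 568], 571))  # -> (1, 1.75, 3)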
[2024-10-06 12:05:44,977][00491] Avg episode reward: [(0, '12.102')]
[2024-10-06 12:05:48,251][04742] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000575_2355200.pth...
[2024-10-06 12:05:48,373][04742] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000526_2154496.pth
[2024-10-06 12:05:49,973][00491] Fps is (10 sec: 819.8, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 2355200. Throughput: 0: 202.7. Samples: 590186. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-10-06 12:05:49,976][00491] Avg episode reward: [(0, '12.116')]
[2024-10-06 12:05:54,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 2359296. Throughput: 0: 215.7. Samples: 591786. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-10-06 12:05:54,980][00491] Avg episode reward: [(0, '12.020')]
[2024-10-06 12:05:59,975][00491] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 2363392. Throughput: 0: 217.2. Samples: 592542. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-10-06 12:05:59,982][00491] Avg episode reward: [(0, '12.177')]
[2024-10-06 12:06:04,975][00491] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 2367488. Throughput: 0: 208.3. Samples: 593450. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-10-06 12:06:04,986][00491] Avg episode reward: [(0, '12.253')]
[2024-10-06 12:06:09,973][00491] Fps is (10 sec: 819.4, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 2371584. Throughput: 0: 208.8. Samples: 594776. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-10-06 12:06:09,976][00491] Avg episode reward: [(0, '12.564')]
[2024-10-06 12:06:11,721][04742] Saving new best policy, reward=12.564!
[2024-10-06 12:06:11,734][04755] Updated weights for policy 0, policy_version 580 (0.1922)
[2024-10-06 12:06:14,973][00491] Fps is (10 sec: 819.4, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 2375680. Throughput: 0: 211.2. Samples: 595518. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-10-06 12:06:14,980][00491] Avg episode reward: [(0, '13.456')]
[2024-10-06 12:06:19,974][00491] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 2379776. Throughput: 0: 213.3. Samples: 596788. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-10-06 12:06:19,984][00491] Avg episode reward: [(0, '13.857')]
[2024-10-06 12:06:22,794][04742] Saving new best policy, reward=13.456!
[2024-10-06 12:06:22,941][04742] Saving new best policy, reward=13.857!
[2024-10-06 12:06:24,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 2383872. Throughput: 0: 205.5. Samples: 598004. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-10-06 12:06:24,976][00491] Avg episode reward: [(0, '13.486')]
[2024-10-06 12:06:29,973][00491] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 2387968. Throughput: 0: 206.6. Samples: 598472. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-10-06 12:06:29,976][00491] Avg episode reward: [(0, '13.504')]
[2024-10-06 12:06:34,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 2392064. Throughput: 0: 222.5. Samples: 600200. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-10-06 12:06:34,984][00491] Avg episode reward: [(0, '13.274')]
[2024-10-06 12:06:39,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.3, 300 sec: 833.1). Total num frames: 2396160. Throughput: 0: 205.9. Samples: 601050. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-10-06 12:06:39,980][00491] Avg episode reward: [(0, '13.645')]
[2024-10-06 12:06:44,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 2400256. Throughput: 0: 200.2. Samples: 601550. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-10-06 12:06:44,976][00491] Avg episode reward: [(0, '13.482')]
[2024-10-06 12:06:49,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 2404352. Throughput: 0: 220.1. Samples: 603354. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-10-06 12:06:49,981][00491] Avg episode reward: [(0, '13.641')]
[2024-10-06 12:06:54,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 2408448. Throughput: 0: 216.6. Samples: 604522. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-10-06 12:06:54,976][00491] Avg episode reward: [(0, '13.265')]
[2024-10-06 12:06:59,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 2412544. Throughput: 0: 205.3. Samples: 604756. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-10-06 12:06:59,985][00491] Avg episode reward: [(0, '13.289')]
[2024-10-06 12:07:01,031][04755] Updated weights for policy 0, policy_version 590 (0.0537)
[2024-10-06 12:07:04,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 2416640. Throughput: 0: 214.5. Samples: 606442. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-10-06 12:07:04,976][00491] Avg episode reward: [(0, '13.584')]
[2024-10-06 12:07:09,974][00491] Fps is (10 sec: 1228.6, 60 sec: 887.4, 300 sec: 847.0). Total num frames: 2424832. Throughput: 0: 212.5. Samples: 607566. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-10-06 12:07:09,977][00491] Avg episode reward: [(0, '13.772')]
[2024-10-06 12:07:14,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 2424832. Throughput: 0: 216.3. Samples: 608206. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-10-06 12:07:14,975][00491] Avg episode reward: [(0, '13.425')]
[2024-10-06 12:07:19,973][00491] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 2433024. Throughput: 0: 204.4. Samples: 609398. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-10-06 12:07:19,985][00491] Avg episode reward: [(0, '13.440')]
[2024-10-06 12:07:24,973][00491] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 2437120. Throughput: 0: 214.3. Samples: 610694. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-10-06 12:07:24,982][00491] Avg episode reward: [(0, '13.705')]
[2024-10-06 12:07:29,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 2441216. Throughput: 0: 221.0. Samples: 611496. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-10-06 12:07:29,979][00491] Avg episode reward: [(0, '13.615')]
[2024-10-06 12:07:34,973][00491] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 2441216. Throughput: 0: 204.1. Samples: 612538. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-10-06 12:07:34,981][00491] Avg episode reward: [(0, '13.917')]
[2024-10-06 12:07:39,145][04742] Saving new best policy, reward=13.917!
[2024-10-06 12:07:39,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 2449408. Throughput: 0: 205.3. Samples: 613762. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-10-06 12:07:39,986][00491] Avg episode reward: [(0, '13.573')]
[2024-10-06 12:07:44,973][00491] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 2453504. Throughput: 0: 222.5. Samples: 614768. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-10-06 12:07:44,977][00491] Avg episode reward: [(0, '14.067')]
[2024-10-06 12:07:49,153][04755] Updated weights for policy 0, policy_version 600 (0.1926)
[2024-10-06 12:07:49,156][04742] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000600_2457600.pth...
[2024-10-06 12:07:49,304][04742] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000551_2256896.pth
[2024-10-06 12:07:49,327][04742] Saving new best policy, reward=14.067!
[2024-10-06 12:07:49,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 2457600. Throughput: 0: 207.1. Samples: 615762. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-10-06 12:07:49,985][00491] Avg episode reward: [(0, '13.939')]
[2024-10-06 12:07:52,915][04742] Signal inference workers to stop experience collection... (600 times)
[2024-10-06 12:07:52,992][04755] InferenceWorker_p0-w0: stopping experience collection (600 times)
[2024-10-06 12:07:54,396][04742] Signal inference workers to resume experience collection... (600 times)
[2024-10-06 12:07:54,398][04755] InferenceWorker_p0-w0: resuming experience collection (600 times)
[2024-10-06 12:07:54,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 2461696. Throughput: 0: 205.4. Samples: 616810. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-10-06 12:07:54,981][00491] Avg episode reward: [(0, '13.898')]
[2024-10-06 12:07:59,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 2465792. Throughput: 0: 212.2. Samples: 617756. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-10-06 12:07:59,980][00491] Avg episode reward: [(0, '14.356')]
[2024-10-06 12:08:02,997][04742] Saving new best policy, reward=14.356!
[2024-10-06 12:08:04,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 2469888. Throughput: 0: 210.6. Samples: 618874. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-10-06 12:08:04,977][00491] Avg episode reward: [(0, '14.300')]
[2024-10-06 12:08:09,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 2473984. Throughput: 0: 203.6. Samples: 619854. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-10-06 12:08:09,981][00491] Avg episode reward: [(0, '14.252')]
[2024-10-06 12:08:14,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 2478080. Throughput: 0: 208.6. Samples: 620882. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-10-06 12:08:14,977][00491] Avg episode reward: [(0, '14.205')]
[2024-10-06 12:08:19,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 2482176. Throughput: 0: 214.0. Samples: 622168. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-10-06 12:08:19,986][00491] Avg episode reward: [(0, '14.594')]
[2024-10-06 12:08:22,706][04742] Saving new best policy, reward=14.594!
[2024-10-06 12:08:24,974][00491] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 2486272. Throughput: 0: 211.5. Samples: 623278. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-10-06 12:08:24,984][00491] Avg episode reward: [(0, '14.594')]
[2024-10-06 12:08:29,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 2490368. Throughput: 0: 203.5. Samples: 623924. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-10-06 12:08:29,983][00491] Avg episode reward: [(0, '14.951')]
[2024-10-06 12:08:32,412][04742] Saving new best policy, reward=14.951!
[2024-10-06 12:08:34,973][00491] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 2494464. Throughput: 0: 211.3. Samples: 625272. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-10-06 12:08:34,981][00491] Avg episode reward: [(0, '14.712')]
[2024-10-06 12:08:36,938][04755] Updated weights for policy 0, policy_version 610 (0.0967)
[2024-10-06 12:08:39,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 2498560. Throughput: 0: 217.0. Samples: 626576. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-10-06 12:08:39,976][00491] Avg episode reward: [(0, '15.274')]
[2024-10-06 12:08:42,283][04742] Saving new best policy, reward=15.274!
[2024-10-06 12:08:44,977][00491] Fps is (10 sec: 818.9, 60 sec: 819.1, 300 sec: 833.1). Total num frames: 2502656. Throughput: 0: 207.0. Samples: 627072. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-10-06 12:08:44,984][00491] Avg episode reward: [(0, '15.424')]
[2024-10-06 12:08:47,453][04742] Saving new best policy, reward=15.424!
[2024-10-06 12:08:49,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 2506752. Throughput: 0: 212.2. Samples: 628424. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-10-06 12:08:49,976][00491] Avg episode reward: [(0, '15.489')]
[2024-10-06 12:08:54,973][00491] Fps is (10 sec: 819.5, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 2510848. Throughput: 0: 227.3. Samples: 630084. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-10-06 12:08:54,986][00491] Avg episode reward: [(0, '15.084')]
[2024-10-06 12:08:55,482][04742] Saving new best policy, reward=15.489!
[2024-10-06 12:08:59,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 2514944. Throughput: 0: 214.9. Samples: 630554. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-10-06 12:08:59,977][00491] Avg episode reward: [(0, '14.810')]
[2024-10-06 12:09:04,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 2519040. Throughput: 0: 205.4. Samples: 631412. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-10-06 12:09:04,983][00491] Avg episode reward: [(0, '15.129')]
[2024-10-06 12:09:09,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 2523136. Throughput: 0: 213.9. Samples: 632902. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-10-06 12:09:09,983][00491] Avg episode reward: [(0, '15.016')]
[2024-10-06 12:09:14,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 2527232. Throughput: 0: 209.0. Samples: 633328. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-10-06 12:09:14,982][00491] Avg episode reward: [(0, '14.597')]
[2024-10-06 12:09:19,973][00491] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 819.2). Total num frames: 2527232. Throughput: 0: 192.6. Samples: 633940. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-10-06 12:09:19,977][00491] Avg episode reward: [(0, '14.802')]
[2024-10-06 12:09:24,973][00491] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 819.2). Total num frames: 2531328. Throughput: 0: 181.7. Samples: 634754. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-10-06 12:09:24,975][00491] Avg episode reward: [(0, '14.920')]
[2024-10-06 12:09:29,973][00491] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 833.1). Total num frames: 2535424. Throughput: 0: 183.2. Samples: 635314. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-10-06 12:09:29,976][00491] Avg episode reward: [(0, '14.545')]
[2024-10-06 12:09:31,952][04755] Updated weights for policy 0, policy_version 620 (0.1941)
[2024-10-06 12:09:34,973][00491] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 819.2). Total num frames: 2539520. Throughput: 0: 187.5. Samples: 636860. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-10-06 12:09:34,976][00491] Avg episode reward: [(0, '14.813')]
[2024-10-06 12:09:39,973][00491] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 819.2). Total num frames: 2543616. Throughput: 0: 173.2. Samples: 637876. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-10-06 12:09:39,975][00491] Avg episode reward: [(0, '14.927')]
[2024-10-06 12:09:44,973][00491] Fps is (10 sec: 819.2, 60 sec: 751.0, 300 sec: 819.2). Total num frames: 2547712. Throughput: 0: 174.4. Samples: 638400. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-10-06 12:09:44,978][00491] Avg episode reward: [(0, '14.911')]
[2024-10-06 12:09:46,877][04742] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000623_2551808.pth...
[2024-10-06 12:09:46,993][04742] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000575_2355200.pth
[2024-10-06 12:09:49,973][00491] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 833.1). Total num frames: 2551808. Throughput: 0: 191.0. Samples: 640006. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-10-06 12:09:49,980][00491] Avg episode reward: [(0, '14.833')]
[2024-10-06 12:09:54,973][00491] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 819.2). Total num frames: 2555904. Throughput: 0: 186.4. Samples: 641288. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-10-06 12:09:54,978][00491] Avg episode reward: [(0, '15.468')]
[2024-10-06 12:09:59,973][00491] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 819.2). Total num frames: 2560000. Throughput: 0: 182.5. Samples: 641540. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-10-06 12:09:59,983][00491] Avg episode reward: [(0, '15.577')]
[2024-10-06 12:10:02,398][04742] Saving new best policy, reward=15.577!
[2024-10-06 12:10:04,973][00491] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 833.1). Total num frames: 2564096. Throughput: 0: 199.6. Samples: 642920. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-10-06 12:10:04,976][00491] Avg episode reward: [(0, '15.880')]
[2024-10-06 12:10:09,973][00491] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 819.2). Total num frames: 2568192. Throughput: 0: 218.8. Samples: 644602. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-10-06 12:10:09,977][00491] Avg episode reward: [(0, '16.214')]
[2024-10-06 12:10:11,164][04742] Saving new best policy, reward=15.880!
[2024-10-06 12:10:14,973][00491] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 819.2). Total num frames: 2572288. Throughput: 0: 211.9. Samples: 644850. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-10-06 12:10:14,980][00491] Avg episode reward: [(0, '16.274')]
[2024-10-06 12:10:17,492][04742] Saving new best policy, reward=16.214!
[2024-10-06 12:10:17,636][04742] Saving new best policy, reward=16.274!
[2024-10-06 12:10:19,979][00491] Fps is (10 sec: 818.7, 60 sec: 819.1, 300 sec: 819.2). Total num frames: 2576384. Throughput: 0: 201.6. Samples: 645934. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-10-06 12:10:19,987][00491] Avg episode reward: [(0, '16.311')]
[2024-10-06 12:10:22,201][04755] Updated weights for policy 0, policy_version 630 (0.0994)
[2024-10-06 12:10:24,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2580480. Throughput: 0: 213.4. Samples: 647480. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-10-06 12:10:24,983][00491] Avg episode reward: [(0, '16.375')]
[2024-10-06 12:10:25,813][04742] Saving new best policy, reward=16.311!
[2024-10-06 12:10:29,973][00491] Fps is (10 sec: 819.7, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2584576. Throughput: 0: 217.3. Samples: 648180. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-10-06 12:10:29,981][00491] Avg episode reward: [(0, '15.985')]
[2024-10-06 12:10:31,733][04742] Saving new best policy, reward=16.375!
[2024-10-06 12:10:34,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2588672. Throughput: 0: 199.8. Samples: 648998. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-10-06 12:10:34,978][00491] Avg episode reward: [(0, '16.969')]
[2024-10-06 12:10:37,029][04742] Saving new best policy, reward=16.969!
[2024-10-06 12:10:39,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2592768. Throughput: 0: 205.5. Samples: 650536. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-10-06 12:10:39,987][00491] Avg episode reward: [(0, '17.276')]
[2024-10-06 12:10:44,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2596864. Throughput: 0: 213.2. Samples: 651134. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-10-06 12:10:44,982][00491] Avg episode reward: [(0, '17.558')]
[2024-10-06 12:10:45,116][04742] Saving new best policy, reward=17.276!
[2024-10-06 12:10:49,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2600960. Throughput: 0: 212.5. Samples: 652484. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-10-06 12:10:49,980][00491] Avg episode reward: [(0, '17.228')]
[2024-10-06 12:10:51,260][04742] Saving new best policy, reward=17.558!
[2024-10-06 12:10:54,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2605056. Throughput: 0: 198.6. Samples: 653538. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-10-06 12:10:54,984][00491] Avg episode reward: [(0, '17.083')]
[2024-10-06 12:10:59,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2609152. Throughput: 0: 207.7. Samples: 654196. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-10-06 12:10:59,977][00491] Avg episode reward: [(0, '16.943')]
[2024-10-06 12:11:04,973][00491] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 833.1). Total num frames: 2617344. Throughput: 0: 219.6. Samples: 655814. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-10-06 12:11:04,975][00491] Avg episode reward: [(0, '17.227')]
[2024-10-06 12:11:09,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2617344. Throughput: 0: 208.0. Samples: 656838. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-10-06 12:11:09,976][00491] Avg episode reward: [(0, '17.771')]
[2024-10-06 12:11:11,047][04755] Updated weights for policy 0, policy_version 640 (0.0057)
[2024-10-06 12:11:14,973][00491] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2621440. Throughput: 0: 201.4. Samples: 657244. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-10-06 12:11:14,984][00491] Avg episode reward: [(0, '17.527')]
[2024-10-06 12:11:15,144][04742] Saving new best policy, reward=17.771!
[2024-10-06 12:11:19,973][00491] Fps is (10 sec: 1228.8, 60 sec: 887.6, 300 sec: 833.1). Total num frames: 2629632. Throughput: 0: 220.4. Samples: 658918. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-10-06 12:11:19,977][00491] Avg episode reward: [(0, '17.592')]
[2024-10-06 12:11:24,973][00491] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 833.1). Total num frames: 2633728. Throughput: 0: 208.7. Samples: 659928. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-10-06 12:11:24,982][00491] Avg episode reward: [(0, '17.458')]
[2024-10-06 12:11:29,973][00491] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2633728. Throughput: 0: 209.1. Samples: 660542. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-10-06 12:11:29,986][00491] Avg episode reward: [(0, '17.500')]
[2024-10-06 12:11:34,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 833.1). Total num frames: 2641920. Throughput: 0: 210.4. Samples: 661950. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-10-06 12:11:34,976][00491] Avg episode reward: [(0, '18.009')]
[2024-10-06 12:11:38,514][04742] Saving new best policy, reward=18.009!
[2024-10-06 12:11:39,973][00491] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 833.1). Total num frames: 2646016. Throughput: 0: 215.3. Samples: 663228. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-10-06 12:11:39,976][00491] Avg episode reward: [(0, '17.968')]
[2024-10-06 12:11:44,976][00491] Fps is (10 sec: 818.9, 60 sec: 887.4, 300 sec: 833.1). Total num frames: 2650112. Throughput: 0: 216.9. Samples: 663958. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-10-06 12:11:44,981][00491] Avg episode reward: [(0, '18.041')]
[2024-10-06 12:11:49,445][04742] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000648_2654208.pth...
[2024-10-06 12:11:49,556][04742] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000600_2457600.pth
[2024-10-06 12:11:49,577][04742] Saving new best policy, reward=18.041!
[2024-10-06 12:11:49,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 833.1). Total num frames: 2654208. Throughput: 0: 205.0. Samples: 665038. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-10-06 12:11:49,976][00491] Avg episode reward: [(0, '17.543')]
[2024-10-06 12:11:54,973][00491] Fps is (10 sec: 819.5, 60 sec: 887.5, 300 sec: 833.1). Total num frames: 2658304. Throughput: 0: 205.9. Samples: 666102. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-10-06 12:11:54,976][00491] Avg episode reward: [(0, '17.179')]
[2024-10-06 12:11:58,018][04755] Updated weights for policy 0, policy_version 650 (0.1011)
[2024-10-06 12:11:59,974][00491] Fps is (10 sec: 819.1, 60 sec: 887.4, 300 sec: 833.1). Total num frames: 2662400. Throughput: 0: 218.0. Samples: 667054. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-10-06 12:11:59,977][00491] Avg episode reward: [(0, '17.032')]
[2024-10-06 12:12:02,326][04742] Signal inference workers to stop experience collection... (650 times)
[2024-10-06 12:12:02,439][04755] InferenceWorker_p0-w0: stopping experience collection (650 times)
[2024-10-06 12:12:04,210][04742] Signal inference workers to resume experience collection... (650 times)
[2024-10-06 12:12:04,210][04755] InferenceWorker_p0-w0: resuming experience collection (650 times)
[2024-10-06 12:12:04,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2666496. Throughput: 0: 203.1. Samples: 668058. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-10-06 12:12:04,978][00491] Avg episode reward: [(0, '17.026')]
[2024-10-06 12:12:09,973][00491] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 833.1). Total num frames: 2670592. Throughput: 0: 208.6. Samples: 669314. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-10-06 12:12:09,976][00491] Avg episode reward: [(0, '17.253')]
[2024-10-06 12:12:14,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 819.2). Total num frames: 2674688. Throughput: 0: 212.3. Samples: 670094. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-10-06 12:12:14,982][00491] Avg episode reward: [(0, '17.283')]
[2024-10-06 12:12:19,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2678784. Throughput: 0: 206.8. Samples: 671256. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-10-06 12:12:19,975][00491] Avg episode reward: [(0, '16.929')]
[2024-10-06 12:12:24,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2682880. Throughput: 0: 204.9. Samples: 672450. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-10-06 12:12:24,975][00491] Avg episode reward: [(0, '16.744')]
[2024-10-06 12:12:29,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 833.1). Total num frames: 2686976. Throughput: 0: 205.3. Samples: 673194. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-10-06 12:12:29,976][00491] Avg episode reward: [(0, '16.755')]
[2024-10-06 12:12:34,975][00491] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2691072. Throughput: 0: 215.1. Samples: 674716. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-10-06 12:12:34,979][00491] Avg episode reward: [(0, '17.244')]
[2024-10-06 12:12:39,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2695168. Throughput: 0: 209.7. Samples: 675540. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-10-06 12:12:39,982][00491] Avg episode reward: [(0, '17.244')]
[2024-10-06 12:12:44,973][00491] Fps is (10 sec: 819.4, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2699264. Throughput: 0: 205.8. Samples: 676314. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-10-06 12:12:44,976][00491] Avg episode reward: [(0, '18.011')]
[2024-10-06 12:12:46,401][04755] Updated weights for policy 0, policy_version 660 (0.1096)
[2024-10-06 12:12:49,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2703360. Throughput: 0: 223.9. Samples: 678134. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-10-06 12:12:49,983][00491] Avg episode reward: [(0, '18.318')]
[2024-10-06 12:12:54,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2707456. Throughput: 0: 221.4. Samples: 679276. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-10-06 12:12:54,981][00491] Avg episode reward: [(0, '18.036')]
[2024-10-06 12:12:56,581][04742] Saving new best policy, reward=18.318!
[2024-10-06 12:12:59,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2711552. Throughput: 0: 211.3. Samples: 679602. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-10-06 12:12:59,977][00491] Avg episode reward: [(0, '18.487')]
[2024-10-06 12:13:04,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2715648. Throughput: 0: 220.3. Samples: 681168. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-10-06 12:13:04,986][00491] Avg episode reward: [(0, '18.699')]
[2024-10-06 12:13:05,311][04742] Saving new best policy, reward=18.487!
[2024-10-06 12:13:09,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2719744. Throughput: 0: 222.4. Samples: 682458. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-10-06 12:13:09,976][00491] Avg episode reward: [(0, '18.699')]
[2024-10-06 12:13:10,073][04742] Saving new best policy, reward=18.699!
[2024-10-06 12:13:14,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2723840. Throughput: 0: 216.8. Samples: 682948. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-10-06 12:13:14,977][00491] Avg episode reward: [(0, '18.654')]
[2024-10-06 12:13:19,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2727936. Throughput: 0: 210.8. Samples: 684202. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-10-06 12:13:19,976][00491] Avg episode reward: [(0, '19.233')]
[2024-10-06 12:13:24,548][04742] Saving new best policy, reward=19.233!
[2024-10-06 12:13:24,973][00491] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 833.1). Total num frames: 2736128. Throughput: 0: 220.8. Samples: 685474. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-10-06 12:13:24,976][00491] Avg episode reward: [(0, '19.255')]
[2024-10-06 12:13:29,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2736128. Throughput: 0: 221.9. Samples: 686298. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-10-06 12:13:29,976][00491] Avg episode reward: [(0, '19.173')]
[2024-10-06 12:13:30,272][04742] Saving new best policy, reward=19.255!
[2024-10-06 12:13:34,973][00491] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2740224. Throughput: 0: 199.8. Samples: 687124. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-10-06 12:13:34,977][00491] Avg episode reward: [(0, '18.821')]
[2024-10-06 12:13:35,698][04755] Updated weights for policy 0, policy_version 670 (0.0540)
[2024-10-06 12:13:39,973][00491] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 833.1). Total num frames: 2748416. Throughput: 0: 204.5. Samples: 688480. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-10-06 12:13:39,981][00491] Avg episode reward: [(0, '18.635')]
[2024-10-06 12:13:44,981][00491] Fps is (10 sec: 1227.9, 60 sec: 887.4, 300 sec: 833.1). Total num frames: 2752512. Throughput: 0: 219.7. Samples: 689490. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-10-06 12:13:44,987][00491] Avg episode reward: [(0, '19.731')]
[2024-10-06 12:13:49,973][00491] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2752512. Throughput: 0: 203.5. Samples: 690326. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-10-06 12:13:49,978][00491] Avg episode reward: [(0, '19.430')]
[2024-10-06 12:13:50,124][04742] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000673_2756608.pth...
[2024-10-06 12:13:50,275][04742] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000623_2551808.pth
[2024-10-06 12:13:50,299][04742] Saving new best policy, reward=19.731!
[2024-10-06 12:13:54,973][00491] Fps is (10 sec: 819.8, 60 sec: 887.5, 300 sec: 833.1). Total num frames: 2760704. Throughput: 0: 201.2. Samples: 691512. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-10-06 12:13:54,976][00491] Avg episode reward: [(0, '19.977')]
[2024-10-06 12:13:58,861][04742] Saving new best policy, reward=19.977!
[2024-10-06 12:13:59,973][00491] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 833.1). Total num frames: 2764800. Throughput: 0: 211.1. Samples: 692446. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-10-06 12:13:59,978][00491] Avg episode reward: [(0, '20.080')]
[2024-10-06 12:14:03,561][04742] Saving new best policy, reward=20.080!
[2024-10-06 12:14:04,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 833.1). Total num frames: 2768896. Throughput: 0: 208.2. Samples: 693570. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-10-06 12:14:04,978][00491] Avg episode reward: [(0, '20.050')]
[2024-10-06 12:14:09,973][00491] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2768896. Throughput: 0: 203.1. Samples: 694614. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-10-06 12:14:09,985][00491] Avg episode reward: [(0, '20.033')]
[2024-10-06 12:14:14,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 2777088. Throughput: 0: 200.3. Samples: 695310. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-10-06 12:14:14,978][00491] Avg episode reward: [(0, '19.666')]
[2024-10-06 12:14:19,973][00491] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 2781184. Throughput: 0: 212.3. Samples: 696676. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-10-06 12:14:19,981][00491] Avg episode reward: [(0, '19.517')]
[2024-10-06 12:14:24,795][04755] Updated weights for policy 0, policy_version 680 (0.2187)
[2024-10-06 12:14:24,975][00491] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 2785280. Throughput: 0: 207.3. Samples: 697808. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-10-06 12:14:24,982][00491] Avg episode reward: [(0, '19.713')]
[2024-10-06 12:14:29,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 2789376. Throughput: 0: 197.3. Samples: 698366. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-10-06 12:14:29,983][00491] Avg episode reward: [(0, '18.764')]
[2024-10-06 12:14:34,973][00491] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 2793472. Throughput: 0: 212.4. Samples: 699882. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-10-06 12:14:34,984][00491] Avg episode reward: [(0, '18.524')]
[2024-10-06 12:14:39,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 2797568. Throughput: 0: 215.3. Samples: 701202. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-10-06 12:14:39,978][00491] Avg episode reward: [(0, '18.664')]
[2024-10-06 12:14:44,978][00491] Fps is (10 sec: 818.8, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 2801664. Throughput: 0: 209.2. Samples: 701862. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-10-06 12:14:44,987][00491] Avg episode reward: [(0, '18.205')]
[2024-10-06 12:14:49,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 2805760. Throughput: 0: 209.6. Samples: 703000. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-10-06 12:14:49,975][00491] Avg episode reward: [(0, '18.201')]
[2024-10-06 12:14:54,973][00491] Fps is (10 sec: 819.6, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 2809856. Throughput: 0: 218.1. Samples: 704430. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-10-06 12:14:54,985][00491] Avg episode reward: [(0, '18.290')]
[2024-10-06 12:14:59,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 2813952. Throughput: 0: 215.9. Samples: 705024. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-10-06 12:14:59,976][00491] Avg episode reward: [(0, '18.426')]
[2024-10-06 12:15:04,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 2818048. Throughput: 0: 208.0. Samples: 706034. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-10-06 12:15:04,982][00491] Avg episode reward: [(0, '18.316')]
[2024-10-06 12:15:09,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 2822144. Throughput: 0: 215.1. Samples: 707486. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-10-06 12:15:09,977][00491] Avg episode reward: [(0, '18.508')]
[2024-10-06 12:15:11,836][04755] Updated weights for policy 0, policy_version 690 (0.1495)
[2024-10-06 12:15:14,974][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 2826240. Throughput: 0: 219.7. Samples: 708252. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-10-06 12:15:14,979][00491] Avg episode reward: [(0, '18.594')]
[2024-10-06 12:15:19,973][00491] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 2830336. Throughput: 0: 209.6. Samples: 709312. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-10-06 12:15:19,982][00491] Avg episode reward: [(0, '18.715')]
[2024-10-06 12:15:24,973][00491] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 2834432. Throughput: 0: 209.0. Samples: 710608. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-10-06 12:15:24,976][00491] Avg episode reward: [(0, '18.402')]
[2024-10-06 12:15:29,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 2838528. Throughput: 0: 211.2. Samples: 711366. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-10-06 12:15:29,976][00491] Avg episode reward: [(0, '18.240')]
[2024-10-06 12:15:34,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 2842624. Throughput: 0: 215.8. Samples: 712712. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-10-06 12:15:34,977][00491] Avg episode reward: [(0, '18.678')]
[2024-10-06 12:15:39,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 2846720. Throughput: 0: 207.6. Samples: 713770. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-10-06 12:15:39,975][00491] Avg episode reward: [(0, '18.491')]
[2024-10-06 12:15:44,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.3, 300 sec: 847.0). Total num frames: 2850816. Throughput: 0: 209.5. Samples: 714450. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-10-06 12:15:44,978][00491] Avg episode reward: [(0, '18.386')]
[2024-10-06 12:15:49,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 2854912. Throughput: 0: 227.3. Samples: 716262. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-10-06 12:15:49,977][00491] Avg episode reward: [(0, '18.404')]
[2024-10-06 12:15:50,040][04742] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000698_2859008.pth...
[2024-10-06 12:15:50,180][04742] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000648_2654208.pth
[2024-10-06 12:15:54,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 2859008. Throughput: 0: 216.2. Samples: 717214. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-10-06 12:15:54,977][00491] Avg episode reward: [(0, '18.034')]
[2024-10-06 12:15:59,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 2863104. Throughput: 0: 208.5. Samples: 717634. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-10-06 12:15:59,977][00491] Avg episode reward: [(0, '17.703')]
[2024-10-06 12:16:00,329][04755] Updated weights for policy 0, policy_version 700 (0.0515)
[2024-10-06 12:16:02,710][04742] Signal inference workers to stop experience collection... (700 times)
[2024-10-06 12:16:02,755][04755] InferenceWorker_p0-w0: stopping experience collection (700 times)
[2024-10-06 12:16:04,297][04742] Signal inference workers to resume experience collection... (700 times)
[2024-10-06 12:16:04,299][04755] InferenceWorker_p0-w0: resuming experience collection (700 times)
[2024-10-06 12:16:04,973][00491] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 2871296. Throughput: 0: 221.6. Samples: 719282. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-10-06 12:16:04,980][00491] Avg episode reward: [(0, '17.665')]
[2024-10-06 12:16:09,973][00491] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 2875392. Throughput: 0: 216.0. Samples: 720330. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-10-06 12:16:09,976][00491] Avg episode reward: [(0, '17.834')]
[2024-10-06 12:16:14,973][00491] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 2875392. Throughput: 0: 211.7. Samples: 720892. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-10-06 12:16:14,980][00491] Avg episode reward: [(0, '17.574')]
[2024-10-06 12:16:19,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 2883584. Throughput: 0: 215.8. Samples: 722424. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-10-06 12:16:19,976][00491] Avg episode reward: [(0, '17.782')]
[2024-10-06 12:16:24,973][00491] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 2887680. Throughput: 0: 222.0. Samples: 723762. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-10-06 12:16:24,980][00491] Avg episode reward: [(0, '17.157')]
[2024-10-06 12:16:29,981][00491] Fps is (10 sec: 818.6, 60 sec: 887.4, 300 sec: 846.9). Total num frames: 2891776. Throughput: 0: 222.0. Samples: 724442. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-10-06 12:16:29,983][00491] Avg episode reward: [(0, '16.796')]
[2024-10-06 12:16:34,976][00491] Fps is (10 sec: 818.9, 60 sec: 887.4, 300 sec: 847.0). Total num frames: 2895872. Throughput: 0: 205.0. Samples: 725486. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-10-06 12:16:34,989][00491] Avg episode reward: [(0, '17.158')]
[2024-10-06 12:16:39,974][00491] Fps is (10 sec: 819.8, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 2899968. Throughput: 0: 210.7. Samples: 726696. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-10-06 12:16:39,979][00491] Avg episode reward: [(0, '17.217')]
[2024-10-06 12:16:44,973][00491] Fps is (10 sec: 819.5, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 2904064. Throughput: 0: 219.9. Samples: 727528. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
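The checkpoint filenames tie the two progress counters together: in every name in this log, the frame count equals the policy version times 4096 (e.g. 698 x 4096 = 2859008, 648 x 4096 = 2654208), so one policy version corresponds to one 4096-frame training batch in this run. A small parser for the pattern follows; the filename format is taken directly from the log, while the 4096 ratio is an observation about this run rather than a general rule.

    import re

    def parse_checkpoint_name(name):
        # Recover (policy_version, env_frames) from names like
        # 'checkpoint_000000698_2859008.pth'.
        m = re.fullmatch(r"checkpoint_(\d+)_(\d+)\.pth", name)
        version, frames = int(m.group(1)), int(m.group(2))
        assert frames == version * 4096  # holds for every checkpoint in this log
        return version, frames

    print(parse_checkpoint_name("checkpoint_000000698_2859008.pth"))  # (698, 2859008)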
Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 12:16:44,978][00491] Avg episode reward: [(0, '17.219')] [2024-10-06 12:16:48,566][04755] Updated weights for policy 0, policy_version 710 (0.0974) [2024-10-06 12:16:49,977][00491] Fps is (10 sec: 818.9, 60 sec: 887.4, 300 sec: 847.0). Total num frames: 2908160. Throughput: 0: 205.4. Samples: 728524. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 12:16:49,983][00491] Avg episode reward: [(0, '17.098')] [2024-10-06 12:16:54,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 2912256. Throughput: 0: 211.4. Samples: 729844. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 12:16:54,985][00491] Avg episode reward: [(0, '16.965')] [2024-10-06 12:16:59,973][00491] Fps is (10 sec: 819.5, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 2916352. Throughput: 0: 216.8. Samples: 730650. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-10-06 12:16:59,982][00491] Avg episode reward: [(0, '17.207')] [2024-10-06 12:17:04,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 2920448. Throughput: 0: 212.9. Samples: 732004. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-10-06 12:17:04,989][00491] Avg episode reward: [(0, '17.062')] [2024-10-06 12:17:09,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 2924544. Throughput: 0: 205.6. Samples: 733014. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 12:17:09,975][00491] Avg episode reward: [(0, '17.475')] [2024-10-06 12:17:14,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 2928640. Throughput: 0: 208.0. Samples: 733800. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 12:17:14,975][00491] Avg episode reward: [(0, '17.398')] [2024-10-06 12:17:19,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 2932736. Throughput: 0: 220.0. Samples: 735384. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 12:17:19,984][00491] Avg episode reward: [(0, '17.121')] [2024-10-06 12:17:24,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 2936832. Throughput: 0: 214.8. Samples: 736364. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 12:17:24,981][00491] Avg episode reward: [(0, '16.971')] [2024-10-06 12:17:29,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.3, 300 sec: 847.0). Total num frames: 2940928. Throughput: 0: 207.6. Samples: 736868. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 12:17:29,975][00491] Avg episode reward: [(0, '16.905')] [2024-10-06 12:17:34,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 2945024. Throughput: 0: 223.8. Samples: 738594. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 12:17:34,976][00491] Avg episode reward: [(0, '17.411')] [2024-10-06 12:17:36,017][04755] Updated weights for policy 0, policy_version 720 (0.1975) [2024-10-06 12:17:39,976][00491] Fps is (10 sec: 818.9, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 2949120. Throughput: 0: 216.7. Samples: 739596. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 12:17:39,982][00491] Avg episode reward: [(0, '17.444')] [2024-10-06 12:17:44,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 2953216. Throughput: 0: 208.1. Samples: 740016. 
Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 12:17:44,976][00491] Avg episode reward: [(0, '18.203')] [2024-10-06 12:17:46,866][04742] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000722_2957312.pth... [2024-10-06 12:17:46,979][04742] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000673_2756608.pth [2024-10-06 12:17:49,973][00491] Fps is (10 sec: 819.5, 60 sec: 819.3, 300 sec: 847.0). Total num frames: 2957312. Throughput: 0: 212.3. Samples: 741556. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 12:17:49,984][00491] Avg episode reward: [(0, '18.380')] [2024-10-06 12:17:54,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 2961408. Throughput: 0: 221.4. Samples: 742976. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 12:17:54,978][00491] Avg episode reward: [(0, '18.008')] [2024-10-06 12:17:59,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 2965504. Throughput: 0: 212.2. Samples: 743350. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 12:17:59,982][00491] Avg episode reward: [(0, '18.199')] [2024-10-06 12:18:04,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 2969600. Throughput: 0: 203.3. Samples: 744534. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 12:18:04,985][00491] Avg episode reward: [(0, '18.071')] [2024-10-06 12:18:09,973][00491] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 2977792. Throughput: 0: 214.5. Samples: 746018. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 12:18:09,976][00491] Avg episode reward: [(0, '18.507')] [2024-10-06 12:18:14,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 2977792. Throughput: 0: 222.4. Samples: 746874. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 12:18:14,976][00491] Avg episode reward: [(0, '18.582')] [2024-10-06 12:18:19,973][00491] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 2981888. Throughput: 0: 202.6. Samples: 747710. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-10-06 12:18:19,982][00491] Avg episode reward: [(0, '18.336')] [2024-10-06 12:18:24,893][04755] Updated weights for policy 0, policy_version 730 (0.0055) [2024-10-06 12:18:24,973][00491] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 2990080. Throughput: 0: 211.6. Samples: 749116. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 12:18:24,976][00491] Avg episode reward: [(0, '18.309')] [2024-10-06 12:18:29,973][00491] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 2994176. Throughput: 0: 223.2. Samples: 750062. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 12:18:29,982][00491] Avg episode reward: [(0, '17.720')] [2024-10-06 12:18:34,976][00491] Fps is (10 sec: 818.9, 60 sec: 887.4, 300 sec: 847.0). Total num frames: 2998272. Throughput: 0: 214.1. Samples: 751190. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 12:18:34,979][00491] Avg episode reward: [(0, '17.654')] [2024-10-06 12:18:39,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 3002368. Throughput: 0: 205.3. Samples: 752214. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 12:18:39,977][00491] Avg episode reward: [(0, '17.353')] [2024-10-06 12:18:44,973][00491] Fps is (10 sec: 819.5, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 3006464. 
Throughput: 0: 219.0. Samples: 753204. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 12:18:44,975][00491] Avg episode reward: [(0, '17.490')] [2024-10-06 12:18:49,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 3010560. Throughput: 0: 219.3. Samples: 754404. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 12:18:49,976][00491] Avg episode reward: [(0, '17.559')] [2024-10-06 12:18:54,977][00491] Fps is (10 sec: 818.9, 60 sec: 887.4, 300 sec: 847.0). Total num frames: 3014656. Throughput: 0: 208.6. Samples: 755406. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 12:18:54,979][00491] Avg episode reward: [(0, '17.265')] [2024-10-06 12:18:59,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 3018752. Throughput: 0: 208.7. Samples: 756266. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 12:18:59,987][00491] Avg episode reward: [(0, '16.729')] [2024-10-06 12:19:04,973][00491] Fps is (10 sec: 819.5, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 3022848. Throughput: 0: 216.8. Samples: 757468. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 12:19:04,977][00491] Avg episode reward: [(0, '17.597')] [2024-10-06 12:19:09,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 3026944. Throughput: 0: 215.6. Samples: 758820. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 12:19:09,981][00491] Avg episode reward: [(0, '17.999')] [2024-10-06 12:19:13,175][04755] Updated weights for policy 0, policy_version 740 (0.0985) [2024-10-06 12:19:14,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 3031040. Throughput: 0: 205.3. Samples: 759302. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 12:19:14,981][00491] Avg episode reward: [(0, '18.452')] [2024-10-06 12:19:19,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 3035136. Throughput: 0: 211.1. Samples: 760690. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 12:19:19,980][00491] Avg episode reward: [(0, '18.379')] [2024-10-06 12:19:24,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 3039232. Throughput: 0: 222.2. Samples: 762212. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 12:19:24,976][00491] Avg episode reward: [(0, '18.502')] [2024-10-06 12:19:29,978][00491] Fps is (10 sec: 818.7, 60 sec: 819.1, 300 sec: 847.0). Total num frames: 3043328. Throughput: 0: 207.4. Samples: 762538. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 12:19:29,981][00491] Avg episode reward: [(0, '17.915')] [2024-10-06 12:19:34,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 3047424. Throughput: 0: 209.2. Samples: 763816. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 12:19:34,983][00491] Avg episode reward: [(0, '17.525')] [2024-10-06 12:19:39,973][00491] Fps is (10 sec: 819.7, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 3051520. Throughput: 0: 223.3. Samples: 765454. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 12:19:39,975][00491] Avg episode reward: [(0, '18.442')] [2024-10-06 12:19:44,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 3055616. Throughput: 0: 219.9. Samples: 766160. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 12:19:44,976][00491] Avg episode reward: [(0, '18.366')] [2024-10-06 12:19:49,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 3059712. Throughput: 0: 212.7. Samples: 767040. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 12:19:49,977][00491] Avg episode reward: [(0, '18.518')] [2024-10-06 12:19:50,793][04742] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000748_3063808.pth... [2024-10-06 12:19:50,905][04742] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000698_2859008.pth [2024-10-06 12:19:54,973][00491] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 3067904. Throughput: 0: 214.9. Samples: 768492. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 12:19:54,975][00491] Avg episode reward: [(0, '18.378')] [2024-10-06 12:19:59,454][04755] Updated weights for policy 0, policy_version 750 (0.1014) [2024-10-06 12:19:59,973][00491] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 3072000. Throughput: 0: 224.7. Samples: 769414. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 12:19:59,978][00491] Avg episode reward: [(0, '18.303')] [2024-10-06 12:20:04,025][04742] Signal inference workers to stop experience collection... (750 times) [2024-10-06 12:20:04,145][04755] InferenceWorker_p0-w0: stopping experience collection (750 times) [2024-10-06 12:20:04,973][00491] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 3072000. Throughput: 0: 215.7. Samples: 770398. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 12:20:04,976][00491] Avg episode reward: [(0, '18.479')] [2024-10-06 12:20:05,165][04742] Signal inference workers to resume experience collection... (750 times) [2024-10-06 12:20:05,168][04755] InferenceWorker_p0-w0: resuming experience collection (750 times) [2024-10-06 12:20:09,973][00491] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 3076096. Throughput: 0: 209.6. Samples: 771646. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 12:20:09,978][00491] Avg episode reward: [(0, '19.164')] [2024-10-06 12:20:14,973][00491] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 3084288. Throughput: 0: 221.5. Samples: 772504. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 12:20:14,975][00491] Avg episode reward: [(0, '19.442')] [2024-10-06 12:20:19,973][00491] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 3088384. Throughput: 0: 219.1. Samples: 773676. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 12:20:19,979][00491] Avg episode reward: [(0, '19.173')] [2024-10-06 12:20:24,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 3092480. Throughput: 0: 206.4. Samples: 774742. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 12:20:24,978][00491] Avg episode reward: [(0, '19.206')] [2024-10-06 12:20:29,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 3096576. Throughput: 0: 210.4. Samples: 775628. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 12:20:29,982][00491] Avg episode reward: [(0, '20.228')] [2024-10-06 12:20:32,854][04742] Saving new best policy, reward=20.228! [2024-10-06 12:20:34,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 3100672. Throughput: 0: 218.8. Samples: 776884. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 12:20:34,977][00491] Avg episode reward: [(0, '20.176')] [2024-10-06 12:20:39,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 3104768. Throughput: 0: 214.0. Samples: 778124. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 12:20:39,978][00491] Avg episode reward: [(0, '20.524')] [2024-10-06 12:20:43,618][04742] Saving new best policy, reward=20.524! [2024-10-06 12:20:44,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 3108864. Throughput: 0: 206.9. Samples: 778724. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 12:20:44,975][00491] Avg episode reward: [(0, '20.764')] [2024-10-06 12:20:47,534][04742] Saving new best policy, reward=20.764! [2024-10-06 12:20:47,544][04755] Updated weights for policy 0, policy_version 760 (0.0518) [2024-10-06 12:20:49,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 3112960. Throughput: 0: 213.5. Samples: 780004. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 12:20:49,980][00491] Avg episode reward: [(0, '20.470')] [2024-10-06 12:20:54,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 860.9). Total num frames: 3117056. Throughput: 0: 224.9. Samples: 781766. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 12:20:54,977][00491] Avg episode reward: [(0, '20.577')] [2024-10-06 12:20:59,974][00491] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 3121152. Throughput: 0: 210.8. Samples: 781990. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 12:20:59,979][00491] Avg episode reward: [(0, '20.492')] [2024-10-06 12:21:04,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 3125248. Throughput: 0: 212.8. Samples: 783254. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 12:21:04,985][00491] Avg episode reward: [(0, '20.048')] [2024-10-06 12:21:09,973][00491] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 3129344. Throughput: 0: 224.9. Samples: 784862. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-10-06 12:21:09,981][00491] Avg episode reward: [(0, '20.709')] [2024-10-06 12:21:14,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 3133440. Throughput: 0: 219.7. Samples: 785516. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-10-06 12:21:14,977][00491] Avg episode reward: [(0, '21.051')] [2024-10-06 12:21:19,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 3137536. Throughput: 0: 212.4. Samples: 786442. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 12:21:19,984][00491] Avg episode reward: [(0, '21.284')] [2024-10-06 12:21:21,064][04742] Saving new best policy, reward=21.051! [2024-10-06 12:21:24,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 3141632. Throughput: 0: 218.2. Samples: 787942. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 12:21:24,976][00491] Avg episode reward: [(0, '21.264')] [2024-10-06 12:21:25,189][04742] Saving new best policy, reward=21.284! [2024-10-06 12:21:29,973][00491] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 3149824. Throughput: 0: 222.8. Samples: 788750. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 12:21:29,979][00491] Avg episode reward: [(0, '20.321')] [2024-10-06 12:21:34,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). 
Total num frames: 3149824. Throughput: 0: 219.2. Samples: 789866. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 12:21:34,981][00491] Avg episode reward: [(0, '20.055')] [2024-10-06 12:21:35,668][04755] Updated weights for policy 0, policy_version 770 (0.0523) [2024-10-06 12:21:39,973][00491] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 3153920. Throughput: 0: 205.2. Samples: 790998. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 12:21:39,982][00491] Avg episode reward: [(0, '19.997')] [2024-10-06 12:21:44,973][00491] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 3162112. Throughput: 0: 213.3. Samples: 791588. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 12:21:44,985][00491] Avg episode reward: [(0, '19.847')] [2024-10-06 12:21:49,188][04742] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000773_3166208.pth... [2024-10-06 12:21:49,369][04742] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000722_2957312.pth [2024-10-06 12:21:49,975][00491] Fps is (10 sec: 1228.5, 60 sec: 887.4, 300 sec: 860.8). Total num frames: 3166208. Throughput: 0: 217.3. Samples: 793034. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 12:21:49,980][00491] Avg episode reward: [(0, '19.318')] [2024-10-06 12:21:54,973][00491] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 3166208. Throughput: 0: 205.4. Samples: 794104. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 12:21:54,977][00491] Avg episode reward: [(0, '19.434')] [2024-10-06 12:21:59,973][00491] Fps is (10 sec: 819.4, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 3174400. Throughput: 0: 203.0. Samples: 794652. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-10-06 12:21:59,975][00491] Avg episode reward: [(0, '18.498')] [2024-10-06 12:22:04,973][00491] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 3178496. Throughput: 0: 216.5. Samples: 796186. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 12:22:04,980][00491] Avg episode reward: [(0, '18.980')] [2024-10-06 12:22:09,974][00491] Fps is (10 sec: 819.1, 60 sec: 887.4, 300 sec: 860.9). Total num frames: 3182592. Throughput: 0: 207.3. Samples: 797272. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 12:22:09,981][00491] Avg episode reward: [(0, '18.528')] [2024-10-06 12:22:14,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 3186688. Throughput: 0: 203.5. Samples: 797908. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 12:22:14,985][00491] Avg episode reward: [(0, '17.973')] [2024-10-06 12:22:19,973][00491] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 3190784. Throughput: 0: 208.3. Samples: 799240. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 12:22:19,976][00491] Avg episode reward: [(0, '17.924')] [2024-10-06 12:22:22,440][04755] Updated weights for policy 0, policy_version 780 (0.1909) [2024-10-06 12:22:24,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 3194880. Throughput: 0: 216.6. Samples: 800746. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 12:22:24,976][00491] Avg episode reward: [(0, '17.628')] [2024-10-06 12:22:29,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 860.9). Total num frames: 3198976. Throughput: 0: 213.9. Samples: 801212. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 12:22:29,981][00491] Avg episode reward: [(0, '17.856')] [2024-10-06 12:22:34,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 3203072. Throughput: 0: 205.8. Samples: 802296. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 12:22:34,977][00491] Avg episode reward: [(0, '17.820')] [2024-10-06 12:22:39,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 3207168. Throughput: 0: 222.9. Samples: 804136. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 12:22:39,977][00491] Avg episode reward: [(0, '18.649')] [2024-10-06 12:22:44,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 860.9). Total num frames: 3211264. Throughput: 0: 214.1. Samples: 804288. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-10-06 12:22:44,983][00491] Avg episode reward: [(0, '18.751')] [2024-10-06 12:22:49,973][00491] Fps is (10 sec: 409.6, 60 sec: 751.0, 300 sec: 847.0). Total num frames: 3211264. Throughput: 0: 193.3. Samples: 804886. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-10-06 12:22:49,979][00491] Avg episode reward: [(0, '18.439')] [2024-10-06 12:22:54,973][00491] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 3215360. Throughput: 0: 189.3. Samples: 805788. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 12:22:54,982][00491] Avg episode reward: [(0, '18.306')] [2024-10-06 12:22:59,973][00491] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 847.0). Total num frames: 3219456. Throughput: 0: 187.2. Samples: 806330. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 12:22:59,986][00491] Avg episode reward: [(0, '17.860')] [2024-10-06 12:23:04,976][00491] Fps is (10 sec: 819.0, 60 sec: 750.9, 300 sec: 833.1). Total num frames: 3223552. Throughput: 0: 194.6. Samples: 807996. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-10-06 12:23:04,980][00491] Avg episode reward: [(0, '17.771')] [2024-10-06 12:23:09,973][00491] Fps is (10 sec: 819.2, 60 sec: 751.0, 300 sec: 847.0). Total num frames: 3227648. Throughput: 0: 181.7. Samples: 808922. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-10-06 12:23:09,980][00491] Avg episode reward: [(0, '17.517')] [2024-10-06 12:23:14,973][00491] Fps is (10 sec: 819.4, 60 sec: 750.9, 300 sec: 847.0). Total num frames: 3231744. Throughput: 0: 182.9. Samples: 809444. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-10-06 12:23:14,982][00491] Avg episode reward: [(0, '17.671')] [2024-10-06 12:23:16,048][04755] Updated weights for policy 0, policy_version 790 (0.0066) [2024-10-06 12:23:19,973][00491] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 833.1). Total num frames: 3235840. Throughput: 0: 200.4. Samples: 811316. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-10-06 12:23:19,986][00491] Avg episode reward: [(0, '17.761')] [2024-10-06 12:23:24,973][00491] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 833.1). Total num frames: 3239936. Throughput: 0: 180.5. Samples: 812258. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-10-06 12:23:24,977][00491] Avg episode reward: [(0, '17.666')] [2024-10-06 12:23:29,973][00491] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 833.1). Total num frames: 3244032. Throughput: 0: 187.1. Samples: 812708. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-10-06 12:23:29,985][00491] Avg episode reward: [(0, '18.546')] [2024-10-06 12:23:34,973][00491] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 3252224. 
Throughput: 0: 213.2. Samples: 814482. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-10-06 12:23:34,977][00491] Avg episode reward: [(0, '19.438')] [2024-10-06 12:23:39,973][00491] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 3256320. Throughput: 0: 217.6. Samples: 815578. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-10-06 12:23:39,977][00491] Avg episode reward: [(0, '20.071')] [2024-10-06 12:23:44,973][00491] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 833.1). Total num frames: 3256320. Throughput: 0: 219.6. Samples: 816212. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-10-06 12:23:44,979][00491] Avg episode reward: [(0, '19.807')] [2024-10-06 12:23:49,713][04742] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000797_3264512.pth... [2024-10-06 12:23:49,824][04742] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000748_3063808.pth [2024-10-06 12:23:49,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 3264512. Throughput: 0: 213.2. Samples: 817588. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-10-06 12:23:49,985][00491] Avg episode reward: [(0, '19.893')] [2024-10-06 12:23:54,973][00491] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 3268608. Throughput: 0: 220.4. Samples: 818840. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-10-06 12:23:54,983][00491] Avg episode reward: [(0, '19.942')] [2024-10-06 12:23:59,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 3272704. Throughput: 0: 227.4. Samples: 819676. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-10-06 12:23:59,979][00491] Avg episode reward: [(0, '19.843')] [2024-10-06 12:24:04,866][04755] Updated weights for policy 0, policy_version 800 (0.1608) [2024-10-06 12:24:04,975][00491] Fps is (10 sec: 819.0, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 3276800. Throughput: 0: 207.9. Samples: 820674. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-10-06 12:24:04,978][00491] Avg episode reward: [(0, '19.871')] [2024-10-06 12:24:07,296][04742] Signal inference workers to stop experience collection... (800 times) [2024-10-06 12:24:07,344][04755] InferenceWorker_p0-w0: stopping experience collection (800 times) [2024-10-06 12:24:08,488][04742] Signal inference workers to resume experience collection... (800 times) [2024-10-06 12:24:08,490][04755] InferenceWorker_p0-w0: resuming experience collection (800 times) [2024-10-06 12:24:09,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 3280896. Throughput: 0: 212.1. Samples: 821804. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 12:24:09,976][00491] Avg episode reward: [(0, '20.638')] [2024-10-06 12:24:14,973][00491] Fps is (10 sec: 819.4, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 3284992. Throughput: 0: 222.3. Samples: 822710. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 12:24:14,982][00491] Avg episode reward: [(0, '20.200')] [2024-10-06 12:24:19,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 3289088. Throughput: 0: 205.6. Samples: 823736. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-10-06 12:24:19,975][00491] Avg episode reward: [(0, '20.155')] [2024-10-06 12:24:24,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 3293184. Throughput: 0: 210.6. Samples: 825056. 
Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-10-06 12:24:24,977][00491] Avg episode reward: [(0, '20.178')] [2024-10-06 12:24:29,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 3297280. Throughput: 0: 212.8. Samples: 825786. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-10-06 12:24:29,975][00491] Avg episode reward: [(0, '20.275')] [2024-10-06 12:24:34,976][00491] Fps is (10 sec: 818.9, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 3301376. Throughput: 0: 211.3. Samples: 827098. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-10-06 12:24:34,979][00491] Avg episode reward: [(0, '19.690')] [2024-10-06 12:24:39,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 3305472. Throughput: 0: 207.7. Samples: 828188. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-10-06 12:24:39,978][00491] Avg episode reward: [(0, '19.573')] [2024-10-06 12:24:44,973][00491] Fps is (10 sec: 819.5, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 3309568. Throughput: 0: 204.3. Samples: 828868. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-10-06 12:24:44,975][00491] Avg episode reward: [(0, '19.354')] [2024-10-06 12:24:49,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 3313664. Throughput: 0: 214.2. Samples: 830312. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 12:24:49,985][00491] Avg episode reward: [(0, '19.497')] [2024-10-06 12:24:51,199][04755] Updated weights for policy 0, policy_version 810 (0.1956) [2024-10-06 12:24:54,973][00491] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 3317760. Throughput: 0: 216.8. Samples: 831558. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 12:24:54,976][00491] Avg episode reward: [(0, '19.880')] [2024-10-06 12:24:59,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 3321856. Throughput: 0: 205.0. Samples: 831936. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-10-06 12:24:59,976][00491] Avg episode reward: [(0, '19.619')] [2024-10-06 12:25:04,973][00491] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 3325952. Throughput: 0: 220.4. Samples: 833656. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-10-06 12:25:04,984][00491] Avg episode reward: [(0, '19.792')] [2024-10-06 12:25:09,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 3330048. Throughput: 0: 221.1. Samples: 835004. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-10-06 12:25:09,980][00491] Avg episode reward: [(0, '19.918')] [2024-10-06 12:25:14,974][00491] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 3334144. Throughput: 0: 212.8. Samples: 835364. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-10-06 12:25:14,984][00491] Avg episode reward: [(0, '19.696')] [2024-10-06 12:25:19,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 3338240. Throughput: 0: 213.4. Samples: 836700. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-10-06 12:25:19,986][00491] Avg episode reward: [(0, '18.975')] [2024-10-06 12:25:24,973][00491] Fps is (10 sec: 1228.9, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 3346432. Throughput: 0: 221.4. Samples: 838152. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 12:25:24,976][00491] Avg episode reward: [(0, '18.671')] [2024-10-06 12:25:29,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 3346432. 
Throughput: 0: 218.0. Samples: 838680. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 12:25:29,976][00491] Avg episode reward: [(0, '18.637')] [2024-10-06 12:25:34,973][00491] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 3350528. Throughput: 0: 210.4. Samples: 839778. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 12:25:34,985][00491] Avg episode reward: [(0, '18.241')] [2024-10-06 12:25:39,822][04755] Updated weights for policy 0, policy_version 820 (0.1921) [2024-10-06 12:25:39,973][00491] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 3358720. Throughput: 0: 214.9. Samples: 841228. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 12:25:39,984][00491] Avg episode reward: [(0, '18.394')] [2024-10-06 12:25:44,973][00491] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 3362816. Throughput: 0: 227.6. Samples: 842180. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 12:25:44,985][00491] Avg episode reward: [(0, '18.869')] [2024-10-06 12:25:49,973][00491] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 3362816. Throughput: 0: 211.9. Samples: 843190. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 12:25:49,981][00491] Avg episode reward: [(0, '18.994')] [2024-10-06 12:25:50,451][04742] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000822_3366912.pth... [2024-10-06 12:25:50,539][04742] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000773_3166208.pth [2024-10-06 12:25:54,977][00491] Fps is (10 sec: 818.9, 60 sec: 887.4, 300 sec: 847.0). Total num frames: 3371008. Throughput: 0: 206.8. Samples: 844310. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 12:25:54,991][00491] Avg episode reward: [(0, '18.885')] [2024-10-06 12:25:59,973][00491] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 3375104. Throughput: 0: 218.9. Samples: 845214. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 12:25:59,981][00491] Avg episode reward: [(0, '18.764')] [2024-10-06 12:26:04,973][00491] Fps is (10 sec: 819.5, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 3379200. Throughput: 0: 214.2. Samples: 846338. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 12:26:04,986][00491] Avg episode reward: [(0, '18.780')] [2024-10-06 12:26:09,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 3383296. Throughput: 0: 205.6. Samples: 847402. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 12:26:09,978][00491] Avg episode reward: [(0, '18.772')] [2024-10-06 12:26:14,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 3387392. Throughput: 0: 214.4. Samples: 848328. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 12:26:14,980][00491] Avg episode reward: [(0, '18.364')] [2024-10-06 12:26:19,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 3391488. Throughput: 0: 215.8. Samples: 849490. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 12:26:19,982][00491] Avg episode reward: [(0, '18.364')] [2024-10-06 12:26:24,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 3395584. Throughput: 0: 207.7. Samples: 850576. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 12:26:24,978][00491] Avg episode reward: [(0, '18.412')] [2024-10-06 12:26:28,799][04755] Updated weights for policy 0, policy_version 830 (0.0517) [2024-10-06 12:26:29,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 3399680. Throughput: 0: 200.9. Samples: 851222. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 12:26:29,977][00491] Avg episode reward: [(0, '18.315')] [2024-10-06 12:26:34,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 3403776. Throughput: 0: 210.0. Samples: 852642. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 12:26:34,976][00491] Avg episode reward: [(0, '18.697')] [2024-10-06 12:26:39,973][00491] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 3407872. Throughput: 0: 217.6. Samples: 854102. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 12:26:39,977][00491] Avg episode reward: [(0, '18.635')] [2024-10-06 12:26:44,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 3411968. Throughput: 0: 206.3. Samples: 854498. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 12:26:44,980][00491] Avg episode reward: [(0, '18.688')] [2024-10-06 12:26:49,973][00491] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 3416064. Throughput: 0: 208.1. Samples: 855704. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 12:26:49,978][00491] Avg episode reward: [(0, '19.258')] [2024-10-06 12:26:54,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.3, 300 sec: 833.1). Total num frames: 3420160. Throughput: 0: 225.3. Samples: 857540. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 12:26:54,981][00491] Avg episode reward: [(0, '19.096')] [2024-10-06 12:26:59,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 3424256. Throughput: 0: 211.6. Samples: 857848. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 12:26:59,976][00491] Avg episode reward: [(0, '19.365')] [2024-10-06 12:27:04,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 3428352. Throughput: 0: 206.7. Samples: 858792. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 12:27:04,975][00491] Avg episode reward: [(0, '18.783')] [2024-10-06 12:27:09,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 3432448. Throughput: 0: 222.1. Samples: 860570. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 12:27:09,979][00491] Avg episode reward: [(0, '18.716')] [2024-10-06 12:27:14,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 3436544. Throughput: 0: 223.2. Samples: 861266. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-10-06 12:27:14,976][00491] Avg episode reward: [(0, '18.904')] [2024-10-06 12:27:15,254][04755] Updated weights for policy 0, policy_version 840 (0.0048) [2024-10-06 12:27:19,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 3440640. Throughput: 0: 213.8. Samples: 862262. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-10-06 12:27:19,988][00491] Avg episode reward: [(0, '19.556')] [2024-10-06 12:27:24,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 3444736. Throughput: 0: 211.9. Samples: 863638. 
Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-10-06 12:27:24,984][00491] Avg episode reward: [(0, '20.314')] [2024-10-06 12:27:29,973][00491] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 3452928. Throughput: 0: 214.8. Samples: 864164. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-10-06 12:27:29,976][00491] Avg episode reward: [(0, '20.342')] [2024-10-06 12:27:34,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 3452928. Throughput: 0: 220.8. Samples: 865638. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-10-06 12:27:34,976][00491] Avg episode reward: [(0, '20.138')] [2024-10-06 12:27:39,973][00491] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 3457024. Throughput: 0: 204.1. Samples: 866726. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-10-06 12:27:39,985][00491] Avg episode reward: [(0, '19.836')] [2024-10-06 12:27:44,973][00491] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 3465216. Throughput: 0: 212.4. Samples: 867408. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-10-06 12:27:44,975][00491] Avg episode reward: [(0, '19.562')] [2024-10-06 12:27:48,779][04742] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000847_3469312.pth... [2024-10-06 12:27:48,894][04742] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000797_3264512.pth [2024-10-06 12:27:49,973][00491] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 3469312. Throughput: 0: 222.7. Samples: 868812. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 12:27:49,982][00491] Avg episode reward: [(0, '19.583')] [2024-10-06 12:27:54,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 3473408. Throughput: 0: 205.2. Samples: 869802. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 12:27:54,980][00491] Avg episode reward: [(0, '19.991')] [2024-10-06 12:27:59,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 3477504. Throughput: 0: 202.4. Samples: 870376. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-10-06 12:27:59,978][00491] Avg episode reward: [(0, '20.260')] [2024-10-06 12:28:04,337][04755] Updated weights for policy 0, policy_version 850 (0.1458) [2024-10-06 12:28:04,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 3481600. Throughput: 0: 215.6. Samples: 871964. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-10-06 12:28:04,978][00491] Avg episode reward: [(0, '21.096')] [2024-10-06 12:28:06,747][04742] Signal inference workers to stop experience collection... (850 times) [2024-10-06 12:28:06,813][04755] InferenceWorker_p0-w0: stopping experience collection (850 times) [2024-10-06 12:28:08,189][04742] Signal inference workers to resume experience collection... (850 times) [2024-10-06 12:28:08,189][04755] InferenceWorker_p0-w0: resuming experience collection (850 times) [2024-10-06 12:28:09,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 3485696. Throughput: 0: 212.8. Samples: 873212. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 12:28:09,975][00491] Avg episode reward: [(0, '20.645')] [2024-10-06 12:28:14,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 3489792. Throughput: 0: 216.7. Samples: 873914. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 12:28:14,980][00491] Avg episode reward: [(0, '20.611')] [2024-10-06 12:28:19,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 3493888. Throughput: 0: 208.8. Samples: 875036. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 12:28:19,982][00491] Avg episode reward: [(0, '19.928')] [2024-10-06 12:28:24,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 3497984. Throughput: 0: 220.1. Samples: 876630. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 12:28:24,975][00491] Avg episode reward: [(0, '19.187')] [2024-10-06 12:28:29,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 3502080. Throughput: 0: 215.9. Samples: 877122. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 12:28:29,976][00491] Avg episode reward: [(0, '19.522')] [2024-10-06 12:28:34,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 3506176. Throughput: 0: 207.8. Samples: 878162. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 12:28:34,976][00491] Avg episode reward: [(0, '19.070')] [2024-10-06 12:28:39,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 3510272. Throughput: 0: 222.3. Samples: 879806. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 12:28:39,976][00491] Avg episode reward: [(0, '18.666')] [2024-10-06 12:28:44,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 3514368. Throughput: 0: 222.9. Samples: 880406. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 12:28:44,978][00491] Avg episode reward: [(0, '18.804')] [2024-10-06 12:28:49,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 3518464. Throughput: 0: 211.7. Samples: 881490. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 12:28:49,980][00491] Avg episode reward: [(0, '18.408')] [2024-10-06 12:28:53,233][04755] Updated weights for policy 0, policy_version 860 (0.0515) [2024-10-06 12:28:54,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 3522560. Throughput: 0: 211.3. Samples: 882720. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 12:28:54,976][00491] Avg episode reward: [(0, '18.433')] [2024-10-06 12:28:59,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 3526656. Throughput: 0: 209.9. Samples: 883360. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 12:28:59,976][00491] Avg episode reward: [(0, '18.579')] [2024-10-06 12:29:04,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 3530752. Throughput: 0: 215.2. Samples: 884718. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 12:29:04,976][00491] Avg episode reward: [(0, '17.630')] [2024-10-06 12:29:09,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 3534848. Throughput: 0: 204.3. Samples: 885824. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 12:29:09,982][00491] Avg episode reward: [(0, '17.530')] [2024-10-06 12:29:14,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 3538944. Throughput: 0: 208.0. Samples: 886484. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 12:29:14,976][00491] Avg episode reward: [(0, '17.716')] [2024-10-06 12:29:19,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 3543040. 
Throughput: 0: 224.5. Samples: 888266. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 12:29:19,976][00491] Avg episode reward: [(0, '17.955')] [2024-10-06 12:29:24,976][00491] Fps is (10 sec: 818.9, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 3547136. Throughput: 0: 209.4. Samples: 889228. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 12:29:24,982][00491] Avg episode reward: [(0, '17.622')] [2024-10-06 12:29:29,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 3551232. Throughput: 0: 204.5. Samples: 889608. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 12:29:29,982][00491] Avg episode reward: [(0, '17.358')] [2024-10-06 12:29:34,973][00491] Fps is (10 sec: 1229.2, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 3559424. Throughput: 0: 220.9. Samples: 891432. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 12:29:34,975][00491] Avg episode reward: [(0, '17.612')] [2024-10-06 12:29:39,804][04755] Updated weights for policy 0, policy_version 870 (0.1046) [2024-10-06 12:29:39,973][00491] Fps is (10 sec: 1228.7, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 3563520. Throughput: 0: 216.0. Samples: 892442. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 12:29:39,981][00491] Avg episode reward: [(0, '17.443')] [2024-10-06 12:29:44,973][00491] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 3563520. Throughput: 0: 213.0. Samples: 892944. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 12:29:44,976][00491] Avg episode reward: [(0, '17.443')] [2024-10-06 12:29:49,595][04742] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000872_3571712.pth... [2024-10-06 12:29:49,713][04742] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000822_3366912.pth [2024-10-06 12:29:49,977][00491] Fps is (10 sec: 818.9, 60 sec: 887.4, 300 sec: 860.8). Total num frames: 3571712. Throughput: 0: 216.2. Samples: 894448. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-10-06 12:29:49,980][00491] Avg episode reward: [(0, '16.997')] [2024-10-06 12:29:54,973][00491] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 3575808. Throughput: 0: 218.0. Samples: 895632. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 12:29:54,981][00491] Avg episode reward: [(0, '17.239')] [2024-10-06 12:29:59,977][00491] Fps is (10 sec: 819.3, 60 sec: 887.4, 300 sec: 860.8). Total num frames: 3579904. Throughput: 0: 219.0. Samples: 896338. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 12:29:59,981][00491] Avg episode reward: [(0, '17.239')] [2024-10-06 12:30:04,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 3584000. Throughput: 0: 202.9. Samples: 897396. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 12:30:04,982][00491] Avg episode reward: [(0, '17.538')] [2024-10-06 12:30:09,973][00491] Fps is (10 sec: 819.5, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 3588096. Throughput: 0: 209.9. Samples: 898672. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 12:30:09,981][00491] Avg episode reward: [(0, '18.284')] [2024-10-06 12:30:14,975][00491] Fps is (10 sec: 819.1, 60 sec: 887.4, 300 sec: 860.8). Total num frames: 3592192. Throughput: 0: 222.9. Samples: 899638. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 12:30:14,982][00491] Avg episode reward: [(0, '19.098')] [2024-10-06 12:30:19,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). 
Total num frames: 3596288. Throughput: 0: 204.8. Samples: 900646. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 12:30:19,975][00491] Avg episode reward: [(0, '19.333')] [2024-10-06 12:30:24,973][00491] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 3600384. Throughput: 0: 205.6. Samples: 901692. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 12:30:24,976][00491] Avg episode reward: [(0, '19.135')] [2024-10-06 12:30:28,657][04755] Updated weights for policy 0, policy_version 880 (0.0994) [2024-10-06 12:30:29,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 3604480. Throughput: 0: 216.0. Samples: 902664. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 12:30:29,977][00491] Avg episode reward: [(0, '19.494')] [2024-10-06 12:30:34,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 3608576. Throughput: 0: 205.0. Samples: 903674. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 12:30:34,983][00491] Avg episode reward: [(0, '20.596')] [2024-10-06 12:30:39,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 3612672. Throughput: 0: 201.9. Samples: 904718. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 12:30:39,976][00491] Avg episode reward: [(0, '20.612')] [2024-10-06 12:30:44,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 3616768. Throughput: 0: 207.6. Samples: 905680. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 12:30:44,976][00491] Avg episode reward: [(0, '20.653')] [2024-10-06 12:30:49,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.3, 300 sec: 847.0). Total num frames: 3620864. Throughput: 0: 214.6. Samples: 907054. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 12:30:49,980][00491] Avg episode reward: [(0, '20.576')] [2024-10-06 12:30:54,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 3624960. Throughput: 0: 206.3. Samples: 907954. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 12:30:54,979][00491] Avg episode reward: [(0, '20.649')] [2024-10-06 12:30:59,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.3, 300 sec: 847.0). Total num frames: 3629056. Throughput: 0: 202.9. Samples: 908766. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 12:30:59,984][00491] Avg episode reward: [(0, '20.946')] [2024-10-06 12:31:04,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 3633152. Throughput: 0: 207.4. Samples: 909978. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 12:31:04,976][00491] Avg episode reward: [(0, '21.337')] [2024-10-06 12:31:06,699][04742] Saving new best policy, reward=21.337! [2024-10-06 12:31:09,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 3637248. Throughput: 0: 215.0. Samples: 911368. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 12:31:09,977][00491] Avg episode reward: [(0, '21.525')] [2024-10-06 12:31:13,103][04742] Saving new best policy, reward=21.525! [2024-10-06 12:31:14,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 3641344. Throughput: 0: 203.2. Samples: 911806. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-10-06 12:31:14,981][00491] Avg episode reward: [(0, '21.307')] [2024-10-06 12:31:17,991][04755] Updated weights for policy 0, policy_version 890 (0.2127) [2024-10-06 12:31:19,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). 
Total num frames: 3645440. Throughput: 0: 205.6. Samples: 912924. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-10-06 12:31:19,975][00491] Avg episode reward: [(0, '21.367')] [2024-10-06 12:31:24,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 3649536. Throughput: 0: 223.9. Samples: 914792. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-10-06 12:31:24,976][00491] Avg episode reward: [(0, '21.092')] [2024-10-06 12:31:29,977][00491] Fps is (10 sec: 818.9, 60 sec: 819.1, 300 sec: 847.0). Total num frames: 3653632. Throughput: 0: 207.5. Samples: 915020. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-10-06 12:31:29,981][00491] Avg episode reward: [(0, '21.056')] [2024-10-06 12:31:34,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 3657728. Throughput: 0: 199.9. Samples: 916048. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-10-06 12:31:34,975][00491] Avg episode reward: [(0, '21.059')] [2024-10-06 12:31:39,973][00491] Fps is (10 sec: 819.5, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 3661824. Throughput: 0: 222.2. Samples: 917952. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-10-06 12:31:39,976][00491] Avg episode reward: [(0, '21.490')] [2024-10-06 12:31:44,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 3665920. Throughput: 0: 217.6. Samples: 918560. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-10-06 12:31:44,978][00491] Avg episode reward: [(0, '21.508')] [2024-10-06 12:31:49,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 3670016. Throughput: 0: 209.6. Samples: 919412. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-10-06 12:31:49,977][00491] Avg episode reward: [(0, '21.082')] [2024-10-06 12:31:51,427][04742] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000897_3674112.pth... [2024-10-06 12:31:51,545][04742] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000847_3469312.pth [2024-10-06 12:31:54,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 3674112. Throughput: 0: 214.9. Samples: 921038. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-10-06 12:31:54,976][00491] Avg episode reward: [(0, '21.302')] [2024-10-06 12:31:59,975][00491] Fps is (10 sec: 1228.5, 60 sec: 887.4, 300 sec: 860.8). Total num frames: 3682304. Throughput: 0: 219.4. Samples: 921678. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-10-06 12:31:59,984][00491] Avg episode reward: [(0, '21.410')] [2024-10-06 12:32:04,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 3682304. Throughput: 0: 220.4. Samples: 922842. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-10-06 12:32:04,975][00491] Avg episode reward: [(0, '21.623')] [2024-10-06 12:32:06,004][04755] Updated weights for policy 0, policy_version 900 (0.0511) [2024-10-06 12:32:09,104][04742] Signal inference workers to stop experience collection... (900 times) [2024-10-06 12:32:09,157][04755] InferenceWorker_p0-w0: stopping experience collection (900 times) [2024-10-06 12:32:09,973][00491] Fps is (10 sec: 409.7, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 3686400. Throughput: 0: 206.4. Samples: 924080. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-10-06 12:32:09,981][00491] Avg episode reward: [(0, '21.414')] [2024-10-06 12:32:10,543][04742] Signal inference workers to resume experience collection... 
(900 times) [2024-10-06 12:32:10,544][04755] InferenceWorker_p0-w0: resuming experience collection (900 times) [2024-10-06 12:32:10,546][04742] Saving new best policy, reward=21.623! [2024-10-06 12:32:14,973][00491] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 3694592. Throughput: 0: 215.7. Samples: 924724. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-10-06 12:32:14,984][00491] Avg episode reward: [(0, '21.459')] [2024-10-06 12:32:19,973][00491] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 3698688. Throughput: 0: 223.7. Samples: 926114. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-10-06 12:32:19,976][00491] Avg episode reward: [(0, '21.385')] [2024-10-06 12:32:24,973][00491] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 3698688. Throughput: 0: 204.0. Samples: 927130. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-10-06 12:32:24,980][00491] Avg episode reward: [(0, '21.390')] [2024-10-06 12:32:29,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 3706880. Throughput: 0: 206.2. Samples: 927840. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-10-06 12:32:29,975][00491] Avg episode reward: [(0, '22.057')] [2024-10-06 12:32:33,645][04742] Saving new best policy, reward=22.057! [2024-10-06 12:32:34,973][00491] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 3710976. Throughput: 0: 218.5. Samples: 929244. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-10-06 12:32:34,983][00491] Avg episode reward: [(0, '21.815')] [2024-10-06 12:32:39,974][00491] Fps is (10 sec: 819.1, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 3715072. Throughput: 0: 205.7. Samples: 930294. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-10-06 12:32:39,983][00491] Avg episode reward: [(0, '21.815')] [2024-10-06 12:32:44,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 3719168. Throughput: 0: 206.0. Samples: 930948. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-10-06 12:32:44,976][00491] Avg episode reward: [(0, '21.815')] [2024-10-06 12:32:49,973][00491] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 3723264. Throughput: 0: 209.9. Samples: 932288. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-10-06 12:32:49,982][00491] Avg episode reward: [(0, '21.789')] [2024-10-06 12:32:52,624][04755] Updated weights for policy 0, policy_version 910 (0.0998) [2024-10-06 12:32:54,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 3727360. Throughput: 0: 214.3. Samples: 933724. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-10-06 12:32:54,976][00491] Avg episode reward: [(0, '21.659')] [2024-10-06 12:32:59,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 3731456. Throughput: 0: 213.9. Samples: 934348. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-10-06 12:32:59,976][00491] Avg episode reward: [(0, '22.357')] [2024-10-06 12:33:03,504][04742] Saving new best policy, reward=22.357! [2024-10-06 12:33:04,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 3735552. Throughput: 0: 205.9. Samples: 935380. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-10-06 12:33:04,976][00491] Avg episode reward: [(0, '22.008')] [2024-10-06 12:33:09,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 3739648. Throughput: 0: 219.8. Samples: 937020. 
Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-10-06 12:33:09,976][00491] Avg episode reward: [(0, '22.627')] [2024-10-06 12:33:12,010][04742] Saving new best policy, reward=22.627! [2024-10-06 12:33:14,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 3743744. Throughput: 0: 213.1. Samples: 937430. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-10-06 12:33:14,978][00491] Avg episode reward: [(0, '22.886')] [2024-10-06 12:33:18,366][04742] Saving new best policy, reward=22.886! [2024-10-06 12:33:19,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 3747840. Throughput: 0: 203.1. Samples: 938382. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-10-06 12:33:19,976][00491] Avg episode reward: [(0, '23.223')] [2024-10-06 12:33:22,632][04742] Saving new best policy, reward=23.223! [2024-10-06 12:33:24,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 3751936. Throughput: 0: 218.3. Samples: 940116. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-10-06 12:33:24,976][00491] Avg episode reward: [(0, '23.509')] [2024-10-06 12:33:26,730][04742] Saving new best policy, reward=23.509! [2024-10-06 12:33:29,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 3756032. Throughput: 0: 216.3. Samples: 940680. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-10-06 12:33:29,976][00491] Avg episode reward: [(0, '23.539')] [2024-10-06 12:33:34,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 3760128. Throughput: 0: 210.8. Samples: 941774. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-10-06 12:33:34,978][00491] Avg episode reward: [(0, '23.539')] [2024-10-06 12:33:37,817][04742] Saving new best policy, reward=23.539! [2024-10-06 12:33:39,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 3764224. Throughput: 0: 208.9. Samples: 943126. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-10-06 12:33:39,982][00491] Avg episode reward: [(0, '23.140')] [2024-10-06 12:33:41,984][04755] Updated weights for policy 0, policy_version 920 (0.0060) [2024-10-06 12:33:44,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 3768320. Throughput: 0: 207.6. Samples: 943688. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-10-06 12:33:44,983][00491] Avg episode reward: [(0, '22.860')] [2024-10-06 12:33:49,975][00491] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 3772416. Throughput: 0: 213.8. Samples: 945000. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 12:33:49,977][00491] Avg episode reward: [(0, '23.060')] [2024-10-06 12:33:52,265][04742] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000922_3776512.pth... [2024-10-06 12:33:52,390][04742] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000872_3571712.pth [2024-10-06 12:33:54,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 3776512. Throughput: 0: 201.1. Samples: 946070. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 12:33:54,975][00491] Avg episode reward: [(0, '23.190')] [2024-10-06 12:33:59,973][00491] Fps is (10 sec: 819.4, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 3780608. Throughput: 0: 204.8. Samples: 946644. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 12:33:59,986][00491] Avg episode reward: [(0, '22.041')] [2024-10-06 12:34:04,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 3784704. Throughput: 0: 223.2. Samples: 948424. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 12:34:04,985][00491] Avg episode reward: [(0, '21.905')] [2024-10-06 12:34:09,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 3788800. Throughput: 0: 205.8. Samples: 949376. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 12:34:09,978][00491] Avg episode reward: [(0, '21.556')] [2024-10-06 12:34:14,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 3792896. Throughput: 0: 203.1. Samples: 949818. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 12:34:14,987][00491] Avg episode reward: [(0, '22.272')] [2024-10-06 12:34:19,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 3796992. Throughput: 0: 217.2. Samples: 951550. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 12:34:19,979][00491] Avg episode reward: [(0, '21.712')] [2024-10-06 12:34:24,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 3801088. Throughput: 0: 214.4. Samples: 952776. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 12:34:24,975][00491] Avg episode reward: [(0, '20.938')] [2024-10-06 12:34:29,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 3805184. Throughput: 0: 212.4. Samples: 953248. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 12:34:29,976][00491] Avg episode reward: [(0, '20.596')] [2024-10-06 12:34:31,187][04755] Updated weights for policy 0, policy_version 930 (0.1123) [2024-10-06 12:34:34,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 3809280. Throughput: 0: 214.1. Samples: 954636. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 12:34:34,975][00491] Avg episode reward: [(0, '20.553')] [2024-10-06 12:34:39,973][00491] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 3817472. Throughput: 0: 217.0. Samples: 955836. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-10-06 12:34:39,977][00491] Avg episode reward: [(0, '19.929')] [2024-10-06 12:34:44,977][00491] Fps is (10 sec: 818.9, 60 sec: 819.1, 300 sec: 833.1). Total num frames: 3817472. Throughput: 0: 219.0. Samples: 956500. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-10-06 12:34:44,980][00491] Avg episode reward: [(0, '19.536')] [2024-10-06 12:34:49,973][00491] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 3821568. Throughput: 0: 204.1. Samples: 957608. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-10-06 12:34:49,988][00491] Avg episode reward: [(0, '19.401')] [2024-10-06 12:34:54,973][00491] Fps is (10 sec: 1229.3, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 3829760. Throughput: 0: 212.6. Samples: 958942. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-10-06 12:34:54,976][00491] Avg episode reward: [(0, '19.769')] [2024-10-06 12:34:59,973][00491] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 3833856. Throughput: 0: 225.6. Samples: 959972. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-10-06 12:34:59,985][00491] Avg episode reward: [(0, '19.233')] [2024-10-06 12:35:04,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 3837952. 
Throughput: 0: 209.2. Samples: 960962. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 12:35:04,976][00491] Avg episode reward: [(0, '18.635')] [2024-10-06 12:35:09,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 3842048. Throughput: 0: 205.6. Samples: 962028. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 12:35:09,981][00491] Avg episode reward: [(0, '18.993')] [2024-10-06 12:35:14,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 3846144. Throughput: 0: 213.9. Samples: 962874. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 12:35:14,979][00491] Avg episode reward: [(0, '18.918')] [2024-10-06 12:35:18,644][04755] Updated weights for policy 0, policy_version 940 (0.1472) [2024-10-06 12:35:19,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 3850240. Throughput: 0: 210.5. Samples: 964108. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 12:35:19,976][00491] Avg episode reward: [(0, '18.808')] [2024-10-06 12:35:24,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 3854336. Throughput: 0: 207.5. Samples: 965172. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 12:35:24,976][00491] Avg episode reward: [(0, '18.936')] [2024-10-06 12:35:29,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 3858432. Throughput: 0: 213.1. Samples: 966090. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 12:35:29,981][00491] Avg episode reward: [(0, '18.412')] [2024-10-06 12:35:34,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 3862528. Throughput: 0: 214.9. Samples: 967278. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 12:35:34,983][00491] Avg episode reward: [(0, '18.893')] [2024-10-06 12:35:39,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 3866624. Throughput: 0: 209.9. Samples: 968386. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-10-06 12:35:39,977][00491] Avg episode reward: [(0, '18.937')] [2024-10-06 12:35:44,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 3870720. Throughput: 0: 201.4. Samples: 969034. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 12:35:44,984][00491] Avg episode reward: [(0, '18.723')] [2024-10-06 12:35:47,934][04742] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000946_3874816.pth... [2024-10-06 12:35:48,057][04742] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000897_3674112.pth [2024-10-06 12:35:49,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 3874816. Throughput: 0: 206.7. Samples: 970264. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 12:35:49,979][00491] Avg episode reward: [(0, '18.508')] [2024-10-06 12:35:54,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 3878912. Throughput: 0: 211.8. Samples: 971558. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-10-06 12:35:54,978][00491] Avg episode reward: [(0, '18.218')] [2024-10-06 12:35:59,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 3883008. Throughput: 0: 207.2. Samples: 972200. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-10-06 12:35:59,976][00491] Avg episode reward: [(0, '18.737')] [2024-10-06 12:36:04,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). 
Total num frames: 3887104. Throughput: 0: 202.7. Samples: 973230. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-10-06 12:36:04,982][00491] Avg episode reward: [(0, '18.535')] [2024-10-06 12:36:07,499][04755] Updated weights for policy 0, policy_version 950 (0.0046) [2024-10-06 12:36:09,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 3891200. Throughput: 0: 216.6. Samples: 974920. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-10-06 12:36:09,980][04742] Signal inference workers to stop experience collection... (950 times) [2024-10-06 12:36:09,978][00491] Avg episode reward: [(0, '18.453')] [2024-10-06 12:36:10,075][04755] InferenceWorker_p0-w0: stopping experience collection (950 times) [2024-10-06 12:36:12,012][04742] Signal inference workers to resume experience collection... (950 times) [2024-10-06 12:36:12,014][04755] InferenceWorker_p0-w0: resuming experience collection (950 times) [2024-10-06 12:36:14,976][00491] Fps is (10 sec: 818.9, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 3895296. Throughput: 0: 205.2. Samples: 975326. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-10-06 12:36:14,980][00491] Avg episode reward: [(0, '18.731')] [2024-10-06 12:36:19,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 3899392. Throughput: 0: 200.2. Samples: 976288. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-10-06 12:36:19,976][00491] Avg episode reward: [(0, '18.603')] [2024-10-06 12:36:24,973][00491] Fps is (10 sec: 819.5, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 3903488. Throughput: 0: 210.3. Samples: 977848. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-10-06 12:36:24,976][00491] Avg episode reward: [(0, '19.258')] [2024-10-06 12:36:29,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 3907584. Throughput: 0: 212.8. Samples: 978610. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-10-06 12:36:29,978][00491] Avg episode reward: [(0, '19.291')] [2024-10-06 12:36:34,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 3911680. Throughput: 0: 209.5. Samples: 979690. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-10-06 12:36:34,981][00491] Avg episode reward: [(0, '19.344')] [2024-10-06 12:36:39,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 3915776. Throughput: 0: 209.1. Samples: 980968. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-10-06 12:36:39,976][00491] Avg episode reward: [(0, '20.127')] [2024-10-06 12:36:44,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 3919872. Throughput: 0: 212.3. Samples: 981752. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-10-06 12:36:44,982][00491] Avg episode reward: [(0, '20.146')] [2024-10-06 12:36:49,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 3923968. Throughput: 0: 221.0. Samples: 983176. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-10-06 12:36:49,976][00491] Avg episode reward: [(0, '20.047')] [2024-10-06 12:36:54,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 3928064. Throughput: 0: 202.4. Samples: 984030. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-10-06 12:36:54,977][00491] Avg episode reward: [(0, '20.188')] [2024-10-06 12:36:56,285][04755] Updated weights for policy 0, policy_version 960 (0.1560) [2024-10-06 12:36:59,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). 
Total num frames: 3932160. Throughput: 0: 212.2. Samples: 984876. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-10-06 12:36:59,975][00491] Avg episode reward: [(0, '19.956')] [2024-10-06 12:37:04,973][00491] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 3940352. Throughput: 0: 228.4. Samples: 986568. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-10-06 12:37:04,979][00491] Avg episode reward: [(0, '19.966')] [2024-10-06 12:37:09,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 3940352. Throughput: 0: 215.9. Samples: 987564. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-10-06 12:37:09,978][00491] Avg episode reward: [(0, '20.791')] [2024-10-06 12:37:14,973][00491] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 3944448. Throughput: 0: 207.1. Samples: 987928. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-10-06 12:37:14,975][00491] Avg episode reward: [(0, '20.356')] [2024-10-06 12:37:19,973][00491] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 3952640. Throughput: 0: 221.9. Samples: 989676. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 12:37:19,978][00491] Avg episode reward: [(0, '21.646')] [2024-10-06 12:37:24,973][00491] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 3956736. Throughput: 0: 216.8. Samples: 990724. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 12:37:24,979][00491] Avg episode reward: [(0, '21.719')] [2024-10-06 12:37:29,973][00491] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 3956736. Throughput: 0: 211.6. Samples: 991276. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 12:37:29,983][00491] Avg episode reward: [(0, '20.615')] [2024-10-06 12:37:34,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 3964928. Throughput: 0: 212.1. Samples: 992720. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 12:37:34,979][00491] Avg episode reward: [(0, '20.665')] [2024-10-06 12:37:39,973][00491] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 3969024. Throughput: 0: 220.4. Samples: 993950. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 12:37:39,976][00491] Avg episode reward: [(0, '20.915')] [2024-10-06 12:37:44,283][04755] Updated weights for policy 0, policy_version 970 (0.2033) [2024-10-06 12:37:44,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 3973120. Throughput: 0: 217.2. Samples: 994652. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 12:37:44,976][00491] Avg episode reward: [(0, '20.978')] [2024-10-06 12:37:49,357][04742] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000971_3977216.pth... [2024-10-06 12:37:49,475][04742] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000922_3776512.pth [2024-10-06 12:37:49,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 3977216. Throughput: 0: 206.3. Samples: 995850. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 12:37:49,976][00491] Avg episode reward: [(0, '20.473')] [2024-10-06 12:37:54,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 3981312. Throughput: 0: 213.5. Samples: 997172. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 12:37:54,978][00491] Avg episode reward: [(0, '19.867')] [2024-10-06 12:37:59,975][00491] Fps is (10 sec: 819.0, 60 sec: 887.4, 300 sec: 847.0). Total num frames: 3985408. Throughput: 0: 222.0. Samples: 997920. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 12:37:59,981][00491] Avg episode reward: [(0, '19.998')] [2024-10-06 12:38:04,975][00491] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 3989504. Throughput: 0: 205.1. Samples: 998904. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 12:38:04,979][00491] Avg episode reward: [(0, '19.759')] [2024-10-06 12:38:09,973][00491] Fps is (10 sec: 819.4, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 3993600. Throughput: 0: 207.9. Samples: 1000080. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 12:38:09,976][00491] Avg episode reward: [(0, '19.360')] [2024-10-06 12:38:14,973][00491] Fps is (10 sec: 819.4, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 3997696. Throughput: 0: 214.0. Samples: 1000906. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 12:38:14,977][00491] Avg episode reward: [(0, '19.290')] [2024-10-06 12:38:19,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 4001792. Throughput: 0: 212.1. Samples: 1002266. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-10-06 12:38:19,981][00491] Avg episode reward: [(0, '18.779')] [2024-10-06 12:38:24,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 4005888. Throughput: 0: 209.1. Samples: 1003360. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-10-06 12:38:24,976][00491] Avg episode reward: [(0, '18.530')] [2024-10-06 12:38:29,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 4009984. Throughput: 0: 208.2. Samples: 1004020. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 12:38:29,980][00491] Avg episode reward: [(0, '18.725')] [2024-10-06 12:38:31,541][04755] Updated weights for policy 0, policy_version 980 (0.1008) [2024-10-06 12:38:34,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 4014080. Throughput: 0: 218.0. Samples: 1005658. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 12:38:34,976][00491] Avg episode reward: [(0, '19.518')] [2024-10-06 12:38:39,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 4018176. Throughput: 0: 209.2. Samples: 1006586. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-10-06 12:38:39,980][00491] Avg episode reward: [(0, '19.413')] [2024-10-06 12:38:44,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 4022272. Throughput: 0: 206.2. Samples: 1007198. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-10-06 12:38:44,976][00491] Avg episode reward: [(0, '19.257')] [2024-10-06 12:38:49,973][00491] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 4030464. Throughput: 0: 225.6. Samples: 1009054. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-10-06 12:38:49,982][00491] Avg episode reward: [(0, '19.672')] [2024-10-06 12:38:54,973][00491] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 4030464. Throughput: 0: 223.4. Samples: 1010132. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-10-06 12:38:54,979][00491] Avg episode reward: [(0, '19.743')] [2024-10-06 12:38:59,973][00491] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 847.0). 
Total num frames: 4034560. Throughput: 0: 214.4. Samples: 1010554. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 12:38:59,977][00491] Avg episode reward: [(0, '19.286')] [2024-10-06 12:39:04,973][00491] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 4042752. Throughput: 0: 219.7. Samples: 1012154. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 12:39:04,980][00491] Avg episode reward: [(0, '19.701')] [2024-10-06 12:39:09,974][00491] Fps is (10 sec: 1228.6, 60 sec: 887.4, 300 sec: 860.8). Total num frames: 4046848. Throughput: 0: 219.9. Samples: 1013258. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-10-06 12:39:09,984][00491] Avg episode reward: [(0, '20.268')] [2024-10-06 12:39:14,973][00491] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 4050944. Throughput: 0: 220.5. Samples: 1013942. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 12:39:14,980][00491] Avg episode reward: [(0, '21.089')] [2024-10-06 12:39:19,786][04755] Updated weights for policy 0, policy_version 990 (0.2118) [2024-10-06 12:39:19,973][00491] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 4055040. Throughput: 0: 208.9. Samples: 1015060. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-10-06 12:39:19,984][00491] Avg episode reward: [(0, '20.968')] [2024-10-06 12:39:20,158][00491] Keyboard interrupt detected in the event loop EvtLoop [Runner_EvtLoop, process=main process 491], exiting... [2024-10-06 12:39:20,163][04742] Stopping Batcher_0... [2024-10-06 12:39:20,164][04742] Loop batcher_evt_loop terminating... [2024-10-06 12:39:20,165][00491] Runner profile tree view: main_loop: 4891.7018 [2024-10-06 12:39:20,170][00491] Collected {0: 4055040}, FPS: 829.0 [2024-10-06 12:39:20,537][04755] Weights refcount: 2 0 [2024-10-06 12:39:20,549][04755] Stopping InferenceWorker_p0-w0... [2024-10-06 12:39:20,557][04755] Loop inference_proc0-0_evt_loop terminating... 
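The run summary above is internally consistent with the checkpoint names saved during training; a quick sanity check (a minimal sketch in Python, with values copied from this log; the 4096 env-frames-per-policy-version factor is an inference from batch_size=1024 and env_frameskip=4, not something the log states explicitly):

    # Values copied from the run summary and checkpoint filenames above.
    total_env_frames = 4_055_040       # "Collected {0: 4055040}"
    main_loop_seconds = 4891.7018      # "Runner profile ... main_loop: 4891.7018"
    print(round(total_env_frames / main_loop_seconds, 1))  # 829.0, matching "FPS: 829.0"

    # Checkpoints are named checkpoint_<policy_version>_<env_frames>.pth; every
    # policy version here accounts for 4096 env frames (assumed: 1024 transitions * frameskip 4).
    for version, frames in [(897, 3674112), (922, 3776512), (946, 3874816),
                            (971, 3977216), (991, 4059136)]:
        assert version * 4096 == frames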
[2024-10-06 12:39:20,753][04758] EvtLoop [rollout_proc3_evt_loop, process=rollout_proc3] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance3'), args=(1, 0) Traceback (most recent call last): File "/usr/local/lib/python3.10/dist-packages/signal_slot/signal_slot.py", line 355, in _process_signal slot_callable(*args) File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/sampling/rollout_worker.py", line 241, in advance_rollouts complete_rollouts, episodic_stats = runner.advance_rollouts(policy_id, self.timing) File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 634, in advance_rollouts new_obs, rewards, terminated, truncated, infos = e.step(actions) File "/usr/local/lib/python3.10/dist-packages/gymnasium/core.py", line 461, in step return self.env.step(action) File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/utils/make_env.py", line 129, in step obs, rew, terminated, truncated, info = self.env.step(action) File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/utils/make_env.py", line 115, in step obs, rew, terminated, truncated, info = self.env.step(action) File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/wrappers/scenario_wrappers/gathering_reward_shaping.py", line 33, in step observation, reward, terminated, truncated, info = self.env.step(action) File "/usr/local/lib/python3.10/dist-packages/gymnasium/core.py", line 522, in step observation, reward, terminated, truncated, info = self.env.step(action) File "/usr/local/lib/python3.10/dist-packages/sample_factory/envs/env_wrappers.py", line 86, in step obs, reward, terminated, truncated, info = self.env.step(action) File "/usr/local/lib/python3.10/dist-packages/gymnasium/core.py", line 461, in step return self.env.step(action) File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/wrappers/multiplayer_stats.py", line 54, in step obs, reward, terminated, truncated, info = self.env.step(action) File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 452, in step reward = self.game.make_action(actions_flattened, self.skip_frames) vizdoom.vizdoom.SignalException: Signal SIGINT received. ViZDoom instance has been closed. [2024-10-06 12:39:20,824][04758] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. 
in evt loop rollout_proc3_evt_loop [2024-10-06 12:39:20,776][04757] EvtLoop [rollout_proc1_evt_loop, process=rollout_proc1] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance1'), args=(0, 0) Traceback (most recent call last): File "/usr/local/lib/python3.10/dist-packages/signal_slot/signal_slot.py", line 355, in _process_signal slot_callable(*args) File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/sampling/rollout_worker.py", line 241, in advance_rollouts complete_rollouts, episodic_stats = runner.advance_rollouts(policy_id, self.timing) File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 634, in advance_rollouts new_obs, rewards, terminated, truncated, infos = e.step(actions) File "/usr/local/lib/python3.10/dist-packages/gymnasium/core.py", line 461, in step return self.env.step(action) File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/utils/make_env.py", line 129, in step obs, rew, terminated, truncated, info = self.env.step(action) File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/utils/make_env.py", line 115, in step obs, rew, terminated, truncated, info = self.env.step(action) File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/wrappers/scenario_wrappers/gathering_reward_shaping.py", line 33, in step observation, reward, terminated, truncated, info = self.env.step(action) File "/usr/local/lib/python3.10/dist-packages/gymnasium/core.py", line 522, in step observation, reward, terminated, truncated, info = self.env.step(action) File "/usr/local/lib/python3.10/dist-packages/sample_factory/envs/env_wrappers.py", line 86, in step obs, reward, terminated, truncated, info = self.env.step(action) File "/usr/local/lib/python3.10/dist-packages/gymnasium/core.py", line 461, in step return self.env.step(action) File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/wrappers/multiplayer_stats.py", line 54, in step obs, reward, terminated, truncated, info = self.env.step(action) File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 452, in step reward = self.game.make_action(actions_flattened, self.skip_frames) vizdoom.vizdoom.SignalException: Signal SIGINT received. ViZDoom instance has been closed. [2024-10-06 12:39:20,826][04757] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. 
in evt loop rollout_proc1_evt_loop [2024-10-06 12:39:20,788][04761] EvtLoop [rollout_proc5_evt_loop, process=rollout_proc5] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance5'), args=(0, 0) Traceback (most recent call last): File "/usr/local/lib/python3.10/dist-packages/signal_slot/signal_slot.py", line 355, in _process_signal slot_callable(*args) File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/sampling/rollout_worker.py", line 241, in advance_rollouts complete_rollouts, episodic_stats = runner.advance_rollouts(policy_id, self.timing) File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 634, in advance_rollouts new_obs, rewards, terminated, truncated, infos = e.step(actions) File "/usr/local/lib/python3.10/dist-packages/gymnasium/core.py", line 461, in step return self.env.step(action) File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/utils/make_env.py", line 129, in step obs, rew, terminated, truncated, info = self.env.step(action) File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/utils/make_env.py", line 115, in step obs, rew, terminated, truncated, info = self.env.step(action) File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/wrappers/scenario_wrappers/gathering_reward_shaping.py", line 33, in step observation, reward, terminated, truncated, info = self.env.step(action) File "/usr/local/lib/python3.10/dist-packages/gymnasium/core.py", line 522, in step observation, reward, terminated, truncated, info = self.env.step(action) File "/usr/local/lib/python3.10/dist-packages/sample_factory/envs/env_wrappers.py", line 86, in step obs, reward, terminated, truncated, info = self.env.step(action) File "/usr/local/lib/python3.10/dist-packages/gymnasium/core.py", line 461, in step return self.env.step(action) File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/wrappers/multiplayer_stats.py", line 54, in step obs, reward, terminated, truncated, info = self.env.step(action) File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 452, in step reward = self.game.make_action(actions_flattened, self.skip_frames) vizdoom.vizdoom.SignalException: Signal SIGINT received. ViZDoom instance has been closed. [2024-10-06 12:39:20,827][04761] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. 
in evt loop rollout_proc5_evt_loop [2024-10-06 12:39:20,788][04760] EvtLoop [rollout_proc4_evt_loop, process=rollout_proc4] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance4'), args=(0, 0) Traceback (most recent call last): File "/usr/local/lib/python3.10/dist-packages/signal_slot/signal_slot.py", line 355, in _process_signal slot_callable(*args) File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/sampling/rollout_worker.py", line 241, in advance_rollouts complete_rollouts, episodic_stats = runner.advance_rollouts(policy_id, self.timing) File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 634, in advance_rollouts new_obs, rewards, terminated, truncated, infos = e.step(actions) File "/usr/local/lib/python3.10/dist-packages/gymnasium/core.py", line 461, in step return self.env.step(action) File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/utils/make_env.py", line 129, in step obs, rew, terminated, truncated, info = self.env.step(action) File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/utils/make_env.py", line 115, in step obs, rew, terminated, truncated, info = self.env.step(action) File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/wrappers/scenario_wrappers/gathering_reward_shaping.py", line 33, in step observation, reward, terminated, truncated, info = self.env.step(action) File "/usr/local/lib/python3.10/dist-packages/gymnasium/core.py", line 522, in step observation, reward, terminated, truncated, info = self.env.step(action) File "/usr/local/lib/python3.10/dist-packages/sample_factory/envs/env_wrappers.py", line 86, in step obs, reward, terminated, truncated, info = self.env.step(action) File "/usr/local/lib/python3.10/dist-packages/gymnasium/core.py", line 461, in step return self.env.step(action) File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/wrappers/multiplayer_stats.py", line 54, in step obs, reward, terminated, truncated, info = self.env.step(action) File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 452, in step reward = self.game.make_action(actions_flattened, self.skip_frames) vizdoom.vizdoom.SignalException: Signal SIGINT received. ViZDoom instance has been closed. 
[2024-10-06 12:39:20,884][04762] EvtLoop [rollout_proc7_evt_loop, process=rollout_proc7] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance7'), args=(0, 0) Traceback (most recent call last): File "/usr/local/lib/python3.10/dist-packages/signal_slot/signal_slot.py", line 355, in _process_signal slot_callable(*args) File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/sampling/rollout_worker.py", line 241, in advance_rollouts complete_rollouts, episodic_stats = runner.advance_rollouts(policy_id, self.timing) File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 634, in advance_rollouts new_obs, rewards, terminated, truncated, infos = e.step(actions) File "/usr/local/lib/python3.10/dist-packages/gymnasium/core.py", line 461, in step return self.env.step(action) File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/utils/make_env.py", line 129, in step obs, rew, terminated, truncated, info = self.env.step(action) File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/utils/make_env.py", line 115, in step obs, rew, terminated, truncated, info = self.env.step(action) File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/wrappers/scenario_wrappers/gathering_reward_shaping.py", line 33, in step observation, reward, terminated, truncated, info = self.env.step(action) File "/usr/local/lib/python3.10/dist-packages/gymnasium/core.py", line 522, in step observation, reward, terminated, truncated, info = self.env.step(action) File "/usr/local/lib/python3.10/dist-packages/sample_factory/envs/env_wrappers.py", line 86, in step obs, reward, terminated, truncated, info = self.env.step(action) File "/usr/local/lib/python3.10/dist-packages/gymnasium/core.py", line 461, in step return self.env.step(action) File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/wrappers/multiplayer_stats.py", line 54, in step obs, reward, terminated, truncated, info = self.env.step(action) File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 452, in step reward = self.game.make_action(actions_flattened, self.skip_frames) vizdoom.vizdoom.SignalException: Signal SIGINT received. ViZDoom instance has been closed. [2024-10-06 12:39:20,973][04762] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. in evt loop rollout_proc7_evt_loop [2024-10-06 12:39:20,964][04760] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. 
in evt loop rollout_proc4_evt_loop [2024-10-06 12:39:20,902][04756] EvtLoop [rollout_proc0_evt_loop, process=rollout_proc0] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance0'), args=(1, 0) Traceback (most recent call last): File "/usr/local/lib/python3.10/dist-packages/signal_slot/signal_slot.py", line 355, in _process_signal slot_callable(*args) File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/sampling/rollout_worker.py", line 241, in advance_rollouts complete_rollouts, episodic_stats = runner.advance_rollouts(policy_id, self.timing) File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 634, in advance_rollouts new_obs, rewards, terminated, truncated, infos = e.step(actions) File "/usr/local/lib/python3.10/dist-packages/gymnasium/core.py", line 461, in step return self.env.step(action) File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/utils/make_env.py", line 129, in step obs, rew, terminated, truncated, info = self.env.step(action) File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/utils/make_env.py", line 115, in step obs, rew, terminated, truncated, info = self.env.step(action) File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/wrappers/scenario_wrappers/gathering_reward_shaping.py", line 33, in step observation, reward, terminated, truncated, info = self.env.step(action) File "/usr/local/lib/python3.10/dist-packages/gymnasium/core.py", line 522, in step observation, reward, terminated, truncated, info = self.env.step(action) File "/usr/local/lib/python3.10/dist-packages/sample_factory/envs/env_wrappers.py", line 86, in step obs, reward, terminated, truncated, info = self.env.step(action) File "/usr/local/lib/python3.10/dist-packages/gymnasium/core.py", line 461, in step return self.env.step(action) File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/wrappers/multiplayer_stats.py", line 54, in step obs, reward, terminated, truncated, info = self.env.step(action) File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 452, in step reward = self.game.make_action(actions_flattened, self.skip_frames) vizdoom.vizdoom.SignalException: Signal SIGINT received. ViZDoom instance has been closed. [2024-10-06 12:39:21,168][04756] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. 
in evt loop rollout_proc0_evt_loop [2024-10-06 12:39:21,003][04759] EvtLoop [rollout_proc2_evt_loop, process=rollout_proc2] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance2'), args=(1, 0) Traceback (most recent call last): File "/usr/local/lib/python3.10/dist-packages/signal_slot/signal_slot.py", line 355, in _process_signal slot_callable(*args) File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/sampling/rollout_worker.py", line 241, in advance_rollouts complete_rollouts, episodic_stats = runner.advance_rollouts(policy_id, self.timing) File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 634, in advance_rollouts new_obs, rewards, terminated, truncated, infos = e.step(actions) File "/usr/local/lib/python3.10/dist-packages/gymnasium/core.py", line 461, in step return self.env.step(action) File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/utils/make_env.py", line 129, in step obs, rew, terminated, truncated, info = self.env.step(action) File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/utils/make_env.py", line 115, in step obs, rew, terminated, truncated, info = self.env.step(action) File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/wrappers/scenario_wrappers/gathering_reward_shaping.py", line 33, in step observation, reward, terminated, truncated, info = self.env.step(action) File "/usr/local/lib/python3.10/dist-packages/gymnasium/core.py", line 522, in step observation, reward, terminated, truncated, info = self.env.step(action) File "/usr/local/lib/python3.10/dist-packages/sample_factory/envs/env_wrappers.py", line 86, in step obs, reward, terminated, truncated, info = self.env.step(action) File "/usr/local/lib/python3.10/dist-packages/gymnasium/core.py", line 461, in step return self.env.step(action) File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/wrappers/multiplayer_stats.py", line 54, in step obs, reward, terminated, truncated, info = self.env.step(action) File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 452, in step reward = self.game.make_action(actions_flattened, self.skip_frames) vizdoom.vizdoom.SignalException: Signal SIGINT received. ViZDoom instance has been closed. [2024-10-06 12:39:21,218][04759] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. 
in evt loop rollout_proc2_evt_loop [2024-10-06 12:39:22,310][04763] EvtLoop [rollout_proc6_evt_loop, process=rollout_proc6] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance6'), args=(0, 0) Traceback (most recent call last): File "/usr/local/lib/python3.10/dist-packages/signal_slot/signal_slot.py", line 355, in _process_signal slot_callable(*args) File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/sampling/rollout_worker.py", line 241, in advance_rollouts complete_rollouts, episodic_stats = runner.advance_rollouts(policy_id, self.timing) File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 634, in advance_rollouts new_obs, rewards, terminated, truncated, infos = e.step(actions) File "/usr/local/lib/python3.10/dist-packages/gymnasium/core.py", line 461, in step return self.env.step(action) File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/utils/make_env.py", line 129, in step obs, rew, terminated, truncated, info = self.env.step(action) File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/utils/make_env.py", line 115, in step obs, rew, terminated, truncated, info = self.env.step(action) File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/wrappers/scenario_wrappers/gathering_reward_shaping.py", line 33, in step observation, reward, terminated, truncated, info = self.env.step(action) File "/usr/local/lib/python3.10/dist-packages/gymnasium/core.py", line 522, in step observation, reward, terminated, truncated, info = self.env.step(action) File "/usr/local/lib/python3.10/dist-packages/sample_factory/envs/env_wrappers.py", line 86, in step obs, reward, terminated, truncated, info = self.env.step(action) File "/usr/local/lib/python3.10/dist-packages/gymnasium/core.py", line 461, in step return self.env.step(action) File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/wrappers/multiplayer_stats.py", line 54, in step obs, reward, terminated, truncated, info = self.env.step(action) File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 452, in step reward = self.game.make_action(actions_flattened, self.skip_frames) vizdoom.vizdoom.SignalException: Signal SIGINT received. ViZDoom instance has been closed. [2024-10-06 12:39:22,424][04763] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. in evt loop rollout_proc6_evt_loop [2024-10-06 12:39:27,979][04742] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000991_4059136.pth... [2024-10-06 12:39:28,133][04742] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000946_3874816.pth [2024-10-06 12:39:28,153][04742] Stopping LearnerWorker_p0... [2024-10-06 12:39:28,154][04742] Loop learner_proc0_evt_loop terminating... [2024-10-06 12:39:31,191][00491] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2024-10-06 12:39:31,196][00491] Overriding arg 'num_workers' with value 1 passed from command line [2024-10-06 12:39:31,197][00491] Adding new argument 'no_render'=True that is not in the saved config file! [2024-10-06 12:39:31,199][00491] Adding new argument 'save_video'=True that is not in the saved config file! [2024-10-06 12:39:31,202][00491] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! 
[2024-10-06 12:39:31,208][00491] Adding new argument 'video_name'=None that is not in the saved config file! [2024-10-06 12:39:31,210][00491] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! [2024-10-06 12:39:31,212][00491] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2024-10-06 12:39:31,214][00491] Adding new argument 'push_to_hub'=False that is not in the saved config file! [2024-10-06 12:39:31,217][00491] Adding new argument 'hf_repository'=None that is not in the saved config file! [2024-10-06 12:39:31,219][00491] Adding new argument 'policy_index'=0 that is not in the saved config file! [2024-10-06 12:39:31,221][00491] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2024-10-06 12:39:31,223][00491] Adding new argument 'train_script'=None that is not in the saved config file! [2024-10-06 12:39:31,225][00491] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2024-10-06 12:39:31,242][00491] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-10-06 12:39:31,332][00491] Doom resolution: 160x120, resize resolution: (128, 72) [2024-10-06 12:39:31,344][00491] RunningMeanStd input shape: (3, 72, 128) [2024-10-06 12:39:31,356][00491] RunningMeanStd input shape: (1,) [2024-10-06 12:39:31,455][00491] ConvEncoder: input_channels=3 [2024-10-06 12:39:31,663][00491] Conv encoder output size: 512 [2024-10-06 12:39:31,665][00491] Policy head output size: 512 [2024-10-06 12:39:31,692][00491] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000991_4059136.pth... [2024-10-06 12:39:32,403][00491] Num frames 100... [2024-10-06 12:39:32,623][00491] Num frames 200... [2024-10-06 12:39:32,829][00491] Num frames 300... [2024-10-06 12:39:33,040][00491] Num frames 400... [2024-10-06 12:39:33,255][00491] Num frames 500... [2024-10-06 12:39:33,475][00491] Num frames 600... [2024-10-06 12:39:33,611][00491] Avg episode rewards: #0: 11.400, true rewards: #0: 6.400 [2024-10-06 12:39:33,614][00491] Avg episode reward: 11.400, avg true_objective: 6.400 [2024-10-06 12:39:33,743][00491] Num frames 700... [2024-10-06 12:39:33,943][00491] Num frames 800... [2024-10-06 12:39:34,158][00491] Num frames 900... [2024-10-06 12:39:34,371][00491] Num frames 1000... [2024-10-06 12:39:34,576][00491] Num frames 1100... [2024-10-06 12:39:34,795][00491] Num frames 1200... [2024-10-06 12:39:35,005][00491] Num frames 1300... [2024-10-06 12:39:35,214][00491] Num frames 1400... [2024-10-06 12:39:35,424][00491] Num frames 1500... [2024-10-06 12:39:35,640][00491] Num frames 1600... [2024-10-06 12:39:35,856][00491] Num frames 1700... [2024-10-06 12:39:36,059][00491] Num frames 1800... [2024-10-06 12:39:36,277][00491] Num frames 1900... [2024-10-06 12:39:36,494][00491] Num frames 2000... [2024-10-06 12:39:36,706][00491] Num frames 2100... [2024-10-06 12:39:36,936][00491] Num frames 2200... [2024-10-06 12:39:37,153][00491] Num frames 2300... [2024-10-06 12:39:37,363][00491] Num frames 2400... [2024-10-06 12:39:37,583][00491] Num frames 2500... [2024-10-06 12:39:37,698][00491] Avg episode rewards: #0: 29.130, true rewards: #0: 12.630 [2024-10-06 12:39:37,700][00491] Avg episode reward: 29.130, avg true_objective: 12.630 [2024-10-06 12:39:37,871][00491] Num frames 2600... [2024-10-06 12:39:38,090][00491] Num frames 2700... [2024-10-06 12:39:38,308][00491] Num frames 2800... [2024-10-06 12:39:38,519][00491] Num frames 2900... 
[2024-10-06 12:39:38,725][00491] Num frames 3000... [2024-10-06 12:39:38,948][00491] Num frames 3100... [2024-10-06 12:39:39,142][00491] Avg episode rewards: #0: 22.886, true rewards: #0: 10.553 [2024-10-06 12:39:39,144][00491] Avg episode reward: 22.886, avg true_objective: 10.553 [2024-10-06 12:39:39,219][00491] Num frames 3200... [2024-10-06 12:39:39,432][00491] Num frames 3300... [2024-10-06 12:39:39,643][00491] Num frames 3400... [2024-10-06 12:39:39,860][00491] Num frames 3500... [2024-10-06 12:39:40,071][00491] Num frames 3600... [2024-10-06 12:39:40,306][00491] Avg episode rewards: #0: 19.445, true rewards: #0: 9.195 [2024-10-06 12:39:40,308][00491] Avg episode reward: 19.445, avg true_objective: 9.195 [2024-10-06 12:39:40,356][00491] Num frames 3700... [2024-10-06 12:39:40,571][00491] Num frames 3800... [2024-10-06 12:39:40,785][00491] Num frames 3900... [2024-10-06 12:39:41,008][00491] Num frames 4000... [2024-10-06 12:39:41,231][00491] Num frames 4100... [2024-10-06 12:39:41,449][00491] Num frames 4200... [2024-10-06 12:39:41,739][00491] Num frames 4300... [2024-10-06 12:39:42,044][00491] Num frames 4400... [2024-10-06 12:39:42,333][00491] Num frames 4500... [2024-10-06 12:39:42,623][00491] Num frames 4600... [2024-10-06 12:39:42,915][00491] Num frames 4700... [2024-10-06 12:39:43,241][00491] Num frames 4800... [2024-10-06 12:39:43,547][00491] Num frames 4900... [2024-10-06 12:39:43,684][00491] Avg episode rewards: #0: 22.052, true rewards: #0: 9.852 [2024-10-06 12:39:43,687][00491] Avg episode reward: 22.052, avg true_objective: 9.852 [2024-10-06 12:39:43,901][00491] Num frames 5000... [2024-10-06 12:39:44,204][00491] Num frames 5100... [2024-10-06 12:39:44,496][00491] Num frames 5200... [2024-10-06 12:39:44,741][00491] Num frames 5300... [2024-10-06 12:39:44,954][00491] Num frames 5400... [2024-10-06 12:39:45,180][00491] Num frames 5500... [2024-10-06 12:39:45,397][00491] Num frames 5600... [2024-10-06 12:39:45,614][00491] Num frames 5700... [2024-10-06 12:39:45,823][00491] Num frames 5800... [2024-10-06 12:39:46,041][00491] Num frames 5900... [2024-10-06 12:39:46,263][00491] Num frames 6000... [2024-10-06 12:39:46,486][00491] Num frames 6100... [2024-10-06 12:39:46,693][00491] Num frames 6200... [2024-10-06 12:39:46,908][00491] Num frames 6300... [2024-10-06 12:39:46,973][00491] Avg episode rewards: #0: 23.837, true rewards: #0: 10.503 [2024-10-06 12:39:46,976][00491] Avg episode reward: 23.837, avg true_objective: 10.503 [2024-10-06 12:39:47,196][00491] Num frames 6400... [2024-10-06 12:39:47,410][00491] Num frames 6500... [2024-10-06 12:39:47,618][00491] Num frames 6600... [2024-10-06 12:39:47,825][00491] Num frames 6700... [2024-10-06 12:39:48,038][00491] Num frames 6800... [2024-10-06 12:39:48,197][00491] Avg episode rewards: #0: 21.923, true rewards: #0: 9.780 [2024-10-06 12:39:48,201][00491] Avg episode reward: 21.923, avg true_objective: 9.780 [2024-10-06 12:39:48,316][00491] Num frames 6900... [2024-10-06 12:39:48,529][00491] Num frames 7000... [2024-10-06 12:39:48,736][00491] Num frames 7100... [2024-10-06 12:39:48,944][00491] Num frames 7200... [2024-10-06 12:39:49,161][00491] Num frames 7300... [2024-10-06 12:39:49,386][00491] Num frames 7400... [2024-10-06 12:39:49,598][00491] Num frames 7500... [2024-10-06 12:39:49,810][00491] Num frames 7600... [2024-10-06 12:39:50,021][00491] Num frames 7700... [2024-10-06 12:39:50,244][00491] Num frames 7800... [2024-10-06 12:39:50,475][00491] Num frames 7900... [2024-10-06 12:39:50,696][00491] Num frames 8000... 
[2024-10-06 12:39:50,927][00491] Num frames 8100... [2024-10-06 12:39:51,151][00491] Num frames 8200... [2024-10-06 12:39:51,383][00491] Num frames 8300... [2024-10-06 12:39:51,603][00491] Num frames 8400... [2024-10-06 12:39:51,824][00491] Num frames 8500... [2024-10-06 12:39:52,060][00491] Avg episode rewards: #0: 25.347, true rewards: #0: 10.722 [2024-10-06 12:39:52,062][00491] Avg episode reward: 25.347, avg true_objective: 10.722 [2024-10-06 12:39:52,117][00491] Num frames 8600... [2024-10-06 12:39:52,357][00491] Num frames 8700... [2024-10-06 12:39:52,583][00491] Num frames 8800... [2024-10-06 12:39:52,807][00491] Num frames 8900... [2024-10-06 12:39:53,060][00491] Num frames 9000... [2024-10-06 12:39:53,288][00491] Num frames 9100... [2024-10-06 12:39:53,523][00491] Num frames 9200... [2024-10-06 12:39:53,730][00491] Num frames 9300... [2024-10-06 12:39:53,949][00491] Num frames 9400... [2024-10-06 12:39:54,184][00491] Num frames 9500... [2024-10-06 12:39:54,256][00491] Avg episode rewards: #0: 24.562, true rewards: #0: 10.562 [2024-10-06 12:39:54,258][00491] Avg episode reward: 24.562, avg true_objective: 10.562 [2024-10-06 12:39:54,474][00491] Num frames 9600... [2024-10-06 12:39:54,708][00491] Num frames 9700... [2024-10-06 12:39:55,000][00491] Num frames 9800... [2024-10-06 12:39:55,286][00491] Num frames 9900... [2024-10-06 12:39:55,589][00491] Num frames 10000... [2024-10-06 12:39:55,878][00491] Num frames 10100... [2024-10-06 12:39:56,165][00491] Num frames 10200... [2024-10-06 12:39:56,394][00491] Avg episode rewards: #0: 23.760, true rewards: #0: 10.260 [2024-10-06 12:39:56,396][00491] Avg episode reward: 23.760, avg true_objective: 10.260 [2024-10-06 12:41:11,603][00491] Replay video saved to /content/train_dir/default_experiment/replay.mp4! [2024-10-06 12:45:15,013][00491] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2024-10-06 12:45:15,015][00491] Overriding arg 'num_workers' with value 1 passed from command line [2024-10-06 12:45:15,021][00491] Adding new argument 'no_render'=True that is not in the saved config file! [2024-10-06 12:45:15,023][00491] Adding new argument 'save_video'=True that is not in the saved config file! [2024-10-06 12:45:15,026][00491] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2024-10-06 12:45:15,029][00491] Adding new argument 'video_name'=None that is not in the saved config file! [2024-10-06 12:45:15,030][00491] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! [2024-10-06 12:45:15,032][00491] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2024-10-06 12:45:15,033][00491] Adding new argument 'push_to_hub'=True that is not in the saved config file! [2024-10-06 12:45:15,034][00491] Adding new argument 'hf_repository'='maavaneck/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! [2024-10-06 12:45:15,036][00491] Adding new argument 'policy_index'=0 that is not in the saved config file! [2024-10-06 12:45:15,037][00491] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2024-10-06 12:45:15,039][00491] Adding new argument 'train_script'=None that is not in the saved config file! [2024-10-06 12:45:15,041][00491] Adding new argument 'enjoy_script'=None that is not in the saved config file! 
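The overridden and added arguments echoed above correspond to sample-factory's evaluation entry point; a plausible reconstruction of the command behind this run (the module path is assumed from the sf_examples tree visible in the tracebacks earlier; flag names and values are taken from the saved config and the overrides listed above):

    python -m sf_examples.vizdoom.enjoy_vizdoom \
        --env=doom_health_gathering_supreme --train_dir=/content/train_dir \
        --experiment=default_experiment --num_workers=1 --no_render --save_video \
        --max_num_frames=100000 --max_num_episodes=10 --push_to_hub \
        --hf_repository=maavaneck/rl_course_vizdoom_health_gathering_supreme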
[2024-10-06 12:45:15,042][00491] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-10-06 12:45:15,091][00491] RunningMeanStd input shape: (3, 72, 128) [2024-10-06 12:45:15,095][00491] RunningMeanStd input shape: (1,) [2024-10-06 12:45:15,114][00491] ConvEncoder: input_channels=3 [2024-10-06 12:45:15,175][00491] Conv encoder output size: 512 [2024-10-06 12:45:15,178][00491] Policy head output size: 512 [2024-10-06 12:45:15,203][00491] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000991_4059136.pth... [2024-10-06 12:45:15,791][00491] Num frames 100... [2024-10-06 12:45:15,994][00491] Num frames 200... [2024-10-06 12:45:16,216][00491] Num frames 300... [2024-10-06 12:45:16,444][00491] Num frames 400... [2024-10-06 12:45:16,654][00491] Num frames 500... [2024-10-06 12:45:16,870][00491] Num frames 600... [2024-10-06 12:45:17,013][00491] Avg episode rewards: #0: 11.400, true rewards: #0: 6.400 [2024-10-06 12:45:17,016][00491] Avg episode reward: 11.400, avg true_objective: 6.400 [2024-10-06 12:45:17,150][00491] Num frames 700... [2024-10-06 12:45:17,357][00491] Num frames 800... [2024-10-06 12:45:17,579][00491] Num frames 900... [2024-10-06 12:45:17,790][00491] Num frames 1000... [2024-10-06 12:45:18,001][00491] Num frames 1100... [2024-10-06 12:45:18,169][00491] Avg episode rewards: #0: 9.760, true rewards: #0: 5.760 [2024-10-06 12:45:18,171][00491] Avg episode reward: 9.760, avg true_objective: 5.760 [2024-10-06 12:45:18,276][00491] Num frames 1200... [2024-10-06 12:45:18,504][00491] Num frames 1300... [2024-10-06 12:45:18,716][00491] Num frames 1400... [2024-10-06 12:45:18,923][00491] Num frames 1500... [2024-10-06 12:45:19,135][00491] Num frames 1600... [2024-10-06 12:45:19,348][00491] Num frames 1700... [2024-10-06 12:45:19,570][00491] Num frames 1800... [2024-10-06 12:45:19,789][00491] Num frames 1900... [2024-10-06 12:45:20,009][00491] Num frames 2000... [2024-10-06 12:45:20,226][00491] Num frames 2100... [2024-10-06 12:45:20,457][00491] Num frames 2200... [2024-10-06 12:45:20,697][00491] Num frames 2300... [2024-10-06 12:45:20,928][00491] Num frames 2400... [2024-10-06 12:45:21,140][00491] Num frames 2500... [2024-10-06 12:45:21,349][00491] Num frames 2600... [2024-10-06 12:45:21,574][00491] Num frames 2700... [2024-10-06 12:45:21,793][00491] Num frames 2800... [2024-10-06 12:45:22,007][00491] Num frames 2900... [2024-10-06 12:45:22,224][00491] Num frames 3000... [2024-10-06 12:45:22,485][00491] Avg episode rewards: #0: 24.650, true rewards: #0: 10.317 [2024-10-06 12:45:22,487][00491] Avg episode reward: 24.650, avg true_objective: 10.317 [2024-10-06 12:45:22,502][00491] Num frames 3100... [2024-10-06 12:45:22,723][00491] Num frames 3200... [2024-10-06 12:45:22,934][00491] Num frames 3300... [2024-10-06 12:45:23,162][00491] Num frames 3400... [2024-10-06 12:45:23,374][00491] Num frames 3500... [2024-10-06 12:45:23,584][00491] Num frames 3600... [2024-10-06 12:45:23,799][00491] Avg episode rewards: #0: 20.427, true rewards: #0: 9.178 [2024-10-06 12:45:23,802][00491] Avg episode reward: 20.427, avg true_objective: 9.178 [2024-10-06 12:45:23,870][00491] Num frames 3700... [2024-10-06 12:45:24,116][00491] Num frames 3800... [2024-10-06 12:45:24,345][00491] Num frames 3900... [2024-10-06 12:45:24,549][00491] Num frames 4000... [2024-10-06 12:45:24,786][00491] Num frames 4100... [2024-10-06 12:45:25,077][00491] Num frames 4200... [2024-10-06 12:45:25,379][00491] Num frames 4300... 
[2024-10-06 12:45:25,678][00491] Num frames 4400...
[2024-10-06 12:45:25,736][00491] Avg episode rewards: #0: 19.400, true rewards: #0: 8.800
[2024-10-06 12:45:25,739][00491] Avg episode reward: 19.400, avg true_objective: 8.800
[2024-10-06 12:45:26,020][00491] Num frames 4500...
[2024-10-06 12:45:26,314][00491] Num frames 4600...
[2024-10-06 12:45:26,620][00491] Num frames 4700...
[2024-10-06 12:45:26,930][00491] Num frames 4800...
[2024-10-06 12:45:27,230][00491] Num frames 4900...
[2024-10-06 12:45:27,516][00491] Num frames 5000...
[2024-10-06 12:45:27,837][00491] Num frames 5100...
[2024-10-06 12:45:27,911][00491] Avg episode rewards: #0: 18.510, true rewards: #0: 8.510
[2024-10-06 12:45:27,914][00491] Avg episode reward: 18.510, avg true_objective: 8.510
[2024-10-06 12:45:28,107][00491] Num frames 5200...
[2024-10-06 12:45:28,323][00491] Num frames 5300...
[2024-10-06 12:45:28,535][00491] Num frames 5400...
[2024-10-06 12:45:28,741][00491] Num frames 5500...
[2024-10-06 12:45:28,959][00491] Num frames 5600...
[2024-10-06 12:45:29,175][00491] Num frames 5700...
[2024-10-06 12:45:29,393][00491] Num frames 5800...
[2024-10-06 12:45:29,601][00491] Num frames 5900...
[2024-10-06 12:45:29,738][00491] Avg episode rewards: #0: 18.197, true rewards: #0: 8.483
[2024-10-06 12:45:29,740][00491] Avg episode reward: 18.197, avg true_objective: 8.483
[2024-10-06 12:45:29,877][00491] Num frames 6000...
[2024-10-06 12:45:30,084][00491] Num frames 6100...
[2024-10-06 12:45:30,301][00491] Num frames 6200...
[2024-10-06 12:45:30,509][00491] Num frames 6300...
[2024-10-06 12:45:30,717][00491] Num frames 6400...
[2024-10-06 12:45:30,930][00491] Num frames 6500...
[2024-10-06 12:45:31,145][00491] Num frames 6600...
[2024-10-06 12:45:31,361][00491] Num frames 6700...
[2024-10-06 12:45:31,502][00491] Avg episode rewards: #0: 17.672, true rewards: #0: 8.422
[2024-10-06 12:45:31,504][00491] Avg episode reward: 17.672, avg true_objective: 8.422
[2024-10-06 12:45:31,633][00491] Num frames 6800...
[2024-10-06 12:45:31,840][00491] Num frames 6900...
[2024-10-06 12:45:32,055][00491] Num frames 7000...
[2024-10-06 12:45:32,270][00491] Num frames 7100...
[2024-10-06 12:45:32,491][00491] Num frames 7200...
[2024-10-06 12:45:32,701][00491] Num frames 7300...
[2024-10-06 12:45:32,915][00491] Num frames 7400...
[2024-10-06 12:45:33,147][00491] Num frames 7500...
[2024-10-06 12:45:33,220][00491] Avg episode rewards: #0: 17.118, true rewards: #0: 8.340
[2024-10-06 12:45:33,223][00491] Avg episode reward: 17.118, avg true_objective: 8.340
[2024-10-06 12:45:33,426][00491] Num frames 7600...
[2024-10-06 12:45:33,644][00491] Num frames 7700...
[2024-10-06 12:45:33,854][00491] Num frames 7800...
[2024-10-06 12:45:34,075][00491] Num frames 7900...
[2024-10-06 12:45:34,310][00491] Num frames 8000...
[2024-10-06 12:45:34,531][00491] Num frames 8100...
[2024-10-06 12:45:34,688][00491] Avg episode rewards: #0: 16.946, true rewards: #0: 8.146
[2024-10-06 12:45:34,691][00491] Avg episode reward: 16.946, avg true_objective: 8.146
[2024-10-06 12:46:34,928][00491] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
[2024-10-06 12:50:37,108][00491] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2024-10-06 12:50:37,111][00491] Overriding arg 'num_workers' with value 1 passed from command line
[2024-10-06 12:50:37,114][00491] Adding new argument 'no_render'=True that is not in the saved config file!
[2024-10-06 12:50:37,117][00491] Adding new argument 'save_video'=True that is not in the saved config file!
[2024-10-06 12:50:37,120][00491] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2024-10-06 12:50:37,122][00491] Adding new argument 'video_name'=None that is not in the saved config file!
[2024-10-06 12:50:37,123][00491] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2024-10-06 12:50:37,126][00491] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2024-10-06 12:50:37,127][00491] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2024-10-06 12:50:37,130][00491] Adding new argument 'hf_repository'='maavaneck/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
[2024-10-06 12:50:37,132][00491] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2024-10-06 12:50:37,135][00491] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2024-10-06 12:50:37,137][00491] Adding new argument 'train_script'=None that is not in the saved config file!
[2024-10-06 12:50:37,139][00491] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2024-10-06 12:50:37,141][00491] Using frameskip 1 and render_action_repeat=4 for evaluation
[2024-10-06 12:50:37,189][00491] RunningMeanStd input shape: (3, 72, 128)
[2024-10-06 12:50:37,193][00491] RunningMeanStd input shape: (1,)
[2024-10-06 12:50:37,213][00491] ConvEncoder: input_channels=3
[2024-10-06 12:50:37,261][00491] Conv encoder output size: 512
[2024-10-06 12:50:37,264][00491] Policy head output size: 512
[2024-10-06 12:50:37,283][00491] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000991_4059136.pth...
[2024-10-06 12:50:37,845][00491] Num frames 100...
[2024-10-06 12:50:38,058][00491] Num frames 200...
[2024-10-06 12:50:38,270][00491] Num frames 300...
[2024-10-06 12:50:38,501][00491] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840
[2024-10-06 12:50:38,504][00491] Avg episode reward: 3.840, avg true_objective: 3.840
[2024-10-06 12:50:38,540][00491] Num frames 400...
[2024-10-06 12:50:38,743][00491] Num frames 500...
[2024-10-06 12:50:38,959][00491] Num frames 600...
[2024-10-06 12:50:39,169][00491] Num frames 700...
[2024-10-06 12:50:39,382][00491] Num frames 800...
[2024-10-06 12:50:39,590][00491] Num frames 900...
[2024-10-06 12:50:39,797][00491] Num frames 1000...
[2024-10-06 12:50:40,096][00491] Num frames 1100...
[2024-10-06 12:50:40,394][00491] Num frames 1200...
[2024-10-06 12:50:40,668][00491] Num frames 1300...
[2024-10-06 12:50:40,964][00491] Num frames 1400...
[2024-10-06 12:50:41,258][00491] Num frames 1500...
[2024-10-06 12:50:41,340][00491] Avg episode rewards: #0: 14.535, true rewards: #0: 7.535
[2024-10-06 12:50:41,343][00491] Avg episode reward: 14.535, avg true_objective: 7.535
[2024-10-06 12:50:41,612][00491] Num frames 1600...
[2024-10-06 12:50:41,900][00491] Num frames 1700...
[2024-10-06 12:50:42,209][00491] Num frames 1800...
[2024-10-06 12:50:42,503][00491] Num frames 1900...
[2024-10-06 12:50:42,813][00491] Num frames 2000...
[2024-10-06 12:50:43,066][00491] Num frames 2100...
[2024-10-06 12:50:43,281][00491] Num frames 2200...
[2024-10-06 12:50:43,502][00491] Num frames 2300...
[2024-10-06 12:50:43,711][00491] Num frames 2400...
[2024-10-06 12:50:43,923][00491] Num frames 2500...
[2024-10-06 12:50:44,140][00491] Num frames 2600...
[2024-10-06 12:50:44,255][00491] Avg episode rewards: #0: 17.423, true rewards: #0: 8.757
[2024-10-06 12:50:44,257][00491] Avg episode reward: 17.423, avg true_objective: 8.757
[2024-10-06 12:50:44,423][00491] Num frames 2700...
[2024-10-06 12:50:44,639][00491] Num frames 2800...
[2024-10-06 12:50:44,842][00491] Num frames 2900...
[2024-10-06 12:50:45,055][00491] Num frames 3000...
[2024-10-06 12:50:45,274][00491] Num frames 3100...
[2024-10-06 12:50:45,485][00491] Num frames 3200...
[2024-10-06 12:50:45,689][00491] Num frames 3300...
[2024-10-06 12:50:45,912][00491] Num frames 3400...
[2024-10-06 12:50:46,157][00491] Num frames 3500...
[2024-10-06 12:50:46,242][00491] Avg episode rewards: #0: 18.030, true rewards: #0: 8.780
[2024-10-06 12:50:46,244][00491] Avg episode reward: 18.030, avg true_objective: 8.780
[2024-10-06 12:50:46,438][00491] Num frames 3600...
[2024-10-06 12:50:46,643][00491] Num frames 3700...
[2024-10-06 12:50:46,857][00491] Num frames 3800...
[2024-10-06 12:50:47,065][00491] Num frames 3900...
[2024-10-06 12:50:47,288][00491] Num frames 4000...
[2024-10-06 12:50:47,499][00491] Num frames 4100...
[2024-10-06 12:50:47,706][00491] Num frames 4200...
[2024-10-06 12:50:47,914][00491] Num frames 4300...
[2024-10-06 12:50:47,996][00491] Avg episode rewards: #0: 17.424, true rewards: #0: 8.624
[2024-10-06 12:50:47,998][00491] Avg episode reward: 17.424, avg true_objective: 8.624
[2024-10-06 12:50:48,203][00491] Num frames 4400...
[2024-10-06 12:50:48,418][00491] Num frames 4500...
[2024-10-06 12:50:48,640][00491] Num frames 4600...
[2024-10-06 12:50:48,855][00491] Num frames 4700...
[2024-10-06 12:50:49,067][00491] Num frames 4800...
[2024-10-06 12:50:49,289][00491] Num frames 4900...
[2024-10-06 12:50:49,497][00491] Num frames 5000...
[2024-10-06 12:50:49,708][00491] Num frames 5100...
[2024-10-06 12:50:49,923][00491] Num frames 5200...
[2024-10-06 12:50:50,144][00491] Num frames 5300...
[2024-10-06 12:50:50,375][00491] Num frames 5400...
[2024-10-06 12:50:50,595][00491] Num frames 5500...
[2024-10-06 12:50:50,811][00491] Num frames 5600...
[2024-10-06 12:50:51,028][00491] Num frames 5700...
[2024-10-06 12:50:51,164][00491] Avg episode rewards: #0: 19.723, true rewards: #0: 9.557
[2024-10-06 12:50:51,167][00491] Avg episode reward: 19.723, avg true_objective: 9.557
[2024-10-06 12:50:51,317][00491] Num frames 5800...
[2024-10-06 12:50:51,528][00491] Num frames 5900...
[2024-10-06 12:50:51,728][00491] Num frames 6000...
[2024-10-06 12:50:51,936][00491] Num frames 6100...
[2024-10-06 12:50:52,157][00491] Num frames 6200...
[2024-10-06 12:50:52,383][00491] Num frames 6300...
[2024-10-06 12:50:52,603][00491] Num frames 6400...
[2024-10-06 12:50:52,812][00491] Num frames 6500...
[2024-10-06 12:50:52,877][00491] Avg episode rewards: #0: 18.717, true rewards: #0: 9.289
[2024-10-06 12:50:52,880][00491] Avg episode reward: 18.717, avg true_objective: 9.289
[2024-10-06 12:50:53,157][00491] Num frames 6600...
[2024-10-06 12:50:53,457][00491] Num frames 6700...
[2024-10-06 12:50:53,727][00491] Num frames 6800...
[2024-10-06 12:50:54,007][00491] Num frames 6900...
[2024-10-06 12:50:54,293][00491] Num frames 7000...
[2024-10-06 12:50:54,602][00491] Num frames 7100...
[2024-10-06 12:50:54,697][00491] Avg episode rewards: #0: 17.888, true rewards: #0: 8.887
[2024-10-06 12:50:54,700][00491] Avg episode reward: 17.888, avg true_objective: 8.887
[2024-10-06 12:50:54,957][00491] Num frames 7200...
[2024-10-06 12:50:55,270][00491] Num frames 7300...
[2024-10-06 12:50:55,589][00491] Num frames 7400...
[2024-10-06 12:50:55,874][00491] Num frames 7500...
[2024-10-06 12:50:56,179][00491] Num frames 7600...
[2024-10-06 12:50:56,398][00491] Num frames 7700...
[2024-10-06 12:50:56,623][00491] Num frames 7800...
[2024-10-06 12:50:56,847][00491] Num frames 7900...
[2024-10-06 12:50:57,077][00491] Num frames 8000...
[2024-10-06 12:50:57,320][00491] Num frames 8100...
[2024-10-06 12:50:57,553][00491] Num frames 8200...
[2024-10-06 12:50:57,786][00491] Num frames 8300...
[2024-10-06 12:50:58,019][00491] Num frames 8400...
[2024-10-06 12:50:58,259][00491] Num frames 8500...
[2024-10-06 12:50:58,490][00491] Num frames 8600...
[2024-10-06 12:50:58,730][00491] Num frames 8700...
[2024-10-06 12:50:58,959][00491] Num frames 8800...
[2024-10-06 12:50:59,196][00491] Num frames 8900...
[2024-10-06 12:50:59,428][00491] Num frames 9000...
[2024-10-06 12:50:59,660][00491] Num frames 9100...
[2024-10-06 12:50:59,887][00491] Num frames 9200...
[2024-10-06 12:50:59,974][00491] Avg episode rewards: #0: 21.789, true rewards: #0: 10.233
[2024-10-06 12:50:59,975][00491] Avg episode reward: 21.789, avg true_objective: 10.233
[2024-10-06 12:51:00,191][00491] Num frames 9300...
[2024-10-06 12:51:00,415][00491] Num frames 9400...
[2024-10-06 12:51:00,646][00491] Num frames 9500...
[2024-10-06 12:51:00,886][00491] Num frames 9600...
[2024-10-06 12:51:01,102][00491] Num frames 9700...
[2024-10-06 12:51:01,338][00491] Num frames 9800...
[2024-10-06 12:51:01,566][00491] Num frames 9900...
[2024-10-06 12:51:01,795][00491] Num frames 10000...
[2024-10-06 12:51:02,022][00491] Num frames 10100...
[2024-10-06 12:51:02,253][00491] Num frames 10200...
[2024-10-06 12:51:02,490][00491] Num frames 10300...
[2024-10-06 12:51:02,724][00491] Num frames 10400...
[2024-10-06 12:51:02,957][00491] Num frames 10500...
[2024-10-06 12:51:03,190][00491] Num frames 10600...
[2024-10-06 12:51:03,417][00491] Num frames 10700...
[2024-10-06 12:51:03,643][00491] Num frames 10800...
[2024-10-06 12:51:03,875][00491] Num frames 10900...
[2024-10-06 12:51:04,101][00491] Num frames 11000...
[2024-10-06 12:51:04,347][00491] Num frames 11100...
[2024-10-06 12:51:04,577][00491] Num frames 11200...
[2024-10-06 12:51:04,803][00491] Num frames 11300...
[2024-10-06 12:51:04,894][00491] Avg episode rewards: #0: 25.210, true rewards: #0: 11.310
[2024-10-06 12:51:04,896][00491] Avg episode reward: 25.210, avg true_objective: 11.310
[2024-10-06 12:52:26,580][00491] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
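Note: the "Avg episode rewards" lines appear to be cumulative means over the episodes completed so far (the sequence 3.840, 14.535, 17.423, 18.030, ... in the final run is consistent with this), so individual episode returns can be recovered from consecutive averages via r_k = k * avg_k - (k - 1) * avg_{k-1}. A minimal sketch, assuming plain cumulative averaging:

def episode_returns(running_avgs):
    """Recover per-episode returns from a running (cumulative) mean:
    r_k = k * avg_k - (k - 1) * avg_{k-1}."""
    returns, prev_total = [], 0.0
    for k, avg in enumerate(running_avgs, start=1):
        total = k * avg  # sum of the first k episode returns
        returns.append(round(total - prev_total, 3))
        prev_total = total
    return returns

# First four averages of the final evaluation run above:
print(episode_returns([3.840, 14.535, 17.423, 18.030]))
# -> [3.84, 25.23, 23.199, 19.851]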