nikxtaco committed on
Commit 17453de
1 Parent(s): 6e8ae96

Upload folder using huggingface_hub

.summary/0/events.out.tfevents.1700033565.4391a95ca488 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2d771ab8482c178e8bf9081104c58c80a76e2adaecff8a1215c9356ad72cda47
+ size 2343
README.md CHANGED
@@ -15,7 +15,7 @@ model-index:
  type: doom_health_gathering_supreme
  metrics:
  - type: mean_reward
- value: 9.73 +/- 4.97
+ value: 10.43 +/- 5.21
  name: mean_reward
  verified: false
  ---
checkpoint_p0/checkpoint_000000979_4009984.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a1bbba73e8dc1d41b2810287eed5e98ebd95145102dc4e56b5d68014f37486f9
+ size 34929669
config.json CHANGED
@@ -65,7 +65,7 @@
  "summaries_use_frameskip": true,
  "heartbeat_interval": 20,
  "heartbeat_reporting_interval": 600,
- "train_for_env_steps": 4000000,
+ "train_for_env_steps": 50000,
  "train_for_seconds": 10000000000,
  "save_every_sec": 120,
  "keep_checkpoints": 2,
replay.mp4 CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:a52081b89a2f9f08cb0b1216df352de49a4cefc8cd6cb67198b9bfca1b8bec39
- size 18903213
+ oid sha256:eb52ab7546998ece94f53381dc18af3b64b68dcbda359637b9feed826a1c69a5
+ size 20332764
sf_log.txt CHANGED
@@ -2411,3 +2411,696 @@ main_loop: 1253.2841
  [2023-11-15 07:30:43,267][00663] Avg episode rewards: #0: 22.728, true rewards: #0: 9.728
  [2023-11-15 07:30:43,270][00663] Avg episode reward: 22.728, avg true_objective: 9.728
  [2023-11-15 07:31:48,191][00663] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
2414
+ [2023-11-15 07:31:53,927][00663] The model has been pushed to https://huggingface.co/nikxtaco/rl_course_vizdoom_health_gathering_supreme
2415
+ [2023-11-15 07:32:45,353][00663] Environment doom_basic already registered, overwriting...
2416
+ [2023-11-15 07:32:45,356][00663] Environment doom_two_colors_easy already registered, overwriting...
2417
+ [2023-11-15 07:32:45,358][00663] Environment doom_two_colors_hard already registered, overwriting...
2418
+ [2023-11-15 07:32:45,359][00663] Environment doom_dm already registered, overwriting...
2419
+ [2023-11-15 07:32:45,361][00663] Environment doom_dwango5 already registered, overwriting...
2420
+ [2023-11-15 07:32:45,363][00663] Environment doom_my_way_home_flat_actions already registered, overwriting...
2421
+ [2023-11-15 07:32:45,364][00663] Environment doom_defend_the_center_flat_actions already registered, overwriting...
2422
+ [2023-11-15 07:32:45,366][00663] Environment doom_my_way_home already registered, overwriting...
2423
+ [2023-11-15 07:32:45,368][00663] Environment doom_deadly_corridor already registered, overwriting...
2424
+ [2023-11-15 07:32:45,369][00663] Environment doom_defend_the_center already registered, overwriting...
2425
+ [2023-11-15 07:32:45,370][00663] Environment doom_defend_the_line already registered, overwriting...
2426
+ [2023-11-15 07:32:45,372][00663] Environment doom_health_gathering already registered, overwriting...
2427
+ [2023-11-15 07:32:45,374][00663] Environment doom_health_gathering_supreme already registered, overwriting...
2428
+ [2023-11-15 07:32:45,376][00663] Environment doom_battle already registered, overwriting...
2429
+ [2023-11-15 07:32:45,378][00663] Environment doom_battle2 already registered, overwriting...
2430
+ [2023-11-15 07:32:45,379][00663] Environment doom_duel_bots already registered, overwriting...
2431
+ [2023-11-15 07:32:45,381][00663] Environment doom_deathmatch_bots already registered, overwriting...
2432
+ [2023-11-15 07:32:45,383][00663] Environment doom_duel already registered, overwriting...
2433
+ [2023-11-15 07:32:45,385][00663] Environment doom_deathmatch_full already registered, overwriting...
2434
+ [2023-11-15 07:32:45,386][00663] Environment doom_benchmark already registered, overwriting...
2435
+ [2023-11-15 07:32:45,388][00663] register_encoder_factory: <function make_vizdoom_encoder at 0x7e8c58d712d0>
2436
+ [2023-11-15 07:32:45,419][00663] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
2437
+ [2023-11-15 07:32:45,420][00663] Overriding arg 'train_for_env_steps' with value 50000 passed from command line
2438
+ [2023-11-15 07:32:45,424][00663] Experiment dir /content/train_dir/default_experiment already exists!
2439
+ [2023-11-15 07:32:45,425][00663] Resuming existing experiment from /content/train_dir/default_experiment...
2440
+ [2023-11-15 07:32:45,429][00663] Weights and Biases integration disabled
2441
+ [2023-11-15 07:32:45,433][00663] Environment var CUDA_VISIBLE_DEVICES is 0
2442
+
2443
+ [2023-11-15 07:32:48,231][00663] Starting experiment with the following configuration:
2444
+ help=False
2445
+ algo=APPO
2446
+ env=doom_health_gathering_supreme
2447
+ experiment=default_experiment
2448
+ train_dir=/content/train_dir
2449
+ restart_behavior=resume
2450
+ device=gpu
2451
+ seed=None
2452
+ num_policies=1
2453
+ async_rl=True
2454
+ serial_mode=False
2455
+ batched_sampling=False
2456
+ num_batches_to_accumulate=2
2457
+ worker_num_splits=2
2458
+ policy_workers_per_policy=1
2459
+ max_policy_lag=1000
2460
+ num_workers=8
2461
+ num_envs_per_worker=4
2462
+ batch_size=1024
2463
+ num_batches_per_epoch=1
2464
+ num_epochs=1
2465
+ rollout=32
2466
+ recurrence=32
2467
+ shuffle_minibatches=False
2468
+ gamma=0.99
2469
+ reward_scale=1.0
2470
+ reward_clip=1000.0
2471
+ value_bootstrap=False
2472
+ normalize_returns=True
2473
+ exploration_loss_coeff=0.001
2474
+ value_loss_coeff=0.5
2475
+ kl_loss_coeff=0.0
2476
+ exploration_loss=symmetric_kl
2477
+ gae_lambda=0.95
2478
+ ppo_clip_ratio=0.1
2479
+ ppo_clip_value=0.2
2480
+ with_vtrace=False
2481
+ vtrace_rho=1.0
2482
+ vtrace_c=1.0
2483
+ optimizer=adam
2484
+ adam_eps=1e-06
2485
+ adam_beta1=0.9
2486
+ adam_beta2=0.999
2487
+ max_grad_norm=4.0
2488
+ learning_rate=0.0001
2489
+ lr_schedule=constant
2490
+ lr_schedule_kl_threshold=0.008
2491
+ lr_adaptive_min=1e-06
2492
+ lr_adaptive_max=0.01
2493
+ obs_subtract_mean=0.0
2494
+ obs_scale=255.0
2495
+ normalize_input=True
2496
+ normalize_input_keys=None
2497
+ decorrelate_experience_max_seconds=0
2498
+ decorrelate_envs_on_one_worker=True
2499
+ actor_worker_gpus=[]
2500
+ set_workers_cpu_affinity=True
2501
+ force_envs_single_thread=False
2502
+ default_niceness=0
2503
+ log_to_file=True
2504
+ experiment_summaries_interval=10
2505
+ flush_summaries_interval=30
2506
+ stats_avg=100
2507
+ summaries_use_frameskip=True
2508
+ heartbeat_interval=20
2509
+ heartbeat_reporting_interval=600
2510
+ train_for_env_steps=50000
2511
+ train_for_seconds=10000000000
2512
+ save_every_sec=120
2513
+ keep_checkpoints=2
2514
+ load_checkpoint_kind=latest
2515
+ save_milestones_sec=-1
2516
+ save_best_every_sec=5
2517
+ save_best_metric=reward
2518
+ save_best_after=100000
2519
+ benchmark=False
2520
+ encoder_mlp_layers=[512, 512]
2521
+ encoder_conv_architecture=convnet_simple
2522
+ encoder_conv_mlp_layers=[512]
2523
+ use_rnn=True
2524
+ rnn_size=512
2525
+ rnn_type=gru
2526
+ rnn_num_layers=1
2527
+ decoder_mlp_layers=[]
2528
+ nonlinearity=elu
2529
+ policy_initialization=orthogonal
2530
+ policy_init_gain=1.0
2531
+ actor_critic_share_weights=True
2532
+ adaptive_stddev=True
2533
+ continuous_tanh_scale=0.0
2534
+ initial_stddev=1.0
2535
+ use_env_info_cache=False
2536
+ env_gpu_actions=False
2537
+ env_gpu_observations=True
2538
+ env_frameskip=4
2539
+ env_framestack=1
2540
+ pixel_format=CHW
2541
+ use_record_episode_statistics=False
2542
+ with_wandb=False
2543
+ wandb_user=None
2544
+ wandb_project=sample_factory
2545
+ wandb_group=None
2546
+ wandb_job_type=SF
2547
+ wandb_tags=[]
2548
+ with_pbt=False
2549
+ pbt_mix_policies_in_one_env=True
2550
+ pbt_period_env_steps=5000000
2551
+ pbt_start_mutation=20000000
2552
+ pbt_replace_fraction=0.3
2553
+ pbt_mutation_rate=0.15
2554
+ pbt_replace_reward_gap=0.1
2555
+ pbt_replace_reward_gap_absolute=1e-06
2556
+ pbt_optimize_gamma=False
2557
+ pbt_target_objective=true_objective
2558
+ pbt_perturb_min=1.1
2559
+ pbt_perturb_max=1.5
2560
+ num_agents=-1
2561
+ num_humans=0
2562
+ num_bots=-1
2563
+ start_bot_difficulty=None
2564
+ timelimit=None
2565
+ res_w=128
2566
+ res_h=72
2567
+ wide_aspect_ratio=False
2568
+ eval_env_frameskip=1
2569
+ fps=35
2570
+ command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=4000
2571
+ cli_args={'env': 'doom_health_gathering_supreme', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 4000}
2572
+ git_hash=unknown
2573
+ git_repo_name=not a git repository
2574
+ [2023-11-15 07:32:48,234][00663] Saving configuration to /content/train_dir/default_experiment/config.json...
2575
+ [2023-11-15 07:32:48,237][00663] Rollout worker 0 uses device cpu
2576
+ [2023-11-15 07:32:48,239][00663] Rollout worker 1 uses device cpu
2577
+ [2023-11-15 07:32:48,243][00663] Rollout worker 2 uses device cpu
2578
+ [2023-11-15 07:32:48,244][00663] Rollout worker 3 uses device cpu
2579
+ [2023-11-15 07:32:48,245][00663] Rollout worker 4 uses device cpu
2580
+ [2023-11-15 07:32:48,246][00663] Rollout worker 5 uses device cpu
2581
+ [2023-11-15 07:32:48,251][00663] Rollout worker 6 uses device cpu
2582
+ [2023-11-15 07:32:48,252][00663] Rollout worker 7 uses device cpu
2583
+ [2023-11-15 07:32:48,364][00663] Using GPUs [0] for process 0 (actually maps to GPUs [0])
2584
+ [2023-11-15 07:32:48,367][00663] InferenceWorker_p0-w0: min num requests: 2
2585
+ [2023-11-15 07:32:48,408][00663] Starting all processes...
2586
+ [2023-11-15 07:32:48,410][00663] Starting process learner_proc0
2587
+ [2023-11-15 07:32:48,483][00663] Starting all processes...
2588
+ [2023-11-15 07:32:48,496][00663] Starting process inference_proc0-0
2589
+ [2023-11-15 07:32:48,498][00663] Starting process rollout_proc0
2590
+ [2023-11-15 07:32:48,516][00663] Starting process rollout_proc1
2591
+ [2023-11-15 07:32:48,517][00663] Starting process rollout_proc2
2592
+ [2023-11-15 07:32:48,517][00663] Starting process rollout_proc3
2593
+ [2023-11-15 07:32:48,517][00663] Starting process rollout_proc4
2594
+ [2023-11-15 07:32:48,517][00663] Starting process rollout_proc5
2595
+ [2023-11-15 07:32:48,517][00663] Starting process rollout_proc6
2596
+ [2023-11-15 07:32:48,517][00663] Starting process rollout_proc7
2597
+ [2023-11-15 07:33:05,025][29796] Worker 1 uses CPU cores [1]
2598
+ [2023-11-15 07:33:05,444][29797] Worker 2 uses CPU cores [0]
2599
+ [2023-11-15 07:33:05,498][29798] Worker 3 uses CPU cores [1]
2600
+ [2023-11-15 07:33:05,640][29794] Using GPUs [0] for process 0 (actually maps to GPUs [0])
2601
+ [2023-11-15 07:33:05,641][29794] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
2602
+ [2023-11-15 07:33:05,711][29794] Num visible devices: 1
2603
+ [2023-11-15 07:33:05,741][29802] Worker 7 uses CPU cores [1]
2604
+ [2023-11-15 07:33:05,833][29799] Worker 4 uses CPU cores [0]
2605
+ [2023-11-15 07:33:05,854][29800] Worker 5 uses CPU cores [1]
2606
+ [2023-11-15 07:33:05,873][29781] Using GPUs [0] for process 0 (actually maps to GPUs [0])
2607
+ [2023-11-15 07:33:05,873][29781] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
2608
+ [2023-11-15 07:33:05,903][29795] Worker 0 uses CPU cores [0]
2609
+ [2023-11-15 07:33:05,913][29781] Num visible devices: 1
2610
+ [2023-11-15 07:33:05,915][29781] Starting seed is not provided
2611
+ [2023-11-15 07:33:05,916][29781] Using GPUs [0] for process 0 (actually maps to GPUs [0])
2612
+ [2023-11-15 07:33:05,916][29781] Initializing actor-critic model on device cuda:0
2613
+ [2023-11-15 07:33:05,917][29781] RunningMeanStd input shape: (3, 72, 128)
2614
+ [2023-11-15 07:33:05,918][29781] RunningMeanStd input shape: (1,)
2615
+ [2023-11-15 07:33:05,941][29781] ConvEncoder: input_channels=3
2616
+ [2023-11-15 07:33:05,950][29801] Worker 6 uses CPU cores [0]
2617
+ [2023-11-15 07:33:06,110][29781] Conv encoder output size: 512
2618
+ [2023-11-15 07:33:06,111][29781] Policy head output size: 512
2619
+ [2023-11-15 07:33:06,135][29781] Created Actor Critic model with architecture:
2620
+ [2023-11-15 07:33:06,136][29781] ActorCriticSharedWeights(
2621
+ (obs_normalizer): ObservationNormalizer(
2622
+ (running_mean_std): RunningMeanStdDictInPlace(
2623
+ (running_mean_std): ModuleDict(
2624
+ (obs): RunningMeanStdInPlace()
2625
+ )
2626
+ )
2627
+ )
2628
+ (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
2629
+ (encoder): VizdoomEncoder(
2630
+ (basic_encoder): ConvEncoder(
2631
+ (enc): RecursiveScriptModule(
2632
+ original_name=ConvEncoderImpl
2633
+ (conv_head): RecursiveScriptModule(
2634
+ original_name=Sequential
2635
+ (0): RecursiveScriptModule(original_name=Conv2d)
2636
+ (1): RecursiveScriptModule(original_name=ELU)
2637
+ (2): RecursiveScriptModule(original_name=Conv2d)
2638
+ (3): RecursiveScriptModule(original_name=ELU)
2639
+ (4): RecursiveScriptModule(original_name=Conv2d)
2640
+ (5): RecursiveScriptModule(original_name=ELU)
2641
+ )
2642
+ (mlp_layers): RecursiveScriptModule(
2643
+ original_name=Sequential
2644
+ (0): RecursiveScriptModule(original_name=Linear)
2645
+ (1): RecursiveScriptModule(original_name=ELU)
2646
+ )
2647
+ )
2648
+ )
2649
+ )
2650
+ (core): ModelCoreRNN(
2651
+ (core): GRU(512, 512)
2652
+ )
2653
+ (decoder): MlpDecoder(
2654
+ (mlp): Identity()
2655
+ )
2656
+ (critic_linear): Linear(in_features=512, out_features=1, bias=True)
2657
+ (action_parameterization): ActionParameterizationDefault(
2658
+ (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
2659
+ )
2660
+ )
2661
+ [2023-11-15 07:33:06,412][29781] Using optimizer <class 'torch.optim.adam.Adam'>
2662
+ [2023-11-15 07:33:06,874][29781] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
2663
+ [2023-11-15 07:33:06,915][29781] Loading model from checkpoint
2664
+ [2023-11-15 07:33:06,918][29781] Loaded experiment state at self.train_step=978, self.env_steps=4005888
2665
+ [2023-11-15 07:33:06,919][29781] Initialized policy 0 weights for model version 978
2666
+ [2023-11-15 07:33:06,937][29781] Using GPUs [0] for process 0 (actually maps to GPUs [0])
2667
+ [2023-11-15 07:33:06,946][29781] LearnerWorker_p0 finished initialization!
2668
+ [2023-11-15 07:33:07,279][29794] RunningMeanStd input shape: (3, 72, 128)
2669
+ [2023-11-15 07:33:07,282][29794] RunningMeanStd input shape: (1,)
2670
+ [2023-11-15 07:33:07,302][29794] ConvEncoder: input_channels=3
2671
+ [2023-11-15 07:33:07,473][29794] Conv encoder output size: 512
2672
+ [2023-11-15 07:33:07,476][29794] Policy head output size: 512
2673
+ [2023-11-15 07:33:07,573][00663] Inference worker 0-0 is ready!
2674
+ [2023-11-15 07:33:07,576][00663] All inference workers are ready! Signal rollout workers to start!
2675
+ [2023-11-15 07:33:07,842][29800] Doom resolution: 160x120, resize resolution: (128, 72)
2676
+ [2023-11-15 07:33:07,841][29802] Doom resolution: 160x120, resize resolution: (128, 72)
2677
+ [2023-11-15 07:33:07,845][29796] Doom resolution: 160x120, resize resolution: (128, 72)
2678
+ [2023-11-15 07:33:07,844][29798] Doom resolution: 160x120, resize resolution: (128, 72)
2679
+ [2023-11-15 07:33:07,878][29799] Doom resolution: 160x120, resize resolution: (128, 72)
2680
+ [2023-11-15 07:33:07,880][29795] Doom resolution: 160x120, resize resolution: (128, 72)
2681
+ [2023-11-15 07:33:07,884][29797] Doom resolution: 160x120, resize resolution: (128, 72)
2682
+ [2023-11-15 07:33:07,885][29801] Doom resolution: 160x120, resize resolution: (128, 72)
2683
+ [2023-11-15 07:33:08,353][00663] Heartbeat connected on Batcher_0
2684
+ [2023-11-15 07:33:08,361][00663] Heartbeat connected on LearnerWorker_p0
2685
+ [2023-11-15 07:33:08,413][00663] Heartbeat connected on InferenceWorker_p0-w0
2686
+ [2023-11-15 07:33:09,261][29800] Decorrelating experience for 0 frames...
2687
+ [2023-11-15 07:33:09,346][29799] Decorrelating experience for 0 frames...
2688
+ [2023-11-15 07:33:09,358][29797] Decorrelating experience for 0 frames...
2689
+ [2023-11-15 07:33:09,363][29801] Decorrelating experience for 0 frames...
2690
+ [2023-11-15 07:33:10,434][00663] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 4005888. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
2691
+ [2023-11-15 07:33:10,749][29797] Decorrelating experience for 32 frames...
2692
+ [2023-11-15 07:33:10,751][29801] Decorrelating experience for 32 frames...
2693
+ [2023-11-15 07:33:10,763][29795] Decorrelating experience for 0 frames...
2694
+ [2023-11-15 07:33:11,048][29796] Decorrelating experience for 0 frames...
2695
+ [2023-11-15 07:33:11,096][29802] Decorrelating experience for 0 frames...
2696
+ [2023-11-15 07:33:11,832][29796] Decorrelating experience for 32 frames...
2697
+ [2023-11-15 07:33:12,228][29795] Decorrelating experience for 32 frames...
2698
+ [2023-11-15 07:33:12,576][29801] Decorrelating experience for 64 frames...
2699
+ [2023-11-15 07:33:12,578][29797] Decorrelating experience for 64 frames...
2700
+ [2023-11-15 07:33:13,530][29802] Decorrelating experience for 32 frames...
2701
+ [2023-11-15 07:33:13,738][29795] Decorrelating experience for 64 frames...
2702
+ [2023-11-15 07:33:13,838][29798] Decorrelating experience for 0 frames...
2703
+ [2023-11-15 07:33:13,868][29796] Decorrelating experience for 64 frames...
2704
+ [2023-11-15 07:33:13,867][29801] Decorrelating experience for 96 frames...
2705
+ [2023-11-15 07:33:14,088][00663] Heartbeat connected on RolloutWorker_w6
2706
+ [2023-11-15 07:33:14,903][29797] Decorrelating experience for 96 frames...
2707
+ [2023-11-15 07:33:15,016][29800] Decorrelating experience for 32 frames...
2708
+ [2023-11-15 07:33:15,022][29798] Decorrelating experience for 32 frames...
2709
+ [2023-11-15 07:33:15,038][29795] Decorrelating experience for 96 frames...
2710
+ [2023-11-15 07:33:15,261][00663] Heartbeat connected on RolloutWorker_w2
2711
+ [2023-11-15 07:33:15,433][00663] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4005888. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
2712
+ [2023-11-15 07:33:15,506][00663] Heartbeat connected on RolloutWorker_w0
2713
+ [2023-11-15 07:33:16,199][29796] Decorrelating experience for 96 frames...
2714
+ [2023-11-15 07:33:16,379][29802] Decorrelating experience for 64 frames...
2715
+ [2023-11-15 07:33:16,468][00663] Heartbeat connected on RolloutWorker_w1
2716
+ [2023-11-15 07:33:16,759][29798] Decorrelating experience for 64 frames...
2717
+ [2023-11-15 07:33:17,730][29799] Decorrelating experience for 32 frames...
2718
+ [2023-11-15 07:33:18,946][29800] Decorrelating experience for 64 frames...
2719
+ [2023-11-15 07:33:19,909][29781] Stopping Batcher_0...
2720
+ [2023-11-15 07:33:19,911][29781] Loop batcher_evt_loop terminating...
2721
+ [2023-11-15 07:33:19,915][00663] Component Batcher_0 stopped!
2722
+ [2023-11-15 07:33:19,917][29781] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000979_4009984.pth...
2723
+ [2023-11-15 07:33:19,950][29796] Stopping RolloutWorker_w1...
2724
+ [2023-11-15 07:33:19,950][00663] Component RolloutWorker_w1 stopped!
2725
+ [2023-11-15 07:33:19,966][00663] Component RolloutWorker_w2 stopped!
2726
+ [2023-11-15 07:33:19,971][29796] Loop rollout_proc1_evt_loop terminating...
2727
+ [2023-11-15 07:33:19,966][29797] Stopping RolloutWorker_w2...
2728
+ [2023-11-15 07:33:19,973][00663] Component RolloutWorker_w6 stopped!
2729
+ [2023-11-15 07:33:19,973][29801] Stopping RolloutWorker_w6...
2730
+ [2023-11-15 07:33:19,976][29797] Loop rollout_proc2_evt_loop terminating...
2731
+ [2023-11-15 07:33:19,980][29801] Loop rollout_proc6_evt_loop terminating...
2732
+ [2023-11-15 07:33:19,987][00663] Component RolloutWorker_w0 stopped!
2733
+ [2023-11-15 07:33:19,987][29795] Stopping RolloutWorker_w0...
2734
+ [2023-11-15 07:33:19,990][29795] Loop rollout_proc0_evt_loop terminating...
2735
+ [2023-11-15 07:33:20,012][29794] Weights refcount: 2 0
2736
+ [2023-11-15 07:33:20,021][00663] Component InferenceWorker_p0-w0 stopped!
2737
+ [2023-11-15 07:33:20,023][29794] Stopping InferenceWorker_p0-w0...
2738
+ [2023-11-15 07:33:20,023][29794] Loop inference_proc0-0_evt_loop terminating...
2739
+ [2023-11-15 07:33:20,086][29781] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000941_3854336.pth
2740
+ [2023-11-15 07:33:20,109][29781] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000979_4009984.pth...
2741
+ [2023-11-15 07:33:20,145][29802] Decorrelating experience for 96 frames...
2742
+ [2023-11-15 07:33:20,322][00663] Component LearnerWorker_p0 stopped!
2743
+ [2023-11-15 07:33:20,324][29781] Stopping LearnerWorker_p0...
2744
+ [2023-11-15 07:33:20,326][29781] Loop learner_proc0_evt_loop terminating...
2745
+ [2023-11-15 07:33:20,604][29798] Decorrelating experience for 96 frames...
2746
+ [2023-11-15 07:33:20,931][00663] Component RolloutWorker_w7 stopped!
2747
+ [2023-11-15 07:33:20,934][29802] Stopping RolloutWorker_w7...
2748
+ [2023-11-15 07:33:20,936][29802] Loop rollout_proc7_evt_loop terminating...
2749
+ [2023-11-15 07:33:21,203][00663] Component RolloutWorker_w3 stopped!
2750
+ [2023-11-15 07:33:21,201][29798] Stopping RolloutWorker_w3...
2751
+ [2023-11-15 07:33:21,206][29798] Loop rollout_proc3_evt_loop terminating...
2752
+ [2023-11-15 07:33:22,117][29800] Decorrelating experience for 96 frames...
2753
+ [2023-11-15 07:33:22,120][29799] Decorrelating experience for 64 frames...
2754
+ [2023-11-15 07:33:22,386][00663] Component RolloutWorker_w5 stopped!
2755
+ [2023-11-15 07:33:22,386][29800] Stopping RolloutWorker_w5...
2756
+ [2023-11-15 07:33:22,388][29800] Loop rollout_proc5_evt_loop terminating...
2757
+ [2023-11-15 07:33:23,934][29799] Decorrelating experience for 96 frames...
2758
+ [2023-11-15 07:33:24,200][29799] Stopping RolloutWorker_w4...
2759
+ [2023-11-15 07:33:24,200][00663] Component RolloutWorker_w4 stopped!
2760
+ [2023-11-15 07:33:24,207][29799] Loop rollout_proc4_evt_loop terminating...
2761
+ [2023-11-15 07:33:24,206][00663] Waiting for process learner_proc0 to stop...
2762
+ [2023-11-15 07:33:24,212][00663] Waiting for process inference_proc0-0 to join...
2763
+ [2023-11-15 07:33:24,215][00663] Waiting for process rollout_proc0 to join...
2764
+ [2023-11-15 07:33:24,220][00663] Waiting for process rollout_proc1 to join...
2765
+ [2023-11-15 07:33:24,228][00663] Waiting for process rollout_proc2 to join...
2766
+ [2023-11-15 07:33:24,230][00663] Waiting for process rollout_proc3 to join...
2767
+ [2023-11-15 07:33:24,485][00663] Waiting for process rollout_proc4 to join...
2768
+ [2023-11-15 07:33:25,056][00663] Waiting for process rollout_proc5 to join...
2769
+ [2023-11-15 07:33:25,062][00663] Waiting for process rollout_proc6 to join...
2770
+ [2023-11-15 07:33:25,064][00663] Waiting for process rollout_proc7 to join...
2771
+ [2023-11-15 07:33:25,066][00663] Batcher 0 profile tree view:
2772
+ batching: 0.0184, releasing_batches: 0.0000
2773
+ [2023-11-15 07:33:25,068][00663] InferenceWorker_p0-w0 profile tree view:
2774
+ wait_policy: 0.0038
2775
+ wait_policy_total: 9.5800
2776
+ update_model: 0.0192
2777
+ weight_update: 0.0013
2778
+ one_step: 0.0029
2779
+ handle_policy_step: 2.5433
2780
+ deserialize: 0.0512, stack: 0.0104, obs_to_device_normalize: 0.4439, forward: 1.6464, send_messages: 0.0528
2781
+ prepare_outputs: 0.2688
2782
+ to_cpu: 0.1773
2783
+ [2023-11-15 07:33:25,069][00663] Learner 0 profile tree view:
2784
+ misc: 0.0000, prepare_batch: 1.0413
2785
+ train: 1.5532
2786
+ epoch_init: 0.0000, minibatch_init: 0.0000, losses_postprocess: 0.0002, kl_divergence: 0.0073, after_optimizer: 0.0407
2787
+ calculate_losses: 0.4753
2788
+ losses_init: 0.0000, forward_head: 0.3270, bptt_initial: 0.1047, tail: 0.0067, advantages_returns: 0.0009, losses: 0.0310
2789
+ bptt: 0.0048
2790
+ bptt_forward_core: 0.0047
2791
+ update: 1.0292
2792
+ clip: 0.0481
2793
+ [2023-11-15 07:33:25,070][00663] RolloutWorker_w0 profile tree view:
2794
+ wait_for_trajectories: 0.0009, enqueue_policy_requests: 0.9218, env_step: 2.9765, overhead: 0.0933, complete_rollouts: 0.0458
2795
+ save_policy_outputs: 0.0550
2796
+ split_output_tensors: 0.0273
2797
+ [2023-11-15 07:33:25,071][00663] RolloutWorker_w7 profile tree view:
2798
+ wait_for_trajectories: 0.0003, enqueue_policy_requests: 0.0148
2799
+ [2023-11-15 07:33:25,073][00663] Loop Runner_EvtLoop terminating...
2800
+ [2023-11-15 07:33:25,077][00663] Runner profile tree view:
2801
+ main_loop: 36.6700
2802
+ [2023-11-15 07:33:25,079][00663] Collected {0: 4009984}, FPS: 111.7
2803
+ [2023-11-15 07:33:25,121][00663] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
2804
+ [2023-11-15 07:33:25,124][00663] Overriding arg 'num_workers' with value 1 passed from command line
2805
+ [2023-11-15 07:33:25,128][00663] Adding new argument 'no_render'=True that is not in the saved config file!
2806
+ [2023-11-15 07:33:25,130][00663] Adding new argument 'save_video'=True that is not in the saved config file!
2807
+ [2023-11-15 07:33:25,133][00663] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
2808
+ [2023-11-15 07:33:25,136][00663] Adding new argument 'video_name'=None that is not in the saved config file!
2809
+ [2023-11-15 07:33:25,137][00663] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
2810
+ [2023-11-15 07:33:25,140][00663] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
2811
+ [2023-11-15 07:33:25,142][00663] Adding new argument 'push_to_hub'=False that is not in the saved config file!
2812
+ [2023-11-15 07:33:25,144][00663] Adding new argument 'hf_repository'=None that is not in the saved config file!
2813
+ [2023-11-15 07:33:25,145][00663] Adding new argument 'policy_index'=0 that is not in the saved config file!
2814
+ [2023-11-15 07:33:25,148][00663] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
2815
+ [2023-11-15 07:33:25,149][00663] Adding new argument 'train_script'=None that is not in the saved config file!
2816
+ [2023-11-15 07:33:25,151][00663] Adding new argument 'enjoy_script'=None that is not in the saved config file!
2817
+ [2023-11-15 07:33:25,152][00663] Using frameskip 1 and render_action_repeat=4 for evaluation
2818
+ [2023-11-15 07:33:25,220][00663] RunningMeanStd input shape: (3, 72, 128)
2819
+ [2023-11-15 07:33:25,222][00663] RunningMeanStd input shape: (1,)
2820
+ [2023-11-15 07:33:25,247][00663] ConvEncoder: input_channels=3
2821
+ [2023-11-15 07:33:25,316][00663] Conv encoder output size: 512
2822
+ [2023-11-15 07:33:25,318][00663] Policy head output size: 512
2823
+ [2023-11-15 07:33:25,349][00663] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000979_4009984.pth...
2824
+ [2023-11-15 07:33:26,067][00663] Num frames 100...
2825
+ [2023-11-15 07:33:26,253][00663] Num frames 200...
2826
+ [2023-11-15 07:33:26,468][00663] Num frames 300...
2827
+ [2023-11-15 07:33:26,674][00663] Num frames 400...
2828
+ [2023-11-15 07:33:26,861][00663] Num frames 500...
2829
+ [2023-11-15 07:33:27,055][00663] Num frames 600...
2830
+ [2023-11-15 07:33:27,250][00663] Num frames 700...
2831
+ [2023-11-15 07:33:27,456][00663] Num frames 800...
2832
+ [2023-11-15 07:33:27,651][00663] Num frames 900...
2833
+ [2023-11-15 07:33:27,893][00663] Avg episode rewards: #0: 23.920, true rewards: #0: 9.920
2834
+ [2023-11-15 07:33:27,895][00663] Avg episode reward: 23.920, avg true_objective: 9.920
+ [2023-11-15 07:33:27,914][00663] Num frames 1000...
+ [2023-11-15 07:33:28,107][00663] Num frames 1100...
+ [2023-11-15 07:33:28,309][00663] Num frames 1200...
+ [2023-11-15 07:33:28,515][00663] Num frames 1300...
+ [2023-11-15 07:33:28,702][00663] Num frames 1400...
+ [2023-11-15 07:33:28,894][00663] Num frames 1500...
+ [2023-11-15 07:33:29,032][00663] Num frames 1600...
+ [2023-11-15 07:33:29,158][00663] Num frames 1700...
+ [2023-11-15 07:33:29,287][00663] Num frames 1800...
+ [2023-11-15 07:33:29,482][00663] Avg episode rewards: #0: 21.440, true rewards: #0: 9.440
+ [2023-11-15 07:33:29,484][00663] Avg episode reward: 21.440, avg true_objective: 9.440
+ [2023-11-15 07:33:29,503][00663] Num frames 1900...
+ [2023-11-15 07:33:29,631][00663] Num frames 2000...
+ [2023-11-15 07:33:29,762][00663] Num frames 2100...
+ [2023-11-15 07:33:29,887][00663] Num frames 2200...
+ [2023-11-15 07:33:30,024][00663] Num frames 2300...
+ [2023-11-15 07:33:30,155][00663] Num frames 2400...
+ [2023-11-15 07:33:30,285][00663] Num frames 2500...
+ [2023-11-15 07:33:30,420][00663] Num frames 2600...
+ [2023-11-15 07:33:30,564][00663] Num frames 2700...
+ [2023-11-15 07:33:30,695][00663] Num frames 2800...
+ [2023-11-15 07:33:30,829][00663] Num frames 2900...
+ [2023-11-15 07:33:30,960][00663] Num frames 3000...
+ [2023-11-15 07:33:31,095][00663] Num frames 3100...
+ [2023-11-15 07:33:31,234][00663] Num frames 3200...
+ [2023-11-15 07:33:31,374][00663] Num frames 3300...
+ [2023-11-15 07:33:31,519][00663] Num frames 3400...
+ [2023-11-15 07:33:31,658][00663] Num frames 3500...
+ [2023-11-15 07:33:31,795][00663] Num frames 3600...
+ [2023-11-15 07:33:31,930][00663] Num frames 3700...
+ [2023-11-15 07:33:32,065][00663] Num frames 3800...
+ [2023-11-15 07:33:32,197][00663] Num frames 3900...
+ [2023-11-15 07:33:32,370][00663] Avg episode rewards: #0: 31.960, true rewards: #0: 13.293
+ [2023-11-15 07:33:32,371][00663] Avg episode reward: 31.960, avg true_objective: 13.293
+ [2023-11-15 07:33:32,393][00663] Num frames 4000...
+ [2023-11-15 07:33:32,537][00663] Num frames 4100...
+ [2023-11-15 07:33:32,673][00663] Num frames 4200...
+ [2023-11-15 07:33:32,816][00663] Num frames 4300...
+ [2023-11-15 07:33:32,947][00663] Num frames 4400...
+ [2023-11-15 07:33:33,087][00663] Num frames 4500...
+ [2023-11-15 07:33:33,216][00663] Num frames 4600...
+ [2023-11-15 07:33:33,355][00663] Num frames 4700...
+ [2023-11-15 07:33:33,488][00663] Num frames 4800...
+ [2023-11-15 07:33:33,626][00663] Num frames 4900...
+ [2023-11-15 07:33:33,756][00663] Num frames 5000...
+ [2023-11-15 07:33:33,887][00663] Num frames 5100...
+ [2023-11-15 07:33:34,018][00663] Num frames 5200...
+ [2023-11-15 07:33:34,083][00663] Avg episode rewards: #0: 30.010, true rewards: #0: 13.010
+ [2023-11-15 07:33:34,085][00663] Avg episode reward: 30.010, avg true_objective: 13.010
+ [2023-11-15 07:33:34,224][00663] Num frames 5300...
+ [2023-11-15 07:33:34,361][00663] Num frames 5400...
+ [2023-11-15 07:33:34,491][00663] Num frames 5500...
+ [2023-11-15 07:33:34,629][00663] Num frames 5600...
+ [2023-11-15 07:33:34,763][00663] Num frames 5700...
+ [2023-11-15 07:33:34,880][00663] Avg episode rewards: #0: 26.096, true rewards: #0: 11.496
+ [2023-11-15 07:33:34,884][00663] Avg episode reward: 26.096, avg true_objective: 11.496
+ [2023-11-15 07:33:34,952][00663] Num frames 5800...
+ [2023-11-15 07:33:35,080][00663] Num frames 5900...
+ [2023-11-15 07:33:35,210][00663] Num frames 6000...
+ [2023-11-15 07:33:35,340][00663] Num frames 6100...
+ [2023-11-15 07:33:35,476][00663] Num frames 6200...
+ [2023-11-15 07:33:35,612][00663] Num frames 6300...
+ [2023-11-15 07:33:35,743][00663] Num frames 6400...
+ [2023-11-15 07:33:35,871][00663] Num frames 6500...
+ [2023-11-15 07:33:36,004][00663] Num frames 6600...
+ [2023-11-15 07:33:36,135][00663] Num frames 6700...
+ [2023-11-15 07:33:36,264][00663] Num frames 6800...
+ [2023-11-15 07:33:36,422][00663] Num frames 6900...
+ [2023-11-15 07:33:36,574][00663] Num frames 7000...
+ [2023-11-15 07:33:36,710][00663] Num frames 7100...
+ [2023-11-15 07:33:36,843][00663] Num frames 7200...
+ [2023-11-15 07:33:36,974][00663] Num frames 7300...
+ [2023-11-15 07:33:37,107][00663] Num frames 7400...
+ [2023-11-15 07:33:37,241][00663] Num frames 7500...
+ [2023-11-15 07:33:37,350][00663] Avg episode rewards: #0: 29.233, true rewards: #0: 12.567
+ [2023-11-15 07:33:37,352][00663] Avg episode reward: 29.233, avg true_objective: 12.567
+ [2023-11-15 07:33:37,437][00663] Num frames 7600...
+ [2023-11-15 07:33:37,574][00663] Num frames 7700...
+ [2023-11-15 07:33:37,718][00663] Num frames 7800...
+ [2023-11-15 07:33:37,854][00663] Num frames 7900...
+ [2023-11-15 07:33:37,986][00663] Num frames 8000...
+ [2023-11-15 07:33:38,119][00663] Num frames 8100...
+ [2023-11-15 07:33:38,274][00663] Avg episode rewards: #0: 26.681, true rewards: #0: 11.681
+ [2023-11-15 07:33:38,276][00663] Avg episode reward: 26.681, avg true_objective: 11.681
+ [2023-11-15 07:33:38,308][00663] Num frames 8200...
+ [2023-11-15 07:33:38,442][00663] Num frames 8300...
+ [2023-11-15 07:33:38,574][00663] Num frames 8400...
+ [2023-11-15 07:33:38,714][00663] Num frames 8500...
+ [2023-11-15 07:33:38,845][00663] Num frames 8600...
+ [2023-11-15 07:33:39,018][00663] Num frames 8700...
+ [2023-11-15 07:33:39,222][00663] Num frames 8800...
+ [2023-11-15 07:33:39,416][00663] Num frames 8900...
+ [2023-11-15 07:33:39,610][00663] Num frames 9000...
+ [2023-11-15 07:33:39,815][00663] Num frames 9100...
+ [2023-11-15 07:33:40,007][00663] Num frames 9200...
+ [2023-11-15 07:33:40,204][00663] Num frames 9300...
+ [2023-11-15 07:33:40,403][00663] Num frames 9400...
+ [2023-11-15 07:33:40,602][00663] Num frames 9500...
+ [2023-11-15 07:33:40,802][00663] Num frames 9600...
+ [2023-11-15 07:33:40,988][00663] Num frames 9700...
+ [2023-11-15 07:33:41,179][00663] Num frames 9800...
+ [2023-11-15 07:33:41,380][00663] Num frames 9900...
+ [2023-11-15 07:33:41,455][00663] Avg episode rewards: #0: 29.006, true rewards: #0: 12.381
+ [2023-11-15 07:33:41,457][00663] Avg episode reward: 29.006, avg true_objective: 12.381
+ [2023-11-15 07:33:41,645][00663] Num frames 10000...
+ [2023-11-15 07:33:41,851][00663] Num frames 10100...
+ [2023-11-15 07:33:42,042][00663] Num frames 10200...
+ [2023-11-15 07:33:42,229][00663] Num frames 10300...
+ [2023-11-15 07:33:42,428][00663] Num frames 10400...
+ [2023-11-15 07:33:42,632][00663] Num frames 10500...
+ [2023-11-15 07:33:42,775][00663] Avg episode rewards: #0: 27.494, true rewards: #0: 11.717
+ [2023-11-15 07:33:42,777][00663] Avg episode reward: 27.494, avg true_objective: 11.717
+ [2023-11-15 07:33:42,884][00663] Num frames 10600...
+ [2023-11-15 07:33:43,070][00663] Num frames 10700...
+ [2023-11-15 07:33:43,272][00663] Num frames 10800...
+ [2023-11-15 07:33:43,470][00663] Num frames 10900...
+ [2023-11-15 07:33:43,662][00663] Num frames 11000...
+ [2023-11-15 07:33:43,855][00663] Num frames 11100...
+ [2023-11-15 07:33:44,056][00663] Num frames 11200...
+ [2023-11-15 07:33:44,250][00663] Num frames 11300...
+ [2023-11-15 07:33:44,455][00663] Num frames 11400...
+ [2023-11-15 07:33:44,588][00663] Num frames 11500...
+ [2023-11-15 07:33:44,724][00663] Num frames 11600...
+ [2023-11-15 07:33:44,784][00663] Avg episode rewards: #0: 26.901, true rewards: #0: 11.601
+ [2023-11-15 07:33:44,785][00663] Avg episode reward: 26.901, avg true_objective: 11.601
+ [2023-11-15 07:34:59,120][00663] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
+ [2023-11-15 07:34:59,693][00663] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
+ [2023-11-15 07:34:59,695][00663] Overriding arg 'num_workers' with value 1 passed from command line
+ [2023-11-15 07:34:59,701][00663] Adding new argument 'no_render'=True that is not in the saved config file!
+ [2023-11-15 07:34:59,705][00663] Adding new argument 'save_video'=True that is not in the saved config file!
+ [2023-11-15 07:34:59,707][00663] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
+ [2023-11-15 07:34:59,709][00663] Adding new argument 'video_name'=None that is not in the saved config file!
+ [2023-11-15 07:34:59,711][00663] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
+ [2023-11-15 07:34:59,712][00663] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
+ [2023-11-15 07:34:59,713][00663] Adding new argument 'push_to_hub'=True that is not in the saved config file!
+ [2023-11-15 07:34:59,714][00663] Adding new argument 'hf_repository'='nikxtaco/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
+ [2023-11-15 07:34:59,715][00663] Adding new argument 'policy_index'=0 that is not in the saved config file!
+ [2023-11-15 07:34:59,716][00663] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
+ [2023-11-15 07:34:59,717][00663] Adding new argument 'train_script'=None that is not in the saved config file!
+ [2023-11-15 07:34:59,718][00663] Adding new argument 'enjoy_script'=None that is not in the saved config file!
+ [2023-11-15 07:34:59,720][00663] Using frameskip 1 and render_action_repeat=4 for evaluation
+ [2023-11-15 07:34:59,762][00663] RunningMeanStd input shape: (3, 72, 128)
+ [2023-11-15 07:34:59,764][00663] RunningMeanStd input shape: (1,)
+ [2023-11-15 07:34:59,781][00663] ConvEncoder: input_channels=3
+ [2023-11-15 07:34:59,840][00663] Conv encoder output size: 512
+ [2023-11-15 07:34:59,843][00663] Policy head output size: 512
+ [2023-11-15 07:34:59,871][00663] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000979_4009984.pth...
+ [2023-11-15 07:35:00,607][00663] Num frames 100...
+ [2023-11-15 07:35:00,796][00663] Num frames 200...
+ [2023-11-15 07:35:00,998][00663] Num frames 300...
+ [2023-11-15 07:35:01,191][00663] Num frames 400...
+ [2023-11-15 07:35:01,426][00663] Num frames 500...
+ [2023-11-15 07:35:01,656][00663] Num frames 600...
+ [2023-11-15 07:35:01,850][00663] Num frames 700...
+ [2023-11-15 07:35:02,050][00663] Num frames 800...
+ [2023-11-15 07:35:02,243][00663] Num frames 900...
+ [2023-11-15 07:35:02,469][00663] Num frames 1000...
+ [2023-11-15 07:35:02,674][00663] Num frames 1100...
+ [2023-11-15 07:35:02,874][00663] Num frames 1200...
+ [2023-11-15 07:35:03,080][00663] Num frames 1300...
+ [2023-11-15 07:35:03,287][00663] Num frames 1400...
+ [2023-11-15 07:35:03,506][00663] Num frames 1500...
+ [2023-11-15 07:35:03,706][00663] Num frames 1600...
+ [2023-11-15 07:35:03,903][00663] Num frames 1700...
+ [2023-11-15 07:35:04,156][00663] Num frames 1800...
+ [2023-11-15 07:35:04,390][00663] Num frames 1900...
+ [2023-11-15 07:35:04,621][00663] Num frames 2000...
+ [2023-11-15 07:35:04,859][00663] Num frames 2100...
+ [2023-11-15 07:35:04,912][00663] Avg episode rewards: #0: 54.999, true rewards: #0: 21.000
+ [2023-11-15 07:35:04,914][00663] Avg episode reward: 54.999, avg true_objective: 21.000
+ [2023-11-15 07:35:05,148][00663] Num frames 2200...
+ [2023-11-15 07:35:05,373][00663] Num frames 2300...
+ [2023-11-15 07:35:05,613][00663] Num frames 2400...
+ [2023-11-15 07:35:05,844][00663] Num frames 2500...
+ [2023-11-15 07:35:06,072][00663] Num frames 2600...
+ [2023-11-15 07:35:06,302][00663] Avg episode rewards: #0: 32.380, true rewards: #0: 13.380
+ [2023-11-15 07:35:06,304][00663] Avg episode reward: 32.380, avg true_objective: 13.380
+ [2023-11-15 07:35:06,376][00663] Num frames 2700...
+ [2023-11-15 07:35:06,582][00663] Num frames 2800...
+ [2023-11-15 07:35:06,827][00663] Num frames 2900...
+ [2023-11-15 07:35:07,020][00663] Num frames 3000...
+ [2023-11-15 07:35:07,242][00663] Num frames 3100...
+ [2023-11-15 07:35:07,527][00663] Num frames 3200...
+ [2023-11-15 07:35:07,789][00663] Avg episode rewards: #0: 25.613, true rewards: #0: 10.947
+ [2023-11-15 07:35:07,791][00663] Avg episode reward: 25.613, avg true_objective: 10.947
+ [2023-11-15 07:35:07,830][00663] Num frames 3300...
+ [2023-11-15 07:35:08,071][00663] Num frames 3400...
+ [2023-11-15 07:35:08,307][00663] Num frames 3500...
+ [2023-11-15 07:35:08,537][00663] Num frames 3600...
+ [2023-11-15 07:35:08,795][00663] Num frames 3700...
+ [2023-11-15 07:35:09,056][00663] Num frames 3800...
+ [2023-11-15 07:35:09,237][00663] Avg episode rewards: #0: 21.877, true rewards: #0: 9.628
+ [2023-11-15 07:35:09,239][00663] Avg episode reward: 21.877, avg true_objective: 9.628
+ [2023-11-15 07:35:09,364][00663] Num frames 3900...
+ [2023-11-15 07:35:09,614][00663] Num frames 4000...
+ [2023-11-15 07:35:09,846][00663] Num frames 4100...
+ [2023-11-15 07:35:10,029][00663] Num frames 4200...
+ [2023-11-15 07:35:10,219][00663] Num frames 4300...
+ [2023-11-15 07:35:10,408][00663] Num frames 4400...
+ [2023-11-15 07:35:10,598][00663] Num frames 4500...
+ [2023-11-15 07:35:10,796][00663] Num frames 4600...
+ [2023-11-15 07:35:10,971][00663] Avg episode rewards: #0: 21.102, true rewards: #0: 9.302
+ [2023-11-15 07:35:10,974][00663] Avg episode reward: 21.102, avg true_objective: 9.302
+ [2023-11-15 07:35:11,044][00663] Num frames 4700...
+ [2023-11-15 07:35:11,172][00663] Num frames 4800...
+ [2023-11-15 07:35:11,300][00663] Num frames 4900...
+ [2023-11-15 07:35:11,434][00663] Num frames 5000...
+ [2023-11-15 07:35:11,567][00663] Num frames 5100...
+ [2023-11-15 07:35:11,701][00663] Num frames 5200...
+ [2023-11-15 07:35:11,857][00663] Num frames 5300...
+ [2023-11-15 07:35:11,997][00663] Num frames 5400...
+ [2023-11-15 07:35:12,129][00663] Num frames 5500...
+ [2023-11-15 07:35:12,265][00663] Num frames 5600...
+ [2023-11-15 07:35:12,407][00663] Num frames 5700...
+ [2023-11-15 07:35:12,538][00663] Num frames 5800...
+ [2023-11-15 07:35:12,669][00663] Num frames 5900...
+ [2023-11-15 07:35:12,803][00663] Num frames 6000...
+ [2023-11-15 07:35:12,942][00663] Num frames 6100...
+ [2023-11-15 07:35:13,088][00663] Avg episode rewards: #0: 23.780, true rewards: #0: 10.280
+ [2023-11-15 07:35:13,090][00663] Avg episode reward: 23.780, avg true_objective: 10.280
+ [2023-11-15 07:35:13,138][00663] Num frames 6200...
+ [2023-11-15 07:35:13,277][00663] Num frames 6300...
+ [2023-11-15 07:35:13,410][00663] Num frames 6400...
+ [2023-11-15 07:35:13,544][00663] Num frames 6500...
+ [2023-11-15 07:35:13,675][00663] Num frames 6600...
+ [2023-11-15 07:35:13,803][00663] Num frames 6700...
+ [2023-11-15 07:35:13,958][00663] Avg episode rewards: #0: 22.397, true rewards: #0: 9.683
+ [2023-11-15 07:35:13,960][00663] Avg episode reward: 22.397, avg true_objective: 9.683
+ [2023-11-15 07:35:13,992][00663] Num frames 6800...
+ [2023-11-15 07:35:14,122][00663] Num frames 6900...
+ [2023-11-15 07:35:14,254][00663] Num frames 7000...
+ [2023-11-15 07:35:14,393][00663] Num frames 7100...
+ [2023-11-15 07:35:14,523][00663] Num frames 7200...
+ [2023-11-15 07:35:14,656][00663] Num frames 7300...
+ [2023-11-15 07:35:14,801][00663] Num frames 7400...
+ [2023-11-15 07:35:14,934][00663] Num frames 7500...
+ [2023-11-15 07:35:15,073][00663] Num frames 7600...
+ [2023-11-15 07:35:15,204][00663] Num frames 7700...
+ [2023-11-15 07:35:15,340][00663] Num frames 7800...
+ [2023-11-15 07:35:15,482][00663] Num frames 7900...
+ [2023-11-15 07:35:15,620][00663] Num frames 8000...
+ [2023-11-15 07:35:15,761][00663] Num frames 8100...
+ [2023-11-15 07:35:15,905][00663] Num frames 8200...
+ [2023-11-15 07:35:16,087][00663] Avg episode rewards: #0: 24.602, true rewards: #0: 10.352
+ [2023-11-15 07:35:16,089][00663] Avg episode reward: 24.602, avg true_objective: 10.352
+ [2023-11-15 07:35:16,116][00663] Num frames 8300...
+ [2023-11-15 07:35:16,249][00663] Num frames 8400...
+ [2023-11-15 07:35:16,384][00663] Num frames 8500...
+ [2023-11-15 07:35:16,517][00663] Num frames 8600...
+ [2023-11-15 07:35:16,655][00663] Num frames 8700...
+ [2023-11-15 07:35:16,795][00663] Num frames 8800...
+ [2023-11-15 07:35:16,926][00663] Num frames 8900...
+ [2023-11-15 07:35:17,111][00663] Avg episode rewards: #0: 23.095, true rewards: #0: 9.984
+ [2023-11-15 07:35:17,113][00663] Avg episode reward: 23.095, avg true_objective: 9.984
+ [2023-11-15 07:35:17,137][00663] Num frames 9000...
+ [2023-11-15 07:35:17,273][00663] Num frames 9100...
+ [2023-11-15 07:35:17,411][00663] Num frames 9200...
+ [2023-11-15 07:35:17,543][00663] Num frames 9300...
+ [2023-11-15 07:35:17,676][00663] Num frames 9400...
+ [2023-11-15 07:35:17,809][00663] Num frames 9500...
+ [2023-11-15 07:35:17,940][00663] Num frames 9600...
+ [2023-11-15 07:35:18,077][00663] Num frames 9700...
+ [2023-11-15 07:35:18,206][00663] Num frames 9800...
+ [2023-11-15 07:35:18,340][00663] Num frames 9900...
+ [2023-11-15 07:35:18,478][00663] Num frames 10000...
+ [2023-11-15 07:35:18,611][00663] Num frames 10100...
+ [2023-11-15 07:35:18,742][00663] Num frames 10200...
+ [2023-11-15 07:35:18,871][00663] Num frames 10300...
+ [2023-11-15 07:35:18,998][00663] Num frames 10400...
+ [2023-11-15 07:35:19,105][00663] Avg episode rewards: #0: 24.032, true rewards: #0: 10.432
+ [2023-11-15 07:35:19,107][00663] Avg episode reward: 24.032, avg true_objective: 10.432
+ [2023-11-15 07:36:27,462][00663] Replay video saved to /content/train_dir/default_experiment/replay.mp4!