[2023-11-15 07:01:43,246][00663] Saving configuration to /content/train_dir/default_experiment/config.json...
[2023-11-15 07:01:43,251][00663] Rollout worker 0 uses device cpu
[2023-11-15 07:01:43,255][00663] Rollout worker 1 uses device cpu
[2023-11-15 07:01:43,256][00663] Rollout worker 2 uses device cpu
[2023-11-15 07:01:43,258][00663] Rollout worker 3 uses device cpu
[2023-11-15 07:01:43,262][00663] Rollout worker 4 uses device cpu
[2023-11-15 07:01:43,264][00663] Rollout worker 5 uses device cpu
[2023-11-15 07:01:43,265][00663] Rollout worker 6 uses device cpu
[2023-11-15 07:01:43,267][00663] Rollout worker 7 uses device cpu
[2023-11-15 07:01:43,416][00663] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-11-15 07:01:43,417][00663] InferenceWorker_p0-w0: min num requests: 2
[2023-11-15 07:01:43,448][00663] Starting all processes...
[2023-11-15 07:01:43,450][00663] Starting process learner_proc0
[2023-11-15 07:01:43,501][00663] Starting all processes...
[2023-11-15 07:01:43,510][00663] Starting process inference_proc0-0
[2023-11-15 07:01:43,510][00663] Starting process rollout_proc0
[2023-11-15 07:01:43,512][00663] Starting process rollout_proc1
[2023-11-15 07:01:43,513][00663] Starting process rollout_proc2
[2023-11-15 07:01:43,514][00663] Starting process rollout_proc3
[2023-11-15 07:01:43,514][00663] Starting process rollout_proc4
[2023-11-15 07:01:43,514][00663] Starting process rollout_proc5
[2023-11-15 07:01:43,514][00663] Starting process rollout_proc6
[2023-11-15 07:01:43,514][00663] Starting process rollout_proc7
[2023-11-15 07:02:00,834][10761] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-11-15 07:02:00,838][10761] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2023-11-15 07:02:00,910][10761] Num visible devices: 1
[2023-11-15 07:02:00,941][10761] Starting seed is not provided
[2023-11-15 07:02:00,941][10761] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-11-15 07:02:00,941][10761] Initializing actor-critic model on device cuda:0
[2023-11-15 07:02:00,943][10761] RunningMeanStd input shape: (3, 72, 128)
[2023-11-15 07:02:00,948][10761] RunningMeanStd input shape: (1,)
[2023-11-15 07:02:00,987][10780] Worker 1 uses CPU cores [1]
[2023-11-15 07:02:01,005][10761] ConvEncoder: input_channels=3
[2023-11-15 07:02:01,011][10779] Worker 0 uses CPU cores [0]
[2023-11-15 07:02:01,024][10778] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-11-15 07:02:01,027][10778] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2023-11-15 07:02:01,056][10778] Num visible devices: 1
[2023-11-15 07:02:01,127][10785] Worker 6 uses CPU cores [0]
[2023-11-15 07:02:01,251][10786] Worker 7 uses CPU cores [1]
[2023-11-15 07:02:01,290][10781] Worker 3 uses CPU cores [1]
[2023-11-15 07:02:01,334][10783] Worker 4 uses CPU cores [0]
[2023-11-15 07:02:01,373][10782] Worker 2 uses CPU cores [0]
[2023-11-15 07:02:01,383][10784] Worker 5 uses CPU cores [1]
[2023-11-15 07:02:01,422][10761] Conv encoder output size: 512
[2023-11-15 07:02:01,422][10761] Policy head output size: 512
[2023-11-15 07:02:01,475][10761] Created Actor Critic model with architecture:
[2023-11-15 07:02:01,475][10761] ActorCriticSharedWeights(
(obs_normalizer): ObservationNormalizer(
(running_mean_std): RunningMeanStdDictInPlace(
(running_mean_std): ModuleDict(
(obs): RunningMeanStdInPlace()
)
)
)
(returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
(encoder): VizdoomEncoder(
(basic_encoder): ConvEncoder(
(enc): RecursiveScriptModule(
original_name=ConvEncoderImpl
(conv_head): RecursiveScriptModule(
original_name=Sequential
(0): RecursiveScriptModule(original_name=Conv2d)
(1): RecursiveScriptModule(original_name=ELU)
(2): RecursiveScriptModule(original_name=Conv2d)
(3): RecursiveScriptModule(original_name=ELU)
(4): RecursiveScriptModule(original_name=Conv2d)
(5): RecursiveScriptModule(original_name=ELU)
)
(mlp_layers): RecursiveScriptModule(
original_name=Sequential
(0): RecursiveScriptModule(original_name=Linear)
(1): RecursiveScriptModule(original_name=ELU)
)
)
)
)
(core): ModelCoreRNN(
(core): GRU(512, 512)
)
(decoder): MlpDecoder(
(mlp): Identity()
)
(critic_linear): Linear(in_features=512, out_features=1, bias=True)
(action_parameterization): ActionParameterizationDefault(
(distribution_linear): Linear(in_features=512, out_features=5, bias=True)
)
)
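
The module tree printed above can be mirrored almost line for line in plain PyTorch. Below is a minimal sketch, not the library's implementation: the log only shows layer types, so the channel counts and kernel sizes are assumptions taken from sample-factory's convnet_simple defaults (the config later shows encoder_conv_architecture=convnet_simple), and all class and variable names are illustrative.

    import torch
    import torch.nn as nn

    class SketchActorCritic(nn.Module):
        """Illustrative stand-in for the ActorCriticSharedWeights printout above."""
        def __init__(self, num_actions: int = 5, hidden: int = 512):
            super().__init__()
            # conv_head: three Conv2d+ELU pairs; channel/kernel values are assumed
            # (convnet_simple defaults), the log only prints the layer types
            self.conv_head = nn.Sequential(
                nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ELU(),
                nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ELU(),
                nn.Conv2d(64, 128, kernel_size=3, stride=2), nn.ELU(),
            )
            # mlp_layers: Linear+ELU projecting the flattened conv output to 512,
            # matching "Conv encoder output size: 512" in the log
            self.mlp = nn.Sequential(nn.Flatten(), nn.LazyLinear(hidden), nn.ELU())
            self.core = nn.GRU(hidden, hidden)                    # (core): GRU(512, 512)
            self.critic_linear = nn.Linear(hidden, 1)             # value head
            self.distribution_linear = nn.Linear(hidden, num_actions)  # 5 discrete actions

        def forward(self, obs, rnn_state):
            # obs: (batch, 3, 72, 128), matching the RunningMeanStd input shape above
            x = self.mlp(self.conv_head(obs)).unsqueeze(0)        # add time dim for the GRU
            x, rnn_state = self.core(x, rnn_state)
            x = x.squeeze(0)
            return self.distribution_linear(x), self.critic_linear(x), rnn_state
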
[2023-11-15 07:02:01,853][10761] Using optimizer <class 'torch.optim.adam.Adam'>
[2023-11-15 07:02:02,202][10761] No checkpoints found
[2023-11-15 07:02:02,203][10761] Did not load from checkpoint, starting from scratch!
[2023-11-15 07:02:02,203][10761] Initialized policy 0 weights for model version 0
[2023-11-15 07:02:02,207][10761] LearnerWorker_p0 finished initialization!
[2023-11-15 07:02:02,208][10761] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-11-15 07:02:02,385][10778] RunningMeanStd input shape: (3, 72, 128)
[2023-11-15 07:02:02,386][10778] RunningMeanStd input shape: (1,)
[2023-11-15 07:02:02,398][10778] ConvEncoder: input_channels=3
[2023-11-15 07:02:02,495][10778] Conv encoder output size: 512
[2023-11-15 07:02:02,495][10778] Policy head output size: 512
[2023-11-15 07:02:02,555][00663] Inference worker 0-0 is ready!
[2023-11-15 07:02:02,556][00663] All inference workers are ready! Signal rollout workers to start!
[2023-11-15 07:02:02,758][10783] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-11-15 07:02:02,759][10782] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-11-15 07:02:02,760][10779] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-11-15 07:02:02,761][10785] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-11-15 07:02:02,770][10784] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-11-15 07:02:02,763][10781] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-11-15 07:02:02,772][10780] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-11-15 07:02:02,773][10786] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-11-15 07:02:03,403][00663] Heartbeat connected on Batcher_0
[2023-11-15 07:02:03,409][00663] Heartbeat connected on LearnerWorker_p0
[2023-11-15 07:02:03,460][00663] Heartbeat connected on InferenceWorker_p0-w0
[2023-11-15 07:02:03,770][10780] Decorrelating experience for 0 frames...
[2023-11-15 07:02:03,769][10784] Decorrelating experience for 0 frames...
[2023-11-15 07:02:04,070][10782] Decorrelating experience for 0 frames...
[2023-11-15 07:02:04,076][10783] Decorrelating experience for 0 frames...
[2023-11-15 07:02:04,079][10785] Decorrelating experience for 0 frames...
[2023-11-15 07:02:04,923][10780] Decorrelating experience for 32 frames...
[2023-11-15 07:02:04,927][10784] Decorrelating experience for 32 frames...
[2023-11-15 07:02:04,992][10781] Decorrelating experience for 0 frames...
[2023-11-15 07:02:05,650][10782] Decorrelating experience for 32 frames...
[2023-11-15 07:02:05,654][10783] Decorrelating experience for 32 frames...
[2023-11-15 07:02:05,705][10779] Decorrelating experience for 0 frames...
[2023-11-15 07:02:06,206][00663] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-11-15 07:02:06,845][10786] Decorrelating experience for 0 frames...
[2023-11-15 07:02:07,400][10784] Decorrelating experience for 64 frames...
[2023-11-15 07:02:07,555][10780] Decorrelating experience for 64 frames...
[2023-11-15 07:02:07,628][10785] Decorrelating experience for 32 frames...
[2023-11-15 07:02:07,692][10779] Decorrelating experience for 32 frames...
[2023-11-15 07:02:08,073][10783] Decorrelating experience for 64 frames...
[2023-11-15 07:02:08,089][10782] Decorrelating experience for 64 frames...
[2023-11-15 07:02:09,560][10785] Decorrelating experience for 64 frames...
[2023-11-15 07:02:09,642][10779] Decorrelating experience for 64 frames...
[2023-11-15 07:02:09,771][10781] Decorrelating experience for 32 frames...
[2023-11-15 07:02:09,855][10784] Decorrelating experience for 96 frames...
[2023-11-15 07:02:09,900][10782] Decorrelating experience for 96 frames...
[2023-11-15 07:02:10,156][10786] Decorrelating experience for 32 frames...
[2023-11-15 07:02:10,296][00663] Heartbeat connected on RolloutWorker_w2
[2023-11-15 07:02:10,303][00663] Heartbeat connected on RolloutWorker_w5
[2023-11-15 07:02:11,207][00663] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-11-15 07:02:11,442][10783] Decorrelating experience for 96 frames...
[2023-11-15 07:02:11,533][10785] Decorrelating experience for 96 frames...
[2023-11-15 07:02:11,565][10780] Decorrelating experience for 96 frames...
[2023-11-15 07:02:11,747][00663] Heartbeat connected on RolloutWorker_w4
[2023-11-15 07:02:11,837][00663] Heartbeat connected on RolloutWorker_w6
[2023-11-15 07:02:11,969][00663] Heartbeat connected on RolloutWorker_w1
[2023-11-15 07:02:12,639][10786] Decorrelating experience for 64 frames...
[2023-11-15 07:02:14,290][10779] Decorrelating experience for 96 frames...
[2023-11-15 07:02:14,776][10781] Decorrelating experience for 64 frames...
[2023-11-15 07:02:15,186][00663] Heartbeat connected on RolloutWorker_w0
[2023-11-15 07:02:16,203][00663] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 120.8. Samples: 1208. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-11-15 07:02:16,206][00663] Avg episode reward: [(0, '2.474')]
[2023-11-15 07:02:16,576][10786] Decorrelating experience for 96 frames...
[2023-11-15 07:02:17,569][00663] Heartbeat connected on RolloutWorker_w7
[2023-11-15 07:02:17,852][10761] Signal inference workers to stop experience collection...
[2023-11-15 07:02:17,922][10778] InferenceWorker_p0-w0: stopping experience collection
[2023-11-15 07:02:18,028][10781] Decorrelating experience for 96 frames...
[2023-11-15 07:02:18,105][00663] Heartbeat connected on RolloutWorker_w3
[2023-11-15 07:02:18,723][10761] Signal inference workers to resume experience collection...
[2023-11-15 07:02:18,724][10778] InferenceWorker_p0-w0: resuming experience collection
[2023-11-15 07:02:20,186][10761] Stopping Batcher_0...
[2023-11-15 07:02:20,187][10761] Loop batcher_evt_loop terminating...
[2023-11-15 07:02:20,187][00663] Component Batcher_0 stopped!
[2023-11-15 07:02:20,196][10761] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000002_8192.pth...
[2023-11-15 07:02:20,229][00663] Component RolloutWorker_w1 stopped!
[2023-11-15 07:02:20,228][10780] Stopping RolloutWorker_w1...
[2023-11-15 07:02:20,234][00663] Component RolloutWorker_w5 stopped!
[2023-11-15 07:02:20,240][10781] Stopping RolloutWorker_w3...
[2023-11-15 07:02:20,242][10781] Loop rollout_proc3_evt_loop terminating...
[2023-11-15 07:02:20,241][00663] Component RolloutWorker_w3 stopped!
[2023-11-15 07:02:20,234][10784] Stopping RolloutWorker_w5...
[2023-11-15 07:02:20,235][10780] Loop rollout_proc1_evt_loop terminating...
[2023-11-15 07:02:20,251][00663] Component RolloutWorker_w7 stopped!
[2023-11-15 07:02:20,250][10786] Stopping RolloutWorker_w7...
[2023-11-15 07:02:20,251][10784] Loop rollout_proc5_evt_loop terminating...
[2023-11-15 07:02:20,254][10786] Loop rollout_proc7_evt_loop terminating...
[2023-11-15 07:02:20,269][10778] Weights refcount: 2 0
[2023-11-15 07:02:20,270][10782] Stopping RolloutWorker_w2...
[2023-11-15 07:02:20,270][00663] Component RolloutWorker_w2 stopped!
[2023-11-15 07:02:20,272][10778] Stopping InferenceWorker_p0-w0...
[2023-11-15 07:02:20,273][10778] Loop inference_proc0-0_evt_loop terminating...
[2023-11-15 07:02:20,273][00663] Component InferenceWorker_p0-w0 stopped!
[2023-11-15 07:02:20,286][10782] Loop rollout_proc2_evt_loop terminating...
[2023-11-15 07:02:20,295][00663] Component RolloutWorker_w6 stopped!
[2023-11-15 07:02:20,303][00663] Component RolloutWorker_w0 stopped!
[2023-11-15 07:02:20,295][10785] Stopping RolloutWorker_w6...
[2023-11-15 07:02:20,303][10779] Stopping RolloutWorker_w0...
[2023-11-15 07:02:20,309][10785] Loop rollout_proc6_evt_loop terminating...
[2023-11-15 07:02:20,310][10779] Loop rollout_proc0_evt_loop terminating...
[2023-11-15 07:02:20,321][00663] Component RolloutWorker_w4 stopped!
[2023-11-15 07:02:20,321][10783] Stopping RolloutWorker_w4...
[2023-11-15 07:02:20,327][10783] Loop rollout_proc4_evt_loop terminating...
[2023-11-15 07:02:20,370][10761] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000002_8192.pth...
[2023-11-15 07:02:20,586][00663] Component LearnerWorker_p0 stopped!
[2023-11-15 07:02:20,588][00663] Waiting for process learner_proc0 to stop...
[2023-11-15 07:02:20,586][10761] Stopping LearnerWorker_p0...
[2023-11-15 07:02:20,591][10761] Loop learner_proc0_evt_loop terminating...
[2023-11-15 07:02:22,053][00663] Waiting for process inference_proc0-0 to join...
[2023-11-15 07:02:22,102][00663] Waiting for process rollout_proc0 to join...
[2023-11-15 07:02:24,502][00663] Waiting for process rollout_proc1 to join...
[2023-11-15 07:02:24,604][00663] Waiting for process rollout_proc2 to join...
[2023-11-15 07:02:24,606][00663] Waiting for process rollout_proc3 to join...
[2023-11-15 07:02:24,609][00663] Waiting for process rollout_proc4 to join...
[2023-11-15 07:02:24,612][00663] Waiting for process rollout_proc5 to join...
[2023-11-15 07:02:24,615][00663] Waiting for process rollout_proc6 to join...
[2023-11-15 07:02:24,620][00663] Waiting for process rollout_proc7 to join...
[2023-11-15 07:02:24,623][00663] Batcher 0 profile tree view:
batching: 0.0629, releasing_batches: 0.0004
[2023-11-15 07:02:24,625][00663] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0000
wait_policy_total: 11.1031
update_model: 0.0531
weight_update: 0.0035
one_step: 0.0117
handle_policy_step: 5.3752
deserialize: 0.0825, stack: 0.0142, obs_to_device_normalize: 0.7806, forward: 3.7947, send_messages: 0.1665
prepare_outputs: 0.4019
to_cpu: 0.1972
[2023-11-15 07:02:24,627][00663] Learner 0 profile tree view:
misc: 0.0000, prepare_batch: 3.1931
train: 1.9676
epoch_init: 0.0000, minibatch_init: 0.0000, losses_postprocess: 0.0005, kl_divergence: 0.0067, after_optimizer: 0.0663
calculate_losses: 0.6155
losses_init: 0.0000, forward_head: 0.3481, bptt_initial: 0.1651, tail: 0.0300, advantages_returns: 0.0021, losses: 0.0578
bptt: 0.0118
bptt_forward_core: 0.0117
update: 1.2776
clip: 0.0777
[2023-11-15 07:02:24,629][00663] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.0032, enqueue_policy_requests: 0.6631, env_step: 2.7787, overhead: 0.0796, complete_rollouts: 0.0091
save_policy_outputs: 0.0847
split_output_tensors: 0.0449
[2023-11-15 07:02:24,632][00663] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.0005, enqueue_policy_requests: 0.1984, env_step: 1.3664, overhead: 0.0326, complete_rollouts: 0.0003
save_policy_outputs: 0.0129
split_output_tensors: 0.0069
[2023-11-15 07:02:24,636][00663] Loop Runner_EvtLoop terminating...
[2023-11-15 07:02:24,638][00663] Runner profile tree view:
main_loop: 41.1905
[2023-11-15 07:02:24,640][00663] Collected {0: 8192}, FPS: 198.9
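
A quick consistency check on the summary above, using only values that appear in the log (batch_size=1024 and env_frameskip=4 are shown in the configuration dump further down):

    env frames per train step = batch_size × env_frameskip = 1024 × 4 = 4096
    checkpoint_000000002_8192.pth  →  policy version 2, 2 × 4096 = 8192 env frames
    reported FPS = 8192 frames / 41.1905 s (main_loop) ≈ 198.9
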
[2023-11-15 07:02:45,729][00663] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2023-11-15 07:02:45,730][00663] Overriding arg 'num_workers' with value 1 passed from command line
[2023-11-15 07:02:45,734][00663] Adding new argument 'no_render'=True that is not in the saved config file!
[2023-11-15 07:02:45,739][00663] Adding new argument 'save_video'=True that is not in the saved config file!
[2023-11-15 07:02:45,742][00663] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2023-11-15 07:02:45,743][00663] Adding new argument 'video_name'=None that is not in the saved config file!
[2023-11-15 07:02:45,749][00663] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
[2023-11-15 07:02:45,750][00663] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2023-11-15 07:02:45,751][00663] Adding new argument 'push_to_hub'=False that is not in the saved config file!
[2023-11-15 07:02:45,753][00663] Adding new argument 'hf_repository'=None that is not in the saved config file!
[2023-11-15 07:02:45,755][00663] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2023-11-15 07:02:45,756][00663] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2023-11-15 07:02:45,757][00663] Adding new argument 'train_script'=None that is not in the saved config file!
[2023-11-15 07:02:45,758][00663] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2023-11-15 07:02:45,760][00663] Using frameskip 1 and render_action_repeat=4 for evaluation
[2023-11-15 07:02:45,797][00663] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-11-15 07:02:45,801][00663] RunningMeanStd input shape: (3, 72, 128)
[2023-11-15 07:02:45,806][00663] RunningMeanStd input shape: (1,)
[2023-11-15 07:02:45,821][00663] ConvEncoder: input_channels=3
[2023-11-15 07:02:45,925][00663] Conv encoder output size: 512
[2023-11-15 07:02:45,926][00663] Policy head output size: 512
[2023-11-15 07:02:53,558][00663] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000002_8192.pth...
[2023-11-15 07:02:57,671][00663] Num frames 100...
[2023-11-15 07:02:57,861][00663] Num frames 200...
[2023-11-15 07:02:58,054][00663] Num frames 300...
[2023-11-15 07:02:58,250][00663] Num frames 400...
[2023-11-15 07:02:58,398][00663] Avg episode rewards: #0: 5.480, true rewards: #0: 4.480
[2023-11-15 07:02:58,401][00663] Avg episode reward: 5.480, avg true_objective: 4.480
[2023-11-15 07:02:58,505][00663] Num frames 500...
[2023-11-15 07:02:58,692][00663] Num frames 600...
[2023-11-15 07:02:58,874][00663] Num frames 700...
[2023-11-15 07:02:59,057][00663] Num frames 800...
[2023-11-15 07:02:59,155][00663] Avg episode rewards: #0: 4.660, true rewards: #0: 4.160
[2023-11-15 07:02:59,156][00663] Avg episode reward: 4.660, avg true_objective: 4.160
[2023-11-15 07:02:59,251][00663] Num frames 900...
[2023-11-15 07:02:59,388][00663] Num frames 1000...
[2023-11-15 07:02:59,534][00663] Num frames 1100...
[2023-11-15 07:02:59,667][00663] Num frames 1200...
[2023-11-15 07:02:59,826][00663] Avg episode rewards: #0: 4.933, true rewards: #0: 4.267
[2023-11-15 07:02:59,828][00663] Avg episode reward: 4.933, avg true_objective: 4.267
[2023-11-15 07:02:59,858][00663] Num frames 1300...
[2023-11-15 07:02:59,983][00663] Num frames 1400...
[2023-11-15 07:03:00,112][00663] Num frames 1500...
[2023-11-15 07:03:00,241][00663] Num frames 1600...
[2023-11-15 07:03:00,367][00663] Num frames 1700...
[2023-11-15 07:03:00,462][00663] Avg episode rewards: #0: 5.070, true rewards: #0: 4.320
[2023-11-15 07:03:00,463][00663] Avg episode reward: 5.070, avg true_objective: 4.320
[2023-11-15 07:03:00,561][00663] Num frames 1800...
[2023-11-15 07:03:00,688][00663] Num frames 1900...
[2023-11-15 07:03:00,815][00663] Num frames 2000...
[2023-11-15 07:03:00,939][00663] Num frames 2100...
[2023-11-15 07:03:01,047][00663] Avg episode rewards: #0: 4.888, true rewards: #0: 4.288
[2023-11-15 07:03:01,050][00663] Avg episode reward: 4.888, avg true_objective: 4.288
[2023-11-15 07:03:01,123][00663] Num frames 2200...
[2023-11-15 07:03:01,253][00663] Num frames 2300...
[2023-11-15 07:03:01,386][00663] Num frames 2400...
[2023-11-15 07:03:01,514][00663] Num frames 2500...
[2023-11-15 07:03:01,698][00663] Avg episode rewards: #0: 4.987, true rewards: #0: 4.320
[2023-11-15 07:03:01,700][00663] Avg episode reward: 4.987, avg true_objective: 4.320
[2023-11-15 07:03:01,718][00663] Num frames 2600...
[2023-11-15 07:03:01,862][00663] Num frames 2700...
[2023-11-15 07:03:01,996][00663] Num frames 2800...
[2023-11-15 07:03:02,124][00663] Num frames 2900...
[2023-11-15 07:03:02,278][00663] Avg episode rewards: #0: 4.823, true rewards: #0: 4.251
[2023-11-15 07:03:02,279][00663] Avg episode reward: 4.823, avg true_objective: 4.251
[2023-11-15 07:03:02,313][00663] Num frames 3000...
[2023-11-15 07:03:02,439][00663] Num frames 3100...
[2023-11-15 07:03:02,585][00663] Num frames 3200...
[2023-11-15 07:03:02,713][00663] Num frames 3300...
[2023-11-15 07:03:02,845][00663] Avg episode rewards: #0: 4.700, true rewards: #0: 4.200
[2023-11-15 07:03:02,846][00663] Avg episode reward: 4.700, avg true_objective: 4.200
[2023-11-15 07:03:02,903][00663] Num frames 3400...
[2023-11-15 07:03:03,028][00663] Num frames 3500...
[2023-11-15 07:03:03,164][00663] Num frames 3600...
[2023-11-15 07:03:03,291][00663] Num frames 3700...
[2023-11-15 07:03:03,401][00663] Avg episode rewards: #0: 4.604, true rewards: #0: 4.160
[2023-11-15 07:03:03,402][00663] Avg episode reward: 4.604, avg true_objective: 4.160
[2023-11-15 07:03:03,475][00663] Num frames 3800...
[2023-11-15 07:03:03,606][00663] Num frames 3900...
[2023-11-15 07:03:03,733][00663] Num frames 4000...
[2023-11-15 07:03:03,860][00663] Num frames 4100...
[2023-11-15 07:03:03,954][00663] Avg episode rewards: #0: 4.528, true rewards: #0: 4.128
[2023-11-15 07:03:03,955][00663] Avg episode reward: 4.528, avg true_objective: 4.128
[2023-11-15 07:03:30,284][00663] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
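
This evaluation pass (no_render, save_video, 10 episodes) corresponds to sample-factory's enjoy entry point. A minimal sketch of the call follows; parse_vizdoom_cfg is assumed to be a notebook-side helper that registers the Doom environments and encoder and parses the arguments, not a library API:

    from sample_factory.enjoy import enjoy

    # evaluation=True triggers the "Adding new argument ..." overrides logged above
    cfg = parse_vizdoom_cfg(
        argv=[
            "--env=doom_health_gathering_supreme",
            "--num_workers=1",
            "--save_video",
            "--no_render",
            "--max_num_episodes=10",
        ],
        evaluation=True,
    )
    status = enjoy(cfg)  # loads the latest checkpoint, plays 10 episodes, saves replay.mp4
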
[2023-11-15 07:03:44,458][00663] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2023-11-15 07:03:44,463][00663] Overriding arg 'num_workers' with value 1 passed from command line
[2023-11-15 07:03:44,465][00663] Adding new argument 'no_render'=True that is not in the saved config file!
[2023-11-15 07:03:44,467][00663] Adding new argument 'save_video'=True that is not in the saved config file!
[2023-11-15 07:03:44,473][00663] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2023-11-15 07:03:44,474][00663] Adding new argument 'video_name'=None that is not in the saved config file!
[2023-11-15 07:03:44,476][00663] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2023-11-15 07:03:44,477][00663] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2023-11-15 07:03:44,478][00663] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2023-11-15 07:03:44,484][00663] Adding new argument 'hf_repository'='nikxtaco/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
[2023-11-15 07:03:44,485][00663] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2023-11-15 07:03:44,486][00663] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2023-11-15 07:03:44,487][00663] Adding new argument 'train_script'=None that is not in the saved config file!
[2023-11-15 07:03:44,488][00663] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2023-11-15 07:03:44,489][00663] Using frameskip 1 and render_action_repeat=4 for evaluation
[2023-11-15 07:03:44,548][00663] RunningMeanStd input shape: (3, 72, 128)
[2023-11-15 07:03:44,553][00663] RunningMeanStd input shape: (1,)
[2023-11-15 07:03:44,575][00663] ConvEncoder: input_channels=3
[2023-11-15 07:03:44,652][00663] Conv encoder output size: 512
[2023-11-15 07:03:44,655][00663] Policy head output size: 512
[2023-11-15 07:03:44,682][00663] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000002_8192.pth...
[2023-11-15 07:03:45,285][00663] Num frames 100...
[2023-11-15 07:03:45,421][00663] Num frames 200...
[2023-11-15 07:03:45,563][00663] Num frames 300...
[2023-11-15 07:03:45,734][00663] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840
[2023-11-15 07:03:45,737][00663] Avg episode reward: 3.840, avg true_objective: 3.840
[2023-11-15 07:03:45,763][00663] Num frames 400...
[2023-11-15 07:03:45,894][00663] Num frames 500...
[2023-11-15 07:03:46,058][00663] Num frames 600...
[2023-11-15 07:03:46,234][00663] Num frames 700...
[2023-11-15 07:03:46,379][00663] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840
[2023-11-15 07:03:46,380][00663] Avg episode reward: 3.840, avg true_objective: 3.840
[2023-11-15 07:03:46,428][00663] Num frames 800...
[2023-11-15 07:03:46,552][00663] Num frames 900...
[2023-11-15 07:03:46,685][00663] Num frames 1000...
[2023-11-15 07:03:46,815][00663] Num frames 1100...
[2023-11-15 07:03:46,939][00663] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840
[2023-11-15 07:03:46,942][00663] Avg episode reward: 3.840, avg true_objective: 3.840
[2023-11-15 07:03:47,005][00663] Num frames 1200...
[2023-11-15 07:03:47,138][00663] Num frames 1300...
[2023-11-15 07:03:47,265][00663] Num frames 1400...
[2023-11-15 07:03:47,393][00663] Num frames 1500...
[2023-11-15 07:03:47,498][00663] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840
[2023-11-15 07:03:47,500][00663] Avg episode reward: 3.840, avg true_objective: 3.840
[2023-11-15 07:03:47,584][00663] Num frames 1600...
[2023-11-15 07:03:47,719][00663] Num frames 1700...
[2023-11-15 07:03:47,849][00663] Num frames 1800...
[2023-11-15 07:03:47,973][00663] Num frames 1900...
[2023-11-15 07:03:48,055][00663] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840
[2023-11-15 07:03:48,057][00663] Avg episode reward: 3.840, avg true_objective: 3.840
[2023-11-15 07:03:48,163][00663] Num frames 2000...
[2023-11-15 07:03:48,289][00663] Num frames 2100...
[2023-11-15 07:03:48,419][00663] Num frames 2200...
[2023-11-15 07:03:48,545][00663] Num frames 2300...
[2023-11-15 07:03:48,606][00663] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840
[2023-11-15 07:03:48,607][00663] Avg episode reward: 3.840, avg true_objective: 3.840
[2023-11-15 07:03:48,741][00663] Num frames 2400...
[2023-11-15 07:03:48,869][00663] Num frames 2500...
[2023-11-15 07:03:48,992][00663] Num frames 2600...
[2023-11-15 07:03:49,160][00663] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840
[2023-11-15 07:03:49,161][00663] Avg episode reward: 3.840, avg true_objective: 3.840
[2023-11-15 07:03:49,181][00663] Num frames 2700...
[2023-11-15 07:03:49,313][00663] Num frames 2800...
[2023-11-15 07:03:49,439][00663] Num frames 2900...
[2023-11-15 07:03:49,564][00663] Num frames 3000...
[2023-11-15 07:03:49,711][00663] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840
[2023-11-15 07:03:49,713][00663] Avg episode reward: 3.840, avg true_objective: 3.840
[2023-11-15 07:03:49,757][00663] Num frames 3100...
[2023-11-15 07:03:49,883][00663] Num frames 3200...
[2023-11-15 07:03:50,006][00663] Num frames 3300...
[2023-11-15 07:03:50,148][00663] Num frames 3400...
[2023-11-15 07:03:50,274][00663] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840
[2023-11-15 07:03:50,276][00663] Avg episode reward: 3.840, avg true_objective: 3.840
[2023-11-15 07:03:50,335][00663] Num frames 3500...
[2023-11-15 07:03:50,460][00663] Num frames 3600...
[2023-11-15 07:03:50,584][00663] Num frames 3700...
[2023-11-15 07:03:50,718][00663] Num frames 3800...
[2023-11-15 07:03:50,834][00663] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840
[2023-11-15 07:03:50,836][00663] Avg episode reward: 3.840, avg true_objective: 3.840
[2023-11-15 07:04:09,945][00663] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
[2023-11-15 07:04:24,386][00663] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2023-11-15 07:04:24,389][00663] Overriding arg 'num_workers' with value 1 passed from command line
[2023-11-15 07:04:24,391][00663] Adding new argument 'no_render'=True that is not in the saved config file!
[2023-11-15 07:04:24,393][00663] Adding new argument 'save_video'=True that is not in the saved config file!
[2023-11-15 07:04:24,396][00663] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2023-11-15 07:04:24,398][00663] Adding new argument 'video_name'=None that is not in the saved config file!
[2023-11-15 07:04:24,400][00663] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2023-11-15 07:04:24,402][00663] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2023-11-15 07:04:24,403][00663] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2023-11-15 07:04:24,404][00663] Adding new argument 'hf_repository'='nikxtaco/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
[2023-11-15 07:04:24,405][00663] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2023-11-15 07:04:24,406][00663] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2023-11-15 07:04:24,407][00663] Adding new argument 'train_script'=None that is not in the saved config file!
[2023-11-15 07:04:24,409][00663] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2023-11-15 07:04:24,410][00663] Using frameskip 1 and render_action_repeat=4 for evaluation
[2023-11-15 07:04:24,447][00663] RunningMeanStd input shape: (3, 72, 128)
[2023-11-15 07:04:24,449][00663] RunningMeanStd input shape: (1,)
[2023-11-15 07:04:24,462][00663] ConvEncoder: input_channels=3
[2023-11-15 07:04:24,498][00663] Conv encoder output size: 512
[2023-11-15 07:04:24,499][00663] Policy head output size: 512
[2023-11-15 07:04:24,518][00663] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000002_8192.pth...
[2023-11-15 07:04:24,954][00663] Num frames 100...
[2023-11-15 07:04:25,083][00663] Num frames 200...
[2023-11-15 07:04:25,208][00663] Num frames 300...
[2023-11-15 07:04:25,402][00663] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840
[2023-11-15 07:04:25,405][00663] Avg episode reward: 3.840, avg true_objective: 3.840
[2023-11-15 07:04:25,438][00663] Num frames 400...
[2023-11-15 07:04:25,627][00663] Num frames 500...
[2023-11-15 07:04:25,826][00663] Num frames 600...
[2023-11-15 07:04:26,017][00663] Num frames 700...
[2023-11-15 07:04:26,206][00663] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840
[2023-11-15 07:04:26,208][00663] Avg episode reward: 3.840, avg true_objective: 3.840
[2023-11-15 07:04:26,273][00663] Num frames 800...
[2023-11-15 07:04:26,472][00663] Num frames 900...
[2023-11-15 07:04:26,666][00663] Num frames 1000...
[2023-11-15 07:04:26,875][00663] Num frames 1100...
[2023-11-15 07:04:27,057][00663] Num frames 1200...
[2023-11-15 07:04:27,142][00663] Avg episode rewards: #0: 4.387, true rewards: #0: 4.053
[2023-11-15 07:04:27,145][00663] Avg episode reward: 4.387, avg true_objective: 4.053
[2023-11-15 07:04:27,312][00663] Num frames 1300...
[2023-11-15 07:04:27,525][00663] Num frames 1400...
[2023-11-15 07:04:27,728][00663] Num frames 1500...
[2023-11-15 07:04:27,956][00663] Num frames 1600...
[2023-11-15 07:04:28,012][00663] Avg episode rewards: #0: 4.250, true rewards: #0: 4.000
[2023-11-15 07:04:28,014][00663] Avg episode reward: 4.250, avg true_objective: 4.000
[2023-11-15 07:04:28,206][00663] Num frames 1700...
[2023-11-15 07:04:28,392][00663] Num frames 1800...
[2023-11-15 07:04:28,570][00663] Num frames 1900...
[2023-11-15 07:04:28,780][00663] Avg episode rewards: #0: 4.168, true rewards: #0: 3.968
[2023-11-15 07:04:28,782][00663] Avg episode reward: 4.168, avg true_objective: 3.968
[2023-11-15 07:04:28,815][00663] Num frames 2000...
[2023-11-15 07:04:28,992][00663] Num frames 2100...
[2023-11-15 07:04:29,175][00663] Num frames 2200...
[2023-11-15 07:04:29,362][00663] Num frames 2300...
[2023-11-15 07:04:29,549][00663] Avg episode rewards: #0: 4.113, true rewards: #0: 3.947
[2023-11-15 07:04:29,551][00663] Avg episode reward: 4.113, avg true_objective: 3.947
[2023-11-15 07:04:29,615][00663] Num frames 2400...
[2023-11-15 07:04:29,807][00663] Num frames 2500...
[2023-11-15 07:04:29,995][00663] Num frames 2600...
[2023-11-15 07:04:30,176][00663] Num frames 2700...
[2023-11-15 07:04:30,332][00663] Avg episode rewards: #0: 4.074, true rewards: #0: 3.931
[2023-11-15 07:04:30,335][00663] Avg episode reward: 4.074, avg true_objective: 3.931
[2023-11-15 07:04:30,431][00663] Num frames 2800...
[2023-11-15 07:04:30,613][00663] Num frames 2900...
[2023-11-15 07:04:30,802][00663] Num frames 3000...
[2023-11-15 07:04:30,988][00663] Num frames 3100...
[2023-11-15 07:04:31,163][00663] Avg episode rewards: #0: 4.353, true rewards: #0: 3.977
[2023-11-15 07:04:31,164][00663] Avg episode reward: 4.353, avg true_objective: 3.977
[2023-11-15 07:04:31,190][00663] Num frames 3200...
[2023-11-15 07:04:31,325][00663] Num frames 3300...
[2023-11-15 07:04:31,455][00663] Num frames 3400...
[2023-11-15 07:04:31,583][00663] Num frames 3500...
[2023-11-15 07:04:31,725][00663] Avg episode rewards: #0: 4.296, true rewards: #0: 3.962
[2023-11-15 07:04:31,726][00663] Avg episode reward: 4.296, avg true_objective: 3.962
[2023-11-15 07:04:31,772][00663] Num frames 3600...
[2023-11-15 07:04:31,903][00663] Num frames 3700...
[2023-11-15 07:04:32,033][00663] Num frames 3800...
[2023-11-15 07:04:32,159][00663] Num frames 3900...
[2023-11-15 07:04:32,287][00663] Avg episode rewards: #0: 4.250, true rewards: #0: 3.950
[2023-11-15 07:04:32,288][00663] Avg episode reward: 4.250, avg true_objective: 3.950
[2023-11-15 07:04:53,857][00663] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
[2023-11-15 07:04:56,399][00663] The model has been pushed to https://huggingface.co/nikxtaco/rl_course_vizdoom_health_gathering_supreme
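
Pushing to the Hub reuses the same evaluation entry point with extra flags, matching the 'push_to_hub'=True and 'hf_repository' overrides logged above. A sketch under the same assumptions as the previous snippet:

    cfg = parse_vizdoom_cfg(
        argv=[
            "--env=doom_health_gathering_supreme",
            "--num_workers=1",
            "--save_video",
            "--no_render",
            "--max_num_episodes=10",
            "--max_num_frames=100000",
            "--push_to_hub",
            "--hf_repository=nikxtaco/rl_course_vizdoom_health_gathering_supreme",
        ],
        evaluation=True,
    )
    status = enjoy(cfg)  # evaluates, writes replay.mp4, then uploads the experiment dir
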
[2023-11-15 07:05:42,410][00663] Environment doom_basic already registered, overwriting...
[2023-11-15 07:05:42,415][00663] Environment doom_two_colors_easy already registered, overwriting...
[2023-11-15 07:05:42,416][00663] Environment doom_two_colors_hard already registered, overwriting...
[2023-11-15 07:05:42,417][00663] Environment doom_dm already registered, overwriting...
[2023-11-15 07:05:42,419][00663] Environment doom_dwango5 already registered, overwriting...
[2023-11-15 07:05:42,421][00663] Environment doom_my_way_home_flat_actions already registered, overwriting...
[2023-11-15 07:05:42,422][00663] Environment doom_defend_the_center_flat_actions already registered, overwriting...
[2023-11-15 07:05:42,424][00663] Environment doom_my_way_home already registered, overwriting...
[2023-11-15 07:05:42,425][00663] Environment doom_deadly_corridor already registered, overwriting...
[2023-11-15 07:05:42,426][00663] Environment doom_defend_the_center already registered, overwriting...
[2023-11-15 07:05:42,428][00663] Environment doom_defend_the_line already registered, overwriting...
[2023-11-15 07:05:42,429][00663] Environment doom_health_gathering already registered, overwriting...
[2023-11-15 07:05:42,430][00663] Environment doom_health_gathering_supreme already registered, overwriting...
[2023-11-15 07:05:42,432][00663] Environment doom_battle already registered, overwriting...
[2023-11-15 07:05:42,433][00663] Environment doom_battle2 already registered, overwriting...
[2023-11-15 07:05:42,434][00663] Environment doom_duel_bots already registered, overwriting...
[2023-11-15 07:05:42,435][00663] Environment doom_deathmatch_bots already registered, overwriting...
[2023-11-15 07:05:42,442][00663] Environment doom_duel already registered, overwriting...
[2023-11-15 07:05:42,443][00663] Environment doom_deathmatch_full already registered, overwriting...
[2023-11-15 07:05:42,445][00663] Environment doom_benchmark already registered, overwriting...
[2023-11-15 07:05:42,446][00663] register_encoder_factory: <function make_vizdoom_encoder at 0x7e8c58d712d0>
[2023-11-15 07:05:42,475][00663] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2023-11-15 07:05:42,477][00663] Overriding arg 'train_for_env_steps' with value 50000 passed from command line
[2023-11-15 07:05:42,483][00663] Experiment dir /content/train_dir/default_experiment already exists!
[2023-11-15 07:05:42,487][00663] Resuming existing experiment from /content/train_dir/default_experiment...
[2023-11-15 07:05:42,489][00663] Weights and Biases integration disabled
[2023-11-15 07:05:42,492][00663] Environment var CUDA_VISIBLE_DEVICES is 0
[2023-11-15 07:05:44,510][00663] Starting experiment with the following configuration:
help=False
algo=APPO
env=doom_health_gathering_supreme
experiment=default_experiment
train_dir=/content/train_dir
restart_behavior=resume
device=gpu
seed=None
num_policies=1
async_rl=True
serial_mode=False
batched_sampling=False
num_batches_to_accumulate=2
worker_num_splits=2
policy_workers_per_policy=1
max_policy_lag=1000
num_workers=8
num_envs_per_worker=4
batch_size=1024
num_batches_per_epoch=1
num_epochs=1
rollout=32
recurrence=32
shuffle_minibatches=False
gamma=0.99
reward_scale=1.0
reward_clip=1000.0
value_bootstrap=False
normalize_returns=True
exploration_loss_coeff=0.001
value_loss_coeff=0.5
kl_loss_coeff=0.0
exploration_loss=symmetric_kl
gae_lambda=0.95
ppo_clip_ratio=0.1
ppo_clip_value=0.2
with_vtrace=False
vtrace_rho=1.0
vtrace_c=1.0
optimizer=adam
adam_eps=1e-06
adam_beta1=0.9
adam_beta2=0.999
max_grad_norm=4.0
learning_rate=0.0001
lr_schedule=constant
lr_schedule_kl_threshold=0.008
lr_adaptive_min=1e-06
lr_adaptive_max=0.01
obs_subtract_mean=0.0
obs_scale=255.0
normalize_input=True
normalize_input_keys=None
decorrelate_experience_max_seconds=0
decorrelate_envs_on_one_worker=True
actor_worker_gpus=[]
set_workers_cpu_affinity=True
force_envs_single_thread=False
default_niceness=0
log_to_file=True
experiment_summaries_interval=10
flush_summaries_interval=30
stats_avg=100
summaries_use_frameskip=True
heartbeat_interval=20
heartbeat_reporting_interval=600
train_for_env_steps=50000
train_for_seconds=10000000000
save_every_sec=120
keep_checkpoints=2
load_checkpoint_kind=latest
save_milestones_sec=-1
save_best_every_sec=5
save_best_metric=reward
save_best_after=100000
benchmark=False
encoder_mlp_layers=[512, 512]
encoder_conv_architecture=convnet_simple
encoder_conv_mlp_layers=[512]
use_rnn=True
rnn_size=512
rnn_type=gru
rnn_num_layers=1
decoder_mlp_layers=[]
nonlinearity=elu
policy_initialization=orthogonal
policy_init_gain=1.0
actor_critic_share_weights=True
adaptive_stddev=True
continuous_tanh_scale=0.0
initial_stddev=1.0
use_env_info_cache=False
env_gpu_actions=False
env_gpu_observations=True
env_frameskip=4
env_framestack=1
pixel_format=CHW
use_record_episode_statistics=False
with_wandb=False
wandb_user=None
wandb_project=sample_factory
wandb_group=None
wandb_job_type=SF
wandb_tags=[]
with_pbt=False
pbt_mix_policies_in_one_env=True
pbt_period_env_steps=5000000
pbt_start_mutation=20000000
pbt_replace_fraction=0.3
pbt_mutation_rate=0.15
pbt_replace_reward_gap=0.1
pbt_replace_reward_gap_absolute=1e-06
pbt_optimize_gamma=False
pbt_target_objective=true_objective
pbt_perturb_min=1.1
pbt_perturb_max=1.5
num_agents=-1
num_humans=0
num_bots=-1
start_bot_difficulty=None
timelimit=None
res_w=128
res_h=72
wide_aspect_ratio=False
eval_env_frameskip=1
fps=35
command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=4000
cli_args={'env': 'doom_health_gathering_supreme', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 4000}
git_hash=unknown
git_repo_name=not a git repository
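
This resumed run keeps the saved configuration and only overrides train_for_env_steps (4000 in the original cli_args, 50000 per the "Overriding arg" line above). A minimal sketch of the training entry point, assuming the sample-factory 2.x API; register_vizdoom_components is illustrative shorthand for the env/encoder registration visible in the log, not a real function name:

    from sample_factory.cfg.arguments import parse_full_cfg, parse_sf_args
    from sample_factory.train import run_rl

    register_vizdoom_components()  # assumed helper: registers doom_* envs and make_vizdoom_encoder

    argv = [
        "--env=doom_health_gathering_supreme",
        "--num_workers=8",
        "--num_envs_per_worker=4",
        "--train_for_env_steps=50000",
    ]
    parser, _ = parse_sf_args(argv=argv)
    cfg = parse_full_cfg(parser, argv=argv)
    status = run_rl(cfg)  # restart_behavior=resume picks up checkpoint_000000002_8192.pth
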
[2023-11-15 07:05:44,512][00663] Saving configuration to /content/train_dir/default_experiment/config.json...
[2023-11-15 07:05:44,519][00663] Rollout worker 0 uses device cpu
[2023-11-15 07:05:44,521][00663] Rollout worker 1 uses device cpu
[2023-11-15 07:05:44,523][00663] Rollout worker 2 uses device cpu
[2023-11-15 07:05:44,525][00663] Rollout worker 3 uses device cpu
[2023-11-15 07:05:44,526][00663] Rollout worker 4 uses device cpu
[2023-11-15 07:05:44,528][00663] Rollout worker 5 uses device cpu
[2023-11-15 07:05:44,530][00663] Rollout worker 6 uses device cpu
[2023-11-15 07:05:44,532][00663] Rollout worker 7 uses device cpu
[2023-11-15 07:05:44,605][00663] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-11-15 07:05:44,607][00663] InferenceWorker_p0-w0: min num requests: 2
[2023-11-15 07:05:44,639][00663] Starting all processes...
[2023-11-15 07:05:44,641][00663] Starting process learner_proc0
[2023-11-15 07:05:44,691][00663] Starting all processes...
[2023-11-15 07:05:44,700][00663] Starting process inference_proc0-0
[2023-11-15 07:05:44,700][00663] Starting process rollout_proc0
[2023-11-15 07:05:44,700][00663] Starting process rollout_proc1
[2023-11-15 07:05:44,700][00663] Starting process rollout_proc2
[2023-11-15 07:05:44,701][00663] Starting process rollout_proc3
[2023-11-15 07:05:44,701][00663] Starting process rollout_proc4
[2023-11-15 07:05:44,701][00663] Starting process rollout_proc5
[2023-11-15 07:05:44,701][00663] Starting process rollout_proc6
[2023-11-15 07:05:44,701][00663] Starting process rollout_proc7
[2023-11-15 07:06:02,104][15564] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-11-15 07:06:02,109][15564] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2023-11-15 07:06:02,157][15564] Num visible devices: 1
[2023-11-15 07:06:02,201][15564] Starting seed is not provided
[2023-11-15 07:06:02,202][15564] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-11-15 07:06:02,203][15564] Initializing actor-critic model on device cuda:0
[2023-11-15 07:06:02,204][15564] RunningMeanStd input shape: (3, 72, 128)
[2023-11-15 07:06:02,205][15564] RunningMeanStd input shape: (1,)
[2023-11-15 07:06:02,287][15564] ConvEncoder: input_channels=3
[2023-11-15 07:06:02,401][15582] Worker 3 uses CPU cores [1]
[2023-11-15 07:06:02,521][15581] Worker 4 uses CPU cores [0]
[2023-11-15 07:06:02,535][15578] Worker 0 uses CPU cores [0]
[2023-11-15 07:06:02,550][15577] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-11-15 07:06:02,554][15577] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2023-11-15 07:06:02,600][15577] Num visible devices: 1
[2023-11-15 07:06:02,629][15585] Worker 7 uses CPU cores [1]
[2023-11-15 07:06:02,639][15579] Worker 1 uses CPU cores [1]
[2023-11-15 07:06:02,652][15580] Worker 2 uses CPU cores [0]
[2023-11-15 07:06:02,683][15583] Worker 6 uses CPU cores [0]
[2023-11-15 07:06:02,704][15564] Conv encoder output size: 512
[2023-11-15 07:06:02,704][15564] Policy head output size: 512
[2023-11-15 07:06:02,715][15584] Worker 5 uses CPU cores [1]
[2023-11-15 07:06:02,724][15564] Created Actor Critic model with architecture:
[2023-11-15 07:06:02,725][15564] ActorCriticSharedWeights(
(obs_normalizer): ObservationNormalizer(
(running_mean_std): RunningMeanStdDictInPlace(
(running_mean_std): ModuleDict(
(obs): RunningMeanStdInPlace()
)
)
)
(returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
(encoder): VizdoomEncoder(
(basic_encoder): ConvEncoder(
(enc): RecursiveScriptModule(
original_name=ConvEncoderImpl
(conv_head): RecursiveScriptModule(
original_name=Sequential
(0): RecursiveScriptModule(original_name=Conv2d)
(1): RecursiveScriptModule(original_name=ELU)
(2): RecursiveScriptModule(original_name=Conv2d)
(3): RecursiveScriptModule(original_name=ELU)
(4): RecursiveScriptModule(original_name=Conv2d)
(5): RecursiveScriptModule(original_name=ELU)
)
(mlp_layers): RecursiveScriptModule(
original_name=Sequential
(0): RecursiveScriptModule(original_name=Linear)
(1): RecursiveScriptModule(original_name=ELU)
)
)
)
)
(core): ModelCoreRNN(
(core): GRU(512, 512)
)
(decoder): MlpDecoder(
(mlp): Identity()
)
(critic_linear): Linear(in_features=512, out_features=1, bias=True)
(action_parameterization): ActionParameterizationDefault(
(distribution_linear): Linear(in_features=512, out_features=5, bias=True)
)
)
[2023-11-15 07:06:02,956][15564] Using optimizer <class 'torch.optim.adam.Adam'>
[2023-11-15 07:06:03,250][15564] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000002_8192.pth...
[2023-11-15 07:06:03,284][15564] Loading model from checkpoint
[2023-11-15 07:06:03,286][15564] Loaded experiment state at self.train_step=2, self.env_steps=8192
[2023-11-15 07:06:03,287][15564] Initialized policy 0 weights for model version 2
[2023-11-15 07:06:03,291][15564] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-11-15 07:06:03,298][15564] LearnerWorker_p0 finished initialization!
[2023-11-15 07:06:03,495][15577] RunningMeanStd input shape: (3, 72, 128)
[2023-11-15 07:06:03,496][15577] RunningMeanStd input shape: (1,)
[2023-11-15 07:06:03,510][15577] ConvEncoder: input_channels=3
[2023-11-15 07:06:03,617][15577] Conv encoder output size: 512
[2023-11-15 07:06:03,619][15577] Policy head output size: 512
[2023-11-15 07:06:03,687][00663] Inference worker 0-0 is ready!
[2023-11-15 07:06:03,689][00663] All inference workers are ready! Signal rollout workers to start!
[2023-11-15 07:06:03,984][15581] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-11-15 07:06:03,984][15579] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-11-15 07:06:03,986][15584] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-11-15 07:06:04,000][15578] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-11-15 07:06:04,004][15580] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-11-15 07:06:04,009][15585] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-11-15 07:06:04,017][15583] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-11-15 07:06:04,018][15582] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-11-15 07:06:04,598][00663] Heartbeat connected on Batcher_0
[2023-11-15 07:06:04,602][00663] Heartbeat connected on LearnerWorker_p0
[2023-11-15 07:06:04,634][00663] Heartbeat connected on InferenceWorker_p0-w0
[2023-11-15 07:06:05,427][15580] Decorrelating experience for 0 frames...
[2023-11-15 07:06:05,427][15579] Decorrelating experience for 0 frames...
[2023-11-15 07:06:05,424][15581] Decorrelating experience for 0 frames...
[2023-11-15 07:06:05,424][15584] Decorrelating experience for 0 frames...
[2023-11-15 07:06:05,430][15578] Decorrelating experience for 0 frames...
[2023-11-15 07:06:05,428][15585] Decorrelating experience for 0 frames...
[2023-11-15 07:06:06,560][15584] Decorrelating experience for 32 frames...
[2023-11-15 07:06:06,562][15579] Decorrelating experience for 32 frames...
[2023-11-15 07:06:06,584][15580] Decorrelating experience for 32 frames...
[2023-11-15 07:06:06,590][15578] Decorrelating experience for 32 frames...
[2023-11-15 07:06:06,593][15581] Decorrelating experience for 32 frames...
[2023-11-15 07:06:06,603][15582] Decorrelating experience for 0 frames...
[2023-11-15 07:06:07,376][15578] Decorrelating experience for 64 frames...
[2023-11-15 07:06:07,492][00663] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 8192. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-11-15 07:06:07,670][15585] Decorrelating experience for 32 frames...
[2023-11-15 07:06:07,900][15579] Decorrelating experience for 64 frames...
[2023-11-15 07:06:07,911][15584] Decorrelating experience for 64 frames...
[2023-11-15 07:06:08,312][15580] Decorrelating experience for 64 frames...
[2023-11-15 07:06:08,724][15582] Decorrelating experience for 32 frames...
[2023-11-15 07:06:08,777][15580] Decorrelating experience for 96 frames...
[2023-11-15 07:06:08,945][00663] Heartbeat connected on RolloutWorker_w2
[2023-11-15 07:06:09,022][15585] Decorrelating experience for 64 frames...
[2023-11-15 07:06:09,575][15583] Decorrelating experience for 0 frames...
[2023-11-15 07:06:09,653][15579] Decorrelating experience for 96 frames...
[2023-11-15 07:06:09,960][00663] Heartbeat connected on RolloutWorker_w1
[2023-11-15 07:06:10,801][15582] Decorrelating experience for 64 frames...
[2023-11-15 07:06:11,136][15585] Decorrelating experience for 96 frames...
[2023-11-15 07:06:11,188][15583] Decorrelating experience for 32 frames...
[2023-11-15 07:06:11,618][00663] Heartbeat connected on RolloutWorker_w7
[2023-11-15 07:06:11,635][15581] Decorrelating experience for 64 frames...
[2023-11-15 07:06:11,812][15584] Decorrelating experience for 96 frames...
[2023-11-15 07:06:12,405][00663] Heartbeat connected on RolloutWorker_w5
[2023-11-15 07:06:12,492][00663] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 8192. Throughput: 0: 22.4. Samples: 112. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-11-15 07:06:12,497][00663] Avg episode reward: [(0, '2.780')]
[2023-11-15 07:06:15,309][15582] Decorrelating experience for 96 frames...
[2023-11-15 07:06:15,644][15578] Decorrelating experience for 96 frames...
[2023-11-15 07:06:16,053][15583] Decorrelating experience for 64 frames...
[2023-11-15 07:06:16,151][00663] Heartbeat connected on RolloutWorker_w0
[2023-11-15 07:06:16,238][00663] Heartbeat connected on RolloutWorker_w3
[2023-11-15 07:06:17,500][00663] Fps is (10 sec: 409.3, 60 sec: 409.3, 300 sec: 409.3). Total num frames: 12288. Throughput: 0: 185.3. Samples: 1854. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-11-15 07:06:17,505][00663] Avg episode reward: [(0, '3.239')]
[2023-11-15 07:06:19,033][15581] Decorrelating experience for 96 frames...
[2023-11-15 07:06:19,544][00663] Heartbeat connected on RolloutWorker_w4
[2023-11-15 07:06:20,054][15583] Decorrelating experience for 96 frames...
[2023-11-15 07:06:20,549][00663] Heartbeat connected on RolloutWorker_w6
[2023-11-15 07:06:22,492][00663] Fps is (10 sec: 1638.4, 60 sec: 1092.3, 300 sec: 1092.3). Total num frames: 24576. Throughput: 0: 319.9. Samples: 4798. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-11-15 07:06:22,503][00663] Avg episode reward: [(0, '3.534')]
[2023-11-15 07:06:27,492][00663] Fps is (10 sec: 2869.5, 60 sec: 1638.4, 300 sec: 1638.4). Total num frames: 40960. Throughput: 0: 340.1. Samples: 6802. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-11-15 07:06:27,495][00663] Avg episode reward: [(0, '3.986')]
[2023-11-15 07:06:28,868][15577] Updated weights for policy 0, policy_version 12 (0.0035)
[2023-11-15 07:06:31,349][15564] Stopping Batcher_0...
[2023-11-15 07:06:31,352][15564] Loop batcher_evt_loop terminating...
[2023-11-15 07:06:31,350][00663] Component Batcher_0 stopped!
[2023-11-15 07:06:31,354][15564] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000014_57344.pth...
[2023-11-15 07:06:31,392][00663] Component RolloutWorker_w3 stopped!
[2023-11-15 07:06:31,390][15582] Stopping RolloutWorker_w3...
[2023-11-15 07:06:31,405][15583] Stopping RolloutWorker_w6...
[2023-11-15 07:06:31,405][15583] Loop rollout_proc6_evt_loop terminating...
[2023-11-15 07:06:31,405][00663] Component RolloutWorker_w6 stopped!
[2023-11-15 07:06:31,421][15580] Stopping RolloutWorker_w2...
[2023-11-15 07:06:31,421][00663] Component RolloutWorker_w2 stopped!
[2023-11-15 07:06:31,402][15582] Loop rollout_proc3_evt_loop terminating...
[2023-11-15 07:06:31,430][15581] Stopping RolloutWorker_w4...
[2023-11-15 07:06:31,430][15581] Loop rollout_proc4_evt_loop terminating...
[2023-11-15 07:06:31,430][00663] Component RolloutWorker_w4 stopped!
[2023-11-15 07:06:31,422][15580] Loop rollout_proc2_evt_loop terminating...
[2023-11-15 07:06:31,450][00663] Component RolloutWorker_w1 stopped!
[2023-11-15 07:06:31,452][00663] Component RolloutWorker_w5 stopped!
[2023-11-15 07:06:31,454][15584] Stopping RolloutWorker_w5...
[2023-11-15 07:06:31,454][15579] Stopping RolloutWorker_w1...
[2023-11-15 07:06:31,456][15577] Weights refcount: 2 0
[2023-11-15 07:06:31,463][00663] Component RolloutWorker_w7 stopped!
[2023-11-15 07:06:31,460][15584] Loop rollout_proc5_evt_loop terminating...
[2023-11-15 07:06:31,465][15585] Stopping RolloutWorker_w7...
[2023-11-15 07:06:31,468][15579] Loop rollout_proc1_evt_loop terminating...
[2023-11-15 07:06:31,469][15577] Stopping InferenceWorker_p0-w0...
[2023-11-15 07:06:31,469][15577] Loop inference_proc0-0_evt_loop terminating...
[2023-11-15 07:06:31,469][00663] Component InferenceWorker_p0-w0 stopped!
[2023-11-15 07:06:31,471][15585] Loop rollout_proc7_evt_loop terminating...
[2023-11-15 07:06:31,483][00663] Component RolloutWorker_w0 stopped!
[2023-11-15 07:06:31,483][15578] Stopping RolloutWorker_w0...
[2023-11-15 07:06:31,487][15578] Loop rollout_proc0_evt_loop terminating...
[2023-11-15 07:06:31,532][15564] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000014_57344.pth...
[2023-11-15 07:06:31,711][15564] Stopping LearnerWorker_p0...
[2023-11-15 07:06:31,712][15564] Loop learner_proc0_evt_loop terminating...
[2023-11-15 07:06:31,712][00663] Component LearnerWorker_p0 stopped!
[2023-11-15 07:06:31,719][00663] Waiting for process learner_proc0 to stop...
[2023-11-15 07:06:32,881][00663] Waiting for process inference_proc0-0 to join...
[2023-11-15 07:06:32,888][00663] Waiting for process rollout_proc0 to join...
[2023-11-15 07:06:34,960][00663] Waiting for process rollout_proc1 to join...
[2023-11-15 07:06:35,031][00663] Waiting for process rollout_proc2 to join...
[2023-11-15 07:06:35,038][00663] Waiting for process rollout_proc3 to join...
[2023-11-15 07:06:35,039][00663] Waiting for process rollout_proc4 to join...
[2023-11-15 07:06:35,043][00663] Waiting for process rollout_proc5 to join...
[2023-11-15 07:06:35,044][00663] Waiting for process rollout_proc6 to join...
[2023-11-15 07:06:35,046][00663] Waiting for process rollout_proc7 to join...
[2023-11-15 07:06:35,049][00663] Batcher 0 profile tree view:
batching: 0.2616, releasing_batches: 0.0003
[2023-11-15 07:06:35,051][00663] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0001
wait_policy_total: 16.4046
update_model: 0.1553
weight_update: 0.0041
one_step: 0.0107
handle_policy_step: 10.3452
deserialize: 0.2265, stack: 0.0644, obs_to_device_normalize: 1.9478, forward: 6.1978, send_messages: 0.3968
prepare_outputs: 1.0654
to_cpu: 0.6187
[2023-11-15 07:06:35,053][00663] Learner 0 profile tree view:
misc: 0.0001, prepare_batch: 2.9713
train: 2.4846
epoch_init: 0.0001, minibatch_init: 0.0012, losses_postprocess: 0.0023, kl_divergence: 0.0143, after_optimizer: 0.0802
calculate_losses: 0.7899
losses_init: 0.0000, forward_head: 0.3382, bptt_initial: 0.3253, tail: 0.0290, advantages_returns: 0.0054, losses: 0.0731
bptt: 0.0164
bptt_forward_core: 0.0157
update: 1.5862
clip: 0.0658
[2023-11-15 07:06:35,054][00663] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.0024, enqueue_policy_requests: 2.1673, env_step: 10.8838, overhead: 0.3617, complete_rollouts: 0.0646
save_policy_outputs: 0.2600
split_output_tensors: 0.1015
[2023-11-15 07:06:35,055][00663] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.0027, enqueue_policy_requests: 2.9577, env_step: 14.0803, overhead: 0.4581, complete_rollouts: 0.1817
save_policy_outputs: 0.3234
split_output_tensors: 0.1433
[2023-11-15 07:06:35,058][00663] Loop Runner_EvtLoop terminating...
[2023-11-15 07:06:35,059][00663] Runner profile tree view:
main_loop: 50.4204
[2023-11-15 07:06:35,061][00663] Collected {0: 57344}, FPS: 974.8
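
The totals reconcile with the earlier run: of the 57344 frames collected in total, 8192 were restored from the checkpoint, so this session contributed the rest:

    new frames this session = 57344 − 8192 = 49152
    reported FPS = 49152 frames / 50.4204 s (main_loop) ≈ 974.8
    checkpoint_000000014_57344.pth  →  policy version 14, 14 × 4096 = 57344 env frames
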
[2023-11-15 07:06:35,093][00663] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2023-11-15 07:06:35,094][00663] Overriding arg 'num_workers' with value 1 passed from command line
[2023-11-15 07:06:35,096][00663] Adding new argument 'no_render'=True that is not in the saved config file!
[2023-11-15 07:06:35,099][00663] Adding new argument 'save_video'=True that is not in the saved config file!
[2023-11-15 07:06:35,100][00663] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2023-11-15 07:06:35,103][00663] Adding new argument 'video_name'=None that is not in the saved config file!
[2023-11-15 07:06:35,105][00663] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
[2023-11-15 07:06:35,106][00663] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2023-11-15 07:06:35,107][00663] Adding new argument 'push_to_hub'=False that is not in the saved config file!
[2023-11-15 07:06:35,108][00663] Adding new argument 'hf_repository'=None that is not in the saved config file!
[2023-11-15 07:06:35,109][00663] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2023-11-15 07:06:35,110][00663] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2023-11-15 07:06:35,111][00663] Adding new argument 'train_script'=None that is not in the saved config file!
[2023-11-15 07:06:35,112][00663] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2023-11-15 07:06:35,114][00663] Using frameskip 1 and render_action_repeat=4 for evaluation
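Editor's note: the argument overrides above are what Sample Factory's evaluation script injects on top of the saved config. A minimal sketch of an equivalent invocation, assuming the sf_examples VizDoom helpers used by the Deep RL course notebook (parse_sf_args/parse_full_cfg are the standard Sample Factory 2.x entry points; treat the exact sf_examples module paths as assumptions):

    from sample_factory.cfg.arguments import parse_full_cfg, parse_sf_args
    from sample_factory.enjoy import enjoy
    # Assumed module paths, as used by sf_examples for VizDoom:
    from sf_examples.vizdoom.doom.doom_params import add_doom_env_args, doom_override_defaults
    from sf_examples.vizdoom.train_vizdoom import register_vizdoom_components

    def evaluate():
        register_vizdoom_components()
        argv = [
            "--env=doom_health_gathering_supreme",
            "--num_workers=1",        # the override recorded above
            "--no_render",
            "--save_video",
            "--max_num_episodes=10",
            "--train_dir=/content/train_dir",
            "--experiment=default_experiment",
        ]
        parser, _ = parse_sf_args(argv=argv, evaluation=True)
        add_doom_env_args(parser)
        doom_override_defaults(parser)
        cfg = parse_full_cfg(parser, argv)
        return enjoy(cfg)  # loads the latest checkpoint and rolls out episodes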
[2023-11-15 07:06:35,146][00663] RunningMeanStd input shape: (3, 72, 128)
[2023-11-15 07:06:35,148][00663] RunningMeanStd input shape: (1,)
[2023-11-15 07:06:35,160][00663] ConvEncoder: input_channels=3
[2023-11-15 07:06:35,197][00663] Conv encoder output size: 512
[2023-11-15 07:06:35,198][00663] Policy head output size: 512
[2023-11-15 07:06:35,218][00663] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000014_57344.pth...
[2023-11-15 07:06:35,658][00663] Num frames 100...
[2023-11-15 07:06:35,854][00663] Num frames 200...
[2023-11-15 07:06:36,027][00663] Num frames 300...
[2023-11-15 07:06:36,216][00663] Num frames 400...
[2023-11-15 07:06:36,359][00663] Avg episode rewards: #0: 5.480, true rewards: #0: 4.480
[2023-11-15 07:06:36,363][00663] Avg episode reward: 5.480, avg true_objective: 4.480
[2023-11-15 07:06:36,463][00663] Num frames 500...
[2023-11-15 07:06:36,653][00663] Num frames 600...
[2023-11-15 07:06:36,843][00663] Num frames 700...
[2023-11-15 07:06:36,910][00663] Avg episode rewards: #0: 4.020, true rewards: #0: 3.520
[2023-11-15 07:06:36,914][00663] Avg episode reward: 4.020, avg true_objective: 3.520
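Editor's note: the "Avg episode rewards" values are running means over the episodes completed so far, so individual episode returns can be recovered by differencing. A worked check against the two report lines above (values from this log; the recovery formula is the editor's arithmetic):

    # Running mean after n episodes: avg_n = (r_1 + ... + r_n) / n,
    # hence r_n = n * avg_n - (n - 1) * avg_{n-1}.
    avg = {1: 5.480, 2: 4.020}           # reported after episodes 1 and 2
    episode_2 = 2 * avg[2] - 1 * avg[1]  # -> 2.56
    print(round(episode_2, 3))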
[2023-11-15 07:06:37,097][00663] Num frames 800...
[2023-11-15 07:06:37,276][00663] Num frames 900...
[2023-11-15 07:06:37,457][00663] Num frames 1000...
[2023-11-15 07:06:37,677][00663] Avg episode rewards: #0: 3.960, true rewards: #0: 3.627
[2023-11-15 07:06:37,679][00663] Avg episode reward: 3.960, avg true_objective: 3.627
[2023-11-15 07:06:37,706][00663] Num frames 1100...
[2023-11-15 07:06:37,888][00663] Num frames 1200...
[2023-11-15 07:06:38,067][00663] Num frames 1300...
[2023-11-15 07:06:38,286][00663] Avg episode rewards: #0: 3.728, true rewards: #0: 3.477
[2023-11-15 07:06:38,288][00663] Avg episode reward: 3.728, avg true_objective: 3.477
[2023-11-15 07:06:38,313][00663] Num frames 1400...
[2023-11-15 07:06:38,497][00663] Num frames 1500...
[2023-11-15 07:06:38,686][00663] Num frames 1600...
[2023-11-15 07:06:38,874][00663] Num frames 1700...
[2023-11-15 07:06:39,061][00663] Num frames 1800...
[2023-11-15 07:06:39,194][00663] Avg episode rewards: #0: 4.078, true rewards: #0: 3.678
[2023-11-15 07:06:39,196][00663] Avg episode reward: 4.078, avg true_objective: 3.678
[2023-11-15 07:06:39,327][00663] Num frames 1900...
[2023-11-15 07:06:39,516][00663] Num frames 2000...
[2023-11-15 07:06:39,702][00663] Num frames 2100...
[2023-11-15 07:06:39,891][00663] Num frames 2200...
[2023-11-15 07:06:39,993][00663] Avg episode rewards: #0: 4.038, true rewards: #0: 3.705
[2023-11-15 07:06:39,995][00663] Avg episode reward: 4.038, avg true_objective: 3.705
[2023-11-15 07:06:40,144][00663] Num frames 2300...
[2023-11-15 07:06:40,331][00663] Num frames 2400...
[2023-11-15 07:06:40,522][00663] Num frames 2500...
[2023-11-15 07:06:40,715][00663] Num frames 2600...
[2023-11-15 07:06:40,791][00663] Avg episode rewards: #0: 4.010, true rewards: #0: 3.724
[2023-11-15 07:06:40,793][00663] Avg episode reward: 4.010, avg true_objective: 3.724
[2023-11-15 07:06:40,965][00663] Num frames 2700...
[2023-11-15 07:06:41,109][00663] Num frames 2800...
[2023-11-15 07:06:41,238][00663] Num frames 2900...
[2023-11-15 07:06:41,369][00663] Num frames 3000...
[2023-11-15 07:06:41,455][00663] Avg episode rewards: #0: 4.154, true rewards: #0: 3.779
[2023-11-15 07:06:41,456][00663] Avg episode reward: 4.154, avg true_objective: 3.779
[2023-11-15 07:06:41,554][00663] Num frames 3100...
[2023-11-15 07:06:41,681][00663] Num frames 3200...
[2023-11-15 07:06:41,819][00663] Num frames 3300...
[2023-11-15 07:06:41,952][00663] Num frames 3400...
[2023-11-15 07:06:42,102][00663] Avg episode rewards: #0: 4.301, true rewards: #0: 3.857
[2023-11-15 07:06:42,103][00663] Avg episode reward: 4.301, avg true_objective: 3.857
[2023-11-15 07:06:42,143][00663] Num frames 3500...
[2023-11-15 07:06:42,276][00663] Num frames 3600...
[2023-11-15 07:06:42,403][00663] Num frames 3700...
[2023-11-15 07:06:42,493][00663] Avg episode rewards: #0: 4.127, true rewards: #0: 3.727
[2023-11-15 07:06:42,495][00663] Avg episode reward: 4.127, avg true_objective: 3.727
[2023-11-15 07:07:04,686][00663] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
[2023-11-15 07:07:04,849][00663] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2023-11-15 07:07:04,851][00663] Overriding arg 'num_workers' with value 1 passed from command line
[2023-11-15 07:07:04,853][00663] Adding new argument 'no_render'=True that is not in the saved config file!
[2023-11-15 07:07:04,854][00663] Adding new argument 'save_video'=True that is not in the saved config file!
[2023-11-15 07:07:04,856][00663] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2023-11-15 07:07:04,857][00663] Adding new argument 'video_name'=None that is not in the saved config file!
[2023-11-15 07:07:04,858][00663] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2023-11-15 07:07:04,860][00663] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2023-11-15 07:07:04,861][00663] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2023-11-15 07:07:04,862][00663] Adding new argument 'hf_repository'='nikxtaco/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
[2023-11-15 07:07:04,863][00663] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2023-11-15 07:07:04,864][00663] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2023-11-15 07:07:04,865][00663] Adding new argument 'train_script'=None that is not in the saved config file!
[2023-11-15 07:07:04,866][00663] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2023-11-15 07:07:04,867][00663] Using frameskip 1 and render_action_repeat=4 for evaluation
[2023-11-15 07:07:04,908][00663] RunningMeanStd input shape: (3, 72, 128)
[2023-11-15 07:07:04,910][00663] RunningMeanStd input shape: (1,)
[2023-11-15 07:07:04,927][00663] ConvEncoder: input_channels=3
[2023-11-15 07:07:04,988][00663] Conv encoder output size: 512
[2023-11-15 07:07:04,991][00663] Policy head output size: 512
[2023-11-15 07:07:05,018][00663] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000014_57344.pth...
[2023-11-15 07:07:05,806][00663] Num frames 100...
[2023-11-15 07:07:06,226][00663] Num frames 200...
[2023-11-15 07:07:06,567][00663] Num frames 300...
[2023-11-15 07:07:06,772][00663] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840
[2023-11-15 07:07:06,774][00663] Avg episode reward: 3.840, avg true_objective: 3.840
[2023-11-15 07:07:06,805][00663] Num frames 400...
[2023-11-15 07:07:06,933][00663] Num frames 500...
[2023-11-15 07:07:07,059][00663] Num frames 600...
[2023-11-15 07:07:07,182][00663] Num frames 700...
[2023-11-15 07:07:07,296][00663] Avg episode rewards: #0: 3.730, true rewards: #0: 3.730
[2023-11-15 07:07:07,298][00663] Avg episode reward: 3.730, avg true_objective: 3.730
[2023-11-15 07:07:07,373][00663] Num frames 800...
[2023-11-15 07:07:07,508][00663] Num frames 900...
[2023-11-15 07:07:07,687][00663] Num frames 1000...
[2023-11-15 07:07:07,873][00663] Num frames 1100...
[2023-11-15 07:07:07,988][00663] Avg episode rewards: #0: 3.767, true rewards: #0: 3.767
[2023-11-15 07:07:07,990][00663] Avg episode reward: 3.767, avg true_objective: 3.767
[2023-11-15 07:07:08,124][00663] Num frames 1200...
[2023-11-15 07:07:08,309][00663] Num frames 1300...
[2023-11-15 07:07:08,498][00663] Num frames 1400...
[2023-11-15 07:07:08,683][00663] Num frames 1500...
[2023-11-15 07:07:08,770][00663] Avg episode rewards: #0: 3.785, true rewards: #0: 3.785
[2023-11-15 07:07:08,775][00663] Avg episode reward: 3.785, avg true_objective: 3.785
[2023-11-15 07:07:08,932][00663] Num frames 1600...
[2023-11-15 07:07:09,116][00663] Num frames 1700...
[2023-11-15 07:07:09,307][00663] Num frames 1800...
[2023-11-15 07:07:09,497][00663] Num frames 1900...
[2023-11-15 07:07:09,687][00663] Avg episode rewards: #0: 4.124, true rewards: #0: 3.924
[2023-11-15 07:07:09,691][00663] Avg episode reward: 4.124, avg true_objective: 3.924
[2023-11-15 07:07:09,766][00663] Num frames 2000...
[2023-11-15 07:07:09,952][00663] Num frames 2100...
[2023-11-15 07:07:10,140][00663] Num frames 2200...
[2023-11-15 07:07:10,325][00663] Num frames 2300...
[2023-11-15 07:07:10,507][00663] Num frames 2400...
[2023-11-15 07:07:10,586][00663] Avg episode rewards: #0: 4.350, true rewards: #0: 4.017
[2023-11-15 07:07:10,588][00663] Avg episode reward: 4.350, avg true_objective: 4.017
[2023-11-15 07:07:10,759][00663] Num frames 2500...
[2023-11-15 07:07:10,952][00663] Num frames 2600...
[2023-11-15 07:07:11,141][00663] Num frames 2700...
[2023-11-15 07:07:11,333][00663] Num frames 2800...
[2023-11-15 07:07:11,508][00663] Avg episode rewards: #0: 4.511, true rewards: #0: 4.083
[2023-11-15 07:07:11,511][00663] Avg episode reward: 4.511, avg true_objective: 4.083
[2023-11-15 07:07:11,592][00663] Num frames 2900...
[2023-11-15 07:07:11,783][00663] Num frames 3000...
[2023-11-15 07:07:11,972][00663] Num frames 3100...
[2023-11-15 07:07:12,163][00663] Num frames 3200...
[2023-11-15 07:07:12,300][00663] Avg episode rewards: #0: 4.428, true rewards: #0: 4.052
[2023-11-15 07:07:12,302][00663] Avg episode reward: 4.428, avg true_objective: 4.052
[2023-11-15 07:07:12,414][00663] Num frames 3300...
[2023-11-15 07:07:12,597][00663] Num frames 3400...
[2023-11-15 07:07:12,796][00663] Num frames 3500...
[2023-11-15 07:07:12,966][00663] Num frames 3600...
[2023-11-15 07:07:13,138][00663] Avg episode rewards: #0: 4.544, true rewards: #0: 4.100
[2023-11-15 07:07:13,140][00663] Avg episode reward: 4.544, avg true_objective: 4.100
[2023-11-15 07:07:13,156][00663] Num frames 3700...
[2023-11-15 07:07:13,290][00663] Num frames 3800...
[2023-11-15 07:07:13,422][00663] Num frames 3900...
[2023-11-15 07:07:13,556][00663] Num frames 4000...
[2023-11-15 07:07:13,708][00663] Avg episode rewards: #0: 4.474, true rewards: #0: 4.074
[2023-11-15 07:07:13,711][00663] Avg episode reward: 4.474, avg true_objective: 4.074
[2023-11-15 07:07:36,214][00663] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
[2023-11-15 07:07:38,656][00663] The model has been pushed to https://huggingface.co/nikxtaco/rl_course_vizdoom_health_gathering_supreme
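Editor's note: the pushed repository now holds the config, checkpoints, and the replay video. A minimal sketch for pulling those files back down with huggingface_hub (Sample Factory also ships its own load_from_hub helper; snapshot_download below is just the generic equivalent):

    from huggingface_hub import snapshot_download

    # Download the uploaded experiment folder (config.json, checkpoint_p0/,
    # replay.mp4) into a local directory so it can be evaluated or resumed.
    local_dir = snapshot_download(
        repo_id="nikxtaco/rl_course_vizdoom_health_gathering_supreme",
        local_dir="/content/train_dir/rl_course_vizdoom_health_gathering_supreme",
    )
    print(local_dir)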
[2023-11-15 07:07:38,699][00663] Environment doom_basic already registered, overwriting...
[2023-11-15 07:07:38,702][00663] Environment doom_two_colors_easy already registered, overwriting...
[2023-11-15 07:07:38,704][00663] Environment doom_two_colors_hard already registered, overwriting...
[2023-11-15 07:07:38,705][00663] Environment doom_dm already registered, overwriting...
[2023-11-15 07:07:38,706][00663] Environment doom_dwango5 already registered, overwriting...
[2023-11-15 07:07:38,707][00663] Environment doom_my_way_home_flat_actions already registered, overwriting...
[2023-11-15 07:07:38,708][00663] Environment doom_defend_the_center_flat_actions already registered, overwriting...
[2023-11-15 07:07:38,709][00663] Environment doom_my_way_home already registered, overwriting...
[2023-11-15 07:07:38,710][00663] Environment doom_deadly_corridor already registered, overwriting...
[2023-11-15 07:07:38,711][00663] Environment doom_defend_the_center already registered, overwriting...
[2023-11-15 07:07:38,712][00663] Environment doom_defend_the_line already registered, overwriting...
[2023-11-15 07:07:38,713][00663] Environment doom_health_gathering already registered, overwriting...
[2023-11-15 07:07:38,714][00663] Environment doom_health_gathering_supreme already registered, overwriting...
[2023-11-15 07:07:38,715][00663] Environment doom_battle already registered, overwriting...
[2023-11-15 07:07:38,716][00663] Environment doom_battle2 already registered, overwriting...
[2023-11-15 07:07:38,717][00663] Environment doom_duel_bots already registered, overwriting...
[2023-11-15 07:07:38,718][00663] Environment doom_deathmatch_bots already registered, overwriting...
[2023-11-15 07:07:38,719][00663] Environment doom_duel already registered, overwriting...
[2023-11-15 07:07:38,720][00663] Environment doom_deathmatch_full already registered, overwriting...
[2023-11-15 07:07:38,721][00663] Environment doom_benchmark already registered, overwriting...
[2023-11-15 07:07:38,722][00663] register_encoder_factory: <function make_vizdoom_encoder at 0x7e8c58d712d0>
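Editor's note: the "already registered, overwriting" warnings above are harmless; the notebook runs the VizDoom registration code a second time in the same process, so every env spec and the encoder factory are registered again. A sketch of what that registration amounts to in Sample Factory 2.x (the module paths are the usual sf_examples ones, but treat them as assumptions):

    from sample_factory.algo.utils.context import global_model_factory
    from sample_factory.envs.env_utils import register_env
    # Assumed sf_examples paths for the Doom env specs and custom encoder:
    from sf_examples.vizdoom.doom.doom_model import make_vizdoom_encoder
    from sf_examples.vizdoom.doom.doom_utils import DOOM_ENVS, make_doom_env_from_spec

    def register_vizdoom_components():
        for env_spec in DOOM_ENVS:
            # Re-registering a known name logs
            # "Environment <name> already registered, overwriting..."
            register_env(env_spec.name, make_doom_env_from_spec)
        # Produces the "register_encoder_factory: <function make_vizdoom_encoder ...>" line.
        global_model_factory().register_encoder_factory(make_vizdoom_encoder)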
[2023-11-15 07:07:38,747][00663] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2023-11-15 07:07:38,747][00663] Overriding arg 'train_for_env_steps' with value 4000000 passed from command line
[2023-11-15 07:07:38,753][00663] Experiment dir /content/train_dir/default_experiment already exists!
[2023-11-15 07:07:38,758][00663] Resuming existing experiment from /content/train_dir/default_experiment...
[2023-11-15 07:07:38,759][00663] Weights and Biases integration disabled
[2023-11-15 07:07:38,762][00663] Environment var CUDA_VISIBLE_DEVICES is 0
[2023-11-15 07:07:41,808][00663] Starting experiment with the following configuration:
help=False
algo=APPO
env=doom_health_gathering_supreme
experiment=default_experiment
train_dir=/content/train_dir
restart_behavior=resume
device=gpu
seed=None
num_policies=1
async_rl=True
serial_mode=False
batched_sampling=False
num_batches_to_accumulate=2
worker_num_splits=2
policy_workers_per_policy=1
max_policy_lag=1000
num_workers=8
num_envs_per_worker=4
batch_size=1024
num_batches_per_epoch=1
num_epochs=1
rollout=32
recurrence=32
shuffle_minibatches=False
gamma=0.99
reward_scale=1.0
reward_clip=1000.0
value_bootstrap=False
normalize_returns=True
exploration_loss_coeff=0.001
value_loss_coeff=0.5
kl_loss_coeff=0.0
exploration_loss=symmetric_kl
gae_lambda=0.95
ppo_clip_ratio=0.1
ppo_clip_value=0.2
with_vtrace=False
vtrace_rho=1.0
vtrace_c=1.0
optimizer=adam
adam_eps=1e-06
adam_beta1=0.9
adam_beta2=0.999
max_grad_norm=4.0
learning_rate=0.0001
lr_schedule=constant
lr_schedule_kl_threshold=0.008
lr_adaptive_min=1e-06
lr_adaptive_max=0.01
obs_subtract_mean=0.0
obs_scale=255.0
normalize_input=True
normalize_input_keys=None
decorrelate_experience_max_seconds=0
decorrelate_envs_on_one_worker=True
actor_worker_gpus=[]
set_workers_cpu_affinity=True
force_envs_single_thread=False
default_niceness=0
log_to_file=True
experiment_summaries_interval=10
flush_summaries_interval=30
stats_avg=100
summaries_use_frameskip=True
heartbeat_interval=20
heartbeat_reporting_interval=600
train_for_env_steps=4000000
train_for_seconds=10000000000
save_every_sec=120
keep_checkpoints=2
load_checkpoint_kind=latest
save_milestones_sec=-1
save_best_every_sec=5
save_best_metric=reward
save_best_after=100000
benchmark=False
encoder_mlp_layers=[512, 512]
encoder_conv_architecture=convnet_simple
encoder_conv_mlp_layers=[512]
use_rnn=True
rnn_size=512
rnn_type=gru
rnn_num_layers=1
decoder_mlp_layers=[]
nonlinearity=elu
policy_initialization=orthogonal
policy_init_gain=1.0
actor_critic_share_weights=True
adaptive_stddev=True
continuous_tanh_scale=0.0
initial_stddev=1.0
use_env_info_cache=False
env_gpu_actions=False
env_gpu_observations=True
env_frameskip=4
env_framestack=1
pixel_format=CHW
use_record_episode_statistics=False
with_wandb=False
wandb_user=None
wandb_project=sample_factory
wandb_group=None
wandb_job_type=SF
wandb_tags=[]
with_pbt=False
pbt_mix_policies_in_one_env=True
pbt_period_env_steps=5000000
pbt_start_mutation=20000000
pbt_replace_fraction=0.3
pbt_mutation_rate=0.15
pbt_replace_reward_gap=0.1
pbt_replace_reward_gap_absolute=1e-06
pbt_optimize_gamma=False
pbt_target_objective=true_objective
pbt_perturb_min=1.1
pbt_perturb_max=1.5
num_agents=-1
num_humans=0
num_bots=-1
start_bot_difficulty=None
timelimit=None
res_w=128
res_h=72
wide_aspect_ratio=False
eval_env_frameskip=1
fps=35
command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=4000
cli_args={'env': 'doom_health_gathering_supreme', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 4000}
git_hash=unknown
git_repo_name=not a git repository
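Editor's note: this configuration dump is the merged result of the saved config.json plus the command-line override (train_for_env_steps raised to 4000000, with restart_behavior=resume). A minimal sketch of resuming with the same arguments through the Python API (run_rl is Sample Factory 2.x's training entry point; the VizDoom helper imports carry the same assumptions as the earlier sketch):

    from sample_factory.cfg.arguments import parse_full_cfg, parse_sf_args
    from sample_factory.train import run_rl
    from sf_examples.vizdoom.doom.doom_params import add_doom_env_args, doom_override_defaults
    from sf_examples.vizdoom.train_vizdoom import register_vizdoom_components

    def resume_training():
        register_vizdoom_components()
        argv = [
            "--env=doom_health_gathering_supreme",
            "--num_workers=8",
            "--num_envs_per_worker=4",
            "--train_for_env_steps=4000000",    # the override logged above
            "--train_dir=/content/train_dir",
            "--experiment=default_experiment",  # restart_behavior=resume finds the checkpoint
        ]
        parser, _ = parse_sf_args(argv=argv)
        add_doom_env_args(parser)
        doom_override_defaults(parser)
        cfg = parse_full_cfg(parser, argv)
        return run_rl(cfg)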
[2023-11-15 07:07:41,813][00663] Saving configuration to /content/train_dir/default_experiment/config.json...
[2023-11-15 07:07:41,820][00663] Rollout worker 0 uses device cpu
[2023-11-15 07:07:41,821][00663] Rollout worker 1 uses device cpu
[2023-11-15 07:07:41,822][00663] Rollout worker 2 uses device cpu
[2023-11-15 07:07:41,824][00663] Rollout worker 3 uses device cpu
[2023-11-15 07:07:41,827][00663] Rollout worker 4 uses device cpu
[2023-11-15 07:07:41,828][00663] Rollout worker 5 uses device cpu
[2023-11-15 07:07:41,829][00663] Rollout worker 6 uses device cpu
[2023-11-15 07:07:41,830][00663] Rollout worker 7 uses device cpu
[2023-11-15 07:07:41,940][00663] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-11-15 07:07:41,943][00663] InferenceWorker_p0-w0: min num requests: 2
[2023-11-15 07:07:41,983][00663] Starting all processes...
[2023-11-15 07:07:41,985][00663] Starting process learner_proc0
[2023-11-15 07:07:42,058][00663] Starting all processes...
[2023-11-15 07:07:42,071][00663] Starting process inference_proc0-0
[2023-11-15 07:07:42,072][00663] Starting process rollout_proc0
[2023-11-15 07:07:42,073][00663] Starting process rollout_proc1
[2023-11-15 07:07:42,073][00663] Starting process rollout_proc2
[2023-11-15 07:07:42,074][00663] Starting process rollout_proc3
[2023-11-15 07:07:42,074][00663] Starting process rollout_proc4
[2023-11-15 07:07:42,074][00663] Starting process rollout_proc5
[2023-11-15 07:07:42,074][00663] Starting process rollout_proc6
[2023-11-15 07:07:42,074][00663] Starting process rollout_proc7
[2023-11-15 07:07:58,211][19966] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-11-15 07:07:58,222][19966] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2023-11-15 07:07:58,305][19966] Num visible devices: 1
[2023-11-15 07:07:58,312][19981] Worker 0 uses CPU cores [0]
[2023-11-15 07:07:58,361][19966] Starting seed is not provided
[2023-11-15 07:07:58,362][19966] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-11-15 07:07:58,363][19966] Initializing actor-critic model on device cuda:0
[2023-11-15 07:07:58,364][19966] RunningMeanStd input shape: (3, 72, 128)
[2023-11-15 07:07:58,365][19966] RunningMeanStd input shape: (1,)
[2023-11-15 07:07:58,521][19966] ConvEncoder: input_channels=3
[2023-11-15 07:07:58,695][19980] Worker 1 uses CPU cores [1]
[2023-11-15 07:07:58,957][19991] Worker 6 uses CPU cores [0]
[2023-11-15 07:07:59,030][19983] Worker 3 uses CPU cores [1]
[2023-11-15 07:07:59,075][19990] Worker 7 uses CPU cores [1]
[2023-11-15 07:07:59,104][19989] Worker 5 uses CPU cores [1]
[2023-11-15 07:07:59,158][19979] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-11-15 07:07:59,160][19979] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2023-11-15 07:07:59,161][19982] Worker 2 uses CPU cores [0]
[2023-11-15 07:07:59,210][19979] Num visible devices: 1
[2023-11-15 07:07:59,212][19984] Worker 4 uses CPU cores [0]
[2023-11-15 07:07:59,270][19966] Conv encoder output size: 512
[2023-11-15 07:07:59,270][19966] Policy head output size: 512
[2023-11-15 07:07:59,295][19966] Created Actor Critic model with architecture:
[2023-11-15 07:07:59,296][19966] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
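Editor's note: for readers who want the printed model in plain PyTorch, here is a rough editor-written equivalent, assuming convnet_simple's usual filter spec of 32 8x8/stride-4, 64 4x4/stride-2, 128 3x3/stride-2 filters (the repr above hides layer hyperparameters; the 512-unit GRU core, the value head, and the 5-way action head are read directly off the dump). The observation and returns normalizers are omitted for brevity.

    import torch
    from torch import nn

    class DoomActorCriticSketch(nn.Module):
        def __init__(self, num_actions: int = 5):
            super().__init__()
            # convnet_simple (assumed filter spec) for (3, 72, 128) observations
            self.conv_head = nn.Sequential(
                nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ELU(),
                nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ELU(),
                nn.Conv2d(64, 128, kernel_size=3, stride=2), nn.ELU(),
            )
            with torch.no_grad():  # infer the flattened conv output size
                n_flat = self.conv_head(torch.zeros(1, 3, 72, 128)).numel()
            self.mlp_layers = nn.Sequential(nn.Linear(n_flat, 512), nn.ELU())
            self.core = nn.GRU(512, 512)              # ModelCoreRNN
            self.critic_linear = nn.Linear(512, 1)    # value head
            self.distribution_linear = nn.Linear(512, num_actions)  # action logits

        def forward(self, obs, rnn_state):
            x = self.mlp_layers(self.conv_head(obs).flatten(1))
            x, rnn_state = self.core(x.unsqueeze(0), rnn_state)
            x = x.squeeze(0)
            return self.distribution_linear(x), self.critic_linear(x), rnn_state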
[2023-11-15 07:07:59,571][19966] Using optimizer <class 'torch.optim.adam.Adam'>
[2023-11-15 07:08:00,098][19966] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000014_57344.pth...
[2023-11-15 07:08:00,151][19966] Loading model from checkpoint
[2023-11-15 07:08:00,153][19966] Loaded experiment state at self.train_step=14, self.env_steps=57344
[2023-11-15 07:08:00,154][19966] Initialized policy 0 weights for model version 14
[2023-11-15 07:08:00,158][19966] LearnerWorker_p0 finished initialization!
[2023-11-15 07:08:00,160][19966] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-11-15 07:08:00,417][19979] RunningMeanStd input shape: (3, 72, 128)
[2023-11-15 07:08:00,419][19979] RunningMeanStd input shape: (1,)
[2023-11-15 07:08:00,438][19979] ConvEncoder: input_channels=3
[2023-11-15 07:08:00,593][19979] Conv encoder output size: 512
[2023-11-15 07:08:00,595][19979] Policy head output size: 512
[2023-11-15 07:08:00,688][00663] Inference worker 0-0 is ready!
[2023-11-15 07:08:00,690][00663] All inference workers are ready! Signal rollout workers to start!
[2023-11-15 07:08:00,954][19981] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-11-15 07:08:00,951][19991] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-11-15 07:08:00,959][19982] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-11-15 07:08:00,956][19984] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-11-15 07:08:01,021][19980] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-11-15 07:08:01,026][19983] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-11-15 07:08:01,009][19989] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-11-15 07:08:01,015][19990] Doom resolution: 160x120, resize resolution: (128, 72)
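Editor's note: each worker renders Doom at 160x120 and resizes to the (128, 72) training resolution, laid out channels-first per pixel_format=CHW; that is the (3, 72, 128) RunningMeanStd shape seen throughout this log. A sketch of that preprocessing (the env wrappers handle this internally; cv2 is just the editor's stand-in):

    import cv2
    import numpy as np

    def preprocess(frame_hwc: np.ndarray) -> np.ndarray:
        """(120, 160, 3) uint8 Doom frame -> (3, 72, 128) CHW observation."""
        resized = cv2.resize(frame_hwc, (128, 72), interpolation=cv2.INTER_AREA)
        return np.transpose(resized, (2, 0, 1))  # HWC -> CHW

    obs = preprocess(np.zeros((120, 160, 3), dtype=np.uint8))
    assert obs.shape == (3, 72, 128)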
[2023-11-15 07:08:01,933][00663] Heartbeat connected on Batcher_0
[2023-11-15 07:08:01,936][00663] Heartbeat connected on LearnerWorker_p0
[2023-11-15 07:08:01,978][00663] Heartbeat connected on InferenceWorker_p0-w0
[2023-11-15 07:08:02,793][19980] Decorrelating experience for 0 frames...
[2023-11-15 07:08:02,792][19990] Decorrelating experience for 0 frames...
[2023-11-15 07:08:02,795][19989] Decorrelating experience for 0 frames...
[2023-11-15 07:08:03,386][19981] Decorrelating experience for 0 frames...
[2023-11-15 07:08:03,404][19991] Decorrelating experience for 0 frames...
[2023-11-15 07:08:03,398][19982] Decorrelating experience for 0 frames...
[2023-11-15 07:08:03,599][19980] Decorrelating experience for 32 frames...
[2023-11-15 07:08:03,762][00663] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 57344. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-11-15 07:08:04,355][19983] Decorrelating experience for 0 frames...
[2023-11-15 07:08:04,587][19984] Decorrelating experience for 0 frames...
[2023-11-15 07:08:05,168][19991] Decorrelating experience for 32 frames...
[2023-11-15 07:08:05,180][19982] Decorrelating experience for 32 frames...
[2023-11-15 07:08:05,183][19981] Decorrelating experience for 32 frames...
[2023-11-15 07:08:05,932][19990] Decorrelating experience for 32 frames...
[2023-11-15 07:08:05,944][19984] Decorrelating experience for 32 frames...
[2023-11-15 07:08:06,438][19982] Decorrelating experience for 64 frames...
[2023-11-15 07:08:06,712][19980] Decorrelating experience for 64 frames...
[2023-11-15 07:08:06,724][19989] Decorrelating experience for 32 frames...
[2023-11-15 07:08:06,732][19983] Decorrelating experience for 32 frames...
[2023-11-15 07:08:06,747][19984] Decorrelating experience for 64 frames...
[2023-11-15 07:08:07,784][19990] Decorrelating experience for 64 frames...
[2023-11-15 07:08:07,917][19991] Decorrelating experience for 64 frames...
[2023-11-15 07:08:07,948][19981] Decorrelating experience for 64 frames...
[2023-11-15 07:08:08,132][19984] Decorrelating experience for 96 frames...
[2023-11-15 07:08:08,267][19980] Decorrelating experience for 96 frames...
[2023-11-15 07:08:08,278][00663] Heartbeat connected on RolloutWorker_w4
[2023-11-15 07:08:08,635][00663] Heartbeat connected on RolloutWorker_w1
[2023-11-15 07:08:08,684][19989] Decorrelating experience for 64 frames...
[2023-11-15 07:08:08,762][00663] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 57344. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-11-15 07:08:09,142][19982] Decorrelating experience for 96 frames...
[2023-11-15 07:08:09,431][00663] Heartbeat connected on RolloutWorker_w2
[2023-11-15 07:08:09,668][19983] Decorrelating experience for 64 frames...
[2023-11-15 07:08:09,688][19990] Decorrelating experience for 96 frames...
[2023-11-15 07:08:09,864][19981] Decorrelating experience for 96 frames...
[2023-11-15 07:08:10,087][00663] Heartbeat connected on RolloutWorker_w7
[2023-11-15 07:08:10,312][00663] Heartbeat connected on RolloutWorker_w0
[2023-11-15 07:08:11,282][19991] Decorrelating experience for 96 frames...
[2023-11-15 07:08:12,207][00663] Heartbeat connected on RolloutWorker_w6
[2023-11-15 07:08:12,247][19989] Decorrelating experience for 96 frames...
[2023-11-15 07:08:12,387][19983] Decorrelating experience for 96 frames...
[2023-11-15 07:08:12,850][00663] Heartbeat connected on RolloutWorker_w5
[2023-11-15 07:08:12,974][00663] Heartbeat connected on RolloutWorker_w3
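Editor's note: the staggered "Decorrelating experience for 0/32/64/96 frames" lines come from decorrelate_envs_on_one_worker=True: each of the 4 envs on a worker warms up for a different multiple of the 32-step rollout so workers do not deliver trajectories in lockstep. The offsets above are consistent with a simple index-times-rollout schedule (the editor's reading of the log, not a quote of the implementation):

    rollout = 32             # cfg: rollout=32
    num_envs_per_worker = 4  # cfg: num_envs_per_worker=4

    # Warm-up frames per env instance on one worker: exactly the
    # 0, 32, 64, 96 values printed by the rollout workers above.
    offsets = [env_idx * rollout for env_idx in range(num_envs_per_worker)]
    print(offsets)  # [0, 32, 64, 96]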
[2023-11-15 07:08:13,273][19966] Signal inference workers to stop experience collection...
[2023-11-15 07:08:13,294][19979] InferenceWorker_p0-w0: stopping experience collection
[2023-11-15 07:08:13,762][00663] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 57344. Throughput: 0: 209.6. Samples: 2096. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-11-15 07:08:13,768][00663] Avg episode reward: [(0, '2.661')]
[2023-11-15 07:08:13,827][19966] Signal inference workers to resume experience collection...
[2023-11-15 07:08:13,832][19979] InferenceWorker_p0-w0: resuming experience collection
[2023-11-15 07:08:18,762][00663] Fps is (10 sec: 1638.4, 60 sec: 1092.3, 300 sec: 1092.3). Total num frames: 73728. Throughput: 0: 350.5. Samples: 5258. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-11-15 07:08:18,766][00663] Avg episode reward: [(0, '3.394')]
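Editor's note: the frame counter moves in multiples of 4096 because summaries count env frames rather than policy steps (summaries_use_frameskip=True): with frameskip 4, one full sampling round of 8 workers x 4 envs x 32-step rollouts is exactly one 1024-transition batch, i.e. 4096 frames. A quick check against the jump from 57344 to 73728 above (editor's arithmetic from the config values):

    num_workers, envs_per_worker, rollout, frameskip = 8, 4, 32, 4
    transitions_per_round = num_workers * envs_per_worker * rollout  # 1024 = batch_size
    frames_per_round = transitions_per_round * frameskip             # 4096
    print((73728 - 57344) / frames_per_round)  # 4.0 sampling rounds in that window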
[2023-11-15 07:08:23,762][00663] Fps is (10 sec: 2867.2, 60 sec: 1433.6, 300 sec: 1433.6). Total num frames: 86016. Throughput: 0: 367.9. Samples: 7358. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-11-15 07:08:23,764][00663] Avg episode reward: [(0, '3.931')]
[2023-11-15 07:08:26,558][19979] Updated weights for policy 0, policy_version 24 (0.0200)
[2023-11-15 07:08:28,762][00663] Fps is (10 sec: 2867.2, 60 sec: 1802.2, 300 sec: 1802.2). Total num frames: 102400. Throughput: 0: 459.0. Samples: 11476. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-11-15 07:08:28,765][00663] Avg episode reward: [(0, '4.285')]
[2023-11-15 07:08:28,775][19966] Saving new best policy, reward=4.285!
[2023-11-15 07:08:33,762][00663] Fps is (10 sec: 3276.8, 60 sec: 2048.0, 300 sec: 2048.0). Total num frames: 118784. Throughput: 0: 539.9. Samples: 16198. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-11-15 07:08:33,771][00663] Avg episode reward: [(0, '4.724')]
[2023-11-15 07:08:33,773][19966] Saving new best policy, reward=4.724!
[2023-11-15 07:08:38,291][19979] Updated weights for policy 0, policy_version 34 (0.0027)
[2023-11-15 07:08:38,762][00663] Fps is (10 sec: 3686.4, 60 sec: 2340.6, 300 sec: 2340.6). Total num frames: 139264. Throughput: 0: 553.9. Samples: 19386. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-11-15 07:08:38,770][00663] Avg episode reward: [(0, '4.647')]
[2023-11-15 07:08:43,766][00663] Fps is (10 sec: 3684.8, 60 sec: 2457.3, 300 sec: 2457.3). Total num frames: 155648. Throughput: 0: 629.5. Samples: 25184. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-11-15 07:08:43,769][00663] Avg episode reward: [(0, '4.410')]
[2023-11-15 07:08:48,762][00663] Fps is (10 sec: 3276.8, 60 sec: 2548.6, 300 sec: 2548.6). Total num frames: 172032. Throughput: 0: 651.4. Samples: 29314. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-11-15 07:08:48,776][00663] Avg episode reward: [(0, '4.270')]
[2023-11-15 07:08:51,360][19979] Updated weights for policy 0, policy_version 44 (0.0024)
[2023-11-15 07:08:53,765][00663] Fps is (10 sec: 2867.7, 60 sec: 2539.4, 300 sec: 2539.4). Total num frames: 184320. Throughput: 0: 698.6. Samples: 31440. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-11-15 07:08:53,767][00663] Avg episode reward: [(0, '4.391')]
[2023-11-15 07:08:58,762][00663] Fps is (10 sec: 3276.8, 60 sec: 2681.0, 300 sec: 2681.0). Total num frames: 204800. Throughput: 0: 760.9. Samples: 36338. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-11-15 07:08:58,765][00663] Avg episode reward: [(0, '4.465')]
[2023-11-15 07:09:02,545][19979] Updated weights for policy 0, policy_version 54 (0.0013)
[2023-11-15 07:09:03,762][00663] Fps is (10 sec: 4097.0, 60 sec: 2798.9, 300 sec: 2798.9). Total num frames: 225280. Throughput: 0: 835.6. Samples: 42858. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-11-15 07:09:03,770][00663] Avg episode reward: [(0, '4.373')]
[2023-11-15 07:09:08,762][00663] Fps is (10 sec: 3686.4, 60 sec: 3072.0, 300 sec: 2835.7). Total num frames: 241664. Throughput: 0: 853.3. Samples: 45758. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-11-15 07:09:08,770][00663] Avg episode reward: [(0, '4.497')]
[2023-11-15 07:09:13,764][00663] Fps is (10 sec: 2866.6, 60 sec: 3276.7, 300 sec: 2808.6). Total num frames: 253952. Throughput: 0: 851.2. Samples: 49782. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-11-15 07:09:13,770][00663] Avg episode reward: [(0, '4.483')]
[2023-11-15 07:09:15,696][19979] Updated weights for policy 0, policy_version 64 (0.0016)
[2023-11-15 07:09:18,762][00663] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 2839.9). Total num frames: 270336. Throughput: 0: 839.4. Samples: 53970. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-11-15 07:09:18,765][00663] Avg episode reward: [(0, '4.471')]
[2023-11-15 07:09:23,762][00663] Fps is (10 sec: 3277.5, 60 sec: 3345.1, 300 sec: 2867.2). Total num frames: 286720. Throughput: 0: 817.0. Samples: 56150. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-11-15 07:09:23,768][00663] Avg episode reward: [(0, '4.289')]
[2023-11-15 07:09:27,230][19979] Updated weights for policy 0, policy_version 74 (0.0031)
[2023-11-15 07:09:28,762][00663] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 2939.5). Total num frames: 307200. Throughput: 0: 834.0. Samples: 62710. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-11-15 07:09:28,765][00663] Avg episode reward: [(0, '4.262')]
[2023-11-15 07:09:33,762][00663] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 2958.2). Total num frames: 323584. Throughput: 0: 860.8. Samples: 68050. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-11-15 07:09:33,767][00663] Avg episode reward: [(0, '4.485')]
[2023-11-15 07:09:38,763][00663] Fps is (10 sec: 3276.4, 60 sec: 3345.0, 300 sec: 2975.0). Total num frames: 339968. Throughput: 0: 858.3. Samples: 70062. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-11-15 07:09:38,767][00663] Avg episode reward: [(0, '4.541')]
[2023-11-15 07:09:38,785][19966] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000083_339968.pth...
[2023-11-15 07:09:38,934][19966] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000002_8192.pth
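Editor's note: checkpoints are named checkpoint_<train_step>_<env_steps>.pth, and with keep_checkpoints=2 the oldest is deleted each time a new one lands (here the pre-resume checkpoint_000000002_8192.pth). A tiny parser for that naming scheme (the pattern is read off the paths in this log):

    import re

    def parse_checkpoint_name(path: str) -> tuple[int, int]:
        """Return (train_step, env_steps) from a Sample Factory checkpoint path."""
        m = re.search(r"checkpoint_(\d+)_(\d+)\.pth$", path)
        if m is None:
            raise ValueError(f"not a checkpoint path: {path}")
        return int(m.group(1)), int(m.group(2))

    print(parse_checkpoint_name(
        "/content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000083_339968.pth"
    ))  # (83, 339968) -> saved at train_step 83 after 339968 env frames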
[2023-11-15 07:09:40,026][19979] Updated weights for policy 0, policy_version 84 (0.0016)
[2023-11-15 07:09:43,762][00663] Fps is (10 sec: 2867.2, 60 sec: 3277.0, 300 sec: 2949.1). Total num frames: 352256. Throughput: 0: 837.2. Samples: 74012. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-11-15 07:09:43,765][00663] Avg episode reward: [(0, '4.445')]
[2023-11-15 07:09:48,762][00663] Fps is (10 sec: 2867.6, 60 sec: 3276.8, 300 sec: 2964.7). Total num frames: 368640. Throughput: 0: 806.8. Samples: 79166. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2023-11-15 07:09:48,764][00663] Avg episode reward: [(0, '4.373')]
[2023-11-15 07:09:51,863][19979] Updated weights for policy 0, policy_version 94 (0.0025)
[2023-11-15 07:09:53,762][00663] Fps is (10 sec: 4095.9, 60 sec: 3481.7, 300 sec: 3053.4). Total num frames: 393216. Throughput: 0: 814.3. Samples: 82400. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-11-15 07:09:53,765][00663] Avg episode reward: [(0, '4.281')]
[2023-11-15 07:09:58,767][00663] Fps is (10 sec: 4093.9, 60 sec: 3413.0, 300 sec: 3063.0). Total num frames: 409600. Throughput: 0: 855.6. Samples: 88288. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-11-15 07:09:58,774][00663] Avg episode reward: [(0, '4.477')]
[2023-11-15 07:10:03,762][00663] Fps is (10 sec: 2867.3, 60 sec: 3276.8, 300 sec: 3037.9). Total num frames: 421888. Throughput: 0: 855.4. Samples: 92462. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-11-15 07:10:03,768][00663] Avg episode reward: [(0, '4.386')]
[2023-11-15 07:10:03,863][19979] Updated weights for policy 0, policy_version 104 (0.0017)
[2023-11-15 07:10:08,762][00663] Fps is (10 sec: 2868.7, 60 sec: 3276.8, 300 sec: 3047.4). Total num frames: 438272. Throughput: 0: 852.7. Samples: 94522. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-11-15 07:10:08,769][00663] Avg episode reward: [(0, '4.582')]
[2023-11-15 07:10:13,762][00663] Fps is (10 sec: 3276.8, 60 sec: 3345.2, 300 sec: 3056.2). Total num frames: 454656. Throughput: 0: 814.4. Samples: 99360. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-11-15 07:10:13,769][00663] Avg episode reward: [(0, '4.644')]
[2023-11-15 07:10:16,052][19979] Updated weights for policy 0, policy_version 114 (0.0019)
[2023-11-15 07:10:18,762][00663] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3094.8). Total num frames: 475136. Throughput: 0: 842.8. Samples: 105974. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-11-15 07:10:18,765][00663] Avg episode reward: [(0, '4.720')]
[2023-11-15 07:10:23,762][00663] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3130.5). Total num frames: 495616. Throughput: 0: 865.4. Samples: 109004. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-11-15 07:10:23,766][00663] Avg episode reward: [(0, '4.640')]
[2023-11-15 07:10:27,634][19979] Updated weights for policy 0, policy_version 124 (0.0023)
[2023-11-15 07:10:28,766][00663] Fps is (10 sec: 3275.4, 60 sec: 3344.8, 300 sec: 3107.2). Total num frames: 507904. Throughput: 0: 870.0. Samples: 113166. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-11-15 07:10:28,769][00663] Avg episode reward: [(0, '4.730')]
[2023-11-15 07:10:28,789][19966] Saving new best policy, reward=4.730!
[2023-11-15 07:10:33,762][00663] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3113.0). Total num frames: 524288. Throughput: 0: 845.2. Samples: 117202. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-11-15 07:10:33,767][00663] Avg episode reward: [(0, '4.599')]
[2023-11-15 07:10:38,762][00663] Fps is (10 sec: 3278.2, 60 sec: 3345.1, 300 sec: 3118.2). Total num frames: 540672. Throughput: 0: 818.8. Samples: 119244. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-11-15 07:10:38,767][00663] Avg episode reward: [(0, '4.318')]
[2023-11-15 07:10:40,527][19979] Updated weights for policy 0, policy_version 134 (0.0016)
[2023-11-15 07:10:43,762][00663] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3148.8). Total num frames: 561152. Throughput: 0: 833.2. Samples: 125780. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-11-15 07:10:43,765][00663] Avg episode reward: [(0, '4.426')]
[2023-11-15 07:10:48,762][00663] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3152.7). Total num frames: 577536. Throughput: 0: 867.7. Samples: 131508. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-11-15 07:10:48,768][00663] Avg episode reward: [(0, '4.730')]
[2023-11-15 07:10:52,217][19979] Updated weights for policy 0, policy_version 144 (0.0013)
[2023-11-15 07:10:53,762][00663] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3156.3). Total num frames: 593920. Throughput: 0: 866.7. Samples: 133524. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-11-15 07:10:53,764][00663] Avg episode reward: [(0, '4.778')]
[2023-11-15 07:10:53,771][19966] Saving new best policy, reward=4.778!
[2023-11-15 07:10:58,763][00663] Fps is (10 sec: 2867.1, 60 sec: 3277.0, 300 sec: 3136.4). Total num frames: 606208. Throughput: 0: 849.4. Samples: 137584. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-11-15 07:10:58,764][00663] Avg episode reward: [(0, '4.649')]
[2023-11-15 07:11:03,762][00663] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3140.3). Total num frames: 622592. Throughput: 0: 807.4. Samples: 142308. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-11-15 07:11:03,764][00663] Avg episode reward: [(0, '4.625')]
[2023-11-15 07:11:05,355][19979] Updated weights for policy 0, policy_version 154 (0.0040)
[2023-11-15 07:11:08,762][00663] Fps is (10 sec: 3686.6, 60 sec: 3413.3, 300 sec: 3166.1). Total num frames: 643072. Throughput: 0: 809.8. Samples: 145446. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-11-15 07:11:08,768][00663] Avg episode reward: [(0, '4.526')]
[2023-11-15 07:11:13,763][00663] Fps is (10 sec: 4095.5, 60 sec: 3481.5, 300 sec: 3190.5). Total num frames: 663552. Throughput: 0: 853.2. Samples: 151558. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-11-15 07:11:13,766][00663] Avg episode reward: [(0, '4.571')]
[2023-11-15 07:11:16,666][19979] Updated weights for policy 0, policy_version 164 (0.0012)
[2023-11-15 07:11:18,762][00663] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3171.8). Total num frames: 675840. Throughput: 0: 854.5. Samples: 155654. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-11-15 07:11:18,766][00663] Avg episode reward: [(0, '4.517')]
[2023-11-15 07:11:23,762][00663] Fps is (10 sec: 2457.9, 60 sec: 3208.5, 300 sec: 3153.9). Total num frames: 688128. Throughput: 0: 853.8. Samples: 157666. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-11-15 07:11:23,771][00663] Avg episode reward: [(0, '4.434')]
[2023-11-15 07:11:28,762][00663] Fps is (10 sec: 2867.2, 60 sec: 3277.0, 300 sec: 3156.9). Total num frames: 704512. Throughput: 0: 800.4. Samples: 161798. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-11-15 07:11:28,769][00663] Avg episode reward: [(0, '4.391')]
[2023-11-15 07:11:30,363][19979] Updated weights for policy 0, policy_version 174 (0.0035)
[2023-11-15 07:11:33,762][00663] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3179.3). Total num frames: 724992. Throughput: 0: 812.1. Samples: 168052. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2023-11-15 07:11:33,765][00663] Avg episode reward: [(0, '4.618')]
[2023-11-15 07:11:38,762][00663] Fps is (10 sec: 4096.0, 60 sec: 3413.3, 300 sec: 3200.6). Total num frames: 745472. Throughput: 0: 838.8. Samples: 171268. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-11-15 07:11:38,771][00663] Avg episode reward: [(0, '4.700')]
[2023-11-15 07:11:38,786][19966] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000182_745472.pth...
[2023-11-15 07:11:38,927][19966] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000014_57344.pth
[2023-11-15 07:11:41,398][19979] Updated weights for policy 0, policy_version 184 (0.0032)
[2023-11-15 07:11:43,762][00663] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3183.7). Total num frames: 757760. Throughput: 0: 840.9. Samples: 175426. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-11-15 07:11:43,769][00663] Avg episode reward: [(0, '4.683')]
[2023-11-15 07:11:48,762][00663] Fps is (10 sec: 2457.6, 60 sec: 3208.5, 300 sec: 3167.6). Total num frames: 770048. Throughput: 0: 811.0. Samples: 178804. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-11-15 07:11:48,766][00663] Avg episode reward: [(0, '4.519')]
[2023-11-15 07:11:53,762][00663] Fps is (10 sec: 2048.0, 60 sec: 3072.0, 300 sec: 3134.3). Total num frames: 778240. Throughput: 0: 777.2. Samples: 180420. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-11-15 07:11:53,764][00663] Avg episode reward: [(0, '4.641')]
[2023-11-15 07:11:58,762][00663] Fps is (10 sec: 2048.0, 60 sec: 3072.0, 300 sec: 3119.9). Total num frames: 790528. Throughput: 0: 714.2. Samples: 183698. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-11-15 07:11:58,770][00663] Avg episode reward: [(0, '4.663')]
[2023-11-15 07:11:59,579][19979] Updated weights for policy 0, policy_version 194 (0.0015)
[2023-11-15 07:12:03,762][00663] Fps is (10 sec: 2867.2, 60 sec: 3072.0, 300 sec: 3123.2). Total num frames: 806912. Throughput: 0: 723.7. Samples: 188222. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-11-15 07:12:03,764][00663] Avg episode reward: [(0, '4.870')]
[2023-11-15 07:12:03,768][19966] Saving new best policy, reward=4.870!
[2023-11-15 07:12:08,762][00663] Fps is (10 sec: 3276.8, 60 sec: 3003.7, 300 sec: 3126.3). Total num frames: 823296. Throughput: 0: 747.8. Samples: 191316. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-11-15 07:12:08,767][00663] Avg episode reward: [(0, '4.896')]
[2023-11-15 07:12:08,779][19966] Saving new best policy, reward=4.896!
[2023-11-15 07:12:12,261][19979] Updated weights for policy 0, policy_version 204 (0.0040)
[2023-11-15 07:12:13,762][00663] Fps is (10 sec: 2867.2, 60 sec: 2867.3, 300 sec: 3113.0). Total num frames: 835584. Throughput: 0: 745.3. Samples: 195338. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-11-15 07:12:13,766][00663] Avg episode reward: [(0, '4.925')]
[2023-11-15 07:12:13,772][19966] Saving new best policy, reward=4.925!
[2023-11-15 07:12:18,762][00663] Fps is (10 sec: 2867.2, 60 sec: 2935.5, 300 sec: 3116.2). Total num frames: 851968. Throughput: 0: 694.7. Samples: 199314. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-11-15 07:12:18,764][00663] Avg episode reward: [(0, '4.977')]
[2023-11-15 07:12:18,776][19966] Saving new best policy, reward=4.977!
[2023-11-15 07:12:23,762][00663] Fps is (10 sec: 3276.8, 60 sec: 3003.7, 300 sec: 3119.3). Total num frames: 868352. Throughput: 0: 667.2. Samples: 201290. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-11-15 07:12:23,769][00663] Avg episode reward: [(0, '5.019')]
[2023-11-15 07:12:23,773][19966] Saving new best policy, reward=5.019!
[2023-11-15 07:12:25,772][19979] Updated weights for policy 0, policy_version 214 (0.0019)
[2023-11-15 07:12:28,762][00663] Fps is (10 sec: 3686.4, 60 sec: 3072.0, 300 sec: 3137.7). Total num frames: 888832. Throughput: 0: 713.4. Samples: 207530. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-11-15 07:12:28,764][00663] Avg episode reward: [(0, '4.966')]
[2023-11-15 07:12:33,762][00663] Fps is (10 sec: 4096.0, 60 sec: 3072.0, 300 sec: 3155.4). Total num frames: 909312. Throughput: 0: 774.4. Samples: 213652. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2023-11-15 07:12:33,768][00663] Avg episode reward: [(0, '4.829')]
[2023-11-15 07:12:36,803][19979] Updated weights for policy 0, policy_version 224 (0.0020)
[2023-11-15 07:12:38,762][00663] Fps is (10 sec: 3276.8, 60 sec: 2935.5, 300 sec: 3142.8). Total num frames: 921600. Throughput: 0: 784.5. Samples: 215722. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-11-15 07:12:38,766][00663] Avg episode reward: [(0, '4.808')]
[2023-11-15 07:12:43,768][00663] Fps is (10 sec: 2456.2, 60 sec: 2935.2, 300 sec: 3130.5). Total num frames: 933888. Throughput: 0: 804.8. Samples: 219918. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-11-15 07:12:43,775][00663] Avg episode reward: [(0, '4.853')]
[2023-11-15 07:12:48,762][00663] Fps is (10 sec: 2867.2, 60 sec: 3003.7, 300 sec: 3133.1). Total num frames: 950272. Throughput: 0: 806.8. Samples: 224528. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-11-15 07:12:48,769][00663] Avg episode reward: [(0, '5.098')]
[2023-11-15 07:12:48,778][19966] Saving new best policy, reward=5.098!
[2023-11-15 07:12:49,930][19979] Updated weights for policy 0, policy_version 234 (0.0018)
[2023-11-15 07:12:53,762][00663] Fps is (10 sec: 4098.3, 60 sec: 3276.8, 300 sec: 3163.8). Total num frames: 974848. Throughput: 0: 810.1. Samples: 227770. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-11-15 07:12:53,769][00663] Avg episode reward: [(0, '5.110')]
[2023-11-15 07:12:53,773][19966] Saving new best policy, reward=5.110!
[2023-11-15 07:12:58,762][00663] Fps is (10 sec: 4505.6, 60 sec: 3413.3, 300 sec: 3179.6). Total num frames: 995328. Throughput: 0: 865.2. Samples: 234274. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-11-15 07:12:58,764][00663] Avg episode reward: [(0, '5.102')]
[2023-11-15 07:13:00,260][19979] Updated weights for policy 0, policy_version 244 (0.0019)
[2023-11-15 07:13:03,762][00663] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3221.3). Total num frames: 1007616. Throughput: 0: 870.0. Samples: 238462. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-11-15 07:13:03,764][00663] Avg episode reward: [(0, '5.344')]
[2023-11-15 07:13:03,766][19966] Saving new best policy, reward=5.344!
[2023-11-15 07:13:08,764][00663] Fps is (10 sec: 2457.2, 60 sec: 3276.7, 300 sec: 3262.9). Total num frames: 1019904. Throughput: 0: 871.9. Samples: 240526. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-11-15 07:13:08,769][00663] Avg episode reward: [(0, '5.385')]
[2023-11-15 07:13:08,782][19966] Saving new best policy, reward=5.385!
[2023-11-15 07:13:13,762][00663] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3262.9). Total num frames: 1036288. Throughput: 0: 822.9. Samples: 244560. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-11-15 07:13:13,771][00663] Avg episode reward: [(0, '5.240')]
[2023-11-15 07:13:14,440][19979] Updated weights for policy 0, policy_version 254 (0.0030)
[2023-11-15 07:13:18,762][00663] Fps is (10 sec: 3686.9, 60 sec: 3413.3, 300 sec: 3290.7). Total num frames: 1056768. Throughput: 0: 833.1. Samples: 251142. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-11-15 07:13:18,773][00663] Avg episode reward: [(0, '4.900')]
[2023-11-15 07:13:23,762][00663] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3304.6). Total num frames: 1077248. Throughput: 0: 858.5. Samples: 254356. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-11-15 07:13:23,767][00663] Avg episode reward: [(0, '5.089')]
[2023-11-15 07:13:24,222][19979] Updated weights for policy 0, policy_version 264 (0.0027)
[2023-11-15 07:13:28,763][00663] Fps is (10 sec: 3685.9, 60 sec: 3413.3, 300 sec: 3304.6). Total num frames: 1093632. Throughput: 0: 865.1. Samples: 258842. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-11-15 07:13:28,773][00663] Avg episode reward: [(0, '4.978')]
[2023-11-15 07:13:33,762][00663] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3276.8). Total num frames: 1105920. Throughput: 0: 851.3. Samples: 262836. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-11-15 07:13:33,767][00663] Avg episode reward: [(0, '5.092')]
[2023-11-15 07:13:38,762][00663] Fps is (10 sec: 2457.9, 60 sec: 3276.8, 300 sec: 3263.0). Total num frames: 1118208. Throughput: 0: 824.0. Samples: 264850. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-11-15 07:13:38,764][00663] Avg episode reward: [(0, '5.285')]
[2023-11-15 07:13:38,777][19966] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000273_1118208.pth...
[2023-11-15 07:13:38,884][19966] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000083_339968.pth
[2023-11-15 07:13:39,026][19979] Updated weights for policy 0, policy_version 274 (0.0029)
[2023-11-15 07:13:43,762][00663] Fps is (10 sec: 3276.8, 60 sec: 3413.6, 300 sec: 3276.8). Total num frames: 1138688. Throughput: 0: 808.0. Samples: 270634. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-11-15 07:13:43,768][00663] Avg episode reward: [(0, '5.544')]
[2023-11-15 07:13:43,772][19966] Saving new best policy, reward=5.544!
[2023-11-15 07:13:48,734][19979] Updated weights for policy 0, policy_version 284 (0.0017)
[2023-11-15 07:13:48,762][00663] Fps is (10 sec: 4505.6, 60 sec: 3549.9, 300 sec: 3318.5). Total num frames: 1163264. Throughput: 0: 855.3. Samples: 276950. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-11-15 07:13:48,769][00663] Avg episode reward: [(0, '5.514')]
[2023-11-15 07:13:53,762][00663] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3290.7). Total num frames: 1175552. Throughput: 0: 856.7. Samples: 279074. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-11-15 07:13:53,766][00663] Avg episode reward: [(0, '6.035')]
[2023-11-15 07:13:53,772][19966] Saving new best policy, reward=6.035!
[2023-11-15 07:13:58,762][00663] Fps is (10 sec: 2457.6, 60 sec: 3208.5, 300 sec: 3262.9). Total num frames: 1187840. Throughput: 0: 859.4. Samples: 283234. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-11-15 07:13:58,764][00663] Avg episode reward: [(0, '6.600')]
[2023-11-15 07:13:58,783][19966] Saving new best policy, reward=6.600!
[2023-11-15 07:14:03,710][19979] Updated weights for policy 0, policy_version 294 (0.0038)
[2023-11-15 07:14:03,762][00663] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3262.9). Total num frames: 1204224. Throughput: 0: 806.9. Samples: 287452. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-11-15 07:14:03,765][00663] Avg episode reward: [(0, '6.504')]
[2023-11-15 07:14:08,762][00663] Fps is (10 sec: 3686.4, 60 sec: 3413.4, 300 sec: 3290.7). Total num frames: 1224704. Throughput: 0: 807.4. Samples: 290690. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-11-15 07:14:08,765][00663] Avg episode reward: [(0, '5.903')]
[2023-11-15 07:14:12,973][19979] Updated weights for policy 0, policy_version 304 (0.0022)
[2023-11-15 07:14:13,764][00663] Fps is (10 sec: 4095.2, 60 sec: 3481.5, 300 sec: 3304.5). Total num frames: 1245184. Throughput: 0: 855.7. Samples: 297348. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-11-15 07:14:13,771][00663] Avg episode reward: [(0, '6.242')]
[2023-11-15 07:14:18,762][00663] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3304.6). Total num frames: 1261568. Throughput: 0: 866.2. Samples: 301814. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-11-15 07:14:18,765][00663] Avg episode reward: [(0, '6.763')]
[2023-11-15 07:14:18,776][19966] Saving new best policy, reward=6.763!
[2023-11-15 07:14:23,768][00663] Fps is (10 sec: 2866.0, 60 sec: 3276.4, 300 sec: 3276.7). Total num frames: 1273856. Throughput: 0: 867.0. Samples: 303870. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-11-15 07:14:23,774][00663] Avg episode reward: [(0, '6.646')]
[2023-11-15 07:14:27,552][19979] Updated weights for policy 0, policy_version 314 (0.0023)
[2023-11-15 07:14:28,762][00663] Fps is (10 sec: 2457.6, 60 sec: 3208.6, 300 sec: 3262.9). Total num frames: 1286144. Throughput: 0: 828.8. Samples: 307928. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-11-15 07:14:28,765][00663] Avg episode reward: [(0, '6.474')]
[2023-11-15 07:14:33,762][00663] Fps is (10 sec: 3688.8, 60 sec: 3413.3, 300 sec: 3290.7). Total num frames: 1310720. Throughput: 0: 827.9. Samples: 314204. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-11-15 07:14:33,768][00663] Avg episode reward: [(0, '6.199')]
[2023-11-15 07:14:37,225][19979] Updated weights for policy 0, policy_version 324 (0.0025)
[2023-11-15 07:14:38,762][00663] Fps is (10 sec: 4505.6, 60 sec: 3549.9, 300 sec: 3318.5). Total num frames: 1331200. Throughput: 0: 852.6. Samples: 317440. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-11-15 07:14:38,765][00663] Avg episode reward: [(0, '7.263')]
[2023-11-15 07:14:38,779][19966] Saving new best policy, reward=7.263!
[2023-11-15 07:14:43,766][00663] Fps is (10 sec: 3275.4, 60 sec: 3413.1, 300 sec: 3304.5). Total num frames: 1343488. Throughput: 0: 868.9. Samples: 322336. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2023-11-15 07:14:43,769][00663] Avg episode reward: [(0, '7.386')]
[2023-11-15 07:14:43,802][19966] Saving new best policy, reward=7.386!
[2023-11-15 07:14:48,762][00663] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3276.8). Total num frames: 1359872. Throughput: 0: 865.1. Samples: 326380. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-11-15 07:14:48,766][00663] Avg episode reward: [(0, '7.307')]
[2023-11-15 07:14:51,319][19979] Updated weights for policy 0, policy_version 334 (0.0024)
[2023-11-15 07:14:53,762][00663] Fps is (10 sec: 2868.4, 60 sec: 3276.8, 300 sec: 3263.0). Total num frames: 1372160. Throughput: 0: 838.4. Samples: 328420. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-11-15 07:14:53,765][00663] Avg episode reward: [(0, '8.018')]
[2023-11-15 07:14:53,772][19966] Saving new best policy, reward=8.018!
[2023-11-15 07:14:58,762][00663] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3290.7). Total num frames: 1392640. Throughput: 0: 817.5. Samples: 334136. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-11-15 07:14:58,764][00663] Avg episode reward: [(0, '8.134')]
[2023-11-15 07:14:58,775][19966] Saving new best policy, reward=8.134!
[2023-11-15 07:15:01,872][19979] Updated weights for policy 0, policy_version 344 (0.0023)
[2023-11-15 07:15:03,762][00663] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3304.6). Total num frames: 1413120. Throughput: 0: 861.6. Samples: 340586. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-11-15 07:15:03,771][00663] Avg episode reward: [(0, '7.804')]
[2023-11-15 07:15:08,762][00663] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3304.6). Total num frames: 1429504. Throughput: 0: 859.8. Samples: 342554. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2023-11-15 07:15:08,771][00663] Avg episode reward: [(0, '7.723')]
[2023-11-15 07:15:13,765][00663] Fps is (10 sec: 2866.3, 60 sec: 3276.7, 300 sec: 3276.8). Total num frames: 1441792. Throughput: 0: 860.8. Samples: 346668. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-11-15 07:15:13,768][00663] Avg episode reward: [(0, '8.244')]
[2023-11-15 07:15:13,772][19966] Saving new best policy, reward=8.244!
[2023-11-15 07:15:15,793][19979] Updated weights for policy 0, policy_version 354 (0.0013)
[2023-11-15 07:15:18,767][00663] Fps is (10 sec: 2456.3, 60 sec: 3208.2, 300 sec: 3249.0). Total num frames: 1454080. Throughput: 0: 813.5. Samples: 350814. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-11-15 07:15:18,773][00663] Avg episode reward: [(0, '8.625')]
[2023-11-15 07:15:18,789][19966] Saving new best policy, reward=8.625!
[2023-11-15 07:15:23,762][00663] Fps is (10 sec: 3687.6, 60 sec: 3413.7, 300 sec: 3290.7). Total num frames: 1478656. Throughput: 0: 809.6. Samples: 353872. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-11-15 07:15:23,769][00663] Avg episode reward: [(0, '8.783')]
[2023-11-15 07:15:23,775][19966] Saving new best policy, reward=8.783!
[2023-11-15 07:15:26,590][19979] Updated weights for policy 0, policy_version 364 (0.0024)
[2023-11-15 07:15:28,762][00663] Fps is (10 sec: 4508.0, 60 sec: 3549.9, 300 sec: 3304.6). Total num frames: 1499136. Throughput: 0: 844.1. Samples: 360318. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-11-15 07:15:28,768][00663] Avg episode reward: [(0, '8.611')]
[2023-11-15 07:15:33,762][00663] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3290.7). Total num frames: 1511424. Throughput: 0: 857.2. Samples: 364952. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-11-15 07:15:33,764][00663] Avg episode reward: [(0, '8.294')]
[2023-11-15 07:15:38,762][00663] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3276.8). Total num frames: 1527808. Throughput: 0: 856.7. Samples: 366970. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-11-15 07:15:38,765][00663] Avg episode reward: [(0, '8.250')]
[2023-11-15 07:15:38,779][19966] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000373_1527808.pth...
[2023-11-15 07:15:38,941][19966] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000182_745472.pth
[2023-11-15 07:15:40,253][19979] Updated weights for policy 0, policy_version 374 (0.0013)
[2023-11-15 07:15:43,762][00663] Fps is (10 sec: 2867.2, 60 sec: 3277.0, 300 sec: 3262.9). Total num frames: 1540096. Throughput: 0: 818.4. Samples: 370962. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-11-15 07:15:43,769][00663] Avg episode reward: [(0, '8.407')]
[2023-11-15 07:15:48,762][00663] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3276.8). Total num frames: 1560576. Throughput: 0: 806.8. Samples: 376894. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-11-15 07:15:48,767][00663] Avg episode reward: [(0, '8.802')]
[2023-11-15 07:15:48,779][19966] Saving new best policy, reward=8.802!
[2023-11-15 07:15:51,116][19979] Updated weights for policy 0, policy_version 384 (0.0028)
[2023-11-15 07:15:53,762][00663] Fps is (10 sec: 4505.6, 60 sec: 3549.9, 300 sec: 3318.5). Total num frames: 1585152. Throughput: 0: 836.4. Samples: 380192. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-11-15 07:15:53,765][00663] Avg episode reward: [(0, '9.030')]
[2023-11-15 07:15:53,768][19966] Saving new best policy, reward=9.030!
[2023-11-15 07:15:58,762][00663] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3304.6). Total num frames: 1597440. Throughput: 0: 855.6. Samples: 385166. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-11-15 07:15:58,768][00663] Avg episode reward: [(0, '9.265')]
[2023-11-15 07:15:58,777][19966] Saving new best policy, reward=9.265!
[2023-11-15 07:16:03,764][00663] Fps is (10 sec: 2457.1, 60 sec: 3276.7, 300 sec: 3276.8). Total num frames: 1609728. Throughput: 0: 855.0. Samples: 389288. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-11-15 07:16:03,767][00663] Avg episode reward: [(0, '9.437')]
[2023-11-15 07:16:03,773][19966] Saving new best policy, reward=9.437!
[2023-11-15 07:16:04,176][19979] Updated weights for policy 0, policy_version 394 (0.0019)
[2023-11-15 07:16:08,762][00663] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3262.9). Total num frames: 1626112. Throughput: 0: 829.8. Samples: 391214. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-11-15 07:16:08,768][00663] Avg episode reward: [(0, '9.673')]
[2023-11-15 07:16:08,778][19966] Saving new best policy, reward=9.673!
[2023-11-15 07:16:13,764][00663] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3276.8). Total num frames: 1642496. Throughput: 0: 810.0. Samples: 396768. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-11-15 07:16:13,772][00663] Avg episode reward: [(0, '9.432')]
[2023-11-15 07:16:15,667][19979] Updated weights for policy 0, policy_version 404 (0.0028)
[2023-11-15 07:16:18,762][00663] Fps is (10 sec: 4096.0, 60 sec: 3550.2, 300 sec: 3318.5). Total num frames: 1667072. Throughput: 0: 851.9. Samples: 403288. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-11-15 07:16:18,768][00663] Avg episode reward: [(0, '9.778')]
[2023-11-15 07:16:18,779][19966] Saving new best policy, reward=9.778!
[2023-11-15 07:16:23,762][00663] Fps is (10 sec: 3687.2, 60 sec: 3345.1, 300 sec: 3304.6). Total num frames: 1679360. Throughput: 0: 856.8. Samples: 405526. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-11-15 07:16:23,768][00663] Avg episode reward: [(0, '10.110')]
[2023-11-15 07:16:23,773][19966] Saving new best policy, reward=10.110!
[2023-11-15 07:16:28,759][19979] Updated weights for policy 0, policy_version 414 (0.0025)
[2023-11-15 07:16:28,766][00663] Fps is (10 sec: 2866.2, 60 sec: 3276.6, 300 sec: 3290.6). Total num frames: 1695744. Throughput: 0: 855.4. Samples: 409456. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-11-15 07:16:28,775][00663] Avg episode reward: [(0, '10.234')]
[2023-11-15 07:16:28,790][19966] Saving new best policy, reward=10.234!
[2023-11-15 07:16:33,762][00663] Fps is (10 sec: 2867.1, 60 sec: 3276.8, 300 sec: 3262.9). Total num frames: 1708032. Throughput: 0: 814.0. Samples: 413524. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-11-15 07:16:33,765][00663] Avg episode reward: [(0, '11.214')]
[2023-11-15 07:16:33,768][19966] Saving new best policy, reward=11.214!
[2023-11-15 07:16:38,762][00663] Fps is (10 sec: 3278.0, 60 sec: 3345.1, 300 sec: 3290.7). Total num frames: 1728512. Throughput: 0: 800.6. Samples: 416218. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-11-15 07:16:38,764][00663] Avg episode reward: [(0, '11.209')]
[2023-11-15 07:16:40,584][19979] Updated weights for policy 0, policy_version 424 (0.0023)
[2023-11-15 07:16:43,762][00663] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3318.5). Total num frames: 1748992. Throughput: 0: 835.0. Samples: 422740. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-11-15 07:16:43,764][00663] Avg episode reward: [(0, '11.784')]
[2023-11-15 07:16:43,768][19966] Saving new best policy, reward=11.784!
[2023-11-15 07:16:48,764][00663] Fps is (10 sec: 3685.5, 60 sec: 3413.2, 300 sec: 3346.2). Total num frames: 1765376. Throughput: 0: 853.8. Samples: 427708. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-11-15 07:16:48,767][00663] Avg episode reward: [(0, '13.103')]
[2023-11-15 07:16:48,777][19966] Saving new best policy, reward=13.103!
[2023-11-15 07:16:52,927][19979] Updated weights for policy 0, policy_version 434 (0.0016)
[2023-11-15 07:16:53,766][00663] Fps is (10 sec: 2866.0, 60 sec: 3208.3, 300 sec: 3346.2). Total num frames: 1777664. Throughput: 0: 857.0. Samples: 429784. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-11-15 07:16:53,780][00663] Avg episode reward: [(0, '13.866')]
[2023-11-15 07:16:53,786][19966] Saving new best policy, reward=13.866!
[2023-11-15 07:16:58,762][00663] Fps is (10 sec: 2458.1, 60 sec: 3208.5, 300 sec: 3332.3). Total num frames: 1789952. Throughput: 0: 823.4. Samples: 433818. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-11-15 07:16:58,767][00663] Avg episode reward: [(0, '13.587')]
[2023-11-15 07:17:03,762][00663] Fps is (10 sec: 3278.1, 60 sec: 3345.2, 300 sec: 3346.2). Total num frames: 1810432. Throughput: 0: 800.1. Samples: 439292. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-11-15 07:17:03,771][00663] Avg episode reward: [(0, '13.069')]
[2023-11-15 07:17:05,219][19979] Updated weights for policy 0, policy_version 444 (0.0041)
[2023-11-15 07:17:08,762][00663] Fps is (10 sec: 4096.1, 60 sec: 3413.3, 300 sec: 3374.0). Total num frames: 1830912. Throughput: 0: 819.7. Samples: 442412. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-11-15 07:17:08,770][00663] Avg episode reward: [(0, '11.862')]
[2023-11-15 07:17:13,762][00663] Fps is (10 sec: 3686.4, 60 sec: 3413.5, 300 sec: 3374.0). Total num frames: 1847296. Throughput: 0: 857.8. Samples: 448052. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-11-15 07:17:13,772][00663] Avg episode reward: [(0, '12.005')]
[2023-11-15 07:17:17,224][19979] Updated weights for policy 0, policy_version 454 (0.0021)
[2023-11-15 07:17:18,762][00663] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3374.0). Total num frames: 1863680. Throughput: 0: 859.5. Samples: 452202. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-11-15 07:17:18,764][00663] Avg episode reward: [(0, '11.711')]
[2023-11-15 07:17:23,762][00663] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3346.2). Total num frames: 1875968. Throughput: 0: 844.5. Samples: 454222. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-11-15 07:17:23,769][00663] Avg episode reward: [(0, '13.027')]
[2023-11-15 07:17:28,762][00663] Fps is (10 sec: 2867.2, 60 sec: 3277.0, 300 sec: 3332.3). Total num frames: 1892352. Throughput: 0: 810.8. Samples: 459226. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-11-15 07:17:28,770][00663] Avg episode reward: [(0, '14.210')]
[2023-11-15 07:17:28,821][19966] Saving new best policy, reward=14.210!
[2023-11-15 07:17:29,761][19979] Updated weights for policy 0, policy_version 464 (0.0037)
[2023-11-15 07:17:33,762][00663] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3374.0). Total num frames: 1916928. Throughput: 0: 844.0. Samples: 465686. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-11-15 07:17:33,770][00663] Avg episode reward: [(0, '14.489')]
[2023-11-15 07:17:33,773][19966] Saving new best policy, reward=14.489!
[2023-11-15 07:17:38,762][00663] Fps is (10 sec: 4096.0, 60 sec: 3413.3, 300 sec: 3387.9). Total num frames: 1933312. Throughput: 0: 859.6. Samples: 468462. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-11-15 07:17:38,766][00663] Avg episode reward: [(0, '14.936')]
[2023-11-15 07:17:38,776][19966] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000472_1933312.pth...
[2023-11-15 07:17:38,934][19966] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000273_1118208.pth
[2023-11-15 07:17:38,948][19966] Saving new best policy, reward=14.936!
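The three lines above show the checkpointing pattern that repeats throughout this run: the learner periodically writes checkpoint_<version>_<frames>.pth, prunes the oldest file so only the newest few remain, and separately saves a best-so-far policy whenever the average episode reward improves. A minimal Python sketch of that rotation and best-tracking logic follows; it is illustrative only (function names, keep_n, and the save callback are hypothetical, not Sample Factory's actual API).

from pathlib import Path

def save_with_rotation(state_bytes: bytes, ckpt_dir: Path,
                       policy_version: int, env_frames: int, keep_n: int = 2) -> Path:
    # Write checkpoint_<version>_<frames>.pth; zero-padding the version makes
    # lexical filename order equal chronological order.
    ckpt_dir.mkdir(parents=True, exist_ok=True)
    path = ckpt_dir / f"checkpoint_{policy_version:09d}_{env_frames}.pth"
    path.write_bytes(state_bytes)  # stand-in for torch.save(model.state_dict(), path)
    # Prune everything but the newest keep_n files; this is what produces the
    # "Removing .../checkpoint_*.pth" lines above.
    for old in sorted(ckpt_dir.glob("checkpoint_*.pth"))[:-keep_n]:
        old.unlink()
    return path

def maybe_save_best(avg_reward: float, best_so_far: float, save_fn) -> float:
    # "Saving new best policy, reward=X!" fires only when the average improves.
    if avg_reward > best_so_far:
        save_fn()
        return avg_reward
    return best_so_far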
[2023-11-15 07:17:41,457][19979] Updated weights for policy 0, policy_version 474 (0.0030)
[2023-11-15 07:17:43,762][00663] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3374.0). Total num frames: 1945600. Throughput: 0: 858.2. Samples: 472438. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-11-15 07:17:43,770][00663] Avg episode reward: [(0, '14.688')]
[2023-11-15 07:17:48,762][00663] Fps is (10 sec: 2457.6, 60 sec: 3208.7, 300 sec: 3332.3). Total num frames: 1957888. Throughput: 0: 828.6. Samples: 476578. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-11-15 07:17:48,766][00663] Avg episode reward: [(0, '14.802')]
[2023-11-15 07:17:53,762][00663] Fps is (10 sec: 3276.8, 60 sec: 3345.3, 300 sec: 3332.3). Total num frames: 1978368. Throughput: 0: 812.2. Samples: 478962. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-11-15 07:17:53,764][00663] Avg episode reward: [(0, '14.606')]
[2023-11-15 07:17:54,258][19979] Updated weights for policy 0, policy_version 484 (0.0037)
[2023-11-15 07:17:58,762][00663] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3360.1). Total num frames: 1998848. Throughput: 0: 834.4. Samples: 485602. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-11-15 07:17:58,772][00663] Avg episode reward: [(0, '14.478')]
[2023-11-15 07:18:03,762][00663] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3374.0). Total num frames: 2015232. Throughput: 0: 862.7. Samples: 491022. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-11-15 07:18:03,768][00663] Avg episode reward: [(0, '14.503')]
[2023-11-15 07:18:05,877][19979] Updated weights for policy 0, policy_version 494 (0.0013)
[2023-11-15 07:18:08,762][00663] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3360.1). Total num frames: 2027520. Throughput: 0: 854.1. Samples: 492658. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-11-15 07:18:08,768][00663] Avg episode reward: [(0, '13.966')]
[2023-11-15 07:18:13,762][00663] Fps is (10 sec: 2457.6, 60 sec: 3208.5, 300 sec: 3332.3). Total num frames: 2039808. Throughput: 0: 816.4. Samples: 495962. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-11-15 07:18:13,765][00663] Avg episode reward: [(0, '13.271')]
[2023-11-15 07:18:18,762][00663] Fps is (10 sec: 2048.0, 60 sec: 3072.0, 300 sec: 3290.7). Total num frames: 2048000. Throughput: 0: 744.8. Samples: 499202. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-11-15 07:18:18,768][00663] Avg episode reward: [(0, '13.822')]
[2023-11-15 07:18:23,762][00663] Fps is (10 sec: 2048.0, 60 sec: 3072.0, 300 sec: 3276.8). Total num frames: 2060288. Throughput: 0: 718.1. Samples: 500778. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-11-15 07:18:23,765][00663] Avg episode reward: [(0, '14.455')]
[2023-11-15 07:18:24,039][19979] Updated weights for policy 0, policy_version 504 (0.0041)
[2023-11-15 07:18:28,762][00663] Fps is (10 sec: 3276.8, 60 sec: 3140.3, 300 sec: 3304.6). Total num frames: 2080768. Throughput: 0: 750.4. Samples: 506206. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-11-15 07:18:28,764][00663] Avg episode reward: [(0, '14.489')]
[2023-11-15 07:18:33,771][00663] Fps is (10 sec: 4092.2, 60 sec: 3071.5, 300 sec: 3332.2). Total num frames: 2101248. Throughput: 0: 784.1. Samples: 511868. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-11-15 07:18:33,774][00663] Avg episode reward: [(0, '14.780')]
[2023-11-15 07:18:34,496][19979] Updated weights for policy 0, policy_version 514 (0.0014)
[2023-11-15 07:18:38,762][00663] Fps is (10 sec: 3276.7, 60 sec: 3003.7, 300 sec: 3304.6). Total num frames: 2113536. Throughput: 0: 777.7. Samples: 513960. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-11-15 07:18:38,766][00663] Avg episode reward: [(0, '15.936')]
[2023-11-15 07:18:38,788][19966] Saving new best policy, reward=15.936!
[2023-11-15 07:18:43,762][00663] Fps is (10 sec: 2869.8, 60 sec: 3072.0, 300 sec: 3276.8). Total num frames: 2129920. Throughput: 0: 722.0. Samples: 518092. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-11-15 07:18:43,765][00663] Avg episode reward: [(0, '16.440')]
[2023-11-15 07:18:43,771][19966] Saving new best policy, reward=16.440!
[2023-11-15 07:18:48,705][19979] Updated weights for policy 0, policy_version 524 (0.0018)
[2023-11-15 07:18:48,762][00663] Fps is (10 sec: 3276.8, 60 sec: 3140.3, 300 sec: 3290.7). Total num frames: 2146304. Throughput: 0: 705.1. Samples: 522750. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-11-15 07:18:48,765][00663] Avg episode reward: [(0, '16.630')]
[2023-11-15 07:18:48,791][19966] Saving new best policy, reward=16.630!
[2023-11-15 07:18:53,762][00663] Fps is (10 sec: 3276.8, 60 sec: 3072.0, 300 sec: 3304.6). Total num frames: 2162688. Throughput: 0: 732.3. Samples: 525610. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-11-15 07:18:53,767][00663] Avg episode reward: [(0, '17.738')]
[2023-11-15 07:18:53,773][19966] Saving new best policy, reward=17.738!
[2023-11-15 07:18:58,762][00663] Fps is (10 sec: 3686.5, 60 sec: 3072.0, 300 sec: 3318.5). Total num frames: 2183168. Throughput: 0: 803.3. Samples: 532112. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-11-15 07:18:58,769][00663] Avg episode reward: [(0, '17.761')]
[2023-11-15 07:18:58,781][19966] Saving new best policy, reward=17.761!
[2023-11-15 07:18:59,180][19979] Updated weights for policy 0, policy_version 534 (0.0022)
[2023-11-15 07:19:03,762][00663] Fps is (10 sec: 3686.4, 60 sec: 3072.0, 300 sec: 3304.6). Total num frames: 2199552. Throughput: 0: 822.6. Samples: 536220. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-11-15 07:19:03,769][00663] Avg episode reward: [(0, '17.268')]
[2023-11-15 07:19:08,764][00663] Fps is (10 sec: 2866.6, 60 sec: 3071.9, 300 sec: 3276.8). Total num frames: 2211840. Throughput: 0: 833.4. Samples: 538284. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-11-15 07:19:08,771][00663] Avg episode reward: [(0, '16.968')]
[2023-11-15 07:19:13,762][00663] Fps is (10 sec: 2457.6, 60 sec: 3072.0, 300 sec: 3262.9). Total num frames: 2224128. Throughput: 0: 800.8. Samples: 542244. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-11-15 07:19:13,776][00663] Avg episode reward: [(0, '17.874')]
[2023-11-15 07:19:13,785][19966] Saving new best policy, reward=17.874!
[2023-11-15 07:19:13,794][19979] Updated weights for policy 0, policy_version 544 (0.0018)
[2023-11-15 07:19:18,762][00663] Fps is (10 sec: 3687.2, 60 sec: 3345.1, 300 sec: 3304.6). Total num frames: 2248704. Throughput: 0: 815.8. Samples: 548570. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-11-15 07:19:18,764][00663] Avg episode reward: [(0, '17.936')]
[2023-11-15 07:19:18,776][19966] Saving new best policy, reward=17.936!
[2023-11-15 07:19:23,549][19979] Updated weights for policy 0, policy_version 554 (0.0016)
[2023-11-15 07:19:23,766][00663] Fps is (10 sec: 4503.7, 60 sec: 3481.4, 300 sec: 3332.3). Total num frames: 2269184. Throughput: 0: 840.9. Samples: 551806. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-11-15 07:19:23,771][00663] Avg episode reward: [(0, '18.439')]
[2023-11-15 07:19:23,772][19966] Saving new best policy, reward=18.439!
[2023-11-15 07:19:28,762][00663] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3290.7). Total num frames: 2281472. Throughput: 0: 851.7. Samples: 556420. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-11-15 07:19:28,770][00663] Avg episode reward: [(0, '18.880')]
[2023-11-15 07:19:28,781][19966] Saving new best policy, reward=18.880!
[2023-11-15 07:19:33,763][00663] Fps is (10 sec: 2458.3, 60 sec: 3209.0, 300 sec: 3262.9). Total num frames: 2293760. Throughput: 0: 836.8. Samples: 560408. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-11-15 07:19:33,768][00663] Avg episode reward: [(0, '19.881')]
[2023-11-15 07:19:33,779][19966] Saving new best policy, reward=19.881!
[2023-11-15 07:19:38,613][19979] Updated weights for policy 0, policy_version 564 (0.0021)
[2023-11-15 07:19:38,762][00663] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3276.8). Total num frames: 2310144. Throughput: 0: 817.0. Samples: 562374. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-11-15 07:19:38,768][00663] Avg episode reward: [(0, '20.845')]
[2023-11-15 07:19:38,781][19966] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000564_2310144.pth...
[2023-11-15 07:19:38,901][19966] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000373_1527808.pth
[2023-11-15 07:19:38,914][19966] Saving new best policy, reward=20.845!
[2023-11-15 07:19:43,762][00663] Fps is (10 sec: 3687.0, 60 sec: 3345.1, 300 sec: 3290.7). Total num frames: 2330624. Throughput: 0: 799.4. Samples: 568086. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-11-15 07:19:43,767][00663] Avg episode reward: [(0, '20.579')]
[2023-11-15 07:19:48,224][19979] Updated weights for policy 0, policy_version 574 (0.0013)
[2023-11-15 07:19:48,766][00663] Fps is (10 sec: 4094.3, 60 sec: 3413.1, 300 sec: 3318.4). Total num frames: 2351104. Throughput: 0: 849.7. Samples: 574460. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-11-15 07:19:48,769][00663] Avg episode reward: [(0, '21.870')]
[2023-11-15 07:19:48,788][19966] Saving new best policy, reward=21.870!
[2023-11-15 07:19:53,762][00663] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3290.7). Total num frames: 2363392. Throughput: 0: 847.0. Samples: 576398. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-11-15 07:19:53,769][00663] Avg episode reward: [(0, '21.580')]
[2023-11-15 07:19:58,765][00663] Fps is (10 sec: 2867.5, 60 sec: 3276.6, 300 sec: 3276.8). Total num frames: 2379776. Throughput: 0: 852.4. Samples: 580604. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-11-15 07:19:58,768][00663] Avg episode reward: [(0, '21.960')]
[2023-11-15 07:19:58,780][19966] Saving new best policy, reward=21.960!
[2023-11-15 07:20:03,256][19979] Updated weights for policy 0, policy_version 584 (0.0034)
[2023-11-15 07:20:03,762][00663] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3262.9). Total num frames: 2392064. Throughput: 0: 803.5. Samples: 584728. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-11-15 07:20:03,769][00663] Avg episode reward: [(0, '21.276')]
[2023-11-15 07:20:08,762][00663] Fps is (10 sec: 3277.8, 60 sec: 3345.2, 300 sec: 3290.7). Total num frames: 2412544. Throughput: 0: 802.3. Samples: 587904. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-11-15 07:20:08,765][00663] Avg episode reward: [(0, '21.471')]
[2023-11-15 07:20:12,622][19979] Updated weights for policy 0, policy_version 594 (0.0017)
[2023-11-15 07:20:13,769][00663] Fps is (10 sec: 4093.1, 60 sec: 3481.2, 300 sec: 3318.4). Total num frames: 2433024. Throughput: 0: 842.8. Samples: 594354. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0)
[2023-11-15 07:20:13,772][00663] Avg episode reward: [(0, '22.136')]
[2023-11-15 07:20:13,857][19966] Saving new best policy, reward=22.136!
[2023-11-15 07:20:18,766][00663] Fps is (10 sec: 3684.8, 60 sec: 3344.8, 300 sec: 3290.6). Total num frames: 2449408. Throughput: 0: 849.8. Samples: 598652. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-11-15 07:20:18,769][00663] Avg episode reward: [(0, '21.624')]
[2023-11-15 07:20:23,765][00663] Fps is (10 sec: 2868.3, 60 sec: 3208.6, 300 sec: 3262.9). Total num frames: 2461696. Throughput: 0: 851.0. Samples: 600670. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-11-15 07:20:23,768][00663] Avg episode reward: [(0, '19.959')]
[2023-11-15 07:20:27,763][19979] Updated weights for policy 0, policy_version 604 (0.0019)
[2023-11-15 07:20:28,762][00663] Fps is (10 sec: 2458.6, 60 sec: 3208.5, 300 sec: 3262.9). Total num frames: 2473984. Throughput: 0: 814.1. Samples: 604720. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-11-15 07:20:28,770][00663] Avg episode reward: [(0, '19.882')]
[2023-11-15 07:20:33,762][00663] Fps is (10 sec: 3277.9, 60 sec: 3345.2, 300 sec: 3276.8). Total num frames: 2494464. Throughput: 0: 808.8. Samples: 610854. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-11-15 07:20:33,765][00663] Avg episode reward: [(0, '19.855')]
[2023-11-15 07:20:37,463][19979] Updated weights for policy 0, policy_version 614 (0.0014)
[2023-11-15 07:20:38,762][00663] Fps is (10 sec: 4505.6, 60 sec: 3481.6, 300 sec: 3318.5). Total num frames: 2519040. Throughput: 0: 838.2. Samples: 614118. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-11-15 07:20:38,765][00663] Avg episode reward: [(0, '18.433')]
[2023-11-15 07:20:43,762][00663] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3290.7). Total num frames: 2531328. Throughput: 0: 855.2. Samples: 619086. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-11-15 07:20:43,765][00663] Avg episode reward: [(0, '18.535')]
[2023-11-15 07:20:48,762][00663] Fps is (10 sec: 2457.6, 60 sec: 3208.8, 300 sec: 3249.0). Total num frames: 2543616. Throughput: 0: 854.2. Samples: 623166. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-11-15 07:20:48,767][00663] Avg episode reward: [(0, '18.689')]
[2023-11-15 07:20:51,968][19979] Updated weights for policy 0, policy_version 624 (0.0016)
[2023-11-15 07:20:53,764][00663] Fps is (10 sec: 2866.7, 60 sec: 3276.7, 300 sec: 3262.9). Total num frames: 2560000. Throughput: 0: 828.5. Samples: 625186. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-11-15 07:20:53,775][00663] Avg episode reward: [(0, '20.425')]
[2023-11-15 07:20:58,762][00663] Fps is (10 sec: 3686.4, 60 sec: 3345.2, 300 sec: 3290.7). Total num frames: 2580480. Throughput: 0: 804.7. Samples: 630558. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-11-15 07:20:58,764][00663] Avg episode reward: [(0, '20.739')]
[2023-11-15 07:21:02,425][19979] Updated weights for policy 0, policy_version 634 (0.0021)
[2023-11-15 07:21:03,762][00663] Fps is (10 sec: 4096.7, 60 sec: 3481.6, 300 sec: 3304.6). Total num frames: 2600960. Throughput: 0: 852.6. Samples: 637014. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-11-15 07:21:03,769][00663] Avg episode reward: [(0, '20.679')]
[2023-11-15 07:21:08,762][00663] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3304.6). Total num frames: 2617344. Throughput: 0: 860.2. Samples: 639376. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-11-15 07:21:08,768][00663] Avg episode reward: [(0, '20.499')]
[2023-11-15 07:21:13,762][00663] Fps is (10 sec: 2867.2, 60 sec: 3277.2, 300 sec: 3262.9). Total num frames: 2629632. Throughput: 0: 860.3. Samples: 643434. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-11-15 07:21:13,771][00663] Avg episode reward: [(0, '20.860')]
[2023-11-15 07:21:15,945][19979] Updated weights for policy 0, policy_version 644 (0.0013)
[2023-11-15 07:21:18,763][00663] Fps is (10 sec: 2457.5, 60 sec: 3208.7, 300 sec: 3262.9). Total num frames: 2641920. Throughput: 0: 817.6. Samples: 647648. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-11-15 07:21:18,769][00663] Avg episode reward: [(0, '20.974')]
[2023-11-15 07:21:23,762][00663] Fps is (10 sec: 3276.8, 60 sec: 3345.2, 300 sec: 3276.8). Total num frames: 2662400. Throughput: 0: 810.1. Samples: 650572. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-11-15 07:21:23,765][00663] Avg episode reward: [(0, '20.417')]
[2023-11-15 07:21:26,628][19979] Updated weights for policy 0, policy_version 654 (0.0019)
[2023-11-15 07:21:28,762][00663] Fps is (10 sec: 4505.8, 60 sec: 3549.9, 300 sec: 3318.5). Total num frames: 2686976. Throughput: 0: 846.5. Samples: 657180. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-11-15 07:21:28,764][00663] Avg episode reward: [(0, '20.015')]
[2023-11-15 07:21:33,764][00663] Fps is (10 sec: 4095.3, 60 sec: 3481.5, 300 sec: 3304.6). Total num frames: 2703360. Throughput: 0: 866.6. Samples: 662166. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-11-15 07:21:33,770][00663] Avg episode reward: [(0, '20.878')]
[2023-11-15 07:21:38,762][00663] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3276.8). Total num frames: 2715648. Throughput: 0: 868.8. Samples: 664280. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-11-15 07:21:38,769][00663] Avg episode reward: [(0, '21.822')]
[2023-11-15 07:21:38,777][19966] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000663_2715648.pth...
[2023-11-15 07:21:38,905][19966] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000472_1933312.pth
[2023-11-15 07:21:39,389][19979] Updated weights for policy 0, policy_version 664 (0.0029)
[2023-11-15 07:21:43,762][00663] Fps is (10 sec: 2458.0, 60 sec: 3276.8, 300 sec: 3262.9). Total num frames: 2727936. Throughput: 0: 840.3. Samples: 668372. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-11-15 07:21:43,765][00663] Avg episode reward: [(0, '22.470')]
[2023-11-15 07:21:43,773][19966] Saving new best policy, reward=22.470!
[2023-11-15 07:21:48,762][00663] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3290.7). Total num frames: 2748416. Throughput: 0: 820.6. Samples: 673942. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-11-15 07:21:48,769][00663] Avg episode reward: [(0, '22.668')]
[2023-11-15 07:21:48,789][19966] Saving new best policy, reward=22.668!
[2023-11-15 07:21:51,093][19979] Updated weights for policy 0, policy_version 674 (0.0018)
[2023-11-15 07:21:53,764][00663] Fps is (10 sec: 4095.3, 60 sec: 3481.6, 300 sec: 3318.4). Total num frames: 2768896. Throughput: 0: 839.7. Samples: 677164. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-11-15 07:21:53,767][00663] Avg episode reward: [(0, '21.948')]
[2023-11-15 07:21:58,762][00663] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3304.6). Total num frames: 2785280. Throughput: 0: 872.3. Samples: 682686. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-11-15 07:21:58,769][00663] Avg episode reward: [(0, '21.910')]
[2023-11-15 07:22:03,551][19979] Updated weights for policy 0, policy_version 684 (0.0013)
[2023-11-15 07:22:03,766][00663] Fps is (10 sec: 3275.9, 60 sec: 3344.8, 300 sec: 3290.6). Total num frames: 2801664. Throughput: 0: 870.1. Samples: 686804. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-11-15 07:22:03,769][00663] Avg episode reward: [(0, '21.478')]
[2023-11-15 07:22:08,762][00663] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3276.8). Total num frames: 2813952. Throughput: 0: 850.0. Samples: 688824. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-11-15 07:22:08,766][00663] Avg episode reward: [(0, '19.792')]
[2023-11-15 07:22:13,762][00663] Fps is (10 sec: 3278.3, 60 sec: 3413.3, 300 sec: 3290.7). Total num frames: 2834432. Throughput: 0: 815.1. Samples: 693858. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-11-15 07:22:13,770][00663] Avg episode reward: [(0, '19.248')]
[2023-11-15 07:22:15,447][19979] Updated weights for policy 0, policy_version 694 (0.0045)
[2023-11-15 07:22:18,762][00663] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3318.5). Total num frames: 2854912. Throughput: 0: 854.1. Samples: 700600. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-11-15 07:22:18,768][00663] Avg episode reward: [(0, '19.916')]
[2023-11-15 07:22:23,762][00663] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3318.5). Total num frames: 2871296. Throughput: 0: 867.3. Samples: 703308. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-11-15 07:22:23,765][00663] Avg episode reward: [(0, '20.536')]
[2023-11-15 07:22:27,693][19979] Updated weights for policy 0, policy_version 704 (0.0014)
[2023-11-15 07:22:28,762][00663] Fps is (10 sec: 2867.1, 60 sec: 3276.8, 300 sec: 3276.8). Total num frames: 2883584. Throughput: 0: 862.4. Samples: 707180. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-11-15 07:22:28,767][00663] Avg episode reward: [(0, '21.149')]
[2023-11-15 07:22:33,762][00663] Fps is (10 sec: 2867.2, 60 sec: 3276.9, 300 sec: 3276.8). Total num frames: 2899968. Throughput: 0: 830.8. Samples: 711326. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-11-15 07:22:33,766][00663] Avg episode reward: [(0, '21.404')]
[2023-11-15 07:22:38,762][00663] Fps is (10 sec: 3276.9, 60 sec: 3345.1, 300 sec: 3290.7). Total num frames: 2916352. Throughput: 0: 806.1. Samples: 713438. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-11-15 07:22:38,772][00663] Avg episode reward: [(0, '23.135')]
[2023-11-15 07:22:38,786][19966] Saving new best policy, reward=23.135!
[2023-11-15 07:22:40,754][19979] Updated weights for policy 0, policy_version 714 (0.0045)
[2023-11-15 07:22:43,762][00663] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3304.6). Total num frames: 2932736. Throughput: 0: 802.9. Samples: 718818. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-11-15 07:22:43,765][00663] Avg episode reward: [(0, '24.240')]
[2023-11-15 07:22:43,774][19966] Saving new best policy, reward=24.240!
[2023-11-15 07:22:48,762][00663] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3290.7). Total num frames: 2949120. Throughput: 0: 827.2. Samples: 724024. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-11-15 07:22:48,769][00663] Avg episode reward: [(0, '24.795')]
[2023-11-15 07:22:48,789][19966] Saving new best policy, reward=24.795!
[2023-11-15 07:22:53,765][00663] Fps is (10 sec: 2866.3, 60 sec: 3208.5, 300 sec: 3262.9). Total num frames: 2961408. Throughput: 0: 823.9. Samples: 725904. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-11-15 07:22:53,768][00663] Avg episode reward: [(0, '24.162')]
[2023-11-15 07:22:54,124][19979] Updated weights for policy 0, policy_version 724 (0.0019)
[2023-11-15 07:22:58,768][00663] Fps is (10 sec: 2456.1, 60 sec: 3139.9, 300 sec: 3249.0). Total num frames: 2973696. Throughput: 0: 801.2. Samples: 729916. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-11-15 07:22:58,771][00663] Avg episode reward: [(0, '23.821')]
[2023-11-15 07:23:03,762][00663] Fps is (10 sec: 2868.1, 60 sec: 3140.5, 300 sec: 3262.9). Total num frames: 2990080. Throughput: 0: 739.7. Samples: 733888. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-11-15 07:23:03,768][00663] Avg episode reward: [(0, '23.365')]
[2023-11-15 07:23:07,752][19979] Updated weights for policy 0, policy_version 734 (0.0014)
[2023-11-15 07:23:08,762][00663] Fps is (10 sec: 3278.8, 60 sec: 3208.5, 300 sec: 3276.8). Total num frames: 3006464. Throughput: 0: 745.0. Samples: 736832. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-11-15 07:23:08,771][00663] Avg episode reward: [(0, '20.919')]
[2023-11-15 07:23:13,762][00663] Fps is (10 sec: 3686.4, 60 sec: 3208.5, 300 sec: 3318.5). Total num frames: 3026944. Throughput: 0: 795.2. Samples: 742964. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-11-15 07:23:13,765][00663] Avg episode reward: [(0, '21.956')]
[2023-11-15 07:23:18,762][00663] Fps is (10 sec: 3686.4, 60 sec: 3140.3, 300 sec: 3332.3). Total num frames: 3043328. Throughput: 0: 795.7. Samples: 747132. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-11-15 07:23:18,769][00663] Avg episode reward: [(0, '19.939')]
[2023-11-15 07:23:20,231][19979] Updated weights for policy 0, policy_version 744 (0.0018)
[2023-11-15 07:23:23,762][00663] Fps is (10 sec: 2867.2, 60 sec: 3072.0, 300 sec: 3304.6). Total num frames: 3055616. Throughput: 0: 794.0. Samples: 749170. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-11-15 07:23:23,766][00663] Avg episode reward: [(0, '20.981')]
[2023-11-15 07:23:28,762][00663] Fps is (10 sec: 2457.6, 60 sec: 3072.0, 300 sec: 3276.9). Total num frames: 3067904. Throughput: 0: 764.0. Samples: 753198. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-11-15 07:23:28,765][00663] Avg episode reward: [(0, '21.181')]
[2023-11-15 07:23:32,884][19979] Updated weights for policy 0, policy_version 754 (0.0036)
[2023-11-15 07:23:33,762][00663] Fps is (10 sec: 3276.8, 60 sec: 3140.3, 300 sec: 3304.6). Total num frames: 3088384. Throughput: 0: 785.9. Samples: 759390. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-11-15 07:23:33,769][00663] Avg episode reward: [(0, '22.128')]
[2023-11-15 07:23:38,766][00663] Fps is (10 sec: 4503.9, 60 sec: 3276.6, 300 sec: 3332.3). Total num frames: 3112960. Throughput: 0: 815.0. Samples: 762578. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-11-15 07:23:38,777][00663] Avg episode reward: [(0, '24.172')]
[2023-11-15 07:23:38,794][19966] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000760_3112960.pth...
[2023-11-15 07:23:38,974][19966] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000564_2310144.pth
[2023-11-15 07:23:43,762][00663] Fps is (10 sec: 3686.4, 60 sec: 3208.5, 300 sec: 3318.5). Total num frames: 3125248. Throughput: 0: 829.6. Samples: 767242. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-11-15 07:23:43,765][00663] Avg episode reward: [(0, '25.022')]
[2023-11-15 07:23:43,768][19966] Saving new best policy, reward=25.022!
[2023-11-15 07:23:44,544][19979] Updated weights for policy 0, policy_version 764 (0.0017)
[2023-11-15 07:23:48,767][00663] Fps is (10 sec: 2457.3, 60 sec: 3140.0, 300 sec: 3304.5). Total num frames: 3137536. Throughput: 0: 829.4. Samples: 771216. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-11-15 07:23:48,773][00663] Avg episode reward: [(0, '24.711')]
[2023-11-15 07:23:53,762][00663] Fps is (10 sec: 2867.2, 60 sec: 3208.7, 300 sec: 3290.7). Total num frames: 3153920. Throughput: 0: 808.5. Samples: 773216. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-11-15 07:23:53,765][00663] Avg episode reward: [(0, '26.228')]
[2023-11-15 07:23:53,767][19966] Saving new best policy, reward=26.228!
[2023-11-15 07:23:57,811][19979] Updated weights for policy 0, policy_version 774 (0.0048)
[2023-11-15 07:23:58,762][00663] Fps is (10 sec: 3688.3, 60 sec: 3345.4, 300 sec: 3304.6). Total num frames: 3174400. Throughput: 0: 796.9. Samples: 778826. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-11-15 07:23:58,765][00663] Avg episode reward: [(0, '26.196')]
[2023-11-15 07:24:03,762][00663] Fps is (10 sec: 4096.0, 60 sec: 3413.3, 300 sec: 3332.4). Total num frames: 3194880. Throughput: 0: 847.2. Samples: 785256. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-11-15 07:24:03,765][00663] Avg episode reward: [(0, '25.911')]
[2023-11-15 07:24:08,762][00663] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3332.3). Total num frames: 3207168. Throughput: 0: 848.5. Samples: 787354. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-11-15 07:24:08,776][00663] Avg episode reward: [(0, '25.436')]
[2023-11-15 07:24:09,171][19979] Updated weights for policy 0, policy_version 784 (0.0033)
[2023-11-15 07:24:13,762][00663] Fps is (10 sec: 2457.6, 60 sec: 3208.5, 300 sec: 3290.7). Total num frames: 3219456. Throughput: 0: 848.6. Samples: 791384. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-11-15 07:24:13,765][00663] Avg episode reward: [(0, '25.348')]
[2023-11-15 07:24:18,762][00663] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3276.8). Total num frames: 3235840. Throughput: 0: 802.5. Samples: 795504. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-11-15 07:24:18,765][00663] Avg episode reward: [(0, '25.085')]
[2023-11-15 07:24:23,164][19979] Updated weights for policy 0, policy_version 794 (0.0027)
[2023-11-15 07:24:23,763][00663] Fps is (10 sec: 3276.4, 60 sec: 3276.7, 300 sec: 3290.7). Total num frames: 3252224. Throughput: 0: 793.9. Samples: 798300. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-11-15 07:24:23,766][00663] Avg episode reward: [(0, '24.973')]
[2023-11-15 07:24:28,762][00663] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3290.7). Total num frames: 3264512. Throughput: 0: 781.8. Samples: 802424. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-11-15 07:24:28,765][00663] Avg episode reward: [(0, '25.014')]
[2023-11-15 07:24:33,765][00663] Fps is (10 sec: 2457.1, 60 sec: 3140.1, 300 sec: 3276.8). Total num frames: 3276800. Throughput: 0: 772.6. Samples: 805980. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-11-15 07:24:33,768][00663] Avg episode reward: [(0, '24.892')]
[2023-11-15 07:24:38,762][00663] Fps is (10 sec: 2457.6, 60 sec: 2935.7, 300 sec: 3249.0). Total num frames: 3289088. Throughput: 0: 763.7. Samples: 807582. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-11-15 07:24:38,768][00663] Avg episode reward: [(0, '25.159')]
[2023-11-15 07:24:40,034][19979] Updated weights for policy 0, policy_version 804 (0.0041)
[2023-11-15 07:24:43,767][00663] Fps is (10 sec: 2457.1, 60 sec: 2935.2, 300 sec: 3221.3). Total num frames: 3301376. Throughput: 0: 718.1. Samples: 811146. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-11-15 07:24:43,782][00663] Avg episode reward: [(0, '25.339')]
[2023-11-15 07:24:48,764][00663] Fps is (10 sec: 2457.1, 60 sec: 2935.6, 300 sec: 3221.2). Total num frames: 3313664. Throughput: 0: 665.5. Samples: 815206. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2023-11-15 07:24:48,773][00663] Avg episode reward: [(0, '25.846')]
[2023-11-15 07:24:53,146][19979] Updated weights for policy 0, policy_version 814 (0.0028)
[2023-11-15 07:24:53,762][00663] Fps is (10 sec: 3278.5, 60 sec: 3003.7, 300 sec: 3235.2). Total num frames: 3334144. Throughput: 0: 688.0. Samples: 818312. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-11-15 07:24:53,769][00663] Avg episode reward: [(0, '25.845')]
[2023-11-15 07:24:58,762][00663] Fps is (10 sec: 4506.6, 60 sec: 3072.0, 300 sec: 3276.8). Total num frames: 3358720. Throughput: 0: 743.8. Samples: 824854. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-11-15 07:24:58,765][00663] Avg episode reward: [(0, '25.737')]
[2023-11-15 07:25:03,762][00663] Fps is (10 sec: 3686.3, 60 sec: 2935.5, 300 sec: 3249.0). Total num frames: 3371008. Throughput: 0: 754.1. Samples: 829438. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-11-15 07:25:03,770][00663] Avg episode reward: [(0, '24.696')]
[2023-11-15 07:25:04,693][19979] Updated weights for policy 0, policy_version 824 (0.0018)
[2023-11-15 07:25:08,762][00663] Fps is (10 sec: 2457.6, 60 sec: 2935.5, 300 sec: 3221.3). Total num frames: 3383296. Throughput: 0: 737.1. Samples: 831468. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-11-15 07:25:08,765][00663] Avg episode reward: [(0, '25.236')]
[2023-11-15 07:25:13,762][00663] Fps is (10 sec: 2867.2, 60 sec: 3003.7, 300 sec: 3221.3). Total num frames: 3399680. Throughput: 0: 734.8. Samples: 835492. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-11-15 07:25:13,765][00663] Avg episode reward: [(0, '24.231')]
[2023-11-15 07:25:17,672][19979] Updated weights for policy 0, policy_version 834 (0.0025)
[2023-11-15 07:25:18,762][00663] Fps is (10 sec: 3686.4, 60 sec: 3072.0, 300 sec: 3249.1). Total num frames: 3420160. Throughput: 0: 785.5. Samples: 841324. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-11-15 07:25:18,765][00663] Avg episode reward: [(0, '23.070')]
[2023-11-15 07:25:23,762][00663] Fps is (10 sec: 4096.0, 60 sec: 3140.3, 300 sec: 3276.8). Total num frames: 3440640. Throughput: 0: 821.9. Samples: 844566. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-11-15 07:25:23,765][00663] Avg episode reward: [(0, '23.064')]
[2023-11-15 07:25:28,762][00663] Fps is (10 sec: 3276.8, 60 sec: 3140.3, 300 sec: 3249.0). Total num frames: 3452928. Throughput: 0: 857.0. Samples: 849708. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-11-15 07:25:28,767][00663] Avg episode reward: [(0, '23.302')]
[2023-11-15 07:25:28,940][19979] Updated weights for policy 0, policy_version 844 (0.0015)
[2023-11-15 07:25:33,765][00663] Fps is (10 sec: 2866.2, 60 sec: 3208.5, 300 sec: 3221.2). Total num frames: 3469312. Throughput: 0: 856.5. Samples: 853748. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-11-15 07:25:33,771][00663] Avg episode reward: [(0, '24.271')]
[2023-11-15 07:25:38,764][00663] Fps is (10 sec: 2866.6, 60 sec: 3208.4, 300 sec: 3221.2). Total num frames: 3481600. Throughput: 0: 832.2. Samples: 855764. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-11-15 07:25:38,768][00663] Avg episode reward: [(0, '26.288')]
[2023-11-15 07:25:38,788][19966] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000850_3481600.pth...
[2023-11-15 07:25:38,993][19966] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000663_2715648.pth
[2023-11-15 07:25:39,015][19966] Saving new best policy, reward=26.288!
[2023-11-15 07:25:42,589][19979] Updated weights for policy 0, policy_version 854 (0.0019)
[2023-11-15 07:25:43,762][00663] Fps is (10 sec: 3277.8, 60 sec: 3345.3, 300 sec: 3249.0). Total num frames: 3502080. Throughput: 0: 796.4. Samples: 860694. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-11-15 07:25:43,765][00663] Avg episode reward: [(0, '25.710')]
[2023-11-15 07:25:48,762][00663] Fps is (10 sec: 3687.2, 60 sec: 3413.5, 300 sec: 3249.0). Total num frames: 3518464. Throughput: 0: 823.2. Samples: 866484. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-11-15 07:25:48,765][00663] Avg episode reward: [(0, '27.232')]
[2023-11-15 07:25:48,782][19966] Saving new best policy, reward=27.232!
[2023-11-15 07:25:53,762][00663] Fps is (10 sec: 3276.9, 60 sec: 3345.1, 300 sec: 3235.1). Total num frames: 3534848. Throughput: 0: 828.4. Samples: 868748. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-11-15 07:25:53,765][00663] Avg episode reward: [(0, '27.593')]
[2023-11-15 07:25:53,772][19966] Saving new best policy, reward=27.593!
[2023-11-15 07:25:55,253][19979] Updated weights for policy 0, policy_version 864 (0.0015)
[2023-11-15 07:25:58,764][00663] Fps is (10 sec: 2866.6, 60 sec: 3140.2, 300 sec: 3207.4). Total num frames: 3547136. Throughput: 0: 825.8. Samples: 872656. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-11-15 07:25:58,771][00663] Avg episode reward: [(0, '27.813')]
[2023-11-15 07:25:58,784][19966] Saving new best policy, reward=27.813!
[2023-11-15 07:26:03,762][00663] Fps is (10 sec: 2457.5, 60 sec: 3140.3, 300 sec: 3193.5). Total num frames: 3559424. Throughput: 0: 779.7. Samples: 876410. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-11-15 07:26:03,769][00663] Avg episode reward: [(0, '25.857')]
[2023-11-15 07:26:08,762][00663] Fps is (10 sec: 2867.8, 60 sec: 3208.5, 300 sec: 3207.4). Total num frames: 3575808. Throughput: 0: 754.4. Samples: 878514. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-11-15 07:26:08,766][00663] Avg episode reward: [(0, '23.897')]
[2023-11-15 07:26:09,271][19979] Updated weights for policy 0, policy_version 874 (0.0021)
[2023-11-15 07:26:13,762][00663] Fps is (10 sec: 3686.5, 60 sec: 3276.8, 300 sec: 3235.2). Total num frames: 3596288. Throughput: 0: 775.4. Samples: 884602. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-11-15 07:26:13,765][00663] Avg episode reward: [(0, '22.681')]
[2023-11-15 07:26:18,762][00663] Fps is (10 sec: 3686.4, 60 sec: 3208.5, 300 sec: 3221.3). Total num frames: 3612672. Throughput: 0: 800.1. Samples: 889750. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-11-15 07:26:18,769][00663] Avg episode reward: [(0, '22.993')]
[2023-11-15 07:26:21,639][19979] Updated weights for policy 0, policy_version 884 (0.0033)
[2023-11-15 07:26:23,770][00663] Fps is (10 sec: 2864.8, 60 sec: 3071.6, 300 sec: 3179.5). Total num frames: 3624960. Throughput: 0: 795.6. Samples: 891572. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-11-15 07:26:23,777][00663] Avg episode reward: [(0, '21.834')]
[2023-11-15 07:26:28,766][00663] Fps is (10 sec: 2456.6, 60 sec: 3071.8, 300 sec: 3165.7). Total num frames: 3637248. Throughput: 0: 766.6. Samples: 895192. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-11-15 07:26:28,769][00663] Avg episode reward: [(0, '21.770')]
[2023-11-15 07:26:33,762][00663] Fps is (10 sec: 2459.6, 60 sec: 3003.9, 300 sec: 3165.7). Total num frames: 3649536. Throughput: 0: 727.6. Samples: 899226. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-11-15 07:26:33,770][00663] Avg episode reward: [(0, '21.486')]
[2023-11-15 07:26:36,247][19979] Updated weights for policy 0, policy_version 894 (0.0030)
[2023-11-15 07:26:38,762][00663] Fps is (10 sec: 3278.2, 60 sec: 3140.4, 300 sec: 3193.5). Total num frames: 3670016. Throughput: 0: 739.3. Samples: 902018. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-11-15 07:26:38,768][00663] Avg episode reward: [(0, '24.089')]
[2023-11-15 07:26:43,762][00663] Fps is (10 sec: 3686.4, 60 sec: 3072.0, 300 sec: 3179.6). Total num frames: 3686400. Throughput: 0: 783.6. Samples: 907914. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-11-15 07:26:43,764][00663] Avg episode reward: [(0, '23.112')]
[2023-11-15 07:26:48,659][19979] Updated weights for policy 0, policy_version 904 (0.0029)
[2023-11-15 07:26:48,762][00663] Fps is (10 sec: 3276.8, 60 sec: 3072.0, 300 sec: 3165.7). Total num frames: 3702784. Throughput: 0: 790.0. Samples: 911960. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-11-15 07:26:48,775][00663] Avg episode reward: [(0, '23.950')]
[2023-11-15 07:26:53,762][00663] Fps is (10 sec: 2867.2, 60 sec: 3003.7, 300 sec: 3151.8). Total num frames: 3715072. Throughput: 0: 789.6. Samples: 914048. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-11-15 07:26:53,767][00663] Avg episode reward: [(0, '23.163')]
[2023-11-15 07:26:58,762][00663] Fps is (10 sec: 2457.6, 60 sec: 3003.8, 300 sec: 3138.0). Total num frames: 3727360. Throughput: 0: 744.8. Samples: 918116. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-11-15 07:26:58,766][00663] Avg episode reward: [(0, '23.876')]
[2023-11-15 07:27:02,003][19979] Updated weights for policy 0, policy_version 914 (0.0020)
[2023-11-15 07:27:03,762][00663] Fps is (10 sec: 3276.8, 60 sec: 3140.3, 300 sec: 3165.7). Total num frames: 3747840. Throughput: 0: 763.6. Samples: 924110. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-11-15 07:27:03,773][00663] Avg episode reward: [(0, '24.615')]
[2023-11-15 07:27:08,762][00663] Fps is (10 sec: 4096.0, 60 sec: 3208.5, 300 sec: 3165.7). Total num frames: 3768320. Throughput: 0: 785.6. Samples: 926916. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-11-15 07:27:08,772][00663] Avg episode reward: [(0, '26.098')]
[2023-11-15 07:27:13,762][00663] Fps is (10 sec: 3276.7, 60 sec: 3072.0, 300 sec: 3138.0). Total num frames: 3780608. Throughput: 0: 802.2. Samples: 931288. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-11-15 07:27:13,768][00663] Avg episode reward: [(0, '26.011')]
[2023-11-15 07:27:14,591][19979] Updated weights for policy 0, policy_version 924 (0.0048)
[2023-11-15 07:27:18,763][00663] Fps is (10 sec: 2457.3, 60 sec: 3003.7, 300 sec: 3124.1). Total num frames: 3792896. Throughput: 0: 792.6. Samples: 934892. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-11-15 07:27:18,767][00663] Avg episode reward: [(0, '26.845')]
[2023-11-15 07:27:23,762][00663] Fps is (10 sec: 2457.6, 60 sec: 3004.1, 300 sec: 3124.1). Total num frames: 3805184. Throughput: 0: 772.0. Samples: 936758. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-11-15 07:27:23,767][00663] Avg episode reward: [(0, '27.055')]
[2023-11-15 07:27:28,762][00663] Fps is (10 sec: 2867.6, 60 sec: 3072.2, 300 sec: 3124.1). Total num frames: 3821568. Throughput: 0: 734.4. Samples: 940964. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-11-15 07:27:28,768][00663] Avg episode reward: [(0, '26.089')]
[2023-11-15 07:27:29,585][19979] Updated weights for policy 0, policy_version 934 (0.0020)
[2023-11-15 07:27:33,762][00663] Fps is (10 sec: 3276.8, 60 sec: 3140.3, 300 sec: 3124.1). Total num frames: 3837952. Throughput: 0: 771.8. Samples: 946690. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-11-15 07:27:33,765][00663] Avg episode reward: [(0, '26.026')]
[2023-11-15 07:27:38,764][00663] Fps is (10 sec: 3276.1, 60 sec: 3071.9, 300 sec: 3124.0). Total num frames: 3854336. Throughput: 0: 783.6. Samples: 949312. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-11-15 07:27:38,773][00663] Avg episode reward: [(0, '25.958')]
[2023-11-15 07:27:38,790][19966] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000941_3854336.pth...
[2023-11-15 07:27:38,938][19966] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000760_3112960.pth
[2023-11-15 07:27:42,730][19979] Updated weights for policy 0, policy_version 944 (0.0014)
[2023-11-15 07:27:43,768][00663] Fps is (10 sec: 2865.4, 60 sec: 3003.4, 300 sec: 3110.1). Total num frames: 3866624. Throughput: 0: 773.3. Samples: 952918. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-11-15 07:27:43,772][00663] Avg episode reward: [(0, '25.707')]
[2023-11-15 07:27:48,762][00663] Fps is (10 sec: 2458.1, 60 sec: 2935.5, 300 sec: 3110.2). Total num frames: 3878912. Throughput: 0: 719.4. Samples: 956482. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-11-15 07:27:48,765][00663] Avg episode reward: [(0, '25.896')]
[2023-11-15 07:27:53,762][00663] Fps is (10 sec: 2459.1, 60 sec: 2935.5, 300 sec: 3110.2). Total num frames: 3891200. Throughput: 0: 693.5. Samples: 958122. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-11-15 07:27:53,765][00663] Avg episode reward: [(0, '24.568')]
[2023-11-15 07:27:57,585][19979] Updated weights for policy 0, policy_version 954 (0.0014)
[2023-11-15 07:27:58,762][00663] Fps is (10 sec: 3276.8, 60 sec: 3072.0, 300 sec: 3124.1). Total num frames: 3911680. Throughput: 0: 710.8. Samples: 963274. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-11-15 07:27:58,765][00663] Avg episode reward: [(0, '24.958')]
[2023-11-15 07:28:03,762][00663] Fps is (10 sec: 3686.4, 60 sec: 3003.7, 300 sec: 3124.1). Total num frames: 3928064. Throughput: 0: 754.7. Samples: 968852. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-11-15 07:28:03,772][00663] Avg episode reward: [(0, '25.583')]
[2023-11-15 07:28:08,762][00663] Fps is (10 sec: 2867.2, 60 sec: 2867.2, 300 sec: 3096.3). Total num frames: 3940352. Throughput: 0: 749.3. Samples: 970476. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-11-15 07:28:08,775][00663] Avg episode reward: [(0, '25.390')]
[2023-11-15 07:28:11,854][19979] Updated weights for policy 0, policy_version 964 (0.0049)
[2023-11-15 07:28:13,764][00663] Fps is (10 sec: 2457.1, 60 sec: 2867.1, 300 sec: 3082.4). Total num frames: 3952640. Throughput: 0: 729.2. Samples: 973778. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-11-15 07:28:13,767][00663] Avg episode reward: [(0, '26.581')]
[2023-11-15 07:28:18,762][00663] Fps is (10 sec: 2457.6, 60 sec: 2867.3, 300 sec: 3082.4). Total num frames: 3964928. Throughput: 0: 681.1. Samples: 977338. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2023-11-15 07:28:18,769][00663] Avg episode reward: [(0, '26.764')]
[2023-11-15 07:28:23,762][00663] Fps is (10 sec: 2867.8, 60 sec: 2935.5, 300 sec: 3096.3). Total num frames: 3981312. Throughput: 0: 680.9. Samples: 979950. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-11-15 07:28:23,771][00663] Avg episode reward: [(0, '26.737')]
[2023-11-15 07:28:25,370][19979] Updated weights for policy 0, policy_version 974 (0.0022)
[2023-11-15 07:28:28,762][00663] Fps is (10 sec: 3686.4, 60 sec: 3003.7, 300 sec: 3096.3). Total num frames: 4001792. Throughput: 0: 735.1. Samples: 985992. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-11-15 07:28:28,765][00663] Avg episode reward: [(0, '26.393')]
[2023-11-15 07:28:29,542][19966] Stopping Batcher_0...
[2023-11-15 07:28:29,542][19966] Loop batcher_evt_loop terminating...
[2023-11-15 07:28:29,543][00663] Component Batcher_0 stopped!
[2023-11-15 07:28:29,547][19966] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2023-11-15 07:28:29,596][00663] Component RolloutWorker_w6 stopped!
[2023-11-15 07:28:29,609][00663] Component RolloutWorker_w2 stopped!
[2023-11-15 07:28:29,616][19982] Stopping RolloutWorker_w2...
[2023-11-15 07:28:29,617][19982] Loop rollout_proc2_evt_loop terminating...
[2023-11-15 07:28:29,615][00663] Component RolloutWorker_w3 stopped!
[2023-11-15 07:28:29,603][19991] Stopping RolloutWorker_w6...
[2023-11-15 07:28:29,612][19983] Stopping RolloutWorker_w3...
[2023-11-15 07:28:29,618][19991] Loop rollout_proc6_evt_loop terminating...
[2023-11-15 07:28:29,635][00663] Component RolloutWorker_w1 stopped!
[2023-11-15 07:28:29,647][00663] Component RolloutWorker_w5 stopped!
[2023-11-15 07:28:29,649][19989] Stopping RolloutWorker_w5...
[2023-11-15 07:28:29,637][19980] Stopping RolloutWorker_w1...
[2023-11-15 07:28:29,628][19983] Loop rollout_proc3_evt_loop terminating...
[2023-11-15 07:28:29,653][00663] Component RolloutWorker_w0 stopped!
[2023-11-15 07:28:29,656][19989] Loop rollout_proc5_evt_loop terminating...
[2023-11-15 07:28:29,657][19980] Loop rollout_proc1_evt_loop terminating...
[2023-11-15 07:28:29,654][19981] Stopping RolloutWorker_w0...
[2023-11-15 07:28:29,663][19981] Loop rollout_proc0_evt_loop terminating...
[2023-11-15 07:28:29,663][00663] Component RolloutWorker_w7 stopped!
[2023-11-15 07:28:29,669][19984] Stopping RolloutWorker_w4...
[2023-11-15 07:28:29,669][00663] Component RolloutWorker_w4 stopped!
[2023-11-15 07:28:29,666][19990] Stopping RolloutWorker_w7...
[2023-11-15 07:28:29,675][19984] Loop rollout_proc4_evt_loop terminating...
[2023-11-15 07:28:29,675][19990] Loop rollout_proc7_evt_loop terminating...
[2023-11-15 07:28:29,683][19979] Weights refcount: 2 0
[2023-11-15 07:28:29,686][19979] Stopping InferenceWorker_p0-w0...
[2023-11-15 07:28:29,689][19979] Loop inference_proc0-0_evt_loop terminating...
[2023-11-15 07:28:29,686][00663] Component InferenceWorker_p0-w0 stopped!
[2023-11-15 07:28:29,735][19966] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000850_3481600.pth
[2023-11-15 07:28:29,744][19966] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2023-11-15 07:28:29,920][19966] Stopping LearnerWorker_p0...
[2023-11-15 07:28:29,921][19966] Loop learner_proc0_evt_loop terminating...
[2023-11-15 07:28:29,920][00663] Component LearnerWorker_p0 stopped!
[2023-11-15 07:28:29,922][00663] Waiting for process learner_proc0 to stop...
[2023-11-15 07:28:31,734][00663] Waiting for process inference_proc0-0 to join...
[2023-11-15 07:28:31,949][00663] Waiting for process rollout_proc0 to join...
[2023-11-15 07:28:35,237][00663] Waiting for process rollout_proc1 to join...
[2023-11-15 07:28:35,239][00663] Waiting for process rollout_proc2 to join...
[2023-11-15 07:28:35,240][00663] Waiting for process rollout_proc3 to join...
[2023-11-15 07:28:35,242][00663] Waiting for process rollout_proc4 to join...
[2023-11-15 07:28:35,244][00663] Waiting for process rollout_proc5 to join...
[2023-11-15 07:28:35,247][00663] Waiting for process rollout_proc6 to join...
[2023-11-15 07:28:35,249][00663] Waiting for process rollout_proc7 to join...
[2023-11-15 07:28:35,251][00663] Batcher 0 profile tree view:
batching: 27.7594, releasing_batches: 0.0312
[2023-11-15 07:28:35,255][00663] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0000
  wait_policy_total: 549.7311
update_model: 9.1496
  weight_update: 0.0023
one_step: 0.0114
  handle_policy_step: 624.9387
    deserialize: 16.7758, stack: 3.4160, obs_to_device_normalize: 122.3939, forward: 342.2939, send_messages: 28.8077
    prepare_outputs: 80.7034
      to_cpu: 45.8282
[2023-11-15 07:28:35,257][00663] Learner 0 profile tree view:
misc: 0.0053, prepare_batch: 13.0526
train: 73.9165
  epoch_init: 0.0099, minibatch_init: 0.0125, losses_postprocess: 0.6118, kl_divergence: 0.7207, after_optimizer: 4.1290
  calculate_losses: 25.7212
    losses_init: 0.0039, forward_head: 1.4681, bptt_initial: 16.5538, tail: 1.2188, advantages_returns: 0.2941, losses: 3.7579
    bptt: 2.0750
      bptt_forward_core: 1.9813
  update: 41.9919
    clip: 0.8657
[2023-11-15 07:28:35,258][00663] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.3667, enqueue_policy_requests: 163.3112, env_step: 909.9032, overhead: 25.1335, complete_rollouts: 7.5770
save_policy_outputs: 22.8360
  split_output_tensors: 11.2868
[2023-11-15 07:28:35,260][00663] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.3751, enqueue_policy_requests: 165.9420, env_step: 907.0107, overhead: 23.9809, complete_rollouts: 7.8433
save_policy_outputs: 22.6742
  split_output_tensors: 11.0009
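The indented "profile tree view" blocks above are hierarchical timing accumulators: each section's wall-clock time is summed under its parent, and indentation depth encodes nesting. A minimal sketch of such a timer, illustrative only and not Sample Factory's actual Timing implementation:

import time
from contextlib import contextmanager

class TimingTree:
    def __init__(self):
        self.totals = {}   # path (tuple of section names) -> accumulated seconds
        self._stack = []   # names of currently open sections

    @contextmanager
    def timeit(self, name: str):
        self._stack.append(name)
        key, start = tuple(self._stack), time.perf_counter()
        try:
            yield
        finally:
            self.totals[key] = self.totals.get(key, 0.0) + time.perf_counter() - start
            self._stack.pop()

    def tree_view(self) -> str:
        # Two spaces per nesting level, mirroring the blocks above.
        return "\n".join(f"{'  ' * (len(path) - 1)}{path[-1]}: {total:.4f}"
                         for path, total in sorted(self.totals.items()))

t = TimingTree()
with t.timeit("train"):
    with t.timeit("calculate_losses"):
        time.sleep(0.01)
print(t.tree_view())  # prints "train: ..." with "  calculate_losses: ..." nested under it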
[2023-11-15 07:28:35,265][00663] Loop Runner_EvtLoop terminating...
[2023-11-15 07:28:35,267][00663] Runner profile tree view:
main_loop: 1253.2841
[2023-11-15 07:28:35,268][00663] Collected {0: 4005888}, FPS: 3150.6
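The line above closes the training run: 4,005,888 environment frames collected at an overall 3150.6 FPS. Because the per-interval statistics exist only as log text, a small parsing sketch can recover the throughput and reward curves from a log in this format; the regexes below match the "Fps is (...)" and "Avg episode reward: [(0, '...')]" lines above, and the filename sf_log.txt is a hypothetical example.

import re

FPS_RE = re.compile(r"Fps is \(10 sec: ([\d.]+), 60 sec: ([\d.]+), 300 sec: ([\d.]+)\)."
                    r" Total num frames: (\d+)")
REWARD_RE = re.compile(r"Avg episode reward: \[\(0, '([-\d.]+)'\)\]")

def parse_log(path: str):
    fps, rewards = [], []
    with open(path) as f:
        for line in f:
            if (m := FPS_RE.search(line)):
                fps.append((int(m.group(4)), float(m.group(1))))  # (total frames, 10-sec FPS)
            elif (m := REWARD_RE.search(line)):
                rewards.append(float(m.group(1)))
    return fps, rewards

fps, rewards = parse_log("sf_log.txt")
print(f"{len(fps)} FPS samples, best avg reward {max(rewards):.3f}")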
[2023-11-15 07:28:35,330][00663] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2023-11-15 07:28:35,332][00663] Overriding arg 'num_workers' with value 1 passed from command line
[2023-11-15 07:28:35,334][00663] Adding new argument 'no_render'=True that is not in the saved config file!
[2023-11-15 07:28:35,337][00663] Adding new argument 'save_video'=True that is not in the saved config file!
[2023-11-15 07:28:35,339][00663] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2023-11-15 07:28:35,343][00663] Adding new argument 'video_name'=None that is not in the saved config file!
[2023-11-15 07:28:35,344][00663] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
[2023-11-15 07:28:35,345][00663] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2023-11-15 07:28:35,346][00663] Adding new argument 'push_to_hub'=False that is not in the saved config file!
[2023-11-15 07:28:35,347][00663] Adding new argument 'hf_repository'=None that is not in the saved config file!
[2023-11-15 07:28:35,348][00663] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2023-11-15 07:28:35,349][00663] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2023-11-15 07:28:35,350][00663] Adding new argument 'train_script'=None that is not in the saved config file!
[2023-11-15 07:28:35,356][00663] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2023-11-15 07:28:35,357][00663] Using frameskip 1 and render_action_repeat=4 for evaluation
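
Above, the enjoy entry point reloads the saved training configuration and layers evaluation-only arguments on top, logging each override. A minimal stdlib sketch of that merge behavior (load_eval_cfg is a made-up name; Sample Factory's real parser does this through its argparse machinery):

import json

def load_eval_cfg(config_path, overrides):
    # start from the saved training configuration
    with open(config_path) as f:
        cfg = json.load(f)
    for key, value in overrides.items():
        if key in cfg:
            print(f"Overriding arg {key!r} with value {value!r} passed from command line")
        else:
            print(f"Adding new argument {key!r}={value!r} that is not in the saved config file!")
        cfg[key] = value
    return cfg

cfg = load_eval_cfg(
    "/content/train_dir/default_experiment/config.json",
    {"num_workers": 1, "no_render": True, "save_video": True, "max_num_episodes": 10},
)
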
[2023-11-15 07:28:35,636][00663] RunningMeanStd input shape: (3, 72, 128)
[2023-11-15 07:28:35,639][00663] RunningMeanStd input shape: (1,)
[2023-11-15 07:28:35,657][00663] ConvEncoder: input_channels=3
[2023-11-15 07:28:35,720][00663] Conv encoder output size: 512
[2023-11-15 07:28:35,723][00663] Policy head output size: 512
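
The two RunningMeanStd lines above are the normalizers being rebuilt for evaluation: one over the (3, 72, 128) image observations and, given normalize_returns=True in this experiment's config, a scalar one that is presumably the returns normalizer. A minimal NumPy sketch of the standard running mean/std update such normalizers implement (Sample Factory's version runs in-place; this is just the textbook parallel-variance merge):

import numpy as np

class RunningMeanStd:
    """Track a running mean/variance per element, e.g. over (3, 72, 128) observations."""

    def __init__(self, shape, eps=1e-4):
        self.mean = np.zeros(shape, dtype=np.float64)
        self.var = np.ones(shape, dtype=np.float64)
        self.count = eps  # avoids division by zero before the first update

    def update(self, batch):
        # merge batch statistics into the running ones (Chan et al. parallel update)
        b_mean, b_var, b_count = batch.mean(0), batch.var(0), batch.shape[0]
        delta = b_mean - self.mean
        total = self.count + b_count
        self.mean = self.mean + delta * b_count / total
        m2 = self.var * self.count + b_var * b_count + delta**2 * self.count * b_count / total
        self.var = m2 / total
        self.count = total

    def normalize(self, x):
        return (x - self.mean) / np.sqrt(self.var + 1e-8)
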
[2023-11-15 07:28:35,752][00663] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2023-11-15 07:28:36,489][00663] Num frames 100...
[2023-11-15 07:28:36,695][00663] Num frames 200...
[2023-11-15 07:28:36,898][00663] Num frames 300...
[2023-11-15 07:28:37,116][00663] Num frames 400...
[2023-11-15 07:28:37,329][00663] Num frames 500...
[2023-11-15 07:28:37,541][00663] Num frames 600...
[2023-11-15 07:28:37,772][00663] Num frames 700...
[2023-11-15 07:28:37,989][00663] Num frames 800...
[2023-11-15 07:28:38,199][00663] Num frames 900...
[2023-11-15 07:28:38,415][00663] Num frames 1000...
[2023-11-15 07:28:38,568][00663] Num frames 1100...
[2023-11-15 07:28:38,716][00663] Num frames 1200...
[2023-11-15 07:28:38,860][00663] Num frames 1300...
[2023-11-15 07:28:39,041][00663] Num frames 1400...
[2023-11-15 07:28:39,183][00663] Num frames 1500...
[2023-11-15 07:28:39,329][00663] Num frames 1600...
[2023-11-15 07:28:39,399][00663] Avg episode rewards: #0: 38.080, true rewards: #0: 16.080
[2023-11-15 07:28:39,400][00663] Avg episode reward: 38.080, avg true_objective: 16.080
[2023-11-15 07:28:39,535][00663] Num frames 1700...
[2023-11-15 07:28:39,676][00663] Num frames 1800...
[2023-11-15 07:28:39,809][00663] Num frames 1900...
[2023-11-15 07:28:39,991][00663] Avg episode rewards: #0: 21.465, true rewards: #0: 9.965
[2023-11-15 07:28:39,992][00663] Avg episode reward: 21.465, avg true_objective: 9.965
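
The "Avg episode rewards" lines are running means over the episodes evaluated so far, so individual episode returns can be recovered by inverting the average: episode 1 scored 38.080, and since the mean after two episodes is 21.465, episode 2 must have scored 2 × 21.465 - 38.080 = 4.85 (and 2 × 9.965 - 16.080 = 3.85 on the true objective). The "true rewards" column appears to track the env's unshaped objective (cf. pbt_target_objective=true_objective in the config), while the first column is the shaped reward the learner optimizes.
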
[2023-11-15 07:28:40,008][00663] Num frames 2000...
[2023-11-15 07:28:40,156][00663] Num frames 2100...
[2023-11-15 07:28:40,299][00663] Num frames 2200...
[2023-11-15 07:28:40,438][00663] Num frames 2300...
[2023-11-15 07:28:40,585][00663] Num frames 2400...
[2023-11-15 07:28:40,725][00663] Num frames 2500...
[2023-11-15 07:28:40,866][00663] Num frames 2600...
[2023-11-15 07:28:41,013][00663] Num frames 2700...
[2023-11-15 07:28:41,167][00663] Num frames 2800...
[2023-11-15 07:28:41,311][00663] Num frames 2900...
[2023-11-15 07:28:41,455][00663] Num frames 3000...
[2023-11-15 07:28:41,595][00663] Num frames 3100...
[2023-11-15 07:28:41,742][00663] Num frames 3200...
[2023-11-15 07:28:41,887][00663] Num frames 3300...
[2023-11-15 07:28:42,030][00663] Num frames 3400...
[2023-11-15 07:28:42,188][00663] Num frames 3500...
[2023-11-15 07:28:42,340][00663] Num frames 3600...
[2023-11-15 07:28:42,489][00663] Num frames 3700...
[2023-11-15 07:28:42,641][00663] Num frames 3800...
[2023-11-15 07:28:42,785][00663] Num frames 3900...
[2023-11-15 07:28:42,931][00663] Num frames 4000...
[2023-11-15 07:28:43,115][00663] Avg episode rewards: #0: 32.976, true rewards: #0: 13.643
[2023-11-15 07:28:43,117][00663] Avg episode reward: 32.976, avg true_objective: 13.643
[2023-11-15 07:28:43,131][00663] Num frames 4100...
[2023-11-15 07:28:43,277][00663] Num frames 4200...
[2023-11-15 07:28:43,422][00663] Num frames 4300...
[2023-11-15 07:28:43,564][00663] Num frames 4400...
[2023-11-15 07:28:43,712][00663] Num frames 4500...
[2023-11-15 07:28:43,855][00663] Num frames 4600...
[2023-11-15 07:28:43,993][00663] Num frames 4700...
[2023-11-15 07:28:44,135][00663] Num frames 4800...
[2023-11-15 07:28:44,277][00663] Num frames 4900...
[2023-11-15 07:28:44,425][00663] Num frames 5000...
[2023-11-15 07:28:44,569][00663] Num frames 5100...
[2023-11-15 07:28:44,715][00663] Num frames 5200...
[2023-11-15 07:28:44,852][00663] Avg episode rewards: #0: 32.387, true rewards: #0: 13.137
[2023-11-15 07:28:44,854][00663] Avg episode reward: 32.387, avg true_objective: 13.137
[2023-11-15 07:28:44,925][00663] Num frames 5300...
[2023-11-15 07:28:45,066][00663] Num frames 5400...
[2023-11-15 07:28:45,216][00663] Num frames 5500...
[2023-11-15 07:28:45,364][00663] Num frames 5600...
[2023-11-15 07:28:45,510][00663] Num frames 5700...
[2023-11-15 07:28:45,654][00663] Num frames 5800...
[2023-11-15 07:28:45,795][00663] Num frames 5900...
[2023-11-15 07:28:45,935][00663] Num frames 6000...
[2023-11-15 07:28:46,079][00663] Num frames 6100...
[2023-11-15 07:28:46,231][00663] Num frames 6200...
[2023-11-15 07:28:46,373][00663] Num frames 6300...
[2023-11-15 07:28:46,516][00663] Num frames 6400...
[2023-11-15 07:28:46,665][00663] Num frames 6500...
[2023-11-15 07:28:46,804][00663] Num frames 6600...
[2023-11-15 07:28:46,991][00663] Avg episode rewards: #0: 32.376, true rewards: #0: 13.376
[2023-11-15 07:28:46,993][00663] Avg episode reward: 32.376, avg true_objective: 13.376
[2023-11-15 07:28:47,016][00663] Num frames 6700...
[2023-11-15 07:28:47,159][00663] Num frames 6800...
[2023-11-15 07:28:47,306][00663] Num frames 6900...
[2023-11-15 07:28:47,451][00663] Num frames 7000...
[2023-11-15 07:28:47,598][00663] Num frames 7100...
[2023-11-15 07:28:47,742][00663] Num frames 7200...
[2023-11-15 07:28:47,884][00663] Num frames 7300...
[2023-11-15 07:28:48,018][00663] Num frames 7400...
[2023-11-15 07:28:48,156][00663] Num frames 7500...
[2023-11-15 07:28:48,344][00663] Avg episode rewards: #0: 29.807, true rewards: #0: 12.640
[2023-11-15 07:28:48,345][00663] Avg episode reward: 29.807, avg true_objective: 12.640
[2023-11-15 07:28:48,375][00663] Num frames 7600...
[2023-11-15 07:28:48,540][00663] Num frames 7700...
[2023-11-15 07:28:48,750][00663] Num frames 7800...
[2023-11-15 07:28:48,951][00663] Num frames 7900...
[2023-11-15 07:28:49,157][00663] Num frames 8000...
[2023-11-15 07:28:49,372][00663] Num frames 8100...
[2023-11-15 07:28:49,577][00663] Num frames 8200...
[2023-11-15 07:28:49,789][00663] Num frames 8300...
[2023-11-15 07:28:49,989][00663] Num frames 8400...
[2023-11-15 07:28:50,191][00663] Num frames 8500...
[2023-11-15 07:28:50,404][00663] Num frames 8600...
[2023-11-15 07:28:50,605][00663] Num frames 8700...
[2023-11-15 07:28:50,807][00663] Num frames 8800...
[2023-11-15 07:28:51,007][00663] Num frames 8900...
[2023-11-15 07:28:51,215][00663] Num frames 9000...
[2023-11-15 07:28:51,437][00663] Num frames 9100...
[2023-11-15 07:28:51,648][00663] Num frames 9200...
[2023-11-15 07:28:51,843][00663] Num frames 9300...
[2023-11-15 07:28:52,042][00663] Num frames 9400...
[2023-11-15 07:28:52,243][00663] Num frames 9500...
[2023-11-15 07:28:52,452][00663] Num frames 9600...
[2023-11-15 07:28:52,682][00663] Avg episode rewards: #0: 33.977, true rewards: #0: 13.834
[2023-11-15 07:28:52,685][00663] Avg episode reward: 33.977, avg true_objective: 13.834
[2023-11-15 07:28:52,727][00663] Num frames 9700...
[2023-11-15 07:28:52,933][00663] Num frames 9800...
[2023-11-15 07:28:53,141][00663] Num frames 9900...
[2023-11-15 07:28:53,356][00663] Num frames 10000...
[2023-11-15 07:28:53,567][00663] Num frames 10100...
[2023-11-15 07:28:53,780][00663] Num frames 10200...
[2023-11-15 07:28:53,985][00663] Num frames 10300...
[2023-11-15 07:28:54,187][00663] Num frames 10400...
[2023-11-15 07:28:54,403][00663] Num frames 10500...
[2023-11-15 07:28:54,557][00663] Num frames 10600...
[2023-11-15 07:28:54,701][00663] Num frames 10700...
[2023-11-15 07:28:54,846][00663] Num frames 10800...
[2023-11-15 07:28:54,988][00663] Num frames 10900...
[2023-11-15 07:28:55,094][00663] Avg episode rewards: #0: 33.541, true rewards: #0: 13.666
[2023-11-15 07:28:55,095][00663] Avg episode reward: 33.541, avg true_objective: 13.666
[2023-11-15 07:28:55,195][00663] Num frames 11000...
[2023-11-15 07:28:55,334][00663] Num frames 11100...
[2023-11-15 07:28:55,490][00663] Num frames 11200...
[2023-11-15 07:28:55,630][00663] Num frames 11300...
[2023-11-15 07:28:55,767][00663] Num frames 11400...
[2023-11-15 07:28:55,907][00663] Num frames 11500...
[2023-11-15 07:28:56,034][00663] Avg episode rewards: #0: 31.386, true rewards: #0: 12.831
[2023-11-15 07:28:56,036][00663] Avg episode reward: 31.386, avg true_objective: 12.831
[2023-11-15 07:28:56,112][00663] Num frames 11600...
[2023-11-15 07:28:56,251][00663] Num frames 11700...
[2023-11-15 07:28:56,392][00663] Num frames 11800...
[2023-11-15 07:28:56,456][00663] Avg episode rewards: #0: 28.604, true rewards: #0: 11.804
[2023-11-15 07:28:56,457][00663] Avg episode reward: 28.604, avg true_objective: 11.804
[2023-11-15 07:30:22,153][00663] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
[2023-11-15 07:30:22,734][00663] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2023-11-15 07:30:22,737][00663] Overriding arg 'num_workers' with value 1 passed from command line
[2023-11-15 07:30:22,739][00663] Adding new argument 'no_render'=True that is not in the saved config file!
[2023-11-15 07:30:22,741][00663] Adding new argument 'save_video'=True that is not in the saved config file!
[2023-11-15 07:30:22,743][00663] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2023-11-15 07:30:22,745][00663] Adding new argument 'video_name'=None that is not in the saved config file!
[2023-11-15 07:30:22,747][00663] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2023-11-15 07:30:22,748][00663] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2023-11-15 07:30:22,749][00663] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2023-11-15 07:30:22,750][00663] Adding new argument 'hf_repository'='nikxtaco/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
[2023-11-15 07:30:22,751][00663] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2023-11-15 07:30:22,752][00663] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2023-11-15 07:30:22,754][00663] Adding new argument 'train_script'=None that is not in the saved config file!
[2023-11-15 07:30:22,755][00663] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2023-11-15 07:30:22,756][00663] Using frameskip 1 and render_action_repeat=4 for evaluation
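
This second evaluation pass is the upload run: same checkpoint and episode budget, but with push_to_hub=True and hf_repository set, so the replay video and model files are pushed to the Hub once the ten episodes finish. In the Deep RL course notebook this is typically driven by something like the sketch below; parse_vizdoom_cfg is the notebook's own helper around Sample Factory's CLI parser, assumed here rather than shown:

from sample_factory.enjoy import enjoy

# Arguments mirror the overrides logged above. parse_vizdoom_cfg (assumed,
# defined in the course notebook) registers the Doom envs and parses argv.
cfg = parse_vizdoom_cfg(
    argv=[
        "--env=doom_health_gathering_supreme",
        "--num_workers=1",
        "--no_render",
        "--save_video",
        "--max_num_episodes=10",
        "--max_num_frames=100000",
        "--push_to_hub",
        "--hf_repository=nikxtaco/rl_course_vizdoom_health_gathering_supreme",
    ],
    evaluation=True,
)
status = enjoy(cfg)
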
[2023-11-15 07:30:22,800][00663] RunningMeanStd input shape: (3, 72, 128)
[2023-11-15 07:30:22,803][00663] RunningMeanStd input shape: (1,)
[2023-11-15 07:30:22,821][00663] ConvEncoder: input_channels=3
[2023-11-15 07:30:22,882][00663] Conv encoder output size: 512
[2023-11-15 07:30:22,885][00663] Policy head output size: 512
[2023-11-15 07:30:22,913][00663] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2023-11-15 07:30:23,687][00663] Num frames 100...
[2023-11-15 07:30:23,864][00663] Num frames 200...
[2023-11-15 07:30:24,056][00663] Num frames 300...
[2023-11-15 07:30:24,272][00663] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840
[2023-11-15 07:30:24,275][00663] Avg episode reward: 3.840, avg true_objective: 3.840
[2023-11-15 07:30:24,311][00663] Num frames 400...
[2023-11-15 07:30:24,496][00663] Num frames 500...
[2023-11-15 07:30:24,691][00663] Num frames 600...
[2023-11-15 07:30:24,901][00663] Num frames 700...
[2023-11-15 07:30:25,093][00663] Num frames 800...
[2023-11-15 07:30:25,271][00663] Num frames 900...
[2023-11-15 07:30:25,463][00663] Num frames 1000...
[2023-11-15 07:30:25,654][00663] Num frames 1100...
[2023-11-15 07:30:25,856][00663] Num frames 1200...
[2023-11-15 07:30:26,044][00663] Num frames 1300...
[2023-11-15 07:30:26,231][00663] Num frames 1400...
[2023-11-15 07:30:26,434][00663] Num frames 1500...
[2023-11-15 07:30:26,644][00663] Num frames 1600...
[2023-11-15 07:30:26,857][00663] Num frames 1700...
[2023-11-15 07:30:27,079][00663] Num frames 1800...
[2023-11-15 07:30:27,274][00663] Num frames 1900...
[2023-11-15 07:30:27,463][00663] Num frames 2000...
[2023-11-15 07:30:27,666][00663] Num frames 2100...
[2023-11-15 07:30:27,891][00663] Num frames 2200...
[2023-11-15 07:30:28,101][00663] Num frames 2300...
[2023-11-15 07:30:28,316][00663] Num frames 2400...
[2023-11-15 07:30:28,538][00663] Avg episode rewards: #0: 32.419, true rewards: #0: 12.420
[2023-11-15 07:30:28,541][00663] Avg episode reward: 32.419, avg true_objective: 12.420
[2023-11-15 07:30:28,592][00663] Num frames 2500...
[2023-11-15 07:30:28,795][00663] Num frames 2600...
[2023-11-15 07:30:28,993][00663] Num frames 2700...
[2023-11-15 07:30:29,190][00663] Num frames 2800...
[2023-11-15 07:30:29,384][00663] Num frames 2900...
[2023-11-15 07:30:29,585][00663] Num frames 3000...
[2023-11-15 07:30:29,817][00663] Num frames 3100...
[2023-11-15 07:30:30,035][00663] Num frames 3200...
[2023-11-15 07:30:30,248][00663] Num frames 3300...
[2023-11-15 07:30:30,475][00663] Num frames 3400...
[2023-11-15 07:30:30,752][00663] Avg episode rewards: #0: 29.303, true rewards: #0: 11.637
[2023-11-15 07:30:30,754][00663] Avg episode reward: 29.303, avg true_objective: 11.637
[2023-11-15 07:30:30,777][00663] Num frames 3500...
[2023-11-15 07:30:31,008][00663] Num frames 3600...
[2023-11-15 07:30:31,226][00663] Num frames 3700...
[2023-11-15 07:30:31,424][00663] Num frames 3800...
[2023-11-15 07:30:31,631][00663] Num frames 3900...
[2023-11-15 07:30:31,840][00663] Num frames 4000...
[2023-11-15 07:30:32,044][00663] Num frames 4100...
[2023-11-15 07:30:32,260][00663] Num frames 4200...
[2023-11-15 07:30:32,470][00663] Num frames 4300...
[2023-11-15 07:30:32,682][00663] Num frames 4400...
[2023-11-15 07:30:32,886][00663] Num frames 4500...
[2023-11-15 07:30:33,087][00663] Num frames 4600...
[2023-11-15 07:30:33,293][00663] Num frames 4700...
[2023-11-15 07:30:33,475][00663] Avg episode rewards: #0: 29.652, true rewards: #0: 11.903
[2023-11-15 07:30:33,478][00663] Avg episode reward: 29.652, avg true_objective: 11.903
[2023-11-15 07:30:33,565][00663] Num frames 4800...
[2023-11-15 07:30:33,779][00663] Num frames 4900...
[2023-11-15 07:30:33,978][00663] Num frames 5000...
[2023-11-15 07:30:34,186][00663] Num frames 5100...
[2023-11-15 07:30:34,398][00663] Num frames 5200...
[2023-11-15 07:30:34,602][00663] Num frames 5300...
[2023-11-15 07:30:34,807][00663] Avg episode rewards: #0: 26.138, true rewards: #0: 10.738
[2023-11-15 07:30:34,810][00663] Avg episode reward: 26.138, avg true_objective: 10.738
[2023-11-15 07:30:34,878][00663] Num frames 5400...
[2023-11-15 07:30:35,089][00663] Num frames 5500...
[2023-11-15 07:30:35,295][00663] Num frames 5600...
[2023-11-15 07:30:35,495][00663] Num frames 5700...
[2023-11-15 07:30:35,706][00663] Num frames 5800...
[2023-11-15 07:30:35,903][00663] Num frames 5900...
[2023-11-15 07:30:36,100][00663] Num frames 6000...
[2023-11-15 07:30:36,231][00663] Num frames 6100...
[2023-11-15 07:30:36,378][00663] Num frames 6200...
[2023-11-15 07:30:36,526][00663] Num frames 6300...
[2023-11-15 07:30:36,724][00663] Num frames 6400...
[2023-11-15 07:30:36,908][00663] Num frames 6500...
[2023-11-15 07:30:37,092][00663] Num frames 6600...
[2023-11-15 07:30:37,283][00663] Num frames 6700...
[2023-11-15 07:30:37,484][00663] Num frames 6800...
[2023-11-15 07:30:37,623][00663] Avg episode rewards: #0: 27.902, true rewards: #0: 11.402
[2023-11-15 07:30:37,625][00663] Avg episode reward: 27.902, avg true_objective: 11.402
[2023-11-15 07:30:37,746][00663] Num frames 6900...
[2023-11-15 07:30:37,934][00663] Num frames 7000...
[2023-11-15 07:30:38,120][00663] Num frames 7100...
[2023-11-15 07:30:38,307][00663] Num frames 7200...
[2023-11-15 07:30:38,492][00663] Avg episode rewards: #0: 24.796, true rewards: #0: 10.367
[2023-11-15 07:30:38,495][00663] Avg episode reward: 24.796, avg true_objective: 10.367
[2023-11-15 07:30:38,592][00663] Num frames 7300...
[2023-11-15 07:30:38,789][00663] Num frames 7400...
[2023-11-15 07:30:38,991][00663] Num frames 7500...
[2023-11-15 07:30:39,185][00663] Num frames 7600...
[2023-11-15 07:30:39,385][00663] Num frames 7700...
[2023-11-15 07:30:39,610][00663] Num frames 7800...
[2023-11-15 07:30:39,803][00663] Num frames 7900...
[2023-11-15 07:30:40,001][00663] Num frames 8000...
[2023-11-15 07:30:40,112][00663] Avg episode rewards: #0: 23.656, true rewards: #0: 10.031
[2023-11-15 07:30:40,113][00663] Avg episode reward: 23.656, avg true_objective: 10.031
[2023-11-15 07:30:40,261][00663] Num frames 8100...
[2023-11-15 07:30:40,469][00663] Num frames 8200...
[2023-11-15 07:30:40,679][00663] Num frames 8300...
[2023-11-15 07:30:40,889][00663] Num frames 8400...
[2023-11-15 07:30:41,101][00663] Num frames 8500...
[2023-11-15 07:30:41,316][00663] Num frames 8600...
[2023-11-15 07:30:41,536][00663] Num frames 8700...
[2023-11-15 07:30:41,734][00663] Num frames 8800...
[2023-11-15 07:30:41,967][00663] Avg episode rewards: #0: 22.988, true rewards: #0: 9.877
[2023-11-15 07:30:41,969][00663] Avg episode reward: 22.988, avg true_objective: 9.877
[2023-11-15 07:30:41,997][00663] Num frames 8900...
[2023-11-15 07:30:42,190][00663] Num frames 9000...
[2023-11-15 07:30:42,326][00663] Num frames 9100...
[2023-11-15 07:30:42,462][00663] Num frames 9200...
[2023-11-15 07:30:42,608][00663] Num frames 9300...
[2023-11-15 07:30:42,745][00663] Num frames 9400...
[2023-11-15 07:30:42,878][00663] Num frames 9500...
[2023-11-15 07:30:43,014][00663] Num frames 9600...
[2023-11-15 07:30:43,167][00663] Num frames 9700...
[2023-11-15 07:30:43,267][00663] Avg episode rewards: #0: 22.728, true rewards: #0: 9.728
[2023-11-15 07:30:43,270][00663] Avg episode reward: 22.728, avg true_objective: 9.728
[2023-11-15 07:31:48,191][00663] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
[2023-11-15 07:31:53,927][00663] The model has been pushed to https://huggingface.co/nikxtaco/rl_course_vizdoom_health_gathering_supreme
[2023-11-15 07:32:45,353][00663] Environment doom_basic already registered, overwriting...
[2023-11-15 07:32:45,356][00663] Environment doom_two_colors_easy already registered, overwriting...
[2023-11-15 07:32:45,358][00663] Environment doom_two_colors_hard already registered, overwriting...
[2023-11-15 07:32:45,359][00663] Environment doom_dm already registered, overwriting...
[2023-11-15 07:32:45,361][00663] Environment doom_dwango5 already registered, overwriting...
[2023-11-15 07:32:45,363][00663] Environment doom_my_way_home_flat_actions already registered, overwriting...
[2023-11-15 07:32:45,364][00663] Environment doom_defend_the_center_flat_actions already registered, overwriting...
[2023-11-15 07:32:45,366][00663] Environment doom_my_way_home already registered, overwriting...
[2023-11-15 07:32:45,368][00663] Environment doom_deadly_corridor already registered, overwriting...
[2023-11-15 07:32:45,369][00663] Environment doom_defend_the_center already registered, overwriting...
[2023-11-15 07:32:45,370][00663] Environment doom_defend_the_line already registered, overwriting...
[2023-11-15 07:32:45,372][00663] Environment doom_health_gathering already registered, overwriting...
[2023-11-15 07:32:45,374][00663] Environment doom_health_gathering_supreme already registered, overwriting...
[2023-11-15 07:32:45,376][00663] Environment doom_battle already registered, overwriting...
[2023-11-15 07:32:45,378][00663] Environment doom_battle2 already registered, overwriting...
[2023-11-15 07:32:45,379][00663] Environment doom_duel_bots already registered, overwriting...
[2023-11-15 07:32:45,381][00663] Environment doom_deathmatch_bots already registered, overwriting...
[2023-11-15 07:32:45,383][00663] Environment doom_duel already registered, overwriting...
[2023-11-15 07:32:45,385][00663] Environment doom_deathmatch_full already registered, overwriting...
[2023-11-15 07:32:45,386][00663] Environment doom_benchmark already registered, overwriting...
[2023-11-15 07:32:45,388][00663] register_encoder_factory: <function make_vizdoom_encoder at 0x7e8c58d712d0>
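
The cascade of "already registered, overwriting" messages above comes from the ViZDoom example components being registered a second time in the same Python process before the resumed training run; the last line shows the custom encoder factory (make_vizdoom_encoder) being installed. A sketch of that registration step, assuming sample-factory 2.x APIs (the exact sf_examples module paths here are assumptions):

from functools import partial

from sample_factory.algo.utils.context import global_model_factory
from sample_factory.envs.env_utils import register_env

# assumed locations within the sample-factory 2.x examples package
from sf_examples.vizdoom.doom.doom_model import make_vizdoom_encoder
from sf_examples.vizdoom.doom.doom_utils import DOOM_ENVS, make_doom_env_from_spec

def register_vizdoom_components():
    # one registry entry per doom_* name seen in the log above;
    # re-running this in a live process triggers the "overwriting" warnings
    for spec in DOOM_ENVS:
        register_env(spec.name, partial(make_doom_env_from_spec, spec))
    # plug the Doom conv encoder into the global model factory
    global_model_factory().register_encoder_factory(make_vizdoom_encoder)

register_vizdoom_components()
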
[2023-11-15 07:32:45,419][00663] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2023-11-15 07:32:45,420][00663] Overriding arg 'train_for_env_steps' with value 50000 passed from command line
[2023-11-15 07:32:45,424][00663] Experiment dir /content/train_dir/default_experiment already exists!
[2023-11-15 07:32:45,425][00663] Resuming existing experiment from /content/train_dir/default_experiment...
[2023-11-15 07:32:45,429][00663] Weights and Biases integration disabled
[2023-11-15 07:32:45,433][00663] Environment var CUDA_VISIBLE_DEVICES is 0
[2023-11-15 07:32:48,231][00663] Starting experiment with the following configuration:
help=False
algo=APPO
env=doom_health_gathering_supreme
experiment=default_experiment
train_dir=/content/train_dir
restart_behavior=resume
device=gpu
seed=None
num_policies=1
async_rl=True
serial_mode=False
batched_sampling=False
num_batches_to_accumulate=2
worker_num_splits=2
policy_workers_per_policy=1
max_policy_lag=1000
num_workers=8
num_envs_per_worker=4
batch_size=1024
num_batches_per_epoch=1
num_epochs=1
rollout=32
recurrence=32
shuffle_minibatches=False
gamma=0.99
reward_scale=1.0
reward_clip=1000.0
value_bootstrap=False
normalize_returns=True
exploration_loss_coeff=0.001
value_loss_coeff=0.5
kl_loss_coeff=0.0
exploration_loss=symmetric_kl
gae_lambda=0.95
ppo_clip_ratio=0.1
ppo_clip_value=0.2
with_vtrace=False
vtrace_rho=1.0
vtrace_c=1.0
optimizer=adam
adam_eps=1e-06
adam_beta1=0.9
adam_beta2=0.999
max_grad_norm=4.0
learning_rate=0.0001
lr_schedule=constant
lr_schedule_kl_threshold=0.008
lr_adaptive_min=1e-06
lr_adaptive_max=0.01
obs_subtract_mean=0.0
obs_scale=255.0
normalize_input=True
normalize_input_keys=None
decorrelate_experience_max_seconds=0
decorrelate_envs_on_one_worker=True
actor_worker_gpus=[]
set_workers_cpu_affinity=True
force_envs_single_thread=False
default_niceness=0
log_to_file=True
experiment_summaries_interval=10
flush_summaries_interval=30
stats_avg=100
summaries_use_frameskip=True
heartbeat_interval=20
heartbeat_reporting_interval=600
train_for_env_steps=50000
train_for_seconds=10000000000
save_every_sec=120
keep_checkpoints=2
load_checkpoint_kind=latest
save_milestones_sec=-1
save_best_every_sec=5
save_best_metric=reward
save_best_after=100000
benchmark=False
encoder_mlp_layers=[512, 512]
encoder_conv_architecture=convnet_simple
encoder_conv_mlp_layers=[512]
use_rnn=True
rnn_size=512
rnn_type=gru
rnn_num_layers=1
decoder_mlp_layers=[]
nonlinearity=elu
policy_initialization=orthogonal
policy_init_gain=1.0
actor_critic_share_weights=True
adaptive_stddev=True
continuous_tanh_scale=0.0
initial_stddev=1.0
use_env_info_cache=False
env_gpu_actions=False
env_gpu_observations=True
env_frameskip=4
env_framestack=1
pixel_format=CHW
use_record_episode_statistics=False
with_wandb=False
wandb_user=None
wandb_project=sample_factory
wandb_group=None
wandb_job_type=SF
wandb_tags=[]
with_pbt=False
pbt_mix_policies_in_one_env=True
pbt_period_env_steps=5000000
pbt_start_mutation=20000000
pbt_replace_fraction=0.3
pbt_mutation_rate=0.15
pbt_replace_reward_gap=0.1
pbt_replace_reward_gap_absolute=1e-06
pbt_optimize_gamma=False
pbt_target_objective=true_objective
pbt_perturb_min=1.1
pbt_perturb_max=1.5
num_agents=-1
num_humans=0
num_bots=-1
start_bot_difficulty=None
timelimit=None
res_w=128
res_h=72
wide_aspect_ratio=False
eval_env_frameskip=1
fps=35
command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=4000
cli_args={'env': 'doom_health_gathering_supreme', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 4000}
git_hash=unknown
git_repo_name=not a git repository
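
The resumed configuration is identical to the original apart from train_for_env_steps. A few of these knobs map directly onto the advantage computation: gamma=0.99 and gae_lambda=0.95 parameterize Generalized Advantage Estimation, whose value targets feed the value_loss_coeff=0.5 term of the loss. A minimal reference implementation of GAE with those defaults (an illustration of the formula, not Sample Factory's batched GPU version):

import numpy as np

def gae_advantages(rewards, values, dones, gamma=0.99, gae_lambda=0.95):
    """GAE over one rollout: rewards/dones have length T, values length T+1
    (the last entry is the bootstrap value of the final state)."""
    rewards, values, dones = map(np.asarray, (rewards, values, dones))
    T = len(rewards)
    adv = np.zeros(T, dtype=np.float32)
    last = 0.0
    for t in reversed(range(T)):
        nonterminal = 1.0 - float(dones[t])
        # TD residual: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * values[t + 1] * nonterminal - values[t]
        # GAE recursion: A_t = delta_t + gamma * lambda * A_{t+1}
        last = delta + gamma * gae_lambda * nonterminal * last
        adv[t] = last
    returns = adv + values[:-1]  # targets for the value loss
    return adv, returns
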
[2023-11-15 07:32:48,234][00663] Saving configuration to /content/train_dir/default_experiment/config.json...
[2023-11-15 07:32:48,237][00663] Rollout worker 0 uses device cpu
[2023-11-15 07:32:48,239][00663] Rollout worker 1 uses device cpu
[2023-11-15 07:32:48,243][00663] Rollout worker 2 uses device cpu
[2023-11-15 07:32:48,244][00663] Rollout worker 3 uses device cpu
[2023-11-15 07:32:48,245][00663] Rollout worker 4 uses device cpu
[2023-11-15 07:32:48,246][00663] Rollout worker 5 uses device cpu
[2023-11-15 07:32:48,251][00663] Rollout worker 6 uses device cpu
[2023-11-15 07:32:48,252][00663] Rollout worker 7 uses device cpu
[2023-11-15 07:32:48,364][00663] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-11-15 07:32:48,367][00663] InferenceWorker_p0-w0: min num requests: 2
[2023-11-15 07:32:48,408][00663] Starting all processes...
[2023-11-15 07:32:48,410][00663] Starting process learner_proc0
[2023-11-15 07:32:48,483][00663] Starting all processes...
[2023-11-15 07:32:48,496][00663] Starting process inference_proc0-0
[2023-11-15 07:32:48,498][00663] Starting process rollout_proc0
[2023-11-15 07:32:48,516][00663] Starting process rollout_proc1
[2023-11-15 07:32:48,517][00663] Starting process rollout_proc2
[2023-11-15 07:32:48,517][00663] Starting process rollout_proc3
[2023-11-15 07:32:48,517][00663] Starting process rollout_proc4
[2023-11-15 07:32:48,517][00663] Starting process rollout_proc5
[2023-11-15 07:32:48,517][00663] Starting process rollout_proc6
[2023-11-15 07:32:48,517][00663] Starting process rollout_proc7
[2023-11-15 07:33:05,025][29796] Worker 1 uses CPU cores [1]
[2023-11-15 07:33:05,444][29797] Worker 2 uses CPU cores [0]
[2023-11-15 07:33:05,498][29798] Worker 3 uses CPU cores [1]
[2023-11-15 07:33:05,640][29794] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-11-15 07:33:05,641][29794] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2023-11-15 07:33:05,711][29794] Num visible devices: 1
[2023-11-15 07:33:05,741][29802] Worker 7 uses CPU cores [1]
[2023-11-15 07:33:05,833][29799] Worker 4 uses CPU cores [0]
[2023-11-15 07:33:05,854][29800] Worker 5 uses CPU cores [1]
[2023-11-15 07:33:05,873][29781] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-11-15 07:33:05,873][29781] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2023-11-15 07:33:05,903][29795] Worker 0 uses CPU cores [0]
[2023-11-15 07:33:05,913][29781] Num visible devices: 1
[2023-11-15 07:33:05,915][29781] Starting seed is not provided
[2023-11-15 07:33:05,916][29781] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-11-15 07:33:05,916][29781] Initializing actor-critic model on device cuda:0
[2023-11-15 07:33:05,917][29781] RunningMeanStd input shape: (3, 72, 128)
[2023-11-15 07:33:05,918][29781] RunningMeanStd input shape: (1,)
[2023-11-15 07:33:05,941][29781] ConvEncoder: input_channels=3
[2023-11-15 07:33:05,950][29801] Worker 6 uses CPU cores [0]
[2023-11-15 07:33:06,110][29781] Conv encoder output size: 512
[2023-11-15 07:33:06,111][29781] Policy head output size: 512
[2023-11-15 07:33:06,135][29781] Created Actor Critic model with architecture:
[2023-11-15 07:33:06,136][29781] ActorCriticSharedWeights(
(obs_normalizer): ObservationNormalizer(
(running_mean_std): RunningMeanStdDictInPlace(
(running_mean_std): ModuleDict(
(obs): RunningMeanStdInPlace()
)
)
)
(returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
(encoder): VizdoomEncoder(
(basic_encoder): ConvEncoder(
(enc): RecursiveScriptModule(
original_name=ConvEncoderImpl
(conv_head): RecursiveScriptModule(
original_name=Sequential
(0): RecursiveScriptModule(original_name=Conv2d)
(1): RecursiveScriptModule(original_name=ELU)
(2): RecursiveScriptModule(original_name=Conv2d)
(3): RecursiveScriptModule(original_name=ELU)
(4): RecursiveScriptModule(original_name=Conv2d)
(5): RecursiveScriptModule(original_name=ELU)
)
(mlp_layers): RecursiveScriptModule(
original_name=Sequential
(0): RecursiveScriptModule(original_name=Linear)
(1): RecursiveScriptModule(original_name=ELU)
)
)
)
)
(core): ModelCoreRNN(
(core): GRU(512, 512)
)
(decoder): MlpDecoder(
(mlp): Identity()
)
(critic_linear): Linear(in_features=512, out_features=1, bias=True)
(action_parameterization): ActionParameterizationDefault(
(distribution_linear): Linear(in_features=512, out_features=5, bias=True)
)
)
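
The printout pins down most of the shapes: three Conv2d+ELU blocks feeding a 512-unit linear layer ("Conv encoder output size: 512"), a single-layer GRU core of width 512, an identity decoder, and 512->1 / 512->5 value and action heads (five discrete actions). A PyTorch sketch reproducing those shapes; the conv kernel and stride choices are assumptions in the spirit of the convnet_simple setting, since the printout omits them:

import torch
from torch import nn

class DoomActorCriticSketch(nn.Module):
    def __init__(self, num_actions=5, rnn_size=512):
        super().__init__()
        # conv_head: three Conv2d+ELU blocks (filter sizes assumed, not logged)
        self.conv_head = nn.Sequential(
            nn.Conv2d(3, 32, 8, stride=4), nn.ELU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ELU(),
            nn.Conv2d(64, 128, 3, stride=2), nn.ELU(),
        )
        # infer the flattened conv size from the (3, 72, 128) observation shape
        with torch.no_grad():
            n = self.conv_head(torch.zeros(1, 3, 72, 128)).flatten(1).shape[1]
        self.mlp = nn.Sequential(nn.Linear(n, 512), nn.ELU())  # "Conv encoder output size: 512"
        self.core = nn.GRU(512, rnn_size)                      # ModelCoreRNN: GRU(512, 512)
        self.critic_linear = nn.Linear(rnn_size, 1)
        self.action_logits = nn.Linear(rnn_size, num_actions)  # distribution_linear: 512 -> 5

    def forward(self, obs, rnn_state=None):
        x = self.mlp(self.conv_head(obs).flatten(1))
        x, rnn_state = self.core(x.unsqueeze(0), rnn_state)    # one-step sequence
        x = x.squeeze(0)
        return self.action_logits(x), self.critic_linear(x), rnn_state
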
[2023-11-15 07:33:06,412][29781] Using optimizer <class 'torch.optim.adam.Adam'>
[2023-11-15 07:33:06,874][29781] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2023-11-15 07:33:06,915][29781] Loading model from checkpoint
[2023-11-15 07:33:06,918][29781] Loaded experiment state at self.train_step=978, self.env_steps=4005888
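
The restored counters line up with the checkpoint filename and the configuration above: each train step consumes batch_size=1024 transitions, and with env_frameskip=4 that is 1024 × 4 = 4,096 environment frames per step, so train_step 978 corresponds to 978 × 4,096 = 4,005,888 env steps. It also explains why this resumed run ends almost immediately: the new target train_for_env_steps=50000 is already far below the restored env_steps of 4,005,888, so the runner performs a single additional train step (checkpoint 000000979 at 979 × 4,096 = 4,009,984 frames, saved below) and then shuts everything down.
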
[2023-11-15 07:33:06,919][29781] Initialized policy 0 weights for model version 978
[2023-11-15 07:33:06,937][29781] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-11-15 07:33:06,946][29781] LearnerWorker_p0 finished initialization!
[2023-11-15 07:33:07,279][29794] RunningMeanStd input shape: (3, 72, 128)
[2023-11-15 07:33:07,282][29794] RunningMeanStd input shape: (1,)
[2023-11-15 07:33:07,302][29794] ConvEncoder: input_channels=3
[2023-11-15 07:33:07,473][29794] Conv encoder output size: 512
[2023-11-15 07:33:07,476][29794] Policy head output size: 512
[2023-11-15 07:33:07,573][00663] Inference worker 0-0 is ready!
[2023-11-15 07:33:07,576][00663] All inference workers are ready! Signal rollout workers to start!
[2023-11-15 07:33:07,842][29800] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-11-15 07:33:07,841][29802] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-11-15 07:33:07,845][29796] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-11-15 07:33:07,844][29798] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-11-15 07:33:07,878][29799] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-11-15 07:33:07,880][29795] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-11-15 07:33:07,884][29797] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-11-15 07:33:07,885][29801] Doom resolution: 160x120, resize resolution: (128, 72)
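
Each worker renders Doom at 160x120 and resizes to the 128x72 observation the model expects (res_w=128, res_h=72, pixel_format=CHW in the config). A minimal sketch of that preprocessing with OpenCV (illustrative; the actual resize lives inside Sample Factory's env wrappers, and the interpolation mode is an assumption):

import cv2
import numpy as np

def preprocess_frame(frame_hwc: np.ndarray) -> np.ndarray:
    """Resize a 120x160x3 Doom frame to the (3, 72, 128) CHW observation."""
    assert frame_hwc.shape == (120, 160, 3)
    resized = cv2.resize(frame_hwc, (128, 72), interpolation=cv2.INTER_AREA)  # dsize is (W, H)
    return resized.transpose(2, 0, 1)  # HWC -> CHW, shape (3, 72, 128)

obs = preprocess_frame(np.zeros((120, 160, 3), dtype=np.uint8))
assert obs.shape == (3, 72, 128)
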
[2023-11-15 07:33:08,353][00663] Heartbeat connected on Batcher_0
[2023-11-15 07:33:08,361][00663] Heartbeat connected on LearnerWorker_p0
[2023-11-15 07:33:08,413][00663] Heartbeat connected on InferenceWorker_p0-w0
[2023-11-15 07:33:09,261][29800] Decorrelating experience for 0 frames...
[2023-11-15 07:33:09,346][29799] Decorrelating experience for 0 frames...
[2023-11-15 07:33:09,358][29797] Decorrelating experience for 0 frames...
[2023-11-15 07:33:09,363][29801] Decorrelating experience for 0 frames...
[2023-11-15 07:33:10,434][00663] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 4005888. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-11-15 07:33:10,749][29797] Decorrelating experience for 32 frames...
[2023-11-15 07:33:10,751][29801] Decorrelating experience for 32 frames...
[2023-11-15 07:33:10,763][29795] Decorrelating experience for 0 frames...
[2023-11-15 07:33:11,048][29796] Decorrelating experience for 0 frames...
[2023-11-15 07:33:11,096][29802] Decorrelating experience for 0 frames...
[2023-11-15 07:33:11,832][29796] Decorrelating experience for 32 frames...
[2023-11-15 07:33:12,228][29795] Decorrelating experience for 32 frames...
[2023-11-15 07:33:12,576][29801] Decorrelating experience for 64 frames...
[2023-11-15 07:33:12,578][29797] Decorrelating experience for 64 frames...
[2023-11-15 07:33:13,530][29802] Decorrelating experience for 32 frames...
[2023-11-15 07:33:13,738][29795] Decorrelating experience for 64 frames...
[2023-11-15 07:33:13,838][29798] Decorrelating experience for 0 frames...
[2023-11-15 07:33:13,868][29796] Decorrelating experience for 64 frames...
[2023-11-15 07:33:13,867][29801] Decorrelating experience for 96 frames...
[2023-11-15 07:33:14,088][00663] Heartbeat connected on RolloutWorker_w6
[2023-11-15 07:33:14,903][29797] Decorrelating experience for 96 frames...
[2023-11-15 07:33:15,016][29800] Decorrelating experience for 32 frames...
[2023-11-15 07:33:15,022][29798] Decorrelating experience for 32 frames...
[2023-11-15 07:33:15,038][29795] Decorrelating experience for 96 frames...
[2023-11-15 07:33:15,261][00663] Heartbeat connected on RolloutWorker_w2
[2023-11-15 07:33:15,433][00663] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4005888. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-11-15 07:33:15,506][00663] Heartbeat connected on RolloutWorker_w0
[2023-11-15 07:33:16,199][29796] Decorrelating experience for 96 frames...
[2023-11-15 07:33:16,379][29802] Decorrelating experience for 64 frames...
[2023-11-15 07:33:16,468][00663] Heartbeat connected on RolloutWorker_w1
[2023-11-15 07:33:16,759][29798] Decorrelating experience for 64 frames...
[2023-11-15 07:33:17,730][29799] Decorrelating experience for 32 frames...
[2023-11-15 07:33:18,946][29800] Decorrelating experience for 64 frames...
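
The "Decorrelating experience for N frames" messages show each worker warming up its four environments by 0/32/64/96 frames before real collection, so episode boundaries across the vectorized envs fall out of phase (decorrelate_envs_on_one_worker=True, num_envs_per_worker=4 in the config). A toy version of that warm-up, assuming gym-style envs and ignoring episode resets for brevity:

def decorrelate(envs, frames_per_slot=32):
    """Step each env in a worker a different number of frames before collection."""
    for i, env in enumerate(envs):      # four envs per worker in this run
        env.reset()
        warmup = i * frames_per_slot    # 0, 32, 64, 96 -- as logged above
        print(f"Decorrelating experience for {warmup} frames...")
        for _ in range(warmup):
            env.step(env.action_space.sample())
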
[2023-11-15 07:33:19,909][29781] Stopping Batcher_0...
[2023-11-15 07:33:19,911][29781] Loop batcher_evt_loop terminating...
[2023-11-15 07:33:19,915][00663] Component Batcher_0 stopped!
[2023-11-15 07:33:19,917][29781] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000979_4009984.pth...
[2023-11-15 07:33:19,950][29796] Stopping RolloutWorker_w1...
[2023-11-15 07:33:19,950][00663] Component RolloutWorker_w1 stopped!
[2023-11-15 07:33:19,966][00663] Component RolloutWorker_w2 stopped!
[2023-11-15 07:33:19,971][29796] Loop rollout_proc1_evt_loop terminating...
[2023-11-15 07:33:19,966][29797] Stopping RolloutWorker_w2...
[2023-11-15 07:33:19,973][00663] Component RolloutWorker_w6 stopped!
[2023-11-15 07:33:19,973][29801] Stopping RolloutWorker_w6...
[2023-11-15 07:33:19,976][29797] Loop rollout_proc2_evt_loop terminating...
[2023-11-15 07:33:19,980][29801] Loop rollout_proc6_evt_loop terminating...
[2023-11-15 07:33:19,987][00663] Component RolloutWorker_w0 stopped!
[2023-11-15 07:33:19,987][29795] Stopping RolloutWorker_w0...
[2023-11-15 07:33:19,990][29795] Loop rollout_proc0_evt_loop terminating...
[2023-11-15 07:33:20,012][29794] Weights refcount: 2 0
[2023-11-15 07:33:20,021][00663] Component InferenceWorker_p0-w0 stopped!
[2023-11-15 07:33:20,023][29794] Stopping InferenceWorker_p0-w0...
[2023-11-15 07:33:20,023][29794] Loop inference_proc0-0_evt_loop terminating...
[2023-11-15 07:33:20,086][29781] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000941_3854336.pth
[2023-11-15 07:33:20,109][29781] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000979_4009984.pth...
[2023-11-15 07:33:20,145][29802] Decorrelating experience for 96 frames...
[2023-11-15 07:33:20,322][00663] Component LearnerWorker_p0 stopped!
[2023-11-15 07:33:20,324][29781] Stopping LearnerWorker_p0...
[2023-11-15 07:33:20,326][29781] Loop learner_proc0_evt_loop terminating...
[2023-11-15 07:33:20,604][29798] Decorrelating experience for 96 frames...
[2023-11-15 07:33:20,931][00663] Component RolloutWorker_w7 stopped!
[2023-11-15 07:33:20,934][29802] Stopping RolloutWorker_w7...
[2023-11-15 07:33:20,936][29802] Loop rollout_proc7_evt_loop terminating...
[2023-11-15 07:33:21,203][00663] Component RolloutWorker_w3 stopped!
[2023-11-15 07:33:21,201][29798] Stopping RolloutWorker_w3...
[2023-11-15 07:33:21,206][29798] Loop rollout_proc3_evt_loop terminating...
[2023-11-15 07:33:22,117][29800] Decorrelating experience for 96 frames...
[2023-11-15 07:33:22,120][29799] Decorrelating experience for 64 frames...
[2023-11-15 07:33:22,386][00663] Component RolloutWorker_w5 stopped!
[2023-11-15 07:33:22,386][29800] Stopping RolloutWorker_w5...
[2023-11-15 07:33:22,388][29800] Loop rollout_proc5_evt_loop terminating...
[2023-11-15 07:33:23,934][29799] Decorrelating experience for 96 frames...
[2023-11-15 07:33:24,200][29799] Stopping RolloutWorker_w4...
[2023-11-15 07:33:24,200][00663] Component RolloutWorker_w4 stopped!
[2023-11-15 07:33:24,207][29799] Loop rollout_proc4_evt_loop terminating...
[2023-11-15 07:33:24,206][00663] Waiting for process learner_proc0 to stop...
[2023-11-15 07:33:24,212][00663] Waiting for process inference_proc0-0 to join...
[2023-11-15 07:33:24,215][00663] Waiting for process rollout_proc0 to join...
[2023-11-15 07:33:24,220][00663] Waiting for process rollout_proc1 to join...
[2023-11-15 07:33:24,228][00663] Waiting for process rollout_proc2 to join...
[2023-11-15 07:33:24,230][00663] Waiting for process rollout_proc3 to join...
[2023-11-15 07:33:24,485][00663] Waiting for process rollout_proc4 to join...
[2023-11-15 07:33:25,056][00663] Waiting for process rollout_proc5 to join...
[2023-11-15 07:33:25,062][00663] Waiting for process rollout_proc6 to join...
[2023-11-15 07:33:25,064][00663] Waiting for process rollout_proc7 to join...
[2023-11-15 07:33:25,066][00663] Batcher 0 profile tree view:
batching: 0.0184, releasing_batches: 0.0000
[2023-11-15 07:33:25,068][00663] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0038
wait_policy_total: 9.5800
update_model: 0.0192
weight_update: 0.0013
one_step: 0.0029
handle_policy_step: 2.5433
deserialize: 0.0512, stack: 0.0104, obs_to_device_normalize: 0.4439, forward: 1.6464, send_messages: 0.0528
prepare_outputs: 0.2688
to_cpu: 0.1773
[2023-11-15 07:33:25,069][00663] Learner 0 profile tree view:
misc: 0.0000, prepare_batch: 1.0413
train: 1.5532
epoch_init: 0.0000, minibatch_init: 0.0000, losses_postprocess: 0.0002, kl_divergence: 0.0073, after_optimizer: 0.0407
calculate_losses: 0.4753
losses_init: 0.0000, forward_head: 0.3270, bptt_initial: 0.1047, tail: 0.0067, advantages_returns: 0.0009, losses: 0.0310
bptt: 0.0048
bptt_forward_core: 0.0047
update: 1.0292
clip: 0.0481
[2023-11-15 07:33:25,070][00663] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.0009, enqueue_policy_requests: 0.9218, env_step: 2.9765, overhead: 0.0933, complete_rollouts: 0.0458
save_policy_outputs: 0.0550
split_output_tensors: 0.0273
[2023-11-15 07:33:25,071][00663] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.0003, enqueue_policy_requests: 0.0148
[2023-11-15 07:33:25,073][00663] Loop Runner_EvtLoop terminating...
[2023-11-15 07:33:25,077][00663] Runner profile tree view:
main_loop: 36.6700
[2023-11-15 07:33:25,079][00663] Collected {0: 4009984}, FPS: 111.7
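
Here the throughput line checks out exactly against the checkpoints: the resumed run collected 4,009,984 - 4,005,888 = 4,096 new env frames (one train step) in a 36.67 s main loop, and 4,096 / 36.67 ≈ 111.7 FPS, matching the reported figure.
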
[2023-11-15 07:33:25,121][00663] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2023-11-15 07:33:25,124][00663] Overriding arg 'num_workers' with value 1 passed from command line
[2023-11-15 07:33:25,128][00663] Adding new argument 'no_render'=True that is not in the saved config file!
[2023-11-15 07:33:25,130][00663] Adding new argument 'save_video'=True that is not in the saved config file!
[2023-11-15 07:33:25,133][00663] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2023-11-15 07:33:25,136][00663] Adding new argument 'video_name'=None that is not in the saved config file!
[2023-11-15 07:33:25,137][00663] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
[2023-11-15 07:33:25,140][00663] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2023-11-15 07:33:25,142][00663] Adding new argument 'push_to_hub'=False that is not in the saved config file!
[2023-11-15 07:33:25,144][00663] Adding new argument 'hf_repository'=None that is not in the saved config file!
[2023-11-15 07:33:25,145][00663] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2023-11-15 07:33:25,148][00663] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2023-11-15 07:33:25,149][00663] Adding new argument 'train_script'=None that is not in the saved config file!
[2023-11-15 07:33:25,151][00663] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2023-11-15 07:33:25,152][00663] Using frameskip 1 and render_action_repeat=4 for evaluation
[2023-11-15 07:33:25,220][00663] RunningMeanStd input shape: (3, 72, 128)
[2023-11-15 07:33:25,222][00663] RunningMeanStd input shape: (1,)
[2023-11-15 07:33:25,247][00663] ConvEncoder: input_channels=3
[2023-11-15 07:33:25,316][00663] Conv encoder output size: 512
[2023-11-15 07:33:25,318][00663] Policy head output size: 512
[2023-11-15 07:33:25,349][00663] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000979_4009984.pth...
[2023-11-15 07:33:26,067][00663] Num frames 100...
[2023-11-15 07:33:26,253][00663] Num frames 200...
[2023-11-15 07:33:26,468][00663] Num frames 300...
[2023-11-15 07:33:26,674][00663] Num frames 400...
[2023-11-15 07:33:26,861][00663] Num frames 500...
[2023-11-15 07:33:27,055][00663] Num frames 600...
[2023-11-15 07:33:27,250][00663] Num frames 700...
[2023-11-15 07:33:27,456][00663] Num frames 800...
[2023-11-15 07:33:27,651][00663] Num frames 900...
[2023-11-15 07:33:27,893][00663] Avg episode rewards: #0: 23.920, true rewards: #0: 9.920
[2023-11-15 07:33:27,895][00663] Avg episode reward: 23.920, avg true_objective: 9.920
[2023-11-15 07:33:27,914][00663] Num frames 1000...
[2023-11-15 07:33:28,107][00663] Num frames 1100...
[2023-11-15 07:33:28,309][00663] Num frames 1200...
[2023-11-15 07:33:28,515][00663] Num frames 1300...
[2023-11-15 07:33:28,702][00663] Num frames 1400...
[2023-11-15 07:33:28,894][00663] Num frames 1500...
[2023-11-15 07:33:29,032][00663] Num frames 1600...
[2023-11-15 07:33:29,158][00663] Num frames 1700...
[2023-11-15 07:33:29,287][00663] Num frames 1800...
[2023-11-15 07:33:29,482][00663] Avg episode rewards: #0: 21.440, true rewards: #0: 9.440
[2023-11-15 07:33:29,484][00663] Avg episode reward: 21.440, avg true_objective: 9.440
[2023-11-15 07:33:29,503][00663] Num frames 1900...
[2023-11-15 07:33:29,631][00663] Num frames 2000...
[2023-11-15 07:33:29,762][00663] Num frames 2100...
[2023-11-15 07:33:29,887][00663] Num frames 2200...
[2023-11-15 07:33:30,024][00663] Num frames 2300...
[2023-11-15 07:33:30,155][00663] Num frames 2400...
[2023-11-15 07:33:30,285][00663] Num frames 2500...
[2023-11-15 07:33:30,420][00663] Num frames 2600...
[2023-11-15 07:33:30,564][00663] Num frames 2700...
[2023-11-15 07:33:30,695][00663] Num frames 2800...
[2023-11-15 07:33:30,829][00663] Num frames 2900...
[2023-11-15 07:33:30,960][00663] Num frames 3000...
[2023-11-15 07:33:31,095][00663] Num frames 3100...
[2023-11-15 07:33:31,234][00663] Num frames 3200...
[2023-11-15 07:33:31,374][00663] Num frames 3300...
[2023-11-15 07:33:31,519][00663] Num frames 3400...
[2023-11-15 07:33:31,658][00663] Num frames 3500...
[2023-11-15 07:33:31,795][00663] Num frames 3600...
[2023-11-15 07:33:31,930][00663] Num frames 3700...
[2023-11-15 07:33:32,065][00663] Num frames 3800...
[2023-11-15 07:33:32,197][00663] Num frames 3900...
[2023-11-15 07:33:32,370][00663] Avg episode rewards: #0: 31.960, true rewards: #0: 13.293
[2023-11-15 07:33:32,371][00663] Avg episode reward: 31.960, avg true_objective: 13.293
[2023-11-15 07:33:32,393][00663] Num frames 4000...
[2023-11-15 07:33:32,537][00663] Num frames 4100...
[2023-11-15 07:33:32,673][00663] Num frames 4200...
[2023-11-15 07:33:32,816][00663] Num frames 4300...
[2023-11-15 07:33:32,947][00663] Num frames 4400...
[2023-11-15 07:33:33,087][00663] Num frames 4500...
[2023-11-15 07:33:33,216][00663] Num frames 4600...
[2023-11-15 07:33:33,355][00663] Num frames 4700...
[2023-11-15 07:33:33,488][00663] Num frames 4800...
[2023-11-15 07:33:33,626][00663] Num frames 4900...
[2023-11-15 07:33:33,756][00663] Num frames 5000...
[2023-11-15 07:33:33,887][00663] Num frames 5100...
[2023-11-15 07:33:34,018][00663] Num frames 5200...
[2023-11-15 07:33:34,083][00663] Avg episode rewards: #0: 30.010, true rewards: #0: 13.010
[2023-11-15 07:33:34,085][00663] Avg episode reward: 30.010, avg true_objective: 13.010
[2023-11-15 07:33:34,224][00663] Num frames 5300...
[2023-11-15 07:33:34,361][00663] Num frames 5400...
[2023-11-15 07:33:34,491][00663] Num frames 5500...
[2023-11-15 07:33:34,629][00663] Num frames 5600...
[2023-11-15 07:33:34,763][00663] Num frames 5700...
[2023-11-15 07:33:34,880][00663] Avg episode rewards: #0: 26.096, true rewards: #0: 11.496
[2023-11-15 07:33:34,884][00663] Avg episode reward: 26.096, avg true_objective: 11.496
[2023-11-15 07:33:34,952][00663] Num frames 5800...
[2023-11-15 07:33:35,080][00663] Num frames 5900...
[2023-11-15 07:33:35,210][00663] Num frames 6000...
[2023-11-15 07:33:35,340][00663] Num frames 6100...
[2023-11-15 07:33:35,476][00663] Num frames 6200...
[2023-11-15 07:33:35,612][00663] Num frames 6300...
[2023-11-15 07:33:35,743][00663] Num frames 6400...
[2023-11-15 07:33:35,871][00663] Num frames 6500...
[2023-11-15 07:33:36,004][00663] Num frames 6600...
[2023-11-15 07:33:36,135][00663] Num frames 6700...
[2023-11-15 07:33:36,264][00663] Num frames 6800...
[2023-11-15 07:33:36,422][00663] Num frames 6900...
[2023-11-15 07:33:36,574][00663] Num frames 7000...
[2023-11-15 07:33:36,710][00663] Num frames 7100...
[2023-11-15 07:33:36,843][00663] Num frames 7200...
[2023-11-15 07:33:36,974][00663] Num frames 7300...
[2023-11-15 07:33:37,107][00663] Num frames 7400...
[2023-11-15 07:33:37,241][00663] Num frames 7500...
[2023-11-15 07:33:37,350][00663] Avg episode rewards: #0: 29.233, true rewards: #0: 12.567
[2023-11-15 07:33:37,352][00663] Avg episode reward: 29.233, avg true_objective: 12.567
[2023-11-15 07:33:37,437][00663] Num frames 7600...
[2023-11-15 07:33:37,574][00663] Num frames 7700...
[2023-11-15 07:33:37,718][00663] Num frames 7800...
[2023-11-15 07:33:37,854][00663] Num frames 7900...
[2023-11-15 07:33:37,986][00663] Num frames 8000...
[2023-11-15 07:33:38,119][00663] Num frames 8100...
[2023-11-15 07:33:38,274][00663] Avg episode rewards: #0: 26.681, true rewards: #0: 11.681
[2023-11-15 07:33:38,276][00663] Avg episode reward: 26.681, avg true_objective: 11.681
[2023-11-15 07:33:38,308][00663] Num frames 8200...
[2023-11-15 07:33:38,442][00663] Num frames 8300...
[2023-11-15 07:33:38,574][00663] Num frames 8400...
[2023-11-15 07:33:38,714][00663] Num frames 8500...
[2023-11-15 07:33:38,845][00663] Num frames 8600...
[2023-11-15 07:33:39,018][00663] Num frames 8700...
[2023-11-15 07:33:39,222][00663] Num frames 8800...
[2023-11-15 07:33:39,416][00663] Num frames 8900...
[2023-11-15 07:33:39,610][00663] Num frames 9000...
[2023-11-15 07:33:39,815][00663] Num frames 9100...
[2023-11-15 07:33:40,007][00663] Num frames 9200...
[2023-11-15 07:33:40,204][00663] Num frames 9300...
[2023-11-15 07:33:40,403][00663] Num frames 9400...
[2023-11-15 07:33:40,602][00663] Num frames 9500...
[2023-11-15 07:33:40,802][00663] Num frames 9600...
[2023-11-15 07:33:40,988][00663] Num frames 9700...
[2023-11-15 07:33:41,179][00663] Num frames 9800...
[2023-11-15 07:33:41,380][00663] Num frames 9900...
[2023-11-15 07:33:41,455][00663] Avg episode rewards: #0: 29.006, true rewards: #0: 12.381
[2023-11-15 07:33:41,457][00663] Avg episode reward: 29.006, avg true_objective: 12.381
[2023-11-15 07:33:41,645][00663] Num frames 10000...
[2023-11-15 07:33:41,851][00663] Num frames 10100...
[2023-11-15 07:33:42,042][00663] Num frames 10200...
[2023-11-15 07:33:42,229][00663] Num frames 10300...
[2023-11-15 07:33:42,428][00663] Num frames 10400...
[2023-11-15 07:33:42,632][00663] Num frames 10500...
[2023-11-15 07:33:42,775][00663] Avg episode rewards: #0: 27.494, true rewards: #0: 11.717
[2023-11-15 07:33:42,777][00663] Avg episode reward: 27.494, avg true_objective: 11.717
[2023-11-15 07:33:42,884][00663] Num frames 10600...
[2023-11-15 07:33:43,070][00663] Num frames 10700...
[2023-11-15 07:33:43,272][00663] Num frames 10800...
[2023-11-15 07:33:43,470][00663] Num frames 10900...
[2023-11-15 07:33:43,662][00663] Num frames 11000...
[2023-11-15 07:33:43,855][00663] Num frames 11100...
[2023-11-15 07:33:44,056][00663] Num frames 11200...
[2023-11-15 07:33:44,250][00663] Num frames 11300...
[2023-11-15 07:33:44,455][00663] Num frames 11400...
[2023-11-15 07:33:44,588][00663] Num frames 11500...
[2023-11-15 07:33:44,724][00663] Num frames 11600...
[2023-11-15 07:33:44,784][00663] Avg episode rewards: #0: 26.901, true rewards: #0: 11.601
[2023-11-15 07:33:44,785][00663] Avg episode reward: 26.901, avg true_objective: 11.601
[2023-11-15 07:34:59,120][00663] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
[2023-11-15 07:34:59,693][00663] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2023-11-15 07:34:59,695][00663] Overriding arg 'num_workers' with value 1 passed from command line
[2023-11-15 07:34:59,701][00663] Adding new argument 'no_render'=True that is not in the saved config file!
[2023-11-15 07:34:59,705][00663] Adding new argument 'save_video'=True that is not in the saved config file!
[2023-11-15 07:34:59,707][00663] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2023-11-15 07:34:59,709][00663] Adding new argument 'video_name'=None that is not in the saved config file!
[2023-11-15 07:34:59,711][00663] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2023-11-15 07:34:59,712][00663] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2023-11-15 07:34:59,713][00663] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2023-11-15 07:34:59,714][00663] Adding new argument 'hf_repository'='nikxtaco/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
[2023-11-15 07:34:59,715][00663] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2023-11-15 07:34:59,716][00663] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2023-11-15 07:34:59,717][00663] Adding new argument 'train_script'=None that is not in the saved config file!
[2023-11-15 07:34:59,718][00663] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2023-11-15 07:34:59,720][00663] Using frameskip 1 and render_action_repeat=4 for evaluation
[2023-11-15 07:34:59,762][00663] RunningMeanStd input shape: (3, 72, 128)
[2023-11-15 07:34:59,764][00663] RunningMeanStd input shape: (1,)
[2023-11-15 07:34:59,781][00663] ConvEncoder: input_channels=3
[2023-11-15 07:34:59,840][00663] Conv encoder output size: 512
[2023-11-15 07:34:59,843][00663] Policy head output size: 512
[2023-11-15 07:34:59,871][00663] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000979_4009984.pth...
[2023-11-15 07:35:00,607][00663] Num frames 100...
[2023-11-15 07:35:00,796][00663] Num frames 200...
[2023-11-15 07:35:00,998][00663] Num frames 300...
[2023-11-15 07:35:01,191][00663] Num frames 400...
[2023-11-15 07:35:01,426][00663] Num frames 500...
[2023-11-15 07:35:01,656][00663] Num frames 600...
[2023-11-15 07:35:01,850][00663] Num frames 700...
[2023-11-15 07:35:02,050][00663] Num frames 800...
[2023-11-15 07:35:02,243][00663] Num frames 900...
[2023-11-15 07:35:02,469][00663] Num frames 1000...
[2023-11-15 07:35:02,674][00663] Num frames 1100...
[2023-11-15 07:35:02,874][00663] Num frames 1200...
[2023-11-15 07:35:03,080][00663] Num frames 1300...
[2023-11-15 07:35:03,287][00663] Num frames 1400...
[2023-11-15 07:35:03,506][00663] Num frames 1500...
[2023-11-15 07:35:03,706][00663] Num frames 1600...
[2023-11-15 07:35:03,903][00663] Num frames 1700...
[2023-11-15 07:35:04,156][00663] Num frames 1800...
[2023-11-15 07:35:04,390][00663] Num frames 1900...
[2023-11-15 07:35:04,621][00663] Num frames 2000...
[2023-11-15 07:35:04,859][00663] Num frames 2100...
[2023-11-15 07:35:04,912][00663] Avg episode rewards: #0: 54.999, true rewards: #0: 21.000
[2023-11-15 07:35:04,914][00663] Avg episode reward: 54.999, avg true_objective: 21.000
[2023-11-15 07:35:05,148][00663] Num frames 2200...
[2023-11-15 07:35:05,373][00663] Num frames 2300...
[2023-11-15 07:35:05,613][00663] Num frames 2400...
[2023-11-15 07:35:05,844][00663] Num frames 2500...
[2023-11-15 07:35:06,072][00663] Num frames 2600...
[2023-11-15 07:35:06,302][00663] Avg episode rewards: #0: 32.380, true rewards: #0: 13.380
[2023-11-15 07:35:06,304][00663] Avg episode reward: 32.380, avg true_objective: 13.380
[2023-11-15 07:35:06,376][00663] Num frames 2700...
[2023-11-15 07:35:06,582][00663] Num frames 2800...
[2023-11-15 07:35:06,827][00663] Num frames 2900...
[2023-11-15 07:35:07,020][00663] Num frames 3000...
[2023-11-15 07:35:07,242][00663] Num frames 3100...
[2023-11-15 07:35:07,527][00663] Num frames 3200...
[2023-11-15 07:35:07,789][00663] Avg episode rewards: #0: 25.613, true rewards: #0: 10.947
[2023-11-15 07:35:07,791][00663] Avg episode reward: 25.613, avg true_objective: 10.947
[2023-11-15 07:35:07,830][00663] Num frames 3300...
[2023-11-15 07:35:08,071][00663] Num frames 3400...
[2023-11-15 07:35:08,307][00663] Num frames 3500...
[2023-11-15 07:35:08,537][00663] Num frames 3600...
[2023-11-15 07:35:08,795][00663] Num frames 3700...
[2023-11-15 07:35:09,056][00663] Num frames 3800...
[2023-11-15 07:35:09,237][00663] Avg episode rewards: #0: 21.877, true rewards: #0: 9.628
[2023-11-15 07:35:09,239][00663] Avg episode reward: 21.877, avg true_objective: 9.628
[2023-11-15 07:35:09,364][00663] Num frames 3900...
[2023-11-15 07:35:09,614][00663] Num frames 4000...
[2023-11-15 07:35:09,846][00663] Num frames 4100...
[2023-11-15 07:35:10,029][00663] Num frames 4200...
[2023-11-15 07:35:10,219][00663] Num frames 4300...
[2023-11-15 07:35:10,408][00663] Num frames 4400...
[2023-11-15 07:35:10,598][00663] Num frames 4500...
[2023-11-15 07:35:10,796][00663] Num frames 4600...
[2023-11-15 07:35:10,971][00663] Avg episode rewards: #0: 21.102, true rewards: #0: 9.302
[2023-11-15 07:35:10,974][00663] Avg episode reward: 21.102, avg true_objective: 9.302
[2023-11-15 07:35:11,044][00663] Num frames 4700...
[2023-11-15 07:35:11,172][00663] Num frames 4800...
[2023-11-15 07:35:11,300][00663] Num frames 4900...
[2023-11-15 07:35:11,434][00663] Num frames 5000...
[2023-11-15 07:35:11,567][00663] Num frames 5100...
[2023-11-15 07:35:11,701][00663] Num frames 5200...
[2023-11-15 07:35:11,857][00663] Num frames 5300...
[2023-11-15 07:35:11,997][00663] Num frames 5400...
[2023-11-15 07:35:12,129][00663] Num frames 5500...
[2023-11-15 07:35:12,265][00663] Num frames 5600...
[2023-11-15 07:35:12,407][00663] Num frames 5700...
[2023-11-15 07:35:12,538][00663] Num frames 5800...
[2023-11-15 07:35:12,669][00663] Num frames 5900...
[2023-11-15 07:35:12,803][00663] Num frames 6000...
[2023-11-15 07:35:12,942][00663] Num frames 6100...
[2023-11-15 07:35:13,088][00663] Avg episode rewards: #0: 23.780, true rewards: #0: 10.280
[2023-11-15 07:35:13,090][00663] Avg episode reward: 23.780, avg true_objective: 10.280
[2023-11-15 07:35:13,138][00663] Num frames 6200...
[2023-11-15 07:35:13,277][00663] Num frames 6300...
[2023-11-15 07:35:13,410][00663] Num frames 6400...
[2023-11-15 07:35:13,544][00663] Num frames 6500...
[2023-11-15 07:35:13,675][00663] Num frames 6600...
[2023-11-15 07:35:13,803][00663] Num frames 6700...
[2023-11-15 07:35:13,958][00663] Avg episode rewards: #0: 22.397, true rewards: #0: 9.683
[2023-11-15 07:35:13,960][00663] Avg episode reward: 22.397, avg true_objective: 9.683
[2023-11-15 07:35:13,992][00663] Num frames 6800...
[2023-11-15 07:35:14,122][00663] Num frames 6900...
[2023-11-15 07:35:14,254][00663] Num frames 7000...
[2023-11-15 07:35:14,393][00663] Num frames 7100...
[2023-11-15 07:35:14,523][00663] Num frames 7200...
[2023-11-15 07:35:14,656][00663] Num frames 7300...
[2023-11-15 07:35:14,801][00663] Num frames 7400...
[2023-11-15 07:35:14,934][00663] Num frames 7500...
[2023-11-15 07:35:15,073][00663] Num frames 7600...
[2023-11-15 07:35:15,204][00663] Num frames 7700...
[2023-11-15 07:35:15,340][00663] Num frames 7800...
[2023-11-15 07:35:15,482][00663] Num frames 7900...
[2023-11-15 07:35:15,620][00663] Num frames 8000...
[2023-11-15 07:35:15,761][00663] Num frames 8100...
[2023-11-15 07:35:15,905][00663] Num frames 8200...
[2023-11-15 07:35:16,087][00663] Avg episode rewards: #0: 24.602, true rewards: #0: 10.352
[2023-11-15 07:35:16,089][00663] Avg episode reward: 24.602, avg true_objective: 10.352
[2023-11-15 07:35:16,116][00663] Num frames 8300...
[2023-11-15 07:35:16,249][00663] Num frames 8400...
[2023-11-15 07:35:16,384][00663] Num frames 8500...
[2023-11-15 07:35:16,517][00663] Num frames 8600...
[2023-11-15 07:35:16,655][00663] Num frames 8700...
[2023-11-15 07:35:16,795][00663] Num frames 8800...
[2023-11-15 07:35:16,926][00663] Num frames 8900...
[2023-11-15 07:35:17,111][00663] Avg episode rewards: #0: 23.095, true rewards: #0: 9.984
[2023-11-15 07:35:17,113][00663] Avg episode reward: 23.095, avg true_objective: 9.984
[2023-11-15 07:35:17,137][00663] Num frames 9000...
[2023-11-15 07:35:17,273][00663] Num frames 9100...
[2023-11-15 07:35:17,411][00663] Num frames 9200...
[2023-11-15 07:35:17,543][00663] Num frames 9300...
[2023-11-15 07:35:17,676][00663] Num frames 9400...
[2023-11-15 07:35:17,809][00663] Num frames 9500...
[2023-11-15 07:35:17,940][00663] Num frames 9600...
[2023-11-15 07:35:18,077][00663] Num frames 9700...
[2023-11-15 07:35:18,206][00663] Num frames 9800...
[2023-11-15 07:35:18,340][00663] Num frames 9900...
[2023-11-15 07:35:18,478][00663] Num frames 10000...
[2023-11-15 07:35:18,611][00663] Num frames 10100...
[2023-11-15 07:35:18,742][00663] Num frames 10200...
[2023-11-15 07:35:18,871][00663] Num frames 10300...
[2023-11-15 07:35:18,998][00663] Num frames 10400...
[2023-11-15 07:35:19,105][00663] Avg episode rewards: #0: 24.032, true rewards: #0: 10.432
[2023-11-15 07:35:19,107][00663] Avg episode reward: 24.032, avg true_objective: 10.432
[2023-11-15 07:36:27,462][00663] Replay video saved to /content/train_dir/default_experiment/replay.mp4!