[2023-02-22 18:52:22,088][18494] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-22 18:52:22,094][18494] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2023-02-22 18:52:22,455][18508] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-22 18:52:22,468][18508] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2023-02-22 18:52:22,636][18513] Worker 4 uses CPU cores [0]
[2023-02-22 18:52:22,646][18516] Worker 6 uses CPU cores [0]
[2023-02-22 18:52:22,654][18510] Worker 2 uses CPU cores [0]
[2023-02-22 18:52:22,720][18512] Worker 3 uses CPU cores [1]
[2023-02-22 18:52:22,733][18509] Worker 0 uses CPU cores [0]
[2023-02-22 18:52:22,750][18515] Worker 7 uses CPU cores [1]
[2023-02-22 18:52:22,921][18511] Worker 1 uses CPU cores [1]
[2023-02-22 18:52:22,924][18514] Worker 5 uses CPU cores [1]
[2023-02-22 18:52:23,261][18508] Num visible devices: 1
[2023-02-22 18:52:23,262][18494] Num visible devices: 1
[2023-02-22 18:52:23,263][18494] Starting seed is not provided
[2023-02-22 18:52:23,263][18494] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-22 18:52:23,264][18494] Initializing actor-critic model on device cuda:0
[2023-02-22 18:52:23,265][18494] RunningMeanStd input shape: (3, 72, 128)
[2023-02-22 18:52:23,266][18494] RunningMeanStd input shape: (1,)
[2023-02-22 18:52:23,286][18494] ConvEncoder: input_channels=3
[2023-02-22 18:52:23,468][18494] Conv encoder output size: 512
[2023-02-22 18:52:23,469][18494] Policy head output size: 512
[2023-02-22 18:52:23,492][18494] Created Actor Critic model with architecture:
[2023-02-22 18:52:23,493][18494] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2023-02-22 18:52:25,754][18494] Using optimizer <class 'torch.optim.adam.Adam'>
[2023-02-22 18:52:25,755][18494] No checkpoints found
[2023-02-22 18:52:25,756][18494] Did not load from checkpoint, starting from scratch!
[2023-02-22 18:52:25,756][18494] Initialized policy 0 weights for model version 0
[2023-02-22 18:52:25,760][18494] LearnerWorker_p0 finished initialization!
[2023-02-22 18:52:25,761][18494] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-22 18:52:25,974][18508] RunningMeanStd input shape: (3, 72, 128)
[2023-02-22 18:52:25,975][18508] RunningMeanStd input shape: (1,)
[2023-02-22 18:52:25,991][18508] ConvEncoder: input_channels=3
[2023-02-22 18:52:26,091][18508] Conv encoder output size: 512
[2023-02-22 18:52:26,091][18508] Policy head output size: 512
[2023-02-22 18:52:28,437][18509] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-22 18:52:28,444][18510] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-22 18:52:28,475][18511] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-22 18:52:28,474][18516] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-22 18:52:28,477][18515] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-22 18:52:28,486][18514] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-22 18:52:28,489][18512] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-22 18:52:28,491][18513] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-22 18:52:29,641][18511] Decorrelating experience for 0 frames...
[2023-02-22 18:52:29,642][18512] Decorrelating experience for 0 frames...
[2023-02-22 18:52:29,643][18514] Decorrelating experience for 0 frames...
[2023-02-22 18:52:29,881][18509] Decorrelating experience for 0 frames...
[2023-02-22 18:52:29,876][18513] Decorrelating experience for 0 frames...
[2023-02-22 18:52:29,885][18510] Decorrelating experience for 0 frames...
[2023-02-22 18:52:29,901][18516] Decorrelating experience for 0 frames...
[2023-02-22 18:52:30,514][18514] Decorrelating experience for 32 frames...
[2023-02-22 18:52:30,560][18515] Decorrelating experience for 0 frames...
[2023-02-22 18:52:30,890][18511] Decorrelating experience for 32 frames...
[2023-02-22 18:52:30,982][18509] Decorrelating experience for 32 frames...
[2023-02-22 18:52:30,985][18513] Decorrelating experience for 32 frames...
[2023-02-22 18:52:30,994][18510] Decorrelating experience for 32 frames...
[2023-02-22 18:52:31,638][18515] Decorrelating experience for 32 frames...
[2023-02-22 18:52:31,764][18516] Decorrelating experience for 32 frames...
[2023-02-22 18:52:32,185][18510] Decorrelating experience for 64 frames...
[2023-02-22 18:52:32,191][18513] Decorrelating experience for 64 frames...
[2023-02-22 18:52:32,387][18514] Decorrelating experience for 64 frames...
[2023-02-22 18:52:32,394][18512] Decorrelating experience for 32 frames...
[2023-02-22 18:52:32,402][18511] Decorrelating experience for 64 frames...
[2023-02-22 18:52:33,107][18515] Decorrelating experience for 64 frames...
[2023-02-22 18:52:33,500][18514] Decorrelating experience for 96 frames...
[2023-02-22 18:52:33,509][18510] Decorrelating experience for 96 frames...
[2023-02-22 18:52:33,513][18513] Decorrelating experience for 96 frames...
[2023-02-22 18:52:33,697][18512] Decorrelating experience for 64 frames...
[2023-02-22 18:52:34,207][18516] Decorrelating experience for 64 frames...
[2023-02-22 18:52:34,214][18509] Decorrelating experience for 64 frames...
[2023-02-22 18:52:34,623][18515] Decorrelating experience for 96 frames...
[2023-02-22 18:52:34,629][18511] Decorrelating experience for 96 frames...
[2023-02-22 18:52:35,339][18512] Decorrelating experience for 96 frames...
[2023-02-22 18:52:36,064][18516] Decorrelating experience for 96 frames...
[2023-02-22 18:52:36,085][18509] Decorrelating experience for 96 frames...
[2023-02-22 18:52:42,053][18494] Signal inference workers to stop experience collection...
[2023-02-22 18:52:42,116][18508] InferenceWorker_p0-w0: stopping experience collection
[2023-02-22 18:52:44,184][18494] Signal inference workers to resume experience collection...
[2023-02-22 18:52:44,187][18508] InferenceWorker_p0-w0: resuming experience collection
[2023-02-22 18:52:51,922][18508] Updated weights for policy 0, policy_version 10 (0.0338)
[2023-02-22 18:53:03,897][18508] Updated weights for policy 0, policy_version 20 (0.0021)
[2023-02-22 18:53:10,170][18494] Saving new best policy, reward=4.191!
[2023-02-22 18:53:12,948][18508] Updated weights for policy 0, policy_version 30 (0.0014)
[2023-02-22 18:53:15,156][18494] Saving new best policy, reward=4.205!
[2023-02-22 18:53:24,545][18508] Updated weights for policy 0, policy_version 40 (0.0025)
[2023-02-22 18:53:25,231][18494] Saving new best policy, reward=4.251!
[2023-02-22 18:53:30,169][18494] Saving new best policy, reward=4.532!
[2023-02-22 18:53:33,956][18508] Updated weights for policy 0, policy_version 50 (0.0026)
[2023-02-22 18:53:45,138][18508] Updated weights for policy 0, policy_version 60 (0.0016)
[2023-02-22 18:53:50,186][18494] Saving new best policy, reward=4.588!
[2023-02-22 18:53:54,603][18508] Updated weights for policy 0, policy_version 70 (0.0023)
[2023-02-22 18:54:05,216][18494] Saving new best policy, reward=4.629!
[2023-02-22 18:54:05,221][18508] Updated weights for policy 0, policy_version 80 (0.0030)
[2023-02-22 18:54:10,165][18494] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000083_339968.pth...
[2023-02-22 18:54:19,873][18508] Updated weights for policy 0, policy_version 90 (0.0027)
[2023-02-22 18:54:25,154][18494] Saving new best policy, reward=4.694!
[2023-02-22 18:54:29,311][18508] Updated weights for policy 0, policy_version 100 (0.0017)
[2023-02-22 18:54:30,189][18494] Saving new best policy, reward=4.702!
[2023-02-22 18:54:40,931][18508] Updated weights for policy 0, policy_version 110 (0.0017)
[2023-02-22 18:54:45,156][18494] Saving new best policy, reward=4.806!
[2023-02-22 18:54:49,636][18508] Updated weights for policy 0, policy_version 120 (0.0030)
[2023-02-22 18:55:00,261][18494] Saving new best policy, reward=4.984!
[2023-02-22 18:55:01,559][18508] Updated weights for policy 0, policy_version 130 (0.0013)
[2023-02-22 18:55:10,154][18508] Updated weights for policy 0, policy_version 140 (0.0011)
[2023-02-22 18:55:20,171][18494] Saving new best policy, reward=5.100!
[2023-02-22 18:55:21,905][18508] Updated weights for policy 0, policy_version 150 (0.0019)
[2023-02-22 18:55:25,160][18494] Saving new best policy, reward=5.200!
[2023-02-22 18:55:30,400][18508] Updated weights for policy 0, policy_version 160 (0.0013)
[2023-02-22 18:55:42,165][18508] Updated weights for policy 0, policy_version 170 (0.0020)
[2023-02-22 18:55:50,168][18494] Saving new best policy, reward=5.430!
[2023-02-22 18:55:50,642][18508] Updated weights for policy 0, policy_version 180 (0.0018)
[2023-02-22 18:55:55,162][18494] Saving new best policy, reward=5.935!
[2023-02-22 18:56:02,727][18508] Updated weights for policy 0, policy_version 190 (0.0014)
[2023-02-22 18:56:05,160][18494] Saving new best policy, reward=5.956!
[2023-02-22 18:56:10,166][18494] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000198_811008.pth...
[2023-02-22 18:56:10,313][18494] Saving new best policy, reward=6.201!
[2023-02-22 18:56:11,589][18508] Updated weights for policy 0, policy_version 200 (0.0015)
[2023-02-22 18:56:15,158][18494] Saving new best policy, reward=6.805!
[2023-02-22 18:56:20,171][18494] Saving new best policy, reward=6.821!
[2023-02-22 18:56:23,530][18508] Updated weights for policy 0, policy_version 210 (0.0012)
[2023-02-22 18:56:25,161][18494] Saving new best policy, reward=6.914!
[2023-02-22 18:56:30,224][18494] Saving new best policy, reward=6.963!
[2023-02-22 18:56:32,783][18508] Updated weights for policy 0, policy_version 220 (0.0028)
[2023-02-22 18:56:35,160][18494] Saving new best policy, reward=7.309!
[2023-02-22 18:56:43,943][18508] Updated weights for policy 0, policy_version 230 (0.0021)
[2023-02-22 18:56:53,708][18508] Updated weights for policy 0, policy_version 240 (0.0012)
[2023-02-22 18:57:04,323][18508] Updated weights for policy 0, policy_version 250 (0.0014)
[2023-02-22 18:57:14,724][18508] Updated weights for policy 0, policy_version 260 (0.0012)
[2023-02-22 18:57:20,167][18494] Saving new best policy, reward=7.435!
[2023-02-22 18:57:25,048][18508] Updated weights for policy 0, policy_version 270 (0.0017)
[2023-02-22 18:57:35,159][18494] Saving new best policy, reward=7.552!
[2023-02-22 18:57:35,781][18508] Updated weights for policy 0, policy_version 280 (0.0016)
[2023-02-22 18:57:40,169][18494] Saving new best policy, reward=7.882!
[2023-02-22 18:57:45,158][18494] Saving new best policy, reward=8.241!
[2023-02-22 18:57:45,586][18508] Updated weights for policy 0, policy_version 290 (0.0015)
[2023-02-22 18:57:55,163][18494] Saving new best policy, reward=8.245!
[2023-02-22 18:57:56,872][18508] Updated weights for policy 0, policy_version 300 (0.0025)
[2023-02-22 18:58:00,167][18494] Saving new best policy, reward=9.512!
[2023-02-22 18:58:05,157][18494] Saving new best policy, reward=10.187!
[2023-02-22 18:58:06,102][18508] Updated weights for policy 0, policy_version 310 (0.0013)
[2023-02-22 18:58:10,176][18494] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000314_1286144.pth...
[2023-02-22 18:58:10,377][18494] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000083_339968.pth
[2023-02-22 18:58:10,391][18494] Saving new best policy, reward=11.235!
[2023-02-22 18:58:15,154][18494] Saving new best policy, reward=12.271!
[2023-02-22 18:58:18,369][18508] Updated weights for policy 0, policy_version 320 (0.0032)
[2023-02-22 18:58:20,164][18494] Saving new best policy, reward=12.977!
[2023-02-22 18:58:25,159][18494] Saving new best policy, reward=13.710!
[2023-02-22 18:58:26,845][18508] Updated weights for policy 0, policy_version 330 (0.0013)
[2023-02-22 18:58:39,520][18508] Updated weights for policy 0, policy_version 340 (0.0019)
[2023-02-22 18:58:52,509][18508] Updated weights for policy 0, policy_version 350 (0.0018)
[2023-02-22 18:58:55,199][18494] Saving new best policy, reward=13.881!
[2023-02-22 18:59:00,167][18494] Saving new best policy, reward=15.240!
[2023-02-22 18:59:02,958][18508] Updated weights for policy 0, policy_version 360 (0.0014)
[2023-02-22 18:59:10,168][18494] Saving new best policy, reward=15.378!
[2023-02-22 18:59:13,687][18508] Updated weights for policy 0, policy_version 370 (0.0011)
[2023-02-22 18:59:15,167][18494] Saving new best policy, reward=16.206!
[2023-02-22 18:59:23,567][18508] Updated weights for policy 0, policy_version 380 (0.0019)
[2023-02-22 18:59:25,162][18494] Saving new best policy, reward=16.728!
[2023-02-22 18:59:30,168][18494] Saving new best policy, reward=17.859!
[2023-02-22 18:59:34,808][18508] Updated weights for policy 0, policy_version 390 (0.0012)
[2023-02-22 18:59:40,161][18494] Saving new best policy, reward=18.218!
[2023-02-22 18:59:44,145][18508] Updated weights for policy 0, policy_version 400 (0.0021)
[2023-02-22 18:59:45,155][18494] Saving new best policy, reward=18.396!
[2023-02-22 18:59:50,170][18494] Saving new best policy, reward=19.251!
[2023-02-22 18:59:55,604][18508] Updated weights for policy 0, policy_version 410 (0.0013)
[2023-02-22 19:00:04,352][18508] Updated weights for policy 0, policy_version 420 (0.0011)
[2023-02-22 19:00:10,162][18494] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000425_1740800.pth...
[2023-02-22 19:00:10,307][18494] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000198_811008.pth
[2023-02-22 19:00:16,510][18508] Updated weights for policy 0, policy_version 430 (0.0016)
[2023-02-22 19:00:20,169][18494] Saving new best policy, reward=19.433!
[2023-02-22 19:00:24,918][18508] Updated weights for policy 0, policy_version 440 (0.0023)
[2023-02-22 19:00:25,159][18494] Saving new best policy, reward=19.548!
[2023-02-22 19:00:36,903][18508] Updated weights for policy 0, policy_version 450 (0.0011)
[2023-02-22 19:00:45,179][18508] Updated weights for policy 0, policy_version 460 (0.0019)
[2023-02-22 19:00:57,269][18508] Updated weights for policy 0, policy_version 470 (0.0012)
[2023-02-22 19:01:05,633][18508] Updated weights for policy 0, policy_version 480 (0.0013)
[2023-02-22 19:01:17,487][18508] Updated weights for policy 0, policy_version 490 (0.0023)
[2023-02-22 19:01:25,180][18494] Saving new best policy, reward=20.348!
[2023-02-22 19:01:26,585][18508] Updated weights for policy 0, policy_version 500 (0.0016)
[2023-02-22 19:01:30,170][18494] Saving new best policy, reward=20.421!
[2023-02-22 19:01:37,786][18508] Updated weights for policy 0, policy_version 510 (0.0016)
[2023-02-22 19:01:45,167][18494] Saving new best policy, reward=20.458!
[2023-02-22 19:01:47,346][18508] Updated weights for policy 0, policy_version 520 (0.0030)
[2023-02-22 19:01:50,193][18494] Saving new best policy, reward=21.617!
[2023-02-22 19:01:55,159][18494] Saving new best policy, reward=22.472!
[2023-02-22 19:01:58,324][18508] Updated weights for policy 0, policy_version 530 (0.0018)
[2023-02-22 19:02:00,170][18494] Saving new best policy, reward=22.824!
[2023-02-22 19:02:08,289][18508] Updated weights for policy 0, policy_version 540 (0.0011)
[2023-02-22 19:02:10,171][18494] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000541_2215936.pth...
[2023-02-22 19:02:10,309][18494] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000314_1286144.pth
[2023-02-22 19:02:18,832][18508] Updated weights for policy 0, policy_version 550 (0.0028)
[2023-02-22 19:02:29,136][18508] Updated weights for policy 0, policy_version 560 (0.0015)
[2023-02-22 19:02:38,966][18508] Updated weights for policy 0, policy_version 570 (0.0027)
[2023-02-22 19:02:50,118][18508] Updated weights for policy 0, policy_version 580 (0.0019)
[2023-02-22 19:02:59,465][18508] Updated weights for policy 0, policy_version 590 (0.0014)
[2023-02-22 19:03:11,008][18508] Updated weights for policy 0, policy_version 600 (0.0013)
[2023-02-22 19:03:23,769][18508] Updated weights for policy 0, policy_version 610 (0.0029)
[2023-02-22 19:03:35,094][18508] Updated weights for policy 0, policy_version 620 (0.0034)
[2023-02-22 19:03:44,110][18508] Updated weights for policy 0, policy_version 630 (0.0026)
[2023-02-22 19:03:55,449][18508] Updated weights for policy 0, policy_version 640 (0.0021)
[2023-02-22 19:04:04,961][18508] Updated weights for policy 0, policy_version 650 (0.0022)
[2023-02-22 19:04:10,173][18494] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000653_2674688.pth...
[2023-02-22 19:04:10,350][18494] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000425_1740800.pth
[2023-02-22 19:04:15,773][18508] Updated weights for policy 0, policy_version 660 (0.0016)
[2023-02-22 19:04:25,669][18508] Updated weights for policy 0, policy_version 670 (0.0011)
[2023-02-22 19:04:36,111][18508] Updated weights for policy 0, policy_version 680 (0.0020)
[2023-02-22 19:04:47,035][18508] Updated weights for policy 0, policy_version 690 (0.0032)
[2023-02-22 19:04:55,222][18494] Saving new best policy, reward=23.394!
[2023-02-22 19:04:56,779][18508] Updated weights for policy 0, policy_version 700 (0.0015)
[2023-02-22 19:05:08,006][18508] Updated weights for policy 0, policy_version 710 (0.0013)
[2023-02-22 19:05:17,384][18508] Updated weights for policy 0, policy_version 720 (0.0020)
[2023-02-22 19:05:28,657][18508] Updated weights for policy 0, policy_version 730 (0.0022)
[2023-02-22 19:05:37,495][18508] Updated weights for policy 0, policy_version 740 (0.0036)
[2023-02-22 19:05:49,212][18508] Updated weights for policy 0, policy_version 750 (0.0015)
[2023-02-22 19:05:57,872][18508] Updated weights for policy 0, policy_version 760 (0.0021)
[2023-02-22 19:06:09,765][18508] Updated weights for policy 0, policy_version 770 (0.0044)
[2023-02-22 19:06:10,168][18494] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000770_3153920.pth...
[2023-02-22 19:06:10,307][18494] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000541_2215936.pth
[2023-02-22 19:06:18,309][18508] Updated weights for policy 0, policy_version 780 (0.0015)
[2023-02-22 19:06:30,077][18508] Updated weights for policy 0, policy_version 790 (0.0015)
[2023-02-22 19:06:38,169][18508] Updated weights for policy 0, policy_version 800 (0.0018)
[2023-02-22 19:06:45,158][18494] Saving new best policy, reward=24.124!
[2023-02-22 19:06:50,166][18494] Saving new best policy, reward=24.920!
[2023-02-22 19:06:50,533][18508] Updated weights for policy 0, policy_version 810 (0.0039)
[2023-02-22 19:06:55,155][18494] Saving new best policy, reward=26.242!
[2023-02-22 19:06:59,472][18508] Updated weights for policy 0, policy_version 820 (0.0015)
[2023-02-22 19:07:10,964][18508] Updated weights for policy 0, policy_version 830 (0.0031)
[2023-02-22 19:07:20,255][18508] Updated weights for policy 0, policy_version 840 (0.0026)
[2023-02-22 19:07:31,400][18508] Updated weights for policy 0, policy_version 850 (0.0018)
[2023-02-22 19:07:40,844][18508] Updated weights for policy 0, policy_version 860 (0.0014)
[2023-02-22 19:07:50,165][18494] Saving new best policy, reward=26.314!
[2023-02-22 19:07:55,359][18508] Updated weights for policy 0, policy_version 870 (0.0042)
[2023-02-22 19:08:05,366][18508] Updated weights for policy 0, policy_version 880 (0.0017)
[2023-02-22 19:08:10,176][18494] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000883_3616768.pth...
[2023-02-22 19:08:10,356][18494] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000653_2674688.pth
[2023-02-22 19:08:15,695][18508] Updated weights for policy 0, policy_version 890 (0.0013)
[2023-02-22 19:08:26,420][18508] Updated weights for policy 0, policy_version 900 (0.0031)
[2023-02-22 19:08:35,909][18508] Updated weights for policy 0, policy_version 910 (0.0037)
[2023-02-22 19:08:46,932][18508] Updated weights for policy 0, policy_version 920 (0.0032)
[2023-02-22 19:08:56,357][18508] Updated weights for policy 0, policy_version 930 (0.0014)
[2023-02-22 19:09:07,688][18508] Updated weights for policy 0, policy_version 940 (0.0013)
[2023-02-22 19:09:16,391][18508] Updated weights for policy 0, policy_version 950 (0.0011)
[2023-02-22 19:09:28,579][18508] Updated weights for policy 0, policy_version 960 (0.0011)
[2023-02-22 19:09:36,897][18508] Updated weights for policy 0, policy_version 970 (0.0014)
[2023-02-22 19:09:48,991][18508] Updated weights for policy 0, policy_version 980 (0.0033)
[2023-02-22 19:09:57,581][18508] Updated weights for policy 0, policy_version 990 (0.0011)
[2023-02-22 19:10:09,423][18508] Updated weights for policy 0, policy_version 1000 (0.0026)
[2023-02-22 19:10:10,165][18494] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001001_4100096.pth...
[2023-02-22 19:10:10,282][18494] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000770_3153920.pth
[2023-02-22 19:10:17,695][18508] Updated weights for policy 0, policy_version 1010 (0.0012)
[2023-02-22 19:10:29,555][18508] Updated weights for policy 0, policy_version 1020 (0.0020)
[2023-02-22 19:10:38,054][18508] Updated weights for policy 0, policy_version 1030 (0.0026)
[2023-02-22 19:10:49,652][18508] Updated weights for policy 0, policy_version 1040 (0.0028)
[2023-02-22 19:10:58,506][18508] Updated weights for policy 0, policy_version 1050 (0.0012)
[2023-02-22 19:11:10,211][18508] Updated weights for policy 0, policy_version 1060 (0.0012)
[2023-02-22 19:11:19,404][18508] Updated weights for policy 0, policy_version 1070 (0.0011)
[2023-02-22 19:11:30,700][18508] Updated weights for policy 0, policy_version 1080 (0.0023)
[2023-02-22 19:11:39,896][18508] Updated weights for policy 0, policy_version 1090 (0.0033)
[2023-02-22 19:11:50,890][18508] Updated weights for policy 0, policy_version 1100 (0.0011)
[2023-02-22 19:12:00,614][18508] Updated weights for policy 0, policy_version 1110 (0.0015)
[2023-02-22 19:12:10,171][18494] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001118_4579328.pth...
[2023-02-22 19:12:10,311][18494] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000883_3616768.pth
[2023-02-22 19:12:12,804][18508] Updated weights for policy 0, policy_version 1120 (0.0020)
[2023-02-22 19:12:26,550][18508] Updated weights for policy 0, policy_version 1130 (0.0018)
[2023-02-22 19:12:35,018][18508] Updated weights for policy 0, policy_version 1140 (0.0018)
[2023-02-22 19:12:46,742][18508] Updated weights for policy 0, policy_version 1150 (0.0021)
[2023-02-22 19:12:55,110][18508] Updated weights for policy 0, policy_version 1160 (0.0018)
[2023-02-22 19:13:07,263][18508] Updated weights for policy 0, policy_version 1170 (0.0016)
[2023-02-22 19:13:15,160][18494] Saving new best policy, reward=26.727!
[2023-02-22 19:13:15,694][18508] Updated weights for policy 0, policy_version 1180 (0.0014)
[2023-02-22 19:13:27,579][18508] Updated weights for policy 0, policy_version 1190 (0.0021)
[2023-02-22 19:13:36,030][18508] Updated weights for policy 0, policy_version 1200 (0.0028)
[2023-02-22 19:13:47,959][18508] Updated weights for policy 0, policy_version 1210 (0.0020)
[2023-02-22 19:13:56,937][18508] Updated weights for policy 0, policy_version 1220 (0.0017)
[2023-02-22 19:13:59,627][18494] Stopping Batcher_0...
[2023-02-22 19:13:59,629][18494] Loop batcher_evt_loop terminating...
[2023-02-22 19:13:59,635][18494] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001222_5005312.pth...
[2023-02-22 19:13:59,669][18512] Stopping RolloutWorker_w3...
[2023-02-22 19:13:59,686][18511] Stopping RolloutWorker_w1...
[2023-02-22 19:13:59,691][18508] Weights refcount: 2 0
[2023-02-22 19:13:59,687][18511] Loop rollout_proc1_evt_loop terminating...
[2023-02-22 19:13:59,682][18512] Loop rollout_proc3_evt_loop terminating...
[2023-02-22 19:13:59,695][18508] Stopping InferenceWorker_p0-w0...
[2023-02-22 19:13:59,700][18508] Loop inference_proc0-0_evt_loop terminating...
[2023-02-22 19:13:59,714][18514] Stopping RolloutWorker_w5...
[2023-02-22 19:13:59,715][18514] Loop rollout_proc5_evt_loop terminating...
[2023-02-22 19:13:59,724][18515] Stopping RolloutWorker_w7...
[2023-02-22 19:13:59,728][18515] Loop rollout_proc7_evt_loop terminating...
[2023-02-22 19:13:59,832][18513] Stopping RolloutWorker_w4...
[2023-02-22 19:13:59,833][18513] Loop rollout_proc4_evt_loop terminating...
[2023-02-22 19:13:59,839][18509] Stopping RolloutWorker_w0...
[2023-02-22 19:13:59,844][18494] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001001_4100096.pth
[2023-02-22 19:13:59,862][18509] Loop rollout_proc0_evt_loop terminating...
[2023-02-22 19:13:59,864][18494] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001222_5005312.pth...
[2023-02-22 19:13:59,879][18510] Stopping RolloutWorker_w2...
[2023-02-22 19:13:59,889][18510] Loop rollout_proc2_evt_loop terminating...
[2023-02-22 19:13:59,900][18516] Stopping RolloutWorker_w6...
[2023-02-22 19:13:59,913][18516] Loop rollout_proc6_evt_loop terminating...
[2023-02-22 19:14:00,175][18494] Stopping LearnerWorker_p0...
[2023-02-22 19:14:00,176][18494] Loop learner_proc0_evt_loop terminating...