[2023-02-24 07:55:59,139][784615] Saving configuration to /mnt/chqma/data-ssd-01/dataset/oss/RWKV-LM/deep-rl-class/notebooks/unit8/train_dir/default_experiment/config.json...
[2023-02-24 07:55:59,305][784615] Rollout worker 0 uses device cpu
[2023-02-24 07:55:59,305][784615] Rollout worker 1 uses device cpu
[2023-02-24 07:55:59,306][784615] Rollout worker 2 uses device cpu
[2023-02-24 07:55:59,306][784615] Rollout worker 3 uses device cpu
[2023-02-24 07:55:59,306][784615] Rollout worker 4 uses device cpu
[2023-02-24 07:55:59,307][784615] Rollout worker 5 uses device cpu
[2023-02-24 07:55:59,307][784615] Rollout worker 6 uses device cpu
[2023-02-24 07:55:59,308][784615] Rollout worker 7 uses device cpu
[2023-02-24 07:55:59,357][784615] Using GPUs [0] for process 0 (actually maps to GPUs [1])
[2023-02-24 07:55:59,358][784615] InferenceWorker_p0-w0: min num requests: 2
[2023-02-24 07:55:59,378][784615] Starting all processes...
[2023-02-24 07:55:59,378][784615] Starting process learner_proc0
[2023-02-24 07:55:59,428][784615] Starting all processes...
[2023-02-24 07:55:59,438][784615] Starting process inference_proc0-0
[2023-02-24 07:55:59,439][784615] Starting process rollout_proc0
[2023-02-24 07:55:59,439][784615] Starting process rollout_proc1
[2023-02-24 07:55:59,439][784615] Starting process rollout_proc2
[2023-02-24 07:55:59,439][784615] Starting process rollout_proc3
[2023-02-24 07:55:59,439][784615] Starting process rollout_proc4
[2023-02-24 07:55:59,439][784615] Starting process rollout_proc5
[2023-02-24 07:55:59,440][784615] Starting process rollout_proc6
[2023-02-24 07:55:59,441][784615] Starting process rollout_proc7
[2023-02-24 07:56:00,842][794035] Worker 3 uses CPU cores [3]
[2023-02-24 07:56:00,862][794036] Worker 5 uses CPU cores [5]
[2023-02-24 07:56:00,978][794037] Worker 4 uses CPU cores [4]
[2023-02-24 07:56:00,998][794038] Worker 2 uses CPU cores [2]
[2023-02-24 07:56:00,999][794019] Low niceness requires sudo!
[2023-02-24 07:56:00,999][794019] Using GPUs [0] for process 0 (actually maps to GPUs [1])
[2023-02-24 07:56:01,000][794019] Set environment var CUDA_VISIBLE_DEVICES to '1' (GPU indices [0]) for learning process 0
[2023-02-24 07:56:01,022][794032] Low niceness requires sudo!
[2023-02-24 07:56:01,022][794032] Using GPUs [0] for process 0 (actually maps to GPUs [1])
[2023-02-24 07:56:01,022][794032] Set environment var CUDA_VISIBLE_DEVICES to '1' (GPU indices [0]) for inference process 0
[2023-02-24 07:56:01,025][794019] Num visible devices: 1
[2023-02-24 07:56:01,040][794032] Num visible devices: 1
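The learner and inference processes above restrict themselves to a single GPU through CUDA_VISIBLE_DEVICES. Inside each process the visible devices are renumbered from zero. That is why "GPUs [0]" is reported as "actually maps to GPUs [1]", why each process sees one device, and why the model is created on cuda:0. The sketch below shows that mapping. It assumes the variable holds plain integer indices, not GPU UUIDs, and `physical_gpu_for` is a hypothetical helper, not part of Sample Factory.

```python
import os


def physical_gpu_for(logical_index, env=None):
    """Map a logical CUDA index (as seen inside a process) to a physical GPU index.

    Assumes CUDA_VISIBLE_DEVICES is a comma-separated list of integers;
    UUID-style entries are not handled in this sketch.
    """
    env = os.environ if env is None else env
    visible = env.get("CUDA_VISIBLE_DEVICES")
    if visible is None:
        # Variable unset: all GPUs are visible and indices map one-to-one.
        return logical_index
    devices = [int(d) for d in visible.split(",") if d.strip()]
    if not 0 <= logical_index < len(devices):
        raise IndexError(
            f"logical GPU {logical_index} is not visible "
            f"(CUDA_VISIBLE_DEVICES={visible!r})"
        )
    return devices[logical_index]


# Mirrors the log: CUDA_VISIBLE_DEVICES='1', so cuda:0 is physical GPU 1.
print(physical_gpu_for(0, {"CUDA_VISIBLE_DEVICES": "1"}))
```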
[2023-02-24 07:56:01,050][794019] Starting seed is not provided
[2023-02-24 07:56:01,050][794019] Using GPUs [0] for process 0 (actually maps to GPUs [1])
[2023-02-24 07:56:01,050][794019] Initializing actor-critic model on device cuda:0
[2023-02-24 07:56:01,051][794019] RunningMeanStd input shape: (3, 72, 128)
[2023-02-24 07:56:01,051][794019] RunningMeanStd input shape: (1,)
[2023-02-24 07:56:01,058][794040] Worker 7 uses CPU cores [7]
[2023-02-24 07:56:01,065][794019] ConvEncoder: input_channels=3
[2023-02-24 07:56:01,154][794034] Worker 1 uses CPU cores [1]
[2023-02-24 07:56:01,186][794039] Worker 6 uses CPU cores [6]
[2023-02-24 07:56:01,194][794033] Worker 0 uses CPU cores [0]
[2023-02-24 07:56:01,198][794019] Conv encoder output size: 512
[2023-02-24 07:56:01,198][794019] Policy head output size: 512
[2023-02-24 07:56:01,216][794019] Created Actor Critic model with architecture:
[2023-02-24 07:56:01,216][794019] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
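The printed architecture names each layer type but omits the convolution hyperparameters. The sketch below derives the tensor sizes for the input shape (3, 72, 128). It **assumes** a common Atari-style stack of (32 filters, 8x8, stride 4), (64, 4x4, stride 2) and (128, 3x3, stride 2) with no padding; the log does not confirm these values. The GRU(512, 512), critic Linear(512, 1) and action Linear(512, 5) sizes come directly from the printout. Parameter counts follow PyTorch's layout for `nn.GRU` and `nn.Linear` with biases.

```python
def conv_out(size, kernel, stride, padding=0):
    # Conv2d output size along one spatial dimension (dilation 1).
    return (size + 2 * padding - kernel) // stride + 1


# ASSUMED conv stack: (out_channels, kernel, stride). Not shown in the log.
ASSUMED_CONV_LAYERS = [(32, 8, 4), (64, 4, 2), (128, 3, 2)]


def conv_head_output(c, h, w, layers=ASSUMED_CONV_LAYERS):
    """Return the (channels, height, width) produced by the conv head."""
    for out_c, k, s in layers:
        c, h, w = out_c, conv_out(h, k, s), conv_out(w, k, s)
    return c, h, w


def gru_params(input_size, hidden_size):
    # Single-layer nn.GRU with bias: weight_ih (3H x I), weight_hh (3H x H),
    # bias_ih (3H) and bias_hh (3H).
    return 3 * hidden_size * (input_size + hidden_size) + 2 * 3 * hidden_size


def linear_params(in_features, out_features):
    # Weight matrix plus bias vector.
    return in_features * out_features + out_features


c, h, w = conv_head_output(3, 72, 128)
print((c, h, w), c * h * w)  # flattened size fed to the mlp_layers Linear
print(gru_params(512, 512))
print(linear_params(512, 1), linear_params(512, 5))
```

Under these assumptions, the `mlp_layers` Linear would map the flattened 2304 features to the 512-wide "Conv encoder output size" reported above.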
[2023-02-24 07:56:04,122][794019] Using optimizer <class 'torch.optim.adam.Adam'>
[2023-02-24 07:56:04,122][794019] No checkpoints found
[2023-02-24 07:56:04,122][794019] Did not load from checkpoint, starting from scratch!
[2023-02-24 07:56:04,122][794019] Initialized policy 0 weights for model version 0
[2023-02-24 07:56:04,124][794019] LearnerWorker_p0 finished initialization!
[2023-02-24 07:56:04,124][794019] Using GPUs [0] for process 0 (actually maps to GPUs [1])
[2023-02-24 07:56:05,229][794032] RunningMeanStd input shape: (3, 72, 128)
[2023-02-24 07:56:05,229][794032] RunningMeanStd input shape: (1,)
[2023-02-24 07:56:05,237][794032] ConvEncoder: input_channels=3
[2023-02-24 07:56:05,307][794032] Conv encoder output size: 512
[2023-02-24 07:56:05,307][794032] Policy head output size: 512
[2023-02-24 07:56:06,350][784615] Inference worker 0-0 is ready!
[2023-02-24 07:56:06,350][784615] All inference workers are ready! Signal rollout workers to start!
[2023-02-24 07:56:06,367][794036] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-24 07:56:06,368][794039] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-24 07:56:06,368][794034] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-24 07:56:06,368][794035] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-24 07:56:06,369][794038] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-24 07:56:06,374][794040] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-24 07:56:06,390][794037] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-24 07:56:06,393][794033] Doom resolution: 160x120, resize resolution: (128, 72)
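The resize target is written as (width, height) = (128, 72), while the normalizer earlier reports a channels-first shape of (3, 72, 128). The pure-Python sketch below connects the two. The interpolation used by the real environment wrapper is not shown in the log, so nearest-neighbour sampling is an illustrative substitute, and the helper names are hypothetical.

```python
def resize_nearest(frame, new_w, new_h):
    """Nearest-neighbour resize of an H x W x C frame stored as nested lists."""
    old_h, old_w = len(frame), len(frame[0])
    return [
        [frame[y * old_h // new_h][x * old_w // new_w] for x in range(new_w)]
        for y in range(new_h)
    ]


def hwc_to_chw(frame):
    """Move channels first: H x W x C -> C x H x W."""
    h, w, c = len(frame), len(frame[0]), len(frame[0][0])
    return [[[frame[y][x][ch] for x in range(w)] for y in range(h)] for ch in range(c)]


def shape(nested):
    """Shape of a rectangular nested list."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)


# Dummy 160x120 RGB frame whose pixel values encode their own (y, x) position.
frame = [[[y, x, 0] for x in range(160)] for y in range(120)]
obs = hwc_to_chw(resize_nearest(frame, 128, 72))  # target given as (width, height)
print(shape(obs))
```

The source frame is 4:3, but the target is 16:9. Height is scaled by 72/120 = 0.6 and width by 128/160 = 0.8, so the observation is squeezed vertically relative to the rendered frame.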
[2023-02-24 07:56:06,669][794035] Decorrelating experience for 0 frames...
[2023-02-24 07:56:06,672][794039] Decorrelating experience for 0 frames...
[2023-02-24 07:56:06,678][794036] Decorrelating experience for 0 frames...
[2023-02-24 07:56:06,681][794037] Decorrelating experience for 0 frames...
[2023-02-24 07:56:06,683][794040] Decorrelating experience for 0 frames...
[2023-02-24 07:56:06,929][794035] Decorrelating experience for 32 frames...
[2023-02-24 07:56:06,946][794040] Decorrelating experience for 32 frames...
[2023-02-24 07:56:06,958][794036] Decorrelating experience for 32 frames...
[2023-02-24 07:56:06,977][794037] Decorrelating experience for 32 frames...
[2023-02-24 07:56:06,986][794038] Decorrelating experience for 0 frames...
[2023-02-24 07:56:06,997][794039] Decorrelating experience for 32 frames...
[2023-02-24 07:56:07,033][794033] Decorrelating experience for 0 frames...
[2023-02-24 07:56:07,227][794038] Decorrelating experience for 32 frames...
[2023-02-24 07:56:07,268][794034] Decorrelating experience for 0 frames...
[2023-02-24 07:56:07,269][794036] Decorrelating experience for 64 frames...
[2023-02-24 07:56:07,278][794037] Decorrelating experience for 64 frames...
[2023-02-24 07:56:07,293][794033] Decorrelating experience for 32 frames...
[2023-02-24 07:56:07,502][794038] Decorrelating experience for 64 frames...
[2023-02-24 07:56:07,545][794039] Decorrelating experience for 64 frames...
[2023-02-24 07:56:07,584][794036] Decorrelating experience for 96 frames...
[2023-02-24 07:56:07,591][794033] Decorrelating experience for 64 frames...
[2023-02-24 07:56:07,594][794040] Decorrelating experience for 64 frames...
[2023-02-24 07:56:07,600][794037] Decorrelating experience for 96 frames...
[2023-02-24 07:56:07,628][794034] Decorrelating experience for 32 frames...
[2023-02-24 07:56:07,822][794033] Decorrelating experience for 96 frames...
[2023-02-24 07:56:07,846][794039] Decorrelating experience for 96 frames...
[2023-02-24 07:56:07,890][794038] Decorrelating experience for 96 frames...
[2023-02-24 07:56:07,895][794040] Decorrelating experience for 96 frames...
[2023-02-24 07:56:08,065][794035] Decorrelating experience for 64 frames...
[2023-02-24 07:56:08,153][784615] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-02-24 07:56:08,319][794034] Decorrelating experience for 64 frames...
[2023-02-24 07:56:08,572][794035] Decorrelating experience for 96 frames...
[2023-02-24 07:56:08,577][794034] Decorrelating experience for 96 frames...
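Before collection starts, each rollout worker reports advancing its environments by 0, 32, 64 and 96 frames. This keeps episodes out of phase, so the learner does not receive batches of synchronized episode starts. The sketch below shows one scheme consistent with that pattern. It **assumes** four env slots per worker, each offset by a further 32 random-action frames; the slot count and spacing are inferred from the log. `CountingEnv` is a hypothetical stand-in for the Doom environment. The 5-way action space matches `out_features=5` in the printed model.

```python
import random


class CountingEnv:
    """Hypothetical stand-in for a Doom env that only counts steps."""

    def __init__(self):
        self.frames = 0

    def reset(self):
        self.frames = 0

    def step(self, action):
        self.frames += 1


def decorrelate(envs, frames_per_slot=32, num_actions=5, rng=None):
    """Advance env i by i * frames_per_slot random-action frames."""
    if rng is None:
        rng = random.Random(0)
    for i, env in enumerate(envs):
        env.reset()
        for _ in range(i * frames_per_slot):
            env.step(rng.randrange(num_actions))
    return [env.frames for env in envs]


print(decorrelate([CountingEnv() for _ in range(4)]))
```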
[2023-02-24 07:56:10,130][794019] Signal inference workers to stop experience collection...
[2023-02-24 07:56:10,137][794032] InferenceWorker_p0-w0: stopping experience collection
[2023-02-24 07:56:12,636][794019] Signal inference workers to resume experience collection...
[2023-02-24 07:56:12,637][794032] InferenceWorker_p0-w0: resuming experience collection
[2023-02-24 07:56:13,153][784615] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 4096. Throughput: 0: 494.8. Samples: 2474. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-02-24 07:56:13,154][784615] Avg episode reward: [(0, '2.494')]
[2023-02-24 07:56:15,144][794032] Updated weights for policy 0, policy_version 10 (0.0249)
[2023-02-24 07:56:17,664][794032] Updated weights for policy 0, policy_version 20 (0.0006)
[2023-02-24 07:56:18,153][784615] Fps is (10 sec: 8601.6, 60 sec: 8601.6, 300 sec: 8601.6). Total num frames: 86016. Throughput: 0: 1166.6. Samples: 11666. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 07:56:18,154][784615] Avg episode reward: [(0, '4.437')]
[2023-02-24 07:56:19,351][784615] Heartbeat connected on Batcher_0
[2023-02-24 07:56:19,354][784615] Heartbeat connected on LearnerWorker_p0
[2023-02-24 07:56:19,362][784615] Heartbeat connected on InferenceWorker_p0-w0
[2023-02-24 07:56:19,364][784615] Heartbeat connected on RolloutWorker_w0
[2023-02-24 07:56:19,365][784615] Heartbeat connected on RolloutWorker_w1
[2023-02-24 07:56:19,368][784615] Heartbeat connected on RolloutWorker_w3
[2023-02-24 07:56:19,373][784615] Heartbeat connected on RolloutWorker_w4
[2023-02-24 07:56:19,374][784615] Heartbeat connected on RolloutWorker_w5
[2023-02-24 07:56:19,375][784615] Heartbeat connected on RolloutWorker_w2
[2023-02-24 07:56:19,377][784615] Heartbeat connected on RolloutWorker_w7
[2023-02-24 07:56:19,378][784615] Heartbeat connected on RolloutWorker_w6
[2023-02-24 07:56:20,182][794032] Updated weights for policy 0, policy_version 30 (0.0006)
[2023-02-24 07:56:22,465][794032] Updated weights for policy 0, policy_version 40 (0.0007)
[2023-02-24 07:56:23,153][784615] Fps is (10 sec: 16793.7, 60 sec: 11468.8, 300 sec: 11468.8). Total num frames: 172032. Throughput: 0: 2462.0. Samples: 36930. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2023-02-24 07:56:23,154][784615] Avg episode reward: [(0, '4.501')]
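The "Fps is (10 sec, 60 sec, 300 sec)" values can be reproduced from the total frame counts printed every 5 seconds. For each window, divide the frames gained since the oldest report still inside that window by the elapsed time. Early in the run the window is shorter than its nominal length. For example, 4096 frames after 5 s gives 819.2, and 172032 frames after 15 s gives the 60-sec value of 11468.8. The sketch below is inferred from these numbers, not taken from Sample Factory's source. It uses idealized 5-second spacing, which is why the log shows 16793.7 where the sketch yields 16793.6.

```python
import math


def windowed_fps(history, window):
    """history: (seconds, total_frames) samples, oldest first.

    Returns frames per second between the newest sample and the oldest
    sample inside the window, or nan if there is no interval yet.
    """
    t_now, f_now = history[-1]
    inside = [(t, f) for t, f in history if t >= t_now - window]
    t_old, f_old = inside[0]
    if t_now == t_old:
        return math.nan
    return (f_now - f_old) / (t_now - t_old)


# Total frames reported at 07:56:08, :13, :18 and :23 in the log above.
history = [(0, 0), (5, 4096), (10, 86016), (15, 172032)]
print(round(windowed_fps(history, 10), 1), round(windowed_fps(history, 60), 1))
```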
[2023-02-24 07:56:23,155][794019] Saving new best policy, reward=4.501!
[2023-02-24 07:56:25,053][794032] Updated weights for policy 0, policy_version 50 (0.0007)
[2023-02-24 07:56:27,490][794032] Updated weights for policy 0, policy_version 60 (0.0007)
[2023-02-24 07:56:28,153][784615] Fps is (10 sec: 16793.6, 60 sec: 12697.6, 300 sec: 12697.6). Total num frames: 253952. Throughput: 0: 3086.0. Samples: 61720. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2023-02-24 07:56:28,154][784615] Avg episode reward: [(0, '4.402')]
[2023-02-24 07:56:30,062][794032] Updated weights for policy 0, policy_version 70 (0.0007)
[2023-02-24 07:56:32,640][794032] Updated weights for policy 0, policy_version 80 (0.0007)
[2023-02-24 07:56:33,153][784615] Fps is (10 sec: 15974.4, 60 sec: 13271.1, 300 sec: 13271.1). Total num frames: 331776. Throughput: 0: 2951.7. Samples: 73792. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2023-02-24 07:56:33,154][784615] Avg episode reward: [(0, '4.289')]
[2023-02-24 07:56:35,246][794032] Updated weights for policy 0, policy_version 90 (0.0007)
[2023-02-24 07:56:37,855][794032] Updated weights for policy 0, policy_version 100 (0.0008)
[2023-02-24 07:56:38,153][784615] Fps is (10 sec: 15974.5, 60 sec: 13789.9, 300 sec: 13789.9). Total num frames: 413696. Throughput: 0: 3249.9. Samples: 97496. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2023-02-24 07:56:38,154][784615] Avg episode reward: [(0, '4.457')]
[2023-02-24 07:56:40,454][794032] Updated weights for policy 0, policy_version 110 (0.0007)
[2023-02-24 07:56:43,061][794032] Updated weights for policy 0, policy_version 120 (0.0006)
[2023-02-24 07:56:43,153][784615] Fps is (10 sec: 15973.9, 60 sec: 14043.3, 300 sec: 14043.3). Total num frames: 491520. Throughput: 0: 3461.9. Samples: 121166. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2023-02-24 07:56:43,154][784615] Avg episode reward: [(0, '4.401')]
[2023-02-24 07:56:45,699][794032] Updated weights for policy 0, policy_version 130 (0.0007)
[2023-02-24 07:56:48,153][784615] Fps is (10 sec: 15564.7, 60 sec: 14233.6, 300 sec: 14233.6). Total num frames: 569344. Throughput: 0: 3324.0. Samples: 132960. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2023-02-24 07:56:48,154][784615] Avg episode reward: [(0, '4.667')]
[2023-02-24 07:56:48,157][794019] Saving new best policy, reward=4.667!
[2023-02-24 07:56:48,334][794032] Updated weights for policy 0, policy_version 140 (0.0007)
[2023-02-24 07:56:50,891][794032] Updated weights for policy 0, policy_version 150 (0.0006)
[2023-02-24 07:56:53,153][784615] Fps is (10 sec: 15565.3, 60 sec: 14381.5, 300 sec: 14381.5). Total num frames: 647168. Throughput: 0: 3474.6. Samples: 156358. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2023-02-24 07:56:53,154][784615] Avg episode reward: [(0, '4.622')]
[2023-02-24 07:56:53,523][794032] Updated weights for policy 0, policy_version 160 (0.0008)
[2023-02-24 07:56:56,158][794032] Updated weights for policy 0, policy_version 170 (0.0007)
[2023-02-24 07:56:58,153][784615] Fps is (10 sec: 15565.0, 60 sec: 14499.9, 300 sec: 14499.9). Total num frames: 724992. Throughput: 0: 3940.0. Samples: 179774. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2023-02-24 07:56:58,153][784615] Avg episode reward: [(0, '4.603')]
[2023-02-24 07:56:58,738][794032] Updated weights for policy 0, policy_version 180 (0.0008)
[2023-02-24 07:57:01,298][794032] Updated weights for policy 0, policy_version 190 (0.0007)
[2023-02-24 07:57:03,153][784615] Fps is (10 sec: 15974.3, 60 sec: 14671.1, 300 sec: 14671.1). Total num frames: 806912. Throughput: 0: 3998.0. Samples: 191576. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-24 07:57:03,154][784615] Avg episode reward: [(0, '4.850')]
[2023-02-24 07:57:03,155][794019] Saving new best policy, reward=4.850!
[2023-02-24 07:57:03,931][794032] Updated weights for policy 0, policy_version 200 (0.0007)
[2023-02-24 07:57:06,517][794032] Updated weights for policy 0, policy_version 210 (0.0007)
[2023-02-24 07:57:08,153][784615] Fps is (10 sec: 15974.2, 60 sec: 14745.6, 300 sec: 14745.6). Total num frames: 884736. Throughput: 0: 3958.6. Samples: 215068. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 07:57:08,154][784615] Avg episode reward: [(0, '5.203')]
[2023-02-24 07:57:08,156][794019] Saving new best policy, reward=5.203!
[2023-02-24 07:57:09,207][794032] Updated weights for policy 0, policy_version 220 (0.0006)
[2023-02-24 07:57:11,788][794032] Updated weights for policy 0, policy_version 230 (0.0006)
[2023-02-24 07:57:13,153][784615] Fps is (10 sec: 15564.9, 60 sec: 15974.4, 300 sec: 14808.6). Total num frames: 962560. Throughput: 0: 3932.8. Samples: 238698. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2023-02-24 07:57:13,154][784615] Avg episode reward: [(0, '4.831')]
[2023-02-24 07:57:14,400][794032] Updated weights for policy 0, policy_version 240 (0.0006)
[2023-02-24 07:57:17,007][794032] Updated weights for policy 0, policy_version 250 (0.0008)
[2023-02-24 07:57:18,153][784615] Fps is (10 sec: 15564.8, 60 sec: 15906.1, 300 sec: 14862.6). Total num frames: 1040384. Throughput: 0: 3928.0. Samples: 250550. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2023-02-24 07:57:18,154][784615] Avg episode reward: [(0, '6.010')]
[2023-02-24 07:57:18,157][794019] Saving new best policy, reward=6.010!
[2023-02-24 07:57:19,591][794032] Updated weights for policy 0, policy_version 260 (0.0006)
[2023-02-24 07:57:21,628][784615] Keyboard interrupt detected in the event loop EvtLoop [Runner_EvtLoop, process=main process 784615], exiting...
[2023-02-24 07:57:21,629][794019] Stopping Batcher_0...
[2023-02-24 07:57:21,629][794019] Loop batcher_evt_loop terminating...
[2023-02-24 07:57:21,628][784615] Runner profile tree view:
main_loop: 82.2510
[2023-02-24 07:57:21,630][784615] Collected {0: 1093632}, FPS: 13296.3
[2023-02-24 07:57:21,638][794039] Stopping RolloutWorker_w6...
[2023-02-24 07:57:21,638][794035] Stopping RolloutWorker_w3...
[2023-02-24 07:57:21,638][794035] Loop rollout_proc3_evt_loop terminating...
[2023-02-24 07:57:21,638][794034] Stopping RolloutWorker_w1...
[2023-02-24 07:57:21,639][794039] Loop rollout_proc6_evt_loop terminating...
[2023-02-24 07:57:21,639][794034] Loop rollout_proc1_evt_loop terminating...
[2023-02-24 07:57:21,645][794033] Stopping RolloutWorker_w0...
[2023-02-24 07:57:21,646][794033] Loop rollout_proc0_evt_loop terminating...
[2023-02-24 07:57:21,646][794037] Stopping RolloutWorker_w4...
[2023-02-24 07:57:21,647][794037] Loop rollout_proc4_evt_loop terminating...
[2023-02-24 07:57:21,647][794038] Stopping RolloutWorker_w2...
[2023-02-24 07:57:21,648][794038] Loop rollout_proc2_evt_loop terminating...
[2023-02-24 07:57:21,651][794036] Stopping RolloutWorker_w5...
[2023-02-24 07:57:21,651][794036] Loop rollout_proc5_evt_loop terminating...
[2023-02-24 07:57:21,652][794040] Stopping RolloutWorker_w7...
[2023-02-24 07:57:21,653][794040] Loop rollout_proc7_evt_loop terminating...
[2023-02-24 07:57:21,667][794019] Saving /mnt/chqma/data-ssd-01/dataset/oss/RWKV-LM/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000268_1097728.pth...
[2023-02-24 07:57:21,688][794032] Weights refcount: 2 0
[2023-02-24 07:57:21,695][794032] Stopping InferenceWorker_p0-w0...
[2023-02-24 07:57:21,696][794032] Loop inference_proc0-0_evt_loop terminating...
[2023-02-24 07:57:21,825][794019] Stopping LearnerWorker_p0...
[2023-02-24 07:57:21,826][794019] Loop learner_proc0_evt_loop terminating...
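The final checkpoint name `checkpoint_000000268_1097728.pth` appears to encode two counters: a zero-padded 9-digit learner step (268, close to the last logged `policy_version 260`) and an environment-frame count (1097728, one 4096-frame batch above the 1093632 frames reported as collected). The sketch below assumes that interpretation and parses such names so that the newest checkpoint can be chosen numerically. The helper names are hypothetical.

```python
import re
from pathlib import PurePath

# Assumed layout: checkpoint_<9-digit step>_<env frames>.pth
CHECKPOINT_RE = re.compile(r"checkpoint_(\d{9})_(\d+)\.pth")


def parse_checkpoint_name(path):
    """Return (step, env_frames) parsed from a checkpoint file name."""
    match = CHECKPOINT_RE.fullmatch(PurePath(path).name)
    if match is None:
        raise ValueError(f"not a checkpoint file name: {path}")
    return int(match.group(1)), int(match.group(2))


def latest_checkpoint(paths):
    # Compare numerically: the frame count is not zero-padded, so a plain
    # string sort could order checkpoints incorrectly.
    return max(paths, key=parse_checkpoint_name)


print(parse_checkpoint_name("checkpoint_p0/checkpoint_000000268_1097728.pth"))
```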
[2023-02-24 07:57:37,644][784615] Loading existing experiment configuration from /mnt/chqma/data-ssd-01/dataset/oss/RWKV-LM/deep-rl-class/notebooks/unit8/train_dir/default_experiment/config.json
[2023-02-24 07:57:37,644][784615] Overriding arg 'num_workers' with value 1 passed from command line
[2023-02-24 07:57:37,645][784615] Adding new argument 'no_render'=True that is not in the saved config file!
[2023-02-24 07:57:37,645][784615] Adding new argument 'save_video'=True that is not in the saved config file!
[2023-02-24 07:57:37,645][784615] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2023-02-24 07:57:37,646][784615] Adding new argument 'video_name'=None that is not in the saved config file!
[2023-02-24 07:57:37,646][784615] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2023-02-24 07:57:37,646][784615] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2023-02-24 07:57:37,647][784615] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2023-02-24 07:57:37,647][784615] Adding new argument 'hf_repository'='chqmatteo/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
[2023-02-24 07:57:37,647][784615] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2023-02-24 07:57:37,648][784615] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2023-02-24 07:57:37,648][784615] Adding new argument 'train_script'=None that is not in the saved config file!
[2023-02-24 07:57:37,648][784615] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2023-02-24 07:57:37,649][784615] Using frameskip 1 and render_action_repeat=4 for evaluation
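For evaluation, the run reloads the saved `config.json`. It then applies the command-line value for `num_workers` and adds evaluation-only arguments (`save_video`, `max_num_episodes`, `push_to_hub`, and others) that the training config never contained. The sketch below shows this overlay with a hypothetical helper. It reproduces the wording of the log messages only to make the correspondence clear; it is not Sample Factory's implementation. The saved config is reduced to a single key for brevity, with 8 workers as in the training run.

```python
import json


def merge_cli_into_saved(saved, cli_args):
    """Hypothetical helper: overlay CLI args on a saved config dict.

    Returns (merged_config, messages). The saved dict is not mutated.
    """
    merged = dict(saved)
    messages = []
    for key, value in cli_args.items():
        if key not in saved:
            messages.append(
                f"Adding new argument {key!r}={value!r} that is not in the saved config file!"
            )
        elif saved[key] != value:
            messages.append(f"Overriding arg {key!r} with value {value!r} passed from command line")
        merged[key] = value
    return merged, messages


saved = json.loads('{"num_workers": 8}')  # stand-in for config.json
merged, messages = merge_cli_into_saved(
    saved, {"num_workers": 1, "max_num_episodes": 10, "no_render": True}
)
for line in messages:
    print(line)
```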
[2023-02-24 07:57:37,655][784615] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-24 07:57:37,656][784615] RunningMeanStd input shape: (3, 72, 128)
[2023-02-24 07:57:37,657][784615] RunningMeanStd input shape: (1,)
[2023-02-24 07:57:37,665][784615] ConvEncoder: input_channels=3
[2023-02-24 07:57:37,755][784615] Conv encoder output size: 512
[2023-02-24 07:57:37,755][784615] Policy head output size: 512
[2023-02-24 07:57:40,369][784615] Loading state from checkpoint /mnt/chqma/data-ssd-01/dataset/oss/RWKV-LM/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000268_1097728.pth...
[2023-02-24 07:57:42,531][784615] Num frames 100...
[2023-02-24 07:57:42,595][784615] Num frames 200...
[2023-02-24 07:57:42,663][784615] Num frames 300...
[2023-02-24 07:57:42,732][784615] Num frames 400...
[2023-02-24 07:57:42,832][784615] Avg episode rewards: #0: 6.800, true rewards: #0: 4.800
[2023-02-24 07:57:42,833][784615] Avg episode reward: 6.800, avg true_objective: 4.800
[2023-02-24 07:57:42,848][784615] Num frames 500...
[2023-02-24 07:57:42,911][784615] Num frames 600...
[2023-02-24 07:57:42,969][784615] Num frames 700...
[2023-02-24 07:57:43,043][784615] Num frames 800...
[2023-02-24 07:57:43,113][784615] Avg episode rewards: #0: 5.630, true rewards: #0: 4.130
[2023-02-24 07:57:43,113][784615] Avg episode reward: 5.630, avg true_objective: 4.130
[2023-02-24 07:57:43,175][784615] Num frames 900...
[2023-02-24 07:57:43,244][784615] Num frames 1000...
[2023-02-24 07:57:43,309][784615] Num frames 1100...
[2023-02-24 07:57:43,371][784615] Num frames 1200...
[2023-02-24 07:57:43,470][784615] Avg episode rewards: #0: 5.580, true rewards: #0: 4.247
[2023-02-24 07:57:43,471][784615] Avg episode reward: 5.580, avg true_objective: 4.247
[2023-02-24 07:57:43,487][784615] Num frames 1300...
[2023-02-24 07:57:43,564][784615] Num frames 1400...
[2023-02-24 07:57:43,628][784615] Num frames 1500...
[2023-02-24 07:57:43,689][784615] Num frames 1600...
[2023-02-24 07:57:43,781][784615] Avg episode rewards: #0: 5.145, true rewards: #0: 4.145
[2023-02-24 07:57:43,781][784615] Avg episode reward: 5.145, avg true_objective: 4.145
[2023-02-24 07:57:43,817][784615] Num frames 1700...
[2023-02-24 07:57:43,887][784615] Num frames 1800...
[2023-02-24 07:57:43,956][784615] Num frames 1900...
[2023-02-24 07:57:44,024][784615] Num frames 2000...
[2023-02-24 07:57:44,089][784615] Num frames 2100...
[2023-02-24 07:57:44,147][784615] Avg episode rewards: #0: 5.612, true rewards: #0: 4.212
[2023-02-24 07:57:44,148][784615] Avg episode reward: 5.612, avg true_objective: 4.212
[2023-02-24 07:57:44,208][784615] Num frames 2200...
[2023-02-24 07:57:44,288][784615] Num frames 2300...
[2023-02-24 07:57:44,350][784615] Num frames 2400...
[2023-02-24 07:57:44,463][784615] Avg episode rewards: #0: 5.317, true rewards: #0: 4.150
[2023-02-24 07:57:44,463][784615] Avg episode reward: 5.317, avg true_objective: 4.150
[2023-02-24 07:57:44,471][784615] Num frames 2500...
[2023-02-24 07:57:44,540][784615] Num frames 2600...
[2023-02-24 07:57:44,607][784615] Num frames 2700...
[2023-02-24 07:57:44,671][784615] Num frames 2800...
[2023-02-24 07:57:44,738][784615] Num frames 2900...
[2023-02-24 07:57:44,810][784615] Num frames 3000...
[2023-02-24 07:57:44,865][784615] Avg episode rewards: #0: 5.717, true rewards: #0: 4.289
[2023-02-24 07:57:44,866][784615] Avg episode reward: 5.717, avg true_objective: 4.289
[2023-02-24 07:57:44,925][784615] Num frames 3100...
[2023-02-24 07:57:44,988][784615] Num frames 3200...
[2023-02-24 07:57:45,057][784615] Num frames 3300...
[2023-02-24 07:57:45,126][784615] Num frames 3400...
[2023-02-24 07:57:45,200][784615] Num frames 3500...
[2023-02-24 07:57:45,287][784615] Avg episode rewards: #0: 5.933, true rewards: #0: 4.432
[2023-02-24 07:57:45,287][784615] Avg episode reward: 5.933, avg true_objective: 4.432
[2023-02-24 07:57:45,332][784615] Num frames 3600...
[2023-02-24 07:57:45,410][784615] Num frames 3700...
[2023-02-24 07:57:45,468][784615] Num frames 3800...
[2023-02-24 07:57:45,570][784615] Num frames 3900...
[2023-02-24 07:57:45,634][784615] Num frames 4000...
[2023-02-24 07:57:45,698][784615] Num frames 4100...
[2023-02-24 07:57:45,762][784615] Num frames 4200...
[2023-02-24 07:57:45,873][784615] Avg episode rewards: #0: 6.869, true rewards: #0: 4.758
[2023-02-24 07:57:45,874][784615] Avg episode reward: 6.869, avg true_objective: 4.758
[2023-02-24 07:57:45,886][784615] Num frames 4300...
[2023-02-24 07:57:45,952][784615] Num frames 4400...
[2023-02-24 07:57:46,014][784615] Num frames 4500...
[2023-02-24 07:57:46,087][784615] Num frames 4600...
[2023-02-24 07:57:46,149][784615] Num frames 4700...
[2023-02-24 07:57:46,224][784615] Avg episode rewards: #0: 6.730, true rewards: #0: 4.730
[2023-02-24 07:57:46,225][784615] Avg episode reward: 6.730, avg true_objective: 4.730
[2023-02-24 07:57:48,353][784615] Replay video saved to /mnt/chqma/data-ssd-01/dataset/oss/RWKV-LM/deep-rl-class/notebooks/unit8/train_dir/default_experiment/replay.mp4!
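The evaluation prints one "Avg episode rewards" value after each of the 10 episodes. Assuming each value is the mean over all episodes finished so far, the individual episode rewards can be recovered with r_n = n * m_n - (n - 1) * m_{n-1}. The printed means are rounded to three decimals, so each recovered value is only accurate to about (2n - 1) * 0.0005. The function name is hypothetical.

```python
def per_episode_from_running_means(means):
    """Invert a cumulative mean: r_n = n * m_n - (n - 1) * m_(n-1)."""
    rewards, prev_total = [], 0.0
    for n, mean in enumerate(means, start=1):
        total = n * mean
        rewards.append(total - prev_total)
        prev_total = total
    return rewards


# "Avg episode rewards" values from the first evaluation run above.
means = [6.800, 5.630, 5.580, 5.145, 5.612, 5.317, 5.717, 5.933, 6.869, 6.730]
print([round(r, 3) for r in per_episode_from_running_means(means)])
```

Under this assumption, episode 9 scored roughly 14.4, well above the other episodes. This accounts for the jump in the running mean from 5.933 to 6.869.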
[2023-02-24 07:58:39,896][784615] Loading existing experiment configuration from /mnt/chqma/data-ssd-01/dataset/oss/RWKV-LM/deep-rl-class/notebooks/unit8/train_dir/default_experiment/config.json
[2023-02-24 07:58:39,896][784615] Overriding arg 'num_workers' with value 1 passed from command line
[2023-02-24 07:58:39,897][784615] Adding new argument 'no_render'=True that is not in the saved config file!
[2023-02-24 07:58:39,897][784615] Adding new argument 'save_video'=True that is not in the saved config file!
[2023-02-24 07:58:39,898][784615] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2023-02-24 07:58:39,898][784615] Adding new argument 'video_name'=None that is not in the saved config file!
[2023-02-24 07:58:39,899][784615] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2023-02-24 07:58:39,899][784615] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2023-02-24 07:58:39,900][784615] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2023-02-24 07:58:39,900][784615] Adding new argument 'hf_repository'='chqmatteo/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
[2023-02-24 07:58:39,900][784615] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2023-02-24 07:58:39,901][784615] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2023-02-24 07:58:39,901][784615] Adding new argument 'train_script'=None that is not in the saved config file!
[2023-02-24 07:58:39,902][784615] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2023-02-24 07:58:39,902][784615] Using frameskip 1 and render_action_repeat=4 for evaluation
[2023-02-24 07:58:39,911][784615] RunningMeanStd input shape: (3, 72, 128)
[2023-02-24 07:58:39,912][784615] RunningMeanStd input shape: (1,)
[2023-02-24 07:58:39,919][784615] ConvEncoder: input_channels=3
[2023-02-24 07:58:39,943][784615] Conv encoder output size: 512
[2023-02-24 07:58:39,944][784615] Policy head output size: 512
[2023-02-24 07:58:39,980][784615] Loading state from checkpoint /mnt/chqma/data-ssd-01/dataset/oss/RWKV-LM/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000268_1097728.pth...
[2023-02-24 07:58:40,400][784615] Num frames 100...
[2023-02-24 07:58:40,470][784615] Num frames 200...
[2023-02-24 07:58:40,530][784615] Num frames 300...
[2023-02-24 07:58:40,596][784615] Num frames 400...
[2023-02-24 07:58:40,684][784615] Avg episode rewards: #0: 5.480, true rewards: #0: 4.480
[2023-02-24 07:58:40,685][784615] Avg episode reward: 5.480, avg true_objective: 4.480
[2023-02-24 07:58:40,719][784615] Num frames 500...
[2023-02-24 07:58:40,787][784615] Num frames 600...
[2023-02-24 07:58:40,850][784615] Num frames 700...
[2023-02-24 07:58:40,918][784615] Num frames 800...
[2023-02-24 07:58:41,005][784615] Num frames 900...
[2023-02-24 07:58:41,078][784615] Avg episode rewards: #0: 6.640, true rewards: #0: 4.640
[2023-02-24 07:58:41,079][784615] Avg episode reward: 6.640, avg true_objective: 4.640
[2023-02-24 07:58:41,129][784615] Num frames 1000...
[2023-02-24 07:58:41,202][784615] Num frames 1100...
[2023-02-24 07:58:41,276][784615] Num frames 1200...
[2023-02-24 07:58:41,360][784615] Num frames 1300...
[2023-02-24 07:58:41,429][784615] Num frames 1400...
[2023-02-24 07:58:41,493][784615] Num frames 1500...
[2023-02-24 07:58:41,554][784615] Num frames 1600...
[2023-02-24 07:58:41,667][784615] Avg episode rewards: #0: 8.653, true rewards: #0: 5.653
[2023-02-24 07:58:41,668][784615] Avg episode reward: 8.653, avg true_objective: 5.653
[2023-02-24 07:58:41,673][784615] Num frames 1700...
[2023-02-24 07:58:41,734][784615] Num frames 1800...
[2023-02-24 07:58:41,791][784615] Num frames 1900...
[2023-02-24 07:58:41,848][784615] Num frames 2000...
[2023-02-24 07:58:41,905][784615] Num frames 2100...
[2023-02-24 07:58:41,961][784615] Num frames 2200...
[2023-02-24 07:58:42,055][784615] Avg episode rewards: #0: 8.680, true rewards: #0: 5.680
[2023-02-24 07:58:42,056][784615] Avg episode reward: 8.680, avg true_objective: 5.680
[2023-02-24 07:58:42,077][784615] Num frames 2300...
[2023-02-24 07:58:42,141][784615] Num frames 2400...
[2023-02-24 07:58:42,203][784615] Num frames 2500...
[2023-02-24 07:58:42,260][784615] Num frames 2600...
[2023-02-24 07:58:42,346][784615] Avg episode rewards: #0: 7.712, true rewards: #0: 5.312
[2023-02-24 07:58:42,348][784615] Avg episode reward: 7.712, avg true_objective: 5.312
[2023-02-24 07:58:42,385][784615] Num frames 2700...
[2023-02-24 07:58:42,449][784615] Num frames 2800...
[2023-02-24 07:58:42,506][784615] Num frames 2900...
[2023-02-24 07:58:42,563][784615] Num frames 3000...
[2023-02-24 07:58:42,621][784615] Num frames 3100...
[2023-02-24 07:58:42,688][784615] Num frames 3200...
[2023-02-24 07:58:42,788][784615] Avg episode rewards: #0: 7.940, true rewards: #0: 5.440
[2023-02-24 07:58:42,788][784615] Avg episode reward: 7.940, avg true_objective: 5.440
[2023-02-24 07:58:42,817][784615] Num frames 3300...
[2023-02-24 07:58:42,890][784615] Num frames 3400...
[2023-02-24 07:58:42,964][784615] Num frames 3500...
[2023-02-24 07:58:43,040][784615] Num frames 3600...
[2023-02-24 07:58:43,128][784615] Avg episode rewards: #0: 7.354, true rewards: #0: 5.211
[2023-02-24 07:58:43,130][784615] Avg episode reward: 7.354, avg true_objective: 5.211
[2023-02-24 07:58:43,174][784615] Num frames 3700...
[2023-02-24 07:58:43,248][784615] Num frames 3800...
[2023-02-24 07:58:43,316][784615] Num frames 3900...
[2023-02-24 07:58:43,383][784615] Num frames 4000...
[2023-02-24 07:58:43,475][784615] Avg episode rewards: #0: 7.205, true rewards: #0: 5.080
[2023-02-24 07:58:43,476][784615] Avg episode reward: 7.205, avg true_objective: 5.080
[2023-02-24 07:58:43,498][784615] Num frames 4100...
[2023-02-24 07:58:43,556][784615] Num frames 4200...
[2023-02-24 07:58:43,614][784615] Num frames 4300...
[2023-02-24 07:58:43,670][784615] Num frames 4400...
[2023-02-24 07:58:43,727][784615] Num frames 4500...
[2023-02-24 07:58:43,790][784615] Num frames 4600...
[2023-02-24 07:58:43,891][784615] Avg episode rewards: #0: 7.413, true rewards: #0: 5.191
[2023-02-24 07:58:43,891][784615] Avg episode reward: 7.413, avg true_objective: 5.191
[2023-02-24 07:58:43,912][784615] Num frames 4700...
[2023-02-24 07:58:43,982][784615] Num frames 4800...
[2023-02-24 07:58:44,049][784615] Num frames 4900...
[2023-02-24 07:58:44,117][784615] Num frames 5000...
[2023-02-24 07:58:44,184][784615] Num frames 5100...
[2023-02-24 07:58:44,250][784615] Avg episode rewards: #0: 7.220, true rewards: #0: 5.120
[2023-02-24 07:58:44,250][784615] Avg episode reward: 7.220, avg true_objective: 5.120
[2023-02-24 07:58:46,584][784615] Replay video saved to /mnt/chqma/data-ssd-01/dataset/oss/RWKV-LM/deep-rl-class/notebooks/unit8/train_dir/default_experiment/replay.mp4!