[2024-06-05 17:50:36,095][10130] Saving configuration to /workspace/metta/train_dir/p2.metta.4/config.json... [2024-06-05 17:50:36,112][10130] Rollout worker 0 uses device cpu [2024-06-05 17:50:36,112][10130] Rollout worker 1 uses device cpu [2024-06-05 17:50:36,113][10130] Rollout worker 2 uses device cpu [2024-06-05 17:50:36,113][10130] Rollout worker 3 uses device cpu [2024-06-05 17:50:36,113][10130] Rollout worker 4 uses device cpu [2024-06-05 17:50:36,113][10130] Rollout worker 5 uses device cpu [2024-06-05 17:50:36,114][10130] Rollout worker 6 uses device cpu [2024-06-05 17:50:36,114][10130] Rollout worker 7 uses device cpu [2024-06-05 17:50:36,114][10130] Rollout worker 8 uses device cpu [2024-06-05 17:50:36,114][10130] Rollout worker 9 uses device cpu [2024-06-05 17:50:36,115][10130] Rollout worker 10 uses device cpu [2024-06-05 17:50:36,115][10130] Rollout worker 11 uses device cpu [2024-06-05 17:50:36,115][10130] Rollout worker 12 uses device cpu [2024-06-05 17:50:36,116][10130] Rollout worker 13 uses device cpu [2024-06-05 17:50:36,116][10130] Rollout worker 14 uses device cpu [2024-06-05 17:50:36,116][10130] Rollout worker 15 uses device cpu [2024-06-05 17:50:36,116][10130] Rollout worker 16 uses device cpu [2024-06-05 17:50:36,117][10130] Rollout worker 17 uses device cpu [2024-06-05 17:50:36,117][10130] Rollout worker 18 uses device cpu [2024-06-05 17:50:36,117][10130] Rollout worker 19 uses device cpu [2024-06-05 17:50:36,117][10130] Rollout worker 20 uses device cpu [2024-06-05 17:50:36,117][10130] Rollout worker 21 uses device cpu [2024-06-05 17:50:36,117][10130] Rollout worker 22 uses device cpu [2024-06-05 17:50:36,117][10130] Rollout worker 23 uses device cpu [2024-06-05 17:50:36,118][10130] Rollout worker 24 uses device cpu [2024-06-05 17:50:36,118][10130] Rollout worker 25 uses device cpu [2024-06-05 17:50:36,118][10130] Rollout worker 26 uses device cpu [2024-06-05 17:50:36,118][10130] Rollout worker 27 uses device cpu [2024-06-05 17:50:36,118][10130] Rollout worker 28 uses device cpu [2024-06-05 17:50:36,118][10130] Rollout worker 29 uses device cpu [2024-06-05 17:50:36,118][10130] Rollout worker 30 uses device cpu [2024-06-05 17:50:36,119][10130] Rollout worker 31 uses device cpu [2024-06-05 17:50:36,630][10130] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-06-05 17:50:36,630][10130] InferenceWorker_p0-w0: min num requests: 10 [2024-06-05 17:50:36,673][10130] Starting all processes... [2024-06-05 17:50:36,673][10130] Starting process learner_proc0 [2024-06-05 17:50:36,947][10130] Starting all processes... 
[2024-06-05 17:50:36,950][10130] Starting process inference_proc0-0 [2024-06-05 17:50:36,950][10130] Starting process rollout_proc0 [2024-06-05 17:50:36,950][10130] Starting process rollout_proc1 [2024-06-05 17:50:36,950][10130] Starting process rollout_proc2 [2024-06-05 17:50:36,950][10130] Starting process rollout_proc3 [2024-06-05 17:50:36,951][10130] Starting process rollout_proc4 [2024-06-05 17:50:36,951][10130] Starting process rollout_proc5 [2024-06-05 17:50:36,951][10130] Starting process rollout_proc6 [2024-06-05 17:50:36,951][10130] Starting process rollout_proc7 [2024-06-05 17:50:36,951][10130] Starting process rollout_proc8 [2024-06-05 17:50:36,951][10130] Starting process rollout_proc9 [2024-06-05 17:50:36,951][10130] Starting process rollout_proc10 [2024-06-05 17:50:36,952][10130] Starting process rollout_proc11 [2024-06-05 17:50:36,952][10130] Starting process rollout_proc12 [2024-06-05 17:50:36,953][10130] Starting process rollout_proc13 [2024-06-05 17:50:36,954][10130] Starting process rollout_proc14 [2024-06-05 17:50:36,954][10130] Starting process rollout_proc15 [2024-06-05 17:50:36,960][10130] Starting process rollout_proc16 [2024-06-05 17:50:36,960][10130] Starting process rollout_proc17 [2024-06-05 17:50:36,960][10130] Starting process rollout_proc18 [2024-06-05 17:50:36,961][10130] Starting process rollout_proc19 [2024-06-05 17:50:36,961][10130] Starting process rollout_proc20 [2024-06-05 17:50:36,961][10130] Starting process rollout_proc21 [2024-06-05 17:50:36,963][10130] Starting process rollout_proc22 [2024-06-05 17:50:36,964][10130] Starting process rollout_proc23 [2024-06-05 17:50:36,966][10130] Starting process rollout_proc24 [2024-06-05 17:50:36,969][10130] Starting process rollout_proc25 [2024-06-05 17:50:36,970][10130] Starting process rollout_proc26 [2024-06-05 17:50:36,970][10130] Starting process rollout_proc27 [2024-06-05 17:50:36,974][10130] Starting process rollout_proc28 [2024-06-05 17:50:36,974][10130] Starting process rollout_proc29 [2024-06-05 17:50:36,975][10130] Starting process rollout_proc30 [2024-06-05 17:50:36,978][10130] Starting process rollout_proc31 [2024-06-05 17:50:38,851][10396] Worker 27 uses CPU cores [27] [2024-06-05 17:50:38,866][10384] Worker 16 uses CPU cores [16] [2024-06-05 17:50:39,028][10367] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-06-05 17:50:39,028][10367] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 [2024-06-05 17:50:39,043][10367] Num visible devices: 1 [2024-06-05 17:50:39,067][10376] Worker 8 uses CPU cores [8] [2024-06-05 17:50:39,079][10389] Worker 22 uses CPU cores [22] [2024-06-05 17:50:39,099][10375] Worker 7 uses CPU cores [7] [2024-06-05 17:50:39,127][10371] Worker 1 uses CPU cores [1] [2024-06-05 17:50:39,155][10381] Worker 14 uses CPU cores [14] [2024-06-05 17:50:39,171][10393] Worker 25 uses CPU cores [25] [2024-06-05 17:50:39,175][10397] Worker 29 uses CPU cores [29] [2024-06-05 17:50:39,179][10394] Worker 26 uses CPU cores [26] [2024-06-05 17:50:39,195][10391] Worker 24 uses CPU cores [24] [2024-06-05 17:50:39,215][10373] Worker 5 uses CPU cores [5] [2024-06-05 17:50:39,234][10374] Worker 6 uses CPU cores [6] [2024-06-05 17:50:39,239][10387] Worker 19 uses CPU cores [19] [2024-06-05 17:50:39,277][10369] Worker 3 uses CPU cores [3] [2024-06-05 17:50:39,317][10382] Worker 15 uses CPU cores [15] [2024-06-05 17:50:39,322][10377] Worker 9 uses CPU cores [9] [2024-06-05 17:50:39,351][10368] Worker 0 uses CPU cores [0] [2024-06-05 
17:50:39,363][10372] Worker 4 uses CPU cores [4]
[2024-06-05 17:50:39,368][10383] Worker 13 uses CPU cores [13]
[2024-06-05 17:50:39,371][10390] Worker 23 uses CPU cores [23]
[2024-06-05 17:50:39,391][10370] Worker 2 uses CPU cores [2]
[2024-06-05 17:50:39,391][10395] Worker 28 uses CPU cores [28]
[2024-06-05 17:50:39,399][10380] Worker 12 uses CPU cores [12]
[2024-06-05 17:50:39,401][10379] Worker 11 uses CPU cores [11]
[2024-06-05 17:50:39,434][10386] Worker 18 uses CPU cores [18]
[2024-06-05 17:50:39,445][10378] Worker 10 uses CPU cores [10]
[2024-06-05 17:50:39,516][10399] Worker 30 uses CPU cores [30]
[2024-06-05 17:50:39,517][10392] Worker 21 uses CPU cores [21]
[2024-06-05 17:50:39,526][10398] Worker 31 uses CPU cores [31]
[2024-06-05 17:50:39,532][10388] Worker 20 uses CPU cores [20]
[2024-06-05 17:50:39,537][10385] Worker 17 uses CPU cores [17]
[2024-06-05 17:50:39,563][10347] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-06-05 17:50:39,563][10347] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2024-06-05 17:50:39,570][10347] Num visible devices: 1
[2024-06-05 17:50:39,580][10347] Setting fixed seed 0
[2024-06-05 17:50:39,580][10347] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-06-05 17:50:39,580][10347] Initializing actor-critic model on device cuda:0
[2024-06-05 17:50:40,184][10347] RunningMeanStd input shape: (11, 11)
[2024-06-05 17:50:40,185][10347] RunningMeanStd input shape: (11, 11)
[2024-06-05 17:50:40,185][10347] RunningMeanStd input shape: (11, 11)
[2024-06-05 17:50:40,185][10347] RunningMeanStd input shape: (11, 11)
[2024-06-05 17:50:40,185][10347] RunningMeanStd input shape: (11, 11)
[2024-06-05 17:50:40,185][10347] RunningMeanStd input shape: (11, 11)
[2024-06-05 17:50:40,185][10347] RunningMeanStd input shape: (11, 11)
[2024-06-05 17:50:40,185][10347] RunningMeanStd input shape: (11, 11)
[2024-06-05 17:50:40,185][10347] RunningMeanStd input shape: (11, 11)
[2024-06-05 17:50:40,185][10347] RunningMeanStd input shape: (11, 11)
[2024-06-05 17:50:40,185][10347] RunningMeanStd input shape: (11, 11)
[2024-06-05 17:50:40,185][10347] RunningMeanStd input shape: (11, 11)
[2024-06-05 17:50:40,185][10347] RunningMeanStd input shape: (11, 11)
[2024-06-05 17:50:40,185][10347] RunningMeanStd input shape: (11, 11)
[2024-06-05 17:50:40,185][10347] RunningMeanStd input shape: (11, 11)
[2024-06-05 17:50:40,185][10347] RunningMeanStd input shape: (11, 11)
[2024-06-05 17:50:40,185][10347] RunningMeanStd input shape: (11, 11)
[2024-06-05 17:50:40,185][10347] RunningMeanStd input shape: (11, 11)
[2024-06-05 17:50:40,185][10347] RunningMeanStd input shape: (11, 11)
[2024-06-05 17:50:40,185][10347] RunningMeanStd input shape: (11, 11)
[2024-06-05 17:50:40,185][10347] RunningMeanStd input shape: (11, 11)
[2024-06-05 17:50:40,185][10347] RunningMeanStd input shape: (11, 11)
[2024-06-05 17:50:40,189][10347] RunningMeanStd input shape: (1,)
[2024-06-05 17:50:40,189][10347] RunningMeanStd input shape: (1,)
[2024-06-05 17:50:40,189][10347] RunningMeanStd input shape: (1,)
[2024-06-05 17:50:40,189][10347] RunningMeanStd input shape: (1,)
[2024-06-05 17:50:40,228][10347] RunningMeanStd input shape: (1,)
[2024-06-05 17:50:40,232][10347] Created Actor Critic model with architecture:
[2024-06-05 17:50:40,232][10347] SampleFactoryAgentWrapper(
  (obs_normalizer): ObservationNormalizer()
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (agent): MettaAgent(
    (_encoder): MultiFeatureSetEncoder(
      (feature_set_encoders): ModuleDict(
        (grid_obs): FeatureSetEncoder(
          (_normalizer): FeatureListNormalizer(
            (_norms_dict): ModuleDict(
              (agent): RunningMeanStdInPlace()
              (altar): RunningMeanStdInPlace()
              (converter): RunningMeanStdInPlace()
              (generator): RunningMeanStdInPlace()
              (wall): RunningMeanStdInPlace()
              (agent:dir): RunningMeanStdInPlace()
              (agent:energy): RunningMeanStdInPlace()
              (agent:frozen): RunningMeanStdInPlace()
              (agent:hp): RunningMeanStdInPlace()
              (agent:id): RunningMeanStdInPlace()
              (agent:inv_r1): RunningMeanStdInPlace()
              (agent:inv_r2): RunningMeanStdInPlace()
              (agent:inv_r3): RunningMeanStdInPlace()
              (agent:shield): RunningMeanStdInPlace()
              (altar:hp): RunningMeanStdInPlace()
              (altar:state): RunningMeanStdInPlace()
              (converter:hp): RunningMeanStdInPlace()
              (converter:state): RunningMeanStdInPlace()
              (generator:amount): RunningMeanStdInPlace()
              (generator:hp): RunningMeanStdInPlace()
              (generator:state): RunningMeanStdInPlace()
              (wall:hp): RunningMeanStdInPlace()
            )
          )
          (embedding_net): Sequential(
            (0): Linear(in_features=125, out_features=512, bias=True)
            (1): ELU(alpha=1.0)
            (2): Linear(in_features=512, out_features=512, bias=True)
            (3): ELU(alpha=1.0)
            (4): Linear(in_features=512, out_features=512, bias=True)
            (5): ELU(alpha=1.0)
            (6): Linear(in_features=512, out_features=512, bias=True)
            (7): ELU(alpha=1.0)
          )
        )
        (global_vars): FeatureSetEncoder(
          (_normalizer): FeatureListNormalizer(
            (_norms_dict): ModuleDict(
              (_steps): RunningMeanStdInPlace()
            )
          )
          (embedding_net): Sequential(
            (0): Linear(in_features=5, out_features=8, bias=True)
            (1): ELU(alpha=1.0)
            (2): Linear(in_features=8, out_features=8, bias=True)
            (3): ELU(alpha=1.0)
          )
        )
        (last_action): FeatureSetEncoder(
          (_normalizer): FeatureListNormalizer(
            (_norms_dict): ModuleDict(
              (last_action_id): RunningMeanStdInPlace()
              (last_action_val): RunningMeanStdInPlace()
            )
          )
          (embedding_net): Sequential(
            (0): Linear(in_features=5, out_features=8, bias=True)
            (1): ELU(alpha=1.0)
            (2): Linear(in_features=8, out_features=8, bias=True)
            (3): ELU(alpha=1.0)
          )
        )
        (last_reward): FeatureSetEncoder(
          (_normalizer): FeatureListNormalizer(
            (_norms_dict): ModuleDict(
              (last_reward): RunningMeanStdInPlace()
            )
          )
          (embedding_net): Sequential(
            (0): Linear(in_features=5, out_features=8, bias=True)
            (1): ELU(alpha=1.0)
            (2): Linear(in_features=8, out_features=8, bias=True)
            (3): ELU(alpha=1.0)
          )
        )
      )
      (merged_encoder): Sequential(
        (0): Linear(in_features=536, out_features=512, bias=True)
        (1): ELU(alpha=1.0)
        (2): Linear(in_features=512, out_features=512, bias=True)
        (3): ELU(alpha=1.0)
        (4): Linear(in_features=512, out_features=512, bias=True)
        (5): ELU(alpha=1.0)
      )
    )
    (_core): ModelCoreRNN(
      (core): GRU(512, 512)
    )
    (_decoder): Decoder(
      (mlp): Identity()
    )
    (_critic_linear): Linear(in_features=512, out_features=1, bias=True)
    (_action_parameterization): ActionParameterizationDefault(
      (distribution_linear): Linear(in_features=512, out_features=16, bias=True)
    )
  )
)
[2024-06-05 17:50:40,295][10347] Using optimizer
[2024-06-05 17:50:40,442][10347] No checkpoints found
[2024-06-05 17:50:40,442][10347] Did not load from checkpoint, starting from scratch!
[2024-06-05 17:50:40,442][10347] Initialized policy 0 weights for model version 0
[2024-06-05 17:50:40,444][10347] LearnerWorker_p0 finished initialization!
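For reference, here is a minimal PyTorch sketch of the encoder/core/head structure printed in the architecture dump above. It omits the RunningMeanStdInPlace normalizers and the Identity decoder, and the names AgentSketch and mlp are illustrative only; this is a reconstruction from the log, not the actual MettaAgent implementation.

```python
import torch
from torch import nn


def mlp(sizes):
    """Linear + ELU stack matching the embedding_net blocks printed in the log."""
    layers = []
    for i in range(len(sizes) - 1):
        layers += [nn.Linear(sizes[i], sizes[i + 1]), nn.ELU()]
    return nn.Sequential(*layers)


class AgentSketch(nn.Module):
    """Rough shape of the model above: per-feature-set encoders -> merged MLP -> GRU -> heads."""

    def __init__(self):
        super().__init__()
        self.grid_obs = mlp([125, 512, 512, 512, 512])  # grid_obs embedding_net sizes from the dump
        self.global_vars = mlp([5, 8, 8])
        self.last_action = mlp([5, 8, 8])
        self.last_reward = mlp([5, 8, 8])
        self.merged = mlp([536, 512, 512, 512])         # 512 + 8 + 8 + 8 = 536 inputs
        self.core = nn.GRU(512, 512)                    # ModelCoreRNN
        self.critic = nn.Linear(512, 1)                 # _critic_linear
        self.action_logits = nn.Linear(512, 16)         # _action_parameterization head

    def forward(self, grid, gvars, act, rew, rnn_state=None):
        x = torch.cat([self.grid_obs(grid), self.global_vars(gvars),
                       self.last_action(act), self.last_reward(rew)], dim=-1)
        x, rnn_state = self.core(self.merged(x).unsqueeze(0), rnn_state)
        x = x.squeeze(0)
        return self.critic(x), self.action_logits(x), rnn_state
```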
[2024-06-05 17:50:40,444][10347] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-06-05 17:50:41,082][10367] RunningMeanStd input shape: (11, 11) [2024-06-05 17:50:41,083][10367] RunningMeanStd input shape: (11, 11) [2024-06-05 17:50:41,083][10367] RunningMeanStd input shape: (11, 11) [2024-06-05 17:50:41,083][10367] RunningMeanStd input shape: (11, 11) [2024-06-05 17:50:41,083][10367] RunningMeanStd input shape: (11, 11) [2024-06-05 17:50:41,083][10367] RunningMeanStd input shape: (11, 11) [2024-06-05 17:50:41,083][10367] RunningMeanStd input shape: (11, 11) [2024-06-05 17:50:41,083][10367] RunningMeanStd input shape: (11, 11) [2024-06-05 17:50:41,083][10367] RunningMeanStd input shape: (11, 11) [2024-06-05 17:50:41,083][10367] RunningMeanStd input shape: (11, 11) [2024-06-05 17:50:41,083][10367] RunningMeanStd input shape: (11, 11) [2024-06-05 17:50:41,083][10367] RunningMeanStd input shape: (11, 11) [2024-06-05 17:50:41,083][10367] RunningMeanStd input shape: (11, 11) [2024-06-05 17:50:41,083][10367] RunningMeanStd input shape: (11, 11) [2024-06-05 17:50:41,083][10367] RunningMeanStd input shape: (11, 11) [2024-06-05 17:50:41,083][10367] RunningMeanStd input shape: (11, 11) [2024-06-05 17:50:41,083][10367] RunningMeanStd input shape: (11, 11) [2024-06-05 17:50:41,083][10367] RunningMeanStd input shape: (11, 11) [2024-06-05 17:50:41,083][10367] RunningMeanStd input shape: (11, 11) [2024-06-05 17:50:41,084][10367] RunningMeanStd input shape: (11, 11) [2024-06-05 17:50:41,084][10367] RunningMeanStd input shape: (11, 11) [2024-06-05 17:50:41,084][10367] RunningMeanStd input shape: (11, 11) [2024-06-05 17:50:41,087][10367] RunningMeanStd input shape: (1,) [2024-06-05 17:50:41,087][10367] RunningMeanStd input shape: (1,) [2024-06-05 17:50:41,087][10367] RunningMeanStd input shape: (1,) [2024-06-05 17:50:41,087][10367] RunningMeanStd input shape: (1,) [2024-06-05 17:50:41,126][10367] RunningMeanStd input shape: (1,) [2024-06-05 17:50:41,148][10130] Inference worker 0-0 is ready! [2024-06-05 17:50:41,148][10130] All inference workers are ready! Signal rollout workers to start! [2024-06-05 17:50:43,238][10387] Decorrelating experience for 0 frames... [2024-06-05 17:50:43,238][10389] Decorrelating experience for 0 frames... [2024-06-05 17:50:43,252][10390] Decorrelating experience for 0 frames... [2024-06-05 17:50:43,260][10385] Decorrelating experience for 0 frames... [2024-06-05 17:50:43,261][10396] Decorrelating experience for 0 frames... [2024-06-05 17:50:43,261][10388] Decorrelating experience for 0 frames... [2024-06-05 17:50:43,266][10398] Decorrelating experience for 0 frames... [2024-06-05 17:50:43,267][10384] Decorrelating experience for 0 frames... [2024-06-05 17:50:43,271][10391] Decorrelating experience for 0 frames... [2024-06-05 17:50:43,272][10393] Decorrelating experience for 0 frames... [2024-06-05 17:50:43,273][10386] Decorrelating experience for 0 frames... [2024-06-05 17:50:43,274][10399] Decorrelating experience for 0 frames... [2024-06-05 17:50:43,277][10397] Decorrelating experience for 0 frames... [2024-06-05 17:50:43,282][10369] Decorrelating experience for 0 frames... [2024-06-05 17:50:43,283][10382] Decorrelating experience for 0 frames... [2024-06-05 17:50:43,283][10379] Decorrelating experience for 0 frames... [2024-06-05 17:50:43,284][10375] Decorrelating experience for 0 frames... [2024-06-05 17:50:43,284][10377] Decorrelating experience for 0 frames... [2024-06-05 17:50:43,286][10373] Decorrelating experience for 0 frames... 
[2024-06-05 17:50:43,290][10395] Decorrelating experience for 0 frames... [2024-06-05 17:50:43,291][10371] Decorrelating experience for 0 frames... [2024-06-05 17:50:43,292][10368] Decorrelating experience for 0 frames... [2024-06-05 17:50:43,292][10380] Decorrelating experience for 0 frames... [2024-06-05 17:50:43,293][10381] Decorrelating experience for 0 frames... [2024-06-05 17:50:43,294][10372] Decorrelating experience for 0 frames... [2024-06-05 17:50:43,294][10376] Decorrelating experience for 0 frames... [2024-06-05 17:50:43,295][10374] Decorrelating experience for 0 frames... [2024-06-05 17:50:43,296][10378] Decorrelating experience for 0 frames... [2024-06-05 17:50:43,297][10370] Decorrelating experience for 0 frames... [2024-06-05 17:50:43,297][10383] Decorrelating experience for 0 frames... [2024-06-05 17:50:43,298][10392] Decorrelating experience for 0 frames... [2024-06-05 17:50:43,309][10394] Decorrelating experience for 0 frames... [2024-06-05 17:50:43,920][10130] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-06-05 17:50:43,980][10387] Decorrelating experience for 256 frames... [2024-06-05 17:50:43,984][10389] Decorrelating experience for 256 frames... [2024-06-05 17:50:44,007][10390] Decorrelating experience for 256 frames... [2024-06-05 17:50:44,017][10385] Decorrelating experience for 256 frames... [2024-06-05 17:50:44,035][10398] Decorrelating experience for 256 frames... [2024-06-05 17:50:44,037][10388] Decorrelating experience for 256 frames... [2024-06-05 17:50:44,042][10384] Decorrelating experience for 256 frames... [2024-06-05 17:50:44,043][10379] Decorrelating experience for 256 frames... [2024-06-05 17:50:44,045][10391] Decorrelating experience for 256 frames... [2024-06-05 17:50:44,050][10396] Decorrelating experience for 256 frames... [2024-06-05 17:50:44,052][10382] Decorrelating experience for 256 frames... [2024-06-05 17:50:44,053][10369] Decorrelating experience for 256 frames... [2024-06-05 17:50:44,057][10373] Decorrelating experience for 256 frames... [2024-06-05 17:50:44,058][10399] Decorrelating experience for 256 frames... [2024-06-05 17:50:44,059][10375] Decorrelating experience for 256 frames... [2024-06-05 17:50:44,061][10377] Decorrelating experience for 256 frames... [2024-06-05 17:50:44,062][10386] Decorrelating experience for 256 frames... [2024-06-05 17:50:44,066][10393] Decorrelating experience for 256 frames... [2024-06-05 17:50:44,068][10380] Decorrelating experience for 256 frames... [2024-06-05 17:50:44,069][10374] Decorrelating experience for 256 frames... [2024-06-05 17:50:44,070][10397] Decorrelating experience for 256 frames... [2024-06-05 17:50:44,071][10371] Decorrelating experience for 256 frames... [2024-06-05 17:50:44,072][10368] Decorrelating experience for 256 frames... [2024-06-05 17:50:44,074][10378] Decorrelating experience for 256 frames... [2024-06-05 17:50:44,074][10372] Decorrelating experience for 256 frames... [2024-06-05 17:50:44,074][10381] Decorrelating experience for 256 frames... [2024-06-05 17:50:44,076][10370] Decorrelating experience for 256 frames... [2024-06-05 17:50:44,078][10376] Decorrelating experience for 256 frames... [2024-06-05 17:50:44,081][10383] Decorrelating experience for 256 frames... [2024-06-05 17:50:44,086][10395] Decorrelating experience for 256 frames... [2024-06-05 17:50:44,104][10394] Decorrelating experience for 256 frames... 
[2024-06-05 17:50:44,105][10392] Decorrelating experience for 256 frames... [2024-06-05 17:50:48,920][10130] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 31084.7. Samples: 155420. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-06-05 17:50:49,852][10381] Worker 14, sleep for 65.625 sec to decorrelate experience collection [2024-06-05 17:50:49,852][10389] Worker 22, sleep for 103.125 sec to decorrelate experience collection [2024-06-05 17:50:49,852][10385] Worker 17, sleep for 79.688 sec to decorrelate experience collection [2024-06-05 17:50:49,853][10391] Worker 24, sleep for 112.500 sec to decorrelate experience collection [2024-06-05 17:50:49,862][10383] Worker 13, sleep for 60.938 sec to decorrelate experience collection [2024-06-05 17:50:49,862][10369] Worker 3, sleep for 14.062 sec to decorrelate experience collection [2024-06-05 17:50:49,862][10378] Worker 10, sleep for 46.875 sec to decorrelate experience collection [2024-06-05 17:50:49,862][10370] Worker 2, sleep for 9.375 sec to decorrelate experience collection [2024-06-05 17:50:49,863][10390] Worker 23, sleep for 107.812 sec to decorrelate experience collection [2024-06-05 17:50:49,863][10384] Worker 16, sleep for 75.000 sec to decorrelate experience collection [2024-06-05 17:50:49,863][10387] Worker 19, sleep for 89.062 sec to decorrelate experience collection [2024-06-05 17:50:49,863][10399] Worker 30, sleep for 140.625 sec to decorrelate experience collection [2024-06-05 17:50:49,874][10388] Worker 20, sleep for 93.750 sec to decorrelate experience collection [2024-06-05 17:50:49,874][10398] Worker 31, sleep for 145.312 sec to decorrelate experience collection [2024-06-05 17:50:49,874][10386] Worker 18, sleep for 84.375 sec to decorrelate experience collection [2024-06-05 17:50:49,874][10397] Worker 29, sleep for 135.938 sec to decorrelate experience collection [2024-06-05 17:50:49,875][10396] Worker 27, sleep for 126.562 sec to decorrelate experience collection [2024-06-05 17:50:49,880][10380] Worker 12, sleep for 56.250 sec to decorrelate experience collection [2024-06-05 17:50:49,880][10382] Worker 15, sleep for 70.312 sec to decorrelate experience collection [2024-06-05 17:50:49,881][10376] Worker 8, sleep for 37.500 sec to decorrelate experience collection [2024-06-05 17:50:49,881][10377] Worker 9, sleep for 42.188 sec to decorrelate experience collection [2024-06-05 17:50:49,887][10371] Worker 1, sleep for 4.688 sec to decorrelate experience collection [2024-06-05 17:50:49,891][10379] Worker 11, sleep for 51.562 sec to decorrelate experience collection [2024-06-05 17:50:49,893][10393] Worker 25, sleep for 117.188 sec to decorrelate experience collection [2024-06-05 17:50:49,893][10395] Worker 28, sleep for 131.250 sec to decorrelate experience collection [2024-06-05 17:50:49,899][10374] Worker 6, sleep for 28.125 sec to decorrelate experience collection [2024-06-05 17:50:49,900][10394] Worker 26, sleep for 121.875 sec to decorrelate experience collection [2024-06-05 17:50:49,907][10375] Worker 7, sleep for 32.812 sec to decorrelate experience collection [2024-06-05 17:50:49,908][10392] Worker 21, sleep for 98.438 sec to decorrelate experience collection [2024-06-05 17:50:49,941][10372] Worker 4, sleep for 18.750 sec to decorrelate experience collection [2024-06-05 17:50:49,943][10373] Worker 5, sleep for 23.438 sec to decorrelate experience collection [2024-06-05 17:50:49,958][10347] Signal inference workers to stop experience collection... 
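The per-worker sleeps logged above are evenly staggered: worker i sleeps roughly i * 4.69 s (worker 1: 4.688 s, worker 22: 103.125 s). A sketch of that spacing, assuming a ~150 s window spread across the 32 rollout workers; the constant is inferred from the numbers above, not taken from the framework's code:

```python
# Staggered start delays so rollout workers do not collect experience in lockstep.
num_workers = 32
decorrelation_window_s = 150.0  # assumed; chosen to reproduce the delays in the log
delays = [i * decorrelation_window_s / num_workers for i in range(num_workers)]
print(delays[1], delays[22])  # 4.6875 103.125 -- matches workers 1 and 22 above
```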
[2024-06-05 17:50:50,001][10367] InferenceWorker_p0-w0: stopping experience collection [2024-06-05 17:50:50,489][10347] Signal inference workers to resume experience collection... [2024-06-05 17:50:50,489][10367] InferenceWorker_p0-w0: resuming experience collection [2024-06-05 17:50:51,580][10367] Updated weights for policy 0, policy_version 10 (0.0012) [2024-06-05 17:50:53,920][10130] Fps is (10 sec: 16383.9, 60 sec: 16383.9, 300 sec: 16383.9). Total num frames: 163840. Throughput: 0: 33031.8. Samples: 330320. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-06-05 17:50:54,598][10371] Worker 1 awakens! [2024-06-05 17:50:56,627][10130] Heartbeat connected on Batcher_0 [2024-06-05 17:50:56,629][10130] Heartbeat connected on LearnerWorker_p0 [2024-06-05 17:50:56,640][10130] Heartbeat connected on RolloutWorker_w0 [2024-06-05 17:50:56,641][10130] Heartbeat connected on RolloutWorker_w1 [2024-06-05 17:50:56,697][10130] Heartbeat connected on InferenceWorker_p0-w0 [2024-06-05 17:50:58,920][10130] Fps is (10 sec: 16383.9, 60 sec: 10922.7, 300 sec: 10922.7). Total num frames: 163840. Throughput: 0: 22352.1. Samples: 335280. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-06-05 17:50:59,283][10370] Worker 2 awakens! [2024-06-05 17:50:59,289][10130] Heartbeat connected on RolloutWorker_w2 [2024-06-05 17:51:03,920][10130] Fps is (10 sec: 1638.4, 60 sec: 9011.1, 300 sec: 9011.1). Total num frames: 180224. Throughput: 0: 17577.7. Samples: 351560. Policy #0 lag: (min: 0.0, avg: 9.4, max: 10.0) [2024-06-05 17:51:03,995][10369] Worker 3 awakens! [2024-06-05 17:51:04,000][10130] Heartbeat connected on RolloutWorker_w3 [2024-06-05 17:51:08,714][10372] Worker 4 awakens! [2024-06-05 17:51:08,722][10130] Heartbeat connected on RolloutWorker_w4 [2024-06-05 17:51:08,920][10130] Fps is (10 sec: 4915.3, 60 sec: 8519.8, 300 sec: 8519.8). Total num frames: 212992. Throughput: 0: 15021.0. Samples: 375520. Policy #0 lag: (min: 0.0, avg: 7.9, max: 12.0) [2024-06-05 17:51:13,477][10373] Worker 5 awakens! [2024-06-05 17:51:13,481][10130] Heartbeat connected on RolloutWorker_w5 [2024-06-05 17:51:13,920][10130] Fps is (10 sec: 11469.5, 60 sec: 9830.5, 300 sec: 9830.5). Total num frames: 294912. Throughput: 0: 14072.2. Samples: 422160. Policy #0 lag: (min: 0.0, avg: 1.2, max: 4.0) [2024-06-05 17:51:13,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:51:13,934][10347] Saving new best policy, reward=0.000! [2024-06-05 17:51:15,108][10367] Updated weights for policy 0, policy_version 20 (0.0013) [2024-06-05 17:51:18,127][10374] Worker 6 awakens! [2024-06-05 17:51:18,131][10130] Heartbeat connected on RolloutWorker_w6 [2024-06-05 17:51:18,920][10130] Fps is (10 sec: 18022.3, 60 sec: 11234.8, 300 sec: 11234.8). Total num frames: 393216. Throughput: 0: 15283.5. Samples: 534920. Policy #0 lag: (min: 0.0, avg: 2.0, max: 5.0) [2024-06-05 17:51:18,928][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:51:21,984][10367] Updated weights for policy 0, policy_version 30 (0.0011) [2024-06-05 17:51:22,820][10375] Worker 7 awakens! [2024-06-05 17:51:22,825][10130] Heartbeat connected on RolloutWorker_w7 [2024-06-05 17:51:23,920][10130] Fps is (10 sec: 24576.0, 60 sec: 13516.9, 300 sec: 13516.9). Total num frames: 540672. Throughput: 0: 17248.1. Samples: 689920. Policy #0 lag: (min: 0.0, avg: 2.7, max: 5.0) [2024-06-05 17:51:23,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:51:27,479][10376] Worker 8 awakens! 
[2024-06-05 17:51:27,483][10130] Heartbeat connected on RolloutWorker_w8 [2024-06-05 17:51:28,055][10367] Updated weights for policy 0, policy_version 40 (0.0011) [2024-06-05 17:51:28,920][10130] Fps is (10 sec: 29491.3, 60 sec: 15291.8, 300 sec: 15291.8). Total num frames: 688128. Throughput: 0: 17122.7. Samples: 770520. Policy #0 lag: (min: 0.0, avg: 11.7, max: 38.0) [2024-06-05 17:51:28,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:51:32,168][10377] Worker 9 awakens! [2024-06-05 17:51:32,174][10130] Heartbeat connected on RolloutWorker_w9 [2024-06-05 17:51:32,991][10367] Updated weights for policy 0, policy_version 50 (0.0012) [2024-06-05 17:51:33,920][10130] Fps is (10 sec: 27852.8, 60 sec: 16384.1, 300 sec: 16384.1). Total num frames: 819200. Throughput: 0: 17571.6. Samples: 946140. Policy #0 lag: (min: 0.0, avg: 3.9, max: 8.0) [2024-06-05 17:51:33,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:51:36,838][10378] Worker 10 awakens! [2024-06-05 17:51:36,842][10130] Heartbeat connected on RolloutWorker_w10 [2024-06-05 17:51:38,322][10367] Updated weights for policy 0, policy_version 60 (0.0011) [2024-06-05 17:51:38,920][10130] Fps is (10 sec: 31129.9, 60 sec: 18171.4, 300 sec: 18171.4). Total num frames: 999424. Throughput: 0: 18054.8. Samples: 1142780. Policy #0 lag: (min: 0.0, avg: 19.8, max: 56.0) [2024-06-05 17:51:38,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:51:41,554][10379] Worker 11 awakens! [2024-06-05 17:51:41,560][10130] Heartbeat connected on RolloutWorker_w11 [2024-06-05 17:51:42,019][10367] Updated weights for policy 0, policy_version 70 (0.0013) [2024-06-05 17:51:43,920][10130] Fps is (10 sec: 39321.3, 60 sec: 20207.0, 300 sec: 20207.0). Total num frames: 1212416. Throughput: 0: 20612.5. Samples: 1262840. Policy #0 lag: (min: 0.0, avg: 24.3, max: 68.0) [2024-06-05 17:51:43,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:51:46,224][10367] Updated weights for policy 0, policy_version 80 (0.0013) [2024-06-05 17:51:46,227][10380] Worker 12 awakens! [2024-06-05 17:51:46,232][10130] Heartbeat connected on RolloutWorker_w12 [2024-06-05 17:51:48,920][10130] Fps is (10 sec: 39321.2, 60 sec: 23210.7, 300 sec: 21425.3). Total num frames: 1392640. Throughput: 0: 25902.5. Samples: 1517160. Policy #0 lag: (min: 1.0, avg: 3.3, max: 9.0) [2024-06-05 17:51:48,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:51:49,984][10367] Updated weights for policy 0, policy_version 90 (0.0016) [2024-06-05 17:51:50,900][10383] Worker 13 awakens! [2024-06-05 17:51:50,906][10130] Heartbeat connected on RolloutWorker_w13 [2024-06-05 17:51:53,525][10367] Updated weights for policy 0, policy_version 100 (0.0016) [2024-06-05 17:51:53,920][10130] Fps is (10 sec: 44236.5, 60 sec: 24849.2, 300 sec: 23639.8). Total num frames: 1654784. Throughput: 0: 31277.7. Samples: 1783020. Policy #0 lag: (min: 0.0, avg: 4.3, max: 11.0) [2024-06-05 17:51:53,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:51:55,577][10381] Worker 14 awakens! [2024-06-05 17:51:55,582][10130] Heartbeat connected on RolloutWorker_w14 [2024-06-05 17:51:57,303][10367] Updated weights for policy 0, policy_version 110 (0.0018) [2024-06-05 17:51:58,920][10130] Fps is (10 sec: 44236.9, 60 sec: 27852.9, 300 sec: 24466.8). Total num frames: 1835008. Throughput: 0: 33310.1. Samples: 1921120. Policy #0 lag: (min: 0.0, avg: 6.0, max: 10.0) [2024-06-05 17:51:58,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:52:00,291][10382] Worker 15 awakens! 
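The recurring "Fps is (...)" entries report rolling throughput and the total frame count. A quick way to pull them out of the raw log for plotting is a regex sketch like the following; parse_fps and the file name are illustrative helpers, not part of the training code:

```python
import re

# Matches entries such as:
# [2024-06-05 17:50:53,920][10130] Fps is (10 sec: 16383.9, 60 sec: 16383.9, 300 sec: 16383.9). Total num frames: 163840.
FPS_RE = re.compile(
    r"\[(?P<ts>[\d\- :,]+)\]\[\d+\] Fps is \(10 sec: (?P<fps10>[\d.na]+), "
    r"60 sec: (?P<fps60>[\d.na]+), 300 sec: (?P<fps300>[\d.na]+)\)\. "
    r"Total num frames: (?P<frames>\d+)"
)

def parse_fps(log_text: str):
    """Return (timestamp, fps_10s, total_frames) tuples from a Sample Factory-style log."""
    return [(m.group("ts"), m.group("fps10"), int(m.group("frames")))
            for m in FPS_RE.finditer(log_text)]

# Example: rows = parse_fps(open("sf_log.txt").read())
```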
[2024-06-05 17:52:00,296][10130] Heartbeat connected on RolloutWorker_w15 [2024-06-05 17:52:00,836][10367] Updated weights for policy 0, policy_version 120 (0.0019) [2024-06-05 17:52:03,920][10130] Fps is (10 sec: 40960.1, 60 sec: 31402.9, 300 sec: 25804.9). Total num frames: 2064384. Throughput: 0: 36505.3. Samples: 2177660. Policy #0 lag: (min: 0.0, avg: 5.8, max: 11.0) [2024-06-05 17:52:03,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:52:04,663][10367] Updated weights for policy 0, policy_version 130 (0.0022) [2024-06-05 17:52:04,963][10384] Worker 16 awakens! [2024-06-05 17:52:04,972][10130] Heartbeat connected on RolloutWorker_w16 [2024-06-05 17:52:08,842][10367] Updated weights for policy 0, policy_version 140 (0.0020) [2024-06-05 17:52:08,920][10130] Fps is (10 sec: 45874.9, 60 sec: 34679.4, 300 sec: 26985.5). Total num frames: 2293760. Throughput: 0: 38434.5. Samples: 2419480. Policy #0 lag: (min: 0.0, avg: 4.0, max: 11.0) [2024-06-05 17:52:08,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:52:09,640][10385] Worker 17 awakens! [2024-06-05 17:52:09,649][10130] Heartbeat connected on RolloutWorker_w17 [2024-06-05 17:52:12,890][10367] Updated weights for policy 0, policy_version 150 (0.0019) [2024-06-05 17:52:13,920][10130] Fps is (10 sec: 44236.4, 60 sec: 36863.9, 300 sec: 27852.8). Total num frames: 2506752. Throughput: 0: 39530.6. Samples: 2549400. Policy #0 lag: (min: 0.0, avg: 5.1, max: 13.0) [2024-06-05 17:52:13,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:52:14,347][10386] Worker 18 awakens! [2024-06-05 17:52:14,356][10130] Heartbeat connected on RolloutWorker_w18 [2024-06-05 17:52:17,039][10367] Updated weights for policy 0, policy_version 160 (0.0026) [2024-06-05 17:52:18,920][10130] Fps is (10 sec: 42598.3, 60 sec: 38775.4, 300 sec: 28628.9). Total num frames: 2719744. Throughput: 0: 41306.5. Samples: 2804940. Policy #0 lag: (min: 0.0, avg: 7.7, max: 13.0) [2024-06-05 17:52:18,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:52:19,023][10387] Worker 19 awakens! [2024-06-05 17:52:19,033][10130] Heartbeat connected on RolloutWorker_w19 [2024-06-05 17:52:21,040][10367] Updated weights for policy 0, policy_version 170 (0.0025) [2024-06-05 17:52:23,680][10388] Worker 20 awakens! [2024-06-05 17:52:23,689][10130] Heartbeat connected on RolloutWorker_w20 [2024-06-05 17:52:23,920][10130] Fps is (10 sec: 42598.8, 60 sec: 39867.7, 300 sec: 29327.4). Total num frames: 2932736. Throughput: 0: 42789.7. Samples: 3068320. Policy #0 lag: (min: 0.0, avg: 7.7, max: 13.0) [2024-06-05 17:52:23,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:52:24,132][10367] Updated weights for policy 0, policy_version 180 (0.0023) [2024-06-05 17:52:27,730][10367] Updated weights for policy 0, policy_version 190 (0.0025) [2024-06-05 17:52:28,443][10392] Worker 21 awakens! [2024-06-05 17:52:28,452][10130] Heartbeat connected on RolloutWorker_w21 [2024-06-05 17:52:28,920][10130] Fps is (10 sec: 44237.1, 60 sec: 41233.1, 300 sec: 30115.4). Total num frames: 3162112. Throughput: 0: 43200.9. Samples: 3206880. Policy #0 lag: (min: 0.0, avg: 7.8, max: 14.0) [2024-06-05 17:52:28,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:52:31,659][10367] Updated weights for policy 0, policy_version 200 (0.0019) [2024-06-05 17:52:33,077][10389] Worker 22 awakens! [2024-06-05 17:52:33,088][10130] Heartbeat connected on RolloutWorker_w22 [2024-06-05 17:52:33,920][10130] Fps is (10 sec: 45874.8, 60 sec: 42871.3, 300 sec: 30831.7). 
Total num frames: 3391488. Throughput: 0: 43431.0. Samples: 3471560. Policy #0 lag: (min: 0.0, avg: 69.3, max: 204.0) [2024-06-05 17:52:33,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:52:33,927][10347] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000000207_3391488.pth... [2024-06-05 17:52:35,477][10367] Updated weights for policy 0, policy_version 210 (0.0026) [2024-06-05 17:52:37,775][10390] Worker 23 awakens! [2024-06-05 17:52:37,787][10130] Heartbeat connected on RolloutWorker_w23 [2024-06-05 17:52:38,423][10367] Updated weights for policy 0, policy_version 220 (0.0020) [2024-06-05 17:52:38,920][10130] Fps is (10 sec: 45875.3, 60 sec: 43690.6, 300 sec: 31485.8). Total num frames: 3620864. Throughput: 0: 43491.2. Samples: 3740120. Policy #0 lag: (min: 0.0, avg: 8.7, max: 15.0) [2024-06-05 17:52:38,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:52:42,383][10367] Updated weights for policy 0, policy_version 230 (0.0019) [2024-06-05 17:52:42,413][10391] Worker 24 awakens! [2024-06-05 17:52:42,424][10130] Heartbeat connected on RolloutWorker_w24 [2024-06-05 17:52:43,920][10130] Fps is (10 sec: 45875.4, 60 sec: 43963.7, 300 sec: 32085.4). Total num frames: 3850240. Throughput: 0: 43715.0. Samples: 3888300. Policy #0 lag: (min: 0.0, avg: 8.1, max: 17.0) [2024-06-05 17:52:43,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:52:46,053][10367] Updated weights for policy 0, policy_version 240 (0.0021) [2024-06-05 17:52:47,179][10393] Worker 25 awakens! [2024-06-05 17:52:47,190][10130] Heartbeat connected on RolloutWorker_w25 [2024-06-05 17:52:48,920][10130] Fps is (10 sec: 45873.6, 60 sec: 44782.7, 300 sec: 32636.9). Total num frames: 4079616. Throughput: 0: 44058.4. Samples: 4160300. Policy #0 lag: (min: 0.0, avg: 7.9, max: 17.0) [2024-06-05 17:52:48,921][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:52:49,131][10367] Updated weights for policy 0, policy_version 250 (0.0024) [2024-06-05 17:52:51,873][10394] Worker 26 awakens! [2024-06-05 17:52:51,885][10130] Heartbeat connected on RolloutWorker_w26 [2024-06-05 17:52:53,063][10367] Updated weights for policy 0, policy_version 260 (0.0025) [2024-06-05 17:52:53,920][10130] Fps is (10 sec: 44237.0, 60 sec: 43963.8, 300 sec: 33020.1). Total num frames: 4292608. Throughput: 0: 45023.6. Samples: 4445540. Policy #0 lag: (min: 0.0, avg: 8.0, max: 19.0) [2024-06-05 17:52:53,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:52:56,038][10367] Updated weights for policy 0, policy_version 270 (0.0028) [2024-06-05 17:52:56,528][10396] Worker 27 awakens! [2024-06-05 17:52:56,540][10130] Heartbeat connected on RolloutWorker_w27 [2024-06-05 17:52:58,920][10130] Fps is (10 sec: 45876.6, 60 sec: 45056.0, 300 sec: 33617.6). Total num frames: 4538368. Throughput: 0: 45173.0. Samples: 4582180. Policy #0 lag: (min: 0.0, avg: 7.9, max: 18.0) [2024-06-05 17:52:58,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:52:59,653][10367] Updated weights for policy 0, policy_version 280 (0.0025) [2024-06-05 17:53:01,243][10395] Worker 28 awakens! [2024-06-05 17:53:01,255][10130] Heartbeat connected on RolloutWorker_w28 [2024-06-05 17:53:03,198][10367] Updated weights for policy 0, policy_version 290 (0.0023) [2024-06-05 17:53:03,920][10130] Fps is (10 sec: 47513.1, 60 sec: 45055.9, 300 sec: 34055.3). Total num frames: 4767744. Throughput: 0: 46026.6. Samples: 4876140. 
Policy #0 lag: (min: 0.0, avg: 9.4, max: 19.0) [2024-06-05 17:53:03,929][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:53:05,911][10397] Worker 29 awakens! [2024-06-05 17:53:05,925][10130] Heartbeat connected on RolloutWorker_w29 [2024-06-05 17:53:06,251][10367] Updated weights for policy 0, policy_version 300 (0.0032) [2024-06-05 17:53:08,920][10130] Fps is (10 sec: 49151.1, 60 sec: 45602.0, 300 sec: 34688.9). Total num frames: 5029888. Throughput: 0: 46486.0. Samples: 5160200. Policy #0 lag: (min: 0.0, avg: 11.3, max: 20.0) [2024-06-05 17:53:08,921][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:53:10,432][10367] Updated weights for policy 0, policy_version 310 (0.0028) [2024-06-05 17:53:10,543][10399] Worker 30 awakens! [2024-06-05 17:53:10,565][10130] Heartbeat connected on RolloutWorker_w30 [2024-06-05 17:53:10,683][10347] Signal inference workers to stop experience collection... (50 times) [2024-06-05 17:53:10,722][10367] InferenceWorker_p0-w0: stopping experience collection (50 times) [2024-06-05 17:53:10,731][10347] Signal inference workers to resume experience collection... (50 times) [2024-06-05 17:53:10,740][10367] InferenceWorker_p0-w0: resuming experience collection (50 times) [2024-06-05 17:53:13,174][10367] Updated weights for policy 0, policy_version 320 (0.0032) [2024-06-05 17:53:13,920][10130] Fps is (10 sec: 49153.1, 60 sec: 45875.4, 300 sec: 35061.8). Total num frames: 5259264. Throughput: 0: 46835.7. Samples: 5314480. Policy #0 lag: (min: 0.0, avg: 10.5, max: 20.0) [2024-06-05 17:53:13,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:53:15,287][10398] Worker 31 awakens! [2024-06-05 17:53:15,301][10130] Heartbeat connected on RolloutWorker_w31 [2024-06-05 17:53:16,975][10367] Updated weights for policy 0, policy_version 330 (0.0032) [2024-06-05 17:53:18,920][10130] Fps is (10 sec: 47514.9, 60 sec: 46421.4, 300 sec: 35516.3). Total num frames: 5505024. Throughput: 0: 47489.1. Samples: 5608560. Policy #0 lag: (min: 0.0, avg: 9.6, max: 20.0) [2024-06-05 17:53:18,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:53:19,731][10367] Updated weights for policy 0, policy_version 340 (0.0036) [2024-06-05 17:53:23,435][10367] Updated weights for policy 0, policy_version 350 (0.0032) [2024-06-05 17:53:23,920][10130] Fps is (10 sec: 47512.8, 60 sec: 46694.4, 300 sec: 35840.0). Total num frames: 5734400. Throughput: 0: 48076.3. Samples: 5903560. Policy #0 lag: (min: 0.0, avg: 10.5, max: 21.0) [2024-06-05 17:53:23,921][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:53:26,264][10367] Updated weights for policy 0, policy_version 360 (0.0030) [2024-06-05 17:53:28,920][10130] Fps is (10 sec: 50790.6, 60 sec: 47513.7, 300 sec: 36442.1). Total num frames: 6012928. Throughput: 0: 47960.2. Samples: 6046500. Policy #0 lag: (min: 0.0, avg: 10.4, max: 20.0) [2024-06-05 17:53:28,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:53:30,004][10367] Updated weights for policy 0, policy_version 370 (0.0028) [2024-06-05 17:53:32,795][10367] Updated weights for policy 0, policy_version 380 (0.0022) [2024-06-05 17:53:33,921][10130] Fps is (10 sec: 54059.4, 60 sec: 48058.6, 300 sec: 36911.9). Total num frames: 6275072. Throughput: 0: 48709.4. Samples: 6352280. 
Policy #0 lag: (min: 1.0, avg: 10.2, max: 21.0) [2024-06-05 17:53:33,922][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:53:36,694][10367] Updated weights for policy 0, policy_version 390 (0.0028) [2024-06-05 17:53:38,920][10130] Fps is (10 sec: 50789.8, 60 sec: 48332.8, 300 sec: 37261.9). Total num frames: 6520832. Throughput: 0: 49069.8. Samples: 6653680. Policy #0 lag: (min: 0.0, avg: 9.8, max: 22.0) [2024-06-05 17:53:38,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:53:39,335][10367] Updated weights for policy 0, policy_version 400 (0.0025) [2024-06-05 17:53:43,045][10367] Updated weights for policy 0, policy_version 410 (0.0026) [2024-06-05 17:53:43,920][10130] Fps is (10 sec: 47520.7, 60 sec: 48332.8, 300 sec: 37501.2). Total num frames: 6750208. Throughput: 0: 49386.2. Samples: 6804560. Policy #0 lag: (min: 0.0, avg: 10.2, max: 22.0) [2024-06-05 17:53:43,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:53:45,842][10367] Updated weights for policy 0, policy_version 420 (0.0029) [2024-06-05 17:53:48,923][10130] Fps is (10 sec: 49136.1, 60 sec: 48876.6, 300 sec: 37904.0). Total num frames: 7012352. Throughput: 0: 49522.8. Samples: 7104820. Policy #0 lag: (min: 0.0, avg: 11.0, max: 20.0) [2024-06-05 17:53:48,924][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:53:49,402][10367] Updated weights for policy 0, policy_version 430 (0.0031) [2024-06-05 17:53:52,443][10367] Updated weights for policy 0, policy_version 440 (0.0036) [2024-06-05 17:53:53,920][10130] Fps is (10 sec: 52428.6, 60 sec: 49698.1, 300 sec: 38286.9). Total num frames: 7274496. Throughput: 0: 49723.7. Samples: 7397760. Policy #0 lag: (min: 0.0, avg: 11.5, max: 22.0) [2024-06-05 17:53:53,921][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:53:56,259][10367] Updated weights for policy 0, policy_version 450 (0.0036) [2024-06-05 17:53:58,920][10130] Fps is (10 sec: 50806.2, 60 sec: 49698.1, 300 sec: 38565.4). Total num frames: 7520256. Throughput: 0: 49711.3. Samples: 7551500. Policy #0 lag: (min: 0.0, avg: 8.9, max: 23.0) [2024-06-05 17:53:58,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:53:59,274][10367] Updated weights for policy 0, policy_version 460 (0.0036) [2024-06-05 17:54:02,961][10367] Updated weights for policy 0, policy_version 470 (0.0028) [2024-06-05 17:54:03,920][10130] Fps is (10 sec: 47513.7, 60 sec: 49698.2, 300 sec: 38748.2). Total num frames: 7749632. Throughput: 0: 49790.6. Samples: 7849140. Policy #0 lag: (min: 0.0, avg: 10.5, max: 21.0) [2024-06-05 17:54:03,931][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:54:05,740][10367] Updated weights for policy 0, policy_version 480 (0.0029) [2024-06-05 17:54:08,920][10130] Fps is (10 sec: 47514.6, 60 sec: 49425.3, 300 sec: 39002.0). Total num frames: 7995392. Throughput: 0: 49769.5. Samples: 8143180. Policy #0 lag: (min: 0.0, avg: 10.8, max: 20.0) [2024-06-05 17:54:08,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:54:09,597][10367] Updated weights for policy 0, policy_version 490 (0.0044) [2024-06-05 17:54:12,403][10367] Updated weights for policy 0, policy_version 500 (0.0034) [2024-06-05 17:54:13,920][10130] Fps is (10 sec: 52428.8, 60 sec: 50244.2, 300 sec: 39399.7). Total num frames: 8273920. Throughput: 0: 49845.6. Samples: 8289560. 
Policy #0 lag: (min: 0.0, avg: 10.3, max: 20.0) [2024-06-05 17:54:13,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:54:16,010][10367] Updated weights for policy 0, policy_version 510 (0.0021) [2024-06-05 17:54:18,927][10130] Fps is (10 sec: 50753.2, 60 sec: 49965.1, 300 sec: 39548.9). Total num frames: 8503296. Throughput: 0: 49701.2. Samples: 8589120. Policy #0 lag: (min: 0.0, avg: 9.9, max: 22.0) [2024-06-05 17:54:18,928][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:54:19,106][10367] Updated weights for policy 0, policy_version 520 (0.0035) [2024-06-05 17:54:22,790][10367] Updated weights for policy 0, policy_version 530 (0.0026) [2024-06-05 17:54:23,920][10130] Fps is (10 sec: 44237.1, 60 sec: 49698.2, 300 sec: 39619.5). Total num frames: 8716288. Throughput: 0: 49539.6. Samples: 8882960. Policy #0 lag: (min: 0.0, avg: 9.8, max: 21.0) [2024-06-05 17:54:23,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:54:25,967][10367] Updated weights for policy 0, policy_version 540 (0.0030) [2024-06-05 17:54:28,920][10130] Fps is (10 sec: 47548.5, 60 sec: 49425.1, 300 sec: 39904.2). Total num frames: 8978432. Throughput: 0: 49383.7. Samples: 9026820. Policy #0 lag: (min: 0.0, avg: 11.0, max: 22.0) [2024-06-05 17:54:28,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:54:29,482][10367] Updated weights for policy 0, policy_version 550 (0.0026) [2024-06-05 17:54:32,533][10367] Updated weights for policy 0, policy_version 560 (0.0028) [2024-06-05 17:54:33,532][10347] Signal inference workers to stop experience collection... (100 times) [2024-06-05 17:54:33,573][10367] InferenceWorker_p0-w0: stopping experience collection (100 times) [2024-06-05 17:54:33,583][10347] Signal inference workers to resume experience collection... (100 times) [2024-06-05 17:54:33,592][10367] InferenceWorker_p0-w0: resuming experience collection (100 times) [2024-06-05 17:54:33,920][10130] Fps is (10 sec: 54066.6, 60 sec: 49699.3, 300 sec: 40247.7). Total num frames: 9256960. Throughput: 0: 49301.7. Samples: 9323240. Policy #0 lag: (min: 0.0, avg: 11.0, max: 21.0) [2024-06-05 17:54:33,921][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:54:33,942][10347] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000000565_9256960.pth... [2024-06-05 17:54:36,002][10367] Updated weights for policy 0, policy_version 570 (0.0028) [2024-06-05 17:54:38,920][10130] Fps is (10 sec: 50789.8, 60 sec: 49425.1, 300 sec: 40367.4). Total num frames: 9486336. Throughput: 0: 49438.7. Samples: 9622500. Policy #0 lag: (min: 0.0, avg: 8.8, max: 21.0) [2024-06-05 17:54:38,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:54:39,063][10367] Updated weights for policy 0, policy_version 580 (0.0030) [2024-06-05 17:54:42,408][10367] Updated weights for policy 0, policy_version 590 (0.0030) [2024-06-05 17:54:43,920][10130] Fps is (10 sec: 45875.5, 60 sec: 49425.1, 300 sec: 40482.2). Total num frames: 9715712. Throughput: 0: 49290.8. Samples: 9769580. Policy #0 lag: (min: 0.0, avg: 9.4, max: 21.0) [2024-06-05 17:54:43,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:54:45,622][10367] Updated weights for policy 0, policy_version 600 (0.0025) [2024-06-05 17:54:48,920][10130] Fps is (10 sec: 49151.9, 60 sec: 49427.7, 300 sec: 40726.0). Total num frames: 9977856. Throughput: 0: 49204.0. Samples: 10063320. 
Policy #0 lag: (min: 0.0, avg: 11.4, max: 23.0) [2024-06-05 17:54:48,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:54:49,144][10367] Updated weights for policy 0, policy_version 610 (0.0030) [2024-06-05 17:54:52,361][10367] Updated weights for policy 0, policy_version 620 (0.0033) [2024-06-05 17:54:53,920][10130] Fps is (10 sec: 54068.1, 60 sec: 49698.3, 300 sec: 41025.6). Total num frames: 10256384. Throughput: 0: 49302.7. Samples: 10361800. Policy #0 lag: (min: 0.0, avg: 10.4, max: 20.0) [2024-06-05 17:54:53,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:54:55,848][10367] Updated weights for policy 0, policy_version 630 (0.0040) [2024-06-05 17:54:58,894][10367] Updated weights for policy 0, policy_version 640 (0.0035) [2024-06-05 17:54:58,920][10130] Fps is (10 sec: 50790.6, 60 sec: 49425.2, 300 sec: 41120.7). Total num frames: 10485760. Throughput: 0: 49504.9. Samples: 10517280. Policy #0 lag: (min: 0.0, avg: 10.5, max: 21.0) [2024-06-05 17:54:58,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:55:02,325][10367] Updated weights for policy 0, policy_version 650 (0.0027) [2024-06-05 17:55:03,923][10130] Fps is (10 sec: 45858.1, 60 sec: 49422.1, 300 sec: 41211.5). Total num frames: 10715136. Throughput: 0: 49470.2. Samples: 10815100. Policy #0 lag: (min: 0.0, avg: 8.8, max: 21.0) [2024-06-05 17:55:03,925][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:55:05,429][10367] Updated weights for policy 0, policy_version 660 (0.0026) [2024-06-05 17:55:08,852][10367] Updated weights for policy 0, policy_version 670 (0.0030) [2024-06-05 17:55:08,920][10130] Fps is (10 sec: 49152.0, 60 sec: 49698.0, 300 sec: 41423.7). Total num frames: 10977280. Throughput: 0: 49404.8. Samples: 11106180. Policy #0 lag: (min: 1.0, avg: 10.5, max: 21.0) [2024-06-05 17:55:08,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:55:12,167][10367] Updated weights for policy 0, policy_version 680 (0.0028) [2024-06-05 17:55:13,920][10130] Fps is (10 sec: 50809.2, 60 sec: 49152.1, 300 sec: 41566.9). Total num frames: 11223040. Throughput: 0: 49593.3. Samples: 11258520. Policy #0 lag: (min: 0.0, avg: 9.6, max: 23.0) [2024-06-05 17:55:13,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:55:15,511][10367] Updated weights for policy 0, policy_version 690 (0.0035) [2024-06-05 17:55:18,826][10367] Updated weights for policy 0, policy_version 700 (0.0037) [2024-06-05 17:55:18,920][10130] Fps is (10 sec: 49151.9, 60 sec: 49431.0, 300 sec: 41704.8). Total num frames: 11468800. Throughput: 0: 49773.8. Samples: 11563060. Policy #0 lag: (min: 0.0, avg: 11.1, max: 23.0) [2024-06-05 17:55:18,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:55:21,997][10367] Updated weights for policy 0, policy_version 710 (0.0028) [2024-06-05 17:55:23,920][10130] Fps is (10 sec: 47513.8, 60 sec: 49698.2, 300 sec: 41779.3). Total num frames: 11698176. Throughput: 0: 49812.2. Samples: 11864040. Policy #0 lag: (min: 1.0, avg: 10.1, max: 21.0) [2024-06-05 17:55:23,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:55:25,295][10367] Updated weights for policy 0, policy_version 720 (0.0025) [2024-06-05 17:55:28,569][10367] Updated weights for policy 0, policy_version 730 (0.0031) [2024-06-05 17:55:28,920][10130] Fps is (10 sec: 49151.9, 60 sec: 49698.0, 300 sec: 41966.1). Total num frames: 11960320. Throughput: 0: 49663.1. Samples: 12004420. 
Policy #0 lag: (min: 0.0, avg: 10.5, max: 20.0) [2024-06-05 17:55:28,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:55:31,724][10367] Updated weights for policy 0, policy_version 740 (0.0029) [2024-06-05 17:55:33,920][10130] Fps is (10 sec: 52428.5, 60 sec: 49425.2, 300 sec: 42146.5). Total num frames: 12222464. Throughput: 0: 49892.1. Samples: 12308460. Policy #0 lag: (min: 0.0, avg: 10.7, max: 21.0) [2024-06-05 17:55:33,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:55:35,221][10367] Updated weights for policy 0, policy_version 750 (0.0043) [2024-06-05 17:55:38,628][10367] Updated weights for policy 0, policy_version 760 (0.0028) [2024-06-05 17:55:38,920][10130] Fps is (10 sec: 50791.1, 60 sec: 49698.2, 300 sec: 42265.2). Total num frames: 12468224. Throughput: 0: 49796.4. Samples: 12602640. Policy #0 lag: (min: 0.0, avg: 9.4, max: 21.0) [2024-06-05 17:55:38,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:55:41,835][10367] Updated weights for policy 0, policy_version 770 (0.0027) [2024-06-05 17:55:43,920][10130] Fps is (10 sec: 47513.2, 60 sec: 49698.1, 300 sec: 43042.7). Total num frames: 12697600. Throughput: 0: 49536.5. Samples: 12746420. Policy #0 lag: (min: 0.0, avg: 10.7, max: 21.0) [2024-06-05 17:55:43,929][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:55:45,160][10367] Updated weights for policy 0, policy_version 780 (0.0027) [2024-06-05 17:55:48,403][10367] Updated weights for policy 0, policy_version 790 (0.0027) [2024-06-05 17:55:48,920][10130] Fps is (10 sec: 49151.4, 60 sec: 49698.1, 300 sec: 43376.0). Total num frames: 12959744. Throughput: 0: 49472.8. Samples: 13041200. Policy #0 lag: (min: 0.0, avg: 10.2, max: 21.0) [2024-06-05 17:55:48,921][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:55:51,851][10367] Updated weights for policy 0, policy_version 800 (0.0039) [2024-06-05 17:55:53,920][10130] Fps is (10 sec: 52428.9, 60 sec: 49425.0, 300 sec: 44264.6). Total num frames: 13221888. Throughput: 0: 49699.2. Samples: 13342640. Policy #0 lag: (min: 0.0, avg: 10.5, max: 20.0) [2024-06-05 17:55:53,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:55:55,023][10367] Updated weights for policy 0, policy_version 810 (0.0030) [2024-06-05 17:55:58,361][10367] Updated weights for policy 0, policy_version 820 (0.0025) [2024-06-05 17:55:58,920][10130] Fps is (10 sec: 49152.4, 60 sec: 49425.1, 300 sec: 44986.7). Total num frames: 13451264. Throughput: 0: 49787.0. Samples: 13498940. Policy #0 lag: (min: 1.0, avg: 9.1, max: 21.0) [2024-06-05 17:55:58,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:55:59,496][10347] Signal inference workers to stop experience collection... (150 times) [2024-06-05 17:55:59,523][10367] InferenceWorker_p0-w0: stopping experience collection (150 times) [2024-06-05 17:55:59,549][10347] Signal inference workers to resume experience collection... (150 times) [2024-06-05 17:55:59,554][10367] InferenceWorker_p0-w0: resuming experience collection (150 times) [2024-06-05 17:56:01,526][10367] Updated weights for policy 0, policy_version 830 (0.0033) [2024-06-05 17:56:03,920][10130] Fps is (10 sec: 47514.0, 60 sec: 49701.2, 300 sec: 45708.6). Total num frames: 13697024. Throughput: 0: 49581.5. Samples: 13794220. 
Policy #0 lag: (min: 1.0, avg: 11.4, max: 22.0) [2024-06-05 17:56:03,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:56:05,034][10367] Updated weights for policy 0, policy_version 840 (0.0026) [2024-06-05 17:56:07,973][10367] Updated weights for policy 0, policy_version 850 (0.0024) [2024-06-05 17:56:08,923][10130] Fps is (10 sec: 49134.1, 60 sec: 49422.1, 300 sec: 46263.4). Total num frames: 13942784. Throughput: 0: 49476.8. Samples: 14090680. Policy #0 lag: (min: 0.0, avg: 11.0, max: 20.0) [2024-06-05 17:56:08,924][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:56:11,479][10367] Updated weights for policy 0, policy_version 860 (0.0025) [2024-06-05 17:56:13,920][10130] Fps is (10 sec: 50790.0, 60 sec: 49698.1, 300 sec: 46819.4). Total num frames: 14204928. Throughput: 0: 49717.4. Samples: 14241700. Policy #0 lag: (min: 0.0, avg: 10.5, max: 22.0) [2024-06-05 17:56:13,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:56:14,512][10367] Updated weights for policy 0, policy_version 870 (0.0028) [2024-06-05 17:56:17,992][10367] Updated weights for policy 0, policy_version 880 (0.0021) [2024-06-05 17:56:18,920][10130] Fps is (10 sec: 50809.1, 60 sec: 49698.2, 300 sec: 47152.6). Total num frames: 14450688. Throughput: 0: 49681.3. Samples: 14544120. Policy #0 lag: (min: 0.0, avg: 9.9, max: 21.0) [2024-06-05 17:56:18,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:56:21,105][10367] Updated weights for policy 0, policy_version 890 (0.0029) [2024-06-05 17:56:23,920][10130] Fps is (10 sec: 49152.3, 60 sec: 49971.1, 300 sec: 47485.8). Total num frames: 14696448. Throughput: 0: 49820.4. Samples: 14844560. Policy #0 lag: (min: 0.0, avg: 10.3, max: 21.0) [2024-06-05 17:56:23,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:56:24,769][10367] Updated weights for policy 0, policy_version 900 (0.0030) [2024-06-05 17:56:27,746][10367] Updated weights for policy 0, policy_version 910 (0.0033) [2024-06-05 17:56:28,920][10130] Fps is (10 sec: 49151.5, 60 sec: 49698.2, 300 sec: 47874.6). Total num frames: 14942208. Throughput: 0: 49644.9. Samples: 14980440. Policy #0 lag: (min: 0.0, avg: 10.6, max: 21.0) [2024-06-05 17:56:28,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:56:31,396][10367] Updated weights for policy 0, policy_version 920 (0.0030) [2024-06-05 17:56:33,920][10130] Fps is (10 sec: 50790.2, 60 sec: 49698.1, 300 sec: 48152.3). Total num frames: 15204352. Throughput: 0: 49724.1. Samples: 15278780. Policy #0 lag: (min: 0.0, avg: 10.8, max: 20.0) [2024-06-05 17:56:33,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:56:34,025][10347] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000000929_15220736.pth... [2024-06-05 17:56:34,070][10347] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000000207_3391488.pth [2024-06-05 17:56:34,348][10367] Updated weights for policy 0, policy_version 930 (0.0031) [2024-06-05 17:56:37,934][10367] Updated weights for policy 0, policy_version 940 (0.0024) [2024-06-05 17:56:38,920][10130] Fps is (10 sec: 50790.8, 60 sec: 49698.1, 300 sec: 48263.4). Total num frames: 15450112. Throughput: 0: 49526.7. Samples: 15571340. Policy #0 lag: (min: 0.0, avg: 10.0, max: 23.0) [2024-06-05 17:56:38,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:56:41,139][10367] Updated weights for policy 0, policy_version 950 (0.0028) [2024-06-05 17:56:43,920][10130] Fps is (10 sec: 45874.6, 60 sec: 49425.0, 300 sec: 48374.4). Total num frames: 15663104. 
Throughput: 0: 49251.9. Samples: 15715280. Policy #0 lag: (min: 0.0, avg: 9.4, max: 21.0) [2024-06-05 17:56:43,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:56:44,748][10367] Updated weights for policy 0, policy_version 960 (0.0041) [2024-06-05 17:56:48,031][10367] Updated weights for policy 0, policy_version 970 (0.0031) [2024-06-05 17:56:48,920][10130] Fps is (10 sec: 47513.5, 60 sec: 49425.1, 300 sec: 48374.5). Total num frames: 15925248. Throughput: 0: 49133.3. Samples: 16005220. Policy #0 lag: (min: 1.0, avg: 11.6, max: 22.0) [2024-06-05 17:56:48,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:56:51,751][10367] Updated weights for policy 0, policy_version 980 (0.0029) [2024-06-05 17:56:53,920][10130] Fps is (10 sec: 52428.8, 60 sec: 49425.0, 300 sec: 48652.1). Total num frames: 16187392. Throughput: 0: 49062.5. Samples: 16298320. Policy #0 lag: (min: 0.0, avg: 11.2, max: 21.0) [2024-06-05 17:56:53,921][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:56:54,782][10367] Updated weights for policy 0, policy_version 990 (0.0031) [2024-06-05 17:56:58,246][10367] Updated weights for policy 0, policy_version 1000 (0.0023) [2024-06-05 17:56:58,920][10130] Fps is (10 sec: 49152.3, 60 sec: 49425.1, 300 sec: 48652.2). Total num frames: 16416768. Throughput: 0: 49227.2. Samples: 16456920. Policy #0 lag: (min: 0.0, avg: 9.2, max: 23.0) [2024-06-05 17:56:58,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:57:01,299][10367] Updated weights for policy 0, policy_version 1010 (0.0038) [2024-06-05 17:57:03,920][10130] Fps is (10 sec: 45876.0, 60 sec: 49152.0, 300 sec: 48652.2). Total num frames: 16646144. Throughput: 0: 48875.6. Samples: 16743520. Policy #0 lag: (min: 0.0, avg: 8.9, max: 21.0) [2024-06-05 17:57:03,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:57:04,693][10367] Updated weights for policy 0, policy_version 1020 (0.0036) [2024-06-05 17:57:08,030][10367] Updated weights for policy 0, policy_version 1030 (0.0029) [2024-06-05 17:57:08,920][10130] Fps is (10 sec: 47513.1, 60 sec: 49154.9, 300 sec: 48763.2). Total num frames: 16891904. Throughput: 0: 48702.6. Samples: 17036180. Policy #0 lag: (min: 0.0, avg: 9.9, max: 20.0) [2024-06-05 17:57:08,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:57:09,901][10347] Signal inference workers to stop experience collection... (200 times) [2024-06-05 17:57:09,952][10347] Signal inference workers to resume experience collection... (200 times) [2024-06-05 17:57:09,953][10367] InferenceWorker_p0-w0: stopping experience collection (200 times) [2024-06-05 17:57:09,969][10367] InferenceWorker_p0-w0: resuming experience collection (200 times) [2024-06-05 17:57:11,440][10367] Updated weights for policy 0, policy_version 1040 (0.0025) [2024-06-05 17:57:13,924][10130] Fps is (10 sec: 50770.8, 60 sec: 49148.9, 300 sec: 48929.2). Total num frames: 17154048. Throughput: 0: 49047.5. Samples: 17187760. Policy #0 lag: (min: 0.0, avg: 10.6, max: 22.0) [2024-06-05 17:57:13,924][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:57:14,636][10367] Updated weights for policy 0, policy_version 1050 (0.0024) [2024-06-05 17:57:18,258][10367] Updated weights for policy 0, policy_version 1060 (0.0023) [2024-06-05 17:57:18,920][10130] Fps is (10 sec: 49152.3, 60 sec: 48878.9, 300 sec: 48985.4). Total num frames: 17383424. Throughput: 0: 48994.7. Samples: 17483540. 
Policy #0 lag: (min: 0.0, avg: 9.7, max: 22.0) [2024-06-05 17:57:18,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:57:21,611][10367] Updated weights for policy 0, policy_version 1070 (0.0028) [2024-06-05 17:57:23,920][10130] Fps is (10 sec: 45893.0, 60 sec: 48605.9, 300 sec: 48985.4). Total num frames: 17612800. Throughput: 0: 49123.2. Samples: 17781880. Policy #0 lag: (min: 0.0, avg: 9.8, max: 22.0) [2024-06-05 17:57:23,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:57:24,791][10367] Updated weights for policy 0, policy_version 1080 (0.0027) [2024-06-05 17:57:28,317][10367] Updated weights for policy 0, policy_version 1090 (0.0033) [2024-06-05 17:57:28,920][10130] Fps is (10 sec: 49152.3, 60 sec: 48879.0, 300 sec: 49096.5). Total num frames: 17874944. Throughput: 0: 49008.6. Samples: 17920660. Policy #0 lag: (min: 0.0, avg: 9.8, max: 21.0) [2024-06-05 17:57:28,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:57:31,394][10367] Updated weights for policy 0, policy_version 1100 (0.0024) [2024-06-05 17:57:33,920][10130] Fps is (10 sec: 54066.3, 60 sec: 49151.9, 300 sec: 49263.1). Total num frames: 18153472. Throughput: 0: 49391.4. Samples: 18227840. Policy #0 lag: (min: 0.0, avg: 10.5, max: 20.0) [2024-06-05 17:57:33,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:57:34,708][10367] Updated weights for policy 0, policy_version 1110 (0.0031) [2024-06-05 17:57:37,980][10367] Updated weights for policy 0, policy_version 1120 (0.0023) [2024-06-05 17:57:38,920][10130] Fps is (10 sec: 49151.6, 60 sec: 48605.8, 300 sec: 49207.6). Total num frames: 18366464. Throughput: 0: 49428.1. Samples: 18522580. Policy #0 lag: (min: 0.0, avg: 9.3, max: 21.0) [2024-06-05 17:57:38,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:57:41,318][10367] Updated weights for policy 0, policy_version 1130 (0.0031) [2024-06-05 17:57:43,920][10130] Fps is (10 sec: 45876.0, 60 sec: 49152.2, 300 sec: 49263.2). Total num frames: 18612224. Throughput: 0: 49285.8. Samples: 18674780. Policy #0 lag: (min: 0.0, avg: 9.2, max: 21.0) [2024-06-05 17:57:43,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:57:44,611][10367] Updated weights for policy 0, policy_version 1140 (0.0023) [2024-06-05 17:57:47,744][10367] Updated weights for policy 0, policy_version 1150 (0.0027) [2024-06-05 17:57:48,920][10130] Fps is (10 sec: 50790.5, 60 sec: 49152.0, 300 sec: 49429.7). Total num frames: 18874368. Throughput: 0: 49469.7. Samples: 18969660. Policy #0 lag: (min: 0.0, avg: 11.8, max: 21.0) [2024-06-05 17:57:48,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:57:50,904][10367] Updated weights for policy 0, policy_version 1160 (0.0018) [2024-06-05 17:57:53,920][10130] Fps is (10 sec: 54066.1, 60 sec: 49425.1, 300 sec: 49540.8). Total num frames: 19152896. Throughput: 0: 49701.7. Samples: 19272760. Policy #0 lag: (min: 1.0, avg: 11.0, max: 21.0) [2024-06-05 17:57:53,921][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:57:54,257][10367] Updated weights for policy 0, policy_version 1170 (0.0027) [2024-06-05 17:57:57,476][10367] Updated weights for policy 0, policy_version 1180 (0.0022) [2024-06-05 17:57:58,920][10130] Fps is (10 sec: 54066.9, 60 sec: 49971.1, 300 sec: 49651.9). Total num frames: 19415040. Throughput: 0: 49994.8. Samples: 19437340. 
Policy #0 lag: (min: 0.0, avg: 9.9, max: 23.0) [2024-06-05 17:57:58,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:58:00,727][10367] Updated weights for policy 0, policy_version 1190 (0.0026) [2024-06-05 17:58:03,887][10367] Updated weights for policy 0, policy_version 1200 (0.0025) [2024-06-05 17:58:03,924][10130] Fps is (10 sec: 50772.4, 60 sec: 50241.1, 300 sec: 49595.7). Total num frames: 19660800. Throughput: 0: 50194.5. Samples: 19742480. Policy #0 lag: (min: 0.0, avg: 10.0, max: 21.0) [2024-06-05 17:58:03,924][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:58:07,170][10367] Updated weights for policy 0, policy_version 1210 (0.0031) [2024-06-05 17:58:08,920][10130] Fps is (10 sec: 47514.2, 60 sec: 49971.3, 300 sec: 49596.3). Total num frames: 19890176. Throughput: 0: 50157.3. Samples: 20038960. Policy #0 lag: (min: 0.0, avg: 10.0, max: 20.0) [2024-06-05 17:58:08,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:58:10,743][10367] Updated weights for policy 0, policy_version 1220 (0.0028) [2024-06-05 17:58:13,920][10130] Fps is (10 sec: 47531.0, 60 sec: 49701.3, 300 sec: 49596.3). Total num frames: 20135936. Throughput: 0: 50299.9. Samples: 20184160. Policy #0 lag: (min: 0.0, avg: 11.6, max: 21.0) [2024-06-05 17:58:13,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:58:13,930][10367] Updated weights for policy 0, policy_version 1230 (0.0027) [2024-06-05 17:58:14,996][10347] Signal inference workers to stop experience collection... (250 times) [2024-06-05 17:58:15,028][10367] InferenceWorker_p0-w0: stopping experience collection (250 times) [2024-06-05 17:58:15,045][10347] Signal inference workers to resume experience collection... (250 times) [2024-06-05 17:58:15,047][10367] InferenceWorker_p0-w0: resuming experience collection (250 times) [2024-06-05 17:58:17,198][10367] Updated weights for policy 0, policy_version 1240 (0.0031) [2024-06-05 17:58:18,920][10130] Fps is (10 sec: 52428.1, 60 sec: 50517.2, 300 sec: 49762.9). Total num frames: 20414464. Throughput: 0: 50094.2. Samples: 20482080. Policy #0 lag: (min: 0.0, avg: 8.9, max: 21.0) [2024-06-05 17:58:18,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:58:20,700][10367] Updated weights for policy 0, policy_version 1250 (0.0035) [2024-06-05 17:58:23,884][10367] Updated weights for policy 0, policy_version 1260 (0.0037) [2024-06-05 17:58:23,920][10130] Fps is (10 sec: 50790.9, 60 sec: 50517.3, 300 sec: 49596.3). Total num frames: 20643840. Throughput: 0: 50177.4. Samples: 20780560. Policy #0 lag: (min: 0.0, avg: 10.1, max: 22.0) [2024-06-05 17:58:23,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:58:27,134][10367] Updated weights for policy 0, policy_version 1270 (0.0029) [2024-06-05 17:58:28,920][10130] Fps is (10 sec: 47513.8, 60 sec: 50244.2, 300 sec: 49541.0). Total num frames: 20889600. Throughput: 0: 49994.1. Samples: 20924520. Policy #0 lag: (min: 0.0, avg: 9.2, max: 21.0) [2024-06-05 17:58:28,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:58:30,349][10367] Updated weights for policy 0, policy_version 1280 (0.0026) [2024-06-05 17:58:33,689][10367] Updated weights for policy 0, policy_version 1290 (0.0018) [2024-06-05 17:58:33,920][10130] Fps is (10 sec: 49151.8, 60 sec: 49698.2, 300 sec: 49540.8). Total num frames: 21135360. Throughput: 0: 49958.3. Samples: 21217780. 
Policy #0 lag: (min: 0.0, avg: 9.4, max: 22.0) [2024-06-05 17:58:33,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:58:33,936][10347] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000001290_21135360.pth... [2024-06-05 17:58:33,977][10347] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000000565_9256960.pth [2024-06-05 17:58:37,317][10367] Updated weights for policy 0, policy_version 1300 (0.0030) [2024-06-05 17:58:38,920][10130] Fps is (10 sec: 50790.6, 60 sec: 50517.4, 300 sec: 49651.9). Total num frames: 21397504. Throughput: 0: 49891.7. Samples: 21517880. Policy #0 lag: (min: 1.0, avg: 9.9, max: 22.0) [2024-06-05 17:58:38,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:58:40,333][10367] Updated weights for policy 0, policy_version 1310 (0.0019) [2024-06-05 17:58:43,770][10367] Updated weights for policy 0, policy_version 1320 (0.0031) [2024-06-05 17:58:43,923][10130] Fps is (10 sec: 49133.9, 60 sec: 50241.1, 300 sec: 49540.7). Total num frames: 21626880. Throughput: 0: 49678.3. Samples: 21673040. Policy #0 lag: (min: 0.0, avg: 10.0, max: 22.0) [2024-06-05 17:58:43,924][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:58:47,157][10367] Updated weights for policy 0, policy_version 1330 (0.0028) [2024-06-05 17:58:48,920][10130] Fps is (10 sec: 47513.5, 60 sec: 49971.2, 300 sec: 49485.2). Total num frames: 21872640. Throughput: 0: 49371.1. Samples: 21964000. Policy #0 lag: (min: 0.0, avg: 12.0, max: 22.0) [2024-06-05 17:58:48,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:58:50,391][10367] Updated weights for policy 0, policy_version 1340 (0.0035) [2024-06-05 17:58:53,502][10367] Updated weights for policy 0, policy_version 1350 (0.0032) [2024-06-05 17:58:53,920][10130] Fps is (10 sec: 49169.4, 60 sec: 49425.1, 300 sec: 49485.2). Total num frames: 22118400. Throughput: 0: 49540.2. Samples: 22268280. Policy #0 lag: (min: 0.0, avg: 11.9, max: 23.0) [2024-06-05 17:58:53,921][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:58:57,011][10367] Updated weights for policy 0, policy_version 1360 (0.0025) [2024-06-05 17:58:58,920][10130] Fps is (10 sec: 54066.7, 60 sec: 49971.2, 300 sec: 49707.4). Total num frames: 22413312. Throughput: 0: 49592.4. Samples: 22415820. Policy #0 lag: (min: 0.0, avg: 10.0, max: 21.0) [2024-06-05 17:58:58,921][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:58:59,949][10367] Updated weights for policy 0, policy_version 1370 (0.0033) [2024-06-05 17:59:03,674][10367] Updated weights for policy 0, policy_version 1380 (0.0025) [2024-06-05 17:59:03,920][10130] Fps is (10 sec: 49152.2, 60 sec: 49155.0, 300 sec: 49540.8). Total num frames: 22609920. Throughput: 0: 49478.7. Samples: 22708620. Policy #0 lag: (min: 0.0, avg: 9.4, max: 20.0) [2024-06-05 17:59:03,921][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:59:06,646][10367] Updated weights for policy 0, policy_version 1390 (0.0028) [2024-06-05 17:59:08,920][10130] Fps is (10 sec: 44237.4, 60 sec: 49425.0, 300 sec: 49429.7). Total num frames: 22855680. Throughput: 0: 49479.1. Samples: 23007120. Policy #0 lag: (min: 0.0, avg: 12.6, max: 22.0) [2024-06-05 17:59:08,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:59:10,210][10367] Updated weights for policy 0, policy_version 1400 (0.0024) [2024-06-05 17:59:13,225][10367] Updated weights for policy 0, policy_version 1410 (0.0027) [2024-06-05 17:59:13,920][10130] Fps is (10 sec: 49152.1, 60 sec: 49425.1, 300 sec: 49486.4). 
Total num frames: 23101440. Throughput: 0: 49588.9. Samples: 23156020. Policy #0 lag: (min: 1.0, avg: 12.2, max: 20.0) [2024-06-05 17:59:13,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:59:16,963][10367] Updated weights for policy 0, policy_version 1420 (0.0031) [2024-06-05 17:59:17,834][10347] Signal inference workers to stop experience collection... (300 times) [2024-06-05 17:59:17,834][10347] Signal inference workers to resume experience collection... (300 times) [2024-06-05 17:59:17,850][10367] InferenceWorker_p0-w0: stopping experience collection (300 times) [2024-06-05 17:59:17,850][10367] InferenceWorker_p0-w0: resuming experience collection (300 times) [2024-06-05 17:59:18,920][10130] Fps is (10 sec: 54066.9, 60 sec: 49698.2, 300 sec: 49762.9). Total num frames: 23396352. Throughput: 0: 49784.8. Samples: 23458100. Policy #0 lag: (min: 0.0, avg: 10.7, max: 22.0) [2024-06-05 17:59:18,921][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:59:19,763][10367] Updated weights for policy 0, policy_version 1430 (0.0025) [2024-06-05 17:59:23,763][10367] Updated weights for policy 0, policy_version 1440 (0.0024) [2024-06-05 17:59:23,920][10130] Fps is (10 sec: 49152.6, 60 sec: 49152.0, 300 sec: 49540.8). Total num frames: 23592960. Throughput: 0: 49796.1. Samples: 23758700. Policy #0 lag: (min: 0.0, avg: 9.2, max: 22.0) [2024-06-05 17:59:23,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:59:26,070][10367] Updated weights for policy 0, policy_version 1450 (0.0034) [2024-06-05 17:59:28,920][10130] Fps is (10 sec: 44237.0, 60 sec: 49152.0, 300 sec: 49429.7). Total num frames: 23838720. Throughput: 0: 49454.7. Samples: 23898320. Policy #0 lag: (min: 0.0, avg: 9.9, max: 20.0) [2024-06-05 17:59:28,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:59:30,156][10367] Updated weights for policy 0, policy_version 1460 (0.0032) [2024-06-05 17:59:32,910][10367] Updated weights for policy 0, policy_version 1470 (0.0033) [2024-06-05 17:59:33,920][10130] Fps is (10 sec: 49152.0, 60 sec: 49152.0, 300 sec: 49485.3). Total num frames: 24084480. Throughput: 0: 49504.1. Samples: 24191680. Policy #0 lag: (min: 0.0, avg: 9.2, max: 21.0) [2024-06-05 17:59:33,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:59:36,741][10367] Updated weights for policy 0, policy_version 1480 (0.0023) [2024-06-05 17:59:38,920][10130] Fps is (10 sec: 54067.7, 60 sec: 49698.2, 300 sec: 49707.4). Total num frames: 24379392. Throughput: 0: 49202.5. Samples: 24482380. Policy #0 lag: (min: 0.0, avg: 8.5, max: 20.0) [2024-06-05 17:59:38,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:59:39,854][10367] Updated weights for policy 0, policy_version 1490 (0.0033) [2024-06-05 17:59:43,805][10367] Updated weights for policy 0, policy_version 1500 (0.0037) [2024-06-05 17:59:43,920][10130] Fps is (10 sec: 49151.4, 60 sec: 49154.9, 300 sec: 49485.2). Total num frames: 24576000. Throughput: 0: 49234.3. Samples: 24631360. Policy #0 lag: (min: 0.0, avg: 8.7, max: 21.0) [2024-06-05 17:59:43,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:59:46,414][10367] Updated weights for policy 0, policy_version 1510 (0.0024) [2024-06-05 17:59:48,920][10130] Fps is (10 sec: 44236.5, 60 sec: 49152.0, 300 sec: 49374.1). Total num frames: 24821760. Throughput: 0: 49402.3. Samples: 24931720. 
Policy #0 lag: (min: 0.0, avg: 9.4, max: 20.0) [2024-06-05 17:59:48,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:59:50,398][10367] Updated weights for policy 0, policy_version 1520 (0.0030) [2024-06-05 17:59:52,901][10367] Updated weights for policy 0, policy_version 1530 (0.0028) [2024-06-05 17:59:53,920][10130] Fps is (10 sec: 49152.0, 60 sec: 49152.0, 300 sec: 49429.7). Total num frames: 25067520. Throughput: 0: 49152.3. Samples: 25218980. Policy #0 lag: (min: 0.0, avg: 8.7, max: 21.0) [2024-06-05 17:59:53,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:59:57,188][10367] Updated weights for policy 0, policy_version 1540 (0.0020) [2024-06-05 17:59:58,920][10130] Fps is (10 sec: 54066.4, 60 sec: 49152.0, 300 sec: 49652.4). Total num frames: 25362432. Throughput: 0: 49368.4. Samples: 25377600. Policy #0 lag: (min: 0.0, avg: 8.5, max: 21.0) [2024-06-05 17:59:58,921][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:59:59,762][10367] Updated weights for policy 0, policy_version 1550 (0.0041) [2024-06-05 18:00:03,643][10367] Updated weights for policy 0, policy_version 1560 (0.0031) [2024-06-05 18:00:03,920][10130] Fps is (10 sec: 49152.1, 60 sec: 49152.0, 300 sec: 49429.7). Total num frames: 25559040. Throughput: 0: 49129.8. Samples: 25668940. Policy #0 lag: (min: 0.0, avg: 8.9, max: 22.0) [2024-06-05 18:00:03,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 18:00:06,513][10367] Updated weights for policy 0, policy_version 1570 (0.0027) [2024-06-05 18:00:08,920][10130] Fps is (10 sec: 44237.5, 60 sec: 49152.0, 300 sec: 49429.7). Total num frames: 25804800. Throughput: 0: 48899.5. Samples: 25959180. Policy #0 lag: (min: 0.0, avg: 11.5, max: 21.0) [2024-06-05 18:00:08,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 18:00:10,281][10367] Updated weights for policy 0, policy_version 1580 (0.0024) [2024-06-05 18:00:12,922][10367] Updated weights for policy 0, policy_version 1590 (0.0026) [2024-06-05 18:00:13,920][10130] Fps is (10 sec: 49152.0, 60 sec: 49152.0, 300 sec: 49429.7). Total num frames: 26050560. Throughput: 0: 49167.0. Samples: 26110840. Policy #0 lag: (min: 0.0, avg: 12.1, max: 21.0) [2024-06-05 18:00:13,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 18:00:16,842][10367] Updated weights for policy 0, policy_version 1600 (0.0029) [2024-06-05 18:00:17,792][10347] Signal inference workers to stop experience collection... (350 times) [2024-06-05 18:00:17,808][10367] InferenceWorker_p0-w0: stopping experience collection (350 times) [2024-06-05 18:00:17,847][10347] Signal inference workers to resume experience collection... (350 times) [2024-06-05 18:00:17,847][10367] InferenceWorker_p0-w0: resuming experience collection (350 times) [2024-06-05 18:00:18,920][10130] Fps is (10 sec: 54067.0, 60 sec: 49152.0, 300 sec: 49651.8). Total num frames: 26345472. Throughput: 0: 49332.8. Samples: 26411660. Policy #0 lag: (min: 0.0, avg: 8.4, max: 21.0) [2024-06-05 18:00:18,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 18:00:19,403][10367] Updated weights for policy 0, policy_version 1610 (0.0028) [2024-06-05 18:00:23,838][10367] Updated weights for policy 0, policy_version 1620 (0.0031) [2024-06-05 18:00:23,920][10130] Fps is (10 sec: 49152.3, 60 sec: 49151.9, 300 sec: 49429.7). Total num frames: 26542080. Throughput: 0: 49213.7. Samples: 26697000. 
Policy #0 lag: (min: 0.0, avg: 8.8, max: 22.0) [2024-06-05 18:00:23,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 18:00:26,798][10367] Updated weights for policy 0, policy_version 1630 (0.0032) [2024-06-05 18:00:28,920][10130] Fps is (10 sec: 44236.5, 60 sec: 49151.9, 300 sec: 49374.1). Total num frames: 26787840. Throughput: 0: 48948.9. Samples: 26834060. Policy #0 lag: (min: 0.0, avg: 12.8, max: 21.0) [2024-06-05 18:00:28,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 18:00:30,249][10367] Updated weights for policy 0, policy_version 1640 (0.0027) [2024-06-05 18:00:33,495][10367] Updated weights for policy 0, policy_version 1650 (0.0029) [2024-06-05 18:00:33,924][10130] Fps is (10 sec: 50771.5, 60 sec: 49422.0, 300 sec: 49429.1). Total num frames: 27049984. Throughput: 0: 48867.9. Samples: 27130960. Policy #0 lag: (min: 0.0, avg: 13.0, max: 23.0) [2024-06-05 18:00:33,924][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 18:00:33,931][10347] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000001651_27049984.pth... [2024-06-05 18:00:33,979][10347] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000000929_15220736.pth [2024-06-05 18:00:36,929][10367] Updated weights for policy 0, policy_version 1660 (0.0027) [2024-06-05 18:00:38,920][10130] Fps is (10 sec: 54066.9, 60 sec: 49151.8, 300 sec: 49596.3). Total num frames: 27328512. Throughput: 0: 49280.0. Samples: 27436580. Policy #0 lag: (min: 0.0, avg: 12.6, max: 23.0) [2024-06-05 18:00:38,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 18:00:39,919][10367] Updated weights for policy 0, policy_version 1670 (0.0024) [2024-06-05 18:00:43,665][10367] Updated weights for policy 0, policy_version 1680 (0.0024) [2024-06-05 18:00:43,920][10130] Fps is (10 sec: 49169.8, 60 sec: 49425.1, 300 sec: 49429.7). Total num frames: 27541504. Throughput: 0: 49076.5. Samples: 27586040. Policy #0 lag: (min: 0.0, avg: 10.7, max: 23.0) [2024-06-05 18:00:43,922][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 18:00:46,369][10367] Updated weights for policy 0, policy_version 1690 (0.0033) [2024-06-05 18:00:48,920][10130] Fps is (10 sec: 45875.6, 60 sec: 49425.0, 300 sec: 49374.2). Total num frames: 27787264. Throughput: 0: 49141.3. Samples: 27880300. Policy #0 lag: (min: 0.0, avg: 9.8, max: 21.0) [2024-06-05 18:00:48,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 18:00:50,325][10367] Updated weights for policy 0, policy_version 1700 (0.0030) [2024-06-05 18:00:53,080][10367] Updated weights for policy 0, policy_version 1710 (0.0029) [2024-06-05 18:00:53,920][10130] Fps is (10 sec: 49152.2, 60 sec: 49425.1, 300 sec: 49429.7). Total num frames: 28033024. Throughput: 0: 49212.4. Samples: 28173740. Policy #0 lag: (min: 1.0, avg: 12.2, max: 23.0) [2024-06-05 18:00:53,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 18:00:56,974][10367] Updated weights for policy 0, policy_version 1720 (0.0027) [2024-06-05 18:00:58,920][10130] Fps is (10 sec: 52429.3, 60 sec: 49152.2, 300 sec: 49540.8). Total num frames: 28311552. Throughput: 0: 49330.8. Samples: 28330720. Policy #0 lag: (min: 0.0, avg: 12.1, max: 21.0) [2024-06-05 18:00:58,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 18:00:59,827][10367] Updated weights for policy 0, policy_version 1730 (0.0023) [2024-06-05 18:01:03,478][10367] Updated weights for policy 0, policy_version 1740 (0.0031) [2024-06-05 18:01:03,920][10130] Fps is (10 sec: 49152.2, 60 sec: 49425.1, 300 sec: 49430.3). 
Total num frames: 28524544. Throughput: 0: 49108.0. Samples: 28621520. Policy #0 lag: (min: 1.0, avg: 10.0, max: 22.0) [2024-06-05 18:01:03,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 18:01:06,441][10367] Updated weights for policy 0, policy_version 1750 (0.0031) [2024-06-05 18:01:08,920][10130] Fps is (10 sec: 45874.5, 60 sec: 49425.0, 300 sec: 49374.1). Total num frames: 28770304. Throughput: 0: 49319.4. Samples: 28916380. Policy #0 lag: (min: 0.0, avg: 9.6, max: 23.0) [2024-06-05 18:01:08,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 18:01:10,430][10367] Updated weights for policy 0, policy_version 1760 (0.0029) [2024-06-05 18:01:12,973][10367] Updated weights for policy 0, policy_version 1770 (0.0031) [2024-06-05 18:01:13,920][10130] Fps is (10 sec: 49152.2, 60 sec: 49425.2, 300 sec: 49374.2). Total num frames: 29016064. Throughput: 0: 49395.2. Samples: 29056840. Policy #0 lag: (min: 1.0, avg: 10.9, max: 23.0) [2024-06-05 18:01:13,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 18:01:16,942][10367] Updated weights for policy 0, policy_version 1780 (0.0033) [2024-06-05 18:01:18,397][10347] Signal inference workers to stop experience collection... (400 times) [2024-06-05 18:01:18,397][10347] Signal inference workers to resume experience collection... (400 times) [2024-06-05 18:01:18,413][10367] InferenceWorker_p0-w0: stopping experience collection (400 times) [2024-06-05 18:01:18,443][10367] InferenceWorker_p0-w0: resuming experience collection (400 times) [2024-06-05 18:01:18,920][10130] Fps is (10 sec: 52428.4, 60 sec: 49151.9, 300 sec: 49485.2). Total num frames: 29294592. Throughput: 0: 49671.9. Samples: 29366020. Policy #0 lag: (min: 0.0, avg: 11.0, max: 21.0) [2024-06-05 18:01:18,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 18:01:19,697][10367] Updated weights for policy 0, policy_version 1790 (0.0024) [2024-06-05 18:01:23,623][10367] Updated weights for policy 0, policy_version 1800 (0.0040) [2024-06-05 18:01:23,920][10130] Fps is (10 sec: 49151.7, 60 sec: 49425.1, 300 sec: 49374.2). Total num frames: 29507584. Throughput: 0: 49392.1. Samples: 29659220. Policy #0 lag: (min: 0.0, avg: 9.7, max: 21.0) [2024-06-05 18:01:23,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 18:01:26,490][10367] Updated weights for policy 0, policy_version 1810 (0.0033) [2024-06-05 18:01:28,920][10130] Fps is (10 sec: 45876.3, 60 sec: 49425.2, 300 sec: 49318.6). Total num frames: 29753344. Throughput: 0: 49118.4. Samples: 29796360. Policy #0 lag: (min: 1.0, avg: 10.2, max: 21.0) [2024-06-05 18:01:28,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 18:01:30,147][10367] Updated weights for policy 0, policy_version 1820 (0.0037) [2024-06-05 18:01:32,953][10367] Updated weights for policy 0, policy_version 1830 (0.0027) [2024-06-05 18:01:33,920][10130] Fps is (10 sec: 49151.7, 60 sec: 49155.0, 300 sec: 49318.6). Total num frames: 29999104. Throughput: 0: 49068.4. Samples: 30088380. Policy #0 lag: (min: 0.0, avg: 10.1, max: 22.0) [2024-06-05 18:01:33,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 18:01:36,871][10367] Updated weights for policy 0, policy_version 1840 (0.0027) [2024-06-05 18:01:38,920][10130] Fps is (10 sec: 52428.3, 60 sec: 49152.1, 300 sec: 49540.8). Total num frames: 30277632. Throughput: 0: 49342.7. Samples: 30394160. 
Policy #0 lag: (min: 0.0, avg: 9.6, max: 21.0) [2024-06-05 18:01:38,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 18:01:39,550][10367] Updated weights for policy 0, policy_version 1850 (0.0020) [2024-06-05 18:01:43,447][10367] Updated weights for policy 0, policy_version 1860 (0.0037) [2024-06-05 18:01:43,920][10130] Fps is (10 sec: 49152.5, 60 sec: 49152.1, 300 sec: 49374.2). Total num frames: 30490624. Throughput: 0: 49143.1. Samples: 30542160. Policy #0 lag: (min: 0.0, avg: 9.7, max: 22.0) [2024-06-05 18:01:43,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 18:01:46,282][10367] Updated weights for policy 0, policy_version 1870 (0.0025) [2024-06-05 18:01:48,920][10130] Fps is (10 sec: 45874.7, 60 sec: 49151.9, 300 sec: 49318.6). Total num frames: 30736384. Throughput: 0: 49293.2. Samples: 30839720. Policy #0 lag: (min: 0.0, avg: 10.5, max: 22.0) [2024-06-05 18:01:48,921][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 18:01:50,137][10367] Updated weights for policy 0, policy_version 1880 (0.0028) [2024-06-05 18:01:52,703][10367] Updated weights for policy 0, policy_version 1890 (0.0027) [2024-06-05 18:01:53,920][10130] Fps is (10 sec: 50790.4, 60 sec: 49425.1, 300 sec: 49429.7). Total num frames: 30998528. Throughput: 0: 49393.0. Samples: 31139060. Policy #0 lag: (min: 0.0, avg: 10.2, max: 21.0) [2024-06-05 18:01:53,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 18:01:56,685][10367] Updated weights for policy 0, policy_version 1900 (0.0028) [2024-06-05 18:01:58,920][10130] Fps is (10 sec: 52429.5, 60 sec: 49152.0, 300 sec: 49540.8). Total num frames: 31260672. Throughput: 0: 49658.6. Samples: 31291480. Policy #0 lag: (min: 0.0, avg: 10.2, max: 22.0) [2024-06-05 18:01:58,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 18:01:59,543][10367] Updated weights for policy 0, policy_version 1910 (0.0030) [2024-06-05 18:02:03,230][10367] Updated weights for policy 0, policy_version 1920 (0.0024) [2024-06-05 18:02:03,920][10130] Fps is (10 sec: 49151.9, 60 sec: 49425.1, 300 sec: 49485.2). Total num frames: 31490048. Throughput: 0: 49458.0. Samples: 31591620. Policy #0 lag: (min: 1.0, avg: 10.0, max: 22.0) [2024-06-05 18:02:03,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 18:02:05,962][10367] Updated weights for policy 0, policy_version 1930 (0.0028) [2024-06-05 18:02:08,920][10130] Fps is (10 sec: 47513.9, 60 sec: 49425.2, 300 sec: 49430.3). Total num frames: 31735808. Throughput: 0: 49524.1. Samples: 31887800. Policy #0 lag: (min: 0.0, avg: 9.9, max: 21.0) [2024-06-05 18:02:08,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 18:02:09,825][10367] Updated weights for policy 0, policy_version 1940 (0.0031) [2024-06-05 18:02:12,649][10367] Updated weights for policy 0, policy_version 1950 (0.0030) [2024-06-05 18:02:13,920][10130] Fps is (10 sec: 50790.6, 60 sec: 49698.1, 300 sec: 49540.8). Total num frames: 31997952. Throughput: 0: 49704.0. Samples: 32033040. Policy #0 lag: (min: 1.0, avg: 10.8, max: 22.0) [2024-06-05 18:02:13,920][10130] Avg episode reward: [(0, '0.001')] [2024-06-05 18:02:16,333][10367] Updated weights for policy 0, policy_version 1960 (0.0036) [2024-06-05 18:02:18,920][10130] Fps is (10 sec: 50789.8, 60 sec: 49152.1, 300 sec: 49596.3). Total num frames: 32243712. Throughput: 0: 49897.8. Samples: 32333780. 
Policy #0 lag: (min: 0.0, avg: 11.2, max: 22.0) [2024-06-05 18:02:18,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 18:02:19,162][10367] Updated weights for policy 0, policy_version 1970 (0.0033) [2024-06-05 18:02:23,019][10367] Updated weights for policy 0, policy_version 1980 (0.0035) [2024-06-05 18:02:23,920][10130] Fps is (10 sec: 49151.5, 60 sec: 49698.1, 300 sec: 49540.7). Total num frames: 32489472. Throughput: 0: 49729.3. Samples: 32631980. Policy #0 lag: (min: 1.0, avg: 9.4, max: 20.0) [2024-06-05 18:02:23,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 18:02:25,790][10367] Updated weights for policy 0, policy_version 1990 (0.0027) [2024-06-05 18:02:27,475][10347] Signal inference workers to stop experience collection... (450 times) [2024-06-05 18:02:27,476][10347] Signal inference workers to resume experience collection... (450 times) [2024-06-05 18:02:27,489][10367] InferenceWorker_p0-w0: stopping experience collection (450 times) [2024-06-05 18:02:27,489][10367] InferenceWorker_p0-w0: resuming experience collection (450 times) [2024-06-05 18:02:28,920][10130] Fps is (10 sec: 49152.5, 60 sec: 49698.1, 300 sec: 49429.7). Total num frames: 32735232. Throughput: 0: 49710.7. Samples: 32779140. Policy #0 lag: (min: 0.0, avg: 10.4, max: 22.0) [2024-06-05 18:02:28,920][10130] Avg episode reward: [(0, '0.001')] [2024-06-05 18:02:29,438][10367] Updated weights for policy 0, policy_version 2000 (0.0027) [2024-06-05 18:02:32,214][10367] Updated weights for policy 0, policy_version 2010 (0.0026) [2024-06-05 18:02:33,920][10130] Fps is (10 sec: 49152.6, 60 sec: 49698.2, 300 sec: 49540.8). Total num frames: 32980992. Throughput: 0: 49734.9. Samples: 33077780. Policy #0 lag: (min: 0.0, avg: 11.1, max: 21.0) [2024-06-05 18:02:33,920][10130] Avg episode reward: [(0, '0.001')] [2024-06-05 18:02:33,931][10347] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000002013_32980992.pth... [2024-06-05 18:02:33,978][10347] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000001290_21135360.pth [2024-06-05 18:02:36,034][10367] Updated weights for policy 0, policy_version 2020 (0.0020) [2024-06-05 18:02:38,920][10130] Fps is (10 sec: 50789.9, 60 sec: 49425.0, 300 sec: 49596.3). Total num frames: 33243136. Throughput: 0: 49835.9. Samples: 33381680. Policy #0 lag: (min: 0.0, avg: 10.7, max: 20.0) [2024-06-05 18:02:38,921][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 18:02:38,999][10367] Updated weights for policy 0, policy_version 2030 (0.0029) [2024-06-05 18:02:42,514][10367] Updated weights for policy 0, policy_version 2040 (0.0026) [2024-06-05 18:02:43,920][10130] Fps is (10 sec: 52428.5, 60 sec: 50244.2, 300 sec: 49596.3). Total num frames: 33505280. Throughput: 0: 49856.4. Samples: 33535020. Policy #0 lag: (min: 0.0, avg: 10.6, max: 23.0) [2024-06-05 18:02:43,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 18:02:45,592][10367] Updated weights for policy 0, policy_version 2050 (0.0035) [2024-06-05 18:02:48,920][10130] Fps is (10 sec: 47514.1, 60 sec: 49698.3, 300 sec: 49374.2). Total num frames: 33718272. Throughput: 0: 49959.2. Samples: 33839780. 
Policy #0 lag: (min: 0.0, avg: 8.9, max: 21.0) [2024-06-05 18:02:48,920][10130] Avg episode reward: [(0, '0.001')] [2024-06-05 18:02:49,191][10367] Updated weights for policy 0, policy_version 2060 (0.0026) [2024-06-05 18:02:52,023][10367] Updated weights for policy 0, policy_version 2070 (0.0035) [2024-06-05 18:02:53,920][10130] Fps is (10 sec: 47513.3, 60 sec: 49698.1, 300 sec: 49374.2). Total num frames: 33980416. Throughput: 0: 49815.0. Samples: 34129480. Policy #0 lag: (min: 0.0, avg: 11.1, max: 21.0) [2024-06-05 18:02:53,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 18:02:55,906][10367] Updated weights for policy 0, policy_version 2080 (0.0025) [2024-06-05 18:02:58,544][10367] Updated weights for policy 0, policy_version 2090 (0.0027) [2024-06-05 18:02:58,920][10130] Fps is (10 sec: 52428.3, 60 sec: 49698.1, 300 sec: 49430.3). Total num frames: 34242560. Throughput: 0: 49839.5. Samples: 34275820. Policy #0 lag: (min: 0.0, avg: 8.5, max: 22.0) [2024-06-05 18:02:58,921][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 18:03:02,392][10367] Updated weights for policy 0, policy_version 2100 (0.0034) [2024-06-05 18:03:03,920][10130] Fps is (10 sec: 50791.0, 60 sec: 49971.2, 300 sec: 49485.2). Total num frames: 34488320. Throughput: 0: 49993.5. Samples: 34583480. Policy #0 lag: (min: 2.0, avg: 10.0, max: 22.0) [2024-06-05 18:03:03,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 18:03:05,225][10367] Updated weights for policy 0, policy_version 2110 (0.0025) [2024-06-05 18:03:08,920][10130] Fps is (10 sec: 47513.8, 60 sec: 49698.1, 300 sec: 49429.7). Total num frames: 34717696. Throughput: 0: 49885.0. Samples: 34876800. Policy #0 lag: (min: 0.0, avg: 11.8, max: 22.0) [2024-06-05 18:03:08,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 18:03:08,970][10367] Updated weights for policy 0, policy_version 2120 (0.0030) [2024-06-05 18:03:11,995][10367] Updated weights for policy 0, policy_version 2130 (0.0029) [2024-06-05 18:03:13,920][10130] Fps is (10 sec: 45875.1, 60 sec: 49152.0, 300 sec: 49263.1). Total num frames: 34947072. Throughput: 0: 49529.8. Samples: 35007980. Policy #0 lag: (min: 0.0, avg: 11.1, max: 25.0) [2024-06-05 18:03:13,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 18:03:15,790][10367] Updated weights for policy 0, policy_version 2140 (0.0028) [2024-06-05 18:03:18,580][10367] Updated weights for policy 0, policy_version 2150 (0.0024) [2024-06-05 18:03:18,920][10130] Fps is (10 sec: 50790.0, 60 sec: 49698.1, 300 sec: 49429.7). Total num frames: 35225600. Throughput: 0: 49502.1. Samples: 35305380. Policy #0 lag: (min: 0.0, avg: 10.9, max: 21.0) [2024-06-05 18:03:18,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 18:03:22,486][10367] Updated weights for policy 0, policy_version 2160 (0.0024) [2024-06-05 18:03:23,920][10130] Fps is (10 sec: 52429.2, 60 sec: 49698.3, 300 sec: 49429.7). Total num frames: 35471360. Throughput: 0: 49262.4. Samples: 35598480. Policy #0 lag: (min: 0.0, avg: 9.8, max: 22.0) [2024-06-05 18:03:23,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 18:03:25,211][10367] Updated weights for policy 0, policy_version 2170 (0.0028) [2024-06-05 18:03:28,911][10347] Signal inference workers to stop experience collection... (500 times) [2024-06-05 18:03:28,920][10130] Fps is (10 sec: 45875.2, 60 sec: 49151.9, 300 sec: 49318.6). Total num frames: 35684352. Throughput: 0: 49062.1. Samples: 35742820. 
Policy #0 lag: (min: 0.0, avg: 8.1, max: 20.0) [2024-06-05 18:03:28,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 18:03:28,933][10367] InferenceWorker_p0-w0: stopping experience collection (500 times) [2024-06-05 18:03:28,965][10347] Signal inference workers to resume experience collection... (500 times) [2024-06-05 18:03:28,966][10367] InferenceWorker_p0-w0: resuming experience collection (500 times) [2024-06-05 18:03:29,099][10367] Updated weights for policy 0, policy_version 2180 (0.0029) [2024-06-05 18:03:32,030][10367] Updated weights for policy 0, policy_version 2190 (0.0025) [2024-06-05 18:03:33,920][10130] Fps is (10 sec: 47512.9, 60 sec: 49425.0, 300 sec: 49318.6). Total num frames: 35946496. Throughput: 0: 48762.1. Samples: 36034080. Policy #0 lag: (min: 0.0, avg: 11.6, max: 22.0) [2024-06-05 18:03:33,920][10130] Avg episode reward: [(0, '0.001')] [2024-06-05 18:03:35,753][10367] Updated weights for policy 0, policy_version 2200 (0.0029) [2024-06-05 18:03:38,713][10367] Updated weights for policy 0, policy_version 2210 (0.0037) [2024-06-05 18:03:38,920][10130] Fps is (10 sec: 52429.4, 60 sec: 49425.1, 300 sec: 49430.3). Total num frames: 36208640. Throughput: 0: 48962.3. Samples: 36332780. Policy #0 lag: (min: 0.0, avg: 11.5, max: 21.0) [2024-06-05 18:03:38,920][10130] Avg episode reward: [(0, '0.001')] [2024-06-05 18:03:42,419][10367] Updated weights for policy 0, policy_version 2220 (0.0033) [2024-06-05 18:03:43,920][10130] Fps is (10 sec: 49152.2, 60 sec: 48879.0, 300 sec: 49374.2). Total num frames: 36438016. Throughput: 0: 49133.0. Samples: 36486800. Policy #0 lag: (min: 0.0, avg: 9.2, max: 22.0) [2024-06-05 18:03:43,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 18:03:45,180][10367] Updated weights for policy 0, policy_version 2230 (0.0031) [2024-06-05 18:03:48,920][10130] Fps is (10 sec: 47513.0, 60 sec: 49424.9, 300 sec: 49374.2). Total num frames: 36683776. Throughput: 0: 48884.3. Samples: 36783280. Policy #0 lag: (min: 0.0, avg: 10.6, max: 20.0) [2024-06-05 18:03:48,921][10130] Avg episode reward: [(0, '0.001')] [2024-06-05 18:03:49,061][10367] Updated weights for policy 0, policy_version 2240 (0.0027) [2024-06-05 18:03:51,691][10367] Updated weights for policy 0, policy_version 2250 (0.0034) [2024-06-05 18:03:53,920][10130] Fps is (10 sec: 49152.1, 60 sec: 49152.1, 300 sec: 49207.6). Total num frames: 36929536. Throughput: 0: 48767.2. Samples: 37071320. Policy #0 lag: (min: 0.0, avg: 11.3, max: 21.0) [2024-06-05 18:03:53,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 18:03:55,665][10367] Updated weights for policy 0, policy_version 2260 (0.0024) [2024-06-05 18:03:58,681][10367] Updated weights for policy 0, policy_version 2270 (0.0027) [2024-06-05 18:03:58,920][10130] Fps is (10 sec: 50791.1, 60 sec: 49152.1, 300 sec: 49429.7). Total num frames: 37191680. Throughput: 0: 49253.3. Samples: 37224380. Policy #0 lag: (min: 0.0, avg: 10.5, max: 21.0) [2024-06-05 18:03:58,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 18:04:02,227][10367] Updated weights for policy 0, policy_version 2280 (0.0029) [2024-06-05 18:04:03,920][10130] Fps is (10 sec: 49151.9, 60 sec: 48878.9, 300 sec: 49374.2). Total num frames: 37421056. Throughput: 0: 49152.6. Samples: 37517240. 
Policy #0 lag: (min: 0.0, avg: 9.7, max: 22.0) [2024-06-05 18:04:03,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 18:04:05,338][10367] Updated weights for policy 0, policy_version 2290 (0.0020) [2024-06-05 18:04:08,784][10367] Updated weights for policy 0, policy_version 2300 (0.0026) [2024-06-05 18:04:08,920][10130] Fps is (10 sec: 49152.2, 60 sec: 49425.1, 300 sec: 49429.7). Total num frames: 37683200. Throughput: 0: 49562.1. Samples: 37828780. Policy #0 lag: (min: 0.0, avg: 9.5, max: 21.0) [2024-06-05 18:04:08,920][10130] Avg episode reward: [(0, '0.001')] [2024-06-05 18:04:11,868][10367] Updated weights for policy 0, policy_version 2310 (0.0026) [2024-06-05 18:04:13,920][10130] Fps is (10 sec: 49152.2, 60 sec: 49425.1, 300 sec: 49207.6). Total num frames: 37912576. Throughput: 0: 49419.3. Samples: 37966680. Policy #0 lag: (min: 0.0, avg: 11.1, max: 23.0) [2024-06-05 18:04:13,920][10130] Avg episode reward: [(0, '0.001')] [2024-06-05 18:04:15,387][10367] Updated weights for policy 0, policy_version 2320 (0.0031) [2024-06-05 18:04:18,424][10367] Updated weights for policy 0, policy_version 2330 (0.0029) [2024-06-05 18:04:18,920][10130] Fps is (10 sec: 50790.1, 60 sec: 49425.1, 300 sec: 49485.2). Total num frames: 38191104. Throughput: 0: 49635.6. Samples: 38267680. Policy #0 lag: (min: 0.0, avg: 11.3, max: 21.0) [2024-06-05 18:04:18,920][10130] Avg episode reward: [(0, '0.001')] [2024-06-05 18:04:22,051][10367] Updated weights for policy 0, policy_version 2340 (0.0026) [2024-06-05 18:04:23,920][10130] Fps is (10 sec: 50790.0, 60 sec: 49151.9, 300 sec: 49429.7). Total num frames: 38420480. Throughput: 0: 49663.9. Samples: 38567660. Policy #0 lag: (min: 0.0, avg: 9.3, max: 21.0) [2024-06-05 18:04:23,920][10130] Avg episode reward: [(0, '0.001')] [2024-06-05 18:04:24,802][10367] Updated weights for policy 0, policy_version 2350 (0.0030) [2024-06-05 18:04:28,844][10367] Updated weights for policy 0, policy_version 2360 (0.0029) [2024-06-05 18:04:28,920][10130] Fps is (10 sec: 47513.5, 60 sec: 49698.2, 300 sec: 49429.7). Total num frames: 38666240. Throughput: 0: 49405.8. Samples: 38710060. Policy #0 lag: (min: 0.0, avg: 11.3, max: 21.0) [2024-06-05 18:04:28,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 18:04:31,641][10367] Updated weights for policy 0, policy_version 2370 (0.0024) [2024-06-05 18:04:33,920][10130] Fps is (10 sec: 47513.6, 60 sec: 49152.0, 300 sec: 49207.5). Total num frames: 38895616. Throughput: 0: 49290.3. Samples: 39001340. Policy #0 lag: (min: 0.0, avg: 10.7, max: 21.0) [2024-06-05 18:04:33,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 18:04:33,942][10347] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000002374_38895616.pth... [2024-06-05 18:04:34,006][10347] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000001651_27049984.pth [2024-06-05 18:04:35,495][10367] Updated weights for policy 0, policy_version 2380 (0.0026) [2024-06-05 18:04:38,614][10367] Updated weights for policy 0, policy_version 2390 (0.0025) [2024-06-05 18:04:38,920][10130] Fps is (10 sec: 49152.3, 60 sec: 49152.0, 300 sec: 49429.7). Total num frames: 39157760. Throughput: 0: 49308.5. Samples: 39290200. Policy #0 lag: (min: 0.0, avg: 9.3, max: 21.0) [2024-06-05 18:04:38,920][10130] Avg episode reward: [(0, '0.001')] [2024-06-05 18:04:38,956][10347] Saving new best policy, reward=0.001! 
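The checkpoint traffic in the records above follows a simple pattern: the learner periodically writes checkpoint_{policy_version:09d}_{env_frames}.pth, removes the oldest rolling checkpoint so only a few stay on disk, and separately saves a "new best policy" whenever the average episode reward improves. The sketch below is an illustration of that bookkeeping only, not the actual metta / Sample Factory implementation; the names CheckpointRotator, keep_last, and _write are hypothetical, and torch.save is replaced by a dependency-free stand-in.

# Illustrative sketch only -- mimics the Saving/Removing/"new best policy"
# messages seen in this log; not the framework's real checkpointing code.
import os
from collections import deque

class CheckpointRotator:
    def __init__(self, checkpoint_dir, keep_last=3):
        self.checkpoint_dir = checkpoint_dir
        self.keep_last = keep_last          # how many rolling checkpoints to retain
        self.saved = deque()                # rolling checkpoint paths, oldest first
        self.best_reward = float("-inf")
        os.makedirs(checkpoint_dir, exist_ok=True)

    def _write(self, path, policy_state):
        # Stand-in for torch.save(policy_state, path), kept dependency-free here.
        with open(path, "wb") as f:
            f.write(repr(policy_state).encode())

    def save(self, policy_version, env_frames, policy_state, avg_reward):
        # File name pattern matches the log: 9-digit policy version, then frame count.
        name = f"checkpoint_{policy_version:09d}_{env_frames}.pth"
        path = os.path.join(self.checkpoint_dir, name)
        print(f"Saving {path}...")
        self._write(path, policy_state)
        self.saved.append(path)

        # Rotate: once the limit is exceeded, drop the oldest rolling checkpoint,
        # which produces the paired "Saving .../Removing ..." lines above.
        while len(self.saved) > self.keep_last:
            old = self.saved.popleft()
            print(f"Removing {old}")
            os.remove(old)

        # Track the best policy under a separate name so it is never rotated away.
        if avg_reward > self.best_reward:
            self.best_reward = avg_reward
            print(f"Saving new best policy, reward={avg_reward:.3f}!")
            self._write(os.path.join(self.checkpoint_dir, "best.pth"), policy_state)

Keeping the best-reward snapshot under its own name means it survives the rotation that deletes older numbered checkpoints, which is consistent with the "Saving new best policy, reward=0.001!" message appearing alongside the regular Saving/Removing pairs in this run.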
[2024-06-05 18:04:42,114][10367] Updated weights for policy 0, policy_version 2400 (0.0035) [2024-06-05 18:04:43,920][10130] Fps is (10 sec: 49151.4, 60 sec: 49151.9, 300 sec: 49374.1). Total num frames: 39387136. Throughput: 0: 49432.3. Samples: 39448840. Policy #0 lag: (min: 0.0, avg: 8.6, max: 21.0) [2024-06-05 18:04:43,920][10130] Avg episode reward: [(0, '0.001')] [2024-06-05 18:04:45,252][10367] Updated weights for policy 0, policy_version 2410 (0.0021) [2024-06-05 18:04:48,304][10347] Signal inference workers to stop experience collection... (550 times) [2024-06-05 18:04:48,304][10347] Signal inference workers to resume experience collection... (550 times) [2024-06-05 18:04:48,314][10367] InferenceWorker_p0-w0: stopping experience collection (550 times) [2024-06-05 18:04:48,315][10367] InferenceWorker_p0-w0: resuming experience collection (550 times) [2024-06-05 18:04:48,735][10367] Updated weights for policy 0, policy_version 2420 (0.0030) [2024-06-05 18:04:48,920][10130] Fps is (10 sec: 49151.7, 60 sec: 49425.2, 300 sec: 49429.7). Total num frames: 39649280. Throughput: 0: 49469.3. Samples: 39743360. Policy #0 lag: (min: 0.0, avg: 8.0, max: 20.0) [2024-06-05 18:04:48,920][10130] Avg episode reward: [(0, '0.001')] [2024-06-05 18:04:51,667][10367] Updated weights for policy 0, policy_version 2430 (0.0032) [2024-06-05 18:04:53,920][10130] Fps is (10 sec: 49152.7, 60 sec: 49152.0, 300 sec: 49207.6). Total num frames: 39878656. Throughput: 0: 49044.4. Samples: 40035780. Policy #0 lag: (min: 0.0, avg: 11.5, max: 21.0) [2024-06-05 18:04:53,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 18:04:55,505][10367] Updated weights for policy 0, policy_version 2440 (0.0026) [2024-06-05 18:04:58,354][10367] Updated weights for policy 0, policy_version 2450 (0.0026) [2024-06-05 18:04:58,920][10130] Fps is (10 sec: 49152.1, 60 sec: 49152.0, 300 sec: 49429.7). Total num frames: 40140800. Throughput: 0: 49132.0. Samples: 40177620. Policy #0 lag: (min: 1.0, avg: 8.6, max: 22.0) [2024-06-05 18:04:58,920][10130] Avg episode reward: [(0, '0.001')] [2024-06-05 18:05:02,039][10367] Updated weights for policy 0, policy_version 2460 (0.0033) [2024-06-05 18:05:03,920][10130] Fps is (10 sec: 50790.0, 60 sec: 49425.0, 300 sec: 49429.7). Total num frames: 40386560. Throughput: 0: 48993.7. Samples: 40472400. Policy #0 lag: (min: 0.0, avg: 11.5, max: 23.0) [2024-06-05 18:05:03,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 18:05:05,184][10367] Updated weights for policy 0, policy_version 2470 (0.0031) [2024-06-05 18:05:08,506][10367] Updated weights for policy 0, policy_version 2480 (0.0026) [2024-06-05 18:05:08,920][10130] Fps is (10 sec: 50790.5, 60 sec: 49425.1, 300 sec: 49485.3). Total num frames: 40648704. Throughput: 0: 49172.1. Samples: 40780400. Policy #0 lag: (min: 0.0, avg: 11.6, max: 21.0) [2024-06-05 18:05:08,920][10130] Avg episode reward: [(0, '0.001')] [2024-06-05 18:05:11,641][10367] Updated weights for policy 0, policy_version 2490 (0.0027) [2024-06-05 18:05:13,920][10130] Fps is (10 sec: 49152.2, 60 sec: 49425.0, 300 sec: 49263.1). Total num frames: 40878080. Throughput: 0: 49254.6. Samples: 40926520. 
Policy #0 lag: (min: 0.0, avg: 10.3, max: 21.0) [2024-06-05 18:05:13,920][10130] Avg episode reward: [(0, '0.001')] [2024-06-05 18:05:15,156][10367] Updated weights for policy 0, policy_version 2500 (0.0031) [2024-06-05 18:05:18,176][10367] Updated weights for policy 0, policy_version 2510 (0.0032) [2024-06-05 18:05:18,920][10130] Fps is (10 sec: 50790.5, 60 sec: 49425.1, 300 sec: 49540.8). Total num frames: 41156608. Throughput: 0: 49415.2. Samples: 41225020. Policy #0 lag: (min: 0.0, avg: 9.8, max: 22.0) [2024-06-05 18:05:18,920][10130] Avg episode reward: [(0, '0.001')] [2024-06-05 18:05:21,914][10367] Updated weights for policy 0, policy_version 2520 (0.0027) [2024-06-05 18:05:23,920][10130] Fps is (10 sec: 49152.2, 60 sec: 49152.0, 300 sec: 49429.7). Total num frames: 41369600. Throughput: 0: 49599.0. Samples: 41522160. Policy #0 lag: (min: 0.0, avg: 8.1, max: 21.0) [2024-06-05 18:05:23,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 18:05:24,933][10367] Updated weights for policy 0, policy_version 2530 (0.0030) [2024-06-05 18:05:28,554][10367] Updated weights for policy 0, policy_version 2540 (0.0025) [2024-06-05 18:05:28,920][10130] Fps is (10 sec: 47513.0, 60 sec: 49425.0, 300 sec: 49430.3). Total num frames: 41631744. Throughput: 0: 49138.7. Samples: 41660080. Policy #0 lag: (min: 1.0, avg: 9.8, max: 23.0) [2024-06-05 18:05:28,922][10130] Avg episode reward: [(0, '0.001')] [2024-06-05 18:05:31,680][10367] Updated weights for policy 0, policy_version 2550 (0.0033) [2024-06-05 18:05:33,920][10130] Fps is (10 sec: 47513.3, 60 sec: 49152.0, 300 sec: 49207.5). Total num frames: 41844736. Throughput: 0: 49107.5. Samples: 41953200. Policy #0 lag: (min: 0.0, avg: 12.4, max: 23.0) [2024-06-05 18:05:33,920][10130] Avg episode reward: [(0, '0.001')] [2024-06-05 18:05:35,202][10367] Updated weights for policy 0, policy_version 2560 (0.0030) [2024-06-05 18:05:38,172][10367] Updated weights for policy 0, policy_version 2570 (0.0031) [2024-06-05 18:05:38,920][10130] Fps is (10 sec: 50791.2, 60 sec: 49698.1, 300 sec: 49485.3). Total num frames: 42139648. Throughput: 0: 49240.5. Samples: 42251600. Policy #0 lag: (min: 0.0, avg: 11.2, max: 21.0) [2024-06-05 18:05:38,920][10130] Avg episode reward: [(0, '0.001')] [2024-06-05 18:05:41,748][10367] Updated weights for policy 0, policy_version 2580 (0.0033) [2024-06-05 18:05:43,920][10130] Fps is (10 sec: 52428.9, 60 sec: 49698.2, 300 sec: 49429.7). Total num frames: 42369024. Throughput: 0: 49701.3. Samples: 42414180. Policy #0 lag: (min: 0.0, avg: 9.7, max: 21.0) [2024-06-05 18:05:43,920][10130] Avg episode reward: [(0, '0.002')] [2024-06-05 18:05:44,670][10367] Updated weights for policy 0, policy_version 2590 (0.0033) [2024-06-05 18:05:47,601][10347] Signal inference workers to stop experience collection... (600 times) [2024-06-05 18:05:47,601][10347] Signal inference workers to resume experience collection... (600 times) [2024-06-05 18:05:47,617][10367] InferenceWorker_p0-w0: stopping experience collection (600 times) [2024-06-05 18:05:47,640][10367] InferenceWorker_p0-w0: resuming experience collection (600 times) [2024-06-05 18:05:48,160][10367] Updated weights for policy 0, policy_version 2600 (0.0033) [2024-06-05 18:05:48,920][10130] Fps is (10 sec: 49150.8, 60 sec: 49698.0, 300 sec: 49485.2). Total num frames: 42631168. Throughput: 0: 49942.6. Samples: 42719820. 
Policy #0 lag: (min: 0.0, avg: 8.9, max: 21.0) [2024-06-05 18:05:48,920][10130] Avg episode reward: [(0, '0.002')] [2024-06-05 18:05:51,478][10367] Updated weights for policy 0, policy_version 2610 (0.0028) [2024-06-05 18:05:53,920][10130] Fps is (10 sec: 49152.0, 60 sec: 49698.1, 300 sec: 49318.6). Total num frames: 42860544. Throughput: 0: 49599.5. Samples: 43012380. Policy #0 lag: (min: 0.0, avg: 11.0, max: 23.0) [2024-06-05 18:05:53,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 18:05:54,719][10367] Updated weights for policy 0, policy_version 2620 (0.0023) [2024-06-05 18:05:57,828][10367] Updated weights for policy 0, policy_version 2630 (0.0030) [2024-06-05 18:05:58,920][10130] Fps is (10 sec: 49153.1, 60 sec: 49698.2, 300 sec: 49485.2). Total num frames: 43122688. Throughput: 0: 49633.4. Samples: 43160020. Policy #0 lag: (min: 0.0, avg: 11.0, max: 22.0) [2024-06-05 18:05:58,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 18:06:01,439][10367] Updated weights for policy 0, policy_version 2640 (0.0023) [2024-06-05 18:06:03,920][10130] Fps is (10 sec: 50790.9, 60 sec: 49698.2, 300 sec: 49485.3). Total num frames: 43368448. Throughput: 0: 49518.7. Samples: 43453360. Policy #0 lag: (min: 0.0, avg: 10.3, max: 21.0) [2024-06-05 18:06:03,920][10130] Avg episode reward: [(0, '0.001')] [2024-06-05 18:06:04,294][10367] Updated weights for policy 0, policy_version 2650 (0.0025) [2024-06-05 18:06:08,050][10367] Updated weights for policy 0, policy_version 2660 (0.0038) [2024-06-05 18:06:08,920][10130] Fps is (10 sec: 49151.3, 60 sec: 49425.0, 300 sec: 49485.2). Total num frames: 43614208. Throughput: 0: 49539.9. Samples: 43751460. Policy #0 lag: (min: 0.0, avg: 9.4, max: 21.0) [2024-06-05 18:06:08,921][10130] Avg episode reward: [(0, '0.001')] [2024-06-05 18:06:11,157][10367] Updated weights for policy 0, policy_version 2670 (0.0030) [2024-06-05 18:06:13,920][10130] Fps is (10 sec: 47512.9, 60 sec: 49425.0, 300 sec: 49318.6). Total num frames: 43843584. Throughput: 0: 49747.5. Samples: 43898720. Policy #0 lag: (min: 0.0, avg: 10.8, max: 24.0) [2024-06-05 18:06:13,920][10130] Avg episode reward: [(0, '0.001')] [2024-06-05 18:06:14,570][10367] Updated weights for policy 0, policy_version 2680 (0.0025) [2024-06-05 18:06:17,657][10367] Updated weights for policy 0, policy_version 2690 (0.0032) [2024-06-05 18:06:18,920][10130] Fps is (10 sec: 50790.7, 60 sec: 49425.0, 300 sec: 49540.8). Total num frames: 44122112. Throughput: 0: 49937.8. Samples: 44200400. Policy #0 lag: (min: 0.0, avg: 10.5, max: 22.0) [2024-06-05 18:06:18,920][10130] Avg episode reward: [(0, '0.001')] [2024-06-05 18:06:21,097][10367] Updated weights for policy 0, policy_version 2700 (0.0021) [2024-06-05 18:06:23,920][10130] Fps is (10 sec: 52428.9, 60 sec: 49971.2, 300 sec: 49540.8). Total num frames: 44367872. Throughput: 0: 49897.6. Samples: 44497000. Policy #0 lag: (min: 0.0, avg: 9.6, max: 22.0) [2024-06-05 18:06:23,921][10130] Avg episode reward: [(0, '0.002')] [2024-06-05 18:06:24,238][10367] Updated weights for policy 0, policy_version 2710 (0.0031) [2024-06-05 18:06:27,685][10367] Updated weights for policy 0, policy_version 2720 (0.0028) [2024-06-05 18:06:28,920][10130] Fps is (10 sec: 49152.4, 60 sec: 49698.2, 300 sec: 49540.8). Total num frames: 44613632. Throughput: 0: 49607.2. Samples: 44646500. 
Policy #0 lag: (min: 0.0, avg: 9.3, max: 20.0) [2024-06-05 18:06:28,920][10130] Avg episode reward: [(0, '0.001')] [2024-06-05 18:06:30,937][10367] Updated weights for policy 0, policy_version 2730 (0.0033) [2024-06-05 18:06:33,920][10130] Fps is (10 sec: 47514.1, 60 sec: 49971.3, 300 sec: 49374.2). Total num frames: 44843008. Throughput: 0: 49395.4. Samples: 44942600. Policy #0 lag: (min: 0.0, avg: 11.6, max: 21.0) [2024-06-05 18:06:33,920][10130] Avg episode reward: [(0, '0.002')] [2024-06-05 18:06:34,060][10347] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000002738_44859392.pth... [2024-06-05 18:06:34,121][10347] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000002013_32980992.pth [2024-06-05 18:06:34,348][10367] Updated weights for policy 0, policy_version 2740 (0.0021) [2024-06-05 18:06:37,535][10367] Updated weights for policy 0, policy_version 2750 (0.0035) [2024-06-05 18:06:38,920][10130] Fps is (10 sec: 47513.2, 60 sec: 49151.9, 300 sec: 49485.2). Total num frames: 45088768. Throughput: 0: 49466.7. Samples: 45238380. Policy #0 lag: (min: 0.0, avg: 10.3, max: 21.0) [2024-06-05 18:06:38,920][10130] Avg episode reward: [(0, '0.002')] [2024-06-05 18:06:38,954][10347] Saving new best policy, reward=0.002! [2024-06-05 18:06:40,924][10367] Updated weights for policy 0, policy_version 2760 (0.0029) [2024-06-05 18:06:43,920][10130] Fps is (10 sec: 50790.2, 60 sec: 49698.2, 300 sec: 49540.8). Total num frames: 45350912. Throughput: 0: 49330.2. Samples: 45379880. Policy #0 lag: (min: 0.0, avg: 9.4, max: 22.0) [2024-06-05 18:06:43,920][10130] Avg episode reward: [(0, '0.001')] [2024-06-05 18:06:44,263][10367] Updated weights for policy 0, policy_version 2770 (0.0036) [2024-06-05 18:06:47,589][10367] Updated weights for policy 0, policy_version 2780 (0.0019) [2024-06-05 18:06:48,678][10347] Signal inference workers to stop experience collection... (650 times) [2024-06-05 18:06:48,719][10367] InferenceWorker_p0-w0: stopping experience collection (650 times) [2024-06-05 18:06:48,724][10347] Signal inference workers to resume experience collection... (650 times) [2024-06-05 18:06:48,731][10367] InferenceWorker_p0-w0: resuming experience collection (650 times) [2024-06-05 18:06:48,920][10130] Fps is (10 sec: 50790.6, 60 sec: 49425.2, 300 sec: 49485.2). Total num frames: 45596672. Throughput: 0: 49384.4. Samples: 45675660. Policy #0 lag: (min: 0.0, avg: 9.1, max: 21.0) [2024-06-05 18:06:48,920][10130] Avg episode reward: [(0, '0.001')] [2024-06-05 18:06:50,911][10367] Updated weights for policy 0, policy_version 2790 (0.0023) [2024-06-05 18:06:53,920][10130] Fps is (10 sec: 49152.3, 60 sec: 49698.2, 300 sec: 49429.7). Total num frames: 45842432. Throughput: 0: 49542.4. Samples: 45980860. Policy #0 lag: (min: 1.0, avg: 11.4, max: 23.0) [2024-06-05 18:06:53,920][10130] Avg episode reward: [(0, '0.001')] [2024-06-05 18:06:54,536][10367] Updated weights for policy 0, policy_version 2800 (0.0026) [2024-06-05 18:06:57,545][10367] Updated weights for policy 0, policy_version 2810 (0.0020) [2024-06-05 18:06:58,920][10130] Fps is (10 sec: 49152.1, 60 sec: 49425.0, 300 sec: 49485.2). Total num frames: 46088192. Throughput: 0: 49556.5. Samples: 46128760. Policy #0 lag: (min: 0.0, avg: 10.8, max: 21.0) [2024-06-05 18:06:58,920][10130] Avg episode reward: [(0, '0.002')] [2024-06-05 18:07:01,222][10367] Updated weights for policy 0, policy_version 2820 (0.0029) [2024-06-05 18:07:03,920][10130] Fps is (10 sec: 49151.3, 60 sec: 49425.0, 300 sec: 49485.2). 
Total num frames: 46333952. Throughput: 0: 49436.4. Samples: 46425040. Policy #0 lag: (min: 0.0, avg: 10.1, max: 21.0) [2024-06-05 18:07:03,921][10130] Avg episode reward: [(0, '0.001')] [2024-06-05 18:07:04,456][10367] Updated weights for policy 0, policy_version 2830 (0.0038) [2024-06-05 18:07:07,922][10367] Updated weights for policy 0, policy_version 2840 (0.0034) [2024-06-05 18:07:08,920][10130] Fps is (10 sec: 49151.6, 60 sec: 49425.1, 300 sec: 49429.7). Total num frames: 46579712. Throughput: 0: 49463.1. Samples: 46722840. Policy #0 lag: (min: 1.0, avg: 10.0, max: 21.0) [2024-06-05 18:07:08,920][10130] Avg episode reward: [(0, '0.002')] [2024-06-05 18:07:10,962][10367] Updated weights for policy 0, policy_version 2850 (0.0031) [2024-06-05 18:07:13,920][10130] Fps is (10 sec: 49151.7, 60 sec: 49698.1, 300 sec: 49429.7). Total num frames: 46825472. Throughput: 0: 49317.6. Samples: 46865800. Policy #0 lag: (min: 0.0, avg: 10.5, max: 24.0) [2024-06-05 18:07:13,921][10130] Avg episode reward: [(0, '0.001')] [2024-06-05 18:07:14,706][10367] Updated weights for policy 0, policy_version 2860 (0.0038) [2024-06-05 18:07:17,674][10367] Updated weights for policy 0, policy_version 2870 (0.0034) [2024-06-05 18:07:18,920][10130] Fps is (10 sec: 49152.6, 60 sec: 49152.1, 300 sec: 49429.7). Total num frames: 47071232. Throughput: 0: 49164.9. Samples: 47155020. Policy #0 lag: (min: 0.0, avg: 10.0, max: 20.0) [2024-06-05 18:07:18,920][10130] Avg episode reward: [(0, '0.001')] [2024-06-05 18:07:21,315][10367] Updated weights for policy 0, policy_version 2880 (0.0024) [2024-06-05 18:07:23,920][10130] Fps is (10 sec: 49152.6, 60 sec: 49152.0, 300 sec: 49429.7). Total num frames: 47316992. Throughput: 0: 49154.7. Samples: 47450340. Policy #0 lag: (min: 0.0, avg: 9.8, max: 21.0) [2024-06-05 18:07:23,920][10130] Avg episode reward: [(0, '0.002')] [2024-06-05 18:07:24,345][10367] Updated weights for policy 0, policy_version 2890 (0.0023) [2024-06-05 18:07:28,141][10367] Updated weights for policy 0, policy_version 2900 (0.0022) [2024-06-05 18:07:28,920][10130] Fps is (10 sec: 49151.4, 60 sec: 49151.9, 300 sec: 49429.7). Total num frames: 47562752. Throughput: 0: 49297.2. Samples: 47598260. Policy #0 lag: (min: 0.0, avg: 10.9, max: 23.0) [2024-06-05 18:07:28,920][10130] Avg episode reward: [(0, '0.001')] [2024-06-05 18:07:31,231][10367] Updated weights for policy 0, policy_version 2910 (0.0025) [2024-06-05 18:07:33,920][10130] Fps is (10 sec: 49151.7, 60 sec: 49425.0, 300 sec: 49374.2). Total num frames: 47808512. Throughput: 0: 49257.3. Samples: 47892240. Policy #0 lag: (min: 0.0, avg: 10.9, max: 23.0) [2024-06-05 18:07:33,921][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 18:07:34,828][10367] Updated weights for policy 0, policy_version 2920 (0.0023) [2024-06-05 18:07:37,951][10367] Updated weights for policy 0, policy_version 2930 (0.0024) [2024-06-05 18:07:38,920][10130] Fps is (10 sec: 49152.5, 60 sec: 49425.1, 300 sec: 49318.6). Total num frames: 48054272. Throughput: 0: 49046.2. Samples: 48187940. Policy #0 lag: (min: 0.0, avg: 10.2, max: 23.0) [2024-06-05 18:07:38,920][10130] Avg episode reward: [(0, '0.001')] [2024-06-05 18:07:41,550][10367] Updated weights for policy 0, policy_version 2940 (0.0018) [2024-06-05 18:07:43,920][10130] Fps is (10 sec: 49152.5, 60 sec: 49152.0, 300 sec: 49429.7). Total num frames: 48300032. Throughput: 0: 49323.1. Samples: 48348300. 
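The "Policy #0 lag: (min/avg/max)" statistics describe how stale the experience in each training batch is. A plausible reading, sketched below under the assumption that the lag of a rollout is the learner's current policy_version minus the version that generated it, aggregated over the batch; the helper is illustrative only.

def policy_lag_stats(current_version, rollout_versions):
    """Min/avg/max staleness of a batch, measured in policy versions behind the learner."""
    lags = [current_version - v for v in rollout_versions]
    return {
        "min": float(min(lags)),
        "avg": sum(lags) / len(lags),
        "max": float(max(lags)),
    }

# Example: with the learner at version 2610 and rollouts generated at
# versions [2610, 2608, 2600, 2589], the stats are min 0.0, avg 8.25, max 21.0.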
Policy #0 lag: (min: 0.0, avg: 9.5, max: 21.0) [2024-06-05 18:07:43,920][10130] Avg episode reward: [(0, '0.002')] [2024-06-05 18:07:44,414][10367] Updated weights for policy 0, policy_version 2950 (0.0034) [2024-06-05 18:07:48,265][10367] Updated weights for policy 0, policy_version 2960 (0.0029) [2024-06-05 18:07:48,923][10130] Fps is (10 sec: 47496.2, 60 sec: 48876.0, 300 sec: 49318.0). Total num frames: 48529408. Throughput: 0: 48949.4. Samples: 48627940. Policy #0 lag: (min: 0.0, avg: 11.4, max: 22.0) [2024-06-05 18:07:48,924][10130] Avg episode reward: [(0, '0.002')] [2024-06-05 18:07:50,939][10367] Updated weights for policy 0, policy_version 2970 (0.0024) [2024-06-05 18:07:53,922][10130] Fps is (10 sec: 49142.6, 60 sec: 49150.4, 300 sec: 49318.3). Total num frames: 48791552. Throughput: 0: 48998.9. Samples: 48927880. Policy #0 lag: (min: 0.0, avg: 10.4, max: 21.0) [2024-06-05 18:07:53,922][10130] Avg episode reward: [(0, '0.002')] [2024-06-05 18:07:54,946][10367] Updated weights for policy 0, policy_version 2980 (0.0023) [2024-06-05 18:07:57,670][10367] Updated weights for policy 0, policy_version 2990 (0.0027) [2024-06-05 18:07:58,920][10130] Fps is (10 sec: 50809.1, 60 sec: 49152.0, 300 sec: 49318.6). Total num frames: 49037312. Throughput: 0: 49038.4. Samples: 49072520. Policy #0 lag: (min: 0.0, avg: 10.4, max: 21.0) [2024-06-05 18:07:58,920][10130] Avg episode reward: [(0, '0.001')] [2024-06-05 18:08:01,453][10367] Updated weights for policy 0, policy_version 3000 (0.0035) [2024-06-05 18:08:03,920][10130] Fps is (10 sec: 49161.0, 60 sec: 49152.0, 300 sec: 49374.2). Total num frames: 49283072. Throughput: 0: 49464.8. Samples: 49380940. Policy #0 lag: (min: 1.0, avg: 10.7, max: 22.0) [2024-06-05 18:08:03,920][10130] Avg episode reward: [(0, '0.003')] [2024-06-05 18:08:03,941][10347] Saving new best policy, reward=0.003! [2024-06-05 18:08:04,290][10367] Updated weights for policy 0, policy_version 3010 (0.0027) [2024-06-05 18:08:04,323][10347] Signal inference workers to stop experience collection... (700 times) [2024-06-05 18:08:04,323][10347] Signal inference workers to resume experience collection... (700 times) [2024-06-05 18:08:04,361][10367] InferenceWorker_p0-w0: stopping experience collection (700 times) [2024-06-05 18:08:04,361][10367] InferenceWorker_p0-w0: resuming experience collection (700 times) [2024-06-05 18:08:07,990][10367] Updated weights for policy 0, policy_version 3020 (0.0026) [2024-06-05 18:08:08,920][10130] Fps is (10 sec: 49152.0, 60 sec: 49152.1, 300 sec: 49429.7). Total num frames: 49528832. Throughput: 0: 49534.7. Samples: 49679400. Policy #0 lag: (min: 0.0, avg: 11.6, max: 23.0) [2024-06-05 18:08:08,920][10130] Avg episode reward: [(0, '0.002')] [2024-06-05 18:08:10,702][10367] Updated weights for policy 0, policy_version 3030 (0.0022) [2024-06-05 18:08:13,920][10130] Fps is (10 sec: 49152.0, 60 sec: 49152.1, 300 sec: 49318.6). Total num frames: 49774592. Throughput: 0: 49500.5. Samples: 49825780. Policy #0 lag: (min: 0.0, avg: 11.1, max: 22.0) [2024-06-05 18:08:13,932][10130] Avg episode reward: [(0, '0.001')] [2024-06-05 18:08:14,735][10367] Updated weights for policy 0, policy_version 3040 (0.0028) [2024-06-05 18:08:17,333][10367] Updated weights for policy 0, policy_version 3050 (0.0025) [2024-06-05 18:08:18,920][10130] Fps is (10 sec: 52428.8, 60 sec: 49698.1, 300 sec: 49429.7). Total num frames: 50053120. Throughput: 0: 49513.9. Samples: 50120360. 
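"Signal inference workers to stop/resume experience collection... (700 times)" records a recurring handshake: the learner periodically pauses collection while it trains on the gathered batch, then lets the inference workers resume, and both sides keep a running count of how many times this has happened. A hypothetical queue-based illustration of such a handshake (not the actual implementation):

import multiprocessing as mp

def learner_step(cmd_queue, pause_count):
    """Pause collection, run one training round, resume, and count the cycle."""
    pause_count += 1
    cmd_queue.put(("stop_experience_collection", pause_count))
    # ... SGD on the collected batch would run here ...
    cmd_queue.put(("resume_experience_collection", pause_count))
    return pause_count

def inference_worker(cmd_queue):
    collecting = True  # the rollout loop would check this flag before stepping envs
    while True:
        cmd, count = cmd_queue.get()
        if cmd == "stop_experience_collection":
            collecting = False
            print(f"InferenceWorker_p0-w0: stopping experience collection ({count} times)")
        elif cmd == "resume_experience_collection":
            collecting = True
            print(f"InferenceWorker_p0-w0: resuming experience collection ({count} times)")

# Typical wiring: q = mp.Queue(); mp.Process(target=inference_worker, args=(q,)).start()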
Policy #0 lag: (min: 1.0, avg: 10.5, max: 22.0) [2024-06-05 18:08:18,920][10130] Avg episode reward: [(0, '0.002')] [2024-06-05 18:08:21,150][10367] Updated weights for policy 0, policy_version 3060 (0.0028) [2024-06-05 18:08:23,843][10367] Updated weights for policy 0, policy_version 3070 (0.0031) [2024-06-05 18:08:23,920][10130] Fps is (10 sec: 52428.3, 60 sec: 49698.0, 300 sec: 49540.8). Total num frames: 50298880. Throughput: 0: 49665.2. Samples: 50422880. Policy #0 lag: (min: 0.0, avg: 10.5, max: 22.0) [2024-06-05 18:08:23,920][10130] Avg episode reward: [(0, '0.003')] [2024-06-05 18:08:27,795][10367] Updated weights for policy 0, policy_version 3080 (0.0027) [2024-06-05 18:08:28,920][10130] Fps is (10 sec: 47513.9, 60 sec: 49425.2, 300 sec: 49429.7). Total num frames: 50528256. Throughput: 0: 49382.7. Samples: 50570520. Policy #0 lag: (min: 0.0, avg: 8.3, max: 22.0) [2024-06-05 18:08:28,920][10130] Avg episode reward: [(0, '0.002')] [2024-06-05 18:08:30,543][10367] Updated weights for policy 0, policy_version 3090 (0.0035) [2024-06-05 18:08:33,920][10130] Fps is (10 sec: 45875.1, 60 sec: 49151.9, 300 sec: 49318.6). Total num frames: 50757632. Throughput: 0: 49800.7. Samples: 50868800. Policy #0 lag: (min: 1.0, avg: 11.3, max: 22.0) [2024-06-05 18:08:33,920][10130] Avg episode reward: [(0, '0.002')] [2024-06-05 18:08:33,935][10347] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000003098_50757632.pth... [2024-06-05 18:08:34,001][10347] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000002374_38895616.pth [2024-06-05 18:08:34,317][10367] Updated weights for policy 0, policy_version 3100 (0.0028) [2024-06-05 18:08:37,055][10367] Updated weights for policy 0, policy_version 3110 (0.0041) [2024-06-05 18:08:38,920][10130] Fps is (10 sec: 50790.2, 60 sec: 49698.1, 300 sec: 49485.2). Total num frames: 51036160. Throughput: 0: 49824.3. Samples: 51169880. Policy #0 lag: (min: 1.0, avg: 10.2, max: 23.0) [2024-06-05 18:08:38,920][10130] Avg episode reward: [(0, '0.002')] [2024-06-05 18:08:40,897][10367] Updated weights for policy 0, policy_version 3120 (0.0032) [2024-06-05 18:08:43,918][10367] Updated weights for policy 0, policy_version 3130 (0.0023) [2024-06-05 18:08:43,920][10130] Fps is (10 sec: 52429.2, 60 sec: 49698.0, 300 sec: 49485.2). Total num frames: 51281920. Throughput: 0: 50086.1. Samples: 51326400. Policy #0 lag: (min: 0.0, avg: 10.7, max: 23.0) [2024-06-05 18:08:43,928][10130] Avg episode reward: [(0, '0.002')] [2024-06-05 18:08:47,297][10367] Updated weights for policy 0, policy_version 3140 (0.0023) [2024-06-05 18:08:48,920][10130] Fps is (10 sec: 49151.8, 60 sec: 49974.2, 300 sec: 49485.2). Total num frames: 51527680. Throughput: 0: 49855.6. Samples: 51624440. Policy #0 lag: (min: 0.0, avg: 7.8, max: 19.0) [2024-06-05 18:08:48,929][10130] Avg episode reward: [(0, '0.002')] [2024-06-05 18:08:50,224][10367] Updated weights for policy 0, policy_version 3150 (0.0020) [2024-06-05 18:08:53,862][10367] Updated weights for policy 0, policy_version 3160 (0.0031) [2024-06-05 18:08:53,920][10130] Fps is (10 sec: 49152.0, 60 sec: 49699.6, 300 sec: 49429.7). Total num frames: 51773440. Throughput: 0: 49701.7. Samples: 51915980. Policy #0 lag: (min: 0.0, avg: 11.5, max: 21.0) [2024-06-05 18:08:53,921][10130] Avg episode reward: [(0, '0.002')] [2024-06-05 18:08:56,876][10367] Updated weights for policy 0, policy_version 3170 (0.0025) [2024-06-05 18:08:58,920][10130] Fps is (10 sec: 50790.3, 60 sec: 49971.1, 300 sec: 49540.8). 
Total num frames: 52035584. Throughput: 0: 49820.9. Samples: 52067720. Policy #0 lag: (min: 0.0, avg: 9.5, max: 21.0) [2024-06-05 18:08:58,920][10130] Avg episode reward: [(0, '0.001')] [2024-06-05 18:09:00,689][10367] Updated weights for policy 0, policy_version 3180 (0.0049) [2024-06-05 18:09:03,823][10367] Updated weights for policy 0, policy_version 3190 (0.0024) [2024-06-05 18:09:03,920][10130] Fps is (10 sec: 49152.1, 60 sec: 49698.1, 300 sec: 49429.7). Total num frames: 52264960. Throughput: 0: 49583.5. Samples: 52351620. Policy #0 lag: (min: 0.0, avg: 8.8, max: 21.0) [2024-06-05 18:09:03,921][10130] Avg episode reward: [(0, '0.002')] [2024-06-05 18:09:07,348][10367] Updated weights for policy 0, policy_version 3200 (0.0031) [2024-06-05 18:09:08,920][10130] Fps is (10 sec: 47513.6, 60 sec: 49698.1, 300 sec: 49485.2). Total num frames: 52510720. Throughput: 0: 49544.1. Samples: 52652360. Policy #0 lag: (min: 0.0, avg: 10.8, max: 23.0) [2024-06-05 18:09:08,929][10130] Avg episode reward: [(0, '0.001')] [2024-06-05 18:09:10,389][10367] Updated weights for policy 0, policy_version 3210 (0.0027) [2024-06-05 18:09:13,695][10367] Updated weights for policy 0, policy_version 3220 (0.0037) [2024-06-05 18:09:13,920][10130] Fps is (10 sec: 49152.4, 60 sec: 49698.2, 300 sec: 49374.2). Total num frames: 52756480. Throughput: 0: 49661.2. Samples: 52805280. Policy #0 lag: (min: 0.0, avg: 9.5, max: 21.0) [2024-06-05 18:09:13,920][10130] Avg episode reward: [(0, '0.002')] [2024-06-05 18:09:16,763][10367] Updated weights for policy 0, policy_version 3230 (0.0040) [2024-06-05 18:09:17,575][10347] Signal inference workers to stop experience collection... (750 times) [2024-06-05 18:09:17,576][10347] Signal inference workers to resume experience collection... (750 times) [2024-06-05 18:09:17,625][10367] InferenceWorker_p0-w0: stopping experience collection (750 times) [2024-06-05 18:09:17,626][10367] InferenceWorker_p0-w0: resuming experience collection (750 times) [2024-06-05 18:09:18,920][10130] Fps is (10 sec: 52428.8, 60 sec: 49698.1, 300 sec: 49540.8). Total num frames: 53035008. Throughput: 0: 49728.1. Samples: 53106560. Policy #0 lag: (min: 0.0, avg: 10.2, max: 21.0) [2024-06-05 18:09:18,929][10130] Avg episode reward: [(0, '0.001')] [2024-06-05 18:09:20,406][10367] Updated weights for policy 0, policy_version 3240 (0.0027) [2024-06-05 18:09:23,311][10367] Updated weights for policy 0, policy_version 3250 (0.0033) [2024-06-05 18:09:23,920][10130] Fps is (10 sec: 50790.6, 60 sec: 49425.2, 300 sec: 49485.2). Total num frames: 53264384. Throughput: 0: 49584.0. Samples: 53401160. Policy #0 lag: (min: 0.0, avg: 10.5, max: 23.0) [2024-06-05 18:09:23,920][10130] Avg episode reward: [(0, '0.004')] [2024-06-05 18:09:26,948][10367] Updated weights for policy 0, policy_version 3260 (0.0026) [2024-06-05 18:09:28,920][10130] Fps is (10 sec: 45875.6, 60 sec: 49425.1, 300 sec: 49485.2). Total num frames: 53493760. Throughput: 0: 49459.7. Samples: 53552080. Policy #0 lag: (min: 0.0, avg: 11.1, max: 22.0) [2024-06-05 18:09:28,920][10130] Avg episode reward: [(0, '0.002')] [2024-06-05 18:09:30,237][10367] Updated weights for policy 0, policy_version 3270 (0.0029) [2024-06-05 18:09:33,811][10367] Updated weights for policy 0, policy_version 3280 (0.0035) [2024-06-05 18:09:33,920][10130] Fps is (10 sec: 47513.5, 60 sec: 49698.3, 300 sec: 49429.7). Total num frames: 53739520. Throughput: 0: 49192.5. Samples: 53838100. 
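Each "Updated weights for policy 0, policy_version N (0.00xx)" line is the inference worker refreshing its copy of the learner's parameters; the trailing number is presumably the time the refresh took in seconds (an assumption). A minimal sketch of a version-gated weight refresh, with invented names:

import time

import torch

def maybe_refresh_weights(local_model, shared_state_dict, local_version, shared_version):
    """Copy the learner's latest parameters if they are newer than the local copy."""
    if shared_version <= local_version:
        return local_version  # already up to date
    start = time.time()
    with torch.no_grad():
        local_model.load_state_dict(shared_state_dict)
    elapsed = time.time() - start
    print(f"Updated weights for policy 0, policy_version {shared_version} ({elapsed:.4f})")
    return shared_version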
Policy #0 lag: (min: 0.0, avg: 10.6, max: 24.0) [2024-06-05 18:09:33,920][10130] Avg episode reward: [(0, '0.002')] [2024-06-05 18:09:37,053][10367] Updated weights for policy 0, policy_version 3290 (0.0032) [2024-06-05 18:09:38,920][10130] Fps is (10 sec: 52428.8, 60 sec: 49698.2, 300 sec: 49596.3). Total num frames: 54018048. Throughput: 0: 49240.6. Samples: 54131800. Policy #0 lag: (min: 0.0, avg: 10.9, max: 21.0) [2024-06-05 18:09:38,920][10130] Avg episode reward: [(0, '0.002')] [2024-06-05 18:09:40,466][10367] Updated weights for policy 0, policy_version 3300 (0.0037) [2024-06-05 18:09:43,623][10367] Updated weights for policy 0, policy_version 3310 (0.0024) [2024-06-05 18:09:43,920][10130] Fps is (10 sec: 49152.3, 60 sec: 49152.1, 300 sec: 49429.7). Total num frames: 54231040. Throughput: 0: 49207.7. Samples: 54282060. Policy #0 lag: (min: 0.0, avg: 11.3, max: 22.0) [2024-06-05 18:09:43,920][10130] Avg episode reward: [(0, '0.001')] [2024-06-05 18:09:47,107][10367] Updated weights for policy 0, policy_version 3320 (0.0030) [2024-06-05 18:09:48,920][10130] Fps is (10 sec: 45875.0, 60 sec: 49152.0, 300 sec: 49485.2). Total num frames: 54476800. Throughput: 0: 49515.2. Samples: 54579800. Policy #0 lag: (min: 0.0, avg: 10.7, max: 21.0) [2024-06-05 18:09:48,920][10130] Avg episode reward: [(0, '0.002')] [2024-06-05 18:09:50,404][10367] Updated weights for policy 0, policy_version 3330 (0.0022) [2024-06-05 18:09:53,880][10367] Updated weights for policy 0, policy_version 3340 (0.0035) [2024-06-05 18:09:53,920][10130] Fps is (10 sec: 49151.7, 60 sec: 49152.1, 300 sec: 49429.7). Total num frames: 54722560. Throughput: 0: 49396.9. Samples: 54875220. Policy #0 lag: (min: 0.0, avg: 11.2, max: 22.0) [2024-06-05 18:09:53,920][10130] Avg episode reward: [(0, '0.003')] [2024-06-05 18:09:57,062][10367] Updated weights for policy 0, policy_version 3350 (0.0036) [2024-06-05 18:09:58,920][10130] Fps is (10 sec: 52428.3, 60 sec: 49425.0, 300 sec: 49540.8). Total num frames: 55001088. Throughput: 0: 49290.1. Samples: 55023340. Policy #0 lag: (min: 1.0, avg: 11.5, max: 23.0) [2024-06-05 18:09:58,921][10130] Avg episode reward: [(0, '0.003')] [2024-06-05 18:10:00,614][10367] Updated weights for policy 0, policy_version 3360 (0.0037) [2024-06-05 18:10:03,745][10367] Updated weights for policy 0, policy_version 3370 (0.0034) [2024-06-05 18:10:03,920][10130] Fps is (10 sec: 50790.6, 60 sec: 49425.2, 300 sec: 49429.7). Total num frames: 55230464. Throughput: 0: 49076.5. Samples: 55315000. Policy #0 lag: (min: 0.0, avg: 10.8, max: 21.0) [2024-06-05 18:10:03,920][10130] Avg episode reward: [(0, '0.003')] [2024-06-05 18:10:07,143][10367] Updated weights for policy 0, policy_version 3380 (0.0028) [2024-06-05 18:10:08,920][10130] Fps is (10 sec: 45876.0, 60 sec: 49152.1, 300 sec: 49429.7). Total num frames: 55459840. Throughput: 0: 48998.3. Samples: 55606080. Policy #0 lag: (min: 0.0, avg: 11.2, max: 24.0) [2024-06-05 18:10:08,920][10130] Avg episode reward: [(0, '0.002')] [2024-06-05 18:10:10,340][10367] Updated weights for policy 0, policy_version 3390 (0.0023) [2024-06-05 18:10:13,617][10367] Updated weights for policy 0, policy_version 3400 (0.0031) [2024-06-05 18:10:13,920][10130] Fps is (10 sec: 47513.4, 60 sec: 49152.0, 300 sec: 49318.6). Total num frames: 55705600. Throughput: 0: 48973.3. Samples: 55755880. 
Policy #0 lag: (min: 1.0, avg: 9.4, max: 22.0) [2024-06-05 18:10:13,920][10130] Avg episode reward: [(0, '0.002')] [2024-06-05 18:10:17,175][10367] Updated weights for policy 0, policy_version 3410 (0.0023) [2024-06-05 18:10:18,920][10130] Fps is (10 sec: 52427.4, 60 sec: 49151.9, 300 sec: 49540.7). Total num frames: 55984128. Throughput: 0: 49152.2. Samples: 56049960. Policy #0 lag: (min: 0.0, avg: 12.9, max: 21.0) [2024-06-05 18:10:18,921][10130] Avg episode reward: [(0, '0.002')] [2024-06-05 18:10:20,617][10367] Updated weights for policy 0, policy_version 3420 (0.0026) [2024-06-05 18:10:23,802][10347] Signal inference workers to stop experience collection... (800 times) [2024-06-05 18:10:23,848][10367] InferenceWorker_p0-w0: stopping experience collection (800 times) [2024-06-05 18:10:23,851][10347] Signal inference workers to resume experience collection... (800 times) [2024-06-05 18:10:23,859][10367] InferenceWorker_p0-w0: resuming experience collection (800 times) [2024-06-05 18:10:23,861][10367] Updated weights for policy 0, policy_version 3430 (0.0031) [2024-06-05 18:10:23,920][10130] Fps is (10 sec: 49151.7, 60 sec: 48878.9, 300 sec: 49374.2). Total num frames: 56197120. Throughput: 0: 49295.4. Samples: 56350100. Policy #0 lag: (min: 0.0, avg: 8.3, max: 20.0) [2024-06-05 18:10:23,920][10130] Avg episode reward: [(0, '0.003')] [2024-06-05 18:10:27,387][10367] Updated weights for policy 0, policy_version 3440 (0.0023) [2024-06-05 18:10:28,920][10130] Fps is (10 sec: 45875.4, 60 sec: 49151.8, 300 sec: 49485.2). Total num frames: 56442880. Throughput: 0: 48974.0. Samples: 56485900. Policy #0 lag: (min: 0.0, avg: 8.5, max: 21.0) [2024-06-05 18:10:28,929][10130] Avg episode reward: [(0, '0.004')] [2024-06-05 18:10:30,402][10367] Updated weights for policy 0, policy_version 3450 (0.0036) [2024-06-05 18:10:33,920][10130] Fps is (10 sec: 47513.9, 60 sec: 48878.9, 300 sec: 49263.1). Total num frames: 56672256. Throughput: 0: 48725.3. Samples: 56772440. Policy #0 lag: (min: 0.0, avg: 11.1, max: 23.0) [2024-06-05 18:10:33,920][10130] Avg episode reward: [(0, '0.001')] [2024-06-05 18:10:33,932][10347] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000003460_56688640.pth... [2024-06-05 18:10:33,937][10367] Updated weights for policy 0, policy_version 3460 (0.0028) [2024-06-05 18:10:33,973][10347] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000002738_44859392.pth [2024-06-05 18:10:37,237][10367] Updated weights for policy 0, policy_version 3470 (0.0032) [2024-06-05 18:10:38,920][10130] Fps is (10 sec: 50791.2, 60 sec: 48878.9, 300 sec: 49429.7). Total num frames: 56950784. Throughput: 0: 48778.2. Samples: 57070240. Policy #0 lag: (min: 0.0, avg: 10.3, max: 21.0) [2024-06-05 18:10:38,920][10130] Avg episode reward: [(0, '0.004')] [2024-06-05 18:10:40,476][10367] Updated weights for policy 0, policy_version 3480 (0.0025) [2024-06-05 18:10:43,882][10367] Updated weights for policy 0, policy_version 3490 (0.0020) [2024-06-05 18:10:43,920][10130] Fps is (10 sec: 50789.6, 60 sec: 49151.8, 300 sec: 49318.6). Total num frames: 57180160. Throughput: 0: 48892.8. Samples: 57223520. Policy #0 lag: (min: 0.0, avg: 11.0, max: 22.0) [2024-06-05 18:10:43,921][10130] Avg episode reward: [(0, '0.002')] [2024-06-05 18:10:47,342][10367] Updated weights for policy 0, policy_version 3500 (0.0034) [2024-06-05 18:10:48,920][10130] Fps is (10 sec: 47513.3, 60 sec: 49152.0, 300 sec: 49374.2). Total num frames: 57425920. Throughput: 0: 48999.4. Samples: 57519980. 
Policy #0 lag: (min: 0.0, avg: 11.0, max: 22.0) [2024-06-05 18:10:48,920][10130] Avg episode reward: [(0, '0.005')] [2024-06-05 18:10:48,921][10347] Saving new best policy, reward=0.005! [2024-06-05 18:10:50,556][10367] Updated weights for policy 0, policy_version 3510 (0.0023) [2024-06-05 18:10:53,920][10130] Fps is (10 sec: 47513.6, 60 sec: 48878.8, 300 sec: 49263.0). Total num frames: 57655296. Throughput: 0: 49058.8. Samples: 57813740. Policy #0 lag: (min: 0.0, avg: 11.2, max: 23.0) [2024-06-05 18:10:53,921][10130] Avg episode reward: [(0, '0.004')] [2024-06-05 18:10:54,228][10367] Updated weights for policy 0, policy_version 3520 (0.0029) [2024-06-05 18:10:57,164][10367] Updated weights for policy 0, policy_version 3530 (0.0025) [2024-06-05 18:10:58,920][10130] Fps is (10 sec: 50790.7, 60 sec: 48879.0, 300 sec: 49374.1). Total num frames: 57933824. Throughput: 0: 49097.8. Samples: 57965280. Policy #0 lag: (min: 1.0, avg: 11.0, max: 23.0) [2024-06-05 18:10:58,920][10130] Avg episode reward: [(0, '0.003')] [2024-06-05 18:11:00,607][10367] Updated weights for policy 0, policy_version 3540 (0.0021) [2024-06-05 18:11:03,883][10367] Updated weights for policy 0, policy_version 3550 (0.0038) [2024-06-05 18:11:03,920][10130] Fps is (10 sec: 50791.4, 60 sec: 48878.9, 300 sec: 49318.6). Total num frames: 58163200. Throughput: 0: 49026.0. Samples: 58256120. Policy #0 lag: (min: 0.0, avg: 10.2, max: 23.0) [2024-06-05 18:11:03,920][10130] Avg episode reward: [(0, '0.004')] [2024-06-05 18:11:07,174][10367] Updated weights for policy 0, policy_version 3560 (0.0035) [2024-06-05 18:11:08,920][10130] Fps is (10 sec: 47513.4, 60 sec: 49151.9, 300 sec: 49374.2). Total num frames: 58408960. Throughput: 0: 48983.1. Samples: 58554340. Policy #0 lag: (min: 0.0, avg: 10.4, max: 21.0) [2024-06-05 18:11:08,920][10130] Avg episode reward: [(0, '0.003')] [2024-06-05 18:11:10,592][10367] Updated weights for policy 0, policy_version 3570 (0.0032) [2024-06-05 18:11:13,920][10130] Fps is (10 sec: 47513.1, 60 sec: 48878.9, 300 sec: 49207.5). Total num frames: 58638336. Throughput: 0: 49009.8. Samples: 58691340. Policy #0 lag: (min: 2.0, avg: 9.8, max: 19.0) [2024-06-05 18:11:13,921][10130] Avg episode reward: [(0, '0.004')] [2024-06-05 18:11:13,947][10367] Updated weights for policy 0, policy_version 3580 (0.0027) [2024-06-05 18:11:17,127][10367] Updated weights for policy 0, policy_version 3590 (0.0025) [2024-06-05 18:11:18,920][10130] Fps is (10 sec: 50790.6, 60 sec: 48879.1, 300 sec: 49318.6). Total num frames: 58916864. Throughput: 0: 49374.7. Samples: 58994300. Policy #0 lag: (min: 0.0, avg: 10.4, max: 21.0) [2024-06-05 18:11:18,920][10130] Avg episode reward: [(0, '0.004')] [2024-06-05 18:11:20,446][10367] Updated weights for policy 0, policy_version 3600 (0.0022) [2024-06-05 18:11:23,638][10347] Signal inference workers to stop experience collection... (850 times) [2024-06-05 18:11:23,678][10367] InferenceWorker_p0-w0: stopping experience collection (850 times) [2024-06-05 18:11:23,690][10347] Signal inference workers to resume experience collection... (850 times) [2024-06-05 18:11:23,696][10367] InferenceWorker_p0-w0: resuming experience collection (850 times) [2024-06-05 18:11:23,699][10367] Updated weights for policy 0, policy_version 3610 (0.0030) [2024-06-05 18:11:23,920][10130] Fps is (10 sec: 52428.2, 60 sec: 49424.9, 300 sec: 49318.6). Total num frames: 59162624. Throughput: 0: 49548.2. Samples: 59299920. 
Policy #0 lag: (min: 2.0, avg: 10.7, max: 24.0) [2024-06-05 18:11:23,921][10130] Avg episode reward: [(0, '0.002')] [2024-06-05 18:11:26,919][10367] Updated weights for policy 0, policy_version 3620 (0.0029) [2024-06-05 18:11:28,920][10130] Fps is (10 sec: 47513.8, 60 sec: 49152.2, 300 sec: 49318.6). Total num frames: 59392000. Throughput: 0: 49282.9. Samples: 59441240. Policy #0 lag: (min: 0.0, avg: 9.7, max: 22.0) [2024-06-05 18:11:28,920][10130] Avg episode reward: [(0, '0.003')] [2024-06-05 18:11:30,264][10367] Updated weights for policy 0, policy_version 3630 (0.0027) [2024-06-05 18:11:33,497][10367] Updated weights for policy 0, policy_version 3640 (0.0029) [2024-06-05 18:11:33,920][10130] Fps is (10 sec: 49152.6, 60 sec: 49698.1, 300 sec: 49374.1). Total num frames: 59654144. Throughput: 0: 49317.7. Samples: 59739280. Policy #0 lag: (min: 0.0, avg: 10.5, max: 21.0) [2024-06-05 18:11:33,920][10130] Avg episode reward: [(0, '0.005')] [2024-06-05 18:11:36,893][10367] Updated weights for policy 0, policy_version 3650 (0.0032) [2024-06-05 18:11:38,920][10130] Fps is (10 sec: 52428.8, 60 sec: 49425.1, 300 sec: 49374.2). Total num frames: 59916288. Throughput: 0: 49230.9. Samples: 60029120. Policy #0 lag: (min: 0.0, avg: 10.0, max: 19.0) [2024-06-05 18:11:38,920][10130] Avg episode reward: [(0, '0.005')] [2024-06-05 18:11:40,276][10367] Updated weights for policy 0, policy_version 3660 (0.0035) [2024-06-05 18:11:43,594][10367] Updated weights for policy 0, policy_version 3670 (0.0027) [2024-06-05 18:11:43,920][10130] Fps is (10 sec: 50791.1, 60 sec: 49698.3, 300 sec: 49374.2). Total num frames: 60162048. Throughput: 0: 49351.2. Samples: 60186080. Policy #0 lag: (min: 1.0, avg: 9.9, max: 21.0) [2024-06-05 18:11:43,920][10130] Avg episode reward: [(0, '0.003')] [2024-06-05 18:11:46,828][10367] Updated weights for policy 0, policy_version 3680 (0.0020) [2024-06-05 18:11:48,920][10130] Fps is (10 sec: 45874.7, 60 sec: 49152.0, 300 sec: 49263.1). Total num frames: 60375040. Throughput: 0: 49556.8. Samples: 60486180. Policy #0 lag: (min: 0.0, avg: 11.1, max: 21.0) [2024-06-05 18:11:48,920][10130] Avg episode reward: [(0, '0.003')] [2024-06-05 18:11:50,163][10367] Updated weights for policy 0, policy_version 3690 (0.0022) [2024-06-05 18:11:53,334][10367] Updated weights for policy 0, policy_version 3700 (0.0038) [2024-06-05 18:11:53,920][10130] Fps is (10 sec: 45874.7, 60 sec: 49425.2, 300 sec: 49263.1). Total num frames: 60620800. Throughput: 0: 49396.9. Samples: 60777200. Policy #0 lag: (min: 1.0, avg: 12.4, max: 25.0) [2024-06-05 18:11:53,920][10130] Avg episode reward: [(0, '0.003')] [2024-06-05 18:11:56,937][10367] Updated weights for policy 0, policy_version 3710 (0.0026) [2024-06-05 18:11:58,920][10130] Fps is (10 sec: 50790.6, 60 sec: 49152.0, 300 sec: 49318.6). Total num frames: 60882944. Throughput: 0: 49817.4. Samples: 60933120. Policy #0 lag: (min: 0.0, avg: 9.4, max: 21.0) [2024-06-05 18:11:58,920][10130] Avg episode reward: [(0, '0.004')] [2024-06-05 18:12:00,338][10367] Updated weights for policy 0, policy_version 3720 (0.0031) [2024-06-05 18:12:03,649][10367] Updated weights for policy 0, policy_version 3730 (0.0037) [2024-06-05 18:12:03,920][10130] Fps is (10 sec: 50790.4, 60 sec: 49425.0, 300 sec: 49318.6). Total num frames: 61128704. Throughput: 0: 49399.9. Samples: 61217300. 
Policy #0 lag: (min: 1.0, avg: 8.0, max: 19.0) [2024-06-05 18:12:03,920][10130] Avg episode reward: [(0, '0.004')] [2024-06-05 18:12:07,101][10367] Updated weights for policy 0, policy_version 3740 (0.0047) [2024-06-05 18:12:08,920][10130] Fps is (10 sec: 47512.8, 60 sec: 49151.9, 300 sec: 49263.1). Total num frames: 61358080. Throughput: 0: 49011.1. Samples: 61505420. Policy #0 lag: (min: 0.0, avg: 9.2, max: 21.0) [2024-06-05 18:12:08,920][10130] Avg episode reward: [(0, '0.002')] [2024-06-05 18:12:10,279][10367] Updated weights for policy 0, policy_version 3750 (0.0022) [2024-06-05 18:12:13,494][10367] Updated weights for policy 0, policy_version 3760 (0.0028) [2024-06-05 18:12:13,920][10130] Fps is (10 sec: 47513.6, 60 sec: 49425.1, 300 sec: 49263.1). Total num frames: 61603840. Throughput: 0: 48998.5. Samples: 61646180. Policy #0 lag: (min: 0.0, avg: 12.5, max: 22.0) [2024-06-05 18:12:13,920][10130] Avg episode reward: [(0, '0.004')] [2024-06-05 18:12:16,446][10347] Signal inference workers to stop experience collection... (900 times) [2024-06-05 18:12:16,494][10367] InferenceWorker_p0-w0: stopping experience collection (900 times) [2024-06-05 18:12:16,500][10347] Signal inference workers to resume experience collection... (900 times) [2024-06-05 18:12:16,508][10367] InferenceWorker_p0-w0: resuming experience collection (900 times) [2024-06-05 18:12:17,004][10367] Updated weights for policy 0, policy_version 3770 (0.0033) [2024-06-05 18:12:18,920][10130] Fps is (10 sec: 50790.9, 60 sec: 49151.9, 300 sec: 49318.6). Total num frames: 61865984. Throughput: 0: 49091.1. Samples: 61948380. Policy #0 lag: (min: 0.0, avg: 10.2, max: 22.0) [2024-06-05 18:12:18,921][10130] Avg episode reward: [(0, '0.003')] [2024-06-05 18:12:20,137][10367] Updated weights for policy 0, policy_version 3780 (0.0027) [2024-06-05 18:12:23,579][10367] Updated weights for policy 0, policy_version 3790 (0.0031) [2024-06-05 18:12:23,920][10130] Fps is (10 sec: 50790.2, 60 sec: 49152.1, 300 sec: 49318.6). Total num frames: 62111744. Throughput: 0: 49406.5. Samples: 62252420. Policy #0 lag: (min: 0.0, avg: 10.0, max: 22.0) [2024-06-05 18:12:23,920][10130] Avg episode reward: [(0, '0.004')] [2024-06-05 18:12:27,065][10367] Updated weights for policy 0, policy_version 3800 (0.0030) [2024-06-05 18:12:28,920][10130] Fps is (10 sec: 47514.2, 60 sec: 49152.0, 300 sec: 49263.1). Total num frames: 62341120. Throughput: 0: 48964.4. Samples: 62389480. Policy #0 lag: (min: 0.0, avg: 12.0, max: 23.0) [2024-06-05 18:12:28,920][10130] Avg episode reward: [(0, '0.003')] [2024-06-05 18:12:30,435][10367] Updated weights for policy 0, policy_version 3810 (0.0043) [2024-06-05 18:12:33,826][10367] Updated weights for policy 0, policy_version 3820 (0.0023) [2024-06-05 18:12:33,920][10130] Fps is (10 sec: 47513.3, 60 sec: 48878.9, 300 sec: 49263.0). Total num frames: 62586880. Throughput: 0: 48728.8. Samples: 62678980. Policy #0 lag: (min: 0.0, avg: 10.0, max: 23.0) [2024-06-05 18:12:33,921][10130] Avg episode reward: [(0, '0.003')] [2024-06-05 18:12:33,935][10347] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000003820_62586880.pth... [2024-06-05 18:12:33,984][10347] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000003098_50757632.pth [2024-06-05 18:12:37,124][10367] Updated weights for policy 0, policy_version 3830 (0.0030) [2024-06-05 18:12:38,920][10130] Fps is (10 sec: 50790.4, 60 sec: 48878.9, 300 sec: 49318.6). Total num frames: 62849024. Throughput: 0: 48750.3. Samples: 62970960. 
Policy #0 lag: (min: 1.0, avg: 9.9, max: 22.0) [2024-06-05 18:12:38,920][10130] Avg episode reward: [(0, '0.003')] [2024-06-05 18:12:40,456][10367] Updated weights for policy 0, policy_version 3840 (0.0045) [2024-06-05 18:12:43,920][10130] Fps is (10 sec: 47514.5, 60 sec: 48332.8, 300 sec: 49263.7). Total num frames: 63062016. Throughput: 0: 48675.6. Samples: 63123520. Policy #0 lag: (min: 0.0, avg: 12.0, max: 21.0) [2024-06-05 18:12:43,920][10130] Avg episode reward: [(0, '0.004')] [2024-06-05 18:12:43,934][10367] Updated weights for policy 0, policy_version 3850 (0.0031) [2024-06-05 18:12:47,303][10367] Updated weights for policy 0, policy_version 3860 (0.0029) [2024-06-05 18:12:48,920][10130] Fps is (10 sec: 45874.9, 60 sec: 48879.0, 300 sec: 49207.9). Total num frames: 63307776. Throughput: 0: 48710.7. Samples: 63409280. Policy #0 lag: (min: 1.0, avg: 10.0, max: 21.0) [2024-06-05 18:12:48,920][10130] Avg episode reward: [(0, '0.002')] [2024-06-05 18:12:50,676][10367] Updated weights for policy 0, policy_version 3870 (0.0026) [2024-06-05 18:12:53,920][10130] Fps is (10 sec: 49151.2, 60 sec: 48878.9, 300 sec: 49207.5). Total num frames: 63553536. Throughput: 0: 48662.3. Samples: 63695220. Policy #0 lag: (min: 0.0, avg: 12.1, max: 21.0) [2024-06-05 18:12:53,921][10130] Avg episode reward: [(0, '0.005')] [2024-06-05 18:12:54,247][10367] Updated weights for policy 0, policy_version 3880 (0.0027) [2024-06-05 18:12:57,529][10367] Updated weights for policy 0, policy_version 3890 (0.0027) [2024-06-05 18:12:58,920][10130] Fps is (10 sec: 50790.0, 60 sec: 48878.9, 300 sec: 49263.1). Total num frames: 63815680. Throughput: 0: 48908.0. Samples: 63847040. Policy #0 lag: (min: 0.0, avg: 8.9, max: 22.0) [2024-06-05 18:12:58,920][10130] Avg episode reward: [(0, '0.004')] [2024-06-05 18:13:00,938][10367] Updated weights for policy 0, policy_version 3900 (0.0028) [2024-06-05 18:13:03,920][10130] Fps is (10 sec: 47514.7, 60 sec: 48332.9, 300 sec: 49152.0). Total num frames: 64028672. Throughput: 0: 48683.3. Samples: 64139120. Policy #0 lag: (min: 0.0, avg: 9.2, max: 21.0) [2024-06-05 18:13:03,920][10130] Avg episode reward: [(0, '0.005')] [2024-06-05 18:13:04,096][10367] Updated weights for policy 0, policy_version 3910 (0.0023) [2024-06-05 18:13:07,610][10367] Updated weights for policy 0, policy_version 3920 (0.0031) [2024-06-05 18:13:08,920][10130] Fps is (10 sec: 45874.8, 60 sec: 48605.9, 300 sec: 49152.0). Total num frames: 64274432. Throughput: 0: 48501.7. Samples: 64435000. Policy #0 lag: (min: 0.0, avg: 11.2, max: 21.0) [2024-06-05 18:13:08,921][10130] Avg episode reward: [(0, '0.004')] [2024-06-05 18:13:10,800][10367] Updated weights for policy 0, policy_version 3930 (0.0018) [2024-06-05 18:13:13,920][10130] Fps is (10 sec: 50790.0, 60 sec: 48879.0, 300 sec: 49096.5). Total num frames: 64536576. Throughput: 0: 48576.0. Samples: 64575400. Policy #0 lag: (min: 0.0, avg: 9.8, max: 21.0) [2024-06-05 18:13:13,920][10130] Avg episode reward: [(0, '0.004')] [2024-06-05 18:13:14,180][10367] Updated weights for policy 0, policy_version 3940 (0.0035) [2024-06-05 18:13:17,441][10347] Signal inference workers to stop experience collection... (950 times) [2024-06-05 18:13:17,441][10347] Signal inference workers to resume experience collection... 
(950 times) [2024-06-05 18:13:17,491][10367] InferenceWorker_p0-w0: stopping experience collection (950 times) [2024-06-05 18:13:17,492][10367] InferenceWorker_p0-w0: resuming experience collection (950 times) [2024-06-05 18:13:17,573][10367] Updated weights for policy 0, policy_version 3950 (0.0034) [2024-06-05 18:13:18,923][10130] Fps is (10 sec: 52414.4, 60 sec: 48876.6, 300 sec: 49151.5). Total num frames: 64798720. Throughput: 0: 48669.9. Samples: 64869260. Policy #0 lag: (min: 0.0, avg: 9.5, max: 21.0) [2024-06-05 18:13:18,923][10130] Avg episode reward: [(0, '0.003')] [2024-06-05 18:13:21,041][10367] Updated weights for policy 0, policy_version 3960 (0.0030) [2024-06-05 18:13:23,920][10130] Fps is (10 sec: 45875.4, 60 sec: 48059.8, 300 sec: 49040.9). Total num frames: 64995328. Throughput: 0: 48584.9. Samples: 65157280. Policy #0 lag: (min: 0.0, avg: 11.0, max: 22.0) [2024-06-05 18:13:23,920][10130] Avg episode reward: [(0, '0.003')] [2024-06-05 18:13:24,492][10367] Updated weights for policy 0, policy_version 3970 (0.0022) [2024-06-05 18:13:27,516][10367] Updated weights for policy 0, policy_version 3980 (0.0018) [2024-06-05 18:13:28,920][10130] Fps is (10 sec: 42607.9, 60 sec: 48059.2, 300 sec: 49040.8). Total num frames: 65224704. Throughput: 0: 48362.8. Samples: 65299880. Policy #0 lag: (min: 1.0, avg: 12.2, max: 25.0) [2024-06-05 18:13:28,921][10130] Avg episode reward: [(0, '0.004')] [2024-06-05 18:13:31,229][10367] Updated weights for policy 0, policy_version 3990 (0.0024) [2024-06-05 18:13:33,920][10130] Fps is (10 sec: 52428.2, 60 sec: 48879.0, 300 sec: 49096.4). Total num frames: 65519616. Throughput: 0: 48459.0. Samples: 65589940. Policy #0 lag: (min: 0.0, avg: 10.3, max: 22.0) [2024-06-05 18:13:33,920][10130] Avg episode reward: [(0, '0.003')] [2024-06-05 18:13:34,592][10367] Updated weights for policy 0, policy_version 4000 (0.0023) [2024-06-05 18:13:37,789][10367] Updated weights for policy 0, policy_version 4010 (0.0030) [2024-06-05 18:13:38,920][10130] Fps is (10 sec: 55709.5, 60 sec: 48878.9, 300 sec: 49152.0). Total num frames: 65781760. Throughput: 0: 48744.2. Samples: 65888700. Policy #0 lag: (min: 0.0, avg: 9.7, max: 21.0) [2024-06-05 18:13:38,920][10130] Avg episode reward: [(0, '0.002')] [2024-06-05 18:13:41,229][10367] Updated weights for policy 0, policy_version 4020 (0.0031) [2024-06-05 18:13:43,920][10130] Fps is (10 sec: 45875.6, 60 sec: 48605.9, 300 sec: 48985.4). Total num frames: 65978368. Throughput: 0: 48718.8. Samples: 66039380. Policy #0 lag: (min: 0.0, avg: 9.2, max: 22.0) [2024-06-05 18:13:43,920][10130] Avg episode reward: [(0, '0.005')] [2024-06-05 18:13:44,546][10367] Updated weights for policy 0, policy_version 4030 (0.0038) [2024-06-05 18:13:48,015][10367] Updated weights for policy 0, policy_version 4040 (0.0026) [2024-06-05 18:13:48,920][10130] Fps is (10 sec: 42598.3, 60 sec: 48332.8, 300 sec: 48929.9). Total num frames: 66207744. Throughput: 0: 48575.0. Samples: 66325000. Policy #0 lag: (min: 0.0, avg: 9.1, max: 21.0) [2024-06-05 18:13:48,920][10130] Avg episode reward: [(0, '0.006')] [2024-06-05 18:13:51,473][10367] Updated weights for policy 0, policy_version 4050 (0.0041) [2024-06-05 18:13:53,920][10130] Fps is (10 sec: 50790.4, 60 sec: 48879.1, 300 sec: 48985.4). Total num frames: 66486272. Throughput: 0: 48392.2. Samples: 66612640. 
Policy #0 lag: (min: 0.0, avg: 12.9, max: 22.0) [2024-06-05 18:13:53,920][10130] Avg episode reward: [(0, '0.005')] [2024-06-05 18:13:54,586][10367] Updated weights for policy 0, policy_version 4060 (0.0026) [2024-06-05 18:13:58,002][10367] Updated weights for policy 0, policy_version 4070 (0.0028) [2024-06-05 18:13:58,920][10130] Fps is (10 sec: 54066.3, 60 sec: 48878.9, 300 sec: 49096.4). Total num frames: 66748416. Throughput: 0: 48756.7. Samples: 66769460. Policy #0 lag: (min: 0.0, avg: 8.6, max: 21.0) [2024-06-05 18:13:58,921][10130] Avg episode reward: [(0, '0.004')] [2024-06-05 18:14:01,354][10367] Updated weights for policy 0, policy_version 4080 (0.0034) [2024-06-05 18:14:03,920][10130] Fps is (10 sec: 47513.9, 60 sec: 48878.9, 300 sec: 48985.4). Total num frames: 66961408. Throughput: 0: 48665.9. Samples: 67059080. Policy #0 lag: (min: 0.0, avg: 9.1, max: 21.0) [2024-06-05 18:14:03,920][10130] Avg episode reward: [(0, '0.004')] [2024-06-05 18:14:04,782][10367] Updated weights for policy 0, policy_version 4090 (0.0024) [2024-06-05 18:14:08,246][10367] Updated weights for policy 0, policy_version 4100 (0.0033) [2024-06-05 18:14:08,920][10130] Fps is (10 sec: 44237.3, 60 sec: 48606.0, 300 sec: 48929.8). Total num frames: 67190784. Throughput: 0: 48763.9. Samples: 67351660. Policy #0 lag: (min: 0.0, avg: 11.0, max: 21.0) [2024-06-05 18:14:08,920][10130] Avg episode reward: [(0, '0.005')] [2024-06-05 18:14:11,514][10367] Updated weights for policy 0, policy_version 4110 (0.0028) [2024-06-05 18:14:13,920][10130] Fps is (10 sec: 50789.9, 60 sec: 48878.9, 300 sec: 48929.8). Total num frames: 67469312. Throughput: 0: 48859.3. Samples: 67498520. Policy #0 lag: (min: 1.0, avg: 11.4, max: 23.0) [2024-06-05 18:14:13,920][10130] Avg episode reward: [(0, '0.004')] [2024-06-05 18:14:14,848][10367] Updated weights for policy 0, policy_version 4120 (0.0033) [2024-06-05 18:14:18,375][10367] Updated weights for policy 0, policy_version 4130 (0.0038) [2024-06-05 18:14:18,920][10130] Fps is (10 sec: 50790.7, 60 sec: 48335.2, 300 sec: 48929.8). Total num frames: 67698688. Throughput: 0: 48785.4. Samples: 67785280. Policy #0 lag: (min: 1.0, avg: 8.3, max: 21.0) [2024-06-05 18:14:18,920][10130] Avg episode reward: [(0, '0.004')] [2024-06-05 18:14:21,783][10367] Updated weights for policy 0, policy_version 4140 (0.0031) [2024-06-05 18:14:23,920][10130] Fps is (10 sec: 45875.6, 60 sec: 48878.9, 300 sec: 48929.8). Total num frames: 67928064. Throughput: 0: 48765.8. Samples: 68083160. Policy #0 lag: (min: 0.0, avg: 11.1, max: 21.0) [2024-06-05 18:14:23,920][10130] Avg episode reward: [(0, '0.005')] [2024-06-05 18:14:24,576][10347] Signal inference workers to stop experience collection... (1000 times) [2024-06-05 18:14:24,576][10347] Signal inference workers to resume experience collection... (1000 times) [2024-06-05 18:14:24,595][10367] InferenceWorker_p0-w0: stopping experience collection (1000 times) [2024-06-05 18:14:24,595][10367] InferenceWorker_p0-w0: resuming experience collection (1000 times) [2024-06-05 18:14:24,869][10367] Updated weights for policy 0, policy_version 4150 (0.0030) [2024-06-05 18:14:28,383][10367] Updated weights for policy 0, policy_version 4160 (0.0030) [2024-06-05 18:14:28,920][10130] Fps is (10 sec: 47512.9, 60 sec: 49152.4, 300 sec: 48929.8). Total num frames: 68173824. Throughput: 0: 48502.1. Samples: 68221980. 
Policy #0 lag: (min: 0.0, avg: 11.5, max: 23.0) [2024-06-05 18:14:28,921][10130] Avg episode reward: [(0, '0.004')] [2024-06-05 18:14:31,350][10367] Updated weights for policy 0, policy_version 4170 (0.0027) [2024-06-05 18:14:33,924][10130] Fps is (10 sec: 52408.9, 60 sec: 48875.9, 300 sec: 48929.2). Total num frames: 68452352. Throughput: 0: 48815.5. Samples: 68521880. Policy #0 lag: (min: 1.0, avg: 10.0, max: 22.0) [2024-06-05 18:14:33,924][10130] Avg episode reward: [(0, '0.006')] [2024-06-05 18:14:33,936][10347] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000004178_68452352.pth... [2024-06-05 18:14:33,979][10347] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000003460_56688640.pth [2024-06-05 18:14:33,982][10347] Saving new best policy, reward=0.006! [2024-06-05 18:14:35,028][10367] Updated weights for policy 0, policy_version 4180 (0.0041) [2024-06-05 18:14:38,147][10367] Updated weights for policy 0, policy_version 4190 (0.0033) [2024-06-05 18:14:38,920][10130] Fps is (10 sec: 52429.4, 60 sec: 48605.8, 300 sec: 49040.9). Total num frames: 68698112. Throughput: 0: 49030.2. Samples: 68819000. Policy #0 lag: (min: 0.0, avg: 10.0, max: 21.0) [2024-06-05 18:14:38,920][10130] Avg episode reward: [(0, '0.006')] [2024-06-05 18:14:41,640][10367] Updated weights for policy 0, policy_version 4200 (0.0029) [2024-06-05 18:14:43,920][10130] Fps is (10 sec: 45892.2, 60 sec: 48878.9, 300 sec: 48929.8). Total num frames: 68911104. Throughput: 0: 48780.6. Samples: 68964580. Policy #0 lag: (min: 0.0, avg: 11.8, max: 22.0) [2024-06-05 18:14:43,920][10130] Avg episode reward: [(0, '0.006')] [2024-06-05 18:14:45,022][10367] Updated weights for policy 0, policy_version 4210 (0.0027) [2024-06-05 18:14:48,421][10367] Updated weights for policy 0, policy_version 4220 (0.0028) [2024-06-05 18:14:48,920][10130] Fps is (10 sec: 45874.9, 60 sec: 49151.9, 300 sec: 48929.8). Total num frames: 69156864. Throughput: 0: 48812.3. Samples: 69255640. Policy #0 lag: (min: 0.0, avg: 8.9, max: 21.0) [2024-06-05 18:14:48,921][10130] Avg episode reward: [(0, '0.004')] [2024-06-05 18:14:51,454][10367] Updated weights for policy 0, policy_version 4230 (0.0035) [2024-06-05 18:14:53,920][10130] Fps is (10 sec: 52429.1, 60 sec: 49152.0, 300 sec: 48929.9). Total num frames: 69435392. Throughput: 0: 48889.4. Samples: 69551680. Policy #0 lag: (min: 0.0, avg: 9.2, max: 21.0) [2024-06-05 18:14:53,920][10130] Avg episode reward: [(0, '0.007')] [2024-06-05 18:14:55,027][10367] Updated weights for policy 0, policy_version 4240 (0.0041) [2024-06-05 18:14:58,047][10367] Updated weights for policy 0, policy_version 4250 (0.0027) [2024-06-05 18:14:58,920][10130] Fps is (10 sec: 52429.4, 60 sec: 48879.1, 300 sec: 48985.4). Total num frames: 69681152. Throughput: 0: 49213.9. Samples: 69713140. Policy #0 lag: (min: 0.0, avg: 11.1, max: 22.0) [2024-06-05 18:14:58,920][10130] Avg episode reward: [(0, '0.005')] [2024-06-05 18:15:01,773][10367] Updated weights for policy 0, policy_version 4260 (0.0037) [2024-06-05 18:15:03,924][10130] Fps is (10 sec: 47496.1, 60 sec: 49148.9, 300 sec: 48984.8). Total num frames: 69910528. Throughput: 0: 49217.7. Samples: 70000260. 
Policy #0 lag: (min: 0.0, avg: 10.4, max: 22.0) [2024-06-05 18:15:03,924][10130] Avg episode reward: [(0, '0.006')] [2024-06-05 18:15:04,951][10367] Updated weights for policy 0, policy_version 4270 (0.0035) [2024-06-05 18:15:08,317][10367] Updated weights for policy 0, policy_version 4280 (0.0035) [2024-06-05 18:15:08,920][10130] Fps is (10 sec: 44236.6, 60 sec: 48879.0, 300 sec: 48874.3). Total num frames: 70123520. Throughput: 0: 49046.2. Samples: 70290240. Policy #0 lag: (min: 0.0, avg: 10.4, max: 22.0) [2024-06-05 18:15:08,920][10130] Avg episode reward: [(0, '0.004')] [2024-06-05 18:15:11,673][10367] Updated weights for policy 0, policy_version 4290 (0.0024) [2024-06-05 18:15:13,920][10130] Fps is (10 sec: 49170.4, 60 sec: 48879.0, 300 sec: 48874.3). Total num frames: 70402048. Throughput: 0: 49359.3. Samples: 70443140. Policy #0 lag: (min: 0.0, avg: 11.8, max: 21.0) [2024-06-05 18:15:13,920][10130] Avg episode reward: [(0, '0.008')] [2024-06-05 18:15:13,983][10347] Saving new best policy, reward=0.008! [2024-06-05 18:15:15,044][10367] Updated weights for policy 0, policy_version 4300 (0.0043) [2024-06-05 18:15:18,260][10367] Updated weights for policy 0, policy_version 4310 (0.0036) [2024-06-05 18:15:18,920][10130] Fps is (10 sec: 52428.8, 60 sec: 49152.0, 300 sec: 48985.4). Total num frames: 70647808. Throughput: 0: 49237.5. Samples: 70737380. Policy #0 lag: (min: 0.0, avg: 9.5, max: 21.0) [2024-06-05 18:15:18,920][10130] Avg episode reward: [(0, '0.004')] [2024-06-05 18:15:21,743][10367] Updated weights for policy 0, policy_version 4320 (0.0020) [2024-06-05 18:15:23,920][10130] Fps is (10 sec: 49151.0, 60 sec: 49424.9, 300 sec: 48985.4). Total num frames: 70893568. Throughput: 0: 49255.4. Samples: 71035500. Policy #0 lag: (min: 0.0, avg: 8.6, max: 21.0) [2024-06-05 18:15:23,920][10130] Avg episode reward: [(0, '0.008')] [2024-06-05 18:15:24,854][10367] Updated weights for policy 0, policy_version 4330 (0.0026) [2024-06-05 18:15:28,462][10367] Updated weights for policy 0, policy_version 4340 (0.0028) [2024-06-05 18:15:28,920][10130] Fps is (10 sec: 47512.9, 60 sec: 49152.0, 300 sec: 48985.4). Total num frames: 71122944. Throughput: 0: 48995.9. Samples: 71169400. Policy #0 lag: (min: 0.0, avg: 10.6, max: 22.0) [2024-06-05 18:15:28,921][10130] Avg episode reward: [(0, '0.006')] [2024-06-05 18:15:31,725][10367] Updated weights for policy 0, policy_version 4350 (0.0034) [2024-06-05 18:15:33,920][10130] Fps is (10 sec: 49152.6, 60 sec: 48882.0, 300 sec: 48929.8). Total num frames: 71385088. Throughput: 0: 49076.0. Samples: 71464060. Policy #0 lag: (min: 0.0, avg: 11.1, max: 22.0) [2024-06-05 18:15:33,920][10130] Avg episode reward: [(0, '0.009')] [2024-06-05 18:15:34,865][10367] Updated weights for policy 0, policy_version 4360 (0.0030) [2024-06-05 18:15:38,424][10367] Updated weights for policy 0, policy_version 4370 (0.0029) [2024-06-05 18:15:38,920][10130] Fps is (10 sec: 52428.7, 60 sec: 49151.9, 300 sec: 49040.9). Total num frames: 71647232. Throughput: 0: 49336.7. Samples: 71771840. Policy #0 lag: (min: 0.0, avg: 8.6, max: 21.0) [2024-06-05 18:15:38,921][10130] Avg episode reward: [(0, '0.008')] [2024-06-05 18:15:40,566][10347] Signal inference workers to stop experience collection... (1050 times) [2024-06-05 18:15:40,586][10367] InferenceWorker_p0-w0: stopping experience collection (1050 times) [2024-06-05 18:15:40,672][10347] Signal inference workers to resume experience collection... 
(1050 times) [2024-06-05 18:15:40,673][10367] InferenceWorker_p0-w0: resuming experience collection (1050 times) [2024-06-05 18:15:41,813][10367] Updated weights for policy 0, policy_version 4380 (0.0026) [2024-06-05 18:15:43,920][10130] Fps is (10 sec: 47513.5, 60 sec: 49152.0, 300 sec: 48929.8). Total num frames: 71860224. Throughput: 0: 48759.0. Samples: 71907300. Policy #0 lag: (min: 0.0, avg: 9.7, max: 22.0) [2024-06-05 18:15:43,920][10130] Avg episode reward: [(0, '0.005')] [2024-06-05 18:15:44,958][10367] Updated weights for policy 0, policy_version 4390 (0.0030) [2024-06-05 18:15:48,556][10367] Updated weights for policy 0, policy_version 4400 (0.0025) [2024-06-05 18:15:48,920][10130] Fps is (10 sec: 45875.9, 60 sec: 49152.1, 300 sec: 48985.4). Total num frames: 72105984. Throughput: 0: 48916.4. Samples: 72201320. Policy #0 lag: (min: 0.0, avg: 10.0, max: 22.0) [2024-06-05 18:15:48,920][10130] Avg episode reward: [(0, '0.008')] [2024-06-05 18:15:51,699][10367] Updated weights for policy 0, policy_version 4410 (0.0032) [2024-06-05 18:15:53,920][10130] Fps is (10 sec: 49152.5, 60 sec: 48605.9, 300 sec: 48874.3). Total num frames: 72351744. Throughput: 0: 48901.4. Samples: 72490800. Policy #0 lag: (min: 0.0, avg: 10.4, max: 22.0) [2024-06-05 18:15:53,920][10130] Avg episode reward: [(0, '0.007')] [2024-06-05 18:15:55,278][10367] Updated weights for policy 0, policy_version 4420 (0.0028) [2024-06-05 18:15:58,493][10367] Updated weights for policy 0, policy_version 4430 (0.0029) [2024-06-05 18:15:58,920][10130] Fps is (10 sec: 50790.5, 60 sec: 48878.9, 300 sec: 48985.4). Total num frames: 72613888. Throughput: 0: 48832.4. Samples: 72640600. Policy #0 lag: (min: 0.0, avg: 9.6, max: 24.0) [2024-06-05 18:15:58,920][10130] Avg episode reward: [(0, '0.008')] [2024-06-05 18:16:01,878][10367] Updated weights for policy 0, policy_version 4440 (0.0029) [2024-06-05 18:16:03,920][10130] Fps is (10 sec: 49151.1, 60 sec: 48881.8, 300 sec: 48929.8). Total num frames: 72843264. Throughput: 0: 48819.8. Samples: 72934280. Policy #0 lag: (min: 0.0, avg: 9.9, max: 23.0) [2024-06-05 18:16:03,921][10130] Avg episode reward: [(0, '0.008')] [2024-06-05 18:16:05,273][10367] Updated weights for policy 0, policy_version 4450 (0.0034) [2024-06-05 18:16:08,718][10367] Updated weights for policy 0, policy_version 4460 (0.0030) [2024-06-05 18:16:08,920][10130] Fps is (10 sec: 45875.3, 60 sec: 49152.0, 300 sec: 48929.9). Total num frames: 73072640. Throughput: 0: 48721.5. Samples: 73227960. Policy #0 lag: (min: 0.0, avg: 10.4, max: 21.0) [2024-06-05 18:16:08,920][10130] Avg episode reward: [(0, '0.005')] [2024-06-05 18:16:11,703][10367] Updated weights for policy 0, policy_version 4470 (0.0032) [2024-06-05 18:16:13,920][10130] Fps is (10 sec: 47514.5, 60 sec: 48605.9, 300 sec: 48818.8). Total num frames: 73318400. Throughput: 0: 49091.8. Samples: 73378520. Policy #0 lag: (min: 0.0, avg: 11.1, max: 21.0) [2024-06-05 18:16:13,920][10130] Avg episode reward: [(0, '0.008')] [2024-06-05 18:16:15,637][10367] Updated weights for policy 0, policy_version 4480 (0.0023) [2024-06-05 18:16:18,395][10367] Updated weights for policy 0, policy_version 4490 (0.0037) [2024-06-05 18:16:18,920][10130] Fps is (10 sec: 50789.5, 60 sec: 48878.8, 300 sec: 48874.3). Total num frames: 73580544. Throughput: 0: 48871.5. Samples: 73663280. 
Policy #0 lag: (min: 0.0, avg: 10.5, max: 21.0) [2024-06-05 18:16:18,921][10130] Avg episode reward: [(0, '0.009')] [2024-06-05 18:16:22,445][10367] Updated weights for policy 0, policy_version 4500 (0.0037) [2024-06-05 18:16:23,920][10130] Fps is (10 sec: 49151.5, 60 sec: 48605.9, 300 sec: 48874.3). Total num frames: 73809920. Throughput: 0: 48405.5. Samples: 73950080. Policy #0 lag: (min: 0.0, avg: 9.4, max: 23.0) [2024-06-05 18:16:23,920][10130] Avg episode reward: [(0, '0.008')] [2024-06-05 18:16:25,456][10367] Updated weights for policy 0, policy_version 4510 (0.0033) [2024-06-05 18:16:28,920][10130] Fps is (10 sec: 45875.9, 60 sec: 48606.0, 300 sec: 48763.2). Total num frames: 74039296. Throughput: 0: 48401.4. Samples: 74085360. Policy #0 lag: (min: 0.0, avg: 9.9, max: 20.0) [2024-06-05 18:16:28,920][10130] Avg episode reward: [(0, '0.007')] [2024-06-05 18:16:29,107][10367] Updated weights for policy 0, policy_version 4520 (0.0026) [2024-06-05 18:16:32,267][10367] Updated weights for policy 0, policy_version 4530 (0.0039) [2024-06-05 18:16:33,920][10130] Fps is (10 sec: 49151.8, 60 sec: 48605.8, 300 sec: 48763.2). Total num frames: 74301440. Throughput: 0: 48416.4. Samples: 74380060. Policy #0 lag: (min: 0.0, avg: 10.5, max: 20.0) [2024-06-05 18:16:33,921][10130] Avg episode reward: [(0, '0.008')] [2024-06-05 18:16:33,929][10347] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000004535_74301440.pth... [2024-06-05 18:16:33,973][10347] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000003820_62586880.pth [2024-06-05 18:16:35,928][10367] Updated weights for policy 0, policy_version 4540 (0.0037) [2024-06-05 18:16:38,815][10367] Updated weights for policy 0, policy_version 4550 (0.0030) [2024-06-05 18:16:38,920][10130] Fps is (10 sec: 50790.4, 60 sec: 48332.9, 300 sec: 48763.2). Total num frames: 74547200. Throughput: 0: 48551.5. Samples: 74675620. Policy #0 lag: (min: 0.0, avg: 10.4, max: 22.0) [2024-06-05 18:16:38,920][10130] Avg episode reward: [(0, '0.007')] [2024-06-05 18:16:43,010][10367] Updated weights for policy 0, policy_version 4560 (0.0032) [2024-06-05 18:16:43,923][10130] Fps is (10 sec: 45858.9, 60 sec: 48329.9, 300 sec: 48762.6). Total num frames: 74760192. Throughput: 0: 48322.3. Samples: 74815280. Policy #0 lag: (min: 0.0, avg: 9.1, max: 21.0) [2024-06-05 18:16:43,924][10130] Avg episode reward: [(0, '0.007')] [2024-06-05 18:16:45,486][10367] Updated weights for policy 0, policy_version 4570 (0.0034) [2024-06-05 18:16:48,920][10130] Fps is (10 sec: 45874.8, 60 sec: 48332.7, 300 sec: 48763.2). Total num frames: 75005952. Throughput: 0: 48262.3. Samples: 75106080. Policy #0 lag: (min: 0.0, avg: 11.8, max: 22.0) [2024-06-05 18:16:48,920][10130] Avg episode reward: [(0, '0.012')] [2024-06-05 18:16:48,921][10347] Saving new best policy, reward=0.012! [2024-06-05 18:16:49,810][10367] Updated weights for policy 0, policy_version 4580 (0.0033) [2024-06-05 18:16:51,643][10347] Signal inference workers to stop experience collection... (1100 times) [2024-06-05 18:16:51,643][10347] Signal inference workers to resume experience collection... (1100 times) [2024-06-05 18:16:51,680][10367] InferenceWorker_p0-w0: stopping experience collection (1100 times) [2024-06-05 18:16:51,680][10367] InferenceWorker_p0-w0: resuming experience collection (1100 times) [2024-06-05 18:16:52,411][10367] Updated weights for policy 0, policy_version 4590 (0.0039) [2024-06-05 18:16:53,920][10130] Fps is (10 sec: 50808.5, 60 sec: 48605.8, 300 sec: 48763.2). 
Total num frames: 75268096. Throughput: 0: 48164.3. Samples: 75395360. Policy #0 lag: (min: 0.0, avg: 10.7, max: 20.0) [2024-06-05 18:16:53,920][10130] Avg episode reward: [(0, '0.007')] [2024-06-05 18:16:56,399][10367] Updated weights for policy 0, policy_version 4600 (0.0035) [2024-06-05 18:16:58,920][10130] Fps is (10 sec: 49152.1, 60 sec: 48059.7, 300 sec: 48707.7). Total num frames: 75497472. Throughput: 0: 48190.6. Samples: 75547100. Policy #0 lag: (min: 0.0, avg: 9.8, max: 21.0) [2024-06-05 18:16:58,920][10130] Avg episode reward: [(0, '0.008')] [2024-06-05 18:16:59,295][10367] Updated weights for policy 0, policy_version 4610 (0.0034) [2024-06-05 18:17:03,273][10367] Updated weights for policy 0, policy_version 4620 (0.0030) [2024-06-05 18:17:03,920][10130] Fps is (10 sec: 44237.2, 60 sec: 47786.8, 300 sec: 48652.2). Total num frames: 75710464. Throughput: 0: 48051.7. Samples: 75825600. Policy #0 lag: (min: 0.0, avg: 10.0, max: 22.0) [2024-06-05 18:17:03,920][10130] Avg episode reward: [(0, '0.007')] [2024-06-05 18:17:05,892][10367] Updated weights for policy 0, policy_version 4630 (0.0033) [2024-06-05 18:17:08,920][10130] Fps is (10 sec: 49151.9, 60 sec: 48605.8, 300 sec: 48763.2). Total num frames: 75988992. Throughput: 0: 47989.3. Samples: 76109600. Policy #0 lag: (min: 0.0, avg: 10.5, max: 21.0) [2024-06-05 18:17:08,921][10130] Avg episode reward: [(0, '0.007')] [2024-06-05 18:17:10,492][10367] Updated weights for policy 0, policy_version 4640 (0.0022) [2024-06-05 18:17:12,632][10367] Updated weights for policy 0, policy_version 4650 (0.0028) [2024-06-05 18:17:13,920][10130] Fps is (10 sec: 49152.2, 60 sec: 48059.7, 300 sec: 48596.6). Total num frames: 76201984. Throughput: 0: 48352.9. Samples: 76261240. Policy #0 lag: (min: 0.0, avg: 11.9, max: 20.0) [2024-06-05 18:17:13,920][10130] Avg episode reward: [(0, '0.009')] [2024-06-05 18:17:17,124][10367] Updated weights for policy 0, policy_version 4660 (0.0034) [2024-06-05 18:17:18,920][10130] Fps is (10 sec: 47514.4, 60 sec: 48059.9, 300 sec: 48652.2). Total num frames: 76464128. Throughput: 0: 48333.1. Samples: 76555040. Policy #0 lag: (min: 0.0, avg: 10.2, max: 20.0) [2024-06-05 18:17:18,920][10130] Avg episode reward: [(0, '0.009')] [2024-06-05 18:17:19,654][10367] Updated weights for policy 0, policy_version 4670 (0.0027) [2024-06-05 18:17:23,682][10367] Updated weights for policy 0, policy_version 4680 (0.0029) [2024-06-05 18:17:23,920][10130] Fps is (10 sec: 47513.4, 60 sec: 47786.7, 300 sec: 48596.6). Total num frames: 76677120. Throughput: 0: 48136.9. Samples: 76841780. Policy #0 lag: (min: 0.0, avg: 8.7, max: 21.0) [2024-06-05 18:17:23,920][10130] Avg episode reward: [(0, '0.010')] [2024-06-05 18:17:26,490][10367] Updated weights for policy 0, policy_version 4690 (0.0025) [2024-06-05 18:17:28,920][10130] Fps is (10 sec: 49151.5, 60 sec: 48605.9, 300 sec: 48707.7). Total num frames: 76955648. Throughput: 0: 48214.1. Samples: 76984740. Policy #0 lag: (min: 0.0, avg: 11.1, max: 21.0) [2024-06-05 18:17:28,920][10130] Avg episode reward: [(0, '0.007')] [2024-06-05 18:17:30,538][10367] Updated weights for policy 0, policy_version 4700 (0.0034) [2024-06-05 18:17:33,110][10367] Updated weights for policy 0, policy_version 4710 (0.0028) [2024-06-05 18:17:33,924][10130] Fps is (10 sec: 50771.7, 60 sec: 48056.9, 300 sec: 48596.0). Total num frames: 77185024. Throughput: 0: 48226.3. Samples: 77276440. 
Policy #0 lag: (min: 0.0, avg: 10.2, max: 20.0) [2024-06-05 18:17:33,924][10130] Avg episode reward: [(0, '0.010')] [2024-06-05 18:17:37,554][10367] Updated weights for policy 0, policy_version 4720 (0.0026) [2024-06-05 18:17:38,920][10130] Fps is (10 sec: 45875.0, 60 sec: 47786.6, 300 sec: 48652.1). Total num frames: 77414400. Throughput: 0: 48136.0. Samples: 77561480. Policy #0 lag: (min: 0.0, avg: 10.2, max: 20.0) [2024-06-05 18:17:38,920][10130] Avg episode reward: [(0, '0.013')] [2024-06-05 18:17:38,980][10347] Saving new best policy, reward=0.013! [2024-06-05 18:17:40,141][10367] Updated weights for policy 0, policy_version 4730 (0.0025) [2024-06-05 18:17:43,920][10130] Fps is (10 sec: 45891.9, 60 sec: 48062.6, 300 sec: 48596.6). Total num frames: 77643776. Throughput: 0: 47862.7. Samples: 77700920. Policy #0 lag: (min: 0.0, avg: 10.6, max: 20.0) [2024-06-05 18:17:43,920][10130] Avg episode reward: [(0, '0.010')] [2024-06-05 18:17:44,205][10367] Updated weights for policy 0, policy_version 4740 (0.0038) [2024-06-05 18:17:47,203][10367] Updated weights for policy 0, policy_version 4750 (0.0031) [2024-06-05 18:17:48,920][10130] Fps is (10 sec: 49152.5, 60 sec: 48332.9, 300 sec: 48652.2). Total num frames: 77905920. Throughput: 0: 48052.5. Samples: 77987960. Policy #0 lag: (min: 0.0, avg: 8.7, max: 21.0) [2024-06-05 18:17:48,920][10130] Avg episode reward: [(0, '0.012')] [2024-06-05 18:17:50,742][10367] Updated weights for policy 0, policy_version 4760 (0.0025) [2024-06-05 18:17:53,920][10130] Fps is (10 sec: 49152.1, 60 sec: 47786.7, 300 sec: 48541.1). Total num frames: 78135296. Throughput: 0: 48257.8. Samples: 78281200. Policy #0 lag: (min: 0.0, avg: 11.0, max: 23.0) [2024-06-05 18:17:53,920][10130] Avg episode reward: [(0, '0.011')] [2024-06-05 18:17:53,997][10367] Updated weights for policy 0, policy_version 4770 (0.0030) [2024-06-05 18:17:57,487][10367] Updated weights for policy 0, policy_version 4780 (0.0030) [2024-06-05 18:17:58,920][10130] Fps is (10 sec: 49151.6, 60 sec: 48332.8, 300 sec: 48707.7). Total num frames: 78397440. Throughput: 0: 48127.5. Samples: 78426980. Policy #0 lag: (min: 1.0, avg: 10.4, max: 22.0) [2024-06-05 18:17:58,920][10130] Avg episode reward: [(0, '0.013')] [2024-06-05 18:18:00,483][10367] Updated weights for policy 0, policy_version 4790 (0.0025) [2024-06-05 18:18:02,976][10347] Signal inference workers to stop experience collection... (1150 times) [2024-06-05 18:18:02,976][10347] Signal inference workers to resume experience collection... (1150 times) [2024-06-05 18:18:02,985][10367] InferenceWorker_p0-w0: stopping experience collection (1150 times) [2024-06-05 18:18:03,005][10367] InferenceWorker_p0-w0: resuming experience collection (1150 times) [2024-06-05 18:18:03,920][10130] Fps is (10 sec: 47512.8, 60 sec: 48332.7, 300 sec: 48596.6). Total num frames: 78610432. Throughput: 0: 48143.7. Samples: 78721520. Policy #0 lag: (min: 1.0, avg: 10.4, max: 22.0) [2024-06-05 18:18:03,921][10130] Avg episode reward: [(0, '0.009')] [2024-06-05 18:18:04,331][10367] Updated weights for policy 0, policy_version 4800 (0.0026) [2024-06-05 18:18:07,078][10367] Updated weights for policy 0, policy_version 4810 (0.0038) [2024-06-05 18:18:08,920][10130] Fps is (10 sec: 47513.9, 60 sec: 48059.8, 300 sec: 48596.6). Total num frames: 78872576. Throughput: 0: 48372.5. Samples: 79018540. 
Policy #0 lag: (min: 0.0, avg: 10.0, max: 22.0) [2024-06-05 18:18:08,920][10130] Avg episode reward: [(0, '0.011')] [2024-06-05 18:18:10,896][10367] Updated weights for policy 0, policy_version 4820 (0.0046) [2024-06-05 18:18:13,920][10130] Fps is (10 sec: 50791.4, 60 sec: 48605.8, 300 sec: 48541.5). Total num frames: 79118336. Throughput: 0: 48461.8. Samples: 79165520. Policy #0 lag: (min: 0.0, avg: 10.2, max: 22.0) [2024-06-05 18:18:13,920][10130] Avg episode reward: [(0, '0.010')] [2024-06-05 18:18:13,967][10367] Updated weights for policy 0, policy_version 4830 (0.0029) [2024-06-05 18:18:17,433][10367] Updated weights for policy 0, policy_version 4840 (0.0026) [2024-06-05 18:18:18,920][10130] Fps is (10 sec: 47513.5, 60 sec: 48059.7, 300 sec: 48652.1). Total num frames: 79347712. Throughput: 0: 48419.5. Samples: 79455140. Policy #0 lag: (min: 0.0, avg: 10.3, max: 20.0) [2024-06-05 18:18:18,920][10130] Avg episode reward: [(0, '0.012')] [2024-06-05 18:18:20,863][10367] Updated weights for policy 0, policy_version 4850 (0.0035) [2024-06-05 18:18:23,920][10130] Fps is (10 sec: 47513.5, 60 sec: 48605.9, 300 sec: 48707.8). Total num frames: 79593472. Throughput: 0: 48383.6. Samples: 79738740. Policy #0 lag: (min: 1.0, avg: 11.6, max: 22.0) [2024-06-05 18:18:23,920][10130] Avg episode reward: [(0, '0.010')] [2024-06-05 18:18:24,617][10367] Updated weights for policy 0, policy_version 4860 (0.0032) [2024-06-05 18:18:27,483][10367] Updated weights for policy 0, policy_version 4870 (0.0028) [2024-06-05 18:18:28,920][10130] Fps is (10 sec: 49151.4, 60 sec: 48059.6, 300 sec: 48541.1). Total num frames: 79839232. Throughput: 0: 48529.2. Samples: 79884740. Policy #0 lag: (min: 0.0, avg: 9.7, max: 22.0) [2024-06-05 18:18:28,921][10130] Avg episode reward: [(0, '0.012')] [2024-06-05 18:18:31,509][10367] Updated weights for policy 0, policy_version 4880 (0.0025) [2024-06-05 18:18:33,920][10130] Fps is (10 sec: 50790.2, 60 sec: 48608.8, 300 sec: 48541.1). Total num frames: 80101376. Throughput: 0: 48709.2. Samples: 80179880. Policy #0 lag: (min: 0.0, avg: 10.3, max: 21.0) [2024-06-05 18:18:33,920][10130] Avg episode reward: [(0, '0.011')] [2024-06-05 18:18:34,030][10347] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000004890_80117760.pth... [2024-06-05 18:18:34,044][10367] Updated weights for policy 0, policy_version 4890 (0.0039) [2024-06-05 18:18:34,072][10347] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000004178_68452352.pth [2024-06-05 18:18:38,027][10367] Updated weights for policy 0, policy_version 4900 (0.0027) [2024-06-05 18:18:38,920][10130] Fps is (10 sec: 47514.4, 60 sec: 48332.9, 300 sec: 48596.6). Total num frames: 80314368. Throughput: 0: 48740.5. Samples: 80474520. Policy #0 lag: (min: 0.0, avg: 10.5, max: 21.0) [2024-06-05 18:18:38,920][10130] Avg episode reward: [(0, '0.015')] [2024-06-05 18:18:38,934][10347] Saving new best policy, reward=0.015! [2024-06-05 18:18:40,934][10367] Updated weights for policy 0, policy_version 4910 (0.0031) [2024-06-05 18:18:43,920][10130] Fps is (10 sec: 45875.7, 60 sec: 48605.9, 300 sec: 48652.2). Total num frames: 80560128. Throughput: 0: 48533.4. Samples: 80610980. 
Policy #0 lag: (min: 0.0, avg: 11.4, max: 21.0) [2024-06-05 18:18:43,920][10130] Avg episode reward: [(0, '0.016')] [2024-06-05 18:18:44,637][10367] Updated weights for policy 0, policy_version 4920 (0.0026) [2024-06-05 18:18:47,838][10367] Updated weights for policy 0, policy_version 4930 (0.0029) [2024-06-05 18:18:48,920][10130] Fps is (10 sec: 50789.6, 60 sec: 48605.7, 300 sec: 48596.6). Total num frames: 80822272. Throughput: 0: 48488.1. Samples: 80903480. Policy #0 lag: (min: 0.0, avg: 9.4, max: 21.0) [2024-06-05 18:18:48,921][10130] Avg episode reward: [(0, '0.013')] [2024-06-05 18:18:51,624][10367] Updated weights for policy 0, policy_version 4940 (0.0032) [2024-06-05 18:18:53,920][10130] Fps is (10 sec: 50789.1, 60 sec: 48878.8, 300 sec: 48541.1). Total num frames: 81068032. Throughput: 0: 48307.3. Samples: 81192380. Policy #0 lag: (min: 0.0, avg: 10.4, max: 22.0) [2024-06-05 18:18:53,921][10130] Avg episode reward: [(0, '0.014')] [2024-06-05 18:18:54,496][10367] Updated weights for policy 0, policy_version 4950 (0.0029) [2024-06-05 18:18:58,555][10367] Updated weights for policy 0, policy_version 4960 (0.0036) [2024-06-05 18:18:58,920][10130] Fps is (10 sec: 45875.4, 60 sec: 48059.7, 300 sec: 48541.1). Total num frames: 81281024. Throughput: 0: 48233.3. Samples: 81336020. Policy #0 lag: (min: 0.0, avg: 10.9, max: 21.0) [2024-06-05 18:18:58,920][10130] Avg episode reward: [(0, '0.016')] [2024-06-05 18:18:58,921][10347] Saving new best policy, reward=0.016! [2024-06-05 18:19:01,180][10367] Updated weights for policy 0, policy_version 4970 (0.0038) [2024-06-05 18:19:03,920][10130] Fps is (10 sec: 44238.0, 60 sec: 48333.0, 300 sec: 48541.1). Total num frames: 81510400. Throughput: 0: 48164.1. Samples: 81622520. Policy #0 lag: (min: 0.0, avg: 10.9, max: 21.0) [2024-06-05 18:19:03,920][10130] Avg episode reward: [(0, '0.015')] [2024-06-05 18:19:05,148][10367] Updated weights for policy 0, policy_version 4980 (0.0028) [2024-06-05 18:19:08,145][10367] Updated weights for policy 0, policy_version 4990 (0.0022) [2024-06-05 18:19:08,920][10130] Fps is (10 sec: 49151.6, 60 sec: 48332.7, 300 sec: 48485.5). Total num frames: 81772544. Throughput: 0: 48267.9. Samples: 81910800. Policy #0 lag: (min: 0.0, avg: 9.0, max: 20.0) [2024-06-05 18:19:08,920][10130] Avg episode reward: [(0, '0.012')] [2024-06-05 18:19:11,770][10367] Updated weights for policy 0, policy_version 5000 (0.0032) [2024-06-05 18:19:13,920][10130] Fps is (10 sec: 50790.1, 60 sec: 48332.8, 300 sec: 48541.1). Total num frames: 82018304. Throughput: 0: 48343.7. Samples: 82060200. Policy #0 lag: (min: 0.0, avg: 10.1, max: 21.0) [2024-06-05 18:19:13,920][10130] Avg episode reward: [(0, '0.014')] [2024-06-05 18:19:14,902][10367] Updated weights for policy 0, policy_version 5010 (0.0031) [2024-06-05 18:19:18,725][10367] Updated weights for policy 0, policy_version 5020 (0.0026) [2024-06-05 18:19:18,920][10130] Fps is (10 sec: 47514.2, 60 sec: 48332.8, 300 sec: 48541.1). Total num frames: 82247680. Throughput: 0: 48265.4. Samples: 82351820. Policy #0 lag: (min: 0.0, avg: 11.1, max: 21.0) [2024-06-05 18:19:18,920][10130] Avg episode reward: [(0, '0.016')] [2024-06-05 18:19:21,763][10367] Updated weights for policy 0, policy_version 5030 (0.0025) [2024-06-05 18:19:23,920][10130] Fps is (10 sec: 47513.2, 60 sec: 48332.7, 300 sec: 48541.1). Total num frames: 82493440. Throughput: 0: 48062.1. Samples: 82637320. 
Policy #0 lag: (min: 0.0, avg: 11.1, max: 21.0) [2024-06-05 18:19:23,920][10130] Avg episode reward: [(0, '0.015')] [2024-06-05 18:19:25,467][10367] Updated weights for policy 0, policy_version 5040 (0.0033) [2024-06-05 18:19:28,369][10367] Updated weights for policy 0, policy_version 5050 (0.0030) [2024-06-05 18:19:28,920][10130] Fps is (10 sec: 50790.1, 60 sec: 48605.9, 300 sec: 48486.1). Total num frames: 82755584. Throughput: 0: 48358.1. Samples: 82787100. Policy #0 lag: (min: 0.0, avg: 9.7, max: 21.0) [2024-06-05 18:19:28,920][10130] Avg episode reward: [(0, '0.016')] [2024-06-05 18:19:30,385][10347] Signal inference workers to stop experience collection... (1200 times) [2024-06-05 18:19:30,386][10347] Signal inference workers to resume experience collection... (1200 times) [2024-06-05 18:19:30,425][10367] InferenceWorker_p0-w0: stopping experience collection (1200 times) [2024-06-05 18:19:30,425][10367] InferenceWorker_p0-w0: resuming experience collection (1200 times) [2024-06-05 18:19:32,034][10367] Updated weights for policy 0, policy_version 5060 (0.0038) [2024-06-05 18:19:33,920][10130] Fps is (10 sec: 47513.7, 60 sec: 47786.7, 300 sec: 48374.4). Total num frames: 82968576. Throughput: 0: 48206.3. Samples: 83072760. Policy #0 lag: (min: 1.0, avg: 9.4, max: 22.0) [2024-06-05 18:19:33,920][10130] Avg episode reward: [(0, '0.012')] [2024-06-05 18:19:35,099][10367] Updated weights for policy 0, policy_version 5070 (0.0028) [2024-06-05 18:19:38,920][10130] Fps is (10 sec: 45875.5, 60 sec: 48332.8, 300 sec: 48485.5). Total num frames: 83214336. Throughput: 0: 48204.7. Samples: 83361580. Policy #0 lag: (min: 0.0, avg: 10.2, max: 21.0) [2024-06-05 18:19:38,920][10130] Avg episode reward: [(0, '0.015')] [2024-06-05 18:19:38,999][10367] Updated weights for policy 0, policy_version 5080 (0.0034) [2024-06-05 18:19:42,079][10367] Updated weights for policy 0, policy_version 5090 (0.0035) [2024-06-05 18:19:43,920][10130] Fps is (10 sec: 49151.7, 60 sec: 48332.7, 300 sec: 48485.5). Total num frames: 83460096. Throughput: 0: 48076.8. Samples: 83499480. Policy #0 lag: (min: 1.0, avg: 10.9, max: 22.0) [2024-06-05 18:19:43,920][10130] Avg episode reward: [(0, '0.017')] [2024-06-05 18:19:45,984][10367] Updated weights for policy 0, policy_version 5100 (0.0027) [2024-06-05 18:19:48,748][10367] Updated weights for policy 0, policy_version 5110 (0.0036) [2024-06-05 18:19:48,920][10130] Fps is (10 sec: 50790.6, 60 sec: 48332.9, 300 sec: 48430.0). Total num frames: 83722240. Throughput: 0: 48318.2. Samples: 83796840. Policy #0 lag: (min: 0.0, avg: 10.4, max: 21.0) [2024-06-05 18:19:48,920][10130] Avg episode reward: [(0, '0.016')] [2024-06-05 18:19:52,601][10367] Updated weights for policy 0, policy_version 5120 (0.0021) [2024-06-05 18:19:53,920][10130] Fps is (10 sec: 47514.2, 60 sec: 47786.8, 300 sec: 48318.9). Total num frames: 83935232. Throughput: 0: 48329.5. Samples: 84085620. Policy #0 lag: (min: 0.0, avg: 9.8, max: 21.0) [2024-06-05 18:19:53,920][10130] Avg episode reward: [(0, '0.017')] [2024-06-05 18:19:54,034][10347] Saving new best policy, reward=0.017! [2024-06-05 18:19:55,595][10367] Updated weights for policy 0, policy_version 5130 (0.0031) [2024-06-05 18:19:58,920][10130] Fps is (10 sec: 45875.0, 60 sec: 48332.8, 300 sec: 48375.1). Total num frames: 84180992. Throughput: 0: 48228.0. Samples: 84230460. 
Policy #0 lag: (min: 0.0, avg: 10.0, max: 21.0) [2024-06-05 18:19:58,920][10130] Avg episode reward: [(0, '0.013')] [2024-06-05 18:19:59,189][10367] Updated weights for policy 0, policy_version 5140 (0.0022) [2024-06-05 18:20:02,310][10367] Updated weights for policy 0, policy_version 5150 (0.0037) [2024-06-05 18:20:03,920][10130] Fps is (10 sec: 47513.7, 60 sec: 48332.8, 300 sec: 48430.0). Total num frames: 84410368. Throughput: 0: 48004.9. Samples: 84512040. Policy #0 lag: (min: 1.0, avg: 11.8, max: 23.0) [2024-06-05 18:20:03,920][10130] Avg episode reward: [(0, '0.017')] [2024-06-05 18:20:06,099][10367] Updated weights for policy 0, policy_version 5160 (0.0034) [2024-06-05 18:20:08,920][10130] Fps is (10 sec: 49152.2, 60 sec: 48333.0, 300 sec: 48374.5). Total num frames: 84672512. Throughput: 0: 48268.2. Samples: 84809380. Policy #0 lag: (min: 0.0, avg: 9.8, max: 21.0) [2024-06-05 18:20:08,920][10130] Avg episode reward: [(0, '0.017')] [2024-06-05 18:20:09,177][10367] Updated weights for policy 0, policy_version 5170 (0.0033) [2024-06-05 18:20:13,161][10367] Updated weights for policy 0, policy_version 5180 (0.0036) [2024-06-05 18:20:13,924][10130] Fps is (10 sec: 49133.7, 60 sec: 48056.8, 300 sec: 48318.3). Total num frames: 84901888. Throughput: 0: 47985.0. Samples: 84946600. Policy #0 lag: (min: 1.0, avg: 9.6, max: 21.0) [2024-06-05 18:20:13,924][10130] Avg episode reward: [(0, '0.015')] [2024-06-05 18:20:15,943][10367] Updated weights for policy 0, policy_version 5190 (0.0019) [2024-06-05 18:20:18,923][10130] Fps is (10 sec: 45858.1, 60 sec: 48056.8, 300 sec: 48262.8). Total num frames: 85131264. Throughput: 0: 48273.9. Samples: 85245260. Policy #0 lag: (min: 0.0, avg: 11.1, max: 22.0) [2024-06-05 18:20:18,924][10130] Avg episode reward: [(0, '0.022')] [2024-06-05 18:20:18,968][10347] Saving new best policy, reward=0.022! [2024-06-05 18:20:19,732][10367] Updated weights for policy 0, policy_version 5200 (0.0024) [2024-06-05 18:20:22,947][10367] Updated weights for policy 0, policy_version 5210 (0.0019) [2024-06-05 18:20:23,920][10130] Fps is (10 sec: 47530.9, 60 sec: 48059.8, 300 sec: 48318.9). Total num frames: 85377024. Throughput: 0: 48171.0. Samples: 85529280. Policy #0 lag: (min: 0.0, avg: 11.4, max: 23.0) [2024-06-05 18:20:23,920][10130] Avg episode reward: [(0, '0.019')] [2024-06-05 18:20:26,371][10367] Updated weights for policy 0, policy_version 5220 (0.0028) [2024-06-05 18:20:28,920][10130] Fps is (10 sec: 49170.1, 60 sec: 47786.7, 300 sec: 48263.4). Total num frames: 85622784. Throughput: 0: 48381.5. Samples: 85676640. Policy #0 lag: (min: 0.0, avg: 9.4, max: 21.0) [2024-06-05 18:20:28,920][10130] Avg episode reward: [(0, '0.021')] [2024-06-05 18:20:29,641][10367] Updated weights for policy 0, policy_version 5230 (0.0034) [2024-06-05 18:20:33,532][10367] Updated weights for policy 0, policy_version 5240 (0.0032) [2024-06-05 18:20:33,920][10130] Fps is (10 sec: 49151.7, 60 sec: 48332.8, 300 sec: 48207.8). Total num frames: 85868544. Throughput: 0: 48135.4. Samples: 85962940. Policy #0 lag: (min: 1.0, avg: 10.2, max: 20.0) [2024-06-05 18:20:33,920][10130] Avg episode reward: [(0, '0.016')] [2024-06-05 18:20:34,043][10347] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000005242_85884928.pth... 
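The Saving/Removing pairs around this point (a new checkpoint_000005242_85884928.pth is written and, in the next record, the older checkpoint_000004535_74301440.pth is deleted) suggest a rolling checkpoint scheme: the filename encodes the policy version and the cumulative frame count, and only the newest few files are kept. The sketch below is only an illustration of that pattern, not the trainer's own code; save_and_prune and keep_last are made-up names, and keep_last=2 is an assumption read off the removals visible in this log.

```python
# Minimal sketch (not the project's actual code): write a checkpoint named like the
# files in this log -- checkpoint_{policy_version:09d}_{env_frames}.pth -- and prune
# older ones so only the most recent `keep_last` remain.
import os
import re

import torch  # assumed available, since the checkpoints are .pth files


def save_and_prune(checkpoint_dir: str, state: dict, policy_version: int,
                   env_frames: int, keep_last: int = 2) -> str:
    os.makedirs(checkpoint_dir, exist_ok=True)
    path = os.path.join(
        checkpoint_dir, f"checkpoint_{policy_version:09d}_{env_frames}.pth")
    torch.save(state, path)  # e.g. model/optimizer state dicts

    # Sort existing checkpoints by the policy version embedded in the filename.
    pattern = re.compile(r"checkpoint_(\d+)_(\d+)\.pth$")
    found = sorted(
        (int(m.group(1)), os.path.join(checkpoint_dir, f))
        for f in os.listdir(checkpoint_dir)
        if (m := pattern.match(f))
    )
    # Remove everything except the newest `keep_last` checkpoints.
    for _, old_path in found[:-keep_last]:
        os.remove(old_path)
    return path
```

How many rolling checkpoints the real trainer keeps cannot be pinned down from this excerpt alone; the separate "Saving new best policy, reward=..." records indicate that best-reward snapshots are tracked independently of this rotation.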
[2024-06-05 18:20:34,091][10347] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000004535_74301440.pth [2024-06-05 18:20:36,371][10367] Updated weights for policy 0, policy_version 5250 (0.0034) [2024-06-05 18:20:38,920][10130] Fps is (10 sec: 47513.9, 60 sec: 48059.8, 300 sec: 48263.4). Total num frames: 86097920. Throughput: 0: 47961.4. Samples: 86243880. Policy #0 lag: (min: 1.0, avg: 10.2, max: 20.0) [2024-06-05 18:20:38,920][10130] Avg episode reward: [(0, '0.016')] [2024-06-05 18:20:40,605][10367] Updated weights for policy 0, policy_version 5260 (0.0027) [2024-06-05 18:20:42,695][10347] Signal inference workers to stop experience collection... (1250 times) [2024-06-05 18:20:42,728][10367] InferenceWorker_p0-w0: stopping experience collection (1250 times) [2024-06-05 18:20:42,803][10347] Signal inference workers to resume experience collection... (1250 times) [2024-06-05 18:20:42,803][10367] InferenceWorker_p0-w0: resuming experience collection (1250 times) [2024-06-05 18:20:43,367][10367] Updated weights for policy 0, policy_version 5270 (0.0026) [2024-06-05 18:20:43,920][10130] Fps is (10 sec: 49152.8, 60 sec: 48332.9, 300 sec: 48318.9). Total num frames: 86360064. Throughput: 0: 48002.7. Samples: 86390580. Policy #0 lag: (min: 0.0, avg: 10.1, max: 20.0) [2024-06-05 18:20:43,920][10130] Avg episode reward: [(0, '0.015')] [2024-06-05 18:20:47,182][10367] Updated weights for policy 0, policy_version 5280 (0.0031) [2024-06-05 18:20:48,920][10130] Fps is (10 sec: 47513.4, 60 sec: 47513.6, 300 sec: 48207.8). Total num frames: 86573056. Throughput: 0: 48181.8. Samples: 86680220. Policy #0 lag: (min: 0.0, avg: 10.9, max: 21.0) [2024-06-05 18:20:48,920][10130] Avg episode reward: [(0, '0.022')] [2024-06-05 18:20:50,307][10367] Updated weights for policy 0, policy_version 5290 (0.0035) [2024-06-05 18:20:53,920][10130] Fps is (10 sec: 45874.4, 60 sec: 48059.6, 300 sec: 48152.3). Total num frames: 86818816. Throughput: 0: 47948.7. Samples: 86967080. Policy #0 lag: (min: 1.0, avg: 12.0, max: 22.0) [2024-06-05 18:20:53,920][10130] Avg episode reward: [(0, '0.017')] [2024-06-05 18:20:54,058][10367] Updated weights for policy 0, policy_version 5300 (0.0033) [2024-06-05 18:20:56,916][10367] Updated weights for policy 0, policy_version 5310 (0.0029) [2024-06-05 18:20:58,920][10130] Fps is (10 sec: 50790.5, 60 sec: 48332.8, 300 sec: 48263.4). Total num frames: 87080960. Throughput: 0: 48188.4. Samples: 87114900. Policy #0 lag: (min: 0.0, avg: 10.2, max: 22.0) [2024-06-05 18:20:58,920][10130] Avg episode reward: [(0, '0.020')] [2024-06-05 18:21:00,859][10367] Updated weights for policy 0, policy_version 5320 (0.0026) [2024-06-05 18:21:03,601][10367] Updated weights for policy 0, policy_version 5330 (0.0026) [2024-06-05 18:21:03,920][10130] Fps is (10 sec: 50790.6, 60 sec: 48605.8, 300 sec: 48318.9). Total num frames: 87326720. Throughput: 0: 47968.7. Samples: 87403680. Policy #0 lag: (min: 0.0, avg: 9.7, max: 22.0) [2024-06-05 18:21:03,921][10130] Avg episode reward: [(0, '0.019')] [2024-06-05 18:21:07,699][10367] Updated weights for policy 0, policy_version 5340 (0.0035) [2024-06-05 18:21:08,920][10130] Fps is (10 sec: 45874.7, 60 sec: 47786.6, 300 sec: 48207.8). Total num frames: 87539712. Throughput: 0: 48062.2. Samples: 87692080. 
Policy #0 lag: (min: 0.0, avg: 10.3, max: 22.0) [2024-06-05 18:21:08,920][10130] Avg episode reward: [(0, '0.022')] [2024-06-05 18:21:10,594][10367] Updated weights for policy 0, policy_version 5350 (0.0029) [2024-06-05 18:21:13,920][10130] Fps is (10 sec: 44237.4, 60 sec: 47789.6, 300 sec: 48096.8). Total num frames: 87769088. Throughput: 0: 47904.9. Samples: 87832360. Policy #0 lag: (min: 0.0, avg: 10.3, max: 21.0) [2024-06-05 18:21:13,920][10130] Avg episode reward: [(0, '0.019')] [2024-06-05 18:21:14,431][10367] Updated weights for policy 0, policy_version 5360 (0.0033) [2024-06-05 18:21:17,451][10367] Updated weights for policy 0, policy_version 5370 (0.0032) [2024-06-05 18:21:18,920][10130] Fps is (10 sec: 49151.5, 60 sec: 48335.6, 300 sec: 48207.8). Total num frames: 88031232. Throughput: 0: 47913.7. Samples: 88119060. Policy #0 lag: (min: 0.0, avg: 10.9, max: 22.0) [2024-06-05 18:21:18,921][10130] Avg episode reward: [(0, '0.021')] [2024-06-05 18:21:21,134][10367] Updated weights for policy 0, policy_version 5380 (0.0026) [2024-06-05 18:21:23,920][10130] Fps is (10 sec: 50789.7, 60 sec: 48332.8, 300 sec: 48263.4). Total num frames: 88276992. Throughput: 0: 48211.0. Samples: 88413380. Policy #0 lag: (min: 0.0, avg: 10.5, max: 21.0) [2024-06-05 18:21:23,920][10130] Avg episode reward: [(0, '0.022')] [2024-06-05 18:21:24,184][10367] Updated weights for policy 0, policy_version 5390 (0.0033) [2024-06-05 18:21:28,114][10367] Updated weights for policy 0, policy_version 5400 (0.0022) [2024-06-05 18:21:28,920][10130] Fps is (10 sec: 47514.0, 60 sec: 48059.7, 300 sec: 48152.3). Total num frames: 88506368. Throughput: 0: 47995.4. Samples: 88550380. Policy #0 lag: (min: 0.0, avg: 10.0, max: 22.0) [2024-06-05 18:21:28,920][10130] Avg episode reward: [(0, '0.019')] [2024-06-05 18:21:30,979][10367] Updated weights for policy 0, policy_version 5410 (0.0031) [2024-06-05 18:21:33,920][10130] Fps is (10 sec: 45875.2, 60 sec: 47786.7, 300 sec: 48096.7). Total num frames: 88735744. Throughput: 0: 48043.4. Samples: 88842180. Policy #0 lag: (min: 0.0, avg: 11.2, max: 21.0) [2024-06-05 18:21:33,922][10130] Avg episode reward: [(0, '0.016')] [2024-06-05 18:21:34,906][10367] Updated weights for policy 0, policy_version 5420 (0.0031) [2024-06-05 18:21:38,014][10367] Updated weights for policy 0, policy_version 5430 (0.0031) [2024-06-05 18:21:38,920][10130] Fps is (10 sec: 47513.5, 60 sec: 48059.6, 300 sec: 48208.4). Total num frames: 88981504. Throughput: 0: 48032.9. Samples: 89128560. Policy #0 lag: (min: 0.0, avg: 11.2, max: 21.0) [2024-06-05 18:21:38,921][10130] Avg episode reward: [(0, '0.018')] [2024-06-05 18:21:41,515][10367] Updated weights for policy 0, policy_version 5440 (0.0034) [2024-06-05 18:21:43,920][10130] Fps is (10 sec: 49152.4, 60 sec: 47786.6, 300 sec: 48207.8). Total num frames: 89227264. Throughput: 0: 48093.7. Samples: 89279120. Policy #0 lag: (min: 0.0, avg: 10.9, max: 21.0) [2024-06-05 18:21:43,920][10130] Avg episode reward: [(0, '0.021')] [2024-06-05 18:21:44,756][10367] Updated weights for policy 0, policy_version 5450 (0.0032) [2024-06-05 18:21:48,447][10367] Updated weights for policy 0, policy_version 5460 (0.0029) [2024-06-05 18:21:48,920][10130] Fps is (10 sec: 49151.5, 60 sec: 48332.6, 300 sec: 48152.3). Total num frames: 89473024. Throughput: 0: 48019.4. Samples: 89564560. 
Policy #0 lag: (min: 0.0, avg: 10.9, max: 23.0) [2024-06-05 18:21:48,921][10130] Avg episode reward: [(0, '0.019')] [2024-06-05 18:21:51,702][10367] Updated weights for policy 0, policy_version 5470 (0.0033) [2024-06-05 18:21:53,920][10130] Fps is (10 sec: 47513.8, 60 sec: 48059.9, 300 sec: 48152.3). Total num frames: 89702400. Throughput: 0: 48049.0. Samples: 89854280. Policy #0 lag: (min: 0.0, avg: 10.0, max: 21.0) [2024-06-05 18:21:53,920][10130] Avg episode reward: [(0, '0.018')] [2024-06-05 18:21:55,230][10367] Updated weights for policy 0, policy_version 5480 (0.0032) [2024-06-05 18:21:58,658][10367] Updated weights for policy 0, policy_version 5490 (0.0034) [2024-06-05 18:21:58,921][10130] Fps is (10 sec: 49145.1, 60 sec: 48058.4, 300 sec: 48318.7). Total num frames: 89964544. Throughput: 0: 48190.7. Samples: 90001020. Policy #0 lag: (min: 0.0, avg: 10.2, max: 21.0) [2024-06-05 18:21:58,922][10130] Avg episode reward: [(0, '0.015')] [2024-06-05 18:22:01,947][10367] Updated weights for policy 0, policy_version 5500 (0.0029) [2024-06-05 18:22:03,920][10130] Fps is (10 sec: 50790.2, 60 sec: 48059.8, 300 sec: 48207.8). Total num frames: 90210304. Throughput: 0: 48260.2. Samples: 90290760. Policy #0 lag: (min: 0.0, avg: 10.1, max: 21.0) [2024-06-05 18:22:03,920][10130] Avg episode reward: [(0, '0.024')] [2024-06-05 18:22:03,931][10347] Saving new best policy, reward=0.024! [2024-06-05 18:22:05,433][10367] Updated weights for policy 0, policy_version 5510 (0.0036) [2024-06-05 18:22:08,920][10130] Fps is (10 sec: 45882.8, 60 sec: 48059.8, 300 sec: 48207.8). Total num frames: 90423296. Throughput: 0: 48134.8. Samples: 90579440. Policy #0 lag: (min: 0.0, avg: 10.7, max: 21.0) [2024-06-05 18:22:08,920][10130] Avg episode reward: [(0, '0.019')] [2024-06-05 18:22:09,067][10367] Updated weights for policy 0, policy_version 5520 (0.0022) [2024-06-05 18:22:12,236][10367] Updated weights for policy 0, policy_version 5530 (0.0029) [2024-06-05 18:22:13,920][10130] Fps is (10 sec: 45875.2, 60 sec: 48332.8, 300 sec: 48152.3). Total num frames: 90669056. Throughput: 0: 48105.9. Samples: 90715140. Policy #0 lag: (min: 0.0, avg: 10.3, max: 20.0) [2024-06-05 18:22:13,920][10130] Avg episode reward: [(0, '0.017')] [2024-06-05 18:22:15,884][10367] Updated weights for policy 0, policy_version 5540 (0.0035) [2024-06-05 18:22:18,920][10130] Fps is (10 sec: 49151.9, 60 sec: 48059.9, 300 sec: 48263.4). Total num frames: 90914816. Throughput: 0: 48012.1. Samples: 91002720. Policy #0 lag: (min: 0.0, avg: 11.1, max: 21.0) [2024-06-05 18:22:18,920][10130] Avg episode reward: [(0, '0.019')] [2024-06-05 18:22:19,029][10367] Updated weights for policy 0, policy_version 5550 (0.0023) [2024-06-05 18:22:20,109][10347] Signal inference workers to stop experience collection... (1300 times) [2024-06-05 18:22:20,110][10347] Signal inference workers to resume experience collection... (1300 times) [2024-06-05 18:22:20,139][10367] InferenceWorker_p0-w0: stopping experience collection (1300 times) [2024-06-05 18:22:20,139][10367] InferenceWorker_p0-w0: resuming experience collection (1300 times) [2024-06-05 18:22:22,455][10367] Updated weights for policy 0, policy_version 5560 (0.0024) [2024-06-05 18:22:23,920][10130] Fps is (10 sec: 49151.5, 60 sec: 48059.7, 300 sec: 48152.3). Total num frames: 91160576. Throughput: 0: 48218.7. Samples: 91298400. 
Policy #0 lag: (min: 0.0, avg: 10.5, max: 21.0) [2024-06-05 18:22:23,920][10130] Avg episode reward: [(0, '0.021')] [2024-06-05 18:22:25,920][10367] Updated weights for policy 0, policy_version 5570 (0.0030) [2024-06-05 18:22:28,924][10130] Fps is (10 sec: 47495.4, 60 sec: 48056.8, 300 sec: 48152.3). Total num frames: 91389952. Throughput: 0: 47912.0. Samples: 91435340. Policy #0 lag: (min: 0.0, avg: 10.5, max: 21.0) [2024-06-05 18:22:28,924][10130] Avg episode reward: [(0, '0.025')] [2024-06-05 18:22:29,326][10367] Updated weights for policy 0, policy_version 5580 (0.0039) [2024-06-05 18:22:32,771][10367] Updated weights for policy 0, policy_version 5590 (0.0034) [2024-06-05 18:22:33,920][10130] Fps is (10 sec: 47513.9, 60 sec: 48332.8, 300 sec: 48207.8). Total num frames: 91635712. Throughput: 0: 48150.4. Samples: 91731320. Policy #0 lag: (min: 0.0, avg: 10.1, max: 22.0) [2024-06-05 18:22:33,920][10130] Avg episode reward: [(0, '0.017')] [2024-06-05 18:22:33,930][10347] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000005593_91635712.pth... [2024-06-05 18:22:33,975][10347] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000004890_80117760.pth [2024-06-05 18:22:36,199][10367] Updated weights for policy 0, policy_version 5600 (0.0028) [2024-06-05 18:22:38,920][10130] Fps is (10 sec: 47531.3, 60 sec: 48059.8, 300 sec: 48207.8). Total num frames: 91865088. Throughput: 0: 47955.0. Samples: 92012260. Policy #0 lag: (min: 0.0, avg: 10.1, max: 21.0) [2024-06-05 18:22:38,920][10130] Avg episode reward: [(0, '0.023')] [2024-06-05 18:22:39,392][10367] Updated weights for policy 0, policy_version 5610 (0.0032) [2024-06-05 18:22:42,899][10367] Updated weights for policy 0, policy_version 5620 (0.0035) [2024-06-05 18:22:43,922][10130] Fps is (10 sec: 49140.9, 60 sec: 48330.9, 300 sec: 48207.5). Total num frames: 92127232. Throughput: 0: 47985.5. Samples: 92160400. Policy #0 lag: (min: 0.0, avg: 10.3, max: 21.0) [2024-06-05 18:22:43,923][10130] Avg episode reward: [(0, '0.022')] [2024-06-05 18:22:46,458][10367] Updated weights for policy 0, policy_version 5630 (0.0029) [2024-06-05 18:22:48,920][10130] Fps is (10 sec: 47513.3, 60 sec: 47786.7, 300 sec: 48152.3). Total num frames: 92340224. Throughput: 0: 47981.2. Samples: 92449920. Policy #0 lag: (min: 0.0, avg: 10.9, max: 21.0) [2024-06-05 18:22:48,921][10130] Avg episode reward: [(0, '0.020')] [2024-06-05 18:22:49,674][10367] Updated weights for policy 0, policy_version 5640 (0.0023) [2024-06-05 18:22:53,142][10367] Updated weights for policy 0, policy_version 5650 (0.0028) [2024-06-05 18:22:53,920][10130] Fps is (10 sec: 45886.1, 60 sec: 48059.8, 300 sec: 48096.8). Total num frames: 92585984. Throughput: 0: 47852.5. Samples: 92732800. Policy #0 lag: (min: 1.0, avg: 9.8, max: 21.0) [2024-06-05 18:22:53,920][10130] Avg episode reward: [(0, '0.023')] [2024-06-05 18:22:56,671][10367] Updated weights for policy 0, policy_version 5660 (0.0032) [2024-06-05 18:22:58,920][10130] Fps is (10 sec: 50790.4, 60 sec: 48060.9, 300 sec: 48263.4). Total num frames: 92848128. Throughput: 0: 48080.8. Samples: 92878780. Policy #0 lag: (min: 0.0, avg: 10.0, max: 22.0) [2024-06-05 18:22:58,921][10130] Avg episode reward: [(0, '0.020')] [2024-06-05 18:23:00,203][10367] Updated weights for policy 0, policy_version 5670 (0.0025) [2024-06-05 18:23:03,641][10367] Updated weights for policy 0, policy_version 5680 (0.0036) [2024-06-05 18:23:03,920][10130] Fps is (10 sec: 47513.3, 60 sec: 47513.6, 300 sec: 48096.8). 
Total num frames: 93061120. Throughput: 0: 48024.0. Samples: 93163800. Policy #0 lag: (min: 0.0, avg: 10.1, max: 21.0) [2024-06-05 18:23:03,920][10130] Avg episode reward: [(0, '0.026')] [2024-06-05 18:23:04,041][10347] Saving new best policy, reward=0.026! [2024-06-05 18:23:07,302][10367] Updated weights for policy 0, policy_version 5690 (0.0032) [2024-06-05 18:23:08,920][10130] Fps is (10 sec: 45875.5, 60 sec: 48059.7, 300 sec: 48096.8). Total num frames: 93306880. Throughput: 0: 47743.2. Samples: 93446840. Policy #0 lag: (min: 0.0, avg: 9.3, max: 20.0) [2024-06-05 18:23:08,921][10130] Avg episode reward: [(0, '0.029')] [2024-06-05 18:23:08,924][10347] Saving new best policy, reward=0.029! [2024-06-05 18:23:10,422][10367] Updated weights for policy 0, policy_version 5700 (0.0027) [2024-06-05 18:23:13,920][10130] Fps is (10 sec: 47513.3, 60 sec: 47786.6, 300 sec: 48096.7). Total num frames: 93536256. Throughput: 0: 47816.9. Samples: 93586920. Policy #0 lag: (min: 0.0, avg: 11.2, max: 22.0) [2024-06-05 18:23:13,920][10130] Avg episode reward: [(0, '0.022')] [2024-06-05 18:23:14,006][10367] Updated weights for policy 0, policy_version 5710 (0.0032) [2024-06-05 18:23:17,030][10367] Updated weights for policy 0, policy_version 5720 (0.0028) [2024-06-05 18:23:18,920][10130] Fps is (10 sec: 49152.1, 60 sec: 48059.7, 300 sec: 48152.3). Total num frames: 93798400. Throughput: 0: 47626.3. Samples: 93874500. Policy #0 lag: (min: 0.0, avg: 11.2, max: 22.0) [2024-06-05 18:23:18,920][10130] Avg episode reward: [(0, '0.024')] [2024-06-05 18:23:21,223][10367] Updated weights for policy 0, policy_version 5730 (0.0029) [2024-06-05 18:23:23,920][10130] Fps is (10 sec: 49151.8, 60 sec: 47786.7, 300 sec: 48096.8). Total num frames: 94027776. Throughput: 0: 47857.3. Samples: 94165840. Policy #0 lag: (min: 0.0, avg: 11.2, max: 22.0) [2024-06-05 18:23:23,922][10130] Avg episode reward: [(0, '0.026')] [2024-06-05 18:23:24,091][10367] Updated weights for policy 0, policy_version 5740 (0.0022) [2024-06-05 18:23:27,878][10367] Updated weights for policy 0, policy_version 5750 (0.0035) [2024-06-05 18:23:28,920][10130] Fps is (10 sec: 47513.3, 60 sec: 48062.7, 300 sec: 48041.2). Total num frames: 94273536. Throughput: 0: 47747.3. Samples: 94308920. Policy #0 lag: (min: 0.0, avg: 10.2, max: 20.0) [2024-06-05 18:23:28,929][10130] Avg episode reward: [(0, '0.027')] [2024-06-05 18:23:30,816][10367] Updated weights for policy 0, policy_version 5760 (0.0018) [2024-06-05 18:23:33,920][10130] Fps is (10 sec: 47513.6, 60 sec: 47786.6, 300 sec: 48096.7). Total num frames: 94502912. Throughput: 0: 47730.7. Samples: 94597800. Policy #0 lag: (min: 0.0, avg: 10.8, max: 22.0) [2024-06-05 18:23:33,929][10130] Avg episode reward: [(0, '0.028')] [2024-06-05 18:23:34,866][10367] Updated weights for policy 0, policy_version 5770 (0.0021) [2024-06-05 18:23:37,661][10367] Updated weights for policy 0, policy_version 5780 (0.0028) [2024-06-05 18:23:38,920][10130] Fps is (10 sec: 47513.9, 60 sec: 48059.8, 300 sec: 48096.8). Total num frames: 94748672. Throughput: 0: 47898.1. Samples: 94888220. Policy #0 lag: (min: 0.0, avg: 11.0, max: 23.0) [2024-06-05 18:23:38,920][10130] Avg episode reward: [(0, '0.023')] [2024-06-05 18:23:40,492][10347] Signal inference workers to stop experience collection... (1350 times) [2024-06-05 18:23:40,492][10347] Signal inference workers to resume experience collection... 
(1350 times) [2024-06-05 18:23:40,531][10367] InferenceWorker_p0-w0: stopping experience collection (1350 times) [2024-06-05 18:23:40,532][10367] InferenceWorker_p0-w0: resuming experience collection (1350 times) [2024-06-05 18:23:41,577][10367] Updated weights for policy 0, policy_version 5790 (0.0028) [2024-06-05 18:23:43,920][10130] Fps is (10 sec: 49152.2, 60 sec: 47788.5, 300 sec: 48041.2). Total num frames: 94994432. Throughput: 0: 47879.2. Samples: 95033340. Policy #0 lag: (min: 0.0, avg: 8.8, max: 21.0) [2024-06-05 18:23:43,920][10130] Avg episode reward: [(0, '0.020')] [2024-06-05 18:23:44,325][10367] Updated weights for policy 0, policy_version 5800 (0.0032) [2024-06-05 18:23:48,378][10367] Updated weights for policy 0, policy_version 5810 (0.0023) [2024-06-05 18:23:48,920][10130] Fps is (10 sec: 47513.5, 60 sec: 48059.8, 300 sec: 47985.7). Total num frames: 95223808. Throughput: 0: 47924.0. Samples: 95320380. Policy #0 lag: (min: 0.0, avg: 11.4, max: 21.0) [2024-06-05 18:23:48,920][10130] Avg episode reward: [(0, '0.026')] [2024-06-05 18:23:51,313][10367] Updated weights for policy 0, policy_version 5820 (0.0027) [2024-06-05 18:23:53,920][10130] Fps is (10 sec: 47513.8, 60 sec: 48059.7, 300 sec: 48096.8). Total num frames: 95469568. Throughput: 0: 47903.6. Samples: 95602500. Policy #0 lag: (min: 1.0, avg: 12.0, max: 24.0) [2024-06-05 18:23:53,920][10130] Avg episode reward: [(0, '0.026')] [2024-06-05 18:23:55,279][10367] Updated weights for policy 0, policy_version 5830 (0.0021) [2024-06-05 18:23:58,450][10367] Updated weights for policy 0, policy_version 5840 (0.0034) [2024-06-05 18:23:58,920][10130] Fps is (10 sec: 49151.4, 60 sec: 47786.6, 300 sec: 48152.3). Total num frames: 95715328. Throughput: 0: 48131.0. Samples: 95752820. Policy #0 lag: (min: 0.0, avg: 10.2, max: 20.0) [2024-06-05 18:23:58,920][10130] Avg episode reward: [(0, '0.024')] [2024-06-05 18:24:02,190][10367] Updated weights for policy 0, policy_version 5850 (0.0026) [2024-06-05 18:24:03,923][10130] Fps is (10 sec: 47496.2, 60 sec: 48056.8, 300 sec: 48040.6). Total num frames: 95944704. Throughput: 0: 48201.9. Samples: 96043760. Policy #0 lag: (min: 0.0, avg: 10.2, max: 20.0) [2024-06-05 18:24:03,924][10130] Avg episode reward: [(0, '0.032')] [2024-06-05 18:24:03,958][10347] Saving new best policy, reward=0.032! [2024-06-05 18:24:05,077][10367] Updated weights for policy 0, policy_version 5860 (0.0025) [2024-06-05 18:24:08,902][10367] Updated weights for policy 0, policy_version 5870 (0.0030) [2024-06-05 18:24:08,920][10130] Fps is (10 sec: 45875.8, 60 sec: 47786.7, 300 sec: 47985.7). Total num frames: 96174080. Throughput: 0: 48183.2. Samples: 96334080. Policy #0 lag: (min: 0.0, avg: 10.8, max: 22.0) [2024-06-05 18:24:08,920][10130] Avg episode reward: [(0, '0.024')] [2024-06-05 18:24:12,064][10367] Updated weights for policy 0, policy_version 5880 (0.0037) [2024-06-05 18:24:13,922][10130] Fps is (10 sec: 49158.4, 60 sec: 48330.9, 300 sec: 48096.4). Total num frames: 96436224. Throughput: 0: 47947.8. Samples: 96466680. Policy #0 lag: (min: 0.0, avg: 9.5, max: 23.0) [2024-06-05 18:24:13,923][10130] Avg episode reward: [(0, '0.029')] [2024-06-05 18:24:15,822][10367] Updated weights for policy 0, policy_version 5890 (0.0030) [2024-06-05 18:24:18,873][10367] Updated weights for policy 0, policy_version 5900 (0.0035) [2024-06-05 18:24:18,920][10130] Fps is (10 sec: 49152.3, 60 sec: 47786.7, 300 sec: 48041.2). Total num frames: 96665600. Throughput: 0: 47966.4. Samples: 96756280. 
Policy #0 lag: (min: 0.0, avg: 9.3, max: 21.0) [2024-06-05 18:24:18,920][10130] Avg episode reward: [(0, '0.030')] [2024-06-05 18:24:22,801][10367] Updated weights for policy 0, policy_version 5910 (0.0028) [2024-06-05 18:24:23,921][10130] Fps is (10 sec: 47519.6, 60 sec: 48058.9, 300 sec: 47985.5). Total num frames: 96911360. Throughput: 0: 47936.6. Samples: 97045420. Policy #0 lag: (min: 0.0, avg: 9.3, max: 21.0) [2024-06-05 18:24:23,922][10130] Avg episode reward: [(0, '0.024')] [2024-06-05 18:24:25,533][10367] Updated weights for policy 0, policy_version 5920 (0.0031) [2024-06-05 18:24:28,920][10130] Fps is (10 sec: 45874.6, 60 sec: 47513.6, 300 sec: 47985.7). Total num frames: 97124352. Throughput: 0: 47990.2. Samples: 97192900. Policy #0 lag: (min: 0.0, avg: 13.1, max: 23.0) [2024-06-05 18:24:28,921][10130] Avg episode reward: [(0, '0.029')] [2024-06-05 18:24:29,274][10367] Updated weights for policy 0, policy_version 5930 (0.0025) [2024-06-05 18:24:32,348][10367] Updated weights for policy 0, policy_version 5940 (0.0032) [2024-06-05 18:24:33,923][10130] Fps is (10 sec: 50780.6, 60 sec: 48603.5, 300 sec: 48151.8). Total num frames: 97419264. Throughput: 0: 47975.5. Samples: 97479420. Policy #0 lag: (min: 0.0, avg: 11.7, max: 23.0) [2024-06-05 18:24:33,923][10130] Avg episode reward: [(0, '0.034')] [2024-06-05 18:24:33,929][10347] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000005946_97419264.pth... [2024-06-05 18:24:33,978][10347] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000005242_85884928.pth [2024-06-05 18:24:33,982][10347] Saving new best policy, reward=0.034! [2024-06-05 18:24:36,149][10367] Updated weights for policy 0, policy_version 5950 (0.0032) [2024-06-05 18:24:38,920][10130] Fps is (10 sec: 49152.0, 60 sec: 47786.6, 300 sec: 47985.7). Total num frames: 97615872. Throughput: 0: 48110.6. Samples: 97767480. Policy #0 lag: (min: 0.0, avg: 11.7, max: 23.0) [2024-06-05 18:24:38,920][10130] Avg episode reward: [(0, '0.033')] [2024-06-05 18:24:39,093][10367] Updated weights for policy 0, policy_version 5960 (0.0027) [2024-06-05 18:24:42,971][10367] Updated weights for policy 0, policy_version 5970 (0.0032) [2024-06-05 18:24:43,920][10130] Fps is (10 sec: 44250.1, 60 sec: 47786.7, 300 sec: 47930.1). Total num frames: 97861632. Throughput: 0: 48020.1. Samples: 97913720. Policy #0 lag: (min: 0.0, avg: 9.8, max: 21.0) [2024-06-05 18:24:43,920][10130] Avg episode reward: [(0, '0.035')] [2024-06-05 18:24:43,937][10347] Saving new best policy, reward=0.035! [2024-06-05 18:24:46,091][10367] Updated weights for policy 0, policy_version 5980 (0.0022) [2024-06-05 18:24:46,707][10347] Signal inference workers to stop experience collection... (1400 times) [2024-06-05 18:24:46,708][10347] Signal inference workers to resume experience collection... (1400 times) [2024-06-05 18:24:46,751][10367] InferenceWorker_p0-w0: stopping experience collection (1400 times) [2024-06-05 18:24:46,751][10367] InferenceWorker_p0-w0: resuming experience collection (1400 times) [2024-06-05 18:24:48,920][10130] Fps is (10 sec: 47514.1, 60 sec: 47786.7, 300 sec: 47985.7). Total num frames: 98091008. Throughput: 0: 47804.4. Samples: 98194780. 
Policy #0 lag: (min: 0.0, avg: 10.3, max: 20.0) [2024-06-05 18:24:48,920][10130] Avg episode reward: [(0, '0.033')] [2024-06-05 18:24:49,943][10367] Updated weights for policy 0, policy_version 5990 (0.0028) [2024-06-05 18:24:52,756][10367] Updated weights for policy 0, policy_version 6000 (0.0034) [2024-06-05 18:24:53,920][10130] Fps is (10 sec: 49152.1, 60 sec: 48059.7, 300 sec: 48041.2). Total num frames: 98353152. Throughput: 0: 47923.1. Samples: 98490620. Policy #0 lag: (min: 0.0, avg: 9.0, max: 21.0) [2024-06-05 18:24:53,920][10130] Avg episode reward: [(0, '0.029')] [2024-06-05 18:24:56,632][10367] Updated weights for policy 0, policy_version 6010 (0.0038) [2024-06-05 18:24:58,920][10130] Fps is (10 sec: 49151.7, 60 sec: 47786.8, 300 sec: 48041.2). Total num frames: 98582528. Throughput: 0: 48257.2. Samples: 98638140. Policy #0 lag: (min: 0.0, avg: 9.9, max: 21.0) [2024-06-05 18:24:58,920][10130] Avg episode reward: [(0, '0.032')] [2024-06-05 18:24:59,742][10367] Updated weights for policy 0, policy_version 6020 (0.0030) [2024-06-05 18:25:03,363][10367] Updated weights for policy 0, policy_version 6030 (0.0033) [2024-06-05 18:25:03,920][10130] Fps is (10 sec: 45874.9, 60 sec: 47789.5, 300 sec: 47930.1). Total num frames: 98811904. Throughput: 0: 48108.8. Samples: 98921180. Policy #0 lag: (min: 0.0, avg: 11.0, max: 23.0) [2024-06-05 18:25:03,920][10130] Avg episode reward: [(0, '0.030')] [2024-06-05 18:25:06,411][10367] Updated weights for policy 0, policy_version 6040 (0.0047) [2024-06-05 18:25:08,920][10130] Fps is (10 sec: 49151.9, 60 sec: 48332.8, 300 sec: 48041.8). Total num frames: 99074048. Throughput: 0: 47986.9. Samples: 99204780. Policy #0 lag: (min: 0.0, avg: 10.8, max: 23.0) [2024-06-05 18:25:08,920][10130] Avg episode reward: [(0, '0.026')] [2024-06-05 18:25:10,394][10367] Updated weights for policy 0, policy_version 6050 (0.0026) [2024-06-05 18:25:13,542][10367] Updated weights for policy 0, policy_version 6060 (0.0020) [2024-06-05 18:25:13,920][10130] Fps is (10 sec: 49152.3, 60 sec: 47788.6, 300 sec: 48041.8). Total num frames: 99303424. Throughput: 0: 47821.0. Samples: 99344840. Policy #0 lag: (min: 0.0, avg: 10.3, max: 21.0) [2024-06-05 18:25:13,920][10130] Avg episode reward: [(0, '0.034')] [2024-06-05 18:25:17,323][10367] Updated weights for policy 0, policy_version 6070 (0.0024) [2024-06-05 18:25:18,920][10130] Fps is (10 sec: 47514.0, 60 sec: 48059.7, 300 sec: 48041.2). Total num frames: 99549184. Throughput: 0: 47937.0. Samples: 99636440. Policy #0 lag: (min: 0.0, avg: 10.3, max: 21.0) [2024-06-05 18:25:18,920][10130] Avg episode reward: [(0, '0.033')] [2024-06-05 18:25:20,347][10367] Updated weights for policy 0, policy_version 6080 (0.0025) [2024-06-05 18:25:23,920][10130] Fps is (10 sec: 45874.9, 60 sec: 47514.4, 300 sec: 47930.1). Total num frames: 99762176. Throughput: 0: 47834.2. Samples: 99920020. Policy #0 lag: (min: 0.0, avg: 11.2, max: 21.0) [2024-06-05 18:25:23,920][10130] Avg episode reward: [(0, '0.031')] [2024-06-05 18:25:24,074][10367] Updated weights for policy 0, policy_version 6090 (0.0031) [2024-06-05 18:25:27,183][10367] Updated weights for policy 0, policy_version 6100 (0.0027) [2024-06-05 18:25:28,920][10130] Fps is (10 sec: 49151.9, 60 sec: 48605.9, 300 sec: 48041.2). Total num frames: 100040704. Throughput: 0: 47741.4. Samples: 100062080. 
Policy #0 lag: (min: 0.0, avg: 9.5, max: 22.0) [2024-06-05 18:25:28,921][10130] Avg episode reward: [(0, '0.031')] [2024-06-05 18:25:30,901][10367] Updated weights for policy 0, policy_version 6110 (0.0029) [2024-06-05 18:25:33,718][10367] Updated weights for policy 0, policy_version 6120 (0.0033) [2024-06-05 18:25:33,920][10130] Fps is (10 sec: 50790.3, 60 sec: 47515.9, 300 sec: 48041.2). Total num frames: 100270080. Throughput: 0: 48056.3. Samples: 100357320. Policy #0 lag: (min: 0.0, avg: 8.8, max: 22.0) [2024-06-05 18:25:33,920][10130] Avg episode reward: [(0, '0.027')] [2024-06-05 18:25:37,662][10367] Updated weights for policy 0, policy_version 6130 (0.0028) [2024-06-05 18:25:38,920][10130] Fps is (10 sec: 47513.6, 60 sec: 48332.9, 300 sec: 47985.7). Total num frames: 100515840. Throughput: 0: 48094.2. Samples: 100654860. Policy #0 lag: (min: 0.0, avg: 9.9, max: 21.0) [2024-06-05 18:25:38,921][10130] Avg episode reward: [(0, '0.031')] [2024-06-05 18:25:40,309][10367] Updated weights for policy 0, policy_version 6140 (0.0029) [2024-06-05 18:25:43,920][10130] Fps is (10 sec: 45875.2, 60 sec: 47786.6, 300 sec: 47985.7). Total num frames: 100728832. Throughput: 0: 47969.3. Samples: 100796760. Policy #0 lag: (min: 0.0, avg: 10.3, max: 26.0) [2024-06-05 18:25:43,921][10130] Avg episode reward: [(0, '0.032')] [2024-06-05 18:25:44,287][10367] Updated weights for policy 0, policy_version 6150 (0.0027) [2024-06-05 18:25:45,375][10347] Signal inference workers to stop experience collection... (1450 times) [2024-06-05 18:25:45,419][10367] InferenceWorker_p0-w0: stopping experience collection (1450 times) [2024-06-05 18:25:45,425][10347] Signal inference workers to resume experience collection... (1450 times) [2024-06-05 18:25:45,440][10367] InferenceWorker_p0-w0: resuming experience collection (1450 times) [2024-06-05 18:25:47,308][10367] Updated weights for policy 0, policy_version 6160 (0.0033) [2024-06-05 18:25:48,920][10130] Fps is (10 sec: 47513.6, 60 sec: 48332.8, 300 sec: 48041.2). Total num frames: 100990976. Throughput: 0: 47991.2. Samples: 101080780. Policy #0 lag: (min: 0.0, avg: 10.3, max: 26.0) [2024-06-05 18:25:48,920][10130] Avg episode reward: [(0, '0.036')] [2024-06-05 18:25:48,921][10347] Saving new best policy, reward=0.036! [2024-06-05 18:25:51,113][10367] Updated weights for policy 0, policy_version 6170 (0.0030) [2024-06-05 18:25:53,920][10130] Fps is (10 sec: 49151.6, 60 sec: 47786.5, 300 sec: 47930.1). Total num frames: 101220352. Throughput: 0: 47984.3. Samples: 101364080. Policy #0 lag: (min: 0.0, avg: 10.3, max: 22.0) [2024-06-05 18:25:53,921][10130] Avg episode reward: [(0, '0.037')] [2024-06-05 18:25:54,317][10367] Updated weights for policy 0, policy_version 6180 (0.0037) [2024-06-05 18:25:57,899][10367] Updated weights for policy 0, policy_version 6190 (0.0030) [2024-06-05 18:25:58,920][10130] Fps is (10 sec: 47513.3, 60 sec: 48059.7, 300 sec: 47930.1). Total num frames: 101466112. Throughput: 0: 48090.6. Samples: 101508920. Policy #0 lag: (min: 1.0, avg: 11.6, max: 23.0) [2024-06-05 18:25:58,920][10130] Avg episode reward: [(0, '0.036')] [2024-06-05 18:26:00,956][10367] Updated weights for policy 0, policy_version 6200 (0.0026) [2024-06-05 18:26:03,920][10130] Fps is (10 sec: 47514.4, 60 sec: 48059.8, 300 sec: 47985.7). Total num frames: 101695488. Throughput: 0: 48140.9. Samples: 101802780. 
Policy #0 lag: (min: 0.0, avg: 10.1, max: 20.0) [2024-06-05 18:26:03,920][10130] Avg episode reward: [(0, '0.037')] [2024-06-05 18:26:03,930][10347] Saving new best policy, reward=0.037! [2024-06-05 18:26:04,665][10367] Updated weights for policy 0, policy_version 6210 (0.0030) [2024-06-05 18:26:07,868][10367] Updated weights for policy 0, policy_version 6220 (0.0032) [2024-06-05 18:26:08,920][10130] Fps is (10 sec: 47514.1, 60 sec: 47786.8, 300 sec: 48041.2). Total num frames: 101941248. Throughput: 0: 48343.2. Samples: 102095460. Policy #0 lag: (min: 0.0, avg: 11.1, max: 21.0) [2024-06-05 18:26:08,920][10130] Avg episode reward: [(0, '0.030')] [2024-06-05 18:26:11,486][10367] Updated weights for policy 0, policy_version 6230 (0.0031) [2024-06-05 18:26:13,923][10130] Fps is (10 sec: 50771.7, 60 sec: 48329.8, 300 sec: 48040.6). Total num frames: 102203392. Throughput: 0: 48304.1. Samples: 102235940. Policy #0 lag: (min: 0.0, avg: 10.5, max: 20.0) [2024-06-05 18:26:13,924][10130] Avg episode reward: [(0, '0.037')] [2024-06-05 18:26:14,930][10367] Updated weights for policy 0, policy_version 6240 (0.0022) [2024-06-05 18:26:18,200][10367] Updated weights for policy 0, policy_version 6250 (0.0027) [2024-06-05 18:26:18,920][10130] Fps is (10 sec: 47513.5, 60 sec: 47786.7, 300 sec: 47930.2). Total num frames: 102416384. Throughput: 0: 48069.5. Samples: 102520440. Policy #0 lag: (min: 0.0, avg: 10.5, max: 20.0) [2024-06-05 18:26:18,920][10130] Avg episode reward: [(0, '0.040')] [2024-06-05 18:26:18,921][10347] Saving new best policy, reward=0.040! [2024-06-05 18:26:21,618][10367] Updated weights for policy 0, policy_version 6260 (0.0031) [2024-06-05 18:26:23,920][10130] Fps is (10 sec: 45892.0, 60 sec: 48332.8, 300 sec: 47985.7). Total num frames: 102662144. Throughput: 0: 47905.7. Samples: 102810620. Policy #0 lag: (min: 0.0, avg: 10.0, max: 21.0) [2024-06-05 18:26:23,920][10130] Avg episode reward: [(0, '0.036')] [2024-06-05 18:26:24,831][10367] Updated weights for policy 0, policy_version 6270 (0.0037) [2024-06-05 18:26:28,225][10367] Updated weights for policy 0, policy_version 6280 (0.0029) [2024-06-05 18:26:28,920][10130] Fps is (10 sec: 49151.4, 60 sec: 47786.6, 300 sec: 48041.2). Total num frames: 102907904. Throughput: 0: 47981.3. Samples: 102955920. Policy #0 lag: (min: 0.0, avg: 9.9, max: 22.0) [2024-06-05 18:26:28,920][10130] Avg episode reward: [(0, '0.037')] [2024-06-05 18:26:31,938][10367] Updated weights for policy 0, policy_version 6290 (0.0027) [2024-06-05 18:26:33,920][10130] Fps is (10 sec: 49152.0, 60 sec: 48059.8, 300 sec: 48041.2). Total num frames: 103153664. Throughput: 0: 48217.7. Samples: 103250580. Policy #0 lag: (min: 0.0, avg: 9.8, max: 21.0) [2024-06-05 18:26:33,920][10130] Avg episode reward: [(0, '0.036')] [2024-06-05 18:26:34,005][10347] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000006297_103170048.pth... [2024-06-05 18:26:34,052][10347] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000005593_91635712.pth [2024-06-05 18:26:35,272][10367] Updated weights for policy 0, policy_version 6300 (0.0035) [2024-06-05 18:26:38,873][10367] Updated weights for policy 0, policy_version 6310 (0.0029) [2024-06-05 18:26:38,920][10130] Fps is (10 sec: 47513.8, 60 sec: 47786.6, 300 sec: 47985.7). Total num frames: 103383040. Throughput: 0: 48182.4. Samples: 103532280. 
Policy #0 lag: (min: 0.0, avg: 11.2, max: 22.0) [2024-06-05 18:26:38,920][10130] Avg episode reward: [(0, '0.041')] [2024-06-05 18:26:42,166][10367] Updated weights for policy 0, policy_version 6320 (0.0031) [2024-06-05 18:26:42,804][10347] Signal inference workers to stop experience collection... (1500 times) [2024-06-05 18:26:42,805][10347] Signal inference workers to resume experience collection... (1500 times) [2024-06-05 18:26:42,844][10367] InferenceWorker_p0-w0: stopping experience collection (1500 times) [2024-06-05 18:26:42,844][10367] InferenceWorker_p0-w0: resuming experience collection (1500 times) [2024-06-05 18:26:43,920][10130] Fps is (10 sec: 49151.3, 60 sec: 48605.8, 300 sec: 48041.2). Total num frames: 103645184. Throughput: 0: 48091.0. Samples: 103673020. Policy #0 lag: (min: 0.0, avg: 11.5, max: 22.0) [2024-06-05 18:26:43,920][10130] Avg episode reward: [(0, '0.028')] [2024-06-05 18:26:45,687][10367] Updated weights for policy 0, policy_version 6330 (0.0024) [2024-06-05 18:26:48,923][10130] Fps is (10 sec: 47500.4, 60 sec: 47784.4, 300 sec: 47985.2). Total num frames: 103858176. Throughput: 0: 47868.1. Samples: 103956980. Policy #0 lag: (min: 0.0, avg: 11.5, max: 22.0) [2024-06-05 18:26:48,923][10130] Avg episode reward: [(0, '0.038')] [2024-06-05 18:26:48,979][10367] Updated weights for policy 0, policy_version 6340 (0.0034) [2024-06-05 18:26:52,483][10367] Updated weights for policy 0, policy_version 6350 (0.0032) [2024-06-05 18:26:53,920][10130] Fps is (10 sec: 47514.3, 60 sec: 48332.9, 300 sec: 47985.9). Total num frames: 104120320. Throughput: 0: 47956.8. Samples: 104253520. Policy #0 lag: (min: 0.0, avg: 10.8, max: 23.0) [2024-06-05 18:26:53,920][10130] Avg episode reward: [(0, '0.037')] [2024-06-05 18:26:55,693][10367] Updated weights for policy 0, policy_version 6360 (0.0029) [2024-06-05 18:26:58,920][10130] Fps is (10 sec: 45887.8, 60 sec: 47513.6, 300 sec: 47819.1). Total num frames: 104316928. Throughput: 0: 47907.4. Samples: 104391600. Policy #0 lag: (min: 0.0, avg: 10.3, max: 21.0) [2024-06-05 18:26:58,920][10130] Avg episode reward: [(0, '0.038')] [2024-06-05 18:26:59,446][10367] Updated weights for policy 0, policy_version 6370 (0.0021) [2024-06-05 18:27:02,516][10367] Updated weights for policy 0, policy_version 6380 (0.0031) [2024-06-05 18:27:03,920][10130] Fps is (10 sec: 49152.0, 60 sec: 48605.9, 300 sec: 48096.8). Total num frames: 104611840. Throughput: 0: 48042.6. Samples: 104682360. Policy #0 lag: (min: 0.0, avg: 11.3, max: 21.0) [2024-06-05 18:27:03,926][10130] Avg episode reward: [(0, '0.036')] [2024-06-05 18:27:06,181][10367] Updated weights for policy 0, policy_version 6390 (0.0025) [2024-06-05 18:27:08,920][10130] Fps is (10 sec: 50790.1, 60 sec: 48059.6, 300 sec: 47985.7). Total num frames: 104824832. Throughput: 0: 47833.2. Samples: 104963120. Policy #0 lag: (min: 0.0, avg: 10.6, max: 22.0) [2024-06-05 18:27:08,921][10130] Avg episode reward: [(0, '0.039')] [2024-06-05 18:27:09,380][10367] Updated weights for policy 0, policy_version 6400 (0.0031) [2024-06-05 18:27:12,919][10367] Updated weights for policy 0, policy_version 6410 (0.0028) [2024-06-05 18:27:13,920][10130] Fps is (10 sec: 44237.0, 60 sec: 47516.5, 300 sec: 47930.1). Total num frames: 105054208. Throughput: 0: 47918.4. Samples: 105112240. 
Policy #0 lag: (min: 0.0, avg: 9.9, max: 22.0) [2024-06-05 18:27:13,920][10130] Avg episode reward: [(0, '0.040')] [2024-06-05 18:27:16,153][10367] Updated weights for policy 0, policy_version 6420 (0.0031) [2024-06-05 18:27:18,920][10130] Fps is (10 sec: 45875.8, 60 sec: 47786.6, 300 sec: 47874.6). Total num frames: 105283584. Throughput: 0: 47747.1. Samples: 105399200. Policy #0 lag: (min: 0.0, avg: 9.9, max: 22.0) [2024-06-05 18:27:18,920][10130] Avg episode reward: [(0, '0.044')] [2024-06-05 18:27:18,997][10347] Saving new best policy, reward=0.044! [2024-06-05 18:27:19,818][10367] Updated weights for policy 0, policy_version 6430 (0.0033) [2024-06-05 18:27:22,920][10367] Updated weights for policy 0, policy_version 6440 (0.0023) [2024-06-05 18:27:23,920][10130] Fps is (10 sec: 50789.9, 60 sec: 48332.8, 300 sec: 48041.8). Total num frames: 105562112. Throughput: 0: 47891.1. Samples: 105687380. Policy #0 lag: (min: 0.0, avg: 8.8, max: 21.0) [2024-06-05 18:27:23,920][10130] Avg episode reward: [(0, '0.039')] [2024-06-05 18:27:26,668][10367] Updated weights for policy 0, policy_version 6450 (0.0034) [2024-06-05 18:27:28,920][10130] Fps is (10 sec: 50790.1, 60 sec: 48059.8, 300 sec: 47985.7). Total num frames: 105791488. Throughput: 0: 48201.0. Samples: 105842060. Policy #0 lag: (min: 0.0, avg: 10.7, max: 21.0) [2024-06-05 18:27:28,920][10130] Avg episode reward: [(0, '0.039')] [2024-06-05 18:27:29,772][10367] Updated weights for policy 0, policy_version 6460 (0.0035) [2024-06-05 18:27:33,346][10367] Updated weights for policy 0, policy_version 6470 (0.0035) [2024-06-05 18:27:33,920][10130] Fps is (10 sec: 45875.4, 60 sec: 47786.7, 300 sec: 47985.7). Total num frames: 106020864. Throughput: 0: 48353.3. Samples: 106132740. Policy #0 lag: (min: 0.0, avg: 10.5, max: 20.0) [2024-06-05 18:27:33,920][10130] Avg episode reward: [(0, '0.037')] [2024-06-05 18:27:36,340][10367] Updated weights for policy 0, policy_version 6480 (0.0040) [2024-06-05 18:27:38,920][10130] Fps is (10 sec: 47513.2, 60 sec: 48059.7, 300 sec: 47930.5). Total num frames: 106266624. Throughput: 0: 48107.0. Samples: 106418340. Policy #0 lag: (min: 0.0, avg: 11.5, max: 21.0) [2024-06-05 18:27:38,920][10130] Avg episode reward: [(0, '0.032')] [2024-06-05 18:27:40,011][10367] Updated weights for policy 0, policy_version 6490 (0.0034) [2024-06-05 18:27:43,099][10367] Updated weights for policy 0, policy_version 6500 (0.0030) [2024-06-05 18:27:43,920][10130] Fps is (10 sec: 49152.1, 60 sec: 47786.8, 300 sec: 48041.2). Total num frames: 106512384. Throughput: 0: 48242.3. Samples: 106562500. Policy #0 lag: (min: 0.0, avg: 11.5, max: 21.0) [2024-06-05 18:27:43,920][10130] Avg episode reward: [(0, '0.040')] [2024-06-05 18:27:46,954][10367] Updated weights for policy 0, policy_version 6510 (0.0030) [2024-06-05 18:27:48,920][10130] Fps is (10 sec: 49153.0, 60 sec: 48335.1, 300 sec: 48041.2). Total num frames: 106758144. Throughput: 0: 48267.2. Samples: 106854380. Policy #0 lag: (min: 0.0, avg: 10.6, max: 21.0) [2024-06-05 18:27:48,920][10130] Avg episode reward: [(0, '0.042')] [2024-06-05 18:27:50,016][10367] Updated weights for policy 0, policy_version 6520 (0.0027) [2024-06-05 18:27:53,920][10130] Fps is (10 sec: 45875.3, 60 sec: 47513.6, 300 sec: 47874.6). Total num frames: 106971136. Throughput: 0: 48253.9. Samples: 107134540. 
Policy #0 lag: (min: 0.0, avg: 10.8, max: 21.0) [2024-06-05 18:27:53,920][10130] Avg episode reward: [(0, '0.033')] [2024-06-05 18:27:53,965][10367] Updated weights for policy 0, policy_version 6530 (0.0027) [2024-06-05 18:27:57,085][10367] Updated weights for policy 0, policy_version 6540 (0.0034) [2024-06-05 18:27:58,920][10130] Fps is (10 sec: 47513.6, 60 sec: 48606.0, 300 sec: 48041.2). Total num frames: 107233280. Throughput: 0: 48017.8. Samples: 107273040. Policy #0 lag: (min: 0.0, avg: 11.0, max: 21.0) [2024-06-05 18:27:58,920][10130] Avg episode reward: [(0, '0.040')] [2024-06-05 18:28:00,618][10347] Signal inference workers to stop experience collection... (1550 times) [2024-06-05 18:28:00,619][10347] Signal inference workers to resume experience collection... (1550 times) [2024-06-05 18:28:00,643][10367] InferenceWorker_p0-w0: stopping experience collection (1550 times) [2024-06-05 18:28:00,644][10367] InferenceWorker_p0-w0: resuming experience collection (1550 times) [2024-06-05 18:28:00,754][10367] Updated weights for policy 0, policy_version 6550 (0.0031) [2024-06-05 18:28:03,920][10130] Fps is (10 sec: 49151.6, 60 sec: 47513.6, 300 sec: 47985.7). Total num frames: 107462656. Throughput: 0: 48060.8. Samples: 107561940. Policy #0 lag: (min: 0.0, avg: 10.4, max: 22.0) [2024-06-05 18:28:03,920][10130] Avg episode reward: [(0, '0.043')] [2024-06-05 18:28:03,996][10367] Updated weights for policy 0, policy_version 6560 (0.0032) [2024-06-05 18:28:07,477][10367] Updated weights for policy 0, policy_version 6570 (0.0029) [2024-06-05 18:28:08,920][10130] Fps is (10 sec: 49151.6, 60 sec: 48332.9, 300 sec: 48096.8). Total num frames: 107724800. Throughput: 0: 48284.0. Samples: 107860160. Policy #0 lag: (min: 0.0, avg: 10.5, max: 23.0) [2024-06-05 18:28:08,920][10130] Avg episode reward: [(0, '0.042')] [2024-06-05 18:28:10,585][10367] Updated weights for policy 0, policy_version 6580 (0.0029) [2024-06-05 18:28:13,920][10130] Fps is (10 sec: 49152.1, 60 sec: 48332.7, 300 sec: 47985.7). Total num frames: 107954176. Throughput: 0: 47972.9. Samples: 108000840. Policy #0 lag: (min: 0.0, avg: 10.5, max: 23.0) [2024-06-05 18:28:13,920][10130] Avg episode reward: [(0, '0.039')] [2024-06-05 18:28:14,111][10367] Updated weights for policy 0, policy_version 6590 (0.0027) [2024-06-05 18:28:17,480][10367] Updated weights for policy 0, policy_version 6600 (0.0038) [2024-06-05 18:28:18,920][10130] Fps is (10 sec: 49151.8, 60 sec: 48878.9, 300 sec: 48096.8). Total num frames: 108216320. Throughput: 0: 47985.7. Samples: 108292100. Policy #0 lag: (min: 0.0, avg: 9.2, max: 22.0) [2024-06-05 18:28:18,920][10130] Avg episode reward: [(0, '0.041')] [2024-06-05 18:28:21,055][10367] Updated weights for policy 0, policy_version 6610 (0.0031) [2024-06-06 11:59:28,956][02692] Saving configuration to /workspace/metta/train_dir/p2.metta.4/config.json... 
[2024-06-06 11:59:28,998][02692] Rollout worker 0 uses device cpu [2024-06-06 11:59:28,999][02692] Rollout worker 1 uses device cpu [2024-06-06 11:59:28,999][02692] Rollout worker 2 uses device cpu [2024-06-06 11:59:29,000][02692] Rollout worker 3 uses device cpu [2024-06-06 11:59:29,001][02692] Rollout worker 4 uses device cpu [2024-06-06 11:59:29,001][02692] Rollout worker 5 uses device cpu [2024-06-06 11:59:29,001][02692] Rollout worker 6 uses device cpu [2024-06-06 11:59:29,002][02692] Rollout worker 7 uses device cpu [2024-06-06 11:59:29,002][02692] Rollout worker 8 uses device cpu [2024-06-06 11:59:29,003][02692] Rollout worker 9 uses device cpu [2024-06-06 11:59:29,003][02692] Rollout worker 10 uses device cpu [2024-06-06 11:59:29,004][02692] Rollout worker 11 uses device cpu [2024-06-06 11:59:29,005][02692] Rollout worker 12 uses device cpu [2024-06-06 11:59:29,005][02692] Rollout worker 13 uses device cpu [2024-06-06 11:59:29,005][02692] Rollout worker 14 uses device cpu [2024-06-06 11:59:29,005][02692] Rollout worker 15 uses device cpu [2024-06-06 11:59:29,005][02692] Rollout worker 16 uses device cpu [2024-06-06 11:59:29,006][02692] Rollout worker 17 uses device cpu [2024-06-06 11:59:29,006][02692] Rollout worker 18 uses device cpu [2024-06-06 11:59:29,006][02692] Rollout worker 19 uses device cpu [2024-06-06 11:59:29,006][02692] Rollout worker 20 uses device cpu [2024-06-06 11:59:29,006][02692] Rollout worker 21 uses device cpu [2024-06-06 11:59:29,007][02692] Rollout worker 22 uses device cpu [2024-06-06 11:59:29,007][02692] Rollout worker 23 uses device cpu [2024-06-06 11:59:29,007][02692] Rollout worker 24 uses device cpu [2024-06-06 11:59:29,007][02692] Rollout worker 25 uses device cpu [2024-06-06 11:59:29,007][02692] Rollout worker 26 uses device cpu [2024-06-06 11:59:29,008][02692] Rollout worker 27 uses device cpu [2024-06-06 11:59:29,008][02692] Rollout worker 28 uses device cpu [2024-06-06 11:59:29,008][02692] Rollout worker 29 uses device cpu [2024-06-06 11:59:29,008][02692] Rollout worker 30 uses device cpu [2024-06-06 11:59:29,008][02692] Rollout worker 31 uses device cpu [2024-06-06 11:59:29,528][02692] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-06-06 11:59:29,529][02692] InferenceWorker_p0-w0: min num requests: 10 [2024-06-06 11:59:29,589][02692] Starting all processes... [2024-06-06 11:59:29,589][02692] Starting process learner_proc0 [2024-06-06 11:59:29,865][02692] Starting all processes... 
[2024-06-06 11:59:29,868][02692] Starting process inference_proc0-0 [2024-06-06 11:59:29,868][02692] Starting process rollout_proc0 [2024-06-06 11:59:29,868][02692] Starting process rollout_proc1 [2024-06-06 11:59:29,868][02692] Starting process rollout_proc2 [2024-06-06 11:59:29,868][02692] Starting process rollout_proc3 [2024-06-06 11:59:29,868][02692] Starting process rollout_proc4 [2024-06-06 11:59:29,868][02692] Starting process rollout_proc5 [2024-06-06 11:59:29,868][02692] Starting process rollout_proc6 [2024-06-06 11:59:29,868][02692] Starting process rollout_proc7 [2024-06-06 11:59:29,868][02692] Starting process rollout_proc8 [2024-06-06 11:59:29,871][02692] Starting process rollout_proc9 [2024-06-06 11:59:29,871][02692] Starting process rollout_proc10 [2024-06-06 11:59:29,871][02692] Starting process rollout_proc11 [2024-06-06 11:59:29,872][02692] Starting process rollout_proc12 [2024-06-06 11:59:29,875][02692] Starting process rollout_proc17 [2024-06-06 11:59:29,873][02692] Starting process rollout_proc14 [2024-06-06 11:59:29,873][02692] Starting process rollout_proc15 [2024-06-06 11:59:29,873][02692] Starting process rollout_proc16 [2024-06-06 11:59:29,872][02692] Starting process rollout_proc13 [2024-06-06 11:59:29,875][02692] Starting process rollout_proc18 [2024-06-06 11:59:29,876][02692] Starting process rollout_proc19 [2024-06-06 11:59:29,880][02692] Starting process rollout_proc20 [2024-06-06 11:59:29,880][02692] Starting process rollout_proc21 [2024-06-06 11:59:29,880][02692] Starting process rollout_proc22 [2024-06-06 11:59:29,885][02692] Starting process rollout_proc23 [2024-06-06 11:59:29,889][02692] Starting process rollout_proc24 [2024-06-06 11:59:29,889][02692] Starting process rollout_proc25 [2024-06-06 11:59:29,892][02692] Starting process rollout_proc26 [2024-06-06 11:59:29,893][02692] Starting process rollout_proc27 [2024-06-06 11:59:29,896][02692] Starting process rollout_proc28 [2024-06-06 11:59:29,896][02692] Starting process rollout_proc29 [2024-06-06 11:59:29,898][02692] Starting process rollout_proc30 [2024-06-06 11:59:29,898][02692] Starting process rollout_proc31 [2024-06-06 11:59:31,916][02954] Worker 29 uses CPU cores [29] [2024-06-06 11:59:31,916][02925] Worker 0 uses CPU cores [0] [2024-06-06 11:59:32,001][02941] Worker 14 uses CPU cores [14] [2024-06-06 11:59:32,018][02904] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-06-06 11:59:32,019][02904] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 [2024-06-06 11:59:32,020][02927] Worker 2 uses CPU cores [2] [2024-06-06 11:59:32,028][02932] Worker 7 uses CPU cores [7] [2024-06-06 11:59:32,028][02904] Num visible devices: 1 [2024-06-06 11:59:32,032][02937] Worker 10 uses CPU cores [10] [2024-06-06 11:59:32,048][02904] Setting fixed seed 0 [2024-06-06 11:59:32,049][02904] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-06-06 11:59:32,050][02904] Initializing actor-critic model on device cuda:0 [2024-06-06 11:59:32,055][02934] Worker 8 uses CPU cores [8] [2024-06-06 11:59:32,080][02955] Worker 31 uses CPU cores [31] [2024-06-06 11:59:32,083][02930] Worker 5 uses CPU cores [5] [2024-06-06 11:59:32,104][02942] Worker 17 uses CPU cores [17] [2024-06-06 11:59:32,128][02949] Worker 24 uses CPU cores [24] [2024-06-06 11:59:32,148][02950] Worker 25 uses CPU cores [25] [2024-06-06 11:59:32,198][02940] Worker 16 uses CPU cores [16] [2024-06-06 11:59:32,204][02946] Worker 21 uses CPU cores [21] [2024-06-06 11:59:32,256][02952] Worker 28 uses 
CPU cores [28] [2024-06-06 11:59:32,260][02956] Worker 30 uses CPU cores [30] [2024-06-06 11:59:32,262][02936] Worker 12 uses CPU cores [12] [2024-06-06 11:59:32,264][02947] Worker 22 uses CPU cores [22] [2024-06-06 11:59:32,288][02951] Worker 27 uses CPU cores [27] [2024-06-06 11:59:32,288][02928] Worker 3 uses CPU cores [3] [2024-06-06 11:59:32,300][02953] Worker 26 uses CPU cores [26] [2024-06-06 11:59:32,304][02926] Worker 1 uses CPU cores [1] [2024-06-06 11:59:32,310][02938] Worker 19 uses CPU cores [19] [2024-06-06 11:59:32,324][02933] Worker 11 uses CPU cores [11] [2024-06-06 11:59:32,324][02935] Worker 9 uses CPU cores [9] [2024-06-06 11:59:32,331][02948] Worker 23 uses CPU cores [23] [2024-06-06 11:59:32,342][02943] Worker 18 uses CPU cores [18] [2024-06-06 11:59:32,355][02924] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-06-06 11:59:32,355][02924] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 [2024-06-06 11:59:32,363][02924] Num visible devices: 1 [2024-06-06 11:59:32,394][02945] Worker 13 uses CPU cores [13] [2024-06-06 11:59:32,400][02931] Worker 6 uses CPU cores [6] [2024-06-06 11:59:32,400][02944] Worker 20 uses CPU cores [20] [2024-06-06 11:59:32,460][02939] Worker 15 uses CPU cores [15] [2024-06-06 11:59:32,480][02929] Worker 4 uses CPU cores [4] [2024-06-06 11:59:32,805][02904] RunningMeanStd input shape: (11, 11) [2024-06-06 11:59:32,805][02904] RunningMeanStd input shape: (11, 11) [2024-06-06 11:59:32,805][02904] RunningMeanStd input shape: (11, 11) [2024-06-06 11:59:32,805][02904] RunningMeanStd input shape: (11, 11) [2024-06-06 11:59:32,805][02904] RunningMeanStd input shape: (11, 11) [2024-06-06 11:59:32,805][02904] RunningMeanStd input shape: (11, 11) [2024-06-06 11:59:32,805][02904] RunningMeanStd input shape: (11, 11) [2024-06-06 11:59:32,805][02904] RunningMeanStd input shape: (11, 11) [2024-06-06 11:59:32,805][02904] RunningMeanStd input shape: (11, 11) [2024-06-06 11:59:32,805][02904] RunningMeanStd input shape: (11, 11) [2024-06-06 11:59:32,806][02904] RunningMeanStd input shape: (11, 11) [2024-06-06 11:59:32,806][02904] RunningMeanStd input shape: (11, 11) [2024-06-06 11:59:32,806][02904] RunningMeanStd input shape: (11, 11) [2024-06-06 11:59:32,806][02904] RunningMeanStd input shape: (11, 11) [2024-06-06 11:59:32,806][02904] RunningMeanStd input shape: (11, 11) [2024-06-06 11:59:32,806][02904] RunningMeanStd input shape: (11, 11) [2024-06-06 11:59:32,806][02904] RunningMeanStd input shape: (11, 11) [2024-06-06 11:59:32,806][02904] RunningMeanStd input shape: (11, 11) [2024-06-06 11:59:32,806][02904] RunningMeanStd input shape: (11, 11) [2024-06-06 11:59:32,806][02904] RunningMeanStd input shape: (11, 11) [2024-06-06 11:59:32,806][02904] RunningMeanStd input shape: (11, 11) [2024-06-06 11:59:32,806][02904] RunningMeanStd input shape: (11, 11) [2024-06-06 11:59:32,809][02904] RunningMeanStd input shape: (1,) [2024-06-06 11:59:32,809][02904] RunningMeanStd input shape: (1,) [2024-06-06 11:59:32,809][02904] RunningMeanStd input shape: (1,) [2024-06-06 11:59:32,810][02904] RunningMeanStd input shape: (1,) [2024-06-06 11:59:32,849][02904] RunningMeanStd input shape: (1,) [2024-06-06 11:59:32,853][02904] Created Actor Critic model with architecture: [2024-06-06 11:59:32,853][02904] SampleFactoryAgentWrapper( (obs_normalizer): ObservationNormalizer() (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) (agent): MettaAgent( (_encoder): MultiFeatureSetEncoder( 
      (feature_set_encoders): ModuleDict(
        (grid_obs): FeatureSetEncoder(
          (_normalizer): FeatureListNormalizer(
            (_norms_dict): ModuleDict(
              (agent): RunningMeanStdInPlace()
              (altar): RunningMeanStdInPlace()
              (converter): RunningMeanStdInPlace()
              (generator): RunningMeanStdInPlace()
              (wall): RunningMeanStdInPlace()
              (agent:dir): RunningMeanStdInPlace()
              (agent:energy): RunningMeanStdInPlace()
              (agent:frozen): RunningMeanStdInPlace()
              (agent:hp): RunningMeanStdInPlace()
              (agent:id): RunningMeanStdInPlace()
              (agent:inv_r1): RunningMeanStdInPlace()
              (agent:inv_r2): RunningMeanStdInPlace()
              (agent:inv_r3): RunningMeanStdInPlace()
              (agent:shield): RunningMeanStdInPlace()
              (altar:hp): RunningMeanStdInPlace()
              (altar:state): RunningMeanStdInPlace()
              (converter:hp): RunningMeanStdInPlace()
              (converter:state): RunningMeanStdInPlace()
              (generator:amount): RunningMeanStdInPlace()
              (generator:hp): RunningMeanStdInPlace()
              (generator:state): RunningMeanStdInPlace()
              (wall:hp): RunningMeanStdInPlace()
            )
          )
          (embedding_net): Sequential(
            (0): Linear(in_features=125, out_features=512, bias=True)
            (1): ELU(alpha=1.0)
            (2): Linear(in_features=512, out_features=512, bias=True)
            (3): ELU(alpha=1.0)
            (4): Linear(in_features=512, out_features=512, bias=True)
            (5): ELU(alpha=1.0)
            (6): Linear(in_features=512, out_features=512, bias=True)
            (7): ELU(alpha=1.0)
          )
        )
        (global_vars): FeatureSetEncoder(
          (_normalizer): FeatureListNormalizer(
            (_norms_dict): ModuleDict(
              (_steps): RunningMeanStdInPlace()
            )
          )
          (embedding_net): Sequential(
            (0): Linear(in_features=5, out_features=8, bias=True)
            (1): ELU(alpha=1.0)
            (2): Linear(in_features=8, out_features=8, bias=True)
            (3): ELU(alpha=1.0)
          )
        )
        (last_action): FeatureSetEncoder(
          (_normalizer): FeatureListNormalizer(
            (_norms_dict): ModuleDict(
              (last_action_id): RunningMeanStdInPlace()
              (last_action_val): RunningMeanStdInPlace()
            )
          )
          (embedding_net): Sequential(
            (0): Linear(in_features=5, out_features=8, bias=True)
            (1): ELU(alpha=1.0)
            (2): Linear(in_features=8, out_features=8, bias=True)
            (3): ELU(alpha=1.0)
          )
        )
        (last_reward): FeatureSetEncoder(
          (_normalizer): FeatureListNormalizer(
            (_norms_dict): ModuleDict(
              (last_reward): RunningMeanStdInPlace()
            )
          )
          (embedding_net): Sequential(
            (0): Linear(in_features=5, out_features=8, bias=True)
            (1): ELU(alpha=1.0)
            (2): Linear(in_features=8, out_features=8, bias=True)
            (3): ELU(alpha=1.0)
          )
        )
      )
      (merged_encoder): Sequential(
        (0): Linear(in_features=536, out_features=512, bias=True)
        (1): ELU(alpha=1.0)
        (2): Linear(in_features=512, out_features=512, bias=True)
        (3): ELU(alpha=1.0)
        (4): Linear(in_features=512, out_features=512, bias=True)
        (5): ELU(alpha=1.0)
      )
    )
    (_core): ModelCoreRNN(
      (core): GRU(512, 512)
    )
    (_decoder): Decoder(
      (mlp): Identity()
    )
    (_critic_linear): Linear(in_features=512, out_features=1, bias=True)
    (_action_parameterization): ActionParameterizationDefault(
      (distribution_linear): Linear(in_features=512, out_features=16, bias=True)
    )
  )
)
[2024-06-06 11:59:32,923][02904] Using optimizer [2024-06-06 11:59:33,068][02904] Loading state from checkpoint /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000006297_103170048.pth... [2024-06-06 11:59:33,203][02904] Loading model from checkpoint [2024-06-06 11:59:33,207][02904] Loaded experiment state at self.train_step=6297, self.env_steps=103170048 [2024-06-06 11:59:33,207][02904] Initialized policy 0 weights for model version 6297 [2024-06-06 11:59:33,210][02904] LearnerWorker_p0 finished initialization!
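The architecture dump above can be condensed into a short PyTorch sketch. This is a minimal reconstruction inferred only from the printed shapes (a 125-dim grid feature set to 512, three small 5-dim auxiliary feature sets to 8 each, a 536-dim merge back to 512, a GRU core, a scalar critic and a 16-way action head); every name other than the shapes taken from the log is a placeholder, not the project's actual MettaAgent code.

    # Minimal sketch of the printed actor-critic shapes (assumed simplification,
    # not the MettaAgent implementation itself).
    import torch
    from torch import nn

    def mlp(sizes):
        # Linear/ELU stack matching the embedding_net pattern printed in the log.
        layers = []
        for i in range(len(sizes) - 1):
            layers += [nn.Linear(sizes[i], sizes[i + 1]), nn.ELU()]
        return nn.Sequential(*layers)

    class SketchAgent(nn.Module):
        def __init__(self, hidden=512, num_actions=16):
            super().__init__()
            self.grid_obs = mlp([125, hidden, hidden, hidden, hidden])   # 125 -> 512
            self.global_vars = mlp([5, 8, 8])                            # 5 -> 8
            self.last_action = mlp([5, 8, 8])
            self.last_reward = mlp([5, 8, 8])
            self.merged = mlp([hidden + 3 * 8, hidden, hidden, hidden])  # 536 -> 512
            self.core = nn.GRU(hidden, hidden)                           # recurrent core
            self.critic = nn.Linear(hidden, 1)
            self.action_logits = nn.Linear(hidden, num_actions)

        def forward(self, grid, gvars, act, rew, rnn_state=None):
            x = torch.cat([self.grid_obs(grid), self.global_vars(gvars),
                           self.last_action(act), self.last_reward(rew)], dim=-1)
            x = self.merged(x).unsqueeze(0)            # (seq=1, batch, hidden)
            x, rnn_state = self.core(x, rnn_state)
            x = x.squeeze(0)
            return self.action_logits(x), self.critic(x), rnn_state

The 536-dim merge is simply the concatenation of the four per-feature-set embeddings (512 + 8 + 8 + 8), which matches the merged_encoder input size in the printout.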
[2024-06-06 11:59:33,210][02904] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-06-06 11:59:33,844][02924] RunningMeanStd input shape: (11, 11) [2024-06-06 11:59:33,845][02924] RunningMeanStd input shape: (11, 11) [2024-06-06 11:59:33,845][02924] RunningMeanStd input shape: (11, 11) [2024-06-06 11:59:33,845][02924] RunningMeanStd input shape: (11, 11) [2024-06-06 11:59:33,845][02924] RunningMeanStd input shape: (11, 11) [2024-06-06 11:59:33,845][02924] RunningMeanStd input shape: (11, 11) [2024-06-06 11:59:33,845][02924] RunningMeanStd input shape: (11, 11) [2024-06-06 11:59:33,845][02924] RunningMeanStd input shape: (11, 11) [2024-06-06 11:59:33,845][02924] RunningMeanStd input shape: (11, 11) [2024-06-06 11:59:33,845][02924] RunningMeanStd input shape: (11, 11) [2024-06-06 11:59:33,845][02924] RunningMeanStd input shape: (11, 11) [2024-06-06 11:59:33,845][02924] RunningMeanStd input shape: (11, 11) [2024-06-06 11:59:33,845][02924] RunningMeanStd input shape: (11, 11) [2024-06-06 11:59:33,845][02924] RunningMeanStd input shape: (11, 11) [2024-06-06 11:59:33,845][02924] RunningMeanStd input shape: (11, 11) [2024-06-06 11:59:33,845][02924] RunningMeanStd input shape: (11, 11) [2024-06-06 11:59:33,845][02924] RunningMeanStd input shape: (11, 11) [2024-06-06 11:59:33,845][02924] RunningMeanStd input shape: (11, 11) [2024-06-06 11:59:33,845][02924] RunningMeanStd input shape: (11, 11) [2024-06-06 11:59:33,845][02924] RunningMeanStd input shape: (11, 11) [2024-06-06 11:59:33,846][02924] RunningMeanStd input shape: (11, 11) [2024-06-06 11:59:33,846][02924] RunningMeanStd input shape: (11, 11) [2024-06-06 11:59:33,849][02924] RunningMeanStd input shape: (1,) [2024-06-06 11:59:33,849][02924] RunningMeanStd input shape: (1,) [2024-06-06 11:59:33,849][02924] RunningMeanStd input shape: (1,) [2024-06-06 11:59:33,849][02924] RunningMeanStd input shape: (1,) [2024-06-06 11:59:33,888][02924] RunningMeanStd input shape: (1,) [2024-06-06 11:59:33,909][02692] Inference worker 0-0 is ready! [2024-06-06 11:59:33,910][02692] All inference workers are ready! Signal rollout workers to start! [2024-06-06 11:59:35,968][02942] Decorrelating experience for 0 frames... [2024-06-06 11:59:35,987][02946] Decorrelating experience for 0 frames... [2024-06-06 11:59:35,993][02938] Decorrelating experience for 0 frames... [2024-06-06 11:59:36,003][02953] Decorrelating experience for 0 frames... [2024-06-06 11:59:36,006][02943] Decorrelating experience for 0 frames... [2024-06-06 11:59:36,007][02940] Decorrelating experience for 0 frames... [2024-06-06 11:59:36,009][02956] Decorrelating experience for 0 frames... [2024-06-06 11:59:36,011][02950] Decorrelating experience for 0 frames... [2024-06-06 11:59:36,013][02944] Decorrelating experience for 0 frames... [2024-06-06 11:59:36,015][02948] Decorrelating experience for 0 frames... [2024-06-06 11:59:36,015][02947] Decorrelating experience for 0 frames... [2024-06-06 11:59:36,016][02949] Decorrelating experience for 0 frames... [2024-06-06 11:59:36,030][02935] Decorrelating experience for 0 frames... [2024-06-06 11:59:36,032][02945] Decorrelating experience for 0 frames... [2024-06-06 11:59:36,032][02955] Decorrelating experience for 0 frames... [2024-06-06 11:59:36,032][02932] Decorrelating experience for 0 frames... [2024-06-06 11:59:36,033][02926] Decorrelating experience for 0 frames... [2024-06-06 11:59:36,034][02933] Decorrelating experience for 0 frames... [2024-06-06 11:59:36,035][02954] Decorrelating experience for 0 frames... 
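The repeated "RunningMeanStd input shape" lines correspond to running mean/variance normalizers instantiated for each observation feature and for returns, on both the learner and the inference worker. A compact sketch of the standard running-mean-std update follows; this illustrates the generic technique, not the RunningMeanStdInPlace implementation used by the project.

    # Generic running mean/std normalizer (sketch of the technique, assumed details).
    import numpy as np

    class RunningMeanStd:
        def __init__(self, shape, eps=1e-4):
            self.mean = np.zeros(shape, dtype=np.float64)
            self.var = np.ones(shape, dtype=np.float64)
            self.count = eps

        def update(self, batch):
            batch_mean = batch.mean(axis=0)
            batch_var = batch.var(axis=0)
            batch_count = batch.shape[0]
            delta = batch_mean - self.mean
            total = self.count + batch_count
            # parallel-variance merge of the running moments with the batch moments
            self.mean += delta * batch_count / total
            m2 = self.var * self.count + batch_var * batch_count \
                 + delta**2 * self.count * batch_count / total
            self.var = m2 / total
            self.count = total

        def normalize(self, x, clip=5.0):
            return np.clip((x - self.mean) / np.sqrt(self.var + 1e-8), -clip, clip)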
[2024-06-06 11:59:36,036][02928] Decorrelating experience for 0 frames... [2024-06-06 11:59:36,037][02939] Decorrelating experience for 0 frames... [2024-06-06 11:59:36,038][02941] Decorrelating experience for 0 frames... [2024-06-06 11:59:36,039][02927] Decorrelating experience for 0 frames... [2024-06-06 11:59:36,039][02937] Decorrelating experience for 0 frames... [2024-06-06 11:59:36,040][02936] Decorrelating experience for 0 frames... [2024-06-06 11:59:36,043][02925] Decorrelating experience for 0 frames... [2024-06-06 11:59:36,043][02934] Decorrelating experience for 0 frames... [2024-06-06 11:59:36,044][02930] Decorrelating experience for 0 frames... [2024-06-06 11:59:36,044][02931] Decorrelating experience for 0 frames... [2024-06-06 11:59:36,046][02929] Decorrelating experience for 0 frames... [2024-06-06 11:59:36,058][02952] Decorrelating experience for 0 frames... [2024-06-06 11:59:36,069][02951] Decorrelating experience for 0 frames... [2024-06-06 11:59:36,700][02942] Decorrelating experience for 256 frames... [2024-06-06 11:59:36,721][02946] Decorrelating experience for 256 frames... [2024-06-06 11:59:36,734][02938] Decorrelating experience for 256 frames... [2024-06-06 11:59:36,747][02953] Decorrelating experience for 256 frames... [2024-06-06 11:59:36,757][02692] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 103170048. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-06-06 11:59:36,761][02940] Decorrelating experience for 256 frames... [2024-06-06 11:59:36,765][02943] Decorrelating experience for 256 frames... [2024-06-06 11:59:36,767][02950] Decorrelating experience for 256 frames... [2024-06-06 11:59:36,769][02956] Decorrelating experience for 256 frames... [2024-06-06 11:59:36,773][02948] Decorrelating experience for 256 frames... [2024-06-06 11:59:36,784][02932] Decorrelating experience for 256 frames... [2024-06-06 11:59:36,784][02947] Decorrelating experience for 256 frames... [2024-06-06 11:59:36,785][02944] Decorrelating experience for 256 frames... [2024-06-06 11:59:36,785][02935] Decorrelating experience for 256 frames... [2024-06-06 11:59:36,788][02945] Decorrelating experience for 256 frames... [2024-06-06 11:59:36,789][02933] Decorrelating experience for 256 frames... [2024-06-06 11:59:36,791][02926] Decorrelating experience for 256 frames... [2024-06-06 11:59:36,792][02928] Decorrelating experience for 256 frames... [2024-06-06 11:59:36,796][02939] Decorrelating experience for 256 frames... [2024-06-06 11:59:36,798][02949] Decorrelating experience for 256 frames... [2024-06-06 11:59:36,799][02937] Decorrelating experience for 256 frames... [2024-06-06 11:59:36,801][02936] Decorrelating experience for 256 frames... [2024-06-06 11:59:36,803][02927] Decorrelating experience for 256 frames... [2024-06-06 11:59:36,804][02941] Decorrelating experience for 256 frames... [2024-06-06 11:59:36,805][02934] Decorrelating experience for 256 frames... [2024-06-06 11:59:36,808][02930] Decorrelating experience for 256 frames... [2024-06-06 11:59:36,810][02929] Decorrelating experience for 256 frames... [2024-06-06 11:59:36,810][02925] Decorrelating experience for 256 frames... [2024-06-06 11:59:36,811][02931] Decorrelating experience for 256 frames... [2024-06-06 11:59:36,816][02954] Decorrelating experience for 256 frames... [2024-06-06 11:59:36,824][02955] Decorrelating experience for 256 frames... [2024-06-06 11:59:36,844][02952] Decorrelating experience for 256 frames... 
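The "Policy #0 lag" statistics report how far behind the learner's current policy version the samples in each report window were generated; the -1.0 values above appear before any rollouts have been collected. A minimal sketch of how such min/avg/max figures could be derived from per-sample version tags is shown below; the helper name and data layout are assumptions for illustration, not Sample Factory internals.

    # Hypothetical illustration: summarizing policy-version lag for a window of samples.
    def lag_stats(sample_versions, current_version):
        """sample_versions: policy_version that produced each sample in the window."""
        if not sample_versions:
            return {"min": -1.0, "avg": -1.0, "max": -1.0}   # nothing collected yet
        lags = [current_version - v for v in sample_versions]
        return {"min": float(min(lags)),
                "avg": sum(lags) / len(lags),
                "max": float(max(lags))}

    # e.g. lag_stats([6290, 6301, 6308], 6310) -> min 2.0, avg ~10.3, max 20.0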
[2024-06-06 11:59:36,855][02951] Decorrelating experience for 256 frames... [2024-06-06 11:59:41,757][02692] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 103170048. Throughput: 0: 31192.2. Samples: 155960. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-06-06 11:59:42,573][02950] Worker 25, sleep for 117.188 sec to decorrelate experience collection [2024-06-06 11:59:42,573][02946] Worker 21, sleep for 98.438 sec to decorrelate experience collection [2024-06-06 11:59:42,574][02944] Worker 20, sleep for 93.750 sec to decorrelate experience collection [2024-06-06 11:59:42,581][02947] Worker 22, sleep for 103.125 sec to decorrelate experience collection [2024-06-06 11:59:42,581][02948] Worker 23, sleep for 107.812 sec to decorrelate experience collection [2024-06-06 11:59:42,581][02942] Worker 17, sleep for 79.688 sec to decorrelate experience collection [2024-06-06 11:59:42,581][02953] Worker 26, sleep for 121.875 sec to decorrelate experience collection [2024-06-06 11:59:42,581][02949] Worker 24, sleep for 112.500 sec to decorrelate experience collection [2024-06-06 11:59:42,592][02937] Worker 10, sleep for 46.875 sec to decorrelate experience collection [2024-06-06 11:59:42,594][02926] Worker 1, sleep for 4.688 sec to decorrelate experience collection [2024-06-06 11:59:42,595][02938] Worker 19, sleep for 89.062 sec to decorrelate experience collection [2024-06-06 11:59:42,595][02940] Worker 16, sleep for 75.000 sec to decorrelate experience collection [2024-06-06 11:59:42,595][02954] Worker 29, sleep for 135.938 sec to decorrelate experience collection [2024-06-06 11:59:42,596][02951] Worker 27, sleep for 126.562 sec to decorrelate experience collection [2024-06-06 11:59:42,596][02943] Worker 18, sleep for 84.375 sec to decorrelate experience collection [2024-06-06 11:59:42,596][02955] Worker 31, sleep for 145.312 sec to decorrelate experience collection [2024-06-06 11:59:42,596][02941] Worker 14, sleep for 65.625 sec to decorrelate experience collection [2024-06-06 11:59:42,602][02933] Worker 11, sleep for 51.562 sec to decorrelate experience collection [2024-06-06 11:59:42,603][02935] Worker 9, sleep for 42.188 sec to decorrelate experience collection [2024-06-06 11:59:42,608][02936] Worker 12, sleep for 56.250 sec to decorrelate experience collection [2024-06-06 11:59:42,608][02930] Worker 5, sleep for 23.438 sec to decorrelate experience collection [2024-06-06 11:59:42,609][02952] Worker 28, sleep for 131.250 sec to decorrelate experience collection [2024-06-06 11:59:42,615][02928] Worker 3, sleep for 14.062 sec to decorrelate experience collection [2024-06-06 11:59:42,615][02939] Worker 15, sleep for 70.312 sec to decorrelate experience collection [2024-06-06 11:59:42,616][02956] Worker 30, sleep for 140.625 sec to decorrelate experience collection [2024-06-06 11:59:42,620][02927] Worker 2, sleep for 9.375 sec to decorrelate experience collection [2024-06-06 11:59:42,627][02945] Worker 13, sleep for 60.938 sec to decorrelate experience collection [2024-06-06 11:59:42,628][02929] Worker 4, sleep for 18.750 sec to decorrelate experience collection [2024-06-06 11:59:42,630][02934] Worker 8, sleep for 37.500 sec to decorrelate experience collection [2024-06-06 11:59:42,665][02931] Worker 6, sleep for 28.125 sec to decorrelate experience collection [2024-06-06 11:59:42,669][02904] Signal inference workers to stop experience collection... 
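Right after the first report, each rollout worker sleeps for a different amount of time before collecting more experience. The printed durations grow in steps of roughly 4.69 s per worker index (worker 1: 4.688 s, worker 5: 23.438 s, worker 25: 117.188 s, worker 31: 145.312 s), which is consistent with a simple linear stagger over a ~150 s budget split across 32 workers. The sketch below reproduces those numbers under that assumption; it is an inference from the log values, not the project's actual scheduling code.

    # Assumed linear decorrelation stagger: worker k sleeps k * (max_seconds / num_workers).
    def decorrelation_sleep(worker_index, num_workers=32, max_seconds=150.0):
        return worker_index * max_seconds / num_workers

    for w in (1, 5, 25, 31):
        print(f"Worker {w}, sleep for {decorrelation_sleep(w):.3f} sec")
    # -> 4.688, 23.438, 117.188, 145.312, matching the sleep lines above

Staggering the start of collection like this prevents all workers from stepping their environments in lockstep, which would otherwise produce highly correlated batches early in training.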
[2024-06-06 11:59:42,692][02932] Worker 7, sleep for 32.812 sec to decorrelate experience collection [2024-06-06 11:59:42,716][02924] InferenceWorker_p0-w0: stopping experience collection [2024-06-06 11:59:43,230][02904] Signal inference workers to resume experience collection... [2024-06-06 11:59:43,231][02924] InferenceWorker_p0-w0: resuming experience collection [2024-06-06 11:59:44,324][02924] Updated weights for policy 0, policy_version 6307 (0.0011) [2024-06-06 11:59:46,758][02692] Fps is (10 sec: 16383.2, 60 sec: 16383.2, 300 sec: 16383.2). Total num frames: 103333888. Throughput: 0: 33038.3. Samples: 330400. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-06-06 11:59:47,304][02926] Worker 1 awakens! [2024-06-06 11:59:49,525][02692] Heartbeat connected on Batcher_0 [2024-06-06 11:59:49,527][02692] Heartbeat connected on LearnerWorker_p0 [2024-06-06 11:59:49,532][02692] Heartbeat connected on RolloutWorker_w0 [2024-06-06 11:59:49,548][02692] Heartbeat connected on RolloutWorker_w1 [2024-06-06 11:59:49,567][02692] Heartbeat connected on InferenceWorker_p0-w0 [2024-06-06 11:59:51,757][02692] Fps is (10 sec: 16383.6, 60 sec: 10922.5, 300 sec: 10922.5). Total num frames: 103333888. Throughput: 0: 22381.0. Samples: 335720. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-06-06 11:59:52,016][02927] Worker 2 awakens! [2024-06-06 11:59:52,025][02692] Heartbeat connected on RolloutWorker_w2 [2024-06-06 11:59:56,748][02928] Worker 3 awakens! [2024-06-06 11:59:56,757][02692] Fps is (10 sec: 3276.8, 60 sec: 9830.2, 300 sec: 9830.2). Total num frames: 103366656. Throughput: 0: 17619.6. Samples: 352400. Policy #0 lag: (min: 0.0, avg: 9.4, max: 10.0) [2024-06-06 11:59:56,763][02692] Heartbeat connected on RolloutWorker_w3 [2024-06-06 12:00:01,468][02929] Worker 4 awakens! [2024-06-06 12:00:01,475][02692] Heartbeat connected on RolloutWorker_w4 [2024-06-06 12:00:01,757][02692] Fps is (10 sec: 4915.4, 60 sec: 8519.7, 300 sec: 8519.7). Total num frames: 103383040. Throughput: 0: 15072.1. Samples: 376800. Policy #0 lag: (min: 0.0, avg: 4.3, max: 12.0) [2024-06-06 12:00:01,757][02692] Avg episode reward: [(0, '0.022')] [2024-06-06 12:00:06,146][02930] Worker 5 awakens! [2024-06-06 12:00:06,149][02692] Heartbeat connected on RolloutWorker_w5 [2024-06-06 12:00:06,757][02692] Fps is (10 sec: 9830.8, 60 sec: 9830.4, 300 sec: 9830.4). Total num frames: 103464960. Throughput: 0: 14160.7. Samples: 424820. Policy #0 lag: (min: 0.0, avg: 4.3, max: 12.0) [2024-06-06 12:00:06,757][02692] Avg episode reward: [(0, '0.029')] [2024-06-06 12:00:07,863][02924] Updated weights for policy 0, policy_version 6317 (0.0012) [2024-06-06 12:00:10,888][02931] Worker 6 awakens! [2024-06-06 12:00:10,891][02692] Heartbeat connected on RolloutWorker_w6 [2024-06-06 12:00:11,757][02692] Fps is (10 sec: 19660.8, 60 sec: 11702.9, 300 sec: 11702.9). Total num frames: 103579648. Throughput: 0: 15247.5. Samples: 533660. Policy #0 lag: (min: 0.0, avg: 6.7, max: 18.0) [2024-06-06 12:00:11,757][02692] Avg episode reward: [(0, '0.037')] [2024-06-06 12:00:15,047][02924] Updated weights for policy 0, policy_version 6327 (0.0011) [2024-06-06 12:00:15,548][02932] Worker 7 awakens! [2024-06-06 12:00:15,553][02692] Heartbeat connected on RolloutWorker_w7 [2024-06-06 12:00:16,757][02692] Fps is (10 sec: 24576.1, 60 sec: 13516.8, 300 sec: 13516.8). Total num frames: 103710720. Throughput: 0: 17070.5. Samples: 682820. 
Policy #0 lag: (min: 0.0, avg: 2.0, max: 5.0) [2024-06-06 12:00:16,757][02692] Avg episode reward: [(0, '0.046')] [2024-06-06 12:00:16,767][02904] Saving new best policy, reward=0.046! [2024-06-06 12:00:20,231][02934] Worker 8 awakens! [2024-06-06 12:00:20,235][02692] Heartbeat connected on RolloutWorker_w8 [2024-06-06 12:00:21,195][02924] Updated weights for policy 0, policy_version 6337 (0.0012) [2024-06-06 12:00:21,757][02692] Fps is (10 sec: 26214.4, 60 sec: 14927.7, 300 sec: 14927.7). Total num frames: 103841792. Throughput: 0: 17004.0. Samples: 765180. Policy #0 lag: (min: 0.0, avg: 2.4, max: 7.0) [2024-06-06 12:00:21,757][02692] Avg episode reward: [(0, '0.038')] [2024-06-06 12:00:24,812][02935] Worker 9 awakens! [2024-06-06 12:00:24,818][02692] Heartbeat connected on RolloutWorker_w9 [2024-06-06 12:00:25,820][02924] Updated weights for policy 0, policy_version 6347 (0.0012) [2024-06-06 12:00:26,757][02692] Fps is (10 sec: 29491.2, 60 sec: 16711.7, 300 sec: 16711.7). Total num frames: 104005632. Throughput: 0: 17391.1. Samples: 938560. Policy #0 lag: (min: 0.0, avg: 3.1, max: 7.0) [2024-06-06 12:00:26,757][02692] Avg episode reward: [(0, '0.032')] [2024-06-06 12:00:29,566][02937] Worker 10 awakens! [2024-06-06 12:00:29,571][02692] Heartbeat connected on RolloutWorker_w10 [2024-06-06 12:00:30,884][02924] Updated weights for policy 0, policy_version 6357 (0.0017) [2024-06-06 12:00:31,757][02692] Fps is (10 sec: 32767.8, 60 sec: 18171.4, 300 sec: 18171.4). Total num frames: 104169472. Throughput: 0: 18216.7. Samples: 1150140. Policy #0 lag: (min: 0.0, avg: 3.3, max: 7.0) [2024-06-06 12:00:31,757][02692] Avg episode reward: [(0, '0.035')] [2024-06-06 12:00:34,264][02933] Worker 11 awakens! [2024-06-06 12:00:34,270][02692] Heartbeat connected on RolloutWorker_w11 [2024-06-06 12:00:35,136][02924] Updated weights for policy 0, policy_version 6367 (0.0014) [2024-06-06 12:00:36,757][02692] Fps is (10 sec: 36044.5, 60 sec: 19933.9, 300 sec: 19933.9). Total num frames: 104366080. Throughput: 0: 20597.4. Samples: 1262600. Policy #0 lag: (min: 0.0, avg: 3.3, max: 7.0) [2024-06-06 12:00:36,757][02692] Avg episode reward: [(0, '0.040')] [2024-06-06 12:00:38,956][02936] Worker 12 awakens! [2024-06-06 12:00:38,961][02692] Heartbeat connected on RolloutWorker_w12 [2024-06-06 12:00:39,106][02924] Updated weights for policy 0, policy_version 6377 (0.0012) [2024-06-06 12:00:41,757][02692] Fps is (10 sec: 40959.9, 60 sec: 23483.7, 300 sec: 21677.3). Total num frames: 104579072. Throughput: 0: 25802.4. Samples: 1513500. Policy #0 lag: (min: 0.0, avg: 4.5, max: 10.0) [2024-06-06 12:00:41,757][02692] Avg episode reward: [(0, '0.043')] [2024-06-06 12:00:43,025][02924] Updated weights for policy 0, policy_version 6387 (0.0016) [2024-06-06 12:00:43,664][02945] Worker 13 awakens! [2024-06-06 12:00:43,670][02692] Heartbeat connected on RolloutWorker_w13 [2024-06-06 12:00:46,572][02924] Updated weights for policy 0, policy_version 6397 (0.0016) [2024-06-06 12:00:46,757][02692] Fps is (10 sec: 44237.2, 60 sec: 24576.2, 300 sec: 23405.7). Total num frames: 104808448. Throughput: 0: 31061.7. Samples: 1774580. Policy #0 lag: (min: 0.0, avg: 4.1, max: 10.0) [2024-06-06 12:00:46,757][02692] Avg episode reward: [(0, '0.037')] [2024-06-06 12:00:48,320][02941] Worker 14 awakens! 
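The learner also tracks the best average episode reward seen so far and writes a separate "best policy" checkpoint whenever that record is beaten, as in the "Saving new best policy, reward=0.046!" line above. A minimal sketch of that bookkeeping, with assumed names, is:

    # Hypothetical keep-best bookkeeping (names assumed for illustration).
    class BestPolicyTracker:
        def __init__(self):
            self.best_reward = float("-inf")

        def maybe_save(self, avg_reward, save_fn):
            # save_fn would serialize the current policy weights, e.g. via torch.save
            if avg_reward > self.best_reward:
                self.best_reward = avg_reward
                print(f"Saving new best policy, reward={avg_reward:.3f}!")
                save_fn()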
[2024-06-06 12:00:48,328][02692] Heartbeat connected on RolloutWorker_w14 [2024-06-06 12:00:50,464][02924] Updated weights for policy 0, policy_version 6407 (0.0018) [2024-06-06 12:00:51,757][02692] Fps is (10 sec: 44237.2, 60 sec: 28126.0, 300 sec: 24685.3). Total num frames: 105021440. Throughput: 0: 32918.7. Samples: 1906160. Policy #0 lag: (min: 0.0, avg: 5.1, max: 9.0) [2024-06-06 12:00:51,757][02692] Avg episode reward: [(0, '0.044')] [2024-06-06 12:00:53,029][02939] Worker 15 awakens! [2024-06-06 12:00:53,037][02692] Heartbeat connected on RolloutWorker_w15 [2024-06-06 12:00:54,289][02924] Updated weights for policy 0, policy_version 6417 (0.0023) [2024-06-06 12:00:56,757][02692] Fps is (10 sec: 42597.9, 60 sec: 31129.8, 300 sec: 25804.8). Total num frames: 105234432. Throughput: 0: 36301.7. Samples: 2167240. Policy #0 lag: (min: 1.0, avg: 5.3, max: 11.0) [2024-06-06 12:00:56,758][02692] Avg episode reward: [(0, '0.038')] [2024-06-06 12:00:57,696][02940] Worker 16 awakens! [2024-06-06 12:00:57,704][02692] Heartbeat connected on RolloutWorker_w16 [2024-06-06 12:00:58,449][02924] Updated weights for policy 0, policy_version 6427 (0.0018) [2024-06-06 12:01:01,757][02692] Fps is (10 sec: 42598.2, 60 sec: 34406.4, 300 sec: 26792.7). Total num frames: 105447424. Throughput: 0: 38332.9. Samples: 2407800. Policy #0 lag: (min: 1.0, avg: 5.3, max: 11.0) [2024-06-06 12:01:01,757][02692] Avg episode reward: [(0, '0.040')] [2024-06-06 12:01:02,111][02924] Updated weights for policy 0, policy_version 6437 (0.0019) [2024-06-06 12:01:02,368][02942] Worker 17 awakens! [2024-06-06 12:01:02,377][02692] Heartbeat connected on RolloutWorker_w17 [2024-06-06 12:01:05,803][02924] Updated weights for policy 0, policy_version 6447 (0.0021) [2024-06-06 12:01:06,757][02692] Fps is (10 sec: 40960.3, 60 sec: 36317.9, 300 sec: 27488.7). Total num frames: 105644032. Throughput: 0: 39404.4. Samples: 2538380. Policy #0 lag: (min: 0.0, avg: 5.6, max: 11.0) [2024-06-06 12:01:06,757][02692] Avg episode reward: [(0, '0.040')] [2024-06-06 12:01:07,026][02943] Worker 18 awakens! [2024-06-06 12:01:07,035][02692] Heartbeat connected on RolloutWorker_w18 [2024-06-06 12:01:09,883][02924] Updated weights for policy 0, policy_version 6457 (0.0019) [2024-06-06 12:01:11,756][02938] Worker 19 awakens! [2024-06-06 12:01:11,757][02692] Fps is (10 sec: 40960.2, 60 sec: 37956.3, 300 sec: 28284.0). Total num frames: 105857024. Throughput: 0: 41197.8. Samples: 2792460. Policy #0 lag: (min: 0.0, avg: 5.4, max: 13.0) [2024-06-06 12:01:11,757][02692] Avg episode reward: [(0, '0.039')] [2024-06-06 12:01:11,764][02692] Heartbeat connected on RolloutWorker_w19 [2024-06-06 12:01:13,475][02924] Updated weights for policy 0, policy_version 6467 (0.0024) [2024-06-06 12:01:16,424][02944] Worker 20 awakens! [2024-06-06 12:01:16,433][02692] Heartbeat connected on RolloutWorker_w20 [2024-06-06 12:01:16,757][02692] Fps is (10 sec: 45875.5, 60 sec: 39867.7, 300 sec: 29327.4). Total num frames: 106102784. Throughput: 0: 42392.1. Samples: 3057780. Policy #0 lag: (min: 0.0, avg: 6.9, max: 13.0) [2024-06-06 12:01:16,757][02692] Avg episode reward: [(0, '0.036')] [2024-06-06 12:01:17,351][02924] Updated weights for policy 0, policy_version 6477 (0.0028) [2024-06-06 12:01:21,112][02946] Worker 21 awakens! 
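Each periodic report prints throughput over trailing 10 s, 60 s and 300 s windows, computed from the cumulative frame counter. A sketch of such windowed FPS computation from (timestamp, total_frames) samples follows; the deque-based bookkeeping is an assumption chosen for clarity, not necessarily how Sample Factory implements it.

    # Assumed sketch: FPS over trailing windows from (timestamp, total_frames) samples.
    import time
    from collections import deque

    class FpsTracker:
        def __init__(self, windows=(10, 60, 300)):
            self.windows = windows
            self.samples = deque()          # (timestamp, total_frames)

        def record(self, total_frames, now=None):
            now = time.time() if now is None else now
            self.samples.append((now, total_frames))
            # keep a little more history than the largest window needs
            while now - self.samples[0][0] > max(self.windows) + 1:
                self.samples.popleft()

        def report(self):
            now, frames = self.samples[-1]
            out = {}
            for w in self.windows:
                # oldest sample still inside the window (samples are chronological)
                past = next(((t, f) for t, f in self.samples if now - t <= w),
                            self.samples[0])
                dt = now - past[0]
                out[w] = (frames - past[1]) / dt if dt > 0 else float("nan")
            return out

This also explains the "nan" and 0.0 figures in the first reports after the restart: until enough samples have accumulated, the window contains no usable frame delta.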
[2024-06-06 12:01:21,122][02692] Heartbeat connected on RolloutWorker_w21 [2024-06-06 12:01:21,251][02924] Updated weights for policy 0, policy_version 6487 (0.0022) [2024-06-06 12:01:21,757][02692] Fps is (10 sec: 45874.9, 60 sec: 41233.0, 300 sec: 29959.3). Total num frames: 106315776. Throughput: 0: 42829.4. Samples: 3189920. Policy #0 lag: (min: 0.0, avg: 5.6, max: 14.0) [2024-06-06 12:01:21,757][02692] Avg episode reward: [(0, '0.034')] [2024-06-06 12:01:24,521][02924] Updated weights for policy 0, policy_version 6497 (0.0022) [2024-06-06 12:01:25,798][02947] Worker 22 awakens! [2024-06-06 12:01:25,809][02692] Heartbeat connected on RolloutWorker_w22 [2024-06-06 12:01:26,757][02692] Fps is (10 sec: 42597.6, 60 sec: 42052.1, 300 sec: 30533.8). Total num frames: 106528768. Throughput: 0: 43281.2. Samples: 3461160. Policy #0 lag: (min: 0.0, avg: 5.6, max: 14.0) [2024-06-06 12:01:26,758][02692] Avg episode reward: [(0, '0.043')] [2024-06-06 12:01:26,890][02904] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000006503_106545152.pth... [2024-06-06 12:01:26,935][02904] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000005946_97419264.pth [2024-06-06 12:01:28,140][02924] Updated weights for policy 0, policy_version 6507 (0.0023) [2024-06-06 12:01:30,420][02948] Worker 23 awakens! [2024-06-06 12:01:30,431][02692] Heartbeat connected on RolloutWorker_w23 [2024-06-06 12:01:31,555][02924] Updated weights for policy 0, policy_version 6517 (0.0025) [2024-06-06 12:01:31,757][02692] Fps is (10 sec: 45874.7, 60 sec: 43417.5, 300 sec: 31343.3). Total num frames: 106774528. Throughput: 0: 43655.9. Samples: 3739100. Policy #0 lag: (min: 0.0, avg: 7.7, max: 16.0) [2024-06-06 12:01:31,758][02692] Avg episode reward: [(0, '0.041')] [2024-06-06 12:01:35,134][02949] Worker 24 awakens! [2024-06-06 12:01:35,145][02692] Heartbeat connected on RolloutWorker_w24 [2024-06-06 12:01:35,712][02924] Updated weights for policy 0, policy_version 6527 (0.0021) [2024-06-06 12:01:36,757][02692] Fps is (10 sec: 45875.8, 60 sec: 43690.7, 300 sec: 31812.3). Total num frames: 106987520. Throughput: 0: 43704.4. Samples: 3872860. Policy #0 lag: (min: 0.0, avg: 8.1, max: 17.0) [2024-06-06 12:01:36,757][02692] Avg episode reward: [(0, '0.032')] [2024-06-06 12:01:38,798][02924] Updated weights for policy 0, policy_version 6537 (0.0024) [2024-06-06 12:01:39,848][02950] Worker 25 awakens! [2024-06-06 12:01:39,861][02692] Heartbeat connected on RolloutWorker_w25 [2024-06-06 12:01:41,757][02692] Fps is (10 sec: 42598.3, 60 sec: 43690.6, 300 sec: 32243.7). Total num frames: 107200512. Throughput: 0: 44044.4. Samples: 4149240. Policy #0 lag: (min: 0.0, avg: 8.0, max: 17.0) [2024-06-06 12:01:41,758][02692] Avg episode reward: [(0, '0.042')] [2024-06-06 12:01:42,565][02924] Updated weights for policy 0, policy_version 6547 (0.0026) [2024-06-06 12:01:44,556][02953] Worker 26 awakens! [2024-06-06 12:01:44,566][02692] Heartbeat connected on RolloutWorker_w26 [2024-06-06 12:01:45,730][02924] Updated weights for policy 0, policy_version 6557 (0.0031) [2024-06-06 12:01:46,757][02692] Fps is (10 sec: 45875.7, 60 sec: 43963.8, 300 sec: 32894.1). Total num frames: 107446272. Throughput: 0: 44853.9. Samples: 4426220. Policy #0 lag: (min: 1.0, avg: 9.6, max: 18.0) [2024-06-06 12:01:46,757][02692] Avg episode reward: [(0, '0.037')] [2024-06-06 12:01:49,082][02924] Updated weights for policy 0, policy_version 6567 (0.0026) [2024-06-06 12:01:49,256][02951] Worker 27 awakens! 
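Periodic checkpoints are written as checkpoint_{train_step:09d}_{env_steps}.pth and older ones are pruned so only a bounded number remain, which is why each "Saving ..." line above is paired with a "Removing ..." line for an earlier file. A sketch of that rotation under assumed parameters (the keep-count is not stated in the log):

    # Assumed checkpoint rotation: write the newest file, prune the oldest ones.
    import glob
    import os
    import torch

    def save_and_rotate(state_dict, ckpt_dir, train_step, env_steps, keep=2):
        name = f"checkpoint_{train_step:09d}_{env_steps}.pth"
        path = os.path.join(ckpt_dir, name)
        print(f"Saving {path}...")
        torch.save(state_dict, path)
        checkpoints = sorted(glob.glob(os.path.join(ckpt_dir, "checkpoint_*.pth")))
        for old in checkpoints[:-keep]:      # zero-padded names sort chronologically
            print(f"Removing {old}")
            os.remove(old)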
[2024-06-06 12:01:49,269][02692] Heartbeat connected on RolloutWorker_w27 [2024-06-06 12:01:51,757][02692] Fps is (10 sec: 52428.9, 60 sec: 45055.9, 300 sec: 33738.9). Total num frames: 107724800. Throughput: 0: 45059.0. Samples: 4566040. Policy #0 lag: (min: 0.0, avg: 10.0, max: 18.0) [2024-06-06 12:01:51,758][02692] Avg episode reward: [(0, '0.042')] [2024-06-06 12:01:53,511][02924] Updated weights for policy 0, policy_version 6577 (0.0031) [2024-06-06 12:01:53,908][02952] Worker 28 awakens! [2024-06-06 12:01:53,918][02692] Heartbeat connected on RolloutWorker_w28 [2024-06-06 12:01:56,037][02924] Updated weights for policy 0, policy_version 6587 (0.0020) [2024-06-06 12:01:56,757][02692] Fps is (10 sec: 50789.9, 60 sec: 45329.1, 300 sec: 34172.4). Total num frames: 107954176. Throughput: 0: 45800.0. Samples: 4853460. Policy #0 lag: (min: 0.0, avg: 10.0, max: 18.0) [2024-06-06 12:01:56,757][02692] Avg episode reward: [(0, '0.041')] [2024-06-06 12:01:58,632][02954] Worker 29 awakens! [2024-06-06 12:01:58,645][02692] Heartbeat connected on RolloutWorker_w29 [2024-06-06 12:02:00,136][02924] Updated weights for policy 0, policy_version 6597 (0.0020) [2024-06-06 12:02:01,757][02692] Fps is (10 sec: 45876.0, 60 sec: 45602.2, 300 sec: 34575.9). Total num frames: 108183552. Throughput: 0: 46296.0. Samples: 5141100. Policy #0 lag: (min: 1.0, avg: 10.4, max: 18.0) [2024-06-06 12:02:01,757][02692] Avg episode reward: [(0, '0.037')] [2024-06-06 12:02:02,782][02904] Signal inference workers to stop experience collection... (50 times) [2024-06-06 12:02:02,834][02924] InferenceWorker_p0-w0: stopping experience collection (50 times) [2024-06-06 12:02:02,892][02904] Signal inference workers to resume experience collection... (50 times) [2024-06-06 12:02:02,893][02924] InferenceWorker_p0-w0: resuming experience collection (50 times) [2024-06-06 12:02:03,046][02924] Updated weights for policy 0, policy_version 6607 (0.0027) [2024-06-06 12:02:03,341][02956] Worker 30 awakens! [2024-06-06 12:02:03,354][02692] Heartbeat connected on RolloutWorker_w30 [2024-06-06 12:02:06,645][02924] Updated weights for policy 0, policy_version 6617 (0.0033) [2024-06-06 12:02:06,757][02692] Fps is (10 sec: 45875.0, 60 sec: 46148.3, 300 sec: 34952.5). Total num frames: 108412928. Throughput: 0: 46683.5. Samples: 5290680. Policy #0 lag: (min: 1.0, avg: 10.1, max: 20.0) [2024-06-06 12:02:06,758][02692] Avg episode reward: [(0, '0.039')] [2024-06-06 12:02:08,009][02955] Worker 31 awakens! [2024-06-06 12:02:08,024][02692] Heartbeat connected on RolloutWorker_w31 [2024-06-06 12:02:09,379][02924] Updated weights for policy 0, policy_version 6627 (0.0022) [2024-06-06 12:02:11,757][02692] Fps is (10 sec: 47513.8, 60 sec: 46694.4, 300 sec: 35410.6). Total num frames: 108658688. Throughput: 0: 47325.6. Samples: 5590800. Policy #0 lag: (min: 0.0, avg: 9.8, max: 20.0) [2024-06-06 12:02:11,757][02692] Avg episode reward: [(0, '0.038')] [2024-06-06 12:02:13,235][02924] Updated weights for policy 0, policy_version 6637 (0.0029) [2024-06-06 12:02:15,839][02924] Updated weights for policy 0, policy_version 6647 (0.0026) [2024-06-06 12:02:16,757][02692] Fps is (10 sec: 54065.1, 60 sec: 47513.2, 300 sec: 36147.1). Total num frames: 108953600. Throughput: 0: 47749.5. Samples: 5887840. 
Policy #0 lag: (min: 0.0, avg: 12.0, max: 21.0) [2024-06-06 12:02:16,758][02692] Avg episode reward: [(0, '0.040')] [2024-06-06 12:02:19,949][02924] Updated weights for policy 0, policy_version 6657 (0.0031) [2024-06-06 12:02:21,760][02692] Fps is (10 sec: 50775.1, 60 sec: 47511.3, 300 sec: 36342.1). Total num frames: 109166592. Throughput: 0: 48401.3. Samples: 6051060. Policy #0 lag: (min: 0.0, avg: 12.0, max: 21.0) [2024-06-06 12:02:21,760][02692] Avg episode reward: [(0, '0.039')] [2024-06-06 12:02:22,650][02924] Updated weights for policy 0, policy_version 6667 (0.0023) [2024-06-06 12:02:26,610][02924] Updated weights for policy 0, policy_version 6677 (0.0036) [2024-06-06 12:02:26,757][02692] Fps is (10 sec: 44238.9, 60 sec: 47786.8, 300 sec: 36623.1). Total num frames: 109395968. Throughput: 0: 48685.6. Samples: 6340080. Policy #0 lag: (min: 0.0, avg: 12.1, max: 22.0) [2024-06-06 12:02:26,757][02692] Avg episode reward: [(0, '0.045')] [2024-06-06 12:02:29,303][02924] Updated weights for policy 0, policy_version 6687 (0.0029) [2024-06-06 12:02:31,757][02692] Fps is (10 sec: 47527.8, 60 sec: 47786.8, 300 sec: 36981.0). Total num frames: 109641728. Throughput: 0: 49135.9. Samples: 6637340. Policy #0 lag: (min: 0.0, avg: 12.3, max: 28.0) [2024-06-06 12:02:31,757][02692] Avg episode reward: [(0, '0.052')] [2024-06-06 12:02:31,758][02904] Saving new best policy, reward=0.052! [2024-06-06 12:02:33,139][02924] Updated weights for policy 0, policy_version 6697 (0.0034) [2024-06-06 12:02:35,688][02924] Updated weights for policy 0, policy_version 6707 (0.0025) [2024-06-06 12:02:36,757][02692] Fps is (10 sec: 55705.3, 60 sec: 49425.1, 300 sec: 37683.2). Total num frames: 109953024. Throughput: 0: 49284.6. Samples: 6783840. Policy #0 lag: (min: 0.0, avg: 13.2, max: 26.0) [2024-06-06 12:02:36,757][02692] Avg episode reward: [(0, '0.045')] [2024-06-06 12:02:39,584][02924] Updated weights for policy 0, policy_version 6717 (0.0024) [2024-06-06 12:02:41,757][02692] Fps is (10 sec: 54067.4, 60 sec: 49698.3, 300 sec: 37904.6). Total num frames: 110182400. Throughput: 0: 49754.7. Samples: 7092420. Policy #0 lag: (min: 0.0, avg: 13.1, max: 27.0) [2024-06-06 12:02:41,757][02692] Avg episode reward: [(0, '0.046')] [2024-06-06 12:02:42,028][02924] Updated weights for policy 0, policy_version 6727 (0.0037) [2024-06-06 12:02:46,180][02924] Updated weights for policy 0, policy_version 6737 (0.0040) [2024-06-06 12:02:46,757][02692] Fps is (10 sec: 44236.8, 60 sec: 49151.9, 300 sec: 38028.1). Total num frames: 110395392. Throughput: 0: 49974.6. Samples: 7389960. Policy #0 lag: (min: 0.0, avg: 13.1, max: 27.0) [2024-06-06 12:02:46,757][02692] Avg episode reward: [(0, '0.043')] [2024-06-06 12:02:48,601][02904] Signal inference workers to stop experience collection... (100 times) [2024-06-06 12:02:48,603][02904] Signal inference workers to resume experience collection... (100 times) [2024-06-06 12:02:48,609][02924] InferenceWorker_p0-w0: stopping experience collection (100 times) [2024-06-06 12:02:48,640][02924] InferenceWorker_p0-w0: resuming experience collection (100 times) [2024-06-06 12:02:48,751][02924] Updated weights for policy 0, policy_version 6747 (0.0028) [2024-06-06 12:02:51,760][02692] Fps is (10 sec: 45861.3, 60 sec: 48603.6, 300 sec: 38312.8). Total num frames: 110641152. Throughput: 0: 49608.4. Samples: 7523200. 
Policy #0 lag: (min: 0.0, avg: 12.8, max: 25.0) [2024-06-06 12:02:51,760][02692] Avg episode reward: [(0, '0.046')] [2024-06-06 12:02:52,927][02924] Updated weights for policy 0, policy_version 6757 (0.0031) [2024-06-06 12:02:55,511][02924] Updated weights for policy 0, policy_version 6767 (0.0031) [2024-06-06 12:02:56,757][02692] Fps is (10 sec: 54067.4, 60 sec: 49698.2, 300 sec: 38830.1). Total num frames: 110936064. Throughput: 0: 49699.5. Samples: 7827280. Policy #0 lag: (min: 0.0, avg: 12.9, max: 25.0) [2024-06-06 12:02:56,757][02692] Avg episode reward: [(0, '0.046')] [2024-06-06 12:02:59,600][02924] Updated weights for policy 0, policy_version 6777 (0.0032) [2024-06-06 12:03:01,757][02692] Fps is (10 sec: 50805.7, 60 sec: 49425.1, 300 sec: 38922.0). Total num frames: 111149056. Throughput: 0: 49511.7. Samples: 8115840. Policy #0 lag: (min: 2.0, avg: 11.7, max: 24.0) [2024-06-06 12:03:01,757][02692] Avg episode reward: [(0, '0.048')] [2024-06-06 12:03:02,126][02924] Updated weights for policy 0, policy_version 6787 (0.0019) [2024-06-06 12:03:06,383][02924] Updated weights for policy 0, policy_version 6797 (0.0026) [2024-06-06 12:03:06,757][02692] Fps is (10 sec: 45875.3, 60 sec: 49698.2, 300 sec: 39165.6). Total num frames: 111394816. Throughput: 0: 49324.2. Samples: 8270500. Policy #0 lag: (min: 2.0, avg: 9.2, max: 22.0) [2024-06-06 12:03:06,757][02692] Avg episode reward: [(0, '0.044')] [2024-06-06 12:03:08,618][02924] Updated weights for policy 0, policy_version 6807 (0.0025) [2024-06-06 12:03:11,757][02692] Fps is (10 sec: 45874.7, 60 sec: 49151.9, 300 sec: 39245.4). Total num frames: 111607808. Throughput: 0: 49243.9. Samples: 8556060. Policy #0 lag: (min: 2.0, avg: 9.2, max: 22.0) [2024-06-06 12:03:11,757][02692] Avg episode reward: [(0, '0.049')] [2024-06-06 12:03:12,751][02924] Updated weights for policy 0, policy_version 6817 (0.0037) [2024-06-06 12:03:15,159][02924] Updated weights for policy 0, policy_version 6827 (0.0027) [2024-06-06 12:03:16,757][02692] Fps is (10 sec: 50790.1, 60 sec: 49152.3, 300 sec: 39694.0). Total num frames: 111902720. Throughput: 0: 49232.4. Samples: 8852800. Policy #0 lag: (min: 1.0, avg: 7.9, max: 21.0) [2024-06-06 12:03:16,757][02692] Avg episode reward: [(0, '0.048')] [2024-06-06 12:03:19,661][02924] Updated weights for policy 0, policy_version 6837 (0.0021) [2024-06-06 12:03:21,757][02692] Fps is (10 sec: 55705.8, 60 sec: 49973.7, 300 sec: 39977.0). Total num frames: 112164864. Throughput: 0: 49695.6. Samples: 9020140. Policy #0 lag: (min: 0.0, avg: 7.3, max: 21.0) [2024-06-06 12:03:21,757][02692] Avg episode reward: [(0, '0.047')] [2024-06-06 12:03:22,019][02924] Updated weights for policy 0, policy_version 6847 (0.0029) [2024-06-06 12:03:26,074][02924] Updated weights for policy 0, policy_version 6857 (0.0031) [2024-06-06 12:03:26,757][02692] Fps is (10 sec: 49151.5, 60 sec: 49971.1, 300 sec: 40105.2). Total num frames: 112394240. Throughput: 0: 49339.8. Samples: 9312720. Policy #0 lag: (min: 0.0, avg: 7.7, max: 21.0) [2024-06-06 12:03:26,757][02692] Avg episode reward: [(0, '0.047')] [2024-06-06 12:03:26,769][02904] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000006860_112394240.pth... [2024-06-06 12:03:26,835][02904] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000006297_103170048.pth [2024-06-06 12:03:28,588][02924] Updated weights for policy 0, policy_version 6867 (0.0029) [2024-06-06 12:03:31,757][02692] Fps is (10 sec: 42598.2, 60 sec: 49151.9, 300 sec: 40088.5). 
Total num frames: 112590848. Throughput: 0: 49144.8. Samples: 9601480. Policy #0 lag: (min: 0.0, avg: 7.7, max: 21.0) [2024-06-06 12:03:31,757][02692] Avg episode reward: [(0, '0.044')] [2024-06-06 12:03:33,014][02924] Updated weights for policy 0, policy_version 6877 (0.0029) [2024-06-06 12:03:34,990][02904] Signal inference workers to stop experience collection... (150 times) [2024-06-06 12:03:35,028][02924] InferenceWorker_p0-w0: stopping experience collection (150 times) [2024-06-06 12:03:35,051][02904] Signal inference workers to resume experience collection... (150 times) [2024-06-06 12:03:35,052][02924] InferenceWorker_p0-w0: resuming experience collection (150 times) [2024-06-06 12:03:35,183][02924] Updated weights for policy 0, policy_version 6887 (0.0032) [2024-06-06 12:03:36,757][02692] Fps is (10 sec: 50791.4, 60 sec: 49152.1, 300 sec: 40550.4). Total num frames: 112902144. Throughput: 0: 49329.6. Samples: 9742880. Policy #0 lag: (min: 0.0, avg: 7.8, max: 20.0) [2024-06-06 12:03:36,757][02692] Avg episode reward: [(0, '0.045')] [2024-06-06 12:03:39,573][02924] Updated weights for policy 0, policy_version 6897 (0.0043) [2024-06-06 12:03:41,757][02692] Fps is (10 sec: 54067.8, 60 sec: 49152.0, 300 sec: 40659.1). Total num frames: 113131520. Throughput: 0: 49324.5. Samples: 10046880. Policy #0 lag: (min: 0.0, avg: 8.1, max: 20.0) [2024-06-06 12:03:41,757][02692] Avg episode reward: [(0, '0.043')] [2024-06-06 12:03:41,908][02924] Updated weights for policy 0, policy_version 6907 (0.0031) [2024-06-06 12:03:46,302][02924] Updated weights for policy 0, policy_version 6917 (0.0029) [2024-06-06 12:03:46,757][02692] Fps is (10 sec: 45875.2, 60 sec: 49425.1, 300 sec: 40763.4). Total num frames: 113360896. Throughput: 0: 49519.1. Samples: 10344200. Policy #0 lag: (min: 0.0, avg: 8.2, max: 20.0) [2024-06-06 12:03:46,757][02692] Avg episode reward: [(0, '0.047')] [2024-06-06 12:03:48,684][02924] Updated weights for policy 0, policy_version 6927 (0.0025) [2024-06-06 12:03:51,757][02692] Fps is (10 sec: 45875.2, 60 sec: 49154.5, 300 sec: 40863.6). Total num frames: 113590272. Throughput: 0: 49015.5. Samples: 10476200. Policy #0 lag: (min: 0.0, avg: 8.5, max: 20.0) [2024-06-06 12:03:51,757][02692] Avg episode reward: [(0, '0.049')] [2024-06-06 12:03:52,848][02924] Updated weights for policy 0, policy_version 6937 (0.0030) [2024-06-06 12:03:55,202][02924] Updated weights for policy 0, policy_version 6947 (0.0024) [2024-06-06 12:03:56,757][02692] Fps is (10 sec: 52428.2, 60 sec: 49152.0, 300 sec: 41212.1). Total num frames: 113885184. Throughput: 0: 49395.6. Samples: 10778860. Policy #0 lag: (min: 0.0, avg: 8.5, max: 20.0) [2024-06-06 12:03:56,757][02692] Avg episode reward: [(0, '0.041')] [2024-06-06 12:03:59,634][02924] Updated weights for policy 0, policy_version 6957 (0.0030) [2024-06-06 12:04:01,757][02692] Fps is (10 sec: 54065.9, 60 sec: 49697.9, 300 sec: 41361.8). Total num frames: 114130944. Throughput: 0: 49284.2. Samples: 11070600. Policy #0 lag: (min: 0.0, avg: 8.6, max: 20.0) [2024-06-06 12:04:01,758][02692] Avg episode reward: [(0, '0.046')] [2024-06-06 12:04:01,913][02924] Updated weights for policy 0, policy_version 6967 (0.0023) [2024-06-06 12:04:06,306][02924] Updated weights for policy 0, policy_version 6977 (0.0029) [2024-06-06 12:04:06,757][02692] Fps is (10 sec: 45875.5, 60 sec: 49152.0, 300 sec: 41384.8). Total num frames: 114343936. Throughput: 0: 49001.4. Samples: 11225200. 
Policy #0 lag: (min: 0.0, avg: 8.4, max: 20.0) [2024-06-06 12:04:06,757][02692] Avg episode reward: [(0, '0.049')] [2024-06-06 12:04:08,562][02924] Updated weights for policy 0, policy_version 6987 (0.0032) [2024-06-06 12:04:11,757][02692] Fps is (10 sec: 44237.6, 60 sec: 49425.1, 300 sec: 41466.4). Total num frames: 114573312. Throughput: 0: 48929.9. Samples: 11514560. Policy #0 lag: (min: 0.0, avg: 8.4, max: 20.0) [2024-06-06 12:04:11,758][02692] Avg episode reward: [(0, '0.048')] [2024-06-06 12:04:12,880][02924] Updated weights for policy 0, policy_version 6997 (0.0023) [2024-06-06 12:04:15,101][02924] Updated weights for policy 0, policy_version 7007 (0.0034) [2024-06-06 12:04:16,757][02692] Fps is (10 sec: 52428.6, 60 sec: 49425.1, 300 sec: 41779.2). Total num frames: 114868224. Throughput: 0: 49051.6. Samples: 11808800. Policy #0 lag: (min: 0.0, avg: 8.4, max: 20.0) [2024-06-06 12:04:16,757][02692] Avg episode reward: [(0, '0.052')] [2024-06-06 12:04:19,289][02924] Updated weights for policy 0, policy_version 7017 (0.0029) [2024-06-06 12:04:21,570][02924] Updated weights for policy 0, policy_version 7027 (0.0022) [2024-06-06 12:04:21,757][02692] Fps is (10 sec: 55705.9, 60 sec: 49425.1, 300 sec: 41966.0). Total num frames: 115130368. Throughput: 0: 49603.9. Samples: 11975060. Policy #0 lag: (min: 0.0, avg: 8.2, max: 20.0) [2024-06-06 12:04:21,757][02692] Avg episode reward: [(0, '0.051')] [2024-06-06 12:04:26,075][02924] Updated weights for policy 0, policy_version 7037 (0.0019) [2024-06-06 12:04:26,757][02692] Fps is (10 sec: 47512.5, 60 sec: 49151.9, 300 sec: 41976.9). Total num frames: 115343360. Throughput: 0: 49479.3. Samples: 12273460. Policy #0 lag: (min: 0.0, avg: 8.3, max: 20.0) [2024-06-06 12:04:26,758][02692] Avg episode reward: [(0, '0.046')] [2024-06-06 12:04:28,159][02924] Updated weights for policy 0, policy_version 7047 (0.0033) [2024-06-06 12:04:31,757][02692] Fps is (10 sec: 42598.3, 60 sec: 49425.1, 300 sec: 41987.5). Total num frames: 115556352. Throughput: 0: 49415.5. Samples: 12567900. Policy #0 lag: (min: 0.0, avg: 8.5, max: 20.0) [2024-06-06 12:04:31,757][02692] Avg episode reward: [(0, '0.056')] [2024-06-06 12:04:31,761][02904] Saving new best policy, reward=0.056! [2024-06-06 12:04:32,683][02924] Updated weights for policy 0, policy_version 7057 (0.0028) [2024-06-06 12:04:33,460][02904] Signal inference workers to stop experience collection... (200 times) [2024-06-06 12:04:33,497][02924] InferenceWorker_p0-w0: stopping experience collection (200 times) [2024-06-06 12:04:33,526][02904] Signal inference workers to resume experience collection... (200 times) [2024-06-06 12:04:33,527][02924] InferenceWorker_p0-w0: resuming experience collection (200 times) [2024-06-06 12:04:34,970][02924] Updated weights for policy 0, policy_version 7067 (0.0028) [2024-06-06 12:04:36,757][02692] Fps is (10 sec: 50791.2, 60 sec: 49151.9, 300 sec: 42987.2). Total num frames: 115851264. Throughput: 0: 49475.4. Samples: 12702600. Policy #0 lag: (min: 0.0, avg: 8.5, max: 20.0) [2024-06-06 12:04:36,758][02692] Avg episode reward: [(0, '0.056')] [2024-06-06 12:04:39,419][02924] Updated weights for policy 0, policy_version 7077 (0.0031) [2024-06-06 12:04:41,450][02924] Updated weights for policy 0, policy_version 7087 (0.0022) [2024-06-06 12:04:41,757][02692] Fps is (10 sec: 55705.8, 60 sec: 49698.1, 300 sec: 43320.5). Total num frames: 116113408. Throughput: 0: 49481.4. Samples: 13005520. 
Policy #0 lag: (min: 0.0, avg: 7.6, max: 20.0) [2024-06-06 12:04:41,757][02692] Avg episode reward: [(0, '0.054')] [2024-06-06 12:04:45,890][02924] Updated weights for policy 0, policy_version 7097 (0.0036) [2024-06-06 12:04:46,757][02692] Fps is (10 sec: 47514.2, 60 sec: 49425.0, 300 sec: 44042.5). Total num frames: 116326400. Throughput: 0: 49799.0. Samples: 13311540. Policy #0 lag: (min: 0.0, avg: 8.6, max: 20.0) [2024-06-06 12:04:46,757][02692] Avg episode reward: [(0, '0.049')] [2024-06-06 12:04:48,024][02924] Updated weights for policy 0, policy_version 7107 (0.0029) [2024-06-06 12:04:51,757][02692] Fps is (10 sec: 42598.4, 60 sec: 49152.0, 300 sec: 44653.4). Total num frames: 116539392. Throughput: 0: 49370.2. Samples: 13446860. Policy #0 lag: (min: 0.0, avg: 8.8, max: 20.0) [2024-06-06 12:04:51,757][02692] Avg episode reward: [(0, '0.051')] [2024-06-06 12:04:52,738][02924] Updated weights for policy 0, policy_version 7117 (0.0026) [2024-06-06 12:04:54,625][02924] Updated weights for policy 0, policy_version 7127 (0.0034) [2024-06-06 12:04:56,757][02692] Fps is (10 sec: 50790.1, 60 sec: 49152.0, 300 sec: 45597.5). Total num frames: 116834304. Throughput: 0: 49361.8. Samples: 13735840. Policy #0 lag: (min: 0.0, avg: 8.8, max: 20.0) [2024-06-06 12:04:56,757][02692] Avg episode reward: [(0, '0.048')] [2024-06-06 12:04:59,277][02924] Updated weights for policy 0, policy_version 7137 (0.0030) [2024-06-06 12:05:01,339][02924] Updated weights for policy 0, policy_version 7147 (0.0028) [2024-06-06 12:05:01,757][02692] Fps is (10 sec: 55705.8, 60 sec: 49425.3, 300 sec: 46208.5). Total num frames: 117096448. Throughput: 0: 49435.6. Samples: 14033400. Policy #0 lag: (min: 0.0, avg: 8.7, max: 20.0) [2024-06-06 12:05:01,757][02692] Avg episode reward: [(0, '0.055')] [2024-06-06 12:05:05,987][02924] Updated weights for policy 0, policy_version 7157 (0.0025) [2024-06-06 12:05:06,757][02692] Fps is (10 sec: 47512.9, 60 sec: 49424.9, 300 sec: 46541.6). Total num frames: 117309440. Throughput: 0: 49193.6. Samples: 14188780. Policy #0 lag: (min: 0.0, avg: 8.3, max: 20.0) [2024-06-06 12:05:06,758][02692] Avg episode reward: [(0, '0.053')] [2024-06-06 12:05:07,882][02924] Updated weights for policy 0, policy_version 7167 (0.0029) [2024-06-06 12:05:11,757][02692] Fps is (10 sec: 44236.2, 60 sec: 49425.0, 300 sec: 46874.9). Total num frames: 117538816. Throughput: 0: 49063.7. Samples: 14481320. Policy #0 lag: (min: 0.0, avg: 8.4, max: 20.0) [2024-06-06 12:05:11,758][02692] Avg episode reward: [(0, '0.052')] [2024-06-06 12:05:12,594][02924] Updated weights for policy 0, policy_version 7177 (0.0025) [2024-06-06 12:05:14,372][02924] Updated weights for policy 0, policy_version 7187 (0.0029) [2024-06-06 12:05:16,758][02692] Fps is (10 sec: 52424.9, 60 sec: 49424.3, 300 sec: 47430.1). Total num frames: 117833728. Throughput: 0: 49086.6. Samples: 14776840. Policy #0 lag: (min: 0.0, avg: 8.4, max: 20.0) [2024-06-06 12:05:16,758][02692] Avg episode reward: [(0, '0.049')] [2024-06-06 12:05:19,160][02924] Updated weights for policy 0, policy_version 7197 (0.0037) [2024-06-06 12:05:21,066][02924] Updated weights for policy 0, policy_version 7207 (0.0027) [2024-06-06 12:05:21,758][02692] Fps is (10 sec: 55702.5, 60 sec: 49424.5, 300 sec: 47763.4). Total num frames: 118095872. Throughput: 0: 49854.9. Samples: 14946100. 
Policy #0 lag: (min: 0.0, avg: 8.5, max: 20.0) [2024-06-06 12:05:21,758][02692] Avg episode reward: [(0, '0.053')] [2024-06-06 12:05:25,770][02924] Updated weights for policy 0, policy_version 7217 (0.0046) [2024-06-06 12:05:26,757][02692] Fps is (10 sec: 49156.1, 60 sec: 49698.3, 300 sec: 47985.7). Total num frames: 118325248. Throughput: 0: 49796.8. Samples: 15246380. Policy #0 lag: (min: 0.0, avg: 8.5, max: 20.0) [2024-06-06 12:05:26,757][02692] Avg episode reward: [(0, '0.055')] [2024-06-06 12:05:26,799][02904] Signal inference workers to stop experience collection... (250 times) [2024-06-06 12:05:26,801][02904] Signal inference workers to resume experience collection... (250 times) [2024-06-06 12:05:26,802][02904] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000007223_118341632.pth... [2024-06-06 12:05:26,817][02924] InferenceWorker_p0-w0: stopping experience collection (250 times) [2024-06-06 12:05:26,817][02924] InferenceWorker_p0-w0: resuming experience collection (250 times) [2024-06-06 12:05:26,864][02904] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000006503_106545152.pth [2024-06-06 12:05:27,681][02924] Updated weights for policy 0, policy_version 7227 (0.0022) [2024-06-06 12:05:31,757][02692] Fps is (10 sec: 44238.6, 60 sec: 49698.0, 300 sec: 48041.2). Total num frames: 118538240. Throughput: 0: 49615.3. Samples: 15544240. Policy #0 lag: (min: 0.0, avg: 8.5, max: 20.0) [2024-06-06 12:05:31,758][02692] Avg episode reward: [(0, '0.059')] [2024-06-06 12:05:31,758][02904] Saving new best policy, reward=0.059! [2024-06-06 12:05:32,308][02924] Updated weights for policy 0, policy_version 7237 (0.0030) [2024-06-06 12:05:34,101][02924] Updated weights for policy 0, policy_version 7247 (0.0028) [2024-06-06 12:05:36,760][02692] Fps is (10 sec: 49138.9, 60 sec: 49422.9, 300 sec: 48262.9). Total num frames: 118816768. Throughput: 0: 49416.1. Samples: 15670720. Policy #0 lag: (min: 0.0, avg: 8.5, max: 20.0) [2024-06-06 12:05:36,760][02692] Avg episode reward: [(0, '0.056')] [2024-06-06 12:05:38,843][02924] Updated weights for policy 0, policy_version 7257 (0.0037) [2024-06-06 12:05:40,661][02924] Updated weights for policy 0, policy_version 7267 (0.0020) [2024-06-06 12:05:41,757][02692] Fps is (10 sec: 54068.5, 60 sec: 49425.1, 300 sec: 48374.5). Total num frames: 119078912. Throughput: 0: 49886.3. Samples: 15980720. Policy #0 lag: (min: 0.0, avg: 9.0, max: 20.0) [2024-06-06 12:05:41,757][02692] Avg episode reward: [(0, '0.060')] [2024-06-06 12:05:41,918][02904] Saving new best policy, reward=0.060! [2024-06-06 12:05:45,419][02924] Updated weights for policy 0, policy_version 7277 (0.0033) [2024-06-06 12:05:46,757][02692] Fps is (10 sec: 52442.5, 60 sec: 50244.1, 300 sec: 48541.1). Total num frames: 119341056. Throughput: 0: 50077.6. Samples: 16286900. Policy #0 lag: (min: 0.0, avg: 8.4, max: 20.0) [2024-06-06 12:05:46,757][02692] Avg episode reward: [(0, '0.060')] [2024-06-06 12:05:47,466][02924] Updated weights for policy 0, policy_version 7287 (0.0027) [2024-06-06 12:05:51,757][02692] Fps is (10 sec: 44236.9, 60 sec: 49698.1, 300 sec: 48430.0). Total num frames: 119521280. Throughput: 0: 49761.1. Samples: 16428020. 
Policy #0 lag: (min: 0.0, avg: 8.1, max: 20.0) [2024-06-06 12:05:51,757][02692] Avg episode reward: [(0, '0.053')] [2024-06-06 12:05:52,004][02924] Updated weights for policy 0, policy_version 7297 (0.0040) [2024-06-06 12:05:54,227][02924] Updated weights for policy 0, policy_version 7307 (0.0033) [2024-06-06 12:05:56,757][02692] Fps is (10 sec: 45875.8, 60 sec: 49425.1, 300 sec: 48652.2). Total num frames: 119799808. Throughput: 0: 49765.4. Samples: 16720760. Policy #0 lag: (min: 0.0, avg: 8.1, max: 20.0) [2024-06-06 12:05:56,757][02692] Avg episode reward: [(0, '0.054')] [2024-06-06 12:05:58,596][02924] Updated weights for policy 0, policy_version 7317 (0.0021) [2024-06-06 12:06:00,764][02924] Updated weights for policy 0, policy_version 7327 (0.0034) [2024-06-06 12:06:01,757][02692] Fps is (10 sec: 55705.5, 60 sec: 49698.1, 300 sec: 48929.9). Total num frames: 120078336. Throughput: 0: 49674.8. Samples: 17012160. Policy #0 lag: (min: 0.0, avg: 8.2, max: 20.0) [2024-06-06 12:06:01,757][02692] Avg episode reward: [(0, '0.055')] [2024-06-06 12:06:05,423][02924] Updated weights for policy 0, policy_version 7337 (0.0026) [2024-06-06 12:06:06,760][02692] Fps is (10 sec: 50776.3, 60 sec: 49969.0, 300 sec: 48984.9). Total num frames: 120307712. Throughput: 0: 49473.2. Samples: 17172500. Policy #0 lag: (min: 0.0, avg: 8.5, max: 20.0) [2024-06-06 12:06:06,760][02692] Avg episode reward: [(0, '0.055')] [2024-06-06 12:06:07,452][02924] Updated weights for policy 0, policy_version 7347 (0.0037) [2024-06-06 12:06:08,266][02904] Signal inference workers to stop experience collection... (300 times) [2024-06-06 12:06:08,267][02904] Signal inference workers to resume experience collection... (300 times) [2024-06-06 12:06:08,311][02924] InferenceWorker_p0-w0: stopping experience collection (300 times) [2024-06-06 12:06:08,311][02924] InferenceWorker_p0-w0: resuming experience collection (300 times) [2024-06-06 12:06:11,757][02692] Fps is (10 sec: 42598.5, 60 sec: 49425.2, 300 sec: 48818.8). Total num frames: 120504320. Throughput: 0: 49158.8. Samples: 17458520. Policy #0 lag: (min: 0.0, avg: 7.2, max: 20.0) [2024-06-06 12:06:11,757][02692] Avg episode reward: [(0, '0.054')] [2024-06-06 12:06:12,055][02924] Updated weights for policy 0, policy_version 7357 (0.0027) [2024-06-06 12:06:14,178][02924] Updated weights for policy 0, policy_version 7367 (0.0028) [2024-06-06 12:06:16,757][02692] Fps is (10 sec: 45887.4, 60 sec: 48879.6, 300 sec: 48985.4). Total num frames: 120766464. Throughput: 0: 49012.6. Samples: 17749800. Policy #0 lag: (min: 0.0, avg: 7.2, max: 20.0) [2024-06-06 12:06:16,758][02692] Avg episode reward: [(0, '0.056')] [2024-06-06 12:06:18,787][02924] Updated weights for policy 0, policy_version 7377 (0.0029) [2024-06-06 12:06:20,783][02924] Updated weights for policy 0, policy_version 7387 (0.0033) [2024-06-06 12:06:21,757][02692] Fps is (10 sec: 54066.7, 60 sec: 49152.5, 300 sec: 49207.6). Total num frames: 121044992. Throughput: 0: 49611.4. Samples: 17903100. Policy #0 lag: (min: 0.0, avg: 7.4, max: 20.0) [2024-06-06 12:06:21,758][02692] Avg episode reward: [(0, '0.054')] [2024-06-06 12:06:25,501][02924] Updated weights for policy 0, policy_version 7397 (0.0031) [2024-06-06 12:06:26,757][02692] Fps is (10 sec: 52429.5, 60 sec: 49425.2, 300 sec: 49207.6). Total num frames: 121290752. Throughput: 0: 49274.7. Samples: 18198080. 
Policy #0 lag: (min: 0.0, avg: 7.5, max: 20.0) [2024-06-06 12:06:26,757][02692] Avg episode reward: [(0, '0.055')] [2024-06-06 12:06:27,300][02924] Updated weights for policy 0, policy_version 7407 (0.0028) [2024-06-06 12:06:31,757][02692] Fps is (10 sec: 42598.7, 60 sec: 48879.1, 300 sec: 49096.5). Total num frames: 121470976. Throughput: 0: 48952.6. Samples: 18489760. Policy #0 lag: (min: 0.0, avg: 7.3, max: 20.0) [2024-06-06 12:06:31,757][02692] Avg episode reward: [(0, '0.050')] [2024-06-06 12:06:32,191][02924] Updated weights for policy 0, policy_version 7417 (0.0031) [2024-06-06 12:06:34,127][02924] Updated weights for policy 0, policy_version 7427 (0.0027) [2024-06-06 12:06:36,757][02692] Fps is (10 sec: 44236.7, 60 sec: 48608.1, 300 sec: 49263.1). Total num frames: 121733120. Throughput: 0: 48724.0. Samples: 18620600. Policy #0 lag: (min: 0.0, avg: 7.3, max: 20.0) [2024-06-06 12:06:36,757][02692] Avg episode reward: [(0, '0.057')] [2024-06-06 12:06:38,940][02924] Updated weights for policy 0, policy_version 7437 (0.0036) [2024-06-06 12:06:40,982][02924] Updated weights for policy 0, policy_version 7447 (0.0029) [2024-06-06 12:06:41,760][02692] Fps is (10 sec: 55688.6, 60 sec: 49149.5, 300 sec: 49429.2). Total num frames: 122028032. Throughput: 0: 48809.1. Samples: 18917320. Policy #0 lag: (min: 0.0, avg: 8.2, max: 20.0) [2024-06-06 12:06:41,761][02692] Avg episode reward: [(0, '0.053')] [2024-06-06 12:06:45,461][02924] Updated weights for policy 0, policy_version 7457 (0.0017) [2024-06-06 12:06:46,757][02692] Fps is (10 sec: 52428.8, 60 sec: 48606.0, 300 sec: 49263.1). Total num frames: 122257408. Throughput: 0: 48984.9. Samples: 19216480. Policy #0 lag: (min: 0.0, avg: 8.5, max: 20.0) [2024-06-06 12:06:46,757][02692] Avg episode reward: [(0, '0.056')] [2024-06-06 12:06:47,587][02924] Updated weights for policy 0, policy_version 7467 (0.0025) [2024-06-06 12:06:51,757][02692] Fps is (10 sec: 42611.4, 60 sec: 48878.9, 300 sec: 49152.0). Total num frames: 122454016. Throughput: 0: 48547.0. Samples: 19356980. Policy #0 lag: (min: 0.0, avg: 7.8, max: 20.0) [2024-06-06 12:06:51,757][02692] Avg episode reward: [(0, '0.053')] [2024-06-06 12:06:52,262][02924] Updated weights for policy 0, policy_version 7477 (0.0030) [2024-06-06 12:06:54,076][02924] Updated weights for policy 0, policy_version 7487 (0.0029) [2024-06-06 12:06:56,757][02692] Fps is (10 sec: 47512.9, 60 sec: 48878.8, 300 sec: 49318.6). Total num frames: 122732544. Throughput: 0: 48878.0. Samples: 19658040. Policy #0 lag: (min: 0.0, avg: 7.8, max: 20.0) [2024-06-06 12:06:56,757][02692] Avg episode reward: [(0, '0.056')] [2024-06-06 12:06:58,931][02924] Updated weights for policy 0, policy_version 7497 (0.0032) [2024-06-06 12:07:00,698][02924] Updated weights for policy 0, policy_version 7507 (0.0034) [2024-06-06 12:07:01,757][02692] Fps is (10 sec: 55705.0, 60 sec: 48878.9, 300 sec: 49485.2). Total num frames: 123011072. Throughput: 0: 48787.1. Samples: 19945220. Policy #0 lag: (min: 0.0, avg: 8.2, max: 21.0) [2024-06-06 12:07:01,757][02692] Avg episode reward: [(0, '0.057')] [2024-06-06 12:07:05,227][02904] Signal inference workers to stop experience collection... (350 times) [2024-06-06 12:07:05,273][02924] InferenceWorker_p0-w0: stopping experience collection (350 times) [2024-06-06 12:07:05,273][02904] Signal inference workers to resume experience collection... 
(350 times) [2024-06-06 12:07:05,293][02924] InferenceWorker_p0-w0: resuming experience collection (350 times) [2024-06-06 12:07:05,408][02924] Updated weights for policy 0, policy_version 7517 (0.0032) [2024-06-06 12:07:06,757][02692] Fps is (10 sec: 52429.0, 60 sec: 49154.2, 300 sec: 49485.2). Total num frames: 123256832. Throughput: 0: 49030.2. Samples: 20109460. Policy #0 lag: (min: 0.0, avg: 7.5, max: 22.0) [2024-06-06 12:07:06,758][02692] Avg episode reward: [(0, '0.061')] [2024-06-06 12:07:07,443][02924] Updated weights for policy 0, policy_version 7527 (0.0031) [2024-06-06 12:07:11,757][02692] Fps is (10 sec: 44237.0, 60 sec: 49151.9, 300 sec: 49152.1). Total num frames: 123453440. Throughput: 0: 48994.6. Samples: 20402840. Policy #0 lag: (min: 0.0, avg: 7.8, max: 21.0) [2024-06-06 12:07:11,757][02692] Avg episode reward: [(0, '0.049')] [2024-06-06 12:07:12,118][02924] Updated weights for policy 0, policy_version 7537 (0.0030) [2024-06-06 12:07:14,294][02924] Updated weights for policy 0, policy_version 7547 (0.0021) [2024-06-06 12:07:16,757][02692] Fps is (10 sec: 44237.3, 60 sec: 48879.0, 300 sec: 49263.6). Total num frames: 123699200. Throughput: 0: 48950.2. Samples: 20692520. Policy #0 lag: (min: 0.0, avg: 7.8, max: 21.0) [2024-06-06 12:07:16,757][02692] Avg episode reward: [(0, '0.058')] [2024-06-06 12:07:18,729][02924] Updated weights for policy 0, policy_version 7557 (0.0032) [2024-06-06 12:07:20,724][02924] Updated weights for policy 0, policy_version 7567 (0.0022) [2024-06-06 12:07:21,757][02692] Fps is (10 sec: 54067.5, 60 sec: 49152.0, 300 sec: 49485.2). Total num frames: 123994112. Throughput: 0: 49574.2. Samples: 20851440. Policy #0 lag: (min: 0.0, avg: 8.0, max: 21.0) [2024-06-06 12:07:21,757][02692] Avg episode reward: [(0, '0.058')] [2024-06-06 12:07:25,373][02924] Updated weights for policy 0, policy_version 7577 (0.0025) [2024-06-06 12:07:26,757][02692] Fps is (10 sec: 55705.4, 60 sec: 49425.0, 300 sec: 49540.8). Total num frames: 124256256. Throughput: 0: 49735.3. Samples: 21155260. Policy #0 lag: (min: 0.0, avg: 8.6, max: 23.0) [2024-06-06 12:07:26,757][02692] Avg episode reward: [(0, '0.059')] [2024-06-06 12:07:26,894][02904] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000007585_124272640.pth... [2024-06-06 12:07:26,944][02904] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000006860_112394240.pth [2024-06-06 12:07:27,221][02924] Updated weights for policy 0, policy_version 7587 (0.0032) [2024-06-06 12:07:31,760][02692] Fps is (10 sec: 44223.7, 60 sec: 49422.6, 300 sec: 49096.0). Total num frames: 124436480. Throughput: 0: 49495.8. Samples: 21443940. Policy #0 lag: (min: 0.0, avg: 7.1, max: 21.0) [2024-06-06 12:07:31,760][02692] Avg episode reward: [(0, '0.053')] [2024-06-06 12:07:32,095][02924] Updated weights for policy 0, policy_version 7597 (0.0030) [2024-06-06 12:07:34,057][02924] Updated weights for policy 0, policy_version 7607 (0.0021) [2024-06-06 12:07:36,757][02692] Fps is (10 sec: 42597.8, 60 sec: 49151.9, 300 sec: 49152.0). Total num frames: 124682240. Throughput: 0: 49190.5. Samples: 21570560. Policy #0 lag: (min: 0.0, avg: 7.1, max: 21.0) [2024-06-06 12:07:36,758][02692] Avg episode reward: [(0, '0.056')] [2024-06-06 12:07:38,723][02924] Updated weights for policy 0, policy_version 7617 (0.0034) [2024-06-06 12:07:40,640][02924] Updated weights for policy 0, policy_version 7627 (0.0035) [2024-06-06 12:07:41,757][02692] Fps is (10 sec: 52444.6, 60 sec: 48881.4, 300 sec: 49374.2). 
Total num frames: 124960768. Throughput: 0: 49175.8. Samples: 21870940. Policy #0 lag: (min: 1.0, avg: 7.5, max: 21.0) [2024-06-06 12:07:41,757][02692] Avg episode reward: [(0, '0.056')] [2024-06-06 12:07:45,425][02924] Updated weights for policy 0, policy_version 7637 (0.0034) [2024-06-06 12:07:46,757][02692] Fps is (10 sec: 55706.6, 60 sec: 49698.1, 300 sec: 49485.7). Total num frames: 125239296. Throughput: 0: 49589.0. Samples: 22176720. Policy #0 lag: (min: 0.0, avg: 7.8, max: 21.0) [2024-06-06 12:07:46,757][02692] Avg episode reward: [(0, '0.054')] [2024-06-06 12:07:47,190][02924] Updated weights for policy 0, policy_version 7647 (0.0029) [2024-06-06 12:07:51,757][02692] Fps is (10 sec: 44236.7, 60 sec: 49152.0, 300 sec: 49040.9). Total num frames: 125403136. Throughput: 0: 49171.2. Samples: 22322160. Policy #0 lag: (min: 0.0, avg: 7.8, max: 21.0) [2024-06-06 12:07:51,757][02692] Avg episode reward: [(0, '0.060')] [2024-06-06 12:07:52,144][02924] Updated weights for policy 0, policy_version 7657 (0.0025) [2024-06-06 12:07:52,651][02904] Signal inference workers to stop experience collection... (400 times) [2024-06-06 12:07:52,651][02904] Signal inference workers to resume experience collection... (400 times) [2024-06-06 12:07:52,663][02924] InferenceWorker_p0-w0: stopping experience collection (400 times) [2024-06-06 12:07:52,675][02924] InferenceWorker_p0-w0: resuming experience collection (400 times) [2024-06-06 12:07:53,850][02924] Updated weights for policy 0, policy_version 7667 (0.0022) [2024-06-06 12:07:56,757][02692] Fps is (10 sec: 45874.3, 60 sec: 49425.0, 300 sec: 49318.6). Total num frames: 125698048. Throughput: 0: 49076.3. Samples: 22611280. Policy #0 lag: (min: 0.0, avg: 7.2, max: 21.0) [2024-06-06 12:07:56,758][02692] Avg episode reward: [(0, '0.065')] [2024-06-06 12:07:56,771][02904] Saving new best policy, reward=0.065! [2024-06-06 12:07:58,712][02924] Updated weights for policy 0, policy_version 7677 (0.0038) [2024-06-06 12:08:00,723][02924] Updated weights for policy 0, policy_version 7687 (0.0021) [2024-06-06 12:08:01,757][02692] Fps is (10 sec: 55705.4, 60 sec: 49152.1, 300 sec: 49374.1). Total num frames: 125960192. Throughput: 0: 49048.0. Samples: 22899680. Policy #0 lag: (min: 2.0, avg: 7.6, max: 22.0) [2024-06-06 12:08:01,757][02692] Avg episode reward: [(0, '0.056')] [2024-06-06 12:08:05,476][02924] Updated weights for policy 0, policy_version 7697 (0.0029) [2024-06-06 12:08:06,757][02692] Fps is (10 sec: 52429.7, 60 sec: 49425.2, 300 sec: 49540.8). Total num frames: 126222336. Throughput: 0: 49052.0. Samples: 23058780. Policy #0 lag: (min: 0.0, avg: 8.8, max: 23.0) [2024-06-06 12:08:06,757][02692] Avg episode reward: [(0, '0.062')] [2024-06-06 12:08:07,617][02924] Updated weights for policy 0, policy_version 7707 (0.0034) [2024-06-06 12:08:11,757][02692] Fps is (10 sec: 45875.1, 60 sec: 49425.1, 300 sec: 49207.5). Total num frames: 126418944. Throughput: 0: 48817.8. Samples: 23352060. Policy #0 lag: (min: 0.0, avg: 8.8, max: 23.0) [2024-06-06 12:08:11,758][02692] Avg episode reward: [(0, '0.059')] [2024-06-06 12:08:12,024][02924] Updated weights for policy 0, policy_version 7717 (0.0026) [2024-06-06 12:08:13,982][02924] Updated weights for policy 0, policy_version 7727 (0.0039) [2024-06-06 12:08:16,757][02692] Fps is (10 sec: 45874.8, 60 sec: 49698.1, 300 sec: 49207.5). Total num frames: 126681088. Throughput: 0: 49054.7. Samples: 23651260. 
Policy #0 lag: (min: 0.0, avg: 8.6, max: 21.0) [2024-06-06 12:08:16,758][02692] Avg episode reward: [(0, '0.059')] [2024-06-06 12:08:18,713][02924] Updated weights for policy 0, policy_version 7737 (0.0028) [2024-06-06 12:08:20,462][02924] Updated weights for policy 0, policy_version 7747 (0.0031) [2024-06-06 12:08:21,757][02692] Fps is (10 sec: 52428.6, 60 sec: 49151.9, 300 sec: 49318.6). Total num frames: 126943232. Throughput: 0: 49733.9. Samples: 23808580. Policy #0 lag: (min: 0.0, avg: 8.6, max: 21.0) [2024-06-06 12:08:21,757][02692] Avg episode reward: [(0, '0.061')] [2024-06-06 12:08:25,128][02924] Updated weights for policy 0, policy_version 7757 (0.0028) [2024-06-06 12:08:26,757][02692] Fps is (10 sec: 52427.4, 60 sec: 49151.7, 300 sec: 49540.7). Total num frames: 127205376. Throughput: 0: 49694.7. Samples: 24107220. Policy #0 lag: (min: 1.0, avg: 10.1, max: 21.0) [2024-06-06 12:08:26,758][02692] Avg episode reward: [(0, '0.062')] [2024-06-06 12:08:27,379][02924] Updated weights for policy 0, policy_version 7767 (0.0032) [2024-06-06 12:08:31,757][02692] Fps is (10 sec: 45875.6, 60 sec: 49427.5, 300 sec: 49152.0). Total num frames: 127401984. Throughput: 0: 49418.2. Samples: 24400540. Policy #0 lag: (min: 1.0, avg: 10.1, max: 21.0) [2024-06-06 12:08:31,757][02692] Avg episode reward: [(0, '0.061')] [2024-06-06 12:08:31,922][02924] Updated weights for policy 0, policy_version 7777 (0.0032) [2024-06-06 12:08:34,256][02924] Updated weights for policy 0, policy_version 7787 (0.0029) [2024-06-06 12:08:36,760][02692] Fps is (10 sec: 45863.0, 60 sec: 49695.8, 300 sec: 49262.6). Total num frames: 127664128. Throughput: 0: 49229.6. Samples: 24537640. Policy #0 lag: (min: 0.0, avg: 10.0, max: 23.0) [2024-06-06 12:08:36,761][02692] Avg episode reward: [(0, '0.063')] [2024-06-06 12:08:38,621][02924] Updated weights for policy 0, policy_version 7797 (0.0032) [2024-06-06 12:08:39,346][02904] Signal inference workers to stop experience collection... (450 times) [2024-06-06 12:08:39,374][02924] InferenceWorker_p0-w0: stopping experience collection (450 times) [2024-06-06 12:08:39,457][02904] Signal inference workers to resume experience collection... (450 times) [2024-06-06 12:08:39,457][02924] InferenceWorker_p0-w0: resuming experience collection (450 times) [2024-06-06 12:08:40,729][02924] Updated weights for policy 0, policy_version 7807 (0.0027) [2024-06-06 12:08:41,757][02692] Fps is (10 sec: 52428.5, 60 sec: 49425.0, 300 sec: 49374.1). Total num frames: 127926272. Throughput: 0: 49343.2. Samples: 24831720. Policy #0 lag: (min: 1.0, avg: 10.8, max: 21.0) [2024-06-06 12:08:41,757][02692] Avg episode reward: [(0, '0.059')] [2024-06-06 12:08:45,190][02924] Updated weights for policy 0, policy_version 7817 (0.0029) [2024-06-06 12:08:46,757][02692] Fps is (10 sec: 52444.2, 60 sec: 49151.9, 300 sec: 49485.2). Total num frames: 128188416. Throughput: 0: 49705.7. Samples: 25136440. Policy #0 lag: (min: 2.0, avg: 11.1, max: 24.0) [2024-06-06 12:08:46,758][02692] Avg episode reward: [(0, '0.056')] [2024-06-06 12:08:47,343][02924] Updated weights for policy 0, policy_version 7827 (0.0030) [2024-06-06 12:08:51,757][02692] Fps is (10 sec: 45875.1, 60 sec: 49698.1, 300 sec: 49152.0). Total num frames: 128385024. Throughput: 0: 49205.3. Samples: 25273020. 
Policy #0 lag: (min: 2.0, avg: 11.1, max: 24.0) [2024-06-06 12:08:51,757][02692] Avg episode reward: [(0, '0.057')] [2024-06-06 12:08:51,801][02924] Updated weights for policy 0, policy_version 7837 (0.0040) [2024-06-06 12:08:54,239][02924] Updated weights for policy 0, policy_version 7847 (0.0025) [2024-06-06 12:08:56,757][02692] Fps is (10 sec: 45875.6, 60 sec: 49152.2, 300 sec: 49207.6). Total num frames: 128647168. Throughput: 0: 49324.9. Samples: 25571680. Policy #0 lag: (min: 1.0, avg: 10.9, max: 26.0) [2024-06-06 12:08:56,757][02692] Avg episode reward: [(0, '0.059')] [2024-06-06 12:08:58,272][02924] Updated weights for policy 0, policy_version 7857 (0.0022) [2024-06-06 12:09:00,910][02924] Updated weights for policy 0, policy_version 7867 (0.0028) [2024-06-06 12:09:01,757][02692] Fps is (10 sec: 52429.0, 60 sec: 49152.0, 300 sec: 49374.1). Total num frames: 128909312. Throughput: 0: 49192.5. Samples: 25864920. Policy #0 lag: (min: 1.0, avg: 11.8, max: 26.0) [2024-06-06 12:09:01,758][02692] Avg episode reward: [(0, '0.061')] [2024-06-06 12:09:05,298][02924] Updated weights for policy 0, policy_version 7877 (0.0025) [2024-06-06 12:09:06,757][02692] Fps is (10 sec: 54067.2, 60 sec: 49425.1, 300 sec: 49540.8). Total num frames: 129187840. Throughput: 0: 49208.1. Samples: 26022940. Policy #0 lag: (min: 0.0, avg: 12.3, max: 25.0) [2024-06-06 12:09:06,757][02692] Avg episode reward: [(0, '0.053')] [2024-06-06 12:09:07,372][02924] Updated weights for policy 0, policy_version 7887 (0.0031) [2024-06-06 12:09:11,757][02692] Fps is (10 sec: 45875.4, 60 sec: 49152.0, 300 sec: 49152.0). Total num frames: 129368064. Throughput: 0: 49277.3. Samples: 26324680. Policy #0 lag: (min: 0.0, avg: 12.3, max: 25.0) [2024-06-06 12:09:11,757][02692] Avg episode reward: [(0, '0.054')] [2024-06-06 12:09:11,797][02924] Updated weights for policy 0, policy_version 7897 (0.0030) [2024-06-06 12:09:13,949][02924] Updated weights for policy 0, policy_version 7907 (0.0026) [2024-06-06 12:09:16,760][02692] Fps is (10 sec: 45861.4, 60 sec: 49422.7, 300 sec: 49207.0). Total num frames: 129646592. Throughput: 0: 49070.5. Samples: 26608860. Policy #0 lag: (min: 0.0, avg: 12.0, max: 23.0) [2024-06-06 12:09:16,761][02692] Avg episode reward: [(0, '0.063')] [2024-06-06 12:09:18,619][02924] Updated weights for policy 0, policy_version 7917 (0.0036) [2024-06-06 12:09:20,719][02924] Updated weights for policy 0, policy_version 7927 (0.0034) [2024-06-06 12:09:21,757][02692] Fps is (10 sec: 52428.8, 60 sec: 49152.1, 300 sec: 49318.7). Total num frames: 129892352. Throughput: 0: 49438.0. Samples: 26762200. Policy #0 lag: (min: 2.0, avg: 12.9, max: 24.0) [2024-06-06 12:09:21,757][02692] Avg episode reward: [(0, '0.060')] [2024-06-06 12:09:25,068][02924] Updated weights for policy 0, policy_version 7937 (0.0027) [2024-06-06 12:09:26,757][02692] Fps is (10 sec: 52443.7, 60 sec: 49425.2, 300 sec: 49540.8). Total num frames: 130170880. Throughput: 0: 49682.1. Samples: 27067420. Policy #0 lag: (min: 2.0, avg: 12.9, max: 24.0) [2024-06-06 12:09:26,758][02692] Avg episode reward: [(0, '0.059')] [2024-06-06 12:09:26,767][02904] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000007945_130170880.pth... [2024-06-06 12:09:26,810][02904] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000007223_118341632.pth [2024-06-06 12:09:27,213][02924] Updated weights for policy 0, policy_version 7947 (0.0033) [2024-06-06 12:09:31,757][02692] Fps is (10 sec: 45875.1, 60 sec: 49152.0, 300 sec: 49152.0). 
Total num frames: 130351104. Throughput: 0: 49393.4. Samples: 27359140. Policy #0 lag: (min: 0.0, avg: 12.9, max: 29.0) [2024-06-06 12:09:31,757][02692] Avg episode reward: [(0, '0.062')] [2024-06-06 12:09:31,766][02924] Updated weights for policy 0, policy_version 7957 (0.0027) [2024-06-06 12:09:33,625][02904] Signal inference workers to stop experience collection... (500 times) [2024-06-06 12:09:33,655][02924] InferenceWorker_p0-w0: stopping experience collection (500 times) [2024-06-06 12:09:33,682][02904] Signal inference workers to resume experience collection... (500 times) [2024-06-06 12:09:33,682][02924] InferenceWorker_p0-w0: resuming experience collection (500 times) [2024-06-06 12:09:33,831][02924] Updated weights for policy 0, policy_version 7967 (0.0027) [2024-06-06 12:09:36,757][02692] Fps is (10 sec: 45875.7, 60 sec: 49427.5, 300 sec: 49207.5). Total num frames: 130629632. Throughput: 0: 49172.0. Samples: 27485760. Policy #0 lag: (min: 1.0, avg: 13.1, max: 21.0) [2024-06-06 12:09:36,757][02692] Avg episode reward: [(0, '0.061')] [2024-06-06 12:09:38,423][02924] Updated weights for policy 0, policy_version 7977 (0.0026) [2024-06-06 12:09:40,731][02924] Updated weights for policy 0, policy_version 7987 (0.0034) [2024-06-06 12:09:41,757][02692] Fps is (10 sec: 52428.8, 60 sec: 49152.0, 300 sec: 49318.6). Total num frames: 130875392. Throughput: 0: 49236.4. Samples: 27787320. Policy #0 lag: (min: 0.0, avg: 13.3, max: 25.0) [2024-06-06 12:09:41,757][02692] Avg episode reward: [(0, '0.059')] [2024-06-06 12:09:45,259][02924] Updated weights for policy 0, policy_version 7997 (0.0023) [2024-06-06 12:09:46,757][02692] Fps is (10 sec: 52429.2, 60 sec: 49425.1, 300 sec: 49540.8). Total num frames: 131153920. Throughput: 0: 49397.8. Samples: 28087820. Policy #0 lag: (min: 0.0, avg: 13.3, max: 25.0) [2024-06-06 12:09:46,757][02692] Avg episode reward: [(0, '0.054')] [2024-06-06 12:09:47,390][02924] Updated weights for policy 0, policy_version 8007 (0.0028) [2024-06-06 12:09:51,692][02924] Updated weights for policy 0, policy_version 8017 (0.0026) [2024-06-06 12:09:51,757][02692] Fps is (10 sec: 47513.0, 60 sec: 49425.0, 300 sec: 49207.5). Total num frames: 131350528. Throughput: 0: 49186.1. Samples: 28236320. Policy #0 lag: (min: 0.0, avg: 11.4, max: 20.0) [2024-06-06 12:09:51,757][02692] Avg episode reward: [(0, '0.063')] [2024-06-06 12:09:53,994][02924] Updated weights for policy 0, policy_version 8027 (0.0035) [2024-06-06 12:09:56,757][02692] Fps is (10 sec: 45874.7, 60 sec: 49425.0, 300 sec: 49207.5). Total num frames: 131612672. Throughput: 0: 48956.8. Samples: 28527740. Policy #0 lag: (min: 0.0, avg: 10.5, max: 21.0) [2024-06-06 12:09:56,758][02692] Avg episode reward: [(0, '0.060')] [2024-06-06 12:09:58,374][02924] Updated weights for policy 0, policy_version 8037 (0.0031) [2024-06-06 12:10:00,538][02924] Updated weights for policy 0, policy_version 8047 (0.0034) [2024-06-06 12:10:01,757][02692] Fps is (10 sec: 50791.0, 60 sec: 49152.0, 300 sec: 49318.6). Total num frames: 131858432. Throughput: 0: 49131.3. Samples: 28819620. Policy #0 lag: (min: 0.0, avg: 10.5, max: 21.0) [2024-06-06 12:10:01,757][02692] Avg episode reward: [(0, '0.063')] [2024-06-06 12:10:04,890][02924] Updated weights for policy 0, policy_version 8057 (0.0037) [2024-06-06 12:10:06,757][02692] Fps is (10 sec: 52429.0, 60 sec: 49151.9, 300 sec: 49485.2). Total num frames: 132136960. Throughput: 0: 49293.7. Samples: 28980420. 
Policy #0 lag: (min: 1.0, avg: 10.3, max: 21.0) [2024-06-06 12:10:06,757][02692] Avg episode reward: [(0, '0.062')] [2024-06-06 12:10:07,195][02924] Updated weights for policy 0, policy_version 8067 (0.0031) [2024-06-06 12:10:11,550][02924] Updated weights for policy 0, policy_version 8077 (0.0026) [2024-06-06 12:10:11,757][02692] Fps is (10 sec: 47513.0, 60 sec: 49425.0, 300 sec: 49152.1). Total num frames: 132333568. Throughput: 0: 49040.1. Samples: 29274220. Policy #0 lag: (min: 0.0, avg: 10.6, max: 22.0) [2024-06-06 12:10:11,757][02692] Avg episode reward: [(0, '0.058')] [2024-06-06 12:10:13,873][02924] Updated weights for policy 0, policy_version 8087 (0.0029) [2024-06-06 12:10:16,757][02692] Fps is (10 sec: 45875.4, 60 sec: 49154.5, 300 sec: 49152.1). Total num frames: 132595712. Throughput: 0: 49292.4. Samples: 29577300. Policy #0 lag: (min: 0.0, avg: 10.3, max: 21.0) [2024-06-06 12:10:16,757][02692] Avg episode reward: [(0, '0.064')] [2024-06-06 12:10:17,972][02924] Updated weights for policy 0, policy_version 8097 (0.0028) [2024-06-06 12:10:20,348][02924] Updated weights for policy 0, policy_version 8107 (0.0036) [2024-06-06 12:10:21,757][02692] Fps is (10 sec: 52429.5, 60 sec: 49425.1, 300 sec: 49263.1). Total num frames: 132857856. Throughput: 0: 49848.1. Samples: 29728920. Policy #0 lag: (min: 0.0, avg: 10.3, max: 21.0) [2024-06-06 12:10:21,757][02692] Avg episode reward: [(0, '0.057')] [2024-06-06 12:10:24,663][02924] Updated weights for policy 0, policy_version 8117 (0.0025) [2024-06-06 12:10:26,757][02692] Fps is (10 sec: 55704.7, 60 sec: 49698.1, 300 sec: 49540.8). Total num frames: 133152768. Throughput: 0: 49914.5. Samples: 30033480. Policy #0 lag: (min: 0.0, avg: 9.7, max: 21.0) [2024-06-06 12:10:26,759][02924] Updated weights for policy 0, policy_version 8127 (0.0028) [2024-06-06 12:10:26,766][02692] Avg episode reward: [(0, '0.059')] [2024-06-06 12:10:31,248][02924] Updated weights for policy 0, policy_version 8137 (0.0019) [2024-06-06 12:10:31,757][02692] Fps is (10 sec: 49151.7, 60 sec: 49971.2, 300 sec: 49263.5). Total num frames: 133349376. Throughput: 0: 49935.0. Samples: 30334900. Policy #0 lag: (min: 0.0, avg: 9.6, max: 21.0) [2024-06-06 12:10:31,757][02692] Avg episode reward: [(0, '0.062')] [2024-06-06 12:10:33,493][02924] Updated weights for policy 0, policy_version 8147 (0.0032) [2024-06-06 12:10:36,757][02692] Fps is (10 sec: 42598.6, 60 sec: 49152.0, 300 sec: 49152.0). Total num frames: 133578752. Throughput: 0: 49598.7. Samples: 30468260. Policy #0 lag: (min: 0.0, avg: 9.6, max: 21.0) [2024-06-06 12:10:36,758][02692] Avg episode reward: [(0, '0.062')] [2024-06-06 12:10:37,945][02924] Updated weights for policy 0, policy_version 8157 (0.0039) [2024-06-06 12:10:38,614][02904] Signal inference workers to stop experience collection... (550 times) [2024-06-06 12:10:38,615][02904] Signal inference workers to resume experience collection... (550 times) [2024-06-06 12:10:38,664][02924] InferenceWorker_p0-w0: stopping experience collection (550 times) [2024-06-06 12:10:38,664][02924] InferenceWorker_p0-w0: resuming experience collection (550 times) [2024-06-06 12:10:40,218][02924] Updated weights for policy 0, policy_version 8167 (0.0023) [2024-06-06 12:10:41,757][02692] Fps is (10 sec: 50790.5, 60 sec: 49698.1, 300 sec: 49207.6). Total num frames: 133857280. Throughput: 0: 49733.4. Samples: 30765740. 
Policy #0 lag: (min: 0.0, avg: 7.8, max: 21.0) [2024-06-06 12:10:41,757][02692] Avg episode reward: [(0, '0.063')] [2024-06-06 12:10:44,446][02924] Updated weights for policy 0, policy_version 8177 (0.0027) [2024-06-06 12:10:46,757][02692] Fps is (10 sec: 54067.7, 60 sec: 49425.0, 300 sec: 49485.2). Total num frames: 134119424. Throughput: 0: 49714.6. Samples: 31056780. Policy #0 lag: (min: 1.0, avg: 8.5, max: 21.0) [2024-06-06 12:10:46,757][02692] Avg episode reward: [(0, '0.062')] [2024-06-06 12:10:46,789][02924] Updated weights for policy 0, policy_version 8187 (0.0022) [2024-06-06 12:10:51,022][02924] Updated weights for policy 0, policy_version 8197 (0.0023) [2024-06-06 12:10:51,757][02692] Fps is (10 sec: 49151.8, 60 sec: 49971.2, 300 sec: 49318.6). Total num frames: 134348800. Throughput: 0: 49774.7. Samples: 31220280. Policy #0 lag: (min: 0.0, avg: 9.1, max: 21.0) [2024-06-06 12:10:51,758][02692] Avg episode reward: [(0, '0.061')] [2024-06-06 12:10:53,115][02924] Updated weights for policy 0, policy_version 8207 (0.0033) [2024-06-06 12:10:56,757][02692] Fps is (10 sec: 45875.4, 60 sec: 49425.2, 300 sec: 49152.0). Total num frames: 134578176. Throughput: 0: 49951.7. Samples: 31522040. Policy #0 lag: (min: 0.0, avg: 9.1, max: 21.0) [2024-06-06 12:10:56,757][02692] Avg episode reward: [(0, '0.060')] [2024-06-06 12:10:57,777][02924] Updated weights for policy 0, policy_version 8217 (0.0024) [2024-06-06 12:11:17,523][02692] Fps is (10 sec: 15897.2, 60 sec: 38275.5, 300 sec: 46500.7). Total num frames: 134758400. Throughput: 0: 34448.7. Samples: 31670600. Policy #0 lag: (min: 1.0, avg: 10.3, max: 22.0) [2024-06-06 12:11:17,523][02692] Avg episode reward: [(0, '0.069')] [2024-06-06 12:11:17,537][02692] Fps is (10 sec: 8673.0, 60 sec: 37036.5, 300 sec: 46615.5). Total num frames: 134758400. Throughput: 0: 34809.6. Samples: 31670600. Policy #0 lag: (min: 1.0, avg: 10.3, max: 22.0) [2024-06-06 12:11:17,537][02692] Avg episode reward: [(0, '0.069')] [2024-06-06 12:11:17,542][02692] Fps is (10 sec: 0.0, 60 sec: 36859.7, 300 sec: 46518.0). Total num frames: 134758400. Throughput: 0: 33551.0. Samples: 31737380. Policy #0 lag: (min: 1.0, avg: 10.3, max: 22.0) [2024-06-06 12:11:17,543][02692] Avg episode reward: [(0, '0.069')] [2024-06-06 12:11:17,543][02692] Fps is (10 sec: 0.0, 60 sec: 35578.7, 300 sec: 46362.6). Total num frames: 134758400. Throughput: 0: 30631.2. Samples: 31737380. Policy #0 lag: (min: 1.0, avg: 10.3, max: 22.0) [2024-06-06 12:11:17,543][02692] Avg episode reward: [(0, '0.069')] [2024-06-06 12:11:17,586][02904] Saving new best policy, reward=0.069! [2024-06-06 12:11:18,006][02924] Updated weights for policy 0, policy_version 8227 (0.0029) [2024-06-06 12:11:21,759][02692] Fps is (10 sec: 38856.9, 60 sec: 34405.2, 300 sec: 46208.1). Total num frames: 134922240. Throughput: 0: 29791.6. Samples: 31808940. Policy #0 lag: (min: 0.0, avg: 12.0, max: 23.0) [2024-06-06 12:11:21,760][02692] Avg episode reward: [(0, '0.065')] [2024-06-06 12:11:22,234][02924] Updated weights for policy 0, policy_version 8237 (0.0024) [2024-06-06 12:11:24,574][02924] Updated weights for policy 0, policy_version 8247 (0.0020) [2024-06-06 12:11:26,757][02692] Fps is (10 sec: 48010.8, 60 sec: 34133.4, 300 sec: 46541.7). Total num frames: 135200768. Throughput: 0: 30004.9. Samples: 32115960. 
Policy #0 lag: (min: 0.0, avg: 10.7, max: 20.0) [2024-06-06 12:11:26,762][02692] Avg episode reward: [(0, '0.057')] [2024-06-06 12:11:26,888][02904] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000008253_135217152.pth... [2024-06-06 12:11:26,946][02904] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000007585_124272640.pth [2024-06-06 12:11:28,761][02924] Updated weights for policy 0, policy_version 8257 (0.0027) [2024-06-06 12:11:31,037][02924] Updated weights for policy 0, policy_version 8267 (0.0029) [2024-06-06 12:11:31,760][02692] Fps is (10 sec: 54062.5, 60 sec: 35223.9, 300 sec: 46541.2). Total num frames: 135462912. Throughput: 0: 30146.5. Samples: 32413460. Policy #0 lag: (min: 0.0, avg: 10.7, max: 20.0) [2024-06-06 12:11:31,760][02692] Avg episode reward: [(0, '0.070')] [2024-06-06 12:11:35,286][02924] Updated weights for policy 0, policy_version 8277 (0.0027) [2024-06-06 12:11:36,757][02692] Fps is (10 sec: 52429.1, 60 sec: 35771.8, 300 sec: 46431.1). Total num frames: 135725056. Throughput: 0: 30183.2. Samples: 32578520. Policy #0 lag: (min: 0.0, avg: 10.5, max: 21.0) [2024-06-06 12:11:36,757][02692] Avg episode reward: [(0, '0.063')] [2024-06-06 12:11:37,470][02924] Updated weights for policy 0, policy_version 8287 (0.0022) [2024-06-06 12:11:41,757][02692] Fps is (10 sec: 45888.6, 60 sec: 34406.4, 300 sec: 46319.5). Total num frames: 135921664. Throughput: 0: 29943.1. Samples: 32869480. Policy #0 lag: (min: 0.0, avg: 10.9, max: 22.0) [2024-06-06 12:11:41,757][02692] Avg episode reward: [(0, '0.067')] [2024-06-06 12:11:41,883][02924] Updated weights for policy 0, policy_version 8297 (0.0029) [2024-06-06 12:11:44,196][02924] Updated weights for policy 0, policy_version 8307 (0.0027) [2024-06-06 12:11:46,757][02692] Fps is (10 sec: 47513.3, 60 sec: 34679.5, 300 sec: 46597.2). Total num frames: 136200192. Throughput: 0: 51249.2. Samples: 33168840. Policy #0 lag: (min: 0.0, avg: 10.3, max: 20.0) [2024-06-06 12:11:46,757][02692] Avg episode reward: [(0, '0.066')] [2024-06-06 12:11:48,359][02924] Updated weights for policy 0, policy_version 8317 (0.0030) [2024-06-06 12:11:50,797][02924] Updated weights for policy 0, policy_version 8327 (0.0027) [2024-06-06 12:11:51,703][02904] Signal inference workers to stop experience collection... (600 times) [2024-06-06 12:11:51,743][02924] InferenceWorker_p0-w0: stopping experience collection (600 times) [2024-06-06 12:11:51,749][02904] Signal inference workers to resume experience collection... (600 times) [2024-06-06 12:11:51,757][02924] InferenceWorker_p0-w0: resuming experience collection (600 times) [2024-06-06 12:11:51,757][02692] Fps is (10 sec: 54067.4, 60 sec: 35225.6, 300 sec: 46541.7). Total num frames: 136462336. Throughput: 0: 48039.1. Samples: 33314500. Policy #0 lag: (min: 0.0, avg: 10.3, max: 20.0) [2024-06-06 12:11:51,762][02692] Avg episode reward: [(0, '0.069')] [2024-06-06 12:11:54,889][02924] Updated weights for policy 0, policy_version 8337 (0.0030) [2024-06-06 12:11:56,757][02692] Fps is (10 sec: 50790.8, 60 sec: 35498.7, 300 sec: 46430.6). Total num frames: 136708096. Throughput: 0: 48044.6. Samples: 33621420. 
Policy #0 lag: (min: 0.0, avg: 9.6, max: 21.0) [2024-06-06 12:11:56,757][02692] Avg episode reward: [(0, '0.065')] [2024-06-06 12:11:57,279][02924] Updated weights for policy 0, policy_version 8347 (0.0022) [2024-06-06 12:12:01,591][02924] Updated weights for policy 0, policy_version 8357 (0.0032) [2024-06-06 12:12:01,757][02692] Fps is (10 sec: 47513.6, 60 sec: 49262.0, 300 sec: 46375.1). Total num frames: 136937472. Throughput: 0: 49378.6. Samples: 33920600. Policy #0 lag: (min: 0.0, avg: 9.1, max: 24.0) [2024-06-06 12:12:01,757][02692] Avg episode reward: [(0, '0.067')] [2024-06-06 12:12:03,823][02924] Updated weights for policy 0, policy_version 8367 (0.0029) [2024-06-06 12:12:06,757][02692] Fps is (10 sec: 47513.4, 60 sec: 49265.1, 300 sec: 46541.7). Total num frames: 137183232. Throughput: 0: 49911.2. Samples: 34054840. Policy #0 lag: (min: 0.0, avg: 9.1, max: 24.0) [2024-06-06 12:12:06,757][02692] Avg episode reward: [(0, '0.069')] [2024-06-06 12:12:08,091][02924] Updated weights for policy 0, policy_version 8377 (0.0025) [2024-06-06 12:12:10,631][02924] Updated weights for policy 0, policy_version 8387 (0.0027) [2024-06-06 12:12:11,757][02692] Fps is (10 sec: 52428.6, 60 sec: 49864.2, 300 sec: 46652.7). Total num frames: 137461760. Throughput: 0: 49783.5. Samples: 34356220. Policy #0 lag: (min: 0.0, avg: 8.0, max: 21.0) [2024-06-06 12:12:11,757][02692] Avg episode reward: [(0, '0.063')] [2024-06-06 12:12:14,692][02924] Updated weights for policy 0, policy_version 8397 (0.0033) [2024-06-06 12:12:16,757][02692] Fps is (10 sec: 50790.2, 60 sec: 49527.8, 300 sec: 46430.6). Total num frames: 137691136. Throughput: 0: 49810.3. Samples: 34654780. Policy #0 lag: (min: 0.0, avg: 9.9, max: 23.0) [2024-06-06 12:12:16,757][02692] Avg episode reward: [(0, '0.066')] [2024-06-06 12:12:17,258][02924] Updated weights for policy 0, policy_version 8407 (0.0032) [2024-06-06 12:12:21,177][02924] Updated weights for policy 0, policy_version 8417 (0.0037) [2024-06-06 12:12:21,757][02692] Fps is (10 sec: 45875.6, 60 sec: 49973.0, 300 sec: 46319.5). Total num frames: 137920512. Throughput: 0: 49573.3. Samples: 34809320. Policy #0 lag: (min: 0.0, avg: 9.9, max: 23.0) [2024-06-06 12:12:21,757][02692] Avg episode reward: [(0, '0.068')] [2024-06-06 12:12:23,659][02924] Updated weights for policy 0, policy_version 8427 (0.0028) [2024-06-06 12:12:26,757][02692] Fps is (10 sec: 49152.2, 60 sec: 49698.2, 300 sec: 46597.7). Total num frames: 138182656. Throughput: 0: 49648.0. Samples: 35103640. Policy #0 lag: (min: 0.0, avg: 9.2, max: 20.0) [2024-06-06 12:12:26,757][02692] Avg episode reward: [(0, '0.071')] [2024-06-06 12:12:26,762][02904] Saving new best policy, reward=0.071! [2024-06-06 12:12:27,883][02924] Updated weights for policy 0, policy_version 8437 (0.0025) [2024-06-06 12:12:30,292][02924] Updated weights for policy 0, policy_version 8447 (0.0037) [2024-06-06 12:12:31,757][02692] Fps is (10 sec: 52428.1, 60 sec: 49700.5, 300 sec: 46652.8). Total num frames: 138444800. Throughput: 0: 49453.7. Samples: 35394260. Policy #0 lag: (min: 0.0, avg: 11.5, max: 24.0) [2024-06-06 12:12:31,758][02692] Avg episode reward: [(0, '0.068')] [2024-06-06 12:12:34,495][02924] Updated weights for policy 0, policy_version 8457 (0.0028) [2024-06-06 12:12:36,757][02692] Fps is (10 sec: 50789.9, 60 sec: 49425.0, 300 sec: 46541.6). Total num frames: 138690560. Throughput: 0: 49563.5. Samples: 35544860. 
Policy #0 lag: (min: 0.0, avg: 10.6, max: 23.0) [2024-06-06 12:12:36,758][02692] Avg episode reward: [(0, '0.067')] [2024-06-06 12:12:37,268][02924] Updated weights for policy 0, policy_version 8467 (0.0022) [2024-06-06 12:12:41,187][02924] Updated weights for policy 0, policy_version 8477 (0.0028) [2024-06-06 12:12:41,757][02692] Fps is (10 sec: 44237.3, 60 sec: 49425.1, 300 sec: 46264.0). Total num frames: 138887168. Throughput: 0: 49263.5. Samples: 35838280. Policy #0 lag: (min: 0.0, avg: 10.6, max: 23.0) [2024-06-06 12:12:41,757][02692] Avg episode reward: [(0, '0.058')] [2024-06-06 12:12:43,956][02924] Updated weights for policy 0, policy_version 8487 (0.0027) [2024-06-06 12:12:46,757][02692] Fps is (10 sec: 47513.9, 60 sec: 49425.1, 300 sec: 46652.7). Total num frames: 139165696. Throughput: 0: 49034.6. Samples: 36127160. Policy #0 lag: (min: 0.0, avg: 11.0, max: 22.0) [2024-06-06 12:12:46,757][02692] Avg episode reward: [(0, '0.068')] [2024-06-06 12:12:47,821][02924] Updated weights for policy 0, policy_version 8497 (0.0029) [2024-06-06 12:12:50,120][02904] Signal inference workers to stop experience collection... (650 times) [2024-06-06 12:12:50,164][02924] InferenceWorker_p0-w0: stopping experience collection (650 times) [2024-06-06 12:12:50,227][02904] Signal inference workers to resume experience collection... (650 times) [2024-06-06 12:12:50,227][02924] InferenceWorker_p0-w0: resuming experience collection (650 times) [2024-06-06 12:12:50,545][02924] Updated weights for policy 0, policy_version 8507 (0.0029) [2024-06-06 12:12:51,757][02692] Fps is (10 sec: 52428.6, 60 sec: 49152.0, 300 sec: 46486.2). Total num frames: 139411456. Throughput: 0: 49364.9. Samples: 36276260. Policy #0 lag: (min: 0.0, avg: 11.1, max: 20.0) [2024-06-06 12:12:51,757][02692] Avg episode reward: [(0, '0.070')] [2024-06-06 12:12:54,324][02924] Updated weights for policy 0, policy_version 8517 (0.0036) [2024-06-06 12:12:56,757][02692] Fps is (10 sec: 50790.4, 60 sec: 49425.0, 300 sec: 46486.1). Total num frames: 139673600. Throughput: 0: 49349.8. Samples: 36576960. Policy #0 lag: (min: 0.0, avg: 11.1, max: 20.0) [2024-06-06 12:12:56,757][02692] Avg episode reward: [(0, '0.072')] [2024-06-06 12:12:57,097][02924] Updated weights for policy 0, policy_version 8527 (0.0026) [2024-06-06 12:13:01,074][02924] Updated weights for policy 0, policy_version 8537 (0.0030) [2024-06-06 12:13:01,757][02692] Fps is (10 sec: 45875.1, 60 sec: 48878.9, 300 sec: 46264.0). Total num frames: 139870208. Throughput: 0: 49160.4. Samples: 36867000. Policy #0 lag: (min: 0.0, avg: 10.9, max: 21.0) [2024-06-06 12:13:01,757][02692] Avg episode reward: [(0, '0.064')] [2024-06-06 12:13:03,955][02924] Updated weights for policy 0, policy_version 8547 (0.0028) [2024-06-06 12:13:06,757][02692] Fps is (10 sec: 45875.3, 60 sec: 49152.0, 300 sec: 46486.1). Total num frames: 140132352. Throughput: 0: 48951.9. Samples: 37012160. Policy #0 lag: (min: 0.0, avg: 10.3, max: 21.0) [2024-06-06 12:13:06,757][02692] Avg episode reward: [(0, '0.066')] [2024-06-06 12:13:07,463][02924] Updated weights for policy 0, policy_version 8557 (0.0039) [2024-06-06 12:13:10,745][02924] Updated weights for policy 0, policy_version 8567 (0.0032) [2024-06-06 12:13:11,757][02692] Fps is (10 sec: 52429.4, 60 sec: 48879.0, 300 sec: 46486.2). Total num frames: 140394496. Throughput: 0: 48922.7. Samples: 37305160. 
Policy #0 lag: (min: 0.0, avg: 10.3, max: 21.0) [2024-06-06 12:13:11,757][02692] Avg episode reward: [(0, '0.065')] [2024-06-06 12:13:13,926][02924] Updated weights for policy 0, policy_version 8577 (0.0027) [2024-06-06 12:13:16,757][02692] Fps is (10 sec: 54066.8, 60 sec: 49698.1, 300 sec: 46541.7). Total num frames: 140673024. Throughput: 0: 49328.0. Samples: 37614020. Policy #0 lag: (min: 1.0, avg: 9.6, max: 21.0) [2024-06-06 12:13:16,758][02692] Avg episode reward: [(0, '0.068')] [2024-06-06 12:13:17,107][02924] Updated weights for policy 0, policy_version 8587 (0.0022) [2024-06-06 12:13:20,627][02924] Updated weights for policy 0, policy_version 8597 (0.0030) [2024-06-06 12:13:21,757][02692] Fps is (10 sec: 47513.3, 60 sec: 49152.0, 300 sec: 46319.6). Total num frames: 140869632. Throughput: 0: 49359.7. Samples: 37766040. Policy #0 lag: (min: 0.0, avg: 11.0, max: 21.0) [2024-06-06 12:13:21,757][02692] Avg episode reward: [(0, '0.071')] [2024-06-06 12:13:23,581][02924] Updated weights for policy 0, policy_version 8607 (0.0025) [2024-06-06 12:13:26,757][02692] Fps is (10 sec: 47513.9, 60 sec: 49425.0, 300 sec: 46597.2). Total num frames: 141148160. Throughput: 0: 49417.7. Samples: 38062080. Policy #0 lag: (min: 0.0, avg: 11.0, max: 21.0) [2024-06-06 12:13:26,760][02692] Avg episode reward: [(0, '0.070')] [2024-06-06 12:13:26,770][02904] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000008615_141148160.pth... [2024-06-06 12:13:26,822][02904] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000007945_130170880.pth [2024-06-06 12:13:27,333][02924] Updated weights for policy 0, policy_version 8617 (0.0035) [2024-06-06 12:13:30,330][02924] Updated weights for policy 0, policy_version 8627 (0.0022) [2024-06-06 12:13:31,757][02692] Fps is (10 sec: 52427.9, 60 sec: 49151.9, 300 sec: 46542.1). Total num frames: 141393920. Throughput: 0: 49573.6. Samples: 38357980. Policy #0 lag: (min: 0.0, avg: 6.9, max: 19.0) [2024-06-06 12:13:31,758][02692] Avg episode reward: [(0, '0.069')] [2024-06-06 12:13:33,899][02924] Updated weights for policy 0, policy_version 8637 (0.0032) [2024-06-06 12:13:36,757][02692] Fps is (10 sec: 50790.8, 60 sec: 49425.2, 300 sec: 46541.7). Total num frames: 141656064. Throughput: 0: 49526.3. Samples: 38504940. Policy #0 lag: (min: 1.0, avg: 10.8, max: 24.0) [2024-06-06 12:13:36,757][02692] Avg episode reward: [(0, '0.063')] [2024-06-06 12:13:36,931][02924] Updated weights for policy 0, policy_version 8647 (0.0033) [2024-06-06 12:13:40,270][02924] Updated weights for policy 0, policy_version 8657 (0.0026) [2024-06-06 12:13:41,757][02692] Fps is (10 sec: 49153.1, 60 sec: 49971.2, 300 sec: 46430.6). Total num frames: 141885440. Throughput: 0: 49541.9. Samples: 38806340. Policy #0 lag: (min: 1.0, avg: 10.8, max: 24.0) [2024-06-06 12:13:41,757][02692] Avg episode reward: [(0, '0.069')] [2024-06-06 12:13:43,399][02924] Updated weights for policy 0, policy_version 8667 (0.0030) [2024-06-06 12:13:46,757][02692] Fps is (10 sec: 49151.8, 60 sec: 49698.2, 300 sec: 46652.8). Total num frames: 142147584. Throughput: 0: 49788.5. Samples: 39107480. Policy #0 lag: (min: 0.0, avg: 11.5, max: 21.0) [2024-06-06 12:13:46,757][02692] Avg episode reward: [(0, '0.069')] [2024-06-06 12:13:47,006][02924] Updated weights for policy 0, policy_version 8677 (0.0031) [2024-06-06 12:13:50,002][02924] Updated weights for policy 0, policy_version 8687 (0.0031) [2024-06-06 12:13:51,757][02692] Fps is (10 sec: 50789.9, 60 sec: 49698.1, 300 sec: 46597.2). 
Total num frames: 142393344. Throughput: 0: 49905.8. Samples: 39257920. Policy #0 lag: (min: 1.0, avg: 11.3, max: 21.0) [2024-06-06 12:13:51,757][02692] Avg episode reward: [(0, '0.066')] [2024-06-06 12:13:53,868][02924] Updated weights for policy 0, policy_version 8697 (0.0030) [2024-06-06 12:13:56,661][02924] Updated weights for policy 0, policy_version 8707 (0.0029) [2024-06-06 12:13:56,757][02692] Fps is (10 sec: 50790.8, 60 sec: 49698.2, 300 sec: 46597.2). Total num frames: 142655488. Throughput: 0: 50012.9. Samples: 39555740. Policy #0 lag: (min: 1.0, avg: 11.3, max: 21.0) [2024-06-06 12:13:56,757][02692] Avg episode reward: [(0, '0.072')] [2024-06-06 12:14:00,478][02924] Updated weights for policy 0, policy_version 8717 (0.0028) [2024-06-06 12:14:01,251][02904] Signal inference workers to stop experience collection... (700 times) [2024-06-06 12:14:01,279][02924] InferenceWorker_p0-w0: stopping experience collection (700 times) [2024-06-06 12:14:01,316][02904] Signal inference workers to resume experience collection... (700 times) [2024-06-06 12:14:01,316][02924] InferenceWorker_p0-w0: resuming experience collection (700 times) [2024-06-06 12:14:01,757][02692] Fps is (10 sec: 47513.9, 60 sec: 49971.3, 300 sec: 46375.1). Total num frames: 142868480. Throughput: 0: 49822.3. Samples: 39856020. Policy #0 lag: (min: 0.0, avg: 11.4, max: 21.0) [2024-06-06 12:14:01,757][02692] Avg episode reward: [(0, '0.068')] [2024-06-06 12:14:03,258][02924] Updated weights for policy 0, policy_version 8727 (0.0036) [2024-06-06 12:14:06,757][02692] Fps is (10 sec: 47513.3, 60 sec: 49971.2, 300 sec: 46652.7). Total num frames: 143130624. Throughput: 0: 49518.7. Samples: 39994380. Policy #0 lag: (min: 0.0, avg: 10.9, max: 22.0) [2024-06-06 12:14:06,757][02692] Avg episode reward: [(0, '0.072')] [2024-06-06 12:14:06,771][02904] Saving new best policy, reward=0.072! [2024-06-06 12:14:07,019][02924] Updated weights for policy 0, policy_version 8737 (0.0032) [2024-06-06 12:14:09,681][02924] Updated weights for policy 0, policy_version 8747 (0.0027) [2024-06-06 12:14:11,757][02692] Fps is (10 sec: 50790.5, 60 sec: 49698.1, 300 sec: 46542.1). Total num frames: 143376384. Throughput: 0: 49506.3. Samples: 40289860. Policy #0 lag: (min: 0.0, avg: 10.9, max: 22.0) [2024-06-06 12:14:11,757][02692] Avg episode reward: [(0, '0.064')] [2024-06-06 12:14:13,785][02924] Updated weights for policy 0, policy_version 8757 (0.0042) [2024-06-06 12:14:16,411][02924] Updated weights for policy 0, policy_version 8767 (0.0029) [2024-06-06 12:14:16,757][02692] Fps is (10 sec: 50790.1, 60 sec: 49425.1, 300 sec: 46597.2). Total num frames: 143638528. Throughput: 0: 49561.9. Samples: 40588260. Policy #0 lag: (min: 0.0, avg: 10.0, max: 22.0) [2024-06-06 12:14:16,757][02692] Avg episode reward: [(0, '0.074')] [2024-06-06 12:14:16,799][02904] Saving new best policy, reward=0.074! [2024-06-06 12:14:20,631][02924] Updated weights for policy 0, policy_version 8777 (0.0035) [2024-06-06 12:14:21,757][02692] Fps is (10 sec: 49151.9, 60 sec: 49971.2, 300 sec: 46430.6). Total num frames: 143867904. Throughput: 0: 49637.8. Samples: 40738640. Policy #0 lag: (min: 0.0, avg: 9.6, max: 21.0) [2024-06-06 12:14:21,757][02692] Avg episode reward: [(0, '0.070')] [2024-06-06 12:14:23,197][02924] Updated weights for policy 0, policy_version 8787 (0.0037) [2024-06-06 12:14:26,757][02692] Fps is (10 sec: 47513.8, 60 sec: 49425.1, 300 sec: 46652.7). Total num frames: 144113664. Throughput: 0: 49556.4. Samples: 41036380. 
Policy #0 lag: (min: 0.0, avg: 9.6, max: 21.0) [2024-06-06 12:14:26,757][02692] Avg episode reward: [(0, '0.072')] [2024-06-06 12:14:26,989][02924] Updated weights for policy 0, policy_version 8797 (0.0027) [2024-06-06 12:14:29,888][02924] Updated weights for policy 0, policy_version 8807 (0.0026) [2024-06-06 12:14:31,757][02692] Fps is (10 sec: 50790.2, 60 sec: 49698.3, 300 sec: 46597.2). Total num frames: 144375808. Throughput: 0: 49497.3. Samples: 41334860. Policy #0 lag: (min: 0.0, avg: 8.4, max: 20.0) [2024-06-06 12:14:31,757][02692] Avg episode reward: [(0, '0.073')] [2024-06-06 12:14:33,389][02924] Updated weights for policy 0, policy_version 8817 (0.0019) [2024-06-06 12:14:36,247][02924] Updated weights for policy 0, policy_version 8827 (0.0030) [2024-06-06 12:14:36,757][02692] Fps is (10 sec: 50789.5, 60 sec: 49424.9, 300 sec: 46597.2). Total num frames: 144621568. Throughput: 0: 49600.3. Samples: 41489940. Policy #0 lag: (min: 1.0, avg: 10.6, max: 22.0) [2024-06-06 12:14:36,758][02692] Avg episode reward: [(0, '0.074')] [2024-06-06 12:14:40,106][02924] Updated weights for policy 0, policy_version 8837 (0.0023) [2024-06-06 12:14:41,757][02692] Fps is (10 sec: 49151.9, 60 sec: 49698.1, 300 sec: 46486.1). Total num frames: 144867328. Throughput: 0: 49546.5. Samples: 41785340. Policy #0 lag: (min: 1.0, avg: 10.6, max: 22.0) [2024-06-06 12:14:41,757][02692] Avg episode reward: [(0, '0.069')] [2024-06-06 12:14:42,969][02924] Updated weights for policy 0, policy_version 8847 (0.0037) [2024-06-06 12:14:46,757][02692] Fps is (10 sec: 47514.2, 60 sec: 49152.0, 300 sec: 46597.2). Total num frames: 145096704. Throughput: 0: 49396.8. Samples: 42078880. Policy #0 lag: (min: 1.0, avg: 11.2, max: 21.0) [2024-06-06 12:14:46,757][02692] Avg episode reward: [(0, '0.072')] [2024-06-06 12:14:46,837][02924] Updated weights for policy 0, policy_version 8857 (0.0029) [2024-06-06 12:14:49,737][02924] Updated weights for policy 0, policy_version 8867 (0.0037) [2024-06-06 12:14:51,757][02692] Fps is (10 sec: 47512.9, 60 sec: 49151.9, 300 sec: 46541.7). Total num frames: 145342464. Throughput: 0: 49561.6. Samples: 42224660. Policy #0 lag: (min: 0.0, avg: 11.0, max: 21.0) [2024-06-06 12:14:51,758][02692] Avg episode reward: [(0, '0.071')] [2024-06-06 12:14:53,358][02924] Updated weights for policy 0, policy_version 8877 (0.0030) [2024-06-06 12:14:56,438][02924] Updated weights for policy 0, policy_version 8887 (0.0027) [2024-06-06 12:14:56,757][02692] Fps is (10 sec: 50789.5, 60 sec: 49151.7, 300 sec: 46597.2). Total num frames: 145604608. Throughput: 0: 49505.9. Samples: 42517640. Policy #0 lag: (min: 0.0, avg: 11.0, max: 21.0) [2024-06-06 12:14:56,758][02692] Avg episode reward: [(0, '0.066')] [2024-06-06 12:14:59,967][02924] Updated weights for policy 0, policy_version 8897 (0.0040) [2024-06-06 12:15:01,203][02904] Signal inference workers to stop experience collection... (750 times) [2024-06-06 12:15:01,203][02904] Signal inference workers to resume experience collection... (750 times) [2024-06-06 12:15:01,249][02924] InferenceWorker_p0-w0: stopping experience collection (750 times) [2024-06-06 12:15:01,249][02924] InferenceWorker_p0-w0: resuming experience collection (750 times) [2024-06-06 12:15:01,757][02692] Fps is (10 sec: 50790.7, 60 sec: 49698.0, 300 sec: 46486.1). Total num frames: 145850368. Throughput: 0: 49744.3. Samples: 42826760. 
[2024-06-06 12:15:01,758][02692] Avg episode reward: [(0, '0.074')]
[2024-06-06 12:15:02,931][02924] Updated weights for policy 0, policy_version 8907 (0.0025)
[2024-06-06 12:15:06,455][02924] Updated weights for policy 0, policy_version 8917 (0.0032)
[2024-06-06 12:15:06,757][02692] Fps is (10 sec: 49152.9, 60 sec: 49425.0, 300 sec: 46652.8). Total num frames: 146096128. Throughput: 0: 49411.9. Samples: 42962180. Policy #0 lag: (min: 0.0, avg: 10.2, max: 21.0)
[2024-06-06 12:15:06,757][02692] Avg episode reward: [(0, '0.072')]
[2024-06-06 12:15:09,661][02924] Updated weights for policy 0, policy_version 8927 (0.0030)
[2024-06-06 12:15:11,757][02692] Fps is (10 sec: 50790.9, 60 sec: 49698.1, 300 sec: 46652.7). Total num frames: 146358272. Throughput: 0: 49230.2. Samples: 43251740. Policy #0 lag: (min: 0.0, avg: 10.2, max: 21.0)
[2024-06-06 12:15:11,758][02692] Avg episode reward: [(0, '0.075')]
[2024-06-06 12:15:13,254][02924] Updated weights for policy 0, policy_version 8937 (0.0023)
[2024-06-06 12:15:16,533][02924] Updated weights for policy 0, policy_version 8947 (0.0028)
[2024-06-06 12:15:16,757][02692] Fps is (10 sec: 50790.7, 60 sec: 49425.1, 300 sec: 46597.2). Total num frames: 146604032. Throughput: 0: 49283.2. Samples: 43552600. Policy #0 lag: (min: 0.0, avg: 11.2, max: 23.0)
[2024-06-06 12:15:16,757][02692] Avg episode reward: [(0, '0.072')]
[2024-06-06 12:15:19,644][02924] Updated weights for policy 0, policy_version 8957 (0.0038)
[2024-06-06 12:15:21,757][02692] Fps is (10 sec: 47513.4, 60 sec: 49425.0, 300 sec: 46375.1). Total num frames: 146833408. Throughput: 0: 49089.0. Samples: 43698940. Policy #0 lag: (min: 0.0, avg: 8.3, max: 20.0)
[2024-06-06 12:15:21,757][02692] Avg episode reward: [(0, '0.073')]
[2024-06-06 12:15:23,152][02924] Updated weights for policy 0, policy_version 8967 (0.0023)
[2024-06-06 12:15:26,280][02924] Updated weights for policy 0, policy_version 8977 (0.0021)
[2024-06-06 12:15:26,759][02692] Fps is (10 sec: 47502.1, 60 sec: 49423.1, 300 sec: 46541.3). Total num frames: 147079168. Throughput: 0: 49077.0. Samples: 43993920. Policy #0 lag: (min: 0.0, avg: 8.3, max: 20.0)
[2024-06-06 12:15:26,760][02692] Avg episode reward: [(0, '0.069')]
[2024-06-06 12:15:26,778][02904] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000008977_147079168.pth...
[2024-06-06 12:15:26,819][02904] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000008253_135217152.pth
[2024-06-06 12:15:29,739][02924] Updated weights for policy 0, policy_version 8987 (0.0034)
[2024-06-06 12:15:31,757][02692] Fps is (10 sec: 50789.9, 60 sec: 49424.9, 300 sec: 46652.7). Total num frames: 147341312. Throughput: 0: 49119.0. Samples: 44289240. Policy #0 lag: (min: 0.0, avg: 10.0, max: 23.0)
[2024-06-06 12:15:31,758][02692] Avg episode reward: [(0, '0.071')]
[2024-06-06 12:15:32,983][02924] Updated weights for policy 0, policy_version 8997 (0.0033)
[2024-06-06 12:15:36,316][02924] Updated weights for policy 0, policy_version 9007 (0.0037)
[2024-06-06 12:15:36,757][02692] Fps is (10 sec: 50802.3, 60 sec: 49425.2, 300 sec: 46541.7). Total num frames: 147587072. Throughput: 0: 49407.7. Samples: 44448000. Policy #0 lag: (min: 0.0, avg: 9.0, max: 22.0)
[2024-06-06 12:15:36,757][02692] Avg episode reward: [(0, '0.071')]
[2024-06-06 12:15:39,717][02924] Updated weights for policy 0, policy_version 9017 (0.0034)
[2024-06-06 12:15:41,757][02692] Fps is (10 sec: 47513.9, 60 sec: 49151.9, 300 sec: 46430.6). Total num frames: 147816448. Throughput: 0: 49305.9. Samples: 44736400. Policy #0 lag: (min: 0.0, avg: 9.0, max: 22.0)
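The "Saving .../checkpoint_000008977_147079168.pth" and "Removing .../checkpoint_000008253_135217152.pth" pair above shows checkpoint rotation: a new checkpoint named by policy version and total frame count is written, and the oldest one is deleted. A minimal sketch of that pattern follows; save_and_rotate and the keep parameter are hypothetical and this is not the project's actual checkpointing code.

```python
# Hypothetical checkpoint rotation matching the naming pattern in the log
# (illustration only).
import glob
import os

import torch


def save_and_rotate(state_dict, ckpt_dir, policy_version, total_frames, keep=2):
    # e.g. checkpoint_000008977_147079168.pth
    name = f"checkpoint_{policy_version:09d}_{total_frames}.pth"
    torch.save(state_dict, os.path.join(ckpt_dir, name))

    # zero-padded version numbers sort chronologically, so the oldest come first
    checkpoints = sorted(glob.glob(os.path.join(ckpt_dir, "checkpoint_*.pth")))
    for old in checkpoints[:-keep]:
        os.remove(old)  # e.g. "Removing .../checkpoint_000008253_135217152.pth"
```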
[2024-06-06 12:15:41,758][02692] Avg episode reward: [(0, '0.071')]
[2024-06-06 12:15:43,051][02924] Updated weights for policy 0, policy_version 9027 (0.0034)
[2024-06-06 12:15:46,183][02924] Updated weights for policy 0, policy_version 9037 (0.0029)
[2024-06-06 12:15:46,759][02692] Fps is (10 sec: 47502.8, 60 sec: 49423.2, 300 sec: 46485.8). Total num frames: 148062208. Throughput: 0: 48932.7. Samples: 45028840. Policy #0 lag: (min: 0.0, avg: 10.9, max: 20.0)
[2024-06-06 12:15:46,760][02692] Avg episode reward: [(0, '0.073')]
[2024-06-06 12:15:49,698][02924] Updated weights for policy 0, policy_version 9047 (0.0027)
[2024-06-06 12:15:51,757][02692] Fps is (10 sec: 49152.6, 60 sec: 49425.2, 300 sec: 46541.7). Total num frames: 148307968. Throughput: 0: 49212.1. Samples: 45176720. Policy #0 lag: (min: 0.0, avg: 10.5, max: 21.0)
[2024-06-06 12:15:51,757][02692] Avg episode reward: [(0, '0.073')]
[2024-06-06 12:15:52,730][02924] Updated weights for policy 0, policy_version 9057 (0.0026)
[2024-06-06 12:15:56,280][02924] Updated weights for policy 0, policy_version 9067 (0.0029)
[2024-06-06 12:15:56,760][02692] Fps is (10 sec: 50787.1, 60 sec: 49422.8, 300 sec: 49462.3). Total num frames: 148570112. Throughput: 0: 49504.3. Samples: 45479580. Policy #0 lag: (min: 0.0, avg: 10.5, max: 21.0)
[2024-06-06 12:15:56,760][02692] Avg episode reward: [(0, '0.066')]
[2024-06-06 12:15:59,500][02924] Updated weights for policy 0, policy_version 9077 (0.0030)
[2024-06-06 12:16:01,757][02692] Fps is (10 sec: 47513.5, 60 sec: 48879.0, 300 sec: 49344.5). Total num frames: 148783104. Throughput: 0: 49338.2. Samples: 45772820. Policy #0 lag: (min: 0.0, avg: 10.8, max: 23.0)
[2024-06-06 12:16:01,757][02692] Avg episode reward: [(0, '0.072')]
[2024-06-06 12:16:02,899][02924] Updated weights for policy 0, policy_version 9087 (0.0037)
[2024-06-06 12:16:04,019][02904] Signal inference workers to stop experience collection... (800 times)
[2024-06-06 12:16:04,019][02904] Signal inference workers to resume experience collection... (800 times)
[2024-06-06 12:16:04,054][02924] InferenceWorker_p0-w0: stopping experience collection (800 times)
[2024-06-06 12:16:04,055][02924] InferenceWorker_p0-w0: resuming experience collection (800 times)
[2024-06-06 12:16:06,482][02924] Updated weights for policy 0, policy_version 9097 (0.0029)
[2024-06-06 12:16:06,757][02692] Fps is (10 sec: 47527.3, 60 sec: 49152.0, 300 sec: 49398.8). Total num frames: 149045248. Throughput: 0: 49174.2. Samples: 45911780. Policy #0 lag: (min: 0.0, avg: 11.4, max: 21.0)
[2024-06-06 12:16:06,758][02692] Avg episode reward: [(0, '0.072')]
[2024-06-06 12:16:09,823][02924] Updated weights for policy 0, policy_version 9107 (0.0025)
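Each report above also carries a "Policy #0 lag: (min, avg, max)" triple, i.e. how many policy versions behind the learner the collected rollout data was. A minimal, purely illustrative sketch of how such a summary could be produced from the policy versions attached to queued rollout data is shown below; policy_lag_summary is a hypothetical helper, not this trainer's actual code.

```python
# Hypothetical (min, avg, max) policy-lag summary (illustration only).

def policy_lag_summary(current_version, queued_versions):
    if not queued_versions:
        return {"min": 0.0, "avg": 0.0, "max": 0.0}
    lags = [current_version - v for v in queued_versions]
    return {
        "min": float(min(lags)),
        "avg": sum(lags) / len(lags),
        "max": float(max(lags)),
    }


# Example in the same format as the log:
# policy_lag_summary(9107, [9086, 9096, 9100])
# -> {'min': 7.0, 'avg': 13.0, 'max': 21.0}
```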