[2024-06-05 17:50:36,095][10130] Saving configuration to /workspace/metta/train_dir/p2.metta.4/config.json...
[2024-06-05 17:50:36,112][10130] Rollout worker 0 uses device cpu
[2024-06-05 17:50:36,112][10130] Rollout worker 1 uses device cpu
[2024-06-05 17:50:36,113][10130] Rollout worker 2 uses device cpu
[2024-06-05 17:50:36,113][10130] Rollout worker 3 uses device cpu
[2024-06-05 17:50:36,113][10130] Rollout worker 4 uses device cpu
[2024-06-05 17:50:36,113][10130] Rollout worker 5 uses device cpu
[2024-06-05 17:50:36,114][10130] Rollout worker 6 uses device cpu
[2024-06-05 17:50:36,114][10130] Rollout worker 7 uses device cpu
[2024-06-05 17:50:36,114][10130] Rollout worker 8 uses device cpu
[2024-06-05 17:50:36,114][10130] Rollout worker 9 uses device cpu
[2024-06-05 17:50:36,115][10130] Rollout worker 10 uses device cpu
[2024-06-05 17:50:36,115][10130] Rollout worker 11 uses device cpu
[2024-06-05 17:50:36,115][10130] Rollout worker 12 uses device cpu
[2024-06-05 17:50:36,116][10130] Rollout worker 13 uses device cpu
[2024-06-05 17:50:36,116][10130] Rollout worker 14 uses device cpu
[2024-06-05 17:50:36,116][10130] Rollout worker 15 uses device cpu
[2024-06-05 17:50:36,116][10130] Rollout worker 16 uses device cpu
[2024-06-05 17:50:36,117][10130] Rollout worker 17 uses device cpu
[2024-06-05 17:50:36,117][10130] Rollout worker 18 uses device cpu
[2024-06-05 17:50:36,117][10130] Rollout worker 19 uses device cpu
[2024-06-05 17:50:36,117][10130] Rollout worker 20 uses device cpu
[2024-06-05 17:50:36,117][10130] Rollout worker 21 uses device cpu
[2024-06-05 17:50:36,117][10130] Rollout worker 22 uses device cpu
[2024-06-05 17:50:36,117][10130] Rollout worker 23 uses device cpu
[2024-06-05 17:50:36,118][10130] Rollout worker 24 uses device cpu
[2024-06-05 17:50:36,118][10130] Rollout worker 25 uses device cpu
[2024-06-05 17:50:36,118][10130] Rollout worker 26 uses device cpu
[2024-06-05 17:50:36,118][10130] Rollout worker 27 uses device cpu
[2024-06-05 17:50:36,118][10130] Rollout worker 28 uses device cpu
[2024-06-05 17:50:36,118][10130] Rollout worker 29 uses device cpu
[2024-06-05 17:50:36,118][10130] Rollout worker 30 uses device cpu
[2024-06-05 17:50:36,119][10130] Rollout worker 31 uses device cpu
[2024-06-05 17:50:36,630][10130] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-06-05 17:50:36,630][10130] InferenceWorker_p0-w0: min num requests: 10
[2024-06-05 17:50:36,673][10130] Starting all processes...
[2024-06-05 17:50:36,673][10130] Starting process learner_proc0
[2024-06-05 17:50:36,947][10130] Starting all processes...
[2024-06-05 17:50:36,950][10130] Starting process inference_proc0-0
[2024-06-05 17:50:36,950][10130] Starting process rollout_proc0
[2024-06-05 17:50:36,950][10130] Starting process rollout_proc1
[2024-06-05 17:50:36,950][10130] Starting process rollout_proc2
[2024-06-05 17:50:36,950][10130] Starting process rollout_proc3
[2024-06-05 17:50:36,951][10130] Starting process rollout_proc4
[2024-06-05 17:50:36,951][10130] Starting process rollout_proc5
[2024-06-05 17:50:36,951][10130] Starting process rollout_proc6
[2024-06-05 17:50:36,951][10130] Starting process rollout_proc7
[2024-06-05 17:50:36,951][10130] Starting process rollout_proc8
[2024-06-05 17:50:36,951][10130] Starting process rollout_proc9
[2024-06-05 17:50:36,951][10130] Starting process rollout_proc10
[2024-06-05 17:50:36,952][10130] Starting process rollout_proc11
[2024-06-05 17:50:36,952][10130] Starting process rollout_proc12
[2024-06-05 17:50:36,953][10130] Starting process rollout_proc13
[2024-06-05 17:50:36,954][10130] Starting process rollout_proc14
[2024-06-05 17:50:36,954][10130] Starting process rollout_proc15
[2024-06-05 17:50:36,960][10130] Starting process rollout_proc16
[2024-06-05 17:50:36,960][10130] Starting process rollout_proc17
[2024-06-05 17:50:36,960][10130] Starting process rollout_proc18
[2024-06-05 17:50:36,961][10130] Starting process rollout_proc19
[2024-06-05 17:50:36,961][10130] Starting process rollout_proc20
[2024-06-05 17:50:36,961][10130] Starting process rollout_proc21
[2024-06-05 17:50:36,963][10130] Starting process rollout_proc22
[2024-06-05 17:50:36,964][10130] Starting process rollout_proc23
[2024-06-05 17:50:36,966][10130] Starting process rollout_proc24
[2024-06-05 17:50:36,969][10130] Starting process rollout_proc25
[2024-06-05 17:50:36,970][10130] Starting process rollout_proc26
[2024-06-05 17:50:36,970][10130] Starting process rollout_proc27
[2024-06-05 17:50:36,974][10130] Starting process rollout_proc28
[2024-06-05 17:50:36,974][10130] Starting process rollout_proc29
[2024-06-05 17:50:36,975][10130] Starting process rollout_proc30
[2024-06-05 17:50:36,978][10130] Starting process rollout_proc31
[2024-06-05 17:50:38,851][10396] Worker 27 uses CPU cores [27]
[2024-06-05 17:50:38,866][10384] Worker 16 uses CPU cores [16]
[2024-06-05 17:50:39,028][10367] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-06-05 17:50:39,028][10367] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2024-06-05 17:50:39,043][10367] Num visible devices: 1
[2024-06-05 17:50:39,067][10376] Worker 8 uses CPU cores [8]
[2024-06-05 17:50:39,079][10389] Worker 22 uses CPU cores [22]
[2024-06-05 17:50:39,099][10375] Worker 7 uses CPU cores [7]
[2024-06-05 17:50:39,127][10371] Worker 1 uses CPU cores [1]
[2024-06-05 17:50:39,155][10381] Worker 14 uses CPU cores [14]
[2024-06-05 17:50:39,171][10393] Worker 25 uses CPU cores [25]
[2024-06-05 17:50:39,175][10397] Worker 29 uses CPU cores [29]
[2024-06-05 17:50:39,179][10394] Worker 26 uses CPU cores [26]
[2024-06-05 17:50:39,195][10391] Worker 24 uses CPU cores [24]
[2024-06-05 17:50:39,215][10373] Worker 5 uses CPU cores [5]
[2024-06-05 17:50:39,234][10374] Worker 6 uses CPU cores [6]
[2024-06-05 17:50:39,239][10387] Worker 19 uses CPU cores [19]
[2024-06-05 17:50:39,277][10369] Worker 3 uses CPU cores [3]
[2024-06-05 17:50:39,317][10382] Worker 15 uses CPU cores [15]
[2024-06-05 17:50:39,322][10377] Worker 9 uses CPU cores [9]
[2024-06-05 17:50:39,351][10368] Worker 0 uses CPU cores [0]
[2024-06-05 17:50:39,363][10372] Worker 4 uses CPU cores [4]
[2024-06-05 17:50:39,368][10383] Worker 13 uses CPU cores [13]
[2024-06-05 17:50:39,371][10390] Worker 23 uses CPU cores [23]
[2024-06-05 17:50:39,391][10370] Worker 2 uses CPU cores [2]
[2024-06-05 17:50:39,391][10395] Worker 28 uses CPU cores [28]
[2024-06-05 17:50:39,399][10380] Worker 12 uses CPU cores [12]
[2024-06-05 17:50:39,401][10379] Worker 11 uses CPU cores [11]
[2024-06-05 17:50:39,434][10386] Worker 18 uses CPU cores [18]
[2024-06-05 17:50:39,445][10378] Worker 10 uses CPU cores [10]
[2024-06-05 17:50:39,516][10399] Worker 30 uses CPU cores [30]
[2024-06-05 17:50:39,517][10392] Worker 21 uses CPU cores [21]
[2024-06-05 17:50:39,526][10398] Worker 31 uses CPU cores [31]
[2024-06-05 17:50:39,532][10388] Worker 20 uses CPU cores [20]
[2024-06-05 17:50:39,537][10385] Worker 17 uses CPU cores [17]
[2024-06-05 17:50:39,563][10347] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-06-05 17:50:39,563][10347] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2024-06-05 17:50:39,570][10347] Num visible devices: 1
[2024-06-05 17:50:39,580][10347] Setting fixed seed 0
[2024-06-05 17:50:39,580][10347] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-06-05 17:50:39,580][10347] Initializing actor-critic model on device cuda:0
[2024-06-05 17:50:40,184][10347] RunningMeanStd input shape: (11, 11)
[2024-06-05 17:50:40,185][10347] RunningMeanStd input shape: (11, 11)
[2024-06-05 17:50:40,185][10347] RunningMeanStd input shape: (11, 11)
[2024-06-05 17:50:40,185][10347] RunningMeanStd input shape: (11, 11)
[2024-06-05 17:50:40,185][10347] RunningMeanStd input shape: (11, 11)
[2024-06-05 17:50:40,185][10347] RunningMeanStd input shape: (11, 11)
[2024-06-05 17:50:40,185][10347] RunningMeanStd input shape: (11, 11)
[2024-06-05 17:50:40,185][10347] RunningMeanStd input shape: (11, 11)
[2024-06-05 17:50:40,185][10347] RunningMeanStd input shape: (11, 11)
[2024-06-05 17:50:40,185][10347] RunningMeanStd input shape: (11, 11)
[2024-06-05 17:50:40,185][10347] RunningMeanStd input shape: (11, 11)
[2024-06-05 17:50:40,185][10347] RunningMeanStd input shape: (11, 11)
[2024-06-05 17:50:40,185][10347] RunningMeanStd input shape: (11, 11)
[2024-06-05 17:50:40,185][10347] RunningMeanStd input shape: (11, 11)
[2024-06-05 17:50:40,185][10347] RunningMeanStd input shape: (11, 11)
[2024-06-05 17:50:40,185][10347] RunningMeanStd input shape: (11, 11)
[2024-06-05 17:50:40,185][10347] RunningMeanStd input shape: (11, 11)
[2024-06-05 17:50:40,185][10347] RunningMeanStd input shape: (11, 11)
[2024-06-05 17:50:40,185][10347] RunningMeanStd input shape: (11, 11)
[2024-06-05 17:50:40,185][10347] RunningMeanStd input shape: (11, 11)
[2024-06-05 17:50:40,185][10347] RunningMeanStd input shape: (11, 11)
[2024-06-05 17:50:40,185][10347] RunningMeanStd input shape: (11, 11)
[2024-06-05 17:50:40,189][10347] RunningMeanStd input shape: (1,)
[2024-06-05 17:50:40,189][10347] RunningMeanStd input shape: (1,)
[2024-06-05 17:50:40,189][10347] RunningMeanStd input shape: (1,)
[2024-06-05 17:50:40,189][10347] RunningMeanStd input shape: (1,)
[2024-06-05 17:50:40,228][10347] RunningMeanStd input shape: (1,)
[2024-06-05 17:50:40,232][10347] Created Actor Critic model with architecture:
[2024-06-05 17:50:40,232][10347] SampleFactoryAgentWrapper(
  (obs_normalizer): ObservationNormalizer()
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (agent): MettaAgent(
    (_encoder): MultiFeatureSetEncoder(
      (feature_set_encoders): ModuleDict(
        (grid_obs): FeatureSetEncoder(
          (_normalizer): FeatureListNormalizer(
            (_norms_dict): ModuleDict(
              (agent): RunningMeanStdInPlace()
              (altar): RunningMeanStdInPlace()
              (converter): RunningMeanStdInPlace()
              (generator): RunningMeanStdInPlace()
              (wall): RunningMeanStdInPlace()
              (agent:dir): RunningMeanStdInPlace()
              (agent:energy): RunningMeanStdInPlace()
              (agent:frozen): RunningMeanStdInPlace()
              (agent:hp): RunningMeanStdInPlace()
              (agent:id): RunningMeanStdInPlace()
              (agent:inv_r1): RunningMeanStdInPlace()
              (agent:inv_r2): RunningMeanStdInPlace()
              (agent:inv_r3): RunningMeanStdInPlace()
              (agent:shield): RunningMeanStdInPlace()
              (altar:hp): RunningMeanStdInPlace()
              (altar:state): RunningMeanStdInPlace()
              (converter:hp): RunningMeanStdInPlace()
              (converter:state): RunningMeanStdInPlace()
              (generator:amount): RunningMeanStdInPlace()
              (generator:hp): RunningMeanStdInPlace()
              (generator:state): RunningMeanStdInPlace()
              (wall:hp): RunningMeanStdInPlace()
            )
          )
          (embedding_net): Sequential(
            (0): Linear(in_features=125, out_features=512, bias=True)
            (1): ELU(alpha=1.0)
            (2): Linear(in_features=512, out_features=512, bias=True)
            (3): ELU(alpha=1.0)
            (4): Linear(in_features=512, out_features=512, bias=True)
            (5): ELU(alpha=1.0)
            (6): Linear(in_features=512, out_features=512, bias=True)
            (7): ELU(alpha=1.0)
          )
        )
        (global_vars): FeatureSetEncoder(
          (_normalizer): FeatureListNormalizer(
            (_norms_dict): ModuleDict(
              (_steps): RunningMeanStdInPlace()
            )
          )
          (embedding_net): Sequential(
            (0): Linear(in_features=5, out_features=8, bias=True)
            (1): ELU(alpha=1.0)
            (2): Linear(in_features=8, out_features=8, bias=True)
            (3): ELU(alpha=1.0)
          )
        )
        (last_action): FeatureSetEncoder(
          (_normalizer): FeatureListNormalizer(
            (_norms_dict): ModuleDict(
              (last_action_id): RunningMeanStdInPlace()
              (last_action_val): RunningMeanStdInPlace()
            )
          )
          (embedding_net): Sequential(
            (0): Linear(in_features=5, out_features=8, bias=True)
            (1): ELU(alpha=1.0)
            (2): Linear(in_features=8, out_features=8, bias=True)
            (3): ELU(alpha=1.0)
          )
        )
        (last_reward): FeatureSetEncoder(
          (_normalizer): FeatureListNormalizer(
            (_norms_dict): ModuleDict(
              (last_reward): RunningMeanStdInPlace()
            )
          )
          (embedding_net): Sequential(
            (0): Linear(in_features=5, out_features=8, bias=True)
            (1): ELU(alpha=1.0)
            (2): Linear(in_features=8, out_features=8, bias=True)
            (3): ELU(alpha=1.0)
          )
        )
      )
      (merged_encoder): Sequential(
        (0): Linear(in_features=536, out_features=512, bias=True)
        (1): ELU(alpha=1.0)
        (2): Linear(in_features=512, out_features=512, bias=True)
        (3): ELU(alpha=1.0)
        (4): Linear(in_features=512, out_features=512, bias=True)
        (5): ELU(alpha=1.0)
      )
    )
    (_core): ModelCoreRNN(
      (core): GRU(512, 512)
    )
    (_decoder): Decoder(
      (mlp): Identity()
    )
    (_critic_linear): Linear(in_features=512, out_features=1, bias=True)
    (_action_parameterization): ActionParameterizationDefault(
      (distribution_linear): Linear(in_features=512, out_features=16, bias=True)
    )
  )
)
[2024-06-05 17:50:40,295][10347] Using optimizer
[2024-06-05 17:50:40,442][10347] No checkpoints found
[2024-06-05 17:50:40,442][10347] Did not load from checkpoint, starting from scratch!
[2024-06-05 17:50:40,442][10347] Initialized policy 0 weights for model version 0
[2024-06-05 17:50:40,444][10347] LearnerWorker_p0 finished initialization!
[2024-06-05 17:50:40,444][10347] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-06-05 17:50:41,082][10367] RunningMeanStd input shape: (11, 11)
[2024-06-05 17:50:41,083][10367] RunningMeanStd input shape: (11, 11)
[2024-06-05 17:50:41,083][10367] RunningMeanStd input shape: (11, 11)
[2024-06-05 17:50:41,083][10367] RunningMeanStd input shape: (11, 11)
[2024-06-05 17:50:41,083][10367] RunningMeanStd input shape: (11, 11)
[2024-06-05 17:50:41,083][10367] RunningMeanStd input shape: (11, 11)
[2024-06-05 17:50:41,083][10367] RunningMeanStd input shape: (11, 11)
[2024-06-05 17:50:41,083][10367] RunningMeanStd input shape: (11, 11)
[2024-06-05 17:50:41,083][10367] RunningMeanStd input shape: (11, 11)
[2024-06-05 17:50:41,083][10367] RunningMeanStd input shape: (11, 11)
[2024-06-05 17:50:41,083][10367] RunningMeanStd input shape: (11, 11)
[2024-06-05 17:50:41,083][10367] RunningMeanStd input shape: (11, 11)
[2024-06-05 17:50:41,083][10367] RunningMeanStd input shape: (11, 11)
[2024-06-05 17:50:41,083][10367] RunningMeanStd input shape: (11, 11)
[2024-06-05 17:50:41,083][10367] RunningMeanStd input shape: (11, 11)
[2024-06-05 17:50:41,083][10367] RunningMeanStd input shape: (11, 11)
[2024-06-05 17:50:41,083][10367] RunningMeanStd input shape: (11, 11)
[2024-06-05 17:50:41,083][10367] RunningMeanStd input shape: (11, 11)
[2024-06-05 17:50:41,083][10367] RunningMeanStd input shape: (11, 11)
[2024-06-05 17:50:41,084][10367] RunningMeanStd input shape: (11, 11)
[2024-06-05 17:50:41,084][10367] RunningMeanStd input shape: (11, 11)
[2024-06-05 17:50:41,084][10367] RunningMeanStd input shape: (11, 11)
[2024-06-05 17:50:41,087][10367] RunningMeanStd input shape: (1,)
[2024-06-05 17:50:41,087][10367] RunningMeanStd input shape: (1,)
[2024-06-05 17:50:41,087][10367] RunningMeanStd input shape: (1,)
[2024-06-05 17:50:41,087][10367] RunningMeanStd input shape: (1,)
[2024-06-05 17:50:41,126][10367] RunningMeanStd input shape: (1,)
[2024-06-05 17:50:41,148][10130] Inference worker 0-0 is ready!
[2024-06-05 17:50:41,148][10130] All inference workers are ready! Signal rollout workers to start!
[2024-06-05 17:50:43,238][10387] Decorrelating experience for 0 frames...
[2024-06-05 17:50:43,238][10389] Decorrelating experience for 0 frames...
[2024-06-05 17:50:43,252][10390] Decorrelating experience for 0 frames...
[2024-06-05 17:50:43,260][10385] Decorrelating experience for 0 frames...
[2024-06-05 17:50:43,261][10396] Decorrelating experience for 0 frames...
[2024-06-05 17:50:43,261][10388] Decorrelating experience for 0 frames...
[2024-06-05 17:50:43,266][10398] Decorrelating experience for 0 frames...
[2024-06-05 17:50:43,267][10384] Decorrelating experience for 0 frames...
[2024-06-05 17:50:43,271][10391] Decorrelating experience for 0 frames...
[2024-06-05 17:50:43,272][10393] Decorrelating experience for 0 frames...
[2024-06-05 17:50:43,273][10386] Decorrelating experience for 0 frames...
[2024-06-05 17:50:43,274][10399] Decorrelating experience for 0 frames...
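The repeated "RunningMeanStd input shape" lines above show that both the learner and the inference worker build one running mean/std normalizer per (11, 11) grid-observation feature and per scalar feature. The actual normalizer here is a TorchScript `RunningMeanStdInPlace` module whose internals are not shown in the log; the sketch below is a plain NumPy illustration of the general technique (running statistics combined batch-by-batch via Chan et al.'s parallel update), with all names hypothetical.

```python
import numpy as np

class RunningMeanStd:
    """Illustrative running mean/variance tracker for input normalization.

    NOTE: a sketch of the general technique only, not Sample Factory's
    RunningMeanStdInPlace implementation.
    """

    def __init__(self, shape, eps=1e-4):
        self.mean = np.zeros(shape, dtype=np.float64)
        self.var = np.ones(shape, dtype=np.float64)
        self.count = eps  # tiny prior count avoids division by zero

    def update(self, batch):
        # Combine batch statistics with the running statistics
        # (parallel-variance formula).
        batch_mean = batch.mean(axis=0)
        batch_var = batch.var(axis=0)
        batch_count = batch.shape[0]

        delta = batch_mean - self.mean
        total = self.count + batch_count
        new_mean = self.mean + delta * batch_count / total
        m_a = self.var * self.count
        m_b = batch_var * batch_count
        m2 = m_a + m_b + delta ** 2 * self.count * batch_count / total
        self.mean, self.var, self.count = new_mean, m2 / total, total

    def normalize(self, x):
        return (x - self.mean) / np.sqrt(self.var + 1e-8)
```

One such tracker per feature channel (as the log's per-feature `_norms_dict` suggests) keeps each input roughly zero-mean/unit-variance regardless of its raw scale.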
[2024-06-05 17:50:43,277][10397] Decorrelating experience for 0 frames...
[2024-06-05 17:50:43,282][10369] Decorrelating experience for 0 frames...
[2024-06-05 17:50:43,283][10382] Decorrelating experience for 0 frames...
[2024-06-05 17:50:43,283][10379] Decorrelating experience for 0 frames...
[2024-06-05 17:50:43,284][10375] Decorrelating experience for 0 frames...
[2024-06-05 17:50:43,284][10377] Decorrelating experience for 0 frames...
[2024-06-05 17:50:43,286][10373] Decorrelating experience for 0 frames...
[2024-06-05 17:50:43,290][10395] Decorrelating experience for 0 frames...
[2024-06-05 17:50:43,291][10371] Decorrelating experience for 0 frames...
[2024-06-05 17:50:43,292][10368] Decorrelating experience for 0 frames...
[2024-06-05 17:50:43,292][10380] Decorrelating experience for 0 frames...
[2024-06-05 17:50:43,293][10381] Decorrelating experience for 0 frames...
[2024-06-05 17:50:43,294][10372] Decorrelating experience for 0 frames...
[2024-06-05 17:50:43,294][10376] Decorrelating experience for 0 frames...
[2024-06-05 17:50:43,295][10374] Decorrelating experience for 0 frames...
[2024-06-05 17:50:43,296][10378] Decorrelating experience for 0 frames...
[2024-06-05 17:50:43,297][10370] Decorrelating experience for 0 frames...
[2024-06-05 17:50:43,297][10383] Decorrelating experience for 0 frames...
[2024-06-05 17:50:43,298][10392] Decorrelating experience for 0 frames...
[2024-06-05 17:50:43,309][10394] Decorrelating experience for 0 frames...
[2024-06-05 17:50:43,920][10130] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-06-05 17:50:43,980][10387] Decorrelating experience for 256 frames...
[2024-06-05 17:50:43,984][10389] Decorrelating experience for 256 frames...
[2024-06-05 17:50:44,007][10390] Decorrelating experience for 256 frames...
[2024-06-05 17:50:44,017][10385] Decorrelating experience for 256 frames...
[2024-06-05 17:50:44,035][10398] Decorrelating experience for 256 frames...
[2024-06-05 17:50:44,037][10388] Decorrelating experience for 256 frames...
[2024-06-05 17:50:44,042][10384] Decorrelating experience for 256 frames...
[2024-06-05 17:50:44,043][10379] Decorrelating experience for 256 frames...
[2024-06-05 17:50:44,045][10391] Decorrelating experience for 256 frames...
[2024-06-05 17:50:44,050][10396] Decorrelating experience for 256 frames...
[2024-06-05 17:50:44,052][10382] Decorrelating experience for 256 frames...
[2024-06-05 17:50:44,053][10369] Decorrelating experience for 256 frames...
[2024-06-05 17:50:44,057][10373] Decorrelating experience for 256 frames...
[2024-06-05 17:50:44,058][10399] Decorrelating experience for 256 frames...
[2024-06-05 17:50:44,059][10375] Decorrelating experience for 256 frames...
[2024-06-05 17:50:44,061][10377] Decorrelating experience for 256 frames...
[2024-06-05 17:50:44,062][10386] Decorrelating experience for 256 frames...
[2024-06-05 17:50:44,066][10393] Decorrelating experience for 256 frames...
[2024-06-05 17:50:44,068][10380] Decorrelating experience for 256 frames...
[2024-06-05 17:50:44,069][10374] Decorrelating experience for 256 frames...
[2024-06-05 17:50:44,070][10397] Decorrelating experience for 256 frames...
[2024-06-05 17:50:44,071][10371] Decorrelating experience for 256 frames...
[2024-06-05 17:50:44,072][10368] Decorrelating experience for 256 frames...
[2024-06-05 17:50:44,074][10378] Decorrelating experience for 256 frames...
[2024-06-05 17:50:44,074][10372] Decorrelating experience for 256 frames...
[2024-06-05 17:50:44,074][10381] Decorrelating experience for 256 frames...
[2024-06-05 17:50:44,076][10370] Decorrelating experience for 256 frames...
[2024-06-05 17:50:44,078][10376] Decorrelating experience for 256 frames...
[2024-06-05 17:50:44,081][10383] Decorrelating experience for 256 frames...
[2024-06-05 17:50:44,086][10395] Decorrelating experience for 256 frames...
[2024-06-05 17:50:44,104][10394] Decorrelating experience for 256 frames...
[2024-06-05 17:50:44,105][10392] Decorrelating experience for 256 frames...
[2024-06-05 17:50:48,920][10130] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 31084.7. Samples: 155420. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-06-05 17:50:49,852][10381] Worker 14, sleep for 65.625 sec to decorrelate experience collection
[2024-06-05 17:50:49,852][10389] Worker 22, sleep for 103.125 sec to decorrelate experience collection
[2024-06-05 17:50:49,852][10385] Worker 17, sleep for 79.688 sec to decorrelate experience collection
[2024-06-05 17:50:49,853][10391] Worker 24, sleep for 112.500 sec to decorrelate experience collection
[2024-06-05 17:50:49,862][10383] Worker 13, sleep for 60.938 sec to decorrelate experience collection
[2024-06-05 17:50:49,862][10369] Worker 3, sleep for 14.062 sec to decorrelate experience collection
[2024-06-05 17:50:49,862][10378] Worker 10, sleep for 46.875 sec to decorrelate experience collection
[2024-06-05 17:50:49,862][10370] Worker 2, sleep for 9.375 sec to decorrelate experience collection
[2024-06-05 17:50:49,863][10390] Worker 23, sleep for 107.812 sec to decorrelate experience collection
[2024-06-05 17:50:49,863][10384] Worker 16, sleep for 75.000 sec to decorrelate experience collection
[2024-06-05 17:50:49,863][10387] Worker 19, sleep for 89.062 sec to decorrelate experience collection
[2024-06-05 17:50:49,863][10399] Worker 30, sleep for 140.625 sec to decorrelate experience collection
[2024-06-05 17:50:49,874][10388] Worker 20, sleep for 93.750 sec to decorrelate experience collection
[2024-06-05 17:50:49,874][10398] Worker 31, sleep for 145.312 sec to decorrelate experience collection
[2024-06-05 17:50:49,874][10386] Worker 18, sleep for 84.375 sec to decorrelate experience collection
[2024-06-05 17:50:49,874][10397] Worker 29, sleep for 135.938 sec to decorrelate experience collection
[2024-06-05 17:50:49,875][10396] Worker 27, sleep for 126.562 sec to decorrelate experience collection
[2024-06-05 17:50:49,880][10380] Worker 12, sleep for 56.250 sec to decorrelate experience collection
[2024-06-05 17:50:49,880][10382] Worker 15, sleep for 70.312 sec to decorrelate experience collection
[2024-06-05 17:50:49,881][10376] Worker 8, sleep for 37.500 sec to decorrelate experience collection
[2024-06-05 17:50:49,881][10377] Worker 9, sleep for 42.188 sec to decorrelate experience collection
[2024-06-05 17:50:49,887][10371] Worker 1, sleep for 4.688 sec to decorrelate experience collection
[2024-06-05 17:50:49,891][10379] Worker 11, sleep for 51.562 sec to decorrelate experience collection
[2024-06-05 17:50:49,893][10393] Worker 25, sleep for 117.188 sec to decorrelate experience collection
[2024-06-05 17:50:49,893][10395] Worker 28, sleep for 131.250 sec to decorrelate experience collection
[2024-06-05 17:50:49,899][10374] Worker 6, sleep for 28.125 sec to decorrelate experience collection
[2024-06-05 17:50:49,900][10394] Worker 26, sleep for 121.875 sec to decorrelate experience collection
[2024-06-05 17:50:49,907][10375] Worker 7, sleep for 32.812 sec to decorrelate experience collection
[2024-06-05 17:50:49,908][10392] Worker 21, sleep for 98.438 sec to decorrelate experience collection
[2024-06-05 17:50:49,941][10372] Worker 4, sleep for 18.750 sec to decorrelate experience collection
[2024-06-05 17:50:49,943][10373] Worker 5, sleep for 23.438 sec to decorrelate experience collection
[2024-06-05 17:50:49,958][10347] Signal inference workers to stop experience collection...
[2024-06-05 17:50:50,001][10367] InferenceWorker_p0-w0: stopping experience collection
[2024-06-05 17:50:50,489][10347] Signal inference workers to resume experience collection...
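The per-worker sleep durations above follow an evenly staggered schedule: worker 1 sleeps ~4.688 s, worker 2 ~9.375 s, ..., worker 31 ~145.312 s, i.e. multiples of 150/32 = 4.6875 s. The exact formula Sample Factory uses is not shown in the log; this sketch (with a hypothetical function name) simply reproduces the observed schedule, which desynchronizes workers so their episodes don't all reset in lockstep.

```python
def decorrelation_sleep(worker_idx: int, num_workers: int = 32,
                        span_sec: float = 150.0) -> float:
    """Staggered startup delay for a rollout worker.

    Hypothetical reconstruction of the schedule seen in the log:
    worker w sleeps w * (span_sec / num_workers) seconds, so the 32
    workers spread their starts evenly across ~150 seconds.
    """
    return worker_idx * (span_sec / num_workers)
```

With this stagger, episode boundaries (and hence correlated observations) arrive at the learner spread out in time rather than in synchronized bursts.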
[2024-06-05 17:50:50,489][10367] InferenceWorker_p0-w0: resuming experience collection
[2024-06-05 17:50:51,580][10367] Updated weights for policy 0, policy_version 10 (0.0012)
[2024-06-05 17:50:53,920][10130] Fps is (10 sec: 16383.9, 60 sec: 16383.9, 300 sec: 16383.9). Total num frames: 163840. Throughput: 0: 33031.8. Samples: 330320. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2024-06-05 17:50:54,598][10371] Worker 1 awakens!
[2024-06-05 17:50:56,627][10130] Heartbeat connected on Batcher_0
[2024-06-05 17:50:56,629][10130] Heartbeat connected on LearnerWorker_p0
[2024-06-05 17:50:56,640][10130] Heartbeat connected on RolloutWorker_w0
[2024-06-05 17:50:56,641][10130] Heartbeat connected on RolloutWorker_w1
[2024-06-05 17:50:56,697][10130] Heartbeat connected on InferenceWorker_p0-w0
[2024-06-05 17:50:58,920][10130] Fps is (10 sec: 16383.9, 60 sec: 10922.7, 300 sec: 10922.7). Total num frames: 163840. Throughput: 0: 22352.1. Samples: 335280. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2024-06-05 17:50:59,283][10370] Worker 2 awakens!
[2024-06-05 17:50:59,289][10130] Heartbeat connected on RolloutWorker_w2
[2024-06-05 17:51:03,920][10130] Fps is (10 sec: 1638.4, 60 sec: 9011.1, 300 sec: 9011.1). Total num frames: 180224. Throughput: 0: 17577.7. Samples: 351560. Policy #0 lag: (min: 0.0, avg: 9.4, max: 10.0)
[2024-06-05 17:51:03,995][10369] Worker 3 awakens!
[2024-06-05 17:51:04,000][10130] Heartbeat connected on RolloutWorker_w3
[2024-06-05 17:51:08,714][10372] Worker 4 awakens!
[2024-06-05 17:51:08,722][10130] Heartbeat connected on RolloutWorker_w4
[2024-06-05 17:51:08,920][10130] Fps is (10 sec: 4915.3, 60 sec: 8519.8, 300 sec: 8519.8). Total num frames: 212992. Throughput: 0: 15021.0. Samples: 375520. Policy #0 lag: (min: 0.0, avg: 7.9, max: 12.0)
[2024-06-05 17:51:13,477][10373] Worker 5 awakens!
[2024-06-05 17:51:13,481][10130] Heartbeat connected on RolloutWorker_w5
[2024-06-05 17:51:13,920][10130] Fps is (10 sec: 11469.5, 60 sec: 9830.5, 300 sec: 9830.5). Total num frames: 294912. Throughput: 0: 14072.2. Samples: 422160. Policy #0 lag: (min: 0.0, avg: 1.2, max: 4.0)
[2024-06-05 17:51:13,920][10130] Avg episode reward: [(0, '0.000')]
[2024-06-05 17:51:13,934][10347] Saving new best policy, reward=0.000!
[2024-06-05 17:51:15,108][10367] Updated weights for policy 0, policy_version 20 (0.0013)
[2024-06-05 17:51:18,127][10374] Worker 6 awakens!
[2024-06-05 17:51:18,131][10130] Heartbeat connected on RolloutWorker_w6
[2024-06-05 17:51:18,920][10130] Fps is (10 sec: 18022.3, 60 sec: 11234.8, 300 sec: 11234.8). Total num frames: 393216. Throughput: 0: 15283.5. Samples: 534920. Policy #0 lag: (min: 0.0, avg: 2.0, max: 5.0)
[2024-06-05 17:51:18,928][10130] Avg episode reward: [(0, '0.000')]
[2024-06-05 17:51:21,984][10367] Updated weights for policy 0, policy_version 30 (0.0011)
[2024-06-05 17:51:22,820][10375] Worker 7 awakens!
[2024-06-05 17:51:22,825][10130] Heartbeat connected on RolloutWorker_w7
[2024-06-05 17:51:23,920][10130] Fps is (10 sec: 24576.0, 60 sec: 13516.9, 300 sec: 13516.9). Total num frames: 540672. Throughput: 0: 17248.1. Samples: 689920. Policy #0 lag: (min: 0.0, avg: 2.7, max: 5.0)
[2024-06-05 17:51:23,920][10130] Avg episode reward: [(0, '0.000')]
[2024-06-05 17:51:27,479][10376] Worker 8 awakens!
[2024-06-05 17:51:27,483][10130] Heartbeat connected on RolloutWorker_w8
[2024-06-05 17:51:28,055][10367] Updated weights for policy 0, policy_version 40 (0.0011)
[2024-06-05 17:51:28,920][10130] Fps is (10 sec: 29491.3, 60 sec: 15291.8, 300 sec: 15291.8). Total num frames: 688128. Throughput: 0: 17122.7. Samples: 770520. Policy #0 lag: (min: 0.0, avg: 11.7, max: 38.0)
[2024-06-05 17:51:28,920][10130] Avg episode reward: [(0, '0.000')]
[2024-06-05 17:51:32,168][10377] Worker 9 awakens!
[2024-06-05 17:51:32,174][10130] Heartbeat connected on RolloutWorker_w9
[2024-06-05 17:51:32,991][10367] Updated weights for policy 0, policy_version 50 (0.0012)
[2024-06-05 17:51:33,920][10130] Fps is (10 sec: 27852.8, 60 sec: 16384.1, 300 sec: 16384.1). Total num frames: 819200. Throughput: 0: 17571.6. Samples: 946140. Policy #0 lag: (min: 0.0, avg: 3.9, max: 8.0)
[2024-06-05 17:51:33,920][10130] Avg episode reward: [(0, '0.000')]
[2024-06-05 17:51:36,838][10378] Worker 10 awakens!
[2024-06-05 17:51:36,842][10130] Heartbeat connected on RolloutWorker_w10
[2024-06-05 17:51:38,322][10367] Updated weights for policy 0, policy_version 60 (0.0011)
[2024-06-05 17:51:38,920][10130] Fps is (10 sec: 31129.9, 60 sec: 18171.4, 300 sec: 18171.4). Total num frames: 999424. Throughput: 0: 18054.8. Samples: 1142780. Policy #0 lag: (min: 0.0, avg: 19.8, max: 56.0)
[2024-06-05 17:51:38,920][10130] Avg episode reward: [(0, '0.000')]
[2024-06-05 17:51:41,554][10379] Worker 11 awakens!
[2024-06-05 17:51:41,560][10130] Heartbeat connected on RolloutWorker_w11
[2024-06-05 17:51:42,019][10367] Updated weights for policy 0, policy_version 70 (0.0013)
[2024-06-05 17:51:43,920][10130] Fps is (10 sec: 39321.3, 60 sec: 20207.0, 300 sec: 20207.0). Total num frames: 1212416. Throughput: 0: 20612.5. Samples: 1262840. Policy #0 lag: (min: 0.0, avg: 24.3, max: 68.0)
[2024-06-05 17:51:43,920][10130] Avg episode reward: [(0, '0.000')]
[2024-06-05 17:51:46,224][10367] Updated weights for policy 0, policy_version 80 (0.0013)
[2024-06-05 17:51:46,227][10380] Worker 12 awakens!
[2024-06-05 17:51:46,232][10130] Heartbeat connected on RolloutWorker_w12
[2024-06-05 17:51:48,920][10130] Fps is (10 sec: 39321.2, 60 sec: 23210.7, 300 sec: 21425.3). Total num frames: 1392640. Throughput: 0: 25902.5. Samples: 1517160. Policy #0 lag: (min: 1.0, avg: 3.3, max: 9.0)
[2024-06-05 17:51:48,920][10130] Avg episode reward: [(0, '0.000')]
[2024-06-05 17:51:49,984][10367] Updated weights for policy 0, policy_version 90 (0.0016)
[2024-06-05 17:51:50,900][10383] Worker 13 awakens!
[2024-06-05 17:51:50,906][10130] Heartbeat connected on RolloutWorker_w13
[2024-06-05 17:51:53,525][10367] Updated weights for policy 0, policy_version 100 (0.0016)
[2024-06-05 17:51:53,920][10130] Fps is (10 sec: 44236.5, 60 sec: 24849.2, 300 sec: 23639.8). Total num frames: 1654784. Throughput: 0: 31277.7. Samples: 1783020. Policy #0 lag: (min: 0.0, avg: 4.3, max: 11.0)
[2024-06-05 17:51:53,920][10130] Avg episode reward: [(0, '0.000')]
[2024-06-05 17:51:55,577][10381] Worker 14 awakens!
[2024-06-05 17:51:55,582][10130] Heartbeat connected on RolloutWorker_w14
[2024-06-05 17:51:57,303][10367] Updated weights for policy 0, policy_version 110 (0.0018)
[2024-06-05 17:51:58,920][10130] Fps is (10 sec: 44236.9, 60 sec: 27852.9, 300 sec: 24466.8). Total num frames: 1835008. Throughput: 0: 33310.1. Samples: 1921120. Policy #0 lag: (min: 0.0, avg: 6.0, max: 10.0)
[2024-06-05 17:51:58,920][10130] Avg episode reward: [(0, '0.000')]
[2024-06-05 17:52:00,291][10382] Worker 15 awakens!
[2024-06-05 17:52:00,296][10130] Heartbeat connected on RolloutWorker_w15
[2024-06-05 17:52:00,836][10367] Updated weights for policy 0, policy_version 120 (0.0019)
[2024-06-05 17:52:03,920][10130] Fps is (10 sec: 40960.1, 60 sec: 31402.9, 300 sec: 25804.9). Total num frames: 2064384. Throughput: 0: 36505.3. Samples: 2177660. Policy #0 lag: (min: 0.0, avg: 5.8, max: 11.0)
[2024-06-05 17:52:03,920][10130] Avg episode reward: [(0, '0.000')]
[2024-06-05 17:52:04,663][10367] Updated weights for policy 0, policy_version 130 (0.0022)
[2024-06-05 17:52:04,963][10384] Worker 16 awakens!
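The recurring "Fps is (10 sec: ..., 60 sec: ..., 300 sec: ...)" lines report training throughput over three trailing windows computed from (wall-time, total-frames) samples; that is why the 10-second figure swings sharply while the 60- and 300-second figures move smoothly. The helper below is a hypothetical sketch of that kind of windowed-rate calculation, not Sample Factory's actual reporting code.

```python
def windowed_fps(samples, window_sec):
    """Frames per second over a trailing window.

    samples: list of (wall_time, total_frames) pairs, oldest first --
    the same quantities the periodic Fps log lines summarize.
    Illustrative sketch only.
    """
    t_now, f_now = samples[-1]
    # Oldest sample still inside the window; fall back to the very first.
    t_past, f_past = next(((t, f) for t, f in samples
                           if t_now - t <= window_sec), samples[0])
    if t_now == t_past:
        return float('nan')  # not enough history yet, like the early 'nan' lines
    return (f_now - f_past) / (t_now - t_past)
```

A short window reacts instantly to stalls (e.g. the dip to 1638.4 fps while workers slept), while the long window approximates steady-state throughput.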
[2024-06-05 17:52:04,972][10130] Heartbeat connected on RolloutWorker_w16
[2024-06-05 17:52:08,842][10367] Updated weights for policy 0, policy_version 140 (0.0020)
[2024-06-05 17:52:08,920][10130] Fps is (10 sec: 45874.9, 60 sec: 34679.4, 300 sec: 26985.5). Total num frames: 2293760. Throughput: 0: 38434.5. Samples: 2419480. Policy #0 lag: (min: 0.0, avg: 4.0, max: 11.0)
[2024-06-05 17:52:08,920][10130] Avg episode reward: [(0, '0.000')]
[2024-06-05 17:52:09,640][10385] Worker 17 awakens!
[2024-06-05 17:52:09,649][10130] Heartbeat connected on RolloutWorker_w17
[2024-06-05 17:52:12,890][10367] Updated weights for policy 0, policy_version 150 (0.0019)
[2024-06-05 17:52:13,920][10130] Fps is (10 sec: 44236.4, 60 sec: 36863.9, 300 sec: 27852.8). Total num frames: 2506752. Throughput: 0: 39530.6. Samples: 2549400. Policy #0 lag: (min: 0.0, avg: 5.1, max: 13.0)
[2024-06-05 17:52:13,920][10130] Avg episode reward: [(0, '0.000')]
[2024-06-05 17:52:14,347][10386] Worker 18 awakens!
[2024-06-05 17:52:14,356][10130] Heartbeat connected on RolloutWorker_w18
[2024-06-05 17:52:17,039][10367] Updated weights for policy 0, policy_version 160 (0.0026)
[2024-06-05 17:52:18,920][10130] Fps is (10 sec: 42598.3, 60 sec: 38775.4, 300 sec: 28628.9). Total num frames: 2719744. Throughput: 0: 41306.5. Samples: 2804940. Policy #0 lag: (min: 0.0, avg: 7.7, max: 13.0)
[2024-06-05 17:52:18,920][10130] Avg episode reward: [(0, '0.000')]
[2024-06-05 17:52:19,023][10387] Worker 19 awakens!
[2024-06-05 17:52:19,033][10130] Heartbeat connected on RolloutWorker_w19
[2024-06-05 17:52:21,040][10367] Updated weights for policy 0, policy_version 170 (0.0025)
[2024-06-05 17:52:23,680][10388] Worker 20 awakens!
[2024-06-05 17:52:23,689][10130] Heartbeat connected on RolloutWorker_w20
[2024-06-05 17:52:23,920][10130] Fps is (10 sec: 42598.8, 60 sec: 39867.7, 300 sec: 29327.4). Total num frames: 2932736. Throughput: 0: 42789.7. Samples: 3068320. Policy #0 lag: (min: 0.0, avg: 7.7, max: 13.0)
[2024-06-05 17:52:23,920][10130] Avg episode reward: [(0, '0.000')]
[2024-06-05 17:52:24,132][10367] Updated weights for policy 0, policy_version 180 (0.0023)
[2024-06-05 17:52:27,730][10367] Updated weights for policy 0, policy_version 190 (0.0025)
[2024-06-05 17:52:28,443][10392] Worker 21 awakens!
[2024-06-05 17:52:28,452][10130] Heartbeat connected on RolloutWorker_w21
[2024-06-05 17:52:28,920][10130] Fps is (10 sec: 44237.1, 60 sec: 41233.1, 300 sec: 30115.4). Total num frames: 3162112. Throughput: 0: 43200.9. Samples: 3206880. Policy #0 lag: (min: 0.0, avg: 7.8, max: 14.0)
[2024-06-05 17:52:28,920][10130] Avg episode reward: [(0, '0.000')]
[2024-06-05 17:52:31,659][10367] Updated weights for policy 0, policy_version 200 (0.0019)
[2024-06-05 17:52:33,077][10389] Worker 22 awakens!
[2024-06-05 17:52:33,088][10130] Heartbeat connected on RolloutWorker_w22
[2024-06-05 17:52:33,920][10130] Fps is (10 sec: 45874.8, 60 sec: 42871.3, 300 sec: 30831.7). Total num frames: 3391488. Throughput: 0: 43431.0. Samples: 3471560. Policy #0 lag: (min: 0.0, avg: 69.3, max: 204.0)
[2024-06-05 17:52:33,920][10130] Avg episode reward: [(0, '0.000')]
[2024-06-05 17:52:33,927][10347] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000000207_3391488.pth...
[2024-06-05 17:52:35,477][10367] Updated weights for policy 0, policy_version 210 (0.0026)
[2024-06-05 17:52:37,775][10390] Worker 23 awakens!
[2024-06-05 17:52:37,787][10130] Heartbeat connected on RolloutWorker_w23
[2024-06-05 17:52:38,423][10367] Updated weights for policy 0, policy_version 220 (0.0020)
[2024-06-05 17:52:38,920][10130] Fps is (10 sec: 45875.3, 60 sec: 43690.6, 300 sec: 31485.8). Total num frames: 3620864. Throughput: 0: 43491.2. Samples: 3740120. Policy #0 lag: (min: 0.0, avg: 8.7, max: 15.0)
[2024-06-05 17:52:38,920][10130] Avg episode reward: [(0, '0.000')]
[2024-06-05 17:52:42,383][10367] Updated weights for policy 0, policy_version 230 (0.0019)
[2024-06-05 17:52:42,413][10391] Worker 24 awakens!
[2024-06-05 17:52:42,424][10130] Heartbeat connected on RolloutWorker_w24
[2024-06-05 17:52:43,920][10130] Fps is (10 sec: 45875.4, 60 sec: 43963.7, 300 sec: 32085.4). Total num frames: 3850240. Throughput: 0: 43715.0. Samples: 3888300. Policy #0 lag: (min: 0.0, avg: 8.1, max: 17.0)
[2024-06-05 17:52:43,920][10130] Avg episode reward: [(0, '0.000')]
[2024-06-05 17:52:46,053][10367] Updated weights for policy 0, policy_version 240 (0.0021)
[2024-06-05 17:52:47,179][10393] Worker 25 awakens!
[2024-06-05 17:52:47,190][10130] Heartbeat connected on RolloutWorker_w25
[2024-06-05 17:52:48,920][10130] Fps is (10 sec: 45873.6, 60 sec: 44782.7, 300 sec: 32636.9). Total num frames: 4079616. Throughput: 0: 44058.4. Samples: 4160300. Policy #0 lag: (min: 0.0, avg: 7.9, max: 17.0)
[2024-06-05 17:52:48,921][10130] Avg episode reward: [(0, '0.000')]
[2024-06-05 17:52:49,131][10367] Updated weights for policy 0, policy_version 250 (0.0024)
[2024-06-05 17:52:51,873][10394] Worker 26 awakens!
[2024-06-05 17:52:51,885][10130] Heartbeat connected on RolloutWorker_w26
[2024-06-05 17:52:53,063][10367] Updated weights for policy 0, policy_version 260 (0.0025)
[2024-06-05 17:52:53,920][10130] Fps is (10 sec: 44237.0, 60 sec: 43963.8, 300 sec: 33020.1). Total num frames: 4292608. Throughput: 0: 45023.6. Samples: 4445540. Policy #0 lag: (min: 0.0, avg: 8.0, max: 19.0)
[2024-06-05 17:52:53,920][10130] Avg episode reward: [(0, '0.000')]
[2024-06-05 17:52:56,038][10367] Updated weights for policy 0, policy_version 270 (0.0028)
[2024-06-05 17:52:56,528][10396] Worker 27 awakens!
[2024-06-05 17:52:56,540][10130] Heartbeat connected on RolloutWorker_w27
[2024-06-05 17:52:58,920][10130] Fps is (10 sec: 45876.6, 60 sec: 45056.0, 300 sec: 33617.6). Total num frames: 4538368. Throughput: 0: 45173.0. Samples: 4582180. Policy #0 lag: (min: 0.0, avg: 7.9, max: 18.0)
[2024-06-05 17:52:58,920][10130] Avg episode reward: [(0, '0.000')]
[2024-06-05 17:52:59,653][10367] Updated weights for policy 0, policy_version 280 (0.0025)
[2024-06-05 17:53:01,243][10395] Worker 28 awakens!
[2024-06-05 17:53:01,255][10130] Heartbeat connected on RolloutWorker_w28
[2024-06-05 17:53:03,198][10367] Updated weights for policy 0, policy_version 290 (0.0023)
[2024-06-05 17:53:03,920][10130] Fps is (10 sec: 47513.1, 60 sec: 45055.9, 300 sec: 34055.3). Total num frames: 4767744. Throughput: 0: 46026.6. Samples: 4876140. Policy #0 lag: (min: 0.0, avg: 9.4, max: 19.0)
[2024-06-05 17:53:03,929][10130] Avg episode reward: [(0, '0.000')]
[2024-06-05 17:53:05,911][10397] Worker 29 awakens!
[2024-06-05 17:53:05,925][10130] Heartbeat connected on RolloutWorker_w29
[2024-06-05 17:53:06,251][10367] Updated weights for policy 0, policy_version 300 (0.0032)
[2024-06-05 17:53:08,920][10130] Fps is (10 sec: 49151.1, 60 sec: 45602.0, 300 sec: 34688.9). Total num frames: 5029888. Throughput: 0: 46486.0. Samples: 5160200. Policy #0 lag: (min: 0.0, avg: 11.3, max: 20.0)
[2024-06-05 17:53:08,921][10130] Avg episode reward: [(0, '0.000')]
[2024-06-05 17:53:10,432][10367] Updated weights for policy 0, policy_version 310 (0.0028)
[2024-06-05 17:53:10,543][10399] Worker 30 awakens!
[2024-06-05 17:53:10,565][10130] Heartbeat connected on RolloutWorker_w30
[2024-06-05 17:53:10,683][10347] Signal inference workers to stop experience collection... (50 times)
[2024-06-05 17:53:10,722][10367] InferenceWorker_p0-w0: stopping experience collection (50 times)
[2024-06-05 17:53:10,731][10347] Signal inference workers to resume experience collection... (50 times)
[2024-06-05 17:53:10,740][10367] InferenceWorker_p0-w0: resuming experience collection (50 times)
[2024-06-05 17:53:13,174][10367] Updated weights for policy 0, policy_version 320 (0.0032)
[2024-06-05 17:53:13,920][10130] Fps is (10 sec: 49153.1, 60 sec: 45875.4, 300 sec: 35061.8). Total num frames: 5259264. Throughput: 0: 46835.7. Samples: 5314480. Policy #0 lag: (min: 0.0, avg: 10.5, max: 20.0)
[2024-06-05 17:53:13,920][10130] Avg episode reward: [(0, '0.000')]
[2024-06-05 17:53:15,287][10398] Worker 31 awakens!
[2024-06-05 17:53:15,301][10130] Heartbeat connected on RolloutWorker_w31
[2024-06-05 17:53:16,975][10367] Updated weights for policy 0, policy_version 330 (0.0032)
[2024-06-05 17:53:18,920][10130] Fps is (10 sec: 47514.9, 60 sec: 46421.4, 300 sec: 35516.3). Total num frames: 5505024. Throughput: 0: 47489.1. Samples: 5608560. Policy #0 lag: (min: 0.0, avg: 9.6, max: 20.0)
[2024-06-05 17:53:18,920][10130] Avg episode reward: [(0, '0.000')]
[2024-06-05 17:53:19,731][10367] Updated weights for policy 0, policy_version 340 (0.0036)
[2024-06-05 17:53:23,435][10367] Updated weights for policy 0, policy_version 350 (0.0032)
[2024-06-05 17:53:23,920][10130] Fps is (10 sec: 47512.8, 60 sec: 46694.4, 300 sec: 35840.0). Total num frames: 5734400. Throughput: 0: 48076.3. Samples: 5903560. Policy #0 lag: (min: 0.0, avg: 10.5, max: 21.0)
[2024-06-05 17:53:23,921][10130] Avg episode reward: [(0, '0.000')]
[2024-06-05 17:53:26,264][10367] Updated weights for policy 0, policy_version 360 (0.0030)
[2024-06-05 17:53:28,920][10130] Fps is (10 sec: 50790.6, 60 sec: 47513.7, 300 sec: 36442.1). Total num frames: 6012928. Throughput: 0: 47960.2. Samples: 6046500. Policy #0 lag: (min: 0.0, avg: 10.4, max: 20.0)
[2024-06-05 17:53:28,920][10130] Avg episode reward: [(0, '0.000')]
[2024-06-05 17:53:30,004][10367] Updated weights for policy 0, policy_version 370 (0.0028)
[2024-06-05 17:53:32,795][10367] Updated weights for policy 0, policy_version 380 (0.0022)
[2024-06-05 17:53:33,921][10130] Fps is (10 sec: 54059.4, 60 sec: 48058.6, 300 sec: 36911.9). Total num frames: 6275072. Throughput: 0: 48709.4. Samples: 6352280. Policy #0 lag: (min: 1.0, avg: 10.2, max: 21.0)
[2024-06-05 17:53:33,922][10130] Avg episode reward: [(0, '0.000')]
[2024-06-05 17:53:36,694][10367] Updated weights for policy 0, policy_version 390 (0.0028)
[2024-06-05 17:53:38,920][10130] Fps is (10 sec: 50789.8, 60 sec: 48332.8, 300 sec: 37261.9). Total num frames: 6520832. Throughput: 0: 49069.8. Samples: 6653680. Policy #0 lag: (min: 0.0, avg: 9.8, max: 22.0)
[2024-06-05 17:53:38,920][10130] Avg episode reward: [(0, '0.000')]
[2024-06-05 17:53:39,335][10367] Updated weights for policy 0, policy_version 400 (0.0025)
[2024-06-05 17:53:43,045][10367] Updated weights for policy 0, policy_version 410 (0.0026)
[2024-06-05 17:53:43,920][10130] Fps is (10 sec: 47520.7, 60 sec: 48332.8, 300 sec: 37501.2). Total num frames: 6750208. Throughput: 0: 49386.2. Samples: 6804560. Policy #0 lag: (min: 0.0, avg: 10.2, max: 22.0)
[2024-06-05 17:53:43,920][10130] Avg episode reward: [(0, '0.000')]
[2024-06-05 17:53:45,842][10367] Updated weights for policy 0, policy_version 420 (0.0029)
[2024-06-05 17:53:48,923][10130] Fps is (10 sec: 49136.1, 60 sec: 48876.6, 300 sec: 37904.0). Total num frames: 7012352. Throughput: 0: 49522.8. Samples: 7104820. Policy #0 lag: (min: 0.0, avg: 11.0, max: 20.0)
[2024-06-05 17:53:48,924][10130] Avg episode reward: [(0, '0.000')]
[2024-06-05 17:53:49,402][10367] Updated weights for policy 0, policy_version 430 (0.0031)
[2024-06-05 17:53:52,443][10367] Updated weights for policy 0, policy_version 440 (0.0036)
[2024-06-05 17:53:53,920][10130] Fps is (10 sec: 52428.6, 60 sec: 49698.1, 300 sec: 38286.9). Total num frames: 7274496. Throughput: 0: 49723.7. Samples: 7397760. Policy #0 lag: (min: 0.0, avg: 11.5, max: 22.0)
[2024-06-05 17:53:53,921][10130] Avg episode reward: [(0, '0.000')]
[2024-06-05 17:53:56,259][10367] Updated weights for policy 0, policy_version 450 (0.0036)
[2024-06-05 17:53:58,920][10130] Fps is (10 sec: 50806.2, 60 sec: 49698.1, 300 sec: 38565.4). Total num frames: 7520256. Throughput: 0: 49711.3. Samples: 7551500. Policy #0 lag: (min: 0.0, avg: 8.9, max: 23.0)
[2024-06-05 17:53:58,920][10130] Avg episode reward: [(0, '0.000')]
[2024-06-05 17:53:59,274][10367] Updated weights for policy 0, policy_version 460 (0.0036)
[2024-06-05 17:54:02,961][10367] Updated weights for policy 0, policy_version 470 (0.0028)
[2024-06-05 17:54:03,920][10130] Fps is (10 sec: 47513.7, 60 sec: 49698.2, 300 sec: 38748.2). Total num frames: 7749632. Throughput: 0: 49790.6. Samples: 7849140. Policy #0 lag: (min: 0.0, avg: 10.5, max: 21.0)
[2024-06-05 17:54:03,931][10130] Avg episode reward: [(0, '0.000')]
[2024-06-05 17:54:05,740][10367] Updated weights for policy 0, policy_version 480 (0.0029)
[2024-06-05 17:54:08,920][10130] Fps is (10 sec: 47514.6, 60 sec: 49425.3, 300 sec: 39002.0). Total num frames: 7995392. Throughput: 0: 49769.5. Samples: 8143180. Policy #0 lag: (min: 0.0, avg: 10.8, max: 20.0)
[2024-06-05 17:54:08,920][10130] Avg episode reward: [(0, '0.000')]
[2024-06-05 17:54:09,597][10367] Updated weights for policy 0, policy_version 490 (0.0044)
[2024-06-05 17:54:12,403][10367] Updated weights for policy 0, policy_version 500 (0.0034)
[2024-06-05 17:54:13,920][10130] Fps is (10 sec: 52428.8, 60 sec: 50244.2, 300 sec: 39399.7). Total num frames: 8273920. Throughput: 0: 49845.6. Samples: 8289560. Policy #0 lag: (min: 0.0, avg: 10.3, max: 20.0)
[2024-06-05 17:54:13,920][10130] Avg episode reward: [(0, '0.000')]
[2024-06-05 17:54:16,010][10367] Updated weights for policy 0, policy_version 510 (0.0021)
[2024-06-05 17:54:18,927][10130] Fps is (10 sec: 50753.2, 60 sec: 49965.1, 300 sec: 39548.9). Total num frames: 8503296. Throughput: 0: 49701.2. Samples: 8589120. Policy #0 lag: (min: 0.0, avg: 9.9, max: 22.0)
[2024-06-05 17:54:18,928][10130] Avg episode reward: [(0, '0.000')]
[2024-06-05 17:54:19,106][10367] Updated weights for policy 0, policy_version 520 (0.0035)
[2024-06-05 17:54:22,790][10367] Updated weights for policy 0, policy_version 530 (0.0026)
[2024-06-05 17:54:23,920][10130] Fps is (10 sec: 44237.1, 60 sec: 49698.2, 300 sec: 39619.5). Total num frames: 8716288. Throughput: 0: 49539.6. Samples: 8882960. Policy #0 lag: (min: 0.0, avg: 9.8, max: 21.0)
[2024-06-05 17:54:23,920][10130] Avg episode reward: [(0, '0.000')]
[2024-06-05 17:54:25,967][10367] Updated weights for policy 0, policy_version 540 (0.0030)
[2024-06-05 17:54:28,920][10130] Fps is (10 sec: 47548.5, 60 sec: 49425.1, 300 sec: 39904.2). Total num frames: 8978432. Throughput: 0: 49383.7. Samples: 9026820. Policy #0 lag: (min: 0.0, avg: 11.0, max: 22.0)
[2024-06-05 17:54:28,920][10130] Avg episode reward: [(0, '0.000')]
[2024-06-05 17:54:29,482][10367] Updated weights for policy 0, policy_version 550 (0.0026)
[2024-06-05 17:54:32,533][10367] Updated weights for policy 0, policy_version 560 (0.0028)
[2024-06-05 17:54:33,532][10347] Signal inference workers to stop experience collection... (100 times)
[2024-06-05 17:54:33,573][10367] InferenceWorker_p0-w0: stopping experience collection (100 times)
[2024-06-05 17:54:33,583][10347] Signal inference workers to resume experience collection... (100 times)
[2024-06-05 17:54:33,592][10367] InferenceWorker_p0-w0: resuming experience collection (100 times)
[2024-06-05 17:54:33,920][10130] Fps is (10 sec: 54066.6, 60 sec: 49699.3, 300 sec: 40247.7). Total num frames: 9256960. Throughput: 0: 49301.7. Samples: 9323240. Policy #0 lag: (min: 0.0, avg: 11.0, max: 21.0)
[2024-06-05 17:54:33,921][10130] Avg episode reward: [(0, '0.000')]
[2024-06-05 17:54:33,942][10347] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000000565_9256960.pth...
[2024-06-05 17:54:36,002][10367] Updated weights for policy 0, policy_version 570 (0.0028)
[2024-06-05 17:54:38,920][10130] Fps is (10 sec: 50789.8, 60 sec: 49425.1, 300 sec: 40367.4). Total num frames: 9486336. Throughput: 0: 49438.7. Samples: 9622500. Policy #0 lag: (min: 0.0, avg: 8.8, max: 21.0)
[2024-06-05 17:54:38,920][10130] Avg episode reward: [(0, '0.000')]
[2024-06-05 17:54:39,063][10367] Updated weights for policy 0, policy_version 580 (0.0030)
[2024-06-05 17:54:42,408][10367] Updated weights for policy 0, policy_version 590 (0.0030)
[2024-06-05 17:54:43,920][10130] Fps is (10 sec: 45875.5, 60 sec: 49425.1, 300 sec: 40482.2). Total num frames: 9715712. Throughput: 0: 49290.8. Samples: 9769580. Policy #0 lag: (min: 0.0, avg: 9.4, max: 21.0)
[2024-06-05 17:54:43,920][10130] Avg episode reward: [(0, '0.000')]
[2024-06-05 17:54:45,622][10367] Updated weights for policy 0, policy_version 600 (0.0025)
[2024-06-05 17:54:48,920][10130] Fps is (10 sec: 49151.9, 60 sec: 49427.7, 300 sec: 40726.0). Total num frames: 9977856. Throughput: 0: 49204.0. Samples: 10063320. Policy #0 lag: (min: 0.0, avg: 11.4, max: 23.0)
[2024-06-05 17:54:48,920][10130] Avg episode reward: [(0, '0.000')]
[2024-06-05 17:54:49,144][10367] Updated weights for policy 0, policy_version 610 (0.0030)
[2024-06-05 17:54:52,361][10367] Updated weights for policy 0, policy_version 620 (0.0033)
[2024-06-05 17:54:53,920][10130] Fps is (10 sec: 54068.1, 60 sec: 49698.3, 300 sec: 41025.6). Total num frames: 10256384. Throughput: 0: 49302.7. Samples: 10361800. Policy #0 lag: (min: 0.0, avg: 10.4, max: 20.0)
[2024-06-05 17:54:53,920][10130] Avg episode reward: [(0, '0.000')]
[2024-06-05 17:54:55,848][10367] Updated weights for policy 0, policy_version 630 (0.0040)
[2024-06-05 17:54:58,894][10367] Updated weights for policy 0, policy_version 640 (0.0035)
[2024-06-05 17:54:58,920][10130] Fps is (10 sec: 50790.6, 60 sec: 49425.2, 300 sec: 41120.7). Total num frames: 10485760. Throughput: 0: 49504.9. Samples: 10517280. Policy #0 lag: (min: 0.0, avg: 10.5, max: 21.0)
[2024-06-05 17:54:58,920][10130] Avg episode reward: [(0, '0.000')]
[2024-06-05 17:55:02,325][10367] Updated weights for policy 0, policy_version 650 (0.0027)
[2024-06-05 17:55:03,923][10130] Fps is (10 sec: 45858.1, 60 sec: 49422.1, 300 sec: 41211.5). Total num frames: 10715136. Throughput: 0: 49470.2. Samples: 10815100. Policy #0 lag: (min: 0.0, avg: 8.8, max: 21.0)
[2024-06-05 17:55:03,925][10130] Avg episode reward: [(0, '0.000')]
[2024-06-05 17:55:05,429][10367] Updated weights for policy 0, policy_version 660 (0.0026)
[2024-06-05 17:55:08,852][10367] Updated weights for policy 0, policy_version 670 (0.0030)
[2024-06-05 17:55:08,920][10130] Fps is (10 sec: 49152.0, 60 sec: 49698.0, 300 sec: 41423.7). Total num frames: 10977280. Throughput: 0: 49404.8. Samples: 11106180. Policy #0 lag: (min: 1.0, avg: 10.5, max: 21.0)
[2024-06-05 17:55:08,920][10130] Avg episode reward: [(0, '0.000')]
[2024-06-05 17:55:12,167][10367] Updated weights for policy 0, policy_version 680 (0.0028)
[2024-06-05 17:55:13,920][10130] Fps is (10 sec: 50809.2, 60 sec: 49152.1, 300 sec: 41566.9). Total num frames: 11223040. Throughput: 0: 49593.3. Samples: 11258520. Policy #0 lag: (min: 0.0, avg: 9.6, max: 23.0)
[2024-06-05 17:55:13,920][10130] Avg episode reward: [(0, '0.000')]
[2024-06-05 17:55:15,511][10367] Updated weights for policy 0, policy_version 690 (0.0035)
[2024-06-05 17:55:18,826][10367] Updated weights for policy 0, policy_version 700 (0.0037)
[2024-06-05 17:55:18,920][10130] Fps is (10 sec: 49151.9, 60 sec: 49431.0, 300 sec: 41704.8). Total num frames: 11468800. Throughput: 0: 49773.8. Samples: 11563060. Policy #0 lag: (min: 0.0, avg: 11.1, max: 23.0)
[2024-06-05 17:55:18,920][10130] Avg episode reward: [(0, '0.000')]
[2024-06-05 17:55:21,997][10367] Updated weights for policy 0, policy_version 710 (0.0028)
[2024-06-05 17:55:23,920][10130] Fps is (10 sec: 47513.8, 60 sec: 49698.2, 300 sec: 41779.3). Total num frames: 11698176. Throughput: 0: 49812.2. Samples: 11864040. Policy #0 lag: (min: 1.0, avg: 10.1, max: 21.0)
[2024-06-05 17:55:23,920][10130] Avg episode reward: [(0, '0.000')]
[2024-06-05 17:55:25,295][10367] Updated weights for policy 0, policy_version 720 (0.0025)
[2024-06-05 17:55:28,569][10367] Updated weights for policy 0, policy_version 730 (0.0031)
[2024-06-05 17:55:28,920][10130] Fps is (10 sec: 49151.9, 60 sec: 49698.0, 300 sec: 41966.1). Total num frames: 11960320. Throughput: 0: 49663.1. Samples: 12004420. Policy #0 lag: (min: 0.0, avg: 10.5, max: 20.0)
[2024-06-05 17:55:28,920][10130] Avg episode reward: [(0, '0.000')]
[2024-06-05 17:55:31,724][10367] Updated weights for policy 0, policy_version 740 (0.0029)
[2024-06-05 17:55:33,920][10130] Fps is (10 sec: 52428.5, 60 sec: 49425.2, 300 sec: 42146.5). Total num frames: 12222464. Throughput: 0: 49892.1. Samples: 12308460. Policy #0 lag: (min: 0.0, avg: 10.7, max: 21.0)
[2024-06-05 17:55:33,920][10130] Avg episode reward: [(0, '0.000')]
[2024-06-05 17:55:35,221][10367] Updated weights for policy 0, policy_version 750 (0.0043)
[2024-06-05 17:55:38,628][10367] Updated weights for policy 0, policy_version 760 (0.0028)
[2024-06-05 17:55:38,920][10130] Fps is (10 sec: 50791.1, 60 sec: 49698.2, 300 sec: 42265.2). Total num frames: 12468224. Throughput: 0: 49796.4. Samples: 12602640. Policy #0 lag: (min: 0.0, avg: 9.4, max: 21.0)
[2024-06-05 17:55:38,920][10130] Avg episode reward: [(0, '0.000')]
[2024-06-05 17:55:41,835][10367] Updated weights for policy 0, policy_version 770 (0.0027)
[2024-06-05 17:55:43,920][10130] Fps is (10 sec: 47513.2, 60 sec: 49698.1, 300 sec: 43042.7). Total num frames: 12697600. Throughput: 0: 49536.5. Samples: 12746420. Policy #0 lag: (min: 0.0, avg: 10.7, max: 21.0)
[2024-06-05 17:55:43,929][10130] Avg episode reward: [(0, '0.000')]
[2024-06-05 17:55:45,160][10367] Updated weights for policy 0, policy_version 780 (0.0027)
[2024-06-05 17:55:48,403][10367] Updated weights for policy 0, policy_version 790 (0.0027)
[2024-06-05 17:55:48,920][10130] Fps is (10 sec: 49151.4, 60 sec: 49698.1, 300 sec: 43376.0). Total num frames: 12959744. Throughput: 0: 49472.8. Samples: 13041200. Policy #0 lag: (min: 0.0, avg: 10.2, max: 21.0)
[2024-06-05 17:55:48,921][10130] Avg episode reward: [(0, '0.000')]
[2024-06-05 17:55:51,851][10367] Updated weights for policy 0, policy_version 800 (0.0039)
[2024-06-05 17:55:53,920][10130] Fps is (10 sec: 52428.9, 60 sec: 49425.0, 300 sec: 44264.6). Total num frames: 13221888. Throughput: 0: 49699.2. Samples: 13342640. Policy #0 lag: (min: 0.0, avg: 10.5, max: 20.0)
[2024-06-05 17:55:53,920][10130] Avg episode reward: [(0, '0.000')]
[2024-06-05 17:55:55,023][10367] Updated weights for policy 0, policy_version 810 (0.0030)
[2024-06-05 17:55:58,361][10367] Updated weights for policy 0, policy_version 820 (0.0025)
[2024-06-05 17:55:58,920][10130] Fps is (10 sec: 49152.4, 60 sec: 49425.1, 300 sec: 44986.7). Total num frames: 13451264. Throughput: 0: 49787.0. Samples: 13498940. Policy #0 lag: (min: 1.0, avg: 9.1, max: 21.0)
[2024-06-05 17:55:58,920][10130] Avg episode reward: [(0, '0.000')]
[2024-06-05 17:55:59,496][10347] Signal inference workers to stop experience collection... (150 times)
[2024-06-05 17:55:59,523][10367] InferenceWorker_p0-w0: stopping experience collection (150 times)
[2024-06-05 17:55:59,549][10347] Signal inference workers to resume experience collection... (150 times)
[2024-06-05 17:55:59,554][10367] InferenceWorker_p0-w0: resuming experience collection (150 times)
[2024-06-05 17:56:01,526][10367] Updated weights for policy 0, policy_version 830 (0.0033)
[2024-06-05 17:56:03,920][10130] Fps is (10 sec: 47514.0, 60 sec: 49701.2, 300 sec: 45708.6). Total num frames: 13697024. Throughput: 0: 49581.5. Samples: 13794220. Policy #0 lag: (min: 1.0, avg: 11.4, max: 22.0)
[2024-06-05 17:56:03,920][10130] Avg episode reward: [(0, '0.000')]
[2024-06-05 17:56:05,034][10367] Updated weights for policy 0, policy_version 840 (0.0026)
[2024-06-05 17:56:07,973][10367] Updated weights for policy 0, policy_version 850 (0.0024)
[2024-06-05 17:56:08,923][10130] Fps is (10 sec: 49134.1, 60 sec: 49422.1, 300 sec: 46263.4). Total num frames: 13942784. Throughput: 0: 49476.8. Samples: 14090680. Policy #0 lag: (min: 0.0, avg: 11.0, max: 20.0)
[2024-06-05 17:56:08,924][10130] Avg episode reward: [(0, '0.000')]
[2024-06-05 17:56:11,479][10367] Updated weights for policy 0, policy_version 860 (0.0025)
[2024-06-05 17:56:13,920][10130] Fps is (10 sec: 50790.0, 60 sec: 49698.1, 300 sec: 46819.4). Total num frames: 14204928. Throughput: 0: 49717.4. Samples: 14241700. Policy #0 lag: (min: 0.0, avg: 10.5, max: 22.0)
[2024-06-05 17:56:13,920][10130] Avg episode reward: [(0, '0.000')]
[2024-06-05 17:56:14,512][10367] Updated weights for policy 0, policy_version 870 (0.0028)
[2024-06-05 17:56:17,992][10367] Updated weights for policy 0, policy_version 880 (0.0021)
[2024-06-05 17:56:18,920][10130] Fps is (10 sec: 50809.1, 60 sec: 49698.2, 300 sec: 47152.6). Total num frames: 14450688. Throughput: 0: 49681.3. Samples: 14544120. Policy #0 lag: (min: 0.0, avg: 9.9, max: 21.0)
[2024-06-05 17:56:18,920][10130] Avg episode reward: [(0, '0.000')]
[2024-06-05 17:56:21,105][10367] Updated weights for policy 0, policy_version 890 (0.0029)
[2024-06-05 17:56:23,920][10130] Fps is (10 sec: 49152.3, 60 sec: 49971.1, 300 sec: 47485.8). Total num frames: 14696448. Throughput: 0: 49820.4. Samples: 14844560. Policy #0 lag: (min: 0.0, avg: 10.3, max: 21.0)
[2024-06-05 17:56:23,920][10130] Avg episode reward: [(0, '0.000')]
[2024-06-05 17:56:24,769][10367] Updated weights for policy 0, policy_version 900 (0.0030)
[2024-06-05 17:56:27,746][10367] Updated weights for policy 0, policy_version 910 (0.0033)
[2024-06-05 17:56:28,920][10130] Fps is (10 sec: 49151.5, 60 sec: 49698.2, 300 sec: 47874.6). Total num frames: 14942208. Throughput: 0: 49644.9. Samples: 14980440. Policy #0 lag: (min: 0.0, avg: 10.6, max: 21.0)
[2024-06-05 17:56:28,920][10130] Avg episode reward: [(0, '0.000')]
[2024-06-05 17:56:31,396][10367] Updated weights for policy 0, policy_version 920 (0.0030)
[2024-06-05 17:56:33,920][10130] Fps is (10 sec: 50790.2, 60 sec: 49698.1, 300 sec: 48152.3). Total num frames: 15204352. Throughput: 0: 49724.1. Samples: 15278780. Policy #0 lag: (min: 0.0, avg: 10.8, max: 20.0)
[2024-06-05 17:56:33,920][10130] Avg episode reward: [(0, '0.000')]
[2024-06-05 17:56:34,025][10347] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000000929_15220736.pth...
[2024-06-05 17:56:34,070][10347] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000000207_3391488.pth
[2024-06-05 17:56:34,348][10367] Updated weights for policy 0, policy_version 930 (0.0031)
[2024-06-05 17:56:37,934][10367] Updated weights for policy 0, policy_version 940 (0.0024)
[2024-06-05 17:56:38,920][10130] Fps is (10 sec: 50790.8, 60 sec: 49698.1, 300 sec: 48263.4). Total num frames: 15450112. Throughput: 0: 49526.7. Samples: 15571340. Policy #0 lag: (min: 0.0, avg: 10.0, max: 23.0)
[2024-06-05 17:56:38,920][10130] Avg episode reward: [(0, '0.000')]
[2024-06-05 17:56:41,139][10367] Updated weights for policy 0, policy_version 950 (0.0028)
[2024-06-05 17:56:43,920][10130] Fps is (10 sec: 45874.6, 60 sec: 49425.0, 300 sec: 48374.4). Total num frames: 15663104. Throughput: 0: 49251.9. Samples: 15715280. Policy #0 lag: (min: 0.0, avg: 9.4, max: 21.0)
[2024-06-05 17:56:43,920][10130] Avg episode reward: [(0, '0.000')]
[2024-06-05 17:56:44,748][10367] Updated weights for policy 0, policy_version 960 (0.0041)
[2024-06-05 17:56:48,031][10367] Updated weights for policy 0, policy_version 970 (0.0031)
[2024-06-05 17:56:48,920][10130] Fps is (10 sec: 47513.5, 60 sec: 49425.1, 300 sec: 48374.5). Total num frames: 15925248. Throughput: 0: 49133.3. Samples: 16005220. Policy #0 lag: (min: 1.0, avg: 11.6, max: 22.0)
[2024-06-05 17:56:48,920][10130] Avg episode reward: [(0, '0.000')]
[2024-06-05 17:56:51,751][10367] Updated weights for policy 0, policy_version 980 (0.0029)
[2024-06-05 17:56:53,920][10130] Fps is (10 sec: 52428.8, 60 sec: 49425.0, 300 sec: 48652.1). Total num frames: 16187392. Throughput: 0: 49062.5. Samples: 16298320. Policy #0 lag: (min: 0.0, avg: 11.2, max: 21.0)
[2024-06-05 17:56:53,921][10130] Avg episode reward: [(0, '0.000')]
[2024-06-05 17:56:54,782][10367] Updated weights for policy 0, policy_version 990 (0.0031)
[2024-06-05 17:56:58,246][10367] Updated weights for policy 0, policy_version 1000 (0.0023)
[2024-06-05 17:56:58,920][10130] Fps is (10 sec: 49152.3, 60 sec: 49425.1, 300 sec: 48652.2). Total num frames: 16416768. Throughput: 0: 49227.2. Samples: 16456920. Policy #0 lag: (min: 0.0, avg: 9.2, max: 23.0)
[2024-06-05 17:56:58,920][10130] Avg episode reward: [(0, '0.000')]
[2024-06-05 17:57:01,299][10367] Updated weights for policy 0, policy_version 1010 (0.0038)
[2024-06-05 17:57:03,920][10130] Fps is (10 sec: 45876.0, 60 sec: 49152.0, 300 sec: 48652.2). Total num frames: 16646144. Throughput: 0: 48875.6. Samples: 16743520. Policy #0 lag: (min: 0.0, avg: 8.9, max: 21.0)
[2024-06-05 17:57:03,920][10130] Avg episode reward: [(0, '0.000')]
[2024-06-05 17:57:04,693][10367] Updated weights for policy 0, policy_version 1020 (0.0036)
[2024-06-05 17:57:08,030][10367] Updated weights for policy 0, policy_version 1030 (0.0029)
[2024-06-05 17:57:08,920][10130] Fps is (10 sec: 47513.1, 60 sec: 49154.9, 300 sec: 48763.2). Total num frames: 16891904. Throughput: 0: 48702.6. Samples: 17036180. Policy #0 lag: (min: 0.0, avg: 9.9, max: 20.0)
[2024-06-05 17:57:08,920][10130] Avg episode reward: [(0, '0.000')]
[2024-06-05 17:57:09,901][10347] Signal inference workers to stop experience collection... (200 times)
[2024-06-05 17:57:09,952][10347] Signal inference workers to resume experience collection... (200 times)
[2024-06-05 17:57:09,953][10367] InferenceWorker_p0-w0: stopping experience collection (200 times)
[2024-06-05 17:57:09,969][10367] InferenceWorker_p0-w0: resuming experience collection (200 times)
[2024-06-05 17:57:11,440][10367] Updated weights for policy 0, policy_version 1040 (0.0025)
[2024-06-05 17:57:13,924][10130] Fps is (10 sec: 50770.8, 60 sec: 49148.9, 300 sec: 48929.2). Total num frames: 17154048. Throughput: 0: 49047.5. Samples: 17187760. Policy #0 lag: (min: 0.0, avg: 10.6, max: 22.0)
[2024-06-05 17:57:13,924][10130] Avg episode reward: [(0, '0.000')]
[2024-06-05 17:57:14,636][10367] Updated weights for policy 0, policy_version 1050 (0.0024)
[2024-06-05 17:57:18,258][10367] Updated weights for policy 0, policy_version 1060 (0.0023)
[2024-06-05 17:57:18,920][10130] Fps is (10 sec: 49152.3, 60 sec: 48878.9, 300 sec: 48985.4). Total num frames: 17383424. Throughput: 0: 48994.7. Samples: 17483540. Policy #0 lag: (min: 0.0, avg: 9.7, max: 22.0)
[2024-06-05 17:57:18,920][10130] Avg episode reward: [(0, '0.000')]
[2024-06-05 17:57:21,611][10367] Updated weights for policy 0, policy_version 1070 (0.0028)
[2024-06-05 17:57:23,920][10130] Fps is (10 sec: 45893.0, 60 sec: 48605.9, 300 sec: 48985.4). Total num frames: 17612800. Throughput: 0: 49123.2. Samples: 17781880. Policy #0 lag: (min: 0.0, avg: 9.8, max: 22.0)
[2024-06-05 17:57:23,920][10130] Avg episode reward: [(0, '0.000')]
[2024-06-05 17:57:24,791][10367] Updated weights for policy 0, policy_version 1080 (0.0027)
[2024-06-05 17:57:28,317][10367] Updated weights for policy 0, policy_version 1090 (0.0033)
[2024-06-05 17:57:28,920][10130] Fps is (10 sec: 49152.3, 60 sec: 48879.0, 300 sec: 49096.5). Total num frames: 17874944. Throughput: 0: 49008.6. Samples: 17920660. Policy #0 lag: (min: 0.0, avg: 9.8, max: 21.0)
[2024-06-05 17:57:28,920][10130] Avg episode reward: [(0, '0.000')]
[2024-06-05 17:57:31,394][10367] Updated weights for policy 0, policy_version 1100 (0.0024)
[2024-06-05 17:57:33,920][10130] Fps is (10 sec: 54066.3, 60 sec: 49151.9, 300 sec: 49263.1). Total num frames: 18153472. Throughput: 0: 49391.4. Samples: 18227840. Policy #0 lag: (min: 0.0, avg: 10.5, max: 20.0)
[2024-06-05 17:57:33,920][10130] Avg episode reward: [(0, '0.000')]
[2024-06-05 17:57:34,708][10367] Updated weights for policy 0, policy_version 1110 (0.0031)
[2024-06-05 17:57:37,980][10367] Updated weights for policy 0, policy_version 1120 (0.0023)
[2024-06-05 17:57:38,920][10130] Fps is (10 sec: 49151.6, 60 sec: 48605.8, 300 sec: 49207.6). Total num frames: 18366464. Throughput: 0: 49428.1. Samples: 18522580. Policy #0 lag: (min: 0.0, avg: 9.3, max: 21.0)
[2024-06-05 17:57:38,920][10130] Avg episode reward: [(0, '0.000')]
[2024-06-05 17:57:41,318][10367] Updated weights for policy 0, policy_version 1130 (0.0031)
[2024-06-05 17:57:43,920][10130] Fps is (10 sec: 45876.0, 60 sec: 49152.2, 300 sec: 49263.2). Total num frames: 18612224. Throughput: 0: 49285.8. Samples: 18674780. Policy #0 lag: (min: 0.0, avg: 9.2, max: 21.0)
[2024-06-05 17:57:43,920][10130] Avg episode reward: [(0, '0.000')]
[2024-06-05 17:57:44,611][10367] Updated weights for policy 0, policy_version 1140 (0.0023)
[2024-06-05 17:57:47,744][10367] Updated weights for policy 0, policy_version 1150 (0.0027)
[2024-06-05 17:57:48,920][10130] Fps is (10 sec: 50790.5, 60 sec: 49152.0, 300 sec: 49429.7). Total num frames: 18874368. Throughput: 0: 49469.7. Samples: 18969660. Policy #0 lag: (min: 0.0, avg: 11.8, max: 21.0)
[2024-06-05 17:57:48,920][10130] Avg episode reward: [(0, '0.000')]
[2024-06-05 17:57:50,904][10367] Updated weights for policy 0, policy_version 1160 (0.0018)
[2024-06-05 17:57:53,920][10130] Fps is (10 sec: 54066.1, 60 sec: 49425.1, 300 sec: 49540.8). Total num frames: 19152896. Throughput: 0: 49701.7. Samples: 19272760. Policy #0 lag: (min: 1.0, avg: 11.0, max: 21.0)
[2024-06-05 17:57:53,921][10130] Avg episode reward: [(0, '0.000')]
[2024-06-05 17:57:54,257][10367] Updated weights for policy 0, policy_version 1170 (0.0027)
[2024-06-05 17:57:57,476][10367] Updated weights for policy 0, policy_version 1180 (0.0022)
[2024-06-05 17:57:58,920][10130] Fps is (10 sec: 54066.9, 60 sec: 49971.1, 300 sec: 49651.9). Total num frames: 19415040. Throughput: 0: 49994.8. Samples: 19437340. Policy #0 lag: (min: 0.0, avg: 9.9, max: 23.0)
[2024-06-05 17:57:58,920][10130] Avg episode reward: [(0, '0.000')]
[2024-06-05 17:58:00,727][10367] Updated weights for policy 0, policy_version 1190 (0.0026)
[2024-06-05 17:58:03,887][10367] Updated weights for policy 0, policy_version 1200 (0.0025)
[2024-06-05 17:58:03,924][10130] Fps is (10 sec: 50772.4, 60 sec: 50241.1, 300 sec: 49595.7). Total num frames: 19660800. Throughput: 0: 50194.5. Samples: 19742480. Policy #0 lag: (min: 0.0, avg: 10.0, max: 21.0)
[2024-06-05 17:58:03,924][10130] Avg episode reward: [(0, '0.000')]
[2024-06-05 17:58:07,170][10367] Updated weights for policy 0, policy_version 1210 (0.0031)
[2024-06-05 17:58:08,920][10130] Fps is (10 sec: 47514.2, 60 sec: 49971.3, 300 sec: 49596.3). Total num frames: 19890176. Throughput: 0: 50157.3. Samples: 20038960. Policy #0 lag: (min: 0.0, avg: 10.0, max: 20.0)
[2024-06-05 17:58:08,920][10130] Avg episode reward: [(0, '0.000')]
[2024-06-05 17:58:10,743][10367] Updated weights for policy 0, policy_version 1220 (0.0028)
[2024-06-05 17:58:13,920][10130] Fps is (10 sec: 47531.0, 60 sec: 49701.3, 300 sec: 49596.3). Total num frames: 20135936. Throughput: 0: 50299.9. Samples: 20184160. Policy #0 lag: (min: 0.0, avg: 11.6, max: 21.0)
[2024-06-05 17:58:13,920][10130] Avg episode reward: [(0, '0.000')]
[2024-06-05 17:58:13,930][10367] Updated weights for policy 0, policy_version 1230 (0.0027)
[2024-06-05 17:58:14,996][10347] Signal inference workers to stop experience collection... (250 times)
[2024-06-05 17:58:15,028][10367] InferenceWorker_p0-w0: stopping experience collection (250 times)
[2024-06-05 17:58:15,045][10347] Signal inference workers to resume experience collection... (250 times)
[2024-06-05 17:58:15,047][10367] InferenceWorker_p0-w0: resuming experience collection (250 times)
[2024-06-05 17:58:17,198][10367] Updated weights for policy 0, policy_version 1240 (0.0031)
[2024-06-05 17:58:18,920][10130] Fps is (10 sec: 52428.1, 60 sec: 50517.2, 300 sec: 49762.9). Total num frames: 20414464. Throughput: 0: 50094.2. Samples: 20482080. Policy #0 lag: (min: 0.0, avg: 8.9, max: 21.0)
[2024-06-05 17:58:18,920][10130] Avg episode reward: [(0, '0.000')]
[2024-06-05 17:58:20,700][10367] Updated weights for policy 0, policy_version 1250 (0.0035)
[2024-06-05 17:58:23,884][10367] Updated weights for policy 0, policy_version 1260 (0.0037)
[2024-06-05 17:58:23,920][10130] Fps is (10 sec: 50790.9, 60 sec: 50517.3, 300 sec: 49596.3). Total num frames: 20643840. Throughput: 0: 50177.4. Samples: 20780560. Policy #0 lag: (min: 0.0, avg: 10.1, max: 22.0)
[2024-06-05 17:58:23,920][10130] Avg episode reward: [(0, '0.000')]
[2024-06-05 17:58:27,134][10367] Updated weights for policy 0, policy_version 1270 (0.0029)
[2024-06-05 17:58:28,920][10130] Fps is (10 sec: 47513.8, 60 sec: 50244.2, 300 sec: 49541.0). Total num frames: 20889600. Throughput: 0: 49994.1. Samples: 20924520. Policy #0 lag: (min: 0.0, avg: 9.2, max: 21.0)
[2024-06-05 17:58:28,920][10130] Avg episode reward: [(0, '0.000')]
[2024-06-05 17:58:30,349][10367] Updated weights for policy 0, policy_version 1280 (0.0026)
[2024-06-05 17:58:33,689][10367] Updated weights for policy 0, policy_version 1290 (0.0018)
[2024-06-05 17:58:33,920][10130] Fps is (10 sec: 49151.8, 60 sec: 49698.2, 300 sec: 49540.8). Total num frames: 21135360. Throughput: 0: 49958.3. Samples: 21217780. Policy #0 lag: (min: 0.0, avg: 9.4, max: 22.0)
[2024-06-05 17:58:33,920][10130] Avg episode reward: [(0, '0.000')]
[2024-06-05 17:58:33,936][10347] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000001290_21135360.pth...
[2024-06-05 17:58:33,977][10347] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000000565_9256960.pth
[2024-06-05 17:58:37,317][10367] Updated weights for policy 0, policy_version 1300 (0.0030)
[2024-06-05 17:58:38,920][10130] Fps is (10 sec: 50790.6, 60 sec: 50517.4, 300 sec: 49651.9). Total num frames: 21397504. Throughput: 0: 49891.7. Samples: 21517880. Policy #0 lag: (min: 1.0, avg: 9.9, max: 22.0)
[2024-06-05 17:58:38,920][10130] Avg episode reward: [(0, '0.000')]
[2024-06-05 17:58:40,333][10367] Updated weights for policy 0, policy_version 1310 (0.0019)
[2024-06-05 17:58:43,770][10367] Updated weights for policy 0, policy_version 1320 (0.0031)
[2024-06-05 17:58:43,923][10130] Fps is (10 sec: 49133.9, 60 sec: 50241.1, 300 sec: 49540.7). Total num frames: 21626880. Throughput: 0: 49678.3. Samples: 21673040. Policy #0 lag: (min: 0.0, avg: 10.0, max: 22.0)
[2024-06-05 17:58:43,924][10130] Avg episode reward: [(0, '0.000')]
[2024-06-05 17:58:47,157][10367] Updated weights for policy 0, policy_version 1330 (0.0028)
[2024-06-05 17:58:48,920][10130] Fps is (10 sec: 47513.5, 60 sec: 49971.2, 300 sec: 49485.2). Total num frames: 21872640. Throughput: 0: 49371.1. Samples: 21964000. Policy #0 lag: (min: 0.0, avg: 12.0, max: 22.0)
[2024-06-05 17:58:48,920][10130] Avg episode reward: [(0, '0.000')]
[2024-06-05 17:58:50,391][10367] Updated weights for policy 0, policy_version 1340 (0.0035)
[2024-06-05 17:58:53,502][10367] Updated weights for policy 0, policy_version 1350 (0.0032)
[2024-06-05 17:58:53,920][10130] Fps is (10 sec: 49169.4, 60 sec: 49425.1, 300 sec: 49485.2). Total num frames: 22118400. Throughput: 0: 49540.2. Samples: 22268280. Policy #0 lag: (min: 0.0, avg: 11.9, max: 23.0)
[2024-06-05 17:58:53,921][10130] Avg episode reward: [(0, '0.000')]
[2024-06-05 17:58:57,011][10367] Updated weights for policy 0, policy_version 1360 (0.0025)
[2024-06-05 17:58:58,920][10130] Fps is (10 sec: 54066.7, 60 sec: 49971.2, 300 sec: 49707.4). Total num frames: 22413312. Throughput: 0: 49592.4. Samples: 22415820.
Policy #0 lag: (min: 0.0, avg: 10.0, max: 21.0) [2024-06-05 17:58:58,921][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:58:59,949][10367] Updated weights for policy 0, policy_version 1370 (0.0033) [2024-06-05 17:59:03,674][10367] Updated weights for policy 0, policy_version 1380 (0.0025) [2024-06-05 17:59:03,920][10130] Fps is (10 sec: 49152.2, 60 sec: 49155.0, 300 sec: 49540.8). Total num frames: 22609920. Throughput: 0: 49478.7. Samples: 22708620. Policy #0 lag: (min: 0.0, avg: 9.4, max: 20.0) [2024-06-05 17:59:03,921][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:59:06,646][10367] Updated weights for policy 0, policy_version 1390 (0.0028) [2024-06-05 17:59:08,920][10130] Fps is (10 sec: 44237.4, 60 sec: 49425.0, 300 sec: 49429.7). Total num frames: 22855680. Throughput: 0: 49479.1. Samples: 23007120. Policy #0 lag: (min: 0.0, avg: 12.6, max: 22.0) [2024-06-05 17:59:08,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:59:10,210][10367] Updated weights for policy 0, policy_version 1400 (0.0024) [2024-06-05 17:59:13,225][10367] Updated weights for policy 0, policy_version 1410 (0.0027) [2024-06-05 17:59:13,920][10130] Fps is (10 sec: 49152.1, 60 sec: 49425.1, 300 sec: 49486.4). Total num frames: 23101440. Throughput: 0: 49588.9. Samples: 23156020. Policy #0 lag: (min: 1.0, avg: 12.2, max: 20.0) [2024-06-05 17:59:13,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:59:16,963][10367] Updated weights for policy 0, policy_version 1420 (0.0031) [2024-06-05 17:59:17,834][10347] Signal inference workers to stop experience collection... (300 times) [2024-06-05 17:59:17,834][10347] Signal inference workers to resume experience collection... 
(300 times) [2024-06-05 17:59:17,850][10367] InferenceWorker_p0-w0: stopping experience collection (300 times) [2024-06-05 17:59:17,850][10367] InferenceWorker_p0-w0: resuming experience collection (300 times) [2024-06-05 17:59:18,920][10130] Fps is (10 sec: 54066.9, 60 sec: 49698.2, 300 sec: 49762.9). Total num frames: 23396352. Throughput: 0: 49784.8. Samples: 23458100. Policy #0 lag: (min: 0.0, avg: 10.7, max: 22.0) [2024-06-05 17:59:18,921][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:59:19,763][10367] Updated weights for policy 0, policy_version 1430 (0.0025) [2024-06-05 17:59:23,763][10367] Updated weights for policy 0, policy_version 1440 (0.0024) [2024-06-05 17:59:23,920][10130] Fps is (10 sec: 49152.6, 60 sec: 49152.0, 300 sec: 49540.8). Total num frames: 23592960. Throughput: 0: 49796.1. Samples: 23758700. Policy #0 lag: (min: 0.0, avg: 9.2, max: 22.0) [2024-06-05 17:59:23,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:59:26,070][10367] Updated weights for policy 0, policy_version 1450 (0.0034) [2024-06-05 17:59:28,920][10130] Fps is (10 sec: 44237.0, 60 sec: 49152.0, 300 sec: 49429.7). Total num frames: 23838720. Throughput: 0: 49454.7. Samples: 23898320. Policy #0 lag: (min: 0.0, avg: 9.9, max: 20.0) [2024-06-05 17:59:28,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:59:30,156][10367] Updated weights for policy 0, policy_version 1460 (0.0032) [2024-06-05 17:59:32,910][10367] Updated weights for policy 0, policy_version 1470 (0.0033) [2024-06-05 17:59:33,920][10130] Fps is (10 sec: 49152.0, 60 sec: 49152.0, 300 sec: 49485.3). Total num frames: 24084480. Throughput: 0: 49504.1. Samples: 24191680. Policy #0 lag: (min: 0.0, avg: 9.2, max: 21.0) [2024-06-05 17:59:33,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:59:36,741][10367] Updated weights for policy 0, policy_version 1480 (0.0023) [2024-06-05 17:59:38,920][10130] Fps is (10 sec: 54067.7, 60 sec: 49698.2, 300 sec: 49707.4). 
Total num frames: 24379392. Throughput: 0: 49202.5. Samples: 24482380. Policy #0 lag: (min: 0.0, avg: 8.5, max: 20.0) [2024-06-05 17:59:38,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:59:39,854][10367] Updated weights for policy 0, policy_version 1490 (0.0033) [2024-06-05 17:59:43,805][10367] Updated weights for policy 0, policy_version 1500 (0.0037) [2024-06-05 17:59:43,920][10130] Fps is (10 sec: 49151.4, 60 sec: 49154.9, 300 sec: 49485.2). Total num frames: 24576000. Throughput: 0: 49234.3. Samples: 24631360. Policy #0 lag: (min: 0.0, avg: 8.7, max: 21.0) [2024-06-05 17:59:43,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:59:46,414][10367] Updated weights for policy 0, policy_version 1510 (0.0024) [2024-06-05 17:59:48,920][10130] Fps is (10 sec: 44236.5, 60 sec: 49152.0, 300 sec: 49374.1). Total num frames: 24821760. Throughput: 0: 49402.3. Samples: 24931720. Policy #0 lag: (min: 0.0, avg: 9.4, max: 20.0) [2024-06-05 17:59:48,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:59:50,398][10367] Updated weights for policy 0, policy_version 1520 (0.0030) [2024-06-05 17:59:52,901][10367] Updated weights for policy 0, policy_version 1530 (0.0028) [2024-06-05 17:59:53,920][10130] Fps is (10 sec: 49152.0, 60 sec: 49152.0, 300 sec: 49429.7). Total num frames: 25067520. Throughput: 0: 49152.3. Samples: 25218980. Policy #0 lag: (min: 0.0, avg: 8.7, max: 21.0) [2024-06-05 17:59:53,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:59:57,188][10367] Updated weights for policy 0, policy_version 1540 (0.0020) [2024-06-05 17:59:58,920][10130] Fps is (10 sec: 54066.4, 60 sec: 49152.0, 300 sec: 49652.4). Total num frames: 25362432. Throughput: 0: 49368.4. Samples: 25377600. 
Policy #0 lag: (min: 0.0, avg: 8.5, max: 21.0) [2024-06-05 17:59:58,921][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 17:59:59,762][10367] Updated weights for policy 0, policy_version 1550 (0.0041) [2024-06-05 18:00:03,643][10367] Updated weights for policy 0, policy_version 1560 (0.0031) [2024-06-05 18:00:03,920][10130] Fps is (10 sec: 49152.1, 60 sec: 49152.0, 300 sec: 49429.7). Total num frames: 25559040. Throughput: 0: 49129.8. Samples: 25668940. Policy #0 lag: (min: 0.0, avg: 8.9, max: 22.0) [2024-06-05 18:00:03,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 18:00:06,513][10367] Updated weights for policy 0, policy_version 1570 (0.0027) [2024-06-05 18:00:08,920][10130] Fps is (10 sec: 44237.5, 60 sec: 49152.0, 300 sec: 49429.7). Total num frames: 25804800. Throughput: 0: 48899.5. Samples: 25959180. Policy #0 lag: (min: 0.0, avg: 11.5, max: 21.0) [2024-06-05 18:00:08,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 18:00:10,281][10367] Updated weights for policy 0, policy_version 1580 (0.0024) [2024-06-05 18:00:12,922][10367] Updated weights for policy 0, policy_version 1590 (0.0026) [2024-06-05 18:00:13,920][10130] Fps is (10 sec: 49152.0, 60 sec: 49152.0, 300 sec: 49429.7). Total num frames: 26050560. Throughput: 0: 49167.0. Samples: 26110840. Policy #0 lag: (min: 0.0, avg: 12.1, max: 21.0) [2024-06-05 18:00:13,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 18:00:16,842][10367] Updated weights for policy 0, policy_version 1600 (0.0029) [2024-06-05 18:00:17,792][10347] Signal inference workers to stop experience collection... (350 times) [2024-06-05 18:00:17,808][10367] InferenceWorker_p0-w0: stopping experience collection (350 times) [2024-06-05 18:00:17,847][10347] Signal inference workers to resume experience collection... 
(350 times) [2024-06-05 18:00:17,847][10367] InferenceWorker_p0-w0: resuming experience collection (350 times) [2024-06-05 18:00:18,920][10130] Fps is (10 sec: 54067.0, 60 sec: 49152.0, 300 sec: 49651.8). Total num frames: 26345472. Throughput: 0: 49332.8. Samples: 26411660. Policy #0 lag: (min: 0.0, avg: 8.4, max: 21.0) [2024-06-05 18:00:18,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 18:00:19,403][10367] Updated weights for policy 0, policy_version 1610 (0.0028) [2024-06-05 18:00:23,838][10367] Updated weights for policy 0, policy_version 1620 (0.0031) [2024-06-05 18:00:23,920][10130] Fps is (10 sec: 49152.3, 60 sec: 49151.9, 300 sec: 49429.7). Total num frames: 26542080. Throughput: 0: 49213.7. Samples: 26697000. Policy #0 lag: (min: 0.0, avg: 8.8, max: 22.0) [2024-06-05 18:00:23,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 18:00:26,798][10367] Updated weights for policy 0, policy_version 1630 (0.0032) [2024-06-05 18:00:28,920][10130] Fps is (10 sec: 44236.5, 60 sec: 49151.9, 300 sec: 49374.1). Total num frames: 26787840. Throughput: 0: 48948.9. Samples: 26834060. Policy #0 lag: (min: 0.0, avg: 12.8, max: 21.0) [2024-06-05 18:00:28,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 18:00:30,249][10367] Updated weights for policy 0, policy_version 1640 (0.0027) [2024-06-05 18:00:33,495][10367] Updated weights for policy 0, policy_version 1650 (0.0029) [2024-06-05 18:00:33,924][10130] Fps is (10 sec: 50771.5, 60 sec: 49422.0, 300 sec: 49429.1). Total num frames: 27049984. Throughput: 0: 48867.9. Samples: 27130960. Policy #0 lag: (min: 0.0, avg: 13.0, max: 23.0) [2024-06-05 18:00:33,924][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 18:00:33,931][10347] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000001651_27049984.pth... 
[2024-06-05 18:00:33,979][10347] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000000929_15220736.pth [2024-06-05 18:00:36,929][10367] Updated weights for policy 0, policy_version 1660 (0.0027) [2024-06-05 18:00:38,920][10130] Fps is (10 sec: 54066.9, 60 sec: 49151.8, 300 sec: 49596.3). Total num frames: 27328512. Throughput: 0: 49280.0. Samples: 27436580. Policy #0 lag: (min: 0.0, avg: 12.6, max: 23.0) [2024-06-05 18:00:38,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 18:00:39,919][10367] Updated weights for policy 0, policy_version 1670 (0.0024) [2024-06-05 18:00:43,665][10367] Updated weights for policy 0, policy_version 1680 (0.0024) [2024-06-05 18:00:43,920][10130] Fps is (10 sec: 49169.8, 60 sec: 49425.1, 300 sec: 49429.7). Total num frames: 27541504. Throughput: 0: 49076.5. Samples: 27586040. Policy #0 lag: (min: 0.0, avg: 10.7, max: 23.0) [2024-06-05 18:00:43,922][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 18:00:46,369][10367] Updated weights for policy 0, policy_version 1690 (0.0033) [2024-06-05 18:00:48,920][10130] Fps is (10 sec: 45875.6, 60 sec: 49425.0, 300 sec: 49374.2). Total num frames: 27787264. Throughput: 0: 49141.3. Samples: 27880300. Policy #0 lag: (min: 0.0, avg: 9.8, max: 21.0) [2024-06-05 18:00:48,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 18:00:50,325][10367] Updated weights for policy 0, policy_version 1700 (0.0030) [2024-06-05 18:00:53,080][10367] Updated weights for policy 0, policy_version 1710 (0.0029) [2024-06-05 18:00:53,920][10130] Fps is (10 sec: 49152.2, 60 sec: 49425.1, 300 sec: 49429.7). Total num frames: 28033024. Throughput: 0: 49212.4. Samples: 28173740. Policy #0 lag: (min: 1.0, avg: 12.2, max: 23.0) [2024-06-05 18:00:53,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 18:00:56,974][10367] Updated weights for policy 0, policy_version 1720 (0.0027) [2024-06-05 18:00:58,920][10130] Fps is (10 sec: 52429.3, 60 sec: 49152.2, 300 sec: 49540.8). 
Total num frames: 28311552. Throughput: 0: 49330.8. Samples: 28330720. Policy #0 lag: (min: 0.0, avg: 12.1, max: 21.0) [2024-06-05 18:00:58,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 18:00:59,827][10367] Updated weights for policy 0, policy_version 1730 (0.0023) [2024-06-05 18:01:03,478][10367] Updated weights for policy 0, policy_version 1740 (0.0031) [2024-06-05 18:01:03,920][10130] Fps is (10 sec: 49152.2, 60 sec: 49425.1, 300 sec: 49430.3). Total num frames: 28524544. Throughput: 0: 49108.0. Samples: 28621520. Policy #0 lag: (min: 1.0, avg: 10.0, max: 22.0) [2024-06-05 18:01:03,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 18:01:06,441][10367] Updated weights for policy 0, policy_version 1750 (0.0031) [2024-06-05 18:01:08,920][10130] Fps is (10 sec: 45874.5, 60 sec: 49425.0, 300 sec: 49374.1). Total num frames: 28770304. Throughput: 0: 49319.4. Samples: 28916380. Policy #0 lag: (min: 0.0, avg: 9.6, max: 23.0) [2024-06-05 18:01:08,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 18:01:10,430][10367] Updated weights for policy 0, policy_version 1760 (0.0029) [2024-06-05 18:01:12,973][10367] Updated weights for policy 0, policy_version 1770 (0.0031) [2024-06-05 18:01:13,920][10130] Fps is (10 sec: 49152.2, 60 sec: 49425.2, 300 sec: 49374.2). Total num frames: 29016064. Throughput: 0: 49395.2. Samples: 29056840. Policy #0 lag: (min: 1.0, avg: 10.9, max: 23.0) [2024-06-05 18:01:13,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 18:01:16,942][10367] Updated weights for policy 0, policy_version 1780 (0.0033) [2024-06-05 18:01:18,397][10347] Signal inference workers to stop experience collection... (400 times) [2024-06-05 18:01:18,397][10347] Signal inference workers to resume experience collection... 
(400 times) [2024-06-05 18:01:18,413][10367] InferenceWorker_p0-w0: stopping experience collection (400 times) [2024-06-05 18:01:18,443][10367] InferenceWorker_p0-w0: resuming experience collection (400 times) [2024-06-05 18:01:18,920][10130] Fps is (10 sec: 52428.4, 60 sec: 49151.9, 300 sec: 49485.2). Total num frames: 29294592. Throughput: 0: 49671.9. Samples: 29366020. Policy #0 lag: (min: 0.0, avg: 11.0, max: 21.0) [2024-06-05 18:01:18,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 18:01:19,697][10367] Updated weights for policy 0, policy_version 1790 (0.0024) [2024-06-05 18:01:23,623][10367] Updated weights for policy 0, policy_version 1800 (0.0040) [2024-06-05 18:01:23,920][10130] Fps is (10 sec: 49151.7, 60 sec: 49425.1, 300 sec: 49374.2). Total num frames: 29507584. Throughput: 0: 49392.1. Samples: 29659220. Policy #0 lag: (min: 0.0, avg: 9.7, max: 21.0) [2024-06-05 18:01:23,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 18:01:26,490][10367] Updated weights for policy 0, policy_version 1810 (0.0033) [2024-06-05 18:01:28,920][10130] Fps is (10 sec: 45876.3, 60 sec: 49425.2, 300 sec: 49318.6). Total num frames: 29753344. Throughput: 0: 49118.4. Samples: 29796360. Policy #0 lag: (min: 1.0, avg: 10.2, max: 21.0) [2024-06-05 18:01:28,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 18:01:30,147][10367] Updated weights for policy 0, policy_version 1820 (0.0037) [2024-06-05 18:01:32,953][10367] Updated weights for policy 0, policy_version 1830 (0.0027) [2024-06-05 18:01:33,920][10130] Fps is (10 sec: 49151.7, 60 sec: 49155.0, 300 sec: 49318.6). Total num frames: 29999104. Throughput: 0: 49068.4. Samples: 30088380. Policy #0 lag: (min: 0.0, avg: 10.1, max: 22.0) [2024-06-05 18:01:33,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 18:01:36,871][10367] Updated weights for policy 0, policy_version 1840 (0.0027) [2024-06-05 18:01:38,920][10130] Fps is (10 sec: 52428.3, 60 sec: 49152.1, 300 sec: 49540.8). 
Total num frames: 30277632. Throughput: 0: 49342.7. Samples: 30394160. Policy #0 lag: (min: 0.0, avg: 9.6, max: 21.0) [2024-06-05 18:01:38,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 18:01:39,550][10367] Updated weights for policy 0, policy_version 1850 (0.0020) [2024-06-05 18:01:43,447][10367] Updated weights for policy 0, policy_version 1860 (0.0037) [2024-06-05 18:01:43,920][10130] Fps is (10 sec: 49152.5, 60 sec: 49152.1, 300 sec: 49374.2). Total num frames: 30490624. Throughput: 0: 49143.1. Samples: 30542160. Policy #0 lag: (min: 0.0, avg: 9.7, max: 22.0) [2024-06-05 18:01:43,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 18:01:46,282][10367] Updated weights for policy 0, policy_version 1870 (0.0025) [2024-06-05 18:01:48,920][10130] Fps is (10 sec: 45874.7, 60 sec: 49151.9, 300 sec: 49318.6). Total num frames: 30736384. Throughput: 0: 49293.2. Samples: 30839720. Policy #0 lag: (min: 0.0, avg: 10.5, max: 22.0) [2024-06-05 18:01:48,921][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 18:01:50,137][10367] Updated weights for policy 0, policy_version 1880 (0.0028) [2024-06-05 18:01:52,703][10367] Updated weights for policy 0, policy_version 1890 (0.0027) [2024-06-05 18:01:53,920][10130] Fps is (10 sec: 50790.4, 60 sec: 49425.1, 300 sec: 49429.7). Total num frames: 30998528. Throughput: 0: 49393.0. Samples: 31139060. Policy #0 lag: (min: 0.0, avg: 10.2, max: 21.0) [2024-06-05 18:01:53,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 18:01:56,685][10367] Updated weights for policy 0, policy_version 1900 (0.0028) [2024-06-05 18:01:58,920][10130] Fps is (10 sec: 52429.5, 60 sec: 49152.0, 300 sec: 49540.8). Total num frames: 31260672. Throughput: 0: 49658.6. Samples: 31291480. 
Policy #0 lag: (min: 0.0, avg: 10.2, max: 22.0) [2024-06-05 18:01:58,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 18:01:59,543][10367] Updated weights for policy 0, policy_version 1910 (0.0030) [2024-06-05 18:02:03,230][10367] Updated weights for policy 0, policy_version 1920 (0.0024) [2024-06-05 18:02:03,920][10130] Fps is (10 sec: 49151.9, 60 sec: 49425.1, 300 sec: 49485.2). Total num frames: 31490048. Throughput: 0: 49458.0. Samples: 31591620. Policy #0 lag: (min: 1.0, avg: 10.0, max: 22.0) [2024-06-05 18:02:03,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 18:02:05,962][10367] Updated weights for policy 0, policy_version 1930 (0.0028) [2024-06-05 18:02:08,920][10130] Fps is (10 sec: 47513.9, 60 sec: 49425.2, 300 sec: 49430.3). Total num frames: 31735808. Throughput: 0: 49524.1. Samples: 31887800. Policy #0 lag: (min: 0.0, avg: 9.9, max: 21.0) [2024-06-05 18:02:08,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 18:02:09,825][10367] Updated weights for policy 0, policy_version 1940 (0.0031) [2024-06-05 18:02:12,649][10367] Updated weights for policy 0, policy_version 1950 (0.0030) [2024-06-05 18:02:13,920][10130] Fps is (10 sec: 50790.6, 60 sec: 49698.1, 300 sec: 49540.8). Total num frames: 31997952. Throughput: 0: 49704.0. Samples: 32033040. Policy #0 lag: (min: 1.0, avg: 10.8, max: 22.0) [2024-06-05 18:02:13,920][10130] Avg episode reward: [(0, '0.001')] [2024-06-05 18:02:16,333][10367] Updated weights for policy 0, policy_version 1960 (0.0036) [2024-06-05 18:02:18,920][10130] Fps is (10 sec: 50789.8, 60 sec: 49152.1, 300 sec: 49596.3). Total num frames: 32243712. Throughput: 0: 49897.8. Samples: 32333780. 
Policy #0 lag: (min: 0.0, avg: 11.2, max: 22.0) [2024-06-05 18:02:18,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 18:02:19,162][10367] Updated weights for policy 0, policy_version 1970 (0.0033) [2024-06-05 18:02:23,019][10367] Updated weights for policy 0, policy_version 1980 (0.0035) [2024-06-05 18:02:23,920][10130] Fps is (10 sec: 49151.5, 60 sec: 49698.1, 300 sec: 49540.7). Total num frames: 32489472. Throughput: 0: 49729.3. Samples: 32631980. Policy #0 lag: (min: 1.0, avg: 9.4, max: 20.0) [2024-06-05 18:02:23,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 18:02:25,790][10367] Updated weights for policy 0, policy_version 1990 (0.0027) [2024-06-05 18:02:27,475][10347] Signal inference workers to stop experience collection... (450 times) [2024-06-05 18:02:27,476][10347] Signal inference workers to resume experience collection... (450 times) [2024-06-05 18:02:27,489][10367] InferenceWorker_p0-w0: stopping experience collection (450 times) [2024-06-05 18:02:27,489][10367] InferenceWorker_p0-w0: resuming experience collection (450 times) [2024-06-05 18:02:28,920][10130] Fps is (10 sec: 49152.5, 60 sec: 49698.1, 300 sec: 49429.7). Total num frames: 32735232. Throughput: 0: 49710.7. Samples: 32779140. Policy #0 lag: (min: 0.0, avg: 10.4, max: 22.0) [2024-06-05 18:02:28,920][10130] Avg episode reward: [(0, '0.001')] [2024-06-05 18:02:29,438][10367] Updated weights for policy 0, policy_version 2000 (0.0027) [2024-06-05 18:02:32,214][10367] Updated weights for policy 0, policy_version 2010 (0.0026) [2024-06-05 18:02:33,920][10130] Fps is (10 sec: 49152.6, 60 sec: 49698.2, 300 sec: 49540.8). Total num frames: 32980992. Throughput: 0: 49734.9. Samples: 33077780. Policy #0 lag: (min: 0.0, avg: 11.1, max: 21.0) [2024-06-05 18:02:33,920][10130] Avg episode reward: [(0, '0.001')] [2024-06-05 18:02:33,931][10347] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000002013_32980992.pth... 
[2024-06-05 18:02:33,978][10347] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000001290_21135360.pth [2024-06-05 18:02:36,034][10367] Updated weights for policy 0, policy_version 2020 (0.0020) [2024-06-05 18:02:38,920][10130] Fps is (10 sec: 50789.9, 60 sec: 49425.0, 300 sec: 49596.3). Total num frames: 33243136. Throughput: 0: 49835.9. Samples: 33381680. Policy #0 lag: (min: 0.0, avg: 10.7, max: 20.0) [2024-06-05 18:02:38,921][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 18:02:38,999][10367] Updated weights for policy 0, policy_version 2030 (0.0029) [2024-06-05 18:02:42,514][10367] Updated weights for policy 0, policy_version 2040 (0.0026) [2024-06-05 18:02:43,920][10130] Fps is (10 sec: 52428.5, 60 sec: 50244.2, 300 sec: 49596.3). Total num frames: 33505280. Throughput: 0: 49856.4. Samples: 33535020. Policy #0 lag: (min: 0.0, avg: 10.6, max: 23.0) [2024-06-05 18:02:43,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 18:02:45,592][10367] Updated weights for policy 0, policy_version 2050 (0.0035) [2024-06-05 18:02:48,920][10130] Fps is (10 sec: 47514.1, 60 sec: 49698.3, 300 sec: 49374.2). Total num frames: 33718272. Throughput: 0: 49959.2. Samples: 33839780. Policy #0 lag: (min: 0.0, avg: 8.9, max: 21.0) [2024-06-05 18:02:48,920][10130] Avg episode reward: [(0, '0.001')] [2024-06-05 18:02:49,191][10367] Updated weights for policy 0, policy_version 2060 (0.0026) [2024-06-05 18:02:52,023][10367] Updated weights for policy 0, policy_version 2070 (0.0035) [2024-06-05 18:02:53,920][10130] Fps is (10 sec: 47513.3, 60 sec: 49698.1, 300 sec: 49374.2). Total num frames: 33980416. Throughput: 0: 49815.0. Samples: 34129480. 
Policy #0 lag: (min: 0.0, avg: 11.1, max: 21.0) [2024-06-05 18:02:53,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 18:02:55,906][10367] Updated weights for policy 0, policy_version 2080 (0.0025) [2024-06-05 18:02:58,544][10367] Updated weights for policy 0, policy_version 2090 (0.0027) [2024-06-05 18:02:58,920][10130] Fps is (10 sec: 52428.3, 60 sec: 49698.1, 300 sec: 49430.3). Total num frames: 34242560. Throughput: 0: 49839.5. Samples: 34275820. Policy #0 lag: (min: 0.0, avg: 8.5, max: 22.0) [2024-06-05 18:02:58,921][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 18:03:02,392][10367] Updated weights for policy 0, policy_version 2100 (0.0034) [2024-06-05 18:03:03,920][10130] Fps is (10 sec: 50791.0, 60 sec: 49971.2, 300 sec: 49485.2). Total num frames: 34488320. Throughput: 0: 49993.5. Samples: 34583480. Policy #0 lag: (min: 2.0, avg: 10.0, max: 22.0) [2024-06-05 18:03:03,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 18:03:05,225][10367] Updated weights for policy 0, policy_version 2110 (0.0025) [2024-06-05 18:03:08,920][10130] Fps is (10 sec: 47513.8, 60 sec: 49698.1, 300 sec: 49429.7). Total num frames: 34717696. Throughput: 0: 49885.0. Samples: 34876800. Policy #0 lag: (min: 0.0, avg: 11.8, max: 22.0) [2024-06-05 18:03:08,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 18:03:08,970][10367] Updated weights for policy 0, policy_version 2120 (0.0030) [2024-06-05 18:03:11,995][10367] Updated weights for policy 0, policy_version 2130 (0.0029) [2024-06-05 18:03:13,920][10130] Fps is (10 sec: 45875.1, 60 sec: 49152.0, 300 sec: 49263.1). Total num frames: 34947072. Throughput: 0: 49529.8. Samples: 35007980. 
Policy #0 lag: (min: 0.0, avg: 11.1, max: 25.0) [2024-06-05 18:03:13,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 18:03:15,790][10367] Updated weights for policy 0, policy_version 2140 (0.0028) [2024-06-05 18:03:18,580][10367] Updated weights for policy 0, policy_version 2150 (0.0024) [2024-06-05 18:03:18,920][10130] Fps is (10 sec: 50790.0, 60 sec: 49698.1, 300 sec: 49429.7). Total num frames: 35225600. Throughput: 0: 49502.1. Samples: 35305380. Policy #0 lag: (min: 0.0, avg: 10.9, max: 21.0) [2024-06-05 18:03:18,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 18:03:22,486][10367] Updated weights for policy 0, policy_version 2160 (0.0024) [2024-06-05 18:03:23,920][10130] Fps is (10 sec: 52429.2, 60 sec: 49698.3, 300 sec: 49429.7). Total num frames: 35471360. Throughput: 0: 49262.4. Samples: 35598480. Policy #0 lag: (min: 0.0, avg: 9.8, max: 22.0) [2024-06-05 18:03:23,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 18:03:25,211][10367] Updated weights for policy 0, policy_version 2170 (0.0028) [2024-06-05 18:03:28,911][10347] Signal inference workers to stop experience collection... (500 times) [2024-06-05 18:03:28,920][10130] Fps is (10 sec: 45875.2, 60 sec: 49151.9, 300 sec: 49318.6). Total num frames: 35684352. Throughput: 0: 49062.1. Samples: 35742820. Policy #0 lag: (min: 0.0, avg: 8.1, max: 20.0) [2024-06-05 18:03:28,920][10130] Avg episode reward: [(0, '0.000')] [2024-06-05 18:03:28,933][10367] InferenceWorker_p0-w0: stopping experience collection (500 times) [2024-06-05 18:03:28,965][10347] Signal inference workers to resume experience collection... 
(500 times)
[2024-06-05 18:03:28,966][10367] InferenceWorker_p0-w0: resuming experience collection (500 times)
[2024-06-05 18:03:29,099][10367] Updated weights for policy 0, policy_version 2180 (0.0029)
[2024-06-05 18:03:32,030][10367] Updated weights for policy 0, policy_version 2190 (0.0025)
[2024-06-05 18:03:33,920][10130] Fps is (10 sec: 47512.9, 60 sec: 49425.0, 300 sec: 49318.6). Total num frames: 35946496. Throughput: 0: 48762.1. Samples: 36034080. Policy #0 lag: (min: 0.0, avg: 11.6, max: 22.0)
[2024-06-05 18:03:33,920][10130] Avg episode reward: [(0, '0.001')]
[2024-06-05 18:03:35,753][10367] Updated weights for policy 0, policy_version 2200 (0.0029)
[2024-06-05 18:03:38,713][10367] Updated weights for policy 0, policy_version 2210 (0.0037)
[2024-06-05 18:03:38,920][10130] Fps is (10 sec: 52429.4, 60 sec: 49425.1, 300 sec: 49430.3). Total num frames: 36208640. Throughput: 0: 48962.3. Samples: 36332780. Policy #0 lag: (min: 0.0, avg: 11.5, max: 21.0)
[2024-06-05 18:03:38,920][10130] Avg episode reward: [(0, '0.001')]
[2024-06-05 18:03:42,419][10367] Updated weights for policy 0, policy_version 2220 (0.0033)
[2024-06-05 18:03:43,920][10130] Fps is (10 sec: 49152.2, 60 sec: 48879.0, 300 sec: 49374.2). Total num frames: 36438016. Throughput: 0: 49133.0. Samples: 36486800. Policy #0 lag: (min: 0.0, avg: 9.2, max: 22.0)
[2024-06-05 18:03:43,920][10130] Avg episode reward: [(0, '0.000')]
[2024-06-05 18:03:45,180][10367] Updated weights for policy 0, policy_version 2230 (0.0031)
[2024-06-05 18:03:48,920][10130] Fps is (10 sec: 47513.0, 60 sec: 49424.9, 300 sec: 49374.2). Total num frames: 36683776. Throughput: 0: 48884.3. Samples: 36783280. Policy #0 lag: (min: 0.0, avg: 10.6, max: 20.0)
[2024-06-05 18:03:48,921][10130] Avg episode reward: [(0, '0.001')]
[2024-06-05 18:03:49,061][10367] Updated weights for policy 0, policy_version 2240 (0.0027)
[2024-06-05 18:03:51,691][10367] Updated weights for policy 0, policy_version 2250 (0.0034)
[2024-06-05 18:03:53,920][10130] Fps is (10 sec: 49152.1, 60 sec: 49152.1, 300 sec: 49207.6). Total num frames: 36929536. Throughput: 0: 48767.2. Samples: 37071320. Policy #0 lag: (min: 0.0, avg: 11.3, max: 21.0)
[2024-06-05 18:03:53,920][10130] Avg episode reward: [(0, '0.000')]
[2024-06-05 18:03:55,665][10367] Updated weights for policy 0, policy_version 2260 (0.0024)
[2024-06-05 18:03:58,681][10367] Updated weights for policy 0, policy_version 2270 (0.0027)
[2024-06-05 18:03:58,920][10130] Fps is (10 sec: 50791.1, 60 sec: 49152.1, 300 sec: 49429.7). Total num frames: 37191680. Throughput: 0: 49253.3. Samples: 37224380. Policy #0 lag: (min: 0.0, avg: 10.5, max: 21.0)
[2024-06-05 18:03:58,920][10130] Avg episode reward: [(0, '0.000')]
[2024-06-05 18:04:02,227][10367] Updated weights for policy 0, policy_version 2280 (0.0029)
[2024-06-05 18:04:03,920][10130] Fps is (10 sec: 49151.9, 60 sec: 48878.9, 300 sec: 49374.2). Total num frames: 37421056. Throughput: 0: 49152.6. Samples: 37517240. Policy #0 lag: (min: 0.0, avg: 9.7, max: 22.0)
[2024-06-05 18:04:03,920][10130] Avg episode reward: [(0, '0.000')]
[2024-06-05 18:04:05,338][10367] Updated weights for policy 0, policy_version 2290 (0.0020)
[2024-06-05 18:04:08,784][10367] Updated weights for policy 0, policy_version 2300 (0.0026)
[2024-06-05 18:04:08,920][10130] Fps is (10 sec: 49152.2, 60 sec: 49425.1, 300 sec: 49429.7). Total num frames: 37683200. Throughput: 0: 49562.1. Samples: 37828780. Policy #0 lag: (min: 0.0, avg: 9.5, max: 21.0)
[2024-06-05 18:04:08,920][10130] Avg episode reward: [(0, '0.001')]
[2024-06-05 18:04:11,868][10367] Updated weights for policy 0, policy_version 2310 (0.0026)
[2024-06-05 18:04:13,920][10130] Fps is (10 sec: 49152.2, 60 sec: 49425.1, 300 sec: 49207.6). Total num frames: 37912576. Throughput: 0: 49419.3. Samples: 37966680. Policy #0 lag: (min: 0.0, avg: 11.1, max: 23.0)
[2024-06-05 18:04:13,920][10130] Avg episode reward: [(0, '0.001')]
[2024-06-05 18:04:15,387][10367] Updated weights for policy 0, policy_version 2320 (0.0031)
[2024-06-05 18:04:18,424][10367] Updated weights for policy 0, policy_version 2330 (0.0029)
[2024-06-05 18:04:18,920][10130] Fps is (10 sec: 50790.1, 60 sec: 49425.1, 300 sec: 49485.2). Total num frames: 38191104. Throughput: 0: 49635.6. Samples: 38267680. Policy #0 lag: (min: 0.0, avg: 11.3, max: 21.0)
[2024-06-05 18:04:18,920][10130] Avg episode reward: [(0, '0.001')]
[2024-06-05 18:04:22,051][10367] Updated weights for policy 0, policy_version 2340 (0.0026)
[2024-06-05 18:04:23,920][10130] Fps is (10 sec: 50790.0, 60 sec: 49151.9, 300 sec: 49429.7). Total num frames: 38420480. Throughput: 0: 49663.9. Samples: 38567660. Policy #0 lag: (min: 0.0, avg: 9.3, max: 21.0)
[2024-06-05 18:04:23,920][10130] Avg episode reward: [(0, '0.001')]
[2024-06-05 18:04:24,802][10367] Updated weights for policy 0, policy_version 2350 (0.0030)
[2024-06-05 18:04:28,844][10367] Updated weights for policy 0, policy_version 2360 (0.0029)
[2024-06-05 18:04:28,920][10130] Fps is (10 sec: 47513.5, 60 sec: 49698.2, 300 sec: 49429.7). Total num frames: 38666240. Throughput: 0: 49405.8. Samples: 38710060. Policy #0 lag: (min: 0.0, avg: 11.3, max: 21.0)
[2024-06-05 18:04:28,920][10130] Avg episode reward: [(0, '0.000')]
[2024-06-05 18:04:31,641][10367] Updated weights for policy 0, policy_version 2370 (0.0024)
[2024-06-05 18:04:33,920][10130] Fps is (10 sec: 47513.6, 60 sec: 49152.0, 300 sec: 49207.5). Total num frames: 38895616. Throughput: 0: 49290.3. Samples: 39001340. Policy #0 lag: (min: 0.0, avg: 10.7, max: 21.0)
[2024-06-05 18:04:33,920][10130] Avg episode reward: [(0, '0.000')]
[2024-06-05 18:04:33,942][10347] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000002374_38895616.pth...
[2024-06-05 18:04:34,006][10347] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000001651_27049984.pth
[2024-06-05 18:04:35,495][10367] Updated weights for policy 0, policy_version 2380 (0.0026)
[2024-06-05 18:04:38,614][10367] Updated weights for policy 0, policy_version 2390 (0.0025)
[2024-06-05 18:04:38,920][10130] Fps is (10 sec: 49152.3, 60 sec: 49152.0, 300 sec: 49429.7). Total num frames: 39157760. Throughput: 0: 49308.5. Samples: 39290200. Policy #0 lag: (min: 0.0, avg: 9.3, max: 21.0)
[2024-06-05 18:04:38,920][10130] Avg episode reward: [(0, '0.001')]
[2024-06-05 18:04:38,956][10347] Saving new best policy, reward=0.001!
[2024-06-05 18:04:42,114][10367] Updated weights for policy 0, policy_version 2400 (0.0035)
[2024-06-05 18:04:43,920][10130] Fps is (10 sec: 49151.4, 60 sec: 49151.9, 300 sec: 49374.1). Total num frames: 39387136. Throughput: 0: 49432.3. Samples: 39448840. Policy #0 lag: (min: 0.0, avg: 8.6, max: 21.0)
[2024-06-05 18:04:43,920][10130] Avg episode reward: [(0, '0.001')]
[2024-06-05 18:04:45,252][10367] Updated weights for policy 0, policy_version 2410 (0.0021)
[2024-06-05 18:04:48,304][10347] Signal inference workers to stop experience collection... (550 times)
[2024-06-05 18:04:48,304][10347] Signal inference workers to resume experience collection... (550 times)
[2024-06-05 18:04:48,314][10367] InferenceWorker_p0-w0: stopping experience collection (550 times)
[2024-06-05 18:04:48,315][10367] InferenceWorker_p0-w0: resuming experience collection (550 times)
[2024-06-05 18:04:48,735][10367] Updated weights for policy 0, policy_version 2420 (0.0030)
[2024-06-05 18:04:48,920][10130] Fps is (10 sec: 49151.7, 60 sec: 49425.2, 300 sec: 49429.7). Total num frames: 39649280. Throughput: 0: 49469.3. Samples: 39743360. Policy #0 lag: (min: 0.0, avg: 8.0, max: 20.0)
[2024-06-05 18:04:48,920][10130] Avg episode reward: [(0, '0.001')]
[2024-06-05 18:04:51,667][10367] Updated weights for policy 0, policy_version 2430 (0.0032)
[2024-06-05 18:04:53,920][10130] Fps is (10 sec: 49152.7, 60 sec: 49152.0, 300 sec: 49207.6). Total num frames: 39878656. Throughput: 0: 49044.4. Samples: 40035780. Policy #0 lag: (min: 0.0, avg: 11.5, max: 21.0)
[2024-06-05 18:04:53,920][10130] Avg episode reward: [(0, '0.000')]
[2024-06-05 18:04:55,505][10367] Updated weights for policy 0, policy_version 2440 (0.0026)
[2024-06-05 18:04:58,354][10367] Updated weights for policy 0, policy_version 2450 (0.0026)
[2024-06-05 18:04:58,920][10130] Fps is (10 sec: 49152.1, 60 sec: 49152.0, 300 sec: 49429.7). Total num frames: 40140800. Throughput: 0: 49132.0. Samples: 40177620. Policy #0 lag: (min: 1.0, avg: 8.6, max: 22.0)
[2024-06-05 18:04:58,920][10130] Avg episode reward: [(0, '0.001')]
[2024-06-05 18:05:02,039][10367] Updated weights for policy 0, policy_version 2460 (0.0033)
[2024-06-05 18:05:03,920][10130] Fps is (10 sec: 50790.0, 60 sec: 49425.0, 300 sec: 49429.7). Total num frames: 40386560. Throughput: 0: 48993.7. Samples: 40472400. Policy #0 lag: (min: 0.0, avg: 11.5, max: 23.0)
[2024-06-05 18:05:03,920][10130] Avg episode reward: [(0, '0.000')]
[2024-06-05 18:05:05,184][10367] Updated weights for policy 0, policy_version 2470 (0.0031)
[2024-06-05 18:05:08,506][10367] Updated weights for policy 0, policy_version 2480 (0.0026)
[2024-06-05 18:05:08,920][10130] Fps is (10 sec: 50790.5, 60 sec: 49425.1, 300 sec: 49485.3). Total num frames: 40648704. Throughput: 0: 49172.1. Samples: 40780400. Policy #0 lag: (min: 0.0, avg: 11.6, max: 21.0)
[2024-06-05 18:05:08,920][10130] Avg episode reward: [(0, '0.001')]
[2024-06-05 18:05:11,641][10367] Updated weights for policy 0, policy_version 2490 (0.0027)
[2024-06-05 18:05:13,920][10130] Fps is (10 sec: 49152.2, 60 sec: 49425.0, 300 sec: 49263.1). Total num frames: 40878080. Throughput: 0: 49254.6. Samples: 40926520. Policy #0 lag: (min: 0.0, avg: 10.3, max: 21.0)
[2024-06-05 18:05:13,920][10130] Avg episode reward: [(0, '0.001')]
[2024-06-05 18:05:15,156][10367] Updated weights for policy 0, policy_version 2500 (0.0031)
[2024-06-05 18:05:18,176][10367] Updated weights for policy 0, policy_version 2510 (0.0032)
[2024-06-05 18:05:18,920][10130] Fps is (10 sec: 50790.5, 60 sec: 49425.1, 300 sec: 49540.8). Total num frames: 41156608. Throughput: 0: 49415.2. Samples: 41225020. Policy #0 lag: (min: 0.0, avg: 9.8, max: 22.0)
[2024-06-05 18:05:18,920][10130] Avg episode reward: [(0, '0.001')]
[2024-06-05 18:05:21,914][10367] Updated weights for policy 0, policy_version 2520 (0.0027)
[2024-06-05 18:05:23,920][10130] Fps is (10 sec: 49152.2, 60 sec: 49152.0, 300 sec: 49429.7). Total num frames: 41369600. Throughput: 0: 49599.0. Samples: 41522160. Policy #0 lag: (min: 0.0, avg: 8.1, max: 21.0)
[2024-06-05 18:05:23,920][10130] Avg episode reward: [(0, '0.000')]
[2024-06-05 18:05:24,933][10367] Updated weights for policy 0, policy_version 2530 (0.0030)
[2024-06-05 18:05:28,554][10367] Updated weights for policy 0, policy_version 2540 (0.0025)
[2024-06-05 18:05:28,920][10130] Fps is (10 sec: 47513.0, 60 sec: 49425.0, 300 sec: 49430.3). Total num frames: 41631744. Throughput: 0: 49138.7. Samples: 41660080. Policy #0 lag: (min: 1.0, avg: 9.8, max: 23.0)
[2024-06-05 18:05:28,922][10130] Avg episode reward: [(0, '0.001')]
[2024-06-05 18:05:31,680][10367] Updated weights for policy 0, policy_version 2550 (0.0033)
[2024-06-05 18:05:33,920][10130] Fps is (10 sec: 47513.3, 60 sec: 49152.0, 300 sec: 49207.5). Total num frames: 41844736. Throughput: 0: 49107.5. Samples: 41953200. Policy #0 lag: (min: 0.0, avg: 12.4, max: 23.0)
[2024-06-05 18:05:33,920][10130] Avg episode reward: [(0, '0.001')]
[2024-06-05 18:05:35,202][10367] Updated weights for policy 0, policy_version 2560 (0.0030)
[2024-06-05 18:05:38,172][10367] Updated weights for policy 0, policy_version 2570 (0.0031)
[2024-06-05 18:05:38,920][10130] Fps is (10 sec: 50791.2, 60 sec: 49698.1, 300 sec: 49485.3). Total num frames: 42139648. Throughput: 0: 49240.5. Samples: 42251600. Policy #0 lag: (min: 0.0, avg: 11.2, max: 21.0)
[2024-06-05 18:05:38,920][10130] Avg episode reward: [(0, '0.001')]
[2024-06-05 18:05:41,748][10367] Updated weights for policy 0, policy_version 2580 (0.0033)
[2024-06-05 18:05:43,920][10130] Fps is (10 sec: 52428.9, 60 sec: 49698.2, 300 sec: 49429.7). Total num frames: 42369024. Throughput: 0: 49701.3. Samples: 42414180. Policy #0 lag: (min: 0.0, avg: 9.7, max: 21.0)
[2024-06-05 18:05:43,920][10130] Avg episode reward: [(0, '0.002')]
[2024-06-05 18:05:44,670][10367] Updated weights for policy 0, policy_version 2590 (0.0033)
[2024-06-05 18:05:47,601][10347] Signal inference workers to stop experience collection... (600 times)
[2024-06-05 18:05:47,601][10347] Signal inference workers to resume experience collection... (600 times)
[2024-06-05 18:05:47,617][10367] InferenceWorker_p0-w0: stopping experience collection (600 times)
[2024-06-05 18:05:47,640][10367] InferenceWorker_p0-w0: resuming experience collection (600 times)
[2024-06-05 18:05:48,160][10367] Updated weights for policy 0, policy_version 2600 (0.0033)
[2024-06-05 18:05:48,920][10130] Fps is (10 sec: 49150.8, 60 sec: 49698.0, 300 sec: 49485.2). Total num frames: 42631168. Throughput: 0: 49942.6. Samples: 42719820. Policy #0 lag: (min: 0.0, avg: 8.9, max: 21.0)
[2024-06-05 18:05:48,920][10130] Avg episode reward: [(0, '0.002')]
[2024-06-05 18:05:51,478][10367] Updated weights for policy 0, policy_version 2610 (0.0028)
[2024-06-05 18:05:53,920][10130] Fps is (10 sec: 49152.0, 60 sec: 49698.1, 300 sec: 49318.6). Total num frames: 42860544. Throughput: 0: 49599.5. Samples: 43012380. Policy #0 lag: (min: 0.0, avg: 11.0, max: 23.0)
[2024-06-05 18:05:53,920][10130] Avg episode reward: [(0, '0.000')]
[2024-06-05 18:05:54,719][10367] Updated weights for policy 0, policy_version 2620 (0.0023)
[2024-06-05 18:05:57,828][10367] Updated weights for policy 0, policy_version 2630 (0.0030)
[2024-06-05 18:05:58,920][10130] Fps is (10 sec: 49153.1, 60 sec: 49698.2, 300 sec: 49485.2). Total num frames: 43122688. Throughput: 0: 49633.4. Samples: 43160020. Policy #0 lag: (min: 0.0, avg: 11.0, max: 22.0)
[2024-06-05 18:05:58,920][10130] Avg episode reward: [(0, '0.000')]
[2024-06-05 18:06:01,439][10367] Updated weights for policy 0, policy_version 2640 (0.0023)
[2024-06-05 18:06:03,920][10130] Fps is (10 sec: 50790.9, 60 sec: 49698.2, 300 sec: 49485.3). Total num frames: 43368448. Throughput: 0: 49518.7. Samples: 43453360. Policy #0 lag: (min: 0.0, avg: 10.3, max: 21.0)
[2024-06-05 18:06:03,920][10130] Avg episode reward: [(0, '0.001')]
[2024-06-05 18:06:04,294][10367] Updated weights for policy 0, policy_version 2650 (0.0025)
[2024-06-05 18:06:08,050][10367] Updated weights for policy 0, policy_version 2660 (0.0038)
[2024-06-05 18:06:08,920][10130] Fps is (10 sec: 49151.3, 60 sec: 49425.0, 300 sec: 49485.2). Total num frames: 43614208. Throughput: 0: 49539.9. Samples: 43751460. Policy #0 lag: (min: 0.0, avg: 9.4, max: 21.0)
[2024-06-05 18:06:08,921][10130] Avg episode reward: [(0, '0.001')]
[2024-06-05 18:06:11,157][10367] Updated weights for policy 0, policy_version 2670 (0.0030)
[2024-06-05 18:06:13,920][10130] Fps is (10 sec: 47512.9, 60 sec: 49425.0, 300 sec: 49318.6). Total num frames: 43843584. Throughput: 0: 49747.5. Samples: 43898720. Policy #0 lag: (min: 0.0, avg: 10.8, max: 24.0)
[2024-06-05 18:06:13,920][10130] Avg episode reward: [(0, '0.001')]
[2024-06-05 18:06:14,570][10367] Updated weights for policy 0, policy_version 2680 (0.0025)
[2024-06-05 18:06:17,657][10367] Updated weights for policy 0, policy_version 2690 (0.0032)
[2024-06-05 18:06:18,920][10130] Fps is (10 sec: 50790.7, 60 sec: 49425.0, 300 sec: 49540.8). Total num frames: 44122112. Throughput: 0: 49937.8. Samples: 44200400. Policy #0 lag: (min: 0.0, avg: 10.5, max: 22.0)
[2024-06-05 18:06:18,920][10130] Avg episode reward: [(0, '0.001')]
[2024-06-05 18:06:21,097][10367] Updated weights for policy 0, policy_version 2700 (0.0021)
[2024-06-05 18:06:23,920][10130] Fps is (10 sec: 52428.9, 60 sec: 49971.2, 300 sec: 49540.8). Total num frames: 44367872. Throughput: 0: 49897.6. Samples: 44497000. Policy #0 lag: (min: 0.0, avg: 9.6, max: 22.0)
[2024-06-05 18:06:23,921][10130] Avg episode reward: [(0, '0.002')]
[2024-06-05 18:06:24,238][10367] Updated weights for policy 0, policy_version 2710 (0.0031)
[2024-06-05 18:06:27,685][10367] Updated weights for policy 0, policy_version 2720 (0.0028)
[2024-06-05 18:06:28,920][10130] Fps is (10 sec: 49152.4, 60 sec: 49698.2, 300 sec: 49540.8). Total num frames: 44613632. Throughput: 0: 49607.2. Samples: 44646500. Policy #0 lag: (min: 0.0, avg: 9.3, max: 20.0)
[2024-06-05 18:06:28,920][10130] Avg episode reward: [(0, '0.001')]
[2024-06-05 18:06:30,937][10367] Updated weights for policy 0, policy_version 2730 (0.0033)
[2024-06-05 18:06:33,920][10130] Fps is (10 sec: 47514.1, 60 sec: 49971.3, 300 sec: 49374.2). Total num frames: 44843008. Throughput: 0: 49395.4. Samples: 44942600. Policy #0 lag: (min: 0.0, avg: 11.6, max: 21.0)
[2024-06-05 18:06:33,920][10130] Avg episode reward: [(0, '0.002')]
[2024-06-05 18:06:34,060][10347] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000002738_44859392.pth...
[2024-06-05 18:06:34,121][10347] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000002013_32980992.pth
[2024-06-05 18:06:34,348][10367] Updated weights for policy 0, policy_version 2740 (0.0021)
[2024-06-05 18:06:37,535][10367] Updated weights for policy 0, policy_version 2750 (0.0035)
[2024-06-05 18:06:38,920][10130] Fps is (10 sec: 47513.2, 60 sec: 49151.9, 300 sec: 49485.2). Total num frames: 45088768. Throughput: 0: 49466.7. Samples: 45238380. Policy #0 lag: (min: 0.0, avg: 10.3, max: 21.0)
[2024-06-05 18:06:38,920][10130] Avg episode reward: [(0, '0.002')]
[2024-06-05 18:06:38,954][10347] Saving new best policy, reward=0.002!
[2024-06-05 18:06:40,924][10367] Updated weights for policy 0, policy_version 2760 (0.0029)
[2024-06-05 18:06:43,920][10130] Fps is (10 sec: 50790.2, 60 sec: 49698.2, 300 sec: 49540.8). Total num frames: 45350912. Throughput: 0: 49330.2. Samples: 45379880. Policy #0 lag: (min: 0.0, avg: 9.4, max: 22.0)
[2024-06-05 18:06:43,920][10130] Avg episode reward: [(0, '0.001')]
[2024-06-05 18:06:44,263][10367] Updated weights for policy 0, policy_version 2770 (0.0036)
[2024-06-05 18:06:47,589][10367] Updated weights for policy 0, policy_version 2780 (0.0019)
[2024-06-05 18:06:48,678][10347] Signal inference workers to stop experience collection... (650 times)
[2024-06-05 18:06:48,719][10367] InferenceWorker_p0-w0: stopping experience collection (650 times)
[2024-06-05 18:06:48,724][10347] Signal inference workers to resume experience collection... (650 times)
[2024-06-05 18:06:48,731][10367] InferenceWorker_p0-w0: resuming experience collection (650 times)
[2024-06-05 18:06:48,920][10130] Fps is (10 sec: 50790.6, 60 sec: 49425.2, 300 sec: 49485.2). Total num frames: 45596672. Throughput: 0: 49384.4. Samples: 45675660. Policy #0 lag: (min: 0.0, avg: 9.1, max: 21.0)
[2024-06-05 18:06:48,920][10130] Avg episode reward: [(0, '0.001')]
[2024-06-05 18:06:50,911][10367] Updated weights for policy 0, policy_version 2790 (0.0023)
[2024-06-05 18:06:53,920][10130] Fps is (10 sec: 49152.3, 60 sec: 49698.2, 300 sec: 49429.7). Total num frames: 45842432. Throughput: 0: 49542.4. Samples: 45980860. Policy #0 lag: (min: 1.0, avg: 11.4, max: 23.0)
[2024-06-05 18:06:53,920][10130] Avg episode reward: [(0, '0.001')]
[2024-06-05 18:06:54,536][10367] Updated weights for policy 0, policy_version 2800 (0.0026)
[2024-06-05 18:06:57,545][10367] Updated weights for policy 0, policy_version 2810 (0.0020)
[2024-06-05 18:06:58,920][10130] Fps is (10 sec: 49152.1, 60 sec: 49425.0, 300 sec: 49485.2). Total num frames: 46088192. Throughput: 0: 49556.5. Samples: 46128760. Policy #0 lag: (min: 0.0, avg: 10.8, max: 21.0)
[2024-06-05 18:06:58,920][10130] Avg episode reward: [(0, '0.002')]
[2024-06-05 18:07:01,222][10367] Updated weights for policy 0, policy_version 2820 (0.0029)
[2024-06-05 18:07:03,920][10130] Fps is (10 sec: 49151.3, 60 sec: 49425.0, 300 sec: 49485.2). Total num frames: 46333952. Throughput: 0: 49436.4. Samples: 46425040. Policy #0 lag: (min: 0.0, avg: 10.1, max: 21.0)
[2024-06-05 18:07:03,921][10130] Avg episode reward: [(0, '0.001')]
[2024-06-05 18:07:04,456][10367] Updated weights for policy 0, policy_version 2830 (0.0038)
[2024-06-05 18:07:07,922][10367] Updated weights for policy 0, policy_version 2840 (0.0034)
[2024-06-05 18:07:08,920][10130] Fps is (10 sec: 49151.6, 60 sec: 49425.1, 300 sec: 49429.7). Total num frames: 46579712. Throughput: 0: 49463.1. Samples: 46722840. Policy #0 lag: (min: 1.0, avg: 10.0, max: 21.0)
[2024-06-05 18:07:08,920][10130] Avg episode reward: [(0, '0.002')]
[2024-06-05 18:07:10,962][10367] Updated weights for policy 0, policy_version 2850 (0.0031)
[2024-06-05 18:07:13,920][10130] Fps is (10 sec: 49151.7, 60 sec: 49698.1, 300 sec: 49429.7). Total num frames: 46825472. Throughput: 0: 49317.6. Samples: 46865800. Policy #0 lag: (min: 0.0, avg: 10.5, max: 24.0)
[2024-06-05 18:07:13,921][10130] Avg episode reward: [(0, '0.001')]
[2024-06-05 18:07:14,706][10367] Updated weights for policy 0, policy_version 2860 (0.0038)
[2024-06-05 18:07:17,674][10367] Updated weights for policy 0, policy_version 2870 (0.0034)
[2024-06-05 18:07:18,920][10130] Fps is (10 sec: 49152.6, 60 sec: 49152.1, 300 sec: 49429.7). Total num frames: 47071232. Throughput: 0: 49164.9. Samples: 47155020. Policy #0 lag: (min: 0.0, avg: 10.0, max: 20.0)
[2024-06-05 18:07:18,920][10130] Avg episode reward: [(0, '0.001')]
[2024-06-05 18:07:21,315][10367] Updated weights for policy 0, policy_version 2880 (0.0024)
[2024-06-05 18:07:23,920][10130] Fps is (10 sec: 49152.6, 60 sec: 49152.0, 300 sec: 49429.7). Total num frames: 47316992. Throughput: 0: 49154.7. Samples: 47450340. Policy #0 lag: (min: 0.0, avg: 9.8, max: 21.0)
[2024-06-05 18:07:23,920][10130] Avg episode reward: [(0, '0.002')]
[2024-06-05 18:07:24,345][10367] Updated weights for policy 0, policy_version 2890 (0.0023)
[2024-06-05 18:07:28,141][10367] Updated weights for policy 0, policy_version 2900 (0.0022)
[2024-06-05 18:07:28,920][10130] Fps is (10 sec: 49151.4, 60 sec: 49151.9, 300 sec: 49429.7). Total num frames: 47562752. Throughput: 0: 49297.2. Samples: 47598260. Policy #0 lag: (min: 0.0, avg: 10.9, max: 23.0)
[2024-06-05 18:07:28,920][10130] Avg episode reward: [(0, '0.001')]
[2024-06-05 18:07:31,231][10367] Updated weights for policy 0, policy_version 2910 (0.0025)
[2024-06-05 18:07:33,920][10130] Fps is (10 sec: 49151.7, 60 sec: 49425.0, 300 sec: 49374.2). Total num frames: 47808512. Throughput: 0: 49257.3. Samples: 47892240. Policy #0 lag: (min: 0.0, avg: 10.9, max: 23.0)
[2024-06-05 18:07:33,921][10130] Avg episode reward: [(0, '0.000')]
[2024-06-05 18:07:34,828][10367] Updated weights for policy 0, policy_version 2920 (0.0023)
[2024-06-05 18:07:37,951][10367] Updated weights for policy 0, policy_version 2930 (0.0024)
[2024-06-05 18:07:38,920][10130] Fps is (10 sec: 49152.5, 60 sec: 49425.1, 300 sec: 49318.6). Total num frames: 48054272. Throughput: 0: 49046.2. Samples: 48187940. Policy #0 lag: (min: 0.0, avg: 10.2, max: 23.0)
[2024-06-05 18:07:38,920][10130] Avg episode reward: [(0, '0.001')]
[2024-06-05 18:07:41,550][10367] Updated weights for policy 0, policy_version 2940 (0.0018)
[2024-06-05 18:07:43,920][10130] Fps is (10 sec: 49152.5, 60 sec: 49152.0, 300 sec: 49429.7). Total num frames: 48300032. Throughput: 0: 49323.1. Samples: 48348300. Policy #0 lag: (min: 0.0, avg: 9.5, max: 21.0)
[2024-06-05 18:07:43,920][10130] Avg episode reward: [(0, '0.002')]
[2024-06-05 18:07:44,414][10367] Updated weights for policy 0, policy_version 2950 (0.0034)
[2024-06-05 18:07:48,265][10367] Updated weights for policy 0, policy_version 2960 (0.0029)
[2024-06-05 18:07:48,923][10130] Fps is (10 sec: 47496.2, 60 sec: 48876.0, 300 sec: 49318.0). Total num frames: 48529408. Throughput: 0: 48949.4. Samples: 48627940. Policy #0 lag: (min: 0.0, avg: 11.4, max: 22.0)
[2024-06-05 18:07:48,924][10130] Avg episode reward: [(0, '0.002')]
[2024-06-05 18:07:50,939][10367] Updated weights for policy 0, policy_version 2970 (0.0024)
[2024-06-05 18:07:53,922][10130] Fps is (10 sec: 49142.6, 60 sec: 49150.4, 300 sec: 49318.3). Total num frames: 48791552. Throughput: 0: 48998.9. Samples: 48927880. Policy #0 lag: (min: 0.0, avg: 10.4, max: 21.0)
[2024-06-05 18:07:53,922][10130] Avg episode reward: [(0, '0.002')]
[2024-06-05 18:07:54,946][10367] Updated weights for policy 0, policy_version 2980 (0.0023)
[2024-06-05 18:07:57,670][10367] Updated weights for policy 0, policy_version 2990 (0.0027)
[2024-06-05 18:07:58,920][10130] Fps is (10 sec: 50809.1, 60 sec: 49152.0, 300 sec: 49318.6). Total num frames: 49037312. Throughput: 0: 49038.4. Samples: 49072520. Policy #0 lag: (min: 0.0, avg: 10.4, max: 21.0)
[2024-06-05 18:07:58,920][10130] Avg episode reward: [(0, '0.001')]
[2024-06-05 18:08:01,453][10367] Updated weights for policy 0, policy_version 3000 (0.0035)
[2024-06-05 18:08:03,920][10130] Fps is (10 sec: 49161.0, 60 sec: 49152.0, 300 sec: 49374.2). Total num frames: 49283072. Throughput: 0: 49464.8. Samples: 49380940. Policy #0 lag: (min: 1.0, avg: 10.7, max: 22.0)
[2024-06-05 18:08:03,920][10130] Avg episode reward: [(0, '0.003')]
[2024-06-05 18:08:03,941][10347] Saving new best policy, reward=0.003!
[2024-06-05 18:08:04,290][10367] Updated weights for policy 0, policy_version 3010 (0.0027)
[2024-06-05 18:08:04,323][10347] Signal inference workers to stop experience collection... (700 times)
[2024-06-05 18:08:04,323][10347] Signal inference workers to resume experience collection... (700 times)
[2024-06-05 18:08:04,361][10367] InferenceWorker_p0-w0: stopping experience collection (700 times)
[2024-06-05 18:08:04,361][10367] InferenceWorker_p0-w0: resuming experience collection (700 times)
[2024-06-05 18:08:07,990][10367] Updated weights for policy 0, policy_version 3020 (0.0026)
[2024-06-05 18:08:08,920][10130] Fps is (10 sec: 49152.0, 60 sec: 49152.1, 300 sec: 49429.7). Total num frames: 49528832. Throughput: 0: 49534.7. Samples: 49679400. Policy #0 lag: (min: 0.0, avg: 11.6, max: 23.0)
[2024-06-05 18:08:08,920][10130] Avg episode reward: [(0, '0.002')]
[2024-06-05 18:08:10,702][10367] Updated weights for policy 0, policy_version 3030 (0.0022)
[2024-06-05 18:08:13,920][10130] Fps is (10 sec: 49152.0, 60 sec: 49152.1, 300 sec: 49318.6). Total num frames: 49774592. Throughput: 0: 49500.5. Samples: 49825780. Policy #0 lag: (min: 0.0, avg: 11.1, max: 22.0)
[2024-06-05 18:08:13,932][10130] Avg episode reward: [(0, '0.001')]
[2024-06-05 18:08:14,735][10367] Updated weights for policy 0, policy_version 3040 (0.0028)
[2024-06-05 18:08:17,333][10367] Updated weights for policy 0, policy_version 3050 (0.0025)
[2024-06-05 18:08:18,920][10130] Fps is (10 sec: 52428.8, 60 sec: 49698.1, 300 sec: 49429.7). Total num frames: 50053120. Throughput: 0: 49513.9. Samples: 50120360. Policy #0 lag: (min: 1.0, avg: 10.5, max: 22.0)
[2024-06-05 18:08:18,920][10130] Avg episode reward: [(0, '0.002')]
[2024-06-05 18:08:21,150][10367] Updated weights for policy 0, policy_version 3060 (0.0028)
[2024-06-05 18:08:23,843][10367] Updated weights for policy 0, policy_version 3070 (0.0031)
[2024-06-05 18:08:23,920][10130] Fps is (10 sec: 52428.3, 60 sec: 49698.0, 300 sec: 49540.8). Total num frames: 50298880. Throughput: 0: 49665.2. Samples: 50422880. Policy #0 lag: (min: 0.0, avg: 10.5, max: 22.0)
[2024-06-05 18:08:23,920][10130] Avg episode reward: [(0, '0.003')]
[2024-06-05 18:08:27,795][10367] Updated weights for policy 0, policy_version 3080 (0.0027)
[2024-06-05 18:08:28,920][10130] Fps is (10 sec: 47513.9, 60 sec: 49425.2, 300 sec: 49429.7). Total num frames: 50528256. Throughput: 0: 49382.7. Samples: 50570520. Policy #0 lag: (min: 0.0, avg: 8.3, max: 22.0)
[2024-06-05 18:08:28,920][10130] Avg episode reward: [(0, '0.002')]
[2024-06-05 18:08:30,543][10367] Updated weights for policy 0, policy_version 3090 (0.0035)
[2024-06-05 18:08:33,920][10130] Fps is (10 sec: 45875.1, 60 sec: 49151.9, 300 sec: 49318.6). Total num frames: 50757632. Throughput: 0: 49800.7. Samples: 50868800. Policy #0 lag: (min: 1.0, avg: 11.3, max: 22.0)
[2024-06-05 18:08:33,920][10130] Avg episode reward: [(0, '0.002')]
[2024-06-05 18:08:33,935][10347] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000003098_50757632.pth...
[2024-06-05 18:08:34,001][10347] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000002374_38895616.pth
[2024-06-05 18:08:34,317][10367] Updated weights for policy 0, policy_version 3100 (0.0028)
[2024-06-05 18:08:37,055][10367] Updated weights for policy 0, policy_version 3110 (0.0041)
[2024-06-05 18:08:38,920][10130] Fps is (10 sec: 50790.2, 60 sec: 49698.1, 300 sec: 49485.2). Total num frames: 51036160. Throughput: 0: 49824.3. Samples: 51169880. Policy #0 lag: (min: 1.0, avg: 10.2, max: 23.0)
[2024-06-05 18:08:38,920][10130] Avg episode reward: [(0, '0.002')]
[2024-06-05 18:08:40,897][10367] Updated weights for policy 0, policy_version 3120 (0.0032)
[2024-06-05 18:08:43,918][10367] Updated weights for policy 0, policy_version 3130 (0.0023)
[2024-06-05 18:08:43,920][10130] Fps is (10 sec: 52429.2, 60 sec: 49698.0, 300 sec: 49485.2). Total num frames: 51281920. Throughput: 0: 50086.1. Samples: 51326400. Policy #0 lag: (min: 0.0, avg: 10.7, max: 23.0)
[2024-06-05 18:08:43,928][10130] Avg episode reward: [(0, '0.002')]
[2024-06-05 18:08:47,297][10367] Updated weights for policy 0, policy_version 3140 (0.0023)
[2024-06-05 18:08:48,920][10130] Fps is (10 sec: 49151.8, 60 sec: 49974.2, 300 sec: 49485.2). Total num frames: 51527680. Throughput: 0: 49855.6. Samples: 51624440. Policy #0 lag: (min: 0.0, avg: 7.8, max: 19.0)
[2024-06-05 18:08:48,929][10130] Avg episode reward: [(0, '0.002')]
[2024-06-05 18:08:50,224][10367] Updated weights for policy 0, policy_version 3150 (0.0020)
[2024-06-05 18:08:53,862][10367] Updated weights for policy 0, policy_version 3160 (0.0031)
[2024-06-05 18:08:53,920][10130] Fps is (10 sec: 49152.0, 60 sec: 49699.6, 300 sec: 49429.7). Total num frames: 51773440. Throughput: 0: 49701.7. Samples: 51915980. Policy #0 lag: (min: 0.0, avg: 11.5, max: 21.0)
[2024-06-05 18:08:53,921][10130] Avg episode reward: [(0, '0.002')]
[2024-06-05 18:08:56,876][10367] Updated weights for policy 0, policy_version 3170 (0.0025)
[2024-06-05 18:08:58,920][10130] Fps is (10 sec: 50790.3, 60 sec: 49971.1, 300 sec: 49540.8). Total num frames: 52035584. Throughput: 0: 49820.9. Samples: 52067720. Policy #0 lag: (min: 0.0, avg: 9.5, max: 21.0)
[2024-06-05 18:08:58,920][10130] Avg episode reward: [(0, '0.001')]
[2024-06-05 18:09:00,689][10367] Updated weights for policy 0, policy_version 3180 (0.0049)
[2024-06-05 18:09:03,823][10367] Updated weights for policy 0, policy_version 3190 (0.0024)
[2024-06-05 18:09:03,920][10130] Fps is (10 sec: 49152.1, 60 sec: 49698.1, 300 sec: 49429.7). Total num frames: 52264960. Throughput: 0: 49583.5. Samples: 52351620. Policy #0 lag: (min: 0.0, avg: 8.8, max: 21.0)
[2024-06-05 18:09:03,921][10130] Avg episode reward: [(0, '0.002')]
[2024-06-05 18:09:07,348][10367] Updated weights for policy 0, policy_version 3200 (0.0031)
[2024-06-05 18:09:08,920][10130] Fps is (10 sec: 47513.6, 60 sec: 49698.1, 300 sec: 49485.2). Total num frames: 52510720. Throughput: 0: 49544.1. Samples: 52652360. Policy #0 lag: (min: 0.0, avg: 10.8, max: 23.0)
[2024-06-05 18:09:08,929][10130] Avg episode reward: [(0, '0.001')]
[2024-06-05 18:09:10,389][10367] Updated weights for policy 0, policy_version 3210 (0.0027)
[2024-06-05 18:09:13,695][10367] Updated weights for policy 0, policy_version 3220 (0.0037)
[2024-06-05 18:09:13,920][10130] Fps is (10 sec: 49152.4, 60 sec: 49698.2, 300 sec: 49374.2). Total num frames: 52756480. Throughput: 0: 49661.2. Samples: 52805280. Policy #0 lag: (min: 0.0, avg: 9.5, max: 21.0)
[2024-06-05 18:09:13,920][10130] Avg episode reward: [(0, '0.002')]
[2024-06-05 18:09:16,763][10367] Updated weights for policy 0, policy_version 3230 (0.0040)
[2024-06-05 18:09:17,575][10347] Signal inference workers to stop experience collection... (750 times)
[2024-06-05 18:09:17,576][10347] Signal inference workers to resume experience collection... (750 times)
[2024-06-05 18:09:17,625][10367] InferenceWorker_p0-w0: stopping experience collection (750 times)
[2024-06-05 18:09:17,626][10367] InferenceWorker_p0-w0: resuming experience collection (750 times)
[2024-06-05 18:09:18,920][10130] Fps is (10 sec: 52428.8, 60 sec: 49698.1, 300 sec: 49540.8). Total num frames: 53035008. Throughput: 0: 49728.1. Samples: 53106560. Policy #0 lag: (min: 0.0, avg: 10.2, max: 21.0)
[2024-06-05 18:09:18,929][10130] Avg episode reward: [(0, '0.001')]
[2024-06-05 18:09:20,406][10367] Updated weights for policy 0, policy_version 3240 (0.0027)
[2024-06-05 18:09:23,311][10367] Updated weights for policy 0, policy_version 3250 (0.0033)
[2024-06-05 18:09:23,920][10130] Fps is (10 sec: 50790.6, 60 sec: 49425.2, 300 sec: 49485.2). Total num frames: 53264384. Throughput: 0: 49584.0. Samples: 53401160. Policy #0 lag: (min: 0.0, avg: 10.5, max: 23.0)
[2024-06-05 18:09:23,920][10130] Avg episode reward: [(0, '0.004')]
[2024-06-05 18:09:26,948][10367] Updated weights for policy 0, policy_version 3260 (0.0026)
[2024-06-05 18:09:28,920][10130] Fps is (10 sec: 45875.6, 60 sec: 49425.1, 300 sec: 49485.2). Total num frames: 53493760. Throughput: 0: 49459.7. Samples: 53552080. Policy #0 lag: (min: 0.0, avg: 11.1, max: 22.0)
[2024-06-05 18:09:28,920][10130] Avg episode reward: [(0, '0.002')]
[2024-06-05 18:09:30,237][10367] Updated weights for policy 0, policy_version 3270 (0.0029)
[2024-06-05 18:09:33,811][10367] Updated weights for policy 0, policy_version 3280 (0.0035)
[2024-06-05 18:09:33,920][10130] Fps is (10 sec: 47513.5, 60 sec: 49698.3, 300 sec: 49429.7). Total num frames: 53739520. Throughput: 0: 49192.5. Samples: 53838100. Policy #0 lag: (min: 0.0, avg: 10.6, max: 24.0)
[2024-06-05 18:09:33,920][10130] Avg episode reward: [(0, '0.002')]
[2024-06-05 18:09:37,053][10367] Updated weights for policy 0, policy_version 3290 (0.0032)
[2024-06-05 18:09:38,920][10130] Fps is (10 sec: 52428.8, 60 sec: 49698.2, 300 sec: 49596.3). Total num frames: 54018048. Throughput: 0: 49240.6. Samples: 54131800. Policy #0 lag: (min: 0.0, avg: 10.9, max: 21.0)
[2024-06-05 18:09:38,920][10130] Avg episode reward: [(0, '0.002')]
[2024-06-05 18:09:40,466][10367] Updated weights for policy 0, policy_version 3300 (0.0037)
[2024-06-05 18:09:43,623][10367] Updated weights for policy 0, policy_version 3310 (0.0024)
[2024-06-05 18:09:43,920][10130] Fps is (10 sec: 49152.3, 60 sec: 49152.1, 300 sec: 49429.7). Total num frames: 54231040. Throughput: 0: 49207.7. Samples: 54282060. Policy #0 lag: (min: 0.0, avg: 11.3, max: 22.0)
[2024-06-05 18:09:43,920][10130] Avg episode reward: [(0, '0.001')]
[2024-06-05 18:09:47,107][10367] Updated weights for policy 0, policy_version 3320 (0.0030)
[2024-06-05 18:09:48,920][10130] Fps is (10 sec: 45875.0, 60 sec: 49152.0, 300 sec: 49485.2). Total num frames: 54476800. Throughput: 0: 49515.2. Samples: 54579800. Policy #0 lag: (min: 0.0, avg: 10.7, max: 21.0)
[2024-06-05 18:09:48,920][10130] Avg episode reward: [(0, '0.002')]
[2024-06-05 18:09:50,404][10367] Updated weights for policy 0, policy_version 3330 (0.0022)
[2024-06-05 18:09:53,880][10367] Updated weights for policy 0, policy_version 3340 (0.0035)
[2024-06-05 18:09:53,920][10130] Fps is (10 sec: 49151.7, 60 sec: 49152.1, 300 sec: 49429.7). Total num frames: 54722560. Throughput: 0: 49396.9. Samples: 54875220. Policy #0 lag: (min: 0.0, avg: 11.2, max: 22.0)
[2024-06-05 18:09:53,920][10130] Avg episode reward: [(0, '0.003')]
[2024-06-05 18:09:57,062][10367] Updated weights for policy 0, policy_version 3350 (0.0036)
[2024-06-05 18:09:58,920][10130] Fps is (10 sec: 52428.3, 60 sec: 49425.0, 300 sec: 49540.8). Total num frames: 55001088. Throughput: 0: 49290.1. Samples: 55023340. Policy #0 lag: (min: 1.0, avg: 11.5, max: 23.0)
[2024-06-05 18:09:58,921][10130] Avg episode reward: [(0, '0.003')]
[2024-06-05 18:10:00,614][10367] Updated weights for policy 0, policy_version 3360 (0.0037)
[2024-06-05 18:10:03,745][10367] Updated weights for policy 0, policy_version 3370 (0.0034)
[2024-06-05 18:10:03,920][10130] Fps is (10 sec: 50790.6, 60 sec: 49425.2, 300 sec: 49429.7). Total num frames: 55230464. Throughput: 0: 49076.5. Samples: 55315000. Policy #0 lag: (min: 0.0, avg: 10.8, max: 21.0)
[2024-06-05 18:10:03,920][10130] Avg episode reward: [(0, '0.003')]
[2024-06-05 18:10:07,143][10367] Updated weights for policy 0, policy_version 3380 (0.0028)
[2024-06-05 18:10:08,920][10130] Fps is (10 sec: 45876.0, 60 sec: 49152.1, 300 sec: 49429.7). Total num frames: 55459840. Throughput: 0: 48998.3. Samples: 55606080. Policy #0 lag: (min: 0.0, avg: 11.2, max: 24.0)
[2024-06-05 18:10:08,920][10130] Avg episode reward: [(0, '0.002')]
[2024-06-05 18:10:10,340][10367] Updated weights for policy 0, policy_version 3390 (0.0023)
[2024-06-05 18:10:13,617][10367] Updated weights for policy 0, policy_version 3400 (0.0031)
[2024-06-05 18:10:13,920][10130] Fps is (10 sec: 47513.4, 60 sec: 49152.0, 300 sec: 49318.6). Total num frames: 55705600. Throughput: 0: 48973.3. Samples: 55755880. Policy #0 lag: (min: 1.0, avg: 9.4, max: 22.0)
[2024-06-05 18:10:13,920][10130] Avg episode reward: [(0, '0.002')]
[2024-06-05 18:10:17,175][10367] Updated weights for policy 0, policy_version 3410 (0.0023)
[2024-06-05 18:10:18,920][10130] Fps is (10 sec: 52427.4, 60 sec: 49151.9, 300 sec: 49540.7). Total num frames: 55984128. Throughput: 0: 49152.2. Samples: 56049960. Policy #0 lag: (min: 0.0, avg: 12.9, max: 21.0)
[2024-06-05 18:10:18,921][10130] Avg episode reward: [(0, '0.002')]
[2024-06-05 18:10:20,617][10367] Updated weights for policy 0, policy_version 3420 (0.0026)
[2024-06-05 18:10:23,802][10347] Signal inference workers to stop experience collection... (800 times)
[2024-06-05 18:10:23,848][10367] InferenceWorker_p0-w0: stopping experience collection (800 times)
[2024-06-05 18:10:23,851][10347] Signal inference workers to resume experience collection... (800 times)
[2024-06-05 18:10:23,859][10367] InferenceWorker_p0-w0: resuming experience collection (800 times)
[2024-06-05 18:10:23,861][10367] Updated weights for policy 0, policy_version 3430 (0.0031)
[2024-06-05 18:10:23,920][10130] Fps is (10 sec: 49151.7, 60 sec: 48878.9, 300 sec: 49374.2). Total num frames: 56197120. Throughput: 0: 49295.4. Samples: 56350100. Policy #0 lag: (min: 0.0, avg: 8.3, max: 20.0)
[2024-06-05 18:10:23,920][10130] Avg episode reward: [(0, '0.003')]
[2024-06-05 18:10:27,387][10367] Updated weights for policy 0, policy_version 3440 (0.0023)
[2024-06-05 18:10:28,920][10130] Fps is (10 sec: 45875.4, 60 sec: 49151.8, 300 sec: 49485.2). Total num frames: 56442880. Throughput: 0: 48974.0. Samples: 56485900. Policy #0 lag: (min: 0.0, avg: 8.5, max: 21.0)
[2024-06-05 18:10:28,929][10130] Avg episode reward: [(0, '0.004')]
[2024-06-05 18:10:30,402][10367] Updated weights for policy 0, policy_version 3450 (0.0036)
[2024-06-05 18:10:33,920][10130] Fps is (10 sec: 47513.9, 60 sec: 48878.9, 300 sec: 49263.1). Total num frames: 56672256. Throughput: 0: 48725.3. Samples: 56772440. Policy #0 lag: (min: 0.0, avg: 11.1, max: 23.0)
[2024-06-05 18:10:33,920][10130] Avg episode reward: [(0, '0.001')]
[2024-06-05 18:10:33,932][10347] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000003460_56688640.pth...
[2024-06-05 18:10:33,937][10367] Updated weights for policy 0, policy_version 3460 (0.0028)
[2024-06-05 18:10:33,973][10347] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000002738_44859392.pth
[2024-06-05 18:10:37,237][10367] Updated weights for policy 0, policy_version 3470 (0.0032)
[2024-06-05 18:10:38,920][10130] Fps is (10 sec: 50791.2, 60 sec: 48878.9, 300 sec: 49429.7). Total num frames: 56950784. Throughput: 0: 48778.2. Samples: 57070240. Policy #0 lag: (min: 0.0, avg: 10.3, max: 21.0)
[2024-06-05 18:10:38,920][10130] Avg episode reward: [(0, '0.004')]
[2024-06-05 18:10:40,476][10367] Updated weights for policy 0, policy_version 3480 (0.0025)
[2024-06-05 18:10:43,882][10367] Updated weights for policy 0, policy_version 3490 (0.0020)
[2024-06-05 18:10:43,920][10130] Fps is (10 sec: 50789.6, 60 sec: 49151.8, 300 sec: 49318.6). Total num frames: 57180160. Throughput: 0: 48892.8. Samples: 57223520.
Policy #0 lag: (min: 0.0, avg: 11.0, max: 22.0) [2024-06-05 18:10:43,921][10130] Avg episode reward: [(0, '0.002')] [2024-06-05 18:10:47,342][10367] Updated weights for policy 0, policy_version 3500 (0.0034) [2024-06-05 18:10:48,920][10130] Fps is (10 sec: 47513.3, 60 sec: 49152.0, 300 sec: 49374.2). Total num frames: 57425920. Throughput: 0: 48999.4. Samples: 57519980. Policy #0 lag: (min: 0.0, avg: 11.0, max: 22.0) [2024-06-05 18:10:48,920][10130] Avg episode reward: [(0, '0.005')] [2024-06-05 18:10:48,921][10347] Saving new best policy, reward=0.005! [2024-06-05 18:10:50,556][10367] Updated weights for policy 0, policy_version 3510 (0.0023) [2024-06-05 18:10:53,920][10130] Fps is (10 sec: 47513.6, 60 sec: 48878.8, 300 sec: 49263.0). Total num frames: 57655296. Throughput: 0: 49058.8. Samples: 57813740. Policy #0 lag: (min: 0.0, avg: 11.2, max: 23.0) [2024-06-05 18:10:53,921][10130] Avg episode reward: [(0, '0.004')] [2024-06-05 18:10:54,228][10367] Updated weights for policy 0, policy_version 3520 (0.0029) [2024-06-05 18:10:57,164][10367] Updated weights for policy 0, policy_version 3530 (0.0025) [2024-06-05 18:10:58,920][10130] Fps is (10 sec: 50790.7, 60 sec: 48879.0, 300 sec: 49374.1). Total num frames: 57933824. Throughput: 0: 49097.8. Samples: 57965280. Policy #0 lag: (min: 1.0, avg: 11.0, max: 23.0) [2024-06-05 18:10:58,920][10130] Avg episode reward: [(0, '0.003')] [2024-06-05 18:11:00,607][10367] Updated weights for policy 0, policy_version 3540 (0.0021) [2024-06-05 18:11:03,883][10367] Updated weights for policy 0, policy_version 3550 (0.0038) [2024-06-05 18:11:03,920][10130] Fps is (10 sec: 50791.4, 60 sec: 48878.9, 300 sec: 49318.6). Total num frames: 58163200. Throughput: 0: 49026.0. Samples: 58256120. 
Policy #0 lag: (min: 0.0, avg: 10.2, max: 23.0) [2024-06-05 18:11:03,920][10130] Avg episode reward: [(0, '0.004')] [2024-06-05 18:11:07,174][10367] Updated weights for policy 0, policy_version 3560 (0.0035) [2024-06-05 18:11:08,920][10130] Fps is (10 sec: 47513.4, 60 sec: 49151.9, 300 sec: 49374.2). Total num frames: 58408960. Throughput: 0: 48983.1. Samples: 58554340. Policy #0 lag: (min: 0.0, avg: 10.4, max: 21.0) [2024-06-05 18:11:08,920][10130] Avg episode reward: [(0, '0.003')] [2024-06-05 18:11:10,592][10367] Updated weights for policy 0, policy_version 3570 (0.0032) [2024-06-05 18:11:13,920][10130] Fps is (10 sec: 47513.1, 60 sec: 48878.9, 300 sec: 49207.5). Total num frames: 58638336. Throughput: 0: 49009.8. Samples: 58691340. Policy #0 lag: (min: 2.0, avg: 9.8, max: 19.0) [2024-06-05 18:11:13,921][10130] Avg episode reward: [(0, '0.004')] [2024-06-05 18:11:13,947][10367] Updated weights for policy 0, policy_version 3580 (0.0027) [2024-06-05 18:11:17,127][10367] Updated weights for policy 0, policy_version 3590 (0.0025) [2024-06-05 18:11:18,920][10130] Fps is (10 sec: 50790.6, 60 sec: 48879.1, 300 sec: 49318.6). Total num frames: 58916864. Throughput: 0: 49374.7. Samples: 58994300. Policy #0 lag: (min: 0.0, avg: 10.4, max: 21.0) [2024-06-05 18:11:18,920][10130] Avg episode reward: [(0, '0.004')] [2024-06-05 18:11:20,446][10367] Updated weights for policy 0, policy_version 3600 (0.0022) [2024-06-05 18:11:23,638][10347] Signal inference workers to stop experience collection... (850 times) [2024-06-05 18:11:23,678][10367] InferenceWorker_p0-w0: stopping experience collection (850 times) [2024-06-05 18:11:23,690][10347] Signal inference workers to resume experience collection... 
(850 times) [2024-06-05 18:11:23,696][10367] InferenceWorker_p0-w0: resuming experience collection (850 times) [2024-06-05 18:11:23,699][10367] Updated weights for policy 0, policy_version 3610 (0.0030) [2024-06-05 18:11:23,920][10130] Fps is (10 sec: 52428.2, 60 sec: 49424.9, 300 sec: 49318.6). Total num frames: 59162624. Throughput: 0: 49548.2. Samples: 59299920. Policy #0 lag: (min: 2.0, avg: 10.7, max: 24.0) [2024-06-05 18:11:23,921][10130] Avg episode reward: [(0, '0.002')] [2024-06-05 18:11:26,919][10367] Updated weights for policy 0, policy_version 3620 (0.0029) [2024-06-05 18:11:28,920][10130] Fps is (10 sec: 47513.8, 60 sec: 49152.2, 300 sec: 49318.6). Total num frames: 59392000. Throughput: 0: 49282.9. Samples: 59441240. Policy #0 lag: (min: 0.0, avg: 9.7, max: 22.0) [2024-06-05 18:11:28,920][10130] Avg episode reward: [(0, '0.003')] [2024-06-05 18:11:30,264][10367] Updated weights for policy 0, policy_version 3630 (0.0027) [2024-06-05 18:11:33,497][10367] Updated weights for policy 0, policy_version 3640 (0.0029) [2024-06-05 18:11:33,920][10130] Fps is (10 sec: 49152.6, 60 sec: 49698.1, 300 sec: 49374.1). Total num frames: 59654144. Throughput: 0: 49317.7. Samples: 59739280. Policy #0 lag: (min: 0.0, avg: 10.5, max: 21.0) [2024-06-05 18:11:33,920][10130] Avg episode reward: [(0, '0.005')] [2024-06-05 18:11:36,893][10367] Updated weights for policy 0, policy_version 3650 (0.0032) [2024-06-05 18:11:38,920][10130] Fps is (10 sec: 52428.8, 60 sec: 49425.1, 300 sec: 49374.2). Total num frames: 59916288. Throughput: 0: 49230.9. Samples: 60029120. Policy #0 lag: (min: 0.0, avg: 10.0, max: 19.0) [2024-06-05 18:11:38,920][10130] Avg episode reward: [(0, '0.005')] [2024-06-05 18:11:40,276][10367] Updated weights for policy 0, policy_version 3660 (0.0035) [2024-06-05 18:11:43,594][10367] Updated weights for policy 0, policy_version 3670 (0.0027) [2024-06-05 18:11:43,920][10130] Fps is (10 sec: 50791.1, 60 sec: 49698.3, 300 sec: 49374.2). Total num frames: 60162048. 
Throughput: 0: 49351.2. Samples: 60186080. Policy #0 lag: (min: 1.0, avg: 9.9, max: 21.0) [2024-06-05 18:11:43,920][10130] Avg episode reward: [(0, '0.003')] [2024-06-05 18:11:46,828][10367] Updated weights for policy 0, policy_version 3680 (0.0020) [2024-06-05 18:11:48,920][10130] Fps is (10 sec: 45874.7, 60 sec: 49152.0, 300 sec: 49263.1). Total num frames: 60375040. Throughput: 0: 49556.8. Samples: 60486180. Policy #0 lag: (min: 0.0, avg: 11.1, max: 21.0) [2024-06-05 18:11:48,920][10130] Avg episode reward: [(0, '0.003')] [2024-06-05 18:11:50,163][10367] Updated weights for policy 0, policy_version 3690 (0.0022) [2024-06-05 18:11:53,334][10367] Updated weights for policy 0, policy_version 3700 (0.0038) [2024-06-05 18:11:53,920][10130] Fps is (10 sec: 45874.7, 60 sec: 49425.2, 300 sec: 49263.1). Total num frames: 60620800. Throughput: 0: 49396.9. Samples: 60777200. Policy #0 lag: (min: 1.0, avg: 12.4, max: 25.0) [2024-06-05 18:11:53,920][10130] Avg episode reward: [(0, '0.003')] [2024-06-05 18:11:56,937][10367] Updated weights for policy 0, policy_version 3710 (0.0026) [2024-06-05 18:11:58,920][10130] Fps is (10 sec: 50790.6, 60 sec: 49152.0, 300 sec: 49318.6). Total num frames: 60882944. Throughput: 0: 49817.4. Samples: 60933120. Policy #0 lag: (min: 0.0, avg: 9.4, max: 21.0) [2024-06-05 18:11:58,920][10130] Avg episode reward: [(0, '0.004')] [2024-06-05 18:12:00,338][10367] Updated weights for policy 0, policy_version 3720 (0.0031) [2024-06-05 18:12:03,649][10367] Updated weights for policy 0, policy_version 3730 (0.0037) [2024-06-05 18:12:03,920][10130] Fps is (10 sec: 50790.4, 60 sec: 49425.0, 300 sec: 49318.6). Total num frames: 61128704. Throughput: 0: 49399.9. Samples: 61217300. 
Policy #0 lag: (min: 1.0, avg: 8.0, max: 19.0) [2024-06-05 18:12:03,920][10130] Avg episode reward: [(0, '0.004')] [2024-06-05 18:12:07,101][10367] Updated weights for policy 0, policy_version 3740 (0.0047) [2024-06-05 18:12:08,920][10130] Fps is (10 sec: 47512.8, 60 sec: 49151.9, 300 sec: 49263.1). Total num frames: 61358080. Throughput: 0: 49011.1. Samples: 61505420. Policy #0 lag: (min: 0.0, avg: 9.2, max: 21.0) [2024-06-05 18:12:08,920][10130] Avg episode reward: [(0, '0.002')] [2024-06-05 18:12:10,279][10367] Updated weights for policy 0, policy_version 3750 (0.0022) [2024-06-05 18:12:13,494][10367] Updated weights for policy 0, policy_version 3760 (0.0028) [2024-06-05 18:12:13,920][10130] Fps is (10 sec: 47513.6, 60 sec: 49425.1, 300 sec: 49263.1). Total num frames: 61603840. Throughput: 0: 48998.5. Samples: 61646180. Policy #0 lag: (min: 0.0, avg: 12.5, max: 22.0) [2024-06-05 18:12:13,920][10130] Avg episode reward: [(0, '0.004')] [2024-06-05 18:12:16,446][10347] Signal inference workers to stop experience collection... (900 times) [2024-06-05 18:12:16,494][10367] InferenceWorker_p0-w0: stopping experience collection (900 times) [2024-06-05 18:12:16,500][10347] Signal inference workers to resume experience collection... (900 times) [2024-06-05 18:12:16,508][10367] InferenceWorker_p0-w0: resuming experience collection (900 times) [2024-06-05 18:12:17,004][10367] Updated weights for policy 0, policy_version 3770 (0.0033) [2024-06-05 18:12:18,920][10130] Fps is (10 sec: 50790.9, 60 sec: 49151.9, 300 sec: 49318.6). Total num frames: 61865984. Throughput: 0: 49091.1. Samples: 61948380. 
Policy #0 lag: (min: 0.0, avg: 10.2, max: 22.0) [2024-06-05 18:12:18,921][10130] Avg episode reward: [(0, '0.003')] [2024-06-05 18:12:20,137][10367] Updated weights for policy 0, policy_version 3780 (0.0027) [2024-06-05 18:12:23,579][10367] Updated weights for policy 0, policy_version 3790 (0.0031) [2024-06-05 18:12:23,920][10130] Fps is (10 sec: 50790.2, 60 sec: 49152.1, 300 sec: 49318.6). Total num frames: 62111744. Throughput: 0: 49406.5. Samples: 62252420. Policy #0 lag: (min: 0.0, avg: 10.0, max: 22.0) [2024-06-05 18:12:23,920][10130] Avg episode reward: [(0, '0.004')] [2024-06-05 18:12:27,065][10367] Updated weights for policy 0, policy_version 3800 (0.0030) [2024-06-05 18:12:28,920][10130] Fps is (10 sec: 47514.2, 60 sec: 49152.0, 300 sec: 49263.1). Total num frames: 62341120. Throughput: 0: 48964.4. Samples: 62389480. Policy #0 lag: (min: 0.0, avg: 12.0, max: 23.0) [2024-06-05 18:12:28,920][10130] Avg episode reward: [(0, '0.003')] [2024-06-05 18:12:30,435][10367] Updated weights for policy 0, policy_version 3810 (0.0043) [2024-06-05 18:12:33,826][10367] Updated weights for policy 0, policy_version 3820 (0.0023) [2024-06-05 18:12:33,920][10130] Fps is (10 sec: 47513.3, 60 sec: 48878.9, 300 sec: 49263.0). Total num frames: 62586880. Throughput: 0: 48728.8. Samples: 62678980. Policy #0 lag: (min: 0.0, avg: 10.0, max: 23.0) [2024-06-05 18:12:33,921][10130] Avg episode reward: [(0, '0.003')] [2024-06-05 18:12:33,935][10347] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000003820_62586880.pth... [2024-06-05 18:12:33,984][10347] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000003098_50757632.pth [2024-06-05 18:12:37,124][10367] Updated weights for policy 0, policy_version 3830 (0.0030) [2024-06-05 18:12:38,920][10130] Fps is (10 sec: 50790.4, 60 sec: 48878.9, 300 sec: 49318.6). Total num frames: 62849024. Throughput: 0: 48750.3. Samples: 62970960. 
Policy #0 lag: (min: 1.0, avg: 9.9, max: 22.0) [2024-06-05 18:12:38,920][10130] Avg episode reward: [(0, '0.003')] [2024-06-05 18:12:40,456][10367] Updated weights for policy 0, policy_version 3840 (0.0045) [2024-06-05 18:12:43,920][10130] Fps is (10 sec: 47514.5, 60 sec: 48332.8, 300 sec: 49263.7). Total num frames: 63062016. Throughput: 0: 48675.6. Samples: 63123520. Policy #0 lag: (min: 0.0, avg: 12.0, max: 21.0) [2024-06-05 18:12:43,920][10130] Avg episode reward: [(0, '0.004')] [2024-06-05 18:12:43,934][10367] Updated weights for policy 0, policy_version 3850 (0.0031) [2024-06-05 18:12:47,303][10367] Updated weights for policy 0, policy_version 3860 (0.0029) [2024-06-05 18:12:48,920][10130] Fps is (10 sec: 45874.9, 60 sec: 48879.0, 300 sec: 49207.9). Total num frames: 63307776. Throughput: 0: 48710.7. Samples: 63409280. Policy #0 lag: (min: 1.0, avg: 10.0, max: 21.0) [2024-06-05 18:12:48,920][10130] Avg episode reward: [(0, '0.002')] [2024-06-05 18:12:50,676][10367] Updated weights for policy 0, policy_version 3870 (0.0026) [2024-06-05 18:12:53,920][10130] Fps is (10 sec: 49151.2, 60 sec: 48878.9, 300 sec: 49207.5). Total num frames: 63553536. Throughput: 0: 48662.3. Samples: 63695220. Policy #0 lag: (min: 0.0, avg: 12.1, max: 21.0) [2024-06-05 18:12:53,921][10130] Avg episode reward: [(0, '0.005')] [2024-06-05 18:12:54,247][10367] Updated weights for policy 0, policy_version 3880 (0.0027) [2024-06-05 18:12:57,529][10367] Updated weights for policy 0, policy_version 3890 (0.0027) [2024-06-05 18:12:58,920][10130] Fps is (10 sec: 50790.0, 60 sec: 48878.9, 300 sec: 49263.1). Total num frames: 63815680. Throughput: 0: 48908.0. Samples: 63847040. Policy #0 lag: (min: 0.0, avg: 8.9, max: 22.0) [2024-06-05 18:12:58,920][10130] Avg episode reward: [(0, '0.004')] [2024-06-05 18:13:00,938][10367] Updated weights for policy 0, policy_version 3900 (0.0028) [2024-06-05 18:13:03,920][10130] Fps is (10 sec: 47514.7, 60 sec: 48332.9, 300 sec: 49152.0). 
Total num frames: 64028672. Throughput: 0: 48683.3. Samples: 64139120. Policy #0 lag: (min: 0.0, avg: 9.2, max: 21.0) [2024-06-05 18:13:03,920][10130] Avg episode reward: [(0, '0.005')] [2024-06-05 18:13:04,096][10367] Updated weights for policy 0, policy_version 3910 (0.0023) [2024-06-05 18:13:07,610][10367] Updated weights for policy 0, policy_version 3920 (0.0031) [2024-06-05 18:13:08,920][10130] Fps is (10 sec: 45874.8, 60 sec: 48605.9, 300 sec: 49152.0). Total num frames: 64274432. Throughput: 0: 48501.7. Samples: 64435000. Policy #0 lag: (min: 0.0, avg: 11.2, max: 21.0) [2024-06-05 18:13:08,921][10130] Avg episode reward: [(0, '0.004')] [2024-06-05 18:13:10,800][10367] Updated weights for policy 0, policy_version 3930 (0.0018) [2024-06-05 18:13:13,920][10130] Fps is (10 sec: 50790.0, 60 sec: 48879.0, 300 sec: 49096.5). Total num frames: 64536576. Throughput: 0: 48576.0. Samples: 64575400. Policy #0 lag: (min: 0.0, avg: 9.8, max: 21.0) [2024-06-05 18:13:13,920][10130] Avg episode reward: [(0, '0.004')] [2024-06-05 18:13:14,180][10367] Updated weights for policy 0, policy_version 3940 (0.0035) [2024-06-05 18:13:17,441][10347] Signal inference workers to stop experience collection... (950 times) [2024-06-05 18:13:17,441][10347] Signal inference workers to resume experience collection... (950 times) [2024-06-05 18:13:17,491][10367] InferenceWorker_p0-w0: stopping experience collection (950 times) [2024-06-05 18:13:17,492][10367] InferenceWorker_p0-w0: resuming experience collection (950 times) [2024-06-05 18:13:17,573][10367] Updated weights for policy 0, policy_version 3950 (0.0034) [2024-06-05 18:13:18,923][10130] Fps is (10 sec: 52414.4, 60 sec: 48876.6, 300 sec: 49151.5). Total num frames: 64798720. Throughput: 0: 48669.9. Samples: 64869260. 
Policy #0 lag: (min: 0.0, avg: 9.5, max: 21.0) [2024-06-05 18:13:18,923][10130] Avg episode reward: [(0, '0.003')] [2024-06-05 18:13:21,041][10367] Updated weights for policy 0, policy_version 3960 (0.0030) [2024-06-05 18:13:23,920][10130] Fps is (10 sec: 45875.4, 60 sec: 48059.8, 300 sec: 49040.9). Total num frames: 64995328. Throughput: 0: 48584.9. Samples: 65157280. Policy #0 lag: (min: 0.0, avg: 11.0, max: 22.0) [2024-06-05 18:13:23,920][10130] Avg episode reward: [(0, '0.003')] [2024-06-05 18:13:24,492][10367] Updated weights for policy 0, policy_version 3970 (0.0022) [2024-06-05 18:13:27,516][10367] Updated weights for policy 0, policy_version 3980 (0.0018) [2024-06-05 18:13:28,920][10130] Fps is (10 sec: 42607.9, 60 sec: 48059.2, 300 sec: 49040.8). Total num frames: 65224704. Throughput: 0: 48362.8. Samples: 65299880. Policy #0 lag: (min: 1.0, avg: 12.2, max: 25.0) [2024-06-05 18:13:28,921][10130] Avg episode reward: [(0, '0.004')] [2024-06-05 18:13:31,229][10367] Updated weights for policy 0, policy_version 3990 (0.0024) [2024-06-05 18:13:33,920][10130] Fps is (10 sec: 52428.2, 60 sec: 48879.0, 300 sec: 49096.4). Total num frames: 65519616. Throughput: 0: 48459.0. Samples: 65589940. Policy #0 lag: (min: 0.0, avg: 10.3, max: 22.0) [2024-06-05 18:13:33,920][10130] Avg episode reward: [(0, '0.003')] [2024-06-05 18:13:34,592][10367] Updated weights for policy 0, policy_version 4000 (0.0023) [2024-06-05 18:13:37,789][10367] Updated weights for policy 0, policy_version 4010 (0.0030) [2024-06-05 18:13:38,920][10130] Fps is (10 sec: 55709.5, 60 sec: 48878.9, 300 sec: 49152.0). Total num frames: 65781760. Throughput: 0: 48744.2. Samples: 65888700. Policy #0 lag: (min: 0.0, avg: 9.7, max: 21.0) [2024-06-05 18:13:38,920][10130] Avg episode reward: [(0, '0.002')] [2024-06-05 18:13:41,229][10367] Updated weights for policy 0, policy_version 4020 (0.0031) [2024-06-05 18:13:43,920][10130] Fps is (10 sec: 45875.6, 60 sec: 48605.9, 300 sec: 48985.4). 
Total num frames: 65978368. Throughput: 0: 48718.8. Samples: 66039380. Policy #0 lag: (min: 0.0, avg: 9.2, max: 22.0) [2024-06-05 18:13:43,920][10130] Avg episode reward: [(0, '0.005')] [2024-06-05 18:13:44,546][10367] Updated weights for policy 0, policy_version 4030 (0.0038) [2024-06-05 18:13:48,015][10367] Updated weights for policy 0, policy_version 4040 (0.0026) [2024-06-05 18:13:48,920][10130] Fps is (10 sec: 42598.3, 60 sec: 48332.8, 300 sec: 48929.9). Total num frames: 66207744. Throughput: 0: 48575.0. Samples: 66325000. Policy #0 lag: (min: 0.0, avg: 9.1, max: 21.0) [2024-06-05 18:13:48,920][10130] Avg episode reward: [(0, '0.006')] [2024-06-05 18:13:51,473][10367] Updated weights for policy 0, policy_version 4050 (0.0041) [2024-06-05 18:13:53,920][10130] Fps is (10 sec: 50790.4, 60 sec: 48879.1, 300 sec: 48985.4). Total num frames: 66486272. Throughput: 0: 48392.2. Samples: 66612640. Policy #0 lag: (min: 0.0, avg: 12.9, max: 22.0) [2024-06-05 18:13:53,920][10130] Avg episode reward: [(0, '0.005')] [2024-06-05 18:13:54,586][10367] Updated weights for policy 0, policy_version 4060 (0.0026) [2024-06-05 18:13:58,002][10367] Updated weights for policy 0, policy_version 4070 (0.0028) [2024-06-05 18:13:58,920][10130] Fps is (10 sec: 54066.3, 60 sec: 48878.9, 300 sec: 49096.4). Total num frames: 66748416. Throughput: 0: 48756.7. Samples: 66769460. Policy #0 lag: (min: 0.0, avg: 8.6, max: 21.0) [2024-06-05 18:13:58,921][10130] Avg episode reward: [(0, '0.004')] [2024-06-05 18:14:01,354][10367] Updated weights for policy 0, policy_version 4080 (0.0034) [2024-06-05 18:14:03,920][10130] Fps is (10 sec: 47513.9, 60 sec: 48878.9, 300 sec: 48985.4). Total num frames: 66961408. Throughput: 0: 48665.9. Samples: 67059080. 
Policy #0 lag: (min: 0.0, avg: 9.1, max: 21.0) [2024-06-05 18:14:03,920][10130] Avg episode reward: [(0, '0.004')] [2024-06-05 18:14:04,782][10367] Updated weights for policy 0, policy_version 4090 (0.0024) [2024-06-05 18:14:08,246][10367] Updated weights for policy 0, policy_version 4100 (0.0033) [2024-06-05 18:14:08,920][10130] Fps is (10 sec: 44237.3, 60 sec: 48606.0, 300 sec: 48929.8). Total num frames: 67190784. Throughput: 0: 48763.9. Samples: 67351660. Policy #0 lag: (min: 0.0, avg: 11.0, max: 21.0) [2024-06-05 18:14:08,920][10130] Avg episode reward: [(0, '0.005')] [2024-06-05 18:14:11,514][10367] Updated weights for policy 0, policy_version 4110 (0.0028) [2024-06-05 18:14:13,920][10130] Fps is (10 sec: 50789.9, 60 sec: 48878.9, 300 sec: 48929.8). Total num frames: 67469312. Throughput: 0: 48859.3. Samples: 67498520. Policy #0 lag: (min: 1.0, avg: 11.4, max: 23.0) [2024-06-05 18:14:13,920][10130] Avg episode reward: [(0, '0.004')] [2024-06-05 18:14:14,848][10367] Updated weights for policy 0, policy_version 4120 (0.0033) [2024-06-05 18:14:18,375][10367] Updated weights for policy 0, policy_version 4130 (0.0038) [2024-06-05 18:14:18,920][10130] Fps is (10 sec: 50790.7, 60 sec: 48335.2, 300 sec: 48929.8). Total num frames: 67698688. Throughput: 0: 48785.4. Samples: 67785280. Policy #0 lag: (min: 1.0, avg: 8.3, max: 21.0) [2024-06-05 18:14:18,920][10130] Avg episode reward: [(0, '0.004')] [2024-06-05 18:14:21,783][10367] Updated weights for policy 0, policy_version 4140 (0.0031) [2024-06-05 18:14:23,920][10130] Fps is (10 sec: 45875.6, 60 sec: 48878.9, 300 sec: 48929.8). Total num frames: 67928064. Throughput: 0: 48765.8. Samples: 68083160. Policy #0 lag: (min: 0.0, avg: 11.1, max: 21.0) [2024-06-05 18:14:23,920][10130] Avg episode reward: [(0, '0.005')] [2024-06-05 18:14:24,576][10347] Signal inference workers to stop experience collection... (1000 times) [2024-06-05 18:14:24,576][10347] Signal inference workers to resume experience collection... 
(1000 times) [2024-06-05 18:14:24,595][10367] InferenceWorker_p0-w0: stopping experience collection (1000 times) [2024-06-05 18:14:24,595][10367] InferenceWorker_p0-w0: resuming experience collection (1000 times) [2024-06-05 18:14:24,869][10367] Updated weights for policy 0, policy_version 4150 (0.0030) [2024-06-05 18:14:28,383][10367] Updated weights for policy 0, policy_version 4160 (0.0030) [2024-06-05 18:14:28,920][10130] Fps is (10 sec: 47512.9, 60 sec: 49152.4, 300 sec: 48929.8). Total num frames: 68173824. Throughput: 0: 48502.1. Samples: 68221980. Policy #0 lag: (min: 0.0, avg: 11.5, max: 23.0) [2024-06-05 18:14:28,921][10130] Avg episode reward: [(0, '0.004')] [2024-06-05 18:14:31,350][10367] Updated weights for policy 0, policy_version 4170 (0.0027) [2024-06-05 18:14:33,924][10130] Fps is (10 sec: 52408.9, 60 sec: 48875.9, 300 sec: 48929.2). Total num frames: 68452352. Throughput: 0: 48815.5. Samples: 68521880. Policy #0 lag: (min: 1.0, avg: 10.0, max: 22.0) [2024-06-05 18:14:33,924][10130] Avg episode reward: [(0, '0.006')] [2024-06-05 18:14:33,936][10347] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000004178_68452352.pth... [2024-06-05 18:14:33,979][10347] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000003460_56688640.pth [2024-06-05 18:14:33,982][10347] Saving new best policy, reward=0.006! [2024-06-05 18:14:35,028][10367] Updated weights for policy 0, policy_version 4180 (0.0041) [2024-06-05 18:14:38,147][10367] Updated weights for policy 0, policy_version 4190 (0.0033) [2024-06-05 18:14:38,920][10130] Fps is (10 sec: 52429.4, 60 sec: 48605.8, 300 sec: 49040.9). Total num frames: 68698112. Throughput: 0: 49030.2. Samples: 68819000. 
Policy #0 lag: (min: 0.0, avg: 10.0, max: 21.0) [2024-06-05 18:14:38,920][10130] Avg episode reward: [(0, '0.006')] [2024-06-05 18:14:41,640][10367] Updated weights for policy 0, policy_version 4200 (0.0029) [2024-06-05 18:14:43,920][10130] Fps is (10 sec: 45892.2, 60 sec: 48878.9, 300 sec: 48929.8). Total num frames: 68911104. Throughput: 0: 48780.6. Samples: 68964580. Policy #0 lag: (min: 0.0, avg: 11.8, max: 22.0) [2024-06-05 18:14:43,920][10130] Avg episode reward: [(0, '0.006')] [2024-06-05 18:14:45,022][10367] Updated weights for policy 0, policy_version 4210 (0.0027) [2024-06-05 18:14:48,421][10367] Updated weights for policy 0, policy_version 4220 (0.0028) [2024-06-05 18:14:48,920][10130] Fps is (10 sec: 45874.9, 60 sec: 49151.9, 300 sec: 48929.8). Total num frames: 69156864. Throughput: 0: 48812.3. Samples: 69255640. Policy #0 lag: (min: 0.0, avg: 8.9, max: 21.0) [2024-06-05 18:14:48,921][10130] Avg episode reward: [(0, '0.004')] [2024-06-05 18:14:51,454][10367] Updated weights for policy 0, policy_version 4230 (0.0035) [2024-06-05 18:14:53,920][10130] Fps is (10 sec: 52429.1, 60 sec: 49152.0, 300 sec: 48929.9). Total num frames: 69435392. Throughput: 0: 48889.4. Samples: 69551680. Policy #0 lag: (min: 0.0, avg: 9.2, max: 21.0) [2024-06-05 18:14:53,920][10130] Avg episode reward: [(0, '0.007')] [2024-06-05 18:14:55,027][10367] Updated weights for policy 0, policy_version 4240 (0.0041) [2024-06-05 18:14:58,047][10367] Updated weights for policy 0, policy_version 4250 (0.0027) [2024-06-05 18:14:58,920][10130] Fps is (10 sec: 52429.4, 60 sec: 48879.1, 300 sec: 48985.4). Total num frames: 69681152. Throughput: 0: 49213.9. Samples: 69713140. Policy #0 lag: (min: 0.0, avg: 11.1, max: 22.0) [2024-06-05 18:14:58,920][10130] Avg episode reward: [(0, '0.005')] [2024-06-05 18:15:01,773][10367] Updated weights for policy 0, policy_version 4260 (0.0037) [2024-06-05 18:15:03,924][10130] Fps is (10 sec: 47496.1, 60 sec: 49148.9, 300 sec: 48984.8). 
Total num frames: 69910528. Throughput: 0: 49217.7. Samples: 70000260. Policy #0 lag: (min: 0.0, avg: 10.4, max: 22.0) [2024-06-05 18:15:03,924][10130] Avg episode reward: [(0, '0.006')] [2024-06-05 18:15:04,951][10367] Updated weights for policy 0, policy_version 4270 (0.0035) [2024-06-05 18:15:08,317][10367] Updated weights for policy 0, policy_version 4280 (0.0035) [2024-06-05 18:15:08,920][10130] Fps is (10 sec: 44236.6, 60 sec: 48879.0, 300 sec: 48874.3). Total num frames: 70123520. Throughput: 0: 49046.2. Samples: 70290240. Policy #0 lag: (min: 0.0, avg: 10.4, max: 22.0) [2024-06-05 18:15:08,920][10130] Avg episode reward: [(0, '0.004')] [2024-06-05 18:15:11,673][10367] Updated weights for policy 0, policy_version 4290 (0.0024) [2024-06-05 18:15:13,920][10130] Fps is (10 sec: 49170.4, 60 sec: 48879.0, 300 sec: 48874.3). Total num frames: 70402048. Throughput: 0: 49359.3. Samples: 70443140. Policy #0 lag: (min: 0.0, avg: 11.8, max: 21.0) [2024-06-05 18:15:13,920][10130] Avg episode reward: [(0, '0.008')] [2024-06-05 18:15:13,983][10347] Saving new best policy, reward=0.008! [2024-06-05 18:15:15,044][10367] Updated weights for policy 0, policy_version 4300 (0.0043) [2024-06-05 18:15:18,260][10367] Updated weights for policy 0, policy_version 4310 (0.0036) [2024-06-05 18:15:18,920][10130] Fps is (10 sec: 52428.8, 60 sec: 49152.0, 300 sec: 48985.4). Total num frames: 70647808. Throughput: 0: 49237.5. Samples: 70737380. Policy #0 lag: (min: 0.0, avg: 9.5, max: 21.0) [2024-06-05 18:15:18,920][10130] Avg episode reward: [(0, '0.004')] [2024-06-05 18:15:21,743][10367] Updated weights for policy 0, policy_version 4320 (0.0020) [2024-06-05 18:15:23,920][10130] Fps is (10 sec: 49151.0, 60 sec: 49424.9, 300 sec: 48985.4). Total num frames: 70893568. Throughput: 0: 49255.4. Samples: 71035500. 
Policy #0 lag: (min: 0.0, avg: 8.6, max: 21.0)
[2024-06-05 18:15:23,920][10130] Avg episode reward: [(0, '0.008')]
[2024-06-05 18:15:24,854][10367] Updated weights for policy 0, policy_version 4330 (0.0026)
[2024-06-05 18:15:28,462][10367] Updated weights for policy 0, policy_version 4340 (0.0028)
[2024-06-05 18:15:28,920][10130] Fps is (10 sec: 47512.9, 60 sec: 49152.0, 300 sec: 48985.4). Total num frames: 71122944. Throughput: 0: 48995.9. Samples: 71169400. Policy #0 lag: (min: 0.0, avg: 10.6, max: 22.0)
[2024-06-05 18:15:28,921][10130] Avg episode reward: [(0, '0.006')]
[2024-06-05 18:15:31,725][10367] Updated weights for policy 0, policy_version 4350 (0.0034)
[2024-06-05 18:15:33,920][10130] Fps is (10 sec: 49152.6, 60 sec: 48882.0, 300 sec: 48929.8). Total num frames: 71385088. Throughput: 0: 49076.0. Samples: 71464060. Policy #0 lag: (min: 0.0, avg: 11.1, max: 22.0)
[2024-06-05 18:15:33,920][10130] Avg episode reward: [(0, '0.009')]
[2024-06-05 18:15:34,865][10367] Updated weights for policy 0, policy_version 4360 (0.0030)
[2024-06-05 18:15:38,424][10367] Updated weights for policy 0, policy_version 4370 (0.0029)
[2024-06-05 18:15:38,920][10130] Fps is (10 sec: 52428.7, 60 sec: 49151.9, 300 sec: 49040.9). Total num frames: 71647232. Throughput: 0: 49336.7. Samples: 71771840. Policy #0 lag: (min: 0.0, avg: 8.6, max: 21.0)
[2024-06-05 18:15:38,921][10130] Avg episode reward: [(0, '0.008')]
[2024-06-05 18:15:40,566][10347] Signal inference workers to stop experience collection... (1050 times)
[2024-06-05 18:15:40,586][10367] InferenceWorker_p0-w0: stopping experience collection (1050 times)
[2024-06-05 18:15:40,672][10347] Signal inference workers to resume experience collection... (1050 times)
[2024-06-05 18:15:40,673][10367] InferenceWorker_p0-w0: resuming experience collection (1050 times)
[2024-06-05 18:15:41,813][10367] Updated weights for policy 0, policy_version 4380 (0.0026)
[2024-06-05 18:15:43,920][10130] Fps is (10 sec: 47513.5, 60 sec: 49152.0, 300 sec: 48929.8). Total num frames: 71860224. Throughput: 0: 48759.0. Samples: 71907300. Policy #0 lag: (min: 0.0, avg: 9.7, max: 22.0)
[2024-06-05 18:15:43,920][10130] Avg episode reward: [(0, '0.005')]
[2024-06-05 18:15:44,958][10367] Updated weights for policy 0, policy_version 4390 (0.0030)
[2024-06-05 18:15:48,556][10367] Updated weights for policy 0, policy_version 4400 (0.0025)
[2024-06-05 18:15:48,920][10130] Fps is (10 sec: 45875.9, 60 sec: 49152.1, 300 sec: 48985.4). Total num frames: 72105984. Throughput: 0: 48916.4. Samples: 72201320. Policy #0 lag: (min: 0.0, avg: 10.0, max: 22.0)
[2024-06-05 18:15:48,920][10130] Avg episode reward: [(0, '0.008')]
[2024-06-05 18:15:51,699][10367] Updated weights for policy 0, policy_version 4410 (0.0032)
[2024-06-05 18:15:53,920][10130] Fps is (10 sec: 49152.5, 60 sec: 48605.9, 300 sec: 48874.3). Total num frames: 72351744. Throughput: 0: 48901.4. Samples: 72490800. Policy #0 lag: (min: 0.0, avg: 10.4, max: 22.0)
[2024-06-05 18:15:53,920][10130] Avg episode reward: [(0, '0.007')]
[2024-06-05 18:15:55,278][10367] Updated weights for policy 0, policy_version 4420 (0.0028)
[2024-06-05 18:15:58,493][10367] Updated weights for policy 0, policy_version 4430 (0.0029)
[2024-06-05 18:15:58,920][10130] Fps is (10 sec: 50790.5, 60 sec: 48878.9, 300 sec: 48985.4). Total num frames: 72613888. Throughput: 0: 48832.4. Samples: 72640600. Policy #0 lag: (min: 0.0, avg: 9.6, max: 24.0)
[2024-06-05 18:15:58,920][10130] Avg episode reward: [(0, '0.008')]
[2024-06-05 18:16:01,878][10367] Updated weights for policy 0, policy_version 4440 (0.0029)
[2024-06-05 18:16:03,920][10130] Fps is (10 sec: 49151.1, 60 sec: 48881.8, 300 sec: 48929.8). Total num frames: 72843264. Throughput: 0: 48819.8. Samples: 72934280. Policy #0 lag: (min: 0.0, avg: 9.9, max: 23.0)
[2024-06-05 18:16:03,921][10130] Avg episode reward: [(0, '0.008')]
[2024-06-05 18:16:05,273][10367] Updated weights for policy 0, policy_version 4450 (0.0034)
[2024-06-05 18:16:08,718][10367] Updated weights for policy 0, policy_version 4460 (0.0030)
[2024-06-05 18:16:08,920][10130] Fps is (10 sec: 45875.3, 60 sec: 49152.0, 300 sec: 48929.9). Total num frames: 73072640. Throughput: 0: 48721.5. Samples: 73227960. Policy #0 lag: (min: 0.0, avg: 10.4, max: 21.0)
[2024-06-05 18:16:08,920][10130] Avg episode reward: [(0, '0.005')]
[2024-06-05 18:16:11,703][10367] Updated weights for policy 0, policy_version 4470 (0.0032)
[2024-06-05 18:16:13,920][10130] Fps is (10 sec: 47514.5, 60 sec: 48605.9, 300 sec: 48818.8). Total num frames: 73318400. Throughput: 0: 49091.8. Samples: 73378520. Policy #0 lag: (min: 0.0, avg: 11.1, max: 21.0)
[2024-06-05 18:16:13,920][10130] Avg episode reward: [(0, '0.008')]
[2024-06-05 18:16:15,637][10367] Updated weights for policy 0, policy_version 4480 (0.0023)
[2024-06-05 18:16:18,395][10367] Updated weights for policy 0, policy_version 4490 (0.0037)
[2024-06-05 18:16:18,920][10130] Fps is (10 sec: 50789.5, 60 sec: 48878.8, 300 sec: 48874.3). Total num frames: 73580544. Throughput: 0: 48871.5. Samples: 73663280. Policy #0 lag: (min: 0.0, avg: 10.5, max: 21.0)
[2024-06-05 18:16:18,921][10130] Avg episode reward: [(0, '0.009')]
[2024-06-05 18:16:22,445][10367] Updated weights for policy 0, policy_version 4500 (0.0037)
[2024-06-05 18:16:23,920][10130] Fps is (10 sec: 49151.5, 60 sec: 48605.9, 300 sec: 48874.3). Total num frames: 73809920. Throughput: 0: 48405.5. Samples: 73950080. Policy #0 lag: (min: 0.0, avg: 9.4, max: 23.0)
[2024-06-05 18:16:23,920][10130] Avg episode reward: [(0, '0.008')]
[2024-06-05 18:16:25,456][10367] Updated weights for policy 0, policy_version 4510 (0.0033)
[2024-06-05 18:16:28,920][10130] Fps is (10 sec: 45875.9, 60 sec: 48606.0, 300 sec: 48763.2). Total num frames: 74039296. Throughput: 0: 48401.4. Samples: 74085360. Policy #0 lag: (min: 0.0, avg: 9.9, max: 20.0)
[2024-06-05 18:16:28,920][10130] Avg episode reward: [(0, '0.007')]
[2024-06-05 18:16:29,107][10367] Updated weights for policy 0, policy_version 4520 (0.0026)
[2024-06-05 18:16:32,267][10367] Updated weights for policy 0, policy_version 4530 (0.0039)
[2024-06-05 18:16:33,920][10130] Fps is (10 sec: 49151.8, 60 sec: 48605.8, 300 sec: 48763.2). Total num frames: 74301440. Throughput: 0: 48416.4. Samples: 74380060. Policy #0 lag: (min: 0.0, avg: 10.5, max: 20.0)
[2024-06-05 18:16:33,921][10130] Avg episode reward: [(0, '0.008')]
[2024-06-05 18:16:33,929][10347] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000004535_74301440.pth...
[2024-06-05 18:16:33,973][10347] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000003820_62586880.pth
[2024-06-05 18:16:35,928][10367] Updated weights for policy 0, policy_version 4540 (0.0037)
[2024-06-05 18:16:38,815][10367] Updated weights for policy 0, policy_version 4550 (0.0030)
[2024-06-05 18:16:38,920][10130] Fps is (10 sec: 50790.4, 60 sec: 48332.9, 300 sec: 48763.2). Total num frames: 74547200. Throughput: 0: 48551.5. Samples: 74675620. Policy #0 lag: (min: 0.0, avg: 10.4, max: 22.0)
[2024-06-05 18:16:38,920][10130] Avg episode reward: [(0, '0.007')]
[2024-06-05 18:16:43,010][10367] Updated weights for policy 0, policy_version 4560 (0.0032)
[2024-06-05 18:16:43,923][10130] Fps is (10 sec: 45858.9, 60 sec: 48329.9, 300 sec: 48762.6). Total num frames: 74760192. Throughput: 0: 48322.3. Samples: 74815280. Policy #0 lag: (min: 0.0, avg: 9.1, max: 21.0)
[2024-06-05 18:16:43,924][10130] Avg episode reward: [(0, '0.007')]
[2024-06-05 18:16:45,486][10367] Updated weights for policy 0, policy_version 4570 (0.0034)
[2024-06-05 18:16:48,920][10130] Fps is (10 sec: 45874.8, 60 sec: 48332.7, 300 sec: 48763.2). Total num frames: 75005952. Throughput: 0: 48262.3. Samples: 75106080. Policy #0 lag: (min: 0.0, avg: 11.8, max: 22.0)
[2024-06-05 18:16:48,920][10130] Avg episode reward: [(0, '0.012')]
[2024-06-05 18:16:48,921][10347] Saving new best policy, reward=0.012!
[2024-06-05 18:16:49,810][10367] Updated weights for policy 0, policy_version 4580 (0.0033)
[2024-06-05 18:16:51,643][10347] Signal inference workers to stop experience collection... (1100 times)
[2024-06-05 18:16:51,643][10347] Signal inference workers to resume experience collection... (1100 times)
[2024-06-05 18:16:51,680][10367] InferenceWorker_p0-w0: stopping experience collection (1100 times)
[2024-06-05 18:16:51,680][10367] InferenceWorker_p0-w0: resuming experience collection (1100 times)
[2024-06-05 18:16:52,411][10367] Updated weights for policy 0, policy_version 4590 (0.0039)
[2024-06-05 18:16:53,920][10130] Fps is (10 sec: 50808.5, 60 sec: 48605.8, 300 sec: 48763.2). Total num frames: 75268096. Throughput: 0: 48164.3. Samples: 75395360. Policy #0 lag: (min: 0.0, avg: 10.7, max: 20.0)
[2024-06-05 18:16:53,920][10130] Avg episode reward: [(0, '0.007')]
[2024-06-05 18:16:56,399][10367] Updated weights for policy 0, policy_version 4600 (0.0035)
[2024-06-05 18:16:58,920][10130] Fps is (10 sec: 49152.1, 60 sec: 48059.7, 300 sec: 48707.7). Total num frames: 75497472. Throughput: 0: 48190.6. Samples: 75547100. Policy #0 lag: (min: 0.0, avg: 9.8, max: 21.0)
[2024-06-05 18:16:58,920][10130] Avg episode reward: [(0, '0.008')]
[2024-06-05 18:16:59,295][10367] Updated weights for policy 0, policy_version 4610 (0.0034)
[2024-06-05 18:17:03,273][10367] Updated weights for policy 0, policy_version 4620 (0.0030)
[2024-06-05 18:17:03,920][10130] Fps is (10 sec: 44237.2, 60 sec: 47786.8, 300 sec: 48652.2). Total num frames: 75710464. Throughput: 0: 48051.7. Samples: 75825600. Policy #0 lag: (min: 0.0, avg: 10.0, max: 22.0)
[2024-06-05 18:17:03,920][10130] Avg episode reward: [(0, '0.007')]
[2024-06-05 18:17:05,892][10367] Updated weights for policy 0, policy_version 4630 (0.0033)
[2024-06-05 18:17:08,920][10130] Fps is (10 sec: 49151.9, 60 sec: 48605.8, 300 sec: 48763.2). Total num frames: 75988992. Throughput: 0: 47989.3. Samples: 76109600. Policy #0 lag: (min: 0.0, avg: 10.5, max: 21.0)
[2024-06-05 18:17:08,921][10130] Avg episode reward: [(0, '0.007')]
[2024-06-05 18:17:10,492][10367] Updated weights for policy 0, policy_version 4640 (0.0022)
[2024-06-05 18:17:12,632][10367] Updated weights for policy 0, policy_version 4650 (0.0028)
[2024-06-05 18:17:13,920][10130] Fps is (10 sec: 49152.2, 60 sec: 48059.7, 300 sec: 48596.6). Total num frames: 76201984. Throughput: 0: 48352.9. Samples: 76261240. Policy #0 lag: (min: 0.0, avg: 11.9, max: 20.0)
[2024-06-05 18:17:13,920][10130] Avg episode reward: [(0, '0.009')]
[2024-06-05 18:17:17,124][10367] Updated weights for policy 0, policy_version 4660 (0.0034)
[2024-06-05 18:17:18,920][10130] Fps is (10 sec: 47514.4, 60 sec: 48059.9, 300 sec: 48652.2). Total num frames: 76464128. Throughput: 0: 48333.1. Samples: 76555040. Policy #0 lag: (min: 0.0, avg: 10.2, max: 20.0)
[2024-06-05 18:17:18,920][10130] Avg episode reward: [(0, '0.009')]
[2024-06-05 18:17:19,654][10367] Updated weights for policy 0, policy_version 4670 (0.0027)
[2024-06-05 18:17:23,682][10367] Updated weights for policy 0, policy_version 4680 (0.0029)
[2024-06-05 18:17:23,920][10130] Fps is (10 sec: 47513.4, 60 sec: 47786.7, 300 sec: 48596.6). Total num frames: 76677120. Throughput: 0: 48136.9. Samples: 76841780. Policy #0 lag: (min: 0.0, avg: 8.7, max: 21.0)
[2024-06-05 18:17:23,920][10130] Avg episode reward: [(0, '0.010')]
[2024-06-05 18:17:26,490][10367] Updated weights for policy 0, policy_version 4690 (0.0025)
[2024-06-05 18:17:28,920][10130] Fps is (10 sec: 49151.5, 60 sec: 48605.9, 300 sec: 48707.7). Total num frames: 76955648. Throughput: 0: 48214.1. Samples: 76984740. Policy #0 lag: (min: 0.0, avg: 11.1, max: 21.0)
[2024-06-05 18:17:28,920][10130] Avg episode reward: [(0, '0.007')]
[2024-06-05 18:17:30,538][10367] Updated weights for policy 0, policy_version 4700 (0.0034)
[2024-06-05 18:17:33,110][10367] Updated weights for policy 0, policy_version 4710 (0.0028)
[2024-06-05 18:17:33,924][10130] Fps is (10 sec: 50771.7, 60 sec: 48056.9, 300 sec: 48596.0). Total num frames: 77185024. Throughput: 0: 48226.3. Samples: 77276440. Policy #0 lag: (min: 0.0, avg: 10.2, max: 20.0)
[2024-06-05 18:17:33,924][10130] Avg episode reward: [(0, '0.010')]
[2024-06-05 18:17:37,554][10367] Updated weights for policy 0, policy_version 4720 (0.0026)
[2024-06-05 18:17:38,920][10130] Fps is (10 sec: 45875.0, 60 sec: 47786.6, 300 sec: 48652.1). Total num frames: 77414400. Throughput: 0: 48136.0. Samples: 77561480. Policy #0 lag: (min: 0.0, avg: 10.2, max: 20.0)
[2024-06-05 18:17:38,920][10130] Avg episode reward: [(0, '0.013')]
[2024-06-05 18:17:38,980][10347] Saving new best policy, reward=0.013!
[2024-06-05 18:17:40,141][10367] Updated weights for policy 0, policy_version 4730 (0.0025)
[2024-06-05 18:17:43,920][10130] Fps is (10 sec: 45891.9, 60 sec: 48062.6, 300 sec: 48596.6). Total num frames: 77643776. Throughput: 0: 47862.7. Samples: 77700920. Policy #0 lag: (min: 0.0, avg: 10.6, max: 20.0)
[2024-06-05 18:17:43,920][10130] Avg episode reward: [(0, '0.010')]
[2024-06-05 18:17:44,205][10367] Updated weights for policy 0, policy_version 4740 (0.0038)
[2024-06-05 18:17:47,203][10367] Updated weights for policy 0, policy_version 4750 (0.0031)
[2024-06-05 18:17:48,920][10130] Fps is (10 sec: 49152.5, 60 sec: 48332.9, 300 sec: 48652.2). Total num frames: 77905920. Throughput: 0: 48052.5. Samples: 77987960. Policy #0 lag: (min: 0.0, avg: 8.7, max: 21.0)
[2024-06-05 18:17:48,920][10130] Avg episode reward: [(0, '0.012')]
[2024-06-05 18:17:50,742][10367] Updated weights for policy 0, policy_version 4760 (0.0025)
[2024-06-05 18:17:53,920][10130] Fps is (10 sec: 49152.1, 60 sec: 47786.7, 300 sec: 48541.1). Total num frames: 78135296. Throughput: 0: 48257.8. Samples: 78281200. Policy #0 lag: (min: 0.0, avg: 11.0, max: 23.0)
[2024-06-05 18:17:53,920][10130] Avg episode reward: [(0, '0.011')]
[2024-06-05 18:17:53,997][10367] Updated weights for policy 0, policy_version 4770 (0.0030)
[2024-06-05 18:17:57,487][10367] Updated weights for policy 0, policy_version 4780 (0.0030)
[2024-06-05 18:17:58,920][10130] Fps is (10 sec: 49151.6, 60 sec: 48332.8, 300 sec: 48707.7). Total num frames: 78397440. Throughput: 0: 48127.5. Samples: 78426980. Policy #0 lag: (min: 1.0, avg: 10.4, max: 22.0)
[2024-06-05 18:17:58,920][10130] Avg episode reward: [(0, '0.013')]
[2024-06-05 18:18:00,483][10367] Updated weights for policy 0, policy_version 4790 (0.0025)
[2024-06-05 18:18:02,976][10347] Signal inference workers to stop experience collection... (1150 times)
[2024-06-05 18:18:02,976][10347] Signal inference workers to resume experience collection... (1150 times)
[2024-06-05 18:18:02,985][10367] InferenceWorker_p0-w0: stopping experience collection (1150 times)
[2024-06-05 18:18:03,005][10367] InferenceWorker_p0-w0: resuming experience collection (1150 times)
[2024-06-05 18:18:03,920][10130] Fps is (10 sec: 47512.8, 60 sec: 48332.7, 300 sec: 48596.6). Total num frames: 78610432. Throughput: 0: 48143.7. Samples: 78721520. Policy #0 lag: (min: 1.0, avg: 10.4, max: 22.0)
[2024-06-05 18:18:03,921][10130] Avg episode reward: [(0, '0.009')]
[2024-06-05 18:18:04,331][10367] Updated weights for policy 0, policy_version 4800 (0.0026)
[2024-06-05 18:18:07,078][10367] Updated weights for policy 0, policy_version 4810 (0.0038)
[2024-06-05 18:18:08,920][10130] Fps is (10 sec: 47513.9, 60 sec: 48059.8, 300 sec: 48596.6). Total num frames: 78872576. Throughput: 0: 48372.5. Samples: 79018540. Policy #0 lag: (min: 0.0, avg: 10.0, max: 22.0)
[2024-06-05 18:18:08,920][10130] Avg episode reward: [(0, '0.011')]
[2024-06-05 18:18:10,896][10367] Updated weights for policy 0, policy_version 4820 (0.0046)
[2024-06-05 18:18:13,920][10130] Fps is (10 sec: 50791.4, 60 sec: 48605.8, 300 sec: 48541.5). Total num frames: 79118336. Throughput: 0: 48461.8. Samples: 79165520. Policy #0 lag: (min: 0.0, avg: 10.2, max: 22.0)
[2024-06-05 18:18:13,920][10130] Avg episode reward: [(0, '0.010')]
[2024-06-05 18:18:13,967][10367] Updated weights for policy 0, policy_version 4830 (0.0029)
[2024-06-05 18:18:17,433][10367] Updated weights for policy 0, policy_version 4840 (0.0026)
[2024-06-05 18:18:18,920][10130] Fps is (10 sec: 47513.5, 60 sec: 48059.7, 300 sec: 48652.1). Total num frames: 79347712. Throughput: 0: 48419.5. Samples: 79455140. Policy #0 lag: (min: 0.0, avg: 10.3, max: 20.0)
[2024-06-05 18:18:18,920][10130] Avg episode reward: [(0, '0.012')]
[2024-06-05 18:18:20,863][10367] Updated weights for policy 0, policy_version 4850 (0.0035)
[2024-06-05 18:18:23,920][10130] Fps is (10 sec: 47513.5, 60 sec: 48605.9, 300 sec: 48707.8). Total num frames: 79593472. Throughput: 0: 48383.6. Samples: 79738740. Policy #0 lag: (min: 1.0, avg: 11.6, max: 22.0)
[2024-06-05 18:18:23,920][10130] Avg episode reward: [(0, '0.010')]
[2024-06-05 18:18:24,617][10367] Updated weights for policy 0, policy_version 4860 (0.0032)
[2024-06-05 18:18:27,483][10367] Updated weights for policy 0, policy_version 4870 (0.0028)
[2024-06-05 18:18:28,920][10130] Fps is (10 sec: 49151.4, 60 sec: 48059.6, 300 sec: 48541.1). Total num frames: 79839232. Throughput: 0: 48529.2. Samples: 79884740. Policy #0 lag: (min: 0.0, avg: 9.7, max: 22.0)
[2024-06-05 18:18:28,921][10130] Avg episode reward: [(0, '0.012')]
[2024-06-05 18:18:31,509][10367] Updated weights for policy 0, policy_version 4880 (0.0025)
[2024-06-05 18:18:33,920][10130] Fps is (10 sec: 50790.2, 60 sec: 48608.8, 300 sec: 48541.1). Total num frames: 80101376. Throughput: 0: 48709.2. Samples: 80179880. Policy #0 lag: (min: 0.0, avg: 10.3, max: 21.0)
[2024-06-05 18:18:33,920][10130] Avg episode reward: [(0, '0.011')]
[2024-06-05 18:18:34,030][10347] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000004890_80117760.pth...
[2024-06-05 18:18:34,044][10367] Updated weights for policy 0, policy_version 4890 (0.0039)
[2024-06-05 18:18:34,072][10347] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000004178_68452352.pth
[2024-06-05 18:18:38,027][10367] Updated weights for policy 0, policy_version 4900 (0.0027)
[2024-06-05 18:18:38,920][10130] Fps is (10 sec: 47514.4, 60 sec: 48332.9, 300 sec: 48596.6). Total num frames: 80314368. Throughput: 0: 48740.5. Samples: 80474520. Policy #0 lag: (min: 0.0, avg: 10.5, max: 21.0)
[2024-06-05 18:18:38,920][10130] Avg episode reward: [(0, '0.015')]
[2024-06-05 18:18:38,934][10347] Saving new best policy, reward=0.015!
[2024-06-05 18:18:40,934][10367] Updated weights for policy 0, policy_version 4910 (0.0031)
[2024-06-05 18:18:43,920][10130] Fps is (10 sec: 45875.7, 60 sec: 48605.9, 300 sec: 48652.2). Total num frames: 80560128. Throughput: 0: 48533.4. Samples: 80610980. Policy #0 lag: (min: 0.0, avg: 11.4, max: 21.0)
[2024-06-05 18:18:43,920][10130] Avg episode reward: [(0, '0.016')]
[2024-06-05 18:18:44,637][10367] Updated weights for policy 0, policy_version 4920 (0.0026)
[2024-06-05 18:18:47,838][10367] Updated weights for policy 0, policy_version 4930 (0.0029)
[2024-06-05 18:18:48,920][10130] Fps is (10 sec: 50789.6, 60 sec: 48605.7, 300 sec: 48596.6). Total num frames: 80822272. Throughput: 0: 48488.1. Samples: 80903480. Policy #0 lag: (min: 0.0, avg: 9.4, max: 21.0)
[2024-06-05 18:18:48,921][10130] Avg episode reward: [(0, '0.013')]
[2024-06-05 18:18:51,624][10367] Updated weights for policy 0, policy_version 4940 (0.0032)
[2024-06-05 18:18:53,920][10130] Fps is (10 sec: 50789.1, 60 sec: 48878.8, 300 sec: 48541.1). Total num frames: 81068032. Throughput: 0: 48307.3. Samples: 81192380. Policy #0 lag: (min: 0.0, avg: 10.4, max: 22.0)
[2024-06-05 18:18:53,921][10130] Avg episode reward: [(0, '0.014')]
[2024-06-05 18:18:54,496][10367] Updated weights for policy 0, policy_version 4950 (0.0029)
[2024-06-05 18:18:58,555][10367] Updated weights for policy 0, policy_version 4960 (0.0036)
[2024-06-05 18:18:58,920][10130] Fps is (10 sec: 45875.4, 60 sec: 48059.7, 300 sec: 48541.1). Total num frames: 81281024. Throughput: 0: 48233.3. Samples: 81336020. Policy #0 lag: (min: 0.0, avg: 10.9, max: 21.0)
[2024-06-05 18:18:58,920][10130] Avg episode reward: [(0, '0.016')]
[2024-06-05 18:18:58,921][10347] Saving new best policy, reward=0.016!
[2024-06-05 18:19:01,180][10367] Updated weights for policy 0, policy_version 4970 (0.0038)
[2024-06-05 18:19:03,920][10130] Fps is (10 sec: 44238.0, 60 sec: 48333.0, 300 sec: 48541.1). Total num frames: 81510400. Throughput: 0: 48164.1. Samples: 81622520. Policy #0 lag: (min: 0.0, avg: 10.9, max: 21.0)
[2024-06-05 18:19:03,920][10130] Avg episode reward: [(0, '0.015')]
[2024-06-05 18:19:05,148][10367] Updated weights for policy 0, policy_version 4980 (0.0028)
[2024-06-05 18:19:08,145][10367] Updated weights for policy 0, policy_version 4990 (0.0022)
[2024-06-05 18:19:08,920][10130] Fps is (10 sec: 49151.6, 60 sec: 48332.7, 300 sec: 48485.5). Total num frames: 81772544. Throughput: 0: 48267.9. Samples: 81910800. Policy #0 lag: (min: 0.0, avg: 9.0, max: 20.0)
[2024-06-05 18:19:08,920][10130] Avg episode reward: [(0, '0.012')]
[2024-06-05 18:19:11,770][10367] Updated weights for policy 0, policy_version 5000 (0.0032)
[2024-06-05 18:19:13,920][10130] Fps is (10 sec: 50790.1, 60 sec: 48332.8, 300 sec: 48541.1). Total num frames: 82018304. Throughput: 0: 48343.7. Samples: 82060200. Policy #0 lag: (min: 0.0, avg: 10.1, max: 21.0)
[2024-06-05 18:19:13,920][10130] Avg episode reward: [(0, '0.014')]
[2024-06-05 18:19:14,902][10367] Updated weights for policy 0, policy_version 5010 (0.0031)
[2024-06-05 18:19:18,725][10367] Updated weights for policy 0, policy_version 5020 (0.0026)
[2024-06-05 18:19:18,920][10130] Fps is (10 sec: 47514.2, 60 sec: 48332.8, 300 sec: 48541.1). Total num frames: 82247680. Throughput: 0: 48265.4. Samples: 82351820. Policy #0 lag: (min: 0.0, avg: 11.1, max: 21.0)
[2024-06-05 18:19:18,920][10130] Avg episode reward: [(0, '0.016')]
[2024-06-05 18:19:21,763][10367] Updated weights for policy 0, policy_version 5030 (0.0025)
[2024-06-05 18:19:23,920][10130] Fps is (10 sec: 47513.2, 60 sec: 48332.7, 300 sec: 48541.1). Total num frames: 82493440. Throughput: 0: 48062.1. Samples: 82637320. Policy #0 lag: (min: 0.0, avg: 11.1, max: 21.0)
[2024-06-05 18:19:23,920][10130] Avg episode reward: [(0, '0.015')]
[2024-06-05 18:19:25,467][10367] Updated weights for policy 0, policy_version 5040 (0.0033)
[2024-06-05 18:19:28,369][10367] Updated weights for policy 0, policy_version 5050 (0.0030)
[2024-06-05 18:19:28,920][10130] Fps is (10 sec: 50790.1, 60 sec: 48605.9, 300 sec: 48486.1). Total num frames: 82755584. Throughput: 0: 48358.1. Samples: 82787100. Policy #0 lag: (min: 0.0, avg: 9.7, max: 21.0)
[2024-06-05 18:19:28,920][10130] Avg episode reward: [(0, '0.016')]
[2024-06-05 18:19:30,385][10347] Signal inference workers to stop experience collection... (1200 times)
[2024-06-05 18:19:30,386][10347] Signal inference workers to resume experience collection... (1200 times)
[2024-06-05 18:19:30,425][10367] InferenceWorker_p0-w0: stopping experience collection (1200 times)
[2024-06-05 18:19:30,425][10367] InferenceWorker_p0-w0: resuming experience collection (1200 times)
[2024-06-05 18:19:32,034][10367] Updated weights for policy 0, policy_version 5060 (0.0038)
[2024-06-05 18:19:33,920][10130] Fps is (10 sec: 47513.7, 60 sec: 47786.7, 300 sec: 48374.4). Total num frames: 82968576. Throughput: 0: 48206.3. Samples: 83072760. Policy #0 lag: (min: 1.0, avg: 9.4, max: 22.0)
[2024-06-05 18:19:33,920][10130] Avg episode reward: [(0, '0.012')]
[2024-06-05 18:19:35,099][10367] Updated weights for policy 0, policy_version 5070 (0.0028)
[2024-06-05 18:19:38,920][10130] Fps is (10 sec: 45875.5, 60 sec: 48332.8, 300 sec: 48485.5). Total num frames: 83214336. Throughput: 0: 48204.7. Samples: 83361580. Policy #0 lag: (min: 0.0, avg: 10.2, max: 21.0)
[2024-06-05 18:19:38,920][10130] Avg episode reward: [(0, '0.015')]
[2024-06-05 18:19:38,999][10367] Updated weights for policy 0, policy_version 5080 (0.0034)
[2024-06-05 18:19:42,079][10367] Updated weights for policy 0, policy_version 5090 (0.0035)
[2024-06-05 18:19:43,920][10130] Fps is (10 sec: 49151.7, 60 sec: 48332.7, 300 sec: 48485.5). Total num frames: 83460096. Throughput: 0: 48076.8. Samples: 83499480. Policy #0 lag: (min: 1.0, avg: 10.9, max: 22.0)
[2024-06-05 18:19:43,920][10130] Avg episode reward: [(0, '0.017')]
[2024-06-05 18:19:45,984][10367] Updated weights for policy 0, policy_version 5100 (0.0027)
[2024-06-05 18:19:48,748][10367] Updated weights for policy 0, policy_version 5110 (0.0036)
[2024-06-05 18:19:48,920][10130] Fps is (10 sec: 50790.6, 60 sec: 48332.9, 300 sec: 48430.0). Total num frames: 83722240. Throughput: 0: 48318.2. Samples: 83796840. Policy #0 lag: (min: 0.0, avg: 10.4, max: 21.0)
[2024-06-05 18:19:48,920][10130] Avg episode reward: [(0, '0.016')]
[2024-06-05 18:19:52,601][10367] Updated weights for policy 0, policy_version 5120 (0.0021)
[2024-06-05 18:19:53,920][10130] Fps is (10 sec: 47514.2, 60 sec: 47786.8, 300 sec: 48318.9). Total num frames: 83935232. Throughput: 0: 48329.5. Samples: 84085620. Policy #0 lag: (min: 0.0, avg: 9.8, max: 21.0)
[2024-06-05 18:19:53,920][10130] Avg episode reward: [(0, '0.017')]
[2024-06-05 18:19:54,034][10347] Saving new best policy, reward=0.017!
[2024-06-05 18:19:55,595][10367] Updated weights for policy 0, policy_version 5130 (0.0031)
[2024-06-05 18:19:58,920][10130] Fps is (10 sec: 45875.0, 60 sec: 48332.8, 300 sec: 48375.1). Total num frames: 84180992. Throughput: 0: 48228.0. Samples: 84230460. Policy #0 lag: (min: 0.0, avg: 10.0, max: 21.0)
[2024-06-05 18:19:58,920][10130] Avg episode reward: [(0, '0.013')]
[2024-06-05 18:19:59,189][10367] Updated weights for policy 0, policy_version 5140 (0.0022)
[2024-06-05 18:20:02,310][10367] Updated weights for policy 0, policy_version 5150 (0.0037)
[2024-06-05 18:20:03,920][10130] Fps is (10 sec: 47513.7, 60 sec: 48332.8, 300 sec: 48430.0). Total num frames: 84410368. Throughput: 0: 48004.9. Samples: 84512040. Policy #0 lag: (min: 1.0, avg: 11.8, max: 23.0)
[2024-06-05 18:20:03,920][10130] Avg episode reward: [(0, '0.017')]
[2024-06-05 18:20:06,099][10367] Updated weights for policy 0, policy_version 5160 (0.0034)
[2024-06-05 18:20:08,920][10130] Fps is (10 sec: 49152.2, 60 sec: 48333.0, 300 sec: 48374.5). Total num frames: 84672512. Throughput: 0: 48268.2. Samples: 84809380. Policy #0 lag: (min: 0.0, avg: 9.8, max: 21.0)
[2024-06-05 18:20:08,920][10130] Avg episode reward: [(0, '0.017')]
[2024-06-05 18:20:09,177][10367] Updated weights for policy 0, policy_version 5170 (0.0033)
[2024-06-05 18:20:13,161][10367] Updated weights for policy 0, policy_version 5180 (0.0036)
[2024-06-05 18:20:13,924][10130] Fps is (10 sec: 49133.7, 60 sec: 48056.8, 300 sec: 48318.3). Total num frames: 84901888. Throughput: 0: 47985.0. Samples: 84946600. Policy #0 lag: (min: 1.0, avg: 9.6, max: 21.0)
[2024-06-05 18:20:13,924][10130] Avg episode reward: [(0, '0.015')]
[2024-06-05 18:20:15,943][10367] Updated weights for policy 0, policy_version 5190 (0.0019)
[2024-06-05 18:20:18,923][10130] Fps is (10 sec: 45858.1, 60 sec: 48056.8, 300 sec: 48262.8). Total num frames: 85131264. Throughput: 0: 48273.9. Samples: 85245260. Policy #0 lag: (min: 0.0, avg: 11.1, max: 22.0)
[2024-06-05 18:20:18,924][10130] Avg episode reward: [(0, '0.022')]
[2024-06-05 18:20:18,968][10347] Saving new best policy, reward=0.022!
[2024-06-05 18:20:19,732][10367] Updated weights for policy 0, policy_version 5200 (0.0024)
[2024-06-05 18:20:22,947][10367] Updated weights for policy 0, policy_version 5210 (0.0019)
[2024-06-05 18:20:23,920][10130] Fps is (10 sec: 47530.9, 60 sec: 48059.8, 300 sec: 48318.9). Total num frames: 85377024. Throughput: 0: 48171.0. Samples: 85529280. Policy #0 lag: (min: 0.0, avg: 11.4, max: 23.0)
[2024-06-05 18:20:23,920][10130] Avg episode reward: [(0, '0.019')]
[2024-06-05 18:20:26,371][10367] Updated weights for policy 0, policy_version 5220 (0.0028)
[2024-06-05 18:20:28,920][10130] Fps is (10 sec: 49170.1, 60 sec: 47786.7, 300 sec: 48263.4). Total num frames: 85622784. Throughput: 0: 48381.5. Samples: 85676640. Policy #0 lag: (min: 0.0, avg: 9.4, max: 21.0)
[2024-06-05 18:20:28,920][10130] Avg episode reward: [(0, '0.021')]
[2024-06-05 18:20:29,641][10367] Updated weights for policy 0, policy_version 5230 (0.0034)
[2024-06-05 18:20:33,532][10367] Updated weights for policy 0, policy_version 5240 (0.0032)
[2024-06-05 18:20:33,920][10130] Fps is (10 sec: 49151.7, 60 sec: 48332.8, 300 sec: 48207.8). Total num frames: 85868544. Throughput: 0: 48135.4. Samples: 85962940. Policy #0 lag: (min: 1.0, avg: 10.2, max: 20.0)
[2024-06-05 18:20:33,920][10130] Avg episode reward: [(0, '0.016')]
[2024-06-05 18:20:34,043][10347] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000005242_85884928.pth...
[2024-06-05 18:20:34,091][10347] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000004535_74301440.pth
[2024-06-05 18:20:36,371][10367] Updated weights for policy 0, policy_version 5250 (0.0034)
[2024-06-05 18:20:38,920][10130] Fps is (10 sec: 47513.9, 60 sec: 48059.8, 300 sec: 48263.4). Total num frames: 86097920. Throughput: 0: 47961.4. Samples: 86243880. Policy #0 lag: (min: 1.0, avg: 10.2, max: 20.0)
[2024-06-05 18:20:38,920][10130] Avg episode reward: [(0, '0.016')]
[2024-06-05 18:20:40,605][10367] Updated weights for policy 0, policy_version 5260 (0.0027)
[2024-06-05 18:20:42,695][10347] Signal inference workers to stop experience collection... (1250 times)
[2024-06-05 18:20:42,728][10367] InferenceWorker_p0-w0: stopping experience collection (1250 times)
[2024-06-05 18:20:42,803][10347] Signal inference workers to resume experience collection... (1250 times)
[2024-06-05 18:20:42,803][10367] InferenceWorker_p0-w0: resuming experience collection (1250 times)
[2024-06-05 18:20:43,367][10367] Updated weights for policy 0, policy_version 5270 (0.0026)
[2024-06-05 18:20:43,920][10130] Fps is (10 sec: 49152.8, 60 sec: 48332.9, 300 sec: 48318.9). Total num frames: 86360064. Throughput: 0: 48002.7. Samples: 86390580. Policy #0 lag: (min: 0.0, avg: 10.1, max: 20.0)
[2024-06-05 18:20:43,920][10130] Avg episode reward: [(0, '0.015')]
[2024-06-05 18:20:47,182][10367] Updated weights for policy 0, policy_version 5280 (0.0031)
[2024-06-05 18:20:48,920][10130] Fps is (10 sec: 47513.4, 60 sec: 47513.6, 300 sec: 48207.8). Total num frames: 86573056. Throughput: 0: 48181.8. Samples: 86680220. Policy #0 lag: (min: 0.0, avg: 10.9, max: 21.0)
[2024-06-05 18:20:48,920][10130] Avg episode reward: [(0, '0.022')]
[2024-06-05 18:20:50,307][10367] Updated weights for policy 0, policy_version 5290 (0.0035)
[2024-06-05 18:20:53,920][10130] Fps is (10 sec: 45874.4, 60 sec: 48059.6, 300 sec: 48152.3). Total num frames: 86818816. Throughput: 0: 47948.7. Samples: 86967080. Policy #0 lag: (min: 1.0, avg: 12.0, max: 22.0)
[2024-06-05 18:20:53,920][10130] Avg episode reward: [(0, '0.017')]
[2024-06-05 18:20:54,058][10367] Updated weights for policy 0, policy_version 5300 (0.0033)
[2024-06-05 18:20:56,916][10367] Updated weights for policy 0, policy_version 5310 (0.0029)
[2024-06-05 18:20:58,920][10130] Fps is (10 sec: 50790.5, 60 sec: 48332.8, 300 sec: 48263.4). Total num frames: 87080960. Throughput: 0: 48188.4. Samples: 87114900. Policy #0 lag: (min: 0.0, avg: 10.2, max: 22.0)
[2024-06-05 18:20:58,920][10130] Avg episode reward: [(0, '0.020')]
[2024-06-05 18:21:00,859][10367] Updated weights for policy 0, policy_version 5320 (0.0026)
[2024-06-05 18:21:03,601][10367] Updated weights for policy 0, policy_version 5330 (0.0026)
[2024-06-05 18:21:03,920][10130] Fps is (10 sec: 50790.6, 60 sec: 48605.8, 300 sec: 48318.9). Total num frames: 87326720. Throughput: 0: 47968.7. Samples: 87403680. Policy #0 lag: (min: 0.0, avg: 9.7, max: 22.0)
[2024-06-05 18:21:03,921][10130] Avg episode reward: [(0, '0.019')]
[2024-06-05 18:21:07,699][10367] Updated weights for policy 0, policy_version 5340 (0.0035)
[2024-06-05 18:21:08,920][10130] Fps is (10 sec: 45874.7, 60 sec: 47786.6, 300 sec: 48207.8). Total num frames: 87539712. Throughput: 0: 48062.2. Samples: 87692080. Policy #0 lag: (min: 0.0, avg: 10.3, max: 22.0)
[2024-06-05 18:21:08,920][10130] Avg episode reward: [(0, '0.022')]
[2024-06-05 18:21:10,594][10367] Updated weights for policy 0, policy_version 5350 (0.0029)
[2024-06-05 18:21:13,920][10130] Fps is (10 sec: 44237.4, 60 sec: 47789.6, 300 sec: 48096.8). Total num frames: 87769088. Throughput: 0: 47904.9. Samples: 87832360. Policy #0 lag: (min: 0.0, avg: 10.3, max: 21.0)
[2024-06-05 18:21:13,920][10130] Avg episode reward: [(0, '0.019')]
[2024-06-05 18:21:14,431][10367] Updated weights for policy 0, policy_version 5360 (0.0033)
[2024-06-05 18:21:17,451][10367] Updated weights for policy 0, policy_version 5370 (0.0032)
[2024-06-05 18:21:18,920][10130] Fps is (10 sec: 49151.5, 60 sec: 48335.6, 300 sec: 48207.8). Total num frames: 88031232. Throughput: 0: 47913.7. Samples: 88119060. Policy #0 lag: (min: 0.0, avg: 10.9, max: 22.0)
[2024-06-05 18:21:18,921][10130] Avg episode reward: [(0, '0.021')]
[2024-06-05 18:21:21,134][10367] Updated weights for policy 0, policy_version 5380 (0.0026)
[2024-06-05 18:21:23,920][10130] Fps is (10 sec: 50789.7, 60 sec: 48332.8, 300 sec: 48263.4). Total num frames: 88276992. Throughput: 0: 48211.0. Samples: 88413380. Policy #0 lag: (min: 0.0, avg: 10.5, max: 21.0)
[2024-06-05 18:21:23,920][10130] Avg episode reward: [(0, '0.022')]
[2024-06-05 18:21:24,184][10367] Updated weights for policy 0, policy_version 5390 (0.0033)
[2024-06-05 18:21:28,114][10367] Updated weights for policy 0, policy_version 5400 (0.0022)
[2024-06-05 18:21:28,920][10130] Fps is (10 sec: 47514.0, 60 sec: 48059.7, 300 sec: 48152.3). Total num frames: 88506368. Throughput: 0: 47995.4. Samples: 88550380. Policy #0 lag: (min: 0.0, avg: 10.0, max: 22.0)
[2024-06-05 18:21:28,920][10130] Avg episode reward: [(0, '0.019')]
[2024-06-05 18:21:30,979][10367] Updated weights for policy 0, policy_version 5410 (0.0031)
[2024-06-05 18:21:33,920][10130] Fps is (10 sec: 45875.2, 60 sec: 47786.7, 300 sec: 48096.7). Total num frames: 88735744. Throughput: 0: 48043.4. Samples: 88842180. Policy #0 lag: (min: 0.0, avg: 11.2, max: 21.0)
[2024-06-05 18:21:33,922][10130] Avg episode reward: [(0, '0.016')]
[2024-06-05 18:21:34,906][10367] Updated weights for policy 0, policy_version 5420 (0.0031)
[2024-06-05 18:21:38,014][10367] Updated weights for policy 0, policy_version 5430 (0.0031)
[2024-06-05 18:21:38,920][10130] Fps is (10 sec: 47513.5, 60 sec: 48059.6, 300 sec: 48208.4). Total num frames: 88981504. Throughput: 0: 48032.9. Samples: 89128560. Policy #0 lag: (min: 0.0, avg: 11.2, max: 21.0)
[2024-06-05 18:21:38,921][10130] Avg episode reward: [(0, '0.018')]
[2024-06-05 18:21:41,515][10367] Updated weights for policy 0, policy_version 5440 (0.0034)
[2024-06-05 18:21:43,920][10130] Fps is (10 sec: 49152.4, 60 sec: 47786.6, 300 sec: 48207.8). Total num frames: 89227264. Throughput: 0: 48093.7. Samples: 89279120. Policy #0 lag: (min: 0.0, avg: 10.9, max: 21.0)
[2024-06-05 18:21:43,920][10130] Avg episode reward: [(0, '0.021')]
[2024-06-05 18:21:44,756][10367] Updated weights for policy 0, policy_version 5450 (0.0032)
[2024-06-05 18:21:48,447][10367] Updated weights for policy 0, policy_version 5460 (0.0029)
[2024-06-05 18:21:48,920][10130] Fps is (10 sec: 49151.5, 60 sec: 48332.6, 300 sec: 48152.3). Total num frames: 89473024. Throughput: 0: 48019.4. Samples: 89564560. Policy #0 lag: (min: 0.0, avg: 10.9, max: 23.0)
[2024-06-05 18:21:48,921][10130] Avg episode reward: [(0, '0.019')]
[2024-06-05 18:21:51,702][10367] Updated weights for policy 0, policy_version 5470 (0.0033)
[2024-06-05 18:21:53,920][10130] Fps is (10 sec: 47513.8, 60 sec: 48059.9, 300 sec: 48152.3). Total num frames: 89702400. Throughput: 0: 48049.0. Samples: 89854280. Policy #0 lag: (min: 0.0, avg: 10.0, max: 21.0)
[2024-06-05 18:21:53,920][10130] Avg episode reward: [(0, '0.018')]
[2024-06-05 18:21:55,230][10367] Updated weights for policy 0, policy_version 5480 (0.0032)
[2024-06-05 18:21:58,658][10367] Updated weights for policy 0, policy_version 5490 (0.0034)
[2024-06-05 18:21:58,921][10130] Fps is (10 sec: 49145.1, 60 sec: 48058.4, 300 sec: 48318.7). Total num frames: 89964544. Throughput: 0: 48190.7. Samples: 90001020. Policy #0 lag: (min: 0.0, avg: 10.2, max: 21.0)
[2024-06-05 18:21:58,922][10130] Avg episode reward: [(0, '0.015')]
[2024-06-05 18:22:01,947][10367] Updated weights for policy 0, policy_version 5500 (0.0029)
[2024-06-05 18:22:03,920][10130] Fps is (10 sec: 50790.2, 60 sec: 48059.8, 300 sec: 48207.8). Total num frames: 90210304. Throughput: 0: 48260.2. Samples: 90290760. Policy #0 lag: (min: 0.0, avg: 10.1, max: 21.0)
[2024-06-05 18:22:03,920][10130] Avg episode reward: [(0, '0.024')]
[2024-06-05 18:22:03,931][10347] Saving new best policy, reward=0.024!
[2024-06-05 18:22:05,433][10367] Updated weights for policy 0, policy_version 5510 (0.0036)
[2024-06-05 18:22:08,920][10130] Fps is (10 sec: 45882.8, 60 sec: 48059.8, 300 sec: 48207.8). Total num frames: 90423296. Throughput: 0: 48134.8. Samples: 90579440. Policy #0 lag: (min: 0.0, avg: 10.7, max: 21.0)
[2024-06-05 18:22:08,920][10130] Avg episode reward: [(0, '0.019')]
[2024-06-05 18:22:09,067][10367] Updated weights for policy 0, policy_version 5520 (0.0022)
[2024-06-05 18:22:12,236][10367] Updated weights for policy 0, policy_version 5530 (0.0029)
[2024-06-05 18:22:13,920][10130] Fps is (10 sec: 45875.2, 60 sec: 48332.8, 300 sec: 48152.3). Total num frames: 90669056. Throughput: 0: 48105.9. Samples: 90715140. Policy #0 lag: (min: 0.0, avg: 10.3, max: 20.0)
[2024-06-05 18:22:13,920][10130] Avg episode reward: [(0, '0.017')]
[2024-06-05 18:22:15,884][10367] Updated weights for policy 0, policy_version 5540 (0.0035)
[2024-06-05 18:22:18,920][10130] Fps is (10 sec: 49151.9, 60 sec: 48059.9, 300 sec: 48263.4). Total num frames: 90914816. Throughput: 0: 48012.1. Samples: 91002720. Policy #0 lag: (min: 0.0, avg: 11.1, max: 21.0)
[2024-06-05 18:22:18,920][10130] Avg episode reward: [(0, '0.019')]
[2024-06-05 18:22:19,029][10367] Updated weights for policy 0, policy_version 5550 (0.0023)
[2024-06-05 18:22:20,109][10347] Signal inference workers to stop experience collection... (1300 times)
[2024-06-05 18:22:20,110][10347] Signal inference workers to resume experience collection... (1300 times)
[2024-06-05 18:22:20,139][10367] InferenceWorker_p0-w0: stopping experience collection (1300 times)
[2024-06-05 18:22:20,139][10367] InferenceWorker_p0-w0: resuming experience collection (1300 times)
[2024-06-05 18:22:22,455][10367] Updated weights for policy 0, policy_version 5560 (0.0024)
[2024-06-05 18:22:23,920][10130] Fps is (10 sec: 49151.5, 60 sec: 48059.7, 300 sec: 48152.3). Total num frames: 91160576. Throughput: 0: 48218.7. Samples: 91298400. Policy #0 lag: (min: 0.0, avg: 10.5, max: 21.0)
[2024-06-05 18:22:23,920][10130] Avg episode reward: [(0, '0.021')]
[2024-06-05 18:22:25,920][10367] Updated weights for policy 0, policy_version 5570 (0.0030)
[2024-06-05 18:22:28,924][10130] Fps is (10 sec: 47495.4, 60 sec: 48056.8, 300 sec: 48152.3). Total num frames: 91389952. Throughput: 0: 47912.0. Samples: 91435340.
Policy #0 lag: (min: 0.0, avg: 10.5, max: 21.0) [2024-06-05 18:22:28,924][10130] Avg episode reward: [(0, '0.025')] [2024-06-05 18:22:29,326][10367] Updated weights for policy 0, policy_version 5580 (0.0039) [2024-06-05 18:22:32,771][10367] Updated weights for policy 0, policy_version 5590 (0.0034) [2024-06-05 18:22:33,920][10130] Fps is (10 sec: 47513.9, 60 sec: 48332.8, 300 sec: 48207.8). Total num frames: 91635712. Throughput: 0: 48150.4. Samples: 91731320. Policy #0 lag: (min: 0.0, avg: 10.1, max: 22.0) [2024-06-05 18:22:33,920][10130] Avg episode reward: [(0, '0.017')] [2024-06-05 18:22:33,930][10347] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000005593_91635712.pth... [2024-06-05 18:22:33,975][10347] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000004890_80117760.pth [2024-06-05 18:22:36,199][10367] Updated weights for policy 0, policy_version 5600 (0.0028) [2024-06-05 18:22:38,920][10130] Fps is (10 sec: 47531.3, 60 sec: 48059.8, 300 sec: 48207.8). Total num frames: 91865088. Throughput: 0: 47955.0. Samples: 92012260. Policy #0 lag: (min: 0.0, avg: 10.1, max: 21.0) [2024-06-05 18:22:38,920][10130] Avg episode reward: [(0, '0.023')] [2024-06-05 18:22:39,392][10367] Updated weights for policy 0, policy_version 5610 (0.0032) [2024-06-05 18:22:42,899][10367] Updated weights for policy 0, policy_version 5620 (0.0035) [2024-06-05 18:22:43,922][10130] Fps is (10 sec: 49140.9, 60 sec: 48330.9, 300 sec: 48207.5). Total num frames: 92127232. Throughput: 0: 47985.5. Samples: 92160400. Policy #0 lag: (min: 0.0, avg: 10.3, max: 21.0) [2024-06-05 18:22:43,923][10130] Avg episode reward: [(0, '0.022')] [2024-06-05 18:22:46,458][10367] Updated weights for policy 0, policy_version 5630 (0.0029) [2024-06-05 18:22:48,920][10130] Fps is (10 sec: 47513.3, 60 sec: 47786.7, 300 sec: 48152.3). Total num frames: 92340224. Throughput: 0: 47981.2. Samples: 92449920. 
Policy #0 lag: (min: 0.0, avg: 10.9, max: 21.0) [2024-06-05 18:22:48,921][10130] Avg episode reward: [(0, '0.020')] [2024-06-05 18:22:49,674][10367] Updated weights for policy 0, policy_version 5640 (0.0023) [2024-06-05 18:22:53,142][10367] Updated weights for policy 0, policy_version 5650 (0.0028) [2024-06-05 18:22:53,920][10130] Fps is (10 sec: 45886.1, 60 sec: 48059.8, 300 sec: 48096.8). Total num frames: 92585984. Throughput: 0: 47852.5. Samples: 92732800. Policy #0 lag: (min: 1.0, avg: 9.8, max: 21.0) [2024-06-05 18:22:53,920][10130] Avg episode reward: [(0, '0.023')] [2024-06-05 18:22:56,671][10367] Updated weights for policy 0, policy_version 5660 (0.0032) [2024-06-05 18:22:58,920][10130] Fps is (10 sec: 50790.4, 60 sec: 48060.9, 300 sec: 48263.4). Total num frames: 92848128. Throughput: 0: 48080.8. Samples: 92878780. Policy #0 lag: (min: 0.0, avg: 10.0, max: 22.0) [2024-06-05 18:22:58,921][10130] Avg episode reward: [(0, '0.020')] [2024-06-05 18:23:00,203][10367] Updated weights for policy 0, policy_version 5670 (0.0025) [2024-06-05 18:23:03,641][10367] Updated weights for policy 0, policy_version 5680 (0.0036) [2024-06-05 18:23:03,920][10130] Fps is (10 sec: 47513.3, 60 sec: 47513.6, 300 sec: 48096.8). Total num frames: 93061120. Throughput: 0: 48024.0. Samples: 93163800. Policy #0 lag: (min: 0.0, avg: 10.1, max: 21.0) [2024-06-05 18:23:03,920][10130] Avg episode reward: [(0, '0.026')] [2024-06-05 18:23:04,041][10347] Saving new best policy, reward=0.026! [2024-06-05 18:23:07,302][10367] Updated weights for policy 0, policy_version 5690 (0.0032) [2024-06-05 18:23:08,920][10130] Fps is (10 sec: 45875.5, 60 sec: 48059.7, 300 sec: 48096.8). Total num frames: 93306880. Throughput: 0: 47743.2. Samples: 93446840. Policy #0 lag: (min: 0.0, avg: 9.3, max: 20.0) [2024-06-05 18:23:08,921][10130] Avg episode reward: [(0, '0.029')] [2024-06-05 18:23:08,924][10347] Saving new best policy, reward=0.029! 
[2024-06-05 18:23:10,422][10367] Updated weights for policy 0, policy_version 5700 (0.0027)
[2024-06-05 18:23:13,920][10130] Fps is (10 sec: 47513.3, 60 sec: 47786.6, 300 sec: 48096.7). Total num frames: 93536256. Throughput: 0: 47816.9. Samples: 93586920. Policy #0 lag: (min: 0.0, avg: 11.2, max: 22.0)
[2024-06-05 18:23:13,920][10130] Avg episode reward: [(0, '0.022')]
[2024-06-05 18:23:14,006][10367] Updated weights for policy 0, policy_version 5710 (0.0032)
[2024-06-05 18:23:17,030][10367] Updated weights for policy 0, policy_version 5720 (0.0028)
[2024-06-05 18:23:18,920][10130] Fps is (10 sec: 49152.1, 60 sec: 48059.7, 300 sec: 48152.3). Total num frames: 93798400. Throughput: 0: 47626.3. Samples: 93874500. Policy #0 lag: (min: 0.0, avg: 11.2, max: 22.0)
[2024-06-05 18:23:18,920][10130] Avg episode reward: [(0, '0.024')]
[2024-06-05 18:23:21,223][10367] Updated weights for policy 0, policy_version 5730 (0.0029)
[2024-06-05 18:23:23,920][10130] Fps is (10 sec: 49151.8, 60 sec: 47786.7, 300 sec: 48096.8). Total num frames: 94027776. Throughput: 0: 47857.3. Samples: 94165840. Policy #0 lag: (min: 0.0, avg: 11.2, max: 22.0)
[2024-06-05 18:23:23,922][10130] Avg episode reward: [(0, '0.026')]
[2024-06-05 18:23:24,091][10367] Updated weights for policy 0, policy_version 5740 (0.0022)
[2024-06-05 18:23:27,878][10367] Updated weights for policy 0, policy_version 5750 (0.0035)
[2024-06-05 18:23:28,920][10130] Fps is (10 sec: 47513.3, 60 sec: 48062.7, 300 sec: 48041.2). Total num frames: 94273536. Throughput: 0: 47747.3. Samples: 94308920. Policy #0 lag: (min: 0.0, avg: 10.2, max: 20.0)
[2024-06-05 18:23:28,929][10130] Avg episode reward: [(0, '0.027')]
[2024-06-05 18:23:30,816][10367] Updated weights for policy 0, policy_version 5760 (0.0018)
[2024-06-05 18:23:33,920][10130] Fps is (10 sec: 47513.6, 60 sec: 47786.6, 300 sec: 48096.7). Total num frames: 94502912. Throughput: 0: 47730.7. Samples: 94597800. Policy #0 lag: (min: 0.0, avg: 10.8, max: 22.0)
[2024-06-05 18:23:33,929][10130] Avg episode reward: [(0, '0.028')]
[2024-06-05 18:23:34,866][10367] Updated weights for policy 0, policy_version 5770 (0.0021)
[2024-06-05 18:23:37,661][10367] Updated weights for policy 0, policy_version 5780 (0.0028)
[2024-06-05 18:23:38,920][10130] Fps is (10 sec: 47513.9, 60 sec: 48059.8, 300 sec: 48096.8). Total num frames: 94748672. Throughput: 0: 47898.1. Samples: 94888220. Policy #0 lag: (min: 0.0, avg: 11.0, max: 23.0)
[2024-06-05 18:23:38,920][10130] Avg episode reward: [(0, '0.023')]
[2024-06-05 18:23:40,492][10347] Signal inference workers to stop experience collection... (1350 times)
[2024-06-05 18:23:40,492][10347] Signal inference workers to resume experience collection... (1350 times)
[2024-06-05 18:23:40,531][10367] InferenceWorker_p0-w0: stopping experience collection (1350 times)
[2024-06-05 18:23:40,532][10367] InferenceWorker_p0-w0: resuming experience collection (1350 times)
[2024-06-05 18:23:41,577][10367] Updated weights for policy 0, policy_version 5790 (0.0028)
[2024-06-05 18:23:43,920][10130] Fps is (10 sec: 49152.2, 60 sec: 47788.5, 300 sec: 48041.2). Total num frames: 94994432. Throughput: 0: 47879.2. Samples: 95033340. Policy #0 lag: (min: 0.0, avg: 8.8, max: 21.0)
[2024-06-05 18:23:43,920][10130] Avg episode reward: [(0, '0.020')]
[2024-06-05 18:23:44,325][10367] Updated weights for policy 0, policy_version 5800 (0.0032)
[2024-06-05 18:23:48,378][10367] Updated weights for policy 0, policy_version 5810 (0.0023)
[2024-06-05 18:23:48,920][10130] Fps is (10 sec: 47513.5, 60 sec: 48059.8, 300 sec: 47985.7). Total num frames: 95223808. Throughput: 0: 47924.0. Samples: 95320380. Policy #0 lag: (min: 0.0, avg: 11.4, max: 21.0)
[2024-06-05 18:23:48,920][10130] Avg episode reward: [(0, '0.026')]
[2024-06-05 18:23:51,313][10367] Updated weights for policy 0, policy_version 5820 (0.0027)
[2024-06-05 18:23:53,920][10130] Fps is (10 sec: 47513.8, 60 sec: 48059.7, 300 sec: 48096.8). Total num frames: 95469568. Throughput: 0: 47903.6. Samples: 95602500. Policy #0 lag: (min: 1.0, avg: 12.0, max: 24.0)
[2024-06-05 18:23:53,920][10130] Avg episode reward: [(0, '0.026')]
[2024-06-05 18:23:55,279][10367] Updated weights for policy 0, policy_version 5830 (0.0021)
[2024-06-05 18:23:58,450][10367] Updated weights for policy 0, policy_version 5840 (0.0034)
[2024-06-05 18:23:58,920][10130] Fps is (10 sec: 49151.4, 60 sec: 47786.6, 300 sec: 48152.3). Total num frames: 95715328. Throughput: 0: 48131.0. Samples: 95752820. Policy #0 lag: (min: 0.0, avg: 10.2, max: 20.0)
[2024-06-05 18:23:58,920][10130] Avg episode reward: [(0, '0.024')]
[2024-06-05 18:24:02,190][10367] Updated weights for policy 0, policy_version 5850 (0.0026)
[2024-06-05 18:24:03,923][10130] Fps is (10 sec: 47496.2, 60 sec: 48056.8, 300 sec: 48040.6). Total num frames: 95944704. Throughput: 0: 48201.9. Samples: 96043760. Policy #0 lag: (min: 0.0, avg: 10.2, max: 20.0)
[2024-06-05 18:24:03,924][10130] Avg episode reward: [(0, '0.032')]
[2024-06-05 18:24:03,958][10347] Saving new best policy, reward=0.032!
[2024-06-05 18:24:05,077][10367] Updated weights for policy 0, policy_version 5860 (0.0025)
[2024-06-05 18:24:08,902][10367] Updated weights for policy 0, policy_version 5870 (0.0030)
[2024-06-05 18:24:08,920][10130] Fps is (10 sec: 45875.8, 60 sec: 47786.7, 300 sec: 47985.7). Total num frames: 96174080. Throughput: 0: 48183.2. Samples: 96334080. Policy #0 lag: (min: 0.0, avg: 10.8, max: 22.0)
[2024-06-05 18:24:08,920][10130] Avg episode reward: [(0, '0.024')]
[2024-06-05 18:24:12,064][10367] Updated weights for policy 0, policy_version 5880 (0.0037)
[2024-06-05 18:24:13,922][10130] Fps is (10 sec: 49158.4, 60 sec: 48330.9, 300 sec: 48096.4). Total num frames: 96436224. Throughput: 0: 47947.8. Samples: 96466680. Policy #0 lag: (min: 0.0, avg: 9.5, max: 23.0)
[2024-06-05 18:24:13,923][10130] Avg episode reward: [(0, '0.029')]
[2024-06-05 18:24:15,822][10367] Updated weights for policy 0, policy_version 5890 (0.0030)
[2024-06-05 18:24:18,873][10367] Updated weights for policy 0, policy_version 5900 (0.0035)
[2024-06-05 18:24:18,920][10130] Fps is (10 sec: 49152.3, 60 sec: 47786.7, 300 sec: 48041.2). Total num frames: 96665600. Throughput: 0: 47966.4. Samples: 96756280. Policy #0 lag: (min: 0.0, avg: 9.3, max: 21.0)
[2024-06-05 18:24:18,920][10130] Avg episode reward: [(0, '0.030')]
[2024-06-05 18:24:22,801][10367] Updated weights for policy 0, policy_version 5910 (0.0028)
[2024-06-05 18:24:23,921][10130] Fps is (10 sec: 47519.6, 60 sec: 48058.9, 300 sec: 47985.5). Total num frames: 96911360. Throughput: 0: 47936.6. Samples: 97045420. Policy #0 lag: (min: 0.0, avg: 9.3, max: 21.0)
[2024-06-05 18:24:23,922][10130] Avg episode reward: [(0, '0.024')]
[2024-06-05 18:24:25,533][10367] Updated weights for policy 0, policy_version 5920 (0.0031)
[2024-06-05 18:24:28,920][10130] Fps is (10 sec: 45874.6, 60 sec: 47513.6, 300 sec: 47985.7). Total num frames: 97124352. Throughput: 0: 47990.2. Samples: 97192900. Policy #0 lag: (min: 0.0, avg: 13.1, max: 23.0)
[2024-06-05 18:24:28,921][10130] Avg episode reward: [(0, '0.029')]
[2024-06-05 18:24:29,274][10367] Updated weights for policy 0, policy_version 5930 (0.0025)
[2024-06-05 18:24:32,348][10367] Updated weights for policy 0, policy_version 5940 (0.0032)
[2024-06-05 18:24:33,923][10130] Fps is (10 sec: 50780.6, 60 sec: 48603.5, 300 sec: 48151.8). Total num frames: 97419264. Throughput: 0: 47975.5. Samples: 97479420. Policy #0 lag: (min: 0.0, avg: 11.7, max: 23.0)
[2024-06-05 18:24:33,923][10130] Avg episode reward: [(0, '0.034')]
[2024-06-05 18:24:33,929][10347] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000005946_97419264.pth...
[2024-06-05 18:24:33,978][10347] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000005242_85884928.pth
[2024-06-05 18:24:33,982][10347] Saving new best policy, reward=0.034!
[2024-06-05 18:24:36,149][10367] Updated weights for policy 0, policy_version 5950 (0.0032)
[2024-06-05 18:24:38,920][10130] Fps is (10 sec: 49152.0, 60 sec: 47786.6, 300 sec: 47985.7). Total num frames: 97615872. Throughput: 0: 48110.6. Samples: 97767480. Policy #0 lag: (min: 0.0, avg: 11.7, max: 23.0)
[2024-06-05 18:24:38,920][10130] Avg episode reward: [(0, '0.033')]
[2024-06-05 18:24:39,093][10367] Updated weights for policy 0, policy_version 5960 (0.0027)
[2024-06-05 18:24:42,971][10367] Updated weights for policy 0, policy_version 5970 (0.0032)
[2024-06-05 18:24:43,920][10130] Fps is (10 sec: 44250.1, 60 sec: 47786.7, 300 sec: 47930.1). Total num frames: 97861632. Throughput: 0: 48020.1. Samples: 97913720. Policy #0 lag: (min: 0.0, avg: 9.8, max: 21.0)
[2024-06-05 18:24:43,920][10130] Avg episode reward: [(0, '0.035')]
[2024-06-05 18:24:43,937][10347] Saving new best policy, reward=0.035!
[2024-06-05 18:24:46,091][10367] Updated weights for policy 0, policy_version 5980 (0.0022)
[2024-06-05 18:24:46,707][10347] Signal inference workers to stop experience collection... (1400 times)
[2024-06-05 18:24:46,708][10347] Signal inference workers to resume experience collection... (1400 times)
[2024-06-05 18:24:46,751][10367] InferenceWorker_p0-w0: stopping experience collection (1400 times)
[2024-06-05 18:24:46,751][10367] InferenceWorker_p0-w0: resuming experience collection (1400 times)
[2024-06-05 18:24:48,920][10130] Fps is (10 sec: 47514.1, 60 sec: 47786.7, 300 sec: 47985.7). Total num frames: 98091008. Throughput: 0: 47804.4. Samples: 98194780. Policy #0 lag: (min: 0.0, avg: 10.3, max: 20.0)
[2024-06-05 18:24:48,920][10130] Avg episode reward: [(0, '0.033')]
[2024-06-05 18:24:49,943][10367] Updated weights for policy 0, policy_version 5990 (0.0028)
[2024-06-05 18:24:52,756][10367] Updated weights for policy 0, policy_version 6000 (0.0034)
[2024-06-05 18:24:53,920][10130] Fps is (10 sec: 49152.1, 60 sec: 48059.7, 300 sec: 48041.2). Total num frames: 98353152. Throughput: 0: 47923.1. Samples: 98490620. Policy #0 lag: (min: 0.0, avg: 9.0, max: 21.0)
[2024-06-05 18:24:53,920][10130] Avg episode reward: [(0, '0.029')]
[2024-06-05 18:24:56,632][10367] Updated weights for policy 0, policy_version 6010 (0.0038)
[2024-06-05 18:24:58,920][10130] Fps is (10 sec: 49151.7, 60 sec: 47786.8, 300 sec: 48041.2). Total num frames: 98582528. Throughput: 0: 48257.2. Samples: 98638140. Policy #0 lag: (min: 0.0, avg: 9.9, max: 21.0)
[2024-06-05 18:24:58,920][10130] Avg episode reward: [(0, '0.032')]
[2024-06-05 18:24:59,742][10367] Updated weights for policy 0, policy_version 6020 (0.0030)
[2024-06-05 18:25:03,363][10367] Updated weights for policy 0, policy_version 6030 (0.0033)
[2024-06-05 18:25:03,920][10130] Fps is (10 sec: 45874.9, 60 sec: 47789.5, 300 sec: 47930.1). Total num frames: 98811904. Throughput: 0: 48108.8. Samples: 98921180. Policy #0 lag: (min: 0.0, avg: 11.0, max: 23.0)
[2024-06-05 18:25:03,920][10130] Avg episode reward: [(0, '0.030')]
[2024-06-05 18:25:06,411][10367] Updated weights for policy 0, policy_version 6040 (0.0047)
[2024-06-05 18:25:08,920][10130] Fps is (10 sec: 49151.9, 60 sec: 48332.8, 300 sec: 48041.8). Total num frames: 99074048. Throughput: 0: 47986.9. Samples: 99204780. Policy #0 lag: (min: 0.0, avg: 10.8, max: 23.0)
[2024-06-05 18:25:08,920][10130] Avg episode reward: [(0, '0.026')]
[2024-06-05 18:25:10,394][10367] Updated weights for policy 0, policy_version 6050 (0.0026)
[2024-06-05 18:25:13,542][10367] Updated weights for policy 0, policy_version 6060 (0.0020)
[2024-06-05 18:25:13,920][10130] Fps is (10 sec: 49152.3, 60 sec: 47788.6, 300 sec: 48041.8). Total num frames: 99303424. Throughput: 0: 47821.0. Samples: 99344840. Policy #0 lag: (min: 0.0, avg: 10.3, max: 21.0)
[2024-06-05 18:25:13,920][10130] Avg episode reward: [(0, '0.034')]
[2024-06-05 18:25:17,323][10367] Updated weights for policy 0, policy_version 6070 (0.0024)
[2024-06-05 18:25:18,920][10130] Fps is (10 sec: 47514.0, 60 sec: 48059.7, 300 sec: 48041.2). Total num frames: 99549184. Throughput: 0: 47937.0. Samples: 99636440. Policy #0 lag: (min: 0.0, avg: 10.3, max: 21.0)
[2024-06-05 18:25:18,920][10130] Avg episode reward: [(0, '0.033')]
[2024-06-05 18:25:20,347][10367] Updated weights for policy 0, policy_version 6080 (0.0025)
[2024-06-05 18:25:23,920][10130] Fps is (10 sec: 45874.9, 60 sec: 47514.4, 300 sec: 47930.1). Total num frames: 99762176. Throughput: 0: 47834.2. Samples: 99920020. Policy #0 lag: (min: 0.0, avg: 11.2, max: 21.0)
[2024-06-05 18:25:23,920][10130] Avg episode reward: [(0, '0.031')]
[2024-06-05 18:25:24,074][10367] Updated weights for policy 0, policy_version 6090 (0.0031)
[2024-06-05 18:25:27,183][10367] Updated weights for policy 0, policy_version 6100 (0.0027)
[2024-06-05 18:25:28,920][10130] Fps is (10 sec: 49151.9, 60 sec: 48605.9, 300 sec: 48041.2). Total num frames: 100040704. Throughput: 0: 47741.4. Samples: 100062080. Policy #0 lag: (min: 0.0, avg: 9.5, max: 22.0)
[2024-06-05 18:25:28,921][10130] Avg episode reward: [(0, '0.031')]
[2024-06-05 18:25:30,901][10367] Updated weights for policy 0, policy_version 6110 (0.0029)
[2024-06-05 18:25:33,718][10367] Updated weights for policy 0, policy_version 6120 (0.0033)
[2024-06-05 18:25:33,920][10130] Fps is (10 sec: 50790.3, 60 sec: 47515.9, 300 sec: 48041.2). Total num frames: 100270080. Throughput: 0: 48056.3. Samples: 100357320. Policy #0 lag: (min: 0.0, avg: 8.8, max: 22.0)
[2024-06-05 18:25:33,920][10130] Avg episode reward: [(0, '0.027')]
[2024-06-05 18:25:37,662][10367] Updated weights for policy 0, policy_version 6130 (0.0028)
[2024-06-05 18:25:38,920][10130] Fps is (10 sec: 47513.6, 60 sec: 48332.9, 300 sec: 47985.7). Total num frames: 100515840. Throughput: 0: 48094.2. Samples: 100654860. Policy #0 lag: (min: 0.0, avg: 9.9, max: 21.0)
[2024-06-05 18:25:38,921][10130] Avg episode reward: [(0, '0.031')]
[2024-06-05 18:25:40,309][10367] Updated weights for policy 0, policy_version 6140 (0.0029)
[2024-06-05 18:25:43,920][10130] Fps is (10 sec: 45875.2, 60 sec: 47786.6, 300 sec: 47985.7). Total num frames: 100728832. Throughput: 0: 47969.3. Samples: 100796760. Policy #0 lag: (min: 0.0, avg: 10.3, max: 26.0)
[2024-06-05 18:25:43,921][10130] Avg episode reward: [(0, '0.032')]
[2024-06-05 18:25:44,287][10367] Updated weights for policy 0, policy_version 6150 (0.0027)
[2024-06-05 18:25:45,375][10347] Signal inference workers to stop experience collection... (1450 times)
[2024-06-05 18:25:45,419][10367] InferenceWorker_p0-w0: stopping experience collection (1450 times)
[2024-06-05 18:25:45,425][10347] Signal inference workers to resume experience collection... (1450 times)
[2024-06-05 18:25:45,440][10367] InferenceWorker_p0-w0: resuming experience collection (1450 times)
[2024-06-05 18:25:47,308][10367] Updated weights for policy 0, policy_version 6160 (0.0033)
[2024-06-05 18:25:48,920][10130] Fps is (10 sec: 47513.6, 60 sec: 48332.8, 300 sec: 48041.2). Total num frames: 100990976. Throughput: 0: 47991.2. Samples: 101080780. Policy #0 lag: (min: 0.0, avg: 10.3, max: 26.0)
[2024-06-05 18:25:48,920][10130] Avg episode reward: [(0, '0.036')]
[2024-06-05 18:25:48,921][10347] Saving new best policy, reward=0.036!
[2024-06-05 18:25:51,113][10367] Updated weights for policy 0, policy_version 6170 (0.0030)
[2024-06-05 18:25:53,920][10130] Fps is (10 sec: 49151.6, 60 sec: 47786.5, 300 sec: 47930.1). Total num frames: 101220352. Throughput: 0: 47984.3. Samples: 101364080. Policy #0 lag: (min: 0.0, avg: 10.3, max: 22.0)
[2024-06-05 18:25:53,921][10130] Avg episode reward: [(0, '0.037')]
[2024-06-05 18:25:54,317][10367] Updated weights for policy 0, policy_version 6180 (0.0037)
[2024-06-05 18:25:57,899][10367] Updated weights for policy 0, policy_version 6190 (0.0030)
[2024-06-05 18:25:58,920][10130] Fps is (10 sec: 47513.3, 60 sec: 48059.7, 300 sec: 47930.1). Total num frames: 101466112. Throughput: 0: 48090.6. Samples: 101508920. Policy #0 lag: (min: 1.0, avg: 11.6, max: 23.0)
[2024-06-05 18:25:58,920][10130] Avg episode reward: [(0, '0.036')]
[2024-06-05 18:26:00,956][10367] Updated weights for policy 0, policy_version 6200 (0.0026)
[2024-06-05 18:26:03,920][10130] Fps is (10 sec: 47514.4, 60 sec: 48059.8, 300 sec: 47985.7). Total num frames: 101695488. Throughput: 0: 48140.9. Samples: 101802780. Policy #0 lag: (min: 0.0, avg: 10.1, max: 20.0)
[2024-06-05 18:26:03,920][10130] Avg episode reward: [(0, '0.037')]
[2024-06-05 18:26:03,930][10347] Saving new best policy, reward=0.037!
[2024-06-05 18:26:04,665][10367] Updated weights for policy 0, policy_version 6210 (0.0030) [2024-06-05 18:26:07,868][10367] Updated weights for policy 0, policy_version 6220 (0.0032) [2024-06-05 18:26:08,920][10130] Fps is (10 sec: 47514.1, 60 sec: 47786.8, 300 sec: 48041.2). Total num frames: 101941248. Throughput: 0: 48343.2. Samples: 102095460. Policy #0 lag: (min: 0.0, avg: 11.1, max: 21.0) [2024-06-05 18:26:08,920][10130] Avg episode reward: [(0, '0.030')] [2024-06-05 18:26:11,486][10367] Updated weights for policy 0, policy_version 6230 (0.0031) [2024-06-05 18:26:13,923][10130] Fps is (10 sec: 50771.7, 60 sec: 48329.8, 300 sec: 48040.6). Total num frames: 102203392. Throughput: 0: 48304.1. Samples: 102235940. Policy #0 lag: (min: 0.0, avg: 10.5, max: 20.0) [2024-06-05 18:26:13,924][10130] Avg episode reward: [(0, '0.037')] [2024-06-05 18:26:14,930][10367] Updated weights for policy 0, policy_version 6240 (0.0022) [2024-06-05 18:26:18,200][10367] Updated weights for policy 0, policy_version 6250 (0.0027) [2024-06-05 18:26:18,920][10130] Fps is (10 sec: 47513.5, 60 sec: 47786.7, 300 sec: 47930.2). Total num frames: 102416384. Throughput: 0: 48069.5. Samples: 102520440. Policy #0 lag: (min: 0.0, avg: 10.5, max: 20.0) [2024-06-05 18:26:18,920][10130] Avg episode reward: [(0, '0.040')] [2024-06-05 18:26:18,921][10347] Saving new best policy, reward=0.040! [2024-06-05 18:26:21,618][10367] Updated weights for policy 0, policy_version 6260 (0.0031) [2024-06-05 18:26:23,920][10130] Fps is (10 sec: 45892.0, 60 sec: 48332.8, 300 sec: 47985.7). Total num frames: 102662144. Throughput: 0: 47905.7. Samples: 102810620. 
Policy #0 lag: (min: 0.0, avg: 10.0, max: 21.0) [2024-06-05 18:26:23,920][10130] Avg episode reward: [(0, '0.036')] [2024-06-05 18:26:24,831][10367] Updated weights for policy 0, policy_version 6270 (0.0037) [2024-06-05 18:26:28,225][10367] Updated weights for policy 0, policy_version 6280 (0.0029) [2024-06-05 18:26:28,920][10130] Fps is (10 sec: 49151.4, 60 sec: 47786.6, 300 sec: 48041.2). Total num frames: 102907904. Throughput: 0: 47981.3. Samples: 102955920. Policy #0 lag: (min: 0.0, avg: 9.9, max: 22.0) [2024-06-05 18:26:28,920][10130] Avg episode reward: [(0, '0.037')] [2024-06-05 18:26:31,938][10367] Updated weights for policy 0, policy_version 6290 (0.0027) [2024-06-05 18:26:33,920][10130] Fps is (10 sec: 49152.0, 60 sec: 48059.8, 300 sec: 48041.2). Total num frames: 103153664. Throughput: 0: 48217.7. Samples: 103250580. Policy #0 lag: (min: 0.0, avg: 9.8, max: 21.0) [2024-06-05 18:26:33,920][10130] Avg episode reward: [(0, '0.036')] [2024-06-05 18:26:34,005][10347] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000006297_103170048.pth... [2024-06-05 18:26:34,052][10347] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000005593_91635712.pth [2024-06-05 18:26:35,272][10367] Updated weights for policy 0, policy_version 6300 (0.0035) [2024-06-05 18:26:38,873][10367] Updated weights for policy 0, policy_version 6310 (0.0029) [2024-06-05 18:26:38,920][10130] Fps is (10 sec: 47513.8, 60 sec: 47786.6, 300 sec: 47985.7). Total num frames: 103383040. Throughput: 0: 48182.4. Samples: 103532280. Policy #0 lag: (min: 0.0, avg: 11.2, max: 22.0) [2024-06-05 18:26:38,920][10130] Avg episode reward: [(0, '0.041')] [2024-06-05 18:26:42,166][10367] Updated weights for policy 0, policy_version 6320 (0.0031) [2024-06-05 18:26:42,804][10347] Signal inference workers to stop experience collection... (1500 times) [2024-06-05 18:26:42,805][10347] Signal inference workers to resume experience collection... 
(1500 times) [2024-06-05 18:26:42,844][10367] InferenceWorker_p0-w0: stopping experience collection (1500 times) [2024-06-05 18:26:42,844][10367] InferenceWorker_p0-w0: resuming experience collection (1500 times) [2024-06-05 18:26:43,920][10130] Fps is (10 sec: 49151.3, 60 sec: 48605.8, 300 sec: 48041.2). Total num frames: 103645184. Throughput: 0: 48091.0. Samples: 103673020. Policy #0 lag: (min: 0.0, avg: 11.5, max: 22.0) [2024-06-05 18:26:43,920][10130] Avg episode reward: [(0, '0.028')] [2024-06-05 18:26:45,687][10367] Updated weights for policy 0, policy_version 6330 (0.0024) [2024-06-05 18:26:48,923][10130] Fps is (10 sec: 47500.4, 60 sec: 47784.4, 300 sec: 47985.2). Total num frames: 103858176. Throughput: 0: 47868.1. Samples: 103956980. Policy #0 lag: (min: 0.0, avg: 11.5, max: 22.0) [2024-06-05 18:26:48,923][10130] Avg episode reward: [(0, '0.038')] [2024-06-05 18:26:48,979][10367] Updated weights for policy 0, policy_version 6340 (0.0034) [2024-06-05 18:26:52,483][10367] Updated weights for policy 0, policy_version 6350 (0.0032) [2024-06-05 18:26:53,920][10130] Fps is (10 sec: 47514.3, 60 sec: 48332.9, 300 sec: 47985.9). Total num frames: 104120320. Throughput: 0: 47956.8. Samples: 104253520. Policy #0 lag: (min: 0.0, avg: 10.8, max: 23.0) [2024-06-05 18:26:53,920][10130] Avg episode reward: [(0, '0.037')] [2024-06-05 18:26:55,693][10367] Updated weights for policy 0, policy_version 6360 (0.0029) [2024-06-05 18:26:58,920][10130] Fps is (10 sec: 45887.8, 60 sec: 47513.6, 300 sec: 47819.1). Total num frames: 104316928. Throughput: 0: 47907.4. Samples: 104391600. Policy #0 lag: (min: 0.0, avg: 10.3, max: 21.0) [2024-06-05 18:26:58,920][10130] Avg episode reward: [(0, '0.038')] [2024-06-05 18:26:59,446][10367] Updated weights for policy 0, policy_version 6370 (0.0021) [2024-06-05 18:27:02,516][10367] Updated weights for policy 0, policy_version 6380 (0.0031) [2024-06-05 18:27:03,920][10130] Fps is (10 sec: 49152.0, 60 sec: 48605.9, 300 sec: 48096.8). 
Total num frames: 104611840. Throughput: 0: 48042.6. Samples: 104682360. Policy #0 lag: (min: 0.0, avg: 11.3, max: 21.0)
[2024-06-05 18:27:03,926][10130] Avg episode reward: [(0, '0.036')]
[2024-06-05 18:27:06,181][10367] Updated weights for policy 0, policy_version 6390 (0.0025)
[2024-06-05 18:27:08,920][10130] Fps is (10 sec: 50790.1, 60 sec: 48059.6, 300 sec: 47985.7). Total num frames: 104824832. Throughput: 0: 47833.2. Samples: 104963120. Policy #0 lag: (min: 0.0, avg: 10.6, max: 22.0)
[2024-06-05 18:27:08,921][10130] Avg episode reward: [(0, '0.039')]
[2024-06-05 18:27:09,380][10367] Updated weights for policy 0, policy_version 6400 (0.0031)
[2024-06-05 18:27:12,919][10367] Updated weights for policy 0, policy_version 6410 (0.0028)
[2024-06-05 18:27:13,920][10130] Fps is (10 sec: 44237.0, 60 sec: 47516.5, 300 sec: 47930.1). Total num frames: 105054208. Throughput: 0: 47918.4. Samples: 105112240. Policy #0 lag: (min: 0.0, avg: 9.9, max: 22.0)
[2024-06-05 18:27:13,920][10130] Avg episode reward: [(0, '0.040')]
[2024-06-05 18:27:16,153][10367] Updated weights for policy 0, policy_version 6420 (0.0031)
[2024-06-05 18:27:18,920][10130] Fps is (10 sec: 45875.8, 60 sec: 47786.6, 300 sec: 47874.6). Total num frames: 105283584. Throughput: 0: 47747.1. Samples: 105399200. Policy #0 lag: (min: 0.0, avg: 9.9, max: 22.0)
[2024-06-05 18:27:18,920][10130] Avg episode reward: [(0, '0.044')]
[2024-06-05 18:27:18,997][10347] Saving new best policy, reward=0.044!
[2024-06-05 18:27:19,818][10367] Updated weights for policy 0, policy_version 6430 (0.0033)
[2024-06-05 18:27:22,920][10367] Updated weights for policy 0, policy_version 6440 (0.0023)
[2024-06-05 18:27:23,920][10130] Fps is (10 sec: 50789.9, 60 sec: 48332.8, 300 sec: 48041.8). Total num frames: 105562112. Throughput: 0: 47891.1. Samples: 105687380. Policy #0 lag: (min: 0.0, avg: 8.8, max: 21.0)
[2024-06-05 18:27:23,920][10130] Avg episode reward: [(0, '0.039')]
[2024-06-05 18:27:26,668][10367] Updated weights for policy 0, policy_version 6450 (0.0034)
[2024-06-05 18:27:28,920][10130] Fps is (10 sec: 50790.1, 60 sec: 48059.8, 300 sec: 47985.7). Total num frames: 105791488. Throughput: 0: 48201.0. Samples: 105842060. Policy #0 lag: (min: 0.0, avg: 10.7, max: 21.0)
[2024-06-05 18:27:28,920][10130] Avg episode reward: [(0, '0.039')]
[2024-06-05 18:27:29,772][10367] Updated weights for policy 0, policy_version 6460 (0.0035)
[2024-06-05 18:27:33,346][10367] Updated weights for policy 0, policy_version 6470 (0.0035)
[2024-06-05 18:27:33,920][10130] Fps is (10 sec: 45875.4, 60 sec: 47786.7, 300 sec: 47985.7). Total num frames: 106020864. Throughput: 0: 48353.3. Samples: 106132740. Policy #0 lag: (min: 0.0, avg: 10.5, max: 20.0)
[2024-06-05 18:27:33,920][10130] Avg episode reward: [(0, '0.037')]
[2024-06-05 18:27:36,340][10367] Updated weights for policy 0, policy_version 6480 (0.0040)
[2024-06-05 18:27:38,920][10130] Fps is (10 sec: 47513.2, 60 sec: 48059.7, 300 sec: 47930.5). Total num frames: 106266624. Throughput: 0: 48107.0. Samples: 106418340. Policy #0 lag: (min: 0.0, avg: 11.5, max: 21.0)
[2024-06-05 18:27:38,920][10130] Avg episode reward: [(0, '0.032')]
[2024-06-05 18:27:40,011][10367] Updated weights for policy 0, policy_version 6490 (0.0034)
[2024-06-05 18:27:43,099][10367] Updated weights for policy 0, policy_version 6500 (0.0030)
[2024-06-05 18:27:43,920][10130] Fps is (10 sec: 49152.1, 60 sec: 47786.8, 300 sec: 48041.2). Total num frames: 106512384. Throughput: 0: 48242.3. Samples: 106562500. Policy #0 lag: (min: 0.0, avg: 11.5, max: 21.0)
[2024-06-05 18:27:43,920][10130] Avg episode reward: [(0, '0.040')]
[2024-06-05 18:27:46,954][10367] Updated weights for policy 0, policy_version 6510 (0.0030)
[2024-06-05 18:27:48,920][10130] Fps is (10 sec: 49153.0, 60 sec: 48335.1, 300 sec: 48041.2). Total num frames: 106758144. Throughput: 0: 48267.2. Samples: 106854380. Policy #0 lag: (min: 0.0, avg: 10.6, max: 21.0)
[2024-06-05 18:27:48,920][10130] Avg episode reward: [(0, '0.042')]
[2024-06-05 18:27:50,016][10367] Updated weights for policy 0, policy_version 6520 (0.0027)
[2024-06-05 18:27:53,920][10130] Fps is (10 sec: 45875.3, 60 sec: 47513.6, 300 sec: 47874.6). Total num frames: 106971136. Throughput: 0: 48253.9. Samples: 107134540. Policy #0 lag: (min: 0.0, avg: 10.8, max: 21.0)
[2024-06-05 18:27:53,920][10130] Avg episode reward: [(0, '0.033')]
[2024-06-05 18:27:53,965][10367] Updated weights for policy 0, policy_version 6530 (0.0027)
[2024-06-05 18:27:57,085][10367] Updated weights for policy 0, policy_version 6540 (0.0034)
[2024-06-05 18:27:58,920][10130] Fps is (10 sec: 47513.6, 60 sec: 48606.0, 300 sec: 48041.2). Total num frames: 107233280. Throughput: 0: 48017.8. Samples: 107273040. Policy #0 lag: (min: 0.0, avg: 11.0, max: 21.0)
[2024-06-05 18:27:58,920][10130] Avg episode reward: [(0, '0.040')]
[2024-06-05 18:28:00,618][10347] Signal inference workers to stop experience collection... (1550 times)
[2024-06-05 18:28:00,619][10347] Signal inference workers to resume experience collection... (1550 times)
[2024-06-05 18:28:00,643][10367] InferenceWorker_p0-w0: stopping experience collection (1550 times)
[2024-06-05 18:28:00,644][10367] InferenceWorker_p0-w0: resuming experience collection (1550 times)
[2024-06-05 18:28:00,754][10367] Updated weights for policy 0, policy_version 6550 (0.0031)
[2024-06-05 18:28:03,920][10130] Fps is (10 sec: 49151.6, 60 sec: 47513.6, 300 sec: 47985.7). Total num frames: 107462656. Throughput: 0: 48060.8. Samples: 107561940. Policy #0 lag: (min: 0.0, avg: 10.4, max: 22.0)
[2024-06-05 18:28:03,920][10130] Avg episode reward: [(0, '0.043')]
[2024-06-05 18:28:03,996][10367] Updated weights for policy 0, policy_version 6560 (0.0032)
[2024-06-05 18:28:07,477][10367] Updated weights for policy 0, policy_version 6570 (0.0029)
[2024-06-05 18:28:08,920][10130] Fps is (10 sec: 49151.6, 60 sec: 48332.9, 300 sec: 48096.8). Total num frames: 107724800. Throughput: 0: 48284.0. Samples: 107860160. Policy #0 lag: (min: 0.0, avg: 10.5, max: 23.0)
[2024-06-05 18:28:08,920][10130] Avg episode reward: [(0, '0.042')]
[2024-06-05 18:28:10,585][10367] Updated weights for policy 0, policy_version 6580 (0.0029)
[2024-06-05 18:28:13,920][10130] Fps is (10 sec: 49152.1, 60 sec: 48332.7, 300 sec: 47985.7). Total num frames: 107954176. Throughput: 0: 47972.9. Samples: 108000840. Policy #0 lag: (min: 0.0, avg: 10.5, max: 23.0)
[2024-06-05 18:28:13,920][10130] Avg episode reward: [(0, '0.039')]
[2024-06-05 18:28:14,111][10367] Updated weights for policy 0, policy_version 6590 (0.0027)
[2024-06-05 18:28:17,480][10367] Updated weights for policy 0, policy_version 6600 (0.0038)
[2024-06-05 18:28:18,920][10130] Fps is (10 sec: 49151.8, 60 sec: 48878.9, 300 sec: 48096.8). Total num frames: 108216320. Throughput: 0: 47985.7. Samples: 108292100. Policy #0 lag: (min: 0.0, avg: 9.2, max: 22.0)
[2024-06-05 18:28:18,920][10130] Avg episode reward: [(0, '0.041')]
[2024-06-05 18:28:21,055][10367] Updated weights for policy 0, policy_version 6610 (0.0031)
[2024-06-06 11:59:28,956][02692] Saving configuration to /workspace/metta/train_dir/p2.metta.4/config.json...
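The `Fps is (10 sec: …, 60 sec: …, 300 sec: …)` entries above report throughput averaged over three trailing time windows, derived from periodic (timestamp, total-frames) samples. A minimal sketch of that kind of windowed-FPS tracker, assuming a simple deque of samples (class and method names are hypothetical, not Sample Factory's actual implementation):

```python
from collections import deque
import time


class WindowedFps:
    """Average FPS over several trailing time windows (e.g. 10/60/300 s)."""

    def __init__(self, windows=(10, 60, 300)):
        self.windows = windows
        self.samples = deque()  # (timestamp, total_frames) pairs

    def record(self, total_frames, now=None):
        now = time.monotonic() if now is None else now
        self.samples.append((now, total_frames))
        # Keep only samples young enough to matter for the largest window.
        while now - self.samples[0][0] > max(self.windows):
            self.samples.popleft()

    def fps(self, window):
        now, frames_now = self.samples[-1]
        # Oldest sample still inside the window is the baseline.
        old = next(((t, f) for t, f in self.samples if now - t <= window), None)
        if old is None or old[0] == now:
            return float("nan")  # matches the log's early "Fps is (10 sec: nan, ...)"
        return (frames_now - old[1]) / (now - old[0])
```

With samples every 5 seconds, `fps(10)` reflects only the last two intervals while `fps(300)` smooths over the whole recent history, which is why the three numbers in each log entry differ.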
[2024-06-06 11:59:28,998][02692] Rollout worker 0 uses device cpu
[2024-06-06 11:59:28,999][02692] Rollout worker 1 uses device cpu
[2024-06-06 11:59:28,999][02692] Rollout worker 2 uses device cpu
[2024-06-06 11:59:29,000][02692] Rollout worker 3 uses device cpu
[2024-06-06 11:59:29,001][02692] Rollout worker 4 uses device cpu
[2024-06-06 11:59:29,001][02692] Rollout worker 5 uses device cpu
[2024-06-06 11:59:29,001][02692] Rollout worker 6 uses device cpu
[2024-06-06 11:59:29,002][02692] Rollout worker 7 uses device cpu
[2024-06-06 11:59:29,002][02692] Rollout worker 8 uses device cpu
[2024-06-06 11:59:29,003][02692] Rollout worker 9 uses device cpu
[2024-06-06 11:59:29,003][02692] Rollout worker 10 uses device cpu
[2024-06-06 11:59:29,004][02692] Rollout worker 11 uses device cpu
[2024-06-06 11:59:29,005][02692] Rollout worker 12 uses device cpu
[2024-06-06 11:59:29,005][02692] Rollout worker 13 uses device cpu
[2024-06-06 11:59:29,005][02692] Rollout worker 14 uses device cpu
[2024-06-06 11:59:29,005][02692] Rollout worker 15 uses device cpu
[2024-06-06 11:59:29,005][02692] Rollout worker 16 uses device cpu
[2024-06-06 11:59:29,006][02692] Rollout worker 17 uses device cpu
[2024-06-06 11:59:29,006][02692] Rollout worker 18 uses device cpu
[2024-06-06 11:59:29,006][02692] Rollout worker 19 uses device cpu
[2024-06-06 11:59:29,006][02692] Rollout worker 20 uses device cpu
[2024-06-06 11:59:29,006][02692] Rollout worker 21 uses device cpu
[2024-06-06 11:59:29,007][02692] Rollout worker 22 uses device cpu
[2024-06-06 11:59:29,007][02692] Rollout worker 23 uses device cpu
[2024-06-06 11:59:29,007][02692] Rollout worker 24 uses device cpu
[2024-06-06 11:59:29,007][02692] Rollout worker 25 uses device cpu
[2024-06-06 11:59:29,007][02692] Rollout worker 26 uses device cpu
[2024-06-06 11:59:29,008][02692] Rollout worker 27 uses device cpu
[2024-06-06 11:59:29,008][02692] Rollout worker 28 uses device cpu
[2024-06-06 11:59:29,008][02692] Rollout worker 29 uses device cpu
[2024-06-06 11:59:29,008][02692] Rollout worker 30 uses device cpu
[2024-06-06 11:59:29,008][02692] Rollout worker 31 uses device cpu
[2024-06-06 11:59:29,528][02692] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-06-06 11:59:29,529][02692] InferenceWorker_p0-w0: min num requests: 10
[2024-06-06 11:59:29,589][02692] Starting all processes...
[2024-06-06 11:59:29,589][02692] Starting process learner_proc0
[2024-06-06 11:59:29,865][02692] Starting all processes...
[2024-06-06 11:59:29,868][02692] Starting process inference_proc0-0
[2024-06-06 11:59:29,868][02692] Starting process rollout_proc0
[2024-06-06 11:59:29,868][02692] Starting process rollout_proc1
[2024-06-06 11:59:29,868][02692] Starting process rollout_proc2
[2024-06-06 11:59:29,868][02692] Starting process rollout_proc3
[2024-06-06 11:59:29,868][02692] Starting process rollout_proc4
[2024-06-06 11:59:29,868][02692] Starting process rollout_proc5
[2024-06-06 11:59:29,868][02692] Starting process rollout_proc6
[2024-06-06 11:59:29,868][02692] Starting process rollout_proc7
[2024-06-06 11:59:29,868][02692] Starting process rollout_proc8
[2024-06-06 11:59:29,871][02692] Starting process rollout_proc9
[2024-06-06 11:59:29,871][02692] Starting process rollout_proc10
[2024-06-06 11:59:29,871][02692] Starting process rollout_proc11
[2024-06-06 11:59:29,872][02692] Starting process rollout_proc12
[2024-06-06 11:59:29,875][02692] Starting process rollout_proc17
[2024-06-06 11:59:29,873][02692] Starting process rollout_proc14
[2024-06-06 11:59:29,873][02692] Starting process rollout_proc15
[2024-06-06 11:59:29,873][02692] Starting process rollout_proc16
[2024-06-06 11:59:29,872][02692] Starting process rollout_proc13
[2024-06-06 11:59:29,875][02692] Starting process rollout_proc18
[2024-06-06 11:59:29,876][02692] Starting process rollout_proc19
[2024-06-06 11:59:29,880][02692] Starting process rollout_proc20
[2024-06-06 11:59:29,880][02692] Starting process rollout_proc21
[2024-06-06 11:59:29,880][02692] Starting process rollout_proc22
[2024-06-06 11:59:29,885][02692] Starting process rollout_proc23
[2024-06-06 11:59:29,889][02692] Starting process rollout_proc24
[2024-06-06 11:59:29,889][02692] Starting process rollout_proc25
[2024-06-06 11:59:29,892][02692] Starting process rollout_proc26
[2024-06-06 11:59:29,893][02692] Starting process rollout_proc27
[2024-06-06 11:59:29,896][02692] Starting process rollout_proc28
[2024-06-06 11:59:29,896][02692] Starting process rollout_proc29
[2024-06-06 11:59:29,898][02692] Starting process rollout_proc30
[2024-06-06 11:59:29,898][02692] Starting process rollout_proc31
[2024-06-06 11:59:31,916][02954] Worker 29 uses CPU cores [29]
[2024-06-06 11:59:31,916][02925] Worker 0 uses CPU cores [0]
[2024-06-06 11:59:32,001][02941] Worker 14 uses CPU cores [14]
[2024-06-06 11:59:32,018][02904] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-06-06 11:59:32,019][02904] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2024-06-06 11:59:32,020][02927] Worker 2 uses CPU cores [2]
[2024-06-06 11:59:32,028][02932] Worker 7 uses CPU cores [7]
[2024-06-06 11:59:32,028][02904] Num visible devices: 1
[2024-06-06 11:59:32,032][02937] Worker 10 uses CPU cores [10]
[2024-06-06 11:59:32,048][02904] Setting fixed seed 0
[2024-06-06 11:59:32,049][02904] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-06-06 11:59:32,050][02904] Initializing actor-critic model on device cuda:0
[2024-06-06 11:59:32,055][02934] Worker 8 uses CPU cores [8]
[2024-06-06 11:59:32,080][02955] Worker 31 uses CPU cores [31]
[2024-06-06 11:59:32,083][02930] Worker 5 uses CPU cores [5]
[2024-06-06 11:59:32,104][02942] Worker 17 uses CPU cores [17]
[2024-06-06 11:59:32,128][02949] Worker 24 uses CPU cores [24]
[2024-06-06 11:59:32,148][02950] Worker 25 uses CPU cores [25]
[2024-06-06 11:59:32,198][02940] Worker 16 uses CPU cores [16]
[2024-06-06 11:59:32,204][02946] Worker 21 uses CPU cores [21]
[2024-06-06 11:59:32,256][02952] Worker 28 uses CPU cores [28]
[2024-06-06 11:59:32,260][02956] Worker 30 uses CPU cores [30]
[2024-06-06 11:59:32,262][02936] Worker 12 uses CPU cores [12]
[2024-06-06 11:59:32,264][02947] Worker 22 uses CPU cores [22]
[2024-06-06 11:59:32,288][02951] Worker 27 uses CPU cores [27]
[2024-06-06 11:59:32,288][02928] Worker 3 uses CPU cores [3]
[2024-06-06 11:59:32,300][02953] Worker 26 uses CPU cores [26]
[2024-06-06 11:59:32,304][02926] Worker 1 uses CPU cores [1]
[2024-06-06 11:59:32,310][02938] Worker 19 uses CPU cores [19]
[2024-06-06 11:59:32,324][02933] Worker 11 uses CPU cores [11]
[2024-06-06 11:59:32,324][02935] Worker 9 uses CPU cores [9]
[2024-06-06 11:59:32,331][02948] Worker 23 uses CPU cores [23]
[2024-06-06 11:59:32,342][02943] Worker 18 uses CPU cores [18]
[2024-06-06 11:59:32,355][02924] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-06-06 11:59:32,355][02924] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2024-06-06 11:59:32,363][02924] Num visible devices: 1
[2024-06-06 11:59:32,394][02945] Worker 13 uses CPU cores [13]
[2024-06-06 11:59:32,400][02931] Worker 6 uses CPU cores [6]
[2024-06-06 11:59:32,400][02944] Worker 20 uses CPU cores [20]
[2024-06-06 11:59:32,460][02939] Worker 15 uses CPU cores [15]
[2024-06-06 11:59:32,480][02929] Worker 4 uses CPU cores [4]
[2024-06-06 11:59:32,805][02904] RunningMeanStd input shape: (11, 11)
[2024-06-06 11:59:32,805][02904] RunningMeanStd input shape: (11, 11)
[2024-06-06 11:59:32,805][02904] RunningMeanStd input shape: (11, 11)
[2024-06-06 11:59:32,805][02904] RunningMeanStd input shape: (11, 11)
[2024-06-06 11:59:32,805][02904] RunningMeanStd input shape: (11, 11)
[2024-06-06 11:59:32,805][02904] RunningMeanStd input shape: (11, 11)
[2024-06-06 11:59:32,805][02904] RunningMeanStd input shape: (11, 11)
[2024-06-06 11:59:32,805][02904] RunningMeanStd input shape: (11, 11)
[2024-06-06 11:59:32,805][02904]
RunningMeanStd input shape: (11, 11)
[2024-06-06 11:59:32,805][02904] RunningMeanStd input shape: (11, 11)
[2024-06-06 11:59:32,806][02904] RunningMeanStd input shape: (11, 11)
[2024-06-06 11:59:32,806][02904] RunningMeanStd input shape: (11, 11)
[2024-06-06 11:59:32,806][02904] RunningMeanStd input shape: (11, 11)
[2024-06-06 11:59:32,806][02904] RunningMeanStd input shape: (11, 11)
[2024-06-06 11:59:32,806][02904] RunningMeanStd input shape: (11, 11)
[2024-06-06 11:59:32,806][02904] RunningMeanStd input shape: (11, 11)
[2024-06-06 11:59:32,806][02904] RunningMeanStd input shape: (11, 11)
[2024-06-06 11:59:32,806][02904] RunningMeanStd input shape: (11, 11)
[2024-06-06 11:59:32,806][02904] RunningMeanStd input shape: (11, 11)
[2024-06-06 11:59:32,806][02904] RunningMeanStd input shape: (11, 11)
[2024-06-06 11:59:32,806][02904] RunningMeanStd input shape: (11, 11)
[2024-06-06 11:59:32,806][02904] RunningMeanStd input shape: (11, 11)
[2024-06-06 11:59:32,809][02904] RunningMeanStd input shape: (1,)
[2024-06-06 11:59:32,809][02904] RunningMeanStd input shape: (1,)
[2024-06-06 11:59:32,809][02904] RunningMeanStd input shape: (1,)
[2024-06-06 11:59:32,810][02904] RunningMeanStd input shape: (1,)
[2024-06-06 11:59:32,849][02904] RunningMeanStd input shape: (1,)
[2024-06-06 11:59:32,853][02904] Created Actor Critic model with architecture:
[2024-06-06 11:59:32,853][02904] SampleFactoryAgentWrapper(
  (obs_normalizer): ObservationNormalizer()
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (agent): MettaAgent(
    (_encoder): MultiFeatureSetEncoder(
      (feature_set_encoders): ModuleDict(
        (grid_obs): FeatureSetEncoder(
          (_normalizer): FeatureListNormalizer(
            (_norms_dict): ModuleDict(
              (agent): RunningMeanStdInPlace()
              (altar): RunningMeanStdInPlace()
              (converter): RunningMeanStdInPlace()
              (generator): RunningMeanStdInPlace()
              (wall): RunningMeanStdInPlace()
              (agent:dir): RunningMeanStdInPlace()
              (agent:energy): RunningMeanStdInPlace()
              (agent:frozen): RunningMeanStdInPlace()
              (agent:hp): RunningMeanStdInPlace()
              (agent:id): RunningMeanStdInPlace()
              (agent:inv_r1): RunningMeanStdInPlace()
              (agent:inv_r2): RunningMeanStdInPlace()
              (agent:inv_r3): RunningMeanStdInPlace()
              (agent:shield): RunningMeanStdInPlace()
              (altar:hp): RunningMeanStdInPlace()
              (altar:state): RunningMeanStdInPlace()
              (converter:hp): RunningMeanStdInPlace()
              (converter:state): RunningMeanStdInPlace()
              (generator:amount): RunningMeanStdInPlace()
              (generator:hp): RunningMeanStdInPlace()
              (generator:state): RunningMeanStdInPlace()
              (wall:hp): RunningMeanStdInPlace()
            )
          )
          (embedding_net): Sequential(
            (0): Linear(in_features=125, out_features=512, bias=True)
            (1): ELU(alpha=1.0)
            (2): Linear(in_features=512, out_features=512, bias=True)
            (3): ELU(alpha=1.0)
            (4): Linear(in_features=512, out_features=512, bias=True)
            (5): ELU(alpha=1.0)
            (6): Linear(in_features=512, out_features=512, bias=True)
            (7): ELU(alpha=1.0)
          )
        )
        (global_vars): FeatureSetEncoder(
          (_normalizer): FeatureListNormalizer(
            (_norms_dict): ModuleDict(
              (_steps): RunningMeanStdInPlace()
            )
          )
          (embedding_net): Sequential(
            (0): Linear(in_features=5, out_features=8, bias=True)
            (1): ELU(alpha=1.0)
            (2): Linear(in_features=8, out_features=8, bias=True)
            (3): ELU(alpha=1.0)
          )
        )
        (last_action): FeatureSetEncoder(
          (_normalizer): FeatureListNormalizer(
            (_norms_dict): ModuleDict(
              (last_action_id): RunningMeanStdInPlace()
              (last_action_val): RunningMeanStdInPlace()
            )
          )
          (embedding_net): Sequential(
            (0): Linear(in_features=5, out_features=8, bias=True)
            (1): ELU(alpha=1.0)
            (2): Linear(in_features=8, out_features=8, bias=True)
            (3): ELU(alpha=1.0)
          )
        )
        (last_reward): FeatureSetEncoder(
          (_normalizer): FeatureListNormalizer(
            (_norms_dict): ModuleDict(
              (last_reward): RunningMeanStdInPlace()
            )
          )
          (embedding_net): Sequential(
            (0): Linear(in_features=5, out_features=8, bias=True)
            (1): ELU(alpha=1.0)
            (2): Linear(in_features=8, out_features=8, bias=True)
            (3): ELU(alpha=1.0)
          )
        )
      )
      (merged_encoder): Sequential(
        (0): Linear(in_features=536, out_features=512, bias=True)
        (1): ELU(alpha=1.0)
        (2): Linear(in_features=512, out_features=512, bias=True)
        (3): ELU(alpha=1.0)
        (4): Linear(in_features=512, out_features=512, bias=True)
        (5): ELU(alpha=1.0)
      )
    )
    (_core): ModelCoreRNN(
      (core): GRU(512, 512)
    )
    (_decoder): Decoder(
      (mlp): Identity()
    )
    (_critic_linear): Linear(in_features=512, out_features=1, bias=True)
    (_action_parameterization): ActionParameterizationDefault(
      (distribution_linear): Linear(in_features=512, out_features=16, bias=True)
    )
  )
)
[2024-06-06 11:59:32,923][02904] Using optimizer
[2024-06-06 11:59:33,068][02904] Loading state from checkpoint /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000006297_103170048.pth...
[2024-06-06 11:59:33,203][02904] Loading model from checkpoint
[2024-06-06 11:59:33,207][02904] Loaded experiment state at self.train_step=6297, self.env_steps=103170048
[2024-06-06 11:59:33,207][02904] Initialized policy 0 weights for model version 6297
[2024-06-06 11:59:33,210][02904] LearnerWorker_p0 finished initialization!
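The model above normalizes every observation feature and the returns with `RunningMeanStdInPlace` modules, which track a running mean and variance across batches. A minimal NumPy sketch of that kind of normalizer, using the standard parallel mean/variance combination (this is an illustrative reimplementation, not the actual Sample Factory class):

```python
import numpy as np


class RunningMeanStd:
    """Running per-feature mean/variance over batches, for normalization."""

    def __init__(self, shape, eps=1e-4):
        self.mean = np.zeros(shape, dtype=np.float64)
        self.var = np.ones(shape, dtype=np.float64)
        self.count = eps  # tiny prior count avoids division by zero early on

    def update(self, batch):
        """Fold a batch (leading axis = batch dim) into the running stats."""
        batch = np.asarray(batch, dtype=np.float64)
        b_mean, b_var, b_count = batch.mean(0), batch.var(0), batch.shape[0]
        delta = b_mean - self.mean
        total = self.count + b_count
        # Parallel-variance combination of the two sample sets.
        self.mean = self.mean + delta * b_count / total
        m2 = self.var * self.count + b_var * b_count \
            + delta ** 2 * self.count * b_count / total
        self.var = m2 / total
        self.count = total

    def normalize(self, x):
        return (np.asarray(x) - self.mean) / np.sqrt(self.var + 1e-8)
```

Updating in chunks gives (up to the tiny `eps` prior) the same statistics as one pass over all the data, which is what makes it usable online during rollouts.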
[2024-06-06 11:59:33,210][02904] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-06-06 11:59:33,844][02924] RunningMeanStd input shape: (11, 11)
[2024-06-06 11:59:33,845][02924] RunningMeanStd input shape: (11, 11)
[2024-06-06 11:59:33,845][02924] RunningMeanStd input shape: (11, 11)
[2024-06-06 11:59:33,845][02924] RunningMeanStd input shape: (11, 11)
[2024-06-06 11:59:33,845][02924] RunningMeanStd input shape: (11, 11)
[2024-06-06 11:59:33,845][02924] RunningMeanStd input shape: (11, 11)
[2024-06-06 11:59:33,845][02924] RunningMeanStd input shape: (11, 11)
[2024-06-06 11:59:33,845][02924] RunningMeanStd input shape: (11, 11)
[2024-06-06 11:59:33,845][02924] RunningMeanStd input shape: (11, 11)
[2024-06-06 11:59:33,845][02924] RunningMeanStd input shape: (11, 11)
[2024-06-06 11:59:33,845][02924] RunningMeanStd input shape: (11, 11)
[2024-06-06 11:59:33,845][02924] RunningMeanStd input shape: (11, 11)
[2024-06-06 11:59:33,845][02924] RunningMeanStd input shape: (11, 11)
[2024-06-06 11:59:33,845][02924] RunningMeanStd input shape: (11, 11)
[2024-06-06 11:59:33,845][02924] RunningMeanStd input shape: (11, 11)
[2024-06-06 11:59:33,845][02924] RunningMeanStd input shape: (11, 11)
[2024-06-06 11:59:33,845][02924] RunningMeanStd input shape: (11, 11)
[2024-06-06 11:59:33,845][02924] RunningMeanStd input shape: (11, 11)
[2024-06-06 11:59:33,845][02924] RunningMeanStd input shape: (11, 11)
[2024-06-06 11:59:33,845][02924] RunningMeanStd input shape: (11, 11)
[2024-06-06 11:59:33,846][02924] RunningMeanStd input shape: (11, 11)
[2024-06-06 11:59:33,846][02924] RunningMeanStd input shape: (11, 11)
[2024-06-06 11:59:33,849][02924] RunningMeanStd input shape: (1,)
[2024-06-06 11:59:33,849][02924] RunningMeanStd input shape: (1,)
[2024-06-06 11:59:33,849][02924] RunningMeanStd input shape: (1,)
[2024-06-06 11:59:33,849][02924] RunningMeanStd input shape: (1,)
[2024-06-06 11:59:33,888][02924] RunningMeanStd input shape: (1,)
[2024-06-06 11:59:33,909][02692] Inference worker 0-0 is ready!
[2024-06-06 11:59:33,910][02692] All inference workers are ready! Signal rollout workers to start!
[2024-06-06 11:59:35,968][02942] Decorrelating experience for 0 frames...
[2024-06-06 11:59:35,987][02946] Decorrelating experience for 0 frames...
[2024-06-06 11:59:35,993][02938] Decorrelating experience for 0 frames...
[2024-06-06 11:59:36,003][02953] Decorrelating experience for 0 frames...
[2024-06-06 11:59:36,006][02943] Decorrelating experience for 0 frames...
[2024-06-06 11:59:36,007][02940] Decorrelating experience for 0 frames...
[2024-06-06 11:59:36,009][02956] Decorrelating experience for 0 frames...
[2024-06-06 11:59:36,011][02950] Decorrelating experience for 0 frames...
[2024-06-06 11:59:36,013][02944] Decorrelating experience for 0 frames...
[2024-06-06 11:59:36,015][02948] Decorrelating experience for 0 frames...
[2024-06-06 11:59:36,015][02947] Decorrelating experience for 0 frames...
[2024-06-06 11:59:36,016][02949] Decorrelating experience for 0 frames...
[2024-06-06 11:59:36,030][02935] Decorrelating experience for 0 frames...
[2024-06-06 11:59:36,032][02945] Decorrelating experience for 0 frames...
[2024-06-06 11:59:36,032][02955] Decorrelating experience for 0 frames...
[2024-06-06 11:59:36,032][02932] Decorrelating experience for 0 frames...
[2024-06-06 11:59:36,033][02926] Decorrelating experience for 0 frames...
[2024-06-06 11:59:36,034][02933] Decorrelating experience for 0 frames...
[2024-06-06 11:59:36,035][02954] Decorrelating experience for 0 frames...
[2024-06-06 11:59:36,036][02928] Decorrelating experience for 0 frames...
[2024-06-06 11:59:36,037][02939] Decorrelating experience for 0 frames...
[2024-06-06 11:59:36,038][02941] Decorrelating experience for 0 frames...
[2024-06-06 11:59:36,039][02927] Decorrelating experience for 0 frames...
[2024-06-06 11:59:36,039][02937] Decorrelating experience for 0 frames...
[2024-06-06 11:59:36,040][02936] Decorrelating experience for 0 frames...
[2024-06-06 11:59:36,043][02925] Decorrelating experience for 0 frames...
[2024-06-06 11:59:36,043][02934] Decorrelating experience for 0 frames...
[2024-06-06 11:59:36,044][02930] Decorrelating experience for 0 frames...
[2024-06-06 11:59:36,044][02931] Decorrelating experience for 0 frames...
[2024-06-06 11:59:36,046][02929] Decorrelating experience for 0 frames...
[2024-06-06 11:59:36,058][02952] Decorrelating experience for 0 frames...
[2024-06-06 11:59:36,069][02951] Decorrelating experience for 0 frames...
[2024-06-06 11:59:36,700][02942] Decorrelating experience for 256 frames...
[2024-06-06 11:59:36,721][02946] Decorrelating experience for 256 frames...
[2024-06-06 11:59:36,734][02938] Decorrelating experience for 256 frames...
[2024-06-06 11:59:36,747][02953] Decorrelating experience for 256 frames...
[2024-06-06 11:59:36,757][02692] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 103170048. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-06-06 11:59:36,761][02940] Decorrelating experience for 256 frames...
[2024-06-06 11:59:36,765][02943] Decorrelating experience for 256 frames...
[2024-06-06 11:59:36,767][02950] Decorrelating experience for 256 frames...
[2024-06-06 11:59:36,769][02956] Decorrelating experience for 256 frames...
[2024-06-06 11:59:36,773][02948] Decorrelating experience for 256 frames...
[2024-06-06 11:59:36,784][02932] Decorrelating experience for 256 frames...
[2024-06-06 11:59:36,784][02947] Decorrelating experience for 256 frames...
[2024-06-06 11:59:36,785][02944] Decorrelating experience for 256 frames...
[2024-06-06 11:59:36,785][02935] Decorrelating experience for 256 frames...
[2024-06-06 11:59:36,788][02945] Decorrelating experience for 256 frames...
[2024-06-06 11:59:36,789][02933] Decorrelating experience for 256 frames...
[2024-06-06 11:59:36,791][02926] Decorrelating experience for 256 frames...
[2024-06-06 11:59:36,792][02928] Decorrelating experience for 256 frames...
[2024-06-06 11:59:36,796][02939] Decorrelating experience for 256 frames...
[2024-06-06 11:59:36,798][02949] Decorrelating experience for 256 frames...
[2024-06-06 11:59:36,799][02937] Decorrelating experience for 256 frames...
[2024-06-06 11:59:36,801][02936] Decorrelating experience for 256 frames...
[2024-06-06 11:59:36,803][02927] Decorrelating experience for 256 frames...
[2024-06-06 11:59:36,804][02941] Decorrelating experience for 256 frames...
[2024-06-06 11:59:36,805][02934] Decorrelating experience for 256 frames...
[2024-06-06 11:59:36,808][02930] Decorrelating experience for 256 frames...
[2024-06-06 11:59:36,810][02929] Decorrelating experience for 256 frames...
[2024-06-06 11:59:36,810][02925] Decorrelating experience for 256 frames...
[2024-06-06 11:59:36,811][02931] Decorrelating experience for 256 frames...
[2024-06-06 11:59:36,816][02954] Decorrelating experience for 256 frames...
[2024-06-06 11:59:36,824][02955] Decorrelating experience for 256 frames...
[2024-06-06 11:59:36,844][02952] Decorrelating experience for 256 frames...
[2024-06-06 11:59:36,855][02951] Decorrelating experience for 256 frames...
[2024-06-06 11:59:41,757][02692] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 103170048. Throughput: 0: 31192.2. Samples: 155960.
Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-06-06 11:59:42,573][02950] Worker 25, sleep for 117.188 sec to decorrelate experience collection
[2024-06-06 11:59:42,573][02946] Worker 21, sleep for 98.438 sec to decorrelate experience collection
[2024-06-06 11:59:42,574][02944] Worker 20, sleep for 93.750 sec to decorrelate experience collection
[2024-06-06 11:59:42,581][02947] Worker 22, sleep for 103.125 sec to decorrelate experience collection
[2024-06-06 11:59:42,581][02948] Worker 23, sleep for 107.812 sec to decorrelate experience collection
[2024-06-06 11:59:42,581][02942] Worker 17, sleep for 79.688 sec to decorrelate experience collection
[2024-06-06 11:59:42,581][02953] Worker 26, sleep for 121.875 sec to decorrelate experience collection
[2024-06-06 11:59:42,581][02949] Worker 24, sleep for 112.500 sec to decorrelate experience collection
[2024-06-06 11:59:42,592][02937] Worker 10, sleep for 46.875 sec to decorrelate experience collection
[2024-06-06 11:59:42,594][02926] Worker 1, sleep for 4.688 sec to decorrelate experience collection
[2024-06-06 11:59:42,595][02938] Worker 19, sleep for 89.062 sec to decorrelate experience collection
[2024-06-06 11:59:42,595][02940] Worker 16, sleep for 75.000 sec to decorrelate experience collection
[2024-06-06 11:59:42,595][02954] Worker 29, sleep for 135.938 sec to decorrelate experience collection
[2024-06-06 11:59:42,596][02951] Worker 27, sleep for 126.562 sec to decorrelate experience collection
[2024-06-06 11:59:42,596][02943] Worker 18, sleep for 84.375 sec to decorrelate experience collection
[2024-06-06 11:59:42,596][02955] Worker 31, sleep for 145.312 sec to decorrelate experience collection
[2024-06-06 11:59:42,596][02941] Worker 14, sleep for 65.625 sec to decorrelate experience collection
[2024-06-06 11:59:42,602][02933] Worker 11, sleep for 51.562 sec to decorrelate experience collection
[2024-06-06 11:59:42,603][02935] Worker 9, sleep for 42.188 sec to decorrelate experience collection
[2024-06-06 11:59:42,608][02936] Worker 12, sleep for 56.250 sec to decorrelate experience collection
[2024-06-06 11:59:42,608][02930] Worker 5, sleep for 23.438 sec to decorrelate experience collection
[2024-06-06 11:59:42,609][02952] Worker 28, sleep for 131.250 sec to decorrelate experience collection
[2024-06-06 11:59:42,615][02928] Worker 3, sleep for 14.062 sec to decorrelate experience collection
[2024-06-06 11:59:42,615][02939] Worker 15, sleep for 70.312 sec to decorrelate experience collection
[2024-06-06 11:59:42,616][02956] Worker 30, sleep for 140.625 sec to decorrelate experience collection
[2024-06-06 11:59:42,620][02927] Worker 2, sleep for 9.375 sec to decorrelate experience collection
[2024-06-06 11:59:42,627][02945] Worker 13, sleep for 60.938 sec to decorrelate experience collection
[2024-06-06 11:59:42,628][02929] Worker 4, sleep for 18.750 sec to decorrelate experience collection
[2024-06-06 11:59:42,630][02934] Worker 8, sleep for 37.500 sec to decorrelate experience collection
[2024-06-06 11:59:42,665][02931] Worker 6, sleep for 28.125 sec to decorrelate experience collection
[2024-06-06 11:59:42,669][02904] Signal inference workers to stop experience collection...
[2024-06-06 11:59:42,692][02932] Worker 7, sleep for 32.812 sec to decorrelate experience collection
[2024-06-06 11:59:42,716][02924] InferenceWorker_p0-w0: stopping experience collection
[2024-06-06 11:59:43,230][02904] Signal inference workers to resume experience collection...
[2024-06-06 11:59:43,231][02924] InferenceWorker_p0-w0: resuming experience collection
[2024-06-06 11:59:44,324][02924] Updated weights for policy 0, policy_version 6307 (0.0011)
[2024-06-06 11:59:46,758][02692] Fps is (10 sec: 16383.2, 60 sec: 16383.2, 300 sec: 16383.2). Total num frames: 103333888. Throughput: 0: 33038.3. Samples: 330400. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2024-06-06 11:59:47,304][02926] Worker 1 awakens!
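The per-worker sleep durations above are evenly staggered by worker index: worker 1 sleeps 4.688 s, worker 2 sleeps 9.375 s, up to worker 31 at 145.312 s, i.e. multiples of about 4.69 s. A sketch of that schedule, assuming a maximum decorrelation span of 150 s spread across 32 workers (these two constants are inferred from the numbers in the log, not confirmed Sample Factory defaults):

```python
def decorrelation_sleep(worker_idx, num_workers=32, max_sleep=150.0):
    """Staggered startup delay so rollout workers begin collecting at
    different wall-clock offsets, decorrelating their trajectories.

    Worker 0 starts immediately; the last worker waits almost `max_sleep`
    seconds. The constants 32 and 150.0 are assumptions inferred from
    the logged values.
    """
    return worker_idx * (max_sleep / num_workers)
```

For example, `decorrelation_sleep(25)` gives 117.1875 s, matching the logged "Worker 25, sleep for 117.188 sec" entry; the later "Worker N awakens!" lines arrive in the same evenly spaced order.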
[2024-06-06 11:59:49,525][02692] Heartbeat connected on Batcher_0
[2024-06-06 11:59:49,527][02692] Heartbeat connected on LearnerWorker_p0
[2024-06-06 11:59:49,532][02692] Heartbeat connected on RolloutWorker_w0
[2024-06-06 11:59:49,548][02692] Heartbeat connected on RolloutWorker_w1
[2024-06-06 11:59:49,567][02692] Heartbeat connected on InferenceWorker_p0-w0
[2024-06-06 11:59:51,757][02692] Fps is (10 sec: 16383.6, 60 sec: 10922.5, 300 sec: 10922.5). Total num frames: 103333888. Throughput: 0: 22381.0. Samples: 335720. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2024-06-06 11:59:52,016][02927] Worker 2 awakens!
[2024-06-06 11:59:52,025][02692] Heartbeat connected on RolloutWorker_w2
[2024-06-06 11:59:56,748][02928] Worker 3 awakens!
[2024-06-06 11:59:56,757][02692] Fps is (10 sec: 3276.8, 60 sec: 9830.2, 300 sec: 9830.2). Total num frames: 103366656. Throughput: 0: 17619.6. Samples: 352400. Policy #0 lag: (min: 0.0, avg: 9.4, max: 10.0)
[2024-06-06 11:59:56,763][02692] Heartbeat connected on RolloutWorker_w3
[2024-06-06 12:00:01,468][02929] Worker 4 awakens!
[2024-06-06 12:00:01,475][02692] Heartbeat connected on RolloutWorker_w4
[2024-06-06 12:00:01,757][02692] Fps is (10 sec: 4915.4, 60 sec: 8519.7, 300 sec: 8519.7). Total num frames: 103383040. Throughput: 0: 15072.1. Samples: 376800. Policy #0 lag: (min: 0.0, avg: 4.3, max: 12.0)
[2024-06-06 12:00:01,757][02692] Avg episode reward: [(0, '0.022')]
[2024-06-06 12:00:06,146][02930] Worker 5 awakens!
[2024-06-06 12:00:06,149][02692] Heartbeat connected on RolloutWorker_w5
[2024-06-06 12:00:06,757][02692] Fps is (10 sec: 9830.8, 60 sec: 9830.4, 300 sec: 9830.4). Total num frames: 103464960. Throughput: 0: 14160.7. Samples: 424820. Policy #0 lag: (min: 0.0, avg: 4.3, max: 12.0)
[2024-06-06 12:00:06,757][02692] Avg episode reward: [(0, '0.029')]
[2024-06-06 12:00:07,863][02924] Updated weights for policy 0, policy_version 6317 (0.0012)
[2024-06-06 12:00:10,888][02931] Worker 6 awakens!
[2024-06-06 12:00:10,891][02692] Heartbeat connected on RolloutWorker_w6
[2024-06-06 12:00:11,757][02692] Fps is (10 sec: 19660.8, 60 sec: 11702.9, 300 sec: 11702.9). Total num frames: 103579648. Throughput: 0: 15247.5. Samples: 533660. Policy #0 lag: (min: 0.0, avg: 6.7, max: 18.0)
[2024-06-06 12:00:11,757][02692] Avg episode reward: [(0, '0.037')]
[2024-06-06 12:00:15,047][02924] Updated weights for policy 0, policy_version 6327 (0.0011)
[2024-06-06 12:00:15,548][02932] Worker 7 awakens!
[2024-06-06 12:00:15,553][02692] Heartbeat connected on RolloutWorker_w7
[2024-06-06 12:00:16,757][02692] Fps is (10 sec: 24576.1, 60 sec: 13516.8, 300 sec: 13516.8). Total num frames: 103710720. Throughput: 0: 17070.5. Samples: 682820. Policy #0 lag: (min: 0.0, avg: 2.0, max: 5.0)
[2024-06-06 12:00:16,757][02692] Avg episode reward: [(0, '0.046')]
[2024-06-06 12:00:16,767][02904] Saving new best policy, reward=0.046!
[2024-06-06 12:00:20,231][02934] Worker 8 awakens!
[2024-06-06 12:00:20,235][02692] Heartbeat connected on RolloutWorker_w8
[2024-06-06 12:00:21,195][02924] Updated weights for policy 0, policy_version 6337 (0.0012)
[2024-06-06 12:00:21,757][02692] Fps is (10 sec: 26214.4, 60 sec: 14927.7, 300 sec: 14927.7). Total num frames: 103841792. Throughput: 0: 17004.0. Samples: 765180. Policy #0 lag: (min: 0.0, avg: 2.4, max: 7.0)
[2024-06-06 12:00:21,757][02692] Avg episode reward: [(0, '0.038')]
[2024-06-06 12:00:24,812][02935] Worker 9 awakens!
[2024-06-06 12:00:24,818][02692] Heartbeat connected on RolloutWorker_w9
[2024-06-06 12:00:25,820][02924] Updated weights for policy 0, policy_version 6347 (0.0012)
[2024-06-06 12:00:26,757][02692] Fps is (10 sec: 29491.2, 60 sec: 16711.7, 300 sec: 16711.7). Total num frames: 104005632. Throughput: 0: 17391.1. Samples: 938560. Policy #0 lag: (min: 0.0, avg: 3.1, max: 7.0)
[2024-06-06 12:00:26,757][02692] Avg episode reward: [(0, '0.032')]
[2024-06-06 12:00:29,566][02937] Worker 10 awakens!
[2024-06-06 12:00:29,571][02692] Heartbeat connected on RolloutWorker_w10
[2024-06-06 12:00:30,884][02924] Updated weights for policy 0, policy_version 6357 (0.0017)
[2024-06-06 12:00:31,757][02692] Fps is (10 sec: 32767.8, 60 sec: 18171.4, 300 sec: 18171.4). Total num frames: 104169472. Throughput: 0: 18216.7. Samples: 1150140. Policy #0 lag: (min: 0.0, avg: 3.3, max: 7.0)
[2024-06-06 12:00:31,757][02692] Avg episode reward: [(0, '0.035')]
[2024-06-06 12:00:34,264][02933] Worker 11 awakens!
[2024-06-06 12:00:34,270][02692] Heartbeat connected on RolloutWorker_w11
[2024-06-06 12:00:35,136][02924] Updated weights for policy 0, policy_version 6367 (0.0014)
[2024-06-06 12:00:36,757][02692] Fps is (10 sec: 36044.5, 60 sec: 19933.9, 300 sec: 19933.9). Total num frames: 104366080. Throughput: 0: 20597.4. Samples: 1262600. Policy #0 lag: (min: 0.0, avg: 3.3, max: 7.0)
[2024-06-06 12:00:36,757][02692] Avg episode reward: [(0, '0.040')]
[2024-06-06 12:00:38,956][02936] Worker 12 awakens!
[2024-06-06 12:00:38,961][02692] Heartbeat connected on RolloutWorker_w12
[2024-06-06 12:00:39,106][02924] Updated weights for policy 0, policy_version 6377 (0.0012)
[2024-06-06 12:00:41,757][02692] Fps is (10 sec: 40959.9, 60 sec: 23483.7, 300 sec: 21677.3). Total num frames: 104579072. Throughput: 0: 25802.4. Samples: 1513500. Policy #0 lag: (min: 0.0, avg: 4.5, max: 10.0)
[2024-06-06 12:00:41,757][02692] Avg episode reward: [(0, '0.043')]
[2024-06-06 12:00:43,025][02924] Updated weights for policy 0, policy_version 6387 (0.0016)
[2024-06-06 12:00:43,664][02945] Worker 13 awakens!
[2024-06-06 12:00:43,670][02692] Heartbeat connected on RolloutWorker_w13
[2024-06-06 12:00:46,572][02924] Updated weights for policy 0, policy_version 6397 (0.0016)
[2024-06-06 12:00:46,757][02692] Fps is (10 sec: 44237.2, 60 sec: 24576.2, 300 sec: 23405.7). Total num frames: 104808448. Throughput: 0: 31061.7. Samples: 1774580. Policy #0 lag: (min: 0.0, avg: 4.1, max: 10.0)
[2024-06-06 12:00:46,757][02692] Avg episode reward: [(0, '0.037')]
[2024-06-06 12:00:48,320][02941] Worker 14 awakens!
[2024-06-06 12:00:48,328][02692] Heartbeat connected on RolloutWorker_w14
[2024-06-06 12:00:50,464][02924] Updated weights for policy 0, policy_version 6407 (0.0018)
[2024-06-06 12:00:51,757][02692] Fps is (10 sec: 44237.2, 60 sec: 28126.0, 300 sec: 24685.3). Total num frames: 105021440. Throughput: 0: 32918.7. Samples: 1906160. Policy #0 lag: (min: 0.0, avg: 5.1, max: 9.0)
[2024-06-06 12:00:51,757][02692] Avg episode reward: [(0, '0.044')]
[2024-06-06 12:00:53,029][02939] Worker 15 awakens!
[2024-06-06 12:00:53,037][02692] Heartbeat connected on RolloutWorker_w15
[2024-06-06 12:00:54,289][02924] Updated weights for policy 0, policy_version 6417 (0.0023)
[2024-06-06 12:00:56,757][02692] Fps is (10 sec: 42597.9, 60 sec: 31129.8, 300 sec: 25804.8). Total num frames: 105234432. Throughput: 0: 36301.7. Samples: 2167240. Policy #0 lag: (min: 1.0, avg: 5.3, max: 11.0)
[2024-06-06 12:00:56,758][02692] Avg episode reward: [(0, '0.038')]
[2024-06-06 12:00:57,696][02940] Worker 16 awakens!
[2024-06-06 12:00:57,704][02692] Heartbeat connected on RolloutWorker_w16
[2024-06-06 12:00:58,449][02924] Updated weights for policy 0, policy_version 6427 (0.0018)
[2024-06-06 12:01:01,757][02692] Fps is (10 sec: 42598.2, 60 sec: 34406.4, 300 sec: 26792.7). Total num frames: 105447424. Throughput: 0: 38332.9. Samples: 2407800. Policy #0 lag: (min: 1.0, avg: 5.3, max: 11.0)
[2024-06-06 12:01:01,757][02692] Avg episode reward: [(0, '0.040')]
[2024-06-06 12:01:02,111][02924] Updated weights for policy 0, policy_version 6437 (0.0019)
[2024-06-06 12:01:02,368][02942] Worker 17 awakens!
[2024-06-06 12:01:02,377][02692] Heartbeat connected on RolloutWorker_w17
[2024-06-06 12:01:05,803][02924] Updated weights for policy 0, policy_version 6447 (0.0021)
[2024-06-06 12:01:06,757][02692] Fps is (10 sec: 40960.3, 60 sec: 36317.9, 300 sec: 27488.7). Total num frames: 105644032. Throughput: 0: 39404.4. Samples: 2538380. Policy #0 lag: (min: 0.0, avg: 5.6, max: 11.0)
[2024-06-06 12:01:06,757][02692] Avg episode reward: [(0, '0.040')]
[2024-06-06 12:01:07,026][02943] Worker 18 awakens!
[2024-06-06 12:01:07,035][02692] Heartbeat connected on RolloutWorker_w18
[2024-06-06 12:01:09,883][02924] Updated weights for policy 0, policy_version 6457 (0.0019)
[2024-06-06 12:01:11,756][02938] Worker 19 awakens!
[2024-06-06 12:01:11,757][02692] Fps is (10 sec: 40960.2, 60 sec: 37956.3, 300 sec: 28284.0). Total num frames: 105857024. Throughput: 0: 41197.8. Samples: 2792460. Policy #0 lag: (min: 0.0, avg: 5.4, max: 13.0)
[2024-06-06 12:01:11,757][02692] Avg episode reward: [(0, '0.039')]
[2024-06-06 12:01:11,764][02692] Heartbeat connected on RolloutWorker_w19
[2024-06-06 12:01:13,475][02924] Updated weights for policy 0, policy_version 6467 (0.0024)
[2024-06-06 12:01:16,424][02944] Worker 20 awakens!
[2024-06-06 12:01:16,433][02692] Heartbeat connected on RolloutWorker_w20
[2024-06-06 12:01:16,757][02692] Fps is (10 sec: 45875.5, 60 sec: 39867.7, 300 sec: 29327.4). Total num frames: 106102784. Throughput: 0: 42392.1. Samples: 3057780. Policy #0 lag: (min: 0.0, avg: 6.9, max: 13.0)
[2024-06-06 12:01:16,757][02692] Avg episode reward: [(0, '0.036')]
[2024-06-06 12:01:17,351][02924] Updated weights for policy 0, policy_version 6477 (0.0028)
[2024-06-06 12:01:21,112][02946] Worker 21 awakens!
[2024-06-06 12:01:21,122][02692] Heartbeat connected on RolloutWorker_w21
[2024-06-06 12:01:21,251][02924] Updated weights for policy 0, policy_version 6487 (0.0022)
[2024-06-06 12:01:21,757][02692] Fps is (10 sec: 45874.9, 60 sec: 41233.0, 300 sec: 29959.3).
Total num frames: 106315776. Throughput: 0: 42829.4. Samples: 3189920. Policy #0 lag: (min: 0.0, avg: 5.6, max: 14.0) [2024-06-06 12:01:21,757][02692] Avg episode reward: [(0, '0.034')] [2024-06-06 12:01:24,521][02924] Updated weights for policy 0, policy_version 6497 (0.0022) [2024-06-06 12:01:25,798][02947] Worker 22 awakens! [2024-06-06 12:01:25,809][02692] Heartbeat connected on RolloutWorker_w22 [2024-06-06 12:01:26,757][02692] Fps is (10 sec: 42597.6, 60 sec: 42052.1, 300 sec: 30533.8). Total num frames: 106528768. Throughput: 0: 43281.2. Samples: 3461160. Policy #0 lag: (min: 0.0, avg: 5.6, max: 14.0) [2024-06-06 12:01:26,758][02692] Avg episode reward: [(0, '0.043')] [2024-06-06 12:01:26,890][02904] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000006503_106545152.pth... [2024-06-06 12:01:26,935][02904] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000005946_97419264.pth [2024-06-06 12:01:28,140][02924] Updated weights for policy 0, policy_version 6507 (0.0023) [2024-06-06 12:01:30,420][02948] Worker 23 awakens! [2024-06-06 12:01:30,431][02692] Heartbeat connected on RolloutWorker_w23 [2024-06-06 12:01:31,555][02924] Updated weights for policy 0, policy_version 6517 (0.0025) [2024-06-06 12:01:31,757][02692] Fps is (10 sec: 45874.7, 60 sec: 43417.5, 300 sec: 31343.3). Total num frames: 106774528. Throughput: 0: 43655.9. Samples: 3739100. Policy #0 lag: (min: 0.0, avg: 7.7, max: 16.0) [2024-06-06 12:01:31,758][02692] Avg episode reward: [(0, '0.041')] [2024-06-06 12:01:35,134][02949] Worker 24 awakens! [2024-06-06 12:01:35,145][02692] Heartbeat connected on RolloutWorker_w24 [2024-06-06 12:01:35,712][02924] Updated weights for policy 0, policy_version 6527 (0.0021) [2024-06-06 12:01:36,757][02692] Fps is (10 sec: 45875.8, 60 sec: 43690.7, 300 sec: 31812.3). Total num frames: 106987520. Throughput: 0: 43704.4. Samples: 3872860. 
Policy #0 lag: (min: 0.0, avg: 8.1, max: 17.0) [2024-06-06 12:01:36,757][02692] Avg episode reward: [(0, '0.032')] [2024-06-06 12:01:38,798][02924] Updated weights for policy 0, policy_version 6537 (0.0024) [2024-06-06 12:01:39,848][02950] Worker 25 awakens! [2024-06-06 12:01:39,861][02692] Heartbeat connected on RolloutWorker_w25 [2024-06-06 12:01:41,757][02692] Fps is (10 sec: 42598.3, 60 sec: 43690.6, 300 sec: 32243.7). Total num frames: 107200512. Throughput: 0: 44044.4. Samples: 4149240. Policy #0 lag: (min: 0.0, avg: 8.0, max: 17.0) [2024-06-06 12:01:41,758][02692] Avg episode reward: [(0, '0.042')] [2024-06-06 12:01:42,565][02924] Updated weights for policy 0, policy_version 6547 (0.0026) [2024-06-06 12:01:44,556][02953] Worker 26 awakens! [2024-06-06 12:01:44,566][02692] Heartbeat connected on RolloutWorker_w26 [2024-06-06 12:01:45,730][02924] Updated weights for policy 0, policy_version 6557 (0.0031) [2024-06-06 12:01:46,757][02692] Fps is (10 sec: 45875.7, 60 sec: 43963.8, 300 sec: 32894.1). Total num frames: 107446272. Throughput: 0: 44853.9. Samples: 4426220. Policy #0 lag: (min: 1.0, avg: 9.6, max: 18.0) [2024-06-06 12:01:46,757][02692] Avg episode reward: [(0, '0.037')] [2024-06-06 12:01:49,082][02924] Updated weights for policy 0, policy_version 6567 (0.0026) [2024-06-06 12:01:49,256][02951] Worker 27 awakens! [2024-06-06 12:01:49,269][02692] Heartbeat connected on RolloutWorker_w27 [2024-06-06 12:01:51,757][02692] Fps is (10 sec: 52428.9, 60 sec: 45055.9, 300 sec: 33738.9). Total num frames: 107724800. Throughput: 0: 45059.0. Samples: 4566040. Policy #0 lag: (min: 0.0, avg: 10.0, max: 18.0) [2024-06-06 12:01:51,758][02692] Avg episode reward: [(0, '0.042')] [2024-06-06 12:01:53,511][02924] Updated weights for policy 0, policy_version 6577 (0.0031) [2024-06-06 12:01:53,908][02952] Worker 28 awakens! 
[2024-06-06 12:01:53,918][02692] Heartbeat connected on RolloutWorker_w28 [2024-06-06 12:01:56,037][02924] Updated weights for policy 0, policy_version 6587 (0.0020) [2024-06-06 12:01:56,757][02692] Fps is (10 sec: 50789.9, 60 sec: 45329.1, 300 sec: 34172.4). Total num frames: 107954176. Throughput: 0: 45800.0. Samples: 4853460. Policy #0 lag: (min: 0.0, avg: 10.0, max: 18.0) [2024-06-06 12:01:56,757][02692] Avg episode reward: [(0, '0.041')] [2024-06-06 12:01:58,632][02954] Worker 29 awakens! [2024-06-06 12:01:58,645][02692] Heartbeat connected on RolloutWorker_w29 [2024-06-06 12:02:00,136][02924] Updated weights for policy 0, policy_version 6597 (0.0020) [2024-06-06 12:02:01,757][02692] Fps is (10 sec: 45876.0, 60 sec: 45602.2, 300 sec: 34575.9). Total num frames: 108183552. Throughput: 0: 46296.0. Samples: 5141100. Policy #0 lag: (min: 1.0, avg: 10.4, max: 18.0) [2024-06-06 12:02:01,757][02692] Avg episode reward: [(0, '0.037')] [2024-06-06 12:02:02,782][02904] Signal inference workers to stop experience collection... (50 times) [2024-06-06 12:02:02,834][02924] InferenceWorker_p0-w0: stopping experience collection (50 times) [2024-06-06 12:02:02,892][02904] Signal inference workers to resume experience collection... (50 times) [2024-06-06 12:02:02,893][02924] InferenceWorker_p0-w0: resuming experience collection (50 times) [2024-06-06 12:02:03,046][02924] Updated weights for policy 0, policy_version 6607 (0.0027) [2024-06-06 12:02:03,341][02956] Worker 30 awakens! [2024-06-06 12:02:03,354][02692] Heartbeat connected on RolloutWorker_w30 [2024-06-06 12:02:06,645][02924] Updated weights for policy 0, policy_version 6617 (0.0033) [2024-06-06 12:02:06,757][02692] Fps is (10 sec: 45875.0, 60 sec: 46148.3, 300 sec: 34952.5). Total num frames: 108412928. Throughput: 0: 46683.5. Samples: 5290680. Policy #0 lag: (min: 1.0, avg: 10.1, max: 20.0) [2024-06-06 12:02:06,758][02692] Avg episode reward: [(0, '0.039')] [2024-06-06 12:02:08,009][02955] Worker 31 awakens! 
[2024-06-06 12:02:08,024][02692] Heartbeat connected on RolloutWorker_w31 [2024-06-06 12:02:09,379][02924] Updated weights for policy 0, policy_version 6627 (0.0022) [2024-06-06 12:02:11,757][02692] Fps is (10 sec: 47513.8, 60 sec: 46694.4, 300 sec: 35410.6). Total num frames: 108658688. Throughput: 0: 47325.6. Samples: 5590800. Policy #0 lag: (min: 0.0, avg: 9.8, max: 20.0) [2024-06-06 12:02:11,757][02692] Avg episode reward: [(0, '0.038')] [2024-06-06 12:02:13,235][02924] Updated weights for policy 0, policy_version 6637 (0.0029) [2024-06-06 12:02:15,839][02924] Updated weights for policy 0, policy_version 6647 (0.0026) [2024-06-06 12:02:16,757][02692] Fps is (10 sec: 54065.1, 60 sec: 47513.2, 300 sec: 36147.1). Total num frames: 108953600. Throughput: 0: 47749.5. Samples: 5887840. Policy #0 lag: (min: 0.0, avg: 12.0, max: 21.0) [2024-06-06 12:02:16,758][02692] Avg episode reward: [(0, '0.040')] [2024-06-06 12:02:19,949][02924] Updated weights for policy 0, policy_version 6657 (0.0031) [2024-06-06 12:02:21,760][02692] Fps is (10 sec: 50775.1, 60 sec: 47511.3, 300 sec: 36342.1). Total num frames: 109166592. Throughput: 0: 48401.3. Samples: 6051060. Policy #0 lag: (min: 0.0, avg: 12.0, max: 21.0) [2024-06-06 12:02:21,760][02692] Avg episode reward: [(0, '0.039')] [2024-06-06 12:02:22,650][02924] Updated weights for policy 0, policy_version 6667 (0.0023) [2024-06-06 12:02:26,610][02924] Updated weights for policy 0, policy_version 6677 (0.0036) [2024-06-06 12:02:26,757][02692] Fps is (10 sec: 44238.9, 60 sec: 47786.8, 300 sec: 36623.1). Total num frames: 109395968. Throughput: 0: 48685.6. Samples: 6340080. Policy #0 lag: (min: 0.0, avg: 12.1, max: 22.0) [2024-06-06 12:02:26,757][02692] Avg episode reward: [(0, '0.045')] [2024-06-06 12:02:29,303][02924] Updated weights for policy 0, policy_version 6687 (0.0029) [2024-06-06 12:02:31,757][02692] Fps is (10 sec: 47527.8, 60 sec: 47786.8, 300 sec: 36981.0). Total num frames: 109641728. Throughput: 0: 49135.9. 
Samples: 6637340. Policy #0 lag: (min: 0.0, avg: 12.3, max: 28.0) [2024-06-06 12:02:31,757][02692] Avg episode reward: [(0, '0.052')] [2024-06-06 12:02:31,758][02904] Saving new best policy, reward=0.052! [2024-06-06 12:02:33,139][02924] Updated weights for policy 0, policy_version 6697 (0.0034) [2024-06-06 12:02:35,688][02924] Updated weights for policy 0, policy_version 6707 (0.0025) [2024-06-06 12:02:36,757][02692] Fps is (10 sec: 55705.3, 60 sec: 49425.1, 300 sec: 37683.2). Total num frames: 109953024. Throughput: 0: 49284.6. Samples: 6783840. Policy #0 lag: (min: 0.0, avg: 13.2, max: 26.0) [2024-06-06 12:02:36,757][02692] Avg episode reward: [(0, '0.045')] [2024-06-06 12:02:39,584][02924] Updated weights for policy 0, policy_version 6717 (0.0024) [2024-06-06 12:02:41,757][02692] Fps is (10 sec: 54067.4, 60 sec: 49698.3, 300 sec: 37904.6). Total num frames: 110182400. Throughput: 0: 49754.7. Samples: 7092420. Policy #0 lag: (min: 0.0, avg: 13.1, max: 27.0) [2024-06-06 12:02:41,757][02692] Avg episode reward: [(0, '0.046')] [2024-06-06 12:02:42,028][02924] Updated weights for policy 0, policy_version 6727 (0.0037) [2024-06-06 12:02:46,180][02924] Updated weights for policy 0, policy_version 6737 (0.0040) [2024-06-06 12:02:46,757][02692] Fps is (10 sec: 44236.8, 60 sec: 49151.9, 300 sec: 38028.1). Total num frames: 110395392. Throughput: 0: 49974.6. Samples: 7389960. Policy #0 lag: (min: 0.0, avg: 13.1, max: 27.0) [2024-06-06 12:02:46,757][02692] Avg episode reward: [(0, '0.043')] [2024-06-06 12:02:48,601][02904] Signal inference workers to stop experience collection... (100 times) [2024-06-06 12:02:48,603][02904] Signal inference workers to resume experience collection... 
(100 times) [2024-06-06 12:02:48,609][02924] InferenceWorker_p0-w0: stopping experience collection (100 times) [2024-06-06 12:02:48,640][02924] InferenceWorker_p0-w0: resuming experience collection (100 times) [2024-06-06 12:02:48,751][02924] Updated weights for policy 0, policy_version 6747 (0.0028) [2024-06-06 12:02:51,760][02692] Fps is (10 sec: 45861.3, 60 sec: 48603.6, 300 sec: 38312.8). Total num frames: 110641152. Throughput: 0: 49608.4. Samples: 7523200. Policy #0 lag: (min: 0.0, avg: 12.8, max: 25.0) [2024-06-06 12:02:51,760][02692] Avg episode reward: [(0, '0.046')] [2024-06-06 12:02:52,927][02924] Updated weights for policy 0, policy_version 6757 (0.0031) [2024-06-06 12:02:55,511][02924] Updated weights for policy 0, policy_version 6767 (0.0031) [2024-06-06 12:02:56,757][02692] Fps is (10 sec: 54067.4, 60 sec: 49698.2, 300 sec: 38830.1). Total num frames: 110936064. Throughput: 0: 49699.5. Samples: 7827280. Policy #0 lag: (min: 0.0, avg: 12.9, max: 25.0) [2024-06-06 12:02:56,757][02692] Avg episode reward: [(0, '0.046')] [2024-06-06 12:02:59,600][02924] Updated weights for policy 0, policy_version 6777 (0.0032) [2024-06-06 12:03:01,757][02692] Fps is (10 sec: 50805.7, 60 sec: 49425.1, 300 sec: 38922.0). Total num frames: 111149056. Throughput: 0: 49511.7. Samples: 8115840. Policy #0 lag: (min: 2.0, avg: 11.7, max: 24.0) [2024-06-06 12:03:01,757][02692] Avg episode reward: [(0, '0.048')] [2024-06-06 12:03:02,126][02924] Updated weights for policy 0, policy_version 6787 (0.0019) [2024-06-06 12:03:06,383][02924] Updated weights for policy 0, policy_version 6797 (0.0026) [2024-06-06 12:03:06,757][02692] Fps is (10 sec: 45875.3, 60 sec: 49698.2, 300 sec: 39165.6). Total num frames: 111394816. Throughput: 0: 49324.2. Samples: 8270500. 
Policy #0 lag: (min: 2.0, avg: 9.2, max: 22.0) [2024-06-06 12:03:06,757][02692] Avg episode reward: [(0, '0.044')] [2024-06-06 12:03:08,618][02924] Updated weights for policy 0, policy_version 6807 (0.0025) [2024-06-06 12:03:11,757][02692] Fps is (10 sec: 45874.7, 60 sec: 49151.9, 300 sec: 39245.4). Total num frames: 111607808. Throughput: 0: 49243.9. Samples: 8556060. Policy #0 lag: (min: 2.0, avg: 9.2, max: 22.0) [2024-06-06 12:03:11,757][02692] Avg episode reward: [(0, '0.049')] [2024-06-06 12:03:12,751][02924] Updated weights for policy 0, policy_version 6817 (0.0037) [2024-06-06 12:03:15,159][02924] Updated weights for policy 0, policy_version 6827 (0.0027) [2024-06-06 12:03:16,757][02692] Fps is (10 sec: 50790.1, 60 sec: 49152.3, 300 sec: 39694.0). Total num frames: 111902720. Throughput: 0: 49232.4. Samples: 8852800. Policy #0 lag: (min: 1.0, avg: 7.9, max: 21.0) [2024-06-06 12:03:16,757][02692] Avg episode reward: [(0, '0.048')] [2024-06-06 12:03:19,661][02924] Updated weights for policy 0, policy_version 6837 (0.0021) [2024-06-06 12:03:21,757][02692] Fps is (10 sec: 55705.8, 60 sec: 49973.7, 300 sec: 39977.0). Total num frames: 112164864. Throughput: 0: 49695.6. Samples: 9020140. Policy #0 lag: (min: 0.0, avg: 7.3, max: 21.0) [2024-06-06 12:03:21,757][02692] Avg episode reward: [(0, '0.047')] [2024-06-06 12:03:22,019][02924] Updated weights for policy 0, policy_version 6847 (0.0029) [2024-06-06 12:03:26,074][02924] Updated weights for policy 0, policy_version 6857 (0.0031) [2024-06-06 12:03:26,757][02692] Fps is (10 sec: 49151.5, 60 sec: 49971.1, 300 sec: 40105.2). Total num frames: 112394240. Throughput: 0: 49339.8. Samples: 9312720. Policy #0 lag: (min: 0.0, avg: 7.7, max: 21.0) [2024-06-06 12:03:26,757][02692] Avg episode reward: [(0, '0.047')] [2024-06-06 12:03:26,769][02904] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000006860_112394240.pth... 
[2024-06-06 12:03:26,835][02904] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000006297_103170048.pth [2024-06-06 12:03:28,588][02924] Updated weights for policy 0, policy_version 6867 (0.0029) [2024-06-06 12:03:31,757][02692] Fps is (10 sec: 42598.2, 60 sec: 49151.9, 300 sec: 40088.5). Total num frames: 112590848. Throughput: 0: 49144.8. Samples: 9601480. Policy #0 lag: (min: 0.0, avg: 7.7, max: 21.0) [2024-06-06 12:03:31,757][02692] Avg episode reward: [(0, '0.044')] [2024-06-06 12:03:33,014][02924] Updated weights for policy 0, policy_version 6877 (0.0029) [2024-06-06 12:03:34,990][02904] Signal inference workers to stop experience collection... (150 times) [2024-06-06 12:03:35,028][02924] InferenceWorker_p0-w0: stopping experience collection (150 times) [2024-06-06 12:03:35,051][02904] Signal inference workers to resume experience collection... (150 times) [2024-06-06 12:03:35,052][02924] InferenceWorker_p0-w0: resuming experience collection (150 times) [2024-06-06 12:03:35,183][02924] Updated weights for policy 0, policy_version 6887 (0.0032) [2024-06-06 12:03:36,757][02692] Fps is (10 sec: 50791.4, 60 sec: 49152.1, 300 sec: 40550.4). Total num frames: 112902144. Throughput: 0: 49329.6. Samples: 9742880. Policy #0 lag: (min: 0.0, avg: 7.8, max: 20.0) [2024-06-06 12:03:36,757][02692] Avg episode reward: [(0, '0.045')] [2024-06-06 12:03:39,573][02924] Updated weights for policy 0, policy_version 6897 (0.0043) [2024-06-06 12:03:41,757][02692] Fps is (10 sec: 54067.8, 60 sec: 49152.0, 300 sec: 40659.1). Total num frames: 113131520. Throughput: 0: 49324.5. Samples: 10046880. 
Policy #0 lag: (min: 0.0, avg: 8.1, max: 20.0) [2024-06-06 12:03:41,757][02692] Avg episode reward: [(0, '0.043')] [2024-06-06 12:03:41,908][02924] Updated weights for policy 0, policy_version 6907 (0.0031) [2024-06-06 12:03:46,302][02924] Updated weights for policy 0, policy_version 6917 (0.0029) [2024-06-06 12:03:46,757][02692] Fps is (10 sec: 45875.2, 60 sec: 49425.1, 300 sec: 40763.4). Total num frames: 113360896. Throughput: 0: 49519.1. Samples: 10344200. Policy #0 lag: (min: 0.0, avg: 8.2, max: 20.0) [2024-06-06 12:03:46,757][02692] Avg episode reward: [(0, '0.047')] [2024-06-06 12:03:48,684][02924] Updated weights for policy 0, policy_version 6927 (0.0025) [2024-06-06 12:03:51,757][02692] Fps is (10 sec: 45875.2, 60 sec: 49154.5, 300 sec: 40863.6). Total num frames: 113590272. Throughput: 0: 49015.5. Samples: 10476200. Policy #0 lag: (min: 0.0, avg: 8.5, max: 20.0) [2024-06-06 12:03:51,757][02692] Avg episode reward: [(0, '0.049')] [2024-06-06 12:03:52,848][02924] Updated weights for policy 0, policy_version 6937 (0.0030) [2024-06-06 12:03:55,202][02924] Updated weights for policy 0, policy_version 6947 (0.0024) [2024-06-06 12:03:56,757][02692] Fps is (10 sec: 52428.2, 60 sec: 49152.0, 300 sec: 41212.1). Total num frames: 113885184. Throughput: 0: 49395.6. Samples: 10778860. Policy #0 lag: (min: 0.0, avg: 8.5, max: 20.0) [2024-06-06 12:03:56,757][02692] Avg episode reward: [(0, '0.041')] [2024-06-06 12:03:59,634][02924] Updated weights for policy 0, policy_version 6957 (0.0030) [2024-06-06 12:04:01,757][02692] Fps is (10 sec: 54065.9, 60 sec: 49697.9, 300 sec: 41361.8). Total num frames: 114130944. Throughput: 0: 49284.2. Samples: 11070600. 
Policy #0 lag: (min: 0.0, avg: 8.6, max: 20.0) [2024-06-06 12:04:01,758][02692] Avg episode reward: [(0, '0.046')] [2024-06-06 12:04:01,913][02924] Updated weights for policy 0, policy_version 6967 (0.0023) [2024-06-06 12:04:06,306][02924] Updated weights for policy 0, policy_version 6977 (0.0029) [2024-06-06 12:04:06,757][02692] Fps is (10 sec: 45875.5, 60 sec: 49152.0, 300 sec: 41384.8). Total num frames: 114343936. Throughput: 0: 49001.4. Samples: 11225200. Policy #0 lag: (min: 0.0, avg: 8.4, max: 20.0) [2024-06-06 12:04:06,757][02692] Avg episode reward: [(0, '0.049')] [2024-06-06 12:04:08,562][02924] Updated weights for policy 0, policy_version 6987 (0.0032) [2024-06-06 12:04:11,757][02692] Fps is (10 sec: 44237.6, 60 sec: 49425.1, 300 sec: 41466.4). Total num frames: 114573312. Throughput: 0: 48929.9. Samples: 11514560. Policy #0 lag: (min: 0.0, avg: 8.4, max: 20.0) [2024-06-06 12:04:11,758][02692] Avg episode reward: [(0, '0.048')] [2024-06-06 12:04:12,880][02924] Updated weights for policy 0, policy_version 6997 (0.0023) [2024-06-06 12:04:15,101][02924] Updated weights for policy 0, policy_version 7007 (0.0034) [2024-06-06 12:04:16,757][02692] Fps is (10 sec: 52428.6, 60 sec: 49425.1, 300 sec: 41779.2). Total num frames: 114868224. Throughput: 0: 49051.6. Samples: 11808800. Policy #0 lag: (min: 0.0, avg: 8.4, max: 20.0) [2024-06-06 12:04:16,757][02692] Avg episode reward: [(0, '0.052')] [2024-06-06 12:04:19,289][02924] Updated weights for policy 0, policy_version 7017 (0.0029) [2024-06-06 12:04:21,570][02924] Updated weights for policy 0, policy_version 7027 (0.0022) [2024-06-06 12:04:21,757][02692] Fps is (10 sec: 55705.9, 60 sec: 49425.1, 300 sec: 41966.0). Total num frames: 115130368. Throughput: 0: 49603.9. Samples: 11975060. 
Policy #0 lag: (min: 0.0, avg: 8.2, max: 20.0) [2024-06-06 12:04:21,757][02692] Avg episode reward: [(0, '0.051')] [2024-06-06 12:04:26,075][02924] Updated weights for policy 0, policy_version 7037 (0.0019) [2024-06-06 12:04:26,757][02692] Fps is (10 sec: 47512.5, 60 sec: 49151.9, 300 sec: 41976.9). Total num frames: 115343360. Throughput: 0: 49479.3. Samples: 12273460. Policy #0 lag: (min: 0.0, avg: 8.3, max: 20.0) [2024-06-06 12:04:26,758][02692] Avg episode reward: [(0, '0.046')] [2024-06-06 12:04:28,159][02924] Updated weights for policy 0, policy_version 7047 (0.0033) [2024-06-06 12:04:31,757][02692] Fps is (10 sec: 42598.3, 60 sec: 49425.1, 300 sec: 41987.5). Total num frames: 115556352. Throughput: 0: 49415.5. Samples: 12567900. Policy #0 lag: (min: 0.0, avg: 8.5, max: 20.0) [2024-06-06 12:04:31,757][02692] Avg episode reward: [(0, '0.056')] [2024-06-06 12:04:31,761][02904] Saving new best policy, reward=0.056! [2024-06-06 12:04:32,683][02924] Updated weights for policy 0, policy_version 7057 (0.0028) [2024-06-06 12:04:33,460][02904] Signal inference workers to stop experience collection... (200 times) [2024-06-06 12:04:33,497][02924] InferenceWorker_p0-w0: stopping experience collection (200 times) [2024-06-06 12:04:33,526][02904] Signal inference workers to resume experience collection... (200 times) [2024-06-06 12:04:33,527][02924] InferenceWorker_p0-w0: resuming experience collection (200 times) [2024-06-06 12:04:34,970][02924] Updated weights for policy 0, policy_version 7067 (0.0028) [2024-06-06 12:04:36,757][02692] Fps is (10 sec: 50791.2, 60 sec: 49151.9, 300 sec: 42987.2). Total num frames: 115851264. Throughput: 0: 49475.4. Samples: 12702600. 
Policy #0 lag: (min: 0.0, avg: 8.5, max: 20.0) [2024-06-06 12:04:36,758][02692] Avg episode reward: [(0, '0.056')] [2024-06-06 12:04:39,419][02924] Updated weights for policy 0, policy_version 7077 (0.0031) [2024-06-06 12:04:41,450][02924] Updated weights for policy 0, policy_version 7087 (0.0022) [2024-06-06 12:04:41,757][02692] Fps is (10 sec: 55705.8, 60 sec: 49698.1, 300 sec: 43320.5). Total num frames: 116113408. Throughput: 0: 49481.4. Samples: 13005520. Policy #0 lag: (min: 0.0, avg: 7.6, max: 20.0) [2024-06-06 12:04:41,757][02692] Avg episode reward: [(0, '0.054')] [2024-06-06 12:04:45,890][02924] Updated weights for policy 0, policy_version 7097 (0.0036) [2024-06-06 12:04:46,757][02692] Fps is (10 sec: 47514.2, 60 sec: 49425.0, 300 sec: 44042.5). Total num frames: 116326400. Throughput: 0: 49799.0. Samples: 13311540. Policy #0 lag: (min: 0.0, avg: 8.6, max: 20.0) [2024-06-06 12:04:46,757][02692] Avg episode reward: [(0, '0.049')] [2024-06-06 12:04:48,024][02924] Updated weights for policy 0, policy_version 7107 (0.0029) [2024-06-06 12:04:51,757][02692] Fps is (10 sec: 42598.4, 60 sec: 49152.0, 300 sec: 44653.4). Total num frames: 116539392. Throughput: 0: 49370.2. Samples: 13446860. Policy #0 lag: (min: 0.0, avg: 8.8, max: 20.0) [2024-06-06 12:04:51,757][02692] Avg episode reward: [(0, '0.051')] [2024-06-06 12:04:52,738][02924] Updated weights for policy 0, policy_version 7117 (0.0026) [2024-06-06 12:04:54,625][02924] Updated weights for policy 0, policy_version 7127 (0.0034) [2024-06-06 12:04:56,757][02692] Fps is (10 sec: 50790.1, 60 sec: 49152.0, 300 sec: 45597.5). Total num frames: 116834304. Throughput: 0: 49361.8. Samples: 13735840. 
Policy #0 lag: (min: 0.0, avg: 8.8, max: 20.0) [2024-06-06 12:04:56,757][02692] Avg episode reward: [(0, '0.048')] [2024-06-06 12:04:59,277][02924] Updated weights for policy 0, policy_version 7137 (0.0030) [2024-06-06 12:05:01,339][02924] Updated weights for policy 0, policy_version 7147 (0.0028) [2024-06-06 12:05:01,757][02692] Fps is (10 sec: 55705.8, 60 sec: 49425.3, 300 sec: 46208.5). Total num frames: 117096448. Throughput: 0: 49435.6. Samples: 14033400. Policy #0 lag: (min: 0.0, avg: 8.7, max: 20.0) [2024-06-06 12:05:01,757][02692] Avg episode reward: [(0, '0.055')] [2024-06-06 12:05:05,987][02924] Updated weights for policy 0, policy_version 7157 (0.0025) [2024-06-06 12:05:06,757][02692] Fps is (10 sec: 47512.9, 60 sec: 49424.9, 300 sec: 46541.6). Total num frames: 117309440. Throughput: 0: 49193.6. Samples: 14188780. Policy #0 lag: (min: 0.0, avg: 8.3, max: 20.0) [2024-06-06 12:05:06,758][02692] Avg episode reward: [(0, '0.053')] [2024-06-06 12:05:07,882][02924] Updated weights for policy 0, policy_version 7167 (0.0029) [2024-06-06 12:05:11,757][02692] Fps is (10 sec: 44236.2, 60 sec: 49425.0, 300 sec: 46874.9). Total num frames: 117538816. Throughput: 0: 49063.7. Samples: 14481320. Policy #0 lag: (min: 0.0, avg: 8.4, max: 20.0) [2024-06-06 12:05:11,758][02692] Avg episode reward: [(0, '0.052')] [2024-06-06 12:05:12,594][02924] Updated weights for policy 0, policy_version 7177 (0.0025) [2024-06-06 12:05:14,372][02924] Updated weights for policy 0, policy_version 7187 (0.0029) [2024-06-06 12:05:16,758][02692] Fps is (10 sec: 52424.9, 60 sec: 49424.3, 300 sec: 47430.1). Total num frames: 117833728. Throughput: 0: 49086.6. Samples: 14776840. 
Policy #0 lag: (min: 0.0, avg: 8.4, max: 20.0) [2024-06-06 12:05:16,758][02692] Avg episode reward: [(0, '0.049')] [2024-06-06 12:05:19,160][02924] Updated weights for policy 0, policy_version 7197 (0.0037) [2024-06-06 12:05:21,066][02924] Updated weights for policy 0, policy_version 7207 (0.0027) [2024-06-06 12:05:21,758][02692] Fps is (10 sec: 55702.5, 60 sec: 49424.5, 300 sec: 47763.4). Total num frames: 118095872. Throughput: 0: 49854.9. Samples: 14946100. Policy #0 lag: (min: 0.0, avg: 8.5, max: 20.0) [2024-06-06 12:05:21,758][02692] Avg episode reward: [(0, '0.053')] [2024-06-06 12:05:25,770][02924] Updated weights for policy 0, policy_version 7217 (0.0046) [2024-06-06 12:05:26,757][02692] Fps is (10 sec: 49156.1, 60 sec: 49698.3, 300 sec: 47985.7). Total num frames: 118325248. Throughput: 0: 49796.8. Samples: 15246380. Policy #0 lag: (min: 0.0, avg: 8.5, max: 20.0) [2024-06-06 12:05:26,757][02692] Avg episode reward: [(0, '0.055')] [2024-06-06 12:05:26,799][02904] Signal inference workers to stop experience collection... (250 times) [2024-06-06 12:05:26,801][02904] Signal inference workers to resume experience collection... (250 times) [2024-06-06 12:05:26,802][02904] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000007223_118341632.pth... [2024-06-06 12:05:26,817][02924] InferenceWorker_p0-w0: stopping experience collection (250 times) [2024-06-06 12:05:26,817][02924] InferenceWorker_p0-w0: resuming experience collection (250 times) [2024-06-06 12:05:26,864][02904] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000006503_106545152.pth [2024-06-06 12:05:27,681][02924] Updated weights for policy 0, policy_version 7227 (0.0022) [2024-06-06 12:05:31,757][02692] Fps is (10 sec: 44238.6, 60 sec: 49698.0, 300 sec: 48041.2). Total num frames: 118538240. Throughput: 0: 49615.3. Samples: 15544240. 
Policy #0 lag: (min: 0.0, avg: 8.5, max: 20.0)
[2024-06-06 12:05:31,758][02692] Avg episode reward: [(0, '0.059')]
[2024-06-06 12:05:31,758][02904] Saving new best policy, reward=0.059!
[2024-06-06 12:05:32,308][02924] Updated weights for policy 0, policy_version 7237 (0.0030)
[2024-06-06 12:05:34,101][02924] Updated weights for policy 0, policy_version 7247 (0.0028)
[2024-06-06 12:05:36,760][02692] Fps is (10 sec: 49138.9, 60 sec: 49422.9, 300 sec: 48262.9). Total num frames: 118816768. Throughput: 0: 49416.1. Samples: 15670720. Policy #0 lag: (min: 0.0, avg: 8.5, max: 20.0)
[2024-06-06 12:05:36,760][02692] Avg episode reward: [(0, '0.056')]
[2024-06-06 12:05:38,843][02924] Updated weights for policy 0, policy_version 7257 (0.0037)
[2024-06-06 12:05:40,661][02924] Updated weights for policy 0, policy_version 7267 (0.0020)
[2024-06-06 12:05:41,757][02692] Fps is (10 sec: 54068.5, 60 sec: 49425.1, 300 sec: 48374.5). Total num frames: 119078912. Throughput: 0: 49886.3. Samples: 15980720. Policy #0 lag: (min: 0.0, avg: 9.0, max: 20.0)
[2024-06-06 12:05:41,757][02692] Avg episode reward: [(0, '0.060')]
[2024-06-06 12:05:41,918][02904] Saving new best policy, reward=0.060!
[2024-06-06 12:05:45,419][02924] Updated weights for policy 0, policy_version 7277 (0.0033)
[2024-06-06 12:05:46,757][02692] Fps is (10 sec: 52442.5, 60 sec: 50244.1, 300 sec: 48541.1). Total num frames: 119341056. Throughput: 0: 50077.6. Samples: 16286900. Policy #0 lag: (min: 0.0, avg: 8.4, max: 20.0)
[2024-06-06 12:05:46,757][02692] Avg episode reward: [(0, '0.060')]
[2024-06-06 12:05:47,466][02924] Updated weights for policy 0, policy_version 7287 (0.0027)
[2024-06-06 12:05:51,757][02692] Fps is (10 sec: 44236.9, 60 sec: 49698.1, 300 sec: 48430.0). Total num frames: 119521280. Throughput: 0: 49761.1. Samples: 16428020. Policy #0 lag: (min: 0.0, avg: 8.1, max: 20.0)
[2024-06-06 12:05:51,757][02692] Avg episode reward: [(0, '0.053')]
[2024-06-06 12:05:52,004][02924] Updated weights for policy 0, policy_version 7297 (0.0040)
[2024-06-06 12:05:54,227][02924] Updated weights for policy 0, policy_version 7307 (0.0033)
[2024-06-06 12:05:56,757][02692] Fps is (10 sec: 45875.8, 60 sec: 49425.1, 300 sec: 48652.2). Total num frames: 119799808. Throughput: 0: 49765.4. Samples: 16720760. Policy #0 lag: (min: 0.0, avg: 8.1, max: 20.0)
[2024-06-06 12:05:56,757][02692] Avg episode reward: [(0, '0.054')]
[2024-06-06 12:05:58,596][02924] Updated weights for policy 0, policy_version 7317 (0.0021)
[2024-06-06 12:06:00,764][02924] Updated weights for policy 0, policy_version 7327 (0.0034)
[2024-06-06 12:06:01,757][02692] Fps is (10 sec: 55705.5, 60 sec: 49698.1, 300 sec: 48929.9). Total num frames: 120078336. Throughput: 0: 49674.8. Samples: 17012160. Policy #0 lag: (min: 0.0, avg: 8.2, max: 20.0)
[2024-06-06 12:06:01,757][02692] Avg episode reward: [(0, '0.055')]
[2024-06-06 12:06:05,423][02924] Updated weights for policy 0, policy_version 7337 (0.0026)
[2024-06-06 12:06:06,760][02692] Fps is (10 sec: 50776.3, 60 sec: 49969.0, 300 sec: 48984.9). Total num frames: 120307712. Throughput: 0: 49473.2. Samples: 17172500. Policy #0 lag: (min: 0.0, avg: 8.5, max: 20.0)
[2024-06-06 12:06:06,760][02692] Avg episode reward: [(0, '0.055')]
[2024-06-06 12:06:07,452][02924] Updated weights for policy 0, policy_version 7347 (0.0037)
[2024-06-06 12:06:08,266][02904] Signal inference workers to stop experience collection... (300 times)
[2024-06-06 12:06:08,267][02904] Signal inference workers to resume experience collection... (300 times)
[2024-06-06 12:06:08,311][02924] InferenceWorker_p0-w0: stopping experience collection (300 times)
[2024-06-06 12:06:08,311][02924] InferenceWorker_p0-w0: resuming experience collection (300 times)
[2024-06-06 12:06:11,757][02692] Fps is (10 sec: 42598.5, 60 sec: 49425.2, 300 sec: 48818.8). Total num frames: 120504320. Throughput: 0: 49158.8. Samples: 17458520. Policy #0 lag: (min: 0.0, avg: 7.2, max: 20.0)
[2024-06-06 12:06:11,757][02692] Avg episode reward: [(0, '0.054')]
[2024-06-06 12:06:12,055][02924] Updated weights for policy 0, policy_version 7357 (0.0027)
[2024-06-06 12:06:14,178][02924] Updated weights for policy 0, policy_version 7367 (0.0028)
[2024-06-06 12:06:16,757][02692] Fps is (10 sec: 45887.4, 60 sec: 48879.6, 300 sec: 48985.4). Total num frames: 120766464. Throughput: 0: 49012.6. Samples: 17749800. Policy #0 lag: (min: 0.0, avg: 7.2, max: 20.0)
[2024-06-06 12:06:16,758][02692] Avg episode reward: [(0, '0.056')]
[2024-06-06 12:06:18,787][02924] Updated weights for policy 0, policy_version 7377 (0.0029)
[2024-06-06 12:06:20,783][02924] Updated weights for policy 0, policy_version 7387 (0.0033)
[2024-06-06 12:06:21,757][02692] Fps is (10 sec: 54066.7, 60 sec: 49152.5, 300 sec: 49207.6). Total num frames: 121044992. Throughput: 0: 49611.4. Samples: 17903100. Policy #0 lag: (min: 0.0, avg: 7.4, max: 20.0)
[2024-06-06 12:06:21,758][02692] Avg episode reward: [(0, '0.054')]
[2024-06-06 12:06:25,501][02924] Updated weights for policy 0, policy_version 7397 (0.0031)
[2024-06-06 12:06:26,757][02692] Fps is (10 sec: 52429.5, 60 sec: 49425.2, 300 sec: 49207.6). Total num frames: 121290752. Throughput: 0: 49274.7. Samples: 18198080. Policy #0 lag: (min: 0.0, avg: 7.5, max: 20.0)
[2024-06-06 12:06:26,757][02692] Avg episode reward: [(0, '0.055')]
[2024-06-06 12:06:27,300][02924] Updated weights for policy 0, policy_version 7407 (0.0028)
[2024-06-06 12:06:31,757][02692] Fps is (10 sec: 42598.7, 60 sec: 48879.1, 300 sec: 49096.5). Total num frames: 121470976. Throughput: 0: 48952.6. Samples: 18489760. Policy #0 lag: (min: 0.0, avg: 7.3, max: 20.0)
[2024-06-06 12:06:31,757][02692] Avg episode reward: [(0, '0.050')]
[2024-06-06 12:06:32,191][02924] Updated weights for policy 0, policy_version 7417 (0.0031)
[2024-06-06 12:06:34,127][02924] Updated weights for policy 0, policy_version 7427 (0.0027)
[2024-06-06 12:06:36,757][02692] Fps is (10 sec: 44236.7, 60 sec: 48608.1, 300 sec: 49263.1). Total num frames: 121733120. Throughput: 0: 48724.0. Samples: 18620600. Policy #0 lag: (min: 0.0, avg: 7.3, max: 20.0)
[2024-06-06 12:06:36,757][02692] Avg episode reward: [(0, '0.057')]
[2024-06-06 12:06:38,940][02924] Updated weights for policy 0, policy_version 7437 (0.0036)
[2024-06-06 12:06:40,982][02924] Updated weights for policy 0, policy_version 7447 (0.0029)
[2024-06-06 12:06:41,760][02692] Fps is (10 sec: 55688.6, 60 sec: 49149.5, 300 sec: 49429.2). Total num frames: 122028032. Throughput: 0: 48809.1. Samples: 18917320. Policy #0 lag: (min: 0.0, avg: 8.2, max: 20.0)
[2024-06-06 12:06:41,761][02692] Avg episode reward: [(0, '0.053')]
[2024-06-06 12:06:45,461][02924] Updated weights for policy 0, policy_version 7457 (0.0017)
[2024-06-06 12:06:46,757][02692] Fps is (10 sec: 52428.8, 60 sec: 48606.0, 300 sec: 49263.1). Total num frames: 122257408. Throughput: 0: 48984.9. Samples: 19216480. Policy #0 lag: (min: 0.0, avg: 8.5, max: 20.0)
[2024-06-06 12:06:46,757][02692] Avg episode reward: [(0, '0.056')]
[2024-06-06 12:06:47,587][02924] Updated weights for policy 0, policy_version 7467 (0.0025)
[2024-06-06 12:06:51,757][02692] Fps is (10 sec: 42611.4, 60 sec: 48878.9, 300 sec: 49152.0). Total num frames: 122454016. Throughput: 0: 48547.0. Samples: 19356980. Policy #0 lag: (min: 0.0, avg: 7.8, max: 20.0)
[2024-06-06 12:06:51,757][02692] Avg episode reward: [(0, '0.053')]
[2024-06-06 12:06:52,262][02924] Updated weights for policy 0, policy_version 7477 (0.0030)
[2024-06-06 12:06:54,076][02924] Updated weights for policy 0, policy_version 7487 (0.0029)
[2024-06-06 12:06:56,757][02692] Fps is (10 sec: 47512.9, 60 sec: 48878.8, 300 sec: 49318.6). Total num frames: 122732544. Throughput: 0: 48878.0. Samples: 19658040. Policy #0 lag: (min: 0.0, avg: 7.8, max: 20.0)
[2024-06-06 12:06:56,757][02692] Avg episode reward: [(0, '0.056')]
[2024-06-06 12:06:58,931][02924] Updated weights for policy 0, policy_version 7497 (0.0032)
[2024-06-06 12:07:00,698][02924] Updated weights for policy 0, policy_version 7507 (0.0034)
[2024-06-06 12:07:01,757][02692] Fps is (10 sec: 55705.0, 60 sec: 48878.9, 300 sec: 49485.2). Total num frames: 123011072. Throughput: 0: 48787.1. Samples: 19945220. Policy #0 lag: (min: 0.0, avg: 8.2, max: 21.0)
[2024-06-06 12:07:01,757][02692] Avg episode reward: [(0, '0.057')]
[2024-06-06 12:07:05,227][02904] Signal inference workers to stop experience collection... (350 times)
[2024-06-06 12:07:05,273][02924] InferenceWorker_p0-w0: stopping experience collection (350 times)
[2024-06-06 12:07:05,273][02904] Signal inference workers to resume experience collection... (350 times)
[2024-06-06 12:07:05,293][02924] InferenceWorker_p0-w0: resuming experience collection (350 times)
[2024-06-06 12:07:05,408][02924] Updated weights for policy 0, policy_version 7517 (0.0032)
[2024-06-06 12:07:06,757][02692] Fps is (10 sec: 52429.0, 60 sec: 49154.2, 300 sec: 49485.2). Total num frames: 123256832. Throughput: 0: 49030.2. Samples: 20109460. Policy #0 lag: (min: 0.0, avg: 7.5, max: 22.0)
[2024-06-06 12:07:06,758][02692] Avg episode reward: [(0, '0.061')]
[2024-06-06 12:07:07,443][02924] Updated weights for policy 0, policy_version 7527 (0.0031)
[2024-06-06 12:07:11,757][02692] Fps is (10 sec: 44237.0, 60 sec: 49151.9, 300 sec: 49152.1). Total num frames: 123453440. Throughput: 0: 48994.6. Samples: 20402840. Policy #0 lag: (min: 0.0, avg: 7.8, max: 21.0)
[2024-06-06 12:07:11,757][02692] Avg episode reward: [(0, '0.049')]
[2024-06-06 12:07:12,118][02924] Updated weights for policy 0, policy_version 7537 (0.0030)
[2024-06-06 12:07:14,294][02924] Updated weights for policy 0, policy_version 7547 (0.0021)
[2024-06-06 12:07:16,757][02692] Fps is (10 sec: 44237.3, 60 sec: 48879.0, 300 sec: 49263.6). Total num frames: 123699200. Throughput: 0: 48950.2. Samples: 20692520. Policy #0 lag: (min: 0.0, avg: 7.8, max: 21.0)
[2024-06-06 12:07:16,757][02692] Avg episode reward: [(0, '0.058')]
[2024-06-06 12:07:18,729][02924] Updated weights for policy 0, policy_version 7557 (0.0032)
[2024-06-06 12:07:20,724][02924] Updated weights for policy 0, policy_version 7567 (0.0022)
[2024-06-06 12:07:21,757][02692] Fps is (10 sec: 54067.5, 60 sec: 49152.0, 300 sec: 49485.2). Total num frames: 123994112. Throughput: 0: 49574.2. Samples: 20851440. Policy #0 lag: (min: 0.0, avg: 8.0, max: 21.0)
[2024-06-06 12:07:21,757][02692] Avg episode reward: [(0, '0.058')]
[2024-06-06 12:07:25,373][02924] Updated weights for policy 0, policy_version 7577 (0.0025)
[2024-06-06 12:07:26,757][02692] Fps is (10 sec: 55705.4, 60 sec: 49425.0, 300 sec: 49540.8). Total num frames: 124256256. Throughput: 0: 49735.3. Samples: 21155260. Policy #0 lag: (min: 0.0, avg: 8.6, max: 23.0)
[2024-06-06 12:07:26,757][02692] Avg episode reward: [(0, '0.059')]
[2024-06-06 12:07:26,894][02904] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000007585_124272640.pth...
[2024-06-06 12:07:26,944][02904] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000006860_112394240.pth
[2024-06-06 12:07:27,221][02924] Updated weights for policy 0, policy_version 7587 (0.0032)
[2024-06-06 12:07:31,760][02692] Fps is (10 sec: 44223.7, 60 sec: 49422.6, 300 sec: 49096.0). Total num frames: 124436480. Throughput: 0: 49495.8. Samples: 21443940. Policy #0 lag: (min: 0.0, avg: 7.1, max: 21.0)
[2024-06-06 12:07:31,760][02692] Avg episode reward: [(0, '0.053')]
[2024-06-06 12:07:32,095][02924] Updated weights for policy 0, policy_version 7597 (0.0030)
[2024-06-06 12:07:34,057][02924] Updated weights for policy 0, policy_version 7607 (0.0021)
[2024-06-06 12:07:36,757][02692] Fps is (10 sec: 42597.8, 60 sec: 49151.9, 300 sec: 49152.0). Total num frames: 124682240. Throughput: 0: 49190.5. Samples: 21570560. Policy #0 lag: (min: 0.0, avg: 7.1, max: 21.0)
[2024-06-06 12:07:36,758][02692] Avg episode reward: [(0, '0.056')]
[2024-06-06 12:07:38,723][02924] Updated weights for policy 0, policy_version 7617 (0.0034)
[2024-06-06 12:07:40,640][02924] Updated weights for policy 0, policy_version 7627 (0.0035)
[2024-06-06 12:07:41,757][02692] Fps is (10 sec: 52444.6, 60 sec: 48881.4, 300 sec: 49374.2). Total num frames: 124960768. Throughput: 0: 49175.8. Samples: 21870940. Policy #0 lag: (min: 1.0, avg: 7.5, max: 21.0)
[2024-06-06 12:07:41,757][02692] Avg episode reward: [(0, '0.056')]
[2024-06-06 12:07:45,425][02924] Updated weights for policy 0, policy_version 7637 (0.0034)
[2024-06-06 12:07:46,757][02692] Fps is (10 sec: 55706.6, 60 sec: 49698.1, 300 sec: 49485.7). Total num frames: 125239296. Throughput: 0: 49589.0. Samples: 22176720. Policy #0 lag: (min: 0.0, avg: 7.8, max: 21.0)
[2024-06-06 12:07:46,757][02692] Avg episode reward: [(0, '0.054')]
[2024-06-06 12:07:47,190][02924] Updated weights for policy 0, policy_version 7647 (0.0029)
[2024-06-06 12:07:51,757][02692] Fps is (10 sec: 44236.7, 60 sec: 49152.0, 300 sec: 49040.9). Total num frames: 125403136. Throughput: 0: 49171.2. Samples: 22322160. Policy #0 lag: (min: 0.0, avg: 7.8, max: 21.0)
[2024-06-06 12:07:51,757][02692] Avg episode reward: [(0, '0.060')]
[2024-06-06 12:07:52,144][02924] Updated weights for policy 0, policy_version 7657 (0.0025)
[2024-06-06 12:07:52,651][02904] Signal inference workers to stop experience collection... (400 times)
[2024-06-06 12:07:52,651][02904] Signal inference workers to resume experience collection... (400 times)
[2024-06-06 12:07:52,663][02924] InferenceWorker_p0-w0: stopping experience collection (400 times)
[2024-06-06 12:07:52,675][02924] InferenceWorker_p0-w0: resuming experience collection (400 times)
[2024-06-06 12:07:53,850][02924] Updated weights for policy 0, policy_version 7667 (0.0022)
[2024-06-06 12:07:56,757][02692] Fps is (10 sec: 45874.3, 60 sec: 49425.0, 300 sec: 49318.6). Total num frames: 125698048. Throughput: 0: 49076.3. Samples: 22611280. Policy #0 lag: (min: 0.0, avg: 7.2, max: 21.0)
[2024-06-06 12:07:56,758][02692] Avg episode reward: [(0, '0.065')]
[2024-06-06 12:07:56,771][02904] Saving new best policy, reward=0.065!
[2024-06-06 12:07:58,712][02924] Updated weights for policy 0, policy_version 7677 (0.0038)
[2024-06-06 12:08:00,723][02924] Updated weights for policy 0, policy_version 7687 (0.0021)
[2024-06-06 12:08:01,757][02692] Fps is (10 sec: 55705.4, 60 sec: 49152.1, 300 sec: 49374.1). Total num frames: 125960192. Throughput: 0: 49048.0. Samples: 22899680. Policy #0 lag: (min: 2.0, avg: 7.6, max: 22.0)
[2024-06-06 12:08:01,757][02692] Avg episode reward: [(0, '0.056')]
[2024-06-06 12:08:05,476][02924] Updated weights for policy 0, policy_version 7697 (0.0029)
[2024-06-06 12:08:06,757][02692] Fps is (10 sec: 52429.7, 60 sec: 49425.2, 300 sec: 49540.8). Total num frames: 126222336. Throughput: 0: 49052.0. Samples: 23058780. Policy #0 lag: (min: 0.0, avg: 8.8, max: 23.0)
[2024-06-06 12:08:06,757][02692] Avg episode reward: [(0, '0.062')]
[2024-06-06 12:08:07,617][02924] Updated weights for policy 0, policy_version 7707 (0.0034)
[2024-06-06 12:08:11,757][02692] Fps is (10 sec: 45875.1, 60 sec: 49425.1, 300 sec: 49207.5). Total num frames: 126418944. Throughput: 0: 48817.8. Samples: 23352060. Policy #0 lag: (min: 0.0, avg: 8.8, max: 23.0)
[2024-06-06 12:08:11,758][02692] Avg episode reward: [(0, '0.059')]
[2024-06-06 12:08:12,024][02924] Updated weights for policy 0, policy_version 7717 (0.0026)
[2024-06-06 12:08:13,982][02924] Updated weights for policy 0, policy_version 7727 (0.0039)
[2024-06-06 12:08:16,757][02692] Fps is (10 sec: 45874.8, 60 sec: 49698.1, 300 sec: 49207.5). Total num frames: 126681088. Throughput: 0: 49054.7. Samples: 23651260. Policy #0 lag: (min: 0.0, avg: 8.6, max: 21.0)
[2024-06-06 12:08:16,758][02692] Avg episode reward: [(0, '0.059')]
[2024-06-06 12:08:18,713][02924] Updated weights for policy 0, policy_version 7737 (0.0028)
[2024-06-06 12:08:20,462][02924] Updated weights for policy 0, policy_version 7747 (0.0031)
[2024-06-06 12:08:21,757][02692] Fps is (10 sec: 52428.6, 60 sec: 49151.9, 300 sec: 49318.6). Total num frames: 126943232. Throughput: 0: 49733.9. Samples: 23808580. Policy #0 lag: (min: 0.0, avg: 8.6, max: 21.0)
[2024-06-06 12:08:21,757][02692] Avg episode reward: [(0, '0.061')]
[2024-06-06 12:08:25,128][02924] Updated weights for policy 0, policy_version 7757 (0.0028)
[2024-06-06 12:08:26,757][02692] Fps is (10 sec: 52427.4, 60 sec: 49151.7, 300 sec: 49540.7). Total num frames: 127205376. Throughput: 0: 49694.7. Samples: 24107220. Policy #0 lag: (min: 1.0, avg: 10.1, max: 21.0)
[2024-06-06 12:08:26,758][02692] Avg episode reward: [(0, '0.062')]
[2024-06-06 12:08:27,379][02924] Updated weights for policy 0, policy_version 7767 (0.0032)
[2024-06-06 12:08:31,757][02692] Fps is (10 sec: 45875.6, 60 sec: 49427.5, 300 sec: 49152.0). Total num frames: 127401984. Throughput: 0: 49418.2. Samples: 24400540. Policy #0 lag: (min: 1.0, avg: 10.1, max: 21.0)
[2024-06-06 12:08:31,757][02692] Avg episode reward: [(0, '0.061')]
[2024-06-06 12:08:31,922][02924] Updated weights for policy 0, policy_version 7777 (0.0032)
[2024-06-06 12:08:34,256][02924] Updated weights for policy 0, policy_version 7787 (0.0029)
[2024-06-06 12:08:36,760][02692] Fps is (10 sec: 45863.0, 60 sec: 49695.8, 300 sec: 49262.6). Total num frames: 127664128. Throughput: 0: 49229.6. Samples: 24537640. Policy #0 lag: (min: 0.0, avg: 10.0, max: 23.0)
[2024-06-06 12:08:36,761][02692] Avg episode reward: [(0, '0.063')]
[2024-06-06 12:08:38,621][02924] Updated weights for policy 0, policy_version 7797 (0.0032)
[2024-06-06 12:08:39,346][02904] Signal inference workers to stop experience collection... (450 times)
[2024-06-06 12:08:39,374][02924] InferenceWorker_p0-w0: stopping experience collection (450 times)
[2024-06-06 12:08:39,457][02904] Signal inference workers to resume experience collection... (450 times)
[2024-06-06 12:08:39,457][02924] InferenceWorker_p0-w0: resuming experience collection (450 times)
[2024-06-06 12:08:40,729][02924] Updated weights for policy 0, policy_version 7807 (0.0027)
[2024-06-06 12:08:41,757][02692] Fps is (10 sec: 52428.5, 60 sec: 49425.0, 300 sec: 49374.1). Total num frames: 127926272. Throughput: 0: 49343.2. Samples: 24831720. Policy #0 lag: (min: 1.0, avg: 10.8, max: 21.0)
[2024-06-06 12:08:41,757][02692] Avg episode reward: [(0, '0.059')]
[2024-06-06 12:08:45,190][02924] Updated weights for policy 0, policy_version 7817 (0.0029)
[2024-06-06 12:08:46,757][02692] Fps is (10 sec: 52444.2, 60 sec: 49151.9, 300 sec: 49485.2). Total num frames: 128188416. Throughput: 0: 49705.7. Samples: 25136440. Policy #0 lag: (min: 2.0, avg: 11.1, max: 24.0)
[2024-06-06 12:08:46,758][02692] Avg episode reward: [(0, '0.056')]
[2024-06-06 12:08:47,343][02924] Updated weights for policy 0, policy_version 7827 (0.0030)
[2024-06-06 12:08:51,757][02692] Fps is (10 sec: 45875.1, 60 sec: 49698.1, 300 sec: 49152.0). Total num frames: 128385024. Throughput: 0: 49205.3. Samples: 25273020. Policy #0 lag: (min: 2.0, avg: 11.1, max: 24.0)
[2024-06-06 12:08:51,757][02692] Avg episode reward: [(0, '0.057')]
[2024-06-06 12:08:51,801][02924] Updated weights for policy 0, policy_version 7837 (0.0040)
[2024-06-06 12:08:54,239][02924] Updated weights for policy 0, policy_version 7847 (0.0025)
[2024-06-06 12:08:56,757][02692] Fps is (10 sec: 45875.6, 60 sec: 49152.2, 300 sec: 49207.6). Total num frames: 128647168. Throughput: 0: 49324.9. Samples: 25571680. Policy #0 lag: (min: 1.0, avg: 10.9, max: 26.0)
[2024-06-06 12:08:56,757][02692] Avg episode reward: [(0, '0.059')]
[2024-06-06 12:08:58,272][02924] Updated weights for policy 0, policy_version 7857 (0.0022)
[2024-06-06 12:09:00,910][02924] Updated weights for policy 0, policy_version 7867 (0.0028)
[2024-06-06 12:09:01,757][02692] Fps is (10 sec: 52429.0, 60 sec: 49152.0, 300 sec: 49374.1). Total num frames: 128909312. Throughput: 0: 49192.5. Samples: 25864920. Policy #0 lag: (min: 1.0, avg: 11.8, max: 26.0)
[2024-06-06 12:09:01,758][02692] Avg episode reward: [(0, '0.061')]
[2024-06-06 12:09:05,298][02924] Updated weights for policy 0, policy_version 7877 (0.0025)
[2024-06-06 12:09:06,757][02692] Fps is (10 sec: 54067.2, 60 sec: 49425.1, 300 sec: 49540.8). Total num frames: 129187840. Throughput: 0: 49208.1. Samples: 26022940. Policy #0 lag: (min: 0.0, avg: 12.3, max: 25.0)
[2024-06-06 12:09:06,757][02692] Avg episode reward: [(0, '0.053')]
[2024-06-06 12:09:07,372][02924] Updated weights for policy 0, policy_version 7887 (0.0031)
[2024-06-06 12:09:11,757][02692] Fps is (10 sec: 45875.4, 60 sec: 49152.0, 300 sec: 49152.0). Total num frames: 129368064. Throughput: 0: 49277.3. Samples: 26324680. Policy #0 lag: (min: 0.0, avg: 12.3, max: 25.0)
[2024-06-06 12:09:11,757][02692] Avg episode reward: [(0, '0.054')]
[2024-06-06 12:09:11,797][02924] Updated weights for policy 0, policy_version 7897 (0.0030)
[2024-06-06 12:09:13,949][02924] Updated weights for policy 0, policy_version 7907 (0.0026)
[2024-06-06 12:09:16,760][02692] Fps is (10 sec: 45861.4, 60 sec: 49422.7, 300 sec: 49207.0). Total num frames: 129646592. Throughput: 0: 49070.5. Samples: 26608860. Policy #0 lag: (min: 0.0, avg: 12.0, max: 23.0)
[2024-06-06 12:09:16,761][02692] Avg episode reward: [(0, '0.063')]
[2024-06-06 12:09:18,619][02924] Updated weights for policy 0, policy_version 7917 (0.0036)
[2024-06-06 12:09:20,719][02924] Updated weights for policy 0, policy_version 7927 (0.0034)
[2024-06-06 12:09:21,757][02692] Fps is (10 sec: 52428.8, 60 sec: 49152.1, 300 sec: 49318.7). Total num frames: 129892352. Throughput: 0: 49438.0. Samples: 26762200. Policy #0 lag: (min: 2.0, avg: 12.9, max: 24.0)
[2024-06-06 12:09:21,757][02692] Avg episode reward: [(0, '0.060')]
[2024-06-06 12:09:25,068][02924] Updated weights for policy 0, policy_version 7937 (0.0027)
[2024-06-06 12:09:26,757][02692] Fps is (10 sec: 52443.7, 60 sec: 49425.2, 300 sec: 49540.8). Total num frames: 130170880. Throughput: 0: 49682.1. Samples: 27067420. Policy #0 lag: (min: 2.0, avg: 12.9, max: 24.0)
[2024-06-06 12:09:26,758][02692] Avg episode reward: [(0, '0.059')]
[2024-06-06 12:09:26,767][02904] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000007945_130170880.pth...
[2024-06-06 12:09:26,810][02904] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000007223_118341632.pth
[2024-06-06 12:09:27,213][02924] Updated weights for policy 0, policy_version 7947 (0.0033)
[2024-06-06 12:09:31,757][02692] Fps is (10 sec: 45875.1, 60 sec: 49152.0, 300 sec: 49152.0). Total num frames: 130351104. Throughput: 0: 49393.4. Samples: 27359140. Policy #0 lag: (min: 0.0, avg: 12.9, max: 29.0)
[2024-06-06 12:09:31,757][02692] Avg episode reward: [(0, '0.062')]
[2024-06-06 12:09:31,766][02924] Updated weights for policy 0, policy_version 7957 (0.0027)
[2024-06-06 12:09:33,625][02904] Signal inference workers to stop experience collection... (500 times)
[2024-06-06 12:09:33,655][02924] InferenceWorker_p0-w0: stopping experience collection (500 times)
[2024-06-06 12:09:33,682][02904] Signal inference workers to resume experience collection... (500 times)
[2024-06-06 12:09:33,682][02924] InferenceWorker_p0-w0: resuming experience collection (500 times)
[2024-06-06 12:09:33,831][02924] Updated weights for policy 0, policy_version 7967 (0.0027)
[2024-06-06 12:09:36,757][02692] Fps is (10 sec: 45875.7, 60 sec: 49427.5, 300 sec: 49207.5). Total num frames: 130629632. Throughput: 0: 49172.0. Samples: 27485760. Policy #0 lag: (min: 1.0, avg: 13.1, max: 21.0)
[2024-06-06 12:09:36,757][02692] Avg episode reward: [(0, '0.061')]
[2024-06-06 12:09:38,423][02924] Updated weights for policy 0, policy_version 7977 (0.0026)
[2024-06-06 12:09:40,731][02924] Updated weights for policy 0, policy_version 7987 (0.0034)
[2024-06-06 12:09:41,757][02692] Fps is (10 sec: 52428.8, 60 sec: 49152.0, 300 sec: 49318.6). Total num frames: 130875392. Throughput: 0: 49236.4. Samples: 27787320. Policy #0 lag: (min: 0.0, avg: 13.3, max: 25.0)
[2024-06-06 12:09:41,757][02692] Avg episode reward: [(0, '0.059')]
[2024-06-06 12:09:45,259][02924] Updated weights for policy 0, policy_version 7997 (0.0023)
[2024-06-06 12:09:46,757][02692] Fps is (10 sec: 52429.2, 60 sec: 49425.1, 300 sec: 49540.8). Total num frames: 131153920. Throughput: 0: 49397.8. Samples: 28087820. Policy #0 lag: (min: 0.0, avg: 13.3, max: 25.0)
[2024-06-06 12:09:46,757][02692] Avg episode reward: [(0, '0.054')]
[2024-06-06 12:09:47,390][02924] Updated weights for policy 0, policy_version 8007 (0.0028)
[2024-06-06 12:09:51,692][02924] Updated weights for policy 0, policy_version 8017 (0.0026)
[2024-06-06 12:09:51,757][02692] Fps is (10 sec: 47513.0, 60 sec: 49425.0, 300 sec: 49207.5). Total num frames: 131350528. Throughput: 0: 49186.1. Samples: 28236320. Policy #0 lag: (min: 0.0, avg: 11.4, max: 20.0)
[2024-06-06 12:09:51,757][02692] Avg episode reward: [(0, '0.063')]
[2024-06-06 12:09:53,994][02924] Updated weights for policy 0, policy_version 8027 (0.0035)
[2024-06-06 12:09:56,757][02692] Fps is (10 sec: 45874.7, 60 sec: 49425.0, 300 sec: 49207.5). Total num frames: 131612672. Throughput: 0: 48956.8. Samples: 28527740. Policy #0 lag: (min: 0.0, avg: 10.5, max: 21.0)
[2024-06-06 12:09:56,758][02692] Avg episode reward: [(0, '0.060')]
[2024-06-06 12:09:58,374][02924] Updated weights for policy 0, policy_version 8037 (0.0031)
[2024-06-06 12:10:00,538][02924] Updated weights for policy 0, policy_version 8047 (0.0034)
[2024-06-06 12:10:01,757][02692] Fps is (10 sec: 50791.0, 60 sec: 49152.0, 300 sec: 49318.6). Total num frames: 131858432. Throughput: 0: 49131.3. Samples: 28819620. Policy #0 lag: (min: 0.0, avg: 10.5, max: 21.0)
[2024-06-06 12:10:01,757][02692] Avg episode reward: [(0, '0.063')]
[2024-06-06 12:10:04,890][02924] Updated weights for policy 0, policy_version 8057 (0.0037)
[2024-06-06 12:10:06,757][02692] Fps is (10 sec: 52429.0, 60 sec: 49151.9, 300 sec: 49485.2). Total num frames: 132136960. Throughput: 0: 49293.7. Samples: 28980420. Policy #0 lag: (min: 1.0, avg: 10.3, max: 21.0)
[2024-06-06 12:10:06,757][02692] Avg episode reward: [(0, '0.062')]
[2024-06-06 12:10:07,195][02924] Updated weights for policy 0, policy_version 8067 (0.0031)
[2024-06-06 12:10:11,550][02924] Updated weights for policy 0, policy_version 8077 (0.0026)
[2024-06-06 12:10:11,757][02692] Fps is (10 sec: 47513.0, 60 sec: 49425.0, 300 sec: 49152.1). Total num frames: 132333568. Throughput: 0: 49040.1. Samples: 29274220. Policy #0 lag: (min: 0.0, avg: 10.6, max: 22.0)
[2024-06-06 12:10:11,757][02692] Avg episode reward: [(0, '0.058')]
[2024-06-06 12:10:13,873][02924] Updated weights for policy 0, policy_version 8087 (0.0029)
[2024-06-06 12:10:16,757][02692] Fps is (10 sec: 45875.4, 60 sec: 49154.5, 300 sec: 49152.1). Total num frames: 132595712. Throughput: 0: 49292.4. Samples: 29577300. Policy #0 lag: (min: 0.0, avg: 10.3, max: 21.0)
[2024-06-06 12:10:16,757][02692] Avg episode reward: [(0, '0.064')]
[2024-06-06 12:10:17,972][02924] Updated weights for policy 0, policy_version 8097 (0.0028)
[2024-06-06 12:10:20,348][02924] Updated weights for policy 0, policy_version 8107 (0.0036)
[2024-06-06 12:10:21,757][02692] Fps is (10 sec: 52429.5, 60 sec: 49425.1, 300 sec: 49263.1). Total num frames: 132857856. Throughput: 0: 49848.1. Samples: 29728920. Policy #0 lag: (min: 0.0, avg: 10.3, max: 21.0)
[2024-06-06 12:10:21,757][02692] Avg episode reward: [(0, '0.057')]
[2024-06-06 12:10:24,663][02924] Updated weights for policy 0, policy_version 8117 (0.0025)
[2024-06-06 12:10:26,757][02692] Fps is (10 sec: 55704.7, 60 sec: 49698.1, 300 sec: 49540.8). Total num frames: 133152768. Throughput: 0: 49914.5. Samples: 30033480. Policy #0 lag: (min: 0.0, avg: 9.7, max: 21.0)
[2024-06-06 12:10:26,759][02924] Updated weights for policy 0, policy_version 8127 (0.0028)
[2024-06-06 12:10:26,766][02692] Avg episode reward: [(0, '0.059')]
[2024-06-06 12:10:31,248][02924] Updated weights for policy 0, policy_version 8137 (0.0019)
[2024-06-06 12:10:31,757][02692] Fps is (10 sec: 49151.7, 60 sec: 49971.2, 300 sec: 49263.5). Total num frames: 133349376. Throughput: 0: 49935.0. Samples: 30334900. Policy #0 lag: (min: 0.0, avg: 9.6, max: 21.0)
[2024-06-06 12:10:31,757][02692] Avg episode reward: [(0, '0.062')]
[2024-06-06 12:10:33,493][02924] Updated weights for policy 0, policy_version 8147 (0.0032)
[2024-06-06 12:10:36,757][02692] Fps is (10 sec: 42598.6, 60 sec: 49152.0, 300 sec: 49152.0). Total num frames: 133578752. Throughput: 0: 49598.7. Samples: 30468260. Policy #0 lag: (min: 0.0, avg: 9.6, max: 21.0)
[2024-06-06 12:10:36,758][02692] Avg episode reward: [(0, '0.062')]
[2024-06-06 12:10:37,945][02924] Updated weights for policy 0, policy_version 8157 (0.0039)
[2024-06-06 12:10:38,614][02904] Signal inference workers to stop experience collection... (550 times)
[2024-06-06 12:10:38,615][02904] Signal inference workers to resume experience collection... (550 times)
[2024-06-06 12:10:38,664][02924] InferenceWorker_p0-w0: stopping experience collection (550 times)
[2024-06-06 12:10:38,664][02924] InferenceWorker_p0-w0: resuming experience collection (550 times)
[2024-06-06 12:10:40,218][02924] Updated weights for policy 0, policy_version 8167 (0.0023)
[2024-06-06 12:10:41,757][02692] Fps is (10 sec: 50790.5, 60 sec: 49698.1, 300 sec: 49207.6). Total num frames: 133857280. Throughput: 0: 49733.4. Samples: 30765740. Policy #0 lag: (min: 0.0, avg: 7.8, max: 21.0)
[2024-06-06 12:10:41,757][02692] Avg episode reward: [(0, '0.063')]
[2024-06-06 12:10:44,446][02924] Updated weights for policy 0, policy_version 8177 (0.0027)
[2024-06-06 12:10:46,757][02692] Fps is (10 sec: 54067.7, 60 sec: 49425.0, 300 sec: 49485.2). Total num frames: 134119424. Throughput: 0: 49714.6. Samples: 31056780. Policy #0 lag: (min: 1.0, avg: 8.5, max: 21.0)
[2024-06-06 12:10:46,757][02692] Avg episode reward: [(0, '0.062')]
[2024-06-06 12:10:46,789][02924] Updated weights for policy 0, policy_version 8187 (0.0022)
[2024-06-06 12:10:51,022][02924] Updated weights for policy 0, policy_version 8197 (0.0023)
[2024-06-06 12:10:51,757][02692] Fps is (10 sec: 49151.8, 60 sec: 49971.2, 300 sec: 49318.6). Total num frames: 134348800. Throughput: 0: 49774.7. Samples: 31220280. Policy #0 lag: (min: 0.0, avg: 9.1, max: 21.0)
[2024-06-06 12:10:51,758][02692] Avg episode reward: [(0, '0.061')]
[2024-06-06 12:10:53,115][02924] Updated weights for policy 0, policy_version 8207 (0.0033)
[2024-06-06 12:10:56,757][02692] Fps is (10 sec: 45875.4, 60 sec: 49425.2, 300 sec: 49152.0). Total num frames: 134578176. Throughput: 0: 49951.7. Samples: 31522040. Policy #0 lag: (min: 0.0, avg: 9.1, max: 21.0)
[2024-06-06 12:10:56,757][02692] Avg episode reward: [(0, '0.060')]
[2024-06-06 12:10:57,777][02924] Updated weights for policy 0, policy_version 8217 (0.0024)
[2024-06-06 12:11:17,523][02692] Fps is (10 sec: 15897.2, 60 sec: 38275.5, 300 sec: 46500.7). Total num frames: 134758400. Throughput: 0: 34448.7. Samples: 31670600. Policy #0 lag: (min: 1.0, avg: 10.3, max: 22.0)
[2024-06-06 12:11:17,523][02692] Avg episode reward: [(0, '0.069')]
[2024-06-06 12:11:17,537][02692] Fps is (10 sec: 8673.0, 60 sec: 37036.5, 300 sec: 46615.5). Total num frames: 134758400. Throughput: 0: 34809.6. Samples: 31670600. Policy #0 lag: (min: 1.0, avg: 10.3, max: 22.0)
[2024-06-06 12:11:17,537][02692] Avg episode reward: [(0, '0.069')]
[2024-06-06 12:11:17,542][02692] Fps is (10 sec: 0.0, 60 sec: 36859.7, 300 sec: 46518.0). Total num frames: 134758400. Throughput: 0: 33551.0. Samples: 31737380. Policy #0 lag: (min: 1.0, avg: 10.3, max: 22.0)
[2024-06-06 12:11:17,543][02692] Avg episode reward: [(0, '0.069')]
[2024-06-06 12:11:17,543][02692] Fps is (10 sec: 0.0, 60 sec: 35578.7, 300 sec: 46362.6). Total num frames: 134758400. Throughput: 0: 30631.2. Samples: 31737380. Policy #0 lag: (min: 1.0, avg: 10.3, max: 22.0)
[2024-06-06 12:11:17,543][02692] Avg episode reward: [(0, '0.069')]
[2024-06-06 12:11:17,586][02904] Saving new best policy, reward=0.069!
[2024-06-06 12:11:18,006][02924] Updated weights for policy 0, policy_version 8227 (0.0029)
[2024-06-06 12:11:21,759][02692] Fps is (10 sec: 38856.9, 60 sec: 34405.2, 300 sec: 46208.1). Total num frames: 134922240. Throughput: 0: 29791.6. Samples: 31808940. Policy #0 lag: (min: 0.0, avg: 12.0, max: 23.0)
[2024-06-06 12:11:21,760][02692] Avg episode reward: [(0, '0.065')]
[2024-06-06 12:11:22,234][02924] Updated weights for policy 0, policy_version 8237 (0.0024)
[2024-06-06 12:11:24,574][02924] Updated weights for policy 0, policy_version 8247 (0.0020)
[2024-06-06 12:11:26,757][02692] Fps is (10 sec: 48010.8, 60 sec: 34133.4, 300 sec: 46541.7). Total num frames: 135200768. Throughput: 0: 30004.9. Samples: 32115960. Policy #0 lag: (min: 0.0, avg: 10.7, max: 20.0)
[2024-06-06 12:11:26,762][02692] Avg episode reward: [(0, '0.057')]
[2024-06-06 12:11:26,888][02904] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000008253_135217152.pth...
[2024-06-06 12:11:26,946][02904] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000007585_124272640.pth
[2024-06-06 12:11:28,761][02924] Updated weights for policy 0, policy_version 8257 (0.0027)
[2024-06-06 12:11:31,037][02924] Updated weights for policy 0, policy_version 8267 (0.0029)
[2024-06-06 12:11:31,760][02692] Fps is (10 sec: 54062.5, 60 sec: 35223.9, 300 sec: 46541.2). Total num frames: 135462912. Throughput: 0: 30146.5. Samples: 32413460. Policy #0 lag: (min: 0.0, avg: 10.7, max: 20.0)
[2024-06-06 12:11:31,760][02692] Avg episode reward: [(0, '0.070')]
[2024-06-06 12:11:35,286][02924] Updated weights for policy 0, policy_version 8277 (0.0027)
[2024-06-06 12:11:36,757][02692] Fps is (10 sec: 52429.1, 60 sec: 35771.8, 300 sec: 46431.1). Total num frames: 135725056. Throughput: 0: 30183.2. Samples: 32578520. Policy #0 lag: (min: 0.0, avg: 10.5, max: 21.0)
[2024-06-06 12:11:36,757][02692] Avg episode reward: [(0, '0.063')]
[2024-06-06 12:11:37,470][02924] Updated weights for policy 0, policy_version 8287 (0.0022)
[2024-06-06 12:11:41,757][02692] Fps is (10 sec: 45888.6, 60 sec: 34406.4, 300 sec: 46319.5). Total num frames: 135921664. Throughput: 0: 29943.1. Samples: 32869480. Policy #0 lag: (min: 0.0, avg: 10.9, max: 22.0)
[2024-06-06 12:11:41,757][02692] Avg episode reward: [(0, '0.067')]
[2024-06-06 12:11:41,883][02924] Updated weights for policy 0, policy_version 8297 (0.0029)
[2024-06-06 12:11:44,196][02924] Updated weights for policy 0, policy_version 8307 (0.0027)
[2024-06-06 12:11:46,757][02692] Fps is (10 sec: 47513.3, 60 sec: 34679.5, 300 sec: 46597.2). Total num frames: 136200192. Throughput: 0: 51249.2. Samples: 33168840. Policy #0 lag: (min: 0.0, avg: 10.3, max: 20.0)
[2024-06-06 12:11:46,757][02692] Avg episode reward: [(0, '0.066')]
[2024-06-06 12:11:48,359][02924] Updated weights for policy 0, policy_version 8317 (0.0030)
[2024-06-06 12:11:50,797][02924] Updated weights for policy 0, policy_version 8327 (0.0027)
[2024-06-06 12:11:51,703][02904] Signal inference workers to stop experience collection... (600 times)
[2024-06-06 12:11:51,743][02924] InferenceWorker_p0-w0: stopping experience collection (600 times)
[2024-06-06 12:11:51,749][02904] Signal inference workers to resume experience collection... (600 times)
[2024-06-06 12:11:51,757][02924] InferenceWorker_p0-w0: resuming experience collection (600 times)
[2024-06-06 12:11:51,757][02692] Fps is (10 sec: 54067.4, 60 sec: 35225.6, 300 sec: 46541.7). Total num frames: 136462336. Throughput: 0: 48039.1. Samples: 33314500. Policy #0 lag: (min: 0.0, avg: 10.3, max: 20.0)
[2024-06-06 12:11:51,762][02692] Avg episode reward: [(0, '0.069')]
[2024-06-06 12:11:54,889][02924] Updated weights for policy 0, policy_version 8337 (0.0030)
[2024-06-06 12:11:56,757][02692] Fps is (10 sec: 50790.8, 60 sec: 35498.7, 300 sec: 46430.6). Total num frames: 136708096. Throughput: 0: 48044.6. Samples: 33621420. Policy #0 lag: (min: 0.0, avg: 9.6, max: 21.0)
[2024-06-06 12:11:56,757][02692] Avg episode reward: [(0, '0.065')]
[2024-06-06 12:11:57,279][02924] Updated weights for policy 0, policy_version 8347 (0.0022)
[2024-06-06 12:12:01,591][02924] Updated weights for policy 0, policy_version 8357 (0.0032)
[2024-06-06 12:12:01,757][02692] Fps is (10 sec: 47513.6, 60 sec: 49262.0, 300 sec: 46375.1). Total num frames: 136937472. Throughput: 0: 49378.6. Samples: 33920600. Policy #0 lag: (min: 0.0, avg: 9.1, max: 24.0)
[2024-06-06 12:12:01,757][02692] Avg episode reward: [(0, '0.067')]
[2024-06-06 12:12:03,823][02924] Updated weights for policy 0, policy_version 8367 (0.0029)
[2024-06-06 12:12:06,757][02692] Fps is (10 sec: 47513.4, 60 sec: 49265.1, 300 sec: 46541.7). Total num frames: 137183232. Throughput: 0: 49911.2. Samples: 34054840. Policy #0 lag: (min: 0.0, avg: 9.1, max: 24.0)
[2024-06-06 12:12:06,757][02692] Avg episode reward: [(0, '0.069')]
[2024-06-06 12:12:08,091][02924] Updated weights for policy 0, policy_version 8377 (0.0025)
[2024-06-06 12:12:10,631][02924] Updated weights for policy 0, policy_version 8387 (0.0027)
[2024-06-06 12:12:11,757][02692] Fps is (10 sec: 52428.6, 60 sec: 49864.2, 300 sec: 46652.7). Total num frames: 137461760. Throughput: 0: 49783.5. Samples: 34356220. Policy #0 lag: (min: 0.0, avg: 8.0, max: 21.0)
[2024-06-06 12:12:11,757][02692] Avg episode reward: [(0, '0.063')]
[2024-06-06 12:12:14,692][02924] Updated weights for policy 0, policy_version 8397 (0.0033)
[2024-06-06 12:12:16,757][02692] Fps is (10 sec: 50790.2, 60 sec: 49527.8, 300 sec: 46430.6). Total num frames: 137691136. Throughput: 0: 49810.3. Samples: 34654780. Policy #0 lag: (min: 0.0, avg: 9.9, max: 23.0)
[2024-06-06 12:12:16,757][02692] Avg episode reward: [(0, '0.066')]
[2024-06-06 12:12:17,258][02924] Updated weights for policy 0, policy_version 8407 (0.0032)
[2024-06-06 12:12:21,177][02924] Updated weights for policy 0, policy_version 8417 (0.0037)
[2024-06-06 12:12:21,757][02692] Fps is (10 sec: 45875.6, 60 sec: 49973.0, 300 sec: 46319.5). Total num frames: 137920512. Throughput: 0: 49573.3. Samples: 34809320. Policy #0 lag: (min: 0.0, avg: 9.9, max: 23.0)
[2024-06-06 12:12:21,757][02692] Avg episode reward: [(0, '0.068')]
[2024-06-06 12:12:23,659][02924] Updated weights for policy 0, policy_version 8427 (0.0028)
[2024-06-06 12:12:26,757][02692] Fps is (10 sec: 49152.2, 60 sec: 49698.2, 300 sec: 46597.7). Total num frames: 138182656. Throughput: 0: 49648.0. Samples: 35103640. Policy #0 lag: (min: 0.0, avg: 9.2, max: 20.0)
[2024-06-06 12:12:26,757][02692] Avg episode reward: [(0, '0.071')]
[2024-06-06 12:12:26,762][02904] Saving new best policy, reward=0.071!
[2024-06-06 12:12:27,883][02924] Updated weights for policy 0, policy_version 8437 (0.0025)
[2024-06-06 12:12:30,292][02924] Updated weights for policy 0, policy_version 8447 (0.0037)
[2024-06-06 12:12:31,757][02692] Fps is (10 sec: 52428.1, 60 sec: 49700.5, 300 sec: 46652.8). Total num frames: 138444800. Throughput: 0: 49453.7. Samples: 35394260. Policy #0 lag: (min: 0.0, avg: 11.5, max: 24.0)
[2024-06-06 12:12:31,758][02692] Avg episode reward: [(0, '0.068')]
[2024-06-06 12:12:34,495][02924] Updated weights for policy 0, policy_version 8457 (0.0028)
[2024-06-06 12:12:36,757][02692] Fps is (10 sec: 50789.9, 60 sec: 49425.0, 300 sec: 46541.6). Total num frames: 138690560. Throughput: 0: 49563.5. Samples: 35544860. Policy #0 lag: (min: 0.0, avg: 10.6, max: 23.0)
[2024-06-06 12:12:36,758][02692] Avg episode reward: [(0, '0.067')]
[2024-06-06 12:12:37,268][02924] Updated weights for policy 0, policy_version 8467 (0.0022)
[2024-06-06 12:12:41,187][02924] Updated weights for policy 0, policy_version 8477 (0.0028)
[2024-06-06 12:12:41,757][02692] Fps is (10 sec: 44237.3, 60 sec: 49425.1, 300 sec: 46264.0). Total num frames: 138887168. Throughput: 0: 49263.5. Samples: 35838280. Policy #0 lag: (min: 0.0, avg: 10.6, max: 23.0)
[2024-06-06 12:12:41,757][02692] Avg episode reward: [(0, '0.058')]
[2024-06-06 12:12:43,956][02924] Updated weights for policy 0, policy_version 8487 (0.0027)
[2024-06-06 12:12:46,757][02692] Fps is (10 sec: 47513.9, 60 sec: 49425.1, 300 sec: 46652.7). Total num frames: 139165696. Throughput: 0: 49034.6. Samples: 36127160.
Policy #0 lag: (min: 0.0, avg: 11.0, max: 22.0) [2024-06-06 12:12:46,757][02692] Avg episode reward: [(0, '0.068')] [2024-06-06 12:12:47,821][02924] Updated weights for policy 0, policy_version 8497 (0.0029) [2024-06-06 12:12:50,120][02904] Signal inference workers to stop experience collection... (650 times) [2024-06-06 12:12:50,164][02924] InferenceWorker_p0-w0: stopping experience collection (650 times) [2024-06-06 12:12:50,227][02904] Signal inference workers to resume experience collection... (650 times) [2024-06-06 12:12:50,227][02924] InferenceWorker_p0-w0: resuming experience collection (650 times) [2024-06-06 12:12:50,545][02924] Updated weights for policy 0, policy_version 8507 (0.0029) [2024-06-06 12:12:51,757][02692] Fps is (10 sec: 52428.6, 60 sec: 49152.0, 300 sec: 46486.2). Total num frames: 139411456. Throughput: 0: 49364.9. Samples: 36276260. Policy #0 lag: (min: 0.0, avg: 11.1, max: 20.0) [2024-06-06 12:12:51,757][02692] Avg episode reward: [(0, '0.070')] [2024-06-06 12:12:54,324][02924] Updated weights for policy 0, policy_version 8517 (0.0036) [2024-06-06 12:12:56,757][02692] Fps is (10 sec: 50790.4, 60 sec: 49425.0, 300 sec: 46486.1). Total num frames: 139673600. Throughput: 0: 49349.8. Samples: 36576960. Policy #0 lag: (min: 0.0, avg: 11.1, max: 20.0) [2024-06-06 12:12:56,757][02692] Avg episode reward: [(0, '0.072')] [2024-06-06 12:12:57,097][02924] Updated weights for policy 0, policy_version 8527 (0.0026) [2024-06-06 12:13:01,074][02924] Updated weights for policy 0, policy_version 8537 (0.0030) [2024-06-06 12:13:01,757][02692] Fps is (10 sec: 45875.1, 60 sec: 48878.9, 300 sec: 46264.0). Total num frames: 139870208. Throughput: 0: 49160.4. Samples: 36867000. 
Policy #0 lag: (min: 0.0, avg: 10.9, max: 21.0) [2024-06-06 12:13:01,757][02692] Avg episode reward: [(0, '0.064')] [2024-06-06 12:13:03,955][02924] Updated weights for policy 0, policy_version 8547 (0.0028) [2024-06-06 12:13:06,757][02692] Fps is (10 sec: 45875.3, 60 sec: 49152.0, 300 sec: 46486.1). Total num frames: 140132352. Throughput: 0: 48951.9. Samples: 37012160. Policy #0 lag: (min: 0.0, avg: 10.3, max: 21.0) [2024-06-06 12:13:06,757][02692] Avg episode reward: [(0, '0.066')] [2024-06-06 12:13:07,463][02924] Updated weights for policy 0, policy_version 8557 (0.0039) [2024-06-06 12:13:10,745][02924] Updated weights for policy 0, policy_version 8567 (0.0032) [2024-06-06 12:13:11,757][02692] Fps is (10 sec: 52429.4, 60 sec: 48879.0, 300 sec: 46486.2). Total num frames: 140394496. Throughput: 0: 48922.7. Samples: 37305160. Policy #0 lag: (min: 0.0, avg: 10.3, max: 21.0) [2024-06-06 12:13:11,757][02692] Avg episode reward: [(0, '0.065')] [2024-06-06 12:13:13,926][02924] Updated weights for policy 0, policy_version 8577 (0.0027) [2024-06-06 12:13:16,757][02692] Fps is (10 sec: 54066.8, 60 sec: 49698.1, 300 sec: 46541.7). Total num frames: 140673024. Throughput: 0: 49328.0. Samples: 37614020. Policy #0 lag: (min: 1.0, avg: 9.6, max: 21.0) [2024-06-06 12:13:16,758][02692] Avg episode reward: [(0, '0.068')] [2024-06-06 12:13:17,107][02924] Updated weights for policy 0, policy_version 8587 (0.0022) [2024-06-06 12:13:20,627][02924] Updated weights for policy 0, policy_version 8597 (0.0030) [2024-06-06 12:13:21,757][02692] Fps is (10 sec: 47513.3, 60 sec: 49152.0, 300 sec: 46319.6). Total num frames: 140869632. Throughput: 0: 49359.7. Samples: 37766040. Policy #0 lag: (min: 0.0, avg: 11.0, max: 21.0) [2024-06-06 12:13:21,757][02692] Avg episode reward: [(0, '0.071')] [2024-06-06 12:13:23,581][02924] Updated weights for policy 0, policy_version 8607 (0.0025) [2024-06-06 12:13:26,757][02692] Fps is (10 sec: 47513.9, 60 sec: 49425.0, 300 sec: 46597.2). 
Total num frames: 141148160. Throughput: 0: 49417.7. Samples: 38062080. Policy #0 lag: (min: 0.0, avg: 11.0, max: 21.0) [2024-06-06 12:13:26,760][02692] Avg episode reward: [(0, '0.070')] [2024-06-06 12:13:26,770][02904] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000008615_141148160.pth... [2024-06-06 12:13:26,822][02904] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000007945_130170880.pth [2024-06-06 12:13:27,333][02924] Updated weights for policy 0, policy_version 8617 (0.0035) [2024-06-06 12:13:30,330][02924] Updated weights for policy 0, policy_version 8627 (0.0022) [2024-06-06 12:13:31,757][02692] Fps is (10 sec: 52427.9, 60 sec: 49151.9, 300 sec: 46542.1). Total num frames: 141393920. Throughput: 0: 49573.6. Samples: 38357980. Policy #0 lag: (min: 0.0, avg: 6.9, max: 19.0) [2024-06-06 12:13:31,758][02692] Avg episode reward: [(0, '0.069')] [2024-06-06 12:13:33,899][02924] Updated weights for policy 0, policy_version 8637 (0.0032) [2024-06-06 12:13:36,757][02692] Fps is (10 sec: 50790.8, 60 sec: 49425.2, 300 sec: 46541.7). Total num frames: 141656064. Throughput: 0: 49526.3. Samples: 38504940. Policy #0 lag: (min: 1.0, avg: 10.8, max: 24.0) [2024-06-06 12:13:36,757][02692] Avg episode reward: [(0, '0.063')] [2024-06-06 12:13:36,931][02924] Updated weights for policy 0, policy_version 8647 (0.0033) [2024-06-06 12:13:40,270][02924] Updated weights for policy 0, policy_version 8657 (0.0026) [2024-06-06 12:13:41,757][02692] Fps is (10 sec: 49153.1, 60 sec: 49971.2, 300 sec: 46430.6). Total num frames: 141885440. Throughput: 0: 49541.9. Samples: 38806340. Policy #0 lag: (min: 1.0, avg: 10.8, max: 24.0) [2024-06-06 12:13:41,757][02692] Avg episode reward: [(0, '0.069')] [2024-06-06 12:13:43,399][02924] Updated weights for policy 0, policy_version 8667 (0.0030) [2024-06-06 12:13:46,757][02692] Fps is (10 sec: 49151.8, 60 sec: 49698.2, 300 sec: 46652.8). Total num frames: 142147584. Throughput: 0: 49788.5. 
Samples: 39107480. Policy #0 lag: (min: 0.0, avg: 11.5, max: 21.0) [2024-06-06 12:13:46,757][02692] Avg episode reward: [(0, '0.069')] [2024-06-06 12:13:47,006][02924] Updated weights for policy 0, policy_version 8677 (0.0031) [2024-06-06 12:13:50,002][02924] Updated weights for policy 0, policy_version 8687 (0.0031) [2024-06-06 12:13:51,757][02692] Fps is (10 sec: 50789.9, 60 sec: 49698.1, 300 sec: 46597.2). Total num frames: 142393344. Throughput: 0: 49905.8. Samples: 39257920. Policy #0 lag: (min: 1.0, avg: 11.3, max: 21.0) [2024-06-06 12:13:51,757][02692] Avg episode reward: [(0, '0.066')] [2024-06-06 12:13:53,868][02924] Updated weights for policy 0, policy_version 8697 (0.0030) [2024-06-06 12:13:56,661][02924] Updated weights for policy 0, policy_version 8707 (0.0029) [2024-06-06 12:13:56,757][02692] Fps is (10 sec: 50790.8, 60 sec: 49698.2, 300 sec: 46597.2). Total num frames: 142655488. Throughput: 0: 50012.9. Samples: 39555740. Policy #0 lag: (min: 1.0, avg: 11.3, max: 21.0) [2024-06-06 12:13:56,757][02692] Avg episode reward: [(0, '0.072')] [2024-06-06 12:14:00,478][02924] Updated weights for policy 0, policy_version 8717 (0.0028) [2024-06-06 12:14:01,251][02904] Signal inference workers to stop experience collection... (700 times) [2024-06-06 12:14:01,279][02924] InferenceWorker_p0-w0: stopping experience collection (700 times) [2024-06-06 12:14:01,316][02904] Signal inference workers to resume experience collection... (700 times) [2024-06-06 12:14:01,316][02924] InferenceWorker_p0-w0: resuming experience collection (700 times) [2024-06-06 12:14:01,757][02692] Fps is (10 sec: 47513.9, 60 sec: 49971.3, 300 sec: 46375.1). Total num frames: 142868480. Throughput: 0: 49822.3. Samples: 39856020. 
Policy #0 lag: (min: 0.0, avg: 11.4, max: 21.0) [2024-06-06 12:14:01,757][02692] Avg episode reward: [(0, '0.068')] [2024-06-06 12:14:03,258][02924] Updated weights for policy 0, policy_version 8727 (0.0036) [2024-06-06 12:14:06,757][02692] Fps is (10 sec: 47513.3, 60 sec: 49971.2, 300 sec: 46652.7). Total num frames: 143130624. Throughput: 0: 49518.7. Samples: 39994380. Policy #0 lag: (min: 0.0, avg: 10.9, max: 22.0) [2024-06-06 12:14:06,757][02692] Avg episode reward: [(0, '0.072')] [2024-06-06 12:14:06,771][02904] Saving new best policy, reward=0.072! [2024-06-06 12:14:07,019][02924] Updated weights for policy 0, policy_version 8737 (0.0032) [2024-06-06 12:14:09,681][02924] Updated weights for policy 0, policy_version 8747 (0.0027) [2024-06-06 12:14:11,757][02692] Fps is (10 sec: 50790.5, 60 sec: 49698.1, 300 sec: 46542.1). Total num frames: 143376384. Throughput: 0: 49506.3. Samples: 40289860. Policy #0 lag: (min: 0.0, avg: 10.9, max: 22.0) [2024-06-06 12:14:11,757][02692] Avg episode reward: [(0, '0.064')] [2024-06-06 12:14:13,785][02924] Updated weights for policy 0, policy_version 8757 (0.0042) [2024-06-06 12:14:16,411][02924] Updated weights for policy 0, policy_version 8767 (0.0029) [2024-06-06 12:14:16,757][02692] Fps is (10 sec: 50790.1, 60 sec: 49425.1, 300 sec: 46597.2). Total num frames: 143638528. Throughput: 0: 49561.9. Samples: 40588260. Policy #0 lag: (min: 0.0, avg: 10.0, max: 22.0) [2024-06-06 12:14:16,757][02692] Avg episode reward: [(0, '0.074')] [2024-06-06 12:14:16,799][02904] Saving new best policy, reward=0.074! [2024-06-06 12:14:20,631][02924] Updated weights for policy 0, policy_version 8777 (0.0035) [2024-06-06 12:14:21,757][02692] Fps is (10 sec: 49151.9, 60 sec: 49971.2, 300 sec: 46430.6). Total num frames: 143867904. Throughput: 0: 49637.8. Samples: 40738640. 
Policy #0 lag: (min: 0.0, avg: 9.6, max: 21.0) [2024-06-06 12:14:21,757][02692] Avg episode reward: [(0, '0.070')] [2024-06-06 12:14:23,197][02924] Updated weights for policy 0, policy_version 8787 (0.0037) [2024-06-06 12:14:26,757][02692] Fps is (10 sec: 47513.8, 60 sec: 49425.1, 300 sec: 46652.7). Total num frames: 144113664. Throughput: 0: 49556.4. Samples: 41036380. Policy #0 lag: (min: 0.0, avg: 9.6, max: 21.0) [2024-06-06 12:14:26,757][02692] Avg episode reward: [(0, '0.072')] [2024-06-06 12:14:26,989][02924] Updated weights for policy 0, policy_version 8797 (0.0027) [2024-06-06 12:14:29,888][02924] Updated weights for policy 0, policy_version 8807 (0.0026) [2024-06-06 12:14:31,757][02692] Fps is (10 sec: 50790.2, 60 sec: 49698.3, 300 sec: 46597.2). Total num frames: 144375808. Throughput: 0: 49497.3. Samples: 41334860. Policy #0 lag: (min: 0.0, avg: 8.4, max: 20.0) [2024-06-06 12:14:31,757][02692] Avg episode reward: [(0, '0.073')] [2024-06-06 12:14:33,389][02924] Updated weights for policy 0, policy_version 8817 (0.0019) [2024-06-06 12:14:36,247][02924] Updated weights for policy 0, policy_version 8827 (0.0030) [2024-06-06 12:14:36,757][02692] Fps is (10 sec: 50789.5, 60 sec: 49424.9, 300 sec: 46597.2). Total num frames: 144621568. Throughput: 0: 49600.3. Samples: 41489940. Policy #0 lag: (min: 1.0, avg: 10.6, max: 22.0) [2024-06-06 12:14:36,758][02692] Avg episode reward: [(0, '0.074')] [2024-06-06 12:14:40,106][02924] Updated weights for policy 0, policy_version 8837 (0.0023) [2024-06-06 12:14:41,757][02692] Fps is (10 sec: 49151.9, 60 sec: 49698.1, 300 sec: 46486.1). Total num frames: 144867328. Throughput: 0: 49546.5. Samples: 41785340. Policy #0 lag: (min: 1.0, avg: 10.6, max: 22.0) [2024-06-06 12:14:41,757][02692] Avg episode reward: [(0, '0.069')] [2024-06-06 12:14:42,969][02924] Updated weights for policy 0, policy_version 8847 (0.0037) [2024-06-06 12:14:46,757][02692] Fps is (10 sec: 47514.2, 60 sec: 49152.0, 300 sec: 46597.2). 
Total num frames: 145096704. Throughput: 0: 49396.8. Samples: 42078880. Policy #0 lag: (min: 1.0, avg: 11.2, max: 21.0) [2024-06-06 12:14:46,757][02692] Avg episode reward: [(0, '0.072')] [2024-06-06 12:14:46,837][02924] Updated weights for policy 0, policy_version 8857 (0.0029) [2024-06-06 12:14:49,737][02924] Updated weights for policy 0, policy_version 8867 (0.0037) [2024-06-06 12:14:51,757][02692] Fps is (10 sec: 47512.9, 60 sec: 49151.9, 300 sec: 46541.7). Total num frames: 145342464. Throughput: 0: 49561.6. Samples: 42224660. Policy #0 lag: (min: 0.0, avg: 11.0, max: 21.0) [2024-06-06 12:14:51,758][02692] Avg episode reward: [(0, '0.071')] [2024-06-06 12:14:53,358][02924] Updated weights for policy 0, policy_version 8877 (0.0030) [2024-06-06 12:14:56,438][02924] Updated weights for policy 0, policy_version 8887 (0.0027) [2024-06-06 12:14:56,757][02692] Fps is (10 sec: 50789.5, 60 sec: 49151.7, 300 sec: 46597.2). Total num frames: 145604608. Throughput: 0: 49505.9. Samples: 42517640. Policy #0 lag: (min: 0.0, avg: 11.0, max: 21.0) [2024-06-06 12:14:56,758][02692] Avg episode reward: [(0, '0.066')] [2024-06-06 12:14:59,967][02924] Updated weights for policy 0, policy_version 8897 (0.0040) [2024-06-06 12:15:01,203][02904] Signal inference workers to stop experience collection... (750 times) [2024-06-06 12:15:01,203][02904] Signal inference workers to resume experience collection... (750 times) [2024-06-06 12:15:01,249][02924] InferenceWorker_p0-w0: stopping experience collection (750 times) [2024-06-06 12:15:01,249][02924] InferenceWorker_p0-w0: resuming experience collection (750 times) [2024-06-06 12:15:01,757][02692] Fps is (10 sec: 50790.7, 60 sec: 49698.0, 300 sec: 46486.1). Total num frames: 145850368. Throughput: 0: 49744.3. Samples: 42826760. 
Policy #0 lag: (min: 0.0, avg: 11.6, max: 21.0) [2024-06-06 12:15:01,758][02692] Avg episode reward: [(0, '0.074')] [2024-06-06 12:15:02,931][02924] Updated weights for policy 0, policy_version 8907 (0.0025) [2024-06-06 12:15:06,455][02924] Updated weights for policy 0, policy_version 8917 (0.0032) [2024-06-06 12:15:06,757][02692] Fps is (10 sec: 49152.9, 60 sec: 49425.0, 300 sec: 46652.8). Total num frames: 146096128. Throughput: 0: 49411.9. Samples: 42962180. Policy #0 lag: (min: 0.0, avg: 10.2, max: 21.0) [2024-06-06 12:15:06,757][02692] Avg episode reward: [(0, '0.072')] [2024-06-06 12:15:09,661][02924] Updated weights for policy 0, policy_version 8927 (0.0030) [2024-06-06 12:15:11,757][02692] Fps is (10 sec: 50790.9, 60 sec: 49698.1, 300 sec: 46652.7). Total num frames: 146358272. Throughput: 0: 49230.2. Samples: 43251740. Policy #0 lag: (min: 0.0, avg: 10.2, max: 21.0) [2024-06-06 12:15:11,758][02692] Avg episode reward: [(0, '0.075')] [2024-06-06 12:15:13,254][02924] Updated weights for policy 0, policy_version 8937 (0.0023) [2024-06-06 12:15:16,533][02924] Updated weights for policy 0, policy_version 8947 (0.0028) [2024-06-06 12:15:16,757][02692] Fps is (10 sec: 50790.7, 60 sec: 49425.1, 300 sec: 46597.2). Total num frames: 146604032. Throughput: 0: 49283.2. Samples: 43552600. Policy #0 lag: (min: 0.0, avg: 11.2, max: 23.0) [2024-06-06 12:15:16,757][02692] Avg episode reward: [(0, '0.072')] [2024-06-06 12:15:19,644][02924] Updated weights for policy 0, policy_version 8957 (0.0038) [2024-06-06 12:15:21,757][02692] Fps is (10 sec: 47513.4, 60 sec: 49425.0, 300 sec: 46375.1). Total num frames: 146833408. Throughput: 0: 49089.0. Samples: 43698940. 
Policy #0 lag: (min: 0.0, avg: 8.3, max: 20.0) [2024-06-06 12:15:21,757][02692] Avg episode reward: [(0, '0.073')] [2024-06-06 12:15:23,152][02924] Updated weights for policy 0, policy_version 8967 (0.0023) [2024-06-06 12:15:26,280][02924] Updated weights for policy 0, policy_version 8977 (0.0021) [2024-06-06 12:15:26,759][02692] Fps is (10 sec: 47502.1, 60 sec: 49423.1, 300 sec: 46541.3). Total num frames: 147079168. Throughput: 0: 49077.0. Samples: 43993920. Policy #0 lag: (min: 0.0, avg: 8.3, max: 20.0) [2024-06-06 12:15:26,760][02692] Avg episode reward: [(0, '0.069')] [2024-06-06 12:15:26,778][02904] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000008977_147079168.pth... [2024-06-06 12:15:26,819][02904] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000008253_135217152.pth [2024-06-06 12:15:29,739][02924] Updated weights for policy 0, policy_version 8987 (0.0034) [2024-06-06 12:15:31,757][02692] Fps is (10 sec: 50789.9, 60 sec: 49424.9, 300 sec: 46652.7). Total num frames: 147341312. Throughput: 0: 49119.0. Samples: 44289240. Policy #0 lag: (min: 0.0, avg: 10.0, max: 23.0) [2024-06-06 12:15:31,758][02692] Avg episode reward: [(0, '0.071')] [2024-06-06 12:15:32,983][02924] Updated weights for policy 0, policy_version 8997 (0.0033) [2024-06-06 12:15:36,316][02924] Updated weights for policy 0, policy_version 9007 (0.0037) [2024-06-06 12:15:36,757][02692] Fps is (10 sec: 50802.3, 60 sec: 49425.2, 300 sec: 46541.7). Total num frames: 147587072. Throughput: 0: 49407.7. Samples: 44448000. Policy #0 lag: (min: 0.0, avg: 9.0, max: 22.0) [2024-06-06 12:15:36,757][02692] Avg episode reward: [(0, '0.071')] [2024-06-06 12:15:39,717][02924] Updated weights for policy 0, policy_version 9017 (0.0034) [2024-06-06 12:15:41,757][02692] Fps is (10 sec: 47513.9, 60 sec: 49151.9, 300 sec: 46430.6). Total num frames: 147816448. Throughput: 0: 49305.9. Samples: 44736400. 
Policy #0 lag: (min: 0.0, avg: 9.0, max: 22.0) [2024-06-06 12:15:41,758][02692] Avg episode reward: [(0, '0.071')] [2024-06-06 12:15:43,051][02924] Updated weights for policy 0, policy_version 9027 (0.0034) [2024-06-06 12:15:46,183][02924] Updated weights for policy 0, policy_version 9037 (0.0029) [2024-06-06 12:15:46,759][02692] Fps is (10 sec: 47502.8, 60 sec: 49423.2, 300 sec: 46485.8). Total num frames: 148062208. Throughput: 0: 48932.7. Samples: 45028840. Policy #0 lag: (min: 0.0, avg: 10.9, max: 20.0) [2024-06-06 12:15:46,760][02692] Avg episode reward: [(0, '0.073')] [2024-06-06 12:15:49,698][02924] Updated weights for policy 0, policy_version 9047 (0.0027) [2024-06-06 12:15:51,757][02692] Fps is (10 sec: 49152.6, 60 sec: 49425.2, 300 sec: 46541.7). Total num frames: 148307968. Throughput: 0: 49212.1. Samples: 45176720. Policy #0 lag: (min: 0.0, avg: 10.5, max: 21.0) [2024-06-06 12:15:51,757][02692] Avg episode reward: [(0, '0.073')] [2024-06-06 12:15:52,730][02924] Updated weights for policy 0, policy_version 9057 (0.0026) [2024-06-06 12:15:56,280][02924] Updated weights for policy 0, policy_version 9067 (0.0029) [2024-06-06 12:15:56,760][02692] Fps is (10 sec: 50787.1, 60 sec: 49422.8, 300 sec: 49462.3). Total num frames: 148570112. Throughput: 0: 49504.3. Samples: 45479580. Policy #0 lag: (min: 0.0, avg: 10.5, max: 21.0) [2024-06-06 12:15:56,760][02692] Avg episode reward: [(0, '0.066')] [2024-06-06 12:15:59,500][02924] Updated weights for policy 0, policy_version 9077 (0.0030) [2024-06-06 12:16:01,757][02692] Fps is (10 sec: 47513.5, 60 sec: 48879.0, 300 sec: 49344.5). Total num frames: 148783104. Throughput: 0: 49338.2. Samples: 45772820. Policy #0 lag: (min: 0.0, avg: 10.8, max: 23.0) [2024-06-06 12:16:01,757][02692] Avg episode reward: [(0, '0.072')] [2024-06-06 12:16:02,899][02924] Updated weights for policy 0, policy_version 9087 (0.0037) [2024-06-06 12:16:04,019][02904] Signal inference workers to stop experience collection... 
(800 times) [2024-06-06 12:16:04,019][02904] Signal inference workers to resume experience collection... (800 times) [2024-06-06 12:16:04,054][02924] InferenceWorker_p0-w0: stopping experience collection (800 times) [2024-06-06 12:16:04,055][02924] InferenceWorker_p0-w0: resuming experience collection (800 times) [2024-06-06 12:16:06,482][02924] Updated weights for policy 0, policy_version 9097 (0.0029) [2024-06-06 12:16:06,757][02692] Fps is (10 sec: 47527.3, 60 sec: 49152.0, 300 sec: 49398.8). Total num frames: 149045248. Throughput: 0: 49174.2. Samples: 45911780. Policy #0 lag: (min: 0.0, avg: 11.4, max: 21.0) [2024-06-06 12:16:06,758][02692] Avg episode reward: [(0, '0.072')] [2024-06-06 12:16:09,823][02924] Updated weights for policy 0, policy_version 9107 (0.0025) [2024-06-06 12:34:08,691][14064] Saving configuration to /workspace/metta/train_dir/p2.metta.4/config.json... [2024-06-06 12:34:08,708][14064] Rollout worker 0 uses device cpu [2024-06-06 12:34:08,708][14064] Rollout worker 1 uses device cpu [2024-06-06 12:34:08,708][14064] Rollout worker 2 uses device cpu [2024-06-06 12:34:08,708][14064] Rollout worker 3 uses device cpu [2024-06-06 12:34:08,708][14064] Rollout worker 4 uses device cpu [2024-06-06 12:34:08,708][14064] Rollout worker 5 uses device cpu [2024-06-06 12:34:08,708][14064] Rollout worker 6 uses device cpu [2024-06-06 12:34:08,709][14064] Rollout worker 7 uses device cpu [2024-06-06 12:34:08,709][14064] Rollout worker 8 uses device cpu [2024-06-06 12:34:08,709][14064] Rollout worker 9 uses device cpu [2024-06-06 12:34:08,709][14064] Rollout worker 10 uses device cpu [2024-06-06 12:34:08,709][14064] Rollout worker 11 uses device cpu [2024-06-06 12:34:08,709][14064] Rollout worker 12 uses device cpu [2024-06-06 12:34:08,709][14064] Rollout worker 13 uses device cpu [2024-06-06 12:34:08,709][14064] Rollout worker 14 uses device cpu [2024-06-06 12:34:08,709][14064] Rollout worker 15 uses device cpu [2024-06-06 12:34:08,709][14064] Rollout 
worker 16 uses device cpu [2024-06-06 12:34:08,709][14064] Rollout worker 17 uses device cpu [2024-06-06 12:34:08,709][14064] Rollout worker 18 uses device cpu [2024-06-06 12:34:08,710][14064] Rollout worker 19 uses device cpu [2024-06-06 12:34:08,710][14064] Rollout worker 20 uses device cpu [2024-06-06 12:34:08,710][14064] Rollout worker 21 uses device cpu [2024-06-06 12:34:08,710][14064] Rollout worker 22 uses device cpu [2024-06-06 12:34:08,710][14064] Rollout worker 23 uses device cpu [2024-06-06 12:34:08,710][14064] Rollout worker 24 uses device cpu [2024-06-06 12:34:08,710][14064] Rollout worker 25 uses device cpu [2024-06-06 12:34:08,710][14064] Rollout worker 26 uses device cpu [2024-06-06 12:34:08,710][14064] Rollout worker 27 uses device cpu [2024-06-06 12:34:08,710][14064] Rollout worker 28 uses device cpu [2024-06-06 12:34:08,710][14064] Rollout worker 29 uses device cpu [2024-06-06 12:34:08,710][14064] Rollout worker 30 uses device cpu [2024-06-06 12:34:08,711][14064] Rollout worker 31 uses device cpu [2024-06-06 12:34:09,227][14064] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-06-06 12:34:09,227][14064] InferenceWorker_p0-w0: min num requests: 10 [2024-06-06 12:34:09,271][14064] Starting all processes... [2024-06-06 12:34:09,271][14064] Starting process learner_proc0 [2024-06-06 12:34:09,534][14064] Starting all processes... 
[2024-06-06 12:34:09,536][14064] Starting process inference_proc0-0 [2024-06-06 12:34:09,536][14064] Starting process rollout_proc0 [2024-06-06 12:34:09,536][14064] Starting process rollout_proc1 [2024-06-06 12:34:09,537][14064] Starting process rollout_proc2 [2024-06-06 12:34:09,537][14064] Starting process rollout_proc3 [2024-06-06 12:34:09,537][14064] Starting process rollout_proc4 [2024-06-06 12:34:09,537][14064] Starting process rollout_proc5 [2024-06-06 12:34:09,537][14064] Starting process rollout_proc6 [2024-06-06 12:34:09,537][14064] Starting process rollout_proc7 [2024-06-06 12:34:09,538][14064] Starting process rollout_proc8 [2024-06-06 12:34:09,539][14064] Starting process rollout_proc9 [2024-06-06 12:34:09,542][14064] Starting process rollout_proc10 [2024-06-06 12:34:09,542][14064] Starting process rollout_proc11 [2024-06-06 12:34:09,543][14064] Starting process rollout_proc12 [2024-06-06 12:34:09,543][14064] Starting process rollout_proc13 [2024-06-06 12:34:09,543][14064] Starting process rollout_proc14 [2024-06-06 12:34:09,543][14064] Starting process rollout_proc15 [2024-06-06 12:34:09,544][14064] Starting process rollout_proc16 [2024-06-06 12:34:09,544][14064] Starting process rollout_proc17 [2024-06-06 12:34:09,545][14064] Starting process rollout_proc18 [2024-06-06 12:34:09,547][14064] Starting process rollout_proc19 [2024-06-06 12:34:09,547][14064] Starting process rollout_proc20 [2024-06-06 12:34:09,548][14064] Starting process rollout_proc21 [2024-06-06 12:34:09,549][14064] Starting process rollout_proc22 [2024-06-06 12:34:09,551][14064] Starting process rollout_proc23 [2024-06-06 12:34:09,551][14064] Starting process rollout_proc24 [2024-06-06 12:34:09,553][14064] Starting process rollout_proc25 [2024-06-06 12:34:09,553][14064] Starting process rollout_proc26 [2024-06-06 12:34:09,554][14064] Starting process rollout_proc27 [2024-06-06 12:34:09,555][14064] Starting process rollout_proc28 [2024-06-06 12:34:09,562][14064] Starting process 
rollout_proc29 [2024-06-06 12:34:09,562][14064] Starting process rollout_proc30 [2024-06-06 12:34:09,565][14064] Starting process rollout_proc31 [2024-06-06 12:34:11,436][14300] Worker 6 uses CPU cores [6] [2024-06-06 12:34:11,632][14315] Worker 16 uses CPU cores [16] [2024-06-06 12:34:11,636][14314] Worker 15 uses CPU cores [15] [2024-06-06 12:34:11,643][14316] Worker 19 uses CPU cores [19] [2024-06-06 12:34:11,660][14296] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-06-06 12:34:11,660][14296] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 [2024-06-06 12:34:11,664][14304] Worker 4 uses CPU cores [4] [2024-06-06 12:34:11,668][14276] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-06-06 12:34:11,668][14276] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 [2024-06-06 12:34:11,669][14296] Num visible devices: 1 [2024-06-06 12:34:11,680][14276] Num visible devices: 1 [2024-06-06 12:34:11,688][14297] Worker 0 uses CPU cores [0] [2024-06-06 12:34:11,708][14321] Worker 23 uses CPU cores [23] [2024-06-06 12:34:11,708][14276] Setting fixed seed 0 [2024-06-06 12:34:11,709][14276] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-06-06 12:34:11,709][14276] Initializing actor-critic model on device cuda:0 [2024-06-06 12:34:11,712][14305] Worker 8 uses CPU cores [8] [2024-06-06 12:34:11,728][14299] Worker 3 uses CPU cores [3] [2024-06-06 12:34:11,740][14302] Worker 7 uses CPU cores [7] [2024-06-06 12:34:11,760][14324] Worker 27 uses CPU cores [27] [2024-06-06 12:34:11,784][14322] Worker 26 uses CPU cores [26] [2024-06-06 12:34:11,812][14323] Worker 25 uses CPU cores [25] [2024-06-06 12:34:11,838][14301] Worker 5 uses CPU cores [5] [2024-06-06 12:34:11,839][14306] Worker 11 uses CPU cores [11] [2024-06-06 12:34:11,844][14327] Worker 28 uses CPU cores [28] [2024-06-06 12:34:11,848][14325] Worker 29 uses CPU cores [29] [2024-06-06 12:34:11,860][14308] 
Worker 12 uses CPU cores [12]
[2024-06-06 12:34:11,883][14309] Worker 9 uses CPU cores [9]
[2024-06-06 12:34:11,883][14303] Worker 2 uses CPU cores [2]
[2024-06-06 12:34:11,892][14312] Worker 18 uses CPU cores [18]
[2024-06-06 12:34:11,920][14307] Worker 10 uses CPU cores [10]
[2024-06-06 12:34:11,930][14298] Worker 1 uses CPU cores [1]
[2024-06-06 12:34:11,936][14328] Worker 30 uses CPU cores [30]
[2024-06-06 12:34:11,967][14320] Worker 24 uses CPU cores [24]
[2024-06-06 12:34:12,018][14317] Worker 20 uses CPU cores [20]
[2024-06-06 12:34:12,030][14319] Worker 22 uses CPU cores [22]
[2024-06-06 12:34:12,035][14326] Worker 31 uses CPU cores [31]
[2024-06-06 12:34:12,042][14310] Worker 13 uses CPU cores [13]
[2024-06-06 12:34:12,050][14311] Worker 14 uses CPU cores [14]
[2024-06-06 12:34:12,058][14313] Worker 17 uses CPU cores [17]
[2024-06-06 12:34:12,083][14318] Worker 21 uses CPU cores [21]
[2024-06-06 12:34:12,452][14276] RunningMeanStd input shape: (11, 11)
[2024-06-06 12:34:12,452][14276] RunningMeanStd input shape: (11, 11)
[2024-06-06 12:34:12,452][14276] RunningMeanStd input shape: (11, 11)
[2024-06-06 12:34:12,452][14276] RunningMeanStd input shape: (11, 11)
[2024-06-06 12:34:12,452][14276] RunningMeanStd input shape: (11, 11)
[2024-06-06 12:34:12,452][14276] RunningMeanStd input shape: (11, 11)
[2024-06-06 12:34:12,452][14276] RunningMeanStd input shape: (11, 11)
[2024-06-06 12:34:12,452][14276] RunningMeanStd input shape: (11, 11)
[2024-06-06 12:34:12,452][14276] RunningMeanStd input shape: (11, 11)
[2024-06-06 12:34:12,452][14276] RunningMeanStd input shape: (11, 11)
[2024-06-06 12:34:12,452][14276] RunningMeanStd input shape: (11, 11)
[2024-06-06 12:34:12,452][14276] RunningMeanStd input shape: (11, 11)
[2024-06-06 12:34:12,453][14276] RunningMeanStd input shape: (11, 11)
[2024-06-06 12:34:12,453][14276] RunningMeanStd input shape: (11, 11)
[2024-06-06 12:34:12,453][14276] RunningMeanStd input shape: (11, 11)
[2024-06-06 12:34:12,453][14276] RunningMeanStd input shape: (11, 11)
[2024-06-06 12:34:12,453][14276] RunningMeanStd input shape: (11, 11)
[2024-06-06 12:34:12,453][14276] RunningMeanStd input shape: (11, 11)
[2024-06-06 12:34:12,453][14276] RunningMeanStd input shape: (11, 11)
[2024-06-06 12:34:12,453][14276] RunningMeanStd input shape: (11, 11)
[2024-06-06 12:34:12,453][14276] RunningMeanStd input shape: (11, 11)
[2024-06-06 12:34:12,453][14276] RunningMeanStd input shape: (11, 11)
[2024-06-06 12:34:12,456][14276] RunningMeanStd input shape: (1,)
[2024-06-06 12:34:12,456][14276] RunningMeanStd input shape: (1,)
[2024-06-06 12:34:12,456][14276] RunningMeanStd input shape: (1,)
[2024-06-06 12:34:12,457][14276] RunningMeanStd input shape: (1,)
[2024-06-06 12:34:12,496][14276] RunningMeanStd input shape: (1,)
[2024-06-06 12:34:12,500][14276] Created Actor Critic model with architecture:
[2024-06-06 12:34:12,500][14276] SampleFactoryAgentWrapper(
  (obs_normalizer): ObservationNormalizer()
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (agent): MettaAgent(
    (_encoder): MultiFeatureSetEncoder(
      (feature_set_encoders): ModuleDict(
        (grid_obs): FeatureSetEncoder(
          (_normalizer): FeatureListNormalizer(
            (_norms_dict): ModuleDict(
              (agent): RunningMeanStdInPlace()
              (altar): RunningMeanStdInPlace()
              (converter): RunningMeanStdInPlace()
              (generator): RunningMeanStdInPlace()
              (wall): RunningMeanStdInPlace()
              (agent:dir): RunningMeanStdInPlace()
              (agent:energy): RunningMeanStdInPlace()
              (agent:frozen): RunningMeanStdInPlace()
              (agent:hp): RunningMeanStdInPlace()
              (agent:id): RunningMeanStdInPlace()
              (agent:inv_r1): RunningMeanStdInPlace()
              (agent:inv_r2): RunningMeanStdInPlace()
              (agent:inv_r3): RunningMeanStdInPlace()
              (agent:shield): RunningMeanStdInPlace()
              (altar:hp): RunningMeanStdInPlace()
              (altar:state): RunningMeanStdInPlace()
              (converter:hp): RunningMeanStdInPlace()
              (converter:state): RunningMeanStdInPlace()
              (generator:amount): RunningMeanStdInPlace()
              (generator:hp): RunningMeanStdInPlace()
              (generator:state): RunningMeanStdInPlace()
              (wall:hp): RunningMeanStdInPlace()
            )
          )
          (embedding_net): Sequential(
            (0): Linear(in_features=125, out_features=512, bias=True)
            (1): ELU(alpha=1.0)
            (2): Linear(in_features=512, out_features=512, bias=True)
            (3): ELU(alpha=1.0)
            (4): Linear(in_features=512, out_features=512, bias=True)
            (5): ELU(alpha=1.0)
            (6): Linear(in_features=512, out_features=512, bias=True)
            (7): ELU(alpha=1.0)
          )
        )
        (global_vars): FeatureSetEncoder(
          (_normalizer): FeatureListNormalizer(
            (_norms_dict): ModuleDict(
              (_steps): RunningMeanStdInPlace()
            )
          )
          (embedding_net): Sequential(
            (0): Linear(in_features=5, out_features=8, bias=True)
            (1): ELU(alpha=1.0)
            (2): Linear(in_features=8, out_features=8, bias=True)
            (3): ELU(alpha=1.0)
          )
        )
        (last_action): FeatureSetEncoder(
          (_normalizer): FeatureListNormalizer(
            (_norms_dict): ModuleDict(
              (last_action_id): RunningMeanStdInPlace()
              (last_action_val): RunningMeanStdInPlace()
            )
          )
          (embedding_net): Sequential(
            (0): Linear(in_features=5, out_features=8, bias=True)
            (1): ELU(alpha=1.0)
            (2): Linear(in_features=8, out_features=8, bias=True)
            (3): ELU(alpha=1.0)
          )
        )
        (last_reward): FeatureSetEncoder(
          (_normalizer): FeatureListNormalizer(
            (_norms_dict): ModuleDict(
              (last_reward): RunningMeanStdInPlace()
            )
          )
          (embedding_net): Sequential(
            (0): Linear(in_features=5, out_features=8, bias=True)
            (1): ELU(alpha=1.0)
            (2): Linear(in_features=8, out_features=8, bias=True)
            (3): ELU(alpha=1.0)
          )
        )
      )
      (merged_encoder): Sequential(
        (0): Linear(in_features=536, out_features=512, bias=True)
        (1): ELU(alpha=1.0)
        (2): Linear(in_features=512, out_features=512, bias=True)
        (3): ELU(alpha=1.0)
        (4): Linear(in_features=512, out_features=512, bias=True)
        (5): ELU(alpha=1.0)
      )
    )
    (_core): ModelCoreRNN(
      (core): GRU(512, 512)
    )
    (_decoder): Decoder(
      (mlp): Identity()
    )
    (_critic_linear): Linear(in_features=512, out_features=1, bias=True)
    (_action_parameterization): ActionParameterizationDefault(
      (distribution_linear): Linear(in_features=512, out_features=16, bias=True)
    )
  )
)
[2024-06-06 12:34:12,561][14276] Using optimizer
[2024-06-06 12:34:12,701][14276] Loading state from checkpoint /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000008977_147079168.pth...
[2024-06-06 12:34:12,716][14276] Loading model from checkpoint
[2024-06-06 12:34:12,718][14276] Loaded experiment state at self.train_step=8977, self.env_steps=147079168
[2024-06-06 12:34:12,718][14276] Initialized policy 0 weights for model version 8977
[2024-06-06 12:34:12,719][14276] LearnerWorker_p0 finished initialization!
[2024-06-06 12:34:12,720][14276] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-06-06 12:34:13,347][14296] RunningMeanStd input shape: (11, 11)
[2024-06-06 12:34:13,347][14296] RunningMeanStd input shape: (11, 11)
[2024-06-06 12:34:13,347][14296] RunningMeanStd input shape: (11, 11)
[2024-06-06 12:34:13,347][14296] RunningMeanStd input shape: (11, 11)
[2024-06-06 12:34:13,347][14296] RunningMeanStd input shape: (11, 11)
[2024-06-06 12:34:13,347][14296] RunningMeanStd input shape: (11, 11)
[2024-06-06 12:34:13,347][14296] RunningMeanStd input shape: (11, 11)
[2024-06-06 12:34:13,347][14296] RunningMeanStd input shape: (11, 11)
[2024-06-06 12:34:13,347][14296] RunningMeanStd input shape: (11, 11)
[2024-06-06 12:34:13,347][14296] RunningMeanStd input shape: (11, 11)
[2024-06-06 12:34:13,347][14296] RunningMeanStd input shape: (11, 11)
[2024-06-06 12:34:13,348][14296] RunningMeanStd input shape: (11, 11)
[2024-06-06 12:34:13,348][14296] RunningMeanStd input shape: (11, 11)
[2024-06-06 12:34:13,348][14296] RunningMeanStd input shape: (11, 11)
[2024-06-06 12:34:13,348][14296] RunningMeanStd input shape: (11, 11)
[2024-06-06 12:34:13,348][14296] RunningMeanStd input shape: (11, 11)
[2024-06-06 12:34:13,348][14296] RunningMeanStd input shape: (11, 11)
[2024-06-06 12:34:13,348][14296] RunningMeanStd input shape: (11, 11)
[2024-06-06 12:34:13,348][14296] RunningMeanStd input shape: (11, 11)
[2024-06-06 12:34:13,348][14296] RunningMeanStd input shape: (11, 11)
[2024-06-06 12:34:13,348][14296] RunningMeanStd input shape: (11, 11)
[2024-06-06 12:34:13,348][14296] RunningMeanStd input shape: (11, 11)
[2024-06-06 12:34:13,351][14296] RunningMeanStd input shape: (1,)
[2024-06-06 12:34:13,351][14296] RunningMeanStd input shape: (1,)
[2024-06-06 12:34:13,351][14296] RunningMeanStd input shape: (1,)
[2024-06-06 12:34:13,352][14296] RunningMeanStd input shape: (1,)
[2024-06-06 12:34:13,390][14296] RunningMeanStd input shape: (1,)
[2024-06-06 12:34:13,411][14064] Inference worker 0-0 is ready!
[2024-06-06 12:34:13,411][14064] All inference workers are ready! Signal rollout workers to start!
[2024-06-06 12:34:15,483][14321] Decorrelating experience for 0 frames...
[2024-06-06 12:34:15,494][14317] Decorrelating experience for 0 frames...
[2024-06-06 12:34:15,495][14322] Decorrelating experience for 0 frames...
[2024-06-06 12:34:15,498][14319] Decorrelating experience for 0 frames...
[2024-06-06 12:34:15,500][14316] Decorrelating experience for 0 frames...
[2024-06-06 12:34:15,512][14315] Decorrelating experience for 0 frames...
[2024-06-06 12:34:15,515][14318] Decorrelating experience for 0 frames...
[2024-06-06 12:34:15,516][14326] Decorrelating experience for 0 frames...
[2024-06-06 12:34:15,517][14313] Decorrelating experience for 0 frames...
[2024-06-06 12:34:15,518][14312] Decorrelating experience for 0 frames...
[2024-06-06 12:34:15,518][14320] Decorrelating experience for 0 frames...
[2024-06-06 12:34:15,521][14323] Decorrelating experience for 0 frames...
[2024-06-06 12:34:15,521][14325] Decorrelating experience for 0 frames...
[2024-06-06 12:34:15,533][14301] Decorrelating experience for 0 frames...
[2024-06-06 12:34:15,534][14314] Decorrelating experience for 0 frames...
[2024-06-06 12:34:15,535][14327] Decorrelating experience for 0 frames...
[2024-06-06 12:34:15,536][14309] Decorrelating experience for 0 frames...
[2024-06-06 12:34:15,536][14310] Decorrelating experience for 0 frames...
[2024-06-06 12:34:15,537][14306] Decorrelating experience for 0 frames...
[2024-06-06 12:34:15,538][14304] Decorrelating experience for 0 frames...
[2024-06-06 12:34:15,539][14298] Decorrelating experience for 0 frames...
[2024-06-06 12:34:15,540][14305] Decorrelating experience for 0 frames...
[2024-06-06 12:34:15,541][14302] Decorrelating experience for 0 frames...
[2024-06-06 12:34:15,542][14311] Decorrelating experience for 0 frames...
[2024-06-06 12:34:15,544][14308] Decorrelating experience for 0 frames...
[2024-06-06 12:34:15,544][14303] Decorrelating experience for 0 frames...
[2024-06-06 12:34:15,544][14307] Decorrelating experience for 0 frames...
[2024-06-06 12:34:15,545][14297] Decorrelating experience for 0 frames...
[2024-06-06 12:34:15,545][14300] Decorrelating experience for 0 frames...
[2024-06-06 12:34:15,546][14299] Decorrelating experience for 0 frames...
[2024-06-06 12:34:15,553][14328] Decorrelating experience for 0 frames...
[2024-06-06 12:34:15,570][14324] Decorrelating experience for 0 frames...
[2024-06-06 12:34:16,222][14321] Decorrelating experience for 256 frames...
[2024-06-06 12:34:16,231][14317] Decorrelating experience for 256 frames...
[2024-06-06 12:34:16,235][14322] Decorrelating experience for 256 frames...
[2024-06-06 12:34:16,245][14319] Decorrelating experience for 256 frames...
[2024-06-06 12:34:16,252][14316] Decorrelating experience for 256 frames...
[2024-06-06 12:34:16,271][14315] Decorrelating experience for 256 frames...
[2024-06-06 12:34:16,282][14313] Decorrelating experience for 256 frames...
[2024-06-06 12:34:16,283][14312] Decorrelating experience for 256 frames...
[2024-06-06 12:34:16,285][14320] Decorrelating experience for 256 frames...
[2024-06-06 12:34:16,288][14301] Decorrelating experience for 256 frames...
[2024-06-06 12:34:16,288][14326] Decorrelating experience for 256 frames...
[2024-06-06 12:34:16,289][14314] Decorrelating experience for 256 frames...
[2024-06-06 12:34:16,293][14318] Decorrelating experience for 256 frames...
[2024-06-06 12:34:16,293][14310] Decorrelating experience for 256 frames...
[2024-06-06 12:34:16,293][14309] Decorrelating experience for 256 frames...
[2024-06-06 12:34:16,294][14306] Decorrelating experience for 256 frames...
[2024-06-06 12:34:16,294][14302] Decorrelating experience for 256 frames...
[2024-06-06 12:34:16,295][14304] Decorrelating experience for 256 frames...
[2024-06-06 12:34:16,295][14325] Decorrelating experience for 256 frames...
[2024-06-06 12:34:16,297][14298] Decorrelating experience for 256 frames...
[2024-06-06 12:34:16,298][14323] Decorrelating experience for 256 frames...
[2024-06-06 12:34:16,300][14308] Decorrelating experience for 256 frames...
[2024-06-06 12:34:16,301][14305] Decorrelating experience for 256 frames...
[2024-06-06 12:34:16,309][14311] Decorrelating experience for 256 frames...
[2024-06-06 12:34:16,310][14307] Decorrelating experience for 256 frames...
[2024-06-06 12:34:16,312][14297] Decorrelating experience for 256 frames...
[2024-06-06 12:34:16,313][14300] Decorrelating experience for 256 frames...
[2024-06-06 12:34:16,316][14299] Decorrelating experience for 256 frames...
[2024-06-06 12:34:16,316][14303] Decorrelating experience for 256 frames...
[2024-06-06 12:34:16,324][14327] Decorrelating experience for 256 frames...
[2024-06-06 12:34:16,341][14328] Decorrelating experience for 256 frames...
[2024-06-06 12:34:16,351][14324] Decorrelating experience for 256 frames...
[2024-06-06 12:34:16,561][14064] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 147079168. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-06-06 12:34:21,561][14064] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 147079168. Throughput: 0: 32012.3. Samples: 160060. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-06-06 12:34:21,910][14321] Worker 23, sleep for 107.812 sec to decorrelate experience collection
[2024-06-06 12:34:21,916][14314] Worker 15, sleep for 70.312 sec to decorrelate experience collection
[2024-06-06 12:34:21,916][14311] Worker 14, sleep for 65.625 sec to decorrelate experience collection
[2024-06-06 12:34:21,916][14305] Worker 8, sleep for 37.500 sec to decorrelate experience collection
[2024-06-06 12:34:21,920][14315] Worker 16, sleep for 75.000 sec to decorrelate experience collection
[2024-06-06 12:34:21,928][14299] Worker 3, sleep for 14.062 sec to decorrelate experience collection
[2024-06-06 12:34:21,928][14312] Worker 18, sleep for 84.375 sec to decorrelate experience collection
[2024-06-06 12:34:21,929][14313] Worker 17, sleep for 79.688 sec to decorrelate experience collection
[2024-06-06 12:34:21,929][14318] Worker 21, sleep for 98.438 sec to decorrelate experience collection
[2024-06-06 12:34:21,929][14319] Worker 22, sleep for 103.125 sec to decorrelate experience collection
[2024-06-06 12:34:21,930][14322] Worker 26, sleep for 121.875 sec to decorrelate experience collection
[2024-06-06 12:34:21,931][14316] Worker 19, sleep for 89.062 sec to decorrelate experience collection
[2024-06-06 12:34:21,939][14317] Worker 20, sleep for 93.750 sec to decorrelate experience collection
[2024-06-06 12:34:21,939][14327] Worker 28, sleep for 131.250 sec to decorrelate experience collection
[2024-06-06 12:34:21,945][14309] Worker 9, sleep for 42.188 sec to decorrelate experience collection
[2024-06-06 12:34:21,947][14306] Worker 11, sleep for 51.562 sec to decorrelate experience collection
[2024-06-06 12:34:21,948][14298] Worker 1, sleep for 4.688 sec to decorrelate experience collection
[2024-06-06 12:34:21,955][14307] Worker 10, sleep for 46.875 sec to decorrelate experience collection
[2024-06-06 12:34:21,956][14308] Worker 12, sleep for 56.250 sec to decorrelate experience collection
[2024-06-06 12:34:21,958][14323] Worker 25, sleep for 117.188 sec to decorrelate experience collection
[2024-06-06 12:34:21,958][14326] Worker 31, sleep for 145.312 sec to decorrelate experience collection
[2024-06-06 12:34:21,959][14310] Worker 13, sleep for 60.938 sec to decorrelate experience collection
[2024-06-06 12:34:21,966][14320] Worker 24, sleep for 112.500 sec to decorrelate experience collection
[2024-06-06 12:34:21,968][14303] Worker 2, sleep for 9.375 sec to decorrelate experience collection
[2024-06-06 12:34:21,970][14302] Worker 7, sleep for 32.812 sec to decorrelate experience collection
[2024-06-06 12:34:21,970][14301] Worker 5, sleep for 23.438 sec to decorrelate experience collection
[2024-06-06 12:34:21,974][14304] Worker 4, sleep for 18.750 sec to decorrelate experience collection
[2024-06-06 12:34:21,976][14325] Worker 29, sleep for 135.938 sec to decorrelate experience collection
[2024-06-06 12:34:21,983][14328] Worker 30, sleep for 140.625 sec to decorrelate experience collection
[2024-06-06 12:34:22,012][14276] Signal inference workers to stop experience collection...
[2024-06-06 12:34:22,013][14300] Worker 6, sleep for 28.125 sec to decorrelate experience collection
[2024-06-06 12:34:22,014][14324] Worker 27, sleep for 126.562 sec to decorrelate experience collection
[2024-06-06 12:34:22,075][14296] InferenceWorker_p0-w0: stopping experience collection
[2024-06-06 12:34:22,526][14276] Signal inference workers to resume experience collection...
[2024-06-06 12:34:22,526][14296] InferenceWorker_p0-w0: resuming experience collection
[2024-06-06 12:34:23,624][14296] Updated weights for policy 0, policy_version 8987 (0.0013)
[2024-06-06 12:34:26,562][14064] Fps is (10 sec: 16383.6, 60 sec: 16383.6, 300 sec: 16383.6). Total num frames: 147243008. Throughput: 0: 33065.3. Samples: 330660. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2024-06-06 12:34:26,659][14298] Worker 1 awakens!
[2024-06-06 12:34:29,223][14064] Heartbeat connected on Batcher_0
[2024-06-06 12:34:29,225][14064] Heartbeat connected on LearnerWorker_p0
[2024-06-06 12:34:29,230][14064] Heartbeat connected on RolloutWorker_w0
[2024-06-06 12:34:29,232][14064] Heartbeat connected on RolloutWorker_w1
[2024-06-06 12:34:29,273][14064] Heartbeat connected on InferenceWorker_p0-w0
[2024-06-06 12:34:31,391][14303] Worker 2 awakens!
[2024-06-06 12:34:31,395][14064] Heartbeat connected on RolloutWorker_w2
[2024-06-06 12:34:31,562][14064] Fps is (10 sec: 16383.6, 60 sec: 10922.5, 300 sec: 10922.5). Total num frames: 147243008. Throughput: 0: 22851.7. Samples: 342780. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2024-06-06 12:34:36,061][14299] Worker 3 awakens!
[2024-06-06 12:34:36,068][14064] Heartbeat connected on RolloutWorker_w3
[2024-06-06 12:34:36,562][14064] Fps is (10 sec: 3276.8, 60 sec: 9830.3, 300 sec: 9830.3). Total num frames: 147275776. Throughput: 0: 17636.8. Samples: 352740. Policy #0 lag: (min: 0.0, avg: 9.4, max: 10.0)
[2024-06-06 12:34:40,816][14304] Worker 4 awakens!
[2024-06-06 12:34:40,822][14064] Heartbeat connected on RolloutWorker_w4
[2024-06-06 12:34:41,561][14064] Fps is (10 sec: 6553.8, 60 sec: 9175.1, 300 sec: 9175.1). Total num frames: 147308544. Throughput: 0: 15094.4. Samples: 377360. Policy #0 lag: (min: 0.0, avg: 4.3, max: 12.0)
[2024-06-06 12:34:41,562][14064] Avg episode reward: [(0, '0.065')]
[2024-06-06 12:34:45,508][14301] Worker 5 awakens!
[2024-06-06 12:34:45,512][14064] Heartbeat connected on RolloutWorker_w5
[2024-06-06 12:34:46,561][14064] Fps is (10 sec: 9830.7, 60 sec: 9830.4, 300 sec: 9830.4). Total num frames: 147374080. Throughput: 0: 16002.7. Samples: 480080. Policy #0 lag: (min: 0.0, avg: 4.3, max: 12.0)
[2024-06-06 12:34:46,561][14064] Avg episode reward: [(0, '0.059')]
[2024-06-06 12:34:47,066][14296] Updated weights for policy 0, policy_version 8997 (0.0012)
[2024-06-06 12:34:50,236][14300] Worker 6 awakens!
[2024-06-06 12:34:50,240][14064] Heartbeat connected on RolloutWorker_w6
[2024-06-06 12:34:51,561][14064] Fps is (10 sec: 19660.9, 60 sec: 12171.0, 300 sec: 12171.0). Total num frames: 147505152. Throughput: 0: 15515.5. Samples: 543040. Policy #0 lag: (min: 0.0, avg: 2.0, max: 4.0)
[2024-06-06 12:34:51,568][14064] Avg episode reward: [(0, '0.062')]
[2024-06-06 12:34:54,323][14296] Updated weights for policy 0, policy_version 9007 (0.0011)
[2024-06-06 12:34:54,858][14302] Worker 7 awakens!
[2024-06-06 12:34:54,865][14064] Heartbeat connected on RolloutWorker_w7
[2024-06-06 12:34:56,561][14064] Fps is (10 sec: 26214.4, 60 sec: 13926.4, 300 sec: 13926.4). Total num frames: 147636224. Throughput: 0: 17346.5. Samples: 693860. Policy #0 lag: (min: 0.0, avg: 3.2, max: 6.0)
[2024-06-06 12:34:56,561][14064] Avg episode reward: [(0, '0.074')]
[2024-06-06 12:34:59,517][14305] Worker 8 awakens!
[2024-06-06 12:34:59,520][14064] Heartbeat connected on RolloutWorker_w8
[2024-06-06 12:35:00,097][14296] Updated weights for policy 0, policy_version 9017 (0.0011)
[2024-06-06 12:35:01,561][14064] Fps is (10 sec: 26214.2, 60 sec: 15291.7, 300 sec: 15291.7). Total num frames: 147767296. Throughput: 0: 19364.0. Samples: 871380. Policy #0 lag: (min: 0.0, avg: 3.2, max: 6.0)
[2024-06-06 12:35:01,562][14064] Avg episode reward: [(0, '0.072')]
[2024-06-06 12:35:04,233][14309] Worker 9 awakens!
[2024-06-06 12:35:04,237][14064] Heartbeat connected on RolloutWorker_w9
[2024-06-06 12:35:04,814][14296] Updated weights for policy 0, policy_version 9027 (0.0011)
[2024-06-06 12:35:06,561][14064] Fps is (10 sec: 27852.8, 60 sec: 16711.7, 300 sec: 16711.7). Total num frames: 147914752. Throughput: 0: 17748.9. Samples: 958760. Policy #0 lag: (min: 0.0, avg: 2.5, max: 6.0)
[2024-06-06 12:35:06,568][14064] Avg episode reward: [(0, '0.065')]
[2024-06-06 12:35:08,928][14307] Worker 10 awakens!
[2024-06-06 12:35:08,932][14064] Heartbeat connected on RolloutWorker_w10
[2024-06-06 12:35:10,082][14296] Updated weights for policy 0, policy_version 9037 (0.0012)
[2024-06-06 12:35:11,561][14064] Fps is (10 sec: 34406.4, 60 sec: 18767.1, 300 sec: 18767.1). Total num frames: 148111360. Throughput: 0: 18602.8. Samples: 1167780. Policy #0 lag: (min: 0.0, avg: 2.5, max: 6.0)
[2024-06-06 12:35:11,562][14064] Avg episode reward: [(0, '0.067')]
[2024-06-06 12:35:13,608][14306] Worker 11 awakens!
[2024-06-06 12:35:13,612][14064] Heartbeat connected on RolloutWorker_w11
[2024-06-06 12:35:13,694][14296] Updated weights for policy 0, policy_version 9047 (0.0012)
[2024-06-06 12:35:16,561][14064] Fps is (10 sec: 42598.0, 60 sec: 21026.1, 300 sec: 21026.1). Total num frames: 148340736. Throughput: 0: 23941.0. Samples: 1420120. Policy #0 lag: (min: 0.0, avg: 3.3, max: 7.0)
[2024-06-06 12:35:16,562][14064] Avg episode reward: [(0, '0.070')]
[2024-06-06 12:35:18,166][14296] Updated weights for policy 0, policy_version 9057 (0.0012)
[2024-06-06 12:35:18,308][14308] Worker 12 awakens!
[2024-06-06 12:35:18,312][14064] Heartbeat connected on RolloutWorker_w12
[2024-06-06 12:35:21,561][14064] Fps is (10 sec: 42598.8, 60 sec: 24303.0, 300 sec: 22433.5). Total num frames: 148537344. Throughput: 0: 26435.8. Samples: 1542340. Policy #0 lag: (min: 1.0, avg: 4.9, max: 10.0)
[2024-06-06 12:35:21,561][14064] Avg episode reward: [(0, '0.077')]
[2024-06-06 12:35:21,640][14276] Saving new best policy, reward=0.077!
[2024-06-06 12:35:21,646][14296] Updated weights for policy 0, policy_version 9067 (0.0015)
[2024-06-06 12:35:22,908][14310] Worker 13 awakens!
[2024-06-06 12:35:22,915][14064] Heartbeat connected on RolloutWorker_w13
[2024-06-06 12:35:25,131][14296] Updated weights for policy 0, policy_version 9077 (0.0016)
[2024-06-06 12:35:26,561][14064] Fps is (10 sec: 42598.4, 60 sec: 25395.3, 300 sec: 24107.9). Total num frames: 148766720. Throughput: 0: 31978.2. Samples: 1816380. Policy #0 lag: (min: 1.0, avg: 4.9, max: 10.0)
[2024-06-06 12:35:26,562][14064] Avg episode reward: [(0, '0.075')]
[2024-06-06 12:35:27,641][14311] Worker 14 awakens!
[2024-06-06 12:35:27,646][14064] Heartbeat connected on RolloutWorker_w14
[2024-06-06 12:35:28,768][14296] Updated weights for policy 0, policy_version 9087 (0.0018)
[2024-06-06 12:35:31,561][14064] Fps is (10 sec: 44236.1, 60 sec: 28945.2, 300 sec: 25340.6). Total num frames: 148979712. Throughput: 0: 35481.7. Samples: 2076760. Policy #0 lag: (min: 0.0, avg: 4.4, max: 10.0)
[2024-06-06 12:35:31,562][14064] Avg episode reward: [(0, '0.069')]
[2024-06-06 12:35:32,288][14314] Worker 15 awakens!
[2024-06-06 12:35:32,294][14064] Heartbeat connected on RolloutWorker_w15
[2024-06-06 12:35:32,473][14296] Updated weights for policy 0, policy_version 9097 (0.0019)
[2024-06-06 12:35:36,402][14296] Updated weights for policy 0, policy_version 9107 (0.0021)
[2024-06-06 12:35:36,561][14064] Fps is (10 sec: 44236.9, 60 sec: 32222.0, 300 sec: 26624.0). Total num frames: 149209088. Throughput: 0: 36933.7. Samples: 2205060. Policy #0 lag: (min: 0.0, avg: 4.7, max: 11.0)
[2024-06-06 12:35:36,562][14064] Avg episode reward: [(0, '0.072')]
[2024-06-06 12:35:36,958][14315] Worker 16 awakens!
[2024-06-06 12:35:36,967][14064] Heartbeat connected on RolloutWorker_w16
[2024-06-06 12:35:40,506][14296] Updated weights for policy 0, policy_version 9117 (0.0021)
[2024-06-06 12:35:41,561][14064] Fps is (10 sec: 42598.6, 60 sec: 34952.5, 300 sec: 27370.9). Total num frames: 149405696. Throughput: 0: 38963.5. Samples: 2447220. Policy #0 lag: (min: 0.0, avg: 4.7, max: 11.0)
[2024-06-06 12:35:41,562][14064] Avg episode reward: [(0, '0.068')]
[2024-06-06 12:35:41,717][14313] Worker 17 awakens!
[2024-06-06 12:35:41,725][14064] Heartbeat connected on RolloutWorker_w17
[2024-06-06 12:35:44,576][14296] Updated weights for policy 0, policy_version 9127 (0.0021)
[2024-06-06 12:35:46,390][14312] Worker 18 awakens!
[2024-06-06 12:35:46,398][14064] Heartbeat connected on RolloutWorker_w18
[2024-06-06 12:35:46,561][14064] Fps is (10 sec: 40959.9, 60 sec: 37410.1, 300 sec: 28216.9). Total num frames: 149618688. Throughput: 0: 40601.7. Samples: 2698460. Policy #0 lag: (min: 0.0, avg: 22.6, max: 143.0)
[2024-06-06 12:35:46,562][14064] Avg episode reward: [(0, '0.070')]
[2024-06-06 12:35:48,732][14296] Updated weights for policy 0, policy_version 9137 (0.0023)
[2024-06-06 12:35:51,106][14316] Worker 19 awakens!
[2024-06-06 12:35:51,115][14064] Heartbeat connected on RolloutWorker_w19
[2024-06-06 12:35:51,561][14064] Fps is (10 sec: 42598.8, 60 sec: 38775.5, 300 sec: 28973.8). Total num frames: 149831680. Throughput: 0: 41670.7. Samples: 2833940. Policy #0 lag: (min: 0.0, avg: 7.3, max: 13.0)
[2024-06-06 12:35:51,561][14064] Avg episode reward: [(0, '0.070')]
[2024-06-06 12:35:52,157][14296] Updated weights for policy 0, policy_version 9147 (0.0025)
[2024-06-06 12:35:55,744][14296] Updated weights for policy 0, policy_version 9157 (0.0025)
[2024-06-06 12:35:55,788][14317] Worker 20 awakens!
[2024-06-06 12:35:55,797][14064] Heartbeat connected on RolloutWorker_w20
[2024-06-06 12:35:56,561][14064] Fps is (10 sec: 44237.1, 60 sec: 40413.9, 300 sec: 29818.9). Total num frames: 150061056. Throughput: 0: 42980.9. Samples: 3101920. Policy #0 lag: (min: 0.0, avg: 7.3, max: 13.0)
[2024-06-06 12:35:56,562][14064] Avg episode reward: [(0, '0.080')]
[2024-06-06 12:35:56,570][14276] Saving new best policy, reward=0.080!
[2024-06-06 12:35:59,472][14296] Updated weights for policy 0, policy_version 9167 (0.0021)
[2024-06-06 12:36:00,464][14318] Worker 21 awakens!
[2024-06-06 12:36:00,474][14064] Heartbeat connected on RolloutWorker_w21
[2024-06-06 12:36:01,561][14064] Fps is (10 sec: 44236.5, 60 sec: 41779.2, 300 sec: 30427.4). Total num frames: 150274048. Throughput: 0: 43253.4. Samples: 3366520. Policy #0 lag: (min: 0.0, avg: 6.2, max: 16.0)
[2024-06-06 12:36:01,562][14064] Avg episode reward: [(0, '0.072')]
[2024-06-06 12:36:03,112][14296] Updated weights for policy 0, policy_version 9177 (0.0025)
[2024-06-06 12:36:05,152][14319] Worker 22 awakens!
[2024-06-06 12:36:05,163][14064] Heartbeat connected on RolloutWorker_w22
[2024-06-06 12:36:06,483][14296] Updated weights for policy 0, policy_version 9187 (0.0021)
[2024-06-06 12:36:06,561][14064] Fps is (10 sec: 45874.5, 60 sec: 43417.5, 300 sec: 31278.5). Total num frames: 150519808. Throughput: 0: 43671.8. Samples: 3507580. Policy #0 lag: (min: 1.0, avg: 6.1, max: 14.0)
[2024-06-06 12:36:06,562][14064] Avg episode reward: [(0, '0.077')]
[2024-06-06 12:36:06,571][14276] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000009187_150519808.pth...
[2024-06-06 12:36:06,619][14276] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000008615_141148160.pth
[2024-06-06 12:36:09,822][14321] Worker 23 awakens!
[2024-06-06 12:36:09,833][14064] Heartbeat connected on RolloutWorker_w23
[2024-06-06 12:36:10,443][14296] Updated weights for policy 0, policy_version 9197 (0.0026)
[2024-06-06 12:36:11,561][14064] Fps is (10 sec: 47513.2, 60 sec: 43963.7, 300 sec: 31913.2). Total num frames: 150749184. Throughput: 0: 43628.8. Samples: 3779680. Policy #0 lag: (min: 1.0, avg: 6.1, max: 14.0)
[2024-06-06 12:36:11,562][14064] Avg episode reward: [(0, '0.070')]
[2024-06-06 12:36:13,742][14296] Updated weights for policy 0, policy_version 9207 (0.0025)
[2024-06-06 12:36:14,508][14320] Worker 24 awakens!
[2024-06-06 12:36:14,518][14064] Heartbeat connected on RolloutWorker_w24
[2024-06-06 12:36:16,561][14064] Fps is (10 sec: 47514.3, 60 sec: 44236.8, 300 sec: 32631.5). Total num frames: 150994944. Throughput: 0: 43960.1. Samples: 4054960. Policy #0 lag: (min: 0.0, avg: 7.3, max: 16.0)
[2024-06-06 12:36:16,562][14064] Avg episode reward: [(0, '0.074')]
[2024-06-06 12:36:17,225][14296] Updated weights for policy 0, policy_version 9217 (0.0022)
[2024-06-06 12:36:19,245][14323] Worker 25 awakens!
[2024-06-06 12:36:19,258][14064] Heartbeat connected on RolloutWorker_w25
[2024-06-06 12:36:21,207][14296] Updated weights for policy 0, policy_version 9227 (0.0030)
[2024-06-06 12:36:21,561][14064] Fps is (10 sec: 44237.2, 60 sec: 44236.7, 300 sec: 32899.1). Total num frames: 151191552. Throughput: 0: 44371.1. Samples: 4201760. Policy #0 lag: (min: 0.0, avg: 8.4, max: 17.0)
[2024-06-06 12:36:21,562][14064] Avg episode reward: [(0, '0.078')]
[2024-06-06 12:36:23,904][14322] Worker 26 awakens!
[2024-06-06 12:36:23,916][14064] Heartbeat connected on RolloutWorker_w26
[2024-06-06 12:36:24,426][14296] Updated weights for policy 0, policy_version 9237 (0.0033)
[2024-06-06 12:36:26,561][14064] Fps is (10 sec: 42597.7, 60 sec: 44236.7, 300 sec: 33398.1). Total num frames: 151420928. Throughput: 0: 45167.4. Samples: 4479760. Policy #0 lag: (min: 0.0, avg: 8.4, max: 17.0)
[2024-06-06 12:36:26,562][14064] Avg episode reward: [(0, '0.067')]
[2024-06-06 12:36:27,672][14296] Updated weights for policy 0, policy_version 9247 (0.0034)
[2024-06-06 12:36:28,618][14324] Worker 27 awakens!
[2024-06-06 12:36:28,630][14064] Heartbeat connected on RolloutWorker_w27
[2024-06-06 12:36:31,033][14296] Updated weights for policy 0, policy_version 9257 (0.0025)
[2024-06-06 12:36:31,561][14064] Fps is (10 sec: 47513.6, 60 sec: 44783.0, 300 sec: 33981.6). Total num frames: 151666688. Throughput: 0: 46020.9. Samples: 4769400. Policy #0 lag: (min: 2.0, avg: 59.4, max: 272.0)
[2024-06-06 12:36:31,562][14064] Avg episode reward: [(0, '0.070')]
[2024-06-06 12:36:33,258][14327] Worker 28 awakens!
[2024-06-06 12:36:33,271][14064] Heartbeat connected on RolloutWorker_w28
[2024-06-06 12:36:34,192][14296] Updated weights for policy 0, policy_version 9267 (0.0025)
[2024-06-06 12:36:36,561][14064] Fps is (10 sec: 49152.6, 60 sec: 45056.0, 300 sec: 34523.4). Total num frames: 151912448. Throughput: 0: 46167.0. Samples: 4911460. Policy #0 lag: (min: 2.0, avg: 59.4, max: 272.0)
[2024-06-06 12:36:36,562][14064] Avg episode reward: [(0, '0.074')]
[2024-06-06 12:36:36,632][14276] Signal inference workers to stop experience collection... (50 times)
[2024-06-06 12:36:36,677][14296] InferenceWorker_p0-w0: stopping experience collection (50 times)
[2024-06-06 12:36:36,690][14276] Signal inference workers to resume experience collection... (50 times)
[2024-06-06 12:36:36,692][14296] InferenceWorker_p0-w0: resuming experience collection (50 times)
[2024-06-06 12:36:37,710][14296] Updated weights for policy 0, policy_version 9277 (0.0029)
[2024-06-06 12:36:38,014][14325] Worker 29 awakens!
[2024-06-06 12:36:38,027][14064] Heartbeat connected on RolloutWorker_w29
[2024-06-06 12:36:41,090][14296] Updated weights for policy 0, policy_version 9287 (0.0028)
[2024-06-06 12:36:41,561][14064] Fps is (10 sec: 49151.4, 60 sec: 45875.1, 300 sec: 35027.8). Total num frames: 152158208. Throughput: 0: 46804.8. Samples: 5208140. Policy #0 lag: (min: 0.0, avg: 9.3, max: 21.0)
[2024-06-06 12:36:41,562][14064] Avg episode reward: [(0, '0.070')]
[2024-06-06 12:36:42,656][14328] Worker 30 awakens!
[2024-06-06 12:36:42,669][14064] Heartbeat connected on RolloutWorker_w30
[2024-06-06 12:36:45,146][14296] Updated weights for policy 0, policy_version 9297 (0.0020)
[2024-06-06 12:36:46,561][14064] Fps is (10 sec: 50790.3, 60 sec: 46694.4, 300 sec: 35607.9). Total num frames: 152420352. Throughput: 0: 47457.7. Samples: 5502120. Policy #0 lag: (min: 0.0, avg: 8.9, max: 20.0)
[2024-06-06 12:36:46,562][14064] Avg episode reward: [(0, '0.069')]
[2024-06-06 12:36:47,320][14326] Worker 31 awakens!
[2024-06-06 12:36:47,332][14064] Heartbeat connected on RolloutWorker_w31
[2024-06-06 12:36:47,717][14296] Updated weights for policy 0, policy_version 9307 (0.0026)
[2024-06-06 12:36:51,451][14296] Updated weights for policy 0, policy_version 9317 (0.0029)
[2024-06-06 12:36:51,562][14064] Fps is (10 sec: 49149.4, 60 sec: 46966.9, 300 sec: 35939.0). Total num frames: 152649728. Throughput: 0: 47768.4. Samples: 5657180. Policy #0 lag: (min: 0.0, avg: 8.9, max: 20.0)
[2024-06-06 12:36:51,562][14064] Avg episode reward: [(0, '0.074')]
[2024-06-06 12:36:54,535][14296] Updated weights for policy 0, policy_version 9327 (0.0035)
[2024-06-06 12:36:56,564][14064] Fps is (10 sec: 49139.4, 60 sec: 47511.5, 300 sec: 36453.8). Total num frames: 152911872. Throughput: 0: 48342.2. Samples: 5955200. Policy #0 lag: (min: 0.0, avg: 11.4, max: 21.0)
[2024-06-06 12:36:56,564][14064] Avg episode reward: [(0, '0.067')]
[2024-06-06 12:36:58,089][14296] Updated weights for policy 0, policy_version 9337 (0.0031)
[2024-06-06 12:37:00,845][14296] Updated weights for policy 0, policy_version 9347 (0.0027)
[2024-06-06 12:37:01,561][14064] Fps is (10 sec: 50793.1, 60 sec: 48059.7, 300 sec: 36839.2). Total num frames: 153157632. Throughput: 0: 48977.7. Samples: 6258960. Policy #0 lag: (min: 0.0, avg: 11.3, max: 22.0)
[2024-06-06 12:37:01,562][14064] Avg episode reward: [(0, '0.065')]
[2024-06-06 12:37:04,672][14296] Updated weights for policy 0, policy_version 9357 (0.0026)
[2024-06-06 12:37:06,561][14064] Fps is (10 sec: 50804.0, 60 sec: 48333.0, 300 sec: 37297.7). Total num frames: 153419776. Throughput: 0: 49082.3. Samples: 6410460. Policy #0 lag: (min: 0.0, avg: 11.3, max: 22.0)
[2024-06-06 12:37:06,561][14064] Avg episode reward: [(0, '0.072')]
[2024-06-06 12:37:07,345][14296] Updated weights for policy 0, policy_version 9367 (0.0025)
[2024-06-06 12:37:11,258][14296] Updated weights for policy 0, policy_version 9377 (0.0026)
[2024-06-06 12:37:11,566][14064] Fps is (10 sec: 49128.3, 60 sec: 48328.9, 300 sec: 37541.7). Total num frames: 153649152. Throughput: 0: 49262.8. Samples: 6696820. Policy #0 lag: (min: 0.0, avg: 9.2, max: 20.0)
[2024-06-06 12:37:11,567][14064] Avg episode reward: [(0, '0.076')]
[2024-06-06 12:37:13,942][14296] Updated weights for policy 0, policy_version 9387 (0.0023)
[2024-06-06 12:37:16,561][14064] Fps is (10 sec: 49151.5, 60 sec: 48605.8, 300 sec: 37956.3). Total num frames: 153911296. Throughput: 0: 49649.3. Samples: 7003620. Policy #0 lag: (min: 0.0, avg: 10.5, max: 20.0)
[2024-06-06 12:37:16,562][14064] Avg episode reward: [(0, '0.078')]
[2024-06-06 12:37:17,697][14296] Updated weights for policy 0, policy_version 9397 (0.0031)
[2024-06-06 12:37:20,658][14296] Updated weights for policy 0, policy_version 9407 (0.0030)
[2024-06-06 12:37:21,561][14064] Fps is (10 sec: 50815.5, 60 sec: 49425.1, 300 sec: 38258.9). Total num frames: 154157056. Throughput: 0: 49674.7. Samples: 7146820. Policy #0 lag: (min: 0.0, avg: 10.5, max: 20.0)
[2024-06-06 12:37:21,562][14064] Avg episode reward: [(0, '0.073')]
[2024-06-06 12:37:24,363][14296] Updated weights for policy 0, policy_version 9417 (0.0032)
[2024-06-06 12:37:26,561][14064] Fps is (10 sec: 50790.9, 60 sec: 49971.4, 300 sec: 38631.8). Total num frames: 154419200. Throughput: 0: 49685.9. Samples: 7444000. Policy #0 lag: (min: 0.0, avg: 10.4, max: 21.0)
[2024-06-06 12:37:26,561][14064] Avg episode reward: [(0, '0.073')]
[2024-06-06 12:37:27,158][14296] Updated weights for policy 0, policy_version 9427 (0.0020)
[2024-06-06 12:37:30,881][14296] Updated weights for policy 0, policy_version 9437 (0.0023)
[2024-06-06 12:37:31,561][14064] Fps is (10 sec: 49152.0, 60 sec: 49698.1, 300 sec: 38817.5). Total num frames: 154648576. Throughput: 0: 49804.1. Samples: 7743300. Policy #0 lag: (min: 0.0, avg: 9.0, max: 21.0)
[2024-06-06 12:37:31,562][14064] Avg episode reward: [(0, '0.072')]
[2024-06-06 12:37:33,571][14296] Updated weights for policy 0, policy_version 9447 (0.0022)
[2024-06-06 12:37:36,561][14064] Fps is (10 sec: 49150.8, 60 sec: 49971.1, 300 sec: 39157.7). Total num frames: 154910720. Throughput: 0: 49774.7. Samples: 7897020. Policy #0 lag: (min: 0.0, avg: 9.0, max: 21.0)
[2024-06-06 12:37:36,570][14064] Avg episode reward: [(0, '0.074')]
[2024-06-06 12:37:37,433][14296] Updated weights for policy 0, policy_version 9457 (0.0033)
[2024-06-06 12:37:40,111][14296] Updated weights for policy 0, policy_version 9467 (0.0026)
[2024-06-06 12:37:41,561][14064] Fps is (10 sec: 50790.6, 60 sec: 49971.3, 300 sec: 39401.5). Total num frames: 155156480. Throughput: 0: 49765.6. Samples: 8194520. Policy #0 lag: (min: 1.0, avg: 10.0, max: 21.0)
[2024-06-06 12:37:41,561][14064] Avg episode reward: [(0, '0.070')]
[2024-06-06 12:37:43,902][14296] Updated weights for policy 0, policy_version 9477 (0.0029)
[2024-06-06 12:37:46,561][14064] Fps is (10 sec: 50791.2, 60 sec: 49971.2, 300 sec: 39711.7). Total num frames: 155418624. Throughput: 0: 49570.3. Samples: 8489620. Policy #0 lag: (min: 1.0, avg: 10.0, max: 21.0)
[2024-06-06 12:37:46,562][14064] Avg episode reward: [(0, '0.070')]
[2024-06-06 12:37:46,866][14296] Updated weights for policy 0, policy_version 9487 (0.0023)
[2024-06-06 12:37:50,534][14296] Updated weights for policy 0, policy_version 9497 (0.0036)
[2024-06-06 12:37:51,561][14064] Fps is (10 sec: 49151.6, 60 sec: 49971.7, 300 sec: 39855.0). Total num frames: 155648000. Throughput: 0: 49671.0. Samples: 8645660. Policy #0 lag: (min: 0.0, avg: 11.3, max: 21.0)
[2024-06-06 12:37:51,570][14064] Avg episode reward: [(0, '0.076')]
[2024-06-06 12:37:53,303][14296] Updated weights for policy 0, policy_version 9507 (0.0030)
[2024-06-06 12:37:56,561][14064] Fps is (10 sec: 47514.0, 60 sec: 49700.3, 300 sec: 40066.3). Total num frames: 155893760. Throughput: 0: 49832.2. Samples: 8939020. Policy #0 lag: (min: 0.0, avg: 10.5, max: 23.0)
[2024-06-06 12:37:56,562][14064] Avg episode reward: [(0, '0.077')]
[2024-06-06 12:37:56,987][14296] Updated weights for policy 0, policy_version 9517 (0.0031)
[2024-06-06 12:37:59,978][14296] Updated weights for policy 0, policy_version 9527 (0.0027)
[2024-06-06 12:38:01,561][14064] Fps is (10 sec: 49152.0, 60 sec: 49698.2, 300 sec: 40268.2). Total num frames: 156139520. Throughput: 0: 49596.0. Samples: 9235440. Policy #0 lag: (min: 0.0, avg: 10.5, max: 23.0)
[2024-06-06 12:38:01,562][14064] Avg episode reward: [(0, '0.075')]
[2024-06-06 12:38:03,766][14296] Updated weights for policy 0, policy_version 9537 (0.0025)
[2024-06-06 12:38:06,425][14276] Signal inference workers to stop experience collection... (100 times)
[2024-06-06 12:38:06,472][14296] InferenceWorker_p0-w0: stopping experience collection (100 times)
[2024-06-06 12:38:06,477][14276] Signal inference workers to resume experience collection...
(100 times) [2024-06-06 12:38:06,493][14296] InferenceWorker_p0-w0: resuming experience collection (100 times) [2024-06-06 12:38:06,561][14064] Fps is (10 sec: 50789.9, 60 sec: 49698.0, 300 sec: 40532.6). Total num frames: 156401664. Throughput: 0: 49876.4. Samples: 9391260. Policy #0 lag: (min: 0.0, avg: 10.2, max: 21.0) [2024-06-06 12:38:06,562][14064] Avg episode reward: [(0, '0.080')] [2024-06-06 12:38:06,600][14276] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000009547_156418048.pth... [2024-06-06 12:38:06,601][14296] Updated weights for policy 0, policy_version 9547 (0.0021) [2024-06-06 12:38:06,641][14276] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000008977_147079168.pth [2024-06-06 12:38:10,505][14296] Updated weights for policy 0, policy_version 9557 (0.0037) [2024-06-06 12:38:11,561][14064] Fps is (10 sec: 47513.6, 60 sec: 49429.1, 300 sec: 40576.5). Total num frames: 156614656. Throughput: 0: 49535.9. Samples: 9673120. Policy #0 lag: (min: 0.0, avg: 9.9, max: 21.0) [2024-06-06 12:38:11,562][14064] Avg episode reward: [(0, '0.071')] [2024-06-06 12:38:13,374][14296] Updated weights for policy 0, policy_version 9567 (0.0023) [2024-06-06 12:38:16,561][14064] Fps is (10 sec: 45875.6, 60 sec: 49152.1, 300 sec: 40755.2). Total num frames: 156860416. Throughput: 0: 49565.8. Samples: 9973760. Policy #0 lag: (min: 0.0, avg: 9.9, max: 21.0) [2024-06-06 12:38:16,562][14064] Avg episode reward: [(0, '0.082')] [2024-06-06 12:38:16,576][14276] Saving new best policy, reward=0.082! [2024-06-06 12:38:17,168][14296] Updated weights for policy 0, policy_version 9577 (0.0031) [2024-06-06 12:38:19,855][14296] Updated weights for policy 0, policy_version 9587 (0.0034) [2024-06-06 12:38:21,561][14064] Fps is (10 sec: 52428.8, 60 sec: 49698.1, 300 sec: 41060.3). Total num frames: 157138944. Throughput: 0: 49261.5. Samples: 10113780. 
Policy #0 lag: (min: 0.0, avg: 9.6, max: 23.0) [2024-06-06 12:38:21,562][14064] Avg episode reward: [(0, '0.072')] [2024-06-06 12:38:23,723][14296] Updated weights for policy 0, policy_version 9597 (0.0033) [2024-06-06 12:38:26,561][14064] Fps is (10 sec: 52428.3, 60 sec: 49425.0, 300 sec: 41222.1). Total num frames: 157384704. Throughput: 0: 49219.4. Samples: 10409400. Policy #0 lag: (min: 0.0, avg: 11.5, max: 23.0) [2024-06-06 12:38:26,562][14064] Avg episode reward: [(0, '0.073')] [2024-06-06 12:38:27,006][14296] Updated weights for policy 0, policy_version 9607 (0.0040) [2024-06-06 12:38:30,425][14296] Updated weights for policy 0, policy_version 9617 (0.0030) [2024-06-06 12:38:31,564][14064] Fps is (10 sec: 47501.5, 60 sec: 49422.9, 300 sec: 41313.0). Total num frames: 157614080. Throughput: 0: 49308.3. Samples: 10708620. Policy #0 lag: (min: 0.0, avg: 11.5, max: 23.0) [2024-06-06 12:38:31,564][14064] Avg episode reward: [(0, '0.076')] [2024-06-06 12:38:33,419][14296] Updated weights for policy 0, policy_version 9627 (0.0027) [2024-06-06 12:38:36,561][14064] Fps is (10 sec: 47513.4, 60 sec: 49152.1, 300 sec: 41464.1). Total num frames: 157859840. Throughput: 0: 49059.5. Samples: 10853340. Policy #0 lag: (min: 0.0, avg: 10.0, max: 21.0) [2024-06-06 12:38:36,562][14064] Avg episode reward: [(0, '0.071')] [2024-06-06 12:38:37,050][14296] Updated weights for policy 0, policy_version 9637 (0.0035) [2024-06-06 12:38:39,785][14296] Updated weights for policy 0, policy_version 9647 (0.0026) [2024-06-06 12:38:41,561][14064] Fps is (10 sec: 50803.5, 60 sec: 49425.0, 300 sec: 41671.0). Total num frames: 158121984. Throughput: 0: 49259.9. Samples: 11155720. 
Policy #0 lag: (min: 0.0, avg: 10.0, max: 21.0) [2024-06-06 12:38:41,562][14064] Avg episode reward: [(0, '0.074')] [2024-06-06 12:38:43,623][14296] Updated weights for policy 0, policy_version 9657 (0.0027) [2024-06-06 12:38:46,419][14296] Updated weights for policy 0, policy_version 9667 (0.0030) [2024-06-06 12:38:46,561][14064] Fps is (10 sec: 52429.6, 60 sec: 49425.1, 300 sec: 41870.2). Total num frames: 158384128. Throughput: 0: 49272.5. Samples: 11452700. Policy #0 lag: (min: 0.0, avg: 10.1, max: 21.0) [2024-06-06 12:38:46,562][14064] Avg episode reward: [(0, '0.076')] [2024-06-06 12:38:50,205][14296] Updated weights for policy 0, policy_version 9677 (0.0036) [2024-06-06 12:38:51,561][14064] Fps is (10 sec: 47514.1, 60 sec: 49152.1, 300 sec: 41883.5). Total num frames: 158597120. Throughput: 0: 49200.2. Samples: 11605260. Policy #0 lag: (min: 0.0, avg: 9.2, max: 21.0) [2024-06-06 12:38:51,561][14064] Avg episode reward: [(0, '0.079')] [2024-06-06 12:38:53,165][14296] Updated weights for policy 0, policy_version 9687 (0.0032) [2024-06-06 12:38:56,561][14064] Fps is (10 sec: 45875.1, 60 sec: 49152.0, 300 sec: 42013.3). Total num frames: 158842880. Throughput: 0: 49541.0. Samples: 11902460. Policy #0 lag: (min: 0.0, avg: 9.2, max: 21.0) [2024-06-06 12:38:56,562][14064] Avg episode reward: [(0, '0.077')] [2024-06-06 12:38:56,825][14296] Updated weights for policy 0, policy_version 9697 (0.0037) [2024-06-06 12:38:59,643][14296] Updated weights for policy 0, policy_version 9707 (0.0032) [2024-06-06 12:39:01,561][14064] Fps is (10 sec: 50790.2, 60 sec: 49425.2, 300 sec: 42196.0). Total num frames: 159105024. Throughput: 0: 49504.0. Samples: 12201440. 
Policy #0 lag: (min: 0.0, avg: 9.3, max: 21.0) [2024-06-06 12:39:01,561][14064] Avg episode reward: [(0, '0.079')] [2024-06-06 12:39:03,557][14296] Updated weights for policy 0, policy_version 9717 (0.0035) [2024-06-06 12:39:06,110][14296] Updated weights for policy 0, policy_version 9727 (0.0025) [2024-06-06 12:39:06,561][14064] Fps is (10 sec: 52428.5, 60 sec: 49425.1, 300 sec: 42372.4). Total num frames: 159367168. Throughput: 0: 49761.8. Samples: 12353060. Policy #0 lag: (min: 0.0, avg: 11.1, max: 22.0) [2024-06-06 12:39:06,562][14064] Avg episode reward: [(0, '0.067')] [2024-06-06 12:39:10,214][14296] Updated weights for policy 0, policy_version 9737 (0.0023) [2024-06-06 12:39:11,561][14064] Fps is (10 sec: 49152.0, 60 sec: 49698.2, 300 sec: 42431.8). Total num frames: 159596544. Throughput: 0: 49650.4. Samples: 12643660. Policy #0 lag: (min: 0.0, avg: 11.1, max: 22.0) [2024-06-06 12:39:11,561][14064] Avg episode reward: [(0, '0.079')] [2024-06-06 12:39:12,711][14296] Updated weights for policy 0, policy_version 9747 (0.0026) [2024-06-06 12:39:16,561][14064] Fps is (10 sec: 45875.0, 60 sec: 49425.0, 300 sec: 43209.3). Total num frames: 159825920. Throughput: 0: 49655.2. Samples: 12942980. Policy #0 lag: (min: 0.0, avg: 11.1, max: 22.0) [2024-06-06 12:39:16,562][14064] Avg episode reward: [(0, '0.079')] [2024-06-06 12:39:16,911][14296] Updated weights for policy 0, policy_version 9757 (0.0027) [2024-06-06 12:39:19,538][14296] Updated weights for policy 0, policy_version 9767 (0.0026) [2024-06-06 12:39:21,561][14064] Fps is (10 sec: 49151.6, 60 sec: 49152.0, 300 sec: 43542.6). Total num frames: 160088064. Throughput: 0: 49613.9. Samples: 13085960. 
Policy #0 lag: (min: 0.0, avg: 11.1, max: 22.0) [2024-06-06 12:39:21,562][14064] Avg episode reward: [(0, '0.077')] [2024-06-06 12:39:23,274][14296] Updated weights for policy 0, policy_version 9777 (0.0031) [2024-06-06 12:39:26,151][14296] Updated weights for policy 0, policy_version 9787 (0.0028) [2024-06-06 12:39:26,561][14064] Fps is (10 sec: 54067.3, 60 sec: 49698.1, 300 sec: 44486.8). Total num frames: 160366592. Throughput: 0: 49651.9. Samples: 13390060. Policy #0 lag: (min: 0.0, avg: 10.1, max: 21.0) [2024-06-06 12:39:26,562][14064] Avg episode reward: [(0, '0.084')] [2024-06-06 12:39:26,573][14276] Saving new best policy, reward=0.084! [2024-06-06 12:39:29,966][14296] Updated weights for policy 0, policy_version 9797 (0.0028) [2024-06-06 12:39:31,013][14276] Signal inference workers to stop experience collection... (150 times) [2024-06-06 12:39:31,041][14296] InferenceWorker_p0-w0: stopping experience collection (150 times) [2024-06-06 12:39:31,128][14276] Signal inference workers to resume experience collection... (150 times) [2024-06-06 12:39:31,128][14296] InferenceWorker_p0-w0: resuming experience collection (150 times) [2024-06-06 12:39:31,561][14064] Fps is (10 sec: 52429.2, 60 sec: 49973.4, 300 sec: 45208.8). Total num frames: 160612352. Throughput: 0: 49577.3. Samples: 13683680. Policy #0 lag: (min: 0.0, avg: 9.1, max: 22.0) [2024-06-06 12:39:31,561][14064] Avg episode reward: [(0, '0.079')] [2024-06-06 12:39:32,617][14296] Updated weights for policy 0, policy_version 9807 (0.0026) [2024-06-06 12:39:36,433][14296] Updated weights for policy 0, policy_version 9817 (0.0023) [2024-06-06 12:39:36,561][14064] Fps is (10 sec: 47514.2, 60 sec: 49698.3, 300 sec: 45875.2). Total num frames: 160841728. Throughput: 0: 49716.0. Samples: 13842480. 
Policy #0 lag: (min: 0.0, avg: 9.1, max: 22.0) [2024-06-06 12:39:36,561][14064] Avg episode reward: [(0, '0.080')] [2024-06-06 12:39:38,894][14296] Updated weights for policy 0, policy_version 9827 (0.0032) [2024-06-06 12:39:41,561][14064] Fps is (10 sec: 47513.0, 60 sec: 49425.0, 300 sec: 46486.1). Total num frames: 161087488. Throughput: 0: 49779.0. Samples: 14142520. Policy #0 lag: (min: 0.0, avg: 9.8, max: 22.0) [2024-06-06 12:39:41,562][14064] Avg episode reward: [(0, '0.079')] [2024-06-06 12:39:43,014][14296] Updated weights for policy 0, policy_version 9837 (0.0034) [2024-06-06 12:39:45,775][14296] Updated weights for policy 0, policy_version 9847 (0.0023) [2024-06-06 12:39:46,561][14064] Fps is (10 sec: 50790.0, 60 sec: 49425.0, 300 sec: 46930.4). Total num frames: 161349632. Throughput: 0: 49721.7. Samples: 14438920. Policy #0 lag: (min: 0.0, avg: 11.0, max: 22.0) [2024-06-06 12:39:46,562][14064] Avg episode reward: [(0, '0.075')] [2024-06-06 12:39:49,623][14296] Updated weights for policy 0, policy_version 9857 (0.0024) [2024-06-06 12:39:51,561][14064] Fps is (10 sec: 50790.4, 60 sec: 49971.0, 300 sec: 47319.2). Total num frames: 161595392. Throughput: 0: 49747.0. Samples: 14591680. Policy #0 lag: (min: 0.0, avg: 11.0, max: 22.0) [2024-06-06 12:39:51,562][14064] Avg episode reward: [(0, '0.077')] [2024-06-06 12:39:52,362][14296] Updated weights for policy 0, policy_version 9867 (0.0034) [2024-06-06 12:39:56,103][14296] Updated weights for policy 0, policy_version 9877 (0.0027) [2024-06-06 12:39:56,561][14064] Fps is (10 sec: 49151.8, 60 sec: 49971.1, 300 sec: 47708.0). Total num frames: 161841152. Throughput: 0: 50005.2. Samples: 14893900. Policy #0 lag: (min: 0.0, avg: 11.1, max: 21.0) [2024-06-06 12:39:56,571][14064] Avg episode reward: [(0, '0.077')] [2024-06-06 12:39:58,932][14296] Updated weights for policy 0, policy_version 9887 (0.0037) [2024-06-06 12:40:01,561][14064] Fps is (10 sec: 47514.1, 60 sec: 49425.0, 300 sec: 47985.7). 
Total num frames: 162070528. Throughput: 0: 49773.4. Samples: 15182780. Policy #0 lag: (min: 0.0, avg: 11.1, max: 21.0) [2024-06-06 12:40:01,562][14064] Avg episode reward: [(0, '0.079')] [2024-06-06 12:40:02,773][14296] Updated weights for policy 0, policy_version 9897 (0.0027) [2024-06-06 12:40:05,782][14296] Updated weights for policy 0, policy_version 9907 (0.0028) [2024-06-06 12:40:06,561][14064] Fps is (10 sec: 52428.8, 60 sec: 49971.2, 300 sec: 48318.9). Total num frames: 162365440. Throughput: 0: 49781.3. Samples: 15326120. Policy #0 lag: (min: 0.0, avg: 9.0, max: 20.0) [2024-06-06 12:40:06,568][14064] Avg episode reward: [(0, '0.081')] [2024-06-06 12:40:06,584][14276] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000009910_162365440.pth... [2024-06-06 12:40:06,628][14276] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000009187_150519808.pth [2024-06-06 12:40:09,515][14296] Updated weights for policy 0, policy_version 9917 (0.0036) [2024-06-06 12:40:11,561][14064] Fps is (10 sec: 50789.8, 60 sec: 49698.0, 300 sec: 48263.4). Total num frames: 162578432. Throughput: 0: 49781.3. Samples: 15630220. Policy #0 lag: (min: 0.0, avg: 9.8, max: 20.0) [2024-06-06 12:40:11,562][14064] Avg episode reward: [(0, '0.078')] [2024-06-06 12:40:12,372][14296] Updated weights for policy 0, policy_version 9927 (0.0026) [2024-06-06 12:40:15,978][14296] Updated weights for policy 0, policy_version 9937 (0.0033) [2024-06-06 12:40:16,561][14064] Fps is (10 sec: 45875.3, 60 sec: 49971.2, 300 sec: 48430.0). Total num frames: 162824192. Throughput: 0: 49882.1. Samples: 15928380. Policy #0 lag: (min: 0.0, avg: 9.8, max: 20.0) [2024-06-06 12:40:16,562][14064] Avg episode reward: [(0, '0.076')] [2024-06-06 12:40:19,032][14296] Updated weights for policy 0, policy_version 9947 (0.0027) [2024-06-06 12:40:21,564][14064] Fps is (10 sec: 47503.1, 60 sec: 49423.2, 300 sec: 48429.6). Total num frames: 163053568. Throughput: 0: 49469.4. 
Samples: 16068720. Policy #0 lag: (min: 0.0, avg: 10.5, max: 22.0) [2024-06-06 12:40:21,564][14064] Avg episode reward: [(0, '0.079')] [2024-06-06 12:40:22,528][14296] Updated weights for policy 0, policy_version 9957 (0.0033) [2024-06-06 12:40:25,547][14296] Updated weights for policy 0, policy_version 9967 (0.0031) [2024-06-06 12:40:26,564][14064] Fps is (10 sec: 52415.3, 60 sec: 49696.0, 300 sec: 48707.3). Total num frames: 163348480. Throughput: 0: 49481.7. Samples: 16369320. Policy #0 lag: (min: 2.0, avg: 12.0, max: 24.0) [2024-06-06 12:40:26,565][14064] Avg episode reward: [(0, '0.075')] [2024-06-06 12:40:29,008][14296] Updated weights for policy 0, policy_version 9977 (0.0034) [2024-06-06 12:40:31,561][14064] Fps is (10 sec: 52441.3, 60 sec: 49425.1, 300 sec: 48707.7). Total num frames: 163577856. Throughput: 0: 49672.5. Samples: 16674180. Policy #0 lag: (min: 2.0, avg: 12.0, max: 24.0) [2024-06-06 12:40:31,562][14064] Avg episode reward: [(0, '0.081')] [2024-06-06 12:40:32,130][14296] Updated weights for policy 0, policy_version 9987 (0.0021) [2024-06-06 12:40:35,584][14296] Updated weights for policy 0, policy_version 9997 (0.0020) [2024-06-06 12:40:36,561][14064] Fps is (10 sec: 49164.9, 60 sec: 49971.2, 300 sec: 48929.9). Total num frames: 163840000. Throughput: 0: 49713.9. Samples: 16828800. Policy #0 lag: (min: 0.0, avg: 10.3, max: 21.0) [2024-06-06 12:40:36,562][14064] Avg episode reward: [(0, '0.082')] [2024-06-06 12:40:38,621][14296] Updated weights for policy 0, policy_version 10007 (0.0031) [2024-06-06 12:40:41,561][14064] Fps is (10 sec: 49152.1, 60 sec: 49698.3, 300 sec: 48985.4). Total num frames: 164069376. Throughput: 0: 49366.4. Samples: 17115380. 
Policy #0 lag: (min: 0.0, avg: 10.3, max: 21.0) [2024-06-06 12:40:41,562][14064] Avg episode reward: [(0, '0.077')] [2024-06-06 12:40:42,309][14296] Updated weights for policy 0, policy_version 10017 (0.0031) [2024-06-06 12:40:45,354][14296] Updated weights for policy 0, policy_version 10027 (0.0031) [2024-06-06 12:40:46,561][14064] Fps is (10 sec: 50790.4, 60 sec: 49971.2, 300 sec: 49207.5). Total num frames: 164347904. Throughput: 0: 49460.0. Samples: 17408480. Policy #0 lag: (min: 0.0, avg: 9.8, max: 22.0) [2024-06-06 12:40:46,562][14064] Avg episode reward: [(0, '0.083')] [2024-06-06 12:40:48,969][14296] Updated weights for policy 0, policy_version 10037 (0.0026) [2024-06-06 12:40:51,322][14276] Signal inference workers to stop experience collection... (200 times) [2024-06-06 12:40:51,361][14296] InferenceWorker_p0-w0: stopping experience collection (200 times) [2024-06-06 12:40:51,371][14276] Signal inference workers to resume experience collection... (200 times) [2024-06-06 12:40:51,380][14296] InferenceWorker_p0-w0: resuming experience collection (200 times) [2024-06-06 12:40:51,561][14064] Fps is (10 sec: 50789.8, 60 sec: 49698.2, 300 sec: 49207.5). Total num frames: 164577280. Throughput: 0: 49630.2. Samples: 17559480. Policy #0 lag: (min: 0.0, avg: 9.3, max: 21.0) [2024-06-06 12:40:51,562][14064] Avg episode reward: [(0, '0.073')] [2024-06-06 12:40:51,946][14296] Updated weights for policy 0, policy_version 10047 (0.0021) [2024-06-06 12:40:55,686][14296] Updated weights for policy 0, policy_version 10057 (0.0025) [2024-06-06 12:40:56,561][14064] Fps is (10 sec: 44236.9, 60 sec: 49152.1, 300 sec: 49207.5). Total num frames: 164790272. Throughput: 0: 49410.0. Samples: 17853660. Policy #0 lag: (min: 0.0, avg: 9.3, max: 21.0) [2024-06-06 12:40:56,562][14064] Avg episode reward: [(0, '0.086')] [2024-06-06 12:40:56,651][14276] Saving new best policy, reward=0.086! 
[2024-06-06 12:40:58,653][14296] Updated weights for policy 0, policy_version 10067 (0.0026) [2024-06-06 12:41:01,561][14064] Fps is (10 sec: 44237.2, 60 sec: 49152.0, 300 sec: 49152.0). Total num frames: 165019648. Throughput: 0: 49485.4. Samples: 18155220. Policy #0 lag: (min: 0.0, avg: 12.3, max: 24.0) [2024-06-06 12:41:01,562][14064] Avg episode reward: [(0, '0.087')] [2024-06-06 12:41:01,608][14276] Saving new best policy, reward=0.087! [2024-06-06 12:41:02,231][14296] Updated weights for policy 0, policy_version 10077 (0.0033) [2024-06-06 12:41:05,546][14296] Updated weights for policy 0, policy_version 10087 (0.0042) [2024-06-06 12:41:06,561][14064] Fps is (10 sec: 52428.1, 60 sec: 49152.0, 300 sec: 49374.2). Total num frames: 165314560. Throughput: 0: 49366.0. Samples: 18290080. Policy #0 lag: (min: 0.0, avg: 12.3, max: 24.0) [2024-06-06 12:41:06,562][14064] Avg episode reward: [(0, '0.085')] [2024-06-06 12:41:08,997][14296] Updated weights for policy 0, policy_version 10097 (0.0022) [2024-06-06 12:41:11,561][14064] Fps is (10 sec: 54066.9, 60 sec: 49698.2, 300 sec: 49374.2). Total num frames: 165560320. Throughput: 0: 49267.8. Samples: 18586240. Policy #0 lag: (min: 0.0, avg: 10.9, max: 21.0) [2024-06-06 12:41:11,562][14064] Avg episode reward: [(0, '0.082')] [2024-06-06 12:41:12,254][14296] Updated weights for policy 0, policy_version 10107 (0.0026) [2024-06-06 12:41:15,966][14296] Updated weights for policy 0, policy_version 10117 (0.0023) [2024-06-06 12:41:16,561][14064] Fps is (10 sec: 45875.2, 60 sec: 49152.0, 300 sec: 49429.7). Total num frames: 165773312. Throughput: 0: 48964.3. Samples: 18877580. Policy #0 lag: (min: 0.0, avg: 7.7, max: 20.0) [2024-06-06 12:41:16,562][14064] Avg episode reward: [(0, '0.082')] [2024-06-06 12:41:18,761][14296] Updated weights for policy 0, policy_version 10127 (0.0040) [2024-06-06 12:41:21,561][14064] Fps is (10 sec: 44236.9, 60 sec: 49153.9, 300 sec: 49429.7). Total num frames: 166002688. Throughput: 0: 48623.1. 
Samples: 19016840. Policy #0 lag: (min: 0.0, avg: 7.7, max: 20.0) [2024-06-06 12:41:21,562][14064] Avg episode reward: [(0, '0.081')] [2024-06-06 12:41:22,531][14296] Updated weights for policy 0, policy_version 10137 (0.0029) [2024-06-06 12:41:25,394][14296] Updated weights for policy 0, policy_version 10147 (0.0032) [2024-06-06 12:41:26,561][14064] Fps is (10 sec: 52428.8, 60 sec: 49154.1, 300 sec: 49596.3). Total num frames: 166297600. Throughput: 0: 48864.3. Samples: 19314280. Policy #0 lag: (min: 0.0, avg: 9.9, max: 21.0) [2024-06-06 12:41:26,562][14064] Avg episode reward: [(0, '0.082')] [2024-06-06 12:41:28,949][14296] Updated weights for policy 0, policy_version 10157 (0.0034) [2024-06-06 12:41:31,562][14064] Fps is (10 sec: 54061.6, 60 sec: 49424.2, 300 sec: 49596.2). Total num frames: 166543360. Throughput: 0: 48886.9. Samples: 19608440. Policy #0 lag: (min: 1.0, avg: 11.3, max: 21.0) [2024-06-06 12:41:31,563][14064] Avg episode reward: [(0, '0.080')] [2024-06-06 12:41:32,550][14296] Updated weights for policy 0, policy_version 10167 (0.0026) [2024-06-06 12:41:35,889][14296] Updated weights for policy 0, policy_version 10177 (0.0027) [2024-06-06 12:41:36,561][14064] Fps is (10 sec: 47514.2, 60 sec: 48879.0, 300 sec: 49540.8). Total num frames: 166772736. Throughput: 0: 49105.4. Samples: 19769220. Policy #0 lag: (min: 1.0, avg: 11.3, max: 21.0) [2024-06-06 12:41:36,562][14064] Avg episode reward: [(0, '0.088')] [2024-06-06 12:41:36,689][14276] Saving new best policy, reward=0.088! [2024-06-06 12:41:38,949][14296] Updated weights for policy 0, policy_version 10187 (0.0027) [2024-06-06 12:41:41,561][14064] Fps is (10 sec: 44241.3, 60 sec: 48605.8, 300 sec: 49374.2). Total num frames: 166985728. Throughput: 0: 48966.6. Samples: 20057160. 
Policy #0 lag: (min: 0.0, avg: 11.6, max: 22.0) [2024-06-06 12:41:41,562][14064] Avg episode reward: [(0, '0.084')] [2024-06-06 12:41:42,820][14296] Updated weights for policy 0, policy_version 10197 (0.0031) [2024-06-06 12:41:45,378][14296] Updated weights for policy 0, policy_version 10207 (0.0023) [2024-06-06 12:41:46,561][14064] Fps is (10 sec: 50790.3, 60 sec: 48879.0, 300 sec: 49596.4). Total num frames: 167280640. Throughput: 0: 48806.6. Samples: 20351520. Policy #0 lag: (min: 0.0, avg: 11.6, max: 22.0) [2024-06-06 12:41:46,562][14064] Avg episode reward: [(0, '0.085')] [2024-06-06 12:41:49,198][14296] Updated weights for policy 0, policy_version 10217 (0.0030) [2024-06-06 12:41:51,561][14064] Fps is (10 sec: 54066.8, 60 sec: 49152.0, 300 sec: 49541.2). Total num frames: 167526400. Throughput: 0: 49278.2. Samples: 20507600. Policy #0 lag: (min: 0.0, avg: 8.7, max: 21.0) [2024-06-06 12:41:51,562][14064] Avg episode reward: [(0, '0.086')] [2024-06-06 12:41:52,083][14296] Updated weights for policy 0, policy_version 10227 (0.0031) [2024-06-06 12:41:55,662][14296] Updated weights for policy 0, policy_version 10237 (0.0036) [2024-06-06 12:41:56,561][14064] Fps is (10 sec: 49152.1, 60 sec: 49698.1, 300 sec: 49540.8). Total num frames: 167772160. Throughput: 0: 49300.5. Samples: 20804760. Policy #0 lag: (min: 0.0, avg: 9.6, max: 20.0) [2024-06-06 12:41:56,561][14064] Avg episode reward: [(0, '0.083')] [2024-06-06 12:41:58,655][14296] Updated weights for policy 0, policy_version 10247 (0.0020) [2024-06-06 12:41:59,092][14276] Signal inference workers to stop experience collection... (250 times) [2024-06-06 12:41:59,094][14276] Signal inference workers to resume experience collection... 
(250 times) [2024-06-06 12:41:59,132][14296] InferenceWorker_p0-w0: stopping experience collection (250 times) [2024-06-06 12:41:59,132][14296] InferenceWorker_p0-w0: resuming experience collection (250 times) [2024-06-06 12:42:01,561][14064] Fps is (10 sec: 47514.2, 60 sec: 49698.1, 300 sec: 49429.7). Total num frames: 168001536. Throughput: 0: 49575.7. Samples: 21108480. Policy #0 lag: (min: 0.0, avg: 9.6, max: 20.0) [2024-06-06 12:42:01,561][14064] Avg episode reward: [(0, '0.082')] [2024-06-06 12:42:02,461][14296] Updated weights for policy 0, policy_version 10257 (0.0030) [2024-06-06 12:42:05,094][14296] Updated weights for policy 0, policy_version 10267 (0.0029) [2024-06-06 12:42:06,561][14064] Fps is (10 sec: 50790.4, 60 sec: 49425.2, 300 sec: 49597.1). Total num frames: 168280064. Throughput: 0: 49636.0. Samples: 21250460. Policy #0 lag: (min: 0.0, avg: 10.8, max: 22.0) [2024-06-06 12:42:06,561][14064] Avg episode reward: [(0, '0.082')] [2024-06-06 12:42:06,575][14276] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000010271_168280064.pth... [2024-06-06 12:42:06,624][14276] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000009547_156418048.pth [2024-06-06 12:42:08,907][14296] Updated weights for policy 0, policy_version 10277 (0.0031) [2024-06-06 12:42:11,426][14296] Updated weights for policy 0, policy_version 10287 (0.0034) [2024-06-06 12:42:11,562][14064] Fps is (10 sec: 54065.8, 60 sec: 49698.0, 300 sec: 49596.3). Total num frames: 168542208. Throughput: 0: 49843.9. Samples: 21557260. Policy #0 lag: (min: 0.0, avg: 10.8, max: 22.0) [2024-06-06 12:42:11,562][14064] Avg episode reward: [(0, '0.084')] [2024-06-06 12:42:15,715][14296] Updated weights for policy 0, policy_version 10297 (0.0022) [2024-06-06 12:42:16,561][14064] Fps is (10 sec: 47513.6, 60 sec: 49698.2, 300 sec: 49485.2). Total num frames: 168755200. Throughput: 0: 49935.4. Samples: 21855480. 
Policy #0 lag: (min: 1.0, avg: 12.2, max: 21.0) [2024-06-06 12:42:16,561][14064] Avg episode reward: [(0, '0.080')] [2024-06-06 12:42:18,090][14296] Updated weights for policy 0, policy_version 10307 (0.0028) [2024-06-06 12:42:21,561][14064] Fps is (10 sec: 45875.5, 60 sec: 49971.1, 300 sec: 49429.7). Total num frames: 169000960. Throughput: 0: 49598.5. Samples: 22001160. Policy #0 lag: (min: 0.0, avg: 9.1, max: 21.0) [2024-06-06 12:42:21,562][14064] Avg episode reward: [(0, '0.083')] [2024-06-06 12:42:22,111][14296] Updated weights for policy 0, policy_version 10317 (0.0027) [2024-06-06 12:42:24,794][14296] Updated weights for policy 0, policy_version 10327 (0.0019) [2024-06-06 12:42:26,561][14064] Fps is (10 sec: 50790.4, 60 sec: 49425.2, 300 sec: 49540.8). Total num frames: 169263104. Throughput: 0: 49689.8. Samples: 22293200. Policy #0 lag: (min: 0.0, avg: 9.1, max: 21.0) [2024-06-06 12:42:26,562][14064] Avg episode reward: [(0, '0.085')] [2024-06-06 12:42:28,736][14296] Updated weights for policy 0, policy_version 10337 (0.0027) [2024-06-06 12:42:31,354][14296] Updated weights for policy 0, policy_version 10347 (0.0036) [2024-06-06 12:42:31,561][14064] Fps is (10 sec: 52429.8, 60 sec: 49699.0, 300 sec: 49540.8). Total num frames: 169525248. Throughput: 0: 50008.5. Samples: 22601900. Policy #0 lag: (min: 0.0, avg: 8.8, max: 20.0) [2024-06-06 12:42:31,562][14064] Avg episode reward: [(0, '0.090')] [2024-06-06 12:42:31,573][14276] Saving new best policy, reward=0.090! [2024-06-06 12:42:35,308][14296] Updated weights for policy 0, policy_version 10357 (0.0026) [2024-06-06 12:42:36,561][14064] Fps is (10 sec: 49151.9, 60 sec: 49698.1, 300 sec: 49485.2). Total num frames: 169754624. Throughput: 0: 49941.9. Samples: 22754980. 
Policy #0 lag: (min: 0.0, avg: 8.8, max: 20.0)
[2024-06-06 12:42:36,562][14064] Avg episode reward: [(0, '0.083')]
[2024-06-06 12:42:37,719][14296] Updated weights for policy 0, policy_version 10367 (0.0028)
[2024-06-06 12:42:41,561][14064] Fps is (10 sec: 45874.7, 60 sec: 49971.2, 300 sec: 49374.2). Total num frames: 169984000. Throughput: 0: 50049.2. Samples: 23056980. Policy #0 lag: (min: 1.0, avg: 11.1, max: 23.0)
[2024-06-06 12:42:41,562][14064] Avg episode reward: [(0, '0.089')]
[2024-06-06 12:42:41,773][14296] Updated weights for policy 0, policy_version 10377 (0.0027)
[2024-06-06 12:42:44,317][14296] Updated weights for policy 0, policy_version 10387 (0.0027)
[2024-06-06 12:42:46,564][14064] Fps is (10 sec: 49138.9, 60 sec: 49422.9, 300 sec: 49484.8). Total num frames: 170246144. Throughput: 0: 49808.1. Samples: 23349980. Policy #0 lag: (min: 0.0, avg: 11.2, max: 22.0)
[2024-06-06 12:42:46,564][14064] Avg episode reward: [(0, '0.087')]
[2024-06-06 12:42:48,368][14296] Updated weights for policy 0, policy_version 10397 (0.0026)
[2024-06-06 12:42:51,064][14296] Updated weights for policy 0, policy_version 10407 (0.0023)
[2024-06-06 12:42:51,561][14064] Fps is (10 sec: 55705.9, 60 sec: 50244.3, 300 sec: 49651.8). Total num frames: 170541056. Throughput: 0: 49974.6. Samples: 23499320. Policy #0 lag: (min: 0.0, avg: 11.2, max: 22.0)
[2024-06-06 12:42:51,562][14064] Avg episode reward: [(0, '0.086')]
[2024-06-06 12:42:54,881][14296] Updated weights for policy 0, policy_version 10417 (0.0027)
[2024-06-06 12:42:56,561][14064] Fps is (10 sec: 50802.9, 60 sec: 49697.9, 300 sec: 49540.8). Total num frames: 170754048. Throughput: 0: 49859.6. Samples: 23800940. Policy #0 lag: (min: 0.0, avg: 9.4, max: 23.0)
[2024-06-06 12:42:56,562][14064] Avg episode reward: [(0, '0.089')]
[2024-06-06 12:42:57,489][14296] Updated weights for policy 0, policy_version 10427 (0.0030)
[2024-06-06 12:43:01,561][14064] Fps is (10 sec: 45875.6, 60 sec: 49971.2, 300 sec: 49485.3). Total num frames: 170999808. Throughput: 0: 49910.7. Samples: 24101460. Policy #0 lag: (min: 0.0, avg: 9.4, max: 23.0)
[2024-06-06 12:43:01,562][14064] Avg episode reward: [(0, '0.092')]
[2024-06-06 12:43:01,562][14276] Saving new best policy, reward=0.092!
[2024-06-06 12:43:01,566][14296] Updated weights for policy 0, policy_version 10437 (0.0024)
[2024-06-06 12:43:03,996][14296] Updated weights for policy 0, policy_version 10447 (0.0027)
[2024-06-06 12:43:06,561][14064] Fps is (10 sec: 49152.8, 60 sec: 49425.0, 300 sec: 49596.3). Total num frames: 171245568. Throughput: 0: 49765.0. Samples: 24240580. Policy #0 lag: (min: 0.0, avg: 9.2, max: 22.0)
[2024-06-06 12:43:06,562][14064] Avg episode reward: [(0, '0.091')]
[2024-06-06 12:43:08,223][14296] Updated weights for policy 0, policy_version 10457 (0.0032)
[2024-06-06 12:43:11,030][14296] Updated weights for policy 0, policy_version 10467 (0.0023)
[2024-06-06 12:43:11,298][14276] Signal inference workers to stop experience collection... (300 times)
[2024-06-06 12:43:11,343][14296] InferenceWorker_p0-w0: stopping experience collection (300 times)
[2024-06-06 12:43:11,363][14276] Signal inference workers to resume experience collection... (300 times)
[2024-06-06 12:43:11,363][14296] InferenceWorker_p0-w0: resuming experience collection (300 times)
[2024-06-06 12:43:11,561][14064] Fps is (10 sec: 54067.1, 60 sec: 49971.4, 300 sec: 49762.9). Total num frames: 171540480. Throughput: 0: 49896.5. Samples: 24538540. Policy #0 lag: (min: 1.0, avg: 12.7, max: 24.0)
[2024-06-06 12:43:11,562][14064] Avg episode reward: [(0, '0.088')]
[2024-06-06 12:43:14,772][14296] Updated weights for policy 0, policy_version 10477 (0.0030)
[2024-06-06 12:43:16,561][14064] Fps is (10 sec: 49152.3, 60 sec: 49698.1, 300 sec: 49485.3). Total num frames: 171737088. Throughput: 0: 49617.3. Samples: 24834680. Policy #0 lag: (min: 1.0, avg: 12.7, max: 24.0)
[2024-06-06 12:43:16,561][14064] Avg episode reward: [(0, '0.093')]
[2024-06-06 12:43:16,611][14276] Saving new best policy, reward=0.093!
[2024-06-06 12:43:17,696][14296] Updated weights for policy 0, policy_version 10487 (0.0029)
[2024-06-06 12:43:21,347][14296] Updated weights for policy 0, policy_version 10497 (0.0031)
[2024-06-06 12:43:21,561][14064] Fps is (10 sec: 44236.4, 60 sec: 49698.2, 300 sec: 49485.2). Total num frames: 171982848. Throughput: 0: 49404.8. Samples: 24978200. Policy #0 lag: (min: 0.0, avg: 9.3, max: 21.0)
[2024-06-06 12:43:21,562][14064] Avg episode reward: [(0, '0.085')]
[2024-06-06 12:43:24,230][14296] Updated weights for policy 0, policy_version 10507 (0.0023)
[2024-06-06 12:43:26,564][14064] Fps is (10 sec: 49138.9, 60 sec: 49422.9, 300 sec: 49540.8). Total num frames: 172228608. Throughput: 0: 49222.1. Samples: 25272100. Policy #0 lag: (min: 0.0, avg: 9.3, max: 21.0)
[2024-06-06 12:43:26,564][14064] Avg episode reward: [(0, '0.091')]
[2024-06-06 12:43:28,085][14296] Updated weights for policy 0, policy_version 10517 (0.0035)
[2024-06-06 12:43:31,103][14296] Updated weights for policy 0, policy_version 10527 (0.0033)
[2024-06-06 12:43:31,561][14064] Fps is (10 sec: 52428.2, 60 sec: 49698.0, 300 sec: 49651.8). Total num frames: 172507136. Throughput: 0: 49195.2. Samples: 25563640. Policy #0 lag: (min: 0.0, avg: 9.4, max: 21.0)
[2024-06-06 12:43:31,562][14064] Avg episode reward: [(0, '0.081')]
[2024-06-06 12:43:34,836][14296] Updated weights for policy 0, policy_version 10537 (0.0031)
[2024-06-06 12:43:36,561][14064] Fps is (10 sec: 47526.3, 60 sec: 49152.0, 300 sec: 49429.7). Total num frames: 172703744. Throughput: 0: 49197.4. Samples: 25713200. Policy #0 lag: (min: 0.0, avg: 9.4, max: 21.0)
[2024-06-06 12:43:36,561][14064] Avg episode reward: [(0, '0.093')]
[2024-06-06 12:43:37,819][14296] Updated weights for policy 0, policy_version 10547 (0.0026)
[2024-06-06 12:43:41,373][14296] Updated weights for policy 0, policy_version 10557 (0.0027)
[2024-06-06 12:43:41,561][14064] Fps is (10 sec: 45876.0, 60 sec: 49698.2, 300 sec: 49429.7). Total num frames: 172965888. Throughput: 0: 49102.0. Samples: 26010520. Policy #0 lag: (min: 0.0, avg: 10.1, max: 24.0)
[2024-06-06 12:43:41,561][14064] Avg episode reward: [(0, '0.087')]
[2024-06-06 12:43:44,487][14296] Updated weights for policy 0, policy_version 10567 (0.0025)
[2024-06-06 12:43:46,561][14064] Fps is (10 sec: 49152.0, 60 sec: 49154.2, 300 sec: 49485.2). Total num frames: 173195264. Throughput: 0: 48754.6. Samples: 26295420. Policy #0 lag: (min: 0.0, avg: 11.1, max: 23.0)
[2024-06-06 12:43:46,562][14064] Avg episode reward: [(0, '0.090')]
[2024-06-06 12:43:48,050][14296] Updated weights for policy 0, policy_version 10577 (0.0032)
[2024-06-06 12:43:51,121][14296] Updated weights for policy 0, policy_version 10587 (0.0033)
[2024-06-06 12:43:51,561][14064] Fps is (10 sec: 50790.7, 60 sec: 48879.0, 300 sec: 49596.3). Total num frames: 173473792. Throughput: 0: 49113.5. Samples: 26450680. Policy #0 lag: (min: 0.0, avg: 11.1, max: 23.0)
[2024-06-06 12:43:51,561][14064] Avg episode reward: [(0, '0.093')]
[2024-06-06 12:43:54,841][14296] Updated weights for policy 0, policy_version 10597 (0.0036)
[2024-06-06 12:43:56,561][14064] Fps is (10 sec: 49152.3, 60 sec: 48879.2, 300 sec: 49429.7). Total num frames: 173686784. Throughput: 0: 48992.9. Samples: 26743220. Policy #0 lag: (min: 0.0, avg: 9.3, max: 22.0)
[2024-06-06 12:43:56,561][14064] Avg episode reward: [(0, '0.090')]
[2024-06-06 12:43:57,904][14296] Updated weights for policy 0, policy_version 10607 (0.0031)
[2024-06-06 12:44:01,505][14296] Updated weights for policy 0, policy_version 10617 (0.0043)
[2024-06-06 12:44:01,561][14064] Fps is (10 sec: 47512.8, 60 sec: 49151.9, 300 sec: 49429.7). Total num frames: 173948928. Throughput: 0: 48766.5. Samples: 27029180. Policy #0 lag: (min: 0.0, avg: 9.3, max: 22.0)
[2024-06-06 12:44:01,562][14064] Avg episode reward: [(0, '0.091')]
[2024-06-06 12:44:04,610][14296] Updated weights for policy 0, policy_version 10627 (0.0023)
[2024-06-06 12:44:06,561][14064] Fps is (10 sec: 49151.6, 60 sec: 48879.0, 300 sec: 49429.7). Total num frames: 174178304. Throughput: 0: 48846.7. Samples: 27176300. Policy #0 lag: (min: 0.0, avg: 9.1, max: 21.0)
[2024-06-06 12:44:06,562][14064] Avg episode reward: [(0, '0.086')]
[2024-06-06 12:44:06,568][14276] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000010631_174178304.pth...
[2024-06-06 12:44:06,616][14276] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000009910_162365440.pth
[2024-06-06 12:44:08,091][14296] Updated weights for policy 0, policy_version 10637 (0.0025)
[2024-06-06 12:44:11,302][14296] Updated weights for policy 0, policy_version 10647 (0.0031)
[2024-06-06 12:44:11,561][14064] Fps is (10 sec: 49152.6, 60 sec: 48332.8, 300 sec: 49540.8). Total num frames: 174440448. Throughput: 0: 48902.9. Samples: 27472600. Policy #0 lag: (min: 2.0, avg: 10.8, max: 22.0)
[2024-06-06 12:44:11,561][14064] Avg episode reward: [(0, '0.089')]
[2024-06-06 12:44:15,022][14296] Updated weights for policy 0, policy_version 10657 (0.0031)
[2024-06-06 12:44:16,561][14064] Fps is (10 sec: 47513.7, 60 sec: 48605.9, 300 sec: 49374.2). Total num frames: 174653440. Throughput: 0: 48887.7. Samples: 27763580. Policy #0 lag: (min: 2.0, avg: 10.8, max: 22.0)
[2024-06-06 12:44:16,561][14064] Avg episode reward: [(0, '0.091')]
[2024-06-06 12:44:17,949][14276] Signal inference workers to stop experience collection... (350 times)
[2024-06-06 12:44:17,954][14276] Signal inference workers to resume experience collection... (350 times)
[2024-06-06 12:44:18,005][14296] InferenceWorker_p0-w0: stopping experience collection (350 times)
[2024-06-06 12:44:18,005][14296] InferenceWorker_p0-w0: resuming experience collection (350 times)
[2024-06-06 12:44:18,093][14296] Updated weights for policy 0, policy_version 10667 (0.0026)
[2024-06-06 12:44:21,561][14064] Fps is (10 sec: 47513.3, 60 sec: 48879.0, 300 sec: 49318.6). Total num frames: 174915584. Throughput: 0: 48608.8. Samples: 27900600. Policy #0 lag: (min: 0.0, avg: 11.0, max: 23.0)
[2024-06-06 12:44:21,562][14064] Avg episode reward: [(0, '0.092')]
[2024-06-06 12:44:21,741][14296] Updated weights for policy 0, policy_version 10677 (0.0023)
[2024-06-06 12:44:24,816][14296] Updated weights for policy 0, policy_version 10687 (0.0029)
[2024-06-06 12:44:26,561][14064] Fps is (10 sec: 50790.3, 60 sec: 48881.1, 300 sec: 49318.6). Total num frames: 175161344. Throughput: 0: 48368.0. Samples: 28187080. Policy #0 lag: (min: 0.0, avg: 11.0, max: 23.0)
[2024-06-06 12:44:26,562][14064] Avg episode reward: [(0, '0.090')]
[2024-06-06 12:44:28,304][14296] Updated weights for policy 0, policy_version 10697 (0.0038)
[2024-06-06 12:44:31,450][14296] Updated weights for policy 0, policy_version 10707 (0.0024)
[2024-06-06 12:44:31,561][14064] Fps is (10 sec: 50790.5, 60 sec: 48606.0, 300 sec: 49429.7). Total num frames: 175423488. Throughput: 0: 48761.3. Samples: 28489680. Policy #0 lag: (min: 1.0, avg: 9.1, max: 22.0)
[2024-06-06 12:44:31,562][14064] Avg episode reward: [(0, '0.086')]
[2024-06-06 12:44:34,886][14296] Updated weights for policy 0, policy_version 10717 (0.0022)
[2024-06-06 12:44:36,561][14064] Fps is (10 sec: 47513.3, 60 sec: 48878.9, 300 sec: 49318.6). Total num frames: 175636480. Throughput: 0: 48773.2. Samples: 28645480. Policy #0 lag: (min: 0.0, avg: 9.2, max: 22.0)
[2024-06-06 12:44:36,562][14064] Avg episode reward: [(0, '0.091')]
[2024-06-06 12:44:38,101][14296] Updated weights for policy 0, policy_version 10727 (0.0025)
[2024-06-06 12:44:41,561][14064] Fps is (10 sec: 47513.6, 60 sec: 48878.9, 300 sec: 49318.6). Total num frames: 175898624. Throughput: 0: 48915.5. Samples: 28944420. Policy #0 lag: (min: 0.0, avg: 9.2, max: 22.0)
[2024-06-06 12:44:41,562][14064] Avg episode reward: [(0, '0.087')]
[2024-06-06 12:44:41,701][14296] Updated weights for policy 0, policy_version 10737 (0.0034)
[2024-06-06 12:44:44,710][14296] Updated weights for policy 0, policy_version 10747 (0.0030)
[2024-06-06 12:44:46,561][14064] Fps is (10 sec: 50790.7, 60 sec: 49152.0, 300 sec: 49318.6). Total num frames: 176144384. Throughput: 0: 48986.8. Samples: 29233580. Policy #0 lag: (min: 0.0, avg: 12.6, max: 21.0)
[2024-06-06 12:44:46,562][14064] Avg episode reward: [(0, '0.092')]
[2024-06-06 12:44:48,530][14296] Updated weights for policy 0, policy_version 10757 (0.0028)
[2024-06-06 12:44:51,405][14296] Updated weights for policy 0, policy_version 10767 (0.0028)
[2024-06-06 12:44:51,561][14064] Fps is (10 sec: 52428.4, 60 sec: 49151.9, 300 sec: 49429.7). Total num frames: 176422912. Throughput: 0: 49131.9. Samples: 29387240. Policy #0 lag: (min: 0.0, avg: 12.6, max: 21.0)
[2024-06-06 12:44:51,562][14064] Avg episode reward: [(0, '0.091')]
[2024-06-06 12:44:55,077][14296] Updated weights for policy 0, policy_version 10777 (0.0024)
[2024-06-06 12:44:56,561][14064] Fps is (10 sec: 47513.1, 60 sec: 48878.8, 300 sec: 49318.6). Total num frames: 176619520. Throughput: 0: 48977.2. Samples: 29676580. Policy #0 lag: (min: 0.0, avg: 10.5, max: 21.0)
[2024-06-06 12:44:56,562][14064] Avg episode reward: [(0, '0.090')]
[2024-06-06 12:44:58,116][14296] Updated weights for policy 0, policy_version 10787 (0.0026)
[2024-06-06 12:45:01,494][14296] Updated weights for policy 0, policy_version 10797 (0.0026)
[2024-06-06 12:45:01,561][14064] Fps is (10 sec: 47513.8, 60 sec: 49152.0, 300 sec: 49263.1). Total num frames: 176898048. Throughput: 0: 49198.6. Samples: 29977520. Policy #0 lag: (min: 0.0, avg: 10.5, max: 21.0)
[2024-06-06 12:45:01,562][14064] Avg episode reward: [(0, '0.084')]
[2024-06-06 12:45:04,550][14296] Updated weights for policy 0, policy_version 10807 (0.0031)
[2024-06-06 12:45:06,561][14064] Fps is (10 sec: 50791.0, 60 sec: 49152.0, 300 sec: 49318.6). Total num frames: 177127424. Throughput: 0: 49435.2. Samples: 30125180. Policy #0 lag: (min: 0.0, avg: 9.4, max: 21.0)
[2024-06-06 12:45:06,562][14064] Avg episode reward: [(0, '0.085')]
[2024-06-06 12:45:07,372][14276] Signal inference workers to stop experience collection... (400 times)
[2024-06-06 12:45:07,373][14276] Signal inference workers to resume experience collection... (400 times)
[2024-06-06 12:45:07,397][14296] InferenceWorker_p0-w0: stopping experience collection (400 times)
[2024-06-06 12:45:07,397][14296] InferenceWorker_p0-w0: resuming experience collection (400 times)
[2024-06-06 12:45:08,238][14296] Updated weights for policy 0, policy_version 10817 (0.0025)
[2024-06-06 12:45:11,319][14296] Updated weights for policy 0, policy_version 10827 (0.0022)
[2024-06-06 12:45:11,561][14064] Fps is (10 sec: 50790.1, 60 sec: 49425.0, 300 sec: 49429.7). Total num frames: 177405952. Throughput: 0: 49794.1. Samples: 30427820. Policy #0 lag: (min: 1.0, avg: 11.0, max: 22.0)
[2024-06-06 12:45:11,562][14064] Avg episode reward: [(0, '0.095')]
[2024-06-06 12:45:11,563][14276] Saving new best policy, reward=0.095!
[2024-06-06 12:45:14,854][14296] Updated weights for policy 0, policy_version 10837 (0.0028)
[2024-06-06 12:45:16,564][14064] Fps is (10 sec: 49138.9, 60 sec: 49422.9, 300 sec: 49374.1). Total num frames: 177618944. Throughput: 0: 49696.7. Samples: 30726160. Policy #0 lag: (min: 1.0, avg: 11.0, max: 22.0)
[2024-06-06 12:45:16,564][14064] Avg episode reward: [(0, '0.089')]
[2024-06-06 12:45:17,846][14296] Updated weights for policy 0, policy_version 10847 (0.0026)
[2024-06-06 12:45:21,453][14296] Updated weights for policy 0, policy_version 10857 (0.0025)
[2024-06-06 12:45:21,564][14064] Fps is (10 sec: 47501.6, 60 sec: 49422.9, 300 sec: 49263.1). Total num frames: 177881088. Throughput: 0: 49327.0. Samples: 30865320. Policy #0 lag: (min: 0.0, avg: 10.2, max: 21.0)
[2024-06-06 12:45:21,564][14064] Avg episode reward: [(0, '0.090')]
[2024-06-06 12:45:24,414][14296] Updated weights for policy 0, policy_version 10867 (0.0038)
[2024-06-06 12:45:26,561][14064] Fps is (10 sec: 50804.0, 60 sec: 49425.1, 300 sec: 49318.6). Total num frames: 178126848. Throughput: 0: 49196.9. Samples: 31158280. Policy #0 lag: (min: 0.0, avg: 10.2, max: 21.0)
[2024-06-06 12:45:26,562][14064] Avg episode reward: [(0, '0.089')]
[2024-06-06 12:45:28,079][14296] Updated weights for policy 0, policy_version 10877 (0.0027)
[2024-06-06 12:45:31,157][14296] Updated weights for policy 0, policy_version 10887 (0.0038)
[2024-06-06 12:45:31,561][14064] Fps is (10 sec: 50803.9, 60 sec: 49425.1, 300 sec: 49318.6). Total num frames: 178388992. Throughput: 0: 49364.0. Samples: 31454960. Policy #0 lag: (min: 0.0, avg: 8.9, max: 21.0)
[2024-06-06 12:45:31,561][14064] Avg episode reward: [(0, '0.087')]
[2024-06-06 12:45:34,752][14296] Updated weights for policy 0, policy_version 10897 (0.0038)
[2024-06-06 12:45:36,561][14064] Fps is (10 sec: 47513.1, 60 sec: 49425.0, 300 sec: 49263.1). Total num frames: 178601984. Throughput: 0: 49390.7. Samples: 31609820. Policy #0 lag: (min: 0.0, avg: 8.9, max: 21.0)
[2024-06-06 12:45:36,562][14064] Avg episode reward: [(0, '0.095')]
[2024-06-06 12:45:37,793][14296] Updated weights for policy 0, policy_version 10907 (0.0030)
[2024-06-06 12:45:41,561][14064] Fps is (10 sec: 45874.9, 60 sec: 49152.0, 300 sec: 49152.0). Total num frames: 178847744. Throughput: 0: 49587.6. Samples: 31908020. Policy #0 lag: (min: 3.0, avg: 11.3, max: 22.0)
[2024-06-06 12:45:41,562][14064] Avg episode reward: [(0, '0.086')]
[2024-06-06 12:45:41,644][14296] Updated weights for policy 0, policy_version 10917 (0.0032)
[2024-06-06 12:45:44,459][14296] Updated weights for policy 0, policy_version 10927 (0.0028)
[2024-06-06 12:45:46,561][14064] Fps is (10 sec: 50791.2, 60 sec: 49425.1, 300 sec: 49263.1). Total num frames: 179109888. Throughput: 0: 49337.9. Samples: 32197720. Policy #0 lag: (min: 0.0, avg: 10.5, max: 21.0)
[2024-06-06 12:45:46,561][14064] Avg episode reward: [(0, '0.084')]
[2024-06-06 12:45:48,156][14296] Updated weights for policy 0, policy_version 10937 (0.0023)
[2024-06-06 12:45:51,033][14296] Updated weights for policy 0, policy_version 10947 (0.0027)
[2024-06-06 12:45:51,561][14064] Fps is (10 sec: 52428.2, 60 sec: 49152.0, 300 sec: 49429.7). Total num frames: 179372032. Throughput: 0: 49510.9. Samples: 32353180. Policy #0 lag: (min: 0.0, avg: 10.5, max: 21.0)
[2024-06-06 12:45:51,562][14064] Avg episode reward: [(0, '0.097')]
[2024-06-06 12:45:51,562][14276] Saving new best policy, reward=0.097!
[2024-06-06 12:45:54,674][14296] Updated weights for policy 0, policy_version 10957 (0.0030)
[2024-06-06 12:45:56,561][14064] Fps is (10 sec: 45874.7, 60 sec: 49152.1, 300 sec: 49318.6). Total num frames: 179568640. Throughput: 0: 49378.8. Samples: 32649860. Policy #0 lag: (min: 0.0, avg: 9.0, max: 21.0)
[2024-06-06 12:45:56,562][14064] Avg episode reward: [(0, '0.092')]
[2024-06-06 12:45:57,657][14296] Updated weights for policy 0, policy_version 10967 (0.0026)
[2024-06-06 12:46:01,441][14296] Updated weights for policy 0, policy_version 10977 (0.0021)
[2024-06-06 12:46:01,561][14064] Fps is (10 sec: 47513.8, 60 sec: 49152.0, 300 sec: 49263.1). Total num frames: 179847168. Throughput: 0: 49110.3. Samples: 32936000. Policy #0 lag: (min: 0.0, avg: 9.0, max: 21.0)
[2024-06-06 12:46:01,562][14064] Avg episode reward: [(0, '0.093')]
[2024-06-06 12:46:04,450][14296] Updated weights for policy 0, policy_version 10987 (0.0041)
[2024-06-06 12:46:06,561][14064] Fps is (10 sec: 52428.7, 60 sec: 49425.0, 300 sec: 49263.1). Total num frames: 180092928. Throughput: 0: 49187.7. Samples: 33078640. Policy #0 lag: (min: 2.0, avg: 12.0, max: 23.0)
[2024-06-06 12:46:06,562][14064] Avg episode reward: [(0, '0.091')]
[2024-06-06 12:46:06,572][14276] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000010992_180092928.pth...
[2024-06-06 12:46:06,628][14276] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000010271_168280064.pth
[2024-06-06 12:46:08,136][14296] Updated weights for policy 0, policy_version 10997 (0.0048)
[2024-06-06 12:46:11,051][14296] Updated weights for policy 0, policy_version 11007 (0.0027)
[2024-06-06 12:46:11,561][14064] Fps is (10 sec: 50791.1, 60 sec: 49152.1, 300 sec: 49429.7). Total num frames: 180355072. Throughput: 0: 49364.0. Samples: 33379660. Policy #0 lag: (min: 2.0, avg: 12.0, max: 23.0)
[2024-06-06 12:46:11,562][14064] Avg episode reward: [(0, '0.089')]
[2024-06-06 12:46:14,595][14276] Signal inference workers to stop experience collection... (450 times)
[2024-06-06 12:46:14,596][14276] Signal inference workers to resume experience collection... (450 times)
[2024-06-06 12:46:14,639][14296] InferenceWorker_p0-w0: stopping experience collection (450 times)
[2024-06-06 12:46:14,639][14296] InferenceWorker_p0-w0: resuming experience collection (450 times)
[2024-06-06 12:46:14,730][14296] Updated weights for policy 0, policy_version 11017 (0.0031)
[2024-06-06 12:46:16,561][14064] Fps is (10 sec: 47514.0, 60 sec: 49154.2, 300 sec: 49374.2). Total num frames: 180568064. Throughput: 0: 49456.0. Samples: 33680480. Policy #0 lag: (min: 1.0, avg: 10.8, max: 21.0)
[2024-06-06 12:46:16,561][14064] Avg episode reward: [(0, '0.093')]
[2024-06-06 12:46:17,590][14296] Updated weights for policy 0, policy_version 11027 (0.0034)
[2024-06-06 12:46:21,216][14296] Updated weights for policy 0, policy_version 11037 (0.0029)
[2024-06-06 12:46:21,564][14064] Fps is (10 sec: 47500.9, 60 sec: 49152.0, 300 sec: 49262.7). Total num frames: 180830208. Throughput: 0: 49303.9. Samples: 33828620. Policy #0 lag: (min: 1.0, avg: 10.8, max: 21.0)
[2024-06-06 12:46:21,565][14064] Avg episode reward: [(0, '0.093')]
[2024-06-06 12:46:24,225][14296] Updated weights for policy 0, policy_version 11047 (0.0025)
[2024-06-06 12:46:26,561][14064] Fps is (10 sec: 50790.2, 60 sec: 49152.0, 300 sec: 49263.2). Total num frames: 181075968. Throughput: 0: 49033.4. Samples: 34114520. Policy #0 lag: (min: 0.0, avg: 10.6, max: 21.0)
[2024-06-06 12:46:26,562][14064] Avg episode reward: [(0, '0.094')]
[2024-06-06 12:46:27,930][14296] Updated weights for policy 0, policy_version 11057 (0.0024)
[2024-06-06 12:46:30,931][14296] Updated weights for policy 0, policy_version 11067 (0.0028)
[2024-06-06 12:46:31,561][14064] Fps is (10 sec: 50803.9, 60 sec: 49152.0, 300 sec: 49374.2). Total num frames: 181338112. Throughput: 0: 49201.7. Samples: 34411800. Policy #0 lag: (min: 1.0, avg: 11.2, max: 23.0)
[2024-06-06 12:46:31,561][14064] Avg episode reward: [(0, '0.094')]
[2024-06-06 12:46:34,592][14296] Updated weights for policy 0, policy_version 11077 (0.0024)
[2024-06-06 12:46:36,561][14064] Fps is (10 sec: 49152.2, 60 sec: 49425.2, 300 sec: 49429.7). Total num frames: 181567488. Throughput: 0: 49072.2. Samples: 34561420. Policy #0 lag: (min: 1.0, avg: 11.2, max: 23.0)
[2024-06-06 12:46:36,562][14064] Avg episode reward: [(0, '0.097')]
[2024-06-06 12:46:37,766][14296] Updated weights for policy 0, policy_version 11087 (0.0025)
[2024-06-06 12:46:41,076][14296] Updated weights for policy 0, policy_version 11097 (0.0024)
[2024-06-06 12:46:41,561][14064] Fps is (10 sec: 47513.2, 60 sec: 49425.0, 300 sec: 49263.1). Total num frames: 181813248. Throughput: 0: 49181.3. Samples: 34863020. Policy #0 lag: (min: 0.0, avg: 10.4, max: 23.0)
[2024-06-06 12:46:41,562][14064] Avg episode reward: [(0, '0.093')]
[2024-06-06 12:46:44,470][14296] Updated weights for policy 0, policy_version 11107 (0.0022)
[2024-06-06 12:46:46,561][14064] Fps is (10 sec: 49151.4, 60 sec: 49151.9, 300 sec: 49263.1). Total num frames: 182059008. Throughput: 0: 49312.5. Samples: 35155060. Policy #0 lag: (min: 0.0, avg: 10.4, max: 23.0)
[2024-06-06 12:46:46,562][14064] Avg episode reward: [(0, '0.093')]
[2024-06-06 12:46:47,738][14296] Updated weights for policy 0, policy_version 11117 (0.0023)
[2024-06-06 12:46:51,081][14296] Updated weights for policy 0, policy_version 11127 (0.0025)
[2024-06-06 12:46:51,561][14064] Fps is (10 sec: 50790.9, 60 sec: 49152.2, 300 sec: 49318.6). Total num frames: 182321152. Throughput: 0: 49521.0. Samples: 35307080. Policy #0 lag: (min: 0.0, avg: 10.7, max: 23.0)
[2024-06-06 12:46:51,561][14064] Avg episode reward: [(0, '0.087')]
[2024-06-06 12:46:54,574][14296] Updated weights for policy 0, policy_version 11137 (0.0022)
[2024-06-06 12:46:56,561][14064] Fps is (10 sec: 45875.7, 60 sec: 49152.0, 300 sec: 49207.5). Total num frames: 182517760. Throughput: 0: 49262.6. Samples: 35596480. Policy #0 lag: (min: 0.0, avg: 10.7, max: 23.0)
[2024-06-06 12:46:56,562][14064] Avg episode reward: [(0, '0.093')]
[2024-06-06 12:46:58,038][14296] Updated weights for policy 0, policy_version 11147 (0.0019)
[2024-06-06 12:47:01,474][14296] Updated weights for policy 0, policy_version 11157 (0.0024)
[2024-06-06 12:47:01,561][14064] Fps is (10 sec: 47513.4, 60 sec: 49152.1, 300 sec: 49207.5). Total num frames: 182796288. Throughput: 0: 48957.8. Samples: 35883580. Policy #0 lag: (min: 1.0, avg: 8.5, max: 21.0)
[2024-06-06 12:47:01,562][14064] Avg episode reward: [(0, '0.092')]
[2024-06-06 12:47:04,814][14296] Updated weights for policy 0, policy_version 11167 (0.0038)
[2024-06-06 12:47:06,562][14064] Fps is (10 sec: 52425.7, 60 sec: 49151.6, 300 sec: 49151.9). Total num frames: 183042048. Throughput: 0: 48891.1. Samples: 36028620. Policy #0 lag: (min: 0.0, avg: 10.7, max: 23.0)
[2024-06-06 12:47:06,562][14064] Avg episode reward: [(0, '0.094')]
[2024-06-06 12:47:08,037][14296] Updated weights for policy 0, policy_version 11177 (0.0028)
[2024-06-06 12:47:11,371][14296] Updated weights for policy 0, policy_version 11187 (0.0031)
[2024-06-06 12:47:11,561][14064] Fps is (10 sec: 49151.4, 60 sec: 48878.8, 300 sec: 49263.1). Total num frames: 183287808. Throughput: 0: 49103.9. Samples: 36324200. Policy #0 lag: (min: 0.0, avg: 10.7, max: 23.0)
[2024-06-06 12:47:11,562][14064] Avg episode reward: [(0, '0.089')]
[2024-06-06 12:47:14,762][14296] Updated weights for policy 0, policy_version 11197 (0.0037)
[2024-06-06 12:47:16,561][14064] Fps is (10 sec: 45877.1, 60 sec: 48878.8, 300 sec: 49152.0). Total num frames: 183500800. Throughput: 0: 48996.7. Samples: 36616660. Policy #0 lag: (min: 0.0, avg: 10.8, max: 22.0)
[2024-06-06 12:47:16,562][14064] Avg episode reward: [(0, '0.091')]
[2024-06-06 12:47:18,210][14296] Updated weights for policy 0, policy_version 11207 (0.0036)
[2024-06-06 12:47:21,561][14064] Fps is (10 sec: 47513.7, 60 sec: 48881.0, 300 sec: 49152.0). Total num frames: 183762944. Throughput: 0: 48837.6. Samples: 36759120. Policy #0 lag: (min: 0.0, avg: 10.8, max: 22.0)
[2024-06-06 12:47:21,562][14064] Avg episode reward: [(0, '0.091')]
[2024-06-06 12:47:21,625][14296] Updated weights for policy 0, policy_version 11217 (0.0027)
[2024-06-06 12:47:24,954][14296] Updated weights for policy 0, policy_version 11227 (0.0032)
[2024-06-06 12:47:26,561][14064] Fps is (10 sec: 50791.4, 60 sec: 48879.0, 300 sec: 49096.5). Total num frames: 184008704. Throughput: 0: 48466.3. Samples: 37044000. Policy #0 lag: (min: 0.0, avg: 9.4, max: 20.0)
[2024-06-06 12:47:26,561][14064] Avg episode reward: [(0, '0.099')]
[2024-06-06 12:47:26,615][14276] Saving new best policy, reward=0.099!
[2024-06-06 12:47:28,602][14296] Updated weights for policy 0, policy_version 11237 (0.0028)
[2024-06-06 12:47:31,431][14276] Signal inference workers to stop experience collection... (500 times)
[2024-06-06 12:47:31,432][14276] Signal inference workers to resume experience collection... (500 times)
[2024-06-06 12:47:31,452][14296] InferenceWorker_p0-w0: stopping experience collection (500 times)
[2024-06-06 12:47:31,453][14296] InferenceWorker_p0-w0: resuming experience collection (500 times)
[2024-06-06 12:47:31,561][14064] Fps is (10 sec: 49152.5, 60 sec: 48605.9, 300 sec: 49152.0). Total num frames: 184254464. Throughput: 0: 48636.6. Samples: 37343700. Policy #0 lag: (min: 0.0, avg: 9.4, max: 20.0)
[2024-06-06 12:47:31,567][14064] Avg episode reward: [(0, '0.098')]
[2024-06-06 12:47:31,587][14296] Updated weights for policy 0, policy_version 11247 (0.0023)
[2024-06-06 12:47:34,957][14296] Updated weights for policy 0, policy_version 11257 (0.0027)
[2024-06-06 12:47:36,561][14064] Fps is (10 sec: 47513.8, 60 sec: 48605.9, 300 sec: 49152.0). Total num frames: 184483840. Throughput: 0: 48620.5. Samples: 37495000. Policy #0 lag: (min: 0.0, avg: 10.9, max: 26.0)
[2024-06-06 12:47:36,561][14064] Avg episode reward: [(0, '0.100')]
[2024-06-06 12:47:36,584][14276] Saving new best policy, reward=0.100!
[2024-06-06 12:47:38,357][14296] Updated weights for policy 0, policy_version 11267 (0.0025)
[2024-06-06 12:47:41,502][14296] Updated weights for policy 0, policy_version 11277 (0.0030)
[2024-06-06 12:47:41,561][14064] Fps is (10 sec: 50789.5, 60 sec: 49151.9, 300 sec: 49208.0). Total num frames: 184762368. Throughput: 0: 48625.6. Samples: 37784640. Policy #0 lag: (min: 0.0, avg: 10.9, max: 26.0)
[2024-06-06 12:47:41,562][14064] Avg episode reward: [(0, '0.095')]
[2024-06-06 12:47:44,975][14296] Updated weights for policy 0, policy_version 11287 (0.0035)
[2024-06-06 12:47:46,561][14064] Fps is (10 sec: 50790.3, 60 sec: 48879.0, 300 sec: 48985.4). Total num frames: 184991744. Throughput: 0: 48936.5. Samples: 38085720. Policy #0 lag: (min: 0.0, avg: 10.0, max: 20.0)
[2024-06-06 12:47:46,561][14064] Avg episode reward: [(0, '0.094')]
[2024-06-06 12:47:48,219][14296] Updated weights for policy 0, policy_version 11297 (0.0030)
[2024-06-06 12:47:51,561][14064] Fps is (10 sec: 47514.5, 60 sec: 48605.9, 300 sec: 49096.5). Total num frames: 185237504. Throughput: 0: 49021.1. Samples: 38234540. Policy #0 lag: (min: 0.0, avg: 8.7, max: 21.0)
[2024-06-06 12:47:51,562][14064] Avg episode reward: [(0, '0.094')]
[2024-06-06 12:47:51,649][14296] Updated weights for policy 0, policy_version 11307 (0.0022)
[2024-06-06 12:47:55,061][14296] Updated weights for policy 0, policy_version 11317 (0.0024)
[2024-06-06 12:47:56,561][14064] Fps is (10 sec: 49151.7, 60 sec: 49425.0, 300 sec: 49096.4). Total num frames: 185483264. Throughput: 0: 49008.1. Samples: 38529560. Policy #0 lag: (min: 0.0, avg: 8.7, max: 21.0)
[2024-06-06 12:47:56,562][14064] Avg episode reward: [(0, '0.095')]
[2024-06-06 12:47:58,292][14296] Updated weights for policy 0, policy_version 11327 (0.0030)
[2024-06-06 12:48:01,557][14296] Updated weights for policy 0, policy_version 11337 (0.0032)
[2024-06-06 12:48:01,561][14064] Fps is (10 sec: 50789.9, 60 sec: 49151.9, 300 sec: 49152.0). Total num frames: 185745408. Throughput: 0: 49028.1. Samples: 38822920. Policy #0 lag: (min: 1.0, avg: 10.8, max: 21.0)
[2024-06-06 12:48:01,562][14064] Avg episode reward: [(0, '0.089')]
[2024-06-06 12:48:05,041][14296] Updated weights for policy 0, policy_version 11347 (0.0033)
[2024-06-06 12:48:06,561][14064] Fps is (10 sec: 50789.7, 60 sec: 49152.3, 300 sec: 48985.3). Total num frames: 185991168. Throughput: 0: 49349.2. Samples: 38979840. Policy #0 lag: (min: 1.0, avg: 10.8, max: 21.0)
[2024-06-06 12:48:06,562][14064] Avg episode reward: [(0, '0.099')]
[2024-06-06 12:48:06,574][14276] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000011352_185991168.pth...
[2024-06-06 12:48:06,621][14276] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000010631_174178304.pth
[2024-06-06 12:48:08,013][14296] Updated weights for policy 0, policy_version 11357 (0.0029)
[2024-06-06 12:48:11,561][14064] Fps is (10 sec: 47514.1, 60 sec: 48879.1, 300 sec: 49096.5). Total num frames: 186220544. Throughput: 0: 49602.7. Samples: 39276120. Policy #0 lag: (min: 0.0, avg: 10.0, max: 20.0)
[2024-06-06 12:48:11,561][14064] Avg episode reward: [(0, '0.093')]
[2024-06-06 12:48:11,582][14296] Updated weights for policy 0, policy_version 11367 (0.0028)
[2024-06-06 12:48:14,687][14296] Updated weights for policy 0, policy_version 11377 (0.0023)
[2024-06-06 12:48:16,561][14064] Fps is (10 sec: 47514.2, 60 sec: 49425.2, 300 sec: 49096.5). Total num frames: 186466304. Throughput: 0: 49522.2. Samples: 39572200. Policy #0 lag: (min: 0.0, avg: 10.0, max: 20.0)
[2024-06-06 12:48:16,562][14064] Avg episode reward: [(0, '0.099')]
[2024-06-06 12:48:18,111][14296] Updated weights for policy 0, policy_version 11387 (0.0033)
[2024-06-06 12:48:21,416][14296] Updated weights for policy 0, policy_version 11397 (0.0028)
[2024-06-06 12:48:21,561][14064] Fps is (10 sec: 50789.5, 60 sec: 49425.0, 300 sec: 49152.4). Total num frames: 186728448. Throughput: 0: 49258.0. Samples: 39711620. Policy #0 lag: (min: 1.0, avg: 9.3, max: 21.0)
[2024-06-06 12:48:21,562][14064] Avg episode reward: [(0, '0.098')]
[2024-06-06 12:48:24,805][14296] Updated weights for policy 0, policy_version 11407 (0.0037)
[2024-06-06 12:48:26,562][14064] Fps is (10 sec: 50788.0, 60 sec: 49424.6, 300 sec: 49040.9). Total num frames: 186974208. Throughput: 0: 49348.0. Samples: 40005320. Policy #0 lag: (min: 0.0, avg: 10.4, max: 21.0)
[2024-06-06 12:48:26,562][14064] Avg episode reward: [(0, '0.097')]
[2024-06-06 12:48:27,942][14296] Updated weights for policy 0, policy_version 11417 (0.0032)
[2024-06-06 12:48:31,561][14064] Fps is (10 sec: 47514.3, 60 sec: 49152.0, 300 sec: 49152.0). Total num frames: 187203584. Throughput: 0: 49440.9. Samples: 40310560. Policy #0 lag: (min: 0.0, avg: 10.4, max: 21.0)
[2024-06-06 12:48:31,562][14064] Avg episode reward: [(0, '0.093')]
[2024-06-06 12:48:31,678][14296] Updated weights for policy 0, policy_version 11427 (0.0029)
[2024-06-06 12:48:34,633][14296] Updated weights for policy 0, policy_version 11437 (0.0034)
[2024-06-06 12:48:36,561][14064] Fps is (10 sec: 47515.2, 60 sec: 49424.9, 300 sec: 49096.4). Total num frames: 187449344. Throughput: 0: 49177.1. Samples: 40447520. Policy #0 lag: (min: 0.0, avg: 10.2, max: 20.0)
[2024-06-06 12:48:36,562][14064] Avg episode reward: [(0, '0.095')]
[2024-06-06 12:48:38,158][14296] Updated weights for policy 0, policy_version 11447 (0.0026)
[2024-06-06 12:48:41,222][14296] Updated weights for policy 0, policy_version 11457 (0.0026)
[2024-06-06 12:48:41,562][14064] Fps is (10 sec: 50784.3, 60 sec: 49151.1, 300 sec: 49207.3). Total num frames: 187711488. Throughput: 0: 49272.5. Samples: 40746880. Policy #0 lag: (min: 0.0, avg: 10.2, max: 20.0)
[2024-06-06 12:48:41,563][14064] Avg episode reward: [(0, '0.096')]
[2024-06-06 12:48:44,784][14296] Updated weights for policy 0, policy_version 11467 (0.0022)
[2024-06-06 12:48:45,009][14276] Signal inference workers to stop experience collection... (550 times)
[2024-06-06 12:48:45,012][14276] Signal inference workers to resume experience collection... (550 times)
[2024-06-06 12:48:45,025][14296] InferenceWorker_p0-w0: stopping experience collection (550 times)
[2024-06-06 12:48:45,025][14296] InferenceWorker_p0-w0: resuming experience collection (550 times)
[2024-06-06 12:48:46,561][14064] Fps is (10 sec: 50790.2, 60 sec: 49424.9, 300 sec: 49096.4). Total num frames: 187957248. Throughput: 0: 49437.2. Samples: 41047600. Policy #0 lag: (min: 0.0, avg: 8.7, max: 21.0)
[2024-06-06 12:48:46,562][14064] Avg episode reward: [(0, '0.094')]
[2024-06-06 12:48:47,907][14296] Updated weights for policy 0, policy_version 11477 (0.0028)
[2024-06-06 12:48:51,295][14296] Updated weights for policy 0, policy_version 11487 (0.0040)
[2024-06-06 12:48:51,564][14064] Fps is (10 sec: 49146.4, 60 sec: 49423.1, 300 sec: 49207.1). Total num frames: 188203008. Throughput: 0: 49247.4. Samples: 41196080. Policy #0 lag: (min: 0.0, avg: 8.7, max: 21.0)
[2024-06-06 12:48:51,564][14064] Avg episode reward: [(0, '0.097')]
[2024-06-06 12:48:54,417][14296] Updated weights for policy 0, policy_version 11497 (0.0024)
[2024-06-06 12:48:56,561][14064] Fps is (10 sec: 49152.9, 60 sec: 49425.1, 300 sec: 49152.0). Total num frames: 188448768. Throughput: 0: 49369.7. Samples: 41497760. Policy #0 lag: (min: 0.0, avg: 10.3, max: 20.0)
[2024-06-06 12:48:56,562][14064] Avg episode reward: [(0, '0.092')]
[2024-06-06 12:48:58,081][14296] Updated weights for policy 0, policy_version 11507 (0.0024)
[2024-06-06 12:49:01,143][14296] Updated weights for policy 0, policy_version 11517 (0.0032)
[2024-06-06 12:49:01,561][14064] Fps is (10 sec: 50802.4, 60 sec: 49425.1, 300 sec: 49263.1). Total num frames: 188710912. Throughput: 0: 49438.7. Samples: 41796940. Policy #0 lag: (min: 0.0, avg: 10.3, max: 20.0)
[2024-06-06 12:49:01,562][14064] Avg episode reward: [(0, '0.095')]
[2024-06-06 12:49:04,653][14296] Updated weights for policy 0, policy_version 11527 (0.0028)
[2024-06-06 12:49:06,561][14064] Fps is (10 sec: 50789.7, 60 sec: 49425.1, 300 sec: 49207.5). Total num frames: 188956672. Throughput: 0: 49559.5. Samples: 41941800. Policy #0 lag: (min: 0.0, avg: 9.9, max: 21.0)
[2024-06-06 12:49:06,572][14064] Avg episode reward: [(0, '0.097')]
[2024-06-06 12:49:07,907][14296] Updated weights for policy 0, policy_version 11537 (0.0036)
[2024-06-06 12:49:11,288][14296] Updated weights for policy 0, policy_version 11547 (0.0029)
[2024-06-06 12:49:11,561][14064] Fps is (10 sec: 49151.9, 60 sec: 49698.1, 300 sec: 49318.6). Total num frames: 189202432. Throughput: 0: 49633.9. Samples: 42238820. Policy #0 lag: (min: 1.0, avg: 8.8, max: 21.0)
[2024-06-06 12:49:11,562][14064] Avg episode reward: [(0, '0.099')]
[2024-06-06 12:49:14,668][14296] Updated weights for policy 0, policy_version 11557 (0.0028)
[2024-06-06 12:49:16,561][14064] Fps is (10 sec: 47514.2, 60 sec: 49425.1, 300 sec: 49207.5). Total num frames: 189431808. Throughput: 0: 49398.6. Samples: 42533500. Policy #0 lag: (min: 1.0, avg: 8.8, max: 21.0)
[2024-06-06 12:49:16,562][14064] Avg episode reward: [(0, '0.094')]
[2024-06-06 12:49:17,986][14296] Updated weights for policy 0, policy_version 11567 (0.0033)
[2024-06-06 12:49:21,159][14296] Updated weights for policy 0, policy_version 11577 (0.0030)
[2024-06-06 12:49:21,561][14064] Fps is (10 sec: 47513.8, 60 sec: 49152.1, 300 sec: 49207.5). Total num frames: 189677568. Throughput: 0: 49473.1. Samples: 42673800. Policy #0 lag: (min: 0.0, avg: 10.9, max: 20.0)
[2024-06-06 12:49:21,561][14064] Avg episode reward: [(0, '0.101')]
[2024-06-06 12:49:24,763][14296] Updated weights for policy 0, policy_version 11587 (0.0025)
[2024-06-06 12:49:26,561][14064] Fps is (10 sec: 50790.6, 60 sec: 49425.5, 300 sec: 49207.5). Total num frames: 189939712. Throughput: 0: 49346.2. Samples: 42967400. Policy #0 lag: (min: 0.0, avg: 10.9, max: 20.0)
[2024-06-06 12:49:26,562][14064] Avg episode reward: [(0, '0.095')]
[2024-06-06 12:49:28,095][14296] Updated weights for policy 0, policy_version 11597 (0.0027)
[2024-06-06 12:49:31,420][14296] Updated weights for policy 0, policy_version 11607 (0.0035)
[2024-06-06 12:49:31,561][14064] Fps is (10 sec: 49151.8, 60 sec: 49425.1, 300 sec: 49263.1). Total num frames: 190169088. Throughput: 0: 49198.5. Samples: 43261520. Policy #0 lag: (min: 0.0, avg: 9.7, max: 21.0)
[2024-06-06 12:49:31,562][14064] Avg episode reward: [(0, '0.095')]
[2024-06-06 12:49:34,694][14296] Updated weights for policy 0, policy_version 11617 (0.0038)
[2024-06-06 12:49:36,561][14064] Fps is (10 sec: 45874.7, 60 sec: 49152.1, 300 sec: 49152.0). Total num frames: 190398464. Throughput: 0: 49088.7. Samples: 43404960. Policy #0 lag: (min: 0.0, avg: 9.7, max: 21.0)
[2024-06-06 12:49:36,562][14064] Avg episode reward: [(0, '0.097')]
[2024-06-06 12:49:38,145][14296] Updated weights for policy 0, policy_version 11627 (0.0032)
[2024-06-06 12:49:41,561][14064] Fps is (10 sec: 47513.8, 60 sec: 48879.9, 300 sec: 49152.0). Total num frames: 190644224. Throughput: 0: 48752.5. Samples: 43691620. Policy #0 lag: (min: 0.0, avg: 9.1, max: 21.0)
[2024-06-06 12:49:41,562][14064] Avg episode reward: [(0, '0.102')]
[2024-06-06 12:49:41,693][14276] Saving new best policy, reward=0.102!
[2024-06-06 12:49:41,699][14296] Updated weights for policy 0, policy_version 11637 (0.0038)
[2024-06-06 12:49:44,926][14296] Updated weights for policy 0, policy_version 11647 (0.0037)
[2024-06-06 12:49:46,561][14064] Fps is (10 sec: 50790.3, 60 sec: 49152.1, 300 sec: 49096.5). Total num frames: 190906368. Throughput: 0: 48514.1. Samples: 43980080.
Policy #0 lag: (min: 0.0, avg: 9.1, max: 21.0) [2024-06-06 12:49:46,562][14064] Avg episode reward: [(0, '0.097')] [2024-06-06 12:49:48,307][14296] Updated weights for policy 0, policy_version 11657 (0.0022) [2024-06-06 12:49:51,561][14064] Fps is (10 sec: 47513.8, 60 sec: 48607.8, 300 sec: 49152.0). Total num frames: 191119360. Throughput: 0: 48710.9. Samples: 44133780. Policy #0 lag: (min: 0.0, avg: 11.6, max: 21.0) [2024-06-06 12:49:51,562][14064] Avg episode reward: [(0, '0.098')] [2024-06-06 12:49:51,729][14296] Updated weights for policy 0, policy_version 11667 (0.0028) [2024-06-06 12:49:55,058][14296] Updated weights for policy 0, policy_version 11677 (0.0029) [2024-06-06 12:49:56,561][14064] Fps is (10 sec: 45875.8, 60 sec: 48605.9, 300 sec: 49040.9). Total num frames: 191365120. Throughput: 0: 48582.7. Samples: 44425040. Policy #0 lag: (min: 0.0, avg: 10.3, max: 22.0) [2024-06-06 12:49:56,562][14064] Avg episode reward: [(0, '0.099')] [2024-06-06 12:49:58,429][14296] Updated weights for policy 0, policy_version 11687 (0.0030) [2024-06-06 12:50:01,561][14064] Fps is (10 sec: 50790.5, 60 sec: 48605.9, 300 sec: 49152.0). Total num frames: 191627264. Throughput: 0: 48599.2. Samples: 44720460. Policy #0 lag: (min: 0.0, avg: 10.3, max: 22.0) [2024-06-06 12:50:01,561][14064] Avg episode reward: [(0, '0.102')] [2024-06-06 12:50:01,631][14296] Updated weights for policy 0, policy_version 11697 (0.0027) [2024-06-06 12:50:05,159][14296] Updated weights for policy 0, policy_version 11707 (0.0024) [2024-06-06 12:50:05,812][14276] Signal inference workers to stop experience collection... (600 times) [2024-06-06 12:50:05,851][14296] InferenceWorker_p0-w0: stopping experience collection (600 times) [2024-06-06 12:50:05,873][14276] Signal inference workers to resume experience collection... 
(600 times) [2024-06-06 12:50:05,876][14296] InferenceWorker_p0-w0: resuming experience collection (600 times) [2024-06-06 12:50:06,564][14064] Fps is (10 sec: 54052.5, 60 sec: 49149.9, 300 sec: 49151.6). Total num frames: 191905792. Throughput: 0: 48865.9. Samples: 44872900. Policy #0 lag: (min: 1.0, avg: 9.6, max: 22.0) [2024-06-06 12:50:06,564][14064] Avg episode reward: [(0, '0.100')] [2024-06-06 12:50:06,572][14276] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000011713_191905792.pth... [2024-06-06 12:50:06,613][14276] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000010992_180092928.pth [2024-06-06 12:50:08,381][14296] Updated weights for policy 0, policy_version 11717 (0.0028) [2024-06-06 12:50:11,561][14064] Fps is (10 sec: 47513.3, 60 sec: 48332.8, 300 sec: 49096.9). Total num frames: 192102400. Throughput: 0: 48823.6. Samples: 45164460. Policy #0 lag: (min: 1.0, avg: 9.6, max: 22.0) [2024-06-06 12:50:11,561][14064] Avg episode reward: [(0, '0.099')] [2024-06-06 12:50:11,861][14296] Updated weights for policy 0, policy_version 11727 (0.0034) [2024-06-06 12:50:15,001][14296] Updated weights for policy 0, policy_version 11737 (0.0026) [2024-06-06 12:50:16,561][14064] Fps is (10 sec: 45887.7, 60 sec: 48879.0, 300 sec: 49096.9). Total num frames: 192364544. Throughput: 0: 48713.3. Samples: 45453620. Policy #0 lag: (min: 0.0, avg: 11.5, max: 21.0) [2024-06-06 12:50:16,562][14064] Avg episode reward: [(0, '0.101')] [2024-06-06 12:50:18,678][14296] Updated weights for policy 0, policy_version 11747 (0.0031) [2024-06-06 12:50:21,561][14064] Fps is (10 sec: 49151.3, 60 sec: 48605.7, 300 sec: 49040.9). Total num frames: 192593920. Throughput: 0: 48608.0. Samples: 45592320. 
Policy #0 lag: (min: 0.0, avg: 11.5, max: 21.0) [2024-06-06 12:50:21,562][14064] Avg episode reward: [(0, '0.097')] [2024-06-06 12:50:22,027][14296] Updated weights for policy 0, policy_version 11757 (0.0035) [2024-06-06 12:50:25,599][14296] Updated weights for policy 0, policy_version 11767 (0.0030) [2024-06-06 12:50:26,561][14064] Fps is (10 sec: 47513.4, 60 sec: 48332.8, 300 sec: 48985.4). Total num frames: 192839680. Throughput: 0: 48773.7. Samples: 45886440. Policy #0 lag: (min: 0.0, avg: 10.5, max: 21.0) [2024-06-06 12:50:26,562][14064] Avg episode reward: [(0, '0.100')] [2024-06-06 12:50:28,814][14296] Updated weights for policy 0, policy_version 11777 (0.0032) [2024-06-06 12:50:31,561][14064] Fps is (10 sec: 45875.7, 60 sec: 48059.7, 300 sec: 48985.4). Total num frames: 193052672. Throughput: 0: 48920.1. Samples: 46181480. Policy #0 lag: (min: 0.0, avg: 10.5, max: 21.0) [2024-06-06 12:50:31,562][14064] Avg episode reward: [(0, '0.107')] [2024-06-06 12:50:31,637][14276] Saving new best policy, reward=0.107! [2024-06-06 12:50:32,405][14296] Updated weights for policy 0, policy_version 11787 (0.0031) [2024-06-06 12:50:35,558][14296] Updated weights for policy 0, policy_version 11797 (0.0028) [2024-06-06 12:50:36,561][14064] Fps is (10 sec: 49151.6, 60 sec: 48878.9, 300 sec: 49096.4). Total num frames: 193331200. Throughput: 0: 48459.8. Samples: 46314480. Policy #0 lag: (min: 0.0, avg: 10.9, max: 24.0) [2024-06-06 12:50:36,562][14064] Avg episode reward: [(0, '0.101')] [2024-06-06 12:50:39,194][14296] Updated weights for policy 0, policy_version 11807 (0.0031) [2024-06-06 12:50:41,561][14064] Fps is (10 sec: 52428.4, 60 sec: 48878.8, 300 sec: 49040.9). Total num frames: 193576960. Throughput: 0: 48275.9. Samples: 46597460. 
Policy #0 lag: (min: 0.0, avg: 10.9, max: 24.0) [2024-06-06 12:50:41,563][14064] Avg episode reward: [(0, '0.102')] [2024-06-06 12:50:42,335][14296] Updated weights for policy 0, policy_version 11817 (0.0028) [2024-06-06 12:50:45,983][14296] Updated weights for policy 0, policy_version 11827 (0.0024) [2024-06-06 12:50:46,564][14064] Fps is (10 sec: 47501.4, 60 sec: 48330.8, 300 sec: 48929.4). Total num frames: 193806336. Throughput: 0: 48349.9. Samples: 46896340. Policy #0 lag: (min: 0.0, avg: 9.6, max: 21.0) [2024-06-06 12:50:46,565][14064] Avg episode reward: [(0, '0.101')] [2024-06-06 12:50:49,262][14296] Updated weights for policy 0, policy_version 11837 (0.0031) [2024-06-06 12:50:51,561][14064] Fps is (10 sec: 44237.1, 60 sec: 48332.7, 300 sec: 48985.4). Total num frames: 194019328. Throughput: 0: 48074.9. Samples: 47036140. Policy #0 lag: (min: 0.0, avg: 9.6, max: 21.0) [2024-06-06 12:50:51,562][14064] Avg episode reward: [(0, '0.099')] [2024-06-06 12:50:52,936][14296] Updated weights for policy 0, policy_version 11847 (0.0034) [2024-06-06 12:50:56,179][14296] Updated weights for policy 0, policy_version 11857 (0.0031) [2024-06-06 12:50:56,561][14064] Fps is (10 sec: 47526.3, 60 sec: 48605.9, 300 sec: 48929.9). Total num frames: 194281472. Throughput: 0: 48002.2. Samples: 47324560. Policy #0 lag: (min: 0.0, avg: 8.4, max: 20.0) [2024-06-06 12:50:56,562][14064] Avg episode reward: [(0, '0.103')] [2024-06-06 12:50:59,682][14296] Updated weights for policy 0, policy_version 11867 (0.0030) [2024-06-06 12:51:01,561][14064] Fps is (10 sec: 52428.1, 60 sec: 48605.7, 300 sec: 48985.4). Total num frames: 194543616. Throughput: 0: 47874.5. Samples: 47607980. Policy #0 lag: (min: 0.0, avg: 12.3, max: 21.0) [2024-06-06 12:51:01,562][14064] Avg episode reward: [(0, '0.097')] [2024-06-06 12:51:02,944][14296] Updated weights for policy 0, policy_version 11877 (0.0037) [2024-06-06 12:51:06,396][14276] Signal inference workers to stop experience collection... 
(650 times) [2024-06-06 12:51:06,396][14276] Signal inference workers to resume experience collection... (650 times) [2024-06-06 12:51:06,425][14296] InferenceWorker_p0-w0: stopping experience collection (650 times) [2024-06-06 12:51:06,425][14296] InferenceWorker_p0-w0: resuming experience collection (650 times) [2024-06-06 12:51:06,548][14296] Updated weights for policy 0, policy_version 11887 (0.0036) [2024-06-06 12:51:06,561][14064] Fps is (10 sec: 47513.5, 60 sec: 47515.7, 300 sec: 48818.8). Total num frames: 194756608. Throughput: 0: 48206.3. Samples: 47761600. Policy #0 lag: (min: 0.0, avg: 12.3, max: 21.0) [2024-06-06 12:51:06,562][14064] Avg episode reward: [(0, '0.104')] [2024-06-06 12:51:09,504][14296] Updated weights for policy 0, policy_version 11897 (0.0028) [2024-06-06 12:51:11,561][14064] Fps is (10 sec: 45875.4, 60 sec: 48332.7, 300 sec: 48929.8). Total num frames: 195002368. Throughput: 0: 48167.0. Samples: 48053960. Policy #0 lag: (min: 0.0, avg: 11.1, max: 22.0) [2024-06-06 12:51:11,562][14064] Avg episode reward: [(0, '0.103')] [2024-06-06 12:51:13,228][14296] Updated weights for policy 0, policy_version 11907 (0.0026) [2024-06-06 12:51:16,227][14296] Updated weights for policy 0, policy_version 11917 (0.0027) [2024-06-06 12:51:16,561][14064] Fps is (10 sec: 49152.0, 60 sec: 48059.7, 300 sec: 48874.7). Total num frames: 195248128. Throughput: 0: 47972.9. Samples: 48340260. Policy #0 lag: (min: 0.0, avg: 11.1, max: 22.0) [2024-06-06 12:51:16,562][14064] Avg episode reward: [(0, '0.109')] [2024-06-06 12:51:16,689][14276] Saving new best policy, reward=0.109! [2024-06-06 12:51:20,136][14296] Updated weights for policy 0, policy_version 11927 (0.0029) [2024-06-06 12:51:21,561][14064] Fps is (10 sec: 50790.0, 60 sec: 48605.8, 300 sec: 48929.8). Total num frames: 195510272. Throughput: 0: 48489.7. Samples: 48496520. 
Policy #0 lag: (min: 0.0, avg: 6.9, max: 21.0) [2024-06-06 12:51:21,562][14064] Avg episode reward: [(0, '0.101')] [2024-06-06 12:51:23,225][14296] Updated weights for policy 0, policy_version 11937 (0.0026) [2024-06-06 12:51:26,561][14064] Fps is (10 sec: 45875.2, 60 sec: 47786.7, 300 sec: 48707.7). Total num frames: 195706880. Throughput: 0: 48578.8. Samples: 48783500. Policy #0 lag: (min: 0.0, avg: 6.9, max: 21.0) [2024-06-06 12:51:26,562][14064] Avg episode reward: [(0, '0.107')] [2024-06-06 12:51:26,792][14296] Updated weights for policy 0, policy_version 11947 (0.0023) [2024-06-06 12:51:29,766][14296] Updated weights for policy 0, policy_version 11957 (0.0025) [2024-06-06 12:51:31,561][14064] Fps is (10 sec: 47513.6, 60 sec: 48878.8, 300 sec: 48874.3). Total num frames: 195985408. Throughput: 0: 48289.3. Samples: 49069240. Policy #0 lag: (min: 0.0, avg: 11.5, max: 22.0) [2024-06-06 12:51:31,562][14064] Avg episode reward: [(0, '0.101')] [2024-06-06 12:51:33,582][14296] Updated weights for policy 0, policy_version 11967 (0.0027) [2024-06-06 12:51:36,442][14296] Updated weights for policy 0, policy_version 11977 (0.0019) [2024-06-06 12:51:36,561][14064] Fps is (10 sec: 52428.4, 60 sec: 48332.8, 300 sec: 48874.3). Total num frames: 196231168. Throughput: 0: 48248.4. Samples: 49207320. Policy #0 lag: (min: 0.0, avg: 11.5, max: 22.0) [2024-06-06 12:51:36,562][14064] Avg episode reward: [(0, '0.107')] [2024-06-06 12:51:40,483][14296] Updated weights for policy 0, policy_version 11987 (0.0032) [2024-06-06 12:51:41,561][14064] Fps is (10 sec: 49152.8, 60 sec: 48332.9, 300 sec: 48874.3). Total num frames: 196476928. Throughput: 0: 48480.9. Samples: 49506200. Policy #0 lag: (min: 0.0, avg: 9.1, max: 20.0) [2024-06-06 12:51:41,562][14064] Avg episode reward: [(0, '0.105')] [2024-06-06 12:51:43,454][14296] Updated weights for policy 0, policy_version 11997 (0.0028) [2024-06-06 12:51:46,561][14064] Fps is (10 sec: 42599.0, 60 sec: 47515.8, 300 sec: 48596.6). 
Total num frames: 196657152. Throughput: 0: 48712.2. Samples: 49800020. Policy #0 lag: (min: 0.0, avg: 9.1, max: 20.0) [2024-06-06 12:51:46,561][14064] Avg episode reward: [(0, '0.102')] [2024-06-06 12:51:47,207][14296] Updated weights for policy 0, policy_version 12007 (0.0033) [2024-06-06 12:51:50,178][14296] Updated weights for policy 0, policy_version 12017 (0.0045) [2024-06-06 12:51:51,561][14064] Fps is (10 sec: 47513.4, 60 sec: 48878.9, 300 sec: 48929.8). Total num frames: 196952064. Throughput: 0: 48153.3. Samples: 49928500. Policy #0 lag: (min: 0.0, avg: 12.2, max: 21.0) [2024-06-06 12:51:51,562][14064] Avg episode reward: [(0, '0.098')] [2024-06-06 12:51:54,175][14296] Updated weights for policy 0, policy_version 12027 (0.0028) [2024-06-06 12:51:56,561][14064] Fps is (10 sec: 52428.5, 60 sec: 48332.8, 300 sec: 48763.2). Total num frames: 197181440. Throughput: 0: 48024.1. Samples: 50215040. Policy #0 lag: (min: 0.0, avg: 12.2, max: 21.0) [2024-06-06 12:51:56,562][14064] Avg episode reward: [(0, '0.100')] [2024-06-06 12:51:57,052][14296] Updated weights for policy 0, policy_version 12037 (0.0029) [2024-06-06 12:52:00,911][14296] Updated weights for policy 0, policy_version 12047 (0.0031) [2024-06-06 12:52:01,561][14064] Fps is (10 sec: 45875.1, 60 sec: 47786.7, 300 sec: 48707.8). Total num frames: 197410816. Throughput: 0: 48179.9. Samples: 50508360. Policy #0 lag: (min: 0.0, avg: 10.8, max: 21.0) [2024-06-06 12:52:01,562][14064] Avg episode reward: [(0, '0.107')] [2024-06-06 12:52:03,745][14296] Updated weights for policy 0, policy_version 12057 (0.0036) [2024-06-06 12:52:06,562][14064] Fps is (10 sec: 45874.0, 60 sec: 48059.5, 300 sec: 48652.1). Total num frames: 197640192. Throughput: 0: 47934.1. Samples: 50653560. 
Policy #0 lag: (min: 0.0, avg: 10.8, max: 21.0) [2024-06-06 12:52:06,562][14064] Avg episode reward: [(0, '0.100')] [2024-06-06 12:52:06,573][14276] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000012063_197640192.pth... [2024-06-06 12:52:06,617][14276] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000011352_185991168.pth [2024-06-06 12:52:07,638][14276] Signal inference workers to stop experience collection... (700 times) [2024-06-06 12:52:07,671][14296] InferenceWorker_p0-w0: stopping experience collection (700 times) [2024-06-06 12:52:07,693][14276] Signal inference workers to resume experience collection... (700 times) [2024-06-06 12:52:07,699][14296] InferenceWorker_p0-w0: resuming experience collection (700 times) [2024-06-06 12:52:07,853][14296] Updated weights for policy 0, policy_version 12067 (0.0027) [2024-06-06 12:52:10,573][14296] Updated weights for policy 0, policy_version 12077 (0.0027) [2024-06-06 12:52:11,562][14064] Fps is (10 sec: 47512.9, 60 sec: 48059.6, 300 sec: 48763.2). Total num frames: 197885952. Throughput: 0: 47974.9. Samples: 50942380. Policy #0 lag: (min: 0.0, avg: 8.0, max: 21.0) [2024-06-06 12:52:11,562][14064] Avg episode reward: [(0, '0.101')] [2024-06-06 12:52:14,564][14296] Updated weights for policy 0, policy_version 12087 (0.0029) [2024-06-06 12:52:16,561][14064] Fps is (10 sec: 50792.0, 60 sec: 48332.9, 300 sec: 48763.3). Total num frames: 198148096. Throughput: 0: 47955.8. Samples: 51227240. Policy #0 lag: (min: 0.0, avg: 8.0, max: 21.0) [2024-06-06 12:52:16,561][14064] Avg episode reward: [(0, '0.106')] [2024-06-06 12:52:17,594][14296] Updated weights for policy 0, policy_version 12097 (0.0026) [2024-06-06 12:52:21,478][14296] Updated weights for policy 0, policy_version 12107 (0.0040) [2024-06-06 12:52:21,561][14064] Fps is (10 sec: 47515.0, 60 sec: 47513.8, 300 sec: 48652.2). Total num frames: 198361088. Throughput: 0: 48172.2. Samples: 51375060. 
Policy #0 lag: (min: 0.0, avg: 11.9, max: 22.0) [2024-06-06 12:52:21,561][14064] Avg episode reward: [(0, '0.110')] [2024-06-06 12:52:21,752][14276] Saving new best policy, reward=0.110! [2024-06-06 12:52:24,397][14296] Updated weights for policy 0, policy_version 12117 (0.0029) [2024-06-06 12:52:26,561][14064] Fps is (10 sec: 45874.5, 60 sec: 48332.7, 300 sec: 48652.1). Total num frames: 198606848. Throughput: 0: 47951.9. Samples: 51664040. Policy #0 lag: (min: 0.0, avg: 11.9, max: 22.0) [2024-06-06 12:52:26,562][14064] Avg episode reward: [(0, '0.109')] [2024-06-06 12:52:28,137][14296] Updated weights for policy 0, policy_version 12127 (0.0025) [2024-06-06 12:52:31,015][14296] Updated weights for policy 0, policy_version 12137 (0.0027) [2024-06-06 12:52:31,561][14064] Fps is (10 sec: 50790.0, 60 sec: 48059.9, 300 sec: 48763.2). Total num frames: 198868992. Throughput: 0: 47870.6. Samples: 51954200. Policy #0 lag: (min: 0.0, avg: 9.0, max: 20.0) [2024-06-06 12:52:31,562][14064] Avg episode reward: [(0, '0.103')] [2024-06-06 12:52:34,823][14296] Updated weights for policy 0, policy_version 12147 (0.0038) [2024-06-06 12:52:36,561][14064] Fps is (10 sec: 52429.3, 60 sec: 48332.9, 300 sec: 48707.7). Total num frames: 199131136. Throughput: 0: 48587.1. Samples: 52114920. Policy #0 lag: (min: 0.0, avg: 12.7, max: 24.0) [2024-06-06 12:52:36,562][14064] Avg episode reward: [(0, '0.103')] [2024-06-06 12:52:37,615][14296] Updated weights for policy 0, policy_version 12157 (0.0026) [2024-06-06 12:52:41,561][14064] Fps is (10 sec: 45874.2, 60 sec: 47513.4, 300 sec: 48596.6). Total num frames: 199327744. Throughput: 0: 48664.7. Samples: 52404960. 
Policy #0 lag: (min: 0.0, avg: 12.7, max: 24.0) [2024-06-06 12:52:41,562][14064] Avg episode reward: [(0, '0.105')] [2024-06-06 12:52:41,578][14296] Updated weights for policy 0, policy_version 12167 (0.0025) [2024-06-06 12:52:44,656][14296] Updated weights for policy 0, policy_version 12177 (0.0037) [2024-06-06 12:52:46,561][14064] Fps is (10 sec: 45874.9, 60 sec: 48878.8, 300 sec: 48652.1). Total num frames: 199589888. Throughput: 0: 48291.5. Samples: 52681480. Policy #0 lag: (min: 0.0, avg: 10.5, max: 22.0) [2024-06-06 12:52:46,562][14064] Avg episode reward: [(0, '0.105')] [2024-06-06 12:52:48,307][14296] Updated weights for policy 0, policy_version 12187 (0.0017) [2024-06-06 12:52:51,451][14296] Updated weights for policy 0, policy_version 12197 (0.0028) [2024-06-06 12:52:51,561][14064] Fps is (10 sec: 50791.3, 60 sec: 48059.8, 300 sec: 48652.2). Total num frames: 199835648. Throughput: 0: 48251.4. Samples: 52824860. Policy #0 lag: (min: 0.0, avg: 10.5, max: 22.0) [2024-06-06 12:52:51,562][14064] Avg episode reward: [(0, '0.107')] [2024-06-06 12:52:55,079][14296] Updated weights for policy 0, policy_version 12207 (0.0031) [2024-06-06 12:52:56,561][14064] Fps is (10 sec: 50791.2, 60 sec: 48605.9, 300 sec: 48652.2). Total num frames: 200097792. Throughput: 0: 48371.0. Samples: 53119060. Policy #0 lag: (min: 1.0, avg: 9.0, max: 23.0) [2024-06-06 12:52:56,561][14064] Avg episode reward: [(0, '0.106')] [2024-06-06 12:52:56,895][14276] Signal inference workers to stop experience collection... (750 times) [2024-06-06 12:52:56,935][14296] InferenceWorker_p0-w0: stopping experience collection (750 times) [2024-06-06 12:52:57,006][14276] Signal inference workers to resume experience collection... 
(750 times) [2024-06-06 12:52:57,007][14296] InferenceWorker_p0-w0: resuming experience collection (750 times) [2024-06-06 12:52:58,083][14296] Updated weights for policy 0, policy_version 12217 (0.0022) [2024-06-06 12:53:01,562][14064] Fps is (10 sec: 45874.3, 60 sec: 48059.6, 300 sec: 48485.5). Total num frames: 200294400. Throughput: 0: 48695.3. Samples: 53418540. Policy #0 lag: (min: 1.0, avg: 9.0, max: 23.0) [2024-06-06 12:53:01,563][14064] Avg episode reward: [(0, '0.106')] [2024-06-06 12:53:01,849][14296] Updated weights for policy 0, policy_version 12227 (0.0017) [2024-06-06 12:53:04,777][14296] Updated weights for policy 0, policy_version 12237 (0.0035) [2024-06-06 12:53:06,561][14064] Fps is (10 sec: 45875.1, 60 sec: 48606.1, 300 sec: 48596.6). Total num frames: 200556544. Throughput: 0: 48443.1. Samples: 53555000. Policy #0 lag: (min: 0.0, avg: 10.8, max: 22.0) [2024-06-06 12:53:06,561][14064] Avg episode reward: [(0, '0.111')] [2024-06-06 12:53:08,640][14296] Updated weights for policy 0, policy_version 12247 (0.0029) [2024-06-06 12:53:11,561][14064] Fps is (10 sec: 50791.0, 60 sec: 48606.0, 300 sec: 48596.6). Total num frames: 200802304. Throughput: 0: 48425.3. Samples: 53843180. Policy #0 lag: (min: 0.0, avg: 10.8, max: 22.0) [2024-06-06 12:53:11,562][14064] Avg episode reward: [(0, '0.107')] [2024-06-06 12:53:11,715][14296] Updated weights for policy 0, policy_version 12257 (0.0025) [2024-06-06 12:53:15,310][14296] Updated weights for policy 0, policy_version 12267 (0.0035) [2024-06-06 12:53:16,561][14064] Fps is (10 sec: 50789.4, 60 sec: 48605.7, 300 sec: 48596.6). Total num frames: 201064448. Throughput: 0: 48341.2. Samples: 54129560. Policy #0 lag: (min: 0.0, avg: 7.1, max: 20.0) [2024-06-06 12:53:16,562][14064] Avg episode reward: [(0, '0.108')] [2024-06-06 12:53:18,468][14296] Updated weights for policy 0, policy_version 12277 (0.0029) [2024-06-06 12:53:21,561][14064] Fps is (10 sec: 44237.4, 60 sec: 48059.7, 300 sec: 48374.5). 
Total num frames: 201244672. Throughput: 0: 47950.3. Samples: 54272680. Policy #0 lag: (min: 0.0, avg: 7.1, max: 20.0) [2024-06-06 12:53:21,562][14064] Avg episode reward: [(0, '0.111')] [2024-06-06 12:53:22,306][14296] Updated weights for policy 0, policy_version 12287 (0.0032) [2024-06-06 12:53:25,074][14296] Updated weights for policy 0, policy_version 12297 (0.0037) [2024-06-06 12:53:26,561][14064] Fps is (10 sec: 47514.1, 60 sec: 48879.0, 300 sec: 48596.6). Total num frames: 201539584. Throughput: 0: 47964.2. Samples: 54563340. Policy #0 lag: (min: 0.0, avg: 11.3, max: 24.0) [2024-06-06 12:53:26,562][14064] Avg episode reward: [(0, '0.113')] [2024-06-06 12:53:26,693][14276] Saving new best policy, reward=0.113! [2024-06-06 12:53:29,011][14296] Updated weights for policy 0, policy_version 12307 (0.0028) [2024-06-06 12:53:31,561][14064] Fps is (10 sec: 52428.8, 60 sec: 48332.8, 300 sec: 48541.1). Total num frames: 201768960. Throughput: 0: 48286.8. Samples: 54854380. Policy #0 lag: (min: 0.0, avg: 11.3, max: 24.0) [2024-06-06 12:53:31,562][14064] Avg episode reward: [(0, '0.106')] [2024-06-06 12:53:31,914][14296] Updated weights for policy 0, policy_version 12317 (0.0027) [2024-06-06 12:53:35,782][14296] Updated weights for policy 0, policy_version 12327 (0.0037) [2024-06-06 12:53:36,561][14064] Fps is (10 sec: 47513.8, 60 sec: 48059.8, 300 sec: 48485.7). Total num frames: 202014720. Throughput: 0: 48532.5. Samples: 55008820. Policy #0 lag: (min: 0.0, avg: 10.7, max: 21.0) [2024-06-06 12:53:36,561][14064] Avg episode reward: [(0, '0.106')] [2024-06-06 12:53:38,873][14296] Updated weights for policy 0, policy_version 12337 (0.0035) [2024-06-06 12:53:41,561][14064] Fps is (10 sec: 44236.8, 60 sec: 48059.9, 300 sec: 48319.0). Total num frames: 202211328. Throughput: 0: 48019.5. Samples: 55279940. 
Policy #0 lag: (min: 0.0, avg: 10.7, max: 21.0) [2024-06-06 12:53:41,562][14064] Avg episode reward: [(0, '0.111')] [2024-06-06 12:53:42,772][14296] Updated weights for policy 0, policy_version 12347 (0.0031) [2024-06-06 12:53:45,794][14296] Updated weights for policy 0, policy_version 12357 (0.0039) [2024-06-06 12:53:46,561][14064] Fps is (10 sec: 47513.3, 60 sec: 48332.8, 300 sec: 48430.4). Total num frames: 202489856. Throughput: 0: 47796.6. Samples: 55569380. Policy #0 lag: (min: 1.0, avg: 9.5, max: 22.0) [2024-06-06 12:53:46,562][14064] Avg episode reward: [(0, '0.110')] [2024-06-06 12:53:49,551][14296] Updated weights for policy 0, policy_version 12367 (0.0034) [2024-06-06 12:53:51,564][14064] Fps is (10 sec: 52414.7, 60 sec: 48330.7, 300 sec: 48429.6). Total num frames: 202735616. Throughput: 0: 48082.0. Samples: 55718820. Policy #0 lag: (min: 1.0, avg: 9.5, max: 22.0) [2024-06-06 12:53:51,565][14064] Avg episode reward: [(0, '0.112')] [2024-06-06 12:53:52,537][14296] Updated weights for policy 0, policy_version 12377 (0.0036) [2024-06-06 12:53:56,493][14296] Updated weights for policy 0, policy_version 12387 (0.0036) [2024-06-06 12:53:56,561][14064] Fps is (10 sec: 45874.9, 60 sec: 47513.4, 300 sec: 48263.4). Total num frames: 202948608. Throughput: 0: 48213.8. Samples: 56012800. Policy #0 lag: (min: 0.0, avg: 11.0, max: 21.0) [2024-06-06 12:53:56,562][14064] Avg episode reward: [(0, '0.114')] [2024-06-06 12:53:57,625][14276] Signal inference workers to stop experience collection... (800 times) [2024-06-06 12:53:57,675][14296] InferenceWorker_p0-w0: stopping experience collection (800 times) [2024-06-06 12:53:57,676][14276] Signal inference workers to resume experience collection... 
(800 times) [2024-06-06 12:53:57,687][14296] InferenceWorker_p0-w0: resuming experience collection (800 times) [2024-06-06 12:53:59,256][14296] Updated weights for policy 0, policy_version 12397 (0.0025) [2024-06-06 12:54:01,561][14064] Fps is (10 sec: 45887.5, 60 sec: 48333.0, 300 sec: 48263.4). Total num frames: 203194368. Throughput: 0: 48041.9. Samples: 56291440. Policy #0 lag: (min: 0.0, avg: 11.0, max: 21.0) [2024-06-06 12:54:01,562][14064] Avg episode reward: [(0, '0.109')] [2024-06-06 12:54:03,394][14296] Updated weights for policy 0, policy_version 12407 (0.0023) [2024-06-06 12:54:06,157][14296] Updated weights for policy 0, policy_version 12417 (0.0029) [2024-06-06 12:54:06,561][14064] Fps is (10 sec: 50790.4, 60 sec: 48332.6, 300 sec: 48318.9). Total num frames: 203456512. Throughput: 0: 47909.6. Samples: 56428620. Policy #0 lag: (min: 0.0, avg: 7.1, max: 20.0) [2024-06-06 12:54:06,562][14064] Avg episode reward: [(0, '0.114')] [2024-06-06 12:54:06,567][14276] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000012418_203456512.pth... [2024-06-06 12:54:06,614][14276] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000011713_191905792.pth [2024-06-06 12:54:06,617][14276] Saving new best policy, reward=0.114! [2024-06-06 12:54:10,210][14296] Updated weights for policy 0, policy_version 12427 (0.0024) [2024-06-06 12:54:11,561][14064] Fps is (10 sec: 49152.0, 60 sec: 48059.8, 300 sec: 48318.9). Total num frames: 203685888. Throughput: 0: 48170.3. Samples: 56731000. Policy #0 lag: (min: 0.0, avg: 7.1, max: 20.0) [2024-06-06 12:54:11,562][14064] Avg episode reward: [(0, '0.111')] [2024-06-06 12:54:13,025][14296] Updated weights for policy 0, policy_version 12437 (0.0032) [2024-06-06 12:54:16,561][14064] Fps is (10 sec: 44237.3, 60 sec: 47240.6, 300 sec: 48207.8). Total num frames: 203898880. Throughput: 0: 48054.2. Samples: 57016820. 
Policy #0 lag: (min: 0.0, avg: 11.8, max: 21.0)
[2024-06-06 12:54:16,562][14064] Avg episode reward: [(0, '0.106')]
[2024-06-06 12:54:17,122][14296] Updated weights for policy 0, policy_version 12447 (0.0023)
[2024-06-06 12:54:19,640][14296] Updated weights for policy 0, policy_version 12457 (0.0032)
[2024-06-06 12:54:21,561][14064] Fps is (10 sec: 47514.0, 60 sec: 48605.9, 300 sec: 48207.9). Total num frames: 204161024. Throughput: 0: 47564.1. Samples: 57149200. Policy #0 lag: (min: 0.0, avg: 10.9, max: 22.0)
[2024-06-06 12:54:21,561][14064] Avg episode reward: [(0, '0.105')]
[2024-06-06 12:54:24,073][14296] Updated weights for policy 0, policy_version 12467 (0.0029)
[2024-06-06 12:54:26,561][14064] Fps is (10 sec: 49151.9, 60 sec: 47513.6, 300 sec: 48207.8). Total num frames: 204390400. Throughput: 0: 47824.8. Samples: 57432060. Policy #0 lag: (min: 0.0, avg: 10.9, max: 22.0)
[2024-06-06 12:54:26,562][14064] Avg episode reward: [(0, '0.113')]
[2024-06-06 12:54:26,722][14296] Updated weights for policy 0, policy_version 12477 (0.0031)
[2024-06-06 12:54:30,805][14296] Updated weights for policy 0, policy_version 12487 (0.0029)
[2024-06-06 12:54:31,561][14064] Fps is (10 sec: 47513.2, 60 sec: 47786.7, 300 sec: 48263.4). Total num frames: 204636160. Throughput: 0: 47927.2. Samples: 57726100. Policy #0 lag: (min: 1.0, avg: 8.5, max: 21.0)
[2024-06-06 12:54:31,562][14064] Avg episode reward: [(0, '0.115')]
[2024-06-06 12:54:33,670][14296] Updated weights for policy 0, policy_version 12497 (0.0030)
[2024-06-06 12:54:36,561][14064] Fps is (10 sec: 45874.6, 60 sec: 47240.4, 300 sec: 48152.3). Total num frames: 204849152. Throughput: 0: 47829.3. Samples: 57871020. Policy #0 lag: (min: 1.0, avg: 8.5, max: 21.0)
[2024-06-06 12:54:36,562][14064] Avg episode reward: [(0, '0.112')]
[2024-06-06 12:54:37,514][14296] Updated weights for policy 0, policy_version 12507 (0.0031)
[2024-06-06 12:54:40,335][14296] Updated weights for policy 0, policy_version 12517 (0.0027)
[2024-06-06 12:54:41,561][14064] Fps is (10 sec: 49151.6, 60 sec: 48605.8, 300 sec: 48207.8). Total num frames: 205127680. Throughput: 0: 47605.4. Samples: 58155040. Policy #0 lag: (min: 1.0, avg: 11.5, max: 22.0)
[2024-06-06 12:54:41,562][14064] Avg episode reward: [(0, '0.109')]
[2024-06-06 12:54:44,376][14296] Updated weights for policy 0, policy_version 12527 (0.0025)
[2024-06-06 12:54:46,561][14064] Fps is (10 sec: 52429.5, 60 sec: 48059.7, 300 sec: 48318.9). Total num frames: 205373440. Throughput: 0: 47819.5. Samples: 58443320. Policy #0 lag: (min: 1.0, avg: 11.5, max: 22.0)
[2024-06-06 12:54:46,562][14064] Avg episode reward: [(0, '0.114')]
[2024-06-06 12:54:47,003][14296] Updated weights for policy 0, policy_version 12537 (0.0033)
[2024-06-06 12:54:51,266][14296] Updated weights for policy 0, policy_version 12547 (0.0033)
[2024-06-06 12:54:51,561][14064] Fps is (10 sec: 45875.4, 60 sec: 47515.7, 300 sec: 48207.8). Total num frames: 205586432. Throughput: 0: 48086.8. Samples: 58592520. Policy #0 lag: (min: 0.0, avg: 7.1, max: 20.0)
[2024-06-06 12:54:51,562][14064] Avg episode reward: [(0, '0.111')]
[2024-06-06 12:54:53,894][14296] Updated weights for policy 0, policy_version 12557 (0.0023)
[2024-06-06 12:54:56,561][14064] Fps is (10 sec: 44236.6, 60 sec: 47786.7, 300 sec: 48096.7). Total num frames: 205815808. Throughput: 0: 47644.4. Samples: 58875000. Policy #0 lag: (min: 0.0, avg: 7.1, max: 20.0)
[2024-06-06 12:54:56,562][14064] Avg episode reward: [(0, '0.113')]
[2024-06-06 12:54:58,080][14296] Updated weights for policy 0, policy_version 12567 (0.0034)
[2024-06-06 12:55:00,087][14276] Signal inference workers to stop experience collection... (850 times)
[2024-06-06 12:55:00,120][14296] InferenceWorker_p0-w0: stopping experience collection (850 times)
[2024-06-06 12:55:00,149][14276] Signal inference workers to resume experience collection... (850 times)
[2024-06-06 12:55:00,149][14296] InferenceWorker_p0-w0: resuming experience collection (850 times)
[2024-06-06 12:55:00,895][14296] Updated weights for policy 0, policy_version 12577 (0.0029)
[2024-06-06 12:55:01,561][14064] Fps is (10 sec: 49152.0, 60 sec: 48059.7, 300 sec: 48041.7). Total num frames: 206077952. Throughput: 0: 47612.9. Samples: 59159400. Policy #0 lag: (min: 0.0, avg: 11.7, max: 23.0)
[2024-06-06 12:55:01,562][14064] Avg episode reward: [(0, '0.113')]
[2024-06-06 12:55:04,795][14296] Updated weights for policy 0, policy_version 12587 (0.0031)
[2024-06-06 12:55:06,561][14064] Fps is (10 sec: 50791.3, 60 sec: 47786.8, 300 sec: 48207.8). Total num frames: 206323712. Throughput: 0: 48169.8. Samples: 59316840. Policy #0 lag: (min: 0.0, avg: 11.7, max: 23.0)
[2024-06-06 12:55:06,561][14064] Avg episode reward: [(0, '0.110')]
[2024-06-06 12:55:07,571][14296] Updated weights for policy 0, policy_version 12597 (0.0028)
[2024-06-06 12:55:11,561][14064] Fps is (10 sec: 45875.2, 60 sec: 47513.6, 300 sec: 48041.2). Total num frames: 206536704. Throughput: 0: 48359.1. Samples: 59608220. Policy #0 lag: (min: 0.0, avg: 10.3, max: 21.0)
[2024-06-06 12:55:11,562][14064] Avg episode reward: [(0, '0.113')]
[2024-06-06 12:55:11,679][14296] Updated weights for policy 0, policy_version 12607 (0.0039)
[2024-06-06 12:55:14,313][14296] Updated weights for policy 0, policy_version 12617 (0.0027)
[2024-06-06 12:55:16,561][14064] Fps is (10 sec: 45874.8, 60 sec: 48059.7, 300 sec: 48096.8). Total num frames: 206782464. Throughput: 0: 48063.1. Samples: 59888940. Policy #0 lag: (min: 0.0, avg: 10.3, max: 21.0)
[2024-06-06 12:55:16,562][14064] Avg episode reward: [(0, '0.111')]
[2024-06-06 12:55:18,525][14296] Updated weights for policy 0, policy_version 12627 (0.0042)
[2024-06-06 12:55:21,040][14296] Updated weights for policy 0, policy_version 12637 (0.0029)
[2024-06-06 12:55:21,561][14064] Fps is (10 sec: 52429.0, 60 sec: 48332.7, 300 sec: 48207.8). Total num frames: 207060992. Throughput: 0: 47855.3. Samples: 60024500. Policy #0 lag: (min: 2.0, avg: 10.2, max: 21.0)
[2024-06-06 12:55:21,562][14064] Avg episode reward: [(0, '0.108')]
[2024-06-06 12:55:25,402][14296] Updated weights for policy 0, policy_version 12647 (0.0030)
[2024-06-06 12:55:26,561][14064] Fps is (10 sec: 50790.5, 60 sec: 48332.8, 300 sec: 48263.4). Total num frames: 207290368. Throughput: 0: 48160.1. Samples: 60322240. Policy #0 lag: (min: 2.0, avg: 10.2, max: 21.0)
[2024-06-06 12:55:26,562][14064] Avg episode reward: [(0, '0.112')]
[2024-06-06 12:55:27,983][14296] Updated weights for policy 0, policy_version 12657 (0.0034)
[2024-06-06 12:55:31,561][14064] Fps is (10 sec: 42598.2, 60 sec: 47513.6, 300 sec: 47985.7). Total num frames: 207486976. Throughput: 0: 48269.3. Samples: 60615440. Policy #0 lag: (min: 0.0, avg: 11.0, max: 21.0)
[2024-06-06 12:55:31,562][14064] Avg episode reward: [(0, '0.106')]
[2024-06-06 12:55:32,158][14296] Updated weights for policy 0, policy_version 12667 (0.0038)
[2024-06-06 12:55:34,708][14296] Updated weights for policy 0, policy_version 12677 (0.0032)
[2024-06-06 12:55:36,561][14064] Fps is (10 sec: 45875.5, 60 sec: 48333.0, 300 sec: 48041.2). Total num frames: 207749120. Throughput: 0: 47833.0. Samples: 60745000. Policy #0 lag: (min: 0.0, avg: 11.0, max: 21.0)
[2024-06-06 12:55:36,561][14064] Avg episode reward: [(0, '0.107')]
[2024-06-06 12:55:38,942][14296] Updated weights for policy 0, policy_version 12687 (0.0026)
[2024-06-06 12:55:41,505][14296] Updated weights for policy 0, policy_version 12697 (0.0039)
[2024-06-06 12:55:41,561][14064] Fps is (10 sec: 54067.5, 60 sec: 48332.9, 300 sec: 48208.3). Total num frames: 208027648. Throughput: 0: 47942.3. Samples: 61032400. Policy #0 lag: (min: 0.0, avg: 8.9, max: 22.0)
[2024-06-06 12:55:41,562][14064] Avg episode reward: [(0, '0.113')]
[2024-06-06 12:55:45,974][14296] Updated weights for policy 0, policy_version 12707 (0.0040)
[2024-06-06 12:55:46,561][14064] Fps is (10 sec: 47512.9, 60 sec: 47513.6, 300 sec: 48152.3). Total num frames: 208224256. Throughput: 0: 48001.7. Samples: 61319480. Policy #0 lag: (min: 0.0, avg: 8.9, max: 22.0)
[2024-06-06 12:55:46,562][14064] Avg episode reward: [(0, '0.107')]
[2024-06-06 12:55:48,334][14296] Updated weights for policy 0, policy_version 12717 (0.0020)
[2024-06-06 12:55:51,561][14064] Fps is (10 sec: 40960.4, 60 sec: 47513.7, 300 sec: 47985.7). Total num frames: 208437248. Throughput: 0: 47785.4. Samples: 61467180. Policy #0 lag: (min: 1.0, avg: 12.6, max: 21.0)
[2024-06-06 12:55:51,561][14064] Avg episode reward: [(0, '0.111')]
[2024-06-06 12:55:52,755][14296] Updated weights for policy 0, policy_version 12727 (0.0029)
[2024-06-06 12:55:55,150][14296] Updated weights for policy 0, policy_version 12737 (0.0024)
[2024-06-06 12:55:56,561][14064] Fps is (10 sec: 49152.2, 60 sec: 48332.9, 300 sec: 48041.2). Total num frames: 208715776. Throughput: 0: 47574.7. Samples: 61749080. Policy #0 lag: (min: 1.0, avg: 12.6, max: 21.0)
[2024-06-06 12:55:56,562][14064] Avg episode reward: [(0, '0.116')]
[2024-06-06 12:55:56,574][14276] Saving new best policy, reward=0.116!
[2024-06-06 12:55:59,471][14296] Updated weights for policy 0, policy_version 12747 (0.0033)
[2024-06-06 12:56:01,561][14064] Fps is (10 sec: 52428.3, 60 sec: 48059.8, 300 sec: 48152.3). Total num frames: 208961536. Throughput: 0: 47924.5. Samples: 62045540. Policy #0 lag: (min: 0.0, avg: 7.9, max: 20.0)
[2024-06-06 12:56:01,561][14064] Avg episode reward: [(0, '0.112')]
[2024-06-06 12:56:01,801][14276] Signal inference workers to stop experience collection... (900 times)
[2024-06-06 12:56:01,801][14276] Signal inference workers to resume experience collection... (900 times)
[2024-06-06 12:56:01,822][14296] InferenceWorker_p0-w0: stopping experience collection (900 times)
[2024-06-06 12:56:01,856][14296] InferenceWorker_p0-w0: resuming experience collection (900 times)
[2024-06-06 12:56:01,943][14296] Updated weights for policy 0, policy_version 12757 (0.0027)
[2024-06-06 12:56:06,313][14296] Updated weights for policy 0, policy_version 12767 (0.0028)
[2024-06-06 12:56:06,561][14064] Fps is (10 sec: 47513.3, 60 sec: 47786.5, 300 sec: 48096.8). Total num frames: 209190912. Throughput: 0: 48200.3. Samples: 62193520. Policy #0 lag: (min: 0.0, avg: 7.9, max: 20.0)
[2024-06-06 12:56:06,562][14064] Avg episode reward: [(0, '0.110')]
[2024-06-06 12:56:06,675][14276] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000012769_209207296.pth...
[2024-06-06 12:56:06,721][14276] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000012063_197640192.pth
[2024-06-06 12:56:08,835][14296] Updated weights for policy 0, policy_version 12777 (0.0019)
[2024-06-06 12:56:11,561][14064] Fps is (10 sec: 45874.8, 60 sec: 48059.7, 300 sec: 48041.2). Total num frames: 209420288. Throughput: 0: 47963.0. Samples: 62480580. Policy #0 lag: (min: 0.0, avg: 12.1, max: 23.0)
[2024-06-06 12:56:11,562][14064] Avg episode reward: [(0, '0.118')]
[2024-06-06 12:56:11,562][14276] Saving new best policy, reward=0.118!
[2024-06-06 12:56:13,136][14296] Updated weights for policy 0, policy_version 12787 (0.0039)
[2024-06-06 12:56:15,562][14296] Updated weights for policy 0, policy_version 12797 (0.0031)
[2024-06-06 12:56:16,561][14064] Fps is (10 sec: 49152.0, 60 sec: 48332.7, 300 sec: 48041.2). Total num frames: 209682432. Throughput: 0: 47771.5. Samples: 62765160. Policy #0 lag: (min: 0.0, avg: 12.1, max: 23.0)
[2024-06-06 12:56:16,562][14064] Avg episode reward: [(0, '0.113')]
[2024-06-06 12:56:19,930][14296] Updated weights for policy 0, policy_version 12807 (0.0035)
[2024-06-06 12:56:21,561][14064] Fps is (10 sec: 50790.3, 60 sec: 47786.6, 300 sec: 48207.8). Total num frames: 209928192. Throughput: 0: 48222.1. Samples: 62915000. Policy #0 lag: (min: 0.0, avg: 10.2, max: 21.0)
[2024-06-06 12:56:21,562][14064] Avg episode reward: [(0, '0.117')]
[2024-06-06 12:56:22,428][14296] Updated weights for policy 0, policy_version 12817 (0.0025)
[2024-06-06 12:56:26,561][14064] Fps is (10 sec: 45875.6, 60 sec: 47513.6, 300 sec: 47985.7). Total num frames: 210141184. Throughput: 0: 48246.2. Samples: 63203480. Policy #0 lag: (min: 0.0, avg: 10.2, max: 21.0)
[2024-06-06 12:56:26,562][14064] Avg episode reward: [(0, '0.108')]
[2024-06-06 12:56:26,839][14296] Updated weights for policy 0, policy_version 12827 (0.0023)
[2024-06-06 12:56:29,300][14296] Updated weights for policy 0, policy_version 12837 (0.0028)
[2024-06-06 12:56:31,561][14064] Fps is (10 sec: 45875.8, 60 sec: 48332.9, 300 sec: 47985.7). Total num frames: 210386944. Throughput: 0: 48244.1. Samples: 63490460. Policy #0 lag: (min: 1.0, avg: 10.8, max: 22.0)
[2024-06-06 12:56:31,562][14064] Avg episode reward: [(0, '0.116')]
[2024-06-06 12:56:33,557][14296] Updated weights for policy 0, policy_version 12847 (0.0039)
[2024-06-06 12:56:36,043][14296] Updated weights for policy 0, policy_version 12857 (0.0035)
[2024-06-06 12:56:36,561][14064] Fps is (10 sec: 52427.9, 60 sec: 48605.6, 300 sec: 48096.7). Total num frames: 210665472. Throughput: 0: 48070.3. Samples: 63630360. Policy #0 lag: (min: 1.0, avg: 10.8, max: 22.0)
[2024-06-06 12:56:36,562][14064] Avg episode reward: [(0, '0.112')]
[2024-06-06 12:56:40,450][14296] Updated weights for policy 0, policy_version 12867 (0.0038)
[2024-06-06 12:56:41,561][14064] Fps is (10 sec: 49151.6, 60 sec: 47513.6, 300 sec: 48207.8). Total num frames: 210878464. Throughput: 0: 48149.3. Samples: 63915800. Policy #0 lag: (min: 0.0, avg: 11.7, max: 21.0)
[2024-06-06 12:56:41,562][14064] Avg episode reward: [(0, '0.116')]
[2024-06-06 12:56:42,859][14296] Updated weights for policy 0, policy_version 12877 (0.0045)
[2024-06-06 12:56:46,561][14064] Fps is (10 sec: 40960.7, 60 sec: 47513.6, 300 sec: 47874.6). Total num frames: 211075072. Throughput: 0: 47862.2. Samples: 64199340. Policy #0 lag: (min: 0.0, avg: 11.7, max: 21.0)
[2024-06-06 12:56:46,562][14064] Avg episode reward: [(0, '0.110')]
[2024-06-06 12:56:47,481][14296] Updated weights for policy 0, policy_version 12887 (0.0038)
[2024-06-06 12:56:49,949][14296] Updated weights for policy 0, policy_version 12897 (0.0027)
[2024-06-06 12:56:51,561][14064] Fps is (10 sec: 45875.9, 60 sec: 48332.8, 300 sec: 47985.7). Total num frames: 211337216. Throughput: 0: 47548.7. Samples: 64333200. Policy #0 lag: (min: 0.0, avg: 8.6, max: 22.0)
[2024-06-06 12:56:51,561][14064] Avg episode reward: [(0, '0.117')]
[2024-06-06 12:56:54,447][14296] Updated weights for policy 0, policy_version 12907 (0.0033)
[2024-06-06 12:56:56,561][14064] Fps is (10 sec: 54066.6, 60 sec: 48332.7, 300 sec: 48152.3). Total num frames: 211615744. Throughput: 0: 47541.3. Samples: 64619940. Policy #0 lag: (min: 0.0, avg: 8.6, max: 22.0)
[2024-06-06 12:56:56,562][14064] Avg episode reward: [(0, '0.112')]
[2024-06-06 12:56:57,059][14296] Updated weights for policy 0, policy_version 12917 (0.0026)
[2024-06-06 12:57:01,466][14296] Updated weights for policy 0, policy_version 12927 (0.0028)
[2024-06-06 12:57:01,561][14064] Fps is (10 sec: 45874.8, 60 sec: 47240.5, 300 sec: 47985.7). Total num frames: 211795968. Throughput: 0: 47721.9. Samples: 64912640. Policy #0 lag: (min: 0.0, avg: 12.1, max: 21.0)
[2024-06-06 12:57:01,562][14064] Avg episode reward: [(0, '0.117')]
[2024-06-06 12:57:02,924][14276] Signal inference workers to stop experience collection... (950 times)
[2024-06-06 12:57:02,971][14296] InferenceWorker_p0-w0: stopping experience collection (950 times)
[2024-06-06 12:57:03,035][14276] Signal inference workers to resume experience collection... (950 times)
[2024-06-06 12:57:03,035][14296] InferenceWorker_p0-w0: resuming experience collection (950 times)
[2024-06-06 12:57:03,843][14296] Updated weights for policy 0, policy_version 12937 (0.0036)
[2024-06-06 12:57:06,561][14064] Fps is (10 sec: 40959.9, 60 sec: 47240.5, 300 sec: 47930.1). Total num frames: 212025344. Throughput: 0: 47382.6. Samples: 65047220. Policy #0 lag: (min: 0.0, avg: 12.1, max: 21.0)
[2024-06-06 12:57:06,562][14064] Avg episode reward: [(0, '0.112')]
[2024-06-06 12:57:08,258][14296] Updated weights for policy 0, policy_version 12947 (0.0032)
[2024-06-06 12:57:10,546][14296] Updated weights for policy 0, policy_version 12957 (0.0025)
[2024-06-06 12:57:11,561][14064] Fps is (10 sec: 50789.7, 60 sec: 48059.7, 300 sec: 47985.7). Total num frames: 212303872. Throughput: 0: 47195.4. Samples: 65327280. Policy #0 lag: (min: 0.0, avg: 8.9, max: 21.0)
[2024-06-06 12:57:11,562][14064] Avg episode reward: [(0, '0.114')]
[2024-06-06 12:57:15,086][14296] Updated weights for policy 0, policy_version 12967 (0.0027)
[2024-06-06 12:57:16,561][14064] Fps is (10 sec: 50790.8, 60 sec: 47513.6, 300 sec: 48041.2). Total num frames: 212533248. Throughput: 0: 47326.5. Samples: 65620160. Policy #0 lag: (min: 0.0, avg: 8.9, max: 21.0)
[2024-06-06 12:57:16,562][14064] Avg episode reward: [(0, '0.114')]
[2024-06-06 12:57:17,639][14296] Updated weights for policy 0, policy_version 12977 (0.0023)
[2024-06-06 12:57:21,561][14064] Fps is (10 sec: 44237.4, 60 sec: 46967.5, 300 sec: 47930.2). Total num frames: 212746240. Throughput: 0: 47500.7. Samples: 65767880. Policy #0 lag: (min: 1.0, avg: 11.8, max: 22.0)
[2024-06-06 12:57:21,562][14064] Avg episode reward: [(0, '0.116')]
[2024-06-06 12:57:21,833][14296] Updated weights for policy 0, policy_version 12987 (0.0040)
[2024-06-06 12:57:24,539][14296] Updated weights for policy 0, policy_version 12997 (0.0029)
[2024-06-06 12:57:26,561][14064] Fps is (10 sec: 47513.4, 60 sec: 47786.6, 300 sec: 47930.1). Total num frames: 213008384. Throughput: 0: 47537.2. Samples: 66054980. Policy #0 lag: (min: 1.0, avg: 11.8, max: 22.0)
[2024-06-06 12:57:26,562][14064] Avg episode reward: [(0, '0.118')]
[2024-06-06 12:57:28,905][14296] Updated weights for policy 0, policy_version 13007 (0.0038)
[2024-06-06 12:57:31,266][14296] Updated weights for policy 0, policy_version 13017 (0.0031)
[2024-06-06 12:57:31,561][14064] Fps is (10 sec: 52429.2, 60 sec: 48059.8, 300 sec: 47930.2). Total num frames: 213270528. Throughput: 0: 47374.8. Samples: 66331200. Policy #0 lag: (min: 0.0, avg: 9.4, max: 22.0)
[2024-06-06 12:57:31,561][14064] Avg episode reward: [(0, '0.120')]
[2024-06-06 12:57:31,562][14276] Saving new best policy, reward=0.120!
[2024-06-06 12:57:35,776][14296] Updated weights for policy 0, policy_version 13027 (0.0031)
[2024-06-06 12:57:36,561][14064] Fps is (10 sec: 47514.6, 60 sec: 46967.7, 300 sec: 47985.7). Total num frames: 213483520. Throughput: 0: 47849.3. Samples: 66486420. Policy #0 lag: (min: 0.0, avg: 9.4, max: 22.0)
[2024-06-06 12:57:36,561][14064] Avg episode reward: [(0, '0.117')]
[2024-06-06 12:57:38,012][14296] Updated weights for policy 0, policy_version 13037 (0.0030)
[2024-06-06 12:57:41,561][14064] Fps is (10 sec: 42598.0, 60 sec: 46967.5, 300 sec: 47819.1). Total num frames: 213696512. Throughput: 0: 47792.1. Samples: 66770580. Policy #0 lag: (min: 0.0, avg: 10.5, max: 22.0)
[2024-06-06 12:57:41,562][14064] Avg episode reward: [(0, '0.116')]
[2024-06-06 12:57:42,435][14296] Updated weights for policy 0, policy_version 13047 (0.0030)
[2024-06-06 12:57:44,991][14296] Updated weights for policy 0, policy_version 13057 (0.0025)
[2024-06-06 12:57:46,561][14064] Fps is (10 sec: 47512.8, 60 sec: 48059.7, 300 sec: 47874.6). Total num frames: 213958656. Throughput: 0: 47600.3. Samples: 67054660. Policy #0 lag: (min: 0.0, avg: 10.3, max: 21.0)
[2024-06-06 12:57:46,562][14064] Avg episode reward: [(0, '0.120')]
[2024-06-06 12:57:49,376][14296] Updated weights for policy 0, policy_version 13067 (0.0031)
[2024-06-06 12:57:51,561][14064] Fps is (10 sec: 54066.8, 60 sec: 48332.6, 300 sec: 47930.1). Total num frames: 214237184. Throughput: 0: 47887.6. Samples: 67202160. Policy #0 lag: (min: 0.0, avg: 10.3, max: 21.0)
[2024-06-06 12:57:51,562][14064] Avg episode reward: [(0, '0.118')]
[2024-06-06 12:57:52,278][14296] Updated weights for policy 0, policy_version 13077 (0.0027)
[2024-06-06 12:57:56,363][14296] Updated weights for policy 0, policy_version 13087 (0.0034)
[2024-06-06 12:57:56,561][14064] Fps is (10 sec: 47513.6, 60 sec: 46967.5, 300 sec: 47930.2). Total num frames: 214433792. Throughput: 0: 48032.0. Samples: 67488720. Policy #0 lag: (min: 0.0, avg: 9.2, max: 22.0)
[2024-06-06 12:57:56,562][14064] Avg episode reward: [(0, '0.118')]
[2024-06-06 12:57:59,040][14296] Updated weights for policy 0, policy_version 13097 (0.0026)
[2024-06-06 12:58:01,561][14064] Fps is (10 sec: 40960.4, 60 sec: 47513.6, 300 sec: 47763.5). Total num frames: 214646784. Throughput: 0: 47843.7. Samples: 67773120. Policy #0 lag: (min: 0.0, avg: 9.2, max: 22.0)
[2024-06-06 12:58:01,562][14064] Avg episode reward: [(0, '0.113')]
[2024-06-06 12:58:03,231][14296] Updated weights for policy 0, policy_version 13107 (0.0030)
[2024-06-06 12:58:06,171][14296] Updated weights for policy 0, policy_version 13117 (0.0018)
[2024-06-06 12:58:06,561][14064] Fps is (10 sec: 49152.5, 60 sec: 48332.9, 300 sec: 47874.6). Total num frames: 214925312. Throughput: 0: 47502.7. Samples: 67905500. Policy #0 lag: (min: 0.0, avg: 12.3, max: 21.0)
[2024-06-06 12:58:06,561][14064] Avg episode reward: [(0, '0.120')]
[2024-06-06 12:58:06,572][14276] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000013118_214925312.pth...
[2024-06-06 12:58:06,614][14276] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000012418_203456512.pth
[2024-06-06 12:58:09,890][14296] Updated weights for policy 0, policy_version 13127 (0.0028)
[2024-06-06 12:58:11,561][14064] Fps is (10 sec: 54066.8, 60 sec: 48059.8, 300 sec: 47874.6). Total num frames: 215187456. Throughput: 0: 47627.6. Samples: 68198220. Policy #0 lag: (min: 0.0, avg: 12.3, max: 21.0)
[2024-06-06 12:58:11,562][14064] Avg episode reward: [(0, '0.118')]
[2024-06-06 12:58:13,097][14296] Updated weights for policy 0, policy_version 13137 (0.0032)
[2024-06-06 12:58:16,561][14064] Fps is (10 sec: 44236.5, 60 sec: 47240.6, 300 sec: 47874.6). Total num frames: 215367680. Throughput: 0: 48081.2. Samples: 68494860. Policy #0 lag: (min: 0.0, avg: 8.6, max: 21.0)
[2024-06-06 12:58:16,562][14064] Avg episode reward: [(0, '0.116')]
[2024-06-06 12:58:16,761][14296] Updated weights for policy 0, policy_version 13147 (0.0029)
[2024-06-06 12:58:18,442][14276] Signal inference workers to stop experience collection... (1000 times)
[2024-06-06 12:58:18,443][14276] Signal inference workers to resume experience collection... (1000 times)
[2024-06-06 12:58:18,462][14296] InferenceWorker_p0-w0: stopping experience collection (1000 times)
[2024-06-06 12:58:18,462][14296] InferenceWorker_p0-w0: resuming experience collection (1000 times)
[2024-06-06 12:58:19,859][14296] Updated weights for policy 0, policy_version 13157 (0.0036)
[2024-06-06 12:58:21,561][14064] Fps is (10 sec: 42598.8, 60 sec: 47786.7, 300 sec: 47708.0). Total num frames: 215613440. Throughput: 0: 47587.5. Samples: 68627860. Policy #0 lag: (min: 0.0, avg: 8.6, max: 21.0)
[2024-06-06 12:58:21,562][14064] Avg episode reward: [(0, '0.120')]
[2024-06-06 12:58:23,718][14296] Updated weights for policy 0, policy_version 13167 (0.0021)
[2024-06-06 12:58:26,446][14296] Updated weights for policy 0, policy_version 13177 (0.0026)
[2024-06-06 12:58:26,561][14064] Fps is (10 sec: 52429.0, 60 sec: 48059.8, 300 sec: 47874.6). Total num frames: 215891968. Throughput: 0: 47539.1. Samples: 68909840. Policy #0 lag: (min: 0.0, avg: 11.7, max: 24.0)
[2024-06-06 12:58:26,562][14064] Avg episode reward: [(0, '0.122')]
[2024-06-06 12:58:26,569][14276] Saving new best policy, reward=0.122!
[2024-06-06 12:58:30,305][14296] Updated weights for policy 0, policy_version 13187 (0.0033)
[2024-06-06 12:58:31,561][14064] Fps is (10 sec: 52428.7, 60 sec: 47786.6, 300 sec: 47874.6). Total num frames: 216137728. Throughput: 0: 47830.8. Samples: 69207040. Policy #0 lag: (min: 0.0, avg: 11.7, max: 24.0)
[2024-06-06 12:58:31,562][14064] Avg episode reward: [(0, '0.117')]
[2024-06-06 12:58:33,346][14296] Updated weights for policy 0, policy_version 13197 (0.0030)
[2024-06-06 12:58:36,561][14064] Fps is (10 sec: 45874.4, 60 sec: 47786.4, 300 sec: 47930.1). Total num frames: 216350720. Throughput: 0: 47848.3. Samples: 69355340. Policy #0 lag: (min: 0.0, avg: 8.7, max: 22.0)
[2024-06-06 12:58:36,562][14064] Avg episode reward: [(0, '0.116')]
[2024-06-06 12:58:37,023][14296] Updated weights for policy 0, policy_version 13207 (0.0029)
[2024-06-06 12:58:40,095][14296] Updated weights for policy 0, policy_version 13217 (0.0038)
[2024-06-06 12:58:41,561][14064] Fps is (10 sec: 45875.0, 60 sec: 48332.8, 300 sec: 47819.1). Total num frames: 216596480. Throughput: 0: 47789.8. Samples: 69639260. Policy #0 lag: (min: 0.0, avg: 8.7, max: 22.0)
[2024-06-06 12:58:41,562][14064] Avg episode reward: [(0, '0.118')]
[2024-06-06 12:58:44,075][14296] Updated weights for policy 0, policy_version 13227 (0.0031)
[2024-06-06 12:58:46,561][14064] Fps is (10 sec: 50791.4, 60 sec: 48332.9, 300 sec: 47875.0). Total num frames: 216858624. Throughput: 0: 47734.2. Samples: 69921160. Policy #0 lag: (min: 0.0, avg: 13.0, max: 22.0)
[2024-06-06 12:58:46,562][14064] Avg episode reward: [(0, '0.120')]
[2024-06-06 12:58:47,094][14296] Updated weights for policy 0, policy_version 13237 (0.0032)
[2024-06-06 12:58:51,017][14296] Updated weights for policy 0, policy_version 13247 (0.0031)
[2024-06-06 12:58:51,561][14064] Fps is (10 sec: 49152.2, 60 sec: 47513.7, 300 sec: 47930.2). Total num frames: 217088000. Throughput: 0: 48337.3. Samples: 70080680. Policy #0 lag: (min: 0.0, avg: 13.0, max: 22.0)
[2024-06-06 12:58:51,562][14064] Avg episode reward: [(0, '0.118')]
[2024-06-06 12:58:53,713][14296] Updated weights for policy 0, policy_version 13257 (0.0034)
[2024-06-06 12:58:56,561][14064] Fps is (10 sec: 42598.1, 60 sec: 47513.6, 300 sec: 47763.5). Total num frames: 217284608. Throughput: 0: 48016.0. Samples: 70358940. Policy #0 lag: (min: 0.0, avg: 7.8, max: 21.0)
[2024-06-06 12:58:56,562][14064] Avg episode reward: [(0, '0.122')]
[2024-06-06 12:58:57,777][14296] Updated weights for policy 0, policy_version 13267 (0.0026)
[2024-06-06 12:59:00,554][14296] Updated weights for policy 0, policy_version 13277 (0.0024)
[2024-06-06 12:59:01,561][14064] Fps is (10 sec: 45874.7, 60 sec: 48332.7, 300 sec: 47763.5). Total num frames: 217546752. Throughput: 0: 47774.2. Samples: 70644700. Policy #0 lag: (min: 0.0, avg: 7.8, max: 21.0)
[2024-06-06 12:59:01,562][14064] Avg episode reward: [(0, '0.119')]
[2024-06-06 12:59:04,536][14296] Updated weights for policy 0, policy_version 13287 (0.0026)
[2024-06-06 12:59:06,561][14064] Fps is (10 sec: 52429.3, 60 sec: 48059.7, 300 sec: 47874.6). Total num frames: 217808896. Throughput: 0: 48121.8. Samples: 70793340. Policy #0 lag: (min: 0.0, avg: 11.3, max: 21.0)
[2024-06-06 12:59:06,562][14064] Avg episode reward: [(0, '0.118')]
[2024-06-06 12:59:07,359][14296] Updated weights for policy 0, policy_version 13297 (0.0021)
[2024-06-06 12:59:11,297][14296] Updated weights for policy 0, policy_version 13307 (0.0036)
[2024-06-06 12:59:11,561][14064] Fps is (10 sec: 49152.1, 60 sec: 47513.6, 300 sec: 47930.1). Total num frames: 218038272. Throughput: 0: 48240.8. Samples: 71080680. Policy #0 lag: (min: 0.0, avg: 11.3, max: 21.0)
[2024-06-06 12:59:11,562][14064] Avg episode reward: [(0, '0.120')]
[2024-06-06 12:59:14,162][14296] Updated weights for policy 0, policy_version 13317 (0.0030)
[2024-06-06 12:59:16,561][14064] Fps is (10 sec: 42598.2, 60 sec: 47786.7, 300 sec: 47708.0). Total num frames: 218234880. Throughput: 0: 48220.0. Samples: 71376940. Policy #0 lag: (min: 0.0, avg: 11.3, max: 21.0)
[2024-06-06 12:59:16,568][14064] Avg episode reward: [(0, '0.121')]
[2024-06-06 12:59:18,277][14296] Updated weights for policy 0, policy_version 13327 (0.0029)
[2024-06-06 12:59:18,663][14276] Signal inference workers to stop experience collection... (1050 times)
[2024-06-06 12:59:18,664][14276] Signal inference workers to resume experience collection... (1050 times)
[2024-06-06 12:59:18,677][14296] InferenceWorker_p0-w0: stopping experience collection (1050 times)
[2024-06-06 12:59:18,677][14296] InferenceWorker_p0-w0: resuming experience collection (1050 times)
[2024-06-06 12:59:20,989][14296] Updated weights for policy 0, policy_version 13337 (0.0028)
[2024-06-06 12:59:21,561][14064] Fps is (10 sec: 47514.2, 60 sec: 48332.8, 300 sec: 47874.6). Total num frames: 218513408. Throughput: 0: 47730.9. Samples: 71503220. Policy #0 lag: (min: 0.0, avg: 7.5, max: 21.0)
[2024-06-06 12:59:21,561][14064] Avg episode reward: [(0, '0.120')]
[2024-06-06 12:59:25,240][14296] Updated weights for policy 0, policy_version 13347 (0.0036)
[2024-06-06 12:59:26,561][14064] Fps is (10 sec: 54067.2, 60 sec: 48059.7, 300 sec: 47930.1). Total num frames: 218775552. Throughput: 0: 47819.1. Samples: 71791120. Policy #0 lag: (min: 0.0, avg: 7.5, max: 21.0)
[2024-06-06 12:59:26,562][14064] Avg episode reward: [(0, '0.122')]
[2024-06-06 12:59:27,948][14296] Updated weights for policy 0, policy_version 13357 (0.0027)
[2024-06-06 12:59:31,561][14064] Fps is (10 sec: 44236.8, 60 sec: 46967.5, 300 sec: 47819.1). Total num frames: 218955776. Throughput: 0: 48146.7. Samples: 72087760. Policy #0 lag: (min: 0.0, avg: 12.0, max: 21.0)
[2024-06-06 12:59:31,561][14064] Avg episode reward: [(0, '0.124')]
[2024-06-06 12:59:31,607][14276] Saving new best policy, reward=0.124!
[2024-06-06 12:59:31,924][14296] Updated weights for policy 0, policy_version 13367 (0.0022)
[2024-06-06 12:59:34,710][14296] Updated weights for policy 0, policy_version 13377 (0.0030)
[2024-06-06 12:59:36,561][14064] Fps is (10 sec: 42598.0, 60 sec: 47513.7, 300 sec: 47708.0). Total num frames: 219201536. Throughput: 0: 47476.3. Samples: 72217120. Policy #0 lag: (min: 0.0, avg: 12.0, max: 21.0)
[2024-06-06 12:59:36,562][14064] Avg episode reward: [(0, '0.122')]
[2024-06-06 12:59:38,782][14296] Updated weights for policy 0, policy_version 13387 (0.0036)
[2024-06-06 12:59:41,561][14064] Fps is (10 sec: 52428.5, 60 sec: 48059.7, 300 sec: 47819.1). Total num frames: 219480064. Throughput: 0: 47681.8. Samples: 72504620. Policy #0 lag: (min: 0.0, avg: 8.2, max: 21.0)
[2024-06-06 12:59:41,562][14064] Avg episode reward: [(0, '0.123')]
[2024-06-06 12:59:41,616][14296] Updated weights for policy 0, policy_version 13397 (0.0032)
[2024-06-06 12:59:45,869][14296] Updated weights for policy 0, policy_version 13407 (0.0035)
[2024-06-06 12:59:46,561][14064] Fps is (10 sec: 50791.1, 60 sec: 47513.6, 300 sec: 47874.6). Total num frames: 219709440. Throughput: 0: 47680.6. Samples: 72790320. Policy #0 lag: (min: 0.0, avg: 8.2, max: 21.0)
[2024-06-06 12:59:46,562][14064] Avg episode reward: [(0, '0.116')]
[2024-06-06 12:59:48,392][14296] Updated weights for policy 0, policy_version 13417 (0.0035)
[2024-06-06 12:59:51,561][14064] Fps is (10 sec: 44237.0, 60 sec: 47240.5, 300 sec: 47819.1). Total num frames: 219922432. Throughput: 0: 47620.0. Samples: 72936240. Policy #0 lag: (min: 0.0, avg: 11.3, max: 20.0)
[2024-06-06 12:59:51,562][14064] Avg episode reward: [(0, '0.121')]
[2024-06-06 12:59:52,613][14296] Updated weights for policy 0, policy_version 13427 (0.0035)
[2024-06-06 12:59:55,439][14296] Updated weights for policy 0, policy_version 13437 (0.0031)
[2024-06-06 12:59:56,561][14064] Fps is (10 sec: 45875.3, 60 sec: 48059.8, 300 sec: 47763.5). Total num frames: 220168192. Throughput: 0: 47304.6. Samples: 73209380. Policy #0 lag: (min: 0.0, avg: 11.3, max: 20.0)
[2024-06-06 12:59:56,561][14064] Avg episode reward: [(0, '0.121')]
[2024-06-06 12:59:59,454][14296] Updated weights for policy 0, policy_version 13447 (0.0024)
[2024-06-06 13:00:01,561][14064] Fps is (10 sec: 50790.3, 60 sec: 48059.8, 300 sec: 47819.0). Total num frames: 220430336. Throughput: 0: 47259.6. Samples: 73503620. Policy #0 lag: (min: 0.0, avg: 8.3, max: 21.0)
[2024-06-06 13:00:01,562][14064] Avg episode reward: [(0, '0.121')]
[2024-06-06 13:00:02,173][14296] Updated weights for policy 0, policy_version 13457 (0.0035)
[2024-06-06 13:00:06,354][14296] Updated weights for policy 0, policy_version 13467 (0.0027)
[2024-06-06 13:00:06,564][14064] Fps is (10 sec: 49138.6, 60 sec: 47511.5, 300 sec: 47874.2). Total num frames: 220659712. Throughput: 0: 47974.4. Samples: 73662200. Policy #0 lag: (min: 0.0, avg: 8.3, max: 21.0)
[2024-06-06 13:00:06,565][14064] Avg episode reward: [(0, '0.123')]
[2024-06-06 13:00:06,571][14276] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000013468_220659712.pth...
[2024-06-06 13:00:06,631][14276] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000012769_209207296.pth
[2024-06-06 13:00:08,992][14296] Updated weights for policy 0, policy_version 13477 (0.0032)
[2024-06-06 13:00:11,561][14064] Fps is (10 sec: 44236.4, 60 sec: 47240.5, 300 sec: 47763.5). Total num frames: 220872704. Throughput: 0: 47651.9. Samples: 73935460. Policy #0 lag: (min: 0.0, avg: 12.0, max: 22.0)
[2024-06-06 13:00:11,562][14064] Avg episode reward: [(0, '0.119')]
[2024-06-06 13:00:13,494][14296] Updated weights for policy 0, policy_version 13487 (0.0033)
[2024-06-06 13:00:15,930][14296] Updated weights for policy 0, policy_version 13497 (0.0031)
[2024-06-06 13:00:16,561][14064] Fps is (10 sec: 47526.5, 60 sec: 48332.8, 300 sec: 47708.0). Total num frames: 221134848. Throughput: 0: 47267.1. Samples: 74214780. Policy #0 lag: (min: 0.0, avg: 12.0, max: 22.0)
[2024-06-06 13:00:16,561][14064] Avg episode reward: [(0, '0.125')]
[2024-06-06 13:00:16,666][14276] Saving new best policy, reward=0.125!
[2024-06-06 13:00:20,247][14296] Updated weights for policy 0, policy_version 13507 (0.0031)
[2024-06-06 13:00:21,561][14064] Fps is (10 sec: 50790.3, 60 sec: 47786.5, 300 sec: 47763.5). Total num frames: 221380608. Throughput: 0: 47794.2. Samples: 74367860. Policy #0 lag: (min: 0.0, avg: 9.6, max: 21.0)
[2024-06-06 13:00:21,562][14064] Avg episode reward: [(0, '0.124')]
[2024-06-06 13:00:22,962][14296] Updated weights for policy 0, policy_version 13517 (0.0027)
[2024-06-06 13:00:24,687][14276] Signal inference workers to stop experience collection... (1100 times)
[2024-06-06 13:00:24,735][14276] Signal inference workers to resume experience collection... (1100 times)
[2024-06-06 13:00:24,735][14296] InferenceWorker_p0-w0: stopping experience collection (1100 times)
[2024-06-06 13:00:24,752][14296] InferenceWorker_p0-w0: resuming experience collection (1100 times)
[2024-06-06 13:00:26,561][14064] Fps is (10 sec: 45874.8, 60 sec: 46967.4, 300 sec: 47819.1). Total num frames: 221593600. Throughput: 0: 47678.2. Samples: 74650140. Policy #0 lag: (min: 0.0, avg: 9.6, max: 21.0)
[2024-06-06 13:00:26,562][14064] Avg episode reward: [(0, '0.118')]
[2024-06-06 13:00:26,955][14296] Updated weights for policy 0, policy_version 13527 (0.0032)
[2024-06-06 13:00:29,826][14296] Updated weights for policy 0, policy_version 13537 (0.0027)
[2024-06-06 13:00:31,561][14064] Fps is (10 sec: 44237.0, 60 sec: 47786.6, 300 sec: 47708.0). Total num frames: 221822976. Throughput: 0: 47838.6. Samples: 74943060. Policy #0 lag: (min: 1.0, avg: 10.5, max: 22.0)
[2024-06-06 13:00:31,562][14064] Avg episode reward: [(0, '0.119')]
[2024-06-06 13:00:33,844][14296] Updated weights for policy 0, policy_version 13547 (0.0028)
[2024-06-06 13:00:36,561][14064] Fps is (10 sec: 50790.0, 60 sec: 48332.8, 300 sec: 47708.0). Total num frames: 222101504. Throughput: 0: 47686.9. Samples: 75082160. Policy #0 lag: (min: 1.0, avg: 10.5, max: 22.0)
[2024-06-06 13:00:36,562][14064] Avg episode reward: [(0, '0.121')]
[2024-06-06 13:00:36,577][14296] Updated weights for policy 0, policy_version 13557 (0.0031)
[2024-06-06 13:00:41,107][14296] Updated weights for policy 0, policy_version 13567 (0.0029)
[2024-06-06 13:00:41,561][14064] Fps is (10 sec: 50790.9, 60 sec: 47513.6, 300 sec: 47819.1). Total num frames: 222330880. Throughput: 0: 47889.3. Samples: 75364400. Policy #0 lag: (min: 0.0, avg: 10.7, max: 21.0)
[2024-06-06 13:00:41,562][14064] Avg episode reward: [(0, '0.123')]
[2024-06-06 13:00:43,534][14296] Updated weights for policy 0, policy_version 13577 (0.0020)
[2024-06-06 13:00:46,561][14064] Fps is (10 sec: 42599.2, 60 sec: 46967.5, 300 sec: 47763.5). Total num frames: 222527488. Throughput: 0: 47681.8. Samples: 75649300. Policy #0 lag: (min: 0.0, avg: 10.7, max: 21.0)
[2024-06-06 13:00:46,562][14064] Avg episode reward: [(0, '0.128')]
[2024-06-06 13:00:46,594][14276] Saving new best policy, reward=0.128!
[2024-06-06 13:00:47,837][14296] Updated weights for policy 0, policy_version 13587 (0.0037)
[2024-06-06 13:00:50,491][14296] Updated weights for policy 0, policy_version 13597 (0.0038)
[2024-06-06 13:00:51,561][14064] Fps is (10 sec: 45874.3, 60 sec: 47786.5, 300 sec: 47708.0). Total num frames: 222789632. Throughput: 0: 47140.9. Samples: 75783420. Policy #0 lag: (min: 0.0, avg: 12.1, max: 25.0)
[2024-06-06 13:00:51,562][14064] Avg episode reward: [(0, '0.121')]
[2024-06-06 13:00:54,666][14296] Updated weights for policy 0, policy_version 13607 (0.0021)
[2024-06-06 13:00:56,561][14064] Fps is (10 sec: 52428.1, 60 sec: 48059.6, 300 sec: 47763.5). Total num frames: 223051776. Throughput: 0: 47369.8. Samples: 76067100. Policy #0 lag: (min: 0.0, avg: 12.1, max: 25.0)
[2024-06-06 13:00:56,562][14064] Avg episode reward: [(0, '0.129')]
[2024-06-06 13:00:56,705][14276] Saving new best policy, reward=0.129!
[2024-06-06 13:00:57,307][14296] Updated weights for policy 0, policy_version 13617 (0.0032)
[2024-06-06 13:01:01,489][14296] Updated weights for policy 0, policy_version 13627 (0.0041)
[2024-06-06 13:01:01,561][14064] Fps is (10 sec: 47514.1, 60 sec: 47240.5, 300 sec: 47708.0). Total num frames: 223264768. Throughput: 0: 47711.5. Samples: 76361800. Policy #0 lag: (min: 0.0, avg: 9.3, max: 21.0)
[2024-06-06 13:01:01,562][14064] Avg episode reward: [(0, '0.121')]
[2024-06-06 13:01:04,237][14296] Updated weights for policy 0, policy_version 13637 (0.0031)
[2024-06-06 13:01:06,561][14064] Fps is (10 sec: 44237.2, 60 sec: 47242.7, 300 sec: 47708.0). Total num frames: 223494144. Throughput: 0: 47303.3. Samples: 76496500. Policy #0 lag: (min: 0.0, avg: 9.3, max: 21.0)
[2024-06-06 13:01:06,562][14064] Avg episode reward: [(0, '0.120')]
[2024-06-06 13:01:08,539][14296] Updated weights for policy 0, policy_version 13647 (0.0030)
[2024-06-06 13:01:11,394][14296] Updated weights for policy 0, policy_version 13657 (0.0031)
[2024-06-06 13:01:11,561][14064] Fps is (10 sec: 49152.8, 60 sec: 48059.9, 300 sec: 47708.0). Total num frames: 223756288. Throughput: 0: 47256.2. Samples: 76776660. Policy #0 lag: (min: 2.0, avg: 11.1, max: 22.0)
[2024-06-06 13:01:11,561][14064] Avg episode reward: [(0, '0.128')]
[2024-06-06 13:01:15,346][14296] Updated weights for policy 0, policy_version 13667 (0.0037)
[2024-06-06 13:01:16,561][14064] Fps is (10 sec: 49151.7, 60 sec: 47513.5, 300 sec: 47652.5). Total num frames: 223985664. Throughput: 0: 47124.0. Samples: 77063640. Policy #0 lag: (min: 2.0, avg: 11.1, max: 22.0)
[2024-06-06 13:01:16,562][14064] Avg episode reward: [(0, '0.120')]
[2024-06-06 13:01:18,166][14296] Updated weights for policy 0, policy_version 13677 (0.0029)
[2024-06-06 13:01:21,561][14064] Fps is (10 sec: 44236.3, 60 sec: 46967.6, 300 sec: 47652.4). Total num frames: 224198656. Throughput: 0: 47418.8. Samples: 77216000. Policy #0 lag: (min: 0.0, avg: 11.0, max: 21.0)
[2024-06-06 13:01:21,562][14064] Avg episode reward: [(0, '0.126')]
[2024-06-06 13:01:22,262][14296] Updated weights for policy 0, policy_version 13687 (0.0037)
[2024-06-06 13:01:24,774][14296] Updated weights for policy 0, policy_version 13697 (0.0028)
[2024-06-06 13:01:26,561][14064] Fps is (10 sec: 44236.8, 60 sec: 47240.5, 300 sec: 47596.9). Total num frames: 224428032. Throughput: 0: 47406.6. Samples: 77497700. Policy #0 lag: (min: 0.0, avg: 11.0, max: 21.0)
[2024-06-06 13:01:26,562][14064] Avg episode reward: [(0, '0.123')]
[2024-06-06 13:01:29,226][14296] Updated weights for policy 0, policy_version 13707 (0.0033)
[2024-06-06 13:01:29,962][14276] Signal inference workers to stop experience collection... (1150 times)
[2024-06-06 13:01:29,963][14276] Signal inference workers to resume experience collection... (1150 times)
[2024-06-06 13:01:30,020][14296] InferenceWorker_p0-w0: stopping experience collection (1150 times)
[2024-06-06 13:01:30,020][14296] InferenceWorker_p0-w0: resuming experience collection (1150 times)
[2024-06-06 13:01:31,561][14064] Fps is (10 sec: 52428.3, 60 sec: 48332.8, 300 sec: 47652.5). Total num frames: 224722944. Throughput: 0: 47322.5. Samples: 77778820.
Policy #0 lag: (min: 0.0, avg: 10.0, max: 22.0) [2024-06-06 13:01:31,562][14064] Avg episode reward: [(0, '0.128')] [2024-06-06 13:01:31,783][14296] Updated weights for policy 0, policy_version 13717 (0.0025) [2024-06-06 13:01:36,192][14296] Updated weights for policy 0, policy_version 13727 (0.0043) [2024-06-06 13:01:36,561][14064] Fps is (10 sec: 50790.2, 60 sec: 47240.6, 300 sec: 47652.4). Total num frames: 224935936. Throughput: 0: 47587.2. Samples: 77924840. Policy #0 lag: (min: 0.0, avg: 10.0, max: 22.0) [2024-06-06 13:01:36,562][14064] Avg episode reward: [(0, '0.125')] [2024-06-06 13:01:38,757][14296] Updated weights for policy 0, policy_version 13737 (0.0020) [2024-06-06 13:01:41,561][14064] Fps is (10 sec: 44237.6, 60 sec: 47240.6, 300 sec: 47763.5). Total num frames: 225165312. Throughput: 0: 47770.4. Samples: 78216760. Policy #0 lag: (min: 0.0, avg: 11.4, max: 20.0) [2024-06-06 13:01:41,561][14064] Avg episode reward: [(0, '0.126')] [2024-06-06 13:01:42,894][14296] Updated weights for policy 0, policy_version 13747 (0.0023) [2024-06-06 13:01:45,531][14296] Updated weights for policy 0, policy_version 13757 (0.0025) [2024-06-06 13:01:46,561][14064] Fps is (10 sec: 45875.7, 60 sec: 47786.7, 300 sec: 47652.4). Total num frames: 225394688. Throughput: 0: 47401.0. Samples: 78494840. Policy #0 lag: (min: 0.0, avg: 11.4, max: 20.0) [2024-06-06 13:01:46,562][14064] Avg episode reward: [(0, '0.122')] [2024-06-06 13:01:49,922][14296] Updated weights for policy 0, policy_version 13767 (0.0026) [2024-06-06 13:01:51,561][14064] Fps is (10 sec: 50790.6, 60 sec: 48060.0, 300 sec: 47652.5). Total num frames: 225673216. Throughput: 0: 47649.0. Samples: 78640700. Policy #0 lag: (min: 0.0, avg: 9.7, max: 22.0) [2024-06-06 13:01:51,561][14064] Avg episode reward: [(0, '0.132')] [2024-06-06 13:01:51,566][14276] Saving new best policy, reward=0.132! 
[2024-06-06 13:01:52,341][14296] Updated weights for policy 0, policy_version 13777 (0.0027)
[2024-06-06 13:01:56,561][14064] Fps is (10 sec: 45874.6, 60 sec: 46694.4, 300 sec: 47652.4). Total num frames: 225853440. Throughput: 0: 47795.4. Samples: 78927460. Policy #0 lag: (min: 0.0, avg: 9.7, max: 22.0)
[2024-06-06 13:01:56,562][14064] Avg episode reward: [(0, '0.123')]
[2024-06-06 13:01:56,901][14296] Updated weights for policy 0, policy_version 13787 (0.0036)
[2024-06-06 13:01:59,271][14296] Updated weights for policy 0, policy_version 13797 (0.0027)
[2024-06-06 13:02:01,561][14064] Fps is (10 sec: 42598.1, 60 sec: 47240.6, 300 sec: 47708.0). Total num frames: 226099200. Throughput: 0: 47773.0. Samples: 79213420. Policy #0 lag: (min: 0.0, avg: 11.3, max: 20.0)
[2024-06-06 13:02:01,562][14064] Avg episode reward: [(0, '0.128')]
[2024-06-06 13:02:03,897][14296] Updated weights for policy 0, policy_version 13807 (0.0023)
[2024-06-06 13:02:06,251][14296] Updated weights for policy 0, policy_version 13817 (0.0029)
[2024-06-06 13:02:06,561][14064] Fps is (10 sec: 52429.4, 60 sec: 48059.7, 300 sec: 47708.0). Total num frames: 226377728. Throughput: 0: 47435.2. Samples: 79350580. Policy #0 lag: (min: 0.0, avg: 11.3, max: 20.0)
[2024-06-06 13:02:06,562][14064] Avg episode reward: [(0, '0.130')]
[2024-06-06 13:02:06,575][14276] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000013817_226377728.pth...
[2024-06-06 13:02:06,614][14276] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000013118_214925312.pth
[2024-06-06 13:02:10,675][14296] Updated weights for policy 0, policy_version 13827 (0.0026)
[2024-06-06 13:02:11,564][14064] Fps is (10 sec: 49138.8, 60 sec: 47238.4, 300 sec: 47652.0). Total num frames: 226590720. Throughput: 0: 47538.6. Samples: 79637060. Policy #0 lag: (min: 0.0, avg: 9.1, max: 22.0)
[2024-06-06 13:02:11,565][14064] Avg episode reward: [(0, '0.131')]
[2024-06-06 13:02:13,187][14296] Updated weights for policy 0, policy_version 13837 (0.0021)
[2024-06-06 13:02:16,561][14064] Fps is (10 sec: 44236.7, 60 sec: 47240.6, 300 sec: 47708.0). Total num frames: 226820096. Throughput: 0: 47849.4. Samples: 79932040. Policy #0 lag: (min: 0.0, avg: 9.1, max: 22.0)
[2024-06-06 13:02:16,562][14064] Avg episode reward: [(0, '0.132')]
[2024-06-06 13:02:17,418][14296] Updated weights for policy 0, policy_version 13847 (0.0019)
[2024-06-06 13:02:20,008][14296] Updated weights for policy 0, policy_version 13857 (0.0025)
[2024-06-06 13:02:21,561][14064] Fps is (10 sec: 47526.0, 60 sec: 47786.6, 300 sec: 47652.5). Total num frames: 227065856. Throughput: 0: 47519.6. Samples: 80063220. Policy #0 lag: (min: 0.0, avg: 9.5, max: 21.0)
[2024-06-06 13:02:21,570][14064] Avg episode reward: [(0, '0.129')]
[2024-06-06 13:02:24,369][14296] Updated weights for policy 0, policy_version 13867 (0.0028)
[2024-06-06 13:02:26,561][14064] Fps is (10 sec: 50790.7, 60 sec: 48332.9, 300 sec: 47652.4). Total num frames: 227328000. Throughput: 0: 47412.4. Samples: 80350320. Policy #0 lag: (min: 0.0, avg: 9.5, max: 21.0)
[2024-06-06 13:02:26,562][14064] Avg episode reward: [(0, '0.128')]
[2024-06-06 13:02:26,874][14296] Updated weights for policy 0, policy_version 13877 (0.0027)
[2024-06-06 13:02:31,244][14296] Updated weights for policy 0, policy_version 13887 (0.0040)
[2024-06-06 13:02:31,564][14064] Fps is (10 sec: 47501.4, 60 sec: 46965.5, 300 sec: 47652.0). Total num frames: 227540992. Throughput: 0: 47807.8. Samples: 80646320. Policy #0 lag: (min: 0.0, avg: 9.5, max: 21.0)
[2024-06-06 13:02:31,564][14064] Avg episode reward: [(0, '0.131')]
[2024-06-06 13:02:33,730][14296] Updated weights for policy 0, policy_version 13897 (0.0034)
[2024-06-06 13:02:36,561][14064] Fps is (10 sec: 44236.4, 60 sec: 47240.6, 300 sec: 47708.0). Total num frames: 227770368. Throughput: 0: 47632.7. Samples: 80784180. Policy #0 lag: (min: 1.0, avg: 9.6, max: 23.0)
[2024-06-06 13:02:36,562][14064] Avg episode reward: [(0, '0.131')]
[2024-06-06 13:02:38,130][14296] Updated weights for policy 0, policy_version 13907 (0.0032)
[2024-06-06 13:02:40,696][14296] Updated weights for policy 0, policy_version 13917 (0.0035)
[2024-06-06 13:02:41,561][14064] Fps is (10 sec: 47525.5, 60 sec: 47513.4, 300 sec: 47652.4). Total num frames: 228016128. Throughput: 0: 47436.9. Samples: 81062120. Policy #0 lag: (min: 1.0, avg: 9.6, max: 23.0)
[2024-06-06 13:02:41,562][14064] Avg episode reward: [(0, '0.127')]
[2024-06-06 13:02:44,778][14296] Updated weights for policy 0, policy_version 13927 (0.0034)
[2024-06-06 13:02:45,258][14276] Signal inference workers to stop experience collection... (1200 times)
[2024-06-06 13:02:45,258][14276] Signal inference workers to resume experience collection... (1200 times)
[2024-06-06 13:02:45,289][14296] InferenceWorker_p0-w0: stopping experience collection (1200 times)
[2024-06-06 13:02:45,289][14296] InferenceWorker_p0-w0: resuming experience collection (1200 times)
[2024-06-06 13:02:46,561][14064] Fps is (10 sec: 52428.9, 60 sec: 48332.7, 300 sec: 47652.5). Total num frames: 228294656. Throughput: 0: 47474.6. Samples: 81349780. Policy #0 lag: (min: 0.0, avg: 9.6, max: 21.0)
[2024-06-06 13:02:46,562][14064] Avg episode reward: [(0, '0.124')]
[2024-06-06 13:02:47,779][14296] Updated weights for policy 0, policy_version 13937 (0.0036)
[2024-06-06 13:02:51,561][14064] Fps is (10 sec: 45876.0, 60 sec: 46694.4, 300 sec: 47596.9). Total num frames: 228474880. Throughput: 0: 47691.6. Samples: 81496700. Policy #0 lag: (min: 0.0, avg: 9.6, max: 21.0)
[2024-06-06 13:02:51,562][14064] Avg episode reward: [(0, '0.126')]
[2024-06-06 13:02:51,794][14296] Updated weights for policy 0, policy_version 13947 (0.0028)
[2024-06-06 13:02:54,405][14296] Updated weights for policy 0, policy_version 13957 (0.0028)
[2024-06-06 13:02:56,561][14064] Fps is (10 sec: 44237.3, 60 sec: 48059.9, 300 sec: 47763.5). Total num frames: 228737024. Throughput: 0: 47802.9. Samples: 81788060. Policy #0 lag: (min: 1.0, avg: 10.7, max: 22.0)
[2024-06-06 13:02:56,561][14064] Avg episode reward: [(0, '0.132')]
[2024-06-06 13:02:58,544][14296] Updated weights for policy 0, policy_version 13967 (0.0026)
[2024-06-06 13:03:01,395][14296] Updated weights for policy 0, policy_version 13977 (0.0033)
[2024-06-06 13:03:01,561][14064] Fps is (10 sec: 52427.9, 60 sec: 48332.7, 300 sec: 47708.0). Total num frames: 228999168. Throughput: 0: 47425.7. Samples: 82066200. Policy #0 lag: (min: 1.0, avg: 10.7, max: 22.0)
[2024-06-06 13:03:01,562][14064] Avg episode reward: [(0, '0.129')]
[2024-06-06 13:03:05,389][14296] Updated weights for policy 0, policy_version 13987 (0.0029)
[2024-06-06 13:03:06,561][14064] Fps is (10 sec: 49151.3, 60 sec: 47513.5, 300 sec: 47596.9). Total num frames: 229228544. Throughput: 0: 47962.7. Samples: 82221540. Policy #0 lag: (min: 0.0, avg: 9.4, max: 21.0)
[2024-06-06 13:03:06,562][14064] Avg episode reward: [(0, '0.128')]
[2024-06-06 13:03:08,306][14296] Updated weights for policy 0, policy_version 13997 (0.0019)
[2024-06-06 13:03:11,561][14064] Fps is (10 sec: 42599.0, 60 sec: 47242.6, 300 sec: 47652.5). Total num frames: 229425152. Throughput: 0: 47900.9. Samples: 82505860. Policy #0 lag: (min: 0.0, avg: 9.4, max: 21.0)
[2024-06-06 13:03:11,561][14064] Avg episode reward: [(0, '0.129')]
[2024-06-06 13:03:12,202][14296] Updated weights for policy 0, policy_version 14007 (0.0029)
[2024-06-06 13:03:15,063][14296] Updated weights for policy 0, policy_version 14017 (0.0021)
[2024-06-06 13:03:16,561][14064] Fps is (10 sec: 47513.4, 60 sec: 48059.7, 300 sec: 47763.5). Total num frames: 229703680. Throughput: 0: 47602.7. Samples: 82788320. Policy #0 lag: (min: 0.0, avg: 10.8, max: 21.0)
[2024-06-06 13:03:16,562][14064] Avg episode reward: [(0, '0.127')]
[2024-06-06 13:03:19,154][14296] Updated weights for policy 0, policy_version 14027 (0.0030)
[2024-06-06 13:03:21,561][14064] Fps is (10 sec: 52428.3, 60 sec: 48059.7, 300 sec: 47652.4). Total num frames: 229949440. Throughput: 0: 47764.4. Samples: 82933580. Policy #0 lag: (min: 0.0, avg: 10.8, max: 21.0)
[2024-06-06 13:03:21,562][14064] Avg episode reward: [(0, '0.131')]
[2024-06-06 13:03:21,845][14296] Updated weights for policy 0, policy_version 14037 (0.0028)
[2024-06-06 13:03:25,993][14296] Updated weights for policy 0, policy_version 14047 (0.0025)
[2024-06-06 13:03:26,561][14064] Fps is (10 sec: 45875.8, 60 sec: 47240.5, 300 sec: 47541.4). Total num frames: 230162432. Throughput: 0: 48082.8. Samples: 83225840. Policy #0 lag: (min: 0.0, avg: 10.2, max: 22.0)
[2024-06-06 13:03:26,562][14064] Avg episode reward: [(0, '0.134')]
[2024-06-06 13:03:26,634][14276] Saving new best policy, reward=0.134!
[2024-06-06 13:03:28,649][14296] Updated weights for policy 0, policy_version 14057 (0.0034)
[2024-06-06 13:03:31,561][14064] Fps is (10 sec: 44236.9, 60 sec: 47515.6, 300 sec: 47596.9). Total num frames: 230391808. Throughput: 0: 47796.0. Samples: 83500600. Policy #0 lag: (min: 0.0, avg: 10.2, max: 22.0)
[2024-06-06 13:03:31,562][14064] Avg episode reward: [(0, '0.124')]
[2024-06-06 13:03:32,778][14296] Updated weights for policy 0, policy_version 14067 (0.0038)
[2024-06-06 13:03:35,654][14296] Updated weights for policy 0, policy_version 14077 (0.0043)
[2024-06-06 13:03:36,561][14064] Fps is (10 sec: 47513.6, 60 sec: 47786.7, 300 sec: 47596.9). Total num frames: 230637568. Throughput: 0: 47558.2. Samples: 83636820. Policy #0 lag: (min: 1.0, avg: 11.8, max: 22.0)
[2024-06-06 13:03:36,562][14064] Avg episode reward: [(0, '0.127')]
[2024-06-06 13:03:39,810][14296] Updated weights for policy 0, policy_version 14087 (0.0026)
[2024-06-06 13:03:41,561][14064] Fps is (10 sec: 50790.5, 60 sec: 48059.8, 300 sec: 47596.9). Total num frames: 230899712. Throughput: 0: 47598.1. Samples: 83929980. Policy #0 lag: (min: 1.0, avg: 11.8, max: 22.0)
[2024-06-06 13:03:41,562][14064] Avg episode reward: [(0, '0.134')]
[2024-06-06 13:03:42,671][14296] Updated weights for policy 0, policy_version 14097 (0.0034)
[2024-06-06 13:03:46,564][14064] Fps is (10 sec: 45863.0, 60 sec: 46692.4, 300 sec: 47485.4). Total num frames: 231096320. Throughput: 0: 47666.2. Samples: 84211300. Policy #0 lag: (min: 0.0, avg: 9.7, max: 22.0)
[2024-06-06 13:03:46,565][14064] Avg episode reward: [(0, '0.132')]
[2024-06-06 13:03:46,894][14296] Updated weights for policy 0, policy_version 14107 (0.0025)
[2024-06-06 13:03:49,562][14296] Updated weights for policy 0, policy_version 14117 (0.0037)
[2024-06-06 13:03:50,697][14276] Signal inference workers to stop experience collection... (1250 times)
[2024-06-06 13:03:50,698][14276] Signal inference workers to resume experience collection... (1250 times)
[2024-06-06 13:03:50,735][14296] InferenceWorker_p0-w0: stopping experience collection (1250 times)
[2024-06-06 13:03:50,735][14296] InferenceWorker_p0-w0: resuming experience collection (1250 times)
[2024-06-06 13:03:51,561][14064] Fps is (10 sec: 44236.8, 60 sec: 47786.6, 300 sec: 47652.5). Total num frames: 231342080. Throughput: 0: 47092.5. Samples: 84340700. Policy #0 lag: (min: 0.0, avg: 9.7, max: 22.0)
[2024-06-06 13:03:51,562][14064] Avg episode reward: [(0, '0.125')]
[2024-06-06 13:03:54,043][14296] Updated weights for policy 0, policy_version 14127 (0.0031)
[2024-06-06 13:03:56,491][14296] Updated weights for policy 0, policy_version 14137 (0.0024)
[2024-06-06 13:03:56,561][14064] Fps is (10 sec: 52442.7, 60 sec: 48059.7, 300 sec: 47708.0). Total num frames: 231620608. Throughput: 0: 46961.8. Samples: 84619140. Policy #0 lag: (min: 0.0, avg: 11.1, max: 23.0)
[2024-06-06 13:03:56,562][14064] Avg episode reward: [(0, '0.134')]
[2024-06-06 13:04:01,053][14296] Updated weights for policy 0, policy_version 14147 (0.0032)
[2024-06-06 13:04:01,561][14064] Fps is (10 sec: 45875.5, 60 sec: 46694.5, 300 sec: 47430.3). Total num frames: 231800832. Throughput: 0: 47137.9. Samples: 84909520. Policy #0 lag: (min: 0.0, avg: 11.1, max: 23.0)
[2024-06-06 13:04:01,561][14064] Avg episode reward: [(0, '0.130')]
[2024-06-06 13:04:03,700][14296] Updated weights for policy 0, policy_version 14157 (0.0033)
[2024-06-06 13:04:06,561][14064] Fps is (10 sec: 40959.9, 60 sec: 46694.4, 300 sec: 47430.3). Total num frames: 232030208. Throughput: 0: 46980.1. Samples: 85047680. Policy #0 lag: (min: 0.0, avg: 9.1, max: 21.0)
[2024-06-06 13:04:06,562][14064] Avg episode reward: [(0, '0.132')]
[2024-06-06 13:04:06,654][14276] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000014163_232046592.pth...
[2024-06-06 13:04:06,705][14276] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000013468_220659712.pth
[2024-06-06 13:04:07,790][14296] Updated weights for policy 0, policy_version 14167 (0.0033)
[2024-06-06 13:04:10,466][14296] Updated weights for policy 0, policy_version 14177 (0.0026)
[2024-06-06 13:04:11,561][14064] Fps is (10 sec: 49152.0, 60 sec: 47786.7, 300 sec: 47652.5). Total num frames: 232292352. Throughput: 0: 46801.8. Samples: 85331920. Policy #0 lag: (min: 0.0, avg: 9.1, max: 21.0)
[2024-06-06 13:04:11,561][14064] Avg episode reward: [(0, '0.131')]
[2024-06-06 13:04:14,719][14296] Updated weights for policy 0, policy_version 14187 (0.0026)
[2024-06-06 13:04:16,561][14064] Fps is (10 sec: 54067.4, 60 sec: 47786.8, 300 sec: 47652.4). Total num frames: 232570880. Throughput: 0: 47149.0. Samples: 85622300. Policy #0 lag: (min: 0.0, avg: 11.8, max: 21.0)
[2024-06-06 13:04:16,562][14064] Avg episode reward: [(0, '0.130')]
[2024-06-06 13:04:17,131][14296] Updated weights for policy 0, policy_version 14197 (0.0032)
[2024-06-06 13:04:21,561][14064] Fps is (10 sec: 45875.1, 60 sec: 46694.5, 300 sec: 47374.8). Total num frames: 232751104. Throughput: 0: 47488.9. Samples: 85773820. Policy #0 lag: (min: 0.0, avg: 11.8, max: 21.0)
[2024-06-06 13:04:21,562][14064] Avg episode reward: [(0, '0.130')]
[2024-06-06 13:04:21,689][14296] Updated weights for policy 0, policy_version 14207 (0.0032)
[2024-06-06 13:04:24,061][14296] Updated weights for policy 0, policy_version 14217 (0.0028)
[2024-06-06 13:04:26,561][14064] Fps is (10 sec: 40959.6, 60 sec: 46967.4, 300 sec: 47541.4). Total num frames: 232980480. Throughput: 0: 47069.3. Samples: 86048100. Policy #0 lag: (min: 0.0, avg: 11.8, max: 21.0)
[2024-06-06 13:04:26,562][14064] Avg episode reward: [(0, '0.129')]
[2024-06-06 13:04:28,401][14296] Updated weights for policy 0, policy_version 14227 (0.0034)
[2024-06-06 13:04:31,239][14296] Updated weights for policy 0, policy_version 14237 (0.0031)
[2024-06-06 13:04:31,561][14064] Fps is (10 sec: 50789.9, 60 sec: 47786.6, 300 sec: 47652.5). Total num frames: 233259008. Throughput: 0: 47184.0. Samples: 86334460. Policy #0 lag: (min: 0.0, avg: 8.7, max: 21.0)
[2024-06-06 13:04:31,562][14064] Avg episode reward: [(0, '0.122')]
[2024-06-06 13:04:35,258][14296] Updated weights for policy 0, policy_version 14247 (0.0026)
[2024-06-06 13:04:36,561][14064] Fps is (10 sec: 52428.9, 60 sec: 47786.6, 300 sec: 47541.4). Total num frames: 233504768. Throughput: 0: 47786.2. Samples: 86491080. Policy #0 lag: (min: 0.0, avg: 8.7, max: 21.0)
[2024-06-06 13:04:36,562][14064] Avg episode reward: [(0, '0.133')]
[2024-06-06 13:04:38,094][14296] Updated weights for policy 0, policy_version 14257 (0.0035)
[2024-06-06 13:04:41,561][14064] Fps is (10 sec: 44237.3, 60 sec: 46694.4, 300 sec: 47430.3). Total num frames: 233701376. Throughput: 0: 47929.8. Samples: 86775980. Policy #0 lag: (min: 0.0, avg: 12.0, max: 22.0)
[2024-06-06 13:04:41,562][14064] Avg episode reward: [(0, '0.129')]
[2024-06-06 13:04:42,205][14296] Updated weights for policy 0, policy_version 14267 (0.0028)
[2024-06-06 13:04:44,989][14296] Updated weights for policy 0, policy_version 14277 (0.0033)
[2024-06-06 13:04:46,561][14064] Fps is (10 sec: 45875.1, 60 sec: 47788.7, 300 sec: 47596.9). Total num frames: 233963520. Throughput: 0: 47527.0. Samples: 87048240. Policy #0 lag: (min: 0.0, avg: 12.0, max: 22.0)
[2024-06-06 13:04:46,562][14064] Avg episode reward: [(0, '0.135')]
[2024-06-06 13:04:49,132][14296] Updated weights for policy 0, policy_version 14287 (0.0027)
[2024-06-06 13:04:51,561][14064] Fps is (10 sec: 50790.3, 60 sec: 47786.7, 300 sec: 47596.9). Total num frames: 234209280. Throughput: 0: 47782.7. Samples: 87197900. Policy #0 lag: (min: 0.0, avg: 8.6, max: 20.0)
[2024-06-06 13:04:51,562][14064] Avg episode reward: [(0, '0.129')]
[2024-06-06 13:04:52,151][14296] Updated weights for policy 0, policy_version 14297 (0.0030)
[2024-06-06 13:04:55,779][14296] Updated weights for policy 0, policy_version 14307 (0.0027)
[2024-06-06 13:04:56,561][14064] Fps is (10 sec: 49152.6, 60 sec: 47240.6, 300 sec: 47541.4). Total num frames: 234455040. Throughput: 0: 47896.5. Samples: 87487260. Policy #0 lag: (min: 0.0, avg: 8.6, max: 20.0)
[2024-06-06 13:04:56,562][14064] Avg episode reward: [(0, '0.137')]
[2024-06-06 13:04:56,571][14276] Saving new best policy, reward=0.137!
[2024-06-06 13:04:59,040][14296] Updated weights for policy 0, policy_version 14317 (0.0028)
[2024-06-06 13:05:01,561][14064] Fps is (10 sec: 44236.5, 60 sec: 47513.5, 300 sec: 47430.7). Total num frames: 234651648. Throughput: 0: 47912.8. Samples: 87778380. Policy #0 lag: (min: 0.0, avg: 11.9, max: 22.0)
[2024-06-06 13:05:01,562][14064] Avg episode reward: [(0, '0.125')]
[2024-06-06 13:05:02,613][14296] Updated weights for policy 0, policy_version 14327 (0.0027)
[2024-06-06 13:05:05,682][14296] Updated weights for policy 0, policy_version 14337 (0.0031)
[2024-06-06 13:05:06,561][14064] Fps is (10 sec: 49151.7, 60 sec: 48605.9, 300 sec: 47708.0). Total num frames: 234946560. Throughput: 0: 47435.5. Samples: 87908420. Policy #0 lag: (min: 0.0, avg: 11.9, max: 22.0)
[2024-06-06 13:05:06,562][14064] Avg episode reward: [(0, '0.129')]
[2024-06-06 13:05:09,280][14276] Signal inference workers to stop experience collection... (1300 times)
[2024-06-06 13:05:09,280][14276] Signal inference workers to resume experience collection... (1300 times)
[2024-06-06 13:05:09,303][14296] InferenceWorker_p0-w0: stopping experience collection (1300 times)
[2024-06-06 13:05:09,303][14296] InferenceWorker_p0-w0: resuming experience collection (1300 times)
[2024-06-06 13:05:09,601][14296] Updated weights for policy 0, policy_version 14347 (0.0035)
[2024-06-06 13:05:11,561][14064] Fps is (10 sec: 54067.6, 60 sec: 48332.8, 300 sec: 47652.4). Total num frames: 235192320. Throughput: 0: 47933.9. Samples: 88205120. Policy #0 lag: (min: 0.0, avg: 8.7, max: 22.0)
[2024-06-06 13:05:11,562][14064] Avg episode reward: [(0, '0.130')]
[2024-06-06 13:05:12,295][14296] Updated weights for policy 0, policy_version 14357 (0.0036)
[2024-06-06 13:05:16,561][14064] Fps is (10 sec: 42598.8, 60 sec: 46694.4, 300 sec: 47430.3). Total num frames: 235372544. Throughput: 0: 48187.7. Samples: 88502900. Policy #0 lag: (min: 0.0, avg: 8.7, max: 22.0)
[2024-06-06 13:05:16,562][14064] Avg episode reward: [(0, '0.126')]
[2024-06-06 13:05:16,617][14296] Updated weights for policy 0, policy_version 14367 (0.0028)
[2024-06-06 13:05:19,298][14296] Updated weights for policy 0, policy_version 14377 (0.0021)
[2024-06-06 13:05:21,563][14064] Fps is (10 sec: 42590.2, 60 sec: 47785.1, 300 sec: 47541.1). Total num frames: 235618304. Throughput: 0: 47502.0. Samples: 88628760. Policy #0 lag: (min: 0.0, avg: 12.2, max: 23.0)
[2024-06-06 13:05:21,564][14064] Avg episode reward: [(0, '0.126')]
[2024-06-06 13:05:23,196][14296] Updated weights for policy 0, policy_version 14387 (0.0028)
[2024-06-06 13:05:25,994][14296] Updated weights for policy 0, policy_version 14397 (0.0027)
[2024-06-06 13:05:26,561][14064] Fps is (10 sec: 54066.1, 60 sec: 48878.9, 300 sec: 47763.5). Total num frames: 235913216. Throughput: 0: 47765.2. Samples: 88925420. Policy #0 lag: (min: 0.0, avg: 12.2, max: 23.0)
[2024-06-06 13:05:26,562][14064] Avg episode reward: [(0, '0.134')]
[2024-06-06 13:05:30,026][14296] Updated weights for policy 0, policy_version 14407 (0.0028)
[2024-06-06 13:05:31,561][14064] Fps is (10 sec: 52438.7, 60 sec: 48059.8, 300 sec: 47596.9). Total num frames: 236142592. Throughput: 0: 48020.1. Samples: 89209140. Policy #0 lag: (min: 0.0, avg: 9.0, max: 20.0)
[2024-06-06 13:05:31,562][14064] Avg episode reward: [(0, '0.133')]
[2024-06-06 13:05:32,667][14296] Updated weights for policy 0, policy_version 14417 (0.0030)
[2024-06-06 13:05:36,561][14064] Fps is (10 sec: 40960.4, 60 sec: 46967.5, 300 sec: 47430.3). Total num frames: 236322816. Throughput: 0: 48002.2. Samples: 89358000. Policy #0 lag: (min: 0.0, avg: 9.0, max: 20.0)
[2024-06-06 13:05:36,562][14064] Avg episode reward: [(0, '0.132')]
[2024-06-06 13:05:36,927][14296] Updated weights for policy 0, policy_version 14427 (0.0023)
[2024-06-06 13:05:39,730][14296] Updated weights for policy 0, policy_version 14437 (0.0032)
[2024-06-06 13:05:41,561][14064] Fps is (10 sec: 44236.9, 60 sec: 48059.7, 300 sec: 47652.4). Total num frames: 236584960. Throughput: 0: 47861.3. Samples: 89641020. Policy #0 lag: (min: 0.0, avg: 9.0, max: 20.0)
[2024-06-06 13:05:41,562][14064] Avg episode reward: [(0, '0.132')]
[2024-06-06 13:05:43,941][14296] Updated weights for policy 0, policy_version 14447 (0.0024)
[2024-06-06 13:05:46,561][14064] Fps is (10 sec: 50790.7, 60 sec: 47786.7, 300 sec: 47596.9). Total num frames: 236830720. Throughput: 0: 47670.3. Samples: 89923540. Policy #0 lag: (min: 0.0, avg: 9.3, max: 22.0)
[2024-06-06 13:05:46,562][14064] Avg episode reward: [(0, '0.130')]
[2024-06-06 13:05:46,746][14296] Updated weights for policy 0, policy_version 14457 (0.0040)
[2024-06-06 13:05:50,610][14296] Updated weights for policy 0, policy_version 14467 (0.0027)
[2024-06-06 13:05:51,561][14064] Fps is (10 sec: 49152.3, 60 sec: 47786.7, 300 sec: 47541.4). Total num frames: 237076480. Throughput: 0: 48285.4. Samples: 90081260. Policy #0 lag: (min: 0.0, avg: 9.3, max: 22.0)
[2024-06-06 13:05:51,562][14064] Avg episode reward: [(0, '0.135')]
[2024-06-06 13:05:53,491][14296] Updated weights for policy 0, policy_version 14477 (0.0024)
[2024-06-06 13:05:56,561][14064] Fps is (10 sec: 44236.6, 60 sec: 46967.4, 300 sec: 47485.8). Total num frames: 237273088. Throughput: 0: 48023.1. Samples: 90366160. Policy #0 lag: (min: 0.0, avg: 12.5, max: 23.0)
[2024-06-06 13:05:56,562][14064] Avg episode reward: [(0, '0.136')]
[2024-06-06 13:05:57,466][14296] Updated weights for policy 0, policy_version 14487 (0.0029)
[2024-06-06 13:06:00,155][14296] Updated weights for policy 0, policy_version 14497 (0.0035)
[2024-06-06 13:06:01,561][14064] Fps is (10 sec: 47513.7, 60 sec: 48332.9, 300 sec: 47652.5). Total num frames: 237551616. Throughput: 0: 47594.2. Samples: 90644640. Policy #0 lag: (min: 0.0, avg: 12.5, max: 23.0)
[2024-06-06 13:06:01,562][14064] Avg episode reward: [(0, '0.133')]
[2024-06-06 13:06:04,405][14296] Updated weights for policy 0, policy_version 14507 (0.0031)
[2024-06-06 13:06:06,073][14276] Signal inference workers to stop experience collection... (1350 times)
[2024-06-06 13:06:06,118][14296] InferenceWorker_p0-w0: stopping experience collection (1350 times)
[2024-06-06 13:06:06,184][14276] Signal inference workers to resume experience collection... (1350 times)
[2024-06-06 13:06:06,184][14296] InferenceWorker_p0-w0: resuming experience collection (1350 times)
[2024-06-06 13:06:06,561][14064] Fps is (10 sec: 54066.8, 60 sec: 47786.6, 300 sec: 47652.4). Total num frames: 237813760. Throughput: 0: 48007.7. Samples: 90789020. Policy #0 lag: (min: 0.0, avg: 7.5, max: 21.0)
[2024-06-06 13:06:06,562][14064] Avg episode reward: [(0, '0.132')]
[2024-06-06 13:06:06,575][14276] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000014515_237813760.pth...
[2024-06-06 13:06:06,622][14276] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000013817_226377728.pth
[2024-06-06 13:06:07,141][14296] Updated weights for policy 0, policy_version 14517 (0.0030)
[2024-06-06 13:06:11,427][14296] Updated weights for policy 0, policy_version 14527 (0.0027)
[2024-06-06 13:06:11,561][14064] Fps is (10 sec: 45874.5, 60 sec: 46967.4, 300 sec: 47541.4). Total num frames: 238010368. Throughput: 0: 47734.3. Samples: 91073460. Policy #0 lag: (min: 0.0, avg: 7.5, max: 21.0)
[2024-06-06 13:06:11,562][14064] Avg episode reward: [(0, '0.133')]
[2024-06-06 13:06:14,226][14296] Updated weights for policy 0, policy_version 14537 (0.0034)
[2024-06-06 13:06:16,561][14064] Fps is (10 sec: 40960.6, 60 sec: 47513.6, 300 sec: 47541.4). Total num frames: 238223360. Throughput: 0: 47730.3. Samples: 91357000. Policy #0 lag: (min: 0.0, avg: 11.4, max: 20.0)
[2024-06-06 13:06:16,561][14064] Avg episode reward: [(0, '0.131')]
[2024-06-06 13:06:18,207][14296] Updated weights for policy 0, policy_version 14547 (0.0027)
[2024-06-06 13:06:20,992][14296] Updated weights for policy 0, policy_version 14557 (0.0029)
[2024-06-06 13:06:21,561][14064] Fps is (10 sec: 50790.5, 60 sec: 48334.3, 300 sec: 47763.5). Total num frames: 238518272. Throughput: 0: 47414.6. Samples: 91491660. Policy #0 lag: (min: 0.0, avg: 11.4, max: 20.0)
[2024-06-06 13:06:21,562][14064] Avg episode reward: [(0, '0.134')]
[2024-06-06 13:06:25,274][14296] Updated weights for policy 0, policy_version 14567 (0.0022)
[2024-06-06 13:06:26,561][14064] Fps is (10 sec: 52427.7, 60 sec: 47240.5, 300 sec: 47541.4). Total num frames: 238747648. Throughput: 0: 47510.5. Samples: 91779000. Policy #0 lag: (min: 0.0, avg: 9.5, max: 20.0)
[2024-06-06 13:06:26,562][14064] Avg episode reward: [(0, '0.139')]
[2024-06-06 13:06:26,569][14276] Saving new best policy, reward=0.139!
[2024-06-06 13:06:27,902][14296] Updated weights for policy 0, policy_version 14577 (0.0035)
[2024-06-06 13:06:31,561][14064] Fps is (10 sec: 40959.4, 60 sec: 46421.2, 300 sec: 47430.3). Total num frames: 238927872. Throughput: 0: 47647.3. Samples: 92067680. Policy #0 lag: (min: 0.0, avg: 9.5, max: 20.0)
[2024-06-06 13:06:31,562][14064] Avg episode reward: [(0, '0.133')]
[2024-06-06 13:06:32,207][14296] Updated weights for policy 0, policy_version 14587 (0.0031)
[2024-06-06 13:06:34,713][14296] Updated weights for policy 0, policy_version 14597 (0.0030)
[2024-06-06 13:06:36,561][14064] Fps is (10 sec: 44237.6, 60 sec: 47786.7, 300 sec: 47541.4). Total num frames: 239190016. Throughput: 0: 47052.4. Samples: 92198620. Policy #0 lag: (min: 0.0, avg: 9.5, max: 20.0)
[2024-06-06 13:06:36,562][14064] Avg episode reward: [(0, '0.134')]
[2024-06-06 13:06:38,892][14296] Updated weights for policy 0, policy_version 14607 (0.0032)
[2024-06-06 13:06:41,564][14064] Fps is (10 sec: 54053.7, 60 sec: 48057.6, 300 sec: 47707.5). Total num frames: 239468544. Throughput: 0: 47099.4. Samples: 92485760. Policy #0 lag: (min: 0.0, avg: 10.8, max: 21.0)
[2024-06-06 13:06:41,565][14064] Avg episode reward: [(0, '0.132')]
[2024-06-06 13:06:41,719][14296] Updated weights for policy 0, policy_version 14617 (0.0025)
[2024-06-06 13:06:45,823][14296] Updated weights for policy 0, policy_version 14627 (0.0028)
[2024-06-06 13:06:46,561][14064] Fps is (10 sec: 47513.5, 60 sec: 47240.5, 300 sec: 47430.3). Total num frames: 239665152. Throughput: 0: 47451.9. Samples: 92779980. Policy #0 lag: (min: 0.0, avg: 10.8, max: 21.0)
[2024-06-06 13:06:46,562][14064] Avg episode reward: [(0, '0.136')]
[2024-06-06 13:06:48,426][14296] Updated weights for policy 0, policy_version 14637 (0.0031)
[2024-06-06 13:06:51,561][14064] Fps is (10 sec: 42609.6, 60 sec: 46967.4, 300 sec: 47596.9). Total num frames: 239894528. Throughput: 0: 47365.8. Samples: 92920480. Policy #0 lag: (min: 0.0, avg: 11.7, max: 23.0)
[2024-06-06 13:06:51,562][14064] Avg episode reward: [(0, '0.134')]
[2024-06-06 13:06:52,672][14296] Updated weights for policy 0, policy_version 14647 (0.0026)
[2024-06-06 13:06:55,271][14296] Updated weights for policy 0, policy_version 14657 (0.0032)
[2024-06-06 13:06:56,561][14064] Fps is (10 sec: 50790.5, 60 sec: 48332.8, 300 sec: 47708.0). Total num frames: 240173056. Throughput: 0: 47236.1. Samples: 93199080. Policy #0 lag: (min: 0.0, avg: 11.7, max: 23.0)
[2024-06-06 13:06:56,562][14064] Avg episode reward: [(0, '0.132')]
[2024-06-06 13:06:59,498][14296] Updated weights for policy 0, policy_version 14667 (0.0024)
[2024-06-06 13:07:01,561][14064] Fps is (10 sec: 52428.4, 60 sec: 47786.5, 300 sec: 47596.9). Total num frames: 240418816. Throughput: 0: 47254.5. Samples: 93483460. Policy #0 lag: (min: 0.0, avg: 10.1, max: 21.0)
[2024-06-06 13:07:01,562][14064] Avg episode reward: [(0, '0.132')]
[2024-06-06 13:07:02,255][14296] Updated weights for policy 0, policy_version 14677 (0.0030)
[2024-06-06 13:07:06,490][14296] Updated weights for policy 0, policy_version 14687 (0.0028)
[2024-06-06 13:07:06,561][14064] Fps is (10 sec: 45874.6, 60 sec: 46967.5, 300 sec: 47597.3). Total num frames: 240631808. Throughput: 0: 47732.8. Samples: 93639640. Policy #0 lag: (min: 0.0, avg: 10.1, max: 21.0)
[2024-06-06 13:07:06,562][14064] Avg episode reward: [(0, '0.137')]
[2024-06-06 13:07:09,131][14296] Updated weights for policy 0, policy_version 14697 (0.0035)
[2024-06-06 13:07:11,561][14064] Fps is (10 sec: 42598.4, 60 sec: 47240.5, 300 sec: 47541.3). Total num frames: 240844800. Throughput: 0: 47527.6. Samples: 93917740. Policy #0 lag: (min: 0.0, avg: 12.5, max: 23.0)
[2024-06-06 13:07:11,562][14064] Avg episode reward: [(0, '0.129')]
[2024-06-06 13:07:13,150][14276] Signal inference workers to stop experience collection... (1400 times)
[2024-06-06 13:07:13,200][14296] InferenceWorker_p0-w0: stopping experience collection (1400 times)
[2024-06-06 13:07:13,204][14276] Signal inference workers to resume experience collection... (1400 times)
[2024-06-06 13:07:13,224][14296] InferenceWorker_p0-w0: resuming experience collection (1400 times)
[2024-06-06 13:07:13,331][14296] Updated weights for policy 0, policy_version 14707 (0.0025)
[2024-06-06 13:07:16,132][14296] Updated weights for policy 0, policy_version 14717 (0.0030)
[2024-06-06 13:07:16,561][14064] Fps is (10 sec: 50790.6, 60 sec: 48605.8, 300 sec: 47708.0). Total num frames: 241139712. Throughput: 0: 47210.3. Samples: 94192140. Policy #0 lag: (min: 0.0, avg: 12.5, max: 23.0)
[2024-06-06 13:07:16,562][14064] Avg episode reward: [(0, '0.125')]
[2024-06-06 13:07:20,639][14296] Updated weights for policy 0, policy_version 14727 (0.0024)
[2024-06-06 13:07:21,561][14064] Fps is (10 sec: 49152.1, 60 sec: 46967.4, 300 sec: 47485.8). Total num frames: 241336320. Throughput: 0: 47674.1. Samples: 94343960. Policy #0 lag: (min: 0.0, avg: 8.3, max: 21.0)
[2024-06-06 13:07:21,562][14064] Avg episode reward: [(0, '0.133')]
[2024-06-06 13:07:22,954][14296] Updated weights for policy 0, policy_version 14737 (0.0022)
[2024-06-06 13:07:26,561][14064] Fps is (10 sec: 42598.6, 60 sec: 46967.6, 300 sec: 47541.8). Total num frames: 241565696. Throughput: 0: 47642.4. Samples: 94629540. Policy #0 lag: (min: 0.0, avg: 8.3, max: 21.0)
[2024-06-06 13:07:26,562][14064] Avg episode reward: [(0, '0.131')]
[2024-06-06 13:07:27,677][14296] Updated weights for policy 0, policy_version 14747 (0.0035)
[2024-06-06 13:07:30,078][14296] Updated weights for policy 0, policy_version 14757 (0.0035)
[2024-06-06 13:07:31,562][14064] Fps is (10 sec: 47513.3, 60 sec: 48059.7, 300 sec: 47596.9). Total num frames: 241811456. Throughput: 0: 47384.2. Samples: 94912280.
Policy #0 lag: (min: 0.0, avg: 12.6, max: 21.0) [2024-06-06 13:07:31,562][14064] Avg episode reward: [(0, '0.135')] [2024-06-06 13:07:34,474][14296] Updated weights for policy 0, policy_version 14767 (0.0037) [2024-06-06 13:07:36,561][14064] Fps is (10 sec: 50789.9, 60 sec: 48059.6, 300 sec: 47652.4). Total num frames: 242073600. Throughput: 0: 47496.4. Samples: 95057820. Policy #0 lag: (min: 0.0, avg: 12.6, max: 21.0) [2024-06-06 13:07:36,562][14064] Avg episode reward: [(0, '0.138')] [2024-06-06 13:07:36,861][14296] Updated weights for policy 0, policy_version 14777 (0.0026) [2024-06-06 13:07:41,491][14296] Updated weights for policy 0, policy_version 14787 (0.0024) [2024-06-06 13:07:41,561][14064] Fps is (10 sec: 45876.0, 60 sec: 46696.5, 300 sec: 47374.8). Total num frames: 242270208. Throughput: 0: 47619.0. Samples: 95341940. Policy #0 lag: (min: 0.0, avg: 12.6, max: 21.0) [2024-06-06 13:07:41,562][14064] Avg episode reward: [(0, '0.132')] [2024-06-06 13:07:43,729][14296] Updated weights for policy 0, policy_version 14797 (0.0020) [2024-06-06 13:07:46,561][14064] Fps is (10 sec: 44237.0, 60 sec: 47513.5, 300 sec: 47596.9). Total num frames: 242515968. Throughput: 0: 47643.2. Samples: 95627400. Policy #0 lag: (min: 1.0, avg: 9.4, max: 22.0) [2024-06-06 13:07:46,562][14064] Avg episode reward: [(0, '0.132')] [2024-06-06 13:07:48,423][14296] Updated weights for policy 0, policy_version 14807 (0.0031) [2024-06-06 13:07:50,546][14296] Updated weights for policy 0, policy_version 14817 (0.0025) [2024-06-06 13:07:51,561][14064] Fps is (10 sec: 52428.7, 60 sec: 48332.8, 300 sec: 47652.4). Total num frames: 242794496. Throughput: 0: 47200.5. Samples: 95763660. Policy #0 lag: (min: 1.0, avg: 9.4, max: 22.0) [2024-06-06 13:07:51,562][14064] Avg episode reward: [(0, '0.139')] [2024-06-06 13:07:55,244][14296] Updated weights for policy 0, policy_version 14827 (0.0034) [2024-06-06 13:07:56,561][14064] Fps is (10 sec: 50790.4, 60 sec: 47513.5, 300 sec: 47541.4). 
Total num frames: 243023872. Throughput: 0: 47535.2. Samples: 96056820. Policy #0 lag: (min: 0.0, avg: 11.6, max: 21.0) [2024-06-06 13:07:56,562][14064] Avg episode reward: [(0, '0.130')] [2024-06-06 13:07:57,474][14296] Updated weights for policy 0, policy_version 14837 (0.0038) [2024-06-06 13:08:01,561][14064] Fps is (10 sec: 42598.4, 60 sec: 46694.5, 300 sec: 47430.3). Total num frames: 243220480. Throughput: 0: 47691.1. Samples: 96338240. Policy #0 lag: (min: 0.0, avg: 11.6, max: 21.0) [2024-06-06 13:08:01,562][14064] Avg episode reward: [(0, '0.129')] [2024-06-06 13:08:02,015][14296] Updated weights for policy 0, policy_version 14847 (0.0029) [2024-06-06 13:08:04,772][14296] Updated weights for policy 0, policy_version 14857 (0.0022) [2024-06-06 13:08:06,561][14064] Fps is (10 sec: 44236.3, 60 sec: 47240.5, 300 sec: 47596.9). Total num frames: 243466240. Throughput: 0: 47433.7. Samples: 96478480. Policy #0 lag: (min: 1.0, avg: 10.9, max: 22.0) [2024-06-06 13:08:06,562][14064] Avg episode reward: [(0, '0.134')] [2024-06-06 13:08:06,566][14276] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000014860_243466240.pth... [2024-06-06 13:08:06,617][14276] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000014163_232046592.pth [2024-06-06 13:08:09,092][14296] Updated weights for policy 0, policy_version 14867 (0.0029) [2024-06-06 13:08:11,511][14296] Updated weights for policy 0, policy_version 14877 (0.0023) [2024-06-06 13:08:11,561][14064] Fps is (10 sec: 52428.7, 60 sec: 48332.9, 300 sec: 47596.9). Total num frames: 243744768. Throughput: 0: 47236.0. Samples: 96755160. Policy #0 lag: (min: 1.0, avg: 10.9, max: 22.0) [2024-06-06 13:08:11,562][14064] Avg episode reward: [(0, '0.131')] [2024-06-06 13:08:16,054][14296] Updated weights for policy 0, policy_version 14887 (0.0032) [2024-06-06 13:08:16,561][14064] Fps is (10 sec: 47514.4, 60 sec: 46694.4, 300 sec: 47430.3). Total num frames: 243941376. Throughput: 0: 47623.8. 
Samples: 97055340. Policy #0 lag: (min: 0.0, avg: 9.4, max: 20.0) [2024-06-06 13:08:16,562][14064] Avg episode reward: [(0, '0.136')] [2024-06-06 13:08:18,538][14296] Updated weights for policy 0, policy_version 14897 (0.0019) [2024-06-06 13:08:21,561][14064] Fps is (10 sec: 44236.2, 60 sec: 47513.6, 300 sec: 47541.3). Total num frames: 244187136. Throughput: 0: 47321.7. Samples: 97187300. Policy #0 lag: (min: 0.0, avg: 9.4, max: 20.0) [2024-06-06 13:08:21,562][14064] Avg episode reward: [(0, '0.137')] [2024-06-06 13:08:23,119][14276] Signal inference workers to stop experience collection... (1450 times) [2024-06-06 13:08:23,120][14276] Signal inference workers to resume experience collection... (1450 times) [2024-06-06 13:08:23,123][14296] Updated weights for policy 0, policy_version 14907 (0.0023) [2024-06-06 13:08:23,134][14296] InferenceWorker_p0-w0: stopping experience collection (1450 times) [2024-06-06 13:08:23,134][14296] InferenceWorker_p0-w0: resuming experience collection (1450 times) [2024-06-06 13:08:25,334][14296] Updated weights for policy 0, policy_version 14917 (0.0026) [2024-06-06 13:08:26,561][14064] Fps is (10 sec: 49151.1, 60 sec: 47786.5, 300 sec: 47596.9). Total num frames: 244432896. Throughput: 0: 47284.7. Samples: 97469760. Policy #0 lag: (min: 0.0, avg: 11.9, max: 22.0) [2024-06-06 13:08:26,562][14064] Avg episode reward: [(0, '0.136')] [2024-06-06 13:08:29,770][14296] Updated weights for policy 0, policy_version 14927 (0.0040) [2024-06-06 13:08:31,561][14064] Fps is (10 sec: 50791.4, 60 sec: 48059.9, 300 sec: 47652.4). Total num frames: 244695040. Throughput: 0: 47298.3. Samples: 97755820. Policy #0 lag: (min: 0.0, avg: 11.9, max: 22.0) [2024-06-06 13:08:31,562][14064] Avg episode reward: [(0, '0.135')] [2024-06-06 13:08:32,208][14296] Updated weights for policy 0, policy_version 14937 (0.0025) [2024-06-06 13:08:36,561][14064] Fps is (10 sec: 44237.1, 60 sec: 46694.4, 300 sec: 47374.7). Total num frames: 244875264. 
Throughput: 0: 47670.6. Samples: 97908840. Policy #0 lag: (min: 0.0, avg: 9.4, max: 22.0) [2024-06-06 13:08:36,562][14064] Avg episode reward: [(0, '0.141')] [2024-06-06 13:08:36,576][14276] Saving new best policy, reward=0.141! [2024-06-06 13:08:36,788][14296] Updated weights for policy 0, policy_version 14947 (0.0028) [2024-06-06 13:08:38,871][14296] Updated weights for policy 0, policy_version 14957 (0.0034) [2024-06-06 13:08:41,561][14064] Fps is (10 sec: 44236.7, 60 sec: 47786.7, 300 sec: 47597.3). Total num frames: 245137408. Throughput: 0: 47420.5. Samples: 98190740. Policy #0 lag: (min: 0.0, avg: 9.4, max: 22.0) [2024-06-06 13:08:41,567][14064] Avg episode reward: [(0, '0.142')] [2024-06-06 13:08:43,498][14296] Updated weights for policy 0, policy_version 14967 (0.0031) [2024-06-06 13:08:45,916][14296] Updated weights for policy 0, policy_version 14977 (0.0039) [2024-06-06 13:08:46,561][14064] Fps is (10 sec: 52428.9, 60 sec: 48059.7, 300 sec: 47652.4). Total num frames: 245399552. Throughput: 0: 47442.1. Samples: 98473140. Policy #0 lag: (min: 0.0, avg: 9.4, max: 22.0) [2024-06-06 13:08:46,562][14064] Avg episode reward: [(0, '0.135')] [2024-06-06 13:08:50,554][14296] Updated weights for policy 0, policy_version 14987 (0.0021) [2024-06-06 13:08:51,561][14064] Fps is (10 sec: 49151.8, 60 sec: 47240.5, 300 sec: 47485.8). Total num frames: 245628928. Throughput: 0: 47605.9. Samples: 98620740. Policy #0 lag: (min: 0.0, avg: 11.5, max: 22.0) [2024-06-06 13:08:51,562][14064] Avg episode reward: [(0, '0.140')] [2024-06-06 13:08:52,826][14296] Updated weights for policy 0, policy_version 14997 (0.0035) [2024-06-06 13:08:56,561][14064] Fps is (10 sec: 44237.4, 60 sec: 46967.5, 300 sec: 47596.9). Total num frames: 245841920. Throughput: 0: 47918.8. Samples: 98911500. 
Policy #0 lag: (min: 0.0, avg: 11.5, max: 22.0) [2024-06-06 13:08:56,562][14064] Avg episode reward: [(0, '0.134')] [2024-06-06 13:08:57,140][14296] Updated weights for policy 0, policy_version 15007 (0.0026) [2024-06-06 13:08:59,513][14296] Updated weights for policy 0, policy_version 15017 (0.0028) [2024-06-06 13:09:01,561][14064] Fps is (10 sec: 47513.9, 60 sec: 48059.8, 300 sec: 47708.0). Total num frames: 246104064. Throughput: 0: 47827.6. Samples: 99207580. Policy #0 lag: (min: 1.0, avg: 10.5, max: 22.0) [2024-06-06 13:09:01,562][14064] Avg episode reward: [(0, '0.141')] [2024-06-06 13:09:04,004][14296] Updated weights for policy 0, policy_version 15027 (0.0028) [2024-06-06 13:09:06,179][14296] Updated weights for policy 0, policy_version 15037 (0.0028) [2024-06-06 13:09:06,561][14064] Fps is (10 sec: 52428.8, 60 sec: 48333.0, 300 sec: 47708.0). Total num frames: 246366208. Throughput: 0: 48084.2. Samples: 99351080. Policy #0 lag: (min: 1.0, avg: 10.5, max: 22.0) [2024-06-06 13:09:06,562][14064] Avg episode reward: [(0, '0.138')] [2024-06-06 13:09:10,919][14296] Updated weights for policy 0, policy_version 15047 (0.0031) [2024-06-06 13:09:11,561][14064] Fps is (10 sec: 47513.2, 60 sec: 47240.5, 300 sec: 47485.8). Total num frames: 246579200. Throughput: 0: 48129.9. Samples: 99635600. Policy #0 lag: (min: 0.0, avg: 9.4, max: 20.0) [2024-06-06 13:09:11,562][14064] Avg episode reward: [(0, '0.135')] [2024-06-06 13:09:12,753][14276] Signal inference workers to stop experience collection... (1500 times) [2024-06-06 13:09:12,753][14276] Signal inference workers to resume experience collection... 
(1500 times) [2024-06-06 13:09:12,798][14296] InferenceWorker_p0-w0: stopping experience collection (1500 times) [2024-06-06 13:09:12,798][14296] InferenceWorker_p0-w0: resuming experience collection (1500 times) [2024-06-06 13:09:13,043][14296] Updated weights for policy 0, policy_version 15057 (0.0032) [2024-06-06 13:09:16,561][14064] Fps is (10 sec: 44236.6, 60 sec: 47786.7, 300 sec: 47652.4). Total num frames: 246808576. Throughput: 0: 47950.6. Samples: 99913600. Policy #0 lag: (min: 0.0, avg: 9.4, max: 20.0) [2024-06-06 13:09:16,562][14064] Avg episode reward: [(0, '0.137')] [2024-06-06 13:09:17,920][14296] Updated weights for policy 0, policy_version 15067 (0.0032) [2024-06-06 13:09:20,056][14296] Updated weights for policy 0, policy_version 15077 (0.0030) [2024-06-06 13:09:21,561][14064] Fps is (10 sec: 49152.0, 60 sec: 48059.8, 300 sec: 47763.5). Total num frames: 247070720. Throughput: 0: 47621.0. Samples: 100051780. Policy #0 lag: (min: 0.0, avg: 11.7, max: 20.0) [2024-06-06 13:09:21,562][14064] Avg episode reward: [(0, '0.133')] [2024-06-06 13:09:24,549][14296] Updated weights for policy 0, policy_version 15087 (0.0033) [2024-06-06 13:09:26,561][14064] Fps is (10 sec: 50790.2, 60 sec: 48059.8, 300 sec: 47652.4). Total num frames: 247316480. Throughput: 0: 47848.8. Samples: 100343940. Policy #0 lag: (min: 0.0, avg: 11.7, max: 20.0) [2024-06-06 13:09:26,562][14064] Avg episode reward: [(0, '0.142')] [2024-06-06 13:09:26,984][14296] Updated weights for policy 0, policy_version 15097 (0.0032) [2024-06-06 13:09:31,535][14296] Updated weights for policy 0, policy_version 15107 (0.0021) [2024-06-06 13:09:31,561][14064] Fps is (10 sec: 44237.2, 60 sec: 46967.5, 300 sec: 47485.8). Total num frames: 247513088. Throughput: 0: 48168.6. Samples: 100640720. 
Policy #0 lag: (min: 0.0, avg: 9.8, max: 22.0) [2024-06-06 13:09:31,562][14064] Avg episode reward: [(0, '0.141')] [2024-06-06 13:09:33,877][14296] Updated weights for policy 0, policy_version 15117 (0.0023) [2024-06-06 13:09:36,561][14064] Fps is (10 sec: 44237.3, 60 sec: 48059.9, 300 sec: 47652.4). Total num frames: 247758848. Throughput: 0: 47660.1. Samples: 100765440. Policy #0 lag: (min: 0.0, avg: 9.8, max: 22.0) [2024-06-06 13:09:36,562][14064] Avg episode reward: [(0, '0.134')] [2024-06-06 13:09:38,380][14296] Updated weights for policy 0, policy_version 15127 (0.0026) [2024-06-06 13:09:40,986][14296] Updated weights for policy 0, policy_version 15137 (0.0025) [2024-06-06 13:09:41,561][14064] Fps is (10 sec: 50789.8, 60 sec: 48059.7, 300 sec: 47652.4). Total num frames: 248020992. Throughput: 0: 47523.0. Samples: 101050040. Policy #0 lag: (min: 0.0, avg: 9.8, max: 22.0) [2024-06-06 13:09:41,562][14064] Avg episode reward: [(0, '0.143')] [2024-06-06 13:09:41,562][14276] Saving new best policy, reward=0.143! [2024-06-06 13:09:45,349][14296] Updated weights for policy 0, policy_version 15147 (0.0033) [2024-06-06 13:09:46,564][14064] Fps is (10 sec: 50776.8, 60 sec: 47784.7, 300 sec: 47652.0). Total num frames: 248266752. Throughput: 0: 47332.3. Samples: 101337660. Policy #0 lag: (min: 0.0, avg: 10.4, max: 21.0) [2024-06-06 13:09:46,565][14064] Avg episode reward: [(0, '0.141')] [2024-06-06 13:09:47,734][14296] Updated weights for policy 0, policy_version 15157 (0.0026) [2024-06-06 13:09:51,561][14064] Fps is (10 sec: 44237.0, 60 sec: 47240.5, 300 sec: 47485.8). Total num frames: 248463360. Throughput: 0: 47368.4. Samples: 101482660. 
Policy #0 lag: (min: 0.0, avg: 10.4, max: 21.0) [2024-06-06 13:09:51,562][14064] Avg episode reward: [(0, '0.138')] [2024-06-06 13:09:52,018][14296] Updated weights for policy 0, policy_version 15167 (0.0028) [2024-06-06 13:09:54,419][14296] Updated weights for policy 0, policy_version 15177 (0.0024) [2024-06-06 13:09:56,561][14064] Fps is (10 sec: 47525.6, 60 sec: 48332.7, 300 sec: 47763.5). Total num frames: 248741888. Throughput: 0: 47480.8. Samples: 101772240. Policy #0 lag: (min: 0.0, avg: 11.4, max: 22.0) [2024-06-06 13:09:56,562][14064] Avg episode reward: [(0, '0.138')] [2024-06-06 13:09:58,866][14296] Updated weights for policy 0, policy_version 15187 (0.0029) [2024-06-06 13:10:01,342][14296] Updated weights for policy 0, policy_version 15197 (0.0027) [2024-06-06 13:10:01,561][14064] Fps is (10 sec: 52429.2, 60 sec: 48059.7, 300 sec: 47596.9). Total num frames: 248987648. Throughput: 0: 47551.6. Samples: 102053420. Policy #0 lag: (min: 0.0, avg: 11.4, max: 22.0) [2024-06-06 13:10:01,562][14064] Avg episode reward: [(0, '0.139')] [2024-06-06 13:10:05,661][14296] Updated weights for policy 0, policy_version 15207 (0.0027) [2024-06-06 13:10:06,561][14064] Fps is (10 sec: 45875.6, 60 sec: 47240.5, 300 sec: 47485.8). Total num frames: 249200640. Throughput: 0: 47760.5. Samples: 102201000. Policy #0 lag: (min: 0.0, avg: 10.1, max: 22.0) [2024-06-06 13:10:06,562][14064] Avg episode reward: [(0, '0.136')] [2024-06-06 13:10:06,571][14276] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000015210_249200640.pth... [2024-06-06 13:10:06,622][14276] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000014515_237813760.pth [2024-06-06 13:10:08,240][14296] Updated weights for policy 0, policy_version 15217 (0.0039) [2024-06-06 13:10:11,562][14064] Fps is (10 sec: 45874.2, 60 sec: 47786.6, 300 sec: 47707.9). Total num frames: 249446400. Throughput: 0: 47708.3. Samples: 102490820. 
Policy #0 lag: (min: 0.0, avg: 10.1, max: 22.0) [2024-06-06 13:10:11,562][14064] Avg episode reward: [(0, '0.143')] [2024-06-06 13:10:12,641][14296] Updated weights for policy 0, policy_version 15227 (0.0037) [2024-06-06 13:10:15,110][14296] Updated weights for policy 0, policy_version 15237 (0.0024) [2024-06-06 13:10:16,561][14064] Fps is (10 sec: 45875.3, 60 sec: 47513.6, 300 sec: 47597.2). Total num frames: 249659392. Throughput: 0: 47528.0. Samples: 102779480. Policy #0 lag: (min: 0.0, avg: 10.1, max: 22.0) [2024-06-06 13:10:16,562][14064] Avg episode reward: [(0, '0.136')] [2024-06-06 13:10:16,832][14276] Signal inference workers to stop experience collection... (1550 times) [2024-06-06 13:10:16,837][14276] Signal inference workers to resume experience collection... (1550 times) [2024-06-06 13:10:16,855][14296] InferenceWorker_p0-w0: stopping experience collection (1550 times) [2024-06-06 13:10:16,855][14296] InferenceWorker_p0-w0: resuming experience collection (1550 times) [2024-06-06 13:10:19,239][14296] Updated weights for policy 0, policy_version 15247 (0.0032) [2024-06-06 13:10:21,561][14064] Fps is (10 sec: 50791.3, 60 sec: 48059.8, 300 sec: 47596.9). Total num frames: 249954304. Throughput: 0: 47983.5. Samples: 102924700. Policy #0 lag: (min: 0.0, avg: 10.7, max: 21.0) [2024-06-06 13:10:21,562][14064] Avg episode reward: [(0, '0.139')] [2024-06-06 13:10:21,773][14296] Updated weights for policy 0, policy_version 15257 (0.0038) [2024-06-06 13:10:26,178][14296] Updated weights for policy 0, policy_version 15267 (0.0030) [2024-06-06 13:10:26,561][14064] Fps is (10 sec: 47513.6, 60 sec: 46967.5, 300 sec: 47430.3). Total num frames: 250134528. Throughput: 0: 47833.4. Samples: 103202540. 
Policy #0 lag: (min: 0.0, avg: 10.7, max: 21.0) [2024-06-06 13:10:26,562][14064] Avg episode reward: [(0, '0.138')] [2024-06-06 13:10:29,235][14296] Updated weights for policy 0, policy_version 15277 (0.0046) [2024-06-06 13:10:31,561][14064] Fps is (10 sec: 44236.6, 60 sec: 48059.7, 300 sec: 47708.0). Total num frames: 250396672. Throughput: 0: 47615.2. Samples: 103480220. Policy #0 lag: (min: 0.0, avg: 10.6, max: 22.0) [2024-06-06 13:10:31,562][14064] Avg episode reward: [(0, '0.140')] [2024-06-06 13:10:33,217][14296] Updated weights for policy 0, policy_version 15287 (0.0028) [2024-06-06 13:10:36,042][14296] Updated weights for policy 0, policy_version 15297 (0.0042) [2024-06-06 13:10:36,561][14064] Fps is (10 sec: 49152.1, 60 sec: 47786.7, 300 sec: 47596.9). Total num frames: 250626048. Throughput: 0: 47533.0. Samples: 103621640. Policy #0 lag: (min: 0.0, avg: 10.6, max: 22.0) [2024-06-06 13:10:36,562][14064] Avg episode reward: [(0, '0.141')] [2024-06-06 13:10:40,486][14296] Updated weights for policy 0, policy_version 15307 (0.0020) [2024-06-06 13:10:41,561][14064] Fps is (10 sec: 47513.7, 60 sec: 47513.6, 300 sec: 47596.9). Total num frames: 250871808. Throughput: 0: 47541.0. Samples: 103911580. Policy #0 lag: (min: 0.0, avg: 10.9, max: 22.0) [2024-06-06 13:10:41,562][14064] Avg episode reward: [(0, '0.144')] [2024-06-06 13:10:43,217][14296] Updated weights for policy 0, policy_version 15317 (0.0047) [2024-06-06 13:10:46,561][14064] Fps is (10 sec: 45875.2, 60 sec: 46969.6, 300 sec: 47485.8). Total num frames: 251084800. Throughput: 0: 47545.8. Samples: 104192980. Policy #0 lag: (min: 0.0, avg: 10.9, max: 22.0) [2024-06-06 13:10:46,562][14064] Avg episode reward: [(0, '0.135')] [2024-06-06 13:10:47,478][14296] Updated weights for policy 0, policy_version 15327 (0.0038) [2024-06-06 13:10:50,446][14296] Updated weights for policy 0, policy_version 15337 (0.0030) [2024-06-06 13:10:51,561][14064] Fps is (10 sec: 47513.8, 60 sec: 48059.8, 300 sec: 47708.0). 
Total num frames: 251346944. Throughput: 0: 47286.2. Samples: 104328880. Policy #0 lag: (min: 0.0, avg: 11.3, max: 21.0) [2024-06-06 13:10:51,562][14064] Avg episode reward: [(0, '0.143')] [2024-06-06 13:10:54,455][14296] Updated weights for policy 0, policy_version 15347 (0.0027) [2024-06-06 13:10:56,561][14064] Fps is (10 sec: 50790.1, 60 sec: 47513.7, 300 sec: 47596.9). Total num frames: 251592704. Throughput: 0: 47138.4. Samples: 104612040. Policy #0 lag: (min: 0.0, avg: 11.3, max: 21.0) [2024-06-06 13:10:56,562][14064] Avg episode reward: [(0, '0.138')] [2024-06-06 13:10:57,183][14296] Updated weights for policy 0, policy_version 15357 (0.0028) [2024-06-06 13:11:01,370][14296] Updated weights for policy 0, policy_version 15367 (0.0028) [2024-06-06 13:11:01,564][14064] Fps is (10 sec: 42587.2, 60 sec: 46419.3, 300 sec: 47318.8). Total num frames: 251772928. Throughput: 0: 47271.5. Samples: 104906820. Policy #0 lag: (min: 0.0, avg: 11.3, max: 21.0) [2024-06-06 13:11:01,565][14064] Avg episode reward: [(0, '0.141')] [2024-06-06 13:11:03,827][14296] Updated weights for policy 0, policy_version 15377 (0.0027) [2024-06-06 13:11:06,564][14064] Fps is (10 sec: 47501.2, 60 sec: 47784.6, 300 sec: 47652.0). Total num frames: 252067840. Throughput: 0: 47059.9. Samples: 105042520. Policy #0 lag: (min: 0.0, avg: 8.6, max: 21.0) [2024-06-06 13:11:06,565][14064] Avg episode reward: [(0, '0.142')] [2024-06-06 13:11:08,187][14296] Updated weights for policy 0, policy_version 15387 (0.0021) [2024-06-06 13:11:10,894][14296] Updated weights for policy 0, policy_version 15397 (0.0033) [2024-06-06 13:11:11,561][14064] Fps is (10 sec: 50803.4, 60 sec: 47240.7, 300 sec: 47652.4). Total num frames: 252280832. Throughput: 0: 47218.6. Samples: 105327380. 
Policy #0 lag: (min: 0.0, avg: 8.6, max: 21.0) [2024-06-06 13:11:11,562][14064] Avg episode reward: [(0, '0.138')] [2024-06-06 13:11:14,990][14296] Updated weights for policy 0, policy_version 15407 (0.0028) [2024-06-06 13:11:16,564][14064] Fps is (10 sec: 47513.2, 60 sec: 48057.5, 300 sec: 47540.9). Total num frames: 252542976. Throughput: 0: 47420.3. Samples: 105614260. Policy #0 lag: (min: 0.0, avg: 11.3, max: 21.0) [2024-06-06 13:11:16,565][14064] Avg episode reward: [(0, '0.141')] [2024-06-06 13:11:17,655][14296] Updated weights for policy 0, policy_version 15417 (0.0022) [2024-06-06 13:11:21,561][14064] Fps is (10 sec: 45875.1, 60 sec: 46421.3, 300 sec: 47430.3). Total num frames: 252739584. Throughput: 0: 47484.8. Samples: 105758460. Policy #0 lag: (min: 0.0, avg: 11.3, max: 21.0) [2024-06-06 13:11:21,562][14064] Avg episode reward: [(0, '0.138')] [2024-06-06 13:11:21,666][14296] Updated weights for policy 0, policy_version 15427 (0.0026) [2024-06-06 13:11:24,226][14296] Updated weights for policy 0, policy_version 15437 (0.0026) [2024-06-06 13:11:25,472][14276] Signal inference workers to stop experience collection... (1600 times) [2024-06-06 13:11:25,478][14276] Signal inference workers to resume experience collection... (1600 times) [2024-06-06 13:11:25,484][14296] InferenceWorker_p0-w0: stopping experience collection (1600 times) [2024-06-06 13:11:25,494][14296] InferenceWorker_p0-w0: resuming experience collection (1600 times) [2024-06-06 13:11:26,561][14064] Fps is (10 sec: 47526.5, 60 sec: 48059.7, 300 sec: 47763.6). Total num frames: 253018112. Throughput: 0: 47605.4. Samples: 106053820. 
Policy #0 lag: (min: 0.0, avg: 11.4, max: 22.0) [2024-06-06 13:11:26,562][14064] Avg episode reward: [(0, '0.139')] [2024-06-06 13:11:28,795][14296] Updated weights for policy 0, policy_version 15447 (0.0026) [2024-06-06 13:11:31,239][14296] Updated weights for policy 0, policy_version 15457 (0.0028) [2024-06-06 13:11:31,561][14064] Fps is (10 sec: 52429.2, 60 sec: 47786.7, 300 sec: 47708.0). Total num frames: 253263872. Throughput: 0: 47533.3. Samples: 106331980. Policy #0 lag: (min: 0.0, avg: 11.4, max: 22.0) [2024-06-06 13:11:31,562][14064] Avg episode reward: [(0, '0.143')] [2024-06-06 13:11:35,627][14296] Updated weights for policy 0, policy_version 15467 (0.0031) [2024-06-06 13:11:36,561][14064] Fps is (10 sec: 45875.3, 60 sec: 47513.6, 300 sec: 47486.3). Total num frames: 253476864. Throughput: 0: 47866.7. Samples: 106482880. Policy #0 lag: (min: 0.0, avg: 7.1, max: 23.0) [2024-06-06 13:11:36,562][14064] Avg episode reward: [(0, '0.141')] [2024-06-06 13:11:38,264][14296] Updated weights for policy 0, policy_version 15477 (0.0025) [2024-06-06 13:11:41,564][14064] Fps is (10 sec: 44225.0, 60 sec: 47238.5, 300 sec: 47596.5). Total num frames: 253706240. Throughput: 0: 47683.9. Samples: 106757940. Policy #0 lag: (min: 0.0, avg: 7.1, max: 23.0) [2024-06-06 13:11:41,565][14064] Avg episode reward: [(0, '0.140')] [2024-06-06 13:11:42,626][14296] Updated weights for policy 0, policy_version 15487 (0.0028) [2024-06-06 13:11:45,136][14296] Updated weights for policy 0, policy_version 15497 (0.0036) [2024-06-06 13:11:46,561][14064] Fps is (10 sec: 49151.9, 60 sec: 48059.7, 300 sec: 47708.0). Total num frames: 253968384. Throughput: 0: 47441.0. Samples: 107041540. Policy #0 lag: (min: 0.0, avg: 7.1, max: 23.0) [2024-06-06 13:11:46,562][14064] Avg episode reward: [(0, '0.141')] [2024-06-06 13:11:49,385][14296] Updated weights for policy 0, policy_version 15507 (0.0023) [2024-06-06 13:11:51,561][14064] Fps is (10 sec: 49165.2, 60 sec: 47513.6, 300 sec: 47541.4). 
Total num frames: 254197760. Throughput: 0: 47869.1. Samples: 107196500. Policy #0 lag: (min: 0.0, avg: 11.3, max: 21.0) [2024-06-06 13:11:51,562][14064] Avg episode reward: [(0, '0.146')] [2024-06-06 13:11:51,570][14276] Saving new best policy, reward=0.146! [2024-06-06 13:11:51,923][14296] Updated weights for policy 0, policy_version 15517 (0.0037) [2024-06-06 13:11:56,368][14296] Updated weights for policy 0, policy_version 15527 (0.0032) [2024-06-06 13:11:56,561][14064] Fps is (10 sec: 42598.7, 60 sec: 46694.5, 300 sec: 47374.8). Total num frames: 254394368. Throughput: 0: 47807.3. Samples: 107478700. Policy #0 lag: (min: 0.0, avg: 11.3, max: 21.0) [2024-06-06 13:11:56,561][14064] Avg episode reward: [(0, '0.145')] [2024-06-06 13:11:58,679][14296] Updated weights for policy 0, policy_version 15537 (0.0027) [2024-06-06 13:12:01,561][14064] Fps is (10 sec: 49152.1, 60 sec: 48608.0, 300 sec: 47652.5). Total num frames: 254689280. Throughput: 0: 47571.8. Samples: 107754860. Policy #0 lag: (min: 0.0, avg: 10.3, max: 22.0) [2024-06-06 13:12:01,562][14064] Avg episode reward: [(0, '0.145')] [2024-06-06 13:12:03,320][14296] Updated weights for policy 0, policy_version 15547 (0.0034) [2024-06-06 13:12:05,714][14296] Updated weights for policy 0, policy_version 15557 (0.0033) [2024-06-06 13:12:06,561][14064] Fps is (10 sec: 52427.9, 60 sec: 47515.6, 300 sec: 47708.0). Total num frames: 254918656. Throughput: 0: 47557.8. Samples: 107898560. Policy #0 lag: (min: 0.0, avg: 10.3, max: 22.0) [2024-06-06 13:12:06,562][14064] Avg episode reward: [(0, '0.143')] [2024-06-06 13:12:06,661][14276] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000015560_254935040.pth... 
[2024-06-06 13:12:06,716][14276] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000014860_243466240.pth
[2024-06-06 13:12:10,163][14296] Updated weights for policy 0, policy_version 15567 (0.0026)
[2024-06-06 13:12:11,561][14064] Fps is (10 sec: 44236.8, 60 sec: 47513.7, 300 sec: 47430.3). Total num frames: 255131648. Throughput: 0: 47520.9. Samples: 108192260. Policy #0 lag: (min: 0.0, avg: 8.6, max: 21.0)
[2024-06-06 13:12:11,561][14064] Avg episode reward: [(0, '0.141')]
[2024-06-06 13:12:12,370][14296] Updated weights for policy 0, policy_version 15577 (0.0030)
[2024-06-06 13:12:16,561][14064] Fps is (10 sec: 44237.4, 60 sec: 46969.6, 300 sec: 47541.4). Total num frames: 255361024. Throughput: 0: 47816.1. Samples: 108483700. Policy #0 lag: (min: 0.0, avg: 8.6, max: 21.0)
[2024-06-06 13:12:16,562][14064] Avg episode reward: [(0, '0.144')]
[2024-06-06 13:12:16,959][14296] Updated weights for policy 0, policy_version 15587 (0.0040)
[2024-06-06 13:12:19,277][14296] Updated weights for policy 0, policy_version 15597 (0.0025)
[2024-06-06 13:12:21,561][14064] Fps is (10 sec: 49152.2, 60 sec: 48059.9, 300 sec: 47652.5). Total num frames: 255623168. Throughput: 0: 47317.8. Samples: 108612180. Policy #0 lag: (min: 0.0, avg: 8.6, max: 21.0)
[2024-06-06 13:12:21,562][14064] Avg episode reward: [(0, '0.143')]
[2024-06-06 13:12:24,048][14296] Updated weights for policy 0, policy_version 15607 (0.0032)
[2024-06-06 13:12:26,289][14296] Updated weights for policy 0, policy_version 15617 (0.0034)
[2024-06-06 13:12:26,564][14064] Fps is (10 sec: 50776.7, 60 sec: 47511.5, 300 sec: 47652.1). Total num frames: 255868928. Throughput: 0: 47662.7. Samples: 108902760. Policy #0 lag: (min: 0.0, avg: 10.9, max: 20.0)
[2024-06-06 13:12:26,565][14064] Avg episode reward: [(0, '0.146')]
[2024-06-06 13:12:30,705][14296] Updated weights for policy 0, policy_version 15627 (0.0025)
[2024-06-06 13:12:31,564][14064] Fps is (10 sec: 45862.9, 60 sec: 46965.4, 300 sec: 47485.4). Total num frames: 256081920. Throughput: 0: 47695.0. Samples: 109187940. Policy #0 lag: (min: 0.0, avg: 10.9, max: 20.0)
[2024-06-06 13:12:31,565][14064] Avg episode reward: [(0, '0.140')]
[2024-06-06 13:12:33,059][14296] Updated weights for policy 0, policy_version 15637 (0.0028)
[2024-06-06 13:12:36,566][14064] Fps is (10 sec: 45866.5, 60 sec: 47510.0, 300 sec: 47651.7). Total num frames: 256327680. Throughput: 0: 47469.9. Samples: 109332860. Policy #0 lag: (min: 1.0, avg: 11.3, max: 23.0)
[2024-06-06 13:12:36,566][14064] Avg episode reward: [(0, '0.141')]
[2024-06-06 13:12:37,487][14296] Updated weights for policy 0, policy_version 15647 (0.0025)
[2024-06-06 13:12:39,862][14296] Updated weights for policy 0, policy_version 15657 (0.0032)
[2024-06-06 13:12:41,561][14064] Fps is (10 sec: 50803.2, 60 sec: 48061.8, 300 sec: 47708.0). Total num frames: 256589824. Throughput: 0: 47537.2. Samples: 109617880. Policy #0 lag: (min: 1.0, avg: 11.3, max: 23.0)
[2024-06-06 13:12:41,562][14064] Avg episode reward: [(0, '0.142')]
[2024-06-06 13:12:44,319][14296] Updated weights for policy 0, policy_version 15667 (0.0031)
[2024-06-06 13:12:44,655][14276] Signal inference workers to stop experience collection... (1650 times)
[2024-06-06 13:12:44,655][14276] Signal inference workers to resume experience collection... (1650 times)
[2024-06-06 13:12:44,696][14296] InferenceWorker_p0-w0: stopping experience collection (1650 times)
[2024-06-06 13:12:44,696][14296] InferenceWorker_p0-w0: resuming experience collection (1650 times)
[2024-06-06 13:12:46,561][14064] Fps is (10 sec: 50813.6, 60 sec: 47786.7, 300 sec: 47596.9). Total num frames: 256835584. Throughput: 0: 47736.9. Samples: 109903020. Policy #0 lag: (min: 0.0, avg: 8.1, max: 21.0)
[2024-06-06 13:12:46,562][14064] Avg episode reward: [(0, '0.141')]
[2024-06-06 13:12:46,724][14296] Updated weights for policy 0, policy_version 15677 (0.0034)
[2024-06-06 13:12:51,253][14296] Updated weights for policy 0, policy_version 15687 (0.0022)
[2024-06-06 13:12:51,561][14064] Fps is (10 sec: 44237.3, 60 sec: 47240.6, 300 sec: 47485.8). Total num frames: 257032192. Throughput: 0: 47768.1. Samples: 110048120. Policy #0 lag: (min: 0.0, avg: 8.1, max: 21.0)
[2024-06-06 13:12:51,561][14064] Avg episode reward: [(0, '0.138')]
[2024-06-06 13:12:53,387][14296] Updated weights for policy 0, policy_version 15697 (0.0026)
[2024-06-06 13:12:56,561][14064] Fps is (10 sec: 47513.3, 60 sec: 48605.8, 300 sec: 47763.5). Total num frames: 257310720. Throughput: 0: 47741.7. Samples: 110340640. Policy #0 lag: (min: 0.0, avg: 10.9, max: 21.0)
[2024-06-06 13:12:56,568][14064] Avg episode reward: [(0, '0.139')]
[2024-06-06 13:12:57,924][14296] Updated weights for policy 0, policy_version 15707 (0.0027)
[2024-06-06 13:13:00,540][14296] Updated weights for policy 0, policy_version 15717 (0.0025)
[2024-06-06 13:13:01,561][14064] Fps is (10 sec: 50790.4, 60 sec: 47513.6, 300 sec: 47708.0). Total num frames: 257540096. Throughput: 0: 47640.9. Samples: 110627540. Policy #0 lag: (min: 0.0, avg: 10.9, max: 21.0)
[2024-06-06 13:13:01,561][14064] Avg episode reward: [(0, '0.143')]
[2024-06-06 13:13:04,881][14296] Updated weights for policy 0, policy_version 15727 (0.0026)
[2024-06-06 13:13:06,561][14064] Fps is (10 sec: 45875.5, 60 sec: 47513.7, 300 sec: 47541.4). Total num frames: 257769472. Throughput: 0: 48068.4. Samples: 110775260. Policy #0 lag: (min: 0.0, avg: 10.9, max: 21.0)
[2024-06-06 13:13:06,562][14064] Avg episode reward: [(0, '0.141')]
[2024-06-06 13:13:07,383][14296] Updated weights for policy 0, policy_version 15737 (0.0031)
[2024-06-06 13:13:11,561][14064] Fps is (10 sec: 44236.0, 60 sec: 47513.5, 300 sec: 47596.9). Total num frames: 257982464. Throughput: 0: 47777.8. Samples: 111052640. Policy #0 lag: (min: 1.0, avg: 10.9, max: 23.0)
[2024-06-06 13:13:11,562][14064] Avg episode reward: [(0, '0.148')]
[2024-06-06 13:13:11,563][14276] Saving new best policy, reward=0.148!
[2024-06-06 13:13:11,779][14296] Updated weights for policy 0, policy_version 15747 (0.0031)
[2024-06-06 13:13:14,425][14296] Updated weights for policy 0, policy_version 15757 (0.0035)
[2024-06-06 13:13:16,561][14064] Fps is (10 sec: 47513.7, 60 sec: 48059.7, 300 sec: 47652.5). Total num frames: 258244608. Throughput: 0: 47689.0. Samples: 111333820. Policy #0 lag: (min: 1.0, avg: 10.9, max: 23.0)
[2024-06-06 13:13:16,561][14064] Avg episode reward: [(0, '0.148')]
[2024-06-06 13:13:18,641][14296] Updated weights for policy 0, policy_version 15767 (0.0038)
[2024-06-06 13:13:21,301][14296] Updated weights for policy 0, policy_version 15777 (0.0029)
[2024-06-06 13:13:21,561][14064] Fps is (10 sec: 50791.2, 60 sec: 47786.6, 300 sec: 47652.5). Total num frames: 258490368. Throughput: 0: 47736.0. Samples: 111480760. Policy #0 lag: (min: 0.0, avg: 8.4, max: 21.0)
[2024-06-06 13:13:21,562][14064] Avg episode reward: [(0, '0.146')]
[2024-06-06 13:13:25,596][14296] Updated weights for policy 0, policy_version 15787 (0.0023)
[2024-06-06 13:13:26,561][14064] Fps is (10 sec: 47513.4, 60 sec: 47515.7, 300 sec: 47541.4). Total num frames: 258719744. Throughput: 0: 47771.6. Samples: 111767600. Policy #0 lag: (min: 0.0, avg: 8.4, max: 21.0)
[2024-06-06 13:13:26,562][14064] Avg episode reward: [(0, '0.144')]
[2024-06-06 13:13:28,209][14296] Updated weights for policy 0, policy_version 15797 (0.0028)
[2024-06-06 13:13:31,564][14064] Fps is (10 sec: 45863.0, 60 sec: 47786.6, 300 sec: 47707.6). Total num frames: 258949120. Throughput: 0: 47755.0. Samples: 112052120. Policy #0 lag: (min: 0.0, avg: 9.7, max: 20.0)
[2024-06-06 13:13:31,565][14064] Avg episode reward: [(0, '0.144')]
[2024-06-06 13:13:32,364][14296] Updated weights for policy 0, policy_version 15807 (0.0033)
[2024-06-06 13:13:35,166][14296] Updated weights for policy 0, policy_version 15817 (0.0033)
[2024-06-06 13:13:36,561][14064] Fps is (10 sec: 47513.8, 60 sec: 47790.3, 300 sec: 47652.5). Total num frames: 259194880. Throughput: 0: 47604.9. Samples: 112190340. Policy #0 lag: (min: 0.0, avg: 9.7, max: 20.0)
[2024-06-06 13:13:36,562][14064] Avg episode reward: [(0, '0.148')]
[2024-06-06 13:13:39,203][14296] Updated weights for policy 0, policy_version 15827 (0.0025)
[2024-06-06 13:13:41,561][14064] Fps is (10 sec: 49164.7, 60 sec: 47513.6, 300 sec: 47596.9). Total num frames: 259440640. Throughput: 0: 47314.2. Samples: 112469780. Policy #0 lag: (min: 0.0, avg: 10.2, max: 22.0)
[2024-06-06 13:13:41,562][14064] Avg episode reward: [(0, '0.146')]
[2024-06-06 13:13:42,164][14296] Updated weights for policy 0, policy_version 15837 (0.0038)
[2024-06-06 13:13:46,182][14296] Updated weights for policy 0, policy_version 15847 (0.0029)
[2024-06-06 13:13:46,564][14064] Fps is (10 sec: 45863.3, 60 sec: 46965.4, 300 sec: 47541.0). Total num frames: 259653632. Throughput: 0: 47298.1. Samples: 112756080. Policy #0 lag: (min: 0.0, avg: 10.2, max: 22.0)
[2024-06-06 13:13:46,564][14064] Avg episode reward: [(0, '0.143')]
[2024-06-06 13:13:49,077][14296] Updated weights for policy 0, policy_version 15857 (0.0034)
[2024-06-06 13:13:51,561][14064] Fps is (10 sec: 45874.4, 60 sec: 47786.5, 300 sec: 47652.4). Total num frames: 259899392. Throughput: 0: 47165.5. Samples: 112897720. Policy #0 lag: (min: 0.0, avg: 10.2, max: 22.0)
[2024-06-06 13:13:51,562][14064] Avg episode reward: [(0, '0.144')]
[2024-06-06 13:13:53,093][14296] Updated weights for policy 0, policy_version 15867 (0.0031)
[2024-06-06 13:13:55,873][14296] Updated weights for policy 0, policy_version 15877 (0.0022)
[2024-06-06 13:13:56,561][14064] Fps is (10 sec: 49164.8, 60 sec: 47240.6, 300 sec: 47596.9). Total num frames: 260145152. Throughput: 0: 47366.4. Samples: 113184120. Policy #0 lag: (min: 0.0, avg: 9.3, max: 21.0)
[2024-06-06 13:13:56,562][14064] Avg episode reward: [(0, '0.149')]
[2024-06-06 13:14:00,068][14296] Updated weights for policy 0, policy_version 15887 (0.0025)
[2024-06-06 13:14:01,561][14064] Fps is (10 sec: 47514.5, 60 sec: 47240.5, 300 sec: 47485.8). Total num frames: 260374528. Throughput: 0: 47329.3. Samples: 113463640. Policy #0 lag: (min: 0.0, avg: 9.3, max: 21.0)
[2024-06-06 13:14:01,562][14064] Avg episode reward: [(0, '0.147')]
[2024-06-06 13:14:02,861][14296] Updated weights for policy 0, policy_version 15897 (0.0020)
[2024-06-06 13:14:06,561][14064] Fps is (10 sec: 44236.7, 60 sec: 46967.5, 300 sec: 47485.8). Total num frames: 260587520. Throughput: 0: 47113.8. Samples: 113600880. Policy #0 lag: (min: 0.0, avg: 10.8, max: 21.0)
[2024-06-06 13:14:06,562][14064] Avg episode reward: [(0, '0.145')]
[2024-06-06 13:14:06,666][14276] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000015906_260603904.pth...
[2024-06-06 13:14:06,722][14276] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000015210_249200640.pth
[2024-06-06 13:14:06,866][14296] Updated weights for policy 0, policy_version 15907 (0.0035)
[2024-06-06 13:14:09,964][14296] Updated weights for policy 0, policy_version 15917 (0.0033)
[2024-06-06 13:14:09,981][14276] Signal inference workers to stop experience collection... (1700 times)
[2024-06-06 13:14:09,981][14276] Signal inference workers to resume experience collection... (1700 times)
[2024-06-06 13:14:10,004][14296] InferenceWorker_p0-w0: stopping experience collection (1700 times)
[2024-06-06 13:14:10,004][14296] InferenceWorker_p0-w0: resuming experience collection (1700 times)
[2024-06-06 13:14:11,561][14064] Fps is (10 sec: 45875.3, 60 sec: 47513.7, 300 sec: 47541.4). Total num frames: 260833280. Throughput: 0: 46978.2. Samples: 113881620. Policy #0 lag: (min: 0.0, avg: 10.8, max: 21.0)
[2024-06-06 13:14:11,562][14064] Avg episode reward: [(0, '0.146')]
[2024-06-06 13:14:13,937][14296] Updated weights for policy 0, policy_version 15927 (0.0032)
[2024-06-06 13:14:16,561][14064] Fps is (10 sec: 49152.2, 60 sec: 47240.5, 300 sec: 47485.8). Total num frames: 261079040. Throughput: 0: 47068.2. Samples: 114170060. Policy #0 lag: (min: 0.0, avg: 9.9, max: 20.0)
[2024-06-06 13:14:16,562][14064] Avg episode reward: [(0, '0.150')]
[2024-06-06 13:14:16,614][14276] Saving new best policy, reward=0.150!
[2024-06-06 13:14:16,858][14296] Updated weights for policy 0, policy_version 15937 (0.0029)
[2024-06-06 13:14:20,736][14296] Updated weights for policy 0, policy_version 15947 (0.0029)
[2024-06-06 13:14:21,561][14064] Fps is (10 sec: 47513.5, 60 sec: 46967.4, 300 sec: 47430.3). Total num frames: 261308416. Throughput: 0: 47127.5. Samples: 114311080. Policy #0 lag: (min: 0.0, avg: 9.9, max: 20.0)
[2024-06-06 13:14:21,562][14064] Avg episode reward: [(0, '0.138')]
[2024-06-06 13:14:23,638][14296] Updated weights for policy 0, policy_version 15957 (0.0030)
[2024-06-06 13:14:26,561][14064] Fps is (10 sec: 45875.0, 60 sec: 46967.5, 300 sec: 47541.4). Total num frames: 261537792. Throughput: 0: 47287.6. Samples: 114597720. Policy #0 lag: (min: 0.0, avg: 9.9, max: 20.0)
[2024-06-06 13:14:26,562][14064] Avg episode reward: [(0, '0.146')]
[2024-06-06 13:14:27,734][14296] Updated weights for policy 0, policy_version 15967 (0.0025)
[2024-06-06 13:14:30,566][14296] Updated weights for policy 0, policy_version 15977 (0.0025)
[2024-06-06 13:14:31,561][14064] Fps is (10 sec: 49151.3, 60 sec: 47515.6, 300 sec: 47596.9). Total num frames: 261799936. Throughput: 0: 47239.0. Samples: 114881720. Policy #0 lag: (min: 0.0, avg: 8.9, max: 22.0)
[2024-06-06 13:14:31,562][14064] Avg episode reward: [(0, '0.143')]
[2024-06-06 13:14:34,664][14296] Updated weights for policy 0, policy_version 15987 (0.0037)
[2024-06-06 13:14:36,561][14064] Fps is (10 sec: 49151.6, 60 sec: 47240.5, 300 sec: 47485.8). Total num frames: 262029312. Throughput: 0: 47330.4. Samples: 115027580. Policy #0 lag: (min: 0.0, avg: 8.9, max: 22.0)
[2024-06-06 13:14:36,562][14064] Avg episode reward: [(0, '0.144')]
[2024-06-06 13:14:37,740][14296] Updated weights for policy 0, policy_version 15997 (0.0038)
[2024-06-06 13:14:41,561][14064] Fps is (10 sec: 44237.7, 60 sec: 46694.5, 300 sec: 47375.2). Total num frames: 262242304. Throughput: 0: 47171.1. Samples: 115306820. Policy #0 lag: (min: 0.0, avg: 11.0, max: 23.0)
[2024-06-06 13:14:41,562][14064] Avg episode reward: [(0, '0.146')]
[2024-06-06 13:14:41,594][14296] Updated weights for policy 0, policy_version 16007 (0.0032)
[2024-06-06 13:14:44,496][14296] Updated weights for policy 0, policy_version 16017 (0.0027)
[2024-06-06 13:14:46,561][14064] Fps is (10 sec: 49152.4, 60 sec: 47788.7, 300 sec: 47652.5). Total num frames: 262520832. Throughput: 0: 47088.5. Samples: 115582620. Policy #0 lag: (min: 0.0, avg: 11.0, max: 23.0)
[2024-06-06 13:14:46,562][14064] Avg episode reward: [(0, '0.148')]
[2024-06-06 13:14:48,630][14296] Updated weights for policy 0, policy_version 16027 (0.0039)
[2024-06-06 13:14:51,235][14296] Updated weights for policy 0, policy_version 16037 (0.0027)
[2024-06-06 13:14:51,561][14064] Fps is (10 sec: 50789.6, 60 sec: 47513.7, 300 sec: 47485.8). Total num frames: 262750208. Throughput: 0: 47396.8. Samples: 115733740. Policy #0 lag: (min: 0.0, avg: 10.7, max: 21.0)
[2024-06-06 13:14:51,562][14064] Avg episode reward: [(0, '0.152')]
[2024-06-06 13:14:51,563][14276] Saving new best policy, reward=0.152!
[2024-06-06 13:14:55,393][14296] Updated weights for policy 0, policy_version 16047 (0.0035)
[2024-06-06 13:14:56,564][14064] Fps is (10 sec: 45863.0, 60 sec: 47238.4, 300 sec: 47429.9). Total num frames: 262979584. Throughput: 0: 47612.8. Samples: 116024320. Policy #0 lag: (min: 0.0, avg: 10.7, max: 21.0)
[2024-06-06 13:14:56,565][14064] Avg episode reward: [(0, '0.148')]
[2024-06-06 13:14:58,046][14296] Updated weights for policy 0, policy_version 16057 (0.0035)
[2024-06-06 13:15:01,561][14064] Fps is (10 sec: 44237.3, 60 sec: 46967.5, 300 sec: 47430.3). Total num frames: 263192576. Throughput: 0: 47707.1. Samples: 116316880. Policy #0 lag: (min: 0.0, avg: 10.7, max: 21.0)
[2024-06-06 13:15:01,562][14064] Avg episode reward: [(0, '0.147')]
[2024-06-06 13:15:02,293][14296] Updated weights for policy 0, policy_version 16067 (0.0032)
[2024-06-06 13:15:04,977][14296] Updated weights for policy 0, policy_version 16077 (0.0045)
[2024-06-06 13:15:06,561][14064] Fps is (10 sec: 47526.3, 60 sec: 47786.7, 300 sec: 47485.9). Total num frames: 263454720. Throughput: 0: 47535.2. Samples: 116450160. Policy #0 lag: (min: 1.0, avg: 9.3, max: 22.0)
[2024-06-06 13:15:06,562][14064] Avg episode reward: [(0, '0.150')]
[2024-06-06 13:15:09,095][14296] Updated weights for policy 0, policy_version 16087 (0.0029)
[2024-06-06 13:15:11,561][14064] Fps is (10 sec: 50790.8, 60 sec: 47786.7, 300 sec: 47596.9). Total num frames: 263700480. Throughput: 0: 47544.5. Samples: 116737220. Policy #0 lag: (min: 1.0, avg: 9.3, max: 22.0)
[2024-06-06 13:15:11,562][14064] Avg episode reward: [(0, '0.148')]
[2024-06-06 13:15:11,879][14296] Updated weights for policy 0, policy_version 16097 (0.0036)
[2024-06-06 13:15:16,080][14296] Updated weights for policy 0, policy_version 16107 (0.0026)
[2024-06-06 13:15:16,564][14064] Fps is (10 sec: 45863.2, 60 sec: 47238.5, 300 sec: 47318.8). Total num frames: 263913472. Throughput: 0: 47449.0. Samples: 117017040. Policy #0 lag: (min: 1.0, avg: 11.1, max: 21.0)
[2024-06-06 13:15:16,564][14064] Avg episode reward: [(0, '0.151')]
[2024-06-06 13:15:18,769][14296] Updated weights for policy 0, policy_version 16117 (0.0028)
[2024-06-06 13:15:21,561][14064] Fps is (10 sec: 45874.9, 60 sec: 47513.6, 300 sec: 47541.4). Total num frames: 264159232. Throughput: 0: 47254.7. Samples: 117154040. Policy #0 lag: (min: 1.0, avg: 11.1, max: 21.0)
[2024-06-06 13:15:21,562][14064] Avg episode reward: [(0, '0.140')]
[2024-06-06 13:15:23,161][14296] Updated weights for policy 0, policy_version 16127 (0.0030)
[2024-06-06 13:15:25,803][14296] Updated weights for policy 0, policy_version 16137 (0.0025)
[2024-06-06 13:15:26,561][14064] Fps is (10 sec: 49163.9, 60 sec: 47786.5, 300 sec: 47485.8). Total num frames: 264404992. Throughput: 0: 47330.4. Samples: 117436700. Policy #0 lag: (min: 0.0, avg: 11.7, max: 21.0)
[2024-06-06 13:15:26,562][14064] Avg episode reward: [(0, '0.153')]
[2024-06-06 13:15:29,674][14276] Signal inference workers to stop experience collection... (1750 times)
[2024-06-06 13:15:29,674][14276] Signal inference workers to resume experience collection... (1750 times)
[2024-06-06 13:15:29,727][14296] InferenceWorker_p0-w0: stopping experience collection (1750 times)
[2024-06-06 13:15:29,727][14296] InferenceWorker_p0-w0: resuming experience collection (1750 times)
[2024-06-06 13:15:29,807][14296] Updated weights for policy 0, policy_version 16147 (0.0027)
[2024-06-06 13:15:31,561][14064] Fps is (10 sec: 47513.8, 60 sec: 47240.7, 300 sec: 47485.8). Total num frames: 264634368. Throughput: 0: 47718.2. Samples: 117729940. Policy #0 lag: (min: 0.0, avg: 11.7, max: 21.0)
[2024-06-06 13:15:31,561][14064] Avg episode reward: [(0, '0.145')]
[2024-06-06 13:15:32,606][14296] Updated weights for policy 0, policy_version 16157 (0.0032)
[2024-06-06 13:15:36,561][14064] Fps is (10 sec: 45876.2, 60 sec: 47240.6, 300 sec: 47430.3). Total num frames: 264863744. Throughput: 0: 47529.1. Samples: 117872540. Policy #0 lag: (min: 0.0, avg: 11.7, max: 21.0)
[2024-06-06 13:15:36,561][14064] Avg episode reward: [(0, '0.143')]
[2024-06-06 13:15:36,678][14296] Updated weights for policy 0, policy_version 16167 (0.0026)
[2024-06-06 13:15:39,223][14296] Updated weights for policy 0, policy_version 16177 (0.0027)
[2024-06-06 13:15:41,561][14064] Fps is (10 sec: 49152.0, 60 sec: 48059.7, 300 sec: 47596.9). Total num frames: 265125888. Throughput: 0: 47390.4. Samples: 118156760. Policy #0 lag: (min: 0.0, avg: 9.9, max: 23.0)
[2024-06-06 13:15:41,562][14064] Avg episode reward: [(0, '0.149')]
[2024-06-06 13:15:43,758][14296] Updated weights for policy 0, policy_version 16187 (0.0031)
[2024-06-06 13:15:46,223][14296] Updated weights for policy 0, policy_version 16197 (0.0025)
[2024-06-06 13:15:46,561][14064] Fps is (10 sec: 50789.8, 60 sec: 47513.5, 300 sec: 47541.4). Total num frames: 265371648. Throughput: 0: 47130.6. Samples: 118437760. Policy #0 lag: (min: 0.0, avg: 9.9, max: 23.0)
[2024-06-06 13:15:46,562][14064] Avg episode reward: [(0, '0.145')]
[2024-06-06 13:15:50,669][14296] Updated weights for policy 0, policy_version 16207 (0.0025)
[2024-06-06 13:15:51,562][14064] Fps is (10 sec: 45874.0, 60 sec: 47240.4, 300 sec: 47430.3). Total num frames: 265584640. Throughput: 0: 47595.3. Samples: 118591960. Policy #0 lag: (min: 0.0, avg: 10.4, max: 21.0)
[2024-06-06 13:15:51,562][14064] Avg episode reward: [(0, '0.150')]
[2024-06-06 13:15:53,305][14296] Updated weights for policy 0, policy_version 16217 (0.0038)
[2024-06-06 13:15:56,561][14064] Fps is (10 sec: 42598.8, 60 sec: 46969.5, 300 sec: 47541.8). Total num frames: 265797632. Throughput: 0: 47419.5. Samples: 118871100. Policy #0 lag: (min: 0.0, avg: 10.4, max: 21.0)
[2024-06-06 13:15:56,562][14064] Avg episode reward: [(0, '0.146')]
[2024-06-06 13:15:57,469][14296] Updated weights for policy 0, policy_version 16227 (0.0032)
[2024-06-06 13:16:00,131][14296] Updated weights for policy 0, policy_version 16237 (0.0034)
[2024-06-06 13:16:01,561][14064] Fps is (10 sec: 49153.4, 60 sec: 48059.8, 300 sec: 47486.3). Total num frames: 266076160. Throughput: 0: 47553.4. Samples: 119156820. Policy #0 lag: (min: 0.0, avg: 10.4, max: 21.0)
[2024-06-06 13:16:01,561][14064] Avg episode reward: [(0, '0.148')]
[2024-06-06 13:16:04,374][14296] Updated weights for policy 0, policy_version 16247 (0.0030)
[2024-06-06 13:16:06,561][14064] Fps is (10 sec: 52428.9, 60 sec: 47786.7, 300 sec: 47596.9). Total num frames: 266321920. Throughput: 0: 47762.2. Samples: 119303340. Policy #0 lag: (min: 0.0, avg: 11.2, max: 23.0)
[2024-06-06 13:16:06,562][14064] Avg episode reward: [(0, '0.147')]
[2024-06-06 13:16:06,662][14276] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000016256_266338304.pth...
[2024-06-06 13:16:06,719][14276] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000015560_254935040.pth
[2024-06-06 13:16:06,867][14296] Updated weights for policy 0, policy_version 16257 (0.0030)
[2024-06-06 13:16:11,331][14296] Updated weights for policy 0, policy_version 16267 (0.0028)
[2024-06-06 13:16:11,561][14064] Fps is (10 sec: 44236.5, 60 sec: 46967.4, 300 sec: 47375.2). Total num frames: 266518528. Throughput: 0: 47887.3. Samples: 119591620. Policy #0 lag: (min: 0.0, avg: 11.2, max: 23.0)
[2024-06-06 13:16:11,562][14064] Avg episode reward: [(0, '0.146')]
[2024-06-06 13:16:13,669][14296] Updated weights for policy 0, policy_version 16277 (0.0040)
[2024-06-06 13:16:16,561][14064] Fps is (10 sec: 45875.3, 60 sec: 47788.7, 300 sec: 47596.9). Total num frames: 266780672. Throughput: 0: 47665.8. Samples: 119874900. Policy #0 lag: (min: 1.0, avg: 11.5, max: 21.0)
[2024-06-06 13:16:16,562][14064] Avg episode reward: [(0, '0.144')]
[2024-06-06 13:16:18,383][14296] Updated weights for policy 0, policy_version 16287 (0.0027)
[2024-06-06 13:16:20,640][14296] Updated weights for policy 0, policy_version 16297 (0.0028)
[2024-06-06 13:16:21,561][14064] Fps is (10 sec: 49151.8, 60 sec: 47513.6, 300 sec: 47430.3). Total num frames: 267010048. Throughput: 0: 47463.5. Samples: 120008400. Policy #0 lag: (min: 1.0, avg: 11.5, max: 21.0)
[2024-06-06 13:16:21,562][14064] Avg episode reward: [(0, '0.150')]
[2024-06-06 13:16:25,143][14296] Updated weights for policy 0, policy_version 16307 (0.0039)
[2024-06-06 13:16:26,561][14064] Fps is (10 sec: 49151.4, 60 sec: 47786.7, 300 sec: 47485.8). Total num frames: 267272192. Throughput: 0: 47697.6. Samples: 120303160. Policy #0 lag: (min: 0.0, avg: 7.6, max: 21.0)
[2024-06-06 13:16:26,562][14064] Avg episode reward: [(0, '0.148')]
[2024-06-06 13:16:27,595][14296] Updated weights for policy 0, policy_version 16317 (0.0032)
[2024-06-06 13:16:31,561][14064] Fps is (10 sec: 45875.6, 60 sec: 47240.5, 300 sec: 47430.3). Total num frames: 267468800. Throughput: 0: 47650.8. Samples: 120582040. Policy #0 lag: (min: 0.0, avg: 7.6, max: 21.0)
[2024-06-06 13:16:31,562][14064] Avg episode reward: [(0, '0.149')]
[2024-06-06 13:16:32,023][14296] Updated weights for policy 0, policy_version 16327 (0.0025)
[2024-06-06 13:16:34,534][14296] Updated weights for policy 0, policy_version 16337 (0.0036)
[2024-06-06 13:16:36,561][14064] Fps is (10 sec: 47514.2, 60 sec: 48059.7, 300 sec: 47597.3). Total num frames: 267747328. Throughput: 0: 47177.2. Samples: 120714920. Policy #0 lag: (min: 0.0, avg: 7.6, max: 21.0)
[2024-06-06 13:16:36,561][14064] Avg episode reward: [(0, '0.149')]
[2024-06-06 13:16:38,981][14296] Updated weights for policy 0, policy_version 16347 (0.0038)
[2024-06-06 13:16:41,561][14064] Fps is (10 sec: 50790.0, 60 sec: 47513.5, 300 sec: 47485.8). Total num frames: 267976704. Throughput: 0: 47352.8. Samples: 121001980. Policy #0 lag: (min: 0.0, avg: 8.9, max: 20.0)
[2024-06-06 13:16:41,562][14064] Avg episode reward: [(0, '0.151')]
[2024-06-06 13:16:41,613][14296] Updated weights for policy 0, policy_version 16357 (0.0032)
[2024-06-06 13:16:42,766][14276] Signal inference workers to stop experience collection... (1800 times)
[2024-06-06 13:16:42,767][14276] Signal inference workers to resume experience collection... (1800 times)
[2024-06-06 13:16:42,783][14296] InferenceWorker_p0-w0: stopping experience collection (1800 times)
[2024-06-06 13:16:42,819][14296] InferenceWorker_p0-w0: resuming experience collection (1800 times)
[2024-06-06 13:16:45,702][14296] Updated weights for policy 0, policy_version 16367 (0.0030)
[2024-06-06 13:16:46,561][14064] Fps is (10 sec: 45874.8, 60 sec: 47240.6, 300 sec: 47485.8). Total num frames: 268206080. Throughput: 0: 47595.0. Samples: 121298600. Policy #0 lag: (min: 0.0, avg: 8.9, max: 20.0)
[2024-06-06 13:16:46,562][14064] Avg episode reward: [(0, '0.154')]
[2024-06-06 13:16:46,648][14276] Saving new best policy, reward=0.154!
[2024-06-06 13:16:48,382][14296] Updated weights for policy 0, policy_version 16377 (0.0024)
[2024-06-06 13:16:51,561][14064] Fps is (10 sec: 44237.0, 60 sec: 47240.7, 300 sec: 47541.4). Total num frames: 268419072. Throughput: 0: 47562.2. Samples: 121443640. Policy #0 lag: (min: 0.0, avg: 10.4, max: 21.0)
[2024-06-06 13:16:51,562][14064] Avg episode reward: [(0, '0.155')]
[2024-06-06 13:16:51,726][14276] Saving new best policy, reward=0.155!
[2024-06-06 13:16:52,596][14296] Updated weights for policy 0, policy_version 16387 (0.0032)
[2024-06-06 13:16:55,316][14296] Updated weights for policy 0, policy_version 16397 (0.0024)
[2024-06-06 13:16:56,561][14064] Fps is (10 sec: 47513.3, 60 sec: 48059.6, 300 sec: 47430.3). Total num frames: 268681216. Throughput: 0: 47378.1. Samples: 121723640. Policy #0 lag: (min: 0.0, avg: 10.4, max: 21.0)
[2024-06-06 13:16:56,562][14064] Avg episode reward: [(0, '0.153')]
[2024-06-06 13:16:59,347][14296] Updated weights for policy 0, policy_version 16407 (0.0038)
[2024-06-06 13:17:01,563][14064] Fps is (10 sec: 50782.0, 60 sec: 47512.2, 300 sec: 47485.6). Total num frames: 268926976. Throughput: 0: 47422.2. Samples: 122008980. Policy #0 lag: (min: 0.0, avg: 10.6, max: 25.0)
[2024-06-06 13:17:01,564][14064] Avg episode reward: [(0, '0.156')]
[2024-06-06 13:17:02,139][14296] Updated weights for policy 0, policy_version 16417 (0.0026)
[2024-06-06 13:17:06,561][14064] Fps is (10 sec: 44237.5, 60 sec: 46694.4, 300 sec: 47430.3). Total num frames: 269123584. Throughput: 0: 47652.1. Samples: 122152740. Policy #0 lag: (min: 0.0, avg: 10.6, max: 25.0)
[2024-06-06 13:17:06,562][14064] Avg episode reward: [(0, '0.156')]
[2024-06-06 13:17:06,591][14296] Updated weights for policy 0, policy_version 16427 (0.0020)
[2024-06-06 13:17:08,886][14296] Updated weights for policy 0, policy_version 16437 (0.0028)
[2024-06-06 13:17:11,561][14064] Fps is (10 sec: 45883.1, 60 sec: 47786.7, 300 sec: 47541.4). Total num frames: 269385728. Throughput: 0: 47330.8. Samples: 122433040. Policy #0 lag: (min: 0.0, avg: 10.6, max: 25.0)
[2024-06-06 13:17:11,561][14064] Avg episode reward: [(0, '0.150')]
[2024-06-06 13:17:13,328][14296] Updated weights for policy 0, policy_version 16447 (0.0025)
[2024-06-06 13:17:15,978][14296] Updated weights for policy 0, policy_version 16457 (0.0024)
[2024-06-06 13:17:16,561][14064] Fps is (10 sec: 50790.4, 60 sec: 47513.6, 300 sec: 47485.8). Total num frames: 269631488. Throughput: 0: 47442.2. Samples: 122716940. Policy #0 lag: (min: 0.0, avg: 8.6, max: 21.0)
[2024-06-06 13:17:16,562][14064] Avg episode reward: [(0, '0.156')]
[2024-06-06 13:17:20,496][14296] Updated weights for policy 0, policy_version 16467 (0.0028)
[2024-06-06 13:17:21,561][14064] Fps is (10 sec: 47513.4, 60 sec: 47513.7, 300 sec: 47430.7). Total num frames: 269860864. Throughput: 0: 47670.7. Samples: 122860100. Policy #0 lag: (min: 0.0, avg: 8.6, max: 21.0)
[2024-06-06 13:17:21,561][14064] Avg episode reward: [(0, '0.159')]
[2024-06-06 13:17:21,624][14276] Saving new best policy, reward=0.159!
[2024-06-06 13:17:23,024][14296] Updated weights for policy 0, policy_version 16477 (0.0031)
[2024-06-06 13:17:26,564][14064] Fps is (10 sec: 45862.8, 60 sec: 46965.4, 300 sec: 47485.8). Total num frames: 270090240. Throughput: 0: 47506.1. Samples: 123139880. Policy #0 lag: (min: 0.0, avg: 10.9, max: 21.0)
[2024-06-06 13:17:26,565][14064] Avg episode reward: [(0, '0.152')]
[2024-06-06 13:17:27,187][14296] Updated weights for policy 0, policy_version 16487 (0.0035)
[2024-06-06 13:17:29,772][14296] Updated weights for policy 0, policy_version 16497 (0.0031)
[2024-06-06 13:17:31,561][14064] Fps is (10 sec: 47513.8, 60 sec: 47786.7, 300 sec: 47486.6). Total num frames: 270336000. Throughput: 0: 47226.4. Samples: 123423780. Policy #0 lag: (min: 0.0, avg: 10.9, max: 21.0)
[2024-06-06 13:17:31,562][14064] Avg episode reward: [(0, '0.158')]
[2024-06-06 13:17:34,367][14296] Updated weights for policy 0, policy_version 16507 (0.0023)
[2024-06-06 13:17:36,561][14064] Fps is (10 sec: 50804.0, 60 sec: 47513.6, 300 sec: 47485.8). Total num frames: 270598144. Throughput: 0: 47316.4. Samples: 123572880. Policy #0 lag: (min: 0.0, avg: 10.9, max: 21.0)
[2024-06-06 13:17:36,562][14064] Avg episode reward: [(0, '0.157')]
[2024-06-06 13:17:36,705][14296] Updated weights for policy 0, policy_version 16517 (0.0029)
[2024-06-06 13:17:41,402][14296] Updated weights for policy 0, policy_version 16527 (0.0025)
[2024-06-06 13:17:41,561][14064] Fps is (10 sec: 44236.6, 60 sec: 46694.5, 300 sec: 47263.7). Total num frames: 270778368. Throughput: 0: 47281.0. Samples: 123851280. Policy #0 lag: (min: 0.0, avg: 11.6, max: 20.0)
[2024-06-06 13:17:41,562][14064] Avg episode reward: [(0, '0.150')]
[2024-06-06 13:17:43,823][14296] Updated weights for policy 0, policy_version 16537 (0.0035)
[2024-06-06 13:17:46,561][14064] Fps is (10 sec: 44236.6, 60 sec: 47240.5, 300 sec: 47485.8). Total num frames: 271040512. Throughput: 0: 47266.1. Samples: 124135880. Policy #0 lag: (min: 0.0, avg: 11.6, max: 20.0)
[2024-06-06 13:17:46,562][14064] Avg episode reward: [(0, '0.154')]
[2024-06-06 13:17:48,377][14296] Updated weights for policy 0, policy_version 16547 (0.0030)
[2024-06-06 13:17:50,663][14296] Updated weights for policy 0, policy_version 16557 (0.0034)
[2024-06-06 13:17:51,561][14064] Fps is (10 sec: 50790.3, 60 sec: 47786.7, 300 sec: 47374.8). Total num frames: 271286272. Throughput: 0: 47207.1. Samples: 124277060. Policy #0 lag: (min: 0.0, avg: 9.6, max: 20.0)
[2024-06-06 13:17:51,562][14064] Avg episode reward: [(0, '0.159')]
[2024-06-06 13:17:53,817][14276] Signal inference workers to stop experience collection... (1850 times)
[2024-06-06 13:17:53,859][14296] InferenceWorker_p0-w0: stopping experience collection (1850 times)
[2024-06-06 13:17:53,867][14276] Signal inference workers to resume experience collection... (1850 times)
[2024-06-06 13:17:53,870][14296] InferenceWorker_p0-w0: resuming experience collection (1850 times)
[2024-06-06 13:17:55,131][14296] Updated weights for policy 0, policy_version 16567 (0.0031)
[2024-06-06 13:17:56,561][14064] Fps is (10 sec: 49152.2, 60 sec: 47513.7, 300 sec: 47430.3). Total num frames: 271532032. Throughput: 0: 47383.5. Samples: 124565300. Policy #0 lag: (min: 0.0, avg: 9.6, max: 20.0)
[2024-06-06 13:17:56,562][14064] Avg episode reward: [(0, '0.162')]
[2024-06-06 13:17:56,567][14276] Saving new best policy, reward=0.162!
[2024-06-06 13:17:57,597][14296] Updated weights for policy 0, policy_version 16577 (0.0040)
[2024-06-06 13:18:01,561][14064] Fps is (10 sec: 45875.3, 60 sec: 46968.8, 300 sec: 47374.8). Total num frames: 271745024. Throughput: 0: 47545.3. Samples: 124856480. Policy #0 lag: (min: 1.0, avg: 9.2, max: 21.0)
[2024-06-06 13:18:01,562][14064] Avg episode reward: [(0, '0.153')]
[2024-06-06 13:18:01,693][14296] Updated weights for policy 0, policy_version 16587 (0.0033)
[2024-06-06 13:18:04,291][14296] Updated weights for policy 0, policy_version 16597 (0.0028)
[2024-06-06 13:18:06,561][14064] Fps is (10 sec: 47513.8, 60 sec: 48059.7, 300 sec: 47541.4). Total num frames: 272007168. Throughput: 0: 47236.0. Samples: 124985720. Policy #0 lag: (min: 1.0, avg: 9.2, max: 21.0)
[2024-06-06 13:18:06,562][14064] Avg episode reward: [(0, '0.161')]
[2024-06-06 13:18:06,571][14276] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000016602_272007168.pth...
[2024-06-06 13:18:06,618][14276] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000015906_260603904.pth
[2024-06-06 13:18:08,717][14296] Updated weights for policy 0, policy_version 16607 (0.0038)
[2024-06-06 13:18:11,242][14296] Updated weights for policy 0, policy_version 16617 (0.0018)
[2024-06-06 13:18:11,561][14064] Fps is (10 sec: 50790.4, 60 sec: 47786.6, 300 sec: 47485.8). Total num frames: 272252928. Throughput: 0: 47498.4. Samples: 125277180. Policy #0 lag: (min: 1.0, avg: 9.2, max: 21.0)
[2024-06-06 13:18:11,562][14064] Avg episode reward: [(0, '0.157')]
[2024-06-06 13:18:15,756][14296] Updated weights for policy 0, policy_version 16627 (0.0027)
[2024-06-06 13:18:16,561][14064] Fps is (10 sec: 45874.7, 60 sec: 47240.5, 300 sec: 47374.7). Total num frames: 272465920. Throughput: 0: 47604.7. Samples: 125566000. Policy #0 lag: (min: 0.0, avg: 11.3, max: 22.0)
[2024-06-06 13:18:16,564][14064] Avg episode reward: [(0, '0.160')]
[2024-06-06 13:18:18,204][14296] Updated weights for policy 0, policy_version 16637 (0.0026)
[2024-06-06 13:18:21,561][14064] Fps is (10 sec: 44236.1, 60 sec: 47240.4, 300 sec: 47374.7). Total num frames: 272695296. Throughput: 0: 47411.9. Samples: 125706420. Policy #0 lag: (min: 0.0, avg: 11.3, max: 22.0)
[2024-06-06 13:18:21,562][14064] Avg episode reward: [(0, '0.155')]
[2024-06-06 13:18:22,480][14296] Updated weights for policy 0, policy_version 16647 (0.0035)
[2024-06-06 13:18:24,945][14296] Updated weights for policy 0, policy_version 16657 (0.0028)
[2024-06-06 13:18:26,564][14064] Fps is (10 sec: 49139.5, 60 sec: 47786.7, 300 sec: 47485.8). Total num frames: 272957440. Throughput: 0: 47563.0. Samples: 125991740. Policy #0 lag: (min: 1.0, avg: 12.1, max: 22.0)
[2024-06-06 13:18:26,565][14064] Avg episode reward: [(0, '0.152')]
[2024-06-06 13:18:29,194][14296] Updated weights for policy 0, policy_version 16667 (0.0032)
[2024-06-06 13:18:31,561][14064] Fps is (10 sec: 52429.6, 60 sec: 48059.7, 300 sec: 47541.4). Total num frames: 273219584. Throughput: 0: 47516.1. Samples: 126274100. Policy #0 lag: (min: 1.0, avg: 12.1, max: 22.0)
[2024-06-06 13:18:31,562][14064] Avg episode reward: [(0, '0.155')]
[2024-06-06 13:18:31,729][14296] Updated weights for policy 0, policy_version 16677 (0.0032)
[2024-06-06 13:18:36,261][14296] Updated weights for policy 0, policy_version 16687 (0.0026)
[2024-06-06 13:18:36,561][14064] Fps is (10 sec: 44248.2, 60 sec: 46694.4, 300 sec: 47319.2). Total num frames: 273399808. Throughput: 0: 47628.8. Samples: 126420360. Policy #0 lag: (min: 1.0, avg: 12.1, max: 22.0)
[2024-06-06 13:18:36,562][14064] Avg episode reward: [(0, '0.159')]
[2024-06-06 13:18:38,858][14296] Updated weights for policy 0, policy_version 16697 (0.0025)
[2024-06-06 13:18:41,561][14064] Fps is (10 sec: 44236.2, 60 sec: 48059.6, 300 sec: 47486.2). Total num frames: 273661952. Throughput: 0: 47501.7. Samples: 126702880. Policy #0 lag: (min: 0.0, avg: 8.6, max: 20.0)
[2024-06-06 13:18:41,562][14064] Avg episode reward: [(0, '0.159')]
[2024-06-06 13:18:43,111][14296] Updated weights for policy 0, policy_version 16707 (0.0020)
[2024-06-06 13:18:45,805][14296] Updated weights for policy 0, policy_version 16717 (0.0025)
[2024-06-06 13:18:46,561][14064] Fps is (10 sec: 50790.7, 60 sec: 47786.7, 300 sec: 47485.9). Total num frames: 273907712. Throughput: 0: 47399.5. Samples: 126989460. Policy #0 lag: (min: 0.0, avg: 8.6, max: 20.0)
[2024-06-06 13:18:46,562][14064] Avg episode reward: [(0, '0.146')]
[2024-06-06 13:18:49,959][14296] Updated weights for policy 0, policy_version 16727 (0.0036)
[2024-06-06 13:18:51,561][14064] Fps is (10 sec: 49152.3, 60 sec: 47786.6, 300 sec: 47485.8). Total num frames: 274153472. Throughput: 0: 47841.7. Samples: 127138600. Policy #0 lag: (min: 0.0, avg: 10.2, max: 21.0)
[2024-06-06 13:18:51,562][14064] Avg episode reward: [(0, '0.160')]
[2024-06-06 13:18:52,641][14296] Updated weights for policy 0, policy_version 16737 (0.0025)
[2024-06-06 13:18:56,561][14064] Fps is (10 sec: 45874.8, 60 sec: 47240.5, 300 sec: 47430.3). Total num frames: 274366464. Throughput: 0: 47716.3. Samples: 127424420. Policy #0 lag: (min: 0.0, avg: 10.2, max: 21.0)
[2024-06-06 13:18:56,562][14064] Avg episode reward: [(0, '0.161')]
[2024-06-06 13:18:56,622][14296] Updated weights for policy 0, policy_version 16747 (0.0031)
[2024-06-06 13:18:58,581][14276] Signal inference workers to stop experience collection... (1900 times)
[2024-06-06 13:18:58,581][14276] Signal inference workers to resume experience collection... (1900 times)
[2024-06-06 13:18:58,621][14296] InferenceWorker_p0-w0: stopping experience collection (1900 times)
[2024-06-06 13:18:58,622][14296] InferenceWorker_p0-w0: resuming experience collection (1900 times)
[2024-06-06 13:18:59,356][14296] Updated weights for policy 0, policy_version 16757 (0.0036)
[2024-06-06 13:19:01,561][14064] Fps is (10 sec: 47513.4, 60 sec: 48059.6, 300 sec: 47596.9). Total num frames: 274628608. Throughput: 0: 47572.9. Samples: 127706780. Policy #0 lag: (min: 0.0, avg: 10.6, max: 21.0)
[2024-06-06 13:19:01,562][14064] Avg episode reward: [(0, '0.160')]
[2024-06-06 13:19:03,419][14296] Updated weights for policy 0, policy_version 16767 (0.0044)
[2024-06-06 13:19:06,407][14296] Updated weights for policy 0, policy_version 16777 (0.0037)
[2024-06-06 13:19:06,561][14064] Fps is (10 sec: 50790.7, 60 sec: 47786.6, 300 sec: 47596.9). Total num frames: 274874368. Throughput: 0: 47773.0. Samples: 127856200. Policy #0 lag: (min: 0.0, avg: 10.6, max: 21.0)
[2024-06-06 13:19:06,562][14064] Avg episode reward: [(0, '0.154')]
[2024-06-06 13:19:10,473][14296] Updated weights for policy 0, policy_version 16787 (0.0027)
[2024-06-06 13:19:11,561][14064] Fps is (10 sec: 45875.2, 60 sec: 47240.4, 300 sec: 47485.8). Total num frames: 275087360. Throughput: 0: 47667.6. Samples: 128136660. Policy #0 lag: (min: 0.0, avg: 10.6, max: 21.0)
[2024-06-06 13:19:11,562][14064] Avg episode reward: [(0, '0.159')]
[2024-06-06 13:19:13,266][14296] Updated weights for policy 0, policy_version 16797 (0.0023)
[2024-06-06 13:19:16,561][14064] Fps is (10 sec: 45874.8, 60 sec: 47786.6, 300 sec: 47541.4). Total num frames: 275333120. Throughput: 0: 47731.9. Samples: 128422040.
Policy #0 lag: (min: 0.0, avg: 11.1, max: 23.0) [2024-06-06 13:19:16,562][14064] Avg episode reward: [(0, '0.153')] [2024-06-06 13:19:17,366][14296] Updated weights for policy 0, policy_version 16807 (0.0031) [2024-06-06 13:19:19,915][14296] Updated weights for policy 0, policy_version 16817 (0.0036) [2024-06-06 13:19:21,564][14064] Fps is (10 sec: 49139.4, 60 sec: 48057.7, 300 sec: 47596.5). Total num frames: 275578880. Throughput: 0: 47671.0. Samples: 128565680. Policy #0 lag: (min: 0.0, avg: 11.1, max: 23.0) [2024-06-06 13:19:21,565][14064] Avg episode reward: [(0, '0.159')] [2024-06-06 13:19:24,079][14296] Updated weights for policy 0, policy_version 16827 (0.0034) [2024-06-06 13:19:26,561][14064] Fps is (10 sec: 50790.4, 60 sec: 48061.7, 300 sec: 47596.9). Total num frames: 275841024. Throughput: 0: 47895.1. Samples: 128858160. Policy #0 lag: (min: 0.0, avg: 9.1, max: 21.0) [2024-06-06 13:19:26,562][14064] Avg episode reward: [(0, '0.161')] [2024-06-06 13:19:26,759][14296] Updated weights for policy 0, policy_version 16837 (0.0021) [2024-06-06 13:19:30,726][14296] Updated weights for policy 0, policy_version 16847 (0.0034) [2024-06-06 13:19:31,561][14064] Fps is (10 sec: 44248.8, 60 sec: 46694.4, 300 sec: 47430.3). Total num frames: 276021248. Throughput: 0: 48054.3. Samples: 129151900. Policy #0 lag: (min: 0.0, avg: 9.1, max: 21.0) [2024-06-06 13:19:31,562][14064] Avg episode reward: [(0, '0.160')] [2024-06-06 13:19:33,760][14296] Updated weights for policy 0, policy_version 16857 (0.0037) [2024-06-06 13:19:36,561][14064] Fps is (10 sec: 42598.9, 60 sec: 47786.7, 300 sec: 47541.4). Total num frames: 276267008. Throughput: 0: 47660.1. Samples: 129283300. Policy #0 lag: (min: 0.0, avg: 9.1, max: 21.0) [2024-06-06 13:19:36,562][14064] Avg episode reward: [(0, '0.163')] [2024-06-06 13:19:36,577][14276] Saving new best policy, reward=0.163! 
[2024-06-06 13:19:37,683][14296] Updated weights for policy 0, policy_version 16867 (0.0029) [2024-06-06 13:19:40,678][14296] Updated weights for policy 0, policy_version 16877 (0.0036) [2024-06-06 13:19:41,561][14064] Fps is (10 sec: 50790.2, 60 sec: 47786.8, 300 sec: 47485.8). Total num frames: 276529152. Throughput: 0: 47595.7. Samples: 129566220. Policy #0 lag: (min: 1.0, avg: 9.5, max: 22.0) [2024-06-06 13:19:41,562][14064] Avg episode reward: [(0, '0.160')] [2024-06-06 13:19:44,683][14296] Updated weights for policy 0, policy_version 16887 (0.0031) [2024-06-06 13:19:46,561][14064] Fps is (10 sec: 50790.5, 60 sec: 47786.7, 300 sec: 47541.4). Total num frames: 276774912. Throughput: 0: 47668.1. Samples: 129851840. Policy #0 lag: (min: 1.0, avg: 9.5, max: 22.0) [2024-06-06 13:19:46,562][14064] Avg episode reward: [(0, '0.160')] [2024-06-06 13:19:47,360][14296] Updated weights for policy 0, policy_version 16897 (0.0023) [2024-06-06 13:19:51,357][14296] Updated weights for policy 0, policy_version 16907 (0.0033) [2024-06-06 13:19:51,561][14064] Fps is (10 sec: 47513.9, 60 sec: 47513.7, 300 sec: 47541.8). Total num frames: 277004288. Throughput: 0: 47611.2. Samples: 129998700. Policy #0 lag: (min: 0.0, avg: 9.1, max: 23.0) [2024-06-06 13:19:51,562][14064] Avg episode reward: [(0, '0.160')] [2024-06-06 13:19:54,307][14296] Updated weights for policy 0, policy_version 16917 (0.0040) [2024-06-06 13:19:56,561][14064] Fps is (10 sec: 45874.4, 60 sec: 47786.6, 300 sec: 47596.9). Total num frames: 277233664. Throughput: 0: 47662.1. Samples: 130281460. Policy #0 lag: (min: 0.0, avg: 9.1, max: 23.0) [2024-06-06 13:19:56,562][14064] Avg episode reward: [(0, '0.159')] [2024-06-06 13:19:58,182][14296] Updated weights for policy 0, policy_version 16927 (0.0023) [2024-06-06 13:20:01,505][14296] Updated weights for policy 0, policy_version 16937 (0.0020) [2024-06-06 13:20:01,564][14064] Fps is (10 sec: 49138.4, 60 sec: 47784.6, 300 sec: 47596.5). Total num frames: 277495808. 
Throughput: 0: 47779.9. Samples: 130572260. Policy #0 lag: (min: 0.0, avg: 9.1, max: 23.0) [2024-06-06 13:20:01,564][14064] Avg episode reward: [(0, '0.159')] [2024-06-06 13:20:05,222][14296] Updated weights for policy 0, policy_version 16947 (0.0049) [2024-06-06 13:20:06,561][14064] Fps is (10 sec: 49152.4, 60 sec: 47513.5, 300 sec: 47541.3). Total num frames: 277725184. Throughput: 0: 47797.8. Samples: 130716460. Policy #0 lag: (min: 0.0, avg: 10.2, max: 21.0) [2024-06-06 13:20:06,562][14064] Avg episode reward: [(0, '0.159')] [2024-06-06 13:20:06,569][14276] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000016951_277725184.pth... [2024-06-06 13:20:06,574][14276] Signal inference workers to stop experience collection... (1950 times) [2024-06-06 13:20:06,574][14276] Signal inference workers to resume experience collection... (1950 times) [2024-06-06 13:20:06,584][14296] InferenceWorker_p0-w0: stopping experience collection (1950 times) [2024-06-06 13:20:06,584][14296] InferenceWorker_p0-w0: resuming experience collection (1950 times) [2024-06-06 13:20:06,615][14276] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000016256_266338304.pth [2024-06-06 13:20:08,588][14296] Updated weights for policy 0, policy_version 16957 (0.0038) [2024-06-06 13:20:11,561][14064] Fps is (10 sec: 45887.0, 60 sec: 47786.6, 300 sec: 47597.3). Total num frames: 277954560. Throughput: 0: 47727.5. Samples: 131005900. Policy #0 lag: (min: 0.0, avg: 10.2, max: 21.0) [2024-06-06 13:20:11,562][14064] Avg episode reward: [(0, '0.163')] [2024-06-06 13:20:12,064][14296] Updated weights for policy 0, policy_version 16967 (0.0029) [2024-06-06 13:20:15,242][14296] Updated weights for policy 0, policy_version 16977 (0.0026) [2024-06-06 13:20:16,561][14064] Fps is (10 sec: 47513.5, 60 sec: 47786.7, 300 sec: 47596.9). Total num frames: 278200320. Throughput: 0: 47485.6. Samples: 131288760. 
Policy #0 lag: (min: 0.0, avg: 11.5, max: 23.0) [2024-06-06 13:20:16,562][14064] Avg episode reward: [(0, '0.162')] [2024-06-06 13:20:18,751][14296] Updated weights for policy 0, policy_version 16987 (0.0021) [2024-06-06 13:20:21,561][14064] Fps is (10 sec: 49152.9, 60 sec: 47788.8, 300 sec: 47596.9). Total num frames: 278446080. Throughput: 0: 47800.0. Samples: 131434300. Policy #0 lag: (min: 0.0, avg: 11.5, max: 23.0) [2024-06-06 13:20:21,561][14064] Avg episode reward: [(0, '0.162')] [2024-06-06 13:20:22,257][14296] Updated weights for policy 0, policy_version 16997 (0.0036) [2024-06-06 13:20:25,548][14296] Updated weights for policy 0, policy_version 17007 (0.0033) [2024-06-06 13:20:26,561][14064] Fps is (10 sec: 45875.8, 60 sec: 46967.6, 300 sec: 47541.4). Total num frames: 278659072. Throughput: 0: 47924.0. Samples: 131722800. Policy #0 lag: (min: 0.0, avg: 11.5, max: 23.0) [2024-06-06 13:20:26,562][14064] Avg episode reward: [(0, '0.157')] [2024-06-06 13:20:29,267][14296] Updated weights for policy 0, policy_version 17017 (0.0025) [2024-06-06 13:20:31,562][14064] Fps is (10 sec: 45873.9, 60 sec: 48059.5, 300 sec: 47596.9). Total num frames: 278904832. Throughput: 0: 48039.3. Samples: 132013620. Policy #0 lag: (min: 0.0, avg: 9.8, max: 22.0) [2024-06-06 13:20:31,562][14064] Avg episode reward: [(0, '0.154')] [2024-06-06 13:20:32,315][14296] Updated weights for policy 0, policy_version 17027 (0.0026) [2024-06-06 13:20:35,970][14296] Updated weights for policy 0, policy_version 17037 (0.0030) [2024-06-06 13:20:36,562][14064] Fps is (10 sec: 49150.7, 60 sec: 48059.5, 300 sec: 47541.3). Total num frames: 279150592. Throughput: 0: 47805.5. Samples: 132149960. Policy #0 lag: (min: 0.0, avg: 9.8, max: 22.0) [2024-06-06 13:20:36,562][14064] Avg episode reward: [(0, '0.157')] [2024-06-06 13:20:39,534][14296] Updated weights for policy 0, policy_version 17047 (0.0029) [2024-06-06 13:20:41,564][14064] Fps is (10 sec: 49140.2, 60 sec: 47784.6, 300 sec: 47541.0). 
Total num frames: 279396352. Throughput: 0: 47870.3. Samples: 132435740. Policy #0 lag: (min: 1.0, avg: 10.6, max: 23.0) [2024-06-06 13:20:41,565][14064] Avg episode reward: [(0, '0.158')] [2024-06-06 13:20:42,687][14296] Updated weights for policy 0, policy_version 17057 (0.0026) [2024-06-06 13:20:46,327][14296] Updated weights for policy 0, policy_version 17067 (0.0020) [2024-06-06 13:20:46,561][14064] Fps is (10 sec: 47514.4, 60 sec: 47513.5, 300 sec: 47596.9). Total num frames: 279625728. Throughput: 0: 47816.6. Samples: 132723880. Policy #0 lag: (min: 1.0, avg: 10.6, max: 23.0) [2024-06-06 13:20:46,562][14064] Avg episode reward: [(0, '0.161')] [2024-06-06 13:20:49,666][14296] Updated weights for policy 0, policy_version 17077 (0.0032) [2024-06-06 13:20:51,561][14064] Fps is (10 sec: 45887.0, 60 sec: 47513.5, 300 sec: 47652.4). Total num frames: 279855104. Throughput: 0: 47684.5. Samples: 132862260. Policy #0 lag: (min: 0.0, avg: 10.6, max: 21.0) [2024-06-06 13:20:51,562][14064] Avg episode reward: [(0, '0.164')] [2024-06-06 13:20:53,435][14296] Updated weights for policy 0, policy_version 17087 (0.0028) [2024-06-06 13:20:56,563][14064] Fps is (10 sec: 47507.3, 60 sec: 47785.7, 300 sec: 47541.1). Total num frames: 280100864. Throughput: 0: 47308.9. Samples: 133134860. Policy #0 lag: (min: 0.0, avg: 10.6, max: 21.0) [2024-06-06 13:20:56,563][14064] Avg episode reward: [(0, '0.159')] [2024-06-06 13:20:56,955][14296] Updated weights for policy 0, policy_version 17097 (0.0035) [2024-06-06 13:21:00,587][14296] Updated weights for policy 0, policy_version 17107 (0.0027) [2024-06-06 13:21:01,561][14064] Fps is (10 sec: 45875.3, 60 sec: 46969.5, 300 sec: 47430.3). Total num frames: 280313856. Throughput: 0: 47517.4. Samples: 133427040. 
Policy #0 lag: (min: 0.0, avg: 10.6, max: 21.0) [2024-06-06 13:21:01,562][14064] Avg episode reward: [(0, '0.159')] [2024-06-06 13:21:03,758][14296] Updated weights for policy 0, policy_version 17117 (0.0034) [2024-06-06 13:21:05,493][14276] Signal inference workers to stop experience collection... (2000 times) [2024-06-06 13:21:05,493][14276] Signal inference workers to resume experience collection... (2000 times) [2024-06-06 13:21:05,535][14296] InferenceWorker_p0-w0: stopping experience collection (2000 times) [2024-06-06 13:21:05,535][14296] InferenceWorker_p0-w0: resuming experience collection (2000 times) [2024-06-06 13:21:06,561][14064] Fps is (10 sec: 44243.0, 60 sec: 46967.6, 300 sec: 47541.4). Total num frames: 280543232. Throughput: 0: 47329.7. Samples: 133564140. Policy #0 lag: (min: 0.0, avg: 10.7, max: 21.0) [2024-06-06 13:21:06,562][14064] Avg episode reward: [(0, '0.164')] [2024-06-06 13:21:07,382][14296] Updated weights for policy 0, policy_version 17127 (0.0034) [2024-06-06 13:21:10,383][14296] Updated weights for policy 0, policy_version 17137 (0.0033) [2024-06-06 13:21:11,561][14064] Fps is (10 sec: 49152.4, 60 sec: 47513.7, 300 sec: 47541.4). Total num frames: 280805376. Throughput: 0: 47234.7. Samples: 133848360. Policy #0 lag: (min: 0.0, avg: 10.7, max: 21.0) [2024-06-06 13:21:11,561][14064] Avg episode reward: [(0, '0.154')] [2024-06-06 13:21:14,209][14296] Updated weights for policy 0, policy_version 17147 (0.0039) [2024-06-06 13:21:16,561][14064] Fps is (10 sec: 50790.1, 60 sec: 47513.7, 300 sec: 47596.9). Total num frames: 281051136. Throughput: 0: 47111.3. Samples: 134133620. 
Policy #0 lag: (min: 0.0, avg: 11.3, max: 22.0) [2024-06-06 13:21:16,562][14064] Avg episode reward: [(0, '0.156')] [2024-06-06 13:21:17,351][14296] Updated weights for policy 0, policy_version 17157 (0.0028) [2024-06-06 13:21:21,239][14296] Updated weights for policy 0, policy_version 17167 (0.0033) [2024-06-06 13:21:21,563][14064] Fps is (10 sec: 45867.3, 60 sec: 46966.1, 300 sec: 47430.0). Total num frames: 281264128. Throughput: 0: 47305.6. Samples: 134278780. Policy #0 lag: (min: 0.0, avg: 11.3, max: 22.0) [2024-06-06 13:21:21,563][14064] Avg episode reward: [(0, '0.158')] [2024-06-06 13:21:24,345][14296] Updated weights for policy 0, policy_version 17177 (0.0030) [2024-06-06 13:21:26,561][14064] Fps is (10 sec: 45874.5, 60 sec: 47513.4, 300 sec: 47596.9). Total num frames: 281509888. Throughput: 0: 47315.5. Samples: 134564820. Policy #0 lag: (min: 0.0, avg: 11.3, max: 22.0) [2024-06-06 13:21:26,562][14064] Avg episode reward: [(0, '0.159')] [2024-06-06 13:21:28,268][14296] Updated weights for policy 0, policy_version 17187 (0.0042) [2024-06-06 13:21:31,074][14296] Updated weights for policy 0, policy_version 17197 (0.0025) [2024-06-06 13:21:31,561][14064] Fps is (10 sec: 50798.9, 60 sec: 47786.9, 300 sec: 47541.4). Total num frames: 281772032. Throughput: 0: 47187.2. Samples: 134847300. Policy #0 lag: (min: 0.0, avg: 9.3, max: 21.0) [2024-06-06 13:21:31,562][14064] Avg episode reward: [(0, '0.162')] [2024-06-06 13:21:35,116][14296] Updated weights for policy 0, policy_version 17207 (0.0026) [2024-06-06 13:21:36,561][14064] Fps is (10 sec: 50791.0, 60 sec: 47786.8, 300 sec: 47596.9). Total num frames: 282017792. Throughput: 0: 47428.9. Samples: 134996560. Policy #0 lag: (min: 0.0, avg: 9.3, max: 21.0) [2024-06-06 13:21:36,562][14064] Avg episode reward: [(0, '0.165')] [2024-06-06 13:21:36,580][14276] Saving new best policy, reward=0.165! 
[2024-06-06 13:21:37,848][14296] Updated weights for policy 0, policy_version 17217 (0.0031) [2024-06-06 13:21:41,561][14064] Fps is (10 sec: 44237.0, 60 sec: 46969.6, 300 sec: 47485.8). Total num frames: 282214400. Throughput: 0: 47811.7. Samples: 135286320. Policy #0 lag: (min: 0.0, avg: 9.9, max: 20.0) [2024-06-06 13:21:41,562][14064] Avg episode reward: [(0, '0.158')] [2024-06-06 13:21:41,815][14296] Updated weights for policy 0, policy_version 17227 (0.0035) [2024-06-06 13:21:44,716][14296] Updated weights for policy 0, policy_version 17237 (0.0025) [2024-06-06 13:21:46,561][14064] Fps is (10 sec: 45875.2, 60 sec: 47513.6, 300 sec: 47652.4). Total num frames: 282476544. Throughput: 0: 47664.0. Samples: 135571920. Policy #0 lag: (min: 0.0, avg: 9.9, max: 20.0) [2024-06-06 13:21:46,562][14064] Avg episode reward: [(0, '0.159')] [2024-06-06 13:21:48,710][14296] Updated weights for policy 0, policy_version 17247 (0.0018) [2024-06-06 13:21:51,561][14064] Fps is (10 sec: 50789.4, 60 sec: 47786.6, 300 sec: 47596.9). Total num frames: 282722304. Throughput: 0: 47797.6. Samples: 135715040. Policy #0 lag: (min: 0.0, avg: 9.9, max: 20.0) [2024-06-06 13:21:51,562][14064] Avg episode reward: [(0, '0.161')] [2024-06-06 13:21:51,762][14296] Updated weights for policy 0, policy_version 17257 (0.0022) [2024-06-06 13:21:55,560][14296] Updated weights for policy 0, policy_version 17267 (0.0021) [2024-06-06 13:21:56,562][14064] Fps is (10 sec: 47509.3, 60 sec: 47513.9, 300 sec: 47541.5). Total num frames: 282951680. Throughput: 0: 47917.1. Samples: 136004680. Policy #0 lag: (min: 0.0, avg: 10.5, max: 21.0) [2024-06-06 13:21:56,563][14064] Avg episode reward: [(0, '0.156')] [2024-06-06 13:21:58,469][14296] Updated weights for policy 0, policy_version 17277 (0.0031) [2024-06-06 13:22:01,561][14064] Fps is (10 sec: 45875.0, 60 sec: 47786.5, 300 sec: 47652.4). Total num frames: 283181056. Throughput: 0: 48023.4. Samples: 136294680. 
Policy #0 lag: (min: 0.0, avg: 10.5, max: 21.0) [2024-06-06 13:22:01,562][14064] Avg episode reward: [(0, '0.162')] [2024-06-06 13:22:02,302][14296] Updated weights for policy 0, policy_version 17287 (0.0030) [2024-06-06 13:22:05,443][14296] Updated weights for policy 0, policy_version 17297 (0.0039) [2024-06-06 13:22:06,561][14064] Fps is (10 sec: 49156.9, 60 sec: 48332.8, 300 sec: 47652.4). Total num frames: 283443200. Throughput: 0: 47776.9. Samples: 136428660. Policy #0 lag: (min: 0.0, avg: 10.7, max: 21.0) [2024-06-06 13:22:06,562][14064] Avg episode reward: [(0, '0.162')] [2024-06-06 13:22:06,581][14276] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000017300_283443200.pth... [2024-06-06 13:22:06,626][14276] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000016602_272007168.pth [2024-06-06 13:22:09,173][14296] Updated weights for policy 0, policy_version 17307 (0.0039) [2024-06-06 13:22:11,561][14064] Fps is (10 sec: 50791.1, 60 sec: 48059.6, 300 sec: 47652.4). Total num frames: 283688960. Throughput: 0: 47732.6. Samples: 136712780. Policy #0 lag: (min: 0.0, avg: 10.7, max: 21.0) [2024-06-06 13:22:11,562][14064] Avg episode reward: [(0, '0.165')] [2024-06-06 13:22:12,299][14296] Updated weights for policy 0, policy_version 17317 (0.0035) [2024-06-06 13:22:16,008][14296] Updated weights for policy 0, policy_version 17327 (0.0038) [2024-06-06 13:22:16,561][14064] Fps is (10 sec: 45874.5, 60 sec: 47513.5, 300 sec: 47596.9). Total num frames: 283901952. Throughput: 0: 48063.9. Samples: 137010180. Policy #0 lag: (min: 0.0, avg: 10.7, max: 21.0) [2024-06-06 13:22:16,562][14064] Avg episode reward: [(0, '0.162')] [2024-06-06 13:22:19,171][14296] Updated weights for policy 0, policy_version 17337 (0.0034) [2024-06-06 13:22:21,561][14064] Fps is (10 sec: 45875.4, 60 sec: 48061.0, 300 sec: 47652.9). Total num frames: 284147712. Throughput: 0: 47861.8. Samples: 137150340. 
Policy #0 lag: (min: 0.0, avg: 12.5, max: 21.0) [2024-06-06 13:22:21,562][14064] Avg episode reward: [(0, '0.152')] [2024-06-06 13:22:22,918][14296] Updated weights for policy 0, policy_version 17347 (0.0026) [2024-06-06 13:22:25,904][14296] Updated weights for policy 0, policy_version 17357 (0.0026) [2024-06-06 13:22:26,561][14064] Fps is (10 sec: 49152.1, 60 sec: 48059.8, 300 sec: 47652.4). Total num frames: 284393472. Throughput: 0: 47776.7. Samples: 137436280. Policy #0 lag: (min: 0.0, avg: 12.5, max: 21.0) [2024-06-06 13:22:26,562][14064] Avg episode reward: [(0, '0.161')] [2024-06-06 13:22:29,563][14296] Updated weights for policy 0, policy_version 17367 (0.0026) [2024-06-06 13:22:31,561][14064] Fps is (10 sec: 49152.0, 60 sec: 47786.6, 300 sec: 47596.9). Total num frames: 284639232. Throughput: 0: 47868.9. Samples: 137726020. Policy #0 lag: (min: 0.0, avg: 9.6, max: 23.0) [2024-06-06 13:22:31,562][14064] Avg episode reward: [(0, '0.163')] [2024-06-06 13:22:32,734][14296] Updated weights for policy 0, policy_version 17377 (0.0024) [2024-06-06 13:22:36,441][14296] Updated weights for policy 0, policy_version 17387 (0.0031) [2024-06-06 13:22:36,561][14064] Fps is (10 sec: 47514.5, 60 sec: 47513.7, 300 sec: 47763.5). Total num frames: 284868608. Throughput: 0: 47942.9. Samples: 137872460. Policy #0 lag: (min: 0.0, avg: 9.6, max: 23.0) [2024-06-06 13:22:36,561][14064] Avg episode reward: [(0, '0.160')] [2024-06-06 13:22:39,719][14296] Updated weights for policy 0, policy_version 17397 (0.0032) [2024-06-06 13:22:41,047][14276] Signal inference workers to stop experience collection... (2050 times) [2024-06-06 13:22:41,048][14276] Signal inference workers to resume experience collection... 
(2050 times) [2024-06-06 13:22:41,086][14296] InferenceWorker_p0-w0: stopping experience collection (2050 times) [2024-06-06 13:22:41,086][14296] InferenceWorker_p0-w0: resuming experience collection (2050 times) [2024-06-06 13:22:41,561][14064] Fps is (10 sec: 47513.4, 60 sec: 48332.7, 300 sec: 47708.0). Total num frames: 285114368. Throughput: 0: 47909.0. Samples: 138160540. Policy #0 lag: (min: 0.0, avg: 9.6, max: 23.0) [2024-06-06 13:22:41,562][14064] Avg episode reward: [(0, '0.160')] [2024-06-06 13:22:43,190][14296] Updated weights for policy 0, policy_version 17407 (0.0036) [2024-06-06 13:22:46,514][14296] Updated weights for policy 0, policy_version 17417 (0.0026) [2024-06-06 13:22:46,561][14064] Fps is (10 sec: 49151.2, 60 sec: 48059.7, 300 sec: 47708.0). Total num frames: 285360128. Throughput: 0: 47655.7. Samples: 138439180. Policy #0 lag: (min: 1.0, avg: 9.5, max: 21.0) [2024-06-06 13:22:46,562][14064] Avg episode reward: [(0, '0.158')] [2024-06-06 13:22:50,241][14296] Updated weights for policy 0, policy_version 17427 (0.0033) [2024-06-06 13:22:51,561][14064] Fps is (10 sec: 49152.4, 60 sec: 48059.9, 300 sec: 47708.0). Total num frames: 285605888. Throughput: 0: 48021.8. Samples: 138589640. Policy #0 lag: (min: 1.0, avg: 9.5, max: 21.0) [2024-06-06 13:22:51,562][14064] Avg episode reward: [(0, '0.159')] [2024-06-06 13:22:53,115][14296] Updated weights for policy 0, policy_version 17437 (0.0029) [2024-06-06 13:22:56,561][14064] Fps is (10 sec: 44237.5, 60 sec: 47514.4, 300 sec: 47652.5). Total num frames: 285802496. Throughput: 0: 48001.0. Samples: 138872820. Policy #0 lag: (min: 1.0, avg: 10.7, max: 22.0) [2024-06-06 13:22:56,562][14064] Avg episode reward: [(0, '0.163')] [2024-06-06 13:22:57,119][14296] Updated weights for policy 0, policy_version 17447 (0.0024) [2024-06-06 13:23:00,470][14296] Updated weights for policy 0, policy_version 17457 (0.0029) [2024-06-06 13:23:01,562][14064] Fps is (10 sec: 45872.1, 60 sec: 48059.4, 300 sec: 47652.3). 
Total num frames: 286064640. Throughput: 0: 47749.2. Samples: 139158920. Policy #0 lag: (min: 1.0, avg: 10.7, max: 22.0) [2024-06-06 13:23:01,563][14064] Avg episode reward: [(0, '0.163')] [2024-06-06 13:23:04,174][14296] Updated weights for policy 0, policy_version 17467 (0.0032) [2024-06-06 13:23:06,561][14064] Fps is (10 sec: 50789.6, 60 sec: 47786.6, 300 sec: 47652.4). Total num frames: 286310400. Throughput: 0: 47855.0. Samples: 139303820. Policy #0 lag: (min: 1.0, avg: 10.7, max: 22.0) [2024-06-06 13:23:06,562][14064] Avg episode reward: [(0, '0.163')] [2024-06-06 13:23:07,351][14296] Updated weights for policy 0, policy_version 17477 (0.0034) [2024-06-06 13:23:11,036][14296] Updated weights for policy 0, policy_version 17487 (0.0032) [2024-06-06 13:23:11,561][14064] Fps is (10 sec: 47516.9, 60 sec: 47513.7, 300 sec: 47708.0). Total num frames: 286539776. Throughput: 0: 47852.6. Samples: 139589640. Policy #0 lag: (min: 1.0, avg: 9.3, max: 20.0) [2024-06-06 13:23:11,562][14064] Avg episode reward: [(0, '0.162')] [2024-06-06 13:23:14,443][14296] Updated weights for policy 0, policy_version 17497 (0.0035) [2024-06-06 13:23:16,561][14064] Fps is (10 sec: 45875.7, 60 sec: 47786.8, 300 sec: 47708.0). Total num frames: 286769152. Throughput: 0: 47708.5. Samples: 139872900. Policy #0 lag: (min: 1.0, avg: 9.3, max: 20.0) [2024-06-06 13:23:16,561][14064] Avg episode reward: [(0, '0.169')] [2024-06-06 13:23:16,572][14276] Saving new best policy, reward=0.169! [2024-06-06 13:23:17,913][14296] Updated weights for policy 0, policy_version 17507 (0.0029) [2024-06-06 13:23:21,078][14296] Updated weights for policy 0, policy_version 17517 (0.0032) [2024-06-06 13:23:21,561][14064] Fps is (10 sec: 47512.9, 60 sec: 47786.6, 300 sec: 47652.9). Total num frames: 287014912. Throughput: 0: 47596.7. Samples: 140014320. 
Policy #0 lag: (min: 0.0, avg: 12.4, max: 23.0) [2024-06-06 13:23:21,562][14064] Avg episode reward: [(0, '0.162')] [2024-06-06 13:23:24,679][14296] Updated weights for policy 0, policy_version 17527 (0.0028) [2024-06-06 13:23:26,561][14064] Fps is (10 sec: 49151.4, 60 sec: 47786.7, 300 sec: 47596.9). Total num frames: 287260672. Throughput: 0: 47484.8. Samples: 140297360. Policy #0 lag: (min: 0.0, avg: 12.4, max: 23.0) [2024-06-06 13:23:26,562][14064] Avg episode reward: [(0, '0.161')] [2024-06-06 13:23:27,879][14296] Updated weights for policy 0, policy_version 17537 (0.0025) [2024-06-06 13:23:31,559][14296] Updated weights for policy 0, policy_version 17547 (0.0036) [2024-06-06 13:23:31,561][14064] Fps is (10 sec: 47514.4, 60 sec: 47513.6, 300 sec: 47763.5). Total num frames: 287490048. Throughput: 0: 47860.1. Samples: 140592880. Policy #0 lag: (min: 0.0, avg: 12.4, max: 23.0) [2024-06-06 13:23:31,562][14064] Avg episode reward: [(0, '0.165')] [2024-06-06 13:23:34,952][14296] Updated weights for policy 0, policy_version 17557 (0.0032) [2024-06-06 13:23:36,561][14064] Fps is (10 sec: 47514.2, 60 sec: 47786.6, 300 sec: 47708.0). Total num frames: 287735808. Throughput: 0: 47632.0. Samples: 140733080. Policy #0 lag: (min: 1.0, avg: 11.0, max: 23.0) [2024-06-06 13:23:36,562][14064] Avg episode reward: [(0, '0.164')] [2024-06-06 13:23:38,380][14296] Updated weights for policy 0, policy_version 17567 (0.0036) [2024-06-06 13:23:41,561][14064] Fps is (10 sec: 45874.9, 60 sec: 47240.6, 300 sec: 47596.9). Total num frames: 287948800. Throughput: 0: 47528.3. Samples: 141011600. Policy #0 lag: (min: 1.0, avg: 11.0, max: 23.0) [2024-06-06 13:23:41,562][14064] Avg episode reward: [(0, '0.166')] [2024-06-06 13:23:41,978][14296] Updated weights for policy 0, policy_version 17577 (0.0020) [2024-06-06 13:23:42,078][14276] Signal inference workers to stop experience collection... 
(2100 times) [2024-06-06 13:23:42,123][14296] InferenceWorker_p0-w0: stopping experience collection (2100 times) [2024-06-06 13:23:42,130][14276] Signal inference workers to resume experience collection... (2100 times) [2024-06-06 13:23:42,138][14296] InferenceWorker_p0-w0: resuming experience collection (2100 times) [2024-06-06 13:23:45,241][14296] Updated weights for policy 0, policy_version 17587 (0.0036) [2024-06-06 13:23:46,561][14064] Fps is (10 sec: 49152.0, 60 sec: 47786.7, 300 sec: 47708.0). Total num frames: 288227328. Throughput: 0: 47540.3. Samples: 141298200. Policy #0 lag: (min: 0.0, avg: 10.3, max: 21.0) [2024-06-06 13:23:46,562][14064] Avg episode reward: [(0, '0.166')] [2024-06-06 13:23:48,557][14296] Updated weights for policy 0, policy_version 17597 (0.0030) [2024-06-06 13:23:51,561][14064] Fps is (10 sec: 47513.9, 60 sec: 46967.5, 300 sec: 47652.5). Total num frames: 288423936. Throughput: 0: 47620.6. Samples: 141446740. Policy #0 lag: (min: 0.0, avg: 10.3, max: 21.0) [2024-06-06 13:23:51,562][14064] Avg episode reward: [(0, '0.164')] [2024-06-06 13:23:51,886][14296] Updated weights for policy 0, policy_version 17607 (0.0028) [2024-06-06 13:23:55,283][14296] Updated weights for policy 0, policy_version 17617 (0.0036) [2024-06-06 13:23:56,561][14064] Fps is (10 sec: 45875.5, 60 sec: 48059.7, 300 sec: 47652.5). Total num frames: 288686080. Throughput: 0: 47728.9. Samples: 141737440. Policy #0 lag: (min: 0.0, avg: 10.3, max: 21.0) [2024-06-06 13:23:56,561][14064] Avg episode reward: [(0, '0.165')] [2024-06-06 13:23:58,794][14296] Updated weights for policy 0, policy_version 17627 (0.0036) [2024-06-06 13:24:01,561][14064] Fps is (10 sec: 50790.2, 60 sec: 47787.2, 300 sec: 47652.4). Total num frames: 288931840. Throughput: 0: 47831.1. Samples: 142025300. 
Policy #0 lag: (min: 0.0, avg: 11.4, max: 22.0)
[2024-06-06 13:24:01,562][14064] Avg episode reward: [(0, '0.166')]
[2024-06-06 13:24:02,231][14296] Updated weights for policy 0, policy_version 17637 (0.0037)
[2024-06-06 13:24:05,611][14296] Updated weights for policy 0, policy_version 17647 (0.0031)
[2024-06-06 13:24:06,561][14064] Fps is (10 sec: 49151.8, 60 sec: 47786.8, 300 sec: 47763.5). Total num frames: 289177600. Throughput: 0: 47796.2. Samples: 142165140. Policy #0 lag: (min: 0.0, avg: 11.4, max: 22.0)
[2024-06-06 13:24:06,562][14064] Avg episode reward: [(0, '0.164')]
[2024-06-06 13:24:06,582][14276] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000017650_289177600.pth...
[2024-06-06 13:24:06,637][14276] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000016951_277725184.pth
[2024-06-06 13:24:09,230][14296] Updated weights for policy 0, policy_version 17657 (0.0029)
[2024-06-06 13:24:11,561][14064] Fps is (10 sec: 45874.7, 60 sec: 47513.5, 300 sec: 47652.4). Total num frames: 289390592. Throughput: 0: 47848.0. Samples: 142450520. Policy #0 lag: (min: 0.0, avg: 10.0, max: 22.0)
[2024-06-06 13:24:11,562][14064] Avg episode reward: [(0, '0.160')]
[2024-06-06 13:24:12,411][14296] Updated weights for policy 0, policy_version 17667 (0.0040)
[2024-06-06 13:24:15,834][14296] Updated weights for policy 0, policy_version 17677 (0.0027)
[2024-06-06 13:24:16,561][14064] Fps is (10 sec: 45874.9, 60 sec: 47786.6, 300 sec: 47652.9). Total num frames: 289636352. Throughput: 0: 47724.8. Samples: 142740500. Policy #0 lag: (min: 0.0, avg: 10.0, max: 22.0)
[2024-06-06 13:24:16,562][14064] Avg episode reward: [(0, '0.162')]
[2024-06-06 13:24:19,265][14296] Updated weights for policy 0, policy_version 17687 (0.0036)
[2024-06-06 13:24:21,561][14064] Fps is (10 sec: 49152.3, 60 sec: 47786.7, 300 sec: 47596.9). Total num frames: 289882112. Throughput: 0: 47766.1. Samples: 142882560. Policy #0 lag: (min: 0.0, avg: 10.0, max: 22.0)
[2024-06-06 13:24:21,562][14064] Avg episode reward: [(0, '0.167')]
[2024-06-06 13:24:22,731][14296] Updated weights for policy 0, policy_version 17697 (0.0034)
[2024-06-06 13:24:26,090][14296] Updated weights for policy 0, policy_version 17707 (0.0035)
[2024-06-06 13:24:26,561][14064] Fps is (10 sec: 49152.3, 60 sec: 47786.8, 300 sec: 47819.1). Total num frames: 290127872. Throughput: 0: 48130.7. Samples: 143177480. Policy #0 lag: (min: 0.0, avg: 10.8, max: 21.0)
[2024-06-06 13:24:26,562][14064] Avg episode reward: [(0, '0.156')]
[2024-06-06 13:24:29,760][14296] Updated weights for policy 0, policy_version 17717 (0.0035)
[2024-06-06 13:24:31,561][14064] Fps is (10 sec: 49152.3, 60 sec: 48059.7, 300 sec: 47819.1). Total num frames: 290373632. Throughput: 0: 47831.6. Samples: 143450620. Policy #0 lag: (min: 0.0, avg: 10.8, max: 21.0)
[2024-06-06 13:24:31,562][14064] Avg episode reward: [(0, '0.166')]
[2024-06-06 13:24:33,008][14296] Updated weights for policy 0, policy_version 17727 (0.0029)
[2024-06-06 13:24:36,561][14064] Fps is (10 sec: 45874.5, 60 sec: 47513.5, 300 sec: 47652.4). Total num frames: 290586624. Throughput: 0: 47670.5. Samples: 143591920. Policy #0 lag: (min: 2.0, avg: 11.4, max: 21.0)
[2024-06-06 13:24:36,562][14064] Avg episode reward: [(0, '0.157')]
[2024-06-06 13:24:36,631][14296] Updated weights for policy 0, policy_version 17737 (0.0033)
[2024-06-06 13:24:39,903][14296] Updated weights for policy 0, policy_version 17747 (0.0027)
[2024-06-06 13:24:41,561][14064] Fps is (10 sec: 47513.4, 60 sec: 48332.8, 300 sec: 47708.0). Total num frames: 290848768. Throughput: 0: 47616.3. Samples: 143880180. Policy #0 lag: (min: 2.0, avg: 11.4, max: 21.0)
[2024-06-06 13:24:41,562][14064] Avg episode reward: [(0, '0.163')]
[2024-06-06 13:24:43,383][14296] Updated weights for policy 0, policy_version 17757 (0.0031)
[2024-06-06 13:24:46,561][14064] Fps is (10 sec: 47514.4, 60 sec: 47240.5, 300 sec: 47652.4). Total num frames: 291061760. Throughput: 0: 47620.9. Samples: 144168240. Policy #0 lag: (min: 2.0, avg: 11.4, max: 21.0)
[2024-06-06 13:24:46,561][14064] Avg episode reward: [(0, '0.168')]
[2024-06-06 13:24:46,743][14296] Updated weights for policy 0, policy_version 17767 (0.0035)
[2024-06-06 13:24:47,941][14276] Signal inference workers to stop experience collection... (2150 times)
[2024-06-06 13:24:47,946][14276] Signal inference workers to resume experience collection... (2150 times)
[2024-06-06 13:24:47,962][14296] InferenceWorker_p0-w0: stopping experience collection (2150 times)
[2024-06-06 13:24:47,963][14296] InferenceWorker_p0-w0: resuming experience collection (2150 times)
[2024-06-06 13:24:50,391][14296] Updated weights for policy 0, policy_version 17777 (0.0030)
[2024-06-06 13:24:51,561][14064] Fps is (10 sec: 45875.5, 60 sec: 48059.7, 300 sec: 47708.0). Total num frames: 291307520. Throughput: 0: 47630.7. Samples: 144308520. Policy #0 lag: (min: 0.0, avg: 11.0, max: 24.0)
[2024-06-06 13:24:51,562][14064] Avg episode reward: [(0, '0.164')]
[2024-06-06 13:24:53,609][14296] Updated weights for policy 0, policy_version 17787 (0.0028)
[2024-06-06 13:24:56,561][14064] Fps is (10 sec: 49151.7, 60 sec: 47786.6, 300 sec: 47652.9). Total num frames: 291553280. Throughput: 0: 47594.3. Samples: 144592260. Policy #0 lag: (min: 0.0, avg: 11.0, max: 24.0)
[2024-06-06 13:24:56,562][14064] Avg episode reward: [(0, '0.159')]
[2024-06-06 13:24:57,190][14296] Updated weights for policy 0, policy_version 17797 (0.0031)
[2024-06-06 13:25:00,488][14296] Updated weights for policy 0, policy_version 17807 (0.0031)
[2024-06-06 13:25:01,561][14064] Fps is (10 sec: 47513.3, 60 sec: 47513.6, 300 sec: 47652.5). Total num frames: 291782656. Throughput: 0: 47620.0. Samples: 144883400. Policy #0 lag: (min: 0.0, avg: 9.4, max: 21.0)
[2024-06-06 13:25:01,562][14064] Avg episode reward: [(0, '0.165')]
[2024-06-06 13:25:04,170][14296] Updated weights for policy 0, policy_version 17817 (0.0027)
[2024-06-06 13:25:06,561][14064] Fps is (10 sec: 45874.8, 60 sec: 47240.4, 300 sec: 47652.4). Total num frames: 292012032. Throughput: 0: 47508.4. Samples: 145020440. Policy #0 lag: (min: 0.0, avg: 9.4, max: 21.0)
[2024-06-06 13:25:06,562][14064] Avg episode reward: [(0, '0.162')]
[2024-06-06 13:25:07,429][14296] Updated weights for policy 0, policy_version 17827 (0.0033)
[2024-06-06 13:25:11,021][14296] Updated weights for policy 0, policy_version 17837 (0.0027)
[2024-06-06 13:25:11,561][14064] Fps is (10 sec: 47513.2, 60 sec: 47786.7, 300 sec: 47652.4). Total num frames: 292257792. Throughput: 0: 47398.5. Samples: 145310420. Policy #0 lag: (min: 0.0, avg: 9.4, max: 21.0)
[2024-06-06 13:25:11,562][14064] Avg episode reward: [(0, '0.159')]
[2024-06-06 13:25:14,199][14296] Updated weights for policy 0, policy_version 17847 (0.0026)
[2024-06-06 13:25:16,561][14064] Fps is (10 sec: 49151.6, 60 sec: 47786.6, 300 sec: 47652.4). Total num frames: 292503552. Throughput: 0: 47747.3. Samples: 145599260. Policy #0 lag: (min: 0.0, avg: 10.4, max: 21.0)
[2024-06-06 13:25:16,562][14064] Avg episode reward: [(0, '0.163')]
[2024-06-06 13:25:17,859][14296] Updated weights for policy 0, policy_version 17857 (0.0022)
[2024-06-06 13:25:20,966][14296] Updated weights for policy 0, policy_version 17867 (0.0029)
[2024-06-06 13:25:21,561][14064] Fps is (10 sec: 49151.7, 60 sec: 47786.6, 300 sec: 47763.5). Total num frames: 292749312. Throughput: 0: 47779.5. Samples: 145742000. Policy #0 lag: (min: 0.0, avg: 10.4, max: 21.0)
[2024-06-06 13:25:21,562][14064] Avg episode reward: [(0, '0.163')]
[2024-06-06 13:25:24,879][14296] Updated weights for policy 0, policy_version 17877 (0.0033)
[2024-06-06 13:25:26,561][14064] Fps is (10 sec: 49152.9, 60 sec: 47786.7, 300 sec: 47763.6). Total num frames: 292995072. Throughput: 0: 47782.3. Samples: 146030380. Policy #0 lag: (min: 1.0, avg: 9.3, max: 22.0)
[2024-06-06 13:25:26,562][14064] Avg episode reward: [(0, '0.166')]
[2024-06-06 13:25:27,863][14296] Updated weights for policy 0, policy_version 17887 (0.0028)
[2024-06-06 13:25:31,553][14296] Updated weights for policy 0, policy_version 17897 (0.0029)
[2024-06-06 13:25:31,561][14064] Fps is (10 sec: 47514.1, 60 sec: 47513.5, 300 sec: 47708.0). Total num frames: 293224448. Throughput: 0: 47805.2. Samples: 146319480. Policy #0 lag: (min: 1.0, avg: 9.3, max: 22.0)
[2024-06-06 13:25:31,562][14064] Avg episode reward: [(0, '0.165')]
[2024-06-06 13:25:34,953][14296] Updated weights for policy 0, policy_version 17907 (0.0026)
[2024-06-06 13:25:36,561][14064] Fps is (10 sec: 47513.6, 60 sec: 48059.8, 300 sec: 47708.4). Total num frames: 293470208. Throughput: 0: 47822.6. Samples: 146460540. Policy #0 lag: (min: 1.0, avg: 9.3, max: 22.0)
[2024-06-06 13:25:36,562][14064] Avg episode reward: [(0, '0.165')]
[2024-06-06 13:25:38,342][14296] Updated weights for policy 0, policy_version 17917 (0.0023)
[2024-06-06 13:25:41,561][14064] Fps is (10 sec: 47514.2, 60 sec: 47513.7, 300 sec: 47708.0). Total num frames: 293699584. Throughput: 0: 47984.1. Samples: 146751540. Policy #0 lag: (min: 0.0, avg: 11.5, max: 21.0)
[2024-06-06 13:25:41,561][14064] Avg episode reward: [(0, '0.171')]
[2024-06-06 13:25:41,650][14276] Saving new best policy, reward=0.171!
[2024-06-06 13:25:41,656][14296] Updated weights for policy 0, policy_version 17927 (0.0027)
[2024-06-06 13:25:45,265][14296] Updated weights for policy 0, policy_version 17937 (0.0030)
[2024-06-06 13:25:46,561][14064] Fps is (10 sec: 49152.2, 60 sec: 48332.8, 300 sec: 47819.1). Total num frames: 293961728. Throughput: 0: 47960.1. Samples: 147041600. Policy #0 lag: (min: 0.0, avg: 11.5, max: 21.0)
[2024-06-06 13:25:46,562][14064] Avg episode reward: [(0, '0.169')]
[2024-06-06 13:25:48,419][14296] Updated weights for policy 0, policy_version 17947 (0.0034)
[2024-06-06 13:25:49,805][14276] Signal inference workers to stop experience collection... (2200 times)
[2024-06-06 13:25:49,848][14296] InferenceWorker_p0-w0: stopping experience collection (2200 times)
[2024-06-06 13:25:49,858][14276] Signal inference workers to resume experience collection... (2200 times)
[2024-06-06 13:25:49,868][14296] InferenceWorker_p0-w0: resuming experience collection (2200 times)
[2024-06-06 13:25:51,561][14064] Fps is (10 sec: 47512.4, 60 sec: 47786.5, 300 sec: 47708.2). Total num frames: 294174720. Throughput: 0: 48099.9. Samples: 147184940. Policy #0 lag: (min: 0.0, avg: 10.9, max: 23.0)
[2024-06-06 13:25:51,562][14064] Avg episode reward: [(0, '0.166')]
[2024-06-06 13:25:52,099][14296] Updated weights for policy 0, policy_version 17957 (0.0024)
[2024-06-06 13:25:55,271][14296] Updated weights for policy 0, policy_version 17967 (0.0025)
[2024-06-06 13:25:56,561][14064] Fps is (10 sec: 45874.9, 60 sec: 47786.7, 300 sec: 47819.1). Total num frames: 294420480. Throughput: 0: 47997.9. Samples: 147470320. Policy #0 lag: (min: 0.0, avg: 10.9, max: 23.0)
[2024-06-06 13:25:56,562][14064] Avg episode reward: [(0, '0.160')]
[2024-06-06 13:25:58,989][14296] Updated weights for policy 0, policy_version 17977 (0.0023)
[2024-06-06 13:26:01,561][14064] Fps is (10 sec: 47514.6, 60 sec: 47786.7, 300 sec: 47819.1). Total num frames: 294649856. Throughput: 0: 47865.6. Samples: 147753200. Policy #0 lag: (min: 0.0, avg: 10.9, max: 23.0)
[2024-06-06 13:26:01,562][14064] Avg episode reward: [(0, '0.169')]
[2024-06-06 13:26:02,372][14296] Updated weights for policy 0, policy_version 17987 (0.0038)
[2024-06-06 13:26:05,740][14296] Updated weights for policy 0, policy_version 17997 (0.0038)
[2024-06-06 13:26:06,561][14064] Fps is (10 sec: 49152.0, 60 sec: 48332.9, 300 sec: 47819.1). Total num frames: 294912000. Throughput: 0: 47826.8. Samples: 147894200. Policy #0 lag: (min: 1.0, avg: 10.7, max: 21.0)
[2024-06-06 13:26:06,562][14064] Avg episode reward: [(0, '0.158')]
[2024-06-06 13:26:06,584][14276] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000018000_294912000.pth...
[2024-06-06 13:26:06,632][14276] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000017300_283443200.pth
[2024-06-06 13:26:09,059][14296] Updated weights for policy 0, policy_version 18007 (0.0035)
[2024-06-06 13:26:11,561][14064] Fps is (10 sec: 49152.0, 60 sec: 48059.8, 300 sec: 47763.5). Total num frames: 295141376. Throughput: 0: 47715.1. Samples: 148177560. Policy #0 lag: (min: 1.0, avg: 10.7, max: 21.0)
[2024-06-06 13:26:11,562][14064] Avg episode reward: [(0, '0.162')]
[2024-06-06 13:26:12,712][14296] Updated weights for policy 0, policy_version 18017 (0.0043)
[2024-06-06 13:26:15,887][14296] Updated weights for policy 0, policy_version 18027 (0.0029)
[2024-06-06 13:26:16,561][14064] Fps is (10 sec: 45875.3, 60 sec: 47786.8, 300 sec: 47819.3). Total num frames: 295370752. Throughput: 0: 47737.9. Samples: 148467680. Policy #0 lag: (min: 1.0, avg: 10.7, max: 21.0)
[2024-06-06 13:26:16,562][14064] Avg episode reward: [(0, '0.169')]
[2024-06-06 13:26:19,589][14296] Updated weights for policy 0, policy_version 18037 (0.0038)
[2024-06-06 13:26:21,561][14064] Fps is (10 sec: 45874.8, 60 sec: 47513.7, 300 sec: 47763.5). Total num frames: 295600128. Throughput: 0: 47867.0. Samples: 148614560. Policy #0 lag: (min: 0.0, avg: 10.3, max: 21.0)
[2024-06-06 13:26:21,562][14064] Avg episode reward: [(0, '0.164')]
[2024-06-06 13:26:22,738][14296] Updated weights for policy 0, policy_version 18047 (0.0029)
[2024-06-06 13:26:26,355][14296] Updated weights for policy 0, policy_version 18057 (0.0030)
[2024-06-06 13:26:26,561][14064] Fps is (10 sec: 47513.4, 60 sec: 47513.6, 300 sec: 47708.0). Total num frames: 295845888. Throughput: 0: 47639.4. Samples: 148895320. Policy #0 lag: (min: 0.0, avg: 10.3, max: 21.0)
[2024-06-06 13:26:26,562][14064] Avg episode reward: [(0, '0.163')]
[2024-06-06 13:26:29,772][14296] Updated weights for policy 0, policy_version 18067 (0.0020)
[2024-06-06 13:26:31,561][14064] Fps is (10 sec: 49152.4, 60 sec: 47786.7, 300 sec: 47708.0). Total num frames: 296091648. Throughput: 0: 47464.0. Samples: 149177480. Policy #0 lag: (min: 0.0, avg: 10.2, max: 23.0)
[2024-06-06 13:26:31,562][14064] Avg episode reward: [(0, '0.164')]
[2024-06-06 13:26:33,211][14296] Updated weights for policy 0, policy_version 18077 (0.0032)
[2024-06-06 13:26:36,398][14296] Updated weights for policy 0, policy_version 18087 (0.0029)
[2024-06-06 13:26:36,561][14064] Fps is (10 sec: 49152.4, 60 sec: 47786.7, 300 sec: 47874.6). Total num frames: 296337408. Throughput: 0: 47614.5. Samples: 149327580. Policy #0 lag: (min: 0.0, avg: 10.2, max: 23.0)
[2024-06-06 13:26:36,562][14064] Avg episode reward: [(0, '0.163')]
[2024-06-06 13:26:40,118][14296] Updated weights for policy 0, policy_version 18097 (0.0024)
[2024-06-06 13:26:41,561][14064] Fps is (10 sec: 45875.5, 60 sec: 47513.6, 300 sec: 47708.0). Total num frames: 296550400. Throughput: 0: 47573.9. Samples: 149611140. Policy #0 lag: (min: 0.0, avg: 10.2, max: 23.0)
[2024-06-06 13:26:41,561][14064] Avg episode reward: [(0, '0.162')]
[2024-06-06 13:26:43,248][14296] Updated weights for policy 0, policy_version 18107 (0.0022)
[2024-06-06 13:26:46,561][14064] Fps is (10 sec: 44236.6, 60 sec: 46967.4, 300 sec: 47652.5). Total num frames: 296779776. Throughput: 0: 47654.7. Samples: 149897660. Policy #0 lag: (min: 0.0, avg: 11.1, max: 22.0)
[2024-06-06 13:26:46,562][14064] Avg episode reward: [(0, '0.160')]
[2024-06-06 13:26:46,935][14296] Updated weights for policy 0, policy_version 18117 (0.0020)
[2024-06-06 13:26:50,268][14296] Updated weights for policy 0, policy_version 18127 (0.0035)
[2024-06-06 13:26:50,992][14276] Signal inference workers to stop experience collection... (2250 times)
[2024-06-06 13:26:50,993][14276] Signal inference workers to resume experience collection... (2250 times)
[2024-06-06 13:26:51,025][14296] InferenceWorker_p0-w0: stopping experience collection (2250 times)
[2024-06-06 13:26:51,025][14296] InferenceWorker_p0-w0: resuming experience collection (2250 times)
[2024-06-06 13:26:51,561][14064] Fps is (10 sec: 49151.9, 60 sec: 47786.9, 300 sec: 47763.7). Total num frames: 297041920. Throughput: 0: 47620.5. Samples: 150037120. Policy #0 lag: (min: 0.0, avg: 11.1, max: 22.0)
[2024-06-06 13:26:51,562][14064] Avg episode reward: [(0, '0.163')]
[2024-06-06 13:26:53,746][14296] Updated weights for policy 0, policy_version 18137 (0.0026)
[2024-06-06 13:26:56,564][14064] Fps is (10 sec: 50777.4, 60 sec: 47784.6, 300 sec: 47818.7). Total num frames: 297287680. Throughput: 0: 47576.8. Samples: 150318640. Policy #0 lag: (min: 1.0, avg: 10.3, max: 22.0)
[2024-06-06 13:26:56,565][14064] Avg episode reward: [(0, '0.164')]
[2024-06-06 13:26:57,215][14296] Updated weights for policy 0, policy_version 18147 (0.0035)
[2024-06-06 13:27:00,561][14296] Updated weights for policy 0, policy_version 18157 (0.0025)
[2024-06-06 13:27:01,561][14064] Fps is (10 sec: 45874.9, 60 sec: 47513.6, 300 sec: 47652.4). Total num frames: 297500672. Throughput: 0: 47625.8. Samples: 150610840. Policy #0 lag: (min: 1.0, avg: 10.3, max: 22.0)
[2024-06-06 13:27:01,562][14064] Avg episode reward: [(0, '0.168')]
[2024-06-06 13:27:04,076][14296] Updated weights for policy 0, policy_version 18167 (0.0030)
[2024-06-06 13:27:06,561][14064] Fps is (10 sec: 45886.9, 60 sec: 47240.5, 300 sec: 47652.5). Total num frames: 297746432. Throughput: 0: 47410.7. Samples: 150748040. Policy #0 lag: (min: 1.0, avg: 10.3, max: 22.0)
[2024-06-06 13:27:06,562][14064] Avg episode reward: [(0, '0.170')]
[2024-06-06 13:27:07,631][14296] Updated weights for policy 0, policy_version 18177 (0.0030)
[2024-06-06 13:27:10,857][14296] Updated weights for policy 0, policy_version 18187 (0.0024)
[2024-06-06 13:27:11,561][14064] Fps is (10 sec: 49152.3, 60 sec: 47513.6, 300 sec: 47763.6). Total num frames: 297992192. Throughput: 0: 47439.2. Samples: 151030080. Policy #0 lag: (min: 0.0, avg: 8.9, max: 20.0)
[2024-06-06 13:27:11,562][14064] Avg episode reward: [(0, '0.168')]
[2024-06-06 13:27:14,642][14296] Updated weights for policy 0, policy_version 18197 (0.0036)
[2024-06-06 13:27:16,561][14064] Fps is (10 sec: 45874.9, 60 sec: 47240.5, 300 sec: 47652.4). Total num frames: 298205184. Throughput: 0: 47493.2. Samples: 151314680. Policy #0 lag: (min: 0.0, avg: 8.9, max: 20.0)
[2024-06-06 13:27:16,562][14064] Avg episode reward: [(0, '0.165')]
[2024-06-06 13:27:17,895][14296] Updated weights for policy 0, policy_version 18207 (0.0033)
[2024-06-06 13:27:21,447][14296] Updated weights for policy 0, policy_version 18217 (0.0030)
[2024-06-06 13:27:21,561][14064] Fps is (10 sec: 47513.2, 60 sec: 47786.7, 300 sec: 47708.0). Total num frames: 298467328. Throughput: 0: 47231.0. Samples: 151452980. Policy #0 lag: (min: 1.0, avg: 9.8, max: 21.0)
[2024-06-06 13:27:21,562][14064] Avg episode reward: [(0, '0.166')]
[2024-06-06 13:27:24,965][14296] Updated weights for policy 0, policy_version 18227 (0.0033)
[2024-06-06 13:27:26,561][14064] Fps is (10 sec: 47514.0, 60 sec: 47240.6, 300 sec: 47596.9). Total num frames: 298680320. Throughput: 0: 47225.3. Samples: 151736280. Policy #0 lag: (min: 1.0, avg: 9.8, max: 21.0)
[2024-06-06 13:27:26,562][14064] Avg episode reward: [(0, '0.166')]
[2024-06-06 13:27:28,364][14296] Updated weights for policy 0, policy_version 18237 (0.0033)
[2024-06-06 13:27:31,561][14064] Fps is (10 sec: 47513.6, 60 sec: 47513.6, 300 sec: 47708.0). Total num frames: 298942464. Throughput: 0: 47232.4. Samples: 152023120. Policy #0 lag: (min: 1.0, avg: 9.8, max: 21.0)
[2024-06-06 13:27:31,562][14064] Avg episode reward: [(0, '0.166')]
[2024-06-06 13:27:31,834][14296] Updated weights for policy 0, policy_version 18247 (0.0018)
[2024-06-06 13:27:35,415][14296] Updated weights for policy 0, policy_version 18257 (0.0027)
[2024-06-06 13:27:36,561][14064] Fps is (10 sec: 47513.1, 60 sec: 46967.4, 300 sec: 47596.9). Total num frames: 299155456. Throughput: 0: 47333.6. Samples: 152167140. Policy #0 lag: (min: 0.0, avg: 10.6, max: 21.0)
[2024-06-06 13:27:36,562][14064] Avg episode reward: [(0, '0.165')]
[2024-06-06 13:27:38,612][14296] Updated weights for policy 0, policy_version 18267 (0.0032)
[2024-06-06 13:27:41,564][14064] Fps is (10 sec: 45864.3, 60 sec: 47511.6, 300 sec: 47596.5). Total num frames: 299401216. Throughput: 0: 47259.3. Samples: 152445300. Policy #0 lag: (min: 0.0, avg: 10.6, max: 21.0)
[2024-06-06 13:27:41,564][14064] Avg episode reward: [(0, '0.168')]
[2024-06-06 13:27:42,274][14296] Updated weights for policy 0, policy_version 18277 (0.0042)
[2024-06-06 13:27:45,563][14296] Updated weights for policy 0, policy_version 18287 (0.0028)
[2024-06-06 13:27:46,561][14064] Fps is (10 sec: 49152.5, 60 sec: 47786.7, 300 sec: 47596.9). Total num frames: 299646976. Throughput: 0: 47167.1. Samples: 152733360. Policy #0 lag: (min: 0.0, avg: 10.6, max: 21.0)
[2024-06-06 13:27:46,562][14064] Avg episode reward: [(0, '0.166')]
[2024-06-06 13:27:48,991][14296] Updated weights for policy 0, policy_version 18297 (0.0022)
[2024-06-06 13:27:51,564][14064] Fps is (10 sec: 50790.8, 60 sec: 47784.7, 300 sec: 47818.7). Total num frames: 299909120. Throughput: 0: 47400.7. Samples: 152881180. Policy #0 lag: (min: 0.0, avg: 10.2, max: 21.0)
[2024-06-06 13:27:51,564][14064] Avg episode reward: [(0, '0.166')]
[2024-06-06 13:27:52,404][14296] Updated weights for policy 0, policy_version 18307 (0.0023)
[2024-06-06 13:27:56,028][14296] Updated weights for policy 0, policy_version 18317 (0.0033)
[2024-06-06 13:27:56,561][14064] Fps is (10 sec: 47513.1, 60 sec: 47242.5, 300 sec: 47652.5). Total num frames: 300122112. Throughput: 0: 47685.2. Samples: 153175920. Policy #0 lag: (min: 0.0, avg: 10.2, max: 21.0)
[2024-06-06 13:27:56,562][14064] Avg episode reward: [(0, '0.165')]
[2024-06-06 13:27:59,191][14296] Updated weights for policy 0, policy_version 18327 (0.0033)
[2024-06-06 13:28:01,561][14064] Fps is (10 sec: 45886.2, 60 sec: 47786.7, 300 sec: 47652.5). Total num frames: 300367872. Throughput: 0: 47635.3. Samples: 153458260. Policy #0 lag: (min: 0.0, avg: 10.3, max: 23.0)
[2024-06-06 13:28:01,562][14064] Avg episode reward: [(0, '0.168')]
[2024-06-06 13:28:03,052][14296] Updated weights for policy 0, policy_version 18337 (0.0029)
[2024-06-06 13:28:06,069][14296] Updated weights for policy 0, policy_version 18347 (0.0029)
[2024-06-06 13:28:06,564][14064] Fps is (10 sec: 49139.5, 60 sec: 47784.6, 300 sec: 47707.6). Total num frames: 300613632. Throughput: 0: 47844.8. Samples: 153606120. Policy #0 lag: (min: 0.0, avg: 10.3, max: 23.0)
[2024-06-06 13:28:06,565][14064] Avg episode reward: [(0, '0.170')]
[2024-06-06 13:28:06,577][14276] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000018348_300613632.pth...
[2024-06-06 13:28:06,622][14276] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000017650_289177600.pth
[2024-06-06 13:28:09,753][14296] Updated weights for policy 0, policy_version 18357 (0.0027)
[2024-06-06 13:28:11,561][14064] Fps is (10 sec: 47512.7, 60 sec: 47513.5, 300 sec: 47708.0). Total num frames: 300843008. Throughput: 0: 47832.3. Samples: 153888740. Policy #0 lag: (min: 0.0, avg: 10.3, max: 23.0)
[2024-06-06 13:28:11,562][14064] Avg episode reward: [(0, '0.167')]
[2024-06-06 13:28:12,931][14296] Updated weights for policy 0, policy_version 18367 (0.0031)
[2024-06-06 13:28:13,481][14276] Signal inference workers to stop experience collection... (2300 times)
[2024-06-06 13:28:13,483][14276] Signal inference workers to resume experience collection... (2300 times)
[2024-06-06 13:28:13,496][14296] InferenceWorker_p0-w0: stopping experience collection (2300 times)
[2024-06-06 13:28:13,496][14296] InferenceWorker_p0-w0: resuming experience collection (2300 times)
[2024-06-06 13:28:16,561][14064] Fps is (10 sec: 45886.9, 60 sec: 47786.7, 300 sec: 47652.5). Total num frames: 301072384. Throughput: 0: 47992.8. Samples: 154182800. Policy #0 lag: (min: 0.0, avg: 10.3, max: 20.0)
[2024-06-06 13:28:16,562][14064] Avg episode reward: [(0, '0.171')]
[2024-06-06 13:28:16,688][14296] Updated weights for policy 0, policy_version 18377 (0.0021)
[2024-06-06 13:28:19,653][14296] Updated weights for policy 0, policy_version 18387 (0.0026)
[2024-06-06 13:28:21,561][14064] Fps is (10 sec: 50791.1, 60 sec: 48059.8, 300 sec: 47763.5). Total num frames: 301350912. Throughput: 0: 47943.7. Samples: 154324600. Policy #0 lag: (min: 0.0, avg: 10.3, max: 20.0)
[2024-06-06 13:28:21,562][14064] Avg episode reward: [(0, '0.167')]
[2024-06-06 13:28:23,365][14296] Updated weights for policy 0, policy_version 18397 (0.0029)
[2024-06-06 13:28:26,442][14296] Updated weights for policy 0, policy_version 18407 (0.0021)
[2024-06-06 13:28:26,561][14064] Fps is (10 sec: 50790.6, 60 sec: 48332.8, 300 sec: 47763.5). Total num frames: 301580288. Throughput: 0: 48113.6. Samples: 154610300. Policy #0 lag: (min: 0.0, avg: 10.4, max: 21.0)
[2024-06-06 13:28:26,562][14064] Avg episode reward: [(0, '0.168')]
[2024-06-06 13:28:30,223][14296] Updated weights for policy 0, policy_version 18417 (0.0031)
[2024-06-06 13:28:31,561][14064] Fps is (10 sec: 44236.8, 60 sec: 47513.6, 300 sec: 47652.4). Total num frames: 301793280. Throughput: 0: 48225.3. Samples: 154903500. Policy #0 lag: (min: 0.0, avg: 10.4, max: 21.0)
[2024-06-06 13:28:31,562][14064] Avg episode reward: [(0, '0.171')]
[2024-06-06 13:28:33,339][14296] Updated weights for policy 0, policy_version 18427 (0.0038)
[2024-06-06 13:28:36,561][14064] Fps is (10 sec: 45875.1, 60 sec: 48059.7, 300 sec: 47763.5). Total num frames: 302039040. Throughput: 0: 48025.5. Samples: 155042220. Policy #0 lag: (min: 0.0, avg: 10.4, max: 21.0)
[2024-06-06 13:28:36,562][14064] Avg episode reward: [(0, '0.166')]
[2024-06-06 13:28:37,155][14296] Updated weights for policy 0, policy_version 18437 (0.0031)
[2024-06-06 13:28:40,315][14296] Updated weights for policy 0, policy_version 18447 (0.0026)
[2024-06-06 13:28:41,561][14064] Fps is (10 sec: 49151.7, 60 sec: 48061.6, 300 sec: 47652.4). Total num frames: 302284800. Throughput: 0: 47808.9. Samples: 155327320. Policy #0 lag: (min: 1.0, avg: 11.3, max: 23.0)
[2024-06-06 13:28:41,562][14064] Avg episode reward: [(0, '0.174')]
[2024-06-06 13:28:41,563][14276] Saving new best policy, reward=0.174!
[2024-06-06 13:28:43,836][14296] Updated weights for policy 0, policy_version 18457 (0.0032)
[2024-06-06 13:28:46,561][14064] Fps is (10 sec: 49151.6, 60 sec: 48059.6, 300 sec: 47819.0). Total num frames: 302530560. Throughput: 0: 47903.3. Samples: 155613920. Policy #0 lag: (min: 1.0, avg: 11.3, max: 23.0)
[2024-06-06 13:28:46,562][14064] Avg episode reward: [(0, '0.164')]
[2024-06-06 13:28:47,007][14296] Updated weights for policy 0, policy_version 18467 (0.0032)
[2024-06-06 13:28:50,710][14296] Updated weights for policy 0, policy_version 18477 (0.0033)
[2024-06-06 13:28:51,561][14064] Fps is (10 sec: 45875.5, 60 sec: 47242.4, 300 sec: 47652.4). Total num frames: 302743552. Throughput: 0: 47966.8. Samples: 155764500. Policy #0 lag: (min: 1.0, avg: 11.3, max: 23.0)
[2024-06-06 13:28:51,562][14064] Avg episode reward: [(0, '0.171')]
[2024-06-06 13:28:53,906][14296] Updated weights for policy 0, policy_version 18487 (0.0038)
[2024-06-06 13:28:56,561][14064] Fps is (10 sec: 45875.6, 60 sec: 47786.7, 300 sec: 47652.4). Total num frames: 302989312. Throughput: 0: 47902.7. Samples: 156044360. Policy #0 lag: (min: 0.0, avg: 10.0, max: 21.0)
[2024-06-06 13:28:56,562][14064] Avg episode reward: [(0, '0.165')]
[2024-06-06 13:28:57,503][14296] Updated weights for policy 0, policy_version 18497 (0.0032)
[2024-06-06 13:29:00,789][14296] Updated weights for policy 0, policy_version 18507 (0.0028)
[2024-06-06 13:29:01,561][14064] Fps is (10 sec: 49152.2, 60 sec: 47786.7, 300 sec: 47652.5). Total num frames: 303235072. Throughput: 0: 47719.2. Samples: 156330160. Policy #0 lag: (min: 0.0, avg: 10.0, max: 21.0)
[2024-06-06 13:29:01,562][14064] Avg episode reward: [(0, '0.167')]
[2024-06-06 13:29:04,542][14296] Updated weights for policy 0, policy_version 18517 (0.0029)
[2024-06-06 13:29:06,561][14064] Fps is (10 sec: 49152.5, 60 sec: 47788.8, 300 sec: 47763.6). Total num frames: 303480832. Throughput: 0: 47837.8. Samples: 156477300. Policy #0 lag: (min: 0.0, avg: 10.0, max: 21.0)
[2024-06-06 13:29:06,561][14064] Avg episode reward: [(0, '0.168')]
[2024-06-06 13:29:07,647][14296] Updated weights for policy 0, policy_version 18527 (0.0029)
[2024-06-06 13:29:11,263][14296] Updated weights for policy 0, policy_version 18537 (0.0046)
[2024-06-06 13:29:11,561][14064] Fps is (10 sec: 47513.2, 60 sec: 47786.7, 300 sec: 47708.0). Total num frames: 303710208. Throughput: 0: 47792.0. Samples: 156760940. Policy #0 lag: (min: 0.0, avg: 10.0, max: 21.0)
[2024-06-06 13:29:11,562][14064] Avg episode reward: [(0, '0.168')]
[2024-06-06 13:29:14,389][14296] Updated weights for policy 0, policy_version 18547 (0.0026)
[2024-06-06 13:29:16,561][14064] Fps is (10 sec: 47513.3, 60 sec: 48059.8, 300 sec: 47708.0). Total num frames: 303955968. Throughput: 0: 47593.3. Samples: 157045200. Policy #0 lag: (min: 0.0, avg: 10.0, max: 21.0)
[2024-06-06 13:29:16,562][14064] Avg episode reward: [(0, '0.169')]
[2024-06-06 13:29:18,279][14296] Updated weights for policy 0, policy_version 18557 (0.0029)
[2024-06-06 13:29:21,195][14296] Updated weights for policy 0, policy_version 18567 (0.0027)
[2024-06-06 13:29:21,561][14064] Fps is (10 sec: 49152.2, 60 sec: 47513.6, 300 sec: 47708.0). Total num frames: 304201728. Throughput: 0: 47733.9. Samples: 157190240. Policy #0 lag: (min: 1.0, avg: 10.8, max: 22.0)
[2024-06-06 13:29:21,562][14064] Avg episode reward: [(0, '0.171')]
[2024-06-06 13:29:25,070][14296] Updated weights for policy 0, policy_version 18577 (0.0028)
[2024-06-06 13:29:26,561][14064] Fps is (10 sec: 45875.8, 60 sec: 47240.6, 300 sec: 47596.9). Total num frames: 304414720. Throughput: 0: 47657.9. Samples: 157471920. Policy #0 lag: (min: 1.0, avg: 10.8, max: 22.0)
[2024-06-06 13:29:26,561][14064] Avg episode reward: [(0, '0.163')]
[2024-06-06 13:29:28,167][14296] Updated weights for policy 0, policy_version 18587 (0.0031)
[2024-06-06 13:29:31,561][14064] Fps is (10 sec: 45874.8, 60 sec: 47786.6, 300 sec: 47708.0). Total num frames: 304660480. Throughput: 0: 47670.3. Samples: 157759080. Policy #0 lag: (min: 1.0, avg: 10.8, max: 22.0)
[2024-06-06 13:29:31,562][14064] Avg episode reward: [(0, '0.170')]
[2024-06-06 13:29:31,836][14276] Signal inference workers to stop experience collection... (2350 times)
[2024-06-06 13:29:31,837][14276] Signal inference workers to resume experience collection... (2350 times)
[2024-06-06 13:29:31,874][14296] InferenceWorker_p0-w0: stopping experience collection (2350 times)
[2024-06-06 13:29:31,874][14296] InferenceWorker_p0-w0: resuming experience collection (2350 times)
[2024-06-06 13:29:31,969][14296] Updated weights for policy 0, policy_version 18597 (0.0036)
[2024-06-06 13:29:35,040][14296] Updated weights for policy 0, policy_version 18607 (0.0033)
[2024-06-06 13:29:36,561][14064] Fps is (10 sec: 47513.2, 60 sec: 47513.7, 300 sec: 47596.9). Total num frames: 304889856. Throughput: 0: 47498.7. Samples: 157901940. Policy #0 lag: (min: 0.0, avg: 10.5, max: 22.0)
[2024-06-06 13:29:36,561][14064] Avg episode reward: [(0, '0.171')]
[2024-06-06 13:29:38,859][14296] Updated weights for policy 0, policy_version 18617 (0.0025)
[2024-06-06 13:29:41,561][14064] Fps is (10 sec: 49152.5, 60 sec: 47786.7, 300 sec: 47763.5). Total num frames: 305152000. Throughput: 0: 47596.1. Samples: 158186180. Policy #0 lag: (min: 0.0, avg: 10.5, max: 22.0)
[2024-06-06 13:29:41,562][14064] Avg episode reward: [(0, '0.167')]
[2024-06-06 13:29:41,784][14296] Updated weights for policy 0, policy_version 18627 (0.0025)
[2024-06-06 13:29:45,459][14296] Updated weights for policy 0, policy_version 18637 (0.0036)
[2024-06-06 13:29:46,561][14064] Fps is (10 sec: 47513.5, 60 sec: 47240.7, 300 sec: 47652.4). Total num frames: 305364992. Throughput: 0: 47820.4. Samples: 158482080. Policy #0 lag: (min: 0.0, avg: 9.5, max: 21.0)
[2024-06-06 13:29:46,562][14064] Avg episode reward: [(0, '0.169')]
[2024-06-06 13:29:48,600][14296] Updated weights for policy 0, policy_version 18647 (0.0033)
[2024-06-06 13:29:51,561][14064] Fps is (10 sec: 45874.4, 60 sec: 47786.5, 300 sec: 47652.4). Total num frames: 305610752. Throughput: 0: 47716.2. Samples: 158624540. Policy #0 lag: (min: 0.0, avg: 9.5, max: 21.0)
[2024-06-06 13:29:51,562][14064] Avg episode reward: [(0, '0.168')]
[2024-06-06 13:29:52,554][14296] Updated weights for policy 0, policy_version 18657 (0.0032)
[2024-06-06 13:29:55,589][14296] Updated weights for policy 0, policy_version 18667 (0.0030)
[2024-06-06 13:29:56,561][14064] Fps is (10 sec: 49151.8, 60 sec: 47786.7, 300 sec: 47708.0). Total num frames: 305856512. Throughput: 0: 47592.0. Samples: 158902580. Policy #0 lag: (min: 0.0, avg: 9.5, max: 21.0)
[2024-06-06 13:29:56,562][14064] Avg episode reward: [(0, '0.166')]
[2024-06-06 13:29:59,249][14296] Updated weights for policy 0, policy_version 18677 (0.0032)
[2024-06-06 13:30:01,562][14064] Fps is (10 sec: 49151.7, 60 sec: 47786.5, 300 sec: 47763.5). Total num frames: 306102272. Throughput: 0: 47772.2. Samples: 159194960. Policy #0 lag: (min: 0.0, avg: 10.1, max: 22.0)
[2024-06-06 13:30:01,562][14064] Avg episode reward: [(0, '0.170')]
[2024-06-06 13:30:02,281][14296] Updated weights for policy 0, policy_version 18687 (0.0022)
[2024-06-06 13:30:06,011][14296] Updated weights for policy 0, policy_version 18697 (0.0037)
[2024-06-06 13:30:06,563][14064] Fps is (10 sec: 47506.3, 60 sec: 47512.3, 300 sec: 47707.7). Total num frames: 306331648. Throughput: 0: 47739.7. Samples: 159338600. Policy #0 lag: (min: 0.0, avg: 10.1, max: 22.0)
[2024-06-06 13:30:06,564][14064] Avg episode reward: [(0, '0.169')]
[2024-06-06 13:30:06,704][14276] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000018698_306348032.pth...
[2024-06-06 13:30:06,752][14276] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000018000_294912000.pth
[2024-06-06 13:30:09,001][14296] Updated weights for policy 0, policy_version 18707 (0.0021)
[2024-06-06 13:30:11,561][14064] Fps is (10 sec: 47514.4, 60 sec: 47786.7, 300 sec: 47708.0). Total num frames: 306577408. Throughput: 0: 47888.7. Samples: 159626920. Policy #0 lag: (min: 0.0, avg: 10.1, max: 22.0)
[2024-06-06 13:30:11,562][14064] Avg episode reward: [(0, '0.171')]
[2024-06-06 13:30:12,891][14296] Updated weights for policy 0, policy_version 18717 (0.0025)
[2024-06-06 13:30:15,912][14296] Updated weights for policy 0, policy_version 18727 (0.0020)
[2024-06-06 13:30:16,561][14064] Fps is (10 sec: 49158.9, 60 sec: 47786.5, 300 sec: 47708.0). Total num frames: 306823168. Throughput: 0: 47816.8. Samples: 159910840. Policy #0 lag: (min: 0.0, avg: 9.7, max: 22.0)
[2024-06-06 13:30:16,562][14064] Avg episode reward: [(0, '0.167')]
[2024-06-06 13:30:19,931][14296] Updated weights for policy 0, policy_version 18737 (0.0032)
[2024-06-06 13:30:21,564][14064] Fps is (10 sec: 47501.2, 60 sec: 47511.5, 300 sec: 47652.0). Total num frames: 307052544. Throughput: 0: 47784.3. Samples: 160052360. Policy #0 lag: (min: 0.0, avg: 9.7, max: 22.0)
[2024-06-06 13:30:21,565][14064] Avg episode reward: [(0, '0.166')]
[2024-06-06 13:30:22,881][14296] Updated weights for policy 0, policy_version 18747 (0.0034)
[2024-06-06 13:30:26,561][14064] Fps is (10 sec: 47514.5, 60 sec: 48059.7, 300 sec: 47708.0). Total num frames: 307298304. Throughput: 0: 47820.9. Samples: 160338120. Policy #0 lag: (min: 0.0, avg: 9.3, max: 22.0)
[2024-06-06 13:30:26,562][14064] Avg episode reward: [(0, '0.172')]
[2024-06-06 13:30:26,654][14296] Updated weights for policy 0, policy_version 18757 (0.0033)
[2024-06-06 13:30:29,717][14296] Updated weights for policy 0, policy_version 18767 (0.0029)
[2024-06-06 13:30:31,561][14064] Fps is (10 sec: 44248.7, 60 sec: 47240.6, 300 sec: 47541.4). Total num frames: 307494912. Throughput: 0: 47688.0. Samples: 160628040. Policy #0 lag: (min: 0.0, avg: 9.3, max: 22.0)
[2024-06-06 13:30:31,562][14064] Avg episode reward: [(0, '0.168')]
[2024-06-06 13:30:33,503][14296] Updated weights for policy 0, policy_version 18777 (0.0029)
[2024-06-06 13:30:36,561][14064] Fps is (10 sec: 49152.0, 60 sec: 48332.8, 300 sec: 47763.5). Total num frames: 307789824. Throughput: 0: 47652.2. Samples: 160768880. Policy #0 lag: (min: 0.0, avg: 9.3, max: 22.0)
[2024-06-06 13:30:36,562][14064] Avg episode reward: [(0, '0.169')]
[2024-06-06 13:30:36,781][14296] Updated weights for policy 0, policy_version 18787 (0.0027)
[2024-06-06 13:30:39,704][14276] Signal inference workers to stop experience collection... (2400 times)
[2024-06-06 13:30:39,709][14276] Signal inference workers to resume experience collection... (2400 times)
[2024-06-06 13:30:39,740][14296] InferenceWorker_p0-w0: stopping experience collection (2400 times)
[2024-06-06 13:30:39,740][14296] InferenceWorker_p0-w0: resuming experience collection (2400 times)
[2024-06-06 13:30:40,380][14296] Updated weights for policy 0, policy_version 18797 (0.0033)
[2024-06-06 13:30:41,561][14064] Fps is (10 sec: 49151.3, 60 sec: 47240.4, 300 sec: 47541.3). Total num frames: 307986432. Throughput: 0: 47679.0. Samples: 161048140. Policy #0 lag: (min: 1.0, avg: 8.4, max: 21.0)
[2024-06-06 13:30:41,562][14064] Avg episode reward: [(0, '0.169')]
[2024-06-06 13:30:43,514][14296] Updated weights for policy 0, policy_version 18807 (0.0028)
[2024-06-06 13:30:46,564][14064] Fps is (10 sec: 45863.0, 60 sec: 48057.6, 300 sec: 47707.6). Total num frames: 308248576. Throughput: 0: 47462.3. Samples: 161330880. Policy #0 lag: (min: 1.0, avg: 8.4, max: 21.0)
[2024-06-06 13:30:46,565][14064] Avg episode reward: [(0, '0.167')]
[2024-06-06 13:30:47,385][14296] Updated weights for policy 0, policy_version 18817 (0.0029)
[2024-06-06 13:30:50,391][14296] Updated weights for policy 0, policy_version 18827 (0.0036)
[2024-06-06 13:30:51,561][14064] Fps is (10 sec: 47514.0, 60 sec: 47513.7, 300 sec: 47596.9). Total num frames: 308461568. Throughput: 0: 47536.7. Samples: 161477680. Policy #0 lag: (min: 1.0, avg: 8.4, max: 21.0)
[2024-06-06 13:30:51,562][14064] Avg episode reward: [(0, '0.175')]
[2024-06-06 13:30:54,075][14296] Updated weights for policy 0, policy_version 18837 (0.0036)
[2024-06-06 13:30:56,561][14064] Fps is (10 sec: 50804.0, 60 sec: 48332.8, 300 sec: 47819.1). Total num frames: 308756480. Throughput: 0: 47647.2. Samples: 161771040. Policy #0 lag: (min: 0.0, avg: 8.7, max: 20.0)
[2024-06-06 13:30:56,562][14064] Avg episode reward: [(0, '0.167')]
[2024-06-06 13:30:57,414][14296] Updated weights for policy 0, policy_version 18847 (0.0030)
[2024-06-06 13:31:00,804][14296] Updated weights for policy 0, policy_version 18857 (0.0033)
[2024-06-06 13:31:01,561][14064] Fps is (10 sec: 49151.8, 60 sec: 47513.7, 300 sec: 47596.9). Total num frames: 308953088. Throughput: 0: 47620.1. Samples: 162053740. Policy #0 lag: (min: 0.0, avg: 8.7, max: 20.0)
[2024-06-06 13:31:01,562][14064] Avg episode reward: [(0, '0.168')]
[2024-06-06 13:31:04,379][14296] Updated weights for policy 0, policy_version 18867 (0.0039)
[2024-06-06 13:31:06,561][14064] Fps is (10 sec: 42597.8, 60 sec: 47514.8, 300 sec: 47596.9). Total num frames: 309182464. Throughput: 0: 47575.6. Samples: 162193140. Policy #0 lag: (min: 0.0, avg: 7.3, max: 20.0)
[2024-06-06 13:31:06,562][14064] Avg episode reward: [(0, '0.167')]
[2024-06-06 13:31:07,861][14296] Updated weights for policy 0, policy_version 18877 (0.0032)
[2024-06-06 13:31:11,225][14296] Updated weights for policy 0, policy_version 18887 (0.0031)
[2024-06-06 13:31:11,561][14064] Fps is (10 sec: 49152.2, 60 sec: 47786.7, 300 sec: 47708.0). Total num frames: 309444608. Throughput: 0: 47380.4. Samples: 162470240. Policy #0 lag: (min: 0.0, avg: 7.3, max: 20.0)
[2024-06-06 13:31:11,562][14064] Avg episode reward: [(0, '0.171')]
[2024-06-06 13:31:14,937][14296] Updated weights for policy 0, policy_version 18897 (0.0026)
[2024-06-06 13:31:16,561][14064] Fps is (10 sec: 47513.5, 60 sec: 47240.6, 300 sec: 47652.4). Total num frames: 309657600. Throughput: 0: 47518.5. Samples: 162766380. Policy #0 lag: (min: 0.0, avg: 7.3, max: 20.0)
[2024-06-06 13:31:16,562][14064] Avg episode reward: [(0, '0.171')]
[2024-06-06 13:31:18,185][14296] Updated weights for policy 0, policy_version 18907 (0.0027)
[2024-06-06 13:31:21,561][14064] Fps is (10 sec: 47513.6, 60 sec: 47788.8, 300 sec: 47708.0). Total num frames: 309919744. Throughput: 0: 47555.5. Samples: 162908880. Policy #0 lag: (min: 0.0, avg: 8.2, max: 20.0)
[2024-06-06 13:31:21,562][14064] Avg episode reward: [(0, '0.170')]
[2024-06-06 13:31:21,671][14296] Updated weights for policy 0, policy_version 18917 (0.0023)
[2024-06-06 13:31:25,104][14296] Updated weights for policy 0, policy_version 18927 (0.0041)
[2024-06-06 13:31:26,561][14064] Fps is (10 sec: 44237.1, 60 sec: 46694.3, 300 sec: 47485.8). Total num frames: 310099968. Throughput: 0: 47528.1. Samples: 163186900. Policy #0 lag: (min: 0.0, avg: 8.2, max: 20.0)
[2024-06-06 13:31:26,562][14064] Avg episode reward: [(0, '0.174')]
[2024-06-06 13:31:28,232][14276] Signal inference workers to stop experience collection...
(2450 times) [2024-06-06 13:31:28,232][14276] Signal inference workers to resume experience collection... (2450 times) [2024-06-06 13:31:28,254][14296] InferenceWorker_p0-w0: stopping experience collection (2450 times) [2024-06-06 13:31:28,254][14296] InferenceWorker_p0-w0: resuming experience collection (2450 times) [2024-06-06 13:31:28,366][14296] Updated weights for policy 0, policy_version 18937 (0.0033) [2024-06-06 13:31:31,561][14064] Fps is (10 sec: 49152.2, 60 sec: 48605.8, 300 sec: 47708.0). Total num frames: 310411264. Throughput: 0: 47609.0. Samples: 163473160. Policy #0 lag: (min: 0.0, avg: 8.2, max: 20.0) [2024-06-06 13:31:31,562][14064] Avg episode reward: [(0, '0.168')] [2024-06-06 13:31:31,916][14296] Updated weights for policy 0, policy_version 18947 (0.0031) [2024-06-06 13:31:35,422][14296] Updated weights for policy 0, policy_version 18957 (0.0040) [2024-06-06 13:31:36,561][14064] Fps is (10 sec: 50790.3, 60 sec: 46967.4, 300 sec: 47652.4). Total num frames: 310607872. Throughput: 0: 47532.0. Samples: 163616620. Policy #0 lag: (min: 0.0, avg: 7.8, max: 20.0) [2024-06-06 13:31:36,562][14064] Avg episode reward: [(0, '0.167')] [2024-06-06 13:31:38,816][14296] Updated weights for policy 0, policy_version 18967 (0.0028) [2024-06-06 13:31:41,561][14064] Fps is (10 sec: 44236.9, 60 sec: 47786.8, 300 sec: 47708.0). Total num frames: 310853632. Throughput: 0: 47409.8. Samples: 163904480. Policy #0 lag: (min: 0.0, avg: 7.8, max: 20.0) [2024-06-06 13:31:41,562][14064] Avg episode reward: [(0, '0.167')] [2024-06-06 13:31:42,526][14296] Updated weights for policy 0, policy_version 18977 (0.0026) [2024-06-06 13:31:45,657][14296] Updated weights for policy 0, policy_version 18987 (0.0031) [2024-06-06 13:31:46,561][14064] Fps is (10 sec: 47513.7, 60 sec: 47242.6, 300 sec: 47596.9). Total num frames: 311083008. Throughput: 0: 47255.6. Samples: 164180240. 
Policy #0 lag: (min: 0.0, avg: 7.8, max: 20.0) [2024-06-06 13:31:46,562][14064] Avg episode reward: [(0, '0.175')] [2024-06-06 13:31:46,567][14276] Saving new best policy, reward=0.175! [2024-06-06 13:31:49,398][14296] Updated weights for policy 0, policy_version 18997 (0.0027) [2024-06-06 13:31:51,561][14064] Fps is (10 sec: 47513.4, 60 sec: 47786.7, 300 sec: 47597.3). Total num frames: 311328768. Throughput: 0: 47169.0. Samples: 164315740. Policy #0 lag: (min: 0.0, avg: 8.6, max: 20.0) [2024-06-06 13:31:51,562][14064] Avg episode reward: [(0, '0.171')] [2024-06-06 13:31:52,804][14296] Updated weights for policy 0, policy_version 19007 (0.0040) [2024-06-06 13:31:56,141][14296] Updated weights for policy 0, policy_version 19017 (0.0024) [2024-06-06 13:31:56,561][14064] Fps is (10 sec: 49151.8, 60 sec: 46967.4, 300 sec: 47708.0). Total num frames: 311574528. Throughput: 0: 47571.1. Samples: 164610940. Policy #0 lag: (min: 0.0, avg: 8.6, max: 20.0) [2024-06-06 13:31:56,562][14064] Avg episode reward: [(0, '0.171')] [2024-06-06 13:31:59,808][14296] Updated weights for policy 0, policy_version 19027 (0.0031) [2024-06-06 13:32:01,561][14064] Fps is (10 sec: 45874.9, 60 sec: 47240.5, 300 sec: 47596.9). Total num frames: 311787520. Throughput: 0: 47343.6. Samples: 164896840. Policy #0 lag: (min: 0.0, avg: 8.4, max: 20.0) [2024-06-06 13:32:01,562][14064] Avg episode reward: [(0, '0.166')] [2024-06-06 13:32:03,134][14296] Updated weights for policy 0, policy_version 19037 (0.0026) [2024-06-06 13:32:06,474][14296] Updated weights for policy 0, policy_version 19047 (0.0033) [2024-06-06 13:32:06,562][14064] Fps is (10 sec: 49149.1, 60 sec: 48059.3, 300 sec: 47707.9). Total num frames: 312066048. Throughput: 0: 47497.5. Samples: 165046300. 
Policy #0 lag: (min: 0.0, avg: 8.4, max: 20.0) [2024-06-06 13:32:06,562][14064] Avg episode reward: [(0, '0.169')] [2024-06-06 13:32:06,573][14276] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000019047_312066048.pth... [2024-06-06 13:32:06,621][14276] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000018348_300613632.pth [2024-06-06 13:32:10,071][14296] Updated weights for policy 0, policy_version 19057 (0.0035) [2024-06-06 13:32:11,561][14064] Fps is (10 sec: 49152.8, 60 sec: 47240.6, 300 sec: 47708.0). Total num frames: 312279040. Throughput: 0: 47522.8. Samples: 165325420. Policy #0 lag: (min: 0.0, avg: 8.4, max: 20.0) [2024-06-06 13:32:11,562][14064] Avg episode reward: [(0, '0.169')] [2024-06-06 13:32:13,236][14296] Updated weights for policy 0, policy_version 19067 (0.0032) [2024-06-06 13:32:16,561][14064] Fps is (10 sec: 45877.5, 60 sec: 47786.6, 300 sec: 47652.4). Total num frames: 312524800. Throughput: 0: 47462.5. Samples: 165608980. Policy #0 lag: (min: 0.0, avg: 9.0, max: 20.0) [2024-06-06 13:32:16,562][14064] Avg episode reward: [(0, '0.168')] [2024-06-06 13:32:16,904][14296] Updated weights for policy 0, policy_version 19077 (0.0037) [2024-06-06 13:32:20,381][14296] Updated weights for policy 0, policy_version 19087 (0.0025) [2024-06-06 13:32:21,561][14064] Fps is (10 sec: 44236.6, 60 sec: 46694.5, 300 sec: 47596.9). Total num frames: 312721408. Throughput: 0: 47356.1. Samples: 165747640. Policy #0 lag: (min: 0.0, avg: 9.0, max: 20.0) [2024-06-06 13:32:21,562][14064] Avg episode reward: [(0, '0.171')] [2024-06-06 13:32:23,673][14296] Updated weights for policy 0, policy_version 19097 (0.0040) [2024-06-06 13:32:25,744][14276] Signal inference workers to stop experience collection... (2500 times) [2024-06-06 13:32:25,745][14276] Signal inference workers to resume experience collection... 
(2500 times) [2024-06-06 13:32:25,792][14296] InferenceWorker_p0-w0: stopping experience collection (2500 times) [2024-06-06 13:32:25,792][14296] InferenceWorker_p0-w0: resuming experience collection (2500 times) [2024-06-06 13:32:26,561][14064] Fps is (10 sec: 49152.8, 60 sec: 48605.9, 300 sec: 47708.0). Total num frames: 313016320. Throughput: 0: 47490.2. Samples: 166041540. Policy #0 lag: (min: 0.0, avg: 9.0, max: 20.0) [2024-06-06 13:32:26,562][14064] Avg episode reward: [(0, '0.174')] [2024-06-06 13:32:27,474][14296] Updated weights for policy 0, policy_version 19107 (0.0028) [2024-06-06 13:32:30,644][14296] Updated weights for policy 0, policy_version 19117 (0.0029) [2024-06-06 13:32:31,561][14064] Fps is (10 sec: 50790.2, 60 sec: 46967.5, 300 sec: 47708.0). Total num frames: 313229312. Throughput: 0: 47753.4. Samples: 166329140. Policy #0 lag: (min: 0.0, avg: 8.4, max: 20.0) [2024-06-06 13:32:31,562][14064] Avg episode reward: [(0, '0.172')] [2024-06-06 13:32:34,129][14296] Updated weights for policy 0, policy_version 19127 (0.0023) [2024-06-06 13:32:36,561][14064] Fps is (10 sec: 45875.4, 60 sec: 47786.7, 300 sec: 47708.4). Total num frames: 313475072. Throughput: 0: 47765.8. Samples: 166465200. Policy #0 lag: (min: 0.0, avg: 8.4, max: 20.0) [2024-06-06 13:32:36,561][14064] Avg episode reward: [(0, '0.174')] [2024-06-06 13:32:37,456][14296] Updated weights for policy 0, policy_version 19137 (0.0022) [2024-06-06 13:32:40,887][14296] Updated weights for policy 0, policy_version 19147 (0.0023) [2024-06-06 13:32:41,561][14064] Fps is (10 sec: 47513.7, 60 sec: 47513.6, 300 sec: 47652.4). Total num frames: 313704448. Throughput: 0: 47499.7. Samples: 166748420. Policy #0 lag: (min: 0.0, avg: 8.4, max: 20.0) [2024-06-06 13:32:41,562][14064] Avg episode reward: [(0, '0.173')] [2024-06-06 13:32:44,406][14296] Updated weights for policy 0, policy_version 19157 (0.0019) [2024-06-06 13:32:46,561][14064] Fps is (10 sec: 47513.5, 60 sec: 47786.7, 300 sec: 47597.3). 
Total num frames: 313950208. Throughput: 0: 47747.7. Samples: 167045480. Policy #0 lag: (min: 0.0, avg: 9.0, max: 20.0) [2024-06-06 13:32:46,562][14064] Avg episode reward: [(0, '0.173')] [2024-06-06 13:32:47,983][14296] Updated weights for policy 0, policy_version 19167 (0.0028) [2024-06-06 13:32:51,103][14296] Updated weights for policy 0, policy_version 19177 (0.0029) [2024-06-06 13:32:51,561][14064] Fps is (10 sec: 49151.5, 60 sec: 47786.6, 300 sec: 47708.0). Total num frames: 314195968. Throughput: 0: 47703.3. Samples: 167192920. Policy #0 lag: (min: 0.0, avg: 9.0, max: 20.0) [2024-06-06 13:32:51,562][14064] Avg episode reward: [(0, '0.169')] [2024-06-06 13:32:55,137][14296] Updated weights for policy 0, policy_version 19187 (0.0024) [2024-06-06 13:32:56,561][14064] Fps is (10 sec: 47513.7, 60 sec: 47513.7, 300 sec: 47652.4). Total num frames: 314425344. Throughput: 0: 47773.7. Samples: 167475240. Policy #0 lag: (min: 0.0, avg: 8.9, max: 20.0) [2024-06-06 13:32:56,561][14064] Avg episode reward: [(0, '0.163')] [2024-06-06 13:32:58,331][14296] Updated weights for policy 0, policy_version 19197 (0.0031) [2024-06-06 13:33:01,561][14064] Fps is (10 sec: 47514.3, 60 sec: 48059.8, 300 sec: 47652.9). Total num frames: 314671104. Throughput: 0: 47530.9. Samples: 167747860. Policy #0 lag: (min: 0.0, avg: 8.9, max: 20.0) [2024-06-06 13:33:01,562][14064] Avg episode reward: [(0, '0.167')] [2024-06-06 13:33:02,018][14296] Updated weights for policy 0, policy_version 19207 (0.0024) [2024-06-06 13:33:05,167][14296] Updated weights for policy 0, policy_version 19217 (0.0035) [2024-06-06 13:33:06,561][14064] Fps is (10 sec: 47513.5, 60 sec: 47241.1, 300 sec: 47652.5). Total num frames: 314900480. Throughput: 0: 47649.3. Samples: 167891860. 
Policy #0 lag: (min: 0.0, avg: 8.9, max: 20.0) [2024-06-06 13:33:06,562][14064] Avg episode reward: [(0, '0.171')] [2024-06-06 13:33:08,863][14296] Updated weights for policy 0, policy_version 19227 (0.0027) [2024-06-06 13:33:11,561][14064] Fps is (10 sec: 45875.4, 60 sec: 47513.6, 300 sec: 47652.5). Total num frames: 315129856. Throughput: 0: 47521.0. Samples: 168179980. Policy #0 lag: (min: 0.0, avg: 9.5, max: 21.0) [2024-06-06 13:33:11,561][14064] Avg episode reward: [(0, '0.171')] [2024-06-06 13:33:11,922][14296] Updated weights for policy 0, policy_version 19237 (0.0029) [2024-06-06 13:33:12,749][14276] Signal inference workers to stop experience collection... (2550 times) [2024-06-06 13:33:12,750][14276] Signal inference workers to resume experience collection... (2550 times) [2024-06-06 13:33:12,785][14296] InferenceWorker_p0-w0: stopping experience collection (2550 times) [2024-06-06 13:33:12,786][14296] InferenceWorker_p0-w0: resuming experience collection (2550 times) [2024-06-06 13:33:16,006][14296] Updated weights for policy 0, policy_version 19247 (0.0026) [2024-06-06 13:33:16,561][14064] Fps is (10 sec: 44236.3, 60 sec: 46967.5, 300 sec: 47430.3). Total num frames: 315342848. Throughput: 0: 47693.7. Samples: 168475360. Policy #0 lag: (min: 0.0, avg: 9.5, max: 21.0) [2024-06-06 13:33:16,562][14064] Avg episode reward: [(0, '0.170')] [2024-06-06 13:33:18,710][14296] Updated weights for policy 0, policy_version 19257 (0.0024) [2024-06-06 13:33:21,561][14064] Fps is (10 sec: 50790.1, 60 sec: 48605.9, 300 sec: 47652.5). Total num frames: 315637760. Throughput: 0: 47704.0. Samples: 168611880. 
Policy #0 lag: (min: 0.0, avg: 9.5, max: 21.0) [2024-06-06 13:33:21,562][14064] Avg episode reward: [(0, '0.166')] [2024-06-06 13:33:23,111][14296] Updated weights for policy 0, policy_version 19267 (0.0039) [2024-06-06 13:33:25,782][14296] Updated weights for policy 0, policy_version 19277 (0.0024) [2024-06-06 13:33:26,561][14064] Fps is (10 sec: 49152.0, 60 sec: 46967.4, 300 sec: 47596.9). Total num frames: 315834368. Throughput: 0: 47626.5. Samples: 168891620. Policy #0 lag: (min: 0.0, avg: 9.9, max: 21.0) [2024-06-06 13:33:26,562][14064] Avg episode reward: [(0, '0.174')] [2024-06-06 13:33:29,899][14296] Updated weights for policy 0, policy_version 19287 (0.0036) [2024-06-06 13:33:31,564][14064] Fps is (10 sec: 44224.7, 60 sec: 47511.5, 300 sec: 47596.5). Total num frames: 316080128. Throughput: 0: 47307.8. Samples: 169174460. Policy #0 lag: (min: 0.0, avg: 9.9, max: 21.0) [2024-06-06 13:33:31,565][14064] Avg episode reward: [(0, '0.172')] [2024-06-06 13:33:32,841][14296] Updated weights for policy 0, policy_version 19297 (0.0026) [2024-06-06 13:33:36,553][14296] Updated weights for policy 0, policy_version 19307 (0.0029) [2024-06-06 13:33:36,561][14064] Fps is (10 sec: 49152.3, 60 sec: 47513.6, 300 sec: 47596.9). Total num frames: 316325888. Throughput: 0: 47277.4. Samples: 169320400. Policy #0 lag: (min: 0.0, avg: 10.4, max: 23.0) [2024-06-06 13:33:36,562][14064] Avg episode reward: [(0, '0.167')] [2024-06-06 13:33:39,727][14296] Updated weights for policy 0, policy_version 19317 (0.0024) [2024-06-06 13:33:41,561][14064] Fps is (10 sec: 47526.2, 60 sec: 47513.6, 300 sec: 47541.4). Total num frames: 316555264. Throughput: 0: 47341.3. Samples: 169605600. 
Policy #0 lag: (min: 0.0, avg: 10.4, max: 23.0) [2024-06-06 13:33:41,562][14064] Avg episode reward: [(0, '0.171')] [2024-06-06 13:33:43,617][14296] Updated weights for policy 0, policy_version 19327 (0.0031) [2024-06-06 13:33:46,471][14296] Updated weights for policy 0, policy_version 19337 (0.0030) [2024-06-06 13:33:46,561][14064] Fps is (10 sec: 49151.6, 60 sec: 47786.6, 300 sec: 47708.0). Total num frames: 316817408. Throughput: 0: 47583.8. Samples: 169889140. Policy #0 lag: (min: 0.0, avg: 10.4, max: 23.0) [2024-06-06 13:33:46,562][14064] Avg episode reward: [(0, '0.169')] [2024-06-06 13:33:50,765][14296] Updated weights for policy 0, policy_version 19347 (0.0032) [2024-06-06 13:33:51,561][14064] Fps is (10 sec: 47513.6, 60 sec: 47240.6, 300 sec: 47596.9). Total num frames: 317030400. Throughput: 0: 47436.4. Samples: 170026500. Policy #0 lag: (min: 0.0, avg: 10.5, max: 24.0) [2024-06-06 13:33:51,562][14064] Avg episode reward: [(0, '0.174')] [2024-06-06 13:33:53,533][14296] Updated weights for policy 0, policy_version 19357 (0.0033) [2024-06-06 13:33:56,561][14064] Fps is (10 sec: 47513.8, 60 sec: 47786.6, 300 sec: 47652.4). Total num frames: 317292544. Throughput: 0: 47421.1. Samples: 170313940. Policy #0 lag: (min: 0.0, avg: 10.5, max: 24.0) [2024-06-06 13:33:56,562][14064] Avg episode reward: [(0, '0.166')] [2024-06-06 13:33:57,353][14296] Updated weights for policy 0, policy_version 19367 (0.0037) [2024-06-06 13:34:00,251][14296] Updated weights for policy 0, policy_version 19377 (0.0034) [2024-06-06 13:34:01,561][14064] Fps is (10 sec: 47513.6, 60 sec: 47240.5, 300 sec: 47541.4). Total num frames: 317505536. Throughput: 0: 47201.4. Samples: 170599420. Policy #0 lag: (min: 0.0, avg: 10.5, max: 24.0) [2024-06-06 13:34:01,562][14064] Avg episode reward: [(0, '0.175')] [2024-06-06 13:34:04,055][14296] Updated weights for policy 0, policy_version 19387 (0.0027) [2024-06-06 13:34:06,561][14064] Fps is (10 sec: 45875.3, 60 sec: 47513.6, 300 sec: 47596.9). 
Total num frames: 317751296. Throughput: 0: 47480.4. Samples: 170748500. Policy #0 lag: (min: 0.0, avg: 8.6, max: 20.0) [2024-06-06 13:34:06,562][14064] Avg episode reward: [(0, '0.166')] [2024-06-06 13:34:06,622][14276] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000019395_317767680.pth... [2024-06-06 13:34:06,673][14276] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000018698_306348032.pth [2024-06-06 13:34:07,168][14296] Updated weights for policy 0, policy_version 19397 (0.0029) [2024-06-06 13:34:09,875][14276] Signal inference workers to stop experience collection... (2600 times) [2024-06-06 13:34:09,876][14276] Signal inference workers to resume experience collection... (2600 times) [2024-06-06 13:34:09,916][14296] InferenceWorker_p0-w0: stopping experience collection (2600 times) [2024-06-06 13:34:09,916][14296] InferenceWorker_p0-w0: resuming experience collection (2600 times) [2024-06-06 13:34:10,931][14296] Updated weights for policy 0, policy_version 19407 (0.0023) [2024-06-06 13:34:11,561][14064] Fps is (10 sec: 45874.7, 60 sec: 47240.3, 300 sec: 47485.8). Total num frames: 317964288. Throughput: 0: 47604.8. Samples: 171033840. Policy #0 lag: (min: 0.0, avg: 8.6, max: 20.0) [2024-06-06 13:34:11,562][14064] Avg episode reward: [(0, '0.162')] [2024-06-06 13:34:13,943][14296] Updated weights for policy 0, policy_version 19417 (0.0038) [2024-06-06 13:34:16,561][14064] Fps is (10 sec: 47513.4, 60 sec: 48059.8, 300 sec: 47541.4). Total num frames: 318226432. Throughput: 0: 47537.0. Samples: 171313500. Policy #0 lag: (min: 0.0, avg: 8.6, max: 20.0) [2024-06-06 13:34:16,562][14064] Avg episode reward: [(0, '0.175')] [2024-06-06 13:34:18,199][14296] Updated weights for policy 0, policy_version 19427 (0.0027) [2024-06-06 13:34:20,988][14296] Updated weights for policy 0, policy_version 19437 (0.0026) [2024-06-06 13:34:21,561][14064] Fps is (10 sec: 49151.7, 60 sec: 46967.3, 300 sec: 47596.9). 
Total num frames: 318455808. Throughput: 0: 47678.0. Samples: 171465920. Policy #0 lag: (min: 0.0, avg: 9.7, max: 22.0) [2024-06-06 13:34:21,568][14064] Avg episode reward: [(0, '0.172')] [2024-06-06 13:34:25,196][14296] Updated weights for policy 0, policy_version 19447 (0.0028) [2024-06-06 13:34:26,561][14064] Fps is (10 sec: 45875.7, 60 sec: 47513.7, 300 sec: 47541.4). Total num frames: 318685184. Throughput: 0: 47503.2. Samples: 171743240. Policy #0 lag: (min: 0.0, avg: 9.7, max: 22.0) [2024-06-06 13:34:26,561][14064] Avg episode reward: [(0, '0.171')] [2024-06-06 13:34:27,815][14296] Updated weights for policy 0, policy_version 19457 (0.0036) [2024-06-06 13:34:31,561][14064] Fps is (10 sec: 47514.7, 60 sec: 47515.7, 300 sec: 47596.9). Total num frames: 318930944. Throughput: 0: 47295.7. Samples: 172017440. Policy #0 lag: (min: 0.0, avg: 9.7, max: 22.0) [2024-06-06 13:34:31,562][14064] Avg episode reward: [(0, '0.174')] [2024-06-06 13:34:31,925][14296] Updated weights for policy 0, policy_version 19467 (0.0023) [2024-06-06 13:34:34,858][14296] Updated weights for policy 0, policy_version 19477 (0.0030) [2024-06-06 13:34:36,561][14064] Fps is (10 sec: 47513.6, 60 sec: 47240.6, 300 sec: 47485.8). Total num frames: 319160320. Throughput: 0: 47401.0. Samples: 172159540. Policy #0 lag: (min: 0.0, avg: 10.1, max: 23.0) [2024-06-06 13:34:36,562][14064] Avg episode reward: [(0, '0.175')] [2024-06-06 13:34:38,969][14296] Updated weights for policy 0, policy_version 19487 (0.0026) [2024-06-06 13:34:41,561][14064] Fps is (10 sec: 49152.5, 60 sec: 47786.8, 300 sec: 47652.5). Total num frames: 319422464. Throughput: 0: 47558.9. Samples: 172454080. Policy #0 lag: (min: 0.0, avg: 10.1, max: 23.0) [2024-06-06 13:34:41,561][14064] Avg episode reward: [(0, '0.180')] [2024-06-06 13:34:41,666][14276] Saving new best policy, reward=0.180! 
[2024-06-06 13:34:41,672][14296] Updated weights for policy 0, policy_version 19497 (0.0030)
[2024-06-06 13:34:45,967][14296] Updated weights for policy 0, policy_version 19507 (0.0034)
[2024-06-06 13:34:46,561][14064] Fps is (10 sec: 45874.5, 60 sec: 46694.4, 300 sec: 47485.8). Total num frames: 319619072. Throughput: 0: 47529.3. Samples: 172738240. Policy #0 lag: (min: 0.0, avg: 10.1, max: 21.0)
[2024-06-06 13:34:46,562][14064] Avg episode reward: [(0, '0.172')]
[2024-06-06 13:34:48,806][14296] Updated weights for policy 0, policy_version 19517 (0.0024)
[2024-06-06 13:34:51,561][14064] Fps is (10 sec: 47513.4, 60 sec: 47786.7, 300 sec: 47596.9). Total num frames: 319897600. Throughput: 0: 47170.3. Samples: 172871160. Policy #0 lag: (min: 0.0, avg: 10.1, max: 21.0)
[2024-06-06 13:34:51,562][14064] Avg episode reward: [(0, '0.175')]
[2024-06-06 13:34:52,638][14296] Updated weights for policy 0, policy_version 19527 (0.0023)
[2024-06-06 13:34:55,468][14296] Updated weights for policy 0, policy_version 19537 (0.0028)
[2024-06-06 13:34:56,561][14064] Fps is (10 sec: 49152.7, 60 sec: 46967.5, 300 sec: 47485.9). Total num frames: 320110592. Throughput: 0: 47180.6. Samples: 173156960. Policy #0 lag: (min: 0.0, avg: 10.1, max: 21.0)
[2024-06-06 13:34:56,562][14064] Avg episode reward: [(0, '0.176')]
[2024-06-06 13:34:59,356][14296] Updated weights for policy 0, policy_version 19547 (0.0030)
[2024-06-06 13:35:00,829][14276] Signal inference workers to stop experience collection... (2650 times)
[2024-06-06 13:35:00,878][14296] InferenceWorker_p0-w0: stopping experience collection (2650 times)
[2024-06-06 13:35:00,884][14276] Signal inference workers to resume experience collection... (2650 times)
[2024-06-06 13:35:00,892][14296] InferenceWorker_p0-w0: resuming experience collection (2650 times)
[2024-06-06 13:35:01,561][14064] Fps is (10 sec: 45874.4, 60 sec: 47513.5, 300 sec: 47541.6). Total num frames: 320356352. Throughput: 0: 47466.2. Samples: 173449480. Policy #0 lag: (min: 0.0, avg: 9.0, max: 20.0)
[2024-06-06 13:35:01,562][14064] Avg episode reward: [(0, '0.174')]
[2024-06-06 13:35:02,351][14296] Updated weights for policy 0, policy_version 19557 (0.0030)
[2024-06-06 13:35:06,482][14296] Updated weights for policy 0, policy_version 19567 (0.0021)
[2024-06-06 13:35:06,561][14064] Fps is (10 sec: 47513.1, 60 sec: 47240.5, 300 sec: 47485.8). Total num frames: 320585728. Throughput: 0: 47204.1. Samples: 173590100. Policy #0 lag: (min: 0.0, avg: 9.0, max: 20.0)
[2024-06-06 13:35:06,562][14064] Avg episode reward: [(0, '0.171')]
[2024-06-06 13:35:09,135][14296] Updated weights for policy 0, policy_version 19577 (0.0028)
[2024-06-06 13:35:11,564][14064] Fps is (10 sec: 47501.6, 60 sec: 47784.7, 300 sec: 47485.4). Total num frames: 320831488. Throughput: 0: 47428.8. Samples: 173877660. Policy #0 lag: (min: 0.0, avg: 9.0, max: 20.0)
[2024-06-06 13:35:11,565][14064] Avg episode reward: [(0, '0.171')]
[2024-06-06 13:35:13,160][14296] Updated weights for policy 0, policy_version 19587 (0.0025)
[2024-06-06 13:35:16,118][14296] Updated weights for policy 0, policy_version 19597 (0.0028)
[2024-06-06 13:35:16,561][14064] Fps is (10 sec: 50790.5, 60 sec: 47786.7, 300 sec: 47597.3). Total num frames: 321093632. Throughput: 0: 47619.5. Samples: 174160320. Policy #0 lag: (min: 0.0, avg: 8.3, max: 22.0)
[2024-06-06 13:35:16,562][14064] Avg episode reward: [(0, '0.178')]
[2024-06-06 13:35:20,168][14296] Updated weights for policy 0, policy_version 19607 (0.0030)
[2024-06-06 13:35:21,561][14064] Fps is (10 sec: 45887.4, 60 sec: 47240.7, 300 sec: 47430.3). Total num frames: 321290240. Throughput: 0: 47649.3. Samples: 174303760. Policy #0 lag: (min: 0.0, avg: 8.3, max: 22.0)
[2024-06-06 13:35:21,561][14064] Avg episode reward: [(0, '0.179')]
[2024-06-06 13:35:23,078][14296] Updated weights for policy 0, policy_version 19617 (0.0032)
[2024-06-06 13:35:26,564][14064] Fps is (10 sec: 45863.3, 60 sec: 47784.5, 300 sec: 47652.0). Total num frames: 321552384. Throughput: 0: 47496.2. Samples: 174591540. Policy #0 lag: (min: 0.0, avg: 8.3, max: 22.0)
[2024-06-06 13:35:26,565][14064] Avg episode reward: [(0, '0.179')]
[2024-06-06 13:35:27,127][14296] Updated weights for policy 0, policy_version 19627 (0.0029)
[2024-06-06 13:35:30,056][14296] Updated weights for policy 0, policy_version 19637 (0.0035)
[2024-06-06 13:35:31,561][14064] Fps is (10 sec: 49151.8, 60 sec: 47513.6, 300 sec: 47430.3). Total num frames: 321781760. Throughput: 0: 47421.9. Samples: 174872220. Policy #0 lag: (min: 0.0, avg: 8.0, max: 21.0)
[2024-06-06 13:35:31,562][14064] Avg episode reward: [(0, '0.172')]
[2024-06-06 13:35:34,215][14296] Updated weights for policy 0, policy_version 19647 (0.0031)
[2024-06-06 13:35:36,561][14064] Fps is (10 sec: 45887.5, 60 sec: 47513.6, 300 sec: 47541.4). Total num frames: 322011136. Throughput: 0: 47800.8. Samples: 175022200. Policy #0 lag: (min: 0.0, avg: 8.0, max: 21.0)
[2024-06-06 13:35:36,561][14064] Avg episode reward: [(0, '0.182')]
[2024-06-06 13:35:36,718][14276] Saving new best policy, reward=0.182!
[2024-06-06 13:35:36,952][14296] Updated weights for policy 0, policy_version 19657 (0.0035)
[2024-06-06 13:35:40,905][14296] Updated weights for policy 0, policy_version 19667 (0.0025)
[2024-06-06 13:35:41,561][14064] Fps is (10 sec: 45875.9, 60 sec: 46967.5, 300 sec: 47430.7). Total num frames: 322240512. Throughput: 0: 47557.9. Samples: 175297060. Policy #0 lag: (min: 0.0, avg: 8.0, max: 21.0)
[2024-06-06 13:35:41,561][14064] Avg episode reward: [(0, '0.181')]
[2024-06-06 13:35:43,888][14296] Updated weights for policy 0, policy_version 19677 (0.0033)
[2024-06-06 13:35:46,562][14064] Fps is (10 sec: 49150.7, 60 sec: 48059.6, 300 sec: 47596.9). Total num frames: 322502656. Throughput: 0: 47401.2. Samples: 175582540. Policy #0 lag: (min: 0.0, avg: 8.0, max: 22.0)
[2024-06-06 13:35:46,562][14064] Avg episode reward: [(0, '0.176')]
[2024-06-06 13:35:47,824][14296] Updated weights for policy 0, policy_version 19687 (0.0030)
[2024-06-06 13:35:50,826][14296] Updated weights for policy 0, policy_version 19697 (0.0031)
[2024-06-06 13:35:51,561][14064] Fps is (10 sec: 49150.7, 60 sec: 47240.4, 300 sec: 47374.7). Total num frames: 322732032. Throughput: 0: 47552.8. Samples: 175729980. Policy #0 lag: (min: 0.0, avg: 8.0, max: 22.0)
[2024-06-06 13:35:51,562][14064] Avg episode reward: [(0, '0.171')]
[2024-06-06 13:35:54,823][14296] Updated weights for policy 0, policy_version 19707 (0.0029)
[2024-06-06 13:35:56,561][14064] Fps is (10 sec: 44238.0, 60 sec: 47240.5, 300 sec: 47430.3). Total num frames: 322945024. Throughput: 0: 47535.2. Samples: 176016620. Policy #0 lag: (min: 0.0, avg: 8.6, max: 21.0)
[2024-06-06 13:35:56,562][14064] Avg episode reward: [(0, '0.176')]
[2024-06-06 13:35:57,391][14276] Signal inference workers to stop experience collection... (2700 times)
[2024-06-06 13:35:57,435][14296] InferenceWorker_p0-w0: stopping experience collection (2700 times)
[2024-06-06 13:35:57,443][14276] Signal inference workers to resume experience collection... (2700 times)
[2024-06-06 13:35:57,454][14296] InferenceWorker_p0-w0: resuming experience collection (2700 times)
[2024-06-06 13:35:57,571][14296] Updated weights for policy 0, policy_version 19717 (0.0026)
[2024-06-06 13:36:01,493][14296] Updated weights for policy 0, policy_version 19727 (0.0024)
[2024-06-06 13:36:01,561][14064] Fps is (10 sec: 47514.0, 60 sec: 47513.7, 300 sec: 47541.4). Total num frames: 323207168. Throughput: 0: 47694.7. Samples: 176306580. Policy #0 lag: (min: 0.0, avg: 8.6, max: 21.0)
[2024-06-06 13:36:01,562][14064] Avg episode reward: [(0, '0.182')]
[2024-06-06 13:36:04,425][14296] Updated weights for policy 0, policy_version 19737 (0.0025)
[2024-06-06 13:36:06,561][14064] Fps is (10 sec: 49151.1, 60 sec: 47513.5, 300 sec: 47430.3). Total num frames: 323436544. Throughput: 0: 47417.1. Samples: 176437540. Policy #0 lag: (min: 0.0, avg: 8.6, max: 21.0)
[2024-06-06 13:36:06,562][14064] Avg episode reward: [(0, '0.182')]
[2024-06-06 13:36:06,594][14276] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000019742_323452928.pth...
[2024-06-06 13:36:06,645][14276] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000019047_312066048.pth
[2024-06-06 13:36:08,171][14296] Updated weights for policy 0, policy_version 19747 (0.0021)
[2024-06-06 13:36:11,126][14296] Updated weights for policy 0, policy_version 19757 (0.0026)
[2024-06-06 13:36:11,561][14064] Fps is (10 sec: 50790.4, 60 sec: 48061.8, 300 sec: 47652.5). Total num frames: 323715072. Throughput: 0: 47568.1. Samples: 176731980. Policy #0 lag: (min: 0.0, avg: 8.1, max: 21.0)
[2024-06-06 13:36:11,562][14064] Avg episode reward: [(0, '0.177')]
[2024-06-06 13:36:15,261][14296] Updated weights for policy 0, policy_version 19767 (0.0035)
[2024-06-06 13:36:16,561][14064] Fps is (10 sec: 45875.7, 60 sec: 46694.4, 300 sec: 47374.8). Total num frames: 323895296. Throughput: 0: 47802.6. Samples: 177023340. Policy #0 lag: (min: 0.0, avg: 8.1, max: 21.0)
[2024-06-06 13:36:16,562][14064] Avg episode reward: [(0, '0.169')]
[2024-06-06 13:36:18,019][14296] Updated weights for policy 0, policy_version 19777 (0.0030)
[2024-06-06 13:36:21,562][14064] Fps is (10 sec: 45870.2, 60 sec: 48058.8, 300 sec: 47707.8). Total num frames: 324173824. Throughput: 0: 47428.6. Samples: 177156540. Policy #0 lag: (min: 0.0, avg: 8.1, max: 21.0)
[2024-06-06 13:36:21,563][14064] Avg episode reward: [(0, '0.166')]
[2024-06-06 13:36:22,227][14296] Updated weights for policy 0, policy_version 19787 (0.0024)
[2024-06-06 13:36:24,970][14296] Updated weights for policy 0, policy_version 19797 (0.0028)
[2024-06-06 13:36:26,561][14064] Fps is (10 sec: 50791.0, 60 sec: 47515.8, 300 sec: 47430.3). Total num frames: 324403200. Throughput: 0: 47571.9. Samples: 177437800. Policy #0 lag: (min: 0.0, avg: 9.2, max: 21.0)
[2024-06-06 13:36:26,561][14064] Avg episode reward: [(0, '0.174')]
[2024-06-06 13:36:28,849][14296] Updated weights for policy 0, policy_version 19807 (0.0034)
[2024-06-06 13:36:31,561][14064] Fps is (10 sec: 47518.6, 60 sec: 47786.6, 300 sec: 47596.9). Total num frames: 324648960. Throughput: 0: 47624.6. Samples: 177725640. Policy #0 lag: (min: 0.0, avg: 9.2, max: 21.0)
[2024-06-06 13:36:31,562][14064] Avg episode reward: [(0, '0.174')]
[2024-06-06 13:36:31,924][14296] Updated weights for policy 0, policy_version 19817 (0.0037)
[2024-06-06 13:36:35,623][14296] Updated weights for policy 0, policy_version 19827 (0.0020)
[2024-06-06 13:36:36,561][14064] Fps is (10 sec: 44236.2, 60 sec: 47240.5, 300 sec: 47430.3). Total num frames: 324845568. Throughput: 0: 47625.8. Samples: 177873140. Policy #0 lag: (min: 0.0, avg: 9.2, max: 21.0)
[2024-06-06 13:36:36,562][14064] Avg episode reward: [(0, '0.180')]
[2024-06-06 13:36:38,874][14296] Updated weights for policy 0, policy_version 19837 (0.0031)
[2024-06-06 13:36:41,561][14064] Fps is (10 sec: 45875.3, 60 sec: 47786.5, 300 sec: 47541.4). Total num frames: 325107712. Throughput: 0: 47340.4. Samples: 178146940. Policy #0 lag: (min: 1.0, avg: 8.6, max: 21.0)
[2024-06-06 13:36:41,562][14064] Avg episode reward: [(0, '0.179')]
[2024-06-06 13:36:42,632][14296] Updated weights for policy 0, policy_version 19847 (0.0026)
[2024-06-06 13:36:45,739][14296] Updated weights for policy 0, policy_version 19857 (0.0033)
[2024-06-06 13:36:46,561][14064] Fps is (10 sec: 50790.5, 60 sec: 47513.7, 300 sec: 47541.4). Total num frames: 325353472. Throughput: 0: 47240.9. Samples: 178432420. Policy #0 lag: (min: 1.0, avg: 8.6, max: 21.0)
[2024-06-06 13:36:46,562][14064] Avg episode reward: [(0, '0.174')]
[2024-06-06 13:36:49,525][14296] Updated weights for policy 0, policy_version 19867 (0.0026)
[2024-06-06 13:36:51,561][14064] Fps is (10 sec: 44237.1, 60 sec: 46967.6, 300 sec: 47374.8). Total num frames: 325550080. Throughput: 0: 47559.7. Samples: 178577720. Policy #0 lag: (min: 0.0, avg: 8.7, max: 21.0)
[2024-06-06 13:36:51,562][14064] Avg episode reward: [(0, '0.180')]
[2024-06-06 13:36:52,835][14296] Updated weights for policy 0, policy_version 19877 (0.0026)
[2024-06-06 13:36:56,207][14296] Updated weights for policy 0, policy_version 19887 (0.0031)
[2024-06-06 13:36:56,561][14064] Fps is (10 sec: 47513.8, 60 sec: 48059.7, 300 sec: 47596.9). Total num frames: 325828608. Throughput: 0: 47479.1. Samples: 178868540. Policy #0 lag: (min: 0.0, avg: 8.7, max: 21.0)
[2024-06-06 13:36:56,562][14064] Avg episode reward: [(0, '0.176')]
[2024-06-06 13:36:58,624][14276] Signal inference workers to stop experience collection... (2750 times)
[2024-06-06 13:36:58,677][14296] InferenceWorker_p0-w0: stopping experience collection (2750 times)
[2024-06-06 13:36:58,683][14276] Signal inference workers to resume experience collection... (2750 times)
[2024-06-06 13:36:58,692][14296] InferenceWorker_p0-w0: resuming experience collection (2750 times)
[2024-06-06 13:36:59,892][14296] Updated weights for policy 0, policy_version 19897 (0.0027)
[2024-06-06 13:37:01,561][14064] Fps is (10 sec: 50789.5, 60 sec: 47513.5, 300 sec: 47430.4). Total num frames: 326057984. Throughput: 0: 47188.3. Samples: 179146820. Policy #0 lag: (min: 0.0, avg: 8.7, max: 21.0)
[2024-06-06 13:37:01,562][14064] Avg episode reward: [(0, '0.177')]
[2024-06-06 13:37:03,163][14296] Updated weights for policy 0, policy_version 19907 (0.0033)
[2024-06-06 13:37:06,561][14064] Fps is (10 sec: 47513.7, 60 sec: 47786.8, 300 sec: 47541.4). Total num frames: 326303744. Throughput: 0: 47471.9. Samples: 179292720. Policy #0 lag: (min: 0.0, avg: 8.5, max: 21.0)
[2024-06-06 13:37:06,562][14064] Avg episode reward: [(0, '0.176')]
[2024-06-06 13:37:06,625][14296] Updated weights for policy 0, policy_version 19917 (0.0027)
[2024-06-06 13:37:10,262][14296] Updated weights for policy 0, policy_version 19927 (0.0030)
[2024-06-06 13:37:11,561][14064] Fps is (10 sec: 44237.8, 60 sec: 46421.4, 300 sec: 47374.8). Total num frames: 326500352. Throughput: 0: 47399.5. Samples: 179570780. Policy #0 lag: (min: 0.0, avg: 8.5, max: 21.0)
[2024-06-06 13:37:11,562][14064] Avg episode reward: [(0, '0.182')]
[2024-06-06 13:37:13,683][14296] Updated weights for policy 0, policy_version 19937 (0.0027)
[2024-06-06 13:37:16,561][14064] Fps is (10 sec: 47513.4, 60 sec: 48059.8, 300 sec: 47652.4). Total num frames: 326778880. Throughput: 0: 47542.7. Samples: 179865060. Policy #0 lag: (min: 0.0, avg: 8.5, max: 21.0)
[2024-06-06 13:37:16,562][14064] Avg episode reward: [(0, '0.180')]
[2024-06-06 13:37:16,945][14296] Updated weights for policy 0, policy_version 19947 (0.0029)
[2024-06-06 13:37:20,365][14296] Updated weights for policy 0, policy_version 19957 (0.0027)
[2024-06-06 13:37:21,561][14064] Fps is (10 sec: 54066.9, 60 sec: 47787.6, 300 sec: 47541.4).
Total num frames: 327041024. Throughput: 0: 47581.9. Samples: 180014320. Policy #0 lag: (min: 0.0, avg: 9.6, max: 21.0) [2024-06-06 13:37:21,561][14064] Avg episode reward: [(0, '0.177')] [2024-06-06 13:37:23,738][14296] Updated weights for policy 0, policy_version 19967 (0.0031) [2024-06-06 13:37:26,561][14064] Fps is (10 sec: 45875.2, 60 sec: 47240.4, 300 sec: 47485.8). Total num frames: 327237632. Throughput: 0: 47862.2. Samples: 180300740. Policy #0 lag: (min: 0.0, avg: 9.6, max: 21.0) [2024-06-06 13:37:26,562][14064] Avg episode reward: [(0, '0.180')] [2024-06-06 13:37:27,274][14296] Updated weights for policy 0, policy_version 19977 (0.0034) [2024-06-06 13:37:30,838][14296] Updated weights for policy 0, policy_version 19987 (0.0033) [2024-06-06 13:37:31,561][14064] Fps is (10 sec: 42598.4, 60 sec: 46967.5, 300 sec: 47430.3). Total num frames: 327467008. Throughput: 0: 47681.4. Samples: 180578080. Policy #0 lag: (min: 0.0, avg: 9.6, max: 21.0) [2024-06-06 13:37:31,562][14064] Avg episode reward: [(0, '0.175')] [2024-06-06 13:37:34,543][14296] Updated weights for policy 0, policy_version 19997 (0.0030) [2024-06-06 13:37:36,561][14064] Fps is (10 sec: 45875.4, 60 sec: 47513.7, 300 sec: 47430.3). Total num frames: 327696384. Throughput: 0: 47448.9. Samples: 180712920. Policy #0 lag: (min: 0.0, avg: 7.6, max: 21.0) [2024-06-06 13:37:36,562][14064] Avg episode reward: [(0, '0.180')] [2024-06-06 13:37:38,643][14296] Updated weights for policy 0, policy_version 20007 (0.0031) [2024-06-06 13:37:41,423][14296] Updated weights for policy 0, policy_version 20017 (0.0021) [2024-06-06 13:37:41,561][14064] Fps is (10 sec: 49151.2, 60 sec: 47513.5, 300 sec: 47485.8). Total num frames: 327958528. Throughput: 0: 47230.1. Samples: 180993900. 
Policy #0 lag: (min: 0.0, avg: 7.6, max: 21.0) [2024-06-06 13:37:41,562][14064] Avg episode reward: [(0, '0.182')] [2024-06-06 13:37:45,210][14296] Updated weights for policy 0, policy_version 20027 (0.0031) [2024-06-06 13:37:46,561][14064] Fps is (10 sec: 47513.6, 60 sec: 46967.5, 300 sec: 47374.8). Total num frames: 328171520. Throughput: 0: 47529.1. Samples: 181285620. Policy #0 lag: (min: 0.0, avg: 9.2, max: 22.0) [2024-06-06 13:37:46,562][14064] Avg episode reward: [(0, '0.178')] [2024-06-06 13:37:48,138][14296] Updated weights for policy 0, policy_version 20037 (0.0024) [2024-06-06 13:37:51,561][14064] Fps is (10 sec: 47513.6, 60 sec: 48059.6, 300 sec: 47485.8). Total num frames: 328433664. Throughput: 0: 47294.5. Samples: 181420980. Policy #0 lag: (min: 0.0, avg: 9.2, max: 22.0) [2024-06-06 13:37:51,562][14064] Avg episode reward: [(0, '0.182')] [2024-06-06 13:37:52,475][14296] Updated weights for policy 0, policy_version 20047 (0.0024) [2024-06-06 13:37:53,744][14276] Signal inference workers to stop experience collection... (2800 times) [2024-06-06 13:37:53,797][14296] InferenceWorker_p0-w0: stopping experience collection (2800 times) [2024-06-06 13:37:53,802][14276] Signal inference workers to resume experience collection... (2800 times) [2024-06-06 13:37:53,813][14296] InferenceWorker_p0-w0: resuming experience collection (2800 times) [2024-06-06 13:37:55,065][14296] Updated weights for policy 0, policy_version 20057 (0.0028) [2024-06-06 13:37:56,561][14064] Fps is (10 sec: 49151.6, 60 sec: 47240.5, 300 sec: 47430.3). Total num frames: 328663040. Throughput: 0: 47255.4. Samples: 181697280. Policy #0 lag: (min: 0.0, avg: 9.2, max: 22.0) [2024-06-06 13:37:56,562][14064] Avg episode reward: [(0, '0.177')] [2024-06-06 13:37:59,207][14296] Updated weights for policy 0, policy_version 20067 (0.0028) [2024-06-06 13:38:01,561][14064] Fps is (10 sec: 47513.7, 60 sec: 47513.6, 300 sec: 47485.8). Total num frames: 328908800. Throughput: 0: 47138.6. 
Samples: 181986300. Policy #0 lag: (min: 0.0, avg: 8.5, max: 21.0) [2024-06-06 13:38:01,566][14064] Avg episode reward: [(0, '0.178')] [2024-06-06 13:38:02,268][14296] Updated weights for policy 0, policy_version 20077 (0.0031) [2024-06-06 13:38:06,110][14296] Updated weights for policy 0, policy_version 20087 (0.0032) [2024-06-06 13:38:06,561][14064] Fps is (10 sec: 44237.0, 60 sec: 46694.4, 300 sec: 47374.7). Total num frames: 329105408. Throughput: 0: 46908.4. Samples: 182125200. Policy #0 lag: (min: 0.0, avg: 8.5, max: 21.0) [2024-06-06 13:38:06,562][14064] Avg episode reward: [(0, '0.173')] [2024-06-06 13:38:06,572][14276] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000020087_329105408.pth... [2024-06-06 13:38:06,630][14276] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000019395_317767680.pth [2024-06-06 13:38:09,367][14296] Updated weights for policy 0, policy_version 20097 (0.0035) [2024-06-06 13:38:11,561][14064] Fps is (10 sec: 45875.1, 60 sec: 47786.5, 300 sec: 47541.4). Total num frames: 329367552. Throughput: 0: 46823.0. Samples: 182407780. Policy #0 lag: (min: 0.0, avg: 8.5, max: 21.0) [2024-06-06 13:38:11,562][14064] Avg episode reward: [(0, '0.179')] [2024-06-06 13:38:13,189][14296] Updated weights for policy 0, policy_version 20107 (0.0038) [2024-06-06 13:38:16,017][14296] Updated weights for policy 0, policy_version 20117 (0.0033) [2024-06-06 13:38:16,561][14064] Fps is (10 sec: 54066.6, 60 sec: 47786.6, 300 sec: 47485.8). Total num frames: 329646080. Throughput: 0: 46958.5. Samples: 182691220. Policy #0 lag: (min: 0.0, avg: 9.0, max: 21.0) [2024-06-06 13:38:16,562][14064] Avg episode reward: [(0, '0.180')] [2024-06-06 13:38:20,138][14296] Updated weights for policy 0, policy_version 20127 (0.0026) [2024-06-06 13:38:21,561][14064] Fps is (10 sec: 47514.3, 60 sec: 46694.4, 300 sec: 47485.8). Total num frames: 329842688. Throughput: 0: 47216.0. Samples: 182837640. 
Policy #0 lag: (min: 0.0, avg: 9.0, max: 21.0) [2024-06-06 13:38:21,562][14064] Avg episode reward: [(0, '0.179')] [2024-06-06 13:38:22,790][14296] Updated weights for policy 0, policy_version 20137 (0.0028) [2024-06-06 13:38:26,561][14064] Fps is (10 sec: 42599.0, 60 sec: 47240.6, 300 sec: 47430.7). Total num frames: 330072064. Throughput: 0: 47285.9. Samples: 183121760. Policy #0 lag: (min: 0.0, avg: 9.0, max: 21.0) [2024-06-06 13:38:26,562][14064] Avg episode reward: [(0, '0.184')] [2024-06-06 13:38:26,566][14276] Saving new best policy, reward=0.184! [2024-06-06 13:38:26,903][14296] Updated weights for policy 0, policy_version 20147 (0.0037) [2024-06-06 13:38:29,932][14296] Updated weights for policy 0, policy_version 20157 (0.0029) [2024-06-06 13:38:31,561][14064] Fps is (10 sec: 49152.1, 60 sec: 47786.7, 300 sec: 47485.8). Total num frames: 330334208. Throughput: 0: 47094.2. Samples: 183404860. Policy #0 lag: (min: 1.0, avg: 8.0, max: 21.0) [2024-06-06 13:38:31,562][14064] Avg episode reward: [(0, '0.183')] [2024-06-06 13:38:33,659][14296] Updated weights for policy 0, policy_version 20167 (0.0037) [2024-06-06 13:38:36,561][14064] Fps is (10 sec: 47513.7, 60 sec: 47513.6, 300 sec: 47430.3). Total num frames: 330547200. Throughput: 0: 47398.4. Samples: 183553900. Policy #0 lag: (min: 1.0, avg: 8.0, max: 21.0) [2024-06-06 13:38:36,562][14064] Avg episode reward: [(0, '0.176')] [2024-06-06 13:38:36,886][14296] Updated weights for policy 0, policy_version 20177 (0.0028) [2024-06-06 13:38:40,623][14296] Updated weights for policy 0, policy_version 20187 (0.0030) [2024-06-06 13:38:41,561][14064] Fps is (10 sec: 42598.4, 60 sec: 46694.5, 300 sec: 47263.7). Total num frames: 330760192. Throughput: 0: 47655.6. Samples: 183841780. 
Policy #0 lag: (min: 1.0, avg: 8.0, max: 21.0) [2024-06-06 13:38:41,562][14064] Avg episode reward: [(0, '0.180')] [2024-06-06 13:38:43,668][14296] Updated weights for policy 0, policy_version 20197 (0.0023) [2024-06-06 13:38:46,561][14064] Fps is (10 sec: 47513.7, 60 sec: 47513.6, 300 sec: 47430.3). Total num frames: 331022336. Throughput: 0: 47495.3. Samples: 184123580. Policy #0 lag: (min: 1.0, avg: 9.6, max: 22.0) [2024-06-06 13:38:46,561][14064] Avg episode reward: [(0, '0.184')] [2024-06-06 13:38:47,482][14296] Updated weights for policy 0, policy_version 20207 (0.0025) [2024-06-06 13:38:50,500][14296] Updated weights for policy 0, policy_version 20217 (0.0038) [2024-06-06 13:38:51,561][14064] Fps is (10 sec: 52428.4, 60 sec: 47513.7, 300 sec: 47430.3). Total num frames: 331284480. Throughput: 0: 47623.1. Samples: 184268240. Policy #0 lag: (min: 1.0, avg: 9.6, max: 22.0) [2024-06-06 13:38:51,562][14064] Avg episode reward: [(0, '0.180')] [2024-06-06 13:38:54,169][14296] Updated weights for policy 0, policy_version 20227 (0.0029) [2024-06-06 13:38:56,561][14064] Fps is (10 sec: 49152.0, 60 sec: 47513.7, 300 sec: 47485.8). Total num frames: 331513856. Throughput: 0: 47681.5. Samples: 184553440. Policy #0 lag: (min: 1.0, avg: 9.9, max: 23.0) [2024-06-06 13:38:56,562][14064] Avg episode reward: [(0, '0.179')] [2024-06-06 13:38:57,387][14296] Updated weights for policy 0, policy_version 20237 (0.0025) [2024-06-06 13:38:57,574][14276] Signal inference workers to stop experience collection... (2850 times) [2024-06-06 13:38:57,585][14296] InferenceWorker_p0-w0: stopping experience collection (2850 times) [2024-06-06 13:38:57,681][14276] Signal inference workers to resume experience collection... 
(2850 times) [2024-06-06 13:38:57,681][14296] InferenceWorker_p0-w0: resuming experience collection (2850 times) [2024-06-06 13:39:00,882][14296] Updated weights for policy 0, policy_version 20247 (0.0023) [2024-06-06 13:39:01,561][14064] Fps is (10 sec: 44237.2, 60 sec: 46967.6, 300 sec: 47374.8). Total num frames: 331726848. Throughput: 0: 47675.3. Samples: 184836600. Policy #0 lag: (min: 1.0, avg: 9.9, max: 23.0) [2024-06-06 13:39:01,562][14064] Avg episode reward: [(0, '0.178')] [2024-06-06 13:39:04,459][14296] Updated weights for policy 0, policy_version 20257 (0.0028) [2024-06-06 13:39:06,561][14064] Fps is (10 sec: 45874.4, 60 sec: 47786.6, 300 sec: 47485.8). Total num frames: 331972608. Throughput: 0: 47473.6. Samples: 184973960. Policy #0 lag: (min: 1.0, avg: 9.9, max: 23.0) [2024-06-06 13:39:06,562][14064] Avg episode reward: [(0, '0.183')] [2024-06-06 13:39:07,715][14296] Updated weights for policy 0, policy_version 20267 (0.0026) [2024-06-06 13:39:11,125][14296] Updated weights for policy 0, policy_version 20277 (0.0022) [2024-06-06 13:39:11,561][14064] Fps is (10 sec: 50790.4, 60 sec: 47786.8, 300 sec: 47485.8). Total num frames: 332234752. Throughput: 0: 47764.0. Samples: 185271140. Policy #0 lag: (min: 2.0, avg: 10.4, max: 23.0) [2024-06-06 13:39:11,561][14064] Avg episode reward: [(0, '0.174')] [2024-06-06 13:39:14,633][14296] Updated weights for policy 0, policy_version 20287 (0.0033) [2024-06-06 13:39:16,561][14064] Fps is (10 sec: 47513.9, 60 sec: 46694.4, 300 sec: 47430.3). Total num frames: 332447744. Throughput: 0: 47847.0. Samples: 185557980. Policy #0 lag: (min: 2.0, avg: 10.4, max: 23.0) [2024-06-06 13:39:16,562][14064] Avg episode reward: [(0, '0.167')] [2024-06-06 13:39:17,982][14296] Updated weights for policy 0, policy_version 20297 (0.0024) [2024-06-06 13:39:21,225][14296] Updated weights for policy 0, policy_version 20307 (0.0034) [2024-06-06 13:39:21,561][14064] Fps is (10 sec: 47513.8, 60 sec: 47786.7, 300 sec: 47541.4). 
Total num frames: 332709888. Throughput: 0: 47603.2. Samples: 185696040. Policy #0 lag: (min: 2.0, avg: 10.4, max: 23.0) [2024-06-06 13:39:21,562][14064] Avg episode reward: [(0, '0.175')] [2024-06-06 13:39:24,805][14296] Updated weights for policy 0, policy_version 20317 (0.0026) [2024-06-06 13:39:26,561][14064] Fps is (10 sec: 49152.2, 60 sec: 47786.6, 300 sec: 47485.8). Total num frames: 332939264. Throughput: 0: 47449.3. Samples: 185977000. Policy #0 lag: (min: 0.0, avg: 9.8, max: 21.0) [2024-06-06 13:39:26,562][14064] Avg episode reward: [(0, '0.173')] [2024-06-06 13:39:28,206][14296] Updated weights for policy 0, policy_version 20327 (0.0022) [2024-06-06 13:39:31,561][14064] Fps is (10 sec: 45874.3, 60 sec: 47240.4, 300 sec: 47485.8). Total num frames: 333168640. Throughput: 0: 47766.5. Samples: 186273080. Policy #0 lag: (min: 0.0, avg: 9.8, max: 21.0) [2024-06-06 13:39:31,562][14064] Avg episode reward: [(0, '0.184')] [2024-06-06 13:39:31,786][14296] Updated weights for policy 0, policy_version 20337 (0.0022) [2024-06-06 13:39:34,890][14296] Updated weights for policy 0, policy_version 20347 (0.0031) [2024-06-06 13:39:36,561][14064] Fps is (10 sec: 45875.3, 60 sec: 47513.6, 300 sec: 47374.7). Total num frames: 333398016. Throughput: 0: 47740.9. Samples: 186416580. Policy #0 lag: (min: 0.0, avg: 9.8, max: 21.0) [2024-06-06 13:39:36,562][14064] Avg episode reward: [(0, '0.183')] [2024-06-06 13:39:38,454][14296] Updated weights for policy 0, policy_version 20357 (0.0027) [2024-06-06 13:39:41,561][14064] Fps is (10 sec: 49152.5, 60 sec: 48332.8, 300 sec: 47596.9). Total num frames: 333660160. Throughput: 0: 47714.2. Samples: 186700580. 
Policy #0 lag: (min: 0.0, avg: 10.1, max: 21.0) [2024-06-06 13:39:41,562][14064] Avg episode reward: [(0, '0.179')] [2024-06-06 13:39:42,003][14296] Updated weights for policy 0, policy_version 20367 (0.0029) [2024-06-06 13:39:45,348][14296] Updated weights for policy 0, policy_version 20377 (0.0029) [2024-06-06 13:39:46,561][14064] Fps is (10 sec: 52428.3, 60 sec: 48332.7, 300 sec: 47541.3). Total num frames: 333922304. Throughput: 0: 47674.5. Samples: 186981960. Policy #0 lag: (min: 0.0, avg: 10.1, max: 21.0) [2024-06-06 13:39:46,565][14064] Avg episode reward: [(0, '0.181')] [2024-06-06 13:39:48,924][14296] Updated weights for policy 0, policy_version 20387 (0.0033) [2024-06-06 13:39:51,561][14064] Fps is (10 sec: 45875.2, 60 sec: 47240.6, 300 sec: 47485.8). Total num frames: 334118912. Throughput: 0: 47997.9. Samples: 187133860. Policy #0 lag: (min: 0.0, avg: 10.1, max: 21.0) [2024-06-06 13:39:51,562][14064] Avg episode reward: [(0, '0.180')] [2024-06-06 13:39:52,255][14296] Updated weights for policy 0, policy_version 20397 (0.0035) [2024-06-06 13:39:55,643][14296] Updated weights for policy 0, policy_version 20407 (0.0033) [2024-06-06 13:39:56,561][14064] Fps is (10 sec: 44237.0, 60 sec: 47513.5, 300 sec: 47485.8). Total num frames: 334364672. Throughput: 0: 47676.8. Samples: 187416600. Policy #0 lag: (min: 0.0, avg: 10.7, max: 22.0) [2024-06-06 13:39:56,567][14064] Avg episode reward: [(0, '0.180')] [2024-06-06 13:39:59,028][14296] Updated weights for policy 0, policy_version 20417 (0.0026) [2024-06-06 13:40:01,564][14064] Fps is (10 sec: 50777.1, 60 sec: 48330.6, 300 sec: 47596.5). Total num frames: 334626816. Throughput: 0: 47842.6. Samples: 187711020. 
Policy #0 lag: (min: 0.0, avg: 10.7, max: 22.0) [2024-06-06 13:40:01,565][14064] Avg episode reward: [(0, '0.176')] [2024-06-06 13:40:02,357][14296] Updated weights for policy 0, policy_version 20427 (0.0032) [2024-06-06 13:40:05,746][14296] Updated weights for policy 0, policy_version 20437 (0.0024) [2024-06-06 13:40:06,561][14064] Fps is (10 sec: 50790.8, 60 sec: 48332.9, 300 sec: 47597.3). Total num frames: 334872576. Throughput: 0: 48045.3. Samples: 187858080. Policy #0 lag: (min: 0.0, avg: 11.3, max: 21.0) [2024-06-06 13:40:06,562][14064] Avg episode reward: [(0, '0.184')] [2024-06-06 13:40:06,664][14276] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000020440_334888960.pth... [2024-06-06 13:40:06,710][14276] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000019742_323452928.pth [2024-06-06 13:40:09,295][14296] Updated weights for policy 0, policy_version 20447 (0.0023) [2024-06-06 13:40:11,561][14064] Fps is (10 sec: 47526.3, 60 sec: 47786.7, 300 sec: 47485.8). Total num frames: 335101952. Throughput: 0: 48217.4. Samples: 188146780. Policy #0 lag: (min: 0.0, avg: 11.3, max: 21.0) [2024-06-06 13:40:11,562][14064] Avg episode reward: [(0, '0.182')] [2024-06-06 13:40:12,158][14276] Signal inference workers to stop experience collection... (2900 times) [2024-06-06 13:40:12,158][14276] Signal inference workers to resume experience collection... (2900 times) [2024-06-06 13:40:12,171][14296] InferenceWorker_p0-w0: stopping experience collection (2900 times) [2024-06-06 13:40:12,171][14296] InferenceWorker_p0-w0: resuming experience collection (2900 times) [2024-06-06 13:40:12,702][14296] Updated weights for policy 0, policy_version 20457 (0.0035) [2024-06-06 13:40:16,438][14296] Updated weights for policy 0, policy_version 20467 (0.0022) [2024-06-06 13:40:16,561][14064] Fps is (10 sec: 45875.0, 60 sec: 48059.8, 300 sec: 47596.9). Total num frames: 335331328. Throughput: 0: 47718.3. Samples: 188420400. 
Policy #0 lag: (min: 0.0, avg: 11.3, max: 21.0) [2024-06-06 13:40:16,562][14064] Avg episode reward: [(0, '0.183')] [2024-06-06 13:40:19,740][14296] Updated weights for policy 0, policy_version 20477 (0.0031) [2024-06-06 13:40:21,561][14064] Fps is (10 sec: 47513.8, 60 sec: 47786.7, 300 sec: 47541.8). Total num frames: 335577088. Throughput: 0: 47671.6. Samples: 188561800. Policy #0 lag: (min: 0.0, avg: 10.8, max: 23.0) [2024-06-06 13:40:21,562][14064] Avg episode reward: [(0, '0.171')] [2024-06-06 13:40:23,231][14296] Updated weights for policy 0, policy_version 20487 (0.0035) [2024-06-06 13:40:26,561][14064] Fps is (10 sec: 47513.1, 60 sec: 47786.6, 300 sec: 47541.4). Total num frames: 335806464. Throughput: 0: 47740.3. Samples: 188848900. Policy #0 lag: (min: 0.0, avg: 10.8, max: 23.0) [2024-06-06 13:40:26,562][14064] Avg episode reward: [(0, '0.178')] [2024-06-06 13:40:26,692][14296] Updated weights for policy 0, policy_version 20497 (0.0028) [2024-06-06 13:40:29,873][14296] Updated weights for policy 0, policy_version 20507 (0.0022) [2024-06-06 13:40:31,561][14064] Fps is (10 sec: 45874.3, 60 sec: 47786.7, 300 sec: 47541.3). Total num frames: 336035840. Throughput: 0: 47970.6. Samples: 189140640. Policy #0 lag: (min: 0.0, avg: 10.8, max: 23.0) [2024-06-06 13:40:31,562][14064] Avg episode reward: [(0, '0.177')] [2024-06-06 13:40:33,379][14296] Updated weights for policy 0, policy_version 20517 (0.0038) [2024-06-06 13:40:36,561][14064] Fps is (10 sec: 47514.4, 60 sec: 48059.8, 300 sec: 47596.9). Total num frames: 336281600. Throughput: 0: 47585.4. Samples: 189275200. Policy #0 lag: (min: 0.0, avg: 10.5, max: 22.0) [2024-06-06 13:40:36,562][14064] Avg episode reward: [(0, '0.184')] [2024-06-06 13:40:36,858][14296] Updated weights for policy 0, policy_version 20527 (0.0032) [2024-06-06 13:40:40,323][14296] Updated weights for policy 0, policy_version 20537 (0.0027) [2024-06-06 13:40:41,561][14064] Fps is (10 sec: 45875.6, 60 sec: 47240.5, 300 sec: 47430.3). 
Total num frames: 336494592. Throughput: 0: 47568.9. Samples: 189557200. Policy #0 lag: (min: 0.0, avg: 10.5, max: 22.0) [2024-06-06 13:40:41,562][14064] Avg episode reward: [(0, '0.183')] [2024-06-06 13:40:43,818][14296] Updated weights for policy 0, policy_version 20547 (0.0047) [2024-06-06 13:40:46,561][14064] Fps is (10 sec: 45875.2, 60 sec: 46967.6, 300 sec: 47485.9). Total num frames: 336740352. Throughput: 0: 47446.4. Samples: 189845980. Policy #0 lag: (min: 0.0, avg: 10.5, max: 22.0) [2024-06-06 13:40:46,561][14064] Avg episode reward: [(0, '0.184')] [2024-06-06 13:40:47,456][14296] Updated weights for policy 0, policy_version 20557 (0.0028) [2024-06-06 13:40:50,506][14296] Updated weights for policy 0, policy_version 20567 (0.0036) [2024-06-06 13:40:51,561][14064] Fps is (10 sec: 47513.2, 60 sec: 47513.5, 300 sec: 47541.3). Total num frames: 336969728. Throughput: 0: 47275.4. Samples: 189985480. Policy #0 lag: (min: 1.0, avg: 11.3, max: 22.0) [2024-06-06 13:40:51,562][14064] Avg episode reward: [(0, '0.180')] [2024-06-06 13:40:54,194][14296] Updated weights for policy 0, policy_version 20577 (0.0032) [2024-06-06 13:40:56,561][14064] Fps is (10 sec: 49151.7, 60 sec: 47786.7, 300 sec: 47541.4). Total num frames: 337231872. Throughput: 0: 47306.2. Samples: 190275560. Policy #0 lag: (min: 1.0, avg: 11.3, max: 22.0) [2024-06-06 13:40:56,562][14064] Avg episode reward: [(0, '0.182')] [2024-06-06 13:40:57,208][14296] Updated weights for policy 0, policy_version 20587 (0.0033) [2024-06-06 13:41:00,904][14296] Updated weights for policy 0, policy_version 20597 (0.0036) [2024-06-06 13:41:01,561][14064] Fps is (10 sec: 49151.9, 60 sec: 47242.5, 300 sec: 47541.4). Total num frames: 337461248. Throughput: 0: 47611.0. Samples: 190562900. 
Policy #0 lag: (min: 1.0, avg: 11.3, max: 22.0) [2024-06-06 13:41:01,562][14064] Avg episode reward: [(0, '0.182')] [2024-06-06 13:41:04,252][14296] Updated weights for policy 0, policy_version 20607 (0.0023) [2024-06-06 13:41:06,561][14064] Fps is (10 sec: 47513.7, 60 sec: 47240.5, 300 sec: 47430.3). Total num frames: 337707008. Throughput: 0: 47482.1. Samples: 190698500. Policy #0 lag: (min: 0.0, avg: 10.5, max: 21.0) [2024-06-06 13:41:06,567][14064] Avg episode reward: [(0, '0.180')] [2024-06-06 13:41:07,882][14296] Updated weights for policy 0, policy_version 20617 (0.0035) [2024-06-06 13:41:09,258][14276] Signal inference workers to stop experience collection... (2950 times) [2024-06-06 13:41:09,283][14296] InferenceWorker_p0-w0: stopping experience collection (2950 times) [2024-06-06 13:41:09,318][14276] Signal inference workers to resume experience collection... (2950 times) [2024-06-06 13:41:09,319][14296] InferenceWorker_p0-w0: resuming experience collection (2950 times) [2024-06-06 13:41:11,288][14296] Updated weights for policy 0, policy_version 20627 (0.0038) [2024-06-06 13:41:11,561][14064] Fps is (10 sec: 49152.8, 60 sec: 47513.6, 300 sec: 47652.5). Total num frames: 337952768. Throughput: 0: 47507.3. Samples: 190986720. Policy #0 lag: (min: 0.0, avg: 10.5, max: 21.0) [2024-06-06 13:41:11,562][14064] Avg episode reward: [(0, '0.181')] [2024-06-06 13:41:15,170][14296] Updated weights for policy 0, policy_version 20637 (0.0038) [2024-06-06 13:41:16,567][14064] Fps is (10 sec: 49124.2, 60 sec: 47782.2, 300 sec: 47540.6). Total num frames: 338198528. Throughput: 0: 47396.8. Samples: 191273760. Policy #0 lag: (min: 0.0, avg: 10.5, max: 21.0) [2024-06-06 13:41:16,567][14064] Avg episode reward: [(0, '0.183')] [2024-06-06 13:41:17,966][14296] Updated weights for policy 0, policy_version 20647 (0.0024) [2024-06-06 13:41:21,561][14064] Fps is (10 sec: 47513.6, 60 sec: 47513.6, 300 sec: 47541.4). Total num frames: 338427904. Throughput: 0: 47663.1. 
Samples: 191420040. Policy #0 lag: (min: 0.0, avg: 12.1, max: 21.0) [2024-06-06 13:41:21,561][14064] Avg episode reward: [(0, '0.180')] [2024-06-06 13:41:21,826][14296] Updated weights for policy 0, policy_version 20657 (0.0036) [2024-06-06 13:41:24,840][14296] Updated weights for policy 0, policy_version 20667 (0.0018) [2024-06-06 13:41:26,561][14064] Fps is (10 sec: 42621.9, 60 sec: 46967.5, 300 sec: 47374.7). Total num frames: 338624512. Throughput: 0: 47702.1. Samples: 191703800. Policy #0 lag: (min: 0.0, avg: 12.1, max: 21.0) [2024-06-06 13:41:26,562][14064] Avg episode reward: [(0, '0.187')] [2024-06-06 13:41:26,609][14276] Saving new best policy, reward=0.187! [2024-06-06 13:41:28,501][14296] Updated weights for policy 0, policy_version 20677 (0.0034) [2024-06-06 13:41:31,561][14064] Fps is (10 sec: 47513.4, 60 sec: 47786.8, 300 sec: 47652.5). Total num frames: 338903040. Throughput: 0: 47595.1. Samples: 191987760. Policy #0 lag: (min: 0.0, avg: 12.1, max: 21.0) [2024-06-06 13:41:31,562][14064] Avg episode reward: [(0, '0.185')] [2024-06-06 13:41:31,877][14296] Updated weights for policy 0, policy_version 20687 (0.0027) [2024-06-06 13:41:35,499][14296] Updated weights for policy 0, policy_version 20697 (0.0023) [2024-06-06 13:41:36,561][14064] Fps is (10 sec: 50790.5, 60 sec: 47513.5, 300 sec: 47541.4). Total num frames: 339132416. Throughput: 0: 47798.7. Samples: 192136420. Policy #0 lag: (min: 0.0, avg: 9.7, max: 20.0) [2024-06-06 13:41:36,562][14064] Avg episode reward: [(0, '0.183')] [2024-06-06 13:41:38,583][14296] Updated weights for policy 0, policy_version 20707 (0.0027) [2024-06-06 13:41:41,561][14064] Fps is (10 sec: 47513.6, 60 sec: 48059.8, 300 sec: 47541.4). Total num frames: 339378176. Throughput: 0: 47762.7. Samples: 192424880. 
Policy #0 lag: (min: 0.0, avg: 9.7, max: 20.0) [2024-06-06 13:41:41,562][14064] Avg episode reward: [(0, '0.175')] [2024-06-06 13:41:42,470][14296] Updated weights for policy 0, policy_version 20717 (0.0026) [2024-06-06 13:41:45,480][14296] Updated weights for policy 0, policy_version 20727 (0.0031) [2024-06-06 13:41:46,561][14064] Fps is (10 sec: 47513.9, 60 sec: 47786.6, 300 sec: 47652.4). Total num frames: 339607552. Throughput: 0: 47653.0. Samples: 192707280. Policy #0 lag: (min: 0.0, avg: 9.7, max: 20.0) [2024-06-06 13:41:46,562][14064] Avg episode reward: [(0, '0.185')] [2024-06-06 13:41:49,156][14296] Updated weights for policy 0, policy_version 20737 (0.0028) [2024-06-06 13:41:51,561][14064] Fps is (10 sec: 47513.1, 60 sec: 48059.7, 300 sec: 47541.4). Total num frames: 339853312. Throughput: 0: 47803.0. Samples: 192849640. Policy #0 lag: (min: 0.0, avg: 9.6, max: 21.0) [2024-06-06 13:41:51,563][14064] Avg episode reward: [(0, '0.187')] [2024-06-06 13:41:52,394][14296] Updated weights for policy 0, policy_version 20747 (0.0026) [2024-06-06 13:41:55,967][14296] Updated weights for policy 0, policy_version 20757 (0.0027) [2024-06-06 13:41:56,561][14064] Fps is (10 sec: 49151.9, 60 sec: 47786.6, 300 sec: 47596.9). Total num frames: 340099072. Throughput: 0: 47710.5. Samples: 193133700. Policy #0 lag: (min: 0.0, avg: 9.6, max: 21.0) [2024-06-06 13:41:56,562][14064] Avg episode reward: [(0, '0.183')] [2024-06-06 13:41:59,524][14296] Updated weights for policy 0, policy_version 20767 (0.0034) [2024-06-06 13:42:01,561][14064] Fps is (10 sec: 45876.0, 60 sec: 47513.8, 300 sec: 47485.8). Total num frames: 340312064. Throughput: 0: 47823.4. Samples: 193425540. 
Policy #0 lag: (min: 0.0, avg: 9.6, max: 21.0)
[2024-06-06 13:42:01,561][14064] Avg episode reward: [(0, '0.184')]
[2024-06-06 13:42:02,804][14296] Updated weights for policy 0, policy_version 20777 (0.0035)
[2024-06-06 13:42:06,169][14296] Updated weights for policy 0, policy_version 20787 (0.0033)
[2024-06-06 13:42:06,561][14064] Fps is (10 sec: 47513.6, 60 sec: 47786.6, 300 sec: 47708.0). Total num frames: 340574208. Throughput: 0: 47629.7. Samples: 193563380. Policy #0 lag: (min: 0.0, avg: 9.5, max: 22.0)
[2024-06-06 13:42:06,562][14064] Avg episode reward: [(0, '0.184')]
[2024-06-06 13:42:06,568][14276] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000020787_340574208.pth...
[2024-06-06 13:42:06,610][14276] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000020087_329105408.pth
[2024-06-06 13:42:09,824][14296] Updated weights for policy 0, policy_version 20797 (0.0029)
[2024-06-06 13:42:11,561][14064] Fps is (10 sec: 49151.2, 60 sec: 47513.5, 300 sec: 47541.4). Total num frames: 340803584. Throughput: 0: 47589.8. Samples: 193845340. Policy #0 lag: (min: 0.0, avg: 9.5, max: 22.0)
[2024-06-06 13:42:11,562][14064] Avg episode reward: [(0, '0.187')]
[2024-06-06 13:42:13,364][14296] Updated weights for policy 0, policy_version 20807 (0.0033)
[2024-06-06 13:42:16,561][14064] Fps is (10 sec: 47514.1, 60 sec: 47518.1, 300 sec: 47485.8). Total num frames: 341049344. Throughput: 0: 47631.1. Samples: 194131160. Policy #0 lag: (min: 0.0, avg: 10.2, max: 21.0)
[2024-06-06 13:42:16,561][14064] Avg episode reward: [(0, '0.181')]
[2024-06-06 13:42:16,643][14296] Updated weights for policy 0, policy_version 20817 (0.0029)
[2024-06-06 13:42:20,374][14296] Updated weights for policy 0, policy_version 20827 (0.0029)
[2024-06-06 13:42:21,561][14064] Fps is (10 sec: 44236.8, 60 sec: 46967.4, 300 sec: 47485.8). Total num frames: 341245952. Throughput: 0: 47483.2. Samples: 194273160. Policy #0 lag: (min: 0.0, avg: 10.2, max: 21.0)
[2024-06-06 13:42:21,562][14064] Avg episode reward: [(0, '0.183')]
[2024-06-06 13:42:23,490][14296] Updated weights for policy 0, policy_version 20837 (0.0042)
[2024-06-06 13:42:25,073][14276] Signal inference workers to stop experience collection... (3000 times)
[2024-06-06 13:42:25,074][14276] Signal inference workers to resume experience collection... (3000 times)
[2024-06-06 13:42:25,101][14296] InferenceWorker_p0-w0: stopping experience collection (3000 times)
[2024-06-06 13:42:25,101][14296] InferenceWorker_p0-w0: resuming experience collection (3000 times)
[2024-06-06 13:42:26,561][14064] Fps is (10 sec: 47512.5, 60 sec: 48332.8, 300 sec: 47652.4). Total num frames: 341524480. Throughput: 0: 47312.2. Samples: 194553940. Policy #0 lag: (min: 0.0, avg: 10.2, max: 21.0)
[2024-06-06 13:42:26,562][14064] Avg episode reward: [(0, '0.180')]
[2024-06-06 13:42:27,134][14296] Updated weights for policy 0, policy_version 20847 (0.0034)
[2024-06-06 13:42:30,371][14296] Updated weights for policy 0, policy_version 20857 (0.0033)
[2024-06-06 13:42:31,561][14064] Fps is (10 sec: 50790.4, 60 sec: 47513.5, 300 sec: 47652.4). Total num frames: 341753856. Throughput: 0: 47470.2. Samples: 194843440. Policy #0 lag: (min: 0.0, avg: 11.2, max: 21.0)
[2024-06-06 13:42:31,562][14064] Avg episode reward: [(0, '0.183')]
[2024-06-06 13:42:33,775][14296] Updated weights for policy 0, policy_version 20867 (0.0027)
[2024-06-06 13:42:36,561][14064] Fps is (10 sec: 45875.9, 60 sec: 47513.7, 300 sec: 47541.4). Total num frames: 341983232. Throughput: 0: 47501.4. Samples: 194987200. Policy #0 lag: (min: 0.0, avg: 11.2, max: 21.0)
[2024-06-06 13:42:36,562][14064] Avg episode reward: [(0, '0.186')]
[2024-06-06 13:42:37,511][14296] Updated weights for policy 0, policy_version 20877 (0.0039)
[2024-06-06 13:42:40,870][14296] Updated weights for policy 0, policy_version 20887 (0.0031)
[2024-06-06 13:42:41,561][14064] Fps is (10 sec: 47513.0, 60 sec: 47513.4, 300 sec: 47652.4). Total num frames: 342228992. Throughput: 0: 47448.3. Samples: 195268880. Policy #0 lag: (min: 0.0, avg: 11.2, max: 21.0)
[2024-06-06 13:42:41,562][14064] Avg episode reward: [(0, '0.187')]
[2024-06-06 13:42:44,551][14296] Updated weights for policy 0, policy_version 20897 (0.0029)
[2024-06-06 13:42:46,561][14064] Fps is (10 sec: 49152.1, 60 sec: 47786.7, 300 sec: 47596.9). Total num frames: 342474752. Throughput: 0: 47447.0. Samples: 195560660. Policy #0 lag: (min: 0.0, avg: 10.1, max: 22.0)
[2024-06-06 13:42:46,562][14064] Avg episode reward: [(0, '0.189')]
[2024-06-06 13:42:46,572][14276] Saving new best policy, reward=0.189!
[2024-06-06 13:42:47,930][14296] Updated weights for policy 0, policy_version 20907 (0.0029)
[2024-06-06 13:42:51,365][14296] Updated weights for policy 0, policy_version 20917 (0.0029)
[2024-06-06 13:42:51,561][14064] Fps is (10 sec: 47514.6, 60 sec: 47513.7, 300 sec: 47596.9). Total num frames: 342704128. Throughput: 0: 47722.8. Samples: 195710900. Policy #0 lag: (min: 0.0, avg: 10.1, max: 22.0)
[2024-06-06 13:42:51,562][14064] Avg episode reward: [(0, '0.188')]
[2024-06-06 13:42:54,644][14296] Updated weights for policy 0, policy_version 20927 (0.0038)
[2024-06-06 13:42:56,561][14064] Fps is (10 sec: 44236.9, 60 sec: 46967.5, 300 sec: 47485.9). Total num frames: 342917120. Throughput: 0: 47575.2. Samples: 195986220. Policy #0 lag: (min: 0.0, avg: 10.1, max: 22.0)
[2024-06-06 13:42:56,562][14064] Avg episode reward: [(0, '0.175')]
[2024-06-06 13:42:58,302][14296] Updated weights for policy 0, policy_version 20937 (0.0025)
[2024-06-06 13:43:01,363][14296] Updated weights for policy 0, policy_version 20947 (0.0019)
[2024-06-06 13:43:01,562][14064] Fps is (10 sec: 49149.3, 60 sec: 48059.2, 300 sec: 47763.4). Total num frames: 343195648. Throughput: 0: 47420.3. Samples: 196265100. Policy #0 lag: (min: 0.0, avg: 10.1, max: 20.0)
[2024-06-06 13:43:01,562][14064] Avg episode reward: [(0, '0.188')]
[2024-06-06 13:43:05,252][14296] Updated weights for policy 0, policy_version 20957 (0.0039)
[2024-06-06 13:43:06,561][14064] Fps is (10 sec: 50789.8, 60 sec: 47513.6, 300 sec: 47652.5). Total num frames: 343425024. Throughput: 0: 47585.7. Samples: 196414520. Policy #0 lag: (min: 0.0, avg: 10.1, max: 20.0)
[2024-06-06 13:43:06,562][14064] Avg episode reward: [(0, '0.183')]
[2024-06-06 13:43:08,542][14296] Updated weights for policy 0, policy_version 20967 (0.0046)
[2024-06-06 13:43:11,561][14064] Fps is (10 sec: 45877.2, 60 sec: 47513.6, 300 sec: 47485.8). Total num frames: 343654400. Throughput: 0: 47688.1. Samples: 196699900. Policy #0 lag: (min: 0.0, avg: 10.1, max: 20.0)
[2024-06-06 13:43:11,562][14064] Avg episode reward: [(0, '0.180')]
[2024-06-06 13:43:12,113][14296] Updated weights for policy 0, policy_version 20977 (0.0030)
[2024-06-06 13:43:15,734][14296] Updated weights for policy 0, policy_version 20987 (0.0022)
[2024-06-06 13:43:16,561][14064] Fps is (10 sec: 44237.7, 60 sec: 46967.5, 300 sec: 47541.4). Total num frames: 343867392. Throughput: 0: 47545.5. Samples: 196982980. Policy #0 lag: (min: 1.0, avg: 11.1, max: 22.0)
[2024-06-06 13:43:16,561][14064] Avg episode reward: [(0, '0.183')]
[2024-06-06 13:43:19,085][14296] Updated weights for policy 0, policy_version 20997 (0.0026)
[2024-06-06 13:43:21,561][14064] Fps is (10 sec: 47514.0, 60 sec: 48059.8, 300 sec: 47652.4). Total num frames: 344129536. Throughput: 0: 47364.5. Samples: 197118600. Policy #0 lag: (min: 1.0, avg: 11.1, max: 22.0)
[2024-06-06 13:43:21,568][14064] Avg episode reward: [(0, '0.186')]
[2024-06-06 13:43:22,624][14296] Updated weights for policy 0, policy_version 21007 (0.0033)
[2024-06-06 13:43:25,929][14296] Updated weights for policy 0, policy_version 21017 (0.0027)
[2024-06-06 13:43:26,561][14064] Fps is (10 sec: 49151.3, 60 sec: 47240.7, 300 sec: 47541.4). Total num frames: 344358912. Throughput: 0: 47409.9. Samples: 197402320. Policy #0 lag: (min: 1.0, avg: 11.1, max: 22.0)
[2024-06-06 13:43:26,562][14064] Avg episode reward: [(0, '0.182')]
[2024-06-06 13:43:29,433][14296] Updated weights for policy 0, policy_version 21027 (0.0039)
[2024-06-06 13:43:31,561][14064] Fps is (10 sec: 45875.4, 60 sec: 47240.6, 300 sec: 47596.9). Total num frames: 344588288. Throughput: 0: 47192.9. Samples: 197684340. Policy #0 lag: (min: 0.0, avg: 11.1, max: 21.0)
[2024-06-06 13:43:31,562][14064] Avg episode reward: [(0, '0.179')]
[2024-06-06 13:43:32,031][14276] Signal inference workers to stop experience collection... (3050 times)
[2024-06-06 13:43:32,051][14296] InferenceWorker_p0-w0: stopping experience collection (3050 times)
[2024-06-06 13:43:32,138][14276] Signal inference workers to resume experience collection... (3050 times)
[2024-06-06 13:43:32,138][14296] InferenceWorker_p0-w0: resuming experience collection (3050 times)
[2024-06-06 13:43:33,047][14296] Updated weights for policy 0, policy_version 21037 (0.0025)
[2024-06-06 13:43:36,561][14064] Fps is (10 sec: 45875.6, 60 sec: 47240.6, 300 sec: 47652.5). Total num frames: 344817664. Throughput: 0: 46903.6. Samples: 197821560. Policy #0 lag: (min: 0.0, avg: 11.1, max: 21.0)
[2024-06-06 13:43:36,562][14064] Avg episode reward: [(0, '0.185')]
[2024-06-06 13:43:36,573][14296] Updated weights for policy 0, policy_version 21047 (0.0026)
[2024-06-06 13:43:39,830][14296] Updated weights for policy 0, policy_version 21057 (0.0039)
[2024-06-06 13:43:41,561][14064] Fps is (10 sec: 45874.5, 60 sec: 46967.5, 300 sec: 47541.3). Total num frames: 345047040. Throughput: 0: 47176.8. Samples: 198109180. Policy #0 lag: (min: 0.0, avg: 11.1, max: 21.0)
[2024-06-06 13:43:41,562][14064] Avg episode reward: [(0, '0.186')]
[2024-06-06 13:43:43,679][14296] Updated weights for policy 0, policy_version 21067 (0.0027)
[2024-06-06 13:43:46,564][14064] Fps is (10 sec: 49140.8, 60 sec: 47238.8, 300 sec: 47541.0). Total num frames: 345309184. Throughput: 0: 47088.5. Samples: 198384160. Policy #0 lag: (min: 0.0, avg: 10.3, max: 22.0)
[2024-06-06 13:43:46,564][14064] Avg episode reward: [(0, '0.183')]
[2024-06-06 13:43:46,850][14296] Updated weights for policy 0, policy_version 21077 (0.0027)
[2024-06-06 13:43:50,440][14296] Updated weights for policy 0, policy_version 21087 (0.0026)
[2024-06-06 13:43:51,561][14064] Fps is (10 sec: 45875.3, 60 sec: 46694.3, 300 sec: 47430.3). Total num frames: 345505792. Throughput: 0: 46993.3. Samples: 198529220. Policy #0 lag: (min: 0.0, avg: 10.3, max: 22.0)
[2024-06-06 13:43:51,567][14064] Avg episode reward: [(0, '0.180')]
[2024-06-06 13:43:53,734][14296] Updated weights for policy 0, policy_version 21097 (0.0030)
[2024-06-06 13:43:56,561][14064] Fps is (10 sec: 47523.3, 60 sec: 47786.5, 300 sec: 47652.4). Total num frames: 345784320. Throughput: 0: 46933.2. Samples: 198811900. Policy #0 lag: (min: 0.0, avg: 10.3, max: 22.0)
[2024-06-06 13:43:56,562][14064] Avg episode reward: [(0, '0.182')]
[2024-06-06 13:43:57,329][14296] Updated weights for policy 0, policy_version 21107 (0.0031)
[2024-06-06 13:44:01,250][14296] Updated weights for policy 0, policy_version 21117 (0.0031)
[2024-06-06 13:44:01,561][14064] Fps is (10 sec: 49152.2, 60 sec: 46694.8, 300 sec: 47541.4). Total num frames: 345997312. Throughput: 0: 46953.2. Samples: 199095880. Policy #0 lag: (min: 0.0, avg: 11.9, max: 22.0)
[2024-06-06 13:44:01,562][14064] Avg episode reward: [(0, '0.182')]
[2024-06-06 13:44:04,332][14296] Updated weights for policy 0, policy_version 21127 (0.0041)
[2024-06-06 13:44:06,561][14064] Fps is (10 sec: 45875.7, 60 sec: 46967.5, 300 sec: 47485.8). Total num frames: 346243072. Throughput: 0: 47089.7. Samples: 199237640. Policy #0 lag: (min: 0.0, avg: 11.9, max: 22.0)
[2024-06-06 13:44:06,562][14064] Avg episode reward: [(0, '0.186')]
[2024-06-06 13:44:06,570][14276] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000021133_346243072.pth...
[2024-06-06 13:44:06,637][14276] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000020440_334888960.pth
[2024-06-06 13:44:07,767][14296] Updated weights for policy 0, policy_version 21137 (0.0024)
[2024-06-06 13:44:11,175][14296] Updated weights for policy 0, policy_version 21147 (0.0029)
[2024-06-06 13:44:11,561][14064] Fps is (10 sec: 47513.8, 60 sec: 46967.5, 300 sec: 47541.4). Total num frames: 346472448. Throughput: 0: 47131.1. Samples: 199523220. Policy #0 lag: (min: 0.0, avg: 11.9, max: 22.0)
[2024-06-06 13:44:11,562][14064] Avg episode reward: [(0, '0.182')]
[2024-06-06 13:44:14,472][14296] Updated weights for policy 0, policy_version 21157 (0.0026)
[2024-06-06 13:44:16,561][14064] Fps is (10 sec: 49151.3, 60 sec: 47786.4, 300 sec: 47541.3). Total num frames: 346734592. Throughput: 0: 47317.5. Samples: 199813640. Policy #0 lag: (min: 1.0, avg: 10.1, max: 23.0)
[2024-06-06 13:44:16,562][14064] Avg episode reward: [(0, '0.187')]
[2024-06-06 13:44:17,849][14296] Updated weights for policy 0, policy_version 21167 (0.0032)
[2024-06-06 13:44:21,557][14296] Updated weights for policy 0, policy_version 21177 (0.0027)
[2024-06-06 13:44:21,561][14064] Fps is (10 sec: 49152.2, 60 sec: 47240.5, 300 sec: 47541.4). Total num frames: 346963968. Throughput: 0: 47556.4. Samples: 199961600. Policy #0 lag: (min: 1.0, avg: 10.1, max: 23.0)
[2024-06-06 13:44:21,562][14064] Avg episode reward: [(0, '0.186')]
[2024-06-06 13:44:24,667][14296] Updated weights for policy 0, policy_version 21187 (0.0049)
[2024-06-06 13:44:26,561][14064] Fps is (10 sec: 44237.1, 60 sec: 46967.4, 300 sec: 47485.8). Total num frames: 347176960. Throughput: 0: 47330.6. Samples: 200239060. Policy #0 lag: (min: 1.0, avg: 10.1, max: 23.0)
[2024-06-06 13:44:26,562][14064] Avg episode reward: [(0, '0.186')]
[2024-06-06 13:44:28,550][14296] Updated weights for policy 0, policy_version 21197 (0.0035)
[2024-06-06 13:44:31,561][14064] Fps is (10 sec: 47513.7, 60 sec: 47513.6, 300 sec: 47596.9). Total num frames: 347439104. Throughput: 0: 47536.1. Samples: 200523180. Policy #0 lag: (min: 0.0, avg: 10.5, max: 24.0)
[2024-06-06 13:44:31,561][14064] Avg episode reward: [(0, '0.180')]
[2024-06-06 13:44:31,582][14296] Updated weights for policy 0, policy_version 21207 (0.0038)
[2024-06-06 13:44:35,276][14296] Updated weights for policy 0, policy_version 21217 (0.0039)
[2024-06-06 13:44:36,561][14064] Fps is (10 sec: 49153.1, 60 sec: 47513.6, 300 sec: 47485.8). Total num frames: 347668480. Throughput: 0: 47522.9. Samples: 200667740. Policy #0 lag: (min: 0.0, avg: 10.5, max: 24.0)
[2024-06-06 13:44:36,562][14064] Avg episode reward: [(0, '0.177')]
[2024-06-06 13:44:38,670][14296] Updated weights for policy 0, policy_version 21227 (0.0037)
[2024-06-06 13:44:41,561][14064] Fps is (10 sec: 45875.3, 60 sec: 47513.7, 300 sec: 47374.8). Total num frames: 347897856. Throughput: 0: 47535.3. Samples: 200950980. Policy #0 lag: (min: 0.0, avg: 10.5, max: 24.0)
[2024-06-06 13:44:41,562][14064] Avg episode reward: [(0, '0.188')]
[2024-06-06 13:44:42,121][14296] Updated weights for policy 0, policy_version 21237 (0.0031)
[2024-06-06 13:44:45,535][14296] Updated weights for policy 0, policy_version 21247 (0.0034)
[2024-06-06 13:44:46,564][14064] Fps is (10 sec: 45862.7, 60 sec: 46967.1, 300 sec: 47485.4). Total num frames: 348127232. Throughput: 0: 47537.7. Samples: 201235200. Policy #0 lag: (min: 0.0, avg: 10.7, max: 20.0)
[2024-06-06 13:44:46,565][14064] Avg episode reward: [(0, '0.184')]
[2024-06-06 13:44:49,054][14296] Updated weights for policy 0, policy_version 21257 (0.0023)
[2024-06-06 13:44:50,083][14276] Signal inference workers to stop experience collection... (3100 times)
[2024-06-06 13:44:50,084][14276] Signal inference workers to resume experience collection... (3100 times)
[2024-06-06 13:44:50,127][14296] InferenceWorker_p0-w0: stopping experience collection (3100 times)
[2024-06-06 13:44:50,127][14296] InferenceWorker_p0-w0: resuming experience collection (3100 times)
[2024-06-06 13:44:51,561][14064] Fps is (10 sec: 49152.2, 60 sec: 48059.9, 300 sec: 47541.4). Total num frames: 348389376. Throughput: 0: 47523.7. Samples: 201376200. Policy #0 lag: (min: 0.0, avg: 10.7, max: 20.0)
[2024-06-06 13:44:51,561][14064] Avg episode reward: [(0, '0.191')]
[2024-06-06 13:44:51,562][14276] Saving new best policy, reward=0.191!
[2024-06-06 13:44:52,350][14296] Updated weights for policy 0, policy_version 21267 (0.0037)
[2024-06-06 13:44:56,135][14296] Updated weights for policy 0, policy_version 21277 (0.0034)
[2024-06-06 13:44:56,561][14064] Fps is (10 sec: 49165.1, 60 sec: 47240.7, 300 sec: 47430.7). Total num frames: 348618752. Throughput: 0: 47561.8. Samples: 201663500. Policy #0 lag: (min: 0.0, avg: 10.7, max: 20.0)
[2024-06-06 13:44:56,562][14064] Avg episode reward: [(0, '0.190')]
[2024-06-06 13:44:59,421][14296] Updated weights for policy 0, policy_version 21287 (0.0034)
[2024-06-06 13:45:01,561][14064] Fps is (10 sec: 45875.0, 60 sec: 47513.7, 300 sec: 47374.8). Total num frames: 348848128. Throughput: 0: 47463.4. Samples: 201949480. Policy #0 lag: (min: 0.0, avg: 10.2, max: 21.0)
[2024-06-06 13:45:01,562][14064] Avg episode reward: [(0, '0.182')]
[2024-06-06 13:45:02,706][14296] Updated weights for policy 0, policy_version 21297 (0.0025)
[2024-06-06 13:45:06,331][14296] Updated weights for policy 0, policy_version 21307 (0.0030)
[2024-06-06 13:45:06,563][14064] Fps is (10 sec: 47503.7, 60 sec: 47512.0, 300 sec: 47430.0). Total num frames: 349093888. Throughput: 0: 47252.0. Samples: 202088040. Policy #0 lag: (min: 0.0, avg: 10.2, max: 21.0)
[2024-06-06 13:45:06,564][14064] Avg episode reward: [(0, '0.188')]
[2024-06-06 13:45:09,697][14296] Updated weights for policy 0, policy_version 21317 (0.0030)
[2024-06-06 13:45:11,561][14064] Fps is (10 sec: 49151.1, 60 sec: 47786.6, 300 sec: 47485.8). Total num frames: 349339648. Throughput: 0: 47521.3. Samples: 202377520. Policy #0 lag: (min: 0.0, avg: 10.2, max: 21.0)
[2024-06-06 13:45:11,562][14064] Avg episode reward: [(0, '0.183')]
[2024-06-06 13:45:13,134][14296] Updated weights for policy 0, policy_version 21327 (0.0035)
[2024-06-06 13:45:16,534][14296] Updated weights for policy 0, policy_version 21337 (0.0027)
[2024-06-06 13:45:16,561][14064] Fps is (10 sec: 49161.8, 60 sec: 47513.7, 300 sec: 47485.8). Total num frames: 349585408. Throughput: 0: 47342.6. Samples: 202653600. Policy #0 lag: (min: 0.0, avg: 10.8, max: 20.0)
[2024-06-06 13:45:16,562][14064] Avg episode reward: [(0, '0.187')]
[2024-06-06 13:45:20,013][14296] Updated weights for policy 0, policy_version 21347 (0.0033)
[2024-06-06 13:45:21,561][14064] Fps is (10 sec: 42599.4, 60 sec: 46694.5, 300 sec: 47319.2). Total num frames: 349765632. Throughput: 0: 47320.4. Samples: 202797160. Policy #0 lag: (min: 0.0, avg: 10.8, max: 20.0)
[2024-06-06 13:45:21,561][14064] Avg episode reward: [(0, '0.185')]
[2024-06-06 13:45:23,832][14296] Updated weights for policy 0, policy_version 21357 (0.0027)
[2024-06-06 13:45:26,561][14064] Fps is (10 sec: 45875.0, 60 sec: 47786.7, 300 sec: 47485.8). Total num frames: 350044160. Throughput: 0: 47251.4. Samples: 203077300. Policy #0 lag: (min: 0.0, avg: 11.6, max: 23.0)
[2024-06-06 13:45:26,562][14064] Avg episode reward: [(0, '0.184')]
[2024-06-06 13:45:27,083][14296] Updated weights for policy 0, policy_version 21367 (0.0025)
[2024-06-06 13:45:30,519][14296] Updated weights for policy 0, policy_version 21377 (0.0034)
[2024-06-06 13:45:31,561][14064] Fps is (10 sec: 50789.8, 60 sec: 47240.5, 300 sec: 47430.3). Total num frames: 350273536. Throughput: 0: 47457.0. Samples: 203370640. Policy #0 lag: (min: 0.0, avg: 11.6, max: 23.0)
[2024-06-06 13:45:31,562][14064] Avg episode reward: [(0, '0.192')]
[2024-06-06 13:45:31,589][14276] Saving new best policy, reward=0.192!
[2024-06-06 13:45:34,094][14296] Updated weights for policy 0, policy_version 21387 (0.0026)
[2024-06-06 13:45:36,561][14064] Fps is (10 sec: 45876.1, 60 sec: 47240.5, 300 sec: 47485.9). Total num frames: 350502912. Throughput: 0: 47440.0. Samples: 203511000. Policy #0 lag: (min: 0.0, avg: 11.6, max: 23.0)
[2024-06-06 13:45:36,561][14064] Avg episode reward: [(0, '0.189')]
[2024-06-06 13:45:37,234][14296] Updated weights for policy 0, policy_version 21397 (0.0033)
[2024-06-06 13:45:40,864][14296] Updated weights for policy 0, policy_version 21407 (0.0024)
[2024-06-06 13:45:41,561][14064] Fps is (10 sec: 47513.8, 60 sec: 47513.6, 300 sec: 47485.8). Total num frames: 350748672. Throughput: 0: 47447.6. Samples: 203798640. Policy #0 lag: (min: 0.0, avg: 10.2, max: 20.0)
[2024-06-06 13:45:41,562][14064] Avg episode reward: [(0, '0.178')]
[2024-06-06 13:45:44,131][14296] Updated weights for policy 0, policy_version 21417 (0.0028)
[2024-06-06 13:45:46,561][14064] Fps is (10 sec: 49151.7, 60 sec: 47788.8, 300 sec: 47541.4). Total num frames: 350994432. Throughput: 0: 47505.8. Samples: 204087240. Policy #0 lag: (min: 0.0, avg: 10.2, max: 20.0)
[2024-06-06 13:45:46,562][14064] Avg episode reward: [(0, '0.181')]
[2024-06-06 13:45:47,562][14296] Updated weights for policy 0, policy_version 21427 (0.0030)
[2024-06-06 13:45:51,154][14296] Updated weights for policy 0, policy_version 21437 (0.0032)
[2024-06-06 13:45:51,561][14064] Fps is (10 sec: 49151.3, 60 sec: 47513.4, 300 sec: 47485.8). Total num frames: 351240192. Throughput: 0: 47683.8. Samples: 204233720. Policy #0 lag: (min: 0.0, avg: 10.2, max: 20.0)
[2024-06-06 13:45:51,562][14064] Avg episode reward: [(0, '0.185')]
[2024-06-06 13:45:54,554][14296] Updated weights for policy 0, policy_version 21447 (0.0032)
[2024-06-06 13:45:56,561][14064] Fps is (10 sec: 47513.6, 60 sec: 47513.6, 300 sec: 47485.9). Total num frames: 351469568. Throughput: 0: 47749.1. Samples: 204526220. Policy #0 lag: (min: 0.0, avg: 9.3, max: 21.0)
[2024-06-06 13:45:56,562][14064] Avg episode reward: [(0, '0.191')]
[2024-06-06 13:45:57,761][14296] Updated weights for policy 0, policy_version 21457 (0.0029)
[2024-06-06 13:46:01,393][14296] Updated weights for policy 0, policy_version 21467 (0.0025)
[2024-06-06 13:46:01,563][14064] Fps is (10 sec: 47506.3, 60 sec: 47785.3, 300 sec: 47485.6). Total num frames: 351715328. Throughput: 0: 47880.1. Samples: 204808280. Policy #0 lag: (min: 0.0, avg: 9.3, max: 21.0)
[2024-06-06 13:46:01,563][14064] Avg episode reward: [(0, '0.192')]
[2024-06-06 13:46:04,204][14276] Signal inference workers to stop experience collection... (3150 times)
[2024-06-06 13:46:04,223][14296] InferenceWorker_p0-w0: stopping experience collection (3150 times)
[2024-06-06 13:46:04,261][14276] Signal inference workers to resume experience collection... (3150 times)
[2024-06-06 13:46:04,261][14296] InferenceWorker_p0-w0: resuming experience collection (3150 times)
[2024-06-06 13:46:04,572][14296] Updated weights for policy 0, policy_version 21477 (0.0032)
[2024-06-06 13:46:06,561][14064] Fps is (10 sec: 49151.5, 60 sec: 47788.2, 300 sec: 47485.8). Total num frames: 351961088. Throughput: 0: 47870.9. Samples: 204951360. Policy #0 lag: (min: 0.0, avg: 9.3, max: 21.0)
[2024-06-06 13:46:06,562][14064] Avg episode reward: [(0, '0.190')]
[2024-06-06 13:46:06,665][14276] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000021483_351977472.pth...
[2024-06-06 13:46:06,709][14276] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000020787_340574208.pth
[2024-06-06 13:46:08,198][14296] Updated weights for policy 0, policy_version 21487 (0.0032)
[2024-06-06 13:46:11,561][14064] Fps is (10 sec: 47520.7, 60 sec: 47513.6, 300 sec: 47431.2). Total num frames: 352190464. Throughput: 0: 48128.4. Samples: 205243080. Policy #0 lag: (min: 0.0, avg: 11.4, max: 22.0)
[2024-06-06 13:46:11,562][14064] Avg episode reward: [(0, '0.190')]
[2024-06-06 13:46:11,615][14296] Updated weights for policy 0, policy_version 21497 (0.0036)
[2024-06-06 13:46:14,867][14296] Updated weights for policy 0, policy_version 21507 (0.0032)
[2024-06-06 13:46:16,561][14064] Fps is (10 sec: 45875.9, 60 sec: 47240.6, 300 sec: 47430.3). Total num frames: 352419840. Throughput: 0: 48067.6. Samples: 205533680. Policy #0 lag: (min: 0.0, avg: 11.4, max: 22.0)
[2024-06-06 13:46:16,561][14064] Avg episode reward: [(0, '0.193')]
[2024-06-06 13:46:16,592][14276] Saving new best policy, reward=0.193!
[2024-06-06 13:46:18,377][14296] Updated weights for policy 0, policy_version 21517 (0.0034)
[2024-06-06 13:46:21,561][14064] Fps is (10 sec: 49152.9, 60 sec: 48605.8, 300 sec: 47652.5). Total num frames: 352681984. Throughput: 0: 48039.9. Samples: 205672800. Policy #0 lag: (min: 0.0, avg: 11.4, max: 22.0)
[2024-06-06 13:46:21,561][14064] Avg episode reward: [(0, '0.190')]
[2024-06-06 13:46:21,738][14296] Updated weights for policy 0, policy_version 21527 (0.0028)
[2024-06-06 13:46:25,415][14296] Updated weights for policy 0, policy_version 21537 (0.0027)
[2024-06-06 13:46:26,561][14064] Fps is (10 sec: 49151.7, 60 sec: 47786.8, 300 sec: 47485.8). Total num frames: 352911360. Throughput: 0: 48027.5. Samples: 205959880. Policy #0 lag: (min: 0.0, avg: 9.7, max: 21.0)
[2024-06-06 13:46:26,562][14064] Avg episode reward: [(0, '0.182')]
[2024-06-06 13:46:28,771][14296] Updated weights for policy 0, policy_version 21547 (0.0035)
[2024-06-06 13:46:31,561][14064] Fps is (10 sec: 45875.2, 60 sec: 47786.7, 300 sec: 47485.9). Total num frames: 353140736. Throughput: 0: 48003.1. Samples: 206247380. Policy #0 lag: (min: 0.0, avg: 9.7, max: 21.0)
[2024-06-06 13:46:31,562][14064] Avg episode reward: [(0, '0.184')]
[2024-06-06 13:46:32,099][14296] Updated weights for policy 0, policy_version 21557 (0.0037)
[2024-06-06 13:46:35,583][14296] Updated weights for policy 0, policy_version 21567 (0.0037)
[2024-06-06 13:46:36,561][14064] Fps is (10 sec: 47513.5, 60 sec: 48059.6, 300 sec: 47485.8). Total num frames: 353386496. Throughput: 0: 47841.9. Samples: 206386600. Policy #0 lag: (min: 0.0, avg: 9.7, max: 21.0)
[2024-06-06 13:46:36,562][14064] Avg episode reward: [(0, '0.183')]
[2024-06-06 13:46:39,036][14296] Updated weights for policy 0, policy_version 21577 (0.0036)
[2024-06-06 13:46:41,561][14064] Fps is (10 sec: 49152.0, 60 sec: 48059.7, 300 sec: 47541.4). Total num frames: 353632256. Throughput: 0: 47680.5. Samples: 206671840. Policy #0 lag: (min: 0.0, avg: 11.3, max: 22.0)
[2024-06-06 13:46:41,562][14064] Avg episode reward: [(0, '0.189')]
[2024-06-06 13:46:42,355][14296] Updated weights for policy 0, policy_version 21587 (0.0039)
[2024-06-06 13:46:46,007][14296] Updated weights for policy 0, policy_version 21597 (0.0028)
[2024-06-06 13:46:46,563][14064] Fps is (10 sec: 47503.4, 60 sec: 47784.9, 300 sec: 47485.5). Total num frames: 353861632. Throughput: 0: 47641.2. Samples: 206952160. Policy #0 lag: (min: 0.0, avg: 11.3, max: 22.0)
[2024-06-06 13:46:46,564][14064] Avg episode reward: [(0, '0.198')]
[2024-06-06 13:46:46,682][14276] Saving new best policy, reward=0.198!
[2024-06-06 13:46:49,319][14296] Updated weights for policy 0, policy_version 21607 (0.0028)
[2024-06-06 13:46:51,561][14064] Fps is (10 sec: 45875.1, 60 sec: 47513.7, 300 sec: 47430.3). Total num frames: 354091008. Throughput: 0: 47693.4. Samples: 207097560. Policy #0 lag: (min: 0.0, avg: 11.3, max: 22.0)
[2024-06-06 13:46:51,562][14064] Avg episode reward: [(0, '0.185')]
[2024-06-06 13:46:52,878][14296] Updated weights for policy 0, policy_version 21617 (0.0025)
[2024-06-06 13:46:56,311][14296] Updated weights for policy 0, policy_version 21627 (0.0026)
[2024-06-06 13:46:56,561][14064] Fps is (10 sec: 47523.6, 60 sec: 47786.6, 300 sec: 47541.3). Total num frames: 354336768. Throughput: 0: 47610.3. Samples: 207385540. Policy #0 lag: (min: 0.0, avg: 9.8, max: 22.0)
[2024-06-06 13:46:56,562][14064] Avg episode reward: [(0, '0.189')]
[2024-06-06 13:46:59,673][14296] Updated weights for policy 0, policy_version 21637 (0.0033)
[2024-06-06 13:47:01,561][14064] Fps is (10 sec: 49151.3, 60 sec: 47787.9, 300 sec: 47485.8). Total num frames: 354582528. Throughput: 0: 47418.0. Samples: 207667500. Policy #0 lag: (min: 0.0, avg: 9.8, max: 22.0)
[2024-06-06 13:47:01,562][14064] Avg episode reward: [(0, '0.187')]
[2024-06-06 13:47:03,224][14296] Updated weights for policy 0, policy_version 21647 (0.0029)
[2024-06-06 13:47:06,561][14064] Fps is (10 sec: 45875.5, 60 sec: 47240.6, 300 sec: 47430.3). Total num frames: 354795520. Throughput: 0: 47427.5. Samples: 207807040. Policy #0 lag: (min: 0.0, avg: 9.8, max: 22.0)
[2024-06-06 13:47:06,562][14064] Avg episode reward: [(0, '0.188')]
[2024-06-06 13:47:06,774][14296] Updated weights for policy 0, policy_version 21657 (0.0029)
[2024-06-06 13:47:09,938][14296] Updated weights for policy 0, policy_version 21667 (0.0022)
[2024-06-06 13:47:11,561][14064] Fps is (10 sec: 45876.0, 60 sec: 47513.8, 300 sec: 47430.3). Total num frames: 355041280. Throughput: 0: 47466.7. Samples: 208095880. Policy #0 lag: (min: 0.0, avg: 10.7, max: 23.0)
[2024-06-06 13:47:11,562][14064] Avg episode reward: [(0, '0.186')]
[2024-06-06 13:47:13,409][14296] Updated weights for policy 0, policy_version 21677 (0.0026)
[2024-06-06 13:47:16,561][14064] Fps is (10 sec: 50790.1, 60 sec: 48059.7, 300 sec: 47652.5). Total num frames: 355303424. Throughput: 0: 47470.2. Samples: 208383540. Policy #0 lag: (min: 0.0, avg: 10.7, max: 23.0)
[2024-06-06 13:47:16,562][14064] Avg episode reward: [(0, '0.190')]
[2024-06-06 13:47:16,977][14296] Updated weights for policy 0, policy_version 21687 (0.0019)
[2024-06-06 13:47:20,380][14296] Updated weights for policy 0, policy_version 21697 (0.0031)
[2024-06-06 13:47:21,564][14064] Fps is (10 sec: 49139.9, 60 sec: 47511.7, 300 sec: 47485.5). Total num frames: 355532800. Throughput: 0: 47469.0. Samples: 208522820. Policy #0 lag: (min: 0.0, avg: 10.7, max: 23.0)
[2024-06-06 13:47:21,564][14064] Avg episode reward: [(0, '0.185')]
[2024-06-06 13:47:24,041][14296] Updated weights for policy 0, policy_version 21707 (0.0031)
[2024-06-06 13:47:26,561][14064] Fps is (10 sec: 45875.7, 60 sec: 47513.6, 300 sec: 47485.9). Total num frames: 355762176. Throughput: 0: 47355.1. Samples: 208802820. Policy #0 lag: (min: 0.0, avg: 9.5, max: 20.0)
[2024-06-06 13:47:26,562][14064] Avg episode reward: [(0, '0.190')]
[2024-06-06 13:47:27,506][14296] Updated weights for policy 0, policy_version 21717 (0.0039)
[2024-06-06 13:47:30,905][14296] Updated weights for policy 0, policy_version 21727 (0.0039)
[2024-06-06 13:47:31,561][14064] Fps is (10 sec: 44247.4, 60 sec: 47240.5, 300 sec: 47430.3). Total num frames: 355975168. Throughput: 0: 47443.1. Samples: 209087000. Policy #0 lag: (min: 0.0, avg: 9.5, max: 20.0)
[2024-06-06 13:47:31,562][14064] Avg episode reward: [(0, '0.195')]
[2024-06-06 13:47:34,482][14296] Updated weights for policy 0, policy_version 21737 (0.0029)
[2024-06-06 13:47:36,561][14064] Fps is (10 sec: 47513.6, 60 sec: 47513.7, 300 sec: 47485.9). Total num frames: 356237312. Throughput: 0: 47332.0. Samples: 209227500. Policy #0 lag: (min: 0.0, avg: 9.5, max: 20.0)
[2024-06-06 13:47:36,562][14064] Avg episode reward: [(0, '0.189')]
[2024-06-06 13:47:37,647][14296] Updated weights for policy 0, policy_version 21747 (0.0031)
[2024-06-06 13:47:41,363][14296] Updated weights for policy 0, policy_version 21757 (0.0025)
[2024-06-06 13:47:41,561][14064] Fps is (10 sec: 50790.5, 60 sec: 47513.6, 300 sec: 47485.8). Total num frames: 356483072. Throughput: 0: 47355.2. Samples: 209516520. Policy #0 lag: (min: 0.0, avg: 11.5, max: 23.0)
[2024-06-06 13:47:41,562][14064] Avg episode reward: [(0, '0.192')]
[2024-06-06 13:47:44,723][14296] Updated weights for policy 0, policy_version 21767 (0.0034)
[2024-06-06 13:47:46,561][14064] Fps is (10 sec: 45875.2, 60 sec: 47242.3, 300 sec: 47430.3). Total num frames: 356696064. Throughput: 0: 47418.9. Samples: 209801340. Policy #0 lag: (min: 0.0, avg: 11.5, max: 23.0)
[2024-06-06 13:47:46,562][14064] Avg episode reward: [(0, '0.193')]
[2024-06-06 13:47:47,482][14276] Signal inference workers to stop experience collection... (3200 times)
[2024-06-06 13:47:47,482][14276] Signal inference workers to resume experience collection... (3200 times)
[2024-06-06 13:47:47,501][14296] InferenceWorker_p0-w0: stopping experience collection (3200 times)
[2024-06-06 13:47:47,501][14296] InferenceWorker_p0-w0: resuming experience collection (3200 times)
[2024-06-06 13:47:48,212][14296] Updated weights for policy 0, policy_version 21777 (0.0033)
[2024-06-06 13:47:51,562][14064] Fps is (10 sec: 45874.1, 60 sec: 47513.4, 300 sec: 47541.3). Total num frames: 356941824. Throughput: 0: 47364.6. Samples: 209938460. Policy #0 lag: (min: 0.0, avg: 11.5, max: 23.0)
[2024-06-06 13:47:51,562][14064] Avg episode reward: [(0, '0.189')]
[2024-06-06 13:47:51,892][14296] Updated weights for policy 0, policy_version 21787 (0.0031)
[2024-06-06 13:47:54,972][14296] Updated weights for policy 0, policy_version 21797 (0.0034)
[2024-06-06 13:47:56,561][14064] Fps is (10 sec: 49151.6, 60 sec: 47513.6, 300 sec: 47430.4). Total num frames: 357187584. Throughput: 0: 47191.9. Samples: 210219520. Policy #0 lag: (min: 0.0, avg: 9.4, max: 21.0)
[2024-06-06 13:47:56,562][14064] Avg episode reward: [(0, '0.191')]
[2024-06-06 13:47:58,835][14296] Updated weights for policy 0, policy_version 21807 (0.0028)
[2024-06-06 13:48:01,561][14064] Fps is (10 sec: 47515.0, 60 sec: 47240.7, 300 sec: 47430.3). Total num frames: 357416960. Throughput: 0: 47213.9. Samples: 210508160. Policy #0 lag: (min: 0.0, avg: 9.4, max: 21.0)
[2024-06-06 13:48:01,561][14064] Avg episode reward: [(0, '0.191')]
[2024-06-06 13:48:02,088][14296] Updated weights for policy 0, policy_version 21817 (0.0034)
[2024-06-06 13:48:05,753][14296] Updated weights for policy 0, policy_version 21827 (0.0033)
[2024-06-06 13:48:06,564][14064] Fps is (10 sec: 47500.9, 60 sec: 47784.5, 300 sec: 47485.4). Total num frames: 357662720. Throughput: 0: 47315.3. Samples: 210652020. Policy #0 lag: (min: 0.0, avg: 9.4, max: 21.0)
[2024-06-06 13:48:06,565][14064] Avg episode reward: [(0, '0.195')]
[2024-06-06 13:48:06,577][14276] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000021830_357662720.pth...
[2024-06-06 13:48:06,621][14276] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000021133_346243072.pth
[2024-06-06 13:48:08,967][14296] Updated weights for policy 0, policy_version 21837 (0.0042)
[2024-06-06 13:48:11,561][14064] Fps is (10 sec: 47512.6, 60 sec: 47513.4, 300 sec: 47541.3). Total num frames: 357892096. Throughput: 0: 47414.4. Samples: 210936480. Policy #0 lag: (min: 0.0, avg: 10.8, max: 21.0)
[2024-06-06 13:48:11,562][14064] Avg episode reward: [(0, '0.191')]
[2024-06-06 13:48:12,708][14296] Updated weights for policy 0, policy_version 21847 (0.0028)
[2024-06-06 13:48:15,775][14296] Updated weights for policy 0, policy_version 21857 (0.0031)
[2024-06-06 13:48:16,561][14064] Fps is (10 sec: 47526.3, 60 sec: 47240.5, 300 sec: 47485.8). Total num frames: 358137856. Throughput: 0: 47406.2. Samples: 211220280. Policy #0 lag: (min: 0.0, avg: 10.8, max: 21.0)
[2024-06-06 13:48:16,562][14064] Avg episode reward: [(0, '0.183')]
[2024-06-06 13:48:19,658][14296] Updated weights for policy 0, policy_version 21867 (0.0029)
[2024-06-06 13:48:21,561][14064] Fps is (10 sec: 47513.8, 60 sec: 47242.3, 300 sec: 47485.8). Total num frames: 358367232. Throughput: 0: 47641.6. Samples: 211371380. Policy #0 lag: (min: 0.0, avg: 10.8, max: 21.0)
[2024-06-06 13:48:21,562][14064] Avg episode reward: [(0, '0.193')]
[2024-06-06 13:48:22,442][14296] Updated weights for policy 0, policy_version 21877 (0.0032)
[2024-06-06 13:48:26,376][14296] Updated weights for policy 0, policy_version 21887 (0.0031)
[2024-06-06 13:48:26,561][14064] Fps is (10 sec: 45875.6, 60 sec: 47240.5, 300 sec: 47485.8). Total num frames: 358596608. Throughput: 0: 47497.8. Samples: 211653920. Policy #0 lag: (min: 0.0, avg: 10.8, max: 20.0)
[2024-06-06 13:48:26,562][14064] Avg episode reward: [(0, '0.187')]
[2024-06-06 13:48:29,521][14296] Updated weights for policy 0, policy_version 21897 (0.0027)
[2024-06-06 13:48:31,561][14064] Fps is (10 sec: 49152.3, 60 sec: 48059.7, 300 sec: 47596.9). Total num frames: 358858752. Throughput: 0: 47378.1. Samples: 211933360. Policy #0 lag: (min: 0.0, avg: 10.8, max: 20.0)
[2024-06-06 13:48:31,562][14064] Avg episode reward: [(0, '0.195')]
[2024-06-06 13:48:33,174][14296] Updated weights for policy 0, policy_version 21907 (0.0032)
[2024-06-06 13:48:36,263][14296] Updated weights for policy 0, policy_version 21917 (0.0023)
[2024-06-06 13:48:36,561][14064] Fps is (10 sec: 49151.6, 60 sec: 47513.5, 300 sec: 47596.9). Total num frames: 359088128. Throughput: 0: 47750.9. Samples: 212087240. Policy #0 lag: (min: 0.0, avg: 10.8, max: 20.0)
[2024-06-06 13:48:36,562][14064] Avg episode reward: [(0, '0.193')]
[2024-06-06 13:48:40,070][14296] Updated weights for policy 0, policy_version 21927 (0.0024)
[2024-06-06 13:48:41,561][14064] Fps is (10 sec: 45875.5, 60 sec: 47240.6, 300 sec: 47486.2). Total num frames: 359317504. Throughput: 0: 47971.2. Samples: 212378220. Policy #0 lag: (min: 0.0, avg: 10.1, max: 21.0)
[2024-06-06 13:48:41,562][14064] Avg episode reward: [(0, '0.186')]
[2024-06-06 13:48:43,076][14296] Updated weights for policy 0, policy_version 21937 (0.0034)
[2024-06-06 13:48:46,561][14064] Fps is (10 sec: 45875.1, 60 sec: 47513.5, 300 sec: 47596.9). Total num frames: 359546880. Throughput: 0: 47740.3. Samples: 212656480. Policy #0 lag: (min: 0.0, avg: 10.1, max: 21.0)
[2024-06-06 13:48:46,562][14064] Avg episode reward: [(0, '0.180')]
[2024-06-06 13:48:47,122][14296] Updated weights for policy 0, policy_version 21947 (0.0033)
[2024-06-06 13:48:50,269][14296] Updated weights for policy 0, policy_version 21957 (0.0039)
[2024-06-06 13:48:51,561][14064] Fps is (10 sec: 47513.1, 60 sec: 47513.7, 300 sec: 47485.8). Total num frames: 359792640. Throughput: 0: 47658.3. Samples: 212796520. Policy #0 lag: (min: 0.0, avg: 10.1, max: 21.0)
[2024-06-06 13:48:51,562][14064] Avg episode reward: [(0, '0.193')]
[2024-06-06 13:48:54,030][14296] Updated weights for policy 0, policy_version 21967 (0.0028)
[2024-06-06 13:48:56,561][14064] Fps is (10 sec: 49152.7, 60 sec: 47513.7, 300 sec: 47596.9). Total num frames: 360038400. Throughput: 0: 47755.4. Samples: 213085460. Policy #0 lag: (min: 1.0, avg: 10.6, max: 21.0)
[2024-06-06 13:48:56,561][14064] Avg episode reward: [(0, '0.196')]
[2024-06-06 13:48:57,041][14296] Updated weights for policy 0, policy_version 21977 (0.0032)
[2024-06-06 13:49:00,702][14296] Updated weights for policy 0, policy_version 21987 (0.0022)
[2024-06-06 13:49:01,564][14064] Fps is (10 sec: 49139.3, 60 sec: 47784.5, 300 sec: 47596.5). Total num frames: 360284160. Throughput: 0: 47721.2. Samples: 213367860. Policy #0 lag: (min: 1.0, avg: 10.6, max: 21.0)
[2024-06-06 13:49:01,565][14064] Avg episode reward: [(0, '0.189')]
[2024-06-06 13:49:03,809][14296] Updated weights for policy 0, policy_version 21997 (0.0036)
[2024-06-06 13:49:04,457][14276] Signal inference workers to stop experience collection... (3250 times)
[2024-06-06 13:49:04,457][14276] Signal inference workers to resume experience collection... (3250 times)
[2024-06-06 13:49:04,475][14296] InferenceWorker_p0-w0: stopping experience collection (3250 times)
[2024-06-06 13:49:04,480][14296] InferenceWorker_p0-w0: resuming experience collection (3250 times)
[2024-06-06 13:49:06,561][14064] Fps is (10 sec: 47513.3, 60 sec: 47515.7, 300 sec: 47596.9). Total num frames: 360513536. Throughput: 0: 47532.6. Samples: 213510340.
Policy #0 lag: (min: 1.0, avg: 10.6, max: 21.0) [2024-06-06 13:49:06,562][14064] Avg episode reward: [(0, '0.195')] [2024-06-06 13:49:07,566][14296] Updated weights for policy 0, policy_version 22007 (0.0026) [2024-06-06 13:49:10,625][14296] Updated weights for policy 0, policy_version 22017 (0.0033) [2024-06-06 13:49:11,561][14064] Fps is (10 sec: 49164.9, 60 sec: 48059.8, 300 sec: 47596.9). Total num frames: 360775680. Throughput: 0: 47665.2. Samples: 213798860. Policy #0 lag: (min: 0.0, avg: 10.8, max: 22.0) [2024-06-06 13:49:11,562][14064] Avg episode reward: [(0, '0.192')] [2024-06-06 13:49:14,576][14296] Updated weights for policy 0, policy_version 22027 (0.0029) [2024-06-06 13:49:16,561][14064] Fps is (10 sec: 47514.1, 60 sec: 47513.7, 300 sec: 47541.4). Total num frames: 360988672. Throughput: 0: 47800.6. Samples: 214084380. Policy #0 lag: (min: 0.0, avg: 10.8, max: 22.0) [2024-06-06 13:49:16,561][14064] Avg episode reward: [(0, '0.186')] [2024-06-06 13:49:17,622][14296] Updated weights for policy 0, policy_version 22037 (0.0029) [2024-06-06 13:49:21,561][14064] Fps is (10 sec: 42598.4, 60 sec: 47240.6, 300 sec: 47541.4). Total num frames: 361201664. Throughput: 0: 47435.1. Samples: 214221820. Policy #0 lag: (min: 0.0, avg: 10.8, max: 22.0) [2024-06-06 13:49:21,562][14064] Avg episode reward: [(0, '0.191')] [2024-06-06 13:49:21,607][14296] Updated weights for policy 0, policy_version 22047 (0.0029) [2024-06-06 13:49:24,508][14296] Updated weights for policy 0, policy_version 22057 (0.0033) [2024-06-06 13:49:26,561][14064] Fps is (10 sec: 47512.9, 60 sec: 47786.6, 300 sec: 47541.4). Total num frames: 361463808. Throughput: 0: 47220.4. Samples: 214503140. Policy #0 lag: (min: 0.0, avg: 10.4, max: 21.0) [2024-06-06 13:49:26,562][14064] Avg episode reward: [(0, '0.193')] [2024-06-06 13:49:28,378][14296] Updated weights for policy 0, policy_version 22067 (0.0028) [2024-06-06 13:49:31,561][14064] Fps is (10 sec: 49152.6, 60 sec: 47240.6, 300 sec: 47541.4). 
Total num frames: 361693184. Throughput: 0: 47468.2. Samples: 214792540. Policy #0 lag: (min: 0.0, avg: 10.4, max: 21.0) [2024-06-06 13:49:31,561][14064] Avg episode reward: [(0, '0.196')] [2024-06-06 13:49:31,775][14296] Updated weights for policy 0, policy_version 22077 (0.0021) [2024-06-06 13:49:35,376][14296] Updated weights for policy 0, policy_version 22087 (0.0021) [2024-06-06 13:49:36,561][14064] Fps is (10 sec: 45875.5, 60 sec: 47240.6, 300 sec: 47541.4). Total num frames: 361922560. Throughput: 0: 47410.8. Samples: 214930000. Policy #0 lag: (min: 0.0, avg: 10.4, max: 21.0) [2024-06-06 13:49:36,562][14064] Avg episode reward: [(0, '0.192')] [2024-06-06 13:49:38,520][14296] Updated weights for policy 0, policy_version 22097 (0.0042) [2024-06-06 13:49:41,561][14064] Fps is (10 sec: 45875.1, 60 sec: 47240.6, 300 sec: 47541.8). Total num frames: 362151936. Throughput: 0: 47332.9. Samples: 215215440. Policy #0 lag: (min: 0.0, avg: 11.3, max: 22.0) [2024-06-06 13:49:41,562][14064] Avg episode reward: [(0, '0.189')] [2024-06-06 13:49:42,315][14296] Updated weights for policy 0, policy_version 22107 (0.0039) [2024-06-06 13:49:45,191][14296] Updated weights for policy 0, policy_version 22117 (0.0038) [2024-06-06 13:49:46,561][14064] Fps is (10 sec: 49152.0, 60 sec: 47786.7, 300 sec: 47541.4). Total num frames: 362414080. Throughput: 0: 47432.2. Samples: 215502180. Policy #0 lag: (min: 0.0, avg: 11.3, max: 22.0) [2024-06-06 13:49:46,562][14064] Avg episode reward: [(0, '0.199')] [2024-06-06 13:49:49,078][14296] Updated weights for policy 0, policy_version 22127 (0.0029) [2024-06-06 13:49:51,561][14064] Fps is (10 sec: 50789.4, 60 sec: 47786.6, 300 sec: 47596.9). Total num frames: 362659840. Throughput: 0: 47457.6. Samples: 215645940. 
Policy #0 lag: (min: 0.0, avg: 11.3, max: 22.0) [2024-06-06 13:49:51,562][14064] Avg episode reward: [(0, '0.192')] [2024-06-06 13:49:52,240][14296] Updated weights for policy 0, policy_version 22137 (0.0029) [2024-06-06 13:49:55,851][14296] Updated weights for policy 0, policy_version 22147 (0.0028) [2024-06-06 13:49:56,564][14064] Fps is (10 sec: 49140.1, 60 sec: 47784.7, 300 sec: 47652.1). Total num frames: 362905600. Throughput: 0: 47509.1. Samples: 215936880. Policy #0 lag: (min: 0.0, avg: 9.5, max: 23.0) [2024-06-06 13:49:56,564][14064] Avg episode reward: [(0, '0.187')] [2024-06-06 13:49:59,212][14296] Updated weights for policy 0, policy_version 22157 (0.0024) [2024-06-06 13:50:01,561][14064] Fps is (10 sec: 47513.4, 60 sec: 47515.6, 300 sec: 47597.2). Total num frames: 363134976. Throughput: 0: 47483.3. Samples: 216221140. Policy #0 lag: (min: 0.0, avg: 9.5, max: 23.0) [2024-06-06 13:50:01,562][14064] Avg episode reward: [(0, '0.188')] [2024-06-06 13:50:02,684][14296] Updated weights for policy 0, policy_version 22167 (0.0031) [2024-06-06 13:50:05,879][14296] Updated weights for policy 0, policy_version 22177 (0.0033) [2024-06-06 13:50:06,561][14064] Fps is (10 sec: 47525.1, 60 sec: 47786.7, 300 sec: 47596.9). Total num frames: 363380736. Throughput: 0: 47584.5. Samples: 216363120. Policy #0 lag: (min: 0.0, avg: 9.5, max: 23.0) [2024-06-06 13:50:06,561][14064] Avg episode reward: [(0, '0.191')] [2024-06-06 13:50:06,663][14276] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000022180_363397120.pth... [2024-06-06 13:50:06,714][14276] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000021483_351977472.pth [2024-06-06 13:50:09,798][14296] Updated weights for policy 0, policy_version 22187 (0.0030) [2024-06-06 13:50:11,564][14064] Fps is (10 sec: 47501.9, 60 sec: 47238.5, 300 sec: 47541.0). Total num frames: 363610112. Throughput: 0: 47717.3. Samples: 216650540. 
Policy #0 lag: (min: 0.0, avg: 9.9, max: 23.0) [2024-06-06 13:50:11,565][14064] Avg episode reward: [(0, '0.189')] [2024-06-06 13:50:12,587][14296] Updated weights for policy 0, policy_version 22197 (0.0036) [2024-06-06 13:50:16,531][14296] Updated weights for policy 0, policy_version 22207 (0.0036) [2024-06-06 13:50:16,561][14064] Fps is (10 sec: 45875.1, 60 sec: 47513.5, 300 sec: 47708.0). Total num frames: 363839488. Throughput: 0: 47594.1. Samples: 216934280. Policy #0 lag: (min: 0.0, avg: 9.9, max: 23.0) [2024-06-06 13:50:16,562][14064] Avg episode reward: [(0, '0.194')] [2024-06-06 13:50:19,646][14296] Updated weights for policy 0, policy_version 22217 (0.0025) [2024-06-06 13:50:21,561][14064] Fps is (10 sec: 45886.9, 60 sec: 47786.6, 300 sec: 47541.4). Total num frames: 364068864. Throughput: 0: 47623.9. Samples: 217073080. Policy #0 lag: (min: 0.0, avg: 9.9, max: 23.0) [2024-06-06 13:50:21,562][14064] Avg episode reward: [(0, '0.198')] [2024-06-06 13:50:23,238][14296] Updated weights for policy 0, policy_version 22227 (0.0029) [2024-06-06 13:50:26,561][14064] Fps is (10 sec: 47513.7, 60 sec: 47513.7, 300 sec: 47596.9). Total num frames: 364314624. Throughput: 0: 47604.4. Samples: 217357640. Policy #0 lag: (min: 0.0, avg: 10.0, max: 22.0) [2024-06-06 13:50:26,562][14064] Avg episode reward: [(0, '0.192')] [2024-06-06 13:50:26,739][14296] Updated weights for policy 0, policy_version 22237 (0.0034) [2024-06-06 13:50:30,321][14296] Updated weights for policy 0, policy_version 22247 (0.0033) [2024-06-06 13:50:31,561][14064] Fps is (10 sec: 45875.7, 60 sec: 47240.5, 300 sec: 47541.4). Total num frames: 364527616. Throughput: 0: 47646.7. Samples: 217646280. Policy #0 lag: (min: 0.0, avg: 10.0, max: 22.0) [2024-06-06 13:50:31,561][14064] Avg episode reward: [(0, '0.189')] [2024-06-06 13:50:33,527][14296] Updated weights for policy 0, policy_version 22257 (0.0035) [2024-06-06 13:50:36,561][14064] Fps is (10 sec: 47512.8, 60 sec: 47786.6, 300 sec: 47596.9). 
Total num frames: 364789760. Throughput: 0: 47393.8. Samples: 217778660. Policy #0 lag: (min: 0.0, avg: 10.0, max: 22.0) [2024-06-06 13:50:36,562][14064] Avg episode reward: [(0, '0.192')] [2024-06-06 13:50:37,321][14296] Updated weights for policy 0, policy_version 22267 (0.0025) [2024-06-06 13:50:40,314][14296] Updated weights for policy 0, policy_version 22277 (0.0036) [2024-06-06 13:50:41,561][14064] Fps is (10 sec: 50789.8, 60 sec: 48059.6, 300 sec: 47596.9). Total num frames: 365035520. Throughput: 0: 47167.8. Samples: 218059320. Policy #0 lag: (min: 0.0, avg: 11.2, max: 21.0) [2024-06-06 13:50:41,562][14064] Avg episode reward: [(0, '0.184')] [2024-06-06 13:50:43,257][14276] Signal inference workers to stop experience collection... (3300 times) [2024-06-06 13:50:43,258][14276] Signal inference workers to resume experience collection... (3300 times) [2024-06-06 13:50:43,299][14296] InferenceWorker_p0-w0: stopping experience collection (3300 times) [2024-06-06 13:50:43,299][14296] InferenceWorker_p0-w0: resuming experience collection (3300 times) [2024-06-06 13:50:44,226][14296] Updated weights for policy 0, policy_version 22287 (0.0021) [2024-06-06 13:50:46,561][14064] Fps is (10 sec: 45875.7, 60 sec: 47240.5, 300 sec: 47485.8). Total num frames: 365248512. Throughput: 0: 47337.9. Samples: 218351340. Policy #0 lag: (min: 0.0, avg: 11.2, max: 21.0) [2024-06-06 13:50:46,562][14064] Avg episode reward: [(0, '0.193')] [2024-06-06 13:50:47,426][14296] Updated weights for policy 0, policy_version 22297 (0.0035) [2024-06-06 13:50:50,923][14296] Updated weights for policy 0, policy_version 22307 (0.0031) [2024-06-06 13:50:51,561][14064] Fps is (10 sec: 44237.1, 60 sec: 46967.5, 300 sec: 47485.8). Total num frames: 365477888. Throughput: 0: 47329.3. Samples: 218492940. 
Policy #0 lag: (min: 0.0, avg: 11.2, max: 21.0) [2024-06-06 13:50:51,562][14064] Avg episode reward: [(0, '0.196')] [2024-06-06 13:50:54,359][14296] Updated weights for policy 0, policy_version 22317 (0.0033) [2024-06-06 13:50:56,561][14064] Fps is (10 sec: 47513.4, 60 sec: 46969.3, 300 sec: 47486.1). Total num frames: 365723648. Throughput: 0: 47233.8. Samples: 218775940. Policy #0 lag: (min: 0.0, avg: 9.2, max: 21.0) [2024-06-06 13:50:56,562][14064] Avg episode reward: [(0, '0.195')] [2024-06-06 13:50:57,980][14296] Updated weights for policy 0, policy_version 22327 (0.0027) [2024-06-06 13:51:01,061][14296] Updated weights for policy 0, policy_version 22337 (0.0034) [2024-06-06 13:51:01,561][14064] Fps is (10 sec: 50790.0, 60 sec: 47513.7, 300 sec: 47541.4). Total num frames: 365985792. Throughput: 0: 47243.0. Samples: 219060220. Policy #0 lag: (min: 0.0, avg: 9.2, max: 21.0) [2024-06-06 13:51:01,562][14064] Avg episode reward: [(0, '0.197')] [2024-06-06 13:51:05,015][14296] Updated weights for policy 0, policy_version 22347 (0.0032) [2024-06-06 13:51:06,564][14064] Fps is (10 sec: 47502.5, 60 sec: 46965.6, 300 sec: 47485.5). Total num frames: 366198784. Throughput: 0: 47384.7. Samples: 219205500. Policy #0 lag: (min: 0.0, avg: 9.2, max: 21.0) [2024-06-06 13:51:06,564][14064] Avg episode reward: [(0, '0.192')] [2024-06-06 13:51:07,796][14296] Updated weights for policy 0, policy_version 22357 (0.0040) [2024-06-06 13:51:11,561][14064] Fps is (10 sec: 45875.2, 60 sec: 47242.5, 300 sec: 47541.3). Total num frames: 366444544. Throughput: 0: 47432.3. Samples: 219492100. Policy #0 lag: (min: 0.0, avg: 9.2, max: 21.0) [2024-06-06 13:51:11,562][14064] Avg episode reward: [(0, '0.190')] [2024-06-06 13:51:11,717][14296] Updated weights for policy 0, policy_version 22367 (0.0031) [2024-06-06 13:51:14,776][14296] Updated weights for policy 0, policy_version 22377 (0.0026) [2024-06-06 13:51:16,561][14064] Fps is (10 sec: 47525.0, 60 sec: 47240.5, 300 sec: 47430.3). 
Total num frames: 366673920. Throughput: 0: 47482.6. Samples: 219783000. Policy #0 lag: (min: 0.0, avg: 10.1, max: 21.0) [2024-06-06 13:51:16,562][14064] Avg episode reward: [(0, '0.192')] [2024-06-06 13:51:18,435][14296] Updated weights for policy 0, policy_version 22387 (0.0029) [2024-06-06 13:51:21,561][14064] Fps is (10 sec: 47513.9, 60 sec: 47513.6, 300 sec: 47485.8). Total num frames: 366919680. Throughput: 0: 47806.3. Samples: 219929940. Policy #0 lag: (min: 0.0, avg: 10.1, max: 21.0) [2024-06-06 13:51:21,562][14064] Avg episode reward: [(0, '0.192')] [2024-06-06 13:51:21,794][14296] Updated weights for policy 0, policy_version 22397 (0.0035) [2024-06-06 13:51:25,465][14296] Updated weights for policy 0, policy_version 22407 (0.0034) [2024-06-06 13:51:26,561][14064] Fps is (10 sec: 47513.0, 60 sec: 47240.4, 300 sec: 47485.8). Total num frames: 367149056. Throughput: 0: 47718.6. Samples: 220206660. Policy #0 lag: (min: 0.0, avg: 10.1, max: 21.0) [2024-06-06 13:51:26,562][14064] Avg episode reward: [(0, '0.193')] [2024-06-06 13:51:28,586][14296] Updated weights for policy 0, policy_version 22417 (0.0026) [2024-06-06 13:51:31,564][14064] Fps is (10 sec: 49139.1, 60 sec: 48057.6, 300 sec: 47540.9). Total num frames: 367411200. Throughput: 0: 47605.7. Samples: 220493720. Policy #0 lag: (min: 1.0, avg: 10.1, max: 22.0) [2024-06-06 13:51:31,564][14064] Avg episode reward: [(0, '0.188')] [2024-06-06 13:51:32,238][14296] Updated weights for policy 0, policy_version 22427 (0.0022) [2024-06-06 13:51:35,294][14296] Updated weights for policy 0, policy_version 22437 (0.0040) [2024-06-06 13:51:36,561][14064] Fps is (10 sec: 47514.3, 60 sec: 47240.7, 300 sec: 47430.3). Total num frames: 367624192. Throughput: 0: 47576.0. Samples: 220633860. 
Policy #0 lag: (min: 1.0, avg: 10.1, max: 22.0) [2024-06-06 13:51:36,562][14064] Avg episode reward: [(0, '0.193')] [2024-06-06 13:51:38,964][14296] Updated weights for policy 0, policy_version 22447 (0.0038) [2024-06-06 13:51:41,561][14064] Fps is (10 sec: 45886.7, 60 sec: 47240.5, 300 sec: 47486.2). Total num frames: 367869952. Throughput: 0: 47714.1. Samples: 220923080. Policy #0 lag: (min: 1.0, avg: 10.1, max: 22.0) [2024-06-06 13:51:41,562][14064] Avg episode reward: [(0, '0.191')] [2024-06-06 13:51:42,295][14296] Updated weights for policy 0, policy_version 22457 (0.0030) [2024-06-06 13:51:45,921][14296] Updated weights for policy 0, policy_version 22467 (0.0020) [2024-06-06 13:51:46,561][14064] Fps is (10 sec: 47513.2, 60 sec: 47513.6, 300 sec: 47485.8). Total num frames: 368099328. Throughput: 0: 47816.5. Samples: 221211960. Policy #0 lag: (min: 0.0, avg: 10.9, max: 21.0) [2024-06-06 13:51:46,562][14064] Avg episode reward: [(0, '0.191')] [2024-06-06 13:51:49,362][14296] Updated weights for policy 0, policy_version 22477 (0.0032) [2024-06-06 13:51:51,561][14064] Fps is (10 sec: 47513.8, 60 sec: 47786.6, 300 sec: 47485.8). Total num frames: 368345088. Throughput: 0: 47564.2. Samples: 221345780. Policy #0 lag: (min: 0.0, avg: 10.9, max: 21.0) [2024-06-06 13:51:51,562][14064] Avg episode reward: [(0, '0.192')] [2024-06-06 13:51:52,957][14296] Updated weights for policy 0, policy_version 22487 (0.0022) [2024-06-06 13:51:54,139][14276] Signal inference workers to stop experience collection... (3350 times) [2024-06-06 13:51:54,140][14276] Signal inference workers to resume experience collection... 
(3350 times) [2024-06-06 13:51:54,170][14296] InferenceWorker_p0-w0: stopping experience collection (3350 times) [2024-06-06 13:51:54,171][14296] InferenceWorker_p0-w0: resuming experience collection (3350 times) [2024-06-06 13:51:56,021][14296] Updated weights for policy 0, policy_version 22497 (0.0028) [2024-06-06 13:51:56,561][14064] Fps is (10 sec: 50789.9, 60 sec: 48059.6, 300 sec: 47541.4). Total num frames: 368607232. Throughput: 0: 47648.4. Samples: 221636280. Policy #0 lag: (min: 0.0, avg: 10.9, max: 21.0) [2024-06-06 13:51:56,562][14064] Avg episode reward: [(0, '0.195')] [2024-06-06 13:51:59,598][14296] Updated weights for policy 0, policy_version 22507 (0.0028) [2024-06-06 13:52:01,561][14064] Fps is (10 sec: 49152.2, 60 sec: 47513.6, 300 sec: 47596.9). Total num frames: 368836608. Throughput: 0: 47500.8. Samples: 221920540. Policy #0 lag: (min: 0.0, avg: 11.2, max: 22.0) [2024-06-06 13:52:01,562][14064] Avg episode reward: [(0, '0.192')] [2024-06-06 13:52:02,723][14296] Updated weights for policy 0, policy_version 22517 (0.0035) [2024-06-06 13:52:06,495][14296] Updated weights for policy 0, policy_version 22527 (0.0032) [2024-06-06 13:52:06,561][14064] Fps is (10 sec: 47514.0, 60 sec: 48061.6, 300 sec: 47596.9). Total num frames: 369082368. Throughput: 0: 47582.6. Samples: 222071160. Policy #0 lag: (min: 0.0, avg: 11.2, max: 22.0) [2024-06-06 13:52:06,562][14064] Avg episode reward: [(0, '0.191')] [2024-06-06 13:52:06,577][14276] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000022527_369082368.pth... [2024-06-06 13:52:06,636][14276] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000021830_357662720.pth [2024-06-06 13:52:09,729][14296] Updated weights for policy 0, policy_version 22537 (0.0023) [2024-06-06 13:52:11,561][14064] Fps is (10 sec: 45875.7, 60 sec: 47513.7, 300 sec: 47430.3). Total num frames: 369295360. Throughput: 0: 47773.5. Samples: 222356460. 
Policy #0 lag: (min: 0.0, avg: 11.2, max: 22.0) [2024-06-06 13:52:11,561][14064] Avg episode reward: [(0, '0.194')] [2024-06-06 13:52:13,437][14296] Updated weights for policy 0, policy_version 22547 (0.0021) [2024-06-06 13:52:16,471][14296] Updated weights for policy 0, policy_version 22557 (0.0031) [2024-06-06 13:52:16,561][14064] Fps is (10 sec: 49152.4, 60 sec: 48332.8, 300 sec: 47597.3). Total num frames: 369573888. Throughput: 0: 47644.1. Samples: 222637580. Policy #0 lag: (min: 0.0, avg: 10.0, max: 22.0) [2024-06-06 13:52:16,562][14064] Avg episode reward: [(0, '0.200')] [2024-06-06 13:52:16,571][14276] Saving new best policy, reward=0.200! [2024-06-06 13:52:20,627][14296] Updated weights for policy 0, policy_version 22567 (0.0044) [2024-06-06 13:52:21,561][14064] Fps is (10 sec: 49151.5, 60 sec: 47786.6, 300 sec: 47541.4). Total num frames: 369786880. Throughput: 0: 47697.3. Samples: 222780240. Policy #0 lag: (min: 0.0, avg: 10.0, max: 22.0) [2024-06-06 13:52:21,562][14064] Avg episode reward: [(0, '0.201')] [2024-06-06 13:52:21,562][14276] Saving new best policy, reward=0.201! [2024-06-06 13:52:23,396][14296] Updated weights for policy 0, policy_version 22577 (0.0024) [2024-06-06 13:52:26,561][14064] Fps is (10 sec: 42598.4, 60 sec: 47513.7, 300 sec: 47541.4). Total num frames: 369999872. Throughput: 0: 47655.3. Samples: 223067560. Policy #0 lag: (min: 0.0, avg: 10.0, max: 22.0) [2024-06-06 13:52:26,562][14064] Avg episode reward: [(0, '0.196')] [2024-06-06 13:52:27,362][14296] Updated weights for policy 0, policy_version 22587 (0.0038) [2024-06-06 13:52:30,338][14296] Updated weights for policy 0, policy_version 22597 (0.0036) [2024-06-06 13:52:31,561][14064] Fps is (10 sec: 45874.9, 60 sec: 47242.5, 300 sec: 47485.8). Total num frames: 370245632. Throughput: 0: 47543.5. Samples: 223351420. 
Policy #0 lag: (min: 0.0, avg: 10.5, max: 22.0) [2024-06-06 13:52:31,562][14064] Avg episode reward: [(0, '0.192')] [2024-06-06 13:52:34,174][14296] Updated weights for policy 0, policy_version 22607 (0.0030) [2024-06-06 13:52:36,561][14064] Fps is (10 sec: 49152.0, 60 sec: 47786.7, 300 sec: 47485.8). Total num frames: 370491392. Throughput: 0: 47687.3. Samples: 223491700. Policy #0 lag: (min: 0.0, avg: 10.5, max: 22.0) [2024-06-06 13:52:36,562][14064] Avg episode reward: [(0, '0.195')] [2024-06-06 13:52:37,390][14296] Updated weights for policy 0, policy_version 22617 (0.0032) [2024-06-06 13:52:41,091][14296] Updated weights for policy 0, policy_version 22627 (0.0023) [2024-06-06 13:52:41,561][14064] Fps is (10 sec: 47513.6, 60 sec: 47513.6, 300 sec: 47541.3). Total num frames: 370720768. Throughput: 0: 47611.2. Samples: 223778780. Policy #0 lag: (min: 0.0, avg: 10.5, max: 22.0) [2024-06-06 13:52:41,562][14064] Avg episode reward: [(0, '0.204')] [2024-06-06 13:52:41,670][14276] Saving new best policy, reward=0.204! [2024-06-06 13:52:44,306][14296] Updated weights for policy 0, policy_version 22637 (0.0023) [2024-06-06 13:52:46,561][14064] Fps is (10 sec: 47513.4, 60 sec: 47786.7, 300 sec: 47541.4). Total num frames: 370966528. Throughput: 0: 47643.6. Samples: 224064500. Policy #0 lag: (min: 1.0, avg: 10.4, max: 21.0) [2024-06-06 13:52:46,562][14064] Avg episode reward: [(0, '0.189')] [2024-06-06 13:52:48,019][14296] Updated weights for policy 0, policy_version 22647 (0.0030) [2024-06-06 13:52:51,074][14296] Updated weights for policy 0, policy_version 22657 (0.0027) [2024-06-06 13:52:51,561][14064] Fps is (10 sec: 50790.5, 60 sec: 48059.7, 300 sec: 47596.9). Total num frames: 371228672. Throughput: 0: 47497.3. Samples: 224208540. 
Policy #0 lag: (min: 1.0, avg: 10.4, max: 21.0) [2024-06-06 13:52:51,562][14064] Avg episode reward: [(0, '0.193')] [2024-06-06 13:52:54,710][14296] Updated weights for policy 0, policy_version 22667 (0.0029) [2024-06-06 13:52:56,561][14064] Fps is (10 sec: 47513.2, 60 sec: 47240.6, 300 sec: 47541.3). Total num frames: 371441664. Throughput: 0: 47523.8. Samples: 224495040. Policy #0 lag: (min: 1.0, avg: 10.4, max: 21.0) [2024-06-06 13:52:56,562][14064] Avg episode reward: [(0, '0.197')] [2024-06-06 13:52:57,899][14296] Updated weights for policy 0, policy_version 22677 (0.0033) [2024-06-06 13:53:01,561][14064] Fps is (10 sec: 45875.4, 60 sec: 47513.6, 300 sec: 47541.8). Total num frames: 371687424. Throughput: 0: 47720.8. Samples: 224785020. Policy #0 lag: (min: 0.0, avg: 9.4, max: 20.0) [2024-06-06 13:53:01,562][14064] Avg episode reward: [(0, '0.197')] [2024-06-06 13:53:01,774][14296] Updated weights for policy 0, policy_version 22687 (0.0032) [2024-06-06 13:53:02,421][14276] Signal inference workers to stop experience collection... (3400 times) [2024-06-06 13:53:02,430][14296] InferenceWorker_p0-w0: stopping experience collection (3400 times) [2024-06-06 13:53:02,479][14276] Signal inference workers to resume experience collection... (3400 times) [2024-06-06 13:53:02,479][14296] InferenceWorker_p0-w0: resuming experience collection (3400 times) [2024-06-06 13:53:04,717][14296] Updated weights for policy 0, policy_version 22697 (0.0031) [2024-06-06 13:53:06,561][14064] Fps is (10 sec: 47513.4, 60 sec: 47240.5, 300 sec: 47541.4). Total num frames: 371916800. Throughput: 0: 47763.0. Samples: 224929580. Policy #0 lag: (min: 0.0, avg: 9.4, max: 20.0) [2024-06-06 13:53:06,562][14064] Avg episode reward: [(0, '0.199')] [2024-06-06 13:53:08,391][14296] Updated weights for policy 0, policy_version 22707 (0.0033) [2024-06-06 13:53:11,561][14064] Fps is (10 sec: 49151.5, 60 sec: 48059.6, 300 sec: 47596.9). Total num frames: 372178944. Throughput: 0: 47744.7. 
Samples: 225216080. Policy #0 lag: (min: 0.0, avg: 9.4, max: 20.0) [2024-06-06 13:53:11,562][14064] Avg episode reward: [(0, '0.193')] [2024-06-06 13:53:11,575][14296] Updated weights for policy 0, policy_version 22717 (0.0029) [2024-06-06 13:53:15,316][14296] Updated weights for policy 0, policy_version 22727 (0.0024) [2024-06-06 13:53:16,561][14064] Fps is (10 sec: 49152.8, 60 sec: 47240.5, 300 sec: 47596.9). Total num frames: 372408320. Throughput: 0: 47805.5. Samples: 225502660. Policy #0 lag: (min: 0.0, avg: 9.6, max: 22.0) [2024-06-06 13:53:16,562][14064] Avg episode reward: [(0, '0.191')] [2024-06-06 13:53:18,367][14296] Updated weights for policy 0, policy_version 22737 (0.0031) [2024-06-06 13:53:21,561][14064] Fps is (10 sec: 45875.6, 60 sec: 47513.6, 300 sec: 47596.9). Total num frames: 372637696. Throughput: 0: 47904.3. Samples: 225647400. Policy #0 lag: (min: 0.0, avg: 9.6, max: 22.0) [2024-06-06 13:53:21,562][14064] Avg episode reward: [(0, '0.189')] [2024-06-06 13:53:22,026][14296] Updated weights for policy 0, policy_version 22747 (0.0037) [2024-06-06 13:53:25,542][14296] Updated weights for policy 0, policy_version 22757 (0.0034) [2024-06-06 13:53:26,561][14064] Fps is (10 sec: 47512.5, 60 sec: 48059.5, 300 sec: 47541.3). Total num frames: 372883456. Throughput: 0: 47750.6. Samples: 225927560. Policy #0 lag: (min: 0.0, avg: 9.6, max: 22.0) [2024-06-06 13:53:26,562][14064] Avg episode reward: [(0, '0.190')] [2024-06-06 13:53:28,991][14296] Updated weights for policy 0, policy_version 22767 (0.0032) [2024-06-06 13:53:31,564][14064] Fps is (10 sec: 47501.4, 60 sec: 47784.7, 300 sec: 47540.9). Total num frames: 373112832. Throughput: 0: 47860.8. Samples: 226218360. 
Policy #0 lag: (min: 0.0, avg: 10.2, max: 23.0) [2024-06-06 13:53:31,565][14064] Avg episode reward: [(0, '0.187')] [2024-06-06 13:53:32,496][14296] Updated weights for policy 0, policy_version 22777 (0.0020) [2024-06-06 13:53:36,108][14296] Updated weights for policy 0, policy_version 22787 (0.0026) [2024-06-06 13:53:36,561][14064] Fps is (10 sec: 47514.8, 60 sec: 47786.7, 300 sec: 47596.9). Total num frames: 373358592. Throughput: 0: 47835.3. Samples: 226361120. Policy #0 lag: (min: 0.0, avg: 10.2, max: 23.0) [2024-06-06 13:53:36,561][14064] Avg episode reward: [(0, '0.196')] [2024-06-06 13:53:39,171][14296] Updated weights for policy 0, policy_version 22797 (0.0027) [2024-06-06 13:53:41,561][14064] Fps is (10 sec: 49164.7, 60 sec: 48059.8, 300 sec: 47652.4). Total num frames: 373604352. Throughput: 0: 47797.8. Samples: 226645940. Policy #0 lag: (min: 0.0, avg: 10.2, max: 23.0) [2024-06-06 13:53:41,562][14064] Avg episode reward: [(0, '0.197')] [2024-06-06 13:53:42,910][14296] Updated weights for policy 0, policy_version 22807 (0.0037) [2024-06-06 13:53:45,887][14296] Updated weights for policy 0, policy_version 22817 (0.0036) [2024-06-06 13:53:46,561][14064] Fps is (10 sec: 49151.8, 60 sec: 48059.7, 300 sec: 47652.5). Total num frames: 373850112. Throughput: 0: 47712.1. Samples: 226932060. Policy #0 lag: (min: 0.0, avg: 10.5, max: 20.0) [2024-06-06 13:53:46,562][14064] Avg episode reward: [(0, '0.192')] [2024-06-06 13:53:49,582][14296] Updated weights for policy 0, policy_version 22827 (0.0036) [2024-06-06 13:53:51,561][14064] Fps is (10 sec: 47513.9, 60 sec: 47513.7, 300 sec: 47596.9). Total num frames: 374079488. Throughput: 0: 47775.7. Samples: 227079480. Policy #0 lag: (min: 0.0, avg: 10.5, max: 20.0) [2024-06-06 13:53:51,562][14064] Avg episode reward: [(0, '0.196')] [2024-06-06 13:53:52,811][14296] Updated weights for policy 0, policy_version 22837 (0.0025) [2024-06-06 13:53:56,561][14064] Fps is (10 sec: 45875.5, 60 sec: 47786.8, 300 sec: 47541.8). 
Total num frames: 374308864. Throughput: 0: 47829.1. Samples: 227368380. Policy #0 lag: (min: 0.0, avg: 10.5, max: 20.0) [2024-06-06 13:53:56,562][14064] Avg episode reward: [(0, '0.185')] [2024-06-06 13:53:56,653][14296] Updated weights for policy 0, policy_version 22847 (0.0033) [2024-06-06 13:53:59,798][14296] Updated weights for policy 0, policy_version 22857 (0.0038) [2024-06-06 13:54:01,561][14064] Fps is (10 sec: 49151.4, 60 sec: 48059.7, 300 sec: 47652.4). Total num frames: 374571008. Throughput: 0: 47635.4. Samples: 227646260. Policy #0 lag: (min: 0.0, avg: 10.5, max: 20.0) [2024-06-06 13:54:01,562][14064] Avg episode reward: [(0, '0.199')] [2024-06-06 13:54:03,454][14296] Updated weights for policy 0, policy_version 22867 (0.0037) [2024-06-06 13:54:06,561][14296] Updated weights for policy 0, policy_version 22877 (0.0026) [2024-06-06 13:54:06,561][14064] Fps is (10 sec: 50789.3, 60 sec: 48332.8, 300 sec: 47596.9). Total num frames: 374816768. Throughput: 0: 47633.7. Samples: 227790920. Policy #0 lag: (min: 0.0, avg: 10.7, max: 20.0) [2024-06-06 13:54:06,562][14064] Avg episode reward: [(0, '0.192')] [2024-06-06 13:54:06,569][14276] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000022877_374816768.pth... [2024-06-06 13:54:06,614][14276] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000022180_363397120.pth [2024-06-06 13:54:10,577][14296] Updated weights for policy 0, policy_version 22887 (0.0028) [2024-06-06 13:54:11,561][14064] Fps is (10 sec: 45875.0, 60 sec: 47513.6, 300 sec: 47596.9). Total num frames: 375029760. Throughput: 0: 47711.6. Samples: 228074580. Policy #0 lag: (min: 0.0, avg: 10.7, max: 20.0) [2024-06-06 13:54:11,562][14064] Avg episode reward: [(0, '0.197')] [2024-06-06 13:54:13,614][14296] Updated weights for policy 0, policy_version 22897 (0.0031) [2024-06-06 13:54:16,561][14064] Fps is (10 sec: 45875.3, 60 sec: 47786.5, 300 sec: 47708.0). Total num frames: 375275520. 
Throughput: 0: 47547.1. Samples: 228357860. Policy #0 lag: (min: 0.0, avg: 10.7, max: 20.0) [2024-06-06 13:54:16,562][14064] Avg episode reward: [(0, '0.199')] [2024-06-06 13:54:17,445][14296] Updated weights for policy 0, policy_version 22907 (0.0040) [2024-06-06 13:54:20,511][14296] Updated weights for policy 0, policy_version 22917 (0.0025) [2024-06-06 13:54:21,564][14064] Fps is (10 sec: 47501.8, 60 sec: 47784.6, 300 sec: 47596.5). Total num frames: 375504896. Throughput: 0: 47632.3. Samples: 228504700. Policy #0 lag: (min: 0.0, avg: 9.7, max: 21.0) [2024-06-06 13:54:21,565][14064] Avg episode reward: [(0, '0.196')] [2024-06-06 13:54:24,456][14296] Updated weights for policy 0, policy_version 22927 (0.0027) [2024-06-06 13:54:26,561][14064] Fps is (10 sec: 45875.4, 60 sec: 47513.7, 300 sec: 47596.9). Total num frames: 375734272. Throughput: 0: 47426.6. Samples: 228780140. Policy #0 lag: (min: 0.0, avg: 9.7, max: 21.0) [2024-06-06 13:54:26,562][14064] Avg episode reward: [(0, '0.195')] [2024-06-06 13:54:27,610][14296] Updated weights for policy 0, policy_version 22937 (0.0024) [2024-06-06 13:54:31,305][14296] Updated weights for policy 0, policy_version 22947 (0.0034) [2024-06-06 13:54:31,561][14064] Fps is (10 sec: 45887.3, 60 sec: 47515.7, 300 sec: 47596.9). Total num frames: 375963648. Throughput: 0: 47411.1. Samples: 229065560. Policy #0 lag: (min: 0.0, avg: 9.7, max: 21.0) [2024-06-06 13:54:31,562][14064] Avg episode reward: [(0, '0.198')] [2024-06-06 13:54:34,597][14296] Updated weights for policy 0, policy_version 22957 (0.0035) [2024-06-06 13:54:36,561][14064] Fps is (10 sec: 47514.1, 60 sec: 47513.6, 300 sec: 47652.4). Total num frames: 376209408. Throughput: 0: 47211.2. Samples: 229203980. 
Policy #0 lag: (min: 0.0, avg: 10.0, max: 21.0) [2024-06-06 13:54:36,562][14064] Avg episode reward: [(0, '0.193')] [2024-06-06 13:54:38,214][14296] Updated weights for policy 0, policy_version 22967 (0.0020) [2024-06-06 13:54:41,459][14296] Updated weights for policy 0, policy_version 22977 (0.0023) [2024-06-06 13:54:41,564][14064] Fps is (10 sec: 49139.1, 60 sec: 47511.6, 300 sec: 47596.5). Total num frames: 376455168. Throughput: 0: 47036.7. Samples: 229485160. Policy #0 lag: (min: 0.0, avg: 10.0, max: 21.0) [2024-06-06 13:54:41,565][14064] Avg episode reward: [(0, '0.191')] [2024-06-06 13:54:45,185][14296] Updated weights for policy 0, policy_version 22987 (0.0032) [2024-06-06 13:54:46,561][14064] Fps is (10 sec: 45874.7, 60 sec: 46967.4, 300 sec: 47485.8). Total num frames: 376668160. Throughput: 0: 47241.4. Samples: 229772120. Policy #0 lag: (min: 0.0, avg: 10.0, max: 21.0) [2024-06-06 13:54:46,562][14064] Avg episode reward: [(0, '0.196')] [2024-06-06 13:54:48,306][14296] Updated weights for policy 0, policy_version 22997 (0.0027) [2024-06-06 13:54:51,561][14064] Fps is (10 sec: 45887.3, 60 sec: 47240.5, 300 sec: 47486.2). Total num frames: 376913920. Throughput: 0: 47040.6. Samples: 229907740. Policy #0 lag: (min: 0.0, avg: 11.6, max: 23.0) [2024-06-06 13:54:51,562][14064] Avg episode reward: [(0, '0.193')] [2024-06-06 13:54:51,999][14296] Updated weights for policy 0, policy_version 23007 (0.0036) [2024-06-06 13:54:55,166][14276] Signal inference workers to stop experience collection... (3450 times) [2024-06-06 13:54:55,168][14276] Signal inference workers to resume experience collection... 
(3450 times) [2024-06-06 13:54:55,207][14296] InferenceWorker_p0-w0: stopping experience collection (3450 times) [2024-06-06 13:54:55,208][14296] InferenceWorker_p0-w0: resuming experience collection (3450 times) [2024-06-06 13:54:55,296][14296] Updated weights for policy 0, policy_version 23017 (0.0025) [2024-06-06 13:54:56,561][14064] Fps is (10 sec: 49151.8, 60 sec: 47513.5, 300 sec: 47541.4). Total num frames: 377159680. Throughput: 0: 47113.8. Samples: 230194700. Policy #0 lag: (min: 0.0, avg: 11.6, max: 23.0) [2024-06-06 13:54:56,562][14064] Avg episode reward: [(0, '0.197')] [2024-06-06 13:54:59,024][14296] Updated weights for policy 0, policy_version 23027 (0.0034) [2024-06-06 13:55:01,561][14064] Fps is (10 sec: 49151.9, 60 sec: 47240.6, 300 sec: 47541.4). Total num frames: 377405440. Throughput: 0: 47185.9. Samples: 230481220. Policy #0 lag: (min: 0.0, avg: 11.6, max: 23.0) [2024-06-06 13:55:01,562][14064] Avg episode reward: [(0, '0.193')] [2024-06-06 13:55:02,033][14296] Updated weights for policy 0, policy_version 23037 (0.0024) [2024-06-06 13:55:05,735][14296] Updated weights for policy 0, policy_version 23047 (0.0032) [2024-06-06 13:55:06,561][14064] Fps is (10 sec: 47514.1, 60 sec: 46967.6, 300 sec: 47541.8). Total num frames: 377634816. Throughput: 0: 47131.7. Samples: 230625500. Policy #0 lag: (min: 1.0, avg: 10.2, max: 21.0) [2024-06-06 13:55:06,562][14064] Avg episode reward: [(0, '0.199')] [2024-06-06 13:55:08,815][14296] Updated weights for policy 0, policy_version 23057 (0.0038) [2024-06-06 13:55:11,561][14064] Fps is (10 sec: 47513.0, 60 sec: 47513.6, 300 sec: 47596.9). Total num frames: 377880576. Throughput: 0: 47381.7. Samples: 230912320. 
Policy #0 lag: (min: 1.0, avg: 10.2, max: 21.0) [2024-06-06 13:55:11,562][14064] Avg episode reward: [(0, '0.195')] [2024-06-06 13:55:12,763][14296] Updated weights for policy 0, policy_version 23067 (0.0032) [2024-06-06 13:55:15,941][14296] Updated weights for policy 0, policy_version 23077 (0.0031) [2024-06-06 13:55:16,561][14064] Fps is (10 sec: 47513.8, 60 sec: 47240.7, 300 sec: 47596.9). Total num frames: 378109952. Throughput: 0: 47403.6. Samples: 231198720. Policy #0 lag: (min: 1.0, avg: 10.2, max: 21.0) [2024-06-06 13:55:16,562][14064] Avg episode reward: [(0, '0.199')] [2024-06-06 13:55:19,531][14296] Updated weights for policy 0, policy_version 23087 (0.0036) [2024-06-06 13:55:21,561][14064] Fps is (10 sec: 45876.3, 60 sec: 47242.7, 300 sec: 47541.4). Total num frames: 378339328. Throughput: 0: 47480.5. Samples: 231340600. Policy #0 lag: (min: 0.0, avg: 9.9, max: 20.0) [2024-06-06 13:55:21,562][14064] Avg episode reward: [(0, '0.202')] [2024-06-06 13:55:22,966][14296] Updated weights for policy 0, policy_version 23097 (0.0037) [2024-06-06 13:55:26,464][14296] Updated weights for policy 0, policy_version 23107 (0.0029) [2024-06-06 13:55:26,561][14064] Fps is (10 sec: 47513.2, 60 sec: 47513.6, 300 sec: 47652.4). Total num frames: 378585088. Throughput: 0: 47729.4. Samples: 231632860. Policy #0 lag: (min: 0.0, avg: 9.9, max: 20.0) [2024-06-06 13:55:26,562][14064] Avg episode reward: [(0, '0.196')] [2024-06-06 13:55:29,594][14296] Updated weights for policy 0, policy_version 23117 (0.0030) [2024-06-06 13:55:31,561][14064] Fps is (10 sec: 49151.4, 60 sec: 47786.6, 300 sec: 47596.9). Total num frames: 378830848. Throughput: 0: 47680.0. Samples: 231917720. 
Policy #0 lag: (min: 0.0, avg: 9.9, max: 20.0) [2024-06-06 13:55:31,562][14064] Avg episode reward: [(0, '0.196')] [2024-06-06 13:55:33,220][14296] Updated weights for policy 0, policy_version 23127 (0.0030) [2024-06-06 13:55:36,351][14296] Updated weights for policy 0, policy_version 23137 (0.0030) [2024-06-06 13:55:36,561][14064] Fps is (10 sec: 49151.3, 60 sec: 47786.5, 300 sec: 47596.9). Total num frames: 379076608. Throughput: 0: 47817.6. Samples: 232059540. Policy #0 lag: (min: 0.0, avg: 9.7, max: 20.0) [2024-06-06 13:55:36,562][14064] Avg episode reward: [(0, '0.195')] [2024-06-06 13:55:40,319][14296] Updated weights for policy 0, policy_version 23147 (0.0040) [2024-06-06 13:55:41,562][14064] Fps is (10 sec: 45874.5, 60 sec: 47242.4, 300 sec: 47596.9). Total num frames: 379289600. Throughput: 0: 47659.9. Samples: 232339400. Policy #0 lag: (min: 0.0, avg: 9.7, max: 20.0) [2024-06-06 13:55:41,562][14064] Avg episode reward: [(0, '0.196')] [2024-06-06 13:55:43,303][14296] Updated weights for policy 0, policy_version 23157 (0.0029) [2024-06-06 13:55:46,561][14064] Fps is (10 sec: 45876.2, 60 sec: 47786.7, 300 sec: 47652.5). Total num frames: 379535360. Throughput: 0: 47778.7. Samples: 232631260. Policy #0 lag: (min: 0.0, avg: 9.7, max: 20.0) [2024-06-06 13:55:46,562][14064] Avg episode reward: [(0, '0.192')] [2024-06-06 13:55:46,943][14296] Updated weights for policy 0, policy_version 23167 (0.0027) [2024-06-06 13:55:50,373][14296] Updated weights for policy 0, policy_version 23177 (0.0022) [2024-06-06 13:55:51,561][14064] Fps is (10 sec: 49153.2, 60 sec: 47786.7, 300 sec: 47652.5). Total num frames: 379781120. Throughput: 0: 47848.5. Samples: 232778680. Policy #0 lag: (min: 0.0, avg: 10.0, max: 21.0) [2024-06-06 13:55:51,562][14064] Avg episode reward: [(0, '0.197')] [2024-06-06 13:55:53,979][14296] Updated weights for policy 0, policy_version 23187 (0.0035) [2024-06-06 13:55:56,561][14064] Fps is (10 sec: 49151.9, 60 sec: 47786.8, 300 sec: 47596.9). 
Total num frames: 380026880. Throughput: 0: 47850.4. Samples: 233065580. Policy #0 lag: (min: 0.0, avg: 10.0, max: 21.0) [2024-06-06 13:55:56,562][14064] Avg episode reward: [(0, '0.187')] [2024-06-06 13:55:57,114][14296] Updated weights for policy 0, policy_version 23197 (0.0031) [2024-06-06 13:56:00,809][14296] Updated weights for policy 0, policy_version 23207 (0.0029) [2024-06-06 13:56:01,562][14064] Fps is (10 sec: 49147.7, 60 sec: 47786.0, 300 sec: 47708.2). Total num frames: 380272640. Throughput: 0: 47708.8. Samples: 233345660. Policy #0 lag: (min: 0.0, avg: 10.0, max: 21.0) [2024-06-06 13:56:01,563][14064] Avg episode reward: [(0, '0.200')] [2024-06-06 13:56:03,785][14296] Updated weights for policy 0, policy_version 23217 (0.0031) [2024-06-06 13:56:06,561][14064] Fps is (10 sec: 47513.5, 60 sec: 47786.7, 300 sec: 47652.5). Total num frames: 380502016. Throughput: 0: 47892.3. Samples: 233495760. Policy #0 lag: (min: 0.0, avg: 10.0, max: 21.0) [2024-06-06 13:56:06,562][14064] Avg episode reward: [(0, '0.195')] [2024-06-06 13:56:06,572][14276] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000023224_380502016.pth... [2024-06-06 13:56:06,618][14276] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000022527_369082368.pth [2024-06-06 13:56:07,542][14296] Updated weights for policy 0, policy_version 23227 (0.0026) [2024-06-06 13:56:10,866][14296] Updated weights for policy 0, policy_version 23237 (0.0025) [2024-06-06 13:56:11,561][14064] Fps is (10 sec: 45878.9, 60 sec: 47513.7, 300 sec: 47652.4). Total num frames: 380731392. Throughput: 0: 47738.7. Samples: 233781100. Policy #0 lag: (min: 0.0, avg: 10.3, max: 21.0) [2024-06-06 13:56:11,562][14064] Avg episode reward: [(0, '0.191')] [2024-06-06 13:56:14,384][14296] Updated weights for policy 0, policy_version 23247 (0.0025) [2024-06-06 13:56:16,561][14064] Fps is (10 sec: 45875.4, 60 sec: 47513.6, 300 sec: 47596.9). Total num frames: 380960768. 
Throughput: 0: 47929.0. Samples: 234074520. Policy #0 lag: (min: 0.0, avg: 10.3, max: 21.0) [2024-06-06 13:56:16,562][14064] Avg episode reward: [(0, '0.193')] [2024-06-06 13:56:17,634][14276] Signal inference workers to stop experience collection... (3500 times) [2024-06-06 13:56:17,640][14276] Signal inference workers to resume experience collection... (3500 times) [2024-06-06 13:56:17,682][14296] InferenceWorker_p0-w0: stopping experience collection (3500 times) [2024-06-06 13:56:17,682][14296] InferenceWorker_p0-w0: resuming experience collection (3500 times) [2024-06-06 13:56:17,771][14296] Updated weights for policy 0, policy_version 23257 (0.0027) [2024-06-06 13:56:21,237][14296] Updated weights for policy 0, policy_version 23267 (0.0027) [2024-06-06 13:56:21,561][14064] Fps is (10 sec: 49152.3, 60 sec: 48059.7, 300 sec: 47708.0). Total num frames: 381222912. Throughput: 0: 47878.4. Samples: 234214060. Policy #0 lag: (min: 0.0, avg: 10.3, max: 21.0) [2024-06-06 13:56:21,562][14064] Avg episode reward: [(0, '0.196')] [2024-06-06 13:56:24,467][14296] Updated weights for policy 0, policy_version 23277 (0.0026) [2024-06-06 13:56:26,563][14064] Fps is (10 sec: 49141.5, 60 sec: 47785.0, 300 sec: 47597.0). Total num frames: 381452288. Throughput: 0: 48022.4. Samples: 234500500. Policy #0 lag: (min: 0.0, avg: 11.1, max: 24.0) [2024-06-06 13:56:26,564][14064] Avg episode reward: [(0, '0.198')] [2024-06-06 13:56:27,952][14296] Updated weights for policy 0, policy_version 23287 (0.0034) [2024-06-06 13:56:31,242][14296] Updated weights for policy 0, policy_version 23297 (0.0031) [2024-06-06 13:56:31,561][14064] Fps is (10 sec: 47513.3, 60 sec: 47786.7, 300 sec: 47708.0). Total num frames: 381698048. Throughput: 0: 47871.5. Samples: 234785480. 
Policy #0 lag: (min: 0.0, avg: 11.1, max: 24.0) [2024-06-06 13:56:31,562][14064] Avg episode reward: [(0, '0.193')] [2024-06-06 13:56:34,922][14296] Updated weights for policy 0, policy_version 23307 (0.0027) [2024-06-06 13:56:36,564][14064] Fps is (10 sec: 47511.1, 60 sec: 47511.7, 300 sec: 47652.0). Total num frames: 381927424. Throughput: 0: 47902.5. Samples: 234934420. Policy #0 lag: (min: 0.0, avg: 11.1, max: 24.0) [2024-06-06 13:56:36,564][14064] Avg episode reward: [(0, '0.196')] [2024-06-06 13:56:38,383][14296] Updated weights for policy 0, policy_version 23317 (0.0035) [2024-06-06 13:56:41,538][14296] Updated weights for policy 0, policy_version 23327 (0.0030) [2024-06-06 13:56:41,561][14064] Fps is (10 sec: 49151.8, 60 sec: 48332.9, 300 sec: 47763.5). Total num frames: 382189568. Throughput: 0: 47823.0. Samples: 235217620. Policy #0 lag: (min: 1.0, avg: 11.1, max: 22.0) [2024-06-06 13:56:41,562][14064] Avg episode reward: [(0, '0.200')] [2024-06-06 13:56:45,317][14296] Updated weights for policy 0, policy_version 23337 (0.0027) [2024-06-06 13:56:46,561][14064] Fps is (10 sec: 50803.7, 60 sec: 48332.8, 300 sec: 47763.5). Total num frames: 382435328. Throughput: 0: 47919.1. Samples: 235501980. Policy #0 lag: (min: 1.0, avg: 11.1, max: 22.0) [2024-06-06 13:56:46,562][14064] Avg episode reward: [(0, '0.190')] [2024-06-06 13:56:48,428][14296] Updated weights for policy 0, policy_version 23347 (0.0024) [2024-06-06 13:56:51,561][14064] Fps is (10 sec: 45875.2, 60 sec: 47786.6, 300 sec: 47596.9). Total num frames: 382648320. Throughput: 0: 48010.2. Samples: 235656220. Policy #0 lag: (min: 1.0, avg: 11.1, max: 22.0) [2024-06-06 13:56:51,562][14064] Avg episode reward: [(0, '0.200')] [2024-06-06 13:56:52,004][14296] Updated weights for policy 0, policy_version 23357 (0.0039) [2024-06-06 13:56:55,407][14296] Updated weights for policy 0, policy_version 23367 (0.0041) [2024-06-06 13:56:56,561][14064] Fps is (10 sec: 45875.4, 60 sec: 47786.7, 300 sec: 47652.5). 
Total num frames: 382894080. Throughput: 0: 47830.7. Samples: 235933480. Policy #0 lag: (min: 0.0, avg: 10.8, max: 21.0) [2024-06-06 13:56:56,562][14064] Avg episode reward: [(0, '0.203')] [2024-06-06 13:56:58,790][14296] Updated weights for policy 0, policy_version 23377 (0.0031) [2024-06-06 13:57:01,561][14064] Fps is (10 sec: 49152.4, 60 sec: 47787.3, 300 sec: 47652.5). Total num frames: 383139840. Throughput: 0: 47781.3. Samples: 236224680. Policy #0 lag: (min: 0.0, avg: 10.8, max: 21.0) [2024-06-06 13:57:01,562][14064] Avg episode reward: [(0, '0.197')] [2024-06-06 13:57:02,005][14296] Updated weights for policy 0, policy_version 23387 (0.0027) [2024-06-06 13:57:05,674][14296] Updated weights for policy 0, policy_version 23397 (0.0034) [2024-06-06 13:57:06,561][14064] Fps is (10 sec: 49151.5, 60 sec: 48059.7, 300 sec: 47763.5). Total num frames: 383385600. Throughput: 0: 47931.9. Samples: 236371000. Policy #0 lag: (min: 0.0, avg: 10.8, max: 21.0) [2024-06-06 13:57:06,562][14064] Avg episode reward: [(0, '0.195')] [2024-06-06 13:57:08,670][14296] Updated weights for policy 0, policy_version 23407 (0.0029) [2024-06-06 13:57:11,561][14064] Fps is (10 sec: 45875.2, 60 sec: 47786.7, 300 sec: 47541.4). Total num frames: 383598592. Throughput: 0: 47964.5. Samples: 236658800. Policy #0 lag: (min: 0.0, avg: 10.4, max: 21.0) [2024-06-06 13:57:11,562][14064] Avg episode reward: [(0, '0.193')] [2024-06-06 13:57:12,536][14296] Updated weights for policy 0, policy_version 23417 (0.0029) [2024-06-06 13:57:15,691][14296] Updated weights for policy 0, policy_version 23427 (0.0033) [2024-06-06 13:57:16,561][14064] Fps is (10 sec: 45875.5, 60 sec: 48059.7, 300 sec: 47652.5). Total num frames: 383844352. Throughput: 0: 47917.8. Samples: 236941780. 
Policy #0 lag: (min: 0.0, avg: 10.4, max: 21.0) [2024-06-06 13:57:16,562][14064] Avg episode reward: [(0, '0.194')] [2024-06-06 13:57:19,371][14296] Updated weights for policy 0, policy_version 23437 (0.0042) [2024-06-06 13:57:21,564][14064] Fps is (10 sec: 49139.1, 60 sec: 47784.6, 300 sec: 47763.1). Total num frames: 384090112. Throughput: 0: 47782.3. Samples: 237084620. Policy #0 lag: (min: 0.0, avg: 10.4, max: 21.0) [2024-06-06 13:57:21,565][14064] Avg episode reward: [(0, '0.196')] [2024-06-06 13:57:22,723][14296] Updated weights for policy 0, policy_version 23447 (0.0029) [2024-06-06 13:57:26,191][14296] Updated weights for policy 0, policy_version 23457 (0.0035) [2024-06-06 13:57:26,561][14064] Fps is (10 sec: 49152.1, 60 sec: 48061.4, 300 sec: 47763.5). Total num frames: 384335872. Throughput: 0: 47939.2. Samples: 237374880. Policy #0 lag: (min: 0.0, avg: 10.4, max: 21.0) [2024-06-06 13:57:26,562][14064] Avg episode reward: [(0, '0.194')] [2024-06-06 13:57:29,361][14296] Updated weights for policy 0, policy_version 23467 (0.0035) [2024-06-06 13:57:31,561][14064] Fps is (10 sec: 47526.0, 60 sec: 47786.7, 300 sec: 47708.0). Total num frames: 384565248. Throughput: 0: 47938.7. Samples: 237659220. Policy #0 lag: (min: 0.0, avg: 11.9, max: 23.0) [2024-06-06 13:57:31,562][14064] Avg episode reward: [(0, '0.192')] [2024-06-06 13:57:33,182][14296] Updated weights for policy 0, policy_version 23477 (0.0030) [2024-06-06 13:57:36,142][14296] Updated weights for policy 0, policy_version 23487 (0.0028) [2024-06-06 13:57:36,561][14064] Fps is (10 sec: 47513.7, 60 sec: 48061.9, 300 sec: 47763.5). Total num frames: 384811008. Throughput: 0: 47553.4. Samples: 237796120. Policy #0 lag: (min: 0.0, avg: 11.9, max: 23.0) [2024-06-06 13:57:36,562][14064] Avg episode reward: [(0, '0.198')] [2024-06-06 13:57:39,943][14296] Updated weights for policy 0, policy_version 23497 (0.0032) [2024-06-06 13:57:41,561][14064] Fps is (10 sec: 49151.6, 60 sec: 47786.7, 300 sec: 47763.5). 
Total num frames: 385056768. Throughput: 0: 47820.3. Samples: 238085400. Policy #0 lag: (min: 0.0, avg: 11.9, max: 23.0) [2024-06-06 13:57:41,562][14064] Avg episode reward: [(0, '0.202')] [2024-06-06 13:57:43,262][14296] Updated weights for policy 0, policy_version 23507 (0.0029) [2024-06-06 13:57:46,519][14276] Signal inference workers to stop experience collection... (3550 times) [2024-06-06 13:57:46,525][14276] Signal inference workers to resume experience collection... (3550 times) [2024-06-06 13:57:46,539][14296] InferenceWorker_p0-w0: stopping experience collection (3550 times) [2024-06-06 13:57:46,539][14296] InferenceWorker_p0-w0: resuming experience collection (3550 times) [2024-06-06 13:57:46,561][14064] Fps is (10 sec: 47513.7, 60 sec: 47513.7, 300 sec: 47652.5). Total num frames: 385286144. Throughput: 0: 47810.3. Samples: 238376140. Policy #0 lag: (min: 0.0, avg: 10.2, max: 21.0) [2024-06-06 13:57:46,562][14064] Avg episode reward: [(0, '0.196')] [2024-06-06 13:57:46,664][14296] Updated weights for policy 0, policy_version 23517 (0.0031) [2024-06-06 13:57:50,239][14296] Updated weights for policy 0, policy_version 23527 (0.0031) [2024-06-06 13:57:51,561][14064] Fps is (10 sec: 45875.7, 60 sec: 47786.8, 300 sec: 47708.0). Total num frames: 385515520. Throughput: 0: 47667.2. Samples: 238516020. Policy #0 lag: (min: 0.0, avg: 10.2, max: 21.0) [2024-06-06 13:57:51,562][14064] Avg episode reward: [(0, '0.207')] [2024-06-06 13:57:51,562][14276] Saving new best policy, reward=0.207! [2024-06-06 13:57:53,689][14296] Updated weights for policy 0, policy_version 23537 (0.0025) [2024-06-06 13:57:56,561][14064] Fps is (10 sec: 47513.7, 60 sec: 47786.7, 300 sec: 47708.0). Total num frames: 385761280. Throughput: 0: 47778.7. Samples: 238808840. 
Policy #0 lag: (min: 0.0, avg: 10.2, max: 21.0) [2024-06-06 13:57:56,561][14064] Avg episode reward: [(0, '0.200')] [2024-06-06 13:57:56,927][14296] Updated weights for policy 0, policy_version 23547 (0.0029) [2024-06-06 13:58:00,459][14296] Updated weights for policy 0, policy_version 23557 (0.0026) [2024-06-06 13:58:01,561][14064] Fps is (10 sec: 49152.0, 60 sec: 47786.7, 300 sec: 47763.6). Total num frames: 386007040. Throughput: 0: 47877.8. Samples: 239096280. Policy #0 lag: (min: 0.0, avg: 9.1, max: 21.0) [2024-06-06 13:58:01,562][14064] Avg episode reward: [(0, '0.200')] [2024-06-06 13:58:03,731][14296] Updated weights for policy 0, policy_version 23567 (0.0025) [2024-06-06 13:58:06,561][14064] Fps is (10 sec: 45874.9, 60 sec: 47240.6, 300 sec: 47596.9). Total num frames: 386220032. Throughput: 0: 47939.2. Samples: 239241760. Policy #0 lag: (min: 0.0, avg: 9.1, max: 21.0) [2024-06-06 13:58:06,562][14064] Avg episode reward: [(0, '0.200')] [2024-06-06 13:58:06,597][14276] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000023574_386236416.pth... [2024-06-06 13:58:06,637][14276] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000022877_374816768.pth [2024-06-06 13:58:07,402][14296] Updated weights for policy 0, policy_version 23577 (0.0043) [2024-06-06 13:58:10,443][14296] Updated weights for policy 0, policy_version 23587 (0.0028) [2024-06-06 13:58:11,561][14064] Fps is (10 sec: 49151.4, 60 sec: 48332.7, 300 sec: 47763.5). Total num frames: 386498560. Throughput: 0: 47774.6. Samples: 239524740. Policy #0 lag: (min: 0.0, avg: 9.1, max: 21.0) [2024-06-06 13:58:11,562][14064] Avg episode reward: [(0, '0.197')] [2024-06-06 13:58:14,112][14296] Updated weights for policy 0, policy_version 23597 (0.0028) [2024-06-06 13:58:16,561][14064] Fps is (10 sec: 50789.6, 60 sec: 48059.6, 300 sec: 47763.5). Total num frames: 386727936. Throughput: 0: 47878.0. Samples: 239813740. 
Policy #0 lag: (min: 0.0, avg: 9.9, max: 21.0) [2024-06-06 13:58:16,562][14064] Avg episode reward: [(0, '0.201')] [2024-06-06 13:58:17,408][14296] Updated weights for policy 0, policy_version 23607 (0.0025) [2024-06-06 13:58:20,899][14296] Updated weights for policy 0, policy_version 23617 (0.0041) [2024-06-06 13:58:21,561][14064] Fps is (10 sec: 45875.8, 60 sec: 47788.8, 300 sec: 47708.0). Total num frames: 386957312. Throughput: 0: 48184.0. Samples: 239964400. Policy #0 lag: (min: 0.0, avg: 9.9, max: 21.0) [2024-06-06 13:58:21,562][14064] Avg episode reward: [(0, '0.203')] [2024-06-06 13:58:24,124][14296] Updated weights for policy 0, policy_version 23627 (0.0021) [2024-06-06 13:58:26,561][14064] Fps is (10 sec: 45875.6, 60 sec: 47513.5, 300 sec: 47708.4). Total num frames: 387186688. Throughput: 0: 48065.8. Samples: 240248360. Policy #0 lag: (min: 0.0, avg: 9.9, max: 21.0) [2024-06-06 13:58:26,562][14064] Avg episode reward: [(0, '0.197')] [2024-06-06 13:58:27,880][14296] Updated weights for policy 0, policy_version 23637 (0.0028) [2024-06-06 13:58:31,059][14296] Updated weights for policy 0, policy_version 23647 (0.0028) [2024-06-06 13:58:31,561][14064] Fps is (10 sec: 49151.1, 60 sec: 48059.6, 300 sec: 47763.5). Total num frames: 387448832. Throughput: 0: 47862.0. Samples: 240529940. Policy #0 lag: (min: 0.0, avg: 11.3, max: 21.0) [2024-06-06 13:58:31,562][14064] Avg episode reward: [(0, '0.199')] [2024-06-06 13:58:34,786][14296] Updated weights for policy 0, policy_version 23657 (0.0021) [2024-06-06 13:58:36,561][14064] Fps is (10 sec: 50790.5, 60 sec: 48059.7, 300 sec: 47763.5). Total num frames: 387694592. Throughput: 0: 48079.9. Samples: 240679620. 
Policy #0 lag: (min: 0.0, avg: 11.3, max: 21.0) [2024-06-06 13:58:36,562][14064] Avg episode reward: [(0, '0.200')] [2024-06-06 13:58:37,695][14296] Updated weights for policy 0, policy_version 23667 (0.0027) [2024-06-06 13:58:41,509][14296] Updated weights for policy 0, policy_version 23677 (0.0027) [2024-06-06 13:58:41,561][14064] Fps is (10 sec: 47514.4, 60 sec: 47786.8, 300 sec: 47708.0). Total num frames: 387923968. Throughput: 0: 47911.1. Samples: 240964840. Policy #0 lag: (min: 0.0, avg: 11.3, max: 21.0) [2024-06-06 13:58:41,561][14064] Avg episode reward: [(0, '0.197')] [2024-06-06 13:58:44,758][14296] Updated weights for policy 0, policy_version 23687 (0.0026) [2024-06-06 13:58:46,561][14064] Fps is (10 sec: 45875.3, 60 sec: 47786.6, 300 sec: 47708.0). Total num frames: 388153344. Throughput: 0: 47939.5. Samples: 241253560. Policy #0 lag: (min: 0.0, avg: 11.3, max: 21.0) [2024-06-06 13:58:46,562][14064] Avg episode reward: [(0, '0.201')] [2024-06-06 13:58:48,313][14296] Updated weights for policy 0, policy_version 23697 (0.0031) [2024-06-06 13:58:51,548][14296] Updated weights for policy 0, policy_version 23707 (0.0038) [2024-06-06 13:58:51,561][14064] Fps is (10 sec: 49151.6, 60 sec: 48332.7, 300 sec: 47819.0). Total num frames: 388415488. Throughput: 0: 47899.5. Samples: 241397240. Policy #0 lag: (min: 0.0, avg: 9.8, max: 21.0) [2024-06-06 13:58:51,562][14064] Avg episode reward: [(0, '0.196')] [2024-06-06 13:58:55,255][14296] Updated weights for policy 0, policy_version 23717 (0.0037) [2024-06-06 13:58:56,561][14064] Fps is (10 sec: 49152.4, 60 sec: 48059.7, 300 sec: 47708.0). Total num frames: 388644864. Throughput: 0: 48038.8. Samples: 241686480. Policy #0 lag: (min: 0.0, avg: 9.8, max: 21.0) [2024-06-06 13:58:56,561][14064] Avg episode reward: [(0, '0.206')] [2024-06-06 13:58:58,414][14296] Updated weights for policy 0, policy_version 23727 (0.0032) [2024-06-06 13:59:01,561][14064] Fps is (10 sec: 45875.3, 60 sec: 47786.6, 300 sec: 47652.5). 
Total num frames: 388874240. Throughput: 0: 48029.5. Samples: 241975060. Policy #0 lag: (min: 0.0, avg: 9.8, max: 21.0) [2024-06-06 13:59:01,562][14064] Avg episode reward: [(0, '0.209')] [2024-06-06 13:59:01,562][14276] Saving new best policy, reward=0.209! [2024-06-06 13:59:02,065][14296] Updated weights for policy 0, policy_version 23737 (0.0031) [2024-06-06 13:59:05,185][14296] Updated weights for policy 0, policy_version 23747 (0.0027) [2024-06-06 13:59:06,561][14064] Fps is (10 sec: 47512.9, 60 sec: 48332.7, 300 sec: 47763.5). Total num frames: 389120000. Throughput: 0: 47708.7. Samples: 242111300. Policy #0 lag: (min: 0.0, avg: 11.1, max: 22.0) [2024-06-06 13:59:06,562][14064] Avg episode reward: [(0, '0.186')] [2024-06-06 13:59:08,879][14296] Updated weights for policy 0, policy_version 23757 (0.0032) [2024-06-06 13:59:11,561][14064] Fps is (10 sec: 49151.7, 60 sec: 47786.7, 300 sec: 47763.5). Total num frames: 389365760. Throughput: 0: 47645.8. Samples: 242392420. Policy #0 lag: (min: 0.0, avg: 11.1, max: 22.0) [2024-06-06 13:59:11,562][14064] Avg episode reward: [(0, '0.198')] [2024-06-06 13:59:12,159][14296] Updated weights for policy 0, policy_version 23767 (0.0029) [2024-06-06 13:59:15,632][14296] Updated weights for policy 0, policy_version 23777 (0.0037) [2024-06-06 13:59:16,561][14064] Fps is (10 sec: 47514.6, 60 sec: 47786.9, 300 sec: 47764.0). Total num frames: 389595136. Throughput: 0: 47986.0. Samples: 242689300. Policy #0 lag: (min: 0.0, avg: 11.1, max: 22.0) [2024-06-06 13:59:16,561][14064] Avg episode reward: [(0, '0.204')] [2024-06-06 13:59:18,751][14296] Updated weights for policy 0, policy_version 23787 (0.0037) [2024-06-06 13:59:21,561][14064] Fps is (10 sec: 47514.1, 60 sec: 48059.7, 300 sec: 47819.1). Total num frames: 389840896. Throughput: 0: 47860.6. Samples: 242833340. 
Policy #0 lag: (min: 0.0, avg: 10.3, max: 21.0) [2024-06-06 13:59:21,562][14064] Avg episode reward: [(0, '0.207')] [2024-06-06 13:59:22,695][14296] Updated weights for policy 0, policy_version 23797 (0.0024) [2024-06-06 13:59:25,541][14296] Updated weights for policy 0, policy_version 23807 (0.0029) [2024-06-06 13:59:26,561][14064] Fps is (10 sec: 49151.4, 60 sec: 48332.8, 300 sec: 47874.6). Total num frames: 390086656. Throughput: 0: 47900.4. Samples: 243120360. Policy #0 lag: (min: 0.0, avg: 10.3, max: 21.0) [2024-06-06 13:59:26,562][14064] Avg episode reward: [(0, '0.196')] [2024-06-06 13:59:29,418][14296] Updated weights for policy 0, policy_version 23817 (0.0034) [2024-06-06 13:59:29,431][14276] Signal inference workers to stop experience collection... (3600 times) [2024-06-06 13:59:29,431][14276] Signal inference workers to resume experience collection... (3600 times) [2024-06-06 13:59:29,479][14296] InferenceWorker_p0-w0: stopping experience collection (3600 times) [2024-06-06 13:59:29,479][14296] InferenceWorker_p0-w0: resuming experience collection (3600 times) [2024-06-06 13:59:31,561][14064] Fps is (10 sec: 47513.5, 60 sec: 47786.8, 300 sec: 47819.1). Total num frames: 390316032. Throughput: 0: 47872.5. Samples: 243407820. Policy #0 lag: (min: 0.0, avg: 10.3, max: 21.0) [2024-06-06 13:59:31,562][14064] Avg episode reward: [(0, '0.195')] [2024-06-06 13:59:32,481][14296] Updated weights for policy 0, policy_version 23827 (0.0033) [2024-06-06 13:59:36,180][14296] Updated weights for policy 0, policy_version 23837 (0.0025) [2024-06-06 13:59:36,561][14064] Fps is (10 sec: 47513.7, 60 sec: 47786.7, 300 sec: 47819.5). Total num frames: 390561792. Throughput: 0: 47895.6. Samples: 243552540. 
Policy #0 lag: (min: 0.0, avg: 9.3, max: 21.0) [2024-06-06 13:59:36,562][14064] Avg episode reward: [(0, '0.204')] [2024-06-06 13:59:39,442][14296] Updated weights for policy 0, policy_version 23847 (0.0028) [2024-06-06 13:59:41,561][14064] Fps is (10 sec: 47513.1, 60 sec: 47786.6, 300 sec: 47874.6). Total num frames: 390791168. Throughput: 0: 47807.9. Samples: 243837840. Policy #0 lag: (min: 0.0, avg: 9.3, max: 21.0) [2024-06-06 13:59:41,567][14064] Avg episode reward: [(0, '0.194')] [2024-06-06 13:59:43,131][14296] Updated weights for policy 0, policy_version 23857 (0.0026) [2024-06-06 13:59:46,002][14296] Updated weights for policy 0, policy_version 23867 (0.0029) [2024-06-06 13:59:46,561][14064] Fps is (10 sec: 47513.9, 60 sec: 48059.8, 300 sec: 47874.6). Total num frames: 391036928. Throughput: 0: 47762.7. Samples: 244124380. Policy #0 lag: (min: 0.0, avg: 9.3, max: 21.0) [2024-06-06 13:59:46,562][14064] Avg episode reward: [(0, '0.207')] [2024-06-06 13:59:49,999][14296] Updated weights for policy 0, policy_version 23877 (0.0035) [2024-06-06 13:59:51,561][14064] Fps is (10 sec: 50790.6, 60 sec: 48059.7, 300 sec: 47930.2). Total num frames: 391299072. Throughput: 0: 48005.4. Samples: 244271540. Policy #0 lag: (min: 0.0, avg: 9.3, max: 21.0) [2024-06-06 13:59:51,562][14064] Avg episode reward: [(0, '0.203')] [2024-06-06 13:59:52,734][14296] Updated weights for policy 0, policy_version 23887 (0.0030) [2024-06-06 13:59:56,561][14064] Fps is (10 sec: 47513.1, 60 sec: 47786.6, 300 sec: 47819.1). Total num frames: 391512064. Throughput: 0: 48053.8. Samples: 244554840. Policy #0 lag: (min: 0.0, avg: 11.2, max: 25.0) [2024-06-06 13:59:56,562][14064] Avg episode reward: [(0, '0.194')] [2024-06-06 13:59:56,814][14296] Updated weights for policy 0, policy_version 23897 (0.0028) [2024-06-06 13:59:59,741][14296] Updated weights for policy 0, policy_version 23907 (0.0027) [2024-06-06 14:00:01,561][14064] Fps is (10 sec: 42598.7, 60 sec: 47513.6, 300 sec: 47763.5). 
Total num frames: 391725056. Throughput: 0: 47870.1. Samples: 244843460. Policy #0 lag: (min: 0.0, avg: 11.2, max: 25.0)
[2024-06-06 14:00:01,562][14064] Avg episode reward: [(0, '0.202')]
[2024-06-06 14:00:03,532][14296] Updated weights for policy 0, policy_version 23917 (0.0033)
[2024-06-06 14:00:06,561][14064] Fps is (10 sec: 49151.6, 60 sec: 48059.7, 300 sec: 47874.6). Total num frames: 392003584. Throughput: 0: 47817.2. Samples: 244985120. Policy #0 lag: (min: 0.0, avg: 11.2, max: 25.0)
[2024-06-06 14:00:06,562][14064] Avg episode reward: [(0, '0.208')]
[2024-06-06 14:00:06,566][14276] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000023926_392003584.pth...
[2024-06-06 14:00:06,611][14276] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000023224_380502016.pth
[2024-06-06 14:00:06,826][14296] Updated weights for policy 0, policy_version 23927 (0.0024)
[2024-06-06 14:00:10,602][14296] Updated weights for policy 0, policy_version 23937 (0.0026)
[2024-06-06 14:00:11,564][14064] Fps is (10 sec: 49138.9, 60 sec: 47511.5, 300 sec: 47818.6). Total num frames: 392216576. Throughput: 0: 47739.0. Samples: 245268740. Policy #0 lag: (min: 0.0, avg: 9.5, max: 21.0)
[2024-06-06 14:00:11,565][14064] Avg episode reward: [(0, '0.199')]
[2024-06-06 14:00:13,531][14296] Updated weights for policy 0, policy_version 23947 (0.0031)
[2024-06-06 14:00:16,561][14064] Fps is (10 sec: 45874.8, 60 sec: 47786.4, 300 sec: 47874.6). Total num frames: 392462336. Throughput: 0: 47682.4. Samples: 245553540. Policy #0 lag: (min: 0.0, avg: 9.5, max: 21.0)
[2024-06-06 14:00:16,562][14064] Avg episode reward: [(0, '0.203')]
[2024-06-06 14:00:17,647][14296] Updated weights for policy 0, policy_version 23957 (0.0027)
[2024-06-06 14:00:20,282][14296] Updated weights for policy 0, policy_version 23967 (0.0036)
[2024-06-06 14:00:21,561][14064] Fps is (10 sec: 49164.5, 60 sec: 47786.6, 300 sec: 47874.6). Total num frames: 392708096. Throughput: 0: 47650.1. Samples: 245696800. Policy #0 lag: (min: 0.0, avg: 9.5, max: 21.0)
[2024-06-06 14:00:21,562][14064] Avg episode reward: [(0, '0.205')]
[2024-06-06 14:00:24,362][14296] Updated weights for policy 0, policy_version 23977 (0.0029)
[2024-06-06 14:00:26,561][14064] Fps is (10 sec: 49152.0, 60 sec: 47786.5, 300 sec: 47874.6). Total num frames: 392953856. Throughput: 0: 47567.4. Samples: 245978380. Policy #0 lag: (min: 0.0, avg: 9.7, max: 21.0)
[2024-06-06 14:00:26,562][14064] Avg episode reward: [(0, '0.197')]
[2024-06-06 14:00:27,519][14296] Updated weights for policy 0, policy_version 23987 (0.0034)
[2024-06-06 14:00:31,111][14296] Updated weights for policy 0, policy_version 23997 (0.0026)
[2024-06-06 14:00:31,561][14064] Fps is (10 sec: 45875.9, 60 sec: 47513.6, 300 sec: 47763.6). Total num frames: 393166848. Throughput: 0: 47558.7. Samples: 246264520. Policy #0 lag: (min: 0.0, avg: 9.7, max: 21.0)
[2024-06-06 14:00:31,562][14064] Avg episode reward: [(0, '0.205')]
[2024-06-06 14:00:34,515][14296] Updated weights for policy 0, policy_version 24007 (0.0021)
[2024-06-06 14:00:36,561][14064] Fps is (10 sec: 44237.7, 60 sec: 47240.5, 300 sec: 47819.1). Total num frames: 393396224. Throughput: 0: 47385.8. Samples: 246403900. Policy #0 lag: (min: 0.0, avg: 9.7, max: 21.0)
[2024-06-06 14:00:36,562][14064] Avg episode reward: [(0, '0.200')]
[2024-06-06 14:00:38,260][14296] Updated weights for policy 0, policy_version 24017 (0.0036)
[2024-06-06 14:00:41,524][14296] Updated weights for policy 0, policy_version 24027 (0.0038)
[2024-06-06 14:00:41,561][14064] Fps is (10 sec: 49151.6, 60 sec: 47786.7, 300 sec: 47874.6). Total num frames: 393658368. Throughput: 0: 47225.8. Samples: 246680000. Policy #0 lag: (min: 1.0, avg: 12.1, max: 21.0)
[2024-06-06 14:00:41,562][14064] Avg episode reward: [(0, '0.199')]
[2024-06-06 14:00:42,759][14276] Signal inference workers to stop experience collection... (3650 times)
[2024-06-06 14:00:42,759][14276] Signal inference workers to resume experience collection... (3650 times)
[2024-06-06 14:00:42,786][14296] InferenceWorker_p0-w0: stopping experience collection (3650 times)
[2024-06-06 14:00:42,786][14296] InferenceWorker_p0-w0: resuming experience collection (3650 times)
[2024-06-06 14:00:45,312][14296] Updated weights for policy 0, policy_version 24037 (0.0031)
[2024-06-06 14:00:46,561][14064] Fps is (10 sec: 50790.3, 60 sec: 47786.6, 300 sec: 47874.6). Total num frames: 393904128. Throughput: 0: 47341.3. Samples: 246973820. Policy #0 lag: (min: 1.0, avg: 12.1, max: 21.0)
[2024-06-06 14:00:46,562][14064] Avg episode reward: [(0, '0.202')]
[2024-06-06 14:00:48,485][14296] Updated weights for policy 0, policy_version 24047 (0.0035)
[2024-06-06 14:00:51,561][14064] Fps is (10 sec: 45875.5, 60 sec: 46967.5, 300 sec: 47763.5). Total num frames: 394117120. Throughput: 0: 47582.4. Samples: 247126320. Policy #0 lag: (min: 1.0, avg: 12.1, max: 21.0)
[2024-06-06 14:00:51,562][14064] Avg episode reward: [(0, '0.209')]
[2024-06-06 14:00:52,043][14296] Updated weights for policy 0, policy_version 24057 (0.0024)
[2024-06-06 14:00:55,554][14296] Updated weights for policy 0, policy_version 24067 (0.0040)
[2024-06-06 14:00:56,564][14064] Fps is (10 sec: 44225.3, 60 sec: 47238.5, 300 sec: 47707.7). Total num frames: 394346496. Throughput: 0: 47433.8. Samples: 247403260. Policy #0 lag: (min: 1.0, avg: 9.9, max: 21.0)
[2024-06-06 14:00:56,565][14064] Avg episode reward: [(0, '0.200')]
[2024-06-06 14:00:58,864][14296] Updated weights for policy 0, policy_version 24077 (0.0036)
[2024-06-06 14:01:01,564][14064] Fps is (10 sec: 49138.8, 60 sec: 48057.6, 300 sec: 47818.6). Total num frames: 394608640. Throughput: 0: 47227.6. Samples: 247678900. Policy #0 lag: (min: 1.0, avg: 9.9, max: 21.0)
[2024-06-06 14:01:01,565][14064] Avg episode reward: [(0, '0.198')]
[2024-06-06 14:01:02,311][14296] Updated weights for policy 0, policy_version 24087 (0.0030)
[2024-06-06 14:01:05,847][14296] Updated weights for policy 0, policy_version 24097 (0.0031)
[2024-06-06 14:01:06,561][14064] Fps is (10 sec: 49164.3, 60 sec: 47240.5, 300 sec: 47819.0). Total num frames: 394838016. Throughput: 0: 47480.9. Samples: 247833440. Policy #0 lag: (min: 1.0, avg: 9.9, max: 21.0)
[2024-06-06 14:01:06,562][14064] Avg episode reward: [(0, '0.203')]
[2024-06-06 14:01:09,235][14296] Updated weights for policy 0, policy_version 24107 (0.0034)
[2024-06-06 14:01:11,561][14064] Fps is (10 sec: 45887.4, 60 sec: 47515.7, 300 sec: 47819.1). Total num frames: 395067392. Throughput: 0: 47517.1. Samples: 248116640. Policy #0 lag: (min: 1.0, avg: 9.9, max: 21.0)
[2024-06-06 14:01:11,562][14064] Avg episode reward: [(0, '0.198')]
[2024-06-06 14:01:12,873][14296] Updated weights for policy 0, policy_version 24117 (0.0024)
[2024-06-06 14:01:16,262][14296] Updated weights for policy 0, policy_version 24127 (0.0036)
[2024-06-06 14:01:16,561][14064] Fps is (10 sec: 45875.8, 60 sec: 47240.7, 300 sec: 47708.0). Total num frames: 395296768. Throughput: 0: 47389.3. Samples: 248397040. Policy #0 lag: (min: 0.0, avg: 9.9, max: 21.0)
[2024-06-06 14:01:16,562][14064] Avg episode reward: [(0, '0.205')]
[2024-06-06 14:01:19,681][14296] Updated weights for policy 0, policy_version 24137 (0.0031)
[2024-06-06 14:01:21,561][14064] Fps is (10 sec: 47513.8, 60 sec: 47240.6, 300 sec: 47763.9). Total num frames: 395542528. Throughput: 0: 47445.4. Samples: 248538940. Policy #0 lag: (min: 0.0, avg: 9.9, max: 21.0)
[2024-06-06 14:01:21,561][14064] Avg episode reward: [(0, '0.205')]
[2024-06-06 14:01:23,049][14296] Updated weights for policy 0, policy_version 24147 (0.0035)
[2024-06-06 14:01:26,470][14296] Updated weights for policy 0, policy_version 24157 (0.0025)
[2024-06-06 14:01:26,561][14064] Fps is (10 sec: 49152.1, 60 sec: 47240.7, 300 sec: 47763.5). Total num frames: 395788288. Throughput: 0: 47612.9. Samples: 248822580. Policy #0 lag: (min: 0.0, avg: 9.9, max: 21.0)
[2024-06-06 14:01:26,561][14064] Avg episode reward: [(0, '0.204')]
[2024-06-06 14:01:29,787][14296] Updated weights for policy 0, policy_version 24167 (0.0025)
[2024-06-06 14:01:31,561][14064] Fps is (10 sec: 44236.7, 60 sec: 46967.4, 300 sec: 47652.9). Total num frames: 395984896. Throughput: 0: 47435.6. Samples: 249108420. Policy #0 lag: (min: 0.0, avg: 11.4, max: 22.0)
[2024-06-06 14:01:31,562][14064] Avg episode reward: [(0, '0.200')]
[2024-06-06 14:01:33,581][14296] Updated weights for policy 0, policy_version 24177 (0.0028)
[2024-06-06 14:01:36,561][14064] Fps is (10 sec: 47513.1, 60 sec: 47786.6, 300 sec: 47708.0). Total num frames: 396263424. Throughput: 0: 47115.0. Samples: 249246500. Policy #0 lag: (min: 0.0, avg: 11.4, max: 22.0)
[2024-06-06 14:01:36,562][14064] Avg episode reward: [(0, '0.202')]
[2024-06-06 14:01:36,932][14296] Updated weights for policy 0, policy_version 24187 (0.0028)
[2024-06-06 14:01:40,663][14296] Updated weights for policy 0, policy_version 24197 (0.0032)
[2024-06-06 14:01:41,561][14064] Fps is (10 sec: 50790.2, 60 sec: 47240.5, 300 sec: 47652.4). Total num frames: 396492800. Throughput: 0: 47389.9. Samples: 249535680. Policy #0 lag: (min: 0.0, avg: 11.4, max: 22.0)
[2024-06-06 14:01:41,562][14064] Avg episode reward: [(0, '0.199')]
[2024-06-06 14:01:44,050][14296] Updated weights for policy 0, policy_version 24207 (0.0033)
[2024-06-06 14:01:46,561][14064] Fps is (10 sec: 45875.8, 60 sec: 46967.5, 300 sec: 47708.0). Total num frames: 396722176. Throughput: 0: 47499.3. Samples: 249816240. Policy #0 lag: (min: 0.0, avg: 10.0, max: 22.0)
[2024-06-06 14:01:46,561][14064] Avg episode reward: [(0, '0.204')]
[2024-06-06 14:01:47,383][14296] Updated weights for policy 0, policy_version 24217 (0.0031)
[2024-06-06 14:01:50,715][14296] Updated weights for policy 0, policy_version 24227 (0.0040)
[2024-06-06 14:01:51,561][14064] Fps is (10 sec: 44237.0, 60 sec: 46967.4, 300 sec: 47596.9). Total num frames: 396935168. Throughput: 0: 47110.8. Samples: 249953420. Policy #0 lag: (min: 0.0, avg: 10.0, max: 22.0)
[2024-06-06 14:01:51,562][14064] Avg episode reward: [(0, '0.199')]
[2024-06-06 14:01:54,151][14296] Updated weights for policy 0, policy_version 24237 (0.0029)
[2024-06-06 14:01:56,561][14064] Fps is (10 sec: 50789.8, 60 sec: 48061.8, 300 sec: 47763.5). Total num frames: 397230080. Throughput: 0: 47271.0. Samples: 250243840. Policy #0 lag: (min: 0.0, avg: 10.0, max: 22.0)
[2024-06-06 14:01:56,562][14064] Avg episode reward: [(0, '0.199')]
[2024-06-06 14:01:57,426][14296] Updated weights for policy 0, policy_version 24247 (0.0031)
[2024-06-06 14:02:01,247][14296] Updated weights for policy 0, policy_version 24257 (0.0040)
[2024-06-06 14:02:01,561][14064] Fps is (10 sec: 50790.1, 60 sec: 47242.6, 300 sec: 47652.5). Total num frames: 397443072. Throughput: 0: 47283.1. Samples: 250524780. Policy #0 lag: (min: 0.0, avg: 10.0, max: 22.0)
[2024-06-06 14:02:01,570][14064] Avg episode reward: [(0, '0.204')]
[2024-06-06 14:02:04,514][14296] Updated weights for policy 0, policy_version 24267 (0.0026)
[2024-06-06 14:02:06,561][14064] Fps is (10 sec: 44237.2, 60 sec: 47240.6, 300 sec: 47708.0). Total num frames: 397672448. Throughput: 0: 47392.0. Samples: 250671580. Policy #0 lag: (min: 0.0, avg: 10.6, max: 21.0)
[2024-06-06 14:02:06,562][14064] Avg episode reward: [(0, '0.205')]
[2024-06-06 14:02:06,693][14276] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000024273_397688832.pth...
[2024-06-06 14:02:06,745][14276] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000023574_386236416.pth
[2024-06-06 14:02:08,233][14296] Updated weights for policy 0, policy_version 24277 (0.0030)
[2024-06-06 14:02:11,561][14064] Fps is (10 sec: 45875.5, 60 sec: 47240.6, 300 sec: 47652.5). Total num frames: 397901824. Throughput: 0: 47366.7. Samples: 250954080. Policy #0 lag: (min: 0.0, avg: 10.6, max: 21.0)
[2024-06-06 14:02:11,562][14064] Avg episode reward: [(0, '0.204')]
[2024-06-06 14:02:11,613][14296] Updated weights for policy 0, policy_version 24287 (0.0024)
[2024-06-06 14:02:15,134][14276] Signal inference workers to stop experience collection... (3700 times)
[2024-06-06 14:02:15,180][14296] InferenceWorker_p0-w0: stopping experience collection (3700 times)
[2024-06-06 14:02:15,180][14276] Signal inference workers to resume experience collection... (3700 times)
[2024-06-06 14:02:15,182][14296] Updated weights for policy 0, policy_version 24297 (0.0040)
[2024-06-06 14:02:15,198][14296] InferenceWorker_p0-w0: resuming experience collection (3700 times)
[2024-06-06 14:02:16,562][14064] Fps is (10 sec: 49147.5, 60 sec: 47785.9, 300 sec: 47708.3). Total num frames: 398163968. Throughput: 0: 47367.5. Samples: 251240000. Policy #0 lag: (min: 0.0, avg: 10.6, max: 21.0)
[2024-06-06 14:02:16,563][14064] Avg episode reward: [(0, '0.203')]
[2024-06-06 14:02:18,444][14296] Updated weights for policy 0, policy_version 24307 (0.0026)
[2024-06-06 14:02:21,561][14064] Fps is (10 sec: 47514.0, 60 sec: 47240.6, 300 sec: 47596.9). Total num frames: 398376960. Throughput: 0: 47642.4. Samples: 251390400. Policy #0 lag: (min: 0.0, avg: 10.5, max: 22.0)
[2024-06-06 14:02:21,562][14064] Avg episode reward: [(0, '0.205')]
[2024-06-06 14:02:21,843][14296] Updated weights for policy 0, policy_version 24317 (0.0027)
[2024-06-06 14:02:25,398][14296] Updated weights for policy 0, policy_version 24327 (0.0032)
[2024-06-06 14:02:26,561][14064] Fps is (10 sec: 44240.7, 60 sec: 46967.4, 300 sec: 47596.9). Total num frames: 398606336. Throughput: 0: 47465.3. Samples: 251671620. Policy #0 lag: (min: 0.0, avg: 10.5, max: 22.0)
[2024-06-06 14:02:26,562][14064] Avg episode reward: [(0, '0.200')]
[2024-06-06 14:02:28,687][14296] Updated weights for policy 0, policy_version 24337 (0.0026)
[2024-06-06 14:02:31,561][14064] Fps is (10 sec: 50789.6, 60 sec: 48332.8, 300 sec: 47708.0). Total num frames: 398884864. Throughput: 0: 47611.5. Samples: 251958760. Policy #0 lag: (min: 0.0, avg: 10.5, max: 22.0)
[2024-06-06 14:02:31,562][14064] Avg episode reward: [(0, '0.193')]
[2024-06-06 14:02:32,281][14296] Updated weights for policy 0, policy_version 24347 (0.0029)
[2024-06-06 14:02:35,502][14296] Updated weights for policy 0, policy_version 24357 (0.0021)
[2024-06-06 14:02:36,561][14064] Fps is (10 sec: 50790.7, 60 sec: 47513.7, 300 sec: 47652.5). Total num frames: 399114240. Throughput: 0: 47928.9. Samples: 252110220. Policy #0 lag: (min: 0.0, avg: 9.9, max: 21.0)
[2024-06-06 14:02:36,562][14064] Avg episode reward: [(0, '0.201')]
[2024-06-06 14:02:38,951][14296] Updated weights for policy 0, policy_version 24367 (0.0037)
[2024-06-06 14:02:41,561][14064] Fps is (10 sec: 47513.1, 60 sec: 47786.6, 300 sec: 47708.0). Total num frames: 399360000. Throughput: 0: 47977.7. Samples: 252402840. Policy #0 lag: (min: 0.0, avg: 9.9, max: 21.0)
[2024-06-06 14:02:41,562][14064] Avg episode reward: [(0, '0.205')]
[2024-06-06 14:02:42,245][14296] Updated weights for policy 0, policy_version 24377 (0.0023)
[2024-06-06 14:02:45,639][14296] Updated weights for policy 0, policy_version 24387 (0.0028)
[2024-06-06 14:02:46,561][14064] Fps is (10 sec: 45874.6, 60 sec: 47513.5, 300 sec: 47652.4). Total num frames: 399572992. Throughput: 0: 48164.4. Samples: 252692180. Policy #0 lag: (min: 0.0, avg: 9.9, max: 21.0)
[2024-06-06 14:02:46,562][14064] Avg episode reward: [(0, '0.203')]
[2024-06-06 14:02:48,874][14296] Updated weights for policy 0, policy_version 24397 (0.0020)
[2024-06-06 14:02:51,561][14064] Fps is (10 sec: 47514.4, 60 sec: 48332.8, 300 sec: 47708.0). Total num frames: 399835136. Throughput: 0: 47960.9. Samples: 252829820. Policy #0 lag: (min: 0.0, avg: 12.2, max: 24.0)
[2024-06-06 14:02:51,562][14064] Avg episode reward: [(0, '0.207')]
[2024-06-06 14:02:52,552][14296] Updated weights for policy 0, policy_version 24407 (0.0033)
[2024-06-06 14:02:55,794][14296] Updated weights for policy 0, policy_version 24417 (0.0034)
[2024-06-06 14:02:56,561][14064] Fps is (10 sec: 49152.1, 60 sec: 47240.5, 300 sec: 47652.4). Total num frames: 400064512. Throughput: 0: 48138.5. Samples: 253120320. Policy #0 lag: (min: 0.0, avg: 12.2, max: 24.0)
[2024-06-06 14:02:56,562][14064] Avg episode reward: [(0, '0.202')]
[2024-06-06 14:02:59,214][14296] Updated weights for policy 0, policy_version 24427 (0.0029)
[2024-06-06 14:03:01,561][14064] Fps is (10 sec: 47513.6, 60 sec: 47786.7, 300 sec: 47763.5). Total num frames: 400310272. Throughput: 0: 48173.5. Samples: 253407760. Policy #0 lag: (min: 0.0, avg: 12.2, max: 24.0)
[2024-06-06 14:03:01,562][14064] Avg episode reward: [(0, '0.204')]
[2024-06-06 14:03:02,679][14296] Updated weights for policy 0, policy_version 24437 (0.0030)
[2024-06-06 14:03:05,950][14296] Updated weights for policy 0, policy_version 24447 (0.0034)
[2024-06-06 14:03:06,561][14064] Fps is (10 sec: 49152.2, 60 sec: 48059.7, 300 sec: 47652.4). Total num frames: 400556032. Throughput: 0: 47977.6. Samples: 253549400. Policy #0 lag: (min: 0.0, avg: 12.2, max: 24.0)
[2024-06-06 14:03:06,562][14064] Avg episode reward: [(0, '0.200')]
[2024-06-06 14:03:09,322][14296] Updated weights for policy 0, policy_version 24457 (0.0035)
[2024-06-06 14:03:11,561][14064] Fps is (10 sec: 49151.6, 60 sec: 48332.7, 300 sec: 47708.0). Total num frames: 400801792. Throughput: 0: 48141.3. Samples: 253837980. Policy #0 lag: (min: 1.0, avg: 9.9, max: 22.0)
[2024-06-06 14:03:11,562][14064] Avg episode reward: [(0, '0.203')]
[2024-06-06 14:03:12,876][14296] Updated weights for policy 0, policy_version 24467 (0.0026)
[2024-06-06 14:03:16,074][14296] Updated weights for policy 0, policy_version 24477 (0.0029)
[2024-06-06 14:03:16,561][14064] Fps is (10 sec: 47513.2, 60 sec: 47787.3, 300 sec: 47708.0). Total num frames: 401031168. Throughput: 0: 48206.1. Samples: 254128040. Policy #0 lag: (min: 1.0, avg: 9.9, max: 22.0)
[2024-06-06 14:03:16,562][14064] Avg episode reward: [(0, '0.204')]
[2024-06-06 14:03:19,750][14296] Updated weights for policy 0, policy_version 24487 (0.0029)
[2024-06-06 14:03:21,561][14064] Fps is (10 sec: 47513.7, 60 sec: 48332.7, 300 sec: 47763.5). Total num frames: 401276928. Throughput: 0: 48061.7. Samples: 254273000. Policy #0 lag: (min: 1.0, avg: 9.9, max: 22.0)
[2024-06-06 14:03:21,562][14064] Avg episode reward: [(0, '0.194')]
[2024-06-06 14:03:23,050][14296] Updated weights for policy 0, policy_version 24497 (0.0022)
[2024-06-06 14:03:23,735][14276] Signal inference workers to stop experience collection... (3750 times)
[2024-06-06 14:03:23,783][14296] InferenceWorker_p0-w0: stopping experience collection (3750 times)
[2024-06-06 14:03:23,790][14276] Signal inference workers to resume experience collection... (3750 times)
[2024-06-06 14:03:23,791][14296] InferenceWorker_p0-w0: resuming experience collection (3750 times)
[2024-06-06 14:03:26,405][14296] Updated weights for policy 0, policy_version 24507 (0.0021)
[2024-06-06 14:03:26,561][14064] Fps is (10 sec: 49152.5, 60 sec: 48605.9, 300 sec: 47708.0). Total num frames: 401522688. Throughput: 0: 47977.0. Samples: 254561800. Policy #0 lag: (min: 1.0, avg: 11.3, max: 22.0)
[2024-06-06 14:03:26,562][14064] Avg episode reward: [(0, '0.202')]
[2024-06-06 14:03:29,961][14296] Updated weights for policy 0, policy_version 24517 (0.0030)
[2024-06-06 14:03:31,561][14064] Fps is (10 sec: 49151.8, 60 sec: 48059.7, 300 sec: 47708.0). Total num frames: 401768448. Throughput: 0: 47747.6. Samples: 254840820. Policy #0 lag: (min: 1.0, avg: 11.3, max: 22.0)
[2024-06-06 14:03:31,562][14064] Avg episode reward: [(0, '0.206')]
[2024-06-06 14:03:33,363][14296] Updated weights for policy 0, policy_version 24527 (0.0030)
[2024-06-06 14:03:36,561][14064] Fps is (10 sec: 47513.7, 60 sec: 48059.7, 300 sec: 47708.0). Total num frames: 401997824. Throughput: 0: 48103.1. Samples: 254994460. Policy #0 lag: (min: 1.0, avg: 11.3, max: 22.0)
[2024-06-06 14:03:36,562][14064] Avg episode reward: [(0, '0.203')]
[2024-06-06 14:03:36,639][14296] Updated weights for policy 0, policy_version 24537 (0.0022)
[2024-06-06 14:03:40,230][14296] Updated weights for policy 0, policy_version 24547 (0.0026)
[2024-06-06 14:03:41,561][14064] Fps is (10 sec: 45875.6, 60 sec: 47786.8, 300 sec: 47708.0). Total num frames: 402227200. Throughput: 0: 47984.2. Samples: 255279600. Policy #0 lag: (min: 0.0, avg: 10.4, max: 23.0)
[2024-06-06 14:03:41,562][14064] Avg episode reward: [(0, '0.202')]
[2024-06-06 14:03:43,674][14296] Updated weights for policy 0, policy_version 24557 (0.0022)
[2024-06-06 14:03:46,561][14064] Fps is (10 sec: 47513.9, 60 sec: 48332.9, 300 sec: 47652.5). Total num frames: 402472960. Throughput: 0: 47933.8. Samples: 255564780. Policy #0 lag: (min: 0.0, avg: 10.4, max: 23.0)
[2024-06-06 14:03:46,562][14064] Avg episode reward: [(0, '0.200')]
[2024-06-06 14:03:47,037][14296] Updated weights for policy 0, policy_version 24567 (0.0031)
[2024-06-06 14:03:50,682][14296] Updated weights for policy 0, policy_version 24577 (0.0026)
[2024-06-06 14:03:51,561][14064] Fps is (10 sec: 50790.1, 60 sec: 48332.8, 300 sec: 47763.5). Total num frames: 402735104. Throughput: 0: 48102.7. Samples: 255714020. Policy #0 lag: (min: 0.0, avg: 10.4, max: 23.0)
[2024-06-06 14:03:51,562][14064] Avg episode reward: [(0, '0.202')]
[2024-06-06 14:03:53,663][14296] Updated weights for policy 0, policy_version 24587 (0.0028)
[2024-06-06 14:03:56,561][14064] Fps is (10 sec: 45875.5, 60 sec: 47786.8, 300 sec: 47652.5). Total num frames: 402931712. Throughput: 0: 47908.6. Samples: 255993860. Policy #0 lag: (min: 0.0, avg: 10.4, max: 23.0)
[2024-06-06 14:03:56,561][14064] Avg episode reward: [(0, '0.199')]
[2024-06-06 14:03:57,442][14296] Updated weights for policy 0, policy_version 24597 (0.0028)
[2024-06-06 14:04:00,828][14296] Updated weights for policy 0, policy_version 24607 (0.0022)
[2024-06-06 14:04:01,561][14064] Fps is (10 sec: 44236.2, 60 sec: 47786.5, 300 sec: 47652.4). Total num frames: 403177472. Throughput: 0: 47821.8. Samples: 256280020. Policy #0 lag: (min: 1.0, avg: 9.7, max: 22.0)
[2024-06-06 14:04:01,562][14064] Avg episode reward: [(0, '0.207')]
[2024-06-06 14:04:04,183][14296] Updated weights for policy 0, policy_version 24617 (0.0034)
[2024-06-06 14:04:06,561][14064] Fps is (10 sec: 49150.6, 60 sec: 47786.6, 300 sec: 47652.4). Total num frames: 403423232. Throughput: 0: 47654.9. Samples: 256417480. Policy #0 lag: (min: 1.0, avg: 9.7, max: 22.0)
[2024-06-06 14:04:06,562][14064] Avg episode reward: [(0, '0.206')]
[2024-06-06 14:04:06,685][14276] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000024624_403439616.pth...
[2024-06-06 14:04:06,736][14276] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000023926_392003584.pth
[2024-06-06 14:04:07,791][14296] Updated weights for policy 0, policy_version 24627 (0.0032)
[2024-06-06 14:04:11,264][14296] Updated weights for policy 0, policy_version 24637 (0.0026)
[2024-06-06 14:04:11,561][14064] Fps is (10 sec: 50790.7, 60 sec: 48059.7, 300 sec: 47763.5). Total num frames: 403685376. Throughput: 0: 47908.8. Samples: 256717700. Policy #0 lag: (min: 1.0, avg: 9.7, max: 22.0)
[2024-06-06 14:04:11,562][14064] Avg episode reward: [(0, '0.200')]
[2024-06-06 14:04:14,663][14296] Updated weights for policy 0, policy_version 24647 (0.0031)
[2024-06-06 14:04:16,561][14064] Fps is (10 sec: 49152.6, 60 sec: 48059.8, 300 sec: 47708.0). Total num frames: 403914752. Throughput: 0: 47870.7. Samples: 256995000. Policy #0 lag: (min: 0.0, avg: 9.6, max: 21.0)
[2024-06-06 14:04:16,562][14064] Avg episode reward: [(0, '0.208')]
[2024-06-06 14:04:18,248][14296] Updated weights for policy 0, policy_version 24657 (0.0032)
[2024-06-06 14:04:21,369][14296] Updated weights for policy 0, policy_version 24667 (0.0041)
[2024-06-06 14:04:21,561][14064] Fps is (10 sec: 45875.5, 60 sec: 47786.7, 300 sec: 47652.4). Total num frames: 404144128. Throughput: 0: 47578.2. Samples: 257135480. Policy #0 lag: (min: 0.0, avg: 9.6, max: 21.0)
[2024-06-06 14:04:21,562][14064] Avg episode reward: [(0, '0.205')]
[2024-06-06 14:04:24,945][14276] Signal inference workers to stop experience collection... (3800 times)
[2024-06-06 14:04:24,946][14276] Signal inference workers to resume experience collection... (3800 times)
[2024-06-06 14:04:24,986][14296] InferenceWorker_p0-w0: stopping experience collection (3800 times)
[2024-06-06 14:04:24,986][14296] InferenceWorker_p0-w0: resuming experience collection (3800 times)
[2024-06-06 14:04:25,105][14296] Updated weights for policy 0, policy_version 24677 (0.0026)
[2024-06-06 14:04:26,564][14064] Fps is (10 sec: 49139.5, 60 sec: 48057.7, 300 sec: 47763.1). Total num frames: 404406272. Throughput: 0: 47569.6. Samples: 257420360. Policy #0 lag: (min: 0.0, avg: 9.6, max: 21.0)
[2024-06-06 14:04:26,565][14064] Avg episode reward: [(0, '0.203')]
[2024-06-06 14:04:28,426][14296] Updated weights for policy 0, policy_version 24687 (0.0027)
[2024-06-06 14:04:31,562][14064] Fps is (10 sec: 45872.0, 60 sec: 47240.0, 300 sec: 47596.8). Total num frames: 404602880. Throughput: 0: 47695.6. Samples: 257711120. Policy #0 lag: (min: 0.0, avg: 9.5, max: 21.0)
[2024-06-06 14:04:31,563][14064] Avg episode reward: [(0, '0.206')]
[2024-06-06 14:04:31,775][14296] Updated weights for policy 0, policy_version 24697 (0.0034)
[2024-06-06 14:04:35,311][14296] Updated weights for policy 0, policy_version 24707 (0.0023)
[2024-06-06 14:04:36,561][14064] Fps is (10 sec: 44248.7, 60 sec: 47513.7, 300 sec: 47652.5). Total num frames: 404848640. Throughput: 0: 47571.7. Samples: 257854740. Policy #0 lag: (min: 0.0, avg: 9.5, max: 21.0)
[2024-06-06 14:04:36,561][14064] Avg episode reward: [(0, '0.209')]
[2024-06-06 14:04:38,745][14296] Updated weights for policy 0, policy_version 24717 (0.0033)
[2024-06-06 14:04:41,564][14064] Fps is (10 sec: 49143.8, 60 sec: 47784.7, 300 sec: 47652.1). Total num frames: 405094400. Throughput: 0: 47589.8. Samples: 258135520. Policy #0 lag: (min: 0.0, avg: 9.5, max: 21.0)
[2024-06-06 14:04:41,564][14064] Avg episode reward: [(0, '0.214')]
[2024-06-06 14:04:41,565][14276] Saving new best policy, reward=0.214!
[2024-06-06 14:04:42,291][14296] Updated weights for policy 0, policy_version 24727 (0.0038)
[2024-06-06 14:04:45,850][14296] Updated weights for policy 0, policy_version 24737 (0.0028)
[2024-06-06 14:04:46,561][14064] Fps is (10 sec: 49150.9, 60 sec: 47786.5, 300 sec: 47596.9). Total num frames: 405340160. Throughput: 0: 47645.3. Samples: 258424060. Policy #0 lag: (min: 0.0, avg: 9.5, max: 21.0)
[2024-06-06 14:04:46,562][14064] Avg episode reward: [(0, '0.200')]
[2024-06-06 14:04:48,947][14296] Updated weights for policy 0, policy_version 24747 (0.0029)
[2024-06-06 14:04:51,561][14064] Fps is (10 sec: 45886.3, 60 sec: 46967.5, 300 sec: 47596.9). Total num frames: 405553152. Throughput: 0: 47848.7. Samples: 258570660. Policy #0 lag: (min: 0.0, avg: 11.1, max: 21.0)
[2024-06-06 14:04:51,562][14064] Avg episode reward: [(0, '0.202')]
[2024-06-06 14:04:52,548][14296] Updated weights for policy 0, policy_version 24757 (0.0035)
[2024-06-06 14:04:55,940][14296] Updated weights for policy 0, policy_version 24767 (0.0025)
[2024-06-06 14:04:56,561][14064] Fps is (10 sec: 45875.9, 60 sec: 47786.6, 300 sec: 47708.0). Total num frames: 405798912. Throughput: 0: 47498.3. Samples: 258855120. Policy #0 lag: (min: 0.0, avg: 11.1, max: 21.0)
[2024-06-06 14:04:56,562][14064] Avg episode reward: [(0, '0.204')]
[2024-06-06 14:04:59,267][14296] Updated weights for policy 0, policy_version 24777 (0.0041)
[2024-06-06 14:05:01,561][14064] Fps is (10 sec: 49152.2, 60 sec: 47786.8, 300 sec: 47596.9). Total num frames: 406044672. Throughput: 0: 47637.0. Samples: 259138660. Policy #0 lag: (min: 0.0, avg: 11.1, max: 21.0)
[2024-06-06 14:05:01,562][14064] Avg episode reward: [(0, '0.205')]
[2024-06-06 14:05:03,015][14296] Updated weights for policy 0, policy_version 24787 (0.0039)
[2024-06-06 14:05:06,283][14296] Updated weights for policy 0, policy_version 24797 (0.0027)
[2024-06-06 14:05:06,561][14064] Fps is (10 sec: 47513.6, 60 sec: 47513.8, 300 sec: 47652.9). Total num frames: 406274048. Throughput: 0: 47701.8. Samples: 259282060. Policy #0 lag: (min: 0.0, avg: 11.4, max: 24.0)
[2024-06-06 14:05:06,562][14064] Avg episode reward: [(0, '0.206')]
[2024-06-06 14:05:09,821][14296] Updated weights for policy 0, policy_version 24807 (0.0032)
[2024-06-06 14:05:11,564][14064] Fps is (10 sec: 49139.5, 60 sec: 47511.7, 300 sec: 47707.6). Total num frames: 406536192. Throughput: 0: 47787.2. Samples: 259570780. Policy #0 lag: (min: 0.0, avg: 11.4, max: 24.0)
[2024-06-06 14:05:11,564][14064] Avg episode reward: [(0, '0.205')]
[2024-06-06 14:05:13,239][14296] Updated weights for policy 0, policy_version 24817 (0.0025)
[2024-06-06 14:05:16,486][14296] Updated weights for policy 0, policy_version 24827 (0.0030)
[2024-06-06 14:05:16,561][14064] Fps is (10 sec: 49151.5, 60 sec: 47513.6, 300 sec: 47652.5). Total num frames: 406765568. Throughput: 0: 47732.2. Samples: 259859040. Policy #0 lag: (min: 0.0, avg: 11.4, max: 24.0)
[2024-06-06 14:05:16,562][14064] Avg episode reward: [(0, '0.215')]
[2024-06-06 14:05:16,573][14276] Saving new best policy, reward=0.215!
[2024-06-06 14:05:19,990][14296] Updated weights for policy 0, policy_version 24837 (0.0027)
[2024-06-06 14:05:21,561][14064] Fps is (10 sec: 47525.4, 60 sec: 47786.7, 300 sec: 47652.5). Total num frames: 407011328. Throughput: 0: 47629.2. Samples: 259998060. Policy #0 lag: (min: 0.0, avg: 11.3, max: 21.0)
[2024-06-06 14:05:21,562][14064] Avg episode reward: [(0, '0.206')]
[2024-06-06 14:05:23,554][14296] Updated weights for policy 0, policy_version 24847 (0.0036)
[2024-06-06 14:05:26,561][14064] Fps is (10 sec: 47514.1, 60 sec: 47242.6, 300 sec: 47708.0). Total num frames: 407240704. Throughput: 0: 47680.8. Samples: 260281040. Policy #0 lag: (min: 0.0, avg: 11.3, max: 21.0)
[2024-06-06 14:05:26,562][14064] Avg episode reward: [(0, '0.213')]
[2024-06-06 14:05:26,768][14296] Updated weights for policy 0, policy_version 24857 (0.0036)
[2024-06-06 14:05:30,605][14296] Updated weights for policy 0, policy_version 24867 (0.0029)
[2024-06-06 14:05:31,561][14064] Fps is (10 sec: 47513.9, 60 sec: 48060.4, 300 sec: 47763.5). Total num frames: 407486464. Throughput: 0: 47744.2. Samples: 260572540. Policy #0 lag: (min: 0.0, avg: 11.3, max: 21.0)
[2024-06-06 14:05:31,561][14064] Avg episode reward: [(0, '0.200')]
[2024-06-06 14:05:33,661][14296] Updated weights for policy 0, policy_version 24877 (0.0033)
[2024-06-06 14:05:36,561][14064] Fps is (10 sec: 47513.2, 60 sec: 47786.5, 300 sec: 47652.4). Total num frames: 407715840. Throughput: 0: 47635.9. Samples: 260714280. Policy #0 lag: (min: 0.0, avg: 11.3, max: 21.0)
[2024-06-06 14:05:36,562][14064] Avg episode reward: [(0, '0.200')]
[2024-06-06 14:05:37,244][14296] Updated weights for policy 0, policy_version 24887 (0.0023)
[2024-06-06 14:05:40,496][14296] Updated weights for policy 0, policy_version 24897 (0.0036)
[2024-06-06 14:05:41,561][14064] Fps is (10 sec: 49151.6, 60 sec: 48061.6, 300 sec: 47708.0). Total num frames: 407977984. Throughput: 0: 47988.4. Samples: 261014600. Policy #0 lag: (min: 1.0, avg: 9.6, max: 20.0)
[2024-06-06 14:05:41,562][14064] Avg episode reward: [(0, '0.208')]
[2024-06-06 14:05:43,851][14296] Updated weights for policy 0, policy_version 24907 (0.0032)
[2024-06-06 14:05:44,867][14276] Signal inference workers to stop experience collection... (3850 times)
[2024-06-06 14:05:44,868][14276] Signal inference workers to resume experience collection... (3850 times)
[2024-06-06 14:05:44,888][14296] InferenceWorker_p0-w0: stopping experience collection (3850 times)
[2024-06-06 14:05:44,888][14296] InferenceWorker_p0-w0: resuming experience collection (3850 times)
[2024-06-06 14:05:46,564][14064] Fps is (10 sec: 49139.4, 60 sec: 47784.7, 300 sec: 47763.1). Total num frames: 408207360. Throughput: 0: 48008.7. Samples: 261299180. Policy #0 lag: (min: 1.0, avg: 9.6, max: 20.0)
[2024-06-06 14:05:46,565][14064] Avg episode reward: [(0, '0.207')]
[2024-06-06 14:05:47,187][14296] Updated weights for policy 0, policy_version 24917 (0.0029)
[2024-06-06 14:05:50,744][14296] Updated weights for policy 0, policy_version 24927 (0.0040)
[2024-06-06 14:05:51,561][14064] Fps is (10 sec: 44237.1, 60 sec: 47786.7, 300 sec: 47708.4). Total num frames: 408420352. Throughput: 0: 48050.2. Samples: 261444320. Policy #0 lag: (min: 1.0, avg: 9.6, max: 20.0)
[2024-06-06 14:05:51,562][14064] Avg episode reward: [(0, '0.211')]
[2024-06-06 14:05:53,842][14296] Updated weights for policy 0, policy_version 24937 (0.0034)
[2024-06-06 14:05:56,561][14064] Fps is (10 sec: 49165.2, 60 sec: 48332.8, 300 sec: 47764.0). Total num frames: 408698880. Throughput: 0: 47935.6. Samples: 261727760. Policy #0 lag: (min: 1.0, avg: 11.2, max: 21.0)
[2024-06-06 14:05:56,562][14064] Avg episode reward: [(0, '0.203')]
[2024-06-06 14:05:57,571][14296] Updated weights for policy 0, policy_version 24947 (0.0035)
[2024-06-06 14:06:00,814][14296] Updated weights for policy 0, policy_version 24957 (0.0028)
[2024-06-06 14:06:01,561][14064] Fps is (10 sec: 52428.3, 60 sec: 48332.7, 300 sec: 47819.1). Total num frames: 408944640. Throughput: 0: 47852.9. Samples: 262012420. Policy #0 lag: (min: 1.0, avg: 11.2, max: 21.0)
[2024-06-06 14:06:01,562][14064] Avg episode reward: [(0, '0.203')]
[2024-06-06 14:06:04,490][14296] Updated weights for policy 0, policy_version 24967 (0.0028)
[2024-06-06 14:06:06,561][14064] Fps is (10 sec: 45875.3, 60 sec: 48059.7, 300 sec: 47763.5). Total num frames: 409157632. Throughput: 0: 48165.4. Samples: 262165500. Policy #0 lag: (min: 1.0, avg: 11.2, max: 21.0)
[2024-06-06 14:06:06,562][14064] Avg episode reward: [(0, '0.207')]
[2024-06-06 14:06:06,634][14276] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000024974_409174016.pth...
[2024-06-06 14:06:06,680][14276] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000024273_397688832.pth
[2024-06-06 14:06:08,094][14296] Updated weights for policy 0, policy_version 24977 (0.0032)
[2024-06-06 14:06:11,288][14296] Updated weights for policy 0, policy_version 24987 (0.0027)
[2024-06-06 14:06:11,561][14064] Fps is (10 sec: 44237.1, 60 sec: 47515.6, 300 sec: 47763.5). Total num frames: 409387008. Throughput: 0: 48203.1. Samples: 262450180. Policy #0 lag: (min: 1.0, avg: 9.7, max: 21.0)
[2024-06-06 14:06:11,562][14064] Avg episode reward: [(0, '0.203')]
[2024-06-06 14:06:14,760][14296] Updated weights for policy 0, policy_version 24997 (0.0028)
[2024-06-06 14:06:16,561][14064] Fps is (10 sec: 49151.1, 60 sec: 48059.7, 300 sec: 47819.0). Total num frames: 409649152. Throughput: 0: 47979.4. Samples: 262731620. Policy #0 lag: (min: 1.0, avg: 9.7, max: 21.0)
[2024-06-06 14:06:16,562][14064] Avg episode reward: [(0, '0.215')]
[2024-06-06 14:06:18,243][14296] Updated weights for policy 0, policy_version 25007 (0.0028)
[2024-06-06 14:06:21,359][14296] Updated weights for policy 0, policy_version 25017 (0.0034)
[2024-06-06 14:06:21,561][14064] Fps is (10 sec: 50790.5, 60 sec: 48059.8, 300 sec: 47819.1). Total num frames: 409894912. Throughput: 0: 48104.1. Samples: 262878960. Policy #0 lag: (min: 1.0, avg: 9.7, max: 21.0)
[2024-06-06 14:06:21,562][14064] Avg episode reward: [(0, '0.212')]
[2024-06-06 14:06:24,913][14296] Updated weights for policy 0, policy_version 25027 (0.0032)
[2024-06-06 14:06:26,561][14064] Fps is (10 sec: 47514.6, 60 sec: 48059.8, 300 sec: 47930.2). Total num frames: 410124288. Throughput: 0: 47938.8. Samples: 263171840. Policy #0 lag: (min: 1.0, avg: 9.7, max: 21.0)
[2024-06-06 14:06:26,561][14064] Avg episode reward: [(0, '0.202')]
[2024-06-06 14:06:28,239][14296] Updated weights for policy 0, policy_version 25037 (0.0032)
[2024-06-06 14:06:31,561][14064] Fps is (10 sec: 45875.3, 60 sec: 47786.7, 300 sec: 47763.5). Total num frames: 410353664. Throughput: 0: 47903.3. Samples: 263454700. Policy #0 lag: (min: 0.0, avg: 10.0, max: 22.0)
[2024-06-06 14:06:31,562][14064] Avg episode reward: [(0, '0.207')]
[2024-06-06 14:06:31,819][14296] Updated weights for policy 0, policy_version 25047 (0.0022)
[2024-06-06 14:06:35,134][14296] Updated weights for policy 0, policy_version 25057 (0.0046)
[2024-06-06 14:06:36,561][14064] Fps is (10 sec: 49151.3, 60 sec: 48332.8, 300 sec: 47874.6). Total num frames: 410615808. Throughput: 0: 47907.9. Samples: 263600180. Policy #0 lag: (min: 0.0, avg: 10.0, max: 22.0)
[2024-06-06 14:06:36,562][14064] Avg episode reward: [(0, '0.206')]
[2024-06-06 14:06:38,733][14296] Updated weights for policy 0, policy_version 25067 (0.0038)
[2024-06-06 14:06:41,561][14064] Fps is (10 sec: 49151.7, 60 sec: 47786.7, 300 sec: 47874.6). Total num frames: 410845184. Throughput: 0: 47857.3. Samples: 263881340. Policy #0 lag: (min: 0.0, avg: 10.0, max: 22.0)
[2024-06-06 14:06:41,562][14064] Avg episode reward: [(0, '0.211')]
[2024-06-06 14:06:41,878][14296] Updated weights for policy 0, policy_version 25077 (0.0028)
[2024-06-06 14:06:45,752][14296] Updated weights for policy 0, policy_version 25087 (0.0030)
[2024-06-06 14:06:45,788][14276] Signal inference workers to stop experience collection... (3900 times)
[2024-06-06 14:06:45,788][14276] Signal inference workers to resume experience collection... (3900 times)
[2024-06-06 14:06:45,812][14296] InferenceWorker_p0-w0: stopping experience collection (3900 times)
[2024-06-06 14:06:45,813][14296] InferenceWorker_p0-w0: resuming experience collection (3900 times)
[2024-06-06 14:06:46,561][14064] Fps is (10 sec: 44237.2, 60 sec: 47515.7, 300 sec: 47874.6). Total num frames: 411058176. Throughput: 0: 47965.9. Samples: 264170880. Policy #0 lag: (min: 0.0, avg: 9.7, max: 21.0)
[2024-06-06 14:06:46,562][14064] Avg episode reward: [(0, '0.212')]
[2024-06-06 14:06:48,780][14296] Updated weights for policy 0, policy_version 25097 (0.0043)
[2024-06-06 14:06:51,561][14064] Fps is (10 sec: 45875.2, 60 sec: 48059.7, 300 sec: 47708.0). Total num frames: 411303936. Throughput: 0: 47582.6. Samples: 264306720. Policy #0 lag: (min: 0.0, avg: 9.7, max: 21.0)
[2024-06-06 14:06:51,562][14064] Avg episode reward: [(0, '0.208')]
[2024-06-06 14:06:52,396][14296] Updated weights for policy 0, policy_version 25107 (0.0035)
[2024-06-06 14:06:55,635][14296] Updated weights for policy 0, policy_version 25117 (0.0030)
[2024-06-06 14:06:56,561][14064] Fps is (10 sec: 52428.4, 60 sec: 48059.7, 300 sec: 47930.1). Total num frames: 411582464. Throughput: 0: 47815.5. Samples: 264601880. Policy #0 lag: (min: 0.0, avg: 9.7, max: 21.0)
[2024-06-06 14:06:56,562][14064] Avg episode reward: [(0, '0.207')]
[2024-06-06 14:06:59,107][14296] Updated weights for policy 0, policy_version 25127 (0.0030)
[2024-06-06 14:07:01,564][14064] Fps is (10 sec: 49139.1, 60 sec: 47511.6, 300 sec: 47874.2). Total num frames: 411795456. Throughput: 0: 47950.2. Samples: 264889500. Policy #0 lag: (min: 0.0, avg: 9.7, max: 21.0)
[2024-06-06 14:07:01,565][14064] Avg episode reward: [(0, '0.199')]
[2024-06-06 14:07:02,339][14296] Updated weights for policy 0, policy_version 25137 (0.0031)
[2024-06-06 14:07:06,212][14296] Updated weights for policy 0, policy_version 25147 (0.0034)
[2024-06-06 14:07:06,561][14064] Fps is (10 sec: 44236.7, 60 sec: 47786.6, 300 sec: 47874.6). Total num frames: 412024832. Throughput: 0: 47839.9. Samples: 265031760. Policy #0 lag: (min: 0.0, avg: 9.8, max: 23.0)
[2024-06-06 14:07:06,562][14064] Avg episode reward: [(0, '0.211')]
[2024-06-06 14:07:09,044][14296] Updated weights for policy 0, policy_version 25157 (0.0025)
[2024-06-06 14:07:11,561][14064] Fps is (10 sec: 47526.2, 60 sec: 48059.7, 300 sec: 47819.2). Total num frames: 412270592. Throughput: 0: 47690.6. Samples: 265317920. Policy #0 lag: (min: 0.0, avg: 9.8, max: 23.0)
[2024-06-06 14:07:11,562][14064] Avg episode reward: [(0, '0.210')]
[2024-06-06 14:07:13,225][14296] Updated weights for policy 0, policy_version 25167 (0.0044)
[2024-06-06 14:07:16,038][14296] Updated weights for policy 0, policy_version 25177 (0.0034)
[2024-06-06 14:07:16,561][14064] Fps is (10 sec: 49152.6, 60 sec: 47786.8, 300 sec: 47930.1). Total num frames: 412516352. Throughput: 0: 47658.2. Samples: 265599320. Policy #0 lag: (min: 0.0, avg: 9.8, max: 23.0)
[2024-06-06 14:07:16,562][14064] Avg episode reward: [(0, '0.203')]
[2024-06-06 14:07:20,175][14296] Updated weights for policy 0, policy_version 25187 (0.0030)
[2024-06-06 14:07:21,561][14064] Fps is (10 sec: 47513.5, 60 sec: 47513.6, 300 sec: 47930.1). Total num frames: 412745728. Throughput: 0: 47792.1. Samples: 265750820. Policy #0 lag: (min: 0.0, avg: 12.5, max: 23.0)
[2024-06-06 14:07:21,562][14064] Avg episode reward: [(0, '0.204')]
[2024-06-06 14:07:22,955][14296] Updated weights for policy 0, policy_version 25197 (0.0023)
[2024-06-06 14:07:26,561][14064] Fps is (10 sec: 45875.2, 60 sec: 47513.6, 300 sec: 47763.5). Total num frames: 412975104. Throughput: 0: 47826.3. Samples: 266033520.
Policy #0 lag: (min: 0.0, avg: 12.5, max: 23.0) [2024-06-06 14:07:26,562][14064] Avg episode reward: [(0, '0.208')] [2024-06-06 14:07:26,945][14296] Updated weights for policy 0, policy_version 25207 (0.0024) [2024-06-06 14:07:29,658][14296] Updated weights for policy 0, policy_version 25217 (0.0026) [2024-06-06 14:07:31,561][14064] Fps is (10 sec: 50790.6, 60 sec: 48332.8, 300 sec: 47930.1). Total num frames: 413253632. Throughput: 0: 47676.9. Samples: 266316340. Policy #0 lag: (min: 0.0, avg: 12.5, max: 23.0) [2024-06-06 14:07:31,562][14064] Avg episode reward: [(0, '0.209')] [2024-06-06 14:07:33,827][14296] Updated weights for policy 0, policy_version 25227 (0.0024) [2024-06-06 14:07:36,408][14296] Updated weights for policy 0, policy_version 25237 (0.0036) [2024-06-06 14:07:36,561][14064] Fps is (10 sec: 50789.9, 60 sec: 47786.7, 300 sec: 47874.6). Total num frames: 413483008. Throughput: 0: 47973.3. Samples: 266465520. Policy #0 lag: (min: 0.0, avg: 10.3, max: 21.0) [2024-06-06 14:07:36,562][14064] Avg episode reward: [(0, '0.209')] [2024-06-06 14:07:40,987][14296] Updated weights for policy 0, policy_version 25247 (0.0033) [2024-06-06 14:07:41,561][14064] Fps is (10 sec: 44236.1, 60 sec: 47513.5, 300 sec: 47874.6). Total num frames: 413696000. Throughput: 0: 47695.0. Samples: 266748160. Policy #0 lag: (min: 0.0, avg: 10.3, max: 21.0) [2024-06-06 14:07:41,562][14064] Avg episode reward: [(0, '0.211')] [2024-06-06 14:07:43,466][14296] Updated weights for policy 0, policy_version 25257 (0.0031) [2024-06-06 14:07:46,561][14064] Fps is (10 sec: 44237.5, 60 sec: 47786.7, 300 sec: 47763.5). Total num frames: 413925376. Throughput: 0: 47603.8. Samples: 267031540. 
Policy #0 lag: (min: 0.0, avg: 10.3, max: 21.0) [2024-06-06 14:07:46,561][14064] Avg episode reward: [(0, '0.209')] [2024-06-06 14:07:47,699][14296] Updated weights for policy 0, policy_version 25267 (0.0028) [2024-06-06 14:07:50,598][14296] Updated weights for policy 0, policy_version 25277 (0.0030) [2024-06-06 14:07:51,561][14064] Fps is (10 sec: 49152.7, 60 sec: 48059.8, 300 sec: 47874.6). Total num frames: 414187520. Throughput: 0: 47507.2. Samples: 267169580. Policy #0 lag: (min: 0.0, avg: 10.3, max: 21.0) [2024-06-06 14:07:51,562][14064] Avg episode reward: [(0, '0.213')] [2024-06-06 14:07:54,477][14296] Updated weights for policy 0, policy_version 25287 (0.0027) [2024-06-06 14:07:56,561][14064] Fps is (10 sec: 50789.8, 60 sec: 47513.6, 300 sec: 47874.6). Total num frames: 414433280. Throughput: 0: 47586.2. Samples: 267459300. Policy #0 lag: (min: 0.0, avg: 11.8, max: 21.0) [2024-06-06 14:07:56,562][14064] Avg episode reward: [(0, '0.206')] [2024-06-06 14:07:57,335][14296] Updated weights for policy 0, policy_version 25297 (0.0028) [2024-06-06 14:08:14,848][16569] Saving configuration to /workspace/metta/train_dir/p2.metta.4/config.json... 
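Each status entry above packs four rolling throughput metrics into a single line. A minimal sketch of pulling them out for plotting, in plain Python; the regex and helper name are our own, not part of Sample Factory:

```python
import re

# Matches Sample Factory status lines such as:
# "Fps is (10 sec: 44237.1, 60 sec: 47515.6, 300 sec: 47763.5). Total num frames: 409387008."
FPS_RE = re.compile(
    r"Fps is \(10 sec: ([\d.]+), 60 sec: ([\d.]+), 300 sec: ([\d.]+)\)\. "
    r"Total num frames: (\d+)"
)

def parse_fps(line: str):
    """Return (fps_10s, fps_60s, fps_300s, total_frames), or None if the
    line is not an Fps status entry."""
    m = FPS_RE.search(line)
    if m is None:
        return None
    return float(m.group(1)), float(m.group(2)), float(m.group(3)), int(m.group(4))

sample = ("[2024-06-06 14:06:11,561][14064] Fps is (10 sec: 44237.1, 60 sec: 47515.6, "
          "300 sec: 47763.5). Total num frames: 409387008. Throughput: 0: 48203.1.")
print(parse_fps(sample))  # (44237.1, 47515.6, 47763.5, 409387008)
```

Feeding every line of the log through `parse_fps` and discarding the `None`s gives a time series of the three FPS windows against total frames.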
[2024-06-06 14:08:14,865][16569] Rollout worker 0 uses device cpu
[2024-06-06 14:08:14,866][16569] Rollout worker 1 uses device cpu
[2024-06-06 14:08:14,866][16569] Rollout worker 2 uses device cpu
[2024-06-06 14:08:14,866][16569] Rollout worker 3 uses device cpu
[2024-06-06 14:08:14,866][16569] Rollout worker 4 uses device cpu
[2024-06-06 14:08:14,867][16569] Rollout worker 5 uses device cpu
[2024-06-06 14:08:14,867][16569] Rollout worker 6 uses device cpu
[2024-06-06 14:08:14,867][16569] Rollout worker 7 uses device cpu
[2024-06-06 14:08:14,867][16569] Rollout worker 8 uses device cpu
[2024-06-06 14:08:14,868][16569] Rollout worker 9 uses device cpu
[2024-06-06 14:08:14,868][16569] Rollout worker 10 uses device cpu
[2024-06-06 14:08:14,868][16569] Rollout worker 11 uses device cpu
[2024-06-06 14:08:14,868][16569] Rollout worker 12 uses device cpu
[2024-06-06 14:08:14,869][16569] Rollout worker 13 uses device cpu
[2024-06-06 14:08:14,869][16569] Rollout worker 14 uses device cpu
[2024-06-06 14:08:14,869][16569] Rollout worker 15 uses device cpu
[2024-06-06 14:08:14,869][16569] Rollout worker 16 uses device cpu
[2024-06-06 14:08:14,870][16569] Rollout worker 17 uses device cpu
[2024-06-06 14:08:14,870][16569] Rollout worker 18 uses device cpu
[2024-06-06 14:08:14,870][16569] Rollout worker 19 uses device cpu
[2024-06-06 14:08:14,870][16569] Rollout worker 20 uses device cpu
[2024-06-06 14:08:14,870][16569] Rollout worker 21 uses device cpu
[2024-06-06 14:08:14,870][16569] Rollout worker 22 uses device cpu
[2024-06-06 14:08:14,870][16569] Rollout worker 23 uses device cpu
[2024-06-06 14:08:14,870][16569] Rollout worker 24 uses device cpu
[2024-06-06 14:08:14,870][16569] Rollout worker 25 uses device cpu
[2024-06-06 14:08:14,870][16569] Rollout worker 26 uses device cpu
[2024-06-06 14:08:14,870][16569] Rollout worker 27 uses device cpu
[2024-06-06 14:08:14,870][16569] Rollout worker 28 uses device cpu
[2024-06-06 14:08:14,871][16569] Rollout worker 29 uses device cpu
[2024-06-06 14:08:14,871][16569] Rollout worker 30 uses device cpu
[2024-06-06 14:08:14,871][16569] Rollout worker 31 uses device cpu
[2024-06-06 14:08:15,403][16569] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-06-06 14:08:15,403][16569] InferenceWorker_p0-w0: min num requests: 10
[2024-06-06 14:08:15,448][16569] Starting all processes...
[2024-06-06 14:08:15,448][16569] Starting process learner_proc0
[2024-06-06 14:08:15,718][16569] Starting all processes...
[2024-06-06 14:08:15,720][16569] Starting process inference_proc0-0
[2024-06-06 14:08:15,720][16569] Starting process rollout_proc0
[2024-06-06 14:08:15,723][16569] Starting process rollout_proc1
[2024-06-06 14:08:15,723][16569] Starting process rollout_proc2
[2024-06-06 14:08:15,724][16569] Starting process rollout_proc3
[2024-06-06 14:08:15,725][16569] Starting process rollout_proc4
[2024-06-06 14:08:15,726][16569] Starting process rollout_proc5
[2024-06-06 14:08:15,726][16569] Starting process rollout_proc6
[2024-06-06 14:08:15,727][16569] Starting process rollout_proc7
[2024-06-06 14:08:15,727][16569] Starting process rollout_proc8
[2024-06-06 14:08:15,727][16569] Starting process rollout_proc9
[2024-06-06 14:08:15,727][16569] Starting process rollout_proc10
[2024-06-06 14:08:15,727][16569] Starting process rollout_proc11
[2024-06-06 14:08:15,728][16569] Starting process rollout_proc12
[2024-06-06 14:08:15,728][16569] Starting process rollout_proc13
[2024-06-06 14:08:15,728][16569] Starting process rollout_proc14
[2024-06-06 14:08:15,730][16569] Starting process rollout_proc15
[2024-06-06 14:08:15,731][16569] Starting process rollout_proc16
[2024-06-06 14:08:15,731][16569] Starting process rollout_proc17
[2024-06-06 14:08:15,732][16569] Starting process rollout_proc18
[2024-06-06 14:08:15,732][16569] Starting process rollout_proc19
[2024-06-06 14:08:15,732][16569] Starting process rollout_proc20
[2024-06-06 14:08:15,732][16569] Starting process rollout_proc21
[2024-06-06 14:08:15,736][16569] Starting process rollout_proc22
[2024-06-06 14:08:15,737][16569] Starting process rollout_proc23
[2024-06-06 14:08:15,737][16569] Starting process rollout_proc24
[2024-06-06 14:08:15,745][16569] Starting process rollout_proc25
[2024-06-06 14:08:15,745][16569] Starting process rollout_proc26
[2024-06-06 14:08:15,747][16569] Starting process rollout_proc27
[2024-06-06 14:08:15,749][16569] Starting process rollout_proc28
[2024-06-06 14:08:15,751][16569] Starting process rollout_proc29
[2024-06-06 14:08:15,751][16569] Starting process rollout_proc30
[2024-06-06 14:08:15,754][16569] Starting process rollout_proc31
[2024-06-06 14:08:17,676][16807] Worker 5 uses CPU cores [5]
[2024-06-06 14:08:17,944][16827] Worker 25 uses CPU cores [25]
[2024-06-06 14:08:17,952][16814] Worker 13 uses CPU cores [13]
[2024-06-06 14:08:17,958][16803] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-06-06 14:08:17,958][16803] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2024-06-06 14:08:17,962][16804] Worker 2 uses CPU cores [2]
[2024-06-06 14:08:17,967][16803] Num visible devices: 1
[2024-06-06 14:08:17,996][16801] Worker 0 uses CPU cores [0]
[2024-06-06 14:08:18,011][16817] Worker 14 uses CPU cores [14]
[2024-06-06 14:08:18,028][16802] Worker 1 uses CPU cores [1]
[2024-06-06 14:08:18,036][16824] Worker 23 uses CPU cores [23]
[2024-06-06 14:08:18,046][16810] Worker 8 uses CPU cores [8]
[2024-06-06 14:08:18,048][16815] Worker 17 uses CPU cores [17]
[2024-06-06 14:08:18,060][16820] Worker 18 uses CPU cores [18]
[2024-06-06 14:08:18,064][16825] Worker 24 uses CPU cores [24]
[2024-06-06 14:08:18,075][16829] Worker 19 uses CPU cores [19]
[2024-06-06 14:08:18,075][16821] Worker 20 uses CPU cores [20]
[2024-06-06 14:08:18,134][16809] Worker 7 uses CPU cores [7]
[2024-06-06 14:08:18,172][16806] Worker 4 uses CPU cores [4]
[2024-06-06 14:08:18,181][16805] Worker 3 uses CPU cores [3]
[2024-06-06 14:08:18,214][16833] Worker 29 uses CPU cores [29]
[2024-06-06 14:08:18,218][16822] Worker 21 uses CPU cores [21]
[2024-06-06 14:08:18,218][16823] Worker 22 uses CPU cores [22]
[2024-06-06 14:08:18,236][16813] Worker 12 uses CPU cores [12]
[2024-06-06 14:08:18,237][16816] Worker 11 uses CPU cores [11]
[2024-06-06 14:08:18,256][16828] Worker 27 uses CPU cores [27]
[2024-06-06 14:08:18,263][16832] Worker 31 uses CPU cores [31]
[2024-06-06 14:08:18,263][16831] Worker 30 uses CPU cores [30]
[2024-06-06 14:08:18,277][16819] Worker 16 uses CPU cores [16]
[2024-06-06 14:08:18,296][16826] Worker 26 uses CPU cores [26]
[2024-06-06 14:08:18,336][16811] Worker 9 uses CPU cores [9]
[2024-06-06 14:08:18,338][16830] Worker 28 uses CPU cores [28]
[2024-06-06 14:08:18,343][16808] Worker 6 uses CPU cores [6]
[2024-06-06 14:08:18,345][16781] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-06-06 14:08:18,345][16781] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2024-06-06 14:08:18,352][16781] Num visible devices: 1
[2024-06-06 14:08:18,369][16781] Setting fixed seed 0
[2024-06-06 14:08:18,370][16781] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-06-06 14:08:18,370][16781] Initializing actor-critic model on device cuda:0
[2024-06-06 14:08:18,374][16812] Worker 10 uses CPU cores [10]
[2024-06-06 14:08:18,375][16818] Worker 15 uses CPU cores [15]
[2024-06-06 14:08:19,014][16781] RunningMeanStd input shape: (11, 11)
[2024-06-06 14:08:19,015][16781] RunningMeanStd input shape: (11, 11)
[2024-06-06 14:08:19,015][16781] RunningMeanStd input shape: (11, 11)
[2024-06-06 14:08:19,015][16781] RunningMeanStd input shape: (11, 11)
[2024-06-06 14:08:19,015][16781] RunningMeanStd input shape: (11, 11)
[2024-06-06 14:08:19,015][16781] RunningMeanStd input shape: (11, 11)
[2024-06-06 14:08:19,015][16781] RunningMeanStd input shape: (11, 11)
[2024-06-06 14:08:19,015][16781] RunningMeanStd input shape: (11, 11)
[2024-06-06 14:08:19,015][16781] RunningMeanStd input shape: (11, 11)
[2024-06-06 14:08:19,015][16781] RunningMeanStd input shape: (11, 11)
[2024-06-06 14:08:19,015][16781] RunningMeanStd input shape: (11, 11)
[2024-06-06 14:08:19,015][16781] RunningMeanStd input shape: (11, 11)
[2024-06-06 14:08:19,015][16781] RunningMeanStd input shape: (11, 11)
[2024-06-06 14:08:19,015][16781] RunningMeanStd input shape: (11, 11)
[2024-06-06 14:08:19,015][16781] RunningMeanStd input shape: (11, 11)
[2024-06-06 14:08:19,015][16781] RunningMeanStd input shape: (11, 11)
[2024-06-06 14:08:19,015][16781] RunningMeanStd input shape: (11, 11)
[2024-06-06 14:08:19,015][16781] RunningMeanStd input shape: (11, 11)
[2024-06-06 14:08:19,015][16781] RunningMeanStd input shape: (11, 11)
[2024-06-06 14:08:19,015][16781] RunningMeanStd input shape: (11, 11)
[2024-06-06 14:08:19,015][16781] RunningMeanStd input shape: (11, 11)
[2024-06-06 14:08:19,016][16781] RunningMeanStd input shape: (11, 11)
[2024-06-06 14:08:19,019][16781] RunningMeanStd input shape: (1,)
[2024-06-06 14:08:19,019][16781] RunningMeanStd input shape: (1,)
[2024-06-06 14:08:19,019][16781] RunningMeanStd input shape: (1,)
[2024-06-06 14:08:19,019][16781] RunningMeanStd input shape: (1,)
[2024-06-06 14:08:19,059][16781] RunningMeanStd input shape: (1,)
[2024-06-06 14:08:19,064][16781] Created Actor Critic model with architecture:
[2024-06-06 14:08:19,064][16781] SampleFactoryAgentWrapper(
  (obs_normalizer): ObservationNormalizer()
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (agent): MettaAgent(
    (_encoder): MultiFeatureSetEncoder(
      (feature_set_encoders): ModuleDict(
        (grid_obs): FeatureSetEncoder(
          (_normalizer): FeatureListNormalizer(
            (_norms_dict): ModuleDict(
              (agent): RunningMeanStdInPlace()
              (altar): RunningMeanStdInPlace()
              (converter): RunningMeanStdInPlace()
              (generator): RunningMeanStdInPlace()
              (wall): RunningMeanStdInPlace()
              (agent:dir): RunningMeanStdInPlace()
              (agent:energy): RunningMeanStdInPlace()
              (agent:frozen): RunningMeanStdInPlace()
              (agent:hp): RunningMeanStdInPlace()
              (agent:id): RunningMeanStdInPlace()
              (agent:inv_r1): RunningMeanStdInPlace()
              (agent:inv_r2): RunningMeanStdInPlace()
              (agent:inv_r3): RunningMeanStdInPlace()
              (agent:shield): RunningMeanStdInPlace()
              (altar:hp): RunningMeanStdInPlace()
              (altar:state): RunningMeanStdInPlace()
              (converter:hp): RunningMeanStdInPlace()
              (converter:state): RunningMeanStdInPlace()
              (generator:amount): RunningMeanStdInPlace()
              (generator:hp): RunningMeanStdInPlace()
              (generator:state): RunningMeanStdInPlace()
              (wall:hp): RunningMeanStdInPlace()
            )
          )
          (embedding_net): Sequential(
            (0): Linear(in_features=125, out_features=512, bias=True)
            (1): ELU(alpha=1.0)
            (2): Linear(in_features=512, out_features=512, bias=True)
            (3): ELU(alpha=1.0)
            (4): Linear(in_features=512, out_features=512, bias=True)
            (5): ELU(alpha=1.0)
            (6): Linear(in_features=512, out_features=512, bias=True)
            (7): ELU(alpha=1.0)
          )
        )
        (global_vars): FeatureSetEncoder(
          (_normalizer): FeatureListNormalizer(
            (_norms_dict): ModuleDict(
              (_steps): RunningMeanStdInPlace()
            )
          )
          (embedding_net): Sequential(
            (0): Linear(in_features=5, out_features=8, bias=True)
            (1): ELU(alpha=1.0)
            (2): Linear(in_features=8, out_features=8, bias=True)
            (3): ELU(alpha=1.0)
          )
        )
        (last_action): FeatureSetEncoder(
          (_normalizer): FeatureListNormalizer(
            (_norms_dict): ModuleDict(
              (last_action_id): RunningMeanStdInPlace()
              (last_action_val): RunningMeanStdInPlace()
            )
          )
          (embedding_net): Sequential(
            (0): Linear(in_features=5, out_features=8, bias=True)
            (1): ELU(alpha=1.0)
            (2): Linear(in_features=8, out_features=8, bias=True)
            (3): ELU(alpha=1.0)
          )
        )
        (last_reward): FeatureSetEncoder(
          (_normalizer): FeatureListNormalizer(
            (_norms_dict): ModuleDict(
              (last_reward): RunningMeanStdInPlace()
            )
          )
          (embedding_net): Sequential(
            (0): Linear(in_features=5, out_features=8, bias=True)
            (1): ELU(alpha=1.0)
            (2): Linear(in_features=8, out_features=8, bias=True)
            (3): ELU(alpha=1.0)
          )
        )
      )
      (merged_encoder): Sequential(
        (0): Linear(in_features=536, out_features=512, bias=True)
        (1): ELU(alpha=1.0)
        (2): Linear(in_features=512, out_features=512, bias=True)
        (3): ELU(alpha=1.0)
        (4): Linear(in_features=512, out_features=512, bias=True)
        (5): ELU(alpha=1.0)
      )
    )
    (_core): ModelCoreRNN(
      (core): GRU(512, 512)
    )
    (_decoder): Decoder(
      (mlp): Identity()
    )
    (_critic_linear): Linear(in_features=512, out_features=1, bias=True)
    (_action_parameterization): ActionParameterizationDefault(
      (distribution_linear): Linear(in_features=512, out_features=16, bias=True)
    )
  )
)
[2024-06-06 14:08:19,125][16781] Using optimizer
[2024-06-06 14:08:19,268][16781] Loading state from checkpoint /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000024974_409174016.pth...
[2024-06-06 14:08:19,285][16781] Loading model from checkpoint
[2024-06-06 14:08:19,287][16781] Loaded experiment state at self.train_step=24974, self.env_steps=409174016
[2024-06-06 14:08:19,287][16781] Initialized policy 0 weights for model version 24974
[2024-06-06 14:08:19,288][16781] LearnerWorker_p0 finished initialization!
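The learner resumes from checkpoint_000024974_409174016.pth and then reports train_step=24974 and env_steps=409174016, which suggests the two zero-padded numbers in the filename encode exactly those counters. A small sketch of recovering them from a checkpoint path; the helper name and the assumption about the naming scheme are ours:

```python
import re
from pathlib import Path

def parse_checkpoint_name(path: str):
    """Split a checkpoint name like checkpoint_000024974_409174016.pth into
    (train_step, env_steps). The naming scheme is inferred from the log
    above (step, then env frames); treat it as an assumption."""
    m = re.fullmatch(r"checkpoint_(\d+)_(\d+)\.pth", Path(path).name)
    if m is None:
        raise ValueError(f"unrecognized checkpoint name: {path}")
    return int(m.group(1)), int(m.group(2))

step, frames = parse_checkpoint_name(
    "/workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000024974_409174016.pth"
)
print(step, frames)  # 24974 409174016
```

These values match the "Loaded experiment state at self.train_step=24974, self.env_steps=409174016" entry, so the parse can be sanity-checked against the log itself.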
[2024-06-06 14:08:19,288][16781] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-06-06 14:08:19,964][16803] RunningMeanStd input shape: (11, 11)
[2024-06-06 14:08:19,964][16803] RunningMeanStd input shape: (11, 11)
[2024-06-06 14:08:19,964][16803] RunningMeanStd input shape: (11, 11)
[2024-06-06 14:08:19,964][16803] RunningMeanStd input shape: (11, 11)
[2024-06-06 14:08:19,964][16803] RunningMeanStd input shape: (11, 11)
[2024-06-06 14:08:19,964][16803] RunningMeanStd input shape: (11, 11)
[2024-06-06 14:08:19,964][16803] RunningMeanStd input shape: (11, 11)
[2024-06-06 14:08:19,964][16803] RunningMeanStd input shape: (11, 11)
[2024-06-06 14:08:19,964][16803] RunningMeanStd input shape: (11, 11)
[2024-06-06 14:08:19,964][16803] RunningMeanStd input shape: (11, 11)
[2024-06-06 14:08:19,964][16803] RunningMeanStd input shape: (11, 11)
[2024-06-06 14:08:19,964][16803] RunningMeanStd input shape: (11, 11)
[2024-06-06 14:08:19,964][16803] RunningMeanStd input shape: (11, 11)
[2024-06-06 14:08:19,965][16803] RunningMeanStd input shape: (11, 11)
[2024-06-06 14:08:19,965][16803] RunningMeanStd input shape: (11, 11)
[2024-06-06 14:08:19,965][16803] RunningMeanStd input shape: (11, 11)
[2024-06-06 14:08:19,965][16803] RunningMeanStd input shape: (11, 11)
[2024-06-06 14:08:19,965][16803] RunningMeanStd input shape: (11, 11)
[2024-06-06 14:08:19,965][16803] RunningMeanStd input shape: (11, 11)
[2024-06-06 14:08:19,965][16803] RunningMeanStd input shape: (11, 11)
[2024-06-06 14:08:19,965][16803] RunningMeanStd input shape: (11, 11)
[2024-06-06 14:08:19,965][16803] RunningMeanStd input shape: (11, 11)
[2024-06-06 14:08:19,968][16803] RunningMeanStd input shape: (1,)
[2024-06-06 14:08:19,968][16803] RunningMeanStd input shape: (1,)
[2024-06-06 14:08:19,968][16803] RunningMeanStd input shape: (1,)
[2024-06-06 14:08:19,969][16803] RunningMeanStd input shape: (1,)
[2024-06-06 14:08:20,008][16803] RunningMeanStd input shape: (1,)
[2024-06-06 14:08:20,030][16569] Inference worker 0-0 is ready!
[2024-06-06 14:08:20,030][16569] All inference workers are ready! Signal rollout workers to start!
[2024-06-06 14:08:22,409][16824] EvtLoop [rollout_proc23_evt_loop, process=rollout_proc23] unhandled exception in slot='init' connected to emitter=Emitter(object_id='Sampler', signal_name='_inference_workers_initialized'), args=()
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/signal_slot/signal_slot.py", line 355, in _process_signal
    slot_callable(*args)
  File "/workspace/metta/third_party/sample_factory/sample_factory/algo/sampling/rollout_worker.py", line 150, in init
    env_runner.init(self.timing)
  File "/workspace/metta/third_party/sample_factory/sample_factory/algo/sampling/non_batched_sampling.py", line 418, in init
    self._reset()
  File "/workspace/metta/third_party/sample_factory/sample_factory/algo/sampling/non_batched_sampling.py", line 430, in _reset
    observations, info = e.reset(seed=seed)  # new way of doing seeding since Gym 0.26.0
  File "/opt/conda/lib/python3.10/site-packages/gymnasium/core.py", line 467, in reset
    return self.env.reset(seed=seed, options=options)
  File "/workspace/metta/rl_framework/sample_factory/sample_factory_env_wrapper.py", line 46, in reset
    return self.gym_env.reset()
  File "/workspace/metta/env/griddly/mettagrid/gym_env.py", line 40, in reset
    self._compute_max_energy()
  File "/workspace/metta/env/griddly/mettagrid/gym_env.py", line 64, in _compute_max_energy
    max_resources = self._game_builder.object_configs.generator.count * min(
  File "/opt/conda/lib/python3.10/site-packages/omegaconf/dictconfig.py", line 355, in __getattr__
    self._format_and_raise(
  File "/opt/conda/lib/python3.10/site-packages/omegaconf/base.py", line 231, in _format_and_raise
    format_and_raise(
  File "/opt/conda/lib/python3.10/site-packages/omegaconf/_utils.py", line 899, in format_and_raise
    _raise(ex, cause)
  File "/opt/conda/lib/python3.10/site-packages/omegaconf/_utils.py", line 797, in _raise
    raise ex.with_traceback(sys.exc_info()[2])  # set env var OC_CAUSE=1 for full trace
  File "/opt/conda/lib/python3.10/site-packages/omegaconf/dictconfig.py", line 351, in __getattr__
    return self._get_impl(
  File "/opt/conda/lib/python3.10/site-packages/omegaconf/dictconfig.py", line 442, in _get_impl
    node = self._get_child(
  File "/opt/conda/lib/python3.10/site-packages/omegaconf/basecontainer.py", line 73, in _get_child
    child = self._get_node(
  File "/opt/conda/lib/python3.10/site-packages/omegaconf/dictconfig.py", line 480, in _get_node
    raise ConfigKeyError(f"Missing key {key!s}")
omegaconf.errors.ConfigAttributeError: Missing key count
    full_key: generator.count
    object_type=dict
[2024-06-06 14:08:22,418][16825] EvtLoop [rollout_proc24_evt_loop, process=rollout_proc24] unhandled exception in slot='init' connected to emitter=Emitter(object_id='Sampler', signal_name='_inference_workers_initialized'), args=()
[2024-06-06 14:08:22,425][16824] Unhandled exception Missing key count full_key: generator.count object_type=dict in evt loop rollout_proc23_evt_loop
[2024-06-06 14:08:22,425][16825] Unhandled exception Missing key count full_key: generator.count object_type=dict in evt loop rollout_proc24_evt_loop
[2024-06-06 14:08:22,425][16821] EvtLoop [rollout_proc20_evt_loop, process=rollout_proc20] unhandled exception in slot='init' connected to emitter=Emitter(object_id='Sampler', signal_name='_inference_workers_initialized'), args=()
[2024-06-06 14:08:22,427][16821] Unhandled exception Missing key count full_key: generator.count object_type=dict in evt loop rollout_proc20_evt_loop
[2024-06-06 14:08:22,426][16819] EvtLoop [rollout_proc16_evt_loop, process=rollout_proc16] unhandled exception in slot='init' connected to emitter=Emitter(object_id='Sampler', signal_name='_inference_workers_initialized'), args=()
[2024-06-06 14:08:22,428][16819] Unhandled exception Missing key count full_key: generator.count object_type=dict in evt loop rollout_proc16_evt_loop
[2024-06-06 14:08:22,434][16826] EvtLoop [rollout_proc26_evt_loop, process=rollout_proc26] unhandled exception in slot='init' connected to emitter=Emitter(object_id='Sampler', signal_name='_inference_workers_initialized'), args=()
[2024-06-06 14:08:22,435][16826] Unhandled exception Missing key count full_key: generator.count object_type=dict in evt loop rollout_proc26_evt_loop
[2024-06-06 14:08:22,436][16831] EvtLoop [rollout_proc30_evt_loop, process=rollout_proc30] unhandled exception in slot='init' connected to emitter=Emitter(object_id='Sampler', signal_name='_inference_workers_initialized'), args=()
_get_child child = self._get_node( File "/opt/conda/lib/python3.10/site-packages/omegaconf/dictconfig.py", line 480, in _get_node raise ConfigKeyError(f"Missing key {key!s}") omegaconf.errors.ConfigAttributeError: Missing key count full_key: generator.count object_type=dict [2024-06-06 14:08:22,437][16831] Unhandled exception Missing key count full_key: generator.count object_type=dict in evt loop rollout_proc30_evt_loop [2024-06-06 14:08:22,437][16822] EvtLoop [rollout_proc21_evt_loop, process=rollout_proc21] unhandled exception in slot='init' connected to emitter=Emitter(object_id='Sampler', signal_name='_inference_workers_initialized'), args=() Traceback (most recent call last): File "/opt/conda/lib/python3.10/site-packages/signal_slot/signal_slot.py", line 355, in _process_signal slot_callable(*args) File "/workspace/metta/third_party/sample_factory/sample_factory/algo/sampling/rollout_worker.py", line 150, in init env_runner.init(self.timing) File "/workspace/metta/third_party/sample_factory/sample_factory/algo/sampling/non_batched_sampling.py", line 418, in init self._reset() File "/workspace/metta/third_party/sample_factory/sample_factory/algo/sampling/non_batched_sampling.py", line 430, in _reset observations, info = e.reset(seed=seed) # new way of doing seeding since Gym 0.26.0 File "/opt/conda/lib/python3.10/site-packages/gymnasium/core.py", line 467, in reset return self.env.reset(seed=seed, options=options) File "/workspace/metta/rl_framework/sample_factory/sample_factory_env_wrapper.py", line 46, in reset return self.gym_env.reset() File "/workspace/metta/env/griddly/mettagrid/gym_env.py", line 40, in reset self._compute_max_energy() File "/workspace/metta/env/griddly/mettagrid/gym_env.py", line 64, in _compute_max_energy max_resources = self._game_builder.object_configs.generator.count * min( File "/opt/conda/lib/python3.10/site-packages/omegaconf/dictconfig.py", line 355, in __getattr__ self._format_and_raise( File 
"/opt/conda/lib/python3.10/site-packages/omegaconf/base.py", line 231, in _format_and_raise format_and_raise( File "/opt/conda/lib/python3.10/site-packages/omegaconf/_utils.py", line 899, in format_and_raise _raise(ex, cause) File "/opt/conda/lib/python3.10/site-packages/omegaconf/_utils.py", line 797, in _raise raise ex.with_traceback(sys.exc_info()[2]) # set env var OC_CAUSE=1 for full trace File "/opt/conda/lib/python3.10/site-packages/omegaconf/dictconfig.py", line 351, in __getattr__ return self._get_impl( File "/opt/conda/lib/python3.10/site-packages/omegaconf/dictconfig.py", line 442, in _get_impl node = self._get_child( File "/opt/conda/lib/python3.10/site-packages/omegaconf/basecontainer.py", line 73, in _get_child child = self._get_node( File "/opt/conda/lib/python3.10/site-packages/omegaconf/dictconfig.py", line 480, in _get_node raise ConfigKeyError(f"Missing key {key!s}") omegaconf.errors.ConfigAttributeError: Missing key count full_key: generator.count object_type=dict [2024-06-06 14:08:22,437][16820] EvtLoop [rollout_proc18_evt_loop, process=rollout_proc18] unhandled exception in slot='init' connected to emitter=Emitter(object_id='Sampler', signal_name='_inference_workers_initialized'), args=() Traceback (most recent call last): File "/opt/conda/lib/python3.10/site-packages/signal_slot/signal_slot.py", line 355, in _process_signal slot_callable(*args) File "/workspace/metta/third_party/sample_factory/sample_factory/algo/sampling/rollout_worker.py", line 150, in init env_runner.init(self.timing) File "/workspace/metta/third_party/sample_factory/sample_factory/algo/sampling/non_batched_sampling.py", line 418, in init self._reset() File "/workspace/metta/third_party/sample_factory/sample_factory/algo/sampling/non_batched_sampling.py", line 430, in _reset observations, info = e.reset(seed=seed) # new way of doing seeding since Gym 0.26.0 File "/opt/conda/lib/python3.10/site-packages/gymnasium/core.py", line 467, in reset return self.env.reset(seed=seed, 
options=options) File "/workspace/metta/rl_framework/sample_factory/sample_factory_env_wrapper.py", line 46, in reset return self.gym_env.reset() File "/workspace/metta/env/griddly/mettagrid/gym_env.py", line 40, in reset self._compute_max_energy() File "/workspace/metta/env/griddly/mettagrid/gym_env.py", line 64, in _compute_max_energy max_resources = self._game_builder.object_configs.generator.count * min( File "/opt/conda/lib/python3.10/site-packages/omegaconf/dictconfig.py", line 355, in __getattr__ self._format_and_raise( File "/opt/conda/lib/python3.10/site-packages/omegaconf/base.py", line 231, in _format_and_raise format_and_raise( File "/opt/conda/lib/python3.10/site-packages/omegaconf/_utils.py", line 899, in format_and_raise _raise(ex, cause) File "/opt/conda/lib/python3.10/site-packages/omegaconf/_utils.py", line 797, in _raise raise ex.with_traceback(sys.exc_info()[2]) # set env var OC_CAUSE=1 for full trace File "/opt/conda/lib/python3.10/site-packages/omegaconf/dictconfig.py", line 351, in __getattr__ return self._get_impl( File "/opt/conda/lib/python3.10/site-packages/omegaconf/dictconfig.py", line 442, in _get_impl node = self._get_child( File "/opt/conda/lib/python3.10/site-packages/omegaconf/basecontainer.py", line 73, in _get_child child = self._get_node( File "/opt/conda/lib/python3.10/site-packages/omegaconf/dictconfig.py", line 480, in _get_node raise ConfigKeyError(f"Missing key {key!s}") omegaconf.errors.ConfigAttributeError: Missing key count full_key: generator.count object_type=dict [2024-06-06 14:08:22,438][16822] Unhandled exception Missing key count full_key: generator.count object_type=dict in evt loop rollout_proc21_evt_loop [2024-06-06 14:08:22,438][16820] Unhandled exception Missing key count full_key: generator.count object_type=dict in evt loop rollout_proc18_evt_loop [2024-06-06 14:08:22,444][16829] EvtLoop [rollout_proc19_evt_loop, process=rollout_proc19] unhandled exception in slot='init' connected to 
emitter=Emitter(object_id='Sampler', signal_name='_inference_workers_initialized'), args=() Traceback (most recent call last): File "/opt/conda/lib/python3.10/site-packages/signal_slot/signal_slot.py", line 355, in _process_signal slot_callable(*args) File "/workspace/metta/third_party/sample_factory/sample_factory/algo/sampling/rollout_worker.py", line 150, in init env_runner.init(self.timing) File "/workspace/metta/third_party/sample_factory/sample_factory/algo/sampling/non_batched_sampling.py", line 418, in init self._reset() File "/workspace/metta/third_party/sample_factory/sample_factory/algo/sampling/non_batched_sampling.py", line 430, in _reset observations, info = e.reset(seed=seed) # new way of doing seeding since Gym 0.26.0 File "/opt/conda/lib/python3.10/site-packages/gymnasium/core.py", line 467, in reset return self.env.reset(seed=seed, options=options) File "/workspace/metta/rl_framework/sample_factory/sample_factory_env_wrapper.py", line 46, in reset return self.gym_env.reset() File "/workspace/metta/env/griddly/mettagrid/gym_env.py", line 40, in reset self._compute_max_energy() File "/workspace/metta/env/griddly/mettagrid/gym_env.py", line 64, in _compute_max_energy max_resources = self._game_builder.object_configs.generator.count * min( File "/opt/conda/lib/python3.10/site-packages/omegaconf/dictconfig.py", line 355, in __getattr__ self._format_and_raise( File "/opt/conda/lib/python3.10/site-packages/omegaconf/base.py", line 231, in _format_and_raise format_and_raise( File "/opt/conda/lib/python3.10/site-packages/omegaconf/_utils.py", line 899, in format_and_raise _raise(ex, cause) File "/opt/conda/lib/python3.10/site-packages/omegaconf/_utils.py", line 797, in _raise raise ex.with_traceback(sys.exc_info()[2]) # set env var OC_CAUSE=1 for full trace File "/opt/conda/lib/python3.10/site-packages/omegaconf/dictconfig.py", line 351, in __getattr__ return self._get_impl( File "/opt/conda/lib/python3.10/site-packages/omegaconf/dictconfig.py", line 442, 
in _get_impl node = self._get_child( File "/opt/conda/lib/python3.10/site-packages/omegaconf/basecontainer.py", line 73, in _get_child child = self._get_node( File "/opt/conda/lib/python3.10/site-packages/omegaconf/dictconfig.py", line 480, in _get_node raise ConfigKeyError(f"Missing key {key!s}") omegaconf.errors.ConfigAttributeError: Missing key count full_key: generator.count object_type=dict [2024-06-06 14:08:22,446][16829] Unhandled exception Missing key count full_key: generator.count object_type=dict in evt loop rollout_proc19_evt_loop [2024-06-06 14:08:22,445][16827] EvtLoop [rollout_proc25_evt_loop, process=rollout_proc25] unhandled exception in slot='init' connected to emitter=Emitter(object_id='Sampler', signal_name='_inference_workers_initialized'), args=() Traceback (most recent call last): File "/opt/conda/lib/python3.10/site-packages/signal_slot/signal_slot.py", line 355, in _process_signal slot_callable(*args) File "/workspace/metta/third_party/sample_factory/sample_factory/algo/sampling/rollout_worker.py", line 150, in init env_runner.init(self.timing) File "/workspace/metta/third_party/sample_factory/sample_factory/algo/sampling/non_batched_sampling.py", line 418, in init self._reset() File "/workspace/metta/third_party/sample_factory/sample_factory/algo/sampling/non_batched_sampling.py", line 430, in _reset observations, info = e.reset(seed=seed) # new way of doing seeding since Gym 0.26.0 File "/opt/conda/lib/python3.10/site-packages/gymnasium/core.py", line 467, in reset return self.env.reset(seed=seed, options=options) File "/workspace/metta/rl_framework/sample_factory/sample_factory_env_wrapper.py", line 46, in reset return self.gym_env.reset() File "/workspace/metta/env/griddly/mettagrid/gym_env.py", line 40, in reset self._compute_max_energy() File "/workspace/metta/env/griddly/mettagrid/gym_env.py", line 64, in _compute_max_energy max_resources = self._game_builder.object_configs.generator.count * min( File 
"/opt/conda/lib/python3.10/site-packages/omegaconf/dictconfig.py", line 355, in __getattr__ self._format_and_raise( File "/opt/conda/lib/python3.10/site-packages/omegaconf/base.py", line 231, in _format_and_raise format_and_raise( File "/opt/conda/lib/python3.10/site-packages/omegaconf/_utils.py", line 899, in format_and_raise _raise(ex, cause) File "/opt/conda/lib/python3.10/site-packages/omegaconf/_utils.py", line 797, in _raise raise ex.with_traceback(sys.exc_info()[2]) # set env var OC_CAUSE=1 for full trace File "/opt/conda/lib/python3.10/site-packages/omegaconf/dictconfig.py", line 351, in __getattr__ return self._get_impl( File "/opt/conda/lib/python3.10/site-packages/omegaconf/dictconfig.py", line 442, in _get_impl node = self._get_child( File "/opt/conda/lib/python3.10/site-packages/omegaconf/basecontainer.py", line 73, in _get_child child = self._get_node( File "/opt/conda/lib/python3.10/site-packages/omegaconf/dictconfig.py", line 480, in _get_node raise ConfigKeyError(f"Missing key {key!s}") omegaconf.errors.ConfigAttributeError: Missing key count full_key: generator.count object_type=dict [2024-06-06 14:08:22,447][16827] Unhandled exception Missing key count full_key: generator.count object_type=dict in evt loop rollout_proc25_evt_loop [2024-06-06 14:08:22,448][16832] EvtLoop [rollout_proc31_evt_loop, process=rollout_proc31] unhandled exception in slot='init' connected to emitter=Emitter(object_id='Sampler', signal_name='_inference_workers_initialized'), args=() Traceback (most recent call last): File "/opt/conda/lib/python3.10/site-packages/signal_slot/signal_slot.py", line 355, in _process_signal slot_callable(*args) File "/workspace/metta/third_party/sample_factory/sample_factory/algo/sampling/rollout_worker.py", line 150, in init env_runner.init(self.timing) File "/workspace/metta/third_party/sample_factory/sample_factory/algo/sampling/non_batched_sampling.py", line 418, in init self._reset() File 
"/workspace/metta/third_party/sample_factory/sample_factory/algo/sampling/non_batched_sampling.py", line 430, in _reset observations, info = e.reset(seed=seed) # new way of doing seeding since Gym 0.26.0 File "/opt/conda/lib/python3.10/site-packages/gymnasium/core.py", line 467, in reset return self.env.reset(seed=seed, options=options) File "/workspace/metta/rl_framework/sample_factory/sample_factory_env_wrapper.py", line 46, in reset return self.gym_env.reset() File "/workspace/metta/env/griddly/mettagrid/gym_env.py", line 40, in reset self._compute_max_energy() File "/workspace/metta/env/griddly/mettagrid/gym_env.py", line 64, in _compute_max_energy max_resources = self._game_builder.object_configs.generator.count * min( File "/opt/conda/lib/python3.10/site-packages/omegaconf/dictconfig.py", line 355, in __getattr__ self._format_and_raise( File "/opt/conda/lib/python3.10/site-packages/omegaconf/base.py", line 231, in _format_and_raise format_and_raise( File "/opt/conda/lib/python3.10/site-packages/omegaconf/_utils.py", line 899, in format_and_raise _raise(ex, cause) File "/opt/conda/lib/python3.10/site-packages/omegaconf/_utils.py", line 797, in _raise raise ex.with_traceback(sys.exc_info()[2]) # set env var OC_CAUSE=1 for full trace File "/opt/conda/lib/python3.10/site-packages/omegaconf/dictconfig.py", line 351, in __getattr__ return self._get_impl( File "/opt/conda/lib/python3.10/site-packages/omegaconf/dictconfig.py", line 442, in _get_impl node = self._get_child( File "/opt/conda/lib/python3.10/site-packages/omegaconf/basecontainer.py", line 73, in _get_child child = self._get_node( File "/opt/conda/lib/python3.10/site-packages/omegaconf/dictconfig.py", line 480, in _get_node raise ConfigKeyError(f"Missing key {key!s}") omegaconf.errors.ConfigAttributeError: Missing key count full_key: generator.count object_type=dict [2024-06-06 14:08:22,450][16832] Unhandled exception Missing key count full_key: generator.count object_type=dict in evt loop 
rollout_proc31_evt_loop [2024-06-06 14:08:22,452][16833] EvtLoop [rollout_proc29_evt_loop, process=rollout_proc29] unhandled exception in slot='init' connected to emitter=Emitter(object_id='Sampler', signal_name='_inference_workers_initialized'), args=() Traceback (most recent call last): File "/opt/conda/lib/python3.10/site-packages/signal_slot/signal_slot.py", line 355, in _process_signal slot_callable(*args) File "/workspace/metta/third_party/sample_factory/sample_factory/algo/sampling/rollout_worker.py", line 150, in init env_runner.init(self.timing) File "/workspace/metta/third_party/sample_factory/sample_factory/algo/sampling/non_batched_sampling.py", line 418, in init self._reset() File "/workspace/metta/third_party/sample_factory/sample_factory/algo/sampling/non_batched_sampling.py", line 430, in _reset observations, info = e.reset(seed=seed) # new way of doing seeding since Gym 0.26.0 File "/opt/conda/lib/python3.10/site-packages/gymnasium/core.py", line 467, in reset return self.env.reset(seed=seed, options=options) File "/workspace/metta/rl_framework/sample_factory/sample_factory_env_wrapper.py", line 46, in reset return self.gym_env.reset() File "/workspace/metta/env/griddly/mettagrid/gym_env.py", line 40, in reset self._compute_max_energy() File "/workspace/metta/env/griddly/mettagrid/gym_env.py", line 64, in _compute_max_energy max_resources = self._game_builder.object_configs.generator.count * min( File "/opt/conda/lib/python3.10/site-packages/omegaconf/dictconfig.py", line 355, in __getattr__ self._format_and_raise( File "/opt/conda/lib/python3.10/site-packages/omegaconf/base.py", line 231, in _format_and_raise format_and_raise( File "/opt/conda/lib/python3.10/site-packages/omegaconf/_utils.py", line 899, in format_and_raise _raise(ex, cause) File "/opt/conda/lib/python3.10/site-packages/omegaconf/_utils.py", line 797, in _raise raise ex.with_traceback(sys.exc_info()[2]) # set env var OC_CAUSE=1 for full trace File 
"/opt/conda/lib/python3.10/site-packages/omegaconf/dictconfig.py", line 351, in __getattr__ return self._get_impl( File "/opt/conda/lib/python3.10/site-packages/omegaconf/dictconfig.py", line 442, in _get_impl node = self._get_child( File "/opt/conda/lib/python3.10/site-packages/omegaconf/basecontainer.py", line 73, in _get_child child = self._get_node( File "/opt/conda/lib/python3.10/site-packages/omegaconf/dictconfig.py", line 480, in _get_node raise ConfigKeyError(f"Missing key {key!s}") omegaconf.errors.ConfigAttributeError: Missing key count full_key: generator.count object_type=dict [2024-06-06 14:08:22,454][16833] Unhandled exception Missing key count full_key: generator.count object_type=dict in evt loop rollout_proc29_evt_loop [2024-06-06 14:08:22,455][16816] EvtLoop [rollout_proc11_evt_loop, process=rollout_proc11] unhandled exception in slot='init' connected to emitter=Emitter(object_id='Sampler', signal_name='_inference_workers_initialized'), args=() Traceback (most recent call last): File "/opt/conda/lib/python3.10/site-packages/signal_slot/signal_slot.py", line 355, in _process_signal slot_callable(*args) File "/workspace/metta/third_party/sample_factory/sample_factory/algo/sampling/rollout_worker.py", line 150, in init env_runner.init(self.timing) File "/workspace/metta/third_party/sample_factory/sample_factory/algo/sampling/non_batched_sampling.py", line 418, in init self._reset() File "/workspace/metta/third_party/sample_factory/sample_factory/algo/sampling/non_batched_sampling.py", line 430, in _reset observations, info = e.reset(seed=seed) # new way of doing seeding since Gym 0.26.0 File "/opt/conda/lib/python3.10/site-packages/gymnasium/core.py", line 467, in reset return self.env.reset(seed=seed, options=options) File "/workspace/metta/rl_framework/sample_factory/sample_factory_env_wrapper.py", line 46, in reset return self.gym_env.reset() File "/workspace/metta/env/griddly/mettagrid/gym_env.py", line 40, in reset self._compute_max_energy() 
File "/workspace/metta/env/griddly/mettagrid/gym_env.py", line 64, in _compute_max_energy max_resources = self._game_builder.object_configs.generator.count * min( File "/opt/conda/lib/python3.10/site-packages/omegaconf/dictconfig.py", line 355, in __getattr__ self._format_and_raise( File "/opt/conda/lib/python3.10/site-packages/omegaconf/base.py", line 231, in _format_and_raise format_and_raise( File "/opt/conda/lib/python3.10/site-packages/omegaconf/_utils.py", line 899, in format_and_raise _raise(ex, cause) File "/opt/conda/lib/python3.10/site-packages/omegaconf/_utils.py", line 797, in _raise raise ex.with_traceback(sys.exc_info()[2]) # set env var OC_CAUSE=1 for full trace File "/opt/conda/lib/python3.10/site-packages/omegaconf/dictconfig.py", line 351, in __getattr__ return self._get_impl( File "/opt/conda/lib/python3.10/site-packages/omegaconf/dictconfig.py", line 442, in _get_impl node = self._get_child( File "/opt/conda/lib/python3.10/site-packages/omegaconf/basecontainer.py", line 73, in _get_child child = self._get_node( File "/opt/conda/lib/python3.10/site-packages/omegaconf/dictconfig.py", line 480, in _get_node raise ConfigKeyError(f"Missing key {key!s}") omegaconf.errors.ConfigAttributeError: Missing key count full_key: generator.count object_type=dict [2024-06-06 14:08:22,456][16816] Unhandled exception Missing key count full_key: generator.count object_type=dict in evt loop rollout_proc11_evt_loop [2024-06-06 14:08:22,464][16805] EvtLoop [rollout_proc3_evt_loop, process=rollout_proc3] unhandled exception in slot='init' connected to emitter=Emitter(object_id='Sampler', signal_name='_inference_workers_initialized'), args=() Traceback (most recent call last): File "/opt/conda/lib/python3.10/site-packages/signal_slot/signal_slot.py", line 355, in _process_signal slot_callable(*args) File "/workspace/metta/third_party/sample_factory/sample_factory/algo/sampling/rollout_worker.py", line 150, in init env_runner.init(self.timing) File 
"/workspace/metta/third_party/sample_factory/sample_factory/algo/sampling/non_batched_sampling.py", line 418, in init self._reset() File "/workspace/metta/third_party/sample_factory/sample_factory/algo/sampling/non_batched_sampling.py", line 430, in _reset observations, info = e.reset(seed=seed) # new way of doing seeding since Gym 0.26.0 File "/opt/conda/lib/python3.10/site-packages/gymnasium/core.py", line 467, in reset return self.env.reset(seed=seed, options=options) File "/workspace/metta/rl_framework/sample_factory/sample_factory_env_wrapper.py", line 46, in reset return self.gym_env.reset() File "/workspace/metta/env/griddly/mettagrid/gym_env.py", line 40, in reset self._compute_max_energy() File "/workspace/metta/env/griddly/mettagrid/gym_env.py", line 64, in _compute_max_energy max_resources = self._game_builder.object_configs.generator.count * min( File "/opt/conda/lib/python3.10/site-packages/omegaconf/dictconfig.py", line 355, in __getattr__ self._format_and_raise( File "/opt/conda/lib/python3.10/site-packages/omegaconf/base.py", line 231, in _format_and_raise format_and_raise( File "/opt/conda/lib/python3.10/site-packages/omegaconf/_utils.py", line 899, in format_and_raise _raise(ex, cause) File "/opt/conda/lib/python3.10/site-packages/omegaconf/_utils.py", line 797, in _raise raise ex.with_traceback(sys.exc_info()[2]) # set env var OC_CAUSE=1 for full trace File "/opt/conda/lib/python3.10/site-packages/omegaconf/dictconfig.py", line 351, in __getattr__ return self._get_impl( File "/opt/conda/lib/python3.10/site-packages/omegaconf/dictconfig.py", line 442, in _get_impl node = self._get_child( File "/opt/conda/lib/python3.10/site-packages/omegaconf/basecontainer.py", line 73, in _get_child child = self._get_node( File "/opt/conda/lib/python3.10/site-packages/omegaconf/dictconfig.py", line 480, in _get_node raise ConfigKeyError(f"Missing key {key!s}") omegaconf.errors.ConfigAttributeError: Missing key count full_key: generator.count object_type=dict 
[2024-06-06 14:08:22,465][16805] Unhandled exception Missing key count full_key: generator.count object_type=dict in evt loop rollout_proc3_evt_loop [2024-06-06 14:08:22,464][16830] EvtLoop [rollout_proc28_evt_loop, process=rollout_proc28] unhandled exception in slot='init' connected to emitter=Emitter(object_id='Sampler', signal_name='_inference_workers_initialized'), args=() Traceback (most recent call last): File "/opt/conda/lib/python3.10/site-packages/signal_slot/signal_slot.py", line 355, in _process_signal slot_callable(*args) File "/workspace/metta/third_party/sample_factory/sample_factory/algo/sampling/rollout_worker.py", line 150, in init env_runner.init(self.timing) File "/workspace/metta/third_party/sample_factory/sample_factory/algo/sampling/non_batched_sampling.py", line 418, in init self._reset() File "/workspace/metta/third_party/sample_factory/sample_factory/algo/sampling/non_batched_sampling.py", line 430, in _reset observations, info = e.reset(seed=seed) # new way of doing seeding since Gym 0.26.0 File "/opt/conda/lib/python3.10/site-packages/gymnasium/core.py", line 467, in reset return self.env.reset(seed=seed, options=options) File "/workspace/metta/rl_framework/sample_factory/sample_factory_env_wrapper.py", line 46, in reset return self.gym_env.reset() File "/workspace/metta/env/griddly/mettagrid/gym_env.py", line 40, in reset self._compute_max_energy() File "/workspace/metta/env/griddly/mettagrid/gym_env.py", line 64, in _compute_max_energy max_resources = self._game_builder.object_configs.generator.count * min( File "/opt/conda/lib/python3.10/site-packages/omegaconf/dictconfig.py", line 355, in __getattr__ self._format_and_raise( File "/opt/conda/lib/python3.10/site-packages/omegaconf/base.py", line 231, in _format_and_raise format_and_raise( File "/opt/conda/lib/python3.10/site-packages/omegaconf/_utils.py", line 899, in format_and_raise _raise(ex, cause) File "/opt/conda/lib/python3.10/site-packages/omegaconf/_utils.py", line 797, in 
_raise raise ex.with_traceback(sys.exc_info()[2]) # set env var OC_CAUSE=1 for full trace File "/opt/conda/lib/python3.10/site-packages/omegaconf/dictconfig.py", line 351, in __getattr__ return self._get_impl( File "/opt/conda/lib/python3.10/site-packages/omegaconf/dictconfig.py", line 442, in _get_impl node = self._get_child( File "/opt/conda/lib/python3.10/site-packages/omegaconf/basecontainer.py", line 73, in _get_child child = self._get_node( File "/opt/conda/lib/python3.10/site-packages/omegaconf/dictconfig.py", line 480, in _get_node raise ConfigKeyError(f"Missing key {key!s}") omegaconf.errors.ConfigAttributeError: Missing key count full_key: generator.count object_type=dict [2024-06-06 14:08:22,466][16830] Unhandled exception Missing key count full_key: generator.count object_type=dict in evt loop rollout_proc28_evt_loop [2024-06-06 14:08:22,469][16823] EvtLoop [rollout_proc22_evt_loop, process=rollout_proc22] unhandled exception in slot='init' connected to emitter=Emitter(object_id='Sampler', signal_name='_inference_workers_initialized'), args=() Traceback (most recent call last): File "/opt/conda/lib/python3.10/site-packages/signal_slot/signal_slot.py", line 355, in _process_signal slot_callable(*args) File "/workspace/metta/third_party/sample_factory/sample_factory/algo/sampling/rollout_worker.py", line 150, in init env_runner.init(self.timing) File "/workspace/metta/third_party/sample_factory/sample_factory/algo/sampling/non_batched_sampling.py", line 418, in init self._reset() File "/workspace/metta/third_party/sample_factory/sample_factory/algo/sampling/non_batched_sampling.py", line 430, in _reset observations, info = e.reset(seed=seed) # new way of doing seeding since Gym 0.26.0 File "/opt/conda/lib/python3.10/site-packages/gymnasium/core.py", line 467, in reset return self.env.reset(seed=seed, options=options) File "/workspace/metta/rl_framework/sample_factory/sample_factory_env_wrapper.py", line 46, in reset return self.gym_env.reset() File 
"/workspace/metta/env/griddly/mettagrid/gym_env.py", line 40, in reset
    self._compute_max_energy()
  File "/workspace/metta/env/griddly/mettagrid/gym_env.py", line 64, in _compute_max_energy
    max_resources = self._game_builder.object_configs.generator.count * min(
  File "/opt/conda/lib/python3.10/site-packages/omegaconf/dictconfig.py", line 355, in __getattr__
    self._format_and_raise(
  File "/opt/conda/lib/python3.10/site-packages/omegaconf/base.py", line 231, in _format_and_raise
    format_and_raise(
  File "/opt/conda/lib/python3.10/site-packages/omegaconf/_utils.py", line 899, in format_and_raise
    _raise(ex, cause)
  File "/opt/conda/lib/python3.10/site-packages/omegaconf/_utils.py", line 797, in _raise
    raise ex.with_traceback(sys.exc_info()[2])  # set env var OC_CAUSE=1 for full trace
  File "/opt/conda/lib/python3.10/site-packages/omegaconf/dictconfig.py", line 351, in __getattr__
    return self._get_impl(
  File "/opt/conda/lib/python3.10/site-packages/omegaconf/dictconfig.py", line 442, in _get_impl
    node = self._get_child(
  File "/opt/conda/lib/python3.10/site-packages/omegaconf/basecontainer.py", line 73, in _get_child
    child = self._get_node(
  File "/opt/conda/lib/python3.10/site-packages/omegaconf/dictconfig.py", line 480, in _get_node
    raise ConfigKeyError(f"Missing key {key!s}")
omegaconf.errors.ConfigAttributeError: Missing key count
    full_key: generator.count
    object_type=dict
[2024-06-06 14:08:22,472][16823] Unhandled exception Missing key count full_key: generator.count object_type=dict in evt loop rollout_proc22_evt_loop
[2024-06-06 14:08:22,471][16809] EvtLoop [rollout_proc7_evt_loop, process=rollout_proc7] unhandled exception in slot='init' connected to emitter=Emitter(object_id='Sampler', signal_name='_inference_workers_initialized'), args=()
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/signal_slot/signal_slot.py", line 355, in _process_signal
    slot_callable(*args)
  File "/workspace/metta/third_party/sample_factory/sample_factory/algo/sampling/rollout_worker.py", line 150, in init
    env_runner.init(self.timing)
  File "/workspace/metta/third_party/sample_factory/sample_factory/algo/sampling/non_batched_sampling.py", line 418, in init
    self._reset()
  File "/workspace/metta/third_party/sample_factory/sample_factory/algo/sampling/non_batched_sampling.py", line 430, in _reset
    observations, info = e.reset(seed=seed)  # new way of doing seeding since Gym 0.26.0
  File "/opt/conda/lib/python3.10/site-packages/gymnasium/core.py", line 467, in reset
    return self.env.reset(seed=seed, options=options)
  File "/workspace/metta/rl_framework/sample_factory/sample_factory_env_wrapper.py", line 46, in reset
    return self.gym_env.reset()
  File "/workspace/metta/env/griddly/mettagrid/gym_env.py", line 40, in reset
    self._compute_max_energy()
  File "/workspace/metta/env/griddly/mettagrid/gym_env.py", line 64, in _compute_max_energy
    max_resources = self._game_builder.object_configs.generator.count * min(
  File "/opt/conda/lib/python3.10/site-packages/omegaconf/dictconfig.py", line 355, in __getattr__
    self._format_and_raise(
  File "/opt/conda/lib/python3.10/site-packages/omegaconf/base.py", line 231, in _format_and_raise
    format_and_raise(
  File "/opt/conda/lib/python3.10/site-packages/omegaconf/_utils.py", line 899, in format_and_raise
    _raise(ex, cause)
  File "/opt/conda/lib/python3.10/site-packages/omegaconf/_utils.py", line 797, in _raise
    raise ex.with_traceback(sys.exc_info()[2])  # set env var OC_CAUSE=1 for full trace
  File "/opt/conda/lib/python3.10/site-packages/omegaconf/dictconfig.py", line 351, in __getattr__
    return self._get_impl(
  File "/opt/conda/lib/python3.10/site-packages/omegaconf/dictconfig.py", line 442, in _get_impl
    node = self._get_child(
  File "/opt/conda/lib/python3.10/site-packages/omegaconf/basecontainer.py", line 73, in _get_child
    child = self._get_node(
  File "/opt/conda/lib/python3.10/site-packages/omegaconf/dictconfig.py", line 480, in _get_node
    raise ConfigKeyError(f"Missing key {key!s}")
omegaconf.errors.ConfigAttributeError: Missing key count
    full_key: generator.count
    object_type=dict
[2024-06-06 14:08:22,473][16809] Unhandled exception Missing key count full_key: generator.count object_type=dict in evt loop rollout_proc7_evt_loop
[2024-06-06 14:08:22,475][16804] Unhandled exception Missing key count full_key: generator.count object_type=dict in evt loop rollout_proc2_evt_loop
[2024-06-06 14:08:22,476][16802] Unhandled exception Missing key count full_key: generator.count object_type=dict in evt loop rollout_proc1_evt_loop
[2024-06-06 14:08:22,481][16806] Unhandled exception Missing key count full_key: generator.count object_type=dict in evt loop rollout_proc4_evt_loop
[2024-06-06 14:08:22,483][16811] Unhandled exception Missing key count full_key: generator.count object_type=dict in evt loop rollout_proc9_evt_loop
[2024-06-06 14:08:22,484][16807] Unhandled exception Missing key count full_key: generator.count object_type=dict in evt loop rollout_proc5_evt_loop
[2024-06-06 14:08:22,485][16814] Unhandled exception Missing key count full_key: generator.count object_type=dict in evt loop rollout_proc13_evt_loop
[2024-06-06 14:08:22,485][16813] Unhandled exception Missing key count full_key: generator.count object_type=dict in evt loop rollout_proc12_evt_loop
[2024-06-06 14:08:22,485][16812] Unhandled exception Missing key count full_key: generator.count object_type=dict in evt loop rollout_proc10_evt_loop
[2024-06-06 14:08:22,486][16810] Unhandled exception Missing key count full_key: generator.count object_type=dict in evt loop rollout_proc8_evt_loop
[2024-06-06 14:08:22,487][16801] Unhandled exception Missing key count full_key: generator.count object_type=dict in evt loop rollout_proc0_evt_loop
[2024-06-06 14:08:22,488][16808] Unhandled exception Missing key count full_key: generator.count object_type=dict in evt loop rollout_proc6_evt_loop
[2024-06-06 14:08:22,489][16818] Unhandled exception Missing key count full_key: generator.count object_type=dict in evt loop rollout_proc15_evt_loop
[2024-06-06 14:08:22,489][16817] Unhandled exception Missing key count full_key: generator.count object_type=dict in evt loop rollout_proc14_evt_loop
[2024-06-06 14:08:22,523][16815] EvtLoop [rollout_proc17_evt_loop, process=rollout_proc17] unhandled exception in slot='init' connected to emitter=Emitter(object_id='Sampler', signal_name='_inference_workers_initialized'), args=()
[2024-06-06 14:08:22,523][16828] EvtLoop [rollout_proc27_evt_loop, process=rollout_proc27] unhandled exception in slot='init' connected to emitter=Emitter(object_id='Sampler', signal_name='_inference_workers_initialized'), args=() Traceback (most recent call last): File "/opt/conda/lib/python3.10/site-packages/signal_slot/signal_slot.py", line 355, in _process_signal slot_callable(*args) File "/workspace/metta/third_party/sample_factory/sample_factory/algo/sampling/rollout_worker.py", line 150, in init env_runner.init(self.timing) File "/workspace/metta/third_party/sample_factory/sample_factory/algo/sampling/non_batched_sampling.py", line 418, in init self._reset() File "/workspace/metta/third_party/sample_factory/sample_factory/algo/sampling/non_batched_sampling.py", line 430, in _reset observations, info = e.reset(seed=seed) # new way of doing seeding since Gym 0.26.0 File "/opt/conda/lib/python3.10/site-packages/gymnasium/core.py", line 467, in reset return self.env.reset(seed=seed, options=options) File "/workspace/metta/rl_framework/sample_factory/sample_factory_env_wrapper.py", line 46, in reset return self.gym_env.reset() File "/workspace/metta/env/griddly/mettagrid/gym_env.py", line 40, in reset self._compute_max_energy() File "/workspace/metta/env/griddly/mettagrid/gym_env.py", line 64, in _compute_max_energy max_resources = self._game_builder.object_configs.generator.count * min( File "/opt/conda/lib/python3.10/site-packages/omegaconf/dictconfig.py", line 355, in __getattr__ self._format_and_raise( File "/opt/conda/lib/python3.10/site-packages/omegaconf/base.py", line 231, in _format_and_raise format_and_raise( File "/opt/conda/lib/python3.10/site-packages/omegaconf/_utils.py", line 899, in format_and_raise _raise(ex, cause) File "/opt/conda/lib/python3.10/site-packages/omegaconf/_utils.py", line 797, in _raise raise ex.with_traceback(sys.exc_info()[2]) # set env var OC_CAUSE=1 for full trace File 
"/opt/conda/lib/python3.10/site-packages/omegaconf/dictconfig.py", line 351, in __getattr__ return self._get_impl( File "/opt/conda/lib/python3.10/site-packages/omegaconf/dictconfig.py", line 442, in _get_impl node = self._get_child( File "/opt/conda/lib/python3.10/site-packages/omegaconf/basecontainer.py", line 73, in _get_child child = self._get_node( File "/opt/conda/lib/python3.10/site-packages/omegaconf/dictconfig.py", line 480, in _get_node raise ConfigKeyError(f"Missing key {key!s}") omegaconf.errors.ConfigAttributeError: Missing key count full_key: generator.count object_type=dict [2024-06-06 14:08:22,526][16815] Unhandled exception Missing key count full_key: generator.count object_type=dict in evt loop rollout_proc17_evt_loop [2024-06-06 14:08:22,526][16828] Unhandled exception Missing key count full_key: generator.count object_type=dict in evt loop rollout_proc27_evt_loop [2024-06-06 14:08:22,570][16569] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 409174016. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-06-06 14:08:27,570][16569] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 409174016. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-06-06 14:16:32,324][19065] Saving configuration to /workspace/metta/train_dir/p2.metta.4/config.json... 
[2024-06-06 14:16:32,341][19065] Rollout worker 0 uses device cpu
[2024-06-06 14:16:32,341][19065] Rollout worker 1 uses device cpu
[2024-06-06 14:16:32,342][19065] Rollout worker 2 uses device cpu
[2024-06-06 14:16:32,342][19065] Rollout worker 3 uses device cpu
[2024-06-06 14:16:32,342][19065] Rollout worker 4 uses device cpu
[2024-06-06 14:16:32,342][19065] Rollout worker 5 uses device cpu
[2024-06-06 14:16:32,343][19065] Rollout worker 6 uses device cpu
[2024-06-06 14:16:32,343][19065] Rollout worker 7 uses device cpu
[2024-06-06 14:16:32,343][19065] Rollout worker 8 uses device cpu
[2024-06-06 14:16:32,343][19065] Rollout worker 9 uses device cpu
[2024-06-06 14:16:32,344][19065] Rollout worker 10 uses device cpu
[2024-06-06 14:16:32,344][19065] Rollout worker 11 uses device cpu
[2024-06-06 14:16:32,344][19065] Rollout worker 12 uses device cpu
[2024-06-06 14:16:32,344][19065] Rollout worker 13 uses device cpu
[2024-06-06 14:16:32,345][19065] Rollout worker 14 uses device cpu
[2024-06-06 14:16:32,345][19065] Rollout worker 15 uses device cpu
[2024-06-06 14:16:32,345][19065] Rollout worker 16 uses device cpu
[2024-06-06 14:16:32,345][19065] Rollout worker 17 uses device cpu
[2024-06-06 14:16:32,345][19065] Rollout worker 18 uses device cpu
[2024-06-06 14:16:32,345][19065] Rollout worker 19 uses device cpu
[2024-06-06 14:16:32,345][19065] Rollout worker 20 uses device cpu
[2024-06-06 14:16:32,346][19065] Rollout worker 21 uses device cpu
[2024-06-06 14:16:32,346][19065] Rollout worker 22 uses device cpu
[2024-06-06 14:16:32,346][19065] Rollout worker 23 uses device cpu
[2024-06-06 14:16:32,346][19065] Rollout worker 24 uses device cpu
[2024-06-06 14:16:32,346][19065] Rollout worker 25 uses device cpu
[2024-06-06 14:16:32,346][19065] Rollout worker 26 uses device cpu
[2024-06-06 14:16:32,346][19065] Rollout worker 27 uses device cpu
[2024-06-06 14:16:32,346][19065] Rollout worker 28 uses device cpu
[2024-06-06 14:16:32,346][19065] Rollout worker 29 uses device cpu
[2024-06-06 14:16:32,347][19065] Rollout worker 30 uses device cpu
[2024-06-06 14:16:32,347][19065] Rollout worker 31 uses device cpu
[2024-06-06 14:16:32,861][19065] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-06-06 14:16:32,861][19065] InferenceWorker_p0-w0: min num requests: 10
[2024-06-06 14:16:32,915][19065] Starting all processes...
[2024-06-06 14:16:32,916][19065] Starting process learner_proc0
[2024-06-06 14:16:33,181][19065] Starting all processes...
[2024-06-06 14:16:33,183][19065] Starting process inference_proc0-0
[2024-06-06 14:16:33,183][19065] Starting process rollout_proc0
[2024-06-06 14:16:33,183][19065] Starting process rollout_proc1
[2024-06-06 14:16:33,184][19065] Starting process rollout_proc2
[2024-06-06 14:16:33,184][19065] Starting process rollout_proc3
[2024-06-06 14:16:33,184][19065] Starting process rollout_proc4
[2024-06-06 14:16:33,184][19065] Starting process rollout_proc5
[2024-06-06 14:16:33,184][19065] Starting process rollout_proc6
[2024-06-06 14:16:33,185][19065] Starting process rollout_proc7
[2024-06-06 14:16:33,186][19065] Starting process rollout_proc8
[2024-06-06 14:16:33,186][19065] Starting process rollout_proc9
[2024-06-06 14:16:33,186][19065] Starting process rollout_proc10
[2024-06-06 14:16:33,186][19065] Starting process rollout_proc11
[2024-06-06 14:16:33,186][19065] Starting process rollout_proc12
[2024-06-06 14:16:33,186][19065] Starting process rollout_proc13
[2024-06-06 14:16:33,186][19065] Starting process rollout_proc14
[2024-06-06 14:16:33,187][19065] Starting process rollout_proc15
[2024-06-06 14:16:33,190][19065] Starting process rollout_proc16
[2024-06-06 14:16:33,191][19065] Starting process rollout_proc17
[2024-06-06 14:16:33,192][19065] Starting process rollout_proc18
[2024-06-06 14:16:33,193][19065] Starting process rollout_proc19
[2024-06-06 14:16:33,193][19065] Starting process rollout_proc20
[2024-06-06 14:16:33,196][19065] Starting process rollout_proc21
[2024-06-06 14:16:33,196][19065] Starting process rollout_proc22
[2024-06-06 14:16:33,197][19065] Starting process rollout_proc23
[2024-06-06 14:16:33,201][19065] Starting process rollout_proc24
[2024-06-06 14:16:33,203][19065] Starting process rollout_proc25
[2024-06-06 14:16:33,204][19065] Starting process rollout_proc26
[2024-06-06 14:16:33,207][19065] Starting process rollout_proc27
[2024-06-06 14:16:33,211][19065] Starting process rollout_proc28
[2024-06-06 14:16:33,212][19065] Starting process rollout_proc29
[2024-06-06 14:16:33,215][19065] Starting process rollout_proc30
[2024-06-06 14:16:33,215][19065] Starting process rollout_proc31
[2024-06-06 14:16:35,241][19277] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-06-06 14:16:35,241][19277] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2024-06-06 14:16:35,251][19277] Num visible devices: 1
[2024-06-06 14:16:35,268][19277] Setting fixed seed 0
[2024-06-06 14:16:35,270][19277] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-06-06 14:16:35,270][19277] Initializing actor-critic model on device cuda:0
[2024-06-06 14:16:35,327][19312] Worker 13 uses CPU cores [13]
[2024-06-06 14:16:35,360][19306] Worker 8 uses CPU cores [8]
[2024-06-06 14:16:35,368][19301] Worker 2 uses CPU cores [2]
[2024-06-06 14:16:35,376][19302] Worker 3 uses CPU cores [3]
[2024-06-06 14:16:35,387][19324] Worker 25 uses CPU cores [25]
[2024-06-06 14:16:35,389][19305] Worker 7 uses CPU cores [7]
[2024-06-06 14:16:35,404][19326] Worker 29 uses CPU cores [29]
[2024-06-06 14:16:35,436][19318] Worker 22 uses CPU cores [22]
[2024-06-06 14:16:35,448][19320] Worker 23 uses CPU cores [23]
[2024-06-06 14:16:35,455][19311] Worker 10 uses CPU cores [10]
[2024-06-06 14:16:35,460][19316] Worker 16 uses CPU cores [16]
[2024-06-06 14:16:35,469][19313] Worker 14 uses CPU cores [14]
[2024-06-06 14:16:35,476][19319] Worker 20 uses CPU cores [20]
[2024-06-06 14:16:35,488][19308] Worker 9 uses CPU cores [9]
[2024-06-06 14:16:35,500][19325] Worker 28 uses CPU cores [28]
[2024-06-06 14:16:35,521][19300] Worker 4 uses CPU cores [4]
[2024-06-06 14:16:35,528][19321] Worker 24 uses CPU cores [24]
[2024-06-06 14:16:35,552][19304] Worker 6 uses CPU cores [6]
[2024-06-06 14:16:35,552][19303] Worker 5 uses CPU cores [5]
[2024-06-06 14:16:35,573][19297] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-06-06 14:16:35,573][19297] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2024-06-06 14:16:35,576][19317] Worker 19 uses CPU cores [19]
[2024-06-06 14:16:35,580][19297] Num visible devices: 1
[2024-06-06 14:16:35,588][19309] Worker 12 uses CPU cores [12]
[2024-06-06 14:16:35,597][19315] Worker 18 uses CPU cores [18]
[2024-06-06 14:16:35,603][19327] Worker 26 uses CPU cores [26]
[2024-06-06 14:16:35,632][19307] Worker 11 uses CPU cores [11]
[2024-06-06 14:16:35,648][19299] Worker 1 uses CPU cores [1]
[2024-06-06 14:16:35,680][19328] Worker 31 uses CPU cores [31]
[2024-06-06 14:16:35,684][19322] Worker 21 uses CPU cores [21]
[2024-06-06 14:16:35,696][19298] Worker 0 uses CPU cores [0]
[2024-06-06 14:16:35,704][19329] Worker 30 uses CPU cores [30]
[2024-06-06 14:16:35,759][19314] Worker 15 uses CPU cores [15]
[2024-06-06 14:16:35,768][19310] Worker 17 uses CPU cores [17]
[2024-06-06 14:16:35,864][19323] Worker 27 uses CPU cores [27]
[2024-06-06 14:16:36,102][19277] RunningMeanStd input shape: (11, 11)
[2024-06-06 14:16:36,102][19277] RunningMeanStd input shape: (11, 11)
[2024-06-06 14:16:36,102][19277] RunningMeanStd input shape: (11, 11)
[2024-06-06 14:16:36,103][19277] RunningMeanStd input shape: (11, 11)
[2024-06-06 14:16:36,103][19277] RunningMeanStd input shape: (11, 11)
[2024-06-06 14:16:36,103][19277] RunningMeanStd input shape: (11, 11)
[2024-06-06 14:16:36,103][19277] RunningMeanStd input shape: (11, 11)
[2024-06-06 14:16:36,103][19277] RunningMeanStd input shape: (11, 11)
[2024-06-06 14:16:36,103][19277] RunningMeanStd input shape: (11, 11)
[2024-06-06 14:16:36,103][19277] RunningMeanStd input shape: (11, 11)
[2024-06-06 14:16:36,103][19277] RunningMeanStd input shape: (11, 11)
[2024-06-06 14:16:36,103][19277] RunningMeanStd input shape: (11, 11)
[2024-06-06 14:16:36,103][19277] RunningMeanStd input shape: (11, 11)
[2024-06-06 14:16:36,103][19277] RunningMeanStd input shape: (11, 11)
[2024-06-06 14:16:36,103][19277] RunningMeanStd input shape: (11, 11)
[2024-06-06 14:16:36,103][19277] RunningMeanStd input shape: (11, 11)
[2024-06-06 14:16:36,103][19277] RunningMeanStd input shape: (11, 11)
[2024-06-06 14:16:36,103][19277] RunningMeanStd input shape: (11, 11)
[2024-06-06 14:16:36,103][19277] RunningMeanStd input shape: (11, 11)
[2024-06-06 14:16:36,103][19277] RunningMeanStd input shape: (11, 11)
[2024-06-06 14:16:36,103][19277] RunningMeanStd input shape: (11, 11)
[2024-06-06 14:16:36,103][19277] RunningMeanStd input shape: (11, 11)
[2024-06-06 14:16:36,106][19277] RunningMeanStd input shape: (1,)
[2024-06-06 14:16:36,107][19277] RunningMeanStd input shape: (1,)
[2024-06-06 14:16:36,107][19277] RunningMeanStd input shape: (1,)
[2024-06-06 14:16:36,107][19277] RunningMeanStd input shape: (1,)
[2024-06-06 14:16:36,145][19277] RunningMeanStd input shape: (1,)
[2024-06-06 14:16:36,150][19277] Created Actor Critic model with architecture:
[2024-06-06 14:16:36,150][19277] SampleFactoryAgentWrapper(
  (obs_normalizer): ObservationNormalizer()
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (agent): MettaAgent(
    (_encoder): MultiFeatureSetEncoder(
      (feature_set_encoders): ModuleDict(
        (grid_obs): FeatureSetEncoder(
          (_normalizer): FeatureListNormalizer(
            (_norms_dict): ModuleDict(
              (agent): RunningMeanStdInPlace()
              (altar): RunningMeanStdInPlace()
              (converter): RunningMeanStdInPlace()
              (generator): RunningMeanStdInPlace()
              (wall): RunningMeanStdInPlace()
              (agent:dir): RunningMeanStdInPlace()
              (agent:energy): RunningMeanStdInPlace()
              (agent:frozen): RunningMeanStdInPlace()
              (agent:hp): RunningMeanStdInPlace()
              (agent:id): RunningMeanStdInPlace()
              (agent:inv_r1): RunningMeanStdInPlace()
              (agent:inv_r2): RunningMeanStdInPlace()
              (agent:inv_r3): RunningMeanStdInPlace()
              (agent:shield): RunningMeanStdInPlace()
              (altar:hp): RunningMeanStdInPlace()
              (altar:state): RunningMeanStdInPlace()
              (converter:hp): RunningMeanStdInPlace()
              (converter:state): RunningMeanStdInPlace()
              (generator:amount): RunningMeanStdInPlace()
              (generator:hp): RunningMeanStdInPlace()
              (generator:state): RunningMeanStdInPlace()
              (wall:hp): RunningMeanStdInPlace()
            )
          )
          (embedding_net): Sequential(
            (0): Linear(in_features=125, out_features=512, bias=True)
            (1): ELU(alpha=1.0)
            (2): Linear(in_features=512, out_features=512, bias=True)
            (3): ELU(alpha=1.0)
            (4): Linear(in_features=512, out_features=512, bias=True)
            (5): ELU(alpha=1.0)
            (6): Linear(in_features=512, out_features=512, bias=True)
            (7): ELU(alpha=1.0)
          )
        )
        (global_vars): FeatureSetEncoder(
          (_normalizer): FeatureListNormalizer(
            (_norms_dict): ModuleDict(
              (_steps): RunningMeanStdInPlace()
            )
          )
          (embedding_net): Sequential(
            (0): Linear(in_features=5, out_features=8, bias=True)
            (1): ELU(alpha=1.0)
            (2): Linear(in_features=8, out_features=8, bias=True)
            (3): ELU(alpha=1.0)
          )
        )
        (last_action): FeatureSetEncoder(
          (_normalizer): FeatureListNormalizer(
            (_norms_dict): ModuleDict(
              (last_action_id): RunningMeanStdInPlace()
              (last_action_val): RunningMeanStdInPlace()
            )
          )
          (embedding_net): Sequential(
            (0): Linear(in_features=5, out_features=8, bias=True)
            (1): ELU(alpha=1.0)
            (2): Linear(in_features=8, out_features=8, bias=True)
            (3): ELU(alpha=1.0)
          )
        )
        (last_reward): FeatureSetEncoder(
          (_normalizer): FeatureListNormalizer(
            (_norms_dict): ModuleDict(
              (last_reward): RunningMeanStdInPlace()
            )
          )
          (embedding_net): Sequential(
            (0): Linear(in_features=5, out_features=8, bias=True)
            (1): ELU(alpha=1.0)
            (2): Linear(in_features=8, out_features=8, bias=True)
            (3): ELU(alpha=1.0)
          )
        )
      )
      (merged_encoder): Sequential(
        (0): Linear(in_features=536, out_features=512, bias=True)
        (1): ELU(alpha=1.0)
        (2): Linear(in_features=512, out_features=512, bias=True)
        (3): ELU(alpha=1.0)
        (4): Linear(in_features=512, out_features=512, bias=True)
        (5): ELU(alpha=1.0)
      )
    )
    (_core): ModelCoreRNN(
      (core): GRU(512, 512)
    )
    (_decoder): Decoder(
      (mlp): Identity()
    )
    (_critic_linear): Linear(in_features=512, out_features=1, bias=True)
    (_action_parameterization): ActionParameterizationDefault(
      (distribution_linear): Linear(in_features=512, out_features=16, bias=True)
    )
  )
)
[2024-06-06 14:16:36,214][19277] Using optimizer
[2024-06-06 14:16:36,398][19277] Loading state from checkpoint /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000024974_409174016.pth...
[2024-06-06 14:16:36,413][19277] Loading model from checkpoint
[2024-06-06 14:16:36,415][19277] Loaded experiment state at self.train_step=24974, self.env_steps=409174016
[2024-06-06 14:16:36,415][19277] Initialized policy 0 weights for model version 24974
[2024-06-06 14:16:36,416][19277] LearnerWorker_p0 finished initialization!
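Editor's note: the architecture printout is dominated by per-feature RunningMeanStdInPlace modules. A minimal sketch of the running-normalization idea behind them, using the standard parallel mean/variance update (this is a hypothetical simplification, not the actual Sample Factory class):

```python
import math

class RunningMeanStd:
    """Tracks mean/variance across batches and normalizes inputs.
    Hypothetical sketch of the concept, not the library implementation."""

    def __init__(self, eps: float = 1e-4):
        self.mean = 0.0
        self.var = 1.0
        self.count = eps  # small prior count avoids division by zero

    def update(self, batch):
        # Chan et al. parallel update: merge batch moments into running moments.
        n = len(batch)
        batch_mean = sum(batch) / n
        batch_var = sum((x - batch_mean) ** 2 for x in batch) / n
        delta = batch_mean - self.mean
        tot = self.count + n
        m2 = self.var * self.count + batch_var * n + delta ** 2 * self.count * n / tot
        self.mean = self.mean + delta * n / tot
        self.var = m2 / tot
        self.count = tot

    def normalize(self, x, eps: float = 1e-8):
        return (x - self.mean) / math.sqrt(self.var + eps)

rms = RunningMeanStd()
rms.update([1.0, 2.0, 3.0, 4.0])
print(rms.mean)  # close to the batch mean of 2.5
```

The "InPlace" variant in the log presumably applies this normalization directly to observation tensors; the `(11, 11)` shapes match the grid observation window, the `(1,)` shapes the scalar features.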
[2024-06-06 14:16:36,416][19277] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-06-06 14:16:37,127][19297] RunningMeanStd input shape: (11, 11)
[2024-06-06 14:16:37,128][19297] RunningMeanStd input shape: (11, 11)
[2024-06-06 14:16:37,128][19297] RunningMeanStd input shape: (11, 11)
[2024-06-06 14:16:37,128][19297] RunningMeanStd input shape: (11, 11)
[2024-06-06 14:16:37,128][19297] RunningMeanStd input shape: (11, 11)
[2024-06-06 14:16:37,128][19297] RunningMeanStd input shape: (11, 11)
[2024-06-06 14:16:37,128][19297] RunningMeanStd input shape: (11, 11)
[2024-06-06 14:16:37,128][19297] RunningMeanStd input shape: (11, 11)
[2024-06-06 14:16:37,128][19297] RunningMeanStd input shape: (11, 11)
[2024-06-06 14:16:37,128][19297] RunningMeanStd input shape: (11, 11)
[2024-06-06 14:16:37,128][19297] RunningMeanStd input shape: (11, 11)
[2024-06-06 14:16:37,128][19297] RunningMeanStd input shape: (11, 11)
[2024-06-06 14:16:37,128][19297] RunningMeanStd input shape: (11, 11)
[2024-06-06 14:16:37,128][19297] RunningMeanStd input shape: (11, 11)
[2024-06-06 14:16:37,128][19297] RunningMeanStd input shape: (11, 11)
[2024-06-06 14:16:37,128][19297] RunningMeanStd input shape: (11, 11)
[2024-06-06 14:16:37,128][19297] RunningMeanStd input shape: (11, 11)
[2024-06-06 14:16:37,128][19297] RunningMeanStd input shape: (11, 11)
[2024-06-06 14:16:37,128][19297] RunningMeanStd input shape: (11, 11)
[2024-06-06 14:16:37,128][19297] RunningMeanStd input shape: (11, 11)
[2024-06-06 14:16:37,129][19297] RunningMeanStd input shape: (11, 11)
[2024-06-06 14:16:37,129][19297] RunningMeanStd input shape: (11, 11)
[2024-06-06 14:16:37,132][19297] RunningMeanStd input shape: (1,)
[2024-06-06 14:16:37,132][19297] RunningMeanStd input shape: (1,)
[2024-06-06 14:16:37,132][19297] RunningMeanStd input shape: (1,)
[2024-06-06 14:16:37,132][19297] RunningMeanStd input shape: (1,)
[2024-06-06 14:16:37,171][19297] RunningMeanStd input shape: (1,)
[2024-06-06 14:16:37,193][19065] Inference worker 0-0 is ready!
[2024-06-06 14:16:37,193][19065] All inference workers are ready! Signal rollout workers to start!
[2024-06-06 14:16:39,681][19316] Decorrelating experience for 0 frames...
[2024-06-06 14:16:39,682][19310] Decorrelating experience for 0 frames...
[2024-06-06 14:16:39,683][19317] Decorrelating experience for 0 frames...
[2024-06-06 14:16:39,688][19320] Decorrelating experience for 0 frames...
[2024-06-06 14:16:39,689][19315] Decorrelating experience for 0 frames...
[2024-06-06 14:16:39,689][19328] Decorrelating experience for 0 frames...
[2024-06-06 14:16:39,691][19322] Decorrelating experience for 0 frames...
[2024-06-06 14:16:39,697][19324] Decorrelating experience for 0 frames...
[2024-06-06 14:16:39,698][19321] Decorrelating experience for 0 frames...
[2024-06-06 14:16:39,704][19318] Decorrelating experience for 0 frames...
[2024-06-06 14:16:39,714][19319] Decorrelating experience for 0 frames...
[2024-06-06 14:16:39,726][19329] Decorrelating experience for 0 frames...
[2024-06-06 14:16:39,727][19325] Decorrelating experience for 0 frames...
[2024-06-06 14:16:39,728][19326] Decorrelating experience for 0 frames...
[2024-06-06 14:16:39,749][19303] Decorrelating experience for 0 frames...
[2024-06-06 14:16:39,752][19314] Decorrelating experience for 0 frames...
[2024-06-06 14:16:39,754][19327] Decorrelating experience for 0 frames...
[2024-06-06 14:16:39,756][19302] Decorrelating experience for 0 frames...
[2024-06-06 14:16:39,763][19312] Decorrelating experience for 0 frames...
[2024-06-06 14:16:39,768][19305] Decorrelating experience for 0 frames...
[2024-06-06 14:16:39,770][19299] Decorrelating experience for 0 frames...
[2024-06-06 14:16:39,773][19308] Decorrelating experience for 0 frames...
[2024-06-06 14:16:39,775][19298] Decorrelating experience for 0 frames...
[2024-06-06 14:16:39,776][19307] Decorrelating experience for 0 frames...
[2024-06-06 14:16:39,779][19301] Decorrelating experience for 0 frames...
[2024-06-06 14:16:39,781][19306] Decorrelating experience for 0 frames...
[2024-06-06 14:16:39,782][19304] Decorrelating experience for 0 frames...
[2024-06-06 14:16:39,783][19311] Decorrelating experience for 0 frames...
[2024-06-06 14:16:39,783][19313] Decorrelating experience for 0 frames...
[2024-06-06 14:16:39,785][19300] Decorrelating experience for 0 frames...
[2024-06-06 14:16:39,786][19309] Decorrelating experience for 0 frames...
[2024-06-06 14:16:39,831][19323] Decorrelating experience for 0 frames...
[2024-06-06 14:16:40,005][19065] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 409174016. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-06-06 14:16:41,111][19310] Decorrelating experience for 256 frames...
[2024-06-06 14:16:41,127][19328] Decorrelating experience for 256 frames...
[2024-06-06 14:16:41,140][19315] Decorrelating experience for 256 frames...
[2024-06-06 14:16:41,141][19322] Decorrelating experience for 256 frames...
[2024-06-06 14:16:41,146][19316] Decorrelating experience for 256 frames...
[2024-06-06 14:16:41,154][19317] Decorrelating experience for 256 frames...
[2024-06-06 14:16:41,156][19320] Decorrelating experience for 256 frames...
[2024-06-06 14:16:41,183][19321] Decorrelating experience for 256 frames...
[2024-06-06 14:16:41,184][19324] Decorrelating experience for 256 frames...
[2024-06-06 14:16:41,185][19318] Decorrelating experience for 256 frames...
[2024-06-06 14:16:41,195][19329] Decorrelating experience for 256 frames...
[2024-06-06 14:16:41,202][19319] Decorrelating experience for 256 frames...
[2024-06-06 14:16:41,230][19302] Decorrelating experience for 256 frames...
[2024-06-06 14:16:41,233][19303] Decorrelating experience for 256 frames...
[2024-06-06 14:16:41,236][19326] Decorrelating experience for 256 frames...
[2024-06-06 14:16:41,244][19314] Decorrelating experience for 256 frames...
[2024-06-06 14:16:41,264][19325] Decorrelating experience for 256 frames...
[2024-06-06 14:16:41,269][19312] Decorrelating experience for 256 frames...
[2024-06-06 14:16:41,270][19307] Decorrelating experience for 256 frames...
[2024-06-06 14:16:41,272][19299] Decorrelating experience for 256 frames...
[2024-06-06 14:16:41,279][19305] Decorrelating experience for 256 frames...
[2024-06-06 14:16:41,283][19300] Decorrelating experience for 256 frames...
[2024-06-06 14:16:41,283][19298] Decorrelating experience for 256 frames...
[2024-06-06 14:16:41,286][19308] Decorrelating experience for 256 frames...
[2024-06-06 14:16:41,288][19313] Decorrelating experience for 256 frames...
[2024-06-06 14:16:41,289][19301] Decorrelating experience for 256 frames...
[2024-06-06 14:16:41,291][19306] Decorrelating experience for 256 frames...
[2024-06-06 14:16:41,296][19311] Decorrelating experience for 256 frames...
[2024-06-06 14:16:41,297][19309] Decorrelating experience for 256 frames...
[2024-06-06 14:16:41,298][19304] Decorrelating experience for 256 frames...
[2024-06-06 14:16:41,316][19327] Decorrelating experience for 256 frames...
[2024-06-06 14:16:41,354][19323] Decorrelating experience for 256 frames...
[2024-06-06 14:16:45,005][19065] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 409174016. Throughput: 0: 5004.2. Samples: 25020. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-06-06 14:16:48,074][19299] Worker 1, sleep for 4.688 sec to decorrelate experience collection
[2024-06-06 14:16:48,078][19314] Worker 15, sleep for 70.312 sec to decorrelate experience collection
[2024-06-06 14:16:48,098][19307] Worker 11, sleep for 51.562 sec to decorrelate experience collection
[2024-06-06 14:16:48,105][19311] Worker 10, sleep for 46.875 sec to decorrelate experience collection
[2024-06-06 14:16:48,125][19306] Worker 8, sleep for 37.500 sec to decorrelate experience collection
[2024-06-06 14:16:48,128][19313] Worker 14, sleep for 65.625 sec to decorrelate experience collection
[2024-06-06 14:16:48,136][19308] Worker 9, sleep for 42.188 sec to decorrelate experience collection
[2024-06-06 14:16:48,150][19309] Worker 12, sleep for 56.250 sec to decorrelate experience collection
[2024-06-06 14:16:48,150][19312] Worker 13, sleep for 60.938 sec to decorrelate experience collection
[2024-06-06 14:16:48,160][19302] Worker 3, sleep for 14.062 sec to decorrelate experience collection
[2024-06-06 14:16:48,186][19322] Worker 21, sleep for 98.438 sec to decorrelate experience collection
[2024-06-06 14:16:48,206][19277] Signal inference workers to stop experience collection...
[2024-06-06 14:16:48,212][19297] InferenceWorker_p0-w0: stopping experience collection
[2024-06-06 14:16:48,216][19304] Worker 6, sleep for 28.125 sec to decorrelate experience collection
[2024-06-06 14:16:48,221][19305] Worker 7, sleep for 32.812 sec to decorrelate experience collection
[2024-06-06 14:16:48,222][19320] Worker 23, sleep for 107.812 sec to decorrelate experience collection
[2024-06-06 14:16:48,816][19277] Signal inference workers to resume experience collection...
[2024-06-06 14:16:48,816][19297] InferenceWorker_p0-w0: resuming experience collection
[2024-06-06 14:16:48,832][19301] Worker 2, sleep for 9.375 sec to decorrelate experience collection
[2024-06-06 14:16:48,835][19317] Worker 19, sleep for 89.062 sec to decorrelate experience collection
[2024-06-06 14:16:48,841][19318] Worker 22, sleep for 103.125 sec to decorrelate experience collection
[2024-06-06 14:16:48,847][19328] Worker 31, sleep for 145.312 sec to decorrelate experience collection
[2024-06-06 14:16:48,852][19303] Worker 5, sleep for 23.438 sec to decorrelate experience collection
[2024-06-06 14:16:48,853][19316] Worker 16, sleep for 75.000 sec to decorrelate experience collection
[2024-06-06 14:16:49,049][19319] Worker 20, sleep for 93.750 sec to decorrelate experience collection
[2024-06-06 14:16:49,049][19310] Worker 17, sleep for 79.688 sec to decorrelate experience collection
[2024-06-06 14:16:49,055][19324] Worker 25, sleep for 117.188 sec to decorrelate experience collection
[2024-06-06 14:16:49,168][19321] Worker 24, sleep for 112.500 sec to decorrelate experience collection
[2024-06-06 14:16:49,233][19300] Worker 4, sleep for 18.750 sec to decorrelate experience collection
[2024-06-06 14:16:49,308][19315] Worker 18, sleep for 84.375 sec to decorrelate experience collection
[2024-06-06 14:16:49,313][19327] Worker 26, sleep for 121.875 sec to decorrelate experience collection
[2024-06-06 14:16:49,375][19325] Worker 28, sleep for 131.250 sec to decorrelate experience collection
[2024-06-06 14:16:49,434][19329] Worker 30, sleep for 140.625 sec to decorrelate experience collection
[2024-06-06 14:16:49,439][19323] Worker 27, sleep for 126.562 sec to decorrelate experience collection
[2024-06-06 14:16:49,528][19326] Worker 29, sleep for 135.938 sec to decorrelate experience collection
[2024-06-06 14:16:49,939][19297] Updated weights for policy 0, policy_version 24984 (0.0013)
[2024-06-06 14:16:50,005][19065] Fps is (10 sec: 16383.9, 60 sec: 16383.9, 300 sec: 16383.9). Total num frames: 409337856. Throughput: 0: 32737.7. Samples: 327380. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2024-06-06 14:16:52,785][19299] Worker 1 awakens!
[2024-06-06 14:16:52,857][19065] Heartbeat connected on Batcher_0
[2024-06-06 14:16:52,859][19065] Heartbeat connected on LearnerWorker_p0
[2024-06-06 14:16:52,865][19065] Heartbeat connected on RolloutWorker_w1
[2024-06-06 14:16:52,865][19065] Heartbeat connected on RolloutWorker_w0
[2024-06-06 14:16:52,916][19065] Heartbeat connected on InferenceWorker_p0-w0
[2024-06-06 14:16:55,005][19065] Fps is (10 sec: 16384.1, 60 sec: 10922.8, 300 sec: 10922.8). Total num frames: 409337856. Throughput: 0: 22104.3. Samples: 331560. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2024-06-06 14:16:58,254][19301] Worker 2 awakens!
[2024-06-06 14:16:58,261][19065] Heartbeat connected on RolloutWorker_w2
[2024-06-06 14:17:00,005][19065] Fps is (10 sec: 1638.4, 60 sec: 9011.3, 300 sec: 9011.3). Total num frames: 409354240. Throughput: 0: 17256.1. Samples: 345120. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2024-06-06 14:17:02,292][19302] Worker 3 awakens!
[2024-06-06 14:17:02,297][19065] Heartbeat connected on RolloutWorker_w3
[2024-06-06 14:17:05,005][19065] Fps is (10 sec: 3276.8, 60 sec: 7864.4, 300 sec: 7864.4). Total num frames: 409370624. Throughput: 0: 14778.5. Samples: 369460. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2024-06-06 14:17:08,077][19300] Worker 4 awakens!
[2024-06-06 14:17:08,084][19065] Heartbeat connected on RolloutWorker_w4
[2024-06-06 14:17:10,005][19065] Fps is (10 sec: 6553.6, 60 sec: 8192.0, 300 sec: 8192.0). Total num frames: 409419776. Throughput: 0: 12788.1. Samples: 383640. Policy #0 lag: (min: 0.0, avg: 4.3, max: 12.0)
[2024-06-06 14:17:10,005][19065] Avg episode reward: [(0, '0.197')]
[2024-06-06 14:17:12,388][19303] Worker 5 awakens!
[2024-06-06 14:17:12,393][19065] Heartbeat connected on RolloutWorker_w5
[2024-06-06 14:17:15,005][19065] Fps is (10 sec: 9830.5, 60 sec: 8426.1, 300 sec: 8426.1). Total num frames: 409468928. Throughput: 0: 12876.6. Samples: 450680. Policy #0 lag: (min: 0.0, avg: 4.3, max: 12.0)
[2024-06-06 14:17:15,005][19065] Avg episode reward: [(0, '0.217')]
[2024-06-06 14:17:15,014][19277] Saving new best policy, reward=0.217!
[2024-06-06 14:17:16,264][19297] Updated weights for policy 0, policy_version 24994 (0.0013)
[2024-06-06 14:17:16,415][19304] Worker 6 awakens!
[2024-06-06 14:17:16,420][19065] Heartbeat connected on RolloutWorker_w6
[2024-06-06 14:17:20,005][19065] Fps is (10 sec: 14745.5, 60 sec: 9830.4, 300 sec: 9830.4). Total num frames: 409567232. Throughput: 0: 13806.0. Samples: 552240. Policy #0 lag: (min: 0.0, avg: 4.3, max: 12.0)
[2024-06-06 14:17:20,005][19065] Avg episode reward: [(0, '0.224')]
[2024-06-06 14:17:20,019][19277] Saving new best policy, reward=0.224!
[2024-06-06 14:17:21,133][19305] Worker 7 awakens!
[2024-06-06 14:17:21,139][19065] Heartbeat connected on RolloutWorker_w7
[2024-06-06 14:17:24,358][19297] Updated weights for policy 0, policy_version 25004 (0.0012)
[2024-06-06 14:17:25,005][19065] Fps is (10 sec: 21299.0, 60 sec: 11286.8, 300 sec: 11286.8). Total num frames: 409681920. Throughput: 0: 13531.6. Samples: 608920. Policy #0 lag: (min: 0.0, avg: 2.4, max: 5.0)
[2024-06-06 14:17:25,005][19065] Avg episode reward: [(0, '0.228')]
[2024-06-06 14:17:25,014][19277] Saving new best policy, reward=0.228!
[2024-06-06 14:17:25,724][19306] Worker 8 awakens!
[2024-06-06 14:17:25,729][19065] Heartbeat connected on RolloutWorker_w8
[2024-06-06 14:17:30,005][19065] Fps is (10 sec: 21299.3, 60 sec: 12124.2, 300 sec: 12124.2). Total num frames: 409780224. Throughput: 0: 15672.0. Samples: 730260. Policy #0 lag: (min: 0.0, avg: 2.4, max: 5.0)
[2024-06-06 14:17:30,005][19065] Avg episode reward: [(0, '0.259')]
[2024-06-06 14:17:30,019][19277] Saving new best policy, reward=0.259!
[2024-06-06 14:17:30,420][19308] Worker 9 awakens!
[2024-06-06 14:17:30,427][19065] Heartbeat connected on RolloutWorker_w9
[2024-06-06 14:17:31,925][19297] Updated weights for policy 0, policy_version 25014 (0.0012)
[2024-06-06 14:17:35,005][19065] Fps is (10 sec: 21299.4, 60 sec: 13107.2, 300 sec: 13107.2). Total num frames: 409894912. Throughput: 0: 12176.1. Samples: 875300. Policy #0 lag: (min: 0.0, avg: 2.4, max: 5.0)
[2024-06-06 14:17:35,005][19065] Avg episode reward: [(0, '0.242')]
[2024-06-06 14:17:35,080][19311] Worker 10 awakens!
[2024-06-06 14:17:35,085][19065] Heartbeat connected on RolloutWorker_w10
[2024-06-06 14:17:38,335][19297] Updated weights for policy 0, policy_version 25024 (0.0017)
[2024-06-06 14:17:39,760][19307] Worker 11 awakens!
[2024-06-06 14:17:39,767][19065] Heartbeat connected on RolloutWorker_w11
[2024-06-06 14:17:40,005][19065] Fps is (10 sec: 26214.5, 60 sec: 14472.6, 300 sec: 14472.6). Total num frames: 410042368. Throughput: 0: 13906.2. Samples: 957340. Policy #0 lag: (min: 0.0, avg: 2.4, max: 5.0)
[2024-06-06 14:17:40,005][19065] Avg episode reward: [(0, '0.251')]
[2024-06-06 14:17:43,598][19297] Updated weights for policy 0, policy_version 25034 (0.0013)
[2024-06-06 14:17:44,500][19309] Worker 12 awakens!
[2024-06-06 14:17:44,506][19065] Heartbeat connected on RolloutWorker_w12
[2024-06-06 14:17:45,005][19065] Fps is (10 sec: 31129.1, 60 sec: 17203.2, 300 sec: 15879.9). Total num frames: 410206208. Throughput: 0: 17704.9. Samples: 1141840. Policy #0 lag: (min: 0.0, avg: 3.9, max: 8.0)
[2024-06-06 14:17:45,005][19065] Avg episode reward: [(0, '0.262')]
[2024-06-06 14:17:45,006][19277] Saving new best policy, reward=0.262!
[2024-06-06 14:17:48,613][19297] Updated weights for policy 0, policy_version 25044 (0.0015) [2024-06-06 14:17:49,188][19312] Worker 13 awakens! [2024-06-06 14:17:49,197][19065] Heartbeat connected on RolloutWorker_w13 [2024-06-06 14:17:50,005][19065] Fps is (10 sec: 32768.0, 60 sec: 17203.3, 300 sec: 17086.2). Total num frames: 410370048. Throughput: 0: 21387.6. Samples: 1331900. Policy #0 lag: (min: 0.0, avg: 3.9, max: 8.0) [2024-06-06 14:17:50,005][19065] Avg episode reward: [(0, '0.239')] [2024-06-06 14:17:53,372][19297] Updated weights for policy 0, policy_version 25054 (0.0022) [2024-06-06 14:17:53,852][19313] Worker 14 awakens! [2024-06-06 14:17:53,857][19065] Heartbeat connected on RolloutWorker_w14 [2024-06-06 14:17:55,005][19065] Fps is (10 sec: 31129.6, 60 sec: 19660.8, 300 sec: 17913.2). Total num frames: 410517504. Throughput: 0: 23290.6. Samples: 1431720. Policy #0 lag: (min: 0.0, avg: 3.9, max: 8.0) [2024-06-06 14:17:55,005][19065] Avg episode reward: [(0, '0.243')] [2024-06-06 14:17:57,961][19297] Updated weights for policy 0, policy_version 25064 (0.0020) [2024-06-06 14:17:58,490][19314] Worker 15 awakens! [2024-06-06 14:17:58,498][19065] Heartbeat connected on RolloutWorker_w15 [2024-06-06 14:18:00,005][19065] Fps is (10 sec: 31129.3, 60 sec: 22118.4, 300 sec: 18841.6). Total num frames: 410681344. Throughput: 0: 26202.6. Samples: 1629800. Policy #0 lag: (min: 0.0, avg: 4.4, max: 11.0) [2024-06-06 14:18:00,005][19065] Avg episode reward: [(0, '0.260')] [2024-06-06 14:18:02,954][19297] Updated weights for policy 0, policy_version 25074 (0.0019) [2024-06-06 14:18:03,954][19316] Worker 16 awakens! [2024-06-06 14:18:03,965][19065] Heartbeat connected on RolloutWorker_w16 [2024-06-06 14:18:05,005][19065] Fps is (10 sec: 32767.8, 60 sec: 24576.0, 300 sec: 19660.8). Total num frames: 410845184. Throughput: 0: 28398.6. Samples: 1830180. 
Policy #0 lag: (min: 0.0, avg: 4.4, max: 11.0) [2024-06-06 14:18:05,005][19065] Avg episode reward: [(0, '0.275')] [2024-06-06 14:18:05,006][19277] Saving new best policy, reward=0.275! [2024-06-06 14:18:07,964][19297] Updated weights for policy 0, policy_version 25084 (0.0029) [2024-06-06 14:18:08,836][19310] Worker 17 awakens! [2024-06-06 14:18:08,847][19065] Heartbeat connected on RolloutWorker_w17 [2024-06-06 14:18:10,005][19065] Fps is (10 sec: 36044.7, 60 sec: 27033.5, 300 sec: 20753.1). Total num frames: 411041792. Throughput: 0: 29532.4. Samples: 1937880. Policy #0 lag: (min: 0.0, avg: 4.4, max: 11.0) [2024-06-06 14:18:10,005][19065] Avg episode reward: [(0, '0.243')] [2024-06-06 14:18:12,698][19297] Updated weights for policy 0, policy_version 25094 (0.0026) [2024-06-06 14:18:13,784][19315] Worker 18 awakens! [2024-06-06 14:18:13,795][19065] Heartbeat connected on RolloutWorker_w18 [2024-06-06 14:18:15,005][19065] Fps is (10 sec: 37683.2, 60 sec: 29218.0, 300 sec: 21557.9). Total num frames: 411222016. Throughput: 0: 31661.7. Samples: 2155040. Policy #0 lag: (min: 0.0, avg: 14.7, max: 124.0) [2024-06-06 14:18:15,005][19065] Avg episode reward: [(0, '0.250')] [2024-06-06 14:18:17,434][19297] Updated weights for policy 0, policy_version 25104 (0.0024) [2024-06-06 14:18:18,000][19317] Worker 19 awakens! [2024-06-06 14:18:18,011][19065] Heartbeat connected on RolloutWorker_w19 [2024-06-06 14:18:20,005][19065] Fps is (10 sec: 36045.2, 60 sec: 30583.5, 300 sec: 22282.3). Total num frames: 411402240. Throughput: 0: 33380.8. Samples: 2377440. Policy #0 lag: (min: 0.0, avg: 14.7, max: 124.0) [2024-06-06 14:18:20,005][19065] Avg episode reward: [(0, '0.243')] [2024-06-06 14:18:21,185][19297] Updated weights for policy 0, policy_version 25114 (0.0027) [2024-06-06 14:18:22,896][19319] Worker 20 awakens! [2024-06-06 14:18:22,906][19065] Heartbeat connected on RolloutWorker_w20 [2024-06-06 14:18:25,005][19065] Fps is (10 sec: 37683.3, 60 sec: 31948.8, 300 sec: 23093.6). 
Total num frames: 411598848. Throughput: 0: 34266.6. Samples: 2499340. Policy #0 lag: (min: 0.0, avg: 14.7, max: 124.0) [2024-06-06 14:18:25,005][19065] Avg episode reward: [(0, '0.245')] [2024-06-06 14:18:25,402][19297] Updated weights for policy 0, policy_version 25124 (0.0034) [2024-06-06 14:18:26,724][19322] Worker 21 awakens! [2024-06-06 14:18:26,735][19065] Heartbeat connected on RolloutWorker_w21 [2024-06-06 14:18:29,361][19297] Updated weights for policy 0, policy_version 25134 (0.0026) [2024-06-06 14:18:30,005][19065] Fps is (10 sec: 40960.1, 60 sec: 33860.3, 300 sec: 23980.2). Total num frames: 411811840. Throughput: 0: 35366.7. Samples: 2733340. Policy #0 lag: (min: 0.0, avg: 14.7, max: 124.0) [2024-06-06 14:18:30,005][19065] Avg episode reward: [(0, '0.230')] [2024-06-06 14:18:30,013][19277] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000025135_411811840.pth... [2024-06-06 14:18:30,063][19277] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000024624_403439616.pth [2024-06-06 14:18:32,064][19318] Worker 22 awakens! [2024-06-06 14:18:32,077][19065] Heartbeat connected on RolloutWorker_w22 [2024-06-06 14:18:33,907][19297] Updated weights for policy 0, policy_version 25144 (0.0025) [2024-06-06 14:18:35,005][19065] Fps is (10 sec: 40959.8, 60 sec: 35225.5, 300 sec: 24647.2). Total num frames: 412008448. Throughput: 0: 36596.3. Samples: 2978740. Policy #0 lag: (min: 0.0, avg: 56.0, max: 165.0) [2024-06-06 14:18:35,005][19065] Avg episode reward: [(0, '0.251')] [2024-06-06 14:18:36,125][19320] Worker 23 awakens! [2024-06-06 14:18:36,135][19065] Heartbeat connected on RolloutWorker_w23 [2024-06-06 14:18:37,680][19297] Updated weights for policy 0, policy_version 25154 (0.0030) [2024-06-06 14:18:40,005][19065] Fps is (10 sec: 39320.9, 60 sec: 36044.7, 300 sec: 25258.7). Total num frames: 412205056. Throughput: 0: 37052.4. Samples: 3099080. 
Policy #0 lag: (min: 0.0, avg: 56.0, max: 165.0) [2024-06-06 14:18:40,005][19065] Avg episode reward: [(0, '0.258')] [2024-06-06 14:18:41,510][19297] Updated weights for policy 0, policy_version 25164 (0.0027) [2024-06-06 14:18:41,718][19321] Worker 24 awakens! [2024-06-06 14:18:41,731][19065] Heartbeat connected on RolloutWorker_w24 [2024-06-06 14:18:45,005][19065] Fps is (10 sec: 39322.2, 60 sec: 36591.0, 300 sec: 25821.2). Total num frames: 412401664. Throughput: 0: 38261.9. Samples: 3351580. Policy #0 lag: (min: 0.0, avg: 56.0, max: 165.0) [2024-06-06 14:18:45,005][19065] Avg episode reward: [(0, '0.258')] [2024-06-06 14:18:45,814][19297] Updated weights for policy 0, policy_version 25174 (0.0037) [2024-06-06 14:18:46,340][19324] Worker 25 awakens! [2024-06-06 14:18:46,354][19065] Heartbeat connected on RolloutWorker_w25 [2024-06-06 14:18:49,443][19297] Updated weights for policy 0, policy_version 25184 (0.0028) [2024-06-06 14:18:50,005][19065] Fps is (10 sec: 42598.6, 60 sec: 37683.1, 300 sec: 26592.5). Total num frames: 412631040. Throughput: 0: 39314.7. Samples: 3599340. Policy #0 lag: (min: 0.0, avg: 8.6, max: 19.0) [2024-06-06 14:18:50,005][19065] Avg episode reward: [(0, '0.283')] [2024-06-06 14:18:50,014][19277] Saving new best policy, reward=0.283! [2024-06-06 14:18:51,288][19327] Worker 26 awakens! [2024-06-06 14:18:51,302][19065] Heartbeat connected on RolloutWorker_w26 [2024-06-06 14:18:53,139][19297] Updated weights for policy 0, policy_version 25194 (0.0031) [2024-06-06 14:18:55,005][19065] Fps is (10 sec: 44236.3, 60 sec: 38775.4, 300 sec: 27185.3). Total num frames: 412844032. Throughput: 0: 39827.6. Samples: 3730120. Policy #0 lag: (min: 0.0, avg: 8.6, max: 19.0) [2024-06-06 14:18:55,005][19065] Avg episode reward: [(0, '0.228')] [2024-06-06 14:18:56,100][19323] Worker 27 awakens! 
[2024-06-06 14:18:56,114][19065] Heartbeat connected on RolloutWorker_w27 [2024-06-06 14:18:56,990][19297] Updated weights for policy 0, policy_version 25204 (0.0027) [2024-06-06 14:19:00,005][19065] Fps is (10 sec: 44236.9, 60 sec: 39867.7, 300 sec: 27852.8). Total num frames: 413073408. Throughput: 0: 40671.1. Samples: 3985240. Policy #0 lag: (min: 0.0, avg: 8.6, max: 19.0) [2024-06-06 14:19:00,005][19065] Avg episode reward: [(0, '0.241')] [2024-06-06 14:19:00,724][19325] Worker 28 awakens! [2024-06-06 14:19:00,738][19065] Heartbeat connected on RolloutWorker_w28 [2024-06-06 14:19:00,797][19297] Updated weights for policy 0, policy_version 25214 (0.0030) [2024-06-06 14:19:04,708][19297] Updated weights for policy 0, policy_version 25224 (0.0030) [2024-06-06 14:19:05,005][19065] Fps is (10 sec: 42598.3, 60 sec: 40413.9, 300 sec: 28248.3). Total num frames: 413270016. Throughput: 0: 41491.5. Samples: 4244560. Policy #0 lag: (min: 0.0, avg: 8.6, max: 19.0) [2024-06-06 14:19:05,005][19065] Avg episode reward: [(0, '0.251')] [2024-06-06 14:19:05,566][19326] Worker 29 awakens! [2024-06-06 14:19:05,582][19065] Heartbeat connected on RolloutWorker_w29 [2024-06-06 14:19:08,236][19297] Updated weights for policy 0, policy_version 25234 (0.0029) [2024-06-06 14:19:10,005][19065] Fps is (10 sec: 40959.9, 60 sec: 40686.9, 300 sec: 28726.6). Total num frames: 413483008. Throughput: 0: 41694.2. Samples: 4375580. Policy #0 lag: (min: 0.0, avg: 9.5, max: 20.0) [2024-06-06 14:19:10,005][19065] Avg episode reward: [(0, '0.259')] [2024-06-06 14:19:10,156][19329] Worker 30 awakens! [2024-06-06 14:19:10,176][19065] Heartbeat connected on RolloutWorker_w30 [2024-06-06 14:19:12,388][19297] Updated weights for policy 0, policy_version 25244 (0.0043) [2024-06-06 14:19:14,165][19328] Worker 31 awakens! [2024-06-06 14:19:14,181][19065] Heartbeat connected on RolloutWorker_w31 [2024-06-06 14:19:15,005][19065] Fps is (10 sec: 45875.7, 60 sec: 41779.3, 300 sec: 29385.5). 
Total num frames: 413728768. Throughput: 0: 42435.1. Samples: 4642920. Policy #0 lag: (min: 0.0, avg: 9.5, max: 20.0) [2024-06-06 14:19:15,005][19065] Avg episode reward: [(0, '0.262')] [2024-06-06 14:19:15,575][19297] Updated weights for policy 0, policy_version 25254 (0.0036) [2024-06-06 14:19:19,770][19297] Updated weights for policy 0, policy_version 25264 (0.0039) [2024-06-06 14:19:20,005][19065] Fps is (10 sec: 44237.6, 60 sec: 42052.3, 300 sec: 29696.0). Total num frames: 413925376. Throughput: 0: 42829.1. Samples: 4906040. Policy #0 lag: (min: 0.0, avg: 9.5, max: 20.0) [2024-06-06 14:19:20,005][19065] Avg episode reward: [(0, '0.244')] [2024-06-06 14:19:23,233][19297] Updated weights for policy 0, policy_version 25274 (0.0034) [2024-06-06 14:19:25,005][19065] Fps is (10 sec: 40960.4, 60 sec: 42325.4, 300 sec: 30087.0). Total num frames: 414138368. Throughput: 0: 43006.9. Samples: 5034380. Policy #0 lag: (min: 1.0, avg: 12.7, max: 23.0) [2024-06-06 14:19:25,005][19065] Avg episode reward: [(0, '0.250')] [2024-06-06 14:19:27,310][19297] Updated weights for policy 0, policy_version 25284 (0.0044) [2024-06-06 14:19:30,005][19065] Fps is (10 sec: 45874.7, 60 sec: 42871.4, 300 sec: 30647.7). Total num frames: 414384128. Throughput: 0: 43132.8. Samples: 5292560. Policy #0 lag: (min: 1.0, avg: 12.7, max: 23.0) [2024-06-06 14:19:30,005][19065] Avg episode reward: [(0, '0.253')] [2024-06-06 14:19:31,103][19297] Updated weights for policy 0, policy_version 25294 (0.0038) [2024-06-06 14:19:35,005][19065] Fps is (10 sec: 42598.2, 60 sec: 42598.5, 300 sec: 30801.9). Total num frames: 414564352. Throughput: 0: 43413.9. Samples: 5552960. 
Policy #0 lag: (min: 1.0, avg: 12.7, max: 23.0) [2024-06-06 14:19:35,005][19065] Avg episode reward: [(0, '0.275')] [2024-06-06 14:19:35,054][19297] Updated weights for policy 0, policy_version 25304 (0.0032) [2024-06-06 14:19:38,320][19297] Updated weights for policy 0, policy_version 25314 (0.0033) [2024-06-06 14:19:40,005][19065] Fps is (10 sec: 40960.0, 60 sec: 43144.6, 300 sec: 31220.6). Total num frames: 414793728. Throughput: 0: 43382.3. Samples: 5682320. Policy #0 lag: (min: 1.0, avg: 12.7, max: 23.0) [2024-06-06 14:19:40,005][19065] Avg episode reward: [(0, '0.266')] [2024-06-06 14:19:42,452][19297] Updated weights for policy 0, policy_version 25324 (0.0034) [2024-06-06 14:19:45,005][19065] Fps is (10 sec: 45874.9, 60 sec: 43690.6, 300 sec: 31616.7). Total num frames: 415023104. Throughput: 0: 43459.2. Samples: 5940900. Policy #0 lag: (min: 0.0, avg: 9.4, max: 22.0) [2024-06-06 14:19:45,005][19065] Avg episode reward: [(0, '0.267')] [2024-06-06 14:19:45,813][19297] Updated weights for policy 0, policy_version 25334 (0.0033) [2024-06-06 14:19:49,209][19277] Signal inference workers to stop experience collection... (50 times) [2024-06-06 14:19:49,252][19297] InferenceWorker_p0-w0: stopping experience collection (50 times) [2024-06-06 14:19:49,260][19277] Signal inference workers to resume experience collection... (50 times) [2024-06-06 14:19:49,264][19297] InferenceWorker_p0-w0: resuming experience collection (50 times) [2024-06-06 14:19:49,817][19297] Updated weights for policy 0, policy_version 25344 (0.0040) [2024-06-06 14:19:50,005][19065] Fps is (10 sec: 44236.8, 60 sec: 43417.7, 300 sec: 31905.7). Total num frames: 415236096. Throughput: 0: 43658.3. Samples: 6209180. 
Policy #0 lag: (min: 0.0, avg: 9.4, max: 22.0) [2024-06-06 14:19:50,005][19065] Avg episode reward: [(0, '0.268')] [2024-06-06 14:19:53,602][19297] Updated weights for policy 0, policy_version 25354 (0.0033) [2024-06-06 14:19:55,005][19065] Fps is (10 sec: 40960.0, 60 sec: 43144.6, 300 sec: 32095.9). Total num frames: 415432704. Throughput: 0: 43476.5. Samples: 6332020. Policy #0 lag: (min: 0.0, avg: 9.4, max: 22.0) [2024-06-06 14:19:55,005][19065] Avg episode reward: [(0, '0.250')] [2024-06-06 14:19:57,243][19297] Updated weights for policy 0, policy_version 25364 (0.0031) [2024-06-06 14:20:00,005][19065] Fps is (10 sec: 45874.9, 60 sec: 43690.7, 300 sec: 32604.2). Total num frames: 415694848. Throughput: 0: 43367.9. Samples: 6594480. Policy #0 lag: (min: 1.0, avg: 10.1, max: 23.0) [2024-06-06 14:20:00,005][19065] Avg episode reward: [(0, '0.266')] [2024-06-06 14:20:00,882][19297] Updated weights for policy 0, policy_version 25374 (0.0029) [2024-06-06 14:20:04,933][19297] Updated weights for policy 0, policy_version 25384 (0.0043) [2024-06-06 14:20:05,005][19065] Fps is (10 sec: 45874.7, 60 sec: 43690.7, 300 sec: 32768.0). Total num frames: 415891456. Throughput: 0: 43371.8. Samples: 6857780. Policy #0 lag: (min: 1.0, avg: 10.1, max: 23.0) [2024-06-06 14:20:05,005][19065] Avg episode reward: [(0, '0.265')] [2024-06-06 14:20:08,229][19297] Updated weights for policy 0, policy_version 25394 (0.0022) [2024-06-06 14:20:10,005][19065] Fps is (10 sec: 40960.4, 60 sec: 43690.8, 300 sec: 33002.1). Total num frames: 416104448. Throughput: 0: 43464.4. Samples: 6990280. Policy #0 lag: (min: 1.0, avg: 10.1, max: 23.0) [2024-06-06 14:20:10,005][19065] Avg episode reward: [(0, '0.268')] [2024-06-06 14:20:12,107][19297] Updated weights for policy 0, policy_version 25404 (0.0031) [2024-06-06 14:20:15,005][19065] Fps is (10 sec: 45875.9, 60 sec: 43690.7, 300 sec: 33377.7). Total num frames: 416350208. Throughput: 0: 43488.5. Samples: 7249540. 
Policy #0 lag: (min: 0.0, avg: 10.5, max: 21.0) [2024-06-06 14:20:15,013][19065] Avg episode reward: [(0, '0.259')] [2024-06-06 14:20:15,860][19297] Updated weights for policy 0, policy_version 25414 (0.0032) [2024-06-06 14:20:19,929][19297] Updated weights for policy 0, policy_version 25424 (0.0026) [2024-06-06 14:20:20,005][19065] Fps is (10 sec: 44236.7, 60 sec: 43690.6, 300 sec: 33512.7). Total num frames: 416546816. Throughput: 0: 43570.6. Samples: 7513640. Policy #0 lag: (min: 0.0, avg: 10.5, max: 21.0) [2024-06-06 14:20:20,005][19065] Avg episode reward: [(0, '0.271')] [2024-06-06 14:20:23,640][19297] Updated weights for policy 0, policy_version 25434 (0.0033) [2024-06-06 14:20:25,005][19065] Fps is (10 sec: 39321.2, 60 sec: 43417.5, 300 sec: 33641.8). Total num frames: 416743424. Throughput: 0: 43416.0. Samples: 7636040. Policy #0 lag: (min: 0.0, avg: 10.5, max: 21.0) [2024-06-06 14:20:25,005][19065] Avg episode reward: [(0, '0.265')] [2024-06-06 14:20:27,227][19297] Updated weights for policy 0, policy_version 25444 (0.0029) [2024-06-06 14:20:30,005][19065] Fps is (10 sec: 44236.8, 60 sec: 43417.6, 300 sec: 33979.0). Total num frames: 416989184. Throughput: 0: 43591.6. Samples: 7902520. Policy #0 lag: (min: 0.0, avg: 10.5, max: 21.0) [2024-06-06 14:20:30,013][19065] Avg episode reward: [(0, '0.281')] [2024-06-06 14:20:30,036][19277] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000025451_416989184.pth... [2024-06-06 14:20:30,087][19277] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000024974_409174016.pth [2024-06-06 14:20:30,913][19297] Updated weights for policy 0, policy_version 25454 (0.0029) [2024-06-06 14:20:34,740][19297] Updated weights for policy 0, policy_version 25464 (0.0038) [2024-06-06 14:20:35,005][19065] Fps is (10 sec: 45875.7, 60 sec: 43963.7, 300 sec: 34162.4). Total num frames: 417202176. Throughput: 0: 43436.1. Samples: 8163800. 
Policy #0 lag: (min: 0.0, avg: 10.9, max: 22.0) [2024-06-06 14:20:35,005][19065] Avg episode reward: [(0, '0.267')] [2024-06-06 14:20:38,440][19297] Updated weights for policy 0, policy_version 25474 (0.0045) [2024-06-06 14:20:40,005][19065] Fps is (10 sec: 39321.7, 60 sec: 43144.6, 300 sec: 34201.6). Total num frames: 417382400. Throughput: 0: 43532.9. Samples: 8291000. Policy #0 lag: (min: 0.0, avg: 10.9, max: 22.0) [2024-06-06 14:20:40,005][19065] Avg episode reward: [(0, '0.258')] [2024-06-06 14:20:42,319][19297] Updated weights for policy 0, policy_version 25484 (0.0023) [2024-06-06 14:20:45,005][19065] Fps is (10 sec: 42598.5, 60 sec: 43417.7, 300 sec: 34506.7). Total num frames: 417628160. Throughput: 0: 43536.1. Samples: 8553600. Policy #0 lag: (min: 0.0, avg: 10.9, max: 22.0) [2024-06-06 14:20:45,005][19065] Avg episode reward: [(0, '0.275')] [2024-06-06 14:20:45,779][19297] Updated weights for policy 0, policy_version 25494 (0.0027) [2024-06-06 14:20:50,005][19065] Fps is (10 sec: 45875.2, 60 sec: 43417.6, 300 sec: 34668.6). Total num frames: 417841152. Throughput: 0: 43528.1. Samples: 8816540. Policy #0 lag: (min: 1.0, avg: 7.7, max: 19.0) [2024-06-06 14:20:50,005][19065] Avg episode reward: [(0, '0.267')] [2024-06-06 14:20:50,023][19297] Updated weights for policy 0, policy_version 25504 (0.0033) [2024-06-06 14:20:53,483][19297] Updated weights for policy 0, policy_version 25514 (0.0021) [2024-06-06 14:20:55,005][19065] Fps is (10 sec: 42597.8, 60 sec: 43690.6, 300 sec: 34824.0). Total num frames: 418054144. Throughput: 0: 43343.4. Samples: 8940740. Policy #0 lag: (min: 1.0, avg: 7.7, max: 19.0) [2024-06-06 14:20:55,005][19065] Avg episode reward: [(0, '0.274')] [2024-06-06 14:20:57,417][19297] Updated weights for policy 0, policy_version 25524 (0.0021) [2024-06-06 14:21:00,005][19065] Fps is (10 sec: 44236.1, 60 sec: 43144.5, 300 sec: 35036.5). Total num frames: 418283520. Throughput: 0: 43481.6. Samples: 9206220. 
Policy #0 lag: (min: 1.0, avg: 7.7, max: 19.0) [2024-06-06 14:21:00,005][19065] Avg episode reward: [(0, '0.271')] [2024-06-06 14:21:00,800][19297] Updated weights for policy 0, policy_version 25534 (0.0019) [2024-06-06 14:21:04,783][19297] Updated weights for policy 0, policy_version 25544 (0.0038) [2024-06-06 14:21:05,005][19065] Fps is (10 sec: 45875.8, 60 sec: 43690.8, 300 sec: 35241.1). Total num frames: 418512896. Throughput: 0: 43481.8. Samples: 9470320. Policy #0 lag: (min: 1.0, avg: 7.7, max: 19.0) [2024-06-06 14:21:05,005][19065] Avg episode reward: [(0, '0.259')] [2024-06-06 14:21:05,911][19277] Signal inference workers to stop experience collection... (100 times) [2024-06-06 14:21:05,924][19277] Signal inference workers to resume experience collection... (100 times) [2024-06-06 14:21:05,932][19297] InferenceWorker_p0-w0: stopping experience collection (100 times) [2024-06-06 14:21:05,965][19297] InferenceWorker_p0-w0: resuming experience collection (100 times) [2024-06-06 14:21:08,497][19297] Updated weights for policy 0, policy_version 25554 (0.0034) [2024-06-06 14:21:10,005][19065] Fps is (10 sec: 40960.6, 60 sec: 43144.5, 300 sec: 35256.0). Total num frames: 418693120. Throughput: 0: 43593.0. Samples: 9597720. Policy #0 lag: (min: 0.0, avg: 10.5, max: 21.0) [2024-06-06 14:21:10,005][19065] Avg episode reward: [(0, '0.282')] [2024-06-06 14:21:12,477][19297] Updated weights for policy 0, policy_version 25564 (0.0034) [2024-06-06 14:21:15,005][19065] Fps is (10 sec: 42598.2, 60 sec: 43144.5, 300 sec: 35508.6). Total num frames: 418938880. Throughput: 0: 43478.7. Samples: 9859060. Policy #0 lag: (min: 0.0, avg: 10.5, max: 21.0) [2024-06-06 14:21:15,005][19065] Avg episode reward: [(0, '0.268')] [2024-06-06 14:21:15,915][19297] Updated weights for policy 0, policy_version 25574 (0.0032) [2024-06-06 14:21:20,005][19065] Fps is (10 sec: 44237.1, 60 sec: 43144.6, 300 sec: 35576.7). Total num frames: 419135488. Throughput: 0: 43462.3. Samples: 10119600. 
Policy #0 lag: (min: 0.0, avg: 10.5, max: 21.0) [2024-06-06 14:21:20,005][19065] Avg episode reward: [(0, '0.264')] [2024-06-06 14:21:20,254][19297] Updated weights for policy 0, policy_version 25584 (0.0033) [2024-06-06 14:21:23,791][19297] Updated weights for policy 0, policy_version 25594 (0.0029) [2024-06-06 14:21:25,005][19065] Fps is (10 sec: 40959.9, 60 sec: 43417.7, 300 sec: 35699.9). Total num frames: 419348480. Throughput: 0: 43407.5. Samples: 10244340. Policy #0 lag: (min: 0.0, avg: 10.9, max: 22.0) [2024-06-06 14:21:25,005][19065] Avg episode reward: [(0, '0.266')] [2024-06-06 14:21:27,569][19297] Updated weights for policy 0, policy_version 25604 (0.0036) [2024-06-06 14:21:30,005][19065] Fps is (10 sec: 45875.0, 60 sec: 43417.6, 300 sec: 35931.8). Total num frames: 419594240. Throughput: 0: 43367.1. Samples: 10505120. Policy #0 lag: (min: 0.0, avg: 10.9, max: 22.0) [2024-06-06 14:21:30,005][19065] Avg episode reward: [(0, '0.276')] [2024-06-06 14:21:31,050][19297] Updated weights for policy 0, policy_version 25614 (0.0031) [2024-06-06 14:21:35,005][19065] Fps is (10 sec: 45875.3, 60 sec: 43417.6, 300 sec: 36044.8). Total num frames: 419807232. Throughput: 0: 43443.1. Samples: 10771480. Policy #0 lag: (min: 0.0, avg: 10.9, max: 22.0) [2024-06-06 14:21:35,005][19065] Avg episode reward: [(0, '0.283')] [2024-06-06 14:21:35,123][19297] Updated weights for policy 0, policy_version 25624 (0.0032) [2024-06-06 14:21:38,517][19297] Updated weights for policy 0, policy_version 25634 (0.0032) [2024-06-06 14:21:40,005][19065] Fps is (10 sec: 40960.2, 60 sec: 43690.7, 300 sec: 36711.3). Total num frames: 420003840. Throughput: 0: 43490.4. Samples: 10897800. Policy #0 lag: (min: 0.0, avg: 10.9, max: 22.0) [2024-06-06 14:21:40,005][19065] Avg episode reward: [(0, '0.269')] [2024-06-06 14:21:42,902][19297] Updated weights for policy 0, policy_version 25644 (0.0033) [2024-06-06 14:21:45,005][19065] Fps is (10 sec: 42598.3, 60 sec: 43417.6, 300 sec: 36933.4). 
Total num frames: 420233216. Throughput: 0: 43307.2. Samples: 11155040. Policy #0 lag: (min: 0.0, avg: 8.1, max: 20.0) [2024-06-06 14:21:45,005][19065] Avg episode reward: [(0, '0.271')] [2024-06-06 14:21:46,186][19297] Updated weights for policy 0, policy_version 25654 (0.0027) [2024-06-06 14:21:50,005][19065] Fps is (10 sec: 42598.1, 60 sec: 43144.5, 300 sec: 37599.9). Total num frames: 420429824. Throughput: 0: 43088.4. Samples: 11409300. Policy #0 lag: (min: 0.0, avg: 8.1, max: 20.0) [2024-06-06 14:21:50,005][19065] Avg episode reward: [(0, '0.277')] [2024-06-06 14:21:50,627][19297] Updated weights for policy 0, policy_version 25664 (0.0026) [2024-06-06 14:21:54,292][19297] Updated weights for policy 0, policy_version 25674 (0.0040) [2024-06-06 14:21:55,005][19065] Fps is (10 sec: 40960.1, 60 sec: 43144.6, 300 sec: 38266.4). Total num frames: 420642816. Throughput: 0: 43131.1. Samples: 11538620. Policy #0 lag: (min: 0.0, avg: 8.1, max: 20.0) [2024-06-06 14:21:55,005][19065] Avg episode reward: [(0, '0.282')] [2024-06-06 14:21:57,982][19297] Updated weights for policy 0, policy_version 25684 (0.0025) [2024-06-06 14:22:00,005][19065] Fps is (10 sec: 45874.6, 60 sec: 43417.6, 300 sec: 39043.9). Total num frames: 420888576. Throughput: 0: 43007.0. Samples: 11794380. Policy #0 lag: (min: 0.0, avg: 9.7, max: 22.0) [2024-06-06 14:22:00,005][19065] Avg episode reward: [(0, '0.276')] [2024-06-06 14:22:01,602][19297] Updated weights for policy 0, policy_version 25694 (0.0031) [2024-06-06 14:22:05,005][19065] Fps is (10 sec: 44236.4, 60 sec: 42871.4, 300 sec: 39543.7). Total num frames: 421085184. Throughput: 0: 43104.8. Samples: 12059320. 
Policy #0 lag: (min: 0.0, avg: 9.7, max: 22.0) [2024-06-06 14:22:05,005][19065] Avg episode reward: [(0, '0.271')] [2024-06-06 14:22:05,483][19297] Updated weights for policy 0, policy_version 25704 (0.0026) [2024-06-06 14:22:08,899][19297] Updated weights for policy 0, policy_version 25714 (0.0026) [2024-06-06 14:22:10,005][19065] Fps is (10 sec: 40960.6, 60 sec: 43417.6, 300 sec: 40099.1). Total num frames: 421298176. Throughput: 0: 43217.3. Samples: 12189120. Policy #0 lag: (min: 0.0, avg: 9.7, max: 22.0) [2024-06-06 14:22:10,005][19065] Avg episode reward: [(0, '0.282')] [2024-06-06 14:22:13,289][19297] Updated weights for policy 0, policy_version 25724 (0.0029) [2024-06-06 14:22:15,005][19065] Fps is (10 sec: 42598.7, 60 sec: 42871.5, 300 sec: 40487.9). Total num frames: 421511168. Throughput: 0: 43252.9. Samples: 12451500. Policy #0 lag: (min: 0.0, avg: 9.7, max: 22.0) [2024-06-06 14:22:15,005][19065] Avg episode reward: [(0, '0.273')] [2024-06-06 14:22:16,445][19297] Updated weights for policy 0, policy_version 25734 (0.0037) [2024-06-06 14:22:20,005][19065] Fps is (10 sec: 44236.4, 60 sec: 43417.5, 300 sec: 40876.7). Total num frames: 421740544. Throughput: 0: 43033.2. Samples: 12707980. Policy #0 lag: (min: 0.0, avg: 10.3, max: 22.0) [2024-06-06 14:22:20,005][19065] Avg episode reward: [(0, '0.285')] [2024-06-06 14:22:20,018][19277] Saving new best policy, reward=0.285! [2024-06-06 14:22:20,852][19297] Updated weights for policy 0, policy_version 25744 (0.0031) [2024-06-06 14:22:24,336][19297] Updated weights for policy 0, policy_version 25754 (0.0035) [2024-06-06 14:22:25,008][19065] Fps is (10 sec: 44222.4, 60 sec: 43415.2, 300 sec: 41265.0). Total num frames: 421953536. Throughput: 0: 43173.3. Samples: 12840740. Policy #0 lag: (min: 0.0, avg: 10.3, max: 22.0) [2024-06-06 14:22:25,009][19065] Avg episode reward: [(0, '0.286')] [2024-06-06 14:22:25,012][19277] Saving new best policy, reward=0.286! 
[2024-06-06 14:22:27,747][19277] Signal inference workers to stop experience collection... (150 times)
[2024-06-06 14:22:27,748][19277] Signal inference workers to resume experience collection... (150 times)
[2024-06-06 14:22:27,773][19297] InferenceWorker_p0-w0: stopping experience collection (150 times)
[2024-06-06 14:22:27,773][19297] InferenceWorker_p0-w0: resuming experience collection (150 times)
[2024-06-06 14:22:28,413][19297] Updated weights for policy 0, policy_version 25764 (0.0041)
[2024-06-06 14:22:30,005][19065] Fps is (10 sec: 42599.0, 60 sec: 42871.5, 300 sec: 41598.7). Total num frames: 422166528. Throughput: 0: 43128.5. Samples: 13095820. Policy #0 lag: (min: 0.0, avg: 10.3, max: 22.0)
[2024-06-06 14:22:30,005][19065] Avg episode reward: [(0, '0.288')]
[2024-06-06 14:22:30,020][19277] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000025767_422166528.pth...
[2024-06-06 14:22:30,085][19277] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000025135_411811840.pth
[2024-06-06 14:22:30,087][19277] Saving new best policy, reward=0.288!
[2024-06-06 14:22:32,446][19297] Updated weights for policy 0, policy_version 25774 (0.0030)
[2024-06-06 14:22:35,005][19065] Fps is (10 sec: 40973.0, 60 sec: 42598.3, 300 sec: 41765.3). Total num frames: 422363136. Throughput: 0: 43287.0. Samples: 13357220. Policy #0 lag: (min: 0.0, avg: 10.7, max: 21.0)
[2024-06-06 14:22:35,005][19065] Avg episode reward: [(0, '0.271')]
[2024-06-06 14:22:35,878][19297] Updated weights for policy 0, policy_version 25784 (0.0026)
[2024-06-06 14:22:39,600][19297] Updated weights for policy 0, policy_version 25794 (0.0032)
[2024-06-06 14:22:40,005][19065] Fps is (10 sec: 44236.7, 60 sec: 43417.6, 300 sec: 42043.0). Total num frames: 422608896. Throughput: 0: 43227.1. Samples: 13483840. Policy #0 lag: (min: 0.0, avg: 10.7, max: 21.0)
[2024-06-06 14:22:40,005][19065] Avg episode reward: [(0, '0.274')]
[2024-06-06 14:22:43,728][19297] Updated weights for policy 0, policy_version 25804 (0.0038)
[2024-06-06 14:22:45,005][19065] Fps is (10 sec: 45875.9, 60 sec: 43144.6, 300 sec: 42209.6). Total num frames: 422821888. Throughput: 0: 43376.2. Samples: 13746300. Policy #0 lag: (min: 0.0, avg: 10.7, max: 21.0)
[2024-06-06 14:22:45,005][19065] Avg episode reward: [(0, '0.277')]
[2024-06-06 14:22:46,897][19297] Updated weights for policy 0, policy_version 25814 (0.0039)
[2024-06-06 14:22:50,005][19065] Fps is (10 sec: 42598.4, 60 sec: 43417.6, 300 sec: 42431.8). Total num frames: 423034880. Throughput: 0: 43289.9. Samples: 14007360. Policy #0 lag: (min: 0.0, avg: 9.8, max: 22.0)
[2024-06-06 14:22:50,005][19065] Avg episode reward: [(0, '0.266')]
[2024-06-06 14:22:50,980][19297] Updated weights for policy 0, policy_version 25824 (0.0036)
[2024-06-06 14:22:55,005][19065] Fps is (10 sec: 42598.1, 60 sec: 43417.6, 300 sec: 42598.4). Total num frames: 423247872. Throughput: 0: 43233.4. Samples: 14134620. Policy #0 lag: (min: 0.0, avg: 9.8, max: 22.0)
[2024-06-06 14:22:55,005][19065] Avg episode reward: [(0, '0.281')]
[2024-06-06 14:22:55,094][19297] Updated weights for policy 0, policy_version 25834 (0.0031)
[2024-06-06 14:22:59,057][19297] Updated weights for policy 0, policy_version 25844 (0.0037)
[2024-06-06 14:23:00,005][19065] Fps is (10 sec: 44236.8, 60 sec: 43144.7, 300 sec: 42820.6). Total num frames: 423477248. Throughput: 0: 43131.6. Samples: 14392420. Policy #0 lag: (min: 0.0, avg: 9.8, max: 22.0)
[2024-06-06 14:23:00,005][19065] Avg episode reward: [(0, '0.274')]
[2024-06-06 14:23:02,624][19297] Updated weights for policy 0, policy_version 25854 (0.0029)
[2024-06-06 14:23:05,005][19065] Fps is (10 sec: 42598.2, 60 sec: 43144.6, 300 sec: 42820.6). Total num frames: 423673856. Throughput: 0: 43238.7. Samples: 14653720. Policy #0 lag: (min: 0.0, avg: 9.8, max: 22.0)
[2024-06-06 14:23:05,005][19065] Avg episode reward: [(0, '0.286')]
[2024-06-06 14:23:06,332][19297] Updated weights for policy 0, policy_version 25864 (0.0029)
[2024-06-06 14:23:09,958][19297] Updated weights for policy 0, policy_version 25874 (0.0050)
[2024-06-06 14:23:10,005][19065] Fps is (10 sec: 44236.2, 60 sec: 43690.6, 300 sec: 43042.7). Total num frames: 423919616. Throughput: 0: 43067.9. Samples: 14778660. Policy #0 lag: (min: 0.0, avg: 10.6, max: 21.0)
[2024-06-06 14:23:10,005][19065] Avg episode reward: [(0, '0.268')]
[2024-06-06 14:23:14,002][19297] Updated weights for policy 0, policy_version 25884 (0.0026)
[2024-06-06 14:23:15,005][19065] Fps is (10 sec: 44237.4, 60 sec: 43417.7, 300 sec: 43098.3). Total num frames: 424116224. Throughput: 0: 43320.1. Samples: 15045220. Policy #0 lag: (min: 0.0, avg: 10.6, max: 21.0)
[2024-06-06 14:23:15,005][19065] Avg episode reward: [(0, '0.273')]
[2024-06-06 14:23:17,481][19297] Updated weights for policy 0, policy_version 25894 (0.0033)
[2024-06-06 14:23:20,008][19065] Fps is (10 sec: 42585.1, 60 sec: 43415.3, 300 sec: 43208.9). Total num frames: 424345600. Throughput: 0: 43268.5. Samples: 15304440. Policy #0 lag: (min: 0.0, avg: 10.6, max: 21.0)
[2024-06-06 14:23:20,016][19065] Avg episode reward: [(0, '0.288')]
[2024-06-06 14:23:21,479][19297] Updated weights for policy 0, policy_version 25904 (0.0041)
[2024-06-06 14:23:25,005][19065] Fps is (10 sec: 44236.4, 60 sec: 43420.0, 300 sec: 43209.3). Total num frames: 424558592. Throughput: 0: 43481.3. Samples: 15440500. Policy #0 lag: (min: 0.0, avg: 10.2, max: 23.0)
[2024-06-06 14:23:25,005][19065] Avg episode reward: [(0, '0.272')]
[2024-06-06 14:23:25,119][19297] Updated weights for policy 0, policy_version 25914 (0.0033)
[2024-06-06 14:23:29,019][19297] Updated weights for policy 0, policy_version 25924 (0.0037)
[2024-06-06 14:23:30,005][19065] Fps is (10 sec: 44250.5, 60 sec: 43690.5, 300 sec: 43320.4). Total num frames: 424787968. Throughput: 0: 43508.7. Samples: 15704200. Policy #0 lag: (min: 0.0, avg: 10.2, max: 23.0)
[2024-06-06 14:23:30,005][19065] Avg episode reward: [(0, '0.283')]
[2024-06-06 14:23:32,363][19297] Updated weights for policy 0, policy_version 25934 (0.0022)
[2024-06-06 14:23:35,005][19065] Fps is (10 sec: 42598.3, 60 sec: 43690.7, 300 sec: 43320.4). Total num frames: 424984576. Throughput: 0: 43592.4. Samples: 15969020. Policy #0 lag: (min: 0.0, avg: 10.2, max: 23.0)
[2024-06-06 14:23:35,006][19065] Avg episode reward: [(0, '0.280')]
[2024-06-06 14:23:36,259][19297] Updated weights for policy 0, policy_version 25944 (0.0038)
[2024-06-06 14:23:39,827][19297] Updated weights for policy 0, policy_version 25954 (0.0039)
[2024-06-06 14:23:40,005][19065] Fps is (10 sec: 44237.7, 60 sec: 43690.7, 300 sec: 43487.0). Total num frames: 425230336. Throughput: 0: 43584.5. Samples: 16095920. Policy #0 lag: (min: 0.0, avg: 10.2, max: 23.0)
[2024-06-06 14:23:40,005][19065] Avg episode reward: [(0, '0.272')]
[2024-06-06 14:23:44,003][19297] Updated weights for policy 0, policy_version 25964 (0.0024)
[2024-06-06 14:23:45,005][19065] Fps is (10 sec: 44236.5, 60 sec: 43417.5, 300 sec: 43375.9). Total num frames: 425426944. Throughput: 0: 43777.7. Samples: 16362420. Policy #0 lag: (min: 0.0, avg: 10.5, max: 21.0)
[2024-06-06 14:23:45,005][19065] Avg episode reward: [(0, '0.288')]
[2024-06-06 14:23:47,444][19297] Updated weights for policy 0, policy_version 25974 (0.0027)
[2024-06-06 14:23:50,005][19065] Fps is (10 sec: 40960.2, 60 sec: 43417.6, 300 sec: 43376.0). Total num frames: 425639936. Throughput: 0: 43490.8. Samples: 16610800. Policy #0 lag: (min: 0.0, avg: 10.5, max: 21.0)
[2024-06-06 14:23:50,005][19065] Avg episode reward: [(0, '0.269')]
[2024-06-06 14:23:51,420][19297] Updated weights for policy 0, policy_version 25984 (0.0042)
[2024-06-06 14:23:55,005][19065] Fps is (10 sec: 44237.7, 60 sec: 43690.7, 300 sec: 43376.0). Total num frames: 425869312. Throughput: 0: 43869.6. Samples: 16752780. Policy #0 lag: (min: 0.0, avg: 10.5, max: 21.0)
[2024-06-06 14:23:55,005][19065] Avg episode reward: [(0, '0.278')]
[2024-06-06 14:23:55,118][19297] Updated weights for policy 0, policy_version 25994 (0.0024)
[2024-06-06 14:23:59,287][19297] Updated weights for policy 0, policy_version 26004 (0.0044)
[2024-06-06 14:24:00,006][19065] Fps is (10 sec: 42594.2, 60 sec: 43143.9, 300 sec: 43375.8). Total num frames: 426065920. Throughput: 0: 43652.8. Samples: 17009640. Policy #0 lag: (min: 0.0, avg: 8.9, max: 21.0)
[2024-06-06 14:24:00,006][19065] Avg episode reward: [(0, '0.279')]
[2024-06-06 14:24:02,626][19297] Updated weights for policy 0, policy_version 26014 (0.0031)
[2024-06-06 14:24:05,005][19065] Fps is (10 sec: 42598.3, 60 sec: 43690.8, 300 sec: 43431.5). Total num frames: 426295296. Throughput: 0: 43630.8. Samples: 17267680. Policy #0 lag: (min: 0.0, avg: 8.9, max: 21.0)
[2024-06-06 14:24:05,006][19065] Avg episode reward: [(0, '0.274')]
[2024-06-06 14:24:06,623][19297] Updated weights for policy 0, policy_version 26024 (0.0038)
[2024-06-06 14:24:08,071][19277] Signal inference workers to stop experience collection... (200 times)
[2024-06-06 14:24:08,117][19297] InferenceWorker_p0-w0: stopping experience collection (200 times)
[2024-06-06 14:24:08,183][19277] Signal inference workers to resume experience collection... (200 times)
[2024-06-06 14:24:08,183][19297] InferenceWorker_p0-w0: resuming experience collection (200 times)
[2024-06-06 14:24:10,008][19065] Fps is (10 sec: 45864.3, 60 sec: 43415.3, 300 sec: 43375.5). Total num frames: 426524672. Throughput: 0: 43568.4. Samples: 17401220. Policy #0 lag: (min: 0.0, avg: 8.9, max: 21.0)
[2024-06-06 14:24:10,009][19065] Avg episode reward: [(0, '0.263')]
[2024-06-06 14:24:10,019][19297] Updated weights for policy 0, policy_version 26034 (0.0025)
[2024-06-06 14:24:14,543][19297] Updated weights for policy 0, policy_version 26044 (0.0030)
[2024-06-06 14:24:15,005][19065] Fps is (10 sec: 40960.0, 60 sec: 43144.5, 300 sec: 43320.4). Total num frames: 426704896. Throughput: 0: 43493.1. Samples: 17661380. Policy #0 lag: (min: 0.0, avg: 8.9, max: 21.0)
[2024-06-06 14:24:15,005][19065] Avg episode reward: [(0, '0.270')]
[2024-06-06 14:24:17,554][19297] Updated weights for policy 0, policy_version 26054 (0.0037)
[2024-06-06 14:24:20,005][19065] Fps is (10 sec: 42612.6, 60 sec: 43420.0, 300 sec: 43431.5). Total num frames: 426950656. Throughput: 0: 43284.1. Samples: 17916800. Policy #0 lag: (min: 0.0, avg: 8.4, max: 20.0)
[2024-06-06 14:24:20,005][19065] Avg episode reward: [(0, '0.270')]
[2024-06-06 14:24:21,709][19297] Updated weights for policy 0, policy_version 26064 (0.0030)
[2024-06-06 14:24:25,005][19065] Fps is (10 sec: 47513.4, 60 sec: 43690.7, 300 sec: 43376.0). Total num frames: 427180032. Throughput: 0: 43558.7. Samples: 18056060. Policy #0 lag: (min: 0.0, avg: 8.4, max: 20.0)
[2024-06-06 14:24:25,005][19065] Avg episode reward: [(0, '0.271')]
[2024-06-06 14:24:25,115][19297] Updated weights for policy 0, policy_version 26074 (0.0040)
[2024-06-06 14:24:29,738][19297] Updated weights for policy 0, policy_version 26084 (0.0039)
[2024-06-06 14:24:30,005][19065] Fps is (10 sec: 42597.7, 60 sec: 43144.6, 300 sec: 43431.5). Total num frames: 427376640. Throughput: 0: 43350.7. Samples: 18313200. Policy #0 lag: (min: 0.0, avg: 8.4, max: 20.0)
[2024-06-06 14:24:30,005][19065] Avg episode reward: [(0, '0.267')]
[2024-06-06 14:24:30,019][19277] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000026085_427376640.pth...
[2024-06-06 14:24:30,068][19277] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000025451_416989184.pth
[2024-06-06 14:24:32,679][19297] Updated weights for policy 0, policy_version 26094 (0.0043)
[2024-06-06 14:24:35,005][19065] Fps is (10 sec: 40959.9, 60 sec: 43417.6, 300 sec: 43376.0). Total num frames: 427589632. Throughput: 0: 43499.9. Samples: 18568300. Policy #0 lag: (min: 0.0, avg: 12.6, max: 21.0)
[2024-06-06 14:24:35,005][19065] Avg episode reward: [(0, '0.268')]
[2024-06-06 14:24:37,298][19297] Updated weights for policy 0, policy_version 26104 (0.0042)
[2024-06-06 14:24:40,005][19065] Fps is (10 sec: 44237.2, 60 sec: 43144.5, 300 sec: 43375.9). Total num frames: 427819008. Throughput: 0: 43383.0. Samples: 18705020. Policy #0 lag: (min: 0.0, avg: 12.6, max: 21.0)
[2024-06-06 14:24:40,005][19065] Avg episode reward: [(0, '0.272')]
[2024-06-06 14:24:40,158][19297] Updated weights for policy 0, policy_version 26114 (0.0033)
[2024-06-06 14:24:44,986][19297] Updated weights for policy 0, policy_version 26124 (0.0041)
[2024-06-06 14:24:45,005][19065] Fps is (10 sec: 42598.3, 60 sec: 43144.6, 300 sec: 43320.4). Total num frames: 428015616. Throughput: 0: 43447.1. Samples: 18964720. Policy #0 lag: (min: 0.0, avg: 12.6, max: 21.0)
[2024-06-06 14:24:45,006][19065] Avg episode reward: [(0, '0.270')]
[2024-06-06 14:24:48,043][19297] Updated weights for policy 0, policy_version 26134 (0.0037)
[2024-06-06 14:24:50,005][19065] Fps is (10 sec: 44236.7, 60 sec: 43690.6, 300 sec: 43487.0). Total num frames: 428261376. Throughput: 0: 43263.9. Samples: 19214560. Policy #0 lag: (min: 0.0, avg: 12.6, max: 21.0)
[2024-06-06 14:24:50,005][19065] Avg episode reward: [(0, '0.279')]
[2024-06-06 14:24:52,566][19297] Updated weights for policy 0, policy_version 26144 (0.0040)
[2024-06-06 14:24:55,005][19065] Fps is (10 sec: 45875.3, 60 sec: 43417.5, 300 sec: 43320.4). Total num frames: 428474368. Throughput: 0: 43467.6. Samples: 19357120. Policy #0 lag: (min: 0.0, avg: 10.3, max: 21.0)
[2024-06-06 14:24:55,005][19065] Avg episode reward: [(0, '0.280')]
[2024-06-06 14:24:55,359][19297] Updated weights for policy 0, policy_version 26154 (0.0031)
[2024-06-06 14:25:00,005][19065] Fps is (10 sec: 39321.7, 60 sec: 43145.2, 300 sec: 43264.9). Total num frames: 428654592. Throughput: 0: 43370.2. Samples: 19613040. Policy #0 lag: (min: 0.0, avg: 10.3, max: 21.0)
[2024-06-06 14:25:00,005][19065] Avg episode reward: [(0, '0.262')]
[2024-06-06 14:25:00,245][19297] Updated weights for policy 0, policy_version 26164 (0.0031)
[2024-06-06 14:25:02,813][19297] Updated weights for policy 0, policy_version 26174 (0.0045)
[2024-06-06 14:25:05,005][19065] Fps is (10 sec: 45875.1, 60 sec: 43963.7, 300 sec: 43487.0). Total num frames: 428933120. Throughput: 0: 43275.9. Samples: 19864220. Policy #0 lag: (min: 0.0, avg: 10.3, max: 21.0)
[2024-06-06 14:25:05,005][19065] Avg episode reward: [(0, '0.260')]
[2024-06-06 14:25:07,744][19297] Updated weights for policy 0, policy_version 26184 (0.0030)
[2024-06-06 14:25:10,005][19065] Fps is (10 sec: 45875.2, 60 sec: 43146.9, 300 sec: 43264.9). Total num frames: 429113344. Throughput: 0: 43304.9. Samples: 20004780. Policy #0 lag: (min: 0.0, avg: 9.1, max: 20.0)
[2024-06-06 14:25:10,005][19065] Avg episode reward: [(0, '0.278')]
[2024-06-06 14:25:10,425][19297] Updated weights for policy 0, policy_version 26194 (0.0033)
[2024-06-06 14:25:15,008][19065] Fps is (10 sec: 36033.2, 60 sec: 43142.2, 300 sec: 43208.9). Total num frames: 429293568. Throughput: 0: 43256.6. Samples: 20259880. Policy #0 lag: (min: 0.0, avg: 9.1, max: 20.0)
[2024-06-06 14:25:15,008][19065] Avg episode reward: [(0, '0.276')]
[2024-06-06 14:25:15,398][19297] Updated weights for policy 0, policy_version 26204 (0.0036)
[2024-06-06 14:25:17,922][19297] Updated weights for policy 0, policy_version 26214 (0.0032)
[2024-06-06 14:25:20,005][19065] Fps is (10 sec: 47513.6, 60 sec: 43963.7, 300 sec: 43542.6). Total num frames: 429588480. Throughput: 0: 43244.9. Samples: 20514320. Policy #0 lag: (min: 0.0, avg: 9.1, max: 20.0)
[2024-06-06 14:25:20,005][19065] Avg episode reward: [(0, '0.280')]
[2024-06-06 14:25:22,553][19297] Updated weights for policy 0, policy_version 26224 (0.0040)
[2024-06-06 14:25:24,178][19277] Signal inference workers to stop experience collection... (250 times)
[2024-06-06 14:25:24,229][19297] InferenceWorker_p0-w0: stopping experience collection (250 times)
[2024-06-06 14:25:24,290][19277] Signal inference workers to resume experience collection... (250 times)
[2024-06-06 14:25:24,291][19297] InferenceWorker_p0-w0: resuming experience collection (250 times)
[2024-06-06 14:25:25,005][19065] Fps is (10 sec: 45889.8, 60 sec: 42871.4, 300 sec: 43264.9). Total num frames: 429752320. Throughput: 0: 43488.9. Samples: 20662020. Policy #0 lag: (min: 0.0, avg: 9.1, max: 20.0)
[2024-06-06 14:25:25,005][19065] Avg episode reward: [(0, '0.275')]
[2024-06-06 14:25:25,527][19297] Updated weights for policy 0, policy_version 26234 (0.0040)
[2024-06-06 14:25:30,005][19065] Fps is (10 sec: 37682.9, 60 sec: 43144.5, 300 sec: 43264.8). Total num frames: 429965312. Throughput: 0: 43295.0. Samples: 20913000. Policy #0 lag: (min: 0.0, avg: 13.2, max: 21.0)
[2024-06-06 14:25:30,005][19065] Avg episode reward: [(0, '0.280')]
[2024-06-06 14:25:30,178][19297] Updated weights for policy 0, policy_version 26244 (0.0038)
[2024-06-06 14:25:33,005][19297] Updated weights for policy 0, policy_version 26254 (0.0023)
[2024-06-06 14:25:35,005][19065] Fps is (10 sec: 49152.2, 60 sec: 44236.8, 300 sec: 43598.1). Total num frames: 430243840. Throughput: 0: 43422.3. Samples: 21168560. Policy #0 lag: (min: 0.0, avg: 13.2, max: 21.0)
[2024-06-06 14:25:35,005][19065] Avg episode reward: [(0, '0.275')]
[2024-06-06 14:25:37,712][19297] Updated weights for policy 0, policy_version 26264 (0.0033)
[2024-06-06 14:25:40,005][19065] Fps is (10 sec: 44237.0, 60 sec: 43144.5, 300 sec: 43320.4). Total num frames: 430407680. Throughput: 0: 43377.7. Samples: 21309120. Policy #0 lag: (min: 0.0, avg: 13.2, max: 21.0)
[2024-06-06 14:25:40,005][19065] Avg episode reward: [(0, '0.269')]
[2024-06-06 14:25:40,592][19297] Updated weights for policy 0, policy_version 26274 (0.0038)
[2024-06-06 14:25:45,005][19065] Fps is (10 sec: 39321.6, 60 sec: 43690.7, 300 sec: 43375.9). Total num frames: 430637056. Throughput: 0: 43307.6. Samples: 21561880. Policy #0 lag: (min: 0.0, avg: 9.1, max: 21.0)
[2024-06-06 14:25:45,005][19065] Avg episode reward: [(0, '0.277')]
[2024-06-06 14:25:45,015][19297] Updated weights for policy 0, policy_version 26284 (0.0043)
[2024-06-06 14:25:48,198][19297] Updated weights for policy 0, policy_version 26294 (0.0027)
[2024-06-06 14:25:50,005][19065] Fps is (10 sec: 47513.7, 60 sec: 43690.7, 300 sec: 43487.0). Total num frames: 430882816. Throughput: 0: 43357.8. Samples: 21815320. Policy #0 lag: (min: 0.0, avg: 9.1, max: 21.0)
[2024-06-06 14:25:50,012][19065] Avg episode reward: [(0, '0.267')]
[2024-06-06 14:25:52,458][19297] Updated weights for policy 0, policy_version 26304 (0.0036)
[2024-06-06 14:25:55,005][19065] Fps is (10 sec: 42598.5, 60 sec: 43144.5, 300 sec: 43320.4). Total num frames: 431063040. Throughput: 0: 43371.1. Samples: 21956480. Policy #0 lag: (min: 0.0, avg: 9.1, max: 21.0)
[2024-06-06 14:25:55,005][19065] Avg episode reward: [(0, '0.291')]
[2024-06-06 14:25:55,103][19277] Saving new best policy, reward=0.291!
[2024-06-06 14:25:55,678][19297] Updated weights for policy 0, policy_version 26314 (0.0025)
[2024-06-06 14:26:00,005][19065] Fps is (10 sec: 39321.8, 60 sec: 43690.7, 300 sec: 43264.9). Total num frames: 431276032. Throughput: 0: 43284.5. Samples: 22207540. Policy #0 lag: (min: 0.0, avg: 9.1, max: 21.0)
[2024-06-06 14:26:00,005][19065] Avg episode reward: [(0, '0.278')]
[2024-06-06 14:26:00,911][19297] Updated weights for policy 0, policy_version 26324 (0.0032)
[2024-06-06 14:26:03,499][19297] Updated weights for policy 0, policy_version 26334 (0.0036)
[2024-06-06 14:26:05,008][19065] Fps is (10 sec: 47498.0, 60 sec: 43415.2, 300 sec: 43542.1). Total num frames: 431538176. Throughput: 0: 43281.3. Samples: 22462120. Policy #0 lag: (min: 0.0, avg: 8.5, max: 20.0)
[2024-06-06 14:26:05,008][19065] Avg episode reward: [(0, '0.281')]
[2024-06-06 14:26:08,174][19297] Updated weights for policy 0, policy_version 26344 (0.0031)
[2024-06-06 14:26:10,005][19065] Fps is (10 sec: 42598.2, 60 sec: 43144.5, 300 sec: 43264.9). Total num frames: 431702016. Throughput: 0: 42999.6. Samples: 22597000. Policy #0 lag: (min: 0.0, avg: 8.5, max: 20.0)
[2024-06-06 14:26:10,005][19065] Avg episode reward: [(0, '0.277')]
[2024-06-06 14:26:11,081][19297] Updated weights for policy 0, policy_version 26354 (0.0047)
[2024-06-06 14:26:15,005][19065] Fps is (10 sec: 37695.6, 60 sec: 43693.0, 300 sec: 43320.4). Total num frames: 431915008. Throughput: 0: 43179.7. Samples: 22856080. Policy #0 lag: (min: 0.0, avg: 8.5, max: 20.0)
[2024-06-06 14:26:15,005][19065] Avg episode reward: [(0, '0.280')]
[2024-06-06 14:26:15,376][19297] Updated weights for policy 0, policy_version 26364 (0.0043)
[2024-06-06 14:26:18,418][19297] Updated weights for policy 0, policy_version 26374 (0.0039)
[2024-06-06 14:26:20,005][19065] Fps is (10 sec: 47513.6, 60 sec: 43144.5, 300 sec: 43487.0). Total num frames: 432177152. Throughput: 0: 43215.5. Samples: 23113260. Policy #0 lag: (min: 0.0, avg: 12.9, max: 24.0)
[2024-06-06 14:26:20,005][19065] Avg episode reward: [(0, '0.273')]
[2024-06-06 14:26:23,334][19297] Updated weights for policy 0, policy_version 26384 (0.0032)
[2024-06-06 14:26:25,005][19065] Fps is (10 sec: 44237.0, 60 sec: 43417.7, 300 sec: 43264.9). Total num frames: 432357376. Throughput: 0: 43152.6. Samples: 23250980. Policy #0 lag: (min: 0.0, avg: 12.9, max: 24.0)
[2024-06-06 14:26:25,005][19065] Avg episode reward: [(0, '0.274')]
[2024-06-06 14:26:26,227][19297] Updated weights for policy 0, policy_version 26394 (0.0031)
[2024-06-06 14:26:26,929][19277] Signal inference workers to stop experience collection... (300 times)
[2024-06-06 14:26:26,929][19277] Signal inference workers to resume experience collection... (300 times)
[2024-06-06 14:26:26,939][19297] InferenceWorker_p0-w0: stopping experience collection (300 times)
[2024-06-06 14:26:26,940][19297] InferenceWorker_p0-w0: resuming experience collection (300 times)
[2024-06-06 14:26:30,005][19065] Fps is (10 sec: 39321.2, 60 sec: 43417.6, 300 sec: 43264.9). Total num frames: 432570368. Throughput: 0: 43133.2. Samples: 23502880. Policy #0 lag: (min: 0.0, avg: 12.9, max: 24.0)
[2024-06-06 14:26:30,005][19065] Avg episode reward: [(0, '0.285')]
[2024-06-06 14:26:30,024][19277] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000026402_432570368.pth...
[2024-06-06 14:26:30,071][19277] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000025767_422166528.pth
[2024-06-06 14:26:31,013][19297] Updated weights for policy 0, policy_version 26404 (0.0041)
[2024-06-06 14:26:33,783][19297] Updated weights for policy 0, policy_version 26414 (0.0021)
[2024-06-06 14:26:35,005][19065] Fps is (10 sec: 47513.2, 60 sec: 43144.5, 300 sec: 43487.0). Total num frames: 432832512. Throughput: 0: 43278.7. Samples: 23762860. Policy #0 lag: (min: 0.0, avg: 12.9, max: 24.0)
[2024-06-06 14:26:35,005][19065] Avg episode reward: [(0, '0.270')]
[2024-06-06 14:26:38,261][19297] Updated weights for policy 0, policy_version 26424 (0.0025)
[2024-06-06 14:26:40,005][19065] Fps is (10 sec: 44237.2, 60 sec: 43417.6, 300 sec: 43320.4). Total num frames: 433012736. Throughput: 0: 43128.4. Samples: 23897260. Policy #0 lag: (min: 0.0, avg: 9.2, max: 21.0)
[2024-06-06 14:26:40,005][19065] Avg episode reward: [(0, '0.274')]
[2024-06-06 14:26:41,511][19297] Updated weights for policy 0, policy_version 26434 (0.0032)
[2024-06-06 14:26:45,008][19065] Fps is (10 sec: 39308.9, 60 sec: 43142.2, 300 sec: 43375.5). Total num frames: 433225728. Throughput: 0: 43267.9. Samples: 24154740. Policy #0 lag: (min: 0.0, avg: 9.2, max: 21.0)
[2024-06-06 14:26:45,009][19065] Avg episode reward: [(0, '0.275')]
[2024-06-06 14:26:45,662][19297] Updated weights for policy 0, policy_version 26444 (0.0049)
[2024-06-06 14:26:48,859][19297] Updated weights for policy 0, policy_version 26454 (0.0027)
[2024-06-06 14:26:50,005][19065] Fps is (10 sec: 47514.0, 60 sec: 43417.7, 300 sec: 43542.6). Total num frames: 433487872. Throughput: 0: 43347.7. Samples: 24412620. Policy #0 lag: (min: 0.0, avg: 9.2, max: 21.0)
[2024-06-06 14:26:50,005][19065] Avg episode reward: [(0, '0.270')]
[2024-06-06 14:26:53,396][19297] Updated weights for policy 0, policy_version 26464 (0.0038)
[2024-06-06 14:26:55,008][19065] Fps is (10 sec: 42598.3, 60 sec: 43142.2, 300 sec: 43264.4). Total num frames: 433651712. Throughput: 0: 43399.5. Samples: 24550120. Policy #0 lag: (min: 0.0, avg: 9.2, max: 21.0)
[2024-06-06 14:26:55,009][19065] Avg episode reward: [(0, '0.273')]
[2024-06-06 14:26:56,657][19297] Updated weights for policy 0, policy_version 26474 (0.0034)
[2024-06-06 14:27:00,005][19065] Fps is (10 sec: 37683.1, 60 sec: 43144.5, 300 sec: 43320.4). Total num frames: 433864704. Throughput: 0: 43194.2. Samples: 24799820. Policy #0 lag: (min: 0.0, avg: 9.8, max: 21.0)
[2024-06-06 14:27:00,005][19065] Avg episode reward: [(0, '0.282')]
[2024-06-06 14:27:00,908][19297] Updated weights for policy 0, policy_version 26484 (0.0037)
[2024-06-06 14:27:03,985][19297] Updated weights for policy 0, policy_version 26494 (0.0038)
[2024-06-06 14:27:05,005][19065] Fps is (10 sec: 45889.9, 60 sec: 42873.7, 300 sec: 43431.5). Total num frames: 434110464. Throughput: 0: 43291.0. Samples: 25061360. Policy #0 lag: (min: 0.0, avg: 9.8, max: 21.0)
[2024-06-06 14:27:05,005][19065] Avg episode reward: [(0, '0.297')]
[2024-06-06 14:27:05,042][19277] Saving new best policy, reward=0.297!
[2024-06-06 14:27:08,322][19297] Updated weights for policy 0, policy_version 26504 (0.0030)
[2024-06-06 14:27:10,005][19065] Fps is (10 sec: 44235.9, 60 sec: 43417.5, 300 sec: 43375.9). Total num frames: 434307072. Throughput: 0: 43239.3. Samples: 25196760. Policy #0 lag: (min: 0.0, avg: 9.8, max: 21.0)
[2024-06-06 14:27:10,005][19065] Avg episode reward: [(0, '0.273')]
[2024-06-06 14:27:11,773][19297] Updated weights for policy 0, policy_version 26514 (0.0042)
[2024-06-06 14:27:15,005][19065] Fps is (10 sec: 42598.3, 60 sec: 43690.6, 300 sec: 43375.9). Total num frames: 434536448. Throughput: 0: 43356.9. Samples: 25453940. Policy #0 lag: (min: 0.0, avg: 12.0, max: 22.0)
[2024-06-06 14:27:15,006][19065] Avg episode reward: [(0, '0.283')]
[2024-06-06 14:27:15,714][19297] Updated weights for policy 0, policy_version 26524 (0.0033)
[2024-06-06 14:27:19,146][19297] Updated weights for policy 0, policy_version 26534 (0.0031)
[2024-06-06 14:27:20,005][19065] Fps is (10 sec: 45875.6, 60 sec: 43144.5, 300 sec: 43432.0). Total num frames: 434765824. Throughput: 0: 43348.4. Samples: 25713540. Policy #0 lag: (min: 0.0, avg: 12.0, max: 22.0)
[2024-06-06 14:27:20,005][19065] Avg episode reward: [(0, '0.280')]
[2024-06-06 14:27:23,537][19297] Updated weights for policy 0, policy_version 26544 (0.0031)
[2024-06-06 14:27:25,005][19065] Fps is (10 sec: 40960.0, 60 sec: 43144.4, 300 sec: 43320.4). Total num frames: 434946048. Throughput: 0: 43363.9. Samples: 25848640. Policy #0 lag: (min: 0.0, avg: 12.0, max: 22.0)
[2024-06-06 14:27:25,005][19065] Avg episode reward: [(0, '0.273')]
[2024-06-06 14:27:27,062][19297] Updated weights for policy 0, policy_version 26554 (0.0040)
[2024-06-06 14:27:30,005][19065] Fps is (10 sec: 39321.4, 60 sec: 43144.5, 300 sec: 43375.9). Total num frames: 435159040. Throughput: 0: 43355.5. Samples: 26105600. Policy #0 lag: (min: 0.0, avg: 12.0, max: 22.0)
[2024-06-06 14:27:30,006][19065] Avg episode reward: [(0, '0.280')]
[2024-06-06 14:27:30,990][19297] Updated weights for policy 0, policy_version 26564 (0.0029)
[2024-06-06 14:27:34,390][19297] Updated weights for policy 0, policy_version 26574 (0.0034)
[2024-06-06 14:27:35,005][19065] Fps is (10 sec: 45875.6, 60 sec: 42871.5, 300 sec: 43375.9). Total num frames: 435404800. Throughput: 0: 43345.3. Samples: 26363160. Policy #0 lag: (min: 1.0, avg: 10.6, max: 21.0)
[2024-06-06 14:27:35,005][19065] Avg episode reward: [(0, '0.269')]
[2024-06-06 14:27:38,463][19297] Updated weights for policy 0, policy_version 26584 (0.0029)
[2024-06-06 14:27:40,005][19065] Fps is (10 sec: 44237.6, 60 sec: 43144.6, 300 sec: 43320.4). Total num frames: 435601408. Throughput: 0: 43213.5. Samples: 26494580. Policy #0 lag: (min: 1.0, avg: 10.6, max: 21.0)
[2024-06-06 14:27:40,005][19065] Avg episode reward: [(0, '0.280')]
[2024-06-06 14:27:41,246][19277] Signal inference workers to stop experience collection... (350 times)
[2024-06-06 14:27:41,283][19297] InferenceWorker_p0-w0: stopping experience collection (350 times)
[2024-06-06 14:27:41,293][19277] Signal inference workers to resume experience collection... (350 times)
[2024-06-06 14:27:41,304][19297] InferenceWorker_p0-w0: resuming experience collection (350 times)
[2024-06-06 14:27:42,013][19297] Updated weights for policy 0, policy_version 26594 (0.0031)
[2024-06-06 14:27:45,005][19065] Fps is (10 sec: 42597.4, 60 sec: 43419.8, 300 sec: 43375.9). Total num frames: 435830784. Throughput: 0: 43384.2. Samples: 26752120. Policy #0 lag: (min: 1.0, avg: 10.6, max: 21.0)
[2024-06-06 14:27:45,005][19065] Avg episode reward: [(0, '0.273')]
[2024-06-06 14:27:46,026][19297] Updated weights for policy 0, policy_version 26604 (0.0030)
[2024-06-06 14:27:49,550][19297] Updated weights for policy 0, policy_version 26614 (0.0032)
[2024-06-06 14:27:50,005][19065] Fps is (10 sec: 45875.1, 60 sec: 42871.4, 300 sec: 43431.5). Total num frames: 436060160. Throughput: 0: 43300.6. Samples: 27009880. Policy #0 lag: (min: 0.0, avg: 9.3, max: 22.0)
[2024-06-06 14:27:50,005][19065] Avg episode reward: [(0, '0.278')]
[2024-06-06 14:27:53,627][19297] Updated weights for policy 0, policy_version 26624 (0.0026)
[2024-06-06 14:27:55,005][19065] Fps is (10 sec: 42599.4, 60 sec: 43420.0, 300 sec: 43320.4). Total num frames: 436256768. Throughput: 0: 43214.4. Samples: 27141400. Policy #0 lag: (min: 0.0, avg: 9.3, max: 22.0)
[2024-06-06 14:27:55,005][19065] Avg episode reward: [(0, '0.272')]
[2024-06-06 14:27:57,572][19297] Updated weights for policy 0, policy_version 26634 (0.0033)
[2024-06-06 14:28:00,008][19065] Fps is (10 sec: 40946.5, 60 sec: 43415.2, 300 sec: 43375.5). Total num frames: 436469760. Throughput: 0: 43172.5. Samples: 27396840. Policy #0 lag: (min: 0.0, avg: 9.3, max: 22.0)
[2024-06-06 14:28:00,009][19065] Avg episode reward: [(0, '0.280')]
[2024-06-06 14:28:01,578][19297] Updated weights for policy 0, policy_version 26644 (0.0033)
[2024-06-06 14:28:05,005][19065] Fps is (10 sec: 42598.8, 60 sec: 42871.6, 300 sec: 43264.9). Total num frames: 436682752. Throughput: 0: 43217.9. Samples: 27658340. Policy #0 lag: (min: 0.0, avg: 9.3, max: 22.0)
[2024-06-06 14:28:05,005][19065] Avg episode reward: [(0, '0.284')]
[2024-06-06 14:28:05,026][19297] Updated weights for policy 0, policy_version 26654 (0.0031)
[2024-06-06 14:28:08,875][19297] Updated weights for policy 0, policy_version 26664 (0.0037)
[2024-06-06 14:28:10,005][19065] Fps is (10 sec: 42612.0, 60 sec: 43144.6, 300 sec: 43320.4). Total num frames: 436895744. Throughput: 0: 43038.7. Samples: 27785380. Policy #0 lag: (min: 0.0, avg: 11.2, max: 23.0)
[2024-06-06 14:28:10,005][19065] Avg episode reward: [(0, '0.266')]
[2024-06-06 14:28:12,723][19297] Updated weights for policy 0, policy_version 26674 (0.0037)
[2024-06-06 14:28:15,005][19065] Fps is (10 sec: 45874.8, 60 sec: 43417.7, 300 sec: 43376.4). Total num frames: 437141504. Throughput: 0: 43139.2. Samples: 28046860. Policy #0 lag: (min: 0.0, avg: 11.2, max: 23.0)
[2024-06-06 14:28:15,005][19065] Avg episode reward: [(0, '0.274')]
[2024-06-06 14:28:16,685][19297] Updated weights for policy 0, policy_version 26684 (0.0032)
[2024-06-06 14:28:20,005][19065] Fps is (10 sec: 44236.9, 60 sec: 42871.5, 300 sec: 43320.4). Total num frames: 437338112. Throughput: 0: 43193.8. Samples: 28306880. Policy #0 lag: (min: 0.0, avg: 11.2, max: 23.0)
[2024-06-06 14:28:20,005][19065] Avg episode reward: [(0, '0.289')]
[2024-06-06 14:28:20,309][19297] Updated weights for policy 0, policy_version 26694 (0.0033)
[2024-06-06 14:28:24,067][19297] Updated weights for policy 0, policy_version 26704 (0.0028)
[2024-06-06 14:28:25,005][19065] Fps is (10 sec: 42598.3, 60 sec: 43690.7, 300 sec: 43320.4). Total num frames: 437567488. Throughput: 0: 43021.2. Samples: 28430540. Policy #0 lag: (min: 0.0, avg: 10.8, max: 21.0)
[2024-06-06 14:28:25,005][19065] Avg episode reward: [(0, '0.285')]
[2024-06-06 14:28:28,173][19297] Updated weights for policy 0, policy_version 26714 (0.0024)
[2024-06-06 14:28:30,005][19065] Fps is (10 sec: 42597.8, 60 sec: 43417.6, 300 sec: 43320.4). Total num frames: 437764096. Throughput: 0: 43033.4. Samples: 28688620. Policy #0 lag: (min: 0.0, avg: 10.8, max: 21.0)
[2024-06-06 14:28:30,006][19065] Avg episode reward: [(0, '0.289')]
[2024-06-06 14:28:30,023][19277] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000026719_437764096.pth...
[2024-06-06 14:28:30,087][19277] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000026085_427376640.pth
[2024-06-06 14:28:31,716][19297] Updated weights for policy 0, policy_version 26724 (0.0035)
[2024-06-06 14:28:35,005][19065] Fps is (10 sec: 40959.7, 60 sec: 42871.4, 300 sec: 43209.3). Total num frames: 437977088. Throughput: 0: 43035.4. Samples: 28946480. Policy #0 lag: (min: 0.0, avg: 10.8, max: 21.0)
[2024-06-06 14:28:35,005][19065] Avg episode reward: [(0, '0.274')]
[2024-06-06 14:28:35,687][19297] Updated weights for policy 0, policy_version 26734 (0.0034)
[2024-06-06 14:28:39,228][19297] Updated weights for policy 0, policy_version 26744 (0.0036)
[2024-06-06 14:28:40,005][19065] Fps is (10 sec: 42599.0, 60 sec: 43144.5, 300 sec: 43264.9). Total num frames: 438190080. Throughput: 0: 42866.7. Samples: 29070400. Policy #0 lag: (min: 0.0, avg: 10.8, max: 21.0)
[2024-06-06 14:28:40,005][19065] Avg episode reward: [(0, '0.285')]
[2024-06-06 14:28:43,357][19297] Updated weights for policy 0, policy_version 26754 (0.0043)
[2024-06-06 14:28:45,005][19065] Fps is (10 sec: 44237.0, 60 sec: 43144.7, 300 sec: 43320.4). Total num frames: 438419456. Throughput: 0: 43039.9. Samples: 29333500. Policy #0 lag: (min: 0.0, avg: 9.9, max: 21.0)
[2024-06-06 14:28:45,005][19065] Avg episode reward: [(0, '0.287')]
[2024-06-06 14:28:46,693][19297] Updated weights for policy 0, policy_version 26764 (0.0040)
[2024-06-06 14:28:50,005][19065] Fps is (10 sec: 40960.2, 60 sec: 42325.3, 300 sec: 43153.8). Total num frames: 438599680. Throughput: 0: 42821.7. Samples: 29585320. Policy #0 lag: (min: 0.0, avg: 9.9, max: 21.0)
[2024-06-06 14:28:50,005][19065] Avg episode reward: [(0, '0.290')]
[2024-06-06 14:28:51,163][19297] Updated weights for policy 0, policy_version 26774 (0.0038)
[2024-06-06 14:28:54,574][19297] Updated weights for policy 0, policy_version 26784 (0.0034)
[2024-06-06 14:28:55,005][19065] Fps is (10 sec: 42598.5, 60 sec: 43144.5, 300 sec: 43320.5). Total num frames: 438845440. Throughput: 0: 42878.7. Samples: 29714920. Policy #0 lag: (min: 0.0, avg: 9.9, max: 21.0)
[2024-06-06 14:28:55,005][19065] Avg episode reward: [(0, '0.277')]
[2024-06-06 14:28:59,103][19297] Updated weights for policy 0, policy_version 26794 (0.0030)
[2024-06-06 14:29:00,005][19065] Fps is (10 sec: 44236.2, 60 sec: 42873.7, 300 sec: 43209.3). Total num frames: 439042048. Throughput: 0: 42753.7. Samples: 29970780. Policy #0 lag: (min: 0.0, avg: 9.1, max: 21.0)
[2024-06-06 14:29:00,005][19065] Avg episode reward: [(0, '0.274')]
[2024-06-06 14:29:02,272][19297] Updated weights for policy 0, policy_version 26804 (0.0037)
[2024-06-06 14:29:05,005][19065] Fps is (10 sec: 40960.4, 60 sec: 42871.5, 300 sec: 43154.3). Total num frames: 439255040. Throughput: 0: 42695.2. Samples: 30228160. Policy #0 lag: (min: 0.0, avg: 9.1, max: 21.0)
[2024-06-06 14:29:05,005][19065] Avg episode reward: [(0, '0.283')]
[2024-06-06 14:29:06,630][19297] Updated weights for policy 0, policy_version 26814 (0.0036)
[2024-06-06 14:29:09,775][19297] Updated weights for policy 0, policy_version 26824 (0.0035)
[2024-06-06 14:29:10,005][19065] Fps is (10 sec: 44237.4, 60 sec: 43144.6, 300 sec: 43320.4). Total num frames: 439484416. Throughput: 0: 42752.5. Samples: 30354400. Policy #0 lag: (min: 0.0, avg: 9.1, max: 21.0)
[2024-06-06 14:29:10,005][19065] Avg episode reward: [(0, '0.286')]
[2024-06-06 14:29:14,166][19297] Updated weights for policy 0, policy_version 26834 (0.0030)
[2024-06-06 14:29:15,005][19065] Fps is (10 sec: 44236.8, 60 sec: 42598.5, 300 sec: 43209.3). Total num frames: 439697408. Throughput: 0: 42970.0. Samples: 30622260. Policy #0 lag: (min: 0.0, avg: 9.1, max: 21.0)
[2024-06-06 14:29:15,005][19065] Avg episode reward: [(0, '0.283')]
[2024-06-06 14:29:17,127][19297] Updated weights for policy 0, policy_version 26844 (0.0037)
[2024-06-06 14:29:19,236][19277] Signal inference workers to stop experience collection... (400 times)
[2024-06-06 14:29:19,286][19297] InferenceWorker_p0-w0: stopping experience collection (400 times)
[2024-06-06 14:29:19,347][19277] Signal inference workers to resume experience collection... (400 times)
[2024-06-06 14:29:19,348][19297] InferenceWorker_p0-w0: resuming experience collection (400 times)
[2024-06-06 14:29:20,005][19065] Fps is (10 sec: 42598.3, 60 sec: 42871.5, 300 sec: 43153.8). Total num frames: 439910400. Throughput: 0: 42786.8. Samples: 30871880. Policy #0 lag: (min: 1.0, avg: 11.2, max: 21.0)
[2024-06-06 14:29:20,005][19065] Avg episode reward: [(0, '0.277')]
[2024-06-06 14:29:21,547][19297] Updated weights for policy 0, policy_version 26854 (0.0040)
[2024-06-06 14:29:25,004][19065] Fps is (10 sec: 42598.7, 60 sec: 42598.5, 300 sec: 43209.4). Total num frames: 440123392. Throughput: 0: 42940.6. Samples: 31002720. Policy #0 lag: (min: 1.0, avg: 11.2, max: 21.0)
[2024-06-06 14:29:25,005][19065] Avg episode reward: [(0, '0.286')]
[2024-06-06 14:29:25,124][19297] Updated weights for policy 0, policy_version 26864 (0.0033)
[2024-06-06 14:29:29,568][19297] Updated weights for policy 0, policy_version 26874 (0.0045)
[2024-06-06 14:29:30,008][19065] Fps is (10 sec: 40946.7, 60 sec: 42596.2, 300 sec: 43153.3). Total num frames: 440320000. Throughput: 0: 42646.7. Samples: 31252740. Policy #0 lag: (min: 1.0, avg: 11.2, max: 21.0)
[2024-06-06 14:29:30,008][19065] Avg episode reward: [(0, '0.283')]
[2024-06-06 14:29:33,075][19297] Updated weights for policy 0, policy_version 26884 (0.0032)
[2024-06-06 14:29:35,005][19065] Fps is (10 sec: 40959.4, 60 sec: 42598.5, 300 sec: 43098.3). Total num frames: 440532992. Throughput: 0: 42708.0. Samples: 31507180. Policy #0 lag: (min: 0.0, avg: 10.3, max: 23.0)
[2024-06-06 14:29:35,005][19065] Avg episode reward: [(0, '0.289')]
[2024-06-06 14:29:37,287][19297] Updated weights for policy 0, policy_version 26894 (0.0031)
[2024-06-06 14:29:40,005][19065] Fps is (10 sec: 44251.5, 60 sec: 42871.5, 300 sec: 43209.3). Total num frames: 440762368. Throughput: 0: 42754.3. Samples: 31638860. Policy #0 lag: (min: 0.0, avg: 10.3, max: 23.0)
[2024-06-06 14:29:40,005][19065] Avg episode reward: [(0, '0.290')]
[2024-06-06 14:29:40,459][19297] Updated weights for policy 0, policy_version 26904 (0.0038)
[2024-06-06 14:29:44,964][19297] Updated weights for policy 0, policy_version 26914 (0.0033)
[2024-06-06 14:29:45,005][19065] Fps is (10 sec: 42598.6, 60 sec: 42325.4, 300 sec: 43042.7). Total num frames: 440958976. Throughput: 0: 42881.0. Samples: 31900420. Policy #0 lag: (min: 0.0, avg: 10.3, max: 23.0)
[2024-06-06 14:29:45,005][19065] Avg episode reward: [(0, '0.278')]
[2024-06-06 14:29:47,871][19297] Updated weights for policy 0, policy_version 26924 (0.0024)
[2024-06-06 14:29:50,005][19065] Fps is (10 sec: 44236.3, 60 sec: 43417.5, 300 sec: 43153.8). Total num frames: 441204736. Throughput: 0: 42613.7. Samples: 32145780. Policy #0 lag: (min: 0.0, avg: 10.3, max: 23.0)
[2024-06-06 14:29:50,005][19065] Avg episode reward: [(0, '0.294')]
[2024-06-06 14:29:52,565][19297] Updated weights for policy 0, policy_version 26934 (0.0039)
[2024-06-06 14:29:55,005][19065] Fps is (10 sec: 44236.8, 60 sec: 42598.5, 300 sec: 43209.3). Total num frames: 441401344. Throughput: 0: 42740.9. Samples: 32277740. Policy #0 lag: (min: 0.0, avg: 9.2, max: 21.0)
[2024-06-06 14:29:55,005][19065] Avg episode reward: [(0, '0.284')]
[2024-06-06 14:29:55,657][19297] Updated weights for policy 0, policy_version 26944 (0.0036)
[2024-06-06 14:30:00,005][19065] Fps is (10 sec: 37682.7, 60 sec: 42325.3, 300 sec: 42876.1). Total num frames: 441581568. Throughput: 0: 42372.6. Samples: 32529040.
Policy #0 lag: (min: 0.0, avg: 9.2, max: 21.0) [2024-06-06 14:30:00,005][19065] Avg episode reward: [(0, '0.290')] [2024-06-06 14:30:00,759][19297] Updated weights for policy 0, policy_version 26954 (0.0035) [2024-06-06 14:30:03,836][19297] Updated weights for policy 0, policy_version 26964 (0.0028) [2024-06-06 14:30:05,005][19065] Fps is (10 sec: 40960.3, 60 sec: 42598.4, 300 sec: 43042.7). Total num frames: 441810944. Throughput: 0: 42396.1. Samples: 32779700. Policy #0 lag: (min: 0.0, avg: 9.2, max: 21.0) [2024-06-06 14:30:05,005][19065] Avg episode reward: [(0, '0.275')] [2024-06-06 14:30:08,380][19297] Updated weights for policy 0, policy_version 26974 (0.0023) [2024-06-06 14:30:10,005][19065] Fps is (10 sec: 45875.1, 60 sec: 42598.2, 300 sec: 43209.8). Total num frames: 442040320. Throughput: 0: 42429.4. Samples: 32912060. Policy #0 lag: (min: 0.0, avg: 9.2, max: 21.0) [2024-06-06 14:30:10,005][19065] Avg episode reward: [(0, '0.278')] [2024-06-06 14:30:11,469][19297] Updated weights for policy 0, policy_version 26984 (0.0028) [2024-06-06 14:30:15,005][19065] Fps is (10 sec: 39321.1, 60 sec: 41779.1, 300 sec: 42765.0). Total num frames: 442204160. Throughput: 0: 42350.1. Samples: 33158360. Policy #0 lag: (min: 0.0, avg: 12.0, max: 21.0) [2024-06-06 14:30:15,005][19065] Avg episode reward: [(0, '0.281')] [2024-06-06 14:30:16,100][19297] Updated weights for policy 0, policy_version 26994 (0.0038) [2024-06-06 14:30:18,958][19297] Updated weights for policy 0, policy_version 27004 (0.0032) [2024-06-06 14:30:20,005][19065] Fps is (10 sec: 40960.7, 60 sec: 42325.3, 300 sec: 43042.7). Total num frames: 442449920. Throughput: 0: 42321.7. Samples: 33411660. Policy #0 lag: (min: 0.0, avg: 12.0, max: 21.0) [2024-06-06 14:30:20,005][19065] Avg episode reward: [(0, '0.286')] [2024-06-06 14:30:23,845][19297] Updated weights for policy 0, policy_version 27014 (0.0033) [2024-06-06 14:30:25,005][19065] Fps is (10 sec: 44236.9, 60 sec: 42052.2, 300 sec: 42987.2). 
Total num frames: 442646528. Throughput: 0: 42395.9. Samples: 33546680. Policy #0 lag: (min: 0.0, avg: 12.0, max: 21.0) [2024-06-06 14:30:25,006][19065] Avg episode reward: [(0, '0.286')] [2024-06-06 14:30:26,709][19297] Updated weights for policy 0, policy_version 27024 (0.0022) [2024-06-06 14:30:30,005][19065] Fps is (10 sec: 40959.6, 60 sec: 42327.5, 300 sec: 42765.0). Total num frames: 442859520. Throughput: 0: 42028.3. Samples: 33791700. Policy #0 lag: (min: 0.0, avg: 10.6, max: 21.0) [2024-06-06 14:30:30,005][19065] Avg episode reward: [(0, '0.283')] [2024-06-06 14:30:30,015][19277] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000027030_442859520.pth... [2024-06-06 14:30:30,077][19277] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000026402_432570368.pth [2024-06-06 14:30:31,650][19297] Updated weights for policy 0, policy_version 27034 (0.0027) [2024-06-06 14:30:34,499][19297] Updated weights for policy 0, policy_version 27044 (0.0036) [2024-06-06 14:30:35,008][19065] Fps is (10 sec: 44222.3, 60 sec: 42596.1, 300 sec: 42986.7). Total num frames: 443088896. Throughput: 0: 42192.1. Samples: 34044560. Policy #0 lag: (min: 0.0, avg: 10.6, max: 21.0) [2024-06-06 14:30:35,009][19065] Avg episode reward: [(0, '0.280')] [2024-06-06 14:30:39,491][19297] Updated weights for policy 0, policy_version 27054 (0.0035) [2024-06-06 14:30:40,005][19065] Fps is (10 sec: 40960.5, 60 sec: 41779.1, 300 sec: 42820.6). Total num frames: 443269120. Throughput: 0: 42204.0. Samples: 34176920. Policy #0 lag: (min: 0.0, avg: 10.6, max: 21.0) [2024-06-06 14:30:40,005][19065] Avg episode reward: [(0, '0.286')] [2024-06-06 14:30:42,425][19297] Updated weights for policy 0, policy_version 27064 (0.0029) [2024-06-06 14:30:45,005][19065] Fps is (10 sec: 40973.5, 60 sec: 42325.3, 300 sec: 42765.0). Total num frames: 443498496. Throughput: 0: 42174.9. Samples: 34426900. 
Policy #0 lag: (min: 0.0, avg: 10.6, max: 21.0) [2024-06-06 14:30:45,008][19065] Avg episode reward: [(0, '0.283')] [2024-06-06 14:30:47,078][19277] Signal inference workers to stop experience collection... (450 times) [2024-06-06 14:30:47,078][19277] Signal inference workers to resume experience collection... (450 times) [2024-06-06 14:30:47,097][19297] InferenceWorker_p0-w0: stopping experience collection (450 times) [2024-06-06 14:30:47,097][19297] InferenceWorker_p0-w0: resuming experience collection (450 times) [2024-06-06 14:30:47,219][19297] Updated weights for policy 0, policy_version 27074 (0.0039) [2024-06-06 14:30:49,924][19297] Updated weights for policy 0, policy_version 27084 (0.0041) [2024-06-06 14:30:50,005][19065] Fps is (10 sec: 47513.2, 60 sec: 42325.3, 300 sec: 42987.2). Total num frames: 443744256. Throughput: 0: 42138.0. Samples: 34675920. Policy #0 lag: (min: 0.0, avg: 7.9, max: 21.0) [2024-06-06 14:30:50,005][19065] Avg episode reward: [(0, '0.287')] [2024-06-06 14:30:55,005][19065] Fps is (10 sec: 39321.2, 60 sec: 41506.0, 300 sec: 42765.0). Total num frames: 443891712. Throughput: 0: 42101.9. Samples: 34806640. Policy #0 lag: (min: 0.0, avg: 7.9, max: 21.0) [2024-06-06 14:30:55,005][19065] Avg episode reward: [(0, '0.287')] [2024-06-06 14:30:55,077][19297] Updated weights for policy 0, policy_version 27094 (0.0038) [2024-06-06 14:30:57,796][19297] Updated weights for policy 0, policy_version 27104 (0.0032) [2024-06-06 14:31:00,008][19065] Fps is (10 sec: 37671.3, 60 sec: 42323.2, 300 sec: 42653.9). Total num frames: 444121088. Throughput: 0: 42099.7. Samples: 35052980. Policy #0 lag: (min: 0.0, avg: 7.9, max: 21.0) [2024-06-06 14:31:00,009][19065] Avg episode reward: [(0, '0.293')] [2024-06-06 14:31:02,740][19297] Updated weights for policy 0, policy_version 27114 (0.0030) [2024-06-06 14:31:05,005][19065] Fps is (10 sec: 47514.2, 60 sec: 42598.4, 300 sec: 42931.6). Total num frames: 444366848. Throughput: 0: 42065.9. Samples: 35304620. 
Policy #0 lag: (min: 1.0, avg: 11.3, max: 24.0) [2024-06-06 14:31:05,005][19065] Avg episode reward: [(0, '0.280')] [2024-06-06 14:31:05,700][19297] Updated weights for policy 0, policy_version 27124 (0.0037) [2024-06-06 14:31:10,005][19065] Fps is (10 sec: 40973.3, 60 sec: 41506.3, 300 sec: 42765.0). Total num frames: 444530688. Throughput: 0: 41984.9. Samples: 35436000. Policy #0 lag: (min: 1.0, avg: 11.3, max: 24.0) [2024-06-06 14:31:10,005][19065] Avg episode reward: [(0, '0.272')] [2024-06-06 14:31:10,809][19297] Updated weights for policy 0, policy_version 27134 (0.0038) [2024-06-06 14:31:13,396][19297] Updated weights for policy 0, policy_version 27144 (0.0040) [2024-06-06 14:31:15,005][19065] Fps is (10 sec: 40960.0, 60 sec: 42871.5, 300 sec: 42709.5). Total num frames: 444776448. Throughput: 0: 42090.4. Samples: 35685760. Policy #0 lag: (min: 1.0, avg: 11.3, max: 24.0) [2024-06-06 14:31:15,005][19065] Avg episode reward: [(0, '0.290')] [2024-06-06 14:31:18,396][19297] Updated weights for policy 0, policy_version 27154 (0.0033) [2024-06-06 14:31:20,005][19065] Fps is (10 sec: 42598.1, 60 sec: 41779.2, 300 sec: 42709.5). Total num frames: 444956672. Throughput: 0: 42247.5. Samples: 35945560. Policy #0 lag: (min: 1.0, avg: 11.3, max: 24.0) [2024-06-06 14:31:20,005][19065] Avg episode reward: [(0, '0.300')] [2024-06-06 14:31:20,047][19277] Saving new best policy, reward=0.300! [2024-06-06 14:31:21,116][19297] Updated weights for policy 0, policy_version 27164 (0.0046) [2024-06-06 14:31:25,005][19065] Fps is (10 sec: 39321.4, 60 sec: 42052.3, 300 sec: 42709.5). Total num frames: 445169664. Throughput: 0: 41945.3. Samples: 36064460. 
Policy #0 lag: (min: 0.0, avg: 11.9, max: 24.0) [2024-06-06 14:31:25,005][19065] Avg episode reward: [(0, '0.285')] [2024-06-06 14:31:26,306][19297] Updated weights for policy 0, policy_version 27174 (0.0038) [2024-06-06 14:31:28,864][19297] Updated weights for policy 0, policy_version 27184 (0.0035) [2024-06-06 14:31:30,005][19065] Fps is (10 sec: 45875.6, 60 sec: 42598.5, 300 sec: 42653.9). Total num frames: 445415424. Throughput: 0: 42098.7. Samples: 36321340. Policy #0 lag: (min: 0.0, avg: 11.9, max: 24.0) [2024-06-06 14:31:30,005][19065] Avg episode reward: [(0, '0.289')] [2024-06-06 14:31:34,010][19297] Updated weights for policy 0, policy_version 27194 (0.0027) [2024-06-06 14:31:35,005][19065] Fps is (10 sec: 40960.4, 60 sec: 41508.5, 300 sec: 42598.4). Total num frames: 445579264. Throughput: 0: 42149.9. Samples: 36572660. Policy #0 lag: (min: 0.0, avg: 11.9, max: 24.0) [2024-06-06 14:31:35,005][19065] Avg episode reward: [(0, '0.285')] [2024-06-06 14:31:36,755][19297] Updated weights for policy 0, policy_version 27204 (0.0035) [2024-06-06 14:31:40,005][19065] Fps is (10 sec: 39321.3, 60 sec: 42325.3, 300 sec: 42654.4). Total num frames: 445808640. Throughput: 0: 41982.7. Samples: 36695860. Policy #0 lag: (min: 0.0, avg: 11.9, max: 24.0) [2024-06-06 14:31:40,005][19065] Avg episode reward: [(0, '0.283')] [2024-06-06 14:31:41,972][19297] Updated weights for policy 0, policy_version 27214 (0.0032) [2024-06-06 14:31:44,695][19297] Updated weights for policy 0, policy_version 27224 (0.0037) [2024-06-06 14:31:45,005][19065] Fps is (10 sec: 47513.1, 60 sec: 42598.4, 300 sec: 42598.4). Total num frames: 446054400. Throughput: 0: 42187.9. Samples: 36951300. Policy #0 lag: (min: 0.0, avg: 9.0, max: 22.0) [2024-06-06 14:31:45,005][19065] Avg episode reward: [(0, '0.293')] [2024-06-06 14:31:49,698][19297] Updated weights for policy 0, policy_version 27234 (0.0039) [2024-06-06 14:31:50,005][19065] Fps is (10 sec: 40960.1, 60 sec: 41233.1, 300 sec: 42598.9). 
Total num frames: 446218240. Throughput: 0: 42355.9. Samples: 37210640. Policy #0 lag: (min: 0.0, avg: 9.0, max: 22.0) [2024-06-06 14:31:50,005][19065] Avg episode reward: [(0, '0.277')] [2024-06-06 14:31:52,275][19297] Updated weights for policy 0, policy_version 27244 (0.0024) [2024-06-06 14:31:52,276][19277] Signal inference workers to stop experience collection... (500 times) [2024-06-06 14:31:52,276][19277] Signal inference workers to resume experience collection... (500 times) [2024-06-06 14:31:52,314][19297] InferenceWorker_p0-w0: stopping experience collection (500 times) [2024-06-06 14:31:52,314][19297] InferenceWorker_p0-w0: resuming experience collection (500 times) [2024-06-06 14:31:55,006][19065] Fps is (10 sec: 39315.1, 60 sec: 42597.3, 300 sec: 42653.7). Total num frames: 446447616. Throughput: 0: 42028.7. Samples: 37327360. Policy #0 lag: (min: 0.0, avg: 9.0, max: 22.0) [2024-06-06 14:31:55,007][19065] Avg episode reward: [(0, '0.287')] [2024-06-06 14:31:57,057][19297] Updated weights for policy 0, policy_version 27254 (0.0037) [2024-06-06 14:32:00,005][19065] Fps is (10 sec: 44236.8, 60 sec: 42327.6, 300 sec: 42542.9). Total num frames: 446660608. Throughput: 0: 42274.6. Samples: 37588120. Policy #0 lag: (min: 0.0, avg: 10.4, max: 21.0) [2024-06-06 14:32:00,005][19065] Avg episode reward: [(0, '0.291')] [2024-06-06 14:32:00,232][19297] Updated weights for policy 0, policy_version 27264 (0.0034) [2024-06-06 14:32:04,970][19297] Updated weights for policy 0, policy_version 27274 (0.0030) [2024-06-06 14:32:05,005][19065] Fps is (10 sec: 40966.3, 60 sec: 41506.0, 300 sec: 42542.9). Total num frames: 446857216. Throughput: 0: 41969.3. Samples: 37834180. Policy #0 lag: (min: 0.0, avg: 10.4, max: 21.0) [2024-06-06 14:32:05,005][19065] Avg episode reward: [(0, '0.292')] [2024-06-06 14:32:07,852][19297] Updated weights for policy 0, policy_version 27284 (0.0032) [2024-06-06 14:32:10,005][19065] Fps is (10 sec: 40960.4, 60 sec: 42325.4, 300 sec: 42487.3). 
Total num frames: 447070208. Throughput: 0: 42096.5. Samples: 37958800. Policy #0 lag: (min: 0.0, avg: 10.4, max: 21.0) [2024-06-06 14:32:10,005][19065] Avg episode reward: [(0, '0.299')] [2024-06-06 14:32:13,208][19297] Updated weights for policy 0, policy_version 27294 (0.0031) [2024-06-06 14:32:15,005][19065] Fps is (10 sec: 44237.5, 60 sec: 42052.2, 300 sec: 42487.3). Total num frames: 447299584. Throughput: 0: 42176.0. Samples: 38219260. Policy #0 lag: (min: 0.0, avg: 10.4, max: 21.0) [2024-06-06 14:32:15,005][19065] Avg episode reward: [(0, '0.279')] [2024-06-06 14:32:15,593][19297] Updated weights for policy 0, policy_version 27304 (0.0026) [2024-06-06 14:32:20,008][19065] Fps is (10 sec: 40944.2, 60 sec: 42049.7, 300 sec: 42486.8). Total num frames: 447479808. Throughput: 0: 42204.4. Samples: 38472020. Policy #0 lag: (min: 0.0, avg: 10.9, max: 22.0) [2024-06-06 14:32:20,009][19065] Avg episode reward: [(0, '0.286')] [2024-06-06 14:32:20,812][19297] Updated weights for policy 0, policy_version 27314 (0.0039) [2024-06-06 14:32:23,440][19297] Updated weights for policy 0, policy_version 27324 (0.0026) [2024-06-06 14:32:25,005][19065] Fps is (10 sec: 42598.3, 60 sec: 42598.4, 300 sec: 42598.4). Total num frames: 447725568. Throughput: 0: 42122.7. Samples: 38591380. Policy #0 lag: (min: 0.0, avg: 10.9, max: 22.0) [2024-06-06 14:32:25,005][19065] Avg episode reward: [(0, '0.281')] [2024-06-06 14:32:28,180][19297] Updated weights for policy 0, policy_version 27334 (0.0027) [2024-06-06 14:32:30,005][19065] Fps is (10 sec: 42614.7, 60 sec: 41506.1, 300 sec: 42376.2). Total num frames: 447905792. Throughput: 0: 42169.8. Samples: 38848940. Policy #0 lag: (min: 0.0, avg: 10.9, max: 22.0) [2024-06-06 14:32:30,005][19065] Avg episode reward: [(0, '0.287')] [2024-06-06 14:32:30,062][19277] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000027339_447922176.pth... 
[2024-06-06 14:32:30,110][19277] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000026719_437764096.pth [2024-06-06 14:32:31,523][19297] Updated weights for policy 0, policy_version 27344 (0.0031) [2024-06-06 14:32:35,005][19065] Fps is (10 sec: 39321.8, 60 sec: 42325.3, 300 sec: 42431.8). Total num frames: 448118784. Throughput: 0: 41975.2. Samples: 39099520. Policy #0 lag: (min: 0.0, avg: 10.9, max: 22.0) [2024-06-06 14:32:35,005][19065] Avg episode reward: [(0, '0.281')] [2024-06-06 14:32:36,270][19297] Updated weights for policy 0, policy_version 27354 (0.0041) [2024-06-06 14:32:39,143][19297] Updated weights for policy 0, policy_version 27364 (0.0038) [2024-06-06 14:32:40,005][19065] Fps is (10 sec: 44236.3, 60 sec: 42325.3, 300 sec: 42431.8). Total num frames: 448348160. Throughput: 0: 42244.6. Samples: 39228300. Policy #0 lag: (min: 0.0, avg: 9.9, max: 23.0) [2024-06-06 14:32:40,005][19065] Avg episode reward: [(0, '0.282')] [2024-06-06 14:32:44,106][19297] Updated weights for policy 0, policy_version 27374 (0.0041) [2024-06-06 14:32:45,008][19065] Fps is (10 sec: 42582.1, 60 sec: 41503.5, 300 sec: 42320.2). Total num frames: 448544768. Throughput: 0: 42045.0. Samples: 39480300. Policy #0 lag: (min: 0.0, avg: 9.9, max: 23.0) [2024-06-06 14:32:45,009][19065] Avg episode reward: [(0, '0.290')] [2024-06-06 14:32:46,899][19297] Updated weights for policy 0, policy_version 27384 (0.0035) [2024-06-06 14:32:50,005][19065] Fps is (10 sec: 40960.1, 60 sec: 42325.3, 300 sec: 42376.2). Total num frames: 448757760. Throughput: 0: 42126.7. Samples: 39729880. Policy #0 lag: (min: 0.0, avg: 9.9, max: 23.0) [2024-06-06 14:32:50,005][19065] Avg episode reward: [(0, '0.294')] [2024-06-06 14:32:51,865][19297] Updated weights for policy 0, policy_version 27394 (0.0024) [2024-06-06 14:32:55,008][19065] Fps is (10 sec: 42600.6, 60 sec: 42051.2, 300 sec: 42376.2). Total num frames: 448970752. Throughput: 0: 42235.6. Samples: 39859540. 
Policy #0 lag: (min: 0.0, avg: 9.9, max: 23.0) [2024-06-06 14:32:55,009][19065] Avg episode reward: [(0, '0.293')] [2024-06-06 14:32:55,094][19297] Updated weights for policy 0, policy_version 27404 (0.0031) [2024-06-06 14:32:59,361][19297] Updated weights for policy 0, policy_version 27414 (0.0024) [2024-06-06 14:33:00,005][19065] Fps is (10 sec: 40960.2, 60 sec: 41779.2, 300 sec: 42320.7). Total num frames: 449167360. Throughput: 0: 42018.6. Samples: 40110100. Policy #0 lag: (min: 0.0, avg: 10.5, max: 20.0) [2024-06-06 14:33:00,005][19065] Avg episode reward: [(0, '0.287')] [2024-06-06 14:33:02,422][19277] Signal inference workers to stop experience collection... (550 times) [2024-06-06 14:33:02,423][19277] Signal inference workers to resume experience collection... (550 times) [2024-06-06 14:33:02,445][19297] InferenceWorker_p0-w0: stopping experience collection (550 times) [2024-06-06 14:33:02,445][19297] InferenceWorker_p0-w0: resuming experience collection (550 times) [2024-06-06 14:33:02,554][19297] Updated weights for policy 0, policy_version 27424 (0.0035) [2024-06-06 14:33:05,005][19065] Fps is (10 sec: 42612.5, 60 sec: 42325.5, 300 sec: 42376.3). Total num frames: 449396736. Throughput: 0: 42021.4. Samples: 40362820. Policy #0 lag: (min: 0.0, avg: 10.5, max: 20.0) [2024-06-06 14:33:05,005][19065] Avg episode reward: [(0, '0.302')] [2024-06-06 14:33:05,005][19277] Saving new best policy, reward=0.302! [2024-06-06 14:33:07,289][19297] Updated weights for policy 0, policy_version 27434 (0.0033) [2024-06-06 14:33:10,008][19065] Fps is (10 sec: 44222.3, 60 sec: 42323.0, 300 sec: 42264.7). Total num frames: 449609728. Throughput: 0: 42272.0. Samples: 40493760. 
Policy #0 lag: (min: 0.0, avg: 10.5, max: 20.0) [2024-06-06 14:33:10,009][19065] Avg episode reward: [(0, '0.287')] [2024-06-06 14:33:10,415][19297] Updated weights for policy 0, policy_version 27444 (0.0036) [2024-06-06 14:33:14,980][19297] Updated weights for policy 0, policy_version 27454 (0.0030) [2024-06-06 14:33:15,005][19065] Fps is (10 sec: 40959.5, 60 sec: 41779.1, 300 sec: 42265.2). Total num frames: 449806336. Throughput: 0: 42123.0. Samples: 40744480. Policy #0 lag: (min: 0.0, avg: 10.5, max: 22.0) [2024-06-06 14:33:15,005][19065] Avg episode reward: [(0, '0.290')] [2024-06-06 14:33:18,026][19297] Updated weights for policy 0, policy_version 27464 (0.0043) [2024-06-06 14:33:20,005][19065] Fps is (10 sec: 44251.4, 60 sec: 42874.2, 300 sec: 42320.7). Total num frames: 450052096. Throughput: 0: 42193.7. Samples: 40998240. Policy #0 lag: (min: 0.0, avg: 10.5, max: 22.0) [2024-06-06 14:33:20,005][19065] Avg episode reward: [(0, '0.293')] [2024-06-06 14:33:22,981][19297] Updated weights for policy 0, policy_version 27474 (0.0034) [2024-06-06 14:33:25,005][19065] Fps is (10 sec: 42598.8, 60 sec: 41779.2, 300 sec: 42265.2). Total num frames: 450232320. Throughput: 0: 42213.0. Samples: 41127880. Policy #0 lag: (min: 0.0, avg: 10.5, max: 22.0) [2024-06-06 14:33:25,005][19065] Avg episode reward: [(0, '0.275')] [2024-06-06 14:33:26,172][19297] Updated weights for policy 0, policy_version 27484 (0.0028) [2024-06-06 14:33:30,005][19065] Fps is (10 sec: 37682.6, 60 sec: 42052.1, 300 sec: 42209.6). Total num frames: 450428928. Throughput: 0: 42150.0. Samples: 41376900. Policy #0 lag: (min: 0.0, avg: 10.5, max: 22.0) [2024-06-06 14:33:30,005][19065] Avg episode reward: [(0, '0.298')] [2024-06-06 14:33:30,684][19297] Updated weights for policy 0, policy_version 27494 (0.0034) [2024-06-06 14:33:34,058][19297] Updated weights for policy 0, policy_version 27504 (0.0038) [2024-06-06 14:33:35,008][19065] Fps is (10 sec: 42584.4, 60 sec: 42323.0, 300 sec: 42264.7). 
Total num frames: 450658304. Throughput: 0: 42065.0. Samples: 41622940. Policy #0 lag: (min: 1.0, avg: 11.2, max: 22.0) [2024-06-06 14:33:35,009][19065] Avg episode reward: [(0, '0.288')] [2024-06-06 14:33:38,261][19297] Updated weights for policy 0, policy_version 27514 (0.0026) [2024-06-06 14:33:40,005][19065] Fps is (10 sec: 44237.6, 60 sec: 42052.3, 300 sec: 42209.6). Total num frames: 450871296. Throughput: 0: 42027.9. Samples: 41750660. Policy #0 lag: (min: 1.0, avg: 11.2, max: 22.0) [2024-06-06 14:33:40,005][19065] Avg episode reward: [(0, '0.278')] [2024-06-06 14:33:41,947][19297] Updated weights for policy 0, policy_version 27524 (0.0027) [2024-06-06 14:33:45,005][19065] Fps is (10 sec: 40972.9, 60 sec: 42054.8, 300 sec: 42265.1). Total num frames: 451067904. Throughput: 0: 42015.5. Samples: 42000800. Policy #0 lag: (min: 1.0, avg: 11.2, max: 22.0) [2024-06-06 14:33:45,005][19065] Avg episode reward: [(0, '0.302')] [2024-06-06 14:33:46,396][19297] Updated weights for policy 0, policy_version 27534 (0.0029) [2024-06-06 14:33:49,621][19297] Updated weights for policy 0, policy_version 27544 (0.0041) [2024-06-06 14:33:50,005][19065] Fps is (10 sec: 42598.2, 60 sec: 42325.4, 300 sec: 42209.6). Total num frames: 451297280. Throughput: 0: 42058.1. Samples: 42255440. Policy #0 lag: (min: 0.0, avg: 10.5, max: 20.0) [2024-06-06 14:33:50,005][19065] Avg episode reward: [(0, '0.284')] [2024-06-06 14:33:54,236][19297] Updated weights for policy 0, policy_version 27554 (0.0037) [2024-06-06 14:33:55,005][19065] Fps is (10 sec: 42599.3, 60 sec: 42054.6, 300 sec: 42209.7). Total num frames: 451493888. Throughput: 0: 41844.5. Samples: 42376620. Policy #0 lag: (min: 0.0, avg: 10.5, max: 20.0) [2024-06-06 14:33:55,005][19065] Avg episode reward: [(0, '0.287')] [2024-06-06 14:33:57,873][19297] Updated weights for policy 0, policy_version 27564 (0.0035) [2024-06-06 14:34:00,005][19065] Fps is (10 sec: 40959.5, 60 sec: 42325.2, 300 sec: 42209.6). Total num frames: 451706880. 
Throughput: 0: 41939.9. Samples: 42631780. Policy #0 lag: (min: 0.0, avg: 10.5, max: 20.0) [2024-06-06 14:34:00,005][19065] Avg episode reward: [(0, '0.288')] [2024-06-06 14:34:01,769][19297] Updated weights for policy 0, policy_version 27574 (0.0043) [2024-06-06 14:34:05,005][19065] Fps is (10 sec: 40958.9, 60 sec: 41779.1, 300 sec: 42098.5). Total num frames: 451903488. Throughput: 0: 41665.6. Samples: 42873200. Policy #0 lag: (min: 0.0, avg: 10.5, max: 20.0) [2024-06-06 14:34:05,005][19065] Avg episode reward: [(0, '0.282')] [2024-06-06 14:34:05,396][19297] Updated weights for policy 0, policy_version 27584 (0.0037) [2024-06-06 14:34:09,395][19297] Updated weights for policy 0, policy_version 27594 (0.0045) [2024-06-06 14:34:10,005][19065] Fps is (10 sec: 40960.9, 60 sec: 41781.6, 300 sec: 42098.5). Total num frames: 452116480. Throughput: 0: 41685.8. Samples: 43003740. Policy #0 lag: (min: 0.0, avg: 9.4, max: 20.0) [2024-06-06 14:34:10,005][19065] Avg episode reward: [(0, '0.292')] [2024-06-06 14:34:13,038][19297] Updated weights for policy 0, policy_version 27604 (0.0037) [2024-06-06 14:34:15,005][19065] Fps is (10 sec: 42598.6, 60 sec: 42052.2, 300 sec: 42098.5). Total num frames: 452329472. Throughput: 0: 41663.2. Samples: 43251740. Policy #0 lag: (min: 0.0, avg: 9.4, max: 20.0) [2024-06-06 14:34:15,005][19065] Avg episode reward: [(0, '0.284')] [2024-06-06 14:34:17,423][19297] Updated weights for policy 0, policy_version 27614 (0.0034) [2024-06-06 14:34:20,005][19065] Fps is (10 sec: 42598.4, 60 sec: 41506.2, 300 sec: 42098.5). Total num frames: 452542464. Throughput: 0: 41959.1. Samples: 43510960. Policy #0 lag: (min: 0.0, avg: 9.4, max: 20.0) [2024-06-06 14:34:20,005][19065] Avg episode reward: [(0, '0.290')] [2024-06-06 14:34:20,903][19297] Updated weights for policy 0, policy_version 27624 (0.0032) [2024-06-06 14:34:25,005][19065] Fps is (10 sec: 40960.7, 60 sec: 41779.2, 300 sec: 42099.0). Total num frames: 452739072. Throughput: 0: 41820.9. 
Samples: 43632600. Policy #0 lag: (min: 0.0, avg: 9.4, max: 20.0) [2024-06-06 14:34:25,005][19065] Avg episode reward: [(0, '0.290')] [2024-06-06 14:34:25,287][19297] Updated weights for policy 0, policy_version 27634 (0.0035) [2024-06-06 14:34:28,482][19297] Updated weights for policy 0, policy_version 27644 (0.0033) [2024-06-06 14:34:30,005][19065] Fps is (10 sec: 42598.1, 60 sec: 42325.4, 300 sec: 42154.1). Total num frames: 452968448. Throughput: 0: 42003.6. Samples: 43890960. Policy #0 lag: (min: 1.0, avg: 10.6, max: 22.0) [2024-06-06 14:34:30,005][19065] Avg episode reward: [(0, '0.295')] [2024-06-06 14:34:30,026][19277] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000027647_452968448.pth... [2024-06-06 14:34:30,080][19277] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000027030_442859520.pth [2024-06-06 14:34:32,815][19297] Updated weights for policy 0, policy_version 27654 (0.0032) [2024-06-06 14:34:35,005][19065] Fps is (10 sec: 44236.7, 60 sec: 42054.6, 300 sec: 42098.5). Total num frames: 453181440. Throughput: 0: 41798.7. Samples: 44136380. Policy #0 lag: (min: 1.0, avg: 10.6, max: 22.0) [2024-06-06 14:34:35,005][19065] Avg episode reward: [(0, '0.290')] [2024-06-06 14:34:36,529][19297] Updated weights for policy 0, policy_version 27664 (0.0025) [2024-06-06 14:34:40,005][19065] Fps is (10 sec: 40959.8, 60 sec: 41779.1, 300 sec: 42098.5). Total num frames: 453378048. Throughput: 0: 41911.4. Samples: 44262640. Policy #0 lag: (min: 1.0, avg: 10.6, max: 22.0) [2024-06-06 14:34:40,005][19065] Avg episode reward: [(0, '0.295')] [2024-06-06 14:34:40,515][19297] Updated weights for policy 0, policy_version 27674 (0.0038) [2024-06-06 14:34:41,921][19277] Signal inference workers to stop experience collection... (600 times) [2024-06-06 14:34:41,922][19277] Signal inference workers to resume experience collection... 
(600 times)
[2024-06-06 14:34:41,954][19297] InferenceWorker_p0-w0: stopping experience collection (600 times)
[2024-06-06 14:34:41,955][19297] InferenceWorker_p0-w0: resuming experience collection (600 times)
[2024-06-06 14:34:44,267][19297] Updated weights for policy 0, policy_version 27684 (0.0034)
[2024-06-06 14:34:45,008][19065] Fps is (10 sec: 39308.7, 60 sec: 41777.0, 300 sec: 41931.5). Total num frames: 453574656. Throughput: 0: 41900.7. Samples: 44517440. Policy #0 lag: (min: 1.0, avg: 10.6, max: 22.0)
[2024-06-06 14:34:45,008][19065] Avg episode reward: [(0, '0.286')]
[2024-06-06 14:34:48,162][19297] Updated weights for policy 0, policy_version 27694 (0.0033)
[2024-06-06 14:34:50,005][19065] Fps is (10 sec: 40960.5, 60 sec: 41506.2, 300 sec: 41987.5). Total num frames: 453787648. Throughput: 0: 42165.1. Samples: 44770620. Policy #0 lag: (min: 0.0, avg: 10.2, max: 22.0)
[2024-06-06 14:34:50,005][19065] Avg episode reward: [(0, '0.294')]
[2024-06-06 14:34:52,467][19297] Updated weights for policy 0, policy_version 27704 (0.0042)
[2024-06-06 14:34:55,008][19065] Fps is (10 sec: 44236.7, 60 sec: 42049.9, 300 sec: 42153.6). Total num frames: 454017024. Throughput: 0: 42125.3. Samples: 44899520. Policy #0 lag: (min: 0.0, avg: 10.2, max: 22.0)
[2024-06-06 14:34:55,009][19065] Avg episode reward: [(0, '0.284')]
[2024-06-06 14:34:55,874][19297] Updated weights for policy 0, policy_version 27714 (0.0036)
[2024-06-06 14:34:59,905][19297] Updated weights for policy 0, policy_version 27724 (0.0044)
[2024-06-06 14:35:00,005][19065] Fps is (10 sec: 44236.0, 60 sec: 42052.3, 300 sec: 42098.5). Total num frames: 454230016. Throughput: 0: 42280.4. Samples: 45154360. Policy #0 lag: (min: 0.0, avg: 10.2, max: 22.0)
[2024-06-06 14:35:00,005][19065] Avg episode reward: [(0, '0.296')]
[2024-06-06 14:35:03,791][19297] Updated weights for policy 0, policy_version 27734 (0.0038)
[2024-06-06 14:35:05,005][19065] Fps is (10 sec: 42612.6, 60 sec: 42325.5, 300 sec: 42043.0). Total num frames: 454443008. Throughput: 0: 42035.1. Samples: 45402540. Policy #0 lag: (min: 0.0, avg: 11.1, max: 21.0)
[2024-06-06 14:35:05,005][19065] Avg episode reward: [(0, '0.291')]
[2024-06-06 14:35:08,012][19297] Updated weights for policy 0, policy_version 27744 (0.0033)
[2024-06-06 14:35:10,005][19065] Fps is (10 sec: 40959.9, 60 sec: 42052.1, 300 sec: 42154.1). Total num frames: 454639616. Throughput: 0: 42155.4. Samples: 45529600. Policy #0 lag: (min: 0.0, avg: 11.1, max: 21.0)
[2024-06-06 14:35:10,005][19065] Avg episode reward: [(0, '0.294')]
[2024-06-06 14:35:11,243][19297] Updated weights for policy 0, policy_version 27754 (0.0034)
[2024-06-06 14:35:15,005][19065] Fps is (10 sec: 39320.9, 60 sec: 41779.2, 300 sec: 41987.5). Total num frames: 454836224. Throughput: 0: 42003.9. Samples: 45781140. Policy #0 lag: (min: 0.0, avg: 11.1, max: 21.0)
[2024-06-06 14:35:15,005][19065] Avg episode reward: [(0, '0.288')]
[2024-06-06 14:35:15,643][19297] Updated weights for policy 0, policy_version 27764 (0.0038)
[2024-06-06 14:35:18,789][19297] Updated weights for policy 0, policy_version 27774 (0.0036)
[2024-06-06 14:35:20,006][19065] Fps is (10 sec: 44230.1, 60 sec: 42324.1, 300 sec: 42153.8). Total num frames: 455081984. Throughput: 0: 42078.8. Samples: 46030000. Policy #0 lag: (min: 0.0, avg: 11.1, max: 21.0)
[2024-06-06 14:35:20,007][19065] Avg episode reward: [(0, '0.293')]
[2024-06-06 14:35:23,691][19297] Updated weights for policy 0, policy_version 27784 (0.0032)
[2024-06-06 14:35:25,005][19065] Fps is (10 sec: 44237.8, 60 sec: 42325.4, 300 sec: 42098.6). Total num frames: 455278592. Throughput: 0: 42346.4. Samples: 46168220. Policy #0 lag: (min: 0.0, avg: 11.2, max: 22.0)
[2024-06-06 14:35:25,005][19065] Avg episode reward: [(0, '0.302')]
[2024-06-06 14:35:26,567][19297] Updated weights for policy 0, policy_version 27794 (0.0042)
[2024-06-06 14:35:30,005][19065] Fps is (10 sec: 39328.2, 60 sec: 41779.2, 300 sec: 41987.9). Total num frames: 455475200. Throughput: 0: 42267.5. Samples: 46419340. Policy #0 lag: (min: 0.0, avg: 11.2, max: 22.0)
[2024-06-06 14:35:30,005][19065] Avg episode reward: [(0, '0.295')]
[2024-06-06 14:35:31,187][19297] Updated weights for policy 0, policy_version 27804 (0.0027)
[2024-06-06 14:35:34,690][19297] Updated weights for policy 0, policy_version 27814 (0.0030)
[2024-06-06 14:35:35,005][19065] Fps is (10 sec: 42598.0, 60 sec: 42052.3, 300 sec: 42154.1). Total num frames: 455704576. Throughput: 0: 41958.2. Samples: 46658740. Policy #0 lag: (min: 0.0, avg: 11.2, max: 22.0)
[2024-06-06 14:35:35,005][19065] Avg episode reward: [(0, '0.282')]
[2024-06-06 14:35:39,301][19297] Updated weights for policy 0, policy_version 27824 (0.0032)
[2024-06-06 14:35:40,005][19065] Fps is (10 sec: 40959.8, 60 sec: 41779.2, 300 sec: 41987.5). Total num frames: 455884800. Throughput: 0: 42076.4. Samples: 46792820. Policy #0 lag: (min: 0.0, avg: 11.2, max: 22.0)
[2024-06-06 14:35:40,005][19065] Avg episode reward: [(0, '0.295')]
[2024-06-06 14:35:42,283][19297] Updated weights for policy 0, policy_version 27834 (0.0028)
[2024-06-06 14:35:45,005][19065] Fps is (10 sec: 40959.6, 60 sec: 42327.6, 300 sec: 41931.9). Total num frames: 456114176. Throughput: 0: 41906.7. Samples: 47040160. Policy #0 lag: (min: 1.0, avg: 8.8, max: 22.0)
[2024-06-06 14:35:45,005][19065] Avg episode reward: [(0, '0.285')]
[2024-06-06 14:35:47,059][19297] Updated weights for policy 0, policy_version 27844 (0.0030)
[2024-06-06 14:35:49,883][19297] Updated weights for policy 0, policy_version 27854 (0.0040)
[2024-06-06 14:35:50,006][19065] Fps is (10 sec: 47506.1, 60 sec: 42870.3, 300 sec: 42264.9). Total num frames: 456359936. Throughput: 0: 42055.8. Samples: 47295120. Policy #0 lag: (min: 1.0, avg: 8.8, max: 22.0)
[2024-06-06 14:35:50,007][19065] Avg episode reward: [(0, '0.273')]
[2024-06-06 14:35:55,005][19065] Fps is (10 sec: 37683.7, 60 sec: 41235.4, 300 sec: 41932.4). Total num frames: 456491008. Throughput: 0: 42097.1. Samples: 47423960. Policy #0 lag: (min: 1.0, avg: 8.8, max: 22.0)
[2024-06-06 14:35:55,005][19065] Avg episode reward: [(0, '0.283')]
[2024-06-06 14:35:55,184][19297] Updated weights for policy 0, policy_version 27864 (0.0034)
[2024-06-06 14:35:58,038][19297] Updated weights for policy 0, policy_version 27874 (0.0039)
[2024-06-06 14:36:00,005][19065] Fps is (10 sec: 37689.2, 60 sec: 41779.3, 300 sec: 41931.9). Total num frames: 456736768. Throughput: 0: 41834.8. Samples: 47663700. Policy #0 lag: (min: 0.0, avg: 13.3, max: 24.0)
[2024-06-06 14:36:00,005][19065] Avg episode reward: [(0, '0.292')]
[2024-06-06 14:36:03,241][19297] Updated weights for policy 0, policy_version 27884 (0.0039)
[2024-06-06 14:36:04,058][19277] Signal inference workers to stop experience collection... (650 times)
[2024-06-06 14:36:04,059][19277] Signal inference workers to resume experience collection... (650 times)
[2024-06-06 14:36:04,107][19297] InferenceWorker_p0-w0: stopping experience collection (650 times)
[2024-06-06 14:36:04,107][19297] InferenceWorker_p0-w0: resuming experience collection (650 times)
[2024-06-06 14:36:05,005][19065] Fps is (10 sec: 47512.9, 60 sec: 42052.2, 300 sec: 42154.1). Total num frames: 456966144. Throughput: 0: 41979.7. Samples: 47919020. Policy #0 lag: (min: 0.0, avg: 13.3, max: 24.0)
[2024-06-06 14:36:05,005][19065] Avg episode reward: [(0, '0.294')]
[2024-06-06 14:36:06,043][19297] Updated weights for policy 0, policy_version 27894 (0.0028)
[2024-06-06 14:36:10,005][19065] Fps is (10 sec: 40960.1, 60 sec: 41779.3, 300 sec: 41931.9). Total num frames: 457146368. Throughput: 0: 41790.6. Samples: 48048800. Policy #0 lag: (min: 0.0, avg: 13.3, max: 24.0)
[2024-06-06 14:36:10,005][19065] Avg episode reward: [(0, '0.297')]
[2024-06-06 14:36:10,671][19297] Updated weights for policy 0, policy_version 27904 (0.0029)
[2024-06-06 14:36:13,649][19297] Updated weights for policy 0, policy_version 27914 (0.0033)
[2024-06-06 14:36:15,005][19065] Fps is (10 sec: 40960.4, 60 sec: 42325.4, 300 sec: 42098.6). Total num frames: 457375744. Throughput: 0: 41713.7. Samples: 48296460. Policy #0 lag: (min: 0.0, avg: 13.3, max: 24.0)
[2024-06-06 14:36:15,005][19065] Avg episode reward: [(0, '0.286')]
[2024-06-06 14:36:18,538][19297] Updated weights for policy 0, policy_version 27924 (0.0043)
[2024-06-06 14:36:20,005][19065] Fps is (10 sec: 44236.1, 60 sec: 41780.3, 300 sec: 42098.5). Total num frames: 457588736. Throughput: 0: 42031.4. Samples: 48550160. Policy #0 lag: (min: 0.0, avg: 9.8, max: 21.0)
[2024-06-06 14:36:20,005][19065] Avg episode reward: [(0, '0.289')]
[2024-06-06 14:36:21,301][19297] Updated weights for policy 0, policy_version 27934 (0.0041)
[2024-06-06 14:36:25,005][19065] Fps is (10 sec: 39321.5, 60 sec: 41506.0, 300 sec: 41876.4). Total num frames: 457768960. Throughput: 0: 41848.4. Samples: 48676000. Policy #0 lag: (min: 0.0, avg: 9.8, max: 21.0)
[2024-06-06 14:36:25,005][19065] Avg episode reward: [(0, '0.286')]
[2024-06-06 14:36:26,421][19297] Updated weights for policy 0, policy_version 27944 (0.0043)
[2024-06-06 14:36:29,206][19297] Updated weights for policy 0, policy_version 27954 (0.0030)
[2024-06-06 14:36:30,005][19065] Fps is (10 sec: 42598.3, 60 sec: 42325.2, 300 sec: 42154.1). Total num frames: 458014720. Throughput: 0: 41859.5. Samples: 48923840. Policy #0 lag: (min: 0.0, avg: 9.8, max: 21.0)
[2024-06-06 14:36:30,005][19065] Avg episode reward: [(0, '0.292')]
[2024-06-06 14:36:30,148][19277] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000027956_458031104.pth...
[2024-06-06 14:36:30,195][19277] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000027339_447922176.pth
[2024-06-06 14:36:34,289][19297] Updated weights for policy 0, policy_version 27964 (0.0030)
[2024-06-06 14:36:35,005][19065] Fps is (10 sec: 42598.9, 60 sec: 41506.2, 300 sec: 41987.5). Total num frames: 458194944. Throughput: 0: 41812.2. Samples: 49176600. Policy #0 lag: (min: 0.0, avg: 9.8, max: 21.0)
[2024-06-06 14:36:35,005][19065] Avg episode reward: [(0, '0.288')]
[2024-06-06 14:36:37,147][19297] Updated weights for policy 0, policy_version 27974 (0.0033)
[2024-06-06 14:36:40,005][19065] Fps is (10 sec: 40960.3, 60 sec: 42325.3, 300 sec: 41931.9). Total num frames: 458424320. Throughput: 0: 41621.2. Samples: 49296920. Policy #0 lag: (min: 0.0, avg: 11.0, max: 21.0)
[2024-06-06 14:36:40,005][19065] Avg episode reward: [(0, '0.297')]
[2024-06-06 14:36:41,911][19297] Updated weights for policy 0, policy_version 27984 (0.0032)
[2024-06-06 14:36:44,619][19297] Updated weights for policy 0, policy_version 27994 (0.0035)
[2024-06-06 14:36:45,005][19065] Fps is (10 sec: 45874.9, 60 sec: 42325.4, 300 sec: 42154.1). Total num frames: 458653696. Throughput: 0: 42071.1. Samples: 49556900. Policy #0 lag: (min: 0.0, avg: 11.0, max: 21.0)
[2024-06-06 14:36:45,005][19065] Avg episode reward: [(0, '0.285')]
[2024-06-06 14:36:49,850][19297] Updated weights for policy 0, policy_version 28004 (0.0022)
[2024-06-06 14:36:50,005][19065] Fps is (10 sec: 39321.4, 60 sec: 40961.0, 300 sec: 41932.2). Total num frames: 458817536. Throughput: 0: 42053.3. Samples: 49811420. Policy #0 lag: (min: 0.0, avg: 11.0, max: 21.0)
[2024-06-06 14:36:50,014][19065] Avg episode reward: [(0, '0.293')]
[2024-06-06 14:36:52,349][19297] Updated weights for policy 0, policy_version 28014 (0.0038)
[2024-06-06 14:36:55,005][19065] Fps is (10 sec: 37683.1, 60 sec: 42325.3, 300 sec: 41931.9). Total num frames: 459030528. Throughput: 0: 41768.4. Samples: 49928380. Policy #0 lag: (min: 0.0, avg: 11.0, max: 21.0)
[2024-06-06 14:36:55,005][19065] Avg episode reward: [(0, '0.293')]
[2024-06-06 14:36:57,660][19297] Updated weights for policy 0, policy_version 28024 (0.0048)
[2024-06-06 14:37:00,008][19065] Fps is (10 sec: 45860.4, 60 sec: 42323.0, 300 sec: 42098.1). Total num frames: 459276288. Throughput: 0: 41992.9. Samples: 50186280. Policy #0 lag: (min: 0.0, avg: 9.1, max: 22.0)
[2024-06-06 14:37:00,009][19065] Avg episode reward: [(0, '0.291')]
[2024-06-06 14:37:00,478][19297] Updated weights for policy 0, policy_version 28034 (0.0047)
[2024-06-06 14:37:05,005][19065] Fps is (10 sec: 40960.0, 60 sec: 41233.1, 300 sec: 41931.9). Total num frames: 459440128. Throughput: 0: 41852.1. Samples: 50433500. Policy #0 lag: (min: 0.0, avg: 9.1, max: 22.0)
[2024-06-06 14:37:05,005][19065] Avg episode reward: [(0, '0.287')]
[2024-06-06 14:37:05,622][19297] Updated weights for policy 0, policy_version 28044 (0.0026)
[2024-06-06 14:37:08,307][19297] Updated weights for policy 0, policy_version 28054 (0.0027)
[2024-06-06 14:37:10,008][19065] Fps is (10 sec: 40960.3, 60 sec: 42323.0, 300 sec: 41987.0). Total num frames: 459685888. Throughput: 0: 41802.8. Samples: 50557260. Policy #0 lag: (min: 0.0, avg: 9.1, max: 22.0)
[2024-06-06 14:37:10,009][19065] Avg episode reward: [(0, '0.302')]
[2024-06-06 14:37:13,537][19297] Updated weights for policy 0, policy_version 28064 (0.0039)
[2024-06-06 14:37:15,005][19065] Fps is (10 sec: 44236.3, 60 sec: 41779.1, 300 sec: 42043.5). Total num frames: 459882496. Throughput: 0: 42028.5. Samples: 50815120. Policy #0 lag: (min: 0.0, avg: 7.9, max: 21.0)
[2024-06-06 14:37:15,006][19065] Avg episode reward: [(0, '0.285')]
[2024-06-06 14:37:16,032][19297] Updated weights for policy 0, policy_version 28074 (0.0040)
[2024-06-06 14:37:20,005][19065] Fps is (10 sec: 39334.6, 60 sec: 41506.3, 300 sec: 41876.4). Total num frames: 460079104. Throughput: 0: 41826.2. Samples: 51058780. Policy #0 lag: (min: 0.0, avg: 7.9, max: 21.0)
[2024-06-06 14:37:20,005][19065] Avg episode reward: [(0, '0.299')]
[2024-06-06 14:37:21,126][19297] Updated weights for policy 0, policy_version 28084 (0.0043)
[2024-06-06 14:37:23,311][19277] Signal inference workers to stop experience collection... (700 times)
[2024-06-06 14:37:23,361][19297] InferenceWorker_p0-w0: stopping experience collection (700 times)
[2024-06-06 14:37:23,364][19277] Signal inference workers to resume experience collection... (700 times)
[2024-06-06 14:37:23,377][19297] InferenceWorker_p0-w0: resuming experience collection (700 times)
[2024-06-06 14:37:23,659][19297] Updated weights for policy 0, policy_version 28094 (0.0037)
[2024-06-06 14:37:25,005][19065] Fps is (10 sec: 42598.6, 60 sec: 42325.3, 300 sec: 42043.0). Total num frames: 460308480. Throughput: 0: 41988.9. Samples: 51186420. Policy #0 lag: (min: 0.0, avg: 7.9, max: 21.0)
[2024-06-06 14:37:25,006][19065] Avg episode reward: [(0, '0.292')]
[2024-06-06 14:37:29,048][19297] Updated weights for policy 0, policy_version 28104 (0.0027)
[2024-06-06 14:37:30,005][19065] Fps is (10 sec: 42598.4, 60 sec: 41506.3, 300 sec: 41987.5). Total num frames: 460505088. Throughput: 0: 41786.3. Samples: 51437280. Policy #0 lag: (min: 0.0, avg: 7.9, max: 21.0)
[2024-06-06 14:37:30,005][19065] Avg episode reward: [(0, '0.282')]
[2024-06-06 14:37:31,800][19297] Updated weights for policy 0, policy_version 28114 (0.0029)
[2024-06-06 14:37:35,005][19065] Fps is (10 sec: 40960.3, 60 sec: 42052.2, 300 sec: 41931.9). Total num frames: 460718080. Throughput: 0: 41697.9. Samples: 51687820. Policy #0 lag: (min: 0.0, avg: 13.1, max: 23.0)
[2024-06-06 14:37:35,008][19065] Avg episode reward: [(0, '0.298')]
[2024-06-06 14:37:36,941][19297] Updated weights for policy 0, policy_version 28124 (0.0037)
[2024-06-06 14:37:39,827][19297] Updated weights for policy 0, policy_version 28134 (0.0029)
[2024-06-06 14:37:40,005][19065] Fps is (10 sec: 44236.0, 60 sec: 42052.2, 300 sec: 42043.5). Total num frames: 460947456. Throughput: 0: 41856.8. Samples: 51811940. Policy #0 lag: (min: 0.0, avg: 13.1, max: 23.0)
[2024-06-06 14:37:40,005][19065] Avg episode reward: [(0, '0.292')]
[2024-06-06 14:37:44,374][19297] Updated weights for policy 0, policy_version 28144 (0.0023)
[2024-06-06 14:37:45,005][19065] Fps is (10 sec: 40959.9, 60 sec: 41233.0, 300 sec: 41931.9). Total num frames: 461127680. Throughput: 0: 41661.3. Samples: 52060900. Policy #0 lag: (min: 0.0, avg: 13.1, max: 23.0)
[2024-06-06 14:37:45,005][19065] Avg episode reward: [(0, '0.283')]
[2024-06-06 14:37:47,410][19297] Updated weights for policy 0, policy_version 28154 (0.0039)
[2024-06-06 14:37:50,007][19065] Fps is (10 sec: 37674.5, 60 sec: 41777.6, 300 sec: 41876.5). Total num frames: 461324288. Throughput: 0: 41863.5. Samples: 52317460. Policy #0 lag: (min: 0.0, avg: 13.1, max: 23.0)
[2024-06-06 14:37:50,008][19065] Avg episode reward: [(0, '0.280')]
[2024-06-06 14:37:52,317][19297] Updated weights for policy 0, policy_version 28164 (0.0030)
[2024-06-06 14:37:55,008][19065] Fps is (10 sec: 45860.1, 60 sec: 42596.1, 300 sec: 42098.1). Total num frames: 461586432. Throughput: 0: 41892.4. Samples: 52442420. Policy #0 lag: (min: 0.0, avg: 9.4, max: 22.0)
[2024-06-06 14:37:55,009][19065] Avg episode reward: [(0, '0.300')]
[2024-06-06 14:37:55,223][19297] Updated weights for policy 0, policy_version 28174 (0.0041)
[2024-06-06 14:38:00,005][19065] Fps is (10 sec: 42607.9, 60 sec: 41235.2, 300 sec: 41876.4). Total num frames: 461750272. Throughput: 0: 41702.1. Samples: 52691720. Policy #0 lag: (min: 0.0, avg: 9.4, max: 22.0)
[2024-06-06 14:38:00,005][19065] Avg episode reward: [(0, '0.295')]
[2024-06-06 14:38:00,271][19297] Updated weights for policy 0, policy_version 28184 (0.0038)
[2024-06-06 14:38:03,457][19297] Updated weights for policy 0, policy_version 28194 (0.0037)
[2024-06-06 14:38:05,005][19065] Fps is (10 sec: 39334.4, 60 sec: 42325.3, 300 sec: 41932.4). Total num frames: 461979648. Throughput: 0: 41854.1. Samples: 52942220. Policy #0 lag: (min: 0.0, avg: 9.4, max: 22.0)
[2024-06-06 14:38:05,006][19065] Avg episode reward: [(0, '0.291')]
[2024-06-06 14:38:07,734][19297] Updated weights for policy 0, policy_version 28204 (0.0034)
[2024-06-06 14:38:10,008][19065] Fps is (10 sec: 44223.3, 60 sec: 41779.2, 300 sec: 41987.0). Total num frames: 462192640. Throughput: 0: 41738.3. Samples: 53064780. Policy #0 lag: (min: 1.0, avg: 10.0, max: 23.0)
[2024-06-06 14:38:10,009][19065] Avg episode reward: [(0, '0.290')]
[2024-06-06 14:38:11,196][19297] Updated weights for policy 0, policy_version 28214 (0.0027)
[2024-06-06 14:38:15,008][19065] Fps is (10 sec: 40946.9, 60 sec: 41777.0, 300 sec: 41820.4). Total num frames: 462389248. Throughput: 0: 41870.7. Samples: 53321600. Policy #0 lag: (min: 1.0, avg: 10.0, max: 23.0)
[2024-06-06 14:38:15,009][19065] Avg episode reward: [(0, '0.293')]
[2024-06-06 14:38:15,624][19297] Updated weights for policy 0, policy_version 28224 (0.0034)
[2024-06-06 14:38:18,863][19297] Updated weights for policy 0, policy_version 28234 (0.0033)
[2024-06-06 14:38:20,005][19065] Fps is (10 sec: 40973.3, 60 sec: 42052.2, 300 sec: 41931.9). Total num frames: 462602240. Throughput: 0: 41705.7. Samples: 53564580. Policy #0 lag: (min: 1.0, avg: 10.0, max: 23.0)
[2024-06-06 14:38:20,005][19065] Avg episode reward: [(0, '0.299')]
[2024-06-06 14:38:23,464][19297] Updated weights for policy 0, policy_version 28244 (0.0030)
[2024-06-06 14:38:25,008][19065] Fps is (10 sec: 44236.8, 60 sec: 42050.0, 300 sec: 42042.6). Total num frames: 462831616. Throughput: 0: 41961.1. Samples: 53700320. Policy #0 lag: (min: 1.0, avg: 10.0, max: 23.0)
[2024-06-06 14:38:25,009][19065] Avg episode reward: [(0, '0.302')]
[2024-06-06 14:38:26,845][19297] Updated weights for policy 0, policy_version 28254 (0.0033)
[2024-06-06 14:38:30,005][19065] Fps is (10 sec: 40960.0, 60 sec: 41779.1, 300 sec: 41876.8). Total num frames: 463011840. Throughput: 0: 41984.4. Samples: 53950200. Policy #0 lag: (min: 0.0, avg: 10.8, max: 23.0)
[2024-06-06 14:38:30,005][19065] Avg episode reward: [(0, '0.305')]
[2024-06-06 14:38:30,021][19277] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000028260_463011840.pth...
[2024-06-06 14:38:30,066][19277] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000027647_452968448.pth
[2024-06-06 14:38:30,073][19277] Saving new best policy, reward=0.305!
[2024-06-06 14:38:31,164][19297] Updated weights for policy 0, policy_version 28264 (0.0042)
[2024-06-06 14:38:35,005][19065] Fps is (10 sec: 40973.8, 60 sec: 42052.3, 300 sec: 41931.9). Total num frames: 463241216. Throughput: 0: 41712.1. Samples: 54194400. Policy #0 lag: (min: 0.0, avg: 10.8, max: 23.0)
[2024-06-06 14:38:35,005][19065] Avg episode reward: [(0, '0.299')]
[2024-06-06 14:38:35,011][19297] Updated weights for policy 0, policy_version 28274 (0.0034)
[2024-06-06 14:38:38,840][19297] Updated weights for policy 0, policy_version 28284 (0.0043)
[2024-06-06 14:38:40,005][19065] Fps is (10 sec: 39322.0, 60 sec: 40960.1, 300 sec: 41820.9). Total num frames: 463405056. Throughput: 0: 41772.4. Samples: 54322040. Policy #0 lag: (min: 0.0, avg: 10.8, max: 23.0)
[2024-06-06 14:38:40,005][19065] Avg episode reward: [(0, '0.302')]
[2024-06-06 14:38:42,739][19297] Updated weights for policy 0, policy_version 28294 (0.0036)
[2024-06-06 14:38:45,005][19065] Fps is (10 sec: 39321.5, 60 sec: 41779.3, 300 sec: 41820.9). Total num frames: 463634432. Throughput: 0: 41790.9. Samples: 54572300. Policy #0 lag: (min: 0.0, avg: 10.8, max: 23.0)
[2024-06-06 14:38:45,005][19065] Avg episode reward: [(0, '0.300')]
[2024-06-06 14:38:46,684][19297] Updated weights for policy 0, policy_version 28304 (0.0030)
[2024-06-06 14:38:50,005][19065] Fps is (10 sec: 45874.7, 60 sec: 42327.0, 300 sec: 41931.9). Total num frames: 463863808. Throughput: 0: 41754.2. Samples: 54821160. Policy #0 lag: (min: 0.0, avg: 10.7, max: 22.0)
[2024-06-06 14:38:50,005][19065] Avg episode reward: [(0, '0.311')]
[2024-06-06 14:38:50,113][19277] Saving new best policy, reward=0.311!
[2024-06-06 14:38:50,311][19297] Updated weights for policy 0, policy_version 28314 (0.0035)
[2024-06-06 14:38:53,732][19277] Signal inference workers to stop experience collection... (750 times)
[2024-06-06 14:38:53,732][19277] Signal inference workers to resume experience collection... (750 times)
[2024-06-06 14:38:53,749][19297] InferenceWorker_p0-w0: stopping experience collection (750 times)
[2024-06-06 14:38:53,749][19297] InferenceWorker_p0-w0: resuming experience collection (750 times)
[2024-06-06 14:38:54,655][19297] Updated weights for policy 0, policy_version 28324 (0.0041)
[2024-06-06 14:38:55,008][19065] Fps is (10 sec: 42584.1, 60 sec: 41233.1, 300 sec: 41875.9). Total num frames: 464060416. Throughput: 0: 41928.0. Samples: 54951540. Policy #0 lag: (min: 0.0, avg: 10.7, max: 22.0)
[2024-06-06 14:38:55,009][19065] Avg episode reward: [(0, '0.307')]
[2024-06-06 14:38:58,051][19297] Updated weights for policy 0, policy_version 28334 (0.0025)
[2024-06-06 14:39:00,005][19065] Fps is (10 sec: 40960.1, 60 sec: 42052.4, 300 sec: 41931.9). Total num frames: 464273408. Throughput: 0: 41686.5. Samples: 55197360. Policy #0 lag: (min: 0.0, avg: 10.7, max: 22.0)
[2024-06-06 14:39:00,005][19065] Avg episode reward: [(0, '0.307')]
[2024-06-06 14:39:02,537][19297] Updated weights for policy 0, policy_version 28344 (0.0036)
[2024-06-06 14:39:05,005][19065] Fps is (10 sec: 42612.1, 60 sec: 41779.2, 300 sec: 41931.9). Total num frames: 464486400. Throughput: 0: 41762.2. Samples: 55443880. Policy #0 lag: (min: 0.0, avg: 10.7, max: 22.0)
[2024-06-06 14:39:05,005][19065] Avg episode reward: [(0, '0.293')]
[2024-06-06 14:39:06,342][19297] Updated weights for policy 0, policy_version 28354 (0.0037)
[2024-06-06 14:39:10,005][19065] Fps is (10 sec: 40960.1, 60 sec: 41508.4, 300 sec: 41876.4). Total num frames: 464683008. Throughput: 0: 41537.7. Samples: 55569380. Policy #0 lag: (min: 0.0, avg: 10.8, max: 21.0)
[2024-06-06 14:39:10,005][19065] Avg episode reward: [(0, '0.305')]
[2024-06-06 14:39:10,289][19297] Updated weights for policy 0, policy_version 28364 (0.0029)
[2024-06-06 14:39:14,539][19297] Updated weights for policy 0, policy_version 28374 (0.0032)
[2024-06-06 14:39:15,005][19065] Fps is (10 sec: 40960.1, 60 sec: 41781.4, 300 sec: 41876.4). Total num frames: 464896000. Throughput: 0: 41458.7. Samples: 55815840. Policy #0 lag: (min: 0.0, avg: 10.8, max: 21.0)
[2024-06-06 14:39:15,005][19065] Avg episode reward: [(0, '0.297')]
[2024-06-06 14:39:18,330][19297] Updated weights for policy 0, policy_version 28384 (0.0035)
[2024-06-06 14:39:20,008][19065] Fps is (10 sec: 40946.8, 60 sec: 41503.9, 300 sec: 41875.9). Total num frames: 465092608. Throughput: 0: 41712.0. Samples: 56071580. Policy #0 lag: (min: 0.0, avg: 10.8, max: 21.0)
[2024-06-06 14:39:20,012][19065] Avg episode reward: [(0, '0.300')]
[2024-06-06 14:39:22,137][19297] Updated weights for policy 0, policy_version 28394 (0.0035)
[2024-06-06 14:39:25,005][19065] Fps is (10 sec: 39321.6, 60 sec: 40962.2, 300 sec: 41765.3). Total num frames: 465289216. Throughput: 0: 41575.9. Samples: 56192960. Policy #0 lag: (min: 0.0, avg: 10.8, max: 21.0)
[2024-06-06 14:39:25,005][19065] Avg episode reward: [(0, '0.292')]
[2024-06-06 14:39:26,252][19297] Updated weights for policy 0, policy_version 28404 (0.0034)
[2024-06-06 14:39:29,679][19297] Updated weights for policy 0, policy_version 28414 (0.0030)
[2024-06-06 14:39:30,005][19065] Fps is (10 sec: 44250.7, 60 sec: 42052.2, 300 sec: 41876.4). Total num frames: 465534976. Throughput: 0: 41710.1. Samples: 56449260. Policy #0 lag: (min: 0.0, avg: 9.6, max: 22.0)
[2024-06-06 14:39:30,005][19065] Avg episode reward: [(0, '0.290')]
[2024-06-06 14:39:34,102][19297] Updated weights for policy 0, policy_version 28424 (0.0042)
[2024-06-06 14:39:35,005][19065] Fps is (10 sec: 42598.8, 60 sec: 41233.0, 300 sec: 41820.9). Total num frames: 465715200. Throughput: 0: 41625.0. Samples: 56694280. Policy #0 lag: (min: 0.0, avg: 9.6, max: 22.0)
[2024-06-06 14:39:35,005][19065] Avg episode reward: [(0, '0.275')]
[2024-06-06 14:39:38,024][19297] Updated weights for policy 0, policy_version 28434 (0.0035)
[2024-06-06 14:39:40,005][19065] Fps is (10 sec: 39321.5, 60 sec: 42052.2, 300 sec: 41876.8). Total num frames: 465928192. Throughput: 0: 41594.0. Samples: 56823140. Policy #0 lag: (min: 0.0, avg: 9.6, max: 22.0)
[2024-06-06 14:39:40,005][19065] Avg episode reward: [(0, '0.300')]
[2024-06-06 14:39:41,819][19297] Updated weights for policy 0, policy_version 28444 (0.0039)
[2024-06-06 14:39:45,005][19065] Fps is (10 sec: 44236.0, 60 sec: 42052.1, 300 sec: 41931.9). Total num frames: 466157568. Throughput: 0: 41663.5. Samples: 57072220. Policy #0 lag: (min: 0.0, avg: 9.9, max: 21.0)
[2024-06-06 14:39:45,005][19065] Avg episode reward: [(0, '0.288')]
[2024-06-06 14:39:45,985][19297] Updated weights for policy 0, policy_version 28454 (0.0045)
[2024-06-06 14:39:49,660][19297] Updated weights for policy 0, policy_version 28464 (0.0047)
[2024-06-06 14:39:50,005][19065] Fps is (10 sec: 42598.8, 60 sec: 41506.2, 300 sec: 41821.3). Total num frames: 466354176. Throughput: 0: 41679.6. Samples: 57319460. Policy #0 lag: (min: 0.0, avg: 9.9, max: 21.0)
[2024-06-06 14:39:50,005][19065] Avg episode reward: [(0, '0.288')]
[2024-06-06 14:39:53,715][19297] Updated weights for policy 0, policy_version 28474 (0.0023)
[2024-06-06 14:39:55,005][19065] Fps is (10 sec: 40960.5, 60 sec: 41781.5, 300 sec: 41820.9). Total num frames: 466567168. Throughput: 0: 41751.1. Samples: 57448180. Policy #0 lag: (min: 0.0, avg: 9.9, max: 21.0)
[2024-06-06 14:39:55,005][19065] Avg episode reward: [(0, '0.293')]
[2024-06-06 14:39:57,637][19297] Updated weights for policy 0, policy_version 28484 (0.0041)
[2024-06-06 14:40:00,005][19065] Fps is (10 sec: 42598.3, 60 sec: 41779.2, 300 sec: 41820.8). Total num frames: 466780160. Throughput: 0: 41867.1. Samples: 57699860. Policy #0 lag: (min: 0.0, avg: 9.9, max: 21.0)
[2024-06-06 14:40:00,006][19065] Avg episode reward: [(0, '0.291')]
[2024-06-06 14:40:01,114][19297] Updated weights for policy 0, policy_version 28494 (0.0043)
[2024-06-06 14:40:05,008][19065] Fps is (10 sec: 40946.6, 60 sec: 41503.9, 300 sec: 41820.4). Total num frames: 466976768. Throughput: 0: 41728.4. Samples: 57949360. Policy #0 lag: (min: 0.0, avg: 10.7, max: 21.0)
[2024-06-06 14:40:05,009][19065] Avg episode reward: [(0, '0.293')]
[2024-06-06 14:40:05,690][19297] Updated weights for policy 0, policy_version 28504 (0.0040)
[2024-06-06 14:40:09,163][19297] Updated weights for policy 0, policy_version 28514 (0.0034)
[2024-06-06 14:40:10,005][19065] Fps is (10 sec: 40960.4, 60 sec: 41779.2, 300 sec: 41876.4). Total num frames: 467189760. Throughput: 0: 41757.0. Samples: 58072020. Policy #0 lag: (min: 0.0, avg: 10.7, max: 21.0)
[2024-06-06 14:40:10,005][19065] Avg episode reward: [(0, '0.300')]
[2024-06-06 14:40:13,474][19297] Updated weights for policy 0, policy_version 28524 (0.0038)
[2024-06-06 14:40:15,005][19065] Fps is (10 sec: 40973.7, 60 sec: 41506.2, 300 sec: 41710.0). Total num frames: 467386368. Throughput: 0: 41519.7. Samples: 58317640. Policy #0 lag: (min: 0.0, avg: 10.7, max: 21.0)
[2024-06-06 14:40:15,005][19065] Avg episode reward: [(0, '0.294')]
[2024-06-06 14:40:17,240][19297] Updated weights for policy 0, policy_version 28534 (0.0038)
[2024-06-06 14:40:20,005][19065] Fps is (10 sec: 40959.7, 60 sec: 41781.5, 300 sec: 41765.3). Total num frames: 467599360. Throughput: 0: 41847.9. Samples: 58577440. Policy #0 lag: (min: 0.0, avg: 10.7, max: 21.0)
[2024-06-06 14:40:20,005][19065] Avg episode reward: [(0, '0.291')]
[2024-06-06 14:40:21,124][19297] Updated weights for policy 0, policy_version 28544 (0.0051)
[2024-06-06 14:40:24,809][19297] Updated weights for policy 0, policy_version 28554 (0.0029)
[2024-06-06 14:40:25,005][19065] Fps is (10 sec: 44236.2, 60 sec: 42325.3, 300 sec: 41876.4). Total num frames: 467828736. Throughput: 0: 41669.4. Samples: 58698260. Policy #0 lag: (min: 0.0, avg: 10.9, max: 21.0)
[2024-06-06 14:40:25,005][19065] Avg episode reward: [(0, '0.281')]
[2024-06-06 14:40:29,244][19297] Updated weights for policy 0, policy_version 28564 (0.0040)
[2024-06-06 14:40:30,005][19065] Fps is (10 sec: 42598.3, 60 sec: 41506.2, 300 sec: 41765.3). Total num frames: 468025344. Throughput: 0: 41857.9. Samples: 58955820. Policy #0 lag: (min: 0.0, avg: 10.9, max: 21.0)
[2024-06-06 14:40:30,005][19065] Avg episode reward: [(0, '0.292')]
[2024-06-06 14:40:30,022][19277] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000028567_468041728.pth...
[2024-06-06 14:40:30,074][19277] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000027956_458031104.pth
[2024-06-06 14:40:32,353][19297] Updated weights for policy 0, policy_version 28574 (0.0030)
[2024-06-06 14:40:35,005][19065] Fps is (10 sec: 39322.3, 60 sec: 41779.2, 300 sec: 41820.9). Total num frames: 468221952. Throughput: 0: 41718.3. Samples: 59196780. Policy #0 lag: (min: 0.0, avg: 10.9, max: 21.0)
[2024-06-06 14:40:35,005][19065] Avg episode reward: [(0, '0.304')]
[2024-06-06 14:40:37,216][19277] Signal inference workers to stop experience collection... (800 times)
[2024-06-06 14:40:37,218][19277] Signal inference workers to resume experience collection... (800 times)
[2024-06-06 14:40:37,219][19297] Updated weights for policy 0, policy_version 28584 (0.0036)
[2024-06-06 14:40:37,241][19297] InferenceWorker_p0-w0: stopping experience collection (800 times)
[2024-06-06 14:40:37,242][19297] InferenceWorker_p0-w0: resuming experience collection (800 times)
[2024-06-06 14:40:40,005][19065] Fps is (10 sec: 42598.1, 60 sec: 42052.3, 300 sec: 41820.9). Total num frames: 468451328. Throughput: 0: 41710.6. Samples: 59325160. Policy #0 lag: (min: 0.0, avg: 10.9, max: 21.0)
[2024-06-06 14:40:40,005][19065] Avg episode reward: [(0, '0.290')]
[2024-06-06 14:40:40,960][19297] Updated weights for policy 0, policy_version 28594 (0.0027)
[2024-06-06 14:40:44,845][19297] Updated weights for policy 0, policy_version 28604 (0.0033)
[2024-06-06 14:40:45,005][19065] Fps is (10 sec: 42597.7, 60 sec: 41506.2, 300 sec: 41654.5). Total num frames: 468647936. Throughput: 0: 41482.7. Samples: 59566580. Policy #0 lag: (min: 0.0, avg: 10.8, max: 23.0)
[2024-06-06 14:40:45,005][19065] Avg episode reward: [(0, '0.293')]
[2024-06-06 14:40:49,089][19297] Updated weights for policy 0, policy_version 28614 (0.0046)
[2024-06-06 14:40:50,005][19065] Fps is (10 sec: 40959.8, 60 sec: 41779.1, 300 sec: 41931.9). Total num frames: 468860928. Throughput: 0: 41654.0. Samples: 59823660. Policy #0 lag: (min: 0.0, avg: 10.8, max: 23.0)
[2024-06-06 14:40:50,005][19065] Avg episode reward: [(0, '0.295')]
[2024-06-06 14:40:52,713][19297] Updated weights for policy 0, policy_version 28624 (0.0042)
[2024-06-06 14:40:55,008][19065] Fps is (10 sec: 40946.9, 60 sec: 41503.9, 300 sec: 41764.9). Total num frames: 469057536. Throughput: 0: 41705.4. Samples: 59948900. Policy #0 lag: (min: 0.0, avg: 10.8, max: 23.0)
[2024-06-06 14:40:55,009][19065] Avg episode reward: [(0, '0.299')]
[2024-06-06 14:40:56,505][19297] Updated weights for policy 0, policy_version 28634 (0.0042)
[2024-06-06 14:41:00,005][19065] Fps is (10 sec: 39322.4, 60 sec: 41233.1, 300 sec: 41654.3). Total num frames: 469254144. Throughput: 0: 41985.3. Samples: 60206980. Policy #0 lag: (min: 0.0, avg: 8.9, max: 21.0)
[2024-06-06 14:41:00,005][19065] Avg episode reward: [(0, '0.303')]
[2024-06-06 14:41:00,477][19297] Updated weights for policy 0, policy_version 28644 (0.0028)
[2024-06-06 14:41:04,198][19297] Updated weights for policy 0, policy_version 28654 (0.0032)
[2024-06-06 14:41:05,005][19065] Fps is (10 sec: 42612.1, 60 sec: 41781.4, 300 sec: 41820.8). Total num frames: 469483520. Throughput: 0: 41673.3. Samples: 60452740. Policy #0 lag: (min: 0.0, avg: 8.9, max: 21.0)
[2024-06-06 14:41:05,005][19065] Avg episode reward: [(0, '0.292')]
[2024-06-06 14:41:08,259][19297] Updated weights for policy 0, policy_version 28664 (0.0030)
[2024-06-06 14:41:10,005][19065] Fps is (10 sec: 44235.9, 60 sec: 41779.1, 300 sec: 41765.3). Total num frames: 469696512. Throughput: 0: 41927.0. Samples: 60584980. Policy #0 lag: (min: 0.0, avg: 8.9, max: 21.0)
[2024-06-06 14:41:10,005][19065] Avg episode reward: [(0, '0.297')]
[2024-06-06 14:41:12,296][19297] Updated weights for policy 0, policy_version 28674 (0.0038)
[2024-06-06 14:41:15,005][19065] Fps is (10 sec: 40960.5, 60 sec: 41779.2, 300 sec: 41709.8). Total num frames: 469893120. Throughput: 0: 41646.3. Samples: 60829900. Policy #0 lag: (min: 0.0, avg: 8.9, max: 21.0)
[2024-06-06 14:41:15,005][19065] Avg episode reward: [(0, '0.302')]
[2024-06-06 14:41:15,925][19297] Updated weights for policy 0, policy_version 28684 (0.0038)
[2024-06-06 14:41:20,005][19065] Fps is (10 sec: 40960.9, 60 sec: 41779.2, 300 sec: 41820.9). Total num frames: 470106112. Throughput: 0: 41901.3. Samples: 61082340. Policy #0 lag: (min: 0.0, avg: 10.3, max: 23.0)
[2024-06-06 14:41:20,005][19065] Avg episode reward: [(0, '0.293')]
[2024-06-06 14:41:20,031][19297] Updated weights for policy 0, policy_version 28694 (0.0024)
[2024-06-06 14:41:23,879][19297] Updated weights for policy 0, policy_version 28704 (0.0042)
[2024-06-06 14:41:25,005][19065] Fps is (10 sec: 42598.1, 60 sec: 41506.2, 300 sec: 41709.8). Total num frames: 470319104. Throughput: 0: 41883.2. Samples: 61209900. Policy #0 lag: (min: 0.0, avg: 10.3, max: 23.0)
[2024-06-06 14:41:25,005][19065] Avg episode reward: [(0, '0.304')]
[2024-06-06 14:41:27,471][19297] Updated weights for policy 0, policy_version 28714 (0.0027)
[2024-06-06 14:41:30,005][19065] Fps is (10 sec: 44236.4, 60 sec: 42052.3, 300 sec: 41876.4). Total num frames: 470548480. Throughput: 0: 42152.9. Samples: 61463460. Policy #0 lag: (min: 0.0, avg: 10.3, max: 23.0)
[2024-06-06 14:41:30,005][19065] Avg episode reward: [(0, '0.303')]
[2024-06-06 14:41:31,707][19297] Updated weights for policy 0, policy_version 28724 (0.0031)
[2024-06-06 14:41:35,005][19065] Fps is (10 sec: 42598.7, 60 sec: 42052.2, 300 sec: 41765.3). Total num frames: 470745088. Throughput: 0: 42153.5. Samples: 61720560. Policy #0 lag: (min: 0.0, avg: 10.3, max: 23.0)
[2024-06-06 14:41:35,005][19065] Avg episode reward: [(0, '0.300')]
[2024-06-06 14:41:35,274][19297] Updated weights for policy 0, policy_version 28734 (0.0029)
[2024-06-06 14:41:39,485][19297] Updated weights for policy 0, policy_version 28744 (0.0039)
[2024-06-06 14:41:40,005][19065] Fps is (10 sec: 42598.6, 60 sec: 42052.4, 300 sec: 41765.3). Total num frames: 470974464. Throughput: 0: 42121.3. Samples: 61844220. Policy #0 lag: (min: 0.0, avg: 10.7, max: 21.0)
[2024-06-06 14:41:40,009][19065] Avg episode reward: [(0, '0.288')]
[2024-06-06 14:41:43,351][19297] Updated weights for policy 0, policy_version 28754 (0.0037)
[2024-06-06 14:41:45,005][19065] Fps is (10 sec: 42598.2, 60 sec: 42052.3, 300 sec: 41876.4). Total num frames: 471171072. Throughput: 0: 42008.4. Samples: 62097360. Policy #0 lag: (min: 0.0, avg: 10.7, max: 21.0)
[2024-06-06 14:41:45,005][19065] Avg episode reward: [(0, '0.291')]
[2024-06-06 14:41:47,118][19297] Updated weights for policy 0, policy_version 28764 (0.0034)
[2024-06-06 14:41:50,005][19065] Fps is (10 sec: 40959.8, 60 sec: 42052.4, 300 sec: 41876.4). Total num frames: 471384064. Throughput: 0: 42299.1. Samples: 62356200. Policy #0 lag: (min: 0.0, avg: 10.7, max: 21.0)
[2024-06-06 14:41:50,005][19065] Avg episode reward: [(0, '0.293')]
[2024-06-06 14:41:51,258][19297] Updated weights for policy 0, policy_version 28774 (0.0043)
[2024-06-06 14:41:54,787][19297] Updated weights for policy 0, policy_version 28784 (0.0041)
[2024-06-06 14:41:55,008][19065] Fps is (10 sec: 42584.5, 60 sec: 42325.3, 300 sec: 41765.3). Total num frames: 471597056. Throughput: 0: 42114.0. Samples: 62480240. Policy #0 lag: (min: 0.0, avg: 10.7, max: 21.0)
[2024-06-06 14:41:55,009][19065] Avg episode reward: [(0, '0.305')]
[2024-06-06 14:41:58,902][19297] Updated weights for policy 0, policy_version 28794 (0.0043)
[2024-06-06 14:42:00,005][19065] Fps is (10 sec: 42598.1, 60 sec: 42598.3, 300 sec: 41931.9). Total num frames: 471810048. Throughput: 0: 42190.9. Samples: 62728500. Policy #0 lag: (min: 0.0, avg: 9.8, max: 21.0)
[2024-06-06 14:42:00,006][19065] Avg episode reward: [(0, '0.294')]
[2024-06-06 14:42:02,674][19297] Updated weights for policy 0, policy_version 28804 (0.0023)
[2024-06-06 14:42:05,005][19065] Fps is (10 sec: 40973.1, 60 sec: 42052.3, 300 sec: 41765.8). Total num frames: 472006656. Throughput: 0: 42275.5. Samples: 62984740. Policy #0 lag: (min: 0.0, avg: 9.8, max: 21.0)
[2024-06-06 14:42:05,005][19065] Avg episode reward: [(0, '0.301')]
[2024-06-06 14:42:06,707][19297] Updated weights for policy 0, policy_version 28814 (0.0037)
[2024-06-06 14:42:10,005][19065] Fps is (10 sec: 39322.3, 60 sec: 41779.4, 300 sec: 41765.3). Total num frames: 472203264. Throughput: 0: 42131.6. Samples: 63105820. Policy #0 lag: (min: 0.0, avg: 9.8, max: 21.0)
[2024-06-06 14:42:10,005][19065] Avg episode reward: [(0, '0.293')]
[2024-06-06 14:42:10,374][19297] Updated weights for policy 0, policy_version 28824 (0.0027)
[2024-06-06 14:42:14,351][19297] Updated weights for policy 0, policy_version 28834 (0.0042)
[2024-06-06 14:42:15,005][19065] Fps is (10 sec: 42598.6, 60 sec: 42325.3, 300 sec: 41876.4). Total num frames: 472432640. Throughput: 0: 42133.3. Samples: 63359460. Policy #0 lag: (min: 0.0, avg: 9.8, max: 21.0)
[2024-06-06 14:42:15,005][19065] Avg episode reward: [(0, '0.306')]
[2024-06-06 14:42:18,167][19297] Updated weights for policy 0, policy_version 28844 (0.0022)
[2024-06-06 14:42:20,005][19065] Fps is (10 sec: 42597.8, 60 sec: 42052.2, 300 sec: 41765.3). Total num frames: 472629248. Throughput: 0: 41919.4. Samples: 63606940. Policy #0 lag: (min: 0.0, avg: 10.3, max: 22.0)
[2024-06-06 14:42:20,005][19065] Avg episode reward: [(0, '0.301')]
[2024-06-06 14:42:21,899][19277] Signal inference workers to stop experience collection... (850 times)
[2024-06-06 14:42:21,899][19277] Signal inference workers to resume experience collection... (850 times)
[2024-06-06 14:42:21,915][19297] InferenceWorker_p0-w0: stopping experience collection (850 times)
[2024-06-06 14:42:21,915][19297] InferenceWorker_p0-w0: resuming experience collection (850 times)
[2024-06-06 14:42:22,048][19297] Updated weights for policy 0, policy_version 28854 (0.0028)
[2024-06-06 14:42:25,005][19065] Fps is (10 sec: 42598.5, 60 sec: 42325.3, 300 sec: 41876.4). Total num frames: 472858624. Throughput: 0: 41996.4. Samples: 63734060.
Policy #0 lag: (min: 0.0, avg: 10.3, max: 22.0) [2024-06-06 14:42:25,005][19065] Avg episode reward: [(0, '0.280')] [2024-06-06 14:42:26,189][19297] Updated weights for policy 0, policy_version 28864 (0.0037) [2024-06-06 14:42:29,841][19297] Updated weights for policy 0, policy_version 28874 (0.0037) [2024-06-06 14:42:30,005][19065] Fps is (10 sec: 44236.9, 60 sec: 42052.3, 300 sec: 41876.4). Total num frames: 473071616. Throughput: 0: 42096.4. Samples: 63991700. Policy #0 lag: (min: 0.0, avg: 10.3, max: 22.0) [2024-06-06 14:42:30,005][19065] Avg episode reward: [(0, '0.288')] [2024-06-06 14:42:30,014][19277] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000028874_473071616.pth... [2024-06-06 14:42:30,070][19277] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000028260_463011840.pth [2024-06-06 14:42:33,587][19297] Updated weights for policy 0, policy_version 28884 (0.0034) [2024-06-06 14:42:35,005][19065] Fps is (10 sec: 42598.4, 60 sec: 42325.3, 300 sec: 41820.9). Total num frames: 473284608. Throughput: 0: 42061.8. Samples: 64248980. Policy #0 lag: (min: 0.0, avg: 10.6, max: 22.0) [2024-06-06 14:42:35,005][19065] Avg episode reward: [(0, '0.283')] [2024-06-06 14:42:37,667][19297] Updated weights for policy 0, policy_version 28894 (0.0032) [2024-06-06 14:42:40,005][19065] Fps is (10 sec: 40960.6, 60 sec: 41779.2, 300 sec: 41876.4). Total num frames: 473481216. Throughput: 0: 42061.4. Samples: 64372860. Policy #0 lag: (min: 0.0, avg: 10.6, max: 22.0) [2024-06-06 14:42:40,005][19065] Avg episode reward: [(0, '0.298')] [2024-06-06 14:42:41,409][19297] Updated weights for policy 0, policy_version 28904 (0.0039) [2024-06-06 14:42:45,005][19065] Fps is (10 sec: 39321.7, 60 sec: 41779.2, 300 sec: 41876.7). Total num frames: 473677824. Throughput: 0: 42233.9. Samples: 64629020. 
Policy #0 lag: (min: 0.0, avg: 10.6, max: 22.0) [2024-06-06 14:42:45,005][19065] Avg episode reward: [(0, '0.305')] [2024-06-06 14:42:45,470][19297] Updated weights for policy 0, policy_version 28914 (0.0027) [2024-06-06 14:42:49,107][19297] Updated weights for policy 0, policy_version 28924 (0.0033) [2024-06-06 14:42:50,005][19065] Fps is (10 sec: 42598.3, 60 sec: 42052.3, 300 sec: 41765.8). Total num frames: 473907200. Throughput: 0: 41967.2. Samples: 64873260. Policy #0 lag: (min: 0.0, avg: 10.6, max: 22.0) [2024-06-06 14:42:50,005][19065] Avg episode reward: [(0, '0.296')] [2024-06-06 14:42:53,291][19297] Updated weights for policy 0, policy_version 28934 (0.0036) [2024-06-06 14:42:55,005][19065] Fps is (10 sec: 44236.7, 60 sec: 42054.5, 300 sec: 41932.0). Total num frames: 474120192. Throughput: 0: 42273.3. Samples: 65008120. Policy #0 lag: (min: 0.0, avg: 10.0, max: 20.0) [2024-06-06 14:42:55,005][19065] Avg episode reward: [(0, '0.305')] [2024-06-06 14:42:56,795][19297] Updated weights for policy 0, policy_version 28944 (0.0034) [2024-06-06 14:43:00,005][19065] Fps is (10 sec: 39321.8, 60 sec: 41506.3, 300 sec: 41765.3). Total num frames: 474300416. Throughput: 0: 42199.7. Samples: 65258440. Policy #0 lag: (min: 0.0, avg: 10.0, max: 20.0) [2024-06-06 14:43:00,005][19065] Avg episode reward: [(0, '0.305')] [2024-06-06 14:43:00,858][19297] Updated weights for policy 0, policy_version 28954 (0.0028) [2024-06-06 14:43:04,468][19297] Updated weights for policy 0, policy_version 28964 (0.0037) [2024-06-06 14:43:05,005][19065] Fps is (10 sec: 42598.5, 60 sec: 42325.4, 300 sec: 41876.9). Total num frames: 474546176. Throughput: 0: 42317.4. Samples: 65511220. Policy #0 lag: (min: 0.0, avg: 10.0, max: 20.0) [2024-06-06 14:43:05,005][19065] Avg episode reward: [(0, '0.303')] [2024-06-06 14:43:08,845][19297] Updated weights for policy 0, policy_version 28974 (0.0037) [2024-06-06 14:43:10,005][19065] Fps is (10 sec: 45874.5, 60 sec: 42598.3, 300 sec: 41932.4). 
Total num frames: 474759168. Throughput: 0: 42269.3. Samples: 65636180. Policy #0 lag: (min: 0.0, avg: 10.0, max: 20.0) [2024-06-06 14:43:10,005][19065] Avg episode reward: [(0, '0.299')] [2024-06-06 14:43:12,655][19297] Updated weights for policy 0, policy_version 28984 (0.0032) [2024-06-06 14:43:15,005][19065] Fps is (10 sec: 40959.9, 60 sec: 42052.3, 300 sec: 41876.4). Total num frames: 474955776. Throughput: 0: 42233.4. Samples: 65892200. Policy #0 lag: (min: 0.0, avg: 10.4, max: 20.0) [2024-06-06 14:43:15,005][19065] Avg episode reward: [(0, '0.301')] [2024-06-06 14:43:16,750][19297] Updated weights for policy 0, policy_version 28994 (0.0038) [2024-06-06 14:43:20,005][19065] Fps is (10 sec: 42598.2, 60 sec: 42598.4, 300 sec: 41876.8). Total num frames: 475185152. Throughput: 0: 42093.7. Samples: 66143200. Policy #0 lag: (min: 0.0, avg: 10.4, max: 20.0) [2024-06-06 14:43:20,005][19065] Avg episode reward: [(0, '0.292')] [2024-06-06 14:43:20,163][19297] Updated weights for policy 0, policy_version 29004 (0.0025) [2024-06-06 14:43:24,538][19297] Updated weights for policy 0, policy_version 29014 (0.0041) [2024-06-06 14:43:25,005][19065] Fps is (10 sec: 44236.6, 60 sec: 42325.3, 300 sec: 41987.5). Total num frames: 475398144. Throughput: 0: 42209.7. Samples: 66272300. Policy #0 lag: (min: 0.0, avg: 10.4, max: 20.0) [2024-06-06 14:43:25,005][19065] Avg episode reward: [(0, '0.288')] [2024-06-06 14:43:27,839][19297] Updated weights for policy 0, policy_version 29024 (0.0022) [2024-06-06 14:43:30,005][19065] Fps is (10 sec: 40959.8, 60 sec: 42052.2, 300 sec: 41876.4). Total num frames: 475594752. Throughput: 0: 41965.2. Samples: 66517460. Policy #0 lag: (min: 0.0, avg: 10.4, max: 20.0) [2024-06-06 14:43:30,005][19065] Avg episode reward: [(0, '0.297')] [2024-06-06 14:43:32,194][19297] Updated weights for policy 0, policy_version 29034 (0.0038) [2024-06-06 14:43:35,005][19065] Fps is (10 sec: 42598.4, 60 sec: 42325.3, 300 sec: 42098.5). Total num frames: 475824128. 
Throughput: 0: 42271.9. Samples: 66775500. Policy #0 lag: (min: 1.0, avg: 10.5, max: 23.0) [2024-06-06 14:43:35,005][19065] Avg episode reward: [(0, '0.295')] [2024-06-06 14:43:35,645][19297] Updated weights for policy 0, policy_version 29044 (0.0025) [2024-06-06 14:43:40,005][19065] Fps is (10 sec: 40960.5, 60 sec: 42052.2, 300 sec: 41931.9). Total num frames: 476004352. Throughput: 0: 42054.6. Samples: 66900580. Policy #0 lag: (min: 1.0, avg: 10.5, max: 23.0) [2024-06-06 14:43:40,005][19065] Avg episode reward: [(0, '0.281')] [2024-06-06 14:43:40,263][19297] Updated weights for policy 0, policy_version 29054 (0.0042) [2024-06-06 14:43:43,386][19297] Updated weights for policy 0, policy_version 29064 (0.0033) [2024-06-06 14:43:45,005][19065] Fps is (10 sec: 40960.0, 60 sec: 42598.3, 300 sec: 41931.9). Total num frames: 476233728. Throughput: 0: 42179.0. Samples: 67156500. Policy #0 lag: (min: 1.0, avg: 10.5, max: 23.0) [2024-06-06 14:43:45,005][19065] Avg episode reward: [(0, '0.307')] [2024-06-06 14:43:47,955][19297] Updated weights for policy 0, policy_version 29074 (0.0040) [2024-06-06 14:43:50,005][19065] Fps is (10 sec: 42598.8, 60 sec: 42052.3, 300 sec: 41932.4). Total num frames: 476430336. Throughput: 0: 41971.6. Samples: 67399940. Policy #0 lag: (min: 1.0, avg: 10.5, max: 23.0) [2024-06-06 14:43:50,005][19065] Avg episode reward: [(0, '0.306')] [2024-06-06 14:43:51,403][19297] Updated weights for policy 0, policy_version 29084 (0.0034) [2024-06-06 14:43:52,367][19277] Signal inference workers to stop experience collection... (900 times) [2024-06-06 14:43:52,424][19297] InferenceWorker_p0-w0: stopping experience collection (900 times) [2024-06-06 14:43:52,480][19277] Signal inference workers to resume experience collection... (900 times) [2024-06-06 14:43:52,481][19297] InferenceWorker_p0-w0: resuming experience collection (900 times) [2024-06-06 14:43:55,005][19065] Fps is (10 sec: 37682.9, 60 sec: 41506.0, 300 sec: 41820.8). Total num frames: 476610560. 
Throughput: 0: 42073.2. Samples: 67529480. Policy #0 lag: (min: 0.0, avg: 10.9, max: 23.0) [2024-06-06 14:43:55,005][19065] Avg episode reward: [(0, '0.292')] [2024-06-06 14:43:55,646][19297] Updated weights for policy 0, policy_version 29094 (0.0025) [2024-06-06 14:43:59,145][19297] Updated weights for policy 0, policy_version 29104 (0.0041) [2024-06-06 14:44:00,005][19065] Fps is (10 sec: 44236.1, 60 sec: 42871.3, 300 sec: 41987.5). Total num frames: 476872704. Throughput: 0: 41911.9. Samples: 67778240. Policy #0 lag: (min: 0.0, avg: 10.9, max: 23.0) [2024-06-06 14:44:00,005][19065] Avg episode reward: [(0, '0.304')] [2024-06-06 14:44:03,566][19297] Updated weights for policy 0, policy_version 29114 (0.0041) [2024-06-06 14:44:05,008][19065] Fps is (10 sec: 45860.9, 60 sec: 42050.0, 300 sec: 41987.0). Total num frames: 477069312. Throughput: 0: 41985.5. Samples: 68032680. Policy #0 lag: (min: 0.0, avg: 10.9, max: 23.0) [2024-06-06 14:44:05,012][19065] Avg episode reward: [(0, '0.301')] [2024-06-06 14:44:06,809][19297] Updated weights for policy 0, policy_version 29124 (0.0033) [2024-06-06 14:44:10,005][19065] Fps is (10 sec: 37683.4, 60 sec: 41506.2, 300 sec: 41876.4). Total num frames: 477249536. Throughput: 0: 41833.3. Samples: 68154800. Policy #0 lag: (min: 0.0, avg: 10.9, max: 23.0) [2024-06-06 14:44:10,005][19065] Avg episode reward: [(0, '0.285')] [2024-06-06 14:44:11,649][19297] Updated weights for policy 0, policy_version 29134 (0.0035) [2024-06-06 14:44:35,337][21617] Saving configuration to /workspace/metta/train_dir/p2.metta.4/config.json... 
[2024-06-06 14:44:35,354][21617] Rollout worker 0 uses device cpu
[2024-06-06 14:44:35,354][21617] Rollout worker 1 uses device cpu
[2024-06-06 14:44:35,355][21617] Rollout worker 2 uses device cpu
[2024-06-06 14:44:35,355][21617] Rollout worker 3 uses device cpu
[2024-06-06 14:44:35,355][21617] Rollout worker 4 uses device cpu
[2024-06-06 14:44:35,356][21617] Rollout worker 5 uses device cpu
[2024-06-06 14:44:35,356][21617] Rollout worker 6 uses device cpu
[2024-06-06 14:44:35,356][21617] Rollout worker 7 uses device cpu
[2024-06-06 14:44:35,356][21617] Rollout worker 8 uses device cpu
[2024-06-06 14:44:35,357][21617] Rollout worker 9 uses device cpu
[2024-06-06 14:44:35,357][21617] Rollout worker 10 uses device cpu
[2024-06-06 14:44:35,357][21617] Rollout worker 11 uses device cpu
[2024-06-06 14:44:35,357][21617] Rollout worker 12 uses device cpu
[2024-06-06 14:44:35,358][21617] Rollout worker 13 uses device cpu
[2024-06-06 14:44:35,358][21617] Rollout worker 14 uses device cpu
[2024-06-06 14:44:35,358][21617] Rollout worker 15 uses device cpu
[2024-06-06 14:44:35,359][21617] Rollout worker 16 uses device cpu
[2024-06-06 14:44:35,359][21617] Rollout worker 17 uses device cpu
[2024-06-06 14:44:35,359][21617] Rollout worker 18 uses device cpu
[2024-06-06 14:44:35,360][21617] Rollout worker 19 uses device cpu
[2024-06-06 14:44:35,360][21617] Rollout worker 20 uses device cpu
[2024-06-06 14:44:35,360][21617] Rollout worker 21 uses device cpu
[2024-06-06 14:44:35,361][21617] Rollout worker 22 uses device cpu
[2024-06-06 14:44:35,361][21617] Rollout worker 23 uses device cpu
[2024-06-06 14:44:35,361][21617] Rollout worker 24 uses device cpu
[2024-06-06 14:44:35,362][21617] Rollout worker 25 uses device cpu
[2024-06-06 14:44:35,362][21617] Rollout worker 26 uses device cpu
[2024-06-06 14:44:35,362][21617] Rollout worker 27 uses device cpu
[2024-06-06 14:44:35,363][21617] Rollout worker 28 uses device cpu
[2024-06-06 14:44:35,363][21617] Rollout worker 29 uses device cpu
[2024-06-06 14:44:35,363][21617] Rollout worker 30 uses device cpu
[2024-06-06 14:44:35,363][21617] Rollout worker 31 uses device cpu
[2024-06-06 14:44:35,909][21617] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-06-06 14:44:35,909][21617] InferenceWorker_p0-w0: min num requests: 10
[2024-06-06 14:44:35,953][21617] Starting all processes...
[2024-06-06 14:44:35,953][21617] Starting process learner_proc0
[2024-06-06 14:44:36,225][21617] Starting all processes...
[2024-06-06 14:44:36,227][21617] Starting process inference_proc0-0
[2024-06-06 14:44:36,227][21617] Starting process rollout_proc0
[2024-06-06 14:44:36,228][21617] Starting process rollout_proc1
[2024-06-06 14:44:36,228][21617] Starting process rollout_proc2
[2024-06-06 14:44:36,231][21617] Starting process rollout_proc7
[2024-06-06 14:44:36,229][21617] Starting process rollout_proc4
[2024-06-06 14:44:36,229][21617] Starting process rollout_proc5
[2024-06-06 14:44:36,231][21617] Starting process rollout_proc6
[2024-06-06 14:44:36,228][21617] Starting process rollout_proc3
[2024-06-06 14:44:36,231][21617] Starting process rollout_proc8
[2024-06-06 14:44:36,232][21617] Starting process rollout_proc9
[2024-06-06 14:44:36,233][21617] Starting process rollout_proc10
[2024-06-06 14:44:36,233][21617] Starting process rollout_proc11
[2024-06-06 14:44:36,233][21617] Starting process rollout_proc12
[2024-06-06 14:44:36,233][21617] Starting process rollout_proc13
[2024-06-06 14:44:36,233][21617] Starting process rollout_proc14
[2024-06-06 14:44:36,233][21617] Starting process rollout_proc15
[2024-06-06 14:44:36,233][21617] Starting process rollout_proc16
[2024-06-06 14:44:36,237][21617] Starting process rollout_proc17
[2024-06-06 14:44:36,238][21617] Starting process rollout_proc18
[2024-06-06 14:44:36,239][21617] Starting process rollout_proc19
[2024-06-06 14:44:36,241][21617] Starting process rollout_proc20
[2024-06-06 14:44:36,243][21617] Starting process rollout_proc21
[2024-06-06 14:44:36,244][21617] Starting process rollout_proc22
[2024-06-06 14:44:36,244][21617] Starting process rollout_proc23
[2024-06-06 14:44:36,245][21617] Starting process rollout_proc24
[2024-06-06 14:44:36,246][21617] Starting process rollout_proc25
[2024-06-06 14:44:36,250][21617] Starting process rollout_proc26
[2024-06-06 14:44:36,252][21617] Starting process rollout_proc27
[2024-06-06 14:44:36,253][21617] Starting process rollout_proc28
[2024-06-06 14:44:36,253][21617] Starting process rollout_proc29
[2024-06-06 14:44:36,253][21617] Starting process rollout_proc30
[2024-06-06 14:44:36,254][21617] Starting process rollout_proc31
[2024-06-06 14:44:38,384][21849] Worker 0 uses CPU cores [0]
[2024-06-06 14:44:38,456][21853] Worker 1 uses CPU cores [1]
[2024-06-06 14:44:38,504][21876] Worker 25 uses CPU cores [25]
[2024-06-06 14:44:38,519][21856] Worker 8 uses CPU cores [8]
[2024-06-06 14:44:38,520][21872] Worker 23 uses CPU cores [23]
[2024-06-06 14:44:38,524][21874] Worker 24 uses CPU cores [24]
[2024-06-06 14:44:38,535][21850] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-06-06 14:44:38,535][21850] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2024-06-06 14:44:38,544][21858] Worker 6 uses CPU cores [6]
[2024-06-06 14:44:38,544][21850] Num visible devices: 1
[2024-06-06 14:44:38,560][21869] Worker 19 uses CPU cores [19]
[2024-06-06 14:44:38,574][21829] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-06-06 14:44:38,574][21829] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2024-06-06 14:44:38,584][21829] Num visible devices: 1
[2024-06-06 14:44:38,604][21829] Setting fixed seed 0
[2024-06-06 14:44:38,605][21829] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-06-06 14:44:38,605][21829] Initializing actor-critic model on device cuda:0
[2024-06-06 14:44:38,608][21871] Worker 22 uses CPU cores [22]
[2024-06-06 14:44:38,611][21861] Worker 12 uses CPU cores [12]
[2024-06-06 14:44:38,616][21879] Worker 30 uses CPU cores [30]
[2024-06-06 14:44:38,620][21854] Worker 4 uses CPU cores [4]
[2024-06-06 14:44:38,623][21875] Worker 26 uses CPU cores [26]
[2024-06-06 14:44:38,624][21880] Worker 28 uses CPU cores [28]
[2024-06-06 14:44:38,632][21857] Worker 10 uses CPU cores [10]
[2024-06-06 14:44:38,636][21867] Worker 16 uses CPU cores [16]
[2024-06-06 14:44:38,655][21852] Worker 2 uses CPU cores [2]
[2024-06-06 14:44:38,660][21855] Worker 3 uses CPU cores [3]
[2024-06-06 14:44:38,668][21870] Worker 21 uses CPU cores [21]
[2024-06-06 14:44:38,679][21863] Worker 9 uses CPU cores [9]
[2024-06-06 14:44:38,679][21865] Worker 14 uses CPU cores [14]
[2024-06-06 14:44:38,687][21873] Worker 20 uses CPU cores [20]
[2024-06-06 14:44:38,690][21862] Worker 13 uses CPU cores [13]
[2024-06-06 14:44:38,696][21866] Worker 17 uses CPU cores [17]
[2024-06-06 14:44:38,735][21864] Worker 11 uses CPU cores [11]
[2024-06-06 14:44:38,784][21878] Worker 29 uses CPU cores [29]
[2024-06-06 14:44:38,810][21877] Worker 27 uses CPU cores [27]
[2024-06-06 14:44:38,810][21881] Worker 31 uses CPU cores [31]
[2024-06-06 14:44:38,813][21860] Worker 15 uses CPU cores [15]
[2024-06-06 14:44:38,832][21868] Worker 18 uses CPU cores [18]
[2024-06-06 14:44:38,840][21851] Worker 7 uses CPU cores [7]
[2024-06-06 14:44:38,849][21859] Worker 5 uses CPU cores [5]
[2024-06-06 14:44:39,389][21829] RunningMeanStd input shape: (11, 11)
[2024-06-06 14:44:39,389][21829] RunningMeanStd input shape: (11, 11)
[2024-06-06 14:44:39,390][21829] RunningMeanStd input shape: (11, 11)
[2024-06-06 14:44:39,390][21829] RunningMeanStd input shape: (11, 11)
[2024-06-06 14:44:39,390][21829] RunningMeanStd input shape: (11, 11)
[2024-06-06 14:44:39,390][21829] RunningMeanStd input shape: (11, 11)
[2024-06-06 14:44:39,390][21829] RunningMeanStd input shape: (11, 11)
[2024-06-06 14:44:39,390][21829] RunningMeanStd input shape: (11, 11)
[2024-06-06 14:44:39,390][21829] RunningMeanStd input shape: (11, 11)
[2024-06-06 14:44:39,390][21829] RunningMeanStd input shape: (11, 11)
[2024-06-06 14:44:39,390][21829] RunningMeanStd input shape: (11, 11)
[2024-06-06 14:44:39,390][21829] RunningMeanStd input shape: (11, 11)
[2024-06-06 14:44:39,390][21829] RunningMeanStd input shape: (11, 11)
[2024-06-06 14:44:39,390][21829] RunningMeanStd input shape: (11, 11)
[2024-06-06 14:44:39,390][21829] RunningMeanStd input shape: (11, 11)
[2024-06-06 14:44:39,390][21829] RunningMeanStd input shape: (11, 11)
[2024-06-06 14:44:39,390][21829] RunningMeanStd input shape: (11, 11)
[2024-06-06 14:44:39,390][21829] RunningMeanStd input shape: (11, 11)
[2024-06-06 14:44:39,390][21829] RunningMeanStd input shape: (11, 11)
[2024-06-06 14:44:39,390][21829] RunningMeanStd input shape: (11, 11)
[2024-06-06 14:44:39,390][21829] RunningMeanStd input shape: (11, 11)
[2024-06-06 14:44:39,390][21829] RunningMeanStd input shape: (11, 11)
[2024-06-06 14:44:39,393][21829] RunningMeanStd input shape: (1,)
[2024-06-06 14:44:39,394][21829] RunningMeanStd input shape: (1,)
[2024-06-06 14:44:39,394][21829] RunningMeanStd input shape: (1,)
[2024-06-06 14:44:39,394][21829] RunningMeanStd input shape: (1,)
[2024-06-06 14:44:39,433][21829] RunningMeanStd input shape: (1,)
[2024-06-06 14:44:39,437][21829] Created Actor Critic model with architecture:
[2024-06-06 14:44:39,438][21829] SampleFactoryAgentWrapper( (obs_normalizer): ObservationNormalizer() (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) (agent): MettaAgent( (_encoder): MultiFeatureSetEncoder( (feature_set_encoders): ModuleDict( (grid_obs): FeatureSetEncoder( (_normalizer): FeatureListNormalizer( (_norms_dict): ModuleDict( (agent): RunningMeanStdInPlace() (altar): RunningMeanStdInPlace() (converter): RunningMeanStdInPlace() (generator): RunningMeanStdInPlace() (wall): RunningMeanStdInPlace() (agent:dir): RunningMeanStdInPlace() (agent:energy): RunningMeanStdInPlace() (agent:frozen): RunningMeanStdInPlace() (agent:hp): RunningMeanStdInPlace() (agent:id): RunningMeanStdInPlace() (agent:inv_r1): RunningMeanStdInPlace() (agent:inv_r2): RunningMeanStdInPlace() (agent:inv_r3): RunningMeanStdInPlace() (agent:shield): RunningMeanStdInPlace() (altar:hp): RunningMeanStdInPlace() (altar:state): RunningMeanStdInPlace() (converter:hp): RunningMeanStdInPlace() (converter:state): RunningMeanStdInPlace() (generator:amount): RunningMeanStdInPlace() (generator:hp): RunningMeanStdInPlace() (generator:state): RunningMeanStdInPlace() (wall:hp): RunningMeanStdInPlace() ) ) (embedding_net): Sequential( (0): Linear(in_features=125, out_features=512, bias=True) (1): ELU(alpha=1.0) (2): Linear(in_features=512, out_features=512, bias=True) (3): ELU(alpha=1.0) (4): Linear(in_features=512, out_features=512, bias=True) (5): ELU(alpha=1.0) (6): Linear(in_features=512, out_features=512, bias=True) (7): ELU(alpha=1.0) ) ) (global_vars): FeatureSetEncoder( (_normalizer): FeatureListNormalizer( (_norms_dict): ModuleDict( (_steps): RunningMeanStdInPlace() ) ) (embedding_net): Sequential( (0): Linear(in_features=5, out_features=8, bias=True) (1): ELU(alpha=1.0) (2): Linear(in_features=8, out_features=8, bias=True) (3): ELU(alpha=1.0) ) ) (last_action): FeatureSetEncoder( (_normalizer): FeatureListNormalizer( (_norms_dict): ModuleDict( (last_action_id): RunningMeanStdInPlace() (last_action_val): RunningMeanStdInPlace() ) ) (embedding_net): Sequential( (0): Linear(in_features=5, out_features=8, bias=True) (1): ELU(alpha=1.0) (2): Linear(in_features=8, out_features=8, bias=True) (3): ELU(alpha=1.0) ) ) (last_reward): FeatureSetEncoder( (_normalizer): FeatureListNormalizer( (_norms_dict): ModuleDict( (last_reward): RunningMeanStdInPlace() ) ) (embedding_net): Sequential( (0): Linear(in_features=5, out_features=8, bias=True) (1): ELU(alpha=1.0) (2): Linear(in_features=8, out_features=8, bias=True) (3): ELU(alpha=1.0) ) ) ) (merged_encoder): Sequential( (0): Linear(in_features=536, out_features=512, bias=True) (1): ELU(alpha=1.0) (2): Linear(in_features=512, out_features=512, bias=True) (3): ELU(alpha=1.0) (4): Linear(in_features=512, out_features=512, bias=True) (5): ELU(alpha=1.0) ) ) (_core): ModelCoreRNN( (core): GRU(512, 512) ) (_decoder): Decoder( (mlp): Identity() ) (_critic_linear): Linear(in_features=512, out_features=1, bias=True) (_action_parameterization): ActionParameterizationDefault( (distribution_linear): Linear(in_features=512, out_features=16, bias=True) ) ) )
[2024-06-06 14:44:39,506][21829] Using optimizer
[2024-06-06 14:44:39,695][21829] Loading state from checkpoint /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000028874_473071616.pth...
[2024-06-06 14:44:39,709][21829] Loading model from checkpoint
[2024-06-06 14:44:39,710][21829] Loaded experiment state at self.train_step=28874, self.env_steps=473071616
[2024-06-06 14:44:39,711][21829] Initialized policy 0 weights for model version 28874
[2024-06-06 14:44:39,712][21829] LearnerWorker_p0 finished initialization!
[2024-06-06 14:44:39,712][21829] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-06-06 14:44:40,445][21850] RunningMeanStd input shape: (11, 11) [2024-06-06 14:44:40,445][21850] RunningMeanStd input shape: (11, 11) [2024-06-06 14:44:40,445][21850] RunningMeanStd input shape: (11, 11) [2024-06-06 14:44:40,445][21850] RunningMeanStd input shape: (11, 11) [2024-06-06 14:44:40,445][21850] RunningMeanStd input shape: (11, 11) [2024-06-06 14:44:40,445][21850] RunningMeanStd input shape: (11, 11) [2024-06-06 14:44:40,445][21850] RunningMeanStd input shape: (11, 11) [2024-06-06 14:44:40,445][21850] RunningMeanStd input shape: (11, 11) [2024-06-06 14:44:40,445][21850] RunningMeanStd input shape: (11, 11) [2024-06-06 14:44:40,445][21850] RunningMeanStd input shape: (11, 11) [2024-06-06 14:44:40,445][21850] RunningMeanStd input shape: (11, 11) [2024-06-06 14:44:40,445][21850] RunningMeanStd input shape: (11, 11) [2024-06-06 14:44:40,445][21850] RunningMeanStd input shape: (11, 11) [2024-06-06 14:44:40,445][21850] RunningMeanStd input shape: (11, 11) [2024-06-06 14:44:40,445][21850] RunningMeanStd input shape: (11, 11) [2024-06-06 14:44:40,445][21850] RunningMeanStd input shape: (11, 11) [2024-06-06 14:44:40,446][21850] RunningMeanStd input shape: (11, 11) [2024-06-06 14:44:40,446][21850] RunningMeanStd input shape: (11, 11) [2024-06-06 14:44:40,446][21850] RunningMeanStd input shape: (11, 11) [2024-06-06 14:44:40,446][21850] RunningMeanStd input shape: (11, 11) [2024-06-06 14:44:40,446][21850] RunningMeanStd input shape: (11, 11) [2024-06-06 14:44:40,446][21850] RunningMeanStd input shape: (11, 11) [2024-06-06 14:44:40,449][21850] RunningMeanStd input shape: (1,) [2024-06-06 14:44:40,449][21850] RunningMeanStd input shape: (1,) [2024-06-06 14:44:40,449][21850] RunningMeanStd input shape: (1,) [2024-06-06 14:44:40,449][21850] RunningMeanStd input shape: (1,) [2024-06-06 14:44:40,488][21850] RunningMeanStd input shape: (1,) [2024-06-06 14:44:40,510][21617] 
Inference worker 0-0 is ready! [2024-06-06 14:44:40,511][21617] All inference workers are ready! Signal rollout workers to start! [2024-06-06 14:44:43,055][21617] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 473071616. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-06-06 14:44:43,214][21867] Decorrelating experience for 0 frames... [2024-06-06 14:44:43,215][21869] Decorrelating experience for 0 frames... [2024-06-06 14:44:43,244][21868] Decorrelating experience for 0 frames... [2024-06-06 14:44:43,248][21879] Decorrelating experience for 0 frames... [2024-06-06 14:44:43,254][21876] Decorrelating experience for 0 frames... [2024-06-06 14:44:43,254][21870] Decorrelating experience for 0 frames... [2024-06-06 14:44:43,258][21866] Decorrelating experience for 0 frames... [2024-06-06 14:44:43,276][21875] Decorrelating experience for 0 frames... [2024-06-06 14:44:43,280][21881] Decorrelating experience for 0 frames... [2024-06-06 14:44:43,281][21878] Decorrelating experience for 0 frames... [2024-06-06 14:44:43,285][21873] Decorrelating experience for 0 frames... [2024-06-06 14:44:43,291][21874] Decorrelating experience for 0 frames... [2024-06-06 14:44:43,309][21860] Decorrelating experience for 0 frames... [2024-06-06 14:44:43,310][21863] Decorrelating experience for 0 frames... [2024-06-06 14:44:43,310][21864] Decorrelating experience for 0 frames... [2024-06-06 14:44:43,311][21851] Decorrelating experience for 0 frames... [2024-06-06 14:44:43,312][21853] Decorrelating experience for 0 frames... [2024-06-06 14:44:43,314][21856] Decorrelating experience for 0 frames... [2024-06-06 14:44:43,315][21861] Decorrelating experience for 0 frames... [2024-06-06 14:44:43,319][21858] Decorrelating experience for 0 frames... [2024-06-06 14:44:43,319][21865] Decorrelating experience for 0 frames... [2024-06-06 14:44:43,320][21859] Decorrelating experience for 0 frames... 
[2024-06-06 14:44:43,322][21849] Decorrelating experience for 0 frames...
[2024-06-06 14:44:43,322][21854] Decorrelating experience for 0 frames...
[2024-06-06 14:44:43,322][21857] Decorrelating experience for 0 frames...
[2024-06-06 14:44:43,322][21852] Decorrelating experience for 0 frames...
[2024-06-06 14:44:43,323][21862] Decorrelating experience for 0 frames...
[2024-06-06 14:44:43,326][21855] Decorrelating experience for 0 frames...
[2024-06-06 14:44:43,326][21872] Decorrelating experience for 0 frames...
[2024-06-06 14:44:43,332][21871] Decorrelating experience for 0 frames...
[2024-06-06 14:44:43,335][21880] Decorrelating experience for 0 frames...
[2024-06-06 14:44:43,344][21877] Decorrelating experience for 0 frames...
[2024-06-06 14:44:44,274][21869] Decorrelating experience for 256 frames...
[2024-06-06 14:44:44,283][21867] Decorrelating experience for 256 frames...
[2024-06-06 14:44:44,319][21868] Decorrelating experience for 256 frames...
[2024-06-06 14:44:44,346][21879] Decorrelating experience for 256 frames...
[2024-06-06 14:44:44,351][21876] Decorrelating experience for 256 frames...
[2024-06-06 14:44:44,354][21870] Decorrelating experience for 256 frames...
[2024-06-06 14:44:44,355][21866] Decorrelating experience for 256 frames...
[2024-06-06 14:44:44,378][21875] Decorrelating experience for 256 frames...
[2024-06-06 14:44:44,385][21881] Decorrelating experience for 256 frames...
[2024-06-06 14:44:44,387][21873] Decorrelating experience for 256 frames...
[2024-06-06 14:44:44,387][21878] Decorrelating experience for 256 frames...
[2024-06-06 14:44:44,389][21874] Decorrelating experience for 256 frames...
[2024-06-06 14:44:44,433][21863] Decorrelating experience for 256 frames...
[2024-06-06 14:44:44,435][21860] Decorrelating experience for 256 frames...
[2024-06-06 14:44:44,438][21851] Decorrelating experience for 256 frames...
[2024-06-06 14:44:44,441][21864] Decorrelating experience for 256 frames...
[2024-06-06 14:44:44,444][21856] Decorrelating experience for 256 frames...
[2024-06-06 14:44:44,448][21853] Decorrelating experience for 256 frames...
[2024-06-06 14:44:44,449][21861] Decorrelating experience for 256 frames...
[2024-06-06 14:44:44,454][21858] Decorrelating experience for 256 frames...
[2024-06-06 14:44:44,463][21859] Decorrelating experience for 256 frames...
[2024-06-06 14:44:44,466][21865] Decorrelating experience for 256 frames...
[2024-06-06 14:44:44,466][21862] Decorrelating experience for 256 frames...
[2024-06-06 14:44:44,466][21857] Decorrelating experience for 256 frames...
[2024-06-06 14:44:44,468][21854] Decorrelating experience for 256 frames...
[2024-06-06 14:44:44,469][21849] Decorrelating experience for 256 frames...
[2024-06-06 14:44:44,471][21852] Decorrelating experience for 256 frames...
[2024-06-06 14:44:44,471][21871] Decorrelating experience for 256 frames...
[2024-06-06 14:44:44,478][21855] Decorrelating experience for 256 frames...
[2024-06-06 14:44:44,502][21872] Decorrelating experience for 256 frames...
[2024-06-06 14:44:44,506][21880] Decorrelating experience for 256 frames...
[2024-06-06 14:44:44,515][21877] Decorrelating experience for 256 frames...
[2024-06-06 14:44:48,056][21617] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 473071616. Throughput: 0: 9506.7. Samples: 47540. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-06-06 14:44:50,982][21866] Worker 17, sleep for 79.688 sec to decorrelate experience collection
[2024-06-06 14:44:50,993][21867] Worker 16, sleep for 75.000 sec to decorrelate experience collection
[2024-06-06 14:44:51,005][21870] Worker 21, sleep for 98.438 sec to decorrelate experience collection
[2024-06-06 14:44:51,005][21868] Worker 18, sleep for 84.375 sec to decorrelate experience collection
[2024-06-06 14:44:51,005][21869] Worker 19, sleep for 89.062 sec to decorrelate experience collection
[2024-06-06 14:44:51,015][21861] Worker 12, sleep for 56.250 sec to decorrelate experience collection
[2024-06-06 14:44:51,015][21863] Worker 9, sleep for 42.188 sec to decorrelate experience collection
[2024-06-06 14:44:51,016][21871] Worker 22, sleep for 103.125 sec to decorrelate experience collection
[2024-06-06 14:44:51,016][21876] Worker 25, sleep for 117.188 sec to decorrelate experience collection
[2024-06-06 14:44:51,019][21873] Worker 20, sleep for 93.750 sec to decorrelate experience collection
[2024-06-06 14:44:51,026][21852] Worker 2, sleep for 9.375 sec to decorrelate experience collection
[2024-06-06 14:44:51,026][21857] Worker 10, sleep for 46.875 sec to decorrelate experience collection
[2024-06-06 14:44:51,026][21872] Worker 23, sleep for 107.812 sec to decorrelate experience collection
[2024-06-06 14:44:51,027][21875] Worker 26, sleep for 121.875 sec to decorrelate experience collection
[2024-06-06 14:44:51,031][21865] Worker 14, sleep for 65.625 sec to decorrelate experience collection
[2024-06-06 14:44:51,034][21881] Worker 31, sleep for 145.312 sec to decorrelate experience collection
[2024-06-06 14:44:51,039][21862] Worker 13, sleep for 60.938 sec to decorrelate experience collection
[2024-06-06 14:44:51,041][21879] Worker 30, sleep for 140.625 sec to decorrelate experience collection
[2024-06-06 14:44:51,045][21874] Worker 24, sleep for 112.500 sec to decorrelate experience collection
[2024-06-06 14:44:51,049][21856] Worker 8, sleep for 37.500 sec to decorrelate experience collection
[2024-06-06 14:44:51,049][21860] Worker 15, sleep for 70.312 sec to decorrelate experience collection
[2024-06-06 14:44:51,056][21877] Worker 27, sleep for 126.562 sec to decorrelate experience collection
[2024-06-06 14:44:51,060][21853] Worker 1, sleep for 4.688 sec to decorrelate experience collection
[2024-06-06 14:44:51,065][21878] Worker 29, sleep for 135.938 sec to decorrelate experience collection
[2024-06-06 14:44:51,072][21880] Worker 28, sleep for 131.250 sec to decorrelate experience collection
[2024-06-06 14:44:51,076][21855] Worker 3, sleep for 14.062 sec to decorrelate experience collection
[2024-06-06 14:44:51,077][21864] Worker 11, sleep for 51.562 sec to decorrelate experience collection
[2024-06-06 14:44:51,091][21859] Worker 5, sleep for 23.438 sec to decorrelate experience collection
[2024-06-06 14:44:51,119][21829] Signal inference workers to stop experience collection...
[2024-06-06 14:44:51,124][21858] Worker 6, sleep for 28.125 sec to decorrelate experience collection
[2024-06-06 14:44:51,176][21850] InferenceWorker_p0-w0: stopping experience collection
[2024-06-06 14:44:51,209][21851] Worker 7, sleep for 32.812 sec to decorrelate experience collection
[2024-06-06 14:44:51,672][21829] Signal inference workers to resume experience collection...
[2024-06-06 14:44:51,673][21850] InferenceWorker_p0-w0: resuming experience collection
[2024-06-06 14:44:51,945][21854] Worker 4, sleep for 18.750 sec to decorrelate experience collection
[2024-06-06 14:44:52,788][21850] Updated weights for policy 0, policy_version 28884 (0.0013)
[2024-06-06 14:44:53,056][21617] Fps is (10 sec: 16383.9, 60 sec: 16383.9, 300 sec: 16383.9). Total num frames: 473235456. Throughput: 0: 32839.9. Samples: 328400. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2024-06-06 14:44:55,771][21853] Worker 1 awakens!
[2024-06-06 14:44:55,905][21617] Heartbeat connected on Batcher_0
[2024-06-06 14:44:55,907][21617] Heartbeat connected on LearnerWorker_p0
[2024-06-06 14:44:55,924][21617] Heartbeat connected on RolloutWorker_w1
[2024-06-06 14:44:55,924][21617] Heartbeat connected on RolloutWorker_w0
[2024-06-06 14:44:55,975][21617] Heartbeat connected on InferenceWorker_p0-w0
[2024-06-06 14:44:58,056][21617] Fps is (10 sec: 16385.0, 60 sec: 10922.6, 300 sec: 10922.6). Total num frames: 473235456. Throughput: 0: 22114.6. Samples: 331720. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2024-06-06 14:45:00,448][21852] Worker 2 awakens!
[2024-06-06 14:45:00,456][21617] Heartbeat connected on RolloutWorker_w2
[2024-06-06 14:45:03,055][21617] Fps is (10 sec: 1638.4, 60 sec: 9011.2, 300 sec: 9011.2). Total num frames: 473251840. Throughput: 0: 17371.0. Samples: 347420. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2024-06-06 14:45:05,209][21855] Worker 3 awakens!
[2024-06-06 14:45:05,225][21617] Heartbeat connected on RolloutWorker_w3
[2024-06-06 14:45:08,056][21617] Fps is (10 sec: 3276.7, 60 sec: 7864.2, 300 sec: 7864.2). Total num frames: 473268224. Throughput: 0: 14869.4. Samples: 371740. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2024-06-06 14:45:10,788][21854] Worker 4 awakens!
[2024-06-06 14:45:10,796][21617] Heartbeat connected on RolloutWorker_w4
[2024-06-06 14:45:13,055][21617] Fps is (10 sec: 6553.6, 60 sec: 8192.0, 300 sec: 8192.0). Total num frames: 473317376. Throughput: 0: 12856.0. Samples: 385680. Policy #0 lag: (min: 0.0, avg: 4.3, max: 13.0)
[2024-06-06 14:45:13,056][21617] Avg episode reward: [(0, '0.127')]
[2024-06-06 14:45:14,628][21859] Worker 5 awakens!
[2024-06-06 14:45:14,635][21617] Heartbeat connected on RolloutWorker_w5
[2024-06-06 14:45:18,055][21617] Fps is (10 sec: 9830.8, 60 sec: 8426.1, 300 sec: 8426.1). Total num frames: 473366528. Throughput: 0: 13197.7. Samples: 461920. Policy #0 lag: (min: 0.0, avg: 4.3, max: 13.0)
[2024-06-06 14:45:18,056][21617] Avg episode reward: [(0, '0.147')]
[2024-06-06 14:45:18,702][21850] Updated weights for policy 0, policy_version 28894 (0.0015)
[2024-06-06 14:45:19,348][21858] Worker 6 awakens!
[2024-06-06 14:45:19,352][21617] Heartbeat connected on RolloutWorker_w6
[2024-06-06 14:45:23,056][21617] Fps is (10 sec: 16383.8, 60 sec: 10240.0, 300 sec: 10240.0). Total num frames: 473481216. Throughput: 0: 14183.0. Samples: 567320. Policy #0 lag: (min: 0.0, avg: 4.3, max: 13.0)
[2024-06-06 14:45:23,056][21617] Avg episode reward: [(0, '0.160')]
[2024-06-06 14:45:24,054][21851] Worker 7 awakens!
[2024-06-06 14:45:24,059][21617] Heartbeat connected on RolloutWorker_w7
[2024-06-06 14:45:26,047][21850] Updated weights for policy 0, policy_version 28904 (0.0011)
[2024-06-06 14:45:28,055][21617] Fps is (10 sec: 21299.0, 60 sec: 11286.7, 300 sec: 11286.7). Total num frames: 473579520. Throughput: 0: 14119.5. Samples: 635380. Policy #0 lag: (min: 0.0, avg: 4.3, max: 13.0)
[2024-06-06 14:45:28,056][21617] Avg episode reward: [(0, '0.170')]
[2024-06-06 14:45:28,649][21856] Worker 8 awakens!
[2024-06-06 14:45:28,654][21617] Heartbeat connected on RolloutWorker_w8
[2024-06-06 14:45:33,055][21617] Fps is (10 sec: 22938.0, 60 sec: 12779.5, 300 sec: 12779.5). Total num frames: 473710592. Throughput: 0: 16134.5. Samples: 773580. Policy #0 lag: (min: 0.0, avg: 3.5, max: 8.0)
[2024-06-06 14:45:33,056][21617] Avg episode reward: [(0, '0.176')]
[2024-06-06 14:45:33,254][21863] Worker 9 awakens!
[2024-06-06 14:45:33,261][21617] Heartbeat connected on RolloutWorker_w9
[2024-06-06 14:45:33,470][21850] Updated weights for policy 0, policy_version 28914 (0.0011)
[2024-06-06 14:45:38,000][21857] Worker 10 awakens!
[2024-06-06 14:45:38,005][21617] Heartbeat connected on RolloutWorker_w10
[2024-06-06 14:45:38,055][21617] Fps is (10 sec: 26214.6, 60 sec: 14000.9, 300 sec: 14000.9). Total num frames: 473841664. Throughput: 0: 13236.0. Samples: 924020. Policy #0 lag: (min: 0.0, avg: 3.5, max: 8.0)
[2024-06-06 14:45:38,056][21617] Avg episode reward: [(0, '0.168')]
[2024-06-06 14:45:39,617][21850] Updated weights for policy 0, policy_version 28924 (0.0013)
[2024-06-06 14:45:42,738][21864] Worker 11 awakens!
[2024-06-06 14:45:42,751][21617] Heartbeat connected on RolloutWorker_w11
[2024-06-06 14:45:43,056][21617] Fps is (10 sec: 27852.3, 60 sec: 15291.7, 300 sec: 15291.7). Total num frames: 473989120. Throughput: 0: 15316.4. Samples: 1020960. Policy #0 lag: (min: 0.0, avg: 3.5, max: 8.0)
[2024-06-06 14:45:43,056][21617] Avg episode reward: [(0, '0.173')]
[2024-06-06 14:45:44,523][21850] Updated weights for policy 0, policy_version 28934 (0.0016)
[2024-06-06 14:45:47,364][21861] Worker 12 awakens!
[2024-06-06 14:45:47,369][21617] Heartbeat connected on RolloutWorker_w12
[2024-06-06 14:45:48,056][21617] Fps is (10 sec: 32767.6, 60 sec: 18295.6, 300 sec: 16888.1). Total num frames: 474169344. Throughput: 0: 19370.6. Samples: 1219100. Policy #0 lag: (min: 0.0, avg: 3.5, max: 8.0)
[2024-06-06 14:45:48,056][21617] Avg episode reward: [(0, '0.183')]
[2024-06-06 14:45:49,258][21850] Updated weights for policy 0, policy_version 28944 (0.0015)
[2024-06-06 14:45:52,076][21862] Worker 13 awakens!
[2024-06-06 14:45:52,085][21617] Heartbeat connected on RolloutWorker_w13
[2024-06-06 14:45:53,055][21617] Fps is (10 sec: 36044.9, 60 sec: 18568.5, 300 sec: 18256.4). Total num frames: 474349568. Throughput: 0: 23494.4. Samples: 1428980. Policy #0 lag: (min: 0.0, avg: 4.9, max: 9.0)
[2024-06-06 14:45:53,056][21617] Avg episode reward: [(0, '0.171')]
[2024-06-06 14:45:53,834][21850] Updated weights for policy 0, policy_version 28954 (0.0018)
[2024-06-06 14:45:56,756][21865] Worker 14 awakens!
[2024-06-06 14:45:56,762][21617] Heartbeat connected on RolloutWorker_w14
[2024-06-06 14:45:58,056][21617] Fps is (10 sec: 36044.8, 60 sec: 21572.3, 300 sec: 19442.3). Total num frames: 474529792. Throughput: 0: 25545.7. Samples: 1535240. Policy #0 lag: (min: 0.0, avg: 4.9, max: 9.0)
[2024-06-06 14:45:58,056][21617] Avg episode reward: [(0, '0.169')]
[2024-06-06 14:45:58,952][21850] Updated weights for policy 0, policy_version 28964 (0.0021)
[2024-06-06 14:46:01,460][21860] Worker 15 awakens!
[2024-06-06 14:46:01,466][21617] Heartbeat connected on RolloutWorker_w15
[2024-06-06 14:46:03,056][21617] Fps is (10 sec: 34406.3, 60 sec: 24029.8, 300 sec: 20275.2). Total num frames: 474693632. Throughput: 0: 28512.8. Samples: 1745000. Policy #0 lag: (min: 0.0, avg: 4.9, max: 9.0)
[2024-06-06 14:46:03,056][21617] Avg episode reward: [(0, '0.180')]
[2024-06-06 14:46:03,133][21850] Updated weights for policy 0, policy_version 28974 (0.0019)
[2024-06-06 14:46:06,092][21867] Worker 16 awakens!
[2024-06-06 14:46:06,101][21617] Heartbeat connected on RolloutWorker_w16
[2024-06-06 14:46:07,624][21850] Updated weights for policy 0, policy_version 28984 (0.0020)
[2024-06-06 14:46:08,055][21617] Fps is (10 sec: 34406.8, 60 sec: 26760.7, 300 sec: 21202.8). Total num frames: 474873856. Throughput: 0: 30923.2. Samples: 1958860. Policy #0 lag: (min: 0.0, avg: 4.9, max: 9.0)
[2024-06-06 14:46:08,056][21617] Avg episode reward: [(0, '0.180')]
[2024-06-06 14:46:10,768][21866] Worker 17 awakens!
[2024-06-06 14:46:10,778][21617] Heartbeat connected on RolloutWorker_w17
[2024-06-06 14:46:11,780][21850] Updated weights for policy 0, policy_version 28994 (0.0019)
[2024-06-06 14:46:13,056][21617] Fps is (10 sec: 36044.6, 60 sec: 28945.0, 300 sec: 22027.3). Total num frames: 475054080. Throughput: 0: 31918.6. Samples: 2071720. Policy #0 lag: (min: 0.0, avg: 4.1, max: 11.0)
[2024-06-06 14:46:13,056][21617] Avg episode reward: [(0, '0.172')]
[2024-06-06 14:46:15,480][21868] Worker 18 awakens!
[2024-06-06 14:46:15,489][21617] Heartbeat connected on RolloutWorker_w18
[2024-06-06 14:46:15,986][21850] Updated weights for policy 0, policy_version 29004 (0.0021)
[2024-06-06 14:46:18,056][21617] Fps is (10 sec: 36044.5, 60 sec: 31129.5, 300 sec: 22765.1). Total num frames: 475234304. Throughput: 0: 33686.5. Samples: 2289480. Policy #0 lag: (min: 0.0, avg: 4.1, max: 11.0)
[2024-06-06 14:46:18,056][21617] Avg episode reward: [(0, '0.171')]
[2024-06-06 14:46:20,168][21869] Worker 19 awakens!
[2024-06-06 14:46:20,179][21617] Heartbeat connected on RolloutWorker_w19
[2024-06-06 14:46:20,742][21850] Updated weights for policy 0, policy_version 29014 (0.0023)
[2024-06-06 14:46:23,056][21617] Fps is (10 sec: 39321.9, 60 sec: 32768.0, 300 sec: 23756.8). Total num frames: 475447296. Throughput: 0: 35559.5. Samples: 2524200. Policy #0 lag: (min: 0.0, avg: 4.1, max: 11.0)
[2024-06-06 14:46:23,056][21617] Avg episode reward: [(0, '0.172')]
[2024-06-06 14:46:24,868][21873] Worker 20 awakens!
[2024-06-06 14:46:24,879][21617] Heartbeat connected on RolloutWorker_w20
[2024-06-06 14:46:25,423][21850] Updated weights for policy 0, policy_version 29024 (0.0030)
[2024-06-06 14:46:28,055][21617] Fps is (10 sec: 40960.4, 60 sec: 34406.4, 300 sec: 24498.0). Total num frames: 475643904. Throughput: 0: 36007.2. Samples: 2641280. Policy #0 lag: (min: 0.0, avg: 8.6, max: 16.0)
[2024-06-06 14:46:28,056][21617] Avg episode reward: [(0, '0.171')]
[2024-06-06 14:46:29,516][21870] Worker 21 awakens!
[2024-06-06 14:46:29,526][21617] Heartbeat connected on RolloutWorker_w21
[2024-06-06 14:46:29,616][21850] Updated weights for policy 0, policy_version 29034 (0.0023)
[2024-06-06 14:46:32,445][21850] Updated weights for policy 0, policy_version 29044 (0.0032)
[2024-06-06 14:46:33,056][21617] Fps is (10 sec: 40959.9, 60 sec: 35771.6, 300 sec: 25320.7). Total num frames: 475856896. Throughput: 0: 36894.7. Samples: 2879360. Policy #0 lag: (min: 0.0, avg: 8.6, max: 16.0)
[2024-06-06 14:46:33,056][21617] Avg episode reward: [(0, '0.179')]
[2024-06-06 14:46:33,068][21829] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000029044_475856896.pth...
[2024-06-06 14:46:33,121][21829] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000028567_468041728.pth
[2024-06-06 14:46:34,208][21871] Worker 22 awakens!
[2024-06-06 14:46:34,220][21617] Heartbeat connected on RolloutWorker_w22
[2024-06-06 14:46:37,214][21850] Updated weights for policy 0, policy_version 29054 (0.0027)
[2024-06-06 14:46:38,056][21617] Fps is (10 sec: 44236.5, 60 sec: 37410.1, 300 sec: 26214.4). Total num frames: 476086272. Throughput: 0: 37616.4. Samples: 3121720. Policy #0 lag: (min: 0.0, avg: 8.6, max: 16.0)
[2024-06-06 14:46:38,056][21617] Avg episode reward: [(0, '0.168')]
[2024-06-06 14:46:38,939][21872] Worker 23 awakens!
[2024-06-06 14:46:38,951][21617] Heartbeat connected on RolloutWorker_w23
[2024-06-06 14:46:41,501][21850] Updated weights for policy 0, policy_version 29064 (0.0028)
[2024-06-06 14:46:43,056][21617] Fps is (10 sec: 40959.9, 60 sec: 37956.2, 300 sec: 26624.0). Total num frames: 476266496. Throughput: 0: 38210.2. Samples: 3254700. Policy #0 lag: (min: 0.0, avg: 8.6, max: 16.0)
[2024-06-06 14:46:43,056][21617] Avg episode reward: [(0, '0.180')]
[2024-06-06 14:46:43,645][21874] Worker 24 awakens!
[2024-06-06 14:46:43,658][21617] Heartbeat connected on RolloutWorker_w24
[2024-06-06 14:46:44,654][21850] Updated weights for policy 0, policy_version 29074 (0.0026)
[2024-06-06 14:46:48,055][21617] Fps is (10 sec: 36045.0, 60 sec: 37956.3, 300 sec: 27000.8). Total num frames: 476446720. Throughput: 0: 39250.8. Samples: 3511280. Policy #0 lag: (min: 0.0, avg: 6.7, max: 17.0)
[2024-06-06 14:46:48,056][21617] Avg episode reward: [(0, '0.177')]
[2024-06-06 14:46:48,214][21876] Worker 25 awakens!
[2024-06-06 14:46:48,227][21617] Heartbeat connected on RolloutWorker_w25
[2024-06-06 14:46:49,031][21850] Updated weights for policy 0, policy_version 29084 (0.0037)
[2024-06-06 14:46:53,002][21875] Worker 26 awakens!
[2024-06-06 14:46:53,014][21617] Heartbeat connected on RolloutWorker_w26
[2024-06-06 14:46:53,019][21850] Updated weights for policy 0, policy_version 29094 (0.0023)
[2024-06-06 14:46:53,056][21617] Fps is (10 sec: 40960.1, 60 sec: 38775.4, 300 sec: 27726.8). Total num frames: 476676096. Throughput: 0: 40165.2. Samples: 3766300. Policy #0 lag: (min: 0.0, avg: 6.7, max: 17.0)
[2024-06-06 14:46:53,056][21617] Avg episode reward: [(0, '0.177')]
[2024-06-06 14:46:56,001][21850] Updated weights for policy 0, policy_version 29104 (0.0024)
[2024-06-06 14:46:57,716][21877] Worker 27 awakens!
[2024-06-06 14:46:57,730][21617] Heartbeat connected on RolloutWorker_w27
[2024-06-06 14:46:58,055][21617] Fps is (10 sec: 44236.9, 60 sec: 39321.7, 300 sec: 28277.6). Total num frames: 476889088. Throughput: 0: 40290.8. Samples: 3884800. Policy #0 lag: (min: 0.0, avg: 6.7, max: 17.0)
[2024-06-06 14:46:58,056][21617] Avg episode reward: [(0, '0.177')]
[2024-06-06 14:47:00,803][21850] Updated weights for policy 0, policy_version 29114 (0.0033)
[2024-06-06 14:47:02,216][21829] Signal inference workers to stop experience collection... (50 times)
[2024-06-06 14:47:02,260][21850] InferenceWorker_p0-w0: stopping experience collection (50 times)
[2024-06-06 14:47:02,266][21829] Signal inference workers to resume experience collection... (50 times)
[2024-06-06 14:47:02,271][21850] InferenceWorker_p0-w0: resuming experience collection (50 times)
[2024-06-06 14:47:02,349][21880] Worker 28 awakens!
[2024-06-06 14:47:02,362][21617] Heartbeat connected on RolloutWorker_w28
[2024-06-06 14:47:03,056][21617] Fps is (10 sec: 47513.4, 60 sec: 40960.0, 300 sec: 29140.1). Total num frames: 477151232. Throughput: 0: 41406.6. Samples: 4152780. Policy #0 lag: (min: 0.0, avg: 6.7, max: 17.0)
[2024-06-06 14:47:03,056][21617] Avg episode reward: [(0, '0.167')]
[2024-06-06 14:47:04,163][21850] Updated weights for policy 0, policy_version 29124 (0.0029)
[2024-06-06 14:47:07,104][21878] Worker 29 awakens!
[2024-06-06 14:47:07,117][21617] Heartbeat connected on RolloutWorker_w29
[2024-06-06 14:47:08,055][21617] Fps is (10 sec: 42598.2, 60 sec: 40686.9, 300 sec: 29265.2). Total num frames: 477315072. Throughput: 0: 42024.5. Samples: 4415300. Policy #0 lag: (min: 0.0, avg: 8.4, max: 19.0)
[2024-06-06 14:47:08,056][21617] Avg episode reward: [(0, '0.184')]
[2024-06-06 14:47:08,166][21850] Updated weights for policy 0, policy_version 29134 (0.0038)
[2024-06-06 14:47:11,653][21850] Updated weights for policy 0, policy_version 29144 (0.0023)
[2024-06-06 14:47:11,731][21879] Worker 30 awakens!
[2024-06-06 14:47:11,744][21617] Heartbeat connected on RolloutWorker_w30
[2024-06-06 14:47:13,056][21617] Fps is (10 sec: 37683.3, 60 sec: 41233.1, 300 sec: 29709.6). Total num frames: 477528064. Throughput: 0: 42080.3. Samples: 4534900. Policy #0 lag: (min: 0.0, avg: 8.4, max: 19.0)
[2024-06-06 14:47:13,056][21617] Avg episode reward: [(0, '0.184')]
[2024-06-06 14:47:15,657][21850] Updated weights for policy 0, policy_version 29154 (0.0032)
[2024-06-06 14:47:16,353][21881] Worker 31 awakens!
[2024-06-06 14:47:16,365][21617] Heartbeat connected on RolloutWorker_w31
[2024-06-06 14:47:18,060][21617] Fps is (10 sec: 47493.9, 60 sec: 42595.5, 300 sec: 30441.7). Total num frames: 477790208. Throughput: 0: 42719.7. Samples: 4801920. Policy #0 lag: (min: 0.0, avg: 8.4, max: 19.0)
[2024-06-06 14:47:18,060][21617] Avg episode reward: [(0, '0.165')]
[2024-06-06 14:47:19,098][21850] Updated weights for policy 0, policy_version 29164 (0.0033)
[2024-06-06 14:47:22,868][21850] Updated weights for policy 0, policy_version 29174 (0.0046)
[2024-06-06 14:47:23,055][21617] Fps is (10 sec: 47514.2, 60 sec: 42598.5, 300 sec: 30822.4). Total num frames: 478003200. Throughput: 0: 43716.1. Samples: 5088940. Policy #0 lag: (min: 0.0, avg: 8.4, max: 19.0)
[2024-06-06 14:47:23,056][21617] Avg episode reward: [(0, '0.180')]
[2024-06-06 14:47:26,102][21850] Updated weights for policy 0, policy_version 29184 (0.0034)
[2024-06-06 14:47:28,056][21617] Fps is (10 sec: 40976.6, 60 sec: 42598.3, 300 sec: 31079.9). Total num frames: 478199808. Throughput: 0: 43660.0. Samples: 5219400. Policy #0 lag: (min: 0.0, avg: 9.6, max: 22.0)
[2024-06-06 14:47:28,056][21617] Avg episode reward: [(0, '0.168')]
[2024-06-06 14:47:30,212][21850] Updated weights for policy 0, policy_version 29194 (0.0026)
[2024-06-06 14:47:33,055][21617] Fps is (10 sec: 44236.5, 60 sec: 43144.6, 300 sec: 31611.5). Total num frames: 478445568. Throughput: 0: 43753.7. Samples: 5480200. Policy #0 lag: (min: 0.0, avg: 9.6, max: 22.0)
[2024-06-06 14:47:33,056][21617] Avg episode reward: [(0, '0.169')]
[2024-06-06 14:47:33,482][21850] Updated weights for policy 0, policy_version 29204 (0.0043)
[2024-06-06 14:47:37,507][21850] Updated weights for policy 0, policy_version 29214 (0.0033)
[2024-06-06 14:47:38,056][21617] Fps is (10 sec: 49150.3, 60 sec: 43417.3, 300 sec: 32112.6). Total num frames: 478691328. Throughput: 0: 44213.0. Samples: 5755900. Policy #0 lag: (min: 0.0, avg: 9.6, max: 22.0)
[2024-06-06 14:47:38,056][21617] Avg episode reward: [(0, '0.179')]
[2024-06-06 14:47:40,828][21850] Updated weights for policy 0, policy_version 29224 (0.0034)
[2024-06-06 14:47:43,056][21617] Fps is (10 sec: 42598.1, 60 sec: 43417.6, 300 sec: 32221.8). Total num frames: 478871552. Throughput: 0: 44559.8. Samples: 5890000. Policy #0 lag: (min: 0.0, avg: 9.6, max: 22.0)
[2024-06-06 14:47:43,056][21617] Avg episode reward: [(0, '0.176')]
[2024-06-06 14:47:44,771][21850] Updated weights for policy 0, policy_version 29234 (0.0032)
[2024-06-06 14:47:47,913][21850] Updated weights for policy 0, policy_version 29244 (0.0040)
[2024-06-06 14:47:48,055][21617] Fps is (10 sec: 44239.1, 60 sec: 44783.0, 300 sec: 32768.0). Total num frames: 479133696. Throughput: 0: 44549.5. Samples: 6157500. Policy #0 lag: (min: 0.0, avg: 10.7, max: 24.0)
[2024-06-06 14:47:48,056][21617] Avg episode reward: [(0, '0.179')]
[2024-06-06 14:47:52,018][21850] Updated weights for policy 0, policy_version 29254 (0.0034)
[2024-06-06 14:47:53,055][21617] Fps is (10 sec: 50790.9, 60 sec: 45056.1, 300 sec: 33199.2). Total num frames: 479379456. Throughput: 0: 44705.8. Samples: 6427060. Policy #0 lag: (min: 0.0, avg: 10.7, max: 24.0)
[2024-06-06 14:47:53,056][21617] Avg episode reward: [(0, '0.188')]
[2024-06-06 14:47:55,056][21850] Updated weights for policy 0, policy_version 29264 (0.0034)
[2024-06-06 14:47:58,055][21617] Fps is (10 sec: 40959.7, 60 sec: 44236.8, 300 sec: 33188.1). Total num frames: 479543296. Throughput: 0: 45201.9. Samples: 6568980. Policy #0 lag: (min: 0.0, avg: 10.7, max: 24.0)
[2024-06-06 14:47:58,056][21617] Avg episode reward: [(0, '0.173')]
[2024-06-06 14:47:59,053][21850] Updated weights for policy 0, policy_version 29274 (0.0039)
[2024-06-06 14:48:02,415][21850] Updated weights for policy 0, policy_version 29284 (0.0033)
[2024-06-06 14:48:03,056][21617] Fps is (10 sec: 42598.0, 60 sec: 44236.8, 300 sec: 33669.1). Total num frames: 479805440. Throughput: 0: 45150.3. Samples: 6833500. Policy #0 lag: (min: 0.0, avg: 10.7, max: 24.0)
[2024-06-06 14:48:03,056][21617] Avg episode reward: [(0, '0.173')]
[2024-06-06 14:48:06,562][21850] Updated weights for policy 0, policy_version 29294 (0.0031)
[2024-06-06 14:48:08,060][21617] Fps is (10 sec: 49130.0, 60 sec: 45325.7, 300 sec: 33966.1). Total num frames: 480034816. Throughput: 0: 44731.5. Samples: 7102060. Policy #0 lag: (min: 0.0, avg: 9.2, max: 22.0)
[2024-06-06 14:48:08,061][21617] Avg episode reward: [(0, '0.175')]
[2024-06-06 14:48:08,216][21829] Signal inference workers to stop experience collection... (100 times)
[2024-06-06 14:48:08,217][21829] Signal inference workers to resume experience collection... (100 times)
[2024-06-06 14:48:08,260][21850] InferenceWorker_p0-w0: stopping experience collection (100 times)
[2024-06-06 14:48:08,260][21850] InferenceWorker_p0-w0: resuming experience collection (100 times)
[2024-06-06 14:48:09,703][21850] Updated weights for policy 0, policy_version 29304 (0.0040)
[2024-06-06 14:48:13,056][21617] Fps is (10 sec: 42598.8, 60 sec: 45056.0, 300 sec: 34094.3). Total num frames: 480231424. Throughput: 0: 44958.7. Samples: 7242540. Policy #0 lag: (min: 0.0, avg: 9.2, max: 22.0)
[2024-06-06 14:48:13,056][21617] Avg episode reward: [(0, '0.180')]
[2024-06-06 14:48:13,709][21850] Updated weights for policy 0, policy_version 29314 (0.0040)
[2024-06-06 14:48:16,902][21850] Updated weights for policy 0, policy_version 29324 (0.0039)
[2024-06-06 14:48:18,055][21617] Fps is (10 sec: 42617.7, 60 sec: 44513.0, 300 sec: 34368.3). Total num frames: 480460800. Throughput: 0: 45061.4. Samples: 7507960. Policy #0 lag: (min: 0.0, avg: 9.2, max: 22.0)
[2024-06-06 14:48:18,056][21617] Avg episode reward: [(0, '0.175')]
[2024-06-06 14:48:21,013][21850] Updated weights for policy 0, policy_version 29334 (0.0047)
[2024-06-06 14:48:23,055][21617] Fps is (10 sec: 47514.0, 60 sec: 45056.0, 300 sec: 34704.3). Total num frames: 480706560. Throughput: 0: 44918.7. Samples: 7777220. Policy #0 lag: (min: 0.0, avg: 9.2, max: 22.0)
[2024-06-06 14:48:23,056][21617] Avg episode reward: [(0, '0.177')]
[2024-06-06 14:48:24,048][21850] Updated weights for policy 0, policy_version 29344 (0.0025)
[2024-06-06 14:48:28,055][21617] Fps is (10 sec: 44236.9, 60 sec: 45056.1, 300 sec: 34806.9). Total num frames: 480903168. Throughput: 0: 44969.5. Samples: 7913620. Policy #0 lag: (min: 0.0, avg: 11.1, max: 22.0)
[2024-06-06 14:48:28,056][21617] Avg episode reward: [(0, '0.179')]
[2024-06-06 14:48:28,491][21850] Updated weights for policy 0, policy_version 29354 (0.0032)
[2024-06-06 14:48:31,553][21850] Updated weights for policy 0, policy_version 29364 (0.0036)
[2024-06-06 14:48:33,056][21617] Fps is (10 sec: 44236.2, 60 sec: 45056.0, 300 sec: 35118.7). Total num frames: 481148928. Throughput: 0: 44978.5. Samples: 8181540. Policy #0 lag: (min: 0.0, avg: 11.1, max: 22.0)
[2024-06-06 14:48:33,056][21617] Avg episode reward: [(0, '0.177')]
[2024-06-06 14:48:33,063][21829] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000029367_481148928.pth...
[2024-06-06 14:48:33,122][21829] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000028874_473071616.pth
[2024-06-06 14:48:35,630][21850] Updated weights for policy 0, policy_version 29374 (0.0037)
[2024-06-06 14:48:38,056][21617] Fps is (10 sec: 47510.9, 60 sec: 44782.9, 300 sec: 35347.5). Total num frames: 481378304. Throughput: 0: 44938.6. Samples: 8449320. Policy #0 lag: (min: 0.0, avg: 11.1, max: 22.0)
[2024-06-06 14:48:38,057][21617] Avg episode reward: [(0, '0.168')]
[2024-06-06 14:48:38,738][21850] Updated weights for policy 0, policy_version 29384 (0.0045)
[2024-06-06 14:48:42,747][21850] Updated weights for policy 0, policy_version 29394 (0.0038)
[2024-06-06 14:48:43,057][21617] Fps is (10 sec: 45869.8, 60 sec: 45601.3, 300 sec: 35566.7). Total num frames: 481607680. Throughput: 0: 44991.2. Samples: 8593640. Policy #0 lag: (min: 0.0, avg: 9.4, max: 22.0)
[2024-06-06 14:48:43,057][21617] Avg episode reward: [(0, '0.186')]
[2024-06-06 14:48:45,870][21850] Updated weights for policy 0, policy_version 29404 (0.0041)
[2024-06-06 14:48:48,055][21617] Fps is (10 sec: 42600.8, 60 sec: 44509.9, 300 sec: 35643.6). Total num frames: 481804288. Throughput: 0: 45122.0. Samples: 8863980. Policy #0 lag: (min: 0.0, avg: 9.4, max: 22.0)
[2024-06-06 14:48:48,056][21617] Avg episode reward: [(0, '0.175')]
[2024-06-06 14:48:50,081][21850] Updated weights for policy 0, policy_version 29414 (0.0033)
[2024-06-06 14:48:53,047][21850] Updated weights for policy 0, policy_version 29424 (0.0033)
[2024-06-06 14:48:53,055][21617] Fps is (10 sec: 47519.7, 60 sec: 45056.0, 300 sec: 36044.8). Total num frames: 482082816. Throughput: 0: 45413.0. Samples: 9145440. Policy #0 lag: (min: 0.0, avg: 9.4, max: 22.0)
[2024-06-06 14:48:53,056][21617] Avg episode reward: [(0, '0.176')]
[2024-06-06 14:48:57,279][21850] Updated weights for policy 0, policy_version 29434 (0.0022)
[2024-06-06 14:48:58,056][21617] Fps is (10 sec: 49148.9, 60 sec: 45874.8, 300 sec: 36173.2). Total num frames: 482295808. Throughput: 0: 45239.9. Samples: 9278360. Policy #0 lag: (min: 0.0, avg: 9.4, max: 22.0)
[2024-06-06 14:48:58,057][21617] Avg episode reward: [(0, '0.177')]
[2024-06-06 14:49:00,351][21850] Updated weights for policy 0, policy_version 29444 (0.0036)
[2024-06-06 14:49:03,055][21617] Fps is (10 sec: 40960.3, 60 sec: 44783.1, 300 sec: 36233.9). Total num frames: 482492416. Throughput: 0: 45268.9. Samples: 9545060. Policy #0 lag: (min: 0.0, avg: 9.2, max: 21.0)
[2024-06-06 14:49:03,056][21617] Avg episode reward: [(0, '0.183')]
[2024-06-06 14:49:04,701][21850] Updated weights for policy 0, policy_version 29454 (0.0036)
[2024-06-06 14:49:07,545][21850] Updated weights for policy 0, policy_version 29464 (0.0037)
[2024-06-06 14:49:08,055][21617] Fps is (10 sec: 44239.9, 60 sec: 45059.5, 300 sec: 36477.6). Total num frames: 482738176. Throughput: 0: 45133.4. Samples: 9808220. Policy #0 lag: (min: 0.0, avg: 9.2, max: 21.0)
[2024-06-06 14:49:08,056][21617] Avg episode reward: [(0, '0.171')]
[2024-06-06 14:49:11,871][21850] Updated weights for policy 0, policy_version 29474 (0.0036)
[2024-06-06 14:49:13,055][21617] Fps is (10 sec: 47513.4, 60 sec: 45602.2, 300 sec: 36651.6). Total num frames: 482967552. Throughput: 0: 45308.0. Samples: 9952480. Policy #0 lag: (min: 0.0, avg: 9.2, max: 21.0)
[2024-06-06 14:49:13,056][21617] Avg episode reward: [(0, '0.178')]
[2024-06-06 14:49:15,029][21850] Updated weights for policy 0, policy_version 29484 (0.0030)
[2024-06-06 14:49:18,055][21617] Fps is (10 sec: 42598.1, 60 sec: 45056.0, 300 sec: 36700.2). Total num frames: 483164160. Throughput: 0: 45299.3. Samples: 10220000. Policy #0 lag: (min: 0.0, avg: 9.2, max: 21.0)
[2024-06-06 14:49:18,056][21617] Avg episode reward: [(0, '0.178')]
[2024-06-06 14:49:19,246][21850] Updated weights for policy 0, policy_version 29494 (0.0046)
[2024-06-06 14:49:22,142][21850] Updated weights for policy 0, policy_version 29504 (0.0025)
[2024-06-06 14:49:23,055][21617] Fps is (10 sec: 44237.0, 60 sec: 45056.0, 300 sec: 36922.5). Total num frames: 483409920. Throughput: 0: 45299.7. Samples: 10487780. Policy #0 lag: (min: 1.0, avg: 10.9, max: 21.0)
[2024-06-06 14:49:23,056][21617] Avg episode reward: [(0, '0.183')]
[2024-06-06 14:49:26,585][21850] Updated weights for policy 0, policy_version 29514 (0.0044)
[2024-06-06 14:49:28,055][21617] Fps is (10 sec: 49151.9, 60 sec: 45875.2, 300 sec: 37137.1). Total num frames: 483655680. Throughput: 0: 45273.3. Samples: 10630880. Policy #0 lag: (min: 1.0, avg: 10.9, max: 21.0)
[2024-06-06 14:49:28,056][21617] Avg episode reward: [(0, '0.174')]
[2024-06-06 14:49:29,771][21850] Updated weights for policy 0, policy_version 29524 (0.0032)
[2024-06-06 14:49:33,056][21617] Fps is (10 sec: 42597.7, 60 sec: 44782.9, 300 sec: 37118.2). Total num frames: 483835904. Throughput: 0: 44978.1. Samples: 10888000. Policy #0 lag: (min: 1.0, avg: 10.9, max: 21.0)
[2024-06-06 14:49:33,056][21617] Avg episode reward: [(0, '0.175')]
[2024-06-06 14:49:33,995][21850] Updated weights for policy 0, policy_version 29534 (0.0042)
[2024-06-06 14:49:34,657][21829] Signal inference workers to stop experience collection... (150 times)
[2024-06-06 14:49:34,658][21829] Signal inference workers to resume experience collection... (150 times)
[2024-06-06 14:49:34,700][21850] InferenceWorker_p0-w0: stopping experience collection (150 times)
[2024-06-06 14:49:34,700][21850] InferenceWorker_p0-w0: resuming experience collection (150 times)
[2024-06-06 14:49:37,088][21850] Updated weights for policy 0, policy_version 29544 (0.0034)
[2024-06-06 14:49:38,055][21617] Fps is (10 sec: 44236.9, 60 sec: 45329.5, 300 sec: 37377.7). Total num frames: 484098048. Throughput: 0: 44776.9. Samples: 11160400. Policy #0 lag: (min: 1.0, avg: 10.9, max: 21.0)
[2024-06-06 14:49:38,064][21617] Avg episode reward: [(0, '0.177')]
[2024-06-06 14:49:41,067][21850] Updated weights for policy 0, policy_version 29554 (0.0025)
[2024-06-06 14:49:43,055][21617] Fps is (10 sec: 47514.2, 60 sec: 45057.0, 300 sec: 38099.8). Total num frames: 484311040. Throughput: 0: 44965.5. Samples: 11301780. Policy #0 lag: (min: 0.0, avg: 8.7, max: 20.0)
[2024-06-06 14:49:43,056][21617] Avg episode reward: [(0, '0.173')]
[2024-06-06 14:49:44,099][21850] Updated weights for policy 0, policy_version 29564 (0.0037)
[2024-06-06 14:49:48,055][21617] Fps is (10 sec: 42598.1, 60 sec: 45329.0, 300 sec: 38266.4). Total num frames: 484524032. Throughput: 0: 44869.2. Samples: 11564180. Policy #0 lag: (min: 0.0, avg: 8.7, max: 20.0)
[2024-06-06 14:49:48,056][21617] Avg episode reward: [(0, '0.173')]
[2024-06-06 14:49:48,162][21850] Updated weights for policy 0, policy_version 29574 (0.0026)
[2024-06-06 14:49:51,485][21850] Updated weights for policy 0, policy_version 29584 (0.0034)
[2024-06-06 14:49:53,055][21617] Fps is (10 sec: 44236.9, 60 sec: 44509.9, 300 sec: 39043.9). Total num frames: 484753408. Throughput: 0: 44910.6. Samples: 11829200. Policy #0 lag: (min: 0.0, avg: 8.7, max: 20.0)
[2024-06-06 14:49:53,056][21617] Avg episode reward: [(0, '0.183')]
[2024-06-06 14:49:55,770][21850] Updated weights for policy 0, policy_version 29594 (0.0038)
[2024-06-06 14:49:58,055][21617] Fps is (10 sec: 44237.3, 60 sec: 44510.4, 300 sec: 39710.4). Total num frames: 484966400. Throughput: 0: 44812.5. Samples: 11969040. Policy #0 lag: (min: 0.0, avg: 8.7, max: 20.0)
[2024-06-06 14:49:58,056][21617] Avg episode reward: [(0, '0.179')]
[2024-06-06 14:49:59,107][21850] Updated weights for policy 0, policy_version 29604 (0.0030)
[2024-06-06 14:50:03,055][21617] Fps is (10 sec: 40959.9, 60 sec: 44509.8, 300 sec: 40321.4). Total num frames: 485163008. Throughput: 0: 44741.3. Samples: 12233360. Policy #0 lag: (min: 1.0, avg: 11.3, max: 21.0)
[2024-06-06 14:50:03,056][21617] Avg episode reward: [(0, '0.178')]
[2024-06-06 14:50:03,383][21850] Updated weights for policy 0, policy_version 29614 (0.0031)
[2024-06-06 14:50:06,286][21850] Updated weights for policy 0, policy_version 29624 (0.0037)
[2024-06-06 14:50:08,055][21617] Fps is (10 sec: 45874.5, 60 sec: 44782.8, 300 sec: 41043.3). Total num frames: 485425152. Throughput: 0: 44835.0. Samples: 12505360. Policy #0 lag: (min: 1.0, avg: 11.3, max: 21.0)
[2024-06-06 14:50:08,056][21617] Avg episode reward: [(0, '0.179')]
[2024-06-06 14:50:10,382][21850] Updated weights for policy 0, policy_version 29634 (0.0022)
[2024-06-06 14:50:13,055][21617] Fps is (10 sec: 50790.4, 60 sec: 45056.0, 300 sec: 41709.8). Total num frames: 485670912. Throughput: 0: 44705.4. Samples: 12642620. Policy #0 lag: (min: 1.0, avg: 11.3, max: 21.0)
[2024-06-06 14:50:13,056][21617] Avg episode reward: [(0, '0.174')]
[2024-06-06 14:50:13,121][21850] Updated weights for policy 0, policy_version 29644 (0.0041)
[2024-06-06 14:50:17,360][21850] Updated weights for policy 0, policy_version 29654 (0.0025)
[2024-06-06 14:50:18,055][21617] Fps is (10 sec: 44237.1, 60 sec: 45056.0, 300 sec: 41987.5). Total num frames: 485867520. Throughput: 0: 45004.1. Samples: 12913180. Policy #0 lag: (min: 1.0, avg: 11.3, max: 21.0)
[2024-06-06 14:50:18,056][21617] Avg episode reward: [(0, '0.182')]
[2024-06-06 14:50:20,495][21850] Updated weights for policy 0, policy_version 29664 (0.0038)
[2024-06-06 14:50:23,055][21617] Fps is (10 sec: 42598.4, 60 sec: 44782.9, 300 sec: 42431.8). Total num frames: 486096896. Throughput: 0: 44875.6. Samples: 13179800. Policy #0 lag: (min: 0.0, avg: 11.6, max: 22.0)
[2024-06-06 14:50:23,056][21617] Avg episode reward: [(0, '0.179')]
[2024-06-06 14:50:24,771][21850] Updated weights for policy 0, policy_version 29674 (0.0034)
[2024-06-06 14:50:28,008][21850] Updated weights for policy 0, policy_version 29684 (0.0035)
[2024-06-06 14:50:28,060][21617] Fps is (10 sec: 47492.1, 60 sec: 44779.6, 300 sec: 42819.9). Total num frames: 486342656. Throughput: 0: 44760.8. Samples: 13316220. Policy #0 lag: (min: 0.0, avg: 11.6, max: 22.0)
[2024-06-06 14:50:28,061][21617] Avg episode reward: [(0, '0.179')]
[2024-06-06 14:50:32,271][21850] Updated weights for policy 0, policy_version 29694 (0.0040)
[2024-06-06 14:50:33,056][21617] Fps is (10 sec: 45874.7, 60 sec: 45329.1, 300 sec: 43098.2). Total num frames: 486555648. Throughput: 0: 44894.2. Samples: 13584420. Policy #0 lag: (min: 0.0, avg: 11.6, max: 22.0)
[2024-06-06 14:50:33,057][21617] Avg episode reward: [(0, '0.184')]
[2024-06-06 14:50:33,074][21829] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000029697_486555648.pth...
[2024-06-06 14:50:33,121][21829] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000029044_475856896.pth
[2024-06-06 14:50:35,152][21850] Updated weights for policy 0, policy_version 29704 (0.0041)
[2024-06-06 14:50:38,056][21617] Fps is (10 sec: 42617.2, 60 sec: 44509.8, 300 sec: 43320.4). Total num frames: 486768640. Throughput: 0: 44995.0. Samples: 13853980.
Policy #0 lag: (min: 0.0, avg: 11.6, max: 22.0) [2024-06-06 14:50:38,056][21617] Avg episode reward: [(0, '0.172')] [2024-06-06 14:50:39,310][21850] Updated weights for policy 0, policy_version 29714 (0.0034) [2024-06-06 14:50:42,165][21850] Updated weights for policy 0, policy_version 29724 (0.0036) [2024-06-06 14:50:43,055][21617] Fps is (10 sec: 44236.9, 60 sec: 44782.9, 300 sec: 43487.0). Total num frames: 486998016. Throughput: 0: 44887.9. Samples: 13989000. Policy #0 lag: (min: 0.0, avg: 9.3, max: 20.0) [2024-06-06 14:50:43,056][21617] Avg episode reward: [(0, '0.181')] [2024-06-06 14:50:46,418][21850] Updated weights for policy 0, policy_version 29734 (0.0032) [2024-06-06 14:50:48,055][21617] Fps is (10 sec: 44237.1, 60 sec: 44782.9, 300 sec: 43598.1). Total num frames: 487211008. Throughput: 0: 44905.7. Samples: 14254120. Policy #0 lag: (min: 0.0, avg: 9.3, max: 20.0) [2024-06-06 14:50:48,056][21617] Avg episode reward: [(0, '0.170')] [2024-06-06 14:50:49,899][21850] Updated weights for policy 0, policy_version 29744 (0.0039) [2024-06-06 14:50:50,737][21829] Signal inference workers to stop experience collection... (200 times) [2024-06-06 14:50:50,780][21850] InferenceWorker_p0-w0: stopping experience collection (200 times) [2024-06-06 14:50:50,790][21829] Signal inference workers to resume experience collection... (200 times) [2024-06-06 14:50:50,794][21850] InferenceWorker_p0-w0: resuming experience collection (200 times) [2024-06-06 14:50:53,056][21617] Fps is (10 sec: 42598.2, 60 sec: 44509.8, 300 sec: 43709.2). Total num frames: 487424000. Throughput: 0: 44917.3. Samples: 14526640. 
Policy #0 lag: (min: 0.0, avg: 9.3, max: 20.0) [2024-06-06 14:50:53,056][21617] Avg episode reward: [(0, '0.179')] [2024-06-06 14:50:53,885][21850] Updated weights for policy 0, policy_version 29754 (0.0040) [2024-06-06 14:50:57,137][21850] Updated weights for policy 0, policy_version 29764 (0.0035) [2024-06-06 14:50:58,056][21617] Fps is (10 sec: 45875.0, 60 sec: 45055.9, 300 sec: 43986.9). Total num frames: 487669760. Throughput: 0: 44791.4. Samples: 14658240. Policy #0 lag: (min: 0.0, avg: 9.3, max: 20.0) [2024-06-06 14:50:58,056][21617] Avg episode reward: [(0, '0.181')] [2024-06-06 14:51:01,390][21850] Updated weights for policy 0, policy_version 29774 (0.0035) [2024-06-06 14:51:03,056][21617] Fps is (10 sec: 47513.3, 60 sec: 45602.0, 300 sec: 44153.5). Total num frames: 487899136. Throughput: 0: 44834.9. Samples: 14930760. Policy #0 lag: (min: 0.0, avg: 10.0, max: 20.0) [2024-06-06 14:51:03,056][21617] Avg episode reward: [(0, '0.188')] [2024-06-06 14:51:04,114][21850] Updated weights for policy 0, policy_version 29784 (0.0024) [2024-06-06 14:51:08,055][21617] Fps is (10 sec: 44237.3, 60 sec: 44783.0, 300 sec: 44264.6). Total num frames: 488112128. Throughput: 0: 45056.9. Samples: 15207360. Policy #0 lag: (min: 0.0, avg: 10.0, max: 20.0) [2024-06-06 14:51:08,056][21617] Avg episode reward: [(0, '0.178')] [2024-06-06 14:51:08,434][21850] Updated weights for policy 0, policy_version 29794 (0.0021) [2024-06-06 14:51:11,483][21850] Updated weights for policy 0, policy_version 29804 (0.0038) [2024-06-06 14:51:13,055][21617] Fps is (10 sec: 42599.3, 60 sec: 44236.8, 300 sec: 44375.7). Total num frames: 488325120. Throughput: 0: 44877.9. Samples: 15335520. Policy #0 lag: (min: 0.0, avg: 10.0, max: 20.0) [2024-06-06 14:51:13,056][21617] Avg episode reward: [(0, '0.177')] [2024-06-06 14:51:15,502][21850] Updated weights for policy 0, policy_version 29814 (0.0031) [2024-06-06 14:51:18,056][21617] Fps is (10 sec: 45872.8, 60 sec: 45055.6, 300 sec: 44486.7). 
Total num frames: 488570880. Throughput: 0: 44961.4. Samples: 15607700. Policy #0 lag: (min: 0.0, avg: 10.0, max: 20.0) [2024-06-06 14:51:18,057][21617] Avg episode reward: [(0, '0.185')] [2024-06-06 14:51:19,058][21850] Updated weights for policy 0, policy_version 29824 (0.0034) [2024-06-06 14:51:22,992][21850] Updated weights for policy 0, policy_version 29834 (0.0034) [2024-06-06 14:51:23,056][21617] Fps is (10 sec: 47513.0, 60 sec: 45055.9, 300 sec: 44597.8). Total num frames: 488800256. Throughput: 0: 44975.1. Samples: 15877860. Policy #0 lag: (min: 0.0, avg: 10.5, max: 20.0) [2024-06-06 14:51:23,056][21617] Avg episode reward: [(0, '0.183')] [2024-06-06 14:51:26,419][21850] Updated weights for policy 0, policy_version 29844 (0.0034) [2024-06-06 14:51:28,055][21617] Fps is (10 sec: 44239.0, 60 sec: 44513.2, 300 sec: 44597.8). Total num frames: 489013248. Throughput: 0: 44920.1. Samples: 16010400. Policy #0 lag: (min: 0.0, avg: 10.5, max: 20.0) [2024-06-06 14:51:28,056][21617] Avg episode reward: [(0, '0.186')] [2024-06-06 14:51:30,557][21850] Updated weights for policy 0, policy_version 29854 (0.0044) [2024-06-06 14:51:33,055][21617] Fps is (10 sec: 45875.4, 60 sec: 45056.0, 300 sec: 44653.3). Total num frames: 489259008. Throughput: 0: 45017.3. Samples: 16279900. Policy #0 lag: (min: 0.0, avg: 10.5, max: 20.0) [2024-06-06 14:51:33,056][21617] Avg episode reward: [(0, '0.176')] [2024-06-06 14:51:33,479][21850] Updated weights for policy 0, policy_version 29864 (0.0033) [2024-06-06 14:51:37,734][21850] Updated weights for policy 0, policy_version 29874 (0.0037) [2024-06-06 14:51:38,055][21617] Fps is (10 sec: 44236.8, 60 sec: 44783.0, 300 sec: 44708.9). Total num frames: 489455616. Throughput: 0: 44885.9. Samples: 16546500. 
Policy #0 lag: (min: 0.0, avg: 10.5, max: 20.0) [2024-06-06 14:51:38,056][21617] Avg episode reward: [(0, '0.169')] [2024-06-06 14:51:40,977][21850] Updated weights for policy 0, policy_version 29884 (0.0031) [2024-06-06 14:51:43,055][21617] Fps is (10 sec: 42598.4, 60 sec: 44782.9, 300 sec: 44875.5). Total num frames: 489684992. Throughput: 0: 44874.3. Samples: 16677580. Policy #0 lag: (min: 0.0, avg: 10.8, max: 21.0) [2024-06-06 14:51:43,056][21617] Avg episode reward: [(0, '0.188')] [2024-06-06 14:51:44,789][21850] Updated weights for policy 0, policy_version 29894 (0.0035) [2024-06-06 14:51:48,055][21617] Fps is (10 sec: 45875.2, 60 sec: 45056.0, 300 sec: 44875.5). Total num frames: 489914368. Throughput: 0: 44761.0. Samples: 16945000. Policy #0 lag: (min: 0.0, avg: 10.8, max: 21.0) [2024-06-06 14:51:48,056][21617] Avg episode reward: [(0, '0.180')] [2024-06-06 14:51:48,524][21850] Updated weights for policy 0, policy_version 29904 (0.0037) [2024-06-06 14:51:52,362][21850] Updated weights for policy 0, policy_version 29914 (0.0032) [2024-06-06 14:51:53,055][21617] Fps is (10 sec: 45875.6, 60 sec: 45329.2, 300 sec: 44931.0). Total num frames: 490143744. Throughput: 0: 44678.7. Samples: 17217900. Policy #0 lag: (min: 0.0, avg: 10.8, max: 21.0) [2024-06-06 14:51:53,056][21617] Avg episode reward: [(0, '0.180')] [2024-06-06 14:51:55,695][21850] Updated weights for policy 0, policy_version 29924 (0.0031) [2024-06-06 14:51:58,055][21617] Fps is (10 sec: 42598.6, 60 sec: 44510.0, 300 sec: 44708.9). Total num frames: 490340352. Throughput: 0: 44817.3. Samples: 17352300. Policy #0 lag: (min: 0.0, avg: 10.8, max: 21.0) [2024-06-06 14:51:58,056][21617] Avg episode reward: [(0, '0.180')] [2024-06-06 14:51:59,860][21850] Updated weights for policy 0, policy_version 29934 (0.0038) [2024-06-06 14:52:02,882][21850] Updated weights for policy 0, policy_version 29944 (0.0035) [2024-06-06 14:52:03,060][21617] Fps is (10 sec: 45854.6, 60 sec: 45052.8, 300 sec: 45041.4). 
Total num frames: 490602496. Throughput: 0: 44642.3. Samples: 17616780. Policy #0 lag: (min: 0.0, avg: 10.8, max: 22.0) [2024-06-06 14:52:03,060][21617] Avg episode reward: [(0, '0.175')] [2024-06-06 14:52:07,031][21850] Updated weights for policy 0, policy_version 29954 (0.0034) [2024-06-06 14:52:08,056][21617] Fps is (10 sec: 47513.0, 60 sec: 45055.9, 300 sec: 45042.1). Total num frames: 490815488. Throughput: 0: 44696.0. Samples: 17889180. Policy #0 lag: (min: 0.0, avg: 10.8, max: 22.0) [2024-06-06 14:52:08,056][21617] Avg episode reward: [(0, '0.190')] [2024-06-06 14:52:10,165][21850] Updated weights for policy 0, policy_version 29964 (0.0039) [2024-06-06 14:52:11,572][21829] Signal inference workers to stop experience collection... (250 times) [2024-06-06 14:52:11,618][21850] InferenceWorker_p0-w0: stopping experience collection (250 times) [2024-06-06 14:52:11,685][21829] Signal inference workers to resume experience collection... (250 times) [2024-06-06 14:52:11,685][21850] InferenceWorker_p0-w0: resuming experience collection (250 times) [2024-06-06 14:52:13,060][21617] Fps is (10 sec: 40959.9, 60 sec: 44779.6, 300 sec: 44819.9). Total num frames: 491012096. Throughput: 0: 44664.4. Samples: 18020500. Policy #0 lag: (min: 0.0, avg: 10.8, max: 22.0) [2024-06-06 14:52:13,060][21617] Avg episode reward: [(0, '0.179')] [2024-06-06 14:52:14,476][21850] Updated weights for policy 0, policy_version 29974 (0.0037) [2024-06-06 14:52:17,782][21850] Updated weights for policy 0, policy_version 29984 (0.0045) [2024-06-06 14:52:18,055][21617] Fps is (10 sec: 44237.1, 60 sec: 44783.3, 300 sec: 44931.0). Total num frames: 491257856. Throughput: 0: 44632.5. Samples: 18288360. Policy #0 lag: (min: 0.0, avg: 10.8, max: 22.0) [2024-06-06 14:52:18,056][21617] Avg episode reward: [(0, '0.181')] [2024-06-06 14:52:21,795][21850] Updated weights for policy 0, policy_version 29994 (0.0026) [2024-06-06 14:52:23,055][21617] Fps is (10 sec: 47535.3, 60 sec: 44783.1, 300 sec: 45042.1). 
Total num frames: 491487232. Throughput: 0: 44680.1. Samples: 18557100. Policy #0 lag: (min: 0.0, avg: 10.4, max: 22.0) [2024-06-06 14:52:23,056][21617] Avg episode reward: [(0, '0.178')] [2024-06-06 14:52:24,917][21850] Updated weights for policy 0, policy_version 30004 (0.0030) [2024-06-06 14:52:28,055][21617] Fps is (10 sec: 42598.5, 60 sec: 44509.9, 300 sec: 44875.5). Total num frames: 491683840. Throughput: 0: 44836.9. Samples: 18695240. Policy #0 lag: (min: 0.0, avg: 10.4, max: 22.0) [2024-06-06 14:52:28,056][21617] Avg episode reward: [(0, '0.180')] [2024-06-06 14:52:29,122][21850] Updated weights for policy 0, policy_version 30014 (0.0031) [2024-06-06 14:52:32,270][21850] Updated weights for policy 0, policy_version 30024 (0.0030) [2024-06-06 14:52:33,056][21617] Fps is (10 sec: 42597.1, 60 sec: 44236.7, 300 sec: 44820.0). Total num frames: 491913216. Throughput: 0: 44798.0. Samples: 18960920. Policy #0 lag: (min: 0.0, avg: 10.4, max: 22.0) [2024-06-06 14:52:33,056][21617] Avg episode reward: [(0, '0.173')] [2024-06-06 14:52:33,070][21829] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000030024_491913216.pth... [2024-06-06 14:52:33,136][21829] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000029367_481148928.pth [2024-06-06 14:52:36,271][21850] Updated weights for policy 0, policy_version 30034 (0.0024) [2024-06-06 14:52:38,055][21617] Fps is (10 sec: 49151.6, 60 sec: 45329.0, 300 sec: 45097.7). Total num frames: 492175360. Throughput: 0: 44699.0. Samples: 19229360. Policy #0 lag: (min: 0.0, avg: 10.4, max: 22.0) [2024-06-06 14:52:38,056][21617] Avg episode reward: [(0, '0.183')] [2024-06-06 14:52:39,935][21850] Updated weights for policy 0, policy_version 30044 (0.0028) [2024-06-06 14:52:43,055][21617] Fps is (10 sec: 42599.4, 60 sec: 44236.8, 300 sec: 44764.4). Total num frames: 492339200. Throughput: 0: 44838.6. Samples: 19370040. 
Policy #0 lag: (min: 0.0, avg: 12.6, max: 24.0) [2024-06-06 14:52:43,056][21617] Avg episode reward: [(0, '0.176')] [2024-06-06 14:52:43,927][21850] Updated weights for policy 0, policy_version 30054 (0.0029) [2024-06-06 14:52:47,235][21850] Updated weights for policy 0, policy_version 30064 (0.0035) [2024-06-06 14:52:48,056][21617] Fps is (10 sec: 40960.0, 60 sec: 44509.8, 300 sec: 44764.4). Total num frames: 492584960. Throughput: 0: 44735.9. Samples: 19629700. Policy #0 lag: (min: 0.0, avg: 12.6, max: 24.0) [2024-06-06 14:52:48,056][21617] Avg episode reward: [(0, '0.174')] [2024-06-06 14:52:51,373][21850] Updated weights for policy 0, policy_version 30074 (0.0043) [2024-06-06 14:52:53,060][21617] Fps is (10 sec: 49129.8, 60 sec: 44779.5, 300 sec: 45041.4). Total num frames: 492830720. Throughput: 0: 44554.7. Samples: 19894340. Policy #0 lag: (min: 0.0, avg: 12.6, max: 24.0) [2024-06-06 14:52:53,061][21617] Avg episode reward: [(0, '0.173')] [2024-06-06 14:52:54,404][21850] Updated weights for policy 0, policy_version 30084 (0.0040) [2024-06-06 14:52:58,056][21617] Fps is (10 sec: 44236.8, 60 sec: 44782.8, 300 sec: 44820.0). Total num frames: 493027328. Throughput: 0: 44729.3. Samples: 20033120. Policy #0 lag: (min: 0.0, avg: 12.6, max: 24.0) [2024-06-06 14:52:58,056][21617] Avg episode reward: [(0, '0.180')] [2024-06-06 14:52:58,468][21850] Updated weights for policy 0, policy_version 30094 (0.0033) [2024-06-06 14:53:01,736][21850] Updated weights for policy 0, policy_version 30104 (0.0028) [2024-06-06 14:53:03,060][21617] Fps is (10 sec: 40960.1, 60 sec: 43963.7, 300 sec: 44764.4). Total num frames: 493240320. Throughput: 0: 44629.8. Samples: 20296900. Policy #0 lag: (min: 0.0, avg: 9.0, max: 21.0) [2024-06-06 14:53:03,060][21617] Avg episode reward: [(0, '0.182')] [2024-06-06 14:53:05,701][21850] Updated weights for policy 0, policy_version 30114 (0.0031) [2024-06-06 14:53:08,056][21617] Fps is (10 sec: 47512.9, 60 sec: 44782.8, 300 sec: 44986.6). 
Total num frames: 493502464. Throughput: 0: 44578.8. Samples: 20563160. Policy #0 lag: (min: 0.0, avg: 9.0, max: 21.0) [2024-06-06 14:53:08,060][21617] Avg episode reward: [(0, '0.180')] [2024-06-06 14:53:09,234][21850] Updated weights for policy 0, policy_version 30124 (0.0023) [2024-06-06 14:53:13,056][21617] Fps is (10 sec: 45895.4, 60 sec: 44786.2, 300 sec: 44875.5). Total num frames: 493699072. Throughput: 0: 44690.6. Samples: 20706320. Policy #0 lag: (min: 0.0, avg: 9.0, max: 21.0) [2024-06-06 14:53:13,056][21617] Avg episode reward: [(0, '0.176')] [2024-06-06 14:53:13,263][21850] Updated weights for policy 0, policy_version 30134 (0.0024) [2024-06-06 14:53:16,486][21850] Updated weights for policy 0, policy_version 30144 (0.0041) [2024-06-06 14:53:18,056][21617] Fps is (10 sec: 42598.8, 60 sec: 44509.8, 300 sec: 44819.9). Total num frames: 493928448. Throughput: 0: 44501.0. Samples: 20963460. Policy #0 lag: (min: 0.0, avg: 9.1, max: 20.0) [2024-06-06 14:53:18,056][21617] Avg episode reward: [(0, '0.182')] [2024-06-06 14:53:20,592][21850] Updated weights for policy 0, policy_version 30154 (0.0037) [2024-06-06 14:53:23,056][21617] Fps is (10 sec: 47513.0, 60 sec: 44782.7, 300 sec: 44986.5). Total num frames: 494174208. Throughput: 0: 44624.7. Samples: 21237480. Policy #0 lag: (min: 0.0, avg: 9.1, max: 20.0) [2024-06-06 14:53:23,056][21617] Avg episode reward: [(0, '0.177')] [2024-06-06 14:53:23,830][21850] Updated weights for policy 0, policy_version 30164 (0.0046) [2024-06-06 14:53:27,664][21850] Updated weights for policy 0, policy_version 30174 (0.0055) [2024-06-06 14:53:28,060][21617] Fps is (10 sec: 45854.6, 60 sec: 45052.5, 300 sec: 44874.8). Total num frames: 494387200. Throughput: 0: 44635.4. Samples: 21378840. 
Policy #0 lag: (min: 0.0, avg: 9.1, max: 20.0) [2024-06-06 14:53:28,061][21617] Avg episode reward: [(0, '0.177')] [2024-06-06 14:53:30,826][21850] Updated weights for policy 0, policy_version 30184 (0.0036) [2024-06-06 14:53:32,390][21829] Signal inference workers to stop experience collection... (300 times) [2024-06-06 14:53:32,440][21850] InferenceWorker_p0-w0: stopping experience collection (300 times) [2024-06-06 14:53:32,446][21829] Signal inference workers to resume experience collection... (300 times) [2024-06-06 14:53:32,456][21850] InferenceWorker_p0-w0: resuming experience collection (300 times) [2024-06-06 14:53:33,057][21617] Fps is (10 sec: 42592.7, 60 sec: 44781.9, 300 sec: 44819.8). Total num frames: 494600192. Throughput: 0: 44916.7. Samples: 21651020. Policy #0 lag: (min: 0.0, avg: 9.1, max: 20.0) [2024-06-06 14:53:33,058][21617] Avg episode reward: [(0, '0.177')] [2024-06-06 14:53:35,163][21850] Updated weights for policy 0, policy_version 30194 (0.0043) [2024-06-06 14:53:38,055][21617] Fps is (10 sec: 44257.2, 60 sec: 44236.8, 300 sec: 44820.2). Total num frames: 494829568. Throughput: 0: 44877.4. Samples: 21913620. Policy #0 lag: (min: 0.0, avg: 9.1, max: 20.0) [2024-06-06 14:53:38,056][21617] Avg episode reward: [(0, '0.173')] [2024-06-06 14:53:38,356][21850] Updated weights for policy 0, policy_version 30204 (0.0035) [2024-06-06 14:53:42,149][21850] Updated weights for policy 0, policy_version 30214 (0.0031) [2024-06-06 14:53:43,056][21617] Fps is (10 sec: 45880.2, 60 sec: 45328.7, 300 sec: 44931.0). Total num frames: 495058944. Throughput: 0: 44910.7. Samples: 22054120. Policy #0 lag: (min: 0.0, avg: 10.7, max: 21.0) [2024-06-06 14:53:43,056][21617] Avg episode reward: [(0, '0.182')] [2024-06-06 14:53:45,885][21850] Updated weights for policy 0, policy_version 30224 (0.0035) [2024-06-06 14:53:48,055][21617] Fps is (10 sec: 44237.0, 60 sec: 44783.0, 300 sec: 44708.9). Total num frames: 495271936. Throughput: 0: 45013.8. Samples: 22322320. 
Policy #0 lag: (min: 0.0, avg: 10.7, max: 21.0) [2024-06-06 14:53:48,056][21617] Avg episode reward: [(0, '0.177')] [2024-06-06 14:53:49,622][21850] Updated weights for policy 0, policy_version 30234 (0.0036) [2024-06-06 14:53:52,937][21850] Updated weights for policy 0, policy_version 30244 (0.0026) [2024-06-06 14:53:53,056][21617] Fps is (10 sec: 45877.0, 60 sec: 44786.3, 300 sec: 44820.0). Total num frames: 495517696. Throughput: 0: 44995.7. Samples: 22587960. Policy #0 lag: (min: 0.0, avg: 10.7, max: 21.0) [2024-06-06 14:53:53,056][21617] Avg episode reward: [(0, '0.178')] [2024-06-06 14:53:56,634][21850] Updated weights for policy 0, policy_version 30254 (0.0022) [2024-06-06 14:53:58,055][21617] Fps is (10 sec: 49151.8, 60 sec: 45602.2, 300 sec: 44986.6). Total num frames: 495763456. Throughput: 0: 44925.4. Samples: 22727960. Policy #0 lag: (min: 1.0, avg: 11.3, max: 24.0) [2024-06-06 14:53:58,056][21617] Avg episode reward: [(0, '0.177')] [2024-06-06 14:54:00,019][21850] Updated weights for policy 0, policy_version 30264 (0.0027) [2024-06-06 14:54:03,056][21617] Fps is (10 sec: 42598.3, 60 sec: 45059.3, 300 sec: 44764.4). Total num frames: 495943680. Throughput: 0: 45226.7. Samples: 22998660. Policy #0 lag: (min: 1.0, avg: 11.3, max: 24.0) [2024-06-06 14:54:03,056][21617] Avg episode reward: [(0, '0.182')] [2024-06-06 14:54:03,961][21850] Updated weights for policy 0, policy_version 30274 (0.0029) [2024-06-06 14:54:07,494][21850] Updated weights for policy 0, policy_version 30284 (0.0037) [2024-06-06 14:54:08,060][21617] Fps is (10 sec: 40941.5, 60 sec: 44506.7, 300 sec: 44763.7). Total num frames: 496173056. Throughput: 0: 45045.9. Samples: 23264740. Policy #0 lag: (min: 1.0, avg: 11.3, max: 24.0) [2024-06-06 14:54:08,061][21617] Avg episode reward: [(0, '0.186')] [2024-06-06 14:54:11,129][21850] Updated weights for policy 0, policy_version 30294 (0.0032) [2024-06-06 14:54:13,055][21617] Fps is (10 sec: 47513.9, 60 sec: 45329.1, 300 sec: 44931.0). 
Total num frames: 496418816. Throughput: 0: 45070.8. Samples: 23406820. Policy #0 lag: (min: 1.0, avg: 11.3, max: 24.0) [2024-06-06 14:54:13,056][21617] Avg episode reward: [(0, '0.176')] [2024-06-06 14:54:15,046][21850] Updated weights for policy 0, policy_version 30304 (0.0029) [2024-06-06 14:54:18,055][21617] Fps is (10 sec: 45895.8, 60 sec: 45056.1, 300 sec: 44819.9). Total num frames: 496631808. Throughput: 0: 44825.5. Samples: 23668100. Policy #0 lag: (min: 0.0, avg: 9.9, max: 21.0) [2024-06-06 14:54:18,056][21617] Avg episode reward: [(0, '0.179')] [2024-06-06 14:54:18,783][21850] Updated weights for policy 0, policy_version 30314 (0.0039) [2024-06-06 14:54:22,209][21850] Updated weights for policy 0, policy_version 30324 (0.0025) [2024-06-06 14:54:23,055][21617] Fps is (10 sec: 42598.5, 60 sec: 44510.0, 300 sec: 44708.9). Total num frames: 496844800. Throughput: 0: 45095.6. Samples: 23942920. Policy #0 lag: (min: 0.0, avg: 9.9, max: 21.0) [2024-06-06 14:54:23,059][21617] Avg episode reward: [(0, '0.185')] [2024-06-06 14:54:25,757][21850] Updated weights for policy 0, policy_version 30334 (0.0030) [2024-06-06 14:54:28,056][21617] Fps is (10 sec: 45874.8, 60 sec: 45059.4, 300 sec: 44931.0). Total num frames: 497090560. Throughput: 0: 44953.6. Samples: 24077020. Policy #0 lag: (min: 0.0, avg: 9.9, max: 21.0) [2024-06-06 14:54:28,060][21617] Avg episode reward: [(0, '0.174')] [2024-06-06 14:54:29,471][21850] Updated weights for policy 0, policy_version 30344 (0.0030) [2024-06-06 14:54:32,784][21850] Updated weights for policy 0, policy_version 30354 (0.0031) [2024-06-06 14:54:33,060][21617] Fps is (10 sec: 47492.0, 60 sec: 45326.8, 300 sec: 44819.3). Total num frames: 497319936. Throughput: 0: 45042.5. Samples: 24349440. Policy #0 lag: (min: 0.0, avg: 9.9, max: 21.0) [2024-06-06 14:54:33,061][21617] Avg episode reward: [(0, '0.182')] [2024-06-06 14:54:33,066][21829] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000030354_497319936.pth... 
[2024-06-06 14:54:33,129][21829] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000029697_486555648.pth
[2024-06-06 14:54:36,981][21850] Updated weights for policy 0, policy_version 30364 (0.0031)
[2024-06-06 14:54:38,055][21617] Fps is (10 sec: 45875.7, 60 sec: 45329.0, 300 sec: 44875.5). Total num frames: 497549312. Throughput: 0: 45210.7. Samples: 24622440. Policy #0 lag: (min: 0.0, avg: 9.9, max: 23.0)
[2024-06-06 14:54:38,056][21617] Avg episode reward: [(0, '0.173')]
[2024-06-06 14:54:40,266][21850] Updated weights for policy 0, policy_version 30374 (0.0026)
[2024-06-06 14:54:43,060][21617] Fps is (10 sec: 45875.6, 60 sec: 45326.0, 300 sec: 44930.4). Total num frames: 497778688. Throughput: 0: 45028.9. Samples: 24754460. Policy #0 lag: (min: 0.0, avg: 9.9, max: 23.0)
[2024-06-06 14:54:43,060][21617] Avg episode reward: [(0, '0.178')]
[2024-06-06 14:54:44,151][21850] Updated weights for policy 0, policy_version 30384 (0.0032)
[2024-06-06 14:54:47,976][21850] Updated weights for policy 0, policy_version 30394 (0.0041)
[2024-06-06 14:54:48,056][21617] Fps is (10 sec: 42597.9, 60 sec: 45055.9, 300 sec: 44819.9). Total num frames: 497975296. Throughput: 0: 44974.1. Samples: 25022500. Policy #0 lag: (min: 0.0, avg: 9.9, max: 23.0)
[2024-06-06 14:54:48,060][21617] Avg episode reward: [(0, '0.180')]
[2024-06-06 14:54:51,408][21850] Updated weights for policy 0, policy_version 30404 (0.0035)
[2024-06-06 14:54:53,056][21617] Fps is (10 sec: 42616.5, 60 sec: 44782.8, 300 sec: 44875.4). Total num frames: 498204672. Throughput: 0: 45035.0. Samples: 25291120. Policy #0 lag: (min: 0.0, avg: 9.9, max: 23.0)
[2024-06-06 14:54:53,056][21617] Avg episode reward: [(0, '0.183')]
[2024-06-06 14:54:54,950][21850] Updated weights for policy 0, policy_version 30414 (0.0027)
[2024-06-06 14:54:55,525][21829] Signal inference workers to stop experience collection... (350 times)
[2024-06-06 14:54:55,566][21850] InferenceWorker_p0-w0: stopping experience collection (350 times)
[2024-06-06 14:54:55,573][21829] Signal inference workers to resume experience collection... (350 times)
[2024-06-06 14:54:55,588][21850] InferenceWorker_p0-w0: resuming experience collection (350 times)
[2024-06-06 14:54:58,055][21617] Fps is (10 sec: 45876.2, 60 sec: 44509.9, 300 sec: 44986.6). Total num frames: 498434048. Throughput: 0: 44813.9. Samples: 25423440. Policy #0 lag: (min: 0.0, avg: 11.6, max: 21.0)
[2024-06-06 14:54:58,056][21617] Avg episode reward: [(0, '0.173')]
[2024-06-06 14:54:58,835][21850] Updated weights for policy 0, policy_version 30424 (0.0024)
[2024-06-06 14:55:02,027][21850] Updated weights for policy 0, policy_version 30434 (0.0042)
[2024-06-06 14:55:03,055][21617] Fps is (10 sec: 45876.0, 60 sec: 45329.1, 300 sec: 44875.5). Total num frames: 498663424. Throughput: 0: 45013.8. Samples: 25693720. Policy #0 lag: (min: 0.0, avg: 11.6, max: 21.0)
[2024-06-06 14:55:03,056][21617] Avg episode reward: [(0, '0.179')]
[2024-06-06 14:55:06,386][21850] Updated weights for policy 0, policy_version 30444 (0.0027)
[2024-06-06 14:55:08,060][21617] Fps is (10 sec: 42579.1, 60 sec: 44783.0, 300 sec: 44708.2). Total num frames: 498860032. Throughput: 0: 44748.9. Samples: 25956820. Policy #0 lag: (min: 0.0, avg: 11.6, max: 21.0)
[2024-06-06 14:55:08,061][21617] Avg episode reward: [(0, '0.176')]
[2024-06-06 14:55:09,643][21850] Updated weights for policy 0, policy_version 30454 (0.0032)
[2024-06-06 14:55:13,055][21617] Fps is (10 sec: 42598.6, 60 sec: 44509.9, 300 sec: 44820.0). Total num frames: 499089408. Throughput: 0: 44739.3. Samples: 26090280. Policy #0 lag: (min: 0.0, avg: 11.6, max: 21.0)
[2024-06-06 14:55:13,056][21617] Avg episode reward: [(0, '0.191')]
[2024-06-06 14:55:13,502][21850] Updated weights for policy 0, policy_version 30464 (0.0033)
[2024-06-06 14:55:17,198][21850] Updated weights for policy 0, policy_version 30474 (0.0030)
[2024-06-06 14:55:18,056][21617] Fps is (10 sec: 49173.6, 60 sec: 45329.1, 300 sec: 44931.0). Total num frames: 499351552. Throughput: 0: 44750.2. Samples: 26363000. Policy #0 lag: (min: 0.0, avg: 11.0, max: 23.0)
[2024-06-06 14:55:18,056][21617] Avg episode reward: [(0, '0.178')]
[2024-06-06 14:55:20,887][21850] Updated weights for policy 0, policy_version 30484 (0.0044)
[2024-06-06 14:55:23,055][21617] Fps is (10 sec: 44236.9, 60 sec: 44783.0, 300 sec: 44709.6). Total num frames: 499531776. Throughput: 0: 44827.6. Samples: 26639680. Policy #0 lag: (min: 0.0, avg: 11.0, max: 23.0)
[2024-06-06 14:55:23,056][21617] Avg episode reward: [(0, '0.178')]
[2024-06-06 14:55:24,110][21850] Updated weights for policy 0, policy_version 30494 (0.0045)
[2024-06-06 14:55:27,963][21850] Updated weights for policy 0, policy_version 30504 (0.0036)
[2024-06-06 14:55:28,056][21617] Fps is (10 sec: 42598.0, 60 sec: 44782.9, 300 sec: 44819.9). Total num frames: 499777536. Throughput: 0: 44748.7. Samples: 26767960. Policy #0 lag: (min: 0.0, avg: 11.0, max: 23.0)
[2024-06-06 14:55:28,056][21617] Avg episode reward: [(0, '0.179')]
[2024-06-06 14:55:31,313][21850] Updated weights for policy 0, policy_version 30514 (0.0030)
[2024-06-06 14:55:33,055][21617] Fps is (10 sec: 49151.8, 60 sec: 45059.4, 300 sec: 44931.0). Total num frames: 500023296. Throughput: 0: 44685.0. Samples: 27033320. Policy #0 lag: (min: 0.0, avg: 11.0, max: 23.0)
[2024-06-06 14:55:33,056][21617] Avg episode reward: [(0, '0.182')]
[2024-06-06 14:55:35,288][21850] Updated weights for policy 0, policy_version 30524 (0.0036)
[2024-06-06 14:55:38,055][21617] Fps is (10 sec: 44237.4, 60 sec: 44509.9, 300 sec: 44820.0). Total num frames: 500219904. Throughput: 0: 44913.1. Samples: 27312200. Policy #0 lag: (min: 0.0, avg: 9.6, max: 21.0)
[2024-06-06 14:55:38,056][21617] Avg episode reward: [(0, '0.178')]
[2024-06-06 14:55:38,970][21850] Updated weights for policy 0, policy_version 30534 (0.0027)
[2024-06-06 14:55:42,488][21850] Updated weights for policy 0, policy_version 30544 (0.0034)
[2024-06-06 14:55:43,060][21617] Fps is (10 sec: 42580.0, 60 sec: 44510.0, 300 sec: 44874.8). Total num frames: 500449280. Throughput: 0: 44751.2. Samples: 27437440. Policy #0 lag: (min: 0.0, avg: 9.6, max: 21.0)
[2024-06-06 14:55:43,060][21617] Avg episode reward: [(0, '0.177')]
[2024-06-06 14:55:46,243][21850] Updated weights for policy 0, policy_version 30554 (0.0032)
[2024-06-06 14:55:48,055][21617] Fps is (10 sec: 47513.8, 60 sec: 45329.2, 300 sec: 44986.6). Total num frames: 500695040. Throughput: 0: 44812.1. Samples: 27710260. Policy #0 lag: (min: 0.0, avg: 9.6, max: 21.0)
[2024-06-06 14:55:48,056][21617] Avg episode reward: [(0, '0.181')]
[2024-06-06 14:55:49,675][21850] Updated weights for policy 0, policy_version 30564 (0.0029)
[2024-06-06 14:55:53,056][21617] Fps is (10 sec: 44255.1, 60 sec: 44783.0, 300 sec: 44819.9). Total num frames: 500891648. Throughput: 0: 45034.0. Samples: 27983160. Policy #0 lag: (min: 0.0, avg: 9.6, max: 21.0)
[2024-06-06 14:55:53,056][21617] Avg episode reward: [(0, '0.180')]
[2024-06-06 14:55:53,536][21850] Updated weights for policy 0, policy_version 30574 (0.0042)
[2024-06-06 14:55:56,972][21850] Updated weights for policy 0, policy_version 30584 (0.0031)
[2024-06-06 14:55:58,055][21617] Fps is (10 sec: 42598.4, 60 sec: 44782.9, 300 sec: 44820.0). Total num frames: 501121024. Throughput: 0: 44883.6. Samples: 28110040. Policy #0 lag: (min: 0.0, avg: 9.1, max: 20.0)
[2024-06-06 14:55:58,056][21617] Avg episode reward: [(0, '0.174')]
[2024-06-06 14:56:00,700][21850] Updated weights for policy 0, policy_version 30594 (0.0035)
[2024-06-06 14:56:03,056][21617] Fps is (10 sec: 45875.5, 60 sec: 44782.9, 300 sec: 44875.5). Total num frames: 501350400. Throughput: 0: 44882.6. Samples: 28382720. Policy #0 lag: (min: 0.0, avg: 9.1, max: 20.0)
[2024-06-06 14:56:03,056][21617] Avg episode reward: [(0, '0.172')]
[2024-06-06 14:56:04,524][21850] Updated weights for policy 0, policy_version 30604 (0.0036)
[2024-06-06 14:56:08,055][21617] Fps is (10 sec: 44237.0, 60 sec: 45059.4, 300 sec: 44875.5). Total num frames: 501563392. Throughput: 0: 44791.6. Samples: 28655300. Policy #0 lag: (min: 0.0, avg: 9.1, max: 20.0)
[2024-06-06 14:56:08,056][21617] Avg episode reward: [(0, '0.180')]
[2024-06-06 14:56:08,160][21850] Updated weights for policy 0, policy_version 30614 (0.0045)
[2024-06-06 14:56:12,017][21850] Updated weights for policy 0, policy_version 30624 (0.0032)
[2024-06-06 14:56:13,056][21617] Fps is (10 sec: 44237.2, 60 sec: 45056.0, 300 sec: 44820.0). Total num frames: 501792768. Throughput: 0: 44861.0. Samples: 28786700. Policy #0 lag: (min: 0.0, avg: 9.1, max: 20.0)
[2024-06-06 14:56:13,056][21617] Avg episode reward: [(0, '0.183')]
[2024-06-06 14:56:14,963][21829] Signal inference workers to stop experience collection... (400 times)
[2024-06-06 14:56:14,964][21829] Signal inference workers to resume experience collection... (400 times)
[2024-06-06 14:56:14,996][21850] InferenceWorker_p0-w0: stopping experience collection (400 times)
[2024-06-06 14:56:14,996][21850] InferenceWorker_p0-w0: resuming experience collection (400 times)
[2024-06-06 14:56:15,525][21850] Updated weights for policy 0, policy_version 30634 (0.0035)
[2024-06-06 14:56:18,055][21617] Fps is (10 sec: 44236.8, 60 sec: 44236.9, 300 sec: 44764.4). Total num frames: 502005760. Throughput: 0: 44849.4. Samples: 29051540. Policy #0 lag: (min: 0.0, avg: 9.1, max: 20.0)
[2024-06-06 14:56:18,056][21617] Avg episode reward: [(0, '0.172')]
[2024-06-06 14:56:19,102][21850] Updated weights for policy 0, policy_version 30644 (0.0034)
[2024-06-06 14:56:22,715][21850] Updated weights for policy 0, policy_version 30654 (0.0039)
[2024-06-06 14:56:23,060][21617] Fps is (10 sec: 45854.8, 60 sec: 45325.6, 300 sec: 44874.8). Total num frames: 502251520. Throughput: 0: 44852.9. Samples: 29330780. Policy #0 lag: (min: 0.0, avg: 10.2, max: 21.0)
[2024-06-06 14:56:23,061][21617] Avg episode reward: [(0, '0.183')]
[2024-06-06 14:56:26,474][21850] Updated weights for policy 0, policy_version 30664 (0.0025)
[2024-06-06 14:56:28,055][21617] Fps is (10 sec: 44236.8, 60 sec: 44510.0, 300 sec: 44708.9). Total num frames: 502448128. Throughput: 0: 44876.4. Samples: 29456680. Policy #0 lag: (min: 0.0, avg: 10.2, max: 21.0)
[2024-06-06 14:56:28,056][21617] Avg episode reward: [(0, '0.178')]
[2024-06-06 14:56:29,835][21850] Updated weights for policy 0, policy_version 30674 (0.0030)
[2024-06-06 14:56:33,055][21617] Fps is (10 sec: 42617.7, 60 sec: 44236.8, 300 sec: 44820.0). Total num frames: 502677504. Throughput: 0: 44811.1. Samples: 29726760. Policy #0 lag: (min: 0.0, avg: 10.2, max: 21.0)
[2024-06-06 14:56:33,056][21617] Avg episode reward: [(0, '0.184')]
[2024-06-06 14:56:33,071][21829] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000030682_502693888.pth...
[2024-06-06 14:56:33,127][21829] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000030024_491913216.pth
[2024-06-06 14:56:33,607][21850] Updated weights for policy 0, policy_version 30684 (0.0036)
[2024-06-06 14:56:37,319][21850] Updated weights for policy 0, policy_version 30694 (0.0050)
[2024-06-06 14:56:38,056][21617] Fps is (10 sec: 47513.0, 60 sec: 45056.0, 300 sec: 44875.5). Total num frames: 502923264. Throughput: 0: 44599.2. Samples: 29990120.
Policy #0 lag: (min: 0.0, avg: 10.2, max: 21.0) [2024-06-06 14:56:38,056][21617] Avg episode reward: [(0, '0.182')] [2024-06-06 14:56:41,139][21850] Updated weights for policy 0, policy_version 30704 (0.0027) [2024-06-06 14:56:43,056][21617] Fps is (10 sec: 44236.3, 60 sec: 44513.0, 300 sec: 44764.4). Total num frames: 503119872. Throughput: 0: 44862.6. Samples: 30128860. Policy #0 lag: (min: 0.0, avg: 11.5, max: 24.0) [2024-06-06 14:56:43,056][21617] Avg episode reward: [(0, '0.181')] [2024-06-06 14:56:44,543][21850] Updated weights for policy 0, policy_version 30714 (0.0039) [2024-06-06 14:56:48,056][21617] Fps is (10 sec: 44236.5, 60 sec: 44509.7, 300 sec: 44819.9). Total num frames: 503365632. Throughput: 0: 44701.3. Samples: 30394280. Policy #0 lag: (min: 0.0, avg: 11.5, max: 24.0) [2024-06-06 14:56:48,056][21617] Avg episode reward: [(0, '0.180')] [2024-06-06 14:56:48,305][21850] Updated weights for policy 0, policy_version 30724 (0.0032) [2024-06-06 14:56:51,832][21850] Updated weights for policy 0, policy_version 30734 (0.0019) [2024-06-06 14:56:53,060][21617] Fps is (10 sec: 47492.6, 60 sec: 45052.8, 300 sec: 44930.3). Total num frames: 503595008. Throughput: 0: 44667.0. Samples: 30665520. Policy #0 lag: (min: 0.0, avg: 11.5, max: 24.0) [2024-06-06 14:56:53,061][21617] Avg episode reward: [(0, '0.178')] [2024-06-06 14:56:55,716][21850] Updated weights for policy 0, policy_version 30744 (0.0039) [2024-06-06 14:56:58,057][21617] Fps is (10 sec: 44232.8, 60 sec: 44782.1, 300 sec: 44764.9). Total num frames: 503808000. Throughput: 0: 44826.5. Samples: 30803940. Policy #0 lag: (min: 0.0, avg: 11.5, max: 24.0) [2024-06-06 14:56:58,057][21617] Avg episode reward: [(0, '0.178')] [2024-06-06 14:56:58,987][21850] Updated weights for policy 0, policy_version 30754 (0.0041) [2024-06-06 14:57:02,931][21850] Updated weights for policy 0, policy_version 30764 (0.0031) [2024-06-06 14:57:03,056][21617] Fps is (10 sec: 44256.5, 60 sec: 44783.0, 300 sec: 44820.0). 
Total num frames: 504037376. Throughput: 0: 44850.1. Samples: 31069800. Policy #0 lag: (min: 0.0, avg: 9.8, max: 21.0) [2024-06-06 14:57:03,056][21617] Avg episode reward: [(0, '0.180')] [2024-06-06 14:57:06,459][21850] Updated weights for policy 0, policy_version 30774 (0.0030) [2024-06-06 14:57:08,055][21617] Fps is (10 sec: 45880.1, 60 sec: 45056.0, 300 sec: 44931.7). Total num frames: 504266752. Throughput: 0: 44626.2. Samples: 31338760. Policy #0 lag: (min: 0.0, avg: 9.8, max: 21.0) [2024-06-06 14:57:08,056][21617] Avg episode reward: [(0, '0.177')] [2024-06-06 14:57:10,109][21850] Updated weights for policy 0, policy_version 30784 (0.0028) [2024-06-06 14:57:13,055][21617] Fps is (10 sec: 44237.3, 60 sec: 44783.0, 300 sec: 44820.0). Total num frames: 504479744. Throughput: 0: 44850.7. Samples: 31474960. Policy #0 lag: (min: 0.0, avg: 9.8, max: 21.0) [2024-06-06 14:57:13,056][21617] Avg episode reward: [(0, '0.174')] [2024-06-06 14:57:13,504][21850] Updated weights for policy 0, policy_version 30794 (0.0030) [2024-06-06 14:57:17,384][21850] Updated weights for policy 0, policy_version 30804 (0.0026) [2024-06-06 14:57:18,056][21617] Fps is (10 sec: 44236.3, 60 sec: 45055.9, 300 sec: 44819.9). Total num frames: 504709120. Throughput: 0: 45064.7. Samples: 31754680. Policy #0 lag: (min: 0.0, avg: 9.8, max: 21.0) [2024-06-06 14:57:18,056][21617] Avg episode reward: [(0, '0.181')] [2024-06-06 14:57:20,817][21850] Updated weights for policy 0, policy_version 30814 (0.0031) [2024-06-06 14:57:23,055][21617] Fps is (10 sec: 45874.9, 60 sec: 44786.3, 300 sec: 44931.0). Total num frames: 504938496. Throughput: 0: 45104.1. Samples: 32019800. Policy #0 lag: (min: 0.0, avg: 10.5, max: 22.0) [2024-06-06 14:57:23,056][21617] Avg episode reward: [(0, '0.186')] [2024-06-06 14:57:24,566][21850] Updated weights for policy 0, policy_version 30824 (0.0037) [2024-06-06 14:57:28,055][21617] Fps is (10 sec: 45875.9, 60 sec: 45329.1, 300 sec: 44931.1). Total num frames: 505167872. 
Throughput: 0: 44999.7. Samples: 32153840. Policy #0 lag: (min: 0.0, avg: 10.5, max: 22.0) [2024-06-06 14:57:28,056][21617] Avg episode reward: [(0, '0.185')] [2024-06-06 14:57:28,203][21850] Updated weights for policy 0, policy_version 30834 (0.0038) [2024-06-06 14:57:31,761][21850] Updated weights for policy 0, policy_version 30844 (0.0030) [2024-06-06 14:57:33,056][21617] Fps is (10 sec: 44234.7, 60 sec: 45055.6, 300 sec: 44764.4). Total num frames: 505380864. Throughput: 0: 44922.8. Samples: 32415820. Policy #0 lag: (min: 0.0, avg: 10.5, max: 22.0) [2024-06-06 14:57:33,057][21617] Avg episode reward: [(0, '0.178')] [2024-06-06 14:57:34,900][21829] Signal inference workers to stop experience collection... (450 times) [2024-06-06 14:57:34,900][21829] Signal inference workers to resume experience collection... (450 times) [2024-06-06 14:57:34,947][21850] InferenceWorker_p0-w0: stopping experience collection (450 times) [2024-06-06 14:57:34,948][21850] InferenceWorker_p0-w0: resuming experience collection (450 times) [2024-06-06 14:57:35,731][21850] Updated weights for policy 0, policy_version 30854 (0.0036) [2024-06-06 14:57:38,055][21617] Fps is (10 sec: 44236.7, 60 sec: 44783.0, 300 sec: 44986.6). Total num frames: 505610240. Throughput: 0: 44826.7. Samples: 32682520. Policy #0 lag: (min: 0.0, avg: 10.5, max: 22.0) [2024-06-06 14:57:38,056][21617] Avg episode reward: [(0, '0.176')] [2024-06-06 14:57:39,267][21850] Updated weights for policy 0, policy_version 30864 (0.0043) [2024-06-06 14:57:42,912][21850] Updated weights for policy 0, policy_version 30874 (0.0035) [2024-06-06 14:57:43,055][21617] Fps is (10 sec: 47516.1, 60 sec: 45602.2, 300 sec: 44986.6). Total num frames: 505856000. Throughput: 0: 44880.2. Samples: 32823500. 
Policy #0 lag: (min: 0.0, avg: 10.7, max: 23.0) [2024-06-06 14:57:43,056][21617] Avg episode reward: [(0, '0.178')] [2024-06-06 14:57:46,835][21850] Updated weights for policy 0, policy_version 30884 (0.0035) [2024-06-06 14:57:48,055][21617] Fps is (10 sec: 40959.8, 60 sec: 44236.9, 300 sec: 44709.6). Total num frames: 506019840. Throughput: 0: 44739.6. Samples: 33083080. Policy #0 lag: (min: 0.0, avg: 10.7, max: 23.0) [2024-06-06 14:57:48,056][21617] Avg episode reward: [(0, '0.173')] [2024-06-06 14:57:50,197][21850] Updated weights for policy 0, policy_version 30894 (0.0035) [2024-06-06 14:57:53,056][21617] Fps is (10 sec: 40959.2, 60 sec: 44513.1, 300 sec: 44875.5). Total num frames: 506265600. Throughput: 0: 44712.3. Samples: 33350820. Policy #0 lag: (min: 0.0, avg: 10.7, max: 23.0) [2024-06-06 14:57:53,057][21617] Avg episode reward: [(0, '0.178')] [2024-06-06 14:57:54,064][21850] Updated weights for policy 0, policy_version 30904 (0.0022) [2024-06-06 14:57:57,678][21850] Updated weights for policy 0, policy_version 30914 (0.0027) [2024-06-06 14:57:58,055][21617] Fps is (10 sec: 49152.1, 60 sec: 45056.8, 300 sec: 44987.3). Total num frames: 506511360. Throughput: 0: 44693.7. Samples: 33486180. Policy #0 lag: (min: 0.0, avg: 10.7, max: 23.0) [2024-06-06 14:57:58,056][21617] Avg episode reward: [(0, '0.174')] [2024-06-06 14:58:01,354][21850] Updated weights for policy 0, policy_version 30924 (0.0049) [2024-06-06 14:58:03,055][21617] Fps is (10 sec: 42599.1, 60 sec: 44236.8, 300 sec: 44708.9). Total num frames: 506691584. Throughput: 0: 44321.0. Samples: 33749120. Policy #0 lag: (min: 0.0, avg: 10.2, max: 21.0) [2024-06-06 14:58:03,056][21617] Avg episode reward: [(0, '0.184')] [2024-06-06 14:58:05,001][21850] Updated weights for policy 0, policy_version 30934 (0.0040) [2024-06-06 14:58:08,055][21617] Fps is (10 sec: 44236.9, 60 sec: 44782.9, 300 sec: 44931.0). Total num frames: 506953728. Throughput: 0: 44513.3. Samples: 34022900. 
Policy #0 lag: (min: 0.0, avg: 10.2, max: 21.0) [2024-06-06 14:58:08,056][21617] Avg episode reward: [(0, '0.179')] [2024-06-06 14:58:08,615][21850] Updated weights for policy 0, policy_version 30944 (0.0041) [2024-06-06 14:58:12,180][21850] Updated weights for policy 0, policy_version 30954 (0.0042) [2024-06-06 14:58:13,056][21617] Fps is (10 sec: 47511.4, 60 sec: 44782.5, 300 sec: 44875.4). Total num frames: 507166720. Throughput: 0: 44668.4. Samples: 34163940. Policy #0 lag: (min: 0.0, avg: 10.2, max: 21.0) [2024-06-06 14:58:13,057][21617] Avg episode reward: [(0, '0.175')] [2024-06-06 14:58:16,020][21850] Updated weights for policy 0, policy_version 30964 (0.0027) [2024-06-06 14:58:18,055][21617] Fps is (10 sec: 42598.4, 60 sec: 44509.9, 300 sec: 44764.5). Total num frames: 507379712. Throughput: 0: 44672.9. Samples: 34426080. Policy #0 lag: (min: 0.0, avg: 10.2, max: 21.0) [2024-06-06 14:58:18,056][21617] Avg episode reward: [(0, '0.188')] [2024-06-06 14:58:19,388][21850] Updated weights for policy 0, policy_version 30974 (0.0035) [2024-06-06 14:58:23,055][21617] Fps is (10 sec: 44238.9, 60 sec: 44509.9, 300 sec: 44820.7). Total num frames: 507609088. Throughput: 0: 44704.5. Samples: 34694220. Policy #0 lag: (min: 1.0, avg: 11.9, max: 24.0) [2024-06-06 14:58:23,056][21617] Avg episode reward: [(0, '0.172')] [2024-06-06 14:58:23,406][21850] Updated weights for policy 0, policy_version 30984 (0.0036) [2024-06-06 14:58:27,174][21850] Updated weights for policy 0, policy_version 30994 (0.0032) [2024-06-06 14:58:28,055][21617] Fps is (10 sec: 44236.8, 60 sec: 44236.8, 300 sec: 44820.2). Total num frames: 507822080. Throughput: 0: 44578.6. Samples: 34829540. Policy #0 lag: (min: 1.0, avg: 11.9, max: 24.0) [2024-06-06 14:58:28,056][21617] Avg episode reward: [(0, '0.181')] [2024-06-06 14:58:30,660][21850] Updated weights for policy 0, policy_version 31004 (0.0021) [2024-06-06 14:58:33,055][21617] Fps is (10 sec: 42598.4, 60 sec: 44237.2, 300 sec: 44764.4). 
Total num frames: 508035072. Throughput: 0: 44691.6. Samples: 35094200. Policy #0 lag: (min: 1.0, avg: 11.9, max: 24.0) [2024-06-06 14:58:33,056][21617] Avg episode reward: [(0, '0.179')] [2024-06-06 14:58:33,218][21829] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000031010_508067840.pth... [2024-06-06 14:58:33,268][21829] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000030354_497319936.pth [2024-06-06 14:58:34,188][21850] Updated weights for policy 0, policy_version 31014 (0.0030) [2024-06-06 14:58:37,783][21850] Updated weights for policy 0, policy_version 31024 (0.0031) [2024-06-06 14:58:38,056][21617] Fps is (10 sec: 47511.4, 60 sec: 44782.6, 300 sec: 44875.5). Total num frames: 508297216. Throughput: 0: 44625.9. Samples: 35359000. Policy #0 lag: (min: 1.0, avg: 11.9, max: 24.0) [2024-06-06 14:58:38,057][21617] Avg episode reward: [(0, '0.183')] [2024-06-06 14:58:41,567][21850] Updated weights for policy 0, policy_version 31034 (0.0033) [2024-06-06 14:58:43,055][21617] Fps is (10 sec: 45875.4, 60 sec: 43963.8, 300 sec: 44820.0). Total num frames: 508493824. Throughput: 0: 44783.2. Samples: 35501420. Policy #0 lag: (min: 0.0, avg: 8.9, max: 21.0) [2024-06-06 14:58:43,056][21617] Avg episode reward: [(0, '0.182')] [2024-06-06 14:58:45,159][21850] Updated weights for policy 0, policy_version 31044 (0.0040) [2024-06-06 14:58:48,056][21617] Fps is (10 sec: 42598.5, 60 sec: 45055.7, 300 sec: 44764.4). Total num frames: 508723200. Throughput: 0: 44938.7. Samples: 35771380. Policy #0 lag: (min: 0.0, avg: 8.9, max: 21.0) [2024-06-06 14:58:48,056][21617] Avg episode reward: [(0, '0.181')] [2024-06-06 14:58:48,815][21850] Updated weights for policy 0, policy_version 31054 (0.0046) [2024-06-06 14:58:52,499][21850] Updated weights for policy 0, policy_version 31064 (0.0021) [2024-06-06 14:58:53,055][21617] Fps is (10 sec: 47513.2, 60 sec: 45056.1, 300 sec: 44764.4). Total num frames: 508968960. Throughput: 0: 44613.8. 
Samples: 36030520. Policy #0 lag: (min: 0.0, avg: 8.9, max: 21.0) [2024-06-06 14:58:53,060][21617] Avg episode reward: [(0, '0.181')] [2024-06-06 14:58:56,207][21850] Updated weights for policy 0, policy_version 31074 (0.0039) [2024-06-06 14:58:58,055][21617] Fps is (10 sec: 45877.6, 60 sec: 44509.9, 300 sec: 44875.5). Total num frames: 509181952. Throughput: 0: 44565.4. Samples: 36169360. Policy #0 lag: (min: 0.0, avg: 8.9, max: 21.0) [2024-06-06 14:58:58,056][21617] Avg episode reward: [(0, '0.175')] [2024-06-06 14:58:59,418][21829] Signal inference workers to stop experience collection... (500 times) [2024-06-06 14:58:59,441][21850] InferenceWorker_p0-w0: stopping experience collection (500 times) [2024-06-06 14:58:59,477][21829] Signal inference workers to resume experience collection... (500 times) [2024-06-06 14:58:59,477][21850] InferenceWorker_p0-w0: resuming experience collection (500 times) [2024-06-06 14:58:59,603][21850] Updated weights for policy 0, policy_version 31084 (0.0027) [2024-06-06 14:59:03,055][21617] Fps is (10 sec: 42598.3, 60 sec: 45056.0, 300 sec: 44820.6). Total num frames: 509394944. Throughput: 0: 44731.5. Samples: 36439000. Policy #0 lag: (min: 0.0, avg: 12.4, max: 22.0) [2024-06-06 14:59:03,056][21617] Avg episode reward: [(0, '0.185')] [2024-06-06 14:59:03,706][21850] Updated weights for policy 0, policy_version 31094 (0.0034) [2024-06-06 14:59:06,849][21850] Updated weights for policy 0, policy_version 31104 (0.0033) [2024-06-06 14:59:08,060][21617] Fps is (10 sec: 44216.8, 60 sec: 44506.6, 300 sec: 44763.8). Total num frames: 509624320. Throughput: 0: 44716.9. Samples: 36706680. Policy #0 lag: (min: 0.0, avg: 12.4, max: 22.0) [2024-06-06 14:59:08,060][21617] Avg episode reward: [(0, '0.175')] [2024-06-06 14:59:10,889][21850] Updated weights for policy 0, policy_version 31114 (0.0038) [2024-06-06 14:59:13,055][21617] Fps is (10 sec: 44237.1, 60 sec: 44510.2, 300 sec: 44764.4). Total num frames: 509837312. Throughput: 0: 44676.5. 
Samples: 36839980. Policy #0 lag: (min: 0.0, avg: 12.4, max: 22.0) [2024-06-06 14:59:13,056][21617] Avg episode reward: [(0, '0.180')] [2024-06-06 14:59:14,343][21850] Updated weights for policy 0, policy_version 31124 (0.0027) [2024-06-06 14:59:18,055][21617] Fps is (10 sec: 44256.3, 60 sec: 44782.9, 300 sec: 44820.0). Total num frames: 510066688. Throughput: 0: 44913.7. Samples: 37115320. Policy #0 lag: (min: 0.0, avg: 12.4, max: 22.0) [2024-06-06 14:59:18,056][21617] Avg episode reward: [(0, '0.181')] [2024-06-06 14:59:18,344][21850] Updated weights for policy 0, policy_version 31134 (0.0037) [2024-06-06 14:59:21,758][21850] Updated weights for policy 0, policy_version 31144 (0.0028) [2024-06-06 14:59:23,056][21617] Fps is (10 sec: 47512.8, 60 sec: 45055.9, 300 sec: 44820.0). Total num frames: 510312448. Throughput: 0: 44843.9. Samples: 37376960. Policy #0 lag: (min: 0.0, avg: 9.6, max: 21.0) [2024-06-06 14:59:23,056][21617] Avg episode reward: [(0, '0.173')] [2024-06-06 14:59:25,949][21850] Updated weights for policy 0, policy_version 31154 (0.0041) [2024-06-06 14:59:28,055][21617] Fps is (10 sec: 44237.1, 60 sec: 44783.0, 300 sec: 44709.6). Total num frames: 510509056. Throughput: 0: 44647.1. Samples: 37510540. Policy #0 lag: (min: 0.0, avg: 9.6, max: 21.0) [2024-06-06 14:59:28,056][21617] Avg episode reward: [(0, '0.177')] [2024-06-06 14:59:28,845][21850] Updated weights for policy 0, policy_version 31164 (0.0024) [2024-06-06 14:59:33,055][21617] Fps is (10 sec: 42598.8, 60 sec: 45055.9, 300 sec: 44708.9). Total num frames: 510738432. Throughput: 0: 44703.5. Samples: 37783020. 
Policy #0 lag: (min: 0.0, avg: 9.6, max: 21.0) [2024-06-06 14:59:33,056][21617] Avg episode reward: [(0, '0.178')] [2024-06-06 14:59:33,311][21850] Updated weights for policy 0, policy_version 31174 (0.0037) [2024-06-06 14:59:36,032][21850] Updated weights for policy 0, policy_version 31184 (0.0030) [2024-06-06 14:59:38,055][21617] Fps is (10 sec: 45875.0, 60 sec: 44510.2, 300 sec: 44709.6). Total num frames: 510967808. Throughput: 0: 44974.3. Samples: 38054360. Policy #0 lag: (min: 0.0, avg: 9.6, max: 21.0) [2024-06-06 14:59:38,056][21617] Avg episode reward: [(0, '0.178')] [2024-06-06 14:59:40,445][21850] Updated weights for policy 0, policy_version 31194 (0.0030) [2024-06-06 14:59:43,055][21617] Fps is (10 sec: 47513.8, 60 sec: 45329.0, 300 sec: 44875.5). Total num frames: 511213568. Throughput: 0: 44855.0. Samples: 38187840. Policy #0 lag: (min: 0.0, avg: 9.6, max: 20.0) [2024-06-06 14:59:43,056][21617] Avg episode reward: [(0, '0.184')] [2024-06-06 14:59:43,408][21850] Updated weights for policy 0, policy_version 31204 (0.0033) [2024-06-06 14:59:47,639][21850] Updated weights for policy 0, policy_version 31214 (0.0029) [2024-06-06 14:59:48,060][21617] Fps is (10 sec: 45854.4, 60 sec: 45052.9, 300 sec: 44819.3). Total num frames: 511426560. Throughput: 0: 44896.0. Samples: 38459520. Policy #0 lag: (min: 0.0, avg: 9.6, max: 20.0) [2024-06-06 14:59:48,061][21617] Avg episode reward: [(0, '0.185')] [2024-06-06 14:59:50,723][21850] Updated weights for policy 0, policy_version 31224 (0.0031) [2024-06-06 14:59:53,055][21617] Fps is (10 sec: 40960.0, 60 sec: 44236.8, 300 sec: 44708.9). Total num frames: 511623168. Throughput: 0: 44929.3. Samples: 38728300. Policy #0 lag: (min: 0.0, avg: 9.6, max: 20.0) [2024-06-06 14:59:53,056][21617] Avg episode reward: [(0, '0.183')] [2024-06-06 14:59:55,217][21850] Updated weights for policy 0, policy_version 31234 (0.0028) [2024-06-06 14:59:58,055][21617] Fps is (10 sec: 45896.1, 60 sec: 45056.0, 300 sec: 44820.0). 
Total num frames: 511885312. Throughput: 0: 44843.1. Samples: 38857920. Policy #0 lag: (min: 0.0, avg: 9.6, max: 20.0) [2024-06-06 14:59:58,056][21617] Avg episode reward: [(0, '0.178')] [2024-06-06 14:59:58,290][21850] Updated weights for policy 0, policy_version 31244 (0.0032) [2024-06-06 15:00:02,755][21850] Updated weights for policy 0, policy_version 31254 (0.0033) [2024-06-06 15:00:03,055][21617] Fps is (10 sec: 44236.9, 60 sec: 44509.9, 300 sec: 44765.1). Total num frames: 512065536. Throughput: 0: 44767.6. Samples: 39129860. Policy #0 lag: (min: 0.0, avg: 9.4, max: 21.0) [2024-06-06 15:00:03,056][21617] Avg episode reward: [(0, '0.178')] [2024-06-06 15:00:05,368][21850] Updated weights for policy 0, policy_version 31264 (0.0031) [2024-06-06 15:00:08,055][21617] Fps is (10 sec: 42598.1, 60 sec: 44786.2, 300 sec: 44820.0). Total num frames: 512311296. Throughput: 0: 45037.4. Samples: 39403640. Policy #0 lag: (min: 0.0, avg: 9.4, max: 21.0) [2024-06-06 15:00:08,056][21617] Avg episode reward: [(0, '0.179')] [2024-06-06 15:00:09,690][21850] Updated weights for policy 0, policy_version 31274 (0.0038) [2024-06-06 15:00:10,519][21829] Signal inference workers to stop experience collection... (550 times) [2024-06-06 15:00:10,520][21829] Signal inference workers to resume experience collection... (550 times) [2024-06-06 15:00:10,559][21850] InferenceWorker_p0-w0: stopping experience collection (550 times) [2024-06-06 15:00:10,559][21850] InferenceWorker_p0-w0: resuming experience collection (550 times) [2024-06-06 15:00:12,488][21850] Updated weights for policy 0, policy_version 31284 (0.0029) [2024-06-06 15:00:13,056][21617] Fps is (10 sec: 50787.9, 60 sec: 45601.8, 300 sec: 44819.9). Total num frames: 512573440. Throughput: 0: 45081.2. Samples: 39539220. 
Policy #0 lag: (min: 0.0, avg: 9.4, max: 21.0) [2024-06-06 15:00:13,057][21617] Avg episode reward: [(0, '0.182')] [2024-06-06 15:00:17,108][21850] Updated weights for policy 0, policy_version 31294 (0.0040) [2024-06-06 15:00:18,055][21617] Fps is (10 sec: 44237.3, 60 sec: 44783.0, 300 sec: 44820.0). Total num frames: 512753664. Throughput: 0: 44938.3. Samples: 39805240. Policy #0 lag: (min: 0.0, avg: 9.4, max: 21.0) [2024-06-06 15:00:18,056][21617] Avg episode reward: [(0, '0.179')] [2024-06-06 15:00:19,891][21850] Updated weights for policy 0, policy_version 31304 (0.0025) [2024-06-06 15:00:23,060][21617] Fps is (10 sec: 40943.5, 60 sec: 44506.7, 300 sec: 44763.8). Total num frames: 512983040. Throughput: 0: 44846.2. Samples: 40072640. Policy #0 lag: (min: 0.0, avg: 9.3, max: 21.0) [2024-06-06 15:00:23,061][21617] Avg episode reward: [(0, '0.182')] [2024-06-06 15:00:24,215][21850] Updated weights for policy 0, policy_version 31314 (0.0025) [2024-06-06 15:00:27,257][21850] Updated weights for policy 0, policy_version 31324 (0.0034) [2024-06-06 15:00:28,056][21617] Fps is (10 sec: 49151.3, 60 sec: 45602.0, 300 sec: 44819.9). Total num frames: 513245184. Throughput: 0: 44978.6. Samples: 40211880. Policy #0 lag: (min: 0.0, avg: 9.3, max: 21.0) [2024-06-06 15:00:28,056][21617] Avg episode reward: [(0, '0.179')] [2024-06-06 15:00:31,805][21850] Updated weights for policy 0, policy_version 31334 (0.0030) [2024-06-06 15:00:33,055][21617] Fps is (10 sec: 47535.2, 60 sec: 45329.1, 300 sec: 44875.5). Total num frames: 513458176. Throughput: 0: 45057.9. Samples: 40486920. Policy #0 lag: (min: 0.0, avg: 9.3, max: 21.0) [2024-06-06 15:00:33,056][21617] Avg episode reward: [(0, '0.182')] [2024-06-06 15:00:33,164][21829] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000031340_513474560.pth... 
[2024-06-06 15:00:33,217][21829] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000030682_502693888.pth [2024-06-06 15:00:34,212][21850] Updated weights for policy 0, policy_version 31344 (0.0027) [2024-06-06 15:00:38,056][21617] Fps is (10 sec: 42598.4, 60 sec: 45055.9, 300 sec: 44820.6). Total num frames: 513671168. Throughput: 0: 45003.9. Samples: 40753480. Policy #0 lag: (min: 0.0, avg: 9.3, max: 21.0) [2024-06-06 15:00:38,056][21617] Avg episode reward: [(0, '0.181')] [2024-06-06 15:00:38,820][21850] Updated weights for policy 0, policy_version 31354 (0.0036) [2024-06-06 15:00:41,722][21850] Updated weights for policy 0, policy_version 31364 (0.0031) [2024-06-06 15:00:43,056][21617] Fps is (10 sec: 42597.6, 60 sec: 44509.8, 300 sec: 44708.9). Total num frames: 513884160. Throughput: 0: 45109.6. Samples: 40887860. Policy #0 lag: (min: 0.0, avg: 10.1, max: 20.0) [2024-06-06 15:00:43,056][21617] Avg episode reward: [(0, '0.183')] [2024-06-06 15:00:45,907][21850] Updated weights for policy 0, policy_version 31374 (0.0020) [2024-06-06 15:00:48,055][21617] Fps is (10 sec: 45875.6, 60 sec: 45059.4, 300 sec: 44875.5). Total num frames: 514129920. Throughput: 0: 45166.7. Samples: 41162360. Policy #0 lag: (min: 0.0, avg: 10.1, max: 20.0) [2024-06-06 15:00:48,056][21617] Avg episode reward: [(0, '0.177')] [2024-06-06 15:00:49,100][21850] Updated weights for policy 0, policy_version 31384 (0.0029) [2024-06-06 15:00:53,055][21617] Fps is (10 sec: 44237.4, 60 sec: 45056.0, 300 sec: 44764.4). Total num frames: 514326528. Throughput: 0: 44869.8. Samples: 41422780. 
Policy #0 lag: (min: 0.0, avg: 10.1, max: 20.0) [2024-06-06 15:00:53,056][21617] Avg episode reward: [(0, '0.175')] [2024-06-06 15:00:53,330][21850] Updated weights for policy 0, policy_version 31394 (0.0036) [2024-06-06 15:00:56,565][21850] Updated weights for policy 0, policy_version 31404 (0.0030) [2024-06-06 15:00:58,056][21617] Fps is (10 sec: 44234.2, 60 sec: 44782.5, 300 sec: 44819.9). Total num frames: 514572288. Throughput: 0: 44870.6. Samples: 41558400. Policy #0 lag: (min: 0.0, avg: 10.1, max: 20.0) [2024-06-06 15:00:58,056][21617] Avg episode reward: [(0, '0.174')] [2024-06-06 15:01:00,725][21850] Updated weights for policy 0, policy_version 31414 (0.0032) [2024-06-06 15:01:03,060][21617] Fps is (10 sec: 49130.7, 60 sec: 45871.9, 300 sec: 44930.4). Total num frames: 514818048. Throughput: 0: 45174.2. Samples: 41838280. Policy #0 lag: (min: 1.0, avg: 12.1, max: 23.0) [2024-06-06 15:01:03,060][21617] Avg episode reward: [(0, '0.180')] [2024-06-06 15:01:03,606][21850] Updated weights for policy 0, policy_version 31424 (0.0041) [2024-06-06 15:01:07,657][21850] Updated weights for policy 0, policy_version 31434 (0.0036) [2024-06-06 15:01:08,055][21617] Fps is (10 sec: 44239.1, 60 sec: 45056.0, 300 sec: 44820.0). Total num frames: 515014656. Throughput: 0: 45071.6. Samples: 42100660. Policy #0 lag: (min: 1.0, avg: 12.1, max: 23.0) [2024-06-06 15:01:08,056][21617] Avg episode reward: [(0, '0.177')] [2024-06-06 15:01:11,057][21850] Updated weights for policy 0, policy_version 31444 (0.0029) [2024-06-06 15:01:13,056][21617] Fps is (10 sec: 40975.8, 60 sec: 44236.8, 300 sec: 44819.9). Total num frames: 515227648. Throughput: 0: 44775.6. Samples: 42226800. Policy #0 lag: (min: 1.0, avg: 12.1, max: 23.0) [2024-06-06 15:01:13,057][21617] Avg episode reward: [(0, '0.173')] [2024-06-06 15:01:15,077][21850] Updated weights for policy 0, policy_version 31454 (0.0026) [2024-06-06 15:01:16,735][21829] Signal inference workers to stop experience collection... 
(600 times)
[2024-06-06 15:01:16,761][21850] InferenceWorker_p0-w0: stopping experience collection (600 times)
[2024-06-06 15:01:16,789][21829] Signal inference workers to resume experience collection... (600 times)
[2024-06-06 15:01:16,796][21850] InferenceWorker_p0-w0: resuming experience collection (600 times)
[2024-06-06 15:01:18,056][21617] Fps is (10 sec: 45874.9, 60 sec: 45328.9, 300 sec: 44820.6). Total num frames: 515473408. Throughput: 0: 44814.9. Samples: 42503600. Policy #0 lag: (min: 1.0, avg: 12.1, max: 23.0)
[2024-06-06 15:01:18,056][21617] Avg episode reward: [(0, '0.183')]
[2024-06-06 15:01:18,404][21850] Updated weights for policy 0, policy_version 31464 (0.0023)
[2024-06-06 15:01:22,451][21850] Updated weights for policy 0, policy_version 31474 (0.0039)
[2024-06-06 15:01:23,056][21617] Fps is (10 sec: 44238.4, 60 sec: 44786.2, 300 sec: 44819.9). Total num frames: 515670016. Throughput: 0: 44759.0. Samples: 42767640. Policy #0 lag: (min: 1.0, avg: 12.1, max: 23.0)
[2024-06-06 15:01:23,056][21617] Avg episode reward: [(0, '0.177')]
[2024-06-06 15:01:25,869][21850] Updated weights for policy 0, policy_version 31484 (0.0021)
[2024-06-06 15:01:28,056][21617] Fps is (10 sec: 44236.3, 60 sec: 44509.7, 300 sec: 44875.5). Total num frames: 515915776. Throughput: 0: 44706.6. Samples: 42899660. Policy #0 lag: (min: 0.0, avg: 8.5, max: 21.0)
[2024-06-06 15:01:28,056][21617] Avg episode reward: [(0, '0.173')]
[2024-06-06 15:01:29,985][21850] Updated weights for policy 0, policy_version 31494 (0.0043)
[2024-06-06 15:01:33,001][21850] Updated weights for policy 0, policy_version 31504 (0.0037)
[2024-06-06 15:01:33,056][21617] Fps is (10 sec: 49150.3, 60 sec: 45055.6, 300 sec: 44875.4). Total num frames: 516161536. Throughput: 0: 44716.8. Samples: 43174640. Policy #0 lag: (min: 0.0, avg: 8.5, max: 21.0)
[2024-06-06 15:01:33,057][21617] Avg episode reward: [(0, '0.184')]
[2024-06-06 15:01:36,961][21850] Updated weights for policy 0, policy_version 31514 (0.0031)
[2024-06-06 15:01:38,055][21617] Fps is (10 sec: 44237.9, 60 sec: 44783.0, 300 sec: 44875.5). Total num frames: 516358144. Throughput: 0: 45097.3. Samples: 43452160. Policy #0 lag: (min: 0.0, avg: 8.5, max: 21.0)
[2024-06-06 15:01:38,056][21617] Avg episode reward: [(0, '0.178')]
[2024-06-06 15:01:40,336][21850] Updated weights for policy 0, policy_version 31524 (0.0030)
[2024-06-06 15:01:43,055][21617] Fps is (10 sec: 42600.3, 60 sec: 45056.1, 300 sec: 44820.0). Total num frames: 516587520. Throughput: 0: 45007.2. Samples: 43583700. Policy #0 lag: (min: 0.0, avg: 8.5, max: 21.0)
[2024-06-06 15:01:43,056][21617] Avg episode reward: [(0, '0.178')]
[2024-06-06 15:01:44,065][21850] Updated weights for policy 0, policy_version 31534 (0.0036)
[2024-06-06 15:01:47,678][21850] Updated weights for policy 0, policy_version 31544 (0.0029)
[2024-06-06 15:01:48,056][21617] Fps is (10 sec: 47512.6, 60 sec: 45055.8, 300 sec: 44876.2). Total num frames: 516833280. Throughput: 0: 44841.5. Samples: 43855960. Policy #0 lag: (min: 0.0, avg: 9.3, max: 20.0)
[2024-06-06 15:01:48,056][21617] Avg episode reward: [(0, '0.178')]
[2024-06-06 15:01:51,378][21850] Updated weights for policy 0, policy_version 31554 (0.0049)
[2024-06-06 15:01:53,056][21617] Fps is (10 sec: 45874.9, 60 sec: 45329.0, 300 sec: 44875.6). Total num frames: 517046272. Throughput: 0: 44973.7. Samples: 44124480. Policy #0 lag: (min: 0.0, avg: 9.3, max: 20.0)
[2024-06-06 15:01:53,056][21617] Avg episode reward: [(0, '0.172')]
[2024-06-06 15:01:55,086][21850] Updated weights for policy 0, policy_version 31564 (0.0032)
[2024-06-06 15:01:58,060][21617] Fps is (10 sec: 42580.5, 60 sec: 44780.1, 300 sec: 44819.3). Total num frames: 517259264. Throughput: 0: 45054.7. Samples: 44254440. Policy #0 lag: (min: 0.0, avg: 9.3, max: 20.0)
[2024-06-06 15:01:58,060][21617] Avg episode reward: [(0, '0.172')]
[2024-06-06 15:01:59,214][21850] Updated weights for policy 0, policy_version 31574 (0.0037)
[2024-06-06 15:02:02,347][21850] Updated weights for policy 0, policy_version 31584 (0.0028)
[2024-06-06 15:02:03,056][21617] Fps is (10 sec: 44236.6, 60 sec: 44513.0, 300 sec: 44819.9). Total num frames: 517488640. Throughput: 0: 44845.3. Samples: 44521640. Policy #0 lag: (min: 0.0, avg: 9.3, max: 20.0)
[2024-06-06 15:02:03,056][21617] Avg episode reward: [(0, '0.183')]
[2024-06-06 15:02:06,279][21850] Updated weights for policy 0, policy_version 31594 (0.0030)
[2024-06-06 15:02:08,055][21617] Fps is (10 sec: 45895.6, 60 sec: 45056.1, 300 sec: 44875.5). Total num frames: 517718016. Throughput: 0: 44959.7. Samples: 44790820. Policy #0 lag: (min: 0.0, avg: 11.0, max: 22.0)
[2024-06-06 15:02:08,056][21617] Avg episode reward: [(0, '0.174')]
[2024-06-06 15:02:09,594][21850] Updated weights for policy 0, policy_version 31604 (0.0030)
[2024-06-06 15:02:13,055][21617] Fps is (10 sec: 44237.5, 60 sec: 45056.4, 300 sec: 44820.0). Total num frames: 517931008. Throughput: 0: 45084.2. Samples: 44928440. Policy #0 lag: (min: 0.0, avg: 11.0, max: 22.0)
[2024-06-06 15:02:13,056][21617] Avg episode reward: [(0, '0.180')]
[2024-06-06 15:02:13,727][21850] Updated weights for policy 0, policy_version 31614 (0.0031)
[2024-06-06 15:02:16,887][21850] Updated weights for policy 0, policy_version 31624 (0.0029)
[2024-06-06 15:02:18,055][21617] Fps is (10 sec: 44236.7, 60 sec: 44783.0, 300 sec: 44820.0). Total num frames: 518160384. Throughput: 0: 45046.3. Samples: 45201700. Policy #0 lag: (min: 0.0, avg: 11.0, max: 22.0)
[2024-06-06 15:02:18,056][21617] Avg episode reward: [(0, '0.184')]
[2024-06-06 15:02:20,735][21850] Updated weights for policy 0, policy_version 31634 (0.0028)
[2024-06-06 15:02:23,055][21617] Fps is (10 sec: 45875.2, 60 sec: 45329.2, 300 sec: 44820.0). Total num frames: 518389760. Throughput: 0: 44769.8. Samples: 45466800. Policy #0 lag: (min: 0.0, avg: 11.0, max: 22.0)
[2024-06-06 15:02:23,064][21617] Avg episode reward: [(0, '0.173')]
[2024-06-06 15:02:24,096][21850] Updated weights for policy 0, policy_version 31644 (0.0036)
[2024-06-06 15:02:28,055][21617] Fps is (10 sec: 44236.7, 60 sec: 44783.1, 300 sec: 44820.0). Total num frames: 518602752. Throughput: 0: 44962.3. Samples: 45607000. Policy #0 lag: (min: 0.0, avg: 10.4, max: 22.0)
[2024-06-06 15:02:28,056][21617] Avg episode reward: [(0, '0.173')]
[2024-06-06 15:02:28,211][21850] Updated weights for policy 0, policy_version 31654 (0.0034)
[2024-06-06 15:02:31,395][21850] Updated weights for policy 0, policy_version 31664 (0.0033)
[2024-06-06 15:02:33,055][21617] Fps is (10 sec: 44236.9, 60 sec: 44510.2, 300 sec: 44820.0). Total num frames: 518832128. Throughput: 0: 44862.9. Samples: 45874780. Policy #0 lag: (min: 0.0, avg: 10.4, max: 22.0)
[2024-06-06 15:02:33,056][21617] Avg episode reward: [(0, '0.181')]
[2024-06-06 15:02:33,155][21829] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000031668_518848512.pth...
[2024-06-06 15:02:33,246][21829] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000031010_508067840.pth
[2024-06-06 15:02:35,207][21850] Updated weights for policy 0, policy_version 31674 (0.0027)
[2024-06-06 15:02:38,055][21617] Fps is (10 sec: 45875.1, 60 sec: 45056.0, 300 sec: 44764.4). Total num frames: 519061504. Throughput: 0: 44976.1. Samples: 46148400. Policy #0 lag: (min: 0.0, avg: 10.4, max: 22.0)
[2024-06-06 15:02:38,056][21617] Avg episode reward: [(0, '0.175')]
[2024-06-06 15:02:38,857][21850] Updated weights for policy 0, policy_version 31684 (0.0042)
[2024-06-06 15:02:42,905][21850] Updated weights for policy 0, policy_version 31694 (0.0044)
[2024-06-06 15:02:43,055][21617] Fps is (10 sec: 44236.8, 60 sec: 44783.0, 300 sec: 44931.0). Total num frames: 519274496. Throughput: 0: 45096.0. Samples: 46283560. Policy #0 lag: (min: 0.0, avg: 10.4, max: 22.0)
[2024-06-06 15:02:43,056][21617] Avg episode reward: [(0, '0.180')]
[2024-06-06 15:02:46,212][21850] Updated weights for policy 0, policy_version 31704 (0.0028)
[2024-06-06 15:02:48,056][21617] Fps is (10 sec: 44236.3, 60 sec: 44509.9, 300 sec: 44875.5). Total num frames: 519503872. Throughput: 0: 45030.2. Samples: 46548000. Policy #0 lag: (min: 0.0, avg: 10.6, max: 22.0)
[2024-06-06 15:02:48,056][21617] Avg episode reward: [(0, '0.175')]
[2024-06-06 15:02:48,179][21829] Signal inference workers to stop experience collection... (650 times)
[2024-06-06 15:02:48,181][21829] Signal inference workers to resume experience collection... (650 times)
[2024-06-06 15:02:48,195][21850] InferenceWorker_p0-w0: stopping experience collection (650 times)
[2024-06-06 15:02:48,223][21850] InferenceWorker_p0-w0: resuming experience collection (650 times)
[2024-06-06 15:02:50,027][21850] Updated weights for policy 0, policy_version 31714 (0.0030)
[2024-06-06 15:02:53,056][21617] Fps is (10 sec: 45874.5, 60 sec: 44782.9, 300 sec: 44819.9). Total num frames: 519733248. Throughput: 0: 45049.2. Samples: 46818040. Policy #0 lag: (min: 0.0, avg: 10.6, max: 22.0)
[2024-06-06 15:02:53,056][21617] Avg episode reward: [(0, '0.191')]
[2024-06-06 15:02:53,521][21850] Updated weights for policy 0, policy_version 31724 (0.0040)
[2024-06-06 15:02:57,321][21850] Updated weights for policy 0, policy_version 31734 (0.0039)
[2024-06-06 15:02:58,055][21617] Fps is (10 sec: 44237.6, 60 sec: 44786.3, 300 sec: 44931.0). Total num frames: 519946240. Throughput: 0: 44916.0. Samples: 46949660. Policy #0 lag: (min: 0.0, avg: 10.6, max: 22.0)
[2024-06-06 15:02:58,056][21617] Avg episode reward: [(0, '0.184')]
[2024-06-06 15:03:00,964][21850] Updated weights for policy 0, policy_version 31744 (0.0036)
[2024-06-06 15:03:03,056][21617] Fps is (10 sec: 45875.1, 60 sec: 45056.0, 300 sec: 44875.5). Total num frames: 520192000. Throughput: 0: 44951.0. Samples: 47224500. Policy #0 lag: (min: 0.0, avg: 10.6, max: 22.0)
[2024-06-06 15:03:03,056][21617] Avg episode reward: [(0, '0.178')]
[2024-06-06 15:03:04,514][21850] Updated weights for policy 0, policy_version 31754 (0.0021)
[2024-06-06 15:03:08,055][21617] Fps is (10 sec: 44236.7, 60 sec: 44509.9, 300 sec: 44820.0). Total num frames: 520388608. Throughput: 0: 44936.0. Samples: 47488920. Policy #0 lag: (min: 0.0, avg: 11.0, max: 21.0)
[2024-06-06 15:03:08,056][21617] Avg episode reward: [(0, '0.170')]
[2024-06-06 15:03:08,224][21850] Updated weights for policy 0, policy_version 31764 (0.0031)
[2024-06-06 15:03:11,955][21850] Updated weights for policy 0, policy_version 31774 (0.0026)
[2024-06-06 15:03:13,056][21617] Fps is (10 sec: 42597.9, 60 sec: 44782.7, 300 sec: 44875.5). Total num frames: 520617984. Throughput: 0: 44798.0. Samples: 47622920. Policy #0 lag: (min: 0.0, avg: 11.0, max: 21.0)
[2024-06-06 15:03:13,056][21617] Avg episode reward: [(0, '0.177')]
[2024-06-06 15:03:15,817][21850] Updated weights for policy 0, policy_version 31784 (0.0039)
[2024-06-06 15:03:18,056][21617] Fps is (10 sec: 45874.8, 60 sec: 44782.9, 300 sec: 44875.5). Total num frames: 520847360. Throughput: 0: 44813.7. Samples: 47891400. Policy #0 lag: (min: 0.0, avg: 11.0, max: 21.0)
[2024-06-06 15:03:18,056][21617] Avg episode reward: [(0, '0.178')]
[2024-06-06 15:03:19,363][21850] Updated weights for policy 0, policy_version 31794 (0.0036)
[2024-06-06 15:03:23,060][21617] Fps is (10 sec: 44218.0, 60 sec: 44506.5, 300 sec: 44874.8). Total num frames: 521060352. Throughput: 0: 44794.2. Samples: 48164340. Policy #0 lag: (min: 0.0, avg: 11.0, max: 21.0)
[2024-06-06 15:03:23,061][21617] Avg episode reward: [(0, '0.173')]
[2024-06-06 15:03:23,327][21850] Updated weights for policy 0, policy_version 31804 (0.0031)
[2024-06-06 15:03:26,507][21850] Updated weights for policy 0, policy_version 31814 (0.0040)
[2024-06-06 15:03:28,055][21617] Fps is (10 sec: 44236.9, 60 sec: 44782.9, 300 sec: 44931.0). Total num frames: 521289728. Throughput: 0: 44831.5. Samples: 48300980. Policy #0 lag: (min: 0.0, avg: 11.0, max: 21.0)
[2024-06-06 15:03:28,056][21617] Avg episode reward: [(0, '0.183')]
[2024-06-06 15:03:30,245][21850] Updated weights for policy 0, policy_version 31824 (0.0039)
[2024-06-06 15:03:33,055][21617] Fps is (10 sec: 45896.1, 60 sec: 44782.9, 300 sec: 44820.0). Total num frames: 521519104. Throughput: 0: 44935.3. Samples: 48570080. Policy #0 lag: (min: 0.0, avg: 11.1, max: 21.0)
[2024-06-06 15:03:33,056][21617] Avg episode reward: [(0, '0.177')]
[2024-06-06 15:03:33,543][21850] Updated weights for policy 0, policy_version 31834 (0.0025)
[2024-06-06 15:03:37,462][21850] Updated weights for policy 0, policy_version 31844 (0.0037)
[2024-06-06 15:03:38,056][21617] Fps is (10 sec: 45874.3, 60 sec: 44782.8, 300 sec: 44931.0). Total num frames: 521748480. Throughput: 0: 44857.7. Samples: 48836640. Policy #0 lag: (min: 0.0, avg: 11.1, max: 21.0)
[2024-06-06 15:03:38,056][21617] Avg episode reward: [(0, '0.183')]
[2024-06-06 15:03:41,094][21850] Updated weights for policy 0, policy_version 31854 (0.0033)
[2024-06-06 15:03:43,056][21617] Fps is (10 sec: 47512.9, 60 sec: 45329.0, 300 sec: 44986.6). Total num frames: 521994240. Throughput: 0: 44975.4. Samples: 48973560. Policy #0 lag: (min: 0.0, avg: 11.1, max: 21.0)
[2024-06-06 15:03:43,056][21617] Avg episode reward: [(0, '0.184')]
[2024-06-06 15:03:44,920][21850] Updated weights for policy 0, policy_version 31864 (0.0029)
[2024-06-06 15:03:48,056][21617] Fps is (10 sec: 44235.7, 60 sec: 44782.7, 300 sec: 44819.9). Total num frames: 522190848. Throughput: 0: 44767.7. Samples: 49239060. Policy #0 lag: (min: 0.0, avg: 11.1, max: 21.0)
[2024-06-06 15:03:48,057][21617] Avg episode reward: [(0, '0.189')]
[2024-06-06 15:03:48,265][21850] Updated weights for policy 0, policy_version 31874 (0.0031)
[2024-06-06 15:03:52,244][21850] Updated weights for policy 0, policy_version 31884 (0.0035)
[2024-06-06 15:03:53,056][21617] Fps is (10 sec: 42598.2, 60 sec: 44782.9, 300 sec: 44875.5). Total num frames: 522420224. Throughput: 0: 45004.3. Samples: 49514120. Policy #0 lag: (min: 0.0, avg: 11.4, max: 22.0)
[2024-06-06 15:03:53,056][21617] Avg episode reward: [(0, '0.181')]
[2024-06-06 15:03:55,543][21850] Updated weights for policy 0, policy_version 31894 (0.0028)
[2024-06-06 15:03:58,055][21617] Fps is (10 sec: 47516.4, 60 sec: 45329.1, 300 sec: 44986.6). Total num frames: 522665984. Throughput: 0: 44880.8. Samples: 49642540. Policy #0 lag: (min: 0.0, avg: 11.4, max: 22.0)
[2024-06-06 15:03:58,056][21617] Avg episode reward: [(0, '0.179')]
[2024-06-06 15:03:59,165][21850] Updated weights for policy 0, policy_version 31904 (0.0020)
[2024-06-06 15:04:02,536][21850] Updated weights for policy 0, policy_version 31914 (0.0026)
[2024-06-06 15:04:03,055][21617] Fps is (10 sec: 47514.3, 60 sec: 45056.1, 300 sec: 44987.3). Total num frames: 522895360. Throughput: 0: 45079.2. Samples: 49919960. Policy #0 lag: (min: 0.0, avg: 11.4, max: 22.0)
[2024-06-06 15:04:03,056][21617] Avg episode reward: [(0, '0.177')]
[2024-06-06 15:04:06,517][21850] Updated weights for policy 0, policy_version 31924 (0.0031)
[2024-06-06 15:04:08,060][21617] Fps is (10 sec: 42578.8, 60 sec: 45052.6, 300 sec: 44930.4). Total num frames: 523091968. Throughput: 0: 45088.5. Samples: 50193320. Policy #0 lag: (min: 0.0, avg: 11.4, max: 22.0)
[2024-06-06 15:04:08,061][21617] Avg episode reward: [(0, '0.173')]
[2024-06-06 15:04:10,000][21850] Updated weights for policy 0, policy_version 31934 (0.0029)
[2024-06-06 15:04:13,060][21617] Fps is (10 sec: 42579.3, 60 sec: 45052.8, 300 sec: 44930.4). Total num frames: 523321344. Throughput: 0: 44921.8. Samples: 50322660. Policy #0 lag: (min: 0.0, avg: 10.5, max: 22.0)
[2024-06-06 15:04:13,061][21617] Avg episode reward: [(0, '0.178')]
[2024-06-06 15:04:13,871][21850] Updated weights for policy 0, policy_version 31944 (0.0022)
[2024-06-06 15:04:17,006][21850] Updated weights for policy 0, policy_version 31954 (0.0040)
[2024-06-06 15:04:24,639][24114] Saving configuration to /workspace/metta/train_dir/p2.metta.4/config.json...
[2024-06-06 15:04:24,656][24114] Rollout worker 0 uses device cpu
[2024-06-06 15:04:24,656][24114] Rollout worker 1 uses device cpu
[2024-06-06 15:04:24,656][24114] Rollout worker 2 uses device cpu
[2024-06-06 15:04:24,656][24114] Rollout worker 3 uses device cpu
[2024-06-06 15:04:24,657][24114] Rollout worker 4 uses device cpu
[2024-06-06 15:04:24,657][24114] Rollout worker 5 uses device cpu
[2024-06-06 15:04:24,657][24114] Rollout worker 6 uses device cpu
[2024-06-06 15:04:24,657][24114] Rollout worker 7 uses device cpu
[2024-06-06 15:04:24,657][24114] Rollout worker 8 uses device cpu
[2024-06-06 15:04:24,657][24114] Rollout worker 9 uses device cpu
[2024-06-06 15:04:24,657][24114] Rollout worker 10 uses device cpu
[2024-06-06 15:04:24,657][24114] Rollout worker 11 uses device cpu
[2024-06-06 15:04:24,657][24114] Rollout worker 12 uses device cpu
[2024-06-06 15:04:24,657][24114] Rollout worker 13 uses device cpu
[2024-06-06 15:04:24,658][24114] Rollout worker 14 uses device cpu
[2024-06-06 15:04:24,658][24114] Rollout worker 15 uses device cpu
[2024-06-06 15:04:24,658][24114] Rollout worker 16 uses device cpu
[2024-06-06 15:04:24,658][24114] Rollout worker 17 uses device cpu
[2024-06-06 15:04:24,658][24114] Rollout worker 18 uses device cpu
[2024-06-06 15:04:24,658][24114] Rollout worker 19 uses device cpu
[2024-06-06 15:04:24,658][24114] Rollout worker 20 uses device cpu
[2024-06-06 15:04:24,658][24114] Rollout worker 21 uses device cpu
[2024-06-06 15:04:24,658][24114] Rollout worker 22 uses device cpu
[2024-06-06 15:04:24,659][24114] Rollout worker 23 uses device cpu
[2024-06-06 15:04:24,659][24114] Rollout worker 24 uses device cpu
[2024-06-06 15:04:24,659][24114] Rollout worker 25 uses device cpu
[2024-06-06 15:04:24,659][24114] Rollout worker 26 uses device cpu
[2024-06-06 15:04:24,659][24114] Rollout worker 27 uses device cpu
[2024-06-06 15:04:24,659][24114] Rollout worker 28 uses device cpu
[2024-06-06 15:04:24,659][24114] Rollout worker 29 uses device cpu
[2024-06-06 15:04:24,659][24114] Rollout worker 30 uses device cpu
[2024-06-06 15:04:24,659][24114] Rollout worker 31 uses device cpu
[2024-06-06 15:04:25,202][24114] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-06-06 15:04:25,202][24114] InferenceWorker_p0-w0: min num requests: 10
[2024-06-06 15:04:25,246][24114] Starting all processes...
[2024-06-06 15:04:25,246][24114] Starting process learner_proc0
[2024-06-06 15:04:25,515][24114] Starting all processes...
[2024-06-06 15:04:25,519][24114] Starting process inference_proc0-0
[2024-06-06 15:04:25,519][24114] Starting process rollout_proc0
[2024-06-06 15:04:25,519][24114] Starting process rollout_proc1
[2024-06-06 15:04:25,519][24114] Starting process rollout_proc2
[2024-06-06 15:04:25,519][24114] Starting process rollout_proc3
[2024-06-06 15:04:25,519][24114] Starting process rollout_proc4
[2024-06-06 15:04:25,519][24114] Starting process rollout_proc5
[2024-06-06 15:04:25,519][24114] Starting process rollout_proc6
[2024-06-06 15:04:25,519][24114] Starting process rollout_proc7
[2024-06-06 15:04:25,531][24114] Starting process rollout_proc21
[2024-06-06 15:04:25,520][24114] Starting process rollout_proc9
[2024-06-06 15:04:25,520][24114] Starting process rollout_proc10
[2024-06-06 15:04:25,522][24114] Starting process rollout_proc11
[2024-06-06 15:04:25,523][24114] Starting process rollout_proc12
[2024-06-06 15:04:25,525][24114] Starting process rollout_proc13
[2024-06-06 15:04:25,525][24114] Starting process rollout_proc14
[2024-06-06 15:04:25,525][24114] Starting process rollout_proc15
[2024-06-06 15:04:25,525][24114] Starting process rollout_proc16
[2024-06-06 15:04:25,526][24114] Starting process rollout_proc17
[2024-06-06 15:04:25,527][24114] Starting process rollout_proc18
[2024-06-06 15:04:25,527][24114] Starting process rollout_proc19
[2024-06-06 15:04:25,530][24114] Starting process rollout_proc20
[2024-06-06 15:04:25,520][24114] Starting process rollout_proc8
[2024-06-06 15:04:25,533][24114] Starting process rollout_proc22
[2024-06-06 15:04:25,534][24114] Starting process rollout_proc23
[2024-06-06 15:04:25,536][24114] Starting process rollout_proc24
[2024-06-06 15:04:25,538][24114] Starting process rollout_proc25
[2024-06-06 15:04:25,540][24114] Starting process rollout_proc26
[2024-06-06 15:04:25,540][24114] Starting process rollout_proc27
[2024-06-06 15:04:25,543][24114] Starting process rollout_proc28
[2024-06-06 15:04:25,544][24114] Starting process rollout_proc29
[2024-06-06 15:04:25,545][24114] Starting process rollout_proc30
[2024-06-06 15:04:25,548][24114] Starting process rollout_proc31
[2024-06-06 15:04:27,339][24350] Worker 3 uses CPU cores [3]
[2024-06-06 15:04:27,535][24354] Worker 21 uses CPU cores [21]
[2024-06-06 15:04:27,652][24355] Worker 9 uses CPU cores [9]
[2024-06-06 15:04:27,697][24352] Worker 5 uses CPU cores [5]
[2024-06-06 15:04:27,698][24364] Worker 16 uses CPU cores [16]
[2024-06-06 15:04:27,725][24326] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-06-06 15:04:27,725][24326] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2024-06-06 15:04:27,734][24326] Num visible devices: 1
[2024-06-06 15:04:27,748][24326] Setting fixed seed 0
[2024-06-06 15:04:27,749][24326] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-06-06 15:04:27,749][24326] Initializing actor-critic model on device cuda:0
[2024-06-06 15:04:27,760][24367] Worker 8 uses CPU cores [8]
[2024-06-06 15:04:27,764][24346] Worker 0 uses CPU cores [0]
[2024-06-06 15:04:27,780][24360] Worker 13 uses CPU cores [13]
[2024-06-06 15:04:27,784][24378] Worker 30 uses CPU cores [30]
[2024-06-06 15:04:27,788][24357] Worker 10 uses CPU cores [10]
[2024-06-06 15:04:27,800][24351] Worker 4 uses CPU cores [4]
[2024-06-06 15:04:27,818][24359] Worker 12 uses CPU cores [12]
[2024-06-06 15:04:27,832][24376] Worker 28 uses CPU cores [28]
[2024-06-06 15:04:27,840][24371] Worker 22 uses CPU cores [22]
[2024-06-06 15:04:27,844][24370] Worker 24 uses CPU cores [24]
[2024-06-06 15:04:27,860][24358] Worker 11 uses CPU cores [11]
[2024-06-06 15:04:27,888][24368] Worker 18 uses CPU cores [18]
[2024-06-06 15:04:27,903][24348] Worker 1 uses CPU cores [1]
[2024-06-06 15:04:27,923][24356] Worker 7 uses CPU cores [7]
[2024-06-06 15:04:27,994][24362] Worker 17 uses CPU cores [17]
[2024-06-06 15:04:28,008][24365] Worker 19 uses CPU cores [19]
[2024-06-06 15:04:28,020][24349] Worker 2 uses CPU cores [2]
[2024-06-06 15:04:28,024][24372] Worker 29 uses CPU cores [29]
[2024-06-06 15:04:28,033][24353] Worker 6 uses CPU cores [6]
[2024-06-06 15:04:28,047][24363] Worker 15 uses CPU cores [15]
[2024-06-06 15:04:28,047][24369] Worker 23 uses CPU cores [23]
[2024-06-06 15:04:28,048][24347] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-06-06 15:04:28,048][24347] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2024-06-06 15:04:28,053][24375] Worker 27 uses CPU cores [27]
[2024-06-06 15:04:28,056][24347] Num visible devices: 1
[2024-06-06 15:04:28,079][24361] Worker 14 uses CPU cores [14]
[2024-06-06 15:04:28,084][24366] Worker 20 uses CPU cores [20]
[2024-06-06 15:04:28,126][24373] Worker 25 uses CPU cores [25]
[2024-06-06 15:04:28,139][24377] Worker 31 uses CPU cores [31]
[2024-06-06 15:04:28,173][24374] Worker 26 uses CPU cores [26]
[2024-06-06 15:04:28,608][24326] RunningMeanStd input shape: (11, 11)
[2024-06-06 15:04:28,608][24326] RunningMeanStd input shape: (11, 11)
[2024-06-06 15:04:28,608][24326] RunningMeanStd input shape: (11, 11)
[2024-06-06 15:04:28,608][24326] RunningMeanStd input shape: (11, 11)
[2024-06-06 15:04:28,608][24326] RunningMeanStd input shape: (11, 11)
[2024-06-06 15:04:28,608][24326] RunningMeanStd input shape: (11, 11)
[2024-06-06 15:04:28,608][24326] RunningMeanStd input shape: (11, 11)
[2024-06-06 15:04:28,608][24326] RunningMeanStd input shape: (11, 11)
[2024-06-06 15:04:28,609][24326] RunningMeanStd input shape: (11, 11)
[2024-06-06 15:04:28,609][24326] RunningMeanStd input shape: (11, 11)
[2024-06-06 15:04:28,609][24326] RunningMeanStd input shape: (11, 11)
[2024-06-06 15:04:28,609][24326] RunningMeanStd input shape: (11, 11)
[2024-06-06 15:04:28,609][24326] RunningMeanStd input shape: (11, 11)
[2024-06-06 15:04:28,609][24326] RunningMeanStd input shape: (11, 11)
[2024-06-06 15:04:28,609][24326] RunningMeanStd input shape: (11, 11)
[2024-06-06 15:04:28,609][24326] RunningMeanStd input shape: (11, 11)
[2024-06-06 15:04:28,609][24326] RunningMeanStd input shape: (11, 11)
[2024-06-06 15:04:28,609][24326] RunningMeanStd input shape: (11, 11)
[2024-06-06 15:04:28,609][24326] RunningMeanStd input shape: (11, 11)
[2024-06-06 15:04:28,609][24326] RunningMeanStd input shape: (11, 11)
[2024-06-06 15:04:28,609][24326] RunningMeanStd input shape: (11, 11)
[2024-06-06 15:04:28,609][24326] RunningMeanStd input shape: (11, 11)
[2024-06-06 15:04:28,612][24326] RunningMeanStd input shape: (1,)
[2024-06-06 15:04:28,613][24326] RunningMeanStd input shape: (1,)
[2024-06-06 15:04:28,613][24326] RunningMeanStd input shape: (1,)
[2024-06-06 15:04:28,613][24326] RunningMeanStd input shape: (1,)
[2024-06-06 15:04:28,652][24326] RunningMeanStd input shape: (1,)
[2024-06-06 15:04:28,657][24326] Created Actor Critic model with architecture:
[2024-06-06 15:04:28,657][24326] SampleFactoryAgentWrapper(
  (obs_normalizer): ObservationNormalizer()
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (agent): MettaAgent(
    (_encoder): MultiFeatureSetEncoder(
      (feature_set_encoders): ModuleDict(
        (grid_obs): FeatureSetEncoder(
          (_normalizer): FeatureListNormalizer(
            (_norms_dict): ModuleDict(
              (agent): RunningMeanStdInPlace()
              (altar): RunningMeanStdInPlace()
              (converter): RunningMeanStdInPlace()
              (generator): RunningMeanStdInPlace()
              (wall): RunningMeanStdInPlace()
              (agent:dir): RunningMeanStdInPlace()
              (agent:energy): RunningMeanStdInPlace()
              (agent:frozen): RunningMeanStdInPlace()
              (agent:hp): RunningMeanStdInPlace()
              (agent:id): RunningMeanStdInPlace()
              (agent:inv_r1): RunningMeanStdInPlace()
              (agent:inv_r2): RunningMeanStdInPlace()
              (agent:inv_r3): RunningMeanStdInPlace()
              (agent:shield): RunningMeanStdInPlace()
              (altar:hp): RunningMeanStdInPlace()
              (altar:state): RunningMeanStdInPlace()
              (converter:hp): RunningMeanStdInPlace()
              (converter:state): RunningMeanStdInPlace()
              (generator:amount): RunningMeanStdInPlace()
              (generator:hp): RunningMeanStdInPlace()
              (generator:state): RunningMeanStdInPlace()
              (wall:hp): RunningMeanStdInPlace()
            )
          )
          (embedding_net): Sequential(
            (0): Linear(in_features=125, out_features=512, bias=True)
            (1): ELU(alpha=1.0)
            (2): Linear(in_features=512, out_features=512, bias=True)
            (3): ELU(alpha=1.0)
            (4): Linear(in_features=512, out_features=512, bias=True)
            (5): ELU(alpha=1.0)
            (6): Linear(in_features=512, out_features=512, bias=True)
            (7): ELU(alpha=1.0)
          )
        )
        (global_vars): FeatureSetEncoder(
          (_normalizer): FeatureListNormalizer(
            (_norms_dict): ModuleDict(
              (_steps): RunningMeanStdInPlace()
            )
          )
          (embedding_net): Sequential(
            (0): Linear(in_features=5, out_features=8, bias=True)
            (1): ELU(alpha=1.0)
            (2): Linear(in_features=8, out_features=8, bias=True)
            (3): ELU(alpha=1.0)
          )
        )
        (last_action): FeatureSetEncoder(
          (_normalizer): FeatureListNormalizer(
            (_norms_dict): ModuleDict(
              (last_action_id): RunningMeanStdInPlace()
              (last_action_val): RunningMeanStdInPlace()
            )
          )
          (embedding_net): Sequential(
            (0): Linear(in_features=5, out_features=8, bias=True)
            (1): ELU(alpha=1.0)
            (2): Linear(in_features=8, out_features=8, bias=True)
            (3): ELU(alpha=1.0)
          )
        )
        (last_reward): FeatureSetEncoder(
          (_normalizer): FeatureListNormalizer(
            (_norms_dict): ModuleDict(
              (last_reward): RunningMeanStdInPlace()
            )
          )
          (embedding_net): Sequential(
            (0): Linear(in_features=5, out_features=8, bias=True)
            (1): ELU(alpha=1.0)
            (2): Linear(in_features=8, out_features=8, bias=True)
            (3): ELU(alpha=1.0)
          )
        )
      )
      (merged_encoder): Sequential(
        (0): Linear(in_features=536, out_features=512, bias=True)
        (1): ELU(alpha=1.0)
        (2): Linear(in_features=512, out_features=512, bias=True)
        (3): ELU(alpha=1.0)
        (4): Linear(in_features=512, out_features=512, bias=True)
        (5): ELU(alpha=1.0)
      )
    )
    (_core): ModelCoreRNN(
      (core): GRU(512, 512)
    )
    (_decoder): Decoder(
      (mlp): Identity()
    )
    (_critic_linear): Linear(in_features=512, out_features=1, bias=True)
    (_action_parameterization): ActionParameterizationDefault(
      (distribution_linear): Linear(in_features=512, out_features=16, bias=True)
    )
  )
)
[2024-06-06 15:04:28,726][24326] Using optimizer
[2024-06-06 15:04:28,915][24326] Loading state from checkpoint /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000031668_518848512.pth...
[2024-06-06 15:04:28,929][24326] Loading model from checkpoint
[2024-06-06 15:04:28,930][24326] Loaded experiment state at self.train_step=31668, self.env_steps=518848512
[2024-06-06 15:04:28,931][24326] Initialized policy 0 weights for model version 31668
[2024-06-06 15:04:28,932][24326] LearnerWorker_p0 finished initialization!
[2024-06-06 15:04:28,932][24326] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-06-06 15:04:29,665][24347] RunningMeanStd input shape: (11, 11)
[2024-06-06 15:04:29,666][24347] RunningMeanStd input shape: (11, 11)
[2024-06-06 15:04:29,666][24347] RunningMeanStd input shape: (11, 11)
[2024-06-06 15:04:29,666][24347] RunningMeanStd input shape: (11, 11)
[2024-06-06 15:04:29,666][24347] RunningMeanStd input shape: (11, 11)
[2024-06-06 15:04:29,666][24347] RunningMeanStd input shape: (11, 11)
[2024-06-06 15:04:29,666][24347] RunningMeanStd input shape: (11, 11)
[2024-06-06 15:04:29,666][24347] RunningMeanStd input shape: (11, 11)
[2024-06-06 15:04:29,666][24347] RunningMeanStd input shape: (11, 11)
[2024-06-06 15:04:29,666][24347] RunningMeanStd input shape: (11, 11)
[2024-06-06 15:04:29,666][24347] RunningMeanStd input shape: (11, 11)
[2024-06-06 15:04:29,666][24347] RunningMeanStd input shape: (11, 11)
[2024-06-06 15:04:29,666][24347] RunningMeanStd input shape: (11, 11)
[2024-06-06 15:04:29,666][24347] RunningMeanStd input shape: (11, 11)
[2024-06-06 15:04:29,666][24347] RunningMeanStd input shape: (11, 11)
[2024-06-06 15:04:29,666][24347] RunningMeanStd input shape: (11, 11)
[2024-06-06 15:04:29,666][24347] RunningMeanStd input shape: (11, 11)
[2024-06-06 15:04:29,666][24347] RunningMeanStd input shape: (11, 11)
[2024-06-06 15:04:29,667][24347] RunningMeanStd input shape: (11, 11)
[2024-06-06 15:04:29,667][24347] RunningMeanStd input shape: (11, 11)
[2024-06-06 15:04:29,667][24347] RunningMeanStd input shape: (11, 11)
[2024-06-06 15:04:29,667][24347] RunningMeanStd input shape: (11, 11)
[2024-06-06 15:04:29,670][24347] RunningMeanStd input shape: (1,)
[2024-06-06 15:04:29,670][24347] RunningMeanStd input shape: (1,)
[2024-06-06 15:04:29,670][24347] RunningMeanStd input shape: (1,)
[2024-06-06 15:04:29,670][24347] RunningMeanStd input shape: (1,)
[2024-06-06 15:04:29,709][24347] RunningMeanStd input shape: (1,)
[2024-06-06 15:04:29,731][24114] Inference worker 0-0 is ready!
[2024-06-06 15:04:29,731][24114] All inference workers are ready! Signal rollout workers to start!
[2024-06-06 15:04:32,289][24369] Decorrelating experience for 0 frames...
[2024-06-06 15:04:32,293][24354] Decorrelating experience for 0 frames...
[2024-06-06 15:04:32,294][24372] Decorrelating experience for 0 frames...
[2024-06-06 15:04:32,299][24366] Decorrelating experience for 0 frames...
[2024-06-06 15:04:32,299][24368] Decorrelating experience for 0 frames...
[2024-06-06 15:04:32,300][24373] Decorrelating experience for 0 frames...
[2024-06-06 15:04:32,300][24374] Decorrelating experience for 0 frames...
[2024-06-06 15:04:32,301][24365] Decorrelating experience for 0 frames...
[2024-06-06 15:04:32,305][24371] Decorrelating experience for 0 frames...
[2024-06-06 15:04:32,311][24362] Decorrelating experience for 0 frames...
[2024-06-06 15:04:32,311][24378] Decorrelating experience for 0 frames...
[2024-06-06 15:04:32,318][24114] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 518848512. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-06-06 15:04:32,348][24377] Decorrelating experience for 0 frames...
[2024-06-06 15:04:32,349][24370] Decorrelating experience for 0 frames...
[2024-06-06 15:04:32,351][24376] Decorrelating experience for 0 frames...
[2024-06-06 15:04:32,366][24348] Decorrelating experience for 0 frames...
[2024-06-06 15:04:32,376][24352] Decorrelating experience for 0 frames... [2024-06-06 15:04:32,380][24360] Decorrelating experience for 0 frames... [2024-06-06 15:04:32,381][24356] Decorrelating experience for 0 frames... [2024-06-06 15:04:32,382][24355] Decorrelating experience for 0 frames... [2024-06-06 15:04:32,383][24350] Decorrelating experience for 0 frames... [2024-06-06 15:04:32,388][24367] Decorrelating experience for 0 frames... [2024-06-06 15:04:32,389][24363] Decorrelating experience for 0 frames... [2024-06-06 15:04:32,390][24359] Decorrelating experience for 0 frames... [2024-06-06 15:04:32,394][24351] Decorrelating experience for 0 frames... [2024-06-06 15:04:32,395][24349] Decorrelating experience for 0 frames... [2024-06-06 15:04:32,398][24361] Decorrelating experience for 0 frames... [2024-06-06 15:04:32,398][24346] Decorrelating experience for 0 frames... [2024-06-06 15:04:32,401][24358] Decorrelating experience for 0 frames... [2024-06-06 15:04:32,404][24357] Decorrelating experience for 0 frames... [2024-06-06 15:04:32,409][24353] Decorrelating experience for 0 frames... [2024-06-06 15:04:32,414][24375] Decorrelating experience for 0 frames... [2024-06-06 15:04:32,430][24364] Decorrelating experience for 0 frames... [2024-06-06 15:04:33,780][24369] Decorrelating experience for 256 frames... [2024-06-06 15:04:33,790][24372] Decorrelating experience for 256 frames... [2024-06-06 15:04:33,793][24354] Decorrelating experience for 256 frames... [2024-06-06 15:04:33,825][24366] Decorrelating experience for 256 frames... [2024-06-06 15:04:33,832][24373] Decorrelating experience for 256 frames... [2024-06-06 15:04:33,834][24374] Decorrelating experience for 256 frames... [2024-06-06 15:04:33,835][24368] Decorrelating experience for 256 frames... [2024-06-06 15:04:33,839][24365] Decorrelating experience for 256 frames... [2024-06-06 15:04:33,839][24371] Decorrelating experience for 256 frames... 
[2024-06-06 15:04:33,850][24362] Decorrelating experience for 256 frames...
[2024-06-06 15:04:33,858][24378] Decorrelating experience for 256 frames...
[2024-06-06 15:04:33,884][24348] Decorrelating experience for 256 frames...
[2024-06-06 15:04:33,916][24370] Decorrelating experience for 256 frames...
[2024-06-06 15:04:33,918][24377] Decorrelating experience for 256 frames...
[2024-06-06 15:04:33,929][24356] Decorrelating experience for 256 frames...
[2024-06-06 15:04:33,931][24352] Decorrelating experience for 256 frames...
[2024-06-06 15:04:33,933][24350] Decorrelating experience for 256 frames...
[2024-06-06 15:04:33,939][24355] Decorrelating experience for 256 frames...
[2024-06-06 15:04:33,940][24360] Decorrelating experience for 256 frames...
[2024-06-06 15:04:33,942][24346] Decorrelating experience for 256 frames...
[2024-06-06 15:04:33,946][24363] Decorrelating experience for 256 frames...
[2024-06-06 15:04:33,950][24367] Decorrelating experience for 256 frames...
[2024-06-06 15:04:33,952][24349] Decorrelating experience for 256 frames...
[2024-06-06 15:04:33,958][24359] Decorrelating experience for 256 frames...
[2024-06-06 15:04:33,959][24351] Decorrelating experience for 256 frames...
[2024-06-06 15:04:33,963][24361] Decorrelating experience for 256 frames...
[2024-06-06 15:04:33,968][24376] Decorrelating experience for 256 frames...
[2024-06-06 15:04:33,974][24358] Decorrelating experience for 256 frames...
[2024-06-06 15:04:33,974][24353] Decorrelating experience for 256 frames...
[2024-06-06 15:04:33,976][24357] Decorrelating experience for 256 frames...
[2024-06-06 15:04:34,004][24375] Decorrelating experience for 256 frames...
[2024-06-06 15:04:34,010][24364] Decorrelating experience for 256 frames...
[2024-06-06 15:04:37,318][24114] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 518848512. Throughput: 0: 7260.1. Samples: 36300. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-06-06 15:04:40,339][24354] Worker 21, sleep for 98.438 sec to decorrelate experience collection
[2024-06-06 15:04:40,340][24374] Worker 26, sleep for 121.875 sec to decorrelate experience collection
[2024-06-06 15:04:40,349][24361] Worker 14, sleep for 65.625 sec to decorrelate experience collection
[2024-06-06 15:04:40,350][24371] Worker 22, sleep for 103.125 sec to decorrelate experience collection
[2024-06-06 15:04:40,350][24372] Worker 29, sleep for 135.938 sec to decorrelate experience collection
[2024-06-06 15:04:40,361][24367] Worker 8, sleep for 37.500 sec to decorrelate experience collection
[2024-06-06 15:04:40,362][24366] Worker 20, sleep for 93.750 sec to decorrelate experience collection
[2024-06-06 15:04:40,362][24373] Worker 25, sleep for 117.188 sec to decorrelate experience collection
[2024-06-06 15:04:40,371][24355] Worker 9, sleep for 42.188 sec to decorrelate experience collection
[2024-06-06 15:04:40,372][24360] Worker 13, sleep for 60.938 sec to decorrelate experience collection
[2024-06-06 15:04:40,373][24369] Worker 23, sleep for 107.812 sec to decorrelate experience collection
[2024-06-06 15:04:40,373][24368] Worker 18, sleep for 84.375 sec to decorrelate experience collection
[2024-06-06 15:04:40,379][24359] Worker 12, sleep for 56.250 sec to decorrelate experience collection
[2024-06-06 15:04:40,381][24358] Worker 11, sleep for 51.562 sec to decorrelate experience collection
[2024-06-06 15:04:40,387][24363] Worker 15, sleep for 70.312 sec to decorrelate experience collection
[2024-06-06 15:04:40,387][24357] Worker 10, sleep for 46.875 sec to decorrelate experience collection
[2024-06-06 15:04:40,389][24350] Worker 3, sleep for 14.062 sec to decorrelate experience collection
[2024-06-06 15:04:40,390][24362] Worker 17, sleep for 79.688 sec to decorrelate experience collection
[2024-06-06 15:04:40,404][24377] Worker 31, sleep for 145.312 sec to decorrelate experience collection
[2024-06-06 15:04:40,409][24349] Worker 2, sleep for 9.375 sec to decorrelate experience collection
[2024-06-06 15:04:40,409][24356] Worker 7, sleep for 32.812 sec to decorrelate experience collection
[2024-06-06 15:04:40,411][24378] Worker 30, sleep for 140.625 sec to decorrelate experience collection
[2024-06-06 15:04:40,437][24348] Worker 1, sleep for 4.688 sec to decorrelate experience collection
[2024-06-06 15:04:40,438][24365] Worker 19, sleep for 89.062 sec to decorrelate experience collection
[2024-06-06 15:04:40,487][24352] Worker 5, sleep for 23.438 sec to decorrelate experience collection
[2024-06-06 15:04:40,487][24375] Worker 27, sleep for 126.562 sec to decorrelate experience collection
[2024-06-06 15:04:40,493][24326] Signal inference workers to stop experience collection...
[2024-06-06 15:04:40,497][24364] Worker 16, sleep for 75.000 sec to decorrelate experience collection
[2024-06-06 15:04:40,517][24347] InferenceWorker_p0-w0: stopping experience collection
[2024-06-06 15:04:40,522][24370] Worker 24, sleep for 112.500 sec to decorrelate experience collection
[2024-06-06 15:04:40,522][24376] Worker 28, sleep for 131.250 sec to decorrelate experience collection
[2024-06-06 15:04:40,551][24353] Worker 6, sleep for 28.125 sec to decorrelate experience collection
[2024-06-06 15:04:41,049][24326] Signal inference workers to resume experience collection...
[2024-06-06 15:04:41,050][24347] InferenceWorker_p0-w0: resuming experience collection
[2024-06-06 15:04:41,084][24351] Worker 4, sleep for 18.750 sec to decorrelate experience collection
[2024-06-06 15:04:42,159][24347] Updated weights for policy 0, policy_version 31678 (0.0011)
[2024-06-06 15:04:42,318][24114] Fps is (10 sec: 16383.9, 60 sec: 16383.9, 300 sec: 16383.9). Total num frames: 519012352. Throughput: 0: 32843.8. Samples: 328440. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2024-06-06 15:04:45,148][24348] Worker 1 awakens!
[2024-06-06 15:04:45,198][24114] Heartbeat connected on Batcher_0
[2024-06-06 15:04:45,200][24114] Heartbeat connected on LearnerWorker_p0
[2024-06-06 15:04:45,216][24114] Heartbeat connected on RolloutWorker_w0
[2024-06-06 15:04:45,216][24114] Heartbeat connected on RolloutWorker_w1
[2024-06-06 15:04:45,259][24114] Heartbeat connected on InferenceWorker_p0-w0
[2024-06-06 15:04:47,318][24114] Fps is (10 sec: 16383.9, 60 sec: 10922.6, 300 sec: 10922.6). Total num frames: 519012352. Throughput: 0: 22109.3. Samples: 331640. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2024-06-06 15:04:49,831][24349] Worker 2 awakens!
[2024-06-06 15:04:49,841][24114] Heartbeat connected on RolloutWorker_w2
[2024-06-06 15:04:52,318][24114] Fps is (10 sec: 1638.4, 60 sec: 9011.3, 300 sec: 9011.3). Total num frames: 519028736. Throughput: 0: 17345.1. Samples: 346900. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2024-06-06 15:04:54,522][24350] Worker 3 awakens!
[2024-06-06 15:04:54,535][24114] Heartbeat connected on RolloutWorker_w3
[2024-06-06 15:04:57,318][24114] Fps is (10 sec: 3276.8, 60 sec: 7864.3, 300 sec: 7864.3). Total num frames: 519045120. Throughput: 0: 14848.8. Samples: 371220. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2024-06-06 15:04:59,928][24351] Worker 4 awakens!
[2024-06-06 15:04:59,933][24114] Heartbeat connected on RolloutWorker_w4
[2024-06-06 15:05:02,318][24114] Fps is (10 sec: 6553.5, 60 sec: 8192.0, 300 sec: 8192.0). Total num frames: 519094272. Throughput: 0: 12837.4. Samples: 385120. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2024-06-06 15:05:02,318][24114] Avg episode reward: [(0, '0.129')]
[2024-06-06 15:05:04,027][24352] Worker 5 awakens!
[2024-06-06 15:05:04,034][24114] Heartbeat connected on RolloutWorker_w5
[2024-06-06 15:05:07,318][24114] Fps is (10 sec: 9830.6, 60 sec: 8426.1, 300 sec: 8426.1). Total num frames: 519143424. Throughput: 0: 13194.3. Samples: 461800. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
[2024-06-06 15:05:07,325][24114] Avg episode reward: [(0, '0.150')]
[2024-06-06 15:05:07,895][24347] Updated weights for policy 0, policy_version 31688 (0.0016)
[2024-06-06 15:05:08,776][24353] Worker 6 awakens!
[2024-06-06 15:05:08,780][24114] Heartbeat connected on RolloutWorker_w6
[2024-06-06 15:05:12,318][24114] Fps is (10 sec: 16384.1, 60 sec: 10240.0, 300 sec: 10240.0). Total num frames: 519258112. Throughput: 0: 14165.0. Samples: 566600. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
[2024-06-06 15:05:12,318][24114] Avg episode reward: [(0, '0.162')]
[2024-06-06 15:05:13,248][24356] Worker 7 awakens!
[2024-06-06 15:05:13,255][24114] Heartbeat connected on RolloutWorker_w7
[2024-06-06 15:05:15,361][24347] Updated weights for policy 0, policy_version 31698 (0.0011)
[2024-06-06 15:05:17,318][24114] Fps is (10 sec: 21299.0, 60 sec: 11286.8, 300 sec: 11286.8). Total num frames: 519356416. Throughput: 0: 14121.3. Samples: 635460. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
[2024-06-06 15:05:17,318][24114] Avg episode reward: [(0, '0.163')]
[2024-06-06 15:05:17,960][24367] Worker 8 awakens!
[2024-06-06 15:05:17,964][24114] Heartbeat connected on RolloutWorker_w8
[2024-06-06 15:05:22,318][24114] Fps is (10 sec: 21299.1, 60 sec: 12451.9, 300 sec: 12451.9). Total num frames: 519471104. Throughput: 0: 16287.6. Samples: 769240. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
[2024-06-06 15:05:22,325][24114] Avg episode reward: [(0, '0.165')]
[2024-06-06 15:05:22,658][24355] Worker 9 awakens!
[2024-06-06 15:05:22,665][24114] Heartbeat connected on RolloutWorker_w9
[2024-06-06 15:05:22,919][24347] Updated weights for policy 0, policy_version 31708 (0.0011)
[2024-06-06 15:05:27,318][24114] Fps is (10 sec: 26214.2, 60 sec: 14000.9, 300 sec: 14000.9). Total num frames: 519618560. Throughput: 0: 13108.9. Samples: 918340. Policy #0 lag: (min: 0.0, avg: 4.1, max: 7.0)
[2024-06-06 15:05:27,318][24114] Avg episode reward: [(0, '0.170')]
[2024-06-06 15:05:27,360][24357] Worker 10 awakens!
[2024-06-06 15:05:27,365][24114] Heartbeat connected on RolloutWorker_w10
[2024-06-06 15:05:29,341][24347] Updated weights for policy 0, policy_version 31718 (0.0018)
[2024-06-06 15:05:32,040][24358] Worker 11 awakens!
[2024-06-06 15:05:32,048][24114] Heartbeat connected on RolloutWorker_w11
[2024-06-06 15:05:32,318][24114] Fps is (10 sec: 29490.8, 60 sec: 15291.7, 300 sec: 15291.7). Total num frames: 519766016. Throughput: 0: 15112.4. Samples: 1011700. Policy #0 lag: (min: 0.0, avg: 4.1, max: 7.0)
[2024-06-06 15:05:32,318][24114] Avg episode reward: [(0, '0.172')]
[2024-06-06 15:05:34,105][24347] Updated weights for policy 0, policy_version 31728 (0.0021)
[2024-06-06 15:05:36,728][24359] Worker 12 awakens!
[2024-06-06 15:05:36,733][24114] Heartbeat connected on RolloutWorker_w12
[2024-06-06 15:05:37,318][24114] Fps is (10 sec: 29491.3, 60 sec: 17749.3, 300 sec: 16384.0). Total num frames: 519913472. Throughput: 0: 19006.2. Samples: 1202180. Policy #0 lag: (min: 0.0, avg: 4.1, max: 7.0)
[2024-06-06 15:05:37,325][24114] Avg episode reward: [(0, '0.185')]
[2024-06-06 15:05:38,727][24347] Updated weights for policy 0, policy_version 31738 (0.0015)
[2024-06-06 15:05:41,412][24360] Worker 13 awakens!
[2024-06-06 15:05:41,442][24114] Heartbeat connected on RolloutWorker_w13
[2024-06-06 15:05:42,318][24114] Fps is (10 sec: 32768.1, 60 sec: 18022.4, 300 sec: 17788.3). Total num frames: 520093696. Throughput: 0: 22973.4. Samples: 1405020. Policy #0 lag: (min: 0.0, avg: 4.1, max: 7.0)
[2024-06-06 15:05:42,318][24114] Avg episode reward: [(0, '0.179')]
[2024-06-06 15:05:44,026][24347] Updated weights for policy 0, policy_version 31748 (0.0020)
[2024-06-06 15:05:46,074][24361] Worker 14 awakens!
[2024-06-06 15:05:46,080][24114] Heartbeat connected on RolloutWorker_w14
[2024-06-06 15:05:47,318][24114] Fps is (10 sec: 36044.7, 60 sec: 21026.1, 300 sec: 19005.4). Total num frames: 520273920. Throughput: 0: 24985.3. Samples: 1509460. Policy #0 lag: (min: 0.0, avg: 3.5, max: 9.0)
[2024-06-06 15:05:47,318][24114] Avg episode reward: [(0, '0.182')]
[2024-06-06 15:05:48,771][24347] Updated weights for policy 0, policy_version 31758 (0.0026)
[2024-06-06 15:05:50,784][24363] Worker 15 awakens!
[2024-06-06 15:05:50,792][24114] Heartbeat connected on RolloutWorker_w15
[2024-06-06 15:05:52,318][24114] Fps is (10 sec: 36044.9, 60 sec: 23756.7, 300 sec: 20070.4). Total num frames: 520454144. Throughput: 0: 27870.1. Samples: 1715960. Policy #0 lag: (min: 0.0, avg: 3.5, max: 9.0)
[2024-06-06 15:05:52,318][24114] Avg episode reward: [(0, '0.177')]
[2024-06-06 15:05:53,152][24347] Updated weights for policy 0, policy_version 31768 (0.0022)
[2024-06-06 15:05:55,596][24364] Worker 16 awakens!
[2024-06-06 15:05:55,605][24114] Heartbeat connected on RolloutWorker_w16
[2024-06-06 15:05:57,318][24114] Fps is (10 sec: 34406.3, 60 sec: 26214.4, 300 sec: 20817.3). Total num frames: 520617984. Throughput: 0: 30059.4. Samples: 1919280. Policy #0 lag: (min: 0.0, avg: 3.5, max: 9.0)
[2024-06-06 15:05:57,319][24114] Avg episode reward: [(0, '0.167')]
[2024-06-06 15:05:58,324][24347] Updated weights for policy 0, policy_version 31778 (0.0023)
[2024-06-06 15:06:00,178][24362] Worker 17 awakens!
[2024-06-06 15:06:00,188][24114] Heartbeat connected on RolloutWorker_w17
[2024-06-06 15:06:02,318][24114] Fps is (10 sec: 34406.2, 60 sec: 28398.9, 300 sec: 21663.3). Total num frames: 520798208. Throughput: 0: 31114.6. Samples: 2035620. Policy #0 lag: (min: 0.0, avg: 3.5, max: 9.0)
[2024-06-06 15:06:02,318][24114] Avg episode reward: [(0, '0.166')]
[2024-06-06 15:06:02,537][24347] Updated weights for policy 0, policy_version 31788 (0.0021)
[2024-06-06 15:06:04,848][24368] Worker 18 awakens!
[2024-06-06 15:06:04,857][24114] Heartbeat connected on RolloutWorker_w18
[2024-06-06 15:06:06,378][24347] Updated weights for policy 0, policy_version 31798 (0.0026)
[2024-06-06 15:06:07,318][24114] Fps is (10 sec: 36045.2, 60 sec: 30583.4, 300 sec: 22420.2). Total num frames: 520978432. Throughput: 0: 32858.2. Samples: 2247860. Policy #0 lag: (min: 1.0, avg: 8.3, max: 14.0)
[2024-06-06 15:06:07,318][24114] Avg episode reward: [(0, '0.166')]
[2024-06-06 15:06:09,602][24365] Worker 19 awakens!
[2024-06-06 15:06:09,612][24114] Heartbeat connected on RolloutWorker_w19
[2024-06-06 15:06:10,917][24347] Updated weights for policy 0, policy_version 31808 (0.0020)
[2024-06-06 15:06:12,318][24114] Fps is (10 sec: 39321.3, 60 sec: 32221.7, 300 sec: 23429.1). Total num frames: 521191424. Throughput: 0: 34580.4. Samples: 2474460. Policy #0 lag: (min: 1.0, avg: 8.3, max: 14.0)
[2024-06-06 15:06:12,319][24114] Avg episode reward: [(0, '0.168')]
[2024-06-06 15:06:14,212][24366] Worker 20 awakens!
[2024-06-06 15:06:14,223][24114] Heartbeat connected on RolloutWorker_w20
[2024-06-06 15:06:15,629][24347] Updated weights for policy 0, policy_version 31818 (0.0023)
[2024-06-06 15:06:17,318][24114] Fps is (10 sec: 40960.4, 60 sec: 33860.3, 300 sec: 24185.9). Total num frames: 521388032. Throughput: 0: 35303.3. Samples: 2600340. Policy #0 lag: (min: 1.0, avg: 8.3, max: 14.0)
[2024-06-06 15:06:17,318][24114] Avg episode reward: [(0, '0.163')]
[2024-06-06 15:06:18,877][24354] Worker 21 awakens!
[2024-06-06 15:06:18,889][24114] Heartbeat connected on RolloutWorker_w21
[2024-06-06 15:06:19,697][24347] Updated weights for policy 0, policy_version 31828 (0.0035)
[2024-06-06 15:06:22,318][24114] Fps is (10 sec: 40959.9, 60 sec: 35498.5, 300 sec: 25022.8). Total num frames: 521601024. Throughput: 0: 36454.1. Samples: 2842620. Policy #0 lag: (min: 1.0, avg: 8.3, max: 14.0)
[2024-06-06 15:06:22,319][24114] Avg episode reward: [(0, '0.180')]
[2024-06-06 15:06:22,330][24326] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000031836_521601024.pth...
[2024-06-06 15:06:22,385][24326] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000031340_513474560.pth
[2024-06-06 15:06:23,451][24347] Updated weights for policy 0, policy_version 31838 (0.0025)
[2024-06-06 15:06:23,576][24371] Worker 22 awakens!
[2024-06-06 15:06:23,587][24114] Heartbeat connected on RolloutWorker_w22
[2024-06-06 15:06:27,280][24347] Updated weights for policy 0, policy_version 31848 (0.0034)
[2024-06-06 15:06:27,318][24114] Fps is (10 sec: 40959.0, 60 sec: 36317.8, 300 sec: 25644.5). Total num frames: 521797632. Throughput: 0: 37290.1. Samples: 3083080. Policy #0 lag: (min: 0.0, avg: 5.4, max: 14.0)
[2024-06-06 15:06:27,319][24114] Avg episode reward: [(0, '0.169')]
[2024-06-06 15:06:28,284][24369] Worker 23 awakens!
[2024-06-06 15:06:28,296][24114] Heartbeat connected on RolloutWorker_w23
[2024-06-06 15:06:31,904][24347] Updated weights for policy 0, policy_version 31858 (0.0032)
[2024-06-06 15:06:32,318][24114] Fps is (10 sec: 36045.3, 60 sec: 36591.0, 300 sec: 25941.3). Total num frames: 521961472. Throughput: 0: 37675.1. Samples: 3204840. Policy #0 lag: (min: 0.0, avg: 5.4, max: 14.0)
[2024-06-06 15:06:32,318][24114] Avg episode reward: [(0, '0.179')]
[2024-06-06 15:06:33,120][24370] Worker 24 awakens!
[2024-06-06 15:06:33,131][24114] Heartbeat connected on RolloutWorker_w24
[2024-06-06 15:06:35,321][24347] Updated weights for policy 0, policy_version 31868 (0.0018)
[2024-06-06 15:06:37,318][24114] Fps is (10 sec: 37683.8, 60 sec: 37683.2, 300 sec: 26607.6). Total num frames: 522174464. Throughput: 0: 38718.2. Samples: 3458280. Policy #0 lag: (min: 0.0, avg: 5.4, max: 14.0)
[2024-06-06 15:06:37,318][24114] Avg episode reward: [(0, '0.177')]
[2024-06-06 15:06:37,615][24373] Worker 25 awakens!
[2024-06-06 15:06:37,627][24114] Heartbeat connected on RolloutWorker_w25
[2024-06-06 15:06:39,385][24347] Updated weights for policy 0, policy_version 31878 (0.0042)
[2024-06-06 15:06:42,274][24374] Worker 26 awakens!
[2024-06-06 15:06:42,290][24114] Heartbeat connected on RolloutWorker_w26
[2024-06-06 15:06:42,318][24114] Fps is (10 sec: 42598.3, 60 sec: 38229.3, 300 sec: 27222.6). Total num frames: 522387456. Throughput: 0: 39733.4. Samples: 3707280. Policy #0 lag: (min: 0.0, avg: 5.4, max: 14.0)
[2024-06-06 15:06:42,318][24114] Avg episode reward: [(0, '0.175')]
[2024-06-06 15:06:43,522][24347] Updated weights for policy 0, policy_version 31888 (0.0026)
[2024-06-06 15:06:46,749][24347] Updated weights for policy 0, policy_version 31898 (0.0031)
[2024-06-06 15:06:47,098][24375] Worker 27 awakens!
[2024-06-06 15:06:47,111][24114] Heartbeat connected on RolloutWorker_w27
[2024-06-06 15:06:47,318][24114] Fps is (10 sec: 45875.4, 60 sec: 39321.7, 300 sec: 28034.9). Total num frames: 522633216. Throughput: 0: 40078.3. Samples: 3839140. Policy #0 lag: (min: 0.0, avg: 79.2, max: 230.0)
[2024-06-06 15:06:47,318][24114] Avg episode reward: [(0, '0.178')]
[2024-06-06 15:06:51,173][24347] Updated weights for policy 0, policy_version 31908 (0.0035)
[2024-06-06 15:06:51,872][24376] Worker 28 awakens!
[2024-06-06 15:06:51,883][24114] Heartbeat connected on RolloutWorker_w28
[2024-06-06 15:06:52,318][24114] Fps is (10 sec: 45875.1, 60 sec: 39867.7, 300 sec: 28555.0). Total num frames: 522846208. Throughput: 0: 41102.6. Samples: 4097480. Policy #0 lag: (min: 0.0, avg: 79.2, max: 230.0)
[2024-06-06 15:06:52,318][24114] Avg episode reward: [(0, '0.184')]
[2024-06-06 15:06:54,137][24347] Updated weights for policy 0, policy_version 31918 (0.0034)
[2024-06-06 15:06:56,388][24372] Worker 29 awakens!
[2024-06-06 15:06:56,402][24114] Heartbeat connected on RolloutWorker_w29
[2024-06-06 15:06:57,318][24114] Fps is (10 sec: 39320.7, 60 sec: 40140.7, 300 sec: 28813.2). Total num frames: 523026432. Throughput: 0: 41923.1. Samples: 4361000. Policy #0 lag: (min: 0.0, avg: 79.2, max: 230.0)
[2024-06-06 15:06:57,319][24114] Avg episode reward: [(0, '0.180')]
[2024-06-06 15:06:58,458][24347] Updated weights for policy 0, policy_version 31928 (0.0025)
[2024-06-06 15:07:01,136][24378] Worker 30 awakens!
[2024-06-06 15:07:01,151][24114] Heartbeat connected on RolloutWorker_w30
[2024-06-06 15:07:02,056][24347] Updated weights for policy 0, policy_version 31938 (0.0044)
[2024-06-06 15:07:02,318][24114] Fps is (10 sec: 42599.1, 60 sec: 41233.2, 300 sec: 29491.2). Total num frames: 523272192. Throughput: 0: 42009.3. Samples: 4490760. Policy #0 lag: (min: 0.0, avg: 79.2, max: 230.0)
[2024-06-06 15:07:02,318][24114] Avg episode reward: [(0, '0.170')]
[2024-06-06 15:07:05,816][24377] Worker 31 awakens!
[2024-06-06 15:07:05,828][24114] Heartbeat connected on RolloutWorker_w31
[2024-06-06 15:07:05,943][24347] Updated weights for policy 0, policy_version 31948 (0.0030)
[2024-06-06 15:07:07,318][24114] Fps is (10 sec: 47514.7, 60 sec: 42052.3, 300 sec: 30019.7). Total num frames: 523501568. Throughput: 0: 42450.4. Samples: 4752880. Policy #0 lag: (min: 0.0, avg: 79.2, max: 230.0)
[2024-06-06 15:07:07,318][24114] Avg episode reward: [(0, '0.168')]
[2024-06-06 15:07:09,468][24347] Updated weights for policy 0, policy_version 31958 (0.0028)
[2024-06-06 15:07:12,318][24114] Fps is (10 sec: 42597.7, 60 sec: 41779.2, 300 sec: 30310.4). Total num frames: 523698176. Throughput: 0: 43082.3. Samples: 5021780. Policy #0 lag: (min: 0.0, avg: 11.0, max: 22.0)
[2024-06-06 15:07:12,319][24114] Avg episode reward: [(0, '0.175')]
[2024-06-06 15:07:13,486][24347] Updated weights for policy 0, policy_version 31968 (0.0035)
[2024-06-06 15:07:16,481][24347] Updated weights for policy 0, policy_version 31978 (0.0037)
[2024-06-06 15:07:17,322][24114] Fps is (10 sec: 44219.0, 60 sec: 42595.5, 300 sec: 30880.6). Total num frames: 523943936. Throughput: 0: 43243.8. Samples: 5150980. Policy #0 lag: (min: 0.0, avg: 11.0, max: 22.0)
[2024-06-06 15:07:17,322][24114] Avg episode reward: [(0, '0.165')]
[2024-06-06 15:07:20,630][24347] Updated weights for policy 0, policy_version 31988 (0.0044)
[2024-06-06 15:07:22,318][24114] Fps is (10 sec: 47513.9, 60 sec: 42871.6, 300 sec: 31322.4). Total num frames: 524173312. Throughput: 0: 43714.6. Samples: 5425440. Policy #0 lag: (min: 0.0, avg: 11.0, max: 22.0)
[2024-06-06 15:07:22,318][24114] Avg episode reward: [(0, '0.180')]
[2024-06-06 15:07:23,672][24347] Updated weights for policy 0, policy_version 31998 (0.0046)
[2024-06-06 15:07:27,318][24114] Fps is (10 sec: 44254.0, 60 sec: 43144.6, 300 sec: 31644.5). Total num frames: 524386304. Throughput: 0: 44256.4. Samples: 5698820. Policy #0 lag: (min: 0.0, avg: 11.0, max: 22.0)
[2024-06-06 15:07:27,319][24114] Avg episode reward: [(0, '0.182')]
[2024-06-06 15:07:27,641][24347] Updated weights for policy 0, policy_version 32008 (0.0040)
[2024-06-06 15:07:31,106][24347] Updated weights for policy 0, policy_version 32018 (0.0033)
[2024-06-06 15:07:32,318][24114] Fps is (10 sec: 40959.9, 60 sec: 43690.7, 300 sec: 31857.8). Total num frames: 524582912. Throughput: 0: 44109.2. Samples: 5824060. Policy #0 lag: (min: 0.0, avg: 10.4, max: 22.0)
[2024-06-06 15:07:32,318][24114] Avg episode reward: [(0, '0.180')]
[2024-06-06 15:07:35,214][24326] Signal inference workers to stop experience collection... (50 times)
[2024-06-06 15:07:35,271][24347] InferenceWorker_p0-w0: stopping experience collection (50 times)
[2024-06-06 15:07:35,323][24326] Signal inference workers to resume experience collection... (50 times)
[2024-06-06 15:07:35,324][24347] InferenceWorker_p0-w0: resuming experience collection (50 times)
[2024-06-06 15:07:35,327][24347] Updated weights for policy 0, policy_version 32028 (0.0030)
[2024-06-06 15:07:37,318][24114] Fps is (10 sec: 45875.4, 60 sec: 44509.8, 300 sec: 32413.8). Total num frames: 524845056. Throughput: 0: 44276.0. Samples: 6089900. Policy #0 lag: (min: 0.0, avg: 10.4, max: 22.0)
[2024-06-06 15:07:37,318][24114] Avg episode reward: [(0, '0.184')]
[2024-06-06 15:07:38,578][24347] Updated weights for policy 0, policy_version 32038 (0.0026)
[2024-06-06 15:07:42,318][24114] Fps is (10 sec: 47513.5, 60 sec: 44509.9, 300 sec: 32681.8). Total num frames: 525058048. Throughput: 0: 44536.6. Samples: 6365140. Policy #0 lag: (min: 0.0, avg: 10.4, max: 22.0)
[2024-06-06 15:07:42,319][24114] Avg episode reward: [(0, '0.183')]
[2024-06-06 15:07:42,875][24347] Updated weights for policy 0, policy_version 32048 (0.0041)
[2024-06-06 15:07:45,961][24347] Updated weights for policy 0, policy_version 32058 (0.0034)
[2024-06-06 15:07:47,318][24114] Fps is (10 sec: 42598.5, 60 sec: 43963.7, 300 sec: 32936.0). Total num frames: 525271040. Throughput: 0: 44560.8. Samples: 6496000. Policy #0 lag: (min: 0.0, avg: 10.4, max: 22.0)
[2024-06-06 15:07:47,318][24114] Avg episode reward: [(0, '0.175')]
[2024-06-06 15:07:49,969][24347] Updated weights for policy 0, policy_version 32068 (0.0053)
[2024-06-06 15:07:52,320][24114] Fps is (10 sec: 44229.8, 60 sec: 44235.6, 300 sec: 33259.3). Total num frames: 525500416. Throughput: 0: 44632.6. Samples: 6761420. Policy #0 lag: (min: 0.0, avg: 9.8, max: 20.0)
[2024-06-06 15:07:52,320][24114] Avg episode reward: [(0, '0.186')]
[2024-06-06 15:07:53,096][24347] Updated weights for policy 0, policy_version 32078 (0.0033)
[2024-06-06 15:07:57,096][24347] Updated weights for policy 0, policy_version 32088 (0.0031)
[2024-06-06 15:07:57,318][24114] Fps is (10 sec: 45874.6, 60 sec: 45056.0, 300 sec: 33567.2). Total num frames: 525729792. Throughput: 0: 44627.1. Samples: 7030000. Policy #0 lag: (min: 0.0, avg: 9.8, max: 20.0)
[2024-06-06 15:07:57,318][24114] Avg episode reward: [(0, '0.179')]
[2024-06-06 15:08:00,327][24347] Updated weights for policy 0, policy_version 32098 (0.0022)
[2024-06-06 15:08:02,318][24114] Fps is (10 sec: 45882.6, 60 sec: 44782.8, 300 sec: 33860.3). Total num frames: 525959168. Throughput: 0: 44782.2. Samples: 7166000. Policy #0 lag: (min: 0.0, avg: 9.8, max: 20.0)
[2024-06-06 15:08:02,318][24114] Avg episode reward: [(0, '0.186')]
[2024-06-06 15:08:04,524][24347] Updated weights for policy 0, policy_version 32108 (0.0035)
[2024-06-06 15:08:07,318][24114] Fps is (10 sec: 45875.4, 60 sec: 44782.8, 300 sec: 34139.7). Total num frames: 526188544. Throughput: 0: 44693.7. Samples: 7436660. Policy #0 lag: (min: 0.0, avg: 9.8, max: 20.0)
[2024-06-06 15:08:07,319][24114] Avg episode reward: [(0, '0.184')]
[2024-06-06 15:08:08,000][24347] Updated weights for policy 0, policy_version 32118 (0.0046)
[2024-06-06 15:08:12,189][24347] Updated weights for policy 0, policy_version 32128 (0.0035)
[2024-06-06 15:08:12,318][24114] Fps is (10 sec: 42598.1, 60 sec: 44782.9, 300 sec: 34257.4). Total num frames: 526385152. Throughput: 0: 44608.0. Samples: 7706180. Policy #0 lag: (min: 0.0, avg: 10.1, max: 22.0)
[2024-06-06 15:08:12,319][24114] Avg episode reward: [(0, '0.178')]
[2024-06-06 15:08:15,176][24347] Updated weights for policy 0, policy_version 32138 (0.0040)
[2024-06-06 15:08:17,318][24114] Fps is (10 sec: 42598.9, 60 sec: 44512.8, 300 sec: 34515.6). Total num frames: 526614528. Throughput: 0: 44805.8. Samples: 7840320. Policy #0 lag: (min: 0.0, avg: 10.1, max: 22.0)
[2024-06-06 15:08:17,318][24114] Avg episode reward: [(0, '0.177')]
[2024-06-06 15:08:19,315][24347] Updated weights for policy 0, policy_version 32148 (0.0034)
[2024-06-06 15:08:22,321][24114] Fps is (10 sec: 47497.5, 60 sec: 44780.3, 300 sec: 34833.3). Total num frames: 526860288. Throughput: 0: 44767.7. Samples: 8104600. Policy #0 lag: (min: 0.0, avg: 10.1, max: 22.0)
[2024-06-06 15:08:22,322][24114] Avg episode reward: [(0, '0.181')]
[2024-06-06 15:08:22,332][24347] Updated weights for policy 0, policy_version 32158 (0.0042)
[2024-06-06 15:08:22,336][24326] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000032158_526876672.pth...
[2024-06-06 15:08:22,396][24326] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000031668_518848512.pth
[2024-06-06 15:08:26,481][24347] Updated weights for policy 0, policy_version 32168 (0.0040)
[2024-06-06 15:08:27,318][24114] Fps is (10 sec: 45875.2, 60 sec: 44783.0, 300 sec: 34999.0). Total num frames: 527073280. Throughput: 0: 44747.6. Samples: 8378780. Policy #0 lag: (min: 0.0, avg: 10.1, max: 22.0)
[2024-06-06 15:08:27,318][24114] Avg episode reward: [(0, '0.177')]
[2024-06-06 15:08:29,903][24347] Updated weights for policy 0, policy_version 32178 (0.0041)
[2024-06-06 15:08:32,318][24114] Fps is (10 sec: 42613.3, 60 sec: 45056.0, 300 sec: 35157.3). Total num frames: 527286272. Throughput: 0: 44762.2. Samples: 8510300. Policy #0 lag: (min: 0.0, avg: 9.5, max: 21.0)
[2024-06-06 15:08:32,318][24114] Avg episode reward: [(0, '0.181')]
[2024-06-06 15:08:33,817][24347] Updated weights for policy 0, policy_version 32188 (0.0036)
[2024-06-06 15:08:37,318][24114] Fps is (10 sec: 44234.9, 60 sec: 44509.6, 300 sec: 35376.0). Total num frames: 527515648. Throughput: 0: 44883.4. Samples: 8781120. Policy #0 lag: (min: 0.0, avg: 9.5, max: 21.0)
[2024-06-06 15:08:37,319][24114] Avg episode reward: [(0, '0.177')]
[2024-06-06 15:08:37,327][24347] Updated weights for policy 0, policy_version 32198 (0.0033)
[2024-06-06 15:08:41,398][24347] Updated weights for policy 0, policy_version 32208 (0.0027)
[2024-06-06 15:08:42,318][24114] Fps is (10 sec: 47513.3, 60 sec: 45056.0, 300 sec: 35651.6). Total num frames: 527761408. Throughput: 0: 44891.2. Samples: 9050100. Policy #0 lag: (min: 0.0, avg: 9.5, max: 21.0)
[2024-06-06 15:08:42,318][24114] Avg episode reward: [(0, '0.188')]
[2024-06-06 15:08:44,380][24347] Updated weights for policy 0, policy_version 32218 (0.0045)
[2024-06-06 15:08:47,318][24114] Fps is (10 sec: 44238.8, 60 sec: 44783.0, 300 sec: 35723.6). Total num frames: 527958016. Throughput: 0: 44788.5. Samples: 9181480. Policy #0 lag: (min: 0.0, avg: 9.5, max: 21.0)
[2024-06-06 15:08:47,318][24114] Avg episode reward: [(0, '0.184')]
[2024-06-06 15:08:48,460][24347] Updated weights for policy 0, policy_version 32228 (0.0024)
[2024-06-06 15:08:51,905][24347] Updated weights for policy 0, policy_version 32238 (0.0040)
[2024-06-06 15:08:52,318][24114] Fps is (10 sec: 42598.8, 60 sec: 44784.2, 300 sec: 35918.8). Total num frames: 528187392. Throughput: 0: 44678.8. Samples: 9447200. Policy #0 lag: (min: 0.0, avg: 9.5, max: 21.0)
[2024-06-06 15:08:52,318][24114] Avg episode reward: [(0, '0.179')]
[2024-06-06 15:08:55,679][24347] Updated weights for policy 0, policy_version 32248 (0.0027)
[2024-06-06 15:08:57,318][24114] Fps is (10 sec: 45874.7, 60 sec: 44783.0, 300 sec: 36106.6). Total num frames: 528416768. Throughput: 0: 44636.9. Samples: 9714840. Policy #0 lag: (min: 1.0, avg: 11.1, max: 22.0)
[2024-06-06 15:08:57,319][24114] Avg episode reward: [(0, '0.177')]
[2024-06-06 15:08:59,427][24347] Updated weights for policy 0, policy_version 32258 (0.0042)
[2024-06-06 15:09:02,318][24114] Fps is (10 sec: 45875.2, 60 sec: 44783.0, 300 sec: 36287.5). Total num frames: 528646144. Throughput: 0: 44595.1. Samples: 9847100. Policy #0 lag: (min: 1.0, avg: 11.1, max: 22.0)
[2024-06-06 15:09:02,318][24114] Avg episode reward: [(0, '0.185')]
[2024-06-06 15:09:03,432][24347] Updated weights for policy 0, policy_version 32268 (0.0029)
[2024-06-06 15:09:06,611][24347] Updated weights for policy 0, policy_version 32278 (0.0038)
[2024-06-06 15:09:07,324][24114] Fps is (10 sec: 44210.5, 60 sec: 44505.5, 300 sec: 36401.5). Total num frames: 528859136. Throughput: 0: 44712.2. Samples: 10116760. Policy #0 lag: (min: 1.0, avg: 11.1, max: 22.0)
[2024-06-06 15:09:07,325][24114] Avg episode reward: [(0, '0.185')]
[2024-06-06 15:09:10,818][24347] Updated weights for policy 0, policy_version 32288 (0.0027)
[2024-06-06 15:09:11,449][24326] Signal inference workers to stop experience collection... (100 times)
[2024-06-06 15:09:11,449][24326] Signal inference workers to resume experience collection... (100 times)
[2024-06-06 15:09:11,467][24347] InferenceWorker_p0-w0: stopping experience collection (100 times)
[2024-06-06 15:09:11,467][24347] InferenceWorker_p0-w0: resuming experience collection (100 times)
[2024-06-06 15:09:12,318][24114] Fps is (10 sec: 44236.8, 60 sec: 45056.1, 300 sec: 36571.4). Total num frames: 529088512. Throughput: 0: 44507.5. Samples: 10381620. Policy #0 lag: (min: 1.0, avg: 11.1, max: 22.0)
[2024-06-06 15:09:12,318][24114] Avg episode reward: [(0, '0.178')]
[2024-06-06 15:09:14,031][24347] Updated weights for policy 0, policy_version 32298 (0.0034)
[2024-06-06 15:09:17,324][24114] Fps is (10 sec: 44237.2, 60 sec: 44778.5, 300 sec: 36676.4). Total num frames: 529301504. Throughput: 0: 44499.5. Samples: 10513040. Policy #0 lag: (min: 0.0, avg: 10.7, max: 23.0)
[2024-06-06 15:09:17,325][24114] Avg episode reward: [(0, '0.184')]
[2024-06-06 15:09:18,068][24347] Updated weights for policy 0, policy_version 32308 (0.0030)
[2024-06-06 15:09:21,634][24347] Updated weights for policy 0, policy_version 32318 (0.0022)
[2024-06-06 15:09:22,324][24114] Fps is (10 sec: 42572.2, 60 sec: 44234.8, 300 sec: 36778.5). Total num frames: 529514496. Throughput: 0: 44399.7. Samples: 10779360. Policy #0 lag: (min: 0.0, avg: 10.7, max: 23.0)
[2024-06-06 15:09:22,326][24114] Avg episode reward: [(0, '0.178')]
[2024-06-06 15:09:25,247][24347] Updated weights for policy 0, policy_version 32328 (0.0023)
[2024-06-06 15:09:27,318][24114] Fps is (10 sec: 44263.2, 60 sec: 44509.9, 300 sec: 36933.4). Total num frames: 529743872. Throughput: 0: 44268.5. Samples: 11042180. Policy #0 lag: (min: 0.0, avg: 10.7, max: 23.0)
[2024-06-06 15:09:27,318][24114] Avg episode reward: [(0, '0.179')]
[2024-06-06 15:09:28,829][24347] Updated weights for policy 0, policy_version 32338 (0.0029)
[2024-06-06 15:09:32,318][24114] Fps is (10 sec: 44263.2, 60 sec: 44509.7, 300 sec: 37655.4). Total num frames: 529956864. Throughput: 0: 44337.6. Samples: 11176680. Policy #0 lag: (min: 0.0, avg: 10.7, max: 23.0)
[2024-06-06 15:09:32,319][24114] Avg episode reward: [(0, '0.179')]
[2024-06-06 15:09:32,759][24347] Updated weights for policy 0, policy_version 32348 (0.0032)
[2024-06-06 15:09:36,381][24347] Updated weights for policy 0, policy_version 32358 (0.0032)
[2024-06-06 15:09:37,318][24114] Fps is (10 sec: 42598.4, 60 sec: 44237.1, 300 sec: 37822.1). Total num frames: 530169856. Throughput: 0: 44369.8. Samples: 11443840. Policy #0 lag: (min: 0.0, avg: 10.1, max: 20.0)
[2024-06-06 15:09:37,318][24114] Avg episode reward: [(0, '0.181')]
[2024-06-06 15:09:40,095][24347] Updated weights for policy 0, policy_version 32368 (0.0037)
[2024-06-06 15:09:42,318][24114] Fps is (10 sec: 44237.6, 60 sec: 43963.8, 300 sec: 38599.6). Total num frames: 530399232. Throughput: 0: 44429.0. Samples: 11714140. Policy #0 lag: (min: 0.0, avg: 10.1, max: 20.0)
[2024-06-06 15:09:42,318][24114] Avg episode reward: [(0, '0.178')]
[2024-06-06 15:09:43,531][24347] Updated weights for policy 0, policy_version 32378 (0.0039)
[2024-06-06 15:09:47,318][24114] Fps is (10 sec: 45875.1, 60 sec: 44509.9, 300 sec: 39321.6). Total num frames: 530628608. Throughput: 0: 44384.9. Samples: 11844420. Policy #0 lag: (min: 0.0, avg: 10.1, max: 20.0)
[2024-06-06 15:09:47,319][24114] Avg episode reward: [(0, '0.184')]
[2024-06-06 15:09:47,432][24347] Updated weights for policy 0, policy_version 32388 (0.0024)
[2024-06-06 15:09:51,041][24347] Updated weights for policy 0, policy_version 32398 (0.0029)
[2024-06-06 15:09:52,318][24114] Fps is (10 sec: 42598.3, 60 sec: 43963.7, 300 sec: 39932.5). Total num frames: 530825216. Throughput: 0: 44399.7. Samples: 12114480. Policy #0 lag: (min: 0.0, avg: 10.1, max: 20.0)
[2024-06-06 15:09:52,318][24114] Avg episode reward: [(0, '0.176')]
[2024-06-06 15:09:54,508][24347] Updated weights for policy 0, policy_version 32408 (0.0024)
[2024-06-06 15:09:57,318][24114] Fps is (10 sec: 42598.3, 60 sec: 43963.8, 300 sec: 40543.5). Total num frames: 531054592. Throughput: 0: 44390.2. Samples: 12379180. Policy #0 lag: (min: 0.0, avg: 10.3, max: 22.0)
[2024-06-06 15:09:57,319][24114] Avg episode reward: [(0, '0.180')]
[2024-06-06 15:09:58,338][24347] Updated weights for policy 0, policy_version 32418 (0.0027)
[2024-06-06 15:10:02,249][24347] Updated weights for policy 0, policy_version 32428 (0.0036)
[2024-06-06 15:10:02,318][24114] Fps is (10 sec: 47513.6, 60 sec: 44236.8, 300 sec: 41209.9). Total num frames: 531300352. Throughput: 0: 44534.7. Samples: 12516840. Policy #0 lag: (min: 0.0, avg: 10.3, max: 22.0)
[2024-06-06 15:10:02,318][24114] Avg episode reward: [(0, '0.176')]
[2024-06-06 15:10:05,803][24347] Updated weights for policy 0, policy_version 32438 (0.0035)
[2024-06-06 15:10:07,318][24114] Fps is (10 sec: 45874.9, 60 sec: 44241.2, 300 sec: 41543.1). Total num frames: 531513344. Throughput: 0: 44339.8. Samples: 12774380. Policy #0 lag: (min: 0.0, avg: 10.3, max: 22.0)
[2024-06-06 15:10:07,319][24114] Avg episode reward: [(0, '0.178')]
[2024-06-06 15:10:09,461][24347] Updated weights for policy 0, policy_version 32448 (0.0037)
[2024-06-06 15:10:12,324][24114] Fps is (10 sec: 44210.3, 60 sec: 44232.4, 300 sec: 41986.6). Total num frames: 531742720. Throughput: 0: 44548.7. Samples: 13047140. Policy #0 lag: (min: 0.0, avg: 10.3, max: 22.0)
[2024-06-06 15:10:12,325][24114] Avg episode reward: [(0, '0.186')]
[2024-06-06 15:10:13,322][24347] Updated weights for policy 0, policy_version 32458 (0.0032)
[2024-06-06 15:10:16,967][24347] Updated weights for policy 0, policy_version 32468 (0.0051)
[2024-06-06 15:10:17,318][24114] Fps is (10 sec: 44237.4, 60 sec: 44241.2, 300 sec: 42320.7). Total num frames: 531955712. Throughput: 0: 44555.4. Samples: 13181660. Policy #0 lag: (min: 0.0, avg: 10.3, max: 22.0)
[2024-06-06 15:10:17,318][24114] Avg episode reward: [(0, '0.185')]
[2024-06-06 15:10:20,920][24347] Updated weights for policy 0, policy_version 32478 (0.0043)
[2024-06-06 15:10:22,318][24114] Fps is (10 sec: 44263.4, 60 sec: 44514.4, 300 sec: 42598.4). Total num frames: 532185088. Throughput: 0: 44499.5. Samples: 13446320. Policy #0 lag: (min: 0.0, avg: 10.7, max: 21.0)
[2024-06-06 15:10:22,319][24114] Avg episode reward: [(0, '0.180')]
[2024-06-06 15:10:22,340][24326] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000032482_532185088.pth...
[2024-06-06 15:10:22,385][24326] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000031836_521601024.pth [2024-06-06 15:10:23,326][24326] Signal inference workers to stop experience collection... (150 times) [2024-06-06 15:10:23,376][24347] InferenceWorker_p0-w0: stopping experience collection (150 times) [2024-06-06 15:10:23,379][24326] Signal inference workers to resume experience collection... (150 times) [2024-06-06 15:10:23,390][24347] InferenceWorker_p0-w0: resuming experience collection (150 times) [2024-06-06 15:10:24,122][24347] Updated weights for policy 0, policy_version 32488 (0.0040) [2024-06-06 15:10:27,318][24114] Fps is (10 sec: 45874.3, 60 sec: 44509.7, 300 sec: 42876.1). Total num frames: 532414464. Throughput: 0: 44321.2. Samples: 13708600. Policy #0 lag: (min: 0.0, avg: 10.7, max: 21.0) [2024-06-06 15:10:27,319][24114] Avg episode reward: [(0, '0.179')] [2024-06-06 15:10:27,980][24347] Updated weights for policy 0, policy_version 32498 (0.0039) [2024-06-06 15:10:31,799][24347] Updated weights for policy 0, policy_version 32508 (0.0036) [2024-06-06 15:10:32,318][24114] Fps is (10 sec: 44236.6, 60 sec: 44510.0, 300 sec: 43098.2). Total num frames: 532627456. Throughput: 0: 44313.3. Samples: 13838520. Policy #0 lag: (min: 0.0, avg: 10.7, max: 21.0) [2024-06-06 15:10:32,318][24114] Avg episode reward: [(0, '0.180')] [2024-06-06 15:10:35,162][24347] Updated weights for policy 0, policy_version 32518 (0.0032) [2024-06-06 15:10:37,318][24114] Fps is (10 sec: 44237.3, 60 sec: 44782.9, 300 sec: 43264.9). Total num frames: 532856832. Throughput: 0: 44386.2. Samples: 14111860. Policy #0 lag: (min: 0.0, avg: 10.7, max: 21.0) [2024-06-06 15:10:37,318][24114] Avg episode reward: [(0, '0.183')] [2024-06-06 15:10:39,189][24347] Updated weights for policy 0, policy_version 32528 (0.0036) [2024-06-06 15:10:42,318][24114] Fps is (10 sec: 44237.0, 60 sec: 44509.9, 300 sec: 43376.0). Total num frames: 533069824. Throughput: 0: 44280.0. 
Samples: 14371780. Policy #0 lag: (min: 0.0, avg: 10.6, max: 22.0) [2024-06-06 15:10:42,318][24114] Avg episode reward: [(0, '0.183')] [2024-06-06 15:10:42,638][24347] Updated weights for policy 0, policy_version 32538 (0.0030) [2024-06-06 15:10:46,600][24347] Updated weights for policy 0, policy_version 32548 (0.0039) [2024-06-06 15:10:47,318][24114] Fps is (10 sec: 45874.9, 60 sec: 44782.9, 300 sec: 43598.1). Total num frames: 533315584. Throughput: 0: 44254.6. Samples: 14508300. Policy #0 lag: (min: 0.0, avg: 10.6, max: 22.0) [2024-06-06 15:10:47,318][24114] Avg episode reward: [(0, '0.181')] [2024-06-06 15:10:49,953][24347] Updated weights for policy 0, policy_version 32558 (0.0038) [2024-06-06 15:10:52,318][24114] Fps is (10 sec: 44236.4, 60 sec: 44782.9, 300 sec: 43709.2). Total num frames: 533512192. Throughput: 0: 44476.9. Samples: 14775840. Policy #0 lag: (min: 0.0, avg: 10.6, max: 22.0) [2024-06-06 15:10:52,319][24114] Avg episode reward: [(0, '0.179')] [2024-06-06 15:10:53,677][24347] Updated weights for policy 0, policy_version 32568 (0.0021) [2024-06-06 15:10:57,079][24347] Updated weights for policy 0, policy_version 32578 (0.0031) [2024-06-06 15:10:57,318][24114] Fps is (10 sec: 44236.9, 60 sec: 45056.0, 300 sec: 43931.3). Total num frames: 533757952. Throughput: 0: 44285.4. Samples: 15039720. Policy #0 lag: (min: 0.0, avg: 10.6, max: 22.0) [2024-06-06 15:10:57,319][24114] Avg episode reward: [(0, '0.182')] [2024-06-06 15:11:01,319][24347] Updated weights for policy 0, policy_version 32588 (0.0037) [2024-06-06 15:11:02,318][24114] Fps is (10 sec: 45875.1, 60 sec: 44509.8, 300 sec: 44042.4). Total num frames: 533970944. Throughput: 0: 44330.9. Samples: 15176560. 
Policy #0 lag: (min: 1.0, avg: 9.5, max: 21.0) [2024-06-06 15:11:02,319][24114] Avg episode reward: [(0, '0.181')] [2024-06-06 15:11:04,649][24347] Updated weights for policy 0, policy_version 32598 (0.0037) [2024-06-06 15:11:07,318][24114] Fps is (10 sec: 42598.6, 60 sec: 44509.9, 300 sec: 44042.4). Total num frames: 534183936. Throughput: 0: 44406.7. Samples: 15444620. Policy #0 lag: (min: 1.0, avg: 9.5, max: 21.0) [2024-06-06 15:11:07,319][24114] Avg episode reward: [(0, '0.186')] [2024-06-06 15:11:08,722][24347] Updated weights for policy 0, policy_version 32608 (0.0028) [2024-06-06 15:11:12,029][24347] Updated weights for policy 0, policy_version 32618 (0.0028) [2024-06-06 15:11:12,318][24114] Fps is (10 sec: 44236.9, 60 sec: 44514.3, 300 sec: 44153.5). Total num frames: 534413312. Throughput: 0: 44402.3. Samples: 15706700. Policy #0 lag: (min: 1.0, avg: 9.5, max: 21.0) [2024-06-06 15:11:12,319][24114] Avg episode reward: [(0, '0.176')] [2024-06-06 15:11:16,202][24347] Updated weights for policy 0, policy_version 32628 (0.0042) [2024-06-06 15:11:17,318][24114] Fps is (10 sec: 45874.6, 60 sec: 44782.8, 300 sec: 44209.0). Total num frames: 534642688. Throughput: 0: 44551.4. Samples: 15843340. Policy #0 lag: (min: 1.0, avg: 9.5, max: 21.0) [2024-06-06 15:11:17,319][24114] Avg episode reward: [(0, '0.180')] [2024-06-06 15:11:19,234][24347] Updated weights for policy 0, policy_version 32638 (0.0033) [2024-06-06 15:11:22,318][24114] Fps is (10 sec: 42598.4, 60 sec: 44236.7, 300 sec: 44209.0). Total num frames: 534839296. Throughput: 0: 44327.0. Samples: 16106580. Policy #0 lag: (min: 1.0, avg: 9.5, max: 20.0) [2024-06-06 15:11:22,319][24114] Avg episode reward: [(0, '0.186')] [2024-06-06 15:11:23,325][24347] Updated weights for policy 0, policy_version 32648 (0.0037) [2024-06-06 15:11:26,429][24347] Updated weights for policy 0, policy_version 32658 (0.0033) [2024-06-06 15:11:27,318][24114] Fps is (10 sec: 42598.7, 60 sec: 44236.8, 300 sec: 44431.2). 
Total num frames: 535068672. Throughput: 0: 44532.8. Samples: 16375760. Policy #0 lag: (min: 1.0, avg: 9.5, max: 20.0) [2024-06-06 15:11:27,319][24114] Avg episode reward: [(0, '0.175')] [2024-06-06 15:11:30,876][24347] Updated weights for policy 0, policy_version 32668 (0.0050) [2024-06-06 15:11:32,318][24114] Fps is (10 sec: 47513.8, 60 sec: 44782.9, 300 sec: 44542.3). Total num frames: 535314432. Throughput: 0: 44625.4. Samples: 16516440. Policy #0 lag: (min: 1.0, avg: 9.5, max: 20.0) [2024-06-06 15:11:32,318][24114] Avg episode reward: [(0, '0.180')] [2024-06-06 15:11:34,108][24347] Updated weights for policy 0, policy_version 32678 (0.0026) [2024-06-06 15:11:37,318][24114] Fps is (10 sec: 44237.1, 60 sec: 44236.8, 300 sec: 44486.7). Total num frames: 535511040. Throughput: 0: 44651.6. Samples: 16785160. Policy #0 lag: (min: 1.0, avg: 9.5, max: 20.0) [2024-06-06 15:11:37,318][24114] Avg episode reward: [(0, '0.179')] [2024-06-06 15:11:38,231][24347] Updated weights for policy 0, policy_version 32688 (0.0035) [2024-06-06 15:11:40,600][24326] Signal inference workers to stop experience collection... (200 times) [2024-06-06 15:11:40,604][24326] Signal inference workers to resume experience collection... (200 times) [2024-06-06 15:11:40,618][24347] InferenceWorker_p0-w0: stopping experience collection (200 times) [2024-06-06 15:11:40,655][24347] InferenceWorker_p0-w0: resuming experience collection (200 times) [2024-06-06 15:11:41,250][24347] Updated weights for policy 0, policy_version 32698 (0.0046) [2024-06-06 15:11:42,318][24114] Fps is (10 sec: 42598.3, 60 sec: 44509.8, 300 sec: 44431.2). Total num frames: 535740416. Throughput: 0: 44459.6. Samples: 17040400. Policy #0 lag: (min: 1.0, avg: 9.5, max: 20.0) [2024-06-06 15:11:42,319][24114] Avg episode reward: [(0, '0.181')] [2024-06-06 15:11:45,585][24347] Updated weights for policy 0, policy_version 32708 (0.0044) [2024-06-06 15:11:47,318][24114] Fps is (10 sec: 45874.9, 60 sec: 44236.8, 300 sec: 44486.7). 
Total num frames: 535969792. Throughput: 0: 44504.0. Samples: 17179240. Policy #0 lag: (min: 0.0, avg: 10.4, max: 21.0) [2024-06-06 15:11:47,319][24114] Avg episode reward: [(0, '0.180')] [2024-06-06 15:11:48,695][24347] Updated weights for policy 0, policy_version 32718 (0.0032) [2024-06-06 15:11:52,318][24114] Fps is (10 sec: 42598.2, 60 sec: 44236.8, 300 sec: 44542.3). Total num frames: 536166400. Throughput: 0: 44335.9. Samples: 17439740. Policy #0 lag: (min: 0.0, avg: 10.4, max: 21.0) [2024-06-06 15:11:52,319][24114] Avg episode reward: [(0, '0.179')] [2024-06-06 15:11:53,112][24347] Updated weights for policy 0, policy_version 32728 (0.0035) [2024-06-06 15:11:56,518][24347] Updated weights for policy 0, policy_version 32738 (0.0030) [2024-06-06 15:11:57,320][24114] Fps is (10 sec: 44228.5, 60 sec: 44235.4, 300 sec: 44542.0). Total num frames: 536412160. Throughput: 0: 44469.7. Samples: 17707920. Policy #0 lag: (min: 0.0, avg: 10.4, max: 21.0) [2024-06-06 15:11:57,321][24114] Avg episode reward: [(0, '0.176')] [2024-06-06 15:12:00,454][24347] Updated weights for policy 0, policy_version 32748 (0.0033) [2024-06-06 15:12:02,318][24114] Fps is (10 sec: 47513.7, 60 sec: 44509.9, 300 sec: 44542.2). Total num frames: 536641536. Throughput: 0: 44529.8. Samples: 17847180. Policy #0 lag: (min: 0.0, avg: 10.4, max: 21.0) [2024-06-06 15:12:02,319][24114] Avg episode reward: [(0, '0.182')] [2024-06-06 15:12:03,740][24347] Updated weights for policy 0, policy_version 32758 (0.0027) [2024-06-06 15:12:07,318][24114] Fps is (10 sec: 42606.4, 60 sec: 44236.8, 300 sec: 44542.3). Total num frames: 536838144. Throughput: 0: 44448.5. Samples: 18106760. 
Policy #0 lag: (min: 0.0, avg: 10.1, max: 20.0) [2024-06-06 15:12:07,318][24114] Avg episode reward: [(0, '0.176')] [2024-06-06 15:12:07,748][24347] Updated weights for policy 0, policy_version 32768 (0.0042) [2024-06-06 15:12:10,793][24347] Updated weights for policy 0, policy_version 32778 (0.0045) [2024-06-06 15:12:12,318][24114] Fps is (10 sec: 42598.8, 60 sec: 44236.9, 300 sec: 44487.3). Total num frames: 537067520. Throughput: 0: 44503.2. Samples: 18378400. Policy #0 lag: (min: 0.0, avg: 10.1, max: 20.0) [2024-06-06 15:12:12,318][24114] Avg episode reward: [(0, '0.184')] [2024-06-06 15:12:15,191][24347] Updated weights for policy 0, policy_version 32788 (0.0031) [2024-06-06 15:12:17,318][24114] Fps is (10 sec: 45875.6, 60 sec: 44236.9, 300 sec: 44486.7). Total num frames: 537296896. Throughput: 0: 44350.7. Samples: 18512220. Policy #0 lag: (min: 0.0, avg: 10.1, max: 20.0) [2024-06-06 15:12:17,318][24114] Avg episode reward: [(0, '0.176')] [2024-06-06 15:12:18,361][24347] Updated weights for policy 0, policy_version 32798 (0.0028) [2024-06-06 15:12:22,318][24114] Fps is (10 sec: 42598.3, 60 sec: 44236.9, 300 sec: 44431.2). Total num frames: 537493504. Throughput: 0: 44369.3. Samples: 18781780. Policy #0 lag: (min: 0.0, avg: 10.1, max: 20.0) [2024-06-06 15:12:22,318][24114] Avg episode reward: [(0, '0.178')] [2024-06-06 15:12:22,330][24326] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000032806_537493504.pth... [2024-06-06 15:12:22,393][24326] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000032158_526876672.pth [2024-06-06 15:12:22,687][24347] Updated weights for policy 0, policy_version 32808 (0.0047) [2024-06-06 15:12:25,982][24347] Updated weights for policy 0, policy_version 32818 (0.0034) [2024-06-06 15:12:27,318][24114] Fps is (10 sec: 44236.2, 60 sec: 44509.8, 300 sec: 44597.8). Total num frames: 537739264. Throughput: 0: 44583.9. Samples: 19046680. 
Policy #0 lag: (min: 0.0, avg: 11.5, max: 22.0) [2024-06-06 15:12:27,319][24114] Avg episode reward: [(0, '0.178')] [2024-06-06 15:12:29,829][24347] Updated weights for policy 0, policy_version 32828 (0.0038) [2024-06-06 15:12:32,318][24114] Fps is (10 sec: 49151.6, 60 sec: 44509.8, 300 sec: 44542.3). Total num frames: 537985024. Throughput: 0: 44394.6. Samples: 19177000. Policy #0 lag: (min: 0.0, avg: 11.5, max: 22.0) [2024-06-06 15:12:32,319][24114] Avg episode reward: [(0, '0.176')] [2024-06-06 15:12:33,189][24347] Updated weights for policy 0, policy_version 32838 (0.0036) [2024-06-06 15:12:37,318][24114] Fps is (10 sec: 42598.9, 60 sec: 44236.8, 300 sec: 44431.2). Total num frames: 538165248. Throughput: 0: 44595.7. Samples: 19446540. Policy #0 lag: (min: 0.0, avg: 11.5, max: 22.0) [2024-06-06 15:12:37,318][24114] Avg episode reward: [(0, '0.181')] [2024-06-06 15:12:37,358][24347] Updated weights for policy 0, policy_version 32848 (0.0028) [2024-06-06 15:12:40,392][24347] Updated weights for policy 0, policy_version 32858 (0.0033) [2024-06-06 15:12:42,318][24114] Fps is (10 sec: 40959.8, 60 sec: 44236.7, 300 sec: 44486.7). Total num frames: 538394624. Throughput: 0: 44472.0. Samples: 19709080. Policy #0 lag: (min: 0.0, avg: 11.5, max: 22.0) [2024-06-06 15:12:42,319][24114] Avg episode reward: [(0, '0.180')] [2024-06-06 15:12:44,503][24347] Updated weights for policy 0, policy_version 32868 (0.0037) [2024-06-06 15:12:47,318][24114] Fps is (10 sec: 47513.1, 60 sec: 44509.8, 300 sec: 44542.5). Total num frames: 538640384. Throughput: 0: 44448.9. Samples: 19847380. Policy #0 lag: (min: 0.0, avg: 11.4, max: 23.0) [2024-06-06 15:12:47,318][24114] Avg episode reward: [(0, '0.180')] [2024-06-06 15:12:47,693][24347] Updated weights for policy 0, policy_version 32878 (0.0024) [2024-06-06 15:12:52,111][24347] Updated weights for policy 0, policy_version 32888 (0.0040) [2024-06-06 15:12:52,318][24114] Fps is (10 sec: 44237.1, 60 sec: 44509.9, 300 sec: 44431.2). 
Total num frames: 538836992. Throughput: 0: 44595.6. Samples: 20113560. Policy #0 lag: (min: 0.0, avg: 11.4, max: 23.0) [2024-06-06 15:12:52,318][24114] Avg episode reward: [(0, '0.176')] [2024-06-06 15:12:55,253][24347] Updated weights for policy 0, policy_version 32898 (0.0044) [2024-06-06 15:12:57,318][24114] Fps is (10 sec: 42598.9, 60 sec: 44238.3, 300 sec: 44431.2). Total num frames: 539066368. Throughput: 0: 44477.8. Samples: 20379900. Policy #0 lag: (min: 0.0, avg: 11.4, max: 23.0) [2024-06-06 15:12:57,318][24114] Avg episode reward: [(0, '0.184')] [2024-06-06 15:12:59,730][24347] Updated weights for policy 0, policy_version 32908 (0.0032) [2024-06-06 15:13:02,318][24114] Fps is (10 sec: 47513.6, 60 sec: 44509.9, 300 sec: 44486.7). Total num frames: 539312128. Throughput: 0: 44559.9. Samples: 20517420. Policy #0 lag: (min: 0.0, avg: 11.4, max: 23.0) [2024-06-06 15:13:02,319][24114] Avg episode reward: [(0, '0.179')] [2024-06-06 15:13:02,648][24347] Updated weights for policy 0, policy_version 32918 (0.0029) [2024-06-06 15:13:06,844][24347] Updated weights for policy 0, policy_version 32928 (0.0036) [2024-06-06 15:13:07,318][24114] Fps is (10 sec: 42598.4, 60 sec: 44236.9, 300 sec: 44431.2). Total num frames: 539492352. Throughput: 0: 44414.7. Samples: 20780440. Policy #0 lag: (min: 0.0, avg: 11.4, max: 23.0) [2024-06-06 15:13:07,318][24114] Avg episode reward: [(0, '0.182')] [2024-06-06 15:13:09,696][24347] Updated weights for policy 0, policy_version 32938 (0.0038) [2024-06-06 15:13:11,923][24326] Signal inference workers to stop experience collection... (250 times) [2024-06-06 15:13:11,924][24326] Signal inference workers to resume experience collection... (250 times) [2024-06-06 15:13:11,967][24347] InferenceWorker_p0-w0: stopping experience collection (250 times) [2024-06-06 15:13:11,967][24347] InferenceWorker_p0-w0: resuming experience collection (250 times) [2024-06-06 15:13:12,318][24114] Fps is (10 sec: 42598.3, 60 sec: 44509.8, 300 sec: 44486.7). 
Total num frames: 539738112. Throughput: 0: 44543.6. Samples: 21051140. Policy #0 lag: (min: 0.0, avg: 9.7, max: 23.0) [2024-06-06 15:13:12,319][24114] Avg episode reward: [(0, '0.178')] [2024-06-06 15:13:14,011][24347] Updated weights for policy 0, policy_version 32948 (0.0026) [2024-06-06 15:13:17,296][24347] Updated weights for policy 0, policy_version 32958 (0.0035) [2024-06-06 15:13:17,318][24114] Fps is (10 sec: 49151.5, 60 sec: 44782.9, 300 sec: 44487.2). Total num frames: 539983872. Throughput: 0: 44642.2. Samples: 21185900. Policy #0 lag: (min: 0.0, avg: 9.7, max: 23.0) [2024-06-06 15:13:17,319][24114] Avg episode reward: [(0, '0.178')] [2024-06-06 15:13:21,332][24347] Updated weights for policy 0, policy_version 32968 (0.0031) [2024-06-06 15:13:22,318][24114] Fps is (10 sec: 44236.7, 60 sec: 44782.8, 300 sec: 44431.2). Total num frames: 540180480. Throughput: 0: 44634.5. Samples: 21455100. Policy #0 lag: (min: 0.0, avg: 9.7, max: 23.0) [2024-06-06 15:13:22,319][24114] Avg episode reward: [(0, '0.179')] [2024-06-06 15:13:24,646][24347] Updated weights for policy 0, policy_version 32978 (0.0025) [2024-06-06 15:13:27,318][24114] Fps is (10 sec: 42598.4, 60 sec: 44509.9, 300 sec: 44486.7). Total num frames: 540409856. Throughput: 0: 44665.4. Samples: 21719020. Policy #0 lag: (min: 0.0, avg: 9.7, max: 23.0) [2024-06-06 15:13:27,319][24114] Avg episode reward: [(0, '0.180')] [2024-06-06 15:13:29,020][24347] Updated weights for policy 0, policy_version 32988 (0.0040) [2024-06-06 15:13:31,974][24347] Updated weights for policy 0, policy_version 32998 (0.0040) [2024-06-06 15:13:32,318][24114] Fps is (10 sec: 47514.0, 60 sec: 44509.9, 300 sec: 44542.3). Total num frames: 540655616. Throughput: 0: 44567.2. Samples: 21852900. 
Policy #0 lag: (min: 0.0, avg: 7.4, max: 19.0) [2024-06-06 15:13:32,318][24114] Avg episode reward: [(0, '0.176')] [2024-06-06 15:13:36,259][24347] Updated weights for policy 0, policy_version 33008 (0.0035) [2024-06-06 15:13:37,318][24114] Fps is (10 sec: 44236.8, 60 sec: 44782.9, 300 sec: 44375.6). Total num frames: 540852224. Throughput: 0: 44589.8. Samples: 22120100. Policy #0 lag: (min: 0.0, avg: 7.4, max: 19.0) [2024-06-06 15:13:37,319][24114] Avg episode reward: [(0, '0.183')] [2024-06-06 15:13:39,200][24347] Updated weights for policy 0, policy_version 33018 (0.0026) [2024-06-06 15:13:42,318][24114] Fps is (10 sec: 40960.0, 60 sec: 44509.9, 300 sec: 44431.2). Total num frames: 541065216. Throughput: 0: 44519.5. Samples: 22383280. Policy #0 lag: (min: 0.0, avg: 7.4, max: 19.0) [2024-06-06 15:13:42,318][24114] Avg episode reward: [(0, '0.188')] [2024-06-06 15:13:43,635][24347] Updated weights for policy 0, policy_version 33028 (0.0047) [2024-06-06 15:13:46,770][24347] Updated weights for policy 0, policy_version 33038 (0.0039) [2024-06-06 15:13:47,318][24114] Fps is (10 sec: 47513.7, 60 sec: 44783.0, 300 sec: 44542.3). Total num frames: 541327360. Throughput: 0: 44453.8. Samples: 22517840. Policy #0 lag: (min: 0.0, avg: 7.4, max: 19.0) [2024-06-06 15:13:47,319][24114] Avg episode reward: [(0, '0.182')] [2024-06-06 15:13:51,043][24347] Updated weights for policy 0, policy_version 33048 (0.0046) [2024-06-06 15:13:52,318][24114] Fps is (10 sec: 47513.7, 60 sec: 45056.0, 300 sec: 44486.7). Total num frames: 541540352. Throughput: 0: 44694.2. Samples: 22791680. Policy #0 lag: (min: 1.0, avg: 8.4, max: 21.0) [2024-06-06 15:13:52,318][24114] Avg episode reward: [(0, '0.180')] [2024-06-06 15:13:53,979][24347] Updated weights for policy 0, policy_version 33058 (0.0025) [2024-06-06 15:13:57,318][24114] Fps is (10 sec: 42598.2, 60 sec: 44782.8, 300 sec: 44431.2). Total num frames: 541753344. Throughput: 0: 44594.2. Samples: 23057880. 
Policy #0 lag: (min: 1.0, avg: 8.4, max: 21.0) [2024-06-06 15:13:57,319][24114] Avg episode reward: [(0, '0.184')] [2024-06-06 15:13:58,617][24347] Updated weights for policy 0, policy_version 33068 (0.0028) [2024-06-06 15:14:01,296][24347] Updated weights for policy 0, policy_version 33078 (0.0036) [2024-06-06 15:14:02,318][24114] Fps is (10 sec: 44236.1, 60 sec: 44509.8, 300 sec: 44487.6). Total num frames: 541982720. Throughput: 0: 44472.4. Samples: 23187160. Policy #0 lag: (min: 1.0, avg: 8.4, max: 21.0) [2024-06-06 15:14:02,319][24114] Avg episode reward: [(0, '0.180')] [2024-06-06 15:14:05,686][24347] Updated weights for policy 0, policy_version 33088 (0.0036) [2024-06-06 15:14:07,318][24114] Fps is (10 sec: 45875.7, 60 sec: 45329.1, 300 sec: 44486.7). Total num frames: 542212096. Throughput: 0: 44608.6. Samples: 23462480. Policy #0 lag: (min: 1.0, avg: 8.4, max: 21.0) [2024-06-06 15:14:07,318][24114] Avg episode reward: [(0, '0.183')] [2024-06-06 15:14:08,445][24347] Updated weights for policy 0, policy_version 33098 (0.0024) [2024-06-06 15:14:12,318][24114] Fps is (10 sec: 42598.7, 60 sec: 44509.9, 300 sec: 44432.1). Total num frames: 542408704. Throughput: 0: 44690.2. Samples: 23730080. Policy #0 lag: (min: 1.0, avg: 8.4, max: 21.0) [2024-06-06 15:14:12,319][24114] Avg episode reward: [(0, '0.175')] [2024-06-06 15:14:12,771][24347] Updated weights for policy 0, policy_version 33108 (0.0031) [2024-06-06 15:14:15,898][24347] Updated weights for policy 0, policy_version 33118 (0.0025) [2024-06-06 15:14:17,324][24114] Fps is (10 sec: 44210.5, 60 sec: 44505.5, 300 sec: 44542.3). Total num frames: 542654464. Throughput: 0: 44641.7. Samples: 23862040. Policy #0 lag: (min: 0.0, avg: 9.2, max: 22.0) [2024-06-06 15:14:17,324][24114] Avg episode reward: [(0, '0.183')] [2024-06-06 15:14:20,237][24347] Updated weights for policy 0, policy_version 33128 (0.0042) [2024-06-06 15:14:22,318][24114] Fps is (10 sec: 47513.8, 60 sec: 45056.1, 300 sec: 44542.3). 
Total num frames: 542883840. Throughput: 0: 44755.6. Samples: 24134100. Policy #0 lag: (min: 0.0, avg: 9.2, max: 22.0) [2024-06-06 15:14:22,318][24114] Avg episode reward: [(0, '0.181')] [2024-06-06 15:14:22,362][24326] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000033136_542900224.pth... [2024-06-06 15:14:22,417][24326] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000032482_532185088.pth [2024-06-06 15:14:23,419][24347] Updated weights for policy 0, policy_version 33138 (0.0020) [2024-06-06 15:14:27,318][24114] Fps is (10 sec: 42622.8, 60 sec: 44509.8, 300 sec: 44486.7). Total num frames: 543080448. Throughput: 0: 44825.6. Samples: 24400440. Policy #0 lag: (min: 0.0, avg: 9.2, max: 22.0) [2024-06-06 15:14:27,319][24114] Avg episode reward: [(0, '0.184')] [2024-06-06 15:14:27,770][24347] Updated weights for policy 0, policy_version 33148 (0.0035) [2024-06-06 15:14:30,641][24347] Updated weights for policy 0, policy_version 33158 (0.0037) [2024-06-06 15:14:32,318][24114] Fps is (10 sec: 42598.1, 60 sec: 44236.8, 300 sec: 44542.2). Total num frames: 543309824. Throughput: 0: 44703.1. Samples: 24529480. Policy #0 lag: (min: 0.0, avg: 9.2, max: 22.0) [2024-06-06 15:14:32,319][24114] Avg episode reward: [(0, '0.184')] [2024-06-06 15:14:35,115][24347] Updated weights for policy 0, policy_version 33168 (0.0034) [2024-06-06 15:14:36,055][24326] Signal inference workers to stop experience collection... (300 times) [2024-06-06 15:14:36,056][24326] Signal inference workers to resume experience collection... (300 times) [2024-06-06 15:14:36,075][24347] InferenceWorker_p0-w0: stopping experience collection (300 times) [2024-06-06 15:14:36,075][24347] InferenceWorker_p0-w0: resuming experience collection (300 times) [2024-06-06 15:14:37,318][24114] Fps is (10 sec: 45876.0, 60 sec: 44783.0, 300 sec: 44542.3). Total num frames: 543539200. Throughput: 0: 44603.5. Samples: 24798840. 
Policy #0 lag: (min: 0.0, avg: 9.6, max: 22.0) [2024-06-06 15:14:37,318][24114] Avg episode reward: [(0, '0.187')] [2024-06-06 15:14:38,021][24347] Updated weights for policy 0, policy_version 33178 (0.0030) [2024-06-06 15:14:42,220][24347] Updated weights for policy 0, policy_version 33188 (0.0037) [2024-06-06 15:14:42,318][24114] Fps is (10 sec: 44237.4, 60 sec: 44783.0, 300 sec: 44486.7). Total num frames: 543752192. Throughput: 0: 44669.0. Samples: 25067980. Policy #0 lag: (min: 0.0, avg: 9.6, max: 22.0) [2024-06-06 15:14:42,318][24114] Avg episode reward: [(0, '0.182')] [2024-06-06 15:14:45,316][24347] Updated weights for policy 0, policy_version 33198 (0.0041) [2024-06-06 15:14:47,320][24114] Fps is (10 sec: 44228.2, 60 sec: 44235.4, 300 sec: 44597.5). Total num frames: 543981568. Throughput: 0: 44695.5. Samples: 25198540. Policy #0 lag: (min: 0.0, avg: 9.6, max: 22.0) [2024-06-06 15:14:47,321][24114] Avg episode reward: [(0, '0.186')] [2024-06-06 15:14:49,735][24347] Updated weights for policy 0, policy_version 33208 (0.0038) [2024-06-06 15:14:52,320][24114] Fps is (10 sec: 45866.0, 60 sec: 44508.4, 300 sec: 44597.5). Total num frames: 544210944. Throughput: 0: 44375.8. Samples: 25459480. Policy #0 lag: (min: 0.0, avg: 9.6, max: 22.0) [2024-06-06 15:14:52,321][24114] Avg episode reward: [(0, '0.181')] [2024-06-06 15:14:52,862][24347] Updated weights for policy 0, policy_version 33218 (0.0032) [2024-06-06 15:14:57,268][24347] Updated weights for policy 0, policy_version 33228 (0.0048) [2024-06-06 15:14:57,318][24114] Fps is (10 sec: 42605.9, 60 sec: 44236.7, 300 sec: 44431.2). Total num frames: 544407552. Throughput: 0: 44633.6. Samples: 25738600. Policy #0 lag: (min: 0.0, avg: 9.4, max: 21.0) [2024-06-06 15:14:57,319][24114] Avg episode reward: [(0, '0.182')] [2024-06-06 15:15:00,240][24347] Updated weights for policy 0, policy_version 33238 (0.0026) [2024-06-06 15:15:02,318][24114] Fps is (10 sec: 42607.0, 60 sec: 44236.9, 300 sec: 44486.7). 
Total num frames: 544636928. Throughput: 0: 44532.1. Samples: 25865720. Policy #0 lag: (min: 0.0, avg: 9.4, max: 21.0) [2024-06-06 15:15:02,318][24114] Avg episode reward: [(0, '0.179')] [2024-06-06 15:15:04,476][24347] Updated weights for policy 0, policy_version 33248 (0.0026) [2024-06-06 15:15:07,320][24114] Fps is (10 sec: 45867.2, 60 sec: 44235.3, 300 sec: 44487.3). Total num frames: 544866304. Throughput: 0: 44238.5. Samples: 26124920. Policy #0 lag: (min: 0.0, avg: 9.4, max: 21.0) [2024-06-06 15:15:07,321][24114] Avg episode reward: [(0, '0.184')] [2024-06-06 15:15:07,580][24347] Updated weights for policy 0, policy_version 33258 (0.0029) [2024-06-06 15:15:11,708][24347] Updated weights for policy 0, policy_version 33268 (0.0028) [2024-06-06 15:15:12,318][24114] Fps is (10 sec: 45875.1, 60 sec: 44783.0, 300 sec: 44542.3). Total num frames: 545095680. Throughput: 0: 44381.6. Samples: 26397600. Policy #0 lag: (min: 0.0, avg: 9.4, max: 21.0) [2024-06-06 15:15:12,318][24114] Avg episode reward: [(0, '0.186')] [2024-06-06 15:15:14,863][24347] Updated weights for policy 0, policy_version 33278 (0.0033) [2024-06-06 15:15:17,318][24114] Fps is (10 sec: 44244.7, 60 sec: 44241.0, 300 sec: 44486.7). Total num frames: 545308672. Throughput: 0: 44421.3. Samples: 26528440. Policy #0 lag: (min: 0.0, avg: 9.4, max: 21.0) [2024-06-06 15:15:17,319][24114] Avg episode reward: [(0, '0.183')] [2024-06-06 15:15:19,068][24347] Updated weights for policy 0, policy_version 33288 (0.0038) [2024-06-06 15:15:22,318][24114] Fps is (10 sec: 44236.6, 60 sec: 44236.8, 300 sec: 44486.7). Total num frames: 545538048. Throughput: 0: 44383.1. Samples: 26796080. 
Policy #0 lag: (min: 0.0, avg: 10.0, max: 23.0) [2024-06-06 15:15:22,319][24114] Avg episode reward: [(0, '0.193')] [2024-06-06 15:15:22,415][24347] Updated weights for policy 0, policy_version 33298 (0.0036) [2024-06-06 15:15:26,731][24347] Updated weights for policy 0, policy_version 33308 (0.0034) [2024-06-06 15:15:27,318][24114] Fps is (10 sec: 44236.3, 60 sec: 44509.8, 300 sec: 44486.7). Total num frames: 545751040. Throughput: 0: 44475.7. Samples: 27069400. Policy #0 lag: (min: 0.0, avg: 10.0, max: 23.0) [2024-06-06 15:15:27,319][24114] Avg episode reward: [(0, '0.179')] [2024-06-06 15:15:29,757][24347] Updated weights for policy 0, policy_version 33318 (0.0025) [2024-06-06 15:15:32,324][24114] Fps is (10 sec: 44212.0, 60 sec: 44505.7, 300 sec: 44485.9). Total num frames: 545980416. Throughput: 0: 44422.1. Samples: 27197700. Policy #0 lag: (min: 0.0, avg: 10.0, max: 23.0) [2024-06-06 15:15:32,324][24114] Avg episode reward: [(0, '0.180')] [2024-06-06 15:15:33,970][24347] Updated weights for policy 0, policy_version 33328 (0.0034) [2024-06-06 15:15:36,913][24347] Updated weights for policy 0, policy_version 33338 (0.0035) [2024-06-06 15:15:37,318][24114] Fps is (10 sec: 45876.9, 60 sec: 44509.9, 300 sec: 44542.3). Total num frames: 546209792. Throughput: 0: 44561.2. Samples: 27464640. Policy #0 lag: (min: 0.0, avg: 10.0, max: 23.0) [2024-06-06 15:15:37,318][24114] Avg episode reward: [(0, '0.180')] [2024-06-06 15:15:41,078][24347] Updated weights for policy 0, policy_version 33348 (0.0040) [2024-06-06 15:15:42,318][24114] Fps is (10 sec: 44261.5, 60 sec: 44509.8, 300 sec: 44431.2). Total num frames: 546422784. Throughput: 0: 44248.6. Samples: 27729780. Policy #0 lag: (min: 0.0, avg: 9.7, max: 22.0) [2024-06-06 15:15:42,319][24114] Avg episode reward: [(0, '0.185')] [2024-06-06 15:15:44,037][24347] Updated weights for policy 0, policy_version 33358 (0.0037) [2024-06-06 15:15:47,318][24114] Fps is (10 sec: 45874.4, 60 sec: 44784.3, 300 sec: 44597.8). 
Total num frames: 546668544. Throughput: 0: 44451.4. Samples: 27866040. Policy #0 lag: (min: 0.0, avg: 9.7, max: 22.0) [2024-06-06 15:15:47,319][24114] Avg episode reward: [(0, '0.185')] [2024-06-06 15:15:48,665][24347] Updated weights for policy 0, policy_version 33368 (0.0028) [2024-06-06 15:15:51,929][24347] Updated weights for policy 0, policy_version 33378 (0.0037) [2024-06-06 15:15:52,318][24114] Fps is (10 sec: 44237.0, 60 sec: 44238.2, 300 sec: 44431.2). Total num frames: 546865152. Throughput: 0: 44605.0. Samples: 28132060. Policy #0 lag: (min: 0.0, avg: 9.7, max: 22.0) [2024-06-06 15:15:52,318][24114] Avg episode reward: [(0, '0.181')] [2024-06-06 15:15:55,839][24347] Updated weights for policy 0, policy_version 33388 (0.0034) [2024-06-06 15:15:57,318][24114] Fps is (10 sec: 42598.8, 60 sec: 44783.1, 300 sec: 44486.7). Total num frames: 547094528. Throughput: 0: 44503.5. Samples: 28400260. Policy #0 lag: (min: 0.0, avg: 9.7, max: 22.0) [2024-06-06 15:15:57,318][24114] Avg episode reward: [(0, '0.185')] [2024-06-06 15:15:59,227][24347] Updated weights for policy 0, policy_version 33398 (0.0029) [2024-06-06 15:16:02,318][24114] Fps is (10 sec: 42598.3, 60 sec: 44236.7, 300 sec: 44431.2). Total num frames: 547291136. Throughput: 0: 44465.0. Samples: 28529360. Policy #0 lag: (min: 0.0, avg: 9.7, max: 22.0) [2024-06-06 15:16:02,319][24114] Avg episode reward: [(0, '0.182')] [2024-06-06 15:16:02,560][24326] Signal inference workers to stop experience collection... (350 times) [2024-06-06 15:16:02,607][24347] InferenceWorker_p0-w0: stopping experience collection (350 times) [2024-06-06 15:16:02,679][24326] Signal inference workers to resume experience collection... 
(350 times) [2024-06-06 15:16:02,679][24347] InferenceWorker_p0-w0: resuming experience collection (350 times) [2024-06-06 15:16:03,257][24347] Updated weights for policy 0, policy_version 33408 (0.0041) [2024-06-06 15:16:06,673][24347] Updated weights for policy 0, policy_version 33418 (0.0025) [2024-06-06 15:16:07,318][24114] Fps is (10 sec: 44236.9, 60 sec: 44511.3, 300 sec: 44486.7). Total num frames: 547536896. Throughput: 0: 44349.8. Samples: 28791820. Policy #0 lag: (min: 0.0, avg: 10.7, max: 22.0) [2024-06-06 15:16:07,318][24114] Avg episode reward: [(0, '0.184')] [2024-06-06 15:16:10,551][24347] Updated weights for policy 0, policy_version 33428 (0.0026) [2024-06-06 15:16:12,318][24114] Fps is (10 sec: 47513.7, 60 sec: 44509.8, 300 sec: 44486.7). Total num frames: 547766272. Throughput: 0: 44333.6. Samples: 29064400. Policy #0 lag: (min: 0.0, avg: 10.7, max: 22.0) [2024-06-06 15:16:12,318][24114] Avg episode reward: [(0, '0.184')] [2024-06-06 15:16:14,285][24347] Updated weights for policy 0, policy_version 33438 (0.0042) [2024-06-06 15:16:17,318][24114] Fps is (10 sec: 44236.8, 60 sec: 44510.0, 300 sec: 44542.3). Total num frames: 547979264. Throughput: 0: 44438.1. Samples: 29197160. Policy #0 lag: (min: 0.0, avg: 10.7, max: 22.0) [2024-06-06 15:16:17,318][24114] Avg episode reward: [(0, '0.185')] [2024-06-06 15:16:18,123][24347] Updated weights for policy 0, policy_version 33448 (0.0031) [2024-06-06 15:16:21,724][24347] Updated weights for policy 0, policy_version 33458 (0.0033) [2024-06-06 15:16:22,318][24114] Fps is (10 sec: 42597.4, 60 sec: 44236.6, 300 sec: 44486.7). Total num frames: 548192256. Throughput: 0: 44320.6. Samples: 29459080. Policy #0 lag: (min: 0.0, avg: 10.7, max: 22.0) [2024-06-06 15:16:22,319][24114] Avg episode reward: [(0, '0.187')] [2024-06-06 15:16:22,343][24326] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000033459_548192256.pth... 
[2024-06-06 15:16:22,393][24326] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000032806_537493504.pth [2024-06-06 15:16:25,317][24347] Updated weights for policy 0, policy_version 33468 (0.0023) [2024-06-06 15:16:27,318][24114] Fps is (10 sec: 45874.7, 60 sec: 44783.1, 300 sec: 44486.7). Total num frames: 548438016. Throughput: 0: 44322.2. Samples: 29724280. Policy #0 lag: (min: 0.0, avg: 10.9, max: 21.0) [2024-06-06 15:16:27,321][24114] Avg episode reward: [(0, '0.184')] [2024-06-06 15:16:28,927][24347] Updated weights for policy 0, policy_version 33478 (0.0033) [2024-06-06 15:16:32,318][24114] Fps is (10 sec: 44237.2, 60 sec: 44240.8, 300 sec: 44486.7). Total num frames: 548634624. Throughput: 0: 44247.9. Samples: 29857200. Policy #0 lag: (min: 0.0, avg: 10.9, max: 21.0) [2024-06-06 15:16:32,319][24114] Avg episode reward: [(0, '0.188')] [2024-06-06 15:16:32,958][24347] Updated weights for policy 0, policy_version 33488 (0.0029) [2024-06-06 15:16:36,081][24347] Updated weights for policy 0, policy_version 33498 (0.0032) [2024-06-06 15:16:37,318][24114] Fps is (10 sec: 40960.6, 60 sec: 43963.7, 300 sec: 44431.2). Total num frames: 548847616. Throughput: 0: 44252.1. Samples: 30123400. Policy #0 lag: (min: 0.0, avg: 10.9, max: 21.0) [2024-06-06 15:16:37,318][24114] Avg episode reward: [(0, '0.181')] [2024-06-06 15:16:40,058][24347] Updated weights for policy 0, policy_version 33508 (0.0041) [2024-06-06 15:16:42,318][24114] Fps is (10 sec: 45876.0, 60 sec: 44509.9, 300 sec: 44486.7). Total num frames: 549093376. Throughput: 0: 44223.1. Samples: 30390300. Policy #0 lag: (min: 0.0, avg: 10.9, max: 21.0) [2024-06-06 15:16:42,318][24114] Avg episode reward: [(0, '0.187')] [2024-06-06 15:16:43,668][24347] Updated weights for policy 0, policy_version 33518 (0.0041) [2024-06-06 15:16:47,318][24114] Fps is (10 sec: 45875.1, 60 sec: 43963.8, 300 sec: 44542.3). Total num frames: 549306368. Throughput: 0: 44406.3. Samples: 30527640. 
Policy #0 lag: (min: 0.0, avg: 9.5, max: 21.0) [2024-06-06 15:16:47,318][24114] Avg episode reward: [(0, '0.179')] [2024-06-06 15:16:47,620][24347] Updated weights for policy 0, policy_version 33528 (0.0039) [2024-06-06 15:16:51,210][24347] Updated weights for policy 0, policy_version 33538 (0.0028) [2024-06-06 15:16:52,318][24114] Fps is (10 sec: 42598.4, 60 sec: 44236.8, 300 sec: 44431.5). Total num frames: 549519360. Throughput: 0: 44443.1. Samples: 30791760. Policy #0 lag: (min: 0.0, avg: 9.5, max: 21.0) [2024-06-06 15:16:52,319][24114] Avg episode reward: [(0, '0.190')] [2024-06-06 15:16:55,248][24347] Updated weights for policy 0, policy_version 33548 (0.0035) [2024-06-06 15:16:57,318][24114] Fps is (10 sec: 45874.7, 60 sec: 44509.8, 300 sec: 44486.7). Total num frames: 549765120. Throughput: 0: 44239.5. Samples: 31055180. Policy #0 lag: (min: 0.0, avg: 9.5, max: 21.0) [2024-06-06 15:16:57,319][24114] Avg episode reward: [(0, '0.181')] [2024-06-06 15:16:58,266][24347] Updated weights for policy 0, policy_version 33558 (0.0027) [2024-06-06 15:17:02,318][24114] Fps is (10 sec: 44237.2, 60 sec: 44510.0, 300 sec: 44486.7). Total num frames: 549961728. Throughput: 0: 44360.0. Samples: 31193360. Policy #0 lag: (min: 0.0, avg: 9.5, max: 21.0) [2024-06-06 15:17:02,318][24114] Avg episode reward: [(0, '0.181')] [2024-06-06 15:17:02,422][24347] Updated weights for policy 0, policy_version 33568 (0.0031) [2024-06-06 15:17:05,509][24347] Updated weights for policy 0, policy_version 33578 (0.0026) [2024-06-06 15:17:07,318][24114] Fps is (10 sec: 42598.3, 60 sec: 44236.7, 300 sec: 44486.7). Total num frames: 550191104. Throughput: 0: 44326.4. Samples: 31453760. Policy #0 lag: (min: 0.0, avg: 9.5, max: 21.0) [2024-06-06 15:17:07,319][24114] Avg episode reward: [(0, '0.185')] [2024-06-06 15:17:09,622][24347] Updated weights for policy 0, policy_version 33588 (0.0033) [2024-06-06 15:17:12,324][24114] Fps is (10 sec: 45847.3, 60 sec: 44232.4, 300 sec: 44485.8). 
Total num frames: 550420480. Throughput: 0: 44396.8. Samples: 31722400. Policy #0 lag: (min: 0.0, avg: 10.1, max: 22.0) [2024-06-06 15:17:12,325][24114] Avg episode reward: [(0, '0.185')] [2024-06-06 15:17:12,903][24347] Updated weights for policy 0, policy_version 33598 (0.0029) [2024-06-06 15:17:17,266][24347] Updated weights for policy 0, policy_version 33608 (0.0026) [2024-06-06 15:17:17,318][24114] Fps is (10 sec: 44237.4, 60 sec: 44236.8, 300 sec: 44542.3). Total num frames: 550633472. Throughput: 0: 44433.5. Samples: 31856700. Policy #0 lag: (min: 0.0, avg: 10.1, max: 22.0) [2024-06-06 15:17:17,318][24114] Avg episode reward: [(0, '0.185')] [2024-06-06 15:17:20,210][24347] Updated weights for policy 0, policy_version 33618 (0.0036) [2024-06-06 15:17:22,322][24114] Fps is (10 sec: 45886.3, 60 sec: 44780.4, 300 sec: 44541.7). Total num frames: 550879232. Throughput: 0: 44574.6. Samples: 32129420. Policy #0 lag: (min: 0.0, avg: 10.1, max: 22.0) [2024-06-06 15:17:22,322][24114] Avg episode reward: [(0, '0.186')] [2024-06-06 15:17:24,768][24347] Updated weights for policy 0, policy_version 33628 (0.0032) [2024-06-06 15:17:27,318][24114] Fps is (10 sec: 47513.6, 60 sec: 44510.0, 300 sec: 44486.7). Total num frames: 551108608. Throughput: 0: 44449.8. Samples: 32390540. Policy #0 lag: (min: 0.0, avg: 10.1, max: 22.0) [2024-06-06 15:17:27,318][24114] Avg episode reward: [(0, '0.185')] [2024-06-06 15:17:27,741][24347] Updated weights for policy 0, policy_version 33638 (0.0027) [2024-06-06 15:17:31,873][24347] Updated weights for policy 0, policy_version 33648 (0.0033) [2024-06-06 15:17:32,097][24326] Signal inference workers to stop experience collection... (400 times) [2024-06-06 15:17:32,099][24326] Signal inference workers to resume experience collection... 
(400 times) [2024-06-06 15:17:32,113][24347] InferenceWorker_p0-w0: stopping experience collection (400 times) [2024-06-06 15:17:32,113][24347] InferenceWorker_p0-w0: resuming experience collection (400 times) [2024-06-06 15:17:32,318][24114] Fps is (10 sec: 44252.2, 60 sec: 44783.0, 300 sec: 44597.8). Total num frames: 551321600. Throughput: 0: 44335.4. Samples: 32522740. Policy #0 lag: (min: 0.0, avg: 10.1, max: 22.0) [2024-06-06 15:17:32,319][24114] Avg episode reward: [(0, '0.185')] [2024-06-06 15:17:35,057][24347] Updated weights for policy 0, policy_version 33658 (0.0027) [2024-06-06 15:17:37,318][24114] Fps is (10 sec: 42598.1, 60 sec: 44782.9, 300 sec: 44542.3). Total num frames: 551534592. Throughput: 0: 44558.2. Samples: 32796880. Policy #0 lag: (min: 0.0, avg: 10.1, max: 22.0) [2024-06-06 15:17:37,319][24114] Avg episode reward: [(0, '0.181')] [2024-06-06 15:17:39,082][24347] Updated weights for policy 0, policy_version 33668 (0.0024) [2024-06-06 15:17:42,121][24347] Updated weights for policy 0, policy_version 33678 (0.0034) [2024-06-06 15:17:42,318][24114] Fps is (10 sec: 45875.0, 60 sec: 44782.8, 300 sec: 44542.3). Total num frames: 551780352. Throughput: 0: 44655.5. Samples: 33064680. Policy #0 lag: (min: 0.0, avg: 10.1, max: 22.0) [2024-06-06 15:17:42,319][24114] Avg episode reward: [(0, '0.185')] [2024-06-06 15:17:46,545][24347] Updated weights for policy 0, policy_version 33688 (0.0039) [2024-06-06 15:17:47,318][24114] Fps is (10 sec: 42598.6, 60 sec: 44236.8, 300 sec: 44486.7). Total num frames: 551960576. Throughput: 0: 44575.1. Samples: 33199240. Policy #0 lag: (min: 0.0, avg: 10.1, max: 22.0) [2024-06-06 15:17:47,318][24114] Avg episode reward: [(0, '0.178')] [2024-06-06 15:17:49,633][24347] Updated weights for policy 0, policy_version 33698 (0.0033) [2024-06-06 15:17:52,318][24114] Fps is (10 sec: 42598.4, 60 sec: 44782.8, 300 sec: 44542.2). Total num frames: 552206336. Throughput: 0: 44759.5. Samples: 33467940. 
Policy #0 lag: (min: 1.0, avg: 11.0, max: 23.0) [2024-06-06 15:17:52,319][24114] Avg episode reward: [(0, '0.186')] [2024-06-06 15:17:54,002][24347] Updated weights for policy 0, policy_version 33708 (0.0045) [2024-06-06 15:17:56,986][24347] Updated weights for policy 0, policy_version 33718 (0.0028) [2024-06-06 15:17:57,318][24114] Fps is (10 sec: 47513.2, 60 sec: 44509.9, 300 sec: 44486.7). Total num frames: 552435712. Throughput: 0: 44724.1. Samples: 33734720. Policy #0 lag: (min: 1.0, avg: 11.0, max: 23.0) [2024-06-06 15:17:57,319][24114] Avg episode reward: [(0, '0.186')] [2024-06-06 15:18:01,118][24347] Updated weights for policy 0, policy_version 33728 (0.0031) [2024-06-06 15:18:02,318][24114] Fps is (10 sec: 45875.3, 60 sec: 45055.9, 300 sec: 44653.3). Total num frames: 552665088. Throughput: 0: 44790.5. Samples: 33872280. Policy #0 lag: (min: 1.0, avg: 11.0, max: 23.0) [2024-06-06 15:18:02,327][24114] Avg episode reward: [(0, '0.186')] [2024-06-06 15:18:04,243][24347] Updated weights for policy 0, policy_version 33738 (0.0042) [2024-06-06 15:18:07,318][24114] Fps is (10 sec: 42597.9, 60 sec: 44509.8, 300 sec: 44486.7). Total num frames: 552861696. Throughput: 0: 44671.8. Samples: 34139500. Policy #0 lag: (min: 1.0, avg: 11.0, max: 23.0) [2024-06-06 15:18:07,319][24114] Avg episode reward: [(0, '0.183')] [2024-06-06 15:18:08,441][24347] Updated weights for policy 0, policy_version 33748 (0.0032) [2024-06-06 15:18:11,815][24347] Updated weights for policy 0, policy_version 33758 (0.0031) [2024-06-06 15:18:12,318][24114] Fps is (10 sec: 42598.5, 60 sec: 44514.3, 300 sec: 44431.2). Total num frames: 553091072. Throughput: 0: 44691.9. Samples: 34401680. Policy #0 lag: (min: 1.0, avg: 11.0, max: 23.0) [2024-06-06 15:18:12,318][24114] Avg episode reward: [(0, '0.189')] [2024-06-06 15:18:15,987][24347] Updated weights for policy 0, policy_version 33768 (0.0030) [2024-06-06 15:18:17,318][24114] Fps is (10 sec: 45876.1, 60 sec: 44782.9, 300 sec: 44542.3). 
Total num frames: 553320448. Throughput: 0: 44803.7. Samples: 34538900. Policy #0 lag: (min: 0.0, avg: 9.8, max: 20.0) [2024-06-06 15:18:17,318][24114] Avg episode reward: [(0, '0.188')] [2024-06-06 15:18:19,169][24347] Updated weights for policy 0, policy_version 33778 (0.0041) [2024-06-06 15:18:22,318][24114] Fps is (10 sec: 42598.2, 60 sec: 43966.3, 300 sec: 44431.2). Total num frames: 553517056. Throughput: 0: 44440.3. Samples: 34796700. Policy #0 lag: (min: 0.0, avg: 9.8, max: 20.0) [2024-06-06 15:18:22,319][24114] Avg episode reward: [(0, '0.187')] [2024-06-06 15:18:22,331][24326] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000033784_553517056.pth... [2024-06-06 15:18:22,385][24326] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000033136_542900224.pth [2024-06-06 15:18:23,578][24347] Updated weights for policy 0, policy_version 33788 (0.0027) [2024-06-06 15:18:26,470][24347] Updated weights for policy 0, policy_version 33798 (0.0038) [2024-06-06 15:18:27,318][24114] Fps is (10 sec: 44236.7, 60 sec: 44236.8, 300 sec: 44431.2). Total num frames: 553762816. Throughput: 0: 44415.3. Samples: 35063360. Policy #0 lag: (min: 0.0, avg: 9.8, max: 20.0) [2024-06-06 15:18:27,318][24114] Avg episode reward: [(0, '0.182')] [2024-06-06 15:18:30,649][24347] Updated weights for policy 0, policy_version 33808 (0.0027) [2024-06-06 15:18:32,320][24114] Fps is (10 sec: 47504.8, 60 sec: 44508.5, 300 sec: 44542.0). Total num frames: 553992192. Throughput: 0: 44541.1. Samples: 35203680. Policy #0 lag: (min: 0.0, avg: 9.8, max: 20.0) [2024-06-06 15:18:32,321][24114] Avg episode reward: [(0, '0.190')] [2024-06-06 15:18:33,789][24347] Updated weights for policy 0, policy_version 33818 (0.0038) [2024-06-06 15:18:37,318][24114] Fps is (10 sec: 42598.1, 60 sec: 44236.8, 300 sec: 44486.7). Total num frames: 554188800. Throughput: 0: 44413.9. Samples: 35466560. 
Policy #0 lag: (min: 1.0, avg: 10.0, max: 21.0) [2024-06-06 15:18:37,319][24114] Avg episode reward: [(0, '0.189')] [2024-06-06 15:18:37,804][24347] Updated weights for policy 0, policy_version 33828 (0.0023) [2024-06-06 15:18:41,188][24347] Updated weights for policy 0, policy_version 33838 (0.0042) [2024-06-06 15:18:42,320][24114] Fps is (10 sec: 44236.5, 60 sec: 44235.4, 300 sec: 44430.9). Total num frames: 554434560. Throughput: 0: 44423.4. Samples: 35733860. Policy #0 lag: (min: 1.0, avg: 10.0, max: 21.0) [2024-06-06 15:18:42,321][24114] Avg episode reward: [(0, '0.180')] [2024-06-06 15:18:45,566][24347] Updated weights for policy 0, policy_version 33848 (0.0037) [2024-06-06 15:18:47,320][24114] Fps is (10 sec: 45866.0, 60 sec: 44781.4, 300 sec: 44430.9). Total num frames: 554647552. Throughput: 0: 44389.6. Samples: 35869900. Policy #0 lag: (min: 1.0, avg: 10.0, max: 21.0) [2024-06-06 15:18:47,321][24114] Avg episode reward: [(0, '0.190')] [2024-06-06 15:18:48,535][24347] Updated weights for policy 0, policy_version 33858 (0.0023) [2024-06-06 15:18:52,320][24114] Fps is (10 sec: 42598.7, 60 sec: 44235.4, 300 sec: 44430.9). Total num frames: 554860544. Throughput: 0: 44290.2. Samples: 36132640. Policy #0 lag: (min: 1.0, avg: 10.0, max: 21.0) [2024-06-06 15:18:52,321][24114] Avg episode reward: [(0, '0.182')] [2024-06-06 15:18:52,776][24347] Updated weights for policy 0, policy_version 33868 (0.0032) [2024-06-06 15:18:55,942][24347] Updated weights for policy 0, policy_version 33878 (0.0031) [2024-06-06 15:18:57,318][24114] Fps is (10 sec: 44245.4, 60 sec: 44236.8, 300 sec: 44431.2). Total num frames: 555089920. Throughput: 0: 44358.7. Samples: 36397820. Policy #0 lag: (min: 0.0, avg: 10.2, max: 21.0) [2024-06-06 15:18:57,319][24114] Avg episode reward: [(0, '0.195')] [2024-06-06 15:19:00,186][24347] Updated weights for policy 0, policy_version 33888 (0.0030) [2024-06-06 15:19:00,668][24326] Signal inference workers to stop experience collection... 
(450 times) [2024-06-06 15:19:00,704][24347] InferenceWorker_p0-w0: stopping experience collection (450 times) [2024-06-06 15:19:00,713][24326] Signal inference workers to resume experience collection... (450 times) [2024-06-06 15:19:00,723][24347] InferenceWorker_p0-w0: resuming experience collection (450 times) [2024-06-06 15:19:02,318][24114] Fps is (10 sec: 44245.1, 60 sec: 43963.7, 300 sec: 44375.6). Total num frames: 555302912. Throughput: 0: 44198.1. Samples: 36527820. Policy #0 lag: (min: 0.0, avg: 10.2, max: 21.0) [2024-06-06 15:19:02,319][24114] Avg episode reward: [(0, '0.182')] [2024-06-06 15:19:03,638][24347] Updated weights for policy 0, policy_version 33898 (0.0046) [2024-06-06 15:19:07,318][24114] Fps is (10 sec: 44237.4, 60 sec: 44510.0, 300 sec: 44486.7). Total num frames: 555532288. Throughput: 0: 44377.1. Samples: 36793660. Policy #0 lag: (min: 0.0, avg: 10.2, max: 21.0) [2024-06-06 15:19:07,318][24114] Avg episode reward: [(0, '0.184')] [2024-06-06 15:19:07,578][24347] Updated weights for policy 0, policy_version 33908 (0.0036) [2024-06-06 15:19:10,979][24347] Updated weights for policy 0, policy_version 33918 (0.0037) [2024-06-06 15:19:12,318][24114] Fps is (10 sec: 45875.2, 60 sec: 44509.9, 300 sec: 44432.1). Total num frames: 555761664. Throughput: 0: 44239.9. Samples: 37054160. Policy #0 lag: (min: 0.0, avg: 10.2, max: 21.0) [2024-06-06 15:19:12,319][24114] Avg episode reward: [(0, '0.189')] [2024-06-06 15:19:15,286][24347] Updated weights for policy 0, policy_version 33928 (0.0032) [2024-06-06 15:19:17,318][24114] Fps is (10 sec: 44236.0, 60 sec: 44236.7, 300 sec: 44375.6). Total num frames: 555974656. Throughput: 0: 44245.4. Samples: 37194640. 
Policy #0 lag: (min: 0.0, avg: 10.2, max: 21.0) [2024-06-06 15:19:17,319][24114] Avg episode reward: [(0, '0.186')] [2024-06-06 15:19:18,185][24347] Updated weights for policy 0, policy_version 33938 (0.0035) [2024-06-06 15:19:22,318][24114] Fps is (10 sec: 42598.7, 60 sec: 44510.0, 300 sec: 44431.2). Total num frames: 556187648. Throughput: 0: 44133.4. Samples: 37452560. Policy #0 lag: (min: 0.0, avg: 10.3, max: 21.0) [2024-06-06 15:19:22,318][24114] Avg episode reward: [(0, '0.189')] [2024-06-06 15:19:22,413][24347] Updated weights for policy 0, policy_version 33948 (0.0032) [2024-06-06 15:19:25,916][24347] Updated weights for policy 0, policy_version 33958 (0.0044) [2024-06-06 15:19:27,318][24114] Fps is (10 sec: 44237.6, 60 sec: 44236.8, 300 sec: 44431.2). Total num frames: 556417024. Throughput: 0: 44176.2. Samples: 37721700. Policy #0 lag: (min: 0.0, avg: 10.3, max: 21.0) [2024-06-06 15:19:27,318][24114] Avg episode reward: [(0, '0.191')] [2024-06-06 15:19:29,802][24347] Updated weights for policy 0, policy_version 33968 (0.0035) [2024-06-06 15:19:32,318][24114] Fps is (10 sec: 44236.5, 60 sec: 43965.1, 300 sec: 44375.6). Total num frames: 556630016. Throughput: 0: 44076.2. Samples: 37853240. Policy #0 lag: (min: 0.0, avg: 10.3, max: 21.0) [2024-06-06 15:19:32,318][24114] Avg episode reward: [(0, '0.186')] [2024-06-06 15:19:33,643][24347] Updated weights for policy 0, policy_version 33978 (0.0035) [2024-06-06 15:19:36,907][24347] Updated weights for policy 0, policy_version 33988 (0.0032) [2024-06-06 15:19:37,319][24114] Fps is (10 sec: 44234.4, 60 sec: 44509.5, 300 sec: 44431.1). Total num frames: 556859392. Throughput: 0: 44102.3. Samples: 38117180. Policy #0 lag: (min: 0.0, avg: 10.3, max: 21.0) [2024-06-06 15:19:37,319][24114] Avg episode reward: [(0, '0.189')] [2024-06-06 15:19:40,765][24347] Updated weights for policy 0, policy_version 33998 (0.0030) [2024-06-06 15:19:42,318][24114] Fps is (10 sec: 47513.3, 60 sec: 44511.3, 300 sec: 44487.0). 
Total num frames: 557105152. Throughput: 0: 44160.0. Samples: 38385020. Policy #0 lag: (min: 0.0, avg: 10.9, max: 22.0) [2024-06-06 15:19:42,319][24114] Avg episode reward: [(0, '0.188')] [2024-06-06 15:19:44,512][24347] Updated weights for policy 0, policy_version 34008 (0.0029) [2024-06-06 15:19:47,318][24114] Fps is (10 sec: 44238.9, 60 sec: 44238.3, 300 sec: 44375.9). Total num frames: 557301760. Throughput: 0: 44309.4. Samples: 38521740. Policy #0 lag: (min: 0.0, avg: 10.9, max: 22.0) [2024-06-06 15:19:47,319][24114] Avg episode reward: [(0, '0.189')] [2024-06-06 15:19:47,898][24347] Updated weights for policy 0, policy_version 34018 (0.0037) [2024-06-06 15:19:52,003][24347] Updated weights for policy 0, policy_version 34028 (0.0044) [2024-06-06 15:19:52,318][24114] Fps is (10 sec: 40960.2, 60 sec: 44238.2, 300 sec: 44431.2). Total num frames: 557514752. Throughput: 0: 44155.9. Samples: 38780680. Policy #0 lag: (min: 0.0, avg: 10.9, max: 22.0) [2024-06-06 15:19:52,319][24114] Avg episode reward: [(0, '0.192')] [2024-06-06 15:19:55,780][24347] Updated weights for policy 0, policy_version 34038 (0.0040) [2024-06-06 15:19:57,318][24114] Fps is (10 sec: 44237.2, 60 sec: 44236.9, 300 sec: 44431.2). Total num frames: 557744128. Throughput: 0: 44283.2. Samples: 39046900. Policy #0 lag: (min: 0.0, avg: 10.9, max: 22.0) [2024-06-06 15:19:57,318][24114] Avg episode reward: [(0, '0.189')] [2024-06-06 15:19:59,405][24347] Updated weights for policy 0, policy_version 34048 (0.0030) [2024-06-06 15:20:02,318][24114] Fps is (10 sec: 45875.8, 60 sec: 44510.0, 300 sec: 44431.5). Total num frames: 557973504. Throughput: 0: 44081.5. Samples: 39178300. 
Policy #0 lag: (min: 0.0, avg: 10.9, max: 22.0) [2024-06-06 15:20:02,318][24114] Avg episode reward: [(0, '0.188')] [2024-06-06 15:20:03,012][24347] Updated weights for policy 0, policy_version 34058 (0.0037) [2024-06-06 15:20:06,522][24347] Updated weights for policy 0, policy_version 34068 (0.0034) [2024-06-06 15:20:07,322][24114] Fps is (10 sec: 44220.4, 60 sec: 44234.1, 300 sec: 44375.1). Total num frames: 558186496. Throughput: 0: 44185.7. Samples: 39441080. Policy #0 lag: (min: 0.0, avg: 9.3, max: 22.0) [2024-06-06 15:20:07,322][24114] Avg episode reward: [(0, '0.186')] [2024-06-06 15:20:10,510][24347] Updated weights for policy 0, policy_version 34078 (0.0041) [2024-06-06 15:20:12,318][24114] Fps is (10 sec: 44236.5, 60 sec: 44236.8, 300 sec: 44431.2). Total num frames: 558415872. Throughput: 0: 44184.8. Samples: 39710020. Policy #0 lag: (min: 0.0, avg: 9.3, max: 22.0) [2024-06-06 15:20:12,318][24114] Avg episode reward: [(0, '0.190')] [2024-06-06 15:20:14,178][24347] Updated weights for policy 0, policy_version 34088 (0.0030) [2024-06-06 15:20:17,318][24114] Fps is (10 sec: 44252.8, 60 sec: 44236.9, 300 sec: 44375.6). Total num frames: 558628864. Throughput: 0: 44296.0. Samples: 39846560. Policy #0 lag: (min: 0.0, avg: 9.3, max: 22.0) [2024-06-06 15:20:17,318][24114] Avg episode reward: [(0, '0.191')] [2024-06-06 15:20:17,636][24347] Updated weights for policy 0, policy_version 34098 (0.0028) [2024-06-06 15:20:21,293][24347] Updated weights for policy 0, policy_version 34108 (0.0039) [2024-06-06 15:20:22,318][24114] Fps is (10 sec: 42598.2, 60 sec: 44236.8, 300 sec: 44375.7). Total num frames: 558841856. Throughput: 0: 44218.2. Samples: 40106980. Policy #0 lag: (min: 0.0, avg: 9.3, max: 22.0) [2024-06-06 15:20:22,319][24114] Avg episode reward: [(0, '0.192')] [2024-06-06 15:20:22,383][24326] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000034110_558858240.pth... 
[2024-06-06 15:20:22,444][24326] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000033459_548192256.pth [2024-06-06 15:20:23,495][24326] Signal inference workers to stop experience collection... (500 times) [2024-06-06 15:20:23,495][24326] Signal inference workers to resume experience collection... (500 times) [2024-06-06 15:20:23,509][24347] InferenceWorker_p0-w0: stopping experience collection (500 times) [2024-06-06 15:20:23,509][24347] InferenceWorker_p0-w0: resuming experience collection (500 times) [2024-06-06 15:20:25,292][24347] Updated weights for policy 0, policy_version 34118 (0.0038) [2024-06-06 15:20:27,318][24114] Fps is (10 sec: 44237.0, 60 sec: 44236.8, 300 sec: 44376.5). Total num frames: 559071232. Throughput: 0: 44251.7. Samples: 40376340. Policy #0 lag: (min: 0.0, avg: 9.5, max: 20.0) [2024-06-06 15:20:27,321][24114] Avg episode reward: [(0, '0.188')] [2024-06-06 15:20:28,862][24347] Updated weights for policy 0, policy_version 34128 (0.0028) [2024-06-06 15:20:32,318][24114] Fps is (10 sec: 44237.4, 60 sec: 44236.9, 300 sec: 44320.1). Total num frames: 559284224. Throughput: 0: 44074.3. Samples: 40505080. Policy #0 lag: (min: 0.0, avg: 9.5, max: 20.0) [2024-06-06 15:20:32,318][24114] Avg episode reward: [(0, '0.193')] [2024-06-06 15:20:32,530][24347] Updated weights for policy 0, policy_version 34138 (0.0027) [2024-06-06 15:20:35,933][24347] Updated weights for policy 0, policy_version 34148 (0.0031) [2024-06-06 15:20:37,318][24114] Fps is (10 sec: 45875.0, 60 sec: 44510.2, 300 sec: 44431.2). Total num frames: 559529984. Throughput: 0: 44260.0. Samples: 40772380. Policy #0 lag: (min: 0.0, avg: 9.5, max: 20.0) [2024-06-06 15:20:37,319][24114] Avg episode reward: [(0, '0.189')] [2024-06-06 15:20:40,080][24347] Updated weights for policy 0, policy_version 34158 (0.0036) [2024-06-06 15:20:42,318][24114] Fps is (10 sec: 47513.1, 60 sec: 44236.9, 300 sec: 44375.7). Total num frames: 559759360. Throughput: 0: 44239.5. 
Samples: 41037680. Policy #0 lag: (min: 0.0, avg: 9.5, max: 20.0) [2024-06-06 15:20:42,318][24114] Avg episode reward: [(0, '0.192')] [2024-06-06 15:20:43,482][24347] Updated weights for policy 0, policy_version 34168 (0.0037) [2024-06-06 15:20:47,318][24114] Fps is (10 sec: 42598.4, 60 sec: 44236.8, 300 sec: 44375.6). Total num frames: 559955968. Throughput: 0: 44334.6. Samples: 41173360. Policy #0 lag: (min: 0.0, avg: 9.5, max: 20.0) [2024-06-06 15:20:47,319][24114] Avg episode reward: [(0, '0.186')] [2024-06-06 15:20:47,659][24347] Updated weights for policy 0, policy_version 34178 (0.0040) [2024-06-06 15:20:50,664][24347] Updated weights for policy 0, policy_version 34188 (0.0026) [2024-06-06 15:20:52,318][24114] Fps is (10 sec: 42598.7, 60 sec: 44510.0, 300 sec: 44375.7). Total num frames: 560185344. Throughput: 0: 44233.9. Samples: 41431440. Policy #0 lag: (min: 0.0, avg: 10.8, max: 22.0) [2024-06-06 15:20:52,318][24114] Avg episode reward: [(0, '0.191')] [2024-06-06 15:20:55,070][24347] Updated weights for policy 0, policy_version 34198 (0.0022) [2024-06-06 15:20:57,324][24114] Fps is (10 sec: 45848.0, 60 sec: 44505.4, 300 sec: 44485.8). Total num frames: 560414720. Throughput: 0: 44273.3. Samples: 41702580. Policy #0 lag: (min: 0.0, avg: 10.8, max: 22.0) [2024-06-06 15:20:57,325][24114] Avg episode reward: [(0, '0.190')] [2024-06-06 15:20:58,334][24347] Updated weights for policy 0, policy_version 34208 (0.0038) [2024-06-06 15:21:02,109][24347] Updated weights for policy 0, policy_version 34218 (0.0042) [2024-06-06 15:21:02,320][24114] Fps is (10 sec: 44227.7, 60 sec: 44235.3, 300 sec: 44375.3). Total num frames: 560627712. Throughput: 0: 44284.7. Samples: 41839460. 
Policy #0 lag: (min: 0.0, avg: 10.8, max: 22.0)
[2024-06-06 15:21:02,321][24114] Avg episode reward: [(0, '0.186')]
[2024-06-06 15:21:05,465][24347] Updated weights for policy 0, policy_version 34228 (0.0037)
[2024-06-06 15:21:07,318][24114] Fps is (10 sec: 42623.9, 60 sec: 44239.5, 300 sec: 44320.1). Total num frames: 560840704. Throughput: 0: 44367.2. Samples: 42103500. Policy #0 lag: (min: 0.0, avg: 10.8, max: 22.0)
[2024-06-06 15:21:07,318][24114] Avg episode reward: [(0, '0.196')]
[2024-06-06 15:21:09,574][24347] Updated weights for policy 0, policy_version 34238 (0.0058)
[2024-06-06 15:21:12,318][24114] Fps is (10 sec: 45884.7, 60 sec: 44509.9, 300 sec: 44431.2). Total num frames: 561086464. Throughput: 0: 44236.9. Samples: 42367000. Policy #0 lag: (min: 0.0, avg: 10.0, max: 21.0)
[2024-06-06 15:21:12,318][24114] Avg episode reward: [(0, '0.194')]
[2024-06-06 15:21:13,162][24347] Updated weights for policy 0, policy_version 34248 (0.0035)
[2024-06-06 15:21:17,022][24347] Updated weights for policy 0, policy_version 34258 (0.0028)
[2024-06-06 15:21:17,324][24114] Fps is (10 sec: 45848.0, 60 sec: 44505.5, 300 sec: 44430.3). Total num frames: 561299456. Throughput: 0: 44382.5. Samples: 42502560. Policy #0 lag: (min: 0.0, avg: 10.0, max: 21.0)
[2024-06-06 15:21:17,333][24114] Avg episode reward: [(0, '0.192')]
[2024-06-06 15:21:20,249][24347] Updated weights for policy 0, policy_version 34268 (0.0024)
[2024-06-06 15:21:22,318][24114] Fps is (10 sec: 42597.9, 60 sec: 44509.9, 300 sec: 44320.1). Total num frames: 561512448. Throughput: 0: 44440.9. Samples: 42772220. Policy #0 lag: (min: 0.0, avg: 10.0, max: 21.0)
[2024-06-06 15:21:22,318][24114] Avg episode reward: [(0, '0.194')]
[2024-06-06 15:21:24,328][24347] Updated weights for policy 0, policy_version 34278 (0.0040)
[2024-06-06 15:21:27,318][24114] Fps is (10 sec: 44262.6, 60 sec: 44509.8, 300 sec: 44431.2). Total num frames: 561741824. Throughput: 0: 44466.6. Samples: 43038680. Policy #0 lag: (min: 0.0, avg: 10.0, max: 21.0)
[2024-06-06 15:21:27,318][24114] Avg episode reward: [(0, '0.192')]
[2024-06-06 15:21:27,541][24347] Updated weights for policy 0, policy_version 34288 (0.0037)
[2024-06-06 15:21:31,716][24347] Updated weights for policy 0, policy_version 34298 (0.0030)
[2024-06-06 15:21:32,324][24114] Fps is (10 sec: 44210.5, 60 sec: 44505.4, 300 sec: 44430.3). Total num frames: 561954816. Throughput: 0: 44453.2. Samples: 43174020. Policy #0 lag: (min: 0.0, avg: 10.0, max: 21.0)
[2024-06-06 15:21:32,325][24114] Avg episode reward: [(0, '0.190')]
[2024-06-06 15:21:34,873][24347] Updated weights for policy 0, policy_version 34308 (0.0044)
[2024-06-06 15:21:37,318][24114] Fps is (10 sec: 42598.6, 60 sec: 43963.7, 300 sec: 44320.1). Total num frames: 562167808. Throughput: 0: 44597.2. Samples: 43438320. Policy #0 lag: (min: 0.0, avg: 9.4, max: 21.0)
[2024-06-06 15:21:37,318][24114] Avg episode reward: [(0, '0.187')]
[2024-06-06 15:21:38,877][24347] Updated weights for policy 0, policy_version 34318 (0.0031)
[2024-06-06 15:21:42,319][24114] Fps is (10 sec: 45898.8, 60 sec: 44236.2, 300 sec: 44431.0). Total num frames: 562413568. Throughput: 0: 44465.5. Samples: 43703300. Policy #0 lag: (min: 0.0, avg: 9.4, max: 21.0)
[2024-06-06 15:21:42,319][24114] Avg episode reward: [(0, '0.190')]
[2024-06-06 15:21:42,516][24347] Updated weights for policy 0, policy_version 34328 (0.0041)
[2024-06-06 15:21:46,550][24347] Updated weights for policy 0, policy_version 34338 (0.0034)
[2024-06-06 15:21:47,318][24114] Fps is (10 sec: 45875.0, 60 sec: 44509.8, 300 sec: 44431.2). Total num frames: 562626560. Throughput: 0: 44406.8. Samples: 43837680. Policy #0 lag: (min: 0.0, avg: 9.4, max: 21.0)
[2024-06-06 15:21:47,319][24114] Avg episode reward: [(0, '0.195')]
[2024-06-06 15:21:48,909][24326] Signal inference workers to stop experience collection... (550 times)
[2024-06-06 15:21:48,956][24347] InferenceWorker_p0-w0: stopping experience collection (550 times)
[2024-06-06 15:21:48,962][24326] Signal inference workers to resume experience collection... (550 times)
[2024-06-06 15:21:48,966][24347] InferenceWorker_p0-w0: resuming experience collection (550 times)
[2024-06-06 15:21:50,169][24347] Updated weights for policy 0, policy_version 34348 (0.0030)
[2024-06-06 15:21:52,318][24114] Fps is (10 sec: 42601.8, 60 sec: 44236.7, 300 sec: 44320.1). Total num frames: 562839552. Throughput: 0: 44434.2. Samples: 44103040. Policy #0 lag: (min: 0.0, avg: 9.4, max: 21.0)
[2024-06-06 15:21:52,319][24114] Avg episode reward: [(0, '0.197')]
[2024-06-06 15:21:54,209][24347] Updated weights for policy 0, policy_version 34358 (0.0036)
[2024-06-06 15:21:57,315][24347] Updated weights for policy 0, policy_version 34368 (0.0035)
[2024-06-06 15:21:57,318][24114] Fps is (10 sec: 45875.3, 60 sec: 44514.3, 300 sec: 44486.7). Total num frames: 563085312. Throughput: 0: 44446.6. Samples: 44367100. Policy #0 lag: (min: 1.0, avg: 10.0, max: 21.0)
[2024-06-06 15:21:57,318][24114] Avg episode reward: [(0, '0.191')]
[2024-06-06 15:22:01,420][24347] Updated weights for policy 0, policy_version 34378 (0.0031)
[2024-06-06 15:22:02,318][24114] Fps is (10 sec: 44236.9, 60 sec: 44238.2, 300 sec: 44375.7). Total num frames: 563281920. Throughput: 0: 44375.1. Samples: 44499180. Policy #0 lag: (min: 1.0, avg: 10.0, max: 21.0)
[2024-06-06 15:22:02,319][24114] Avg episode reward: [(0, '0.193')]
[2024-06-06 15:22:05,105][24347] Updated weights for policy 0, policy_version 34388 (0.0029)
[2024-06-06 15:22:07,318][24114] Fps is (10 sec: 40960.0, 60 sec: 44236.8, 300 sec: 44321.0). Total num frames: 563494912. Throughput: 0: 44260.9. Samples: 44763960. Policy #0 lag: (min: 1.0, avg: 10.0, max: 21.0)
[2024-06-06 15:22:07,318][24114] Avg episode reward: [(0, '0.195')]
[2024-06-06 15:22:08,485][24347] Updated weights for policy 0, policy_version 34398 (0.0046)
[2024-06-06 15:22:12,318][24114] Fps is (10 sec: 44235.0, 60 sec: 43963.4, 300 sec: 44375.6). Total num frames: 563724288. Throughput: 0: 44181.9. Samples: 45026880. Policy #0 lag: (min: 1.0, avg: 10.0, max: 21.0)
[2024-06-06 15:22:12,319][24114] Avg episode reward: [(0, '0.192')]
[2024-06-06 15:22:12,469][24347] Updated weights for policy 0, policy_version 34408 (0.0021)
[2024-06-06 15:22:15,994][24347] Updated weights for policy 0, policy_version 34418 (0.0027)
[2024-06-06 15:22:17,318][24114] Fps is (10 sec: 45875.3, 60 sec: 44241.1, 300 sec: 44320.6). Total num frames: 563953664. Throughput: 0: 44265.9. Samples: 45165720. Policy #0 lag: (min: 0.0, avg: 10.8, max: 21.0)
[2024-06-06 15:22:17,318][24114] Avg episode reward: [(0, '0.198')]
[2024-06-06 15:22:19,570][24347] Updated weights for policy 0, policy_version 34428 (0.0040)
[2024-06-06 15:22:22,318][24114] Fps is (10 sec: 42600.2, 60 sec: 43963.7, 300 sec: 44209.0). Total num frames: 564150272. Throughput: 0: 44292.5. Samples: 45431480. Policy #0 lag: (min: 0.0, avg: 10.8, max: 21.0)
[2024-06-06 15:22:22,318][24114] Avg episode reward: [(0, '0.187')]
[2024-06-06 15:22:22,335][24326] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000034434_564166656.pth...
[2024-06-06 15:22:22,390][24326] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000033784_553517056.pth
[2024-06-06 15:22:23,564][24347] Updated weights for policy 0, policy_version 34438 (0.0041)
[2024-06-06 15:22:27,175][24347] Updated weights for policy 0, policy_version 34448 (0.0038)
[2024-06-06 15:22:27,320][24114] Fps is (10 sec: 45866.5, 60 sec: 44508.5, 300 sec: 44375.4). Total num frames: 564412416. Throughput: 0: 44405.2. Samples: 45701580. Policy #0 lag: (min: 0.0, avg: 10.8, max: 21.0)
[2024-06-06 15:22:27,321][24114] Avg episode reward: [(0, '0.182')]
[2024-06-06 15:22:30,721][24347] Updated weights for policy 0, policy_version 34458 (0.0050)
[2024-06-06 15:22:32,318][24114] Fps is (10 sec: 47514.0, 60 sec: 44514.3, 300 sec: 44375.7). Total num frames: 564625408. Throughput: 0: 44349.4. Samples: 45833400. Policy #0 lag: (min: 0.0, avg: 10.8, max: 21.0)
[2024-06-06 15:22:32,318][24114] Avg episode reward: [(0, '0.196')]
[2024-06-06 15:22:34,478][24347] Updated weights for policy 0, policy_version 34468 (0.0024)
[2024-06-06 15:22:37,320][24114] Fps is (10 sec: 40959.8, 60 sec: 44235.4, 300 sec: 44208.7). Total num frames: 564822016. Throughput: 0: 44316.3. Samples: 46097360. Policy #0 lag: (min: 0.0, avg: 10.8, max: 21.0)
[2024-06-06 15:22:37,321][24114] Avg episode reward: [(0, '0.186')]
[2024-06-06 15:22:37,952][24347] Updated weights for policy 0, policy_version 34478 (0.0031)
[2024-06-06 15:22:41,806][24347] Updated weights for policy 0, policy_version 34488 (0.0032)
[2024-06-06 15:22:42,318][24114] Fps is (10 sec: 42598.0, 60 sec: 43964.3, 300 sec: 44375.6). Total num frames: 565051392. Throughput: 0: 44202.7. Samples: 46356220. Policy #0 lag: (min: 0.0, avg: 10.2, max: 20.0)
[2024-06-06 15:22:42,319][24114] Avg episode reward: [(0, '0.192')]
[2024-06-06 15:22:45,564][24347] Updated weights for policy 0, policy_version 34498 (0.0022)
[2024-06-06 15:22:47,318][24114] Fps is (10 sec: 45884.4, 60 sec: 44236.9, 300 sec: 44320.1). Total num frames: 565280768. Throughput: 0: 44331.6. Samples: 46494100. Policy #0 lag: (min: 0.0, avg: 10.2, max: 20.0)
[2024-06-06 15:22:47,318][24114] Avg episode reward: [(0, '0.192')]
[2024-06-06 15:22:49,263][24347] Updated weights for policy 0, policy_version 34508 (0.0028)
[2024-06-06 15:22:52,318][24114] Fps is (10 sec: 42598.6, 60 sec: 43963.8, 300 sec: 44209.0). Total num frames: 565477376. Throughput: 0: 44353.4. Samples: 46759860. Policy #0 lag: (min: 0.0, avg: 10.2, max: 20.0)
[2024-06-06 15:22:52,318][24114] Avg episode reward: [(0, '0.192')]
[2024-06-06 15:22:53,072][24347] Updated weights for policy 0, policy_version 34518 (0.0033)
[2024-06-06 15:22:56,622][24347] Updated weights for policy 0, policy_version 34528 (0.0037)
[2024-06-06 15:22:57,318][24114] Fps is (10 sec: 44236.5, 60 sec: 43963.7, 300 sec: 44264.6). Total num frames: 565723136. Throughput: 0: 44544.8. Samples: 47031380. Policy #0 lag: (min: 0.0, avg: 10.2, max: 20.0)
[2024-06-06 15:22:57,319][24114] Avg episode reward: [(0, '0.196')]
[2024-06-06 15:23:00,051][24347] Updated weights for policy 0, policy_version 34538 (0.0038)
[2024-06-06 15:23:02,320][24114] Fps is (10 sec: 47504.2, 60 sec: 44508.4, 300 sec: 44375.4). Total num frames: 565952512. Throughput: 0: 44412.3. Samples: 47164360. Policy #0 lag: (min: 0.0, avg: 10.5, max: 21.0)
[2024-06-06 15:23:02,321][24114] Avg episode reward: [(0, '0.190')]
[2024-06-06 15:23:04,062][24347] Updated weights for policy 0, policy_version 34548 (0.0031)
[2024-06-06 15:23:07,228][24347] Updated weights for policy 0, policy_version 34558 (0.0046)
[2024-06-06 15:23:07,318][24114] Fps is (10 sec: 47514.0, 60 sec: 45056.1, 300 sec: 44431.2). Total num frames: 566198272. Throughput: 0: 44677.4. Samples: 47441960. Policy #0 lag: (min: 0.0, avg: 10.5, max: 21.0)
[2024-06-06 15:23:07,319][24114] Avg episode reward: [(0, '0.196')]
[2024-06-06 15:23:11,225][24347] Updated weights for policy 0, policy_version 34568 (0.0034)
[2024-06-06 15:23:12,320][24114] Fps is (10 sec: 42598.7, 60 sec: 44235.7, 300 sec: 44264.3). Total num frames: 566378496. Throughput: 0: 44538.7. Samples: 47705820. Policy #0 lag: (min: 0.0, avg: 10.5, max: 21.0)
[2024-06-06 15:23:12,320][24114] Avg episode reward: [(0, '0.190')]
[2024-06-06 15:23:14,922][24347] Updated weights for policy 0, policy_version 34578 (0.0038)
[2024-06-06 15:23:17,318][24114] Fps is (10 sec: 42597.9, 60 sec: 44509.8, 300 sec: 44431.2). Total num frames: 566624256. Throughput: 0: 44558.5. Samples: 47838540. Policy #0 lag: (min: 0.0, avg: 10.5, max: 21.0)
[2024-06-06 15:23:17,319][24114] Avg episode reward: [(0, '0.187')]
[2024-06-06 15:23:18,569][24347] Updated weights for policy 0, policy_version 34588 (0.0041)
[2024-06-06 15:23:21,952][24347] Updated weights for policy 0, policy_version 34598 (0.0034)
[2024-06-06 15:23:22,324][24114] Fps is (10 sec: 47494.4, 60 sec: 45051.5, 300 sec: 44374.7). Total num frames: 566853632. Throughput: 0: 44620.5. Samples: 48105460. Policy #0 lag: (min: 0.0, avg: 10.5, max: 21.0)
[2024-06-06 15:23:22,325][24114] Avg episode reward: [(0, '0.195')]
[2024-06-06 15:23:25,626][24347] Updated weights for policy 0, policy_version 34608 (0.0023)
[2024-06-06 15:23:27,318][24114] Fps is (10 sec: 40960.6, 60 sec: 43692.1, 300 sec: 44209.3). Total num frames: 567033856. Throughput: 0: 44947.7. Samples: 48378860. Policy #0 lag: (min: 0.0, avg: 10.2, max: 20.0)
[2024-06-06 15:23:27,318][24114] Avg episode reward: [(0, '0.192')]
[2024-06-06 15:23:28,716][24326] Signal inference workers to stop experience collection... (600 times)
[2024-06-06 15:23:28,716][24326] Signal inference workers to resume experience collection... (600 times)
[2024-06-06 15:23:28,760][24347] InferenceWorker_p0-w0: stopping experience collection (600 times)
[2024-06-06 15:23:28,760][24347] InferenceWorker_p0-w0: resuming experience collection (600 times)
[2024-06-06 15:23:29,482][24347] Updated weights for policy 0, policy_version 34618 (0.0036)
[2024-06-06 15:23:32,318][24114] Fps is (10 sec: 45902.7, 60 sec: 44782.9, 300 sec: 44486.7). Total num frames: 567312384. Throughput: 0: 44587.1. Samples: 48500520. Policy #0 lag: (min: 0.0, avg: 10.2, max: 20.0)
[2024-06-06 15:23:32,318][24114] Avg episode reward: [(0, '0.193')]
[2024-06-06 15:23:33,622][24347] Updated weights for policy 0, policy_version 34628 (0.0039)
[2024-06-06 15:23:36,648][24347] Updated weights for policy 0, policy_version 34638 (0.0036)
[2024-06-06 15:23:37,321][24114] Fps is (10 sec: 49135.4, 60 sec: 45055.0, 300 sec: 44375.5). Total num frames: 567525376. Throughput: 0: 44804.7. Samples: 48776220. Policy #0 lag: (min: 0.0, avg: 10.2, max: 20.0)
[2024-06-06 15:23:37,322][24114] Avg episode reward: [(0, '0.195')]
[2024-06-06 15:23:41,280][24347] Updated weights for policy 0, policy_version 34648 (0.0024)
[2024-06-06 15:23:42,318][24114] Fps is (10 sec: 40959.3, 60 sec: 44509.8, 300 sec: 44320.4). Total num frames: 567721984. Throughput: 0: 44530.1. Samples: 49035240. Policy #0 lag: (min: 0.0, avg: 10.2, max: 20.0)
[2024-06-06 15:23:42,319][24114] Avg episode reward: [(0, '0.195')]
[2024-06-06 15:23:44,276][24347] Updated weights for policy 0, policy_version 34658 (0.0036)
[2024-06-06 15:23:47,318][24114] Fps is (10 sec: 42612.5, 60 sec: 44509.8, 300 sec: 44375.9). Total num frames: 567951360. Throughput: 0: 44470.8. Samples: 49165460. Policy #0 lag: (min: 0.0, avg: 10.1, max: 23.0)
[2024-06-06 15:23:47,318][24114] Avg episode reward: [(0, '0.197')]
[2024-06-06 15:23:48,366][24347] Updated weights for policy 0, policy_version 34668 (0.0038)
[2024-06-06 15:23:51,734][24347] Updated weights for policy 0, policy_version 34678 (0.0035)
[2024-06-06 15:23:52,318][24114] Fps is (10 sec: 45875.6, 60 sec: 45056.0, 300 sec: 44375.6). Total num frames: 568180736. Throughput: 0: 44301.7. Samples: 49435540. Policy #0 lag: (min: 0.0, avg: 10.1, max: 23.0)
[2024-06-06 15:23:52,319][24114] Avg episode reward: [(0, '0.201')]
[2024-06-06 15:23:55,490][24347] Updated weights for policy 0, policy_version 34688 (0.0023)
[2024-06-06 15:23:57,318][24114] Fps is (10 sec: 44236.7, 60 sec: 44509.9, 300 sec: 44375.7). Total num frames: 568393728. Throughput: 0: 44488.5. Samples: 49707720. Policy #0 lag: (min: 0.0, avg: 10.1, max: 23.0)
[2024-06-06 15:23:57,319][24114] Avg episode reward: [(0, '0.194')]
[2024-06-06 15:23:58,930][24347] Updated weights for policy 0, policy_version 34698 (0.0045)
[2024-06-06 15:24:02,318][24114] Fps is (10 sec: 44236.5, 60 sec: 44511.2, 300 sec: 44375.6). Total num frames: 568623104. Throughput: 0: 44462.2. Samples: 49839340. Policy #0 lag: (min: 0.0, avg: 10.1, max: 23.0)
[2024-06-06 15:24:02,319][24114] Avg episode reward: [(0, '0.193')]
[2024-06-06 15:24:02,994][24347] Updated weights for policy 0, policy_version 34708 (0.0024)
[2024-06-06 15:24:06,151][24347] Updated weights for policy 0, policy_version 34718 (0.0049)
[2024-06-06 15:24:07,318][24114] Fps is (10 sec: 45875.1, 60 sec: 44236.7, 300 sec: 44375.6). Total num frames: 568852480. Throughput: 0: 44472.5. Samples: 50106460. Policy #0 lag: (min: 0.0, avg: 10.1, max: 23.0)
[2024-06-06 15:24:07,319][24114] Avg episode reward: [(0, '0.188')]
[2024-06-06 15:24:10,610][24347] Updated weights for policy 0, policy_version 34728 (0.0025)
[2024-06-06 15:24:12,318][24114] Fps is (10 sec: 44237.2, 60 sec: 44784.3, 300 sec: 44375.7). Total num frames: 569065472. Throughput: 0: 44414.1. Samples: 50377500. Policy #0 lag: (min: 0.0, avg: 9.5, max: 21.0)
[2024-06-06 15:24:12,319][24114] Avg episode reward: [(0, '0.194')]
[2024-06-06 15:24:13,614][24347] Updated weights for policy 0, policy_version 34738 (0.0041)
[2024-06-06 15:24:17,318][24114] Fps is (10 sec: 42598.7, 60 sec: 44236.9, 300 sec: 44375.6). Total num frames: 569278464. Throughput: 0: 44549.8. Samples: 50505260. Policy #0 lag: (min: 0.0, avg: 9.5, max: 21.0)
[2024-06-06 15:24:17,318][24114] Avg episode reward: [(0, '0.197')]
[2024-06-06 15:24:17,688][24347] Updated weights for policy 0, policy_version 34748 (0.0041)
[2024-06-06 15:24:20,874][24347] Updated weights for policy 0, policy_version 34758 (0.0032)
[2024-06-06 15:24:22,318][24114] Fps is (10 sec: 45874.9, 60 sec: 44514.2, 300 sec: 44431.2). Total num frames: 569524224. Throughput: 0: 44462.7. Samples: 50776900. Policy #0 lag: (min: 0.0, avg: 9.5, max: 21.0)
[2024-06-06 15:24:22,319][24114] Avg episode reward: [(0, '0.193')]
[2024-06-06 15:24:22,326][24326] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000034761_569524224.pth...
[2024-06-06 15:24:22,391][24326] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000034110_558858240.pth
[2024-06-06 15:24:24,726][24347] Updated weights for policy 0, policy_version 34768 (0.0044)
[2024-06-06 15:24:27,318][24114] Fps is (10 sec: 45875.3, 60 sec: 45056.0, 300 sec: 44431.2). Total num frames: 569737216. Throughput: 0: 44706.9. Samples: 51047040. Policy #0 lag: (min: 0.0, avg: 9.5, max: 21.0)
[2024-06-06 15:24:27,319][24114] Avg episode reward: [(0, '0.204')]
[2024-06-06 15:24:28,185][24347] Updated weights for policy 0, policy_version 34778 (0.0031)
[2024-06-06 15:24:32,318][24114] Fps is (10 sec: 40959.8, 60 sec: 43690.5, 300 sec: 44320.2). Total num frames: 569933824. Throughput: 0: 44669.6. Samples: 51175600. Policy #0 lag: (min: 0.0, avg: 9.5, max: 21.0)
[2024-06-06 15:24:32,319][24114] Avg episode reward: [(0, '0.198')]
[2024-06-06 15:24:32,612][24347] Updated weights for policy 0, policy_version 34788 (0.0031)
[2024-06-06 15:24:35,609][24347] Updated weights for policy 0, policy_version 34798 (0.0028)
[2024-06-06 15:24:37,318][24114] Fps is (10 sec: 45875.6, 60 sec: 44512.4, 300 sec: 44375.7). Total num frames: 570195968. Throughput: 0: 44496.6. Samples: 51437880. Policy #0 lag: (min: 0.0, avg: 8.7, max: 21.0)
[2024-06-06 15:24:37,318][24114] Avg episode reward: [(0, '0.192')]
[2024-06-06 15:24:40,127][24347] Updated weights for policy 0, policy_version 34808 (0.0027)
[2024-06-06 15:24:40,130][24326] Signal inference workers to stop experience collection... (650 times)
[2024-06-06 15:24:40,131][24326] Signal inference workers to resume experience collection... (650 times)
[2024-06-06 15:24:40,150][24347] InferenceWorker_p0-w0: stopping experience collection (650 times)
[2024-06-06 15:24:40,151][24347] InferenceWorker_p0-w0: resuming experience collection (650 times)
[2024-06-06 15:24:42,318][24114] Fps is (10 sec: 45875.8, 60 sec: 44510.0, 300 sec: 44375.6). Total num frames: 570392576. Throughput: 0: 44551.6. Samples: 51712540. Policy #0 lag: (min: 0.0, avg: 8.7, max: 21.0)
[2024-06-06 15:24:42,318][24114] Avg episode reward: [(0, '0.194')]
[2024-06-06 15:24:43,265][24347] Updated weights for policy 0, policy_version 34818 (0.0039)
[2024-06-06 15:24:47,318][24114] Fps is (10 sec: 42598.7, 60 sec: 44510.0, 300 sec: 44431.2). Total num frames: 570621952. Throughput: 0: 44331.0. Samples: 51834220. Policy #0 lag: (min: 0.0, avg: 8.7, max: 21.0)
[2024-06-06 15:24:47,318][24114] Avg episode reward: [(0, '0.200')]
[2024-06-06 15:24:47,321][24347] Updated weights for policy 0, policy_version 34828 (0.0030)
[2024-06-06 15:24:50,349][24347] Updated weights for policy 0, policy_version 34838 (0.0030)
[2024-06-06 15:24:52,318][24114] Fps is (10 sec: 49151.8, 60 sec: 45056.0, 300 sec: 44542.2). Total num frames: 570884096. Throughput: 0: 44471.6. Samples: 52107680. Policy #0 lag: (min: 0.0, avg: 8.7, max: 21.0)
[2024-06-06 15:24:52,319][24114] Avg episode reward: [(0, '0.198')]
[2024-06-06 15:24:54,915][24347] Updated weights for policy 0, policy_version 34848 (0.0035)
[2024-06-06 15:24:57,318][24114] Fps is (10 sec: 44235.9, 60 sec: 44509.9, 300 sec: 44375.6). Total num frames: 571064320. Throughput: 0: 44360.5. Samples: 52373720. Policy #0 lag: (min: 1.0, avg: 8.7, max: 22.0)
[2024-06-06 15:24:57,319][24114] Avg episode reward: [(0, '0.198')]
[2024-06-06 15:24:58,006][24347] Updated weights for policy 0, policy_version 34858 (0.0027)
[2024-06-06 15:25:02,115][24347] Updated weights for policy 0, policy_version 34868 (0.0031)
[2024-06-06 15:25:02,318][24114] Fps is (10 sec: 39321.3, 60 sec: 44236.8, 300 sec: 44376.2). Total num frames: 571277312. Throughput: 0: 44392.3. Samples: 52502920. Policy #0 lag: (min: 1.0, avg: 8.7, max: 22.0)
[2024-06-06 15:25:02,319][24114] Avg episode reward: [(0, '0.195')]
[2024-06-06 15:25:05,291][24347] Updated weights for policy 0, policy_version 34878 (0.0032)
[2024-06-06 15:25:07,318][24114] Fps is (10 sec: 47513.7, 60 sec: 44783.0, 300 sec: 44486.7). Total num frames: 571539456. Throughput: 0: 44193.4. Samples: 52765600. Policy #0 lag: (min: 1.0, avg: 8.7, max: 22.0)
[2024-06-06 15:25:07,318][24114] Avg episode reward: [(0, '0.198')]
[2024-06-06 15:25:09,262][24347] Updated weights for policy 0, policy_version 34888 (0.0039)
[2024-06-06 15:25:12,318][24114] Fps is (10 sec: 45875.7, 60 sec: 44509.9, 300 sec: 44431.2). Total num frames: 571736064. Throughput: 0: 44249.7. Samples: 53038280. Policy #0 lag: (min: 1.0, avg: 8.7, max: 22.0)
[2024-06-06 15:25:12,318][24114] Avg episode reward: [(0, '0.199')]
[2024-06-06 15:25:12,811][24347] Updated weights for policy 0, policy_version 34898 (0.0025)
[2024-06-06 15:25:16,616][24347] Updated weights for policy 0, policy_version 34908 (0.0026)
[2024-06-06 15:25:17,318][24114] Fps is (10 sec: 40960.3, 60 sec: 44509.9, 300 sec: 44431.2). Total num frames: 571949056. Throughput: 0: 44305.1. Samples: 53169320. Policy #0 lag: (min: 1.0, avg: 8.7, max: 22.0)
[2024-06-06 15:25:17,324][24114] Avg episode reward: [(0, '0.200')]
[2024-06-06 15:25:20,265][24347] Updated weights for policy 0, policy_version 34918 (0.0032)
[2024-06-06 15:25:22,318][24114] Fps is (10 sec: 47513.6, 60 sec: 44783.0, 300 sec: 44542.3). Total num frames: 572211200. Throughput: 0: 44524.8. Samples: 53441500. Policy #0 lag: (min: 0.0, avg: 8.4, max: 21.0)
[2024-06-06 15:25:22,319][24114] Avg episode reward: [(0, '0.198')]
[2024-06-06 15:25:24,204][24347] Updated weights for policy 0, policy_version 34928 (0.0039)
[2024-06-06 15:25:27,318][24114] Fps is (10 sec: 44236.5, 60 sec: 44236.8, 300 sec: 44431.2). Total num frames: 572391424. Throughput: 0: 44301.3. Samples: 53706100. Policy #0 lag: (min: 0.0, avg: 8.4, max: 21.0)
[2024-06-06 15:25:27,318][24114] Avg episode reward: [(0, '0.199')]
[2024-06-06 15:25:27,538][24347] Updated weights for policy 0, policy_version 34938 (0.0030)
[2024-06-06 15:25:31,382][24347] Updated weights for policy 0, policy_version 34948 (0.0030)
[2024-06-06 15:25:32,318][24114] Fps is (10 sec: 39321.4, 60 sec: 44509.9, 300 sec: 44320.1). Total num frames: 572604416. Throughput: 0: 44330.4. Samples: 53829100. Policy #0 lag: (min: 0.0, avg: 8.4, max: 21.0)
[2024-06-06 15:25:32,319][24114] Avg episode reward: [(0, '0.196')]
[2024-06-06 15:25:34,955][24347] Updated weights for policy 0, policy_version 34958 (0.0037)
[2024-06-06 15:25:37,318][24114] Fps is (10 sec: 49151.5, 60 sec: 44782.8, 300 sec: 44486.7). Total num frames: 572882944. Throughput: 0: 44245.7. Samples: 54098740. Policy #0 lag: (min: 0.0, avg: 8.4, max: 21.0)
[2024-06-06 15:25:37,319][24114] Avg episode reward: [(0, '0.195')]
[2024-06-06 15:25:38,846][24347] Updated weights for policy 0, policy_version 34968 (0.0045)
[2024-06-06 15:25:42,318][24114] Fps is (10 sec: 44236.4, 60 sec: 44236.7, 300 sec: 44375.6). Total num frames: 573046784. Throughput: 0: 44367.0. Samples: 54370240. Policy #0 lag: (min: 0.0, avg: 8.0, max: 21.0)
[2024-06-06 15:25:42,319][24114] Avg episode reward: [(0, '0.198')]
[2024-06-06 15:25:42,500][24347] Updated weights for policy 0, policy_version 34978 (0.0029)
[2024-06-06 15:25:45,945][24347] Updated weights for policy 0, policy_version 34988 (0.0030)
[2024-06-06 15:25:47,318][24114] Fps is (10 sec: 37683.6, 60 sec: 43963.6, 300 sec: 44320.1). Total num frames: 573259776. Throughput: 0: 44345.9. Samples: 54498480. Policy #0 lag: (min: 0.0, avg: 8.0, max: 21.0)
[2024-06-06 15:25:47,319][24114] Avg episode reward: [(0, '0.199')]
[2024-06-06 15:25:49,961][24347] Updated weights for policy 0, policy_version 34998 (0.0031)
[2024-06-06 15:25:52,318][24114] Fps is (10 sec: 49152.4, 60 sec: 44236.8, 300 sec: 44487.6). Total num frames: 573538304. Throughput: 0: 44521.7. Samples: 54769080. Policy #0 lag: (min: 0.0, avg: 8.0, max: 21.0)
[2024-06-06 15:25:52,319][24114] Avg episode reward: [(0, '0.194')]
[2024-06-06 15:25:53,656][24347] Updated weights for policy 0, policy_version 35008 (0.0033)
[2024-06-06 15:25:55,679][24326] Signal inference workers to stop experience collection... (700 times)
[2024-06-06 15:25:55,681][24326] Signal inference workers to resume experience collection... (700 times)
[2024-06-06 15:25:55,698][24347] InferenceWorker_p0-w0: stopping experience collection (700 times)
[2024-06-06 15:25:55,698][24347] InferenceWorker_p0-w0: resuming experience collection (700 times)
[2024-06-06 15:25:57,098][24347] Updated weights for policy 0, policy_version 35018 (0.0030)
[2024-06-06 15:25:57,318][24114] Fps is (10 sec: 49152.6, 60 sec: 44783.0, 300 sec: 44487.0). Total num frames: 573751296. Throughput: 0: 44543.2. Samples: 55042720. Policy #0 lag: (min: 0.0, avg: 8.0, max: 21.0)
[2024-06-06 15:25:57,318][24114] Avg episode reward: [(0, '0.202')]
[2024-06-06 15:26:00,734][24347] Updated weights for policy 0, policy_version 35028 (0.0037)
[2024-06-06 15:26:02,318][24114] Fps is (10 sec: 37683.7, 60 sec: 43963.9, 300 sec: 44320.1). Total num frames: 573915136. Throughput: 0: 44378.2. Samples: 55166340. Policy #0 lag: (min: 0.0, avg: 8.0, max: 21.0)
[2024-06-06 15:26:02,318][24114] Avg episode reward: [(0, '0.199')]
[2024-06-06 15:26:04,677][24347] Updated weights for policy 0, policy_version 35038 (0.0028)
[2024-06-06 15:26:07,318][24114] Fps is (10 sec: 44236.4, 60 sec: 44236.8, 300 sec: 44431.2). Total num frames: 574193664. Throughput: 0: 44218.7. Samples: 55431340. Policy #0 lag: (min: 0.0, avg: 7.9, max: 22.0)
[2024-06-06 15:26:07,318][24114] Avg episode reward: [(0, '0.201')]
[2024-06-06 15:26:08,159][24347] Updated weights for policy 0, policy_version 35048 (0.0029)
[2024-06-06 15:26:12,013][24347] Updated weights for policy 0, policy_version 35058 (0.0033)
[2024-06-06 15:26:12,318][24114] Fps is (10 sec: 47513.3, 60 sec: 44236.8, 300 sec: 44376.5). Total num frames: 574390272. Throughput: 0: 44356.0. Samples: 55702120. Policy #0 lag: (min: 0.0, avg: 7.9, max: 22.0)
[2024-06-06 15:26:12,318][24114] Avg episode reward: [(0, '0.201')]
[2024-06-06 15:26:15,444][24347] Updated weights for policy 0, policy_version 35068 (0.0033)
[2024-06-06 15:26:17,318][24114] Fps is (10 sec: 37683.6, 60 sec: 43690.7, 300 sec: 44264.6). Total num frames: 574570496. Throughput: 0: 44410.0. Samples: 55827540. Policy #0 lag: (min: 0.0, avg: 7.9, max: 22.0)
[2024-06-06 15:26:17,318][24114] Avg episode reward: [(0, '0.195')]
[2024-06-06 15:26:19,553][24347] Updated weights for policy 0, policy_version 35078 (0.0035)
[2024-06-06 15:26:22,318][24114] Fps is (10 sec: 45874.7, 60 sec: 43963.6, 300 sec: 44431.2). Total num frames: 574849024. Throughput: 0: 44302.2. Samples: 56092340. Policy #0 lag: (min: 0.0, avg: 7.9, max: 22.0)
[2024-06-06 15:26:22,319][24114] Avg episode reward: [(0, '0.204')]
[2024-06-06 15:26:22,349][24326] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000035086_574849024.pth...
[2024-06-06 15:26:22,401][24326] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000034434_564166656.pth
[2024-06-06 15:26:22,879][24347] Updated weights for policy 0, policy_version 35088 (0.0032)
[2024-06-06 15:26:26,851][24347] Updated weights for policy 0, policy_version 35098 (0.0038)
[2024-06-06 15:26:27,318][24114] Fps is (10 sec: 50790.1, 60 sec: 44783.0, 300 sec: 44487.6). Total num frames: 575078400. Throughput: 0: 44474.0. Samples: 56371560. Policy #0 lag: (min: 0.0, avg: 8.8, max: 22.0)
[2024-06-06 15:26:27,318][24114] Avg episode reward: [(0, '0.197')]
[2024-06-06 15:26:30,154][24347] Updated weights for policy 0, policy_version 35108 (0.0039)
[2024-06-06 15:26:32,320][24114] Fps is (10 sec: 40952.2, 60 sec: 44235.4, 300 sec: 44375.4). Total num frames: 575258624. Throughput: 0: 44494.5. Samples: 56500820. Policy #0 lag: (min: 0.0, avg: 8.8, max: 22.0)
[2024-06-06 15:26:32,321][24114] Avg episode reward: [(0, '0.204')]
[2024-06-06 15:26:33,954][24347] Updated weights for policy 0, policy_version 35118 (0.0036)
[2024-06-06 15:26:37,318][24114] Fps is (10 sec: 44236.1, 60 sec: 43963.8, 300 sec: 44431.3). Total num frames: 575520768. Throughput: 0: 44357.3. Samples: 56765160. Policy #0 lag: (min: 0.0, avg: 8.8, max: 22.0)
[2024-06-06 15:26:37,319][24114] Avg episode reward: [(0, '0.200')]
[2024-06-06 15:26:37,777][24347] Updated weights for policy 0, policy_version 35128 (0.0036)
[2024-06-06 15:26:41,494][24347] Updated weights for policy 0, policy_version 35138 (0.0033)
[2024-06-06 15:26:42,318][24114] Fps is (10 sec: 49161.4, 60 sec: 45056.0, 300 sec: 44486.7). Total num frames: 575750144. Throughput: 0: 44310.0. Samples: 57036680. Policy #0 lag: (min: 0.0, avg: 8.8, max: 22.0)
[2024-06-06 15:26:42,319][24114] Avg episode reward: [(0, '0.205')]
[2024-06-06 15:26:45,001][24347] Updated weights for policy 0, policy_version 35148 (0.0041)
[2024-06-06 15:26:47,318][24114] Fps is (10 sec: 39322.4, 60 sec: 44236.9, 300 sec: 44320.1). Total num frames: 575913984. Throughput: 0: 44464.1. Samples: 57167220. Policy #0 lag: (min: 0.0, avg: 8.8, max: 22.0)
[2024-06-06 15:26:47,318][24114] Avg episode reward: [(0, '0.191')]
[2024-06-06 15:26:48,829][24347] Updated weights for policy 0, policy_version 35158 (0.0034)
[2024-06-06 15:26:52,318][24114] Fps is (10 sec: 42597.3, 60 sec: 43963.5, 300 sec: 44375.6). Total num frames: 576176128. Throughput: 0: 44395.2. Samples: 57429140. Policy #0 lag: (min: 0.0, avg: 8.9, max: 21.0)
[2024-06-06 15:26:52,319][24114] Avg episode reward: [(0, '0.198')]
[2024-06-06 15:26:52,365][24347] Updated weights for policy 0, policy_version 35168 (0.0028)
[2024-06-06 15:26:56,316][24347] Updated weights for policy 0, policy_version 35178 (0.0040)
[2024-06-06 15:26:57,318][24114] Fps is (10 sec: 49151.8, 60 sec: 44236.8, 300 sec: 44486.7). Total num frames: 576405504. Throughput: 0: 44394.8. Samples: 57699880. Policy #0 lag: (min: 0.0, avg: 8.9, max: 21.0)
[2024-06-06 15:26:57,318][24114] Avg episode reward: [(0, '0.197')]
[2024-06-06 15:26:59,496][24347] Updated weights for policy 0, policy_version 35188 (0.0040)
[2024-06-06 15:27:02,318][24114] Fps is (10 sec: 42599.9, 60 sec: 44782.9, 300 sec: 44431.2). Total num frames: 576602112. Throughput: 0: 44452.8. Samples: 57827920. Policy #0 lag: (min: 0.0, avg: 8.9, max: 21.0)
[2024-06-06 15:27:02,319][24114] Avg episode reward: [(0, '0.201')]
[2024-06-06 15:27:03,543][24347] Updated weights for policy 0, policy_version 35198 (0.0036)
[2024-06-06 15:27:06,933][24347] Updated weights for policy 0, policy_version 35208 (0.0040)
[2024-06-06 15:27:07,318][24114] Fps is (10 sec: 44236.3, 60 sec: 44236.8, 300 sec: 44486.8). Total num frames: 576847872. Throughput: 0: 44412.5. Samples: 58090900. Policy #0 lag: (min: 0.0, avg: 8.9, max: 21.0)
[2024-06-06 15:27:07,319][24114] Avg episode reward: [(0, '0.206')]
[2024-06-06 15:27:11,263][24347] Updated weights for policy 0, policy_version 35218 (0.0027)
[2024-06-06 15:27:11,429][24326] Signal inference workers to stop experience collection... (750 times)
[2024-06-06 15:27:11,471][24347] InferenceWorker_p0-w0: stopping experience collection (750 times)
[2024-06-06 15:27:11,482][24326] Signal inference workers to resume experience collection... (750 times)
[2024-06-06 15:27:11,488][24347] InferenceWorker_p0-w0: resuming experience collection (750 times)
[2024-06-06 15:27:12,318][24114] Fps is (10 sec: 45874.3, 60 sec: 44509.7, 300 sec: 44431.2). Total num frames: 577060864. Throughput: 0: 44092.2. Samples: 58355720. Policy #0 lag: (min: 0.0, avg: 10.4, max: 22.0)
[2024-06-06 15:27:12,319][24114] Avg episode reward: [(0, '0.196')]
[2024-06-06 15:27:14,487][24347] Updated weights for policy 0, policy_version 35228 (0.0036)
[2024-06-06 15:27:17,324][24114] Fps is (10 sec: 40935.6, 60 sec: 44778.4, 300 sec: 44430.3). Total num frames: 577257472. Throughput: 0: 44332.5. Samples: 58495960. Policy #0 lag: (min: 0.0, avg: 10.4, max: 22.0)
[2024-06-06 15:27:17,325][24114] Avg episode reward: [(0, '0.201')]
[2024-06-06 15:27:18,647][24347] Updated weights for policy 0, policy_version 35238 (0.0026)
[2024-06-06 15:27:21,576][24347] Updated weights for policy 0, policy_version 35248 (0.0029)
[2024-06-06 15:27:22,318][24114] Fps is (10 sec: 44237.7, 60 sec: 44236.9, 300 sec: 44375.9). Total num frames: 577503232. Throughput: 0: 44332.5. Samples: 58760120. Policy #0 lag: (min: 0.0, avg: 10.4, max: 22.0)
[2024-06-06 15:27:22,318][24114] Avg episode reward: [(0, '0.203')]
[2024-06-06 15:27:25,871][24347] Updated weights for policy 0, policy_version 35258 (0.0041)
[2024-06-06 15:27:27,318][24114] Fps is (10 sec: 49180.8, 60 sec: 44509.7, 300 sec: 44486.7). Total num frames: 577748992. Throughput: 0: 44246.2. Samples: 59027760. Policy #0 lag: (min: 0.0, avg: 10.4, max: 22.0)
[2024-06-06 15:27:27,319][24114] Avg episode reward: [(0, '0.202')]
[2024-06-06 15:27:28,660][24347] Updated weights for policy 0, policy_version 35268 (0.0042)
[2024-06-06 15:27:32,318][24114] Fps is (10 sec: 42598.1, 60 sec: 44511.3, 300 sec: 44431.5). Total num frames: 577929216. Throughput: 0: 44429.6. Samples: 59166560. Policy #0 lag: (min: 0.0, avg: 10.4, max: 22.0)
[2024-06-06 15:27:32,319][24114] Avg episode reward: [(0, '0.206')]
[2024-06-06 15:27:33,117][24347] Updated weights for policy 0, policy_version 35278 (0.0029)
[2024-06-06 15:27:36,110][24347] Updated weights for policy 0, policy_version 35288 (0.0028)
[2024-06-06 15:27:37,318][24114] Fps is (10 sec: 40960.2, 60 sec: 43963.7, 300 sec: 44431.2). Total num frames: 578158592. Throughput: 0: 44417.6. Samples: 59427920. Policy #0 lag: (min: 0.0, avg: 10.4, max: 25.0)
[2024-06-06 15:27:37,319][24114] Avg episode reward: [(0, '0.198')]
[2024-06-06 15:27:40,721][24347] Updated weights for policy 0, policy_version 35298 (0.0021)
[2024-06-06 15:27:42,318][24114] Fps is (10 sec: 49151.4, 60 sec: 44509.8, 300 sec: 44542.2). Total num frames: 578420736. Throughput: 0: 44491.3. Samples: 59702000. Policy #0 lag: (min: 0.0, avg: 10.4, max: 25.0)
[2024-06-06 15:27:42,319][24114] Avg episode reward: [(0, '0.201')]
[2024-06-06 15:27:43,747][24347] Updated weights for policy 0, policy_version 35308 (0.0036)
[2024-06-06 15:27:47,318][24114] Fps is (10 sec: 44237.4, 60 sec: 44782.9, 300 sec: 44486.7). Total num frames: 578600960. Throughput: 0: 44762.3. Samples: 59842220. Policy #0 lag: (min: 0.0, avg: 10.4, max: 25.0)
[2024-06-06 15:27:47,318][24114] Avg episode reward: [(0, '0.208')]
[2024-06-06 15:27:47,901][24347] Updated weights for policy 0, policy_version 35318 (0.0022)
[2024-06-06 15:27:50,817][24347] Updated weights for policy 0, policy_version 35328 (0.0034)
[2024-06-06 15:27:52,318][24114] Fps is (10 sec: 40960.1, 60 sec: 44237.0, 300 sec: 44431.2). Total num frames: 578830336. Throughput: 0: 44595.0. Samples: 60097680. Policy #0 lag: (min: 0.0, avg: 10.4, max: 25.0)
[2024-06-06 15:27:52,319][24114] Avg episode reward: [(0, '0.201')]
[2024-06-06 15:27:55,178][24347] Updated weights for policy 0, policy_version 35338 (0.0032)
[2024-06-06 15:27:57,318][24114] Fps is (10 sec: 49152.0, 60 sec: 44782.9, 300 sec: 44542.6). Total num frames: 579092480. Throughput: 0: 44657.1. Samples: 60365280. Policy #0 lag: (min: 0.0, avg: 11.1, max: 23.0)
[2024-06-06 15:27:57,318][24114] Avg episode reward: [(0, '0.202')]
[2024-06-06 15:27:57,913][24347] Updated weights for policy 0, policy_version 35348 (0.0026)
[2024-06-06 15:28:02,318][24114] Fps is (10 sec: 44237.9, 60 sec: 44509.9, 300 sec: 44320.1). Total num frames: 579272704. Throughput: 0: 44696.7. Samples: 60507040. Policy #0 lag: (min: 0.0, avg: 11.1, max: 23.0)
[2024-06-06 15:28:02,318][24114] Avg episode reward: [(0, '0.202')]
[2024-06-06 15:28:02,835][24347] Updated weights for policy 0, policy_version 35358 (0.0032)
[2024-06-06 15:28:05,433][24347] Updated weights for policy 0, policy_version 35368 (0.0038)
[2024-06-06 15:28:07,318][24114] Fps is (10 sec: 39321.2, 60 sec: 43963.7, 300 sec: 44431.5). Total num frames: 579485696. Throughput: 0: 44485.7. Samples: 60761980. Policy #0 lag: (min: 0.0, avg: 11.1, max: 23.0)
[2024-06-06 15:28:07,319][24114] Avg episode reward: [(0, '0.204')]
[2024-06-06 15:28:10,201][24347] Updated weights for policy 0, policy_version 35378 (0.0024)
[2024-06-06 15:28:12,318][24114] Fps is (10 sec: 49151.9, 60 sec: 45056.2, 300 sec: 44542.3). Total num frames: 579764224. Throughput: 0: 44560.2. Samples: 61032960. Policy #0 lag: (min: 0.0, avg: 11.1, max: 23.0)
[2024-06-06 15:28:12,318][24114] Avg episode reward: [(0, '0.209')]
[2024-06-06 15:28:12,889][24347] Updated weights for policy 0, policy_version 35388 (0.0046)
[2024-06-06 15:28:17,318][24114] Fps is (10 sec: 45875.4, 60 sec: 44787.4, 300 sec: 44376.5). Total num frames: 579944448. Throughput: 0: 44604.9. Samples: 61173780. Policy #0 lag: (min: 0.0, avg: 11.1, max: 23.0)
[2024-06-06 15:28:17,318][24114] Avg episode reward: [(0, '0.203')]
[2024-06-06 15:28:17,408][24347] Updated weights for policy 0, policy_version 35398 (0.0032)
[2024-06-06 15:28:20,023][24347] Updated weights for policy 0, policy_version 35408 (0.0027)
[2024-06-06 15:28:22,318][24114] Fps is (10 sec: 37683.0, 60 sec: 43963.7, 300 sec: 44431.2). Total num frames: 580141056. Throughput: 0: 44660.5. Samples: 61437640. Policy #0 lag: (min: 1.0, avg: 13.1, max: 23.0)
[2024-06-06 15:28:22,318][24114] Avg episode reward: [(0, '0.209')]
[2024-06-06 15:28:22,401][24326] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000035410_580157440.pth...
[2024-06-06 15:28:22,458][24326] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000034761_569524224.pth
[2024-06-06 15:28:23,168][24326] Signal inference workers to stop experience collection... (800 times)
[2024-06-06 15:28:23,168][24326] Signal inference workers to resume experience collection... (800 times)
[2024-06-06 15:28:23,181][24347] InferenceWorker_p0-w0: stopping experience collection (800 times)
[2024-06-06 15:28:23,182][24347] InferenceWorker_p0-w0: resuming experience collection (800 times)
[2024-06-06 15:28:25,002][24347] Updated weights for policy 0, policy_version 35418 (0.0029)
[2024-06-06 15:28:27,207][24347] Updated weights for policy 0, policy_version 35428 (0.0036)
[2024-06-06 15:28:27,322][24114] Fps is (10 sec: 50769.3, 60 sec: 45053.0, 300 sec: 44541.6). Total num frames: 580452352. Throughput: 0: 44337.9. Samples: 61697380. Policy #0 lag: (min: 1.0, avg: 13.1, max: 23.0)
[2024-06-06 15:28:27,323][24114] Avg episode reward: [(0, '0.205')]
[2024-06-06 15:28:32,318][24114] Fps is (10 sec: 45874.0, 60 sec: 44509.7, 300 sec: 44320.6). Total num frames: 580599808. Throughput: 0: 44540.6. Samples: 61846560. Policy #0 lag: (min: 1.0, avg: 13.1, max: 23.0)
[2024-06-06 15:28:32,319][24114] Avg episode reward: [(0, '0.209')]
[2024-06-06 15:28:32,333][24347] Updated weights for policy 0, policy_version 35438 (0.0040)
[2024-06-06 15:28:34,708][24347] Updated weights for policy 0, policy_version 35448 (0.0029)
[2024-06-06 15:28:37,318][24114] Fps is (10 sec: 36059.8, 60 sec: 44236.9, 300 sec: 44375.7). Total num frames: 580812800. Throughput: 0: 44592.1. Samples: 62104320. Policy #0 lag: (min: 1.0, avg: 13.1, max: 23.0)
[2024-06-06 15:28:37,318][24114] Avg episode reward: [(0, '0.205')]
[2024-06-06 15:28:39,662][24347] Updated weights for policy 0, policy_version 35458 (0.0040)
[2024-06-06 15:28:42,151][24347] Updated weights for policy 0, policy_version 35468 (0.0039)
[2024-06-06 15:28:42,318][24114] Fps is (10 sec: 50791.3, 60 sec: 44783.0, 300 sec: 44597.8). Total num frames: 581107712. Throughput: 0: 44477.7. Samples: 62366780. Policy #0 lag: (min: 0.0, avg: 12.7, max: 22.0)
[2024-06-06 15:28:42,319][24114] Avg episode reward: [(0, '0.210')]
[2024-06-06 15:28:46,991][24347] Updated weights for policy 0, policy_version 35478 (0.0023)
[2024-06-06 15:28:47,318][24114] Fps is (10 sec: 47513.9, 60 sec: 44782.9, 300 sec: 44431.2). Total num frames: 581287936. Throughput: 0: 44485.3. Samples: 62508880. Policy #0 lag: (min: 0.0, avg: 12.7, max: 22.0)
[2024-06-06 15:28:47,318][24114] Avg episode reward: [(0, '0.208')]
[2024-06-06 15:28:49,816][24347] Updated weights for policy 0, policy_version 35488 (0.0033)
[2024-06-06 15:28:52,320][24114] Fps is (10 sec: 36038.1, 60 sec: 43962.4, 300 sec: 44319.8). Total num frames: 581468160.
Throughput: 0: 44602.2. Samples: 62769160. Policy #0 lag: (min: 0.0, avg: 12.7, max: 22.0) [2024-06-06 15:28:52,321][24114] Avg episode reward: [(0, '0.207')] [2024-06-06 15:28:54,589][24347] Updated weights for policy 0, policy_version 35498 (0.0047) [2024-06-06 15:28:57,115][24347] Updated weights for policy 0, policy_version 35508 (0.0031) [2024-06-06 15:28:57,318][24114] Fps is (10 sec: 47513.0, 60 sec: 44509.8, 300 sec: 44542.3). Total num frames: 581763072. Throughput: 0: 44323.9. Samples: 63027540. Policy #0 lag: (min: 0.0, avg: 12.7, max: 22.0) [2024-06-06 15:28:57,319][24114] Avg episode reward: [(0, '0.205')] [2024-06-06 15:29:01,758][24347] Updated weights for policy 0, policy_version 35518 (0.0028) [2024-06-06 15:29:02,318][24114] Fps is (10 sec: 45884.1, 60 sec: 44236.7, 300 sec: 44320.1). Total num frames: 581926912. Throughput: 0: 44313.4. Samples: 63167880. Policy #0 lag: (min: 0.0, avg: 12.7, max: 22.0) [2024-06-06 15:29:02,318][24114] Avg episode reward: [(0, '0.204')] [2024-06-06 15:29:04,730][24347] Updated weights for policy 0, policy_version 35528 (0.0036) [2024-06-06 15:29:07,318][24114] Fps is (10 sec: 36045.3, 60 sec: 43963.8, 300 sec: 44264.6). Total num frames: 582123520. Throughput: 0: 44257.4. Samples: 63429220. Policy #0 lag: (min: 0.0, avg: 11.6, max: 20.0) [2024-06-06 15:29:07,318][24114] Avg episode reward: [(0, '0.207')] [2024-06-06 15:29:09,378][24347] Updated weights for policy 0, policy_version 35538 (0.0031) [2024-06-06 15:29:12,122][24347] Updated weights for policy 0, policy_version 35548 (0.0024) [2024-06-06 15:29:12,318][24114] Fps is (10 sec: 49152.1, 60 sec: 44236.8, 300 sec: 44542.3). Total num frames: 582418432. Throughput: 0: 44301.5. Samples: 63690760. 
Policy #0 lag: (min: 0.0, avg: 11.6, max: 20.0) [2024-06-06 15:29:12,318][24114] Avg episode reward: [(0, '0.207')] [2024-06-06 15:29:16,463][24347] Updated weights for policy 0, policy_version 35558 (0.0027) [2024-06-06 15:29:17,318][24114] Fps is (10 sec: 47512.6, 60 sec: 44236.7, 300 sec: 44320.1). Total num frames: 582598656. Throughput: 0: 44137.9. Samples: 63832760. Policy #0 lag: (min: 0.0, avg: 11.6, max: 20.0) [2024-06-06 15:29:17,319][24114] Avg episode reward: [(0, '0.211')] [2024-06-06 15:29:19,554][24347] Updated weights for policy 0, policy_version 35568 (0.0027) [2024-06-06 15:29:22,318][24114] Fps is (10 sec: 37683.5, 60 sec: 44236.9, 300 sec: 44264.6). Total num frames: 582795264. Throughput: 0: 44267.7. Samples: 64096360. Policy #0 lag: (min: 0.0, avg: 11.6, max: 20.0) [2024-06-06 15:29:22,318][24114] Avg episode reward: [(0, '0.205')] [2024-06-06 15:29:24,015][24347] Updated weights for policy 0, policy_version 35578 (0.0027) [2024-06-06 15:29:25,249][24326] Signal inference workers to stop experience collection... (850 times) [2024-06-06 15:29:25,274][24347] InferenceWorker_p0-w0: stopping experience collection (850 times) [2024-06-06 15:29:25,309][24326] Signal inference workers to resume experience collection... (850 times) [2024-06-06 15:29:25,310][24347] InferenceWorker_p0-w0: resuming experience collection (850 times) [2024-06-06 15:29:27,052][24347] Updated weights for policy 0, policy_version 35588 (0.0035) [2024-06-06 15:29:27,318][24114] Fps is (10 sec: 47514.7, 60 sec: 43693.7, 300 sec: 44542.3). Total num frames: 583073792. Throughput: 0: 44340.1. Samples: 64362080. Policy #0 lag: (min: 0.0, avg: 11.6, max: 20.0) [2024-06-06 15:29:27,318][24114] Avg episode reward: [(0, '0.207')] [2024-06-06 15:29:31,255][24347] Updated weights for policy 0, policy_version 35598 (0.0031) [2024-06-06 15:29:32,318][24114] Fps is (10 sec: 47513.3, 60 sec: 44510.1, 300 sec: 44320.1). Total num frames: 583270400. Throughput: 0: 44160.9. Samples: 64496120. 
Policy #0 lag: (min: 0.0, avg: 9.5, max: 21.0) [2024-06-06 15:29:32,318][24114] Avg episode reward: [(0, '0.214')] [2024-06-06 15:29:34,281][24347] Updated weights for policy 0, policy_version 35608 (0.0023) [2024-06-06 15:29:37,318][24114] Fps is (10 sec: 40959.1, 60 sec: 44509.8, 300 sec: 44375.6). Total num frames: 583483392. Throughput: 0: 44399.5. Samples: 64767060. Policy #0 lag: (min: 0.0, avg: 9.5, max: 21.0) [2024-06-06 15:29:37,327][24114] Avg episode reward: [(0, '0.208')] [2024-06-06 15:29:38,741][24347] Updated weights for policy 0, policy_version 35618 (0.0038) [2024-06-06 15:29:41,825][24347] Updated weights for policy 0, policy_version 35628 (0.0028) [2024-06-06 15:29:42,318][24114] Fps is (10 sec: 47512.8, 60 sec: 43963.7, 300 sec: 44486.7). Total num frames: 583745536. Throughput: 0: 44424.8. Samples: 65026660. Policy #0 lag: (min: 0.0, avg: 9.5, max: 21.0) [2024-06-06 15:29:42,319][24114] Avg episode reward: [(0, '0.212')] [2024-06-06 15:29:46,184][24347] Updated weights for policy 0, policy_version 35638 (0.0037) [2024-06-06 15:29:47,318][24114] Fps is (10 sec: 49152.3, 60 sec: 44782.8, 300 sec: 44375.6). Total num frames: 583974912. Throughput: 0: 44433.7. Samples: 65167400. Policy #0 lag: (min: 0.0, avg: 9.5, max: 21.0) [2024-06-06 15:29:47,318][24114] Avg episode reward: [(0, '0.197')] [2024-06-06 15:29:49,299][24347] Updated weights for policy 0, policy_version 35648 (0.0043) [2024-06-06 15:29:52,318][24114] Fps is (10 sec: 40959.8, 60 sec: 44784.2, 300 sec: 44375.6). Total num frames: 584155136. Throughput: 0: 44515.3. Samples: 65432420. Policy #0 lag: (min: 0.0, avg: 8.9, max: 22.0) [2024-06-06 15:29:52,319][24114] Avg episode reward: [(0, '0.208')] [2024-06-06 15:29:53,611][24347] Updated weights for policy 0, policy_version 35658 (0.0033) [2024-06-06 15:29:56,681][24347] Updated weights for policy 0, policy_version 35668 (0.0035) [2024-06-06 15:29:57,324][24114] Fps is (10 sec: 42573.5, 60 sec: 43959.5, 300 sec: 44485.9). 
Total num frames: 584400896. Throughput: 0: 44383.5. Samples: 65688280. Policy #0 lag: (min: 0.0, avg: 8.9, max: 22.0) [2024-06-06 15:29:57,325][24114] Avg episode reward: [(0, '0.202')] [2024-06-06 15:30:00,771][24347] Updated weights for policy 0, policy_version 35678 (0.0042) [2024-06-06 15:30:02,318][24114] Fps is (10 sec: 47514.3, 60 sec: 45056.0, 300 sec: 44375.6). Total num frames: 584630272. Throughput: 0: 44215.7. Samples: 65822460. Policy #0 lag: (min: 0.0, avg: 8.9, max: 22.0) [2024-06-06 15:30:02,319][24114] Avg episode reward: [(0, '0.208')] [2024-06-06 15:30:04,211][24347] Updated weights for policy 0, policy_version 35688 (0.0028) [2024-06-06 15:30:07,318][24114] Fps is (10 sec: 39345.0, 60 sec: 44509.8, 300 sec: 44264.6). Total num frames: 584794112. Throughput: 0: 44187.0. Samples: 66084780. Policy #0 lag: (min: 0.0, avg: 8.9, max: 22.0) [2024-06-06 15:30:07,318][24114] Avg episode reward: [(0, '0.206')] [2024-06-06 15:30:08,237][24347] Updated weights for policy 0, policy_version 35698 (0.0038) [2024-06-06 15:30:11,506][24347] Updated weights for policy 0, policy_version 35708 (0.0032) [2024-06-06 15:30:12,318][24114] Fps is (10 sec: 40960.0, 60 sec: 43690.6, 300 sec: 44375.6). Total num frames: 585039872. Throughput: 0: 44119.5. Samples: 66347460. Policy #0 lag: (min: 0.0, avg: 8.9, max: 22.0) [2024-06-06 15:30:12,319][24114] Avg episode reward: [(0, '0.211')] [2024-06-06 15:30:15,862][24347] Updated weights for policy 0, policy_version 35718 (0.0031) [2024-06-06 15:30:17,318][24114] Fps is (10 sec: 50789.8, 60 sec: 45056.1, 300 sec: 44375.6). Total num frames: 585302016. Throughput: 0: 44312.3. Samples: 66490180. Policy #0 lag: (min: 0.0, avg: 9.5, max: 22.0) [2024-06-06 15:30:17,319][24114] Avg episode reward: [(0, '0.210')] [2024-06-06 15:30:19,086][24347] Updated weights for policy 0, policy_version 35728 (0.0026) [2024-06-06 15:30:22,318][24114] Fps is (10 sec: 42598.4, 60 sec: 44509.8, 300 sec: 44320.1). Total num frames: 585465856. 
Throughput: 0: 44244.5. Samples: 66758060. Policy #0 lag: (min: 0.0, avg: 9.5, max: 22.0) [2024-06-06 15:30:22,319][24114] Avg episode reward: [(0, '0.212')] [2024-06-06 15:30:22,388][24326] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000035735_585482240.pth... [2024-06-06 15:30:22,465][24326] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000035086_574849024.pth [2024-06-06 15:30:22,956][24347] Updated weights for policy 0, policy_version 35738 (0.0039) [2024-06-06 15:30:26,440][24347] Updated weights for policy 0, policy_version 35748 (0.0035) [2024-06-06 15:30:27,318][24114] Fps is (10 sec: 39322.1, 60 sec: 43690.6, 300 sec: 44375.7). Total num frames: 585695232. Throughput: 0: 44235.7. Samples: 67017260. Policy #0 lag: (min: 0.0, avg: 9.5, max: 22.0) [2024-06-06 15:30:27,318][24114] Avg episode reward: [(0, '0.207')] [2024-06-06 15:30:30,468][24347] Updated weights for policy 0, policy_version 35758 (0.0037) [2024-06-06 15:30:32,318][24114] Fps is (10 sec: 49152.3, 60 sec: 44782.9, 300 sec: 44320.1). Total num frames: 585957376. Throughput: 0: 44081.4. Samples: 67151060. Policy #0 lag: (min: 0.0, avg: 9.5, max: 22.0) [2024-06-06 15:30:32,318][24114] Avg episode reward: [(0, '0.217')] [2024-06-06 15:30:33,860][24347] Updated weights for policy 0, policy_version 35768 (0.0048) [2024-06-06 15:30:37,081][24326] Signal inference workers to stop experience collection... (900 times) [2024-06-06 15:30:37,138][24347] InferenceWorker_p0-w0: stopping experience collection (900 times) [2024-06-06 15:30:37,138][24326] Signal inference workers to resume experience collection... (900 times) [2024-06-06 15:30:37,153][24347] InferenceWorker_p0-w0: resuming experience collection (900 times) [2024-06-06 15:30:37,318][24114] Fps is (10 sec: 45875.1, 60 sec: 44510.0, 300 sec: 44431.2). Total num frames: 586153984. Throughput: 0: 44102.4. Samples: 67417020. 
Policy #0 lag: (min: 0.0, avg: 9.5, max: 22.0) [2024-06-06 15:30:37,318][24114] Avg episode reward: [(0, '0.208')] [2024-06-06 15:30:37,593][24347] Updated weights for policy 0, policy_version 35778 (0.0028) [2024-06-06 15:30:41,722][24347] Updated weights for policy 0, policy_version 35788 (0.0041) [2024-06-06 15:30:42,318][24114] Fps is (10 sec: 40959.8, 60 sec: 43690.7, 300 sec: 44431.2). Total num frames: 586366976. Throughput: 0: 44317.4. Samples: 67682300. Policy #0 lag: (min: 0.0, avg: 10.0, max: 21.0) [2024-06-06 15:30:42,318][24114] Avg episode reward: [(0, '0.214')] [2024-06-06 15:30:45,373][24347] Updated weights for policy 0, policy_version 35798 (0.0033) [2024-06-06 15:30:47,318][24114] Fps is (10 sec: 45875.1, 60 sec: 43963.8, 300 sec: 44320.1). Total num frames: 586612736. Throughput: 0: 44257.4. Samples: 67814040. Policy #0 lag: (min: 0.0, avg: 10.0, max: 21.0) [2024-06-06 15:30:47,318][24114] Avg episode reward: [(0, '0.208')] [2024-06-06 15:30:49,403][24347] Updated weights for policy 0, policy_version 35808 (0.0032) [2024-06-06 15:30:52,318][24114] Fps is (10 sec: 44236.9, 60 sec: 44236.9, 300 sec: 44264.6). Total num frames: 586809344. Throughput: 0: 44267.5. Samples: 68076820. Policy #0 lag: (min: 0.0, avg: 10.0, max: 21.0) [2024-06-06 15:30:52,319][24114] Avg episode reward: [(0, '0.210')] [2024-06-06 15:30:52,776][24347] Updated weights for policy 0, policy_version 35818 (0.0026) [2024-06-06 15:30:56,608][24347] Updated weights for policy 0, policy_version 35828 (0.0036) [2024-06-06 15:30:57,318][24114] Fps is (10 sec: 40959.6, 60 sec: 43694.9, 300 sec: 44431.2). Total num frames: 587022336. Throughput: 0: 44339.9. Samples: 68342760. Policy #0 lag: (min: 0.0, avg: 10.0, max: 21.0) [2024-06-06 15:30:57,319][24114] Avg episode reward: [(0, '0.211')] [2024-06-06 15:31:00,184][24347] Updated weights for policy 0, policy_version 35838 (0.0033) [2024-06-06 15:31:02,318][24114] Fps is (10 sec: 45875.4, 60 sec: 43963.8, 300 sec: 44320.1). 
Total num frames: 587268096. Throughput: 0: 44100.6. Samples: 68474700. Policy #0 lag: (min: 0.0, avg: 11.0, max: 23.0) [2024-06-06 15:31:02,318][24114] Avg episode reward: [(0, '0.214')] [2024-06-06 15:31:04,088][24347] Updated weights for policy 0, policy_version 35848 (0.0035) [2024-06-06 15:31:07,320][24114] Fps is (10 sec: 45865.9, 60 sec: 44781.3, 300 sec: 44375.3). Total num frames: 587481088. Throughput: 0: 44088.6. Samples: 68742140. Policy #0 lag: (min: 0.0, avg: 11.0, max: 23.0) [2024-06-06 15:31:07,321][24114] Avg episode reward: [(0, '0.207')] [2024-06-06 15:31:07,499][24347] Updated weights for policy 0, policy_version 35858 (0.0022) [2024-06-06 15:31:12,034][24347] Updated weights for policy 0, policy_version 35868 (0.0039) [2024-06-06 15:31:12,318][24114] Fps is (10 sec: 40960.0, 60 sec: 43963.8, 300 sec: 44431.2). Total num frames: 587677696. Throughput: 0: 44281.8. Samples: 69009940. Policy #0 lag: (min: 0.0, avg: 11.0, max: 23.0) [2024-06-06 15:31:12,318][24114] Avg episode reward: [(0, '0.221')] [2024-06-06 15:31:14,833][24347] Updated weights for policy 0, policy_version 35878 (0.0027) [2024-06-06 15:31:17,320][24114] Fps is (10 sec: 45876.2, 60 sec: 43962.4, 300 sec: 44375.4). Total num frames: 587939840. Throughput: 0: 44121.7. Samples: 69136620. Policy #0 lag: (min: 0.0, avg: 11.0, max: 23.0) [2024-06-06 15:31:17,321][24114] Avg episode reward: [(0, '0.220')] [2024-06-06 15:31:19,011][24347] Updated weights for policy 0, policy_version 35888 (0.0036) [2024-06-06 15:31:22,152][24347] Updated weights for policy 0, policy_version 35898 (0.0027) [2024-06-06 15:31:22,318][24114] Fps is (10 sec: 47513.7, 60 sec: 44783.0, 300 sec: 44320.1). Total num frames: 588152832. Throughput: 0: 44235.6. Samples: 69407620. 
Policy #0 lag: (min: 0.0, avg: 11.0, max: 23.0) [2024-06-06 15:31:22,318][24114] Avg episode reward: [(0, '0.208')] [2024-06-06 15:31:26,142][24347] Updated weights for policy 0, policy_version 35908 (0.0037) [2024-06-06 15:31:27,318][24114] Fps is (10 sec: 40967.8, 60 sec: 44236.8, 300 sec: 44375.9). Total num frames: 588349440. Throughput: 0: 44224.5. Samples: 69672400. Policy #0 lag: (min: 0.0, avg: 11.7, max: 21.0) [2024-06-06 15:31:27,318][24114] Avg episode reward: [(0, '0.207')] [2024-06-06 15:31:29,471][24347] Updated weights for policy 0, policy_version 35918 (0.0038) [2024-06-06 15:31:32,318][24114] Fps is (10 sec: 45874.4, 60 sec: 44236.7, 300 sec: 44375.6). Total num frames: 588611584. Throughput: 0: 44195.5. Samples: 69802840. Policy #0 lag: (min: 0.0, avg: 11.7, max: 21.0) [2024-06-06 15:31:32,319][24114] Avg episode reward: [(0, '0.215')] [2024-06-06 15:31:33,376][24347] Updated weights for policy 0, policy_version 35928 (0.0029) [2024-06-06 15:31:36,662][24347] Updated weights for policy 0, policy_version 35938 (0.0024) [2024-06-06 15:31:37,318][24114] Fps is (10 sec: 49152.3, 60 sec: 44783.0, 300 sec: 44375.7). Total num frames: 588840960. Throughput: 0: 44422.3. Samples: 70075820. Policy #0 lag: (min: 0.0, avg: 11.7, max: 21.0) [2024-06-06 15:31:37,318][24114] Avg episode reward: [(0, '0.222')] [2024-06-06 15:31:40,971][24347] Updated weights for policy 0, policy_version 35948 (0.0031) [2024-06-06 15:31:42,319][24114] Fps is (10 sec: 40956.2, 60 sec: 44236.0, 300 sec: 44431.0). Total num frames: 589021184. Throughput: 0: 44488.8. Samples: 70344800. Policy #0 lag: (min: 0.0, avg: 11.7, max: 21.0) [2024-06-06 15:31:42,320][24114] Avg episode reward: [(0, '0.217')] [2024-06-06 15:31:44,071][24347] Updated weights for policy 0, policy_version 35958 (0.0023) [2024-06-06 15:31:47,318][24114] Fps is (10 sec: 42598.2, 60 sec: 44236.8, 300 sec: 44375.7). Total num frames: 589266944. Throughput: 0: 44518.2. Samples: 70478020. 
Policy #0 lag: (min: 0.0, avg: 11.7, max: 21.0) [2024-06-06 15:31:47,319][24114] Avg episode reward: [(0, '0.209')] [2024-06-06 15:31:48,175][24347] Updated weights for policy 0, policy_version 35968 (0.0037) [2024-06-06 15:31:49,613][24326] Signal inference workers to stop experience collection... (950 times) [2024-06-06 15:31:49,652][24347] InferenceWorker_p0-w0: stopping experience collection (950 times) [2024-06-06 15:31:49,660][24326] Signal inference workers to resume experience collection... (950 times) [2024-06-06 15:31:49,665][24347] InferenceWorker_p0-w0: resuming experience collection (950 times) [2024-06-06 15:31:51,408][24347] Updated weights for policy 0, policy_version 35978 (0.0043) [2024-06-06 15:31:52,318][24114] Fps is (10 sec: 45880.2, 60 sec: 44509.9, 300 sec: 44320.1). Total num frames: 589479936. Throughput: 0: 44435.0. Samples: 70741620. Policy #0 lag: (min: 0.0, avg: 10.7, max: 22.0) [2024-06-06 15:31:52,318][24114] Avg episode reward: [(0, '0.214')] [2024-06-06 15:31:55,895][24347] Updated weights for policy 0, policy_version 35988 (0.0029) [2024-06-06 15:31:57,318][24114] Fps is (10 sec: 42597.9, 60 sec: 44509.9, 300 sec: 44375.6). Total num frames: 589692928. Throughput: 0: 44458.9. Samples: 71010600. Policy #0 lag: (min: 0.0, avg: 10.7, max: 22.0) [2024-06-06 15:31:57,318][24114] Avg episode reward: [(0, '0.227')] [2024-06-06 15:31:58,832][24347] Updated weights for policy 0, policy_version 35998 (0.0035) [2024-06-06 15:32:02,318][24114] Fps is (10 sec: 44236.1, 60 sec: 44236.7, 300 sec: 44320.1). Total num frames: 589922304. Throughput: 0: 44556.9. Samples: 71141600. 
Policy #0 lag: (min: 0.0, avg: 10.7, max: 22.0) [2024-06-06 15:32:02,319][24114] Avg episode reward: [(0, '0.218')] [2024-06-06 15:32:03,195][24347] Updated weights for policy 0, policy_version 36008 (0.0024) [2024-06-06 15:32:06,195][24347] Updated weights for policy 0, policy_version 36018 (0.0029) [2024-06-06 15:32:07,318][24114] Fps is (10 sec: 45876.1, 60 sec: 44511.5, 300 sec: 44375.7). Total num frames: 590151680. Throughput: 0: 44353.3. Samples: 71403520. Policy #0 lag: (min: 0.0, avg: 10.7, max: 22.0) [2024-06-06 15:32:07,318][24114] Avg episode reward: [(0, '0.218')] [2024-06-06 15:32:10,446][24347] Updated weights for policy 0, policy_version 36028 (0.0035) [2024-06-06 15:32:12,318][24114] Fps is (10 sec: 44237.1, 60 sec: 44782.9, 300 sec: 44432.1). Total num frames: 590364672. Throughput: 0: 44611.5. Samples: 71679920. Policy #0 lag: (min: 0.0, avg: 8.1, max: 21.0) [2024-06-06 15:32:12,319][24114] Avg episode reward: [(0, '0.222')] [2024-06-06 15:32:13,515][24347] Updated weights for policy 0, policy_version 36038 (0.0037) [2024-06-06 15:32:17,318][24114] Fps is (10 sec: 42598.1, 60 sec: 43965.1, 300 sec: 44320.1). Total num frames: 590577664. Throughput: 0: 44508.6. Samples: 71805720. Policy #0 lag: (min: 0.0, avg: 8.1, max: 21.0) [2024-06-06 15:32:17,318][24114] Avg episode reward: [(0, '0.220')] [2024-06-06 15:32:18,004][24347] Updated weights for policy 0, policy_version 36048 (0.0033) [2024-06-06 15:32:21,089][24347] Updated weights for policy 0, policy_version 36058 (0.0044) [2024-06-06 15:32:22,318][24114] Fps is (10 sec: 45874.8, 60 sec: 44509.7, 300 sec: 44320.1). Total num frames: 590823424. Throughput: 0: 44330.5. Samples: 72070700. Policy #0 lag: (min: 0.0, avg: 8.1, max: 21.0) [2024-06-06 15:32:22,319][24114] Avg episode reward: [(0, '0.210')] [2024-06-06 15:32:22,344][24326] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000036061_590823424.pth... 
[2024-06-06 15:32:22,411][24326] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000035410_580157440.pth [2024-06-06 15:32:25,308][24347] Updated weights for policy 0, policy_version 36068 (0.0034) [2024-06-06 15:32:27,318][24114] Fps is (10 sec: 45874.6, 60 sec: 44782.9, 300 sec: 44431.2). Total num frames: 591036416. Throughput: 0: 44249.8. Samples: 72336000. Policy #0 lag: (min: 0.0, avg: 8.1, max: 21.0) [2024-06-06 15:32:27,319][24114] Avg episode reward: [(0, '0.217')] [2024-06-06 15:32:28,269][24347] Updated weights for policy 0, policy_version 36078 (0.0027) [2024-06-06 15:32:32,318][24114] Fps is (10 sec: 42598.9, 60 sec: 43963.8, 300 sec: 44375.7). Total num frames: 591249408. Throughput: 0: 44235.1. Samples: 72468600. Policy #0 lag: (min: 0.0, avg: 8.1, max: 21.0) [2024-06-06 15:32:32,319][24114] Avg episode reward: [(0, '0.219')] [2024-06-06 15:32:32,779][24347] Updated weights for policy 0, policy_version 36088 (0.0031) [2024-06-06 15:32:35,954][24347] Updated weights for policy 0, policy_version 36098 (0.0023) [2024-06-06 15:32:37,319][24114] Fps is (10 sec: 45872.3, 60 sec: 44236.2, 300 sec: 44320.0). Total num frames: 591495168. Throughput: 0: 44193.9. Samples: 72730380. Policy #0 lag: (min: 0.0, avg: 10.8, max: 23.0) [2024-06-06 15:32:37,319][24114] Avg episode reward: [(0, '0.210')] [2024-06-06 15:32:40,113][24347] Updated weights for policy 0, policy_version 36108 (0.0041) [2024-06-06 15:32:42,318][24114] Fps is (10 sec: 44237.2, 60 sec: 44510.7, 300 sec: 44375.6). Total num frames: 591691776. Throughput: 0: 44146.8. Samples: 72997200. Policy #0 lag: (min: 0.0, avg: 10.8, max: 23.0) [2024-06-06 15:32:42,318][24114] Avg episode reward: [(0, '0.215')] [2024-06-06 15:32:43,431][24347] Updated weights for policy 0, policy_version 36118 (0.0036) [2024-06-06 15:32:47,318][24114] Fps is (10 sec: 40963.3, 60 sec: 43963.8, 300 sec: 44320.1). Total num frames: 591904768. Throughput: 0: 44277.5. Samples: 73134080. 
Policy #0 lag: (min: 0.0, avg: 10.8, max: 23.0) [2024-06-06 15:32:47,318][24114] Avg episode reward: [(0, '0.215')] [2024-06-06 15:32:47,679][24347] Updated weights for policy 0, policy_version 36128 (0.0043) [2024-06-06 15:32:50,981][24347] Updated weights for policy 0, policy_version 36138 (0.0028) [2024-06-06 15:32:52,318][24114] Fps is (10 sec: 45875.0, 60 sec: 44509.9, 300 sec: 44264.6). Total num frames: 592150528. Throughput: 0: 44299.1. Samples: 73396980. Policy #0 lag: (min: 0.0, avg: 10.8, max: 23.0) [2024-06-06 15:32:52,318][24114] Avg episode reward: [(0, '0.216')] [2024-06-06 15:32:54,791][24347] Updated weights for policy 0, policy_version 36148 (0.0035) [2024-06-06 15:32:57,318][24114] Fps is (10 sec: 45875.0, 60 sec: 44510.0, 300 sec: 44375.6). Total num frames: 592363520. Throughput: 0: 44031.2. Samples: 73661320. Policy #0 lag: (min: 0.0, avg: 10.8, max: 23.0) [2024-06-06 15:32:57,318][24114] Avg episode reward: [(0, '0.216')] [2024-06-06 15:32:58,084][24347] Updated weights for policy 0, policy_version 36158 (0.0035) [2024-06-06 15:33:02,318][24114] Fps is (10 sec: 40959.9, 60 sec: 43963.8, 300 sec: 44320.1). Total num frames: 592560128. Throughput: 0: 44241.3. Samples: 73796580. Policy #0 lag: (min: 0.0, avg: 10.9, max: 21.0) [2024-06-06 15:33:02,319][24114] Avg episode reward: [(0, '0.215')] [2024-06-06 15:33:02,659][24347] Updated weights for policy 0, policy_version 36168 (0.0032) [2024-06-06 15:33:05,619][24347] Updated weights for policy 0, policy_version 36178 (0.0028) [2024-06-06 15:33:07,318][24114] Fps is (10 sec: 45874.7, 60 sec: 44509.7, 300 sec: 44264.5). Total num frames: 592822272. Throughput: 0: 44260.9. Samples: 74062440. Policy #0 lag: (min: 0.0, avg: 10.9, max: 21.0) [2024-06-06 15:33:07,319][24114] Avg episode reward: [(0, '0.213')] [2024-06-06 15:33:10,277][24347] Updated weights for policy 0, policy_version 36188 (0.0041) [2024-06-06 15:33:12,318][24114] Fps is (10 sec: 47513.1, 60 sec: 44509.8, 300 sec: 44375.6). 
Total num frames: 593035264. Throughput: 0: 44257.3. Samples: 74327580. Policy #0 lag: (min: 0.0, avg: 10.9, max: 21.0) [2024-06-06 15:33:12,319][24114] Avg episode reward: [(0, '0.216')] [2024-06-06 15:33:12,923][24347] Updated weights for policy 0, policy_version 36198 (0.0038) [2024-06-06 15:33:17,320][24114] Fps is (10 sec: 39312.3, 60 sec: 43961.9, 300 sec: 44319.7). Total num frames: 593215488. Throughput: 0: 44258.9. Samples: 74460360. Policy #0 lag: (min: 0.0, avg: 10.9, max: 21.0) [2024-06-06 15:33:17,322][24114] Avg episode reward: [(0, '0.210')] [2024-06-06 15:33:17,497][24347] Updated weights for policy 0, policy_version 36208 (0.0034) [2024-06-06 15:33:18,896][24326] Signal inference workers to stop experience collection... (1000 times) [2024-06-06 15:33:18,896][24326] Signal inference workers to resume experience collection... (1000 times) [2024-06-06 15:33:18,932][24347] InferenceWorker_p0-w0: stopping experience collection (1000 times) [2024-06-06 15:33:18,932][24347] InferenceWorker_p0-w0: resuming experience collection (1000 times) [2024-06-06 15:33:20,349][24347] Updated weights for policy 0, policy_version 36218 (0.0033) [2024-06-06 15:33:22,318][24114] Fps is (10 sec: 42598.7, 60 sec: 43963.8, 300 sec: 44098.6). Total num frames: 593461248. Throughput: 0: 44455.4. Samples: 74730840. Policy #0 lag: (min: 0.0, avg: 11.3, max: 22.0) [2024-06-06 15:33:22,319][24114] Avg episode reward: [(0, '0.222')] [2024-06-06 15:33:24,475][24347] Updated weights for policy 0, policy_version 36228 (0.0037) [2024-06-06 15:33:27,318][24114] Fps is (10 sec: 49163.9, 60 sec: 44509.9, 300 sec: 44431.2). Total num frames: 593707008. Throughput: 0: 44381.2. Samples: 74994360. 
Policy #0 lag: (min: 0.0, avg: 11.3, max: 22.0) [2024-06-06 15:33:27,319][24114] Avg episode reward: [(0, '0.211')] [2024-06-06 15:33:27,740][24347] Updated weights for policy 0, policy_version 36238 (0.0025) [2024-06-06 15:33:32,062][24347] Updated weights for policy 0, policy_version 36248 (0.0036) [2024-06-06 15:33:32,320][24114] Fps is (10 sec: 42590.3, 60 sec: 43962.3, 300 sec: 44319.8). Total num frames: 593887232. Throughput: 0: 44358.0. Samples: 75130280. Policy #0 lag: (min: 0.0, avg: 11.3, max: 22.0) [2024-06-06 15:33:32,329][24114] Avg episode reward: [(0, '0.212')] [2024-06-06 15:33:35,082][24347] Updated weights for policy 0, policy_version 36258 (0.0038) [2024-06-06 15:33:37,318][24114] Fps is (10 sec: 42598.9, 60 sec: 43964.3, 300 sec: 44153.5). Total num frames: 594132992. Throughput: 0: 44457.8. Samples: 75397580. Policy #0 lag: (min: 0.0, avg: 11.3, max: 22.0) [2024-06-06 15:33:37,318][24114] Avg episode reward: [(0, '0.219')] [2024-06-06 15:33:39,513][24347] Updated weights for policy 0, policy_version 36268 (0.0031) [2024-06-06 15:33:42,168][24347] Updated weights for policy 0, policy_version 36278 (0.0039) [2024-06-06 15:33:42,318][24114] Fps is (10 sec: 49161.5, 60 sec: 44782.9, 300 sec: 44375.6). Total num frames: 594378752. Throughput: 0: 44479.1. Samples: 75662880. Policy #0 lag: (min: 0.0, avg: 11.3, max: 22.0) [2024-06-06 15:33:42,318][24114] Avg episode reward: [(0, '0.208')] [2024-06-06 15:33:46,613][24347] Updated weights for policy 0, policy_version 36288 (0.0025) [2024-06-06 15:33:47,318][24114] Fps is (10 sec: 42598.6, 60 sec: 44236.8, 300 sec: 44376.0). Total num frames: 594558976. Throughput: 0: 44470.4. Samples: 75797740. Policy #0 lag: (min: 0.0, avg: 11.5, max: 23.0) [2024-06-06 15:33:47,318][24114] Avg episode reward: [(0, '0.210')] [2024-06-06 15:33:49,657][24347] Updated weights for policy 0, policy_version 36298 (0.0037) [2024-06-06 15:33:52,318][24114] Fps is (10 sec: 40959.4, 60 sec: 43963.6, 300 sec: 44153.5). 
Total num frames: 594788352. Throughput: 0: 44520.4. Samples: 76065860. Policy #0 lag: (min: 0.0, avg: 11.5, max: 23.0) [2024-06-06 15:33:52,323][24114] Avg episode reward: [(0, '0.209')] [2024-06-06 15:33:53,734][24347] Updated weights for policy 0, policy_version 36308 (0.0034) [2024-06-06 15:33:57,023][24347] Updated weights for policy 0, policy_version 36318 (0.0022) [2024-06-06 15:33:57,318][24114] Fps is (10 sec: 47513.6, 60 sec: 44509.9, 300 sec: 44431.2). Total num frames: 595034112. Throughput: 0: 44499.8. Samples: 76330060. Policy #0 lag: (min: 0.0, avg: 11.5, max: 23.0) [2024-06-06 15:33:57,318][24114] Avg episode reward: [(0, '0.217')] [2024-06-06 15:34:01,221][24347] Updated weights for policy 0, policy_version 36328 (0.0039) [2024-06-06 15:34:02,318][24114] Fps is (10 sec: 44237.3, 60 sec: 44509.8, 300 sec: 44431.2). Total num frames: 595230720. Throughput: 0: 44468.2. Samples: 76461320. Policy #0 lag: (min: 0.0, avg: 11.5, max: 23.0) [2024-06-06 15:34:02,319][24114] Avg episode reward: [(0, '0.221')] [2024-06-06 15:34:04,410][24347] Updated weights for policy 0, policy_version 36338 (0.0035) [2024-06-06 15:34:07,318][24114] Fps is (10 sec: 45874.6, 60 sec: 44509.9, 300 sec: 44320.1). Total num frames: 595492864. Throughput: 0: 44385.8. Samples: 76728200. Policy #0 lag: (min: 0.0, avg: 11.5, max: 23.0) [2024-06-06 15:34:07,319][24114] Avg episode reward: [(0, '0.223')] [2024-06-06 15:34:08,920][24347] Updated weights for policy 0, policy_version 36348 (0.0034) [2024-06-06 15:34:11,699][24347] Updated weights for policy 0, policy_version 36358 (0.0032) [2024-06-06 15:34:12,318][24114] Fps is (10 sec: 45875.3, 60 sec: 44236.9, 300 sec: 44375.7). Total num frames: 595689472. Throughput: 0: 44433.8. Samples: 76993880. 
Policy #0 lag: (min: 0.0, avg: 9.7, max: 25.0) [2024-06-06 15:34:12,319][24114] Avg episode reward: [(0, '0.214')] [2024-06-06 15:34:15,994][24347] Updated weights for policy 0, policy_version 36368 (0.0030) [2024-06-06 15:34:17,318][24114] Fps is (10 sec: 42598.4, 60 sec: 45057.8, 300 sec: 44486.7). Total num frames: 595918848. Throughput: 0: 44506.8. Samples: 77133000. Policy #0 lag: (min: 0.0, avg: 9.7, max: 25.0) [2024-06-06 15:34:17,319][24114] Avg episode reward: [(0, '0.213')] [2024-06-06 15:34:18,988][24347] Updated weights for policy 0, policy_version 36378 (0.0036) [2024-06-06 15:34:22,318][24114] Fps is (10 sec: 42598.1, 60 sec: 44236.8, 300 sec: 44209.0). Total num frames: 596115456. Throughput: 0: 44351.4. Samples: 77393400. Policy #0 lag: (min: 0.0, avg: 9.7, max: 25.0) [2024-06-06 15:34:22,319][24114] Avg episode reward: [(0, '0.224')] [2024-06-06 15:34:22,331][24326] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000036384_596115456.pth... [2024-06-06 15:34:22,402][24326] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000035735_585482240.pth [2024-06-06 15:34:23,081][24347] Updated weights for policy 0, policy_version 36388 (0.0030) [2024-06-06 15:34:25,606][24326] Signal inference workers to stop experience collection... (1050 times) [2024-06-06 15:34:25,607][24326] Signal inference workers to resume experience collection... (1050 times) [2024-06-06 15:34:25,648][24347] InferenceWorker_p0-w0: stopping experience collection (1050 times) [2024-06-06 15:34:25,648][24347] InferenceWorker_p0-w0: resuming experience collection (1050 times) [2024-06-06 15:34:26,626][24347] Updated weights for policy 0, policy_version 36398 (0.0030) [2024-06-06 15:34:27,324][24114] Fps is (10 sec: 44210.7, 60 sec: 44232.5, 300 sec: 44374.7). Total num frames: 596361216. Throughput: 0: 44212.0. Samples: 77652680. 
Policy #0 lag: (min: 0.0, avg: 9.7, max: 25.0) [2024-06-06 15:34:27,325][24114] Avg episode reward: [(0, '0.224')] [2024-06-06 15:34:30,802][24347] Updated weights for policy 0, policy_version 36408 (0.0040) [2024-06-06 15:34:32,318][24114] Fps is (10 sec: 47514.2, 60 sec: 45057.5, 300 sec: 44431.2). Total num frames: 596590592. Throughput: 0: 44244.8. Samples: 77788760. Policy #0 lag: (min: 0.0, avg: 10.5, max: 21.0) [2024-06-06 15:34:32,319][24114] Avg episode reward: [(0, '0.215')] [2024-06-06 15:34:33,946][24347] Updated weights for policy 0, policy_version 36418 (0.0036) [2024-06-06 15:34:37,318][24114] Fps is (10 sec: 44263.2, 60 sec: 44509.8, 300 sec: 44264.6). Total num frames: 596803584. Throughput: 0: 44317.1. Samples: 78060120. Policy #0 lag: (min: 0.0, avg: 10.5, max: 21.0) [2024-06-06 15:34:37,318][24114] Avg episode reward: [(0, '0.217')] [2024-06-06 15:34:38,351][24347] Updated weights for policy 0, policy_version 36428 (0.0029) [2024-06-06 15:34:41,319][24347] Updated weights for policy 0, policy_version 36438 (0.0035) [2024-06-06 15:34:42,318][24114] Fps is (10 sec: 42598.5, 60 sec: 43963.8, 300 sec: 44209.1). Total num frames: 597016576. Throughput: 0: 44305.7. Samples: 78323820. Policy #0 lag: (min: 0.0, avg: 10.5, max: 21.0) [2024-06-06 15:34:42,318][24114] Avg episode reward: [(0, '0.221')] [2024-06-06 15:34:45,369][24347] Updated weights for policy 0, policy_version 36448 (0.0032) [2024-06-06 15:34:47,318][24114] Fps is (10 sec: 44237.0, 60 sec: 44782.9, 300 sec: 44375.7). Total num frames: 597245952. Throughput: 0: 44412.6. Samples: 78459880. Policy #0 lag: (min: 0.0, avg: 10.5, max: 21.0) [2024-06-06 15:34:47,318][24114] Avg episode reward: [(0, '0.225')] [2024-06-06 15:34:48,500][24347] Updated weights for policy 0, policy_version 36458 (0.0031) [2024-06-06 15:34:52,318][24114] Fps is (10 sec: 44236.3, 60 sec: 44509.9, 300 sec: 44265.4). Total num frames: 597458944. Throughput: 0: 44458.6. Samples: 78728840. 
Policy #0 lag: (min: 0.0, avg: 10.5, max: 21.0) [2024-06-06 15:34:52,319][24114] Avg episode reward: [(0, '0.210')] [2024-06-06 15:34:52,569][24347] Updated weights for policy 0, policy_version 36468 (0.0036) [2024-06-06 15:34:55,789][24347] Updated weights for policy 0, policy_version 36478 (0.0038) [2024-06-06 15:34:57,318][24114] Fps is (10 sec: 42598.3, 60 sec: 43963.7, 300 sec: 44209.0). Total num frames: 597671936. Throughput: 0: 44370.3. Samples: 78990540. Policy #0 lag: (min: 0.0, avg: 11.2, max: 22.0) [2024-06-06 15:34:57,318][24114] Avg episode reward: [(0, '0.214')] [2024-06-06 15:35:00,065][24347] Updated weights for policy 0, policy_version 36488 (0.0039) [2024-06-06 15:35:02,318][24114] Fps is (10 sec: 45876.0, 60 sec: 44783.0, 300 sec: 44486.7). Total num frames: 597917696. Throughput: 0: 44404.6. Samples: 79131200. Policy #0 lag: (min: 0.0, avg: 11.2, max: 22.0) [2024-06-06 15:35:02,318][24114] Avg episode reward: [(0, '0.222')] [2024-06-06 15:35:03,194][24347] Updated weights for policy 0, policy_version 36498 (0.0036) [2024-06-06 15:35:07,318][24114] Fps is (10 sec: 44236.5, 60 sec: 43690.7, 300 sec: 44320.1). Total num frames: 598114304. Throughput: 0: 44513.4. Samples: 79396500. Policy #0 lag: (min: 0.0, avg: 11.2, max: 22.0) [2024-06-06 15:35:07,320][24114] Avg episode reward: [(0, '0.218')] [2024-06-06 15:35:07,648][24347] Updated weights for policy 0, policy_version 36508 (0.0032) [2024-06-06 15:35:10,794][24347] Updated weights for policy 0, policy_version 36518 (0.0041) [2024-06-06 15:35:12,324][24114] Fps is (10 sec: 42572.7, 60 sec: 44232.5, 300 sec: 44208.2). Total num frames: 598343680. Throughput: 0: 44676.0. Samples: 79663100. Policy #0 lag: (min: 0.0, avg: 11.2, max: 22.0) [2024-06-06 15:35:12,324][24114] Avg episode reward: [(0, '0.222')] [2024-06-06 15:35:14,770][24347] Updated weights for policy 0, policy_version 36528 (0.0024) [2024-06-06 15:35:17,320][24114] Fps is (10 sec: 49142.7, 60 sec: 44781.5, 300 sec: 44542.0). 
Total num frames: 598605824. Throughput: 0: 44624.3. Samples: 79796940. Policy #0 lag: (min: 0.0, avg: 11.2, max: 22.0) [2024-06-06 15:35:17,321][24114] Avg episode reward: [(0, '0.224')] [2024-06-06 15:35:17,844][24347] Updated weights for policy 0, policy_version 36538 (0.0044) [2024-06-06 15:35:22,318][24114] Fps is (10 sec: 45902.1, 60 sec: 44783.0, 300 sec: 44431.2). Total num frames: 598802432. Throughput: 0: 44596.4. Samples: 80066960. Policy #0 lag: (min: 0.0, avg: 9.0, max: 21.0) [2024-06-06 15:35:22,319][24114] Avg episode reward: [(0, '0.222')] [2024-06-06 15:35:22,320][24347] Updated weights for policy 0, policy_version 36548 (0.0040) [2024-06-06 15:35:25,358][24347] Updated weights for policy 0, policy_version 36558 (0.0027) [2024-06-06 15:35:27,318][24114] Fps is (10 sec: 40968.0, 60 sec: 44241.2, 300 sec: 44264.6). Total num frames: 599015424. Throughput: 0: 44456.4. Samples: 80324360. Policy #0 lag: (min: 0.0, avg: 9.0, max: 21.0) [2024-06-06 15:35:27,318][24114] Avg episode reward: [(0, '0.219')] [2024-06-06 15:35:29,617][24347] Updated weights for policy 0, policy_version 36568 (0.0047) [2024-06-06 15:35:32,320][24114] Fps is (10 sec: 45866.5, 60 sec: 44508.4, 300 sec: 44430.9). Total num frames: 599261184. Throughput: 0: 44462.9. Samples: 80460800. Policy #0 lag: (min: 0.0, avg: 9.0, max: 21.0) [2024-06-06 15:35:32,321][24114] Avg episode reward: [(0, '0.213')] [2024-06-06 15:35:32,582][24347] Updated weights for policy 0, policy_version 36578 (0.0027) [2024-06-06 15:35:36,880][24347] Updated weights for policy 0, policy_version 36588 (0.0026) [2024-06-06 15:35:37,318][24114] Fps is (10 sec: 45874.8, 60 sec: 44509.8, 300 sec: 44431.2). Total num frames: 599474176. Throughput: 0: 44579.6. Samples: 80734920. 
Policy #0 lag: (min: 0.0, avg: 9.0, max: 21.0) [2024-06-06 15:35:37,319][24114] Avg episode reward: [(0, '0.220')] [2024-06-06 15:35:39,955][24347] Updated weights for policy 0, policy_version 36598 (0.0032) [2024-06-06 15:35:42,318][24114] Fps is (10 sec: 40967.5, 60 sec: 44236.7, 300 sec: 44264.6). Total num frames: 599670784. Throughput: 0: 44670.5. Samples: 81000720. Policy #0 lag: (min: 0.0, avg: 9.0, max: 21.0) [2024-06-06 15:35:42,319][24114] Avg episode reward: [(0, '0.215')] [2024-06-06 15:35:44,133][24347] Updated weights for policy 0, policy_version 36608 (0.0030) [2024-06-06 15:35:47,318][24114] Fps is (10 sec: 45875.7, 60 sec: 44782.9, 300 sec: 44486.7). Total num frames: 599932928. Throughput: 0: 44440.4. Samples: 81131020. Policy #0 lag: (min: 0.0, avg: 9.1, max: 22.0) [2024-06-06 15:35:47,318][24114] Avg episode reward: [(0, '0.229')] [2024-06-06 15:35:47,339][24347] Updated weights for policy 0, policy_version 36618 (0.0046) [2024-06-06 15:35:51,270][24326] Signal inference workers to stop experience collection... (1100 times) [2024-06-06 15:35:51,271][24326] Signal inference workers to resume experience collection... (1100 times) [2024-06-06 15:35:51,311][24347] InferenceWorker_p0-w0: stopping experience collection (1100 times) [2024-06-06 15:35:51,311][24347] InferenceWorker_p0-w0: resuming experience collection (1100 times) [2024-06-06 15:35:51,557][24347] Updated weights for policy 0, policy_version 36628 (0.0030) [2024-06-06 15:35:52,318][24114] Fps is (10 sec: 47514.7, 60 sec: 44783.1, 300 sec: 44486.8). Total num frames: 600145920. Throughput: 0: 44562.4. Samples: 81401800. Policy #0 lag: (min: 0.0, avg: 9.1, max: 22.0) [2024-06-06 15:35:52,318][24114] Avg episode reward: [(0, '0.220')] [2024-06-06 15:35:54,688][24347] Updated weights for policy 0, policy_version 36638 (0.0045) [2024-06-06 15:35:57,318][24114] Fps is (10 sec: 40959.4, 60 sec: 44509.8, 300 sec: 44320.1). Total num frames: 600342528. Throughput: 0: 44485.8. Samples: 81664700. 
Policy #0 lag: (min: 0.0, avg: 9.1, max: 22.0) [2024-06-06 15:35:57,319][24114] Avg episode reward: [(0, '0.216')] [2024-06-06 15:35:59,225][24347] Updated weights for policy 0, policy_version 36648 (0.0029) [2024-06-06 15:36:02,095][24347] Updated weights for policy 0, policy_version 36658 (0.0031) [2024-06-06 15:36:02,324][24114] Fps is (10 sec: 45847.1, 60 sec: 44778.4, 300 sec: 44486.1). Total num frames: 600604672. Throughput: 0: 44420.9. Samples: 81796060. Policy #0 lag: (min: 0.0, avg: 9.1, max: 22.0) [2024-06-06 15:36:02,325][24114] Avg episode reward: [(0, '0.218')] [2024-06-06 15:36:06,438][24347] Updated weights for policy 0, policy_version 36668 (0.0036) [2024-06-06 15:36:07,320][24114] Fps is (10 sec: 45866.0, 60 sec: 44781.4, 300 sec: 44486.4). Total num frames: 600801280. Throughput: 0: 44530.4. Samples: 82070920. Policy #0 lag: (min: 1.0, avg: 9.8, max: 22.0) [2024-06-06 15:36:07,321][24114] Avg episode reward: [(0, '0.218')] [2024-06-06 15:36:09,833][24347] Updated weights for policy 0, policy_version 36678 (0.0036) [2024-06-06 15:36:12,318][24114] Fps is (10 sec: 40984.9, 60 sec: 44514.3, 300 sec: 44320.4). Total num frames: 601014272. Throughput: 0: 44611.1. Samples: 82331860. Policy #0 lag: (min: 1.0, avg: 9.8, max: 22.0) [2024-06-06 15:36:12,318][24114] Avg episode reward: [(0, '0.226')] [2024-06-06 15:36:13,805][24347] Updated weights for policy 0, policy_version 36688 (0.0035) [2024-06-06 15:36:16,981][24347] Updated weights for policy 0, policy_version 36698 (0.0040) [2024-06-06 15:36:17,318][24114] Fps is (10 sec: 45884.5, 60 sec: 44238.2, 300 sec: 44431.2). Total num frames: 601260032. Throughput: 0: 44463.7. Samples: 82461580. Policy #0 lag: (min: 1.0, avg: 9.8, max: 22.0) [2024-06-06 15:36:17,319][24114] Avg episode reward: [(0, '0.220')] [2024-06-06 15:36:21,404][24347] Updated weights for policy 0, policy_version 36708 (0.0033) [2024-06-06 15:36:22,318][24114] Fps is (10 sec: 47513.2, 60 sec: 44783.0, 300 sec: 44542.3). 
Total num frames: 601489408. Throughput: 0: 44328.0. Samples: 82729680. Policy #0 lag: (min: 1.0, avg: 9.8, max: 22.0) [2024-06-06 15:36:22,318][24114] Avg episode reward: [(0, '0.215')] [2024-06-06 15:36:22,409][24326] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000036713_601505792.pth... [2024-06-06 15:36:22,450][24326] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000036061_590823424.pth [2024-06-06 15:36:24,312][24347] Updated weights for policy 0, policy_version 36718 (0.0033) [2024-06-06 15:36:27,318][24114] Fps is (10 sec: 40960.7, 60 sec: 44236.8, 300 sec: 44264.6). Total num frames: 601669632. Throughput: 0: 44432.2. Samples: 83000160. Policy #0 lag: (min: 1.0, avg: 9.8, max: 22.0) [2024-06-06 15:36:27,318][24114] Avg episode reward: [(0, '0.219')] [2024-06-06 15:36:28,832][24347] Updated weights for policy 0, policy_version 36728 (0.0041) [2024-06-06 15:36:31,642][24347] Updated weights for policy 0, policy_version 36738 (0.0036) [2024-06-06 15:36:32,318][24114] Fps is (10 sec: 42598.2, 60 sec: 44238.2, 300 sec: 44320.1). Total num frames: 601915392. Throughput: 0: 44339.0. Samples: 83126280. Policy #0 lag: (min: 1.0, avg: 11.1, max: 22.0) [2024-06-06 15:36:32,318][24114] Avg episode reward: [(0, '0.218')] [2024-06-06 15:36:35,921][24347] Updated weights for policy 0, policy_version 36748 (0.0032) [2024-06-06 15:36:37,318][24114] Fps is (10 sec: 47513.2, 60 sec: 44509.9, 300 sec: 44486.9). Total num frames: 602144768. Throughput: 0: 44406.6. Samples: 83400100. Policy #0 lag: (min: 1.0, avg: 11.1, max: 22.0) [2024-06-06 15:36:37,319][24114] Avg episode reward: [(0, '0.222')] [2024-06-06 15:36:39,192][24347] Updated weights for policy 0, policy_version 36758 (0.0034) [2024-06-06 15:36:42,318][24114] Fps is (10 sec: 42599.1, 60 sec: 44510.0, 300 sec: 44320.1). Total num frames: 602341376. Throughput: 0: 44595.7. Samples: 83671500. 
Policy #0 lag: (min: 1.0, avg: 11.1, max: 22.0) [2024-06-06 15:36:42,318][24114] Avg episode reward: [(0, '0.219')] [2024-06-06 15:36:43,245][24347] Updated weights for policy 0, policy_version 36768 (0.0033) [2024-06-06 15:36:46,068][24347] Updated weights for policy 0, policy_version 36778 (0.0025) [2024-06-06 15:36:47,318][24114] Fps is (10 sec: 44236.5, 60 sec: 44236.7, 300 sec: 44431.2). Total num frames: 602587136. Throughput: 0: 44521.4. Samples: 83799260. Policy #0 lag: (min: 1.0, avg: 11.1, max: 22.0) [2024-06-06 15:36:47,319][24114] Avg episode reward: [(0, '0.222')] [2024-06-06 15:36:50,432][24347] Updated weights for policy 0, policy_version 36788 (0.0025) [2024-06-06 15:36:52,318][24114] Fps is (10 sec: 47513.2, 60 sec: 44509.8, 300 sec: 44486.7). Total num frames: 602816512. Throughput: 0: 44556.8. Samples: 84075880. Policy #0 lag: (min: 1.0, avg: 11.1, max: 22.0) [2024-06-06 15:36:52,318][24114] Avg episode reward: [(0, '0.228')] [2024-06-06 15:36:53,519][24347] Updated weights for policy 0, policy_version 36798 (0.0028) [2024-06-06 15:36:57,318][24114] Fps is (10 sec: 42598.1, 60 sec: 44509.8, 300 sec: 44375.6). Total num frames: 603013120. Throughput: 0: 44714.0. Samples: 84344000. Policy #0 lag: (min: 1.0, avg: 11.8, max: 23.0) [2024-06-06 15:36:57,319][24114] Avg episode reward: [(0, '0.223')] [2024-06-06 15:36:57,794][24347] Updated weights for policy 0, policy_version 36808 (0.0048) [2024-06-06 15:37:00,797][24347] Updated weights for policy 0, policy_version 36818 (0.0032) [2024-06-06 15:37:02,318][24114] Fps is (10 sec: 44236.5, 60 sec: 44241.2, 300 sec: 44431.2). Total num frames: 603258880. Throughput: 0: 44794.3. Samples: 84477320. Policy #0 lag: (min: 1.0, avg: 11.8, max: 23.0) [2024-06-06 15:37:02,319][24114] Avg episode reward: [(0, '0.224')] [2024-06-06 15:37:03,167][24326] Signal inference workers to stop experience collection... 
(1150 times) [2024-06-06 15:37:03,220][24347] InferenceWorker_p0-w0: stopping experience collection (1150 times) [2024-06-06 15:37:03,225][24326] Signal inference workers to resume experience collection... (1150 times) [2024-06-06 15:37:03,235][24347] InferenceWorker_p0-w0: resuming experience collection (1150 times) [2024-06-06 15:37:05,108][24347] Updated weights for policy 0, policy_version 36828 (0.0032) [2024-06-06 15:37:07,318][24114] Fps is (10 sec: 47514.4, 60 sec: 44784.5, 300 sec: 44486.7). Total num frames: 603488256. Throughput: 0: 44721.8. Samples: 84742160. Policy #0 lag: (min: 1.0, avg: 11.8, max: 23.0) [2024-06-06 15:37:07,318][24114] Avg episode reward: [(0, '0.229')] [2024-06-06 15:37:08,633][24347] Updated weights for policy 0, policy_version 36838 (0.0033) [2024-06-06 15:37:12,318][24114] Fps is (10 sec: 44236.7, 60 sec: 44782.8, 300 sec: 44486.7). Total num frames: 603701248. Throughput: 0: 44589.6. Samples: 85006700. Policy #0 lag: (min: 1.0, avg: 11.8, max: 23.0) [2024-06-06 15:37:12,319][24114] Avg episode reward: [(0, '0.223')] [2024-06-06 15:37:12,543][24347] Updated weights for policy 0, policy_version 36848 (0.0047) [2024-06-06 15:37:15,637][24347] Updated weights for policy 0, policy_version 36858 (0.0039) [2024-06-06 15:37:17,318][24114] Fps is (10 sec: 40960.1, 60 sec: 43963.8, 300 sec: 44320.1). Total num frames: 603897856. Throughput: 0: 44751.3. Samples: 85140080. Policy #0 lag: (min: 1.0, avg: 11.8, max: 23.0) [2024-06-06 15:37:17,318][24114] Avg episode reward: [(0, '0.228')] [2024-06-06 15:37:19,833][24347] Updated weights for policy 0, policy_version 36868 (0.0040) [2024-06-06 15:37:22,318][24114] Fps is (10 sec: 45875.7, 60 sec: 44509.9, 300 sec: 44486.7). Total num frames: 604160000. Throughput: 0: 44531.2. Samples: 85404000. 
Policy #0 lag: (min: 0.0, avg: 9.1, max: 21.0) [2024-06-06 15:37:22,318][24114] Avg episode reward: [(0, '0.227')] [2024-06-06 15:37:23,102][24347] Updated weights for policy 0, policy_version 36878 (0.0033) [2024-06-06 15:37:27,126][24347] Updated weights for policy 0, policy_version 36888 (0.0023) [2024-06-06 15:37:27,318][24114] Fps is (10 sec: 47513.1, 60 sec: 45055.9, 300 sec: 44486.7). Total num frames: 604372992. Throughput: 0: 44551.4. Samples: 85676320. Policy #0 lag: (min: 0.0, avg: 9.1, max: 21.0) [2024-06-06 15:37:27,319][24114] Avg episode reward: [(0, '0.224')] [2024-06-06 15:37:30,457][24347] Updated weights for policy 0, policy_version 36898 (0.0028) [2024-06-06 15:37:32,318][24114] Fps is (10 sec: 42598.0, 60 sec: 44509.9, 300 sec: 44375.8). Total num frames: 604585984. Throughput: 0: 44653.3. Samples: 85808660. Policy #0 lag: (min: 0.0, avg: 9.1, max: 21.0) [2024-06-06 15:37:32,319][24114] Avg episode reward: [(0, '0.224')] [2024-06-06 15:37:34,325][24347] Updated weights for policy 0, policy_version 36908 (0.0037) [2024-06-06 15:37:37,318][24114] Fps is (10 sec: 45875.5, 60 sec: 44783.0, 300 sec: 44542.3). Total num frames: 604831744. Throughput: 0: 44501.3. Samples: 86078440. Policy #0 lag: (min: 0.0, avg: 9.1, max: 21.0) [2024-06-06 15:37:37,318][24114] Avg episode reward: [(0, '0.220')] [2024-06-06 15:37:38,063][24347] Updated weights for policy 0, policy_version 36918 (0.0039) [2024-06-06 15:37:42,068][24347] Updated weights for policy 0, policy_version 36928 (0.0037) [2024-06-06 15:37:42,318][24114] Fps is (10 sec: 45875.0, 60 sec: 45055.8, 300 sec: 44542.2). Total num frames: 605044736. Throughput: 0: 44471.6. Samples: 86345220. Policy #0 lag: (min: 0.0, avg: 10.1, max: 23.0) [2024-06-06 15:37:42,319][24114] Avg episode reward: [(0, '0.222')] [2024-06-06 15:37:45,134][24347] Updated weights for policy 0, policy_version 36938 (0.0036) [2024-06-06 15:37:47,318][24114] Fps is (10 sec: 40959.6, 60 sec: 44236.8, 300 sec: 44375.6). 
Total num frames: 605241344. Throughput: 0: 44394.6. Samples: 86475080. Policy #0 lag: (min: 0.0, avg: 10.1, max: 23.0) [2024-06-06 15:37:47,319][24114] Avg episode reward: [(0, '0.224')] [2024-06-06 15:37:49,215][24347] Updated weights for policy 0, policy_version 36948 (0.0021) [2024-06-06 15:37:52,318][24114] Fps is (10 sec: 44237.5, 60 sec: 44509.9, 300 sec: 44486.7). Total num frames: 605487104. Throughput: 0: 44364.9. Samples: 86738580. Policy #0 lag: (min: 0.0, avg: 10.1, max: 23.0) [2024-06-06 15:37:52,319][24114] Avg episode reward: [(0, '0.232')] [2024-06-06 15:37:52,696][24347] Updated weights for policy 0, policy_version 36958 (0.0031) [2024-06-06 15:37:56,374][24347] Updated weights for policy 0, policy_version 36968 (0.0026) [2024-06-06 15:37:57,318][24114] Fps is (10 sec: 45875.5, 60 sec: 44783.0, 300 sec: 44542.3). Total num frames: 605700096. Throughput: 0: 44491.6. Samples: 87008820. Policy #0 lag: (min: 0.0, avg: 10.1, max: 23.0) [2024-06-06 15:37:57,324][24114] Avg episode reward: [(0, '0.223')] [2024-06-06 15:38:00,310][24347] Updated weights for policy 0, policy_version 36978 (0.0042) [2024-06-06 15:38:02,318][24114] Fps is (10 sec: 42597.9, 60 sec: 44236.8, 300 sec: 44375.7). Total num frames: 605913088. Throughput: 0: 44459.0. Samples: 87140740. Policy #0 lag: (min: 0.0, avg: 10.1, max: 23.0) [2024-06-06 15:38:02,319][24114] Avg episode reward: [(0, '0.217')] [2024-06-06 15:38:04,020][24347] Updated weights for policy 0, policy_version 36988 (0.0028) [2024-06-06 15:38:07,318][24114] Fps is (10 sec: 42598.9, 60 sec: 43963.8, 300 sec: 44375.7). Total num frames: 606126080. Throughput: 0: 44457.9. Samples: 87404600. 
Policy #0 lag: (min: 0.0, avg: 10.5, max: 23.0) [2024-06-06 15:38:07,318][24114] Avg episode reward: [(0, '0.231')] [2024-06-06 15:38:07,842][24347] Updated weights for policy 0, policy_version 36998 (0.0036) [2024-06-06 15:38:11,476][24347] Updated weights for policy 0, policy_version 37008 (0.0029) [2024-06-06 15:38:12,318][24114] Fps is (10 sec: 47513.8, 60 sec: 44783.0, 300 sec: 44653.7). Total num frames: 606388224. Throughput: 0: 44379.5. Samples: 87673400. Policy #0 lag: (min: 0.0, avg: 10.5, max: 23.0) [2024-06-06 15:38:12,319][24114] Avg episode reward: [(0, '0.222')] [2024-06-06 15:38:15,234][24347] Updated weights for policy 0, policy_version 37018 (0.0038) [2024-06-06 15:38:17,320][24114] Fps is (10 sec: 45865.8, 60 sec: 44781.4, 300 sec: 44486.4). Total num frames: 606584832. Throughput: 0: 44417.3. Samples: 87807520. Policy #0 lag: (min: 0.0, avg: 10.5, max: 23.0) [2024-06-06 15:38:17,320][24114] Avg episode reward: [(0, '0.222')] [2024-06-06 15:38:18,584][24347] Updated weights for policy 0, policy_version 37028 (0.0035) [2024-06-06 15:38:20,237][24326] Signal inference workers to stop experience collection... (1200 times) [2024-06-06 15:38:20,237][24326] Signal inference workers to resume experience collection... (1200 times) [2024-06-06 15:38:20,284][24347] InferenceWorker_p0-w0: stopping experience collection (1200 times) [2024-06-06 15:38:20,284][24347] InferenceWorker_p0-w0: resuming experience collection (1200 times) [2024-06-06 15:38:22,318][24114] Fps is (10 sec: 42598.7, 60 sec: 44236.8, 300 sec: 44431.2). Total num frames: 606814208. Throughput: 0: 44184.9. Samples: 88066760. Policy #0 lag: (min: 0.0, avg: 10.5, max: 23.0) [2024-06-06 15:38:22,318][24114] Avg episode reward: [(0, '0.217')] [2024-06-06 15:38:22,361][24326] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000037038_606830592.pth... 
[2024-06-06 15:38:22,361][24347] Updated weights for policy 0, policy_version 37038 (0.0029) [2024-06-06 15:38:22,406][24326] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000036384_596115456.pth [2024-06-06 15:38:25,831][24347] Updated weights for policy 0, policy_version 37048 (0.0032) [2024-06-06 15:38:27,318][24114] Fps is (10 sec: 45884.3, 60 sec: 44509.9, 300 sec: 44598.1). Total num frames: 607043584. Throughput: 0: 44319.3. Samples: 88339580. Policy #0 lag: (min: 0.0, avg: 10.5, max: 23.0) [2024-06-06 15:38:27,318][24114] Avg episode reward: [(0, '0.224')] [2024-06-06 15:38:29,726][24347] Updated weights for policy 0, policy_version 37058 (0.0024) [2024-06-06 15:38:32,318][24114] Fps is (10 sec: 42598.3, 60 sec: 44236.9, 300 sec: 44431.2). Total num frames: 607240192. Throughput: 0: 44338.8. Samples: 88470320. Policy #0 lag: (min: 0.0, avg: 10.0, max: 23.0) [2024-06-06 15:38:32,318][24114] Avg episode reward: [(0, '0.228')] [2024-06-06 15:38:33,368][24347] Updated weights for policy 0, policy_version 37068 (0.0034) [2024-06-06 15:38:37,318][24114] Fps is (10 sec: 42598.5, 60 sec: 43963.8, 300 sec: 44375.7). Total num frames: 607469568. Throughput: 0: 44453.3. Samples: 88738980. Policy #0 lag: (min: 0.0, avg: 10.0, max: 23.0) [2024-06-06 15:38:37,318][24114] Avg episode reward: [(0, '0.223')] [2024-06-06 15:38:37,331][24347] Updated weights for policy 0, policy_version 37078 (0.0029) [2024-06-06 15:38:40,894][24347] Updated weights for policy 0, policy_version 37088 (0.0051) [2024-06-06 15:38:42,318][24114] Fps is (10 sec: 47513.4, 60 sec: 44509.9, 300 sec: 44597.8). Total num frames: 607715328. Throughput: 0: 44336.4. Samples: 89003960. 
Policy #0 lag: (min: 0.0, avg: 10.0, max: 23.0) [2024-06-06 15:38:42,318][24114] Avg episode reward: [(0, '0.226')] [2024-06-06 15:38:44,545][24347] Updated weights for policy 0, policy_version 37098 (0.0041) [2024-06-06 15:38:47,318][24114] Fps is (10 sec: 45875.1, 60 sec: 44783.0, 300 sec: 44542.3). Total num frames: 607928320. Throughput: 0: 44391.7. Samples: 89138360. Policy #0 lag: (min: 0.0, avg: 10.0, max: 23.0) [2024-06-06 15:38:47,318][24114] Avg episode reward: [(0, '0.231')] [2024-06-06 15:38:47,953][24347] Updated weights for policy 0, policy_version 37108 (0.0023) [2024-06-06 15:38:52,127][24347] Updated weights for policy 0, policy_version 37118 (0.0030) [2024-06-06 15:38:52,318][24114] Fps is (10 sec: 42598.5, 60 sec: 44236.8, 300 sec: 44431.2). Total num frames: 608141312. Throughput: 0: 44598.1. Samples: 89411520. Policy #0 lag: (min: 0.0, avg: 10.0, max: 23.0) [2024-06-06 15:38:52,318][24114] Avg episode reward: [(0, '0.230')] [2024-06-06 15:38:55,024][24347] Updated weights for policy 0, policy_version 37128 (0.0036) [2024-06-06 15:38:57,318][24114] Fps is (10 sec: 45875.1, 60 sec: 44782.9, 300 sec: 44597.8). Total num frames: 608387072. Throughput: 0: 44544.9. Samples: 89677920. Policy #0 lag: (min: 0.0, avg: 11.3, max: 21.0) [2024-06-06 15:38:57,318][24114] Avg episode reward: [(0, '0.239')] [2024-06-06 15:38:59,830][24347] Updated weights for policy 0, policy_version 37138 (0.0034) [2024-06-06 15:39:02,318][24114] Fps is (10 sec: 45875.1, 60 sec: 44783.0, 300 sec: 44431.2). Total num frames: 608600064. Throughput: 0: 44593.5. Samples: 89814140. Policy #0 lag: (min: 0.0, avg: 11.3, max: 21.0) [2024-06-06 15:39:02,318][24114] Avg episode reward: [(0, '0.230')] [2024-06-06 15:39:02,638][24347] Updated weights for policy 0, policy_version 37148 (0.0038) [2024-06-06 15:39:07,029][24347] Updated weights for policy 0, policy_version 37158 (0.0054) [2024-06-06 15:39:07,318][24114] Fps is (10 sec: 40960.2, 60 sec: 44509.8, 300 sec: 44431.2). 
Total num frames: 608796672. Throughput: 0: 44669.4. Samples: 90076880. Policy #0 lag: (min: 0.0, avg: 11.3, max: 21.0) [2024-06-06 15:39:07,318][24114] Avg episode reward: [(0, '0.223')] [2024-06-06 15:39:10,262][24347] Updated weights for policy 0, policy_version 37168 (0.0039) [2024-06-06 15:39:12,318][24114] Fps is (10 sec: 44237.2, 60 sec: 44236.9, 300 sec: 44486.7). Total num frames: 609042432. Throughput: 0: 44283.6. Samples: 90332340. Policy #0 lag: (min: 0.0, avg: 11.3, max: 21.0) [2024-06-06 15:39:12,318][24114] Avg episode reward: [(0, '0.230')] [2024-06-06 15:39:14,466][24347] Updated weights for policy 0, policy_version 37178 (0.0037) [2024-06-06 15:39:17,318][24114] Fps is (10 sec: 47512.9, 60 sec: 44784.3, 300 sec: 44597.8). Total num frames: 609271808. Throughput: 0: 44421.2. Samples: 90469280. Policy #0 lag: (min: 0.0, avg: 11.4, max: 23.0) [2024-06-06 15:39:17,320][24114] Avg episode reward: [(0, '0.221')] [2024-06-06 15:39:17,477][24347] Updated weights for policy 0, policy_version 37188 (0.0037) [2024-06-06 15:39:21,781][24347] Updated weights for policy 0, policy_version 37198 (0.0037) [2024-06-06 15:39:22,318][24114] Fps is (10 sec: 44236.2, 60 sec: 44509.8, 300 sec: 44487.6). Total num frames: 609484800. Throughput: 0: 44548.3. Samples: 90743660. Policy #0 lag: (min: 0.0, avg: 11.4, max: 23.0) [2024-06-06 15:39:22,319][24114] Avg episode reward: [(0, '0.227')] [2024-06-06 15:39:24,610][24347] Updated weights for policy 0, policy_version 37208 (0.0022) [2024-06-06 15:39:27,318][24114] Fps is (10 sec: 42599.1, 60 sec: 44236.8, 300 sec: 44431.2). Total num frames: 609697792. Throughput: 0: 44481.4. Samples: 91005620. 
Policy #0 lag: (min: 0.0, avg: 11.4, max: 23.0) [2024-06-06 15:39:27,318][24114] Avg episode reward: [(0, '0.225')] [2024-06-06 15:39:29,346][24347] Updated weights for policy 0, policy_version 37218 (0.0041) [2024-06-06 15:39:32,233][24347] Updated weights for policy 0, policy_version 37228 (0.0029) [2024-06-06 15:39:32,318][24114] Fps is (10 sec: 45875.5, 60 sec: 45056.0, 300 sec: 44542.3). Total num frames: 609943552. Throughput: 0: 44445.3. Samples: 91138400. Policy #0 lag: (min: 0.0, avg: 11.4, max: 23.0) [2024-06-06 15:39:32,318][24114] Avg episode reward: [(0, '0.226')] [2024-06-06 15:39:36,500][24326] Signal inference workers to stop experience collection... (1250 times) [2024-06-06 15:39:36,535][24347] InferenceWorker_p0-w0: stopping experience collection (1250 times) [2024-06-06 15:39:36,555][24326] Signal inference workers to resume experience collection... (1250 times) [2024-06-06 15:39:36,560][24347] InferenceWorker_p0-w0: resuming experience collection (1250 times) [2024-06-06 15:39:36,563][24347] Updated weights for policy 0, policy_version 37238 (0.0034) [2024-06-06 15:39:37,318][24114] Fps is (10 sec: 44236.1, 60 sec: 44509.8, 300 sec: 44486.7). Total num frames: 610140160. Throughput: 0: 44577.7. Samples: 91417520. Policy #0 lag: (min: 0.0, avg: 11.4, max: 23.0) [2024-06-06 15:39:37,319][24114] Avg episode reward: [(0, '0.227')] [2024-06-06 15:39:39,507][24347] Updated weights for policy 0, policy_version 37248 (0.0037) [2024-06-06 15:39:42,318][24114] Fps is (10 sec: 40960.0, 60 sec: 43963.8, 300 sec: 44431.2). Total num frames: 610353152. Throughput: 0: 44372.9. Samples: 91674700. 
Policy #0 lag: (min: 0.0, avg: 12.0, max: 23.0) [2024-06-06 15:39:42,318][24114] Avg episode reward: [(0, '0.226')] [2024-06-06 15:39:43,703][24347] Updated weights for policy 0, policy_version 37258 (0.0040) [2024-06-06 15:39:46,933][24347] Updated weights for policy 0, policy_version 37268 (0.0025) [2024-06-06 15:39:47,318][24114] Fps is (10 sec: 45875.3, 60 sec: 44509.8, 300 sec: 44542.3). Total num frames: 610598912. Throughput: 0: 44320.4. Samples: 91808560. Policy #0 lag: (min: 0.0, avg: 12.0, max: 23.0) [2024-06-06 15:39:47,319][24114] Avg episode reward: [(0, '0.230')] [2024-06-06 15:39:51,327][24347] Updated weights for policy 0, policy_version 37278 (0.0031) [2024-06-06 15:39:52,318][24114] Fps is (10 sec: 45875.0, 60 sec: 44509.8, 300 sec: 44542.3). Total num frames: 610811904. Throughput: 0: 44389.2. Samples: 92074400. Policy #0 lag: (min: 0.0, avg: 12.0, max: 23.0) [2024-06-06 15:39:52,318][24114] Avg episode reward: [(0, '0.236')] [2024-06-06 15:39:54,562][24347] Updated weights for policy 0, policy_version 37288 (0.0037) [2024-06-06 15:39:57,318][24114] Fps is (10 sec: 40960.5, 60 sec: 43690.7, 300 sec: 44375.6). Total num frames: 611008512. Throughput: 0: 44619.1. Samples: 92340200. Policy #0 lag: (min: 0.0, avg: 12.0, max: 23.0) [2024-06-06 15:39:57,318][24114] Avg episode reward: [(0, '0.224')] [2024-06-06 15:39:58,664][24347] Updated weights for policy 0, policy_version 37298 (0.0038) [2024-06-06 15:40:01,676][24347] Updated weights for policy 0, policy_version 37308 (0.0028) [2024-06-06 15:40:02,318][24114] Fps is (10 sec: 45875.5, 60 sec: 44509.9, 300 sec: 44597.8). Total num frames: 611270656. Throughput: 0: 44469.5. Samples: 92470400. Policy #0 lag: (min: 0.0, avg: 12.0, max: 23.0) [2024-06-06 15:40:02,318][24114] Avg episode reward: [(0, '0.232')] [2024-06-06 15:40:06,374][24347] Updated weights for policy 0, policy_version 37318 (0.0038) [2024-06-06 15:40:07,318][24114] Fps is (10 sec: 45875.2, 60 sec: 44509.9, 300 sec: 44487.6). 
Total num frames: 611467264. Throughput: 0: 44409.4. Samples: 92742080. Policy #0 lag: (min: 0.0, avg: 8.6, max: 21.0) [2024-06-06 15:40:07,318][24114] Avg episode reward: [(0, '0.240')] [2024-06-06 15:40:09,244][24347] Updated weights for policy 0, policy_version 37328 (0.0030) [2024-06-06 15:40:12,318][24114] Fps is (10 sec: 39321.7, 60 sec: 43690.7, 300 sec: 44264.9). Total num frames: 611663872. Throughput: 0: 44439.5. Samples: 93005400. Policy #0 lag: (min: 0.0, avg: 8.6, max: 21.0) [2024-06-06 15:40:12,318][24114] Avg episode reward: [(0, '0.227')] [2024-06-06 15:40:13,480][24347] Updated weights for policy 0, policy_version 37338 (0.0030) [2024-06-06 15:40:16,974][24347] Updated weights for policy 0, policy_version 37348 (0.0031) [2024-06-06 15:40:17,318][24114] Fps is (10 sec: 47513.3, 60 sec: 44509.9, 300 sec: 44542.3). Total num frames: 611942400. Throughput: 0: 44335.1. Samples: 93133480. Policy #0 lag: (min: 0.0, avg: 8.6, max: 21.0) [2024-06-06 15:40:17,318][24114] Avg episode reward: [(0, '0.237')] [2024-06-06 15:40:20,763][24347] Updated weights for policy 0, policy_version 37358 (0.0027) [2024-06-06 15:40:22,318][24114] Fps is (10 sec: 47513.5, 60 sec: 44236.9, 300 sec: 44486.7). Total num frames: 612139008. Throughput: 0: 44207.2. Samples: 93406840. Policy #0 lag: (min: 0.0, avg: 8.6, max: 21.0) [2024-06-06 15:40:22,318][24114] Avg episode reward: [(0, '0.225')] [2024-06-06 15:40:22,330][24326] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000037363_612155392.pth... [2024-06-06 15:40:22,431][24326] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000036713_601505792.pth [2024-06-06 15:40:24,286][24347] Updated weights for policy 0, policy_version 37368 (0.0033) [2024-06-06 15:40:27,318][24114] Fps is (10 sec: 39321.5, 60 sec: 43963.7, 300 sec: 44320.4). Total num frames: 612335616. Throughput: 0: 44429.7. Samples: 93674040. 
Policy #0 lag: (min: 0.0, avg: 8.6, max: 21.0) [2024-06-06 15:40:27,318][24114] Avg episode reward: [(0, '0.231')] [2024-06-06 15:40:28,207][24347] Updated weights for policy 0, policy_version 37378 (0.0046) [2024-06-06 15:40:31,426][24347] Updated weights for policy 0, policy_version 37388 (0.0031) [2024-06-06 15:40:32,318][24114] Fps is (10 sec: 45875.3, 60 sec: 44236.8, 300 sec: 44486.7). Total num frames: 612597760. Throughput: 0: 44217.0. Samples: 93798320. Policy #0 lag: (min: 2.0, avg: 11.9, max: 23.0) [2024-06-06 15:40:32,318][24114] Avg episode reward: [(0, '0.228')] [2024-06-06 15:40:35,846][24347] Updated weights for policy 0, policy_version 37398 (0.0036) [2024-06-06 15:40:37,318][24114] Fps is (10 sec: 45875.3, 60 sec: 44236.9, 300 sec: 44486.7). Total num frames: 612794368. Throughput: 0: 44201.8. Samples: 94063480. Policy #0 lag: (min: 2.0, avg: 11.9, max: 23.0) [2024-06-06 15:40:37,318][24114] Avg episode reward: [(0, '0.230')] [2024-06-06 15:40:38,954][24347] Updated weights for policy 0, policy_version 37408 (0.0026) [2024-06-06 15:40:42,318][24114] Fps is (10 sec: 39321.6, 60 sec: 43963.8, 300 sec: 44264.6). Total num frames: 612990976. Throughput: 0: 44376.4. Samples: 94337140. Policy #0 lag: (min: 2.0, avg: 11.9, max: 23.0) [2024-06-06 15:40:42,318][24114] Avg episode reward: [(0, '0.231')] [2024-06-06 15:40:43,103][24347] Updated weights for policy 0, policy_version 37418 (0.0034) [2024-06-06 15:40:44,223][24326] Signal inference workers to stop experience collection... (1300 times) [2024-06-06 15:40:44,267][24347] InferenceWorker_p0-w0: stopping experience collection (1300 times) [2024-06-06 15:40:44,274][24326] Signal inference workers to resume experience collection... 
(1300 times) [2024-06-06 15:40:44,280][24347] InferenceWorker_p0-w0: resuming experience collection (1300 times) [2024-06-06 15:40:46,679][24347] Updated weights for policy 0, policy_version 37428 (0.0028) [2024-06-06 15:40:47,318][24114] Fps is (10 sec: 47513.4, 60 sec: 44509.9, 300 sec: 44486.7). Total num frames: 613269504. Throughput: 0: 44267.5. Samples: 94462440. Policy #0 lag: (min: 2.0, avg: 11.9, max: 23.0) [2024-06-06 15:40:47,319][24114] Avg episode reward: [(0, '0.222')] [2024-06-06 15:40:50,441][24347] Updated weights for policy 0, policy_version 37438 (0.0039) [2024-06-06 15:40:52,318][24114] Fps is (10 sec: 47513.0, 60 sec: 44236.8, 300 sec: 44486.7). Total num frames: 613466112. Throughput: 0: 44290.1. Samples: 94735140. Policy #0 lag: (min: 2.0, avg: 11.9, max: 23.0) [2024-06-06 15:40:52,319][24114] Avg episode reward: [(0, '0.238')] [2024-06-06 15:40:53,826][24347] Updated weights for policy 0, policy_version 37448 (0.0033) [2024-06-06 15:40:57,320][24114] Fps is (10 sec: 39314.1, 60 sec: 44235.3, 300 sec: 44265.2). Total num frames: 613662720. Throughput: 0: 44211.8. Samples: 94995020. Policy #0 lag: (min: 0.0, avg: 11.6, max: 23.0) [2024-06-06 15:40:57,321][24114] Avg episode reward: [(0, '0.222')] [2024-06-06 15:40:57,982][24347] Updated weights for policy 0, policy_version 37458 (0.0029) [2024-06-06 15:41:00,914][24347] Updated weights for policy 0, policy_version 37468 (0.0030) [2024-06-06 15:41:02,318][24114] Fps is (10 sec: 45875.3, 60 sec: 44236.7, 300 sec: 44487.0). Total num frames: 613924864. Throughput: 0: 44231.0. Samples: 95123880. Policy #0 lag: (min: 0.0, avg: 11.6, max: 23.0) [2024-06-06 15:41:02,319][24114] Avg episode reward: [(0, '0.233')] [2024-06-06 15:41:05,402][24347] Updated weights for policy 0, policy_version 37478 (0.0031) [2024-06-06 15:41:07,318][24114] Fps is (10 sec: 47522.6, 60 sec: 44509.8, 300 sec: 44486.7). Total num frames: 614137856. Throughput: 0: 44071.5. Samples: 95390060. 
Policy #0 lag: (min: 0.0, avg: 11.6, max: 23.0) [2024-06-06 15:41:07,319][24114] Avg episode reward: [(0, '0.235')] [2024-06-06 15:41:08,638][24347] Updated weights for policy 0, policy_version 37488 (0.0031) [2024-06-06 15:41:12,318][24114] Fps is (10 sec: 40959.9, 60 sec: 44509.8, 300 sec: 44320.1). Total num frames: 614334464. Throughput: 0: 44208.4. Samples: 95663420. Policy #0 lag: (min: 0.0, avg: 11.6, max: 23.0) [2024-06-06 15:41:12,319][24114] Avg episode reward: [(0, '0.229')] [2024-06-06 15:41:12,860][24347] Updated weights for policy 0, policy_version 37498 (0.0038) [2024-06-06 15:41:15,799][24347] Updated weights for policy 0, policy_version 37508 (0.0026) [2024-06-06 15:41:17,318][24114] Fps is (10 sec: 45875.5, 60 sec: 44236.8, 300 sec: 44431.2). Total num frames: 614596608. Throughput: 0: 44293.8. Samples: 95791540. Policy #0 lag: (min: 0.0, avg: 11.6, max: 23.0) [2024-06-06 15:41:17,318][24114] Avg episode reward: [(0, '0.228')] [2024-06-06 15:41:20,048][24347] Updated weights for policy 0, policy_version 37518 (0.0030) [2024-06-06 15:41:22,318][24114] Fps is (10 sec: 45875.8, 60 sec: 44236.8, 300 sec: 44486.7). Total num frames: 614793216. Throughput: 0: 44403.6. Samples: 96061640. Policy #0 lag: (min: 0.0, avg: 10.4, max: 21.0) [2024-06-06 15:41:22,318][24114] Avg episode reward: [(0, '0.231')] [2024-06-06 15:41:23,242][24347] Updated weights for policy 0, policy_version 37528 (0.0032) [2024-06-06 15:41:27,198][24347] Updated weights for policy 0, policy_version 37538 (0.0033) [2024-06-06 15:41:27,318][24114] Fps is (10 sec: 42598.1, 60 sec: 44782.9, 300 sec: 44431.2). Total num frames: 615022592. Throughput: 0: 44339.0. Samples: 96332400. Policy #0 lag: (min: 0.0, avg: 10.4, max: 21.0) [2024-06-06 15:41:27,318][24114] Avg episode reward: [(0, '0.230')] [2024-06-06 15:41:30,329][24347] Updated weights for policy 0, policy_version 37548 (0.0033) [2024-06-06 15:41:32,318][24114] Fps is (10 sec: 47513.5, 60 sec: 44509.8, 300 sec: 44486.7). 
Total num frames: 615268352. Throughput: 0: 44349.9. Samples: 96458180. Policy #0 lag: (min: 0.0, avg: 10.4, max: 21.0) [2024-06-06 15:41:32,318][24114] Avg episode reward: [(0, '0.229')] [2024-06-06 15:41:34,751][24347] Updated weights for policy 0, policy_version 37558 (0.0036) [2024-06-06 15:41:37,318][24114] Fps is (10 sec: 44236.7, 60 sec: 44509.8, 300 sec: 44486.7). Total num frames: 615464960. Throughput: 0: 44053.8. Samples: 96717560. Policy #0 lag: (min: 0.0, avg: 10.4, max: 21.0) [2024-06-06 15:41:37,319][24114] Avg episode reward: [(0, '0.225')] [2024-06-06 15:41:38,032][24347] Updated weights for policy 0, policy_version 37568 (0.0024) [2024-06-06 15:41:42,318][24114] Fps is (10 sec: 39321.7, 60 sec: 44509.9, 300 sec: 44320.1). Total num frames: 615661568. Throughput: 0: 44287.7. Samples: 96987880. Policy #0 lag: (min: 0.0, avg: 10.4, max: 21.0) [2024-06-06 15:41:42,318][24114] Avg episode reward: [(0, '0.233')] [2024-06-06 15:41:42,379][24347] Updated weights for policy 0, policy_version 37578 (0.0025) [2024-06-06 15:41:45,498][24347] Updated weights for policy 0, policy_version 37588 (0.0032) [2024-06-06 15:41:47,318][24114] Fps is (10 sec: 45875.8, 60 sec: 44236.9, 300 sec: 44431.2). Total num frames: 615923712. Throughput: 0: 44437.5. Samples: 97123560. Policy #0 lag: (min: 0.0, avg: 9.6, max: 24.0) [2024-06-06 15:41:47,318][24114] Avg episode reward: [(0, '0.228')] [2024-06-06 15:41:49,410][24347] Updated weights for policy 0, policy_version 37598 (0.0029) [2024-06-06 15:41:52,318][24114] Fps is (10 sec: 47513.6, 60 sec: 44510.0, 300 sec: 44486.8). Total num frames: 616136704. Throughput: 0: 44495.2. Samples: 97392340. Policy #0 lag: (min: 0.0, avg: 9.6, max: 24.0) [2024-06-06 15:41:52,318][24114] Avg episode reward: [(0, '0.228')] [2024-06-06 15:41:53,030][24347] Updated weights for policy 0, policy_version 37608 (0.0031) [2024-06-06 15:41:53,672][24326] Signal inference workers to stop experience collection... 
(1350 times) [2024-06-06 15:41:53,676][24326] Signal inference workers to resume experience collection... (1350 times) [2024-06-06 15:41:53,712][24347] InferenceWorker_p0-w0: stopping experience collection (1350 times) [2024-06-06 15:41:53,716][24347] InferenceWorker_p0-w0: resuming experience collection (1350 times) [2024-06-06 15:41:56,788][24347] Updated weights for policy 0, policy_version 37618 (0.0037) [2024-06-06 15:41:57,318][24114] Fps is (10 sec: 42598.4, 60 sec: 44784.5, 300 sec: 44375.7). Total num frames: 616349696. Throughput: 0: 44312.6. Samples: 97657480. Policy #0 lag: (min: 0.0, avg: 9.6, max: 24.0) [2024-06-06 15:41:57,318][24114] Avg episode reward: [(0, '0.228')] [2024-06-06 15:42:00,278][24347] Updated weights for policy 0, policy_version 37628 (0.0029) [2024-06-06 15:42:02,318][24114] Fps is (10 sec: 44236.2, 60 sec: 44236.8, 300 sec: 44375.6). Total num frames: 616579072. Throughput: 0: 44447.9. Samples: 97791700. Policy #0 lag: (min: 0.0, avg: 9.6, max: 24.0) [2024-06-06 15:42:02,327][24114] Avg episode reward: [(0, '0.237')] [2024-06-06 15:42:04,164][24347] Updated weights for policy 0, policy_version 37638 (0.0036) [2024-06-06 15:42:07,318][24114] Fps is (10 sec: 44236.0, 60 sec: 44236.8, 300 sec: 44375.6). Total num frames: 616792064. Throughput: 0: 44186.5. Samples: 98050040. Policy #0 lag: (min: 0.0, avg: 10.7, max: 22.0) [2024-06-06 15:42:07,319][24114] Avg episode reward: [(0, '0.228')] [2024-06-06 15:42:07,952][24347] Updated weights for policy 0, policy_version 37648 (0.0029) [2024-06-06 15:42:11,834][24347] Updated weights for policy 0, policy_version 37658 (0.0046) [2024-06-06 15:42:12,318][24114] Fps is (10 sec: 45875.4, 60 sec: 45056.0, 300 sec: 44542.2). Total num frames: 617037824. Throughput: 0: 44234.2. Samples: 98322940. 
Policy #0 lag: (min: 0.0, avg: 10.7, max: 22.0) [2024-06-06 15:42:12,319][24114] Avg episode reward: [(0, '0.227')] [2024-06-06 15:42:15,613][24347] Updated weights for policy 0, policy_version 37668 (0.0025) [2024-06-06 15:42:17,318][24114] Fps is (10 sec: 44237.3, 60 sec: 43963.7, 300 sec: 44320.1). Total num frames: 617234432. Throughput: 0: 44414.2. Samples: 98456820. Policy #0 lag: (min: 0.0, avg: 10.7, max: 22.0) [2024-06-06 15:42:17,318][24114] Avg episode reward: [(0, '0.230')] [2024-06-06 15:42:18,989][24347] Updated weights for policy 0, policy_version 37678 (0.0049) [2024-06-06 15:42:22,318][24114] Fps is (10 sec: 42598.1, 60 sec: 44509.7, 300 sec: 44375.6). Total num frames: 617463808. Throughput: 0: 44540.8. Samples: 98721900. Policy #0 lag: (min: 0.0, avg: 10.7, max: 22.0) [2024-06-06 15:42:22,318][24114] Avg episode reward: [(0, '0.234')] [2024-06-06 15:42:22,334][24326] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000037687_617463808.pth... [2024-06-06 15:42:22,387][24326] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000037038_606830592.pth [2024-06-06 15:42:22,729][24347] Updated weights for policy 0, policy_version 37688 (0.0028) [2024-06-06 15:42:26,216][24347] Updated weights for policy 0, policy_version 37698 (0.0030) [2024-06-06 15:42:27,318][24114] Fps is (10 sec: 44236.9, 60 sec: 44236.9, 300 sec: 44375.7). Total num frames: 617676800. Throughput: 0: 44403.1. Samples: 98986020. Policy #0 lag: (min: 0.0, avg: 10.7, max: 22.0) [2024-06-06 15:42:27,318][24114] Avg episode reward: [(0, '0.235')] [2024-06-06 15:42:29,824][24347] Updated weights for policy 0, policy_version 37708 (0.0042) [2024-06-06 15:42:32,320][24114] Fps is (10 sec: 42590.7, 60 sec: 43689.2, 300 sec: 44264.3). Total num frames: 617889792. Throughput: 0: 44368.7. Samples: 99120240. 
Policy #0 lag: (min: 0.0, avg: 9.5, max: 21.0) [2024-06-06 15:42:32,320][24114] Avg episode reward: [(0, '0.233')] [2024-06-06 15:42:33,704][24347] Updated weights for policy 0, policy_version 37718 (0.0051) [2024-06-06 15:42:37,318][24114] Fps is (10 sec: 44235.6, 60 sec: 44236.7, 300 sec: 44320.1). Total num frames: 618119168. Throughput: 0: 44186.4. Samples: 99380740. Policy #0 lag: (min: 0.0, avg: 9.5, max: 21.0) [2024-06-06 15:42:37,319][24114] Avg episode reward: [(0, '0.234')] [2024-06-06 15:42:37,442][24347] Updated weights for policy 0, policy_version 37728 (0.0037) [2024-06-06 15:42:41,284][24347] Updated weights for policy 0, policy_version 37738 (0.0045) [2024-06-06 15:42:42,320][24114] Fps is (10 sec: 49151.9, 60 sec: 45327.6, 300 sec: 44542.0). Total num frames: 618381312. Throughput: 0: 44215.3. Samples: 99647260. Policy #0 lag: (min: 0.0, avg: 9.5, max: 21.0) [2024-06-06 15:42:42,320][24114] Avg episode reward: [(0, '0.227')] [2024-06-06 15:42:44,854][24347] Updated weights for policy 0, policy_version 37748 (0.0035) [2024-06-06 15:42:47,318][24114] Fps is (10 sec: 42599.3, 60 sec: 43690.6, 300 sec: 44264.6). Total num frames: 618545152. Throughput: 0: 44338.3. Samples: 99786920. Policy #0 lag: (min: 0.0, avg: 9.5, max: 21.0) [2024-06-06 15:42:47,318][24114] Avg episode reward: [(0, '0.235')] [2024-06-06 15:42:48,602][24347] Updated weights for policy 0, policy_version 37758 (0.0031) [2024-06-06 15:42:52,309][24347] Updated weights for policy 0, policy_version 37768 (0.0029) [2024-06-06 15:42:52,318][24114] Fps is (10 sec: 40968.3, 60 sec: 44236.8, 300 sec: 44375.7). Total num frames: 618790912. Throughput: 0: 44508.6. Samples: 100052920. Policy #0 lag: (min: 0.0, avg: 9.5, max: 21.0) [2024-06-06 15:42:52,318][24114] Avg episode reward: [(0, '0.225')] [2024-06-06 15:42:55,705][24347] Updated weights for policy 0, policy_version 37778 (0.0027) [2024-06-06 15:42:56,308][24326] Signal inference workers to stop experience collection... 
(1400 times) [2024-06-06 15:42:56,309][24326] Signal inference workers to resume experience collection... (1400 times) [2024-06-06 15:42:56,355][24347] InferenceWorker_p0-w0: stopping experience collection (1400 times) [2024-06-06 15:42:56,355][24347] InferenceWorker_p0-w0: resuming experience collection (1400 times) [2024-06-06 15:42:57,318][24114] Fps is (10 sec: 47513.8, 60 sec: 44509.8, 300 sec: 44431.2). Total num frames: 619020288. Throughput: 0: 44253.4. Samples: 100314340. Policy #0 lag: (min: 0.0, avg: 9.1, max: 21.0) [2024-06-06 15:42:57,318][24114] Avg episode reward: [(0, '0.219')] [2024-06-06 15:42:59,480][24347] Updated weights for policy 0, policy_version 37788 (0.0027) [2024-06-06 15:43:02,318][24114] Fps is (10 sec: 42597.5, 60 sec: 43963.7, 300 sec: 44375.6). Total num frames: 619216896. Throughput: 0: 44366.5. Samples: 100453320. Policy #0 lag: (min: 0.0, avg: 9.1, max: 21.0) [2024-06-06 15:43:02,319][24114] Avg episode reward: [(0, '0.218')] [2024-06-06 15:43:03,266][24347] Updated weights for policy 0, policy_version 37798 (0.0034) [2024-06-06 15:43:07,080][24347] Updated weights for policy 0, policy_version 37808 (0.0045) [2024-06-06 15:43:07,318][24114] Fps is (10 sec: 42597.7, 60 sec: 44236.8, 300 sec: 44264.6). Total num frames: 619446272. Throughput: 0: 44238.2. Samples: 100712620. Policy #0 lag: (min: 0.0, avg: 9.1, max: 21.0) [2024-06-06 15:43:07,319][24114] Avg episode reward: [(0, '0.229')] [2024-06-06 15:43:10,647][24347] Updated weights for policy 0, policy_version 37818 (0.0038) [2024-06-06 15:43:12,318][24114] Fps is (10 sec: 49152.2, 60 sec: 44509.8, 300 sec: 44487.0). Total num frames: 619708416. Throughput: 0: 44141.2. Samples: 100972380. 
Policy #0 lag: (min: 0.0, avg: 9.1, max: 21.0) [2024-06-06 15:43:12,319][24114] Avg episode reward: [(0, '0.230')] [2024-06-06 15:43:14,721][24347] Updated weights for policy 0, policy_version 37828 (0.0040) [2024-06-06 15:43:17,318][24114] Fps is (10 sec: 42598.4, 60 sec: 43963.6, 300 sec: 44264.5). Total num frames: 619872256. Throughput: 0: 44310.2. Samples: 101114120. Policy #0 lag: (min: 0.0, avg: 9.1, max: 21.0) [2024-06-06 15:43:17,319][24114] Avg episode reward: [(0, '0.226')] [2024-06-06 15:43:18,119][24347] Updated weights for policy 0, policy_version 37838 (0.0033) [2024-06-06 15:43:22,034][24347] Updated weights for policy 0, policy_version 37848 (0.0036) [2024-06-06 15:43:22,318][24114] Fps is (10 sec: 39321.4, 60 sec: 43963.7, 300 sec: 44264.5). Total num frames: 620101632. Throughput: 0: 44419.6. Samples: 101379620. Policy #0 lag: (min: 1.0, avg: 12.1, max: 23.0) [2024-06-06 15:43:22,319][24114] Avg episode reward: [(0, '0.234')] [2024-06-06 15:43:25,653][24347] Updated weights for policy 0, policy_version 37858 (0.0026) [2024-06-06 15:43:27,318][24114] Fps is (10 sec: 45875.9, 60 sec: 44236.8, 300 sec: 44375.7). Total num frames: 620331008. Throughput: 0: 44193.5. Samples: 101635880. Policy #0 lag: (min: 1.0, avg: 12.1, max: 23.0) [2024-06-06 15:43:27,319][24114] Avg episode reward: [(0, '0.232')] [2024-06-06 15:43:29,631][24347] Updated weights for policy 0, policy_version 37868 (0.0022) [2024-06-06 15:43:32,318][24114] Fps is (10 sec: 44237.6, 60 sec: 44238.2, 300 sec: 44320.1). Total num frames: 620544000. Throughput: 0: 44104.0. Samples: 101771600. Policy #0 lag: (min: 1.0, avg: 12.1, max: 23.0) [2024-06-06 15:43:32,318][24114] Avg episode reward: [(0, '0.236')] [2024-06-06 15:43:32,902][24347] Updated weights for policy 0, policy_version 37878 (0.0037) [2024-06-06 15:43:37,263][24347] Updated weights for policy 0, policy_version 37888 (0.0026) [2024-06-06 15:43:37,318][24114] Fps is (10 sec: 42598.7, 60 sec: 43964.0, 300 sec: 44209.1). 
Total num frames: 620756992. Throughput: 0: 44136.0. Samples: 102039040. Policy #0 lag: (min: 1.0, avg: 12.1, max: 23.0) [2024-06-06 15:43:37,318][24114] Avg episode reward: [(0, '0.230')] [2024-06-06 15:43:40,220][24347] Updated weights for policy 0, policy_version 37898 (0.0030) [2024-06-06 15:43:42,318][24114] Fps is (10 sec: 47513.8, 60 sec: 43965.2, 300 sec: 44375.7). Total num frames: 621019136. Throughput: 0: 44032.0. Samples: 102295780. Policy #0 lag: (min: 1.0, avg: 12.1, max: 23.0) [2024-06-06 15:43:42,318][24114] Avg episode reward: [(0, '0.234')] [2024-06-06 15:43:44,786][24347] Updated weights for policy 0, policy_version 37908 (0.0035) [2024-06-06 15:43:47,318][24114] Fps is (10 sec: 45874.4, 60 sec: 44509.8, 300 sec: 44320.1). Total num frames: 621215744. Throughput: 0: 44103.2. Samples: 102437960. Policy #0 lag: (min: 0.0, avg: 10.2, max: 20.0) [2024-06-06 15:43:47,319][24114] Avg episode reward: [(0, '0.231')] [2024-06-06 15:43:47,619][24347] Updated weights for policy 0, policy_version 37918 (0.0029) [2024-06-06 15:43:52,030][24347] Updated weights for policy 0, policy_version 37928 (0.0037) [2024-06-06 15:43:52,318][24114] Fps is (10 sec: 39321.1, 60 sec: 43690.6, 300 sec: 44153.5). Total num frames: 621412352. Throughput: 0: 44184.5. Samples: 102700920. Policy #0 lag: (min: 0.0, avg: 10.2, max: 20.0) [2024-06-06 15:43:52,320][24114] Avg episode reward: [(0, '0.229')] [2024-06-06 15:43:55,057][24347] Updated weights for policy 0, policy_version 37938 (0.0026) [2024-06-06 15:43:57,318][24114] Fps is (10 sec: 45875.2, 60 sec: 44236.7, 300 sec: 44320.1). Total num frames: 621674496. Throughput: 0: 44369.8. Samples: 102969020. 
Policy #0 lag: (min: 0.0, avg: 10.2, max: 20.0) [2024-06-06 15:43:57,319][24114] Avg episode reward: [(0, '0.234')] [2024-06-06 15:43:59,449][24347] Updated weights for policy 0, policy_version 37948 (0.0032) [2024-06-06 15:44:02,285][24347] Updated weights for policy 0, policy_version 37958 (0.0036) [2024-06-06 15:44:02,318][24114] Fps is (10 sec: 49152.0, 60 sec: 44783.0, 300 sec: 44431.2). Total num frames: 621903872. Throughput: 0: 44192.9. Samples: 103102800. Policy #0 lag: (min: 0.0, avg: 10.2, max: 20.0) [2024-06-06 15:44:02,319][24114] Avg episode reward: [(0, '0.233')] [2024-06-06 15:44:06,922][24347] Updated weights for policy 0, policy_version 37968 (0.0029) [2024-06-06 15:44:07,318][24114] Fps is (10 sec: 39322.2, 60 sec: 43690.8, 300 sec: 44153.5). Total num frames: 622067712. Throughput: 0: 44345.1. Samples: 103375140. Policy #0 lag: (min: 0.0, avg: 10.2, max: 20.0) [2024-06-06 15:44:07,318][24114] Avg episode reward: [(0, '0.236')] [2024-06-06 15:44:09,634][24347] Updated weights for policy 0, policy_version 37978 (0.0033) [2024-06-06 15:44:12,318][24114] Fps is (10 sec: 42598.0, 60 sec: 43690.6, 300 sec: 44264.6). Total num frames: 622329856. Throughput: 0: 44427.8. Samples: 103635140. Policy #0 lag: (min: 2.0, avg: 8.7, max: 21.0) [2024-06-06 15:44:12,318][24114] Avg episode reward: [(0, '0.234')] [2024-06-06 15:44:14,166][24347] Updated weights for policy 0, policy_version 37988 (0.0035) [2024-06-06 15:44:16,958][24347] Updated weights for policy 0, policy_version 37998 (0.0037) [2024-06-06 15:44:17,318][24114] Fps is (10 sec: 50789.8, 60 sec: 45056.1, 300 sec: 44375.7). Total num frames: 622575616. Throughput: 0: 44460.0. Samples: 103772300. Policy #0 lag: (min: 2.0, avg: 8.7, max: 21.0) [2024-06-06 15:44:17,318][24114] Avg episode reward: [(0, '0.229')] [2024-06-06 15:44:21,416][24347] Updated weights for policy 0, policy_version 38008 (0.0034) [2024-06-06 15:44:22,318][24114] Fps is (10 sec: 42599.3, 60 sec: 44236.9, 300 sec: 44264.6). 
Total num frames: 622755840. Throughput: 0: 44360.8. Samples: 104035280. Policy #0 lag: (min: 2.0, avg: 8.7, max: 21.0) [2024-06-06 15:44:22,318][24114] Avg episode reward: [(0, '0.227')] [2024-06-06 15:44:22,337][24326] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000038010_622755840.pth... [2024-06-06 15:44:22,403][24326] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000037363_612155392.pth [2024-06-06 15:44:22,901][24326] Signal inference workers to stop experience collection... (1450 times) [2024-06-06 15:44:22,932][24347] InferenceWorker_p0-w0: stopping experience collection (1450 times) [2024-06-06 15:44:22,963][24326] Signal inference workers to resume experience collection... (1450 times) [2024-06-06 15:44:22,967][24347] InferenceWorker_p0-w0: resuming experience collection (1450 times) [2024-06-06 15:44:24,399][24347] Updated weights for policy 0, policy_version 38018 (0.0037) [2024-06-06 15:44:27,318][24114] Fps is (10 sec: 40960.0, 60 sec: 44236.8, 300 sec: 44209.0). Total num frames: 622985216. Throughput: 0: 44597.3. Samples: 104302660. Policy #0 lag: (min: 2.0, avg: 8.7, max: 21.0) [2024-06-06 15:44:27,319][24114] Avg episode reward: [(0, '0.234')] [2024-06-06 15:44:28,940][24347] Updated weights for policy 0, policy_version 38028 (0.0029) [2024-06-06 15:44:31,763][24347] Updated weights for policy 0, policy_version 38038 (0.0051) [2024-06-06 15:44:32,318][24114] Fps is (10 sec: 49151.9, 60 sec: 45056.0, 300 sec: 44431.2). Total num frames: 623247360. Throughput: 0: 44383.2. Samples: 104435200. Policy #0 lag: (min: 0.0, avg: 12.1, max: 22.0) [2024-06-06 15:44:32,318][24114] Avg episode reward: [(0, '0.230')] [2024-06-06 15:44:36,450][24347] Updated weights for policy 0, policy_version 38048 (0.0022) [2024-06-06 15:44:37,318][24114] Fps is (10 sec: 44237.1, 60 sec: 44509.8, 300 sec: 44320.1). Total num frames: 623427584. Throughput: 0: 44305.0. Samples: 104694640. 
Policy #0 lag: (min: 0.0, avg: 12.1, max: 22.0) [2024-06-06 15:44:37,318][24114] Avg episode reward: [(0, '0.227')] [2024-06-06 15:44:39,254][24347] Updated weights for policy 0, policy_version 38058 (0.0033) [2024-06-06 15:44:42,318][24114] Fps is (10 sec: 39321.7, 60 sec: 43690.7, 300 sec: 44209.0). Total num frames: 623640576. Throughput: 0: 44373.4. Samples: 104965820. Policy #0 lag: (min: 0.0, avg: 12.1, max: 22.0) [2024-06-06 15:44:42,318][24114] Avg episode reward: [(0, '0.230')] [2024-06-06 15:44:43,499][24347] Updated weights for policy 0, policy_version 38068 (0.0038) [2024-06-06 15:44:46,480][24347] Updated weights for policy 0, policy_version 38078 (0.0037) [2024-06-06 15:44:47,318][24114] Fps is (10 sec: 47513.5, 60 sec: 44783.0, 300 sec: 44375.7). Total num frames: 623902720. Throughput: 0: 44365.9. Samples: 105099260. Policy #0 lag: (min: 0.0, avg: 12.1, max: 22.0) [2024-06-06 15:44:47,318][24114] Avg episode reward: [(0, '0.225')] [2024-06-06 15:44:51,105][24347] Updated weights for policy 0, policy_version 38088 (0.0028) [2024-06-06 15:44:52,318][24114] Fps is (10 sec: 45874.8, 60 sec: 44782.9, 300 sec: 44375.6). Total num frames: 624099328. Throughput: 0: 44272.3. Samples: 105367400. Policy #0 lag: (min: 0.0, avg: 12.1, max: 22.0) [2024-06-06 15:44:52,318][24114] Avg episode reward: [(0, '0.230')] [2024-06-06 15:44:53,888][24347] Updated weights for policy 0, policy_version 38098 (0.0026) [2024-06-06 15:44:57,318][24114] Fps is (10 sec: 39321.3, 60 sec: 43690.7, 300 sec: 44153.5). Total num frames: 624295936. Throughput: 0: 44282.8. Samples: 105627860. Policy #0 lag: (min: 0.0, avg: 8.5, max: 20.0) [2024-06-06 15:44:57,318][24114] Avg episode reward: [(0, '0.230')] [2024-06-06 15:44:58,621][24347] Updated weights for policy 0, policy_version 38108 (0.0024) [2024-06-06 15:45:01,337][24347] Updated weights for policy 0, policy_version 38118 (0.0026) [2024-06-06 15:45:02,318][24114] Fps is (10 sec: 47513.9, 60 sec: 44509.9, 300 sec: 44431.2). 
Total num frames: 624574464. Throughput: 0: 44167.6. Samples: 105759840. Policy #0 lag: (min: 0.0, avg: 8.5, max: 20.0) [2024-06-06 15:45:02,318][24114] Avg episode reward: [(0, '0.233')] [2024-06-06 15:45:05,939][24347] Updated weights for policy 0, policy_version 38128 (0.0035) [2024-06-06 15:45:07,318][24114] Fps is (10 sec: 45875.1, 60 sec: 44782.8, 300 sec: 44375.6). Total num frames: 624754688. Throughput: 0: 44307.0. Samples: 106029100. Policy #0 lag: (min: 0.0, avg: 8.5, max: 20.0) [2024-06-06 15:45:07,318][24114] Avg episode reward: [(0, '0.227')] [2024-06-06 15:45:08,823][24347] Updated weights for policy 0, policy_version 38138 (0.0042) [2024-06-06 15:45:12,318][24114] Fps is (10 sec: 39321.7, 60 sec: 43963.9, 300 sec: 44153.5). Total num frames: 624967680. Throughput: 0: 44354.7. Samples: 106298620. Policy #0 lag: (min: 0.0, avg: 8.5, max: 20.0) [2024-06-06 15:45:12,318][24114] Avg episode reward: [(0, '0.234')] [2024-06-06 15:45:12,989][24347] Updated weights for policy 0, policy_version 38148 (0.0040) [2024-06-06 15:45:15,945][24347] Updated weights for policy 0, policy_version 38158 (0.0028) [2024-06-06 15:45:17,318][24114] Fps is (10 sec: 47513.4, 60 sec: 44236.7, 300 sec: 44375.6). Total num frames: 625229824. Throughput: 0: 44293.2. Samples: 106428400. Policy #0 lag: (min: 0.0, avg: 8.5, max: 20.0) [2024-06-06 15:45:17,319][24114] Avg episode reward: [(0, '0.229')] [2024-06-06 15:45:20,576][24347] Updated weights for policy 0, policy_version 38168 (0.0035) [2024-06-06 15:45:22,318][24114] Fps is (10 sec: 45875.2, 60 sec: 44509.9, 300 sec: 44375.7). Total num frames: 625426432. Throughput: 0: 44435.5. Samples: 106694240. Policy #0 lag: (min: 0.0, avg: 10.3, max: 22.0) [2024-06-06 15:45:22,318][24114] Avg episode reward: [(0, '0.238')] [2024-06-06 15:45:23,456][24347] Updated weights for policy 0, policy_version 38178 (0.0045) [2024-06-06 15:45:27,318][24114] Fps is (10 sec: 39322.0, 60 sec: 43963.8, 300 sec: 44153.5). Total num frames: 625623040. 
Throughput: 0: 44212.9. Samples: 106955400. Policy #0 lag: (min: 0.0, avg: 10.3, max: 22.0) [2024-06-06 15:45:27,318][24114] Avg episode reward: [(0, '0.245')] [2024-06-06 15:45:28,126][24347] Updated weights for policy 0, policy_version 38188 (0.0044) [2024-06-06 15:45:31,021][24347] Updated weights for policy 0, policy_version 38198 (0.0031) [2024-06-06 15:45:32,318][24114] Fps is (10 sec: 47513.1, 60 sec: 44236.7, 300 sec: 44431.2). Total num frames: 625901568. Throughput: 0: 44169.2. Samples: 107086880. Policy #0 lag: (min: 0.0, avg: 10.3, max: 22.0) [2024-06-06 15:45:32,319][24114] Avg episode reward: [(0, '0.238')] [2024-06-06 15:45:35,471][24347] Updated weights for policy 0, policy_version 38208 (0.0023) [2024-06-06 15:45:37,318][24114] Fps is (10 sec: 47513.8, 60 sec: 44509.9, 300 sec: 44431.2). Total num frames: 626098176. Throughput: 0: 44324.1. Samples: 107361980. Policy #0 lag: (min: 0.0, avg: 10.3, max: 22.0) [2024-06-06 15:45:37,318][24114] Avg episode reward: [(0, '0.226')] [2024-06-06 15:45:38,008][24347] Updated weights for policy 0, policy_version 38218 (0.0037) [2024-06-06 15:45:42,318][24114] Fps is (10 sec: 40959.7, 60 sec: 44509.7, 300 sec: 44209.0). Total num frames: 626311168. Throughput: 0: 44404.8. Samples: 107626080. Policy #0 lag: (min: 0.0, avg: 10.3, max: 22.0) [2024-06-06 15:45:42,319][24114] Avg episode reward: [(0, '0.238')] [2024-06-06 15:45:42,664][24347] Updated weights for policy 0, policy_version 38228 (0.0029) [2024-06-06 15:45:45,084][24326] Signal inference workers to stop experience collection... (1500 times) [2024-06-06 15:45:45,131][24347] InferenceWorker_p0-w0: stopping experience collection (1500 times) [2024-06-06 15:45:45,136][24326] Signal inference workers to resume experience collection... 
(1500 times) [2024-06-06 15:45:45,146][24347] InferenceWorker_p0-w0: resuming experience collection (1500 times) [2024-06-06 15:45:45,268][24347] Updated weights for policy 0, policy_version 38238 (0.0036) [2024-06-06 15:45:47,318][24114] Fps is (10 sec: 44236.5, 60 sec: 43963.7, 300 sec: 44320.1). Total num frames: 626540544. Throughput: 0: 44396.0. Samples: 107757660. Policy #0 lag: (min: 0.0, avg: 11.9, max: 20.0) [2024-06-06 15:45:47,319][24114] Avg episode reward: [(0, '0.225')] [2024-06-06 15:45:50,311][24347] Updated weights for policy 0, policy_version 38248 (0.0022) [2024-06-06 15:45:52,320][24114] Fps is (10 sec: 47504.7, 60 sec: 44781.5, 300 sec: 44486.7). Total num frames: 626786304. Throughput: 0: 44322.9. Samples: 108023720. Policy #0 lag: (min: 0.0, avg: 11.9, max: 20.0) [2024-06-06 15:45:52,321][24114] Avg episode reward: [(0, '0.245')] [2024-06-06 15:45:52,814][24347] Updated weights for policy 0, policy_version 38258 (0.0039) [2024-06-06 15:45:57,318][24114] Fps is (10 sec: 40959.7, 60 sec: 44236.8, 300 sec: 44153.5). Total num frames: 626950144. Throughput: 0: 44219.4. Samples: 108288500. Policy #0 lag: (min: 0.0, avg: 11.9, max: 20.0) [2024-06-06 15:45:57,318][24114] Avg episode reward: [(0, '0.235')] [2024-06-06 15:45:57,682][24347] Updated weights for policy 0, policy_version 38268 (0.0046) [2024-06-06 15:46:00,326][24347] Updated weights for policy 0, policy_version 38278 (0.0034) [2024-06-06 15:46:02,318][24114] Fps is (10 sec: 42607.3, 60 sec: 43963.8, 300 sec: 44320.1). Total num frames: 627212288. Throughput: 0: 44121.5. Samples: 108413860. Policy #0 lag: (min: 0.0, avg: 11.9, max: 20.0) [2024-06-06 15:46:02,318][24114] Avg episode reward: [(0, '0.228')] [2024-06-06 15:46:04,953][24347] Updated weights for policy 0, policy_version 38288 (0.0035) [2024-06-06 15:46:07,318][24114] Fps is (10 sec: 50791.0, 60 sec: 45056.1, 300 sec: 44486.7). Total num frames: 627458048. Throughput: 0: 44319.6. Samples: 108688620. 
Policy #0 lag: (min: 0.0, avg: 11.9, max: 20.0) [2024-06-06 15:46:07,318][24114] Avg episode reward: [(0, '0.233')] [2024-06-06 15:46:07,424][24347] Updated weights for policy 0, policy_version 38298 (0.0033) [2024-06-06 15:46:12,217][24347] Updated weights for policy 0, policy_version 38308 (0.0031) [2024-06-06 15:46:12,318][24114] Fps is (10 sec: 42597.7, 60 sec: 44509.8, 300 sec: 44209.0). Total num frames: 627638272. Throughput: 0: 44559.0. Samples: 108960560. Policy #0 lag: (min: 0.0, avg: 9.8, max: 21.0) [2024-06-06 15:46:12,319][24114] Avg episode reward: [(0, '0.231')] [2024-06-06 15:46:14,727][24347] Updated weights for policy 0, policy_version 38318 (0.0022) [2024-06-06 15:46:17,324][24114] Fps is (10 sec: 40935.4, 60 sec: 43959.4, 300 sec: 44319.2). Total num frames: 627867648. Throughput: 0: 44347.5. Samples: 109082780. Policy #0 lag: (min: 0.0, avg: 9.8, max: 21.0) [2024-06-06 15:46:17,325][24114] Avg episode reward: [(0, '0.230')] [2024-06-06 15:46:19,698][24347] Updated weights for policy 0, policy_version 38328 (0.0032) [2024-06-06 15:46:22,267][24347] Updated weights for policy 0, policy_version 38338 (0.0037) [2024-06-06 15:46:22,318][24114] Fps is (10 sec: 49151.9, 60 sec: 45055.9, 300 sec: 44431.2). Total num frames: 628129792. Throughput: 0: 44241.2. Samples: 109352840. Policy #0 lag: (min: 0.0, avg: 9.8, max: 21.0) [2024-06-06 15:46:22,318][24114] Avg episode reward: [(0, '0.238')] [2024-06-06 15:46:22,334][24326] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000038338_628129792.pth... [2024-06-06 15:46:22,381][24326] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000037687_617463808.pth [2024-06-06 15:46:26,925][24347] Updated weights for policy 0, policy_version 38348 (0.0030) [2024-06-06 15:46:27,318][24114] Fps is (10 sec: 42623.9, 60 sec: 44509.8, 300 sec: 44153.5). Total num frames: 628293632. Throughput: 0: 44286.4. Samples: 109618960. 
Policy #0 lag: (min: 0.0, avg: 9.8, max: 21.0) [2024-06-06 15:46:27,318][24114] Avg episode reward: [(0, '0.235')] [2024-06-06 15:46:29,925][24347] Updated weights for policy 0, policy_version 38358 (0.0031) [2024-06-06 15:46:32,318][24114] Fps is (10 sec: 39321.4, 60 sec: 43690.6, 300 sec: 44264.6). Total num frames: 628523008. Throughput: 0: 44195.9. Samples: 109746480. Policy #0 lag: (min: 0.0, avg: 9.8, max: 21.0) [2024-06-06 15:46:32,319][24114] Avg episode reward: [(0, '0.237')] [2024-06-06 15:46:34,457][24347] Updated weights for policy 0, policy_version 38368 (0.0035) [2024-06-06 15:46:36,971][24347] Updated weights for policy 0, policy_version 38378 (0.0036) [2024-06-06 15:46:37,318][24114] Fps is (10 sec: 49152.1, 60 sec: 44782.9, 300 sec: 44486.7). Total num frames: 628785152. Throughput: 0: 44274.0. Samples: 110015960. Policy #0 lag: (min: 0.0, avg: 9.7, max: 21.0) [2024-06-06 15:46:37,318][24114] Avg episode reward: [(0, '0.235')] [2024-06-06 15:46:41,505][24347] Updated weights for policy 0, policy_version 38388 (0.0032) [2024-06-06 15:46:42,318][24114] Fps is (10 sec: 44236.8, 60 sec: 44236.8, 300 sec: 44209.0). Total num frames: 628965376. Throughput: 0: 44433.3. Samples: 110288000. Policy #0 lag: (min: 0.0, avg: 9.7, max: 21.0) [2024-06-06 15:46:42,322][24114] Avg episode reward: [(0, '0.230')] [2024-06-06 15:46:44,248][24347] Updated weights for policy 0, policy_version 38398 (0.0033) [2024-06-06 15:46:47,318][24114] Fps is (10 sec: 40959.5, 60 sec: 44236.7, 300 sec: 44264.5). Total num frames: 629194752. Throughput: 0: 44518.0. Samples: 110417180. Policy #0 lag: (min: 0.0, avg: 9.7, max: 21.0) [2024-06-06 15:46:47,318][24114] Avg episode reward: [(0, '0.233')] [2024-06-06 15:46:48,893][24347] Updated weights for policy 0, policy_version 38408 (0.0036) [2024-06-06 15:46:51,660][24347] Updated weights for policy 0, policy_version 38418 (0.0035) [2024-06-06 15:46:52,318][24114] Fps is (10 sec: 49150.7, 60 sec: 44511.1, 300 sec: 44431.1). 
Total num frames: 629456896. Throughput: 0: 44375.1. Samples: 110685520. Policy #0 lag: (min: 0.0, avg: 9.7, max: 21.0) [2024-06-06 15:46:52,319][24114] Avg episode reward: [(0, '0.233')] [2024-06-06 15:46:56,299][24347] Updated weights for policy 0, policy_version 38428 (0.0037) [2024-06-06 15:46:57,318][24114] Fps is (10 sec: 45875.2, 60 sec: 45056.0, 300 sec: 44320.1). Total num frames: 629653504. Throughput: 0: 44294.2. Samples: 110953800. Policy #0 lag: (min: 0.0, avg: 9.7, max: 21.0) [2024-06-06 15:46:57,319][24114] Avg episode reward: [(0, '0.236')] [2024-06-06 15:46:59,158][24347] Updated weights for policy 0, policy_version 38438 (0.0036) [2024-06-06 15:47:02,318][24114] Fps is (10 sec: 39322.6, 60 sec: 43963.6, 300 sec: 44264.6). Total num frames: 629850112. Throughput: 0: 44512.4. Samples: 111085580. Policy #0 lag: (min: 0.0, avg: 11.6, max: 21.0) [2024-06-06 15:47:02,318][24114] Avg episode reward: [(0, '0.231')] [2024-06-06 15:47:03,783][24347] Updated weights for policy 0, policy_version 38448 (0.0027) [2024-06-06 15:47:06,419][24347] Updated weights for policy 0, policy_version 38458 (0.0031) [2024-06-06 15:47:07,318][24114] Fps is (10 sec: 45875.6, 60 sec: 44236.8, 300 sec: 44320.1). Total num frames: 630112256. Throughput: 0: 44453.0. Samples: 111353220. Policy #0 lag: (min: 0.0, avg: 11.6, max: 21.0) [2024-06-06 15:47:07,318][24114] Avg episode reward: [(0, '0.235')] [2024-06-06 15:47:10,907][24347] Updated weights for policy 0, policy_version 38468 (0.0033) [2024-06-06 15:47:11,554][24326] Signal inference workers to stop experience collection... (1550 times) [2024-06-06 15:47:11,608][24347] InferenceWorker_p0-w0: stopping experience collection (1550 times) [2024-06-06 15:47:11,616][24326] Signal inference workers to resume experience collection... 
(1550 times) [2024-06-06 15:47:11,623][24347] InferenceWorker_p0-w0: resuming experience collection (1550 times) [2024-06-06 15:47:12,318][24114] Fps is (10 sec: 49152.4, 60 sec: 45056.0, 300 sec: 44431.2). Total num frames: 630341632. Throughput: 0: 44482.6. Samples: 111620680. Policy #0 lag: (min: 0.0, avg: 11.6, max: 21.0) [2024-06-06 15:47:12,318][24114] Avg episode reward: [(0, '0.234')] [2024-06-06 15:47:13,938][24347] Updated weights for policy 0, policy_version 38478 (0.0024) [2024-06-06 15:47:17,318][24114] Fps is (10 sec: 40959.8, 60 sec: 44241.2, 300 sec: 44264.6). Total num frames: 630521856. Throughput: 0: 44509.4. Samples: 111749400. Policy #0 lag: (min: 0.0, avg: 11.6, max: 21.0) [2024-06-06 15:47:17,319][24114] Avg episode reward: [(0, '0.232')] [2024-06-06 15:47:18,529][24347] Updated weights for policy 0, policy_version 38488 (0.0031) [2024-06-06 15:47:21,475][24347] Updated weights for policy 0, policy_version 38498 (0.0034) [2024-06-06 15:47:22,324][24114] Fps is (10 sec: 44210.7, 60 sec: 44232.5, 300 sec: 44430.3). Total num frames: 630784000. Throughput: 0: 44485.2. Samples: 112018060. Policy #0 lag: (min: 0.0, avg: 11.6, max: 21.0) [2024-06-06 15:47:22,324][24114] Avg episode reward: [(0, '0.237')] [2024-06-06 15:47:25,726][24347] Updated weights for policy 0, policy_version 38508 (0.0030) [2024-06-06 15:47:27,318][24114] Fps is (10 sec: 45875.7, 60 sec: 44783.0, 300 sec: 44375.9). Total num frames: 630980608. Throughput: 0: 44321.5. Samples: 112282460. Policy #0 lag: (min: 0.0, avg: 8.3, max: 21.0) [2024-06-06 15:47:27,318][24114] Avg episode reward: [(0, '0.227')] [2024-06-06 15:47:28,813][24347] Updated weights for policy 0, policy_version 38518 (0.0041) [2024-06-06 15:47:32,318][24114] Fps is (10 sec: 42624.0, 60 sec: 44783.1, 300 sec: 44375.7). Total num frames: 631209984. Throughput: 0: 44279.7. Samples: 112409760. 
Policy #0 lag: (min: 0.0, avg: 8.3, max: 21.0) [2024-06-06 15:47:32,318][24114] Avg episode reward: [(0, '0.241')] [2024-06-06 15:47:33,373][24347] Updated weights for policy 0, policy_version 38528 (0.0036) [2024-06-06 15:47:36,282][24347] Updated weights for policy 0, policy_version 38538 (0.0027) [2024-06-06 15:47:37,318][24114] Fps is (10 sec: 44236.2, 60 sec: 43963.6, 300 sec: 44209.3). Total num frames: 631422976. Throughput: 0: 44321.2. Samples: 112679960. Policy #0 lag: (min: 0.0, avg: 8.3, max: 21.0) [2024-06-06 15:47:37,319][24114] Avg episode reward: [(0, '0.240')] [2024-06-06 15:47:40,619][24347] Updated weights for policy 0, policy_version 38548 (0.0030) [2024-06-06 15:47:42,318][24114] Fps is (10 sec: 44236.3, 60 sec: 44783.0, 300 sec: 44431.2). Total num frames: 631652352. Throughput: 0: 44244.5. Samples: 112944800. Policy #0 lag: (min: 0.0, avg: 8.3, max: 21.0) [2024-06-06 15:47:42,318][24114] Avg episode reward: [(0, '0.239')] [2024-06-06 15:47:43,790][24347] Updated weights for policy 0, policy_version 38558 (0.0032) [2024-06-06 15:47:47,318][24114] Fps is (10 sec: 45876.3, 60 sec: 44783.1, 300 sec: 44375.7). Total num frames: 631881728. Throughput: 0: 44333.2. Samples: 113080560. Policy #0 lag: (min: 0.0, avg: 8.3, max: 21.0) [2024-06-06 15:47:47,318][24114] Avg episode reward: [(0, '0.240')] [2024-06-06 15:47:48,088][24347] Updated weights for policy 0, policy_version 38568 (0.0035) [2024-06-06 15:47:51,038][24347] Updated weights for policy 0, policy_version 38578 (0.0043) [2024-06-06 15:47:52,318][24114] Fps is (10 sec: 44236.9, 60 sec: 43964.0, 300 sec: 44320.1). Total num frames: 632094720. Throughput: 0: 44298.2. Samples: 113346640. Policy #0 lag: (min: 0.0, avg: 11.5, max: 24.0) [2024-06-06 15:47:52,319][24114] Avg episode reward: [(0, '0.238')] [2024-06-06 15:47:55,218][24347] Updated weights for policy 0, policy_version 38588 (0.0026) [2024-06-06 15:47:57,318][24114] Fps is (10 sec: 44235.8, 60 sec: 44509.9, 300 sec: 44431.2). 
Total num frames: 632324096. Throughput: 0: 44276.9. Samples: 113613140. Policy #0 lag: (min: 0.0, avg: 11.5, max: 24.0) [2024-06-06 15:47:57,319][24114] Avg episode reward: [(0, '0.239')] [2024-06-06 15:47:58,571][24347] Updated weights for policy 0, policy_version 38598 (0.0033) [2024-06-06 15:48:02,318][24114] Fps is (10 sec: 42598.1, 60 sec: 44509.9, 300 sec: 44320.1). Total num frames: 632520704. Throughput: 0: 44440.0. Samples: 113749200. Policy #0 lag: (min: 0.0, avg: 11.5, max: 24.0) [2024-06-06 15:48:02,319][24114] Avg episode reward: [(0, '0.241')] [2024-06-06 15:48:02,652][24347] Updated weights for policy 0, policy_version 38608 (0.0033) [2024-06-06 15:48:05,942][24347] Updated weights for policy 0, policy_version 38618 (0.0031) [2024-06-06 15:48:07,318][24114] Fps is (10 sec: 42598.7, 60 sec: 43963.7, 300 sec: 44209.0). Total num frames: 632750080. Throughput: 0: 44490.3. Samples: 114019860. Policy #0 lag: (min: 0.0, avg: 11.5, max: 24.0) [2024-06-06 15:48:07,318][24114] Avg episode reward: [(0, '0.237')] [2024-06-06 15:48:10,027][24347] Updated weights for policy 0, policy_version 38628 (0.0026) [2024-06-06 15:48:12,318][24114] Fps is (10 sec: 45875.8, 60 sec: 43963.8, 300 sec: 44431.2). Total num frames: 632979456. Throughput: 0: 44530.2. Samples: 114286320. Policy #0 lag: (min: 0.0, avg: 11.5, max: 24.0) [2024-06-06 15:48:12,318][24114] Avg episode reward: [(0, '0.243')] [2024-06-06 15:48:13,105][24347] Updated weights for policy 0, policy_version 38638 (0.0031) [2024-06-06 15:48:17,054][24347] Updated weights for policy 0, policy_version 38648 (0.0024) [2024-06-06 15:48:17,318][24114] Fps is (10 sec: 45875.1, 60 sec: 44783.0, 300 sec: 44431.2). Total num frames: 633208832. Throughput: 0: 44765.7. Samples: 114424220. 
Policy #0 lag: (min: 0.0, avg: 9.4, max: 21.0) [2024-06-06 15:48:17,318][24114] Avg episode reward: [(0, '0.234')] [2024-06-06 15:48:20,417][24347] Updated weights for policy 0, policy_version 38658 (0.0027) [2024-06-06 15:48:22,318][24114] Fps is (10 sec: 44236.4, 60 sec: 43968.1, 300 sec: 44375.6). Total num frames: 633421824. Throughput: 0: 44394.3. Samples: 114677700. Policy #0 lag: (min: 0.0, avg: 9.4, max: 21.0) [2024-06-06 15:48:22,327][24114] Avg episode reward: [(0, '0.221')] [2024-06-06 15:48:22,336][24326] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000038661_633421824.pth... [2024-06-06 15:48:22,394][24326] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000038010_622755840.pth [2024-06-06 15:48:24,366][24347] Updated weights for policy 0, policy_version 38668 (0.0036) [2024-06-06 15:48:27,324][24114] Fps is (10 sec: 44210.4, 60 sec: 44505.4, 300 sec: 44430.3). Total num frames: 633651200. Throughput: 0: 44519.0. Samples: 114948420. Policy #0 lag: (min: 0.0, avg: 9.4, max: 21.0) [2024-06-06 15:48:27,325][24114] Avg episode reward: [(0, '0.238')] [2024-06-06 15:48:28,143][24347] Updated weights for policy 0, policy_version 38678 (0.0031) [2024-06-06 15:48:31,923][24326] Signal inference workers to stop experience collection... (1600 times) [2024-06-06 15:48:31,924][24326] Signal inference workers to resume experience collection... (1600 times) [2024-06-06 15:48:31,943][24347] InferenceWorker_p0-w0: stopping experience collection (1600 times) [2024-06-06 15:48:31,943][24347] InferenceWorker_p0-w0: resuming experience collection (1600 times) [2024-06-06 15:48:32,078][24347] Updated weights for policy 0, policy_version 38688 (0.0026) [2024-06-06 15:48:32,318][24114] Fps is (10 sec: 44237.6, 60 sec: 44236.9, 300 sec: 44431.2). Total num frames: 633864192. Throughput: 0: 44556.4. Samples: 115085600. 
Policy #0 lag: (min: 0.0, avg: 9.4, max: 21.0) [2024-06-06 15:48:32,318][24114] Avg episode reward: [(0, '0.235')] [2024-06-06 15:48:35,333][24347] Updated weights for policy 0, policy_version 38698 (0.0027) [2024-06-06 15:48:37,318][24114] Fps is (10 sec: 42623.4, 60 sec: 44236.8, 300 sec: 44264.5). Total num frames: 634077184. Throughput: 0: 44563.9. Samples: 115352020. Policy #0 lag: (min: 0.0, avg: 9.4, max: 21.0) [2024-06-06 15:48:37,319][24114] Avg episode reward: [(0, '0.232')] [2024-06-06 15:48:39,473][24347] Updated weights for policy 0, policy_version 38708 (0.0041) [2024-06-06 15:48:42,318][24114] Fps is (10 sec: 45874.3, 60 sec: 44509.9, 300 sec: 44431.2). Total num frames: 634322944. Throughput: 0: 44638.2. Samples: 115621860. Policy #0 lag: (min: 0.0, avg: 10.6, max: 22.0) [2024-06-06 15:48:42,319][24114] Avg episode reward: [(0, '0.230')] [2024-06-06 15:48:42,571][24347] Updated weights for policy 0, policy_version 38718 (0.0028) [2024-06-06 15:48:46,844][24347] Updated weights for policy 0, policy_version 38728 (0.0031) [2024-06-06 15:48:47,318][24114] Fps is (10 sec: 45876.1, 60 sec: 44236.7, 300 sec: 44486.7). Total num frames: 634535936. Throughput: 0: 44611.7. Samples: 115756720. Policy #0 lag: (min: 0.0, avg: 10.6, max: 22.0) [2024-06-06 15:48:47,318][24114] Avg episode reward: [(0, '0.237')] [2024-06-06 15:48:50,146][24347] Updated weights for policy 0, policy_version 38738 (0.0042) [2024-06-06 15:48:52,320][24114] Fps is (10 sec: 40952.3, 60 sec: 43962.3, 300 sec: 44264.3). Total num frames: 634732544. Throughput: 0: 44229.6. Samples: 116010280. Policy #0 lag: (min: 0.0, avg: 10.6, max: 22.0) [2024-06-06 15:48:52,321][24114] Avg episode reward: [(0, '0.235')] [2024-06-06 15:48:54,230][24347] Updated weights for policy 0, policy_version 38748 (0.0023) [2024-06-06 15:48:57,318][24114] Fps is (10 sec: 44236.7, 60 sec: 44236.9, 300 sec: 44320.1). Total num frames: 634978304. Throughput: 0: 44319.6. Samples: 116280700. 
Policy #0 lag: (min: 0.0, avg: 10.6, max: 22.0) [2024-06-06 15:48:57,318][24114] Avg episode reward: [(0, '0.241')] [2024-06-06 15:48:57,975][24347] Updated weights for policy 0, policy_version 38758 (0.0039) [2024-06-06 15:49:01,493][24347] Updated weights for policy 0, policy_version 38768 (0.0034) [2024-06-06 15:49:02,318][24114] Fps is (10 sec: 49161.4, 60 sec: 45056.0, 300 sec: 44597.8). Total num frames: 635224064. Throughput: 0: 44336.4. Samples: 116419360. Policy #0 lag: (min: 0.0, avg: 10.6, max: 22.0) [2024-06-06 15:49:02,318][24114] Avg episode reward: [(0, '0.242')] [2024-06-06 15:49:05,061][24347] Updated weights for policy 0, policy_version 38778 (0.0039) [2024-06-06 15:49:07,318][24114] Fps is (10 sec: 44236.4, 60 sec: 44509.8, 300 sec: 44375.7). Total num frames: 635420672. Throughput: 0: 44544.9. Samples: 116682220. Policy #0 lag: (min: 0.0, avg: 9.8, max: 21.0) [2024-06-06 15:49:07,319][24114] Avg episode reward: [(0, '0.232')] [2024-06-06 15:49:09,254][24347] Updated weights for policy 0, policy_version 38788 (0.0039) [2024-06-06 15:49:12,218][24347] Updated weights for policy 0, policy_version 38798 (0.0026) [2024-06-06 15:49:12,318][24114] Fps is (10 sec: 44237.1, 60 sec: 44782.9, 300 sec: 44375.7). Total num frames: 635666432. Throughput: 0: 44411.3. Samples: 116946660. Policy #0 lag: (min: 0.0, avg: 9.8, max: 21.0) [2024-06-06 15:49:12,318][24114] Avg episode reward: [(0, '0.235')] [2024-06-06 15:49:16,445][24347] Updated weights for policy 0, policy_version 38808 (0.0040) [2024-06-06 15:49:17,318][24114] Fps is (10 sec: 44236.8, 60 sec: 44236.8, 300 sec: 44431.2). Total num frames: 635863040. Throughput: 0: 44380.7. Samples: 117082740. Policy #0 lag: (min: 0.0, avg: 9.8, max: 21.0) [2024-06-06 15:49:17,319][24114] Avg episode reward: [(0, '0.244')] [2024-06-06 15:49:19,794][24347] Updated weights for policy 0, policy_version 38818 (0.0029) [2024-06-06 15:49:22,318][24114] Fps is (10 sec: 42597.6, 60 sec: 44509.8, 300 sec: 44431.2). 
Total num frames: 636092416. Throughput: 0: 44269.3. Samples: 117344140. Policy #0 lag: (min: 0.0, avg: 9.8, max: 21.0) [2024-06-06 15:49:22,319][24114] Avg episode reward: [(0, '0.225')] [2024-06-06 15:49:23,970][24347] Updated weights for policy 0, policy_version 38828 (0.0032) [2024-06-06 15:49:27,318][24114] Fps is (10 sec: 44237.3, 60 sec: 44241.2, 300 sec: 44264.6). Total num frames: 636305408. Throughput: 0: 44332.1. Samples: 117616800. Policy #0 lag: (min: 0.0, avg: 9.8, max: 21.0) [2024-06-06 15:49:27,318][24114] Avg episode reward: [(0, '0.240')] [2024-06-06 15:49:27,361][24347] Updated weights for policy 0, policy_version 38838 (0.0025) [2024-06-06 15:49:31,055][24347] Updated weights for policy 0, policy_version 38848 (0.0042) [2024-06-06 15:49:32,318][24114] Fps is (10 sec: 44237.2, 60 sec: 44509.7, 300 sec: 44431.2). Total num frames: 636534784. Throughput: 0: 44368.8. Samples: 117753320. Policy #0 lag: (min: 0.0, avg: 9.5, max: 21.0) [2024-06-06 15:49:32,319][24114] Avg episode reward: [(0, '0.234')] [2024-06-06 15:49:34,498][24347] Updated weights for policy 0, policy_version 38858 (0.0024) [2024-06-06 15:49:37,318][24114] Fps is (10 sec: 44235.9, 60 sec: 44509.8, 300 sec: 44431.2). Total num frames: 636747776. Throughput: 0: 44808.5. Samples: 118026580. Policy #0 lag: (min: 0.0, avg: 9.5, max: 21.0) [2024-06-06 15:49:37,318][24114] Avg episode reward: [(0, '0.231')] [2024-06-06 15:49:38,132][24347] Updated weights for policy 0, policy_version 38868 (0.0032) [2024-06-06 15:49:41,735][24347] Updated weights for policy 0, policy_version 38878 (0.0029) [2024-06-06 15:49:42,318][24114] Fps is (10 sec: 45875.4, 60 sec: 44509.9, 300 sec: 44375.6). Total num frames: 636993536. Throughput: 0: 44558.1. Samples: 118285820. 
Policy #0 lag: (min: 0.0, avg: 9.5, max: 21.0) [2024-06-06 15:49:42,318][24114] Avg episode reward: [(0, '0.232')] [2024-06-06 15:49:45,738][24347] Updated weights for policy 0, policy_version 38888 (0.0031) [2024-06-06 15:49:47,318][24114] Fps is (10 sec: 47514.6, 60 sec: 44782.9, 300 sec: 44486.7). Total num frames: 637222912. Throughput: 0: 44522.3. Samples: 118422860. Policy #0 lag: (min: 0.0, avg: 9.5, max: 21.0) [2024-06-06 15:49:47,318][24114] Avg episode reward: [(0, '0.241')] [2024-06-06 15:49:49,300][24347] Updated weights for policy 0, policy_version 38898 (0.0039) [2024-06-06 15:49:52,318][24114] Fps is (10 sec: 44236.7, 60 sec: 45057.4, 300 sec: 44542.3). Total num frames: 637435904. Throughput: 0: 44602.7. Samples: 118689340. Policy #0 lag: (min: 0.0, avg: 9.5, max: 21.0) [2024-06-06 15:49:52,318][24114] Avg episode reward: [(0, '0.234')] [2024-06-06 15:49:53,238][24347] Updated weights for policy 0, policy_version 38908 (0.0037) [2024-06-06 15:49:54,407][24326] Signal inference workers to stop experience collection... (1650 times) [2024-06-06 15:49:54,408][24326] Signal inference workers to resume experience collection... (1650 times) [2024-06-06 15:49:54,454][24347] InferenceWorker_p0-w0: stopping experience collection (1650 times) [2024-06-06 15:49:54,454][24347] InferenceWorker_p0-w0: resuming experience collection (1650 times) [2024-06-06 15:49:56,741][24347] Updated weights for policy 0, policy_version 38918 (0.0028) [2024-06-06 15:49:57,320][24114] Fps is (10 sec: 44228.1, 60 sec: 44781.5, 300 sec: 44375.4). Total num frames: 637665280. Throughput: 0: 44582.1. Samples: 118952940. Policy #0 lag: (min: 1.0, avg: 11.7, max: 24.0) [2024-06-06 15:49:57,321][24114] Avg episode reward: [(0, '0.241')] [2024-06-06 15:50:00,327][24347] Updated weights for policy 0, policy_version 38928 (0.0021) [2024-06-06 15:50:02,318][24114] Fps is (10 sec: 45875.2, 60 sec: 44509.9, 300 sec: 44542.3). Total num frames: 637894656. Throughput: 0: 44659.1. 
Samples: 119092400. Policy #0 lag: (min: 1.0, avg: 11.7, max: 24.0) [2024-06-06 15:50:02,319][24114] Avg episode reward: [(0, '0.240')] [2024-06-06 15:50:03,858][24347] Updated weights for policy 0, policy_version 38938 (0.0032) [2024-06-06 15:50:07,318][24114] Fps is (10 sec: 42606.9, 60 sec: 44510.0, 300 sec: 44486.7). Total num frames: 638091264. Throughput: 0: 44870.9. Samples: 119363320. Policy #0 lag: (min: 1.0, avg: 11.7, max: 24.0) [2024-06-06 15:50:07,318][24114] Avg episode reward: [(0, '0.236')] [2024-06-06 15:50:07,653][24347] Updated weights for policy 0, policy_version 38948 (0.0037) [2024-06-06 15:50:11,242][24347] Updated weights for policy 0, policy_version 38958 (0.0032) [2024-06-06 15:50:12,318][24114] Fps is (10 sec: 40960.5, 60 sec: 43963.8, 300 sec: 44320.1). Total num frames: 638304256. Throughput: 0: 44567.1. Samples: 119622320. Policy #0 lag: (min: 1.0, avg: 11.7, max: 24.0) [2024-06-06 15:50:12,318][24114] Avg episode reward: [(0, '0.242')] [2024-06-06 15:50:14,989][24347] Updated weights for policy 0, policy_version 38968 (0.0026) [2024-06-06 15:50:17,318][24114] Fps is (10 sec: 45874.6, 60 sec: 44782.9, 300 sec: 44486.7). Total num frames: 638550016. Throughput: 0: 44488.5. Samples: 119755300. Policy #0 lag: (min: 1.0, avg: 11.7, max: 24.0) [2024-06-06 15:50:17,319][24114] Avg episode reward: [(0, '0.242')] [2024-06-06 15:50:18,698][24347] Updated weights for policy 0, policy_version 38978 (0.0031) [2024-06-06 15:50:22,318][24114] Fps is (10 sec: 44235.8, 60 sec: 44236.8, 300 sec: 44486.7). Total num frames: 638746624. Throughput: 0: 44414.2. Samples: 120025220. Policy #0 lag: (min: 0.0, avg: 10.6, max: 21.0) [2024-06-06 15:50:22,319][24114] Avg episode reward: [(0, '0.245')] [2024-06-06 15:50:22,383][24326] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000038987_638763008.pth... 
[2024-06-06 15:50:22,434][24326] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000038338_628129792.pth [2024-06-06 15:50:22,585][24347] Updated weights for policy 0, policy_version 38988 (0.0030) [2024-06-06 15:50:26,334][24347] Updated weights for policy 0, policy_version 38998 (0.0044) [2024-06-06 15:50:27,318][24114] Fps is (10 sec: 44236.7, 60 sec: 44782.8, 300 sec: 44375.6). Total num frames: 638992384. Throughput: 0: 44598.6. Samples: 120292760. Policy #0 lag: (min: 0.0, avg: 10.6, max: 21.0) [2024-06-06 15:50:27,318][24114] Avg episode reward: [(0, '0.236')] [2024-06-06 15:50:29,650][24347] Updated weights for policy 0, policy_version 39008 (0.0022) [2024-06-06 15:50:32,318][24114] Fps is (10 sec: 45875.4, 60 sec: 44509.8, 300 sec: 44431.2). Total num frames: 639205376. Throughput: 0: 44528.7. Samples: 120426660. Policy #0 lag: (min: 0.0, avg: 10.6, max: 21.0) [2024-06-06 15:50:32,319][24114] Avg episode reward: [(0, '0.237')] [2024-06-06 15:50:33,361][24347] Updated weights for policy 0, policy_version 39018 (0.0034) [2024-06-06 15:50:37,187][24347] Updated weights for policy 0, policy_version 39028 (0.0040) [2024-06-06 15:50:37,318][24114] Fps is (10 sec: 44237.3, 60 sec: 44783.1, 300 sec: 44486.8). Total num frames: 639434752. Throughput: 0: 44621.4. Samples: 120697300. Policy #0 lag: (min: 0.0, avg: 10.6, max: 21.0) [2024-06-06 15:50:37,318][24114] Avg episode reward: [(0, '0.250')] [2024-06-06 15:50:40,700][24347] Updated weights for policy 0, policy_version 39038 (0.0038) [2024-06-06 15:50:42,318][24114] Fps is (10 sec: 44237.0, 60 sec: 44236.8, 300 sec: 44431.2). Total num frames: 639647744. Throughput: 0: 44539.6. Samples: 120957140. 
Policy #0 lag: (min: 0.0, avg: 10.6, max: 21.0) [2024-06-06 15:50:42,319][24114] Avg episode reward: [(0, '0.238')] [2024-06-06 15:50:44,737][24347] Updated weights for policy 0, policy_version 39048 (0.0025) [2024-06-06 15:50:47,318][24114] Fps is (10 sec: 42598.3, 60 sec: 43963.7, 300 sec: 44320.4). Total num frames: 639860736. Throughput: 0: 44267.2. Samples: 121084420. Policy #0 lag: (min: 1.0, avg: 11.5, max: 23.0) [2024-06-06 15:50:47,318][24114] Avg episode reward: [(0, '0.245')] [2024-06-06 15:50:48,166][24347] Updated weights for policy 0, policy_version 39058 (0.0035) [2024-06-06 15:50:51,889][24347] Updated weights for policy 0, policy_version 39068 (0.0028) [2024-06-06 15:50:52,318][24114] Fps is (10 sec: 44237.2, 60 sec: 44236.9, 300 sec: 44542.3). Total num frames: 640090112. Throughput: 0: 44375.1. Samples: 121360200. Policy #0 lag: (min: 1.0, avg: 11.5, max: 23.0) [2024-06-06 15:50:52,318][24114] Avg episode reward: [(0, '0.240')] [2024-06-06 15:50:52,870][24326] Signal inference workers to stop experience collection... (1700 times) [2024-06-06 15:50:52,871][24326] Signal inference workers to resume experience collection... (1700 times) [2024-06-06 15:50:52,888][24347] InferenceWorker_p0-w0: stopping experience collection (1700 times) [2024-06-06 15:50:52,892][24347] InferenceWorker_p0-w0: resuming experience collection (1700 times) [2024-06-06 15:50:55,734][24347] Updated weights for policy 0, policy_version 39078 (0.0031) [2024-06-06 15:50:57,318][24114] Fps is (10 sec: 45875.3, 60 sec: 44238.2, 300 sec: 44431.2). Total num frames: 640319488. Throughput: 0: 44402.6. Samples: 121620440. Policy #0 lag: (min: 1.0, avg: 11.5, max: 23.0) [2024-06-06 15:50:57,318][24114] Avg episode reward: [(0, '0.246')] [2024-06-06 15:50:59,032][24347] Updated weights for policy 0, policy_version 39088 (0.0028) [2024-06-06 15:51:02,318][24114] Fps is (10 sec: 45875.2, 60 sec: 44236.9, 300 sec: 44375.6). Total num frames: 640548864. Throughput: 0: 44596.5. 
Samples: 121762140. Policy #0 lag: (min: 1.0, avg: 11.5, max: 23.0) [2024-06-06 15:51:02,318][24114] Avg episode reward: [(0, '0.234')] [2024-06-06 15:51:02,797][24347] Updated weights for policy 0, policy_version 39098 (0.0019) [2024-06-06 15:51:06,563][24347] Updated weights for policy 0, policy_version 39108 (0.0037) [2024-06-06 15:51:07,318][24114] Fps is (10 sec: 44236.6, 60 sec: 44509.8, 300 sec: 44486.7). Total num frames: 640761856. Throughput: 0: 44621.5. Samples: 122033180. Policy #0 lag: (min: 1.0, avg: 11.5, max: 23.0) [2024-06-06 15:51:07,319][24114] Avg episode reward: [(0, '0.236')] [2024-06-06 15:51:10,223][24347] Updated weights for policy 0, policy_version 39118 (0.0031) [2024-06-06 15:51:12,318][24114] Fps is (10 sec: 44236.7, 60 sec: 44782.9, 300 sec: 44487.6). Total num frames: 640991232. Throughput: 0: 44523.6. Samples: 122296320. Policy #0 lag: (min: 0.0, avg: 9.9, max: 21.0) [2024-06-06 15:51:12,318][24114] Avg episode reward: [(0, '0.241')] [2024-06-06 15:51:14,022][24347] Updated weights for policy 0, policy_version 39128 (0.0033) [2024-06-06 15:51:17,318][24114] Fps is (10 sec: 44236.6, 60 sec: 44236.8, 300 sec: 44320.1). Total num frames: 641204224. Throughput: 0: 44482.7. Samples: 122428380. Policy #0 lag: (min: 0.0, avg: 9.9, max: 21.0) [2024-06-06 15:51:17,319][24114] Avg episode reward: [(0, '0.248')] [2024-06-06 15:51:17,601][24347] Updated weights for policy 0, policy_version 39138 (0.0029) [2024-06-06 15:51:21,136][24347] Updated weights for policy 0, policy_version 39148 (0.0026) [2024-06-06 15:51:22,318][24114] Fps is (10 sec: 44236.5, 60 sec: 44783.0, 300 sec: 44542.3). Total num frames: 641433600. Throughput: 0: 44482.6. Samples: 122699020. 
Policy #0 lag: (min: 0.0, avg: 9.9, max: 21.0) [2024-06-06 15:51:22,319][24114] Avg episode reward: [(0, '0.236')] [2024-06-06 15:51:25,077][24347] Updated weights for policy 0, policy_version 39158 (0.0035) [2024-06-06 15:51:27,318][24114] Fps is (10 sec: 45875.6, 60 sec: 44510.0, 300 sec: 44542.3). Total num frames: 641662976. Throughput: 0: 44808.1. Samples: 122973500. Policy #0 lag: (min: 0.0, avg: 9.9, max: 21.0) [2024-06-06 15:51:27,318][24114] Avg episode reward: [(0, '0.235')] [2024-06-06 15:51:28,201][24347] Updated weights for policy 0, policy_version 39168 (0.0034) [2024-06-06 15:51:32,054][24347] Updated weights for policy 0, policy_version 39178 (0.0028) [2024-06-06 15:51:32,318][24114] Fps is (10 sec: 45875.6, 60 sec: 44783.0, 300 sec: 44431.2). Total num frames: 641892352. Throughput: 0: 44963.1. Samples: 123107760. Policy #0 lag: (min: 0.0, avg: 9.9, max: 21.0) [2024-06-06 15:51:32,318][24114] Avg episode reward: [(0, '0.242')] [2024-06-06 15:51:35,731][24347] Updated weights for policy 0, policy_version 39188 (0.0026) [2024-06-06 15:51:37,318][24114] Fps is (10 sec: 44236.8, 60 sec: 44509.9, 300 sec: 44542.3). Total num frames: 642105344. Throughput: 0: 44647.1. Samples: 123369320. Policy #0 lag: (min: 0.0, avg: 10.4, max: 21.0) [2024-06-06 15:51:37,318][24114] Avg episode reward: [(0, '0.235')] [2024-06-06 15:51:39,579][24347] Updated weights for policy 0, policy_version 39198 (0.0033) [2024-06-06 15:51:42,318][24114] Fps is (10 sec: 42598.5, 60 sec: 44510.0, 300 sec: 44486.8). Total num frames: 642318336. Throughput: 0: 44845.8. Samples: 123638500. Policy #0 lag: (min: 0.0, avg: 10.4, max: 21.0) [2024-06-06 15:51:42,318][24114] Avg episode reward: [(0, '0.244')] [2024-06-06 15:51:43,180][24347] Updated weights for policy 0, policy_version 39208 (0.0038) [2024-06-06 15:51:47,155][24347] Updated weights for policy 0, policy_version 39218 (0.0025) [2024-06-06 15:51:47,318][24114] Fps is (10 sec: 45874.5, 60 sec: 45055.9, 300 sec: 44431.2). 
Total num frames: 642564096. Throughput: 0: 44633.6. Samples: 123770660. Policy #0 lag: (min: 0.0, avg: 10.4, max: 21.0) [2024-06-06 15:51:47,319][24114] Avg episode reward: [(0, '0.248')] [2024-06-06 15:51:49,557][24326] Signal inference workers to stop experience collection... (1750 times) [2024-06-06 15:51:49,607][24347] InferenceWorker_p0-w0: stopping experience collection (1750 times) [2024-06-06 15:51:49,615][24326] Signal inference workers to resume experience collection... (1750 times) [2024-06-06 15:51:49,621][24347] InferenceWorker_p0-w0: resuming experience collection (1750 times) [2024-06-06 15:51:50,484][24347] Updated weights for policy 0, policy_version 39228 (0.0035) [2024-06-06 15:51:52,318][24114] Fps is (10 sec: 45874.9, 60 sec: 44782.9, 300 sec: 44486.7). Total num frames: 642777088. Throughput: 0: 44522.7. Samples: 124036700. Policy #0 lag: (min: 0.0, avg: 10.4, max: 21.0) [2024-06-06 15:51:52,319][24114] Avg episode reward: [(0, '0.249')] [2024-06-06 15:51:54,237][24347] Updated weights for policy 0, policy_version 39238 (0.0032) [2024-06-06 15:51:57,318][24114] Fps is (10 sec: 44236.8, 60 sec: 44782.8, 300 sec: 44597.8). Total num frames: 643006464. Throughput: 0: 44783.0. Samples: 124311560. Policy #0 lag: (min: 0.0, avg: 10.4, max: 21.0) [2024-06-06 15:51:57,319][24114] Avg episode reward: [(0, '0.246')] [2024-06-06 15:51:57,706][24347] Updated weights for policy 0, policy_version 39248 (0.0038) [2024-06-06 15:52:01,422][24347] Updated weights for policy 0, policy_version 39258 (0.0028) [2024-06-06 15:52:02,318][24114] Fps is (10 sec: 45875.1, 60 sec: 44782.9, 300 sec: 44486.7). Total num frames: 643235840. Throughput: 0: 44780.5. Samples: 124443500. 
Policy #0 lag: (min: 0.0, avg: 9.9, max: 20.0)
[2024-06-06 15:52:02,318][24114] Avg episode reward: [(0, '0.240')]
[2024-06-06 15:52:05,057][24347] Updated weights for policy 0, policy_version 39268 (0.0023)
[2024-06-06 15:52:07,318][24114] Fps is (10 sec: 44237.7, 60 sec: 44783.0, 300 sec: 44431.2). Total num frames: 643448832. Throughput: 0: 44583.7. Samples: 124705280. Policy #0 lag: (min: 0.0, avg: 9.9, max: 20.0)
[2024-06-06 15:52:07,318][24114] Avg episode reward: [(0, '0.240')]
[2024-06-06 15:52:09,161][24347] Updated weights for policy 0, policy_version 39278 (0.0032)
[2024-06-06 15:52:12,318][24114] Fps is (10 sec: 42598.3, 60 sec: 44509.8, 300 sec: 44542.3). Total num frames: 643661824. Throughput: 0: 44380.8. Samples: 124970640. Policy #0 lag: (min: 0.0, avg: 9.9, max: 20.0)
[2024-06-06 15:52:12,318][24114] Avg episode reward: [(0, '0.257')]
[2024-06-06 15:52:12,604][24347] Updated weights for policy 0, policy_version 39288 (0.0042)
[2024-06-06 15:52:16,407][24347] Updated weights for policy 0, policy_version 39298 (0.0031)
[2024-06-06 15:52:17,318][24114] Fps is (10 sec: 45874.0, 60 sec: 45055.9, 300 sec: 44487.6). Total num frames: 643907584. Throughput: 0: 44414.0. Samples: 125106400. Policy #0 lag: (min: 0.0, avg: 9.9, max: 20.0)
[2024-06-06 15:52:17,319][24114] Avg episode reward: [(0, '0.240')]
[2024-06-06 15:52:19,743][24347] Updated weights for policy 0, policy_version 39308 (0.0029)
[2024-06-06 15:52:22,318][24114] Fps is (10 sec: 45874.9, 60 sec: 44782.9, 300 sec: 44542.2). Total num frames: 644120576. Throughput: 0: 44628.3. Samples: 125377600. Policy #0 lag: (min: 0.0, avg: 9.9, max: 20.0)
[2024-06-06 15:52:22,319][24114] Avg episode reward: [(0, '0.240')]
[2024-06-06 15:52:22,325][24326] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000039314_644120576.pth...
[2024-06-06 15:52:22,376][24326] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000038661_633421824.pth
[2024-06-06 15:52:23,517][24347] Updated weights for policy 0, policy_version 39318 (0.0040)
[2024-06-06 15:52:27,206][24347] Updated weights for policy 0, policy_version 39328 (0.0037)
[2024-06-06 15:52:27,318][24114] Fps is (10 sec: 44237.5, 60 sec: 44782.9, 300 sec: 44542.3). Total num frames: 644349952. Throughput: 0: 44616.4. Samples: 125646240. Policy #0 lag: (min: 0.0, avg: 10.1, max: 21.0)
[2024-06-06 15:52:27,318][24114] Avg episode reward: [(0, '0.241')]
[2024-06-06 15:52:30,963][24347] Updated weights for policy 0, policy_version 39338 (0.0033)
[2024-06-06 15:52:32,318][24114] Fps is (10 sec: 44237.4, 60 sec: 44509.9, 300 sec: 44542.3). Total num frames: 644562944. Throughput: 0: 44624.1. Samples: 125778740. Policy #0 lag: (min: 0.0, avg: 10.1, max: 21.0)
[2024-06-06 15:52:32,318][24114] Avg episode reward: [(0, '0.250')]
[2024-06-06 15:52:34,717][24347] Updated weights for policy 0, policy_version 39348 (0.0032)
[2024-06-06 15:52:37,318][24114] Fps is (10 sec: 42598.1, 60 sec: 44509.8, 300 sec: 44486.7). Total num frames: 644775936. Throughput: 0: 44541.7. Samples: 126041080. Policy #0 lag: (min: 0.0, avg: 10.1, max: 21.0)
[2024-06-06 15:52:37,319][24114] Avg episode reward: [(0, '0.241')]
[2024-06-06 15:52:38,362][24347] Updated weights for policy 0, policy_version 39358 (0.0024)
[2024-06-06 15:52:41,878][24347] Updated weights for policy 0, policy_version 39368 (0.0030)
[2024-06-06 15:52:42,318][24114] Fps is (10 sec: 45875.3, 60 sec: 45056.0, 300 sec: 44542.2). Total num frames: 645021696. Throughput: 0: 44314.4. Samples: 126305700. Policy #0 lag: (min: 0.0, avg: 10.1, max: 21.0)
[2024-06-06 15:52:42,318][24114] Avg episode reward: [(0, '0.247')]
[2024-06-06 15:52:45,557][24347] Updated weights for policy 0, policy_version 39378 (0.0041)
[2024-06-06 15:52:47,324][24114] Fps is (10 sec: 45848.1, 60 sec: 44505.5, 300 sec: 44541.4). Total num frames: 645234688. Throughput: 0: 44506.1. Samples: 126446540. Policy #0 lag: (min: 0.0, avg: 10.1, max: 21.0)
[2024-06-06 15:52:47,325][24114] Avg episode reward: [(0, '0.226')]
[2024-06-06 15:52:49,365][24347] Updated weights for policy 0, policy_version 39388 (0.0045)
[2024-06-06 15:52:52,318][24114] Fps is (10 sec: 42598.4, 60 sec: 44509.9, 300 sec: 44486.7). Total num frames: 645447680. Throughput: 0: 44627.0. Samples: 126713500. Policy #0 lag: (min: 0.0, avg: 10.9, max: 21.0)
[2024-06-06 15:52:52,318][24114] Avg episode reward: [(0, '0.238')]
[2024-06-06 15:52:52,711][24347] Updated weights for policy 0, policy_version 39398 (0.0040)
[2024-06-06 15:52:56,646][24347] Updated weights for policy 0, policy_version 39408 (0.0028)
[2024-06-06 15:52:57,318][24114] Fps is (10 sec: 44262.8, 60 sec: 44509.9, 300 sec: 44597.8). Total num frames: 645677056. Throughput: 0: 44665.7. Samples: 126980600. Policy #0 lag: (min: 0.0, avg: 10.9, max: 21.0)
[2024-06-06 15:52:57,319][24114] Avg episode reward: [(0, '0.252')]
[2024-06-06 15:53:00,187][24347] Updated weights for policy 0, policy_version 39418 (0.0031)
[2024-06-06 15:53:02,318][24114] Fps is (10 sec: 45875.0, 60 sec: 44509.9, 300 sec: 44597.8). Total num frames: 645906432. Throughput: 0: 44721.0. Samples: 127118840. Policy #0 lag: (min: 0.0, avg: 10.9, max: 21.0)
[2024-06-06 15:53:02,318][24114] Avg episode reward: [(0, '0.242')]
[2024-06-06 15:53:03,936][24347] Updated weights for policy 0, policy_version 39428 (0.0035)
[2024-06-06 15:53:06,881][24326] Signal inference workers to stop experience collection... (1800 times)
[2024-06-06 15:53:06,928][24347] InferenceWorker_p0-w0: stopping experience collection (1800 times)
[2024-06-06 15:53:06,937][24326] Signal inference workers to resume experience collection... (1800 times)
[2024-06-06 15:53:06,955][24347] InferenceWorker_p0-w0: resuming experience collection (1800 times)
[2024-06-06 15:53:07,318][24114] Fps is (10 sec: 45875.4, 60 sec: 44782.8, 300 sec: 44597.8). Total num frames: 646135808. Throughput: 0: 44494.7. Samples: 127379860. Policy #0 lag: (min: 0.0, avg: 10.9, max: 21.0)
[2024-06-06 15:53:07,319][24114] Avg episode reward: [(0, '0.243')]
[2024-06-06 15:53:07,561][24347] Updated weights for policy 0, policy_version 39438 (0.0044)
[2024-06-06 15:53:11,379][24347] Updated weights for policy 0, policy_version 39448 (0.0025)
[2024-06-06 15:53:12,318][24114] Fps is (10 sec: 45874.6, 60 sec: 45055.9, 300 sec: 44597.8). Total num frames: 646365184. Throughput: 0: 44460.8. Samples: 127646980. Policy #0 lag: (min: 0.0, avg: 10.9, max: 21.0)
[2024-06-06 15:53:12,319][24114] Avg episode reward: [(0, '0.237')]
[2024-06-06 15:53:14,824][24347] Updated weights for policy 0, policy_version 39458 (0.0030)
[2024-06-06 15:53:17,318][24114] Fps is (10 sec: 44236.6, 60 sec: 44509.9, 300 sec: 44597.8). Total num frames: 646578176. Throughput: 0: 44549.2. Samples: 127783460. Policy #0 lag: (min: 0.0, avg: 11.1, max: 23.0)
[2024-06-06 15:53:17,319][24114] Avg episode reward: [(0, '0.240')]
[2024-06-06 15:53:18,691][24347] Updated weights for policy 0, policy_version 39468 (0.0023)
[2024-06-06 15:53:21,939][24347] Updated weights for policy 0, policy_version 39478 (0.0028)
[2024-06-06 15:53:22,318][24114] Fps is (10 sec: 44237.1, 60 sec: 44783.0, 300 sec: 44598.7). Total num frames: 646807552. Throughput: 0: 44838.7. Samples: 128058820. Policy #0 lag: (min: 0.0, avg: 11.1, max: 23.0)
[2024-06-06 15:53:22,327][24114] Avg episode reward: [(0, '0.243')]
[2024-06-06 15:53:25,849][24347] Updated weights for policy 0, policy_version 39488 (0.0033)
[2024-06-06 15:53:27,318][24114] Fps is (10 sec: 45875.8, 60 sec: 44783.0, 300 sec: 44653.3). Total num frames: 647036928. Throughput: 0: 44872.0. Samples: 128324940. Policy #0 lag: (min: 0.0, avg: 11.1, max: 23.0)
[2024-06-06 15:53:27,318][24114] Avg episode reward: [(0, '0.242')]
[2024-06-06 15:53:29,519][24347] Updated weights for policy 0, policy_version 39498 (0.0042)
[2024-06-06 15:53:32,318][24114] Fps is (10 sec: 40959.7, 60 sec: 44236.7, 300 sec: 44542.3). Total num frames: 647217152. Throughput: 0: 44596.4. Samples: 128453120. Policy #0 lag: (min: 0.0, avg: 11.1, max: 23.0)
[2024-06-06 15:53:32,319][24114] Avg episode reward: [(0, '0.240')]
[2024-06-06 15:53:33,318][24347] Updated weights for policy 0, policy_version 39508 (0.0035)
[2024-06-06 15:53:36,843][24347] Updated weights for policy 0, policy_version 39518 (0.0037)
[2024-06-06 15:53:37,318][24114] Fps is (10 sec: 42597.5, 60 sec: 44782.9, 300 sec: 44542.3). Total num frames: 647462912. Throughput: 0: 44642.9. Samples: 128722440. Policy #0 lag: (min: 0.0, avg: 11.1, max: 23.0)
[2024-06-06 15:53:37,319][24114] Avg episode reward: [(0, '0.239')]
[2024-06-06 15:53:40,689][24347] Updated weights for policy 0, policy_version 39528 (0.0035)
[2024-06-06 15:53:42,318][24114] Fps is (10 sec: 45876.0, 60 sec: 44236.8, 300 sec: 44542.3). Total num frames: 647675904. Throughput: 0: 44585.0. Samples: 128986920. Policy #0 lag: (min: 0.0, avg: 10.9, max: 24.0)
[2024-06-06 15:53:42,318][24114] Avg episode reward: [(0, '0.245')]
[2024-06-06 15:53:44,438][24347] Updated weights for policy 0, policy_version 39538 (0.0031)
[2024-06-06 15:53:47,318][24114] Fps is (10 sec: 44238.0, 60 sec: 44514.4, 300 sec: 44653.7). Total num frames: 647905280. Throughput: 0: 44514.3. Samples: 129121980. Policy #0 lag: (min: 0.0, avg: 10.9, max: 24.0)
[2024-06-06 15:53:47,318][24114] Avg episode reward: [(0, '0.243')]
[2024-06-06 15:53:48,267][24347] Updated weights for policy 0, policy_version 39548 (0.0033)
[2024-06-06 15:53:51,560][24347] Updated weights for policy 0, policy_version 39558 (0.0034)
[2024-06-06 15:53:52,318][24114] Fps is (10 sec: 47513.1, 60 sec: 45055.9, 300 sec: 44653.3). Total num frames: 648151040. Throughput: 0: 44831.1. Samples: 129397260. Policy #0 lag: (min: 0.0, avg: 10.9, max: 24.0)
[2024-06-06 15:53:52,319][24114] Avg episode reward: [(0, '0.249')]
[2024-06-06 15:53:55,354][24347] Updated weights for policy 0, policy_version 39568 (0.0028)
[2024-06-06 15:53:57,318][24114] Fps is (10 sec: 45874.3, 60 sec: 44782.9, 300 sec: 44542.3). Total num frames: 648364032. Throughput: 0: 44862.3. Samples: 129665780. Policy #0 lag: (min: 0.0, avg: 10.9, max: 24.0)
[2024-06-06 15:53:57,318][24114] Avg episode reward: [(0, '0.241')]
[2024-06-06 15:53:58,985][24347] Updated weights for policy 0, policy_version 39578 (0.0034)
[2024-06-06 15:54:02,318][24114] Fps is (10 sec: 42597.5, 60 sec: 44509.7, 300 sec: 44597.8). Total num frames: 648577024. Throughput: 0: 44766.5. Samples: 129797960. Policy #0 lag: (min: 0.0, avg: 10.9, max: 24.0)
[2024-06-06 15:54:02,319][24114] Avg episode reward: [(0, '0.246')]
[2024-06-06 15:54:02,630][24347] Updated weights for policy 0, policy_version 39588 (0.0027)
[2024-06-06 15:54:05,918][24347] Updated weights for policy 0, policy_version 39598 (0.0037)
[2024-06-06 15:54:07,318][24114] Fps is (10 sec: 45875.7, 60 sec: 44783.0, 300 sec: 44597.8). Total num frames: 648822784. Throughput: 0: 44477.4. Samples: 130060300. Policy #0 lag: (min: 0.0, avg: 11.0, max: 21.0)
[2024-06-06 15:54:07,318][24114] Avg episode reward: [(0, '0.245')]
[2024-06-06 15:54:10,010][24347] Updated weights for policy 0, policy_version 39608 (0.0036)
[2024-06-06 15:54:12,318][24114] Fps is (10 sec: 44238.1, 60 sec: 44236.9, 300 sec: 44597.8). Total num frames: 649019392. Throughput: 0: 44686.2. Samples: 130335820. Policy #0 lag: (min: 0.0, avg: 11.0, max: 21.0)
[2024-06-06 15:54:12,318][24114] Avg episode reward: [(0, '0.236')]
[2024-06-06 15:54:13,517][24347] Updated weights for policy 0, policy_version 39618 (0.0034)
[2024-06-06 15:54:17,318][24114] Fps is (10 sec: 40959.3, 60 sec: 44236.8, 300 sec: 44542.3). Total num frames: 649232384. Throughput: 0: 44828.5. Samples: 130470400. Policy #0 lag: (min: 0.0, avg: 11.0, max: 21.0)
[2024-06-06 15:54:17,319][24114] Avg episode reward: [(0, '0.242')]
[2024-06-06 15:54:17,502][24347] Updated weights for policy 0, policy_version 39628 (0.0029)
[2024-06-06 15:54:19,613][24326] Signal inference workers to stop experience collection... (1850 times)
[2024-06-06 15:54:19,613][24326] Signal inference workers to resume experience collection... (1850 times)
[2024-06-06 15:54:19,641][24347] InferenceWorker_p0-w0: stopping experience collection (1850 times)
[2024-06-06 15:54:19,642][24347] InferenceWorker_p0-w0: resuming experience collection (1850 times)
[2024-06-06 15:54:20,955][24347] Updated weights for policy 0, policy_version 39638 (0.0036)
[2024-06-06 15:54:22,318][24114] Fps is (10 sec: 44236.5, 60 sec: 44236.8, 300 sec: 44597.8). Total num frames: 649461760. Throughput: 0: 44903.2. Samples: 130743080. Policy #0 lag: (min: 0.0, avg: 11.0, max: 21.0)
[2024-06-06 15:54:22,318][24114] Avg episode reward: [(0, '0.249')]
[2024-06-06 15:54:22,449][24326] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000039641_649478144.pth...
[2024-06-06 15:54:22,502][24326] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000038987_638763008.pth
[2024-06-06 15:54:24,907][24347] Updated weights for policy 0, policy_version 39648 (0.0027)
[2024-06-06 15:54:27,320][24114] Fps is (10 sec: 49142.9, 60 sec: 44781.4, 300 sec: 44708.6). Total num frames: 649723904. Throughput: 0: 44782.0. Samples: 131002200. Policy #0 lag: (min: 0.0, avg: 11.0, max: 21.0)
[2024-06-06 15:54:27,321][24114] Avg episode reward: [(0, '0.256')]
[2024-06-06 15:54:28,235][24347] Updated weights for policy 0, policy_version 39658 (0.0025)
[2024-06-06 15:54:32,100][24347] Updated weights for policy 0, policy_version 39668 (0.0036)
[2024-06-06 15:54:32,318][24114] Fps is (10 sec: 45875.3, 60 sec: 45056.1, 300 sec: 44653.4). Total num frames: 649920512. Throughput: 0: 44877.2. Samples: 131141460. Policy #0 lag: (min: 0.0, avg: 9.6, max: 22.0)
[2024-06-06 15:54:32,319][24114] Avg episode reward: [(0, '0.242')]
[2024-06-06 15:54:35,421][24347] Updated weights for policy 0, policy_version 39678 (0.0033)
[2024-06-06 15:54:37,318][24114] Fps is (10 sec: 42607.0, 60 sec: 44783.1, 300 sec: 44597.8). Total num frames: 650149888. Throughput: 0: 44577.9. Samples: 131403260. Policy #0 lag: (min: 0.0, avg: 9.6, max: 22.0)
[2024-06-06 15:54:37,318][24114] Avg episode reward: [(0, '0.238')]
[2024-06-06 15:54:39,253][24347] Updated weights for policy 0, policy_version 39688 (0.0025)
[2024-06-06 15:54:42,320][24114] Fps is (10 sec: 45866.2, 60 sec: 45054.5, 300 sec: 44597.5). Total num frames: 650379264. Throughput: 0: 44604.8. Samples: 131673080. Policy #0 lag: (min: 0.0, avg: 9.6, max: 22.0)
[2024-06-06 15:54:42,321][24114] Avg episode reward: [(0, '0.238')]
[2024-06-06 15:54:42,949][24347] Updated weights for policy 0, policy_version 39698 (0.0039)
[2024-06-06 15:54:46,911][24347] Updated weights for policy 0, policy_version 39708 (0.0029)
[2024-06-06 15:54:47,324][24114] Fps is (10 sec: 44210.2, 60 sec: 44778.4, 300 sec: 44596.9). Total num frames: 650592256. Throughput: 0: 44666.8. Samples: 131808220. Policy #0 lag: (min: 0.0, avg: 9.6, max: 22.0)
[2024-06-06 15:54:47,325][24114] Avg episode reward: [(0, '0.246')]
[2024-06-06 15:54:50,226][24347] Updated weights for policy 0, policy_version 39718 (0.0038)
[2024-06-06 15:54:52,318][24114] Fps is (10 sec: 42607.2, 60 sec: 44236.9, 300 sec: 44542.6). Total num frames: 650805248. Throughput: 0: 44844.9. Samples: 132078320. Policy #0 lag: (min: 0.0, avg: 9.6, max: 22.0)
[2024-06-06 15:54:52,318][24114] Avg episode reward: [(0, '0.252')]
[2024-06-06 15:54:54,346][24347] Updated weights for policy 0, policy_version 39728 (0.0027)
[2024-06-06 15:54:57,320][24114] Fps is (10 sec: 45893.8, 60 sec: 44781.6, 300 sec: 44597.5). Total num frames: 651051008. Throughput: 0: 44436.7. Samples: 132335560. Policy #0 lag: (min: 0.0, avg: 12.3, max: 25.0)
[2024-06-06 15:54:57,329][24114] Avg episode reward: [(0, '0.242')]
[2024-06-06 15:54:57,610][24347] Updated weights for policy 0, policy_version 39738 (0.0041)
[2024-06-06 15:55:01,637][24347] Updated weights for policy 0, policy_version 39748 (0.0036)
[2024-06-06 15:55:02,318][24114] Fps is (10 sec: 44235.9, 60 sec: 44510.0, 300 sec: 44597.8). Total num frames: 651247616. Throughput: 0: 44639.1. Samples: 132479160. Policy #0 lag: (min: 0.0, avg: 12.3, max: 25.0)
[2024-06-06 15:55:02,319][24114] Avg episode reward: [(0, '0.251')]
[2024-06-06 15:55:04,759][24347] Updated weights for policy 0, policy_version 39758 (0.0033)
[2024-06-06 15:55:07,324][24114] Fps is (10 sec: 42581.1, 60 sec: 44232.4, 300 sec: 44652.4). Total num frames: 651476992. Throughput: 0: 44462.1. Samples: 132744140. Policy #0 lag: (min: 0.0, avg: 12.3, max: 25.0)
[2024-06-06 15:55:07,325][24114] Avg episode reward: [(0, '0.251')]
[2024-06-06 15:55:08,980][24347] Updated weights for policy 0, policy_version 39768 (0.0031)
[2024-06-06 15:55:12,099][24347] Updated weights for policy 0, policy_version 39778 (0.0031)
[2024-06-06 15:55:12,320][24114] Fps is (10 sec: 47504.6, 60 sec: 45054.5, 300 sec: 44653.0). Total num frames: 651722752. Throughput: 0: 44577.3. Samples: 133008180. Policy #0 lag: (min: 0.0, avg: 12.3, max: 25.0)
[2024-06-06 15:55:12,321][24114] Avg episode reward: [(0, '0.246')]
[2024-06-06 15:55:16,322][24347] Updated weights for policy 0, policy_version 39788 (0.0026)
[2024-06-06 15:55:17,318][24114] Fps is (10 sec: 44263.4, 60 sec: 44783.1, 300 sec: 44653.4). Total num frames: 651919360. Throughput: 0: 44606.7. Samples: 133148760. Policy #0 lag: (min: 0.0, avg: 12.3, max: 25.0)
[2024-06-06 15:55:17,318][24114] Avg episode reward: [(0, '0.243')]
[2024-06-06 15:55:19,333][24347] Updated weights for policy 0, policy_version 39798 (0.0032)
[2024-06-06 15:55:22,318][24114] Fps is (10 sec: 42606.4, 60 sec: 44782.9, 300 sec: 44597.8). Total num frames: 652148736. Throughput: 0: 44767.8. Samples: 133417820. Policy #0 lag: (min: 0.0, avg: 12.3, max: 25.0)
[2024-06-06 15:55:22,319][24114] Avg episode reward: [(0, '0.241')]
[2024-06-06 15:55:24,053][24347] Updated weights for policy 0, policy_version 39808 (0.0024)
[2024-06-06 15:55:26,624][24347] Updated weights for policy 0, policy_version 39818 (0.0029)
[2024-06-06 15:55:27,318][24114] Fps is (10 sec: 47513.9, 60 sec: 44511.4, 300 sec: 44708.9). Total num frames: 652394496. Throughput: 0: 44566.9. Samples: 133678500. Policy #0 lag: (min: 0.0, avg: 10.5, max: 22.0)
[2024-06-06 15:55:27,318][24114] Avg episode reward: [(0, '0.232')]
[2024-06-06 15:55:30,122][24326] Signal inference workers to stop experience collection... (1900 times)
[2024-06-06 15:55:30,123][24326] Signal inference workers to resume experience collection... (1900 times)
[2024-06-06 15:55:30,152][24347] InferenceWorker_p0-w0: stopping experience collection (1900 times)
[2024-06-06 15:55:30,152][24347] InferenceWorker_p0-w0: resuming experience collection (1900 times)
[2024-06-06 15:55:31,146][24347] Updated weights for policy 0, policy_version 39828 (0.0028)
[2024-06-06 15:55:32,318][24114] Fps is (10 sec: 44237.5, 60 sec: 44509.9, 300 sec: 44597.8). Total num frames: 652591104. Throughput: 0: 44689.5. Samples: 133818980. Policy #0 lag: (min: 0.0, avg: 10.5, max: 22.0)
[2024-06-06 15:55:32,318][24114] Avg episode reward: [(0, '0.243')]
[2024-06-06 15:55:33,913][24347] Updated weights for policy 0, policy_version 39838 (0.0025)
[2024-06-06 15:55:37,318][24114] Fps is (10 sec: 40959.8, 60 sec: 44236.8, 300 sec: 44597.8). Total num frames: 652804096. Throughput: 0: 44648.0. Samples: 134087480. Policy #0 lag: (min: 0.0, avg: 10.5, max: 22.0)
[2024-06-06 15:55:37,318][24114] Avg episode reward: [(0, '0.243')]
[2024-06-06 15:55:38,453][24347] Updated weights for policy 0, policy_version 39848 (0.0041)
[2024-06-06 15:55:41,231][24347] Updated weights for policy 0, policy_version 39858 (0.0028)
[2024-06-06 15:55:42,318][24114] Fps is (10 sec: 45875.3, 60 sec: 44511.4, 300 sec: 44708.9). Total num frames: 653049856. Throughput: 0: 44788.2. Samples: 134350940. Policy #0 lag: (min: 0.0, avg: 10.5, max: 22.0)
[2024-06-06 15:55:42,318][24114] Avg episode reward: [(0, '0.244')]
[2024-06-06 15:55:45,820][24347] Updated weights for policy 0, policy_version 39868 (0.0033)
[2024-06-06 15:55:47,318][24114] Fps is (10 sec: 49152.1, 60 sec: 45060.5, 300 sec: 44764.4). Total num frames: 653295616. Throughput: 0: 44677.1. Samples: 134489620. Policy #0 lag: (min: 0.0, avg: 10.5, max: 22.0)
[2024-06-06 15:55:47,318][24114] Avg episode reward: [(0, '0.248')]
[2024-06-06 15:55:48,743][24347] Updated weights for policy 0, policy_version 39878 (0.0026)
[2024-06-06 15:55:52,318][24114] Fps is (10 sec: 42598.5, 60 sec: 44509.8, 300 sec: 44597.8). Total num frames: 653475840. Throughput: 0: 44612.2. Samples: 134751420. Policy #0 lag: (min: 0.0, avg: 11.7, max: 21.0)
[2024-06-06 15:55:52,318][24114] Avg episode reward: [(0, '0.242')]
[2024-06-06 15:55:53,432][24347] Updated weights for policy 0, policy_version 39888 (0.0032)
[2024-06-06 15:55:56,077][24347] Updated weights for policy 0, policy_version 39898 (0.0023)
[2024-06-06 15:55:57,318][24114] Fps is (10 sec: 40959.1, 60 sec: 44238.1, 300 sec: 44597.8). Total num frames: 653705216. Throughput: 0: 44569.8. Samples: 135013740. Policy #0 lag: (min: 0.0, avg: 11.7, max: 21.0)
[2024-06-06 15:55:57,319][24114] Avg episode reward: [(0, '0.249')]
[2024-06-06 15:56:00,429][24347] Updated weights for policy 0, policy_version 39908 (0.0046)
[2024-06-06 15:56:02,318][24114] Fps is (10 sec: 45874.5, 60 sec: 44782.9, 300 sec: 44653.3). Total num frames: 653934592. Throughput: 0: 44653.2. Samples: 135158160. Policy #0 lag: (min: 0.0, avg: 11.7, max: 21.0)
[2024-06-06 15:56:02,319][24114] Avg episode reward: [(0, '0.251')]
[2024-06-06 15:56:03,362][24347] Updated weights for policy 0, policy_version 39918 (0.0031)
[2024-06-06 15:56:07,318][24114] Fps is (10 sec: 44237.7, 60 sec: 44514.3, 300 sec: 44597.8). Total num frames: 654147584. Throughput: 0: 44373.1. Samples: 135414600. Policy #0 lag: (min: 0.0, avg: 11.7, max: 21.0)
[2024-06-06 15:56:07,318][24114] Avg episode reward: [(0, '0.243')]
[2024-06-06 15:56:07,838][24347] Updated weights for policy 0, policy_version 39928 (0.0024)
[2024-06-06 15:56:10,950][24347] Updated weights for policy 0, policy_version 39938 (0.0037)
[2024-06-06 15:56:12,318][24114] Fps is (10 sec: 44237.5, 60 sec: 44238.3, 300 sec: 44653.4). Total num frames: 654376960. Throughput: 0: 44471.5. Samples: 135679720. Policy #0 lag: (min: 0.0, avg: 11.7, max: 21.0)
[2024-06-06 15:56:12,318][24114] Avg episode reward: [(0, '0.248')]
[2024-06-06 15:56:15,287][24347] Updated weights for policy 0, policy_version 39948 (0.0034)
[2024-06-06 15:56:17,318][24114] Fps is (10 sec: 49151.1, 60 sec: 45328.9, 300 sec: 44764.4). Total num frames: 654639104. Throughput: 0: 44501.6. Samples: 135821560. Policy #0 lag: (min: 0.0, avg: 9.8, max: 21.0)
[2024-06-06 15:56:17,319][24114] Avg episode reward: [(0, '0.238')]
[2024-06-06 15:56:18,214][24347] Updated weights for policy 0, policy_version 39958 (0.0034)
[2024-06-06 15:56:22,318][24114] Fps is (10 sec: 42598.0, 60 sec: 44236.9, 300 sec: 44542.3). Total num frames: 654802944. Throughput: 0: 44537.7. Samples: 136091680. Policy #0 lag: (min: 0.0, avg: 9.8, max: 21.0)
[2024-06-06 15:56:22,318][24114] Avg episode reward: [(0, '0.250')]
[2024-06-06 15:56:22,349][24326] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000039966_654802944.pth...
[2024-06-06 15:56:22,413][24326] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000039314_644120576.pth
[2024-06-06 15:56:22,846][24347] Updated weights for policy 0, policy_version 39968 (0.0034)
[2024-06-06 15:56:25,346][24347] Updated weights for policy 0, policy_version 39978 (0.0036)
[2024-06-06 15:56:27,318][24114] Fps is (10 sec: 39322.0, 60 sec: 43963.6, 300 sec: 44542.3). Total num frames: 655032320. Throughput: 0: 44343.9. Samples: 136346420. Policy #0 lag: (min: 0.0, avg: 9.8, max: 21.0)
[2024-06-06 15:56:27,319][24114] Avg episode reward: [(0, '0.234')]
[2024-06-06 15:56:29,970][24347] Updated weights for policy 0, policy_version 39988 (0.0034)
[2024-06-06 15:56:32,318][24114] Fps is (10 sec: 47513.8, 60 sec: 44782.9, 300 sec: 44653.3). Total num frames: 655278080. Throughput: 0: 44261.7. Samples: 136481400. Policy #0 lag: (min: 0.0, avg: 9.8, max: 21.0)
[2024-06-06 15:56:32,318][24114] Avg episode reward: [(0, '0.251')]
[2024-06-06 15:56:32,986][24347] Updated weights for policy 0, policy_version 39998 (0.0041)
[2024-06-06 15:56:37,318][24114] Fps is (10 sec: 44236.9, 60 sec: 44509.8, 300 sec: 44597.8). Total num frames: 655474688. Throughput: 0: 44399.5. Samples: 136749400. Policy #0 lag: (min: 0.0, avg: 9.8, max: 21.0)
[2024-06-06 15:56:37,318][24114] Avg episode reward: [(0, '0.247')]
[2024-06-06 15:56:37,404][24347] Updated weights for policy 0, policy_version 40008 (0.0043)
[2024-06-06 15:56:40,171][24347] Updated weights for policy 0, policy_version 40018 (0.0033)
[2024-06-06 15:56:42,318][24114] Fps is (10 sec: 42598.1, 60 sec: 44236.7, 300 sec: 44542.3). Total num frames: 655704064. Throughput: 0: 44432.5. Samples: 137013200. Policy #0 lag: (min: 0.0, avg: 11.1, max: 21.0)
[2024-06-06 15:56:42,319][24114] Avg episode reward: [(0, '0.252')]
[2024-06-06 15:56:44,734][24347] Updated weights for policy 0, policy_version 40028 (0.0035)
[2024-06-06 15:56:47,318][24114] Fps is (10 sec: 49152.2, 60 sec: 44509.8, 300 sec: 44708.9). Total num frames: 655966208. Throughput: 0: 44374.8. Samples: 137155020. Policy #0 lag: (min: 0.0, avg: 11.1, max: 21.0)
[2024-06-06 15:56:47,318][24114] Avg episode reward: [(0, '0.245')]
[2024-06-06 15:56:48,012][24347] Updated weights for policy 0, policy_version 40038 (0.0031)
[2024-06-06 15:56:52,085][24347] Updated weights for policy 0, policy_version 40048 (0.0038)
[2024-06-06 15:56:52,318][24114] Fps is (10 sec: 44237.2, 60 sec: 44509.8, 300 sec: 44542.3). Total num frames: 656146432. Throughput: 0: 44584.9. Samples: 137420920. Policy #0 lag: (min: 0.0, avg: 11.1, max: 21.0)
[2024-06-06 15:56:52,319][24114] Avg episode reward: [(0, '0.249')]
[2024-06-06 15:56:55,078][24347] Updated weights for policy 0, policy_version 40058 (0.0031)
[2024-06-06 15:56:57,318][24114] Fps is (10 sec: 39321.6, 60 sec: 44236.9, 300 sec: 44486.7). Total num frames: 656359424. Throughput: 0: 44632.0. Samples: 137688160. Policy #0 lag: (min: 0.0, avg: 11.1, max: 21.0)
[2024-06-06 15:56:57,318][24114] Avg episode reward: [(0, '0.246')]
[2024-06-06 15:56:59,374][24347] Updated weights for policy 0, policy_version 40068 (0.0034)
[2024-06-06 15:57:02,305][24347] Updated weights for policy 0, policy_version 40078 (0.0041)
[2024-06-06 15:57:02,318][24114] Fps is (10 sec: 49151.5, 60 sec: 45056.0, 300 sec: 44708.8). Total num frames: 656637952. Throughput: 0: 44302.7. Samples: 137815180. Policy #0 lag: (min: 0.0, avg: 11.1, max: 21.0)
[2024-06-06 15:57:02,324][24114] Avg episode reward: [(0, '0.252')]
[2024-06-06 15:57:06,435][24326] Signal inference workers to stop experience collection... (1950 times)
[2024-06-06 15:57:06,442][24326] Signal inference workers to resume experience collection... (1950 times)
[2024-06-06 15:57:06,472][24347] InferenceWorker_p0-w0: stopping experience collection (1950 times)
[2024-06-06 15:57:06,472][24347] InferenceWorker_p0-w0: resuming experience collection (1950 times)
[2024-06-06 15:57:06,751][24347] Updated weights for policy 0, policy_version 40088 (0.0039)
[2024-06-06 15:57:07,318][24114] Fps is (10 sec: 47514.0, 60 sec: 44783.0, 300 sec: 44653.4). Total num frames: 656834560. Throughput: 0: 44341.9. Samples: 138087060. Policy #0 lag: (min: 0.0, avg: 9.7, max: 21.0)
[2024-06-06 15:57:07,318][24114] Avg episode reward: [(0, '0.244')]
[2024-06-06 15:57:09,644][24347] Updated weights for policy 0, policy_version 40098 (0.0030)
[2024-06-06 15:57:12,318][24114] Fps is (10 sec: 39321.0, 60 sec: 44236.6, 300 sec: 44486.7). Total num frames: 657031168. Throughput: 0: 44685.1. Samples: 138357260. Policy #0 lag: (min: 0.0, avg: 9.7, max: 21.0)
[2024-06-06 15:57:12,319][24114] Avg episode reward: [(0, '0.252')]
[2024-06-06 15:57:14,095][24347] Updated weights for policy 0, policy_version 40108 (0.0029)
[2024-06-06 15:57:17,166][24347] Updated weights for policy 0, policy_version 40118 (0.0030)
[2024-06-06 15:57:17,318][24114] Fps is (10 sec: 45874.9, 60 sec: 44236.9, 300 sec: 44653.4). Total num frames: 657293312. Throughput: 0: 44626.3. Samples: 138489580. Policy #0 lag: (min: 0.0, avg: 9.7, max: 21.0)
[2024-06-06 15:57:17,318][24114] Avg episode reward: [(0, '0.244')]
[2024-06-06 15:57:21,479][24347] Updated weights for policy 0, policy_version 40128 (0.0041)
[2024-06-06 15:57:22,318][24114] Fps is (10 sec: 47514.2, 60 sec: 45055.9, 300 sec: 44597.8). Total num frames: 657506304. Throughput: 0: 44659.0. Samples: 138759060. Policy #0 lag: (min: 0.0, avg: 9.7, max: 21.0)
[2024-06-06 15:57:22,319][24114] Avg episode reward: [(0, '0.242')]
[2024-06-06 15:57:24,315][24347] Updated weights for policy 0, policy_version 40138 (0.0031)
[2024-06-06 15:57:27,318][24114] Fps is (10 sec: 40959.6, 60 sec: 44509.9, 300 sec: 44542.3). Total num frames: 657702912. Throughput: 0: 44723.1. Samples: 139025740. Policy #0 lag: (min: 0.0, avg: 9.7, max: 21.0)
[2024-06-06 15:57:27,319][24114] Avg episode reward: [(0, '0.249')]
[2024-06-06 15:57:28,621][24347] Updated weights for policy 0, policy_version 40148 (0.0029)
[2024-06-06 15:57:31,710][24347] Updated weights for policy 0, policy_version 40158 (0.0032)
[2024-06-06 15:57:32,318][24114] Fps is (10 sec: 45875.9, 60 sec: 44783.0, 300 sec: 44708.9). Total num frames: 657965056. Throughput: 0: 44518.2. Samples: 139158340. Policy #0 lag: (min: 0.0, avg: 10.4, max: 22.0)
[2024-06-06 15:57:32,318][24114] Avg episode reward: [(0, '0.243')]
[2024-06-06 15:57:35,854][24347] Updated weights for policy 0, policy_version 40168 (0.0037)
[2024-06-06 15:57:37,318][24114] Fps is (10 sec: 45875.6, 60 sec: 44783.0, 300 sec: 44542.3). Total num frames: 658161664. Throughput: 0: 44646.7. Samples: 139430020. Policy #0 lag: (min: 0.0, avg: 10.4, max: 22.0)
[2024-06-06 15:57:37,324][24114] Avg episode reward: [(0, '0.246')]
[2024-06-06 15:57:38,787][24347] Updated weights for policy 0, policy_version 40178 (0.0042)
[2024-06-06 15:57:42,318][24114] Fps is (10 sec: 40959.8, 60 sec: 44509.9, 300 sec: 44543.2). Total num frames: 658374656. Throughput: 0: 44629.3. Samples: 139696480. Policy #0 lag: (min: 0.0, avg: 10.4, max: 22.0)
[2024-06-06 15:57:42,319][24114] Avg episode reward: [(0, '0.241')]
[2024-06-06 15:57:43,325][24347] Updated weights for policy 0, policy_version 40188 (0.0028)
[2024-06-06 15:57:46,567][24347] Updated weights for policy 0, policy_version 40198 (0.0029)
[2024-06-06 15:57:47,318][24114] Fps is (10 sec: 45874.5, 60 sec: 44236.7, 300 sec: 44653.3). Total num frames: 658620416. Throughput: 0: 44700.0. Samples: 139826680. Policy #0 lag: (min: 0.0, avg: 10.4, max: 22.0)
[2024-06-06 15:57:47,319][24114] Avg episode reward: [(0, '0.241')]
[2024-06-06 15:57:50,859][24347] Updated weights for policy 0, policy_version 40208 (0.0027)
[2024-06-06 15:57:52,318][24114] Fps is (10 sec: 45875.4, 60 sec: 44782.9, 300 sec: 44597.8). Total num frames: 658833408. Throughput: 0: 44564.8. Samples: 140092480. Policy #0 lag: (min: 0.0, avg: 10.4, max: 22.0)
[2024-06-06 15:57:52,318][24114] Avg episode reward: [(0, '0.240')]
[2024-06-06 15:57:54,246][24347] Updated weights for policy 0, policy_version 40218 (0.0028)
[2024-06-06 15:57:57,318][24114] Fps is (10 sec: 40960.7, 60 sec: 44509.9, 300 sec: 44486.7). Total num frames: 659030016. Throughput: 0: 44354.1. Samples: 140353180. Policy #0 lag: (min: 1.0, avg: 9.5, max: 21.0)
[2024-06-06 15:57:57,318][24114] Avg episode reward: [(0, '0.236')]
[2024-06-06 15:57:58,028][24347] Updated weights for policy 0, policy_version 40228 (0.0033)
[2024-06-06 15:58:01,370][24347] Updated weights for policy 0, policy_version 40238 (0.0036)
[2024-06-06 15:58:02,318][24114] Fps is (10 sec: 44236.8, 60 sec: 43963.8, 300 sec: 44542.3). Total num frames: 659275776. Throughput: 0: 44371.1. Samples: 140486280. Policy #0 lag: (min: 1.0, avg: 9.5, max: 21.0)
[2024-06-06 15:58:02,318][24114] Avg episode reward: [(0, '0.242')]
[2024-06-06 15:58:05,289][24347] Updated weights for policy 0, policy_version 40248 (0.0034)
[2024-06-06 15:58:07,318][24114] Fps is (10 sec: 47513.6, 60 sec: 44509.8, 300 sec: 44542.3). Total num frames: 659505152. Throughput: 0: 44472.6. Samples: 140760320. Policy #0 lag: (min: 1.0, avg: 9.5, max: 21.0)
[2024-06-06 15:58:07,318][24114] Avg episode reward: [(0, '0.243')]
[2024-06-06 15:58:08,470][24347] Updated weights for policy 0, policy_version 40258 (0.0031)
[2024-06-06 15:58:12,318][24114] Fps is (10 sec: 45874.9, 60 sec: 45056.2, 300 sec: 44597.8). Total num frames: 659734528. Throughput: 0: 44721.4. Samples: 141038200. Policy #0 lag: (min: 1.0, avg: 9.5, max: 21.0)
[2024-06-06 15:58:12,318][24114] Avg episode reward: [(0, '0.237')]
[2024-06-06 15:58:12,609][24347] Updated weights for policy 0, policy_version 40268 (0.0038)
[2024-06-06 15:58:13,454][24326] Signal inference workers to stop experience collection... (2000 times)
[2024-06-06 15:58:13,501][24326] Signal inference workers to resume experience collection... (2000 times)
[2024-06-06 15:58:13,505][24347] InferenceWorker_p0-w0: stopping experience collection (2000 times)
[2024-06-06 15:58:13,532][24347] InferenceWorker_p0-w0: resuming experience collection (2000 times)
[2024-06-06 15:58:16,059][24347] Updated weights for policy 0, policy_version 40278 (0.0027)
[2024-06-06 15:58:17,318][24114] Fps is (10 sec: 44236.5, 60 sec: 44236.8, 300 sec: 44542.3). Total num frames: 659947520. Throughput: 0: 44665.3. Samples: 141168280. Policy #0 lag: (min: 1.0, avg: 9.5, max: 21.0)
[2024-06-06 15:58:17,318][24114] Avg episode reward: [(0, '0.237')]
[2024-06-06 15:58:19,934][24347] Updated weights for policy 0, policy_version 40288 (0.0031)
[2024-06-06 15:58:22,318][24114] Fps is (10 sec: 45874.9, 60 sec: 44783.0, 300 sec: 44597.8). Total num frames: 660193280. Throughput: 0: 44541.7. Samples: 141434400. Policy #0 lag: (min: 0.0, avg: 10.2, max: 23.0)
[2024-06-06 15:58:22,318][24114] Avg episode reward: [(0, '0.250')]
[2024-06-06 15:58:22,328][24326] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000040295_660193280.pth...
[2024-06-06 15:58:22,376][24326] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000039641_649478144.pth
[2024-06-06 15:58:23,402][24347] Updated weights for policy 0, policy_version 40298 (0.0046)
[2024-06-06 15:58:27,203][24347] Updated weights for policy 0, policy_version 40308 (0.0033)
[2024-06-06 15:58:27,318][24114] Fps is (10 sec: 45874.9, 60 sec: 45056.0, 300 sec: 44708.9). Total num frames: 660406272. Throughput: 0: 44527.9. Samples: 141700240. Policy #0 lag: (min: 0.0, avg: 10.2, max: 23.0)
[2024-06-06 15:58:27,318][24114] Avg episode reward: [(0, '0.244')]
[2024-06-06 15:58:30,535][24347] Updated weights for policy 0, policy_version 40318 (0.0025)
[2024-06-06 15:58:32,318][24114] Fps is (10 sec: 42598.5, 60 sec: 44236.7, 300 sec: 44597.8). Total num frames: 660619264. Throughput: 0: 44644.9. Samples: 141835700. Policy #0 lag: (min: 0.0, avg: 10.2, max: 23.0)
[2024-06-06 15:58:32,319][24114] Avg episode reward: [(0, '0.248')]
[2024-06-06 15:58:34,664][24347] Updated weights for policy 0, policy_version 40328 (0.0037)
[2024-06-06 15:58:37,318][24114] Fps is (10 sec: 44237.1, 60 sec: 44782.9, 300 sec: 44653.3). Total num frames: 660848640. Throughput: 0: 44534.2. Samples: 142096520. Policy #0 lag: (min: 0.0, avg: 10.2, max: 23.0)
[2024-06-06 15:58:37,324][24114] Avg episode reward: [(0, '0.259')]
[2024-06-06 15:58:38,251][24347] Updated weights for policy 0, policy_version 40338 (0.0027)
[2024-06-06 15:58:42,063][24347] Updated weights for policy 0, policy_version 40348 (0.0031)
[2024-06-06 15:58:42,318][24114] Fps is (10 sec: 44236.6, 60 sec: 44782.9, 300 sec: 44597.8). Total num frames: 661061632. Throughput: 0: 44808.3. Samples: 142369560. Policy #0 lag: (min: 0.0, avg: 10.2, max: 23.0)
[2024-06-06 15:58:42,319][24114] Avg episode reward: [(0, '0.246')]
[2024-06-06 15:58:45,816][24347] Updated weights for policy 0, policy_version 40358 (0.0035)
[2024-06-06 15:58:47,324][24114] Fps is (10 sec: 42573.1, 60 sec: 44232.5, 300 sec: 44485.8). Total num frames: 661274624. Throughput: 0: 44735.8. Samples: 142499660. Policy #0 lag: (min: 0.0, avg: 10.2, max: 23.0)
[2024-06-06 15:58:47,325][24114] Avg episode reward: [(0, '0.245')]
[2024-06-06 15:58:49,333][24347] Updated weights for policy 0, policy_version 40368 (0.0029)
[2024-06-06 15:58:52,318][24114] Fps is (10 sec: 45875.5, 60 sec: 44782.9, 300 sec: 44597.8). Total num frames: 661520384. Throughput: 0: 44737.7. Samples: 142773520. Policy #0 lag: (min: 0.0, avg: 9.0, max: 22.0)
[2024-06-06 15:58:52,319][24114] Avg episode reward: [(0, '0.248')]
[2024-06-06 15:58:52,850][24347] Updated weights for policy 0, policy_version 40378 (0.0035)
[2024-06-06 15:58:56,646][24347] Updated weights for policy 0, policy_version 40388 (0.0028)
[2024-06-06 15:58:57,318][24114] Fps is (10 sec: 45902.9, 60 sec: 45056.0, 300 sec: 44597.9). Total num frames: 661733376. Throughput: 0: 44404.1. Samples: 143036380. Policy #0 lag: (min: 0.0, avg: 9.0, max: 22.0)
[2024-06-06 15:58:57,318][24114] Avg episode reward: [(0, '0.252')]
[2024-06-06 15:58:59,995][24347] Updated weights for policy 0, policy_version 40398 (0.0029)
[2024-06-06 15:59:02,318][24114] Fps is (10 sec: 42598.8, 60 sec: 44509.9, 300 sec: 44486.7). Total num frames: 661946368. Throughput: 0: 44516.1. Samples: 143171500. Policy #0 lag: (min: 0.0, avg: 9.0, max: 22.0)
[2024-06-06 15:59:02,318][24114] Avg episode reward: [(0, '0.253')]
[2024-06-06 15:59:04,052][24347] Updated weights for policy 0, policy_version 40408 (0.0031)
[2024-06-06 15:59:07,318][24114] Fps is (10 sec: 45875.2, 60 sec: 44782.9, 300 sec: 44653.3). Total num frames: 662192128. Throughput: 0: 44638.4. Samples: 143443120. Policy #0 lag: (min: 0.0, avg: 9.0, max: 22.0)
[2024-06-06 15:59:07,318][24114] Avg episode reward: [(0, '0.251')]
[2024-06-06 15:59:07,358][24347] Updated weights for policy 0, policy_version 40418 (0.0034)
[2024-06-06 15:59:11,159][24347] Updated weights for policy 0, policy_version 40428 (0.0038)
[2024-06-06 15:59:12,318][24114] Fps is (10 sec: 47513.1, 60 sec: 44782.9, 300 sec: 44708.9). Total num frames: 662421504. Throughput: 0: 44668.9. Samples: 143710340. Policy #0 lag: (min: 0.0, avg: 9.0, max: 22.0)
[2024-06-06 15:59:12,319][24114] Avg episode reward: [(0, '0.243')]
[2024-06-06 15:59:14,893][24347] Updated weights for policy 0, policy_version 40438 (0.0027)
[2024-06-06 15:59:17,318][24114] Fps is (10 sec: 42597.6, 60 sec: 44509.8, 300 sec: 44597.8). Total num frames: 662618112. Throughput: 0: 44695.5. Samples: 143847000. Policy #0 lag: (min: 0.0, avg: 10.1, max: 21.0)
[2024-06-06 15:59:17,319][24114] Avg episode reward: [(0, '0.244')]
[2024-06-06 15:59:18,510][24347] Updated weights for policy 0, policy_version 40448 (0.0027)
[2024-06-06 15:59:22,146][24347] Updated weights for policy 0, policy_version 40458 (0.0032)
[2024-06-06 15:59:22,318][24114] Fps is (10 sec: 45875.3, 60 sec: 44783.0, 300 sec: 44598.1). Total num frames: 662880256. Throughput: 0: 44900.0. Samples: 144117020. Policy #0 lag: (min: 0.0, avg: 10.1, max: 21.0)
[2024-06-06 15:59:22,318][24114] Avg episode reward: [(0, '0.247')]
[2024-06-06 15:59:26,138][24347] Updated weights for policy 0, policy_version 40468 (0.0029)
[2024-06-06 15:59:27,318][24114] Fps is (10 sec: 47513.6, 60 sec: 44782.9, 300 sec: 44653.3). Total num frames: 663093248. Throughput: 0: 44644.9. Samples: 144378580. Policy #0 lag: (min: 0.0, avg: 10.1, max: 21.0)
[2024-06-06 15:59:27,319][24114] Avg episode reward: [(0, '0.246')]
[2024-06-06 15:59:29,237][24347] Updated weights for policy 0, policy_version 40478 (0.0047)
[2024-06-06 15:59:32,318][24114] Fps is (10 sec: 42598.4, 60 sec: 44783.0, 300 sec: 44597.8). Total num frames: 663306240. Throughput: 0: 44788.6. Samples: 144514880. Policy #0 lag: (min: 0.0, avg: 10.1, max: 21.0)
[2024-06-06 15:59:32,318][24114] Avg episode reward: [(0, '0.246')]
[2024-06-06 15:59:33,276][24347] Updated weights for policy 0, policy_version 40488 (0.0030)
[2024-06-06 15:59:33,287][24326] Signal inference workers to stop experience collection... (2050 times)
[2024-06-06 15:59:33,287][24326] Signal inference workers to resume experience collection...
(2050 times) [2024-06-06 15:59:33,335][24347] InferenceWorker_p0-w0: stopping experience collection (2050 times) [2024-06-06 15:59:33,335][24347] InferenceWorker_p0-w0: resuming experience collection (2050 times) [2024-06-06 15:59:36,471][24347] Updated weights for policy 0, policy_version 40498 (0.0040) [2024-06-06 15:59:37,318][24114] Fps is (10 sec: 42599.0, 60 sec: 44509.9, 300 sec: 44542.6). Total num frames: 663519232. Throughput: 0: 44713.8. Samples: 144785640. Policy #0 lag: (min: 0.0, avg: 10.1, max: 21.0) [2024-06-06 15:59:37,318][24114] Avg episode reward: [(0, '0.236')] [2024-06-06 15:59:40,576][24347] Updated weights for policy 0, policy_version 40508 (0.0032) [2024-06-06 15:59:42,318][24114] Fps is (10 sec: 44236.7, 60 sec: 44783.0, 300 sec: 44598.7). Total num frames: 663748608. Throughput: 0: 44844.8. Samples: 145054400. Policy #0 lag: (min: 0.0, avg: 9.4, max: 22.0) [2024-06-06 15:59:42,319][24114] Avg episode reward: [(0, '0.242')] [2024-06-06 15:59:43,992][24347] Updated weights for policy 0, policy_version 40518 (0.0041) [2024-06-06 15:59:47,318][24114] Fps is (10 sec: 44236.6, 60 sec: 44787.4, 300 sec: 44597.8). Total num frames: 663961600. Throughput: 0: 44737.3. Samples: 145184680. Policy #0 lag: (min: 0.0, avg: 9.4, max: 22.0) [2024-06-06 15:59:47,319][24114] Avg episode reward: [(0, '0.254')] [2024-06-06 15:59:47,910][24347] Updated weights for policy 0, policy_version 40528 (0.0037) [2024-06-06 15:59:51,273][24347] Updated weights for policy 0, policy_version 40538 (0.0030) [2024-06-06 15:59:52,318][24114] Fps is (10 sec: 44236.5, 60 sec: 44509.8, 300 sec: 44542.5). Total num frames: 664190976. Throughput: 0: 44722.1. Samples: 145455620. Policy #0 lag: (min: 0.0, avg: 9.4, max: 22.0) [2024-06-06 15:59:52,319][24114] Avg episode reward: [(0, '0.239')] [2024-06-06 15:59:55,504][24347] Updated weights for policy 0, policy_version 40548 (0.0047) [2024-06-06 15:59:57,318][24114] Fps is (10 sec: 44236.1, 60 sec: 44509.7, 300 sec: 44597.8). 
Total num frames: 664403968. Throughput: 0: 44629.6. Samples: 145718680. Policy #0 lag: (min: 0.0, avg: 9.4, max: 22.0) [2024-06-06 15:59:57,319][24114] Avg episode reward: [(0, '0.254')] [2024-06-06 15:59:58,687][24347] Updated weights for policy 0, policy_version 40558 (0.0036) [2024-06-06 16:00:02,318][24114] Fps is (10 sec: 44237.5, 60 sec: 44782.9, 300 sec: 44598.7). Total num frames: 664633344. Throughput: 0: 44513.9. Samples: 145850120. Policy #0 lag: (min: 0.0, avg: 9.4, max: 22.0) [2024-06-06 16:00:02,318][24114] Avg episode reward: [(0, '0.245')] [2024-06-06 16:00:02,520][24347] Updated weights for policy 0, policy_version 40568 (0.0026) [2024-06-06 16:00:05,955][24347] Updated weights for policy 0, policy_version 40578 (0.0040) [2024-06-06 16:00:07,318][24114] Fps is (10 sec: 44238.0, 60 sec: 44236.8, 300 sec: 44487.0). Total num frames: 664846336. Throughput: 0: 44335.2. Samples: 146112100. Policy #0 lag: (min: 0.0, avg: 9.7, max: 20.0) [2024-06-06 16:00:07,318][24114] Avg episode reward: [(0, '0.240')] [2024-06-06 16:00:10,015][24347] Updated weights for policy 0, policy_version 40588 (0.0034) [2024-06-06 16:00:12,318][24114] Fps is (10 sec: 44235.7, 60 sec: 44236.7, 300 sec: 44597.8). Total num frames: 665075712. Throughput: 0: 44665.7. Samples: 146388540. Policy #0 lag: (min: 0.0, avg: 9.7, max: 20.0) [2024-06-06 16:00:12,319][24114] Avg episode reward: [(0, '0.258')] [2024-06-06 16:00:13,338][24347] Updated weights for policy 0, policy_version 40598 (0.0040) [2024-06-06 16:00:17,324][24114] Fps is (10 sec: 45847.6, 60 sec: 44778.6, 300 sec: 44596.9). Total num frames: 665305088. Throughput: 0: 44510.6. Samples: 146518120. 
Policy #0 lag: (min: 0.0, avg: 9.7, max: 20.0) [2024-06-06 16:00:17,325][24114] Avg episode reward: [(0, '0.240')] [2024-06-06 16:00:17,441][24347] Updated weights for policy 0, policy_version 40608 (0.0040) [2024-06-06 16:00:20,767][24347] Updated weights for policy 0, policy_version 40618 (0.0038) [2024-06-06 16:00:22,318][24114] Fps is (10 sec: 44237.9, 60 sec: 43963.8, 300 sec: 44486.7). Total num frames: 665518080. Throughput: 0: 44338.7. Samples: 146780880. Policy #0 lag: (min: 0.0, avg: 9.7, max: 20.0) [2024-06-06 16:00:22,318][24114] Avg episode reward: [(0, '0.258')] [2024-06-06 16:00:22,360][24326] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000040621_665534464.pth... [2024-06-06 16:00:22,418][24326] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000039966_654802944.pth [2024-06-06 16:00:24,991][24347] Updated weights for policy 0, policy_version 40628 (0.0040) [2024-06-06 16:00:27,318][24114] Fps is (10 sec: 44263.5, 60 sec: 44237.0, 300 sec: 44597.8). Total num frames: 665747456. Throughput: 0: 44497.5. Samples: 147056780. Policy #0 lag: (min: 0.0, avg: 9.7, max: 20.0) [2024-06-06 16:00:27,318][24114] Avg episode reward: [(0, '0.238')] [2024-06-06 16:00:28,022][24347] Updated weights for policy 0, policy_version 40638 (0.0031) [2024-06-06 16:00:32,046][24347] Updated weights for policy 0, policy_version 40648 (0.0032) [2024-06-06 16:00:32,318][24114] Fps is (10 sec: 45875.0, 60 sec: 44509.9, 300 sec: 44653.3). Total num frames: 665976832. Throughput: 0: 44593.8. Samples: 147191400. Policy #0 lag: (min: 0.0, avg: 10.1, max: 20.0) [2024-06-06 16:00:32,318][24114] Avg episode reward: [(0, '0.251')] [2024-06-06 16:00:35,066][24347] Updated weights for policy 0, policy_version 40658 (0.0021) [2024-06-06 16:00:37,318][24114] Fps is (10 sec: 44236.0, 60 sec: 44509.8, 300 sec: 44542.2). Total num frames: 666189824. Throughput: 0: 44443.1. Samples: 147455560. 
Policy #0 lag: (min: 0.0, avg: 10.1, max: 20.0) [2024-06-06 16:00:37,319][24114] Avg episode reward: [(0, '0.245')] [2024-06-06 16:00:39,405][24347] Updated weights for policy 0, policy_version 40668 (0.0019) [2024-06-06 16:00:42,318][24114] Fps is (10 sec: 45875.3, 60 sec: 44783.0, 300 sec: 44542.3). Total num frames: 666435584. Throughput: 0: 44533.5. Samples: 147722680. Policy #0 lag: (min: 0.0, avg: 10.1, max: 20.0) [2024-06-06 16:00:42,318][24114] Avg episode reward: [(0, '0.249')] [2024-06-06 16:00:42,800][24347] Updated weights for policy 0, policy_version 40678 (0.0047) [2024-06-06 16:00:46,639][24347] Updated weights for policy 0, policy_version 40688 (0.0026) [2024-06-06 16:00:47,318][24114] Fps is (10 sec: 44237.5, 60 sec: 44509.9, 300 sec: 44597.8). Total num frames: 666632192. Throughput: 0: 44689.8. Samples: 147861160. Policy #0 lag: (min: 0.0, avg: 10.1, max: 20.0) [2024-06-06 16:00:47,318][24114] Avg episode reward: [(0, '0.250')] [2024-06-06 16:00:49,871][24326] Signal inference workers to stop experience collection... (2100 times) [2024-06-06 16:00:49,919][24347] InferenceWorker_p0-w0: stopping experience collection (2100 times) [2024-06-06 16:00:49,988][24326] Signal inference workers to resume experience collection... (2100 times) [2024-06-06 16:00:49,989][24347] InferenceWorker_p0-w0: resuming experience collection (2100 times) [2024-06-06 16:00:50,122][24347] Updated weights for policy 0, policy_version 40698 (0.0030) [2024-06-06 16:00:52,318][24114] Fps is (10 sec: 42598.6, 60 sec: 44510.0, 300 sec: 44597.8). Total num frames: 666861568. Throughput: 0: 44849.7. Samples: 148130340. Policy #0 lag: (min: 0.0, avg: 10.1, max: 20.0) [2024-06-06 16:00:52,318][24114] Avg episode reward: [(0, '0.238')] [2024-06-06 16:00:54,116][24347] Updated weights for policy 0, policy_version 40708 (0.0034) [2024-06-06 16:00:57,318][24114] Fps is (10 sec: 45874.4, 60 sec: 44783.0, 300 sec: 44597.8). Total num frames: 667090944. Throughput: 0: 44535.2. 
Samples: 148392620. Policy #0 lag: (min: 0.0, avg: 12.1, max: 24.0) [2024-06-06 16:00:57,319][24114] Avg episode reward: [(0, '0.255')] [2024-06-06 16:00:57,507][24347] Updated weights for policy 0, policy_version 40718 (0.0021) [2024-06-06 16:01:01,431][24347] Updated weights for policy 0, policy_version 40728 (0.0026) [2024-06-06 16:01:02,318][24114] Fps is (10 sec: 44236.6, 60 sec: 44509.8, 300 sec: 44597.8). Total num frames: 667303936. Throughput: 0: 44714.8. Samples: 148530020. Policy #0 lag: (min: 0.0, avg: 12.1, max: 24.0) [2024-06-06 16:01:02,319][24114] Avg episode reward: [(0, '0.250')] [2024-06-06 16:01:04,535][24347] Updated weights for policy 0, policy_version 40738 (0.0027) [2024-06-06 16:01:07,318][24114] Fps is (10 sec: 44237.2, 60 sec: 44782.8, 300 sec: 44597.8). Total num frames: 667533312. Throughput: 0: 44945.7. Samples: 148803440. Policy #0 lag: (min: 0.0, avg: 12.1, max: 24.0) [2024-06-06 16:01:07,319][24114] Avg episode reward: [(0, '0.254')] [2024-06-06 16:01:08,483][24347] Updated weights for policy 0, policy_version 40748 (0.0035) [2024-06-06 16:01:12,117][24347] Updated weights for policy 0, policy_version 40758 (0.0034) [2024-06-06 16:01:12,324][24114] Fps is (10 sec: 47485.7, 60 sec: 45051.7, 300 sec: 44541.4). Total num frames: 667779072. Throughput: 0: 44661.6. Samples: 149066820. Policy #0 lag: (min: 0.0, avg: 12.1, max: 24.0) [2024-06-06 16:01:12,324][24114] Avg episode reward: [(0, '0.250')] [2024-06-06 16:01:15,878][24347] Updated weights for policy 0, policy_version 40768 (0.0037) [2024-06-06 16:01:17,324][24114] Fps is (10 sec: 44210.7, 60 sec: 44509.9, 300 sec: 44652.5). Total num frames: 667975680. Throughput: 0: 44804.3. Samples: 149207860. 
Policy #0 lag: (min: 0.0, avg: 12.1, max: 24.0) [2024-06-06 16:01:17,324][24114] Avg episode reward: [(0, '0.246')] [2024-06-06 16:01:19,429][24347] Updated weights for policy 0, policy_version 40778 (0.0031) [2024-06-06 16:01:22,318][24114] Fps is (10 sec: 45902.4, 60 sec: 45329.1, 300 sec: 44764.4). Total num frames: 668237824. Throughput: 0: 44781.9. Samples: 149470740. Policy #0 lag: (min: 0.0, avg: 12.1, max: 24.0) [2024-06-06 16:01:22,318][24114] Avg episode reward: [(0, '0.252')] [2024-06-06 16:01:23,201][24347] Updated weights for policy 0, policy_version 40788 (0.0036) [2024-06-06 16:01:26,844][24347] Updated weights for policy 0, policy_version 40798 (0.0021) [2024-06-06 16:01:27,318][24114] Fps is (10 sec: 45902.7, 60 sec: 44782.9, 300 sec: 44597.8). Total num frames: 668434432. Throughput: 0: 44680.1. Samples: 149733280. Policy #0 lag: (min: 0.0, avg: 10.4, max: 22.0) [2024-06-06 16:01:27,318][24114] Avg episode reward: [(0, '0.249')] [2024-06-06 16:01:30,948][24347] Updated weights for policy 0, policy_version 40808 (0.0035) [2024-06-06 16:01:32,318][24114] Fps is (10 sec: 39321.3, 60 sec: 44236.8, 300 sec: 44597.8). Total num frames: 668631040. Throughput: 0: 44602.6. Samples: 149868280. Policy #0 lag: (min: 0.0, avg: 10.4, max: 22.0) [2024-06-06 16:01:32,319][24114] Avg episode reward: [(0, '0.250')] [2024-06-06 16:01:33,895][24347] Updated weights for policy 0, policy_version 40818 (0.0034) [2024-06-06 16:01:37,318][24114] Fps is (10 sec: 44236.4, 60 sec: 44783.0, 300 sec: 44653.3). Total num frames: 668876800. Throughput: 0: 44610.6. Samples: 150137820. Policy #0 lag: (min: 0.0, avg: 10.4, max: 22.0) [2024-06-06 16:01:37,321][24114] Avg episode reward: [(0, '0.248')] [2024-06-06 16:01:37,973][24347] Updated weights for policy 0, policy_version 40828 (0.0045) [2024-06-06 16:01:41,531][24347] Updated weights for policy 0, policy_version 40838 (0.0038) [2024-06-06 16:01:42,318][24114] Fps is (10 sec: 47513.3, 60 sec: 44509.8, 300 sec: 44542.2). 
Total num frames: 669106176. Throughput: 0: 44662.7. Samples: 150402440. Policy #0 lag: (min: 0.0, avg: 10.4, max: 22.0) [2024-06-06 16:01:42,319][24114] Avg episode reward: [(0, '0.247')] [2024-06-06 16:01:45,572][24347] Updated weights for policy 0, policy_version 40848 (0.0027) [2024-06-06 16:01:47,320][24114] Fps is (10 sec: 44228.3, 60 sec: 44781.4, 300 sec: 44653.0). Total num frames: 669319168. Throughput: 0: 44601.2. Samples: 150537160. Policy #0 lag: (min: 0.0, avg: 10.4, max: 22.0) [2024-06-06 16:01:47,321][24114] Avg episode reward: [(0, '0.254')] [2024-06-06 16:01:48,937][24347] Updated weights for policy 0, policy_version 40858 (0.0031) [2024-06-06 16:01:52,318][24114] Fps is (10 sec: 44237.3, 60 sec: 44782.9, 300 sec: 44708.9). Total num frames: 669548544. Throughput: 0: 44512.0. Samples: 150806480. Policy #0 lag: (min: 0.0, avg: 12.2, max: 22.0) [2024-06-06 16:01:52,318][24114] Avg episode reward: [(0, '0.257')] [2024-06-06 16:01:52,636][24347] Updated weights for policy 0, policy_version 40868 (0.0036) [2024-06-06 16:01:56,293][24347] Updated weights for policy 0, policy_version 40878 (0.0031) [2024-06-06 16:01:57,318][24114] Fps is (10 sec: 45884.2, 60 sec: 44783.0, 300 sec: 44542.3). Total num frames: 669777920. Throughput: 0: 44388.1. Samples: 151064020. Policy #0 lag: (min: 0.0, avg: 12.2, max: 22.0) [2024-06-06 16:01:57,318][24114] Avg episode reward: [(0, '0.251')] [2024-06-06 16:02:00,564][24347] Updated weights for policy 0, policy_version 40888 (0.0038) [2024-06-06 16:02:02,318][24114] Fps is (10 sec: 40960.4, 60 sec: 44236.9, 300 sec: 44486.7). Total num frames: 669958144. Throughput: 0: 44196.6. Samples: 151196440. Policy #0 lag: (min: 0.0, avg: 12.2, max: 22.0) [2024-06-06 16:02:02,318][24114] Avg episode reward: [(0, '0.249')] [2024-06-06 16:02:02,688][24326] Signal inference workers to stop experience collection... 
(2150 times) [2024-06-06 16:02:02,740][24347] InferenceWorker_p0-w0: stopping experience collection (2150 times) [2024-06-06 16:02:02,748][24326] Signal inference workers to resume experience collection... (2150 times) [2024-06-06 16:02:02,761][24347] InferenceWorker_p0-w0: resuming experience collection (2150 times) [2024-06-06 16:02:03,448][24347] Updated weights for policy 0, policy_version 40898 (0.0027) [2024-06-06 16:02:07,318][24114] Fps is (10 sec: 42598.3, 60 sec: 44509.9, 300 sec: 44653.4). Total num frames: 670203904. Throughput: 0: 44412.4. Samples: 151469300. Policy #0 lag: (min: 0.0, avg: 12.2, max: 22.0) [2024-06-06 16:02:07,318][24114] Avg episode reward: [(0, '0.245')] [2024-06-06 16:02:07,619][24347] Updated weights for policy 0, policy_version 40908 (0.0034) [2024-06-06 16:02:11,005][24347] Updated weights for policy 0, policy_version 40918 (0.0047) [2024-06-06 16:02:12,318][24114] Fps is (10 sec: 49151.5, 60 sec: 44514.2, 300 sec: 44597.8). Total num frames: 670449664. Throughput: 0: 44499.9. Samples: 151735780. Policy #0 lag: (min: 0.0, avg: 12.2, max: 22.0) [2024-06-06 16:02:12,318][24114] Avg episode reward: [(0, '0.255')] [2024-06-06 16:02:15,187][24347] Updated weights for policy 0, policy_version 40928 (0.0039) [2024-06-06 16:02:17,318][24114] Fps is (10 sec: 45875.6, 60 sec: 44787.4, 300 sec: 44597.8). Total num frames: 670662656. Throughput: 0: 44503.2. Samples: 151870920. Policy #0 lag: (min: 0.0, avg: 10.6, max: 22.0) [2024-06-06 16:02:17,318][24114] Avg episode reward: [(0, '0.242')] [2024-06-06 16:02:18,617][24347] Updated weights for policy 0, policy_version 40938 (0.0031) [2024-06-06 16:02:22,319][24114] Fps is (10 sec: 42592.4, 60 sec: 43962.7, 300 sec: 44653.1). Total num frames: 670875648. Throughput: 0: 44456.4. Samples: 152138420. 
Policy #0 lag: (min: 0.0, avg: 10.6, max: 22.0) [2024-06-06 16:02:22,320][24114] Avg episode reward: [(0, '0.248')] [2024-06-06 16:02:22,332][24326] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000040947_670875648.pth... [2024-06-06 16:02:22,390][24326] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000040295_660193280.pth [2024-06-06 16:02:22,695][24347] Updated weights for policy 0, policy_version 40948 (0.0032) [2024-06-06 16:02:25,965][24347] Updated weights for policy 0, policy_version 40958 (0.0041) [2024-06-06 16:02:27,318][24114] Fps is (10 sec: 44236.5, 60 sec: 44509.8, 300 sec: 44542.3). Total num frames: 671105024. Throughput: 0: 44422.8. Samples: 152401460. Policy #0 lag: (min: 0.0, avg: 10.6, max: 22.0) [2024-06-06 16:02:27,318][24114] Avg episode reward: [(0, '0.255')] [2024-06-06 16:02:29,715][24347] Updated weights for policy 0, policy_version 40968 (0.0034) [2024-06-06 16:02:32,318][24114] Fps is (10 sec: 44243.8, 60 sec: 44783.1, 300 sec: 44597.8). Total num frames: 671318016. Throughput: 0: 44453.6. Samples: 152537480. Policy #0 lag: (min: 0.0, avg: 10.6, max: 22.0) [2024-06-06 16:02:32,318][24114] Avg episode reward: [(0, '0.255')] [2024-06-06 16:02:32,921][24347] Updated weights for policy 0, policy_version 40978 (0.0033) [2024-06-06 16:02:36,909][24347] Updated weights for policy 0, policy_version 40988 (0.0029) [2024-06-06 16:02:37,318][24114] Fps is (10 sec: 44236.4, 60 sec: 44509.8, 300 sec: 44653.3). Total num frames: 671547392. Throughput: 0: 44539.9. Samples: 152810780. Policy #0 lag: (min: 0.0, avg: 10.6, max: 22.0) [2024-06-06 16:02:37,319][24114] Avg episode reward: [(0, '0.260')] [2024-06-06 16:02:40,379][24347] Updated weights for policy 0, policy_version 40998 (0.0033) [2024-06-06 16:02:42,318][24114] Fps is (10 sec: 45874.7, 60 sec: 44510.0, 300 sec: 44597.8). Total num frames: 671776768. Throughput: 0: 44682.3. Samples: 153074720. 
Policy #0 lag: (min: 0.0, avg: 11.3, max: 23.0) [2024-06-06 16:02:42,318][24114] Avg episode reward: [(0, '0.257')] [2024-06-06 16:02:44,244][24347] Updated weights for policy 0, policy_version 41008 (0.0038) [2024-06-06 16:02:47,318][24114] Fps is (10 sec: 44237.0, 60 sec: 44511.3, 300 sec: 44597.8). Total num frames: 671989760. Throughput: 0: 44837.7. Samples: 153214140. Policy #0 lag: (min: 0.0, avg: 11.3, max: 23.0) [2024-06-06 16:02:47,318][24114] Avg episode reward: [(0, '0.246')] [2024-06-06 16:02:47,970][24347] Updated weights for policy 0, policy_version 41018 (0.0030) [2024-06-06 16:02:51,698][24347] Updated weights for policy 0, policy_version 41028 (0.0032) [2024-06-06 16:02:52,318][24114] Fps is (10 sec: 42597.5, 60 sec: 44236.7, 300 sec: 44653.3). Total num frames: 672202752. Throughput: 0: 44563.0. Samples: 153474640. Policy #0 lag: (min: 0.0, avg: 11.3, max: 23.0) [2024-06-06 16:02:52,319][24114] Avg episode reward: [(0, '0.247')] [2024-06-06 16:02:55,041][24347] Updated weights for policy 0, policy_version 41038 (0.0039) [2024-06-06 16:02:57,318][24114] Fps is (10 sec: 45875.0, 60 sec: 44509.8, 300 sec: 44653.3). Total num frames: 672448512. Throughput: 0: 44727.5. Samples: 153748520. Policy #0 lag: (min: 0.0, avg: 11.3, max: 23.0) [2024-06-06 16:02:57,319][24114] Avg episode reward: [(0, '0.245')] [2024-06-06 16:02:58,728][24347] Updated weights for policy 0, policy_version 41048 (0.0032) [2024-06-06 16:03:02,147][24347] Updated weights for policy 0, policy_version 41058 (0.0030) [2024-06-06 16:03:02,318][24114] Fps is (10 sec: 49152.5, 60 sec: 45602.0, 300 sec: 44708.9). Total num frames: 672694272. Throughput: 0: 44817.2. Samples: 153887700. Policy #0 lag: (min: 0.0, avg: 11.3, max: 23.0) [2024-06-06 16:03:02,319][24114] Avg episode reward: [(0, '0.252')] [2024-06-06 16:03:06,284][24347] Updated weights for policy 0, policy_version 41068 (0.0036) [2024-06-06 16:03:07,032][24326] Signal inference workers to stop experience collection... 
(2200 times) [2024-06-06 16:03:07,064][24347] InferenceWorker_p0-w0: stopping experience collection (2200 times) [2024-06-06 16:03:07,092][24326] Signal inference workers to resume experience collection... (2200 times) [2024-06-06 16:03:07,094][24347] InferenceWorker_p0-w0: resuming experience collection (2200 times) [2024-06-06 16:03:07,318][24114] Fps is (10 sec: 44236.5, 60 sec: 44782.8, 300 sec: 44597.8). Total num frames: 672890880. Throughput: 0: 44787.5. Samples: 154153800. Policy #0 lag: (min: 0.0, avg: 8.3, max: 20.0) [2024-06-06 16:03:07,319][24114] Avg episode reward: [(0, '0.256')] [2024-06-06 16:03:09,574][24347] Updated weights for policy 0, policy_version 41078 (0.0033) [2024-06-06 16:03:12,322][24114] Fps is (10 sec: 42582.3, 60 sec: 44507.0, 300 sec: 44652.8). Total num frames: 673120256. Throughput: 0: 44904.2. Samples: 154422320. Policy #0 lag: (min: 0.0, avg: 8.3, max: 20.0) [2024-06-06 16:03:12,322][24114] Avg episode reward: [(0, '0.245')] [2024-06-06 16:03:13,476][24347] Updated weights for policy 0, policy_version 41088 (0.0044) [2024-06-06 16:03:17,107][24347] Updated weights for policy 0, policy_version 41098 (0.0030) [2024-06-06 16:03:17,318][24114] Fps is (10 sec: 45876.1, 60 sec: 44782.9, 300 sec: 44597.8). Total num frames: 673349632. Throughput: 0: 44893.7. Samples: 154557700. Policy #0 lag: (min: 0.0, avg: 8.3, max: 20.0) [2024-06-06 16:03:17,318][24114] Avg episode reward: [(0, '0.252')] [2024-06-06 16:03:20,799][24347] Updated weights for policy 0, policy_version 41108 (0.0031) [2024-06-06 16:03:22,318][24114] Fps is (10 sec: 40976.1, 60 sec: 44237.9, 300 sec: 44486.8). Total num frames: 673529856. Throughput: 0: 44694.8. Samples: 154822040. 
Policy #0 lag: (min: 0.0, avg: 8.3, max: 20.0) [2024-06-06 16:03:22,318][24114] Avg episode reward: [(0, '0.238')] [2024-06-06 16:03:24,528][24347] Updated weights for policy 0, policy_version 41118 (0.0034) [2024-06-06 16:03:27,318][24114] Fps is (10 sec: 44237.0, 60 sec: 44783.0, 300 sec: 44653.4). Total num frames: 673792000. Throughput: 0: 44768.9. Samples: 155089320. Policy #0 lag: (min: 0.0, avg: 8.3, max: 20.0) [2024-06-06 16:03:27,318][24114] Avg episode reward: [(0, '0.259')] [2024-06-06 16:03:27,865][24347] Updated weights for policy 0, policy_version 41128 (0.0027) [2024-06-06 16:03:31,679][24347] Updated weights for policy 0, policy_version 41138 (0.0032) [2024-06-06 16:03:32,318][24114] Fps is (10 sec: 50790.0, 60 sec: 45328.9, 300 sec: 44708.9). Total num frames: 674037760. Throughput: 0: 44791.1. Samples: 155229740. Policy #0 lag: (min: 0.0, avg: 8.3, max: 20.0) [2024-06-06 16:03:32,319][24114] Avg episode reward: [(0, '0.251')] [2024-06-06 16:03:35,735][24347] Updated weights for policy 0, policy_version 41148 (0.0026) [2024-06-06 16:03:37,318][24114] Fps is (10 sec: 42597.8, 60 sec: 44509.9, 300 sec: 44597.8). Total num frames: 674217984. Throughput: 0: 44764.6. Samples: 155489040. Policy #0 lag: (min: 0.0, avg: 10.8, max: 22.0) [2024-06-06 16:03:37,318][24114] Avg episode reward: [(0, '0.252')] [2024-06-06 16:03:39,117][24347] Updated weights for policy 0, policy_version 41158 (0.0032) [2024-06-06 16:03:42,318][24114] Fps is (10 sec: 40960.2, 60 sec: 44509.9, 300 sec: 44654.3). Total num frames: 674447360. Throughput: 0: 44633.0. Samples: 155757000. Policy #0 lag: (min: 0.0, avg: 10.8, max: 22.0) [2024-06-06 16:03:42,318][24114] Avg episode reward: [(0, '0.244')] [2024-06-06 16:03:42,819][24347] Updated weights for policy 0, policy_version 41168 (0.0045) [2024-06-06 16:03:46,696][24347] Updated weights for policy 0, policy_version 41178 (0.0040) [2024-06-06 16:03:47,318][24114] Fps is (10 sec: 49151.5, 60 sec: 45329.0, 300 sec: 44708.9). 
Total num frames: 674709504. Throughput: 0: 44624.4. Samples: 155895800. Policy #0 lag: (min: 0.0, avg: 10.8, max: 22.0) [2024-06-06 16:03:47,319][24114] Avg episode reward: [(0, '0.251')] [2024-06-06 16:03:50,047][24347] Updated weights for policy 0, policy_version 41188 (0.0027) [2024-06-06 16:03:52,318][24114] Fps is (10 sec: 44236.4, 60 sec: 44783.0, 300 sec: 44597.8). Total num frames: 674889728. Throughput: 0: 44600.5. Samples: 156160820. Policy #0 lag: (min: 0.0, avg: 10.8, max: 22.0) [2024-06-06 16:03:52,319][24114] Avg episode reward: [(0, '0.250')] [2024-06-06 16:03:53,914][24347] Updated weights for policy 0, policy_version 41198 (0.0024) [2024-06-06 16:03:57,109][24347] Updated weights for policy 0, policy_version 41208 (0.0025) [2024-06-06 16:03:57,318][24114] Fps is (10 sec: 44237.3, 60 sec: 45056.0, 300 sec: 44764.4). Total num frames: 675151872. Throughput: 0: 44624.2. Samples: 156430240. Policy #0 lag: (min: 0.0, avg: 10.8, max: 22.0) [2024-06-06 16:03:57,318][24114] Avg episode reward: [(0, '0.254')] [2024-06-06 16:04:01,023][24347] Updated weights for policy 0, policy_version 41218 (0.0022) [2024-06-06 16:04:02,318][24114] Fps is (10 sec: 47513.3, 60 sec: 44509.8, 300 sec: 44653.3). Total num frames: 675364864. Throughput: 0: 44727.4. Samples: 156570440. Policy #0 lag: (min: 0.0, avg: 8.7, max: 23.0) [2024-06-06 16:04:02,319][24114] Avg episode reward: [(0, '0.250')] [2024-06-06 16:04:04,619][24347] Updated weights for policy 0, policy_version 41228 (0.0028) [2024-06-06 16:04:07,100][24326] Signal inference workers to stop experience collection... (2250 times) [2024-06-06 16:04:07,135][24347] InferenceWorker_p0-w0: stopping experience collection (2250 times) [2024-06-06 16:04:07,158][24326] Signal inference workers to resume experience collection... 
(2250 times) [2024-06-06 16:04:07,160][24347] InferenceWorker_p0-w0: resuming experience collection (2250 times) [2024-06-06 16:04:07,318][24114] Fps is (10 sec: 40960.1, 60 sec: 44510.0, 300 sec: 44542.3). Total num frames: 675561472. Throughput: 0: 44708.8. Samples: 156833940. Policy #0 lag: (min: 0.0, avg: 8.7, max: 23.0) [2024-06-06 16:04:07,319][24114] Avg episode reward: [(0, '0.249')] [2024-06-06 16:04:08,403][24347] Updated weights for policy 0, policy_version 41238 (0.0031) [2024-06-06 16:04:12,301][24347] Updated weights for policy 0, policy_version 41248 (0.0027) [2024-06-06 16:04:12,318][24114] Fps is (10 sec: 44237.2, 60 sec: 44785.8, 300 sec: 44708.9). Total num frames: 675807232. Throughput: 0: 44690.1. Samples: 157100380. Policy #0 lag: (min: 0.0, avg: 8.7, max: 23.0) [2024-06-06 16:04:12,319][24114] Avg episode reward: [(0, '0.253')] [2024-06-06 16:04:16,061][24347] Updated weights for policy 0, policy_version 41258 (0.0034) [2024-06-06 16:04:17,318][24114] Fps is (10 sec: 45875.0, 60 sec: 44509.8, 300 sec: 44542.3). Total num frames: 676020224. Throughput: 0: 44684.8. Samples: 157240560. Policy #0 lag: (min: 0.0, avg: 8.7, max: 23.0) [2024-06-06 16:04:17,319][24114] Avg episode reward: [(0, '0.255')] [2024-06-06 16:04:19,398][24347] Updated weights for policy 0, policy_version 41268 (0.0040) [2024-06-06 16:04:22,318][24114] Fps is (10 sec: 40959.7, 60 sec: 44782.8, 300 sec: 44486.7). Total num frames: 676216832. Throughput: 0: 44677.7. Samples: 157499540. Policy #0 lag: (min: 0.0, avg: 8.7, max: 23.0) [2024-06-06 16:04:22,319][24114] Avg episode reward: [(0, '0.260')] [2024-06-06 16:04:22,333][24326] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000041273_676216832.pth... 
[2024-06-06 16:04:22,386][24326] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000040621_665534464.pth
[2024-06-06 16:04:23,573][24347] Updated weights for policy 0, policy_version 41278 (0.0033)
[2024-06-06 16:04:26,567][24347] Updated weights for policy 0, policy_version 41288 (0.0024)
[2024-06-06 16:04:27,318][24114] Fps is (10 sec: 44236.5, 60 sec: 44509.7, 300 sec: 44597.8). Total num frames: 676462592. Throughput: 0: 44456.7. Samples: 157757560. Policy #0 lag: (min: 0.0, avg: 10.6, max: 21.0)
[2024-06-06 16:04:27,319][24114] Avg episode reward: [(0, '0.254')]
[2024-06-06 16:04:30,549][24347] Updated weights for policy 0, policy_version 41298 (0.0022)
[2024-06-06 16:04:32,318][24114] Fps is (10 sec: 47514.4, 60 sec: 44236.9, 300 sec: 44653.4). Total num frames: 676691968. Throughput: 0: 44702.0. Samples: 157907380. Policy #0 lag: (min: 0.0, avg: 10.6, max: 21.0)
[2024-06-06 16:04:32,318][24114] Avg episode reward: [(0, '0.248')]
[2024-06-06 16:04:34,315][24347] Updated weights for policy 0, policy_version 41308 (0.0039)
[2024-06-06 16:04:37,318][24114] Fps is (10 sec: 42598.3, 60 sec: 44509.8, 300 sec: 44542.3). Total num frames: 676888576. Throughput: 0: 44647.0. Samples: 158169940. Policy #0 lag: (min: 0.0, avg: 10.6, max: 21.0)
[2024-06-06 16:04:37,319][24114] Avg episode reward: [(0, '0.249')]
[2024-06-06 16:04:37,972][24347] Updated weights for policy 0, policy_version 41318 (0.0032)
[2024-06-06 16:04:41,702][24347] Updated weights for policy 0, policy_version 41328 (0.0022)
[2024-06-06 16:04:42,318][24114] Fps is (10 sec: 44236.1, 60 sec: 44782.8, 300 sec: 44653.3). Total num frames: 677134336. Throughput: 0: 44505.7. Samples: 158433000. Policy #0 lag: (min: 0.0, avg: 10.6, max: 21.0)
[2024-06-06 16:04:42,319][24114] Avg episode reward: [(0, '0.247')]
[2024-06-06 16:04:45,699][24347] Updated weights for policy 0, policy_version 41338 (0.0038)
[2024-06-06 16:04:47,318][24114] Fps is (10 sec: 45876.2, 60 sec: 43963.9, 300 sec: 44597.8). Total num frames: 677347328. Throughput: 0: 44381.1. Samples: 158567580. Policy #0 lag: (min: 0.0, avg: 10.6, max: 21.0)
[2024-06-06 16:04:47,318][24114] Avg episode reward: [(0, '0.247')]
[2024-06-06 16:04:48,863][24347] Updated weights for policy 0, policy_version 41348 (0.0026)
[2024-06-06 16:04:52,320][24114] Fps is (10 sec: 42591.3, 60 sec: 44508.6, 300 sec: 44597.6). Total num frames: 677560320. Throughput: 0: 44499.6. Samples: 158836500. Policy #0 lag: (min: 0.0, avg: 9.7, max: 21.0)
[2024-06-06 16:04:52,320][24114] Avg episode reward: [(0, '0.250')]
[2024-06-06 16:04:52,810][24347] Updated weights for policy 0, policy_version 41358 (0.0035)
[2024-06-06 16:04:56,566][24347] Updated weights for policy 0, policy_version 41368 (0.0032)
[2024-06-06 16:04:57,318][24114] Fps is (10 sec: 44236.7, 60 sec: 43963.8, 300 sec: 44597.8). Total num frames: 677789696. Throughput: 0: 44510.3. Samples: 159103340. Policy #0 lag: (min: 0.0, avg: 9.7, max: 21.0)
[2024-06-06 16:04:57,319][24114] Avg episode reward: [(0, '0.251')]
[2024-06-06 16:04:59,941][24347] Updated weights for policy 0, policy_version 41378 (0.0040)
[2024-06-06 16:05:02,318][24114] Fps is (10 sec: 47521.8, 60 sec: 44509.9, 300 sec: 44708.9). Total num frames: 678035456. Throughput: 0: 44389.3. Samples: 159238080. Policy #0 lag: (min: 0.0, avg: 9.7, max: 21.0)
[2024-06-06 16:05:02,319][24114] Avg episode reward: [(0, '0.249')]
[2024-06-06 16:05:03,829][24347] Updated weights for policy 0, policy_version 41388 (0.0033)
[2024-06-06 16:05:07,318][24114] Fps is (10 sec: 45874.6, 60 sec: 44782.9, 300 sec: 44653.4). Total num frames: 678248448. Throughput: 0: 44623.6. Samples: 159507600. Policy #0 lag: (min: 0.0, avg: 9.7, max: 21.0)
[2024-06-06 16:05:07,319][24114] Avg episode reward: [(0, '0.248')]
[2024-06-06 16:05:07,450][24347] Updated weights for policy 0, policy_version 41398 (0.0043)
[2024-06-06 16:05:10,860][24347] Updated weights for policy 0, policy_version 41408 (0.0032)
[2024-06-06 16:05:12,318][24114] Fps is (10 sec: 42598.7, 60 sec: 44236.9, 300 sec: 44598.7). Total num frames: 678461440. Throughput: 0: 44885.0. Samples: 159777380. Policy #0 lag: (min: 0.0, avg: 9.7, max: 21.0)
[2024-06-06 16:05:12,318][24114] Avg episode reward: [(0, '0.257')]
[2024-06-06 16:05:15,043][24347] Updated weights for policy 0, policy_version 41418 (0.0050)
[2024-06-06 16:05:17,318][24114] Fps is (10 sec: 45875.4, 60 sec: 44783.0, 300 sec: 44708.9). Total num frames: 678707200. Throughput: 0: 44526.1. Samples: 159911060. Policy #0 lag: (min: 0.0, avg: 9.7, max: 21.0)
[2024-06-06 16:05:17,318][24114] Avg episode reward: [(0, '0.255')]
[2024-06-06 16:05:18,061][24347] Updated weights for policy 0, policy_version 41428 (0.0038)
[2024-06-06 16:05:22,171][24347] Updated weights for policy 0, policy_version 41438 (0.0028)
[2024-06-06 16:05:22,324][24114] Fps is (10 sec: 45847.6, 60 sec: 45051.6, 300 sec: 44652.4). Total num frames: 678920192. Throughput: 0: 44637.8. Samples: 160178900. Policy #0 lag: (min: 0.0, avg: 10.9, max: 20.0)
[2024-06-06 16:05:22,325][24114] Avg episode reward: [(0, '0.256')]
[2024-06-06 16:05:24,695][24326] Signal inference workers to stop experience collection... (2300 times)
[2024-06-06 16:05:24,698][24326] Signal inference workers to resume experience collection... (2300 times)
[2024-06-06 16:05:24,739][24347] InferenceWorker_p0-w0: stopping experience collection (2300 times)
[2024-06-06 16:05:24,740][24347] InferenceWorker_p0-w0: resuming experience collection (2300 times)
[2024-06-06 16:05:25,792][24347] Updated weights for policy 0, policy_version 41448 (0.0028)
[2024-06-06 16:05:27,318][24114] Fps is (10 sec: 42598.3, 60 sec: 44509.9, 300 sec: 44597.8). Total num frames: 679133184. Throughput: 0: 44743.2. Samples: 160446440. Policy #0 lag: (min: 0.0, avg: 10.9, max: 20.0)
[2024-06-06 16:05:27,319][24114] Avg episode reward: [(0, '0.246')]
[2024-06-06 16:05:29,532][24347] Updated weights for policy 0, policy_version 41458 (0.0032)
[2024-06-06 16:05:32,318][24114] Fps is (10 sec: 42624.1, 60 sec: 44236.8, 300 sec: 44597.8). Total num frames: 679346176. Throughput: 0: 44596.4. Samples: 160574420. Policy #0 lag: (min: 0.0, avg: 10.9, max: 20.0)
[2024-06-06 16:05:32,318][24114] Avg episode reward: [(0, '0.251')]
[2024-06-06 16:05:32,910][24347] Updated weights for policy 0, policy_version 41468 (0.0028)
[2024-06-06 16:05:37,105][24347] Updated weights for policy 0, policy_version 41478 (0.0033)
[2024-06-06 16:05:37,322][24114] Fps is (10 sec: 45858.4, 60 sec: 45053.3, 300 sec: 44597.2). Total num frames: 679591936. Throughput: 0: 44714.5. Samples: 160848740. Policy #0 lag: (min: 0.0, avg: 10.9, max: 20.0)
[2024-06-06 16:05:37,322][24114] Avg episode reward: [(0, '0.250')]
[2024-06-06 16:05:39,994][24347] Updated weights for policy 0, policy_version 41488 (0.0036)
[2024-06-06 16:05:42,318][24114] Fps is (10 sec: 44236.3, 60 sec: 44236.8, 300 sec: 44597.8). Total num frames: 679788544. Throughput: 0: 44653.2. Samples: 161112740. Policy #0 lag: (min: 0.0, avg: 10.9, max: 20.0)
[2024-06-06 16:05:42,319][24114] Avg episode reward: [(0, '0.247')]
[2024-06-06 16:05:44,453][24347] Updated weights for policy 0, policy_version 41498 (0.0025)
[2024-06-06 16:05:47,318][24114] Fps is (10 sec: 44253.4, 60 sec: 44782.9, 300 sec: 44653.3). Total num frames: 680034304. Throughput: 0: 44581.0. Samples: 161244220. Policy #0 lag: (min: 0.0, avg: 10.9, max: 23.0)
[2024-06-06 16:05:47,318][24114] Avg episode reward: [(0, '0.255')]
[2024-06-06 16:05:47,605][24347] Updated weights for policy 0, policy_version 41508 (0.0037)
[2024-06-06 16:05:51,518][24347] Updated weights for policy 0, policy_version 41518 (0.0027)
[2024-06-06 16:05:52,318][24114] Fps is (10 sec: 45875.2, 60 sec: 44784.2, 300 sec: 44597.8). Total num frames: 680247296. Throughput: 0: 44494.7. Samples: 161509860. Policy #0 lag: (min: 0.0, avg: 10.9, max: 23.0)
[2024-06-06 16:05:52,319][24114] Avg episode reward: [(0, '0.244')]
[2024-06-06 16:05:55,199][24347] Updated weights for policy 0, policy_version 41528 (0.0036)
[2024-06-06 16:05:57,318][24114] Fps is (10 sec: 44236.4, 60 sec: 44782.9, 300 sec: 44653.3). Total num frames: 680476672. Throughput: 0: 44531.1. Samples: 161781280. Policy #0 lag: (min: 0.0, avg: 10.9, max: 23.0)
[2024-06-06 16:05:57,319][24114] Avg episode reward: [(0, '0.250')]
[2024-06-06 16:05:59,242][24347] Updated weights for policy 0, policy_version 41538 (0.0047)
[2024-06-06 16:06:02,318][24114] Fps is (10 sec: 45875.6, 60 sec: 44509.9, 300 sec: 44653.4). Total num frames: 680706048. Throughput: 0: 44350.7. Samples: 161906840. Policy #0 lag: (min: 0.0, avg: 10.9, max: 23.0)
[2024-06-06 16:06:02,318][24114] Avg episode reward: [(0, '0.253')]
[2024-06-06 16:06:02,367][24347] Updated weights for policy 0, policy_version 41548 (0.0035)
[2024-06-06 16:06:06,619][24347] Updated weights for policy 0, policy_version 41558 (0.0044)
[2024-06-06 16:06:07,318][24114] Fps is (10 sec: 45874.6, 60 sec: 44782.9, 300 sec: 44598.7). Total num frames: 680935424. Throughput: 0: 44571.6. Samples: 162184360. Policy #0 lag: (min: 0.0, avg: 10.9, max: 23.0)
[2024-06-06 16:06:07,319][24114] Avg episode reward: [(0, '0.252')]
[2024-06-06 16:06:09,520][24347] Updated weights for policy 0, policy_version 41568 (0.0038)
[2024-06-06 16:06:12,318][24114] Fps is (10 sec: 39320.7, 60 sec: 43963.6, 300 sec: 44487.6). Total num frames: 681099264. Throughput: 0: 44419.4. Samples: 162445320. Policy #0 lag: (min: 0.0, avg: 10.4, max: 21.0)
[2024-06-06 16:06:12,319][24114] Avg episode reward: [(0, '0.249')]
[2024-06-06 16:06:13,967][24347] Updated weights for policy 0, policy_version 41578 (0.0043)
[2024-06-06 16:06:16,929][24347] Updated weights for policy 0, policy_version 41588 (0.0033)
[2024-06-06 16:06:17,318][24114] Fps is (10 sec: 44237.1, 60 sec: 44509.8, 300 sec: 44542.2). Total num frames: 681377792. Throughput: 0: 44416.3. Samples: 162573160. Policy #0 lag: (min: 0.0, avg: 10.4, max: 21.0)
[2024-06-06 16:06:17,319][24114] Avg episode reward: [(0, '0.248')]
[2024-06-06 16:06:21,073][24347] Updated weights for policy 0, policy_version 41598 (0.0024)
[2024-06-06 16:06:22,318][24114] Fps is (10 sec: 49152.5, 60 sec: 44514.2, 300 sec: 44597.8). Total num frames: 681590784. Throughput: 0: 44371.6. Samples: 162845300. Policy #0 lag: (min: 0.0, avg: 10.4, max: 21.0)
[2024-06-06 16:06:22,319][24114] Avg episode reward: [(0, '0.254')]
[2024-06-06 16:06:22,325][24326] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000041601_681590784.pth...
[2024-06-06 16:06:22,398][24326] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000040947_670875648.pth
[2024-06-06 16:06:24,504][24347] Updated weights for policy 0, policy_version 41608 (0.0032)
[2024-06-06 16:06:27,318][24114] Fps is (10 sec: 40960.4, 60 sec: 44236.8, 300 sec: 44597.8). Total num frames: 681787392. Throughput: 0: 44403.6. Samples: 163110900. Policy #0 lag: (min: 0.0, avg: 10.4, max: 21.0)
[2024-06-06 16:06:27,318][24114] Avg episode reward: [(0, '0.242')]
[2024-06-06 16:06:28,323][24326] Signal inference workers to stop experience collection... (2350 times)
[2024-06-06 16:06:28,347][24347] InferenceWorker_p0-w0: stopping experience collection (2350 times)
[2024-06-06 16:06:28,386][24326] Signal inference workers to resume experience collection... (2350 times)
[2024-06-06 16:06:28,386][24347] InferenceWorker_p0-w0: resuming experience collection (2350 times)
[2024-06-06 16:06:28,519][24347] Updated weights for policy 0, policy_version 41618 (0.0036)
[2024-06-06 16:06:31,722][24347] Updated weights for policy 0, policy_version 41628 (0.0021)
[2024-06-06 16:06:32,318][24114] Fps is (10 sec: 45875.8, 60 sec: 45056.0, 300 sec: 44653.4). Total num frames: 682049536. Throughput: 0: 44474.2. Samples: 163245560. Policy #0 lag: (min: 0.0, avg: 10.4, max: 21.0)
[2024-06-06 16:06:32,318][24114] Avg episode reward: [(0, '0.242')]
[2024-06-06 16:06:36,143][24347] Updated weights for policy 0, policy_version 41638 (0.0027)
[2024-06-06 16:06:37,318][24114] Fps is (10 sec: 45875.4, 60 sec: 44239.6, 300 sec: 44542.3). Total num frames: 682246144. Throughput: 0: 44557.4. Samples: 163514940. Policy #0 lag: (min: 0.0, avg: 10.4, max: 21.0)
[2024-06-06 16:06:37,318][24114] Avg episode reward: [(0, '0.246')]
[2024-06-06 16:06:38,944][24347] Updated weights for policy 0, policy_version 41648 (0.0035)
[2024-06-06 16:06:42,318][24114] Fps is (10 sec: 39321.2, 60 sec: 44236.8, 300 sec: 44487.0). Total num frames: 682442752. Throughput: 0: 44300.4. Samples: 163774800. Policy #0 lag: (min: 0.0, avg: 9.2, max: 21.0)
[2024-06-06 16:06:42,319][24114] Avg episode reward: [(0, '0.255')]
[2024-06-06 16:06:43,615][24347] Updated weights for policy 0, policy_version 41658 (0.0025)
[2024-06-06 16:06:46,471][24347] Updated weights for policy 0, policy_version 41668 (0.0028)
[2024-06-06 16:06:47,318][24114] Fps is (10 sec: 47512.9, 60 sec: 44782.8, 300 sec: 44653.3). Total num frames: 682721280. Throughput: 0: 44402.5. Samples: 163904960. Policy #0 lag: (min: 0.0, avg: 9.2, max: 21.0)
[2024-06-06 16:06:47,319][24114] Avg episode reward: [(0, '0.249')]
[2024-06-06 16:06:50,796][24347] Updated weights for policy 0, policy_version 41678 (0.0021)
[2024-06-06 16:06:52,318][24114] Fps is (10 sec: 47514.3, 60 sec: 44510.0, 300 sec: 44542.3). Total num frames: 682917888. Throughput: 0: 44110.5. Samples: 164169320. Policy #0 lag: (min: 0.0, avg: 9.2, max: 21.0)
[2024-06-06 16:06:52,318][24114] Avg episode reward: [(0, '0.254')]
[2024-06-06 16:06:54,110][24347] Updated weights for policy 0, policy_version 41688 (0.0044)
[2024-06-06 16:06:57,318][24114] Fps is (10 sec: 39322.1, 60 sec: 43963.8, 300 sec: 44597.8). Total num frames: 683114496. Throughput: 0: 44381.1. Samples: 164442460. Policy #0 lag: (min: 0.0, avg: 9.2, max: 21.0)
[2024-06-06 16:06:57,318][24114] Avg episode reward: [(0, '0.256')]
[2024-06-06 16:06:58,231][24347] Updated weights for policy 0, policy_version 41698 (0.0026)
[2024-06-06 16:07:01,343][24347] Updated weights for policy 0, policy_version 41708 (0.0030)
[2024-06-06 16:07:02,318][24114] Fps is (10 sec: 44236.8, 60 sec: 44236.8, 300 sec: 44597.8). Total num frames: 683360256. Throughput: 0: 44495.3. Samples: 164575440. Policy #0 lag: (min: 0.0, avg: 9.2, max: 21.0)
[2024-06-06 16:07:02,318][24114] Avg episode reward: [(0, '0.256')]
[2024-06-06 16:07:05,974][24347] Updated weights for policy 0, policy_version 41718 (0.0030)
[2024-06-06 16:07:07,320][24114] Fps is (10 sec: 45866.0, 60 sec: 43962.4, 300 sec: 44486.4). Total num frames: 683573248. Throughput: 0: 44340.4. Samples: 164840700. Policy #0 lag: (min: 0.0, avg: 8.8, max: 21.0)
[2024-06-06 16:07:07,321][24114] Avg episode reward: [(0, '0.248')]
[2024-06-06 16:07:08,644][24347] Updated weights for policy 0, policy_version 41728 (0.0022)
[2024-06-06 16:07:12,318][24114] Fps is (10 sec: 42597.6, 60 sec: 44783.0, 300 sec: 44486.7). Total num frames: 683786240. Throughput: 0: 44362.1. Samples: 165107200. Policy #0 lag: (min: 0.0, avg: 8.8, max: 21.0)
[2024-06-06 16:07:12,319][24114] Avg episode reward: [(0, '0.247')]
[2024-06-06 16:07:13,102][24347] Updated weights for policy 0, policy_version 41738 (0.0043)
[2024-06-06 16:07:15,985][24347] Updated weights for policy 0, policy_version 41748 (0.0031)
[2024-06-06 16:07:17,318][24114] Fps is (10 sec: 45883.6, 60 sec: 44236.8, 300 sec: 44598.0). Total num frames: 684032000. Throughput: 0: 44268.7. Samples: 165237660. Policy #0 lag: (min: 0.0, avg: 8.8, max: 21.0)
[2024-06-06 16:07:17,319][24114] Avg episode reward: [(0, '0.242')]
[2024-06-06 16:07:20,100][24347] Updated weights for policy 0, policy_version 41758 (0.0030)
[2024-06-06 16:07:22,318][24114] Fps is (10 sec: 44236.6, 60 sec: 43963.7, 300 sec: 44486.7). Total num frames: 684228608. Throughput: 0: 44228.2. Samples: 165505220. Policy #0 lag: (min: 0.0, avg: 8.8, max: 21.0)
[2024-06-06 16:07:22,319][24114] Avg episode reward: [(0, '0.251')]
[2024-06-06 16:07:23,634][24347] Updated weights for policy 0, policy_version 41768 (0.0032)
[2024-06-06 16:07:27,318][24114] Fps is (10 sec: 40960.8, 60 sec: 44236.8, 300 sec: 44486.7). Total num frames: 684441600. Throughput: 0: 44546.3. Samples: 165779380. Policy #0 lag: (min: 0.0, avg: 8.8, max: 21.0)
[2024-06-06 16:07:27,318][24114] Avg episode reward: [(0, '0.236')]
[2024-06-06 16:07:27,693][24347] Updated weights for policy 0, policy_version 41778 (0.0037)
[2024-06-06 16:07:30,731][24347] Updated weights for policy 0, policy_version 41788 (0.0027)
[2024-06-06 16:07:32,318][24114] Fps is (10 sec: 47513.8, 60 sec: 44236.7, 300 sec: 44597.8). Total num frames: 684703744. Throughput: 0: 44564.4. Samples: 165910360. Policy #0 lag: (min: 0.0, avg: 11.9, max: 22.0)
[2024-06-06 16:07:32,319][24114] Avg episode reward: [(0, '0.253')]
[2024-06-06 16:07:35,042][24326] Signal inference workers to stop experience collection... (2400 times)
[2024-06-06 16:07:35,075][24347] InferenceWorker_p0-w0: stopping experience collection (2400 times)
[2024-06-06 16:07:35,101][24326] Signal inference workers to resume experience collection... (2400 times)
[2024-06-06 16:07:35,102][24347] InferenceWorker_p0-w0: resuming experience collection (2400 times)
[2024-06-06 16:07:35,243][24347] Updated weights for policy 0, policy_version 41798 (0.0032)
[2024-06-06 16:07:37,318][24114] Fps is (10 sec: 49151.2, 60 sec: 44782.8, 300 sec: 44597.8). Total num frames: 684933120. Throughput: 0: 44585.6. Samples: 166175680. Policy #0 lag: (min: 0.0, avg: 11.9, max: 22.0)
[2024-06-06 16:07:37,319][24114] Avg episode reward: [(0, '0.249')]
[2024-06-06 16:07:37,850][24347] Updated weights for policy 0, policy_version 41808 (0.0035)
[2024-06-06 16:07:42,282][24347] Updated weights for policy 0, policy_version 41818 (0.0025)
[2024-06-06 16:07:42,318][24114] Fps is (10 sec: 44237.0, 60 sec: 45056.0, 300 sec: 44597.8). Total num frames: 685146112. Throughput: 0: 44519.0. Samples: 166445820. Policy #0 lag: (min: 0.0, avg: 11.9, max: 22.0)
[2024-06-06 16:07:42,318][24114] Avg episode reward: [(0, '0.243')]
[2024-06-06 16:07:45,482][24347] Updated weights for policy 0, policy_version 41828 (0.0024)
[2024-06-06 16:07:47,318][24114] Fps is (10 sec: 42599.0, 60 sec: 43963.8, 300 sec: 44597.8). Total num frames: 685359104. Throughput: 0: 44352.0. Samples: 166571280. Policy #0 lag: (min: 0.0, avg: 11.9, max: 22.0)
[2024-06-06 16:07:47,318][24114] Avg episode reward: [(0, '0.245')]
[2024-06-06 16:07:49,679][24347] Updated weights for policy 0, policy_version 41838 (0.0032)
[2024-06-06 16:07:52,318][24114] Fps is (10 sec: 44236.7, 60 sec: 44509.7, 300 sec: 44542.3). Total num frames: 685588480. Throughput: 0: 44705.4. Samples: 166852360. Policy #0 lag: (min: 0.0, avg: 11.9, max: 22.0)
[2024-06-06 16:07:52,319][24114] Avg episode reward: [(0, '0.249')]
[2024-06-06 16:07:53,059][24347] Updated weights for policy 0, policy_version 41848 (0.0036)
[2024-06-06 16:07:57,060][24347] Updated weights for policy 0, policy_version 41858 (0.0029)
[2024-06-06 16:07:57,318][24114] Fps is (10 sec: 44237.1, 60 sec: 44783.0, 300 sec: 44431.2). Total num frames: 685801472. Throughput: 0: 44566.4. Samples: 167112680. Policy #0 lag: (min: 0.0, avg: 11.9, max: 22.0)
[2024-06-06 16:07:57,318][24114] Avg episode reward: [(0, '0.251')]
[2024-06-06 16:08:00,193][24347] Updated weights for policy 0, policy_version 41868 (0.0030)
[2024-06-06 16:08:02,318][24114] Fps is (10 sec: 44237.3, 60 sec: 44509.8, 300 sec: 44542.3). Total num frames: 686030848. Throughput: 0: 44535.3. Samples: 167241740. Policy #0 lag: (min: 0.0, avg: 10.9, max: 22.0)
[2024-06-06 16:08:02,318][24114] Avg episode reward: [(0, '0.249')]
[2024-06-06 16:08:04,712][24347] Updated weights for policy 0, policy_version 41878 (0.0032)
[2024-06-06 16:08:07,318][24114] Fps is (10 sec: 45874.3, 60 sec: 44784.3, 300 sec: 44542.8). Total num frames: 686260224. Throughput: 0: 44663.2. Samples: 167515060. Policy #0 lag: (min: 0.0, avg: 10.9, max: 22.0)
[2024-06-06 16:08:07,319][24114] Avg episode reward: [(0, '0.248')]
[2024-06-06 16:08:07,487][24347] Updated weights for policy 0, policy_version 41888 (0.0023)
[2024-06-06 16:08:11,728][24347] Updated weights for policy 0, policy_version 41898 (0.0034)
[2024-06-06 16:08:12,318][24114] Fps is (10 sec: 44237.1, 60 sec: 44783.1, 300 sec: 44486.7). Total num frames: 686473216. Throughput: 0: 44395.1. Samples: 167777160. Policy #0 lag: (min: 0.0, avg: 10.9, max: 22.0)
[2024-06-06 16:08:12,318][24114] Avg episode reward: [(0, '0.239')]
[2024-06-06 16:08:14,878][24347] Updated weights for policy 0, policy_version 41908 (0.0030)
[2024-06-06 16:08:17,318][24114] Fps is (10 sec: 44237.5, 60 sec: 44510.0, 300 sec: 44653.3). Total num frames: 686702592. Throughput: 0: 44395.7. Samples: 167908160. Policy #0 lag: (min: 0.0, avg: 10.9, max: 22.0)
[2024-06-06 16:08:17,318][24114] Avg episode reward: [(0, '0.250')]
[2024-06-06 16:08:18,821][24347] Updated weights for policy 0, policy_version 41918 (0.0033)
[2024-06-06 16:08:22,268][24347] Updated weights for policy 0, policy_version 41928 (0.0036)
[2024-06-06 16:08:22,318][24114] Fps is (10 sec: 47512.7, 60 sec: 45329.1, 300 sec: 44597.8). Total num frames: 686948352. Throughput: 0: 44714.7. Samples: 168187840. Policy #0 lag: (min: 0.0, avg: 10.9, max: 22.0)
[2024-06-06 16:08:22,318][24114] Avg episode reward: [(0, '0.246')]
[2024-06-06 16:08:22,332][24326] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000041928_686948352.pth...
[2024-06-06 16:08:22,401][24326] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000041273_676216832.pth
[2024-06-06 16:08:26,398][24347] Updated weights for policy 0, policy_version 41938 (0.0036)
[2024-06-06 16:08:27,324][24114] Fps is (10 sec: 42572.7, 60 sec: 44778.4, 300 sec: 44374.8). Total num frames: 687128576. Throughput: 0: 44501.3. Samples: 168448640. Policy #0 lag: (min: 0.0, avg: 9.8, max: 20.0)
[2024-06-06 16:08:27,324][24114] Avg episode reward: [(0, '0.250')]
[2024-06-06 16:08:29,436][24347] Updated weights for policy 0, policy_version 41948 (0.0023)
[2024-06-06 16:08:32,318][24114] Fps is (10 sec: 40960.1, 60 sec: 44236.8, 300 sec: 44542.3). Total num frames: 687357952. Throughput: 0: 44509.7. Samples: 168574220. Policy #0 lag: (min: 0.0, avg: 9.8, max: 20.0)
[2024-06-06 16:08:32,319][24114] Avg episode reward: [(0, '0.247')]
[2024-06-06 16:08:33,838][24347] Updated weights for policy 0, policy_version 41958 (0.0045)
[2024-06-06 16:08:36,955][24347] Updated weights for policy 0, policy_version 41968 (0.0038)
[2024-06-06 16:08:37,320][24114] Fps is (10 sec: 49171.7, 60 sec: 44781.6, 300 sec: 44653.0). Total num frames: 687620096. Throughput: 0: 44348.4. Samples: 168848120. Policy #0 lag: (min: 0.0, avg: 9.8, max: 20.0)
[2024-06-06 16:08:37,321][24114] Avg episode reward: [(0, '0.259')]
[2024-06-06 16:08:41,269][24347] Updated weights for policy 0, policy_version 41978 (0.0036)
[2024-06-06 16:08:42,318][24114] Fps is (10 sec: 42599.0, 60 sec: 43963.8, 300 sec: 44320.1). Total num frames: 687783936. Throughput: 0: 44404.9. Samples: 169110900. Policy #0 lag: (min: 0.0, avg: 9.8, max: 20.0)
[2024-06-06 16:08:42,318][24114] Avg episode reward: [(0, '0.253')]
[2024-06-06 16:08:43,868][24326] Signal inference workers to stop experience collection... (2450 times)
[2024-06-06 16:08:43,876][24326] Signal inference workers to resume experience collection... (2450 times)
[2024-06-06 16:08:43,885][24347] InferenceWorker_p0-w0: stopping experience collection (2450 times)
[2024-06-06 16:08:43,912][24347] InferenceWorker_p0-w0: resuming experience collection (2450 times)
[2024-06-06 16:08:44,466][24347] Updated weights for policy 0, policy_version 41988 (0.0038)
[2024-06-06 16:08:47,324][24114] Fps is (10 sec: 40943.3, 60 sec: 44505.4, 300 sec: 44541.4). Total num frames: 688029696. Throughput: 0: 44383.4. Samples: 169239260. Policy #0 lag: (min: 0.0, avg: 9.8, max: 20.0)
[2024-06-06 16:08:47,325][24114] Avg episode reward: [(0, '0.247')]
[2024-06-06 16:08:48,298][24347] Updated weights for policy 0, policy_version 41998 (0.0027)
[2024-06-06 16:08:51,962][24347] Updated weights for policy 0, policy_version 42008 (0.0041)
[2024-06-06 16:08:52,318][24114] Fps is (10 sec: 47512.8, 60 sec: 44509.9, 300 sec: 44431.2). Total num frames: 688259072. Throughput: 0: 44466.2. Samples: 169516040. Policy #0 lag: (min: 0.0, avg: 10.2, max: 22.0)
[2024-06-06 16:08:52,319][24114] Avg episode reward: [(0, '0.246')]
[2024-06-06 16:08:55,954][24347] Updated weights for policy 0, policy_version 42018 (0.0041)
[2024-06-06 16:08:57,318][24114] Fps is (10 sec: 44263.2, 60 sec: 44509.7, 300 sec: 44431.2). Total num frames: 688472064. Throughput: 0: 44733.2. Samples: 169790160. Policy #0 lag: (min: 0.0, avg: 10.2, max: 22.0)
[2024-06-06 16:08:57,319][24114] Avg episode reward: [(0, '0.248')]
[2024-06-06 16:08:59,033][24347] Updated weights for policy 0, policy_version 42028 (0.0044)
[2024-06-06 16:09:02,318][24114] Fps is (10 sec: 44237.6, 60 sec: 44509.9, 300 sec: 44542.3). Total num frames: 688701440. Throughput: 0: 44552.9. Samples: 169913040. Policy #0 lag: (min: 0.0, avg: 10.2, max: 22.0)
[2024-06-06 16:09:02,318][24114] Avg episode reward: [(0, '0.256')]
[2024-06-06 16:09:03,307][24347] Updated weights for policy 0, policy_version 42038 (0.0027)
[2024-06-06 16:09:06,472][24347] Updated weights for policy 0, policy_version 42048 (0.0038)
[2024-06-06 16:09:07,318][24114] Fps is (10 sec: 45875.5, 60 sec: 44509.9, 300 sec: 44486.7). Total num frames: 688930816. Throughput: 0: 44340.6. Samples: 170183160. Policy #0 lag: (min: 0.0, avg: 10.2, max: 22.0)
[2024-06-06 16:09:07,318][24114] Avg episode reward: [(0, '0.248')]
[2024-06-06 16:09:10,534][24347] Updated weights for policy 0, policy_version 42058 (0.0032)
[2024-06-06 16:09:12,318][24114] Fps is (10 sec: 42598.1, 60 sec: 44236.8, 300 sec: 44431.2). Total num frames: 689127424. Throughput: 0: 44550.4. Samples: 170453140. Policy #0 lag: (min: 0.0, avg: 10.2, max: 22.0)
[2024-06-06 16:09:12,318][24114] Avg episode reward: [(0, '0.247')]
[2024-06-06 16:09:13,881][24347] Updated weights for policy 0, policy_version 42068 (0.0022)
[2024-06-06 16:09:17,318][24114] Fps is (10 sec: 44237.3, 60 sec: 44509.9, 300 sec: 44597.8). Total num frames: 689373184. Throughput: 0: 44663.3. Samples: 170584060. Policy #0 lag: (min: 0.0, avg: 10.2, max: 22.0)
[2024-06-06 16:09:17,318][24114] Avg episode reward: [(0, '0.250')]
[2024-06-06 16:09:17,530][24347] Updated weights for policy 0, policy_version 42078 (0.0025)
[2024-06-06 16:09:21,247][24347] Updated weights for policy 0, policy_version 42088 (0.0034)
[2024-06-06 16:09:22,318][24114] Fps is (10 sec: 49152.0, 60 sec: 44510.0, 300 sec: 44597.8). Total num frames: 689618944. Throughput: 0: 44673.1. Samples: 170858320. Policy #0 lag: (min: 0.0, avg: 10.8, max: 21.0)
[2024-06-06 16:09:22,318][24114] Avg episode reward: [(0, '0.249')]
[2024-06-06 16:09:25,011][24347] Updated weights for policy 0, policy_version 42098 (0.0027)
[2024-06-06 16:09:27,318][24114] Fps is (10 sec: 44236.3, 60 sec: 44787.4, 300 sec: 44486.7). Total num frames: 689815552. Throughput: 0: 44828.4. Samples: 171128180. Policy #0 lag: (min: 0.0, avg: 10.8, max: 21.0)
[2024-06-06 16:09:27,318][24114] Avg episode reward: [(0, '0.236')]
[2024-06-06 16:09:28,302][24347] Updated weights for policy 0, policy_version 42108 (0.0036)
[2024-06-06 16:09:32,318][24114] Fps is (10 sec: 40959.8, 60 sec: 44509.9, 300 sec: 44542.3). Total num frames: 690028544. Throughput: 0: 44820.2. Samples: 171255900. Policy #0 lag: (min: 0.0, avg: 10.8, max: 21.0)
[2024-06-06 16:09:32,318][24114] Avg episode reward: [(0, '0.247')]
[2024-06-06 16:09:32,712][24347] Updated weights for policy 0, policy_version 42118 (0.0020)
[2024-06-06 16:09:35,883][24347] Updated weights for policy 0, policy_version 42128 (0.0028)
[2024-06-06 16:09:37,318][24114] Fps is (10 sec: 45875.1, 60 sec: 44238.2, 300 sec: 44542.3). Total num frames: 690274304. Throughput: 0: 44579.2. Samples: 171522100. Policy #0 lag: (min: 0.0, avg: 10.8, max: 21.0)
[2024-06-06 16:09:37,318][24114] Avg episode reward: [(0, '0.248')]
[2024-06-06 16:09:39,762][24347] Updated weights for policy 0, policy_version 42138 (0.0041)
[2024-06-06 16:09:42,318][24114] Fps is (10 sec: 44236.7, 60 sec: 44782.9, 300 sec: 44486.7). Total num frames: 690470912. Throughput: 0: 44626.3. Samples: 171798340. Policy #0 lag: (min: 0.0, avg: 10.8, max: 21.0)
[2024-06-06 16:09:42,319][24114] Avg episode reward: [(0, '0.245')]
[2024-06-06 16:09:43,318][24347] Updated weights for policy 0, policy_version 42148 (0.0037)
[2024-06-06 16:09:46,931][24347] Updated weights for policy 0, policy_version 42158 (0.0034)
[2024-06-06 16:09:47,318][24114] Fps is (10 sec: 44237.0, 60 sec: 44787.5, 300 sec: 44598.1). Total num frames: 690716672. Throughput: 0: 44760.4. Samples: 171927260. Policy #0 lag: (min: 0.0, avg: 8.8, max: 21.0)
[2024-06-06 16:09:47,318][24114] Avg episode reward: [(0, '0.246')]
[2024-06-06 16:09:50,539][24347] Updated weights for policy 0, policy_version 42168 (0.0035)
[2024-06-06 16:09:52,318][24114] Fps is (10 sec: 47513.3, 60 sec: 44783.0, 300 sec: 44597.8). Total num frames: 690946048. Throughput: 0: 44691.9. Samples: 172194300. Policy #0 lag: (min: 0.0, avg: 8.8, max: 21.0)
[2024-06-06 16:09:52,319][24114] Avg episode reward: [(0, '0.248')]
[2024-06-06 16:09:53,812][24326] Signal inference workers to stop experience collection... (2500 times)
[2024-06-06 16:09:53,812][24326] Signal inference workers to resume experience collection... (2500 times)
[2024-06-06 16:09:53,828][24347] InferenceWorker_p0-w0: stopping experience collection (2500 times)
[2024-06-06 16:09:53,828][24347] InferenceWorker_p0-w0: resuming experience collection (2500 times)
[2024-06-06 16:09:54,505][24347] Updated weights for policy 0, policy_version 42178 (0.0035)
[2024-06-06 16:09:57,318][24114] Fps is (10 sec: 44236.5, 60 sec: 44783.0, 300 sec: 44486.7). Total num frames: 691159040. Throughput: 0: 44727.9. Samples: 172465900. Policy #0 lag: (min: 0.0, avg: 8.8, max: 21.0)
[2024-06-06 16:09:57,319][24114] Avg episode reward: [(0, '0.242')]
[2024-06-06 16:09:57,969][24347] Updated weights for policy 0, policy_version 42188 (0.0038)
[2024-06-06 16:10:02,163][24347] Updated weights for policy 0, policy_version 42198 (0.0036)
[2024-06-06 16:10:02,318][24114] Fps is (10 sec: 44236.8, 60 sec: 44782.8, 300 sec: 44542.3). Total num frames: 691388416. Throughput: 0: 44840.7. Samples: 172601900. Policy #0 lag: (min: 0.0, avg: 8.8, max: 21.0)
[2024-06-06 16:10:02,319][24114] Avg episode reward: [(0, '0.254')]
[2024-06-06 16:10:05,516][24347] Updated weights for policy 0, policy_version 42208 (0.0035)
[2024-06-06 16:10:07,318][24114] Fps is (10 sec: 44236.9, 60 sec: 44509.9, 300 sec: 44542.3). Total num frames: 691601408. Throughput: 0: 44572.9. Samples: 172864100. Policy #0 lag: (min: 0.0, avg: 8.8, max: 21.0)
[2024-06-06 16:10:07,324][24114] Avg episode reward: [(0, '0.250')]
[2024-06-06 16:10:09,246][24347] Updated weights for policy 0, policy_version 42218 (0.0026)
[2024-06-06 16:10:12,318][24114] Fps is (10 sec: 42598.6, 60 sec: 44782.9, 300 sec: 44431.2). Total num frames: 691814400. Throughput: 0: 44588.8. Samples: 173134680. Policy #0 lag: (min: 0.0, avg: 8.8, max: 21.0)
[2024-06-06 16:10:12,319][24114] Avg episode reward: [(0, '0.248')]
[2024-06-06 16:10:12,666][24347] Updated weights for policy 0, policy_version 42228 (0.0028)
[2024-06-06 16:10:16,694][24347] Updated weights for policy 0, policy_version 42238 (0.0044)
[2024-06-06 16:10:17,320][24114] Fps is (10 sec: 44228.3, 60 sec: 44508.3, 300 sec: 44487.3). Total num frames: 692043776. Throughput: 0: 44599.9. Samples: 173262980. Policy #0 lag: (min: 2.0, avg: 11.2, max: 21.0)
[2024-06-06 16:10:17,321][24114] Avg episode reward: [(0, '0.239')]
[2024-06-06 16:10:19,986][24347] Updated weights for policy 0, policy_version 42248 (0.0028)
[2024-06-06 16:10:22,318][24114] Fps is (10 sec: 45875.5, 60 sec: 44236.8, 300 sec: 44542.3). Total num frames: 692273152. Throughput: 0: 44466.7. Samples: 173523100. Policy #0 lag: (min: 2.0, avg: 11.2, max: 21.0)
[2024-06-06 16:10:22,318][24114] Avg episode reward: [(0, '0.251')]
[2024-06-06 16:10:22,428][24326] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000042254_692289536.pth...
[2024-06-06 16:10:22,483][24326] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000041601_681590784.pth
[2024-06-06 16:10:24,405][24347] Updated weights for policy 0, policy_version 42258 (0.0041)
[2024-06-06 16:10:27,318][24114] Fps is (10 sec: 45883.8, 60 sec: 44782.9, 300 sec: 44597.8). Total num frames: 692502528. Throughput: 0: 44304.9. Samples: 173792060. Policy #0 lag: (min: 2.0, avg: 11.2, max: 21.0)
[2024-06-06 16:10:27,319][24114] Avg episode reward: [(0, '0.246')]
[2024-06-06 16:10:27,413][24347] Updated weights for policy 0, policy_version 42268 (0.0037)
[2024-06-06 16:10:31,619][24347] Updated weights for policy 0, policy_version 42278 (0.0021)
[2024-06-06 16:10:32,318][24114] Fps is (10 sec: 44235.9, 60 sec: 44782.8, 300 sec: 44487.3). Total num frames: 692715520. Throughput: 0: 44406.0. Samples: 173925540. Policy #0 lag: (min: 2.0, avg: 11.2, max: 21.0)
[2024-06-06 16:10:32,319][24114] Avg episode reward: [(0, '0.247')]
[2024-06-06 16:10:35,219][24347] Updated weights for policy 0, policy_version 42288 (0.0023)
[2024-06-06 16:10:37,318][24114] Fps is (10 sec: 42598.3, 60 sec: 44236.8, 300 sec: 44542.3). Total num frames: 692928512. Throughput: 0: 44350.2. Samples: 174190060. Policy #0 lag: (min: 2.0, avg: 11.2, max: 21.0)
[2024-06-06 16:10:37,319][24114] Avg episode reward: [(0, '0.249')]
[2024-06-06 16:10:38,805][24347] Updated weights for policy 0, policy_version 42298 (0.0034)
[2024-06-06 16:10:42,184][24347] Updated weights for policy 0, policy_version 42308 (0.0044)
[2024-06-06 16:10:42,318][24114] Fps is (10 sec: 45874.7, 60 sec: 45055.8, 300 sec: 44542.2). Total num frames: 693174272. Throughput: 0: 44414.9. Samples: 174464580. Policy #0 lag: (min: 1.0, avg: 11.5, max: 24.0)
[2024-06-06 16:10:42,319][24114] Avg episode reward: [(0, '0.242')]
[2024-06-06 16:10:46,199][24347] Updated weights for policy 0, policy_version 42318 (0.0032)
[2024-06-06 16:10:47,318][24114] Fps is (10 sec: 44237.0, 60 sec: 44236.8, 300 sec: 44486.7). Total num frames: 693370880. Throughput: 0: 44462.7. Samples: 174602720. Policy #0 lag: (min: 1.0, avg: 11.5, max: 24.0)
[2024-06-06 16:10:47,318][24114] Avg episode reward: [(0, '0.250')]
[2024-06-06 16:10:49,381][24347] Updated weights for policy 0, policy_version 42328 (0.0030)
[2024-06-06 16:10:52,318][24114] Fps is (10 sec: 42599.4, 60 sec: 44236.8, 300 sec: 44486.7). Total num frames: 693600256. Throughput: 0: 44400.0. Samples: 174862100. Policy #0 lag: (min: 1.0, avg: 11.5, max: 24.0)
[2024-06-06 16:10:52,319][24114] Avg episode reward: [(0, '0.244')]
[2024-06-06 16:10:54,009][24347] Updated weights for policy 0, policy_version 42338 (0.0039)
[2024-06-06 16:10:56,865][24347] Updated weights for policy 0, policy_version 42348 (0.0038)
[2024-06-06 16:10:56,886][24326] Signal inference workers to stop experience collection... (2550 times)
[2024-06-06 16:10:56,886][24326] Signal inference workers to resume experience collection... (2550 times)
[2024-06-06 16:10:56,924][24347] InferenceWorker_p0-w0: stopping experience collection (2550 times)
[2024-06-06 16:10:56,924][24347] InferenceWorker_p0-w0: resuming experience collection (2550 times)
[2024-06-06 16:10:57,318][24114] Fps is (10 sec: 47513.5, 60 sec: 44782.9, 300 sec: 44542.3). Total num frames: 693846016. Throughput: 0: 44377.3. Samples: 175131660. Policy #0 lag: (min: 1.0, avg: 11.5, max: 24.0)
[2024-06-06 16:10:57,319][24114] Avg episode reward: [(0, '0.236')]
[2024-06-06 16:11:01,078][24347] Updated weights for policy 0, policy_version 42358 (0.0037)
[2024-06-06 16:11:02,318][24114] Fps is (10 sec: 44236.4, 60 sec: 44236.8, 300 sec: 44431.2). Total num frames: 694042624. Throughput: 0: 44655.1. Samples: 175272380. Policy #0 lag: (min: 1.0, avg: 11.5, max: 24.0)
[2024-06-06 16:11:02,319][24114] Avg episode reward: [(0, '0.238')]
[2024-06-06 16:11:04,411][24347] Updated weights for policy 0, policy_version 42368 (0.0022)
[2024-06-06 16:11:07,318][24114] Fps is (10 sec: 40960.2, 60 sec: 44236.8, 300 sec: 44597.8). Total num frames: 694255616. Throughput: 0: 44573.7. Samples: 175528920. Policy #0 lag: (min: 0.0, avg: 11.4, max: 21.0)
[2024-06-06 16:11:07,318][24114] Avg episode reward: [(0, '0.252')]
[2024-06-06 16:11:08,198][24347] Updated weights for policy 0, policy_version 42378 (0.0038)
[2024-06-06 16:11:11,761][24347] Updated weights for policy 0, policy_version 42388 (0.0038)
[2024-06-06 16:11:12,318][24114] Fps is (10 sec: 47514.7, 60 sec: 45056.1, 300 sec: 44542.3). Total num frames: 694517760. Throughput: 0: 44613.5. Samples: 175799660. Policy #0 lag: (min: 0.0, avg: 11.4, max: 21.0)
[2024-06-06 16:11:12,318][24114] Avg episode reward: [(0, '0.239')]
[2024-06-06 16:11:15,747][24347] Updated weights for policy 0, policy_version 42398 (0.0039)
[2024-06-06 16:11:17,320][24114] Fps is (10 sec: 45866.4, 60 sec: 44509.9, 300 sec: 44486.5). Total num frames: 694714368. Throughput: 0: 44791.2. Samples: 175941220. Policy #0 lag: (min: 0.0, avg: 11.4, max: 21.0)
[2024-06-06 16:11:17,320][24114] Avg episode reward: [(0, '0.252')]
[2024-06-06 16:11:18,953][24347] Updated weights for policy 0, policy_version 42408 (0.0048)
[2024-06-06 16:11:22,318][24114] Fps is (10 sec: 39320.8, 60 sec: 43963.6, 300 sec: 44486.7). Total num frames: 694910976. Throughput: 0: 44826.6. Samples: 176207260. Policy #0 lag: (min: 0.0, avg: 11.4, max: 21.0)
[2024-06-06 16:11:22,318][24114] Avg episode reward: [(0, '0.246')]
[2024-06-06 16:11:23,497][24347] Updated weights for policy 0, policy_version 42418 (0.0031)
[2024-06-06 16:11:26,137][24347] Updated weights for policy 0, policy_version 42428 (0.0032)
[2024-06-06 16:11:27,318][24114] Fps is (10 sec: 45883.9, 60 sec: 44509.9, 300 sec: 44486.7). Total num frames: 695173120. Throughput: 0: 44482.9. Samples: 176466300. Policy #0 lag: (min: 0.0, avg: 11.4, max: 21.0)
[2024-06-06 16:11:27,318][24114] Avg episode reward: [(0, '0.241')]
[2024-06-06 16:11:30,608][24347] Updated weights for policy 0, policy_version 42438 (0.0020)
[2024-06-06 16:11:32,318][24114] Fps is (10 sec: 47514.1, 60 sec: 44510.0, 300 sec: 44542.3). Total num frames: 695386112. Throughput: 0: 44577.8. Samples: 176608720. Policy #0 lag: (min: 0.0, avg: 11.4, max: 21.0)
[2024-06-06 16:11:32,318][24114] Avg episode reward: [(0, '0.241')]
[2024-06-06 16:11:33,770][24347] Updated weights for policy 0, policy_version 42448 (0.0041)
[2024-06-06 16:11:37,318][24114] Fps is (10 sec: 40959.6, 60 sec: 44236.8, 300 sec: 44542.3). Total num frames: 695582720. Throughput: 0: 44552.4. Samples: 176866960. Policy #0 lag: (min: 0.0, avg: 10.4, max: 22.0)
[2024-06-06 16:11:37,318][24114] Avg episode reward: [(0, '0.240')]
[2024-06-06 16:11:37,709][24347] Updated weights for policy 0, policy_version 42458 (0.0026)
[2024-06-06 16:11:40,906][24347] Updated weights for policy 0, policy_version 42468 (0.0039)
[2024-06-06 16:11:42,318][24114] Fps is (10 sec: 47513.6, 60 sec: 44783.1, 300 sec: 44542.3). Total num frames: 695861248. Throughput: 0: 44493.8. Samples: 177133880. Policy #0 lag: (min: 0.0, avg: 10.4, max: 22.0)
[2024-06-06 16:11:42,319][24114] Avg episode reward: [(0, '0.234')]
[2024-06-06 16:11:45,277][24347] Updated weights for policy 0, policy_version 42478 (0.0030)
[2024-06-06 16:11:47,318][24114] Fps is (10 sec: 47514.0, 60 sec: 44782.9, 300 sec: 44542.2). Total num frames: 696057856. Throughput: 0: 44644.1. Samples: 177281360. Policy #0 lag: (min: 0.0, avg: 10.4, max: 22.0)
[2024-06-06 16:11:47,319][24114] Avg episode reward: [(0, '0.228')]
[2024-06-06 16:11:48,196][24347] Updated weights for policy 0, policy_version 42488 (0.0036)
[2024-06-06 16:11:52,318][24114] Fps is (10 sec: 39321.6, 60 sec: 44236.8, 300 sec: 44542.3). Total num frames: 696254464. Throughput: 0: 44770.2. Samples: 177543580. Policy #0 lag: (min: 0.0, avg: 10.4, max: 22.0)
[2024-06-06 16:11:52,327][24114] Avg episode reward: [(0, '0.235')]
[2024-06-06 16:11:52,933][24347] Updated weights for policy 0, policy_version 42498 (0.0032)
[2024-06-06 16:11:55,597][24347] Updated weights for policy 0, policy_version 42508 (0.0046)
[2024-06-06 16:11:57,318][24114] Fps is (10 sec: 47513.0, 60 sec: 44782.9, 300 sec: 44653.3). Total num frames: 696532992. Throughput: 0: 44615.3. Samples: 177807360. Policy #0 lag: (min: 0.0, avg: 10.4, max: 22.0)
[2024-06-06 16:11:57,319][24114] Avg episode reward: [(0, '0.246')]
[2024-06-06 16:12:00,363][24347] Updated weights for policy 0, policy_version 42518 (0.0051)
[2024-06-06 16:12:02,318][24114] Fps is (10 sec: 45868.0, 60 sec: 44508.8, 300 sec: 44542.3).
Total num frames: 696713216. Throughput: 0: 44470.1. Samples: 177942360. Policy #0 lag: (min: 0.0, avg: 7.8, max: 22.0) [2024-06-06 16:12:02,320][24114] Avg episode reward: [(0, '0.241')] [2024-06-06 16:12:03,102][24347] Updated weights for policy 0, policy_version 42528 (0.0029) [2024-06-06 16:12:07,264][24326] Signal inference workers to stop experience collection... (2600 times) [2024-06-06 16:12:07,264][24326] Signal inference workers to resume experience collection... (2600 times) [2024-06-06 16:12:07,289][24347] InferenceWorker_p0-w0: stopping experience collection (2600 times) [2024-06-06 16:12:07,289][24347] InferenceWorker_p0-w0: resuming experience collection (2600 times) [2024-06-06 16:12:07,318][24114] Fps is (10 sec: 39322.3, 60 sec: 44509.9, 300 sec: 44542.3). Total num frames: 696926208. Throughput: 0: 44403.7. Samples: 178205420. Policy #0 lag: (min: 0.0, avg: 7.8, max: 22.0) [2024-06-06 16:12:07,318][24114] Avg episode reward: [(0, '0.244')] [2024-06-06 16:12:07,407][24347] Updated weights for policy 0, policy_version 42538 (0.0025) [2024-06-06 16:12:10,459][24347] Updated weights for policy 0, policy_version 42548 (0.0044) [2024-06-06 16:12:12,318][24114] Fps is (10 sec: 47520.5, 60 sec: 44509.7, 300 sec: 44597.8). Total num frames: 697188352. Throughput: 0: 44529.7. Samples: 178470140. Policy #0 lag: (min: 0.0, avg: 7.8, max: 22.0) [2024-06-06 16:12:12,319][24114] Avg episode reward: [(0, '0.241')] [2024-06-06 16:12:14,949][24347] Updated weights for policy 0, policy_version 42558 (0.0027) [2024-06-06 16:12:17,318][24114] Fps is (10 sec: 47513.2, 60 sec: 44784.3, 300 sec: 44653.4). Total num frames: 697401344. Throughput: 0: 44652.0. Samples: 178618060. 
Policy #0 lag: (min: 0.0, avg: 7.8, max: 22.0) [2024-06-06 16:12:17,319][24114] Avg episode reward: [(0, '0.238')] [2024-06-06 16:12:17,583][24347] Updated weights for policy 0, policy_version 42568 (0.0036) [2024-06-06 16:12:22,318][24114] Fps is (10 sec: 39322.0, 60 sec: 44509.9, 300 sec: 44542.3). Total num frames: 697581568. Throughput: 0: 44713.4. Samples: 178879060. Policy #0 lag: (min: 0.0, avg: 7.8, max: 22.0) [2024-06-06 16:12:22,318][24114] Avg episode reward: [(0, '0.248')] [2024-06-06 16:12:22,441][24326] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000042578_697597952.pth... [2024-06-06 16:12:22,448][24347] Updated weights for policy 0, policy_version 42578 (0.0031) [2024-06-06 16:12:22,507][24326] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000041928_686948352.pth [2024-06-06 16:12:25,008][24347] Updated weights for policy 0, policy_version 42588 (0.0024) [2024-06-06 16:12:27,322][24114] Fps is (10 sec: 44219.1, 60 sec: 44506.9, 300 sec: 44541.7). Total num frames: 697843712. Throughput: 0: 44531.1. Samples: 179137960. Policy #0 lag: (min: 0.0, avg: 7.8, max: 22.0) [2024-06-06 16:12:27,322][24114] Avg episode reward: [(0, '0.245')] [2024-06-06 16:12:29,881][24347] Updated weights for policy 0, policy_version 42598 (0.0031) [2024-06-06 16:12:32,318][24114] Fps is (10 sec: 49152.0, 60 sec: 44782.9, 300 sec: 44542.3). Total num frames: 698073088. Throughput: 0: 44418.7. Samples: 179280200. Policy #0 lag: (min: 0.0, avg: 11.6, max: 20.0) [2024-06-06 16:12:32,318][24114] Avg episode reward: [(0, '0.249')] [2024-06-06 16:12:32,465][24347] Updated weights for policy 0, policy_version 42608 (0.0042) [2024-06-06 16:12:37,155][24347] Updated weights for policy 0, policy_version 42618 (0.0034) [2024-06-06 16:12:37,318][24114] Fps is (10 sec: 42616.0, 60 sec: 44783.1, 300 sec: 44486.7). Total num frames: 698269696. Throughput: 0: 44458.3. Samples: 179544200. 
Policy #0 lag: (min: 0.0, avg: 11.6, max: 20.0) [2024-06-06 16:12:37,327][24114] Avg episode reward: [(0, '0.254')] [2024-06-06 16:12:39,861][24347] Updated weights for policy 0, policy_version 42628 (0.0029) [2024-06-06 16:12:42,320][24114] Fps is (10 sec: 44228.4, 60 sec: 44235.4, 300 sec: 44597.5). Total num frames: 698515456. Throughput: 0: 44512.0. Samples: 179810480. Policy #0 lag: (min: 0.0, avg: 11.6, max: 20.0) [2024-06-06 16:12:42,320][24114] Avg episode reward: [(0, '0.251')] [2024-06-06 16:12:44,599][24347] Updated weights for policy 0, policy_version 42638 (0.0032) [2024-06-06 16:12:47,115][24347] Updated weights for policy 0, policy_version 42648 (0.0028) [2024-06-06 16:12:47,318][24114] Fps is (10 sec: 47513.5, 60 sec: 44783.0, 300 sec: 44597.8). Total num frames: 698744832. Throughput: 0: 44673.6. Samples: 179952600. Policy #0 lag: (min: 0.0, avg: 11.6, max: 20.0) [2024-06-06 16:12:47,318][24114] Avg episode reward: [(0, '0.248')] [2024-06-06 16:12:51,845][24347] Updated weights for policy 0, policy_version 42658 (0.0033) [2024-06-06 16:12:52,318][24114] Fps is (10 sec: 40968.0, 60 sec: 44509.9, 300 sec: 44486.7). Total num frames: 698925056. Throughput: 0: 44669.3. Samples: 180215540. Policy #0 lag: (min: 0.0, avg: 11.6, max: 20.0) [2024-06-06 16:12:52,318][24114] Avg episode reward: [(0, '0.242')] [2024-06-06 16:12:54,520][24347] Updated weights for policy 0, policy_version 42668 (0.0029) [2024-06-06 16:12:57,318][24114] Fps is (10 sec: 42598.1, 60 sec: 43963.8, 300 sec: 44542.3). Total num frames: 699170816. Throughput: 0: 44493.0. Samples: 180472320. Policy #0 lag: (min: 0.0, avg: 10.2, max: 22.0) [2024-06-06 16:12:57,318][24114] Avg episode reward: [(0, '0.249')] [2024-06-06 16:12:59,293][24347] Updated weights for policy 0, policy_version 42678 (0.0033) [2024-06-06 16:13:02,025][24347] Updated weights for policy 0, policy_version 42688 (0.0029) [2024-06-06 16:13:02,318][24114] Fps is (10 sec: 47513.3, 60 sec: 44784.1, 300 sec: 44542.3). 
Total num frames: 699400192. Throughput: 0: 44185.4. Samples: 180606400. Policy #0 lag: (min: 0.0, avg: 10.2, max: 22.0) [2024-06-06 16:13:02,318][24114] Avg episode reward: [(0, '0.248')] [2024-06-06 16:13:06,696][24347] Updated weights for policy 0, policy_version 42698 (0.0031) [2024-06-06 16:13:07,318][24114] Fps is (10 sec: 44236.9, 60 sec: 44782.9, 300 sec: 44542.3). Total num frames: 699613184. Throughput: 0: 44394.7. Samples: 180876820. Policy #0 lag: (min: 0.0, avg: 10.2, max: 22.0) [2024-06-06 16:13:07,318][24114] Avg episode reward: [(0, '0.249')] [2024-06-06 16:13:09,742][24347] Updated weights for policy 0, policy_version 42708 (0.0023) [2024-06-06 16:13:11,839][24326] Signal inference workers to stop experience collection... (2650 times) [2024-06-06 16:13:11,839][24326] Signal inference workers to resume experience collection... (2650 times) [2024-06-06 16:13:11,850][24347] InferenceWorker_p0-w0: stopping experience collection (2650 times) [2024-06-06 16:13:11,850][24347] InferenceWorker_p0-w0: resuming experience collection (2650 times) [2024-06-06 16:13:12,318][24114] Fps is (10 sec: 44236.6, 60 sec: 44236.8, 300 sec: 44542.2). Total num frames: 699842560. Throughput: 0: 44620.8. Samples: 181145720. Policy #0 lag: (min: 0.0, avg: 10.2, max: 22.0) [2024-06-06 16:13:12,319][24114] Avg episode reward: [(0, '0.247')] [2024-06-06 16:13:13,877][24347] Updated weights for policy 0, policy_version 42718 (0.0032) [2024-06-06 16:13:16,772][24347] Updated weights for policy 0, policy_version 42728 (0.0036) [2024-06-06 16:13:17,318][24114] Fps is (10 sec: 45875.3, 60 sec: 44509.9, 300 sec: 44486.7). Total num frames: 700071936. Throughput: 0: 44392.5. Samples: 181277860. 
Policy #0 lag: (min: 0.0, avg: 10.2, max: 22.0) [2024-06-06 16:13:17,318][24114] Avg episode reward: [(0, '0.251')] [2024-06-06 16:13:21,409][24347] Updated weights for policy 0, policy_version 42738 (0.0034) [2024-06-06 16:13:22,320][24114] Fps is (10 sec: 45866.5, 60 sec: 45327.6, 300 sec: 44653.9). Total num frames: 700301312. Throughput: 0: 44633.5. Samples: 181552800. Policy #0 lag: (min: 1.0, avg: 7.6, max: 22.0) [2024-06-06 16:13:22,321][24114] Avg episode reward: [(0, '0.252')] [2024-06-06 16:13:24,128][24347] Updated weights for policy 0, policy_version 42748 (0.0040) [2024-06-06 16:13:27,318][24114] Fps is (10 sec: 42597.9, 60 sec: 44239.7, 300 sec: 44542.3). Total num frames: 700497920. Throughput: 0: 44499.2. Samples: 181812860. Policy #0 lag: (min: 1.0, avg: 7.6, max: 22.0) [2024-06-06 16:13:27,318][24114] Avg episode reward: [(0, '0.252')] [2024-06-06 16:13:28,807][24347] Updated weights for policy 0, policy_version 42758 (0.0033) [2024-06-06 16:13:31,385][24347] Updated weights for policy 0, policy_version 42768 (0.0026) [2024-06-06 16:13:32,318][24114] Fps is (10 sec: 45884.0, 60 sec: 44782.9, 300 sec: 44542.6). Total num frames: 700760064. Throughput: 0: 44149.7. Samples: 181939340. Policy #0 lag: (min: 1.0, avg: 7.6, max: 22.0) [2024-06-06 16:13:32,318][24114] Avg episode reward: [(0, '0.242')] [2024-06-06 16:13:35,844][24347] Updated weights for policy 0, policy_version 42778 (0.0026) [2024-06-06 16:13:37,320][24114] Fps is (10 sec: 47504.7, 60 sec: 45054.5, 300 sec: 44708.6). Total num frames: 700973056. Throughput: 0: 44511.4. Samples: 182218640. Policy #0 lag: (min: 1.0, avg: 7.6, max: 22.0) [2024-06-06 16:13:37,320][24114] Avg episode reward: [(0, '0.246')] [2024-06-06 16:13:39,011][24347] Updated weights for policy 0, policy_version 42788 (0.0035) [2024-06-06 16:13:42,318][24114] Fps is (10 sec: 39321.0, 60 sec: 43965.0, 300 sec: 44487.6). Total num frames: 701153280. Throughput: 0: 44717.1. Samples: 182484600. 
Policy #0 lag: (min: 1.0, avg: 7.6, max: 22.0) [2024-06-06 16:13:42,318][24114] Avg episode reward: [(0, '0.254')] [2024-06-06 16:13:43,144][24347] Updated weights for policy 0, policy_version 42798 (0.0046) [2024-06-06 16:13:46,470][24347] Updated weights for policy 0, policy_version 42808 (0.0038) [2024-06-06 16:13:47,318][24114] Fps is (10 sec: 44245.5, 60 sec: 44509.8, 300 sec: 44597.8). Total num frames: 701415424. Throughput: 0: 44599.6. Samples: 182613380. Policy #0 lag: (min: 1.0, avg: 7.6, max: 22.0) [2024-06-06 16:13:47,318][24114] Avg episode reward: [(0, '0.251')] [2024-06-06 16:13:50,846][24347] Updated weights for policy 0, policy_version 42818 (0.0028) [2024-06-06 16:13:52,318][24114] Fps is (10 sec: 47514.3, 60 sec: 45055.9, 300 sec: 44597.8). Total num frames: 701628416. Throughput: 0: 44591.5. Samples: 182883440. Policy #0 lag: (min: 0.0, avg: 11.2, max: 22.0) [2024-06-06 16:13:52,319][24114] Avg episode reward: [(0, '0.256')] [2024-06-06 16:13:53,757][24347] Updated weights for policy 0, policy_version 42828 (0.0026) [2024-06-06 16:13:57,318][24114] Fps is (10 sec: 40959.4, 60 sec: 44236.7, 300 sec: 44486.7). Total num frames: 701825024. Throughput: 0: 44428.4. Samples: 183145000. Policy #0 lag: (min: 0.0, avg: 11.2, max: 22.0) [2024-06-06 16:13:57,319][24114] Avg episode reward: [(0, '0.253')] [2024-06-06 16:13:58,162][24347] Updated weights for policy 0, policy_version 42838 (0.0031) [2024-06-06 16:14:01,112][24347] Updated weights for policy 0, policy_version 42848 (0.0030) [2024-06-06 16:14:02,318][24114] Fps is (10 sec: 44236.7, 60 sec: 44509.8, 300 sec: 44542.3). Total num frames: 702070784. Throughput: 0: 44295.0. Samples: 183271140. Policy #0 lag: (min: 0.0, avg: 11.2, max: 22.0) [2024-06-06 16:14:02,319][24114] Avg episode reward: [(0, '0.247')] [2024-06-06 16:14:05,180][24347] Updated weights for policy 0, policy_version 42858 (0.0032) [2024-06-06 16:14:07,318][24114] Fps is (10 sec: 47514.4, 60 sec: 44782.9, 300 sec: 44653.3). 
Total num frames: 702300160. Throughput: 0: 44280.2. Samples: 183545320. Policy #0 lag: (min: 0.0, avg: 11.2, max: 22.0) [2024-06-06 16:14:07,318][24114] Avg episode reward: [(0, '0.250')] [2024-06-06 16:14:08,479][24347] Updated weights for policy 0, policy_version 42868 (0.0044) [2024-06-06 16:14:12,318][24114] Fps is (10 sec: 40960.4, 60 sec: 43963.8, 300 sec: 44431.2). Total num frames: 702480384. Throughput: 0: 44541.9. Samples: 183817240. Policy #0 lag: (min: 0.0, avg: 11.2, max: 22.0) [2024-06-06 16:14:12,318][24114] Avg episode reward: [(0, '0.258')] [2024-06-06 16:14:12,628][24347] Updated weights for policy 0, policy_version 42878 (0.0043) [2024-06-06 16:14:15,849][24347] Updated weights for policy 0, policy_version 42888 (0.0032) [2024-06-06 16:14:17,318][24114] Fps is (10 sec: 45874.9, 60 sec: 44782.9, 300 sec: 44542.3). Total num frames: 702758912. Throughput: 0: 44513.8. Samples: 183942460. Policy #0 lag: (min: 0.0, avg: 10.8, max: 22.0) [2024-06-06 16:14:17,318][24114] Avg episode reward: [(0, '0.250')] [2024-06-06 16:14:20,215][24347] Updated weights for policy 0, policy_version 42898 (0.0038) [2024-06-06 16:14:22,320][24114] Fps is (10 sec: 49142.1, 60 sec: 44509.8, 300 sec: 44597.5). Total num frames: 702971904. Throughput: 0: 44243.9. Samples: 184209620. Policy #0 lag: (min: 0.0, avg: 10.8, max: 22.0) [2024-06-06 16:14:22,321][24114] Avg episode reward: [(0, '0.248')] [2024-06-06 16:14:22,327][24326] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000042906_702971904.pth... [2024-06-06 16:14:22,382][24326] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000042254_692289536.pth [2024-06-06 16:14:23,573][24347] Updated weights for policy 0, policy_version 42908 (0.0039) [2024-06-06 16:14:27,318][24114] Fps is (10 sec: 37683.5, 60 sec: 43963.8, 300 sec: 44431.2). Total num frames: 703135744. Throughput: 0: 44354.9. Samples: 184480560. 
Policy #0 lag: (min: 0.0, avg: 10.8, max: 22.0) [2024-06-06 16:14:27,318][24114] Avg episode reward: [(0, '0.247')] [2024-06-06 16:14:27,544][24347] Updated weights for policy 0, policy_version 42918 (0.0033) [2024-06-06 16:14:30,704][24347] Updated weights for policy 0, policy_version 42928 (0.0034) [2024-06-06 16:14:32,318][24114] Fps is (10 sec: 42607.1, 60 sec: 43963.8, 300 sec: 44486.7). Total num frames: 703397888. Throughput: 0: 44220.4. Samples: 184603300. Policy #0 lag: (min: 0.0, avg: 10.8, max: 22.0) [2024-06-06 16:14:32,318][24114] Avg episode reward: [(0, '0.253')] [2024-06-06 16:14:33,927][24326] Signal inference workers to stop experience collection... (2700 times) [2024-06-06 16:14:33,928][24326] Signal inference workers to resume experience collection... (2700 times) [2024-06-06 16:14:33,960][24347] InferenceWorker_p0-w0: stopping experience collection (2700 times) [2024-06-06 16:14:33,960][24347] InferenceWorker_p0-w0: resuming experience collection (2700 times) [2024-06-06 16:14:34,610][24347] Updated weights for policy 0, policy_version 42938 (0.0026) [2024-06-06 16:14:37,324][24114] Fps is (10 sec: 49122.5, 60 sec: 44233.8, 300 sec: 44596.9). Total num frames: 703627264. Throughput: 0: 44265.3. Samples: 184875640. Policy #0 lag: (min: 0.0, avg: 10.8, max: 22.0) [2024-06-06 16:14:37,325][24114] Avg episode reward: [(0, '0.250')] [2024-06-06 16:14:38,065][24347] Updated weights for policy 0, policy_version 42948 (0.0040) [2024-06-06 16:14:42,140][24347] Updated weights for policy 0, policy_version 42958 (0.0034) [2024-06-06 16:14:42,318][24114] Fps is (10 sec: 42598.0, 60 sec: 44510.0, 300 sec: 44431.2). Total num frames: 703823872. Throughput: 0: 44570.3. Samples: 185150660. 
Policy #0 lag: (min: 0.0, avg: 10.8, max: 22.0) [2024-06-06 16:14:42,319][24114] Avg episode reward: [(0, '0.254')] [2024-06-06 16:14:45,467][24347] Updated weights for policy 0, policy_version 42968 (0.0036) [2024-06-06 16:14:47,318][24114] Fps is (10 sec: 44263.3, 60 sec: 44236.8, 300 sec: 44486.7). Total num frames: 704069632. Throughput: 0: 44505.4. Samples: 185273880. Policy #0 lag: (min: 0.0, avg: 9.8, max: 19.0) [2024-06-06 16:14:47,318][24114] Avg episode reward: [(0, '0.240')] [2024-06-06 16:14:49,783][24347] Updated weights for policy 0, policy_version 42978 (0.0031) [2024-06-06 16:14:52,318][24114] Fps is (10 sec: 47513.5, 60 sec: 44509.8, 300 sec: 44542.3). Total num frames: 704299008. Throughput: 0: 44275.0. Samples: 185537700. Policy #0 lag: (min: 0.0, avg: 9.8, max: 19.0) [2024-06-06 16:14:52,319][24114] Avg episode reward: [(0, '0.247')] [2024-06-06 16:14:53,066][24347] Updated weights for policy 0, policy_version 42988 (0.0036) [2024-06-06 16:14:57,011][24347] Updated weights for policy 0, policy_version 42998 (0.0026) [2024-06-06 16:14:57,318][24114] Fps is (10 sec: 42598.1, 60 sec: 44509.9, 300 sec: 44431.2). Total num frames: 704495616. Throughput: 0: 44319.5. Samples: 185811620. Policy #0 lag: (min: 0.0, avg: 9.8, max: 19.0) [2024-06-06 16:14:57,318][24114] Avg episode reward: [(0, '0.247')] [2024-06-06 16:15:00,247][24347] Updated weights for policy 0, policy_version 43008 (0.0031) [2024-06-06 16:15:02,318][24114] Fps is (10 sec: 42598.8, 60 sec: 44236.9, 300 sec: 44486.7). Total num frames: 704724992. Throughput: 0: 44344.0. Samples: 185937940. Policy #0 lag: (min: 0.0, avg: 9.8, max: 19.0) [2024-06-06 16:15:02,318][24114] Avg episode reward: [(0, '0.252')] [2024-06-06 16:15:04,086][24347] Updated weights for policy 0, policy_version 43018 (0.0031) [2024-06-06 16:15:07,318][24114] Fps is (10 sec: 45875.0, 60 sec: 44236.7, 300 sec: 44542.3). Total num frames: 704954368. Throughput: 0: 44422.8. Samples: 186208560. 
Policy #0 lag: (min: 0.0, avg: 9.8, max: 19.0) [2024-06-06 16:15:07,319][24114] Avg episode reward: [(0, '0.242')] [2024-06-06 16:15:07,468][24347] Updated weights for policy 0, policy_version 43028 (0.0021) [2024-06-06 16:15:11,663][24347] Updated weights for policy 0, policy_version 43038 (0.0030) [2024-06-06 16:15:12,318][24114] Fps is (10 sec: 45874.8, 60 sec: 45055.9, 300 sec: 44542.5). Total num frames: 705183744. Throughput: 0: 44411.4. Samples: 186479080. Policy #0 lag: (min: 0.0, avg: 8.8, max: 21.0) [2024-06-06 16:15:12,318][24114] Avg episode reward: [(0, '0.245')] [2024-06-06 16:15:14,834][24347] Updated weights for policy 0, policy_version 43048 (0.0039) [2024-06-06 16:15:17,318][24114] Fps is (10 sec: 44236.6, 60 sec: 43963.7, 300 sec: 44486.7). Total num frames: 705396736. Throughput: 0: 44628.3. Samples: 186611580. Policy #0 lag: (min: 0.0, avg: 8.8, max: 21.0) [2024-06-06 16:15:17,323][24114] Avg episode reward: [(0, '0.252')] [2024-06-06 16:15:19,147][24347] Updated weights for policy 0, policy_version 43058 (0.0023) [2024-06-06 16:15:22,107][24347] Updated weights for policy 0, policy_version 43068 (0.0046) [2024-06-06 16:15:22,318][24114] Fps is (10 sec: 44236.7, 60 sec: 44238.2, 300 sec: 44486.7). Total num frames: 705626112. Throughput: 0: 44435.1. Samples: 186874960. Policy #0 lag: (min: 0.0, avg: 8.8, max: 21.0) [2024-06-06 16:15:22,319][24114] Avg episode reward: [(0, '0.254')] [2024-06-06 16:15:26,403][24347] Updated weights for policy 0, policy_version 43078 (0.0036) [2024-06-06 16:15:27,318][24114] Fps is (10 sec: 45875.2, 60 sec: 45328.9, 300 sec: 44542.3). Total num frames: 705855488. Throughput: 0: 44372.8. Samples: 187147440. Policy #0 lag: (min: 0.0, avg: 8.8, max: 21.0) [2024-06-06 16:15:27,319][24114] Avg episode reward: [(0, '0.254')] [2024-06-06 16:15:29,365][24347] Updated weights for policy 0, policy_version 43088 (0.0044) [2024-06-06 16:15:32,318][24114] Fps is (10 sec: 42598.9, 60 sec: 44236.8, 300 sec: 44486.7). 
Total num frames: 706052096. Throughput: 0: 44512.0. Samples: 187276920. Policy #0 lag: (min: 0.0, avg: 8.8, max: 21.0) [2024-06-06 16:15:32,318][24114] Avg episode reward: [(0, '0.248')] [2024-06-06 16:15:33,383][24347] Updated weights for policy 0, policy_version 43098 (0.0023) [2024-06-06 16:15:36,822][24347] Updated weights for policy 0, policy_version 43108 (0.0030) [2024-06-06 16:15:37,318][24114] Fps is (10 sec: 44237.4, 60 sec: 44514.3, 300 sec: 44486.8). Total num frames: 706297856. Throughput: 0: 44656.5. Samples: 187547240. Policy #0 lag: (min: 0.0, avg: 8.8, max: 21.0) [2024-06-06 16:15:37,319][24114] Avg episode reward: [(0, '0.246')] [2024-06-06 16:15:40,889][24347] Updated weights for policy 0, policy_version 43118 (0.0029) [2024-06-06 16:15:42,318][24114] Fps is (10 sec: 47513.8, 60 sec: 45056.1, 300 sec: 44597.8). Total num frames: 706527232. Throughput: 0: 44502.8. Samples: 187814240. Policy #0 lag: (min: 0.0, avg: 10.2, max: 21.0) [2024-06-06 16:15:42,318][24114] Avg episode reward: [(0, '0.250')] [2024-06-06 16:15:44,179][24347] Updated weights for policy 0, policy_version 43128 (0.0033) [2024-06-06 16:15:47,318][24114] Fps is (10 sec: 40959.9, 60 sec: 43963.7, 300 sec: 44431.2). Total num frames: 706707456. Throughput: 0: 44663.1. Samples: 187947780. Policy #0 lag: (min: 0.0, avg: 10.2, max: 21.0) [2024-06-06 16:15:47,318][24114] Avg episode reward: [(0, '0.258')] [2024-06-06 16:15:48,657][24347] Updated weights for policy 0, policy_version 43138 (0.0050) [2024-06-06 16:15:51,491][24347] Updated weights for policy 0, policy_version 43148 (0.0039) [2024-06-06 16:15:52,324][24114] Fps is (10 sec: 42573.0, 60 sec: 44232.5, 300 sec: 44430.3). Total num frames: 706953216. Throughput: 0: 44531.1. Samples: 188212720. 
Policy #0 lag: (min: 0.0, avg: 10.2, max: 21.0) [2024-06-06 16:15:52,325][24114] Avg episode reward: [(0, '0.245')] [2024-06-06 16:15:55,778][24347] Updated weights for policy 0, policy_version 43158 (0.0036) [2024-06-06 16:15:56,160][24326] Signal inference workers to stop experience collection... (2750 times) [2024-06-06 16:15:56,161][24326] Signal inference workers to resume experience collection... (2750 times) [2024-06-06 16:15:56,186][24347] InferenceWorker_p0-w0: stopping experience collection (2750 times) [2024-06-06 16:15:56,186][24347] InferenceWorker_p0-w0: resuming experience collection (2750 times) [2024-06-06 16:15:57,320][24114] Fps is (10 sec: 47504.7, 60 sec: 44781.5, 300 sec: 44542.0). Total num frames: 707182592. Throughput: 0: 44318.7. Samples: 188473500. Policy #0 lag: (min: 0.0, avg: 10.2, max: 21.0) [2024-06-06 16:15:57,320][24114] Avg episode reward: [(0, '0.247')] [2024-06-06 16:15:59,056][24347] Updated weights for policy 0, policy_version 43168 (0.0049) [2024-06-06 16:16:02,318][24114] Fps is (10 sec: 42622.8, 60 sec: 44236.6, 300 sec: 44486.7). Total num frames: 707379200. Throughput: 0: 44473.3. Samples: 188612880. Policy #0 lag: (min: 0.0, avg: 10.2, max: 21.0) [2024-06-06 16:16:02,319][24114] Avg episode reward: [(0, '0.253')] [2024-06-06 16:16:02,886][24347] Updated weights for policy 0, policy_version 43178 (0.0028) [2024-06-06 16:16:06,320][24347] Updated weights for policy 0, policy_version 43188 (0.0041) [2024-06-06 16:16:07,318][24114] Fps is (10 sec: 42606.2, 60 sec: 44236.8, 300 sec: 44375.6). Total num frames: 707608576. Throughput: 0: 44538.3. Samples: 188879180. Policy #0 lag: (min: 0.0, avg: 11.2, max: 21.0) [2024-06-06 16:16:07,318][24114] Avg episode reward: [(0, '0.247')] [2024-06-06 16:16:10,587][24347] Updated weights for policy 0, policy_version 43198 (0.0033) [2024-06-06 16:16:12,318][24114] Fps is (10 sec: 47514.8, 60 sec: 44510.0, 300 sec: 44542.6). Total num frames: 707854336. Throughput: 0: 44269.9. 
Samples: 189139580. Policy #0 lag: (min: 0.0, avg: 11.2, max: 21.0) [2024-06-06 16:16:12,318][24114] Avg episode reward: [(0, '0.240')] [2024-06-06 16:16:13,846][24347] Updated weights for policy 0, policy_version 43208 (0.0023) [2024-06-06 16:16:17,318][24114] Fps is (10 sec: 44236.8, 60 sec: 44236.8, 300 sec: 44542.3). Total num frames: 708050944. Throughput: 0: 44550.1. Samples: 189281680. Policy #0 lag: (min: 0.0, avg: 11.2, max: 21.0) [2024-06-06 16:16:17,319][24114] Avg episode reward: [(0, '0.246')] [2024-06-06 16:16:17,825][24347] Updated weights for policy 0, policy_version 43218 (0.0036) [2024-06-06 16:16:21,106][24347] Updated weights for policy 0, policy_version 43228 (0.0032) [2024-06-06 16:16:22,318][24114] Fps is (10 sec: 42598.3, 60 sec: 44236.9, 300 sec: 44431.2). Total num frames: 708280320. Throughput: 0: 44428.9. Samples: 189546540. Policy #0 lag: (min: 0.0, avg: 11.2, max: 21.0) [2024-06-06 16:16:22,318][24114] Avg episode reward: [(0, '0.249')] [2024-06-06 16:16:22,416][24326] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000043231_708296704.pth... [2024-06-06 16:16:22,470][24326] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000042578_697597952.pth [2024-06-06 16:16:25,174][24347] Updated weights for policy 0, policy_version 43238 (0.0028) [2024-06-06 16:16:27,318][24114] Fps is (10 sec: 45875.5, 60 sec: 44236.9, 300 sec: 44486.7). Total num frames: 708509696. Throughput: 0: 44340.4. Samples: 189809560. Policy #0 lag: (min: 0.0, avg: 11.2, max: 21.0) [2024-06-06 16:16:27,318][24114] Avg episode reward: [(0, '0.256')] [2024-06-06 16:16:28,759][24347] Updated weights for policy 0, policy_version 43248 (0.0031) [2024-06-06 16:16:32,254][24347] Updated weights for policy 0, policy_version 43258 (0.0032) [2024-06-06 16:16:32,318][24114] Fps is (10 sec: 45874.9, 60 sec: 44782.9, 300 sec: 44597.8). Total num frames: 708739072. Throughput: 0: 44607.5. Samples: 189955120. 
Policy #0 lag: (min: 0.0, avg: 11.2, max: 21.0)
[2024-06-06 16:16:32,318][24114] Avg episode reward: [(0, '0.253')]
[2024-06-06 16:16:35,975][24347] Updated weights for policy 0, policy_version 43268 (0.0039)
[2024-06-06 16:16:37,318][24114] Fps is (10 sec: 44236.7, 60 sec: 44236.8, 300 sec: 44375.6). Total num frames: 708952064. Throughput: 0: 44724.1. Samples: 190225040. Policy #0 lag: (min: 0.0, avg: 11.0, max: 24.0)
[2024-06-06 16:16:37,319][24114] Avg episode reward: [(0, '0.243')]
[2024-06-06 16:16:39,616][24347] Updated weights for policy 0, policy_version 43278 (0.0032)
[2024-06-06 16:16:42,318][24114] Fps is (10 sec: 45875.0, 60 sec: 44509.8, 300 sec: 44542.3). Total num frames: 709197824. Throughput: 0: 44810.3. Samples: 190489880. Policy #0 lag: (min: 0.0, avg: 11.0, max: 24.0)
[2024-06-06 16:16:42,319][24114] Avg episode reward: [(0, '0.241')]
[2024-06-06 16:16:42,915][24347] Updated weights for policy 0, policy_version 43288 (0.0031)
[2024-06-06 16:16:46,852][24347] Updated weights for policy 0, policy_version 43298 (0.0025)
[2024-06-06 16:16:47,318][24114] Fps is (10 sec: 45875.0, 60 sec: 45056.0, 300 sec: 44597.8). Total num frames: 709410816. Throughput: 0: 44779.2. Samples: 190627940. Policy #0 lag: (min: 0.0, avg: 11.0, max: 24.0)
[2024-06-06 16:16:47,319][24114] Avg episode reward: [(0, '0.254')]
[2024-06-06 16:16:50,379][24347] Updated weights for policy 0, policy_version 43308 (0.0026)
[2024-06-06 16:16:52,318][24114] Fps is (10 sec: 40960.4, 60 sec: 44241.2, 300 sec: 44320.1). Total num frames: 709607424. Throughput: 0: 44719.6. Samples: 190891560. Policy #0 lag: (min: 0.0, avg: 11.0, max: 24.0)
[2024-06-06 16:16:52,318][24114] Avg episode reward: [(0, '0.244')]
[2024-06-06 16:16:54,123][24347] Updated weights for policy 0, policy_version 43318 (0.0033)
[2024-06-06 16:16:57,319][24114] Fps is (10 sec: 44234.4, 60 sec: 44510.8, 300 sec: 44542.4). Total num frames: 709853184. Throughput: 0: 44926.0. Samples: 191161280. Policy #0 lag: (min: 0.0, avg: 11.0, max: 24.0)
[2024-06-06 16:16:57,319][24114] Avg episode reward: [(0, '0.243')]
[2024-06-06 16:16:57,715][24347] Updated weights for policy 0, policy_version 43328 (0.0029)
[2024-06-06 16:17:01,547][24347] Updated weights for policy 0, policy_version 43338 (0.0024)
[2024-06-06 16:17:02,322][24114] Fps is (10 sec: 47496.1, 60 sec: 45053.4, 300 sec: 44597.2). Total num frames: 710082560. Throughput: 0: 44920.4. Samples: 191303260. Policy #0 lag: (min: 0.0, avg: 9.1, max: 22.0)
[2024-06-06 16:17:02,322][24114] Avg episode reward: [(0, '0.251')]
[2024-06-06 16:17:04,955][24347] Updated weights for policy 0, policy_version 43348 (0.0030)
[2024-06-06 16:17:07,318][24114] Fps is (10 sec: 44239.7, 60 sec: 44783.0, 300 sec: 44431.2). Total num frames: 710295552. Throughput: 0: 45025.4. Samples: 191572680. Policy #0 lag: (min: 0.0, avg: 9.1, max: 22.0)
[2024-06-06 16:17:07,318][24114] Avg episode reward: [(0, '0.253')]
[2024-06-06 16:17:08,529][24347] Updated weights for policy 0, policy_version 43358 (0.0033)
[2024-06-06 16:17:12,009][24347] Updated weights for policy 0, policy_version 43368 (0.0046)
[2024-06-06 16:17:12,318][24114] Fps is (10 sec: 45891.6, 60 sec: 44782.8, 300 sec: 44542.3). Total num frames: 710541312. Throughput: 0: 45052.4. Samples: 191836920. Policy #0 lag: (min: 0.0, avg: 9.1, max: 22.0)
[2024-06-06 16:17:12,319][24114] Avg episode reward: [(0, '0.247')]
[2024-06-06 16:17:15,525][24326] Signal inference workers to stop experience collection... (2800 times)
[2024-06-06 16:17:15,526][24326] Signal inference workers to resume experience collection... (2800 times)
[2024-06-06 16:17:15,535][24347] InferenceWorker_p0-w0: stopping experience collection (2800 times)
[2024-06-06 16:17:15,558][24347] InferenceWorker_p0-w0: resuming experience collection (2800 times)
[2024-06-06 16:17:16,102][24347] Updated weights for policy 0, policy_version 43378 (0.0029)
[2024-06-06 16:17:17,318][24114] Fps is (10 sec: 47513.6, 60 sec: 45329.2, 300 sec: 44708.9). Total num frames: 710770688. Throughput: 0: 44861.0. Samples: 191973860. Policy #0 lag: (min: 0.0, avg: 9.1, max: 22.0)
[2024-06-06 16:17:17,318][24114] Avg episode reward: [(0, '0.251')]
[2024-06-06 16:17:19,546][24347] Updated weights for policy 0, policy_version 43388 (0.0034)
[2024-06-06 16:17:22,320][24114] Fps is (10 sec: 40954.1, 60 sec: 44508.7, 300 sec: 44431.6). Total num frames: 710950912. Throughput: 0: 44860.7. Samples: 192243840. Policy #0 lag: (min: 0.0, avg: 9.1, max: 22.0)
[2024-06-06 16:17:22,320][24114] Avg episode reward: [(0, '0.250')]
[2024-06-06 16:17:23,297][24347] Updated weights for policy 0, policy_version 43398 (0.0028)
[2024-06-06 16:17:27,071][24347] Updated weights for policy 0, policy_version 43408 (0.0041)
[2024-06-06 16:17:27,318][24114] Fps is (10 sec: 44236.4, 60 sec: 45056.0, 300 sec: 44542.3). Total num frames: 711213056. Throughput: 0: 44901.4. Samples: 192510440. Policy #0 lag: (min: 0.0, avg: 9.1, max: 22.0)
[2024-06-06 16:17:27,318][24114] Avg episode reward: [(0, '0.249')]
[2024-06-06 16:17:30,609][24347] Updated weights for policy 0, policy_version 43418 (0.0032)
[2024-06-06 16:17:32,318][24114] Fps is (10 sec: 49159.9, 60 sec: 45056.1, 300 sec: 44653.3). Total num frames: 711442432. Throughput: 0: 44825.9. Samples: 192645100. Policy #0 lag: (min: 0.0, avg: 9.7, max: 21.0)
[2024-06-06 16:17:32,318][24114] Avg episode reward: [(0, '0.247')]
[2024-06-06 16:17:34,008][24347] Updated weights for policy 0, policy_version 43428 (0.0033)
[2024-06-06 16:17:37,318][24114] Fps is (10 sec: 42596.7, 60 sec: 44782.6, 300 sec: 44486.9). Total num frames: 711639040. Throughput: 0: 45008.9. Samples: 192916980. Policy #0 lag: (min: 0.0, avg: 9.7, max: 21.0)
[2024-06-06 16:17:37,319][24114] Avg episode reward: [(0, '0.244')]
[2024-06-06 16:17:37,946][24347] Updated weights for policy 0, policy_version 43438 (0.0026)
[2024-06-06 16:17:41,242][24347] Updated weights for policy 0, policy_version 43448 (0.0022)
[2024-06-06 16:17:42,318][24114] Fps is (10 sec: 40960.1, 60 sec: 44236.9, 300 sec: 44431.2). Total num frames: 711852032. Throughput: 0: 44742.0. Samples: 193174640. Policy #0 lag: (min: 0.0, avg: 9.7, max: 21.0)
[2024-06-06 16:17:42,318][24114] Avg episode reward: [(0, '0.245')]
[2024-06-06 16:17:45,622][24347] Updated weights for policy 0, policy_version 43458 (0.0038)
[2024-06-06 16:17:47,318][24114] Fps is (10 sec: 45876.9, 60 sec: 44782.9, 300 sec: 44653.3). Total num frames: 712097792. Throughput: 0: 44583.6. Samples: 193309360. Policy #0 lag: (min: 0.0, avg: 9.7, max: 21.0)
[2024-06-06 16:17:47,319][24114] Avg episode reward: [(0, '0.253')]
[2024-06-06 16:17:48,786][24347] Updated weights for policy 0, policy_version 43468 (0.0031)
[2024-06-06 16:17:52,318][24114] Fps is (10 sec: 45874.8, 60 sec: 45056.0, 300 sec: 44542.3). Total num frames: 712310784. Throughput: 0: 44659.0. Samples: 193582340. Policy #0 lag: (min: 0.0, avg: 9.7, max: 21.0)
[2024-06-06 16:17:52,319][24114] Avg episode reward: [(0, '0.247')]
[2024-06-06 16:17:52,634][24347] Updated weights for policy 0, policy_version 43478 (0.0023)
[2024-06-06 16:17:56,293][24347] Updated weights for policy 0, policy_version 43488 (0.0034)
[2024-06-06 16:17:57,318][24114] Fps is (10 sec: 42598.4, 60 sec: 44510.3, 300 sec: 44486.7). Total num frames: 712523776. Throughput: 0: 44724.9. Samples: 193849540. Policy #0 lag: (min: 0.0, avg: 10.3, max: 21.0)
[2024-06-06 16:17:57,319][24114] Avg episode reward: [(0, '0.252')]
[2024-06-06 16:18:00,062][24347] Updated weights for policy 0, policy_version 43498 (0.0027)
[2024-06-06 16:18:02,318][24114] Fps is (10 sec: 47514.2, 60 sec: 45058.9, 300 sec: 44653.4). Total num frames: 712785920. Throughput: 0: 44604.5. Samples: 193981060. Policy #0 lag: (min: 0.0, avg: 10.3, max: 21.0)
[2024-06-06 16:18:02,318][24114] Avg episode reward: [(0, '0.249')]
[2024-06-06 16:18:03,322][24347] Updated weights for policy 0, policy_version 43508 (0.0046)
[2024-06-06 16:18:07,245][24347] Updated weights for policy 0, policy_version 43518 (0.0024)
[2024-06-06 16:18:07,318][24114] Fps is (10 sec: 47513.0, 60 sec: 45055.8, 300 sec: 44597.8). Total num frames: 712998912. Throughput: 0: 44594.7. Samples: 194250540. Policy #0 lag: (min: 0.0, avg: 10.3, max: 21.0)
[2024-06-06 16:18:07,319][24114] Avg episode reward: [(0, '0.245')]
[2024-06-06 16:18:10,393][24347] Updated weights for policy 0, policy_version 43528 (0.0029)
[2024-06-06 16:18:12,320][24114] Fps is (10 sec: 40951.5, 60 sec: 44235.4, 300 sec: 44486.4). Total num frames: 713195520. Throughput: 0: 44646.5. Samples: 194519620. Policy #0 lag: (min: 0.0, avg: 10.3, max: 21.0)
[2024-06-06 16:18:12,321][24114] Avg episode reward: [(0, '0.245')]
[2024-06-06 16:18:14,893][24347] Updated weights for policy 0, policy_version 43538 (0.0036)
[2024-06-06 16:18:17,318][24114] Fps is (10 sec: 44237.3, 60 sec: 44509.8, 300 sec: 44542.6). Total num frames: 713441280. Throughput: 0: 44513.7. Samples: 194648220. Policy #0 lag: (min: 0.0, avg: 10.3, max: 21.0)
[2024-06-06 16:18:17,318][24114] Avg episode reward: [(0, '0.247')]
[2024-06-06 16:18:18,119][24347] Updated weights for policy 0, policy_version 43548 (0.0024)
[2024-06-06 16:18:21,932][24347] Updated weights for policy 0, policy_version 43558 (0.0031)
[2024-06-06 16:18:22,318][24114] Fps is (10 sec: 45884.2, 60 sec: 45057.2, 300 sec: 44597.8). Total num frames: 713654272. Throughput: 0: 44476.0. Samples: 194918380. Policy #0 lag: (min: 0.0, avg: 10.3, max: 21.0)
[2024-06-06 16:18:22,318][24114] Avg episode reward: [(0, '0.256')]
[2024-06-06 16:18:22,404][24326] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000043559_713670656.pth...
[2024-06-06 16:18:22,463][24326] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000042906_702971904.pth
[2024-06-06 16:18:22,804][24326] Signal inference workers to stop experience collection... (2850 times)
[2024-06-06 16:18:22,856][24347] InferenceWorker_p0-w0: stopping experience collection (2850 times)
[2024-06-06 16:18:22,917][24326] Signal inference workers to resume experience collection... (2850 times)
[2024-06-06 16:18:22,917][24347] InferenceWorker_p0-w0: resuming experience collection (2850 times)
[2024-06-06 16:18:25,823][24347] Updated weights for policy 0, policy_version 43568 (0.0035)
[2024-06-06 16:18:27,318][24114] Fps is (10 sec: 42598.9, 60 sec: 44236.9, 300 sec: 44431.2). Total num frames: 713867264. Throughput: 0: 44802.2. Samples: 195190740. Policy #0 lag: (min: 0.0, avg: 10.9, max: 20.0)
[2024-06-06 16:18:27,318][24114] Avg episode reward: [(0, '0.248')]
[2024-06-06 16:18:29,410][24347] Updated weights for policy 0, policy_version 43578 (0.0028)
[2024-06-06 16:18:32,320][24114] Fps is (10 sec: 45866.2, 60 sec: 44508.4, 300 sec: 44542.3). Total num frames: 714113024. Throughput: 0: 44632.8. Samples: 195317920. Policy #0 lag: (min: 0.0, avg: 10.9, max: 20.0)
[2024-06-06 16:18:32,321][24114] Avg episode reward: [(0, '0.255')]
[2024-06-06 16:18:32,942][24347] Updated weights for policy 0, policy_version 43588 (0.0027)
[2024-06-06 16:18:36,883][24347] Updated weights for policy 0, policy_version 43598 (0.0043)
[2024-06-06 16:18:37,318][24114] Fps is (10 sec: 44235.9, 60 sec: 44510.1, 300 sec: 44597.8). Total num frames: 714309632. Throughput: 0: 44499.9. Samples: 195584840. Policy #0 lag: (min: 0.0, avg: 10.9, max: 20.0)
[2024-06-06 16:18:37,319][24114] Avg episode reward: [(0, '0.248')]
[2024-06-06 16:18:40,184][24347] Updated weights for policy 0, policy_version 43608 (0.0027)
[2024-06-06 16:18:42,318][24114] Fps is (10 sec: 40967.9, 60 sec: 44509.8, 300 sec: 44431.2). Total num frames: 714522624. Throughput: 0: 44573.8. Samples: 195855360. Policy #0 lag: (min: 0.0, avg: 10.9, max: 20.0)
[2024-06-06 16:18:42,318][24114] Avg episode reward: [(0, '0.254')]
[2024-06-06 16:18:44,507][24347] Updated weights for policy 0, policy_version 43618 (0.0029)
[2024-06-06 16:18:47,320][24114] Fps is (10 sec: 47505.0, 60 sec: 44781.5, 300 sec: 44597.5). Total num frames: 714784768. Throughput: 0: 44493.5. Samples: 195983360. Policy #0 lag: (min: 0.0, avg: 10.9, max: 20.0)
[2024-06-06 16:18:47,321][24114] Avg episode reward: [(0, '0.255')]
[2024-06-06 16:18:47,883][24347] Updated weights for policy 0, policy_version 43628 (0.0027)
[2024-06-06 16:18:51,607][24347] Updated weights for policy 0, policy_version 43638 (0.0026)
[2024-06-06 16:18:52,318][24114] Fps is (10 sec: 47513.9, 60 sec: 44783.0, 300 sec: 44653.4). Total num frames: 714997760. Throughput: 0: 44364.2. Samples: 196246920. Policy #0 lag: (min: 0.0, avg: 10.3, max: 23.0)
[2024-06-06 16:18:52,318][24114] Avg episode reward: [(0, '0.250')]
[2024-06-06 16:18:55,686][24347] Updated weights for policy 0, policy_version 43648 (0.0033)
[2024-06-06 16:18:57,318][24114] Fps is (10 sec: 42606.5, 60 sec: 44782.9, 300 sec: 44542.3). Total num frames: 715210752. Throughput: 0: 44488.6. Samples: 196521520. Policy #0 lag: (min: 0.0, avg: 10.3, max: 23.0)
[2024-06-06 16:18:57,319][24114] Avg episode reward: [(0, '0.258')]
[2024-06-06 16:18:59,037][24347] Updated weights for policy 0, policy_version 43658 (0.0025)
[2024-06-06 16:19:02,318][24114] Fps is (10 sec: 44236.3, 60 sec: 44236.7, 300 sec: 44542.2). Total num frames: 715440128. Throughput: 0: 44500.0. Samples: 196650720. Policy #0 lag: (min: 0.0, avg: 10.3, max: 23.0)
[2024-06-06 16:19:02,319][24114] Avg episode reward: [(0, '0.246')]
[2024-06-06 16:19:02,783][24347] Updated weights for policy 0, policy_version 43668 (0.0030)
[2024-06-06 16:19:06,720][24347] Updated weights for policy 0, policy_version 43678 (0.0031)
[2024-06-06 16:19:07,318][24114] Fps is (10 sec: 45875.6, 60 sec: 44510.0, 300 sec: 44708.9). Total num frames: 715669504. Throughput: 0: 44523.1. Samples: 196921920. Policy #0 lag: (min: 0.0, avg: 10.3, max: 23.0)
[2024-06-06 16:19:07,318][24114] Avg episode reward: [(0, '0.252')]
[2024-06-06 16:19:10,000][24347] Updated weights for policy 0, policy_version 43688 (0.0031)
[2024-06-06 16:19:12,318][24114] Fps is (10 sec: 44237.1, 60 sec: 44784.4, 300 sec: 44486.7). Total num frames: 715882496. Throughput: 0: 44341.3. Samples: 197186100. Policy #0 lag: (min: 0.0, avg: 10.3, max: 23.0)
[2024-06-06 16:19:12,319][24114] Avg episode reward: [(0, '0.249')]
[2024-06-06 16:19:13,812][24347] Updated weights for policy 0, policy_version 43698 (0.0044)
[2024-06-06 16:19:17,324][24114] Fps is (10 sec: 42572.8, 60 sec: 44232.4, 300 sec: 44486.1). Total num frames: 716095488. Throughput: 0: 44507.1. Samples: 197320920. Policy #0 lag: (min: 0.0, avg: 10.3, max: 23.0)
[2024-06-06 16:19:17,325][24114] Avg episode reward: [(0, '0.253')]
[2024-06-06 16:19:17,573][24347] Updated weights for policy 0, policy_version 43708 (0.0025)
[2024-06-06 16:19:20,914][24347] Updated weights for policy 0, policy_version 43718 (0.0029)
[2024-06-06 16:19:22,318][24114] Fps is (10 sec: 47513.7, 60 sec: 45056.0, 300 sec: 44820.0). Total num frames: 716357632. Throughput: 0: 44615.7. Samples: 197592540. Policy #0 lag: (min: 0.0, avg: 9.2, max: 21.0)
[2024-06-06 16:19:22,318][24114] Avg episode reward: [(0, '0.258')]
[2024-06-06 16:19:24,600][24347] Updated weights for policy 0, policy_version 43728 (0.0030)
[2024-06-06 16:19:27,318][24114] Fps is (10 sec: 45902.8, 60 sec: 44782.9, 300 sec: 44597.8). Total num frames: 716554240. Throughput: 0: 44535.6. Samples: 197859460. Policy #0 lag: (min: 0.0, avg: 9.2, max: 21.0)
[2024-06-06 16:19:27,318][24114] Avg episode reward: [(0, '0.256')]
[2024-06-06 16:19:28,443][24347] Updated weights for policy 0, policy_version 43738 (0.0031)
[2024-06-06 16:19:32,276][24347] Updated weights for policy 0, policy_version 43748 (0.0043)
[2024-06-06 16:19:32,318][24114] Fps is (10 sec: 40959.9, 60 sec: 44238.2, 300 sec: 44543.2). Total num frames: 716767232. Throughput: 0: 44613.9. Samples: 197990900. Policy #0 lag: (min: 0.0, avg: 9.2, max: 21.0)
[2024-06-06 16:19:32,318][24114] Avg episode reward: [(0, '0.255')]
[2024-06-06 16:19:35,908][24347] Updated weights for policy 0, policy_version 43758 (0.0025)
[2024-06-06 16:19:36,502][24326] Signal inference workers to stop experience collection... (2900 times)
[2024-06-06 16:19:36,550][24347] InferenceWorker_p0-w0: stopping experience collection (2900 times)
[2024-06-06 16:19:36,556][24326] Signal inference workers to resume experience collection... (2900 times)
[2024-06-06 16:19:36,561][24347] InferenceWorker_p0-w0: resuming experience collection (2900 times)
[2024-06-06 16:19:37,324][24114] Fps is (10 sec: 47485.3, 60 sec: 45324.7, 300 sec: 44763.5). Total num frames: 717029376. Throughput: 0: 44868.7. Samples: 198266280. Policy #0 lag: (min: 0.0, avg: 9.2, max: 21.0)
[2024-06-06 16:19:37,324][24114] Avg episode reward: [(0, '0.257')]
[2024-06-06 16:19:39,443][24347] Updated weights for policy 0, policy_version 43768 (0.0036)
[2024-06-06 16:19:42,318][24114] Fps is (10 sec: 45875.7, 60 sec: 45056.1, 300 sec: 44597.8). Total num frames: 717225984. Throughput: 0: 44518.8. Samples: 198524860. Policy #0 lag: (min: 0.0, avg: 9.2, max: 21.0)
[2024-06-06 16:19:42,318][24114] Avg episode reward: [(0, '0.251')]
[2024-06-06 16:19:43,237][24347] Updated weights for policy 0, policy_version 43778 (0.0027)
[2024-06-06 16:19:47,099][24347] Updated weights for policy 0, policy_version 43788 (0.0030)
[2024-06-06 16:19:47,318][24114] Fps is (10 sec: 39345.1, 60 sec: 43965.2, 300 sec: 44486.7). Total num frames: 717422592. Throughput: 0: 44640.6. Samples: 198659540. Policy #0 lag: (min: 0.0, avg: 9.0, max: 21.0)
[2024-06-06 16:19:47,318][24114] Avg episode reward: [(0, '0.257')]
[2024-06-06 16:19:50,343][24347] Updated weights for policy 0, policy_version 43798 (0.0023)
[2024-06-06 16:19:52,318][24114] Fps is (10 sec: 47513.0, 60 sec: 45056.0, 300 sec: 44764.4). Total num frames: 717701120. Throughput: 0: 44601.7. Samples: 198929000. Policy #0 lag: (min: 0.0, avg: 9.0, max: 21.0)
[2024-06-06 16:19:52,318][24114] Avg episode reward: [(0, '0.244')]
[2024-06-06 16:19:54,094][24347] Updated weights for policy 0, policy_version 43808 (0.0027)
[2024-06-06 16:19:57,318][24114] Fps is (10 sec: 45874.6, 60 sec: 44509.8, 300 sec: 44597.8). Total num frames: 717881344. Throughput: 0: 44679.5. Samples: 199196680. Policy #0 lag: (min: 0.0, avg: 9.0, max: 21.0)
[2024-06-06 16:19:57,319][24114] Avg episode reward: [(0, '0.251')]
[2024-06-06 16:19:58,141][24347] Updated weights for policy 0, policy_version 43818 (0.0036)
[2024-06-06 16:20:01,487][24347] Updated weights for policy 0, policy_version 43828 (0.0022)
[2024-06-06 16:20:02,318][24114] Fps is (10 sec: 40959.5, 60 sec: 44509.8, 300 sec: 44597.8). Total num frames: 718110720. Throughput: 0: 44504.9. Samples: 199323380. Policy #0 lag: (min: 0.0, avg: 9.0, max: 21.0)
[2024-06-06 16:20:02,327][24114] Avg episode reward: [(0, '0.255')]
[2024-06-06 16:20:05,369][24347] Updated weights for policy 0, policy_version 43838 (0.0033)
[2024-06-06 16:20:07,318][24114] Fps is (10 sec: 47514.2, 60 sec: 44782.9, 300 sec: 44653.4). Total num frames: 718356480. Throughput: 0: 44564.5. Samples: 199597940. Policy #0 lag: (min: 0.0, avg: 9.0, max: 21.0)
[2024-06-06 16:20:07,318][24114] Avg episode reward: [(0, '0.255')]
[2024-06-06 16:20:08,982][24347] Updated weights for policy 0, policy_version 43848 (0.0029)
[2024-06-06 16:20:12,319][24114] Fps is (10 sec: 44232.7, 60 sec: 44509.1, 300 sec: 44597.7). Total num frames: 718553088. Throughput: 0: 44333.1. Samples: 199854500. Policy #0 lag: (min: 0.0, avg: 9.0, max: 21.0)
[2024-06-06 16:20:12,320][24114] Avg episode reward: [(0, '0.255')]
[2024-06-06 16:20:12,604][24347] Updated weights for policy 0, policy_version 43858 (0.0031)
[2024-06-06 16:20:16,422][24347] Updated weights for policy 0, policy_version 43868 (0.0030)
[2024-06-06 16:20:17,318][24114] Fps is (10 sec: 42597.6, 60 sec: 44787.3, 300 sec: 44597.8). Total num frames: 718782464. Throughput: 0: 44404.3. Samples: 199989100. Policy #0 lag: (min: 0.0, avg: 12.0, max: 25.0)
[2024-06-06 16:20:17,319][24114] Avg episode reward: [(0, '0.261')]
[2024-06-06 16:20:19,899][24347] Updated weights for policy 0, policy_version 43878 (0.0032)
[2024-06-06 16:20:22,318][24114] Fps is (10 sec: 47518.3, 60 sec: 44509.8, 300 sec: 44653.3). Total num frames: 719028224. Throughput: 0: 44445.3. Samples: 200266060. Policy #0 lag: (min: 0.0, avg: 12.0, max: 25.0)
[2024-06-06 16:20:22,319][24114] Avg episode reward: [(0, '0.247')]
[2024-06-06 16:20:22,335][24326] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000043886_719028224.pth...
[2024-06-06 16:20:22,389][24326] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000043231_708296704.pth
[2024-06-06 16:20:23,689][24347] Updated weights for policy 0, policy_version 43888 (0.0023)
[2024-06-06 16:20:27,121][24347] Updated weights for policy 0, policy_version 43898 (0.0035)
[2024-06-06 16:20:27,318][24114] Fps is (10 sec: 44237.3, 60 sec: 44509.8, 300 sec: 44653.3). Total num frames: 719224832. Throughput: 0: 44813.2. Samples: 200541460. Policy #0 lag: (min: 0.0, avg: 12.0, max: 25.0)
[2024-06-06 16:20:27,319][24114] Avg episode reward: [(0, '0.255')]
[2024-06-06 16:20:30,581][24347] Updated weights for policy 0, policy_version 43908 (0.0033)
[2024-06-06 16:20:32,318][24114] Fps is (10 sec: 40960.3, 60 sec: 44509.9, 300 sec: 44542.3). Total num frames: 719437824. Throughput: 0: 44655.0. Samples: 200669020. Policy #0 lag: (min: 0.0, avg: 12.0, max: 25.0)
[2024-06-06 16:20:32,318][24114] Avg episode reward: [(0, '0.252')]
[2024-06-06 16:20:34,549][24347] Updated weights for policy 0, policy_version 43918 (0.0029)
[2024-06-06 16:20:37,318][24114] Fps is (10 sec: 45875.2, 60 sec: 44241.2, 300 sec: 44597.8). Total num frames: 719683584. Throughput: 0: 44704.5. Samples: 200940700. Policy #0 lag: (min: 0.0, avg: 12.0, max: 25.0)
[2024-06-06 16:20:37,318][24114] Avg episode reward: [(0, '0.249')]
[2024-06-06 16:20:38,128][24347] Updated weights for policy 0, policy_version 43928 (0.0022)
[2024-06-06 16:20:39,370][24326] Signal inference workers to stop experience collection... (2950 times)
[2024-06-06 16:20:39,370][24326] Signal inference workers to resume experience collection... (2950 times)
[2024-06-06 16:20:39,410][24347] InferenceWorker_p0-w0: stopping experience collection (2950 times)
[2024-06-06 16:20:39,410][24347] InferenceWorker_p0-w0: resuming experience collection (2950 times)
[2024-06-06 16:20:41,645][24347] Updated weights for policy 0, policy_version 43938 (0.0030)
[2024-06-06 16:20:42,318][24114] Fps is (10 sec: 45875.7, 60 sec: 44509.9, 300 sec: 44708.9). Total num frames: 719896576. Throughput: 0: 44621.9. Samples: 201204660. Policy #0 lag: (min: 0.0, avg: 12.0, max: 25.0)
[2024-06-06 16:20:42,318][24114] Avg episode reward: [(0, '0.250')]
[2024-06-06 16:20:45,681][24347] Updated weights for policy 0, policy_version 43948 (0.0027)
[2024-06-06 16:20:47,318][24114] Fps is (10 sec: 44237.1, 60 sec: 45056.0, 300 sec: 44654.2). Total num frames: 720125952. Throughput: 0: 44907.3. Samples: 201344200. Policy #0 lag: (min: 0.0, avg: 10.2, max: 23.0)
[2024-06-06 16:20:47,318][24114] Avg episode reward: [(0, '0.250')]
[2024-06-06 16:20:49,139][24347] Updated weights for policy 0, policy_version 43958 (0.0036)
[2024-06-06 16:20:52,318][24114] Fps is (10 sec: 44236.3, 60 sec: 43963.7, 300 sec: 44598.1). Total num frames: 720338944. Throughput: 0: 44656.4. Samples: 201607480. Policy #0 lag: (min: 0.0, avg: 10.2, max: 23.0)
[2024-06-06 16:20:52,319][24114] Avg episode reward: [(0, '0.251')]
[2024-06-06 16:20:52,792][24347] Updated weights for policy 0, policy_version 43968 (0.0037)
[2024-06-06 16:20:56,630][24347] Updated weights for policy 0, policy_version 43978 (0.0028)
[2024-06-06 16:20:57,318][24114] Fps is (10 sec: 44237.0, 60 sec: 44783.1, 300 sec: 44708.9). Total num frames: 720568320. Throughput: 0: 45101.6. Samples: 201884020. Policy #0 lag: (min: 0.0, avg: 10.2, max: 23.0)
[2024-06-06 16:20:57,318][24114] Avg episode reward: [(0, '0.258')]
[2024-06-06 16:20:59,852][24347] Updated weights for policy 0, policy_version 43988 (0.0028)
[2024-06-06 16:21:02,318][24114] Fps is (10 sec: 47513.8, 60 sec: 45056.1, 300 sec: 44764.4). Total num frames: 720814080. Throughput: 0: 44842.8. Samples: 202007020. Policy #0 lag: (min: 0.0, avg: 10.2, max: 23.0)
[2024-06-06 16:21:02,318][24114] Avg episode reward: [(0, '0.251')]
[2024-06-06 16:21:03,802][24347] Updated weights for policy 0, policy_version 43998 (0.0034)
[2024-06-06 16:21:07,305][24347] Updated weights for policy 0, policy_version 44008 (0.0034)
[2024-06-06 16:21:07,318][24114] Fps is (10 sec: 45874.2, 60 sec: 44509.7, 300 sec: 44653.3). Total num frames: 721027072. Throughput: 0: 44800.0. Samples: 202282060. Policy #0 lag: (min: 0.0, avg: 10.2, max: 23.0)
[2024-06-06 16:21:07,319][24114] Avg episode reward: [(0, '0.254')]
[2024-06-06 16:21:11,241][24347] Updated weights for policy 0, policy_version 44018 (0.0039)
[2024-06-06 16:21:12,318][24114] Fps is (10 sec: 44236.6, 60 sec: 45056.8, 300 sec: 44764.4). Total num frames: 721256448. Throughput: 0: 44500.0. Samples: 202543960. Policy #0 lag: (min: 0.0, avg: 11.5, max: 21.0)
[2024-06-06 16:21:12,319][24114] Avg episode reward: [(0, '0.255')]
[2024-06-06 16:21:15,023][24347] Updated weights for policy 0, policy_version 44028 (0.0033)
[2024-06-06 16:21:17,318][24114] Fps is (10 sec: 44236.7, 60 sec: 44782.9, 300 sec: 44708.9). Total num frames: 721469440. Throughput: 0: 44674.6. Samples: 202679380. Policy #0 lag: (min: 0.0, avg: 11.5, max: 21.0)
[2024-06-06 16:21:17,319][24114] Avg episode reward: [(0, '0.256')]
[2024-06-06 16:21:18,638][24347] Updated weights for policy 0, policy_version 44038 (0.0032)
[2024-06-06 16:21:22,250][24347] Updated weights for policy 0, policy_version 44048 (0.0027)
[2024-06-06 16:21:22,318][24114] Fps is (10 sec: 42598.6, 60 sec: 44236.9, 300 sec: 44653.3). Total num frames: 721682432. Throughput: 0: 44518.7. Samples: 202944040. Policy #0 lag: (min: 0.0, avg: 11.5, max: 21.0)
[2024-06-06 16:21:22,318][24114] Avg episode reward: [(0, '0.257')]
[2024-06-06 16:21:25,711][24347] Updated weights for policy 0, policy_version 44058 (0.0048)
[2024-06-06 16:21:27,318][24114] Fps is (10 sec: 44237.2, 60 sec: 44782.9, 300 sec: 44653.3). Total num frames: 721911808. Throughput: 0: 44768.8. Samples: 203219260. Policy #0 lag: (min: 0.0, avg: 11.5, max: 21.0)
[2024-06-06 16:21:27,319][24114] Avg episode reward: [(0, '0.260')]
[2024-06-06 16:21:29,465][24347] Updated weights for policy 0, policy_version 44068 (0.0036)
[2024-06-06 16:21:32,318][24114] Fps is (10 sec: 45874.7, 60 sec: 45056.0, 300 sec: 44708.9). Total num frames: 722141184. Throughput: 0: 44543.4. Samples: 203348660. Policy #0 lag: (min: 0.0, avg: 11.5, max: 21.0)
[2024-06-06 16:21:32,319][24114] Avg episode reward: [(0, '0.261')]
[2024-06-06 16:21:33,130][24347] Updated weights for policy 0, policy_version 44078 (0.0045)
[2024-06-06 16:21:36,993][24347] Updated weights for policy 0, policy_version 44088 (0.0028)
[2024-06-06 16:21:37,318][24114] Fps is (10 sec: 42598.7, 60 sec: 44236.8, 300 sec: 44542.3). Total num frames: 722337792. Throughput: 0: 44677.8. Samples: 203617980. Policy #0 lag: (min: 0.0, avg: 11.5, max: 21.0)
[2024-06-06 16:21:37,318][24114] Avg episode reward: [(0, '0.252')]
[2024-06-06 16:21:40,393][24347] Updated weights for policy 0, policy_version 44098 (0.0027)
[2024-06-06 16:21:42,318][24114] Fps is (10 sec: 42598.3, 60 sec: 44509.7, 300 sec: 44597.8). Total num frames: 722567168. Throughput: 0: 44465.6. Samples: 203884980. Policy #0 lag: (min: 0.0, avg: 10.7, max: 23.0)
[2024-06-06 16:21:42,319][24114] Avg episode reward: [(0, '0.259')]
[2024-06-06 16:21:44,298][24347] Updated weights for policy 0, policy_version 44108 (0.0035)
[2024-06-06 16:21:47,318][24114] Fps is (10 sec: 47513.4, 60 sec: 44782.9, 300 sec: 44764.4). Total num frames: 722812928. Throughput: 0: 44726.6. Samples: 204019720. Policy #0 lag: (min: 0.0, avg: 10.7, max: 23.0)
[2024-06-06 16:21:47,319][24114] Avg episode reward: [(0, '0.251')]
[2024-06-06 16:21:47,823][24347] Updated weights for policy 0, policy_version 44118 (0.0025)
[2024-06-06 16:21:51,649][24347] Updated weights for policy 0, policy_version 44128 (0.0035)
[2024-06-06 16:21:52,123][24326] Signal inference workers to stop experience collection... (3000 times)
[2024-06-06 16:21:52,124][24326] Signal inference workers to resume experience collection... (3000 times)
[2024-06-06 16:21:52,151][24347] InferenceWorker_p0-w0: stopping experience collection (3000 times)
[2024-06-06 16:21:52,151][24347] InferenceWorker_p0-w0: resuming experience collection (3000 times)
[2024-06-06 16:21:52,318][24114] Fps is (10 sec: 45875.9, 60 sec: 44783.0, 300 sec: 44653.4). Total num frames: 723025920. Throughput: 0: 44453.5. Samples: 204282460. Policy #0 lag: (min: 0.0, avg: 10.7, max: 23.0)
[2024-06-06 16:21:52,318][24114] Avg episode reward: [(0, '0.255')]
[2024-06-06 16:21:54,967][24347] Updated weights for policy 0, policy_version 44138 (0.0036)
[2024-06-06 16:21:57,318][24114] Fps is (10 sec: 40959.7, 60 sec: 44236.7, 300 sec: 44542.8). Total num frames: 723222528. Throughput: 0: 44697.7. Samples: 204555360. Policy #0 lag: (min: 0.0, avg: 10.7, max: 23.0)
[2024-06-06 16:21:57,319][24114] Avg episode reward: [(0, '0.255')]
[2024-06-06 16:21:58,719][24347] Updated weights for policy 0, policy_version 44148 (0.0034)
[2024-06-06 16:22:02,294][24347] Updated weights for policy 0, policy_version 44158 (0.0029)
[2024-06-06 16:22:02,318][24114] Fps is (10 sec: 45874.5, 60 sec: 44509.8, 300 sec: 44708.9). Total num frames: 723484672. Throughput: 0: 44558.7. Samples: 204684520. Policy #0 lag: (min: 0.0, avg: 10.7, max: 23.0)
[2024-06-06 16:22:02,319][24114] Avg episode reward: [(0, '0.260')]
[2024-06-06 16:22:06,583][24347] Updated weights for policy 0, policy_version 44168 (0.0043)
[2024-06-06 16:22:07,318][24114] Fps is (10 sec: 47513.6, 60 sec: 44509.9, 300 sec: 44597.8). Total num frames: 723697664. Throughput: 0: 44554.1. Samples: 204948980. Policy #0 lag: (min: 0.0, avg: 9.6, max: 21.0)
[2024-06-06 16:22:07,319][24114] Avg episode reward: [(0, '0.251')]
[2024-06-06 16:22:09,685][24347] Updated weights for policy 0, policy_version 44178 (0.0039)
[2024-06-06 16:22:12,318][24114] Fps is (10 sec: 44237.4, 60 sec: 44509.9, 300 sec: 44597.8). Total num frames: 723927040. Throughput: 0: 44578.3. Samples: 205225280. Policy #0 lag: (min: 0.0, avg: 9.6, max: 21.0)
[2024-06-06 16:22:12,318][24114] Avg episode reward: [(0, '0.252')]
[2024-06-06 16:22:13,576][24347] Updated weights for policy 0, policy_version 44188 (0.0033)
[2024-06-06 16:22:16,931][24347] Updated weights for policy 0, policy_version 44198 (0.0043)
[2024-06-06 16:22:17,318][24114] Fps is (10 sec: 44237.0, 60 sec: 44509.9, 300 sec: 44709.1). Total num frames: 724140032. Throughput: 0: 44514.3. Samples: 205351800. Policy #0 lag: (min: 0.0, avg: 9.6, max: 21.0)
[2024-06-06 16:22:17,320][24114] Avg episode reward: [(0, '0.259')]
[2024-06-06 16:22:21,002][24347] Updated weights for policy 0, policy_version 44208 (0.0031)
[2024-06-06 16:22:22,320][24114] Fps is (10 sec: 44227.8, 60 sec: 44781.4, 300 sec: 44597.5). Total num frames: 724369408. Throughput: 0: 44560.2. Samples: 205623280. Policy #0 lag: (min: 0.0, avg: 9.6, max: 21.0)
[2024-06-06 16:22:22,321][24114] Avg episode reward: [(0, '0.250')]
[2024-06-06 16:22:22,326][24326] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000044212_724369408.pth...
[2024-06-06 16:22:22,395][24326] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000043559_713670656.pth
[2024-06-06 16:22:24,288][24347] Updated weights for policy 0, policy_version 44218 (0.0041)
[2024-06-06 16:22:27,318][24114] Fps is (10 sec: 42599.0, 60 sec: 44236.9, 300 sec: 44486.7). Total num frames: 724566016. Throughput: 0: 44543.8. Samples: 205889440. Policy #0 lag: (min: 0.0, avg: 9.6, max: 21.0)
[2024-06-06 16:22:27,318][24114] Avg episode reward: [(0, '0.252')]
[2024-06-06 16:22:28,243][24347] Updated weights for policy 0, policy_version 44228 (0.0026)
[2024-06-06 16:22:31,483][24347] Updated weights for policy 0, policy_version 44238 (0.0023)
[2024-06-06 16:22:32,318][24114] Fps is (10 sec: 44245.0, 60 sec: 44509.8, 300 sec: 44653.4). Total num frames: 724811776. Throughput: 0: 44479.0. Samples: 206021280. Policy #0 lag: (min: 0.0, avg: 9.6, max: 21.0)
[2024-06-06 16:22:32,319][24114] Avg episode reward: [(0, '0.259')]
[2024-06-06 16:22:35,495][24347] Updated weights for policy 0, policy_version 44248 (0.0027)
[2024-06-06 16:22:37,318][24114] Fps is (10 sec: 47513.4, 60 sec: 45056.0, 300 sec: 44708.9). Total num frames: 725041152. Throughput: 0: 44642.2. Samples: 206291360. Policy #0 lag: (min: 1.0, avg: 10.0, max: 23.0)
[2024-06-06 16:22:37,318][24114] Avg episode reward: [(0, '0.257')]
[2024-06-06 16:22:38,975][24347] Updated weights for policy 0, policy_version 44258 (0.0033)
[2024-06-06 16:22:42,320][24114] Fps is (10 sec: 45866.9, 60 sec: 45054.6, 300 sec: 44653.1). Total num frames: 725270528. Throughput: 0: 44484.4. Samples: 206557240. Policy #0 lag: (min: 1.0, avg: 10.0, max: 23.0)
[2024-06-06 16:22:42,320][24114] Avg episode reward: [(0, '0.260')]
[2024-06-06 16:22:42,573][24347] Updated weights for policy 0, policy_version 44268 (0.0031)
[2024-06-06 16:22:46,157][24347] Updated weights for policy 0, policy_version 44278 (0.0035)
[2024-06-06 16:22:47,320][24114] Fps is (10 sec: 44228.0, 60 sec: 44508.4, 300 sec: 44653.1). Total num frames: 725483520. Throughput: 0: 44689.7. Samples: 206695640. Policy #0 lag: (min: 1.0, avg: 10.0, max: 23.0)
[2024-06-06 16:22:47,320][24114] Avg episode reward: [(0, '0.263')]
[2024-06-06 16:22:49,696][24347] Updated weights for policy 0, policy_version 44288 (0.0045)
[2024-06-06 16:22:52,318][24114] Fps is (10 sec: 42607.1, 60 sec: 44509.9, 300 sec: 44653.4). Total num frames: 725696512. Throughput: 0: 44819.3. Samples: 206965840. Policy #0 lag: (min: 1.0, avg: 10.0, max: 23.0)
[2024-06-06 16:22:52,318][24114] Avg episode reward: [(0, '0.255')]
[2024-06-06 16:22:53,542][24347] Updated weights for policy 0, policy_version 44298 (0.0030)
[2024-06-06 16:22:57,318][24114] Fps is (10 sec: 42607.1, 60 sec: 44783.1, 300 sec: 44486.7). Total num frames: 725909504. Throughput: 0: 44614.7. Samples: 207232940. Policy #0 lag: (min: 1.0, avg: 10.0, max: 23.0)
[2024-06-06 16:22:57,318][24114] Avg episode reward: [(0, '0.264')]
[2024-06-06 16:22:57,476][24347] Updated weights for policy 0, policy_version 44308 (0.0041)
[2024-06-06 16:23:00,626][24347] Updated weights for policy 0, policy_version 44318 (0.0028)
[2024-06-06 16:23:02,318][24114] Fps is (10 sec: 44236.3, 60 sec: 44236.9, 300 sec: 44542.3). Total num frames: 726138880. Throughput: 0: 44771.1. Samples: 207366500. Policy #0 lag: (min: 1.0, avg: 10.0, max: 23.0)
[2024-06-06 16:23:02,319][24114] Avg episode reward: [(0, '0.261')]
[2024-06-06 16:23:04,709][24347] Updated weights for policy 0, policy_version 44328 (0.0034)
[2024-06-06 16:23:07,318][24114] Fps is (10 sec: 45874.2, 60 sec: 44509.8, 300 sec: 44653.6). Total num frames: 726368256. Throughput: 0: 44594.3. Samples: 207629940. Policy #0 lag: (min: 0.0, avg: 10.0, max: 21.0)
[2024-06-06 16:23:07,319][24114] Avg episode reward: [(0, '0.263')]
[2024-06-06 16:23:08,190][24347] Updated weights for policy 0, policy_version 44338 (0.0034)
[2024-06-06 16:23:11,552][24326] Signal inference workers to stop experience collection... (3050 times)
[2024-06-06 16:23:11,552][24326] Signal inference workers to resume experience collection... (3050 times)
[2024-06-06 16:23:11,591][24347] InferenceWorker_p0-w0: stopping experience collection (3050 times)
[2024-06-06 16:23:11,591][24347] InferenceWorker_p0-w0: resuming experience collection (3050 times)
[2024-06-06 16:23:11,690][24347] Updated weights for policy 0, policy_version 44348 (0.0036)
[2024-06-06 16:23:12,318][24114] Fps is (10 sec: 47513.9, 60 sec: 44782.9, 300 sec: 44653.4). Total num frames: 726614016. Throughput: 0: 44715.5. Samples: 207901640. Policy #0 lag: (min: 0.0, avg: 10.0, max: 21.0)
[2024-06-06 16:23:12,318][24114] Avg episode reward: [(0, '0.267')]
[2024-06-06 16:23:15,468][24347] Updated weights for policy 0, policy_version 44358 (0.0034)
[2024-06-06 16:23:17,318][24114] Fps is (10 sec: 45875.8, 60 sec: 44783.0, 300 sec: 44653.3). Total num frames: 726827008. Throughput: 0: 44971.3. Samples: 208044980. Policy #0 lag: (min: 0.0, avg: 10.0, max: 21.0)
[2024-06-06 16:23:17,318][24114] Avg episode reward: [(0, '0.259')]
[2024-06-06 16:23:18,970][24347] Updated weights for policy 0, policy_version 44368 (0.0042)
[2024-06-06 16:23:22,318][24114] Fps is (10 sec: 44236.4, 60 sec: 44784.4, 300 sec: 44708.9). Total num frames: 727056384. Throughput: 0: 44947.5. Samples: 208314000. Policy #0 lag: (min: 0.0, avg: 10.0, max: 21.0)
[2024-06-06 16:23:22,318][24114] Avg episode reward: [(0, '0.264')]
[2024-06-06 16:23:22,633][24347] Updated weights for policy 0, policy_version 44378 (0.0029)
[2024-06-06 16:23:26,660][24347] Updated weights for policy 0, policy_version 44388 (0.0036)
[2024-06-06 16:23:27,318][24114] Fps is (10 sec: 45874.7, 60 sec: 45328.9, 300 sec: 44653.6). Total num frames: 727285760. Throughput: 0: 44829.8. Samples: 208574500. Policy #0 lag: (min: 0.0, avg: 10.0, max: 21.0)
[2024-06-06 16:23:27,320][24114] Avg episode reward: [(0, '0.264')]
[2024-06-06 16:23:29,904][24347] Updated weights for policy 0, policy_version 44398 (0.0037)
[2024-06-06 16:23:32,318][24114] Fps is (10 sec: 40960.6, 60 sec: 44237.0, 300 sec: 44597.8). Total num frames: 727465984. Throughput: 0: 44790.9. Samples: 208711140. Policy #0 lag: (min: 0.0, avg: 10.9, max: 22.0)
[2024-06-06 16:23:32,318][24114] Avg episode reward: [(0, '0.260')]
[2024-06-06 16:23:34,139][24347] Updated weights for policy 0, policy_version 44408 (0.0038)
[2024-06-06 16:23:37,318][24114] Fps is (10 sec: 44237.4, 60 sec: 44782.9, 300 sec: 44764.4). Total num frames: 727728128. Throughput: 0: 44674.2. Samples: 208976180. Policy #0 lag: (min: 0.0, avg: 10.9, max: 22.0)
[2024-06-06 16:23:37,318][24114] Avg episode reward: [(0, '0.264')]
[2024-06-06 16:23:37,383][24347] Updated weights for policy 0, policy_version 44418 (0.0041)
[2024-06-06 16:23:41,216][24347] Updated weights for policy 0, policy_version 44428 (0.0040)
[2024-06-06 16:23:42,318][24114] Fps is (10 sec: 47513.4, 60 sec: 44511.4, 300 sec: 44598.1). Total num frames: 727941120. Throughput: 0: 44791.5. Samples: 209248560. Policy #0 lag: (min: 0.0, avg: 10.9, max: 22.0)
[2024-06-06 16:23:42,318][24114] Avg episode reward: [(0, '0.263')]
[2024-06-06 16:23:44,774][24347] Updated weights for policy 0, policy_version 44438 (0.0033)
[2024-06-06 16:23:47,318][24114] Fps is (10 sec: 44236.9, 60 sec: 44784.4, 300 sec: 44653.3). Total num frames: 728170496. Throughput: 0: 44731.6. Samples: 209379420. Policy #0 lag: (min: 0.0, avg: 10.9, max: 22.0)
[2024-06-06 16:23:47,318][24114] Avg episode reward: [(0, '0.260')]
[2024-06-06 16:23:48,427][24347] Updated weights for policy 0, policy_version 44448 (0.0037)
[2024-06-06 16:23:52,060][24347] Updated weights for policy 0, policy_version 44458 (0.0041)
[2024-06-06 16:23:52,318][24114] Fps is (10 sec: 47512.7, 60 sec: 45328.9, 300 sec: 44764.4). Total num frames: 728416256. Throughput: 0: 45018.2. Samples: 209655760. Policy #0 lag: (min: 0.0, avg: 10.9, max: 22.0)
[2024-06-06 16:23:52,319][24114] Avg episode reward: [(0, '0.268')]
[2024-06-06 16:23:56,074][24347] Updated weights for policy 0, policy_version 44468 (0.0033)
[2024-06-06 16:23:57,318][24114] Fps is (10 sec: 45874.8, 60 sec: 45329.0, 300 sec: 44708.9). Total num frames: 728629248. Throughput: 0: 44847.5. Samples: 209919780. Policy #0 lag: (min: 0.0, avg: 10.9, max: 22.0)
[2024-06-06 16:23:57,318][24114] Avg episode reward: [(0, '0.265')]
[2024-06-06 16:23:59,258][24347] Updated weights for policy 0, policy_version 44478 (0.0041)
[2024-06-06 16:24:02,318][24114] Fps is (10 sec: 40960.7, 60 sec: 44783.0, 300 sec: 44597.8).
Total num frames: 728825856. Throughput: 0: 44642.7. Samples: 210053900. Policy #0 lag: (min: 0.0, avg: 10.8, max: 23.0) [2024-06-06 16:24:02,318][24114] Avg episode reward: [(0, '0.262')] [2024-06-06 16:24:03,428][24347] Updated weights for policy 0, policy_version 44488 (0.0029) [2024-06-06 16:24:06,832][24347] Updated weights for policy 0, policy_version 44498 (0.0033) [2024-06-06 16:24:07,318][24114] Fps is (10 sec: 44236.8, 60 sec: 45056.1, 300 sec: 44708.9). Total num frames: 729071616. Throughput: 0: 44673.8. Samples: 210324320. Policy #0 lag: (min: 0.0, avg: 10.8, max: 23.0) [2024-06-06 16:24:07,319][24114] Avg episode reward: [(0, '0.263')] [2024-06-06 16:24:10,572][24347] Updated weights for policy 0, policy_version 44508 (0.0037) [2024-06-06 16:24:12,319][24114] Fps is (10 sec: 45868.4, 60 sec: 44508.8, 300 sec: 44709.6). Total num frames: 729284608. Throughput: 0: 44888.9. Samples: 210594560. Policy #0 lag: (min: 0.0, avg: 10.8, max: 23.0) [2024-06-06 16:24:12,320][24114] Avg episode reward: [(0, '0.259')] [2024-06-06 16:24:14,240][24347] Updated weights for policy 0, policy_version 44518 (0.0044) [2024-06-06 16:24:17,318][24114] Fps is (10 sec: 44236.3, 60 sec: 44782.8, 300 sec: 44597.8). Total num frames: 729513984. Throughput: 0: 44773.5. Samples: 210725960. Policy #0 lag: (min: 0.0, avg: 10.8, max: 23.0) [2024-06-06 16:24:17,319][24114] Avg episode reward: [(0, '0.259')] [2024-06-06 16:24:17,623][24347] Updated weights for policy 0, policy_version 44528 (0.0039) [2024-06-06 16:24:21,747][24347] Updated weights for policy 0, policy_version 44538 (0.0034) [2024-06-06 16:24:22,318][24114] Fps is (10 sec: 45881.7, 60 sec: 44783.0, 300 sec: 44708.9). Total num frames: 729743360. Throughput: 0: 44694.2. Samples: 210987420. 
Policy #0 lag: (min: 0.0, avg: 10.8, max: 23.0) [2024-06-06 16:24:22,318][24114] Avg episode reward: [(0, '0.259')] [2024-06-06 16:24:22,337][24326] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000044540_729743360.pth... [2024-06-06 16:24:22,403][24326] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000043886_719028224.pth [2024-06-06 16:24:25,340][24347] Updated weights for policy 0, policy_version 44548 (0.0023) [2024-06-06 16:24:27,318][24114] Fps is (10 sec: 44237.5, 60 sec: 44509.9, 300 sec: 44708.9). Total num frames: 729956352. Throughput: 0: 44684.4. Samples: 211259360. Policy #0 lag: (min: 0.0, avg: 10.8, max: 23.0) [2024-06-06 16:24:27,318][24114] Avg episode reward: [(0, '0.264')] [2024-06-06 16:24:28,690][24347] Updated weights for policy 0, policy_version 44558 (0.0029) [2024-06-06 16:24:32,318][24114] Fps is (10 sec: 44236.4, 60 sec: 45328.9, 300 sec: 44598.7). Total num frames: 730185728. Throughput: 0: 44604.7. Samples: 211386640. Policy #0 lag: (min: 0.0, avg: 9.7, max: 19.0) [2024-06-06 16:24:32,319][24114] Avg episode reward: [(0, '0.271')] [2024-06-06 16:24:32,851][24347] Updated weights for policy 0, policy_version 44568 (0.0024) [2024-06-06 16:24:33,589][24326] Signal inference workers to stop experience collection... (3100 times) [2024-06-06 16:24:33,590][24326] Signal inference workers to resume experience collection... (3100 times) [2024-06-06 16:24:33,610][24347] InferenceWorker_p0-w0: stopping experience collection (3100 times) [2024-06-06 16:24:33,610][24347] InferenceWorker_p0-w0: resuming experience collection (3100 times) [2024-06-06 16:24:36,126][24347] Updated weights for policy 0, policy_version 44578 (0.0040) [2024-06-06 16:24:37,318][24114] Fps is (10 sec: 45874.7, 60 sec: 44782.8, 300 sec: 44708.9). Total num frames: 730415104. Throughput: 0: 44536.5. Samples: 211659900. 
Policy #0 lag: (min: 0.0, avg: 9.7, max: 19.0) [2024-06-06 16:24:37,319][24114] Avg episode reward: [(0, '0.261')] [2024-06-06 16:24:39,893][24347] Updated weights for policy 0, policy_version 44588 (0.0030) [2024-06-06 16:24:42,318][24114] Fps is (10 sec: 42598.7, 60 sec: 44509.8, 300 sec: 44708.9). Total num frames: 730611712. Throughput: 0: 44647.6. Samples: 211928920. Policy #0 lag: (min: 0.0, avg: 9.7, max: 19.0) [2024-06-06 16:24:42,318][24114] Avg episode reward: [(0, '0.265')] [2024-06-06 16:24:43,450][24347] Updated weights for policy 0, policy_version 44598 (0.0030) [2024-06-06 16:24:47,281][24347] Updated weights for policy 0, policy_version 44608 (0.0036) [2024-06-06 16:24:47,318][24114] Fps is (10 sec: 44237.2, 60 sec: 44782.9, 300 sec: 44597.8). Total num frames: 730857472. Throughput: 0: 44536.8. Samples: 212058060. Policy #0 lag: (min: 0.0, avg: 9.7, max: 19.0) [2024-06-06 16:24:47,318][24114] Avg episode reward: [(0, '0.268')] [2024-06-06 16:24:51,185][24347] Updated weights for policy 0, policy_version 44618 (0.0042) [2024-06-06 16:24:52,318][24114] Fps is (10 sec: 45875.3, 60 sec: 44236.9, 300 sec: 44708.9). Total num frames: 731070464. Throughput: 0: 44372.9. Samples: 212321100. Policy #0 lag: (min: 0.0, avg: 9.7, max: 19.0) [2024-06-06 16:24:52,324][24114] Avg episode reward: [(0, '0.273')] [2024-06-06 16:24:54,646][24347] Updated weights for policy 0, policy_version 44628 (0.0042) [2024-06-06 16:24:57,320][24114] Fps is (10 sec: 42590.1, 60 sec: 44235.4, 300 sec: 44653.1). Total num frames: 731283456. Throughput: 0: 44377.7. Samples: 212591580. Policy #0 lag: (min: 0.0, avg: 10.7, max: 21.0) [2024-06-06 16:24:57,321][24114] Avg episode reward: [(0, '0.268')] [2024-06-06 16:24:58,205][24347] Updated weights for policy 0, policy_version 44638 (0.0026) [2024-06-06 16:25:01,942][24347] Updated weights for policy 0, policy_version 44648 (0.0021) [2024-06-06 16:25:02,318][24114] Fps is (10 sec: 44236.6, 60 sec: 44782.9, 300 sec: 44597.8). 
Total num frames: 731512832. Throughput: 0: 44381.4. Samples: 212723120. Policy #0 lag: (min: 0.0, avg: 10.7, max: 21.0) [2024-06-06 16:25:02,319][24114] Avg episode reward: [(0, '0.270')] [2024-06-06 16:25:05,472][24347] Updated weights for policy 0, policy_version 44658 (0.0033) [2024-06-06 16:25:07,318][24114] Fps is (10 sec: 44245.1, 60 sec: 44236.8, 300 sec: 44653.5). Total num frames: 731725824. Throughput: 0: 44618.6. Samples: 212995260. Policy #0 lag: (min: 0.0, avg: 10.7, max: 21.0) [2024-06-06 16:25:07,319][24114] Avg episode reward: [(0, '0.265')] [2024-06-06 16:25:08,920][24347] Updated weights for policy 0, policy_version 44668 (0.0037) [2024-06-06 16:25:12,318][24114] Fps is (10 sec: 42599.1, 60 sec: 44237.9, 300 sec: 44597.8). Total num frames: 731938816. Throughput: 0: 44526.3. Samples: 213263040. Policy #0 lag: (min: 0.0, avg: 10.7, max: 21.0) [2024-06-06 16:25:12,318][24114] Avg episode reward: [(0, '0.260')] [2024-06-06 16:25:12,867][24347] Updated weights for policy 0, policy_version 44678 (0.0027) [2024-06-06 16:25:16,275][24347] Updated weights for policy 0, policy_version 44688 (0.0034) [2024-06-06 16:25:17,318][24114] Fps is (10 sec: 45874.7, 60 sec: 44509.9, 300 sec: 44597.8). Total num frames: 732184576. Throughput: 0: 44866.6. Samples: 213405640. Policy #0 lag: (min: 0.0, avg: 10.7, max: 21.0) [2024-06-06 16:25:17,319][24114] Avg episode reward: [(0, '0.260')] [2024-06-06 16:25:20,410][24347] Updated weights for policy 0, policy_version 44698 (0.0036) [2024-06-06 16:25:22,318][24114] Fps is (10 sec: 45874.4, 60 sec: 44236.8, 300 sec: 44653.3). Total num frames: 732397568. Throughput: 0: 44505.8. Samples: 213662660. Policy #0 lag: (min: 0.0, avg: 10.7, max: 21.0) [2024-06-06 16:25:22,319][24114] Avg episode reward: [(0, '0.252')] [2024-06-06 16:25:23,915][24347] Updated weights for policy 0, policy_version 44708 (0.0030) [2024-06-06 16:25:27,318][24114] Fps is (10 sec: 45876.1, 60 sec: 44782.9, 300 sec: 44764.4). 
Total num frames: 732643328. Throughput: 0: 44567.6. Samples: 213934460. Policy #0 lag: (min: 0.0, avg: 10.8, max: 23.0) [2024-06-06 16:25:27,318][24114] Avg episode reward: [(0, '0.259')] [2024-06-06 16:25:27,480][24347] Updated weights for policy 0, policy_version 44718 (0.0026) [2024-06-06 16:25:31,120][24347] Updated weights for policy 0, policy_version 44728 (0.0038) [2024-06-06 16:25:32,318][24114] Fps is (10 sec: 44237.2, 60 sec: 44236.9, 300 sec: 44597.8). Total num frames: 732839936. Throughput: 0: 44689.8. Samples: 214069100. Policy #0 lag: (min: 0.0, avg: 10.8, max: 23.0) [2024-06-06 16:25:32,318][24114] Avg episode reward: [(0, '0.263')] [2024-06-06 16:25:34,938][24347] Updated weights for policy 0, policy_version 44738 (0.0046) [2024-06-06 16:25:37,318][24114] Fps is (10 sec: 42598.2, 60 sec: 44236.9, 300 sec: 44653.3). Total num frames: 733069312. Throughput: 0: 44821.3. Samples: 214338060. Policy #0 lag: (min: 0.0, avg: 10.8, max: 23.0) [2024-06-06 16:25:37,318][24114] Avg episode reward: [(0, '0.270')] [2024-06-06 16:25:38,143][24347] Updated weights for policy 0, policy_version 44748 (0.0031) [2024-06-06 16:25:42,247][24347] Updated weights for policy 0, policy_version 44758 (0.0036) [2024-06-06 16:25:42,318][24114] Fps is (10 sec: 47512.7, 60 sec: 45055.9, 300 sec: 44708.8). Total num frames: 733315072. Throughput: 0: 44763.1. Samples: 214605840. Policy #0 lag: (min: 0.0, avg: 10.8, max: 23.0) [2024-06-06 16:25:42,319][24114] Avg episode reward: [(0, '0.265')] [2024-06-06 16:25:45,694][24347] Updated weights for policy 0, policy_version 44768 (0.0027) [2024-06-06 16:25:47,318][24114] Fps is (10 sec: 45874.6, 60 sec: 44509.8, 300 sec: 44708.9). Total num frames: 733528064. Throughput: 0: 44817.2. Samples: 214739900. 
Policy #0 lag: (min: 0.0, avg: 10.8, max: 23.0) [2024-06-06 16:25:47,319][24114] Avg episode reward: [(0, '0.270')] [2024-06-06 16:25:49,526][24347] Updated weights for policy 0, policy_version 44778 (0.0028) [2024-06-06 16:25:52,318][24114] Fps is (10 sec: 42598.7, 60 sec: 44509.8, 300 sec: 44653.3). Total num frames: 733741056. Throughput: 0: 44608.4. Samples: 215002640. Policy #0 lag: (min: 0.0, avg: 8.6, max: 18.0) [2024-06-06 16:25:52,319][24114] Avg episode reward: [(0, '0.266')] [2024-06-06 16:25:52,599][24326] Signal inference workers to stop experience collection... (3150 times) [2024-06-06 16:25:52,647][24347] InferenceWorker_p0-w0: stopping experience collection (3150 times) [2024-06-06 16:25:52,656][24326] Signal inference workers to resume experience collection... (3150 times) [2024-06-06 16:25:52,657][24347] InferenceWorker_p0-w0: resuming experience collection (3150 times) [2024-06-06 16:25:53,425][24347] Updated weights for policy 0, policy_version 44788 (0.0035) [2024-06-06 16:25:56,988][24347] Updated weights for policy 0, policy_version 44798 (0.0028) [2024-06-06 16:25:57,318][24114] Fps is (10 sec: 45876.1, 60 sec: 45057.5, 300 sec: 44653.3). Total num frames: 733986816. Throughput: 0: 44530.2. Samples: 215266900. Policy #0 lag: (min: 0.0, avg: 8.6, max: 18.0) [2024-06-06 16:25:57,318][24114] Avg episode reward: [(0, '0.263')] [2024-06-06 16:26:00,693][24347] Updated weights for policy 0, policy_version 44808 (0.0043) [2024-06-06 16:26:02,318][24114] Fps is (10 sec: 44237.4, 60 sec: 44509.9, 300 sec: 44597.8). Total num frames: 734183424. Throughput: 0: 44368.2. Samples: 215402200. Policy #0 lag: (min: 0.0, avg: 8.6, max: 18.0) [2024-06-06 16:26:02,318][24114] Avg episode reward: [(0, '0.273')] [2024-06-06 16:26:04,219][24347] Updated weights for policy 0, policy_version 44818 (0.0036) [2024-06-06 16:26:07,318][24114] Fps is (10 sec: 42598.5, 60 sec: 44783.0, 300 sec: 44597.8). Total num frames: 734412800. Throughput: 0: 44646.4. 
Samples: 215671740. Policy #0 lag: (min: 0.0, avg: 8.6, max: 18.0) [2024-06-06 16:26:07,318][24114] Avg episode reward: [(0, '0.268')] [2024-06-06 16:26:08,254][24347] Updated weights for policy 0, policy_version 44828 (0.0030) [2024-06-06 16:26:11,496][24347] Updated weights for policy 0, policy_version 44838 (0.0029) [2024-06-06 16:26:12,318][24114] Fps is (10 sec: 45874.8, 60 sec: 45055.9, 300 sec: 44653.4). Total num frames: 734642176. Throughput: 0: 44464.4. Samples: 215935360. Policy #0 lag: (min: 0.0, avg: 8.6, max: 18.0) [2024-06-06 16:26:12,319][24114] Avg episode reward: [(0, '0.268')] [2024-06-06 16:26:15,384][24347] Updated weights for policy 0, policy_version 44848 (0.0044) [2024-06-06 16:26:17,322][24114] Fps is (10 sec: 44219.1, 60 sec: 44507.1, 300 sec: 44652.7). Total num frames: 734855168. Throughput: 0: 44605.9. Samples: 216076540. Policy #0 lag: (min: 0.0, avg: 8.6, max: 18.0) [2024-06-06 16:26:17,322][24114] Avg episode reward: [(0, '0.263')] [2024-06-06 16:26:19,067][24347] Updated weights for policy 0, policy_version 44858 (0.0035) [2024-06-06 16:26:22,322][24114] Fps is (10 sec: 44219.9, 60 sec: 44780.1, 300 sec: 44652.8). Total num frames: 735084544. Throughput: 0: 44665.1. Samples: 216348160. Policy #0 lag: (min: 0.0, avg: 10.0, max: 22.0) [2024-06-06 16:26:22,322][24114] Avg episode reward: [(0, '0.262')] [2024-06-06 16:26:22,333][24326] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000044866_735084544.pth... [2024-06-06 16:26:22,390][24326] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000044212_724369408.pth [2024-06-06 16:26:22,991][24347] Updated weights for policy 0, policy_version 44868 (0.0032) [2024-06-06 16:26:26,255][24347] Updated weights for policy 0, policy_version 44878 (0.0045) [2024-06-06 16:26:27,318][24114] Fps is (10 sec: 45892.8, 60 sec: 44509.8, 300 sec: 44653.3). Total num frames: 735313920. Throughput: 0: 44493.0. Samples: 216608020. 
Policy #0 lag: (min: 0.0, avg: 10.0, max: 22.0) [2024-06-06 16:26:27,319][24114] Avg episode reward: [(0, '0.269')] [2024-06-06 16:26:30,011][24347] Updated weights for policy 0, policy_version 44888 (0.0036) [2024-06-06 16:26:32,318][24114] Fps is (10 sec: 44253.4, 60 sec: 44782.8, 300 sec: 44708.9). Total num frames: 735526912. Throughput: 0: 44648.0. Samples: 216749060. Policy #0 lag: (min: 0.0, avg: 10.0, max: 22.0) [2024-06-06 16:26:32,319][24114] Avg episode reward: [(0, '0.269')] [2024-06-06 16:26:33,523][24347] Updated weights for policy 0, policy_version 44898 (0.0034) [2024-06-06 16:26:37,318][24114] Fps is (10 sec: 44236.8, 60 sec: 44782.9, 300 sec: 44708.9). Total num frames: 735756288. Throughput: 0: 44844.0. Samples: 217020620. Policy #0 lag: (min: 0.0, avg: 10.0, max: 22.0) [2024-06-06 16:26:37,319][24114] Avg episode reward: [(0, '0.258')] [2024-06-06 16:26:37,535][24347] Updated weights for policy 0, policy_version 44908 (0.0034) [2024-06-06 16:26:40,685][24347] Updated weights for policy 0, policy_version 44918 (0.0028) [2024-06-06 16:26:42,318][24114] Fps is (10 sec: 44237.5, 60 sec: 44237.0, 300 sec: 44597.8). Total num frames: 735969280. Throughput: 0: 44871.1. Samples: 217286100. Policy #0 lag: (min: 0.0, avg: 10.0, max: 22.0) [2024-06-06 16:26:42,318][24114] Avg episode reward: [(0, '0.262')] [2024-06-06 16:26:45,283][24347] Updated weights for policy 0, policy_version 44928 (0.0033) [2024-06-06 16:26:47,318][24114] Fps is (10 sec: 44237.4, 60 sec: 44510.0, 300 sec: 44653.3). Total num frames: 736198656. Throughput: 0: 44777.3. Samples: 217417180. Policy #0 lag: (min: 0.0, avg: 10.0, max: 22.0) [2024-06-06 16:26:47,318][24114] Avg episode reward: [(0, '0.268')] [2024-06-06 16:26:48,184][24347] Updated weights for policy 0, policy_version 44938 (0.0031) [2024-06-06 16:26:52,318][24114] Fps is (10 sec: 44236.8, 60 sec: 44510.0, 300 sec: 44708.9). Total num frames: 736411648. Throughput: 0: 44802.6. Samples: 217687860. 
Policy #0 lag: (min: 0.0, avg: 9.7, max: 21.0) [2024-06-06 16:26:52,318][24114] Avg episode reward: [(0, '0.268')] [2024-06-06 16:26:52,343][24347] Updated weights for policy 0, policy_version 44948 (0.0031) [2024-06-06 16:26:55,716][24347] Updated weights for policy 0, policy_version 44958 (0.0043) [2024-06-06 16:26:57,320][24114] Fps is (10 sec: 44228.0, 60 sec: 44235.3, 300 sec: 44597.5). Total num frames: 736641024. Throughput: 0: 44886.1. Samples: 217955320. Policy #0 lag: (min: 0.0, avg: 9.7, max: 21.0) [2024-06-06 16:26:57,321][24114] Avg episode reward: [(0, '0.270')] [2024-06-06 16:26:59,397][24347] Updated weights for policy 0, policy_version 44968 (0.0037) [2024-06-06 16:27:02,318][24114] Fps is (10 sec: 47513.4, 60 sec: 45056.0, 300 sec: 44708.9). Total num frames: 736886784. Throughput: 0: 44808.4. Samples: 218092740. Policy #0 lag: (min: 0.0, avg: 9.7, max: 21.0) [2024-06-06 16:27:02,318][24114] Avg episode reward: [(0, '0.263')] [2024-06-06 16:27:02,830][24347] Updated weights for policy 0, policy_version 44978 (0.0032) [2024-06-06 16:27:06,955][24347] Updated weights for policy 0, policy_version 44988 (0.0029) [2024-06-06 16:27:07,318][24114] Fps is (10 sec: 45884.1, 60 sec: 44782.9, 300 sec: 44653.3). Total num frames: 737099776. Throughput: 0: 44678.0. Samples: 218358500. Policy #0 lag: (min: 0.0, avg: 9.7, max: 21.0) [2024-06-06 16:27:07,318][24114] Avg episode reward: [(0, '0.266')] [2024-06-06 16:27:09,823][24347] Updated weights for policy 0, policy_version 44998 (0.0032) [2024-06-06 16:27:12,318][24114] Fps is (10 sec: 42598.1, 60 sec: 44509.9, 300 sec: 44653.3). Total num frames: 737312768. Throughput: 0: 44903.2. Samples: 218628660. Policy #0 lag: (min: 0.0, avg: 9.7, max: 21.0) [2024-06-06 16:27:12,319][24114] Avg episode reward: [(0, '0.270')] [2024-06-06 16:27:14,580][24347] Updated weights for policy 0, policy_version 45008 (0.0036) [2024-06-06 16:27:17,318][24114] Fps is (10 sec: 45875.5, 60 sec: 45059.0, 300 sec: 44709.2). 
Total num frames: 737558528. Throughput: 0: 44697.0. Samples: 218760420. Policy #0 lag: (min: 0.0, avg: 9.7, max: 21.0) [2024-06-06 16:27:17,318][24114] Avg episode reward: [(0, '0.265')] [2024-06-06 16:27:17,450][24347] Updated weights for policy 0, policy_version 45018 (0.0028) [2024-06-06 16:27:21,578][24347] Updated weights for policy 0, policy_version 45028 (0.0030) [2024-06-06 16:27:22,318][24114] Fps is (10 sec: 44236.4, 60 sec: 44512.6, 300 sec: 44708.8). Total num frames: 737755136. Throughput: 0: 44715.5. Samples: 219032820. Policy #0 lag: (min: 0.0, avg: 9.7, max: 20.0) [2024-06-06 16:27:22,319][24114] Avg episode reward: [(0, '0.274')] [2024-06-06 16:27:23,249][24326] Signal inference workers to stop experience collection... (3200 times) [2024-06-06 16:27:23,251][24326] Signal inference workers to resume experience collection... (3200 times) [2024-06-06 16:27:23,289][24347] InferenceWorker_p0-w0: stopping experience collection (3200 times) [2024-06-06 16:27:23,289][24347] InferenceWorker_p0-w0: resuming experience collection (3200 times) [2024-06-06 16:27:24,987][24347] Updated weights for policy 0, policy_version 45038 (0.0032) [2024-06-06 16:27:27,318][24114] Fps is (10 sec: 42597.9, 60 sec: 44509.9, 300 sec: 44653.4). Total num frames: 737984512. Throughput: 0: 44758.6. Samples: 219300240. Policy #0 lag: (min: 0.0, avg: 9.7, max: 20.0) [2024-06-06 16:27:27,319][24114] Avg episode reward: [(0, '0.271')] [2024-06-06 16:27:28,631][24347] Updated weights for policy 0, policy_version 45048 (0.0027) [2024-06-06 16:27:32,078][24347] Updated weights for policy 0, policy_version 45058 (0.0032) [2024-06-06 16:27:32,318][24114] Fps is (10 sec: 47513.4, 60 sec: 45055.9, 300 sec: 44708.8). Total num frames: 738230272. Throughput: 0: 44872.6. Samples: 219436460. 
Policy #0 lag: (min: 0.0, avg: 9.7, max: 20.0) [2024-06-06 16:27:32,319][24114] Avg episode reward: [(0, '0.273')] [2024-06-06 16:27:36,215][24347] Updated weights for policy 0, policy_version 45068 (0.0034) [2024-06-06 16:27:37,318][24114] Fps is (10 sec: 45875.4, 60 sec: 44783.0, 300 sec: 44653.6). Total num frames: 738443264. Throughput: 0: 44867.9. Samples: 219706920. Policy #0 lag: (min: 0.0, avg: 9.7, max: 20.0) [2024-06-06 16:27:37,319][24114] Avg episode reward: [(0, '0.282')] [2024-06-06 16:27:39,070][24347] Updated weights for policy 0, policy_version 45078 (0.0037) [2024-06-06 16:27:42,318][24114] Fps is (10 sec: 40961.0, 60 sec: 44509.9, 300 sec: 44598.1). Total num frames: 738639872. Throughput: 0: 44842.9. Samples: 219973160. Policy #0 lag: (min: 0.0, avg: 9.7, max: 20.0) [2024-06-06 16:27:42,318][24114] Avg episode reward: [(0, '0.272')] [2024-06-06 16:27:43,686][24347] Updated weights for policy 0, policy_version 45088 (0.0031) [2024-06-06 16:27:46,514][24347] Updated weights for policy 0, policy_version 45098 (0.0033) [2024-06-06 16:27:47,324][24114] Fps is (10 sec: 45848.1, 60 sec: 45051.5, 300 sec: 44763.5). Total num frames: 738902016. Throughput: 0: 44784.3. Samples: 220108300. Policy #0 lag: (min: 0.0, avg: 9.9, max: 20.0) [2024-06-06 16:27:47,325][24114] Avg episode reward: [(0, '0.274')] [2024-06-06 16:27:50,832][24347] Updated weights for policy 0, policy_version 45108 (0.0029) [2024-06-06 16:27:52,318][24114] Fps is (10 sec: 45875.2, 60 sec: 44782.9, 300 sec: 44708.9). Total num frames: 739098624. Throughput: 0: 44934.3. Samples: 220380540. Policy #0 lag: (min: 0.0, avg: 9.9, max: 20.0) [2024-06-06 16:27:52,318][24114] Avg episode reward: [(0, '0.270')] [2024-06-06 16:27:54,054][24347] Updated weights for policy 0, policy_version 45118 (0.0023) [2024-06-06 16:27:57,318][24114] Fps is (10 sec: 42623.7, 60 sec: 44784.4, 300 sec: 44708.9). Total num frames: 739328000. Throughput: 0: 45058.3. Samples: 220656280. 
Policy #0 lag: (min: 0.0, avg: 9.9, max: 20.0) [2024-06-06 16:27:57,318][24114] Avg episode reward: [(0, '0.278')] [2024-06-06 16:27:58,000][24347] Updated weights for policy 0, policy_version 45128 (0.0044) [2024-06-06 16:28:01,065][24347] Updated weights for policy 0, policy_version 45138 (0.0026) [2024-06-06 16:28:02,324][24114] Fps is (10 sec: 47485.0, 60 sec: 44778.5, 300 sec: 44763.5). Total num frames: 739573760. Throughput: 0: 44818.9. Samples: 220777540. Policy #0 lag: (min: 0.0, avg: 9.9, max: 20.0) [2024-06-06 16:28:02,325][24114] Avg episode reward: [(0, '0.266')] [2024-06-06 16:28:05,493][24347] Updated weights for policy 0, policy_version 45148 (0.0037) [2024-06-06 16:28:07,318][24114] Fps is (10 sec: 47513.5, 60 sec: 45056.0, 300 sec: 44708.9). Total num frames: 739803136. Throughput: 0: 44954.8. Samples: 221055780. Policy #0 lag: (min: 0.0, avg: 9.9, max: 20.0) [2024-06-06 16:28:07,318][24114] Avg episode reward: [(0, '0.278')] [2024-06-06 16:28:08,052][24347] Updated weights for policy 0, policy_version 45158 (0.0029) [2024-06-06 16:28:12,318][24114] Fps is (10 sec: 42623.6, 60 sec: 44782.9, 300 sec: 44653.3). Total num frames: 739999744. Throughput: 0: 45011.1. Samples: 221325740. Policy #0 lag: (min: 0.0, avg: 9.9, max: 20.0) [2024-06-06 16:28:12,319][24114] Avg episode reward: [(0, '0.275')] [2024-06-06 16:28:13,054][24347] Updated weights for policy 0, policy_version 45168 (0.0030) [2024-06-06 16:28:15,861][24347] Updated weights for policy 0, policy_version 45178 (0.0035) [2024-06-06 16:28:17,318][24114] Fps is (10 sec: 44236.5, 60 sec: 44782.8, 300 sec: 44708.9). Total num frames: 740245504. Throughput: 0: 44658.8. Samples: 221446100. Policy #0 lag: (min: 0.0, avg: 12.5, max: 26.0) [2024-06-06 16:28:17,319][24114] Avg episode reward: [(0, '0.269')] [2024-06-06 16:28:20,236][24347] Updated weights for policy 0, policy_version 45188 (0.0033) [2024-06-06 16:28:21,754][24326] Signal inference workers to stop experience collection... 
(3250 times) [2024-06-06 16:28:21,781][24347] InferenceWorker_p0-w0: stopping experience collection (3250 times) [2024-06-06 16:28:21,808][24326] Signal inference workers to resume experience collection... (3250 times) [2024-06-06 16:28:21,812][24347] InferenceWorker_p0-w0: resuming experience collection (3250 times) [2024-06-06 16:28:22,318][24114] Fps is (10 sec: 49151.7, 60 sec: 45602.2, 300 sec: 44764.4). Total num frames: 740491264. Throughput: 0: 44846.6. Samples: 221725020. Policy #0 lag: (min: 0.0, avg: 12.5, max: 26.0) [2024-06-06 16:28:22,318][24114] Avg episode reward: [(0, '0.274')] [2024-06-06 16:28:22,333][24326] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000045196_740491264.pth... [2024-06-06 16:28:22,394][24326] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000044540_729743360.pth [2024-06-06 16:28:23,302][24347] Updated weights for policy 0, policy_version 45198 (0.0026) [2024-06-06 16:28:27,318][24114] Fps is (10 sec: 42599.0, 60 sec: 44783.0, 300 sec: 44764.4). Total num frames: 740671488. Throughput: 0: 44944.0. Samples: 221995640. Policy #0 lag: (min: 0.0, avg: 12.5, max: 26.0) [2024-06-06 16:28:27,318][24114] Avg episode reward: [(0, '0.279')] [2024-06-06 16:28:27,347][24347] Updated weights for policy 0, policy_version 45208 (0.0024) [2024-06-06 16:28:30,313][24347] Updated weights for policy 0, policy_version 45218 (0.0028) [2024-06-06 16:28:32,318][24114] Fps is (10 sec: 42599.1, 60 sec: 44783.1, 300 sec: 44708.9). Total num frames: 740917248. Throughput: 0: 44841.5. Samples: 222125900. Policy #0 lag: (min: 0.0, avg: 12.5, max: 26.0) [2024-06-06 16:28:32,318][24114] Avg episode reward: [(0, '0.279')] [2024-06-06 16:28:34,766][24347] Updated weights for policy 0, policy_version 45228 (0.0034) [2024-06-06 16:28:37,318][24114] Fps is (10 sec: 49151.3, 60 sec: 45329.0, 300 sec: 44819.9). Total num frames: 741163008. Throughput: 0: 44810.5. Samples: 222397020. 
Policy #0 lag: (min: 0.0, avg: 12.5, max: 26.0) [2024-06-06 16:28:37,319][24114] Avg episode reward: [(0, '0.279')] [2024-06-06 16:28:37,548][24347] Updated weights for policy 0, policy_version 45238 (0.0040) [2024-06-06 16:28:42,318][24114] Fps is (10 sec: 40959.8, 60 sec: 44782.9, 300 sec: 44597.8). Total num frames: 741326848. Throughput: 0: 44822.2. Samples: 222673280. Policy #0 lag: (min: 0.0, avg: 12.5, max: 26.0) [2024-06-06 16:28:42,327][24114] Avg episode reward: [(0, '0.279')] [2024-06-06 16:28:42,406][24347] Updated weights for policy 0, policy_version 45248 (0.0024) [2024-06-06 16:28:44,976][24347] Updated weights for policy 0, policy_version 45258 (0.0024) [2024-06-06 16:28:47,318][24114] Fps is (10 sec: 42598.6, 60 sec: 44787.3, 300 sec: 44653.4). Total num frames: 741588992. Throughput: 0: 44923.3. Samples: 222798820. Policy #0 lag: (min: 0.0, avg: 12.4, max: 21.0) [2024-06-06 16:28:47,319][24114] Avg episode reward: [(0, '0.274')] [2024-06-06 16:28:49,341][24347] Updated weights for policy 0, policy_version 45268 (0.0032) [2024-06-06 16:28:52,303][24347] Updated weights for policy 0, policy_version 45278 (0.0044) [2024-06-06 16:28:52,318][24114] Fps is (10 sec: 50789.9, 60 sec: 45602.0, 300 sec: 44764.4). Total num frames: 741834752. Throughput: 0: 44749.7. Samples: 223069520. Policy #0 lag: (min: 0.0, avg: 12.4, max: 21.0) [2024-06-06 16:28:52,319][24114] Avg episode reward: [(0, '0.273')] [2024-06-06 16:28:56,452][24347] Updated weights for policy 0, policy_version 45288 (0.0031) [2024-06-06 16:28:57,318][24114] Fps is (10 sec: 44237.1, 60 sec: 45056.0, 300 sec: 44764.4). Total num frames: 742031360. Throughput: 0: 44798.3. Samples: 223341660. Policy #0 lag: (min: 0.0, avg: 12.4, max: 21.0) [2024-06-06 16:28:57,318][24114] Avg episode reward: [(0, '0.272')] [2024-06-06 16:28:59,697][24347] Updated weights for policy 0, policy_version 45298 (0.0037) [2024-06-06 16:29:02,318][24114] Fps is (10 sec: 40960.4, 60 sec: 44514.3, 300 sec: 44653.3). 
Total num frames: 742244352. Throughput: 0: 45045.4. Samples: 223473140. Policy #0 lag: (min: 0.0, avg: 12.4, max: 21.0) [2024-06-06 16:29:02,319][24114] Avg episode reward: [(0, '0.272')] [2024-06-06 16:29:04,118][24347] Updated weights for policy 0, policy_version 45308 (0.0034) [2024-06-06 16:29:06,964][24347] Updated weights for policy 0, policy_version 45318 (0.0038) [2024-06-06 16:29:07,318][24114] Fps is (10 sec: 45875.1, 60 sec: 44782.9, 300 sec: 44764.6). Total num frames: 742490112. Throughput: 0: 44655.7. Samples: 223734520. Policy #0 lag: (min: 0.0, avg: 12.4, max: 21.0) [2024-06-06 16:29:07,318][24114] Avg episode reward: [(0, '0.267')] [2024-06-06 16:29:11,570][24347] Updated weights for policy 0, policy_version 45328 (0.0035) [2024-06-06 16:29:12,318][24114] Fps is (10 sec: 45874.9, 60 sec: 45056.0, 300 sec: 44708.9). Total num frames: 742703104. Throughput: 0: 44792.3. Samples: 224011300. Policy #0 lag: (min: 0.0, avg: 12.4, max: 21.0) [2024-06-06 16:29:12,318][24114] Avg episode reward: [(0, '0.274')] [2024-06-06 16:29:14,396][24347] Updated weights for policy 0, policy_version 45338 (0.0034) [2024-06-06 16:29:17,318][24114] Fps is (10 sec: 40960.0, 60 sec: 44236.9, 300 sec: 44597.8). Total num frames: 742899712. Throughput: 0: 44726.6. Samples: 224138600. Policy #0 lag: (min: 0.0, avg: 11.0, max: 22.0) [2024-06-06 16:29:17,318][24114] Avg episode reward: [(0, '0.281')] [2024-06-06 16:29:18,606][24347] Updated weights for policy 0, policy_version 45348 (0.0043) [2024-06-06 16:29:21,556][24347] Updated weights for policy 0, policy_version 45358 (0.0033) [2024-06-06 16:29:22,318][24114] Fps is (10 sec: 45875.0, 60 sec: 44509.9, 300 sec: 44764.4). Total num frames: 743161856. Throughput: 0: 44698.6. Samples: 224408460. 
Policy #0 lag: (min: 0.0, avg: 11.0, max: 22.0) [2024-06-06 16:29:22,321][24114] Avg episode reward: [(0, '0.273')] [2024-06-06 16:29:25,865][24347] Updated weights for policy 0, policy_version 45368 (0.0027) [2024-06-06 16:29:27,318][24114] Fps is (10 sec: 47513.5, 60 sec: 45055.9, 300 sec: 44708.9). Total num frames: 743374848. Throughput: 0: 44452.0. Samples: 224673620. Policy #0 lag: (min: 0.0, avg: 11.0, max: 22.0) [2024-06-06 16:29:27,319][24114] Avg episode reward: [(0, '0.280')] [2024-06-06 16:29:28,967][24326] Signal inference workers to stop experience collection... (3300 times) [2024-06-06 16:29:28,987][24347] InferenceWorker_p0-w0: stopping experience collection (3300 times) [2024-06-06 16:29:29,024][24326] Signal inference workers to resume experience collection... (3300 times) [2024-06-06 16:29:29,024][24347] InferenceWorker_p0-w0: resuming experience collection (3300 times) [2024-06-06 16:29:29,159][24347] Updated weights for policy 0, policy_version 45378 (0.0022) [2024-06-06 16:29:32,318][24114] Fps is (10 sec: 42598.3, 60 sec: 44509.8, 300 sec: 44653.3). Total num frames: 743587840. Throughput: 0: 44629.3. Samples: 224807140. Policy #0 lag: (min: 0.0, avg: 11.0, max: 22.0) [2024-06-06 16:29:32,319][24114] Avg episode reward: [(0, '0.283')] [2024-06-06 16:29:33,510][24347] Updated weights for policy 0, policy_version 45388 (0.0038) [2024-06-06 16:29:36,679][24347] Updated weights for policy 0, policy_version 45398 (0.0033) [2024-06-06 16:29:37,318][24114] Fps is (10 sec: 44236.2, 60 sec: 44236.8, 300 sec: 44764.4). Total num frames: 743817216. Throughput: 0: 44525.3. Samples: 225073160. Policy #0 lag: (min: 0.0, avg: 11.0, max: 22.0) [2024-06-06 16:29:37,319][24114] Avg episode reward: [(0, '0.274')] [2024-06-06 16:29:40,820][24347] Updated weights for policy 0, policy_version 45408 (0.0033) [2024-06-06 16:29:42,320][24114] Fps is (10 sec: 45866.7, 60 sec: 45327.6, 300 sec: 44708.6). Total num frames: 744046592. Throughput: 0: 44420.7. 
Samples: 225340680. Policy #0 lag: (min: 1.0, avg: 10.1, max: 22.0) [2024-06-06 16:29:42,320][24114] Avg episode reward: [(0, '0.276')] [2024-06-06 16:29:43,608][24347] Updated weights for policy 0, policy_version 45418 (0.0029) [2024-06-06 16:29:47,318][24114] Fps is (10 sec: 42598.6, 60 sec: 44236.8, 300 sec: 44653.3). Total num frames: 744243200. Throughput: 0: 44486.6. Samples: 225475040. Policy #0 lag: (min: 1.0, avg: 10.1, max: 22.0) [2024-06-06 16:29:47,319][24114] Avg episode reward: [(0, '0.283')] [2024-06-06 16:29:48,037][24347] Updated weights for policy 0, policy_version 45428 (0.0036) [2024-06-06 16:29:50,942][24347] Updated weights for policy 0, policy_version 45438 (0.0041) [2024-06-06 16:29:52,318][24114] Fps is (10 sec: 44245.4, 60 sec: 44236.9, 300 sec: 44764.7). Total num frames: 744488960. Throughput: 0: 44594.2. Samples: 225741260. Policy #0 lag: (min: 1.0, avg: 10.1, max: 22.0) [2024-06-06 16:29:52,319][24114] Avg episode reward: [(0, '0.275')] [2024-06-06 16:29:55,183][24347] Updated weights for policy 0, policy_version 45448 (0.0032) [2024-06-06 16:29:57,318][24114] Fps is (10 sec: 45875.8, 60 sec: 44509.9, 300 sec: 44708.9). Total num frames: 744701952. Throughput: 0: 44570.8. Samples: 226016980. Policy #0 lag: (min: 1.0, avg: 10.1, max: 22.0) [2024-06-06 16:29:57,318][24114] Avg episode reward: [(0, '0.284')] [2024-06-06 16:29:58,348][24347] Updated weights for policy 0, policy_version 45458 (0.0025) [2024-06-06 16:30:02,322][24114] Fps is (10 sec: 42580.5, 60 sec: 44506.7, 300 sec: 44708.3). Total num frames: 744914944. Throughput: 0: 44599.8. Samples: 226145780. 
Policy #0 lag: (min: 1.0, avg: 10.1, max: 22.0) [2024-06-06 16:30:02,323][24114] Avg episode reward: [(0, '0.277')] [2024-06-06 16:30:02,707][24347] Updated weights for policy 0, policy_version 45468 (0.0028) [2024-06-06 16:30:05,583][24347] Updated weights for policy 0, policy_version 45478 (0.0029) [2024-06-06 16:30:07,318][24114] Fps is (10 sec: 47513.3, 60 sec: 44782.9, 300 sec: 44875.5). Total num frames: 745177088. Throughput: 0: 44482.3. Samples: 226410160. Policy #0 lag: (min: 1.0, avg: 10.1, max: 22.0) [2024-06-06 16:30:07,318][24114] Avg episode reward: [(0, '0.283')] [2024-06-06 16:30:10,166][24347] Updated weights for policy 0, policy_version 45488 (0.0040) [2024-06-06 16:30:12,318][24114] Fps is (10 sec: 45894.5, 60 sec: 44509.9, 300 sec: 44708.9). Total num frames: 745373696. Throughput: 0: 44644.4. Samples: 226682620. Policy #0 lag: (min: 1.0, avg: 9.3, max: 21.0) [2024-06-06 16:30:12,319][24114] Avg episode reward: [(0, '0.274')] [2024-06-06 16:30:12,885][24347] Updated weights for policy 0, policy_version 45498 (0.0033) [2024-06-06 16:30:17,166][24347] Updated weights for policy 0, policy_version 45508 (0.0043) [2024-06-06 16:30:17,318][24114] Fps is (10 sec: 42598.7, 60 sec: 45056.0, 300 sec: 44764.4). Total num frames: 745603072. Throughput: 0: 44603.3. Samples: 226814280. Policy #0 lag: (min: 1.0, avg: 9.3, max: 21.0) [2024-06-06 16:30:17,318][24114] Avg episode reward: [(0, '0.279')] [2024-06-06 16:30:20,477][24347] Updated weights for policy 0, policy_version 45518 (0.0041) [2024-06-06 16:30:22,318][24114] Fps is (10 sec: 45875.6, 60 sec: 44510.0, 300 sec: 44708.9). Total num frames: 745832448. Throughput: 0: 44548.2. Samples: 227077820. Policy #0 lag: (min: 1.0, avg: 9.3, max: 21.0) [2024-06-06 16:30:22,318][24114] Avg episode reward: [(0, '0.276')] [2024-06-06 16:30:22,329][24326] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000045522_745832448.pth... 
[2024-06-06 16:30:22,396][24326] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000044866_735084544.pth [2024-06-06 16:30:24,364][24347] Updated weights for policy 0, policy_version 45528 (0.0028) [2024-06-06 16:30:27,318][24114] Fps is (10 sec: 42598.4, 60 sec: 44236.8, 300 sec: 44708.9). Total num frames: 746029056. Throughput: 0: 44712.2. Samples: 227352640. Policy #0 lag: (min: 1.0, avg: 9.3, max: 21.0) [2024-06-06 16:30:27,318][24114] Avg episode reward: [(0, '0.274')] [2024-06-06 16:30:27,983][24347] Updated weights for policy 0, policy_version 45538 (0.0026) [2024-06-06 16:30:32,051][24347] Updated weights for policy 0, policy_version 45548 (0.0026) [2024-06-06 16:30:32,318][24114] Fps is (10 sec: 42598.0, 60 sec: 44509.9, 300 sec: 44708.9). Total num frames: 746258432. Throughput: 0: 44594.3. Samples: 227481780. Policy #0 lag: (min: 1.0, avg: 9.3, max: 21.0) [2024-06-06 16:30:32,318][24114] Avg episode reward: [(0, '0.274')] [2024-06-06 16:30:35,119][24347] Updated weights for policy 0, policy_version 45558 (0.0029) [2024-06-06 16:30:37,318][24114] Fps is (10 sec: 45874.8, 60 sec: 44510.0, 300 sec: 44653.4). Total num frames: 746487808. Throughput: 0: 44509.3. Samples: 227744180. Policy #0 lag: (min: 1.0, avg: 9.3, max: 21.0) [2024-06-06 16:30:37,318][24114] Avg episode reward: [(0, '0.279')] [2024-06-06 16:30:39,403][24347] Updated weights for policy 0, policy_version 45568 (0.0027) [2024-06-06 16:30:42,318][24114] Fps is (10 sec: 47513.9, 60 sec: 44784.4, 300 sec: 44764.5). Total num frames: 746733568. Throughput: 0: 44452.9. Samples: 228017360. 
Policy #0 lag: (min: 0.0, avg: 8.5, max: 21.0) [2024-06-06 16:30:42,318][24114] Avg episode reward: [(0, '0.275')] [2024-06-06 16:30:42,527][24347] Updated weights for policy 0, policy_version 45578 (0.0040) [2024-06-06 16:30:46,562][24347] Updated weights for policy 0, policy_version 45588 (0.0029) [2024-06-06 16:30:47,324][24114] Fps is (10 sec: 45848.0, 60 sec: 45051.6, 300 sec: 44763.5). Total num frames: 746946560. Throughput: 0: 44594.7. Samples: 228152620. Policy #0 lag: (min: 0.0, avg: 8.5, max: 21.0) [2024-06-06 16:30:47,325][24114] Avg episode reward: [(0, '0.288')] [2024-06-06 16:30:49,848][24347] Updated weights for policy 0, policy_version 45598 (0.0024) [2024-06-06 16:30:52,318][24114] Fps is (10 sec: 40959.6, 60 sec: 44236.8, 300 sec: 44597.8). Total num frames: 747143168. Throughput: 0: 44501.3. Samples: 228412720. Policy #0 lag: (min: 0.0, avg: 8.5, max: 21.0) [2024-06-06 16:30:52,318][24114] Avg episode reward: [(0, '0.278')] [2024-06-06 16:30:53,673][24326] Signal inference workers to stop experience collection... (3350 times) [2024-06-06 16:30:53,720][24347] InferenceWorker_p0-w0: stopping experience collection (3350 times) [2024-06-06 16:30:53,727][24326] Signal inference workers to resume experience collection... (3350 times) [2024-06-06 16:30:53,735][24347] InferenceWorker_p0-w0: resuming experience collection (3350 times) [2024-06-06 16:30:53,872][24347] Updated weights for policy 0, policy_version 45608 (0.0027) [2024-06-06 16:30:57,318][24114] Fps is (10 sec: 44263.0, 60 sec: 44782.9, 300 sec: 44764.4). Total num frames: 747388928. Throughput: 0: 44495.1. Samples: 228684900. Policy #0 lag: (min: 0.0, avg: 8.5, max: 21.0) [2024-06-06 16:30:57,318][24114] Avg episode reward: [(0, '0.273')] [2024-06-06 16:30:57,361][24347] Updated weights for policy 0, policy_version 45618 (0.0040) [2024-06-06 16:33:45,098][24114] Fps is (10 sec: 1896.5, 60 sec: 11473.0, 300 sec: 28524.7). Total num frames: 747470848. Throughput: 0: 9658.8. 
Samples: 228821180. Policy #0 lag: (min: 0.0, avg: 8.5, max: 21.0) [2024-06-06 16:33:45,098][24114] Avg episode reward: [(0, '0.274')] [2024-06-06 16:33:45,105][24114] Fps is (10 sec: 488.2, 60 sec: 10532.1, 300 sec: 28332.7). Total num frames: 747470848. Throughput: 0: 8597.0. Samples: 228821180. Policy #0 lag: (min: 0.0, avg: 8.5, max: 21.0) [2024-06-06 16:33:45,106][24114] Avg episode reward: [(0, '0.274')] [2024-06-06 16:33:45,113][24114] Fps is (10 sec: 0.0, 60 sec: 9855.3, 300 sec: 28173.2). Total num frames: 747470848. Throughput: 0: 7424.6. Samples: 228821180. Policy #0 lag: (min: 0.0, avg: 8.5, max: 21.0) [2024-06-06 16:33:45,113][24114] Avg episode reward: [(0, '0.274')] [2024-06-06 16:33:45,119][24114] Fps is (10 sec: 0.0, 60 sec: 8988.3, 300 sec: 27972.9). Total num frames: 747470848. Throughput: 0: 6947.1. Samples: 228821180. Policy #0 lag: (min: 0.0, avg: 8.5, max: 21.0) [2024-06-06 16:33:45,119][24114] Avg episode reward: [(0, '0.274')] [2024-06-06 16:33:45,129][24114] Fps is (10 sec: 0.0, 60 sec: 8078.4, 300 sec: 27767.5). Total num frames: 747470848. Throughput: 0: 5747.5. Samples: 228823620. Policy #0 lag: (min: 0.0, avg: 8.5, max: 21.0) [2024-06-06 16:33:45,129][24114] Avg episode reward: [(0, '0.274')] [2024-06-06 16:33:45,134][24114] Fps is (10 sec: 0.0, 60 sec: 7288.6, 300 sec: 27595.9). Total num frames: 747470848. Throughput: 0: 4410.2. Samples: 228823620. Policy #0 lag: (min: 0.0, avg: 8.5, max: 21.0) [2024-06-06 16:33:45,134][24114] Avg episode reward: [(0, '0.274')] [2024-06-06 16:33:45,138][24114] Fps is (10 sec: 0.0, 60 sec: 6287.8, 300 sec: 27382.0). Total num frames: 747470848. Throughput: 0: 3773.6. Samples: 228823620. Policy #0 lag: (min: 0.0, avg: 8.5, max: 21.0) [2024-06-06 16:33:45,138][24114] Avg episode reward: [(0, '0.274')] [2024-06-06 16:33:45,142][24114] Fps is (10 sec: 0.0, 60 sec: 5233.8, 300 sec: 27201.8). Total num frames: 747470848. Throughput: 0: 2377.6. Samples: 228823620. 
Policy #0 lag: (min: 0.0, avg: 8.5, max: 21.0) [2024-06-06 16:33:45,142][24114] Avg episode reward: [(0, '0.274')] [2024-06-06 16:33:45,145][24114] Fps is (10 sec: 0.0, 60 sec: 4032.7, 300 sec: 26978.1). Total num frames: 747470848. Throughput: 0: 826.6. Samples: 228823620. Policy #0 lag: (min: 0.0, avg: 8.5, max: 21.0) [2024-06-06 16:33:45,145][24114] Avg episode reward: [(0, '0.274')] [2024-06-06 16:33:45,149][24114] Fps is (10 sec: 0.0, 60 sec: 2948.3, 300 sec: 26788.7). Total num frames: 747470848. Throughput: 0: 47923.9. Samples: 228823620. Policy #0 lag: (min: 0.0, avg: 8.5, max: 21.0) [2024-06-06 16:33:45,149][24114] Avg episode reward: [(0, '0.274')] [2024-06-06 16:33:45,153][24114] Fps is (10 sec: 0.0, 60 sec: 1895.9, 300 sec: 26554.5). Total num frames: 747470848. Throughput: 0: 50657.3. Samples: 228823620. Policy #0 lag: (min: 0.0, avg: 8.5, max: 21.0) [2024-06-06 16:33:45,153][24114] Avg episode reward: [(0, '0.274')] [2024-06-06 16:33:45,169][24114] Fps is (10 sec: 0.0, 60 sec: 488.1, 300 sec: 26272.9). Total num frames: 747470848. Throughput: 0: 43581.1. Samples: 228823620. Policy #0 lag: (min: 0.0, avg: 8.5, max: 21.0) [2024-06-06 16:33:45,169][24114] Avg episode reward: [(0, '0.274')] [2024-06-06 16:33:45,173][24114] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 26067.5). Total num frames: 747470848. Throughput: 0: 66921.6. Samples: 228824800. Policy #0 lag: (min: 0.0, avg: 8.5, max: 21.0) [2024-06-06 16:33:45,173][24114] Avg episode reward: [(0, '0.274')] [2024-06-06 16:33:45,177][24114] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 25856.8). Total num frames: 747470848. Throughput: 0: 24559.9. Samples: 228824800. Policy #0 lag: (min: 0.0, avg: 8.5, max: 21.0) [2024-06-06 16:33:45,177][24114] Avg episode reward: [(0, '0.274')] [2024-06-06 16:33:45,182][24114] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 25556.2). Total num frames: 747470848. Throughput: 0: 24539.6. Samples: 228824800. 
Policy #0 lag: (min: 0.0, avg: 8.5, max: 21.0) [2024-06-06 16:33:45,182][24114] Avg episode reward: [(0, '0.274')] [2024-06-06 16:33:45,185][24114] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 25376.2). Total num frames: 747470848. Throughput: 0: 25097.4. Samples: 228824800. Policy #0 lag: (min: 0.0, avg: 8.5, max: 21.0) [2024-06-06 16:33:45,185][24114] Avg episode reward: [(0, '0.274')] [2024-06-06 16:33:45,186][24326] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000045623_747487232.pth... [2024-06-06 16:33:45,206][24114] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 25103.6). Total num frames: 747470848. Throughput: 0: 18342.5. Samples: 228824800. Policy #0 lag: (min: 0.0, avg: 8.5, max: 21.0) [2024-06-06 16:33:45,206][24114] Avg episode reward: [(0, '0.274')] [2024-06-06 16:33:45,219][24114] Fps is (10 sec: 478958.1, 60 sec: 191468.2, 300 sec: 24824.2). Total num frames: 747487232. Throughput: 0: 31381.1. Samples: 228825940. Policy #0 lag: (min: 0.0, avg: 8.5, max: 21.0) [2024-06-06 16:33:45,220][24114] Avg episode reward: [(0, '0.274')] [2024-06-06 16:33:45,220][24114] Fps is (10 sec: 1138116.5, 60 sec: 198872.2, 300 sec: 24582.5). Total num frames: 747487232. Throughput: 0: 41654.7. Samples: 228826600. Policy #0 lag: (min: 0.0, avg: 8.5, max: 21.0) [2024-06-06 16:33:45,221][24114] Avg episode reward: [(0, '0.274')] [2024-06-06 16:33:45,221][24114] Fps is (10 sec: 0.0, 60 sec: 206534.7, 300 sec: 24379.4). Total num frames: 747487232. Throughput: 0: 43959.4. Samples: 228826600. Policy #0 lag: (min: 0.0, avg: 8.5, max: 21.0) [2024-06-06 16:33:45,221][24114] Avg episode reward: [(0, '0.274')] [2024-06-06 16:33:45,222][24114] Fps is (10 sec: 0.0, 60 sec: 214875.3, 300 sec: 23987.9). Total num frames: 747487232. Throughput: 0: 56359.0. Samples: 228826600. 
Policy #0 lag: (min: 0.0, avg: 8.5, max: 21.0) [2024-06-06 16:33:45,222][24114] Avg episode reward: [(0, '0.274')] [2024-06-06 16:33:45,222][24114] Fps is (10 sec: 0.0, 60 sec: 223654.7, 300 sec: 23770.2). Total num frames: 747487232. Throughput: 0: 47693.6. Samples: 228827140. Policy #0 lag: (min: 0.0, avg: 8.5, max: 21.0) [2024-06-06 16:33:45,222][24114] Avg episode reward: [(0, '0.274')] [2024-06-06 16:33:45,223][24114] Fps is (10 sec: 0.0, 60 sec: 235837.4, 300 sec: 23452.5). Total num frames: 747487232. Throughput: 0: 51299.2. Samples: 228827140. Policy #0 lag: (min: 0.0, avg: 8.5, max: 21.0) [2024-06-06 16:33:45,223][24114] Avg episode reward: [(0, '0.274')] [2024-06-06 16:33:45,223][24114] Fps is (10 sec: 0.0, 60 sec: 300254.2, 300 sec: 23078.1). Total num frames: 747487232. Throughput: 0: 56391.2. Samples: 228827140. Policy #0 lag: (min: 0.0, avg: 8.5, max: 21.0) [2024-06-06 16:33:45,223][24114] Avg episode reward: [(0, '0.274')] [2024-06-06 16:33:45,228][24114] Fps is (10 sec: 0.0, 60 sec: 301328.5, 300 sec: 22740.1). Total num frames: 747487232. Throughput: 0: 55198.8. Samples: 228827140. Policy #0 lag: (min: 0.0, avg: 8.5, max: 21.0) [2024-06-06 16:33:45,228][24114] Avg episode reward: [(0, '0.274')] [2024-06-06 16:33:45,228][24114] Fps is (10 sec: 0.0, 60 sec: 319900.7, 300 sec: 22491.0). Total num frames: 747487232. Throughput: 0: 105029.3. Samples: 228827140. Policy #0 lag: (min: 0.0, avg: 8.5, max: 21.0) [2024-06-06 16:33:45,228][24114] Avg episode reward: [(0, '0.274')] [2024-06-06 16:33:45,229][24114] Fps is (10 sec: 0.0, 60 sec: 348774.9, 300 sec: 22084.4). Total num frames: 747487232. Throughput: 0: 126436.0. Samples: 228827140. Policy #0 lag: (min: 0.0, avg: 8.5, max: 21.0) [2024-06-06 16:33:45,229][24114] Avg episode reward: [(0, '0.274')] [2024-06-06 16:33:45,229][24114] Fps is (10 sec: 0.0, 60 sec: 370259.8, 300 sec: 21665.3). Total num frames: 747487232. Throughput: 0: 60795.2. Samples: 228827140. 
Policy #0 lag: (min: 0.0, avg: 8.5, max: 21.0) [2024-06-06 16:33:45,230][24114] Avg episode reward: [(0, '0.274')] [2024-06-06 16:33:45,230][24114] Fps is (10 sec: 0.0, 60 sec: 685269.2, 300 sec: 21439.1). Total num frames: 747487232. Throughput: 0: 60590.3. Samples: 228827140. Policy #0 lag: (min: 0.0, avg: 8.5, max: 21.0) [2024-06-06 16:33:45,230][24114] Avg episode reward: [(0, '0.274')] [2024-06-06 16:33:45,231][24114] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 20996.2). Total num frames: 747487232. Throughput: 0: 60189.3. Samples: 228827140. Policy #0 lag: (min: 0.0, avg: 8.5, max: 21.0) [2024-06-06 16:33:45,231][24114] Avg episode reward: [(0, '0.274')] [2024-06-06 16:33:45,231][24114] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 20539.0). Total num frames: 747487232. Throughput: 0: 0.0. Samples: 228827140. Policy #0 lag: (min: 0.0, avg: 8.5, max: 21.0) [2024-06-06 16:33:45,232][24114] Avg episode reward: [(0, '0.274')] [2024-06-06 16:33:45,232][24114] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 20337.1). Total num frames: 747487232. Throughput: 0: 0.0. Samples: 228827140. Policy #0 lag: (min: 0.0, avg: 8.5, max: 21.0) [2024-06-06 16:33:45,232][24114] Avg episode reward: [(0, '0.274')] [2024-06-06 16:33:45,233][24114] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 19798.4). Total num frames: 747487232. Throughput: 0: 0.0. Samples: 228827140. Policy #0 lag: (min: 0.0, avg: 8.5, max: 21.0) [2024-06-06 16:33:45,233][24114] Avg episode reward: [(0, '0.274')] [2024-06-06 16:33:45,251][24326] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000045196_740491264.pth [2024-06-06 16:33:45,372][24114] Heartbeat reconnected after 180 seconds from RolloutWorker_w21 [2024-06-06 16:33:46,748][24347] Updated weights for policy 0, policy_version 45628 (0.0034) [2024-06-06 16:33:47,318][24114] Fps is (10 sec: 54980.5, 60 sec: 54720.6, 300 sec: 19549.7). Total num frames: 747601920. Throughput: 0: 14120.5. Samples: 228856660. 
Policy #0 lag: (min: 0.0, avg: 8.5, max: 20.0) [2024-06-06 16:33:47,319][24114] Avg episode reward: [(0, '0.282')] [2024-06-06 16:33:49,810][24347] Updated weights for policy 0, policy_version 45638 (0.0027) [2024-06-06 16:33:52,320][24114] Fps is (10 sec: 43924.3, 60 sec: 43862.5, 300 sec: 19549.6). Total num frames: 747798528. Throughput: 0: 34909.4. Samples: 229074700. Policy #0 lag: (min: 0.0, avg: 8.5, max: 20.0) [2024-06-06 16:33:52,320][24114] Avg episode reward: [(0, '0.270')] [2024-06-06 16:33:54,010][24347] Updated weights for policy 0, policy_version 45648 (0.0033) [2024-06-06 16:33:57,000][24347] Updated weights for policy 0, policy_version 45658 (0.0048) [2024-06-06 16:33:57,318][24114] Fps is (10 sec: 45875.7, 60 sec: 47412.7, 300 sec: 19716.3). Total num frames: 748060672. Throughput: 0: 42727.5. Samples: 229343680. Policy #0 lag: (min: 0.0, avg: 8.5, max: 20.0) [2024-06-06 16:33:57,319][24114] Avg episode reward: [(0, '0.276')] [2024-06-06 16:34:00,989][24347] Updated weights for policy 0, policy_version 45668 (0.0032) [2024-06-06 16:34:02,318][24114] Fps is (10 sec: 47522.2, 60 sec: 46015.6, 300 sec: 19605.3). Total num frames: 748273664. Throughput: 0: 38610.3. Samples: 229486940. Policy #0 lag: (min: 0.0, avg: 8.5, max: 20.0) [2024-06-06 16:34:02,319][24114] Avg episode reward: [(0, '0.279')] [2024-06-06 16:34:04,419][24347] Updated weights for policy 0, policy_version 45678 (0.0039) [2024-06-06 16:34:07,318][24114] Fps is (10 sec: 40960.1, 60 sec: 44502.4, 300 sec: 19549.7). Total num frames: 748470272. Throughput: 0: 41710.5. Samples: 229748440. Policy #0 lag: (min: 0.0, avg: 8.5, max: 20.0) [2024-06-06 16:34:07,318][24114] Avg episode reward: [(0, '0.277')] [2024-06-06 16:34:08,470][24347] Updated weights for policy 0, policy_version 45688 (0.0025) [2024-06-06 16:34:11,749][24347] Updated weights for policy 0, policy_version 45698 (0.0033) [2024-06-06 16:34:12,318][24114] Fps is (10 sec: 45875.1, 60 sec: 45966.0, 300 sec: 19771.9). 
Total num frames: 748732416. Throughput: 0: 43991.6. Samples: 230018760. Policy #0 lag: (min: 0.0, avg: 8.5, max: 20.0) [2024-06-06 16:34:12,319][24114] Avg episode reward: [(0, '0.284')] [2024-06-06 16:34:15,927][24347] Updated weights for policy 0, policy_version 45708 (0.0028) [2024-06-06 16:34:17,318][24114] Fps is (10 sec: 47514.0, 60 sec: 45442.3, 300 sec: 19605.3). Total num frames: 748945408. Throughput: 0: 41529.0. Samples: 230159660. Policy #0 lag: (min: 0.0, avg: 9.4, max: 22.0) [2024-06-06 16:34:17,318][24114] Avg episode reward: [(0, '0.282')] [2024-06-06 16:34:18,857][24347] Updated weights for policy 0, policy_version 45718 (0.0028) [2024-06-06 16:34:22,318][24114] Fps is (10 sec: 44236.8, 60 sec: 45501.2, 300 sec: 19660.8). Total num frames: 749174784. Throughput: 0: 43187.8. Samples: 230428800. Policy #0 lag: (min: 0.0, avg: 9.4, max: 22.0) [2024-06-06 16:34:22,319][24114] Avg episode reward: [(0, '0.282')] [2024-06-06 16:34:22,331][24326] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000045726_749174784.pth... [2024-06-06 16:34:22,373][24326] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000045522_745832448.pth [2024-06-06 16:34:22,996][24347] Updated weights for policy 0, policy_version 45728 (0.0028) [2024-06-06 16:34:26,324][24347] Updated weights for policy 0, policy_version 45738 (0.0039) [2024-06-06 16:34:27,318][24114] Fps is (10 sec: 45874.4, 60 sec: 45546.3, 300 sec: 19716.3). Total num frames: 749404160. Throughput: 0: 44456.9. Samples: 230698120. Policy #0 lag: (min: 0.0, avg: 9.4, max: 22.0) [2024-06-06 16:34:27,319][24114] Avg episode reward: [(0, '0.287')] [2024-06-06 16:34:30,080][24347] Updated weights for policy 0, policy_version 45748 (0.0045) [2024-06-06 16:34:32,318][24114] Fps is (10 sec: 44237.6, 60 sec: 45234.2, 300 sec: 19660.8). Total num frames: 749617152. Throughput: 0: 43868.6. Samples: 230830740. 
Policy #0 lag: (min: 0.0, avg: 9.4, max: 22.0) [2024-06-06 16:34:32,318][24114] Avg episode reward: [(0, '0.283')] [2024-06-06 16:34:33,887][24347] Updated weights for policy 0, policy_version 45758 (0.0029) [2024-06-06 16:34:37,318][24114] Fps is (10 sec: 44237.5, 60 sec: 45296.3, 300 sec: 19660.9). Total num frames: 749846528. Throughput: 0: 45134.0. Samples: 231105640. Policy #0 lag: (min: 0.0, avg: 9.4, max: 22.0) [2024-06-06 16:34:37,318][24114] Avg episode reward: [(0, '0.279')] [2024-06-06 16:34:37,554][24347] Updated weights for policy 0, policy_version 45768 (0.0037) [2024-06-06 16:34:41,054][24347] Updated weights for policy 0, policy_version 45778 (0.0020) [2024-06-06 16:34:42,324][24114] Fps is (10 sec: 44210.2, 60 sec: 45055.8, 300 sec: 19715.9). Total num frames: 750059520. Throughput: 0: 45067.4. Samples: 231371980. Policy #0 lag: (min: 0.0, avg: 9.4, max: 22.0) [2024-06-06 16:34:42,325][24114] Avg episode reward: [(0, '0.283')] [2024-06-06 16:34:45,016][24347] Updated weights for policy 0, policy_version 45788 (0.0038) [2024-06-06 16:34:47,318][24114] Fps is (10 sec: 47512.8, 60 sec: 45329.1, 300 sec: 19771.9). Total num frames: 750321664. Throughput: 0: 44972.9. Samples: 231510720. Policy #0 lag: (min: 0.0, avg: 10.4, max: 22.0) [2024-06-06 16:34:47,319][24114] Avg episode reward: [(0, '0.288')] [2024-06-06 16:34:48,016][24347] Updated weights for policy 0, policy_version 45798 (0.0035) [2024-06-06 16:34:52,127][24347] Updated weights for policy 0, policy_version 45808 (0.0033) [2024-06-06 16:34:52,318][24114] Fps is (10 sec: 45902.7, 60 sec: 45330.5, 300 sec: 19716.3). Total num frames: 750518272. Throughput: 0: 45187.6. Samples: 231781880. Policy #0 lag: (min: 0.0, avg: 10.4, max: 22.0) [2024-06-06 16:34:52,318][24114] Avg episode reward: [(0, '0.284')] [2024-06-06 16:34:55,606][24347] Updated weights for policy 0, policy_version 45818 (0.0027) [2024-06-06 16:34:57,318][24114] Fps is (10 sec: 40960.0, 60 sec: 44509.8, 300 sec: 19716.6). 
Total num frames: 750731264. Throughput: 0: 45166.2. Samples: 232051240. Policy #0 lag: (min: 0.0, avg: 10.4, max: 22.0) [2024-06-06 16:34:57,319][24114] Avg episode reward: [(0, '0.278')] [2024-06-06 16:34:59,247][24347] Updated weights for policy 0, policy_version 45828 (0.0031) [2024-06-06 16:35:02,318][24114] Fps is (10 sec: 45875.2, 60 sec: 45056.1, 300 sec: 19660.8). Total num frames: 750977024. Throughput: 0: 44999.5. Samples: 232184640. Policy #0 lag: (min: 0.0, avg: 10.4, max: 22.0) [2024-06-06 16:35:02,318][24114] Avg episode reward: [(0, '0.279')] [2024-06-06 16:35:03,010][24347] Updated weights for policy 0, policy_version 45838 (0.0040) [2024-06-06 16:35:04,810][24326] Signal inference workers to stop experience collection... (3400 times) [2024-06-06 16:35:04,811][24326] Signal inference workers to resume experience collection... (3400 times) [2024-06-06 16:35:04,828][24347] InferenceWorker_p0-w0: stopping experience collection (3400 times) [2024-06-06 16:35:04,828][24347] InferenceWorker_p0-w0: resuming experience collection (3400 times) [2024-06-06 16:35:06,606][24347] Updated weights for policy 0, policy_version 45848 (0.0035) [2024-06-06 16:35:07,318][24114] Fps is (10 sec: 47513.6, 60 sec: 45602.1, 300 sec: 19771.9). Total num frames: 751206400. Throughput: 0: 45012.5. Samples: 232454360. Policy #0 lag: (min: 0.0, avg: 10.4, max: 22.0) [2024-06-06 16:35:07,319][24114] Avg episode reward: [(0, '0.290')] [2024-06-06 16:35:10,061][24347] Updated weights for policy 0, policy_version 45858 (0.0031) [2024-06-06 16:35:12,318][24114] Fps is (10 sec: 44236.1, 60 sec: 44782.9, 300 sec: 19716.3). Total num frames: 751419392. Throughput: 0: 45204.9. Samples: 232732340. 
Policy #0 lag: (min: 1.0, avg: 9.9, max: 22.0) [2024-06-06 16:35:12,319][24114] Avg episode reward: [(0, '0.281')] [2024-06-06 16:35:13,921][24347] Updated weights for policy 0, policy_version 45868 (0.0027) [2024-06-06 16:35:17,126][24347] Updated weights for policy 0, policy_version 45878 (0.0034) [2024-06-06 16:35:17,318][24114] Fps is (10 sec: 45875.3, 60 sec: 45328.9, 300 sec: 19771.9). Total num frames: 751665152. Throughput: 0: 45210.1. Samples: 232865200. Policy #0 lag: (min: 1.0, avg: 9.9, max: 22.0) [2024-06-06 16:35:17,319][24114] Avg episode reward: [(0, '0.277')] [2024-06-06 16:35:21,104][24347] Updated weights for policy 0, policy_version 45888 (0.0035) [2024-06-06 16:35:22,318][24114] Fps is (10 sec: 47514.2, 60 sec: 45329.2, 300 sec: 19883.0). Total num frames: 751894528. Throughput: 0: 45172.8. Samples: 233138420. Policy #0 lag: (min: 1.0, avg: 9.9, max: 22.0) [2024-06-06 16:35:22,318][24114] Avg episode reward: [(0, '0.286')] [2024-06-06 16:35:24,369][24347] Updated weights for policy 0, policy_version 45898 (0.0035) [2024-06-06 16:35:27,318][24114] Fps is (10 sec: 42599.0, 60 sec: 44783.1, 300 sec: 19771.9). Total num frames: 752091136. Throughput: 0: 45492.3. Samples: 233418860. Policy #0 lag: (min: 1.0, avg: 9.9, max: 22.0) [2024-06-06 16:35:27,318][24114] Avg episode reward: [(0, '0.275')] [2024-06-06 16:35:28,207][24347] Updated weights for policy 0, policy_version 45908 (0.0021) [2024-06-06 16:35:31,850][24347] Updated weights for policy 0, policy_version 45918 (0.0035) [2024-06-06 16:35:32,318][24114] Fps is (10 sec: 42598.5, 60 sec: 45056.0, 300 sec: 19771.9). Total num frames: 752320512. Throughput: 0: 45177.9. Samples: 233543720. Policy #0 lag: (min: 1.0, avg: 9.9, max: 22.0) [2024-06-06 16:35:32,318][24114] Avg episode reward: [(0, '0.284')] [2024-06-06 16:35:35,551][24347] Updated weights for policy 0, policy_version 45928 (0.0037) [2024-06-06 16:35:37,319][24114] Fps is (10 sec: 47506.5, 60 sec: 45327.9, 300 sec: 19771.8). 
Total num frames: 752566272. Throughput: 0: 45274.1. Samples: 233819280. Policy #0 lag: (min: 1.0, avg: 9.9, max: 22.0) [2024-06-06 16:35:37,320][24114] Avg episode reward: [(0, '0.281')] [2024-06-06 16:35:39,191][24347] Updated weights for policy 0, policy_version 45938 (0.0034) [2024-06-06 16:35:42,318][24114] Fps is (10 sec: 47513.3, 60 sec: 45606.6, 300 sec: 19827.8). Total num frames: 752795648. Throughput: 0: 45313.8. Samples: 234090360. Policy #0 lag: (min: 0.0, avg: 8.5, max: 21.0) [2024-06-06 16:35:42,318][24114] Avg episode reward: [(0, '0.285')] [2024-06-06 16:35:42,869][24347] Updated weights for policy 0, policy_version 45948 (0.0026) [2024-06-06 16:35:46,428][24347] Updated weights for policy 0, policy_version 45958 (0.0034) [2024-06-06 16:35:47,318][24114] Fps is (10 sec: 42604.3, 60 sec: 44509.9, 300 sec: 19827.4). Total num frames: 752992256. Throughput: 0: 45371.0. Samples: 234226340. Policy #0 lag: (min: 0.0, avg: 8.5, max: 21.0) [2024-06-06 16:35:47,319][24114] Avg episode reward: [(0, '0.280')] [2024-06-06 16:35:49,779][24347] Updated weights for policy 0, policy_version 45968 (0.0032) [2024-06-06 16:35:52,318][24114] Fps is (10 sec: 44237.0, 60 sec: 45329.0, 300 sec: 19827.4). Total num frames: 753238016. Throughput: 0: 45274.8. Samples: 234491720. Policy #0 lag: (min: 0.0, avg: 8.5, max: 21.0) [2024-06-06 16:35:52,318][24114] Avg episode reward: [(0, '0.289')] [2024-06-06 16:35:53,623][24347] Updated weights for policy 0, policy_version 45978 (0.0032) [2024-06-06 16:35:57,092][24347] Updated weights for policy 0, policy_version 45988 (0.0036) [2024-06-06 16:35:57,318][24114] Fps is (10 sec: 47514.0, 60 sec: 45602.2, 300 sec: 45352.8). Total num frames: 753467392. Throughput: 0: 45163.3. Samples: 234764680. 
Policy #0 lag: (min: 0.0, avg: 8.5, max: 21.0) [2024-06-06 16:35:57,318][24114] Avg episode reward: [(0, '0.284')] [2024-06-06 16:36:00,909][24347] Updated weights for policy 0, policy_version 45998 (0.0028) [2024-06-06 16:36:02,318][24114] Fps is (10 sec: 44237.0, 60 sec: 45056.0, 300 sec: 45254.8). Total num frames: 753680384. Throughput: 0: 45221.0. Samples: 234900140. Policy #0 lag: (min: 0.0, avg: 8.5, max: 21.0) [2024-06-06 16:36:02,318][24114] Avg episode reward: [(0, '0.281')] [2024-06-06 16:36:04,189][24347] Updated weights for policy 0, policy_version 46008 (0.0035) [2024-06-06 16:36:07,318][24114] Fps is (10 sec: 44236.9, 60 sec: 45056.1, 300 sec: 45279.0). Total num frames: 753909760. Throughput: 0: 45067.2. Samples: 235166440. Policy #0 lag: (min: 0.0, avg: 8.5, max: 21.0) [2024-06-06 16:36:07,318][24114] Avg episode reward: [(0, '0.282')] [2024-06-06 16:36:08,416][24347] Updated weights for policy 0, policy_version 46018 (0.0029) [2024-06-06 16:36:11,606][24347] Updated weights for policy 0, policy_version 46028 (0.0042) [2024-06-06 16:36:12,318][24114] Fps is (10 sec: 49151.6, 60 sec: 45875.3, 300 sec: 45523.8). Total num frames: 754171904. Throughput: 0: 45053.7. Samples: 235446280. Policy #0 lag: (min: 1.0, avg: 8.6, max: 21.0) [2024-06-06 16:36:12,319][24114] Avg episode reward: [(0, '0.286')] [2024-06-06 16:36:15,514][24347] Updated weights for policy 0, policy_version 46038 (0.0034) [2024-06-06 16:36:17,318][24114] Fps is (10 sec: 44236.1, 60 sec: 44782.9, 300 sec: 45215.4). Total num frames: 754352128. Throughput: 0: 45349.2. Samples: 235584440. Policy #0 lag: (min: 1.0, avg: 8.6, max: 21.0) [2024-06-06 16:36:17,319][24114] Avg episode reward: [(0, '0.288')] [2024-06-06 16:36:18,838][24347] Updated weights for policy 0, policy_version 46048 (0.0028) [2024-06-06 16:36:22,318][24114] Fps is (10 sec: 40960.4, 60 sec: 44783.0, 300 sec: 45237.7). Total num frames: 754581504. Throughput: 0: 45251.7. Samples: 235855540. 
Policy #0 lag: (min: 1.0, avg: 8.6, max: 21.0) [2024-06-06 16:36:22,318][24114] Avg episode reward: [(0, '0.294')] [2024-06-06 16:36:22,431][24326] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000046057_754597888.pth... [2024-06-06 16:36:22,486][24326] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000045623_747487232.pth [2024-06-06 16:36:22,647][24347] Updated weights for policy 0, policy_version 46058 (0.0035) [2024-06-06 16:36:25,091][24326] Signal inference workers to stop experience collection... (3450 times) [2024-06-06 16:36:25,139][24347] InferenceWorker_p0-w0: stopping experience collection (3450 times) [2024-06-06 16:36:25,198][24326] Signal inference workers to resume experience collection... (3450 times) [2024-06-06 16:36:25,198][24347] InferenceWorker_p0-w0: resuming experience collection (3450 times) [2024-06-06 16:36:26,055][24347] Updated weights for policy 0, policy_version 46068 (0.0035) [2024-06-06 16:36:27,320][24114] Fps is (10 sec: 49142.8, 60 sec: 45873.7, 300 sec: 45460.1). Total num frames: 754843648. Throughput: 0: 45178.1. Samples: 236123460. Policy #0 lag: (min: 1.0, avg: 8.6, max: 21.0) [2024-06-06 16:36:27,321][24114] Avg episode reward: [(0, '0.284')] [2024-06-06 16:36:29,934][24347] Updated weights for policy 0, policy_version 46078 (0.0038) [2024-06-06 16:36:32,318][24114] Fps is (10 sec: 47513.6, 60 sec: 45602.1, 300 sec: 45376.0). Total num frames: 755056640. Throughput: 0: 45351.6. Samples: 236267160. Policy #0 lag: (min: 1.0, avg: 8.6, max: 21.0) [2024-06-06 16:36:32,318][24114] Avg episode reward: [(0, '0.288')] [2024-06-06 16:36:33,322][24347] Updated weights for policy 0, policy_version 46088 (0.0041) [2024-06-06 16:36:37,318][24114] Fps is (10 sec: 40967.6, 60 sec: 44783.9, 300 sec: 45201.1). Total num frames: 755253248. Throughput: 0: 45443.0. Samples: 236536660. 
Policy #0 lag: (min: 1.0, avg: 8.6, max: 21.0) [2024-06-06 16:36:37,319][24114] Avg episode reward: [(0, '0.282')] [2024-06-06 16:36:37,569][24347] Updated weights for policy 0, policy_version 46098 (0.0039) [2024-06-06 16:36:40,366][24347] Updated weights for policy 0, policy_version 46108 (0.0021) [2024-06-06 16:36:42,318][24114] Fps is (10 sec: 45875.2, 60 sec: 45329.1, 300 sec: 45406.1). Total num frames: 755515392. Throughput: 0: 45369.8. Samples: 236806320. Policy #0 lag: (min: 0.0, avg: 9.4, max: 22.0) [2024-06-06 16:36:42,318][24114] Avg episode reward: [(0, '0.296')] [2024-06-06 16:36:44,690][24347] Updated weights for policy 0, policy_version 46118 (0.0030) [2024-06-06 16:36:47,318][24114] Fps is (10 sec: 47514.4, 60 sec: 45602.2, 300 sec: 45330.1). Total num frames: 755728384. Throughput: 0: 45525.8. Samples: 236948800. Policy #0 lag: (min: 0.0, avg: 9.4, max: 22.0) [2024-06-06 16:36:47,318][24114] Avg episode reward: [(0, '0.282')] [2024-06-06 16:36:47,556][24347] Updated weights for policy 0, policy_version 46128 (0.0033) [2024-06-06 16:36:51,672][24347] Updated weights for policy 0, policy_version 46138 (0.0025) [2024-06-06 16:36:52,318][24114] Fps is (10 sec: 44236.9, 60 sec: 45329.1, 300 sec: 45348.4). Total num frames: 755957760. Throughput: 0: 45660.4. Samples: 237221160. Policy #0 lag: (min: 0.0, avg: 9.4, max: 22.0) [2024-06-06 16:36:52,318][24114] Avg episode reward: [(0, '0.291')] [2024-06-06 16:36:54,994][24347] Updated weights for policy 0, policy_version 46148 (0.0039) [2024-06-06 16:36:57,318][24114] Fps is (10 sec: 45875.1, 60 sec: 45329.1, 300 sec: 45363.1). Total num frames: 756187136. Throughput: 0: 45265.0. Samples: 237483200. 
Policy #0 lag: (min: 0.0, avg: 9.4, max: 22.0) [2024-06-06 16:36:57,318][24114] Avg episode reward: [(0, '0.287')] [2024-06-06 16:36:59,106][24347] Updated weights for policy 0, policy_version 46158 (0.0026) [2024-06-06 16:37:02,217][24347] Updated weights for policy 0, policy_version 46168 (0.0033) [2024-06-06 16:37:02,318][24114] Fps is (10 sec: 45875.4, 60 sec: 45602.2, 300 sec: 45377.1). Total num frames: 756416512. Throughput: 0: 45404.2. Samples: 237627620. Policy #0 lag: (min: 0.0, avg: 9.4, max: 22.0) [2024-06-06 16:37:02,318][24114] Avg episode reward: [(0, '0.280')] [2024-06-06 16:37:06,426][24347] Updated weights for policy 0, policy_version 46178 (0.0037) [2024-06-06 16:37:07,318][24114] Fps is (10 sec: 44236.5, 60 sec: 45329.0, 300 sec: 45309.3). Total num frames: 756629504. Throughput: 0: 45452.4. Samples: 237900900. Policy #0 lag: (min: 0.0, avg: 9.4, max: 22.0) [2024-06-06 16:37:07,318][24114] Avg episode reward: [(0, '0.281')] [2024-06-06 16:37:09,294][24347] Updated weights for policy 0, policy_version 46188 (0.0031) [2024-06-06 16:37:12,318][24114] Fps is (10 sec: 45874.4, 60 sec: 45056.0, 300 sec: 45402.8). Total num frames: 756875264. Throughput: 0: 45517.5. Samples: 238171660. Policy #0 lag: (min: 1.0, avg: 10.1, max: 22.0) [2024-06-06 16:37:12,318][24114] Avg episode reward: [(0, '0.285')] [2024-06-06 16:37:13,371][24347] Updated weights for policy 0, policy_version 46198 (0.0031) [2024-06-06 16:37:16,592][24347] Updated weights for policy 0, policy_version 46208 (0.0043) [2024-06-06 16:37:17,318][24114] Fps is (10 sec: 49152.2, 60 sec: 46148.3, 300 sec: 45495.7). Total num frames: 757121024. Throughput: 0: 45339.5. Samples: 238307440. Policy #0 lag: (min: 1.0, avg: 10.1, max: 22.0) [2024-06-06 16:37:17,318][24114] Avg episode reward: [(0, '0.285')] [2024-06-06 16:37:20,346][24347] Updated weights for policy 0, policy_version 46218 (0.0029) [2024-06-06 16:37:22,318][24114] Fps is (10 sec: 44236.6, 60 sec: 45602.0, 300 sec: 45280.8). 
Total num frames: 757317632. Throughput: 0: 45428.9. Samples: 238580960. Policy #0 lag: (min: 1.0, avg: 10.1, max: 22.0) [2024-06-06 16:37:22,319][24114] Avg episode reward: [(0, '0.293')] [2024-06-06 16:37:23,735][24326] Signal inference workers to stop experience collection... (3500 times) [2024-06-06 16:37:23,762][24347] InferenceWorker_p0-w0: stopping experience collection (3500 times) [2024-06-06 16:37:23,791][24326] Signal inference workers to resume experience collection... (3500 times) [2024-06-06 16:37:23,792][24347] InferenceWorker_p0-w0: resuming experience collection (3500 times) [2024-06-06 16:37:23,949][24347] Updated weights for policy 0, policy_version 46228 (0.0035) [2024-06-06 16:37:27,318][24114] Fps is (10 sec: 40959.4, 60 sec: 44784.3, 300 sec: 45220.6). Total num frames: 757530624. Throughput: 0: 45391.8. Samples: 238848960. Policy #0 lag: (min: 1.0, avg: 10.1, max: 22.0) [2024-06-06 16:37:27,319][24114] Avg episode reward: [(0, '0.284')] [2024-06-06 16:37:27,908][24347] Updated weights for policy 0, policy_version 46238 (0.0034) [2024-06-06 16:37:31,172][24347] Updated weights for policy 0, policy_version 46248 (0.0031) [2024-06-06 16:37:32,318][24114] Fps is (10 sec: 49152.0, 60 sec: 45875.1, 300 sec: 45451.6). Total num frames: 757809152. Throughput: 0: 45187.4. Samples: 238982240. Policy #0 lag: (min: 1.0, avg: 10.1, max: 22.0) [2024-06-06 16:37:32,319][24114] Avg episode reward: [(0, '0.288')] [2024-06-06 16:37:35,233][24347] Updated weights for policy 0, policy_version 46258 (0.0031) [2024-06-06 16:37:37,320][24114] Fps is (10 sec: 45868.1, 60 sec: 45600.9, 300 sec: 45248.7). Total num frames: 757989376. Throughput: 0: 45328.9. Samples: 239261040. 
Policy #0 lag: (min: 0.0, avg: 9.6, max: 20.0) [2024-06-06 16:37:37,320][24114] Avg episode reward: [(0, '0.284')] [2024-06-06 16:37:38,190][24347] Updated weights for policy 0, policy_version 46268 (0.0027) [2024-06-06 16:37:42,294][24347] Updated weights for policy 0, policy_version 46278 (0.0028) [2024-06-06 16:37:42,318][24114] Fps is (10 sec: 40960.3, 60 sec: 45055.9, 300 sec: 45262.4). Total num frames: 758218752. Throughput: 0: 45475.5. Samples: 239529600. Policy #0 lag: (min: 0.0, avg: 9.6, max: 20.0) [2024-06-06 16:37:42,319][24114] Avg episode reward: [(0, '0.279')] [2024-06-06 16:37:45,580][24347] Updated weights for policy 0, policy_version 46288 (0.0037) [2024-06-06 16:37:47,318][24114] Fps is (10 sec: 47521.4, 60 sec: 45602.0, 300 sec: 45342.8). Total num frames: 758464512. Throughput: 0: 45398.5. Samples: 239670560. Policy #0 lag: (min: 0.0, avg: 9.6, max: 20.0) [2024-06-06 16:37:47,319][24114] Avg episode reward: [(0, '0.292')] [2024-06-06 16:37:49,353][24347] Updated weights for policy 0, policy_version 46298 (0.0028) [2024-06-06 16:37:52,318][24114] Fps is (10 sec: 42597.9, 60 sec: 44782.8, 300 sec: 45154.8). Total num frames: 758644736. Throughput: 0: 45221.2. Samples: 239935860. Policy #0 lag: (min: 0.0, avg: 9.6, max: 20.0) [2024-06-06 16:37:52,321][24114] Avg episode reward: [(0, '0.285')] [2024-06-06 16:37:52,943][24347] Updated weights for policy 0, policy_version 46308 (0.0027) [2024-06-06 16:37:56,698][24347] Updated weights for policy 0, policy_version 46318 (0.0030) [2024-06-06 16:37:57,318][24114] Fps is (10 sec: 42598.6, 60 sec: 45056.0, 300 sec: 45234.8). Total num frames: 758890496. Throughput: 0: 45234.3. Samples: 240207200. Policy #0 lag: (min: 0.0, avg: 9.6, max: 20.0) [2024-06-06 16:37:57,318][24114] Avg episode reward: [(0, '0.286')] [2024-06-06 16:37:59,998][24347] Updated weights for policy 0, policy_version 46328 (0.0029) [2024-06-06 16:38:02,318][24114] Fps is (10 sec: 47514.4, 60 sec: 45055.9, 300 sec: 45247.4). 
Total num frames: 759119872. Throughput: 0: 45128.0. Samples: 240338200. Policy #0 lag: (min: 0.0, avg: 9.6, max: 20.0) [2024-06-06 16:38:02,318][24114] Avg episode reward: [(0, '0.285')] [2024-06-06 16:38:04,168][24347] Updated weights for policy 0, policy_version 46338 (0.0027) [2024-06-06 16:38:07,219][24347] Updated weights for policy 0, policy_version 46348 (0.0025) [2024-06-06 16:38:07,318][24114] Fps is (10 sec: 47513.6, 60 sec: 45602.2, 300 sec: 45322.0). Total num frames: 759365632. Throughput: 0: 45136.6. Samples: 240612100. Policy #0 lag: (min: 0.0, avg: 9.2, max: 22.0) [2024-06-06 16:38:07,318][24114] Avg episode reward: [(0, '0.287')] [2024-06-06 16:38:11,243][24347] Updated weights for policy 0, policy_version 46358 (0.0035) [2024-06-06 16:38:12,319][24114] Fps is (10 sec: 47506.5, 60 sec: 45328.0, 300 sec: 45332.2). Total num frames: 759595008. Throughput: 0: 45222.6. Samples: 240884040. Policy #0 lag: (min: 0.0, avg: 9.2, max: 22.0) [2024-06-06 16:38:12,320][24114] Avg episode reward: [(0, '0.288')] [2024-06-06 16:38:14,663][24347] Updated weights for policy 0, policy_version 46368 (0.0034) [2024-06-06 16:38:17,318][24114] Fps is (10 sec: 44236.6, 60 sec: 44782.9, 300 sec: 45282.3). Total num frames: 759808000. Throughput: 0: 45303.6. Samples: 241020900. Policy #0 lag: (min: 0.0, avg: 9.2, max: 22.0) [2024-06-06 16:38:17,318][24114] Avg episode reward: [(0, '0.290')] [2024-06-06 16:38:18,156][24347] Updated weights for policy 0, policy_version 46378 (0.0036) [2024-06-06 16:38:21,853][24326] Signal inference workers to stop experience collection... (3550 times) [2024-06-06 16:38:21,902][24347] InferenceWorker_p0-w0: stopping experience collection (3550 times) [2024-06-06 16:38:21,960][24326] Signal inference workers to resume experience collection... 
(3550 times) [2024-06-06 16:38:21,960][24347] InferenceWorker_p0-w0: resuming experience collection (3550 times) [2024-06-06 16:38:21,962][24347] Updated weights for policy 0, policy_version 46388 (0.0031) [2024-06-06 16:38:22,318][24114] Fps is (10 sec: 45882.5, 60 sec: 45602.3, 300 sec: 45352.2). Total num frames: 760053760. Throughput: 0: 45234.7. Samples: 241296520. Policy #0 lag: (min: 0.0, avg: 9.2, max: 22.0) [2024-06-06 16:38:22,318][24114] Avg episode reward: [(0, '0.290')] [2024-06-06 16:38:22,457][24326] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000046391_760070144.pth... [2024-06-06 16:38:22,519][24326] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000045726_749174784.pth [2024-06-06 16:38:25,700][24347] Updated weights for policy 0, policy_version 46398 (0.0028) [2024-06-06 16:38:27,318][24114] Fps is (10 sec: 44236.6, 60 sec: 45329.1, 300 sec: 45245.4). Total num frames: 760250368. Throughput: 0: 45072.4. Samples: 241557860. Policy #0 lag: (min: 0.0, avg: 9.2, max: 22.0) [2024-06-06 16:38:27,319][24114] Avg episode reward: [(0, '0.286')] [2024-06-06 16:38:29,217][24347] Updated weights for policy 0, policy_version 46408 (0.0037) [2024-06-06 16:38:32,318][24114] Fps is (10 sec: 42597.8, 60 sec: 44509.9, 300 sec: 45256.5). Total num frames: 760479744. Throughput: 0: 44793.3. Samples: 241686260. Policy #0 lag: (min: 0.0, avg: 9.2, max: 22.0) [2024-06-06 16:38:32,319][24114] Avg episode reward: [(0, '0.291')] [2024-06-06 16:38:32,969][24347] Updated weights for policy 0, policy_version 46418 (0.0034) [2024-06-06 16:38:36,302][24347] Updated weights for policy 0, policy_version 46428 (0.0035) [2024-06-06 16:38:37,318][24114] Fps is (10 sec: 47513.6, 60 sec: 45603.3, 300 sec: 45323.3). Total num frames: 760725504. Throughput: 0: 45271.2. Samples: 241973060. 
Policy #0 lag: (min: 0.0, avg: 8.8, max: 21.0) [2024-06-06 16:38:37,319][24114] Avg episode reward: [(0, '0.289')] [2024-06-06 16:38:39,903][24347] Updated weights for policy 0, policy_version 46438 (0.0038) [2024-06-06 16:38:42,318][24114] Fps is (10 sec: 44236.7, 60 sec: 45056.0, 300 sec: 45153.2). Total num frames: 760922112. Throughput: 0: 45133.7. Samples: 242238220. Policy #0 lag: (min: 0.0, avg: 8.8, max: 21.0) [2024-06-06 16:38:42,319][24114] Avg episode reward: [(0, '0.286')] [2024-06-06 16:38:43,880][24347] Updated weights for policy 0, policy_version 46448 (0.0036) [2024-06-06 16:38:47,286][24347] Updated weights for policy 0, policy_version 46458 (0.0026) [2024-06-06 16:38:47,324][24114] Fps is (10 sec: 44210.9, 60 sec: 45051.6, 300 sec: 45319.2). Total num frames: 761167872. Throughput: 0: 45131.8. Samples: 242369400. Policy #0 lag: (min: 0.0, avg: 8.8, max: 21.0) [2024-06-06 16:38:47,325][24114] Avg episode reward: [(0, '0.289')] [2024-06-06 16:38:51,140][24347] Updated weights for policy 0, policy_version 46468 (0.0025) [2024-06-06 16:38:52,318][24114] Fps is (10 sec: 49151.5, 60 sec: 46148.3, 300 sec: 45264.2). Total num frames: 761413632. Throughput: 0: 45306.9. Samples: 242650920. Policy #0 lag: (min: 0.0, avg: 8.8, max: 21.0) [2024-06-06 16:38:52,319][24114] Avg episode reward: [(0, '0.290')] [2024-06-06 16:38:54,542][24347] Updated weights for policy 0, policy_version 46478 (0.0027) [2024-06-06 16:38:57,318][24114] Fps is (10 sec: 40984.5, 60 sec: 44782.9, 300 sec: 45097.7). Total num frames: 761577472. Throughput: 0: 45164.2. Samples: 242916360. Policy #0 lag: (min: 0.0, avg: 8.8, max: 21.0) [2024-06-06 16:38:57,318][24114] Avg episode reward: [(0, '0.288')] [2024-06-06 16:38:58,123][24347] Updated weights for policy 0, policy_version 46488 (0.0031) [2024-06-06 16:39:02,142][24347] Updated weights for policy 0, policy_version 46498 (0.0033) [2024-06-06 16:39:02,318][24114] Fps is (10 sec: 42598.7, 60 sec: 45329.0, 300 sec: 45319.8). 
Total num frames: 761839616. Throughput: 0: 44946.6. Samples: 243043500. Policy #0 lag: (min: 0.0, avg: 8.8, max: 21.0) [2024-06-06 16:39:02,319][24114] Avg episode reward: [(0, '0.290')] [2024-06-06 16:39:05,438][24347] Updated weights for policy 0, policy_version 46508 (0.0033) [2024-06-06 16:39:07,318][24114] Fps is (10 sec: 49151.7, 60 sec: 45056.0, 300 sec: 45208.7). Total num frames: 762068992. Throughput: 0: 44963.0. Samples: 243319860. Policy #0 lag: (min: 0.0, avg: 8.3, max: 21.0) [2024-06-06 16:39:07,319][24114] Avg episode reward: [(0, '0.286')] [2024-06-06 16:39:09,150][24347] Updated weights for policy 0, policy_version 46518 (0.0030) [2024-06-06 16:39:12,318][24114] Fps is (10 sec: 42598.9, 60 sec: 44511.0, 300 sec: 45153.2). Total num frames: 762265600. Throughput: 0: 45378.8. Samples: 243599900. Policy #0 lag: (min: 0.0, avg: 8.3, max: 21.0) [2024-06-06 16:39:12,318][24114] Avg episode reward: [(0, '0.293')] [2024-06-06 16:39:12,813][24347] Updated weights for policy 0, policy_version 46528 (0.0034) [2024-06-06 16:39:12,974][24326] Signal inference workers to stop experience collection... (3600 times) [2024-06-06 16:39:13,017][24347] InferenceWorker_p0-w0: stopping experience collection (3600 times) [2024-06-06 16:39:13,027][24326] Signal inference workers to resume experience collection... (3600 times) [2024-06-06 16:39:13,031][24347] InferenceWorker_p0-w0: resuming experience collection (3600 times) [2024-06-06 16:39:16,410][24347] Updated weights for policy 0, policy_version 46538 (0.0024) [2024-06-06 16:39:17,318][24114] Fps is (10 sec: 45875.5, 60 sec: 45329.1, 300 sec: 45264.3). Total num frames: 762527744. Throughput: 0: 45370.7. Samples: 243727940. 
Policy #0 lag: (min: 0.0, avg: 8.3, max: 21.0) [2024-06-06 16:39:17,318][24114] Avg episode reward: [(0, '0.283')] [2024-06-06 16:39:20,055][24347] Updated weights for policy 0, policy_version 46548 (0.0031) [2024-06-06 16:39:22,318][24114] Fps is (10 sec: 47512.5, 60 sec: 44782.7, 300 sec: 45208.7). Total num frames: 762740736. Throughput: 0: 44797.7. Samples: 243988960. Policy #0 lag: (min: 0.0, avg: 8.3, max: 21.0) [2024-06-06 16:39:22,319][24114] Avg episode reward: [(0, '0.297')] [2024-06-06 16:39:23,635][24347] Updated weights for policy 0, policy_version 46558 (0.0028) [2024-06-06 16:39:27,318][24114] Fps is (10 sec: 42598.5, 60 sec: 45056.1, 300 sec: 45208.7). Total num frames: 762953728. Throughput: 0: 44899.2. Samples: 244258680. Policy #0 lag: (min: 0.0, avg: 8.3, max: 21.0) [2024-06-06 16:39:27,318][24114] Avg episode reward: [(0, '0.287')] [2024-06-06 16:39:27,358][24347] Updated weights for policy 0, policy_version 46568 (0.0026) [2024-06-06 16:39:31,004][24347] Updated weights for policy 0, policy_version 46578 (0.0045) [2024-06-06 16:39:32,318][24114] Fps is (10 sec: 44237.3, 60 sec: 45056.0, 300 sec: 45208.7). Total num frames: 763183104. Throughput: 0: 44971.2. Samples: 244392840. Policy #0 lag: (min: 0.0, avg: 8.3, max: 21.0) [2024-06-06 16:39:32,318][24114] Avg episode reward: [(0, '0.280')] [2024-06-06 16:39:34,643][24347] Updated weights for policy 0, policy_version 46588 (0.0026) [2024-06-06 16:39:37,318][24114] Fps is (10 sec: 44236.5, 60 sec: 44509.9, 300 sec: 45209.6). Total num frames: 763396096. Throughput: 0: 44707.7. Samples: 244662760. Policy #0 lag: (min: 0.0, avg: 9.4, max: 22.0) [2024-06-06 16:39:37,318][24114] Avg episode reward: [(0, '0.297')] [2024-06-06 16:39:38,020][24347] Updated weights for policy 0, policy_version 46598 (0.0032) [2024-06-06 16:39:42,210][24347] Updated weights for policy 0, policy_version 46608 (0.0031) [2024-06-06 16:39:42,318][24114] Fps is (10 sec: 44237.3, 60 sec: 45056.1, 300 sec: 45097.7). 
Total num frames: 763625472. Throughput: 0: 44934.7. Samples: 244938420. Policy #0 lag: (min: 0.0, avg: 9.4, max: 22.0) [2024-06-06 16:39:42,318][24114] Avg episode reward: [(0, '0.277')] [2024-06-06 16:39:45,304][24347] Updated weights for policy 0, policy_version 46618 (0.0036) [2024-06-06 16:39:47,318][24114] Fps is (10 sec: 45875.2, 60 sec: 44787.4, 300 sec: 45208.7). Total num frames: 763854848. Throughput: 0: 45073.0. Samples: 245071780. Policy #0 lag: (min: 0.0, avg: 9.4, max: 22.0) [2024-06-06 16:39:47,319][24114] Avg episode reward: [(0, '0.289')] [2024-06-06 16:39:49,409][24347] Updated weights for policy 0, policy_version 46628 (0.0032) [2024-06-06 16:39:52,318][24114] Fps is (10 sec: 45874.6, 60 sec: 44509.9, 300 sec: 45264.3). Total num frames: 764084224. Throughput: 0: 44702.2. Samples: 245331460. Policy #0 lag: (min: 0.0, avg: 9.4, max: 22.0) [2024-06-06 16:39:52,319][24114] Avg episode reward: [(0, '0.288')] [2024-06-06 16:39:52,724][24347] Updated weights for policy 0, policy_version 46638 (0.0032) [2024-06-06 16:39:56,841][24347] Updated weights for policy 0, policy_version 46648 (0.0048) [2024-06-06 16:39:57,318][24114] Fps is (10 sec: 44237.1, 60 sec: 45329.1, 300 sec: 45153.2). Total num frames: 764297216. Throughput: 0: 44603.1. Samples: 245607040. Policy #0 lag: (min: 0.0, avg: 9.4, max: 22.0) [2024-06-06 16:39:57,318][24114] Avg episode reward: [(0, '0.280')] [2024-06-06 16:40:00,146][24347] Updated weights for policy 0, policy_version 46658 (0.0042) [2024-06-06 16:40:02,318][24114] Fps is (10 sec: 42598.9, 60 sec: 44510.0, 300 sec: 45097.7). Total num frames: 764510208. Throughput: 0: 44617.3. Samples: 245735720. 
Policy #0 lag: (min: 0.0, avg: 9.4, max: 22.0) [2024-06-06 16:40:02,318][24114] Avg episode reward: [(0, '0.292')] [2024-06-06 16:40:04,076][24347] Updated weights for policy 0, policy_version 46668 (0.0032) [2024-06-06 16:40:07,257][24347] Updated weights for policy 0, policy_version 46678 (0.0031) [2024-06-06 16:40:07,318][24114] Fps is (10 sec: 47513.2, 60 sec: 45056.0, 300 sec: 45264.3). Total num frames: 764772352. Throughput: 0: 44850.8. Samples: 246007240. Policy #0 lag: (min: 0.0, avg: 9.9, max: 22.0) [2024-06-06 16:40:07,319][24114] Avg episode reward: [(0, '0.291')] [2024-06-06 16:40:11,330][24347] Updated weights for policy 0, policy_version 46688 (0.0021) [2024-06-06 16:40:12,318][24114] Fps is (10 sec: 49152.0, 60 sec: 45602.1, 300 sec: 45208.7). Total num frames: 765001728. Throughput: 0: 45006.2. Samples: 246283960. Policy #0 lag: (min: 0.0, avg: 9.9, max: 22.0) [2024-06-06 16:40:12,318][24114] Avg episode reward: [(0, '0.293')] [2024-06-06 16:40:14,362][24347] Updated weights for policy 0, policy_version 46698 (0.0030) [2024-06-06 16:40:17,318][24114] Fps is (10 sec: 40960.0, 60 sec: 44236.8, 300 sec: 45042.1). Total num frames: 765181952. Throughput: 0: 44906.3. Samples: 246413620. Policy #0 lag: (min: 0.0, avg: 9.9, max: 22.0) [2024-06-06 16:40:17,319][24114] Avg episode reward: [(0, '0.292')] [2024-06-06 16:40:18,605][24347] Updated weights for policy 0, policy_version 46708 (0.0033) [2024-06-06 16:40:21,179][24326] Signal inference workers to stop experience collection... (3650 times) [2024-06-06 16:40:21,210][24347] InferenceWorker_p0-w0: stopping experience collection (3650 times) [2024-06-06 16:40:21,248][24326] Signal inference workers to resume experience collection... 
(3650 times) [2024-06-06 16:40:21,248][24347] InferenceWorker_p0-w0: resuming experience collection (3650 times) [2024-06-06 16:40:21,871][24347] Updated weights for policy 0, policy_version 46718 (0.0021) [2024-06-06 16:40:22,320][24114] Fps is (10 sec: 42589.9, 60 sec: 44781.6, 300 sec: 45208.4). Total num frames: 765427712. Throughput: 0: 44847.4. Samples: 246680980. Policy #0 lag: (min: 0.0, avg: 9.9, max: 22.0) [2024-06-06 16:40:22,320][24114] Avg episode reward: [(0, '0.296')] [2024-06-06 16:40:22,339][24326] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000046718_765427712.pth... [2024-06-06 16:40:22,405][24326] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000046057_754597888.pth [2024-06-06 16:40:26,101][24347] Updated weights for policy 0, policy_version 46728 (0.0031) [2024-06-06 16:40:27,321][24114] Fps is (10 sec: 45860.1, 60 sec: 44780.4, 300 sec: 45152.7). Total num frames: 765640704. Throughput: 0: 44629.1. Samples: 246946880. Policy #0 lag: (min: 0.0, avg: 9.9, max: 22.0) [2024-06-06 16:40:27,322][24114] Avg episode reward: [(0, '0.295')] [2024-06-06 16:40:29,229][24347] Updated weights for policy 0, policy_version 46738 (0.0031) [2024-06-06 16:40:32,318][24114] Fps is (10 sec: 42606.4, 60 sec: 44509.9, 300 sec: 45042.3). Total num frames: 765853696. Throughput: 0: 44680.8. Samples: 247082420. Policy #0 lag: (min: 0.0, avg: 9.9, max: 22.0) [2024-06-06 16:40:32,318][24114] Avg episode reward: [(0, '0.293')] [2024-06-06 16:40:33,509][24347] Updated weights for policy 0, policy_version 46748 (0.0030) [2024-06-06 16:40:36,299][24347] Updated weights for policy 0, policy_version 46758 (0.0024) [2024-06-06 16:40:37,318][24114] Fps is (10 sec: 45890.7, 60 sec: 45056.0, 300 sec: 45097.7). Total num frames: 766099456. Throughput: 0: 44920.6. Samples: 247352880. 
Policy #0 lag: (min: 0.0, avg: 10.1, max: 22.0) [2024-06-06 16:40:37,318][24114] Avg episode reward: [(0, '0.290')] [2024-06-06 16:40:40,702][24347] Updated weights for policy 0, policy_version 46768 (0.0027) [2024-06-06 16:40:42,318][24114] Fps is (10 sec: 47513.9, 60 sec: 45056.0, 300 sec: 45208.7). Total num frames: 766328832. Throughput: 0: 44836.8. Samples: 247624700. Policy #0 lag: (min: 0.0, avg: 10.1, max: 22.0) [2024-06-06 16:40:42,319][24114] Avg episode reward: [(0, '0.301')] [2024-06-06 16:40:43,480][24347] Updated weights for policy 0, policy_version 46778 (0.0025) [2024-06-06 16:40:47,318][24114] Fps is (10 sec: 42597.5, 60 sec: 44509.8, 300 sec: 45042.1). Total num frames: 766525440. Throughput: 0: 44979.8. Samples: 247759820. Policy #0 lag: (min: 0.0, avg: 10.1, max: 22.0) [2024-06-06 16:40:47,319][24114] Avg episode reward: [(0, '0.290')] [2024-06-06 16:40:47,995][24347] Updated weights for policy 0, policy_version 46788 (0.0029) [2024-06-06 16:40:50,983][24347] Updated weights for policy 0, policy_version 46798 (0.0035) [2024-06-06 16:40:52,318][24114] Fps is (10 sec: 42598.4, 60 sec: 44509.9, 300 sec: 45042.1). Total num frames: 766754816. Throughput: 0: 44816.9. Samples: 248024000. Policy #0 lag: (min: 0.0, avg: 10.1, max: 22.0) [2024-06-06 16:40:52,318][24114] Avg episode reward: [(0, '0.289')] [2024-06-06 16:40:55,554][24347] Updated weights for policy 0, policy_version 46808 (0.0033) [2024-06-06 16:40:57,318][24114] Fps is (10 sec: 47513.9, 60 sec: 45055.9, 300 sec: 45153.2). Total num frames: 767000576. Throughput: 0: 44688.8. Samples: 248294960. Policy #0 lag: (min: 0.0, avg: 10.1, max: 22.0) [2024-06-06 16:40:57,319][24114] Avg episode reward: [(0, '0.292')] [2024-06-06 16:40:58,312][24347] Updated weights for policy 0, policy_version 46818 (0.0031) [2024-06-06 16:41:02,318][24114] Fps is (10 sec: 44236.8, 60 sec: 44782.9, 300 sec: 45042.1). Total num frames: 767197184. Throughput: 0: 44958.2. Samples: 248436740. 
Policy #0 lag: (min: 0.0, avg: 10.1, max: 22.0) [2024-06-06 16:41:02,319][24114] Avg episode reward: [(0, '0.295')] [2024-06-06 16:41:02,898][24347] Updated weights for policy 0, policy_version 46828 (0.0038) [2024-06-06 16:41:05,401][24347] Updated weights for policy 0, policy_version 46838 (0.0036) [2024-06-06 16:41:07,318][24114] Fps is (10 sec: 42598.4, 60 sec: 44236.8, 300 sec: 44931.0). Total num frames: 767426560. Throughput: 0: 44873.0. Samples: 248700180. Policy #0 lag: (min: 0.0, avg: 8.3, max: 19.0) [2024-06-06 16:41:07,319][24114] Avg episode reward: [(0, '0.291')] [2024-06-06 16:41:10,071][24347] Updated weights for policy 0, policy_version 46848 (0.0034) [2024-06-06 16:41:12,318][24114] Fps is (10 sec: 50790.4, 60 sec: 45056.0, 300 sec: 45264.3). Total num frames: 767705088. Throughput: 0: 44954.4. Samples: 248969680. Policy #0 lag: (min: 0.0, avg: 8.3, max: 19.0) [2024-06-06 16:41:12,319][24114] Avg episode reward: [(0, '0.296')] [2024-06-06 16:41:12,524][24347] Updated weights for policy 0, policy_version 46858 (0.0033) [2024-06-06 16:41:17,180][24347] Updated weights for policy 0, policy_version 46868 (0.0033) [2024-06-06 16:41:17,318][24114] Fps is (10 sec: 45875.7, 60 sec: 45056.0, 300 sec: 45097.6). Total num frames: 767885312. Throughput: 0: 44945.9. Samples: 249104980. Policy #0 lag: (min: 0.0, avg: 8.3, max: 19.0) [2024-06-06 16:41:17,319][24114] Avg episode reward: [(0, '0.292')] [2024-06-06 16:41:20,091][24347] Updated weights for policy 0, policy_version 46878 (0.0032) [2024-06-06 16:41:22,318][24114] Fps is (10 sec: 39321.5, 60 sec: 44511.3, 300 sec: 44931.3). Total num frames: 768098304. Throughput: 0: 44858.1. Samples: 249371500. 
Policy #0 lag: (min: 0.0, avg: 8.3, max: 19.0) [2024-06-06 16:41:22,319][24114] Avg episode reward: [(0, '0.288')] [2024-06-06 16:41:24,643][24347] Updated weights for policy 0, policy_version 46888 (0.0028) [2024-06-06 16:41:27,255][24347] Updated weights for policy 0, policy_version 46898 (0.0036) [2024-06-06 16:41:27,318][24114] Fps is (10 sec: 49151.7, 60 sec: 45604.6, 300 sec: 45153.2). Total num frames: 768376832. Throughput: 0: 44845.3. Samples: 249642740. Policy #0 lag: (min: 0.0, avg: 8.3, max: 19.0) [2024-06-06 16:41:27,319][24114] Avg episode reward: [(0, '0.290')] [2024-06-06 16:41:31,901][24326] Signal inference workers to stop experience collection... (3700 times) [2024-06-06 16:41:31,932][24347] InferenceWorker_p0-w0: stopping experience collection (3700 times) [2024-06-06 16:41:31,963][24326] Signal inference workers to resume experience collection... (3700 times) [2024-06-06 16:41:31,967][24347] InferenceWorker_p0-w0: resuming experience collection (3700 times) [2024-06-06 16:41:32,126][24347] Updated weights for policy 0, policy_version 46908 (0.0039) [2024-06-06 16:41:32,318][24114] Fps is (10 sec: 45875.2, 60 sec: 45056.0, 300 sec: 45097.7). Total num frames: 768557056. Throughput: 0: 44985.0. Samples: 249784140. Policy #0 lag: (min: 1.0, avg: 8.5, max: 19.0) [2024-06-06 16:41:32,319][24114] Avg episode reward: [(0, '0.293')] [2024-06-06 16:41:34,504][24347] Updated weights for policy 0, policy_version 46918 (0.0028) [2024-06-06 16:41:37,318][24114] Fps is (10 sec: 39321.6, 60 sec: 44509.8, 300 sec: 44931.0). Total num frames: 768770048. Throughput: 0: 45072.4. Samples: 250052260. 
Policy #0 lag: (min: 1.0, avg: 8.5, max: 19.0) [2024-06-06 16:41:37,319][24114] Avg episode reward: [(0, '0.294')] [2024-06-06 16:41:39,169][24347] Updated weights for policy 0, policy_version 46928 (0.0040) [2024-06-06 16:41:41,656][24347] Updated weights for policy 0, policy_version 46938 (0.0034) [2024-06-06 16:41:42,320][24114] Fps is (10 sec: 47504.5, 60 sec: 45054.6, 300 sec: 45097.3). Total num frames: 769032192. Throughput: 0: 44819.5. Samples: 250311920. Policy #0 lag: (min: 1.0, avg: 8.5, max: 19.0) [2024-06-06 16:41:42,320][24114] Avg episode reward: [(0, '0.288')] [2024-06-06 16:41:46,599][24347] Updated weights for policy 0, policy_version 46948 (0.0035) [2024-06-06 16:41:47,318][24114] Fps is (10 sec: 47513.3, 60 sec: 45329.1, 300 sec: 45042.1). Total num frames: 769245184. Throughput: 0: 44945.3. Samples: 250459280. Policy #0 lag: (min: 1.0, avg: 8.5, max: 19.0) [2024-06-06 16:41:47,319][24114] Avg episode reward: [(0, '0.289')] [2024-06-06 16:41:49,155][24347] Updated weights for policy 0, policy_version 46958 (0.0030) [2024-06-06 16:41:52,324][24114] Fps is (10 sec: 40943.8, 60 sec: 44778.5, 300 sec: 44930.1). Total num frames: 769441792. Throughput: 0: 44928.9. Samples: 250722240. Policy #0 lag: (min: 1.0, avg: 8.5, max: 19.0) [2024-06-06 16:41:52,324][24114] Avg episode reward: [(0, '0.288')] [2024-06-06 16:41:53,680][24347] Updated weights for policy 0, policy_version 46968 (0.0031) [2024-06-06 16:41:56,340][24347] Updated weights for policy 0, policy_version 46978 (0.0028) [2024-06-06 16:41:57,318][24114] Fps is (10 sec: 45875.4, 60 sec: 45056.0, 300 sec: 45042.1). Total num frames: 769703936. Throughput: 0: 44946.2. Samples: 250992260. Policy #0 lag: (min: 1.0, avg: 8.5, max: 19.0) [2024-06-06 16:41:57,319][24114] Avg episode reward: [(0, '0.290')] [2024-06-06 16:42:01,249][24347] Updated weights for policy 0, policy_version 46988 (0.0028) [2024-06-06 16:42:02,318][24114] Fps is (10 sec: 47542.0, 60 sec: 45329.1, 300 sec: 45042.1). 
Total num frames: 769916928. Throughput: 0: 45073.4. Samples: 251133280. Policy #0 lag: (min: 0.0, avg: 10.2, max: 23.0) [2024-06-06 16:42:02,318][24114] Avg episode reward: [(0, '0.295')] [2024-06-06 16:42:03,769][24347] Updated weights for policy 0, policy_version 46998 (0.0024) [2024-06-06 16:42:07,318][24114] Fps is (10 sec: 40960.3, 60 sec: 44783.0, 300 sec: 44875.5). Total num frames: 770113536. Throughput: 0: 45157.4. Samples: 251403580. Policy #0 lag: (min: 0.0, avg: 10.2, max: 23.0) [2024-06-06 16:42:07,318][24114] Avg episode reward: [(0, '0.288')] [2024-06-06 16:42:08,385][24347] Updated weights for policy 0, policy_version 47008 (0.0024) [2024-06-06 16:42:11,249][24347] Updated weights for policy 0, policy_version 47018 (0.0035) [2024-06-06 16:42:12,318][24114] Fps is (10 sec: 44236.0, 60 sec: 44236.7, 300 sec: 44875.5). Total num frames: 770359296. Throughput: 0: 44843.0. Samples: 251660680. Policy #0 lag: (min: 0.0, avg: 10.2, max: 23.0) [2024-06-06 16:42:12,319][24114] Avg episode reward: [(0, '0.291')] [2024-06-06 16:42:15,711][24347] Updated weights for policy 0, policy_version 47028 (0.0026) [2024-06-06 16:42:17,320][24114] Fps is (10 sec: 49142.4, 60 sec: 45327.6, 300 sec: 45041.8). Total num frames: 770605056. Throughput: 0: 44894.1. Samples: 251804460. Policy #0 lag: (min: 0.0, avg: 10.2, max: 23.0) [2024-06-06 16:42:17,329][24114] Avg episode reward: [(0, '0.296')] [2024-06-06 16:42:18,282][24347] Updated weights for policy 0, policy_version 47038 (0.0026) [2024-06-06 16:42:22,318][24114] Fps is (10 sec: 40960.8, 60 sec: 44510.0, 300 sec: 44875.5). Total num frames: 770768896. Throughput: 0: 44839.2. Samples: 252070020. Policy #0 lag: (min: 0.0, avg: 10.2, max: 23.0) [2024-06-06 16:42:22,318][24114] Avg episode reward: [(0, '0.299')] [2024-06-06 16:42:22,527][24326] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000047046_770801664.pth... 
[2024-06-06 16:42:22,588][24326] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000046391_760070144.pth
[2024-06-06 16:42:23,087][24347] Updated weights for policy 0, policy_version 47048 (0.0045)
[2024-06-06 16:42:25,669][24347] Updated weights for policy 0, policy_version 47058 (0.0028)
[2024-06-06 16:42:27,318][24114] Fps is (10 sec: 40968.1, 60 sec: 43963.8, 300 sec: 44764.4). Total num frames: 771014656. Throughput: 0: 45010.0. Samples: 252337280. Policy #0 lag: (min: 0.0, avg: 10.2, max: 23.0)
[2024-06-06 16:42:27,318][24114] Avg episode reward: [(0, '0.290')]
[2024-06-06 16:42:30,333][24347] Updated weights for policy 0, policy_version 47068 (0.0038)
[2024-06-06 16:42:31,349][24326] Signal inference workers to stop experience collection... (3750 times)
[2024-06-06 16:42:31,379][24347] InferenceWorker_p0-w0: stopping experience collection (3750 times)
[2024-06-06 16:42:31,407][24326] Signal inference workers to resume experience collection... (3750 times)
[2024-06-06 16:42:31,408][24347] InferenceWorker_p0-w0: resuming experience collection (3750 times)
[2024-06-06 16:42:32,318][24114] Fps is (10 sec: 52429.0, 60 sec: 45602.2, 300 sec: 45097.9). Total num frames: 771293184. Throughput: 0: 44850.9. Samples: 252477560. Policy #0 lag: (min: 0.0, avg: 10.1, max: 20.0)
[2024-06-06 16:42:32,318][24114] Avg episode reward: [(0, '0.292')]
[2024-06-06 16:42:32,829][24347] Updated weights for policy 0, policy_version 47078 (0.0036)
[2024-06-06 16:42:37,318][24114] Fps is (10 sec: 42598.2, 60 sec: 44509.9, 300 sec: 44820.0). Total num frames: 771440640. Throughput: 0: 44957.9. Samples: 252745080. Policy #0 lag: (min: 0.0, avg: 10.1, max: 20.0)
[2024-06-06 16:42:37,319][24114] Avg episode reward: [(0, '0.297')]
[2024-06-06 16:42:37,791][24347] Updated weights for policy 0, policy_version 47088 (0.0044)
[2024-06-06 16:42:40,058][24347] Updated weights for policy 0, policy_version 47098 (0.0026)
[2024-06-06 16:42:42,318][24114] Fps is (10 sec: 40959.3, 60 sec: 44511.3, 300 sec: 44875.5). Total num frames: 771702784. Throughput: 0: 44803.5. Samples: 253008420. Policy #0 lag: (min: 0.0, avg: 10.1, max: 20.0)
[2024-06-06 16:42:42,319][24114] Avg episode reward: [(0, '0.291')]
[2024-06-06 16:42:44,969][24347] Updated weights for policy 0, policy_version 47108 (0.0032)
[2024-06-06 16:42:47,318][24114] Fps is (10 sec: 52428.9, 60 sec: 45329.2, 300 sec: 45153.2). Total num frames: 771964928. Throughput: 0: 44649.3. Samples: 253142500. Policy #0 lag: (min: 0.0, avg: 10.1, max: 20.0)
[2024-06-06 16:42:47,319][24114] Avg episode reward: [(0, '0.297')]
[2024-06-06 16:42:47,410][24347] Updated weights for policy 0, policy_version 47118 (0.0020)
[2024-06-06 16:42:52,318][24114] Fps is (10 sec: 42598.8, 60 sec: 44787.4, 300 sec: 44875.5). Total num frames: 772128768. Throughput: 0: 44573.3. Samples: 253409380. Policy #0 lag: (min: 0.0, avg: 10.1, max: 20.0)
[2024-06-06 16:42:52,318][24114] Avg episode reward: [(0, '0.297')]
[2024-06-06 16:42:52,375][24347] Updated weights for policy 0, policy_version 47128 (0.0047)
[2024-06-06 16:42:55,040][24347] Updated weights for policy 0, policy_version 47138 (0.0027)
[2024-06-06 16:42:57,318][24114] Fps is (10 sec: 39321.7, 60 sec: 44236.9, 300 sec: 44875.5). Total num frames: 772358144. Throughput: 0: 44857.5. Samples: 253679260. Policy #0 lag: (min: 0.0, avg: 10.1, max: 20.0)
[2024-06-06 16:42:57,318][24114] Avg episode reward: [(0, '0.298')]
[2024-06-06 16:42:59,778][24347] Updated weights for policy 0, policy_version 47148 (0.0039)
[2024-06-06 16:43:02,138][24347] Updated weights for policy 0, policy_version 47158 (0.0043)
[2024-06-06 16:43:02,318][24114] Fps is (10 sec: 50789.6, 60 sec: 45328.9, 300 sec: 44986.6). Total num frames: 772636672. Throughput: 0: 44676.9. Samples: 253814840. Policy #0 lag: (min: 2.0, avg: 11.5, max: 24.0)
[2024-06-06 16:43:02,319][24114] Avg episode reward: [(0, '0.295')]
[2024-06-06 16:43:07,099][24347] Updated weights for policy 0, policy_version 47168 (0.0024)
[2024-06-06 16:43:07,318][24114] Fps is (10 sec: 44235.6, 60 sec: 44782.7, 300 sec: 44764.6). Total num frames: 772800512. Throughput: 0: 44703.7. Samples: 254081700. Policy #0 lag: (min: 2.0, avg: 11.5, max: 24.0)
[2024-06-06 16:43:07,319][24114] Avg episode reward: [(0, '0.293')]
[2024-06-06 16:43:09,404][24347] Updated weights for policy 0, policy_version 47178 (0.0036)
[2024-06-06 16:43:12,318][24114] Fps is (10 sec: 40960.0, 60 sec: 44782.9, 300 sec: 44875.5). Total num frames: 773046272. Throughput: 0: 44656.7. Samples: 254346840. Policy #0 lag: (min: 2.0, avg: 11.5, max: 24.0)
[2024-06-06 16:43:12,319][24114] Avg episode reward: [(0, '0.295')]
[2024-06-06 16:43:14,531][24347] Updated weights for policy 0, policy_version 47188 (0.0031)
[2024-06-06 16:43:17,050][24347] Updated weights for policy 0, policy_version 47198 (0.0023)
[2024-06-06 16:43:17,318][24114] Fps is (10 sec: 49153.5, 60 sec: 44784.4, 300 sec: 44875.5). Total num frames: 773292032. Throughput: 0: 44527.5. Samples: 254481300. Policy #0 lag: (min: 2.0, avg: 11.5, max: 24.0)
[2024-06-06 16:43:17,318][24114] Avg episode reward: [(0, '0.299')]
[2024-06-06 16:43:21,869][24347] Updated weights for policy 0, policy_version 47208 (0.0040)
[2024-06-06 16:43:22,318][24114] Fps is (10 sec: 42598.9, 60 sec: 45055.9, 300 sec: 44820.0). Total num frames: 773472256. Throughput: 0: 44710.7. Samples: 254757060. Policy #0 lag: (min: 2.0, avg: 11.5, max: 24.0)
[2024-06-06 16:43:22,318][24114] Avg episode reward: [(0, '0.293')]
[2024-06-06 16:43:24,754][24347] Updated weights for policy 0, policy_version 47218 (0.0032)
[2024-06-06 16:43:27,318][24114] Fps is (10 sec: 40959.6, 60 sec: 44782.9, 300 sec: 44820.0). Total num frames: 773701632. Throughput: 0: 44634.2. Samples: 255016960. Policy #0 lag: (min: 2.0, avg: 11.5, max: 24.0)
[2024-06-06 16:43:27,319][24114] Avg episode reward: [(0, '0.297')]
[2024-06-06 16:43:28,958][24347] Updated weights for policy 0, policy_version 47228 (0.0025)
[2024-06-06 16:43:31,790][24347] Updated weights for policy 0, policy_version 47238 (0.0034)
[2024-06-06 16:43:32,318][24114] Fps is (10 sec: 47513.2, 60 sec: 44236.7, 300 sec: 44820.0). Total num frames: 773947392. Throughput: 0: 44843.9. Samples: 255160480. Policy #0 lag: (min: 1.0, avg: 12.2, max: 23.0)
[2024-06-06 16:43:32,318][24114] Avg episode reward: [(0, '0.301')]
[2024-06-06 16:43:36,651][24347] Updated weights for policy 0, policy_version 47248 (0.0048)
[2024-06-06 16:43:37,318][24114] Fps is (10 sec: 45875.7, 60 sec: 45329.1, 300 sec: 44875.5). Total num frames: 774160384. Throughput: 0: 44755.1. Samples: 255423360. Policy #0 lag: (min: 1.0, avg: 12.2, max: 23.0)
[2024-06-06 16:43:37,318][24114] Avg episode reward: [(0, '0.299')]
[2024-06-06 16:43:39,020][24347] Updated weights for policy 0, policy_version 47258 (0.0040)
[2024-06-06 16:43:42,319][24114] Fps is (10 sec: 42594.7, 60 sec: 44509.2, 300 sec: 44765.2). Total num frames: 774373376. Throughput: 0: 44472.8. Samples: 255680580. Policy #0 lag: (min: 1.0, avg: 12.2, max: 23.0)
[2024-06-06 16:43:42,324][24114] Avg episode reward: [(0, '0.291')]
[2024-06-06 16:43:43,771][24326] Signal inference workers to stop experience collection... (3800 times)
[2024-06-06 16:43:43,823][24326] Signal inference workers to resume experience collection... (3800 times)
[2024-06-06 16:43:43,823][24347] InferenceWorker_p0-w0: stopping experience collection (3800 times)
[2024-06-06 16:43:43,840][24347] InferenceWorker_p0-w0: resuming experience collection (3800 times)
[2024-06-06 16:43:43,968][24347] Updated weights for policy 0, policy_version 47268 (0.0040)
[2024-06-06 16:43:46,387][24347] Updated weights for policy 0, policy_version 47278 (0.0038)
[2024-06-06 16:43:47,318][24114] Fps is (10 sec: 45875.2, 60 sec: 44236.8, 300 sec: 44764.5). Total num frames: 774619136. Throughput: 0: 44465.5. Samples: 255815780. Policy #0 lag: (min: 1.0, avg: 12.2, max: 23.0)
[2024-06-06 16:43:47,318][24114] Avg episode reward: [(0, '0.300')]
[2024-06-06 16:43:51,167][24347] Updated weights for policy 0, policy_version 47288 (0.0028)
[2024-06-06 16:43:52,318][24114] Fps is (10 sec: 47518.6, 60 sec: 45329.1, 300 sec: 44986.6). Total num frames: 774848512. Throughput: 0: 44627.5. Samples: 256089920. Policy #0 lag: (min: 1.0, avg: 12.2, max: 23.0)
[2024-06-06 16:43:52,318][24114] Avg episode reward: [(0, '0.294')]
[2024-06-06 16:43:54,110][24347] Updated weights for policy 0, policy_version 47298 (0.0033)
[2024-06-06 16:43:57,318][24114] Fps is (10 sec: 42598.3, 60 sec: 44782.9, 300 sec: 44764.4). Total num frames: 775045120. Throughput: 0: 44492.2. Samples: 256348980. Policy #0 lag: (min: 1.0, avg: 12.2, max: 23.0)
[2024-06-06 16:43:57,319][24114] Avg episode reward: [(0, '0.298')]
[2024-06-06 16:43:58,382][24347] Updated weights for policy 0, policy_version 47308 (0.0033)
[2024-06-06 16:44:01,422][24347] Updated weights for policy 0, policy_version 47318 (0.0027)
[2024-06-06 16:44:02,318][24114] Fps is (10 sec: 42598.1, 60 sec: 43963.9, 300 sec: 44764.4). Total num frames: 775274496. Throughput: 0: 44560.4. Samples: 256486520. Policy #0 lag: (min: 0.0, avg: 11.6, max: 25.0)
[2024-06-06 16:44:02,318][24114] Avg episode reward: [(0, '0.297')]
[2024-06-06 16:44:05,888][24347] Updated weights for policy 0, policy_version 47328 (0.0045)
[2024-06-06 16:44:07,324][24114] Fps is (10 sec: 45848.1, 60 sec: 45051.8, 300 sec: 44874.6). Total num frames: 775503872. Throughput: 0: 44492.0. Samples: 256759460. Policy #0 lag: (min: 0.0, avg: 11.6, max: 25.0)
[2024-06-06 16:44:07,324][24114] Avg episode reward: [(0, '0.288')]
[2024-06-06 16:44:08,569][24347] Updated weights for policy 0, policy_version 47338 (0.0033)
[2024-06-06 16:44:12,318][24114] Fps is (10 sec: 42597.8, 60 sec: 44236.8, 300 sec: 44653.3). Total num frames: 775700480. Throughput: 0: 44646.2. Samples: 257026040. Policy #0 lag: (min: 0.0, avg: 11.6, max: 25.0)
[2024-06-06 16:44:12,319][24114] Avg episode reward: [(0, '0.299')]
[2024-06-06 16:44:13,261][24347] Updated weights for policy 0, policy_version 47348 (0.0042)
[2024-06-06 16:44:16,089][24347] Updated weights for policy 0, policy_version 47358 (0.0031)
[2024-06-06 16:44:17,318][24114] Fps is (10 sec: 45902.5, 60 sec: 44509.9, 300 sec: 44820.0). Total num frames: 775962624. Throughput: 0: 44361.9. Samples: 257156760. Policy #0 lag: (min: 0.0, avg: 11.6, max: 25.0)
[2024-06-06 16:44:17,318][24114] Avg episode reward: [(0, '0.296')]
[2024-06-06 16:44:20,404][24347] Updated weights for policy 0, policy_version 47368 (0.0032)
[2024-06-06 16:44:22,318][24114] Fps is (10 sec: 49152.5, 60 sec: 45329.1, 300 sec: 44875.5). Total num frames: 776192000. Throughput: 0: 44680.4. Samples: 257433980. Policy #0 lag: (min: 0.0, avg: 11.6, max: 25.0)
[2024-06-06 16:44:22,318][24114] Avg episode reward: [(0, '0.294')]
[2024-06-06 16:44:22,331][24326] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000047375_776192000.pth...
[2024-06-06 16:44:22,403][24326] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000046718_765427712.pth
[2024-06-06 16:44:23,624][24347] Updated weights for policy 0, policy_version 47378 (0.0028)
[2024-06-06 16:44:27,318][24114] Fps is (10 sec: 42597.7, 60 sec: 44782.9, 300 sec: 44764.4). Total num frames: 776388608. Throughput: 0: 44887.9. Samples: 257700500. Policy #0 lag: (min: 0.0, avg: 11.6, max: 25.0)
[2024-06-06 16:44:27,319][24114] Avg episode reward: [(0, '0.301')]
[2024-06-06 16:44:27,722][24347] Updated weights for policy 0, policy_version 47388 (0.0029)
[2024-06-06 16:44:30,618][24347] Updated weights for policy 0, policy_version 47398 (0.0036)
[2024-06-06 16:44:32,318][24114] Fps is (10 sec: 42598.1, 60 sec: 44509.9, 300 sec: 44820.0). Total num frames: 776617984. Throughput: 0: 44783.9. Samples: 257831060. Policy #0 lag: (min: 0.0, avg: 11.1, max: 23.0)
[2024-06-06 16:44:32,319][24114] Avg episode reward: [(0, '0.298')]
[2024-06-06 16:44:35,304][24347] Updated weights for policy 0, policy_version 47408 (0.0043)
[2024-06-06 16:44:37,318][24114] Fps is (10 sec: 45876.0, 60 sec: 44782.9, 300 sec: 44820.0). Total num frames: 776847360. Throughput: 0: 44638.6. Samples: 258098660. Policy #0 lag: (min: 0.0, avg: 11.1, max: 23.0)
[2024-06-06 16:44:37,318][24114] Avg episode reward: [(0, '0.304')]
[2024-06-06 16:44:37,916][24347] Updated weights for policy 0, policy_version 47418 (0.0038)
[2024-06-06 16:44:42,318][24114] Fps is (10 sec: 42598.8, 60 sec: 44510.6, 300 sec: 44708.9). Total num frames: 777043968. Throughput: 0: 44931.1. Samples: 258370880. Policy #0 lag: (min: 0.0, avg: 11.1, max: 23.0)
[2024-06-06 16:44:42,318][24114] Avg episode reward: [(0, '0.292')]
[2024-06-06 16:44:42,389][24347] Updated weights for policy 0, policy_version 47428 (0.0031)
[2024-06-06 16:44:45,387][24347] Updated weights for policy 0, policy_version 47438 (0.0025)
[2024-06-06 16:44:47,318][24114] Fps is (10 sec: 44236.8, 60 sec: 44509.9, 300 sec: 44764.4). Total num frames: 777289728. Throughput: 0: 44683.1. Samples: 258497260. Policy #0 lag: (min: 0.0, avg: 11.1, max: 23.0)
[2024-06-06 16:44:47,318][24114] Avg episode reward: [(0, '0.295')]
[2024-06-06 16:44:49,585][24347] Updated weights for policy 0, policy_version 47448 (0.0033)
[2024-06-06 16:44:52,318][24114] Fps is (10 sec: 47513.6, 60 sec: 44509.8, 300 sec: 44820.0). Total num frames: 777519104. Throughput: 0: 44652.6. Samples: 258768560. Policy #0 lag: (min: 0.0, avg: 11.1, max: 23.0)
[2024-06-06 16:44:52,318][24114] Avg episode reward: [(0, '0.295')]
[2024-06-06 16:44:52,991][24347] Updated weights for policy 0, policy_version 47458 (0.0040)
[2024-06-06 16:44:57,133][24347] Updated weights for policy 0, policy_version 47468 (0.0033)
[2024-06-06 16:44:57,318][24114] Fps is (10 sec: 42597.5, 60 sec: 44509.7, 300 sec: 44764.4). Total num frames: 777715712. Throughput: 0: 44791.0. Samples: 259041640. Policy #0 lag: (min: 0.0, avg: 11.1, max: 23.0)
[2024-06-06 16:44:57,319][24114] Avg episode reward: [(0, '0.297')]
[2024-06-06 16:44:59,300][24326] Signal inference workers to stop experience collection... (3850 times)
[2024-06-06 16:44:59,301][24326] Signal inference workers to resume experience collection... (3850 times)
[2024-06-06 16:44:59,337][24347] InferenceWorker_p0-w0: stopping experience collection (3850 times)
[2024-06-06 16:44:59,337][24347] InferenceWorker_p0-w0: resuming experience collection (3850 times)
[2024-06-06 16:45:00,179][24347] Updated weights for policy 0, policy_version 47478 (0.0027)
[2024-06-06 16:45:02,318][24114] Fps is (10 sec: 44236.3, 60 sec: 44782.9, 300 sec: 44708.9). Total num frames: 777961472. Throughput: 0: 44800.3. Samples: 259172780. Policy #0 lag: (min: 0.0, avg: 10.7, max: 22.0)
[2024-06-06 16:45:02,319][24114] Avg episode reward: [(0, '0.295')]
[2024-06-06 16:45:04,543][24347] Updated weights for policy 0, policy_version 47488 (0.0032)
[2024-06-06 16:45:07,303][24347] Updated weights for policy 0, policy_version 47498 (0.0043)
[2024-06-06 16:45:07,318][24114] Fps is (10 sec: 49152.2, 60 sec: 45060.3, 300 sec: 44764.4). Total num frames: 778207232. Throughput: 0: 44483.0. Samples: 259435720. Policy #0 lag: (min: 0.0, avg: 10.7, max: 22.0)
[2024-06-06 16:45:07,319][24114] Avg episode reward: [(0, '0.306')]
[2024-06-06 16:45:11,707][24347] Updated weights for policy 0, policy_version 47508 (0.0029)
[2024-06-06 16:45:12,318][24114] Fps is (10 sec: 44237.2, 60 sec: 45056.1, 300 sec: 44820.0). Total num frames: 778403840. Throughput: 0: 44596.6. Samples: 259707340. Policy #0 lag: (min: 0.0, avg: 10.7, max: 22.0)
[2024-06-06 16:45:12,318][24114] Avg episode reward: [(0, '0.303')]
[2024-06-06 16:45:14,737][24347] Updated weights for policy 0, policy_version 47518 (0.0025)
[2024-06-06 16:45:17,318][24114] Fps is (10 sec: 40960.0, 60 sec: 44236.7, 300 sec: 44709.2). Total num frames: 778616832. Throughput: 0: 44616.4. Samples: 259838800. Policy #0 lag: (min: 0.0, avg: 10.7, max: 22.0)
[2024-06-06 16:45:17,319][24114] Avg episode reward: [(0, '0.304')]
[2024-06-06 16:45:18,627][24347] Updated weights for policy 0, policy_version 47528 (0.0038)
[2024-06-06 16:45:22,019][24347] Updated weights for policy 0, policy_version 47538 (0.0034)
[2024-06-06 16:45:22,318][24114] Fps is (10 sec: 45875.4, 60 sec: 44509.9, 300 sec: 44820.5). Total num frames: 778862592. Throughput: 0: 44829.3. Samples: 260115980. Policy #0 lag: (min: 0.0, avg: 10.7, max: 22.0)
[2024-06-06 16:45:22,318][24114] Avg episode reward: [(0, '0.297')]
[2024-06-06 16:45:26,068][24347] Updated weights for policy 0, policy_version 47548 (0.0040)
[2024-06-06 16:45:27,318][24114] Fps is (10 sec: 47514.0, 60 sec: 45056.1, 300 sec: 44875.5). Total num frames: 779091968. Throughput: 0: 44919.9. Samples: 260392280. Policy #0 lag: (min: 0.0, avg: 10.7, max: 22.0)
[2024-06-06 16:45:27,319][24114] Avg episode reward: [(0, '0.292')]
[2024-06-06 16:45:29,351][24347] Updated weights for policy 0, policy_version 47558 (0.0043)
[2024-06-06 16:45:32,318][24114] Fps is (10 sec: 44236.3, 60 sec: 44783.0, 300 sec: 44764.4). Total num frames: 779304960. Throughput: 0: 44859.5. Samples: 260515940. Policy #0 lag: (min: 0.0, avg: 11.1, max: 23.0)
[2024-06-06 16:45:32,319][24114] Avg episode reward: [(0, '0.301')]
[2024-06-06 16:45:33,538][24347] Updated weights for policy 0, policy_version 47568 (0.0026)
[2024-06-06 16:45:36,630][24347] Updated weights for policy 0, policy_version 47578 (0.0023)
[2024-06-06 16:45:37,318][24114] Fps is (10 sec: 44237.2, 60 sec: 44782.9, 300 sec: 44764.4). Total num frames: 779534336. Throughput: 0: 44890.7. Samples: 260788640. Policy #0 lag: (min: 0.0, avg: 11.1, max: 23.0)
[2024-06-06 16:45:37,318][24114] Avg episode reward: [(0, '0.291')]
[2024-06-06 16:45:40,931][24347] Updated weights for policy 0, policy_version 47588 (0.0043)
[2024-06-06 16:45:42,318][24114] Fps is (10 sec: 45874.8, 60 sec: 45329.0, 300 sec: 44875.5). Total num frames: 779763712. Throughput: 0: 44700.0. Samples: 261053140. Policy #0 lag: (min: 0.0, avg: 11.1, max: 23.0)
[2024-06-06 16:45:42,319][24114] Avg episode reward: [(0, '0.295')]
[2024-06-06 16:45:43,931][24347] Updated weights for policy 0, policy_version 47598 (0.0034)
[2024-06-06 16:45:47,318][24114] Fps is (10 sec: 42598.4, 60 sec: 44509.9, 300 sec: 44764.4). Total num frames: 779960320. Throughput: 0: 44837.0. Samples: 261190440. Policy #0 lag: (min: 0.0, avg: 11.1, max: 23.0)
[2024-06-06 16:45:47,318][24114] Avg episode reward: [(0, '0.297')]
[2024-06-06 16:45:48,069][24347] Updated weights for policy 0, policy_version 47608 (0.0031)
[2024-06-06 16:45:51,339][24347] Updated weights for policy 0, policy_version 47618 (0.0039)
[2024-06-06 16:45:52,318][24114] Fps is (10 sec: 45875.8, 60 sec: 45056.0, 300 sec: 44820.0). Total num frames: 780222464. Throughput: 0: 44953.9. Samples: 261458640. Policy #0 lag: (min: 0.0, avg: 11.1, max: 23.0)
[2024-06-06 16:45:52,319][24114] Avg episode reward: [(0, '0.298')]
[2024-06-06 16:45:55,103][24347] Updated weights for policy 0, policy_version 47628 (0.0037)
[2024-06-06 16:45:57,318][24114] Fps is (10 sec: 45874.7, 60 sec: 45056.1, 300 sec: 44820.0). Total num frames: 780419072. Throughput: 0: 44883.9. Samples: 261727120. Policy #0 lag: (min: 0.0, avg: 11.1, max: 23.0)
[2024-06-06 16:45:57,319][24114] Avg episode reward: [(0, '0.298')]
[2024-06-06 16:45:58,780][24347] Updated weights for policy 0, policy_version 47638 (0.0036)
[2024-06-06 16:46:02,318][24114] Fps is (10 sec: 40959.9, 60 sec: 44509.9, 300 sec: 44764.4). Total num frames: 780632064. Throughput: 0: 44857.9. Samples: 261857400. Policy #0 lag: (min: 0.0, avg: 11.5, max: 22.0)
[2024-06-06 16:46:02,318][24114] Avg episode reward: [(0, '0.298')]
[2024-06-06 16:46:02,741][24347] Updated weights for policy 0, policy_version 47648 (0.0041)
[2024-06-06 16:46:06,179][24347] Updated weights for policy 0, policy_version 47658 (0.0039)
[2024-06-06 16:46:07,318][24114] Fps is (10 sec: 45875.1, 60 sec: 44509.9, 300 sec: 44653.3). Total num frames: 780877824. Throughput: 0: 44673.6. Samples: 262126300. Policy #0 lag: (min: 0.0, avg: 11.5, max: 22.0)
[2024-06-06 16:46:07,319][24114] Avg episode reward: [(0, '0.296')]
[2024-06-06 16:46:10,169][24347] Updated weights for policy 0, policy_version 47668 (0.0038)
[2024-06-06 16:46:12,318][24114] Fps is (10 sec: 45875.4, 60 sec: 44782.9, 300 sec: 44764.4). Total num frames: 781090816. Throughput: 0: 44532.1. Samples: 262396220. Policy #0 lag: (min: 0.0, avg: 11.5, max: 22.0)
[2024-06-06 16:46:12,318][24114] Avg episode reward: [(0, '0.298')]
[2024-06-06 16:46:13,210][24347] Updated weights for policy 0, policy_version 47678 (0.0027)
[2024-06-06 16:46:17,178][24347] Updated weights for policy 0, policy_version 47688 (0.0032)
[2024-06-06 16:46:17,318][24114] Fps is (10 sec: 44237.5, 60 sec: 45056.1, 300 sec: 44820.0). Total num frames: 781320192. Throughput: 0: 44910.3. Samples: 262536900. Policy #0 lag: (min: 0.0, avg: 11.5, max: 22.0)
[2024-06-06 16:46:17,318][24114] Avg episode reward: [(0, '0.305')]
[2024-06-06 16:46:20,649][24347] Updated weights for policy 0, policy_version 47698 (0.0034)
[2024-06-06 16:46:22,320][24114] Fps is (10 sec: 44227.6, 60 sec: 44508.3, 300 sec: 44597.5). Total num frames: 781533184. Throughput: 0: 44739.3. Samples: 262802000. Policy #0 lag: (min: 0.0, avg: 11.5, max: 22.0)
[2024-06-06 16:46:22,321][24114] Avg episode reward: [(0, '0.296')]
[2024-06-06 16:46:22,378][24326] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000047702_781549568.pth...
[2024-06-06 16:46:22,434][24326] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000047046_770801664.pth
[2024-06-06 16:46:24,189][24326] Signal inference workers to stop experience collection... (3900 times)
[2024-06-06 16:46:24,220][24347] InferenceWorker_p0-w0: stopping experience collection (3900 times)
[2024-06-06 16:46:24,245][24326] Signal inference workers to resume experience collection... (3900 times)
[2024-06-06 16:46:24,252][24347] InferenceWorker_p0-w0: resuming experience collection (3900 times)
[2024-06-06 16:46:24,382][24347] Updated weights for policy 0, policy_version 47708 (0.0022)
[2024-06-06 16:46:27,318][24114] Fps is (10 sec: 44236.5, 60 sec: 44509.9, 300 sec: 44764.4). Total num frames: 781762560. Throughput: 0: 44923.6. Samples: 263074700. Policy #0 lag: (min: 0.0, avg: 11.5, max: 22.0)
[2024-06-06 16:46:27,319][24114] Avg episode reward: [(0, '0.293')]
[2024-06-06 16:46:28,131][24347] Updated weights for policy 0, policy_version 47718 (0.0030)
[2024-06-06 16:46:31,873][24347] Updated weights for policy 0, policy_version 47728 (0.0045)
[2024-06-06 16:46:32,318][24114] Fps is (10 sec: 45884.7, 60 sec: 44783.0, 300 sec: 44820.0). Total num frames: 781991936. Throughput: 0: 44927.5. Samples: 263212180. Policy #0 lag: (min: 0.0, avg: 9.4, max: 21.0)
[2024-06-06 16:46:32,318][24114] Avg episode reward: [(0, '0.303')]
[2024-06-06 16:46:35,233][24347] Updated weights for policy 0, policy_version 47738 (0.0050)
[2024-06-06 16:46:37,318][24114] Fps is (10 sec: 44236.7, 60 sec: 44509.8, 300 sec: 44653.6). Total num frames: 782204928. Throughput: 0: 44726.6. Samples: 263471340. Policy #0 lag: (min: 0.0, avg: 9.4, max: 21.0)
[2024-06-06 16:46:37,319][24114] Avg episode reward: [(0, '0.299')]
[2024-06-06 16:46:39,245][24347] Updated weights for policy 0, policy_version 47748 (0.0041)
[2024-06-06 16:46:42,302][24347] Updated weights for policy 0, policy_version 47758 (0.0021)
[2024-06-06 16:46:42,322][24114] Fps is (10 sec: 47492.4, 60 sec: 45052.7, 300 sec: 44819.3). Total num frames: 782467072. Throughput: 0: 44801.4. Samples: 263743380. Policy #0 lag: (min: 0.0, avg: 9.4, max: 21.0)
[2024-06-06 16:46:42,323][24114] Avg episode reward: [(0, '0.286')]
[2024-06-06 16:46:46,415][24347] Updated weights for policy 0, policy_version 47768 (0.0033)
[2024-06-06 16:46:47,318][24114] Fps is (10 sec: 44237.1, 60 sec: 44782.9, 300 sec: 44765.3). Total num frames: 782647296. Throughput: 0: 44880.9. Samples: 263877040. Policy #0 lag: (min: 0.0, avg: 9.4, max: 21.0)
[2024-06-06 16:46:47,318][24114] Avg episode reward: [(0, '0.301')]
[2024-06-06 16:46:49,916][24347] Updated weights for policy 0, policy_version 47778 (0.0030)
[2024-06-06 16:46:52,318][24114] Fps is (10 sec: 40978.4, 60 sec: 44236.8, 300 sec: 44653.4). Total num frames: 782876672. Throughput: 0: 44769.9. Samples: 264140940. Policy #0 lag: (min: 0.0, avg: 9.4, max: 21.0)
[2024-06-06 16:46:52,318][24114] Avg episode reward: [(0, '0.300')]
[2024-06-06 16:46:53,669][24347] Updated weights for policy 0, policy_version 47788 (0.0033)
[2024-06-06 16:46:57,318][24114] Fps is (10 sec: 44235.8, 60 sec: 44509.8, 300 sec: 44653.3). Total num frames: 783089664. Throughput: 0: 44727.3. Samples: 264408960. Policy #0 lag: (min: 0.0, avg: 9.4, max: 21.0)
[2024-06-06 16:46:57,319][24114] Avg episode reward: [(0, '0.299')]
[2024-06-06 16:46:57,658][24347] Updated weights for policy 0, policy_version 47798 (0.0032)
[2024-06-06 16:47:01,187][24347] Updated weights for policy 0, policy_version 47808 (0.0032)
[2024-06-06 16:47:02,318][24114] Fps is (10 sec: 45875.1, 60 sec: 45056.0, 300 sec: 44820.0). Total num frames: 783335424. Throughput: 0: 44641.3. Samples: 264545760. Policy #0 lag: (min: 1.0, avg: 10.1, max: 21.0)
[2024-06-06 16:47:02,318][24114] Avg episode reward: [(0, '0.305')]
[2024-06-06 16:47:04,716][24347] Updated weights for policy 0, policy_version 47818 (0.0044)
[2024-06-06 16:47:07,318][24114] Fps is (10 sec: 44238.0, 60 sec: 44236.9, 300 sec: 44653.4). Total num frames: 783532032. Throughput: 0: 44602.1. Samples: 264809000. Policy #0 lag: (min: 1.0, avg: 10.1, max: 21.0)
[2024-06-06 16:47:07,318][24114] Avg episode reward: [(0, '0.303')]
[2024-06-06 16:47:08,365][24347] Updated weights for policy 0, policy_version 47828 (0.0036)
[2024-06-06 16:47:11,746][24347] Updated weights for policy 0, policy_version 47838 (0.0025)
[2024-06-06 16:47:12,319][24114] Fps is (10 sec: 45869.4, 60 sec: 45055.0, 300 sec: 44709.0). Total num frames: 783794176. Throughput: 0: 44425.9. Samples: 265073920. Policy #0 lag: (min: 1.0, avg: 10.1, max: 21.0)
[2024-06-06 16:47:12,320][24114] Avg episode reward: [(0, '0.299')]
[2024-06-06 16:47:15,899][24347] Updated weights for policy 0, policy_version 47848 (0.0053)
[2024-06-06 16:47:17,318][24114] Fps is (10 sec: 47512.9, 60 sec: 44782.8, 300 sec: 44875.5). Total num frames: 784007168. Throughput: 0: 44457.2. Samples: 265212760. Policy #0 lag: (min: 1.0, avg: 10.1, max: 21.0)
[2024-06-06 16:47:17,319][24114] Avg episode reward: [(0, '0.301')]
[2024-06-06 16:47:19,398][24347] Updated weights for policy 0, policy_version 47858 (0.0027)
[2024-06-06 16:47:22,318][24114] Fps is (10 sec: 44242.0, 60 sec: 45057.5, 300 sec: 44819.9). Total num frames: 784236544. Throughput: 0: 44735.5. Samples: 265484440. Policy #0 lag: (min: 1.0, avg: 10.1, max: 21.0)
[2024-06-06 16:47:22,318][24114] Avg episode reward: [(0, '0.294')]
[2024-06-06 16:47:23,469][24347] Updated weights for policy 0, policy_version 47868 (0.0037)
[2024-06-06 16:47:26,855][24347] Updated weights for policy 0, policy_version 47878 (0.0030)
[2024-06-06 16:47:27,318][24114] Fps is (10 sec: 44237.0, 60 sec: 44782.9, 300 sec: 44597.8). Total num frames: 784449536. Throughput: 0: 44608.8. Samples: 265750580. Policy #0 lag: (min: 1.0, avg: 10.1, max: 21.0)
[2024-06-06 16:47:27,319][24114] Avg episode reward: [(0, '0.296')]
[2024-06-06 16:47:30,670][24347] Updated weights for policy 0, policy_version 47888 (0.0031)
[2024-06-06 16:47:32,318][24114] Fps is (10 sec: 42598.3, 60 sec: 44509.8, 300 sec: 44820.0). Total num frames: 784662528. Throughput: 0: 44472.8. Samples: 265878320. Policy #0 lag: (min: 0.0, avg: 10.3, max: 21.0)
[2024-06-06 16:47:32,319][24114] Avg episode reward: [(0, '0.301')]
[2024-06-06 16:47:34,078][24347] Updated weights for policy 0, policy_version 47898 (0.0037)
[2024-06-06 16:47:37,318][24114] Fps is (10 sec: 44236.5, 60 sec: 44782.9, 300 sec: 44708.9). Total num frames: 784891904. Throughput: 0: 44610.9. Samples: 266148440. Policy #0 lag: (min: 0.0, avg: 10.3, max: 21.0)
[2024-06-06 16:47:37,319][24114] Avg episode reward: [(0, '0.295')]
[2024-06-06 16:47:37,846][24347] Updated weights for policy 0, policy_version 47908 (0.0024)
[2024-06-06 16:47:41,064][24347] Updated weights for policy 0, policy_version 47918 (0.0029)
[2024-06-06 16:47:42,318][24114] Fps is (10 sec: 45875.7, 60 sec: 44240.1, 300 sec: 44597.8). Total num frames: 785121280. Throughput: 0: 44660.2. Samples: 266418660. Policy #0 lag: (min: 0.0, avg: 10.3, max: 21.0)
[2024-06-06 16:47:42,318][24114] Avg episode reward: [(0, '0.293')]
[2024-06-06 16:47:45,271][24347] Updated weights for policy 0, policy_version 47928 (0.0037)
[2024-06-06 16:47:47,318][24114] Fps is (10 sec: 45875.3, 60 sec: 45055.9, 300 sec: 44819.9). Total num frames: 785350656. Throughput: 0: 44716.7. Samples: 266558020. Policy #0 lag: (min: 0.0, avg: 10.3, max: 21.0)
[2024-06-06 16:47:47,319][24114] Avg episode reward: [(0, '0.296')]
[2024-06-06 16:47:48,789][24347] Updated weights for policy 0, policy_version 47938 (0.0028)
[2024-06-06 16:47:49,280][24326] Signal inference workers to stop experience collection... (3950 times)
[2024-06-06 16:47:49,325][24347] InferenceWorker_p0-w0: stopping experience collection (3950 times)
[2024-06-06 16:47:49,333][24326] Signal inference workers to resume experience collection... (3950 times)
[2024-06-06 16:47:49,341][24347] InferenceWorker_p0-w0: resuming experience collection (3950 times)
[2024-06-06 16:47:52,318][24114] Fps is (10 sec: 44235.9, 60 sec: 44782.8, 300 sec: 44764.4). Total num frames: 785563648. Throughput: 0: 44832.7. Samples: 266826480. Policy #0 lag: (min: 0.0, avg: 10.3, max: 21.0)
[2024-06-06 16:47:52,319][24114] Avg episode reward: [(0, '0.303')]
[2024-06-06 16:47:52,457][24347] Updated weights for policy 0, policy_version 47948 (0.0030)
[2024-06-06 16:47:56,307][24347] Updated weights for policy 0, policy_version 47958 (0.0034)
[2024-06-06 16:47:57,318][24114] Fps is (10 sec: 44236.9, 60 sec: 45056.1, 300 sec: 44597.8). Total num frames: 785793024. Throughput: 0: 44811.8. Samples: 267090400. Policy #0 lag: (min: 0.0, avg: 10.3, max: 21.0)
[2024-06-06 16:47:57,319][24114] Avg episode reward: [(0, '0.291')]
[2024-06-06 16:47:59,680][24347] Updated weights for policy 0, policy_version 47968 (0.0042)
[2024-06-06 16:48:02,318][24114] Fps is (10 sec: 45876.2, 60 sec: 44783.0, 300 sec: 44820.0). Total num frames: 786022400. Throughput: 0: 44575.7. Samples: 267218660. Policy #0 lag: (min: 0.0, avg: 9.7, max: 21.0)
[2024-06-06 16:48:02,318][24114] Avg episode reward: [(0, '0.294')]
[2024-06-06 16:48:03,718][24347] Updated weights for policy 0, policy_version 47978 (0.0035)
[2024-06-06 16:48:07,011][24347] Updated weights for policy 0, policy_version 47988 (0.0034)
[2024-06-06 16:48:07,318][24114] Fps is (10 sec: 44237.0, 60 sec: 45055.9, 300 sec: 44708.9). Total num frames: 786235392. Throughput: 0: 44477.8. Samples: 267485940. Policy #0 lag: (min: 0.0, avg: 9.7, max: 21.0)
[2024-06-06 16:48:07,320][24114] Avg episode reward: [(0, '0.301')]
[2024-06-06 16:48:10,946][24347] Updated weights for policy 0, policy_version 47998 (0.0032)
[2024-06-06 16:48:12,318][24114] Fps is (10 sec: 44236.4, 60 sec: 44510.8, 300 sec: 44653.3). Total num frames: 786464768. Throughput: 0: 44565.8. Samples: 267756040. Policy #0 lag: (min: 0.0, avg: 9.7, max: 21.0)
[2024-06-06 16:48:12,319][24114] Avg episode reward: [(0, '0.294')]
[2024-06-06 16:48:14,956][24347] Updated weights for policy 0, policy_version 48008 (0.0031)
[2024-06-06 16:48:17,318][24114] Fps is (10 sec: 42598.1, 60 sec: 44236.8, 300 sec: 44708.9). Total num frames: 786661376. Throughput: 0: 44743.9. Samples: 267891800. Policy #0 lag: (min: 0.0, avg: 9.7, max: 21.0)
[2024-06-06 16:48:17,319][24114] Avg episode reward: [(0, '0.305')]
[2024-06-06 16:48:18,439][24347] Updated weights for policy 0, policy_version 48018 (0.0023)
[2024-06-06 16:48:22,009][24347] Updated weights for policy 0, policy_version 48028 (0.0023)
[2024-06-06 16:48:22,320][24114] Fps is (10 sec: 42590.3, 60 sec: 44235.4, 300 sec: 44708.6). Total num frames: 786890752. Throughput: 0: 44562.2. Samples: 268153820. Policy #0 lag: (min: 0.0, avg: 9.7, max: 21.0)
[2024-06-06 16:48:22,321][24114] Avg episode reward: [(0, '0.293')]
[2024-06-06 16:48:22,336][24326] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000048028_786890752.pth...
[2024-06-06 16:48:22,398][24326] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000047375_776192000.pth
[2024-06-06 16:48:25,980][24347] Updated weights for policy 0, policy_version 48038 (0.0032)
[2024-06-06 16:48:27,318][24114] Fps is (10 sec: 44235.4, 60 sec: 44236.5, 300 sec: 44597.7). Total num frames: 787103744. Throughput: 0: 44460.8. Samples: 268419420. Policy #0 lag: (min: 0.0, avg: 9.7, max: 21.0)
[2024-06-06 16:48:27,319][24114] Avg episode reward: [(0, '0.297')]
[2024-06-06 16:48:29,286][24347] Updated weights for policy 0, policy_version 48048 (0.0024)
[2024-06-06 16:48:32,320][24114] Fps is (10 sec: 45875.1, 60 sec: 44781.5, 300 sec: 44708.6). Total num frames: 787349504. Throughput: 0: 44324.9. Samples: 268552720. Policy #0 lag: (min: 1.0, avg: 11.4, max: 21.0)
[2024-06-06 16:48:32,321][24114] Avg episode reward: [(0, '0.300')]
[2024-06-06 16:48:33,187][24347] Updated weights for policy 0, policy_version 48058 (0.0030)
[2024-06-06 16:48:36,842][24347] Updated weights for policy 0, policy_version 48068 (0.0028)
[2024-06-06 16:48:37,318][24114] Fps is (10 sec: 44238.9, 60 sec: 44236.9, 300 sec: 44653.5). Total num frames: 787546112. Throughput: 0: 44261.5. Samples: 268818240. Policy #0 lag: (min: 1.0, avg: 11.4, max: 21.0)
[2024-06-06 16:48:37,319][24114] Avg episode reward: [(0, '0.285')]
[2024-06-06 16:48:40,388][24347] Updated weights for policy 0, policy_version 48078 (0.0038)
[2024-06-06 16:48:42,324][24114] Fps is (10 sec: 45856.6, 60 sec: 44778.4, 300 sec: 44708.0). Total num frames: 787808256. Throughput: 0: 44463.5. Samples: 269091520. Policy #0 lag: (min: 1.0, avg: 11.4, max: 21.0)
[2024-06-06 16:48:42,325][24114] Avg episode reward: [(0, '0.297')]
[2024-06-06 16:48:44,341][24347] Updated weights for policy 0, policy_version 48088 (0.0027)
[2024-06-06 16:48:47,320][24114] Fps is (10 sec: 45866.2, 60 sec: 44235.5, 300 sec: 44597.5). Total num frames: 788004864. Throughput: 0: 44827.3. Samples: 269235980. Policy #0 lag: (min: 1.0, avg: 11.4, max: 21.0)
[2024-06-06 16:48:47,320][24114] Avg episode reward: [(0, '0.304')]
[2024-06-06 16:48:47,854][24347] Updated weights for policy 0, policy_version 48098 (0.0042)
[2024-06-06 16:48:51,386][24347] Updated weights for policy 0, policy_version 48108 (0.0039)
[2024-06-06 16:48:52,318][24114] Fps is (10 sec: 44263.1, 60 sec: 44783.0, 300 sec: 44764.4). Total num frames: 788250624. Throughput: 0: 44654.7. Samples: 269495400. Policy #0 lag: (min: 1.0, avg: 11.4, max: 21.0)
[2024-06-06 16:48:52,319][24114] Avg episode reward: [(0, '0.301')]
[2024-06-06 16:48:55,453][24347] Updated weights for policy 0, policy_version 48118 (0.0040)
[2024-06-06 16:48:57,318][24114] Fps is (10 sec: 44245.4, 60 sec: 44236.9, 300 sec: 44653.3). Total num frames: 788447232. Throughput: 0: 44545.8. Samples: 269760600. Policy #0 lag: (min: 1.0, avg: 11.4, max: 21.0)
[2024-06-06 16:48:57,318][24114] Avg episode reward: [(0, '0.292')]
[2024-06-06 16:48:58,735][24347] Updated weights for policy 0, policy_version 48128 (0.0032)
[2024-06-06 16:49:02,318][24114] Fps is (10 sec: 40960.5, 60 sec: 43963.7, 300 sec: 44598.7). Total num frames: 788660224. Throughput: 0: 44514.9. Samples: 269894960. Policy #0 lag: (min: 0.0, avg: 11.5, max: 23.0)
[2024-06-06 16:49:02,318][24114] Avg episode reward: [(0, '0.290')]
[2024-06-06 16:49:02,778][24347] Updated weights for policy 0, policy_version 48138 (0.0032)
[2024-06-06 16:49:04,542][24326] Signal inference workers to stop experience collection... (4000 times)
[2024-06-06 16:49:04,543][24326] Signal inference workers to resume experience collection... (4000 times)
[2024-06-06 16:49:04,585][24347] InferenceWorker_p0-w0: stopping experience collection (4000 times)
[2024-06-06 16:49:04,585][24347] InferenceWorker_p0-w0: resuming experience collection (4000 times)
[2024-06-06 16:49:06,237][24347] Updated weights for policy 0, policy_version 48148 (0.0034)
[2024-06-06 16:49:07,318][24114] Fps is (10 sec: 45875.3, 60 sec: 44509.9, 300 sec: 44764.4). Total num frames: 788905984. Throughput: 0: 44784.6. Samples: 270169040. Policy #0 lag: (min: 0.0, avg: 11.5, max: 23.0)
[2024-06-06 16:49:07,318][24114] Avg episode reward: [(0, '0.294')]
[2024-06-06 16:49:09,775][24347] Updated weights for policy 0, policy_version 48158 (0.0028)
[2024-06-06 16:49:12,318][24114] Fps is (10 sec: 49151.3, 60 sec: 44782.9, 300 sec: 44708.9). Total num frames: 789151744. Throughput: 0: 44620.8. Samples: 270427340. Policy #0 lag: (min: 0.0, avg: 11.5, max: 23.0)
[2024-06-06 16:49:12,319][24114] Avg episode reward: [(0, '0.294')]
[2024-06-06 16:49:13,936][24347] Updated weights for policy 0, policy_version 48168 (0.0042)
[2024-06-06 16:49:17,318][24114] Fps is (10 sec: 42598.2, 60 sec: 44510.0, 300 sec: 44542.3). Total num frames: 789331968. Throughput: 0: 44718.4. Samples: 270564960. Policy #0 lag: (min: 0.0, avg: 11.5, max: 23.0)
[2024-06-06 16:49:17,318][24114] Avg episode reward: [(0, '0.293')]
[2024-06-06 16:49:17,352][24347] Updated weights for policy 0, policy_version 48178 (0.0021)
[2024-06-06 16:49:21,006][24347] Updated weights for policy 0, policy_version 48188 (0.0033)
[2024-06-06 16:49:22,318][24114] Fps is (10 sec: 42599.0, 60 sec: 44784.4, 300 sec: 44708.9). Total num frames: 789577728. Throughput: 0: 44815.1. Samples: 270834920. Policy #0 lag: (min: 0.0, avg: 11.5, max: 23.0)
[2024-06-06 16:49:22,318][24114] Avg episode reward: [(0, '0.283')]
[2024-06-06 16:49:24,780][24347] Updated weights for policy 0, policy_version 48198 (0.0031)
[2024-06-06 16:49:27,318][24114] Fps is (10 sec: 47513.8, 60 sec: 45056.4, 300 sec: 44708.9). Total num frames: 789807104. Throughput: 0: 44464.2. Samples: 271092140. Policy #0 lag: (min: 0.0, avg: 11.5, max: 23.0)
[2024-06-06 16:49:27,318][24114] Avg episode reward: [(0, '0.298')]
[2024-06-06 16:49:28,336][24347] Updated weights for policy 0, policy_version 48208 (0.0022)
[2024-06-06 16:49:32,117][24347] Updated weights for policy 0, policy_version 48218 (0.0034)
[2024-06-06 16:49:32,324][24114] Fps is (10 sec: 42573.5, 60 sec: 44233.9, 300 sec: 44596.9). Total num frames: 790003712. Throughput: 0: 44299.3. Samples: 271229620. Policy #0 lag: (min: 0.0, avg: 8.9, max: 21.0)
[2024-06-06 16:49:32,324][24114] Avg episode reward: [(0, '0.289')]
[2024-06-06 16:49:35,724][24347] Updated weights for policy 0, policy_version 48228 (0.0032)
[2024-06-06 16:49:45,934][27571] Saving configuration to /workspace/metta/train_dir/p2.metta.4/config.json...
[2024-06-06 16:49:45,951][27571] Rollout worker 0 uses device cpu
[2024-06-06 16:49:45,952][27571] Rollout worker 1 uses device cpu
[2024-06-06 16:49:45,952][27571] Rollout worker 2 uses device cpu
[2024-06-06 16:49:45,952][27571] Rollout worker 3 uses device cpu
[2024-06-06 16:49:45,952][27571] Rollout worker 4 uses device cpu
[2024-06-06 16:49:45,953][27571] Rollout worker 5 uses device cpu
[2024-06-06 16:49:45,953][27571] Rollout worker 6 uses device cpu
[2024-06-06 16:49:45,953][27571] Rollout worker 7 uses device cpu
[2024-06-06 16:49:45,953][27571] Rollout worker 8 uses device cpu
[2024-06-06 16:49:45,954][27571] Rollout worker 9 uses device cpu
[2024-06-06 16:49:45,954][27571] Rollout worker 10 uses device cpu
[2024-06-06 16:49:45,954][27571] Rollout worker 11 uses device cpu
[2024-06-06 16:49:45,954][27571] Rollout worker 12 uses device cpu
[2024-06-06 16:49:45,955][27571] Rollout worker 13 uses device cpu
[2024-06-06 16:49:45,955][27571] Rollout worker 14 uses device cpu
[2024-06-06 16:49:45,955][27571] Rollout worker 15 uses device cpu
[2024-06-06 16:49:45,955][27571] Rollout worker 16 uses device cpu
[2024-06-06 16:49:45,956][27571] Rollout worker 17 uses device cpu
[2024-06-06 16:49:45,956][27571] Rollout worker 18 uses device cpu
[2024-06-06 16:49:45,956][27571] Rollout worker 19 uses device cpu
[2024-06-06 16:49:45,957][27571] Rollout worker 20 uses device cpu
[2024-06-06 16:49:45,957][27571] Rollout worker 21 uses device cpu
[2024-06-06 16:49:45,957][27571] Rollout worker 22 uses device cpu
[2024-06-06 16:49:45,957][27571] Rollout worker 23 uses device cpu
[2024-06-06 16:49:45,958][27571] Rollout worker 24 uses device cpu
[2024-06-06 16:49:45,958][27571] Rollout worker 25 uses device cpu
[2024-06-06 16:49:45,958][27571] Rollout worker 26 uses device cpu
[2024-06-06 16:49:45,959][27571] Rollout worker 27 uses device cpu
[2024-06-06 16:49:45,959][27571] Rollout worker 28 uses device cpu
[2024-06-06 16:49:45,959][27571] Rollout worker 29 uses device cpu
[2024-06-06 16:49:45,959][27571] Rollout worker 30 uses device cpu
[2024-06-06 16:49:45,960][27571] Rollout worker 31 uses device cpu
[2024-06-06 16:49:46,499][27571] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-06-06 16:49:46,499][27571] InferenceWorker_p0-w0: min num requests: 10
[2024-06-06 16:49:46,546][27571] Starting all processes...
[2024-06-06 16:49:46,547][27571] Starting process learner_proc0
[2024-06-06 16:49:46,818][27571] Starting all processes...
[2024-06-06 16:49:46,821][27571] Starting process inference_proc0-0
[2024-06-06 16:49:46,822][27571] Starting process rollout_proc0
[2024-06-06 16:49:46,822][27571] Starting process rollout_proc2
[2024-06-06 16:49:46,822][27571] Starting process rollout_proc1
[2024-06-06 16:49:46,823][27571] Starting process rollout_proc3
[2024-06-06 16:49:46,824][27571] Starting process rollout_proc4
[2024-06-06 16:49:46,824][27571] Starting process rollout_proc5
[2024-06-06 16:49:46,824][27571] Starting process rollout_proc6
[2024-06-06 16:49:46,826][27571] Starting process rollout_proc7
[2024-06-06 16:49:46,827][27571] Starting process rollout_proc8
[2024-06-06 16:49:46,828][27571] Starting process rollout_proc9
[2024-06-06 16:49:46,828][27571] Starting process rollout_proc10
[2024-06-06 16:49:46,828][27571] Starting process rollout_proc11
[2024-06-06 16:49:46,830][27571] Starting process rollout_proc12
[2024-06-06 16:49:46,830][27571] Starting process rollout_proc13
[2024-06-06 16:49:46,832][27571] Starting process rollout_proc14
[2024-06-06 16:49:46,832][27571] Starting process rollout_proc15
[2024-06-06 16:49:46,832][27571] Starting process rollout_proc16
[2024-06-06 16:49:46,833][27571] Starting process rollout_proc17
[2024-06-06 16:49:46,834][27571] Starting process rollout_proc18
[2024-06-06 16:49:46,834][27571] Starting process rollout_proc19
[2024-06-06 16:49:46,834][27571] Starting process rollout_proc20
[2024-06-06 16:49:46,834][27571] Starting process rollout_proc21
[2024-06-06 16:49:46,834][27571] Starting process rollout_proc22
[2024-06-06 16:49:46,836][27571] Starting process rollout_proc23
[2024-06-06 16:49:46,838][27571] Starting process rollout_proc24
[2024-06-06 16:49:46,838][27571] Starting process rollout_proc25
[2024-06-06 16:49:46,842][27571] Starting process rollout_proc26
[2024-06-06 16:49:46,842][27571] Starting process rollout_proc27
[2024-06-06 16:49:46,843][27571] Starting process rollout_proc28
[2024-06-06 16:49:46,843][27571] Starting process rollout_proc29
[2024-06-06 16:49:46,845][27571] Starting process rollout_proc30
[2024-06-06 16:49:46,849][27571] Starting process rollout_proc31
[2024-06-06 16:49:48,732][27830] Worker 28 uses CPU cores [28]
[2024-06-06 16:49:49,046][27783] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-06-06 16:49:49,046][27783] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2024-06-06 16:49:49,058][27783] Num visible devices: 1
[2024-06-06 16:49:49,072][27783] Setting fixed seed 0
[2024-06-06 16:49:49,074][27783] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-06-06 16:49:49,074][27783] Initializing actor-critic model on device cuda:0
[2024-06-06 16:49:49,096][27820] Worker 17 uses CPU cores [17]
[2024-06-06 16:49:49,102][27829] Worker 27 uses CPU cores [27]
[2024-06-06 16:49:49,102][27812] Worker 8 uses CPU cores [8]
[2024-06-06 16:49:49,103][27805] Worker 2 uses CPU cores [2]
[2024-06-06 16:49:49,104][27818] Worker 25 uses CPU cores [25]
[2024-06-06 16:49:49,111][27810] Worker 4 uses CPU cores [4]
[2024-06-06 16:49:49,120][27828] Worker 22 uses CPU cores [22]
[2024-06-06 16:49:49,124][27819] Worker 15 uses CPU cores [15]
[2024-06-06 16:49:49,139][27821] Worker 16 uses CPU cores [16]
[2024-06-06 16:49:49,156][27826] Worker 24 uses CPU cores [24]
[2024-06-06 16:49:49,168][27804] Worker 0 uses CPU cores [0]
[2024-06-06 16:49:49,212][27817] Worker 14 uses CPU cores [14]
[2024-06-06 16:49:49,216][27834] Worker 13 uses CPU cores [13]
[2024-06-06 16:49:49,224][27832] Worker 29 uses CPU cores [29]
[2024-06-06 16:49:49,243][27827] Worker 21 uses CPU cores [21]
[2024-06-06 16:49:49,244][27825] Worker 23 uses CPU cores [23]
[2024-06-06 16:49:49,262][27811] Worker 9 uses CPU cores [9]
[2024-06-06 16:49:49,268][27803] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-06-06 16:49:49,268][27803] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2024-06-06 16:49:49,277][27803] Num visible devices: 1
[2024-06-06 16:49:49,316][27814] Worker 11 uses CPU cores [11]
[2024-06-06 16:49:49,319][27807] Worker 6 uses CPU cores [6]
[2024-06-06 16:49:49,336][27806] Worker 1 uses CPU cores [1]
[2024-06-06 16:49:49,347][27815] Worker 10 uses CPU cores [10]
[2024-06-06 16:49:49,360][27816] Worker 12 uses CPU cores [12]
[2024-06-06 16:49:49,371][27822] Worker 18 uses CPU cores [18]
[2024-06-06 16:49:49,408][27808] Worker 3 uses CPU cores [3]
[2024-06-06 16:49:49,409][27823] Worker 19 uses CPU cores [19]
[2024-06-06 16:49:49,424][27833] Worker 30 uses CPU cores [30]
[2024-06-06 16:49:49,430][27813] Worker 7 uses CPU cores [7]
[2024-06-06 16:49:49,438][27831] Worker 26 uses CPU cores [26]
[2024-06-06 16:49:49,460][27835] Worker 31 uses CPU cores [31]
[2024-06-06 16:49:49,471][27824] Worker 20 uses CPU cores [20]
[2024-06-06 16:49:49,477][27809] Worker 5 uses CPU cores [5]
[2024-06-06 16:49:49,974][27783] RunningMeanStd input shape: (11, 11)
[2024-06-06 16:49:49,975][27783] RunningMeanStd input shape: (11, 11)
[2024-06-06 16:49:49,975][27783] RunningMeanStd input shape: (11, 11)
[2024-06-06 16:49:49,975][27783] RunningMeanStd input shape: (11, 11)
[2024-06-06 16:49:49,975][27783] RunningMeanStd input shape: (11, 11)
[2024-06-06 16:49:49,975][27783] RunningMeanStd input shape: (11, 11)
[2024-06-06 16:49:49,975][27783] RunningMeanStd input shape: (11, 11)
[2024-06-06 16:49:49,975][27783] RunningMeanStd input shape: (11, 11)
[2024-06-06 16:49:49,975][27783] RunningMeanStd input shape: (11, 11)
[2024-06-06 16:49:49,975][27783] RunningMeanStd input shape: (11, 11)
[2024-06-06 16:49:49,975][27783] RunningMeanStd input shape: (11, 11)
[2024-06-06 16:49:49,975][27783] RunningMeanStd input shape: (11, 11)
[2024-06-06 16:49:49,975][27783] RunningMeanStd input shape: (11, 11)
[2024-06-06 16:49:49,975][27783] RunningMeanStd input shape: (11, 11)
[2024-06-06 16:49:49,975][27783] RunningMeanStd input shape: (11, 11)
[2024-06-06 16:49:49,975][27783] RunningMeanStd input shape: (11, 11)
[2024-06-06 16:49:49,975][27783] RunningMeanStd input shape: (11, 11)
[2024-06-06 16:49:49,975][27783] RunningMeanStd input shape: (11, 11)
[2024-06-06 16:49:49,975][27783] RunningMeanStd input shape: (11, 11)
[2024-06-06 16:49:49,975][27783] RunningMeanStd input shape: (11, 11)
[2024-06-06 16:49:49,975][27783] RunningMeanStd input shape: (11, 11)
[2024-06-06 16:49:49,975][27783] RunningMeanStd input shape: (11, 11)
[2024-06-06 16:49:49,979][27783] RunningMeanStd input shape: (1,)
[2024-06-06 16:49:49,979][27783] RunningMeanStd input shape: (1,)
[2024-06-06 16:49:49,979][27783] RunningMeanStd input shape: (1,)
[2024-06-06 16:49:49,979][27783] RunningMeanStd input shape: (1,)
[2024-06-06 16:49:50,019][27783] RunningMeanStd input shape: (1,)
[2024-06-06 16:49:50,024][27783] Created Actor Critic model with architecture:
[2024-06-06 16:49:50,024][27783] SampleFactoryAgentWrapper(
  (obs_normalizer): ObservationNormalizer()
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (agent): MettaAgent(
    (_encoder): MultiFeatureSetEncoder(
      (feature_set_encoders): ModuleDict(
        (grid_obs): FeatureSetEncoder(
          (_normalizer): FeatureListNormalizer(
            (_norms_dict): ModuleDict(
              (agent): RunningMeanStdInPlace()
              (altar): RunningMeanStdInPlace()
              (converter): RunningMeanStdInPlace()
              (generator): RunningMeanStdInPlace()
              (wall): RunningMeanStdInPlace()
              (agent:dir): RunningMeanStdInPlace()
              (agent:energy): RunningMeanStdInPlace()
              (agent:frozen): RunningMeanStdInPlace()
              (agent:hp): RunningMeanStdInPlace()
              (agent:id): RunningMeanStdInPlace()
              (agent:inv_r1): RunningMeanStdInPlace()
              (agent:inv_r2): RunningMeanStdInPlace()
              (agent:inv_r3): RunningMeanStdInPlace()
              (agent:shield): RunningMeanStdInPlace()
              (altar:hp): RunningMeanStdInPlace()
              (altar:state): RunningMeanStdInPlace()
              (converter:hp): RunningMeanStdInPlace()
              (converter:state): RunningMeanStdInPlace()
              (generator:amount): RunningMeanStdInPlace()
              (generator:hp): RunningMeanStdInPlace()
              (generator:state): RunningMeanStdInPlace()
              (wall:hp): RunningMeanStdInPlace()
            )
          )
          (embedding_net): Sequential(
            (0): Linear(in_features=125, out_features=512, bias=True)
            (1): ELU(alpha=1.0)
            (2): Linear(in_features=512, out_features=512, bias=True)
            (3): ELU(alpha=1.0)
            (4): Linear(in_features=512, out_features=512, bias=True)
            (5): ELU(alpha=1.0)
            (6): Linear(in_features=512, out_features=512, bias=True)
            (7): ELU(alpha=1.0)
          )
        )
        (global_vars): FeatureSetEncoder(
          (_normalizer): FeatureListNormalizer(
            (_norms_dict): ModuleDict(
              (_steps): RunningMeanStdInPlace()
            )
          )
          (embedding_net): Sequential(
            (0): Linear(in_features=5, out_features=8, bias=True)
            (1): ELU(alpha=1.0)
            (2): Linear(in_features=8, out_features=8, bias=True)
            (3): ELU(alpha=1.0)
          )
        )
        (last_action): FeatureSetEncoder(
          (_normalizer): FeatureListNormalizer(
            (_norms_dict): ModuleDict(
              (last_action_id): RunningMeanStdInPlace()
              (last_action_val): RunningMeanStdInPlace()
            )
          )
          (embedding_net): Sequential(
            (0): Linear(in_features=5, out_features=8, bias=True)
            (1): ELU(alpha=1.0)
            (2): Linear(in_features=8, out_features=8, bias=True)
            (3): ELU(alpha=1.0)
          )
        )
        (last_reward): FeatureSetEncoder(
          (_normalizer): FeatureListNormalizer(
            (_norms_dict): ModuleDict(
              (last_reward): RunningMeanStdInPlace()
            )
          )
          (embedding_net): Sequential(
            (0): Linear(in_features=5, out_features=8, bias=True)
            (1): ELU(alpha=1.0)
            (2): Linear(in_features=8, out_features=8, bias=True)
            (3): ELU(alpha=1.0)
          )
        )
      )
      (merged_encoder): Sequential(
        (0): Linear(in_features=536, out_features=512, bias=True)
        (1): ELU(alpha=1.0)
        (2): Linear(in_features=512, out_features=512, bias=True)
        (3): ELU(alpha=1.0)
        (4): Linear(in_features=512, out_features=512, bias=True)
        (5): ELU(alpha=1.0)
      )
    )
    (_core): ModelCoreRNN(
      (core): GRU(512, 512)
    )
    (_decoder): Decoder(
      (mlp): Identity()
    )
    (_critic_linear): Linear(in_features=512, out_features=1, bias=True)
    (_action_parameterization): ActionParameterizationDefault(
      (distribution_linear): Linear(in_features=512, out_features=16, bias=True)
    )
  )
)
[2024-06-06 16:49:50,092][27783] Using optimizer
[2024-06-06 16:49:50,282][27783] Loading state from checkpoint /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000048028_786890752.pth...
[2024-06-06 16:49:50,297][27783] Loading model from checkpoint
[2024-06-06 16:49:50,298][27783] Loaded experiment state at self.train_step=48028, self.env_steps=786890752
[2024-06-06 16:49:50,298][27783] Initialized policy 0 weights for model version 48028
[2024-06-06 16:49:50,300][27783] LearnerWorker_p0 finished initialization!
[2024-06-06 16:49:50,300][27783] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-06-06 16:49:51,035][27803] RunningMeanStd input shape: (11, 11)
[2024-06-06 16:49:51,035][27803] RunningMeanStd input shape: (11, 11)
[2024-06-06 16:49:51,035][27803] RunningMeanStd input shape: (11, 11)
[2024-06-06 16:49:51,035][27803] RunningMeanStd input shape: (11, 11)
[2024-06-06 16:49:51,035][27803] RunningMeanStd input shape: (11, 11)
[2024-06-06 16:49:51,035][27803] RunningMeanStd input shape: (11, 11)
[2024-06-06 16:49:51,035][27803] RunningMeanStd input shape: (11, 11)
[2024-06-06 16:49:51,035][27803] RunningMeanStd input shape: (11, 11)
[2024-06-06 16:49:51,035][27803] RunningMeanStd input shape: (11, 11)
[2024-06-06 16:49:51,035][27803] RunningMeanStd input shape: (11, 11)
[2024-06-06 16:49:51,035][27803] RunningMeanStd input shape: (11, 11)
[2024-06-06 16:49:51,035][27803] RunningMeanStd input shape: (11, 11)
[2024-06-06 16:49:51,035][27803] RunningMeanStd input shape: (11, 11)
[2024-06-06 16:49:51,035][27803] RunningMeanStd input shape: (11, 11)
[2024-06-06 16:49:51,036][27803] RunningMeanStd input shape: (11, 11)
[2024-06-06 16:49:51,036][27803] RunningMeanStd input shape: (11, 11)
[2024-06-06 16:49:51,036][27803] RunningMeanStd input shape: (11, 11)
[2024-06-06 16:49:51,036][27803] RunningMeanStd input shape: (11, 11)
[2024-06-06 16:49:51,036][27803] RunningMeanStd input shape: (11, 11)
[2024-06-06 16:49:51,036][27803] RunningMeanStd input shape: (11, 11)
[2024-06-06 16:49:51,036][27803] RunningMeanStd input shape: (11, 11)
[2024-06-06 16:49:51,036][27803] RunningMeanStd input shape: (11, 11)
[2024-06-06 16:49:51,039][27803] RunningMeanStd input shape: (1,)
[2024-06-06 16:49:51,039][27803] RunningMeanStd input shape: (1,)
[2024-06-06 16:49:51,039][27803] RunningMeanStd input shape: (1,)
[2024-06-06 16:49:51,040][27803] RunningMeanStd input shape: (1,)
[2024-06-06 16:49:51,079][27803] RunningMeanStd input shape: (1,)
[2024-06-06 16:49:51,101][27571] Inference worker 0-0 is ready!
[2024-06-06 16:49:51,102][27571] All inference workers are ready! Signal rollout workers to start!
[2024-06-06 16:49:53,656][27571] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 786890752. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-06-06 16:49:53,660][27825] Decorrelating experience for 0 frames...
[2024-06-06 16:49:53,663][27820] Decorrelating experience for 0 frames...
[2024-06-06 16:49:53,663][27826] Decorrelating experience for 0 frames...
[2024-06-06 16:49:53,666][27824] Decorrelating experience for 0 frames...
[2024-06-06 16:49:53,668][27835] Decorrelating experience for 0 frames...
[2024-06-06 16:49:53,672][27821] Decorrelating experience for 0 frames...
[2024-06-06 16:49:53,680][27828] Decorrelating experience for 0 frames...
[2024-06-06 16:49:53,680][27827] Decorrelating experience for 0 frames...
[2024-06-06 16:49:53,683][27831] Decorrelating experience for 0 frames...
[2024-06-06 16:49:53,686][27833] Decorrelating experience for 0 frames...
[2024-06-06 16:49:53,686][27818] Decorrelating experience for 0 frames...
[2024-06-06 16:49:53,711][27830] Decorrelating experience for 0 frames...
[2024-06-06 16:49:53,712][27832] Decorrelating experience for 0 frames...
[2024-06-06 16:49:53,715][27822] Decorrelating experience for 0 frames...
[2024-06-06 16:49:53,741][27809] Decorrelating experience for 0 frames...
[2024-06-06 16:49:53,746][27806] Decorrelating experience for 0 frames...
[2024-06-06 16:49:53,747][27808] Decorrelating experience for 0 frames...
[2024-06-06 16:49:53,749][27811] Decorrelating experience for 0 frames...
[2024-06-06 16:49:53,751][27819] Decorrelating experience for 0 frames...
[2024-06-06 16:49:53,754][27834] Decorrelating experience for 0 frames...
[2024-06-06 16:49:53,757][27814] Decorrelating experience for 0 frames...
[2024-06-06 16:49:53,758][27810] Decorrelating experience for 0 frames...
[2024-06-06 16:49:53,758][27813] Decorrelating experience for 0 frames...
[2024-06-06 16:49:53,758][27805] Decorrelating experience for 0 frames...
[2024-06-06 16:49:53,762][27823] Decorrelating experience for 0 frames...
[2024-06-06 16:49:53,762][27812] Decorrelating experience for 0 frames...
[2024-06-06 16:49:53,763][27816] Decorrelating experience for 0 frames...
[2024-06-06 16:49:53,764][27804] Decorrelating experience for 0 frames...
[2024-06-06 16:49:53,765][27815] Decorrelating experience for 0 frames...
[2024-06-06 16:49:53,771][27807] Decorrelating experience for 0 frames...
[2024-06-06 16:49:53,772][27817] Decorrelating experience for 0 frames...
[2024-06-06 16:49:53,806][27829] Decorrelating experience for 0 frames...
[2024-06-06 16:49:55,140][27820] Decorrelating experience for 256 frames...
[2024-06-06 16:49:55,150][27825] Decorrelating experience for 256 frames...
[2024-06-06 16:49:55,157][27826] Decorrelating experience for 256 frames...
[2024-06-06 16:49:55,166][27824] Decorrelating experience for 256 frames...
[2024-06-06 16:49:55,173][27821] Decorrelating experience for 256 frames...
[2024-06-06 16:49:55,197][27835] Decorrelating experience for 256 frames...
[2024-06-06 16:49:55,208][27827] Decorrelating experience for 256 frames...
[2024-06-06 16:49:55,215][27831] Decorrelating experience for 256 frames...
[2024-06-06 16:49:55,220][27828] Decorrelating experience for 256 frames...
[2024-06-06 16:49:55,221][27818] Decorrelating experience for 256 frames...
[2024-06-06 16:49:55,226][27833] Decorrelating experience for 256 frames...
[2024-06-06 16:49:55,262][27822] Decorrelating experience for 256 frames...
[2024-06-06 16:49:55,279][27809] Decorrelating experience for 256 frames...
[2024-06-06 16:49:55,281][27832] Decorrelating experience for 256 frames...
[2024-06-06 16:49:55,300][27808] Decorrelating experience for 256 frames...
[2024-06-06 16:49:55,300][27811] Decorrelating experience for 256 frames...
[2024-06-06 16:49:55,305][27806] Decorrelating experience for 256 frames...
[2024-06-06 16:49:55,309][27834] Decorrelating experience for 256 frames...
[2024-06-06 16:49:55,311][27819] Decorrelating experience for 256 frames...
[2024-06-06 16:49:55,315][27813] Decorrelating experience for 256 frames...
[2024-06-06 16:49:55,322][27810] Decorrelating experience for 256 frames...
[2024-06-06 16:49:55,324][27814] Decorrelating experience for 256 frames...
[2024-06-06 16:49:55,324][27830] Decorrelating experience for 256 frames...
[2024-06-06 16:49:55,328][27805] Decorrelating experience for 256 frames...
[2024-06-06 16:49:55,330][27804] Decorrelating experience for 256 frames...
[2024-06-06 16:49:55,331][27816] Decorrelating experience for 256 frames...
[2024-06-06 16:49:55,331][27812] Decorrelating experience for 256 frames...
[2024-06-06 16:49:55,334][27815] Decorrelating experience for 256 frames...
[2024-06-06 16:49:55,341][27807] Decorrelating experience for 256 frames...
[2024-06-06 16:49:55,344][27817] Decorrelating experience for 256 frames...
[2024-06-06 16:49:55,368][27823] Decorrelating experience for 256 frames...
[2024-06-06 16:49:55,388][27829] Decorrelating experience for 256 frames...
[2024-06-06 16:49:58,656][27571] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 786890752. Throughput: 0: 6783.8. Samples: 33920. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-06-06 16:50:01,836][27812] Worker 8, sleep for 37.500 sec to decorrelate experience collection
[2024-06-06 16:50:01,876][27817] Worker 14, sleep for 65.625 sec to decorrelate experience collection
[2024-06-06 16:50:01,876][27815] Worker 10, sleep for 46.875 sec to decorrelate experience collection
[2024-06-06 16:50:01,885][27826] Worker 24, sleep for 112.500 sec to decorrelate experience collection
[2024-06-06 16:50:01,885][27824] Worker 20, sleep for 93.750 sec to decorrelate experience collection
[2024-06-06 16:50:01,892][27811] Worker 9, sleep for 42.188 sec to decorrelate experience collection
[2024-06-06 16:50:01,893][27814] Worker 11, sleep for 51.562 sec to decorrelate experience collection
[2024-06-06 16:50:01,899][27819] Worker 15, sleep for 70.312 sec to decorrelate experience collection
[2024-06-06 16:50:01,900][27834] Worker 13, sleep for 60.938 sec to decorrelate experience collection
[2024-06-06 16:50:01,901][27828] Worker 22, sleep for 103.125 sec to decorrelate experience collection
[2024-06-06 16:50:01,906][27816] Worker 12, sleep for 56.250 sec to decorrelate experience collection
[2024-06-06 16:50:01,908][27808] Worker 3, sleep for 14.062 sec to decorrelate experience collection
[2024-06-06 16:50:01,908][27818] Worker 25, sleep for 117.188 sec to decorrelate experience collection
[2024-06-06 16:50:01,909][27833] Worker 30, sleep for 140.625 sec to decorrelate experience collection
[2024-06-06 16:50:01,909][27822] Worker 18, sleep for 84.375 sec to decorrelate experience collection
[2024-06-06 16:50:01,909][27835] Worker 31, sleep for 145.312 sec to decorrelate experience collection
[2024-06-06 16:50:01,917][27827] Worker 21, sleep for 98.438 sec to decorrelate experience collection
[2024-06-06 16:50:01,918][27821] Worker 16, sleep for 75.000 sec to decorrelate experience collection
[2024-06-06 16:50:01,919][27805] Worker 2, sleep for 9.375 sec to decorrelate experience collection
[2024-06-06 16:50:01,921][27830] Worker 28, sleep for 131.250 sec to decorrelate experience collection
[2024-06-06 16:50:01,921][27831] Worker 26, sleep for 121.875 sec to decorrelate experience collection
[2024-06-06 16:50:01,923][27832] Worker 29, sleep for 135.938 sec to decorrelate experience collection
[2024-06-06 16:50:01,926][27806] Worker 1, sleep for 4.688 sec to decorrelate experience collection
[2024-06-06 16:50:01,931][27825] Worker 23, sleep for 107.812 sec to decorrelate experience collection
[2024-06-06 16:50:01,939][27809] Worker 5, sleep for 23.438 sec to decorrelate experience collection
[2024-06-06 16:50:01,939][27807] Worker 6, sleep for 28.125 sec to decorrelate experience collection
[2024-06-06 16:50:01,939][27813] Worker 7, sleep for 32.812 sec to decorrelate experience collection
[2024-06-06 16:50:01,940][27820] Worker 17, sleep for 79.688 sec to decorrelate experience collection
[2024-06-06 16:50:01,978][27783] Signal inference workers to stop experience collection...
[2024-06-06 16:50:01,986][27823] Worker 19, sleep for 89.062 sec to decorrelate experience collection
[2024-06-06 16:50:02,037][27803] InferenceWorker_p0-w0: stopping experience collection
[2024-06-06 16:50:02,075][27829] Worker 27, sleep for 126.562 sec to decorrelate experience collection
[2024-06-06 16:50:02,502][27783] Signal inference workers to resume experience collection...
[2024-06-06 16:50:02,502][27803] InferenceWorker_p0-w0: resuming experience collection
[2024-06-06 16:50:02,833][27810] Worker 4, sleep for 18.750 sec to decorrelate experience collection
[2024-06-06 16:50:03,625][27803] Updated weights for policy 0, policy_version 48038 (0.0012)
[2024-06-06 16:50:03,656][27571] Fps is (10 sec: 16383.6, 60 sec: 16383.6, 300 sec: 16383.6). Total num frames: 787054592. Throughput: 0: 32829.2. Samples: 328300. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2024-06-06 16:50:06,495][27571] Heartbeat connected on Batcher_0
[2024-06-06 16:50:06,497][27571] Heartbeat connected on LearnerWorker_p0
[2024-06-06 16:50:06,503][27571] Heartbeat connected on RolloutWorker_w0
[2024-06-06 16:50:06,558][27571] Heartbeat connected on InferenceWorker_p0-w0
[2024-06-06 16:50:06,636][27806] Worker 1 awakens!
[2024-06-06 16:50:06,649][27571] Heartbeat connected on RolloutWorker_w1
[2024-06-06 16:50:08,656][27571] Fps is (10 sec: 16384.4, 60 sec: 10922.7, 300 sec: 10922.7). Total num frames: 787054592. Throughput: 0: 22100.1. Samples: 331500. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2024-06-06 16:50:11,335][27805] Worker 2 awakens!
[2024-06-06 16:50:11,344][27571] Heartbeat connected on RolloutWorker_w2
[2024-06-06 16:50:13,656][27571] Fps is (10 sec: 1638.5, 60 sec: 9011.3, 300 sec: 9011.3). Total num frames: 787070976. Throughput: 0: 17303.2. Samples: 346060. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2024-06-06 16:50:16,041][27808] Worker 3 awakens!
[2024-06-06 16:50:16,053][27571] Heartbeat connected on RolloutWorker_w3
[2024-06-06 16:50:18,656][27571] Fps is (10 sec: 3276.8, 60 sec: 7864.3, 300 sec: 7864.3). Total num frames: 787087360. Throughput: 0: 14812.8. Samples: 370320. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2024-06-06 16:50:21,676][27810] Worker 4 awakens!
[2024-06-06 16:50:21,680][27571] Heartbeat connected on RolloutWorker_w4
[2024-06-06 16:50:23,656][27571] Fps is (10 sec: 6553.6, 60 sec: 8192.1, 300 sec: 8192.1). Total num frames: 787136512. Throughput: 0: 12812.1. Samples: 384360. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2024-06-06 16:50:23,656][27571] Avg episode reward: [(0, '0.197')]
[2024-06-06 16:50:25,478][27809] Worker 5 awakens!
[2024-06-06 16:50:25,484][27571] Heartbeat connected on RolloutWorker_w5
[2024-06-06 16:50:28,656][27571] Fps is (10 sec: 9830.6, 60 sec: 8426.1, 300 sec: 8426.1). Total num frames: 787185664. Throughput: 0: 12977.8. Samples: 454220. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2024-06-06 16:50:28,656][27571] Avg episode reward: [(0, '0.214')]
[2024-06-06 16:50:29,845][27803] Updated weights for policy 0, policy_version 48048 (0.0015)
[2024-06-06 16:50:30,168][27807] Worker 6 awakens!
[2024-06-06 16:50:30,173][27571] Heartbeat connected on RolloutWorker_w6
[2024-06-06 16:50:33,656][27571] Fps is (10 sec: 16384.0, 60 sec: 10240.1, 300 sec: 10240.1). Total num frames: 787300352. Throughput: 0: 13956.6. Samples: 558260. Policy #0 lag: (min: 0.0, avg: 2.2, max: 5.0)
[2024-06-06 16:50:33,656][27571] Avg episode reward: [(0, '0.230')]
[2024-06-06 16:50:34,852][27813] Worker 7 awakens!
[2024-06-06 16:50:34,857][27571] Heartbeat connected on RolloutWorker_w7
[2024-06-06 16:50:37,175][27803] Updated weights for policy 0, policy_version 48058 (0.0011)
[2024-06-06 16:50:38,656][27571] Fps is (10 sec: 21299.2, 60 sec: 11286.8, 300 sec: 11286.8). Total num frames: 787398656. Throughput: 0: 13888.1. Samples: 624960. Policy #0 lag: (min: 0.0, avg: 2.2, max: 5.0)
[2024-06-06 16:50:38,656][27571] Avg episode reward: [(0, '0.235')]
[2024-06-06 16:50:39,436][27812] Worker 8 awakens!
[2024-06-06 16:50:39,440][27571] Heartbeat connected on RolloutWorker_w8
[2024-06-06 16:50:43,656][27571] Fps is (10 sec: 19660.8, 60 sec: 12124.2, 300 sec: 12124.2). Total num frames: 787496960. Throughput: 0: 16065.0. Samples: 756840. Policy #0 lag: (min: 0.0, avg: 2.2, max: 5.0)
[2024-06-06 16:50:43,656][27571] Avg episode reward: [(0, '0.258')]
[2024-06-06 16:50:44,180][27811] Worker 9 awakens!
[2024-06-06 16:50:44,186][27571] Heartbeat connected on RolloutWorker_w9
[2024-06-06 16:50:44,725][27803] Updated weights for policy 0, policy_version 48068 (0.0012)
[2024-06-06 16:50:48,656][27571] Fps is (10 sec: 24575.8, 60 sec: 13703.0, 300 sec: 13703.0). Total num frames: 787644416. Throughput: 0: 12956.6. Samples: 911340. Policy #0 lag: (min: 0.0, avg: 2.2, max: 5.0)
[2024-06-06 16:50:48,656][27571] Avg episode reward: [(0, '0.274')]
[2024-06-06 16:50:48,848][27815] Worker 10 awakens!
[2024-06-06 16:50:48,854][27571] Heartbeat connected on RolloutWorker_w10
[2024-06-06 16:50:50,442][27803] Updated weights for policy 0, policy_version 48078 (0.0015)
[2024-06-06 16:50:53,522][27814] Worker 11 awakens!
[2024-06-06 16:50:53,530][27571] Heartbeat connected on RolloutWorker_w11
[2024-06-06 16:50:53,656][27571] Fps is (10 sec: 29491.3, 60 sec: 15018.7, 300 sec: 15018.7). Total num frames: 787791872. Throughput: 0: 14960.1. Samples: 1004700. Policy #0 lag: (min: 0.0, avg: 2.2, max: 5.0)
[2024-06-06 16:50:53,656][27571] Avg episode reward: [(0, '0.288')]
[2024-06-06 16:50:55,943][27803] Updated weights for policy 0, policy_version 48088 (0.0014)
[2024-06-06 16:50:58,257][27816] Worker 12 awakens!
[2024-06-06 16:50:58,263][27571] Heartbeat connected on RolloutWorker_w12
[2024-06-06 16:50:58,656][27571] Fps is (10 sec: 31129.5, 60 sec: 17749.4, 300 sec: 16384.0). Total num frames: 787955712. Throughput: 0: 18770.2. Samples: 1190720. Policy #0 lag: (min: 0.0, avg: 2.2, max: 5.0)
[2024-06-06 16:50:58,656][27571] Avg episode reward: [(0, '0.288')]
[2024-06-06 16:51:00,735][27803] Updated weights for policy 0, policy_version 48098 (0.0014)
[2024-06-06 16:51:02,872][27834] Worker 13 awakens!
[2024-06-06 16:51:02,880][27571] Heartbeat connected on RolloutWorker_w13
[2024-06-06 16:51:03,656][27571] Fps is (10 sec: 32767.7, 60 sec: 17749.5, 300 sec: 17554.3). Total num frames: 788119552. Throughput: 0: 22733.0. Samples: 1393300. Policy #0 lag: (min: 0.0, avg: 25.9, max: 74.0)
[2024-06-06 16:51:03,656][27571] Avg episode reward: [(0, '0.313')]
[2024-06-06 16:51:03,674][27783] Saving new best policy, reward=0.313!
[2024-06-06 16:51:05,304][27803] Updated weights for policy 0, policy_version 48108 (0.0022)
[2024-06-06 16:51:07,601][27817] Worker 14 awakens!
[2024-06-06 16:51:07,610][27571] Heartbeat connected on RolloutWorker_w14 [2024-06-06 16:51:08,656][27571] Fps is (10 sec: 34406.6, 60 sec: 20753.1, 300 sec: 18787.0). Total num frames: 788299776. Throughput: 0: 24808.0. Samples: 1500720. Policy #0 lag: (min: 0.0, avg: 25.9, max: 74.0) [2024-06-06 16:51:08,656][27571] Avg episode reward: [(0, '0.304')] [2024-06-06 16:51:10,023][27803] Updated weights for policy 0, policy_version 48118 (0.0019) [2024-06-06 16:51:12,311][27819] Worker 15 awakens! [2024-06-06 16:51:12,319][27571] Heartbeat connected on RolloutWorker_w15 [2024-06-06 16:51:13,656][27571] Fps is (10 sec: 36044.2, 60 sec: 23483.7, 300 sec: 19865.6). Total num frames: 788480000. Throughput: 0: 28015.0. Samples: 1714900. Policy #0 lag: (min: 0.0, avg: 25.9, max: 74.0) [2024-06-06 16:51:13,657][27571] Avg episode reward: [(0, '0.322')] [2024-06-06 16:51:13,666][27783] Saving new best policy, reward=0.322! [2024-06-06 16:51:14,229][27803] Updated weights for policy 0, policy_version 48128 (0.0018) [2024-06-06 16:51:17,018][27821] Worker 16 awakens! [2024-06-06 16:51:17,028][27571] Heartbeat connected on RolloutWorker_w16 [2024-06-06 16:51:18,656][27571] Fps is (10 sec: 37682.7, 60 sec: 26487.5, 300 sec: 21010.1). Total num frames: 788676608. Throughput: 0: 30219.9. Samples: 1918160. Policy #0 lag: (min: 0.0, avg: 25.9, max: 74.0) [2024-06-06 16:51:18,656][27571] Avg episode reward: [(0, '0.328')] [2024-06-06 16:51:18,657][27783] Saving new best policy, reward=0.328! [2024-06-06 16:51:19,216][27803] Updated weights for policy 0, policy_version 48138 (0.0021) [2024-06-06 16:51:21,727][27820] Worker 17 awakens! [2024-06-06 16:51:21,738][27571] Heartbeat connected on RolloutWorker_w17 [2024-06-06 16:51:23,401][27803] Updated weights for policy 0, policy_version 48148 (0.0026) [2024-06-06 16:51:23,656][27571] Fps is (10 sec: 37683.4, 60 sec: 28671.9, 300 sec: 21845.4). Total num frames: 788856832. Throughput: 0: 31316.3. Samples: 2034200. 
Policy #0 lag: (min: 0.0, avg: 25.9, max: 74.0) [2024-06-06 16:51:23,656][27571] Avg episode reward: [(0, '0.308')] [2024-06-06 16:51:26,384][27822] Worker 18 awakens! [2024-06-06 16:51:26,394][27571] Heartbeat connected on RolloutWorker_w18 [2024-06-06 16:51:27,910][27803] Updated weights for policy 0, policy_version 48158 (0.0022) [2024-06-06 16:51:28,656][27571] Fps is (10 sec: 34406.9, 60 sec: 30583.4, 300 sec: 22420.3). Total num frames: 789020672. Throughput: 0: 33222.6. Samples: 2251860. Policy #0 lag: (min: 0.0, avg: 25.9, max: 74.0) [2024-06-06 16:51:28,656][27571] Avg episode reward: [(0, '0.308')] [2024-06-06 16:51:31,150][27823] Worker 19 awakens! [2024-06-06 16:51:31,161][27571] Heartbeat connected on RolloutWorker_w19 [2024-06-06 16:51:32,521][27803] Updated weights for policy 0, policy_version 48168 (0.0042) [2024-06-06 16:51:33,656][27571] Fps is (10 sec: 37683.2, 60 sec: 32221.8, 300 sec: 23429.1). Total num frames: 789233664. Throughput: 0: 34752.8. Samples: 2475220. Policy #0 lag: (min: 0.0, avg: 8.3, max: 16.0) [2024-06-06 16:51:33,656][27571] Avg episode reward: [(0, '0.309')] [2024-06-06 16:51:35,740][27824] Worker 20 awakens! [2024-06-06 16:51:35,751][27571] Heartbeat connected on RolloutWorker_w20 [2024-06-06 16:51:37,146][27803] Updated weights for policy 0, policy_version 48178 (0.0030) [2024-06-06 16:51:38,656][27571] Fps is (10 sec: 40959.6, 60 sec: 33860.2, 300 sec: 24185.9). Total num frames: 789430272. Throughput: 0: 35523.0. Samples: 2603240. Policy #0 lag: (min: 0.0, avg: 8.3, max: 16.0) [2024-06-06 16:51:38,656][27571] Avg episode reward: [(0, '0.318')] [2024-06-06 16:51:40,455][27827] Worker 21 awakens! [2024-06-06 16:51:40,467][27571] Heartbeat connected on RolloutWorker_w21 [2024-06-06 16:51:41,055][27803] Updated weights for policy 0, policy_version 48188 (0.0030) [2024-06-06 16:51:43,656][27571] Fps is (10 sec: 37683.4, 60 sec: 35225.5, 300 sec: 24725.0). Total num frames: 789610496. Throughput: 0: 36688.5. Samples: 2841700. 
Policy #0 lag: (min: 0.0, avg: 8.3, max: 16.0) [2024-06-06 16:51:43,656][27571] Avg episode reward: [(0, '0.322')] [2024-06-06 16:51:43,695][27783] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000048195_789626880.pth... [2024-06-06 16:51:43,759][27783] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000047702_781549568.pth [2024-06-06 16:51:44,495][27803] Updated weights for policy 0, policy_version 48198 (0.0025) [2024-06-06 16:51:45,127][27828] Worker 22 awakens! [2024-06-06 16:51:45,139][27571] Heartbeat connected on RolloutWorker_w22 [2024-06-06 16:51:48,656][27571] Fps is (10 sec: 37683.5, 60 sec: 36044.8, 300 sec: 25359.6). Total num frames: 789807104. Throughput: 0: 37698.2. Samples: 3089720. Policy #0 lag: (min: 0.0, avg: 8.3, max: 16.0) [2024-06-06 16:51:48,656][27571] Avg episode reward: [(0, '0.328')] [2024-06-06 16:51:49,112][27803] Updated weights for policy 0, policy_version 48208 (0.0027) [2024-06-06 16:51:49,844][27825] Worker 23 awakens! [2024-06-06 16:51:49,856][27571] Heartbeat connected on RolloutWorker_w23 [2024-06-06 16:51:53,656][27571] Fps is (10 sec: 37682.8, 60 sec: 36590.8, 300 sec: 25804.8). Total num frames: 789987328. Throughput: 0: 37795.4. Samples: 3201520. Policy #0 lag: (min: 0.0, avg: 8.3, max: 16.0) [2024-06-06 16:51:53,657][27571] Avg episode reward: [(0, '0.332')] [2024-06-06 16:51:53,720][27783] Saving new best policy, reward=0.332! [2024-06-06 16:51:53,727][27803] Updated weights for policy 0, policy_version 48218 (0.0032) [2024-06-06 16:51:54,484][27826] Worker 24 awakens! [2024-06-06 16:51:54,496][27571] Heartbeat connected on RolloutWorker_w24 [2024-06-06 16:51:56,285][27803] Updated weights for policy 0, policy_version 48228 (0.0024) [2024-06-06 16:51:58,656][27571] Fps is (10 sec: 39321.1, 60 sec: 37410.1, 300 sec: 26476.6). Total num frames: 790200320. Throughput: 0: 38525.8. Samples: 3448560. 
Policy #0 lag: (min: 0.0, avg: 8.3, max: 16.0) [2024-06-06 16:51:58,656][27571] Avg episode reward: [(0, '0.326')] [2024-06-06 16:51:59,196][27818] Worker 25 awakens! [2024-06-06 16:51:59,208][27571] Heartbeat connected on RolloutWorker_w25 [2024-06-06 16:52:01,123][27803] Updated weights for policy 0, policy_version 48238 (0.0022) [2024-06-06 16:52:03,656][27571] Fps is (10 sec: 47513.7, 60 sec: 39048.4, 300 sec: 27474.7). Total num frames: 790462464. Throughput: 0: 39534.6. Samples: 3697220. Policy #0 lag: (min: 0.0, avg: 8.3, max: 16.0) [2024-06-06 16:52:03,656][27571] Avg episode reward: [(0, '0.317')] [2024-06-06 16:52:03,896][27831] Worker 26 awakens! [2024-06-06 16:52:03,909][27571] Heartbeat connected on RolloutWorker_w26 [2024-06-06 16:52:05,297][27803] Updated weights for policy 0, policy_version 48248 (0.0036) [2024-06-06 16:52:08,230][27803] Updated weights for policy 0, policy_version 48258 (0.0022) [2024-06-06 16:52:08,656][27571] Fps is (10 sec: 45875.8, 60 sec: 39321.6, 300 sec: 27913.5). Total num frames: 790659072. Throughput: 0: 40072.5. Samples: 3837460. Policy #0 lag: (min: 0.0, avg: 62.0, max: 218.0) [2024-06-06 16:52:08,656][27571] Avg episode reward: [(0, '0.317')] [2024-06-06 16:52:08,736][27829] Worker 27 awakens! [2024-06-06 16:52:08,750][27571] Heartbeat connected on RolloutWorker_w27 [2024-06-06 16:52:13,271][27830] Worker 28 awakens! [2024-06-06 16:52:13,282][27571] Heartbeat connected on RolloutWorker_w28 [2024-06-06 16:52:13,314][27803] Updated weights for policy 0, policy_version 48268 (0.0034) [2024-06-06 16:52:13,656][27571] Fps is (10 sec: 37683.7, 60 sec: 39321.7, 300 sec: 28203.9). Total num frames: 790839296. Throughput: 0: 41069.3. Samples: 4099980. Policy #0 lag: (min: 0.0, avg: 62.0, max: 218.0) [2024-06-06 16:52:13,656][27571] Avg episode reward: [(0, '0.328')] [2024-06-06 16:52:15,539][27803] Updated weights for policy 0, policy_version 48278 (0.0028) [2024-06-06 16:52:17,961][27832] Worker 29 awakens! 
[2024-06-06 16:52:17,975][27571] Heartbeat connected on RolloutWorker_w29 [2024-06-06 16:52:18,656][27571] Fps is (10 sec: 40959.5, 60 sec: 39867.7, 300 sec: 28813.3). Total num frames: 791068672. Throughput: 0: 41680.9. Samples: 4350860. Policy #0 lag: (min: 0.0, avg: 62.0, max: 218.0) [2024-06-06 16:52:18,658][27571] Avg episode reward: [(0, '0.317')] [2024-06-06 16:52:20,420][27803] Updated weights for policy 0, policy_version 48288 (0.0024) [2024-06-06 16:52:22,634][27833] Worker 30 awakens! [2024-06-06 16:52:22,652][27571] Heartbeat connected on RolloutWorker_w30 [2024-06-06 16:52:23,087][27803] Updated weights for policy 0, policy_version 48298 (0.0034) [2024-06-06 16:52:23,656][27571] Fps is (10 sec: 49151.8, 60 sec: 41233.1, 300 sec: 29600.5). Total num frames: 791330816. Throughput: 0: 41644.5. Samples: 4477240. Policy #0 lag: (min: 0.0, avg: 62.0, max: 218.0) [2024-06-06 16:52:23,656][27571] Avg episode reward: [(0, '0.316')] [2024-06-06 16:52:27,324][27835] Worker 31 awakens! [2024-06-06 16:52:27,338][27571] Heartbeat connected on RolloutWorker_w31 [2024-06-06 16:52:27,686][27803] Updated weights for policy 0, policy_version 48308 (0.0027) [2024-06-06 16:52:28,656][27571] Fps is (10 sec: 45875.4, 60 sec: 41779.1, 300 sec: 29914.0). Total num frames: 791527424. Throughput: 0: 42364.9. Samples: 4748120. Policy #0 lag: (min: 0.0, avg: 62.0, max: 218.0) [2024-06-06 16:52:28,657][27571] Avg episode reward: [(0, '0.319')] [2024-06-06 16:52:30,168][27803] Updated weights for policy 0, policy_version 48318 (0.0033) [2024-06-06 16:52:33,656][27571] Fps is (10 sec: 39321.4, 60 sec: 41506.2, 300 sec: 30208.0). Total num frames: 791724032. Throughput: 0: 42974.1. Samples: 5023560. Policy #0 lag: (min: 0.0, avg: 62.0, max: 218.0) [2024-06-06 16:52:33,656][27571] Avg episode reward: [(0, '0.331')] [2024-06-06 16:52:34,355][27783] Signal inference workers to stop experience collection... 
(50 times) [2024-06-06 16:52:34,390][27803] InferenceWorker_p0-w0: stopping experience collection (50 times) [2024-06-06 16:52:34,418][27783] Signal inference workers to resume experience collection... (50 times) [2024-06-06 16:52:34,419][27803] InferenceWorker_p0-w0: resuming experience collection (50 times) [2024-06-06 16:52:35,194][27803] Updated weights for policy 0, policy_version 48328 (0.0026) [2024-06-06 16:52:37,411][27803] Updated weights for policy 0, policy_version 48338 (0.0028) [2024-06-06 16:52:38,656][27571] Fps is (10 sec: 47513.3, 60 sec: 42871.4, 300 sec: 30980.7). Total num frames: 792002560. Throughput: 0: 43351.1. Samples: 5152320. Policy #0 lag: (min: 0.0, avg: 6.6, max: 19.0) [2024-06-06 16:52:38,657][27571] Avg episode reward: [(0, '0.315')] [2024-06-06 16:52:42,457][27803] Updated weights for policy 0, policy_version 48348 (0.0036) [2024-06-06 16:52:43,656][27571] Fps is (10 sec: 49151.8, 60 sec: 43417.6, 300 sec: 31322.4). Total num frames: 792215552. Throughput: 0: 43988.0. Samples: 5428020. Policy #0 lag: (min: 0.0, avg: 6.6, max: 19.0) [2024-06-06 16:52:43,657][27571] Avg episode reward: [(0, '0.321')] [2024-06-06 16:52:44,564][27803] Updated weights for policy 0, policy_version 48358 (0.0042) [2024-06-06 16:52:48,657][27571] Fps is (10 sec: 37678.8, 60 sec: 42870.5, 300 sec: 31363.5). Total num frames: 792379392. Throughput: 0: 44435.3. Samples: 5696860. Policy #0 lag: (min: 0.0, avg: 6.6, max: 19.0) [2024-06-06 16:52:48,658][27571] Avg episode reward: [(0, '0.313')] [2024-06-06 16:52:49,823][27803] Updated weights for policy 0, policy_version 48368 (0.0033) [2024-06-06 16:52:52,054][27803] Updated weights for policy 0, policy_version 48378 (0.0028) [2024-06-06 16:52:53,656][27571] Fps is (10 sec: 44236.0, 60 sec: 44509.8, 300 sec: 32039.8). Total num frames: 792657920. Throughput: 0: 44023.3. Samples: 5818520. 
Policy #0 lag: (min: 0.0, avg: 6.6, max: 19.0) [2024-06-06 16:52:53,657][27571] Avg episode reward: [(0, '0.314')] [2024-06-06 16:52:57,038][27803] Updated weights for policy 0, policy_version 48388 (0.0041) [2024-06-06 16:52:58,656][27571] Fps is (10 sec: 50797.5, 60 sec: 44783.1, 300 sec: 32413.8). Total num frames: 792887296. Throughput: 0: 44270.8. Samples: 6092160. Policy #0 lag: (min: 0.0, avg: 6.6, max: 19.0) [2024-06-06 16:52:58,656][27571] Avg episode reward: [(0, '0.320')] [2024-06-06 16:52:59,499][27803] Updated weights for policy 0, policy_version 48398 (0.0033) [2024-06-06 16:53:03,656][27571] Fps is (10 sec: 40961.3, 60 sec: 43417.7, 300 sec: 32509.3). Total num frames: 793067520. Throughput: 0: 44686.4. Samples: 6361740. Policy #0 lag: (min: 0.0, avg: 6.6, max: 19.0) [2024-06-06 16:53:03,656][27571] Avg episode reward: [(0, '0.319')] [2024-06-06 16:53:04,530][27803] Updated weights for policy 0, policy_version 48408 (0.0046) [2024-06-06 16:53:06,754][27803] Updated weights for policy 0, policy_version 48418 (0.0039) [2024-06-06 16:53:08,656][27571] Fps is (10 sec: 42597.7, 60 sec: 44236.7, 300 sec: 32936.1). Total num frames: 793313280. Throughput: 0: 44542.6. Samples: 6481660. Policy #0 lag: (min: 0.0, avg: 8.8, max: 21.0) [2024-06-06 16:53:08,656][27571] Avg episode reward: [(0, '0.317')] [2024-06-06 16:53:12,025][27803] Updated weights for policy 0, policy_version 48428 (0.0026) [2024-06-06 16:53:13,656][27571] Fps is (10 sec: 50790.4, 60 sec: 45602.1, 300 sec: 33423.4). Total num frames: 793575424. Throughput: 0: 44756.1. Samples: 6762140. Policy #0 lag: (min: 0.0, avg: 8.8, max: 21.0) [2024-06-06 16:53:13,656][27571] Avg episode reward: [(0, '0.317')] [2024-06-06 16:53:13,985][27803] Updated weights for policy 0, policy_version 48438 (0.0044) [2024-06-06 16:53:18,656][27571] Fps is (10 sec: 40960.6, 60 sec: 44237.0, 300 sec: 33327.5). Total num frames: 793722880. Throughput: 0: 44678.8. Samples: 7034100. 
Policy #0 lag: (min: 0.0, avg: 8.8, max: 21.0) [2024-06-06 16:53:18,656][27571] Avg episode reward: [(0, '0.336')] [2024-06-06 16:53:18,666][27783] Saving new best policy, reward=0.336! [2024-06-06 16:53:19,117][27803] Updated weights for policy 0, policy_version 48448 (0.0026) [2024-06-06 16:53:21,462][27803] Updated weights for policy 0, policy_version 48458 (0.0031) [2024-06-06 16:53:23,656][27571] Fps is (10 sec: 40959.4, 60 sec: 44236.7, 300 sec: 33782.3). Total num frames: 793985024. Throughput: 0: 44469.8. Samples: 7153460. Policy #0 lag: (min: 0.0, avg: 8.8, max: 21.0) [2024-06-06 16:53:23,657][27571] Avg episode reward: [(0, '0.328')] [2024-06-06 16:53:26,462][27803] Updated weights for policy 0, policy_version 48468 (0.0037) [2024-06-06 16:53:28,656][27571] Fps is (10 sec: 52428.0, 60 sec: 45329.1, 300 sec: 34215.9). Total num frames: 794247168. Throughput: 0: 44514.7. Samples: 7431180. Policy #0 lag: (min: 0.0, avg: 8.8, max: 21.0) [2024-06-06 16:53:28,656][27571] Avg episode reward: [(0, '0.312')] [2024-06-06 16:53:28,821][27803] Updated weights for policy 0, policy_version 48478 (0.0033) [2024-06-06 16:53:33,656][27571] Fps is (10 sec: 40960.3, 60 sec: 44509.9, 300 sec: 34108.5). Total num frames: 794394624. Throughput: 0: 44513.7. Samples: 7699920. Policy #0 lag: (min: 0.0, avg: 8.8, max: 21.0) [2024-06-06 16:53:33,656][27571] Avg episode reward: [(0, '0.319')] [2024-06-06 16:53:34,143][27803] Updated weights for policy 0, policy_version 48488 (0.0040) [2024-06-06 16:53:35,913][27803] Updated weights for policy 0, policy_version 48498 (0.0032) [2024-06-06 16:53:38,656][27571] Fps is (10 sec: 39321.6, 60 sec: 43963.8, 300 sec: 34442.8). Total num frames: 794640384. Throughput: 0: 44571.4. Samples: 7824220. Policy #0 lag: (min: 0.0, avg: 11.6, max: 23.0) [2024-06-06 16:53:38,656][27571] Avg episode reward: [(0, '0.310')] [2024-06-06 16:53:41,212][27783] Signal inference workers to stop experience collection... 
(100 times) [2024-06-06 16:53:41,212][27783] Signal inference workers to resume experience collection... (100 times) [2024-06-06 16:53:41,246][27803] InferenceWorker_p0-w0: stopping experience collection (100 times) [2024-06-06 16:53:41,247][27803] InferenceWorker_p0-w0: resuming experience collection (100 times) [2024-06-06 16:53:41,338][27803] Updated weights for policy 0, policy_version 48508 (0.0037) [2024-06-06 16:53:43,181][27803] Updated weights for policy 0, policy_version 48518 (0.0030) [2024-06-06 16:53:43,656][27571] Fps is (10 sec: 52428.7, 60 sec: 45056.0, 300 sec: 34905.1). Total num frames: 794918912. Throughput: 0: 44543.8. Samples: 8096640. Policy #0 lag: (min: 0.0, avg: 11.6, max: 23.0) [2024-06-06 16:53:43,656][27571] Avg episode reward: [(0, '0.313')] [2024-06-06 16:53:43,674][27783] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000048518_794918912.pth... [2024-06-06 16:53:43,747][27783] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000048028_786890752.pth [2024-06-06 16:53:48,656][27571] Fps is (10 sec: 42598.2, 60 sec: 44783.8, 300 sec: 34789.9). Total num frames: 795066368. Throughput: 0: 44634.1. Samples: 8370280. Policy #0 lag: (min: 0.0, avg: 11.6, max: 23.0) [2024-06-06 16:53:48,659][27571] Avg episode reward: [(0, '0.310')] [2024-06-06 16:53:48,749][27803] Updated weights for policy 0, policy_version 48528 (0.0039) [2024-06-06 16:53:50,778][27803] Updated weights for policy 0, policy_version 48538 (0.0037) [2024-06-06 16:53:53,656][27571] Fps is (10 sec: 37683.4, 60 sec: 43963.9, 300 sec: 35020.8). Total num frames: 795295744. Throughput: 0: 44593.8. Samples: 8488380. 
Policy #0 lag: (min: 0.0, avg: 11.6, max: 23.0) [2024-06-06 16:53:53,656][27571] Avg episode reward: [(0, '0.314')] [2024-06-06 16:53:55,862][27803] Updated weights for policy 0, policy_version 48548 (0.0035) [2024-06-06 16:53:58,207][27803] Updated weights for policy 0, policy_version 48558 (0.0037) [2024-06-06 16:53:58,656][27571] Fps is (10 sec: 50791.3, 60 sec: 44782.9, 300 sec: 35443.0). Total num frames: 795574272. Throughput: 0: 44422.7. Samples: 8761160. Policy #0 lag: (min: 0.0, avg: 11.6, max: 23.0) [2024-06-06 16:53:58,656][27571] Avg episode reward: [(0, '0.320')] [2024-06-06 16:54:03,417][27803] Updated weights for policy 0, policy_version 48568 (0.0033) [2024-06-06 16:54:03,656][27571] Fps is (10 sec: 45874.8, 60 sec: 44782.8, 300 sec: 35455.0). Total num frames: 795754496. Throughput: 0: 44590.9. Samples: 9040700. Policy #0 lag: (min: 0.0, avg: 11.6, max: 23.0) [2024-06-06 16:54:03,656][27571] Avg episode reward: [(0, '0.311')] [2024-06-06 16:54:05,435][27803] Updated weights for policy 0, policy_version 48578 (0.0029) [2024-06-06 16:54:08,656][27571] Fps is (10 sec: 40959.2, 60 sec: 44509.8, 300 sec: 35659.3). Total num frames: 795983872. Throughput: 0: 44460.9. Samples: 9154200. Policy #0 lag: (min: 0.0, avg: 13.4, max: 23.0) [2024-06-06 16:54:08,656][27571] Avg episode reward: [(0, '0.320')] [2024-06-06 16:54:10,750][27803] Updated weights for policy 0, policy_version 48588 (0.0040) [2024-06-06 16:54:12,637][27803] Updated weights for policy 0, policy_version 48598 (0.0024) [2024-06-06 16:54:13,656][27571] Fps is (10 sec: 49150.9, 60 sec: 44509.6, 300 sec: 35981.8). Total num frames: 796246016. Throughput: 0: 44313.9. Samples: 9425320. Policy #0 lag: (min: 0.0, avg: 13.4, max: 23.0) [2024-06-06 16:54:13,657][27571] Avg episode reward: [(0, '0.315')] [2024-06-06 16:54:18,123][27803] Updated weights for policy 0, policy_version 48608 (0.0035) [2024-06-06 16:54:18,656][27571] Fps is (10 sec: 45875.7, 60 sec: 45329.0, 300 sec: 36044.8). 
Total num frames: 796442624. Throughput: 0: 44479.6. Samples: 9701500. Policy #0 lag: (min: 0.0, avg: 13.4, max: 23.0) [2024-06-06 16:54:18,656][27571] Avg episode reward: [(0, '0.308')] [2024-06-06 16:54:20,224][27803] Updated weights for policy 0, policy_version 48618 (0.0036) [2024-06-06 16:54:23,656][27571] Fps is (10 sec: 40961.6, 60 sec: 44510.0, 300 sec: 36166.2). Total num frames: 796655616. Throughput: 0: 44553.9. Samples: 9829140. Policy #0 lag: (min: 0.0, avg: 13.4, max: 23.0) [2024-06-06 16:54:23,656][27571] Avg episode reward: [(0, '0.325')] [2024-06-06 16:54:25,282][27803] Updated weights for policy 0, policy_version 48628 (0.0022) [2024-06-06 16:54:27,797][27803] Updated weights for policy 0, policy_version 48638 (0.0029) [2024-06-06 16:54:28,656][27571] Fps is (10 sec: 45874.8, 60 sec: 44236.8, 300 sec: 36402.3). Total num frames: 796901376. Throughput: 0: 44345.7. Samples: 10092200. Policy #0 lag: (min: 0.0, avg: 13.4, max: 23.0) [2024-06-06 16:54:28,657][27571] Avg episode reward: [(0, '0.310')] [2024-06-06 16:54:32,588][27803] Updated weights for policy 0, policy_version 48648 (0.0033) [2024-06-06 16:54:33,656][27571] Fps is (10 sec: 44236.2, 60 sec: 45056.0, 300 sec: 36454.4). Total num frames: 797097984. Throughput: 0: 44491.6. Samples: 10372400. Policy #0 lag: (min: 0.0, avg: 13.4, max: 23.0) [2024-06-06 16:54:33,657][27571] Avg episode reward: [(0, '0.311')] [2024-06-06 16:54:34,377][27783] Signal inference workers to stop experience collection... (150 times) [2024-06-06 16:54:34,415][27803] InferenceWorker_p0-w0: stopping experience collection (150 times) [2024-06-06 16:54:34,433][27783] Signal inference workers to resume experience collection... (150 times) [2024-06-06 16:54:34,433][27803] InferenceWorker_p0-w0: resuming experience collection (150 times) [2024-06-06 16:54:34,920][27803] Updated weights for policy 0, policy_version 48658 (0.0032) [2024-06-06 16:54:38,660][27571] Fps is (10 sec: 40943.8, 60 sec: 44506.9, 300 sec: 36561.7). 
Total num frames: 797310976. Throughput: 0: 44637.3. Samples: 10497240. Policy #0 lag: (min: 0.0, avg: 11.3, max: 22.0) [2024-06-06 16:54:38,661][27571] Avg episode reward: [(0, '0.313')] [2024-06-06 16:54:40,098][27803] Updated weights for policy 0, policy_version 48668 (0.0037) [2024-06-06 16:54:42,269][27803] Updated weights for policy 0, policy_version 48678 (0.0022) [2024-06-06 16:54:43,656][27571] Fps is (10 sec: 45874.9, 60 sec: 43963.7, 300 sec: 36779.3). Total num frames: 797556736. Throughput: 0: 44346.4. Samples: 10756760. Policy #0 lag: (min: 0.0, avg: 11.3, max: 22.0) [2024-06-06 16:54:43,657][27571] Avg episode reward: [(0, '0.313')] [2024-06-06 16:54:47,602][27803] Updated weights for policy 0, policy_version 48688 (0.0047) [2024-06-06 16:54:48,656][27571] Fps is (10 sec: 47532.9, 60 sec: 45329.1, 300 sec: 36933.5). Total num frames: 797786112. Throughput: 0: 44236.1. Samples: 11031320. Policy #0 lag: (min: 0.0, avg: 11.3, max: 22.0) [2024-06-06 16:54:48,656][27571] Avg episode reward: [(0, '0.315')] [2024-06-06 16:54:50,140][27803] Updated weights for policy 0, policy_version 48698 (0.0037) [2024-06-06 16:54:53,656][27571] Fps is (10 sec: 40960.7, 60 sec: 44509.9, 300 sec: 37544.4). Total num frames: 797966336. Throughput: 0: 44681.9. Samples: 11164880. Policy #0 lag: (min: 0.0, avg: 11.3, max: 22.0) [2024-06-06 16:54:53,656][27571] Avg episode reward: [(0, '0.313')] [2024-06-06 16:54:54,820][27803] Updated weights for policy 0, policy_version 48708 (0.0037) [2024-06-06 16:54:57,380][27803] Updated weights for policy 0, policy_version 48718 (0.0040) [2024-06-06 16:54:58,656][27571] Fps is (10 sec: 42598.0, 60 sec: 43963.6, 300 sec: 37822.1). Total num frames: 798212096. Throughput: 0: 44550.5. Samples: 11430080. 
Policy #0 lag: (min: 0.0, avg: 11.3, max: 22.0) [2024-06-06 16:54:58,659][27571] Avg episode reward: [(0, '0.312')] [2024-06-06 16:55:01,972][27803] Updated weights for policy 0, policy_version 48728 (0.0030) [2024-06-06 16:55:03,656][27571] Fps is (10 sec: 47513.3, 60 sec: 44783.0, 300 sec: 38599.6). Total num frames: 798441472. Throughput: 0: 44425.7. Samples: 11700660. Policy #0 lag: (min: 0.0, avg: 11.3, max: 22.0) [2024-06-06 16:55:03,656][27571] Avg episode reward: [(0, '0.309')] [2024-06-06 16:55:04,931][27803] Updated weights for policy 0, policy_version 48738 (0.0028) [2024-06-06 16:55:08,656][27571] Fps is (10 sec: 42598.4, 60 sec: 44236.8, 300 sec: 39210.5). Total num frames: 798638080. Throughput: 0: 44473.2. Samples: 11830440. Policy #0 lag: (min: 0.0, avg: 9.5, max: 22.0) [2024-06-06 16:55:08,656][27571] Avg episode reward: [(0, '0.307')] [2024-06-06 16:55:09,403][27803] Updated weights for policy 0, policy_version 48748 (0.0039) [2024-06-06 16:55:12,862][27803] Updated weights for policy 0, policy_version 48758 (0.0036) [2024-06-06 16:55:13,656][27571] Fps is (10 sec: 42598.1, 60 sec: 43690.8, 300 sec: 39932.5). Total num frames: 798867456. Throughput: 0: 44430.6. Samples: 12091580. Policy #0 lag: (min: 0.0, avg: 9.5, max: 22.0) [2024-06-06 16:55:13,660][27571] Avg episode reward: [(0, '0.303')] [2024-06-06 16:55:16,981][27803] Updated weights for policy 0, policy_version 48768 (0.0029) [2024-06-06 16:55:18,656][27571] Fps is (10 sec: 47513.5, 60 sec: 44509.8, 300 sec: 40599.0). Total num frames: 799113216. Throughput: 0: 44095.1. Samples: 12356680. Policy #0 lag: (min: 0.0, avg: 9.5, max: 22.0) [2024-06-06 16:55:18,656][27571] Avg episode reward: [(0, '0.308')] [2024-06-06 16:55:19,946][27803] Updated weights for policy 0, policy_version 48778 (0.0027) [2024-06-06 16:55:23,656][27571] Fps is (10 sec: 44237.3, 60 sec: 44236.8, 300 sec: 41098.8). Total num frames: 799309824. Throughput: 0: 44454.2. Samples: 12497500. 
Policy #0 lag: (min: 0.0, avg: 9.5, max: 22.0)
[2024-06-06 16:55:23,656][27571] Avg episode reward: [(0, '0.299')]
[2024-06-06 16:55:24,098][27803] Updated weights for policy 0, policy_version 48788 (0.0035)
[2024-06-06 16:55:27,367][27803] Updated weights for policy 0, policy_version 48798 (0.0031)
[2024-06-06 16:55:28,656][27571] Fps is (10 sec: 42598.4, 60 sec: 43963.7, 300 sec: 41487.6). Total num frames: 799539200. Throughput: 0: 44595.2. Samples: 12763540. Policy #0 lag: (min: 0.0, avg: 9.5, max: 22.0)
[2024-06-06 16:55:28,657][27571] Avg episode reward: [(0, '0.319')]
[2024-06-06 16:55:31,136][27803] Updated weights for policy 0, policy_version 48808 (0.0040)
[2024-06-06 16:55:33,656][27571] Fps is (10 sec: 47513.5, 60 sec: 44783.0, 300 sec: 41987.5). Total num frames: 799784960. Throughput: 0: 44512.9. Samples: 13034400. Policy #0 lag: (min: 0.0, avg: 9.5, max: 22.0)
[2024-06-06 16:55:33,656][27571] Avg episode reward: [(0, '0.309')]
[2024-06-06 16:55:34,673][27803] Updated weights for policy 0, policy_version 48818 (0.0032)
[2024-06-06 16:55:38,656][27571] Fps is (10 sec: 44237.0, 60 sec: 44512.8, 300 sec: 42320.7). Total num frames: 799981568. Throughput: 0: 44535.9. Samples: 13169000. Policy #0 lag: (min: 0.0, avg: 9.8, max: 23.0)
[2024-06-06 16:55:38,656][27571] Avg episode reward: [(0, '0.308')]
[2024-06-06 16:55:38,783][27803] Updated weights for policy 0, policy_version 48828 (0.0031)
[2024-06-06 16:55:42,048][27803] Updated weights for policy 0, policy_version 48838 (0.0025)
[2024-06-06 16:55:43,656][27571] Fps is (10 sec: 40960.1, 60 sec: 43963.8, 300 sec: 42542.9). Total num frames: 800194560. Throughput: 0: 44502.3. Samples: 13432680. Policy #0 lag: (min: 0.0, avg: 9.8, max: 23.0)
[2024-06-06 16:55:43,656][27571] Avg episode reward: [(0, '0.308')]
[2024-06-06 16:55:43,731][27783] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000048841_800210944.pth...
[2024-06-06 16:55:43,806][27783] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000048195_789626880.pth
[2024-06-06 16:55:46,125][27803] Updated weights for policy 0, policy_version 48848 (0.0034)
[2024-06-06 16:55:47,154][27783] Signal inference workers to stop experience collection... (200 times)
[2024-06-06 16:55:47,184][27803] InferenceWorker_p0-w0: stopping experience collection (200 times)
[2024-06-06 16:55:47,263][27783] Signal inference workers to resume experience collection... (200 times)
[2024-06-06 16:55:47,263][27803] InferenceWorker_p0-w0: resuming experience collection (200 times)
[2024-06-06 16:55:48,656][27571] Fps is (10 sec: 47513.4, 60 sec: 44509.8, 300 sec: 42931.6). Total num frames: 800456704. Throughput: 0: 44513.3. Samples: 13703760. Policy #0 lag: (min: 0.0, avg: 9.8, max: 23.0)
[2024-06-06 16:55:48,656][27571] Avg episode reward: [(0, '0.302')]
[2024-06-06 16:55:49,030][27803] Updated weights for policy 0, policy_version 48858 (0.0033)
[2024-06-06 16:55:53,191][27803] Updated weights for policy 0, policy_version 48868 (0.0036)
[2024-06-06 16:55:53,658][27571] Fps is (10 sec: 47501.3, 60 sec: 45054.1, 300 sec: 43097.9). Total num frames: 800669696. Throughput: 0: 44668.2. Samples: 13840620. Policy #0 lag: (min: 0.0, avg: 9.8, max: 23.0)
[2024-06-06 16:55:53,659][27571] Avg episode reward: [(0, '0.309')]
[2024-06-06 16:55:56,619][27803] Updated weights for policy 0, policy_version 48878 (0.0033)
[2024-06-06 16:55:58,656][27571] Fps is (10 sec: 42598.9, 60 sec: 44509.9, 300 sec: 43264.9). Total num frames: 800882688. Throughput: 0: 44792.1. Samples: 14107220. Policy #0 lag: (min: 0.0, avg: 9.8, max: 23.0)
[2024-06-06 16:55:58,656][27571] Avg episode reward: [(0, '0.312')]
[2024-06-06 16:56:00,399][27803] Updated weights for policy 0, policy_version 48888 (0.0030)
[2024-06-06 16:56:03,656][27571] Fps is (10 sec: 45886.6, 60 sec: 44782.9, 300 sec: 43487.0). Total num frames: 801128448. Throughput: 0: 44786.7. Samples: 14372080. Policy #0 lag: (min: 0.0, avg: 9.8, max: 23.0)
[2024-06-06 16:56:03,657][27571] Avg episode reward: [(0, '0.309')]
[2024-06-06 16:56:03,862][27803] Updated weights for policy 0, policy_version 48898 (0.0032)
[2024-06-06 16:56:07,933][27803] Updated weights for policy 0, policy_version 48908 (0.0029)
[2024-06-06 16:56:08,656][27571] Fps is (10 sec: 45874.9, 60 sec: 45056.0, 300 sec: 43598.1). Total num frames: 801341440. Throughput: 0: 44595.5. Samples: 14504300. Policy #0 lag: (min: 0.0, avg: 10.9, max: 21.0)
[2024-06-06 16:56:08,657][27571] Avg episode reward: [(0, '0.304')]
[2024-06-06 16:56:11,113][27803] Updated weights for policy 0, policy_version 48918 (0.0036)
[2024-06-06 16:56:13,656][27571] Fps is (10 sec: 42598.6, 60 sec: 44783.0, 300 sec: 43653.7). Total num frames: 801554432. Throughput: 0: 44695.2. Samples: 14774820. Policy #0 lag: (min: 0.0, avg: 10.9, max: 21.0)
[2024-06-06 16:56:13,657][27571] Avg episode reward: [(0, '0.314')]
[2024-06-06 16:56:15,419][27803] Updated weights for policy 0, policy_version 48928 (0.0032)
[2024-06-06 16:56:18,377][27803] Updated weights for policy 0, policy_version 48938 (0.0038)
[2024-06-06 16:56:18,656][27571] Fps is (10 sec: 45874.9, 60 sec: 44782.9, 300 sec: 43875.8). Total num frames: 801800192. Throughput: 0: 44569.2. Samples: 15040020. Policy #0 lag: (min: 0.0, avg: 10.9, max: 21.0)
[2024-06-06 16:56:18,657][27571] Avg episode reward: [(0, '0.312')]
[2024-06-06 16:56:22,633][27803] Updated weights for policy 0, policy_version 48948 (0.0037)
[2024-06-06 16:56:23,656][27571] Fps is (10 sec: 45874.9, 60 sec: 45055.9, 300 sec: 44042.4). Total num frames: 802013184. Throughput: 0: 44610.6. Samples: 15176480. Policy #0 lag: (min: 0.0, avg: 10.9, max: 21.0)
[2024-06-06 16:56:23,657][27571] Avg episode reward: [(0, '0.303')]
[2024-06-06 16:56:26,265][27803] Updated weights for policy 0, policy_version 48958 (0.0033)
[2024-06-06 16:56:28,656][27571] Fps is (10 sec: 40960.7, 60 sec: 44510.0, 300 sec: 43986.9). Total num frames: 802209792. Throughput: 0: 44648.1. Samples: 15441840. Policy #0 lag: (min: 0.0, avg: 10.9, max: 21.0)
[2024-06-06 16:56:28,656][27571] Avg episode reward: [(0, '0.299')]
[2024-06-06 16:56:29,963][27803] Updated weights for policy 0, policy_version 48968 (0.0028)
[2024-06-06 16:56:33,351][27803] Updated weights for policy 0, policy_version 48978 (0.0040)
[2024-06-06 16:56:33,656][27571] Fps is (10 sec: 44237.1, 60 sec: 44509.8, 300 sec: 44153.5). Total num frames: 802455552. Throughput: 0: 44442.3. Samples: 15703660. Policy #0 lag: (min: 0.0, avg: 10.9, max: 21.0)
[2024-06-06 16:56:33,656][27571] Avg episode reward: [(0, '0.298')]
[2024-06-06 16:56:37,307][27803] Updated weights for policy 0, policy_version 48988 (0.0028)
[2024-06-06 16:56:38,656][27571] Fps is (10 sec: 45875.2, 60 sec: 44783.0, 300 sec: 44264.6). Total num frames: 802668544. Throughput: 0: 44527.9. Samples: 15844260. Policy #0 lag: (min: 0.0, avg: 10.9, max: 21.0)
[2024-06-06 16:56:38,656][27571] Avg episode reward: [(0, '0.309')]
[2024-06-06 16:56:40,762][27803] Updated weights for policy 0, policy_version 48998 (0.0037)
[2024-06-06 16:56:43,656][27571] Fps is (10 sec: 42598.3, 60 sec: 44782.9, 300 sec: 44320.1). Total num frames: 802881536. Throughput: 0: 44473.7. Samples: 16108540. Policy #0 lag: (min: 0.0, avg: 10.2, max: 22.0)
[2024-06-06 16:56:43,657][27571] Avg episode reward: [(0, '0.306')]
[2024-06-06 16:56:44,714][27803] Updated weights for policy 0, policy_version 49008 (0.0029)
[2024-06-06 16:56:48,434][27803] Updated weights for policy 0, policy_version 49018 (0.0023)
[2024-06-06 16:56:48,656][27571] Fps is (10 sec: 44236.0, 60 sec: 44236.8, 300 sec: 44486.7). Total num frames: 803110912. Throughput: 0: 44576.4. Samples: 16378020. Policy #0 lag: (min: 0.0, avg: 10.2, max: 22.0)
[2024-06-06 16:56:48,657][27571] Avg episode reward: [(0, '0.304')]
[2024-06-06 16:56:52,152][27803] Updated weights for policy 0, policy_version 49028 (0.0032)
[2024-06-06 16:56:53,656][27571] Fps is (10 sec: 47514.0, 60 sec: 44784.9, 300 sec: 44597.8). Total num frames: 803356672. Throughput: 0: 44672.5. Samples: 16514560. Policy #0 lag: (min: 0.0, avg: 10.2, max: 22.0)
[2024-06-06 16:56:53,656][27571] Avg episode reward: [(0, '0.309')]
[2024-06-06 16:56:55,588][27803] Updated weights for policy 0, policy_version 49038 (0.0033)
[2024-06-06 16:56:58,656][27571] Fps is (10 sec: 42598.8, 60 sec: 44236.8, 300 sec: 44320.1). Total num frames: 803536896. Throughput: 0: 44529.8. Samples: 16778660. Policy #0 lag: (min: 0.0, avg: 10.2, max: 22.0)
[2024-06-06 16:56:58,656][27571] Avg episode reward: [(0, '0.317')]
[2024-06-06 16:56:59,145][27783] Signal inference workers to stop experience collection... (250 times)
[2024-06-06 16:56:59,199][27803] InferenceWorker_p0-w0: stopping experience collection (250 times)
[2024-06-06 16:56:59,259][27783] Signal inference workers to resume experience collection... (250 times)
[2024-06-06 16:56:59,260][27803] InferenceWorker_p0-w0: resuming experience collection (250 times)
[2024-06-06 16:56:59,403][27803] Updated weights for policy 0, policy_version 49048 (0.0027)
[2024-06-06 16:57:02,744][27803] Updated weights for policy 0, policy_version 49058 (0.0039)
[2024-06-06 16:57:03,656][27571] Fps is (10 sec: 42598.0, 60 sec: 44236.8, 300 sec: 44486.7). Total num frames: 803782656. Throughput: 0: 44442.2. Samples: 17039920. Policy #0 lag: (min: 0.0, avg: 10.2, max: 22.0)
[2024-06-06 16:57:03,657][27571] Avg episode reward: [(0, '0.312')]
[2024-06-06 16:57:06,865][27803] Updated weights for policy 0, policy_version 49068 (0.0028)
[2024-06-06 16:57:08,656][27571] Fps is (10 sec: 47513.6, 60 sec: 44509.9, 300 sec: 44653.3). Total num frames: 804012032. Throughput: 0: 44558.3. Samples: 17181600. Policy #0 lag: (min: 0.0, avg: 10.2, max: 22.0)
[2024-06-06 16:57:08,656][27571] Avg episode reward: [(0, '0.305')]
[2024-06-06 16:57:10,191][27803] Updated weights for policy 0, policy_version 49078 (0.0039)
[2024-06-06 16:57:13,656][27571] Fps is (10 sec: 42598.9, 60 sec: 44236.9, 300 sec: 44542.3). Total num frames: 804208640. Throughput: 0: 44416.0. Samples: 17440560. Policy #0 lag: (min: 1.0, avg: 12.0, max: 22.0)
[2024-06-06 16:57:13,656][27571] Avg episode reward: [(0, '0.303')]
[2024-06-06 16:57:14,118][27803] Updated weights for policy 0, policy_version 49088 (0.0030)
[2024-06-06 16:57:17,855][27803] Updated weights for policy 0, policy_version 49098 (0.0027)
[2024-06-06 16:57:18,656][27571] Fps is (10 sec: 42598.5, 60 sec: 43963.8, 300 sec: 44431.2). Total num frames: 804438016. Throughput: 0: 44603.6. Samples: 17710820. Policy #0 lag: (min: 1.0, avg: 12.0, max: 22.0)
[2024-06-06 16:57:18,656][27571] Avg episode reward: [(0, '0.308')]
[2024-06-06 16:57:21,420][27803] Updated weights for policy 0, policy_version 49108 (0.0037)
[2024-06-06 16:57:23,656][27571] Fps is (10 sec: 47513.3, 60 sec: 44509.9, 300 sec: 44597.8). Total num frames: 804683776. Throughput: 0: 44561.7. Samples: 17849540. Policy #0 lag: (min: 1.0, avg: 12.0, max: 22.0)
[2024-06-06 16:57:23,656][27571] Avg episode reward: [(0, '0.301')]
[2024-06-06 16:57:25,005][27803] Updated weights for policy 0, policy_version 49118 (0.0034)
[2024-06-06 16:57:28,624][27803] Updated weights for policy 0, policy_version 49128 (0.0021)
[2024-06-06 16:57:28,656][27571] Fps is (10 sec: 47513.3, 60 sec: 45055.9, 300 sec: 44708.9). Total num frames: 804913152. Throughput: 0: 44579.1. Samples: 18114600. Policy #0 lag: (min: 1.0, avg: 12.0, max: 22.0)
[2024-06-06 16:57:28,657][27571] Avg episode reward: [(0, '0.307')]
[2024-06-06 16:57:32,083][27803] Updated weights for policy 0, policy_version 49138 (0.0038)
[2024-06-06 16:57:33,656][27571] Fps is (10 sec: 42598.5, 60 sec: 44236.8, 300 sec: 44431.2). Total num frames: 805109760. Throughput: 0: 44545.4. Samples: 18382560. Policy #0 lag: (min: 1.0, avg: 12.0, max: 22.0)
[2024-06-06 16:57:33,660][27571] Avg episode reward: [(0, '0.304')]
[2024-06-06 16:57:36,296][27803] Updated weights for policy 0, policy_version 49148 (0.0027)
[2024-06-06 16:57:38,660][27571] Fps is (10 sec: 44219.1, 60 sec: 44779.8, 300 sec: 44541.7). Total num frames: 805355520. Throughput: 0: 44531.5. Samples: 18518660. Policy #0 lag: (min: 1.0, avg: 12.0, max: 22.0)
[2024-06-06 16:57:38,661][27571] Avg episode reward: [(0, '0.310')]
[2024-06-06 16:57:39,731][27803] Updated weights for policy 0, policy_version 49158 (0.0046)
[2024-06-06 16:57:43,656][27571] Fps is (10 sec: 44237.3, 60 sec: 44510.0, 300 sec: 44653.6). Total num frames: 805552128. Throughput: 0: 44551.7. Samples: 18783480. Policy #0 lag: (min: 0.0, avg: 9.5, max: 20.0)
[2024-06-06 16:57:43,656][27571] Avg episode reward: [(0, '0.308')]
[2024-06-06 16:57:43,751][27783] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000049168_805568512.pth...
[2024-06-06 16:57:43,759][27803] Updated weights for policy 0, policy_version 49168 (0.0029)
[2024-06-06 16:57:43,801][27783] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000048518_794918912.pth
[2024-06-06 16:57:47,190][27803] Updated weights for policy 0, policy_version 49178 (0.0042)
[2024-06-06 16:57:48,656][27571] Fps is (10 sec: 42615.5, 60 sec: 44509.9, 300 sec: 44486.8). Total num frames: 805781504. Throughput: 0: 44691.1. Samples: 19051020. Policy #0 lag: (min: 0.0, avg: 9.5, max: 20.0)
[2024-06-06 16:57:48,657][27571] Avg episode reward: [(0, '0.315')]
[2024-06-06 16:57:51,049][27803] Updated weights for policy 0, policy_version 49188 (0.0038)
[2024-06-06 16:57:53,660][27571] Fps is (10 sec: 47493.6, 60 sec: 44506.8, 300 sec: 44541.6). Total num frames: 806027264. Throughput: 0: 44552.0. Samples: 19186620. Policy #0 lag: (min: 0.0, avg: 9.5, max: 20.0)
[2024-06-06 16:57:53,661][27571] Avg episode reward: [(0, '0.303')]
[2024-06-06 16:57:54,236][27803] Updated weights for policy 0, policy_version 49198 (0.0041)
[2024-06-06 16:57:58,184][27803] Updated weights for policy 0, policy_version 49208 (0.0037)
[2024-06-06 16:57:58,657][27571] Fps is (10 sec: 45868.8, 60 sec: 45054.9, 300 sec: 44653.1). Total num frames: 806240256. Throughput: 0: 44771.4. Samples: 19455340. Policy #0 lag: (min: 0.0, avg: 9.5, max: 20.0)
[2024-06-06 16:57:58,658][27571] Avg episode reward: [(0, '0.298')]
[2024-06-06 16:58:01,541][27803] Updated weights for policy 0, policy_version 49218 (0.0029)
[2024-06-06 16:58:03,656][27571] Fps is (10 sec: 40977.1, 60 sec: 44236.9, 300 sec: 44486.7). Total num frames: 806436864. Throughput: 0: 44597.8. Samples: 19717720. Policy #0 lag: (min: 0.0, avg: 9.5, max: 20.0)
[2024-06-06 16:58:03,656][27571] Avg episode reward: [(0, '0.299')]
[2024-06-06 16:58:05,807][27803] Updated weights for policy 0, policy_version 49228 (0.0037)
[2024-06-06 16:58:08,656][27571] Fps is (10 sec: 42604.6, 60 sec: 44236.8, 300 sec: 44375.6). Total num frames: 806666240. Throughput: 0: 44430.2. Samples: 19848900. Policy #0 lag: (min: 0.0, avg: 9.5, max: 20.0)
[2024-06-06 16:58:08,656][27571] Avg episode reward: [(0, '0.306')]
[2024-06-06 16:58:09,099][27803] Updated weights for policy 0, policy_version 49238 (0.0038)
[2024-06-06 16:58:13,344][27803] Updated weights for policy 0, policy_version 49248 (0.0027)
[2024-06-06 16:58:13,656][27571] Fps is (10 sec: 45874.5, 60 sec: 44782.8, 300 sec: 44653.3). Total num frames: 806895616. Throughput: 0: 44596.4. Samples: 20121440. Policy #0 lag: (min: 0.0, avg: 9.1, max: 21.0)
[2024-06-06 16:58:13,657][27571] Avg episode reward: [(0, '0.309')]
[2024-06-06 16:58:16,614][27803] Updated weights for policy 0, policy_version 49258 (0.0025)
[2024-06-06 16:58:18,656][27571] Fps is (10 sec: 42597.9, 60 sec: 44236.7, 300 sec: 44431.2). Total num frames: 807092224. Throughput: 0: 44533.2. Samples: 20386560. Policy #0 lag: (min: 0.0, avg: 9.1, max: 21.0)
[2024-06-06 16:58:18,657][27571] Avg episode reward: [(0, '0.301')]
[2024-06-06 16:58:20,344][27803] Updated weights for policy 0, policy_version 49268 (0.0031)
[2024-06-06 16:58:22,296][27783] Signal inference workers to stop experience collection... (300 times)
[2024-06-06 16:58:22,339][27803] InferenceWorker_p0-w0: stopping experience collection (300 times)
[2024-06-06 16:58:22,349][27783] Signal inference workers to resume experience collection... (300 times)
[2024-06-06 16:58:22,354][27803] InferenceWorker_p0-w0: resuming experience collection (300 times)
[2024-06-06 16:58:23,656][27571] Fps is (10 sec: 45875.8, 60 sec: 44509.9, 300 sec: 44431.2). Total num frames: 807354368. Throughput: 0: 44414.3. Samples: 20517120. Policy #0 lag: (min: 0.0, avg: 9.1, max: 21.0)
[2024-06-06 16:58:23,656][27571] Avg episode reward: [(0, '0.304')]
[2024-06-06 16:58:23,751][27803] Updated weights for policy 0, policy_version 49278 (0.0033)
[2024-06-06 16:58:27,580][27803] Updated weights for policy 0, policy_version 49288 (0.0031)
[2024-06-06 16:58:28,656][27571] Fps is (10 sec: 45875.3, 60 sec: 43963.7, 300 sec: 44597.8). Total num frames: 807550976. Throughput: 0: 44406.9. Samples: 20781800. Policy #0 lag: (min: 0.0, avg: 9.1, max: 21.0)
[2024-06-06 16:58:28,657][27571] Avg episode reward: [(0, '0.304')]
[2024-06-06 16:58:31,026][27803] Updated weights for policy 0, policy_version 49298 (0.0031)
[2024-06-06 16:58:33,656][27571] Fps is (10 sec: 40959.7, 60 sec: 44236.8, 300 sec: 44486.7). Total num frames: 807763968. Throughput: 0: 44438.2. Samples: 21050740. Policy #0 lag: (min: 0.0, avg: 9.1, max: 21.0)
[2024-06-06 16:58:33,660][27571] Avg episode reward: [(0, '0.298')]
[2024-06-06 16:58:35,457][27803] Updated weights for policy 0, policy_version 49308 (0.0027)
[2024-06-06 16:58:38,417][27803] Updated weights for policy 0, policy_version 49318 (0.0038)
[2024-06-06 16:58:38,656][27571] Fps is (10 sec: 47513.9, 60 sec: 44512.9, 300 sec: 44431.2). Total num frames: 808026112. Throughput: 0: 44328.5. Samples: 21181220. Policy #0 lag: (min: 0.0, avg: 9.1, max: 21.0)
[2024-06-06 16:58:38,657][27571] Avg episode reward: [(0, '0.306')]
[2024-06-06 16:58:42,662][27803] Updated weights for policy 0, policy_version 49328 (0.0040)
[2024-06-06 16:58:43,656][27571] Fps is (10 sec: 45874.7, 60 sec: 44509.7, 300 sec: 44597.8). Total num frames: 808222720. Throughput: 0: 44363.9. Samples: 21451660. Policy #0 lag: (min: 0.0, avg: 9.5, max: 24.0)
[2024-06-06 16:58:43,656][27571] Avg episode reward: [(0, '0.305')]
[2024-06-06 16:58:46,095][27803] Updated weights for policy 0, policy_version 49338 (0.0031)
[2024-06-06 16:58:48,656][27571] Fps is (10 sec: 42598.3, 60 sec: 44509.9, 300 sec: 44597.8). Total num frames: 808452096. Throughput: 0: 44610.1. Samples: 21725180. Policy #0 lag: (min: 0.0, avg: 9.5, max: 24.0)
[2024-06-06 16:58:48,656][27571] Avg episode reward: [(0, '0.309')]
[2024-06-06 16:58:49,616][27803] Updated weights for policy 0, policy_version 49348 (0.0035)
[2024-06-06 16:58:53,146][27803] Updated weights for policy 0, policy_version 49358 (0.0032)
[2024-06-06 16:58:53,656][27571] Fps is (10 sec: 47513.9, 60 sec: 44512.9, 300 sec: 44486.7). Total num frames: 808697856. Throughput: 0: 44586.1. Samples: 21855280. Policy #0 lag: (min: 0.0, avg: 9.5, max: 24.0)
[2024-06-06 16:58:53,657][27571] Avg episode reward: [(0, '0.303')]
[2024-06-06 16:58:57,000][27803] Updated weights for policy 0, policy_version 49368 (0.0040)
[2024-06-06 16:58:58,656][27571] Fps is (10 sec: 44236.5, 60 sec: 44237.8, 300 sec: 44542.3). Total num frames: 808894464. Throughput: 0: 44380.0. Samples: 22118540. Policy #0 lag: (min: 0.0, avg: 9.5, max: 24.0)
[2024-06-06 16:58:58,657][27571] Avg episode reward: [(0, '0.308')]
[2024-06-06 16:59:00,354][27803] Updated weights for policy 0, policy_version 49378 (0.0036)
[2024-06-06 16:59:03,656][27571] Fps is (10 sec: 40960.0, 60 sec: 44509.8, 300 sec: 44486.7). Total num frames: 809107456. Throughput: 0: 44480.9. Samples: 22388200. Policy #0 lag: (min: 0.0, avg: 9.5, max: 24.0)
[2024-06-06 16:59:03,668][27571] Avg episode reward: [(0, '0.309')]
[2024-06-06 16:59:04,582][27803] Updated weights for policy 0, policy_version 49388 (0.0041)
[2024-06-06 16:59:07,784][27803] Updated weights for policy 0, policy_version 49398 (0.0028)
[2024-06-06 16:59:08,656][27571] Fps is (10 sec: 45873.9, 60 sec: 44782.6, 300 sec: 44431.2). Total num frames: 809353216. Throughput: 0: 44442.7. Samples: 22517060. Policy #0 lag: (min: 0.0, avg: 9.5, max: 24.0)
[2024-06-06 16:59:08,657][27571] Avg episode reward: [(0, '0.316')]
[2024-06-06 16:59:12,137][27803] Updated weights for policy 0, policy_version 49408 (0.0027)
[2024-06-06 16:59:13,656][27571] Fps is (10 sec: 45875.8, 60 sec: 44510.0, 300 sec: 44486.7). Total num frames: 809566208. Throughput: 0: 44526.8. Samples: 22785500. Policy #0 lag: (min: 0.0, avg: 9.9, max: 22.0)
[2024-06-06 16:59:13,656][27571] Avg episode reward: [(0, '0.315')]
[2024-06-06 16:59:15,380][27803] Updated weights for policy 0, policy_version 49418 (0.0021)
[2024-06-06 16:59:18,657][27571] Fps is (10 sec: 44234.0, 60 sec: 45055.3, 300 sec: 44542.1). Total num frames: 809795584. Throughput: 0: 44591.9. Samples: 23057420. Policy #0 lag: (min: 0.0, avg: 9.9, max: 22.0)
[2024-06-06 16:59:18,657][27571] Avg episode reward: [(0, '0.312')]
[2024-06-06 16:59:19,182][27803] Updated weights for policy 0, policy_version 49428 (0.0038)
[2024-06-06 16:59:22,423][27803] Updated weights for policy 0, policy_version 49438 (0.0035)
[2024-06-06 16:59:23,656][27571] Fps is (10 sec: 44236.5, 60 sec: 44236.8, 300 sec: 44431.2). Total num frames: 810008576. Throughput: 0: 44655.1. Samples: 23190700. Policy #0 lag: (min: 0.0, avg: 9.9, max: 22.0)
[2024-06-06 16:59:23,656][27571] Avg episode reward: [(0, '0.302')]
[2024-06-06 16:59:26,538][27803] Updated weights for policy 0, policy_version 49448 (0.0036)
[2024-06-06 16:59:28,656][27571] Fps is (10 sec: 44241.4, 60 sec: 44783.0, 300 sec: 44542.3). Total num frames: 810237952. Throughput: 0: 44515.3. Samples: 23454840. Policy #0 lag: (min: 0.0, avg: 9.9, max: 22.0)
[2024-06-06 16:59:28,656][27571] Avg episode reward: [(0, '0.300')]
[2024-06-06 16:59:29,826][27803] Updated weights for policy 0, policy_version 49458 (0.0035)
[2024-06-06 16:59:33,656][27571] Fps is (10 sec: 44237.3, 60 sec: 44783.0, 300 sec: 44542.9). Total num frames: 810450944. Throughput: 0: 44417.9. Samples: 23723980. Policy #0 lag: (min: 0.0, avg: 9.9, max: 22.0)
[2024-06-06 16:59:33,656][27571] Avg episode reward: [(0, '0.313')]
[2024-06-06 16:59:34,184][27803] Updated weights for policy 0, policy_version 49468 (0.0034)
[2024-06-06 16:59:37,220][27803] Updated weights for policy 0, policy_version 49478 (0.0028)
[2024-06-06 16:59:38,656][27571] Fps is (10 sec: 45874.4, 60 sec: 44509.8, 300 sec: 44542.3). Total num frames: 810696704. Throughput: 0: 44423.9. Samples: 23854360. Policy #0 lag: (min: 0.0, avg: 9.9, max: 22.0)
[2024-06-06 16:59:38,657][27571] Avg episode reward: [(0, '0.307')]
[2024-06-06 16:59:41,537][27803] Updated weights for policy 0, policy_version 49488 (0.0027)
[2024-06-06 16:59:43,656][27571] Fps is (10 sec: 47512.8, 60 sec: 45056.0, 300 sec: 44542.2). Total num frames: 810926080. Throughput: 0: 44645.8. Samples: 24127600. Policy #0 lag: (min: 0.0, avg: 9.9, max: 22.0)
[2024-06-06 16:59:43,657][27571] Avg episode reward: [(0, '0.312')]
[2024-06-06 16:59:43,669][27783] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000049495_810926080.pth...
[2024-06-06 16:59:43,712][27783] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000048841_800210944.pth
[2024-06-06 16:59:44,780][27803] Updated weights for policy 0, policy_version 49498 (0.0032)
[2024-06-06 16:59:48,656][27571] Fps is (10 sec: 42599.4, 60 sec: 44510.0, 300 sec: 44597.8). Total num frames: 811122688. Throughput: 0: 44536.6. Samples: 24392340. Policy #0 lag: (min: 0.0, avg: 11.3, max: 23.0)
[2024-06-06 16:59:48,656][27571] Avg episode reward: [(0, '0.311')]
[2024-06-06 16:59:48,785][27803] Updated weights for policy 0, policy_version 49508 (0.0033)
[2024-06-06 16:59:51,956][27803] Updated weights for policy 0, policy_version 49518 (0.0038)
[2024-06-06 16:59:53,656][27571] Fps is (10 sec: 42598.6, 60 sec: 44236.8, 300 sec: 44542.3). Total num frames: 811352064. Throughput: 0: 44579.9. Samples: 24523140. Policy #0 lag: (min: 0.0, avg: 11.3, max: 23.0)
[2024-06-06 16:59:53,656][27571] Avg episode reward: [(0, '0.311')]
[2024-06-06 16:59:55,028][27783] Signal inference workers to stop experience collection... (350 times)
[2024-06-06 16:59:55,078][27803] InferenceWorker_p0-w0: stopping experience collection (350 times)
[2024-06-06 16:59:55,085][27783] Signal inference workers to resume experience collection... (350 times)
[2024-06-06 16:59:55,087][27803] InferenceWorker_p0-w0: resuming experience collection (350 times)
[2024-06-06 16:59:56,289][27803] Updated weights for policy 0, policy_version 49528 (0.0031)
[2024-06-06 16:59:58,656][27571] Fps is (10 sec: 45875.3, 60 sec: 44783.1, 300 sec: 44542.3). Total num frames: 811581440. Throughput: 0: 44780.5. Samples: 24800620. Policy #0 lag: (min: 0.0, avg: 11.3, max: 23.0)
[2024-06-06 16:59:58,656][27571] Avg episode reward: [(0, '0.305')]
[2024-06-06 16:59:58,954][27803] Updated weights for policy 0, policy_version 49538 (0.0029)
[2024-06-06 17:00:03,656][27571] Fps is (10 sec: 42598.6, 60 sec: 44509.9, 300 sec: 44542.3). Total num frames: 811778048. Throughput: 0: 44699.3. Samples: 25068840. Policy #0 lag: (min: 0.0, avg: 11.3, max: 23.0)
[2024-06-06 17:00:03,656][27571] Avg episode reward: [(0, '0.304')]
[2024-06-06 17:00:03,780][27803] Updated weights for policy 0, policy_version 49548 (0.0036)
[2024-06-06 17:00:06,526][27803] Updated weights for policy 0, policy_version 49558 (0.0025)
[2024-06-06 17:00:08,656][27571] Fps is (10 sec: 44236.8, 60 sec: 44510.2, 300 sec: 44597.8). Total num frames: 812023808. Throughput: 0: 44529.0. Samples: 25194500. Policy #0 lag: (min: 0.0, avg: 11.3, max: 23.0)
[2024-06-06 17:00:08,656][27571] Avg episode reward: [(0, '0.314')]
[2024-06-06 17:00:10,856][27803] Updated weights for policy 0, policy_version 49568 (0.0033)
[2024-06-06 17:00:13,656][27571] Fps is (10 sec: 49152.0, 60 sec: 45056.0, 300 sec: 44597.8). Total num frames: 812269568. Throughput: 0: 44672.9. Samples: 25465120. Policy #0 lag: (min: 0.0, avg: 11.3, max: 23.0)
[2024-06-06 17:00:13,656][27571] Avg episode reward: [(0, '0.303')]
[2024-06-06 17:00:14,127][27803] Updated weights for policy 0, policy_version 49578 (0.0043)
[2024-06-06 17:00:17,947][27803] Updated weights for policy 0, policy_version 49588 (0.0041)
[2024-06-06 17:00:18,656][27571] Fps is (10 sec: 42598.0, 60 sec: 44237.6, 300 sec: 44542.3). Total num frames: 812449792. Throughput: 0: 44532.8. Samples: 25727960. Policy #0 lag: (min: 0.0, avg: 9.5, max: 21.0)
[2024-06-06 17:00:18,656][27571] Avg episode reward: [(0, '0.311')]
[2024-06-06 17:00:21,230][27803] Updated weights for policy 0, policy_version 49598 (0.0026)
[2024-06-06 17:00:23,656][27571] Fps is (10 sec: 42598.1, 60 sec: 44782.9, 300 sec: 44597.8). Total num frames: 812695552. Throughput: 0: 44563.2. Samples: 25859700. Policy #0 lag: (min: 0.0, avg: 9.5, max: 21.0)
[2024-06-06 17:00:23,660][27571] Avg episode reward: [(0, '0.316')]
[2024-06-06 17:00:25,648][27803] Updated weights for policy 0, policy_version 49608 (0.0029)
[2024-06-06 17:00:28,528][27803] Updated weights for policy 0, policy_version 49618 (0.0035)
[2024-06-06 17:00:28,656][27571] Fps is (10 sec: 49151.7, 60 sec: 45055.9, 300 sec: 44597.8). Total num frames: 812941312. Throughput: 0: 44502.7. Samples: 26130220. Policy #0 lag: (min: 0.0, avg: 9.5, max: 21.0)
[2024-06-06 17:00:28,660][27571] Avg episode reward: [(0, '0.303')]
[2024-06-06 17:00:33,281][27803] Updated weights for policy 0, policy_version 49628 (0.0037)
[2024-06-06 17:00:33,656][27571] Fps is (10 sec: 40959.9, 60 sec: 44236.7, 300 sec: 44486.7). Total num frames: 813105152. Throughput: 0: 44573.2. Samples: 26398140. Policy #0 lag: (min: 0.0, avg: 9.5, max: 21.0)
[2024-06-06 17:00:33,657][27571] Avg episode reward: [(0, '0.312')]
[2024-06-06 17:00:36,232][27803] Updated weights for policy 0, policy_version 49638 (0.0026)
[2024-06-06 17:00:38,656][27571] Fps is (10 sec: 42598.4, 60 sec: 44510.0, 300 sec: 44653.3). Total num frames: 813367296. Throughput: 0: 44503.1. Samples: 26525780. Policy #0 lag: (min: 0.0, avg: 9.5, max: 21.0)
[2024-06-06 17:00:38,656][27571] Avg episode reward: [(0, '0.306')]
[2024-06-06 17:00:40,391][27803] Updated weights for policy 0, policy_version 49648 (0.0031)
[2024-06-06 17:00:43,471][27803] Updated weights for policy 0, policy_version 49658 (0.0036)
[2024-06-06 17:00:43,656][27571] Fps is (10 sec: 49152.0, 60 sec: 44509.9, 300 sec: 44542.3). Total num frames: 813596672. Throughput: 0: 44325.6. Samples: 26795280. Policy #0 lag: (min: 0.0, avg: 9.5, max: 21.0)
[2024-06-06 17:00:43,657][27571] Avg episode reward: [(0, '0.315')]
[2024-06-06 17:00:47,809][27803] Updated weights for policy 0, policy_version 49668 (0.0048)
[2024-06-06 17:00:48,656][27571] Fps is (10 sec: 40959.5, 60 sec: 44236.6, 300 sec: 44431.5). Total num frames: 813776896. Throughput: 0: 44221.1. Samples: 27058800. Policy #0 lag: (min: 0.0, avg: 12.1, max: 23.0)
[2024-06-06 17:00:48,665][27571] Avg episode reward: [(0, '0.316')]
[2024-06-06 17:00:50,742][27803] Updated weights for policy 0, policy_version 49678 (0.0037)
[2024-06-06 17:00:53,656][27571] Fps is (10 sec: 40960.1, 60 sec: 44236.8, 300 sec: 44486.7). Total num frames: 814006272. Throughput: 0: 44295.0. Samples: 27187780. Policy #0 lag: (min: 0.0, avg: 12.1, max: 23.0)
[2024-06-06 17:00:53,656][27571] Avg episode reward: [(0, '0.308')]
[2024-06-06 17:00:54,887][27803] Updated weights for policy 0, policy_version 49688 (0.0032)
[2024-06-06 17:00:57,406][27783] Signal inference workers to stop experience collection... (400 times)
[2024-06-06 17:00:57,428][27803] InferenceWorker_p0-w0: stopping experience collection (400 times)
[2024-06-06 17:00:57,462][27783] Signal inference workers to resume experience collection... (400 times)
[2024-06-06 17:00:57,464][27803] InferenceWorker_p0-w0: resuming experience collection (400 times)
[2024-06-06 17:00:57,987][27803] Updated weights for policy 0, policy_version 49698 (0.0033)
[2024-06-06 17:00:58,656][27571] Fps is (10 sec: 49151.7, 60 sec: 44782.7, 300 sec: 44542.2). Total num frames: 814268416. Throughput: 0: 44337.1. Samples: 27460300. Policy #0 lag: (min: 0.0, avg: 12.1, max: 23.0)
[2024-06-06 17:00:58,657][27571] Avg episode reward: [(0, '0.310')]
[2024-06-06 17:01:02,279][27803] Updated weights for policy 0, policy_version 49708 (0.0030)
[2024-06-06 17:01:03,656][27571] Fps is (10 sec: 42598.7, 60 sec: 44236.8, 300 sec: 44375.7). Total num frames: 814432256. Throughput: 0: 44499.6. Samples: 27730440. Policy #0 lag: (min: 0.0, avg: 12.1, max: 23.0)
[2024-06-06 17:01:03,656][27571] Avg episode reward: [(0, '0.314')]
[2024-06-06 17:01:05,707][27803] Updated weights for policy 0, policy_version 49718 (0.0023)
[2024-06-06 17:01:08,656][27571] Fps is (10 sec: 40961.2, 60 sec: 44236.8, 300 sec: 44486.7). Total num frames: 814678016. Throughput: 0: 44390.3. Samples: 27857260. Policy #0 lag: (min: 0.0, avg: 12.1, max: 23.0)
[2024-06-06 17:01:08,656][27571] Avg episode reward: [(0, '0.310')]
[2024-06-06 17:01:09,287][27803] Updated weights for policy 0, policy_version 49728 (0.0025)
[2024-06-06 17:01:13,149][27803] Updated weights for policy 0, policy_version 49738 (0.0047)
[2024-06-06 17:01:13,660][27571] Fps is (10 sec: 47494.1, 60 sec: 43960.7, 300 sec: 44430.6). Total num frames: 814907392. Throughput: 0: 44429.8. Samples: 28129740. Policy #0 lag: (min: 0.0, avg: 12.1, max: 23.0)
[2024-06-06 17:01:13,661][27571] Avg episode reward: [(0, '0.318')]
[2024-06-06 17:01:16,804][27803] Updated weights for policy 0, policy_version 49748 (0.0037)
[2024-06-06 17:01:18,656][27571] Fps is (10 sec: 44236.4, 60 sec: 44509.8, 300 sec: 44431.2). Total num frames: 815120384. Throughput: 0: 44460.9. Samples: 28398880. Policy #0 lag: (min: 0.0, avg: 8.5, max: 20.0)
[2024-06-06 17:01:18,657][27571] Avg episode reward: [(0, '0.308')]
[2024-06-06 17:01:20,221][27803] Updated weights for policy 0, policy_version 49758 (0.0032)
[2024-06-06 17:01:23,656][27571] Fps is (10 sec: 45894.3, 60 sec: 44510.0, 300 sec: 44597.8). Total num frames: 815366144. Throughput: 0: 44501.9. Samples: 28528360. Policy #0 lag: (min: 0.0, avg: 8.5, max: 20.0)
[2024-06-06 17:01:23,656][27571] Avg episode reward: [(0, '0.309')]
[2024-06-06 17:01:24,530][27803] Updated weights for policy 0, policy_version 49768 (0.0033)
[2024-06-06 17:01:27,702][27803] Updated weights for policy 0, policy_version 49778 (0.0036)
[2024-06-06 17:01:28,656][27571] Fps is (10 sec: 45874.9, 60 sec: 43963.7, 300 sec: 44486.7). Total num frames: 815579136. Throughput: 0: 44380.9. Samples: 28792420. Policy #0 lag: (min: 0.0, avg: 8.5, max: 20.0)
[2024-06-06 17:01:28,657][27571] Avg episode reward: [(0, '0.307')]
[2024-06-06 17:01:31,722][27803] Updated weights for policy 0, policy_version 49788 (0.0033)
[2024-06-06 17:01:33,656][27571] Fps is (10 sec: 44236.6, 60 sec: 45056.1, 300 sec: 44542.3). Total num frames: 815808512. Throughput: 0: 44566.9. Samples: 29064300. Policy #0 lag: (min: 0.0, avg: 8.5, max: 20.0)
[2024-06-06 17:01:33,656][27571] Avg episode reward: [(0, '0.303')]
[2024-06-06 17:01:35,275][27803] Updated weights for policy 0, policy_version 49798 (0.0037)
[2024-06-06 17:01:38,656][27571] Fps is (10 sec: 45874.8, 60 sec: 44509.8, 300 sec: 44597.8). Total num frames: 816037888. Throughput: 0: 44629.7. Samples: 29196120. Policy #0 lag: (min: 0.0, avg: 8.5, max: 20.0)
[2024-06-06 17:01:38,656][27571] Avg episode reward: [(0, '0.309')]
[2024-06-06 17:01:38,769][27803] Updated weights for policy 0, policy_version 49808 (0.0027)
[2024-06-06 17:01:42,692][27803] Updated weights for policy 0, policy_version 49818 (0.0044)
[2024-06-06 17:01:43,656][27571] Fps is (10 sec: 42597.9, 60 sec: 43963.7, 300 sec: 44486.7). Total num frames: 816234496. Throughput: 0: 44622.8. Samples: 29468320. Policy #0 lag: (min: 0.0, avg: 8.5, max: 20.0)
[2024-06-06 17:01:43,657][27571] Avg episode reward: [(0, '0.314')]
[2024-06-06 17:01:43,784][27783] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000049820_816250880.pth...
[2024-06-06 17:01:43,832][27783] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000049168_805568512.pth
[2024-06-06 17:01:46,463][27803] Updated weights for policy 0, policy_version 49828 (0.0027)
[2024-06-06 17:01:48,656][27571] Fps is (10 sec: 42598.9, 60 sec: 44783.0, 300 sec: 44431.2). Total num frames: 816463872. Throughput: 0: 44613.7. Samples: 29738060. Policy #0 lag: (min: 0.0, avg: 8.5, max: 20.0)
[2024-06-06 17:01:48,656][27571] Avg episode reward: [(0, '0.309')]
[2024-06-06 17:01:49,872][27803] Updated weights for policy 0, policy_version 49838 (0.0030)
[2024-06-06 17:01:53,656][27571] Fps is (10 sec: 45875.1, 60 sec: 44782.9, 300 sec: 44597.8). Total num frames: 816693248. Throughput: 0: 44576.7. Samples: 29863220. Policy #0 lag: (min: 0.0, avg: 9.9, max: 22.0)
[2024-06-06 17:01:53,657][27571] Avg episode reward: [(0, '0.307')]
[2024-06-06 17:01:54,156][27803] Updated weights for policy 0, policy_version 49848 (0.0043)
[2024-06-06 17:01:57,391][27803] Updated weights for policy 0, policy_version 49858 (0.0026)
[2024-06-06 17:01:58,656][27571] Fps is (10 sec: 45875.2, 60 sec: 44236.9, 300 sec: 44542.3). Total num frames: 816922624. Throughput: 0: 44514.6. Samples: 30132720. Policy #0 lag: (min: 0.0, avg: 9.9, max: 22.0)
[2024-06-06 17:01:58,657][27571] Avg episode reward: [(0, '0.306')]
[2024-06-06 17:02:01,366][27803] Updated weights for policy 0, policy_version 49868 (0.0034)
[2024-06-06 17:02:03,656][27571] Fps is (10 sec: 44236.9, 60 sec: 45055.9, 300 sec: 44486.7). Total num frames: 817135616. Throughput: 0: 44476.8. Samples: 30400340. Policy #0 lag: (min: 0.0, avg: 9.9, max: 22.0)
[2024-06-06 17:02:03,657][27571] Avg episode reward: [(0, '0.315')]
[2024-06-06 17:02:04,944][27803] Updated weights for policy 0, policy_version 49878 (0.0033)
[2024-06-06 17:02:08,656][27571] Fps is (10 sec: 42598.5, 60 sec: 44509.8, 300 sec: 44542.3). Total num frames: 817348608. Throughput: 0: 44490.5. Samples: 30530440. Policy #0 lag: (min: 0.0, avg: 9.9, max: 22.0)
[2024-06-06 17:02:08,656][27571] Avg episode reward: [(0, '0.312')]
[2024-06-06 17:02:08,750][27803] Updated weights for policy 0, policy_version 49888 (0.0024)
[2024-06-06 17:02:12,131][27803] Updated weights for policy 0, policy_version 49898 (0.0030)
[2024-06-06 17:02:13,656][27571] Fps is (10 sec: 45874.9, 60 sec: 44785.9, 300 sec: 44597.8). Total num frames: 817594368. Throughput: 0: 44510.2. Samples: 30795380. Policy #0 lag: (min: 0.0, avg: 9.9, max: 22.0)
[2024-06-06 17:02:13,656][27571] Avg episode reward: [(0, '0.301')]
[2024-06-06 17:02:16,253][27803] Updated weights for policy 0, policy_version 49908 (0.0033)
[2024-06-06 17:02:18,656][27571] Fps is (10 sec: 44236.9, 60 sec: 44509.9, 300 sec: 44431.2). Total num frames: 817790976. Throughput: 0: 44550.2. Samples: 31069060. Policy #0 lag: (min: 0.0, avg: 9.9, max: 22.0)
[2024-06-06 17:02:18,656][27571] Avg episode reward: [(0, '0.312')]
[2024-06-06 17:02:19,394][27803] Updated weights for policy 0, policy_version 49918 (0.0033)
[2024-06-06 17:02:21,446][27783] Signal inference workers to stop experience collection... (450 times)
[2024-06-06 17:02:21,446][27783] Signal inference workers to resume experience collection... (450 times)
[2024-06-06 17:02:21,476][27803] InferenceWorker_p0-w0: stopping experience collection (450 times)
[2024-06-06 17:02:21,476][27803] InferenceWorker_p0-w0: resuming experience collection (450 times)
[2024-06-06 17:02:23,656][27571] Fps is (10 sec: 40961.0, 60 sec: 43963.7, 300 sec: 44375.7). Total num frames: 818003968. Throughput: 0: 44488.3. Samples: 31198080. Policy #0 lag: (min: 0.0, avg: 11.6, max: 21.0)
[2024-06-06 17:02:23,656][27571] Avg episode reward: [(0, '0.306')]
[2024-06-06 17:02:23,789][27803] Updated weights for policy 0, policy_version 49928 (0.0028)
[2024-06-06 17:02:27,020][27803] Updated weights for policy 0, policy_version 49938 (0.0031)
[2024-06-06 17:02:28,656][27571] Fps is (10 sec: 47513.9, 60 sec: 44783.0, 300 sec: 44597.8). Total num frames: 818266112. Throughput: 0: 44377.9. Samples: 31465320. Policy #0 lag: (min: 0.0, avg: 11.6, max: 21.0)
[2024-06-06 17:02:28,656][27571] Avg episode reward: [(0, '0.306')]
[2024-06-06 17:02:30,953][27803] Updated weights for policy 0, policy_version 49948 (0.0036)
[2024-06-06 17:02:33,656][27571] Fps is (10 sec: 44236.2, 60 sec: 43963.7, 300 sec: 44376.3). Total num frames: 818446336. Throughput: 0: 44340.9. Samples: 31733400. Policy #0 lag: (min: 0.0, avg: 11.6, max: 21.0)
[2024-06-06 17:02:33,657][27571] Avg episode reward: [(0, '0.301')]
[2024-06-06 17:02:34,453][27803] Updated weights for policy 0, policy_version 49958 (0.0030)
[2024-06-06 17:02:38,011][27803] Updated weights for policy 0, policy_version 49968 (0.0033)
[2024-06-06 17:02:38,656][27571] Fps is (10 sec: 42597.9, 60 sec: 44236.9, 300 sec: 44542.2). Total num frames: 818692096. Throughput: 0: 44522.2. Samples: 31866720. Policy #0 lag: (min: 0.0, avg: 11.6, max: 21.0)
[2024-06-06 17:02:38,657][27571] Avg episode reward: [(0, '0.306')]
[2024-06-06 17:02:41,593][27803] Updated weights for policy 0, policy_version 49978 (0.0036)
[2024-06-06 17:02:43,656][27571] Fps is (10 sec: 49151.7, 60 sec: 45056.0, 300 sec: 44597.8). Total num frames: 818937856. Throughput: 0: 44407.1. Samples: 32131040. Policy #0 lag: (min: 0.0, avg: 11.6, max: 21.0)
[2024-06-06 17:02:43,657][27571] Avg episode reward: [(0, '0.305')]
[2024-06-06 17:02:45,563][27803] Updated weights for policy 0, policy_version 49988 (0.0045)
[2024-06-06 17:02:48,656][27571] Fps is (10 sec: 44236.7, 60 sec: 44509.8, 300 sec: 44431.8). Total num frames: 819134464. Throughput: 0: 44431.1. Samples: 32399740. Policy #0 lag: (min: 0.0, avg: 11.6, max: 21.0)
[2024-06-06 17:02:48,656][27571] Avg episode reward: [(0, '0.312')]
[2024-06-06 17:02:49,095][27803] Updated weights for policy 0, policy_version 49998 (0.0026)
[2024-06-06 17:02:53,072][27803] Updated weights for policy 0, policy_version 50008 (0.0032)
[2024-06-06 17:02:53,656][27571] Fps is (10 sec: 39321.6, 60 sec: 43963.7, 300 sec: 44375.8). Total num frames: 819331072. Throughput: 0: 44357.7. Samples: 32526540.
Policy #0 lag: (min: 0.0, avg: 11.0, max: 22.0) [2024-06-06 17:02:53,656][27571] Avg episode reward: [(0, '0.311')] [2024-06-06 17:02:56,506][27803] Updated weights for policy 0, policy_version 50018 (0.0033) [2024-06-06 17:02:58,656][27571] Fps is (10 sec: 45876.1, 60 sec: 44510.0, 300 sec: 44597.8). Total num frames: 819593216. Throughput: 0: 44411.8. Samples: 32793900. Policy #0 lag: (min: 0.0, avg: 11.0, max: 22.0) [2024-06-06 17:02:58,656][27571] Avg episode reward: [(0, '0.314')] [2024-06-06 17:03:00,465][27803] Updated weights for policy 0, policy_version 50028 (0.0032) [2024-06-06 17:03:03,656][27571] Fps is (10 sec: 45875.9, 60 sec: 44236.9, 300 sec: 44486.7). Total num frames: 819789824. Throughput: 0: 44392.1. Samples: 33066700. Policy #0 lag: (min: 0.0, avg: 11.0, max: 22.0) [2024-06-06 17:03:03,656][27571] Avg episode reward: [(0, '0.309')] [2024-06-06 17:03:04,047][27803] Updated weights for policy 0, policy_version 50038 (0.0034) [2024-06-06 17:03:07,703][27803] Updated weights for policy 0, policy_version 50048 (0.0022) [2024-06-06 17:03:08,656][27571] Fps is (10 sec: 40959.7, 60 sec: 44236.9, 300 sec: 44431.2). Total num frames: 820002816. Throughput: 0: 44394.6. Samples: 33195840. Policy #0 lag: (min: 0.0, avg: 11.0, max: 22.0) [2024-06-06 17:03:08,656][27571] Avg episode reward: [(0, '0.305')] [2024-06-06 17:03:11,185][27803] Updated weights for policy 0, policy_version 50058 (0.0028) [2024-06-06 17:03:13,656][27571] Fps is (10 sec: 47513.4, 60 sec: 44510.0, 300 sec: 44653.4). Total num frames: 820264960. Throughput: 0: 44355.1. Samples: 33461300. Policy #0 lag: (min: 0.0, avg: 11.0, max: 22.0) [2024-06-06 17:03:13,656][27571] Avg episode reward: [(0, '0.305')] [2024-06-06 17:03:14,847][27803] Updated weights for policy 0, policy_version 50068 (0.0029) [2024-06-06 17:03:18,605][27803] Updated weights for policy 0, policy_version 50078 (0.0020) [2024-06-06 17:03:18,656][27571] Fps is (10 sec: 47513.6, 60 sec: 44783.0, 300 sec: 44486.7). 
Total num frames: 820477952. Throughput: 0: 44431.2. Samples: 33732800. Policy #0 lag: (min: 0.0, avg: 11.0, max: 22.0) [2024-06-06 17:03:18,656][27571] Avg episode reward: [(0, '0.301')] [2024-06-06 17:03:22,103][27803] Updated weights for policy 0, policy_version 50088 (0.0030) [2024-06-06 17:03:23,656][27571] Fps is (10 sec: 40960.1, 60 sec: 44509.8, 300 sec: 44486.7). Total num frames: 820674560. Throughput: 0: 44482.3. Samples: 33868420. Policy #0 lag: (min: 0.0, avg: 8.7, max: 20.0) [2024-06-06 17:03:23,656][27571] Avg episode reward: [(0, '0.306')] [2024-06-06 17:03:25,964][27803] Updated weights for policy 0, policy_version 50098 (0.0030) [2024-06-06 17:03:28,660][27571] Fps is (10 sec: 44218.7, 60 sec: 44233.8, 300 sec: 44597.2). Total num frames: 820920320. Throughput: 0: 44450.3. Samples: 34131480. Policy #0 lag: (min: 0.0, avg: 8.7, max: 20.0) [2024-06-06 17:03:28,661][27571] Avg episode reward: [(0, '0.309')] [2024-06-06 17:03:29,830][27803] Updated weights for policy 0, policy_version 50108 (0.0029) [2024-06-06 17:03:31,565][27783] Signal inference workers to stop experience collection... (500 times) [2024-06-06 17:03:31,612][27783] Signal inference workers to resume experience collection... (500 times) [2024-06-06 17:03:31,613][27803] InferenceWorker_p0-w0: stopping experience collection (500 times) [2024-06-06 17:03:31,639][27803] InferenceWorker_p0-w0: resuming experience collection (500 times) [2024-06-06 17:03:33,573][27803] Updated weights for policy 0, policy_version 50118 (0.0034) [2024-06-06 17:03:33,656][27571] Fps is (10 sec: 45875.4, 60 sec: 44783.0, 300 sec: 44431.2). Total num frames: 821133312. Throughput: 0: 44595.3. Samples: 34406520. Policy #0 lag: (min: 0.0, avg: 8.7, max: 20.0) [2024-06-06 17:03:33,656][27571] Avg episode reward: [(0, '0.309')] [2024-06-06 17:03:37,038][27803] Updated weights for policy 0, policy_version 50128 (0.0026) [2024-06-06 17:03:38,660][27571] Fps is (10 sec: 39321.2, 60 sec: 43687.7, 300 sec: 44375.0). 
Total num frames: 821313536. Throughput: 0: 44612.9. Samples: 34534300. Policy #0 lag: (min: 0.0, avg: 8.7, max: 20.0) [2024-06-06 17:03:38,661][27571] Avg episode reward: [(0, '0.306')] [2024-06-06 17:03:40,601][27803] Updated weights for policy 0, policy_version 50138 (0.0027) [2024-06-06 17:03:43,656][27571] Fps is (10 sec: 45874.9, 60 sec: 44236.9, 300 sec: 44542.3). Total num frames: 821592064. Throughput: 0: 44463.5. Samples: 34794760. Policy #0 lag: (min: 0.0, avg: 8.7, max: 20.0) [2024-06-06 17:03:43,656][27571] Avg episode reward: [(0, '0.308')] [2024-06-06 17:03:43,734][27783] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000050147_821608448.pth... [2024-06-06 17:03:43,785][27783] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000049495_810926080.pth [2024-06-06 17:03:44,666][27803] Updated weights for policy 0, policy_version 50148 (0.0027) [2024-06-06 17:03:48,182][27803] Updated weights for policy 0, policy_version 50158 (0.0031) [2024-06-06 17:03:48,656][27571] Fps is (10 sec: 49172.8, 60 sec: 44510.0, 300 sec: 44431.2). Total num frames: 821805056. Throughput: 0: 44334.2. Samples: 35061740. Policy #0 lag: (min: 0.0, avg: 8.7, max: 20.0) [2024-06-06 17:03:48,656][27571] Avg episode reward: [(0, '0.302')] [2024-06-06 17:03:51,744][27803] Updated weights for policy 0, policy_version 50168 (0.0031) [2024-06-06 17:03:53,656][27571] Fps is (10 sec: 40960.2, 60 sec: 44510.0, 300 sec: 44431.2). Total num frames: 822001664. Throughput: 0: 44408.0. Samples: 35194200. Policy #0 lag: (min: 0.0, avg: 8.7, max: 20.0) [2024-06-06 17:03:53,656][27571] Avg episode reward: [(0, '0.304')] [2024-06-06 17:03:55,481][27803] Updated weights for policy 0, policy_version 50178 (0.0041) [2024-06-06 17:03:58,656][27571] Fps is (10 sec: 44236.5, 60 sec: 44236.7, 300 sec: 44542.3). Total num frames: 822247424. Throughput: 0: 44328.4. Samples: 35456080. 
Policy #0 lag: (min: 0.0, avg: 9.1, max: 21.0) [2024-06-06 17:03:58,658][27571] Avg episode reward: [(0, '0.302')] [2024-06-06 17:03:59,259][27803] Updated weights for policy 0, policy_version 50188 (0.0035) [2024-06-06 17:04:03,079][27803] Updated weights for policy 0, policy_version 50198 (0.0028) [2024-06-06 17:04:03,656][27571] Fps is (10 sec: 47513.4, 60 sec: 44782.9, 300 sec: 44486.8). Total num frames: 822476800. Throughput: 0: 44474.6. Samples: 35734160. Policy #0 lag: (min: 0.0, avg: 9.1, max: 21.0) [2024-06-06 17:04:03,656][27571] Avg episode reward: [(0, '0.298')] [2024-06-06 17:04:06,765][27803] Updated weights for policy 0, policy_version 50208 (0.0024) [2024-06-06 17:04:08,656][27571] Fps is (10 sec: 40960.2, 60 sec: 44236.8, 300 sec: 44375.6). Total num frames: 822657024. Throughput: 0: 44370.2. Samples: 35865080. Policy #0 lag: (min: 0.0, avg: 9.1, max: 21.0) [2024-06-06 17:04:08,656][27571] Avg episode reward: [(0, '0.307')] [2024-06-06 17:04:10,371][27803] Updated weights for policy 0, policy_version 50218 (0.0043) [2024-06-06 17:04:13,656][27571] Fps is (10 sec: 44237.0, 60 sec: 44236.8, 300 sec: 44486.9). Total num frames: 822919168. Throughput: 0: 44311.2. Samples: 36125300. Policy #0 lag: (min: 0.0, avg: 9.1, max: 21.0) [2024-06-06 17:04:13,656][27571] Avg episode reward: [(0, '0.310')] [2024-06-06 17:04:14,448][27803] Updated weights for policy 0, policy_version 50228 (0.0024) [2024-06-06 17:04:17,667][27803] Updated weights for policy 0, policy_version 50238 (0.0025) [2024-06-06 17:04:18,656][27571] Fps is (10 sec: 49152.4, 60 sec: 44509.9, 300 sec: 44542.3). Total num frames: 823148544. Throughput: 0: 44264.9. Samples: 36398440. Policy #0 lag: (min: 0.0, avg: 9.1, max: 21.0) [2024-06-06 17:04:18,656][27571] Avg episode reward: [(0, '0.306')] [2024-06-06 17:04:21,464][27803] Updated weights for policy 0, policy_version 50248 (0.0033) [2024-06-06 17:04:23,656][27571] Fps is (10 sec: 40959.6, 60 sec: 44236.7, 300 sec: 44375.6). 
Total num frames: 823328768. Throughput: 0: 44355.2. Samples: 36530100. Policy #0 lag: (min: 0.0, avg: 9.1, max: 21.0) [2024-06-06 17:04:23,657][27571] Avg episode reward: [(0, '0.311')] [2024-06-06 17:04:24,918][27803] Updated weights for policy 0, policy_version 50258 (0.0038) [2024-06-06 17:04:27,664][27783] Signal inference workers to stop experience collection... (550 times) [2024-06-06 17:04:27,665][27783] Signal inference workers to resume experience collection... (550 times) [2024-06-06 17:04:27,680][27803] InferenceWorker_p0-w0: stopping experience collection (550 times) [2024-06-06 17:04:27,680][27803] InferenceWorker_p0-w0: resuming experience collection (550 times) [2024-06-06 17:04:28,660][27571] Fps is (10 sec: 42580.5, 60 sec: 44236.8, 300 sec: 44486.1). Total num frames: 823574528. Throughput: 0: 44345.7. Samples: 36790500. Policy #0 lag: (min: 0.0, avg: 12.0, max: 24.0) [2024-06-06 17:04:28,661][27571] Avg episode reward: [(0, '0.309')] [2024-06-06 17:04:29,042][27803] Updated weights for policy 0, policy_version 50268 (0.0028) [2024-06-06 17:04:32,703][27803] Updated weights for policy 0, policy_version 50278 (0.0035) [2024-06-06 17:04:33,656][27571] Fps is (10 sec: 47514.7, 60 sec: 44509.9, 300 sec: 44431.2). Total num frames: 823803904. Throughput: 0: 44379.2. Samples: 37058800. Policy #0 lag: (min: 0.0, avg: 12.0, max: 24.0) [2024-06-06 17:04:33,656][27571] Avg episode reward: [(0, '0.310')] [2024-06-06 17:04:36,413][27803] Updated weights for policy 0, policy_version 50288 (0.0039) [2024-06-06 17:04:38,656][27571] Fps is (10 sec: 42615.8, 60 sec: 44786.0, 300 sec: 44320.1). Total num frames: 824000512. Throughput: 0: 44368.4. Samples: 37190780. Policy #0 lag: (min: 0.0, avg: 12.0, max: 24.0) [2024-06-06 17:04:38,657][27571] Avg episode reward: [(0, '0.308')] [2024-06-06 17:04:40,052][27803] Updated weights for policy 0, policy_version 50298 (0.0047) [2024-06-06 17:04:43,656][27571] Fps is (10 sec: 42597.0, 60 sec: 43963.6, 300 sec: 44431.2). 
Total num frames: 824229888. Throughput: 0: 44438.1. Samples: 37455800. Policy #0 lag: (min: 0.0, avg: 12.0, max: 24.0) [2024-06-06 17:04:43,657][27571] Avg episode reward: [(0, '0.314')] [2024-06-06 17:04:43,890][27803] Updated weights for policy 0, policy_version 50308 (0.0033) [2024-06-06 17:04:47,380][27803] Updated weights for policy 0, policy_version 50318 (0.0022) [2024-06-06 17:04:48,656][27571] Fps is (10 sec: 49152.0, 60 sec: 44782.9, 300 sec: 44542.3). Total num frames: 824492032. Throughput: 0: 44124.0. Samples: 37719740. Policy #0 lag: (min: 0.0, avg: 12.0, max: 24.0) [2024-06-06 17:04:48,656][27571] Avg episode reward: [(0, '0.307')] [2024-06-06 17:04:50,961][27803] Updated weights for policy 0, policy_version 50328 (0.0030) [2024-06-06 17:04:53,660][27571] Fps is (10 sec: 44219.6, 60 sec: 44506.8, 300 sec: 44375.0). Total num frames: 824672256. Throughput: 0: 44211.6. Samples: 37854780. Policy #0 lag: (min: 0.0, avg: 12.0, max: 24.0) [2024-06-06 17:04:53,660][27571] Avg episode reward: [(0, '0.306')] [2024-06-06 17:04:54,716][27803] Updated weights for policy 0, policy_version 50338 (0.0026) [2024-06-06 17:04:58,451][27803] Updated weights for policy 0, policy_version 50348 (0.0025) [2024-06-06 17:04:58,656][27571] Fps is (10 sec: 40959.6, 60 sec: 44236.7, 300 sec: 44486.7). Total num frames: 824901632. Throughput: 0: 44382.5. Samples: 38122520. Policy #0 lag: (min: 0.0, avg: 10.5, max: 22.0) [2024-06-06 17:04:58,657][27571] Avg episode reward: [(0, '0.302')] [2024-06-06 17:05:01,955][27803] Updated weights for policy 0, policy_version 50358 (0.0035) [2024-06-06 17:05:03,656][27571] Fps is (10 sec: 49171.5, 60 sec: 44782.9, 300 sec: 44542.2). Total num frames: 825163776. Throughput: 0: 44352.7. Samples: 38394320. 
Policy #0 lag: (min: 0.0, avg: 10.5, max: 22.0) [2024-06-06 17:05:03,660][27571] Avg episode reward: [(0, '0.309')] [2024-06-06 17:05:05,435][27803] Updated weights for policy 0, policy_version 50368 (0.0036) [2024-06-06 17:05:08,656][27571] Fps is (10 sec: 44237.4, 60 sec: 44782.9, 300 sec: 44320.1). Total num frames: 825344000. Throughput: 0: 44484.1. Samples: 38531880. Policy #0 lag: (min: 0.0, avg: 10.5, max: 22.0) [2024-06-06 17:05:08,656][27571] Avg episode reward: [(0, '0.309')] [2024-06-06 17:05:09,480][27803] Updated weights for policy 0, policy_version 50378 (0.0043) [2024-06-06 17:05:12,947][27803] Updated weights for policy 0, policy_version 50388 (0.0044) [2024-06-06 17:05:13,656][27571] Fps is (10 sec: 40959.8, 60 sec: 44236.7, 300 sec: 44486.7). Total num frames: 825573376. Throughput: 0: 44601.3. Samples: 38797380. Policy #0 lag: (min: 0.0, avg: 10.5, max: 22.0) [2024-06-06 17:05:13,656][27571] Avg episode reward: [(0, '0.308')] [2024-06-06 17:05:16,735][27803] Updated weights for policy 0, policy_version 50398 (0.0028) [2024-06-06 17:05:18,656][27571] Fps is (10 sec: 47513.6, 60 sec: 44509.8, 300 sec: 44486.7). Total num frames: 825819136. Throughput: 0: 44501.2. Samples: 39061360. Policy #0 lag: (min: 0.0, avg: 10.5, max: 22.0) [2024-06-06 17:05:18,656][27571] Avg episode reward: [(0, '0.305')] [2024-06-06 17:05:20,619][27803] Updated weights for policy 0, policy_version 50408 (0.0039) [2024-06-06 17:05:23,656][27571] Fps is (10 sec: 44237.9, 60 sec: 44783.1, 300 sec: 44320.1). Total num frames: 826015744. Throughput: 0: 44583.2. Samples: 39197020. Policy #0 lag: (min: 0.0, avg: 10.5, max: 22.0) [2024-06-06 17:05:23,656][27571] Avg episode reward: [(0, '0.307')] [2024-06-06 17:05:23,870][27803] Updated weights for policy 0, policy_version 50418 (0.0022) [2024-06-06 17:05:27,864][27803] Updated weights for policy 0, policy_version 50428 (0.0031) [2024-06-06 17:05:28,656][27571] Fps is (10 sec: 42598.0, 60 sec: 44512.8, 300 sec: 44542.3). 
Total num frames: 826245120. Throughput: 0: 44615.6. Samples: 39463500. Policy #0 lag: (min: 0.0, avg: 10.5, max: 22.0) [2024-06-06 17:05:28,656][27571] Avg episode reward: [(0, '0.303')] [2024-06-06 17:05:31,207][27803] Updated weights for policy 0, policy_version 50438 (0.0031) [2024-06-06 17:05:33,656][27571] Fps is (10 sec: 45874.8, 60 sec: 44509.8, 300 sec: 44431.2). Total num frames: 826474496. Throughput: 0: 44614.7. Samples: 39727400. Policy #0 lag: (min: 0.0, avg: 9.4, max: 22.0) [2024-06-06 17:05:33,656][27571] Avg episode reward: [(0, '0.305')] [2024-06-06 17:05:34,955][27803] Updated weights for policy 0, policy_version 50448 (0.0027) [2024-06-06 17:05:38,660][27571] Fps is (10 sec: 44219.2, 60 sec: 44779.9, 300 sec: 44375.0). Total num frames: 826687488. Throughput: 0: 44738.2. Samples: 39868000. Policy #0 lag: (min: 0.0, avg: 9.4, max: 22.0) [2024-06-06 17:05:38,661][27571] Avg episode reward: [(0, '0.298')] [2024-06-06 17:05:38,778][27803] Updated weights for policy 0, policy_version 50458 (0.0036) [2024-06-06 17:05:38,941][27783] Signal inference workers to stop experience collection... (600 times) [2024-06-06 17:05:38,941][27783] Signal inference workers to resume experience collection... (600 times) [2024-06-06 17:05:38,980][27803] InferenceWorker_p0-w0: stopping experience collection (600 times) [2024-06-06 17:05:38,980][27803] InferenceWorker_p0-w0: resuming experience collection (600 times) [2024-06-06 17:05:42,521][27803] Updated weights for policy 0, policy_version 50468 (0.0034) [2024-06-06 17:05:43,656][27571] Fps is (10 sec: 44237.0, 60 sec: 44783.1, 300 sec: 44542.3). Total num frames: 826916864. Throughput: 0: 44730.9. Samples: 40135400. Policy #0 lag: (min: 0.0, avg: 9.4, max: 22.0) [2024-06-06 17:05:43,656][27571] Avg episode reward: [(0, '0.304')] [2024-06-06 17:05:43,687][27783] Saving /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000050471_826916864.pth... 
[2024-06-06 17:05:43,747][27783] Removing /workspace/metta/train_dir/p2.metta.4/checkpoint_p0/checkpoint_000049820_816250880.pth [2024-06-06 17:05:46,102][27803] Updated weights for policy 0, policy_version 50478 (0.0036) [2024-06-06 17:05:48,656][27571] Fps is (10 sec: 44255.1, 60 sec: 43963.8, 300 sec: 44486.7). Total num frames: 827129856. Throughput: 0: 44591.7. Samples: 40400940. Policy #0 lag: (min: 0.0, avg: 9.4, max: 22.0) [2024-06-06 17:05:48,656][27571] Avg episode reward: [(0, '0.306')] [2024-06-06 17:05:50,047][27803] Updated weights for policy 0, policy_version 50488 (0.0032) [2024-06-06 17:05:53,472][27803] Updated weights for policy 0, policy_version 50498 (0.0032) [2024-06-06 17:05:53,656][27571] Fps is (10 sec: 44236.9, 60 sec: 44786.0, 300 sec: 44375.7). Total num frames: 827359232. Throughput: 0: 44501.4. Samples: 40534440. Policy #0 lag: (min: 0.0, avg: 9.4, max: 22.0) [2024-06-06 17:05:53,656][27571] Avg episode reward: [(0, '0.305')] [2024-06-06 17:05:57,161][27803] Updated weights for policy 0, policy_version 50508 (0.0020) [2024-06-06 17:05:58,656][27571] Fps is (10 sec: 44236.5, 60 sec: 44510.0, 300 sec: 44542.3). Total num frames: 827572224. Throughput: 0: 44610.3. Samples: 40804840. Policy #0 lag: (min: 0.0, avg: 9.4, max: 22.0) [2024-06-06 17:05:58,656][27571] Avg episode reward: [(0, '0.307')] [2024-06-06 17:06:00,643][27803] Updated weights for policy 0, policy_version 50518 (0.0029) [2024-06-06 17:06:03,656][27571] Fps is (10 sec: 45874.5, 60 sec: 44236.8, 300 sec: 44542.2). Total num frames: 827817984. Throughput: 0: 44669.7. Samples: 41071500. Policy #0 lag: (min: 0.0, avg: 11.9, max: 22.0) [2024-06-06 17:06:03,656][27571] Avg episode reward: [(0, '0.313')] [2024-06-06 17:06:04,253][27803] Updated weights for policy 0, policy_version 50528 (0.0029) [2024-06-06 17:06:08,018][27803] Updated weights for policy 0, policy_version 50538 (0.0048) [2024-06-06 17:06:08,656][27571] Fps is (10 sec: 47514.1, 60 sec: 45056.1, 300 sec: 44542.9). 
Total num frames: 828047360. Throughput: 0: 44726.2. Samples: 41209700. Policy #0 lag: (min: 0.0, avg: 11.9, max: 22.0) [2024-06-06 17:06:08,656][27571] Avg episode reward: [(0, '0.303')] [2024-06-06 17:06:11,783][27803] Updated weights for policy 0, policy_version 50548 (0.0027) [2024-06-06 17:06:13,656][27571] Fps is (10 sec: 40960.6, 60 sec: 44236.9, 300 sec: 44431.2). Total num frames: 828227584. Throughput: 0: 44646.4. Samples: 41472580. Policy #0 lag: (min: 0.0, avg: 11.9, max: 22.0) [2024-06-06 17:06:13,656][27571] Avg episode reward: [(0, '0.309')] [2024-06-06 17:06:15,605][27803] Updated weights for policy 0, policy_version 50558 (0.0030) [2024-06-06 17:06:18,656][27571] Fps is (10 sec: 42597.4, 60 sec: 44236.7, 300 sec: 44431.2). Total num frames: 828473344. Throughput: 0: 44599.8. Samples: 41734400. Policy #0 lag: (min: 0.0, avg: 11.9, max: 22.0) [2024-06-06 17:06:18,656][27571] Avg episode reward: [(0, '0.308')] [2024-06-06 17:06:19,575][27803] Updated weights for policy 0, policy_version 50568 (0.0023) [2024-06-06 17:06:22,825][27803] Updated weights for policy 0, policy_version 50578 (0.0026) [2024-06-06 17:06:23,656][27571] Fps is (10 sec: 47513.2, 60 sec: 44782.8, 300 sec: 44486.7). Total num frames: 828702720. Throughput: 0: 44509.3. Samples: 41870740. Policy #0 lag: (min: 0.0, avg: 11.9, max: 22.0) [2024-06-06 17:06:23,656][27571] Avg episode reward: [(0, '0.315')] [2024-06-06 17:06:26,697][27803] Updated weights for policy 0, policy_version 50588 (0.0038) [2024-06-06 17:06:28,656][27571] Fps is (10 sec: 42599.7, 60 sec: 44237.0, 300 sec: 44375.7). Total num frames: 828899328. Throughput: 0: 44485.9. Samples: 42137260. Policy #0 lag: (min: 0.0, avg: 11.9, max: 22.0) [2024-06-06 17:06:28,656][27571] Avg episode reward: [(0, '0.316')] [2024-06-06 17:06:29,950][27803] Updated weights for policy 0, policy_version 50598 (0.0028) [2024-06-06 17:06:33,656][27571] Fps is (10 sec: 44237.4, 60 sec: 44509.9, 300 sec: 44431.2). Total num frames: 829145088. 
Throughput: 0: 44616.5. Samples: 42408680. Policy #0 lag: (min: 0.0, avg: 12.0, max: 23.0) [2024-06-06 17:06:33,656][27571] Avg episode reward: [(0, '0.310')] [2024-06-06 17:06:33,741][27803] Updated weights for policy 0, policy_version 50608 (0.0032) [2024-06-06 17:06:37,432][27803] Updated weights for policy 0, policy_version 50618 (0.0034) [2024-06-06 17:06:38,656][27571] Fps is (10 sec: 47512.8, 60 sec: 44786.0, 300 sec: 44542.3). Total num frames: 829374464. Throughput: 0: 44643.9. Samples: 42543420. Policy #0 lag: (min: 0.0, avg: 12.0, max: 23.0) [2024-06-06 17:06:38,656][27571] Avg episode reward: [(0, '0.321')] [2024-06-06 17:06:41,340][27803] Updated weights for policy 0, policy_version 50628 (0.0028) [2024-06-06 17:06:43,656][27571] Fps is (10 sec: 44235.9, 60 sec: 44509.8, 300 sec: 44486.7). Total num frames: 829587456. Throughput: 0: 44503.9. Samples: 42807520. Policy #0 lag: (min: 0.0, avg: 12.0, max: 23.0) [2024-06-06 17:06:43,656][27571] Avg episode reward: [(0, '0.320')] [2024-06-06 17:06:45,034][27803] Updated weights for policy 0, policy_version 50638 (0.0037) [2024-06-06 17:06:48,656][27571] Fps is (10 sec: 40960.1, 60 sec: 44236.8, 300 sec: 44375.7). Total num frames: 829784064. Throughput: 0: 44412.1. Samples: 43070040. Policy #0 lag: (min: 0.0, avg: 12.0, max: 23.0) [2024-06-06 17:06:48,660][27571] Avg episode reward: [(0, '0.325')] [2024-06-06 17:06:49,022][27803] Updated weights for policy 0, policy_version 50648 (0.0035) [2024-06-06 17:06:51,450][27783] Signal inference workers to stop experience collection... (650 times) [2024-06-06 17:06:51,451][27783] Signal inference workers to resume experience collection... 
(650 times) [2024-06-06 17:06:51,477][27803] InferenceWorker_p0-w0: stopping experience collection (650 times) [2024-06-06 17:06:51,477][27803] InferenceWorker_p0-w0: resuming experience collection (650 times) [2024-06-06 17:06:52,299][27803] Updated weights for policy 0, policy_version 50658 (0.0037) [2024-06-06 17:06:53,656][27571] Fps is (10 sec: 44236.9, 60 sec: 44509.8, 300 sec: 44431.2). Total num frames: 830029824. Throughput: 0: 44274.5. Samples: 43202060. Policy #0 lag: (min: 0.0, avg: 12.0, max: 23.0) [2024-06-06 17:06:53,659][27571] Avg episode reward: [(0, '0.316')] [2024-06-06 17:06:56,066][27803] Updated weights for policy 0, policy_version 50668 (0.0042) [2024-06-06 17:06:58,656][27571] Fps is (10 sec: 44237.0, 60 sec: 44236.9, 300 sec: 44375.7). Total num frames: 830226432. Throughput: 0: 44328.9. Samples: 43467380. Policy #0 lag: (min: 0.0, avg: 12.0, max: 23.0) [2024-06-06 17:06:58,656][27571] Avg episode reward: [(0, '0.307')] [2024-06-06 17:06:59,432][27803] Updated weights for policy 0, policy_version 50678 (0.0041) [2024-06-06 17:07:03,098][27803] Updated weights for policy 0, policy_version 50688 (0.0038) [2024-06-06 17:07:03,656][27571] Fps is (10 sec: 44237.4, 60 sec: 44236.9, 300 sec: 44486.7). Total num frames: 830472192. Throughput: 0: 44279.8. Samples: 43726980. Policy #0 lag: (min: 0.0, avg: 12.0, max: 23.0) [2024-06-06 17:07:03,656][27571] Avg episode reward: [(0, '0.312')] [2024-06-06 17:07:06,851][27803] Updated weights for policy 0, policy_version 50698 (0.0042) [2024-06-06 17:07:08,656][27571] Fps is (10 sec: 47513.2, 60 sec: 44236.7, 300 sec: 44431.2). Total num frames: 830701568. Throughput: 0: 44417.3. Samples: 43869520. Policy #0 lag: (min: 0.0, avg: 8.6, max: 20.0) [2024-06-06 17:07:08,656][27571] Avg episode reward: [(0, '0.305')] [2024-06-06 17:07:11,179][27803] Updated weights for policy 0, policy_version 50708 (0.0025) [2024-06-06 17:07:13,656][27571] Fps is (10 sec: 44236.2, 60 sec: 44782.8, 300 sec: 44486.7). 
Total num frames: 830914560. Throughput: 0: 44346.0. Samples: 44132840. Policy #0 lag: (min: 0.0, avg: 8.6, max: 20.0) [2024-06-06 17:07:13,656][27571] Avg episode reward: [(0, '0.312')] [2024-06-06 17:07:14,381][27803] Updated weights for policy 0, policy_version 50718 (0.0041) [2024-06-06 17:07:18,656][27571] Fps is (10 sec: 40959.9, 60 sec: 43963.8, 300 sec: 44431.2). Total num frames: 831111168. Throughput: 0: 44279.9. Samples: 44401280. Policy #0 lag: (min: 0.0, avg: 8.6, max: 20.0) [2024-06-06 17:07:18,656][27571] Avg episode reward: [(0, '0.303')] [2024-06-06 17:07:18,757][27803] Updated weights for policy 0, policy_version 50728 (0.0031) [2024-06-06 17:07:21,771][27803] Updated weights for policy 0, policy_version 50738 (0.0034) [2024-06-06 17:07:23,656][27571] Fps is (10 sec: 45875.2, 60 sec: 44509.8, 300 sec: 44431.2). Total num frames: 831373312. Throughput: 0: 44169.3. Samples: 44531040. Policy #0 lag: (min: 0.0, avg: 8.6, max: 20.0) [2024-06-06 17:07:23,656][27571] Avg episode reward: [(0, '0.317')] [2024-06-06 17:07:25,956][27803] Updated weights for policy 0, policy_version 50748 (0.0031) [2024-06-06 17:07:28,656][27571] Fps is (10 sec: 47514.1, 60 sec: 44782.9, 300 sec: 44542.3). Total num frames: 831586304. Throughput: 0: 44343.3. Samples: 44802960. Policy #0 lag: (min: 0.0, avg: 8.6, max: 20.0) [2024-06-06 17:07:28,656][27571] Avg episode reward: [(0, '0.307')] [2024-06-06 17:07:28,874][27803] Updated weights for policy 0, policy_version 50758 (0.0025) [2024-06-06 17:07:33,389][27803] Updated weights for policy 0, policy_version 50768 (0.0037) [2024-06-06 17:07:33,656][27571] Fps is (10 sec: 40960.2, 60 sec: 43963.6, 300 sec: 44375.7). Total num frames: 831782912. Throughput: 0: 44501.3. Samples: 45072600. 
Policy #0 lag: (min: 0.0, avg: 8.6, max: 20.0) [2024-06-06 17:07:33,657][27571] Avg episode reward: [(0, '0.311')] [2024-06-06 17:07:36,195][27803] Updated weights for policy 0, policy_version 50778 (0.0029) [2024-06-06 17:07:38,656][27571] Fps is (10 sec: 44236.7, 60 sec: 44236.9, 300 sec: 44375.7). Total num frames: 832028672. Throughput: 0: 44339.7. Samples: 45197340. Policy #0 lag: (min: 1.0, avg: 12.0, max: 23.0) [2024-06-06 17:07:38,656][27571] Avg episode reward: [(0, '0.307')] [2024-06-06 17:07:40,851][27803] Updated weights for policy 0, policy_version 50788 (0.0031)